Operating Systems and Systems Integration
The Rescue Disk, Trouble Shooting and other Related Topics
Nick Urbanik ver. 1.4

1 Aim

After completing this exercise, you will:

• be aware of the usefulness of mini-Linux distributions in solving many problems, many of which are not Linux problems
• have practical experience with one of the best mini-Linux distributions, tomsrtbt, available from http://www.toms.net/rb/
• be introduced to a shell script, report, that may be used to diagnose many computer problems, and through that:
  ◦ learn some aspects of shell script programming
  ◦ learn about the Linux /proc file system and its usefulness in describing the computer and its operation, as well as the operation of the operating system

2 Procedure

2.1 A mini-Linux distribution: tomsrtbt

The mini-Linux distribution tomsrtbt is useful for recovering from many problems; it can be used to start a restore from backups after the hard disk has been completely destroyed. It is also useful as a trouble-shooting aid, which is how we will use it today.

Note that tomsrtbt uses the old libc5 libraries, which are not included in Red Hat 7.0 and later. You will need to install these libraries to be able to work with tomsrtbt. It is convenient to install them from Red Hat 6.2.

1. NFS mount the ftp directory from CSAlinux:
       $ sudo mount CSAlinux:/var/ftp/pub /mnt
2. Change to the Red Hat 6.2 RPM directory:
       $ cd /mnt/redhat-6.2/RedHat/RPMS
3. Install the libc5 libraries:
       $ sudo rpm -Uhv ld.so-1.9.5-13.i386.rpm libc-5.3.12-31.i386.rpm
   Use the Tab key to complete the filenames rather than typing all the numbers in manually.
4. Change to your home directory.
5. Extract the contents of the file tomsrtbt-1.7.218.tar.gz with the command
       $ tar xvzf /mnt/tomsrtbt/tomsrtbt-1.7.218.tar.gz
   Again, use filename completion.
6. Unmount the network file system from CSAlinux:
       $ sudo umount /mnt
7. Install tomsrtbt on your floppy disk by changing into the directory ~/tomsrtbt-1.7.218 and typing
       $ sudo ./install.s
8. If you do not see the message “Succeeded!” after a few minutes of disk activity, your floppy disk has some defects. Because the disk is formatted to a high capacity, it needs to be in new condition. Use a new floppy disk.
9. Boot a computer with this floppy disk.
10. Log in as root.
11. Change to the directory /proc and examine the content of the files cpuinfo, meminfo, interrupts and pci. Can you imagine any uses for these files?
12. List the partition information on your computer.

2.2 The shell script report

1. Download the shell script report from http://CSAlinux.tycm.vtc.edu.hk/ossi/lab/tomsrtbt/report (from the subject home page), and save it to your ~/bin directory (create ~/bin if it doesn’t exist).
2. Make it executable (you should know how!), then execute it to see what it does.
3. Take a few minutes to look at the files that have been made in the directory /tmp/reports. What could these files be used for?

2.3 Rebuilding tomsrtbt to create a trouble shooting tool

1. Open the file tomsrtbt.FAQ so that you can easily refer to it while working in this section.
2. Unpack the contents of tomsrtbt onto your hard disk by executing
       sudo ./unpack.s tomsrtbt.raw
3. Copy the shell script report to the directory tomsrtbt-1.7.218.unpacked/2/usr/bin
4. Rebuild a new disk image containing report by executing
       sudo 2/usr/doc/buildit.s
5. A new package will be formed in your current directory. Change into the new directory (which has a name like xxxx.tycm.vtc.edu.hk-tomsrtbt-1.7.219), and rewrite your tomsrtbt disk by executing ./install.s there.
6. Boot a computer with this disk, and execute the report program there.
7. Examine the files created there.
8.
   See if you can devise a way of:
   (a) putting these files onto a second FAT-formatted floppy disk
   (b) putting them onto the tomsrtbt disk itself. You will need to delete some files from your 2 directory and rebuild the disk image to achieve this.

Homework, due at the second laboratory session (within two weeks, i.e., week 24):
Arrange this so that it is all automatic, so that you have a single disk that you could give to a customer with instructions to boot the computer with one floppy disk, then return the disk to you so that you have evidence that you can analyse to solve problems, without a site visit.

Here are a few ideas that you might like to consider:

• You will need to delete some commands from tomsrtbt to make room for the reports. You could start by deleting the two editors and the man pages. You may need a bit of trial and error to see what commands are required for this all to work.
• The file 1/rc.custom.gz is the startup script. You will probably need to start report from this script; you will do most (all?) of the customisation here.
• You could write the reports to the RAM disk in /tmp.
• You could compress the reports so that they take less space, using tar and bzip2.
• You will need to mount the floppy with a command like
      # mount /dev/fd0u1722 /fl
• Your script should shut down the computer when it is finished.

A Appendix: the shell script report and shell programming

A.1 The shell script itself

The shell script report is shown here:

    1  #! /bin/sh
    2
    3  # Nick Urbanik
    4  # report information about the computer, reading from proc file system
    5  # and Linux hard disk partition (if any) and storing the result in a
    6  # directory that may be on a floppy disk.
    7  # Designed to work with tomsrtbt or from a hard disk.
    8
    9  # Here is my attempt to find out whether the root device is a hard disk
   10  # or a RAM disk.
   11  # Now what devices are hard disks?
   12  # could be a label (if read fstab, but not output of mount)
   13  # could be [hs]d[a-z][1-9][0-9]*
   14  # could be md[0-9]+
   15  # return 0 if root partition is a hard disk
   16  # return 1 otherwise
   17  mounted_on_hd()
   18  {
   19      for line in "`mount | grep '[ \t]/[ \t]'`"
   20      do
   21          mount_point=`echo $line | awk '{print $3}'`
   22          if [ "$mount_point" = "/" ]; then
   23              device="`echo $line |
   24                  awk '{print $1}'`"
   25              is_hd=`echo $device |
   26                  awk '/^\/dev\/[hs]d[a-z][1-9][0-9]*$/ {print 0; next}
   27                       /^\/dev\/md[0-9]+/ {print 0; next}
   28                       {print 1}'`
   29              return $is_hd
   30          fi;
   31      done
   32      return 1;
   33  }
   34  # A utility subroutine used by other subroutines
   35  # checks that the directory $INFODIR exists.
   36  ensure_INFODIR_exists()
   37  {
   38      subroutine=$1
   39      if [ -z "$INFODIR" -o ! -d "$INFODIR" ]
   40      then
   41          echo $prog: $subroutine called without initialised \$INFODIR
   42          exit 1
   43      fi
   44  }
   45  # A utility subroutine used by other subroutines
   46  # checks that the directory $c exists.
   47  ensure_hd_mountpoint_exists()
   48  {
   49      subroutine=$1
   50      if [ -z "$c" -o ! -d "$c" ]
   51      then
   52          echo $prog: $subroutine called without initialised mount point \$c
   53          exit 1
   54      fi
   55  }
   56
   57  # reads the output of fdisk, and dmesg,
   58  # and copies the content of a number of files in the /proc file system.
   59  # Assume that $INFODIR exists
   60  report_generic_info()
   61  {
   62      ensure_INFODIR_exists "report_generic_info"
   63      fdisk -l > $INFODIR/fdisk 2>&1
   64      # print plenty of stuff from proc:
   65      # exclude kcore, kmsg and ksyms
   66      for f in `ls -1 /proc | grep -v '^k' | grep -v '[0-9]'`
   67      do
   68          if [ ! -d /proc/$f ]; then
   69              echo $f:
   70              cat /proc/$f
   71              echo ' '
   72          fi
   73      done > $INFODIR/proc
   74      for f in /proc/sys/kernel/*
   75      do
   76          if [ ! -d $f ]; then
   77              echo $f:
   78              cat $f
   79              echo ' '
   80          fi
   81      done > $INFODIR/proc-kernel
   82      dmesg > $INFODIR/dmesg
   83  }
   84
   85  # print the device names of all partitions that fdisk reported as having
   86  # the partition type Linux
   87  # This assumes non-raid.
   88  # problems detecting raid, journalling file systems
   89  # 1. tomsrtbt won't handle raid
   90  # 2. How determine what raid devices exist?
   91  print_linux_partition_devices()
   92  {
   93      if [ -f $INFODIR/fdisk ]; then
   94          grep '83 *Linux' $INFODIR/fdisk | awk '{print $1}'
   95      else
   96          fdisk -l 2> /dev/null | grep '83 *Linux' | awk '{print $1}'
   97      fi
   98  }
   99
  100  # print the name of the root partition device on standard output if find it.
  101  # Assume the root partition on the hard disk is not mounted
  102  # Assume $c exists and is a directory:
  103  guess_root_partition()
  104  {
  105      ensure_hd_mountpoint_exists "guess_root_partition"
  106      ensure_INFODIR_exists "guess_root_partition"
  107      for d in `print_linux_partition_devices`
  108      do
  109          mount $d $c
  110          if [ -f $c/etc/fstab ]; then
  111              echo $d
  112          fi
  113          umount $c
  114      done
  115  }
  116  # takes one parameter, the partition device file.
  117  # mount the device on $c
  118  mount_root_partition()
  119  {
  120      device=$1
  121      if mount | grep $device > /dev/null 2>&1
  122      then
  123          # false
  124          return 1
  125      fi
  126
  127      ensure_hd_mountpoint_exists "mount_root_partition"
  128      if mount $device $c
  129      then
  130          # true
  131          return 0
  132      else
  133          echo $prog: unable to mount $device on $c\; giving up
  134          exit 1
  135      fi
  136  }
  137  # if we have something mounted on $c, read plenty of Linux system configuration
  138  # files from it. We could do the same for windows, but I don't know
  139  # enough about the registry.
  140  copy_config_from_root_partition()
  141  {
  142      device=$1;
  143      if mount | grep $c > /dev/null 2>&1
  144      then
  145          :
  146      else
  147          echo $prog: nothing mounted on $c\; giving up
  148          exit
  149      fi
  150      if $use_floppy ; then
  151          if mount | grep $a > /dev/null 2>&1
  152          then
  153              :
  154          else
  155              echo $prog: no floppy mounted on $a\; giving up
  156              exit
  157          fi
  158      fi
  159      ensure_INFODIR_exists "copy_config_from_root_partition"
  160
  161      for i in fstab lilo.conf conf.modules modules.conf \
  162          sysconfig/network issue resolv.conf hosts profile bashrc \
  163          inittab auto.conf
  164      do
  165          cp -p $c/etc/$i $INFODIR/$d > /dev/null 2>&1
  166      done
  167      if [ -d $c/etc/rc.d ]
  168      then
  169          tar cf - $c/etc/rc.d 2> /dev/null | bzip2 -9 > $INFODIR/rc-d.tar.bz2
  170      fi
  171      if [ -d "$c/etc/sysconfig" ]
  172      then
  173          tar cf - $c/etc/sysconfig 2> /dev/null | bzip2 -9 > $INFODIR/sysconfig.tar.bz2
  174      fi
  175      if [ -f "$c/var/log/messages" ]
  176      then
  177          # bzip2 -9c $c/var/log/messages > $INFODIR/messages.bz2
  178          tail -30000 $c/var/log/messages |
  179              bzip2 -9 > $INFODIR/messages.bz2
  180      fi
  181      df > $INFODIR/df
  182      mount > $INFODIR/mount
  183  }
  184  # This subroutine is called automatically whenever the program exits,
  185  # either because exit was called (signal zero), or an interrupt, hangup
  186  # or terminate signal were sent to this script.
  187  clean_up()
  188  {
  189      echo cleaning up...
  190      rmdir $a > /dev/null 2>&1
  191      if $use_floppy
  192      then
  193          umount $FD > /dev/null 2>&1
  194      fi
  195      if [ "$c" = "/tmp/c" ]
  196      then
  197          umount $c > /dev/null 2>&1
  198          rmdir $c
  199      fi
  200  }
  201  # tomsrtbt doesn't have a basename.
  202  # given a file name like "/this/directory/name/contains/a/file",
  203  # will print "file"
  204  basen()
  205  {
  206      dir=$1;
  207      echo $dir | awk 'BEGIN{FS="/"}{print $NF}'
  208  }
  209  # find the first part of the hostname without the domain name part.
  210  # for example, if hostname gives "nickpc.tycm.vtc.edu.hk", will print
  211  # "nickpc"
  212  # hostname -s doesnt work on tomsrtbt
  213  hostname_short()
  214  {
  215      hostname | awk '{ sub( "\\..*$", "" ) }{ printf $0 }'
  216  }
  217  # tomsrtbt doesn't have tr, so can't use:
  218  # INFODIR=$a/`date | tr ' :' '-'`
  219  # take output from the date command (which looks like:
  220  # Thu Feb 15 11:55:49 HKT 2001
  221  # and replace all spaces and colons by hyphens, so it looks like this:
  222  # Thu-Feb-15-11-55-49-HKT-2001
  223  # Windows is not happy about the colons as a file name.
  224  date_to_dir_name()
  225  {
  226      date | awk '{ gsub( "[ :]", "-" ); printf $0 }'
  227  }
  228  usage()
  229  {
  230      cat <<EOF
       [lines 231 to 239, the start of the usage message, are missing from this copy of the listing]
  240  -o dir, -odir    put output into dir instead of /tmp/reports
  241  -d, --debug      Show all commands as they are executed
  242  EOF
  243      exit $1
  244  }
  245
  246  prog=`basen $0`
  247  a=/tmp/reports
  248  c=/tmp/c
  249  INFODIR=$a/`date_to_dir_name`-`hostname_short`
  250  use_floppy=false
  251
  252  while [ $# -ne 0 ]    # loop over arguments
  253  do
  254      case $1 in
  255      --)
  256          shift
  257          break
  258          ;;
  259
  260      -f|--floppy)
  261          use_floppy=true
  262          ;;
  263
  264      -d|--debug)
  265          set -x
  266          ;;
  267
  268      -h|--help)
  269          usage 0 1>&2
  270          ;;
  271
  272      -o)
  273          use_floppy=false
  274          INFODIR="$2"
  275          shift
  276          ;;
  277
  278      -o*)
  279          use_floppy=false
  280          INFODIR=`echo "$1" | sed 's/-o//'`
  281          ;;
  282
  283      *)
  284          echo unknown option $1
  285          usage 1 1>&2
  286          ;;
  287      esac
  288      shift
  289  done
  290
  291  mkdir $c
  292  mkdir $a > /dev/null 2>&1
  293
  294  if $use_floppy ; then
  295      echo "Getting basic info about this computer..."
  296      echo insert a formatted floppy with some disk space, any key to continue...
  297      read anykey
  298      FD=/dev/fd0
  299      if mount -t vfat $FD $a
  300      then
  301          :
  302      else
  303          echo $prog: cannot mount floppy on $a\; giving up
  304          exit 1
  305      fi
  306  fi
  307  # cleanup on exit, hangup, interrupt, quit, termination
  308  trap clean_up 0 1 2 3 15
  309  mkdir $INFODIR
  310  report_generic_info
  311  if mounted_on_hd
  312  then
  313      c=/
  314      copy_config_from_root_partition
  315  else
  316      for device in `guess_root_partition`
  317      do
  318          echo your root partition seems to be $device.
  319          mount_root_partition $device
  320          copy_config_from_root_partition $device
  321          umount $device
  322      done
  323  fi
  324  $use_floppy && umount $a
  325  rmdir $a > /dev/null 2>&1

A warning: I completed this script recently, and have not tested it extensively. I need your help there!

A.2 Features of shell programming used in this shell script

I have written comments in the script to describe what it is doing. I hope that I have used reasonably self-documenting code. But I shall point out a few things that aren’t obvious. Most questions can be answered by reading the manual pages for bash, test, awk and sed.

A.2.1 Control Structures in Shell programming

The shell provides the following control structures:

• a while loop (see line 252). The syntax is:
      while condition ; do commands ; done
• a for loop (see lines 19, 66, 74, 107, 161 and 316). The syntax is:
      for variable in list of words ; do commands ; done
  The variable is set to the value of each of the words in turn, and the commands in the loop body are executed once for each word. This is a different syntax from for loops in C or Java.
• an if statement with syntax:
      if condition ; then commands ; elif condition ; then commands ; ... else commands ; fi
  I have used it a great deal here. See lines 22, 39, 50, 68, 76, 93, 110, 121, 128, 143, 150, 151, 167, 171, 175, 191, 195, 294, 299 and 311.
• a case statement that works rather like the switch statement in C. The syntax is:
      case word in pattern | pattern ) commands ;; ... esac
  See lines 254 to 287.

A.2.2 Other Features of Shell Programming

awk and sed: You may notice that I have used the program awk on lines 21, 24, 26, 94, 96, 207, 215 and 226, and the program sed on line 280. We have not covered these in detail, nor will we. awk is a whole programming language in its own right, but here I have used it mainly for splitting strings into their component parts, though I have also used its regular expressions for translating strings. As with most other things in Linux, you can read about them in their man pages.

Back Quotes and Command Substitution: You will have noticed that back quotes (or back ticks) have some special property when you installed the Secure Shell agent. They are used here on lines 19, 21, 23, 24, 25, 28, 66, 107, 246, 249, 280 and 316. The effect of back quotes is called command substitution; you can search for this phrase in the bash manual page. In the example `command`, the shell performs the expansion by executing command and replacing `command` with the standard output of command, with any trailing newlines deleted.

Functions can only return a number. To get the effect of returning a string, we can have the function print the string to standard output, then put the function call in back quotes, as in lines 107, 249 and 316.

Here documents: The usage() function uses a here document, where all the text between <<EOF and the line containing only EOF is fed to cat as its standard input.
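
The last two ideas, returning a string from a function through command substitution and printing a block of text with a here document, can be tried out on their own. The following is only a minimal sketch and is not part of report: the function names short_name and show_usage, and the usage text itself, are made up for illustration, and it assumes a POSIX shell with sed available.

```shell
#! /bin/sh
# Illustration only: short_name and show_usage are hypothetical helpers,
# not functions from the report script.

# A function can only *return* a number, so print the string instead.
# Everything from the first dot onwards is removed from the argument.
short_name()
{
    echo "$1" | sed 's/\..*$//'
}

# A here document: the lines up to the one containing only EOF become the
# standard input of cat. $1 is expanded inside the here document.
show_usage()
{
    cat <<EOF
usage: $1 [-o dir]
  -o dir   put output into dir
EOF
}

# Back quotes run the function and substitute what it printed.
name=`short_name nickpc.tycm.vtc.edu.hk`
echo "short name: $name"      # prints: short name: nickpc
show_usage demo
```

Note that the closing EOF must start in the first column, otherwise the shell keeps reading the here document.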