\documentclass{ictlab}

\RCS $Revision: 1.10 $

\usepackage{alltt,key,xr,cols}
\externaldocument[lt-]%
{../../linux_training-plus-config-files-ossi/build/masterfile}

\usepackage[pdfpagemode=None,pdfauthor={Nick Urbanik}]{hyperref}

\newcommand*{\labTitle}{Shell Programming---an Introduction}
\renewcommand*{\subject}{Operating Systems and Systems Integration}

\providecommand*{\RPM}{\acro{RPM}\xspace}
\providecommand*{\CD}{\acro{CD}\xspace}

\begin{document}
%\Large
\tableofcontents

\section{Aim}
\label{sec:aim}

After successfully working through this exercise, you will:
\begin{itemize}
\item write simple shell scripts using \texttt{for}, \texttt{if} and
  \texttt{while} statements;
\item understand basic regular expressions, and be able to create your
  own regular expressions;
\item understand how to execute and debug these scripts;
\item understand some simple shell scripts written by others;
\item be ready to begin to perform automated editing of configuration files
  using \texttt{sed}; and
\item be ready to begin customizing an installation using an automated
  installation method called \texttt{kickstart}, which will be our
  next laboratory topic.
\end{itemize}

\section{Background}
\label{sec:background}

A working knowledge of shell scripting is essential to everyone
wishing to become reasonably adept at system administration, even if
they do not anticipate ever having to actually write a script.
Consider that as a Linux machine boots up, it executes the shell
scripts in \texttt{/etc/rc.d} to restore the system configuration and
set up services. A detailed understanding of these startup scripts is
important for analyzing the behaviour of a system, and possibly
modifying it.

Writing shell scripts is not hard to learn, since the scripts can be
built in bite-sized sections and there is only a fairly small set of
shell-specific operators and options to learn. The syntax is
simple and straightforward, similar to that of invoking and chaining
together utilities at the command line, and there are only a few
``rules'' to learn. Most short scripts work right the first time, and
debugging even the longer ones is straightforward.

A shell script is a ``quick and dirty'' method of prototyping a complex
application. Getting even a limited subset of the functionality to
work in a shell script, even if slowly, is often a useful first stage
in project development. This way, the structure of the application can
be tested and played with, and the major pitfalls found before
proceeding to the final coding in C, \Cpp, Java, or Perl.

Shell scripting hearkens back to the classical UNIX philosophy of
breaking complex projects into simpler subtasks, of chaining together
components and utilities.

\subsection{Where to get more information}
\label{sec:references}

There is a free on-line book about shell programming at:
\url{http://www.linuxdoc.org/LDP/abs/html/index.html} and
\url{http://www.linuxdoc.org/LDP/abs/abs-guide.pdf}.  The handy
reference to shell programming is:
\begin{verbatim}
$ pinfo bash
\end{verbatim}%$
or
\begin{verbatim}
$ man bash
\end{verbatim}%$

\section{The Shebang}
\label{sec:shebang}

A shell script is started by the Linux kernel.  The kernel reads the
first two bytes of the executable file to determine how to execute
it.  If it starts with the characters ``\texttt{\#!}'' then the kernel
will consider this to be executed as a script, run by an interpreter.
The kernel then reads the next characters after the ``\texttt{\#!}''
to determine what interpreter to use.

For shell scripts, the interpreter is \texttt{/bin/sh}, so the first
line of all our shell scripts is:
\begin{verbatim}
#! /bin/sh
\end{verbatim}
If you make any typing mistake in the name of the interpreter, you
will get an error message such as ``bad interpreter: No such file or
directory.''

\section{Making the script executable}
\label{sec:chmod+x}

To be easy to execute, a script should:
\begin{itemize}
\item be on the \texttt{PATH}
\item have execute permission.
\end{itemize}
How do you do each of these?
\begin{itemize}
\item By default, Red Hat Linux includes the directory
  $\sim$\texttt{/bin} on the \texttt{PATH}, so create this directory
  and put your scripts there.
\item If your script is called \texttt{script}, then this command will
  make it executable:
\begin{verbatim}
$ chmod +x script
\end{verbatim}%$
\end{itemize}

\section{True and False}
\label{sec:true-and-false}

Shell programming relies heavily on external programs.  A program that
runs successfully returns an \emph{exit status} of 0; a program that
fails returns a non-zero error code.  As a result, shell
programming treats the value 0 as true, and any non-zero value as false.
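
As a minimal sketch, you can inspect an exit status yourself: the
special variable \texttt{\$?} holds the exit status of the most recent
command.  The variable names \texttt{t} and \texttt{f} are chosen just
for this illustration; \texttt{true} and \texttt{false} are standard
commands that do nothing except succeed and fail.
\begin{verbatim}
true            # always succeeds
t=$?            # save its exit status
false           # always fails
f=$?            # non-zero; 1 on most systems
echo "true returned $t, false returned $f"
\end{verbatim}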

\section{Shell Variables}
\label{sec:variables}

When using the value of a variable, the variable name starts with a dollar
sign `\texttt{\$}'.  When assigning a value to a variable, the variable
name has no dollar sign.  An assignment has no spaces on either side of the
`\texttt{=}':
\begin{verbatim}
a=375
hello=$a
PATH="$PATH:/sbin:/usr/sbin"
\end{verbatim}

\subsection{Baby Can't Change Parent}
\label{sec:baby-cant-change-parent}

Nonsense, any parent will tell me.  Okay, I'm talking about processes,
not humans.  When a parent process \texttt{fork()}s and has a child
process, the child process inherits all the environment variables of
the parent.  But the child process cannot change any environment
variable of the parent.

If you write a shell script that sets some environment variables and
then exits, you will find that all these new values have disappeared.
This applies to subshells too, so values set in a subshell are
``local''.  To execute some commands in a subshell, put parentheses
around them.  See the example in section~\ref{sec:special-variables}.
Here is an example of what I am talking about:
\begin{verbatim}
$ echo $HOME
/home/nicku
$ pwd
/home/nicku/teaching/ict/ossi/lab/shell
$ cat baby
#! /bin/sh
cd /usr
HOME="Tsing Yi"
echo $HOME
pwd
$ ./baby 
Tsing Yi
/usr
$ echo $HOME
/home/nicku
$ pwd
/home/nicku/teaching/ict/ossi/lab/shell
\end{verbatim}%$
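The same effect can be sketched without a separate script file, by
using a subshell.  The variable name \texttt{x} is arbitrary:
\begin{verbatim}
x=outer
( x=inner; echo "in the subshell:    x=$x" )   # prints inner
echo "after the subshell: x=$x"                # prints outer
\end{verbatim}
The assignment inside the parentheses happens in a child process, so
the parent shell still sees the old value afterwards.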

\section{Special Variables}
\label{sec:special-variables}

Parameters may be passed to a shell script.  `\texttt{\$0}' is the
name of the shell script itself.  The first parameter is
called `\texttt{\$1}', the second is `\texttt{\$2}' and so on.  The
number of parameters is `\texttt{\$\#}'.  A list of all the parameters
is in the variables `\texttt{\$*}' and `\texttt{\$@}'.  The only
difference between `\texttt{\$*}' and `\texttt{\$@}' is when they are
enclosed in double quotes---see section \vref{pag:dollar-star-quoting}
on quoting.

\texttt{IFS} is the ``\emph{internal-field separator}''.  The shell
automatically splits strings into fields divided by the
\texttt{IFS}\@.  Here is a simple example:
\begin{verbatim}
$ (IFS=:; echo $PATH)
/usr/kerberos/bin /usr/local/bin /usr/bin /bin /usr/bin/X11 /usr/games
/usr/bin /usr/X11R6/bin /opt/OpenNMS/bin /usr/java/jdk1.3.1_01/bin
/home/nicku/bin /sbin /usr/sbin /usr/local/sbin
\end{verbatim}
I changed \texttt{IFS} in a subshell so that the value of \texttt{IFS}
in the current shell would not be changed.  Sort of like a local variable.
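
If you need to keep values that were computed while \texttt{IFS} was
changed, a subshell will not do, because those values vanish with it.
One common alternative, sketched here, is to save \texttt{IFS} and
restore it afterwards.  The names \texttt{dirs}, \texttt{n} and
\texttt{old\_ifs} are invented for this example, and the \texttt{for}
loop is described in section~\ref{sec:for}:
\begin{verbatim}
dirs="/bin:/usr/bin:/usr/local/bin"   # sample value, like PATH
n=0
old_ifs=$IFS        # remember the current separator
IFS=:
for d in $dirs      # unquoted, so the shell splits at each colon
do
    n=`expr $n + 1`
    echo "field $n: $d"
done
IFS=$old_ifs        # put the original separator back
\end{verbatim}
After the loop, \texttt{n} still holds the number of fields, which is 3
for this sample value.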

\section{Special Characters}
\label{sec:special-characters}

Comments start with a `\texttt{\#}'.
Statements are separated either by newlines, or by semicolons
`\texttt{;}'.

The dot command is useful for executing a login script:
\begin{verbatim}
. ~/.bash_profile
\end{verbatim}
It is useful here, because it does not execute the commands in a
separate subshell.  Hence, all changes to variables remain.
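
Here is a small sketch of the difference between running a file with
\texttt{sh} and reading it with the dot command.  The file name and the
variable \texttt{greeting} are made up for the example, and it assumes
that \texttt{greeting} is not already set; \texttt{\$\$} is the process
\acro{ID} of the current shell, used here only to make the file name
unlikely to clash with an existing file.
\begin{verbatim}
settings=/tmp/settings.$$
echo 'greeting=hello' > "$settings"

sh "$settings"                   # runs in a child process
echo "after sh:  greeting=$greeting"

. "$settings"                    # runs in the current shell
echo "after dot: greeting=$greeting"

rm -f "$settings"
\end{verbatim}
Only the second \texttt{echo} prints \texttt{hello}.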

The `\texttt{\$}' symbol indicates that a variable name comes next,
and gives the value of that variable.

The backslash \texttt{"\bs"} has many meanings, mostly similar to its
behaviour in the C programming language.  At the end of a line, a
backslash allows a long line to be split into shorter pieces.

There are many other characters that are special to the shell.  See
chapter~4 of \url{http://www.linuxdoc.org/LDP/abs/html/index.html}.

\section{Quoting}
\label{sec:quoting}

There are four main ways of quoting: forward single quotes, double
quotes, the backslash, and backward single quotes.  Quoting causes the
quoted material to have a different meaning from normal.  In
particular, the special treatment the shell gives to special
characters is suppressed to some degree.

Enclosing in double quotes \texttt{"..."} suppresses all special
behaviour, except for variable interpretation (\texttt{\$}), the
back quote, and the backslash.

Enclosing in single forward quotes \texttt{'...'} suppresses the
special behaviour of all special characters.

Putting a backslash in front of a character preserves the literal
value of the character, except for newline.
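
The three methods described so far can be compared side by side.  This
is only a sketch; the variable \texttt{name} is invented for the
example:
\begin{verbatim}
name=World
echo "Hello $name"      # double quotes: prints Hello World
echo 'Hello $name'      # single quotes: prints Hello $name
echo Hello \$name       # backslash:     prints Hello $name
echo "Hello \$name"     # backslash still works inside double quotes
\end{verbatim}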

Single back quotes \texttt{`...`} mean: ``execute the external program
called within these quotes and put the output back here.''  This is
called \emph{command substitution} in the bash manual.  Command
substitution is really quite different from the other three quoting
methods.  Here is an example using the \texttt{hostname} command,
which prints the hostname on standard output:
\begin{verbatim}
$ hostname
nickpc.tyict.vtc.edu.hk
$ h=hostname
$ echo $h
hostname
$ h=`hostname`
$ echo $h
nickpc.tyict.vtc.edu.hk
\end{verbatim}%$

\subsection{When to use quoting}
\label{sec:when-to-quote}

Many programs, such as \texttt{grep} or \texttt{find}, need some
special characters that they themselves will interpret.  We need to be
able to send these characters unchanged to the program.  In this case,
quote them.  For example:
\begin{verbatim}
$ find . -name "*.rpm"
\end{verbatim}%$
If we do not quote the asterisk, the shell itself will expand
\texttt{*.rpm} into the names of any \texttt{.rpm} files in the current
directory before \texttt{find} runs, but we want \texttt{find} to locate
all the \texttt{.rpm} files in the directories \emph{below} the current
directory also.

If you do not want a variable value that contains spaces to be
automatically split by the shell, then quote it.  Here,
\texttt{testquote} is a short shell script that prints information
about its parameters:
\begin{verbatim}
$ test="one two"
$ testquote $test
You have 2 parameters.  They are:
parameter 1: one
parameter 2: two
$ testquote "$test"
You have 1 parameters.  They are:
parameter 1: one two
\end{verbatim}%$

\label{pag:dollar-star-quoting}%
Note that \texttt{"\$*"} is one value (not split up), while
\texttt{"\$@"} is split into the original parameters.  So if
`\texttt{\$\#}' had the value 4, then there are four separately quoted
values in \texttt{"\$@"}.  See the beginning of
section~\vref{sec:special-variables}.

Here is a little example showing the difference between \texttt{"\$@"}
and \texttt{"\$*"}:
\begin{verbatim}
$ cat test_at_star
#! /bin/sh
testquote "$@"
testquote "$*"
$ test_at_star one two three
You have 3 parameters.  They are:
parameter 1: one
parameter 2: two
parameter 3: three
You have 1 parameters.  They are:
parameter 1: one two three
\end{verbatim}
Notice how  \texttt{"\$*"} just turned into one long parameter that
contains spaces.
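
The \texttt{testquote} script itself is not listed in these notes.  One
way it might be written is sketched below, as a shell function so that
it can be pasted straight into a shell.  This is a guess that
reproduces the output shown above, not the original script; the
variable names \texttt{n} and \texttt{p} are invented.
\begin{verbatim}
# A guess at testquote: report how many parameters there
# are, then list them one per line.
testquote() {
    echo "You have $# parameters.  They are:"
    n=1
    for p               # no "in": loop over the parameters
    do
        echo "parameter $n: $p"
        n=`expr $n + 1`
    done
}

testquote "one two" three
\end{verbatim}
The last line reports two parameters, the first of which is
\texttt{one two}.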
\subsection{Printing Output}
\label{sec:output}

We use \texttt{echo} to print things.  By default, it puts a new line
at the end.  To avoid printing a newline, use \texttt{echo -n}:
\begin{verbatim}
$ cat echo-n
#! /bin/sh
echo "Hello "
echo World
echo -n "Hello "
echo World
$ ./echo-n 
Hello 
World
Hello World
\end{verbatim}

\subsection{Reading Input}
\label{sec:input}

There are many ways of reading input, but one simple way is to use
\texttt{read}:
\begin{verbatim}
$ read answer
yes
$ echo $answer
yes
\end{verbatim}%$

\section{The Basic Statements}
\label{sec:statements}

The shell is a complete programming language, and supports
\texttt{for} loops, \texttt{while} loops, \texttt{if} statements,
\texttt{case} statements (like \texttt{switch} in C), as well as
function calls.  We look at only a small subset of these.


\subsection{The \texttt{if} statement}
\label{sec:if}

The syntax of the \texttt{if} statement is:
\begin{alltt}
if \emph{test-commands}
then
    \emph{statements}
fi
\end{alltt}
We can add an \emph{else}:
\begin{alltt}
if \emph{test-commands}
then
    \emph{statements-if-true}
else
    \emph{statements-if-false}
fi
\end{alltt}
and we can chain further conditions, each introduced with the
keyword \texttt{elif} and followed by its own \texttt{then}:
\begin{alltt}
if \emph{test-commands-1}
then
    \emph{statements-if-test-commands-1-true}
elif \emph{test-commands-2}
then
    \emph{statements-if-test-commands-2-true}
else
    \emph{statements-if-all-test-commands-false}
fi
\end{alltt}

The \emph{test-commands} can be either:
\begin{itemize}
\item a program being executed, or
\item a test made using the program \texttt{test}; see \texttt{man
    test} for all the tests you can make using \texttt{test}.  Also
    see section~\vref{sec:test}.
\end{itemize}

A simple example:
\begin{verbatim}
if grep nick /etc/passwd > /dev/null 2>&1
then
    echo Nick has a local account here
else
    echo Nick has no local account here
fi
\end{verbatim}
We redirect all output from grep to avoid the side effect of printing
the line grep found.

If you want to put the \texttt{then} on the same line as the
\texttt{if}, you need to put a semicolon before the \texttt{then}.
Here is another example that adds the user \texttt{nicku} to the
sudoers file if that user is not there already:
\begin{verbatim}
if ! grep nicku /etc/sudoers > /dev/null 2>&1; then
    echo "nicku   ALL=(ALL) ALL" >> /etc/sudoers
fi
\end{verbatim}
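The \texttt{elif} form can be sketched in the same way.  Note that each
\texttt{elif} needs its own \texttt{then}.  The function name
\texttt{classify} is invented for this example, and the numeric
comparisons \texttt{-lt} and \texttt{-eq} belong to the \texttt{test}
program described in section~\ref{sec:test}:
\begin{verbatim}
# classify (hypothetical): say whether a whole number is
# negative, zero or positive.
classify() {
    if [ "$1" -lt 0 ]
    then
        echo negative
    elif [ "$1" -eq 0 ]
    then
        echo zero
    else
        echo positive
    fi
}

classify -5     # prints negative
classify 0      # prints zero
classify 42     # prints positive
\end{verbatim}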

\subsection{The \texttt{while} statement}
\label{sec:while}

The format of the \texttt{while} statement is:
\begin{alltt}
while \emph{test-commands}
do
    \emph{loop-body-statements}
done
\end{alltt}

Again, if you want to put the \texttt{do} on the same line as the
\texttt{while}, then you need an extra semicolon before the
\texttt{do}.  A simple example:
\begin{verbatim}
i=0
while [ "$i" -lt 10 ]; do
    echo -n "$i "       # -n suppresses newline.
    i=`expr $i + 1`     # i=$(($i+1)) also works.
done
\end{verbatim}%$
The square brackets are an example of the \texttt{test} program.  See
section~\vref{sec:test}.

\subsubsection{\texttt{expr}}

In the last example using a \texttt{while} loop, we used the program
\texttt{expr} to do arithmetic.  This is the traditional, portable way to do
arithmetic in shell programming.  Note that since \texttt{expr} prints
its output on standard output, we use command substitution to assign
the program output to the variable \texttt{i}.  See the manual page
for \texttt{expr} for more information.
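
Here is a short sketch of \texttt{expr} in use; the variable names are
arbitrary.  Each operand and operator must be a separate word, and the
asterisk must be quoted so that the shell does not expand it as a
wildcard:
\begin{verbatim}
a=7
b=5
sum=`expr $a + $b`
product=`expr $a \* $b`     # \* protects the * from the shell
echo "sum=$sum product=$product"
\end{verbatim}
This prints \texttt{sum=12 product=35}.
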
\subsection{The \texttt{for} statement}
\label{sec:for}

The format of the \texttt{for} statement is:
\begin{alltt}
for \emph{name} in \emph{words}
do
    \emph{loop-body-statements}
done
\end{alltt}

Here is a simple example:
\begin{verbatim}
for planet in Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune Pluto
do
    echo $planet
done
\end{verbatim}%$
You can leave the \texttt{in \emph{words}} out; in that case,
\texttt{\emph{name}} is set to each parameter in turn.

Here is another example:
\begin{verbatim}
for i in *.txt
do
    echo "$i"
    grep 'lost treasure' "$i"   # quoted, in case a name contains spaces
done
\end{verbatim}
Note that the shell will expand the wildcard characters into a list of
file names that you can process one by one in the loop.

\subsection{\texttt{break} and \texttt{continue}}
\label{sec:break-and-continue}

Inside loops you can use the \texttt{break} and \texttt{continue}
statements.  They work as they do in C\@.
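
A minimal sketch of both statements in one loop; the variable names
\texttt{n} and \texttt{kept} are invented for the example:
\begin{verbatim}
kept=""
for n in 1 2 3 4 5 6
do
    if [ "$n" -eq 2 ]; then continue; fi   # skip 2, go on to 3
    if [ "$n" -eq 5 ]; then break; fi      # leave the loop at 5
    kept="$kept $n"
done
echo "kept:$kept"                          # prints kept: 1 3 4
\end{verbatim}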

\subsection{The \texttt{test} program}
\label{sec:test}

The \texttt{test} program is used to perform comparisons of strings,
numbers and files, often used with the \texttt{if} and \texttt{while}
statements.  I will not waste space by copying the manual page here:
do \texttt{man test} to read all about \texttt{test}.

You can call the test program two ways: one as the name \texttt{test},
the other (more common way) as \texttt{[ ... ]}.  If we look, there is
a program called ``\texttt{[}'':%]
\begin{verbatim}
$ which [
/usr/bin/[
$ ls -l /usr/bin/[
lrwxrwxrwx    1 root     root            4 Nov 25 13:36 /usr/bin/[ -> test
\end{verbatim}%]]]

\paragraph{Important Note:}

There must be white space before and after the ``\texttt{[}'', and
before the closing ``\texttt{]}'':
\begin{verbatim}
i=0
while ["$i" -lt 10]; do
    echo -n "$i "
    i=`expr $i + 1`
done
bash: [0: command not found
\end{verbatim}%$]
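For comparison, here is a sketch with the spaces in the right places,
using two of the file tests listed in \texttt{man test}.  The function
name \texttt{describe} is invented for this example; it also shows that
\texttt{test} and \texttt{[ ... ]} are interchangeable.
\begin{verbatim}
# describe (hypothetical): report what kind of thing a path is.
describe() {
    if [ -d "$1" ]
    then
        echo "$1 is a directory"
    elif test -f "$1"       # the same program, called by name
    then
        echo "$1 is a regular file"
    else
        echo "$1 is something else, or does not exist"
    fi
}

describe /
describe /etc/passwd
describe /no/such/file
\end{verbatim}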

\subsection{Using the ``\texttt{\&\&}'' and
  ``\texttt{\textbar\textbar}'' Operators for Flow Control}
\label{sec:operators}

The shell has many operators; see the manual for a complete list.  But
here we look at two familiar operators that are surprisingly useful,
and yet which may have a use that is unfamiliar to you.

They are used for logical operations, and are short-circuit logical
operators, just like those you know from the C and Java programming
languages.  However, in shell programming, they are used for flow
control, rather like an \texttt{if} statement.

Suppose we have a shell script that we must call with two parameters,
and that it should fail if there are fewer or more parameters.  We can
use the `\texttt{\&\&}' operator after a test and exit.  Here is a
little shell script that will only accept two parameters and will exit
with a help message otherwise:
\begin{verbatim}
#! /bin/sh

[ $# -ne 2 ] && echo $0 parameter1 parameter2 && exit 1

echo parameter1 is $1, and parameter2 is $2.
\end{verbatim}
So let's run it, first with no parameters, then with two:
\begin{verbatim}
$ ./two-parameters 
./two-parameters parameter1 parameter2
$ ./two-parameters p q
parameter1 is p, and parameter2 is q.
\end{verbatim}
The syntax is like this:
\begin{alltt}
\emph{command1} && \emph{command2}
\end{alltt}
\texttt{\emph{command2}} will execute only if \texttt{\emph{command1}}
is successful.

Similarly, the syntax for the `\verb!||!' operator is:
\begin{alltt}
\emph{command1} || \emph{command2}
\end{alltt}
\texttt{\emph{command2}} will execute only if \texttt{\emph{command1}}
is \emph{not} successful.
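
The two operators can be sketched side by side.  The function name
\texttt{check\_dir} is invented for this example; \texttt{-d} is the
\texttt{test} option that checks for a directory.
\begin{verbatim}
# check_dir (hypothetical): report whether a directory exists.
check_dir() {
    [ -d "$1" ] && echo "$1 exists"       # runs if the test succeeds
    [ -d "$1" ] || echo "$1 is missing"   # runs if the test fails
}

check_dir /
check_dir /no/such/directory
\end{verbatim}
Exactly one of the two \texttt{echo} commands runs on each call.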

\section{Regular Expressions}
\label{sec:regexp}

Much of what a system administrator does is editing configuration
files.  There are tools to help with this; one such tool is the
program \texttt{sed}; another is the programming language Perl\@.  The
one thing that is useful in both cases is the \emph{regular
  expression}.  The \texttt{grep} command also uses regular
expressions.  Regular expressions provide a way of matching patterns
in a text file; they can also provide a way of altering the text that
matches the pattern.  Getting started with regular expressions is our
aim today.


A regular expression is a string of characters.  Some of these
characters have a special meaning; most do not.  The characters with a
special meaning are called \emph{metacharacters}.  Here are some example
regular expressions without metacharacters:
\begin{verbatim}
/nicku/      # simply matches the string "nicku"
/hacker/     # simply matches the string "hacker"
\end{verbatim}

\subsection{Some Funny Characters (metacharacters)}
\label{sec:metachars}

\begin{description}
\item[Asterisk: \texttt{*}] matches zero or more of the thing that
  came just before.  Example:

\texttt{1133*} matches 11 followed by one or more 3's, so it will
match: 113 or 1133 or 11333 or 1133333333333333\ldots

\item[Dot: \texttt{.}] matches any single character, except newline.
  So \texttt{".*"} matches zero or more of any character.

\item[{Caret: \textasciicircum}] matches the beginning of a line, or inside
  brackets means something different (see below).

\item[Dollar sign: \texttt{\$}] matches the end of a line.  For
  example, ``\textasciicircum\texttt{\$}'' matches blank lines.

\item[Brackets: \texttt{[...]}] matches one character from the set in
  the brackets.  Examples:
  
  \texttt{"[xyz]"} matches the characters \texttt{x}, \texttt{y}, or
  \texttt{z}.
  
  \texttt{"[c-n]"} matches any of the characters in the range
  \texttt{c} to \texttt{n}.
  
  \texttt{"[B-Pk-y]"} matches any of the characters in the ranges
  \texttt{B} to \texttt{P} and \texttt{k} to \texttt{y}.

  \texttt{"[a-z0-9]"} matches any lowercase letter or any digit.
  
  \texttt{"[\textasciicircum{}b-d]"} matches all characters
  \emph{except} those in the range \texttt{b} to \texttt{d}.
  
  Combined sequences of bracketed characters match common word
  patterns. \texttt{"[Yy][Ee][Ss]"} matches yes, Yes, YES, yEs,\ldots

  \texttt{"[0-9][0-9][0-9][0-9][0-9][0-9]\allowbreak[0-9]\allowbreak
    [0-9]\allowbreak[0-9]"}
  matches any \IVE student number.
\item[Backslash: \texttt{\bs}] quotes a metacharacter to take away its
  special meaning.  You can match a literal \texttt{"\$"} with
  \texttt{"\bs\$"}, or a backslash with \texttt{"\bs\bs"}.

\item[Ampersand: \texttt{\&}] means, in a replacement string, the
  string to be replaced.  See below.
\end{description}
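
You can try these metacharacters with \texttt{grep} without creating a
file.  The following is only a sketch: the sample text in the variable
\texttt{sample} is made up, and the patterns are anchored with
\texttt{\textasciicircum} and \texttt{\$} so that they must match the
whole line.
\begin{verbatim}
sample='113
11
11333
Yes
no'
echo "$sample" | grep '^1133*$'        # prints 113 and 11333
echo "$sample" | grep '[Yy][Ee][Ss]'   # prints Yes
echo "$sample" | grep '^[a-z][a-z]$'   # prints no
\end{verbatim}
The line \texttt{11} does not match the first pattern, because the
pattern needs at least one \texttt{3} after the \texttt{11}.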


\subsection{Sed}
\label{sec:sed}

The \texttt{sed} program (\textbf{s}tream \textbf{ed}itor) is a
non-interactive editing program.  We will look only at a subset of its
behaviour today: substitutions.

Let's start with an example:
\begin{verbatim}
sed '/nicku/s//nickl/' /tmp/sudoers-orig > /tmp/sudoers
\end{verbatim}
On each line of the input file \texttt{/tmp/sudoers-orig},
\texttt{sed} will replace the first instance of \texttt{nicku} with
\texttt{nickl} and send the result to the output.

Let's pull that expression \texttt{/nicku/s//nickl/} apart to see how
it works:

It begins with an \emph{address}, which is a simple regular expression
without any metacharacters:
\begin{verbatim}
/nicku/
\end{verbatim}
This \texttt{sed} address matches all lines in the input file that
contain the string \texttt{nicku}.  \texttt{sed} will apply the
substitute operation only to those lines.

The next part is a \emph{substitution expression}:
\begin{verbatim}
s//nickl/
\end{verbatim}
The syntax of a substitution expression is:
\begin{alltt}
s/\emph{pattern to replace}/\emph{replacement}/
\end{alltt}

Here, the \emph{pattern to replace} is empty: that means that we use
the value from the address pattern, so we are replacing the string
\texttt{nicku} with the \emph{replacement}, \texttt{nickl}.

So if the input file \texttt{/tmp/sudoers-orig} contains this line:
\begin{verbatim}
nicku   ALL=(ALL) ALL
\end{verbatim}
then the output file will contain:
\begin{verbatim}
nickl   ALL=(ALL) ALL
\end{verbatim}
instead.

Here is another example:
\begin{verbatim}
sed '/^\/misc/s//#&/' /tmp/auto.master-orig > /tmp/auto.master
\end{verbatim}
What does this do?  It takes as input the file
\texttt{/tmp/auto.master-orig}, then finds a line starting with
\texttt{/misc}, and puts a comment character before it.  The edited
output file is \texttt{/tmp/auto.master}.

Again, let us examine this expression \verb!'/^\/misc/s//#&/'!
part-by-part:

It begins with an \emph{address}, which (in this case) is a regular
expression:
\begin{verbatim}
/^\/misc/
\end{verbatim}
This means: apply the command to lines that start with (the
\texttt{"\textasciicircum"} metacharacter) \texttt{/misc}.  We have to use a
backslash to quote the forward slash, because otherwise the forward
slash would mark the end of the regular expression, rather than a
literal forward slash.

The rest is a \emph{substitution expression}:
\begin{verbatim}
s//#&/
\end{verbatim}
Again the \emph{pattern to replace} is empty: that means that we use
the value from the address pattern, so we are replacing the string
\texttt{/misc} with the \emph{replacement}.

The hash symbol \texttt{"\#"} in the replacement expression is a
literal hash, i.e., the comment character that we are inserting.

The special metacharacter \texttt{"\&"} stands for the entire text
matched by the \emph{pattern to replace}.  So we are replacing a line in
the input file like this:
\begin{verbatim}
/misc   /etc/auto.misc  --timeout 60
\end{verbatim}
with this in the output:
\begin{verbatim}
#/misc   /etc/auto.misc  --timeout 60
\end{verbatim}
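
A convenient way to experiment, sketched here, is to pipe a single
line into \texttt{sed}, which reads standard input when no file is
named.  The sample lines are taken from the examples above.  The last
command adds the \texttt{g} flag, which is not otherwise covered in
these notes: it tells \texttt{sed} to replace every instance on the
line rather than only the first.
\begin{verbatim}
echo "nicku   ALL=(ALL) ALL" | sed '/nicku/s//nickl/'
echo "/misc   /etc/auto.misc  --timeout 60" | sed '/^\/misc/s//#&/'
echo "nicku nicku" | sed '/nicku/s//nickl/'    # nickl nicku
echo "nicku nicku" | sed '/nicku/s//nickl/g'   # nickl nickl
\end{verbatim}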

\subsection{Where can I find out more about sed?}
\label{where-can-i-find-out-more-about-sed}

The book \url{http://www.linuxdoc.org/LDP/abs/html/index.html} has an
appendix about \texttt{sed}.  The \texttt{sed} manual page is rather
limited, but there is an \acro{FAQ} at \url{http://www.ptug.org/sed/sedfaq.htm}.


\section{Finding examples of shell scripts on your computer}
\label{sec:examples}

Your Linux system has a large number of shell scripts that you can
refer to as examples.  I counted about 1400.  Here is one way of
listing their file names:
\begin{verbatim}
$ file /bin/* /usr/bin/* /usr/sbin/* /sbin/* /etc/rc.d/* /usr/X11R6/bin/* \
| grep -i "shell script" | awk -F: '{print $1}'
\end{verbatim}%$
Let's see how this works.  I suggest executing the commands separately
to see what they do:
\begin{verbatim}
$ file /bin/* /usr/bin/*
$ file /bin/* /usr/bin/* | grep -i "shell script"
$ file /bin/* /usr/bin/* | grep -i "shell script" | awk -F: '{print $1}'
\end{verbatim}
The \texttt{awk} program is actually a complete programming language.
It is mainly useful for selecting columns of data from text.

\texttt{awk} automatically loops through the input, and divides the
input lines into fields.  It calls these fields \texttt{\$1},
\texttt{\$2},\dots\texttt{\$NF}\@.  \texttt{\$0} contains the whole
line.  Here the option \texttt{-F:} sets the \emph{field separator} to
the colon character.  Normally it is any white space.  So printing
\texttt{\$1} here prints what comes before the colon, which is the
file name.
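
The effect of the field separator can be sketched on a single line.
The sample line below is made up, in the style of the output of
\texttt{file}:
\begin{verbatim}
line="/bin/gunzip: Bourne shell script text executable"
echo "$line" | awk -F: '{print $1}'   # prints /bin/gunzip
echo "$line" | awk '{print $1}'       # prints /bin/gunzip:
echo "$line" | awk '{print NF}'       # prints 6
\end{verbatim}
Without \texttt{-F:}, the fields are split at white space, so the
first field keeps its colon and the line has six fields.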

Suppose you want to find all shell scripts that contain a particular
command or statement.  For example, this looks for shell scripts that
use the \texttt{mktemp} command:
\begin{verbatim}
$ file /bin/* /usr/bin/* /usr/sbin/* /sbin/* /etc/rc.d/* /usr/X11R6/bin/* \
| grep -i 'shell script'| awk -F: '{print $1}' | xargs grep mktemp
\end{verbatim}%$

Here is a useful little shell script that does this:
\begin{verbatim}
#! /bin/sh

if [ $# -eq 0 ]
then
    cmd=`basename $0`
    echo $cmd: search all Bourne shell scripts for a command
    echo usage: $cmd [grep-options] command-to-grep-for
    echo the grep-option -l is useful
    exit 1
fi

(
IFS=:
for d in $PATH
do
    file $d/*
done
find /etc/rc.d -type f | xargs file
) \
| grep 'Bourne.* shell script' \
| awk -F: '{print $1}' \
| xargs grep "$@"
\end{verbatim}%$
We run the \texttt{for} loop in a subshell to make the change to
\texttt{IFS} local.  \texttt{IFS} is the ``internal field separator''.
The shell will automatically split lines into fields separated by the
\texttt{IFS}\@.

\subsection{Where can I find out more about awk?}
\label{Where can I find out more about awk?}

There is a whole book about \texttt{awk}; you can buy it from O'Reilly
for about \$300 HK, or you can read it online at
\url{http://www.ssc.com/ssc/eap/}.

%
% perl -e '@path=split /:/, $ENV{PATH};for $d ( @path ) {print "$d\n"}'
% find /etc/rc.d -type f | xargs file | grep -i 'shell script'\
% | awk -F: '{print $1}' | wc -l
%     117
% perl -e '@path=split /:/, $ENV{PATH};for $d ( @path ) {print "$d\n"}' \
% | while read d;do file $d/*|grep -i 'shell script' \
% | awk -F: '{print $1}'; done | wc -l
%    1275
%
% Better than using perl, use word splitting in shell:
% (IFS=:;for d in $PATH;do echo $d;done)
%$

\section{Debugging Shell Scripts}
\label{sec:debugging}

It is best to write shell scripts incrementally: write part, test that
it works, and continue until your script does what is required.

You can use \texttt{echo} statements to print the values of variables.

You can run the script with the \emph{verbose} option of the shell.  For a
script called \texttt{script}, you could run it in verbose mode like
this:
\begin{verbatim}
$ sh -v script
\end{verbatim}%$
You can see each command after it has been expanded by using the
\texttt{-x} option:
\begin{verbatim}
$ sh -x script
\end{verbatim}%$

\section{Common Mistakes}
\label{sec:common-mistakes}

I see many people making the same mistakes.  This is due partly to the
difference in the shell from other programming languages, and partly
due to missing lectures or being late in the laboratory! \verb!:-)!

\begin{description}
\item[Spaces are important!] The shell cares about spaces much more
  than other programming languages.  This is because it does so many
  different things; if you put
\begin{verbatim}
i;
\end{verbatim}
in a C program, it is just an expression that is evaluated, the result
is thrown away, and nothing happens.  The shell, on the other hand,
will look for a program by the name \texttt{i}, and execute it.

The shell breaks things up into separate tokens at white space.  Where
you put spaces does matter.
\begin{description}
\item[Don't put spaces in assignments:] An assignment is a single
  thing.  If you put spaces in it, the shell will try to execute a
  program by the name of the variable you are trying to assign to!
\begin{verbatim}
i =20
bash: i: command not found
\end{verbatim}
  
\item[\texttt{expr} needs spaces:] You need to put spaces between the
  operands of the external program \texttt{expr}.  Without them,
  \texttt{expr} sees a single string and prints it unchanged instead
  of doing the arithmetic:
\begin{verbatim}
i=0
i=`expr $i+1`
echo $i
0+1
\end{verbatim}

\item[Put spaces around the \texttt{[ ... ]}] See the notes in
  section~\vref{sec:test}.
\end{description}
\item[Use meaningful variable names:] I saw people get confused about
  variables such as \texttt{\$1}, \texttt{\$2}.  Assign them to
  meaningful names, and you won't get so confused.  Use what your
  other lecturers taught you about good programming practice!
\end{description}

%\clearpage
\section{Questions}
\label{sec:questions}

Make all these scripts executable programs on your \texttt{PATH}.
\begin{enumerate}
\item Write a simple shell script that takes any number of arguments on the
  command line, and prints the arguments with ``Hello '' in front.  For
  example, if the name of the script is \texttt{hello}, then you
  should be able to run it like this:
\begin{verbatim}
$ hello Nick Urbanik
Hello Nick Urbanik
$ hello Edmund
Hello Edmund
\end{verbatim}

\item Write a simple shell script that takes two numbers as parameters
  and uses a \texttt{while} loop to print all the numbers from the
  first to the second inclusive, each number separated only by a space
  from the previous number.  Example, if the script is called
  \texttt{jot}, then
\begin{verbatim}
$ jot 2 8
2 3 4 5 6 7 8
\end{verbatim}%$

\item Suppose that the script you wrote for the previous question is
  called \texttt{jot}.  Then run it calling \texttt{sh} yourself.
  Notice the difference:
\begin{verbatim}
sh jot 2 5
sh -v jot 2 5
sh -x jot 2 5
\end{verbatim}
Do you notice any difference in the output from the last two?
  
\item Write a shell script that, for each \texttt{.rpm} file in the
  current directory, prints the name of the package on a line by
  itself, then runs \texttt{rpm -K} on the package, then prints a
  blank line, using a \texttt{for} loop.
  
  Mount the server
  \texttt{ictlab\allowbreak.tyict\allowbreak.vtc\allowbreak.edu\allowbreak%
    .hk:\allowbreak/var\allowbreak/ftp\allowbreak/pub}
  on a convenient directory on your machine, such as
  \texttt{/mnt/ftp}.  Test your script on the files in
  \texttt{/mnt\allowbreak/ftp\allowbreak/rh-7.2-updated\allowbreak%
    /RedHat\allowbreak/RPMS}\@.
  
  \begin{explanation}
    The option \texttt{rpm -K} chec\textbf{k}s that the software
    package is not corrupted, and is signed by the author (if you have
    imported the author's public key in your \texttt{gpg} setup).
  \end{explanation}
  
\item Modify the script you wrote for the previous question to print
  the output of \texttt{rpm -K} \emph{only} for \emph{all} the files
  that fail the test.  In particular, if the package's \acro{GPG}
  signature fails, then your script should display the output of
  \texttt{rpm -K}\@.  There are at least two packages in this
  directory which do not have a valid \acro{GPG} signature; one of
  them is \texttt{redhat-release-7.2-1.noarch.rpm}; what is the other?

  Here is output from \texttt{rpm -K} for two packages, one with no
  \acro{GPG} signature, the other with:
\begin{verbatim}
$ rpm -K redhat-release-7.2-1.noarch.rpm bash-2.05-8.i386.rpm
redhat-release-7.2-1.noarch.rpm: md5 OK
bash-2.05-8.i386.rpm: md5 gpg OK
\end{verbatim}%$

  Test it in the same network directory as for
  the previous question.

\item Write a shell script to add a local group called \texttt{administrator}
  if it does not already exist.  Do not execute any external program
  if the \texttt{administrator} group already exists.

\item Download a copy of the bogus student registration data from
  \url{http://ictlab.tyict.vtc.edu.hk/snm/lab/regular-expressions/artificial-student-data.txt}.
  Use this for the following exercises, together with the
  \texttt{grep} program:
  \begin{enumerate}
  \item Search for all students with the name ``CHAN''

  \item Search for all students whose student number begins and ends
    with 9, and with any other digits in between.

  \item Search for all student records where the Hong Kong ID has a
    letter, not a number, in the parentheses.

  \item If you have time, you may do the same exercises, but display
    only the students' names, or student number.
  \end{enumerate}
  
\item Write a shell script to take a file name on its command line,
  and edit it with \texttt{sed} so that every instance of
  ``\texttt{/usr/local/bin}'' is changed to ``\texttt{/usr/bin}''.
  
\item Write a shell script to take a file name on its command line,
  and edit it using \texttt{sed} so that every line that begins with
  the string \texttt{server}:
\begin{alltt}
server \emph{other text}
\end{alltt}
is edited so that everything after ``\texttt{server~}'' (i.e., the
``\texttt{\emph{other text}}'') is replaced with the string
``\texttt{clock.tyict.vtc.edu.hk}'', so that the line above looks like this:
\begin{alltt}
server clock.tyict.vtc.edu.hk
\end{alltt}
Test this on a copy of the file \texttt{/etc/ntp.conf} that is on your
computer.  (Install the package \texttt{ntp} if it is not there).
\end{enumerate}
\end{document}