\documentclass{ictlab}
% Copyright (c) 2003 by Nick Urbanik .
% This material may be distributed only subject to the terms and
% conditions set forth in the Open Publication License, v1.0 or later
% (the latest version is presently available at
% http://www.opencontent.org/openpub/).
\RCS $Revision: 1.6 $
\usepackage{verbatim,alltt,moreverb,answer2,biganswerbox,xr}
\ifx\pdftexversion\undefined
\else
\usepackage[breaklinks,pdfpagemode=None,pdfauthor={Nick Urbanik}]{hyperref}
\fi
\externaldocument[pl-]{../../lectures/perl/perl}
\externaldocument[ps-]{../../lectures/perl/perl-slides}
\newcommand*{\labTitle}{Using Regular Expressions to Create User Accounts from Registration Data}

\begin{document}

For each exercise in which you write a program, keep the original
program and modify a copy of it for the next exercise.

\tableofcontents

\section{Background}

\subsection{Character Classes}
\label{sec:character-classes}
We have seen that a character class matches any one character from a
set of characters. For example, the character class \texttt{/[0-9]/}
(or \texttt{/\bs d/}) matches any one digit, and
\texttt{/[0-9][0-9]/} (or, if you like, \texttt{/\bs d\bs d/}) matches
any two digits, one after the other.

\subsection{Capturing the Match in Parentheses}
\label{sec:capturing}
This next topic is like gold mining: extracting useful information
from among less useful material. You can put parentheses in a regular
expression. If the match succeeds, the variable \texttt{\$1} is set to
the text matched by the part of the pattern inside the first pair of
parentheses, \texttt{\$2} is set to the text matched inside the second
pair, and so on.
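
As an illustrative sketch (not one of the exercises), a single pattern
can contain several pairs of parentheses. This fragment applies one such
pattern to the same register heading line used in the example below.
The variable name \texttt{\$heading} is only illustrative.
\begin{verbatim}
#! /usr/bin/perl -w
use strict;

# Heading line from the student register; $heading is an illustrative name.
my $heading = "STUDENT REGISTER 2001/02 2nd Term MODE : PTE";

# Three pairs of parentheses: the first sets $1, the second $2,
# and the third $3.
if ( $heading =~ m!REGISTER (\d{4})/(\d{2}) (\w+) Term! ) {
    print "Academic year $1/$2, term $3\n";
}
\end{verbatim}%$
Run as it stands, it should print \texttt{Academic year 2001/02, term 2nd}.
The pattern uses \texttt{m!\ldots!} rather than \texttt{/\ldots/} as its
delimiters, so the slash in \texttt{2001/02} does not need to be escaped.
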
For example, this code:
\begin{verbatim}
my $line = "STUDENT REGISTER 2001/02 2nd Term MODE : PTE";
if ( $line =~ /MODE : ([Pp][Tt][Ee])/ ) {
    print "The mode of study is $1\n";
}
\end{verbatim}%$
prints:
\begin{verbatim}
The mode of study is PTE
\end{verbatim}

\subsection{Creating User Accounts}
\label{sec:creating-user-accounts}
There are many ways of creating accounts on a Linux system. Your program
could select the next available user and group \acro{ID}s, edit the
\texttt{/etc/passwd}, \texttt{/etc/group} and \texttt{/etc/shadow} files
by hand, create the home directories, copy the login scripts and other
basic account files from \texttt{/etc/skel}, and change their ownership
to the new account. A simpler and more portable way is to use the
standard \texttt{useradd} program that you learned about last year. We
can call it from our own Perl programs using the built-in
\texttt{system} function.

Today we will write a program that creates a user account for each
student listed in the artificial student registration data.

\subsection{The \texttt{system} Builtin Function}
\label{sec:system}
Perl provides several ways of calling external programs. We will use
the \texttt{system} builtin function to call \texttt{/usr/sbin/useradd}
and create user accounts from our student registration data. Refer to
slides~\pageref{ps-sec:external-programs}--\pageref{ps-sld:backticks}
in the Perl lecture slides, and section~\vref{pl-sec:external-programs}
of my Perl summary, available at \sloppypar
\url{http://nicku.org/snm/lectures/perl/perl.pdf}.

\section{Procedure}
\label{sec:procedure}

\subsection{Installing Linux using Kickstart}
\label{sec:installing-using-kickstart}
\begin{enumerate}
\item Install Red Hat Linux version 9 using the \texttt{Kickstart} disk
  that you are given.
  To do this:
  \begin{enumerate}
  \item Ensure that your removable hard disk is properly installed and
    pushed firmly into its socket.
  \item Turn on your computer.
  \item Insert your Kickstart installation disk before the computer boots.
  \item At the \texttt{boot:} prompt, type:
\begin{alltt}
boot: \textbf{linux ks=floppy}
\end{alltt}
  \item When prompted, insert the network card driver disk.
  \item If any prompts say that the partition tables on
    \texttt{/dev/hdc} are inconsistent, ignore them. This
    inconsistency results from cloning the disks with Ghost\@.
  \item The installation then proceeds automatically using Kickstart\@.
    See the Kickstart chapter of the \emph{Red Hat Linux Customization
      Guide} for full details about Kickstart\@. The installation
    instructions are in a text file called \texttt{ks.cfg} on the
    floppy disk, and you can look at it if you like.
  \item While the installation takes place, work on the following
    written tutorial exercises.
  \end{enumerate}
\end{enumerate}

\subsection{Tutorial Exercises to Work on While Linux is Installing}
\label{sec:tutorial-exercises}

\begin{figure}[htb]
{\tiny
\begin{verbatim}
24-SEP-01 09:36 A COLLEGE IN HONG KONG GRADE REPORTING PAGE : 1
VTC GCP1309L-1 STUDENT REGISTER 2001/02 2nd Term MODE : PTE
A College in Hong Kong
A computing department
2241/2 Higher Certificate in Software Engineering
NO. NAME SEX STUD.NO. HKID HOME TEL. COMPANY COMPANY ADDRESS COMPANY TEL.
----- ----------------------------------- --- --------- ----------- ---------- ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------- ----
1 LI, Sze Wai M 914981001 L700339(4) 24118453
2 LAW, Kar Hang M 943430710 X984028(4) 28543261 Wing Fat Hong International Corp Ltd 23056370
3 YEUNG, Hoi Man M 915367894 K383949(9) 21943771 David Hot Blocking Press Ltd 25918341 7
4 NG, Sze Wing M 914925086 W216913(3) 23291992
5 LAW, Wing Yee F 907466937 C977399(7) 21234895 Lucky Industrial (Holdings) Ltd 24060909
6 YAU, KA KEI M 973517175 D896832(2) 22818446 Union Plastic Factory Ltd 255 KING'S RD 22641842
7 YIM, Man Wai M 981309634 G422563(7) 29105262 Jardine Matheson Ltd 24296024 2
8 WONG, Kam Lun M 946874929 H187711(5) 28376578 System Engineering Ltd 28943187 7
\end{verbatim}
}
\caption{The first few lines from the file of artificial student data.}
\label{fig:excerpt}
\end{figure}

Figure~\vref{fig:excerpt} shows a few lines from the student information file.

\begin{enumerate}
\item For the data shown in figure~\vref{fig:excerpt}, write a regular
  expression, with the part you want to capture enclosed in
  parentheses, that will select:
  \begin{enumerate}
  \item the student number
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
  \item the Hong Kong ID
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
  \item the student's name
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
%% \item The class of a particular student
%% \begin{biganswerbox}[7mm]%
%% \begin{solution}%
%% \mbox{}
%% \end{solution}
%% \end{biganswerbox}
  \item the course code
    \begin{explanation}
      The course and year appear together at the start of the line
      \texttt{2241/2 Higher Certificate in Software Engineering}. The
      course is 2241, and this is the second year of study.
    \end{explanation}
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
  \item the year of study
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
  \item the company the student works for
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
  \item the home telephone number
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
  \item the gender of the student
    \begin{biganswerbox}[7mm]%
      \begin{solution}%
        \mbox{}
      \end{solution}
    \end{biganswerbox}
  \end{enumerate}
\end{enumerate}

\subsection{Finishing the Setup of Your Linux Installation}
\label{sec:setup-of-linux}
\begin{enumerate}
\item Log in to your system using your student account. Your student
  number is your user name, and you should already know the password.
\item Set up \texttt{sudo} with
\begin{alltt}
$ \textbf{su -c /usr/sbin/visudo}
\end{alltt}%$
\item At the password prompt, enter the root password. This was set in
  the Kickstart file \texttt{ks.cfg} to the nine-character string
  ``\texttt{)3SnhGxv9}''.
  \begin{explanation}
    You may refer to the document I wrote about \texttt{sudo}, available
    here: \sloppypar
    \url{http://nicku.org/ossi/lab/sudo/sudo.pdf}%
    %{http://nicku.org/ossi/lab/sudo/sudo.pdf}
  \end{explanation}
\item You may wish to change the screen resolution using the program
  \texttt{redhat-config-xfree86}.
\end{enumerate}

\subsection{Exercises for You After Installation has Finished}
\label{sec:practical-after-installation}
\begin{enumerate}
\item Download the artificial student data from \sloppypar
  \url{http://nicku.org/snm/lab/regular-expressions/artificial-student-data.txt}%
  %{http://nicku.org/snm/lab/regular-expressions/artificial-student-data.txt}%
  . There is also a link to this file from the subject web site. This
  file is in the old format of the student registration system, but
  contains no real data about any student.
  During this class, we will work toward generating system accounts from
  this file.
\item Write a Perl program that reads all the lines of the file named
  as its command-line parameter and displays them on standard output.
  For example, if your program is called \texttt{printit}, then the
  following command will display the content of the big file of
  student data:
\begin{alltt}
$ \textbf{printit artificial-student-data.txt}
\end{alltt}%$
  This will be a very short program. One to three lines of code should
  do it.
\item Make a copy of your program, and modify it so that it prints only
  the lines that contain a number with eight or more digits.
\item Modify the last program so that it prints all lines that contain
  a Hong Kong ID\@.
\item Modify this last program further so that, for each such line, it
  prints only the Hong Kong ID, one ID per line. There should be no
  other output from your program.
\item Write a Perl program that reads all the lines of the file named
  as its command-line parameter and prints only the student numbers,
  one per line.
\item Make a copy of your program, and modify it so that it prints only
  the names of the students, one per line, with no extra spaces at
  either the beginning or the end of each name.
\item Using the manual page for the \texttt{useradd} program as a guide,
  modify your previous programs so that your program \emph{prints} one
  \texttt{useradd} command for each student. Use the student \acro{ID}
  to form the login \acro{ID}, the Hong Kong \acro{ID} as the password,
  and the name of the student as a comment. Note that our Linux systems
  require user and group names to start with a letter, and these names
  may not contain colons, commas, newlines or any non-printable
  characters. The maximum length of a user name is 32 characters, and
  the maximum length of a group name is 16.
  To meet the requirement that user names start with a letter, make
  each user \acro{ID} the first letter of the student's family name,
  converted to lower case, followed by the student \acro{ID}.
\item Modify this last program further so that it uses the built-in
  Perl function \texttt{system} to execute each \texttt{useradd}
  command as well as print it. Type \texttt{perldoc -f system} to read
  about this important function. You will probably need the function
  \texttt{hash\_md5\_password} described in the appendix on
  page~\pageref{sec:hash_md5_passwd}.
  \begin{explanation}
    I suggest that you first copy one or two student record lines from
    the student information file into a small test file. Run your
    program on that file so that it creates accounts for just those
    students. Only after you have tested your program on this subset of
    the data, and shown that it works, should you run it on the whole
    data file.
  \end{explanation}
\item Modify your last program so that it reports an error message if
  any \texttt{useradd} command fails.
\item Modify your program further so that it creates a group (see
  \texttt{man groupadd} and \texttt{man gpasswd}) for each year and for
  each course found in the data file. It should also make each student
  a member of the corresponding groups as secondary groups. For a
  student in year 1, create a group \texttt{year1} if it does not
  already exist, and make the student a member of that group. For a
  student in the course \texttt{2241}, create a group named
  \texttt{ict2241} if it does not already exist, and add the student to
  that group.
\end{enumerate}

\appendix
\renewcommand{\appendixname}{Appendix}
\section{Appendix: \texttt{hash\_md5\_password}, Provided for Reference Only}
\label{sec:hash_md5_passwd}

\begin{figure}[htb]
\begin{listing}{1}
#! /usr/bin/perl -w

# Example program to generate MD5 hashed passwords suitable for use in
# /etc/shadow on a Linux system.
# You could pass the output of this function to useradd -p xxxxx,
# where xxxxx is the output of hash_md5_password().
# Based on /usr/share/doc/samba-2.2.1a/examples/LDAP/ldapsync.pl,
# distributed with samba.
# A portable alternative is the module Crypt::PasswdMD5, available
# through the cpan program.

use strict;

sub hash_md5_password($)
{
    my $clear_text_password = shift;
    my $salt = join '', ('.', '/', 0..9, 'A'..'Z', 'a'..'z')
        [rand 64, rand 64, rand 64, rand 64,
         rand 64, rand 64, rand 64, rand 64];
    $salt = '$1$'.$salt.'$';
    my $hashed_password = crypt( $clear_text_password, $salt );
    return $hashed_password;
}

# This code is a stub just to test the function:
our $clear_password = shift;
our $hashed_password = hash_md5_password( $clear_password );
print "MD5 Hash of '$clear_password' is '$hashed_password'\n";
\end{listing}%$
\caption{A function to generate \acro{MD5} Hashes of passwords, and a stub program to test it.}
\label{fig:hash_md5_passwd}
\end{figure}

Figure~\vref{fig:hash_md5_passwd} shows a function that I have adapted,
plus a stub program to test it. You can download a copy of it from
\sloppypar
\url{http://nicku.org/snm/lab/regular-expressions-make-accounts/hash_md5_password.txt}.
There is also a link to it on the subject web page.

You do not need to study this function in detail, but I explain it here
for completeness. You will probably want to copy and paste lines 14--23
into your program that calls \texttt{useradd}.

\texttt{useradd} provides the option \texttt{-p}~\emph{hashed\_password}
to allow creation of accounts with passwords, but you need to
\emph{hash} the password yourself. This function does that. Try running
the program with some text as its command-line argument. The output is
a hash of that text, suitable for use in the \texttt{/etc/shadow} file
that holds hashed passwords.

As you can see from the call to the function in the stub program on
line 27, the function \texttt{hash\_md5\_password} takes one scalar
parameter. This is put into the variable
\texttt{\$clear\_text\_password} on line 16.

Lines 17--19 need some explanation.
To attack a set of hashed passwords, one technique that works well
against Windows \NT passwords is to build a dictionary of hashed words.
With such a dictionary, an attacker could find all the trivial passwords
on a system in a short time. Linux avoids this danger by combining the
plain text with a 48-bit random value, called a \emph{salt}, before
hashing it. The salt has eight characters, each chosen from 64
possibilities, so there are $64^8 = 2^{48}$ possible salts. Building an
already hashed dictionary would then require hashing each word
$2^{48}={}$281,474,976,710,656 times, once for each possible salt. This
makes such an attack impractical. Note that a salt does not increase the
time an attacker needs to search for a single user's password.

So lines 17--19 build a list of the 64 characters that \texttt{crypt}
uses in its salts. This is a base-64 alphabet, but it is not quite the
one used for \acro{MIME} base 64 encoding, which uses `\texttt{+}' where
\texttt{crypt} uses `\texttt{.}'. The code then takes a slice of eight of
these characters, using the builtin \texttt{rand} function to choose
each index, and joins them into a string. Line 20 encloses the salt
between a literal `\texttt{\$1\$}' and a dollar sign. This prefix is how
the \texttt{crypt} standard library function determines that the
password should be hashed with \acro{MD5} rather than with the weaker
\acro{DES} hash. Line 21 then passes the plain text and the salt to the
\texttt{crypt} function.
\end{document}