\chapter{Apache Basics}
\label{cha:apache}

{\mns

\subsection{Objectives}
On completion of this module you should be able to:
\begin{itemize}
\item Install the {\pgn Apache} web server
\item Perform basic configuration
\end{itemize}

\section{What is {\pgn Apache}?}
\begin{itemize}
% FIXME: Update the data from netcraft
\item {\pgn Apache} is the most widely used web server\footnote{61.88\% of all servers as of February 2001 (Netcraft, \url{http://www.netcraft.com/Survey/Reports/200102/platform.html})}
\item Listens for requests and hands something back
\item Normally the contents of a file
\begin{itemize}
\item Possibly the result of a program
\end{itemize}
\item Designed to be stable and configurable
\begin{itemize}
\item Fast at serving dynamic content
\item For maximum speed on static content, use a kernel-based HTTP server such as {\pgn tux} or {\pgn khttpd}
\end{itemize}
\end{itemize}

\section{Installation}
\begin{itemize}
\item Basic installation is easy
\item You may be able to install from your distribution
\begin{itemize}
\item Most come with {\pgn Apache}
\end{itemize}
\item Otherwise just follow the download instructions from the official site
\begin{itemize}
\item {\cmdn http://www.apache.org/}
\end{itemize}
\item Then follow the instructions in the {\fn INSTALL} file
\begin{itemize}
\item Normally just
\begin{verbatim}
$ ./configure
$ make
$ make install
\end{verbatim}%$
\item If you have problems check the docs
\item Available at {\cmdn http://www.apache.org/docs}
\end{itemize}
\end{itemize}

\section{How Apache Listens}
\begin{itemize}
\item Apache runs several processes at any one time
\begin{itemize}
\item Parent and several children
\end{itemize}
\item Parent {\em `watches over'} the children
\begin{itemize}
\item Tracks how many are answering requests
\item Spawns more if free processes drop below a certain point
\item Kills spare processes if there are lots free
\end{itemize}
\item Configure child numbers using {\kwd MinSpareServers} and {\kwd
MaxSpareServers} directives
\begin{itemize}
\item Default is reasonable for a small business
\item Tune it for busier sites
\end{itemize}
\end{itemize}

\section{Configuration File(s)}
\begin{itemize}
\item If compiled from source, {\pgn Apache} installs in {\fn /usr/local/apache}
\begin{itemize}
\item Earlier versions installed under {\fn /usr/local/etc/httpd}
\item Your distribution may differ again \ldots~\footnote{Red Hat installs config files under {\fn /etc/httpd} and the sample web pages and logs directories under {\fn /home/httpd}}
\end{itemize}
\item Configuration file is called {\fn httpd.conf}
\begin{itemize}
%FIXME: Check this footnote against the RH6.1 RPM (LW)
\item Older versions use
\begin{itemize}
\item {\fn httpd.conf}
\item {\fn srm.conf}
\item {\fn access.conf}
\end{itemize}
\end{itemize}
\item Controls what requests {\pgn Apache} answers
\begin{itemize}
\item and how \ldots
\end{itemize}
\end{itemize}

\section{Key Configuration Directives}
\begin{itemize}
\item Wide range of {\kwd configuration directives}
\item For a {\em very} basic server you need at least the following:
\begin{itemize}
\item {\kwd ServerRoot}
\item {\kwd DocumentRoot}
\item {\kwd ServerAdmin}
\item {\kwd BindAddress}
\item {\kwd Port}
\item {\kwd Listen}
\item {\kwd User}
\item {\kwd Group}
\end{itemize}
\end{itemize}

\section{{\kwd ServerRoot}, {\kwd DocumentRoot}}
\begin{itemize}
\item Tells {\pgn Apache} where its files live
\item {\kwd ServerRoot} tells {\pgn Apache} where its {\fn conf} and {\fn logs} directories live
\begin{itemize}
\item Not always necessary
\item Good practice to have it
\end{itemize}
\item {\kwd DocumentRoot} tells {\pgn Apache} where to look for documents to serve up
\item Requested filenames are appended to this
\item If you have
{\cmdn
\begin{verbatim}
DocumentRoot /var/www/html
\end{verbatim}}
then a request to
\begin{verbatim}
http://www.domain.co.uk/foo.html
\end{verbatim}
points to the file {\fn /var/www/html/foo.html}
\end{itemize}
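
\begin{itemize}
\item As a rough sketch of this mapping (both paths and the request paths are illustrative assumptions, not taken from any particular installation):
{\cmdn
\begin{verbatim}
# Hypothetical layout: adjust both paths for your system
ServerRoot   /usr/local/apache
DocumentRoot /var/www/html

# Request path        File looked up
# /foo.html           /var/www/html/foo.html
# /docs/index.html    /var/www/html/docs/index.html
\end{verbatim}}
\begin{itemize}
\item In a simple setup like this one, with no aliases or virtual hosts, only the path part of the URL is appended to {\kwd DocumentRoot}
\end{itemize}
\end{itemize}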
\section{Is {\pgn Apache} running?}
\label{sec:telnet-apache}
\begin{itemize}
\item Sometimes it is useful to check the server using the {\cmdn telnet} program:
{\myss%
\begin{verbatim}
$ telnet csalinux 80
Trying 192.168.128.53...
Connected to CSAlinux.tycm.vtc.edu.hk (192.168.128.53).
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 05 Mar 2001 01:51:59 GMT
Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.4pl1 mod_perl/1.24
Last-Modified: Wed, 28 Feb 2001 05:23:07 GMT
ETag: "28363-129e-3a9c8b3b"
Accept-Ranges: bytes
Content-Length: 4766
Connection: close
Content-Type: text/html

CSA Linux ...
Connection closed by foreign host.
\end{verbatim}}%$
\end{itemize}

\section{{\kwd ServerAdmin}}
\begin{itemize}
\item {\pgn Apache} sometimes can't complete requests
\item In these cases it serves up an error page
\item {\kwd ServerAdmin} is given as a contact address
\item Usually set to something like
\begin{verbatim}
webmaster@tycm.vtc.edu.hk
\end{verbatim}
\begin{itemize}
\item You should of course ensure that it is a {\em valid} email address
\end{itemize}
\item Possible to specify a different error page
\begin{itemize}
\item Doesn't have to use {\kwd ServerAdmin}
\end{itemize}
\end{itemize}

% Nick: There is no need to discuss anachronisms here, unless there is
% a practical benefit. Fold next two sections into one, describing
% Listen only.
\section{{\kwd BindAddress} and {\kwd Port}}
\begin{itemize}
\item Tells {\pgn Apache} which requests to answer
\item By default {\pgn Apache} listens to every IP address on your machine
\begin{itemize}
\item But only to the port given by the {\kwd Port} directive
\end{itemize}
\item {\cmdn BindAddress 192.168.0.1} tells {\pgn Apache} to ignore anything that doesn't come in on {\cmdn 192.168.0.1}
\item {\cmdn Port 8080} ignores all but the specified port
\item You can use more than one {\kwd Port} directive, e.g.
\begin{verbatim}
Port 80
Port 8080
\end{verbatim}
\item If you don't specify a port then a default is used\footnote{This is usually 80, but if you are using a binary package then bear in mind whoever compiled your package may have chosen a different value}
\item You can only use one {\kwd BindAddress}!
\end{itemize}

\section{{\kwd Listen}}
\begin{itemize}
\item {\kwd Listen} is a replacement for {\kwd BindAddress} and {\kwd Port}
\item Given IP:port or just port, e.g.
\begin{verbatim}
Listen 192.168.0.1:8080
\end{verbatim}
will answer requests on the IP address {\cmdn 192.168.0.1} and port {\cmdn 8080} and no others
\item To answer requests to all valid IP addresses, but only a certain port (e.g. {\cmdn 80}) use:
\begin{verbatim}
Listen 80
\end{verbatim}
\item Can use more than one {\kwd Listen} directive
\item Should be used instead of {\kwd BindAddress} and {\kwd Port} in new servers
\end{itemize}

%% FIXME: needs a diagram showing some requests being accepted and others rejected - One for a later date (LW)
\section{{\kwd User} and {\kwd Group}}
\begin{itemize}
\item {\pgn Apache} should normally be started as {\em root}
\begin{itemize}
\item So it can change the user ID of the children
\item These should {\em not} run as root
\end{itemize}
\item {\kwd User} and {\kwd Group} directives say what user/group the children should run as
\begin{itemize}
\item Important security feature
\end{itemize}
\item Should be set to something that has no real power on your system
\begin{itemize}
\item Most people use user and group {\cmdn nobody}
\end{itemize}
\item Web documents should be readable by this user
\item Nothing should be writeable except log files
\end{itemize}

\section{Apache Processes}
\begin{itemize}
\item Looking at a process list\footnote{Some fields from the {\cmdn ps} output have been left out to aid clarity} you can see
\begin{itemize}
\item The parent
{\myts {\cmdn
\begin{verbatim}
root S Jul 4 0:37 /usr/local/apache/bin/httpd -d /www
\end{verbatim}}}
\item
The children
{\myts {\cmdn
\begin{verbatim}
nobody S 10:11 0:00 /usr/local/apache/bin/httpd -d /www
nobody S 10:20 0:00 /usr/local/apache/bin/httpd -d /www
nobody S 10:58 0:00 /usr/local/apache/bin/httpd -d /www
nobody S 10:58 0:00 /usr/local/apache/bin/httpd -d /www
nobody S 11:06 0:00 /usr/local/apache/bin/httpd -d /www
nobody S 11:13 0:00 /usr/local/apache/bin/httpd -d /www
\end{verbatim}}}
\end{itemize}
\item Spare processes don't use processor time
\begin{itemize}
\item They are {\em `sleeping'}
\end{itemize}
\item They {\em do} use memory, however
\begin{itemize}
\item Negligible for a default {\pgn Apache}
\item Watch memory use carefully as you add modules!
\item In particular, {\cmdn mod\_perl} adds a heavy memory requirement.
\end{itemize}
\end{itemize}

\section{Logging}
\begin{itemize}
\item {\pgn Apache} can log information about accesses
\item Use the {\kwd TransferLog} and {\kwd ErrorLog} directives
\item {\cmdn TransferLog logs/access\_log} \\ will log all requests in the file {\fn ServerRoot/logs/access\_log}
\item If the filename starts with a {\cmdn /} then it is treated as an absolute pathname, not appended to {\kwd ServerRoot}
\item {\kwd ErrorLog} is similar but controls where error messages go
\begin{itemize}
\item Useful for debugging CGI scripts and misconfigurations
\item Check here first if {\pgn Apache} won't start
\end{itemize}
\end{itemize}

\section{Customizable Logging}
\begin{itemize}
\item Customizable logs available with {\kwd CustomLog} \vspace{10pt}\\{\cmdn CustomLog filename format-string}
\item {\cmdn format-string} consists of {\em `\% directives'} and/or text
\item {\em \% directives} include: \vspace{14pt} \\
{\myss
\begin{tabular}{|p{109pt}|p{240pt}|}
\hline
{\cmdn \%b} & Bytes sent, excluding HTTP headers \\ \hline
{\cmdn\%f} & Filename \\ \hline
{\cmdn\%\{headername\}i} & The contents of the {\cmdn headername:} header in the request \\ \hline
{\cmdn\%P} & The process ID of the child that serviced the request \\ \hline
{\cmdn\%r} & First line
of request \\ \hline
{\cmdn\%t} & Time of the request, in Common Log Format \\ \hline
{\cmdn\%T} & The time taken to serve the request, in seconds \\ \hline
{\cmdn\%u} & Remote username (may be bogus if return status (\%s) is 401) \\ \hline
{\cmdn\%U} & The URL path requested \\ \hline
{\cmdn\%v} & The ServerName of the server answering the request \\ \hline
\end{tabular}
}
\end{itemize}

\section{{\kwd CustomLog} examples}
\begin{itemize}
\item To log the referer information in the file {\fn ServerRoot/logs/referer}
{\myts {\cmdn
\begin{verbatim}
CustomLog logs/referer "%r Referred by: %{Referer}i"
\end{verbatim}}}
\item {\em \% directives} can be conditional on reply status
{\myts {\cmdn
\begin{verbatim}
CustomLog logs/referer "%r Referred by: %200,304,302{Referer}i"
\end{verbatim}}}
\begin{itemize}
\item Logs the referring page only on status 200, 304 or 302
\end{itemize}
\item For full details consult the Apache documentation
\begin{itemize}
\item Gives list of all possible {\kwd \% directives}
\end{itemize}
\end{itemize}

\section{Example Configuration}
\begin{itemize}
\item A sample configuration file could look like this:
{\cmdn
\begin{verbatim}
ServerRoot /usr/local/apache
DocumentRoot /usr/local/apache/htdocs
ServerAdmin webmaster@domain.co.uk
Listen 192.168.0.131:80
User nobody
Group nobody
ErrorLog /usr/local/apache/logs/error_log
\end{verbatim}}
\item We recommend starting with the default {\fn httpd.conf} rather than from scratch
\begin{itemize}
\item Correctly configures many things for you
\end{itemize}
\item The default is well annotated
\begin{itemize}
\item Lines beginning with a {\cmdn \#} character are comments
\item Ignored by {\pgn Apache}
\end{itemize}
\item {\pgn Apache} can check the syntax of its configuration
\begin{itemize}
\item {\cmdn httpd -t}
\item {\cmdn apachectl configtest} \ldots if you installed {\cmdn apachectl} on your system.
\end{itemize}
\end{itemize}

\section{Basic Exercises}
%% FIXME: We need more exercises (LW)
{\normalsize
\begin{enumerate}
\item {\em Apache Installation}
\begin{enumerate}
\item Find out if Apache is installed on your machine \ldots if not, install it.
\item Check Apache is running on your system.
\begin{enumerate}
\item You should be able to point your web browser at {\cmdn http://127.0.0.1/} to check this
\item You might have to try {\cmdn http://127.0.0.1:8080/}
\end{enumerate}
\item If Apache is not running, start it with {\cmdn /etc/rc.d/init.d/httpd start}
% \begin{enumerate}
%FIXME: This doesn't exist in a vanilla install!!!! (LW)
%\item Run {\cmdn /usr/local/apache/bin/apachectl \verb|--|help} for information
% \end{enumerate}
\item If Apache still doesn't appear to be running, find its configuration and log files and try to fix the error.
\end{enumerate}
\item {\em Basic configuration}
\begin{enumerate}
\item Familiarise yourself with the {\fn httpd.conf} file.
\item How would you change the directory where the log files are kept?
\item How would you change the `root' for documents?
\item How would you enable symbolic links to be followed on the cgi-bin directory? \textbf{Warning:} this is a \emph{really bad} idea!
\item Make your site accessible only on port 8080.
\item Now make it accessible only on the IP address 127.0.0.1 and port 80.
\item Make the changes and check them.
\item Place the following line in your {\fn /etc/hosts} file:
\begin{verbatim}
IP_ADDRESS www.test.com www
\end{verbatim}
where {\cmdn IP\_ADDRESS} is the IP address of your machine. You should now be able to browse {\cmdn http://www.test.com/}
\end{enumerate}
\item {\em Logging}
\begin{enumerate}
\item Take a look at the access logs and familiarise yourself with the information they contain.
%% IMPROVEME: could do some "harder" ones, e.g.
%% analyse the logs with grep: Later date (LW)
\item Set up a custom log to give the time of the request, the request, referer, and number of bytes sent, as well as the time taken to serve the request.
\item Alter your custom log to show the time taken and bytes sent {\em only} if a 200 status response occurred.
\end{enumerate}
\end{enumerate}
}

\section{Solutions}
{\normalsize
\begin{enumerate}
\item {\em Apache Installation}
\begin{enumerate}
%% IMPROVEME: This is redhat specific, we should have examples for other distros(LW)
\item If Apache is not installed you should be able to install it from a Red Hat CD by mounting the CD and typing {\cmdn rpm -ivh /mnt/cdrom/RedHat/RPMS/apache\*.rpm}
\item There are several ways to check this. One is to {\pgn telnet} to port 80 of your machine and see if you get a response.
\begin{enumerate}
\item This should work for a default Red Hat install, though the port that Apache first listens on varies between packages, so you should try both 80 and 8080.
\end{enumerate}
\item You can start Apache in one of two ways (which may be the same on some machines!)
\begin{itemize}
\item {\cmdn /etc/rc.d/init.d/httpd start}
% Nick: the footnote comes out in a large font. Fix it!
\item {\cmdn somepath/apachectl start}~\footnote{You may have to dig a little to find where this script is}
\end{itemize}
\item If you can't work out why Apache isn't running, ask the tutor for assistance.
\end{enumerate}
\item {\em Basic configuration}
\begin{enumerate}
\item You should make sure that you understand everything in the {\fn httpd.conf} file, including those sections that are commented out.
\item Alter the {\cmdn CustomLog} and {\cmdn ErrorLog} directives to change where the log files are kept, e.g.
\begin{verbatim}
ErrorLog /var/log/myerrorlog
CustomLog /var/log/myaccesslog common
\end{verbatim}
\item The `root' for documents is specified by the {\cmdn DocumentRoot} directive, e.g.
\begin{verbatim}
DocumentRoot /path/to/my/web/documents
\end{verbatim}
% Nick: TODO: this question asks the student to do something that
% is dangerous, and it seems to me that the answer is wrong. Must
% examine this and fix it.
\item You can enable symbolic links by adding {\cmdn Options +ExecCGI +FollowSymLinks} to the {\cmdn \verb|<Directory>|} section for your {\fn cgi-bin}, e.g.
\begin{verbatim}
Options +ExecCGI +FollowSymLinks
\end{verbatim}
\item Add/Change the {\cmdn Port} directive in your {\fn httpd.conf} file to read {\cmdn Port 8080}
\item Add the following to your {\fn httpd.conf}: {\cmdn Listen 127.0.0.1:80}
\item Restart the server and try to access it on both port 80 and 8080. Check that it only works as you expect and fetches documents from the correct place.
\item Check that you can browse {\fn http://www.test.com}
\end{enumerate}
\pagebreak
\item {\em Logging}
\begin{enumerate}
\item Make sure you understand what each of the columns in the access logs is for. Try {\cmdn tail}ing the logs as you browse your web server.
\item The following should create a log file {\fn newlogformat} written in the desired format.
\begin{verbatim}
LogFormat "%t %U %{Referer}i %b %T" newlog
CustomLog logs/newlogformat newlog
\end{verbatim}
\item Change your LogFormat line to
\begin{verbatim}
LogFormat "%t %U %{Referer}i %200b %200T" newlog
CustomLog logs/newlogformat newlog
\end{verbatim}
\end{enumerate}
\end{enumerate}
}

} % end {\mns from chapter start

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "0_masterfile"
%%% End: