%% $Header: /cvsroot/lcdp/lpic/general-linux-2/slides/gl2.111.5.slides.tex,v 1.1 2003/08/27 05:27:45 geoffr Exp $
\input{gl2.slide-header.tex}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----10->|-----20->|-----30->|-----40->|-----50->|-----60->|-----70->|-----80->
\begin{slide} %================================================================
\begin{center}
\Huge \textsf{-- General Linux 1 -- } \\[2mm]
\large \textsf{(Linux Professional Institute Certification)}\\[5mm]
\large \textsf{ 111-5 Maintain an effective data backup strategy [3]} \\
\normalsize\end{center}
\footnote{Copyright \copyright\ 2002 Geoffrey Robertson. Permission is
granted to make and distribute verbatim copies or modified versions of this
document provided that this copyright notice and this permission notice are
preserved on all copies under the terms of the GNU General Public License as
published by the Free Software Foundation---either version 2 of the License
or (at your option) any later version.}
\scriptsize
\begin{verbatim}
    .~.     by: Grant Parnell (slides 8 to 38)
    /V\     and Andrew Eager (slides 39 to ??)
   // \\          geoffrey robertson
  @._.@        geoffrey@zip.com.au
\end{verbatim}
\tiny
\begin{verbatim}
$Id: gl2.111.5.slides.tex,v 1.1 2003/08/27 05:27:45 geoffr Exp $
\end{verbatim}
\normalsize
\vfill
\end{slide} %-----------------------------------------------------------
%----10->|-----20->|-----30->|-----40->|-----50->|-----60->|-----70->|-----80->
%==============================================================================
\begin{slide}
\listofslides
\vfill
\end{slide} %------------------------------------------------------------------------------
%==============================================================================
%==============================================================================
\begin{slide}
\slideheading{Maintain an effective data backup strategy}
\slidesubheading{Objective}
Candidate should be able to plan a backup strategy and backup filesystems
automatically to various media. Tasks include dumping a raw device to a file
or vice versa, performing partial and manual backups, verifying the integrity
of backup files and partially or fully restoring backups.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}
\slidesubheading{Key files, terms and utilities}
\begin{minipage}[t]{40mm}
\begin{itemize}
\item \texttt{cpio}
\item \texttt{dd}
\item \texttt{dump}
\item \texttt{restore}
\item \texttt{tar}
\end{itemize}
\end{minipage}
%
\begin{minipage}[t]{40mm}
\begin{itemize}
\item
\end{itemize}
\end{minipage}
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slideheading{Backups}
Decide what data is important and how long you can do without it.
\begin{itemize}
%\begin{overlay}{1}
\item Is this used 24 x 7 or just business hours?
%\end{overlay}
%\begin{overlay}{2}
\item During business hours how long can you do without it? 4 hours, 30
minutes, 5 minutes?
%\end{overlay}
%\begin{overlay}{3}
\item How up to date must it be to get you running in an emergency?
%\end{overlay}
%\begin{overlay}{4}
\item Are you backing up for archival or high availability or espionage?
%\end{overlay}
\end{itemize}
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slideheading{Examples of Data}
\slidesubheading{Static}
Configurations of running servers. You need these 24x7 but they don't change
much.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Databases / Transactions - financial \& otherwise}
These are updated frequently and need to balance. Associated with these are
logs, duplication and other means of rollback and integrity checking.

With databases it's often a good idea to dump them in a portable format,
especially if the inbuilt format is not cross-platform or cross-version
compatible. Example:
\begin{alltt}
\cmd{mysqldump mydata >mydata.dump}
\end{alltt}
This will give you a text file which can be used on most MySQL versions and
possibly adapted to other database packages.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Logs}
People tend not to read them until something goes wrong, at which point
they're valuable. These need to be kept but don't need to be restored in a
hurry.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Home directories}
This is a mixed bag of everything, but some policies could be instituted to
make the admin's life easier, e.g.\ making specific sub-directories for things
and assigning them different backup/restore priorities. Often the existence
of a home directory is more important than its contents, as a user may be
unable to log in without it.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Code repositories}
Programmers should be accustomed to doing regular backups anyway; they often
need to revert to an old version to figure out what they broke. A central
repository used by tools such as CVS should be backed up almost as often as
programmers commit code, and at least once a day, though they could probably
cope with it being unavailable for half a day.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{High availability - read only}
Websites frequently used by your clients. They can contain dynamic data but
customers don't update it. This sort of scenario lends itself to frequent
replication to a backup server.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{High availability - interactive}
Taking a website again, this one might allow the customer to do such things
as place orders. The website maintains some state information to allow
building of an order.
This is the most difficult case. The state information can be stored in a
replicated database. If one web server fails, the other takes over; the
customer may have to log in again but the information is kept. (Otherwise
complex designs and expensive hardware can be used to migrate the state
seamlessly to the other web server.)
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slideheading{Important Linux directories}
\footnotesize
\begin{verbatim}
/var/spool/mail - daily backup
/var/lib/mysql  - databases - back up the dumps,
                  and possibly the binary.
/var/log ?      - from "don't care" to "back up daily"
/etc            - back up config changes
/home           - be selective, but if you can't,
                  back up daily.
/home//mail     - contains the user's mail folders
                  (may also be 'Mail' or 'Maildir')
/home//.ssh     - If you log in using ssh keys only,
                  this is a must have.
/usr/local      - locally installed apps & data
Application specifics
\end{verbatim}
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{}
\Slidecontents
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slideheading{Backup \& Restore methods}
\slidesubheading{Copy the files to another directory}
This is the poor man's backup and does not offer much peace of mind. It does
protect against accidental deletion \& corruption by users. One advantage is
that it can be very quick for things such as log files. You can also keep
multiple copies, one for every day of the week for example. See
\texttt{/etc/logrotate.conf}.
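
A minimal sketch of the one-copy-per-weekday idea, assuming GNU \texttt{cp};
the function name \texttt{backup\_daily} and the example paths are
hypothetical, not part of any standard tool:
{\footnotesize
\begin{verbatim}
# backup_daily SRC DEST: copy SRC into DEST/<weekday>,
# replacing the copy made on the same day last week.
backup_daily() {
    src=$1
    dest=$2
    day=$(date +%a)           # Mon, Tue, ... (locale dependent)
    rm -rf "${dest:?}/$day"   # refuse to run if DEST is empty
    mkdir -p "$dest/$day"
    cp -a "$src/." "$dest/$day/"  # -a keeps owner, mode, times
}
# Hypothetical use, e.g. from a daily cron job:
#   backup_daily /var/log /var/backups/daily
\end{verbatim}
}
Run once a day, this keeps at most seven copies, since each weekday's
directory is overwritten a week later.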
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Backup to a standby partition}
This has about the same level of peace of mind as the above. The backup
partition can be left unmounted after the backup. The backup is slower than
the above but the restore operation can be quick. See also the ``Broken
Mirror'' method below.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Backup to tape}
This is probably the most common backup used in the commercial world. It's
easy to back up the lot every day provided you have the tape capacity. If you
don't, you become more selective as to what to back up.

A variety of software does this, but there are three basic systems: tar, cpio
and dump. Commercial software often uses these basic systems and provides
labelling \& indexing as well as multi-server capability from a simple GUI.
The reason for using the basic systems is that you can still restore from
them if you have to.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Backup to standby disk}
This can offer peace of mind and a fairly cheap backup for people who don't
require 24x7 service. Basically a removable drive bay houses another hard
disk of similar capacity and the entire system is backed up. This can be done
partition by partition or file by file using dd, cpio or rsync. Additional
steps can be taken to ensure that the backup is also bootable. The backup
drive should be removed once done and treated like a tape.
The disadvantage here is that you will most likely need to power down the
system twice for one backup. Alternatively, if you have an external USB or
FireWire storage medium it becomes possible to do this without downtime.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Backup to CDROM/DVD}
Under Linux (as far as I know) there's no software to write data directly
without creating an image first. This means there must be sufficient disk
space available to hold the image. It would be possible to create a bootable
CD with restore software and a compressed filesystem but I haven't seen this.
It may be OK if you don't have a large filesystem or you have a DVD writer or
you're not backing up everything.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{RAID System}
Not strictly a backup, but a RAID system can protect against hard drive
failure by providing redundancy. Data is written simultaneously to 2 or more
hard drives and can include parity information. It does not protect against
corrupt databases and people removing files: it will corrupt \& remove files
equally well on all disks.

Linux can do RAID in software very well but the ideal is a hardware solution
involving hot-swappable disks, so they can be replaced while the system is
fully running. A RAID system can mean the difference between going on-site at
3am and saying ``Oh dear, we'll replace that first thing in the morning''.
Just ensure that you do have a replacement readily available and do not have
to wait a week.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{RAID Tape array}
In a similar manner to RAID 5 disks, data is written in parallel to 5 tape
drives, which increases throughput and data integrity.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Backup Server}
All of the methods discussed so far involve direct transfer from server to
backup medium. If you have a number of servers it may not be practical to
install backup devices on each. Another way is to access the required medium
remotely (\texttt{/dev/rmt0}), but arbitration of access can be an issue.

An increasingly popular way is to provide a super-server with a huge amount
of disk space capable of holding everything required by the other servers.
Transferring the data can happen at any time in either a batch or a
continuous process. A batch transfer might back up a whole directory at once,
whereas a continuous operation might transmit log information or database
updates.

The backup server itself may then employ any one or more methods to perform
backups of itself, possibly based on some statistical analysis. An example of
this is a system called ADSM, which employs RAID arrays, multiple tape drives,
a tape robot with barcode reader and intelligent software that tells the
operators which tapes are to go off-site and which ones it wants back. It is
essentially a huge cache that stores frequently changing data locally and
stores old data off-site.
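
A minimal sketch of a batch transfer to such a server, assuming \texttt{tar}
and \texttt{ssh} are available; \texttt{send\_tree}, \texttt{backuphost} and
the paths are hypothetical names used only for illustration:
{\footnotesize
\begin{verbatim}
# send_tree SRC CMD...: stream the directory SRC as a tar
# archive into the standard input of CMD.
send_tree() {
    src=$1
    shift
    tar -cf - -C "$src" . | "$@"
}
# Hypothetical use (backuphost and /backup are made-up):
#   send_tree /home ssh backuphost 'cat > /backup/home.tar'
\end{verbatim}
}
Because the archive travels over a pipe, no temporary copy is needed on the
machine being backed up.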
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slidesubheading{Broken Mirror}
If you've got about 100\,GB of data on a mirrored pair of disks and only have
a 10-minute backup window this may be for you. Basically you bring the system
down, unhook one of the mirrors, replace it with another set of drives and
bring the system up again. Mirroring starts from scratch during quiet time
and should be finished before load picks up again. The drive set you just
unhooked can then be loaded into the standby server and backed up to tape
over the course of many hours.

Some high-end servers can perform this operation without downtime, as the
hooking up can be done using inbuilt hardware or such things as dual-port
FireWire drive bays. All that is required in this case is an application
shutdown, sync, dismount, remount, application start type operation.
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{}
\Slidecontents
\vfill
\end{slide} %-----------------------------------------------------------
%==============================================================================
\begin{slide}{} %Grant Parnell
\slideheading{Backup Software}
\slidesubheading{Command line tools}
\begin{description}
\item[\opt{dd}] \textsf{C}opy and \textsf{C}onvert can be used to copy raw
disk blocks, even to tape (yuk). Example:
\begin{alltt}
dd if=/dev/hda1 of=/dev/hdb1
\end{alltt}
\vfill
\newpage
\item[\opt{tar }] Tape ARchive - you all know how to unpack tgz files, and
maybe even create them. To write to tape instead of a file, just remove the
'f' option; tar then uses its default archive device (the \texttt{TAPE}
environment variable, or a built-in default). It can also be an advantage not
to use compression, as some drives have this built in.
Also, with compression a corrupt portion of the tape can ruin the rest of the
data, whereas without it you can skip the corrupt part and pick up at the
next file. Example:
\begin{alltt}
tar -c /home
cd /tmp; tar -x
\end{alltt}
\vfill
\newpage
\item[\opt{cpio}] Copy In/Out - similar capabilities to tar but a different
methodology. Example:
\begin{alltt}
\$ find /home | cpio -oB >/dev/tape
\$ cd /tmp; cpio -idB