\documentclass{cmlab}
\RCS $Revision: 1.1 $
\usepackage{alltt,key,xr}
% \externaldocument[lt-]{../../linux_training-plus-config-files-ossi/build/masterfile}
\usepackage[nooneline,hang,bf]{caption2}
\renewcommand{\captionfont}{\raggedright}
\usepackage[pdfpagemode=None,pdfauthor={Nick Urbanik}]{hyperref}

\newcommand*{\labTitle}{Overview of Lectures}

\begin{document}

\subsection*{Overview}
\label{sec:overview}

\begin{description}
\item[Aim of subject:] This subject aims at a practical understanding of operating systems, supporting the LPI and RHCE certifications, rather than one oriented towards theory and operating system design.
\item[Free software] is a technical term defined by Richard Stallman, the founder of the \emph{Free Software Foundation}\@. Stallman wrote Emacs and the GNU C compiler suite. Free software provides the following four freedoms:
  \begin{itemize}
  \item The freedom to run the program, for any purpose (freedom 0).
  \item The freedom to study how the program works, and adapt it to your needs (freedom 1). Access to the source code is a precondition for this.
  \item The freedom to redistribute copies so you can help your neighbor (freedom 2).
  \item The freedom to improve the program, and release your improvements to the public, so that the whole community benefits (freedom 3). Access to the source code is a precondition for this.
  \end{itemize}
\item[Why use free software?] Free software benefits the users of the software: if the company behind it goes broke or decides not to support it any more, people can carry on using and developing the software. There is no vendor lock-in. If it doesn't work, there is no need to wait for someone else; you can fix it yourself. It is good for infrastructure (such as operating systems!). Free software supports open protocols and open standards. The Internet is built on free software; that is why it is so successful. There is no ``embrace and extend'' of standard technologies.
\end{description}

\subsection*{Types of Operating System}
\label{sec:types-of-OS}

\begin{description}
\item[What is the OS?] The operating system is essentially the kernel, which runs in a privileged execution mode supported by the hardware of the CPU\@. This mode is called \emph{supervisor mode}, or sometimes \emph{kernel mode}. Application programs run in \emph{user mode}, and CPU hardware prevents them from executing privileged instructions to access some of the hardware. The kernel:
  \begin{itemize}
  \item manages hardware resources and shares them between applications; this is like a government.
  \item provides a standard set of \emph{system calls} that allow programmers to interface to hardware at a higher level than by programming registers in hardware.
  \item These system calls are wrapped in library functions.
  \end{itemize}
\item[What resources?] The OS manages CPU, memory, files and disks, printing, network access, I/O devices,\ldots
\item[Multiuser and multitasking:] The OS must protect users from each other, and protect processes from each other. Processes have owners, and execute with the permissions of the owner.
\item[Types of OS] There are two main types of operating system (according to Andrew Tanenbaum, \emph{Modern Operating Systems}):
  \begin{description}
  \item[Monolithic OS] is an OS where there is one level of privilege, and where all kernel functions execute in the same address space. Tanenbaum described this organisation as ``the big mess'', which would be very hard to port to new architectures.
  \item[Microkernel or Layered Kernel] is an organisation where as much of the OS function as possible is done in ``user space'' rather than in the privileged level of the kernel. This should result in a much more portable OS with a very small kernel.
  \end{description}
\item[Andy was wrong!] Linux has a \emph{monolithic} organisation, whereas Windows NT/2000 has a microkernel organisation. Andrew Tanenbaum argued with Linus Torvalds about the design of Linux for a long time.
However, history has shown that Andy was wrong in some respects. The Linux kernel is much smaller and simpler than the Windows kernel, and is far easier to understand. The Linux kernel divides the hardware control into dynamically loadable kernel modules. Linux runs on a huge number of hardware platforms; Windows has reduced the number of platforms it can run on to one.
\item[Virtual Machine] is another organisation for operating systems; the most famous example is the IBM S/390 mainframe. Now it runs Linux on hundreds or thousands of virtual machines with virtual hardware. If one virtual machine crashes, the other virtual machines are unaffected: the crashed virtual machine can simply be rebooted. VMware provides a software equivalent of a virtual machine; the IBM S/390 has hardware support.
\end{description}

\subsection*{Processes}
\label{processes}

\begin{description}
\item[What is a process?] A process is a program in execution. It has its own address space.
\item[What is a thread?] A \emph{thread} is a lightweight process that shares its address space with other threads performing the same task.
\item[A process has an owner] The process executes with the permissions of its owner.
\item[Process is born with \texttt{fork()}] In Linux, a process is born by making a copy of its parent process with the simple \texttt{fork()} system call. So each process has a parent, and many processes have children. They form a family tree of processes.
\item[Scheduler] is an important component of an OS that determines which runnable process should run next.
The aims of the scheduler include:
  \begin{itemize}
  \item maximising CPU usage
  \item maximising process completion
  \item minimising process execution time
  \item minimising waiting time for runnable processes
  \item minimising response time
  \end{itemize}
\begin{figure}[htb]
  \centering%
  \includegraphics[width=0.5\linewidth]{process-states}
  \caption{The states that a process may be in.}
  \label{fig:states}
\end{figure}
\item[Process states] Processes move through three main states, as in figure~\vref{fig:states}. When the \CPU stops running one process and starts running another, this is called a \emph{context switch}.
\item[\texttt{vmstat}] gives information about process states in Linux.
\end{description}

\subsection*{Memory management}
\label{memory-management}

\begin{description}
\item[Virtual memory:] is a method by which the operating system manages memory automatically, so that the combined size of memory available to all processes running on the computer may be more than the physical memory available; the hard disk holds what is not in physical memory.
\item[swapping:] involves moving all the content of memory associated with a process from \RAM to hard disk, and back, as the operating system needs memory to run other processes. This is inefficient for large processes. It also results in holes: the \RAM allocated to two large processes may have a hole between them which is too small to use for any process.
\item[paging:] all memory is divided into chunks called \emph{pages}. All pages are of a fixed size. The pages in \RAM used by a process do not have to be contiguous (next to one another), so there is no problem of holes.
\item[Memory Management Unit:] is hardware that sits between the \CPU and the system buses, translating virtual addresses used by programs into physical addresses. The \MMU is connected as shown in figure~\vref{fig:mmu}. The \MMU and \CPU organisation fix the size of the pages, page tables and various other aspects of paging.
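As a rough illustration of the translation the \MMU performs, assume a page size of $2^{12}$ bytes (the 4K pages used in the figures in this section), and write $f(p)$ for the physical frame number that the page table holds for page $p$. The symbols $v$, $p$, $d$ and $f$ are introduced here only for illustration. A virtual address $v$ is then split and translated roughly as follows:
\[
  p = \left\lfloor \frac{v}{2^{12}} \right\rfloor, \qquad
  d = v \bmod 2^{12}, \qquad
  \mbox{physical address} = f(p) \times 2^{12} + d.
\]
Only the page number $p$ is translated; the offset $d$ within the page passes through unchanged.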
\begin{figure}[htb]
  \centering%
  \includegraphics[width=0.5\columnwidth]{mmu}
  \caption{The memory management unit is shown here converting virtual addresses from the \CPU to physical addresses.}
  \label{fig:mmu}
\end{figure}
\item[Virtual addresses:] are the addresses used by a user's program. The programmer can write the program as if the program has access to as much memory as required, without worrying about whether the addresses are used by other processes. All addresses in the program are \emph{virtual addresses}, and are translated by the \MMU to physical addresses.
\begin{figure}[htb]
  \centering%
  \includegraphics[width=0.4\columnwidth]{paging-labeled}
  \caption{A single-level paging system. Virtual memory addresses are 32-bit. Pages are 4K each.}
  \label{fig:paging}
\end{figure}
\begin{figure}[htb]
  \centering%
  \includegraphics[width=0.55\columnwidth]{multilevel-paging-labeled}
  \caption{A multi-level paging system. Virtual memory addresses are 32-bit. Pages are 4K each. The page tables are themselves pages, also of 4K each. Since each page table entry is 4 bytes in size on the Intel platform, there are $1024 = 2^{10}$ entries in each of the page tables. So ten bits are required for the page number part of the virtual address. The page directory itself is a 4K page and each entry is 4 bytes, so there are 1024 entries, one for each page table. So ten bits are required for the directory part of the virtual address.}
  \label{fig:multilevel-paging}
\end{figure}
\item[Why we need multilevel paging (single level paging won't do):] Many people seemed unable to understand why a single-level page table needs $2^{20} \times 4$ bytes of \RAM, and why multilevel paging does not. The key is that for virtual memory to work, there must be page table entries for the entire virtual address space.
If the virtual addresses are 32 bits long, and each page is 4 kilobytes ($2^{12}$ bytes) in size, then there \emph{must} be $2^{32} \div 2^{12} = 2^{32-12} = 2^{20}$ entries, one for each page. On the Intel platform, each entry is 4 bytes, so the total size of all page table entries is $4\times 2^{20} = 4$\,MB\@. The single-level paging scheme in figure~\vref{fig:paging} has the page table in one piece, so it must all be in \RAM. It is silly to keep all this in memory at once, since most page table entries are never used. For example, the virtual memory in use on today's computers is unlikely to be 4\,GB unless the computer is a very busy server. The solution is to have only the necessary page table entries in \RAM, and to keep any others, which were last used some time ago, on the hard disk. \emph{It is more sensible to split the page table into pages that can be paged in and out of memory.} This is what multilevel paging achieves.
\item[Explaining the ``Example of paging: Intel x86''] The example in the notes has confused many people, since many of you do not know the meaning of \texttt{0x20000000} (or what \texttt{0x}\emph{nnnn} means). The prefix \texttt{0x} indicates a hexadecimal value in the C programming language. You can have statements such as:
\begin{verbatim}
int i = 0x123;
\end{verbatim}
which assigns the value 123\hex to the variable \texttt{i}. All that the example shows is the components of the virtual address \texttt{0x20021406} (i.e., 20021406\hex). The offset within the page is the least significant 12 bits, i.e., 406\hex. The page number is the next 10 bits, i.e., 21\hex. Note that the only allowable page numbers are 0 to 3F\hex, since the process has been allocated pages in the range 20000000\hex--2003FFFF\hex. If any address used by this process were outside that range, the \OS would terminate the process with a segmentation fault.
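The same decomposition can be written arithmetically. The following is only a sketch, assuming the two-level layout described in this section (a 12-bit offset, a 10-bit page number and a 10-bit directory index); the symbols $v$, $d$, $p$ and $q$ are introduced here purely for illustration, with $v$ standing for the address 20021406\hex:
\[
  \begin{array}{lrcl}
    \mbox{offset:}    & d & = & v \bmod 2^{12} = \mbox{406\hex} \\[0.5ex]
    \mbox{page:}      & p & = & \left\lfloor v / 2^{12} \right\rfloor \bmod 2^{10}
                                = \mbox{20021\hex} \bmod \mbox{400\hex} = \mbox{21\hex} \\[0.5ex]
    \mbox{directory:} & q & = & \left\lfloor v / 2^{22} \right\rfloor = \mbox{80\hex}
  \end{array}
\]
In C, for an unsigned value, dividing by $2^{12}$ corresponds to the shift \verb|v >> 12|, and taking the remainder modulo $2^{10}$ corresponds to the mask \verb|& 0x3FF|.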
The most significant ten bits give the directory:
\begin{tabular}[t]{cccccccc}
  2 & 0 & 0 & 2 & 1 & 4 & 0 & 6\\
  0010 & 0000 & 0000 & 0010 & 0001 & 0100 & 0000 & 0110
\end{tabular}
If you count the ten most significant bits, you get 0010\,0000\,00\bin, i.e., 80\hex.
\end{description}

\end{document}