Build An Operating System From Scratch: A Project For An Introductory Operating Systems Course
Michael Black
American University 4400 Massachusetts Ave, NW Washington, DC 20016 (202) 885-2011
ABSTRACT
This paper describes a semester project in which students design an operating system from the ground up, capable of booting from a floppy disk on an actual machine. Unlike previous projects of this kind, this project was designed for students with only one semester of programming experience and no prior exposure to data structures, assembly language, or computer organization. Students nevertheless wrote a full system consisting of system calls, program execution, a file system, a command-line shell, and support for multiprocessing. The project was assigned to a class and successfully completed by nearly every student.
General Terms
Design
Keywords
Operating systems, education
1. INTRODUCTION
Can a second-year computer science student write an operating system from scratch? Furthermore, could a student complete this system in one semester, with no assembly language or computer organization experience, and still boot the system on a personal computer? This paper describes one such project developed for our operating systems course. The project is designed to give undergraduate students concrete, tangible experience in writing an operating system for an actual machine.
Various projects have been designed for operating systems courses in recent years. Possibly the most ambitious are the so-called "bare-metal" operating systems [5], in which students write a system that runs directly on the computer, with no simulator or other software underneath. Well-known examples of such systems include Minix and GeekOS, which are usually intended for upper-level or graduate courses. These systems, typically approximating Unix, tend to be intricate and complex, and include substantial amounts of prewritten code (14886 lines for Minix and 4202 lines for GeekOS) [5,8]. This complexity makes them difficult to assign to students at smaller liberal arts colleges that offer only one operating systems course. My objective, described in this paper, is to construct a simple bare-metal teaching operating system suitable for a small computer science program. The operating system described in this paper is under 1/4 the size of GeekOS, making it one of the smallest and simplest teaching operating systems yet developed.
There are several key advantages to such a project over a higher-level project that isolates students from the machine. First, students gain a deeper understanding of the computer itself, experiencing first-hand how concepts such as segmentation, interrupt vectors, and memory management actually manifest themselves in their own computers. Second, when forced to program without the familiar POSIX routines and library functions, students realize how much support operating systems give them. Third, there is a thrill in booting a machine on a system that is entirely one's own, which makes often-abstract operating systems concepts much more tangible.
Despite these advantages, of the many operating systems projects that have been designed, relatively few are truly bare-metal. More common are systems that run on simulated computers, such as Nachos or OS/161 [2,3], or projects that simulate operating system components in Java or other high-level languages [6]. Bare-metal systems are difficult to debug, need special tools to develop, and require background knowledge about the machine. Some assembly language is required. And many advanced concepts typically taught in operating systems courses, such as page replacement or thread management, cannot easily be built into the system.
It is a daunting task for undergraduates to reproduce Unix in an introductory course. However, the first operating systems for personal computers tended to be very simple and lacked most of the advanced concepts taught in typical operating systems courses. CP/M, for example, provided little more than a rudimentary file system, a small set of system calls, and a command-line shell [7]. Because modern PCs are still capable of emulating the IBM 5150, a bootable operating system need not be as complex as Windows Vista. My project has students constructing an operating system with roughly the same complexity as CP/M [13].
The project is nevertheless ambitious. Our operating systems class is intended for sophomore-level students with only one semester of prior programming experience. Students know Java, but are not expected to know C, assembly language, or any computer organization coming into the course. This project is consequently designed to be as simple and basic as possible. All programming is done in C, a minimum of prewritten assembly code is given, and students are given a substantial amount of step-by-step instructions. The entire finished project consists of approximately 1250 lines of code, of which 320 lines are simple assembly routines provided to the students and approximately 930 lines are C functions students are expected to write. It contains the following components:
- A bootloader to boot from a 3 1/2" floppy disk
- A set of system calls made using an interrupt
- A file system
- The ability to execute a program from a file
- A command-line shell with a minimum of necessary commands (directory, type, copy, delete, execute)
- Multiprocessing and basic memory management
shell provides a command-line interface. System calls are made from the kernel and shell by calling interrupt 21 (in homage to MS-DOS). Table 2 shows the list of system calls the students are required to write, in the order that they are assigned. Table 3 shows the shell commands.
Table 1. Assembly functions provided to students
Name                   Function                      Proj.
bootloader             loads and executes kernel     A
putInMemory            puts a byte in memory         A
interrupt              makes an interrupt call       B
makeInterrupt21        sets up int 21 vector         B
handleInterrupt21      calls C function on int 21    B
launchProgram          calls program at 0x20000      C
makeTimerInterrupt     sets up timer interrupt (8)   E
handleTimerInterrupt   calls C function on timer     E
returnFromTimer        returns to active process     E
initializeProgram      sets up process stack frame   E
setKernelDataSegment   sets data segment to kernel   E
restoreDataSegment     sets data segment to user     E
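The interrupt 21 convention can be illustrated with a minimal hosted sketch of the C side of the dispatcher, using the AX numbering of Table 2. It is not the course's code: the three service bodies are stubs, the registers are modeled as uintptr_t so that BX can carry a pointer on an ordinary 64-bit host (in the 16-bit kernel they are plain ints holding near pointers), and the int return value, the lastService variable, and the choice of BX for the buffer and CX for the sector number are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Records which stub ran last (hypothetical; lets the dispatch be observed). */
int lastService = -1;

/* Stub stand-ins for the kernel services numbered 0-2 in Table 2. */
void printString(char *s)              { lastService = 0; fputs(s, stdout); }
void readString(char *buf)             { lastService = 1; buf[0] = '\0'; }
void readSector(char *buf, int sector) { lastService = 2; (void)buf; (void)sector; }

/* C side of interrupt 21: dispatch on the function number in AX.
   Returns 0 for a known service, -1 otherwise (assumed convention). */
int handleInterrupt21(uintptr_t ax, uintptr_t bx, uintptr_t cx, uintptr_t dx)
{
    (void)dx;                                /* not used by services 0-2 */
    switch (ax) {
    case 0: printString((char *)bx);         return 0;  /* BX = string    */
    case 1: readString((char *)bx);          return 0;  /* BX = buffer    */
    case 2: readSector((char *)bx, (int)cx); return 0;  /* CX = sector    */
    default:                                 return -1; /* unknown AX     */
    }
}
```

Each further system call in Table 2 would add one more case to the switch.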
Figure 1. Screenshot of operating system at startup
The file system is illustrated in Figure 2. The first two sectors of the disk after the bootloader are reserved for a Disk Map and a Directory. The Directory consists of 16 32-byte file entries; each entry has room for a 6-byte file name and 26 bytes of sector numbers specifying the file's location. The Map serves as a kind of free list: it consists of 512 bytes representing the first 512 sectors on the disk, with a free sector denoted by 00 and a used sector denoted by FF. The disadvantages of this primitive file system are obvious. Besides restricting the user to 16 files, it limits the maximum file size to 13kB (26 sectors of 512 bytes; not a problem, as the finished kernel is no more than 5kB), cannot reference any sector beyond the first 256 (a single byte holds only 0 to 255), and is utterly unscalable. However, there are a couple of key advantages. Unlike most practical file systems, it does not require students to have prior experience with linked lists or other data structures more complicated than an array. Since every sector is referenced by a single byte, students do not need to understand binary math to write the file system handlers.
Program execution is also very simple. All executables in this operating system occupy less than 64kB of memory, use no segmentation, and have their entry point at the beginning of the program. The kernel is loaded into the second 64kB block of memory (from 0x10000 to 0x1FFFF). All user programs (including the shell) are loaded into the next 64kB (from 0x20000 to 0x2FFFF). Assembly routines are provided to set up the stack and segment registers; students only need to copy the program to memory and call the routine to launch it.
Table 2. System calls written by students
AX   Function                     Project
0    print string                 B
1    read string from keyboard    B
2    read disk sector             B
3    read file                    C
4    execute program              C
5    terminate program            C
6    write disk sector            D
7    delete file                  D
8    create and write to file     D
9    kill process                 E
A    wait on process              E
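To make the Directory layout concrete, the sketch below searches an in-memory copy of the Directory sector for a file name and collects the file's sector numbers, which a read-file routine could then pass to readSector one at a time. It assumes three conventions the text does not spell out: names shorter than six characters are padded with 00 bytes, an unused entry has 00 as its first byte, and a file's sector list ends at the first 00 byte (sector 0 holds the bootloader, so it can never belong to a file). The names findFile and nameMatches are hypothetical.

```c
#define DIR_ENTRIES 16   /* entries per Directory sector */
#define ENTRY_SIZE  32   /* bytes per entry */
#define NAME_LEN     6   /* bytes of file name */
#define MAX_SECTORS 26   /* 32 - 6 bytes of sector numbers: 26 * 512 = 13kB max */

/* Compare a C-string name against a 6-byte, 00-padded directory name.
   Names longer than 6 characters are compared on their first 6 only. */
static int nameMatches(const unsigned char *entry, const char *name)
{
    int i;
    for (i = 0; i < NAME_LEN; i++) {
        if ((unsigned char)name[i] != entry[i]) return 0;
        if (name[i] == '\0') return 1;   /* both names ended here */
    }
    return 1;
}

/* Look up name in a 512-byte Directory sector. On success, copy the file's
   sector numbers into sectors[] followed by a 0 terminator and return how
   many there are; return -1 if no entry matches. */
int findFile(const unsigned char dir[512], const char *name,
             unsigned char sectors[MAX_SECTORS + 1])
{
    int e, n;
    for (e = 0; e < DIR_ENTRIES; e++) {
        const unsigned char *entry = dir + e * ENTRY_SIZE;
        if (entry[0] == 0 || !nameMatches(entry, name))
            continue;                    /* unused slot or different file */
        for (n = 0; n < MAX_SECTORS && entry[NAME_LEN + n] != 0; n++)
            sectors[n] = entry[NAME_LEN + n];
        sectors[n] = 0;
        return n;
    }
    return -1;
}
```

Because the sector list is just an array of bytes, the whole lookup needs nothing more complicated than nested loops over arrays, which is the design advantage the text claims for this file system.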
Figure 2. Block Diagram of File System
In the multiprocessing step, user programs may be located in any of the first eight 64kB blocks. A process table holds the identities of currently running programs and their current stack pointer addresses. On each timer interrupt, the register state is backed up and a new program is chosen from the process table in round-robin fashion. Although assembly language routines for handling the timer and for backing up and restoring the state are provided to students completing this step, students still must manage the process table and cope with race conditions.
The project makes heavy use of existing BIOS functions to print to the screen, read from the keyboard, and read and write sectors on the disk [11]. These BIOS functions are contained in ROM in all modern PCs and are holdovers from the early days of DOS; they are typically not used in modern operating systems. Using existing BIOS functions greatly simplifies my operating system because students do not have to write floppy or console drivers. However, to use these functions, the processor must be kept in 16-bit real mode. Virtual memory and protection, which require 32-bit protected mode, are consequently not addressed.
3. ASSIGNMENTS
I modularized this project by dividing it into five components, labeled Project A through Project E. Students were given two weeks to complete each component. Before beginning Project A, students were given a warm-up project, writing a simple Unix shell, to give them experience coding in C. The projects are designed to increase in difficulty as students become more accustomed to low-level programming.
All students were given a Linux account on the department server containing the necessary development tools. Students used the Bochs emulator [9], rather than a real machine, to test their system. Because gcc could not generate 16-bit real-mode code, students developed their system using the archaic bcc compiler and the as86 assembler (from the bin86 toolset). Additional tools included the dd command, to produce the binary floppy image, and hexedit, to verify their binary files. All of these are open-source tools available in standard Linux repositories.
Table 3. Shell commands written by students
Command    Function                         Project
type       print out text file              C
execute    load and execute program         C
delete     delete file                      D
copy       copy file                        D
dir        print out directory              D
create     create a new text file           D
kill       kill process                     E
execback   execute program in background    E
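The round-robin choice made on each timer interrupt in the multiprocessing step (Project E) can be sketched in C as follows. This is a hosted illustration only: it assumes one process-table slot per 64kB block (eight slots), keeps just an active flag and a saved stack pointer per slot as the text describes, and leaves backing up the registers and the actual switch to the provided assembly routines. scheduleNext, struct PCB, processTable, and currentProcess are hypothetical names.

```c
#define MAX_PROCESSES 8   /* assumed: one slot per 64kB block */

struct PCB {
    int active;         /* 1 if the slot holds a running program */
    int stackPointer;   /* saved stack pointer of that program */
};

struct PCB processTable[MAX_PROCESSES];   /* zero-initialized: all slots free */
int currentProcess = -1;                  /* -1 = no user process running yet */

/* Called on each timer interrupt after the provided assembly has backed up
   the registers. Records the interrupted program's stack pointer, then scans
   the slots after it, wrapping around, for the next active one. Returns that
   program's saved stack pointer, or -1 if nothing is runnable. */
int scheduleNext(int savedSP)
{
    int i, candidate;
    if (currentProcess >= 0)
        processTable[currentProcess].stackPointer = savedSP;
    for (i = 1; i <= MAX_PROCESSES; i++) {
        candidate = (currentProcess + i) % MAX_PROCESSES;
        if (processTable[candidate].active) {
            currentProcess = candidate;
            return processTable[candidate].stackPointer;
        }
    }
    return -1;
}
```

Because the timer can fire while kernel code is in the middle of changing processTable (for example, while a new program is being added), updates to the table are a natural place for the race conditions mentioned above.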
- Write a printString function that prints a null-terminated string by making repeated print-character (interrupt 10) BIOS calls.
- Create a handleInterrupt21 function that is called on an interrupt 21 and takes the contents of the AX, BX, CX, and DX registers as parameters. Students have handleInterrupt21 call printString when AX is 0. Subsequently, whenever students write a kernel function, they must add a call to it in handleInterrupt21.
- Create a readString function that reads characters from the keyboard using BIOS interrupt 16. The readString function terminates the string when ENTER is pressed. Students are expected to write handling for the BACKSPACE key.
- Create a readSector function that reads a single sector from the disk using BIOS interrupt 13. Students must write mod and division functions to convert a raw sector number to the CHS (cylinder-head-sector) format required by interrupt 13.
- In main(), call makeInterrupt21, and test the functions by reading a string, reading a sector, and printing both out.
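The raw-sector-to-CHS conversion required by readSector can be sketched as below, assuming the standard 1.44 MB 3 1/2" floppy geometry (18 sectors per track, 2 heads, 80 tracks) and the register layout of BIOS interrupt 13, function 2 (AH = 2, AL = sector count, CH = track, CL = sector, DH = head, DL = drive). The mod and division helpers use repeated subtraction, which is only one simple way to write them; the names divide, readSectorRegs, and struct DiskRegs are hypothetical. The resulting values, together with a buffer address in BX, would be handed to the provided interrupt routine, whose exact signature is not given here.

```c
#define SECTORS_PER_TRACK 18   /* 1.44 MB 3 1/2" floppy (assumed geometry) */
#define HEADS              2

/* Remainder and quotient by repeated subtraction (a >= 0, b > 0). */
int mod(int a, int b)    { while (a >= b) a -= b; return a; }
int divide(int a, int b) { int q = 0; while (a >= b) { a -= b; q++; } return q; }

/* Register values for BIOS interrupt 13, function 2, reading one sector. */
struct DiskRegs { int ax, cx, dx; };

struct DiskRegs readSectorRegs(int sector)
{
    struct DiskRegs r;
    int relSector = mod(sector, SECTORS_PER_TRACK) + 1;      /* CHS sectors are 1-based */
    int head      = mod(divide(sector, SECTORS_PER_TRACK), HEADS);
    int track     = divide(sector, SECTORS_PER_TRACK * HEADS);
    r.ax = 2 * 256 + 1;              /* AH = 2 (read), AL = 1 sector */
    r.cx = track * 256 + relSector;  /* CH = track, CL = sector */
    r.dx = head * 256 + 0;           /* DH = head, DL = 0 (drive A) */
    return r;
}
```

For example, raw sector 36 is the first sector of track 1, head 0, so CX comes out as 0x0101; multiplying by 256 places a value in the high byte of a register without any bit shifting.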
- Write a delete function that takes a file name and deletes the file. It finds the file in the Directory, frees each of the file's sectors in the Map (resetting its byte to 00), and marks the directory entry as deleted.
- Add shell commands to copy a file, delete a file, and list the directory.
- Write a simple editor: a create shell command that reads lines of text from the keyboard and writes them to a file.
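The delete steps listed above can be sketched over in-memory copies of the Map and Directory sectors. The sketch assumes, since the text does not specify it, that a deleted or unused directory entry is marked by a 00 first byte and that a file's sector list ends at the first 00 byte; deleteFile is a hypothetical name. After it returns, the kernel would still need to write both sectors back to disk with its write-sector routine.

```c
#define SECTOR_SIZE 512
#define DIR_ENTRIES  16
#define ENTRY_SIZE   32
#define NAME_LEN      6

/* map and dir are in-memory copies of the Disk Map and Directory sectors.
   Returns 0 if the file was found and deleted, -1 if it was not found. */
int deleteFile(unsigned char map[SECTOR_SIZE], unsigned char dir[SECTOR_SIZE],
               const char *name)
{
    int e, i;
    for (e = 0; e < DIR_ENTRIES; e++) {
        unsigned char *entry = dir + e * ENTRY_SIZE;
        int match = entry[0] != 0;           /* 00 first byte = unused (assumed) */
        for (i = 0; match && i < NAME_LEN; i++) {
            if ((unsigned char)name[i] != entry[i]) match = 0;
            else if (name[i] == '\0') break; /* both names ended: match */
        }
        if (!match) continue;
        for (i = NAME_LEN; i < ENTRY_SIZE && entry[i] != 0; i++)
            map[entry[i]] = 0x00;            /* mark each of the file's sectors free */
        entry[0] = 0;                        /* mark the directory entry deleted */
        return 0;
    }
    return -1;
}
```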
4. STUDENT EXPERIENCE
I administered the project to a single class of 14 students. The student experiences and results are consequently limited to this single sample. Student project submissions were evaluated solely on whether they functioned correctly and met the stated project requirements. Submissions were not graded on efficiency or code elegance.
The class consisted of 6 sophomores, 4 seniors, and 4 master's-level graduate students. Seven were computer science majors; the rest were non-majors (the most common non-CS major was international studies). All the students had completed the first course in introductory programming, but 3 had not taken the second course, and only 4 had previously taken a computer organization course.
Ten students correctly completed all 5 components; these included 2 of the 3 students who had taken only the first programming course and 2 of the 4 graduate students. Of the remainder, 3 finished Project D, and the remaining student finished Project C. Since all development work was done on Bochs, students were given the opportunity at the end of the course to test their system on a Pentium 4 computer. Twelve of the 14 students' systems behaved the same on the real computer as on the simulator.
Most student difficulties were on Projects C and E. In Project C, students tended to have problems reading files, especially files larger than one sector. Project E provided two sources of frustration: many students had difficulty understanding when to switch the data segment to kernel space, and nearly all students attempting Project E had difficulty debugging errors caused by multiprocessing race conditions. On the other hand, students did not typically have problems understanding what was required of them, and few students made the mistake of trying to call POSIX functions in their kernel.
A survey was given to the students at the end of the semester asking them to evaluate their project experience. Students unanimously reported that the projects contributed to their understanding of the course material, that they enjoyed the projects, and that they would recommend them to future classes. Asked whether the hardest part of the project, the multiprocessing step (Project E), was necessary, 9 answered yes, 2 no, and 3 "not sure".
From my experience, I believe that several factors were necessary to make a project like this successful:
- Providing detailed step-by-step instructions for each stage.
- Keeping prewritten assembly code to an absolute minimum, and providing clear instructions on calling these functions.
- Approximating a simple system like CP/M or early MS-DOS rather than replicating Unix.
- Using BIOS calls instead of custom device drivers.
- Including no advanced data structures, binary math, stack handling, or assembly programming (besides what was provided).
In the future, I hope to document this project extensively, simplify it further, and make it reproducible by instructors at other universities. I am presently developing a GUI shell component, and one of my future plans is to boot from a USB key instead of a floppy disk.
6. REFERENCES
[1] Anderson C.L. and Nguyen M. 2005. A Survey of Contemporary Instructional Operating Systems for Use in Undergraduate Courses. In Consortium for Computing Sciences in Colleges, Northwest, 2005.
[2] Christopher W.A., Procter S.J., and Anderson T.E. 1993. The Nachos Instructional Operating System. In Winter USENIX Conference.
[3] Holland D.A., Lim A.T., and Seltzer M.I. 2002. A New Instructional Operating System. In Proceedings of SIGCSE Technical Symposium on Computer Science Education.
[4] Liu H., Chen X., and Gong Y. 2007. BabyOS: a fresh start. In Proceedings of SIGCSE Technical Symposium on Computer Science Education.
[5] Hovemeyer D., Hollingsworth J.K., and Bhattacharjee B. 2004. Running on the Bare Metal with GeekOS. In Proceedings of SIGCSE Technical Symposium on Computer Science Education.
[6] Silberschatz A. and Galvin P. 1999. Applied Operating System Concepts. John Wiley and Sons.
[7] Tanenbaum A.S. 2001. Modern Operating Systems. Prentice Hall.
[8] Tanenbaum A.S. and Woodhull A.S. 1997. Operating Systems: Design and Implementation. Prentice Hall.
[9] Bochs. http://bochs.sourceforge.net. February 2008.
[10] GeekOS. http://geekos.sourceforge.net. December 2007.
[11] Ralf Brown's Interrupt List. http://www.ctyme.com/intr/int.htm. December 2007.
[12] OS Development Wiki. http://wiki.osdev.org. January 2008.
[13] CP/M-85. 1982. Zenith Data Systems.