TEACHING LINUX BASED OPERATING SYSTEM

Warren Ebell, Computer Science Honours, University of Cape Town
Liesl Lohlun, Computer Science Honours, University of Cape Town
Yat Pan Ng, Computer Science Honours, University of Cape Town

ABSTRACT

The teaching of operating system internals at UCT does not take a practical view of the operating system. Students are not given the opportunity to gain any experience with the source code and its inner functions. To decrease the impact of testing, a minimized version of the operating system would be ideal.

The Linux operating system presents opportunities to alter source code, but the amount of information available to prospective users is limited. There is also a means to implement a minimalist version of the kernel.

Sections of interest are the scheduler, the file system, and the virtual memory manager. The goal was to provide detailed information about each section, as well as some means of altering it.

Categories and Subject Descriptors
D.4 Software - Operating Systems

Technical Report Number
CS04-11-00

General Terms
Operating Systems, Scheduler, Virtual Memory Manager, File Systems.

Keywords
Linux, process scheduler, O(1) algorithm scheduler, loadable module, kernel, virtual memory sub-system, paging algorithm, Virtual File System.

1. INTRODUCTION

The field of computer science has always been a two-sided coin: there has always been a need for the theoretical side of computational science to push the boundaries of the science, and there has always been the practical side, which has tempered the theoretical side by limiting the implementation of the theory and by providing a “what works in the real world” view of the science. These two sides are always weighed up in the teaching of computer science; some institutions believe that a theoretical grounding is better, while others take a more practical stance. At the UCT Department of Computer Science the curriculum is well balanced over the majority of the fields of study in computer science, with the exception being operating systems. This is partially due to the fact that facilities to work with actual operating systems are limited, and no-one wants to have a communal lab being brought down by students altering OS source code and not succeeding.

This paper will look at an alternative way of teaching a practically based course in operating systems using Linux as a base. This can be accomplished by making use of current resources, and does not require a host of new labs to be set up just for this course. The operating system will be compiled onto a bootable floppy disk, and this will prevent the bringing down of the lab environment.

Some of the major components of the operating system will be presented as topics of investigation, and these will give learners the opportunity to work with the source code of an operating system, and gain practical knowledge to accompany the theoretical knowledge gained in previous courses.

As mentioned, the operating systems course in third year takes a high-level approach to the operations of the operating system. Students get very little exposure to the inner workings of the OS; they are simply told what happens and have to accept that the theory being presented is actually implemented in the manner presented.

This extends to many of the kernel components, and students have very little exposure to the “real world hacks” that give the OS the necessary performance, but still keep it relatively theoretically correct. The motivation is to give students exposure to these, so as to extend their theoretical knowledge with practical application.

2. BACKGROUND AND MOTIVATION

Operating systems are an important part of the computer science field, and the teaching of them is not done on a very practical level. A survey of operating system professors was conducted by Addison-Wesley, and the general consensus was that they want to get students involved in programming projects, but there is no real tool to help them with this.

A lot of work has been done to come up with a teaching tool that would allow students to get a strong feel for the core mechanisms of an operating system.

In the preface of his book Gary Nutt provides a realistic view of the teaching of operating systems:

“There are only a few widely used commercial operating systems. While studying these systems is valuable, there are practical barriers to experimenting with any of them in the classroom. First, commercial operating systems are very complex since they must offer full support to commercial
applications. It is impractical to experiment with such complex software because it is sometimes difficult to see how specific issues are addressed within the software. Small changes to the code may have unpredictable effects on the behaviour of the overall OS. Second, the OS software sometimes has distinct proprietary value to the company that implemented it”.

Another weak point of teaching operating systems on a practical level is the lack of documentation. For example, there is very little documentation regarding the inner workings of the Linux kernel. Someone who is new to the kernel is “politely” told to RTFS (Read The Freaking Source). This presents a problem, as the source is at best confusing to read, and takes a few attempts before it is understood.

Providing students with easy to understand documentation and a simple way to get exposure to the inner workings of an operating system would make for a course that gives the learners the necessary exposure to the practical implementation of an operating system.

The motivation for this practical exposure to operating systems is that students only get theoretical exposure to operating systems in the third year course, with limited practical tutorials. The most complex of these tutorials is writing a simple device driver.

3. APPROACH AND METHODS

The first hurdle that a practical course in operating systems faces is the source code that is going to be used. As some operating systems are proprietary, there are a limited number of options available. The solution to this problem is to make use of one of the open source operating systems, as the source code is freely available for download, and there are people available to answer questions for “newbies” (people with little or no knowledge of a subject) via newsgroups or forums. We chose to go with the Mandrake Linux Community 10 distribution as it is free, and it includes all the necessary utilities for us to do our work (although any of the Linux distributions would have worked).

Our project supervisor initially recommended the 2.4 series kernels to work on as there is much documentation about them, but changed this to the 2.6 series as the 2.4 series is quite old, and there are many new functions in the 2.6 series that could prove to be useful. The core functionality remains the same between the two versions, so changing was not much of a hiccup to the progress of the project.

Another problem is the facilities available. There would be very little chance of having a dedicated lab available for just one course. Using a communal lab could present the problem of having some of the machines rendered useless by an attempt to run a kernel that has some flaws.

To overcome this problem we suggested dual-booting the machines. Having more than one operating system on each machine would allow students to have one stable environment to use for day to day tasks and practicals, while the other environment could be used for testing kernels.

“Testing? What’s that? If it compiles, it is good, if it boots up it is perfect” – Linus Torvalds

This method initially seemed plausible, but after much discussion was not implemented as a better method was found.

This new method involved compiling the kernel onto two floppy disks using the Pocket Linux guide [1]. This method provides a means for the students to compile their own kernels onto a medium that would not cause lab interruptions, but would still give them an opportunity to test their work. If the kernel then compiled and booted correctly, it could be installed on the Linux machines as a different boot option in the boot menu (using LILO or GRUB).

In terms of sections that could be learnt, three of the major components that are common to most major operating systems were chosen for further investigation. These sections were the Linux Virtual Memory sub-system, the process scheduler, and the file system. With these three sections a learner would gain exposure to some of the important core components of an operating system.

Initially it was envisioned that each of these sections could be abstracted out of the kernel, giving the learners an interface to the kernel so as to minimize the amount of kernel code that would need to be understood to write a working module. The Linux operating system provides for the loading of kernel modules that extend the functionality of some of the core kernel functions, one example being file system modules.

Research on scheduling algorithms has been a popular topic ever since multi-tasking was first introduced. Many new algorithms have been developed since, but there has been little change in the schedulers used by commercial operating systems. One of the main obstacles that scheduler developers face is the implementation of a scheduling policy in a standard OS: developing and implementing a scheduling policy requires two different kinds of expertise. A scheduler developer therefore needs both to master kernel hacking and to be knowledgeable in the scheduling research field.

There were not many papers or articles that specifically deal with how to write a loadable module that can replace the default scheduler in Linux 2.6, or in any version of Linux for that matter, but there is a guide [7] that provides high-level instructions on how this can be done in Linux 2.2.14.

Another related piece of research was DWCS [8]. DWCS stands for Dynamic Window-Constrained Scheduling. Originally, DWCS was designed to be a network packet scheduler that limited the number of late or lost packets over a network traffic stream, but later it evolved into a process scheduler for Linux. DWCS can be configured to run as an earliest deadline-first (EDF), static priority or proportional share scheduler. DWCS also has the desirable property of attempting to guarantee that no more than x deadlines are ever missed in each window of y consecutive deadlines.

Bossa is a kernel-level event-based framework to facilitate the implementation and integration of new scheduling policies. It uses a domain-specific language (DSL) that provides high-level scheduling abstractions that simplify the implementation and
evolution of new scheduling policies. A dedicated compiler checks Bossa DSL code for compatibility with the target OS and translates the code into C. However, Bossa currently only supports 2.4.2, and because of the major differences in the scheduler source code between 2.4 and 2.6, it was deemed not portable to the 2.6 kernel and hence unusable in the project.

4. COMPONENTS

4.1 Linux Virtual Memory sub-system

The main area of focus in the VM was the paging sub-system. This was chosen because it has a direct influence on the performance of the operating system, and poor implementations are very easy to pick up [2, 3]. Also, the implementation of the paging algorithm does not strictly follow the theory of the Least Recently Used algorithm [4], and would therefore give the learners an opportunity to see how “real world” performance influenced the implementation of the theory.

The Least Recently Used algorithm is implemented in a way that minimizes the overhead [4], while still giving the best possible performance. The page table contains a reference bit that is set when the page is accessed. If a page fault occurs, all of the reference bits are checked. If the bit is zero, that page is marked for reclamation. Once all the pages have been checked, the reference bits of those not marked for reclamation are set to zero and the pages marked for reclamation are swapped out. This method works well in the real world, and is a very close approximation of the Least Recently Used algorithm, but does have its flaws.

In the extreme case that two page faults occur right after one another, all of the pages will be swapped out. The process happens like this: after the first page fault all of the reference bits are set to zero. The second page fault then occurs, and as none of the pages have been referenced since the last page fault, all of them will be marked for reclamation, and will be swapped out. This can lead to a form of thrashing, but only in an extreme case.

A solution to this problem would be to implement a frequency count to work in conjunction with the LRU algorithm. This would keep the pages that have a greater frequency of use from being swapped out, even if they have not been referenced since the last page fault. The frequency count would be reset when the page is swapped out. This implementation of the Least Frequently Used algorithm on top of the LRU would solve the double page fault problem, but any algorithm could be implemented instead of the LFU.

Another option would be to increase the number of bits used to check the number of references a page receives [5, 6]. This would be a better implementation, as a better indication of the history of page usage is kept, while still keeping performance at an acceptable level.

4.2 Process Scheduler

A scheduler affects the overall feel of a system. Whether this is the interactivity of a desktop client or the throughput of an application server, the operating system installed usually only employs one algorithm. This algorithm will then need to perform well in both cases.

In a traditional O(n) scheduler, calculating time slices often requires a loop over each task. This calculation is usually done when the time slices of all the currently running processes have expired. Recalculation of each task’s time slice will then require a loop of order n, and the priority of each task and other attributes are subsequently used to determine the time slice given to the task. Not only does this approach scale as an O(n) algorithm for n tasks, but locking must also be done to ensure the task list is not tampered with during recalculation.

The new scheduler uses a fixed range of priorities in a doubly linked list to ensure O(1) scheduling time. This structure is the result of a hybrid between the well-known round-robin and first-in-first-out policies. The data structure used to store the list of runnable processes is the runqueue. Each runqueue contains two priority arrays, the active and the expired array. The struct prio_array is responsible for implementing this array. It is defined in kernel/sched.c [9] and it is crucial to providing O(1) scheduling. Each priority array consists of one queue of runnable processes per priority level, creating a structure similar to that of a doubly linked list with one list having fixed length. The priority array also contains an array of longs called bitmap, and the bits in this array are used to keep track of which queues are empty and which are not. It is the fixed length (140) of the runqueue and the bitmap that guarantees the new scheduler its O(1) efficiency.

Figure 1. An illustration of the process taken to select the next task to run. [12]

The interactivity of a task is not something that is known intuitively by the scheduler. It uses a heuristic that quantifies the interactivity of a task by checking whether the task is I/O bound or processor bound. This metric has the advantage of not being vulnerable to abuse. If a heavily I/O bound task spent a long time sleeping but also quickly used up its entire time slice, it will not be given a large bonus. The idea is not just to reward interactive tasks but also to punish processor bound tasks. A newly started process quickly receives a large sleep_avg to allow for immediate response. Later the penalty or bonus will also affect the dynamic priority, based on how much a process hogs the processor.

One of the advantages of the new scheduler is the improved scalability that it provides. Each processor has its own separate runqueue and locking, and hence each CPU only maintains a list of its own processes and the scheduler schedules each runqueue separately. In an SMP system, this will probably create an
imbalanced load amongst different processors. Some runqueues might be longer than others, and some might even be idle while others suffer from process starvation. Intuitively, this problem requires a global scheduling mechanism, but it is solved with the load_balance() function, which is also a “one per processor” function.

This function compares the runqueues from different CPUs and finds imbalances amongst them. If there is such an imbalance, it will attempt to pull suitable tasks from the busy processor to an idle or less busy one. The idea is to make sure all runqueues have more or less the same number of processes. It is called by schedule() when the runqueue is empty, and it is called by the timer every 1 ms when the processor is idle, or else every 200 ms.

The design of the module [11] that allows users to switch between the default and a custom scheduling policy was made possible by adding additional system calls [10] and using them as stubs between the kernel and kernel modules. While this design might not be wise from a security standpoint, it does get the job done. Keep in mind that the modularization is done for educational purposes, so neither the tools nor the module are meant for commercial use.

Figure 2

1. Some code in the kernel calls any one of those methods.
2. The method calls the system call getSchedConfig() to check whether the original or the hacked version should be called.
3. getSchedConfig() returns an answer.
4. If a hacked version is needed then the {method}_mod() system call is called.
5. The module in SchedMod.c will intercept this call, as it has already replaced the {method}_mod() address in the sys_call_table with an address of its own.
6. After calling the hacked version, the return value (if there is one) is returned to the {method}_mod() system call.
7. {method}_mod() just returns the return value to the method in sched.c.

4.3 File Systems

When the Linux kernel needs to access a file system, it uses a file-system-type independent interface, which allows the system to carry out operations on a file system without knowing its construction or type. One of the most important features of Linux is its support for many different file systems. Since the kernel is independent of the file system type, it is flexible enough to accommodate future file systems as and when they become available.

In Linux, the separate file systems that the system may use are not accessed by device identifiers (such as drive number or drive name), but instead they are combined into a single hierarchical tree structure that represents the file system as an entire, single entity. Each new file system is added into this single file system tree as it is mounted. All file systems, no matter what type, are mounted onto a directory, and the files of the mounted file system cover up the existing contents of that directory. This directory is known as the mount directory or the mount point. When the file system is unmounted, the mount directory’s own files are once again revealed [13].

Linux allows you to use loadable modules for all the file system types. File system support can either be linked into the kernel being booted or compiled in the form of loadable modules. In the case where file systems are built as modules, they can be demand loaded as they are needed or loaded by hand using insmod. Whenever a file system module is loaded it registers itself with the kernel, and it unregisters itself when it is unloaded. Each file system’s initialisation routine registers itself with the Virtual File System and is represented by a file_system_type data structure, which contains the name of the file system and a pointer to its VFS superblock read routine [14].

5. FINDINGS

Using the Pocket-Linux guide to compile a kernel was more than just a step-by-step process. The main problem was getting hold of all the correct source code, and then ironing out all the compile time errors. The guide recommends stripping all the non-essential parts of the kernel so as to keep the size down, but in our case some of these non-essential components could be used to better understand the kernel functionality. There are also some components that are completely removed in the guide that would aid in learning, such as access to hard disks, to access benchmark files, or creating logs that could be viewed at a later stage in a stable environment.

The modularization of the VM sub-system proved to be too much of a challenge, as it is too extensively linked into other core components of the kernel. Some progress was made in deciphering the little documentation that is available, and comments were added to the source code in order to ease the understanding of the functionality and the purpose of the methods.
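
To illustrate the kind of simplified, commented code that could accompany the VM source, the reference-bit scheme of Section 4.1 can be sketched in user space. This is a minimal sketch and not kernel code: struct page_entry, touch_page() and sweep_on_fault() are hypothetical names introduced here purely for illustration, and swapping a page out is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not a kernel structure. */
struct page_entry {
    bool resident;    /* page is currently held in memory */
    bool referenced;  /* set on access, cleared by the sweep */
};

/* Simulate an access: the hardware would set the reference bit. */
static void touch_page(struct page_entry *p)
{
    p->resident = true;
    p->referenced = true;
}

/*
 * Sweep performed on a page fault. Resident pages whose reference bit
 * is clear are swapped out; the bits of the surviving pages are cleared
 * so that the next sweep only sees accesses made after this one.
 * Returns the number of pages swapped out.
 */
static size_t sweep_on_fault(struct page_entry *pages, size_t n)
{
    size_t reclaimed = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].resident)
            continue;
        if (pages[i].referenced) {
            pages[i].referenced = false;  /* survives this sweep */
        } else {
            pages[i].resident = false;    /* swapped out */
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Calling sweep_on_fault() twice in a row with no touch_page() call in between reproduces the double page fault case: the first sweep clears every reference bit and the second swaps every page out.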
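
The suggestion in Section 4.1 of keeping more than one bit of reference information per page can be sketched in the same style. The sketch assumes an 8-bit history that is shifted on every sweep, a scheme that textbooks usually call aging; the width of the history, struct aged_page and aging_sweep() are assumptions made for illustration and are not taken from the Linux source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page descriptor with a short reference history. */
struct aged_page {
    bool resident;
    bool referenced;  /* set on access since the last sweep */
    uint8_t history;  /* most recent sweep in the top bit */
};

/*
 * On each sweep the history is shifted right by one and the current
 * reference bit is recorded in the top bit. A page is swapped out only
 * when its whole history has decayed to zero, so a recently used page
 * survives two page faults in quick succession.
 * Returns the number of pages swapped out.
 */
static size_t aging_sweep(struct aged_page *pages, size_t n)
{
    size_t reclaimed = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct aged_page *p = &pages[i];

        if (!p->resident)
            continue;
        p->history = (uint8_t)((p->history >> 1) |
                               (p->referenced ? 0x80u : 0u));
        p->referenced = false;
        if (p->history == 0) {
            p->resident = false;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Compared with a single reference bit, the cost is one extra byte per page and a shift per sweep, which is in line with the aim of keeping performance at an acceptable level.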
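
The bitmap lookup that Section 4.2 credits for the O(1) behaviour of the scheduler can be illustrated in a similar way. The sketch below assumes the 140 priority levels mentioned in that section and assumes that a lower index means a higher priority. struct prio_bitmap_sketch and its functions are hypothetical names, and the per-priority task queues of the real priority array are replaced by simple counters.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_PRIO      140
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MAX_PRIO + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for a priority array: one bit per priority. */
struct prio_bitmap_sketch {
    unsigned long bitmap[BITMAP_WORDS];  /* bit set = queue not empty */
    int nr_queued[MAX_PRIO];             /* tasks waiting per priority */
};

static void enqueue(struct prio_bitmap_sketch *a, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    a->nr_queued[prio]++;
    a->bitmap[prio / BITS_PER_WORD] |= 1UL << (prio % BITS_PER_WORD);
}

static void dequeue(struct prio_bitmap_sketch *a, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO && a->nr_queued[prio] > 0);
    if (--a->nr_queued[prio] == 0)
        a->bitmap[prio / BITS_PER_WORD] &= ~(1UL << (prio % BITS_PER_WORD));
}

/*
 * Return the lowest-numbered non-empty priority, or -1 if none.
 * The search inspects at most MAX_PRIO bits, a bound that does not
 * depend on how many tasks are queued.
 */
static int highest_priority(const struct prio_bitmap_sketch *a)
{
    for (size_t w = 0; w < BITMAP_WORDS; w++) {
        if (a->bitmap[w] == 0)
            continue;
        for (size_t b = 0; b < BITS_PER_WORD; b++)
            if (a->bitmap[w] & (1UL << b))
                return (int)(w * BITS_PER_WORD + b);
    }
    return -1;
}
```

A production implementation would typically replace the inner loop with a hardware find-first-set instruction; the portable loop is used here only to keep the sketch self-contained.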
The 2.4 series of kernels were easier to understand, especially the earlier ones, as the paging algorithm was still implemented in software, and this aided the understanding of the later kernels in the 2.4 series, as well as the 2.6 series. The difference between the two major versions is that the 2.6 kernels have more than one swap-out daemon (one daemon per memory node), which is not all that much of a change. All that is required is that the previously global variables be moved so that there is one copy of each in each zone of the nodes.

In order to compile a module the path of the Linux kernel source code is required, and since the source is already 35MB compressed and 120MB when decompressed, it is pretty unlikely that we can provide a modifiable teaching tool based on just a floppy disk. Alternatively we can mount a USB flash drive, and direct the path of the Linux source code there.

The majority of the kernel source code is written in C, the rest in assembler, and this might pose a great challenge for students, as the C language is not part of their syllabus. Even for someone who is proficient in C, there are many macros and extern methods that are defined outside the current .c file that one might be looking at.

The implementation of the scheduler module requires a pretty in-depth knowledge of C, and it also requires a very skillful and advanced programmer. When someone is writing a normal main() program, a syntax error means a quick shout from the gcc compiler and a logic error means a core dump or segmentation fault. But since a kernel module runs in kernel space, a small syntax error will probably only be discovered halfway through the kernel recompilation, which takes about 10 minutes. Any logical error in the module will either freeze the entire system or reboot the system without a warning; neither is desirable.

The following summarizes the major findings when attempting to modularize the process scheduler.

• The new O(1) scheduler is a hybrid of the two traditional scheduling policies, RR (round robin) and FIFO (first in, first out).
• Understanding the current O(1) scheduler is already an appealing educational exercise.
• Writing the methods that will replace the default ones will require the student to be skillful in C and to have good knowledge of Linux internals.
• While writing a module and system calls to abstract out the scheduler is a good idea, it is not a strategy that I recommend to encapsulate ALL parts of the operating system.
• Pocket Linux alone will not be able to produce a development environment for kernel editing, modifying and testing.
• Using default system calls can also change the current behavior of the scheduler to a certain degree.
• If the 2.4 version were used, either Bossa or DWCS could be used to modularize the scheduler.

Writing a file system requires following a few basic steps, namely: register the file system, write a function for reading the superblock (the data structure that holds information about each mounted file system) and for setting up the superblock structure and most importantly the superblock field, write various super_operations functions, write various inode_operations functions and finally write various file_operations functions [15].

Because the current implementation of Linux allows you to use loadable modules, adding a new file system to the kernel is made relatively easy. The kernel does not need to be recompiled every time a change is made. The new file system can be written and compiled separately and then loaded into the kernel if and when it is needed.

6. CONCLUSION

To give students a more practical course in operating systems requires a stable OS that also allows some form of configurability. Previous work found that the Windows CE platform was not suitable for this application. The Linux kernel does conform to these requirements, and can also be implemented in a minimalist manner, which is also desirable in terms of ease of testing.

The Linux kernel is available for free, unlike some proprietary systems, and there is a community of knowledge available in the form of online guides, numerous books and message boards. Although these seem like a plentiful source of knowledge, there are some sections of the kernel that are only fully understood by a small number of individuals who are involved in the development of kernels, and this could be seen as a detracting factor.

As we have shown, the Pocket Linux Guide is easy to follow and the end product is a form of throw-away testing which would not influence the uptime of laboratory workstations due to buggy code. The “pocket-linux” also provides a fast method by which kernel compilations can be tested. There would still be a need for some form of Linux-specific laboratory for the students to do their development, but the need for testing-specific machines is eliminated.

One negative that became apparent during the project was the lack of knowledge of the C programming language. Although there are many similarities between C and C++, some of the conventions needed to be investigated to be fully understood. If the operating systems course were to go ahead, an introduction to the C programming language would be needed before a student could progress onto working with the source code.

The Linux kernel is after all monolithic and hence heavily coupled. Even after the scheduler was abstracted as a module and modifications were made, it would be extremely difficult to tell what went wrong should an error occur. By just knowing one part of the OS, it is often difficult to see what other parts are connected or indirectly connected to it; therefore it would be wiser to look at a simpler and earlier version of Linux in the future, but with specific documentation or books already obtained.

With the increasing popularity of the Linux operating system, choosing it as a base from which to teach a practical operating systems course would not only benefit the students in the future, but also the open source community, as there could
conceivably be more individuals with experience in the inner workings of the operating system.

7. ACKNOWLEDGMENTS

Our thanks to:

Matthew West, System Administrator in the UCT Computer Science Department, for helping us by providing background information about the Linux kernel.

Prof. Ken MacGregor, our supervisor, for guiding us along the way and providing background materials.

8. REFERENCES

[1]. David Horton. Pocket Linux Guide. http://my.core.com/~dhorton/linux/pocket/, 2004.
[2]. Mel Gorman. Understanding the Linux Virtual Memory Manager. Prentice-Hall, 2004.
[3]. Abhishek Nayani. Memory management in Linux – Desktop companion to the Linux source code, 1994.
[4]. Gary Nutt. Operating Systems, Second Edition. Addison Wesley, 2002.
[5]. Rodney R. Oldehoeft, Maekawa Mamoru and Arthur E. Oldehoeft. Operating Systems, Advanced Concepts. Benjamin/Cummings Publishing, 1987.
[6]. Andrew S. Tanenbaum. Modern Operating Systems, 2nd Edition. Upper Saddle River, NJ: Prentice-Hall, 2001.
[7]. Scott Rhine, Hewlett-Packard Company, http://linux.ittoolbox.com/documents/document.asp?i=1117, 3 Mar 2000.
[8]. Rich West, http://www.cs.bu.edu/fac/richwest/dwcs.html, Jun 2004.
[9]. Panagiotis Christias, http://unixhelp.ed.ac.uk/CGI/man-cgi?sched_setscheduler+2, Jun 2002.
[10]. Worcester Polytechnic Institute, http://fossil.wpi.edu/docs/howto_add_systemcall.html, Jun 2004.
[11]. Jay Salzman, http://www.tldp.org/LDP/lkmpg/2.6/html/, May 2004.
[12]. Robert Love, http://www.samspublishing.com/articles/article.asp?p=101760, Nov 2003.
[13]. Freeos.com, http://www.freeos.com/articles/3838/, March 2001.
[14]. A. Rubini, The “Virtual File System” in Linux, May 1997, http://www.linux.it/kerneldocs/vfs/vfs.html
[15]. Writing a Linux FileSystem Module, July 2001, http://www.cise.ufl.edu/~ppadala/publications/fs/slide001.html