Unit-II OS
Processes
Process memory is divided into four sections as shown in Figure 3.1 below:
o The text section comprises the compiled program code, read in from non-volatile storage
when the program is launched.
o The data section stores global and static variables, allocated and initialized prior to executing
main.
o The heap is used for dynamic memory allocation, and is managed via calls to new, delete,
malloc, free, etc.
o The stack is used for local variables. Space on the stack is reserved for local variables when
they are declared ( at function entrance or elsewhere, depending on the language ), and the
space is freed up when the variables go out of scope. Note that the stack is also used for
function return values, and the exact mechanisms of stack management may be language
specific.
o Note that the stack and the heap start at opposite ends of the process's free space and grow
towards each other. If they should ever meet, then either a stack overflow error will occur, or
else a call to new or malloc will fail due to insufficient memory available.
When processes are swapped out of memory and later restored, additional information must also be
stored and restored. Key among them are the program counter and the value of all program registers.
For each process there is a Process Control Block, PCB, which stores the following ( types of ) process-specific
information, as illustrated in Figure 3.1. ( Specific details may vary from system to system. )
3.2 Process Scheduling
The two main objectives of the process scheduling system are to keep the CPU busy at all times and to deliver
"acceptable" response times for all programs, particularly for interactive ones.
The process scheduler must meet these objectives by implementing suitable policies for swapping processes in
and out of the CPU.
( Note that these objectives can be conflicting. In particular, every time the system steps in to swap processes
it takes up time on the CPU to do so, which is thereby "lost" from doing any useful productive work. )
Figure 3.5 - The ready queue and various I/O device queues
A long-term scheduler is typical of a batch system or a very heavily loaded system. It runs
infrequently ( such as when one process ends and another must be selected to be loaded in from disk in its
place ), and can afford to take the time to implement intelligent and advanced scheduling algorithms.
The short-term scheduler, or CPU Scheduler, runs very frequently, on the order of every 100 milliseconds,
and must very quickly swap one process out of the CPU and swap in another one.
Some systems also employ a medium-term scheduler. When system loads get high, this scheduler
will swap one or more processes out of the ready queue system for a few seconds, in order to allow
smaller faster jobs to finish up quickly and clear the system. See the differences in Figures 3.7 and 3.8
below.
An efficient scheduling system will select a good process mix of CPU-bound processes and I/O
bound processes.
Processes may create other processes through appropriate system calls, such as fork or spawn. The
process which does the creating is termed the parent of the other process, which is termed its child.
Each process is given an integer identifier, termed its process identifier, or PID. The parent PID
( PPID ) is also stored for each process.
On typical UNIX systems the process scheduler is termed sched, and is given PID 0. The first thing it
does at system startup time is to launch init, which gives that process PID 1. Init then launches all
system daemons and user logins, and becomes the ultimate parent of all other processes. Figure 3.9
shows a typical process tree for a Linux system, and other systems will have similar though not
identical trees:
Depending on system implementation, a child process may receive some amount of shared resources
with its parent. Child processes may or may not be limited to a subset of the resources originally
allocated to the parent, preventing runaway children from consuming all of a certain system resource.
There are two options for the parent process after creating the child:
o Wait for the child process to terminate before proceeding. The parent makes a wait( ) system call, which
returns when the child terminates.
o Run concurrently with the child, continuing to execute without waiting ( as when a UNIX shell launches a
background task. )
Processes may request their own termination by making the exit( ) system call, typically returning an
int. This int is passed along to the parent if it is doing a wait( ), and is typically zero on successful
completion and some non-zero code in the event of problems.
o child code:
      int exitCode;
      exit( exitCode );      // return exitCode; has the same effect when executed from main( )
o parent code:
      pid_t pid;
      int status;
      pid = wait( &status );
      // pid indicates which child exited; the exit code is in the low-order bits of status.
      // Macros can test the high-order bits of status for why it stopped.
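Putting these pieces together, a minimal sketch of a parent that forks a child and collects its exit status
might look like the following ( assuming a UNIX-like system; the printed messages are illustrative only ):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main( void ) {
    pid_t pid = fork( );                    /* create a child process                 */

    if( pid < 0 ) {                         /* fork failed                            */
        perror( "fork" );
        return 1;
    }
    else if( pid == 0 ) {                   /* child code                             */
        printf( "child: PID %d, parent %d\n", ( int ) getpid( ), ( int ) getppid( ) );
        exit( 0 );                          /* exit code 0 indicates success          */
    }
    else {                                  /* parent code                            */
        int status;
        pid_t done = wait( &status );       /* block until the child terminates       */
        if( WIFEXITED( status ) )
            printf( "parent: child %d exited with code %d\n", ( int ) done, WEXITSTATUS( status ) );
    }
    return 0;
}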
Processes may also be terminated by the system for a variety of reasons, including:
o The inability of the system to deliver necessary system resources.
o In response to a KILL command or other unhandled process interrupt.
o A parent may kill its children if the task assigned to them is no longer needed.
o If the parent exits, the system may or may not allow the child to continue without a parent.
( On UNIX systems, orphaned processes are generally inherited by init, which then proceeds
to kill them. The UNIX nohup command allows a child to continue executing after its parent
has exited. )
When a process terminates, all of its system resources are freed up, open files flushed and closed, etc.
The process termination status and execution times are returned to the parent if the parent is waiting
for the child to terminate, or eventually returned to init if the process becomes an orphan. ( Processes
which are trying to terminate but which cannot because their parent is not waiting for them are
termed zombies. )
Independent Processes operating concurrently on a system are those that can neither affect other processes
nor be affected by other processes.
Cooperating Processes are those that can affect or be affected by other processes. There are several reasons
why cooperating processes are allowed:
o Information Sharing - There may be several processes which need access to the same file for example.
( e.g. pipelines. )
o Computation speedup - Often a problem can be solved faster if it can be broken down into sub-tasks to
be solved simultaneously ( particularly when multiple processors are involved. )
o Modularity - The most efficient architecture may be to break a system down into cooperating modules.
( E.g. databases with a client-server architecture. )
o Convenience - Even a single user may be multi-tasking, such as editing, compiling, printing, and
running the same code in different windows.
Cooperating processes require some type of inter-process communication, which is most commonly one of
two types: Shared Memory systems or Message Passing systems. Figure 3.12 illustrates the difference
between the two systems:
Figure 3.12 - Communications models: (a) Message passing. (b) Shared memory.
Shared Memory is faster once it is set up, because no system calls are required and access occurs at normal
memory speeds. However it is more complicated to set up, and doesn't work as well across multiple
computers. Shared memory is generally preferable when large amounts of information must be shared quickly
on the same computer.
Message Passing requires system calls for every message transfer, and is therefore slower, but it is simpler to
set up and works well across multiple computers. Message passing is generally preferable when the amount
and/or frequency of data transfers is small, or when multiple computers are involved.
In general the memory to be shared in a shared-memory system is initially within the address space of
a particular process, which needs to make system calls in order to make that memory publicly
available to one or more other processes.
Other processes which wish to use the shared memory must then make their own system calls to
attach the shared memory area onto their address space.
Generally a few messages must be passed back and forth between the cooperating processes first in
order to set up and coordinate the shared memory access.
This is a classic example, in which one process is producing data and another process is consuming
the data. ( In this example the data is consumed in the order in which it is produced, although that could vary. )
The data is passed via an intermediary buffer, which may be either unbounded or bounded. With a
bounded buffer the producer may have to wait until there is space available in the buffer, but with an
unbounded buffer the producer will never need to wait. The consumer may need to wait in either case
until there is data available.
This example uses shared memory and a circular queue. Note in the code below that only the producer
changes "in", and only the consumer changes "out", and that they can never be accessing the same
array location at the same time.
First the following data is set up in the shared memory area:
#define BUFFER_SIZE 10
typedef struct {
    . . .
} item;
item buffer[ BUFFER_SIZE ];
int in = 0;     /* index of the next free position  */
int out = 0;    /* index of the first full position */
Then come the producer and the consumer processes, sketched below. Note that the buffer is full when "in" is
one less than "out" in a circular sense, and that the buffer is empty when "in" is equal to "out".
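A minimal sketch of the two loops, following the circular-buffer convention above ( the "produce" and
"consume" steps are application-specific ):

item nextProduced;    /* producer's local variable */
while( true ) {
    /* produce an item and place it in nextProduced */
    while( ( ( in + 1 ) % BUFFER_SIZE ) == out )
        ;   /* buffer is full - do nothing */
    buffer[ in ] = nextProduced;
    in = ( in + 1 ) % BUFFER_SIZE;
}

item nextConsumed;    /* consumer's local variable */
while( true ) {
    while( in == out )
        ;   /* buffer is empty - do nothing */
    nextConsumed = buffer[ out ];
    out = ( out + 1 ) % BUFFER_SIZE;
    /* consume the item in nextConsumed */
}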
Message passing systems must support at a minimum system calls for "send message" and "receive
message".
A communication link must be established between the cooperating processes before messages can be
sent.
There are three key issues to be resolved in message passing systems as further explored in the next
three subsections:
o Direct or indirect communication ( naming )
o Synchronous or asynchronous communication
o Automatic or explicit buffering.
3.4.2.1 Naming
With direct communication the sender must know the name of the receiver to which it
wishes to send a message.
o There is a one-to-one link between every sender-receiver pair.
o For symmetric communication, the receiver must also know the specific name of the
sender from which it wishes to receive messages.
For asymmetric communications, this is not necessary.
Indirect communication uses shared mailboxes, or ports.
o Multiple processes can share the same mailbox or boxes.
o Only one process can read any given message in a mailbox. Initially the process that
creates the mailbox is the owner, and is the only one allowed to read mail in the
mailbox, although this privilege may be transferred.
( Of course the process that reads the message can immediately turn around
and place an identical message back in the box for someone else to read, but
that may put it at the back end of a queue of messages. )
3.4.2.2 Synchronization
Either the sending or the receiving of messages may be blocking ( synchronous ) or non-blocking
( asynchronous ), in any combination.
3.4.2.3 Buffering
Messages are passed via queues, which may have one of three capacity configurations:
1. Zero capacity - Messages cannot be stored in the queue, so senders must block until
receivers accept the messages.
2. Bounded capacity- There is a certain pre-determined finite capacity in the queue.
Senders must block if the queue is full, until space becomes available in the queue,
but may be either blocking or non-blocking otherwise.
3. Unbounded capacity - The queue has a theoretical infinite capacity, so senders are
never forced to block.
1. The first step in using shared memory is for one of the processes involved to allocate some shared
memory, using shmget:
o The first parameter specifies the key ( identifier ) of the segment. IPC_PRIVATE
creates a new shared memory segment.
o The second parameter indicates how big the shared memory segment is to be, in bytes.
o The third parameter is a set of bitwise ORed flags. In this case the segment is being
created for reading and writing.
o The return value of shmget is an integer identifier
2. Any process which wishes to use the shared memory must attach the shared memory to their address
space, using shmat:
o The first parameter specifies the key ( identifier ) of the segment that the process wishes to
attach to its address space
o The second parameter indicates where the process wishes to have the segment attached.
NULL indicates that the system should decide.
o The third parameter is a flag for read-only operation. Zero indicates read-write; one indicates
read-only.
o The return value of shmat is a void *, which the process can use ( type cast ) as appropriate.
In this example it is being used as a character pointer.
3. Then processes may access the memory using the pointer returned by shmat, for example using sprintf:
4. When a process no longer needs a piece of shared memory, it can be detached using shmdt:
shmdt( shared_memory );
5. And finally the process that originally allocated the shared memory can remove it from the system
using shmctl.
6. Figure 3.16 from the eighth edition illustrates a complete program implementing shared memory on a
POSIX system:
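A minimal sketch combining steps 1 through 5, in the spirit of that figure ( the segment size and the
message written here are arbitrary ):

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>

int main( void ) {
    const int size = 4096;                               /* segment size in bytes ( arbitrary )      */

    /* 1. Allocate a new shared memory segment, readable and writable by the owner. */
    int segment_id = shmget( IPC_PRIVATE, size, S_IRUSR | S_IWUSR );
    if( segment_id < 0 ) { perror( "shmget" ); exit( 1 ); }

    /* 2. Attach the segment to this process's address space; NULL lets the system choose where. */
    char *shared_memory = ( char * ) shmat( segment_id, NULL, 0 );
    if( shared_memory == ( char * ) -1 ) { perror( "shmat" ); exit( 1 ); }

    /* 3. Access the memory through the returned pointer. */
    sprintf( shared_memory, "Hi there!" );
    printf( "*%s*\n", shared_memory );

    /* 4. Detach the segment from this address space. */
    shmdt( shared_memory );

    /* 5. Remove the segment from the system. */
    shmctl( segment_id, IPC_RMID, NULL );
    return 0;
}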
Threads
4.1 Overview
A thread is a basic unit of CPU utilization, consisting of a program counter, a stack, and a set of registers,
( and a thread ID. )
Traditional ( heavyweight ) processes have a single thread of control - There is one program counter, and one
sequence of instructions that can be carried out at any given time.
4.1.1 Motivation
Threads are very useful in modern programming whenever a process has multiple tasks to perform
independently of the others.
This is particularly true when one of the tasks may block, and it is desired to allow the other tasks to proceed
without blocking.
For example in a word processor, a background thread may check spelling and grammar while a foreground
thread processes user input ( keystrokes ), while yet a third thread loads images from the hard drive, and a
fourth does periodic automatic backups of the file being edited.
Another example is a web server - Multiple threads allow for multiple requests to be satisfied simultaneously,
without having to service requests sequentially or to fork off separate processes for every incoming request.
( The latter is how this sort of thing was done before the concept of threads was developed. A daemon would
listen at a port, fork off a child for every incoming request to be processed, and then go back to listening to the
port. )
4.2 Multicore Programming
A recent trend in computer architecture is to produce chips with multiple cores, or CPUs on a single chip.
A multi-threaded application running on a traditional single-core chip would have to interleave the threads, as
shown in Figure 4.3. On a multi-core chip, however, the threads could be spread across the available cores,
allowing true parallel processing, as shown in Figure 4.4.
For operating systems, multi-core chips require new scheduling algorithms to make better use of the multiple
cores available.
As multi-threading becomes more pervasive and more important ( thousands instead of tens of threads ),
CPUs have been developed to support more simultaneous threads per core in hardware.
For application programmers, there are five areas where multi-core chips present new challenges:
1. Identifying tasks - Examining applications to find activities that can be performed concurrently.
2. Balance - Finding tasks to run concurrently that provide equal value. I.e. don't waste a thread on
trivial tasks.
3. Data splitting - To prevent the threads from interfering with one another.
4. Data dependency - If one task is dependent upon the results of another, then the tasks need to be
synchronized to assure access in the proper order.
5. Testing and debugging - Inherently more difficult in parallel processing situations, as the race
conditions become much more complex and difficult to identify.
1. Data parallelism divides the data up amongst multiple cores ( threads ), and performs the same task on each
subset of the data. For example dividing a large image up into pieces and performing the same digital image
processing on each piece on different cores.
2. Task parallelism divides the different tasks to be performed among the different cores and performs them
simultaneously.
In practice no program is ever divided up solely by one or the other of these, but instead by some sort of hybrid
combination.
There are two types of threads to be managed in a modern system: User threads and kernel threads.
User threads are supported above the kernel, without kernel support. These are the threads that application
programmers would put into their programs.
Kernel threads are supported within the kernel of the OS itself. All modern OSes support kernel level threads,
allowing the kernel to perform multiple simultaneous tasks and/or to service multiple kernel system calls
simultaneously.
In a specific implementation, the user threads must be mapped to kernel threads, using one of the following
strategies.
In the many-to-one model, many user-level threads are all mapped onto a single kernel thread.
Thread management is handled by the thread library in user space, which is very efficient.
However, if a blocking system call is made, then the entire process blocks, even if the other user threads
would otherwise be able to continue.
Because a single kernel thread can operate only on a single CPU, the many-to-one model does not allow
individual processes to be split across multiple CPUs.
Green threads ( an early Solaris thread library ) and GNU Portable Threads implemented the many-to-one
model in the past, but few systems continue to use it today.
The one-to-one model creates a separate kernel thread to handle each user thread.
One-to-one model overcomes the problems listed above involving blocking system calls and the splitting of
processes across multiple CPUs.
However the overhead of managing the one-to-one model is more significant: every user thread requires a
corresponding kernel thread, which adds work for the kernel and can slow down the system.
Most implementations of this model place a limit on how many threads can be created.
Linux and Windows from 95 to XP implement the one-to-one model for threads.
The many-to-many model multiplexes any number of user threads onto an equal or smaller number of kernel
threads, combining the best features of the one-to-one and many-to-one models.
Users have no restrictions on the number of threads created.
Blocking kernel system calls do not block the entire process.
Processes can be split across multiple processors.
Individual processes may be allocated variable numbers of kernel threads, depending on the number of CPUs
present and other factors.
One popular variation of the many-to-many model is the two-tier model, which allows either many-to-many
or one-to-one operation.
IRIX, HP-UX, and Tru64 UNIX use the two-tier model, as did Solaris prior to Solaris 9.
Thread libraries provide programmers with an API for creating and managing threads.
Thread libraries may be implemented either in user space or in kernel space. The former involves API
functions implemented solely within user space, with no kernel support. The latter involves system calls, and
requires a kernel with thread library support.
There are three main thread libraries in use today:
1. POSIX Pthreads - may be provided as either a user or kernel library, as an extension to the POSIX
standard.
2. Win32 threads - provided as a kernel-level library on Windows systems.
3. Java threads - Since Java generally runs on a Java Virtual Machine, the implementation of threads is
based upon whatever OS and hardware the JVM is running on, i.e. either Pthreads or Win32 threads
depending on the system.
The classic demonstration in all three systems is to calculate the sum of the integers from 0 to N in a
separate thread, storing the result in a variable "sum"; a Pthreads version of this example is sketched below.
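( A Pthreads sketch modeled on the textbook's summation program; compile with the -pthread option: )

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum;                                /* shared by the thread(s)           */

void *runner( void *param ) {           /* the thread begins control here    */
    int upper = atoi( ( char * ) param );
    sum = 0;
    for( int i = 1; i <= upper; i++ )
        sum += i;
    pthread_exit( 0 );
}

int main( int argc, char *argv[ ] ) {
    pthread_t tid;                      /* the thread identifier             */
    pthread_attr_t attr;                /* thread attributes                 */

    if( argc != 2 ) {
        fprintf( stderr, "usage: a.out <integer value>\n" );
        return -1;
    }

    pthread_attr_init( &attr );                          /* default attributes      */
    pthread_create( &tid, &attr, runner, argv[ 1 ] );    /* start the summing thread */
    pthread_join( tid, NULL );                           /* wait for it to finish    */

    printf( "sum = %d\n", sum );
    return 0;
}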
Process Synchronization
Warning: This chapter requires some heavy thought. As you read each of the algorithms below, you need to satisfy
yourself that they do indeed work under all conditions. Think about it, and don't just accept them at face value.
5.1 Background
Unfortunately we have now introduced a new problem, because both the producer and the consumer are
adjusting the value of the variable counter, which can lead to a condition known as a race condition. In this
condition a piece of code may or may not work correctly, depending on which of two simultaneous processes
executes first, and more importantly if one of the processes gets interrupted such that the other process runs
between important steps of the first process. ( Bank balance example discussed in class. )
The particular problem above comes from the producer executing "counter++" at the same time the consumer
is executing "counter--". If one process gets part way through making the update and then the other process
butts in, the value of counter can get left in an incorrect state.
But, you might say, "Each of those is a single instruction - how can it get interrupted halfway through?"
The answer is that although they are single instructions in C++, they are actually three steps each at the
hardware level: (1) Fetch counter from memory into a register, (2) increment or decrement the register, and (3)
Store the new value of counter back to memory. If the instructions from the two processes get interleaved,
there could be serious problems, such as illustrated by the following:
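( The interleaving below is the classic one from the text, assuming counter starts at 5: )

T0: producer executes   register1 = counter          { register1 = 5 }
T1: producer executes   register1 = register1 + 1    { register1 = 6 }
T2: consumer executes   register2 = counter          { register2 = 5 }
T3: consumer executes   register2 = register2 - 1    { register2 = 4 }
T4: producer executes   counter = register1          { counter = 6 }
T5: consumer executes   counter = register2          { counter = 4 }

The final value of counter is 4, even though after one item produced and one consumed it should still be 5.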
Exercise: What would be the resulting value of counter if the order of statements T4 and T5 were reversed?
( What should the value of counter be after one producer and one consumer, assuming the original value was
5? )
Note that race conditions are notoriously difficult to identify and debug, because by their very nature they
only occur on rare occasions, and only when the timing is just exactly right. ( or wrong! :-) ) Race conditions
are also very difficult to reproduce. :-(
Obviously the solution is to only allow one process at a time to manipulate the value of "counter". This is a very
common occurrence among cooperating processes, so let's look at some ways in which this is done, as well as
some classic problems in this area.
The producer-consumer problem described above is a specific example of a more general situation known as
the critical section problem. The general idea is that in a number of cooperating processes, each has a critical
section of code, with the following conditions and terminologies:
o Only one process in the group can be allowed to execute in their critical section at any one time. If
one process is already executing their critical section and another process wishes to do so, then the
second process must be made to wait until the first process has completed their critical section work.
o The code preceding the critical section, and which controls access to the critical section, is termed the
entry section. It acts like a carefully controlled locking door.
o The code following the critical section is termed the exit section. It generally releases the lock on
someone else's door, or at least lets the world know that they are no longer in their critical section.
o The rest of the code not included in either the critical section or the entry or exit sections is termed the
remainder section.
A solution to the critical section problem must satisfy the following three conditions:
1. Mutual Exclusion - Only one process at a time can be executing in their critical section.
2. Progress - If no process is currently executing in their critical section, and one or more processes
want to execute their critical section, then only the processes not in their remainder sections can
participate in the decision, and the decision cannot be postponed indefinitely. ( I.e. processes cannot
be blocked forever waiting to get into their critical sections. )
3. Bounded Waiting - There exists a limit as to how many other processes can get into their critical
sections after a process requests entry into their critical section and before that request is granted. ( I.e.
a process requesting entry into their critical section will get a turn eventually, and there is a limit as to
how many other processes get to go first. )
We assume that all processes proceed at a non-zero speed, but no assumptions can be made regarding
the relative speed of one process versus another.
Kernel processes can also be subject to race conditions, which can be especially problematic when updating
commonly shared kernel data structures such as open file tables or virtual memory management. Accordingly
kernels can take on one of two forms:
o Non-preemptive kernels do not allow processes to be interrupted while in kernel mode. This
eliminates the possibility of kernel-mode race conditions, but requires kernel mode operations to
complete very quickly, and can be problematic for real-time systems, because timing cannot be
guaranteed.
o Preemptive kernels allow for real-time operations, but must be carefully written to avoid race
conditions. This can be especially tricky on SMP systems, in which multiple kernel processes may be
running simultaneously on different processors.
Non-preemptive kernels include Windows XP, 2000, traditional UNIX, and Linux prior to 2.6; Preemptive
kernels include Linux 2.6 and later, and some commercial UNIXes such as Solaris and IRIX.
Peterson's Solution is a classic software-based solution to the critical section problem. It is unfortunately not
guaranteed to work on modern hardware, due to vagaries of load and store operations, but it illustrates a
number of important concepts.
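( For reference, a sketch of the structure of process Pi in Peterson's solution, where the two processes i and j
share int turn and boolean flag[ 2 ]: )

do {
    flag[ i ] = true;                    /* announce the desire to enter            */
    turn = j;                            /* but offer the turn to the other process */
    while( flag[ j ] && turn == j )
        ;                                /* busy wait while the other has the turn  */

    /* critical section */

    flag[ i ] = false;                   /* done - let the other process proceed    */

    /* remainder section */
} while( true );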
To prove that the solution is correct, we must examine the three conditions listed above:
1. Mutual exclusion - If one process is executing their critical section when the other wishes to do so,
the second process will become blocked by the flag of the first process. If both processes attempt to
enter at the same time, the last process to execute "turn = j" will be blocked.
2. Progress - Each process can only be blocked at the while if the other process wants to use the critical
section ( flag[ j ] == true ), AND it is the other process's turn to use the critical section ( turn == j ). If
both of those conditions are true, then the other process ( j ) will be allowed to enter the critical
section, and upon exiting the critical section, will set flag[ j ] to false, releasing process i. The shared
variable turn assures that only one process at a time can be blocked, and the flag variable allows one
process to release the other when exiting their critical section.
3. Bounded Waiting - As each process enters their entry section, they set the turn variable to be the
other process's turn. Since no process ever sets it back to their own turn, this ensures that each
process will have to let the other process go first at most one time before it becomes their turn again.
To generalize the solution(s) expressed above, each process when entering their critical section must set some
sort of lock, to prevent other processes from entering their critical sections simultaneously, and must release
the lock when exiting their critical section, to allow other processes to proceed. Obviously it must be possible
to attain the lock only when no other process has already set a lock. Specific implementations of this general
procedure can get quite complicated, and may include hardware solutions as outlined in this section.
One simple solution to the critical section problem is to simply prevent a process from being interrupted while
in their critical section, which is the approach taken by non preemptive kernels. Unfortunately this does not
work well in multiprocessor environments, due to the difficulty of disabling and re-enabling interrupts on
all processors. There is also a question as to how this approach affects timing if the clock interrupt is disabled.
Another approach is for hardware to provide certain atomic operations. These operations are guaranteed to
operate as a single instruction, without interruption. One such operation is the "Test and Set", which
simultaneously sets a boolean lock variable and returns its previous value, as shown in Figures 5.3 and 5.4:
Another variation on the test-and-set is an atomic swap of two booleans, as shown in Figures 5.5 and 5.6:
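A sketch of the test-and-set instruction and of a simple lock built from it, in the spirit of those figures
( lock is a shared boolean, initialized to false ):

boolean test_and_set( boolean *target ) {   /* executed atomically by the hardware */
    boolean rv = *target;
    *target = true;
    return rv;
}

do {
    while( test_and_set( &lock ) )
        ;                                   /* busy wait until the lock was free    */

    /* critical section */

    lock = false;                           /* release the lock                     */

    /* remainder section */
} while( true );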
The above examples satisfy the mutual exclusion requirement, but unfortunately do not guarantee bounded
waiting. If there are multiple processes trying to get into their critical sections, there is no guarantee of what
order they will enter, and any one process could have the bad luck to wait forever until they got their turn in
the critical section. ( Since there is no guarantee as to the relative rates of the processes, a very fast process
could theoretically release the lock, whip through their remainder section, and re-lock the lock before a slower
process got a chance. As more and more processes are involved vying for the same resource, the odds of a
slow process getting locked out completely increase. )
Figure 5.7 illustrates a solution using test-and-set that does satisfy this requirement, using two shared data
structures, boolean lock and boolean waiting[ N ], where N is the number of processes in contention for
critical sections:
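( A sketch of that algorithm for process i; lock and every entry of waiting[ ] start out false, while key and j
are local variables: )

do {
    waiting[ i ] = true;
    key = true;
    while( waiting[ i ] && key )
        key = test_and_set( &lock );       /* spin until granted entry                 */
    waiting[ i ] = false;

    /* critical section */

    j = ( i + 1 ) % n;                     /* look for the next waiting process        */
    while( ( j != i ) && !waiting[ j ] )
        j = ( j + 1 ) % n;
    if( j == i )
        lock = false;                      /* no one is waiting - release the lock     */
    else
        waiting[ j ] = false;              /* hand the critical section to process j   */

    /* remainder section */
} while( true );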
The key feature of the above algorithm is that a process blocks on the AND of the critical section being locked
and that this process is in the waiting state. When exiting a critical section, the exiting process does not just
unlock the critical section and let the other processes have a free-for-all trying to get in. Rather it first looks in
an orderly progression ( starting with the next process on the list ) for a process that has been waiting, and if it
finds one, then it releases that particular process from its waiting state, without unlocking the critical section,
thereby allowing a specific process into the critical section while continuing to block all the others. Only if
there are no other processes currently waiting is the general lock removed, allowing the next process to come
along access to the critical section.
Unfortunately, hardware level locks are especially difficult to implement in multi-processor architectures.
Discussion of such issues is left to books on advanced computer architecture.
The hardware solutions presented above are often difficult for ordinary programmers to access, particularly on
multi-processor machines, and particularly because they are often platform-dependent.
Therefore most systems offer a software API equivalent called mutex locks or simply mutexes. ( For mutual
exclusion )
The terminology when using mutexes is to acquire a lock prior to entering a critical section, and to release it
when exiting, as shown in Figure 5.8:
Just as with hardware locks, the acquire step will block the process if the lock is in use by another process, and
both the acquire and release operations are atomic.
Acquire and release can be implemented as shown here, based on a boolean variable "available":
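( A sketch, with available initially true; both functions must execute atomically: )

acquire( ) {
    while( !available )
        ;                  /* busy wait ( spinlock ) */
    available = false;
}

release( ) {
    available = true;
}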
One problem with the implementation shown here, ( and in the hardware solutions presented earlier ), is the
busy loop used to block processes in the acquire phase. These types of locks are referred to as spinlocks,
because the CPU just sits and spins while blocking the process.
5.6 Semaphores
A more robust alternative to simple mutexes is to use semaphores, which are integer variables for which only
two ( atomic ) operations are defined, the wait and signal operations, as shown in the following figure.
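( The classical busy-waiting definitions, both of which must execute atomically: )

wait( S ) {
    while( S <= 0 )
        ;          /* busy wait */
    S--;
}

signal( S ) {
    S++;
}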
Note that not only must the variable-changing steps ( S-- and S++ ) be indivisible, it is also necessary that for
the wait operation when the test proves false that there be no interruptions before S gets decremented. It IS
okay, however, for the busy loop to be interrupted when the test is true, which prevents the system from
hanging forever.
o Counting semaphores can take on any integer value, and are usually used to count the number
remaining of some limited resource. The counter is initialized to the number of such resources
available in the system, and whenever the counting semaphore is greater than zero, then a process can
enter a critical section and use one of the resources. When the counter gets to zero ( or negative in
some implementations ), then the process blocks until another process frees up a resource and
increments the counting semaphore with a signal call. ( The binary semaphore can be seen as just a
special case where the number of resources initially available is just one. )
o Semaphores can also be used to synchronize certain operations between processes. For example,
suppose it is important that process P1 execute statement S1 before process P2 executes statement S2.
First we create a semaphore named synch that is shared by the two processes, and initialize it
to zero.
Then in process P1 we insert the code:
    S1;
    signal( synch );
and in process P2 we insert the code:
    wait( synch );
    S2;
Because synch was initialized to 0, process P2 will block on the wait until after P1 executes
the call to signal.
The big problem with semaphores as described above is the busy loop in the wait call, which consumes CPU
cycles without doing any useful work. This type of lock is known as a spinlock, because the lock just sits
there and spins while it waits. While this is generally a bad thing, it does have the advantage of not invoking
context switches, and so it is sometimes used in multi-processing systems when the wait time is expected to be
short - One thread spins on one processor while another completes their critical section on another processor.
An alternative approach is to block a process when it is forced to wait for an available semaphore, and swap it
out of the CPU. In this implementation each semaphore needs to maintain a list of processes that are blocked
waiting for it, so that one of the processes can be woken up and swapped back in when the semaphore
becomes available.
Note that in this implementation the value of the semaphore can actually become negative, in which case its
magnitude is the number of processes waiting for that semaphore. This is a result of decrementing the counter
before checking its value.
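A sketch of this blocking implementation, along the lines of the text, where block( ) suspends the calling
process and wakeup( P ) moves process P back to the ready queue:

typedef struct {
    int value;
    struct process *list;      /* processes waiting on this semaphore  */
} semaphore;

wait( semaphore *S ) {
    S->value--;                /* decrement first, then check          */
    if( S->value < 0 ) {
        /* add this process to S->list */
        block( );              /* suspend the calling process          */
    }
}

signal( semaphore *S ) {
    S->value++;
    if( S->value <= 0 ) {
        /* remove a process P from S->list */
        wakeup( P );           /* move P to the ready queue            */
    }
}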
Key to the success of semaphores is that the wait and signal operations be atomic, that is no other process can
execute a wait or signal on the same semaphore at the same time. ( Other processes could be allowed to do
other things, including working with other semaphores, they just can't have access to this semaphore. ) On
single processors this can be implemented by disabling interrupts during the execution of wait and signal;
Multiprocessor systems have to use more complex methods, including the use of spinlocking.
Another problem to consider is that of starvation, in which one or more processes gets blocked forever, and
never get a chance to take their turn in the critical section. For example, in the semaphores above, we did not
specify the algorithms for adding processes to the waiting queue in the semaphore in the wait( ) call, or
selecting one to be removed from the queue in the signal( ) call. If the method chosen is a FIFO queue, then
every process will eventually get their turn, but if a LIFO queue is implemented instead, then the first process
to start waiting could starve.
A challenging scheduling problem arises when a high-priority process gets blocked waiting for a resource that
is currently held by a low-priority process.
If the low-priority process gets pre-empted by one or more medium-priority processes, then the high-priority
process is essentially made to wait for the medium priority processes to finish before the low-priority process
can release the needed resource, causing a priority inversion. If there are enough medium-priority processes,
then the high-priority process may be forced to wait for a very long time.
One solution is a priority-inheritance protocol, in which a low-priority process holding a resource for which a
high-priority process is waiting will temporarily inherit the high priority from the waiting process. This
prevents the medium-priority processes from preempting the low-priority process until it releases the resource,
thereby avoiding the priority inversion problem.
The following classic problems are used to test virtually every new proposed synchronization algorithm.
This is a generalization of the producer-consumer problem wherein access is controlled to a shared group of
buffers of a limited size.
In this solution, the two counting semaphores "full" and "empty" keep track of the current number of full and
empty buffers respectively ( and are initialized to 0 and N respectively. ) The binary semaphore mutex controls
access to the critical section, i.e. to the buffer itself.
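A sketch of the producer and consumer using these semaphores ( mutex initialized to 1 ):

/* Producer: */
do {
    /* produce an item in nextProduced */
    wait( empty );         /* wait for an empty buffer slot  */
    wait( mutex );         /* lock the buffer                */
    /* add nextProduced to the buffer */
    signal( mutex );       /* unlock the buffer              */
    signal( full );        /* one more full slot             */
} while( true );

/* Consumer: */
do {
    wait( full );          /* wait for a full buffer slot    */
    wait( mutex );         /* lock the buffer                */
    /* remove an item from the buffer into nextConsumed */
    signal( mutex );       /* unlock the buffer              */
    signal( empty );       /* one more empty slot            */
    /* consume the item in nextConsumed */
} while( true );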
Some hardware implementations provide specific reader-writer locks, which are accessed using an argument
specifying whether access is requested for reading or writing. The use of reader-writer locks is beneficial for
situations in which: (1) processes can be easily identified as either readers or writers, and (2) there are
significantly more readers than writers, making the additional overhead of the reader-writer lock pay off in
terms of increased concurrency of the readers.
One possible solution, as shown in the following code section, is to use a set of five semaphores
( chopsticks[ 5 ] ), and to have each hungry philosopher first wait on their left chopstick ( chopsticks[ i ] ), and
then wait on their right chopstick ( chopsticks[ ( i + 1 ) % 5 ] ), as sketched below:
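( All five chopstick semaphores are initialized to 1: )

do {
    wait( chopsticks[ i ] );                  /* pick up the left chopstick   */
    wait( chopsticks[ ( i + 1 ) % 5 ] );      /* pick up the right chopstick  */

    /* eat for a while */

    signal( chopsticks[ i ] );                /* put down the left chopstick  */
    signal( chopsticks[ ( i + 1 ) % 5 ] );    /* put down the right chopstick */

    /* think for a while */
} while( true );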
But suppose that all five philosophers get hungry at the same time, and each starts by picking up their left
chopstick. They then look for their right chopstick, but because it is unavailable, they wait for it, forever, and
eventually all the philosophers starve due to the resulting deadlock.
5.8 Monitors
Semaphores can be very useful for solving concurrency problems, but only if programmers use them
properly. If even one process fails to abide by the proper use of semaphores, either accidentally or deliberately,
then the whole system breaks down. ( And since concurrency problems are by definition rare events, the
problem code may easily go unnoticed and/or be heinous to debug. )
For this reason a higher-level language construct has been developed, called monitors.
A monitor is essentially a class, in which all data is private, and with the special restriction that only one
method within any given monitor object may be active at the same time. An additional restriction is that
monitor methods may only access the shared data within the monitor and any data passed to them as
parameters. I.e. they cannot access any data external to the monitor.
Figure 5.16 shows a schematic of a monitor, with an entry queue of processes waiting their turn to execute
monitor operations ( methods. )
In order to fully realize the potential of monitors, we need to introduce one additional new data type, known as
a condition.
o A variable of type condition has only two legal operations, wait and signal. I.e. if X was defined as
type condition, then legal operations would be X.wait( ) and X.signal( )
o The wait operation blocks a process until some other process calls signal, and adds the blocked
process onto a list associated with that condition.
o The signal operation does nothing if there are no processes waiting on that condition. Otherwise it
wakes up exactly one process from the condition's list of waiting processes. ( Contrast this with
counting semaphores, which always affect the semaphore on a signal call. )
Figure 6.18 below illustrates a monitor that includes condition variables within its data space. Note that the
condition variables, along with the list of processes currently waiting for the conditions, are in the data space
of the monitor - The processes on these lists are not "in" the monitor, in the sense that they are not executing
any code in the monitor.
But now there is a potential problem - If process P within the monitor issues a signal that would wake up
process Q also within the monitor, then there would be two processes running simultaneously within the
monitor, violating the exclusion requirement. Accordingly there are two possible solutions to this dilemma:
Signal and wait - When process P issues the signal to wake up process Q, P then waits, either for Q to leave the
monitor or on some other condition.
Signal and continue - When P issues the signal, Q waits, either for P to exit the monitor or for some other condition.
There are arguments for and against either choice. Concurrent Pascal offers a third alternative - The signal call causes
the signaling process to immediately exit the monitor, so that the waiting process can then wake up and proceed.
Java and C# ( C sharp ) offer monitors built in to the language. Erlang offers similar but different constructs.
This solution to the dining philosophers uses monitors, and the restriction that a philosopher may only pick up
chopsticks when both are available. There are also two key data structures in use in this solution:
1. enum { THINKING, HUNGRY, EATING } state[ 5 ]; A philosopher may only set their state to
eating when neither of their adjacent neighbors is eating. ( state[ ( i + 1 ) % 5 ] != EATING &&
state[ ( i + 4 ) % 5 ] != EATING ).
2. condition self[ 5 ]; This condition is used to delay a hungry philosopher who is unable to acquire
chopsticks.
In the following solution philosophers share a monitor, DiningPhilosophers, and eat using the following
sequence of operations:
1. DiningPhilosophers.pickup( ) - Acquires the chopsticks, which may block the process.
2. eat
3. DiningPhilosophers.putdown( ) - Releases the chopsticks.
One possible implementation of a monitor uses a semaphore "mutex" to control mutual exclusionary access to
the monitor, and a counting semaphore "next" on which processes can suspend themselves after they are
already "inside" the monitor ( in conjunction with condition variables, see below. ) The integer next_count
keeps track of how many processes are waiting in the next queue. Externally accessible monitor processes are
then implemented as:
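( A sketch of the wrapper around each externally accessible monitor function F, with mutex initialized to 1
and next to 0: )

wait( mutex );              /* gain exclusive access to the monitor           */
    . . .
    /* body of F */
    . . .
if( next_count > 0 )
    signal( next );         /* resume a process suspended inside the monitor  */
else
    signal( mutex );        /* otherwise release the monitor                  */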
Condition variables can be implemented using semaphores as well. For a condition x, a semaphore "x_sem"
and an integer "x_count" are introduced, both initialized to zero. The wait and signal methods are then
implemented as follows. ( This approach to the condition implements the signal-and-wait option described
above for ensuring that only one process at a time is active inside the monitor. )
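( A sketch of the resulting operations on a condition x: )

/* x.wait( ): */
x_count++;
if( next_count > 0 )
    signal( next );         /* let a process suspended inside the monitor run */
else
    signal( mutex );        /* or else release the monitor                    */
wait( x_sem );              /* block until some process signals x             */
x_count--;

/* x.signal( ): */
if( x_count > 0 ) {
    next_count++;
    signal( x_sem );        /* wake a process waiting on x                    */
    wait( next );           /* and suspend ourselves ( signal-and-wait )      */
    next_count--;
}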
When there are multiple processes waiting on the same condition within a monitor, how does one decide
which one to wake up in response to a signal on that condition? One obvious approach is FCFS, and this may
be suitable in many cases.
Another alternative is to assign ( integer ) priorities, and to wake up the process with the smallest ( best )
priority.
Figure 5.19 illustrates the use of such a condition within a monitor used for resource allocation. Processes
wishing to access this resource must specify the time they expect to use it using the acquire( time ) method,
and must call the release( ) method when they are done with the resource.
Unfortunately the use of monitors to restrict access to resources still only works if programmers make the
requisite acquire and release calls properly. One option would be to place the resource allocation code into the
monitor, thereby eliminating the option for programmers to bypass or ignore the monitor, but then that would
substitute the monitor's scheduling algorithms for whatever other scheduling algorithms may have been
chosen for that particular resource. Chapter 14 on Protection presents more advanced methods for enforcing
"nice" cooperation among processes contending for shared resources.
Concurrent Pascal, Mesa, C#, and Java all implement monitors as described here. Erlang provides
concurrency support using a similar mechanism
References:
1. Abraham Silberschatz, Greg Gagne, and Peter Baer Galvin, "Operating System Concepts, Ninth Edition ",
Chapter 6
Almost all programs have some alternating cycle of CPU number crunching and waiting for I/O of some kind.
( Even a simple fetch from memory takes a long time relative to CPU speeds. )
In a simple system running a single process, the time spent waiting for I/O is wasted, and those CPU cycles
are lost forever.
A scheduling system allows one process to use the CPU while another is waiting for I/O, thereby making full
use of otherwise lost CPU cycles.
The challenge is to make the overall system as "efficient" and "fair" as possible, subject to varying and often
dynamic conditions, and where "efficient" and "fair" are somewhat subjective terms, often subject to shifting
priority policies.
Almost all processes alternate between two states in a continuing cycle, as shown in Figure 6.1 below :
o A CPU burst of performing calculations, and
o An I/O burst, waiting for data transfer in or out of the system.
CPU bursts vary from process to process, and from program to program, but an extensive study shows
frequency patterns similar to that shown in Figure 6.2:
Whenever the CPU becomes idle, it is the job of the CPU Scheduler ( a.k.a. the short-term scheduler ) to
select another process from the ready queue to run next.
The storage structure for the ready queue and the algorithm used to select the next process are not
necessarily those of a simple FIFO queue. There are several alternatives to choose from, as well as numerous adjustable parameters for
each algorithm, which is the basic subject of this entire chapter.
6.1.4 Dispatcher
The dispatcher is the module that gives control of the CPU to the process selected by the scheduler. This
function involves:
o Switching context.
o Switching to user mode.
o Jumping to the proper location in the newly loaded program.
The dispatcher needs to be as fast as possible, as it is run on every context switch. The time consumed by the
dispatcher is known as dispatch latency.
There are several different criteria to consider when trying to select the "best" scheduling algorithm for a
particular situation and environment, including:
o CPU utilization - Ideally the CPU would be busy 100% of the time, so as to waste 0 CPU cycles. On
a real system CPU usage should range from 40% ( lightly loaded ) to 90% ( heavily loaded. )
o Throughput - Number of processes completed per unit time. May range from 10 / second to 1 / hour
depending on the specific processes.
o Turnaround time - Time required for a particular process to complete, from submission time to
completion. ( Wall clock time. )
o Waiting time - How much time processes spend in the ready queue waiting their turn to get on the
CPU.
( Load average - The average number of processes sitting in the ready queue waiting their
turn to get into the CPU. Reported in 1-minute, 5-minute, and 15-minute averages by
"uptime" and "w". )
o Response time - The time taken in an interactive program from the issuance of a command to
the beginning of a response to that command.
In general one wants to optimize the average value of a criterion ( maximize CPU utilization and throughput,
and minimize all the others. ) However sometimes one wants to do something different, such as to minimize
the maximum response time.
Sometimes it is more desirable to minimize the variance of a criterion than its actual value. I.e. users are more
accepting of a consistent, predictable system than of an inconsistent one, even if it is a little bit slower.
The following subsections will explain several common scheduling strategies, looking at only a single CPU burst each
for a small number of processes. Obviously real systems have to deal with a lot more simultaneous processes
executing their CPU-I/O burst cycles.
Process    Burst Time
  P1           24
  P2            3
  P3            3
In the first Gantt chart below, process P1 arrives first. The average waiting time for the three processes is ( 0 +
24 + 27 ) / 3 = 17.0 ms.
In the second Gantt chart below, the same three processes have an average wait time of ( 0 + 3 + 6 ) / 3 = 3.0
ms. The total run time for the three bursts is the same, but in the second case two of the three finish much
quicker, and the other process is only delayed by a short amount.
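Reconstructed from the waiting times above, the two Gantt charts look roughly like this:

| P1                      | P2 | P3 |
0                        24   27   30       ( waits: P1 = 0, P2 = 24, P3 = 27 )

| P2 | P3 | P1                      |
0    3    6                         30      ( waits: P1 = 6, P2 = 0, P3 = 3 )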
FCFS can also block the system in a busy dynamic system in another way, known as the convoy effect. When
one CPU intensive process blocks the CPU, a number of I/O intensive processes can get backed up behind it,
leaving the I/O devices idle. When the CPU hog finally relinquishes the CPU, then the I/O processes pass
through the CPU quickly, leaving the CPU idle while everyone queues up for I/O, and then the cycle repeats
itself when the CPU intensive process gets back to the ready queue.
The idea behind the SJF algorithm is to pick the quickest fastest little job that needs to be done, get it out of
the way first, and then pick the next smallest fastest job to do next.
( Technically this algorithm picks a process based on the next shortest CPU burst, not the overall process
time. )
For example, the Gantt chart below is based upon the following CPU burst times, ( and the assumption that all
jobs arrive at the same time. )
Process    Burst Time
  P1            6
  P2            8
  P3            7
  P4            3
In the case above the average wait time is ( 0 + 3 + 9 + 16 ) / 4 = 7.0 ms, ( as opposed to 10.25 ms for FCFS
for the same processes. )
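The corresponding SJF schedule:

| P4 | P1     | P3      | P2       |
0    3        9        16         24       ( waits: P1 = 3, P2 = 16, P3 = 9, P4 = 0 )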
SJF can be proven to be optimal, in that it gives the minimum average waiting time, but it suffers from one important problem: How do
you know how long the next CPU burst is going to be?
o For long-term batch jobs this can be done based upon the limits that users set for their jobs when they
submit them, which encourages them to set low limits, but risks their having to re-submit the job if
they set the limit too low. However that does not work for short-term CPU scheduling on an
interactive system.
o Another option would be to statistically measure the run time characteristics of jobs, particularly if the
same tasks are run repeatedly and predictably. But once again that really isn't a viable option for short
term CPU scheduling in the real world.
o A more practical approach is to predict the length of the next burst, based on some historical
measurement of recent burst times for this process. One simple, fast, and relatively accurate method is
the exponential average, which can be defined as follows. ( The book uses tau and t for their variables,
but those are hard to distinguish from one another and don't work well in HTML. )
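In the same spirit, writing estimate[ i ] for the predicted burst and burst[ i ] for the measured burst, the
exponential average is:
      estimate[ i + 1 ] = alpha * burst[ i ] + ( 1.0 - alpha ) * estimate[ i ]
where alpha lies between 0.0 and 1.0.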
o In this scheme the previous estimate contains the history of all previous times, and alpha serves as a
weighting factor for the relative importance of recent data versus past history. If alpha is 1.0, then past
history is ignored, and we assume the next burst will be the same length as the last burst. If alpha is
0.0, then all measured burst times are ignored, and we just assume a constant burst time. Most
commonly alpha is set at 0.5, as illustrated in Figure 5.3:
SJF can be either preemptive or non-preemptive. Preemption occurs when a new process arrives in the ready
queue that has a predicted burst time shorter than the time remaining in the process whose burst is currently on
the CPU. Preemptive SJF is sometimes referred to as shortest remaining time first scheduling.
For example, the following Gantt chart is based upon the following data:
Process    Arrival Time    Burst Time
  P1             0              8
  P2             1              4
  P3             2              9
  P4             3              5
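Under preemptive SJF ( shortest remaining time first ) these processes run as follows:

| P1 | P2       | P4        | P1             | P3                |
0    1          5          10              17                 26
Waiting times: P1 = 9, P2 = 0, P3 = 15, P4 = 2, for an average of 26 / 4 = 6.5 ms.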
Priority scheduling is a more general case of SJF, in which each job is assigned a priority and the job with the
highest priority gets scheduled first. ( SJF uses the inverse of the next expected burst time as its priority - The
smaller the expected burst, the higher the priority. )
Note that in practice, priorities are implemented using integers within a fixed range, but there is no agreed-
upon convention as to whether "high" priorities use large numbers or small numbers. This book uses low
numbers for high priorities, with 0 being the highest possible priority.
For example, the following Gantt chart is based upon these process burst times and priorities, and yields an
average waiting time of 8.2 ms:
Process    Burst Time    Priority
  P1           10            3
  P2            1            1
  P3            2            4
  P4            1            5
  P5            5            2
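Scheduling the jobs in priority order ( lowest number first ) gives:

| P2 | P5    | P1           | P3  | P4 |
0    1       6             16    18   19
Waiting times: P1 = 6, P2 = 0, P3 = 16, P4 = 18, P5 = 1, for an average of 41 / 5 = 8.2 ms.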
Priorities can be assigned either internally or externally. Internal priorities are assigned by the OS using
criteria such as average burst time, ratio of CPU to I/O activity, system resource use, and other factors
available to the kernel. External priorities are assigned by users, based on the importance of the job, fees paid,
politics, etc.
Priority scheduling can be either preemptive or non-preemptive.
Priority scheduling can suffer from a major problem known as indefinite blocking, or starvation, in which a
low-priority task can wait forever because there are always some other jobs around that have higher priority.
o If this problem is allowed to occur, then processes will either run eventually when the system load
lightens ( at say 2:00 a.m. ), or will eventually get lost when the system is shut down or crashes.
( There are rumors of jobs that have been stuck for years. )
o One common solution to this problem is aging, in which priorities of jobs increase the longer they
wait. Under this scheme a low-priority job will eventually get its priority raised high enough that it
gets run.
Round robin scheduling is similar to FCFS scheduling, except that each CPU burst is assigned a limit called
the time quantum.
When a process is given the CPU, a timer is set for whatever value has been set for a time quantum.
o If the process finishes its burst before the time quantum timer expires, then it is swapped out of the
CPU just like the normal FCFS algorithm.
o If the timer goes off first, then the process is preempted, swapped out of the CPU, and moved to the
back end of the ready queue.
For example, consider again the following three processes:
Process    Burst Time
  P1           24
  P2            3
  P3            3
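With a time quantum of 4 ms ( the value used in the classic textbook example; these notes do not state one
here ), the round robin schedule would be:

| P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
0    4    7   10   14   18   22   26   30
Waiting times: P1 = 6, P2 = 4, P3 = 7, for an average of 17 / 3 ≈ 5.66 ms.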
The performance of RR is sensitive to the time quantum selected. If the quantum is large enough, then RR
reduces to the FCFS algorithm; if it is very small, then each process gets 1/nth of the processor time and the
processes share the CPU equally.
BUT, a real system invokes overhead for every context switch, and the smaller the time quantum the more
context switches there are. ( See Figure 6.4 below. ) Most modern systems use time quantum between 10 and
100 milliseconds, and context switch times on the order of 10 microseconds, so the overhead is small relative
to the time quantum.
Figure 6.4 - The way in which a smaller time quantum increases context switches.
Turnaround time also varies with the time quantum, in a non-obvious manner. Consider, for example, the
processes shown in Figure 6.5:
Figure 6.5 - The way in which turnaround time varies with the time quantum.
In general, turnaround time is minimized if most processes finish their next cpu burst within one time
quantum. For example, with three processes of 10 ms bursts each, the average turnaround time for 1 ms
quantum is 29, and for 10 ms quantum it reduces to 20. However, if it is made too large, then RR just
degenerates to FCFS. A rule of thumb is that 80% of CPU bursts should be smaller than the time quantum.
When processes can be readily categorized, then multiple separate queues can be established, each
implementing whatever scheduling algorithm is most appropriate for that type of job, and/or with different
parametric adjustments.
Scheduling must also be done between queues, that is scheduling one queue to get time relative to other
queues. Two common options are strict priority ( no job in a lower priority queue runs until all higher priority
queues are empty ) and round-robin ( each queue gets a time slice in turn, possibly of different sizes. )
Note that under this algorithm jobs cannot switch from queue to queue - Once they are assigned a queue, that
is their queue until they finish.
Multilevel feedback queue scheduling is similar to the ordinary multilevel queue scheduling described above,
except jobs may be moved from one queue to another for a variety of reasons:
o If the characteristics of a job change between CPU-intensive and I/O intensive, then it may be
appropriate to switch a job from one queue to another.
o Aging can also be incorporated, so that a job that has waited for a long time can get bumped up into a
higher priority queue for a while.
Multilevel feedback queue scheduling is the most flexible, because it can be tuned for any situation. But it is
also the most complex to implement because of all the adjustable parameters. Some of the parameters which
define one of these systems include:
o The number of queues.
o The scheduling algorithm for each queue.
o The methods used to upgrade or demote processes from one queue to another. ( Which may be
different. )
o The method used to determine which queue a process enters initially.
Contention scope refers to the scope in which threads compete for the use of physical CPUs.
On systems implementing many-to-one and many-to-many threads, Process Contention Scope, PCS, occurs,
because competition occurs between threads that are part of the same process. ( This is the management /
scheduling of multiple user threads on a single kernel thread, and is managed by the thread library. )
System Contention Scope, SCS, involves the system scheduler scheduling kernel threads to run on one or
more CPUs. Systems implementing one-to-one threads ( XP, Solaris 9, Linux ) use only SCS.
PCS scheduling is typically done with priority, where the programmer can set and/or change the priority of
threads created by his or her programs. Even time slicing is not guaranteed among threads of equal priority.