Unit-5 Part1

CS9222 Advanced Operating Systems


Unit – V

Professor & Head/IT - VCEW
Unit - V
Structures – Design Issues – Threads – Process
Synchronization – Processor Scheduling – Memory
Management – Reliability / Fault Tolerance; Database
Operating Systems – Introduction – Concurrency
Control – Distributed Database Systems –
Concurrency Control Algorithms.
Motivation for Multiprocessors
Enhanced Performance -
Concurrent execution of tasks for increased
throughput (between processes)
Exploit Concurrency in Tasks (Parallelism
within process)
Fault Tolerance -
graceful degradation in face of failures
Basic MP Architectures
Single Instruction Single Data (SISD) -
conventional uniprocessor designs.
Single Instruction Multiple Data (SIMD) -
Vector and Array Processors
Multiple Instruction Single Data (MISD) -
Not Implemented.
Multiple Instruction Multiple Data (MIMD)
- conventional MP designs
MIMD Classifications
Tightly Coupled System - all processors
share the same global memory and have
the same address space (typical SMP system).
Main memory is used for IPC and synchronization.
Loosely Coupled System - memory is
partitioned and attached to each processor.
Hypercube, Clusters (Multi-Computer).
Message passing for IPC and synchronization.
MP Block Diagram
[Figure: four cache/MMU units attached to an Interconnection Network.]
Memory Access Schemes
• Uniform Memory Access (UMA)
– Centrally located
– All processors are equidistant (access times)
• Non-Uniform Memory Access (NUMA)
– physically partitioned but accessible by all
– processors have the same address space
• No Remote Memory Access (NORMA)
– physically partitioned, not accessible by all
– processors have own address space
Other Details of MP
Interconnection technology
Cross-Bar switch
Multistage Interconnect Network
Caching - Cache Coherence Problem!
bus snooping
MP OS Structure - 1

Separate Supervisor -
all processors have their own copy of the kernel.
Some share data for interaction
dedicated I/O devices and file systems
good fault tolerance
bad for concurrency
MP OS Structure - 2

• Master/Slave Configuration
– master monitors the status and assigns work to
other processors (slaves)
– Slaves are a schedulable pool of resources for
the master
– master can be bottleneck
– poor fault tolerance
MP OS Structure - 3
Symmetric Configuration - Most Flexible.
all processors are autonomous, treated equal
one copy of the kernel executed concurrently
across all processors
Synchronize access to shared data structures:
Lock entire OS - Floating Master
Mitigated by dividing OS into segments that normally
have little interaction
multithread kernel and control access to resources
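MP Overview
[Figure: classification tree. Multiprocessors divide into Shared Memory (tightly coupled) and Distributed Memory (loosely coupled). Shared Memory covers the Master/Slave and Symmetric configurations; Distributed Memory covers Clusters.]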

SMP OS Design Issues
Threads - effectiveness of parallelism depends
on performance of primitives used to express
and control concurrency.
Process Synchronization - disabling interrupts
is not sufficient.
Process Scheduling - efficient, policy controlled,
task scheduling (process/threads)
global versus per CPU scheduling
Task affinity for a particular CPU
resource accounting and intra-task thread
SMP OS design issues - 2

Memory Management - complicated since
main memory is shared by possibly many
processors. Each processor must maintain its
own map tables for each process
cache coherence
memory access synchronization
balancing overhead with increased concurrency
Reliability and fault Tolerance - degrade
gracefully in the event of failures
Typical SMP System
[Figure: four cache/MMU units on a System/Memory Bus shared with Main Memory (50ns), INT, System Functions (timer, BIOS, reset), and a Bridge to the I/O subsystem (ether, video).]
• Memory contention
• Limited bus BW
• I/O contention
• Cache coherence
Typical I/O Bus:
• 33MHz/32bit (132MB/s)
• 66MHz/64bit (528MB/s)
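The two I/O bus figures above follow from clock rate times bus width. A minimal sketch of that arithmetic, assuming one full-width transfer per clock cycle and 1 MB = 10^6 bytes (the function name is hypothetical, not from the slides):

```c
#include <assert.h>

/* Peak transfer rate of a parallel bus in MB/s (1 MB = 10^6 bytes),
 * assuming one transfer of the full bus width on every clock cycle. */
unsigned bus_peak_mb_per_s(unsigned clock_mhz, unsigned width_bits)
{
    assert(width_bits % 8 == 0);          /* whole bytes only */
    return clock_mhz * (width_bits / 8);  /* MHz x bytes per transfer */
}
```

For example, bus_peak_mb_per_s(33, 32) gives 33 x 4 = 132 and bus_peak_mb_per_s(66, 64) gives 66 x 8 = 528, matching the slide. These are peak rates; contention on a shared bus lowers what each processor actually sees.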
Some Definitions
Parallelism: degree to which a multiprocessor
application achieves parallel execution
Concurrency: Maximum parallelism an
application can achieve with unlimited processors
System Concurrency: kernel recognizes multiple
threads of control in a program
User Concurrency: User space threads
(coroutines) provide a natural programming
model for concurrent applications. Concurrency
not supported by system.
Process and Threads
Process: encompasses
set of threads (computational entities)
collection of resources
Thread: Dynamic object representing an
execution path and computational state.
threads have their own computational state: PC,
stack, user registers and private data
Remaining resources are shared amongst threads
in a process
Effectiveness of parallel computing depends on
the performance of the primitives used to
express and control parallelism
Threads separate the notion of execution from
the Process abstraction
Useful for expressing the intrinsic concurrency
of a program regardless of resulting performance
Three types: User threads, kernel threads and
Light Weight Processes (LWP)
User Level Threads

User level threads - supported by user level
(thread) library
no modifications required to kernel
flexible and low cost
cannot block without blocking entire process
no parallelism (not recognized by kernel)
Kernel Level Threads
Kernel level threads - kernel directly supports
multiple threads of control in a process. Thread
is the basic scheduling entity
coordination between scheduling and synchronization
less overhead than a process
suitable for parallel application
more expensive than user-level threads
generality leads to greater overhead
Light Weight Processes (LWP)

Kernel supported user thread

Each LWP is bound to one kernel thread.
a kernel thread may not be bound to an LWP
LWP is scheduled by kernel
User threads scheduled by library onto LWPs
Multiple LWPs per process
First Class threads (Psyche OS)
Thread operations in user space:
create, destroy, synch, context switch
kernel threads implement a virtual processor
Coarse grain in kernel - preemptive scheduling
Communication between kernel and threads library
shared data structures.
Software interrupts (user upcalls or signals). Example, for
scheduling decisions and preemption warnings.
Kernel scheduler interface - allows dissimilar thread
packages to coordinate.
Scheduler Activations
An activation:
serves as execution context for running thread
notifies thread of kernel events (upcall)
space for kernel to save processor context of current
user thread when stopped by kernel
kernel is responsible for processor allocation =>
preemption by kernel.
Thread package responsible for scheduling
threads on available processors (activations)
Support for Threading
• BSD:
– process model only. 4.4 BSD enhancements.
• Solaris: provides
– user threads, kernel threads and LWPs
• Mach: supports
– kernel threads and tasks. Thread libraries provide
semantics of user threads, LWPs and kernel threads.
• Digital UNIX: extends Mach to provide usual
UNIX semantics.
– Pthreads library.
Process Synchronization: Motivation
Sequential execution runs correctly but
concurrent execution (of the same program)
runs incorrectly.
Concurrent access to shared data may result in
data inconsistency
Maintaining data consistency requires
mechanisms to ensure the orderly execution of
cooperating processes
Let’s look at an example: consumer-producer
Producer-Consumer Problem
count: the number of items in the buffer (initialized to 0)

Producer:
while (true) {
    /* produce an item and put it in nextProduced */
    while (count == BUFFER_SIZE)
        ; // do nothing
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}

Consumer:
while (true) {
    while (count == 0)
        ; // do nothing
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    /* consume the item in nextConsumed */
}

What can go wrong in concurrent execution?
Race Condition
• count++ could be implemented as
    register1 = count
    register1 = register1 + 1
    count = register1
• count-- could be implemented as
    register2 = count
    register2 = register2 - 1
    count = register2
• Consider this execution interleaving with “count = 5” initially:
    S0: producer executes register1 = count {register1 = 5}
    S1: producer executes register1 = register1 + 1 {register1 = 6}
    S2: consumer executes register2 = count {register2 = 5}
    S3: consumer executes register2 = register2 - 1 {register2 = 4}
    S4: producer executes count = register1 {count = 6}
    S5: consumer executes count = register2 {count = 4}

What are all possible values from concurrent execution?
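One way to explore the question above is to replay interleavings deterministically. The sketch below is a single-threaded simulation, not real concurrent code. Each letter in the schedule string runs the next register-level step of the producer ('P') or the consumer ('C'). The Machine type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int count;           /* the shared variable */
    int register1;       /* producer's private register */
    int register2;       /* consumer's private register */
    int p_step, c_step;  /* next micro-step of each process */
} Machine;

/* Run the next of the three steps that make up count++. */
static void producer_step(Machine *m)
{
    switch (m->p_step++) {
    case 0: m->register1 = m->count;         break;
    case 1: m->register1 = m->register1 + 1; break;
    case 2: m->count = m->register1;         break;
    default: break;      /* count++ has already finished */
    }
}

/* Run the next of the three steps that make up count--. */
static void consumer_step(Machine *m)
{
    switch (m->c_step++) {
    case 0: m->register2 = m->count;         break;
    case 1: m->register2 = m->register2 - 1; break;
    case 2: m->count = m->register2;         break;
    default: break;      /* count-- has already finished */
    }
}

/* Replay one interleaving, e.g. "PPCCPC", and return the final count. */
int run_schedule(int initial_count, const char *schedule)
{
    Machine m = { initial_count, 0, 0, 0, 0 };
    for (size_t k = 0; schedule[k] != '\0'; k++) {
        if (schedule[k] == 'P')
            producer_step(&m);
        else if (schedule[k] == 'C')
            consumer_step(&m);
    }
    return m.count;
}
```

With count = 5, the schedule "PPCCPC" is exactly S0 to S5 and ends with count = 4. Swapping the last two steps ("PPCCCP") ends with 6, and running the producer to completion first ("PPPCCC") ends with the correct value 5.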

How to prevent race condition?
• Define a critical section in each process
  – Reading and writing common variables.
• Make sure that only one process can execute in the critical section at a time.
• What sync code to put into the entry & exit sections to prevent race condition?
do {
    entry section
    critical section
    exit section
    remainder section
} while (TRUE);
Solution to Critical-Section
1. Mutual Exclusion - If process Pi is executing in its critical section, then no
other processes can be executing in their critical sections
2. Progress - If no process is executing in its critical section and there exist
some processes that wish to enter their critical section, then the
selection of the processes that will enter the critical section next cannot
be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of times that
other processes are allowed to enter their critical sections after a process
has made a request to enter its critical section and before that request is
granted

What is the difference between Progress and Bounded Waiting?
Peterson’s Solution
Simple 2-process solution
Assume that the LOAD and STORE instructions are
atomic; that is, cannot be interrupted.
The two processes share two variables:
int turn;
Boolean flag[2]
The variable turn indicates whose turn it is to enter
the critical section.
The flag array is used to indicate if a process is ready
to enter the critical section. flag[i] = true implies that
process Pi is ready!
Algorithm for Process Pi
while (true) {
    // entry section
    flag[i] = TRUE;
    turn = j;
    while (flag[j] && turn == j)
        ;
    // critical section
    // exit section
    flag[i] = FALSE;
    // remainder section
}
• Mutual exclusion
  – Only one process enters the critical section at a time.
  – Proof: can both processes pass the while loop (and enter the critical section) at the same time?
• Progress
  – Selection of the waiting-to-enter-critical-section process does not block.
  – Proof: can Pi wait at the while loop forever (after Pj leaves the critical section)?
• Bounded Waiting
  – Limited time in waiting for other processes.
  – Proof: can Pj win the critical section twice while Pi waits?
Algorithm for Process Pi
Process Pi:
while (true) {
    // entry section
    flag[i] = TRUE;
    turn = j;
    while (flag[j] && turn == j)
        ;
    // critical section
    // exit section
    flag[i] = FALSE;
}
Process Pj:
while (true) {
    // entry section
    flag[j] = TRUE;
    turn = i;
    while (flag[i] && turn == i)
        ;
    // critical section
    // exit section
    flag[j] = FALSE;
}
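The slides assume atomic LOAD and STORE. On modern hardware and compilers, plain variables do not give that guarantee, so the sketch below uses C11 sequentially consistent atomics to stand in for it. The function names peterson_lock and peterson_unlock are hypothetical; the logic follows the entry and exit sections above for two processes numbered 0 and 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag[2];  /* flag[i]: process i wants to enter (starts false) */
static atomic_int  turn;     /* whose turn it is to go first on a tie */

/* Entry section for process i (i is 0 or 1). */
void peterson_lock(int i)
{
    assert(i == 0 || i == 1);
    int j = 1 - i;                          /* the other process */
    atomic_store(&flag[i], true);           /* announce interest */
    atomic_store(&turn, j);                 /* give the other priority */
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ;                                   /* busy wait */
}

/* Exit section for process i. */
void peterson_unlock(int i)
{
    assert(i == 0 || i == 1);
    atomic_store(&flag[i], false);
}
```

A thread would bracket its critical section with peterson_lock(i) and peterson_unlock(i). The default atomic_store and atomic_load operations are sequentially consistent, which is what keeps the store to flag[i] ordered before the load of flag[j].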
Synchronization Hardware
• Many systems provide hardware support for critical section code
• Uniprocessors – could disable interrupts
  – Currently running code would execute without preemption
  – Generally too inefficient on multiprocessor systems
  – Operating systems using this are not broadly scalable
• Modern machines provide special atomic hardware instructions
  – Atomic = non-interruptible
  – TestAndSet(target): either test a memory word and set its value
  – Swap(a,b): or swap the contents of two memory words
TestAndSet Instruction

• Definition:

boolean TestAndSet (boolean *target)
{
    boolean rv = *target;
    *target = TRUE;
    return rv;
}
Solution using TestAndSet
• Shared boolean variable lock, initialized to FALSE.
• Solution:
while (true) {
    // entry section
    while (TestAndSet(&lock))
        ; /* do nothing */
    // critical section
    // exit section
    lock = FALSE;
    // remainder section
}
• Does it satisfy mutual exclusion?
• How about progress and bounded waiting?
• How to fix this?
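In C11, atomic_flag_test_and_set behaves like the TestAndSet instruction: it atomically sets the flag and returns its previous value. The sketch below maps the entry and exit sections onto it. The names tas_lock, spin_acquire, spin_try_acquire, and spin_release are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;  /* clear means unlocked */

/* Entry section: spin until the previous value of the flag was clear. */
void spin_acquire(void)
{
    while (atomic_flag_test_and_set(&tas_lock))
        ;  /* do nothing */
}

/* One attempt only: returns true if this caller took the lock. */
bool spin_try_acquire(void)
{
    return !atomic_flag_test_and_set(&tas_lock);
}

/* Exit section: lock = FALSE. */
void spin_release(void)
{
    atomic_flag_clear(&tas_lock);
}
```

Nothing in this lock records who has been waiting longest, so a waiter can lose the race for the flag arbitrarily often.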
Bounded-Waiting TestAndSet
• Shared variables
    boolean waiting[n];
    boolean lock; // initialized to FALSE
• Solution:
do {
    // entry section
    waiting[i] = TRUE;
    while (waiting[i] && TestAndSet(&lock))
        ;
    waiting[i] = FALSE;
    // critical section
    // exit section
    j = (i + 1) % n;
    while ((j != i) && !waiting[j])
        j = (j + 1) % n;
    if (j == i) lock = FALSE;
    else waiting[j] = FALSE;
    // remainder section
} while (TRUE);
• Mutual exclusion
  – Proof: can two processes pass the while loop (and enter the critical section) at the same time?
• Bounded Waiting
  – Limited time in waiting for other processes.
  – What is waiting[] for? When is waiting[i] set to FALSE?
  – Proof: how long does Pi wait until waiting[i] becomes FALSE?
• Progress
  – Proof: the exit section unblocks at least one process’s waiting[] or sets the lock to FALSE.
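The exit section is what gives the bound: it scans the other processes in cyclic order and hands the critical section to the first one that is waiting. The sketch below isolates that scan as a plain, single-threaded function so the hand-off order can be checked. It is a simulation under that assumption, and the function name and return convention are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Exit section of process i among n processes.
 * Returns the index of the process that is let in next,
 * or -1 if nobody was waiting and the lock was released. */
int exit_section(int i, bool waiting[], int n, bool *lock)
{
    assert(n > 0 && i >= 0 && i < n);
    int j = (i + 1) % n;
    while (j != i && !waiting[j])    /* cyclic scan starting after i */
        j = (j + 1) % n;
    if (j == i) {
        *lock = false;               /* no waiter: free the lock */
        return -1;
    }
    waiting[j] = false;              /* hand over; lock stays TRUE */
    return j;
}
```

With four processes where 2 and 3 are waiting, a release by process 0 admits 2, then 3, and only then clears the lock. Because the scan is cyclic, a waiting process is passed over at most n - 1 times.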
Swap Instruction

• Definition:

void Swap (boolean *a, boolean *b)
{
    boolean temp = *a;
    *a = *b;
    *b = temp;
}
Solution using Swap
• Shared Boolean variable lock initialized to FALSE; each process
has a local Boolean variable key.
• Solution:
while (true) {
    // entry section
    key = TRUE;
    while (key == TRUE)
        Swap(&lock, &key);
    // critical section
    // exit section
    lock = FALSE;
    // remainder section
}
• Mutual exclusion? Progress and Bounded Waiting?
• Notice a performance problem with Swap & TestAndSet
Processor Scheduling
Processor scheduling (PS): ready tasks are assigned to the processors so
that performance is maximized.
Because tasks cooperate and communicate through shared
variables or message passing, PS in a multiprocessor
system is a difficult problem.
PS is critical to the performance of
multiprocessor systems because a naïve scheduler
can degrade performance substantially.
Issues in Processor Scheduling
3 major causes of performance degradation are
• Preemption inside spinlock-controlled critical sections
  This situation occurs when a task is preempted inside a CS while
  other tasks are spinning on the lock to enter the same CS.
• Cache corruption
  A big chunk of data needed by the previous tasks must be purged from the
  cache and new data must be brought into the cache.
  A very high miss ratio results when a processor switches to another task.
• Context switching overheads
  Execution of a large no. of instructions to save and store the registers, to
  initialize the registers, to switch address space, etc.
Co-Scheduling of the Medusa OS
Co-scheduling – proposed by Ousterhout for the Medusa OS
on Cm*
All runnable tasks of an application are scheduled
on the processors simultaneously.
Context switching occurs between applications rather than between
tasks of several different applications.
Problem addressed: tasks wasting resources in lock-spinning
while they wait for a preempted task to release
the critical section.
Smart Scheduling
Proposed by Zahorjan et al. – 2 nice features
It avoids preempting a task when the task is inside its
critical section.
It avoids the rescheduling of tasks that were busy
waiting at the time of their preemption until the task
that is executing the corresponding CS releases it.
Eliminates the resource waste due to a processor
spinning on a lock.
It does not reduce the overhead due to context switching,
nor the performance degradation due to
cache corruption.
Scheduling in the NYU Ultracomputer
Proposed by Edler et al.; it combines the strategies of the
previous 2 scheduling techniques.
Tasks can be formed into groups and scheduled in
any of the following ways:
• A task is scheduled or preempted in the normal manner.
• All tasks in a group are scheduled or preempted simultaneously.
• Tasks in a group are never preempted.
Memory Management
The Mach Operating System
Virtual memory management of the Mach OS, developed at CMU
Design Issues
Data sharing
The Mach Kernel
Basic primitives necessary for building parallel and
distributed applications.
The Mach Kernel
[Figure: user processes run in user space on top of a software emulation layer (4.3 BSD emulator, System V emulator, HP/UX emulator, other emulators); the microkernel runs beneath it in kernel space.]
The kernel manages five principal abstractions:
1. Processes.
2. Threads.
3. Memory objects.
4. Ports.
5. Messages.
Process Management in Mach
[Figure: a Mach process with its address space and its process port, bootstrap port, exception port, and registered ports.]
The process port is used to communicate with the kernel.
The bootstrap port is used for initialization when a
process starts up.
The exception port is used to report exceptions
caused by the process. Typical exceptions are division
by zero and illegal instruction executed.
The registered ports are normally used to provide a
way for the process to communicate with standard
system servers.
A process can be runnable or blocked.
If a process is runnable, those threads that are
also runnable can be scheduled and run.
If a process is blocked, its threads may not
run, no matter what state they are in.
Process Management Primitives

Create      Create a new process, inheriting certain properties
Terminate   Kill a specified process
Suspend     Increment suspend counter
Resume      Decrement suspend counter. If it is 0, unblock the process
Priority    Set the priority for current or future threads
Assign      Tell which processor new threads should run on
Info        Return information about execution time, memory usage, etc.
Threads     Return a list of the process’ threads
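The Suspend and Resume rows describe a counter, so nested suspends need matching resumes before the process runs again. A minimal sketch of that bookkeeping follows. The Proc type and function names are hypothetical and are not Mach's actual interface; ignoring a Resume on an unsuspended process is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int suspend_count;   /* 0 means the process is not blocked by Suspend */
} Proc;

/* Suspend: increment the suspend counter. */
void proc_suspend(Proc *p)
{
    p->suspend_count++;
}

/* Resume: decrement the counter; returns true when it reaches 0,
 * i.e. when the process becomes unblocked. */
bool proc_resume(Proc *p)
{
    if (p->suspend_count > 0)        /* assumption: extra resumes are ignored */
        p->suspend_count--;
    return p->suspend_count == 0;
}
```

Two calls to proc_suspend followed by one proc_resume leave the process blocked; the second proc_resume unblocks it.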

• Mach threads are managed by the kernel. Thread creation and destruction are
done by the kernel.

Fork        Create a new thread running the same code as the parent thread
Exit        Terminate the calling thread
Join        Suspend the caller until a specified thread exits
Detach      Announce that the thread will never be joined (waited for)
Yield       Give up the CPU voluntarily
Self        Return the calling thread’s identity to it

Scheduling algorithm
When a thread blocks, exits, or uses up its quantum,
the CPU it is running on first looks on its local run
queue to see if there are any active threads.
If the queue's count is nonzero, it runs the highest-priority thread,
starting the search at the queue specified by the hint.
If the local run queue is empty, the same algorithm is
applied to the global run queue. The global queue
must be locked first.
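[Figure: the global run queues for processor set 1 and processor set 2. Each has 32 priority queues, 0 (high) to 31 (low), plus a count and a hint: Count: 6, Hint: 2 for set 1 and Count: 7, Hint: 4 for set 2. A legend marks entries as Free or Busy.]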
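The lookup described above can be sketched as follows. Each run queue keeps a per-priority tally, a total count, and a hint that says where the search may start. The RunQueue layout and function names are hypothetical, threads are reduced to their priority number, and the locking of the global queue is only indicated by a comment.

```c
#include <assert.h>

#define NPRI 32   /* priority levels 0 (high) .. 31 (low) */

typedef struct {
    int nthreads[NPRI];  /* threads queued at each priority level */
    int count;           /* total threads on this run queue */
    int hint;            /* no thread is queued at a level above this one */
} RunQueue;

/* Queue a thread of the given priority. */
void rq_add(RunQueue *q, int pri)
{
    assert(pri >= 0 && pri < NPRI);
    q->nthreads[pri]++;
    q->count++;
    if (pri < q->hint)
        q->hint = pri;               /* keep the hint conservative */
}

/* Remove the highest-priority thread; returns its priority or -1. */
int rq_take(RunQueue *q)
{
    if (q->count == 0)
        return -1;                   /* cheap emptiness check via count */
    for (int p = q->hint; p < NPRI; p++) {
        if (q->nthreads[p] > 0) {
            q->nthreads[p]--;
            q->count--;
            q->hint = p;             /* next search can start here */
            return p;
        }
    }
    return -1;
}

/* Local queue first, then the global queue of the processor set. */
int pick_next(RunQueue *local, RunQueue *global)
{
    int pri = rq_take(local);
    if (pri < 0) {
        /* a real kernel would lock the global queue around this call */
        pri = rq_take(global);
    }
    return pri;
}
```

With priorities 9 and 3 on the local queue and 5 on the global queue, successive calls to pick_next return 3, 9, 5, and then -1. The local queue is drained first even though the global queue holds a thread of higher priority than 9.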
Memory Management in Mach
• Mach has a powerful, elaborate, and highly flexible memory
management system based on paging.
• The code of Mach’s memory management is split into three
parts. The first part is the pmap module, which runs in the
kernel and is concerned with managing the MMU.
• The second part, the machine-independent kernel code, is
concerned with processing page faults, managing address
maps, and replacing pages.
• The third part of the memory management code runs as a
user process called a memory manager. It handles the logical
part of the memory management system, primarily
management of the backing store (disk).
Virtual Memory
The conceptual model of memory that Mach user
processes see is a large, linear virtual address space.
The address space is supported by paging.
A key concept relating to the use of virtual address
space is the memory object. A memory object can be
a page or a set of pages, but it can also be a file or
other, more specialized data structure.
[Figure: an address space with allocated regions, mapped objects, and unused addresses. Regions shown: File xyz region, Stack region, Data region, Text region.]
System calls for virtual address
space manipulation
Allocate    Make a region of virtual address space usable
Deallocate  Invalidate a region of virtual address space
Map         Map a memory object into the virtual address space
Copy        Make a copy of a region at another virtual address
Inherit     Set the inheritance attribute for a region
Read        Read data from another process’ virtual address space
Write       Write data to another process’ virtual address space
Memory Sharing
[Figure: memory sharing among Process 1, Process 2, and Process 3.]

Operation of Copy-on-Write
[Figure, before a write: the prototype’s address space and the child’s address space (pages 0 to 7) both map onto physical memory pages 0 to 7; page 7 carries RW and RO protection labels.]
[Figure, after a write to page 7: physical memory holds a copy of page 7 in page 8, so the two address spaces no longer share that page.]
Advantages of Copy-on-write
1. some pages are read-only, so there is no
need to copy them.
2. other pages may never be referenced, so
they do not have to be copied.
3. still other pages may be writable, but the
child may deallocate them rather than using them.
Disadvantages of Copy-on-write
1. the administration is more complicated.
2. requires multiple kernel traps, one for each
page that is ultimately written.
3. does not work over a network.
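The trade-off listed above can be made concrete with a small simulation. In the sketch below a page is a single int, a fork shares every frame and write-protects both mappings, and the first write to a shared page takes a fault and copies just that page. All type and function names are hypothetical, and write-protecting both sides is an assumption of this sketch rather than a statement about Mach's implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES     8
#define MAX_FRAMES 32

typedef struct {
    int data[MAX_FRAMES];  /* one int stands in for a page of data */
    int refs[MAX_FRAMES];  /* number of address spaces mapping each frame */
    int used;              /* frames allocated so far */
    int faults;            /* write-protection traps taken */
} PhysMem;

typedef struct {
    int  frame[NPAGES];    /* page -> physical frame */
    bool writable[NPAGES];
} AddrSpace;

/* Give a fresh address space NPAGES private, writable frames. */
void as_init(AddrSpace *as, PhysMem *mem)
{
    for (int p = 0; p < NPAGES; p++) {
        assert(mem->used < MAX_FRAMES);
        as->frame[p] = mem->used++;
        as->writable[p] = true;
        mem->refs[as->frame[p]] = 1;
    }
}

/* Fork: share every frame, copy nothing, write-protect both sides. */
void cow_fork(AddrSpace *parent, AddrSpace *child, PhysMem *mem)
{
    for (int p = 0; p < NPAGES; p++) {
        child->frame[p] = parent->frame[p];
        parent->writable[p] = false;
        child->writable[p] = false;
        mem->refs[parent->frame[p]]++;
    }
}

/* Write: a protected page traps; a still-shared frame is copied first. */
void cow_write(AddrSpace *as, PhysMem *mem, int page, int value)
{
    assert(page >= 0 && page < NPAGES);
    if (!as->writable[page]) {
        int old = as->frame[page];
        mem->faults++;                        /* one trap per page written */
        if (mem->refs[old] > 1) {
            assert(mem->used < MAX_FRAMES);
            int copy = mem->used++;
            mem->data[copy] = mem->data[old]; /* copy just this page */
            mem->refs[old]--;
            mem->refs[copy] = 1;
            as->frame[page] = copy;
        }
        as->writable[page] = true;
    }
    mem->data[as->frame[page]] = value;
}
```

After a fork, a child write to page 7 allocates frame 8 for the copy and leaves the parent's frame 7 untouched; the other seven pages are never copied. The faults counter grows by one for each protected page that is written.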
External Memory Managers
• Each memory object that is mapped in a process’ address
space must have an external memory manager that controls
it. Different classes of memory objects are handled by
different memory managers.
• Three ports are needed to do the job.
• The object port is created by the memory manager and will
later be used by the kernel to inform the memory manager
about page faults and other events relating to the object.
• The control port is created by the kernel itself so that the
memory manager can respond to these events.
• The name port is used as a kind of name to identify the
object.
Distributed Shared Memory in Mach
The idea is to have a single, linear, virtual
address space that is shared among processes
running on computers that do not have any
physical shared memory. When a thread
references a page that it does not have, it
causes a page fault. Eventually, the page is
located and shipped to the faulting machine,
where it is installed so that the thread can
continue executing.
Communication in Mach
• The basis of all communication in Mach is a kernel data
structure called a port.
• When a thread in one process wants to communicate with a
thread in another process, the sending thread writes the
message to the port and the receiving thread takes it out.
• Each port is protected to ensure that only authorized
processes can send to it and receive from it.
• Ports support unidirectional communication. A port that can
be used to send a request from a client to a server cannot also
be used to send the reply back from the server to the client. A
second port is needed for the reply.
A Mach port
• Message queue
• Current message count
• Maximum messages
• Port set this port belongs to
• Counts of outstanding capabilities
• Capabilities to use for error reporting
• Queue of threads blocked on this port
• Pointer to the process holding the RECEIVE capability
• Index of this port in the receiver’s capability list
• Pointer to the kernel object
• Miscellaneous items
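Message passing via a port
[Figure: a sending thread sends to a port in the kernel and a receiving thread receives from it. Each process has a capability list kept in the kernel; one list holds a capability with the SEND right for the port and the other holds a capability with the RECEIVE right.]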
Primitives for Managing Ports
Allocate       Create a port and insert its capability in the capability list
Destroy        Destroy a port and remove its capability from the list
Deallocate     Remove a capability from the capability list
Extract_right  Extract the n-th capability from another process
Insert_right   Insert a capability in another process’ capability list
Move_member    Move a capability into a capability set
Set_qlimit     Set the number of messages a port can hold

Sending and Receiving Messages
• mach_msg(&hdr, options, send_size, rcv_size, rcv_port, timeout, notify_port);
• The first parameter, hdr, is a pointer to the message to be sent or to the place
where the incoming message is put, or both.
• The second parameter, options, contains a bit specifying that a message is to be
sent, and another one specifying that a message is to be received. Another bit
enables a timeout, given by the timeout parameter. Other bits in options allow a
SEND that cannot complete immediately to return control anyway, with a status
report being sent to notify_port later.
• The send_size and rcv_size parameters tell how large the outgoing message is and
how many bytes are available for storing the incoming message, respectively.
• The rcv_port parameter is used for receiving messages. It is the capability name of the port or
port set being listened to.
The Mach message format
Header:
• Complex/Simple flag, Reply rights, Dest. rights
• Message size
• Capability index for destination port
• Capability index for reply port
• Message kind (not examined by the kernel)
• Function code (not examined by the kernel)
Message body:
• Descriptor 1, Data field 1
• Descriptor 2, Data field 2
Complex message field descriptor
Field widths in bits: 1, 1, 1, 1, 12, 8, 8
• 1-bit flags:
  – 0: Out-of-line data present; 1: No out-of-line data
  – 0: Short form descriptor; 1: Long form descriptor
  – 0: Sender keeps out-of-line data; 1: Deallocate out-of-line data from sender
• 12 bits: number of items in the data field
• 8 bits: data field size, in bits
• 8 bits: data field type (Bit, Unstructured word, Integer (8, 16, 32 bits), 32 Booleans, Floating point)
Reliability/Fault Tolerance: the
Sequoia system – a loosely coupled
multiprocessor system.
Attains a high level of fault tolerance by
performing fault detection in hardware and
fault recovery in the OS.
Design Issues
Fault detection and isolation
Fault recovery
The Sequoia Architecture
Reliability/Fault Tolerance: the Sequoia system
Fault detection
Error detecting codes
Comparison of duplicated operations
Protocol monitoring
Fault Recovery
Recovery from processor failures
Recovery from main memory failures
Recovery from I/O failures
Database Operating Systems
• Database systems have
been implemented as
applications on top
of a general-purpose OS
• Requirements of a DBOS
  – Transaction
  – Support for complex,
persistent data
  – Buffer Management
Concurrency Control
• CC is the process of controlling concurrent access to a database to
ensure that the correctness of the database is maintained.
• Database systems
  – A set of shared data objects that can be accessed by users.
  – A transaction consists of a sequence of read, compute, and write statements
that refer to the data objects of a database.
  – Transactions conflict if they access the same data objects.
• Transaction processing
  – A transaction is executed by executing its actions one by one
from the beginning to the end.
A concurrency control model of DBS
3 software modules
Transaction manager (TM)
Supervises the execution of a transaction
Data manager (DM)
Manages the database, carrying out read and write requests
Scheduler
Responsible for enforcing concurrency control
Distributed Database System
• A distributed database is a database in which storage devices
are not all attached to a common processing unit such as the
CPU.
• It may be stored in multiple computers located in the same
physical location, or may be dispersed over a network of
interconnected computers.
• Unlike parallel systems, in which the processors are tightly
coupled and constitute a single database system, a distributed
database system consists of loosely coupled sites that share
no physical components.
Model of Distributed Database System
Distributed Database System
• Motivations: a DDBS offers several advantages over a centralized
database system, such as
  – Sharing
  – Higher system availability (reliability)
  – Improved performance
  – Easy expandability
  – Large databases
• Transaction Processing Model
• Serializability condition in DDBS
• Data replication
• Complications due to data replication
• Fully Replicated Database Systems
  1. Enhanced reliability
  2. Improved responsiveness
  3. No directory management
  4. Easier load balancing
Concurrency Control Algorithms
It controls the interleaving of conflicting actions of
transactions so that the integrity of a database is
maintained, i.e., their net effect is a serial execution.
Basic synchronization primitives
Locks
A transaction can request, hold or release the lock on a data
object.
A data object can be locked in 2 modes: exclusive and shared.
Timestamps
A unique number is assigned to a transaction or a data object and is
chosen from a monotonically increasing sequence.
Commonly generated using Lamport’s scheme
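Lamport's scheme is commonly realized as a per-site counter paired with a site identifier, which keeps timestamps unique across sites and increasing at each site. The sketch below shows that idea. The type and function names are hypothetical, and breaking ties by site id is one conventional choice, assumed here.

```c
#include <assert.h>

typedef struct {
    unsigned long counter;  /* logical clock value */
    unsigned      site_id;  /* makes timestamps from different sites distinct */
} Timestamp;

typedef struct {
    unsigned long counter;
    unsigned      site_id;
} LamportClock;

/* Draw the next timestamp at this site (monotonically increasing). */
Timestamp lamport_next(LamportClock *c)
{
    c->counter++;
    Timestamp t = { c->counter, c->site_id };
    return t;
}

/* On receiving a message stamped 'seen', never fall behind it. */
void lamport_observe(LamportClock *c, Timestamp seen)
{
    if (seen.counter > c->counter)
        c->counter = seen.counter;
}

/* Total order: by counter first, then by site id. Returns <0, 0 or >0. */
int ts_compare(Timestamp a, Timestamp b)
{
    if (a.counter != b.counter)
        return a.counter < b.counter ? -1 : 1;
    if (a.site_id != b.site_id)
        return a.site_id < b.site_id ? -1 : 1;
    return 0;
}
```

If site 2 observes a timestamp from site 1 before drawing its own, its next timestamp is ordered after the one it saw. Two timestamps with equal counters are still ordered, by site id.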
Lock based algorithms
Static locking
Two Phase Locking (2PL)
Problems with 2PL: Price for Higher concurrency
2PL in DDBS
Timestamp Based locking
Conflict Resolution
Wait, Restart, Die, Wound
Non-two-phase locking
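Two well-known ways of combining the conflict-resolution actions listed above are the wait-die and wound-wait rules, in which a smaller timestamp means an older transaction. The sketch below encodes both as pure decision functions for a requester that conflicts with a lock holder. The enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum {
    ACTION_WAIT,   /* requester waits for the holder */
    ACTION_DIE,    /* requester aborts and restarts with its old timestamp */
    ACTION_WOUND   /* holder is aborted so that the requester can proceed */
} Action;

/* Wait-die: an older requester waits, a younger requester dies. */
Action wait_die(unsigned long requester_ts, unsigned long holder_ts)
{
    return requester_ts < holder_ts ? ACTION_WAIT : ACTION_DIE;
}

/* Wound-wait: an older requester wounds the holder, a younger one waits. */
Action wound_wait(unsigned long requester_ts, unsigned long holder_ts)
{
    return requester_ts < holder_ts ? ACTION_WOUND : ACTION_WAIT;
}
```

In both rules the older transaction is never the one that is aborted. Waiting also goes in one direction only (older on younger in wait-die, younger on older in wound-wait), so no cycle of waiting transactions can form.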
Timestamp Based Algorithms
Basic timestamp ordering algorithm
Thomas Write Rule (TWR)
Multiversion timestamp ordering algorithm
Conservative timestamp ordering algorithm
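The difference between basic timestamp ordering and the Thomas Write Rule shows up only for a write that arrives late. The sketch below is a textbook-style rendering, not code from these slides: each data item keeps the largest read and write timestamps seen so far, and a flag selects whether an obsolete write is rejected (basic TO) or silently ignored (TWR). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TO_ACCEPT, TO_REJECT, TO_IGNORE } Outcome;

typedef struct {
    unsigned long rts;   /* largest timestamp that has read the item */
    unsigned long wts;   /* largest timestamp that has written the item */
    int value;
} Item;

/* Read by a transaction with timestamp ts. */
Outcome to_read(Item *x, unsigned long ts, int *out)
{
    if (ts < x->wts)
        return TO_REJECT;            /* a younger transaction already wrote */
    if (ts > x->rts)
        x->rts = ts;
    *out = x->value;
    return TO_ACCEPT;
}

/* Write by a transaction with timestamp ts. */
Outcome to_write(Item *x, unsigned long ts, int value, bool thomas_rule)
{
    if (ts < x->rts)
        return TO_REJECT;            /* a younger transaction already read */
    if (ts < x->wts)                 /* obsolete write */
        return thomas_rule ? TO_IGNORE : TO_REJECT;
    x->wts = ts;
    x->value = value;
    return TO_ACCEPT;
}
```

A rejected operation means its transaction is aborted and restarted with a new timestamp. An ignored write lets the transaction continue, which is where TWR saves restarts compared with basic timestamp ordering.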

