Chapter 1: Introduction
What is an Operating System?
Mainframe Systems
Desktop Systems
Multiprocessor Systems
Distributed Systems
Clustered Systems
Real-Time Systems
Handheld Systems
Computing Environments
OS Notes by Dr. Naveen Choudhary
What is an Operating System?
A program that acts as an intermediary between a user of a
computer and the computer hardware.
Operating system goals:
Execute user programs and make solving user problems easier.
Make the computer system convenient to use.
Use the computer hardware in an efficient manner.
Basically, an OS is a:
Resource manager, with the objectives of efficient resource use and
user convenience
Control program, which controls the execution of user programs to
prevent errors and improper use of the computer
Computer System Components
1. Hardware – provides basic computing resources (CPU,
memory, I/O devices).
2. Operating system – controls and coordinates the use of
the hardware among the various application/system
programs for the various users.
3. Applications/system programs – define the ways in which
the system resources are used to solve the computing
problems of the users (compilers, database systems,
video games, business programs).
4. Users (people, machines, other computers).
Abstract View of System Components
Operating System Definitions
Resource allocator – manages and allocates resources.
Control program – controls the execution of user
programs and operations of I/O devices and handles
errors
Kernel – the one program running at all times (all else
being application programs).
Computer startup
bootstrap program is loaded at power-up or reboot
Typically stored in ROM or EEPROM, generally known as
firmware
Initializes all aspects of the system
Loads operating system kernel and starts execution
Mainframe Systems
In the early days, the hardware consisted of card readers, tape
drives, line printers, etc. (basically slow devices, since many
mechanical parts are involved)
The CPU, being an electronic device, was relatively fast
Interactive systems were therefore not practical: the slow I/O
devices would slow down execution dramatically and leave the
CPU mostly idle. This is why batch systems were needed
Reduce setup time by batching jobs with similar requirements
Automatic job sequencing – automatically transfers
control from one job to another.
Resident monitor
initial control in monitor
control transfers to job
when job completes control transfers back to monitor
Memory Layout for a Simple Batch System
Multiprogrammed Batch Systems
Several jobs are kept in main memory at the same time, and the
CPU is multiplexed among them.
Initially punched cards were used for data,
programs, and control cards
Spooling (simultaneous peripheral operation
online) came with the introduction of disk
technology
Cards can be read onto disk for later reading
by the CPU, and CPU output for printing can
likewise be put on disk for later printing
Spooling thus helps overlap the I/O of one
job with the computation of another.
OS Features Needed for Multiprogramming
I/O routine supplied by the system.
Memory management – the system must allocate the
memory to several jobs.
Job scheduling – choosing which of the many ready jobs on
disk to bring into memory
CPU scheduling – the system must choose among
several jobs ready to run.
Allocation of devices.
Time-Sharing Systems–Interactive Computing
The CPU is multiplexed among several jobs that are kept
in memory (the CPU is allocated to a job only if the job is
in memory).
A job can be swapped in and out of memory from/to the
disk.
Time-sharing systems made interactive computing
possible: each process is given a small quantum of
CPU time
Multiprogramming/multitasking
Multiprogramming needed for efficiency
Single user cannot keep CPU and I/O devices busy at all times
Multiprogramming organizes jobs (code and data) so CPU always has one to
execute
A subset of the total jobs in the system is kept in memory
One job selected and run via job scheduling
When it has to wait (for I/O for example), OS switches to another job
Timesharing (multitasking) is logical extension in which CPU switches
jobs so frequently that users can interact with each job while it is
running, creating interactive computing
Response time should be < 1 second
Each user has at least one program executing in memory (a process)
If several jobs are ready to run at the same time, CPU scheduling
chooses among them
If processes don’t fit in memory, swapping moves them in and out to run
Virtual memory allows execution of processes not completely in memory
Desktop Systems
Personal computers – computer system dedicated to a
single user.
I/O devices – keyboards, mice, display screens, small
printers.
User convenience and responsiveness.
Can adopt technology developed for larger operating
systems; often individuals have sole use of the computer
and do not need advanced CPU utilization or protection
features.
May run several different types of operating systems
(Windows, MacOS, UNIX, Linux)
Parallel Systems
Multiprocessor systems with more than one CPU in close
communication.
Tightly coupled system – processors share memory and a
clock; communication usually takes place through the
shared memory.
Advantages of parallel system:
Increased throughput – with more processors, more work is
done per unit time, but the speed-up is less than n (for n
processors) because some time is lost to overheads and
contention for shared resources.
Economical
Increased reliability
graceful degradation, fault tolerance
Parallel Systems (Cont.)
Symmetric multiprocessing (SMP)
Each processor runs an identical copy of the operating
system, and the copies communicate with one another
(through shared memory, as and when needed)
Many processes can run at once without performance
deterioration.
Most modern operating systems support SMP
Asymmetric multiprocessing
Each processor is assigned a specific task; a master
processor schedules and allocates work to slave
processors.
More common in extremely large systems
Multiprocessing Architecture
Distributed Systems
Distribute the computation among several physical
processors.
Loosely coupled system – each processor has its own
local memory; processors communicate with one another
through various communications lines, such as high-
speed buses or telephone lines.
Advantages of distributed systems:
Resource sharing
Computation speed up – load sharing
Reliability
Communication (i.e., exchange of information in the form of
mail, file transfer, etc. is possible)
Distributed Systems (cont)
Requires networking infrastructure.
Local area networks (LAN) or Wide area networks (WAN)
May be either client-server or peer-to-peer systems.
General Structure of Client-Server
Real-Time Systems
Often used as a control device in a dedicated application
such as controlling scientific experiments, medical
imaging systems, industrial control systems, and some
display systems.
Well-defined fixed-time constraints.
Real-Time systems may be either hard or soft real-time.
Real-Time Systems (Cont.)
Hard real-time:
Secondary storage limited or absent, data stored in short
term memory, or read-only memory (ROM)
Conflicts with time-sharing systems as no virtual memory.
Soft real-time:
Limited utility in industrial control or robotics
Useful in applications (multimedia, virtual reality, undersea
exploration, planetary rovers) requiring advanced operating-
system features.
Example: RTLinux
Chapter 2: Computer-System Structures
Computer System Operation
I/O Structure
Storage Structure
Storage Hierarchy
Hardware Protection
General System Architecture
Computer-System Architecture
Bootstrap: (1) initializes hardware such as CPU registers, device controllers, and memory controllers;
(2) loads the OS kernel into primary memory and then transfers control to the OS
Computer-System Operation
I/O devices and the CPU can execute concurrently.
Each device controller is in charge of a particular device
type.
Each device controller has a local buffer & a set of special
purpose registers
The CPU loads the appropriate registers within the device
controller (say, for some read operation)
I/O is from the device to local buffer of controller.
CPU moves data from/to main memory to/from local
buffers
Device controller informs CPU that it has finished its
operation by causing an interrupt.
Common Functions of Interrupts
Interrupt transfers control to the interrupt service routine
generally, through the interrupt vector, which contains the
addresses of all the service routines.
Interrupt architecture must save the address of the
interrupted instruction.
Incoming interrupts are disabled while another interrupt is
being processed to prevent a lost interrupt (specifically
when program counters and other registers are being
saved )
A trap is a software-generated interrupt caused either by
an error or a user request.
An operating system is interrupt driven.
Interrupt Handling
The operating system preserves the state of the CPU by
storing registers and the program counter. (disable further
interrupts, while this is being done )
Separate segments of code determine what action should
be taken for each type of interrupt
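As a rough illustration (a hypothetical C sketch, not any real kernel's code), dispatch through an interrupt vector might look like this:
/* Hypothetical sketch: the interrupt vector as an array of handler pointers. */
#define NUM_VECTORS 256

typedef void (*isr_t)(void);                 /* type of an interrupt service routine */

static isr_t interrupt_vector[NUM_VECTORS];  /* one entry per interrupt type */

static void timer_isr(void)    { /* e.g., decrement the system timer */ }
static void keyboard_isr(void) { /* e.g., read a scan code from the controller */ }

/* Conceptually invoked by hardware with the interrupt number;
   CPU state (registers, program counter) is assumed already saved. */
void dispatch_interrupt(int irq)
{
    if (irq >= 0 && irq < NUM_VECTORS && interrupt_vector[irq] != 0)
        interrupt_vector[irq]();             /* run the matching service routine */
}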
I/O Structure
After I/O starts, control returns to user program only upon
I/O completion. Synchronous I/O
A wait instruction idles the CPU until the next interrupt
Alternatively, a wait loop spins (contending for memory access)
until I/O completes; such a loop must also poll any I/O device
that does not support an interrupt structure
POLLING – repeatedly check a flag (in the I/O device
controller) until it becomes true or false
At most one I/O request is outstanding at a time, no
simultaneous I/O processing.
After I/O starts, control returns to user program without
waiting for I/O completion. Asynchronous I/O
When I/O completes, the I/O device controller interrupts the
CPU to inform it that the operation has finished
Two I/O Methods
(figure: synchronous and asynchronous I/O)
Device-Status Table
The OS also needs to be able to keep track of many I/O requests at
the same time. For this purpose the OS uses a table containing an
entry for each I/O device. Each table entry indicates the device type,
address, and status (not functioning, idle, busy)
First element in the queue : represents the request being processed
Other elements in the queue : represents the requests waiting in the
queue
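A hypothetical C sketch of one entry of such a device-status table (field names are illustrative, not from any particular OS):
enum dev_status { NOT_FUNCTIONING, IDLE, BUSY };

struct io_request {                 /* one pending I/O request */
    int pid;                        /* process that issued it */
    long offset, length;            /* what to transfer */
    struct io_request *next;        /* next request waiting in the queue */
};

struct device_entry {
    const char *type;               /* device type, e.g. "disk", "printer" */
    unsigned long address;          /* device address */
    enum dev_status status;         /* not functioning, idle, or busy */
    struct io_request *queue;       /* head = request being processed; */
};                                  /* the rest are waiting requests   */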
Contd…
When an I/O device needs CPU attention (because it wants to
start a new operation or it has completed an operation), it
interrupts
In response to the interrupt, the OS first determines
which I/O device raised it and then indexes into
the I/O device table accordingly
The OS then modifies the device-table entry and status
accordingly (if I/O has completed, it also informs the waiting
process)
Direct Memory Access Structure
Used for high-speed I/O devices able to transmit
information at close to memory speeds.
Device controller transfers blocks of data from buffer
storage directly to main memory without CPU
intervention.
Only one interrupt is generated per block, rather than
one interrupt per byte.
Storage Structure
Main memory – the only large storage medium that the CPU
can access directly.
Secondary storage – extension of main memory that
provides large nonvolatile storage capacity.
Magnetic disks – rigid metal or glass platters covered with
magnetic recording material
Disk surface is logically divided into tracks, which are
subdivided into sectors.
The disk controller determines the logical interaction
between the device and the computer.
Moving-Head Disk Mechanism
Magnetic Tape
Slow
Good for sequential access and not good for random
access
Generally used as Backup media
Storage Hierarchy
Storage systems organized in hierarchy.
Speed
Cost
Volatility
Caching – copying information into faster storage system;
main memory can be viewed as the last cache for
secondary storage.
Main memory generally dynamic RAM, cache static
RAM
The program must be in main memory to be executed
because main memory is the only large storage area (in
addition to registers and cache) that the processor can
access directly
Storage-Device Hierarchy
As we go down the hierarchy: access time increases,
cost per bit decreases, and block size increases
Magnetic disk and below: nonvolatile
Electronic disk – essentially a large DRAM array
Caching
Use of high-speed memory to hold recently-accessed
data.
Requires a cache management policy.
Caching introduces another level in storage hierarchy.
This requires data that is simultaneously stored in more
than one level to be consistent.
CACHE COHERENCE & CONSISTENCY
uniprocessor systems: not a big problem
multiprocessor systems: a big problem
cache coherence is generally a hardware issue and is
handled below the OS level
Migration of A From Disk to Register
Hardware Protection
Dual-Mode Operation
I/O Protection
Memory Protection
CPU Protection
Dual-Mode Operation
Sharing system resources requires the operating system to ensure
that an incorrect program cannot cause other programs to
execute incorrectly (basically, this provides the means for
protecting the OS from errant users, and errant users from one
another)
Provide hardware support to differentiate between at least two
modes of operations.
1. User mode – execution done on behalf of a user.
2. Monitor mode (also kernel mode or system mode) – execution
done on behalf of operating system.
Certain instructions can be run only in monitor mode
At system boot time, the h/w starts in monitor mode. The
o.s is then loaded and starts user processes in user mode
Dual-Mode Operation (Cont.)
Mode bit added to computer hardware to indicate the
current mode: monitor (0) or user (1).
When an interrupt or fault occurs hardware switches to
monitor mode.
(figure: an interrupt/fault/trap switches from user to monitor
mode; returning to the user program sets user mode again)
Privileged instructions can be issued only in monitor mode.
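Conceptually (a sketch only; real hardware does this in circuitry, not C), the check looks like:
enum mode { MONITOR = 0, USER = 1 };     /* the mode bit */

enum mode current_mode = MONITOR;        /* hardware starts in monitor mode */

int is_privileged(int opcode) { return opcode < 0; }  /* placeholder decode */
void trap_to_monitor(void) { current_mode = MONITOR; /* then run OS handler */ }

void execute(int opcode)
{
    if (current_mode == USER && is_privileged(opcode)) {
        trap_to_monitor();               /* illegal: the OS treats it as an error */
        return;
    }
    /* ... otherwise execute the instruction normally ... */
}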
I/O Protection
A user program may disrupt the normal operation of the
system by issuing illegal I/O instruction, by accessing
memory location within the O.S itself or by refusing to
relinquish the CPU
To prevent users from doing illegal I/O, All I/O instructions
are privileged instructions.
Must ensure that a user program can never gain control
of the computer in monitor mode (e.g., a user program
that, as part of its execution, stores a new address in the
interrupt vector).
Use of A System Call to Perform I/O
Memory Protection
Must provide memory protection at least for the interrupt
vector and the interrupt service routines. (so that an
errant user does not get the power to work in protected
mode)
(figure: the interrupt vector table (IVT) in memory holds the
service-routine addresses SRA 1..4; on a trap from a user program,
the hardware takes the address from the IVT and executes the
service routine, which must run in monitor mode. If a user could
modify an entry, say SRA 4, to hold the address of his own program,
that program would run in monitor mode.)
Contd ..
In order to have memory protection, add two registers
that determine the range of legal addresses a program
may access: (to protect errant users from each other)
Base register – holds the smallest legal physical memory
address.
Limit register – contains the size of the range
Memory outside the defined range is protected.
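A small sketch of the legality check the hardware performs on every address (illustrative C, using the example numbers from the following figure):
unsigned long base_reg  = 300040;   /* smallest legal physical address */
unsigned long limit_reg = 120900;   /* size of the legal range */

/* legal iff base <= addr < base + limit */
int address_is_legal(unsigned long addr)
{
    return addr >= base_reg && addr < base_reg + limit_reg;
}
/* here, any address outside [300040, 420940) traps to the OS */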
Use of A Base and Limit Register
Hardware Address Protection
(figure: a legal address must satisfy base ≤ address < base + limit;
e.g., with base 300040 and limit 120900, the legal range runs from
300040 up to 300040 + 120900 = 420940)
Hardware Protection
When executing in monitor mode, the operating system
has unrestricted access to both monitor and user’s
memory.
The load instructions for the base and limit registers are
privileged instructions.
CPU Protection
The OS should ensure that a process does not get stuck in an
infinite loop and hold the CPU forever.
Solution:
Timer – interrupts computer after specified period to
ensure operating system maintains control.
Timer is decremented every clock tick.
When timer reaches the value 0, an interrupt occurs.
Timer commonly used to implement time sharing.
The timer is also used to compute the current time.
Load-timer is a privileged instruction.
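In outline (hypothetical names, not a real kernel API):
static volatile int timer = 0;          /* ticks remaining in this quantum */

void raise_timer_interrupt(void) { /* control returns to the OS scheduler */ }

void load_timer(int ticks)              /* privileged instruction */
{
    timer = ticks;
}

void on_clock_tick(void)                /* called every clock tick */
{
    if (timer > 0 && --timer == 0)
        raise_timer_interrupt();        /* quantum expired: OS regains control */
}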
Chapter 3: Operating-System Structures
System Components
Operating System Services
System Calls
System Programs
System Structure
Virtual Machines
System Design and Implementation
System Generation
Common System Components
Process Management
Main Memory Management
File Management
I/O System Management
Secondary Storage Management
Networking
Protection System
Command-Interpreter System
Process Management
A process is a program in execution. A process needs
certain resources, including CPU time, memory, files, and
I/O devices, to accomplish its task.
The operating system is responsible for the following
activities in connection with process management.
Process creation and deletion.
Process suspension and resumption.
Provision of mechanisms for:
process synchronization
process communication
Main-Memory Management
Memory is a large array of words or bytes, each with its
own address. It is a repository of quickly accessible data
shared by the CPU and I/O devices.
Main memory is a volatile storage device. It loses its
contents in the case of system/power failure.
The operating system is responsible for the following
activities in connection with memory management:
Keep track of which parts of memory are currently being
used and by whom.
Decide which processes to load when memory space
becomes available.
Allocate and deallocate memory space as needed.
File Management
A file is a collection of related information defined by its
creator. Commonly, files represent programs (both
source and object forms) and data.
The operating system is responsible for the following
activities in connection with file management:
File creation and deletion.
Directory creation and deletion.
Support of primitives for manipulating files and directories.
Mapping files onto secondary storage.
File backup on stable (nonvolatile) storage media.
I/O System Management
The I/O system consists of:
A buffer-caching system
A general device-driver interface
Drivers for specific hardware devices
Secondary-Storage Management
Since main memory (primary storage) is volatile and too
small to accommodate all data and programs
permanently, the computer system must provide
secondary storage to back up main memory.
Most modern computer systems use disks as the
principal on-line storage medium, for both programs and
data.
The operating system is responsible for the following
activities in connection with disk management:
Free space management
Storage allocation
Disk scheduling
Networking (Distributed Systems)
A distributed system is a collection of processors that do not
share memory or a clock. Each processor has its own
local memory.
The processors in the system are connected through a
communication network.
Communication takes place using some protocol.
A distributed system provides user access to various
system resources.
Access to a shared resource allows:
Computation speed-up
Increased data availability
Enhanced reliability
NOTE: OSs usually generalize network access as a form of
file access, with the details of networking contained
in the network interface's device driver
The objective is to hide the details of getting information
from other systems and to make such access as efficient as
possible.
Protection System
Protection refers to a mechanism for controlling access
by programs, processes, or users to both system and
user resources.
The protection mechanism must:
distinguish between authorized and unauthorized usage.
specify the controls to be imposed. POLICY
provide a means of enforcement.
Command-Interpreter System
The program that reads and interprets control
statements/commands is called variously:
command-line interpreter
shell (in UNIX)
Its function is to get and execute the next command
statement.
Command interpreter can be GUI based
(windows/macintosh) or text based (like UNIX Shell)
Command-Interpreter System
Many types of commands are given to the operating
system, such as
process creation and management related commands
I/O handling
secondary-storage management
main-memory management
file-system access
protection
networking
Operating System Services
An OS provides an environment for the execution of programs
The OS provides certain services to programs and to the users
of those programs, for efficiency and convenience to the
programmers
Program execution – system capability to load a program into
memory, run it, and terminate it normally or abnormally
I/O operations – since user programs cannot execute I/O
operations directly, the operating system must provide some
means to perform I/O.
File-system manipulation – program capability to read, write,
create, and delete files.
Communications – exchange of information between processes
executing either on the same computer or on different systems
tied together by a network. Implemented via shared memory or
message passing.
Error detection – ensure correct computing by detecting errors
in the CPU and memory hardware, in I/O devices, or in user
programs.
Additional Operating System Functions
Additional functions exist not for helping the user, but rather
for ensuring efficient system operations.
• Resource allocation – allocating resources to multiple users
or multiple jobs running at the same time.
• Accounting – keep track of and record which users use how
much and what kinds of computer resources for account
billing or for accumulating usage statistics.
• Protection – ensuring that all access to system resources is
controlled.
System Calls
System calls provide the interface between a running
program (Process) and the operating system.
Generally available as assembly-language, C, or C++
routines.
Three general methods are used to pass parameters
between a running program and the operating system.
Pass parameters in registers.
Store the parameters in a table in memory, and the table
address is passed as a parameter in a register.
Push (store) the parameters onto the stack by the program,
and pop off the stack by operating system.
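For example, on Linux the C library's syscall() wrapper makes the register-passing method visible; here write()'s three parameters (file descriptor, buffer address, byte count) are handed to system call SYS_write:
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const char msg[] = "hello via a system call\n";
    /* parameters travel in registers: fd 1 (stdout), buffer, length */
    syscall(SYS_write, 1, msg, sizeof msg - 1);
    return 0;
}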
API – System Call – OS Relationship
Standard C Library Example
C program invoking printf() library call, which calls write()
system call
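A minimal version of that example:
#include <stdio.h>

int main(void)
{
    printf("Greetings\n");   /* library call: formats the output ... */
    return 0;                /* ... then issues the write() system call */
}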
Passing of Parameters As A Table
(figure: a register points to the parameter table X in memory;
the program then issues system call 13)
Types of System Calls
Process control – end, abort, load program into memory,
execute, create, get/set process attributes (attributes like a job's
priority and maximum allowed execution time), wait (wait for a child
process to return/end or for a certain time), wait for event, signal
event, allocate & free memory
File management – create file, delete file, open, close, read,
write
Device management – request device, release device, read,
write, etc.
Information maintenance – get/set time of day, get/set system
data, etc.
Communications – shared memory or message passing;
create/delete communication connection, send/
receive messages, etc.
MS-DOS Execution
(MS-DOS is a single-tasking OS)
(figure: memory layout at system start-up and while running a program)
MS-DOS shows a limited capability for multitasking/concurrency
with the help of TSRs (terminate-and-stay-resident programs)
UNIX Running Multiple Programs
Note: UNIX process-related system calls – fork, exec, exit (return
with error code 0 for no error, or a positive number as the error number)
Communication Models
Communication may take place using either message
passing or shared memory.
(figure: message passing vs. shared memory)
System Programs
System programs are basically a user interface to the system
calls, and in some cases they can be quite complex (the command
interpreter, for example, is a system program)
System programs provide a convenient environment for program
development and execution. They can be divided into:
File manipulation
Status information
File modification
Programming language support
Program loading and execution
Communications
Application programs
Examples of system programs – create, copy, delete, rename, remote
login, send, receive, etc.; text editors, compilers, loaders, linkers, etc.
Internal Commands ?
External Commands ?
MS-DOS Layer Structure
MS-DOS System Structure
MS-DOS – written to provide the most functionality in the
least space
not divided into modules
Although MS-DOS has some structure, its interfaces and
levels of functionality are not well separated (for instance,
application programs are able to access the basic I/O
routines directly); this makes MS-DOS vulnerable to
errant (malicious) programs
Why so? DOS was built for the 8085/8086, which didn't support
a dual mode of operation
UNIX System Structure
UNIX – limited by hardware functionality, the original
UNIX operating system had limited structuring. The UNIX
OS consists of two separable parts.
Systems programs
The kernel
Consists of everything below the system-call interface
and above the physical hardware
Provides the file system, CPU scheduling, memory
management, and other operating-system functions; a
large number of functions for one level.
UNIX System Structure
Layered Approach
The operating system is divided into a number of layers (levels),
each built on top of lower layers. The bottom layer (layer 0), is
the hardware; the highest (layer N) is the user interface.
Advantage: modularity – layers are selected such that each
uses functions (operations) and services of only lower-level
layers. (The design and implementation of the system are
simplified when the system is broken down into layers.)
A layer M consists of some data structures and a set of routines
that can be invoked by higher-level layers; layer M, in turn,
can invoke operations on lower-level layers
A layer is an abstract object that is the encapsulation of data and
the operations that can manipulate those data
An Operating System Layer
Virtual Machines
A virtual machine takes the layered approach to its logical
conclusion. It treats hardware and the operating system
kernel as though they were all hardware.
A virtual machine provides an interface identical to the
underlying bare hardware.
The operating system creates the illusion of multiple
processes, each executing on its own processor with its
own (virtual) memory.
Virtual Machines (Cont.)
The resources of the physical computer are shared to
create the virtual machines.
CPU scheduling can create the appearance that users have
their own processor.
Spooling and a file system can provide virtual card readers
and virtual line printers.
A normal user time-sharing terminal serves as the virtual
machine operator’s console.
System Models
(figure: non-virtual machine vs. virtual machine)
Advantages/Disadvantages of Virtual Machines
The virtual-machine concept provides complete
protection of system resources since each virtual
machine is isolated from all other virtual machines. This
isolation, however, permits no direct sharing of resources.
A virtual-machine system is a perfect vehicle for
operating-systems research and development. System
development is done on the virtual machine, instead of on
a physical machine and so does not disrupt normal
system operation.
The virtual-machine concept is difficult to implement due
to the effort required to provide an exact duplicate of the
underlying machine.
Java Virtual Machine
Compiled Java programs are platform-neutral bytecodes
executed by a Java Virtual Machine (JVM).
JVM consists of
- class loader
- class verifier
- runtime interpreter
Just-In-Time (JIT) compilers increase performance
Java Virtual Machine
System Design Goals
User goals – operating system should be convenient to
use, easy to learn, reliable, safe, and fast.
System goals – operating system should be easy to
design, implement, and maintain, as well as flexible,
reliable, error-free, and efficient.
System Implementation
Traditionally written in assembly language, operating
systems can now be written in higher-level languages.
Code written in a high-level language:
can be written faster.
is more compact.
is easier to understand and debug.
An operating system is far easier to port (move to some
other hardware) if it is written in a high-level language.
System Generation (SYSGEN)
Operating systems are designed to run on any of a class
of machines; the system must be configured for each
specific computer site.
SYSGEN program obtains information concerning the
specific configuration of the hardware system.
Booting – starting a computer by loading the kernel.
Bootstrap program – code stored in ROM that is able to
locate the kernel, load it into memory, and start its
execution.
Chapter 4: Processes
Process Concept
Process Scheduling
Operations on Processes
Cooperating Processes
Interprocess Communication
Process Concept
An operating system executes a variety of programs:
Batch system – jobs
Time-shared systems – user programs or tasks
Textbook uses the terms job and process almost
interchangeably.
Process – a program in execution; process execution must
progress in sequential fashion. (A program is a passive entity,
whereas a process is an active entity.)
A process includes:
Program code
program counter
Contents of processor registers
Stack (process stack containing temporary data such as subroutine
parameters, return addresses and temporary variables )
data section (containing global variables)
Process State
As a process executes, it changes state
new: The process is being created.
running: Instructions are being executed.
waiting: The process is waiting for some event to occur.
ready: The process is waiting to be assigned to a
processor.
terminated: The process has finished execution.
Diagram of Process State
(figure: running → ready on an interrupt when the time slice/quantum
is over; ready → running on scheduler dispatch, i.e., when the CPU
is assigned)
Note: only one process can be running on any processor at any
instant. Many processes may be ready or waiting, however
Process Control Block (PCB)
Each process is represented in the O.S by a PCB (process/task
control block)
Information associated with each process.
Process state (new, ready, running, waiting, halted)
Program counter
CPU registers (accumulator, index registers, stack pointer,
general purpose register )
CPU scheduling information (process priority, pointer to next
PCB in queue )
Memory-management information (base register, limit
register, page-table/segment-table pointer)
Accounting information (amount of CPU or real time used,
time limits, jobs or process no. )
I/O status information (list of i/o devices allocated to this
process, a list of open files)
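A hypothetical C sketch of a PCB carrying the fields above (field sizes are arbitrary, illustrative choices):
enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
    int             pid;             /* process number (accounting) */
    enum proc_state state;           /* process state */
    unsigned long   program_counter;
    unsigned long   registers[16];   /* saved CPU registers */
    int             priority;        /* CPU-scheduling information */
    unsigned long   base, limit;     /* memory-management information */
    long            cpu_time_used;   /* accounting information */
    int             open_files[16];  /* I/O status information */
    struct pcb     *next;            /* next PCB in its queue */
};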
Process Control Block (PCB)
Pointer : pointer to the next PCB in the list (ready queue)
CPU Switch From Process to Process
Process Scheduling Queues
The objective is to try to keep the CPU as busy as possible for maximum CPU
utilization & throughput – on a single CPU m/c only one process at a time can
be allocated CPU
Job queue – set of all processes in the system ( even
processes which are ready to run but are still in
secondary memory)
Ready queue – set of all processes residing in main
memory, ready and waiting to execute ie waiting for CPU
to be allocated
Device queues – set of processes waiting for an I/O
device to become free (as the device may be busy serving
some other process)
In its lifetime, a process may pass through
various queues
Ready Queue And Various I/O Device Queues
Representation of Process Scheduling
Note : The process could be removed forcibly from the CPU, as a result of an
interrupt and put back in the ready queue.
(figure: rectangles represent queues; circles represent the
resources that serve them)
Schedulers
Long-term scheduler (or job scheduler) – selects which
processes should be brought into the ready queue.
Short-term scheduler (or CPU scheduler) – selects which
process should be executed next and allocates CPU.
(figure: the long-term scheduler brings jobs from the job queue
into the ready queue; the short-term scheduler assigns the CPU)
Schedulers (Cont.)
Short-term scheduler is invoked very frequently (milliseconds) so
(must be fast). { as the process will run for a short time (interactive )
then i/o will occur or time slice will expire }
Long-term scheduler is invoked very infrequently (seconds, minutes)
(may be slow). { as new process are not created so frequently }
The long-term scheduler controls the degree of multiprogramming.
The degree of multiprogramming is stable if the rate of creation of
processes equals the rate of termination of processes; to keep it
stable, the long-term scheduler need be invoked only when some
process terminates or leaves the system
The L.T.S should select a good mix of I/O-bound and CPU-bound processes
so as to make efficient and optimal use of all the I/O devices and the CPU
Processes can be described as either:
I/O-bound process – spends more time doing I/O than
computations, many short CPU bursts.
CPU-bound process – spends more time doing computations; few
very long CPU bursts.
Addition of Medium Term Scheduling
(In time sharing system generally long term scheduler is absent and in place a
medium term scheduler is used)
The key idea behind the medium-term scheduler is that it can sometimes be
advantageous to remove processes from memory (and from active contention for
the CPU) and thus to decrease the degree of multiprogramming. At some later time
the process can be reintroduced into memory and its execution can be continued
where it left off. This scheme is called swapping. Swapping (medium term
scheduler) may be necessary to improve the process mix or because a change in
memory requirement has overcommitted available memory, requiring memory to
be freed.
Context Switch
When CPU switches to another process, the system must
save the state of the old process and load the saved state
for the new process.
Context-switch time is pure overhead; the system
does no useful work while switching.
Time dependent on hardware support.
CPU Switch From Process to Process
Process Creation
Parent process create children processes, which, in turn
create other processes, forming a tree of processes.
Resource sharing
Parent and children share all resources.
Children share subset of parent’s resources.
{1. Some OSs pass resources to the child process explicitly.
2. Restricting a child process to a subset of the parent's
resources prevents any process from overloading the
system by creating too many sub-processes.}
Parent and child share no resources.
Execution ( 2 possibilities exist )
Parent and children execute concurrently.
Parent waits until some or all of its children terminate.
Process Creation (Cont.)
There are also two possibilities in terms of address space
of the new processes
Child process is duplicate of the parent processes.
Child processes has a program loaded into it.
UNIX examples
fork system call creates new process (duplicate of parent )
exec system call used after a fork to replace the process’
memory space with a new program.
wait() – if the parent has nothing to do while the child runs
(or until the child completes), the parent can issue a wait system
call to move itself off the ready queue until the termination of
the child
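Putting the three calls together (a standard UNIX pattern):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                  /* create a duplicate of the parent */
    if (pid < 0) {                       /* fork failed */
        perror("fork");
        exit(1);
    }
    if (pid == 0) {                      /* child: replace its memory image */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");                /* reached only if exec fails */
        exit(1);
    }
    wait(NULL);                          /* parent waits for the child to end */
    printf("child %d finished\n", (int)pid);
    return 0;
}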
Process Termination
When a process terminates, all its resources, including
physical and virtual memory, open files, and I/O buffers, are
deallocated by the OS
A process terminates when it executes its last statement and asks the
operating system to delete it via an implicit/explicit call to the exit system call
Output data from child to parent (via wait).
Process’ resources are deallocated by operating system.
Parent may terminate execution of children processes (abort).
Child has exceeded allocated resources.
Task assigned to child is no longer required.
Parent is exiting.
Operating system does not allow child to continue if its parent
terminates.
Cascading termination.
A user can kill another process using the kill system call (kill
generates a signal to abort a process, but the process can ignore
it in most cases)
The wait system call returns the process identifier of a terminated
child, so that the parent can tell which of its possibly many children
has terminated
If the parent terminates, however, all its children are terminated by
the OS: without a parent, UNIX does not know to whom to report
the activities of the child
Cooperating Processes
Independent process cannot affect or be affected by the
execution of another process. ( any process that does not
share any data (temporary or persistent with any other
processes is independent )
Cooperating process can affect or be affected by the
execution of another process ( any process that share
data with other processes is co – operating )
Advantages of process cooperation
Information sharing
Computation speed-up (a big program can be divided into many
small programs that run concurrently and share
data whenever required)
Modularity (logically breaking a program down into smaller
ones makes the processes/programs more modular)
Convenience
Producer-Consumer Problem
To cooperate (share data), processes need to communicate –
IPC (interprocess communication)
A good example of cooperating processes is the producer-
consumer problem
Paradigm for cooperating processes, producer process
produces information that is consumed by a consumer
process.
Example :
producer = print program produces character, consumer = printer driver
Producer = compiler produces assembly code, consumer = assembler
Producer = assembler produces object code, consumer = loader
The producer and consumer must be synchronized, so
that the consumer does not try to consume an item that
has not yet been produced. In this situation, consumer
must wait until an item is produced
The producer and consumer need to share some memory (a buffer)
for information exchange; buffers can be of two types
unbounded-buffer places no practical limit on the size of the
buffer.
bounded-buffer assumes that there is a fixed buffer size.
Bounded-Buffer – Shared-Memory Solution
Shared data
#define BUFFER_SIZE 10
typedef struct {
...
} item;
item buffer[BUFFER_SIZE]; /* circular buffer */
int in = 0; //rear: next free slot
int out = 0; //front: first full slot
// Solution is correct, but can only use BUFFER_SIZE-1 elements:
// in == out means empty, so "full" must leave one slot unused
Bounded-Buffer – Producer Process
item nextProduced;
while (1) {
while (((in + 1) % BUFFER_SIZE) == out)
; /* do nothing: buffer is full */
buffer[in] = nextProduced;
in = (in + 1) % BUFFER_SIZE;
}
Bounded-Buffer – Consumer Process
item nextConsumed;
while (1) {
while (in == out)
; /* do nothing: buffer is empty */
nextConsumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
}
Interprocess Communication (IPC)
1) shared memory (as in producer consumer problem)
2) Message system as discussed below
Mechanism for processes to communicate and to
synchronize their actions – the definition of IPC.
Message system – processes communicate with each
other without resorting to shared variables.
IPC facility provides two operations:
send(message) – message size fixed or variable
receive(message)
If P and Q wish to communicate, they need to:
establish a communication link between them
exchange messages via send/receive
Implementation of communication link
physical (e.g., shared memory, hardware bus (link))
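For instance, a UNIX pipe gives an OS-implemented link; the sketch below sends one message from a parent (sender P) to its child (receiver Q). Illustrative only; error checking omitted.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    pipe(fd);                         /* fd[0] = read end, fd[1] = write end */
    if (fork() == 0) {                /* child acts as receiver Q */
        char buf[64];
        ssize_t n = read(fd[0], buf, sizeof buf - 1);   /* receive(message) */
        buf[n > 0 ? n : 0] = '\0';
        printf("received: %s\n", buf);
        return 0;
    }
    write(fd[1], "hello", 5);         /* parent acts as sender P: send(message) */
    wait(NULL);
    return 0;
}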
Direct Communication
Processes must name each other explicitly (symmetric addressing ):
send (P, message) – send a message to process P
receive(Q, message) – receive a message from process Q
Asymmetric addressing – only the sender needs to name the
receiver; the receiver is not required to name the sender
{ send(P, message); receive(id, message) – id is
automatically set to the identity of the message sender }
Properties of communication link
Links are established automatically.
A link is associated with exactly one pair of communicating
processes.
Between each pair there exists exactly one link.
The link may be unidirectional, but is usually bi-directional.
Indirect Communication
Messages are directed and received from mailboxes (also
referred to as ports).
Each mailbox has a unique id.
Processes can communicate only if they share a mailbox.
Properties of communication link
Link established only if processes share a common mailbox
A link may be associated with many processes (many
processes may be members of the same mailbox)
Each pair of processes may share several communication
links (a pair of processes may share more than one
mailbox)
Link may be unidirectional or bi-directional.
Indirect Communication
Operations
create a new mailbox
send and receive messages through mailbox
destroy a mailbox
Primitives are defined as:
send(A, message) – send a message to mailbox A
receive(A, message) – receive a message from mailbox A
Synchronization
Message passing may be either blocking or non-blocking.
Blocking is considered synchronous
Non-blocking is considered asynchronous
send and receive primitives may be either blocking or
non-blocking.
Buffering
Queue of messages attached to the link; implemented in
one of three ways.
1. Zero capacity – 0 messages
Sender must wait for the receiver (rendezvous): the sender
blocks until the receiver receives the message
2. Bounded capacity – finite length of n messages
Sender must wait if link full.
3. Unbounded capacity – infinite length
Sender never waits.
Other IPC Issues
Process termination – if the sender/receiver terminates, there
should be some way for the process at the other end
(receiver/sender) to learn about the termination.
(Solution) ACKs, multiple timeouts
Lost messages – (solution) ACKs, timeouts (the timeout should
be chosen intelligently, otherwise duplicates may appear at
the receiver, and solving that problem requires
duplicate filtering)
Scrambled messages – use error-checking codes such as
checksums, parity, or CRC (cyclic redundancy check), and
have the sender resend if an error is detected
Chapter 5: Threads
A process with a single thread of execution is called a
heavyweight process
Thread = lightweight process
Single and Multithreaded Processes
(figure: a single-threaded process has one program counter (PC);
in a multithreaded process each thread has its own PC and stack,
while code, data, and open files are shared)
Benefits
Context switching among threads is fast and thus computationally
less expensive than context switching among processes, as no
memory-management work needs to be done in the case of threads.
Rest of the benefits of multiple threads are similar to
multiprocessing.
Multiple threads – good when concurrency is required and the
different threads are closely related logically. In an editor, one
thread may be taking input while another thread does the
formatting.
Multiple processes – good when concurrency is required but the
tasks in hand are not very closely related, e.g., two unrelated
tasks like MS Word and a printer driver: the MS Word application
produces data, and the printer driver consumes the data produced
by MS Word. Some sort of synchronization among such
processes may be required.
User Threads
Thread implementation & management done by user-level
threads library (thread _create, thread_exit, thread_wait, etc.)
rather than via system calls.
Switching user threads (say, when a thread voluntarily
goes to sleep without making any system call) does not need a
call into the OS or an interrupt to the kernel. So
switching between user-level threads can be done independently
of the OS and therefore very quickly.
Disadvantages:
If the kernel is single threaded, then any user level thread
executing a system call will cause the entire task to wait, until
the system call returns because kernel schedule only processes
& processes waiting for I/O (system call ) are put in wait queue
& can not be allotted CPU.
Unfair scheduling – a process p1 containing a single thread t1
will get 100 times more chances to run than a thread t2 which is
one of the threads of a process p2 containing 100 threads.
Examples (user level thread libraries )
- POSIX Pthreads, Mach C-threads, Solaris threads
Threads can create child threads, like processes; but unlike
processes, threads are not independent of one another,
because all threads can access every address in the task,
and thus a thread can read or write over any other
thread's stack. No protection is actually needed because,
unlike processes, all the threads in a process are owned by the
same user.
Kernel Threads
The threads are implemented and managed with the help of the OS kernel. The
smallest unit of processing the kernel recognizes, and thus schedules, is a thread.
So each thread may be scheduled independently.
(figure: process A has a single thread t1a; process B has 100
threads t1b … t100b)
So process B could receive 100 times the CPU time that process A receives.
Now if a thread makes an I/O request or a system call, the whole process
need not be blocked (only that thread is blocked), and thus another thread
in the same process can run during this time.
Examples
- Windows 95/98/NT/2000
- Solaris
- Tru64 UNIX
- BeOS
- Linux
Multithreading Models
There can be two types of kernel also
Single threaded
Multi threaded
Many-to-One
One-to-One
Many-to-Many
Many-to-One
Many user-level threads mapped to single kernel thread.
Used on systems that do not support kernel threads.
Example: UNIX
Many-to-One Model
One-to-One
Each user-level thread maps to kernel thread.
Examples
- Windows 95/98/NT/2000
- OS/2
One-to-one Model
Many-to-Many Model
Allows many user level threads to be mapped to many
kernel threads.
Allows the operating system to create a sufficient number
of kernel threads.
Solaris 2
Windows NT/2000 with the ThreadFiber package
Many-to-Many Model
Pthreads
a POSIX standard (IEEE 1003.1c) API for thread creation
and synchronization.
The API specifies the behavior of the thread library; the
implementation is up to the developers of the library.
Common in UNIX operating systems.
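A minimal Pthreads example (compile with -lpthread): create one thread and wait for it.
#include <stdio.h>
#include <pthread.h>

void *worker(void *arg)
{
    printf("hello from thread %d\n", *(int *)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int id = 1;
    pthread_create(&tid, NULL, worker, &id);   /* thread creation */
    pthread_join(tid, NULL);                   /* wait for the thread to exit */
    return 0;
}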
Solaris 2 Threads
Supports kernel & user level threads
Thread in Solaris 2
All operations within the kernel are executed by standard kernel-
level threads. There is a kernel-level thread for each LWP
(lightweight process), and there are some kernel-level threads
which run on the kernel's behalf and have no associated LWP (for
instance, a thread to service disk requests)
Kernel-level threads are the only objects scheduled within the
system.
Some kernel-level threads are multiplexed on the processors in
the system, whereas some are tied to a specific processor. For
instance, the kernel thread associated with a device driver for a
device connected to a specific processor will run only on that
processor. By request, a thread can also be pinned to a processor:
only that thread runs on the processor, with the processor allocated
to only that thread (see the rightmost processor in the figure)
Kernel threads are the only ones scheduled by the kernel.
If an LWP makes a blocking system call, the kernel thread serving this
LWP is blocked, and another LWP is scheduled to execute.
Chapter 6: CPU Scheduling
Basic Concepts
Scheduling Criteria
Scheduling Algorithms
Multiple-Processor Scheduling
Real-Time Scheduling
Algorithm Evaluation
Basic Concepts
CPU scheduling is the basis of multiprogrammed OSs
Our goal – maximum CPU utilization
Several processes are kept in memory at one time. When one
processes has to wait (for say i/o ), the O.S’s CPU scheduler
takes the CPU away from that process and gives the CPU to
another process.
The process in memory and ready to run are kept in the ready
queue.
The OS's CPU scheduler (or short-term scheduler)
selects a job from the ready queue to run.
Ready queue is not necessarily in first-in-first-out (FIFO) order.
A ready queue may be implemented as a FIFO queue, a priority
queue, a tree or simply an unordered link list.
The records in the ready queue are the PCBs
Process execution generally consists of alternating cycles of
CPU execution and I/O wait: the CPU burst–I/O burst cycle
Alternating Sequence of CPU And I/O Bursts
Histogram of CPU-burst Times
Note: the figure shows a large number of short CPU bursts and a small
number of long CPU bursts. An I/O-bound program typically has many
very short CPU bursts and long I/O bursts. A CPU-bound program may
have a few very long CPU bursts. This distribution can be important in
the selection of an appropriate CPU-scheduling algorithm
CPU Scheduler
Selects from among the processes in memory that are ready to
execute, and allocates the CPU to one of them.
CPU scheduling decisions may take place when a process:
1. Switches from running to waiting state.
2. Switches from running to ready state.
3. Switches from waiting to ready.
4. Terminates.
Scheduling under 1 and 4 is nonpreemptive. under
nonpreemptive scheduling, once the CPU has been allocated to
a process, the process keeps the CPU until it releases the CPU
either by terminating or by switching to the waiting state.
All other scheduling is preemptive.
Dispatcher
Dispatcher module gives control of the CPU to the
process selected by the short-term scheduler; this
involves:
switching context
switching to user mode
jumping to the proper location in the user program to restart
that program
Dispatch latency – time it takes for the dispatcher to stop
one process and start another running.
Scheduling Criteria
CPU utilization – keep the CPU as busy as possible
Throughput – no. of processes that complete their execution per
time unit
Turnaround time – amount of time to execute a particular
process
Waiting time – amount of time a process has been waiting in the
ready queue
Response time – amount of time from when a request
was submitted until the first response is produced, not the final
output (for time-sharing environments) {as producing the full
output can take a long time, it is often more appropriate to
measure how soon the process starts responding}
Note: for interactive (time-sharing) systems it is
sometimes more important to minimize the variance in response
time than to minimize the average response time.
Optimization Criteria
Max CPU utilization
Max throughput
Min turnaround time
Min waiting time
Min response time
First-Come, First-Served (FCFS) Scheduling
Process Burst Time
P1 24
P2 3
P3 3
Suppose that the processes arrive in the order: P1 , P2 , P3
The Gantt Chart for the schedule is:
P1 P2 P3
0 24 27 30
Waiting time for P1 = 0; P2 = 24; P3 = 27
Average waiting time: (0 + 24 + 27)/3 = 17
FCFS Scheduling (Cont.)
First to enter the ready queue will be first to be allocated to CPU
When a process enters the ready queue, its PCB is linked in the
tail of the ready queue in this scheme.
FCFS is nonpreemptive ( cond. 1 & 4)
Suppose that the processes arrive in the order
P2 , P3 , P1 .
The Gantt chart for the schedule is:
P2 P3 P1
0 3 6 30
Waiting time for P1 = 6; P2 = 0; P3 = 3
Average waiting time: (6 + 0 + 3)/3 = 3
Much better than previous case.
Convoy effect – all the other processes wait for one big
process to get off the CPU, resulting in lower CPU and device
utilization (running the big process first does not decrease its own
waiting time as much as it increases the waiting time of all the
other processes)
Solution – schedule short processes first and long processes
later to avoid the convoy effect
Shortest-Job-First (SJF) Scheduling
Associate with each process the length of its next CPU
burst (in case of a tie we can use FCFS ). Use these
lengths to schedule the process with the shortest time.
Two schemes:
nonpreemptive – once the CPU is given to a process, it cannot
be preempted until it completes its CPU burst.
preemptive – if a new process arrives with CPU burst length
less than remaining time of current executing process,
preempt the running process. This scheme is known as the
Shortest-Remaining-Time-First (SRTF).
SJF is optimal – gives the minimum average waiting time for
a given set of processes. {By moving a short process
before a long one, the waiting time of the short process
decreases more than the waiting time of the long
process increases; consequently the average waiting time
decreases.}
Example of Non-Preemptive SJF
Process Arrival Time Burst Time
P1 0.0 7
P2 2.0 4
P3 4.0 1
P4 5.0 4
SJF (non-preemptive)
P1 P3 P2 P4
0 7 8 12 16
Average waiting time = (0 + 6 + 3 + 7)/4 = 4
Example of Preemptive SJF
Process Arrival Time Burst Time
P1 0.0 7
P2 2.0 4
P3 4.0 1
P4 5.0 4
SJF (preemptive)
P1 P2 P3 P2 P4 P1
0 2 4 5 7 11 16
Average waiting time = (9 + 1 + 0 +2)/4 = 3
Determining Length of Next CPU Burst
Can only estimate/predict the length.
Can be done by using the length of previous CPU bursts,
using exponential averaging (it is expected that the next
CPU burst will be similar in length to the previous ones)
1. t_n = actual length of the n-th CPU burst
2. τ_{n+1} = predicted value for the next CPU burst
3. α, 0 ≤ α ≤ 1 – the relative weight given to the recent past (history)
4. Define: τ_{n+1} = α·t_n + (1 − α)·τ_n
Examples of Exponential Averaging
α = 0
τ_{n+1} = τ_n
Recent history has no effect; the last prediction carries all the
weight.
α = 1
τ_{n+1} = t_n
Only the actual last CPU burst counts.
If we expand the formula, we get:
τ_{n+1} = α·t_n + (1 − α)·α·t_{n−1} + …
          + (1 − α)^j·α·t_{n−j} + …
          + (1 − α)^{n+1}·τ_0
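As a quick illustration, here is a minimal C sketch of this
prediction (the function name, the initial guess, and the sample
burst history are ours, not from the notes):

#include <stdio.h>

/* tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n */
double predict_next_burst(double tau, double t_n, double alpha) {
    return alpha * t_n + (1.0 - alpha) * tau;
}

int main(void) {
    double tau = 10.0;                    /* initial guess tau_0 */
    double bursts[] = {6.0, 4.0, 6.0};    /* measured CPU bursts */
    for (int i = 0; i < 3; i++) {
        tau = predict_next_burst(tau, bursts[i], 0.5);
        printf("prediction after burst %d: %.1f\n", i + 1, tau);
    }
    return 0;
}

With α = 0.5 this prints 8.0, 6.0, 6.0: each prediction is the
average of the last measured burst and the previous prediction.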
OS Notes by Dr. Naveen Choudhary
Priority Scheduling
A priority number (integer) is associated with each process
The CPU is allocated to the process with the highest priority
(smallest integer ⇒ highest priority). { if priorities are equal,
use FCFS to break the tie }
Preemptive (a preemptive priority scheduling algo. Will
preempt the CPU if the priority of the newly arrived process
is higher than the priority of the currently running process )
nonpreemptive
SJF is a priority scheduling where priority is the predicted next
CPU burst time.
Internal priorities (set by the OS) are generally assigned based on
factors like time limits, memory requirements, the number of open
files, and the ratio of average I/O burst to average CPU burst.
External priorities (set by criteria external to the OS) are
generally assigned based on factors like the importance of the
process, the type and amount of funds being paid for computer use,
the department sponsoring the work, and other, often political,
factors.
Problem Starvation – low priority processes may never
execute.
Solution Aging – as time progresses increase the priority of
the process. (for example say a waiting process’s priority is
increased by 1 every 15 minutes)
OS Notes by Dr. Naveen Choudhary
Round Robin (RR)
Each process (can pick process from the ready queue in
FIFO order) gets a small unit of CPU time (time
quantum), usually 10-100 milliseconds. After this time
has elapsed, the process is preempted and added to the
end of the ready queue.
If there are n processes in the ready queue and the time
quantum is q, then each process gets 1/n of the CPU time
in chunks of at most q time units at once. No process
waits more than (n-1)q time units.
Performance
q large ⇒ behaves like FIFO (FCFS).
q small ⇒ q must still be large with respect to the context-switch
time, otherwise the overhead is too high: most of the time will be
wasted in context switches. If context-switch time is added in, the
average turnaround time increases for a smaller time quantum, since
more context switches are required.
OS Notes by Dr. Naveen Choudhary
Example of RR with Time Quantum = 20
Process Burst Time
P1 53
P2 17
P3 68
P4 24
The Gantt chart is:
P1 P2 P3 P4 P1 P3 P4 P1 P3 P3
0 20 37 57 77 97 117 121 134 154 162
Typically, higher average turnaround than SJF, but better
response.
The average waiting time under the RR policy is, however, often
quite long.
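The schedule above can be reproduced mechanically; the following
small C program (ours, not part of the notes) simulates RR for
these four bursts with q = 20 and prints each process's waiting
time:

#include <stdio.h>

int main(void) {
    int burst[] = {53, 17, 68, 24};          /* P1..P4, all arrive at 0 */
    int n = 4, q = 20;
    int rem[4], finish[4], t = 0, left = 4;
    for (int i = 0; i < n; i++) rem[i] = burst[i];

    while (left > 0) {
        for (int i = 0; i < n; i++) {        /* cyclic scan = FIFO re-queue */
            if (rem[i] == 0) continue;
            int slice = rem[i] < q ? rem[i] : q;
            t += slice;                      /* run Pi for one quantum */
            rem[i] -= slice;
            if (rem[i] == 0) { finish[i] = t; left--; }
        }
    }
    for (int i = 0; i < n; i++)              /* wait = finish - burst */
        printf("P%d: waiting time = %d\n", i + 1, finish[i] - burst[i]);
    return 0;
}

Since all four processes arrive at time 0, cycling through the
array in index order visits them exactly as a FIFO ready queue
with re-queueing at the tail would.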
OS Notes by Dr. Naveen Choudhary
Time Quantum and Context Switch Time
OS Notes by Dr. Naveen Choudhary
Multilevel Queue
Ready queue may be partitioned into separate queues:
foreground processes
background processes
Each queue has its own scheduling algorithm, for
example
foreground – RR
background – FCFS
Scheduling must be done between the queues.
Fixed priority scheduling; (i.e., serve all from foreground
then from background). Possibility of starvation.
Time slice – each queue gets a certain amount of CPU time
which it can schedule amongst its processes; i.e., 80% to
foreground in RR
20% to background in FCFS
Scheduling among the queues can be preemptive ⇒ if a job in a
low-priority queue is running and a job arrives in a high-priority
queue, the low-priority job is preempted.
OS Notes by Dr. Naveen Choudhary
Multilevel Queue Scheduling
OS Notes by Dr. Naveen Choudhary
Multilevel Feedback Queue
This scheme allows processes to move among the queues: the idea is
to separate processes with different CPU-burst characteristics. If
a process uses too much CPU time, it is moved to a lower-priority
queue. This scheme leaves I/O-bound and interactive processes in
the high-priority queues. Similarly, a process that waits too long
in a lower-priority queue may be moved to a higher-priority queue.
This form of aging prevents starvation.
Scheduling among queue is fixed priority preemptive
scheduling.
Multilevel-feedback-queue scheduler defined by the following
parameters:
number of queues
scheduling algorithms for each queue
method used to determine when to upgrade a process
method used to determine when to demote a process
method used to determine which queue a process will enter
when that process needs service (generally a process enters queue 0
(the highest-priority queue), and if it takes too much time it is
gradually demoted to the lower-priority queues).
OS Notes by Dr. Naveen Choudhary
Example of Multilevel Feedback Queue
Three queues:
Q0 – time quantum 8 milliseconds
Q1 – time quantum 16 milliseconds
Q2 – FCFS
Scheduling
A new job enters queue Q0 which is served in FCFS order.
When it gains CPU, job receives 8 milliseconds. If it does
not finish in 8 milliseconds, job is moved to queue Q1.
At Q1 job is again served (may be in FCFS ) and receives
16 additional milliseconds. If it still does not complete, it is
preempted and moved to queue Q2.
OS Notes by Dr. Naveen Choudhary
Multilevel Feedback Queues
OS Notes by Dr. Naveen Choudhary
Multiple-Processor Scheduling
CPU scheduling more complex when multiple CPUs are
available.
2 types :
homogeneous systems (same type of processors)
heterogeneous systems (different types of processors)
Homogeneous processors ⇒ load sharing possible
2 ways of scheduling in such a system:
1) any free processor reads the ready queue and picks up a
process to execute.
2) one processor (the master) selects a process and the
processor on which that process should run.
OS Notes by Dr. Naveen Choudhary
Algorithm Evaluation
Deterministic modeling – takes a particular predetermined workload and
defines the performance of each scheduling algorithm for that
workload.
The evaluation is generally performed based on criteria like CPU
utilization, waiting time, response time, throughput, etc.
Deterministic modeling : take a sample (a ready-queue snapshot) and
analyze the performance of the various algorithms on this sample.
Queueing models : Little's formula : n = λ × W
avg. length of the queue = rate of arrival of new jobs × avg.
waiting time in the queue
Now we have the actual queue space available and the required queue
length (according to Little's formula), so we can calculate queue
utilization (CPU utilization) based on queueing-network analysis
theory (queueing models, operations research).
Simulation: run simulation programs on mathematical (randomly
generated) or empirical data. Disadvantage : a very time-consuming
process.
empirical data = data based on observation
OS Notes by Dr. Naveen Choudhary
Chapter 7: Process Synchronization
Background
The Critical-Section Problem
Synchronization Hardware
Semaphores
Classical Problems of Synchronization
Critical Regions
Monitors
Synchronization in Solaris 2 & Windows 2000
OS Notes by Dr. Naveen Choudhary
Background
Cooperating processes share data.
Concurrent access to shared data may result in data inconsistency.
Maintaining data consistency requires mechanisms
to ensure the orderly execution of cooperating
processes.
The shared-memory solution to the bounded-buffer problem
(Chapter 4) allows at most n − 1 items in the buffer at the same
time. A solution where all n buffers are used is not simple.
Suppose that we modify the producer-consumer code by
adding a variable counter, initialized to 0 and incremented
each time a new item is added to the buffer
OS Notes by Dr. Naveen Choudhary
Bounded-Buffer
Shared data
#define BUFFER_SIZE 10
typedef struct {
...
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
int counter = 0;
OS Notes by Dr. Naveen Choudhary
Bounded-Buffer
Producer process
item nextProduced;
while (1) {
while (counter == BUFFER_SIZE)
; /* do nothing */
buffer[in] = nextProduced;
in = (in + 1) % BUFFER_SIZE;
counter++;
}
OS Notes by Dr. Naveen Choudhary
Bounded-Buffer
Consumer process
item nextConsumed;
while (1) {
while (counter == 0)
; /* do nothing */
nextConsumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
counter--;
}
OS Notes by Dr. Naveen Choudhary
Bounded Buffer
The statements
counter++;
counter--;
must be performed atomically.
An atomic operation is an operation that completes in its entirety
without interruption; in other words, if two such instructions
(operations) are executed concurrently, the result is equivalent to
their sequential execution in some unknown order. (We can assume
that basic machine-language instructions like load, store, and test
are executed atomically.)
OS Notes by Dr. Naveen Choudhary
Bounded Buffer
The statement “counter++” may be implemented in
machine language as:
register1 = counter
register1 = register1 + 1
counter = register1
The statement “counter--” may be implemented as:
register2 = counter
register2 = register2 – 1
counter = register2
OS Notes by Dr. Naveen Choudhary
Bounded Buffer
If both the producer and consumer attempt to update the
buffer concurrently, the assembly language statements
may get interleaved.
Interleaving depends upon how the producer and
consumer processes are scheduled.
OS Notes by Dr. Naveen Choudhary
Bounded Buffer
Assume counter is initially 5. One interleaving of
statements is:
producer: register1 = counter (register1 = 5)
producer: register1 = register1 + 1 (register1 = 6)
consumer: register2 = counter (register2 = 5)
consumer: register2 = register2 – 1 (register2 = 4)
producer: counter = register1 (counter = 6)
consumer: counter = register2 (counter = 4)
The value of counter may be either 4 or 6, where the
correct result should be 5.
OS Notes by Dr. Naveen Choudhary
Race Condition
Race condition: a situation where several processes access and
manipulate shared data concurrently, and the final value of the
shared data depends upon which process finishes last (i.e., the
final value depends on the order of process execution).
Soln ⇒ only one process should be manipulating/accessing the
shared data at a time; to achieve this we need some form of
synchronization among the cooperating processes.
OS Notes by Dr. Naveen Choudhary
The Critical-Section Problem
n processes all competing for use of some shared data
Each process has a code segment, called critical section,
in which the shared data is accessed.
Problem – ensure that when one process is executing in
its critical section, no other process is allowed to execute
in its critical section.
OS Notes by Dr. Naveen Choudhary
Solution to Critical-Section Problem
1. Mutual Exclusion. If process Pi is executing in its critical
section, then no other processes can be executing in their
critical sections.
2. Progress. If no process is executing in its critical section
and there exist some processes that wish to enter their
critical section, then the selection of the processes that
will enter the critical section next cannot be postponed
indefinitely.
3. Bounded Waiting. A bound must exist on the number of
times that other processes are allowed to enter their
critical sections after a process has made a request to
enter its critical section and before that request is
granted.
Assume that each process executes at a nonzero speed
No assumption concerning relative speed of the n
processes.
OS Notes by Dr. Naveen Choudhary
Initial Attempts to Solve Problem
( general structure of a typical process – which is sharing some
critical data with other processes )
Only 2 processes, P0 and P1
General structure of process Pi (other process Pj)
do {
entry section
critical section
exit section
remainder section
} while (1);
Processes may share some common variables to
synchronize their actions.
OS Notes by Dr. Naveen Choudhary
Synchronization Hardware
Some simple hardware instructions (like TestAndSet and Swap) are
available on many systems to solve the critical-section problem.
TestAndSet modifies the contents of a word atomically:
boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}   // should be executed atomically
Note : in a uniprocessor system, solving the critical-section
problem is easy: just disallow interrupts while a shared variable
is being modified; that approach does not work, however, in a
multiprocessing environment.
OS Notes by Dr. Naveen Choudhary
Mutual Exclusion with Test-and-Set
Shared data:
boolean lock = false;
Process Pi
do {
    while (TestAndSet(lock))
        ;   /* no-operation (busy wait) */
    critical section
    lock = false;
    remainder section
} while (1);
OS Notes by Dr. Naveen Choudhary
Synchronization Hardware
Atomically swap two variables.
void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}   // should be executed atomically
OS Notes by Dr. Naveen Choudhary
Mutual Exclusion with Swap
Shared data (initialized to false):
boolean lock;
boolean waiting[n];
Process Pi
do {
    key = true;
    while (key == true)
        Swap(lock, key);
    critical section
    lock = false;
    remainder section
} while (1);
OS Notes by Dr. Naveen Choudhary
Semaphores
The solutions to the critical-section problem discussed on the
previous pages are not easy to generalize to more complex problems,
so we use a synchronization tool called a semaphore.
A semaphore S is an integer variable.
Apart from initialization, a semaphore can only be accessed via two
indivisible (atomic) operations:
wait(S):     // should be executed atomically
    while (S <= 0)
        ;    // no-op
    S--;
signal(S):   // should be executed atomically
    S++;
OS Notes by Dr. Naveen Choudhary
Critical Section of n Processes
Shared data:
semaphore mutex; //initially mutex = 1
Process Pi:
do {
wait(mutex);
critical section
signal(mutex);
remainder section
} while (1);
OS Notes by Dr. Naveen Choudhary
Semaphore Implementation
The previous semaphore's disadvantage: it is based on busy waiting
(a spin lock), wasting CPU cycles { so busy waiting wastes
important CPU cycles }.
Advantages of spin locks:
1) no context switch is required when a process must wait on a
lock
2) a context switch may take considerable time; thus, when locks
are expected to be held for short times, spin locks are useful.
Define a semaphore as a record
typedef struct {
int value;
struct process *L;
} semaphore;
Assume two simple operations:
block suspends the process that invokes it.
wakeup(P) resumes the execution of a blocked process P.
OS Notes by Dr. Naveen Choudhary
Implementation
Semaphore operations now defined as
wait(S):
S.value--;
if (S.value < 0) {
add this process to S.L;
/* 1)list of waiting processes for semaphore S
2) waiting : state of the process is switched to waiting & then control is
transferred to the CPU scheduler, which selects another process to execute
3) list : list of PCBs of waiting processes */
block;
}
signal(S):
S.value++;
if (S.value <= 0) {
remove a process P from S.L;
wakeup(P);
/* resume the execution of a blocked process, i.e., remove a process from the waiting list
and put it in the ready queue */
}
Note : in this case, unlike the busy-waiting semaphore, the semaphore value can be
negative. If the semaphore value is negative, its magnitude is the number of processes
waiting on that semaphore.
OS Notes by Dr. Naveen Choudhary
Semaphore as a General Synchronization Tool
Execute B in Pj only after A executed in Pi
Use semaphore flag initialized to 0
Code:
Pi                    Pj
    A                     wait(flag)
    signal(flag)          B
OS Notes by Dr. Naveen Choudhary
Deadlock and Starvation
Deadlock – two or more processes are waiting indefinitely for
an event that can be caused by only one of the waiting
processes.
Let S and Q be two semaphores initialized to 1
P0                    P1
wait(S);   (1)        wait(Q);   (2)
wait(Q);   (3)        wait(S);   (4)
signal(S);            signal(Q);
signal(Q);            signal(S);
Starvation – indefinite blocking. A process may never be
removed from the semaphore queue in which it is suspended.
(like if we add and remove processes from the list associated
with semaphore in LIFO order )
OS Notes by Dr. Naveen Choudhary
Two Types of Semaphores
Counting semaphore – integer value can range over
an unrestricted domain.
Binary semaphore – integer value can range only
between 0 and 1; can be simpler to implement.
Can implement a counting semaphore S as a binary
semaphore.
OS Notes by Dr. Naveen Choudhary
Implementing S as a Binary Semaphore
Data structures:
binary-semaphore S1, S2;
int C;
Initialization:
S1 = 1
S2 = 0
C = initial value of semaphore S
OS Notes by Dr. Naveen Choudhary
Implementing S
wait operation
wait(S1);
C--;
if (C < 0) {
signal(S1);
wait(S2);
}
signal(S1);
signal operation
wait(S1);
C ++;
if (C <= 0)
{ signal(S2); }
signal(S1);
OS Notes by Dr. Naveen Choudhary
Classical Problems of Synchronization
The following example represent different synchronization
problem that are important mainly because they are
examples for a large class of concurrency control
problems
Bounded-Buffer Problem
Readers and Writers Problem
Dining-Philosophers Problem
OS Notes by Dr. Naveen Choudhary
Bounded-Buffer Problem
Pool of n buffers each capable of holding one item
Mutex semaphore provides mutual exclusion for access
to the buffer pool
Shared data
semaphore full, empty, mutex;
Initially:
full = 0, empty = n, mutex = 1
OS Notes by Dr. Naveen Choudhary
Bounded-Buffer Problem Producer Process
do {
…
produce an item in nextp
…
wait(empty);
wait(mutex);
…
add nextp to buffer
…
signal(mutex);
signal(full);
} while (1);
OS Notes by Dr. Naveen Choudhary
Bounded-Buffer Problem Consumer Process
do {
wait(full)
wait(mutex);
…
remove an item from buffer to nextc
…
signal(mutex);
signal(empty);
…
consume the item in nextc
…
} while (1);
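For readers who want to run the two loops above, here is a minimal
compilable sketch using POSIX semaphores and pthreads (the buffer
size, item count, and names are ours, not from the notes; compile
with gcc -pthread):

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define N 5                       /* number of buffer slots */

int buffer[N];
int in = 0, out = 0;
sem_t full, empty, mutex;         /* full = 0, empty = N, mutex = 1 */

void *producer(void *arg) {
    for (int item = 1; item <= 20; item++) {
        sem_wait(&empty);         /* wait for a free slot    */
        sem_wait(&mutex);         /* lock the buffer pool    */
        buffer[in] = item;
        in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full);          /* one more filled slot    */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int k = 0; k < 20; k++) {
        sem_wait(&full);          /* wait for a filled slot  */
        sem_wait(&mutex);
        int item = buffer[out];
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&empty);         /* one more free slot      */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&full, 0, 0);
    sem_init(&empty, 0, N);
    sem_init(&mutex, 0, 1);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}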
OS Notes by Dr. Naveen Choudhary
Readers-Writers Problem
First readers-writers problem :: no reader will be kept waiting unless a writer has
already obtained permission to use the shared object. In other words, no reader
should wait for other readers to finish simply because a writer is waiting (problem:
writers can starve).
Second readers-writers problem :: once a writer is ready, that writer performs its
write as soon as possible. In other words, if a writer is waiting to access the object,
no new reader may start reading (problem: readers can starve).
The algorithm below solves the first readers-writers problem.
Shared data
int readcount;
semaphore mutex, wrt;
Initially
mutex = 1, wrt = 1, readcount = 0
OS Notes by Dr. Naveen Choudhary
Readers-Writers Problem Writer Process
wait(wrt);
…
writing is performed
…
signal(wrt);
OS Notes by Dr. Naveen Choudhary
Readers-Writers Problem Reader Process
wait(mutex);
readcount++;
if (readcount == 1)
wait(wrt);
signal(mutex);
…
reading is performed
…
wait(mutex);
readcount--;
if (readcount == 0)
signal(wrt);
signal(mutex);
mutex :: ensures mutual exclusion when the variable readcount is updated.
wrt :: used by the writers and also by the first and last reader that enters or exits the
C.S. (it is not used by readers who enter or exit while other readers are in their C.S.).
Note that , if a writer is in CS & n readers are waiting, then one reader is
queued on wrt and n-1 readers are queued on mutex. Also observe that,
when a writer executes signal(wrt), we may resume the execution of either
the waiting reader or a single waiting writer. The selection is made by the
scheduler.
OS Notes by Dr. Naveen Choudhary
Dining-Philosophers Problem
1- rice bowl, 5 - chairs, 5 – chopsticks
To eat , the philosopher will need both the chopsticks in front of
him
Shared data
semaphore chopstick[5]; // {chopstick[0] . . .
Chopstick[4] }
Initially all values are 1
OS Notes by Dr. Naveen Choudhary
Dining-Philosophers Problem
The solution below guarantees that no two neighbouring philosophers
are eating simultaneously.
But the solution is prone to deadlock (each philosopher grabs her
left chopstick).
Philosopher i:
do {
wait(chopstick[i])
wait(chopstick[(i+1) % 5])
…
eat
…
signal(chopstick[i]);
signal(chopstick[(i+1) % 5]);
…
think
…
} while (1);
OS Notes by Dr. Naveen Choudhary
Deadlock Free Dining-Philosophers Problem
Allow at most four philosophers to be sitting simultaneously at
the table.
Allow a philosopher to pick up her chopsticks only if both
chopsticks are available (note that she must pick them up in a
critical section).
Use an asymmetric solution, that is, an odd philosopher picks up
first her left chopstick and then her right chopstick, whereas an
even philosopher picks up her right chopstick and then her left
chopstick. (A sketch of this asymmetric order follows below.)
Note :: these deadlock-free solutions of the dining-philosophers
problem do not eliminate the possibility of starvation.
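A minimal sketch of the asymmetric pickup order, written in the
same pseudocode style as the loop above (it reuses the notes'
wait()/signal() semaphore operations and the chopstick array):

Philosopher i:
do {
    if (i % 2 == 1) {                    /* odd: left, then right  */
        wait(chopstick[i]);
        wait(chopstick[(i + 1) % 5]);
    } else {                             /* even: right, then left */
        wait(chopstick[(i + 1) % 5]);
        wait(chopstick[i]);
    }
    …
    eat
    …
    signal(chopstick[i]);
    signal(chopstick[(i + 1) % 5]);
    …
    think
    …
} while (1);

Because two neighbours now always contend for the same chopstick
first, the circular wait of the naive solution cannot form.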
OS Notes by Dr. Naveen Choudhary
Critical Regions
It is a high level language construct used for critical section
problem. It is basically implemented using semaphore internally.
It is provided as it is more user friendly & thus less prone to
errors by the programmer.
High-level synchronization construct
A shared variable v (shared by different processes) of type T, is
declared as:
v: shared T
Variable v accessed only inside statement
region v when B do S
where B is a boolean expression.
While statement S is being executed, no other process can
access variable v.
OS Notes by Dr. Naveen Choudhary
Critical Regions
Regions referring to the same shared variable exclude each
other in time.
When a process tries to execute the region statement, the
Boolean expression B is evaluated. If B is true, statement S is
executed. If it is false, the process is delayed until B becomes
true and no other process is in the region associated with v.
Thus if two statements
region v when true do s1;
region v when true do s2;
are executed concurrently in distinct sequential processes, the
result will be equivalent to the sequential execution “ s1 followed
by s2” or “s2 followed by s1”
OS Notes by Dr. Naveen Choudhary
Monitors
High-level synchronization construct that allows the safe sharing
of an abstract data type among concurrent processes.
monitor monitor-name
{
shared variable declarations
procedure body P1 (…) {
...
}
procedure body P2 (…) {
...
}
procedure body Pn (…) {
...
}
{
initialization code
}
}
OS Notes by Dr. Naveen Choudhary
Monitors …..contd
Internally, monitors are also basically implemented using
semaphores.
A procedure defined within a monitor can access only those
variables declared locally within the monitor and its formal
parameters.
Similarly, the local variables of a monitor can be accessed
only by the local procedures of the monitor.
The monitor construct ensures mutual exclusion, that is,
only one process at a time can be active within the
monitor.
But the monitor as described above does not support some
synchronization schemes ⇒ soln: use the condition
constructs described on the next slide.
OS Notes by Dr. Naveen Choudhary
Monitors
To allow a process to wait within the monitor, a
condition variable must be declared, as
condition x, y;
Condition variable can only be used with the
operations wait and signal.
The operation
x.wait();
means that the process invoking this operation is
suspended until another process invokes
x.signal();
The x.signal operation resumes exactly one suspended
process. If no process is suspended, then the signal
operation has no effect.
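There is no monitor construct in C, but POSIX condition variables
play a similar role; the sketch below (our analogy, not from the
notes) shows the wait/signal pattern, with the mutex standing in
for the monitor's implicit lock:

#include <stdio.h>
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* monitor's implicit lock */
pthread_cond_t  x = PTHREAD_COND_INITIALIZER;   /* condition variable x    */
int ready = 0;

void *waiter(void *arg) {
    pthread_mutex_lock(&m);
    while (!ready)                   /* x.wait(): sleep and release m */
        pthread_cond_wait(&x, &m);
    printf("resumed\n");
    pthread_mutex_unlock(&m);
    return NULL;
}

void *signaler(void *arg) {
    pthread_mutex_lock(&m);
    ready = 1;
    pthread_cond_signal(&x);         /* x.signal(): wake one waiter   */
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t w, s;
    pthread_create(&w, NULL, waiter, NULL);
    pthread_create(&s, NULL, signaler, NULL);
    pthread_join(w, NULL);
    pthread_join(s, NULL);
    return 0;
}

One difference from a true monitor: the caller must take and
release the lock explicitly, and the while loop re-checks the
condition after waking, which is standard pthread practice.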
OS Notes by Dr. Naveen Choudhary
Schematic View of a Monitor
OS Notes by Dr. Naveen Choudhary
Monitor With Condition Variables
OS Notes by Dr. Naveen Choudhary
Monitor Implementation
Suppose many processes are waiting on a condition variable (say x);
which one is resumed when signal(x) is executed?
one soln is to resume processes in FCFS order from the condition
variable's (x's) waiting queue of processes.
another soln is to wait with some priority, as discussed below.
Conditional-wait construct: x.wait(c);
c – integer expression evaluated when the wait operation is
executed.
value of c (a priority number) stored with the name of the process
that is suspended.
when x.signal is executed, process with smallest associated
priority number is resumed next.
OS Notes by Dr. Naveen Choudhary
Correctness of Monitor Depends on its Use
Unfortunately, the monitor concept cannot guarantee that the
correct sequence (as described in previous slide) is observed by
the processes
A process might access the resource without first gaining access
permission to that resource.
A process might never release the resource once it has been
granted access to that resource.
A process might attempt to release a resource that it never
requested.
A process might request the same resource twice without first
releasing that resource.
So monitors (and likewise semaphores and critical regions) are
useful only if they are used correctly; otherwise the problems
mentioned above will defeat the objectives of such constructs.
OS Notes by Dr. Naveen Choudhary
Chapter 8: Deadlocks
System Model
Deadlock Characterization
Methods for Handling Deadlocks
Deadlock Prevention
Deadlock Avoidance
Deadlock Detection
Recovery from Deadlock
Combined Approach to Deadlock Handling
OS Notes by Dr. Naveen Choudhary
The Deadlock Problem
A set of blocked processes each holding a resource and
waiting to acquire a resource held by another process in
the set.
Example
System has 2 tape drives.
P1 and P2 each hold one tape drive and each needs another
one.
Example
semaphores A and B, initialized to 1
P0                    P1
wait (A);  (1)        wait (B);  (2)
wait (B);  (3)        wait (A);  (4)
OS Notes by Dr. Naveen Choudhary
Bridge Crossing Example
Traffic only in one direction.
Each section of a bridge can be viewed as a resource.
If a deadlock occurs, it can be resolved if one car backs
up (preempt resources and rollback).
OS Notes by Dr. Naveen Choudhary
System Model
Resource types R1, R2, . . ., Rm
CPU cycles, memory space, I/O devices
Each resource type Ri has Wi instances.
Each process utilizes a resource as follows:
request
use
release
OS Notes by Dr. Naveen Choudhary
Deadlock Characterization
Deadlock can arise if four conditions hold simultaneously.
Mutual exclusion: only one process at a time can use a resource
(i.e., the resource is not sharable; some resources have to be
nonsharable, like a printer: while printing, you cannot mix the
output of two processes).
Hold and wait: a process holding at least one resource is
waiting to acquire additional resources held by other processes.
No preemption: a resource can be released only voluntarily by
the process holding it, after that process has completed its task.
Circular wait: there exists a set {P0, P1, …, Pn} of waiting
processes such that P0 is waiting for a resource that is held by
P1, P1 is waiting for a resource that is held by P2, …, Pn−1 is
waiting for a resource that is held by Pn, and Pn is waiting for a
resource that is held by P0.
Note : these conditions are not totally independent; for example,
circular wait implies hold and wait.
OS Notes by Dr. Naveen Choudhary
Resource-Allocation Graph
A set of vertices V and a set of edges E.
V is partitioned into two types:
P = {P1, P2, …, Pn}, the set consisting of all the processes in
the system.
R = {R1, R2, …, Rm}, the set consisting of all resource types
in the system.
request edge – directed edge Pi → Rj
assignment edge – directed edge Rj → Pi
OS Notes by Dr. Naveen Choudhary
Resource-Allocation Graph (Cont.)
Process: Pi (drawn as a circle)
Resource type with 4 instances: Rj (drawn as a rectangle with 4 dots)
Pi requests an instance of Rj: edge Pi → Rj
Pi is holding an instance of Rj: edge Rj → Pi
OS Notes by Dr. Naveen Choudhary
Example of a Resource Allocation Graph
OS Notes by Dr. Naveen Choudhary
Resource Allocation Graph With A Deadlock
OS Notes by Dr. Naveen Choudhary
Resource Allocation Graph With A Cycle But No Deadlock
OS Notes by Dr. Naveen Choudhary
Basic Facts
If the graph contains no cycles ⇒ no deadlock.
If the graph contains a cycle ⇒
if only one instance per resource type, then deadlock.
if several instances per resource type, possibility of
deadlock.
OS Notes by Dr. Naveen Choudhary
Methods for Handling Deadlocks
Ensure that the system will never enter a deadlock state.
(deadlock prevention, deadlock avoidance )
Allow the system to enter a deadlock state and then
recover. ( deadlock detection & recovery)
Ignore the problem and pretend that deadlocks never occur in the
system; used by most operating systems, including UNIX. (Based on
the assumption that deadlocks will be infrequent; when one occurs,
the system starts slowing down and may ultimately stop ⇒ soln:
restart the system and redo the work.)
Implementations of deadlock detection/avoidance/prevention
algorithms are computationally expensive.
OS Notes by Dr. Naveen Choudhary
Deadlock Prevention
Ensure that at least one of the necessary condition for
deadlock cannot hold
Restrain the ways request can be made.
Mutual Exclusion –
not required for sharable resources; must hold for nonsharable
resources. (A read-only file is no problem: many processes can read
the file simultaneously. A printer is intrinsically nonsharable and
so has to be accessed in a mutually exclusive way.)
Hold and Wait – must guarantee that whenever a process
requests a resource, it does not hold any other resources.
Require each process to request and be allocated all its resources
before it begins execution, or allow a process to request resources
only when it has none (a process may request some resources and use
them, but before it can request any additional resources it must
release all the resources it currently holds).
Low resource utilization (since many of the resources may be
allocated but unused for a long period);
starvation possible (a process that needs several popular resources
may have to wait indefinitely, because at least one of the
resources that it needs is always allocated to some other process).
OS Notes by Dr. Naveen Choudhary
Deadlock Prevention (Cont.)
No Preemption –
If a process (say p1) that is holding some resources
requests another resource that cannot be immediately
allocated to it, then all resources currently being held by p1
are released.
Preempted resources are added to the list of resources for
which the p1 process is waiting.
Process (p1) will be restarted only when it can regain its old
resources, as well as the new ones that it is requesting.
Note : The protocol is often applied to resources whose
state can be easily saved and restarted later, such as
CPU registers & memory space. It cannot generally be
applied to such resources as printers and tape drives.
OS Notes by Dr. Naveen Choudhary
Deadlock Prevention (Cont.)
Circular Wait – impose a total ordering of all resource
types, and require that each process requests resources
in an increasing order of enumeration.
{P0, P1, …, Pn} waiting processes, {R0, R1, …, Rn} resources
P0 is waiting for a resource held by P1 :: P0 → P1
Let Pi hold resource Ri while requesting resource Ri+1; then we
must have F(Ri) < F(Ri+1). (1: F is an enumeration function;
2: the inequality signifies that a process can request resources
only in increasing order of enumeration.)
But in the case of a circular wait we would need
F(R0) < F(R1) < … < F(Rn) < F(R0) ⇒ F(R0) < F(R0),
which is not possible, so circular wait cannot occur.
OS Notes by Dr. Naveen Choudhary
Deadlock Avoidance
Requires that the system have some additional a priori information
available.
Simplest and most useful model requires that each
process declare the maximum number of resources of
each type that it may need.
The deadlock-avoidance algorithm dynamically examines
the resource-allocation state to ensure that there can
never be a circular-wait condition. (see if the resource can
be granted to the requesting process by analyzing the
available information )
Resource-allocation state is defined by the number of
available and allocated resources, and the maximum
demands of the processes.
OS Notes by Dr. Naveen Choudhary
Safe State
When a process requests an available resource, system must
decide if immediate allocation leaves the system in a safe state.
System is in safe state if there exists a safe sequence of all
processes.
Sequence <P1, P2, …, Pn> is safe if, for each Pi, the resources
that Pi can still request can be satisfied by the currently
available resources + the resources held by all the Pj, with j < i.
If Pi resource needs are not immediately available, then Pi can wait
until all Pj have finished.
When Pj is finished, Pi can obtain needed resources, execute,
return allocated resources, and terminate.
When Pi terminates, Pi+1 can obtain its needed resources, and so
on.
OS Notes by Dr. Naveen Choudhary
Basic Facts
If a system is in a safe state ⇒ no deadlocks.
If a system is in an unsafe state ⇒ possibility of deadlock.
Avoidance ⇒ ensure that the system will never enter an
unsafe state.
OS Notes by Dr. Naveen Choudhary
Safe, Unsafe , Deadlock State
OS Notes by Dr. Naveen Choudhary
Resource-Allocation Graph Algorithm
Claim edge Pi → Rj indicates that process Pi may request resource
Rj in the future; represented by a dashed line.
A claim edge converts to a request edge when the process requests
the resource.
When the resource is released by the process, the assignment edge
reconverts to a claim edge.
Resources must be claimed a priori (in advance) in the system.
Note : suppose that process Pi requests resource Rj. The request
can be granted only if converting the request edge Pi → Rj to the
assignment edge Rj → Pi does not result in the formation of a
cycle in the resource-allocation graph.
OS Notes by Dr. Naveen Choudhary
Banker’s Algorithm
Multiple instances.
Each process must a priori claim maximum use.
When a process requests a resource it may have to wait.
When a process gets all its resources it must return them
in a finite amount of time.
OS Notes by Dr. Naveen Choudhary
Data Structures for the Banker’s Algorithm
Let n = number of processes, and m = number of resources types.
Available: Vector of length m. If available [j] = k, there are
k instances of resource type Rj available.
Max: n x m matrix. If Max [i,j] = k, then process Pi may
request at most k instances of resource type Rj.
Allocation: n x m matrix. If Allocation[i,j] = k then Pi is
currently allocated k instances of Rj.
Need: n x m matrix. If Need[i,j] = k, then Pi may need k
more instances of Rj to complete its task.
Need [i,j] = Max[i,j] – Allocation [i,j].
OS Notes by Dr. Naveen Choudhary
Safety Algorithm
1. Let Work and Finish be vectors of length m and n,
respectively. Initialize:
Work = Available
Finish[i] = false for i = 1, 2, …, n.
2. Find an i such that both:
(a) Finish[i] == false
(b) Needi ≤ Work
If no such i exists, go to step 4.
3. Work = Work + Allocationi
Finish[i] = true
go to step 2.
4. If Finish[i] == true for all i, then the system is in a safe
state; otherwise it is in an unsafe state. (A C sketch of this
algorithm follows below.)
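The following compact C sketch (ours, not from the notes) runs the
safety algorithm on the 5-process snapshot that appears a few
slides below; it prints the safe sequence P1 P3 P4 P0 P2:

#include <stdio.h>
#include <stdbool.h>

#define NP 5                      /* number of processes */
#define NR 3                      /* number of resources */

int Available[NR]      = {3, 3, 2};
int Allocation[NP][NR] = {{0,1,0},{2,0,0},{3,0,2},{2,1,1},{0,0,2}};
int Need[NP][NR]       = {{7,4,3},{1,2,2},{6,0,0},{0,1,1},{4,3,1}};

bool is_safe(void) {
    int  Work[NR];
    bool Finish[NP] = {false};
    int  done = 0;
    for (int j = 0; j < NR; j++) Work[j] = Available[j];   /* step 1 */

    while (done < NP) {
        bool found = false;
        for (int i = 0; i < NP; i++) {                     /* step 2 */
            if (Finish[i]) continue;
            bool fits = true;                              /* Need_i <= Work? */
            for (int j = 0; j < NR; j++)
                if (Need[i][j] > Work[j]) { fits = false; break; }
            if (fits) {                                    /* step 3 */
                for (int j = 0; j < NR; j++) Work[j] += Allocation[i][j];
                Finish[i] = true;
                printf("P%d ", i);
                found = true;
                done++;
            }
        }
        if (!found) return false;          /* step 4: some Pi never fits */
    }
    return true;                           /* step 4: all finished       */
}

int main(void) {
    printf("safe sequence: ");
    puts(is_safe() ? "=> SAFE" : "=> UNSAFE");
    return 0;
}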
OS Notes by Dr. Naveen Choudhary
Resource-Request Algorithm for Process Pi
Requesti = request vector for process Pi. If Requesti[j] = k,
then process Pi wants k instances of resource type Rj.
1. If Requesti ≤ Needi, go to step 2. Otherwise, raise an error
condition, since the process has exceeded its maximum claim.
2. If Requesti ≤ Available, go to step 3. Otherwise Pi must
wait, since the resources are not available.
3. Pretend to allocate the requested resources to Pi by modifying
the state as follows:
Available = Available − Requesti;
Allocationi = Allocationi + Requesti;
Needi = Needi − Requesti;
• If safe ⇒ the resources are allocated to Pi.
• If unsafe ⇒ Pi must wait, and the old resource-allocation
state is restored.
OS Notes by Dr. Naveen Choudhary
Example of Banker’s Algorithm
5 processes P0 through P4; 3 resource types: A (10 instances),
B (5 instances), and C (7 instances).
Snapshot at time T0:
      Allocation    Max       Need      Available
      A B C         A B C     A B C     A B C
P0    0 1 0         7 5 3     7 4 3     3 3 2
P1    2 0 0         3 2 2     1 2 2
P2    3 0 2         9 0 2     6 0 0
P3    2 1 1         2 2 2     0 1 1
P4    0 0 2         4 3 3     4 3 1
OS Notes by Dr. Naveen Choudhary
Example (Cont.)
The content of the matrix. Need is defined to be Max –
Allocation.
Need
ABC
P0 743
P1 122
P2 600
P3 011
P4 431
The system is in a safe state since the sequence < P1, P3, P4,
P0, P2> satisfies safety criteria.
See hardcopy also
OS Notes by Dr. Naveen Choudhary
Example P1 Request (1,0,2) (Cont.)
Check that Request ≤ Available (that is, (1,0,2) ≤ (3,3,2)) ⇒
true.
      Allocation    Need      Available
      A B C         A B C     A B C
P0    0 1 0         7 4 3     2 3 0
P1    3 0 2         0 2 0
P2    3 0 2         6 0 0
P3    2 1 1         0 1 1
P4    0 0 2         4 3 1
Executing safety algorithm shows that sequence <P1, P3, P4,
P0, P2> satisfies safety requirement.
Can a request for (3,3,0) by P4 be granted? (No, since the
resources are not available.)
Can a request for (0,2,0) by P0 be granted? (No: although the
resources are available, the resulting state would be unsafe.)
See hardcopy also
OS Notes by Dr. Naveen Choudhary
Deadlock Detection
Allow system to enter deadlock state
Detection algorithm
Recovery scheme
OS Notes by Dr. Naveen Choudhary
Single Instance of Each Resource Type
Maintain wait-for graph
Nodes are processes.
Pi → Pj if Pi is waiting for Pj.
Periodically invoke an algorithm that searches for a cycle
in the graph.
An algorithm to detect a cycle in a graph requires on the order of
n² operations, where n is the number of vertices in the graph.
OS Notes by Dr. Naveen Choudhary
Resource-Allocation Graph and Wait-for Graph
Resource-Allocation Graph Corresponding wait-for graph
OS Notes by Dr. Naveen Choudhary
Several Instances of a Resource Type
Available: A vector of length m indicates the number of
available resources of each type.
Allocation: An n x m matrix defines the number of
resources of each type currently allocated to each
process.
Request: An n x m matrix indicates the current request of each
process. If Request[i,j] = k, then process Pi is requesting k more
instances of resource type Rj.
OS Notes by Dr. Naveen Choudhary
Detection Algorithm
1. Let Work and Finish be vectors of length m and n,
respectively. Initialize:
(a) Work = Available
(b) For i = 1, 2, …, n, if Allocationi ≠ 0, then
Finish[i] = false; otherwise, Finish[i] = true.
2. Find an index i such that both:
(a) Finish[i] == false
(b) Requesti ≤ Work
If no such i exists, go to step 4.
OS Notes by Dr. Naveen Choudhary
Detection Algorithm (Cont.)
3. Work = Work + Allocationi
Finish[i] = true
go to step 2.
4. If Finish[i] == false for some i, 1 ≤ i ≤ n, then the system is
in a deadlocked state. Moreover, if Finish[i] == false, then Pi is
deadlocked.
The algorithm requires on the order of O(m × n²) operations to
detect whether the system is in a deadlocked state.
OS Notes by Dr. Naveen Choudhary
Example of Detection Algorithm
Five processes P0 through P4; three resource types
A (7 instances), B (2 instances), and C (6 instances).
Snapshot at time T0:
      Allocation    Request    Available
      A B C         A B C      A B C
P0    0 1 0         0 0 0      0 0 0
P1    2 0 0         2 0 2
P2    3 0 3         0 0 0
P3    2 1 1         1 0 0
P4    0 0 2         0 0 2
Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true
for all i.
OS Notes by Dr. Naveen Choudhary
Example (Cont.)
P2 requests an additional instance of type C.
Request
ABC
P0 0 0 0
P1 2 0 2
P2 0 0 1
P3 1 0 0
P4 0 0 2
State of system? (deadlocked)
We can reclaim the resources held by process P0, but there are
insufficient resources to fulfill the other processes' requests.
A deadlock exists, consisting of processes P1, P2, P3, and P4.
OS Notes by Dr. Naveen Choudhary
Detection-Algorithm Usage
When, and how often, to invoke depends on:
How often is a deadlock likely to occur?
How many processes will need to be rolled back?
one for each disjoint cycle
After every request ⇒ computationally very expensive.
Say once an hour, or whenever CPU utilization drops below,
say, 40%.
Disadvantage of calling deadlock algorithm at arbitrary
times :: If detection algorithm is invoked arbitrarily, there may
be many cycles in the resource graph and so we would not be
able to tell which of the many deadlocked processes “caused”
the deadlock.
OS Notes by Dr. Naveen Choudhary
Recovery from Deadlock:
Process Termination
Resource Preemption
Process Termination ::
Abort all deadlocked processes.
Abort one process at a time until the deadlock cycle is eliminated.
In which order should we choose to abort?
Priority of the process.
How long the process has computed, and how much longer it needs
to run to completion.
Resources the process has used (for example, whether the
resources are simple to preempt).
How many more resources the process needs to complete.
How many processes will need to be terminated.
Is the process interactive or batch?
OS Notes by Dr. Naveen Choudhary
Recovery from Deadlock: Resource Preemption
Selecting a victim – minimize cost.
decide which resources and which processes to preempt so as to
incur minimum cost (number of resources a deadlocked process is
holding, amount of time a deadlocked process has consumed so far,
etc.)
Rollback – return to some safe state, restart the process from
that state.
if we preempt a resource, what do we do with the process that
was holding it? ⇒ roll it back to some safe state from which it
can continue. Generally a total rollback (i.e., restarting the
process) may be required.
Starvation – same process may always be picked as
victim, include number of rollback in cost factor.
OS Notes by Dr. Naveen Choudhary
Combined Approach to Deadlock Handling
Combine the three basic approaches
prevention
avoidance
detection
allowing the use of the optimal approach for each class of
resources in the system.
Partition resources into hierarchically ordered classes
(classes: PCBs, memory, tape drives, hard disks, etc.).
Use most appropriate technique for handling deadlocks
within each class.
OS Notes by Dr. Naveen Choudhary
Chapter 9: Memory Management
Background
Swapping
Contiguous Allocation
Paging
Segmentation
Segmentation with Paging
O.S. Notes prepared by Dr. Naveen Choudhary
Background
Program must be brought into memory and placed within
a process for it to be run.
Input queue – collection of processes on the disk that are
waiting to be brought into memory to run the program.
User programs go through several steps before being
run.
O.S. Notes prepared by Dr. Naveen Choudhary
Binding of Instructions and Data to Memory
Address binding of instructions and data to memory addresses can
happen at three different stages.
Compile time: If memory location known a priori,
absolute code can be generated; must recompile code if
starting location changes.
Load time: Must generate relocatable code if memory
location is not known at compile time.
Execution time: Binding delayed until run time if the
process can be moved during its execution from one
memory segment to another. Need hardware support for
address maps (e.g., base and limit registers).
O.S. Notes prepared by Dr. Naveen Choudhary
Multistep Processing of a User Program
O.S. Notes prepared by Dr. Naveen Choudhary
Logical vs. Physical Address Space
The concept of a logical address space that is bound to a
separate physical address space is central to proper
memory management.
Logical address – generated by the CPU; also referred to as
virtual address.
Physical address – address seen by the memory unit.
Logical and physical addresses are the same in compile-
time and load-time address-binding schemes; logical
(virtual) and physical addresses differ in execution-time
address-binding scheme.
O.S. Notes prepared by Dr. Naveen Choudhary
Memory-Management Unit (MMU)
Hardware device that maps virtual to physical address.
In MMU scheme, the value in the relocation register is
added to every address generated by a user process at
the time it is sent to memory.
The user program deals with logical addresses; it never
sees the real physical addresses.
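A tiny worked illustration of this addition (the numbers are
arbitrary, chosen only for the arithmetic):

#include <stdio.h>

int main(void) {
    unsigned relocation = 14000;   /* base loaded by the dispatcher */
    unsigned logical    = 346;     /* address generated by the CPU  */
    unsigned physical   = relocation + logical;
    printf("physical = %u\n", physical);   /* prints 14346 */
    return 0;
}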
O.S. Notes prepared by Dr. Naveen Choudhary
Dynamic relocation using a relocation register
Questions: a CPU with, say, 16 address lines can access 2^16 memory
locations, i.e., 64 Kbytes.
- q1: what can be the maximum size of a single program?
- q2: can we have 8 KB of memory (i.e., memory less than 64 KB)?
O.S. Notes prepared by Dr. Naveen Choudhary
Dynamic Loading
Routine is not loaded until it is called
Better memory-space utilization; unused routine is never
loaded.
Useful when large amounts of code are needed to handle
infrequently occurring cases.
No special support from the operating system is required; dynamic
loading is implemented through program design. The OS may help the
programmer, however, by providing library routines to implement
dynamic loading.
O.S. Notes prepared by Dr. Naveen Choudhary
Dynamic Linking
Linking postponed until execution time.
Small piece of code, stub, used to locate the appropriate
memory-resident library routine & load the routine if it is
not already in memory.
In the end, the stub replaces itself with the address of the
routine and executes the routine. Thus, the next time that code
segment is reached, the library routine is executed directly,
incurring no cost for dynamic linking.
The operating system is needed to check whether the routine is in
the process's memory address space.
Dynamic linking is particularly useful for language
libraries.
O.S. Notes prepared by Dr. Naveen Choudhary
Overlays
1. Keep in memory only those instructions and data that are
needed at any given time.
2. Needed when process is larger than amount of memory
allocated to it.
3. When other instructions are needed, they are loaded into the
space that was previously occupied by instructions that are no
longer needed.
4. An overlay driver is used to load the new program in place of
the old one in the overlaid memory region (special relocation and
linking algorithms may be needed to construct the overlays).
5. Implemented by the user; no special support is needed from the
operating system, but the programming design of the overlay
structure is complex.
6. The use of overlays is currently limited to microcomputers and
other systems having limited memory.
O.S. Notes prepared by Dr. Naveen Choudhary
Overlays for a Two-Pass Assembler
(Figure: pass 1 and pass 2 of the two-pass assembler sharing the
same overlay region, with the symbol table, common routines, and
overlay driver resident.)
O.S. Notes prepared by Dr. Naveen Choudhary
Swapping
1. A process can be swapped temporarily out of memory to a backing store, and then
brought back into memory for continued execution.
2. Backing store – fast disk large enough to accommodate copies of all memory images
for all users; must provide direct access to these memory images.
3. Roll out, roll in – swapping variant (variation) used for priority-based scheduling
algorithms; lower-priority process is swapped out so higher-priority process can be
loaded and executed & when high priority process finishes, the lower priority process
can be rolled in.
4. The system maintains a ready queue consisting of all processes whose memory
images are on the backing store or in memory and are ready to run. When the CPU
scheduler decides to execute a process, it checks the queue and, if required, swaps
the process in.
5. Process swapped out, should be swapped back into the same memory space that it
occupied previously if binding is compile or load time. For execution time binding
this restriction is not there.
6. The major part of swap time is transfer time (generally, swap space is allocated as a
separate chunk of disk, separate from the file system, so that its use is as fast as
possible). The total transfer time (swap-out transfer time + swap-in transfer time) is
directly proportional to the amount of memory swapped (so the time quantum in
round-robin scheduling should be sufficiently large).
7. A job being swapped out should not have any pending I/O; otherwise the pending I/O
will ultimately write into the wrong (swapped-in) process's area.
soln ⇒ never swap a process with pending I/O, or
execute I/O operations only into operating-system buffers.
O.S. Notes prepared by Dr. Naveen Choudhary
Schematic View of Swapping
O.S. Notes prepared by Dr. Naveen Choudhary
Contiguous Allocation
Main memory usually divided into two partitions:
Resident operating system, usually held in low memory with
interrupt vector.
User processes then held in high memory.
Single-partition allocation
Relocation-register & limit register are used to protect user
processes from each other, and from changing operating-system
code and data.
Relocation register contains value of smallest physical address;
limit register contains range of logical addresses – each logical
address must be less than the limit register.
When the CPU scheduler selects a process for execution, the
dispatcher loads the relocation & limit registers with the correct
values as part of the context switch.
(Figure: memory layout with the OS kernel and interrupt vector
table in low memory, a transient part of the OS, user processes,
and free space.)
O.S. Notes prepared by Dr. Naveen Choudhary
Hardware Support for Relocation and Limit Registers
O.S. Notes prepared by Dr. Naveen Choudhary
Contiguous Allocation (Cont.)
Multiple-partition allocation: MFT (multiprogramming with a fixed
number of tasks) divides memory into a number of fixed-size
partitions; one process can occupy any one of these partitions, so
the number of processes in memory is bound to the number of
fixed-size partitions. This scheme is not used now.
Multiple-partition allocation (MVT) (multiprogramming with variable no.
of tasks)
Hole – block of available memory; holes of various size are scattered
throughout memory. (initially whole memory is one big hole)
When a process arrives, it is allocated memory from a hole large
enough to accommodate it. (When process terminates, memory
occupied by this process is released & a new hole is generated)
When the next job in the queue has a memory requirement that cannot
be satisfied by any hole in memory, it can either wait, or the
scheduler can skip it and select a later job from the queue.
Operating system maintains information about:
a) allocated partitions b) free partitions (hole)
(Figure: a sequence of MVT memory snapshots showing holes being
created and reallocated as processes 8, 9, and 10 come and go,
while processes 5 and 2 stay resident.)
O.S. Notes prepared by Dr. Naveen Choudhary
Dynamic Storage-Allocation Problem
How to satisfy a request of memory size n from a list of free holes.
First-fit: allocate the first hole that is big enough.
Best-fit: allocate the smallest hole that is big enough; must
search the entire list, unless the list is ordered by size.
Produces the smallest leftover hole.
Worst-fit: allocate the largest hole; must also search the entire
list. Produces the largest leftover hole.
Simulations have shown that first-fit and best-fit perform better
than worst-fit in terms of speed and storage utilization. (A small
first-fit sketch follows below.)
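A small first-fit sketch in C (the hole-list type and function name
are ours, for illustration); best-fit and worst-fit differ only in
scanning the whole list to pick the smallest or largest sufficient
hole:

#include <stddef.h>

/* a free hole in memory: [start, start + size) */
struct hole { size_t start, size; struct hole *next; };

/* first-fit: return the address of the first hole big enough,
   shrinking that hole; returns (size_t)-1 if none fits */
size_t first_fit(struct hole *list, size_t n) {
    for (struct hole *h = list; h != NULL; h = h->next) {
        if (h->size >= n) {
            size_t addr = h->start;
            h->start += n;      /* remainder stays a (smaller) hole;  */
            h->size  -= n;      /* a zero-size hole should be unlinked */
            return addr;
        }
    }
    return (size_t)-1;          /* request cannot be satisfied */
}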
O.S. Notes prepared by Dr. Naveen Choudhary
Fragmentation
External Fragmentation –
1. As processes are loaded and removed from memory, the free memory space is
broken into little pieces.
2. So external fragmentation exists when enough total memory space exists to satisfy
a request, but it is not contiguous (first-fit, best-fit, and worst-fit all suffer from
external fragmentation).
3. In the worst case, we could have a block of free (wasted) memory between every
two processes.
Internal Fragmentation – allocated memory may be slightly larger than the requested
memory; this size difference is memory internal to a partition that is not being used.
Example: 1) a hole of 100 bytes is available; 2) a request for 98 bytes arrives; 3) to avoid
the overhead of maintaining a 2-byte hole, it is better to allocate all 100 bytes to the
requesting process; 4) these 2 bytes are the internal fragmentation.
Reduce external fragmentation by compaction
Shuffle memory contents to place all free memory together in one large block
(Figure: memory map before and after compaction; the OS occupies
low memory, and processes P1, P2, P3 are shuffled together so that
the scattered holes merge into one large free block. Selecting an
optimal compaction strategy, one with minimum movement of programs
and data, is quite difficult.)
O.S. Notes prepared by Dr. Naveen Choudhary
Fragmentation……..
Compaction is possible only if relocation is dynamic, and is done at
execution time.(For compile time or load time address binding,
compaction is not possible)
For execution time address binding, compaction include moving
program & data & then changing the base register to reflect the new
base address.
I/O problem (similar to swapping)
Latch job in memory while it is involved in I/O.
Do I/O only into OS buffers.
O.S. Notes prepared by Dr. Naveen Choudhary
Paging
Logical address space of a process can be noncontiguous;
process is allocated physical memory whenever the latter is
available.
Divide physical memory into fixed-sized blocks called frames
(size is power of 2, between 512 bytes and 8192 bytes).
Divide logical memory into blocks of same size called pages.
Keep track of all free frames.
To run a program of size n pages, need to find n free frames
and load program.
Set up a page table to translate logical to physical addresses.
Paging suffers from Internal fragmentation.
OS keeps page table for each process.
Address space = the collection of addresses (of a memory area).
O.S. Notes prepared by Dr. Naveen Choudhary
Address Translation Scheme
Address generated by CPU is divided into:
Page number (p) – used as an index into a page table which contains
base address of each page in physical memory.
Page offset (d) – combined with base address to define the physical
memory address that is sent to the memory unit.
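Concretely, with a power-of-two page size the split is simple
integer arithmetic; a small sketch (the page size and sample
address are illustrative, not from the notes):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u                 /* 2^12 bytes per page */

int main(void) {
    uint32_t logical = 20503;           /* arbitrary logical address */
    uint32_t p = logical / PAGE_SIZE;   /* page number (page-table index) */
    uint32_t d = logical % PAGE_SIZE;   /* offset within the page */
    printf("p = %u, d = %u\n", p, d);   /* prints p = 5, d = 23 */
    return 0;
}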
Internal fragmentation in paging
There is no external fragmentation, but there will be internal
fragmentation if the program size is not a whole number of pages:
if a process needs n pages plus one byte, it is allocated n + 1
frames and almost one whole page is internal fragmentation.
On average, half a page per process is internal fragmentation, so a
small page size seems better.
But with big pages there are fewer entries in the page table, so
less overhead for the OS. Large pages are also good for disk I/O,
since one large transfer is more efficient than many small ones
(I/O is efficient when the data being transferred is large).
The trend is towards large page sizes (2 or 4 kilobytes).
O.S. Notes prepared by Dr. Naveen Choudhary
Address Translation Architecture
No base/relocation & limit registers now
O.S. Notes prepared by Dr. Naveen Choudhary
Paging Example
O.S. Notes prepared by Dr. Naveen Choudhary
Paging Example
(Figure: a paging example with frame numbers 0–7 and pages/frames
of 4 bytes each. The physical address is computed as
frame no. × no. of bytes per page + offset; e.g., 5 × 4 + 0 = 20,
5 × 4 + 2 = 22, and 6 × 4 + 2 = 26.)
O.S. Notes prepared by Dr. Naveen Choudhary
Free Frames
Before allocation After allocation
O.S. Notes prepared by Dr. Naveen Choudhary
Implementation of Page Table
Page table is kept in main memory.
Page-table base register (PTBR) points to the page table.
(kept in PCB)
Page-table length register (PTLR) indicates the size of the
page table.
In this scheme every data/instruction access requires two
memory accesses. One for the page table and one for
the data/instruction.
The two memory access problem can be solved by the
use of a special fast-lookup hardware cache called
associative memory or translation look-aside buffers
(TLBs)
When new process is loaded (i.e. when context switch
occurs) the old TLB contents are flushed
O.S. Notes prepared by Dr. Naveen Choudhary
Associative Memory
Associative memory – parallel search
Page # Frame #
Address translation (A´, A´´)
If A´ is in associative register, get frame # out.
Otherwise get frame # from page table in memory & update
the TLB with this new entry so that next time the same page
can be found quickly.
O.S. Notes prepared by Dr. Naveen Choudhary
Paging Hardware With TLB
O.S. Notes prepared by Dr. Naveen Choudhary
Effective Access Time
Associative lookup = ε time units
Assume the memory cycle time is 1 microsecond
Hit ratio – percentage of times that a page number is found in
the associative registers; the ratio is related to the number of
associative registers.
Hit ratio = α
Effective Access Time (EAT)
EAT = (1 + ε)α + (2 + ε)(1 − α)
    = α + εα + 2 + ε − 2α − εα
    = 2 + ε − α
More memory for the TLB ⇒ higher hit ratio, but associative
memory is expensive.
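For instance (illustrative numbers, not from the notes): with
ε = 0.2 μs and α = 0.8, EAT = 2 + 0.2 − 0.8 = 1.4 μs, i.e., a 40%
slowdown over a single 1 μs memory access; raising the hit ratio
to α = 0.98 gives EAT = 2 + 0.2 − 0.98 = 1.22 μs.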
O.S. Notes prepared by Dr. Naveen Choudhary
Memory Protection
Memory protection implemented by associating
protection bit with each frame.
A valid-invalid bit is attached to each entry in the page table.
(For example, in a system with a 10-bit address space, the
program's logical address space is 1 K, but a program might be
much smaller, say 256 bytes, than the address space available.)
"valid" indicates that the associated page is in the process's
logical address space, and is thus a legal page.
"invalid" indicates that the page is not in the process's logical
address space.
This approach can be extended to provide a finer level of
protection, like read-only, read-write, or execute-only, by
providing separate protection bits for each kind of access.
O.S. Notes prepared by Dr. Naveen Choudhary
Valid (v) or Invalid (i) Bit In A Page Table
As the page table is larger than required by the process (in this
chapter we are assuming that the whole process needs to be in main
memory, i.e., no virtual-memory support).
O.S. Notes prepared by Dr. Naveen Choudhary
Page Table Structure
Hierarchical Paging
Inverted Page Tables
O.S. Notes prepared by Dr. Naveen Choudhary
Hierarchical Page Tables
Break up the logical address space into multiple page
tables.
A simple technique is a two-level page table.
O.S. Notes prepared by Dr. Naveen Choudhary
Two-Level Paging Example
A logical address (on 32-bit machine with 4K page size) is divided into:
a page number consisting of 20 bits (so the page table can have
2^20 (about 1 million) entries; with each entry needing 4 bytes,
4 megabytes would be required for the page table ⇒ problem).
a page offset consisting of 12 bits.
Since the page table is paged, the page number is further divided into:
a 10-bit page number.
a 10-bit page offset.
Thus, a logical address is as follows:
page number page offset
p1 p2 d
10 10 12
where p1 is an index into the outer page table, and p2 is the
displacement within the page of the outer page table.
For 3-level paging we would need p1, p2 & p3.
As the number of levels increases, the memory-access time
increases, but caching (associative memory) with a high hit ratio
can help greatly.
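A small C sketch of the 10/10/12 split described above (the sample
address is arbitrary; the masks follow the bit widths in the
diagram):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 0x00ABCDEF;             /* arbitrary 32-bit logical address */
    uint32_t p1 = addr >> 22;               /* top 10 bits: outer page table */
    uint32_t p2 = (addr >> 12) & 0x3FFu;    /* next 10 bits: inner page table */
    uint32_t d  = addr & 0xFFFu;            /* low 12 bits: offset in the page */
    printf("p1 = %u, p2 = %u, d = 0x%X\n", p1, p2, d);
    return 0;
}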
O.S. Notes prepared by Dr. Naveen Choudhary
Two-Level Page-Table Scheme
O.S. Notes prepared by Dr. Naveen Choudhary
Address-Translation Scheme
Address-translation scheme for a two-level 32-bit paging
architecture
O.S. Notes prepared by Dr. Naveen Choudhary
Inverted Page Table
One entry for each real page (frame) of memory.
Thus, there is only one page table in the system & it has
only one entry for each page of the physical memory.
Entry consists of the virtual address of the page stored in
that real memory location, with information about the
process that owns that page.
Decreases memory needed to store each page table, but
increases time needed to search the table when a page
reference occurs.
Use hash table to limit the search to one — or at most a
few — page-table entries.
To improve performance, we use associative memory registers to
hold recently located entries (these registers are searched first,
before the hash table is consulted).
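A minimal sketch of the hashed lookup (an assumed structure, not a real OS implementation; the key simply packs (pid, page) into one long):

  import java.util.HashMap;

  class InvertedPageTable {
      // hash (pid, page) -> frame, limiting the search to one or a few entries
      private final HashMap<Long, Integer> hash = new HashMap<>();

      private static long key(int pid, int page) {
          return ((long) pid << 32) | (page & 0xFFFFFFFFL);
      }

      void map(int pid, int page, int frame) { hash.put(key(pid, page), frame); }

      Integer translate(int pid, int page) {
          return hash.get(key(pid, page));   // null means the page is not mapped
      }
  }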
O.S. Notes prepared by Dr. Naveen Choudhary
Inverted Page Table Architecture
Frame No.
O.S. Notes prepared by Dr. Naveen Choudhary
Shared Pages
Shared code
One copy of read-only (reentrant) code shared among
processes (i.e., text editors, compilers, window systems).
Shared code must appear in same location in the logical
address space of all processes.
Reentrant code is code that never changes during execution.
Private code and data
Each process keeps a separate copy of the code and data.
The pages for the private code and data can appear
anywhere in the logical address space.
Inverted page tables are not at all suitable for sharing, as we cannot
map two or more virtual addresses to the same physical frame
(each frame's entry holds a single (pid, p) combination, which differs
from process to process).
O.S. Notes prepared by Dr. Naveen Choudhary
Shared Pages Example
The above sharing is very difficult to implement with an inverted page table.
O.S. Notes prepared by Dr. Naveen Choudhary
Segmentation
Memory-management scheme that supports user view of
memory. (paging is not like users view of memory)
A program is a collection of segments. A segment is a logical
unit such as:
main program,
procedure,
function,
method,
object,
local variables, global variables,
common block,
stack,
symbol table, arrays
O.S. Notes prepared by Dr. Naveen Choudhary
User’s View of a Program
O.S. Notes prepared by Dr. Naveen Choudhary
Logical View of Segmentation
[Figure: segments 1–4 in the user space mapped to scattered regions of the physical memory space]
Generally the loader will assign the segment no. to each
segment.
O.S. Notes prepared by Dr. Naveen Choudhary
Segmentation Architecture
Logical address consists of a two tuple:
<segment-number, offset>,
Segment table – maps two-dimensional logical addresses to
one-dimensional physical addresses; each table entry has:
base – contains the starting physical address where the segments
reside in memory.
limit – specifies the length of the segment.
As in paging segment table can be kept in registers (fast) but as
segment table is large it is generally kept in memory.
Segment-table base register (STBR) points to the segment table’s
location in memory.
Segment-table length register (STLR) indicates number of
segments used by a program;
segment number s is legal if s < STLR.
To make the access fast we can use associative memory (as in
paging) for segment table entries.
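A minimal sketch of the translation and the two checks it performs (hypothetical names; a real MMU does this in hardware):

  class SegmentTable {
      int[] base, limit;   // one entry per segment
      int stlr;            // number of segments used by the program

      long translate(int s, int d) {
          if (s >= stlr)     throw new RuntimeException("trap: invalid segment number");
          if (d >= limit[s]) throw new RuntimeException("trap: offset beyond segment limit");
          return (long) base[s] + d;   // physical address
      }
  }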
O.S. Notes prepared by Dr. Naveen Choudhary
Segmentation Architecture (Cont.)
Relocation.
dynamic
by segment table
Sharing.
shared segments
same segment number
Allocation.
first fit/best fit
external fragmentation
O.S. Notes prepared by Dr. Naveen Choudhary
Segmentation Architecture (Cont.)
Protection. With each entry in segment table associate:
validation bit = 0 ⇒ illegal segment
read/write/execute privileges (some number of bits)
Protection bits associated with segments; code sharing
occurs at segment level.
Since segments vary in length, memory allocation is a
dynamic storage-allocation problem.
A segmentation example is shown in the following
diagram
O.S. Notes prepared by Dr. Naveen Choudhary
Segmentation Hardware
O.S. Notes prepared by Dr. Naveen Choudhary
Example of Segmentation
O.S. Notes prepared by Dr. Naveen Choudhary
Sharing of Segments
•There can be a jump instruction within the shared segment. That address is a
(segment no., offset) pair – so how will the segment refer to itself? Because there is
only one physical copy of sqrt, it must refer to itself in the same way for all users,
i.e. it must have a unique segment number.
•So if a large number of processes are sharing a segment, then having a unique
segment number for the shared segment in all the processes will be a problem.
O.S. Notes prepared by Dr. Naveen Choudhary
Segmentation with Paging – MULTICS
The MULTICS system solved problems of external
fragmentation and lengthy search times (search time to allocate
a segment, using first fit or best fit can be long) by paging the
segments.
External fragmentation: as segments can be of unequal length
(unlike pages), there can be external fragmentation.
Paging eliminates external fragmentation & makes the
allocation problem trivial (any empty frame can be used for a
desired page).
The solution differs from pure segmentation in that the segment-
table entry contains not the base address of the segment, but
rather the base address of a page table for this segment.
So we have a separate page table for each segment.
(However, because each segment is limited in length by its
segment-table entry, the page table does not need to be full sized.)
So external fragmentation is removed, but there will be internal
fragmentation when the segment size is not exactly a
multiple of the page size.
O.S. Notes prepared by Dr. Naveen Choudhary
MULTICS Address Translation Scheme
To extend the scheme further: if s = 18 bits we can have 262,144
segments, requiring an excessively large segment table, so we can page
the segment table itself. Our logical address then looks like:
segment no. (s1 = 8 bits, s2 = 10 bits) | offset (d1 = 6 bits, d2 = 10 bits)
O.S. Notes prepared by Dr. Naveen Choudhary
Chapter 10: Virtual Memory
Background
Demand Paging
Process Creation
Page Replacement
Allocation of Frames
Thrashing
Operating System Examples
O. S. Notes Prepared by Dr. Naveen Choudhary
Background
Virtual memory – separation of user logical memory from
physical memory.
Only part of the program needs to be in memory for execution.
Logical address space can therefore be much larger than physical
address space.
Allows address spaces to be shared by several
processes (as less physical space is required per process).
Each user program could take less physical memory, so more
programs could be run at the same time, with a corresponding
increase in CPU utilization & throughput, but with no increase in
response time or turnaround time.
Less I/O would be needed to load or swap each user program into
memory, so each user program would run faster (as the whole
program code (like error routines) need not be loaded to run the
program).
Virtual memory can be implemented via:
Demand paging
Demand segmentation (not very popular as segments are of
variable size & so such scheme is difficult to implement)
O. S. Notes Prepared by Dr. Naveen Choudhary
Virtual Memory That is Larger Than Physical Memory
O. S. Notes Prepared by Dr. Naveen Choudhary
Demand Paging
Bring a page into memory only when it is needed.
Less I/O needed
Less memory needed
Faster response (not necessarily)
More users
Page is needed ⇒ reference to it:
invalid reference ⇒ abort (invalid reference = attempt to use
an illegal memory address, such as an incorrect array subscript)
not-in-memory ⇒ bring to memory
Swapping is the term used for swapping in or out a whole
process. Here we are using a sort of lazy swapper which swaps
pages only when a page is needed. Thus swapping is not an
appropriate term, and we will use the term paging in the present
context.
O. S. Notes Prepared by Dr. Naveen Choudhary
Transfer of a Paged Memory to Contiguous Disk Space
O. S. Notes Prepared by Dr. Naveen Choudhary
Valid-Invalid Bit
With each page table entry a valid–invalid bit is
associated
(1 ⇒ in-memory, 0 ⇒ not-in-memory)
Initially the valid–invalid bit is set to 0 on all entries.
[Figure: example page-table snapshot – a frame # column with a
valid–invalid bit per entry; the first four entries are 1 (valid), the
remaining entries are 0 (invalid)]
During address translation, if the valid–invalid bit in the page-table
entry is 0 ⇒ page fault.
O. S. Notes Prepared by Dr. Naveen Choudhary
Page Table When Some Pages Are Not in Main Memory
O. S. Notes Prepared by Dr. Naveen Choudhary
Page Fault
The first reference to a page that is not in memory will trap to the
OS ⇒ page fault.
The OS looks at another table to decide:
Invalid reference ⇒ abort (we check an internal table in the PCB to
determine whether the reference was a valid or an invalid memory
access).
Just not in memory ⇒
Get an empty frame.
Swap the page into the frame.
Reset the tables, validation bit = 1.
Restart the instruction (start the instruction from the beginning).
Problems with certain types of instruction, e.g. block move.
Solutions:
- Must check both ends of both blocks before starting, or
- Use temporary registers to hold overwritten values.
Cont…
O. S. Notes Prepared by Dr. Naveen Choudhary ….
Page Fault
Auto increment/decrement operations (operand registers
are automatically incremented/decremented after each
instruction):
in case of a page fault the instruction will be re-executed,
so to solve this problem the operand registers must be reset
to the appropriate values before re-executing the instruction
(after the page fault).
Pure demand paging: never bring a page into memory
until required (the process starts with no page in memory &
gradually gets pages as they are required).
Demand paging shows reasonable performance because
programs show locality of reference (and so page faults do
not occur at an excessive rate).
O. S. Notes Prepared by Dr. Naveen Choudhary
Steps in Handling a Page Fault
O. S. Notes Prepared by Dr. Naveen Choudhary
What happens if there is no free frame?
Page replacement – find some page in memory, but not
really in use, swap it out.
algorithm
performance – want an algorithm which will result in
minimum number of page faults.
Same page may be brought into memory several times.
O. S. Notes Prepared by Dr. Naveen Choudhary
Performance of Demand Paging
Page fault rate p, 0 ≤ p ≤ 1.0:
if p = 0, no page faults
if p = 1, every reference is a fault
Effective Access Time (EAT)
EAT = (1 – p) x memory access
+ p (page fault overhead
+ [swap page out ]
+ swap page in
+ restart overhead)
saving registers, Process state
(PCB) etc. & restoring them when
page fault interrupt ends.
O. S. Notes Prepared by Dr. Naveen Choudhary
Demand Paging Example
Memory access time = 1 microsecond
50% of the time the page that is being replaced has been modified and
therefore needs to be swapped out.
Swap Page Time (page fault service time ) = 10 msec = 10,000 microsec
EAT = (1 – p) × 1 + p × 10,000
    = 1 – p + 10,000p
    = 1 + 9,999p (microseconds)
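For example (an illustrative fault rate, not from the notes): with p = 0.001, EAT = 1 + 9,999 × 0.001 ≈ 11 microseconds, i.e. memory access is slowed down by a factor of about 11. To keep the degradation below 10% we would need 1 + 9,999p < 1.1, i.e. p < 0.00001 – fewer than one fault per 100,000 accesses.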
To improve page-fault service time, use swap space:
swap space is generally faster than access through the file
system, because swap space is allocated in much larger blocks,
and file lookups & indirect allocation methods are not used.
2 ways:
At the first reference to the program file, copy the whole
file into the swap space.
Initially get pages from the file system itself, but if these
pages are later written back to disk due to page replacement,
write them to the swap space.
O. S. Notes Prepared by Dr. Naveen Choudhary
Page Replacement
(Not all the pages of a process need to be in memory to run
the processes)
Prevent over-allocation of memory by modifying page-
fault service routine to include page replacement.
Use modify (dirty) bit to reduce overhead of page
transfers – only modified pages are written to disk.
Page replacement completes separation between logical
memory and physical memory – large virtual memory can
be provided on a smaller physical memory.
O. S. Notes Prepared by Dr. Naveen Choudhary
Need For Page Replacement
O. S. Notes Prepared by Dr. Naveen Choudhary
Basic Page Replacement
1. Find the location of the desired page on disk.
2. Find a free frame:
- If there is a free frame, use it.
- If there is no free frame, use a page replacement
algorithm to select a victim frame.
3. Read the desired page into the (newly) free frame.
Update the page and frame tables.
4. Restart the process.
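The four steps above, as a minimal sketch (all helper names – locateOnDisk, evictVictim, readFromDisk, restartProcess, and the freeFrames queue – are hypothetical placeholders, not a real kernel API):

  void handlePageFault(int page) {
      long diskAddr = locateOnDisk(page);        // 1. find the desired page on disk
      int frame = freeFrames.isEmpty()
                ? evictVictim()                  // 2a. no free frame: run page replacement
                : freeFrames.poll();             // 2b. free frame available: use it
      readFromDisk(diskAddr, frame);             // 3. read the page into the frame
      pageTable[page].frame = frame;             //    update the page & frame tables
      pageTable[page].valid = true;
      restartProcess();                          // 4. restart the faulting instruction
  }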
O. S. Notes Prepared by Dr. Naveen Choudhary
Page Replacement
O. S. Notes Prepared by Dr. Naveen Choudhary
Page Replacement Algorithms
Objective:- Want lowest page-fault rate.
Evaluate algorithm by running it on a particular string of
memory references (reference string) and computing the
number of page faults on that string.
In all our examples, the reference string is
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.
O. S. Notes Prepared by Dr. Naveen Choudhary
Graph of Page Faults Versus The Number of Frames
(Desired graphs of page replacement algorithm)
O. S. Notes Prepared by Dr. Naveen Choudhary
First-In-First-Out (FIFO) Algorithm
(when page need to be replaced, replace the oldest page in the
memory)
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process) ⇒ 9 page
faults in the example below:
1 1 1 4 4 4 5 5 5
2 2 2 1 1 1 3 3
3 3 3 2 2 2 4
O. S. Notes Prepared by Dr. Naveen Choudhary
First-In-First-Out (FIFO) Algorithm
FIFO Replacement – Belady’s Anomaly:
more frames should mean fewer page faults, but with Belady's
anomaly it can be otherwise.
4 frames, same reference string (1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5):
initial 4 page faults, then 6 more ⇒ 10 page faults in all:
1 5 5 5 5 4 4
2 2 1 1 1 1 5
3 3 3 2 2 2 2
4 4 4 4 3 3 3
(each column shows the 4 frames after a fault)
To implement the FIFO algorithm, use a clock (time stamps) or a FIFO
queue; a minimal simulation sketch follows below.
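A minimal simulation sketch of FIFO replacement (not from the notes); it reproduces both counts above – 9 faults with 3 frames and 10 with 4:

  import java.util.ArrayDeque;
  import java.util.HashSet;

  class FifoSim {
      static int faults(int[] refs, int frames) {
          ArrayDeque<Integer> fifo = new ArrayDeque<>(); // pages in arrival order
          HashSet<Integer> inMemory = new HashSet<>();
          int faults = 0;
          for (int p : refs) {
              if (inMemory.contains(p)) continue;        // hit: FIFO order unchanged
              faults++;
              if (fifo.size() == frames)                 // memory full: evict oldest page
                  inMemory.remove(fifo.poll());
              fifo.add(p);
              inMemory.add(p);
          }
          return faults;
      }
      public static void main(String[] a) {
          int[] rs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
          System.out.println(faults(rs, 3));  // 9
          System.out.println(faults(rs, 4));  // 10 -- Belady's anomaly
      }
  }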
O. S. Notes Prepared by Dr. Naveen Choudhary
FIFO Page Replacement
= 15 page faults
FIFO page replacement performance is not very good, as it
can throw out a page which is in active use & will be needed again
(and will thus cause another page fault).
FIFO also exhibits Belady's anomaly.
O. S. Notes Prepared by Dr. Naveen Choudhary
FIFO Illustrating Belady’s Anamoly
(Even when memory, i.e. the number of frames, increases, the page
fault rate may increase for certain page reference strings –
this is Belady's anomaly)
O. S. Notes Prepared by Dr. Naveen Choudhary
Optimal Algorithm
( Will have lowest page-fault rate of all algorithms)
Replace page that will not be used for longest period of
time.
4 frames example:
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
1 1 4
2 2 2
3 3 3
4 5 5
⇒ 6 page faults (4 initial faults + 2 more; each column shows the
4 frames after a fault)
How do we know this? (The optimal page replacement algorithm is
difficult to implement, because it requires future knowledge of
the reference string.)
So basically the optimal algorithm is used for measuring how well
other page replacement algorithms perform.
O. S. Notes Prepared by Dr. Naveen Choudhary
Optimal Page Replacement
O. S. Notes Prepared by Dr. Naveen Choudhary
Least Recently Used (LRU) Algorithm
Replace the page that has not been used for the longest period
of time
If the recent past is a good approximation of the near future, then
LRU is a good approximation of the optimal algorithm.
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
1 1 1 1 5
2 2 2 2 2
3 5 5 4 4
4 4 3 3 3
⇒ 8 page faults (each column shows the 4 frames after a fault)
How to implement LRU (H/W assistance will be required )
Counter implementation
Every page entry has a counter; every time page is referenced through
this entry, copy the clock into the counter.
When a page needs to be replaced, look at the counters to determine
which page to change (the page with the smallest time/counter value
is the least recently used and is replaced).
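A minimal sketch of the counter idea (a software simulation, not how the hardware assistance actually works): each reference copies a logical clock into the page's counter, and the victim is the page with the smallest counter:

  import java.util.HashMap;
  import java.util.Map;

  class LruSim {
      static int faults(int[] refs, int frames) {
          HashMap<Integer, Integer> lastUse = new HashMap<>(); // page -> clock value
          int faults = 0, clock = 0;
          for (int p : refs) {
              clock++;
              if (!lastUse.containsKey(p)) {
                  faults++;
                  if (lastUse.size() == frames) {        // evict the smallest counter
                      int victim = -1, oldest = Integer.MAX_VALUE;
                      for (Map.Entry<Integer, Integer> e : lastUse.entrySet())
                          if (e.getValue() < oldest) { oldest = e.getValue(); victim = e.getKey(); }
                      lastUse.remove(victim);
                  }
              }
              lastUse.put(p, clock);                     // copy the clock into the counter
          }
          return faults;
      }
  }
  // faults(new int[]{1,2,3,4,1,2,5,1,2,3,4,5}, 4) = 8, matching the example above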
O. S. Notes Prepared by Dr. Naveen Choudhary
LRU Page Replacement
Reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
with 3 frames ⇒ 12 page faults
O. S. Notes Prepared by Dr. Naveen Choudhary
LRU Algorithm (Cont.)
Stack implementation – keep a stack of page numbers in a double
link form:
Page referenced:
move it to the top
requires 6 pointers to be changed
(the most recently used page is at the head, the LRU page at the tail)
So no search is required for replacement.
There is a class of page replacement algorithms, called stack
algorithms, that can never exhibit Belady's anomaly.
A stack algorithm is an algorithm for which it can be shown that
the set of pages in memory for n frames is always a subset of the
set of pages that would be in memory with n + 1 frames.
LRU & the optimal algorithm are stack algorithms, whereas FIFO is not.
O. S. Notes Prepared by Dr. Naveen Choudhary
Use Of A Stack to Record The Most Recent Page References
O. S. Notes Prepared by Dr. Naveen Choudhary
LRU Approximation Algorithms
The updating of the clock fields and stack must be done for every memory
reference. If we were to use an interrupt for every reference, to allow
software to update such data structures, it would slow every memory
reference approximately by a factor of at least 10
Solution :- Use some LRU approximation algorithms
Reference bit
With each page associate a bit, initially = 0
When page is referenced (Read or write) bit set to 1.
Replace the one which is 0 (if one exists). We do not know the order, however.
Additional-reference-bits algorithm:
With each page-table entry we keep a reference register (several
bits of history) in addition to the reference bit Rb
(Rb is set by the hardware when the corresponding page is referenced).
At regular intervals a timer interrupt occurs and shifts the register
right, moving Rb into the high-order bit (and then clearing Rb).
When a page needs to be replaced, the page with the smallest register
value is replaced (between 10000000 and 00000001, the page with
00000001 is the least recently used).
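A minimal sketch of the shift step (hypothetical arrays; in a real system the hardware sets the reference bit and the OS owns the history byte):

  // one byte of history per page, updated at each timer interrupt
  static void timerTick(int[] history, boolean[] refBit) {
      for (int p = 0; p < history.length; p++) {
          history[p] = (history[p] >>> 1) | (refBit[p] ? 0x80 : 0); // shift Rb in at the top
          refBit[p] = false;                                        // start a new interval
      }
  }
  // victim = page with the smallest history value:
  // 0x01 (00000001) is replaced before 0x80 (10000000)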
O. S. Notes Prepared by Dr. Naveen Choudhary
LRU Approximation Algorithms
Second chance (give pages whose reference bit is set a second chance).
Goal: keep regularly referenced pages in memory. Data
structures used: a circular queue of frames and the reference bit.
Needs the reference bit.
Also called clock replacement.
If the page to be replaced (in clock order) has reference bit = 1,
then:
set the reference bit to 0,
leave the page in memory,
and inspect the next page (in clock order), subject to the same rules
(a minimal sketch follows below).
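A minimal sketch of the clock hand (hypothetical structure; the frames are treated as a circular list):

  int hand = 0;                          // clock hand over the circular frame list

  int chooseVictim(boolean[] refBit) {
      while (refBit[hand]) {             // referenced recently: give a second chance
          refBit[hand] = false;
          hand = (hand + 1) % refBit.length;
      }
      int victim = hand;                 // reference bit 0: replace this frame
      hand = (hand + 1) % refBit.length; // continue from the next frame next time
      return victim;
  }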
O. S. Notes Prepared by Dr. Naveen Choudhary
Second-Chance (clock) Page-Replacement Algorithm
O. S. Notes Prepared by Dr. Naveen Choudhary
Counting Algorithms
Keep a counter of the number of references that have been
made to each page.
LFU algorithm: replaces the page with the smallest count.
(Logic – a more actively used page will have a large count; but a new
page, which will be used actively in the future, may also be replaced,
because its count is still low at present.)
MFU algorithm: based on the argument that the page with the
smallest count was probably just brought in and has yet to be
used.
LFU & MFU do not approximate the optimal algorithm well and are
generally not used.
Page buffering:
Keep a pool of free frames.
When a page needs to be replaced, mark it as a victim.
Bring the new page into a free frame from the free-frame pool (so the
waiting time of the process is reduced).
The marked (victim) page is written out later, when the paging
device is idle, and its frame is returned to the pool.
O. S. Notes Prepared by Dr. Naveen Choudhary
Allocation of Frames
Each process needs a minimum number of frames (at least
enough to hold all the different pages that any single
instruction can reference).
Example: IBM 370 – 6 pages to handle the SS MOVE
instruction:
the instruction is 6 bytes and might span 2 pages,
2 pages to handle the source (from) operand (it is an indirect address),
2 pages to handle the destination (to) operand (it is an indirect address).
Two major allocation schemes.
fixed allocation
priority allocation
O. S. Notes Prepared by Dr. Naveen Choudhary
Fixed Allocation
Equal allocation – e.g., if 100 frames and 5 processes,
give each process 20 frames.
Proportional allocation – allocate according to the size of the
process:
s_i = size of process p_i
S = Σ s_i (sum of the address spaces of all processes currently in memory)
m = total number of frames
a_i = allocation for p_i = (s_i / S) × m
Example: m = 64, s_1 = 10, s_2 = 127, so S = 137:
a_1 = (10 / 137) × 64 ≈ 5 frames (a fractional allocation such as 4.67 is rounded to 5)
a_2 = (127 / 137) × 64 ≈ 59 frames
(a minimal sketch follows below)
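The same computation as a minimal sketch (rounding as in the example; a real allocator would also enforce a per-process minimum and make the rounded values sum to exactly m):

  static int[] allocate(int[] size, int m) {
      int S = 0;
      for (int s : size) S += s;                 // S = sum of process sizes
      int[] a = new int[size.length];
      for (int i = 0; i < size.length; i++)
          a[i] = (int) Math.round((double) size[i] / S * m);  // a_i = (s_i / S) * m
      return a;
  }
  // allocate(new int[]{10, 127}, 64) -> {5, 59}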
O. S. Notes Prepared by Dr. Naveen Choudhary
Priority Allocation
Use a proportional allocation scheme using priorities
rather than size. (High priority process – allocate more
frames)
If process P_i generates a page fault, either:
select for replacement one of its own frames, or
select for replacement a frame from a process with a lower
priority number.
O. S. Notes Prepared by Dr. Naveen Choudhary
Global vs. Local Allocation
Global replacement – a process selects a replacement
frame from the set of all frames; one process can take a
frame from another (more popular, but the efficiency of a
process then also depends on the paging behavior of the other
processes in the system).
Local replacement – each process selects from only its
own set of allocated frames (the number of frames allocated to
a process remains fixed throughout the process's lifetime, and
the efficiency of the process depends only on the paging behavior
of that process itself).
O. S. Notes Prepared by Dr. Naveen Choudhary
Thrashing (high paging activity is called Thrashing. A process is
thrashing if it is spending more time in paging than executing)
If a process does not have “enough” pages as required by
its locality (pages which are being actively used by the
process during this time & so these pages are required to
be in memory) then page-fault rate will become very high.
This leads to:
CPU utilization drops as the processes are waiting for the
paging device
operating system thinks that it needs to increase the degree
of multiprogramming.
another process added to the system.
Thrashing ⇒ a process is busy swapping pages in and
out rather than executing.
Locality: the locality of a function may be its instructions, its
local variables, and the subset of global variables that are
used in that function.
O. S. Notes Prepared by Dr. Naveen Choudhary
Thrashing
Why does paging work?
Locality model
Process migrates from one locality to another.
Localities may overlap.
Why does thrashing occur?
size of locality > total memory size
O. S. Notes Prepared by Dr. Naveen Choudhary
Thrashing (Cont…..)
Process 1 page-faults, since the pages of its locality are not in main
memory. Process 1 may take (empty) frames from process 2, but
process 2 also requires those pages (they are part of its current
locality), so process 2 will also start to page-fault, and this can
continue through process n. Now, as the processes are busy waiting
for the paging device, CPU utilization drops; seeing this, the OS
increases the degree of multiprogramming, which leads to a further
decrease in CPU utilization, so the OS again increases the degree of
multiprogramming, and so on, leading to further degradation: the
CPU ends up doing no useful work, only paging. Thrashing has
happened.
O. S. Notes Prepared by Dr. Naveen Choudhary
Working-Set Model
Δ ≡ working-set window ≡ a fixed number of page references (the idea
is to examine the most recent Δ page references).
Example: Δ = 10,000 instructions.
The working set is an approximation of the program's locality (if a page is
in active use, it will be in the working set; if it is no longer being used, it
will drop from the working set Δ time units after its last reference).
WSS_i (working-set size of process P_i) =
total number of pages referenced in the most recent Δ (varies in time):
if Δ too small, it will not encompass the entire locality;
if Δ too large, it will encompass several localities;
if Δ = ∞, it will encompass the entire program.
D = Σ WSS_i ≡ total demand for frames
if D > m ⇒ thrashing (m = total available frames).
Policy: if D > m, then suspend one of the processes (i.e. decrease the
degree of multiprogramming). A minimal sketch follows below.
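A minimal sketch of measuring WSS over a window of the last Δ references (a simulation; a real kernel approximates this with reference bits rather than recording the full reference string):

  import java.util.HashSet;

  // WSS at time t = number of distinct pages among the last delta references
  static int wss(int[] refs, int t, int delta) {
      HashSet<Integer> ws = new HashSet<>();
      for (int i = Math.max(0, t - delta + 1); i <= t; i++)
          ws.add(refs[i]);
      return ws.size();
  }
  // D = sum of wss(...) over all processes; if D > m, suspend a process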
O. S. Notes Prepared by Dr. Naveen Choudhary
Working-set model
The working set strategy prevents Thrashing while keeping the degree of
multi programming as high as possible thus optimizing the CPU utilization.
The difficulty with the working set model is keeping track of the working
set. The working set window is a moving window. At each memory
references, a new reference appears at one end & the oldest reference
drops off the other end.
O. S. Notes Prepared by Dr. Naveen Choudhary
Page-Fault Frequency Scheme
(A comparatively easy & straightforward way to avoid thrashing.)
Establish an “acceptable” page-fault rate, with an upper and a lower
bound on the desired page-fault rate:
If the actual rate is too low, the process loses a frame.
If the actual rate is too high, the process gains a frame.
Note: as with the working set, we may have to suspend a process:
if the page-fault rate increases & no free frames are available, we
must select some process and suspend it. The freed frames are
then distributed to the processes with high page-fault rates.
O. S. Notes Prepared by Dr. Naveen Choudhary
Other Considerations
Prepaging: when a process is started or a swapped-out process is
restarted, a lot of time would be spent on page faults to get the initial
locality into memory.
Implementation: keep the working set in the PCB when the process is
swapped out, and bring those pages back in together on restart.
Prepaging may be advantageous in some cases (where the assumed
locality is close to the actual locality of the process when it is restarted);
otherwise it is not useful, because prepaging also takes time to get the
pages into memory.
Page size selection (a machine-architecture issue; the page size is some
power of 2):
Fragmentation (less internal fragmentation with small pages).
Table size (small page size ⇒ more pages, hence a larger page table).
I/O overhead (in I/O transfers the seek time is the significant component,
and small pages mean more seeks, so I/O transfer time is better for
large pages).
Locality (locality improves for small page sizes, so unnecessary I/O for
unneeded information is reduced).
Page faults (the number of page faults increases for small pages, and each
page fault incurs extra overhead & time).
Note: the trend is towards large page sizes.
O. S. Notes Prepared by Dr. Naveen Choudhary
Other Considerations (Cont.)
Knowing the locality can make programs more efficient, but
generally it is desired that demand paging be totally transparent to
the user/programmer.
Program structure
int A[][] = new int[1024][1024];
Each row (row-major order) is stored in one page, i.e. page size
= 1024 words, so one row fits exactly in one page.
Assume one free frame is available.
Program 1 (column by column – touches a different row, hence a
different page, on every access):
  for (int j = 0; j < A.length; j++)
      for (int i = 0; i < A.length; i++)
          A[i][j] = 0;
⇒ 1024 × 1024 page faults
Program 2 (row by row – finishes one page before moving to the next):
  for (int i = 0; i < A.length; i++)
      for (int j = 0; j < A.length; j++)
          A[i][j] = 0;
⇒ 1024 page faults
O. S. Notes Prepared by Dr. Naveen Choudhary
Other Considerations (Cont.)
I/O interlock – pages must sometimes be locked into memory.
Consider I/O: pages that are being used to copy a file from an I/O device
must be locked against being selected for eviction by the page-replacement
algorithm.
If I/O is being done from an I/O device into a memory page, and this page
is replaced with a page of some other process, the information from the
I/O device will be wrongly written into the new page (the I/O is
performed by a separate I/O processor, which is unaware of the page
change).
Solution 1: do I/O only to/from kernel (system) buffers; a process
wishing to do I/O first writes/reads the information to/from a kernel
buffer, and the I/O processor transfers the data from/to there (but this
extra copying increases overhead).
Solution 2: associate a lock bit with every frame in memory; while doing
I/O, set the lock bit of the frame. A frame with the lock bit set cannot be
replaced.
O. S. Notes Prepared by Dr. Naveen Choudhary
Chapter 11: File-System Interface
File Concept
Access Methods
Directory Structure
File System Mounting
File Sharing
Protection
OS Notes by Dr. Naveen Choudhary
File Concept
File :: the OS provides a uniform logical view of information storage –
the file.
Files are mapped by the OS onto physical devices (HD, CD, etc.).
A file is a named collection of related information that is
recorded on secondary storage. From the user's perspective, a file is
the smallest allotment of logical secondary storage; that is, data
cannot be written to secondary storage unless they are within a
file (you can't write a single char to the HD).
Types:
Data
numeric
character
Binary
etc
Program – source program, object program, executable
program, etc.
OS Notes by Dr. Naveen Choudhary
File Structure
None - sequence of words, bytes
Simple record structure
Lines
Fixed length
Variable length
Can simulate last two with first method by inserting
appropriate control characters.
The structure of the file is decided by
Operating system
Program
note: the UNIX OS defines all files to be simply streams of
bytes, with each byte individually addressable by its
offset from the beginning (or end) of the file.
OS Notes by Dr. Naveen Choudhary
File Attributes
Name – only information kept in human-readable form.
Type – needed for systems that support different types.
Location – pointer to file location on device.
Size – current file size.
Protection – controls who can do reading, writing, executing.
Time, date, and user identification –this information may be
kept for (1) creation, (2)last modification and (3) last use. These
data can be useful for protection, security and usage monitoring
Information about files are kept in the directory structure, which
is maintained on the secondary storage because directories, like
files must be nonvolatile and so must be stored on the
secondary storage device and need to be brought into memory
piecemeal as needed
OS Notes by Dr. Naveen Choudhary
File Operations
Create
Write
Read
Reposition within file – file seek
Delete
Truncate (the contents of the file are deleted, i.e. the file size
becomes 0, but the other attributes of the file are not changed).
Open(Fi) – search the directory structure on disk for entry
Fi, and move the content of the entry to memory, so that we
need not waste time reading the directory from
secondary storage every time we access the file.
Close(Fi) – move the content of entry Fi in memory back to the
directory structure on disk.
OS Notes by Dr. Naveen Choudhary
File Types – Name, Extension
The OS in some cases (.exe, .com, etc.), or the application program in
other cases (.java, .c, etc.), uses the file extension to recognize the file
type (and in some cases to open the appropriate application for the
given file, as in Windows and Macintosh).
Unix uses a crude magic number stored at the beginning of some files
to indicate roughly the type of the file: executable, batch, postscript, etc.
(not all files have magic numbers).
OS Notes by Dr. Naveen Choudhary
Access Methods
Sequential Access
read next
write next
reset
no read after last write
(rewrite)
Direct Access
read n
write n
position to n
read next
write next
rewrite n
n = relative block number
OS Notes by Dr. Naveen Choudhary
Sequential-access File
(Offset = 0 )
(reset)
Note: sequential access is based on a tape model of a file
OS Notes by Dr. Naveen Choudhary
Simulation of Sequential Access on a Direct-access File
OS Notes by Dr. Naveen Choudhary
Example of Index and Relative Files
(other access methods )
Note : for IBM indexed sequential method uses a small master
index that points to disk blocks of a secondary index. The secondary
index block points to the actual file (data) blocks
OS Notes by Dr. Naveen Choudhary
Directory Structure
A collection of nodes containing information about all
files.
Directory
Files
F1 F2 F4
F3
Fn
Both the directory structure and the files reside on disk.
Backups of these two structures are kept on tapes.
OS Notes by Dr. Naveen Choudhary
A Typical File-system Organization
Device directory (or simply
directory ) contains
information about files
within the partition.
Information like name,
location, size & type of all
the files in the partition
One disk can be partitioned into 2 logical
storage units (structures), and 2 physical disks can be combined
into one logical storage unit, so partitions can be thought of as
virtual disks.
A directory can be viewed as a symbol table that translates file names
into their directory entries; with this view, the directory itself can be
organized in many ways.
OS Notes by Dr. Naveen Choudhary
Information in a Device Directory
Name
Type
Address
Current length
Maximum length
Date last accessed (for archival)
Date last updated (for dump)
Owner ID (who pays)
Protection information (discuss later)
OS Notes by Dr. Naveen Choudhary
Operations Performed on Directory
Search for a file
Create a file (new file entry can be added to the directory )
Delete a file (delete/remove file entry from directory )
List a directory (list the files in the dir )
Rename a file
Traverse the file system – it is useful to be able to access
every directory and every file within the directory structure (this can
be used to create a backup of the whole file structure with a single
command).
OS Notes by Dr. Naveen Choudhary
Organize the Directory (Logically) to Obtain
Efficiency – locating a file quickly.
Naming – convenient to users.
Two users can have same name for different files.
The same file can have several different names.
Grouping – logical grouping of files by properties, (e.g.,
all Java programs, all games, …)
OS Notes by Dr. Naveen Choudhary
Single-Level Directory
A single directory for all users.
Naming problem
(only single root dir so we can’t have two files with the
same name )
Grouping problem
OS Notes by Dr. Naveen Choudhary
Two-Level Directory
Separate directory for each user.
•Path name need to be given to access files
•Can have the same file name for different user
•Efficient searching is possible
• Limited grouping capability as only 2 levels of directory
OS Notes by Dr. Naveen Choudhary
Tree-Structured Directories
OS Notes by Dr. Naveen Choudhary
Tree-Structured Directories (Cont.)
Efficient searching
Grouping Capability
Current directory (working directory)
Path names can be long, as the tree can be very
deep.
OS Notes by Dr. Naveen Choudhary
Tree-Structured Directories (Cont.)
Absolute or relative path name
Absolute path start from root
Relative path start from current working directory
Creating a new file is done in current directory.
Delete a file
rm <file-name>
Creating a new subdirectory is done in current directory.
mkdir <dir-name>
Example: if in current directory /mail
mkdir count
mail
prog copy prt exp count
Deleting “mail” ⇒ deleting the entire subtree rooted by “mail”.
For deleting a directory there are 2 options:
only an empty directory can be deleted,
or
delete all the files in the directory & the directory itself with
a single command.
OS Notes by Dr. Naveen Choudhary
Acyclic-Graph Directories
Have shared subdirectories and files.
OS Notes by Dr. Naveen Choudhary
Acyclic-Graph Directories (Cont.)
Two or more different names (aliasing) for what is actually a
single physical file are possible.
Problems:
Traversing the directory structure (we do not want to traverse a
shared file more than once):
links should have a special representation in directory
entries, and such entries should be ignored while
traversing the directory.
Deletion: deleting a directory entry may not mean
deleting the file, as more than one entry may be
pointing to the same file (if the actual file is deleted we may be
left with dangling pointers).
Solution to the deletion problem ⇒ keep a reference count
with each file, and delete the file
only when the reference count becomes 0.
OS Notes by Dr. Naveen Choudhary
General Graph Directory
OS Notes by Dr. Naveen Choudhary
General Graph Directory (Cont.)
Problem with cycles :: in case of cycles (self-referencing),
it is possible that we won't be able to access a directory or file
even when its reference count is not equal to 0.
How do we guarantee no cycles?
Allow links only to files, not to subdirectories.
Garbage collection: from time to time garbage collection
needs to be done – a first pass traverses the directory structure
and notes everything that can be accessed, and a second pass
marks everything else (which cannot be accessed) as
free space.
Every time a new link is added, use a cycle-detection
algorithm to determine whether it is OK.
OS Notes by Dr. Naveen Choudhary
Protection
File owner/creator should be able to control:
what can be done to the file (what all operation are possible
on the file )
by whom
( ie which user is allowed to do what all operation on the file )
Types of access
Read
Write
Execute
Append
Delete
List
OS Notes by Dr. Naveen Choudhary
Access Lists and Groups
Mode of access: read, write, execute.
Three classes of users:
a) owner access 7 ⇒ RWX = 111
b) group access 6 ⇒ RWX = 110
c) public access 1 ⇒ RWX = 001
Ask the manager to create a group (unique name), say G,
and add some users to the group.
For a particular file (say game) or subdirectory, define an
appropriate access mask (owner, group, public):
chmod 761 game
Attach the group to the file:
chgrp G game
OS Notes by Dr. Naveen Choudhary
Consistency Semantics
Semantics for multiple user accessing a shared file
simultaneously
Semantics should specify when modification of data by one user
are observable by other users.
open ------------file session ------------ close
UNIX semantics (not exactly used in the latest versions of Unix):
Writes by one user to a shared open file are immediately visible to
the other users of that file.
Changing the file offset (pointer) by one user is seen by the
other users also.
Note: the file has a single image that interleaves all accesses,
regardless of their origin.
Session semantics:
Writes by one user to a shared open file are not immediately visible
to other users.
When the file is closed, the changes made to it are visible only in
sessions starting later (already-open instances of the file do not
reflect these changes).
Note: so the file has several images – one for each user with the
file open.
OS Notes by Dr. Naveen Choudhary
Consistency Semantics ….contd…
Immutable-shared-file semantics:
Once a file is declared as shared by its creator, it cannot be
modified (i.e. its contents can no longer be altered).
OS Notes by Dr. Naveen Choudhary
Chapter 12: File System Implementation
File System Structure
File System Implementation
Directory Implementation
Allocation Methods
Free-Space Management
Efficiency and Performance
Recovery
Log-Structured File Systems
NFS
OS Notes by Dr. Naveen Choudhary
File-System Structure
File structure
Logical storage unit
Collection of related information
File system resides on secondary storage (disks).
File system organized into layers.
File control block – storage structure consisting of
information about a file.
OS Notes by Dr. Naveen Choudhary
Layered File System
Logical file system: converts, say, read xyz(next) into detailed
information about the file, like the logical block number, etc.
File-organization module: converts logical block numbers to actual
physical block numbers.
Basic file system: issues generic commands to the device driver to
read or write physical blocks on disk (device 1, cylinder 73,
track 2, sector 10).
I/O control: converts generic commands like “retrieve block no. 123”
into the low-level hardware-specific instructions used by the
hardware controllers, which interface the I/O devices to the rest
of the system.
OS Notes by Dr. Naveen Choudhary
A Typical File Control Block
(Address of file data blocks)
OS Notes by Dr. Naveen Choudhary
In-Memory File System Structures
The following figure illustrates the necessary file system
structures provided by the operating systems.
Figure 12-3(a) refers to opening a file.
Figure 12-3(b) refers to reading a file.
OS Notes by Dr. Naveen Choudhary
In-Memory File System Structures
Contains file status, (open for read only, write
only or read_write ) current file offset &
inode no. (file control block)
OS Notes by Dr. Naveen Choudhary
Directory Implementation
Linear list of file names with pointer to the data blocks.
simple to program
time-consuming to execute
(linear search will be required)
(for binary search you need to keep the list sorted; as file entries are
continuously being added and removed, keeping the list sorted
is also computationally expensive)
Hash Table – linear list with hash data structure.
File name is hashed to reach the entry of the file in the linear list of
file entries
decreases directory search time
collisions – situations where two file names hash to the same
location
fixed size (hash tables are generally of fixed size and hash
functions also depends on the size of hash table )
OS Notes by Dr. Naveen Choudhary
Allocation Methods
An allocation method refers to how disk blocks (data
blocks ) are allocated for files:
Contiguous allocation
Linked allocation
Indexed allocation
OS Notes by Dr. Naveen Choudhary
Contiguous Allocation
Each file occupies a set of contiguous blocks on the
disk.
Only starting location (block #) and length (number of
blocks) are required in the directory entry.
Random/direct access is possible (if b is the starting block
and we have to read the i-th block of the file, just issue
read(b + i)).
Wasteful of space (a dynamic storage-allocation problem):
use best fit or first fit, but this can lead to external
fragmentation.
Files cannot grow easily (we need to give the maximum
anticipated file size at the time of creation of the file, and
this can lead to internal fragmentation if the maximum
anticipated file size is not fully used).
OS Notes by Dr. Naveen Choudhary
Contiguous Allocation of Disk Space
OS Notes by Dr. Naveen Choudhary
Linked Allocation
Each file is a linked list of disk blocks: blocks may be
scattered anywhere on the disk.
[Figure: each block holds data plus a pointer to the next block]
OS Notes by Dr. Naveen Choudhary
Linked Allocation (Cont.)
Simple – need only the starting address in the directory entry.
No external fragmentation.
Free-space management is simple (e.g. via the FAT) – but the
pointers require some additional space.
No random access possible.
Variation of linked allocation scheme used in some OS
File-allocation table (FAT) – disk-space allocation
used by MS-DOS and OS/2.
OS Notes by Dr. Naveen Choudhary
Linked Allocation
OS Notes by Dr. Naveen Choudhary
File-Allocation Table
[Figure: the directory entry holds the starting block; each FAT entry
holds the number of the next block; a free block (e.g. 214) holds 0]
The FAT can reduce random-access time, because the location of any
block can be found by reading the FAT, which is generally kept at the
start of the partition.
OS Notes by Dr. Naveen Choudhary
Indexed Allocation
Brings all pointers together into the index block.
Logical view.
index table
Data blocks
OS Notes by Dr. Naveen Choudhary
Example of Indexed Allocation
OS Notes by Dr. Naveen Choudhary
Indexed Allocation (Cont.)
Need an index table (so extra space is required).
Random access is possible, and sequential access is also possible.
Dynamic allocation without external fragmentation, but with the
overhead of the index block.
Mapping from logical to physical address in a file of maximum size
256K words with a block size of 512 words: 256 × 1024 / 512 =
512 blocks, so 512 entries (words) are needed in the index block,
and we need only 1 block for the index table.
For the 600th word: 600 = 512 × 1 + 88, i.e.
Q = 600 div 512 = 1 ⇒ displacement into the index table (the 2nd
entry, as 600 > 1 × 512)
R = 600 mod 512 = 88 ⇒ displacement into the data block
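The same division, as a minimal sketch (word 600, 512-word blocks):

  int LA = 600, B = 512;    // logical word number, block size in words
  int Q = LA / B;           // 1  -> use the 2nd entry of the index table
  int R = LA % B;           // 88 -> displacement within that data block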
OS Notes by Dr. Naveen Choudhary
Indexed Allocation – Mapping (Cont.)
Linked scheme :: to allow for large files, we may link
together several index blocks.
[Figure: index blk 1 → index blk 2 → …]
OS Notes by Dr. Naveen Choudhary
Indexed Allocation – Mapping (Cont.)
Two-level scheme: can address 512 × 3 = 1536 words in the illustration.
[Figure: outer index → index table → file data blocks (512 words per block)]
So if we are looking for the 1537th word of the file, we have to first
go to the 2nd entry in the outer index block.
OS Notes by Dr. Naveen Choudhary
Combined Scheme: UNIX (4K bytes per block)
inode
OS Notes by Dr. Naveen Choudhary
Free-Space Management
[Figure: a run of contiguous free blocks on disk]
Bit vector (n blocks, one bit per block):
bit[i] = 0 ⇒ block[i] occupied
bit[i] = 1 ⇒ block[i] free
The number of the first free block can be computed as:
(number of bits per word) ×
(number of 0-valued words) +
offset of the first 1 bit in the next word.
(A minimal sketch follows below.)
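A minimal sketch of that computation (assuming 64-bit words, with bit i of word w standing for block w*64 + i; Long.numberOfTrailingZeros finds the offset of the first 1 bit):

  // bit = 1 means the block is free
  static int firstFreeBlock(long[] bitmap) {
      for (int w = 0; w < bitmap.length; w++)
          if (bitmap[w] != 0)                    // skip words that are all 0 (occupied)
              return w * 64 + Long.numberOfTrailingZeros(bitmap[w]);
      return -1;                                 // no free block
  }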
OS Notes by Dr. Naveen Choudhary
Free-Space Management (Cont.)
The bit map requires extra space. Example:
block size = 2^12 bytes
disk size = 2^30 bytes (1 gigabyte)
n = 2^30 / 2^12 = 2^18 bits (or 32K bytes) for the vector;
for the bit vector to be efficient, it also needs to be
kept in primary memory.
Easy to get contiguous space
Linked list (free list)
Each free block points to the next free block on the disk.
Cannot get contiguous space easily
No waste of space
OS Notes by Dr. Naveen Choudhary
Free-Space Management (Cont.)
Grouping
A modification of the free-list approach is to store the
addresses of n free blocks in the first free block. The first n-1
of these blocks are actually free; the last block
contains the addresses of another n free blocks, & so on.
Advantage: the addresses of a large number of free blocks can be
found quickly, unlike in the plain linked-list method.
Counting
Generally several contiguous blocks may be allocated or
freed simultaneously
So each entry in the free space list ( some blocks may be
reserved to keep this list ) can be made to contain the
address of the first free block and the number n of free
contiguous blocks that follow the first free block
OS Notes by Dr. Naveen Choudhary
Linked Free Space List on Disk
OS Notes by Dr. Naveen Choudhary
Efficiency and Performance
Efficiency dependent on:
disk allocation and directory algorithms
types of data kept in file’s directory entry ( file last modified, file last
accessed )
Performance
disk cache – separate section of main memory for frequently used
blocks
free-behind and read-ahead – techniques to optimize sequential
access from disk
Free-behind removes a page from the buffer as soon as the
next page is requested, since previously read pages are not likely
to be used again in sequential access.
Synchronous write :: write immediately to disk; the program
issuing the synchronous write must wait until the write to
disk is completed. (Not so in an asynchronous write: just write to
the cache & return control to the program; the page/block will be
written to disk at some later free time.)
improve PC performance by dedicating section of memory as virtual
disk, or RAM disk.
OS Notes by Dr. Naveen Choudhary
Various Disk-Caching Locations
RAM disk (virtual disk) :: a section of primary memory is set aside &
treated as a virtual disk.
The RAM-disk device driver accepts all the standard disk operations
but performs those operations on the memory section instead of on disk.
The difference between a RAM disk and a disk cache is that the contents
of the RAM disk are totally user controlled, whereas those of the disk
cache are under the control of the OS. For instance, a RAM disk will stay
empty until the user creates files there.
OS Notes by Dr. Naveen Choudhary
Page Cache
A page cache caches pages rather than disk blocks
using virtual memory techniques.
Memory-mapped I/O uses a page cache( virtual memory )
Routine I/O through the file system uses the buffer (disk)
cache.( i/o read(file), i/o write(file)
This leads to the following figure.
OS Notes by Dr. Naveen Choudhary
I/O Without a Unified Buffer Cache
Get block
of disk
OS Notes by Dr. Naveen Choudhary
Unified Buffer Cache
A unified buffer cache uses the same page cache to
cache both memory-mapped pages and ordinary file
system I/O.
OS Notes by Dr. Naveen Choudhary
I/O Using a Unified Buffer Cache
OS Notes by Dr. Naveen Choudhary
Recovery
Consistency checking – compares data in the directory
structure with the data blocks on disk, and tries to fix
inconsistencies (an inconsistency may occur when some
operation on an open file, such as deleting a block, has been
done and the computer crashes before the result is reflected in
the directory structure).
Solutions ⇒ do consistency checking at startup, or use
synchronous writes for any metadata changes of the file.
Use system programs to back up data from disk to
another storage device (floppy disk, magnetic tape).
Incremental backup
Recover lost file or disk by restoring data from backup.
OS Notes by Dr. Naveen Choudhary
Log Structured File Systems
Log structured (or journaling) file systems record each
update to the file system as a transaction.
All transactions are written to a log. A transaction is
considered committed once it is written to the log.
However, the file system may not yet be updated.
The transactions in the log are asynchronously written to
the file system. When the file system is modified, the
transaction is removed from the log.
If the file system crashes, all remaining transactions in the
log must still be performed.
OS Notes by Dr. Naveen Choudhary
Chapter 13: I/O Systems
I/O Hardware
Application I/O Interface
Kernel I/O Subsystem
Transforming I/O Requests to Hardware Operations
Performance
OS Notes by Dr. Naveen Choudhary
I/O Hardware
A wide variety of I/O devices exists, with different functionality & speed.
So a variety of methods (in the OS) is needed to control them. These
methods form the I/O subsystem of the kernel, which separates the
rest of the kernel from the complexities of managing I/O devices.
Common concepts:
Port ⇒ a device communicates with the machine via a connection
point termed a port (say, a serial port). If one or more devices
use a common set of wires, the connection is called a bus.
Bus (daisy chain or shared direct access).
Controller (host adapter) ⇒ the electronic circuit that controls the
signals to & from the device, bus or port.
I/O instructions control devices
Devices have addresses, used by
Direct I/O instructions
Memory-mapped I/O
OS Notes by Dr. Naveen Choudhary
A Typical PC Bus Structure
OS Notes by Dr. Naveen Choudhary
Device I/O Port Locations on PCs (partial)
OS Notes by Dr. Naveen Choudhary
Polling
The host determines the state of the device by polling (for a write):
it waits until the busy bit is cleared,
sets the write bit in the command register and writes a byte into the
data-out register,
then sets the command-ready bit in the command register; when the
controller notices that the command-ready bit is set, it sets the busy
bit in the status register.
The controller reads the command register & sees the write
command; it reads the data-out register to get the byte and does the
I/O to the device.
The controller clears the command-ready bit, clears the error bit in
the status register to indicate that the device I/O succeeded, & clears
the busy bit to indicate that it is finished.
Disadvantage: busy-wait cycle while waiting for I/O from the device.
[Figure: processor and device controller connected by command/data
lines; controller registers: (1) control/command, (2) status, (3) data-in,
(4) data-out]
OS Notes by Dr. Naveen Choudhary
Interrupts
CPU interrupt-request line triggered by an I/O device.
An interrupt handler receives interrupts.
Maskable interrupts can be ignored or deferred.
An interrupt vector is used to dispatch the interrupt to the correct
handler, based on priority.
Some interrupts are nonmaskable and cannot be deferred, like
divide-by-zero or an unrecoverable memory error.
The interrupt mechanism is also used for exceptions.
The CPU senses the IRQ line after executing every instruction.
The interrupt mechanism enables the CPU to respond to an
asynchronous event.
Low-priority interrupts are deferred while a high-priority interrupt is
being serviced, and a high-priority interrupt can preempt the
execution of a low-priority interrupt handler.
[Figure: CPU with IRQ (interrupt-request) line]
OS Notes by Dr. Naveen Choudhary
Interrupt-Driven I/O Cycle
OS Notes by Dr. Naveen Choudhary
Intel Pentium Processor Event-Vector Table
How to efficiently dispatch the interrupt handler for the device (without
first polling all the devices to see which one raised the interrupt):
the device interrupts with a number which is an offset into the interrupt
vector table, containing the memory address of the specialized interrupt
handler.
OS Notes by Dr. Naveen Choudhary
Direct Memory Access
Used to avoid programmed I/O for large data movement
Requires DMA controller
Bypasses CPU to transfer data directly between I/O
device and memory
OS Notes by Dr. Naveen Choudhary
Six Step Process to Perform DMA Transfer
OS Notes by Dr. Naveen Choudhary
Application I/O Interface
There is a variety of I/O devices, & new devices are launched
every now & then. Each device has its own set of capabilities,
control-bit definitions & protocol for interfacing with the host
(processor) – and they are all different.
The issue is how we can design an OS such that new devices
can be attached to the computer without the OS being rewritten.
I/O system calls encapsulate device behaviors in generic
classes.
classes
Device-driver layer hides differences among I/O controllers from
kernel
Devices vary in many dimensions
Character-stream or block
Sequential or random-access
Sharable or dedicated
Speed of operation
read-write, read only, or write only
OS Notes by Dr. Naveen Choudhary
A Kernel I/O Structure
OS Notes by Dr. Naveen Choudhary
Characteristics of I/O Devices
OS Notes by Dr. Naveen Choudhary
Block and Character Devices
Block devices include disk drives
Commands include read, write, seek (for random access )
Raw I/O or file-system access
Memory-mapped file access possible ( as in demand
paging)
Character devices include keyboards, mice, serial ports
Commands include get, put
On top of get & put, libraries can be built that offer line-at-a-
time access, with buffering & editing services.
OS Notes by Dr. Naveen Choudhary
Network Devices
Varying enough from block and character to have own interface
Unix and Windows NT/9i/2000 include socket interface
Separates network protocol from network operation.
Includes select functionality ⇒ select manages a set of sockets &
gives us information like which sockets have a packet waiting to
be received and which sockets have room to accept a packet
to be sent (to another host).
[Figure: hosts 1–3 connected via sockets s1, s2]
OS Notes by Dr. Naveen Choudhary
Clocks and Timers
In most computers, the hardware clock is constructed from a high-
frequency counter.
Provides the current time, elapsed time, and timers.
A programmable interval timer is used for timings and periodic
interrupts:
the programmable interval timer can be set to wait a certain
amount of time & then generate an interrupt;
used by the scheduler to generate an interrupt that will
preempt a process at the end of its time slice;
used by the disk I/O subsystem to invoke the periodic flushing of
dirty cache buffers to the disk;
used by the network subsystem to cancel operations that are
proceeding too slowly because of network congestion or failure
(timeouts).
OS Notes by Dr. Naveen Choudhary
Blocking and Nonblocking I/O
Blocking - process suspended (move the process from running
state to waiting state )until I/O completed
Easy to use and understand
Insufficient for some needs
Nonblocking - I/O call returns as much as available
User interface, data copy (buffered I/O)
Implemented via multi-threading
One thread will wait for i/o to complete
Other thread will continue its execution
Returns quickly with count of bytes read or written
Asynchronous - process runs while I/O executes:
an asynchronous call starts the I/O but returns immediately, without
waiting for the completion of the I/O;
the I/O subsystem signals the process when the I/O has completed,
by setting some status bits/registers in the application's address
space, by delivering a signal, or by some software interrupt.
OS Notes by Dr. Naveen Choudhary
Kernel I/O Subsystem
Scheduling
To schedule a set of I/O requests means to determine a good order
in which to execute them, with the objective of improving overall
system performance. The order in which applications issue system
calls is rarely the best choice.
The OS maintains a queue of requests for each device. When an
application issues a blocking I/O system call, the request is placed
on the queue for that device. The I/O scheduler rearranges the
order of the queue to improve the overall system efficiency and the
average response time experienced by applications.
Buffering - store data in memory while transferring between
devices
To cope with device speed mismatch
To cope with device transfer size mismatch
To maintain “copy semantics” (we need to use a kernel buffer,
i.e. a buffer in the kernel area).
OS Notes by Dr. Naveen Choudhary
Kernel I/O Subsystem
Caching - fast memory (generally static RAM ) holding copy of
data
Always just a copy
Key to performance: generally performance improves as processes
show locality of reference.
Spooling - hold output for a device
Useful for devices, such as printer, that cannot accept interleaved
data stream
Although a printer can serve only one job at a time, several
applications may wish to print their output concurrently, without
having their outputs mixed together. The OS solves this problem by
intercepting all output to the printer. Each application's output is
spooled to a separate file on disk. The spooling system copies the
queued spool files to the printer one at a time.
Spooling is one way that OS can coordinate concurrent output
Device reservation - provides exclusive access to a device by a
process:
System calls for allocation and deallocation
Watch out for deadlock
OS Notes by Dr. Naveen Choudhary
Error Handling
Transient errors, like network not available or device (disk)
not available:
solution ⇒ retry the disk read, resend the message over the network.
Permanent errors ⇒ most OSs return an error number or
code when an I/O request fails.
System error logs hold problem reports.
OS Notes by Dr. Naveen Choudhary
Kernel Data Structures
Kernel keeps state info for I/O components, including
open file tables, network connections, character device
state
Many, many complex data structures to track buffers,
memory allocation, “dirty” blocks
OS Notes by Dr. Naveen Choudhary
UNIX I/O Kernel Structure
OS Notes by Dr. Naveen Choudhary
I/O Requests to Hardware Operations
Consider reading a file from disk for a process:
Determine device holding file
Translate name to device representation
Physically read data from disk into buffer
Make data available to requesting process
Return control to process
OS Notes by Dr. Naveen Choudhary
Life Cycle of An I/O Request
OS Notes by Dr. Naveen Choudhary
Chapter 14: Mass-Storage Systems
Disk Structure
Disk Scheduling
Disk Management
Swap-Space Management
RAID Structure
Disk Attachment
Stable-Storage Implementation
Tertiary Storage Devices
Operating System Issues
Performance Issues
OS Notes by Dr. Naveen Choudhary
Disk Structure
Disk drives are addressed as large 1-dimensional arrays
of logical blocks, where the logical block is the smallest
unit of transfer.
The 1-dimensional array of logical blocks is mapped into
the sectors of the disk sequentially.
Sector 0 is the first sector of the first track on the outermost
cylinder.
Mapping proceeds in order through that track, then the rest
of the tracks in that cylinder, and then through the rest of the
cylinders from outermost to innermost.
OS Notes by Dr. Naveen Choudhary
Disk Scheduling
The operating system is responsible for using hardware
efficiently — for the disk drives, this means having a fast
access time and disk bandwidth.
Access time has two major components
Seek time is the time for the disk to move the heads to the
cylinder containing the desired sector.
Rotational latency is the additional time waiting for the disk
to rotate the desired sector to the disk head.
Minimize seek time
Seek time ≈ seek distance
Disk bandwidth is the total number of bytes transferred,
divided by the total time between the first request for
service and the completion of the last transfer.
OS Notes by Dr. Naveen Choudhary
Disk Scheduling (Cont.)
Several algorithms exist to schedule the servicing of disk
I/O requests.
We illustrate them with a request queue (0-199).
98, 183, 37, 122, 14, 124, 65, 67
Head pointer 53
OS Notes by Dr. Naveen Choudhary
FCFS
Illustration shows total head movement of 640 cylinders.
OS Notes by Dr. Naveen Choudhary
SSTF
Selects the request with the minimum seek time from the
current head position.
This algorithm is not fair, but it is efficient
This algorithm is also not optimal; a generic optimal algorithm for
disk scheduling does not exist
SSTF scheduling is a form of SJF scheduling; may cause
starvation of some requests.
Illustration shows total head movement of 236 cylinders.
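Both totals can be checked with a short C sketch over the example queue
(head at 53); expected output is 640 cylinders for FCFS and 236 for SSTF.

#include <stdio.h>
#include <stdlib.h>

#define N 8
static const int queue[N] = {98, 183, 37, 122, 14, 124, 65, 67};

int fcfs(int head)
{
    int total = 0;
    for (int i = 0; i < N; i++) {      /* serve in arrival order */
        total += abs(queue[i] - head);
        head = queue[i];
    }
    return total;
}

int sstf(int head)
{
    int total = 0, done[N] = {0};
    for (int served = 0; served < N; served++) {
        int best = -1;
        for (int i = 0; i < N; i++)    /* pick closest pending request */
            if (!done[i] && (best < 0 ||
                abs(queue[i] - head) < abs(queue[best] - head)))
                best = i;
        done[best] = 1;
        total += abs(queue[best] - head);
        head = queue[best];
    }
    return total;
}

int main(void)
{
    printf("FCFS: %d cylinders\n", fcfs(53));   /* 640 */
    printf("SSTF: %d cylinders\n", sstf(53));   /* 236 */
    return 0;
}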
OS Notes by Dr. Naveen Choudhary
SSTF (Cont.)
OS Notes by Dr. Naveen Choudhary
SCAN
The disk arm starts at one end of the disk, and moves
toward the other end, servicing requests until it gets to the
other end of the disk, where the head movement is
reversed and servicing continues.
Sometimes called the elevator algorithm.
Illustration shows total head movement of 208 cylinders.
OS Notes by Dr. Naveen Choudhary
SCAN (Cont.)
OS Notes by Dr. Naveen Choudhary
C-SCAN
Assuming a uniform distribution of requests for cylinders,
consider the density of requests when the head reaches one
end & reverses direction. At this point, there are relatively few
requests immediately in front of the head since these cylinders
have recently been serviced. The heaviest density of requests is
at the other end of the disk. These requests have also waited
the longest, so why not go there first?
Provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing
requests as it goes. When it reaches the other end, however, it
immediately returns to the beginning of the disk, without
servicing any requests on the return trip.
Treats the cylinders as a circular list that wraps around from the
last cylinder to the first one.
OS Notes by Dr. Naveen Choudhary
C-SCAN (Cont.)
OS Notes by Dr. Naveen Choudhary
C-LOOK
Version of C-SCAN
Arm only goes as far as the last request in each direction,
then reverses direction immediately, without first going all
the way to the end of the disk.
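A small C sketch of the C-LOOK service order for the example queue:
requests at or above the head are served in ascending order, then the arm
jumps back to the lowest pending request and continues ascending.

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

int main(void)
{
    int q[] = {98, 183, 37, 122, 14, 124, 65, 67}, n = 8, head = 53;
    qsort(q, n, sizeof q[0], cmp);

    printf("C-LOOK order from %d:", head);
    for (int i = 0; i < n; i++)        /* first pass: requests >= head */
        if (q[i] >= head) printf(" %d", q[i]);
    for (int i = 0; i < n; i++)        /* wrap: requests below head */
        if (q[i] < head) printf(" %d", q[i]);
    printf("\n");   /* 65 67 98 122 124 183 14 37 */
    return 0;
}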
OS Notes by Dr. Naveen Choudhary
C-LOOK (Cont.)
OS Notes by Dr. Naveen Choudhary
Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural appeal
SCAN and C-SCAN perform better for systems that place a
heavy load on the disk.
Performance depends on the number and types of requests. For
a particular list of requests, it is possible to define an optimal
order of retrieval, but the computation needed to find an optimal
schedule may not justify the savings over SSTF or SCAN
Requests for disk service can be influenced by the file-allocation
method.
A contiguously allocated file generates several requests that are
close together on disk, so less head movement
The blocks of linked and indexed files can be scattered anywhere,
resulting in greater head movement.
The disk-scheduling algorithm should be written as a separate
module of the operating system, allowing it to be replaced with a
different algorithm if necessary.
Either SSTF or LOOK is a reasonable choice for the default
algorithm.
OS Notes by Dr. Naveen Choudhary
Disk Management
Low-level formatting, or physical formatting — dividing a disk into
sectors that the disk controller can read and write. Each sector
holds a header (sector number), a data area (256, 512, or 1024
bytes), and a trailer (error-correcting code)
To use a disk to hold files, the operating system still needs to
record its own data structures on the disk.
Partition the disk into one or more groups of cylinders.
Logical formatting, or “making a file system”: the data structures may
include maps of free and allocated space (a FAT or inodes) and an initial
empty directory
Boot block initializes the system.
A tiny bootstrap loader is stored in ROM; the rest of the bootstrap
program is stored in the boot blocks on disk and is brought into
memory by the ROM part of the bootstrap.
Methods such as sector sparing and sector slipping are used to handle
bad blocks in more sophisticated disks such as SCSI (Small Computer
Systems Interface) disks
Sector sparing: map a bad sector (say 87) to a good spare sector (say
298); the next time the system boots, a special command instructs the
controller to apply the remapping
Spare sectors are available on each cylinder, and there is also a spare
cylinder, so as not to affect disk scheduling too much: when a sector in
a cylinder goes bad, it is preferably replaced by a spare sector on the
same cylinder
Sector slipping: suppose sector 17 goes bad and the first available
spare is sector 202; every sector from 17 up to 202 is then shifted down
by one position, freeing the slot of the bad sector
In DOS, bad blocks are simply marked in the FAT during logical
formatting or when the chkdsk command is run
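A toy sector-sparing table in C, following the 87 → 298 example above: the
controller consults the remap table before every I/O and redirects accesses
to bad sectors to their spares.

#include <stdio.h>

#define MAX_SPARED 8
static struct { int bad, spare; } remap[MAX_SPARED] = { {87, 298} };
static int nspared = 1;

int translate(int sector)
{
    for (int i = 0; i < nspared; i++)
        if (remap[i].bad == sector)
            return remap[i].spare;    /* redirect I/O to the spare */
    return sector;                    /* healthy sector: no change */
}

int main(void)
{
    printf("sector 87 -> %d\n", translate(87));   /* 298 */
    printf("sector 88 -> %d\n", translate(88));   /* 88  */
    return 0;
}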
OS Notes by Dr. Naveen Choudhary
Booting from disk in Windows 2000.
OS Notes by Dr. Naveen Choudhary
MS-DOS Disk Layout
OS Notes by Dr. Naveen Choudhary
Swap-Space Management
Swap-space — Virtual memory uses disk space as an extension of main memory.
Swap-space can be carved out of the normal file system (easy to implement (as
normal file system routines can be used to create it, to name it, and to allocate its
space) but less efficient )or, more commonly, it can be in a separate disk partition.
Objective of using swap space: to provide the best throughput for the virtual
memory system
Swap-space management: (1) swapping may use swap space to hold the entire
process image, including the code and data segments; (2) paging systems may
simply store pages that have been pushed out of main memory
Process-centric swap space:
4.3BSD allocates swap space for the entire process (by copying it from the file
system initially) when the process starts; it holds the text segment (the program)
and the data segment.
Kernel uses swap maps to track swap-space use.
Page-related swap space:
Solaris 2 allocates swap space only when a page is forced out of physical memory, not
when the virtual memory page is first created.
OS Notes by Dr. Naveen Choudhary
4.3 BSD Text-Segment Swap Map
[Figure: the text-segment swap map — fixed-size 512K chunks, with the last
chunk of the text region allocated in 1K increments]
The text segment is a fixed size, so its swap space is allocated in
512K chunks, except for the final chunk, which holds the remainder of
the pages, in 1K increments
OS Notes by Dr. Naveen Choudhary
4.3 BSD Data-Segment Swap Map
[Figure: swap-map entries point to blocks of sizes 2^0·16K, 2^1·16K,
2^2·16K, …, 2^7·16K (2 MB)]
The data-segment swap map is more complicated, because the data segment can
grow over time. The map is of fixed size, but contains swap addresses for blocks
of varying size: given index i, the block pointed to by swap-map entry i is of
size 2^i · 16K, to a maximum of 2 MB. With this scheme the blocks of large
processes can be found quickly, and the swap map remains small. Moreover, small
processes need only small blocks, minimizing fragmentation.
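The entry-size rule can be written directly in C (16K blocks doubling per
index, capped at 2 MB, which is reached at index 7):

#include <stdio.h>

long block_size(int i)
{
    long size = 16L * 1024 << i;      /* 2^i * 16K */
    long cap  = 2L * 1024 * 1024;     /* 2 MB maximum */
    return size < cap ? size : cap;
}

int main(void)
{
    for (int i = 0; i <= 8; i++)
        printf("entry %d: %ld KB\n", i, block_size(i) / 1024);
    return 0;
}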
OS Notes by Dr. Naveen Choudhary
RAID Structure
RAID – multiple disk drives provide reliability via
redundancy.
The general idea behind RAID is to employ a group of
hard drives together with some form of duplication, either
to increase reliability or to speed up operations, ( or
sometimes both. )
RAID originally stood for Redundant Array of
Inexpensive Disks, and was designed to use a bunch of
cheap small disks in place of one or two larger more
expensive ones. Today RAID systems employ large
possibly expensive disks as their components, switching
the definition to Independent disks.
RAID is arranged into six different levels.
OS Notes by Dr. Naveen Choudhary
RAID: Improvement in Performance via
Parallelism
Several improvements in disk-use techniques involve the use of multiple
disks working cooperatively.
Disk striping uses a group of disks as one storage unit.
RAID schemes improve performance and improve the reliability of the
storage system by storing redundant data.
Mirroring or shadowing keeps a duplicate of each disk. (Since every block of
data is duplicated on multiple disks, read operations can be satisfied from any
available copy, and multiple disks can be reading different data blocks
simultaneously in parallel.)
Striping (which basically means spreading data out across multiple disks that
can be accessed simultaneously.)
bit-level striping: the bits of each byte are striped across multiple disks.
Block-level striping spreads a filesystem across multiple disks on a
block-by-block basis, so if block N were located on disk 0, then block N
+ 1 would be on disk 1, and so on.
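A minimal C sketch of the block-level striping rule just described: block N
lands on disk N mod NDISKS at per-disk offset N / NDISKS (NDISKS = 4 is an
assumption for the example).

#include <stdio.h>

#define NDISKS 4

void locate(long block, int *disk, long *offset)
{
    *disk   = block % NDISKS;    /* consecutive blocks hit different disks */
    *offset = block / NDISKS;    /* position within the chosen disk */
}

int main(void)
{
    for (long b = 0; b < 8; b++) {
        int d; long off;
        locate(b, &d, &off);
        printf("block %ld -> disk %d, offset %ld\n", b, d, off);
    }
    return 0;
}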
OS Notes by Dr. Naveen Choudhary
RAID Levels
• Raid Level 0 - This level includes striping only, with no mirroring.
• Raid Level 1 - This level includes mirroring only, no striping.
• Raid Level 2 - This level stores error-correcting codes on additional disks,
allowing for any damaged data to be reconstructed by subtraction from the
remaining undamaged data. Note that this scheme requires only three extra
disks to protect 4 disks worth of data, as opposed to full mirroring. ( The
number of disks required is a function of the error-correcting algorithms, and
the means by which the particular bad bit(s) is(are) identified. )
• Raid Level 3 - This level is similar to level 2, except that it takes advantage of
the fact that each disk is still doing its own error-detection, so that when an
error occurs, there is no question about which disk in the array has the bad
data. As a result a single parity bit is all that is needed to recover the lost data
from an array of disks. Level 3 also includes striping, which improves
performance.
• Raid Level 4 - This level is similar to level 3, employing block-level striping
instead of bit-level striping. The benefits are that multiple blocks can be read
independently, and changes to a block only require writing two blocks ( data
and parity ) rather than involving all disks. Note that new disks can be added
seamlessly to the system provided they are initialized to all zeros, as this does
not affect the parity results.
• Raid Level 5 - This level is similar to level 4, except the parity blocks are
distributed over all disks, thereby more evenly balancing the load on the
system. For any given block on the disk(s), one of the disks will hold the parity
information for that block and the other N-1 disks will hold the data. Note that
the same disk cannot hold both data and parity for the same block, as both
would be lost in the event of a disk crash.
• Raid Level 6 - This level extends raid level 5 by storing multiple bits of error-
recovery codes, ( such as the Reed-Solomon codes ), for each bit position of
data, rather than a single parity bit. In the example shown below 2 bits of ECC
are stored for every 4 bits of data, allowing data recovery in the face of up to
two simultaneous disk failures. Note that this still involves only 50% increase in
storage needs, as opposed to 100% for simple mirroring which could only
tolerate a single disk failure.
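The single-parity mechanism of levels 3–5 is just XOR, as this small C
example shows: the parity byte is the XOR of the data bytes, and XOR-ing the
surviving bytes with the parity rebuilds a lost one (the byte values are
arbitrary sample data).

#include <stdio.h>

int main(void)
{
    unsigned char d0 = 0x5A, d1 = 0x3C, d2 = 0xF0;   /* three data disks */
    unsigned char parity = d0 ^ d1 ^ d2;             /* written at setup */

    /* Suppose the disk holding d1 fails; recover it from the rest: */
    unsigned char recovered = d0 ^ d2 ^ parity;
    printf("lost 0x%02X, recovered 0x%02X\n", d1, recovered);
    return 0;
}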
OS Notes by Dr. Naveen Choudhary
Stable-Storage Implementation
Types of disk failure and recovery methods
A disk write results in one of three outcomes
Successful completion
Partial failure: data were partially written to the disk
block and then the write failed
Total failure: the block could not be written at all, so the old
data on the disk block are intact
A general solution:
Keep two physical blocks for each logical block. An output
operation is then executed as follows
Write the information to the first physical block
When the first write completes successfully, write the
same information onto the second block
Declare the operation complete only after the second
write completes successfully
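A user-level C sketch of this two-copy discipline (the file names are
illustrative; a real implementation writes raw disk blocks): the second copy
is touched only after the first is durable, so a crash at any point leaves
at least one good copy of the logical block.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int stable_write(const void *buf, size_t n)
{
    const char *copies[2] = { "block.primary", "block.secondary" };
    for (int i = 0; i < 2; i++) {
        int fd = open(copies[i], O_WRONLY | O_CREAT, 0644);
        if (fd < 0) return -1;
        if (write(fd, buf, n) != (ssize_t)n || fsync(fd) != 0) {
            close(fd);
            return -1;     /* first copy not durable: don't touch second */
        }
        close(fd);
    }
    return 0;              /* declared complete only after both writes */
}

int main(void)
{
    char data[512] = "logical block contents";
    return stable_write(data, sizeof data) == 0 ? 0 : 1;
}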
OS Notes by Dr. Naveen Choudhary