OPERATING SYSTEMS
COURSE FILE
DEPARTMENT OF
CSE-ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
(2024-2025)
Contents
21. Known gaps, if any, and inclusion of the same in lecture schedule
GEETHANJALI COLLEGE OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF AIML
3. Vision of the Department
To produce globally competent and socially responsible computer science engineers contributing to the
advancement of engineering and technology, which involves creativity and innovation, by providing an excellent
learning environment with world-class facilities.
1. To be a Centre of excellence in instruction, innovation in research and scholarship, and service to the
stakeholders, the profession, and the public.
2. To prepare graduates to enter a rapidly changing field as a competent computer science engineer.
3. To prepare graduates capable in all phases of software development, who possess a firm understanding of hardware
technologies, have the strong mathematical background necessary for scientific computing, and are sufficiently
well versed in general theory to allow growth within the discipline as it advances.
4. To prepare graduates to assume leadership roles by possessing good communication skills, the ability to work
effectively as team members, and an appreciation for their social and ethical responsibility in a global setting.
Program Educational Objectives (PEOs) are broad statements that describe what graduates are expected to
attain within a few years of graduation. The PEOs for Computer Science and Engineering graduates are:
PEO-I: To provide graduates with a good foundation in mathematics, sciences and engineering fundamentals
required to solve engineering problems that will facilitate them to find employment in industry and / or to
pursue postgraduate studies with an appreciation for lifelong learning.
PEO-II: To provide graduates with analytical and problem solving skills to design algorithms, other
hardware / software systems, and inculcate professional ethics, inter-personal skills to work in a multi-
cultural team.
PEO-III: To facilitate graduates in getting familiarized with state-of-the-art software / hardware tools, imbibing
creativity and innovation that would enable them to develop cutting-edge technologies of a multi-disciplinary
nature for societal development.
PROGRAM OUTCOMES (POs) OF B.Tech.(AIML-CSE) PROGRAM:
Program Outcomes (POs) describe what students are expected to know and be able to do by the time of
graduation to accomplish Program Educational Objectives (PEOs). The Program Outcomes for Computer
Science and Engineering graduates are:
Engineering Graduates would be able to:
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member
and leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.
PSO1: Demonstrate competency in Programming and problem solving skills and apply these
skills in solving computing problems.
PSO2: Select appropriate programming languages, data structures and algorithms in combination
with modern technologies and tools, and apply them in developing creative and innovative solutions.
PSO3: Demonstrate adequate knowledge in the concepts and techniques of artificial intelligence
and machine learning, and apply them in developing intelligent systems to solve real-world problems.
6. Course Objectives & Course Outcomes
Course Objectives
Develop ability to
1. Understand main components of Operating System (OS) and their working.
2. Implement the different CPU scheduling policies, and process synchronization.
3. Apply the different memory management techniques.
4. Understand File management techniques.
5. Handle deadlock situations and provide system protection.
7. Brief Importance of the Course and how it fits into the curriculum
• They are able to create their own shell script.
j. Five years from now, what do you hope students will remember from this course?
• Analytical thinking and logical reasoning and techniques employed in the analysis and
design of Operating Systems.
k. What is it about this course that makes it unique or special?
The course contents help students to understand and simulate the principles of operating
systems, which will enable them to perform well in future courses of study (like Linux).
l. Why does the program offer this course?
• This is the basic course in Operating Systems. Without this course, students cannot
understand and evaluate various Operating Systems available for use.
• This course is a prerequisite for Linux Programming, Embedded Systems.
8. Prerequisites
• Data Structures
• Computer Organization
• Knowledge of a high-level language such as C or Java
9. Instructional Learning Outcomes
4. Unit IV: File System Interface; File System implementation: File and Directory implementation,
Free space management; Mass-Storage structure: Disk structure, Disk scheduling
Objective: Understand the concepts of input/output, storage and file management.
Outcome: Elucidate the concept of File Management System.
5. Unit V: Deadlocks; Principles of deadlock: Deadlock characterization, Deadlock prevention,
Deadlock detection, avoidance and recovery; Protection: Goals of protection, Principles of
protection, Access matrix, Access control
Objective: Presents how to detect deadlocks, how to prevent them and how to avoid them; also shows
that the processes in an OS must be protected from various activities, where protection refers to a
mechanism for controlling the access of programs, processes or users.
Outcome: Explain various protection and access control mechanisms.
10. Course mapping with PEO’s and PO’s
Operating Systems
                                   PO1 PO2 PO3  PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1. Explain the fundamental concepts, evolution and services of Operating Systems.
                                    3   3   -    -   -   -   -   -   -   -    -    2    2    -    -
CO2. Apply concepts of process management in terms of scheduling and coordination for a given problem.
                                    3   2   2    2   -   -   -   -   2   2    -    2    2    -    -
CO3. Apply Memory Management techniques for a given problem.
                                    3   2   2    2   -   -   -   -   -   -    -    2    2    -    -
CO4. Elucidate the concept of File Management System.
                                    3   2   1    2   -   -   -   -   -   -    -    2    2    -    -
CO5. Explain various protection and access control mechanisms.
                                    3   3   2    2   -   -   -   2   -   2    -    2    2    -    -
Average                             3  2.4 1.75  2   -   -   -   2   2   2    -    2    2    -    -
11. Class Time Table
13. Lecture Schedule
LESSON PLAN
27 01 Virtual Memory- Demand Paging, Performance regular BB/LCD
28 01 Page Replacement Algorithms regular BB/LCD
29 01 Allocation of Frames, Thrashing regular BB/LCD
30 01 Discussion on : Objective Questions, regular LCD
University Question Papers (previous)
UNIT 4 File System Interface
31 01 The Concept of a File, Access Methods regular BB/LCD
32 01 Directory Structure, Directory implementation regular BB/LCD
33 01 File System Mounting, File Sharing and Protection regular BB/LCD
34 01 File System Structure, File System Implementation regular BB/LCD
35 01 Allocation Methods-Sequential, Indexed, Linked regular BB/LCD
36 01 Free Space Management, Efficiency & Performance regular BB/LCD
37 01 Mass Storage Structure- Overview, Disk Structure, regular BB/LCD
Disk Attachment
38 01 Disk Scheduling regular BB/LCD
39 01 Disk Management and Swap Space Management regular BB/LCD
40 01 Discussion on : Objective Questions, regular LCD
University Question Papers (previous)
UNIT 5 Deadlocks and Protection
41 01 Deadlocks- System Model, Deadlock Characterization regular BB/LCD
42 01 Methods for handling Deadlocks, Deadlock Prevention regular BB/LCD
43 01 Deadlock Avoidance regular BB/LCD
44 01 Deadlock Detection & Recovery regular BB/LCD
45 01 Protection- System Protection, Goals and Principles, regular BB/LCD
Domain Protection
46 01 Access Matrix- Implementation, Access Control- regular BB/LCD
Revocation of Access Rights
47 01 Capability-based Systems & Language-based Protection regular BB/LCD
48 01 Discussion on : Objective Questions, regular LCD
University Question Papers (previous)
14. Detailed Notes:
UNIT-1
What is an Operating System?
A program that acts as an intermediary between a user of a computer and the computer hardware
Operating system goals:
• Execute user programs and make solving user problems easier
• Make the computer system convenient to use
• Use the computer hardware in an efficient manner
Operating System Definition
• OS is a resource allocator
• Manages all resources
• Decides between conflicting requests for efficient and fair resource use
• OS is a control program
• Controls execution of programs to prevent errors and improper use of the computer
• No universally accepted definition
• “Everything a vendor ships when you order an operating system” is a good approximation,
but it varies wildly.
• “The one program running at all times on the computer” is the kernel. Everything else is
either a system program (ships with the operating system) or an application program.
• Computer-system operation
• One or more CPUs, device controllers connect through common bus providing access to
shared memory
• Concurrent execution of CPUs and devices competing for memory cycles
Computer-System Operation
Interrupt Handling
• The operating system preserves the state of the CPU by storing registers and the program
counter
• Determines which type of interrupt has occurred:
• polling
• vectored interrupt system
• Separate segments of code determine what action should be taken for each type of interrupt
Interrupt Timeline
Storage Hierarchy
• Storage systems organized in hierarchy
• Speed
• Cost
• Volatility
• Caching – copying information into faster storage system; main memory can be viewed as a last
cache for secondary storage
Caching
Two types of multiprocessing:
1. Asymmetric Multiprocessing
2. Symmetric Multiprocessing
A Dual-Core Design
Clustered Systems
• Like multiprocessor systems, but multiple systems working together
• Usually sharing storage via a storage-area network (SAN)
• Provides a high-availability service which survives failures
Asymmetric clustering has one machine in hot-standby mode
Symmetric clustering has multiple nodes running applications, monitoring each other
• Some clusters are for high-performance computing (HPC)
Applications must be written to use parallelization
Operating-System Operations
Operating-System Services
• User Interfaces - Means by which users can issue commands to the system. Depending
on the system these may be a command-line interface ( e.g. sh, csh, ksh, tcsh, etc. ), a
GUI interface ( e.g. Windows, X-Windows, KDE, Gnome, etc. ), or batch command
systems. The latter are generally older systems using punch cards and job-control
language, JCL, but may still be used today for specialty systems designed for a single
purpose.
• Program Execution - The OS must be able to load a program into RAM, run the
program, and terminate the program, either normally or abnormally.
• I/O Operations - The OS is responsible for transferring data to and from I/O devices,
including keyboards, terminals, printers, and storage devices.
• File-System Manipulation - In addition to raw data storage, the OS is also responsible
for maintaining directory and subdirectory structures, mapping file names to specific
blocks of data storage, and providing tools for navigating and utilizing the file system.
• Communications - Inter-process communications, IPC, either between processes
running on the same processor, or between processes running on separate processors or
separate machines. May be implemented as either shared memory or message passing,
(or some systems may offer both.)
• Error Detection - Both hardware and software errors must be detected and handled
appropriately, with a minimum of harmful repercussions. Some systems may include
complex error avoidance or recovery systems, including backups, RAID drives, and other
redundant systems. Debugging and diagnostic tools aid users and administrators in tracing down
the cause of problems.
• Resource Allocation - E.g. CPU cycles, main memory, storage space, and peripheral
devices. Some resources are managed with generic systems and others with very
carefully designed and specially tuned systems, customized for a particular resource and
operating environment.
• Accounting - Keeping track of system activity and resource usage, either for billing
purposes or for statistical record keeping that can be used to optimize future performance.
• Protection and Security - Preventing harm to the system and to resources, either through
wayward internal processes or malicious outsiders. Authentication, ownership, and
restricted access are obvious parts of this system. Highly secure systems may log all
process activity down to excruciating detail, and security regulations dictate the storage of
those records on permanent non-erasable medium for extended times in secure ( off-site )
facilities.
• Gets and processes the next user request, and launches the requested programs.
• In some systems the CI may be incorporated directly into the kernel.
• More commonly the CI is a separate program that launches once the user logs in or otherwise accesses
the system.
• UNIX, for example, provides the user with a choice of different shells, which may either be configured
to launch automatically at login, or which may be changed on the fly. ( Each of these shells uses a
different configuration file of initial settings and commands that are executed upon startup. )
• Different shells provide different functionality, in terms of certain commands that are implemented
directly by the shell without launching any external programs. Most provide at least a rudimentary
command interpretation structure for use in shell script programming ( loops, decision constructs,
variables, etc. )
• An interesting distinction is the processing of wild card file naming and I/O re-direction. On UNIX
systems those details are handled by the shell, and the program which is launched sees only a list of
filenames generated by the shell from the wild cards. On a DOS system, the wild cards are passed along
to the programs, which can interpret the wild cards as the program sees fit.
Figure 1.3 - The iPad touchscreen
• You can use "strace" to see more examples of the large number of system calls invoked
by a single simple command. Read the man page for strace, and try some simple
examples. ( strace mkdir temp, strace cd temp, strace date > t.t, strace cp t.t t.2, etc. )
• Most programmers do not use the low-level system calls directly, but instead use an
"Application Programming Interface", API. The following sidebar shows the read( ) call
available in the API on UNIX based systems::
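The prototype is ssize_t read( int fd, void *buf, size_t count ), declared in <unistd.h>. A minimal
sketch of its use ( the descriptor 0 and the buffer size here are arbitrary choices for illustration ):

#include <unistd.h>
#include <stdio.h>

int main( void ) {
    char buf[ 128 ];
    /* read up to 127 bytes from standard input ( file descriptor 0 ) */
    ssize_t n = read( 0, buf, sizeof( buf ) - 1 );
    if ( n >= 0 ) {
        buf[ n ] = '\0';                       /* terminate and print what was read */
        printf( "read %zd bytes: %s\n", n, buf );
    }
    return 0;
}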
The use of APIs instead of direct system calls provides for greater program portability between
different systems. The API then makes the appropriate system calls through the system call
interface, using a table lookup to access specific numbered system calls, as shown in Figure 1.6:
Figure 1.6 - The handling of a user application invoking the open( ) system call
• Parameters are generally passed to system calls via registers, or less commonly, by values
pushed onto the stack. Large blocks of data are generally accessed indirectly, through a
memory address passed in a register or on the stack, as shown in Figure 2.7:
Six major categories, as outlined in Figure 1.8 and the following six subsections:
• Standard library calls may also generate system calls, as shown here:
• Process control system calls include end, abort, load, execute, create process,
terminate process, get/set process attributes, wait for time or event, signal event,
and allocate and free memory.
• Processes must be created, launched, monitored, paused, resumed,and eventually
stopped.
• When one process pauses or stops, then another must be launched or resumed
• When processes stop abnormally it may be necessary to provide core dumps
and/or other diagnostic or recovery tools.
• Compare DOS ( a single-tasking system ) with UNIX ( a multi-tasking system ).
o When a process is launched in DOS, the command interpreter first unloads
as much of itself as it can to free up memory, then loads the process and
transfers control to it. The interpreter does not resume until the process has
completed, as shown in Figure 1.9:
Figure 1.9 - MS-DOS execution. (a) At system startup. (b) Running a program.
1.4.2 File Management
• File management system calls include create file, delete file, open, close, read,
write, reposition, get file attributes, and set file attributes.
• These operations may also be supported for directories as well as ordinary files.
• Device management system calls include request device, release device, read,
write, reposition, get/set device attributes, and logically attach or detach devices.
• Devices may be physical ( e.g. disk drives ), or virtual / abstract ( e.g. files,
partitions, and RAM disks ).
• Some systems represent devices as special files in the file system, so that
accessing the "file" calls upon the appropriate device drivers in the OS. See for
example the /dev directory on any UNIX system.
• Information maintenance system calls include calls to get/set the time, date,
system data, and process, file, or device attributes.
• Systems may also provide the ability to dump memory at any time, single step
programs pausing execution after each instruction, and tracing the operation of
programs, all of which can help to debug programs.
1.4.5 Communication
1.4.6 Protection
• System programs provide OS functionality through separate applications, which are not
part of the kernel or command interpreters. They are also known as system utilities or
system applications.
• Most systems also ship with useful applications such as calculators and simple editors, (
e.g. Notepad ). Some debate arises as to the border between system and non-system
applications.
• System programs may be divided into these categories:
o File management - programs to create, delete, copy, rename, print, list, and
generally manipulate files and directories.
o Status information - Utilities to check on the date, time, number of users,
processes running, data logging, etc. System registries are used to store and recall
configuration information for particular applications.
o File modification - e.g. text editors and other tools which can change file
contents.
o Programming-language support - E.g. Compilers, linkers, debuggers, profilers,
assemblers, library archive management, interpreters for common languages, and
support for make.
o Program loading and execution - loaders, dynamic loaders, overlay loaders,
etc., as well as interactive debuggers.
o Communications - Programs for providing connectivity between processes and
users, including mail, web browsers, remote logins, file transfers, and remote
command execution.
o Background services - System daemons are commonly started when the system
is booted, and run for as long as the system is running, handling necessary
services. Examples include network daemons, print servers, process schedulers,
and system error monitoring services.
• Most operating systems today also come complete with a set of application programs to
provide additional services, such as copying files or checking the time and date.
• Most users' views of the system are determined by their command interpreter and the
application programs. Most never make system calls, even through the API, ( with the
exception of simple ( file ) I/O in user-written programs. )
1.6 Operating-System Design and Implementation
• Requirements define properties which the finished system must have, and are a
necessary first step in designing any large complex system.
o User requirements are features that users care about and understand, and
are written in commonly understood vernacular. They generally do not
include any implementation details, and are written similar to the product
description one might find on a sales brochure or the outside of a shrink-
wrapped box.
o System requirements are written for the developers, and include more
details about implementation specifics, performance requirements,
compatibility constraints, standards compliance, etc. These requirements
serve as a "contract" between the customer and the developers, ( and
between developers and subcontractors ), and can get quite detailed.
• Requirements for operating systems can vary greatly depending on the planned
scope and usage of the system. ( Single user / multi-user, specialized system /
general purpose, high/low security, performance needs, operating environment,
etc. )
1.6.3 Implementation
1.7 Operating-System Structure
When DOS was originally written its developers had no idea how big and important it would
eventually become. It was written by a few programmers in a relatively short amount of time,
without the benefit of modern software engineering techniques, and then gradually grew over
time to exceed its original expectations. It does not break the system into subsystems, and has no
distinction between user and kernel modes, allowing all programs direct access to the underlying
hardware. ( Note that user versus kernel mode was not supported by the 8088 chip set anyway,
so that really wasn't an option back then. )
The original UNIX OS used a simple layered approach, but almost all the OS was in one big
layer, not really breaking the OS down into layered subsystems:
1.7.2 Layered Approach
1.7.3 Microkernels
• The basic idea behind micro kernels is to remove all non-essential services from
the kernel, and implement them as system applications instead, thereby making
the kernel as small and efficient as possible.
• Most microkernels provide basic process and memory management, and message
passing between other services, and not much more.
• Security and protection can be enhanced, as most services are performed in user
mode, not kernel mode.
• System expansion can also be easier, because it only involves adding more system
applications, not rebuilding a new kernel.
• Mach was the first and most widely known microkernel, and now forms a major
component of Mac OSX.
• Windows NT was originally a microkernel, but suffered from performance
problems relative to Windows 95. NT 4.0 improved performance by moving more
services into the kernel, and now XP is back to being more monolithic.
• Another microkernel example is QNX, a real-time OS for embedded systems.
Figure 1.14 - Architecture of a typical microkernel
1.7.4 Modules
• Most OSes today do not strictly adhere to one architecture, but are hybrids of
several.
1.7.5.1 Mac OS X
• The Mac OS X architecture relies on the Mach microkernel for basic
system management services, and the BSD kernel for additional services.
Application services and dynamically loadable modules ( kernel
extensions ) provide the rest of the OS functionality:
1.8 Virtual Machines
• The concept of a virtual machine is to provide an interface that looks like independent
hardware, to multiple different OSes running simultaneously on the same physical
hardware. Each OS believes that it has access to and control over its own CPU, RAM,
I/O devices, hard drives, etc.
• One obvious use for this system is for the development and testing of software that must
run on multiple platforms and/or OSes.
• One obvious difficulty involves the sharing of hard drives, which are generally
partitioned into separate smaller virtual disks for each OS.
1.8.1 History
• Virtual machines first appeared as the VM Operating System for IBM mainframes
in 1972.
1.8.2 Benefits
• Each OS runs independently of all the others, offering protection and security
benefits.
• ( Sharing of physical resources is not commonly implemented, but may be done
as if the virtual machines were networked together. )
• Virtual machines are a very useful tool for OS development, as they allow a user
full access to and control over a virtual machine, without affecting other users
operating the real machine.
• As mentioned before, this approach can also be useful for product development
and testing of SW that must run on multiple OSes / HW platforms.
1.8.3 Simulation
1.8.4 Para-virtualization
Figure 1.18 - Solaris 10 with two zones.
1.8.5 Implementation
1.8.6 Examples
1.8.6.1 VMware
Figure 1.19 - VMWare Workstation architecture
1.8.6.2 The Java Virtual Machine
• Java was designed from the beginning to be platform independent, by running Java only
on a Java Virtual Machine, JVM, of which different implementations have been
developed for numerous different underlying HW platforms.
• Java source code is compiled into Java byte code in .class files. Java byte code is binary
instructions that will run on the JVM.
• The JVM implements memory management and garbage collection.
• Java byte code may be interpreted as it runs, or compiled to native system binary code
using just-in-time ( JIT ) compilation. Under this scheme, the first time that a piece of
Java byte code is encountered, it is compiled to the appropriate native machine binary
code by the Java interpreter. This native binary code is then cached, so that the next
time that piece of code is encountered it can be used directly.
• Some hardware chips have been developed to run Java byte code directly, which is an
interesting application of a real machine being developed to emulate the services of a
virtual one!
UNIT-II
Process Concepts
A process includes:
• program counter
• stack
• data section
Process in Memory
Process State
As a process executes, it changes state
• new: The process is being created
• running: Instructions are being executed
• waiting: The process is waiting for some event to occur
• ready: The process is waiting to be assigned to a processor
• terminated: The process has finished execution
Information associated with each process
• Process state
• Program counter
• CPU registers
• CPU scheduling information
• Memory-management information
• Accounting information
• I/O status information
Representation of Process Scheduling
Schedulers
• Long-term scheduler (or job scheduler) – selects which processes should be brought
into the ready queue
• Short-term scheduler (or CPU scheduler) – selects which process should be executed
next and allocates CPU
Process Creation
• Parent process creates children processes, which, in turn, create other processes, forming
a tree of processes
• Generally, process identified and managed via a process identifier (pid)
• Resource sharing
• Parent and children share all resources
• Children share subset of parent’s resources
• Parent and child share no resources
• Child has a program loaded into it
• UNIX examples
• fork system call creates new process
• exec system call used after a fork to replace the process’ memory space with a new
program
Process Creation
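A sketch of this fork( ) / exec( ) / wait( ) pattern ( the program run by the child, /bin/ls, is just an
illustrative choice ):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main( void ) {
    pid_t pid = fork( );                       /* create a new process */
    if ( pid < 0 ) {                           /* fork failed */
        fprintf( stderr, "fork failed\n" );
        return 1;
    } else if ( pid == 0 ) {                   /* child process */
        execlp( "/bin/ls", "ls", NULL );       /* replace memory image with a new program */
        _exit( 1 );                            /* only reached if exec fails */
    } else {                                   /* parent process */
        wait( NULL );                          /* wait for the child to terminate */
        printf( "child complete\n" );
    }
    return 0;
}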
Process Termination:
• Process executes last statement and asks the operating system to delete it (exit)
• Output data from child to parent (via wait)
• Process’ resources are deallocated by operating system
• Parent may terminate execution of children processes (abort)
• Child has exceeded allocated resources
• Task assigned to child is no longer required
• If parent is exiting
Some operating systems do not allow a child to continue if its parent terminates
Multithreading Models
• Many-to-One
• One-to-One
• Many-to-Many
Many-to-One
Many user-level threads mapped to single kernel thread
Examples:
One-to-One
Many-to-Many Model
Two-level Model
Similar to M:M, except that it allows a user thread to be bound to a kernel thread
Examples
• IRIX
• HP-UX
• Tru64 UNIX
• Solaris 8 and earlier
Thread Libraries
• Thread library provides programmer with API for creating and managing threads
• Two primary ways of implementing
• Library entirely in user space
• Kernel-level library supported by the OS
Pthreads
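A minimal Pthreads sketch ( assuming a POSIX system with the pthread library linked in ), in which
main( ) creates one thread that sums the integers 1..upper and then joins it:

#include <pthread.h>
#include <stdio.h>

static int sum = 0;                            /* data shared by the threads */

static void *runner( void *param ) {           /* thread start routine */
    int upper = *(int *) param;
    for ( int i = 1; i <= upper; i++ )
        sum += i;
    pthread_exit( 0 );
}

int main( void ) {
    int upper = 10;
    pthread_t tid;                             /* thread identifier */
    pthread_attr_t attr;                       /* thread attributes */

    pthread_attr_init( &attr );                /* default attributes */
    pthread_create( &tid, &attr, runner, &upper );  /* create the thread */
    pthread_join( tid, NULL );                 /* wait for it to exit */
    printf( "sum = %d\n", sum );
    return 0;
}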
CPU SCHEDULING
• Almost all programs have some alternating cycle of CPU number crunching and waiting
for I/O of some kind. (Even a simple fetch from memory takes a long time relative to
CPU speeds.)
• In a simple system running a single process, the time spent waiting for I/O is wasted, and
those CPU cycles are lost forever.
• A scheduling system allows one process to use the CPU while another is waiting for I/O,
thereby making full use of otherwise lost CPU cycles.
• The challenge is to make the overall system as "efficient" and "fair" as possible, subject
to varying and often dynamic conditions, and where "efficient" and "fair" are somewhat
subjective terms, often subject to shifting priority policies.
• Almost all processes alternate between two states in a continuing cycle, as shown
in Figure 6.1 below :
o A CPU burst of performing calculations, and
o An I/O burst, waiting for data transfer in or out of the system.
Figure 2.1 - Alternating sequence of CPU and I/O bursts.
• CPU bursts vary from process to process, and from program to program, but an
extensive study shows frequency patterns similar to that shown in Figure 2.2:
2.1.2 CPU Scheduler
• Whenever the CPU becomes idle, it is the job of the CPU Scheduler ( a.k.a. the
short-term scheduler ) to select another process from the ready queue to run next.
• The storage structure for the ready queue and the algorithm used to select the next
process are not necessarily a FIFO queue. There are several alternatives to choose
from, as well as numerous adjustable parameters for each algorithm, which is the
basic subject of this entire chapter.
2.1.4 Dispatcher
• The dispatcher is the module that gives control of the CPU to the process
selected by the scheduler. This function involves:
o Switching context.
o Switching to user mode.
o Jumping to the proper location in the newly loaded program.
• The dispatcher needs to be as fast as possible, as it is run on every context switch.
The time consumed by the dispatcher is known as dispatch latency.
• There are several different criteria to consider when trying to select the "best" scheduling
algorithm for a particular situation and environment, including:
o CPU utilization - Ideally the CPU would be busy 100% of the time, so as to
waste 0 CPU cycles. On a real system CPU usage should range from 40% ( lightly
loaded ) to 90% ( heavily loaded. )
o Throughput - Number of processes completed per unit time. May range from 10
/ second to 1 / hour depending on the specific processes.
o Turnaround time - Time required for a particular process to complete, from
submission time to completion. ( Wall clock time. )
o Waiting time - How much time processes spend in the ready queue waiting their
turn to get on the CPU.
▪ ( Load average - The average number of processes sitting in the ready
queue waiting their turn to get into the CPU. Reported in 1-minute, 5-
minute, and 15-minute averages by "uptime" and "who". )
o Response time - The time taken in an interactive program from the issuance of a
command to the commencement of a response to that command.
• In general one wants to optimize the average value of a criterion ( Maximize CPU
utilization and throughput, and minimize all the others. ) However sometimes one wants
to do something different, such as to minimize the maximum response time.
• Sometimes it is more desirable to minimize the variance of a criterion than the actual
value. I.e. users are more accepting of a consistent predictable system than an
inconsistent one, even if it is a little bit slower.
The following subsections will explain several common scheduling strategies, looking at only a
single CPU burst each for a small number of processes. Obviously real systems have to deal with
a lot more simultaneous processes executing their CPU-I/O burst cycles.
2.3.1 First-Come First-Serve Scheduling, FCFS
• FCFS is very simple - Just a FIFO queue, like customers waiting in line at the
bank or the post office or at a copying machine.
• Unfortunately, however, FCFS can yield some very long average wait times,
particularly if the first process to get there takes a long time. For example,
consider the following three processes, all arriving at time 0, with CPU burst times of 24 ms ( P1 ), 3 ms ( P2 ), and 3 ms ( P3 ):
• In the first Gantt chart below, process P1 arrives first. The average waiting time
for the three processes is ( 0 + 24 + 27 ) / 3 = 17.0 ms.
• In the second Gantt chart below, the same three processes have an average wait
time of ( 0 + 3 + 6 ) / 3 = 3.0 ms. The total run time for the three bursts is the
same, but in the second case two of the three finish much quicker, and the other
process is only delayed by a short amount.
• FCFS can also block the system in a busy dynamic system in another way, known
as the convoy effect. When one CPU intensive process blocks the CPU, a number
of I/O intensive processes can get backed up behind it, leaving the I/O devices
idle. When the CPU hog finally relinquishes the CPU, then the I/O processes pass
through the CPU quickly, leaving the CPU idle while everyone queues up for I/O,
and then the cycle repeats itself when the CPU intensive process gets back to the
ready queue.
• The idea behind the SJF algorithm is to pick the quickest fastest little job that
needs to be done, get it out of the way first, and then pick the next smallest fastest
job to do next.
• ( Technically this algorithm picks a process based on the next shortest CPU
burst, not the overall process time. )
• For example, the Gantt chart below is based upon the following CPU burst times,
( and the assumption that all jobs arrive at the same time. )
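• SJF cannot be implemented exactly, because the length of the next CPU burst is not known in
advance; it is instead estimated, typically by exponential averaging of the measured lengths of
previous bursts: estimate( n + 1 ) = alpha * t( n ) + ( 1 - alpha ) * estimate( n ), where t( n ) is
the length of the most recent burst and estimate( n ) is the previous prediction.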
o In this scheme the previous estimate contains the history of all previous
times, and alpha serves as a weighting factor for the relative importance of
recent data versus past history. If alpha is 1.0, then past history is ignored,
and we assume the next burst will be the same length as the last burst. If
alpha is 0.0, then all measured burst times are ignored, and we just assume
a constant burst time. Most commonly alpha is set at 0.5, as illustrated in
Figure 5.3:
• The average wait time in this case is ( ( 5 - 3 ) + ( 10 - 1 ) + ( 17 - 2 ) ) / 4 = 26 / 4
= 6.5 ms. ( As opposed to 7.75 ms for non-preemptive SJF or 8.75 for FCFS. )
• Priority scheduling is a more general case of SJF, in which each job is assigned a
priority and the job with the highest priority gets scheduled first. ( SJF uses the
inverse of the next expected burst time as its priority - The smaller the expected
burst, the higher the priority. )
• Note that in practice, priorities are implemented using integers within a fixed
range, but there is no agreed-upon convention as to whether "high" priorities use
large numbers or small numbers. This book uses low numbers for high priorities,
with 0 being the highest possible priority.
• For example, the following Gantt chart is based upon these process burst times
and priorities, and yields an average waiting time of 8.2 ms:
• Priority scheduling can suffer from a major problem known as indefinite
blocking, or starvation, in which a low-priority task can wait forever because
there are always some other jobs around that have higher priority.
o If this problem is allowed to occur, then processes will either run
eventually when the system load lightens ( at say 2:00 a.m. ), or will
eventually get lost when the system is shut down or crashes. ( There are
rumors of jobs that have been stuck for years. )
o One common solution to this problem is aging, in which priorities of jobs
increase the longer they wait. Under this scheme a low-priority job will
eventually get its priority raised high enough that it gets run.
• Round robin scheduling is similar to FCFS scheduling, except that CPU bursts are
assigned with limits called time quantum.
• When a process is given the CPU, a timer is set for whatever value has been set
for a time quantum.
o If the process finishes its burst before the time quantum timer expires, then
it is swapped out of the CPU just like the normal FCFS algorithm.
o If the timer goes off first, then the process is swapped out of the CPU and
moved to the back end of the ready queue.
• The ready queue is maintained as a circular queue, so when all processes have had
a turn, then the scheduler gives the first process another turn, and so on.
• RR scheduling can give the effect of all processes sharing the CPU equally,
although the average wait time can be longer than with other scheduling
algorithms. In the following example the average wait time is 5.66 ms.
Most modern systems use time quanta of 10 to 100 milliseconds, with
context switch times on the order of 10 microseconds, so the overhead is small
relative to the time quantum.
Figure 2.4 - The way in which a smaller time quantum increases context switches.
Figure 2.5 - The way in which turnaround time varies with the time quantum.
• In general, turnaround time is minimized if most processes finish their next cpu
burst within one time quantum. For example, with three processes of 10 ms bursts
each, the average turnaround time for 1 ms quantum is 29, and for 10 ms quantum
it reduces to 20. However, if it is made too large, then RR just degenerates to
FCFS. A rule of thumb is that 80% of CPU bursts should be smaller than the time
quantum.
• When processes can be readily categorized, then multiple separate queues can be
established, each implementing whatever scheduling algorithm is most
appropriate for that type of job, and/or with different parametric adjustments.
• Scheduling must also be done between queues, that is scheduling one queue to get
time relative to other queues. Two common options are strict priority ( no job in a
lower priority queue runs until all higher priority queues are empty ) and round-
robin ( each queue gets a time slice in turn, possibly of different sizes. )
• Note that under this algorithm jobs cannot switch from queue to queue - Once
they are assigned a queue, that is their queue until they finish.
Figure 2.7 - Multilevel feedback queues.
• Contention scope refers to the scope in which threads compete for the use of
physical CPUs.
• On systems implementing many-to-one and many-to-many threads, Process
Contention Scope, PCS, occurs, because competition occurs between threads that
are part of the same process. ( This is the management / scheduling of multiple
user threads on a single kernel thread, and is managed by the thread library. )
• System Contention Scope, SCS, involves the system scheduler scheduling kernel
threads to run on one or more CPUs. Systems implementing one-to-one threads (
XP, Solaris 9, Linux ), use only SCS.
• PCS scheduling is typically done with priority, where the programmer can set
and/or change the priority of threads created by his or her programs. Even time
slicing is not guaranteed among threads of equal priority.
Figure 2.8 - Pthread scheduling API.
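A minimal sketch of a program using the POSIX contention-scope calls, querying the default scope
and then requesting system contention scope ( SCS ); note that some implementations, e.g. Linux,
support only PTHREAD_SCOPE_SYSTEM, so setting PTHREAD_SCOPE_PROCESS can fail there:

#include <pthread.h>
#include <stdio.h>

int main( void ) {
    pthread_attr_t attr;
    int scope;

    pthread_attr_init( &attr );

    /* report the default contention scope */
    if ( pthread_attr_getscope( &attr, &scope ) == 0 )
        printf( "default scope: %s\n",
                scope == PTHREAD_SCOPE_PROCESS ? "PROCESS ( PCS )" : "SYSTEM ( SCS )" );

    /* request system contention scope ( SCS ) for threads created with attr */
    pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM );

    pthread_attr_destroy( &attr );
    return 0;
}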
• When multiple processors are available, then the scheduling gets more complicated,
because now there is more than one CPU which must be kept busy and in effective use at
all times.
• Load sharing revolves around balancing the load between multiple processors.
• Multi-processor systems may be heterogeneous, ( different kinds of CPUs ), or
homogenous, ( all the same kind of CPU ). Even in the latter case there may be special
scheduling constraints, such as devices which are connected via a private bus to only one
of the CPUs. This book will restrict its discussion to homogenous systems.
2.5.1 Approaches to Multiple-Processor Scheduling
• Processors contain cache memory, which speeds up repeated accesses to the same
memory locations.
• If a process were to switch from one processor to another each time it got a time
slice, the data in the cache ( for that process ) would have to be invalidated and re-
loaded from main memory, thereby obviating the benefit of the cache.
• Therefore SMP systems attempt to keep processes on the same processor, via
processor affinity. Soft affinity occurs when the system attempts to keep
processes on the same processor but makes no guarantees. Linux and some other
OSes support hard affinity, in which a process specifies that it is not to be moved
between processors.
• Main memory architecture can also affect process affinity, if particular CPUs
have faster access to memory on the same chip or board than to other memory
loaded elsewhere. ( Non-Uniform Memory Access, NUMA. ) As shown below, if
a process has an affinity for a particular CPU, then it should preferentially be
assigned memory storage in "local" fast access areas.
2.5.3 Load Balancing
• Traditional SMP required multiple CPU chips to run multiple kernel threads
concurrently.
• Recent trends are to put multiple CPUs ( cores ) onto a single chip, which appear
to the system as multiple processors.
• Compute cycles can be blocked by the time needed to access memory, whenever
the needed data is not already present in the cache. ( Cache misses. ) In Figure
2.10, as much as half of the CPU cycles are lost to memory stall.
Figure 2.11 - Multithreaded multicore system.
PROCESS SYNCHRONIZATION
2.6 Background
• Recall that back in Chapter 3 we looked at cooperating processes ( those that can affect or
be affected by other simultaneously running processes ), and as an example, we used the
producer-consumer cooperating processes:
Producer skeleton:
item nextProduced;
while( true ) { ... }
Consumer skeleton:
item nextConsumed;
while( true ) { ... }
• The only problem with the above code is that the maximum number of items
which can be placed into the buffer is BUFFER_SIZE - 1. One slot is unavailable
because there always has to be a gap between the producer and the consumer.
• We could try to overcome this deficiency by introducing a counter variable, as
shown in the following code segments:
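A sketch of those segments ( buffer, in, out and BUFFER_SIZE are the shared bounded-buffer
variables assumed throughout this discussion; counter is the new shared variable ):

/* shared data */
#define BUFFER_SIZE 10
item buffer[ BUFFER_SIZE ];
int in = 0, out = 0;
int counter = 0;                 /* number of items currently in the buffer */

/* producer */
while ( true ) {
    /* produce an item in nextProduced */
    while ( counter == BUFFER_SIZE )
        ;                        /* do nothing - buffer is full */
    buffer[ in ] = nextProduced;
    in = ( in + 1 ) % BUFFER_SIZE;
    counter++;
}

/* consumer */
while ( true ) {
    while ( counter == 0 )
        ;                        /* do nothing - buffer is empty */
    nextConsumed = buffer[ out ];
    out = ( out + 1 ) % BUFFER_SIZE;
    counter--;
    /* consume the item in nextConsumed */
}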
• Unfortunately we have now introduced a new problem, because both the producer
and the consumer are adjusting the value of the variable counter, which can lead
to a condition known as a race condition. In this condition a piece of code may or
may not work correctly, depending on which of two simultaneous processes
executes first, and more importantly if one of the processes gets interrupted such
that the other process runs between important steps of the first process. ( Bank
balance example discussed in class. )
• The particular problem above comes from the producer executing "counter++" at
the same time the consumer is executing "counter--". If one process gets part way
through making the update and then the other process butts in, the value of
counter can get left in an incorrect state.
• But, you might say, "Each of those are single instructions - How can they get
interrupted halfway through?" The answer is that although they are single
instructions in C++, they are actually three steps each at the hardware level: (1)
Fetch counter from memory into a register, (2) increment or decrement the
register, and (3) Store the new value of counter back to memory. If the
instructions from the two processes get interleaved, there could be serious
problems, such as illustrated by the following:
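One such interleaving, assuming counter starts at 5 ( the same starting value used in the exercise below ):

T0: producer   register1 = counter          ( register1 = 5 )
T1: producer   register1 = register1 + 1    ( register1 = 6 )
T2: consumer   register2 = counter          ( register2 = 5 )
T3: consumer   register2 = register2 - 1    ( register2 = 4 )
T4: producer   counter = register1          ( counter = 6 )
T5: consumer   counter = register2          ( counter = 4 )

After this interleaving counter is 4, even though one item was produced and one consumed, so it
should still be 5.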
• Exercise: What would be the resulting value of counter if the order of statements
T4 and T5 were reversed? ( What should the value of counter be after one
producer and one consumer, assuming the original value was 5? )
• Note that race conditions are notoriously difficult to identify and debug, because
by their very nature they only occur on rare occasions, and only when the timing
is just exactly right. ( or wrong! :-) ) Race conditions are also very difficult to
reproduce. :-(
• Obviously the solution is to only allow one process at a time to manipulate the
value "counter". This is a very common occurrence among cooperating processes,
so lets look at some ways in which this is done, as well as some classic problems
in this area.
o The code following the critical section is termed the exit section. It generally
releases the lock on someone else's door, or at least lets the world know that they
are no longer in their critical section.
o The rest of the code not included in either the critical section or the entry or exit
sections is termed the remainder section.
• A solution to the critical section problem must satisfy the following three conditions:
1. Mutual Exclusion - Only one process at a time can be executing in their critical
section.
2. Progress - If no process is currently executing in their critical section, and one or
more processes want to execute their critical section, then only the processes not
in their remainder sections can participate in the decision, and the decision cannot
be postponed indefinitely. ( i.e. processes cannot be blocked forever waiting to get
into their critical sections. )
3. Bounded Waiting - There exists a limit as to how many other processes can get
into their critical sections after a process requests entry into their critical section
and before that request is granted. ( i.e. a process requesting entry into their
critical section will get a turn eventually, and there is a limit as to how many other
processes get to go first. )
• We assume that all processes proceed at a non-zero speed, but no assumptions can be
made regarding the relative speed of one process versus another.
• Kernel processes can also be subject to race conditions, which can be especially
problematic when updating commonly shared kernel data structures such as open file
tables or virtual memory management. Accordingly kernels can take on one of two
forms:
o Non-preemptive kernels do not allow processes to be interrupted while in kernel
mode. This eliminates the possibility of kernel-mode race conditions, but requires
kernel mode operations to complete very quickly, and can be problematic for real-
time systems, because timing cannot be guaranteed.
o Preemptive kernels allow for real-time operations, but must be carefully written to
avoid race conditions. This can be especially tricky on SMP systems, in which
multiple kernel processes may be running simultaneously on different processors.
Non-preemptive kernels include Windows XP, 2000, traditional UNIX, and Linux prior
to 2.6; Preemptive kernels include Linux 2.6 and later, and some commercial UNIXes such as
Solaris and IRIX.
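The two-process software solution examined below is Peterson's solution; a standard sketch of the
code executed by process i ( j denotes the other process, and flag[ 2 ] and turn are shared ):

/* shared between the two processes */
int turn;                        /* whose turn it is to enter the critical section */
boolean flag[ 2 ];               /* flag[ i ] == true means process i wants to enter */

/* structure of process i ( j is the other process ) */
do {
    flag[ i ] = true;            /* announce interest */
    turn = j;                    /* give the other process priority */
    while ( flag[ j ] && turn == j )
        ;                        /* busy-wait while the other process is interested and has priority */

    /* critical section */

    flag[ i ] = false;           /* exit section: no longer interested */

    /* remainder section */
} while ( true );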
• To prove that the solution is correct, we must examine the three conditions listed above:
1. Mutual exclusion - If one process is executing their critical section when the
other wishes to do so, the second process will become blocked by the flag of the
first process. If both processes attempt to enter at the same time, the last process
to execute "turn = j" will be blocked.
2. Progress - Each process can only be blocked at the while if the other process
wants to use the critical section ( flag[ j ] == true ), AND it is the other process's
turn to use the critical section ( turn == j ). If both of those conditions are true,
then the other process ( j ) will be allowed to enter the critical section, and upon
exiting the critical section, will set flag[ j ] to false, releasing process i. The shared
variable turn assures that only one process at a time can be blocked, and the flag
variable allows one process to release the other when exiting their critical section.
3. Bounded Waiting - As each process enters their entry section, they set the turn
variable to be the other processes turn. Since no process ever sets it back to their
own turn, this ensures that each process will have to let the other process go first
at most one time before it becomes their turn again.
• Note that the instruction "turn = j" is atomic, that is it is a single machine instruction
which cannot be interrupted.
• To generalize the solution(s) expressed above, each process when entering their critical
section must set some sort of lock, to prevent other processes from entering their critical
sections simultaneously, and must release the lock when exiting their critical section, to
allow other processes to proceed. Obviously it must be possible to attain the lock only
when no other process has already set a lock. Specific implementations of this general
procedure can get quite complicated, and may include hardware solutions as outlined in
this section.
• One simple solution to the critical section problem is to simply prevent a process from
being interrupted while in their critical section, which is the approach taken by non
preemptive kernels. Unfortunately this does not work well in multiprocessor
environments, due to the difficulties in disabling and the re-enabling interrupts on all
processors. There is also a question as to how this approach affects timing if the clock
interrupt is disabled.
• Another approach is for hardware to provide certain atomic operations. These operations
are guaranteed to operate as a single instruction, without interruption. One such operation
is the "Test and Set", which simultaneously sets a boolean lock variable and returns its
previous value, as shown in Figures 2.14 and 2.15:
Figures 2.14 and 2.15 illustrate "test_and_set( )" function
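A standard sketch of the definition and of mutual exclusion built on it ( lock is a shared boolean,
initially false ):

/* definition of the atomic test_and_set( ) instruction */
boolean test_and_set( boolean *target ) {
    boolean rv = *target;        /* remember the old value */
    *target = true;              /* set the lock */
    return rv;                   /* return the old value */
}

/* mutual exclusion using test_and_set( ) */
do {
    while ( test_and_set( &lock ) )
        ;                        /* busy-wait until the lock was previously free */

    /* critical section */

    lock = false;                /* release the lock */

    /* remainder section */
} while ( true );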
• The above examples satisfy the mutual exclusion requirement, but unfortunately do not
guarantee bounded waiting. If there are multiple processes trying to get into their critical
sections, there is no guarantee of what order they will enter, and any one process could
have the bad luck to wait forever until they got their turn in the critical section. ( Since
there is no guarantee as to the relative rates of the processes, a very fast process could
theoretically release the lock, whip through their remainder section, and re-lock the lock
before a slower process got a chance. As more and more processes are involved vying for
the same resource, the odds of a slow process getting locked out completely increase. )
• Figure 5.7 illustrates a solution using test-and-set that does satisfy this requirement, using
two shared data structures, boolean lock and boolean waiting[ N ], where N is the number
of processes in contention for critical sections:
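A standard reconstruction of that algorithm, for process i of n processes ( lock and all entries of
waiting[ ] are initially false ):

do {
    waiting[ i ] = true;
    key = true;
    while ( waiting[ i ] && key )
        key = test_and_set( &lock );     /* spin until granted entry or the lock becomes free */
    waiting[ i ] = false;

    /* critical section */

    /* scan in order from i + 1 for the next waiting process */
    j = ( i + 1 ) % n;
    while ( ( j != i ) && !waiting[ j ] )
        j = ( j + 1 ) % n;

    if ( j == i )
        lock = false;                    /* no one is waiting - release the lock */
    else
        waiting[ j ] = false;            /* hand the critical section directly to process j */

    /* remainder section */
} while ( true );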
• The key feature of the above algorithm is that a process blocks on the AND of the critical
section being locked and that this process is in the waiting state. When exiting a critical
section, the exiting process does not just unlock the critical section and let the other
processes have a free-for-all trying to get in. Rather it first looks in an orderly progression
( starting with the next process on the list ) for a process that has been waiting, and if it
finds one, then it releases that particular process from its waiting state, without unlocking
the critical section, thereby allowing a specific process into the critical section while
continuing to block all the others. Only if there are no other processes currently waiting is
the general lock removed, allowing the next process to come along access to the critical
section.
• Unfortunately, hardware level locks are especially difficult to implement in multi-
processor architectures. Discussion of such issues is left to books on advanced computer
architecture.
• The hardware solutions presented above are often difficult for ordinary programmers to
access, particularly on multi-processor machines, and particularly because they are often
platform-dependent.
• Therefore most systems offer a software API equivalent called mutex locks or simply
mutexes. ( For mutual exclusion )
• The terminology when using mutexes is to acquire a lock prior to entering a critical
section, and to release it when exiting, as shown in Figure 2.17:
Figure 2.17 - Solution to the critical-section problem using mutex locks
• Just as with hardware locks, the acquire step will block the process if the lock is in use by
another process, and both the acquire and release operations are atomic.
• Acquire and release can be implemented as shown here, based on a boolean variable
"available":
• One problem with the implementation shown here, ( and in the hardware solutions
presented earlier ), is the busy loop used to block processes in the acquire phase. These
types of locks are referred to as spinlocks, because the CPU just sits and spins while
blocking the process.
• Spinlocks are wasteful of cpu cycles, and are a really bad idea on single-cpu single-
threaded machines, because the spinlock blocks the entire computer, and doesn't allow
any other process to release the lock. ( Until the scheduler kicks the spinning process off
of the cpu. )
• On the other hand, spinlocks do not incur the overhead of a context switch, so they are
effectively used on multi-threaded machines when it is expected that the lock will be
released after a short time.
2.11 Semaphores
• A more robust alternative to simple mutexes is to use semaphores, which are integer
variables for which only two ( atomic ) operations are defined, the wait and signal
operations, as shown in the following figure.
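The busy-waiting definitions amount to the following ( S is an integer semaphore; both operations
must be executed atomically ):

wait( S ) {
    while ( S <= 0 )
        ;                        /* busy-wait until the semaphore is positive */
    S--;
}

signal( S ) {
    S++;
}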
• Note that not only must the variable-changing steps ( S-- and S++ ) be indivisible, it is
also necessary that for the wait operation when the test proves false that there be no
interruptions before S gets decremented. It IS okay, however, for the busy loop to be
interrupted when the test is true, which prevents the system from hanging forever.
o Counting semaphores can take on any integer value, and are usually used
to count the number remaining of some limited resource. The counter is
initialized to the number of such resources available in the system, and
whenever the counting semaphore is greater than zero, then a process can
enter a critical section and use one of the resources. When the counter gets
to zero ( or negative in some implementations ), then the process blocks
until another process frees up a resource and increments the counting
semaphore with a signal call. ( The binary semaphore can be seen as just a
special case where the number of resources initially available is just one. )
o Semaphores can also be used to synchronize certain operations between
processes. For example, suppose it is important that process P1 execute
statement S1 before process P2 executes statement S2.
▪ First we create a semaphore named synch that is shared by the two
processes, and initialize it to zero.
▪ Then in process P1 we insert the code:
S1;
signal( synch );
▪ And in process P2 we insert the code:
wait( synch );
S2;
• The big problem with semaphores as described above is the busy loop in the wait
call, which consumes CPU cycles without doing any useful work. This type of
lock is known as a spinlock, because the lock just sits there and spins while it
waits. While this is generally a bad thing, it does have the advantage of not
invoking context switches, and so it is sometimes used in multi-processing
systems when the wait time is expected to be short - One thread spins on one
processor while another completes their critical section on another processor.
• An alternative approach is to block a process when it is forced to wait for an
available semaphore, and swap it out of the CPU. In this implementation each
semaphore needs to maintain a list of processes that are blocked waiting for it, so
that one of the processes can be woken up and swapped back in when the
semaphore becomes available. ( Whether it gets swapped back into the CPU
immediately or whether it needs to hang out in the ready queue for a while is a
scheduling problem. )
• The new definition of a semaphore and the corresponding wait and signal
operations are shown as follows:
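A sketch following this definition ( block( ) suspends the calling process and wakeup( P ) moves
process P back to the ready queue ):

typedef struct {
    int value;
    struct process *list;        /* processes blocked on this semaphore */
} semaphore;

wait( semaphore *S ) {
    S->value--;
    if ( S->value < 0 ) {
        /* add this process to S->list */
        block( );                /* suspend the calling process */
    }
}

signal( semaphore *S ) {
    S->value++;
    if ( S->value <= 0 ) {
        /* remove a process P from S->list */
        wakeup( P );             /* move P to the ready queue */
    }
}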
• Note that in this implementation the value of the semaphore can actually become
negative, in which case its magnitude is the number of processes waiting for that
semaphore. This is a result of decrementing the counter before checking its value.
• Key to the success of semaphores is that the wait and signal operations be atomic,
that is no other process can execute a wait or signal on the same semaphore at the
same time. ( Other processes could be allowed to do other things, including
working with other semaphores, they just can't have access to this semaphore. )
On single processors this can be implemented by disabling interrupts during the
execution of wait and signal; Multiprocessor systems have to use more complex
methods, including the use of spinlocking.
• One important problem that can arise when using semaphores to block processes
waiting for a limited resource is the problem of deadlocks, which occur when
multiple processes are blocked, each waiting for a resource that can only be freed
by one of the other (blocked) processes, as illustrated in the following example
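The example referenced above is not reproduced; a minimal sketch of the classic two-semaphore deadlock using POSIX semaphores ( S and Q both assumed initialized to 1 with sem_init ):

    #include <semaphore.h>

    sem_t S, Q;                  /* both assumed initialized to 1 */

    void process_P0(void) {
        sem_wait(&S);
        sem_wait(&Q);            /* blocks forever if P1 already holds Q and is waiting for S */
        /* ... critical work ... */
        sem_post(&S);
        sem_post(&Q);
    }

    void process_P1(void) {
        sem_wait(&Q);
        sem_wait(&S);            /* blocks forever if P0 already holds S and is waiting for Q */
        /* ... critical work ... */
        sem_post(&Q);
        sem_post(&S);
    }

If P0 acquires S and P1 acquires Q before either reaches its second sem_wait( ), both block forever, each waiting for a semaphore that only the other ( blocked ) process can release.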
• Another problem to consider is that of starvation, in which one or more processes
gets blocked forever and never gets a chance to take its turn in the critical
section. For example, in the semaphores above, we did not specify the algorithm
for adding processes to the waiting queue in the semaphore in the wait( ) call, or
for selecting one to be removed from the queue in the signal( ) call. If the method
chosen is a FIFO queue, then every process will eventually get its turn, but if a
LIFO queue is implemented instead, then the first process to start waiting could
starve.
2.12 Classic Problems of Synchronization
The following classic problems are used to test virtually every new proposed synchronization
algorithm.
2.12.2 The Readers-Writers Problem
• In the readers-writers problem there are some processes ( termed readers ) who
only read the shared data, and never change it, and there are other processes (
termed writers ) who may change the data in addition to or instead of reading it.
There is no limit to how many readers can access the data simultaneously, but
when a writer accesses the data, it needs exclusive access.
• There are several variations to the readers-writers problem, most centered around
relative priorities of readers versus writers.
o The first readers-writers problem gives priority to readers. In this problem,
if a reader wants access to the data, and there is not already a writer
accessing it, then access is granted to the reader. A solution to this
problem can lead to starvation of the writers, as there could always be
more readers coming along to access the data. ( A steady stream of readers
will jump ahead of waiting writers as long as there is currently already
another reader accessing the data, because the writer is forced to wait until
the data is idle, which may never happen if there are enough readers. )
o The second readers-writers problem gives priority to the writers. In this
problem, when a writer wants access to the data it jumps to the head of the
queue - All waiting readers are blocked, and the writer gets access to the
data as soon as it becomes available. In this solution the readers may be
starved by a steady stream of writers.
• The following code is an example of the first readers-writers problem, and
involves an important counter and two binary semaphores:
o readcount is used by the reader processes, to count the number of readers
currently accessing the data.
o mutex is a semaphore used only by the readers for controlled access to
readcount.
o rw_mutex is a semaphore used to block and release the writers. The first
reader to access the data will set this lock and the last reader to exit will
release it; The remaining readers do not touch rw_mutex. ( Eighth edition
called this variable wrt. )
o Note that the first reader to come along will block on rw_mutex if there is
currently a writer accessing the data, and that all following readers will
only block on mutex for their turn to increment readcount.
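The code itself is not reproduced in these notes; a sketch using POSIX semaphores ( sem_wait/sem_post stand in for wait/signal, and mutex and rw_mutex are assumed initialized to 1 with sem_init ):

    #include <semaphore.h>

    sem_t rw_mutex;              /* held by a writer, or on behalf of the whole group of readers */
    sem_t mutex;                 /* protects readcount */
    int readcount = 0;           /* number of readers currently accessing the data */

    void writer(void) {
        sem_wait(&rw_mutex);
        /* ... perform writing ... */
        sem_post(&rw_mutex);
    }

    void reader(void) {
        sem_wait(&mutex);
        readcount++;
        if (readcount == 1)          /* the first reader locks out the writers */
            sem_wait(&rw_mutex);
        sem_post(&mutex);

        /* ... perform reading ... */

        sem_wait(&mutex);
        readcount--;
        if (readcount == 0)          /* the last reader lets the writers back in */
            sem_post(&rw_mutex);
        sem_post(&mutex);
    }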
• Some hardware implementations provide specific reader-writer locks, which are
accessed using an argument specifying whether access is requested for reading or
writing. The use of reader-writer locks is beneficial for situations in which: (1)
processes can be easily identified as either readers or writers, and (2) there are
significantly more readers than writers, making the additional overhead of the
reader-writer lock pay off in terms of increased concurrency of the readers.
Figure 2.20 - The situation of the dining philosophers
• One possible solution, as shown in the following code section, is to use a set of
five semaphores ( chopsticks[ 5 ] ), and to have each hungry philosopher first wait
on their left chopstick ( chopsticks[ i ] ), and then wait on their right chopstick (
chopsticks[ ( i + 1 ) % 5 ] )
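The code section referenced above is not reproduced here; a sketch using POSIX semaphores ( each chopstick semaphore assumed initialized to 1 with sem_init ):

    #include <semaphore.h>

    sem_t chopsticks[5];          /* one semaphore per chopstick, each initialized to 1 */

    void philosopher(int i) {     /* philosopher i, 0 <= i <= 4, runs forever */
        for (;;) {
            /* think */
            sem_wait(&chopsticks[i]);                /* pick up left chopstick  */
            sem_wait(&chopsticks[(i + 1) % 5]);      /* pick up right chopstick */
            /* eat */
            sem_post(&chopsticks[i]);                /* put down left chopstick  */
            sem_post(&chopsticks[(i + 1) % 5]);      /* put down right chopstick */
        }
    }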
• But suppose that all five philosophers get hungry at the same time, and each starts
by picking up their left chopstick. They then look for their right chopstick, but
because it is unavailable, they wait for it, forever, and eventually all the
philosophers starve due to the resulting deadlock.
• One potential remedy for this deadlock is an asymmetric solution, in which odd-numbered
philosophers pick up their left chopstick first and even-numbered philosophers pick up their
right chopstick first. ( Will this solution always work? What if there are an even number of
philosophers? )
• Note carefully that a deadlock-free solution to the dining philosophers problem
does not necessarily guarantee a starvation-free one. ( While some or even most
of the philosophers may be able to get on with their normal lives of eating and
thinking, there may be one unlucky soul who never seems to be able to get both
chopsticks at the same time. :-( )
2.13 Monitors
• Semaphores can be very useful for solving concurrency problems, but only if
programmers use them properly. If even one process fails to abide by the proper use of
semaphores, either accidentally or deliberately, then the whole system breaks down. (
And since concurrency problems are by definition rare events, the problem code may
easily go unnoticed and/or be heinous to debug. )
• For this reason a higher-level language construct has been developed, called monitors.
• A monitor is essentially a class, in which all data is private, and with the special
restriction that only one method within any given monitor object may be active at
the same time. An additional restriction is that monitor methods may only access
the shared data within the monitor and any data passed to them as parameters. I.e.
they cannot access any data external to the monitor.
• Figure 2.23 shows a schematic of a monitor, with an entry queue of processes
waiting their turn to execute monitor operations ( methods. )
Figure 2.24 - Monitor with condition variables
• But now there is a potential problem - If process P within the monitor issues a
signal that would wake up process Q also within the monitor, then there would be
two processes running simultaneously within the monitor, violating the exclusion
requirement. Accordingly there are two possible solutions to this dilemma:
Signal and wait - When process P issues the signal to wake up process Q, P then waits, either
for Q to leave the monitor or on some other condition.
Signal and continue - When P issues the signal, Q waits, either for P to exit the monitor or for
some other condition.
There are arguments for and against either choice. Concurrent Pascal offers a third alternative -
The signal call causes the signaling process to immediately exit the monitor, so that the waiting
process can then wake up and proceed.
• Java and C# ( C sharp ) offer monitors built-in to the language. Erlang offers
similar but different constructs.
• This solution to the dining philosophers uses monitors, and the restriction that a
philosopher may only pick up chopsticks when both are available. There are also
two key data structures in use in this solution:
1. enum { THINKING, HUNGRY,EATING } state[ 5 ]; A philosopher may
only set their state to eating when neither of their adjacent neighbors is
eating. ( state[ ( i + 1 ) % 5 ] != EATING && state[ ( i + 4 ) % 5 ] !=
EATING ).
2. condition self[ 5 ]; This condition is used to delay a hungry philosopher
who is unable to acquire chopsticks.
• In the following solution philosophers share a monitor, DiningPhilosophers, and
eat using the following sequence of operations:
1. DiningPhilosophers.pickup( ) - Acquires chopsticks, which may block the
process.
2. eat
3. DiningPhilosophers.putdown( ) - Releases the chopsticks.
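The monitor itself is not reproduced in these notes; a sketch of the same idea using Pthreads ( the mutex stands in for the monitor's one-active-method rule, and because Pthread condition variables use signal-and-continue semantics, the wait is wrapped in a while loop ):

    #include <pthread.h>

    enum pstate { THINKING, HUNGRY, EATING };

    static enum pstate state[5];                /* static storage starts zeroed, i.e. THINKING */
    static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;   /* one "method" active at a time */
    static pthread_cond_t self[5] = { PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER,
                                      PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER,
                                      PTHREAD_COND_INITIALIZER };

    /* Let philosopher i eat if hungry and neither neighbor is eating. */
    static void test(int i) {
        if (state[(i + 4) % 5] != EATING && state[i] == HUNGRY && state[(i + 1) % 5] != EATING) {
            state[i] = EATING;
            pthread_cond_signal(&self[i]);
        }
    }

    void pickup(int i) {
        pthread_mutex_lock(&monitor_lock);
        state[i] = HUNGRY;
        test(i);
        while (state[i] != EATING)                     /* wait until a neighbor frees both chopsticks */
            pthread_cond_wait(&self[i], &monitor_lock);
        pthread_mutex_unlock(&monitor_lock);
    }

    void putdown(int i) {
        pthread_mutex_lock(&monitor_lock);
        state[i] = THINKING;
        test((i + 4) % 5);                             /* the left neighbor may now be able to eat  */
        test((i + 1) % 5);                             /* the right neighbor may now be able to eat */
        pthread_mutex_unlock(&monitor_lock);
    }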
• Condition variables can be implemented using semaphores as well. For a
condition x, a semaphore "x_sem" and an integer "x_count" are introduced, both
initialized to zero. The wait and signal methods are then implemented as follows.
( This approach to the condition implements the signal-and-wait option described
above for ensuring that only one process at a time is active inside the monitor. )
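The corresponding code is not reproduced here; a sketch with POSIX semaphores following the textbook scheme, where mutex ( the monitor entry lock, initialized to 1 ), next ( initialized to 0 ), and next_count are the extra monitor bookkeeping that scheme assumes:

    #include <semaphore.h>

    sem_t mutex;                 /* monitor entry lock, initialized to 1 */
    sem_t next;                  /* signaling processes suspend themselves here, initialized to 0 */
    int next_count = 0;          /* number of processes suspended on next */

    /* Per-condition data for a condition x */
    sem_t x_sem;                 /* initialized to 0 */
    int x_count = 0;             /* number of processes waiting on x */

    void x_wait(void) {          /* x.wait( ) */
        x_count++;
        if (next_count > 0)
            sem_post(&next);     /* resume a suspended signaler ... */
        else
            sem_post(&mutex);    /* ... or open the monitor to a new entrant */
        sem_wait(&x_sem);        /* block until x is signaled */
        x_count--;
    }

    void x_signal(void) {        /* x.signal( ) */
        if (x_count > 0) {
            next_count++;
            sem_post(&x_sem);    /* wake one waiter on x */
            sem_wait(&next);     /* signal-and-wait: suspend the signaler */
            next_count--;
        }
    }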
UNIT-III
Main Memory
3.1 Background
• Obviously memory accesses and memory management are a very important part of
modern computer operation. Every instruction has to be fetched from memory before it
can be executed, and most instructions involve retrieving data from memory or storing
data in memory or both.
• The advent of multi-tasking OSs compounds the complexity of memory management,
because as processes are swapped in and out of the CPU, so must their code and data be
swapped in and out of memory, all at high speeds and without interfering with any other
processes.
• Shared memory, virtual memory, the classification of memory as read-only versus read-
write, and concepts like copy-on-write forking all further complicate the issue.
Figure 3.1 - A base and a limit register define a logical address space
Figure 3.2 - Hardware address protection with base and limit registers
3.1.2 Address Binding
• User programs typically refer to memory addresses with symbolic names such as "i",
"count", and "averageTemperature". These symbolic names must be mapped or bound to
physical memory addresses, which typically occurs in several stages:
o Compile Time - If it is known at compile time where a program will reside in
physical memory, then absolute code can be generated by the compiler,
containing actual physical addresses. However if the load address changes at
some later time, then the program will have to be recompiled. DOS .COM
programs use compile time binding.
o Load Time - If the location at which a program will be loaded is not known at
compile time, then the compiler must generate relocatable code, which references
addresses relative to the start of the program. If that starting address changes, then
the program must be reloaded but not recompiled.
o Execution Time - If a program can be moved around in memory during the
course of its execution, then binding must be delayed until execution time. This
requires special hardware, and is the method implemented by most modern OSes.
• Figure 3.3 shows the various stages of the binding processes and the units involved in
each stage:
3.1.3 Logical Versus Physical Address Space
• The address generated by the CPU is a logical address, whereas the address actually seen
by the memory hardware is a physical address.
• Addresses bound at compile time or load time have identical logical and physical
addresses.
• Addresses created at execution time, however, have different logical and physical
addresses.
o In this case the logical address is also known as a virtual address, and the two
terms are used interchangeably by our text.
o The set of all logical addresses used by a program composes the logical address
space, and the set of all corresponding physical addresses composes the physical
address space.
• The run time mapping of logical to physical addresses is handled by the memory-
management unit, MMU.
o The MMU can take on many forms. One of the simplest is a modification of the
base-register scheme described earlier.
o The base register is now termed a relocation register, whose value is added to
every memory request at the hardware level.
• Note that user programs never see physical addresses. User programs work entirely in
logical address space, and any memory references or manipulations are done using purely
logical addresses. Only when the address gets sent to the physical memory chips is the
physical memory address generated.
3.1.4 Dynamic Loading
• Rather than loading an entire program into memory at once, dynamic loading loads up
each routine as it is called. The advantage is that unused routines need never be loaded,
reducing total memory usage and generating faster program startup times. The downside
is the added complexity and overhead of checking to see if a routine is loaded every time
it is called and then loading it up if it is not already loaded.
3.1.5 Dynamic Linking and Shared Libraries
• With static linking library modules get fully included in executable modules, wasting
both disk space and main memory usage, because every program that included a certain
routine from the library would have to have their own copy of that routine linked into
their executable code.
• With dynamic linking, however, only a stub is linked into the executable module,
containing references to the actual library module linked in at run time.
o This method saves disk space, because the library routines do not need to be fully
included in the executable modules, only the stubs.
o We will also learn that if the code section of the library routines is reentrant,
( meaning it does not modify the code while it runs, making it safe to re-enter it ),
then main memory can be saved by loading only one copy of dynamically linked
routines into memory and sharing the code amongst all processes that are
concurrently using it. ( Each process would have their own copy of the data
section of the routines, but that may be small relative to the code segments. )
Obviously the OS must manage shared routines in memory.
o An added benefit of dynamically linked libraries ( DLLs, also known as shared
libraries or shared objects on UNIX systems ) involves easy upgrades and
updates. When a program uses a routine from a standard library and the routine
changes, then the program must be re-built ( re-linked ) in order to incorporate the
changes. However if DLLs are used, then as long as the stub doesn't change, the
program can be updated merely by loading new versions of the DLLs onto the
system. Version information is maintained in both the program and the DLLs, so
that a program can specify a particular version of the DLL if necessary.
o In practice, the first time a program calls a DLL routine, the stub will recognize
the fact and will replace itself with the actual routine from the DLL library.
Further calls to the same routine will access the routine directly and not incur the
overhead of the stub access. ( Following the UML Proxy Pattern. )
o ( Additional information regarding dynamic linking is available at
http://www.iecc.com/linker/linker10.html )
Swapping
• A process must be loaded into memory in order to execute.
• If there is not enough memory available to keep all running processes in memory at the
same time, then some processes who are not currently using the CPU may have their
memory swapped out to a fast local disk called the backing store.
3.3 Contiguous Memory Allocation
• One approach to memory management is to load each process into a contiguous space.
The operating system is allocated space first, usually at either low or high memory
locations, and then the remaining available memory is allocated to processes as needed.
( The OS is usually loaded low, because that is where the interrupt vectors are located,
but on older systems part of the OS was loaded high to make more room in low memory (
within the 640K barrier ) for user processes. )
3.3.1 Memory Protection
• The system shown in Figure 3.6 below allows protection against user programs accessing
areas that they should not, allows programs to be relocated to different memory starting
addresses as needed, and allows the memory space devoted to the OS to grow or shrink
dynamically as needs change.
• One method of allocating contiguous memory is to divide all available memory into
equal sized partitions, and to assign each process to their own partition. This restricts both
the number of simultaneous processes and the maximum size of each process, and is no
longer used.
• An alternate approach is to keep a list of unused ( free ) memory blocks ( holes ), and to
find a hole of a suitable size whenever a process needs to be loaded into memory. There
are many different strategies for finding the "best" allocation of memory to processes,
including the three most commonly discussed:
1. First fit - Search the list of holes until one is found that is big enough to satisfy
the request, and assign a portion of that hole to that process. Whatever fraction of
the hole not needed by the request is left on the free list as a smaller hole.
Subsequent requests may start looking either from the beginning of the list or
from the point at which this search ended. ( A first-fit sketch appears after this list. )
2. Best fit - Allocate the smallest hole that is big enough to satisfy the request. This
saves large holes for other process requests that may need them later, but the
resulting unused portions of holes may be too small to be of any use, and will
therefore be wasted. Keeping the free list sorted can speed up the process of
finding the right hole.
3. Worst fit - Allocate the largest hole available, thereby increasing the likelihood
that the remaining portion will be usable for satisfying future requests.
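As promised above, a minimal sketch of first fit over a singly linked free list ( hole sizes and addresses in bytes; a real allocator would also unlink holes that shrink to zero and merge adjacent free blocks ):

    #include <stddef.h>

    struct hole {                       /* one free block ( hole ) on the free list */
        size_t start;                   /* starting address of the hole */
        size_t size;                    /* size of the hole in bytes    */
        struct hole *next;
    };

    /* Return the start address of an allocation of 'request' bytes, carving it
       out of the first hole that is big enough, or (size_t)-1 if none exists. */
    size_t first_fit(struct hole *free_list, size_t request) {
        for (struct hole *h = free_list; h != NULL; h = h->next) {
            if (h->size >= request) {
                size_t addr = h->start;
                h->start += request;    /* the unused fraction stays on the free list as a smaller hole */
                h->size  -= request;
                return addr;
            }
        }
        return (size_t)-1;              /* no hole large enough */
    }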
• Simulations show that either first or best fit are better than worst fit in terms of both time
and storage utilization. First and best fits are about equal in terms of storage utilization,
but first fit is faster.
3.3.3 Fragmentation
• All the memory allocation strategies suffer from external fragmentation, though first and
best fits experience the problems more so than worst fit. External fragmentation means
that the available memory is broken up into lots of little pieces, none of which is big
enough to satisfy the next memory requirement, although the sum total could.
• The amount of memory lost to fragmentation may vary with algorithm, usage patterns,
and some design decisions such as which end of a hole to allocate and which end to save
on the free list.
• Statistical analysis of first fit, for example, shows that for N blocks of allocated memory,
another 0.5 N will be lost to fragmentation.
• Internal fragmentation also occurs, with all memory allocation strategies. This is caused
by the fact that memory is allocated in blocks of a fixed size, whereas the actual memory
needed will rarely be that exact size. For a random distribution of memory requests, on
the average 1/2 block will be wasted per memory request, because on the average the last
allocated block will be only half full.
o Note that the same effect happens with hard drives, and that modern hardware
gives us increasingly larger drives and memory at the expense of ever larger block
sizes, which translates to more memory lost to internal fragmentation.
o Some systems use variable size blocks to minimize losses due to internal
fragmentation.
• If the programs in memory are relocatable, ( using execution-time address binding ), then
the external fragmentation problem can be reduced via compaction, i.e. moving all
processes down to one end of physical memory. This only involves updating the
relocation register for each process, as all internal work is done using logical addresses.
• Another solution as we will see in upcoming sections is to allow processes to use non-
contiguous blocks of physical memory, with a separate relocation register for each block.
3.4 Segmentation
3.4.1 Basic Method
• Most users (programmers) do not think of their programs as existing in one continuous
linear address space.
• Rather they tend to think of their memory in multiple segments, each dedicated to a
particular use, such as code, data, the stack, the heap, etc.
• Memory segmentation supports this view by providing addresses with a segment number
(mapped to a segment base address) and an offset from the beginning of that segment.
• For example, a C compiler might generate 5 segments for the user code, library code,
global ( static ) variables, the stack, and the heap, as shown in Figure 3.7:
Figure 3.7 Programmer's view of a program.
Figure 3.9 - Example of segmentation
3.5 Paging
• Paging eliminates most of the problems of the other methods discussed previously, and is
the predominant memory management technique used today.
• The basic idea behind paging is to divide physical memory into a number of equal sized
blocks called frames, and to divide a program's logical memory space into blocks of the
same size called pages.
• Any page ( from any process ) can be placed into any available frame.
• The page table is used to look up what frame a particular page is stored in at the moment.
In the following example, for instance, page 2 of the program's logical memory is
currently stored in frame 3 of physical memory:
• A logical address consists of two parts: A page number in which the address resides, and
an offset from the beginning of that page. ( The number of bits in the page number limits
how many pages a single process can address. The number of bits in the offset determines
the maximum size of each page, and should correspond to the system frame size. )
• The page table maps the page number to a frame number, to yield a physical address
which also has two parts: The frame number and the offset within that frame. The number
of bits in the frame number determines how many frames the system can address, and the
number of bits in the offset determines the size of each frame.
• Page numbers, frame numbers, and frame sizes are determined by the architecture, but
are typically powers of two, allowing addresses to be split at a certain number of bits. For
example, if the logical address size is 2^m and the page size is 2^n, then the high-order
m-n bits of a logical address designate the page number and the remaining n bits
represent the offset.
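As an illustration of the split described above, a small sketch assuming 32-bit logical addresses ( m = 32 ) and 4 KB pages ( n = 12 ):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12                        /* n: 4 KB pages */
    #define PAGE_SIZE (1u << PAGE_BITS)

    int main(void) {
        uint32_t logical = 0x00ABCDEFu;                  /* an arbitrary logical address */
        uint32_t page    = logical >> PAGE_BITS;         /* high-order m - n bits: page number */
        uint32_t offset  = logical & (PAGE_SIZE - 1);    /* low-order n bits: offset within the page */
        printf("page number = %u, offset = %u\n", page, offset);
        return 0;
    }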
• Note also that the number of bits in the page number and the number of bits in the frame
number do not have to be identical. The former determines the address range of the
logical address space, and the latter relates to the physical address space.
• ( DOS used to use an addressing scheme with 16 bit frame numbers and 16-bit offsets, on
hardware that only supported 24-bit hardware addresses. The result was a resolution of
starting frame addresses finer than the size of a single frame, and multiple frame-offset
combinations that mapped to the same physical hardware address. )
• Consider the following micro example, in which a process has 16 bytes of logical
memory, mapped in 4 byte pages into 32 bytes of physical memory. ( Presumably some
other processes would be consuming the remaining 16 bytes of physical memory. )
Figure 3.12 - Paging example for a 32-byte memory with 4-byte pages
• Note that paging is like having a table of relocation registers, one for each page of the
logical memory.
• There is no external fragmentation with paging. All blocks of physical memory are used,
and there are no gaps in between and no problems with finding the right sized hole for a
particular chunk of memory.
• There is, however, internal fragmentation. Memory is allocated in chunks the size of a
page, and on the average, the last page will only be half full, wasting on the average half
a page of memory per process. ( Possibly more, if processes keep their code and data in
separate pages. )
• Larger page sizes waste more memory, but are more efficient in terms of overhead.
Modern trends have been to increase page sizes, and some systems even have multiple
size pages to try and make the best of both worlds.
• Page table entries ( frame numbers ) are typically 32 bit numbers, allowing access to
2^32 physical page frames. If those frames are 4 KB in size each, that translates to 16 TB
of addressable physical memory. ( 32 + 12 = 44 bits of physical address space. )
• When a process requests memory ( e.g. when its code is loaded in from disk ), free
frames are allocated from a free-frame list, and inserted into that process's page table.
• Processes are blocked from accessing anyone else's memory because all of their memory
requests are mapped through their page table. There is no way for them to generate an
address that maps into any other process's memory space.
• The operating system must keep track of each individual process's page table, updating it
whenever the process's pages get moved in and out of memory, and applying the correct
page table when processing system calls for a particular process. This all increases the
overhead involved when swapping processes in and out of the CPU. ( The currently
active page table must be updated to reflect the process that is currently running. )
Figure 3.13 - Free frames (a) before allocation and (b) after allocation
• Page lookups must be done for every memory reference, and whenever a process gets
swapped in or out of the CPU, its page table must be swapped in and out too, along with
the instruction registers, etc. It is therefore appropriate to provide hardware support for
this operation, in order to make it as fast as possible and to make process switches as fast
as possible also.
• One option is to use a set of registers for the page table. For example, the DEC PDP-11
uses 16-bit addressing and 8 KB pages, resulting in only 8 pages per process. ( It takes 13
bits to address 8 KB of offset, leaving only 3 bits to define a page number. )
• An alternate option is to store the page table in main memory, and to use a single register
( called the page-table base register, PTBR ) to record where in memory the page table is
located.
o Process switching is fast, because only the single register needs to be changed.
o However memory access just got half as fast, because every memory access now
requires two memory accesses - One to fetch the frame number from memory and
then another one to access the desired memory location.
o The solution to this problem is to use a very special high-speed memory device
called the translation look-aside buffer, TLB.
▪ The benefit of the TLB is that it can search an entire table for a key value
in parallel, and if it is found anywhere in the table, then the corresponding
lookup value is returned.
Figure 3.14 - Paging hardware with TLB
▪ The TLB is very expensive, however, and therefore very small. ( Not large
enough to hold the entire page table. ) It is therefore used as a cache
device.
▪ Addresses are first checked against the TLB, and if the info is not
there ( a TLB miss ), then the frame is looked up from main
memory and the TLB is updated.
▪ If the TLB is full, then replacement strategies range from least-
recently used, LRU to random.
▪ Some TLBs allow some entries to be wired down, which means
that they cannot be removed from the TLB. Typically these would
be kernel frames.
▪ Some TLBs store address-space identifiers, ASIDs, to keep track
of which process "owns" a particular entry in the TLB. This allows
entries from multiple processes to be stored simultaneously in the
TLB without granting one process access to some other process's
memory location. Without this feature the TLB has to be flushed
clean with every process switch.
▪ The percentage of time that the desired information is found in the TLB is
termed the hit ratio.
▪ ( Eighth Edition Version: ) For example, suppose that it takes 100
nanoseconds to access main memory, and only 20 nanoseconds to search
the TLB. So a TLB hit takes 120 nanoseconds total ( 20 to find the frame
number and then another 100 to go get the data ), and a TLB miss takes
220 ( 20 to search the TLB, 100 to go get the frame number, and then
another 100 to go get the data. ) So with an 80% TLB hit ratio, the average
memory access time would be:
0.80 * 120 + 0.20 * 220 = 140 nanoseconds,
for a 40% slowdown to get the frame number. A 98% hit rate would yield 122
nanoseconds average access time ( you should verify this ), for a 22% slowdown.
▪ ( Ninth Edition Version: ) The ninth edition ignores the 20 nanoseconds
required to search the TLB, yielding
0.80 * 100 + 0.20 * 200 = 120 nanoseconds,
for a 20% slowdown to get the frame number. A 99% hit rate would yield 101
nanoseconds average access time ( you should verify this ), for a 1% slowdown.
3.5.3 Protection
• The page table can also help to protect processes from accessing memory that they
shouldn't, or their own memory in ways that they shouldn't.
• A bit or bits can be added to the page table to classify a page as read-write, read-only,
read-write-execute, or some combination of these sorts of things. Then each memory
reference can be checked to ensure it is accessing the memory in the appropriate mode.
• Valid / invalid bits can be added to "mask off" entries in the page table that are not in use
by the current process, as shown by example in Figure 3.12 below.
• Note that the valid / invalid bits described above cannot block all illegal memory
accesses, due to the internal fragmentation. ( Areas of memory in the last page that are
not entirely filled by the process, and may contain data left over by whoever used that
frame last. )
• Many processes do not use all of the page table available to them, particularly in modern
systems with very large potential page tables. Rather than waste memory by creating a
full-size page table for every process, some systems use a page-table length register,
PTLR, to specify the length of the page table.
• Paging systems can make it very easy to share blocks of memory, by mapping pages in
multiple processes' page tables to the same physical frame. This may be done with either code or data.
• If code is reentrant, that means that it does not write to or change the code in any way ( it
is non self-modifying ), and it is therefore safe to re-enter it. More importantly, it means
the code can be shared by multiple processes, so long as each has their own copy of the
data and registers, including the instruction register.
• In the example given below, three different users are running the editor simultaneously,
but the code is only loaded into memory ( in the page frames ) one time.
• Some systems also implement shared memory in this fashion.
• Most modern computer systems support logical address spaces of 2^32 to 2^64.
• With a 2^32 address space and 4K ( 2^12 ) page sizes, this leaves 2^20 entries in the page
table. At 4 bytes per entry, this amounts to a 4 MB page table, which is too large to
reasonably keep in contiguous memory. ( And to swap in and out of memory with each
process switch. ) Note that with 4K pages, this would take 1024 pages just to hold the
page table!
• One option is to use a two-tier paging system, i.e. to page the page table.
• For example, the 20 bits described above could be broken down into two 10-bit page
numbers. The first identifies an entry in the outer page table, which identifies where in
memory to find one page of an inner page table. The second 10 bits finds a specific entry
in that inner page table, which in turn identifies a particular frame in physical memory.
( The remaining 12 bits of the 32 bit logical address are the offset within the 4K frame. )
Figure 3.17 A two-level page-table scheme
Figure 3.18 - Address translation for a two-level 32-bit paging architecture
• The VAX architecture divides 32-bit addresses into 4 equal-sized sections, and each page is
512 bytes, yielding an address form of a 2-bit section number, a 21-bit page number, and a 9-bit offset.
• With a 64-bit logical address space and 4K pages, there are 52 bits worth of page
numbers, which is still too many even for two-level paging. One could increase the
paging level, but with 10-bit page tables it would take 7 levels of indirection, which
would be prohibitively slow memory access. So some other approach must be used.
3.6.2 Hashed Page Tables
• One common data structure for accessing data that is sparsely distributed over a broad
range of possible values is with hash tables. Figure 3.16 below illustrates a hashed page
table using chain-and-bucket hashing:
• Another approach is to use an inverted page table. Instead of a table listing all of the
pages for a particular process, an inverted page table lists all of the pages currently loaded
in memory, for all processes. ( I.e. there is one entry per frame instead of one entry per
page. )
• Access to an inverted page table can be slow, as it may be necessary to search the entire
table in order to find the desired page ( or to discover that it is not there. ) Hashing the
table can help speed up the search process.
• Inverted page tables prohibit the normal method of implementing shared memory, which
is to map multiple logical pages to a common physical frame. ( Because each frame is
now mapped to one and only one process. )
Figure 3.20 - Inverted page table
• The Pentium CPU provides both pure segmentation and segmentation with paging. In the
latter case, the CPU generates a logical address ( segment-offset pair ), which the
segmentation unit converts into a logical linear address, which in turn is mapped to a
physical frame by the paging unit, as shown in Figure 3.21:
o The descriptor tables contain 8-byte descriptions of each segment, including base
and limit registers.
o Logical linear addresses are generated by looking the selector up in the descriptor
table and adding the appropriate base address to the offset, as shown in Figure
3.22:
• Pentium paging normally uses a two-tier paging scheme, with the first 10 bits being a
page number for an outer page table ( a.k.a. page directory ), and the next 10 bits being a
page number within one of the 1024 inner page tables, leaving the remaining 12 bits as an
offset into a 4K page.
• A special bit in the page directory can indicate that this page is a 4MB page, in which
case the remaining 22 bits are all used as offset and the inner tier of page tables is not
used.
• The CR3 register points to the page directory for the current process, as shown in Figure
3.23 below.
• If the inner page table is currently swapped out to disk, then the page directory will have
an "invalid bit" set, and the remaining 31 bits provide information on where to find the
swapped out page table on the disk.
Figure 3.23 - Paging in the IA-32 architecture.
Virtual Memory Background
• Preceding sections talked about how to avoid memory fragmentation by breaking process
memory requirements down into smaller bites (pages), and storing the pages non-
contiguously in memory. However the entire process still had to be stored in memory
somewhere.
• In practice, most real processes do not need all their pages, or at least not all at once, for
several reasons:
1. Error handling code is not needed unless that specific error occurs, some of which
are quite rare.
2. Arrays are often over-sized for worst-case scenarios, and only a small fraction of
the arrays are actually used in practice.
3. Certain features of certain programs are rarely used, such as the routine to balance
the federal budget. :-)
• The ability to load only the portions of processes that were actually needed ( and only
when they were needed ) has several benefits:
o Programs could be written for a much larger address space ( virtual memory space
) than physically exists on the computer.
o Because each process is only using a fraction of their total address space, there is
more memory left for other programs, improving CPU utilization and system
throughput.
o Less I/O is needed for swapping processes in and out of RAM, speeding things
up.
Figure below shows the general layout of virtual memory, which can be much larger than
physical memory:
Figure 3.25 - Diagram showing virtual memory that is larger than physical memory
• Figure 3.25 shows virtual address space, which is the programmers logical view of
process memory storage. The actual physical layout is controlled by the process's page
table.
• Note that the address space shown in Figure 9.2 is sparse - A great hole in the middle of
the address space is never used, unless the stack and/or the heap grow to fill the hole.
• Virtual memory also allows the sharing of files and memory by multiple processes, with
several benefits:
o System libraries can be shared by mapping them into the virtual address space of
more than one process.
o Processes can also share virtual memory by mapping the same block of memory
to more than one process.
o Process pages can be shared during a fork( ) system call, eliminating the need to
copy all of the pages of the original ( parent ) process.
3.2 Demand Paging
• The basic idea behind demand paging is that when a process is swapped in, its pages are
not swapped in all at once. Rather they are swapped in only when the process needs them.
( on demand. ) This is termed a lazy swapper, although a pager is a more accurate term.
• The basic idea behind demand paging is that when a process is swapped in, the pager only loads
into memory those pages that it expects the process to need ( right away. )
• Pages that are not loaded into memory are marked as invalid in the page table, using the
invalid bit. ( The rest of the page table entry may either be blank or contain information
about where to find the swapped-out page on the hard drive. )
• If the process only ever accesses pages that are loaded in memory ( memory resident
pages ), then the process runs exactly as if all the pages were loaded in to memory.
Figure 3.29 - Page table when some pages are not in main memory.
• On the other hand, if a page is needed that was not originally loaded up, then a page fault
trap is generated, which must be handled in a series of steps:
1. The memory address requested is first checked, to make sure it was a valid
memory request.
2. If the reference was invalid, the process is terminated. Otherwise, the page must
be paged in.
3. A free frame is located, possibly from a free-frame list.
4. A disk operation is scheduled to bring in the necessary page from disk. ( This will
usually block the process on an I/O wait, allowing some other process to use the
CPU in the meantime. )
5. When the I/O operation is complete, the process's page table is updated with the
new frame number, and the invalid bit is changed to indicate that this is now a
valid page reference.
6. The instruction that caused the page fault must now be restarted from the
beginning, ( as soon as this process gets another turn on the CPU. )
• In an extreme case, NO pages are swapped in for a process until they are requested by
page faults. This is known as pure demand paging.
• In theory each instruction could generate multiple page faults. In practice this is very rare,
due to locality of reference, covered in section 9.6.1.
• The hardware necessary to support virtual memory is the same as for paging and
swapping: A page table and secondary memory. ( Swap space, whose allocation is
discussed in chapter 12. )
• A crucial part of the process is that the instruction must be restarted from scratch once the
desired page has been made available in memory. For most simple instructions this is not
a major difficulty. However there are some architectures that allow a single instruction to
modify a fairly large block of data, ( which may span a page boundary ), and if some of
the data gets modified before the page fault occurs, this could cause problems. One
solution is to access both ends of the block before executing the instruction, guaranteeing
that the necessary pages get paged in before the instruction begins.
• Obviously there is some slowdown and performance hit whenever a page fault occurs and
the system has to go get the page from disk, but just how big a hit is it exactly?
• There are many steps that occur when servicing a page fault ( see book for full details ),
and some of the steps are optional or variable. But just for the sake of discussion, suppose
that a normal memory access requires 200 nanoseconds, and that servicing a page fault
takes 8 milliseconds. ( 8,000,000 nanoseconds, or 40,000 times a normal memory
access. ) With a page fault rate of p, ( on a scale from 0 to 1 ), the effective access time is
now:
( 1 - p ) * ( 200 ) + p * 8000000
= 200 + 7,999,800 * p
which clearly depends heavily on p! Even if only one access in 1000 causes a page fault, the
effective access time jumps from 200 nanoseconds to 8.2 microseconds, a slowdown of a factor
of 40 times. In order to keep the slowdown less than 10%, the page fault rate must be less than
0.0000025, or one in 399,990 accesses.
• A subtlety is that swap space is faster to access than the regular file system, because it
does not have to go through the whole directory structure. For this reason some systems
will transfer an entire process from the file system to swap space before starting up the
process, so that future paging all occurs from the ( relatively ) faster swap space.
• Some systems use demand paging directly from the file system for binary code ( which
never changes, and hence never needs to be written back out when its pages are replaced ),
and reserve the swap space for data segments that must be stored. This approach is used by
both Solaris and BSD Unix.
• In order to make the most use of virtual memory, we load several processes into memory
at the same time. Since we only load the pages that are actually needed by each process at
any given time, there is room to load many more processes than if we had to load in the
entire process.
• However memory is also needed for other purposes ( such as I/O buffering ), and what
happens if some process suddenly decides it needs more pages and there aren't any free
frames available? There are several possible solutions to consider:
1. Adjust the memory used by I/O buffering, etc., to free up some frames for user
processes. The decision of how to allocate memory for I/O versus user processes
is a complex one, yielding different policies on different systems. ( Some allocate
a fixed amount for I/O, and others let the I/O system contend for memory along
with everything else. )
2. Put the process requesting more pages into a wait queue until some free frames
become available.
3. Swap some process out of memory completely, freeing up its page frames.
4. Find some page in memory that isn't being used right now, and swap that page
only out to disk, freeing up a frame that can be allocated to the process requesting
it. This is known as page replacement, and is the most common solution. There
are many different algorithms for page replacement, which is the subject of the
remainder of this section.
• The previously discussed page-fault processing assumed that there would be free frames
available on the free-frame list. Now the page-fault handling must be modified to free up
a frame if necessary, as follows:
1. Find the location of the desired page on the disk, either in swap space or in the file
system.
2. Find a free frame:
a. If there is a free frame, use it.
b. If there is no free frame, use a page-replacement algorithm to select an
existing frame to be replaced, known as the victim frame.
c. Write the victim frame to disk. Change all related page tables to indicate
that this page is no longer in memory.
3. Read in the desired page and store it in the frame. Adjust all related page and
frame tables to indicate the change.
4. Restart the process that was waiting for this page.
• Note that step 2c adds an extra disk write to the page-fault handling, effectively doubling
the time required to process a page fault. This can be alleviated somewhat by assigning a
modify bit, or dirty bit to each page, indicating whether or not it has been changed since it
was last loaded in from disk. If the dirty bit has not been set, then the page is unchanged,
and does not need to be written out to disk. Otherwise the page write is required. It
should come as no surprise that many page replacement strategies specifically look for
pages that do not have their dirty bit set, and preferentially select clean pages as victim
pages. It should also be obvious that unmodifiable code pages never get their dirty bits
set.
• There are two major requirements to implement a successful demand paging system. We
must develop a frame-allocation algorithm and a page-replacement algorithm. The
former centers around how many frames are allocated to each process ( and to other
needs ), and the latter deals with how to select a page for replacement when there are no
free frames available.
• The overall goal in selecting and tuning these algorithms is to generate the fewest number
of overall page faults. Because disk access is so slow relative to memory access, even
slight improvements to these algorithms can yield large improvements in overall system
performance.
• Algorithms are evaluated using a given string of memory accesses known as a reference
string, which can be generated in one of ( at least ) three common ways:
1. Randomly generated, either evenly distributed or with some distribution curve
based on observed system behavior. This is the fastest and easiest approach, but
may not reflect real performance well, as it ignores locality of reference.
2. Specifically designed sequences. These are useful for illustrating the properties of
comparative algorithms in published papers and textbooks, ( and also for
homework and exam problems. :-) )
3. Recorded memory references from a live system. This may be the best approach,
but the amount of data collected can be enormous, on the order of a million
addresses per second. The volume of collected data can be reduced by making
two important observations:
1. Only the page number that was accessed is relevant. The offset within that
page does not affect paging operations.
2. Successive accesses within the same page can be treated as a single page
request, because all requests after the first are guaranteed to be page hits.
( Since there are no intervening requests for other pages that could remove
this page from the page table. )
▪ So for example, if pages were of size 100 bytes, then the sequence of
address requests ( 0100, 0432, 0101, 0612, 0634, 0688, 0132, 0038, 0420 )
would reduce to page requests ( 1, 4, 1, 6, 1, 0, 4 )
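A small sketch of this reduction, using the 100-byte pages and the address sequence from the example above:

    #include <stdio.h>

    #define PAGE_SIZE 100                       /* the 100-byte pages from the example */

    int main(void) {
        int addrs[] = { 100, 432, 101, 612, 634, 688, 132, 38, 420 };
        int n = sizeof(addrs) / sizeof(addrs[0]);
        int last = -1;
        for (int i = 0; i < n; i++) {
            int page = addrs[i] / PAGE_SIZE;    /* only the page number matters */
            if (page != last) {                 /* collapse successive accesses to the same page */
                printf("%d ", page);
                last = page;
            }
        }
        printf("\n");                           /* prints: 1 4 1 6 1 0 4 */
        return 0;
    }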
As the number of available frames increases, the number of page faults should decrease,
as shown in Figure 3.33:
Figure 3.34 - FIFO page-replacement algorithm.
• Although FIFO is simple and easy, it is not always optimal, or even efficient.
• An interesting effect that can occur with FIFO is Belady's anomaly, in which increasing
the number of frames available can actually increase the number of page faults that
occur! Consider, for example, the following chart based on the page sequence ( 1, 2, 3, 4,
1, 2, 5, 1, 2, 3, 4, 5 ) and a varying number of available frames. Obviously the maximum
number of faults is 12 ( every request generates a fault ), and the minimum number is 5 (
each page loaded only once ), but in between there are some interesting results:
• The discovery of Belady's anomaly led to the search for an optimal page-replacement
algorithm, which is simply the one that yields the lowest possible number of page faults, and
which does not suffer from Belady's anomaly.
• Such an algorithm does exist, and is called OPT or MIN. This algorithm is simply
"Replace the page that will not be used for the longest time in the future."
• For example, Figure 9.14 shows that by applying OPT to the same reference string used
for the FIFO example, the minimum number of possible page faults is 9. Since 6 of the
page-faults are unavoidable ( the first reference to each new page ), FIFO can be shown
to require 3 times as many ( extra ) page faults as the optimal algorithm. ( Note: The book
claims that only the first three page faults are required by all algorithms, indicating that
FIFO is only twice as bad as OPT. )
• Unfortunately OPT cannot be implemented in practice, because it requires foretelling the
future, but it makes a nice benchmark for the comparison and evaluation of real proposed
new algorithms.
• In practice most page-replacement algorithms try to approximate OPT by predicting
( estimating ) in one fashion or another what page will not be used for the longest period
of time. The basis of FIFO is the prediction that the page that was brought in the longest
time ago is the one that will not be needed again for the longest future time, but as we
shall see, there are many other prediction methods, all striving to match the performance
of OPT.
• The prediction behind LRU, the Least Recently Used, algorithm is that the page that has
not been used in the longest time is the one that will not be used again in the near future.
( Note the distinction between FIFO and LRU: The former looks at the oldest load time,
and the latter looks at the oldest use time. )
• Some view LRU as analogous to OPT, except looking backwards in time instead of
forwards. ( OPT has the interesting property that for any reference string S and its reverse
R, OPT will generate the same number of page faults for S and for R. It turns out that
LRU has this same property. )
• Figure 9.15 illustrates LRU for our sample string, yielding 12 page faults, ( as compared
to 15 for FIFO and 9 for OPT. )
Figure 3.37 - LRU page-replacement algorithm.
• LRU is considered a good replacement policy, and is often used. The problem is how
exactly to implement it. There are two simple approaches commonly used:
1. Counters. Every memory access increments a counter, and the current value of
this counter is stored in the page table entry for that page. Then finding the LRU
page involves simply searching the table for the page with the smallest counter
value. Note that overflow of the counter must be considered.
2. Stack. Another approach is to use a stack, and whenever a page is accessed, pull
that page from the middle of the stack and place it on the top. The LRU page will
always be at the bottom of the stack. Because this requires removing objects from
the middle of the stack, a doubly linked list is the recommended data structure.
• Note that both implementations of LRU require hardware support, either for incrementing
the counter or for managing the stack, as these operations must be performed for every
memory access.
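A sketch of the stack idea ( implemented with a small array for brevity; the reference string is the one assumed from the textbook's LRU example, for which the code reports the 12 faults quoted above ):

    #include <stdio.h>

    /* Count page faults under LRU, keeping resident pages ordered from most
       recently used ( stack[0] ) to least recently used ( the bottom ). */
    int lru_faults(const int *refs, int n, int nframes) {
        int stack[16];                        /* assume nframes <= 16 for this sketch */
        int used = 0, faults = 0;
        for (int i = 0; i < n; i++) {
            int pos = -1;
            for (int j = 0; j < used; j++)
                if (stack[j] == refs[i]) { pos = j; break; }
            if (pos < 0) {                    /* page fault */
                faults++;
                if (used < nframes)
                    used++;                   /* a free frame is available */
                pos = used - 1;               /* otherwise the bottom ( LRU ) page is the victim */
            }
            for (int j = pos; j > 0; j--)     /* pull the referenced page to the top of the stack */
                stack[j] = stack[j - 1];
            stack[0] = refs[i];
        }
        return faults;
    }

    int main(void) {
        /* reference string assumed from the textbook's example */
        int refs[] = { 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 };
        printf("%d faults\n", lru_faults(refs, sizeof(refs) / sizeof(refs[0]), 3));   /* prints: 12 faults */
        return 0;
    }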
• Neither LRU nor OPT exhibits Belady's anomaly. Both belong to a class of page-
replacement algorithms called stack algorithms, which can never exhibit Belady's
anomaly. A stack algorithm is one in which the pages kept in memory for a frame set of
size N will always be a subset of the pages kept for a frame set of size N + 1. In the case of
LRU, ( and particularly the stack implementation thereof ), the top N pages of the stack
will be the same for all frame set sizes of N or anything larger.
Figure 3.38 - Use of a stack to record the most recent page references.
3.5 LRU-Approximation Page Replacement
• Unfortunately full implementation of LRU requires hardware support, and few systems
provide the full hardware support necessary.
• However many systems offer some degree of HW support, enough to approximate LRU
fairly well. ( In the absence of ANY hardware support, FIFO might be the best available
choice. )
• In particular, many systems provide a reference bit for every entry in a page table, which
is set anytime that page is accessed. Initially all bits are set to zero, and they can also all
be cleared at any time. One bit of precision is enough to distinguish pages that have been
accessed since the last clear from those that have not, but does not provide any finer grain
of detail.
• Finer grain is possible by storing the most recent 8 reference bits for each page in an 8-bit
byte in the page table entry, which is interpreted as an unsigned int.
o At periodic intervals ( clock interrupts ), the OS takes over, and right-shifts each
of the reference bytes by one bit.
o The high-order ( leftmost ) bit is then filled in with the current value of the
reference bit, and the reference bits are cleared.
o At any given time, the page with the smallest value for the reference byte is the
LRU page.
• Obviously the specific number of bits used and the frequency with which the reference
byte is updated are adjustable, and are tuned to give the fastest performance on a given
hardware platform.
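A minimal sketch of this reference-byte ( aging ) scheme, assuming a global array of reference bits maintained by the hardware:

    #include <stdint.h>

    #define NPAGES 1024

    uint8_t ref_byte[NPAGES];     /* 8 bits of reference history per page ( most recent in the high bit ) */
    uint8_t ref_bit[NPAGES];      /* reference bit, assumed set by the hardware on each access ( 0 or 1 ) */

    /* Called at each clock interrupt: shift each history right one bit and
       record whether the page was referenced during the last interval. */
    void age_pages(void) {
        for (int i = 0; i < NPAGES; i++) {
            ref_byte[i] = (uint8_t)((ref_byte[i] >> 1) | (ref_bit[i] << 7));
            ref_bit[i] = 0;                            /* clear for the next interval */
        }
    }

    /* The page with the smallest reference byte is the approximate LRU page. */
    int approx_lru_page(void) {
        int victim = 0;
        for (int i = 1; i < NPAGES; i++)
            if (ref_byte[i] < ref_byte[victim])
                victim = i;
        return victim;
    }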
• The second chance algorithm is essentially a FIFO, except the reference bit is used to
give pages a second chance at staying in the page table.
o When a page must be replaced, the page table is scanned in a FIFO ( circular
queue ) manner.
o If a page is found with its reference bit not set, then that page is selected as the
next victim.
o If, however, the next page in the FIFO does have its reference bit set, then it is
given a second chance:
▪ The reference bit is cleared, and the FIFO search continues.
▪ If some other page is found that did not have its reference bit set, then that
page will be selected as the victim, and this page ( the one being given the
second chance ) will be allowed to stay in the page table.
▪ If , however, there are no other pages that do not have their reference bit
set, then this page will be selected as the victim when the FIFO search
circles back around to this page on the second pass.
• If all reference bits in the table are set, then second chance degrades to FIFO, but also
requires a complete search of the table for every page-replacement.
• As long as there are some pages whose reference bits are not set, then any page
referenced frequently enough gets to stay in the page table indefinitely.
• This algorithm is also known as the clock algorithm, from the hands of the clock moving
around the circular queue.
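A sketch of the second-chance ( clock ) victim selection described above, with the reference bits assumed to be mirrored in an in-memory array:

    #include <stdbool.h>

    #define NFRAMES 64

    bool reference[NFRAMES];        /* reference bit for each frame, set when its page is accessed */
    static int hand = 0;            /* the "clock hand": current position in the circular scan */

    /* Sweep the circular list of frames, clearing reference bits as we go,
       until a frame whose bit is already clear is found; that frame is the victim. */
    int select_victim(void) {
        for (;;) {
            if (!reference[hand]) {              /* bit clear: no second chance left */
                int victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            reference[hand] = false;             /* give this page a second chance */
            hand = (hand + 1) % NFRAMES;
        }
    }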
• The enhanced second chance algorithm looks at the reference bit and the modify bit
( dirty bit ) as an ordered pair, and classifies pages into one of four classes:
1. ( 0, 0 ) - Neither recently used nor modified.
2. ( 0, 1 ) - Not recently used, but modified.
3. ( 1, 0 ) - Recently used, but clean.
4. ( 1, 1 ) - Recently used and modified.
• This algorithm searches the page table in a circular fashion ( in as many as four passes ),
looking for the first page it can find in the lowest numbered category. I.e. it first makes a
pass looking for a ( 0, 0 ), and then if it can't find one, it makes another pass looking for a
( 0, 1 ), etc.
• The main difference between this algorithm and the previous one is the preference for
replacing clean pages if possible.
• There are several algorithms based on counting the number of references that have been
made to a given page, such as:
o Least Frequently Used, LFU: Replace the page with the lowest reference count.
A problem can occur if a page is used frequently initially and then not used any
more, as the reference count remains high. A solution to this problem is to right-
shift the counters periodically, yielding a time-decaying average reference count.
o Most Frequently Used, MFU: Replace the page with the highest reference count.
The logic behind this idea is that pages that have already been referenced a lot
have been in the system a long time, and we are probably done with them,
whereas pages referenced only a few times have only recently been loaded, and
we still need them.
• In general counting-based algorithms are not commonly used, as their implementation is
expensive and they do not approximate OPT well.
There are a number of page-buffering algorithms that can be used in conjunction with the afore-
mentioned algorithms, to improve overall performance and sometimes make up for inherent
weaknesses in the hardware and/or the underlying page-replacement algorithms:
• Maintain a certain minimum number of free frames at all times. When a page-fault
occurs, go ahead and allocate one of the free frames from the free list first, to get the
requesting process up and running again as quickly as possible, and then select a victim
page to write to disk and free up a frame as a second step.
• Keep a list of modified pages, and when the I/O system is otherwise idle, have it write
these pages out to disk, and then clear the modify bits, thereby increasing the chance of
finding a "clean" page for the next potential victim.
• Keep a pool of free frames, but remember what page was in it before it was made free.
Since the data in the page is not actually cleared out when the page is freed, it can be
made an active page again without having to load in any new data from disk. This is
useful when an algorithm mistakenly replaces a page that in fact is needed again soon.
• Some applications ( most notably database programs ) understand their data accessing
and caching needs better than the general-purpose OS, and should therefore be given free
rein to do their own memory management.
• Sometimes such programs are given a raw disk partition to work with, containing raw
data blocks and no file system structure. It is then up to the application to use this disk
partition as extended memory or for whatever other reasons it sees fit.
We said earlier that there were two important tasks in virtual memory management: a page-
replacement strategy and a frame-allocation strategy. This section covers the second part of that
pair.
• The absolute minimum number of frames that a process must be allocated is dependent
on system architecture, and corresponds to the worst-case scenario of the number of
pages that could be touched by a single ( machine ) instruction.
• If an instruction ( and its operands ) spans a page boundary, then multiple pages could be
needed just for the instruction fetch.
• Memory references in an instruction touch more pages, and if those memory locations
can span page boundaries, then multiple pages could be needed for operand access also.
• The worst case involves indirect addressing, particularly where multiple levels of indirect
addressing are allowed. Left unchecked, a pointer to a pointer to a pointer to a pointer to
a . . . could theoretically touch every page in the virtual address space in a single machine
instruction, requiring every virtual page be loaded into physical memory simultaneously.
For this reason architectures place a limit ( say 16 ) on the number of levels of indirection
allowed in an instruction, which is enforced with a counter initialized to the limit and
decremented with every level of indirection in an instruction - If the counter reaches zero,
then an "excessive indirection" trap occurs. This example would still require a minimum
frame allocation of 17 per process.
• Equal Allocation - If there are m frames available and n processes to share them, each
process gets m / n frames, and the leftovers are kept in a free-frame buffer pool.
• Proportional Allocation - Allocate the frames proportionally to the size of the process,
relative to the total size of all processes. So if the size of process i is S_i, and S is the sum
of all S_i, then the allocation for process P_i is a_i = m * S_i / S. ( A small numeric sketch
of this calculation appears after this list. )
• Variations on proportional allocation could consider priority of process rather than just
their size.
• Obviously all allocations fluctuate over time as the number of available free frames, m,
fluctuates, and all are also subject to the constraints of minimum allocation. ( If the
minimum allocations cannot be met, then processes must either be swapped out or not
allowed to start until more free frames become available. )
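As a rough illustration of the proportional-allocation formula a_i = m * S_i / S, the following sketch ( the process sizes and the frame count are invented values, not part of the original notes ) computes each process's share in C:

    /* Sketch: proportional frame allocation, a_i = m * S_i / S.
       Process sizes and the number of free frames are made-up illustrative values. */
    #include <stdio.h>

    int main(void) {
        int sizes[] = { 10, 127, 40 };   /* hypothetical process sizes S_i, in pages */
        int n = 3;                       /* number of processes */
        int m = 62;                      /* hypothetical number of available frames */
        int total = 0;
        for (int i = 0; i < n; i++)
            total += sizes[i];           /* S = sum of all S_i */
        for (int i = 0; i < n; i++) {
            int a = (int)((long long)m * sizes[i] / total);   /* a_i, truncated */
            printf("P%d is allocated %d of %d frames\n", i, a, m);
        }
        return 0;
    }

The truncation means a few frames are usually left over; as noted above they can simply be kept in the free-frame buffer pool.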
• One big question is whether frame allocation ( page replacement ) occurs on a local or
global level.
• With local replacement, the number of pages allocated to a process is fixed, and page
replacement occurs only amongst the pages allocated to this process.
• With global replacement, any page may be a potential victim, whether it currently
belongs to the process seeking a free frame or not.
• Local page replacement allows processes to better control their own page fault rates, and
leads to more consistent performance of a given process over different system load levels.
• Global page replacement is overall more efficient, and is the more commonly used
approach.
• The above arguments all assume that all memory is equivalent, or at least has equivalent
access times.
• This may not be the case in multiple-processor systems, especially where each CPU is
physically located on a separate circuit board which also holds some portion of the
overall system memory.
• In these latter systems, CPUs can access memory that is physically located on the same
board much faster than the memory on the other boards.
• The basic solution is akin to processor affinity - At the same time that we try to schedule
processes on the same CPU to minimize cache misses, we also try to allocate memory for
those processes on the same boards, to minimize access times.
• The presence of threads complicates the picture, especially when the threads get loaded
onto different processors.
• Solaris uses an lgroup as a solution, in a hierarchical fashion based on relative latency.
For example, all processors and RAM on a single board would probably be in the same
lgroup. Memory assignments are made within the same lgroup if possible, or to the next
nearest lgroup otherwise. ( Where "nearest" is defined as having the lowest access time. )
3.5 Thrashing
• If a process cannot maintain its minimum required number of frames, then it must be
swapped out, freeing up frames for other processes. This is an intermediate level of CPU
scheduling.
• But what about a process that can keep its minimum, but cannot keep all of the frames
that it is currently using on a regular basis? In this case it is forced to page out pages that
it will need again in the very near future, leading to large numbers of page faults.
• A process that is spending more time paging than executing is said to be thrashing.
• Early process scheduling schemes would control the level of multiprogramming allowed
based on CPU utilization, adding in more processes when CPU utilization was low.
• The problem is that when memory filled up and processes started spending lots of time
waiting for their pages to page in, then CPU utilization would drop, causing the scheduler
to add in even more processes and exacerbating the problem! Eventually the system
would essentially grind to a halt.
• Local page replacement policies can prevent one thrashing process from taking pages
away from other processes, but it still tends to clog up the I/O queue, thereby slowing
down any other process that needs to do even a little bit of paging ( or any other I/O for
that matter. )
• The working set model is based on the concept of locality, and defines a working-set
window of length delta. Whatever pages are included in the most recent delta page
references are said to be in the process's working-set window, and comprise its current
working set.
• The selection of delta is critical to the success of the working set model - If it is too small
then it does not encompass all of the pages of the current locality, and if it is too large,
then it encompasses pages that are no longer being frequently accessed.
• The total demand, D, is the sum of the sizes of the working sets for all processes. If D
exceeds the total number of available frames, then at least one process is thrashing,
because there are not enough frames available to satisfy its minimum working set. If D is
significantly less than the currently available frames, then additional processes can be
launched.
• The hard part of the working-set model is keeping track of what pages are in the current
working set, since every reference adds one to the set and removes one older page. An
approximation can be made using reference bits and a timer that goes off after a set
interval of memory references ( a small code sketch of this approximation appears at the
end of this section ):
o For example, suppose that we set the timer to go off after every 5000 references (
by any process ), and we can store two additional historical reference bits in
addition to the current reference bit.
o Every time the timer goes off, the current reference bit is copied to one of the two
historical bits, and then cleared.
o If any of the three bits is set, then that page was referenced within the last 15,000
references, and is considered to be in that process's working set.
o Finer resolution can be achieved with more historical bits and a more frequent
timer, at the expense of greater overhead.
• Note that there is a direct relationship between the page-fault rate and the working-set, as
a process moves from one locality to another.
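The following is a minimal sketch of the reference-bit approximation described above ( the page count, the 3-bit history, and the shift-register variant of the bit copying are all assumptions made for illustration ):

    /* Sketch: approximating the working set with shifted reference bits.
       On each "timer interrupt" the current reference bit of every page is shifted
       into a short history; a page is treated as part of the working set while any
       history bit is still set.  Sizes and names are illustrative only. */
    #include <stdio.h>

    #define NUM_PAGES 8

    static unsigned char ref_bit[NUM_PAGES];   /* set by the hardware on each reference */
    static unsigned char history[NUM_PAGES];   /* low 3 bits hold the recent history */

    void reference(int page) { ref_bit[page] = 1; }

    void timer_tick(void) {                    /* e.g. once every 5000 references */
        for (int p = 0; p < NUM_PAGES; p++) {
            history[p] = (unsigned char)(((history[p] << 1) | ref_bit[p]) & 0x7);
            ref_bit[p] = 0;                    /* clear the current reference bit */
        }
    }

    int in_working_set(int page) { return history[page] != 0; }

    int main(void) {
        reference(2); reference(5);
        timer_tick();                          /* pages 2 and 5 referenced last interval */
        reference(5);
        timer_tick();
        for (int p = 0; p < NUM_PAGES; p++)
            printf("page %d: %s\n", p, in_working_set(p) ? "in working set" : "out");
        return 0;
    }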
UNIT-IV
FILE-SYSTEM INTERFACE
4.1.3 File Types
• Windows ( and some other systems ) use special file extensions to indicate
the type of each file:
• Macintosh stores a creator attribute for each file, according to the program that
first created it with the create( ) system call.
• UNIX stores magic numbers at the beginning of certain files. ( Experiment with
the "file" command, especially in directories such as /bin and /dev )
• Some files contain an internal structure, which may or may not be known to the
OS.
• For the OS to support particular file formats increases the size and complexity of
the OS.
• UNIX treats all files as sequences of bytes, with no further consideration of the
internal structure. ( With the exception of executable binary programs, which the OS
must know how to load and where to find the first executable instruction. )
• Macintosh files have two forks - a resource fork, and a data fork. The resource
fork contains information relating to the UI, such as icons and button images, and
can be modified independently of the data fork, which contains the code or data as
appropriate.
• Disk files are accessed in units of physical blocks, typically 512 bytes or some
power-of-two multiple thereof. ( Larger physical disks use larger block sizes, to
keep the range of block numbers within the range of a 32-bit integer. )
• Internally, files are organized in terms of logical units ( records ), which may be as small as a
single byte, or may be a larger size corresponding to some data record or structure
size.
• The number of logical units which fit into one physical block determines its
packing, and has an impact on the amount of internal fragmentation ( wasted
space ) that occurs.
• As a general rule, half a physical block is wasted for each file, and the larger the
block sizes the more space is lost to internal fragmentation.
• A sequential access file emulates magnetic tape operation, and generally supports
a few operations:
o read next - read a record and advance the tape to the next position.
o write next - write a record and advance the tape to the next position.
o rewind
o skip n records - May or may not be supported. N may be limited to
positive numbers, or may be limited to +/- 1.
• A direct-access file allows jumping to any record and reading it directly. Operations supported include:
o read n - read record number n. ( Note an argument is now required. )
o write n - write record number n. ( Note an argument is now required. )
o jump to record n - could be 0 or the end of file.
o Query current record - used to return back to this record later.
o Sequential access can be easily emulated using direct access. The inverse
is complicated and inefficient.
Figure 4.3- Simulation of sequential access on a direct-access file.
• An indexed access scheme can be easily built on top of a direct access system.
Very large files may require a multi-tiered indexing scheme, i.e. indexes of
indexes.
Figure 4.4 - A typical file-system organization.
o If access to other directories is allowed, then provision must be made to
specify the directory being accessed.
o If access is denied, then special consideration must be made for users to
run programs located in system directories. A search path is the list of
directories in which to search for executable programs, and can be set
uniquely for each user.
• An obvious extension to the two-tiered directory structure, and the one with
which we are all most familiar.
• Each user / process has the concept of a current directory from which all
( relative ) searches take place.
• Files may be accessed using either absolute pathnames ( relative to the root of the
tree ) or relative pathnames ( relative to the current directory. )
• Directories are stored the same as any other file in the system, except there is a bit
that identifies them as directories, and they have some special structure that the
OS understands.
• One question for consideration is whether or not to allow the removal of
directories that are not empty - Windows requires that directories be emptied first,
and UNIX provides an option for deleting entire sub-trees.
Figure 4.7 - Tree-structured directory structure.
• When the same files need to be accessed in more than one place in the directory
structure ( e.g. because they are being shared by more than one user / process ), it
can be useful to provide an acyclic-graph structure. ( Note the directed arcs from
parent to child. )
• UNIX provides two types of links for implementing the acyclic-graph structure. (
See "man ln" for more details. )
o A hard link ( usually just called a link ) involves multiple directory entries
that all refer to the same file. Hard links are only valid for ordinary files
in the same filesystem.
o A symbolic link involves a special file containing information about
where to find the linked file. Symbolic links may be used to link
directories and/or files in other filesystems, as well as ordinary files in the
current filesystem. ( A short code sketch of both kinds of link appears
after this list. )
• Windows only supports symbolic links, termed shortcuts.
• Hard links require a reference count, or link count for each file, keeping track of
how many directory entries are currently referring to this file. Whenever one of
the references is removed the link count is reduced, and when it reaches zero, the
disk space can be reclaimed.
• For symbolic links there is some question as to what to do with the symbolic links
when the original file is moved or deleted:
o One option is to find all the symbolic links and adjust them also.
o Another is to leave the symbolic links dangling, and discover that they are
no longer valid the next time they are used.
o What if the original file is removed, and replaced with another file having
the same name before the symbolic link is next used?
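To make the distinction concrete, here is a minimal sketch using the POSIX link( ) and symlink( ) calls ( the file names are invented, and error handling is kept to a perror( ) call ); ln and ln -s do the same thing from the shell:

    /* Sketch: creating a hard link and a symbolic link to an existing file.
       "data.txt" is assumed to exist already. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        if (link("data.txt", "data_hard.txt") == -1)     /* new directory entry, same inode */
            perror("link");
        if (symlink("data.txt", "data_soft.txt") == -1)  /* special file holding the path */
            perror("symlink");
        /* Removing data.txt leaves data_hard.txt usable ( the link count drops to 1 ),
           while data_soft.txt becomes a dangling symbolic link. */
        return 0;
    }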
Figure 4.8 - Acyclic-graph directory structure.
• If cycles are allowed in the graphs, then several problems can arise:
o Search algorithms can go into infinite loops. One solution is to not follow
links in search algorithms. ( Or not to follow symbolic links, and to only
allow symbolic links to refer to directories. )
o Sub-trees can become disconnected from the rest of the tree and still not
have their reference counts reduced to zero. Periodic garbage collection is
required to detect and resolve this problem. ( chkdsk in DOS and fsck in
UNIX search for these problems, among others, even though cycles are
not supposed to be allowed in either system. Disconnected disk blocks that
are not marked as free are added back to the file systems with made-up file
names, and can usually be safely deleted. )
• The basic idea behind mounting file systems is to combine multiple file systems into one
large tree structure.
• The mount command is given a filesystem to mount and a mount point ( directory ) on
which to attach it.
• Once a file system is mounted onto a mount point, any further references to that directory
actually refer to the root of the mounted file system.
• Any files ( or sub-directories ) that had been stored in the mount point directory prior to
mounting the new filesystem are now hidden by the mounted filesystem, and are no
longer available. For this reason some systems only allow mounting onto empty
directories.
• Filesystems can only be mounted by root, unless root has previously configured certain
filesystems to be mountable onto certain pre-determined mount points. ( E.g. root may
allow users to mount floppy filesystems to /mnt or something like it. ) Anyone can run
the mount command to see what filesystems are currently mounted.
• Filesystems may be mounted read-only, or have other restrictions imposed.
Figure 4.10 - File system. (a) Existing system. (b) Unmounted volume.
• The traditional Windows OS runs an extended two-tier directory structure, where the first
tier of the structure separates volumes by drive letters, and a tree structure is implemented
below that level.
• Macintosh runs a similar system, where each new volume that is found is automatically
mounted and added to the desktop when it is found.
• More recent Windows systems allow filesystems to be mounted to any directory in the
filesystem, much like UNIX.
4.5 File Sharing
• The advent of the Internet introduces issues for accessing files stored on remote
computers:
o The original method was ftp, allowing individual files to be transported
across systems as needed. Ftp can be either account and password
controlled, or anonymous, not requiring any user name or password.
o Various forms of distributed file systems allow remote file systems to be
mounted onto a local directory structure, and accessed using normal file
access commands. ( The actual files are still transported across the
network as needed, possibly using ftp as the underlying transport
mechanism. )
o The WWW has made it easy once again to access files on remote systems
without mounting their filesystems, generally using ( anonymous ) ftp as
the underlying file transport mechanism.
o Servers restrict which filesystems may be remotely mounted.
Generally the information within those subsystems is limited,
relatively public, and protected by frequent backups.
• The NFS ( Network File System ) is a classic example of such a system.
4.6 Protection
• Files must be kept safe both for reliability ( against accidental damage ) and for protection
( against deliberate malicious access ). The former is usually managed with backup copies;
this section discusses the latter.
• One simple protection scheme is to remove all access to a file. However this makes the
file unusable, so some sort of controlled access must be arranged.
• In addition there are some special bits that can also be applied:
o The set user ID ( SUID ) bit and/or the set group ID ( SGID ) bits applied
to executable files temporarily change the identity of whoever runs the
program to match that of the owner / group of the executable program.
This allows users running specific programs to have access to files ( while
running that program ) which they would normally be unable to
access. Setting of these two bits is usually restricted to root, and must be
done with caution, as it introduces a potential security leak.
o The sticky bit on a directory modifies write permission, allowing users to
only delete files for which they are the owner. This allows everyone to
create files in /tmp, for example, but to only delete files which they have
created, and not anyone else's.
o The SUID, SGID, and sticky bits are indicated with an s, s, and t in the
positions for execute permission for the user, group, and others,
respectively. If the letter is lower case, ( s, s, t ), then the corresponding
execute permission is also granted. If it is upper case, ( S, S, T ), then the
corresponding execute permission is NOT granted.
o The numeric form of chmod is needed to set these advanced bits.
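As an illustration ( the paths are hypothetical, and in practice setting the SUID bit is normally restricted to root ), the sketch below sets the SUID bit on an executable and the sticky bit on a shared directory with the chmod( ) system call; the equivalent octal modes are 4755 and 1777:

    /* Sketch: setting the SUID bit on a program and the sticky bit on a directory. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(void) {
        /* 4755 = SUID + rwxr-xr-x */
        if (chmod("/usr/local/bin/myprog",
                  S_ISUID | S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH) == -1)
            perror("chmod myprog");
        /* 1777 = sticky + rwxrwxrwx, the usual mode of /tmp */
        if (chmod("/home/shared/scratch", S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO) == -1)
            perror("chmod scratch");
        return 0;
    }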
• Windows adjusts file access through a simple GUI.
4.8.1 Overview
Figure 4.15- In-memory file-system structures. (a) File open. (b) File read.
• Directories need to be fast to search, insert, and delete, with a minimum of wasted disk
space.
Linear List
• A linear list is the simplest and easiest directory structure to set up, but it does
have some drawbacks.
• Finding a file ( or verifying one does not already exist upon creation ) requires a
linear search.
• Deletions can be done by moving all entries, flagging an entry as deleted, or by
moving the last entry into the newly vacant position.
• Sorting the list makes searches faster, at the expense of more complex insertions
and deletions.
• A linked list makes insertions and deletions into a sorted list easier, with overhead
for the links.
• More complex data structures, such as B-trees, could also be considered.
Hash Table
• A hash table keyed on the file name can greatly decrease the directory search time, at the
cost of handling collisions and choosing a fixed table size.
4.10 ALLOCATION METHODS
• There are three major methods of storing files on disks: contiguous, linked, and indexed.
Figure 4.17 -Contiguous allocation of disk space.
The File Allocation Table, FAT, used by DOS is a variation of linked allocation, where all
the links are stored in a separate table at the beginning of the disk. The benefit of this
approach is that the FAT table can be cached in memory, greatly improving random access
speeds.
Figure 4.19- File-allocation table.
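A minimal sketch of the idea ( the table contents and sizes below are invented purely for illustration ): each FAT entry names the next block of the file, so reading a file is just a walk through an in-memory array, which is what makes cached random access cheap:

    /* Sketch: linked allocation through a FAT.  The directory entry stores only the
       starting block; each FAT entry gives the number of the next block. */
    #include <stdio.h>

    #define FAT_EOF (-1)            /* marks the last block of a file */

    int main(void) {
        int fat[16] = { 0 };
        /* An invented file occupying blocks 3 -> 7 -> 2 -> 10. */
        fat[3] = 7; fat[7] = 2; fat[2] = 10; fat[10] = FAT_EOF;

        for (int b = 3; b != FAT_EOF; b = fat[b])
            printf("read block %d\n", b);
        return 0;
    }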
• Some disk space is wasted ( relative to linked lists or FAT tables ) because an
entire index block must be allocated for each file, regardless of how many data
blocks the file contains. This leads to questions of how big the index block should
be, and how it should be implemented. There are several approaches:
o Linked Scheme - An index block is one disk block, which can be read
and written in a single disk operation. The first index block contains some
header information, the first N block addresses, and if necessary a pointer
to additional linked index blocks.
o Multi-Level Index - The first index block contains a set of pointers to
secondary index blocks, which in turn contain pointers to the actual data
blocks.
o Combined Scheme - This is the scheme used in UNIX inodes, in which
the first 12 or so data block pointers are stored directly in the inode, and
then singly, doubly, and triply indirect pointers provide access to more
data blocks as needed. (See below.) The advantage of this scheme is that
for small files (which many are), the data blocks are readily accessible (up
to 48K with 4K block sizes); files up to about 4144K (using 4K blocks)
are accessible with only a single indirect block (which can be cached), and
huge files are still accessible using a relatively small number of disk
accesses (larger in theory than can be addressed by a 32-bit address, which
is why some systems have moved to 64-bit file pointers.)
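The 48K and 4144K figures quoted above can be checked with a small calculation. The sketch below assumes 4 KB blocks, 4-byte block pointers, and 12 direct pointers, which are typical but not universal values:

    /* Sketch: reach of direct, single-, double-, and triple-indirect pointers. */
    #include <stdio.h>

    int main(void) {
        long long block = 4096;                /* block size in bytes */
        long long ptrs  = block / 4;           /* pointers per index block = 1024 */
        long long direct = 12 * block;                         /* 48 KB    */
        long long single = direct + ptrs * block;              /* 4144 KB  */
        long long dbl    = single + ptrs * ptrs * block;       /* about 4 GB */
        long long triple = dbl + ptrs * ptrs * ptrs * block;   /* about 4 TB */
        printf("direct only:        %lld KB\n", direct / 1024);
        printf("+ single indirect:  %lld KB\n", single / 1024);
        printf("+ double indirect:  %lld MB\n", dbl / (1024 * 1024));
        printf("+ triple indirect:  %lld GB\n", triple / (1024LL * 1024 * 1024));
        return 0;
    }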
Performance
• The optimal allocation method is different for sequential access files than for
random access files, and is also different for small files than for large files.
• Some systems support more than one allocation method, which may require
specifying how the file is to be used (sequential or random access) at the time it is
allocated. Such systems also provide conversion utilities.
• Some systems have been known to use contiguous allocation for small files, and
automatically switch to an indexed scheme when file sizes surpass a certain
threshold.
• And of course some systems adjust their allocation schemes (e.g. block sizes) to
best match the characteristics of the hardware for optimum performance.
4.11 FREE-SPACE MANAGEMENT
• Another important aspect of disk management is keeping track of and allocating
free space.
Bit Vector
• One simple approach is to use a bit vector, in which each bit represents a disk
block, set to 1 if free or 0 if allocated.
• Fast algorithms exist for quickly finding contiguous blocks of a given size.
• The down side is that a 40GB disk requires over 5MB just to store the bitmap.
(For example.)
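A small sketch of the bit-vector idea ( one bit per block; the map contents are invented ): a whole byte of zeros can be skipped at once, and the first set bit in a non-zero byte gives the first free block:

    /* Sketch: free-space bit map, one bit per block ( 1 = free, 0 = allocated ). */
    #include <stdio.h>

    #define NUM_BLOCKS 32

    static unsigned char map[NUM_BLOCKS / 8] = { 0x00, 0x00, 0x14, 0xF0 };

    int first_free_block(void) {
        for (int i = 0; i < NUM_BLOCKS / 8; i++) {
            if (map[i] == 0)
                continue;                          /* whole byte allocated: skip quickly */
            for (int bit = 0; bit < 8; bit++)
                if (map[i] & (1u << bit))
                    return i * 8 + bit;            /* number of the first free block */
        }
        return -1;                                 /* no free blocks */
    }

    int main(void) {
        printf("first free block: %d\n", first_free_block());
        return 0;
    }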
Linked List
• A linked list can also be used to keep track of all free blocks.
• Traversing the list and/or finding a contiguous block of a given size are not easy,
but fortunately are not frequently needed operations. Generally the system just
adds and removes single blocks from the beginning of the list.
• The FAT table keeps track of the free list as just one more linked list on the table.
Grouping
• A variation on linked list free lists is to use links of blocks of indices of free
blocks. If a block holds up to N addresses, then the first block in the linked-list
contains up to N-1 addresses of free blocks and a pointer to the next block of free
addresses.
Counting
• When there are multiple contiguous blocks of free space then the system can keep
track of the starting address of the group and the number of contiguous free
blocks. As long as the average length of a contiguous group of free blocks is
greater than two this offers a savings in space needed for the free list. (Similar to
compression techniques used for graphics images when a group of pixels all the same
color is encountered.)
Space Maps
• Sun's ZFS file system was designed for HUGE numbers and sizes of files,
directories, and even file systems.
• The resulting data structures could be VERY inefficient if not implemented
carefully. For example, freeing up a 1 GB file on a 1 TB file system could involve
updating thousands of blocks of free list bit maps if the file was spread across the
disk.
• ZFS uses a combination of techniques, starting with dividing the disk up into (
hundreds of ) metaslabs of a manageable size, each having their own space map.
• Free blocks are managed using the counting technique, but rather than write the
information to a table, it is recorded in a log-structured transaction record.
Adjacent free blocks are also coalesced into a larger single free block.
• An in-memory space map is constructed using a balanced tree data structure,
constructed from the log data.
• The combination of the in-memory tree and the on-disk log provides very fast
and efficient management of these very large files and free blocks.
Performance
• Disk controllers generally include on-board caching. When a seek is requested,
the heads are moved into place, and then an entire track is read, starting from
whatever sector is currently under the heads ( reducing latency ). The requested
sector is returned, and the unrequested portion of the track is cached in the disk's electronics.
• Some OSes cache disk blocks they expect to need again in a buffer cache.
• A page cache connected to the virtual memory system is actually more efficient
as memory addresses do not need to be converted to disk block addresses and
back again.
• Some systems ( Solaris, Linux, Windows 2000, NT, XP ) use page caching for
both process pages and file data in a unified virtual memory.
• Figures below show the advantages of the unified buffer cache found in some
versions of UNIX and Linux - Data does not need to be stored twice, and
problems of inconsistent buffer information are avoided.
Figure 4.23- I/O without a unified buffer cache.
MASS-STORAGE STRUCTURE
• In operation the disk rotates at high speed, such as 7200 rpm ( 120 revolutions per
second ). The time to transfer data from the disk to the computer is composed of
several components:
o The positioning time, a.k.a. the seek time or random access time is the
time required to move the heads from one cylinder to another, and for the
heads to settle down after the move. This is typically the slowest step in
the process and the predominant bottleneck to overall transfer rates.
o The rotational latency is the amount of time required for the desired
sector to rotate around and come under the read-write head. This can range
anywhere from zero to one full revolution, and on the average will equal
one-half revolution. This is another physical step and is usually the second
slowest step behind seek time. ( For a disk rotating at 7200 rpm, the
average rotational latency would be 1/2 revolution / 120 revolutions per
second, or just over 4 milliseconds, a long time by computer standards. )
o The transfer time is the time required to move the data
electronically from the disk to the computer. ( Some authors may also use
the term transfer rate to refer to the overall transfer rate, including seek
time and rotational latency as well as the electronic data transfer rate. )
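The "just over 4 milliseconds" figure follows directly from the rotation speed; a quick check ( assuming 7200 rpm ):

    /* Sketch: average rotational latency = half a revolution / revolutions per second. */
    #include <stdio.h>

    int main(void) {
        double rpm = 7200.0;
        double rev_per_sec = rpm / 60.0;                    /* 120 revolutions per second */
        double avg_latency_ms = 0.5 / rev_per_sec * 1000.0; /* about 4.17 ms */
        printf("average rotational latency: %.2f ms\n", avg_latency_ms);
        return 0;
    }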
• Disk heads "fly" over the surface on a very thin cushion of air. If they should
accidentally contact the disk, then a head crash occurs, which may or may not
permanently damage the disk or even destroy it completely. For this reason it is
normal to park the disk heads when turning a computer off, which means to move
the heads off the disk or to an area of the disk where there is no data stored.
• Floppy disks are normally removable. Hard drives can also be removable, and
some are even hot-swappable, meaning they can be removed while the computer
is running, and a new hard drive inserted in their place.
• Disk drives are connected to the computer via a cable known as the I/O Bus.
Some of the common interface formats include Enhanced Integrated Drive
Electronics, EIDE; Advanced Technology Attachment, ATA; Serial ATA, SATA;
Universal Serial Bus, USB; Fibre Channel, FC; and Small Computer Systems
Interface, SCSI.
• The host controller is at the computer end of the I/O bus, and the disk controller
is built into the disk itself. The CPU issues commands to the host controller via
I/O ports. Data is transferred between the magnetic surface and onboard cache by
the disk controller, and then the data is transferred from that cache to the host
controller and the motherboard memory at electronic speeds.
Magnetic Tapes - Magnetic tapes were once used for common secondary storage before the
days of hard disk drives, but today are used primarily for backups.
• Accessing a particular spot on a magnetic tape can be slow, but once reading or
writing commences, access speeds are comparable to disk drives.
• Capacities of tape drives can range from 20 to 200 GB, and compression can
double that capacity.
Storage-Area Network
• A storage-area network, SAN, connects servers and storage devices over a private high-speed
network, allowing storage to be allocated to hosts flexibly rather than being tied to a single computer.
4.16 DISK SCHEDULING
• As mentioned earlier, disk transfer speeds are limited primarily by seek times and
rotational latency. When multiple requests are to be processed there is also some
inherent delay in waiting for other requests to be processed.
• Bandwidth is measured by the amount of data transferred divided by the total amount of
time from the first request being made to the last transfer being completed, ( for a series
of disk requests. )
• Both bandwidth and access time can be improved by processing requests in a good order.
• Disk requests include the disk address, memory address, number of sectors to transfer,
and whether the request is for reading or writing.
FCFS Scheduling
• First-Come First-Serve is simple and intrinsically fair, but not very efficient.
Consider the request queue 98, 183, 37, 122, 14, 124, 65, 67 with the head initially at
cylinder 53, and note the wild swing from cylinder 122 to 14 and then back to 124.
SSTF Scheduling
• Shortest Seek Time First scheduling is more efficient, but may lead to starvation
if a constant stream of requests arrives for the same general area of the disk.
• SSTF reduces the total head movement to 236 cylinders, down from 640 required
for the same set of requests under FCFS. Note, however that the distance could be
reduced still further to 208 by starting with 37 and then 14 first before processing
the rest of the requests.
Figure 4.29- SSTF disk scheduling.
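The 640- and 236-cylinder totals quoted above can be reproduced with a short simulation. The sketch below uses the example request queue 98, 183, 37, 122, 14, 124, 65, 67 and an initial head position of 53 ( the same values used in the figures ):

    /* Sketch: total head movement for FCFS and SSTF on the example request queue,
       starting from cylinder 53.  Expected output: FCFS = 640, SSTF = 236. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 8

    int fcfs(int head, const int *q) {
        int total = 0;
        for (int i = 0; i < N; i++) { total += abs(q[i] - head); head = q[i]; }
        return total;
    }

    int sstf(int head, const int *q) {
        int done[N] = { 0 }, total = 0;
        for (int served = 0; served < N; served++) {
            int best = -1;
            for (int i = 0; i < N; i++)            /* pick the closest unserved request */
                if (!done[i] && (best == -1 || abs(q[i] - head) < abs(q[best] - head)))
                    best = i;
            total += abs(q[best] - head);
            head = q[best];
            done[best] = 1;
        }
        return total;
    }

    int main(void) {
        int queue[N] = { 98, 183, 37, 122, 14, 124, 65, 67 };
        printf("FCFS: %d cylinders\n", fcfs(53, queue));
        printf("SSTF: %d cylinders\n", sstf(53, queue));
        return 0;
    }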
SCAN Scheduling
• The SCAN algorithm, a.k.a. the elevator algorithm moves back and forth from
one end of the disk to the other, similarly to an elevator processing requests in a
tall building.
• Under the SCAN algorithm, if a request arrives just ahead of the moving head
then it will be processed right away, but if it arrives just after the head has passed,
then it will have to wait for the head to pass going the other way on the return trip.
This leads to a fairly wide variation in access times which can be improved upon.
• Consider, for example, when the head reaches the high end of the disk: Requests
with high cylinder numbers just missed the passing head, which means they are
all fairly recent requests, whereas requests with low numbers may have been
waiting for a much longer time. Making the return scan from high to low then
ends up accessing recent requests first and making older requests wait that much
longer.
C-SCAN Scheduling
• The Circular-SCAN algorithm services requests while the head moves in one direction
only; when it reaches the far end of the disk it returns immediately to the other end
without servicing requests on the way back, treating the cylinders as a circular list.
This gives more uniform waiting times than SCAN.
LOOK Scheduling
• LOOK and C-LOOK behave like SCAN and C-SCAN, except that the head only travels
as far as the final request in each direction before reversing ( or jumping back ), rather
than going all the way to the end of the disk.
Selection of a Disk-Scheduling Algorithm
• With very low loads all algorithms are equal, since there will normally only be
one request to process at a time.
• For slightly larger loads, SSTF offers better performance than FCFS, but may lead
to starvation when loads become heavy enough.
• For busier systems, SCAN and LOOK algorithms eliminate starvation problems.
• The actual optimal algorithm may be something even more complex than those
discussed here, but the incremental improvements are generally not worth the
additional overhead.
• Some improvement to overall filesystem access times can be made by intelligent
placement of directory and/or inode information. If those structures are placed in
the middle of the disk instead of at the beginning of the disk, then the maximum
distance from those structures to data blocks is reduced to only one-half of the
disk size. If those structures can be further distributed and furthermore have their
data blocks stored as close as possible to the corresponding directory structures,
then that reduces still further the overall time to find the disk block numbers and
then access the corresponding data blocks.
• On modern disks the rotational latency can be almost as significant as the seek
time; however, it is not within the OS's control to account for that, because
modern disks do not reveal their internal sector mapping schemes, ( particularly
when bad blocks have been remapped to spare sectors. )
o Some disk manufacturers provide for disk scheduling algorithms directly
on their disk controllers, ( which do know the actual geometry of the disk
as well as any remapping ), so that if a series of requests are sent from the
computer to the controller then those requests can be processed in an
optimal order.
o Unfortunately there are some considerations that the OS must take into
account that are beyond the abilities of the on-board disk-scheduling
algorithms, such as priorities of some requests over others, or the need to
process certain requests in a particular order. For this reason OSes may
elect to spoon-feed requests to the disk controller one at a time in certain
situations.
Disk Formatting
• Before a disk can be used, it has to be low-level formatted, which means laying
down all of the headers and trailers marking the beginning and ends of each
sector. Included in the header and trailer are the linear sector numbers, and error-
correcting codes, ECC, which allow damaged sectors to not only be detected, but
in many cases for the damaged data to be recovered ( depending on the extent of
the damage. ) Sector sizes are traditionally 512 bytes, but may be larger,
particularly in larger drives.
• ECC calculation is performed with every disk read or write, and if damage is
detected but the data is recoverable, then a soft error has occurred. Soft errors are
generally handled by the on-board disk controller, and never seen by the OS. ( See
below. )
• Once the disk is low-level formatted, the next step is to partition the drive into one
or more separate partitions. This step must be completed even if the disk is to be
used as a single large partition, so that the partition table can be written to the
beginning of the disk.
• After partitioning, then the filesystems must be logically formatted, which
involves laying down the master directory information ( FAT table or inode
structure ), initializing free lists, and creating at least the root directory of the
filesystem. ( Disk partitions which are to be used as raw devices are not logically
formatted. This saves the overhead and disk space of the filesystem structure, but
requires that the application program manage its own disk storage requirements. )
Boot Block
• Computer ROM contains a bootstrap program ( OS independent ) with just
enough code to find the first sector on the first hard drive on the first controller,
load that sector into memory, and transfer control over to it. ( The ROM bootstrap
program may look in floppy and/or CD drives before accessing the hard drive,
and is smart enough to recognize whether it has found valid boot code or not. )
• The first sector on the hard drive is known as the Master Boot Record, MBR, and
contains a very small amount of code in addition to the partition table. The
partition table documents how the disk is partitioned into logical disks, and
indicates specifically which partition is the active or boot partition.
• The boot program then looks to the active partition to find an operating system,
possibly loading up a slightly larger / more advanced boot program along the way.
• In a dual-boot ( or larger multi-boot ) system, the user may be given a choice of
which operating system to boot, with a default action to be taken in the event of
no response within some time frame.
• Once the kernel is found by the boot program, it is loaded into memory and then
control is transferred over to the OS. The kernel will normally continue the boot
process by initializing all important kernel data structures, launching important
system services ( e.g. network daemons, sched, init, etc. ), and finally providing
one or more login prompts. Boot options at this stage may include single-user
a.k.a. maintenance or safe modes, in which very few system services are started -
These modes are designed for system administrators to repair problems or
otherwise maintain the system.
Bad Blocks
• No disk can be manufactured to 100% perfection, and all physical objects wear
out over time. For these reasons all disks are shipped with a few bad blocks, and
additional blocks can be expected to go bad slowly over time. If a large number of
blocks go bad then the entire disk will need to be replaced, but a few here and
there can be handled through other means.
• In the old days, bad blocks had to be checked for manually. Formatting of the disk
or running certain disk-analysis tools would identify bad blocks, and attempt to
read the data off of them one last time through repeated tries. Then the bad blocks
would be mapped out and taken out of future service. Sometimes the data could
be recovered, and sometimes it was lost forever. ( Disk analysis tools could be
either destructive or non-destructive. )
• Modern disk controllers make much better use of the error-correcting codes, so
that bad blocks can be detected earlier and the data usually recovered. ( Recall
that blocks are tested with every write as well as with every read, so often errors
can be detected before the write operation is complete, and the data simply written
to a different sector instead. )
• Note that re-mapping of sectors from their normal linear progression can throw
off the disk scheduling optimization of the OS, especially if the replacement
sector is physically far away from the sector it is replacing. For this reason most
disks normally keep a few spare sectors on each cylinder, as well as at least one
spare cylinder. Whenever possible a bad sector will be mapped to another sector
on the same cylinder, or at least a cylinder as close as possible. Sector slipping
may also be performed, in which all sectors between the bad sector and the
replacement sector are moved down by one, so that the linear progression of
sector numbers can be maintained.
• If the data on a bad block cannot be recovered, then a hard error has occurred,
which requires replacing the file(s) from backups, or rebuilding them from
scratch.
Swap-Space Use
Swap-Space Location
Swap space can be physically located in one of two locations:
• As a large file which is part of the regular filesystem. This is easy to
implement, but inefficient. Not only must the swap space be accessed
through the directory system, the file is also subject to fragmentation
issues. Caching the block location helps in finding the physical blocks, but
that is not a complete fix.
• As a raw partition, possibly on a separate or little-used disk. This allows
the OS more control over swap space management, which is usually faster
and more efficient. Fragmentation of swap space is generally not a big
issue, as the space is re-initialized every time the system is rebooted. The
downside of keeping swap space on a raw partition is that it can only be
grown by repartitioning the hard drive.
UNIT-V
Deadlocks
5.1 System Model
• For the purposes of deadlock discussion, a system can be modeled as a collection of
limited resources, which can be partitioned into different categories, to be allocated to a
number of processes, each having different needs.
• Resource categories may include memory, printers, CPUs, open files, tape drives, CD-
ROMS, etc.
• By definition, all the resources within a category are equivalent, and a request of this
category can be equally satisfied by any one of the resources in that category. If this is
not the case ( i.e. if there is some difference between the resources within a category ),
then that category needs to be further divided into separate categories. For example,
"printers" may need to be separated into "laser printers" and "color inkjet printers".
• Some categories may have a single resource.
• In normal operation a process must request a resource before using it, and release it when
it is done, in the following sequence:
1. Request - If the request cannot be immediately granted, then the process must
wait until the resource(s) it needs become available. For example the system calls
open( ), malloc( ), new( ), and request( ).
2. Use - The process uses the resource, e.g. prints to the printer or reads from the
file.
3. Release - The process relinquishes the resource so that it becomes available for
other processes. For example, close( ), free( ), delete( ), and release( ).
• For all kernel-managed resources, the kernel keeps track of what resources are free and
which are allocated, to which process they are allocated, and a queue of processes waiting
for this resource to become available. Application-managed resources can be controlled
using mutexes or wait( ) and signal( ) calls, ( i.e. binary or counting semaphores. )
• A set of processes is deadlocked when every process in the set is waiting for a resource
that is currently allocated to another process in the set ( and which can only be released
when that other waiting process makes progress. )
5.2 Deadlock Characterization
5.2.1 Necessary Conditions
• There are four conditions that are necessary to achieve deadlock:
1. Mutual Exclusion - At least one resource must be held in a non-sharable
mode; If any other process requests this resource, then that process must
wait for the resource to be released.
2. Hold and Wait - A process must be simultaneously holding at least one
resource and waiting for at least one resource that is currently being held
by some other process.
3. No preemption - Once a process is holding a resource ( i.e. once its
request has been granted ), then that resource cannot be taken away from
that process until the process voluntarily releases it.
4. Circular Wait - A set of processes { P0, P1, P2, . . ., PN } must exist such
that every P[ i ] is waiting for P[ ( i + 1 ) % ( N + 1 ) ]. ( Note that this
condition implies the hold-and-wait condition, but it is easier to deal with
the conditions if the four are considered separately. )
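The four conditions are easy to satisfy by accident. The sketch below is a deliberately broken toy program ( not from the original notes ): two threads acquire two mutexes in opposite order, so mutual exclusion, hold-and-wait, no preemption, and circular wait are all present, and the program can hang:

    /* Sketch: a classic two-lock deadlock.  Thread A takes lock1 then lock2,
       thread B takes lock2 then lock1; if the two interleave, each waits forever. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;

    void *thread_a(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock1);
        sleep(1);                        /* widen the window for the deadlock */
        pthread_mutex_lock(&lock2);      /* waits for B, which is waiting for lock1 */
        pthread_mutex_unlock(&lock2);
        pthread_mutex_unlock(&lock1);
        return NULL;
    }

    void *thread_b(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock2);
        sleep(1);
        pthread_mutex_lock(&lock1);      /* circular wait: A holds lock1, B holds lock2 */
        pthread_mutex_unlock(&lock1);
        pthread_mutex_unlock(&lock2);
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);           /* never returns once the deadlock occurs */
        pthread_join(b, NULL);
        printf("finished without deadlocking this run\n");
        return 0;
    }

Imposing a single global lock ordering ( always lock1 before lock2 ) removes the circular-wait condition, and with it the deadlock.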
5.2.2 Resource-Allocation Graph
• In some cases deadlocks can be understood more clearly through the use of
Resource-Allocation Graphs, having the following properties:
o A set of resource categories, { R1, R2, R3, . . ., RN }, which appear as
square nodes on the graph. Dots inside the resource nodes indicate specific
instances of the resource. ( E.g. two dots might represent two laser
printers. )
o A set of processes, { P1, P2, P3, . . ., PN }
o Request Edges - A set of directed arcs from Pi to Rj, indicating that
process Pi has requested Rj, and is currently waiting for that resource to
become available.
o Assignment Edges - A set of directed arcs from Rj to Pi indicating that
resource Rj has been allocated to process Pi, and that Pi is currently
holding resource Rj.
o Note that a request edge can be converted into an assignment edge by
reversing the direction of the arc when the request is granted. ( However
note also that request edges point to the category box, whereas assignment
edges emanate from a particular instance dot within the box. )
o For example:
Figure 5.1 - Resource allocation graph
Figure 5.2 - Resource allocation graph with a deadlock
Figure 5.3 - Resource allocation graph with a cycle but no deadlock
5.3 Methods for Handling Deadlocks
• Generally speaking there are three ways of handling deadlocks:
1. Deadlock prevention or avoidance - Do not allow the system to get into a
deadlocked state.
2. Deadlock detection and recovery - Abort a process or preempt some resources
when deadlocks are detected.
3. Ignore the problem all together - If deadlocks only occur once a year or so, it may
be better to simply let them happen and reboot as necessary than to incur the
constant overhead and system performance penalties associated with deadlock
prevention or detection. This is the approach that both Windows and UNIX take.
• In order to avoid deadlocks, the system must have additional information about all
processes. In particular, the system must know what resources a process will or may
request in the future. ( Ranging from a simple worst-case maximum to a complete
resource request and release plan for each process, depending on the particular algorithm.
)
• Deadlock detection is fairly straightforward, but deadlock recovery requires either
aborting processes or preempting resources, neither of which is an attractive alternative.
• If deadlocks are neither prevented nor detected, then when a deadlock occurs the system
will gradually slow down, as more and more processes become stuck waiting for
resources currently held by the deadlock and by other waiting processes. Unfortunately
this slowdown can be indistinguishable from a general system slowdown when a real-
time process has heavy computing needs.
• Deadlocks can be prevented by preventing at least one of the four required conditions:
5.4.3 No Preemption
• Preemption of process resource allocations can prevent this condition of
deadlocks, when it is possible.
o One approach is that if a process is forced to wait when requesting a new
resource, then all other resources previously held by this process are
implicitly released, ( preempted ), forcing this process to re-acquire the old
resources along with the new resources in a single request, similar to the
previous discussion.
o Another approach is that when a resource is requested and not available,
then the system looks to see what other processes currently have those
resources and are themselves blocked waiting for some other resource. If
such a process is found, then some of their resources may get preempted
and added to the list of resources for which the process is waiting.
o Either of these approaches may be applicable for resources whose states
are easily saved and restored, such as registers and memory, but are
generally not applicable to other devices such as printers and tape drives.
• The resulting resource-allocation graph would have a cycle in it, and so the
request cannot be granted.
5.5.3.2 Resource-Request Algorithm ( The Banker's Algorithm )
• Now that we have a tool for determining if a particular state is safe or not,
we are now ready to look at the Banker's algorithm itself.
• This algorithm determines if a new request is safe, and grants it only if it is
safe to do so.
• When a request is made ( that does not exceed currently available
resources ), pretend it has been granted, and then see if the resulting state
is a safe one. If so, grant the request, and if not, deny the request, as
follows:
1. Let Request[ n ][ m ] indicate the number of resources of each type
currently requested by processes. If Request[ i ] > Need[ i ] for any
process i, raise an error condition.
2. If Request[ i ] > Available for any process i, then that process must
wait for resources to become available. Otherwise the process can
continue to step 3.
3. Check to see if the request can be granted safely, by pretending it
has been granted and then seeing if the resulting state is safe. If so,
grant the request, and if not, then the process must wait until its
request can be granted safely.The procedure for granting a request
( or pretending to for testing purposes ) is:
▪ Available = Available - Request
▪ Allocation = Allocation + Request
▪ Need = Need - Request
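A compact sketch of the request step is shown below. The snapshot ( five processes, three resource types ) is an illustrative example commonly used with the Banker's algorithm; the safety check is the standard work/finish loop:

    /* Sketch: Banker's resource-request algorithm.  Pretend to grant the request,
       run the safety algorithm, and roll back if the resulting state is unsafe. */
    #include <stdio.h>
    #include <string.h>

    #define P 5   /* processes       ( illustrative snapshot ) */
    #define R 3   /* resource types  ( A, B, C )               */

    int available[R]     = { 3, 3, 2 };
    int max_need[P][R]   = { {7,5,3}, {3,2,2}, {9,0,2}, {2,2,2}, {4,3,3} };
    int allocation[P][R] = { {0,1,0}, {2,0,0}, {3,0,2}, {2,1,1}, {0,0,2} };

    int is_safe(void) {
        int work[R], finish[P] = { 0 }, done = 0;
        memcpy(work, available, sizeof(work));
        while (done < P) {
            int progressed = 0;
            for (int i = 0; i < P; i++) {
                if (finish[i]) continue;
                int ok = 1;
                for (int j = 0; j < R; j++)
                    if (max_need[i][j] - allocation[i][j] > work[j]) { ok = 0; break; }
                if (ok) {                  /* process i can finish; release its resources */
                    for (int j = 0; j < R; j++) work[j] += allocation[i][j];
                    finish[i] = 1; done++; progressed = 1;
                }
            }
            if (!progressed) return 0;     /* nobody can finish: the state is unsafe */
        }
        return 1;
    }

    /* Returns 1 if granted, 0 if the process must wait, -1 on error ( exceeds need ). */
    int request(int i, const int req[R]) {
        for (int j = 0; j < R; j++) {
            if (req[j] > max_need[i][j] - allocation[i][j]) return -1;
            if (req[j] > available[j]) return 0;
        }
        for (int j = 0; j < R; j++) {      /* pretend the request has been granted */
            available[j]     -= req[j];
            allocation[i][j] += req[j];
        }
        if (is_safe()) return 1;
        for (int j = 0; j < R; j++) {      /* unsafe: roll back, the process must wait */
            available[j]     += req[j];
            allocation[i][j] -= req[j];
        }
        return 0;
    }

    int main(void) {
        int req[R] = { 1, 0, 2 };
        printf("request (1,0,2) by P1: %s\n",
               request(1, req) == 1 ? "granted" : "denied / must wait");
        return 0;
    }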
Figure 5.9 - (a) Resource allocation graph. (b) Corresponding wait-for graph
• If deadlocks are not avoided, then another approach is to detect when they have occurred
and recover somehow.
• In addition to the performance hit of constantly checking for deadlocks, a policy /
algorithm must be in place for recovering from deadlocks, and there is potential for lost
work when processes must be aborted or have their resources preempted.
• Now suppose that process P2 makes a request for an additional instance of type C,
yielding the state shown below. Is the system now deadlocked?
5.6.3 Detection-Algorithm Usage
• When should the deadlock detection be done? Frequently, or infrequently?
• The answer may depend on how frequently deadlocks are expected to occur, as
well as the possible consequences of not catching them immediately. ( If
deadlocks are not removed immediately when they occur, then more and more
processes can "back up" behind the deadlock, making the eventual task of
unblocking the system more difficult and possibly damaging to more processes. )
• There are two obvious approaches, each with trade-offs:
1. Do deadlock detection after every resource allocation which cannot be
immediately granted. This has the advantage of detecting the deadlock
right away, while the minimum number of processes are involved in the
deadlock. ( One might consider that the process whose request triggered
the deadlock condition is the "cause" of the deadlock, but realistically all
of the processes in the cycle are equally responsible for the resulting
deadlock. ) The down side of this approach is the extensive overhead and
performance hit caused by checking for deadlocks so frequently.
2. Do deadlock detection only when there is some clue that a deadlock may
have occurred, such as when CPU utilization reduces to 40% or some
other magic number. The advantage is that deadlock detection is done
much less frequently, but the down side is that it becomes impossible to
detect the processes involved in the original deadlock, and so deadlock
recovery can be more complicated and damaging to more processes.
3. (As I write this, a third alternative comes to mind: Keep a historical log of
resource allocations, since that last known time of no deadlocks. Do
deadlock checks periodically (once an hour or when CPU usage is low?),
and then use the historical log to trace through and determine when the
deadlock occurred and what processes caused the initial deadlock.
Unfortunately I'm not certain that breaking the original deadlock would
then free up the resulting log jam.)
5.7.1 Process Termination
• Two basic approaches, both of which recover resources allocated to terminated
processes:
o Terminate all processes involved in the deadlock. This definitely solves
the deadlock, but at the expense of terminating more processes than would
be absolutely necessary.
o Terminate processes one by one until the deadlock is broken. This is more
conservative, but requires doing deadlock detection after each step.
• In the latter case there are many factors that can go into deciding which processes
to terminate next:
1. Process priorities.
2. How long the process has been running, and how close it is to finishing.
3. How many and what type of resources the process is holding. ( Are they
easy to preempt and restore? )
4. How many more resources the process needs in order to complete.
5. How many processes will need to be terminated.
6. Whether the process is interactive or batch.
7. Whether or not the process has made non-restorable changes to any
resource.
PROTECTION
5.8 Goals of Protection
• Obviously to prevent malicious misuse of the system by users or programs.
• To ensure that each shared resource is used only in accordance with system
policies, which may be set either by system designers or by system administrators.
• To ensure that errant programs cause the minimal amount of damage possible.
• Note that protection systems only provide the mechanisms for enforcing policies and
ensuring reliable systems. It is up to administrators and users to implement those
mechanisms effectively.
• The model of protection that we have been discussing can be viewed as an access matrix,
in which columns represent different system resources and rows represent different
protection domains. Entries within the matrix indicate what access that domain has to that
resource.
• Domain switching can be easily supported under this model, simply by providing
"switch" access to other domains. ( A small code sketch of an access matrix appears
after this discussion. )
• The ability to copy rights is denoted by an asterisk, indicating that processes in that
domain have the right to copy that access within the same column, i.e. for the same
object. There are two important variations:
o If the asterisk is removed from the original access right, then the right is
transferred, rather than being copied. This may be termed a transfer right as
opposed to a copy right.
o If only the right and not the asterisk is copied, then the access right is added to the
new domain, but it may not be propagated further. That is the new domain does
not also receive the right to copy the access. This may be termed a limited copy
right, as shown in Figure 14.5 below:
• The owner right adds the privilege of adding new rights or removing existing ones:
• Copy and owner rights only allow the modification of rights within a column. The
addition of control rights, which apply only to domain objects, allows a process operating
in one domain to affect the rights available in other domains. For example in the table
below, a process operating in domain D2 has the right to control any of the rights in
domain D4.
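As a rough sketch ( the domains, objects, and rights below are all invented ), an access matrix can be held as a small table of right bitmaps that the protection system consults before every operation; switch, copy, owner, and control rights are then just additional bits in the same entries:

    /* Sketch: a tiny access matrix.  Rows are domains, columns are objects,
       and each entry is a bitmask of rights.  Contents are invented. */
    #include <stdio.h>

    enum { READ = 1, WRITE = 2, EXEC = 4, SWITCH = 8 };

    #define DOMAINS 2
    #define OBJECTS 3          /* objects: file1, file2, and domain D2 itself */

    static int matrix[DOMAINS][OBJECTS] = {
        /* file1       file2         D2     */
        {  READ,       READ | WRITE, SWITCH },   /* domain D1 */
        {  0,          READ,         0      },   /* domain D2 */
    };

    int allowed(int domain, int object, int right) {
        return (matrix[domain][object] & right) != 0;
    }

    int main(void) {
        printf("D1 may write file2:  %s\n", allowed(0, 1, WRITE)  ? "yes" : "no");
        printf("D2 may write file2:  %s\n", allowed(1, 1, WRITE)  ? "yes" : "no");
        printf("D1 may switch to D2: %s\n", allowed(0, 2, SWITCH) ? "yes" : "no");
        return 0;
    }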
Figure 5.16 - Modified access matrix of Figure 14.4
5.12.5 Comparison
• Each of the methods here has certain advantages or disadvantages, depending on
the particular situation and task at hand.
• Many systems employ some combination of the listed methods.
16. University Question papers of previous year
17. QUESTION BANK
UNIT-I
Short Answer Questions:
1. What is a system call? List different types of system calls used in developing programs.
2. What are the advantages and disadvantages of a real-time operating system?
3. Explain the operating system structure.
4. Define user mode and kernel mode. Why are two modes required?
5. What is Multiprocessor System?
6. What are the advantages of multiprocessors?
7. What are the modes of operation in Hardware Protection?
8. What is meant by Batch Systems?
9. List the various services provided by operating systems.
10. What is meant by Distributed systems?
UNIT-II
3) What is a Process Control Block (PCB) and its significance? --------L2 Understanding
Answer: A PCB is a data structure maintained by the operating system to store information about each process,
including process state, program counter, CPU registers, and other relevant information.
12) What is real-time scheduling, and how does it differ from other scheduling strategies?…… L2
Understanding
Answer: Real-time scheduling prioritizes tasks based on their deadlines or time constraints. Unlike traditional
scheduling, real-time scheduling aims to meet deadlines consistently, even at the expense of overall system
throughput.
15) Describe the critical section problem and its significance in concurrent programming.…… L2
Understanding
Answer: The critical section problem involves ensuring that only one process or thread executes a critical
section of code at a time to avoid data inconsistency or race conditions. It is a fundamental challenge in
concurrent programming.
16) How does Peterson's solution address the critical section problem?…… L2 Understanding
Answer: Peterson's solution is a software-based algorithm for achieving mutual exclusion in a shared memory
environment. It uses shared variables and flags to coordinate access to critical sections between two processes.
17) What is synchronization hardware, and how does it aid in process synchronization?…… L2 Understanding
Answer: Synchronization hardware refers to specialized instructions or hardware mechanisms provided by
CPUs to support synchronization primitives like atomic operations and memory barriers. It helps in
implementing efficient synchronization algorithms.
19) What are classic problems of synchronization, and give examples.…… L2 Understanding
Answer: Classic synchronization problems include the producer-consumer problem, the dining philosophers
problem, and the readers-writers problem. These problems illustrate challenges in coordinating access to shared
resources.
Long answer questions
6) Explain demand paging and virtual memory in memory management and write their
importance. BTL 5
7) Explain the hardware and software support needed for demand paging. BTL 5
8) List the steps to handle a page fault in memory management. BTL 4
9) Evaluate the working of page replacement algorithms. BTL 5
10) Consider the following page reference string: BTL 5
1,2,3,4,2,1,5,6,2,1,2,3,7,6,3,2,1,2,3,6
Evaluate how many page faults would occur for the following page-replacement
algorithms. Assume five frames, all initially empty, so the first unique pages will also
each cost one page fault.
i) Counter-based LRU ii) LFU
UNIT-IV
Short:
1. Define a File and list the various operations over files. BTL1
2. What are the different file access methods? BTL1
3. List various file allocation methods. BTL1
4. Explain the need for file system mounting? BTL2
5. Define a Directory and list its operations. BTL1
6. Outline various disk scheduling algorithms. BTL2
7. Classify file attributes and types. BTL2
8. Interpret naming and grouping problems in directory structure. BTL2
9. What is seek time and rotational latency? BTL1
10. What is swap-space management? BTL1
11. How is file sharing done in Linux? BTL1
12. How can file protection be achieved? BTL1
Long:
1. Compare sequential and indexed file access methods with a suitable example. (BTL4)
2. Analyze file sharing and file protection mechanisms. (BTL4)
3. Explain the file allocation methods with a neat diagram. (BTL5)
4. Explain the four approaches of free-space management.(BTL5)
5. Compare and contrast hierarchical directory with DAG directory organization methods. (BTL4)
6. Discuss the factors that show effect on efficiency and performance of file system. (BTL6)
7. Discuss the merits and demerits of all 4 directory organization methods. (BTL6)
8. Assess the merits and demerits of all 3 file allocation methods. (BTL5)
9. Estimate the total head movement in cylinders using FCFS, SSTF, SCAN and C-SCAN by considering
the following disk queue with requests for I/O to blocks on cylinders 98,183,37,122,14,124,65,67 in that
order, with the disk head initially at cylinder 53. Also provide the necessary diagram to show the head
movement for the above queue. (BTL5)
10. Estimate the total head movement in cylinders using FCFS, SSTF, LOOK and C-LOOK by considering
the following disk queue with requests for I/O to blocks on cylinders 98,183,37,122,14,124,65,67 in that
order, with the disk head initially at cylinder 53. Also provide the necessary diagram to show the head
movement for the above queue. (BTL5)
11. Explain the directory implementation through i) Linear List ii) Hash Table. (BTL5)
12. Explain Disk attachment and Disk space management in detail. (BTL5)
214
UNIT-V
Short:
1. Explain deadlock and illustrate with a real-life example of a deadlock situation. BTL2
2. What are the necessary conditions for the occurrence of a deadlock? BTL1
3. Infer the methods to deal with deadlocks. BTL2
4. Define Protection and give the principles of protection. BTL1
5. List the Goals of Protection. BTL1
6. Outline the merits and demerits of using Access matrix for system protection. BTL2
7. Interpret the various notations used in Resource Allocation Graph. BTL2
8. Interpret how a Wait-For Graph is formed from a Resource Allocation Graph, with an example. BTL2
9. What is the difference between a Resource Allocation Graph and a Wait-For Graph? BTL1
10. What is the domain of protection and what does each domain consist of? BTL1
Long:
1. Explain the goals and principles of system protection in detail. BTL5
2. Discuss about Hydra system and Cambridge CAP capability based systems. BTL6
3. Explain Access matrix along with some of its implementation methods. BTL5
4. Explain the language based protection with some examples. BTL5
5. Interpret various implementation methods used for revocation of access rights. BTL5
6. Interpret Banker’s algorithm with proper example. BTL5
7. Inspect the deadlock recovery methods with suitable examples. BTL4
8. Explain the steps involved in safety algorithm. BTL5
9. Explain the Resource Allocation Graph in dealing with deadlocks, with examples. BTL5
10. Consider the following snapshot of a system with resource allocation BTL5
Process   Allocation (A B C D)   Max (A B C D)   Available (A B C D)
P0        0 0 1 2                0 0 1 2         1 5 2 0
P1        1 0 0 0                1 7 5 0
P2        1 3 5 4                2 3 5 6
P3        0 6 3 2                0 6 5 2
P4        0 0 1 4                0 6 5 6
Answer the following questions using Banker’s algorithm
i. What is the content of matrix Need?
ii. Is the system in a safe state?
iii. If a request from process P1 arrives for (0, 4, 2, 0), can the request be granted immediately?
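The three parts can be worked mechanically with the safety algorithm; the sketch below takes the Allocation, Max and Available values from the snapshot above and computes Need = Max - Allocation. For part (iii), subtract the request from Available and from P1's Need, add it to P1's Allocation, and run the safety check again on the pretended state.

def is_safe(allocation, maximum, available):
    """Banker's safety algorithm: returns (is_safe, safe_sequence)."""
    n, m = len(allocation), len(available)
    need = [[maximum[i][j] - allocation[i][j] for j in range(m)] for i in range(n)]
    work, finished, sequence = list(available), [False] * n, []
    progressed = True
    while progressed:
        progressed = False
        for i in range(n):
            if not finished[i] and all(need[i][j] <= work[j] for j in range(m)):
                for j in range(m):
                    work[j] += allocation[i][j]    # Pi finishes and releases its resources
                finished[i] = True
                sequence.append("P%d" % i)
                progressed = True
    return all(finished), sequence

allocation = [[0,0,1,2], [1,0,0,0], [1,3,5,4], [0,6,3,2], [0,0,1,4]]
maximum    = [[0,0,1,2], [1,7,5,0], [2,3,5,6], [0,6,5,2], [0,6,5,6]]
available  = [1,5,2,0]
print(is_safe(allocation, maximum, available))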
215
18. Assignment Questions
Assignment-I OS Basics
1. Explain the following operating systems in detail.
i) Simple Batch Systems ii) Multiprogramming Systems
iii) Time shared systems iv) Parallel systems v) Real Time systems
2. a) Explain OS Structure.
b) Explain the dual mode of operation.
c) Write short notes on Special-purpose systems.
The processes are assumed to have arrived in the order P1, P2, P3, P4, P5, all at time 0.
a. Draw four Gantt charts that illustrate the execution of these processes using the
following scheduling algorithms: FCFS, SJF, non-preemptive priority and RR
(quantum=1).
b. What is the turnaround time and waiting time of each process for each of these
scheduling algorithms?
c. Which of the algorithms results in the minimum average waiting time?
216
3. Consider the following set of processes, with the length of the CPU burst given in
milliseconds
Process Arrival Time Burst Time Priority
P1 1 7 3
P2 2 8 2
P3 3 6 5
P4 4 9 4
P5 5 4 1
Give the Gantt chart illustrating the execution of these processes using Shortest
Remaining Time First (SRTF) and preemptive priority scheduling. Find the average
waiting time and average turnaround time for each of these algorithms.
5) Explain in detail how semaphores and monitors are used to solve the following problems:
i) Producer-Consumer problem ii) Readers-Writers problem iii) Dining-Philosophers problem
2. a. Explain the directory implementation through i) Linear List ii) Hash Table.
b. Compare various file allocation techniques.
c. Explain the following Free Space Management Techniques:
i) Bit Map ii) Linked List iii) Grouping
b. Suppose that a disk drive has 4,000 cylinders, numbered 0 to 3999. The drive is
currently serving a request at cylinder 143, and the previous request was at cylinder 120.
The queue of pending requests, in FIFO order is:
87, 2465, 918, 1784, 998, 509, 122, 750, 130
Starting from the current head position, what is the total distance (in cylinders) that
the disk arm moves to satisfy all the pending requests for each of the following
Disk Scheduling algorithms?
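Since the question's list of algorithms is not reproduced above, the sketch below computes the total head movement for FCFS, SSTF, SCAN and LOOK as common examples, using the request queue and the starting head position (cylinder 143) from the question. It assumes the arm is moving toward higher cylinder numbers (inferred from the previous request at cylinder 120) and that SCAN sweeps to cylinder 3999 before reversing; other conventions give different totals.

def fcfs(head, reqs):
    """Serve requests strictly in arrival order."""
    total = 0
    for r in reqs:
        total += abs(head - r)
        head = r
    return total

def sstf(head, reqs):
    """Shortest seek time first: always serve the closest pending request."""
    pending, total = list(reqs), 0
    while pending:
        nxt = min(pending, key=lambda r: abs(head - r))
        total += abs(head - nxt)
        head = nxt
        pending.remove(nxt)
    return total

def scan_up(head, reqs, max_cyl):
    """SCAN moving toward higher cylinders; the arm sweeps to the disk end, then reverses."""
    below = [r for r in reqs if r < head]
    above = [r for r in reqs if r >= head]
    if not below:
        return (max(above) - head) if above else 0
    return (max_cyl - head) + (max_cyl - min(below))

def look_up(head, reqs):
    """LOOK moving toward higher cylinders; the arm reverses at the last request."""
    below = [r for r in reqs if r < head]
    above = [r for r in reqs if r >= head]
    if not above:
        return (head - min(below)) if below else 0
    return (max(above) - head) + ((max(above) - min(below)) if below else 0)

queue = [87, 2465, 918, 1784, 998, 509, 122, 750, 130]
head = 143
print("FCFS:", fcfs(head, queue), "SSTF:", sstf(head, queue),
      "SCAN:", scan_up(head, queue, 3999), "LOOK:", look_up(head, queue))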
218
Assignment-V
1. a) Define Deadlock. Give an example.
b) Explain the necessary and sufficient conditions for deadlock.
c) Explain the use of Resource Allocation Graph in dealing with deadlocks. Give examples.
5. a) What is an access matrix? How can it be used for protection? Explain some implementation schemes.
b) What is the need for revocation of access rights? Discuss various ways of implementing it.
219
19. Objective Type Questions
UNIT – I
MCQ’s
1. What is an operating system?
a) interface between the hardware and application programs
b) collection of programs that manages hardware resources
c) system service provider to the application programs
d) all of the mentioned
Answer: d
2. What is the main function of the command interpreter?
a) to provide the interface between the API and application program
b) to handle the files in the operating system
c) to get and execute the next user-specified command
d) none of the mentioned
Answer: c
3. To access the services of the operating system, the interface is provided by the ___________
a) Library
b) System calls
c) Assembly instructions
d) API
Answer: b
4) Threads are:
A) Independent processes
B) Lightweight processes within a single process
C) A type of scheduling algorithm
D) Used for inter-process communication
Answer: B) Lightweight processes within a single process
6) The component of the operating system responsible for selecting processes from the ready queue is called:
A) Dispatcher
B) Scheduler
C) PCB
D) Context Switch
Answer: B) Scheduler
9) The component responsible for transferring control of the CPU from one process to another is called:
A) Scheduler
B) PCB
C) Dispatcher
D) Context Switch
Answer: C) Dispatcher
Answers:
1) race
2) Process Control Block (PCB)
3) context switch
4) exclusion
5) shared
6) concurrency
7) cores
8) deadlines
9) time
10) threads
11) throughput
12) proper
13) processes
14) primitives
15) context switch
16) dining philosophers
17) processes
18) status
19) ready
20) other
225
UNIT-III
Multiple Choice Question:
1) The memory management technique in which the system stores and retrieves data from secondary storage for use in main memory is called:
a) fragmentation
b) paging
c) mapping
d) none of the mentioned
Ans: b
2) Program always deals with ____________
a) logical address
b) absolute address
c) physical address
d) relative address
Ans: a
3) What is compaction?
a) a technique for overcoming internal fragmentation
b) a paging technique
c) a technique for overcoming external fragmentation
d) a technique for overcoming fatal error
Ans: c
4) Operating System maintains the page table for ____________
a) each process
b) each thread
c) each instruction
d) each address
Ans: a
5) Run time mapping from virtual to physical address is done by ____________
a) Memory management unit
b) CPU
c) PCI
d) None of the mentioned
Ans: a
6) The page table contains ____________
a) base address of each page in physical memory
b) page offset
c) page size
d) none of the mentioned
Ans: a
7) Relocation registers are used to:
a) provide a different address space to processes
b) provide less address space to processes
c) protect the address spaces of processes
d) provide more address space to processes
Ans: c
226
8) Each partition may contain ________ when memory is divided into several fixed sized partitions.
a) multiple processes at once
b) exactly one process
c) Two process
d) at least one process
Ans: b
9) In fixed-sized partitioning, the degree of multiprogramming is bounded by:
a) All of these
b) the memory size
c) the CPU utilization
d) the number of partitions
Ans: d
10) The number of ______ can be granted by the owner of an address space.
A. Computers
B. Modules
C. Pages
D. Devices
Ans: c
11) Strategies such as first fit, best fit and worst fit are used to select a ______.
A. process from a queue to put in storage
B. process from a queue to put in memory
C. processor to run the next process
D. free hole from a set of available holes
Ans: d
227
UNIT-IV
MCQs:
1. __________ is a unique tag, usually a number, that identifies the file within the file system.
a) File identifier b) File name c) File type d) none of the mentioned
2. File type can be represented by
a) file name b) file extension c) file identifier d) none of the mentioned
3. What is the mounting of a file system?
a) creating a file system b) deleting a file system
c) attaching a portion of the file system to a directory structure
d) removing a portion of the file system from a directory structure
4. Mapping of file is managed by
a) file metadata b) page table c) virtual memory d) file system
5. Mapping of network file system protocol to local file system is done by
a) network file system b) local file system c) volume manager d) remote mirror.
6. Which one of the following explains the sequential file access method?
a) random access according to the byte number b) read bytes one at a time, in order
c) read/write sequentially by record d) read/write randomly by record
7. In which allocation method does each file occupy a set of contiguous blocks on the disk?
a) contiguous allocation b) dynamic-storage allocation
c) linked allocation d) indexed allocation
8. If a block in the free-space list is free, then its bit will be:
a) 1 b) 0 c) any of 0 or 1 d) none of the mentioned
9. File attributes consist of :
a) name b) type c) size d) ALL
10. The information about all files is kept in:
a) swap space b) operating system c) separate directory structure d) None of these
11. In the sequential access method, information in the file is processed:
a) one disk after the other, record access doesn’t matter b) one record after the other
c) one text document after the other d) None of these
12. In the single level directory:
a) All files are contained in different directories all at the same level
b) All files are contained in the same directory
c) Depends on the operating system
d) None of these
228
13. In the single level directory:
a) all directories must have unique names b) all files must have unique names
c) all files must have unique owners d) All of these
14. An absolute path name begins at the:
a) leaf b) stem c) current directory d) root
15. A relative path name begins at the:
a) leaf b) stem c) current directory d) root
16. The heads of the magnetic disk are attached to a ______ that moves all the heads as a unit.
a) spindle b) disk arm c) track d) None of these
17. The set of tracks that are at one arm position make up a ______.
a) magnetic disks b) electrical disks c) assemblies d) cylinders
18. The time taken to move the disk arm to the desired cylinder is called the:
a) positioning time b) random access time c) seek time d) rotational latency
19. The time taken for the desired sector to rotate to the disk head is called:
a) positioning time b) random access time c) seek time d) rotational latency
20. When the head damages the magnetic surface, it is known as ______.
a) disk failure b) head crash c) magnetic damage d) All of these
229
UNIT-V
MCQs:
1. When two or more processes attempt to access the same resource, a _________ occurs.
a) each process is terminated b) each process is blocked and remain blocked forever
3. To avoid the race condition, the number of processes that may be simultaneously inside their critical section is ______.
4. A system has 3 processes and 4 resources. If each process needs a maximum of 2 units, then
6. The four necessary conditions for deadlock are no pre-emption, circular wait, hold and wait, and ______.
a) all resources have multiple instances b) all resources have a single instance
230
9. Which of the following situations may cause deadlock?
10. Which of the following is not an approach to dealing with deadlock?
a) Prevention b) Avoidance
c) Detection d) Deletion
d) ALL
d) ALL
a) files b) users
a) files b) users
231
c) both A & B d) None of the mentioned
20. For a domain, _______ is a list of objects together with the operations allowed on these objects.
1. A state of resource allocation such that the system can allocate resources to each process in some order and avoid
deadlock is known as ________. (safe state)
232
20. Tutorial Problems
Tutorial Sheet-I (Unit-I)
2. Explain the following: a) Parallel systems b) Distributed systems c) Real-time systems
1. Five processes arrive at time 0, in the order given, with the length of the CPU-burst time in
milliseconds, as shown below.
Processes Burst time
P1 8
P2 9
P3 5
P4 6
P5 4
a) Find the average waiting time, considering the following algorithms:
(i) FCFS (ii) SJF (iii) RR (time quantum = 4 milliseconds).
b) Which algorithm gives the minimum average waiting time?
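The averages in part (a) can be cross-checked with a short calculation like the one below; the burst times are taken from the table above, and all processes are assumed to arrive at time 0 in the given order.

def fcfs_wait(bursts):
    """FCFS waiting times, arrival order as given, all arrivals at time 0."""
    waits, elapsed = [], 0
    for b in bursts:
        waits.append(elapsed)
        elapsed += b
    return waits

def sjf_wait(bursts):
    """Non-preemptive SJF with all arrivals at time 0: shortest bursts run first."""
    waits, elapsed = [0] * len(bursts), 0
    for i in sorted(range(len(bursts)), key=lambda i: bursts[i]):
        waits[i] = elapsed
        elapsed += bursts[i]
    return waits

def rr_wait(bursts, quantum):
    """Round robin: waiting time = completion time - burst time."""
    remaining, completion = list(bursts), [0] * len(bursts)
    ready, time = list(range(len(bursts))), 0
    while ready:
        i = ready.pop(0)
        run = min(quantum, remaining[i])
        time += run
        remaining[i] -= run
        if remaining[i] == 0:
            completion[i] = time
        else:
            ready.append(i)
    return [completion[i] - bursts[i] for i in range(len(bursts))]

bursts = [8, 9, 5, 6, 4]            # P1..P5
for name, waits in (("FCFS", fcfs_wait(bursts)),
                    ("SJF", sjf_wait(bursts)),
                    ("RR q=4", rr_wait(bursts, 4))):
    print(name, waits, "average =", sum(waits) / len(waits))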
2. What is a dispatcher? Explain short-term scheduling and long-term scheduling.
3. Explain First fit, Best fit and Worst fit algorithms with an example.
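A toy allocator like the sketch below can be used to illustrate the three placement strategies; the hole sizes and the request size are hypothetical.

def pick_hole(holes, request, strategy):
    """Return the index of the free hole chosen for `request` under a placement strategy."""
    fits = [(i, size) for i, size in enumerate(holes) if size >= request]
    if not fits:
        return None                                      # no hole is large enough
    if strategy == "first":
        return fits[0][0]                                # first hole that fits
    if strategy == "best":
        return min(fits, key=lambda f: f[1])[0]          # smallest hole that fits
    if strategy == "worst":
        return max(fits, key=lambda f: f[1])[0]          # largest hole
    raise ValueError("unknown strategy: " + strategy)

holes = [100, 500, 200, 300, 600]                        # free hole sizes in KB (hypothetical)
for strategy in ("first", "best", "worst"):
    print(strategy, "fit chooses hole index", pick_hole(holes, 212, strategy))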
As the stated prerequisites are taught in the previous semesters, there are no known gaps.
Unit-1:
1. Distributed Systems
2. Real-time Embedded System
3. Operating Systems Generation (SYSGEN)
4. Handheld Systems
5. Multimedia Systems
6. Computing Environments
a) Traditional Computing
b) Client server Computing
c) Peer-to-peer Computing
d) Web-based Computing
234
7. Case study on UNIX Operating Systems
8. Case study on LINUX Operating Systems
9. Case study on WINDOWS Operating Systems
10. Threading issues
11. Scheduling Algorithms
12. Evaluation of scheduling Algorithms
Unit-2:
1. Synchronization examples
2. Atomic transactions
3. Case study on UNIX
4. Case study on Linux
5. Case study on Windows
6. Classic problems of synchronization
Unit-3:
1. Page-Replacement Algorithms
2. Memory Management
3. Case study on UNIX
4. Case study on Linux
5. Case study on Windows
6. Paging Techniques
Unit-4:
1. free-space management
2. Case study on UNIX
3. Case study on Linux
4. Case study on Windows
5. File system
6. Protection in file systems
7. Mass-storage structure
8. disk scheduling
Unit-5:
1. Deadlock Detection
2. Deadlock Avoidance
3. Recovery from deadlock
4. Protection
235
5. Case study on UNIX
6. Case study on Linux
7. Case study on Windows
8. Implementation of Access Matrix
Batch 2
Sl. No.   Admn. No.   Student Name
236