UNIT – I

Operating System:
• A program that acts as an intermediary between a user of a computer and the computer hardware
Operating system goals:
• Execute user programs and make solving user problems easier
• Make the computer system convenient to use
• Use the computer hardware in an efficient manner

Q)Operating System Structure
▪ Multiprogramming needed for efficiency
▪ Single user cannot keep CPU and I/O devices busy at all times
▪ Multiprogramming organizes jobs (code and data) so CPU always has one to execute
▪ A subset of total jobs in system is kept in memory
▪ One job selected and run via job scheduling
▪ When it has to wait (for I/O for example), OS switches to another job
▪ Timesharing (multitasking) is a logical extension in which the CPU switches jobs so frequently that users can interact with each job while it is running, creating interactive computing
▪ Response time should be < 1 second
▪ Each user has at least one program executing in memory (process). If several jobs are ready to run at the same time, CPU scheduling is needed. If processes don't fit in memory, swapping moves them in and out to run
▪ Virtual memory allows execution of processes not completely in memory
Memory Layout for Multiprogrammed System

Q)Operating-System Operations
▪ Interrupt driven by hardware
▪ Software error or request creates exception or trap
▪ Division by zero, request for operating-system service
▪ Other process problems include infinite loop, processes modifying each other or the operating system
▪ Dual-mode operation allows OS to protect itself and other system components
▪ User mode and kernel mode
▪ Mode bit provided by hardware
▪ Provides ability to distinguish when system is running user code or kernel code
▪ Some instructions designated as privileged, only executable in kernel mode
▪ System call changes mode to kernel, return from call resets it to user
Dual-Mode Operation
• When the computer system is executing on behalf of a user application, the system is in user mode.
• However, when a user application requests a service from the operating system (via a system call), it must transition from user to kernel mode to fulfill the request. As we shall see, this architectural enhancement is useful for many other aspects of system operation as well.
• At system boot time, the hardware starts in kernel mode.
• The operating system is then loaded and starts user applications in user mode.
• Whenever a trap or interrupt occurs, the hardware switches from user mode to kernel mode (that is, changes the state of the mode bit to 0).
• Thus, whenever the operating system gains control of the computer, it is in kernel mode.
• The system always switches to user mode (by setting the mode bit to 1) before passing control to a user program.
Transition from User to Kernel Mode
• Timer to prevent infinite loop / process hogging resources
• Set interrupt after specific period
• Operating system decrements counter
• When counter reaches zero, generate an interrupt
• Set up before scheduling a process, to regain control or terminate a program that exceeds its allotted time

Timer
We must ensure that the operating system maintains control over the CPU. We must prevent a user program from getting stuck in an infinite loop or not calling system services and never returning control to the operating system. To accomplish this goal, we can use a timer. A timer can be set to interrupt the computer after a specified period. The period may be fixed (for example, 1/60 second) or variable (for example, from 1 millisecond to 1 second). A variable timer is generally implemented by a fixed-rate clock and a counter.
The operating system sets the counter. Every time the clock ticks, the counter is decremented. When the counter reaches 0, an interrupt occurs. For instance, a 10-bit counter with a 1-millisecond clock allows interrupts at intervals from 1 millisecond to 1,024 milliseconds, in steps of 1 millisecond. Before turning over control to the user, the operating system ensures that the timer is set to interrupt. If the timer interrupts, control transfers automatically to the operating system, which may treat the interrupt as a fatal error or may give the program more time. Clearly, instructions that modify the content of the timer are privileged. Thus, we can use the timer to prevent a user program from running too long. A simple technique is to initialize a counter with the amount of time that a program is allowed to run. A program with a 7-minute time limit, for example, would have its counter initialized to 420.
Every second, the timer interrupts and the counter is decremented by 1. As long as the counter is positive, control is returned to the user program. When the counter becomes negative, the operating system terminates the program for exceeding the assigned time limit.
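The countdown just described can be sketched in a few lines of C. This is only a simplified user-level model, not kernel code; the counter value and the names timer_counter and timer_tick() are invented for illustration.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-process countdown, set by the OS before dispatch
 * (420 seconds models the 7-minute limit in the text). */
static int timer_counter = 420;

/* Called once per timer interrupt (here, once per simulated second). */
void timer_tick(void)
{
  timer_counter--;                    /* each clock tick decrements the counter */
  if (timer_counter < 0) {
    /* counter went negative: terminate the offending program */
    fprintf(stderr, "time limit exceeded\n");
    exit(1);
  }
  /* counter still positive: control returns to the user program */
}

int main(void)
{
  for (;;)
    timer_tick();                     /* simulate successive timer interrupts */
}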
Q)Protection and security
Protection and security require that computer resources such as CPU, software, memory, etc. are protected. This extends to the operating system as well as the data in the system. This can be done by ensuring integrity, confidentiality and availability in the operating system. The system must be protected against unauthorized access, viruses, worms, etc.
Threats to Protection and Security
A threat is a program that is malicious in nature and leads to harmful effects for the system. Some of the common threats that occur in a system are:
Virus
Viruses are generally small snippets of code embedded in a system. They are very dangerous and can corrupt files, destroy data, crash systems, etc. They can also spread further by replicating themselves as required.
Trojan Horse
A trojan horse can secretly access the login details of a system. Then a malicious user can use these to enter the system as a harmless being and wreak havoc.
Trap Door
A trap door is a security breach that may be present in a system without the knowledge of the users. It can be exploited by malicious people to harm the data or files in a system.
Worm
A worm can destroy a system by using its resources to extreme levels. It can generate multiple copies which claim all the resources and don't allow any other processes to access them. A worm can shut down a whole network in this way.
Denial of Service
These types of attacks do not allow legitimate users to access a system. The attacker overwhelms the system with requests so that it cannot work properly for other users.
Protection and Security Methods
The different methods that may provide protection and security for different computer systems are:
Authentication
This deals with identifying each user in the system and making sure they are who they claim to be. The operating system makes sure that all the users are authenticated before they access the system. The different ways to make sure that the users are authentic are:
Username/Password
Each user has a distinct username and password combination and they need to enter it correctly before they can access the system.
User Key/User Card
The users need to punch a card into the card slot or use their individual key on a keypad to access the system.
User Attribute Identification
Different user attribute identifications that can be used are fingerprint, eye retina, etc. These are unique for each user and are compared with the existing samples in the database. The user can only access the system if there is a match.
One Time Password
These passwords provide a lot of security for authentication purposes. A one-time password can be generated exclusively for a login every time a user wants to enter the system. It cannot be used more than once. The various ways a one-time password can be implemented are:
Random Numbers
The system can ask for numbers that correspond to alphabets that are pre-arranged. This combination can be changed each time a login is required.
Secret Key
A hardware device can create a secret key related to the user id for login. This key can change each time.
Q)Kernel data structures
Linux implements several data structures that are used throughout the kernel. If you want to read the Linux source code, you should learn the common data structures first.
Lists, Stacks, and Queues
An array is a simple data structure in which each element can be accessed directly.
• In a singly linked list, each item points to its successor, as illustrated in Figure
• In a doubly linked list, a given item can refer either to its predecessor or to its successor, as illustrated in Figure
• In a circularly linked list, the last element in the list refers to the first element, rather than to null, as illustrated in Figure
Trees
A tree is a data structure that can be used to represent data hierarchically. Data values in a tree structure are linked through parent–child relationships. In a general tree, a parent may have an unlimited number of children. In a binary tree, a parent may have at most two children, which we term the left child and the right child. A binary search tree additionally requires an ordering between the parent's two children in which left_child <= right_child. Figure 1.16 provides an example of a binary search tree. When we search for an item in a binary search tree, the worst-case performance is O(n) (consider how this can occur). To remedy this situation, we can use an algorithm to create a balanced binary search tree. Here, a tree containing n items has at most lg n levels, thus ensuring worst-case performance of O(lg n).
Hash Functions and Maps
A hash function takes data as its input, performs a numeric operation on this data, and returns a numeric value. This numeric value can then be used as an index into a table (typically an array) to quickly retrieve the data. Whereas searching for a data item through a list of size n can require up to O(n) comparisons in the worst case, using a hash function for retrieving data from a table can be as good as O(1) in the worst case, depending on implementation details. Because of this performance, hash functions are used extensively in operating systems.
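A minimal sketch of the idea in C follows; the hash function, table size, and key used here are arbitrary choices for illustration, not part of any particular kernel.

#include <stdio.h>

#define TABLE_SIZE 101                 /* prime-sized table, chosen for the example */

/* Simple illustrative hash: mix the bytes of the key and reduce
 * modulo the table size to obtain an array index. */
unsigned int hash(const char *key)
{
  unsigned int h = 0;
  while (*key)
    h = h * 31 + (unsigned char)*key++;
  return h % TABLE_SIZE;
}

int main(void)
{
  const char *table[TABLE_SIZE] = {0};
  const char *name = "init";
  table[hash(name)] = name;            /* O(1) insert                              */
  printf("%s\n", table[hash(name)]);   /* O(1) lookup (collisions ignored here)    */
  return 0;
}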

Bitmaps
A bitmap is a string of n binary digits that can be used to represent the status of n
items. For example, suppose we have several resources, and the availability of
each resource is indicated by the value of a binary digit: 0 means that the resource
is available, while 1 indicates that it is unavailable (or vice versa). The value of the
ith position in the bitmap is associated with the ith resource.
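In C, such a bitmap can be manipulated with ordinary bit operations; a small illustrative sketch (the resource numbering and helper names are invented here):

#include <stdio.h>

/* One bit per resource: 0 = available, 1 = in use. */
static unsigned int bitmap = 0;

void mark_unavailable(int i) { bitmap |=  (1u << i); }
void mark_available(int i)   { bitmap &= ~(1u << i); }
int  is_available(int i)     { return (bitmap & (1u << i)) == 0; }

int main(void)
{
  mark_unavailable(3);                 /* resource 3 is now in use */
  printf("%d\n", is_available(3));     /* prints 0                 */
  mark_available(3);
  printf("%d\n", is_available(3));     /* prints 1                 */
  return 0;
}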

Q)Computing Environments
Traditional computer
Blurring over time
Office environment
PCs connected to a network, terminals attached to mainframe or minicomputers providing batch and timesharing
Now portals allowing networked and remote systems access to same resources
Home networks
Used to be single system, then modems; now firewalled, networked
Mobile Computing
Mobile computing refers to computing on handheld smartphones and tablet computers. These devices share the distinguishing physical features of being portable and lightweight.
Historically, compared with desktop and laptop computers, mobile systems gave up screen size, memory capacity, and overall functionality in return for handheld mobile access to services such as e-mail and web browsing.
Two operating systems currently dominate mobile computing: Apple iOS and Google Android.
Distributed Systems
A distributed system is a collection of physically separate, possibly heterogeneous, computer systems that are networked to provide users with access to the various resources that the system maintains.
A network, in the simplest terms, is a communication path between two or more systems. Distributed systems depend on networking for their functionality. Networks are characterized based on the distances between their nodes. A local-area network (LAN) connects computers within a room, a building, or a campus. A wide-area network (WAN) usually links buildings, cities, or countries.
Client–Server Computing
As PCs have become faster, more powerful, and cheaper, designers have shifted away from centralized system architecture. Terminals connected to centralized systems are now being supplanted by PCs and mobile devices. Correspondingly, user-interface functionality once handled directly by centralized systems is increasingly being handled by PCs, quite often through a web interface. As a result, many of today's systems act as server systems to satisfy requests generated by client systems. This form of specialized distributed system is called a client–server system.
Server systems can be broadly categorized as compute servers and file servers:
• The compute-server system provides an interface to which a client can send a request to perform an action (for example, read data). In response, the server executes the action and sends the results to the client. A server running a database that responds to client requests for data is an example of such a system.
• The file-server system provides a file-system interface where clients can create, update, read, and delete files.
Peer-to-Peer Computing
Another model of distributed system
P2P does not distinguish clients and servers
Instead all nodes are considered peers
May each act as client, server or both
Node must join P2P network
Registers its service with central lookup service on network, or
Broadcasts request for service and responds to requests for service via discovery protocol
Examples include Napster and Gnutella
Virtualization
Virtualization is a technology that allows operating systems to run as applications within other operating systems. At first blush, there seems to be little reason for such functionality. But the virtualization industry is vast and growing, which is a testament to its utility and importance.
Some languages, such as BASIC, can be either compiled or interpreted. Java, in contrast, is always interpreted. Interpretation is a form of emulation in that the high-level language code is translated to native CPU instructions, emulating not another CPU but a theoretical virtual machine on which that language could run natively. Thus, we can run Java programs on "Java virtual machines," but technically those virtual machines are Java emulators.
Cloud Computing
Cloud computing is a type of computing that delivers computing, storage, and even applications as a service across a network.
• Public cloud—a cloud available via the Internet to anyone willing to pay for the services
• Private cloud—a cloud run by a company for that company's own use
• Hybrid cloud—a cloud that includes both public and private cloud components
• Software as a service (SaaS)—one or more applications (such as word processors or spreadsheets) available via the Internet
• Platform as a service (PaaS)—a software stack ready for application use via the Internet (for example, a database server)
• Infrastructure as a service (IaaS)—servers or storage available over the Internet (for example, storage available for making backup copies of production data)
Q)Open-Source Operating Systems
Operating systems made available in source-code format rather than just binary closed-source
Counter to the copy protection and Digital Rights Management (DRM) movement
Started by Free Software Foundation (FSF), which has "copyleft" GNU Public License (GPL)
Examples include GNU/Linux, BSD UNIX (including core of Mac OS X), and Sun Solaris
Linux
The GNU project produced many UNIX-compatible tools, including compilers, editors, and utilities, but never released a kernel. In 1991, a student in Finland, Linus Torvalds, released a rudimentary UNIX-like kernel using the GNU compilers and tools and invited contributions worldwide. The advent of the Internet meant that anyone interested could download the source code, modify it, and submit changes to Torvalds. Releasing updates once a week allowed this so-called Linux operating system to grow rapidly, enhanced by several thousand programmers.
The resulting GNU/Linux operating system has spawned hundreds of unique distributions, or custom builds, of the system. Major distributions include RedHat, SUSE, Fedora, Debian, Slackware, and Ubuntu. Distributions vary in function, utility, installed applications, hardware support, user interface, and purpose.
BSD UNIX
FreeBSD, NetBSD, OpenBSD, and DragonflyBSD. To explore the source code of FreeBSD, simply download the virtual machine image of the version of interest and boot it within VMware, as described above for Linux. The source code comes with the distribution and is stored in /usr/src/. The kernel source code is in /usr/src/sys.
Solaris
Solaris is the commercial UNIX-based operating system of Sun Microsystems. Originally, Sun's SunOS operating system was based on BSD UNIX. Sun moved to AT&T's System V UNIX as its base in 1991. In 2005, Sun open-sourced most of the Solaris code as the OpenSolaris project. The purchase of Sun by Oracle in 2009, however, left the state of this project unclear. The source code as it was in 2005 is still available via a source code browser and for download at http://src.opensolaris.org/source.
Several groups interested in using OpenSolaris have started from that base and expanded its features. Their working set is Project Illumos, which has expanded from the OpenSolaris base to include more features and to be the basis for several products.
Q)Operating System Services
One set of operating-system services provides functions that are helpful to the user:
User interface - Almost all operating systems have a user interface (UI). Varies between Command-Line (CLI), Graphics User Interface (GUI), Batch
Program execution - The system must be able to load a program into memory and to run that program, end execution, either normally or abnormally (indicating error)
I/O operations - A running program may require I/O, which may involve a file or an I/O device
File-system manipulation - The file system is of particular interest. Obviously, programs need to read and write files and directories, create and delete them, search them, list file information, permission management.

One set of operating-system services provides functions that are helpful to the user (Cont):
Communications – Processes may exchange information, on the same computer or between computers over a network
Communications may be via shared memory or through message passing (packets moved by the OS)
Error detection – OS needs to be constantly aware of possible errors
May occur in the CPU and memory hardware, in I/O devices, in user program
For each type of error, OS should take the appropriate action to ensure correct and consistent computing
Debugging facilities can greatly enhance the user's and programmer's abilities to efficiently use the system
Another set of OS functions exists for ensuring the efficient operation of the system itself via resource sharing:
Resource allocation - When multiple users or multiple jobs are running concurrently, resources must be allocated to each of them
Many types of resources - Some (such as CPU cycles, main memory, and file storage) may have special allocation code; others (such as I/O devices) may have general request and release code
Accounting - To keep track of which users use how much and what kinds of computer resources
Protection and security - The owners of information stored in a multiuser or networked computer system may want to control use of that information; concurrent processes should not interfere with each other
Protection involves ensuring that all access to system resources is controlled
Security of the system from outsiders requires user authentication, and extends to defending external I/O devices from invalid access attempts
If a system is to be protected and secure, precautions must be instituted throughout it. A chain is only as strong as its weakest link.

Q)User and Operating-System Interface
There are two fundamental approaches. One provides a command-line interface, or command interpreter, that allows users to directly enter commands to be performed by the operating system. The other allows users to interface with the operating system via a graphical user interface, or GUI.
User Operating System Interface - CLI
Command Line Interface (CLI) or command interpreter allows direct command entry
Sometimes implemented in kernel, sometimes by systems program
Sometimes multiple flavors implemented – shells
Primarily fetches a command from user and executes it
Sometimes commands built-in, sometimes just names of programs
If the latter, adding new features doesn't require shell modification
User Operating System Interface - GUI
User-friendly desktop metaphor interface
Usually mouse, keyboard, and monitor
Icons represent files, programs, actions, etc.
Various mouse buttons over objects in the interface cause various actions (provide information, options, execute function, open directory (known as a folder))
Invented at Xerox PARC
Many systems now include both CLI and GUI interfaces
Microsoft Windows is GUI with CLI "command" shell
Apple Mac OS X has "Aqua" GUI interface with UNIX kernel underneath and shells available
Solaris is CLI with optional GUI interfaces (Java Desktop, KDE)
Q)System Calls
Programming interface to the services provided by the OS
Typically written in a high-level language (C or C++)
Mostly accessed by programs via a high-level Application Program Interface (API) rather than direct system call use
Three most common APIs are Win32 API for Windows, POSIX API for POSIX-based systems (including virtually all versions of UNIX, Linux, and Mac OS X), and Java API for the Java virtual machine (JVM)
Why use APIs rather than system calls? (Note that the system-call names used throughout this text are generic)
Example of System Calls
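The figure that belongs under this heading is not reproduced in these notes. As a stand-in, here is a minimal sketch of a typical sequence of system calls, copying one file to another with the POSIX open(), read(), write(), and close() calls; the file names are invented for the example.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
  char buf[4096];
  ssize_t n;

  int in  = open("in.txt", O_RDONLY);                        /* open input file        */
  int out = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644); /* create output file */
  if (in < 0 || out < 0) {
    perror("open");
    exit(1);                                                  /* terminate abnormally   */
  }
  while ((n = read(in, buf, sizeof buf)) > 0)                 /* read from input file   */
    write(out, buf, n);                                       /* write to output file   */
  close(in);
  close(out);
  return 0;                                                   /* terminate normally     */
}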

System Call Parameter Passing
Often, more information is required than simply identity of desired system call
Exact type and amount of information vary according to OS and call
Three general methods used to pass parameters to the OS
Simplest: pass the parameters in registers
In some cases, may be more parameters than registers
Parameters stored in a block, or table, in memory, and the address of the block passed as a parameter in a register. This approach is taken by Linux and Solaris
Parameters placed, or pushed, onto the stack by the program and popped off the
stack by the operating system
Block and stack methods do not limit the number or length of parameters being
passed
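As a user-level illustration of the block-of-parameters style, a call such as the POSIX nanosleep() takes the address of a structure rather than individual values; the kernel receives only the pointer (in a register) and reads the fields from memory. A small self-contained example:

#include <stdio.h>
#include <time.h>

int main(void)
{
  /* Parameters are stored in a block (a struct) in memory ...          */
  struct timespec req = { .tv_sec = 1, .tv_nsec = 500000000 };

  /* ... and only the address of that block is handed to the kernel.    */
  if (nanosleep(&req, NULL) == 0)
    printf("slept for 1.5 seconds\n");
  return 0;
}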

System Call Implementation


Typically, a number associated with each system call
System-call interface maintains a table indexed according to these Numbers
The system call interface invokes intended system call in OS kernel and returns
status of the system call and any return values
The caller need know nothing about how the system call is implemented
Just needs to obey the API and understand what the OS will do as a result of the call
Most details of OS interface hidden from programmer by API
Managed by run-time support library (set of functions built into libraries included
with compiler)
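The table-driven dispatch described above can be modeled in a few lines of C. This is a toy model only; the call names, numbers, and signatures are invented for illustration and do not correspond to any real kernel's table.

#include <stdio.h>

/* Toy model of a system-call table: each entry is a handler function,
 * and the system-call number is simply an index into the table.        */
typedef long (*syscall_fn)(long arg);

static long sys_getpid(long unused) { (void)unused; return 42; }
static long sys_exit(long status)   { printf("exit(%ld)\n", status); return 0; }

static syscall_fn syscall_table[] = { sys_getpid, sys_exit };

/* The system-call interface: the caller supplies a number and arguments
 * and needs to know nothing about how the call is implemented.          */
long do_syscall(int number, long arg)
{
  return syscall_table[number](arg);
}

int main(void)
{
  printf("pid = %ld\n", do_syscall(0, 0));   /* dispatches to sys_getpid */
  do_syscall(1, 0);                          /* dispatches to sys_exit   */
  return 0;
}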
API – System Call – OS Relationship

Types of System Calls


• Process control
• File management
• Device management
• Information maintenance
• Communications
• Protection
➢ Process Control
A running program needs to be able to halt its execution either normally (end()) or
abnormally (abort()). If a system call is made to terminate the currently running
program abnormally, or if the program runs into a problem and causes an error
trap, a dump of memory is sometimes taken and an error message generated. The
dump is written to disk and may be examined by a debugger—a system program
designed to aid the programmer in finding and correcting errors, or bugs—to determine the cause of the problem.
Process control
◦ end, abort
◦ load, execute
◦ create process, terminate process
◦ get process attributes, set process attributes
◦ wait for time
◦ wait event, signal event
◦ allocate and free memory
• File management
◦ create file, delete file
◦ open, close
◦ read, write, reposition
◦ get file attributes, set file attributes
• Device management
◦ request device, release device
◦ read, write, reposition
◦ get device attributes, set device attributes
◦ logically attach or detach devices
• Information maintenance
◦ get time or date, set time or date
◦ get system data, set system data
◦ get process, file, or device attributes
◦ set process, file, or device attributes
• Communications
◦ create, delete communication connection
◦ send, receive messages
◦ transfer status information
◦ attach or detach remote devices
Figure 2.8 Types of system calls.
MS-DOS EXECUTION
FreeBSD Running Multiple Programs
File Management
We first need to be able to create() and delete() files. Either system call requires the name of the file and perhaps some of the file's attributes. Once the file is created, we need to open() it and to use it. We may also read(), write(), or reposition() (rewind or skip to the end of the file, for example). Finally, we need to close() the file, indicating that we are no longer using it.
Device Management
A process may need several resources to execute—main memory, disk drives, access to files, and so on. If the resources are available, they can be granted, and control can be returned to the user process. Otherwise, the process will have to wait until sufficient resources are available.
The various resources controlled by the operating system can be thought of as devices. Some of these devices are physical devices (for example, disk drives), while others can be thought of as abstract or virtual devices (for example, files).
Information Maintenance
Many system calls exist simply for the purpose of transferring information between the user program and the operating system. For example, most systems have a system call to return the current time() and date(). Other system calls may return information about the system, such as the number of current users, the version number of the operating system, the amount of free memory or disk space, and so on.
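For instance, a user program obtains this kind of information through library wrappers around system calls such as time() and getpid(); a short self-contained illustration:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
  time_t now = time(NULL);                 /* ask the OS for the current time     */
  printf("time: %s", ctime(&now));
  printf("pid:  %d\n", (int)getpid());     /* ask the OS for this process's id    */
  return 0;
}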

Communication
There are two common models of interprocess communication: the message
passing model and the shared-memory model. In the message-passing model, the

communicating processes exchange messages with one another to transfer information.
In the shared-memory model, processes use shared memory create() and shared memory attach() system calls to create and gain access to regions of memory owned by other processes.
Protection
Protection provides a mechanism for controlling access to the resources provided by a computer system. Historically, protection was a concern only on multiprogrammed computer systems with several users. However, with the advent of networking and the Internet, all computer systems, from servers to mobile handheld devices, must be concerned with protection.
Typically, system calls providing protection include set permission() and get permission(), which manipulate the permission settings of resources such as files and disks. The allow user() and deny user() system calls specify whether particular users can—or cannot—be allowed access to certain resources.
Q)System Programs
System programs, also known as system utilities, provide a convenient environment for program development and execution.
They can be divided into these categories:
• File management. These programs create, delete, copy, rename, print, dump, list, and generally manipulate files and directories.
• Status information. Some programs simply ask the system for the date, time, amount of available memory or disk space, number of users, or similar status information. Others are more complex, providing detailed performance, logging, and debugging information. Typically, these programs format and print the output to the terminal or other output device or files or display it in a window of the GUI. Some systems also support a registry, which is used to store and retrieve configuration information.
• File modification. Several text editors may be available to create and modify the content of files stored on disk or other storage devices. There may also be special commands to search contents of files or perform transformations of the text.
• Programming-language support. Compilers, assemblers, debuggers, and interpreters for common programming languages (such as C, C++, Java, and PERL) are often provided with the operating system or available as a separate download.
• Program loading and execution. Once a program is assembled or compiled, it must be loaded into memory to be executed. The system may provide absolute loaders, relocatable loaders, linkage editors, and overlay loaders. Debugging systems for either higher-level languages or machine language are needed as well.
• Communications. These programs provide the mechanism for creating virtual connections among processes, users, and computer systems. They allow users to send messages to one another's screens, to browse Web pages, to send e-mail messages, to log in remotely, or to transfer files from one machine to another.
• Background services. All general-purpose systems have methods for launching certain system-program processes at boot time. Some of these processes terminate after completing their tasks, while others continue to run until the system is halted. Constantly running system-program processes are known as services, subsystems, or daemons.
Q)Operating System Design and Implementation
Design and Implementation of OS is not "solvable", but some approaches have proven successful
Internal structure of different Operating Systems can vary widely
Start by defining goals and specifications
Affected by choice of hardware, type of system
User goals and System goals
User goals – operating system should be convenient to use, easy to learn, reliable, safe, and fast
System goals – operating system should be easy to design, implement, and maintain, as well as flexible, reliable, error-free, and efficient
Important principle to separate:
Policy: What will be done?
Mechanism: How to do it?
Mechanisms determine how to do something, policies decide what will be done
The separation of policy from mechanism is a very important principle; it allows maximum flexibility if policy decisions are to be changed later
Design Goals
The first problem in designing a system is to define goals and specifications. At the highest level, the design of the system will be affected by the choice of hardware and the type of system: batch, time sharing, single user, multiuser, distributed, real time, or general purpose.
Beyond this highest design level, the requirements may be much harder to specify. The requirements can, however, be divided into two basic groups: user goals and system goals.
Mechanisms and Policies
One important principle is the separation of policy from mechanism. Mechanisms determine how to do something; policies determine what will be done.
For example, the timer construct (see Section 1.5.2) is a mechanism for ensuring CPU protection, but deciding how long the timer is to be set for a particular user is a policy decision.
Implementation
Once an operating system is designed, it must be implemented.

Because operating systems are collections of many programs, written by many people over a long period of time, it is difficult to make general statements about how they are implemented.

Q)Operating-System Debugging
Debugging is finding and fixing errors, or bugs
OSes generate log files containing error information
Failure of an application can generate core dump file capturing memory of the
process
Operating system failure can generate crash dump file containing kernel
memory
Beyond crashes, performance tuning can optimize system performance
Kernighan's Law: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
DTrace tool in Solaris, FreeBSD, Mac OS X allows live instrumentation on
production systems
Probes fire when code is executed, capturing state data and sending it to
consumers of those probes
Failure Analysis
If a process fails, most operating systems write the error information to a log file
to alert system operators or users that the problem occurred. The operating
system can also take a core dump—a capture of the memory of the process— and
store it in a file for later analysis. (Memory was referred to as the “core” in the
early days of computing.) Running programs and core dumps can be probed by a
debugger, which allows a programmer to explore the code and memory of a
process
Performance Tuning
To identify bottlenecks, we must be able
to monitor system performance. Thus, the operating system must have some
means of computing and displaying measures of system behavior. In a number of
systems, the operating system does this by producing trace listings of system
behavior. All interesting events are logged with their time and important
parameters and are written to a file. Later, an analysis program can process
the log file to determine system performance and to identify bottlenecks and
inefficiencies. These same traces can be run as input for a simulation of a
suggested improved system. Traces also can help people to find errors in
operating-system behavior.

DTrace
DTrace is a facility that dynamically adds probes to a running system, both in user processes and in the kernel. These probes can be queried via the D programming language to determine an astonishing amount about the kernel, the system state, and process activities.
Solaris 10 dtrace Following System Call

Q)Operating System Generation
Operating systems are designed to run on any of a class of machines; the system
must be configured for each specific computer site
SYSGEN program obtains information concerning the specific configuration of
the hardware system
Booting – starting a computer by loading the kernel
Bootstrap program – code stored in ROM that is able to locate the kernel, load it
into memory, and start its execution
It is possible to design, code, and implement an operating system specifically
for one machine at one site. More commonly, however, operating systems
are designed to run on any of a class of machines at a variety of sites with
a variety of peripheral configurations. The system must then be configured
or generated for each specific computer site, a process sometimes known as
system generation SYSGEN.
The operating system is normally distributed on disk, on CD-ROM or DVD-ROM, or
as an “ISO” image, which is a file in the format of a CD-ROM or DVD-ROM. To
generate a system, we use a special program. This SYSGEN program reads from a
given file, or asks the operator of the system for information concerning the
specific configuration of the hardware system, or probes the hardware directly to
determine what components are there.
Q)System Boot
Operating system must be made available to hardware so hardware can start it
Small piece of code – bootstrap loader, locates the kernel, loads it into memory,
and starts it
Sometimes two-step process where boot block at fixed location loads bootstrap
loader
When power initialized on system, execution starts at a fixed memory location
Firmware used to hold initial boot code
After an operating system is generated, it must be made available for use by the
hardware. But how does the hardware know where the kernel is or how to load
that kernel? The procedure of starting a computer by loading the kernel is known
as booting the system. On most computer systems, a small piece of code known
as the bootstrap program or bootstrap loader locates the kernel, loads it into
main memory, and starts its execution. Some computer systems, such as PCs, use
a two-step process in which a simple bootstrap loader fetches a more complex boot program from disk, which in turn loads the kernel.
The bootstrap can execute the operating system directly if the operating system is
also in the firmware, or it can complete a sequence in which it loads progressively
smarter programs from firmware and disk until the operating system itself is
loaded into memory and executed.

UNIT-2
Q) Process Concept
Process Concept
An operating system executes a variety of programs:
Batch system – jobs
Time-shared systems – user programs or tasks
Textbook uses the terms job and process almost interchangeably
Process – a program in execution; process execution must progress in sequential fashion.
A process includes:
program counter
stack
data section
Process in Memory
Informally, as mentioned earlier, a process is a program in execution. A process is more than the program code, which is sometimes known as the text section. It also includes the current activity, as represented by the value of the program counter and the contents of the processor's registers. A process generally also includes the process stack, which contains temporary data (such as function parameters, return addresses, and local variables), and a data section, which contains global variables. A process may also include a heap, which is memory that is dynamically allocated during process run time. The structure of a process in memory is shown in Figure
Process State
As a process executes, it changes state
new: The process is being created
running: Instructions are being executed
waiting: The process is waiting for some event to occur
ready: The process is waiting to be assigned to a processor
terminated: The process has finished execution
Diagram of Process State
Process Control Block (PCB)
Information associated with each process:
Process state
Program counter
CPU registers
CPU scheduling information
Memory-management information
Accounting information
I/O status information
• Process state. The state may be new, ready, running, waiting, halted, and so on.
• Program counter. The counter indicates the address of the next instruction to be executed for this process.
• CPU registers. The registers vary in number and type, depending on the computer architecture. They include accumulators, index registers, stack pointers, and general-purpose registers, plus any condition-code information. Along with the program counter, this state information must be saved when an interrupt occurs, to allow the process to be continued correctly afterward.
• CPU-scheduling information. This information includes a process priority, pointers to scheduling queues, and any other scheduling parameters.
• Memory-management information. This information may include such items as the value of the base and limit registers and the page tables, or the segment tables, depending on the memory system used by the operating system.
• Accounting information. This information includes the amount of CPU and real time used, time limits, account numbers, job or process numbers, and so on.
• I/O status information. This information includes the list of I/O devices allocated to the process, a list of open files, and so on.
In brief, the PCB simply serves as the repository for any information that may vary from process to process.
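Concretely, a PCB is just a record of these fields. A rough C sketch follows; the field names and sizes are invented for illustration, and a real PCB (for example Linux's task_struct) holds many more fields.

/* Illustrative process control block, not taken from any real kernel. */
enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
  int              pid;               /* process identifier              */
  enum proc_state  state;             /* new, ready, running, ...        */
  unsigned long    program_counter;   /* next instruction to execute     */
  unsigned long    registers[16];     /* saved CPU registers             */
  int              priority;          /* CPU-scheduling information      */
  void            *page_table;        /* memory-management information   */
  unsigned long    cpu_time_used;     /* accounting information          */
  int              open_files[16];    /* I/O status information          */
  struct pcb      *next;              /* link for ready/device queues    */
};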

CPU Switch From Process to Process
Ready Queue And Various I/O Device Queue

Threads
The process model discussed so far has implied that a process is a program that
performs a single thread of execution. For example, when a process is running
a word-processor program, a single thread of instructions is being executed.
This single thread of control allows the process to perform only one task at
a time. The user cannot simultaneously type in characters and run the spell
checker within the same process
Q) Process Scheduling
Job queue – set of all processes in the system
Ready queue – set of all processes residing in main memory, ready and waiting to execute
Device queues – set of processes waiting for an I/O device
Processes migrate among the various queues
Scheduling Queues
A new process is initially put in the ready queue. It waits there until it is selected for execution, or dispatched. Once the process is allocated the CPU and is executing, one of several events could occur:
• The process could issue an I/O request and then be placed in an I/O queue.
• The process could create a new child process and wait for the child's termination.
• The process could be removed forcibly from the CPU, as a result of an interrupt, and be put back in the ready queue.

Schedulers
A process migrates among the various scheduling queues throughout its lifetime. The operating system must select, for scheduling purposes, processes from these queues in some fashion. The selection process is carried out by the appropriate scheduler.
Addition of Medium Term Scheduling
Long-term scheduler (or job scheduler) – selects which processes should be brought into the ready queue
Short-term scheduler (or CPU scheduler) – selects which process should be executed next and allocates CPU
Short-term scheduler is invoked very frequently (milliseconds), so it must be fast
Long-term scheduler is invoked very infrequently (seconds, minutes), so it may be slow
The long-term scheduler controls the degree of multiprogramming
Processes can be described as either:
I/O-bound process – spends more time doing I/O than computations; many short CPU bursts
CPU-bound process – spends more time doing computations; few very long CPU bursts
Context Switch
When the CPU switches to another process, the system must save the state of the old process and load the saved state for the new process via a context switch
Context of a process represented in the PCB
Context-switch time is overhead; the system does no useful work while switching
Time dependent on hardware support
Switching the CPU to another process requires performing a state save of the current process and a state restore of a different process. This task is known as a context switch.
Context-switch times are highly dependent on hardware support. For instance, some processors (such as the Sun UltraSPARC) provide multiple sets of registers.
Q)Operations on Processes
The processes in most systems can execute concurrently, and they may be created and deleted dynamically. Thus, these systems must provide a mechanism for process creation and termination. In this section, we explore the mechanisms involved in creating processes and illustrate process creation on UNIX and Windows systems.
Process Creation
Parent process creates children processes, which, in turn create other processes, forming a tree of processes
Generally, a process is identified and managed via a process identifier (pid)
Resource sharing
Parent and children share all resources
Children share subset of parent's resources
Parent and child share no resources
Execution
Parent and children execute concurrently
Parent waits until children terminate
Address space
Child duplicate of parent
Child has a program loaded into it
UNIX examples
fork system call creates new process
exec system call used after a fork to replace the process' memory space with a new program
When a process creates a new process, two possibilities for execution exist:
1. The parent continues to execute concurrently with its children.
2. The parent waits until some or all of its children have terminated.
There are also two address-space possibilities for the new process:
1. The child process is a duplicate of the parent process (it has the same program and data as the parent).
2. The child process has a new program loaded into it.

#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
  pid_t pid;

  /* fork a child process */
  pid = fork();
  if (pid < 0) { /* error occurred */
    fprintf(stderr, "Fork Failed");
    return 1;
  } else if (pid == 0) { /* child process */
    execlp("/bin/ls", "ls", NULL);
  } else { /* parent process */
    /* parent will wait for the child to complete */
    wait(NULL);
    printf("Child Complete");
  }
  return 0;
}

Figure: Creating a separate process using the UNIX fork() system call.

Process Termination
A process terminates when it finishes executing its final statement and asks the operating system to delete it by using the exit() system call. At that point, the process may return a status value (typically an integer) to its parent process (via the wait() system call). All the resources of the process—including physical and virtual memory, open files, and I/O buffers—are deallocated by the operating system.
A parent may terminate the execution of one of its children for a variety of reasons, such as these:
• The child has exceeded its usage of some of the resources that it has been allocated. (To determine whether this has occurred, the parent must have a mechanism to inspect the state of its children.)
• The task assigned to the child is no longer required.
• The parent is exiting, and the operating system does not allow a child to continue if its parent terminates.
To illustrate process execution and termination, consider that, in Linux and UNIX systems, we can terminate a process by using the exit() system call, providing an exit status as a parameter:
  /* exit with status 1 */
  exit(1);
In fact, under normal termination, exit() may be called either directly (as shown above) or indirectly (by a return statement in main()).
A parent process may wait for the termination of a child process by using the wait() system call. The wait() system call is passed a parameter that allows the parent to obtain the exit status of the child. This system call also returns the process identifier of the terminated child so that the parent can tell which of its children has terminated:
  pid_t pid;
  int status;
  pid = wait(&status);
When a process terminates, its resources are deallocated by the operating system. However, its entry in the process table must remain there until the parent calls wait(), because the process table contains the process's exit status. A process that has terminated, but whose parent has not yet called wait(), is known as a zombie process. All processes transition to this state when they terminate, but generally they exist as zombies only briefly. Once the parent calls wait(), the process identifier of the zombie process and its entry in the process table are released.
Now consider what would happen if a parent did not invoke wait() and instead terminated, thereby leaving its child processes as orphans. Linux and UNIX address this scenario by assigning the init process as the new parent to orphan processes. (Recall from Figure 3.8 that the init process is the root of the process hierarchy in UNIX and Linux systems.) The init process periodically invokes wait(), thereby allowing the exit status of any orphaned process to be collected and releasing the orphan's process identifier and process-table entry.
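Combining exit() and wait(), a parent can recover its child's exit status. The short program below is an illustrative example, not taken from the text; it uses the standard POSIX WIFEXITED/WEXITSTATUS macros.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
  pid_t pid = fork();
  if (pid == 0) {
    exit(7);                           /* child terminates with status 7      */
  } else {
    int status;
    pid_t child = wait(&status);       /* also returns the terminated child's pid */
    if (WIFEXITED(status))
      printf("child %d exited with status %d\n",
             (int)child, WEXITSTATUS(status));
  }
  return 0;
}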
Q)Interprocess Communication
Processes within a system may be independent or cooperating
Cooperating process can affect or be affected by other processes, including sharing data
There are several reasons for providing an environment that allows process cooperation:
• Information sharing. Since several users may be interested in the same piece of information (for instance, a shared file), we must provide an environment to allow concurrent access to such information.
• Computation speedup. If we want a particular task to run faster, we must break it into subtasks, each of which will be executing in parallel with the others. Notice that such a speedup can be achieved only if the computer has multiple processing cores.
• Modularity. We may want to construct the system in a modular fashion, dividing the system functions into separate processes or threads.
• Convenience. Even an individual user may work on many tasks at the same time. For instance, a user may be editing, listening to music, and compiling in parallel.
COMMUNICATION MODEL
Shared-Memory Systems
Interprocess communication using shared memory requires communicating processes to establish a region of shared memory. Typically, a shared-memory region resides in the address space of the process creating the shared-memory segment. Other processes that wish to communicate using this shared-memory segment must attach it to their address space.
Cooperating Processes
Independent process cannot affect or be affected by the execution of another process
Cooperating process can affect or be affected by the execution of another process
Advantages of process cooperation:
Information sharing
Computation speed-up
Modularity
Convenience
Producer-Consumer Problem
Paradigm for cooperating processes: a producer process produces information that is consumed by a consumer process
unbounded-buffer places no practical limit on the size of the buffer
bounded-buffer assumes that there is a fixed buffer size
Bounded-Buffer – Shared-Memory Solution
Shared data:

#define BUFFER_SIZE 10
typedef struct {
  ...
} item;

item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;

Solution is correct, but can only use BUFFER_SIZE-1 elements
Bounded-Buffer – Producer

while (true) {
  /* Produce an item */
  while (((in + 1) % BUFFER_SIZE) == out)
    ; /* do nothing -- no free buffers */
  buffer[in] = item;
  in = (in + 1) % BUFFER_SIZE;
}

Bounded Buffer – Consumer

while (true) {
  while (in == out)
    ; /* do nothing -- nothing to consume */

  /* remove an item from the buffer */
  item = buffer[out];
  out = (out + 1) % BUFFER_SIZE;
  return item;
}

Interprocess Communication – Message Passing
Mechanism for processes to communicate and to synchronize their actions
Message system – processes communicate with each other without resorting to shared variables
IPC facility provides two operations:
send(message) – message size fixed or variable
receive(message)
A message-passing facility provides at least two operations: send(message) and receive(message)
If P and Q wish to communicate, they need to:
establish a communication link between them
exchange messages via send/receive
Implementation of communication link:
physical (e.g., shared memory, hardware bus)
logical (e.g., logical properties)
Naming
Processes that want to communicate must have a way to refer to each other. They can use either direct or indirect communication.
• Direct Communication
Processes must name each other explicitly:
send(P, message) – send a message to process P
receive(Q, message) – receive a message from process Q
Properties of communication link
Links are established automatically
A link is associated with exactly one pair of communicating processes
Between each pair there exists exactly one link
The link may be unidirectional, but is usually bi-directional
• Indirect Communication
Messages are directed and received from mailboxes (also referred to as ports)
Each mailbox has a unique id
Processes can communicate only if they share a mailbox
Properties of communication link
Link established only if processes share a common mailbox
A link may be associated with many processes
Each pair of processes may share several communication links
Link may be unidirectional or bi-directional
Operations
create a new mailbox
send and receive messages through mailbox
destroy a mailbox
Primitives are defined as:
send(A, message) – send a message to mailbox A
receive(A, message) – receive a message from mailbox A
Mailbox sharing
P1, P2, and P3 share mailbox A
P1 sends; P2 and P3 receive
Who gets the message?
Solutions:
Allow a link to be associated with at most two processes
Allow only one process at a time to execute a receive operation
Allow the system to select arbitrarily the receiver. Sender is notified who the receiver was.
Synchronization
Message passing may be either blocking or non-blocking
Blocking is considered synchronous
Blocking send has the sender block until the message is received
Blocking receive has the receiver block until a message is available
Non-blocking is considered asynchronous
Non-blocking send has the sender send the message and continue
Non-blocking receive has the receiver receive a valid message or null
• Buffering

message next_produced;
while (true) {
  /* produce an item in next_produced */
  send(next_produced);
}

Figure: The producer process using message passing.
• Zero capacity. The queue has a maximum length of zero; thus, the link cannot have any messages waiting in it. In this case, the sender must block until the recipient receives the message.
• Bounded capacity. The queue has finite length n; thus, at most n messages can reside in it. If the queue is not full when a new message is sent, the message is placed in the queue (either the message is copied or a pointer to the message is kept), and the sender can continue execution without waiting. The link's capacity is finite, however. If the link is full, the sender must block until space is available in the queue.
• Unbounded capacity. The queue's length is potentially infinite; thus, any number of messages can wait in it. The sender never blocks.
Q)Examples of IPC Systems - POSIX
POSIX Shared Memory
Process first creates shared memory segment
segment_id = shmget(IPC_PRIVATE, size, S_IRUSR | S_IWUSR);
Process wanting access to that shared memory must attach to it
shared_memory = (char *) shmat(id, NULL, 0);

Now the process could write to the shared memory:
sprintf(shared_memory, "Writing to shared memory");
When done, a process can detach the shared memory from its address space:
shmdt(shared_memory);

message next_consumed;
while (true) {
  receive(next_consumed);
  /* consume the item in next_consumed */
}

Figure: The consumer process using message passing.
Q)Examples of IPC Systems - Mach
Mach communication is message based
Even system calls are messages
Each task gets two mailboxes at creation - Kernel and Notify
Only three system calls needed for message transfer: msg_send(), msg_receive(), msg_rpc()
Mailboxes needed for communication, created via port_allocate()
If the mailbox is full, the sending thread has four options:
1. Wait indefinitely until there is room in the mailbox.
2. Wait at most n milliseconds.
3. Do not wait at all but rather return immediately.
4. Temporarily cache a message.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/shm.h>
#include <sys/stat.h>
#include <sys/mman.h>

int main()
{
  /* the size (in bytes) of shared memory object */
  const int SIZE = 4096;
  /* name of the shared memory object */
  const char *name = "OS";
  /* strings written to shared memory */
  const char *message_0 = "Hello";
  const char *message_1 = "World!";
  /* shared memory file descriptor */
  int shm_fd;
  /* pointer to shared memory object */
  char *ptr;

  /* create the shared memory object */
  shm_fd = shm_open(name, O_CREAT | O_RDWR, 0666);
  /* configure the size of the shared memory object */
  ftruncate(shm_fd, SIZE);
  /* memory map the shared memory object */
  ptr = mmap(0, SIZE, PROT_WRITE, MAP_SHARED, shm_fd, 0);
  /* write to the shared memory object */
  sprintf(ptr, "%s", message_0);
  ptr += strlen(message_0);
  sprintf(ptr, "%s", message_1);
  ptr += strlen(message_1);
  return 0;
}

Figure 3.17 Producer process illustrating POSIX shared-memory API.
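A consumer counterpart to this producer would open the same object, map it read-only, print its contents, and finally remove it with shm_unlink(). The listing below is an illustrative sketch along those lines, not the text's own figure.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/mman.h>

int main()
{
  const int SIZE = 4096;
  const char *name = "OS";            /* same object name used by the producer */
  int shm_fd;
  char *ptr;

  /* open the existing shared memory object for reading */
  shm_fd = shm_open(name, O_RDONLY, 0666);
  /* memory map it into this process's address space */
  ptr = mmap(0, SIZE, PROT_READ, MAP_SHARED, shm_fd, 0);
  /* read what the producer wrote */
  printf("%s\n", ptr);
  /* remove the shared memory object */
  shm_unlink(name);
  return 0;
}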
Examples of IPC Systems – Windows XP
Message-passing centric via local procedure call (LPC) facility
Only works between processes on the same system
Uses ports (like mailboxes) to establish and maintain communication channels
Communication works as follows:
The client opens a handle to the subsystem's connection port object
The client sends a connection request
The server creates two private communication ports and returns the handle to one of them to the client
The client and server use the corresponding port handle to send messages or callbacks and to listen for replies
When an ALPC channel is created, one of three message-passing techniques is chosen:
1. For small messages (up to 256 bytes), the port's message queue is used as intermediate storage, and the messages are copied from one process to the other.
2. Larger messages must be passed through a section object, which is a region of shared memory associated with the channel.
3. When the amount of data is too large to fit into a section object, an API is available that allows server processes to read and write directly into the address space of a client.

Q)Communication in Client–Server Systems
• Sockets
A socket is defined as an endpoint for communication. A pair of processes communicating over a network employs a pair of sockets—one for each process. A socket is identified by an IP address concatenated with a port number. In general, sockets use a client–server architecture. The server waits for incoming client requests by listening to a specified port. Once a request is received, the server accepts a connection from the client socket to complete the connection. Servers implementing specific services (such as telnet, FTP, and HTTP) listen to well-known ports (a telnet server listens to port 23; an FTP server listens to port 21; and a web, or HTTP, server listens to port 80). All ports below 1024 are considered well known; we can use them to implement standard services.

import java.net.*;
import java.io.*;

public class DateServer
{
  public static void main(String[] args) {
    try {
      ServerSocket sock = new ServerSocket(6013);
      /* now listen for connections */
      while (true) {
        Socket client = sock.accept();
        PrintWriter pout = new
          PrintWriter(client.getOutputStream(), true);
        /* write the Date to the socket */
        pout.println(new java.util.Date().toString());
        /* close the socket and resume */
        /* listening for connections */
        client.close();
      }
    } catch (IOException ioe) {
      System.err.println(ioe);
    }
  }
}

Figure 3.21 Date server.

import java.net.*;
import java.io.*;

public class DateClient
{
  public static void main(String[] args) {
    try {
      /* make connection to server socket */
      Socket sock = new Socket("127.0.0.1", 6013);
      InputStream in = sock.getInputStream();
      BufferedReader bin = new
        BufferedReader(new InputStreamReader(in));
      /* read the date from the socket */
      String line;
      while ((line = bin.readLine()) != null)
        System.out.println(line);
      /* close the socket connection */
      sock.close();
    } catch (IOException ioe) {
      System.err.println(ioe);
    }
  }
}

Figure 3.22 Date client.
• Remote Procedure Calls

20
Remote procedure call (RPC) abstracts procedure calls between processes on networked systems
Stubs – client-side proxy for the actual procedure on the server
The client-side stub locates the server and marshals the parameters
The server-side stub receives this message, unpacks the marshalled parameters, and performs the procedure on the server
• Pipes
A pipe acts as a conduit allowing two processes to communicate. Pipes were one of the first IPC mechanisms in early UNIX systems. They typically provide one of the simpler ways for processes to communicate with one another, although they also have some limitations.
Ordinary Pipes
Ordinary pipes allow two processes to communicate in standard producer–consumer fashion: the producer writes to one end of the pipe (the write-end) and the consumer reads from the other end (the read-end). As a result, ordinary pipes are unidirectional, allowing only one-way communication. If two-way communication is required, two pipes must be used, with each pipe sending data in a different direction. We next illustrate constructing ordinary pipes on both UNIX and Windows systems. In both program examples, one process writes the message Greetings to the pipe, while the other process reads this message from the pipe.
On UNIX systems, ordinary pipes are constructed using the function
pipe(int fd[])
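The pipe() call returns two file descriptors, fd[0] for the read end and fd[1] for the write end. Below is a minimal sketch (not taken from the book's figures) of the scenario just described, with the parent writing Greetings and the child reading it.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
#define READ_END  0
#define WRITE_END 1
int main(void)
{
int fd[2];
char buf[32];
if (pipe(fd) == -1) { perror("pipe"); return 1; }
if (fork() == 0) {                     /* child: the reader */
close(fd[WRITE_END]);
ssize_t n = read(fd[READ_END], buf, sizeof(buf) - 1);
buf[n] = '\0';
printf("child read: %s\n", buf);
close(fd[READ_END]);
} else {                               /* parent: the writer */
close(fd[READ_END]);
write(fd[WRITE_END], "Greetings", strlen("Greetings"));
close(fd[WRITE_END]);
wait(NULL);
}
return 0;
}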
Named Pipes
Ordinary pipes provide a simple mechanism for allowing a pair of processes
to communicate. However, ordinary pipes exist only while the processes are
communicating with one another. On both UNIX and Windows systems, once
the processes have finished communicating and have terminated, the ordinary
pipe ceases to exist
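As a contrast with ordinary pipes, the sketch below (illustrative only; the path /tmp/demo_fifo is made up) creates a named pipe with mkfifo(). Because the FIFO has a name in the file system, it outlives the writing process and must be removed explicitly; note that the open() blocks until some reader, for example cat /tmp/demo_fifo, opens the other end.
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
int main(void)
{
const char *path = "/tmp/demo_fifo";
mkfifo(path, 0666);                     /* create the named pipe */
int fd = open(path, O_WRONLY);          /* blocks until a reader opens it */
write(fd, "Greetings", strlen("Greetings"));
close(fd);
unlink(path);                           /* remove the FIFO when finished */
return 0;
}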
Q) Threads
To introduce the notion of a thread — a fundamental unit of CPU utilization that
forms the basis of multithreaded computer systems
To discuss the APIs for the Pthreads, Win32, and Java thread libraries
To examine issues related to multithreaded programming
Motivation
Most software applications that run on modern computers are multithreaded.
An application typically is implemented as a separate process with several threads
of control. A web browser might have one thread display images or
text while another thread retrieves data from the network, for example. A
word processor may have a thread for displaying graphics, another thread for
responding to keystrokes from the user, and a third thread for performing
spelling and grammar checking in the background. Applications can also
be designed to leverage processing capabilities on multicore systems. Such
applications can perform several CPU-intensive tasks in parallel across the
multiple computing cores.
Benefits
The benefits of multithreaded programming can be broken down into four major
categories:
1. Responsiveness. Multithreading an interactive application may allow a program
to continue running even if part of it is blocked or is performing a lengthy
operation, thereby increasing responsiveness to the user.
2. Resource sharing. Processes can only share resources through techniques such
as shared memory and message passing. Such techniques must be explicitly
arranged by the programmer.
3. Economy. Allocating memory and resources for process creation is costly.
Because threads share the resources of the process to which they belong, it is
more economical to create and context-switch threads
4. Scalability. The benefits of multithreading can be even greater in a multiprocessor architecture, where threads may be running in parallel on different processing cores. A single-threaded process can run on only one processor, regardless of how many are available.
Q) Multicore Programming
Programming Challenges
1. Identifying tasks. This involves examining applications to find areas that can be divided into separate, concurrent tasks. Ideally, tasks are independent of one another and thus can run in parallel on individual cores.
2. Balance. While identifying tasks that can run in parallel, programmers must also ensure that the tasks perform equal work of equal value. In some instances, a certain task may not contribute as much value to the overall process as other tasks. Using a separate execution core to run that task may not be worth the cost.
3. Data splitting. Just as applications are divided into separate tasks, the data accessed and manipulated by the tasks must be divided to run on separate cores.
4. Data dependency. The data accessed by the tasks must be examined for dependencies between two or more tasks. When one task depends on data from another, programmers must ensure that the execution of the tasks is synchronized to accommodate the data dependency.
5. Testing and debugging. When a program is running in parallel on multiple cores, many different execution paths are possible. Testing and debugging such concurrent programs is inherently more difficult than testing and debugging single-threaded applications.
Types of Parallelism
In general, there are two types of parallelism: data parallelism and task parallelism. Data parallelism focuses on distributing subsets of the same data across multiple computing cores and performing the same operation on each core. Consider, for example, summing the contents of an array of size N. On a single-core system, one thread would simply sum the elements [0] . . . [N − 1]. On a dual-core system, however, thread A, running on core 0, could sum the elements [0] . . . [N/2 − 1] while thread B, running on core 1, could sum the elements [N/2] . . . [N − 1]. The two threads would be running in parallel on separate computing cores (a short Pthreads sketch of this two-thread summation is given after the multithreading models below).
Task parallelism involves distributing not data but tasks (threads) across multiple computing cores. Each thread is performing a unique operation. Different threads may be operating on the same data, or they may be operating on different data. Consider again our example above. In contrast to that situation, an example of task parallelism might involve two threads, each performing a unique statistical operation on the array of elements. The threads again are operating in parallel on separate computing cores, but each is performing a unique operation.
Fundamentally, then, data parallelism involves the distribution of data across multiple cores and task parallelism on the distribution of tasks across multiple cores. In practice, however, few applications strictly follow either data or task parallelism. In most instances, applications use a hybrid of these two strategies.
Q) Multithreading Models
User Threads
Thread management done by user-level threads library. Three primary thread libraries:
POSIX Pthreads, Win32 threads, Java threads
Kernel Threads
Supported by the Kernel
Examples: Windows XP/2000, Solaris, Linux, Tru64 UNIX, Mac OS X
Multithreading Models:
Many-to-One
One-to-One
Many-to-Many
• MANY TO ONE
Many user-level threads mapped to single kernel thread
Examples: Solaris Green Threads, GNU Portable Threads
• One-to-One
Each user-level thread maps to kernel thread
Examples: Windows NT/XP/2000, Linux, Solaris 9 and later
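Referring back to the data-parallelism example under "Types of Parallelism", the sketch below (illustrative, not from the notes; the array contents and N are made up) sums the two halves of an array in two Pthreads threads and combines the partial results.
#include <pthread.h>
#include <stdio.h>
#define N 8
static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
struct range { int lo, hi, sum; };      /* half-open range [lo, hi) and its partial sum */
static void *sum_range(void *arg)
{
struct range *r = (struct range *)arg;
r->sum = 0;
for (int i = r->lo; i < r->hi; i++)
r->sum += data[i];
return NULL;
}
int main(void)
{
struct range a = {0, N / 2, 0};         /* thread A: elements [0 .. N/2-1]  */
struct range b = {N / 2, N, 0};         /* thread B: elements [N/2 .. N-1] */
pthread_t ta, tb;
pthread_create(&ta, NULL, sum_range, &a);
pthread_create(&tb, NULL, sum_range, &b);
pthread_join(ta, NULL);
pthread_join(tb, NULL);
printf("sum = %d\n", a.sum + b.sum);    /* combine the two partial sums */
return 0;
}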
implementation. Operating-system designers may implement thespecification in
• Many-to-Many Model any way they wish. Numerous systems implement the Pthreads specification;
Allows many user level threads to most are UNIX-type systems, including Linux, Mac OS X, and Solaris. Although
be mapped to many kernel threads Windows doesn’t support Pthreads natively, some thirdparty implementations for
Allows the operating system to Windows are available
create a sufficient number of kernel #include <pthread.h>
threads #include <stdio.h>
Solaris prior to version 9 int sum; /* this data is shared by the thread(s) */
Windows NT/2000 with the void *runner(void *param); /* threads call this function */
ThreadFiber package int main(int argc, char *argv[])
• Two-level Model { pthread t tid; /* the thread identifier */
Similar to M:M, except that it allows pthread attr t attr; /* set of thread attributes */
a user thread to be bound to kernel if (argc != 2) { fprintf(stderr,"usage: a.out <integer value>\n");
thread Examples return -1;
IRIX }if (atoi(argv[1]) < 0) { fprintf(stderr,"%d must be >= 0\n",atoi(argv[1]));
return -1;
}
/* get the default attributes */
pthread attr init(&attr);
/* create the thread */
pthread create(&tid,&attr,runner,argv[1]);
/* wait for the thread to exit */
pthread join(tid,NULL);
printf("sum = %d\n",sum);
}
/* The thread will begin control in this function */
void *runner(void *param)
{ int i, upper = atoi(param);
sum = 0;
- HP-UX
for (i = 1; i <= upper; i++)
Tru64 UNIX
sum += i;
Solaris 8 and earlier
pthread exit(0);
Q)Thread Libraries
}
A thread library provides the programmer with an API for creating and managing
Figure Multithreaded C program using the Pthreads API.
threads. There are two primary ways of implementing a thread library. The first
Windows Threads
approach is to provide a library entirely in user space with no kernel support. All
The technique for creating threads using theWindows thread library is similar
code and data structures for the library exist in user space. This means that
to the Pthreads technique in several ways. We illustrate the Windows thread
invoking a function in the library results in a local function call in user space and
API in the C program shown in Figure . Notice that we must include the
not a system call.
windows.h header file when using theWindows API.
Pthreads
#include <windows.h>
Pthreads refers to the POSIX standard (IEEE 1003.1c) defining an API for thread
#include <stdio.h>
creation and synchronization. This is a specification for thread behavior, not an
DWORD Sum; /* data is shared by the thread(s) */
/* the thread runs in this separate function */ Java Threads
DWORD WINAPI Summation(LPVOID Param) Threads are the fundamental model of program execution in a Java program,
{ DWORD Upper = *(DWORD*)Param; and the Java language and its API provide a rich set of features for the creation
for (DWORD i = 0; i <= Upper; i++) and management of threads. All Java programs comprise at least a single thread
Sum += i; of control—even a simple Java program consisting of only a main() method
return 0; runs as a single thread in the JVM. Java threads are available on any system that
} provides a JVM includingWindows, Linux, and Mac OS X. The Java thread API
int main(int argc, char *argv[]) is available for Android applications as well.
{ DWORD ThreadId; There are two techniques for creating threads in a Java program. One
HANDLE ThreadHandle; approach is to create a new class that is derived from the Thread class and
int Param; to override its run() method. An alternative—and more commonly used—
if (argc != 2) { fprintf(stderr,"An integer parameter is required\n"); technique is to define a class that implements the Runnable interface. The
return -1; Runnable interface is defined as follows:
}Param = atoi(argv[1]); public interface Runnable
if (Param < 0) { fprintf(stderr,"An integer >= 0 is required\n"); {
return -1; public abstract void run();
} }
/* create the thread */ start() method creates the new thread. Calling the start() method for the new
ThreadHandle = CreateThread( object does two things:
NULL, /* default security attributes */ 1. It allocates memory and initializes a new thread in the JVM.
0, /* default stack size */ 2. It calls the run() method, making the thread eligible to be run by the JVM.
Summation, /* thread function */ (Note again that we never call the run() method directly. Rather, we call the
&Param, /* parameter to thread function */ start() method, and it calls the run() method on our behalf.)
0, /* default creation flags */ Q)Implicit Threading
&ThreadId); /* returns the thread identifier */ With the continued growth of multicore processing, applications containing
if (ThreadHandle != NULL) { /* now wait for the thread to finish */ hundreds—or even thousands—of threads are looming on the horizon.
WaitForSingleObject(ThreadHandle,INFINITE); Designing such applications is not a trivial undertaking: programmers must
/* close the thread handle */ address not only the challenges outlined in Section 4.2 but additional difficulties
CloseHandle(ThreadHandle); as well.
printf("sum = %d\n",Sum); Thread Pools
} Thread pools offer these benefits:
} 1. Servicing a request with an existing thread is faster than waiting to create a
Figure Multithreaded C program using the Windows API. thread.
In situations that require waiting for multiple threads to complete, the 2. A thread pool limits the number of threads that exist at any one point. This is
WaitForMultipleObjects() function is used. This function is passed four particularly important on systems that cannot support a large number of
parameters: concurrent threads.
1. The number of objects to wait for 3. Separating the task to be performed from the mechanics of creating the task
2. A pointer to the array of objects allows us to use different strategies for running the task. For example, the task
3. A flag indicatingwhether all objects have been signaled could be scheduled to execute after a time delay or to execute periodically The
4. A timeout duration (or INFINITE) Windows API provides several functions related to thread pools. Using the thread
pool API is similar to creating a thread with the Thread Create() functionHere, a
function that is to run as a separate thread is defined. Such a function may appear emerging technologies for managing multithreaded applications. Other
as follows: commercial approaches include parallel and concurrent libraries, such as Intel’s
DWORD WINAPI PoolFunction(AVOID Param) { Threading Building Blocks (TBB) and several products from Microsoft.The Java
/* language and API have seen significant movement toward supporting concurrent
* this function runs as a separate thread. programming as well. A notable example is the java.util.concurrent package,
*/ which supports implicit thread creation and management.
} Q) Threading Issues
A pointer to Pool Function() is passed to one of the functions in the thread • The fork() and exec() System Calls
pool API, and a thread from the pool executes this function. One suchmember If one thread in a program calls fork(), does the new process duplicate
in the thread pool API is the Queue User Work Item() function, which is passed all threads, or is the new process single-threaded? Some UNIX systems have
three parameters: chosen to have two versions of fork(), one that duplicates all threads and
• LPTHREAD START ROUTINE Function—a pointer to the function that is to run as another that duplicates only the thread that invoked the fork() system call.
a separate thread That is, if a thread invokes the exec() system call, the program
• PVOID Param—the parameter passed to Function specified in the parameter to exec() will replace the entire process—including
• ULONG Flags—flags indicating how the thread pool is to create and manage all threads.
execution of the thread Which of the two versions of fork() to use depends on the application.
OpenMP If exec() is called immediately after forking, then duplicating all threads is
OpenMP is a set of compiler directives as well as an API for programs written unnecessary, as the program specified in the parameters to exec() will replace
in C, C++, or FORTRAN that provides support for parallel programming in the process. In this instance, duplicating only the calling thread is appropriate.
shared-memory environments. If, however, the separate process does not call exec() after forking, the separate
#include <omp.h> process should duplicate all threads
#include <stdio.h> • Signal Handling
int main(int argc, char *argv[]) A signal is used in UNIX systems to notify a process that a particular event has
{ /* sequential code */ occurred. A signal may be received either synchronously or asynchronously,
#pragma omp parallel depending on the source of and the reason for the event being signaled. All
{ printf("I am a parallel region."); signals, whether synchronous or asynchronous, follow the same pattern:
} 1. A signal is generated by the occurrence of a particular event.
/* sequential code */ 2. The signal is delivered to a process.
return 0; 3. Once delivered, the signal must be handled.
} A signal may be handled by one of two possible handlers:
Grand Central Dispatch 1. A default signal handler
Grand Central Dispatch (GCD)—a technology for Apple’s Mac OS X and iOS 2. A user-defined signal handler
operating systems—is a combination of extensions to the C language, an API, • Thread Cancellation
and a run-time library that allows application developers to identify sections Thread cancellation involves terminating a thread before it has completed. For
of code to run in parallel. Like OpenMP, GCD manages most of the details of example, if multiple threads are concurrently searching through a database and
threading. one thread returns the result, the remaining threads might be canceled
GCD identifies extensions to the C and C++ languages known as blocks. A A thread that is to be canceled is often referred to as the target thread.
block is simply a self-contained unit of work. It is specified by a caret ˆ inserted Cancellation of a target thread may occur in two different scenarios:
in front of a pair of braces { }. A simple example of a block is shown below: 1. Asynchronous cancellation. One thread immediately terminates the
ˆ{ printf("I am a block"); } target thread.
Other Approaches 2. Deferred cancellation. The target thread periodically checks whether it
Thread pools, OpenMP, and Grand Central Dispatch are just a few of many
should terminate, allowing it an opportunity to terminate itself in an Q)CPU SCHEDULING – BASIC CONCEPTS
orderly fashion. CPU Scheduling
• Thread-Local Storage
Threads belonging to a process share the data of the process. Indeed, this To introduce CPU scheduling,
data sharing provides one of the benefits of multithreaded programming. which is the basis for
However, in some circumstances, each thread might need its own copy of multiprogrammed operating
certain data.We will call such data thread-local storage (or TLS.) For example, systems
in a transaction-processing system, we might service each transaction in a To describe various CPU-
separate thread. Furthermore, each transaction might be assigned a unique scheduling algorithms
identifier. To associate each thread with its unique identifier, we could use To discuss evaluation criteria
thread-local storage. for selecting a CPU-scheduling
It is easy to confuse TLS with local variables. However, local variables algorithm for a particular syste
are visible only during a single function invocation, whereas TLS data are Maximum CPU utilization
visible across function invocations. In some ways, TLS is similar to static obtained with
data. The difference is that TLS data are unique to each thread. Most thread multiprogramming
libraries—including Windows and Pthreads—provide some form of support CPU–I/O Burst Cycle – Process
for thread-local storage; Java provides support as well. execution consists of a cycle of
• Scheduler Activations CPU execution and I/O wait
A final issue to be considered
with multithreaded programs
concerns communication
between the kernel and the
thread library, which may be CPU burst distribution
required by the many-to- The success of CPU scheduling depends on an observed property of processes:
many and two-level models process execution consists of a cycle of CPU execution and I/O wait. Processes
Such coordination allows the alternate between these two states. Process execution begins with a CPU burst.
number of kernel threads to That is followed by an I/O burst, which is followed by another CPU burst, then
be dynamically adjusted to another I/O burst, and so on. Eventually, the final CPU burst ends with a system
help ensure the best request to terminate execution
performance
Many systems implementing
either the many-to-many or
the two-level
model place an intermediate
data structure between the
user and kernel
threads. This data
structure—typically known as a lightweight process, or
LWP
HISTOGRAM OF CPU BURST TIMES
turnaround time. Turnaround time is the sum of the periods spent waiting
CPU Scheduler to get into memory, waiting in the ready queue, executing on the CPU, and
Selects from among the processes in memory that are ready to execute, and doing I/O.
allocates the CPU to one of them • Waiting time. The CPU-scheduling algorithm does not affect the amount
CPU scheduling decisions may take place when a process: of time during which a process executes or does I/O. It affects only the
1. Switches from running to waiting state amount of time that a process spends waiting in the ready queue.Waiting
2. Switches from running to ready state time is the sum of the periods spent waiting in the ready queue.
3. Switches from waiting to ready • Response time. In an interactive system, turnaround time may not be
4. Terminates the best criterion. Often, a process can produce some output fairly early
Preemptive Scheduling and can continue computing new results while previous results are being
CPU-scheduling decisions may take place under the following four circumstances: output to the user
1. When a process switches from the running state to the waiting state (for Q)SCHEDULING ALGORITHM
example, as the result of an I/O request or an invocation of wait() for A Process Scheduler schedules different processes to be assigned to the CPU
the termination of a child process) based on particular scheduling algorithms. There are six popular process
2. When a process switches from the running state to the ready state (for scheduling algorithms which we are going to discuss in this chapter −
example, when an interrupt occurs) 1. First-Come, First-Served (FCFS) Scheduling
3. When a process switches from the waiting state to the ready state (for 2. Shortest-Job-Next (SJN) Scheduling
example, at completion of I/O) 3. Priority Scheduling
4. When a process terminates 4. Round Robin(RR) Scheduling
Dispatcher 5. Multiple-Level Queues Scheduling
Another component involved in the CPU-scheduling function is the dispatcher. 6. Multilevel Feedback Queue Scheduling
The dispatcher is the module that gives control of the CPUto the process selected These algorithms are either non-preemptive or preemptive. Non-preemptive
by the short-term scheduler. This function involves the following: algorithms are designed so that once a process enters the running state, it cannot
• Switching context be preempted until it completes its allotted time, whereas the preemptive
• Switching to user mode scheduling is based on priority where a scheduler may preempt a low priority
• Jumping to the proper location in the user programto restart that program running process anytime when a high priority process enters into a ready state.
The dispatcher should be as fast as possible, since it is invoked during every 1. First Come First Serve (FCFS)
process switch. The time it takes for the dispatcher to stop one process and Jobs are executed on first come, first serve basis.
start another running is known as the dispatch latency. It is a non-preemptive, pre-emptive scheduling algorithm.
Q)Scheduling Criteria Easy to understand and implement.
• CPU utilization. We want to keep the CPU as busy as possible. Conceptually, Its implementation is based on FIFO queue.
CPU utilization can range from 0 to 100 percent. In a real system, it Poor in performance as average wait time is high.
should range from 40 percent (for a lightly loaded system) to 90 percent By far the simplest CPU-scheduling algorithm is the first-come, first-served
(for a heavily loaded system). (FCFS) scheduling algorithm. With this scheme, the process that requests the
• Throughput. If the CPU is busy executing processes, then work is being CPU first is allocated the CPU first. The implementation of the FCFS policy is
done. One measure of work is the number of processes that are completed easily managed with a FIFO queue. When a process enters the ready queue, its
per time unit, called throughput. For long processes, this ratemay be one PCB is linked onto the tail of the queue. When the CPU is free, it is allocated to
process per hour; for short transactions, it may be ten processes per second. the process at the head of the queue. The running process is then removed from
• Turnaround time. From the point of view of a particular process, the the queue. The code for FCFS scheduling is simple to write and understand.
important criterion is how long it takes to execute that process. The interval
from the time of submission of a process to the time of completion is the
Process Arrival Time Execution Time Service Time
P0 0 5 0
P1 1 3 5
P2 2 8 8
P3 3 6 16
Wait time of each process is as follows
Process Wait Time : Service Time - Arrival Time
P0 0 - 0 = 0
P1 5 - 1 = 4
P2 8 - 2 = 6
P3 16 - 3 = 13
Average Wait Time: (0 + 4 + 6 + 13) / 4 = 5.75
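The FCFS service and waiting times above can be recomputed mechanically. The short sketch below (illustrative, not part of the original example) walks the processes in arrival order and prints the same figures.
#include <stdio.h>
int main(void)
{
int arrival[] = {0, 1, 2, 3};
int burst[]   = {5, 3, 8, 6};
int n = 4, clock = 0, total_wait = 0;
for (int i = 0; i < n; i++) {          /* processes are already in arrival order */
if (clock < arrival[i])
clock = arrival[i];                /* CPU idles until the job arrives */
int service = clock;                   /* time the job first gets the CPU */
int wait = service - arrival[i];
printf("P%d: service=%d wait=%d\n", i, service, wait);
total_wait += wait;
clock += burst[i];
}
printf("average wait = %.2f\n", (double)total_wait / n);   /* prints 5.75 */
return 0;
}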
2. Shortest Job Next (SJN)
This is also known as shortest job first, or SJF.
This can be implemented as either a non-preemptive or a preemptive scheduling algorithm.
Best approach to minimize waiting time.
Easy to implement in Batch systems where required CPU time is known in advance.
Impossible to implement in interactive systems where required CPU time is not known.
The processor should know in advance how much time the process will take.
Given: Table of processes, and their Arrival time, Execution time
Waiting time of each process is as follows −
Process Waiting Time
P0 0 - 0 = 0
P1 5 - 1 = 4
P2 14 - 2 = 12
P3 8 - 3 = 5
Average Wait Time: (0 + 4 + 12 + 5) / 4 = 21 / 4 = 5.25
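The same check can be done for SJN. The sketch below (again illustrative, using the same workload P0(0,5), P1(1,3), P2(2,8), P3(3,6)) repeatedly picks the shortest burst among the processes that have already arrived and reproduces the waiting times above.
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
int arrival[] = {0, 1, 2, 3};
int burst[]   = {5, 3, 8, 6};
bool done[4]  = {false};
int n = 4, clock = 0, finished = 0, total_wait = 0;
while (finished < n) {
int pick = -1;
/* among the processes that have arrived, choose the shortest burst */
for (int i = 0; i < n; i++)
if (!done[i] && arrival[i] <= clock &&
(pick == -1 || burst[i] < burst[pick]))
pick = i;
if (pick == -1) { clock++; continue; }   /* nothing has arrived yet: CPU idle */
int wait = clock - arrival[pick];
printf("P%d: service=%d wait=%d\n", pick, clock, wait);
total_wait += wait;
clock += burst[pick];
done[pick] = true;
finished++;
}
printf("average wait = %.2f\n", (double)total_wait / n);   /* prints 5.25 */
return 0;
}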
3. Priority Based Scheduling
Priority scheduling is a non-preemptive algorithm and one of the most common Process Waiting Time
scheduling algorithms in batch systems.
Each process is assigned a priority. Process with highest priority is to be executed P0 0-0=0
first and so on.
Processes with same priority are executed on first come first served basis. P1 11 - 1 = 10
Priority can be decided based on memory requirements, time requirements or
any other resource requirement. P2 14 - 2 = 12
Given: Table of processes, and their Arrival time, Execution time, and priority.
Here we are considering 1 is the lowest priority. P3 5-3=2

Process Arrival Time Execution Time Priority Service Time


Average Wait Time: (0 + 10 + 12 + 2)/4 = 24 / 4 = 6
4. Round Robin Scheduling
P0 0 5 1 0
Round Robin is the preemptive process scheduling algorithm.
Each process is provided a fix time to execute, it is called a quantum.
P1 1 3 2 11
Once a process is executed for a given time period, it is preempted and other
process executes for a given time period.
P2 2 8 1 14
Context switching is used to save states of preempted processes.
P3 3 6 3 5

Wait time of each process is as follows −

Process Wait Time : Service Time - Arrival Time

P0 (0 - 0) + (12 - 3) = 9

Waiting time of each process is as follows − P1 (3 - 1) = 2

P2 (6 - 2) + (14 - 9) + (20 - 17) = 12

P3 (9 - 3) + (17 - 12) = 11
Average Wait Time: (9+2+12+11) / 4 = 8.5 change their foreground or background nature. This setup has the advantage
5. Multiple-Level Queues Scheduling of low scheduling overhead, but it is inflexible
Multiple-level queues are not an independent scheduling algorithm. They make Q)Thread Scheduling
use of other existing algorithms to group and schedule jobs with common Distinction between user-level and kernel-level threads
characteristics. Many-to-one and many-to-many models, thread library schedules user-level
Multiple queues are maintained for processes with common characteristics. threads to run on LWP
Each queue can have its own scheduling algorithms. Known as process-contention scope (PCS) since scheduling competition is within
Priorities are assigned to each queue. the process
For example, CPU-bound jobs can be scheduled in one queue and all I/O-bound Kernel thread scheduled onto available CPU is system-contention scope (SCS) –
jobs in another queue. The Process Scheduler then alternately selects jobs from competition among all threads in system
each queue and assigns them to the CPU based on the algorithm assigned to the Contention Scope
queue One distinction between user-level and kernel-level threads lies in how they
Let’s look at an example of a multilevel queue scheduling algorithm with are scheduled. On systems implementing the many-to-one (Section 4.3.1) and
five queues, listed below in order of priority: many-to-many (Section 4.3.3) models, the thread library schedules user-level
1. System processes threads to run on an available LWP. This scheme is known as processcontention
2. Interactive processes scope (PCS), since competition for the CPU takes place among
3. Interactive editing processes threads belonging to the same process.
4. Batch processes Pthread Scheduling
5. Student processes We provided a sample POSIX Pthread program in Section 4.4.1, along with an
introduction to thread creation with Pthreads. Now, we highlight the POSIX
Pthread API that allows specifying PCS or SCS during thread creation. Pthreads
identifies the following contention scope values:
• PTHREAD_SCOPE_PROCESS schedules threads using PCS scheduling.
• PTHREAD_SCOPE_SYSTEM schedules threads using SCS scheduling.
The Pthread IPC provides two functions for getting—and setting—the contention scope policy:
• pthread_attr_setscope(pthread_attr_t *attr, int scope)
• pthread_attr_getscope(pthread_attr_t *attr, int *scope)
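A minimal sketch of these calls follows (not from the notes). Some systems, Linux among them, support only PTHREAD_SCOPE_SYSTEM, so the setscope call may fail; the error check below reflects that.
#include <pthread.h>
#include <stdio.h>
static void *work(void *arg) { return arg; }
int main(void)
{
pthread_attr_t attr;
pthread_t tid;
int scope;
pthread_attr_init(&attr);
/* query the default contention scope */
if (pthread_attr_getscope(&attr, &scope) == 0)
printf("default scope: %s\n",
scope == PTHREAD_SCOPE_PROCESS ? "PCS" : "SCS");
/* request system-contention-scope (SCS) scheduling */
if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
fprintf(stderr, "SCS scheduling not supported here\n");
pthread_create(&tid, &attr, work, NULL);
pthread_join(tid, NULL);
pthread_attr_destroy(&attr);
return 0;
}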
Q)Multiple-Processor Scheduling
In multiple-processor scheduling multiple CPU’s are available and hence Load
Sharing becomes possible. However multiple processor scheduling is
more complex as compared to single processor scheduling. In multiple processor
scheduling there are cases when the processors are identical i.e. HOMOGENEOUS,
in terms of their functionality, we can use any processor available to run any
process in the queue.
Approaches to Multiple-Processor Scheduling –
One approach is when all the scheduling decisions and I/O processing are handled
6. Multilevel Feedback Queue Scheduling by a single processor which is called the Master Server and the other processors
Normally, when the multilevel queue scheduling algorithm is used, processes executes only the user code. This is simple and reduces the need of data sharing.
are permanently assigned to a queue when they enter the system. If there This entire scenario is called Asymmetric Multiprocessing.
are separate queues for foreground and background processes, for example, A second approach uses Symmetric Multiprocessing where each processor is self
processes do not move from one queue to the other, since processes do not scheduling. All processes may be in a common ready queue or each processor may
have its own private queue for ready processes. The scheduling proceeds further Load Balancing is the phenomena which keeps
by having the scheduler for each processor examine the ready queue and select a the workload evenly distributed across all processors in an SMP system. Load
process to execute. balancing is necessary only on systems where each processor has its own private
Processor Affinity – queue of process which are eligible to execute. Load balancing is unnecessary
Processor Affinity means a processes has an affinity for the processor on which it because once a processor becomes idle it immediately extracts a runnable process
is currently running. from the common run queue. On SMP(symmetric multiprocessing), it is important
When a process runs on a specific processor there are certain effects on the cache to keep the workload balanced among all processors to fully utilize the benefits of
memory. The data most recently accessed by the process populate the cache for having more than one processor else one or more processor will sit idle while
the processor and as a result successive memory access by the process are often other processors have high workloads along with lists of processors awaiting the
satisfied in the cache memory. Now if the process migrates to another processor, CPU.
the contents of the cache memory must be invalidated for the first processor and There are two general approaches to load balancing :
the cache for the second processor must be repopulated. Because of the high cost Push Migration – In push migration a task routinely checks the load on each
of invalidating and repopulating caches, most of the SMP(symmetric processor and if it finds an imbalance then it evenly distributes load on each
multiprocessing) systems try to avoid migration of processes from one processor processors by moving the processes from overloaded to idle or less busy
to another and try to keep a process running on the same processor. This is processors.
known as PROCESSOR AFFINITY. Pull Migration – Pull Migration occurs when an idle processor pulls a waiting task
There are two types of processor affinity: from a busy processor for its execution.
• Soft Affinity – When an operating system has a policy of attempting to Multicore Processors –
keep a process running on the same processor but not guaranteeing it In multicore processors multiple processor cores are places on the same physical
will do so, this situation is called soft affinity. chip. Each core has a register set to maintain its architectural state and thus
• Hard Affinity – Hard Affinity allows a process to specify a subset of appears to the operating system as a separate physical processor. SMP
processors on which it may run. Some systems such as Linux implements systems that use multicore processors are faster and consume less power than
soft affinity but also provide some system calls systems in which each processor has its own physical chip.
like sched_setaffinity() that supports hard affinity. However multicore processors may complicate the scheduling problems. When
Load Balancing – processor accesses memory then it spends a significant amount of time waiting
for the data to become available. This situation is called MEMORY STALL. It occurs
for various reasons such as cache miss, which is accessing the data that is not in
the cache memory. In such cases the processor can spend upto fifty percent of its
time waiting for data to become available from the memory. To solve this
problem recent hardware designs have implemented multithreaded processor
cores in which two or more hardware threads are assigned to each core.
Therefore if one thread stalls while waiting for the memory, core can switch to
another thread.
There are two ways to multithread a processor :
Coarse-Grained Multithreading – In coarse grained multithreading a thread
executes on a processor until a long latency event such as a memory stall occurs,
because of the delay caused by the long latency event, the processor must switch
to another thread to begin execution. The cost of switching between threads is
high as the instruction pipeline must be terminated before the other thread can
begin execution on the processor core. Once this new thread begins execution it
begins filling the pipeline with its instructions.
Fine-Grained Multithreading – This multithreading switches between threads at a —as when a timer expires—or in hardware—as when a remote-controlled
much finer level mainly at the boundary of an instruction cycle. The architectural vehicle detects that it is approaching an obstruction.When an event occurs, the
design of fine grained systems include logic for thread switching and as a result system must respond to and service it as quickly as possible.We refer to event
the cost of switching between threads is small. latency as the amount of time that elapses from when an event occurs to when
Virtualization and Threading – it is serviced
In this type of multiple-processor scheduling even a single CPU system acts like a
multiple-processor system. In a system with Virtualization, the virtualization Two types of latencies affect
presents one or more virtual CPU to each of virtual machines running on the the performance of real-time
system and then schedules the use of physical CPU among the virtual machines. systems:
Most virtualized environments have one host operating system and many guest 1. Interrupt latency
operating systems. The host operating system creates and manages the virtual 2. Dispatch latency
machines. Each virtual machine has a guest operating system installed and Interrupt latency refers to the
applications run within that guest.Each guest operating system may be assigned period of time fromthe
for specific use cases,applications or users including time sharing or even real- arrival of an interrupt
time operation. Any guest operating-system scheduling algorithm that assumes a at the CPU to the start of the
certain amount of progress in a given amount of time will be negatively impacted routine that services the
by the virtualization. A time sharing operating system tries to allot 100 interrupt. When an
milliseconds to each time slice to give users a reasonable response time. A given interrupt occurs, the
100 millisecond time slice may take much more than 100 milliseconds of virtual operating system must first
CPU time. Depending on how busy the system is, the time slice may take a second complete the instruction it
or more which results in a very poor response time for users logged into that is executing and determine the type of interrupt that occurred. It must then
virtual machine. The net effect of such scheduling layering is that individual save the state of the current process before servicing the interrupt using the
virtualized operating systems receive only a portion of the available CPU cycles, specific interrupt service routine (ISR). The total time required to perform these
even though they believe they are receiving all cycles and that they are scheduling tasks is the interrupt latency
all of those cycles. Commonly, the time-of-day clocks in virtual machines are
incorrect because timers take longer to trigger than they would on dedicated
CPUs.
Virtualizations can thus undo the good scheduling-algorithm efforts of the
operating systems within virtual machines
Q) Real-Time CPU Scheduling
CPU scheduling for real-time operating systems involves special issues. In
general, we can distinguish between soft real-time systems and hard real-time
systems. Soft real-time systems provide no guarantee as to when a critical
real-time process will be scheduled. They guarantee only that the process will
be given preference over noncritical processes. Hard real-time systems have
stricter requirements. A task must be serviced by its deadline; service after the
deadline has expired is the same as no service at all. In this section, we explore
several issues related to process scheduling in both soft and hard real-time
operating systems.
Minimizing Latency
Consider the event-driven nature of a real-time system. The system is typically
waiting for an event in real time to occur. Events may arise either in software
P3 2
P4 5
P5 14
Which of the following algorithms will perform best on this workload?
First Come First Served (FCFS), Non Preemptive Shortest Job First (SJF) and Round
Robin (RR). Assume a quantum of 8 milliseconds.
Before looking at the answers, try to calculate the figures for each algorithm.
The advantages of deterministic modeling is that it is exact and fast to compute.
The disadvantage is that it is only applicable to the workload that you use to test.
As an example, use the above workload but assume P1 only has a burst time of 8
milliseconds. What does this do to the average waiting time?
Of course, the workload might be typical and scale up but generally deterministic
modeling is too specific and requires too much knowledge about the workload.
Queuing Models
Another method of evaluating scheduling algorithms is to use queuing theory.
Using data from real processes we can arrive at a probability distribution for the
length of a burst time and the I/O times for a process. We can now generate these
times with a certain distribution.
We can also generate arrival times for processes (arrival time distribution).
we diagram the makeup of dispatch latency. The conflict If we define a queue for the CPU and a queue for each I/O device we can test the
phase of dispatch latency has two components: various scheduling algorithms using queuing theory.
1. Preemption of any process running in the kernel Knowing the arrival rates and the service rates we can calculate various figures
2. Release by low-priority processes of resources needed by a high-priority such as average queue length, average wait time, CPU utilization etc.
Process One useful formula is Little's Formula.
Evaluation of Process Scheduling Algorithms n = λw
The first thing we need to decide is how we will evaluate the algorithms. To do Where
this we need to decide on the relative importance of the factors we listed above n is the average queue length
(Fairness, Efficiency, Response Times, Turnaround and Throughput). Only once we λ is the average arrival rate for new processes (e.g. five a second)
have decided on our evaluation method can we carry out the evaluation. w is the average waiting time in the queue
Deterministic Modeling Knowing two of these values we can, obviously, calculate the third. For example,
This evaluation method takes a predetermined workload and evaluates each if we know that eight processes arrive every second and there are normally
algorithm using that workload. sixteen processes in the queue we can compute that the average waiting time per
Assume we are presented with the following processes, which all arrive at time process is two seconds.
zero. The main disadvantage of using queuing models is that it is not always easy to
define realistic distribution times and we have to make assumptions. This results
in the model only being an approximation of what actually happens.
Process Burst Time Simulations
Rather than using queuing models we simulate a computer. A Variable,
P1 9
representing a clock is incremented. At each increment the state of the simulation
P2 33 is updated.
Statistics are gathered at each clock tick so that the system performance can be
analysed.
The data to drive the simulation can be generated in the same way as the queuing
model, although this leads to similar problems.
Alternatively, we can use trace data. This is data collected from real processes on
real machines and is fed into the simulation. This can often provide good results
and good comparisons over a range of scheduling algorithms.
However, simulations can take a long time to run, can take a long time to
implement and the trace data may be difficult to collect and require large
amounts of storage.
Implementation
The best way to compare algorithms is to implement them on real machines. This
will give the best results but does have a number of disadvantages.
· It is expensive as the algorithm has to be written and then implemented on real
hardware.
· If typical workloads are to be monitored, the scheduling algorithm must be used
in a live situation. Users may not be happy with an environment that is constantly
changing.
· If we find a scheduling algorithm that performs well there is no guarantee that
this state will continue if the workload or environment changes.

UNIT III system
Q)DEADLOCKS-SYSTEM MODEL The resource-allocation graph shown in Figure 7.1
To develop a description of deadlocks, which prevent sets of concurrent processes depicts the following
from completing their tasks. To present a number of different methods for situation.
preventing or avoiding deadlocks in a computer system • The sets P, R, and E:
A process must request a resource before using it and must release the resource ◦ P = {P1, P2, P3}
after using it. A process may request as many resources as it requires to carry out ◦ R = {R1, R2, R3, R4}
its designated task. Obviously, the number of resources requested may not ◦ E = {P1 → R1, P2 → R3, R1 → P2, R2 → P2, R2 →
exceed the total number of resources available in the system. In other words, a P1, R3 → P3}
process cannot request three printers if the system has only two. Under the • Resource instances:
normal mode of operation, a process may utilize a resource in only the following ◦ One instance of resource type R1
sequence: ◦ Two instances of resource type R2
1. Request. The process requests the resource. If the request cannot be granted ◦ One instance of resource type R3
immediately (for example, if the resource is being used by another process), then ◦ Three instances of resource type R4
the requesting process must wait until it can acquire the resource. • Process states:
2. Use. The process can operate on the resource (for example, if the resource is a ◦ Process P1 is holding an instance of resource type
printer, the process can print on the printer). R2 and is waiting for
3. Release. The process releases the resource. an instance of resource type R1.
Q) Deadlock Characterization ◦ Process P2 is holding an instance of R1 and an instance of R2 and is
➢ Necessary Conditions waiting for an instance of R3.
Adeadlock situation can arise if the following four conditions hold simultaneously ◦ Process P3 is holding an instance of R3.
in a system: Given the definition of a resource-allocation graph, it can be shown that, if the
1. Mutual exclusion. At least one resource must be held in a non-sharable mode; graph contains no cycles, then no process in the system is deadlocked. If the graph
that is, only one process at a time can use the resource. If another process does contain a cycle, then a deadlock may
requests that resource, the requesting process must be delayed until the resource exist
has been released. To illustrate this concept, we return to the
2. Hold and wait. A process must be holding at least one resource and waiting to resource-allocation graphdepicted in Figure
acquire additional resources that are currently being held by other processes. 7.1. Suppose that process P3 requests an
3. No preemption. Resources cannot be preempted; that is, a resource can be instance of resource type R2. Since no
released only voluntarily by the process holding it, after that process has resource instance is currently available,we
completed its task. add a request edge P3→ R2 to the graph
4. Circular wait. A set {P0, P1, ..., Pn} of waiting processes must exist such that P0 (Figure 7.2). At this point, two minimal cycles
is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ..., exist in the system:
Pn−1 is waiting for a resource held by Pn, and Pn is waiting for a resource held by P1 → R1 → P2 → R3 → P3 → R2 → P1
P0. P2 → R3 → P3 → R2 → P2
➢ Resource-Allocation Graph Processes P1, P2, and P3 are deadlocked.
Deadlocks can be described more precisely in terms of a directed graph called Process P2 is waiting for the resource
a system resource-allocation graph. This graph consists of a set of vertices V R3, which is held by process P3. Process P3 is
and a set of edges E. The set of vertices V is partitioned into two different types waiting for either process P1 or
of nodes: P = {P1, P2, ..., Pn}, the set consisting of all the active processes in the process P2 to release resource R2. In addition, process P1 is waiting for process
system, and R = {R1, R2, ..., Rm}, the set consisting of all resource types in the P2 to release resource R1.
Now consider the resource- Prevention techniques –
allocation graph in Figure 7.3. In this Mutual exclusion – is supported by the OS.
example, Hold and Wait – condition can be prevented by requiring that a process requests
we also have a cycle: all its required resources at one time and blocking the process until all of its
P1 → R1 → P3 → R2 → P1 requests can be granted at a same time simultaneously. But this prevention does
However, there is no deadlock. not yield good result because :
Observe that process P4 may long waiting time required
release its instance of resource type in efficient use of allocated resource
R2. That resource can then be A process may not know all the required resources in advance
allocated to P3, breaking the cycle. No pre-emption – techniques for ‘no pre-emption are’
In summary, if a resource-allocation If a process that is holding some resource, requests another resource that can not
graph does not have a cycle, then be immediately allocated to it, the all resource currently being held are released
the system is not in a deadlocked and if necessary, request them again together with the additional resource.
state. If there is a cycle, then the If a process requests a resource that is currently held by another process, the OS
system may or may not be in a may pre-empt the second process and require it to release its resources. This
deadlocked state. This observation works only if both the processes do not have same priority.
is important when we deal Circular wait One way to ensure that this condition never hold is to impose a total
with the deadlock problem. ordering of all resource types and to require that each process requests resource
Q)Methods for Handling Deadlocks in an increasing order of enumeration, i.e., if a process has been allocated
Generally speaking, we can deal with the deadlock problem in one of three resources of type R, then it may subsequently request only those resources of
ways: types following R in ordering.
• We can use a protocol to prevent or avoid deadlocks, ensuring that the Q) Deadlock Avoidance
system will never enter a deadlocked state. Requires that the system has some additional a priori information available
• We can allow the system to enter a deadlocked state, detect it, and recover. Simplest and most useful model requires that each process declare the maximum
• We can ignore the problem altogether and pretend that deadlocks never number of resources of each type that it may need
occur in the system. The deadlock-avoidance algorithm dynamically examines the resource-allocation
Ensure that the system will never enter a deadlock state Allow the system to state to ensure that there can never be a circular-wait condition
enter a deadlock state and then recover Ignore the problem and pretend that Resource-allocation state is defined by the number of available and allocated
deadlocks never occur in the system; used by most operating systems, including resources, and the maximum demands of the processes
UNIX ➢ Safe State
There are three approaches to deal with deadlocks. When a process requests an available
1. Deadlock Prevention resource, system must decide if immediate
2. Deadlock avoidance allocation leaves the system in a safe state
3. Deadlock detection System is in safe state if there exists a
Q) Deadlock Prevention sequence <P1, P2, …, Pn> of ALL the processes
Deadlock Prevention : is the systems such that for each Pi, the
The strategy of deadlock prevention is to design the system in such a way that the resources that Pi can still request can be
possibility of deadlock is excluded. Indirect method prevent the occurrence of one satisfied by currently available resources +
of three necessary condition of deadlock i.e., mutual exclusion, no pre-emption resources held by all the Pj, with j < in That is:
and hold and wait. Direct method prevent the occurrence of circular wait. If Pi resource needs are not immediately
available, then Pi can wait until all Pj have
When Pj is finished, Pi can obtain needed resources, execute, return allocated If Max[i][j] equals k, then process Pi may request at most k instances of
resources, and terminate When Pi terminates, Pi +1 can obtain its needed resource type Rj .
resources, and so on • Allocation. An n × m matrix defines the number of resources of each type
Basic Facts currently allocated to each process. If Allocation[i][j] equals k, then process
If a system is in safe state Þ no deadlocks. If a system is in unsafe state Þ Pi is currently allocated k instances of resource type Rj .
possibility of deadlock Avoidance Þ ensure that a system will never enter an • Need. An n × m matrix indicates the remaining resource need of each
unsafe state. process. If Need[i][j] equals k, then process Pi may need k more instances
➢ Resource-Allocation Graph Scheme of resource type Rj to complete its task. Note that Need[i][j] equalsMax[i][j]
Claim edge Pi® Rj indicated that process Pj − Allocation[i][j].
may request resource Rj; represented by a ➢ Safety Algorithm
dashed linen Claim edge converts to We can now present the algorithm for finding out whether or not a system is
request edge when a process requests a in a safe state. This algorithm can be described as follows:
resource Request edge converted to an 1. Let Work and Finish be vectors of length m and n, respectively. Initialize
assignment edge when the resource is Work = Available and Finish[i] = false for i = 0, 1, ..., n − 1.
allocated to the process 2. Find an index i such that both
When a resource is released by a process, a. Finish[i] == false
assignment edge reconverts to a claim b. Needi ≤ Work
edge Resources must be claimed a priori in If no such i exists, go to step 4.
the system 3. Work = Work + Allocationi
Finish[i] = true
Go to step 2.
4. If Finish[i] == true for all i, then the system is in a safe state.
➢ Banker’s Algorithm This algorithm may require an order ofm × n2 operations to determine whether
The resource-allocation-graph algorithm is not applicable to a resource allocation a state is safe.
system with multiple instances of each resource type. The deadlock avoidance ➢ Resource-Request Algorithm
algorithm that we describe next is applicable to such a system but is less efficient Next, we describe the algorithm for determining whether requests can be safely
than the resource-allocation graph scheme. This algorithm is commonly known as granted.
the banker’s algorithm. Let Requesti be the request vector for process Pi . If Requesti [ j] == k, then
Several data structures must be process Pi wants k instances of resource type Rj. When a request for resources
maintained to implement the banker’s is made by process Pi, the following actions are taken:
algorithm. These data structures encode 1. If Requesti ≤Needi , go to step 2. Otherwise, raise an error condition, since
the state of the resource-allocation the process has exceeded its maximum claim.
system. We need the following data 2. If Requesti ≤ Available, go to step 3. Otherwise, Pi must wait, since the
structures, where n is the number of resources are not available.
processes in the system and m is the 3. Have the system pretend to have allocated the requested resources to
number of resource types: process Pi by modifying the state as follows:
• Available. Avectorof lengthmindicates Available = Available–Requesti ;
the number of available resources Allocationi = Allocationi + Requesti ;
of each type. If Available[j] equals k, Needi = Needi –Requesti ;
then k instances of resource type Rj If the resulting resource-allocation state is safe, the transaction is completed,
are available. and process Pi is allocated its resources. However, if the new state is unsafe, then
• Max. An n × m matrix defines the maximum demand of each process. Pi must wait for Requesti , and the old resource-allocation state is restored.
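The following is a compact C sketch of the Safety Algorithm above (steps 1-4). It is illustrative only; the sizes N and M and the snapshot values in main() are made-up examples, not data from the notes.
#include <stdbool.h>
#include <stdio.h>
#define N 5   /* number of processes      */
#define M 3   /* number of resource types */
bool is_safe(int available[M], int allocation[N][M], int need[N][M])
{
int work[M];
bool finish[N] = { false };
for (int j = 0; j < M; j++)                 /* step 1: Work = Available */
work[j] = available[j];
for (;;) {                                  /* step 2: find i with Finish[i]==false and Need_i <= Work */
int found = -1;
for (int i = 0; i < N; i++) {
if (finish[i]) continue;
bool fits = true;
for (int j = 0; j < M; j++)
if (need[i][j] > work[j]) { fits = false; break; }
if (fits) { found = i; break; }
}
if (found == -1) break;                 /* no such i: go to step 4 */
for (int j = 0; j < M; j++)             /* step 3: Work = Work + Allocation_i */
work[j] += allocation[found][j];
finish[found] = true;
}
for (int i = 0; i < N; i++)                 /* step 4: safe iff all Finish[i] are true */
if (!finish[i]) return false;
return true;
}
int main(void)
{
/* made-up snapshot: 5 processes, 3 resource types */
int available[M]     = { 3, 3, 2 };
int allocation[N][M] = { {0,1,0}, {2,0,0}, {3,0,2}, {2,1,1}, {0,0,2} };
int max[N][M]        = { {7,5,3}, {3,2,2}, {9,0,2}, {2,2,2}, {4,3,3} };
int need[N][M];
for (int i = 0; i < N; i++)                 /* Need = Max - Allocation */
for (int j = 0; j < M; j++)
need[i][j] = max[i][j] - allocation[i][j];
printf("state is %s\n", is_safe(available, allocation, need) ? "safe" : "unsafe");
return 0;
}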
Q)Deadlock Detection
If a system does not employ either a deadlock-prevention or a deadlock-avoidance algorithm, then a deadlock situation may occur. In this environment, the system may provide:
• An algorithm that examines the state of the system to determine whether a deadlock has occurred
• An algorithm to recover from the deadlock
▪ Single Instance of Each Resource Type
If all resources have only a single instance, then we can define a deadlock-detection algorithm that uses a variant of the resource-allocation graph, called a wait-for graph. We obtain this graph from the resource-allocation graph by removing the resource nodes and collapsing the appropriate edges. More precisely, an edge from Pi to Pj in a wait-for graph implies that process Pi is waiting for process Pj to release a resource that Pi needs. An edge Pi → Pj exists in a wait-for graph if and only if the corresponding resource-allocation graph contains two edges Pi → Rq and Rq → Pj for some resource Rq. In Figure 7.9, we present a resource-allocation graph and the corresponding wait-for graph.
As before, a deadlock exists in the system if and only if the wait-for graph contains a cycle. To detect deadlocks, the system needs to maintain the wait-for graph and periodically invoke an algorithm that searches for a cycle in the graph. An algorithm to detect a cycle in a graph requires an order of n² operations, where n is the number of vertices in the graph.
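For the single-instance case, the cycle search over the wait-for graph is just a depth-first search. A minimal sketch in C follows; the adjacency-matrix representation and the fixed process count are assumptions made for illustration:

#include <stdbool.h>

#define NPROC 5                       /* number of processes (assumed) */

/* wait_for[i][j] is true if Pi is waiting for Pj to release a resource. */
static bool wait_for[NPROC][NPROC];

/* DFS that looks for a cycle reachable from process u. */
static bool dfs(int u, bool visited[], bool on_stack[])
{
    visited[u] = on_stack[u] = true;
    for (int v = 0; v < NPROC; v++) {
        if (!wait_for[u][v]) continue;
        if (on_stack[v]) return true;                      /* back edge => cycle => deadlock */
        if (!visited[v] && dfs(v, visited, on_stack)) return true;
    }
    on_stack[u] = false;
    return false;
}

/* Returns true if the wait-for graph contains a cycle. */
bool deadlock_detected(void)
{
    bool visited[NPROC] = { false }, on_stack[NPROC] = { false };
    for (int i = 0; i < NPROC; i++)
        if (!visited[i] && dfs(i, visited, on_stack))
            return true;
    return false;
}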
▪ Several Instances of a Resource Type
The wait-for graph scheme is not applicable to a resource-allocation system with multiple instances of each resource type. We turn now to a deadlock-detection algorithm that is applicable to such a system. The algorithm employs several time-varying data structures that are similar to those used in the banker’s algorithm:
• Available. A vector of length m indicates the number of available resources of each type.
• Allocation. An n × m matrix defines the number of resources of each type currently allocated to each process.
• Request. An n × m matrix indicates the current request of each process. If Request[i][j] equals k, then process Pi is requesting k more instances of resource type Rj.
ALGORITHM:
1. Let Work and Finish be vectors of length m and n, respectively. Initialize Work = Available. For i = 0, 1, ..., n–1, if Allocationi ≠ 0, then Finish[i] = false. Otherwise, Finish[i] = true.
2. Find an index i such that both
a. Finish[i] == false
b. Requesti ≤ Work
If no such i exists, go to step 4.
3. Work = Work + Allocationi
Finish[i] = true
Go to step 2.
4. If Finish[i] == false for some i, 0 ≤ i < n, then the system is in a deadlocked state. Moreover, if Finish[i] == false, then process Pi is deadlocked.
This algorithm requires an order of m × n² operations to detect whether the system is in a deadlocked state.
▪ Detection-Algorithm Usage
If deadlocks occur frequently, then the detection algorithm should be invoked frequently. Resources allocated to deadlocked processes will be idle until the deadlock can be broken. In addition, the number of processes involved in the deadlock cycle may grow.
Deadlocks occur only when some process makes a request that cannot be granted immediately.
Q)Recovery from Deadlock in Operating System
When a Deadlock Detection Algorithm determines that a deadlock has occurred in the system, the system must recover from that deadlock. There are two approaches to breaking a deadlock:
1. Process Termination:
To eliminate the deadlock, we can simply kill one or more processes. For this, we use two methods:
(a). Abort all the Deadlocked Processes:
Aborting all the processes will certainly break the deadlock, but with a great
expense. The deadlocked processes may have computed for a long time, and the results of those partial computations must be discarded and will probably have to be recomputed later.
(b). Abort one process at a time until deadlock is eliminated:
Abort one deadlocked process at a time, until the deadlock cycle is eliminated from the system. Due to this method, there may be considerable overhead, because after aborting each process, we have to run the deadlock detection algorithm to check whether any processes are still deadlocked.
2. Resource Preemption:
To eliminate deadlocks using resource preemption, we preempt some resources from processes and give those resources to other processes. This method raises three issues –
(a). Selecting a victim:
We must determine which resources and which processes are to be preempted and also the order to minimize the cost.
(b). Rollback:
We must determine what should be done with the process from which resources are preempted. One simple idea is total rollback. That means abort the process and restart it.
(c). Starvation:
In a system, it may happen that the same process is always picked as a victim. As a result, that process will never complete its designated task. This situation is called Starvation and must be avoided. One solution is that a process must be picked as a victim only a finite number of times.
Q) MEMORY MANAGEMENT – MAIN MEMORY - BACKGROUND
The term Memory can be defined as a collection of data in a specific format. It is used to store instructions and processed data. The memory comprises a large array or group of words or bytes, each with its own location. The primary motive of a computer system is to execute programs. These programs, along with the information they access, should be in the main memory during execution. The CPU fetches instructions from memory according to the value of the program counter.
▪ What is Main Memory:
The main memory is central to the operation of a modern computer. Main Memory is a large array of words or bytes, ranging in size from hundreds of thousands to billions. Main memory is a repository of rapidly available information shared by the CPU and I/O devices. Main memory is the place where programs and information are kept when the processor is effectively utilizing them. Main memory is associated with the processor, so moving instructions and information into and out of the processor is extremely fast. Main memory is also known as RAM (Random Access Memory). This memory is a volatile memory: RAM loses its data when a power interruption occurs.
▪ Basic Hardware
Main memory and the registers built into the processor itself are the only general-purpose storage that the CPU can access directly. There are machine instructions that take memory addresses as arguments, but none that take disk addresses. Therefore, any instructions in execution, and any data being used by the instructions, must be in one of these direct-access storage devices. If the data are not in memory, they must be moved there before the CPU can operate on them. Registers that are built into the CPU are generally accessible within one cycle of the CPU clock. Most CPUs can decode instructions and perform simple operations on register contents at the rate of one or more operations per clock tick. The same cannot be said of main memory, which is accessed via a transaction on the memory bus. Completing a memory access may take many cycles of the CPU clock. In such cases, the processor normally needs to stall, since it does not have the data required to complete the instruction that it is executing.
Protection of memory space is accomplished by having the CPU hardware
compare every address generated in user mode with the registers. Any attempt by a program executing in user mode to access operating-system memory or other users’ memory results in a trap to the operating system, which treats the attempt as a fatal error.
▪ Address Binding
• Compile time. If you know at compile time where the process will reside in memory, then absolute code can be generated. For example, if you know that a user process will reside starting at location R, then the generated compiler code will start at that location and extend up from there. If, at some later time, the starting location changes, then it will be necessary to recompile this code. The MS-DOS .COM-format programs are bound at compile time.
• Load time. If it is not known at compile time where the process will reside in memory, then the compiler must generate relocatable code. In this case, final binding is delayed until load time. If the starting address changes, we need only reload the user code to incorporate this changed value.
• Execution time. If the process can be moved during its execution from one memory segment to another, then binding must be delayed until run time. Special hardware must be available for this scheme to work.
▪ Logical and Physical Address Space:
Logical Address space: An address generated by the CPU is known as a “Logical Address”. It is also known as a Virtual address. Logical address space can be defined as the size of the process. A logical address can be changed.
Physical Address space: An address seen by the memory unit (i.e. the one loaded into the memory-address register of the memory) is commonly known as a “Physical Address”. A Physical address is also known as a Real address. The set of all physical addresses corresponding to these logical addresses is known as Physical address space. A physical address is computed by the MMU. The run-time mapping from virtual to physical addresses is done by a hardware device, the Memory Management Unit (MMU). The physical address always remains constant.
Static and Dynamic Loading:
Loading a process into the main memory is done by a loader. There are two different types of loading:
Static loading:- loading the entire program into a fixed address. It requires more memory space.
Dynamic loading:- The entire program and all data of a process must be in physical memory for the process to execute. So, the size of a process is limited to the size of physical memory. To gain proper memory utilization, dynamic loading is used. In dynamic loading, a routine is not loaded until it is called. All routines
are residing on disk in a relocatable load format. One of the advantages of dynamic loading is that an unused routine is never loaded. This loading is useful when a large amount of code is needed only to handle infrequently occurring cases.
▪ Dynamic Linking and Shared Libraries
Dynamically linked libraries are system libraries that are linked to user programs when the programs are run (refer back to Figure 8.3). Some operating systems support only static linking, in which system libraries are treated like any other object module and are combined by the loader into the binary program image. Dynamic linking, in contrast, is similar to dynamic loading. Here, though, linking, rather than loading, is postponed until execution time. This feature is usually used with system libraries, such as language subroutine libraries. With dynamic linking, only programs that are compiled with the new library version are affected by any incompatible changes incorporated in it. Other programs linked before the new library was installed will continue using the older library. This system is also known as shared libraries.
Q)Swapping
A process must be in memory to be executed. A process, however, can be swapped temporarily out of memory to a backing store and then brought back into memory for continued execution (Figure 8.5). Swapping makes it possible for the total physical address space of all processes to exceed the real physical memory of the system, thus increasing the degree of multiprogramming in a system.
➢ Standard Swapping
Standard swapping involves moving processes between main memory and a backing store. The backing store is commonly a fast disk. It must be large enough to accommodate copies of all memory images for all users, and it must provide direct access to these memory images. The system maintains a ready queue consisting of all processes whose memory images are on the backing store or in memory and are ready to run. Whenever the CPU scheduler decides to execute a process, it calls the dispatcher. The dispatcher checks to see whether the next process in the queue is in memory. If it is not, and if there is no free memory region, the dispatcher swaps out a process currently in memory and swaps in the desired process. It then reloads registers and transfers control to the selected process.
➢ Swapping on Mobile Systems
Although most operating systems for PCs and servers support some modified version of swapping, mobile systems typically do not support swapping in any form. Mobile devices generally use flash memory rather than more spacious hard disks as their persistent storage. The resulting space constraint is one reason why mobile operating-system designers avoid swapping. Other reasons include the limited number of writes that flash memory can tolerate before it becomes unreliable and the poor throughput between main memory and flash memory in these devices.
Q)Contiguous Memory Allocation
In contiguous memory allocation, each process is contained in a single section of memory that is contiguous to the section containing the next process.
➢ Memory Protection
When the CPU scheduler selects a process for execution, the dispatcher loads the relocation and limit registers with the correct values as part of the context switch. Because every address generated by a CPU is checked against these registers, we can protect both the operating system and the other users’ programs and data from being modified by this running process.
The relocation-register scheme provides an effective way to allow the operating system’s size to change dynamically. This flexibility is desirable in many situations.
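The relocation/limit check that the hardware performs on every user-mode reference can be summarized in a few lines of C. This is an illustrative sketch only; the register values and function name are assumptions, not part of the original notes:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Per-process values loaded by the dispatcher on a context switch (assumed). */
static uint32_t limit_reg      = 0x00300000;  /* size of the process's logical space */
static uint32_t relocation_reg = 0x00140000;  /* base physical address of the process */

/* Hardware-style translation: logical address -> physical address, or trap. */
uint32_t translate(uint32_t logical)
{
    if (logical >= limit_reg) {
        fprintf(stderr, "trap: addressing error at 0x%x\n", logical);
        exit(EXIT_FAILURE);                   /* the OS would treat this as a fatal error */
    }
    return relocation_reg + logical;          /* every legal address is relocated */
}

/* e.g. translate(0x2000) yields 0x142000 with the assumed register values. */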
➢ Memory Allocation
• First fit. Allocate the first hole that is big enough. Searching can start either at the beginning of the set of holes or at the location where the previous first-fit search ended. We can stop searching as soon as we find a free hole that is large enough.
• Best fit. Allocate the smallest hole that is big enough. We must search the entire list, unless the list is ordered by size. This strategy produces the smallest leftover hole.
• Worst fit. Allocate the largest hole. Again, we must search the entire list, unless it is sorted by size. This strategy produces the largest leftover hole, which may be more useful than the smaller leftover hole from a best-fit approach. (A sketch of the first-fit search follows below.)
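Below is a minimal first-fit allocator over a singly linked free list, in C. The node layout and names are assumptions for illustration; best fit and worst fit differ only in how the candidate hole is chosen before carving out the request:

#include <stddef.h>

struct hole {                 /* one free block of memory (assumed layout) */
    size_t start;             /* starting address of the hole */
    size_t size;              /* size of the hole in bytes */
    struct hole *next;
};

/* First fit: return the start address of the allocated region, or (size_t)-1 on failure. */
size_t first_fit_alloc(struct hole **free_list, size_t request)
{
    for (struct hole **pp = free_list; *pp != NULL; pp = &(*pp)->next) {
        struct hole *h = *pp;
        if (h->size < request)
            continue;                 /* hole too small, keep searching */
        size_t addr = h->start;
        h->start += request;          /* carve the request out of the front of the hole */
        h->size  -= request;
        if (h->size == 0)             /* hole fully used: unlink it */
            *pp = h->next;            /* (a real allocator would also free the node) */
        return addr;
    }
    return (size_t)-1;                /* no hole is big enough */
}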
➢ Fragmentation
As processes are loaded and removed from memory, the free memory space is broken into little pieces. It happens after some time that processes cannot be allocated to memory blocks because of their small size, and the memory blocks remain unused. This problem is known as Fragmentation.
Fragmentation is of two types –
External fragmentation
Total memory space is enough to satisfy a request or to hold a process, but it is not contiguous, so it cannot be used.
Internal fragmentation
The memory block assigned to a process is bigger than requested. Some portion of memory is left unused, as it cannot be used by another process.
Q)Segmentation
➢ Basic Method
Segmentation is a memory-management scheme that supports this programmer view of memory. A logical address space is a collection of segments. Each segment has a name and a length. The addresses specify both the segment name and the offset within the segment. The programmer therefore specifies each address by two quantities: a segment name and an offset.
For simplicity of implementation, segments are numbered and are referred to by a segment number, rather than by a segment name. Thus, a logical address consists of a two tuple:
<segment-number, offset>.
Normally, when a program is compiled, the compiler automatically constructs segments reflecting the input program.
A C compiler might create separate segments for the following:
1. The code
2. Global variables
3. The heap, from which memory is allocated
4. The stacks used by each thread
5. The standard C library
➢ Segmentation Hardware
Although the programmer can now refer to objects in the program by a two-dimensional address, the actual physical memory is still, of course, a one-dimensional sequence of bytes. Thus, we must define an implementation to map two-dimensional user-defined addresses into one-dimensional physical addresses. This mapping is effected by a segment table. Each entry in the segment table has a segment base and a segment limit. The segment base contains the starting physical address where the segment resides in memory, and the segment limit specifies the length of the segment.
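A sketch of the segment-table lookup the hardware performs is shown below in C; the structure layout and trap handling are illustrative assumptions:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct segment_entry {
    uint32_t base;    /* starting physical address of the segment */
    uint32_t limit;   /* length of the segment in bytes */
};

/* Translate a logical address <segment-number s, offset d> to a physical address. */
uint32_t segment_translate(const struct segment_entry *table, int s, uint32_t d)
{
    if (d >= table[s].limit) {                       /* offset beyond segment length */
        fprintf(stderr, "trap: segment %d offset 0x%x out of range\n", s, d);
        exit(EXIT_FAILURE);
    }
    return table[s].base + d;                        /* base + offset */
}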
Q)Paging
➢ Basic Method
The basic method for implementing paging involves breaking physical memory
into fixed-sized blocks called frames and breaking logical memory into blocks of
the same size called pages. When a process is to be executed, its pages are loaded
into any available memory frames from their source (a file system or the backing
store). The backing store is divided into fixed-sized blocks that are the same size
as the memory frames or clusters of multiple frames
The page number is used as an index into a page table. The page table contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit. The paging model of memory is shown in the figure.
Since the operating system is managing physical memory, it must be aware of the allocation details of physical memory—which frames are allocated, which frames are available, how many total frames there are, and so on. This information is generally kept in a data structure called a frame table.
➢ Hardware Support
The hardware implementation of the page table can be done in several ways. In the simplest case, the page table is implemented as a set of dedicated registers. These registers should be built with very high-speed logic to make the paging-address translation efficient. Every access to memory must go through the paging map, so efficiency is a major consideration. The CPU dispatcher reloads these registers, just as it reloads the other registers. Instructions to load or modify the page-table registers are, of course, privileged, so that only the operating system can change the memory map. The DEC PDP-11 is an example of such an architecture. The address consists of 16 bits, and the page size is 8 KB. The page table thus consists of eight entries that are kept in fast registers.
When the page table is instead kept in main memory, every data access would require an extra memory access to the page table. The standard solution to this problem is to use a special, small, fast-lookup hardware cache called a translation look-aside buffer (TLB). The TLB is associative, high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a value.
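The translation path (TLB first, then the page table) can be sketched as follows; the sizes, structures, and function names here are assumptions for illustration, not definitions from the notes:

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                  /* 4 KB pages (assumed) */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define TLB_SIZE   16

struct tlb_entry { bool valid; uint32_t page; uint32_t frame; };
static struct tlb_entry tlb[TLB_SIZE];
static uint32_t page_table[1024];      /* page number -> frame number (assumed size) */

uint32_t translate(uint32_t logical)
{
    uint32_t page   = logical >> PAGE_SHIFT;          /* page number */
    uint32_t offset = logical & (PAGE_SIZE - 1);       /* page offset */

    for (int i = 0; i < TLB_SIZE; i++)                 /* TLB hit: no page-table access needed */
        if (tlb[i].valid && tlb[i].page == page)
            return (tlb[i].frame << PAGE_SHIFT) | offset;

    uint32_t frame = page_table[page];                 /* TLB miss: index the page table */
    tlb[page % TLB_SIZE] = (struct tlb_entry){ true, page, frame };  /* simple replacement */
    return (frame << PAGE_SHIFT) | offset;
}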
➢ Shared Pages
An advantage of paging is the possibility of sharing common code. This
consideration is particularly important in a time-sharing environment. Consider a
system that supports 40 users, each of whom executes a text editor. If the text
editor consists of 150 KB of code and 50 KB of data space, we need 8,000 KB to
support the 40 users. If the code is reentrant code (or pure code), however, it
can be shared, as shown in the figure.
➢ Protection
Memory protection in a paged environment is accomplished by protection bits
associated with each frame. Normally, these bits are kept in the page table .
One additional bit is generally attached to each entry in the page table: a
valid–invalid bit. When this bit is set to valid, the associated page is in the
process’s logical address space and is thus a legal (or valid) page. When the
bit is set to invalid, the page is not in the process’s logical address space. Illegal
addresses are trapped by use of the valid–invalid bit. The operating system
sets this bit for each page to allow or disallow access to the page.
Q)Structure of the Page Table
The structure of the page table simply defines the ways in which a page table can be structured. Paging is a memory-management technique where a large process is divided into pages and is placed in physical memory, which is itself divided into frames. Frame and page size are equal. The operating system uses a page table to map the logical address of the page generated by the CPU to its physical address in the main memory.
Structure of Page Table
1. Hierarchical Page Table
2. Hashed Page Table
3. Inverted Page Table
4. Oracle SPARC Solaris
1. Hierarchical Page Table
As we know, when the CPU accesses a page of any process it has to be in the main memory. Along with the page, the page table of the same process must also be stored in the main memory. Now, what if the size of the page table is larger than the frame size of the main memory?
In that case, we have to break the page table down into multiple levels in order to fit it into the frames of the main memory. Let us understand this with the help of an example.
Consider that the size of main memory (physical memory) is 512 MB = 2^29 bytes (i.e. physical memory can be represented with 29 bits).
This physical memory is divided into a number of frames, where each frame size = 4 KB = 2^12 bytes (i.e. frame size can be represented with 12 bits).
Physical memory in bits = 29
Frame size in bits = 12
Number of bits used to represent the frame number = 29 – 12 = 17.
Total number of frames would be 2^17 = 1,31,072.
Now we have a physical memory of 512 MB which is divided into 1,31,072 frames, where each frame size is 4 KB.
Today, modern systems support logical address spaces of 2^32 to 2^64 bytes.
Consider we have a process whose size is 4 GB = 2^32 bytes.
Logical address in bits = 32
Page size in bits = 12
Number of bits used to represent the page number = 32 – 12 = 20.
So the total number of pages of the process would be 2^20, i.e. up to 1 million.
Now we have a process of size 4 GB which is divided into 1 million pages, each of size 4 KB.
Next, we have to implement a page table to store the information of the pages of the process, i.e. which page is stored in which frame of the memory. As we have 1 million pages of the process, there will be 1 million entries in the page table. In the page table, the page number provided by the CPU in the logical address acts as an index which leads you to the frame number where that page is stored in main memory. This means each entry of the page table holds a frame number.
As we have seen above, the frame number is represented by 17 bits, so the size of each entry will be 17 bits, i.e. approx. 2 bytes.
Size of page table = number of entries * size of each entry = 2^20 * 2 bytes = 2^21 bytes = 2 MB.
So we need to divide the page table. This division of the page table can be accomplished in several ways: you can perform a two-level division, a three-level division on the page table, and so on.
Let us discuss two-level paging, in which the page table itself is paged. So we will have two page tables: an inner page table and an outer page table. We have a logical address with a 20-bit page number and a 12-bit page offset. As we are paging the page table, the page number will further get split into a 10-bit page number and a 10-bit offset, as you can see in the image below.
Here P1 would act as an index into the outer page table and P2 as the offset within it; further, P2 would act as an index into the inner page table and d as the offset, to map the logical address of the page to the physical memory.
2. Hashed Page Table
When the logical address space is beyond 32 bits, the page table is structured using a hashed page table. Though we can structure a large page table using the multilevel page table, it would consist of a number of levels, which increases the complexity of the page table.
The hashed page table is a convenient way to structure the page table where the logical address space is beyond 32 bits. The hash table has several entries, where each entry has a linked list. Each linked list has a set of linked elements where each element hashes to the same location. Each element has three entries: page number, frame number and a pointer to the next element. We will understand the working of this page table better with the help of an example.
The CPU generates a logical address for the page it needs. Now, this logical address needs to be mapped to the physical address. This logical address has two entries, i.e. a page number (P3 in our case) and an offset.
The page number from the logical address is directed to the hash function. The hash function produces a hash value corresponding to the page number. This hash value directs to an entry in the hash table. As we have studied earlier, each entry in the hash table has a linked list. Here the page number is compared with the first element’s first entry; if a match is found then the second entry is checked.
In our example, the logical address includes page number P3, which does not match the first element of the linked list as it includes page number P1. So we move ahead and check the next element; now this element has a page number entry, i.e. P3, so we further check the frame entry of the element, which is fr 5. To this frame number, we append the offset provided in the logical address to reach the page’s physical address.
So, this is how the hashed page table works to map the logical address to the physical address.
3. Inverted Page Table
The concept of normal paging says that every process maintains its own page table, which includes the entries of all the pages belonging to the process. A large process may have a page table with millions of entries. Such a page table consumes a large amount of memory.
Consider we have six processes in execution. So, all six processes will have some of their pages in main memory, which would compel their page tables also to be in main memory, consuming a lot of space. This is the drawback of the paging concept.
The inverted page table is the solution to this wastage of memory. The concept of an inverted page table involves the existence of a single page table which has entries of all the pages (even if they belong to different processes) in main memory, along with the information of the process to which they are associated. To get a better understanding, consider the figure below of the inverted page table.
The CPU generates the logical address for the page it needs to access. This time the logical address consists of three entries: process id, page number and the offset. The process id identifies the process of which the page has been demanded, the page number indicates which page of the process has been asked for, and the offset value indicates the displacement required.
The match of the process id along with the associated page number is searched in the page table, and if the search is found at the ith entry of the page table, then i and the offset together generate the physical address for the requested page. This is how the logical address is mapped to a physical address using the inverted page table.
Though the inverted page table reduces the wastage of memory, it increases the search time. This is because the entries in an inverted page table are sorted on the basis of physical address whereas the lookup is performed using the logical address. It can happen that the entire table is searched to find the match.
So these are the three main techniques that can be used to structure a page table and that help the operating system map the logical address of the page required by the CPU to its physical address.
4. Oracle SPARC Solaris
Consider as a final example a modern 64-bit CPU and operating system that are tightly integrated to provide low-overhead virtual memory. Solaris running on the SPARC CPU is a fully 64-bit operating system and as such has to solve the problem of virtual memory without using up all of its physical memory by keeping multiple levels of page tables. Its approach is a bit complex but solves the problem efficiently using hashed page tables. There are two hash tables—one for the kernel and one for all user processes. Each maps memory addresses from virtual to physical memory. Each hash-table entry represents a contiguous area of mapped virtual memory, which is more efficient than having a separate hash-table entry for each page.
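As a concrete companion to the two-level scheme above, here is a minimal lookup sketch in C. The 10/10/12-bit split matches the worked example; the table types and names are assumptions for illustration:

#include <stdint.h>

#define OFFSET_BITS 12
#define INNER_BITS  10

/* outer_table[p1] points to an inner table; inner[p2] holds a frame number (assumed layout). */
uint32_t two_level_translate(uint32_t **outer_table, uint32_t logical)
{
    uint32_t p1     = logical >> (OFFSET_BITS + INNER_BITS);               /* top 10 bits */
    uint32_t p2     = (logical >> OFFSET_BITS) & ((1u << INNER_BITS) - 1); /* next 10 bits */
    uint32_t offset = logical & ((1u << OFFSET_BITS) - 1);                 /* low 12 bits */

    uint32_t *inner = outer_table[p1];     /* first lookup: outer page table */
    uint32_t frame  = inner[p2];           /* second lookup: inner page table */
    return (frame << OFFSET_BITS) | offset;
}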
Q)VIRTUAL MEMORY - BACKGROUND
The requirement that instructions must be in physical memory to be executed
seems both necessary and reasonable; but it is also unfortunate, since it limits the
size of a program to the size of physical memory. In fact, an examination of real
programs shows us that, in many cases, the entire program is not needed. For
instance, consider the following:
• Programs often have code to handle unusual error conditions. Since these errors
seldom, if ever, occur in practice, this code is almost never executed.
• Arrays, lists, and tables are often allocated more memory than they actually
need. An array may be declared 100 by 100 elements, even though it is seldom
larger than 10 by 10 elements. An assembler symbol table may have room for
3,000 symbols, although the average program has less than 200 symbols.
• Certain options and features of a program may be used rarely. For instance, the
routines on U.S. government computers that balance the budget have not been
used in many years.
Even in those cases where the entire program is needed, it may not all be needed
at the same time.
The ability to execute a program that is only partially in memory would confer
many benefits:
• A program would no longer be constrained by the amount of physical memory that is available. Users would be able to write programs for an extremely large virtual address space, simplifying the programming task.
• Because each user program could take less physical memory, more programs could be run at the same time, with a corresponding increase in CPU utilization and throughput but with no increase in response time or turnaround time.
• Less I/O would be needed to load or swap user programs into memory, so each user program would run faster. Thus, running a program that is not entirely in memory would benefit both the system and the user.
Virtual memory involves the separation of logical memory as perceived by users from physical memory. This separation allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available.
The virtual address space of a process refers to the logical (or virtual) view of how a process is stored in memory. Typically, this view is that a process
begins at a certain logical address—say, address 0—and exists in contiguous memory, as shown in the figure.
In addition to separating logical memory from physical memory, virtual memory allows files and memory to be shared by two or more processes through page sharing (Section 8.5.4). This leads to the following benefits:
• System libraries can be shared by several processes through mapping of the shared object into a virtual address space. Although each process considers the libraries to be part of its virtual address space, the actual pages where the libraries reside in physical memory are shared by all the processes (Figure 9.3). Typically, a library is mapped read-only into the space of each process that is linked with it.
• Similarly, processes can share memory, so that two or more processes can communicate through the use of shared memory. Virtual memory allows one process to create a region of memory that it can share with another process. Processes sharing this region consider it part of their virtual address space, yet the actual physical pages of memory are shared, much as is illustrated in the figure.
Q) DEMAND PAGING
A demand-paging mechanism is very much similar to a paging system with swapping, where processes are stored in secondary memory and pages are loaded only on demand, not in advance.
So, when a context switch occurs, the OS never copies any of the old program’s pages from the disk or any of the new program’s pages into the main memory. Instead, it starts executing the new program after loading the first page and fetches the program’s pages as they are referenced.
During the program execution, if the program references a page that is not available in the main memory because it was swapped out, the processor considers it an invalid memory reference. This causes a page fault, which transfers control back from the program to the OS, which then brings the page back into memory.
• Basic Concepts
When a process is to be swapped in, the pager guesses which pages will be used before the process is swapped out again. Instead of swapping in a whole process, the pager brings only those pages into memory. Thus, it avoids reading into memory pages that will not be used anyway, decreasing the swap time and the amount of physical memory needed.
The procedure for handling this page fault is straightforward:
1. We check an internal table (usually kept with the process control block) for this process to determine whether the reference was a valid or an invalid memory access.
2. If the reference was invalid, we terminate the process. If it was valid but we have not yet brought in that page, we now page it in.
3. We find a free frame (by taking one from the free-frame list, for example).
4. We schedule a disk operation to read the desired page into the newly allocated frame.
5. When the disk read is complete, we modify the internal table kept with the process and the page table to indicate that the page is now in memory.
6. We restart the instruction that was interrupted by the trap. The process can now access the page as though it had always been in memory.
The hardware to support demand paging is the same as the hardware for paging and swapping:
• Page table. This table has the ability to mark an entry invalid through a valid–invalid bit or a special value of protection bits.
• Secondary memory. This memory holds those pages that are not present in main memory. The secondary memory is usually a high-speed disk. It is known as the swap device, and the section of disk used for this purpose is known as swap space.
• Performance of Demand Paging
Demand paging can significantly affect the performance of a computer system. Let p be the probability of a page fault (0 ≤ p ≤ 1). We would expect p to be close to zero—that is, we would expect to have only a few page faults. The effective access time is then
effective access time = (1 − p) × ma + p × page-fault time.
To compute the effective access time, we must know how much time is needed to service a page fault. A page fault causes the following sequence to occur:
1. Trap to the operating system.
2. Save the user registers and process state.
3. Determine that the interrupt was a page fault.
4. Check that the page reference was legal and determine the location of the page on the disk.
5. Issue a read from the disk to a free frame:
a. Wait in a queue for this device until the read request is serviced.
b. Wait for the device seek and/or latency time.
c. Begin the transfer of the page to a free frame.
6. While waiting, allocate the CPU to some other user (CPU scheduling, optional).
7. Receive an interrupt from the disk I/O subsystem (I/O completed).
8. Save the registers and process state for the other user (if step 6 is executed).
9. Determine that the interrupt was from the disk.
10. Correct the page table and other tables to show that the desired page is now in memory.
11. Wait for the CPU to be allocated to this process again.
12. Restore the user registers, process state, and new page table, and then resume the interrupted instruction.
In any case, we are faced with three major components of the page-fault service time:
1. Service the page-fault interrupt.
2. Read in the page.
3. Restart the process.
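As a quick worked illustration of the formula above (the numbers are assumed, not taken from the notes): with a memory-access time ma = 200 nanoseconds, a page-fault service time of 8 milliseconds, and a fault probability p = 1/1,000, the effective access time = (1 − 0.001) × 200 + 0.001 × 8,000,000 ≈ 8,200 nanoseconds — a slowdown of roughly a factor of 40, which is why p must be kept extremely small for demand paging to perform well.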
Q) Copy-on-Write
Copy on Write, or simply COW, is a resource-management technique. One of its main uses is in the implementation of the fork system call, in which it shares the virtual memory (pages) of the OS.
In UNIX-like OSes, the fork() system call creates a duplicate process of the parent process, which is called the child process.
The idea behind copy-on-write is that when a parent process creates a child process, both of these processes initially share the same pages in memory, and these shared pages are marked as copy-on-write. This means that if either of these processes tries to modify a shared page, only then is a copy of that page created, and the modification is done on the copy of the page by that process, thus not affecting the other process.
Suppose there is a process P that creates a new process Q, and then process P modifies page 3. The figures below show what happens before and after process P modifies page 3.
Basic Page Replacement
Page replacement takes the following approach. If no frame is free, we find one that is not currently being used and free it. We can free a frame by writing its contents to swap space and changing the page table (and all other tables) to indicate that the page is no longer in memory (Figure 9.10). We can now use the freed frame to hold the page for which the process faulted. We modify the page-fault service routine to include page replacement:
1. Find the location of the desired page on the disk.
2. Find a free frame:
a. If there is a free frame, use it.
b. If there is no free frame, use a page-replacement algorithm to select a victim frame.
c. Write the victim frame to the disk; change the page and frame tables accordingly.
3. Read the desired page into the newly freed frame; change the page and frame tables.
4. Continue the user process from where the page fault occurred.
1. First In First Out (FIFO) –
This is the simplest page replacement algorithm. In this algorithm, the operating system keeps track of all pages in the memory in a queue; the oldest page is at the front of the queue. When a page needs to be replaced, the page at the front of the queue is selected for removal.
Example-1: Consider the page reference string 1, 3, 0, 3, 5, 6, 3 with 3 page frames. Find the number of page faults.
Initially all slots are empty, so when 1, 3, 0 come they are allocated to the empty slots —> 3 Page Faults.
When 3 comes, it is already in memory so —> 0 Page Faults.
Then 5 comes; it is not available in memory, so it replaces the oldest page slot, i.e. 1 —> 1 Page Fault.
6 comes; it is also not available in memory, so it replaces the oldest page slot, i.e. 3 —> 1 Page Fault.
Finally, when 3 comes it is not available, so it replaces 0 —> 1 Page Fault.
Belady’s anomaly – Belady’s anomaly proves that it is possible to have more page faults when increasing the number of page frames while using the First In First Out (FIFO) page replacement algorithm. For example, if we consider the reference string 3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4 and 3 slots, we get 9 total page faults, but if we increase the slots to 4, we get 10 page faults.
2. Optimal Page replacement –
In this algorithm, the page replaced is the one that would not be used for the longest duration of time in the future.
Example-2: Consider the page references 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, with 4 page frames. Find the number of page faults.
Initially all slots are empty, so when 7, 0, 1, 2 are allocated to the empty slots —> 4 Page Faults.
0 is already there so —> 0 Page Fault.
When 3 comes it will take the place of 7 because 7 is not used for the longest duration of time in the future —> 1 Page Fault.
0 is already there so —> 0 Page Fault.
4 will take the place of 1 —> 1 Page Fault.
Now for the further page reference string —> 0 Page Faults because they are already available in the memory.
Optimal page replacement is perfect, but not possible in practice as the operating system cannot know future requests. The use of Optimal Page replacement is to set up a benchmark so that other replacement algorithms can be analyzed against it.
3. Least Recently Used –
In this algorithm, the page replaced is the one that is least recently used.
Example-3: Consider the page reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2 with 4 page frames. Find the number of page faults.
Initially all slots are empty, so when 7, 0, 1, 2 are allocated to the empty slots —> 4 Page Faults.
0 is already there so —> 0 Page Fault.
When 3 comes it will take the place of 7 because 7 is least recently used —> 1 Page Fault.
0 is already in memory so —> 0 Page Fault.
4 will take the place of 1 —> 1 Page Fault.
Now for the further page reference string —> 0 Page Faults because they are already available in the memory.
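The FIFO policy above is easy to simulate. The following C sketch counts page faults for a given reference string; the array size limit and the sample string (Example-1's) are assumptions for illustration:

#include <stdio.h>
#include <stdbool.h>

int count_fifo_faults(const int refs[], int nrefs, int nframes)
{
    int frames[16];                       /* resident pages (nframes <= 16 assumed) */
    int next = 0, used = 0, faults = 0;

    for (int i = 0; i < nrefs; i++) {
        bool hit = false;
        for (int j = 0; j < used; j++)    /* is the page already resident? */
            if (frames[j] == refs[i]) { hit = true; break; }
        if (hit) continue;
        faults++;
        if (used < nframes) {             /* a free frame is still available */
            frames[used++] = refs[i];
        } else {                          /* replace the oldest page (circular index) */
            frames[next] = refs[i];
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = { 1, 3, 0, 3, 5, 6, 3 }; /* Example-1 reference string */
    printf("page faults = %d\n", count_fifo_faults(refs, 7, 3));   /* prints 6 */
    return 0;
}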
Q)Allocation of Frames
An important aspect of operating systems, virtual memory, is implemented using demand paging. Demand paging necessitates the development of a page-replacement algorithm and a frame-allocation algorithm. Frame-allocation algorithms are used if you have multiple processes; they help decide how many frames to allocate to each process.
• Minimum Number of Frames
Our strategies for the allocation of frames are constrained in various ways. We cannot, for example, allocate more than the total number of available frames (unless there is page sharing). We must also allocate at least a minimum number of frames. Here, we look more closely at the latter requirement.
One reason for allocating at least a minimum number of frames involves performance. Obviously, as the number of frames allocated to each process decreases, the page-fault rate increases, slowing process execution. In addition, remember that, when a page fault occurs before an executing instruction is complete, the instruction must be restarted. Consequently, we must have enough frames to hold all the different pages that any single instruction can reference.
• Frame allocation algorithms –
The two algorithms commonly used to allocate frames to a process are:
Equal allocation: In a system with x frames and y processes, each process gets an equal number of frames, i.e. x/y. For instance, if the system has 48 frames and 9 processes, each process will get 5 frames. The three frames which are not allocated to any process can be used as a free-frame buffer pool.
Disadvantage: In systems with processes of varying sizes, it does not make much sense to give each process equal frames. Allocation of a large number of frames to a small process will eventually lead to the wastage of a large number of allocated unused frames.
Proportional allocation: Frames are allocated to each process according to the process size.
For a process pi of size si, the number of allocated frames is ai = (si/S)*m, where S is the sum of the sizes of all the processes and m is the number of frames in the system. For instance, in a system with 62 frames, if there is a process of 10 KB and another process of 127 KB, then the first process will be allocated (10/137)*62 = 4 frames and the other process will get (127/137)*62 = 57 frames.
Advantage: All the processes share the available frames according to their needs, rather than equally.
• Global vs Local Allocation –
The number of frames allocated to a process can also dynamically change depending on whether you have used global replacement or local replacement for replacing pages in case of a page fault.
Local replacement: When a process needs a page which is not in the memory, it can bring in the new page and allocate it a frame from its own set of allocated frames only.
Advantage: The pages in memory for a particular process and the page-fault ratio are affected by the paging behavior of only that process.
Disadvantage: A low-priority process may hinder a high-priority process by not making its frames available to the high-priority process.
Global replacement: When a process needs a page which is not in the memory, it can bring in the new page and allocate it a frame from the set of all frames, even if that frame is currently allocated to some other process; that is, one process can take a frame from another.
Advantage: Does not hinder the performance of processes and hence results in greater system throughput.
Disadvantage: The page-fault ratio of a process cannot be solely controlled by the process itself. The pages in memory for a process depend on the paging behavior of other processes as well.
Q)THRASHING
If page faults and swapping happen very frequently at a higher rate, then the operating system has to spend more time swapping these pages. This state in the operating system is termed thrashing. Because of thrashing, the CPU utilization is going to be reduced.
Let’s understand by an example: if any process does not have the number of frames that it needs to support pages in active use, then it will quickly page fault. And at this point, the process must replace some pages. As all the pages of the process are actively in use, it must replace a page that will be needed again right away. Consequently, the process will quickly fault again, and again, and again, replacing pages that it must bring back in immediately. This high paging activity by a process is called thrashing.
During thrashing, the CPU spends less time doing actual productive work and more time swapping.
Causes of Thrashing
Thrashing affects the performance of execution in the operating system. Also, thrashing results in severe performance problems in the operating system.
When the utilization of the CPU is low, the process-scheduling mechanism tries to load many processes into memory at the same time, due to which the degree of multiprogramming can be increased. In this situation, there are more processes in memory than there are available frames, so only a limited number of frames can be allocated to each process.
Whenever any process with high priority arrives in the memory, and if a frame is not freely available at that time, then another process that occupies a frame will be moved to secondary storage, and this freed frame will be allocated to the higher-priority process.
We can also say that as soon as the memory fills up, the process starts spending a lot of time waiting for the required pages to be swapped in. Again the utilization of the CPU becomes low because most of the processes are waiting for pages.
Thus, a high degree of multiprogramming and a lack of frames are the two main causes of thrashing in the operating system.
Working-Set Model
This model is based on the assumption of locality. It makes use of the parameter Δ to define the working-set window. The main idea is to examine the most recent Δ page references. What locality says is that a recently used page can be used again, and the pages that are near this page will also be used.
1. Working Set
The set of pages in the most recent Δ page references is known as the working set. If a page is in active use, then it will be in the working set. If the page is no longer being used, it will drop from the working set Δ time units after its last reference.
The working set mainly gives an approximation of the locality of the program.
The accuracy of the working set mainly depends on what Δ is chosen.
This working-set model avoids thrashing while keeping the degree of multiprogramming as high as possible.
2. Page Fault Frequency
The working-set model is successful, and its knowledge can be useful in prepaging, but it is a very clumsy approach to avoiding thrashing. There is another technique used to avoid thrashing: Page Fault Frequency (PFF), which is a more direct approach.
The main problem is how to prevent thrashing. Thrashing has a high page-fault rate, and we want to control the page-fault rate.
When the page-fault rate is too high, we know that the process needs more frames. Conversely, if the page-fault rate is too low, then the process may have too many frames.
We can establish upper and lower bounds on the desired page-fault rate. If the actual page-fault rate exceeds the upper limit, then we allocate the process another frame. And if the page-fault rate falls below the lower limit, then we can remove a frame from the process.
Thus, with this, we can directly measure and control the page-fault rate in order to prevent thrashing.
Q)Memory-Mapped Files
Description:- Rather than accessing data files directly via the file system with every file access, data files can be paged into memory the same as process files, resulting in much faster accesses (except of course when page faults occur). This is known as memory mapping a file.
➢ Basic Mechanism
Basically a file is mapped to an address range within a process's virtual address space, and then paged in as needed using the ordinary demand-paging system.
Note that file writes are made to the memory page frames, and are not immediately written out to disk. (This is the purpose of the "flush ( )" system call, which may also be needed for stdout in some cases. See the time killer program for an example of this.)
This is also why it is important to "close ( )" a file when one is done writing to it - so that the data can be safely flushed out to disk and so that the memory frames can be freed up for other purposes.
Some systems provide special system calls to memory map files and use direct disk access otherwise. Other systems map the file to process address space if the special system calls are used and map the file to kernel address space otherwise, but do memory mapping in either case.
File sharing is made possible by mapping the same file to the address space of more than one process, as shown in Figure 9.23 below. Copy-on-write is supported, and mutual exclusion techniques ( chapter 6 ) may be needed to avoid synchronization problems.
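On POSIX systems the basic mechanism above is exposed through mmap(). The following is a small, hedged example of reading a file through a memory mapping; the file name and the minimal error handling are assumptions for illustration, not part of the original notes:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.txt", O_RDONLY);          /* file name assumed for the example */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Map the whole file read-only into this process's virtual address space. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    fwrite(p, 1, st.st_size, stdout);              /* pages are faulted in on demand */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}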
➢ Shared Memory in the Win32 API
Description:-
Windows implements shared memory using shared memory-mapped files, involving three basic steps:
Create a file, producing a HANDLE to the new file.
Name the file as a shared object, producing a HANDLE to the shared object.
Map the shared object to virtual memory address space, returning its base address as a void pointer (LPVOID).

#include <windows.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    HANDLE hMapFile;
    LPVOID lpMapAddress;

    hMapFile = OpenFileMapping(FILE_MAP_ALL_ACCESS,   /* R/W access */
                               FALSE,                 /* no inheritance */
                               TEXT("SharedObject")); /* name of mapped-file object */

    lpMapAddress = MapViewOfFile(hMapFile,            /* mapped object handle */
                                 FILE_MAP_ALL_ACCESS, /* read/write access */
                                 0,                   /* mapped view of entire file */
                                 0,
                                 0);

    /* read from shared memory */
    printf("Read message %s", (char *)lpMapAddress);

    UnmapViewOfFile(lpMapAddress);
    CloseHandle(hMapFile);
}

Figure 9.25 Consumer reading from shared memory using the Windows API.
Q)Allocating Kernel Memory
1. The Buddy System allocates memory using a power-of-two allocator.
2. Under this scheme, memory is always allocated as a power of 2 (4K, 8K, 16K, etc.), rounding up to the next nearest power of two if necessary.
3. If a block of the correct size is not currently available, then one is formed by splitting the next larger block in two, forming two matched buddies. (And if that larger size is not available, then the next largest available size is split, and so on.)
4. One nice feature of the buddy system is that if the address of a block is exclusively ORed with the size of the block, the resulting address is the address of the buddy of the same size, which allows for fast and easy coalescing of free blocks back into larger blocks.
5. Free lists are maintained for every size block.
6. If the necessary block size is not available upon request, a free block from the next largest size is split into two buddies of the desired size. (Larger size blocks are split recursively if necessary.)
7. When a block is freed, its buddy's address is calculated, and the free list for that size block is checked to see if the buddy is also free. If it is, then the two buddies are coalesced into one larger free block, and the process is repeated with successively larger free lists.
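Point 4 above (finding the buddy via exclusive OR) is worth seeing concretely. A tiny sketch in C, assuming block sizes are powers of two and offsets are measured from the start of the managed region:

#include <stdio.h>

/* Offset of a block's buddy, for a block of 'size' bytes at offset 'offset' from the pool start. */
unsigned long buddy_of(unsigned long offset, unsigned long size)
{
    return offset ^ size;
}

int main(void)
{
    /* Example: in a pool split into 16 KB blocks, the buddy of the block at 32 KB is at 48 KB,
       and the buddy of the block at 48 KB is back at 32 KB. */
    printf("%lu\n", buddy_of(32 * 1024, 16 * 1024));   /* prints 49152 (48 KB) */
    printf("%lu\n", buddy_of(48 * 1024, 16 * 1024));   /* prints 32768 (32 KB) */
    return 0;
}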
UNIT 4
Q) Overview of Mass-Storage Structure
• Magnetic Disks
• Traditional magnetic disks have the following basic structure:
• One or more platters in the form of disks covered with magnetic
media. Hard disk platters are made of rigid metal, while "floppy" disks are
made of more flexible plastic.
• Each platter has two working surfaces. Older hard disk drives would
sometimes not use the very top or bottom surface of a stack of platters,
as these surfaces were more susceptible to potential damage.
• Each working surface is divided into a number of concentric rings
called tracks. The collection of all tracks that are the same distance from
the edge of the platter, ( i.e. all tracks immediately above one another in
the following diagram ) is called a cylinder.
• Each track is further divided into sectors, traditionally containing 512
bytes of data each, although some modern disks occasionally use larger
sector sizes. ( Sectors also include a header and a trailer, including
checksum information among other things. Larger sector sizes reduce the fraction of the disk consumed by headers and trailers, but increase internal fragmentation and the amount of disk that must be marked bad in the case of errors. )
• The data on a hard drive is read by read-write heads. The standard configuration ( shown below ) uses one head per surface, each on a separate arm, and controlled by a common arm assembly which moves all heads simultaneously from one cylinder to another. ( Other configurations, including independent read-write heads, may speed up disk access, but involve serious technical difficulties. )
• The storage capacity of a traditional disk drive is equal to the number of heads ( i.e. the number of working surfaces ), times the number of tracks per surface, times the number of sectors per track, times the number of bytes per sector. A particular physical block of data is specified by providing the head-sector-cylinder number at which it is located.
• Disk heads "fly" over the surface on a very thin cushion of air. If they should accidentally contact the disk, then a head crash occurs, which may or may not permanently damage the disk or even destroy it completely. For this reason it is normal to park the disk heads when turning a computer off, which means to move the heads off the disk or to an area of the disk where there is no data stored.
➢ Solid-State Disks
1. As technologies improve and economics change, old technologies are often used in different ways. One example of this is the increasing use of solid-state disks, or SSDs.
2. SSDs use memory technology as a small fast hard disk. Specific implementations may use either flash memory or DRAM chips protected by a battery to sustain the information through power cycles.
3. Because SSDs have no moving parts they are much faster than traditional hard drives, and certain problems such as the scheduling of disk accesses simply do not apply.
4. However SSDs also have their weaknesses: they are more expensive than hard drives, generally not as large, and may have shorter life spans.
5. SSDs are especially useful as a high-speed cache of hard-disk information that must be accessed quickly. One example is to store filesystem meta-data, e.g. directory and inode information, that must be accessed quickly and often. Another variation is a boot disk containing the OS and some application executables, but no vital user data. SSDs are also used in laptops to make them smaller, faster, and lighter.
6. Because SSDs are so much faster than traditional hard disks, the throughput of the bus can become a limiting factor, causing some SSDs to be connected directly to the system PCI bus, for example.
• Magnetic Tapes -
1. Magnetic tapes were once used for common secondary storage before the days of hard disk drives, but today are used primarily for backups.
2. Accessing a particular spot on a magnetic tape can be slow, but once reading or writing commences, access speeds are comparable to disk drives.
3. Capacities of tape drives can range from 20 to 200 GB, and compression can double that capacity.
Q)Disk Structure
The traditional head-sector-cylinder (HSC) numbers are mapped to linear block addresses by numbering the first sector on the first head on the outermost track as sector 0. Numbering proceeds with the rest of the sectors on that same track, and then the rest of the tracks on the same cylinder, before proceeding through the rest of the cylinders to the center of the disk. In modern practice these linear block addresses are used in place of the HSC numbers for a variety of reasons:
1. The linear length of tracks near the outer edge of the disk is much longer than for those tracks located near the center, and therefore it is possible to squeeze many more sectors onto outer tracks than onto inner ones.
2. All disks have some bad sectors, and therefore disks maintain a few spare sectors that can be used in place of the bad ones. The mapping of spare sectors to bad sectors is managed internally by the disk controller.
3. Modern hard drives can have thousands of cylinders, and hundreds of sectors per track on their outermost tracks. These numbers exceed the range of HSC numbers for many ( older ) operating systems, and therefore disks can be configured for any convenient combination of HSC values that falls within the total number of sectors physically on the drive.
There is a limit to how closely packed individual bits can be placed on a physical medium, but that limit is growing increasingly more packed as technological advances are made.
Modern disks pack many more sectors into outer cylinders than inner ones, using one of two approaches:
With Constant Linear Velocity, CLV, the density of bits is uniform from cylinder to cylinder. Because there are more sectors in outer cylinders, the disk spins slower when reading those cylinders, causing the rate of bits passing under the read-write head to remain constant. This is the approach used by modern CDs and DVDs.
With Constant Angular Velocity, CAV, the disk rotates at a constant angular speed, with the bit density decreasing on outer cylinders. ( These disks would have a constant number of sectors per track on all cylinders. )
There is a limit to how closely packed individual bits can be placed on a physical
media, but that limit is growing increasingly more packed as technological
advances are made.
Modern disks pack many more sectors into outer cylinders than inner ones, using
one of two approaches:
With Constant Linear Velocity, CLV, the density of bits is uniform from cylinder to
cylinder. Because there are more sectors in outer cylinders, the disk spins slower
when reading those cylinders, causing the rate of bits passing under the read-
write head to remain constant. This is the approach used by modern CDs and
DVDs. ➢ Storage-Area Network
With Constant Angular Velocity, CAV, the disk rotates at a constant angular speed, A Storage-Area Network, SAN, connects computers and storage devices in a
with the bit density decreasing on outer cylinders. ( These disks would have a network, using storage protocols instead of network protocols.
constant number of sectors per track on all cylinders. ) One advantage of this is that storage access does not tie up regular networking
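The HSC-to-linear mapping described above can be sketched as follows, assuming an idealized uniform geometry ( real drives hide zone recording and bad-sector remapping, so this shows only the conceptual ordering; the geometry numbers are invented for the example ):

public class HscToLba {
    // Idealized HSC-to-linear-block-address mapping for a uniform geometry.
    static long toLba(long cylinder, long head, long sector,
                      long headsPerCylinder, long sectorsPerTrack) {
        return (cylinder * headsPerCylinder + head) * sectorsPerTrack + sector;
    }

    public static void main(String[] args) {
        // The block at cylinder 2, head 1, sector 7 on a drive with 4 heads
        // and 32 sectors per track maps to linear block (2*4 + 1)*32 + 7 = 295.
        System.out.println(toLba(2, 1, 7, 4, 32));
    }
}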
Q)Disk Attachment
Disk drives can be attached either directly to a particular host ( a local disk ) or to a network.
➢ Host-Attached Storage
Local disks are accessed through I/O Ports as described earlier.
The most common interfaces are IDE or ATA, each of which allow up to two drives per host controller. SATA is similar with simpler cabling.
High-end workstations or other systems in need of a larger number of disks typically use SCSI disks:
The SCSI standard supports up to 16 targets on each SCSI bus, one of which is generally the host adapter and the other 15 of which can be disk or tape drives.
A SCSI target is usually a single drive, but the standard also supports up to 8 units within each target. These would generally be used for accessing individual disks within a RAID array. ( See below. )
The SCSI standard also supports multiple host adapters in a single computer, i.e. multiple SCSI busses.
➢ Network-Attached Storage
Network-attached storage connects storage devices to computers using a remote procedure call, RPC, interface, typically with something like NFS filesystem mounts. This is convenient for allowing several computers in a group common access and naming conventions for shared storage.
NAS can be implemented using SCSI cabling, or ISCSI uses Internet protocols and standard network connections, allowing long-distance remote access to shared files.
NAS allows computers to easily share data storage, but tends to be less efficient than standard host-attached storage.
➢ Storage-Area Network
A Storage-Area Network, SAN, connects computers and storage devices in a network, using storage protocols instead of network protocols.
One advantage of this is that storage access does not tie up regular networking bandwidth.
SAN is very flexible and dynamic, allowing hosts and devices to attach and detach on the fly.
SAN is also controllable, allowing restricted access to certain hosts and devices.
Q)Disk Scheduling
As mentioned earlier, disk transfer speeds are limited primarily by seek times and rotational latency. When multiple requests are to be processed there is also some inherent delay in waiting for other requests to be processed.
Bandwidth is measured by the amount of data transferred divided by the total amount of time from the first request being made to the last transfer being completed, ( for a series of disk requests. )
Both bandwidth and access time can be improved by processing requests in a good order.
Disk requests include the disk address, memory address, number of sectors to transfer, and whether the request is for reading or writing.
1. FCFS Scheduling
First-Come First-Serve is simple and intrinsically fair, but not very efficient. Consider in the following sequence the wild swing from cylinder 122 to 14 and then back to 124:
2. SSTF Scheduling
Shortest Seek Time First scheduling is more efficient, but may lead to starvation if a constant stream of requests arrives for the same general area of the disk.
SSTF reduces the total head movement to 236 cylinders, down from 640 required for the same set of requests under FCFS. Note, however, that the distance could be reduced still further to 208 by starting with 37 and then 14 first before processing the rest of the requests.
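A small sketch of the FCFS and SSTF orderings discussed above; it assumes the classic textbook request queue 98, 183, 37, 122, 14, 124, 65, 67 with the head starting at cylinder 53, which is consistent with the 640- and 236-cylinder totals quoted:

import java.util.*;

public class DiskScheduling {
    // Total head movement if requests are serviced in arrival (FCFS) order.
    static int fcfs(int head, int[] queue) {
        int total = 0;
        for (int cyl : queue) {
            total += Math.abs(cyl - head);
            head = cyl;
        }
        return total;
    }

    // Total head movement if the closest pending request is always chosen (SSTF).
    static int sstf(int head, int[] queue) {
        List<Integer> pending = new ArrayList<>();
        for (int cyl : queue) pending.add(cyl);
        int total = 0;
        while (!pending.isEmpty()) {
            int best = 0;
            for (int i = 1; i < pending.size(); i++)
                if (Math.abs(pending.get(i) - head) < Math.abs(pending.get(best) - head))
                    best = i;
            total += Math.abs(pending.get(best) - head);
            head = pending.remove(best);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] queue = { 98, 183, 37, 122, 14, 124, 65, 67 };
        int head = 53;
        System.out.println("FCFS total movement: " + fcfs(head, queue)); // 640
        System.out.println("SSTF total movement: " + sstf(head, queue)); // 236
    }
}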
3. SCAN Scheduling
The SCAN algorithm, a.k.a. the elevator algorithm, moves back and forth from one end of the disk to the other, similarly to an elevator processing requests in a tall building.
Under the SCAN algorithm, if a request arrives just ahead of the moving head then it will be processed right away, but if it arrives just after the head has passed, then it will have to wait for the head to pass going the other way on the return trip. This leads to a fairly wide variation in access times, which can be improved upon.
Consider, for example, when the head reaches the high end of the disk: requests with high cylinder numbers just missed the passing head, which means they are all fairly recent requests, whereas requests with low numbers may have been waiting for a much longer time. Making the return scan from high to low then ends up accessing recent requests first and making older requests wait that much longer.
4. C-SCAN Scheduling
The Circular-SCAN algorithm improves upon SCAN by treating all requests in a circular queue fashion - once the head reaches the end of the disk, it returns to the other end without processing any requests, and then starts again from the beginning of the disk.
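For comparison with the FCFS/SSTF sketch earlier, here is the order in which SCAN and C-SCAN would service the same hypothetical queue ( again assuming the head starts at cylinder 53 and is moving toward higher cylinder numbers ):

import java.util.*;

public class SweepScheduling {
    // SCAN ( elevator ): sweep upward from the head, then service the remaining
    // requests on the way back down.
    static List<Integer> scanOrder(int head, int[] queue) {
        List<Integer> up = new ArrayList<>(), down = new ArrayList<>();
        for (int cyl : queue) (cyl >= head ? up : down).add(cyl);
        Collections.sort(up);
        down.sort(Collections.reverseOrder());
        List<Integer> order = new ArrayList<>(up);
        order.addAll(down);
        return order;
    }

    // C-SCAN: sweep upward, then jump back to the low end without servicing
    // anything and sweep upward again.
    static List<Integer> cscanOrder(int head, int[] queue) {
        List<Integer> above = new ArrayList<>(), below = new ArrayList<>();
        for (int cyl : queue) (cyl >= head ? above : below).add(cyl);
        Collections.sort(above);
        Collections.sort(below);
        List<Integer> order = new ArrayList<>(above);
        order.addAll(below);
        return order;
    }

    public static void main(String[] args) {
        int[] queue = { 98, 183, 37, 122, 14, 124, 65, 67 };
        System.out.println("SCAN   " + scanOrder(53, queue));  // [65, 67, 98, 122, 124, 183, 37, 14]
        System.out.println("C-SCAN " + cscanOrder(53, queue)); // [65, 67, 98, 122, 124, 183, 14, 37]
    }
}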
➢ Selection of a Disk-Scheduling Algorithm
With very low loads all algorithms are equal, since there will normally only be one request to process at a time.
For slightly larger loads, SSTF offers better performance than FCFS, but may lead to starvation when loads become heavy enough.
For busier systems, SCAN and LOOK algorithms eliminate starvation problems.
The actual optimal algorithm may be something even more complex than those discussed here, but the incremental improvements are generally not worth the additional overhead.
Some improvement to overall filesystem access times can be made by intelligent placement of directory and/or inode information. If those structures are placed in the middle of the disk instead of at the beginning of the disk, then the maximum distance from those structures to data blocks is reduced to only one-half of the disk size. If those structures can be further distributed and furthermore have their data blocks stored as close as possible to the corresponding directory structures, then that reduces still further the overall time to find the disk block numbers and then access the corresponding data blocks.
On modern disks the rotational latency can be almost as significant as the seek time, however it is not within the OSes control to account for that, because modern disks do not reveal their internal sector mapping schemes, ( particularly when bad blocks have been remapped to spare sectors. )
Some disk manufacturers provide for disk scheduling algorithms directly on their disk controllers, ( which do know the actual geometry of the disk as well as any remapping ), so that if a series of requests are sent from the computer to the controller then those requests can be processed in an optimal order.
Unfortunately there are some considerations that the OS must take into account that are beyond the abilities of the on-board disk-scheduling algorithms, such as priorities of some requests over others, or the need to process certain requests in a particular order. For this reason OSes may elect to spoon-feed requests to the disk controller one at a time in certain situations.
Q) Disk Management
1 Disk Formatting
Before a disk can be used, it has to be low-level formatted, which means laying down all of the headers and trailers marking the beginning and ends of each sector. Included in the header and trailer are the linear sector numbers, and error-correcting codes, ECC, which allow damaged sectors to not only be detected, but in many cases for the damaged data to be recovered ( depending on the extent of the damage. ) Sector sizes are traditionally 512 bytes, but may be larger, particularly in larger drives.
ECC calculation is performed with every disk read or write, and if damage is detected but the data is recoverable, then a soft error has occurred. Soft errors are generally handled by the on-board disk controller, and never seen by the OS. ( See below. )
Once the disk is low-level formatted, the next step is to partition the drive into one or more separate partitions. This step must be completed even if the disk is to be used as a single large partition, so that the partition table can be written to the beginning of the disk.
After partitioning, then the filesystems must be logically formatted, which involves laying down the master directory information ( FAT table or inode structure ), initializing free lists, and creating at least the root directory of the filesystem. ( Disk partitions which are to be used as raw devices are not logically formatted. This saves the overhead and disk space of the filesystem structure, but requires that the application program manage its own disk storage requirements. )
2 Boot Block
Computer ROM contains a bootstrap program ( OS independent ) with just enough code to find the first sector on the first hard drive on the first controller, load that sector into memory, and transfer control over to it. ( The ROM bootstrap program may look in floppy and/or CD drives before accessing the hard drive, and is smart enough to recognize whether it has found valid boot code or not. )
The first sector on the hard drive is known as the Master Boot Record, MBR, and contains a very small amount of code in addition to the partition table. The partition table documents how the disk is partitioned into logical disks, and indicates specifically which partition is the active or boot partition.
The boot program then looks to the active partition to find an operating system, possibly loading up a slightly larger / more advanced boot program along the way.
In a dual-boot ( or larger multi-boot ) system, the user may be given a choice of which operating system to boot, with a default action to be taken in the event of no response within some time frame.
Once the kernel is found by the boot program, it is loaded into memory and then control is transferred over to the OS. The kernel will normally continue the boot process by initializing all important kernel data structures, launching important system services ( e.g. network daemons, sched, init, etc. ), and finally providing one or more login prompts. Boot options at this stage may include single-user a.k.a. maintenance or safe modes, in which very few system services are started - these modes are designed for system administrators to repair problems or otherwise maintain the system.
3 Bad Blocks
No disk can be manufactured to 100% perfection, and all physical objects wear out over time. For these reasons all disks are shipped with a few bad blocks, and additional blocks can be expected to go bad slowly over time. If a large number of blocks go bad then the entire disk will need to be replaced, but a few here and there can be handled through other means.
In the old days, bad blocks had to be checked for manually. Formatting of the disk or running certain disk-analysis tools would identify bad blocks, and attempt to read the data off of them one last time through repeated tries. Then the bad blocks would be mapped out and taken out of future service. Sometimes the data could be recovered, and sometimes it was lost forever. ( Disk analysis tools could be either destructive or non-destructive. )
Modern disk controllers make much better use of the error-correcting codes, so that bad blocks can be detected earlier and the data usually recovered. ( Recall that blocks are tested with every write as well as with every read, so often errors can be detected before the write operation is complete, and the data simply written to a different sector instead. )
Note that re-mapping of sectors from their normal linear progression can throw off the disk scheduling optimization of the OS, especially if the replacement sector is physically far away from the sector it is replacing. For this reason most disks normally keep a few spare sectors on each cylinder, as well as at least one spare cylinder. Whenever possible a bad sector will be mapped to another sector on the same cylinder, or at least a cylinder as close as possible. Sector slipping may also be performed, in which all sectors between the bad sector and the replacement sector are moved down by one, so that the linear progression of sector numbers can be maintained.
If the data on a bad block cannot be recovered, then a hard error has occurred, which requires replacing the file(s) from backups, or rebuilding them from scratch.
Q)Swap-Space Management
Modern systems typically swap out pages as needed, rather than swapping out entire processes. Hence the swapping system is part of the virtual memory management system.
Managing swap space is obviously an important task for modern OSes.
1 Swap-Space Use
The amount of swap space needed by an OS varies greatly according to how it is used. Some systems require an amount equal to physical RAM; some want a multiple of that; some want an amount equal to the amount by which virtual memory exceeds physical RAM, and some systems use little or none at all!
Some systems support multiple swap spaces on separate disks in order to speed up the virtual memory system.
2 Swap-Space Location
Swap space can be physically located in one of two locations:
As a large file which is part of the regular filesystem. This is easy to implement, but inefficient. Not only must the swap space be accessed through the directory system, the file is also subject to fragmentation issues. Caching the block location helps in finding the physical blocks, but that is not a complete fix.
As a raw partition, possibly on a separate or little-used disk. This allows the OS more control over swap space management, which is usually faster and more efficient. Fragmentation of swap space is generally not a big issue, as the space is re-initialized every time the system is rebooted. The downside of keeping swap space on a raw partition is that it can only be grown by repartitioning the hard drive.
3 Swap-Space Management: An Example
Historically OSes swapped out entire processes as needed. Modern systems swap out only individual pages, and only as needed. ( For example process code blocks and other blocks that have not been changed since they were originally loaded are normally just freed from the virtual memory system rather than copying them to swap space, because it is faster to go find them again in the filesystem and read them back in from there than to write them out to swap space and then read them back. )
In the mapping system shown below for Linux systems, a map of swap space is kept in memory, where each entry corresponds to a 4K block in the swap space. Zeros indicate free slots and non-zeros refer to how many processes have a mapping to that particular block ( >1 for shared pages only. )
Q)RAID Structure
The general idea behind RAID is to employ a group of hard drives together with some form of duplication, either to increase reliability or to speed up operations, ( or sometimes both. )
RAID originally stood for Redundant Array of Inexpensive Disks, and was designed to use a bunch of cheap small disks in place of one or two larger more expensive ones. Today RAID systems employ large possibly expensive disks as their physical components, switching the definition to Independent disks.
1 Improvement of Reliability via Redundancy
The more disks a system has, the greater the likelihood that one of them will go bad at any given time. Hence increasing the number of disks on a system actually decreases the Mean Time To Failure, MTTF, of the system.
If, however, the same data was copied onto multiple disks, then the data would not be lost unless both ( or all ) copies of the data were damaged simultaneously, which is a MUCH lower probability than for a single disk going bad. More specifically, the second disk would have to go bad before the first disk was repaired, which brings the Mean Time To Repair into play. For example if two disks were involved, each with a MTTF of 100,000 hours and a MTTR of 10 hours, then the Mean Time to Data Loss would be 500 * 10^6 hours, or 57,000 years!
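A two-line check of the mirroring arithmetic just quoted ( treating the failures as independent, so the mean time to data loss is roughly MTTF squared divided by twice the MTTR ):

public class MirrorReliability {
    public static void main(String[] args) {
        double mttf = 100_000;    // mean time to failure of one disk, in hours
        double mttr = 10;         // mean time to repair, in hours
        // Data are lost only if the second disk fails while the first is being repaired.
        double mtdl = (mttf * mttf) / (2 * mttr);
        System.out.println(mtdl + " hours");                // 5.0E8, i.e. 500 * 10^6 hours
        System.out.println((mtdl / (24 * 365)) + " years"); // roughly 57,000 years
    }
}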
This is the basic idea behind disk mirroring, in which a system contains identical data on two or more disks.
Note that a power failure during a write operation could cause both disks to contain corrupt data, if both disks were writing simultaneously at the time of the power failure. One solution is to write to the two disks in series, so that they will not both become corrupted ( at least not in the same way ) by a power failure. An alternate solution involves non-volatile RAM as a write cache, which is not lost in the event of a power failure and which is protected by error-correcting codes.
2 Improvement in Performance via Parallelism
There is also a performance benefit to mirroring, particularly with respect to reads. Since every block of data is duplicated on multiple disks, read operations can be satisfied from any available copy, and multiple disks can be reading different data blocks simultaneously in parallel. ( Writes could possibly be sped up as well through careful scheduling algorithms, but it would be complicated in practice. )
Another way of improving disk access time is with striping, which basically means spreading data out across multiple disks that can be accessed simultaneously.
With bit-level striping the bits of each byte are striped across multiple disks. For example if 8 disks were involved, then each 8-bit byte would be read in parallel by 8 heads on separate disks. A single disk read would access 8 * 512 bytes = 4K worth of data in the time normally required to read 512 bytes. Similarly if 4 disks were involved, then two bits of each byte could be stored on each disk, for 2K worth of disk access per read or write operation.
Block-level striping spreads a filesystem across multiple disks on a block-by-block basis, so if block N were located on disk 0, then block N + 1 would be on disk 1, and so on. This is particularly useful when filesystems are accessed in clusters of physical blocks. Other striping possibilities exist, with block-level striping being the most common.
3 RAID Levels
Mirroring provides reliability but is expensive; striping improves performance, but does not improve reliability. Accordingly there are a number of different schemes that combine the principles of mirroring and striping in different ways, in order to balance reliability versus performance versus cost. These are described by different RAID levels, as follows: ( In the diagram that follows, "C" indicates a copy, and "P" indicates parity, i.e. checksum bits. )
Raid Level 0 - This level includes striping only, with no mirroring.
Raid Level 1 - This level includes mirroring only, no striping.
Raid Level 2 - This level stores error-correcting codes on additional disks, allowing for any damaged data to be reconstructed by subtraction from the remaining undamaged data. Note that this scheme requires only three extra disks to protect 4 disks worth of data, as opposed to full mirroring. ( The number of disks required is a function of the error-correcting algorithms, and the means by which the particular bad bit(s) is(are) identified. )
Raid Level 3 - This level is similar to level 2, except that it takes advantage of the fact that each disk is still doing its own error-detection, so that when an error occurs, there is no question about which disk in the array has the bad data. As a result a single parity bit is all that is needed to recover the lost data from an array of disks. Level 3 also includes striping, which improves performance. The downside with the parity approach is that every disk must take part in every disk access, and the parity bits must be constantly calculated and checked, reducing performance. Hardware-level parity calculations and NVRAM cache can help with both of those issues. In practice level 3 is greatly preferred over level 2.
Raid Level 4 - This level is similar to level 3, employing block-level striping instead of bit-level striping. The benefits are that multiple blocks can be read independently, and changes to a block only require writing two blocks ( data and parity ) rather than involving all disks. Note that new disks can be added seamlessly to the system provided they are initialized to all zeros, as this does not affect the parity results.
Raid Level 5 - This level is similar to level 4, except the parity blocks are distributed over all disks, thereby more evenly balancing the load on the system. For any given block on the disk(s), one of the disks will hold the parity information for that block and the other N-1 disks will hold the data. Note that the same disk cannot hold both data and parity for the same block, as both would be lost in the event of a disk crash.
Raid Level 6 - This level extends raid level 5 by storing multiple bits of error-recovery codes, ( such as the Reed-Solomon codes ), for each bit position of data, rather than a single parity bit. In the example shown below 2 bits of ECC are stored for every 4 bits of data, allowing data recovery in the face of up to two simultaneous disk failures. Note that this still involves only a 50% increase in storage needs, as opposed to 100% for simple mirroring, which could only tolerate a single disk failure.
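The single parity block used by levels 3 through 5 is conceptually just an XOR across the data disks, which is also how a lost block is rebuilt; a minimal sketch of that idea:

public class ParityDemo {
    // XOR parity over N data blocks, as used ( conceptually ) by RAID levels 3-5.
    static byte[] parity(byte[][] blocks) {
        byte[] p = new byte[blocks[0].length];
        for (byte[] block : blocks)
            for (int i = 0; i < p.length; i++)
                p[i] ^= block[i];
        return p;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] p = parity(data);
        // Simulate losing block 1: XOR of the surviving blocks and the parity
        // block reproduces the lost data.
        byte[][] survivors = { data[0], data[2], p };
        byte[] rebuilt = parity(survivors);
        System.out.println(java.util.Arrays.toString(rebuilt)); // [4, 5, 6]
    }
}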
There are also two RAID levels which combine RAID levels 0 and 1 ( striping and mirroring ) in different combinations, designed to provide both performance and reliability at the expense of increased cost.
RAID level 0 + 1 disks are first striped, and then the striped disks mirrored to another set. This level generally provides better performance than RAID level 5.
RAID level 1 + 0 mirrors disks in pairs, and then stripes the mirrored pairs.
The storage capacity, performance, etc. are all the same, but there is an advantage to this approach in the event of multiple disk failures, as illustrated below:
In diagram (a) below, the 8 disks have been divided into two sets of four, each of which is striped, and then one stripe set is used to mirror the other set.
If a single disk fails, it wipes out the entire stripe set, but the system can keep on functioning using the remaining set.
However if a second disk from the other stripe set now fails, then the entire system is lost, as a result of two disk failures.
In diagram (b), the same 8 disks are divided into four sets of two, each of which is mirrored, and then the file system is striped across the four sets of mirrored disks.
If a single disk fails, then that mirror set is reduced to a single disk, but the system rolls on, and the other three mirror sets continue mirroring.
Now if a second disk fails, ( that is not the mirror of the already failed disk ), then another one of the mirror sets is reduced to a single disk, but the system can continue without data loss.
In fact the second arrangement could handle as many as four simultaneously failed disks, as long as no two of them were from the same mirror pair.
4. Selecting a RAID Level
Trade-offs in selecting the optimal RAID level for a particular application include cost, volume of data, need for reliability, need for performance, and rebuild time, the latter of which can affect the likelihood that a second disk will fail while the first failed disk is being rebuilt.
Other decisions include how many disks are involved in a RAID set and how many disks to protect with a single parity bit. More disks in the set increases performance but increases cost. Protecting more disks per parity bit saves cost, but increases the likelihood that a second disk will fail before the first bad disk is repaired.
5 Extensions
RAID concepts have been extended to tape drives ( e.g. striping tapes for faster backups or parity checking tapes for reliability ), and for broadcasting of data.
6 Problems with RAID
RAID protects against physical errors, but not against any number of bugs or other errors that could write erroneous data.
ZFS adds an extra level of protection by including data block checksums in all inodes along with the pointers to the data blocks. If data are mirrored and one copy has the correct checksum and the other does not, then the data with the bad checksum will be replaced with a copy of the data with the good checksum. This increases reliability greatly over RAID alone, at a cost of a performance hit that is acceptable because ZFS is so fast to begin with.
Q)FILE SYSTEM INTERFACE-FILE CONCEPT
➢ File Concept
1 File Attributes
Different OSes keep track of different file attributes, including:
Name - Some systems give special significance to names, and particularly extensions ( .exe, .txt, etc. ), and some do not. Some extensions may be of significance to the OS ( .exe ), and others only to certain applications ( .jpg )
Identifier ( e.g. inode number )
Type - Text, executable, other binary, etc.
Location - on the hard drive.
Size
Protection
Time & Date
User ID
2 File Operations
The file ADT supports many common operations:
Creating a file
Writing a file
Reading a file
Repositioning within a file
Deleting a file
Truncating a file.
Most OSes require that files be opened before access and closed after all access is complete. Normally the programmer must open and close files explicitly, but some rare systems open the file automatically at first access. Information about currently open files is stored in an open file table, containing for example:
File pointer - records the current position in the file, for the next read or write access.
File-open count - How many times has the current file been opened ( simultaneously by different processes ) and not yet closed? When this counter reaches zero the file can be removed from the table.
Disk location of the file.
Access rights
Some systems provide support for file locking.
A shared lock is for reading only.
An exclusive lock is for writing as well as reading.
An advisory lock is informational only, and not enforced. ( A "Keep Out" sign, which may be ignored. )
A mandatory lock is enforced. ( A truly locked door. )
UNIX uses advisory locks, and Windows uses mandatory locks.

import java.io.*;
import java.nio.channels.*;

public class LockingExample {
    public static final boolean EXCLUSIVE = false;
    public static final boolean SHARED = true;

    public static void main(String args[]) throws IOException {
        FileLock sharedLock = null;
        FileLock exclusiveLock = null;
        try {
            RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
            // get the channel for the file
            FileChannel ch = raf.getChannel();
            // this locks the first half of the file - exclusive
            exclusiveLock = ch.lock(0, raf.length()/2, EXCLUSIVE);
            /** Now modify the data . . . */
            // release the lock
            exclusiveLock.release();
            // this locks the second half of the file - shared
            sharedLock = ch.lock(raf.length()/2 + 1, raf.length(), SHARED);
            /** Now read the data . . . */
            // release the lock
            sharedLock.release();
        } catch (java.io.IOException ioe) {
            System.err.println(ioe);
        } finally {
            if (exclusiveLock != null)
                exclusiveLock.release();
            if (sharedLock != null)
                sharedLock.release();
        }
    }
}
Figure 11.2 File-locking example in Java.
3 File Types
Windows ( and some other systems ) use special file extensions to indicate the type of each file.
Macintosh stores a creator attribute for each file, according to the program that first created it with the create( ) system call.
UNIX stores magic numbers at the beginning of certain files. ( Experiment with the "file" command, especially in directories such as /bin and /dev. )
4 File Structure
Some files contain an internal structure, which may or may not be known to the OS.
For the OS to support particular file formats increases the size and complexity of the OS.
UNIX treats all files as sequences of bytes, with no further consideration of the internal structure. ( With the exception of executable binary programs, which it must know how to load and find the first executable statement, etc. )
Macintosh files have two forks - a resource fork, and a data fork. The resource fork contains information relating to the UI, such as icons and button images, and can be modified independently of the data fork, which contains the code or data as appropriate.
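As a small illustration of the UNIX magic numbers mentioned under File Types above, the sketch below reads the first four bytes of a file named on the command line and tests for the ELF signature ( 0x7F followed by 'E', 'L', 'F' ), much as the "file" command does:

import java.io.*;

public class MagicNumber {
    // Check whether a file begins with the ELF executable magic number.
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            byte[] magic = new byte[4];
            in.readFully(magic);
            boolean elf = magic[0] == 0x7F && magic[1] == 'E' && magic[2] == 'L' && magic[3] == 'F';
            System.out.println(elf ? "ELF executable" : "not an ELF file");
        }
    }
}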
5 Internal File Structure
Disk files are accessed in units of physical blocks, typically 512 bytes or some power-of-two multiple thereof. ( Larger physical disks use larger block sizes, to keep the range of block numbers within the range of a 32-bit integer. )
Internally files are organized in units of logical units, which may be as small as a single byte, or may be a larger size corresponding to some data record or structure size.
The number of logical units which fit into one physical block determines its packing, and has an impact on the amount of internal fragmentation ( wasted space ) that occurs.
As a general rule, half a physical block is wasted for each file, and the larger the block sizes the more space is lost to internal fragmentation.
Q)Access Methods
• Sequential Access
A sequential access file emulates magnetic tape operation, and generally supports a few operations:
read next - read a record and advance the tape to the next position.
write next - write a record and advance the tape to the next position.
rewind
skip n records - May or may not be supported. N may be limited to positive numbers, or may be limited to +/- 1.
• Direct Access
Jump to any record and read that record. Operations supported include:
read n - read record number n. ( Note an argument is now required. )
write n - write record number n. ( Note an argument is now required. )
jump to record n - could be 0 or the end of file.
Query current record - used to return back to this record later.
Sequential access can be easily emulated using direct access. The inverse is complicated and inefficient.
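A direct-access read of record n can be sketched with the same RandomAccessFile class used in the locking example earlier, assuming fixed-size records ( the file name and record size here are only placeholders ):

import java.io.*;

public class DirectAccess {
    // read n: seek directly to record n and read it, assuming fixed-size records.
    static byte[] readRecord(RandomAccessFile raf, int n, int recordSize) throws IOException {
        raf.seek((long) n * recordSize);
        byte[] record = new byte[recordSize];
        raf.readFully(record);
        return record;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("records.dat", "r")) {
            byte[] r = readRecord(raf, 5, 64);   // jump straight to record 5
            System.out.println(r.length + " bytes read");
        }
    }
}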
• Other Access Methods
An indexed access scheme can be easily built on top of a direct access system. Very large files may require a multi-tiered indexing scheme, i.e. indexes of indexes.
Q) Directory Structure
• Storage Structure
A disk can be used in its entirety for a file system.
Alternatively a physical disk can be broken up into multiple partitions, slices, or mini-disks, each of which becomes a virtual disk and can have its own filesystem. ( Or be used for raw storage, swap space, etc. )
Or, multiple physical disks can be combined into one volume, i.e. a larger virtual disk, with its own filesystem spanning the physical disks.
/ ufs
/devices devfs
/dev dev
/system/contract ctfs
/proc proc
/etc/mnttab mntfs
/etc/svc/volatile tmpfs
/system/object objfs
/lib/libc.so.1 lofs
/dev/fd fd
/var ufs
/tmp tmpfs
/var/run tmpfs
/opt ufs
/zpbge zfs
/zpbge/backup zfs
/export/home zfs
/var/mail zfs
/var/spool/mqueue zfs
/zpbg zfs
/zpbg/zones zfs
Figure Solaris file systems.
• Directory Overview
Directory operations to be supported include:
Search for a file
Create a file - add to the directory
Delete a file - erase from the directory
List a directory - possibly ordered in different ways.
Rename a file - may change sorting order
Traverse the file system.
• Single-Level Directory
Simple to implement, but each file must have a unique name.
• Two-Level Directory
Each user gets their own directory space.
File names only need to be unique within a given user's directory.
A master file directory is used to keep track of each user's directory, and must be maintained when users are added to or removed from the system.
A separate directory is generally needed for system ( executable ) files.
Systems may or may not allow users to access other directories besides their own.
If access to other directories is allowed, then provision must be made to specify the directory being accessed.
If access is denied, then special consideration must be made for users to run programs located in system directories. A search path is the list of directories in which to search for executable programs, and can be set uniquely for each user.
• Tree-Structured Directories
An obvious extension to the two-tiered directory structure, and the one with which we are all most familiar.
Each user / process has the concept of a current directory from which all ( relative ) searches take place.
Files may be accessed using either absolute pathnames ( relative to the root of the tree ) or relative pathnames ( relative to the current directory. )
Directories are stored the same as any other file in the system, except there is a bit that identifies them as directories, and they have some special structure that the OS understands.
One question for consideration is whether or not to allow the removal of directories that are not empty - Windows requires that directories be emptied first, and UNIX provides an option for deleting entire sub-trees.
• Acyclic-Graph Directories
When the same files need to be accessed in more than one place in the directory structure ( e.g. because they are being shared by more than one user / process ), it can be useful to provide an acyclic-graph structure. ( Note the directed arcs from parent to child. )
UNIX provides two types of links for implementing the acyclic-graph structure. ( See "man ln" for more details. )
A hard link ( usually just called a link ) involves multiple directory entries that both refer to the same file. Hard links are only valid for ordinary files in the same filesystem.
A symbolic link involves a special file, containing information about where to find the linked file. Symbolic links may be used to link directories and/or files in other filesystems, as well as ordinary files in the current filesystem.
Windows only supports symbolic links, termed shortcuts.
Hard links require a reference count, or link count, for each file, keeping track of how many directory entries are currently referring to this file. Whenever one of the references is removed the link count is reduced, and when it reaches zero, the disk space can be reclaimed.
For symbolic links there is some question as to what to do with the symbolic links when the original file is moved or deleted:
One option is to find all the symbolic links and adjust them also.
Another is to leave the symbolic links dangling, and discover that they are no longer valid the next time they are used.
General Graph Directory
If cycles are allowed in the graphs, then several problems can arise:
Search algorithms can go into infinite loops. One solution is to not follow links in search algorithms. ( Or not to follow symbolic links, and to only allow symbolic links to refer to directories. )
Sub-trees can become disconnected from the rest of the tree and still not have their reference counts reduced to zero. Periodic garbage collection is required to detect and resolve this problem. ( chkdsk in DOS and fsck in UNIX search for these problems, among others, even though cycles are not supposed to be allowed in either system. Disconnected disk blocks that are not marked as free are added back to the file systems with made-up file names, and can usually be safely deleted. )
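A toy sketch of the link counting described above - each hard link bumps a per-file count, and the file's space is reclaimed only when the last directory entry referring to it is removed:

import java.util.*;

public class LinkCountDemo {
    // Toy inode table: each "file" has a link count; space is reclaimed at zero.
    static Map<Integer, Integer> linkCount = new HashMap<>();

    static void link(int inode)   { linkCount.merge(inode, 1, Integer::sum); }

    static void unlink(int inode) {
        int n = linkCount.merge(inode, -1, Integer::sum);
        if (n == 0) {
            linkCount.remove(inode);
            System.out.println("inode " + inode + " reclaimed");
        }
    }

    public static void main(String[] args) {
        link(7);      // original directory entry
        link(7);      // a hard link from another directory
        unlink(7);    // remove one name: the data stays
        unlink(7);    // remove the last name: the space is reclaimed
    }
}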
Q)File-System Mounting
The basic idea behind mounting file systems is to combine multiple file systems into one large tree structure.
The mount command is given a filesystem to mount and a mount point ( directory ) on which to attach it.
Once a file system is mounted onto a mount point, any further references to that directory actually refer to the root of the mounted file system.
Any files ( or sub-directories ) that had been stored in the mount point directory prior to mounting the new filesystem are now hidden by the mounted filesystem, and are no longer available. For this reason some systems only allow mounting onto empty directories.
Filesystems can only be mounted by root, unless root has previously configured certain filesystems to be mountable onto certain pre-determined mount points. ( E.g. root may allow users to mount floppy filesystems to /mnt or something like it. ) Anyone can run the mount command to see what filesystems are currently mounted.
Filesystems may be mounted read-only, or have other restrictions imposed.
The traditional Windows OS runs an extended two-tier directory structure, where the first tier of the structure separates volumes by drive letters, and a tree structure is implemented below that level.
Macintosh runs a similar system, where each new volume that is found is automatically mounted and added to the desktop when it is found.
More recent Windows systems allow filesystems to be mounted to any directory in the filesystem, much like UNIX.
File system. (a) Existing system. (b) Unmounted volume.
Mount point.
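A toy mount table in the spirit of the discussion above - a lookup uses the filesystem mounted at the longest matching prefix, so names under the mount point are served by the mounted volume ( the mount points and filesystem names are invented for the example, and a real implementation would match whole path components ):

import java.util.*;

public class MountTable {
    static Map<String, String> mounts = new LinkedHashMap<>();

    // Return the filesystem responsible for a path: the one mounted at the
    // longest matching mount-point prefix.
    static String filesystemFor(String path) {
        String best = "/", fs = mounts.get("/");
        for (Map.Entry<String, String> e : mounts.entrySet())
            if (path.startsWith(e.getKey()) && e.getKey().length() > best.length()) {
                best = e.getKey();
                fs = e.getValue();
            }
        return fs;
    }

    public static void main(String[] args) {
        mounts.put("/", "ufs-root");
        mounts.put("/mnt", "floppy-fs");   // hypothetical mounted volume
        System.out.println(filesystemFor("/mnt/readme.txt")); // floppy-fs
        System.out.println(filesystemFor("/home/alice"));     // ufs-root
    }
}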
Q) File Sharing
❖ Multiple Users
On a multi-user system, more information needs to be stored for each file:
The owner ( user ) who owns the file, and who can control its access.
The group of other user IDs that may have some special access to the file.
What access rights are afforded to the owner ( User ), the Group, and to the rest of the world ( the universe, a.k.a. Others. )
Some systems have more complicated access control, allowing or denying specific accesses to specifically named users or groups.
❖ Remote File Systems
The advent of the Internet introduces issues for accessing files stored on remote computers.
The original method was ftp, allowing individual files to be transported across systems as needed. Ftp can be either account and password controlled, or anonymous, not requiring any user name or password.
Various forms of distributed file systems allow remote file systems to be mounted onto a local directory structure, and accessed using normal file access commands. ( The actual files are still transported across the network as needed, possibly using ftp as the underlying transport mechanism. )
The WWW has made it easy once again to access files on remote systems without mounting their filesystems, generally using ( anonymous ) ftp as the underlying file transport mechanism.
1 The Client-Server Model
When one computer system remotely mounts a filesystem that is physically located on another system, the system which physically owns the files acts as a server, and the system which mounts them is the client.
User IDs and group IDs must be consistent across both systems for the system to work properly. ( I.e. this is most applicable across multiple computers managed by the same organization, shared by a common group of users. )
The same computer can be both a client and a server. ( E.g. cross-linked file systems. )
There are a number of security concerns involved in this model:
Servers commonly restrict mount permission to certain trusted systems only.
Spoofing ( a computer pretending to be a different computer ) is a potential security risk.
Servers may restrict remote access to read-only.
Servers restrict which filesystems may be remotely mounted. Generally the information within those subsystems is limited, relatively public, and protected by frequent backups.
The NFS ( Network File System ) is a classic example of such a system.
2 Distributed Information Systems
The Domain Name System, DNS, provides for a unique naming system across all of the Internet.
Domain names are maintained by the Network Information System, NIS, which unfortunately has several security issues. NIS+ is a more secure version, but has not yet gained the same widespread acceptance as NIS.
Microsoft's Common Internet File System, CIFS, establishes a network login for each user on a networked system with shared file access. Older Windows systems used domains, and newer systems ( XP, 2000 ) use active directories. User names must match across the network for this system to be valid.
A newer approach is the Lightweight Directory-Access Protocol, LDAP, which provides a secure single sign-on for all users to access all resources on a network. This is a secure system which is gaining in popularity, and which has the maintenance advantage of combining authorization information in one central location.
3 Failure Modes
When a local disk file is unavailable, the result is generally known immediately, and is generally non-recoverable. The only reasonable response is for the response to fail.
However when a remote file is unavailable, there are many possible reasons, and whether or not it is unrecoverable is not readily apparent. Hence most remote access systems allow for blocking or delayed response, in the hopes that the remote system ( or the network ) will come back up eventually.
❖ Consistency Semantics
Consistency Semantics deals with the consistency between the views of shared files on a networked system. When one user changes the file, when do other users see the changes?
At first glance this appears to have all of the synchronization issues discussed in Chapter 6. Unfortunately the long delays involved in network operations prohibit the use of atomic operations as discussed in that chapter.
1 UNIX Semantics
The UNIX file system uses the following semantics:
Writes to an open file are immediately visible to any other user who has the file open.
One implementation uses a shared location pointer, which is adjusted for all sharing users.
The file is associated with a single exclusive physical resource, which may delay some accesses.
2 Session Semantics
The Andrew File System, AFS, uses the following semantics:
Writes to an open file are not immediately visible to other users.
When a file is closed, any changes made become available only to users who open the file at a later time.
According to these semantics, a file can be associated with multiple ( possibly different ) views. Almost no constraints are imposed on scheduling accesses. No user is delayed in reading or writing their personal copy of the file.
AFS file systems may be accessible by systems around the world. Access control is maintained through ( somewhat ) complicated access control lists, which may grant access to the entire world ( literally ) or to specifically named users accessing the files from specifically named remote environments.
3 Immutable-Shared-Files Semantics
Under this system, when a file is declared as shared by its creator, it becomes immutable and the name cannot be re-used for any other resource. Hence it becomes read-only, and shared access is simple.
Q)Protection
Files must be kept safe for reliability ( against accidental damage ), and protection ( against deliberate malicious access. ) The former is usually managed with backup copies. This section discusses the latter.
One simple protection scheme is to remove all access to a file. However this makes the file unusable, so some sort of controlled access must be arranged.
❖ Types of Access
The following low-level operations are often controlled:
Read - View the contents of the file.
Write - Change the contents of the file.
Execute - Load the file onto the CPU and follow the instructions contained therein.
Append - Add to the end of an existing file.
Delete - Remove a file from the system.
List - View the name and other attributes of files on the system.
Higher-level operations, such as copy, can generally be performed through combinations of the above.
❖ Access Control
One approach is to have complicated Access Control Lists, ACL, which specify exactly what access is allowed or denied for specific users or groups.
The AFS uses this system for distributed access. Control is very finely adjustable, but may be complicated, particularly when the specific users involved are unknown. ( AFS allows some wild cards, so for example all users on a certain remote system may be trusted, or a given username may be trusted when accessing from any remote system. )
UNIX uses a set of 9 access control bits, in three groups of three. These correspond to R, W, and X permissions for each of the Owner, Group, and Others. ( See "man chmod" for full details. ) The RWX bits control the following privileges for ordinary files and directories:
bit | Files | Directories
R | Read ( view ) file contents. | Read directory contents. Required to get a listing of the directory.
W | Write ( change ) file contents. | Change directory contents. Required to create or delete files.
X | Execute file contents as a program. | Access detailed directory information. Required to get a long listing, or to access any specific file in the directory. Note that if a user has X but not R permissions on a directory, they can still access specific files, but only if they already know the name of the file they are trying to access.
In addition there are some special bits that can also be applied:
The set user ID ( SUID ) bit and/or the set group ID ( SGID ) bits applied to executable files temporarily change the identity of whoever runs the program to match that of the owner / group of the executable program. This allows users running specific programs to have access to files ( while running that program ) which they would normally be unable to access. Setting of these two bits is usually restricted to root, and must be done with caution, as it introduces a potential security leak.
The sticky bit on a directory modifies write permission, allowing users to only delete files for which they are the owner. This allows everyone to create files in /tmp, for example, but to only delete files which they have created, and not anyone else's.
The SUID, SGID, and sticky bits are indicated in the positions for execute permission for the user, group, and others, respectively. If the letter is lower case, ( s, s, t ), then the corresponding execute permission is also given. If it is upper case, ( S, S, T ), then the corresponding execute permission is NOT given. The numeric form of chmod is needed to set these advanced bits.
❖ Other Protection Approaches and Issues
Some systems can apply passwords, either to individual files, or to specific sub-directories, or to the entire system. There is a trade-off between the number of passwords that must be maintained ( and remembered by the users ) and the amount of information that is vulnerable to a lost or forgotten password.
Older systems which did not originally have multi-user file access permissions ( DOS and older versions of Mac ) must now be retrofitted if they are to share files on a network.
Access to a file requires access to all the files along its path as well. In a cyclic
directory structure, users may have different access to the same file accessed
through different paths.
Sometimes just the knowledge of the existence of a file of a certain name is a
security ( or privacy ) concern. Hence the distinction between the R and X bits on
UNIX directories.
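A small sketch of how the nine RWX bits discussed above map to the familiar textual form; the numeric modes below are just example values:

public class ModeBits {
    // Decode the low nine permission bits of a numeric mode ( e.g. 0754 ) into
    // the familiar rwxr-xr-- form for owner, group, and others.
    static String decode(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder out = new StringBuilder();
        for (int bit = 8; bit >= 0; bit--)
            out.append(((mode >> bit) & 1) == 1 ? flags.charAt(8 - bit) : '-');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(0754)); // rwxr-xr--
        System.out.println(decode(0640)); // rw-r-----
    }
}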
UNIT-5
Q) FILE SYSTEM IMPLEMENTATION - FILE SYSTEM STRUCTURE
File-System Structure
Hard disks have two important properties that make them suitable for secondary storage of files in file systems: (1) Blocks of data can be rewritten in place, and (2) they are direct access, allowing any block of data to be accessed with only ( relatively ) minor movements of the disk heads and rotational latency.
Disks are usually accessed in physical blocks, rather than a byte at a time. Block sizes may range from 512 bytes to 4K or larger.
File systems organize storage on disk drives, and can be viewed as a layered design:
At the lowest layer are the physical devices, consisting of the magnetic media, motors & controls, and the electronics connected to them and controlling them. Modern disks put more and more of the electronic controls directly on the disk drive itself, leaving relatively little work for the disk controller card to perform.
I/O Control consists of device drivers, special software programs ( often written in assembly ) which communicate with the devices by reading and writing special codes directly to and from memory addresses corresponding to the controller card's registers. Each controller card ( device ) on a system has a different set of addresses ( registers, a.k.a. ports ) that it listens to, and a unique set of command codes and results codes that it understands.
The basic file system level works directly with the device drivers in terms of retrieving and storing raw blocks of data, without any consideration for what is in each block. Depending on the system, blocks may be referred to with a single block number, ( e.g. block # 234234 ), or with head-sector-cylinder combinations.
The file organization module knows about files and their logical blocks, and how they map to physical blocks on the disk. In addition to translating from logical to physical blocks, the file organization module also maintains the list of free blocks, and allocates free blocks to files as needed.
The logical file system deals with all of the meta data associated with a file ( UID, GID, mode, dates, etc. ), i.e. everything about the file except the data itself. This level manages the directory structure and the mapping of file names to file control blocks, FCBs, which contain all of the meta data as well as block number information for finding the data on the disk.
Q)File-System Implementation
❖ Overview
File systems store several important data structures on the disk:
A boot-control block, ( per volume ) a.k.a. the boot block in UNIX or the partition boot sector in Windows, contains information about how to boot the system off of this disk. This will generally be the first sector of the volume if there is a bootable system loaded on that volume, or the block will be left vacant otherwise.
A volume control block, ( per volume ) a.k.a. the superblock in UNIX or the master file table in Windows, which contains information such as the partition table, number of blocks on each filesystem, and pointers to free blocks and free FCB blocks.
A directory structure ( per file system ), containing file names and pointers to corresponding FCBs. UNIX uses inode numbers, and NTFS uses a master file table.
The File Control Block, FCB, ( per file ) containing details about ownership, size, permissions, dates, etc. UNIX stores this information in inodes, and NTFS in the master file table as a relational database structure.
There are also several key data structures stored in memory:
An in-memory mount table.
An in-memory directory cache of recently accessed directory information.
A system-wide open file table, containing a copy of the FCB for every currently open file in the system, as well as some other related information.
A per-process open file table, containing a pointer to the system open file table as well as some other information. ( For example the current file position pointer may be either here or in the system file table, depending on the implementation and whether the file is being shared or not. )
Figure 12.3 illustrates some of the interactions of file system components when files are created and/or used:
When a new file is created, a new FCB is allocated and filled out with important information regarding the new file. The appropriate directory is modified with the new file name and FCB information.
When a file is accessed during a program, the open( ) system call reads in the FCB information from disk, and stores it in the system-wide open file table. An entry is added to the per-process open file table referencing the system-wide table, and an index into the per-process table is returned by the open( ) system call. UNIX refers to this index as a file descriptor, and Windows refers to it as a file handle.
If another process already has a file open when a new request comes in for the same file, and it is sharable, then a counter in the system-wide table is incremented and the per-process table is adjusted to point to the existing entry in the system-wide table.
When a file is closed, the per-process table entry is freed, and the counter in the system-wide table is decremented. If that counter reaches zero, then the system-wide table entry is also freed. Any data currently stored in memory cache for this file is written out to disk if necessary.
In-memory file-system structures. (a) File open. (b) File read.
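A stripped-down sketch of the bookkeeping just described - one system-wide table entry per open file with an open count, and a per-process table whose index plays the role of the file descriptor:

import java.util.*;

public class OpenFileTables {
    // System-wide table: file name -> number of outstanding opens.
    static Map<String, Integer> systemOpenCount = new HashMap<>();
    // Per-process table: the index of an entry is the "file descriptor".
    static List<String> perProcessTable = new ArrayList<>();

    static int open(String file) {
        systemOpenCount.merge(file, 1, Integer::sum);
        perProcessTable.add(file);
        return perProcessTable.size() - 1;    // file descriptor
    }

    static void close(int fd) {
        String file = perProcessTable.get(fd);
        if (systemOpenCount.merge(file, -1, Integer::sum) == 0)
            systemOpenCount.remove(file);      // last close frees the system-wide entry
    }

    public static void main(String[] args) {
        int fd = open("file.txt");
        System.out.println("descriptor = " + fd);
        close(fd);
    }
}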
Partitions and Mounting
Physical disks are commonly divided into smaller units called partitions. They can also be combined into larger units, but that is most commonly done for RAID installations and is left for later chapters.
Partitions can either be used as raw devices ( with no structure imposed upon them ), or they can be formatted to hold a filesystem ( i.e. populated with FCBs and initial directory structures as appropriate. ) Raw partitions are generally used for swap space, and may also be used for certain programs such as databases that choose to manage their own disk storage system. Partitions containing filesystems can generally only be accessed using the file system structure by ordinary users, but can often be accessed as a raw device also by root.
The boot block is accessed as part of a raw partition, by the boot program prior to any operating system being loaded. Modern boot programs understand multiple OSes and file system formats, and can give the user a choice of which of several available systems to boot.
The root partition contains the OS kernel and at least the key portions of the OS needed to complete the boot process. At boot time the root partition is mounted, and control is transferred from the boot program to the kernel found there. ( Older systems required that the root partition lie completely within the first 1024 cylinders of the disk, because that was as far as the boot program could reach. Once the kernel had control, then it could access partitions beyond the 1024 cylinder boundary. )
Virtual File Systems
Virtual File Systems, VFS, provide a common interface to multiple different filesystem types. In addition, a VFS provides for a unique identifier ( vnode ) for files across the entire space, including across all filesystems of different types. ( UNIX inodes are unique only across a single filesystem, and certainly do not carry across networked file systems. )
The VFS in Linux is based upon four key object types:
The inode object, representing an individual file.
The file object, representing an open file.
The superblock object, representing a filesystem.
The dentry object, representing a directory entry.
Linux VFS provides a set of common functionalities for each filesystem, using function pointers accessed through a table. The same functionality is accessed through the same table position for all filesystem types, though the actual functions pointed to by the pointers may be filesystem-specific. See /usr/include/linux/fs.h for full details. Common operations provided include open( ), read( ), write( ), and mmap( ).
Q)Directory Implementation
Directories need to be fast to search, insert, and delete, with a minimum of wasted disk space.
❖ Linear List
A linear list is the simplest and easiest directory structure to set up, but it does have some drawbacks.
Finding a file ( or verifying one does not already exist upon creation ) requires a linear search.
Deletions can be done by moving all entries, flagging an entry as deleted, or by
moving the last entry into the newly vacant position.
Sorting the list makes searches faster, at the expense of more complex insertions
and deletions.
A linked list makes insertions and deletions into a sorted list easier, with overhead
for the links.
More complex data structures, such as B-trees, could also be considered.
❖ Hash Table
A hash table can also be used to speed up searches.
Hash tables are generally implemented in addition to a linear or other structure.
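A rough sketch of the hashed-directory idea in C: hash the file name to pick a bucket, then do a short linear search along that bucket's chain. The entry layout and hash function are illustrative, not any particular filesystem's on-disk format.

```c
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64

struct dir_entry {
    char              name[32];
    uint32_t          inode;      /* or starting block number, etc.            */
    struct dir_entry *next;       /* chain of entries hashing to the same bucket */
};

static struct dir_entry *buckets[NBUCKETS];

/* Simple string hash ( djb2-style ); real systems use better-distributed hashes. */
static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Lookup degrades to a short linear search within one bucket. */
struct dir_entry *dir_lookup(const char *name)
{
    for (struct dir_entry *e = buckets[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;    /* not found */
}
```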
Linked allocation of disk space.
Contiguous allocation of disk space.
❖ Linked Allocation
Disk files can be stored as linked lists, with the expense of the storage space
consumed by each link. ( E.g. a block may be 508 bytes instead of 512. )
Linked allocation involves no external fragmentation, does not require pre-known
file sizes, and allows files to grow dynamically at any time.
Unfortunately linked allocation is only efficient for sequential access files, as
random access requires starting at the beginning of the list for each new location
access.
Allocating clusters of blocks reduces the space wasted by pointers, at the cost of
internal fragmentation.
Another big problem with linked allocation is reliability if a pointer is lost or
damaged. Doubly linked lists provide some protection, at the cost of additional
overhead and wasted space.
The File Allocation Table, FAT, used by DOS is a variation of linked allocation,
where all the links are stored in a separate table at the beginning of the disk. The
benefit of this approach is that the FAT table can be cached in memory, greatly
improving random access speeds.
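A small sketch of why a cached FAT speeds up random access: reaching logical block n of a file means following n links in the in-memory table, with no intervening disk reads. The table layout below is simplified for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define FAT_EOF 0xFFFFFFFFu     /* end-of-chain marker ( simplified ) */

/* fat[i] holds the number of the block that follows block i in its file.
   Here a tiny FAT is hard-coded: one file occupies blocks 2 -> 5 -> 9. */
static uint32_t fat[16] = {
    [2] = 5, [5] = 9, [9] = FAT_EOF,
};

/* Return the physical block holding logical block n of a file that starts
   at block 'first'. Because the FAT is cached in memory, following the
   chain requires no disk I/O. */
static uint32_t fat_block(uint32_t first, unsigned n)
{
    uint32_t b = first;
    while (n-- > 0 && b != FAT_EOF)
        b = fat[b];
    return b;
}

int main(void)
{
    printf("logical block 2 of the file -> physical block %u\n",
           (unsigned)fat_block(2, 2));   /* prints 9 */
    return 0;
}
```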
File-allocation table
❖ Indexed Allocation
Indexed Allocation combines all of the indexes for accessing each file into a
common block ( for that file ), as opposed to spreading them all over the disk or
storing them in a FAT table.
Some disk space is wasted ( relative to linked lists or FAT tables ) because an entire index block must be allocated for each file, regardless of how many data blocks the file contains. This leads to questions of how big the index block should be, and how it should be implemented. There are several approaches:
Linked Scheme - An index block is one disk block, which can be read and written in a single disk operation. The first index block contains some header information, the first N block addresses, and if necessary a pointer to additional linked index blocks.
Multi-Level Index - The first index block contains a set of pointers to secondary index blocks, which in turn contain pointers to the actual data blocks.
Combined Scheme - This is the scheme used in UNIX inodes, in which the first 12 or so data block pointers are stored directly in the inode, and then singly, doubly, and triply indirect pointers provide access to more data blocks as needed. ( See below. ) The advantage of this scheme is that for small files ( which many are ), the data blocks are readily accessible.
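As a rough worked example of the combined scheme, the sketch below classifies how a logical block would be reached, assuming 12 direct pointers and index blocks holding 1024 pointers each ( typical illustrative values, not those of any specific UNIX variant ):

```c
#include <stdio.h>

#define NDIRECT       12      /* direct pointers in the inode ( illustrative )   */
#define PTRS_PER_BLK  1024    /* 4 KB index block / 4-byte pointers ( assumed )  */

/* Classify how logical block 'b' of a file would be reached. */
static const char *inode_path(long b)
{
    if (b < NDIRECT)
        return "direct pointer in the inode";
    b -= NDIRECT;
    if (b < PTRS_PER_BLK)
        return "single indirect block";
    b -= PTRS_PER_BLK;
    if (b < (long)PTRS_PER_BLK * PTRS_PER_BLK)
        return "double indirect block";
    return "triple indirect block";
}

int main(void)
{
    printf("block 5      : %s\n", inode_path(5));        /* direct          */
    printf("block 500    : %s\n", inode_path(500));      /* single indirect */
    printf("block 200000 : %s\n", inode_path(200000));   /* double indirect */
    return 0;
}
```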
❖ Performance
The optimal allocation method is different for sequential access files than for random access files, and is also different for small files than for large files.
Some systems support more than one allocation method, which may require specifying how the file is to be used ( sequential or random access ) at the time it is allocated. Such systems also provide conversion utilities.
Some systems have been known to use contiguous allocation for small files, and automatically switch to an indexed scheme when file sizes surpass a certain threshold.
And of course some systems adjust their allocation schemes ( e.g. block sizes ) to best match the characteristics of the hardware for optimum performance.
Q)Free-Space Management
Another important aspect of disk management is keeping track of and allocating
free space.
❖ Bit Vector
One simple approach is to use a bit vector, in which each bit represents a disk
block, set to 1 if free or 0 if allocated.
Fast algorithms exist for quickly finding contiguous blocks of a given size.
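Those fast algorithms usually amount to scanning the bitmap a word at a time: skip words that are all 0 ( fully allocated ), then locate the first set bit in the first non-zero word. A minimal sketch, using the same convention as the text ( 1 = free ):

```c
#include <stdint.h>

/* Find the first free block in a bit vector where bit = 1 means free.
   nwords is the vector length in 64-bit words; returns -1 if disk is full. */
long first_free_block(const uint64_t *bitmap, long nwords)
{
    for (long w = 0; w < nwords; w++) {
        if (bitmap[w] == 0)
            continue;                    /* every block in this word is allocated */
        uint64_t word = bitmap[w];
        long bit = 0;
        while ((word & 1) == 0) {        /* locate the lowest set ( free ) bit */
            word >>= 1;
            bit++;
        }
        return w * 64 + bit;             /* block number = word offset * 64 + bit offset */
    }
    return -1;
}
```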
The down side is that a 40 GB disk requires over 5 MB just to store the bitmap, for example.
❖ Linked List
A linked list can also be used to keep track of all free blocks.
Traversing the list and/or finding a contiguous block of a given size are not easy, but fortunately are not frequently needed operations. Generally the system just adds and removes single blocks from the beginning of the list.
The FAT table keeps track of the free list as just one more linked list on the table.
❖ Grouping
A variation on linked-list free lists is to use links of blocks of indices of free blocks. If a block holds up to N addresses, then the first block in the linked list contains up to N-1 addresses of free blocks and a pointer to the next block of free addresses.
❖ Counting
When there are multiple contiguous blocks of free space then the system can keep track of the starting address of the group and the number of contiguous free blocks. As long as the average length of a contiguous group of free blocks is greater than two this offers a savings in space needed for the free list. ( Similar to compression techniques used for graphics images when a group of pixels all the same color is encountered. )
❖ Space Maps
Sun's ZFS file system was designed for HUGE numbers and sizes of files, directories, and even file systems.
The resulting data structures could be VERY inefficient if not implemented carefully. For example, freeing up a 1 GB file on a 1 TB file system could involve updating thousands of blocks of free-list bit maps if the file was spread across the disk.
ZFS uses a combination of techniques, starting with dividing the disk up into ( hundreds of ) metaslabs of a manageable size, each having their own space map.
Free blocks are managed using the counting technique, but rather than write the information to a table, it is recorded in a log-structured transaction record.
Adjacent free blocks are also coalesced into a larger single free block.
An in-memory space map is constructed using a balanced tree data structure, built from the log data.
The combination of the in-memory tree and the on-disk log provides for very fast and efficient management of these very large files and free blocks.
Q)Efficiency and Performance
❖ Efficiency
UNIX pre-allocates inodes, which occupies space even before any files are created.
UNIX also distributes inodes across the disk, and tries to store data files near their inode, to reduce the distance of disk seeks between the inodes and the data.
Some systems use variable size clusters depending on the file size.
The more data that is stored in a directory ( e.g. last access time ), the more often the directory blocks have to be re-written.
As technology advances, addressing schemes have had to grow as well.
Sun's ZFS file system uses 128-bit pointers, which should theoretically never need to be expanded. ( The mass required to store 2^128 bytes with atomic storage would be at least 272 trillion kilograms! )
Kernel table sizes used to be fixed, and could only be changed by rebuilding the kernel. Modern tables are dynamically allocated, but that requires more complicated algorithms for accessing them.
❖ Performance
Disk controllers generally include on-board caching. When a seek is requested, the heads are moved into place, and then an entire track is read, starting from whatever sector is currently under the heads ( reducing latency ). The requested sector is returned and the unrequested portion of the track is cached in the disk's electronics.
Some OSes cache disk blocks they expect to need again in a buffer cache.
A page cache connected to the virtual memory system is actually more efficient, as memory addresses do not need to be converted to disk block addresses and back again.
Some systems ( Solaris, Linux, Windows 2000, NT, XP ) use page caching for both process pages and file data in a unified virtual memory.
I/O without a unified buffer cache.
I/O using a unified buffer cache.
Page replacement strategies can be complicated with a unified cache, as one needs to decide whether to replace process or file pages, and how many pages to guarantee to each category of pages. Solaris, for example, has gone through many variations, resulting in priority paging giving process pages priority over file I/O pages, and setting limits so that neither can knock the other completely out of memory.
Another issue affecting performance is the question of whether to
implement synchronous writes or asynchronous writes. Synchronous writes occur
in the order in which the disk subsystem receives them, without caching;
Asynchronous writes are cached, allowing the disk subsystem to schedule writes
in a more efficient order ( See Chapter 12. ) Metadata writes are often done
synchronously. Some systems support flags to the open call requiring that writes
be synchronous, for example for the benefit of database systems that require
their writes be performed in a required order.
The type of file access can also have an impact on optimal page replacement
policies. For example, LRU is not necessarily a good policy for sequential access
files. For these types of files progression normally goes in a forward direction
only, and the most recently used page will not be needed again until after the file
has been rewound and re-read from the beginning, ( if it is ever needed at all. ) On
the other hand, we can expect to need the next page in the file fairly soon. For
this reason sequential access files often take advantage of two special policies:
Free-behind frees up a page as soon as the next page in the file is requested, with the assumption that we are now done with the old page and won't need it again for a long time.
Read-ahead reads the requested page and several subsequent pages at the same
time, with the assumption that those pages will be needed in the near future. This
is similar to the track caching that is already performed by the disk controller,
except it saves the future latency of transferring data from the disk controller
memory into motherboard main memory.
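A toy sketch of read-ahead: when page n of a sequentially accessed file is demanded, also queue reads for the next few pages so they are already cached when requested. The page-cache and disk-queue helpers here are hypothetical stand-ins, not a real kernel API.

```c
#include <stdio.h>

#define READAHEAD_WINDOW 4   /* how many extra pages to prefetch ( illustrative ) */

/* Stand-ins for the real page cache and disk queue ( hypothetical ). */
static int  page_cached(long file_id, long page_no)     { (void)file_id; (void)page_no; return 0; }
static void queue_disk_read(long file_id, long page_no) { printf("queue read: file %ld page %ld\n", file_id, page_no); }

/* Demand-read one page, plus asynchronous read-ahead of the next few. */
static void read_page_with_readahead(long file_id, long page_no)
{
    if (!page_cached(file_id, page_no))
        queue_disk_read(file_id, page_no);            /* the page needed right now */

    for (int i = 1; i <= READAHEAD_WINDOW; i++)       /* prefetch pages n+1 .. n+4 */
        if (!page_cached(file_id, page_no + i))
            queue_disk_read(file_id, page_no + i);
}

int main(void)
{
    read_page_with_readahead(7, 20);                  /* reads page 20, prefetches 21-24 */
    return 0;
}
```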
Q) Recovery
❖ Consistency Checking
The storing of certain data structures ( e.g. directories and inodes ) in memory and
the caching of disk operations can speed up performance, but what happens in
the result of a system crash? All volatile memory structures are lost, and the
information stored on the hard drive may be left in an inconsistent state.
A Consistency Checker ( fsck in UNIX, chkdsk or scandisk in Windows ) is often run
at boot time or mount time, particularly if a filesystem was not closed down
properly. Some of the problems that these tools look for include:
Disk blocks allocated to files and also listed on the free list.
Disk blocks neither allocated to files nor on the free list.
Disk blocks allocated to more than one file.
The number of disk blocks allocated to a file inconsistent with the file's stated size.
Properly allocated files / inodes which do not appear in any directory entry.
Link counts for an inode not matching the number of references to that inode in the directory structure.
Two or more identical file names in the same directory.
Illegally linked directories, e.g. cyclical relationships where those are not allowed, or files/directories that are not accessible from the root of the directory tree.
Consistency checkers will often collect questionable disk blocks into new files with names such as chk00001.dat. These files may contain valuable information that would otherwise be lost, but in most cases they can be safely deleted ( returning those disk blocks to the free list ).
UNIX caches directory information for reads, but any changes that affect space allocation or other metadata are written synchronously, before any of the corresponding data blocks are written.
❖ Log-Structured File Systems
Log-based transaction-oriented ( a.k.a. journaling ) filesystems borrow techniques developed for databases, guaranteeing that any given transaction either completes successfully or can be rolled back to a safe state before the transaction commenced:
All metadata changes are written sequentially to a log.
A set of changes for performing a specific task ( e.g. moving a file ) is a transaction.
As changes are written to the log they are said to be committed, allowing the system to return to its work.
In the meantime, the changes from the log are carried out on the actual filesystem, and a pointer keeps track of which changes in the log have been completed and which have not yet been completed.
When all changes corresponding to a particular transaction have been completed, that transaction can be safely removed from the log.
At any given time, the log will contain information pertaining to uncompleted transactions only, e.g. actions that were committed but for which the entire transaction has not yet been completed.
From the log, the remaining transactions can be completed, or if the transaction was aborted, then the partially completed changes can be undone.
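To make the commit / checkpoint distinction concrete, here is a very small in-memory sketch of a journal: each metadata change of a transaction is appended as a record, a commit record marks the transaction durable, and a checkpoint pointer advances as changes are applied to the real filesystem. The record format and names are purely illustrative.

```c
#include <stdio.h>

enum rec_type { REC_CHANGE, REC_COMMIT };

struct log_record {
    enum rec_type type;
    int           txn_id;
    char          what[64];     /* description of one metadata change */
};

#define LOG_SIZE 128
static struct log_record logbuf[LOG_SIZE];
static int log_tail = 0;        /* next free slot in the log                     */
static int checkpoint = 0;      /* everything before this has been applied       */

static void log_append(enum rec_type t, int txn, const char *what)
{
    logbuf[log_tail].type = t;
    logbuf[log_tail].txn_id = txn;
    snprintf(logbuf[log_tail].what, sizeof logbuf[log_tail].what, "%s", what);
    log_tail++;                 /* in a real system this is a sequential disk write */
}

int main(void)
{
    /* A "move file" transaction: two metadata changes, then a commit record. */
    log_append(REC_CHANGE, 1, "remove dir entry from /a");
    log_append(REC_CHANGE, 1, "add dir entry to /b");
    log_append(REC_COMMIT, 1, "");      /* once this reaches disk, txn 1 survives a crash */

    /* Later, the logged changes are applied to the filesystem proper and the
       checkpoint advances; fully applied transactions can then be trimmed.   */
    while (checkpoint < log_tail)
        checkpoint++;                    /* "apply" each record ( stubbed out here ) */
    return 0;
}
```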
❖ Other Solutions
Sun's ZFS and Network Appliance's WAFL file systems take a different approach to file system consistency.
No blocks of data are ever over-written in place. Rather the new data is written into fresh new blocks, and after the transaction is complete, the metadata ( data block pointers ) is updated to point to the new blocks. The old blocks can then be freed up for future use.
Alternatively, if the old blocks and old metadata are saved, then a snapshot of the system in its original state is preserved. This approach is taken by WAFL.
ZFS combines this with check-summing of all metadata and data blocks, and RAID, to ensure that no inconsistencies are possible, and therefore ZFS does not incorporate a consistency checker.
❖ Backup and Restore
In order to recover lost data in the event of a disk crash, it is important to conduct backups regularly.
Files should be copied to some removable medium, such as magnetic tapes, CDs, DVDs, or external removable hard drives.
A full backup copies every file on a filesystem.
Incremental backups copy only files which have changed since some previous time.
A combination of full and incremental backups can offer a compromise between full recoverability, the number and size of backup tapes needed, and the number of tapes that need to be used to do a full restore. For example, one strategy might be:
At the beginning of the month do a full backup.
At the end of the first and again at the end of the second week, back up all files which have changed since the beginning of the month.
At the end of the third week, back up all files that have changed since the end of the second week.
Every day of the month not listed above, do an incremental backup of all files that have changed since the most recent of the weekly backups described above.
Backup tapes are often reused, particularly for daily backups, but there are limits to how many times the same tape can be used.
Every so often a full backup should be made that is kept "forever" and not overwritten.
Backup tapes should be tested, to ensure that they are readable!
For optimal security, backup tapes should be kept off-premises, so that a fire or burglary cannot destroy both the system and the backups. There are companies ( e.g. Iron Mountain ) that specialize in the secure off-site storage of critical backup information.
Keep your backup tapes secure - the easiest way for a thief to steal all your data is to simply pocket your backup tapes!
Storing important files on more than one computer can be an alternate, though less reliable, form of backup.
Note that incremental backups can also help users to get back a previous version of a file that they have since changed in some way.
Beware that backups can help forensic investigators recover e-mails and other files that users had thought they had deleted!
Q)NFS
❖ Overview
Three independent file systems.
Mounting in NFS. (a) Mounts. (b) Cascading mounts.
❖ The Mount Protocol
The NFS mount protocol is similar to the local mount protocol, establishing a
connection between a specific local directory ( the mount point ) and a specific
device from a remote system.
Each server maintains an export list of the local filesystems ( directory sub-trees )
which are exportable, who they are exportable to, and what restrictions apply (
e.g. read-only access. )
The server also maintains a list of currently connected clients, so that they can be
notified in the event of the server going down and for other reasons.
Automount and autounmount are supported.
❖ The NFS Protocol
Implemented as a set of remote procedure calls ( RPCs ):
Searching for a file in a directory
Reading a set of directory entries
Manipulating links and directories
Accessing file attributes
Reading and writing files
Q) I/O SYSTEM-OVERVIEW
❖ Overview
Management of I/O devices is a very important part of the operating system - so
important and so varied that entire I/O subsystems are devoted to its operation.
( Consider the range of devices on a modern computer, from mice, keyboards,
disk drives, display adapters, USB devices, network connections, audio I/O,
printers, special devices for the handicapped, and many special-purpose
peripherals. )
I/O Subsystems must contend with two ( conflicting? ) trends: (1) The gravitation
towards standard interfaces for a wide range of devices, making it easier to add
newly developed devices to existing systems, and (2) the development of entirely
new types of devices, for which the existing standard interfaces are not always
easy to apply.
Device drivers are modules that can be plugged into an OS to handle a particular
device or category of similar devices.
Q)I/O Hardware
I/O devices can be roughly categorized as storage, communications, user-
interface, and other
Devices communicate with the computer via signals sent over wires or through
the air.
Devices connect with the computer via ports, e.g. a serial or parallel port.
A common set of wires connecting multiple devices is termed a bus.
Buses include rigid protocols for the types of messages that can be sent across the
bus and the procedures for resolving contention issues.
Figure 13.1 below illustrates three of the four bus types commonly found in a
modern PC:
A typical PC bus structure.
The PCI bus connects high-speed high-bandwidth devices to the memory subsystem ( and the CPU. )
The expansion bus connects slower low-bandwidth devices, which typically deliver data one character at a time ( with buffering. )
The SCSI bus connects a number of SCSI devices to a common SCSI controller.
A daisy-chain bus ( not shown ) is when a string of devices is connected to each other like beads on a chain, and only one of the devices is directly connected to the host.
One way of communicating with devices is through registers associated with each port. Registers may be one to four bytes in size, and may typically include ( a subset of ) the following four:
The data-in register is read by the host to get input from the device.
The data-out register is written by the host to send output.
The status register has bits read by the host to ascertain the status of the device, such as idle, ready for input, busy, error, transaction complete, etc.
The control register has bits written by the host to issue commands or to change settings of the device, such as parity checking, word length, or full- versus half-duplex operation.
❖ Polling
One simple means of device handshaking involves polling:
The host repeatedly checks the busy bit on the device until it becomes clear.
The host writes a byte of data into the data-out register, and sets the write bit in the command register ( in either order. )
The host sets the command-ready bit in the command register to notify the device of the pending command.
When the device controller sees the command-ready bit set, it first sets the busy bit.
Then the device controller reads the command register, sees the write bit set, reads the byte of data from the data-out register, and outputs the byte of data.
The device controller then clears the error bit in the status register, the command-ready bit, and finally clears the busy bit, signaling the completion of the operation.
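The host's half of this handshake can be sketched as a few reads and writes of memory-mapped registers; the register addresses and bit positions below are made up purely for illustration.

```c
#include <stdint.h>

/* Hypothetical memory-mapped registers and bits ( illustrative only ). */
#define STATUS_BUSY  0x01
#define CMD_WRITE    0x02
#define CMD_READY    0x01

static volatile uint8_t *const status_reg  = (volatile uint8_t *)0x40000000;
static volatile uint8_t *const command_reg = (volatile uint8_t *)0x40000001;
static volatile uint8_t *const data_out    = (volatile uint8_t *)0x40000002;

void polled_write_byte(uint8_t byte)
{
    while (*status_reg & STATUS_BUSY)
        ;                                  /* 1. busy-wait until the device is idle     */

    *data_out = byte;                      /* 2. place the byte in the data-out register */
    *command_reg |= CMD_WRITE;             /* 3. set the write bit ...                   */
    *command_reg |= CMD_READY;             /* 4. ... then command-ready to notify device */

    while (*status_reg & STATUS_BUSY)
        ;                                  /* 5. wait for the controller to finish       */
}
```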
❖ Interrupts
Interrupts allow devices to notify the CPU when they have data to transfer or when an operation is complete, allowing the CPU to perform other duties when no I/O transfers need its immediate attention.
The CPU has an interrupt-request line that is sensed after every instruction.
A device's controller raises an interrupt by asserting a signal on the interrupt-request line.
The CPU then performs a state save, and transfers control to the interrupt handler routine at a fixed address in memory.
Interrupt-driven I/O cycle.
System calls are implemented via software interrupts, a.k.a. traps. When a ( library ) program needs work performed in kernel mode, it sets command information and possibly data addresses in certain registers, and then raises a software interrupt. ( E.g. 21 hex in DOS. ) The system does a state save and then calls on the proper interrupt handler to process the request in kernel mode.
Software interrupts generally have low priority, as they are not as urgent as devices with limited buffering space.
Interrupts are also used to control kernel operations, and to schedule activities for optimal performance. For example, the completion of a disk read operation involves two interrupts:
A high-priority interrupt acknowledges the device completion, and issues the next disk request so that the hardware does not sit idle.
A lower-priority interrupt transfers the data from the kernel memory space to the user space, and then transfers the process from the waiting queue to the ready queue.
The Solaris OS uses a multi-threaded kernel and priority threads to assign different threads to different interrupt handlers. This allows for the "simultaneous" handling of multiple interrupts, and the assurance that high-priority interrupts will take precedence over low-priority ones and over user processes.
❖ Direct Memory Access
For devices that transfer large quantities of data ( such as disk controllers ), it is wasteful to tie up the CPU transferring data in and out of registers one byte at a time.
Instead this work can be off-loaded to a special processor, known as the Direct Memory Access, DMA, Controller.
The host issues a command to the DMA controller, indicating the location where the data is located, the location where the data is to be transferred to, and the number of bytes of data to transfer. The DMA controller handles the data transfer, and then interrupts the CPU when the transfer is complete.
A simple DMA controller is a standard component in modern PCs, and many bus-mastering I/O cards contain their own DMA hardware.
Handshaking between DMA controllers and their devices is accomplished through two wires called the DMA-request and DMA-acknowledge wires.
While the DMA transfer is going on the CPU does not have access to the PCI bus ( including main memory ), but it does have access to its internal registers and primary and secondary caches.
DMA can be done in terms of either physical addresses or virtual addresses that are mapped to physical addresses. The latter approach is known as Direct Virtual Memory Access, DVMA, and allows direct data transfer from one memory-mapped device to another without using the main memory chips.
Direct DMA access by user processes can speed up operations, but is generally forbidden by modern systems for security and protection reasons.
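From the host's point of view, starting a DMA transfer amounts to a handful of register writes, after which the CPU is free until the completion interrupt arrives. The register map below is hypothetical; a real driver would also install the interrupt handler.

```c
#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers ( illustrative only ). */
static volatile uint32_t *const dma_src   = (volatile uint32_t *)0x50000000;  /* where the data is        */
static volatile uint32_t *const dma_dst   = (volatile uint32_t *)0x50000004;  /* where it should go       */
static volatile uint32_t *const dma_count = (volatile uint32_t *)0x50000008;  /* how many bytes to move   */
static volatile uint32_t *const dma_ctrl  = (volatile uint32_t *)0x5000000C;  /* start / interrupt-enable */

#define DMA_START       0x1
#define DMA_IRQ_ENABLE  0x2

/* Program the controller and return; the controller raises an interrupt
   when the whole transfer is done, so the CPU can do other work meanwhile. */
void start_dma(uint32_t src, uint32_t dst, uint32_t nbytes)
{
    *dma_src   = src;
    *dma_dst   = dst;
    *dma_count = nbytes;
    *dma_ctrl  = DMA_START | DMA_IRQ_ENABLE;
}
```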
Steps in a DMA transfer.
Q) Application I/O Interface
User application access to a wide variety of different devices is accomplished through layering, and through encapsulating all of the device-specific code into device drivers, while application layers are presented with a common interface for all ( or at least large general categories of ) devices.
A kernel I/O structure.
❖ Block and Character Devices
Block devices are accessed a block at a time, and are indicated by a "b" as the first character in a long listing on UNIX systems. Operations supported include read( ), write( ), and seek( ).
Accessing blocks on a hard drive directly ( without going through the filesystem structure ) is called raw I/O, and can speed up certain operations by bypassing the buffering and locking normally conducted by the OS. ( It then becomes the application's responsibility to manage those issues. )
A new alternative is direct I/O, which uses the normal filesystem access, but which disables buffering and locking operations.
Memory-mapped file I/O can be layered on top of block-device drivers.
Rather than reading in the entire file, it is mapped to a range of memory addresses, and then paged into memory as needed using the virtual memory system.
❖ Network Devices
Because network access is inherently different from local disk access, most systems provide a separate interface for network devices.
One common and popular interface is the socket interface, which acts like a cable or pipeline connecting two networked entities. Data can be put into the socket at one end, and read out sequentially at the other end. Sockets are normally full-duplex, allowing for bi-directional data transfer.
The select( ) system call allows servers ( or other applications ) to identify sockets which have data waiting, without having to poll all available sockets.
❖ Clocks and Timers
Three types of time services are commonly needed in modern systems:
Get the current time of day.
Get the elapsed time ( system or wall clock ) since a previous event.
Set a timer to trigger event X at time T.
Unfortunately time operations are not standard across all systems.
A programmable interrupt timer, PIT, can be used to trigger operations and to measure elapsed time. It can be set to trigger an interrupt at a specific future time, or to trigger interrupts periodically on a regular basis.
The scheduler uses a PIT to trigger interrupts for ending time slices.
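On POSIX systems, one way a user process can request periodic timer interrupts is setitimer( ), which delivers SIGALRM at a fixed interval - a user-level analogue of programming a PIT. A minimal example ( error checking omitted ):

```c
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

static void on_tick(int sig)
{
    (void)sig;
    write(STDOUT_FILENO, "tick\n", 5);   /* async-signal-safe output */
}

int main(void)
{
    struct itimerval tv = {
        .it_interval = { .tv_sec = 1, .tv_usec = 0 },   /* repeat every second    */
        .it_value    = { .tv_sec = 1, .tv_usec = 0 },   /* first tick in 1 second */
    };

    signal(SIGALRM, on_tick);
    setitimer(ITIMER_REAL, &tv, NULL);   /* request periodic SIGALRM delivery */

    for (;;)
        pause();                         /* sleep until the next signal */
}
```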
❖ Blocking and Non-blocking I/O
With blocking I/O a process is moved to the wait queue when an I/O request is
made, and moved back to the ready queue when the request completes, allowing
other processes to run in the meantime.
With non-blocking I/O the I/O request returns immediately, whether the
requested I/O operation has ( completely ) occurred or not. This allows the
process to check for available data without getting hung completely if it is not
there.
One approach for programmers to implement non-blocking I/O is to have a multi-
threaded application, in which one thread makes blocking I/O calls ( say to read a
keyboard or mouse ), while other threads continue to update the screen or
perform other tasks.
A subtle variation of the non-blocking I/O is the asynchronous I/O, in which the
I/O request returns immediately allowing the process to continue on with other
tasks, and then the process is notified ( via changing a process variable, or a
software interrupt, or a callback function ) when the I/O operation has completed
and the data is available for use.
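A small POSIX illustration of non-blocking I/O: with O_NONBLOCK set on the descriptor, read( ) returns immediately with errno set to EAGAIN instead of putting the process to sleep when no data is available.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[128];
    int fd = open("/dev/tty", O_RDONLY | O_NONBLOCK);   /* non-blocking terminal read */
    if (fd < 0) { perror("open"); return 1; }

    ssize_t n = read(fd, buf, sizeof buf);
    if (n >= 0)
        printf("read %zd bytes\n", n);
    else if (errno == EAGAIN || errno == EWOULDBLOCK)
        printf("no data available right now; do other work and try again\n");
    else
        perror("read");

    close(fd);
    return 0;
}
```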
❖ Buffering
Buffering of I/O is performed for ( at least ) 3 major reasons:
Speed differences between two devices. ( See Figure 13.10 below. ) A slow device
may write data into a buffer, and when the buffer is full, the entire buffer is sent
to the fast device all at once. So that the slow device still has somewhere to write
while this is going on, a second buffer is used, and the two buffers alternate as
each becomes full. This is known as double buffering. ( Double buffering is often
used in ( animated ) graphics, so that one screen image can be generated in a
buffer while the other ( completed ) buffer is displayed on the screen. This
prevents the user from ever seeing any half-finished screen images. )
Data transfer size differences. Buffers are used in particular in networking
systems to break messages up into smaller packets for transfer, and then for re-
assembly at the receiving side.
Q) Kernel I/O Subsystem
❖ I/O Scheduling
Scheduling I/O requests can greatly improve overall efficiency. Priorities can also play a part in request scheduling.
Buffering and caching can also help, and can allow for more flexible scheduling options.
On systems with many devices, separate request queues are often kept for each device:
Device-status table.
❖ Caching
Caching involves keeping a copy of data in a faster-access location than where the data is normally stored.
Buffering and caching are very similar, except that a buffer may hold the only copy of a given data item, whereas a cache is just a duplicate copy of some other data stored elsewhere.
Buffering and caching go hand-in-hand, and often the same storage space may be used for both purposes. For example, after a buffer is written to disk, the copy in memory can be used as a cached copy ( until that buffer is needed for other purposes. )
❖ Spooling and Device Reservation
A spool ( Simultaneous Peripheral Operations On-Line ) buffers data for (
peripheral ) devices such as printers that cannot support interleaved data
streams.
If multiple processes want to print at the same time, they each send their print
data to files stored in the spool directory. When each file is closed, then the
application sees that print job as complete, and the print scheduler sends each file
to the appropriate printer one at a time.
Support is provided for viewing the spool queues, removing jobs from the queues,
moving jobs from one queue to another queue, and in some cases changing the
priorities of jobs in the queues.
Spool queues can be general ( any laser printer ) or specific ( printer number 42. )
OSes can also provide support for processes to request / get exclusive access to a
particular device, and/or to wait until a device becomes available.
❖ Error Handling
I/O requests can fail for many reasons, either transient ( buffers overflow ) or
permanent ( disk crash ).
I/O requests usually return an error bit ( or more ) indicating the problem. UNIX systems also set the global variable errno to one of a hundred or so well-defined values to indicate the specific error that has occurred. ( See errno.h for a complete listing, or man errno. )
Some devices, such as SCSI devices, are capable of providing much more detailed information about errors, and even keep an on-board error log that can be requested by the host.
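For example, a failing read( ) is detected from its -1 return value, and errno then identifies the specific cause:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[64];
    ssize_t n = read(-1, buf, sizeof buf);   /* deliberately invalid descriptor */
    if (n < 0)
        /* errno names the specific failure, here EBADF ( "Bad file descriptor" ) */
        printf("read failed: errno=%d (%s)\n", errno, strerror(errno));
    return 0;
}
```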
❖ I/O Protection
The I/O system must protect against either accidental or deliberate erroneous I/O.
User applications are not allowed to perform I/O in user mode - all I/O requests are handled through system calls that must be performed in kernel mode.
Use of a system call to perform I/O.
Memory-mapped areas and I/O ports must be protected by the memory management system, but access to these areas cannot be totally denied to user programs. ( Video games and some other applications need to be able to write directly to video memory for optimal performance, for example. ) Instead the memory protection system restricts access so that only one process at a time can access particular parts of memory, such as the portion of the screen memory corresponding to a particular window.
❖ Kernel Data Structures
The kernel maintains a number of important data structures pertaining to the I/O system, such as the open file table.
These structures are object-oriented, and flexible to allow access to a wide variety of I/O devices through a common interface. ( See Figure 13.12 below. )
Windows NT carries the object-orientation one step further, implementing I/O as a message-passing system from the source through various intermediaries to the device.
A series of lookup tables and mappings makes the access of different devices
flexible, and somewhat transparent to users.
UNIX I/O kernel structure.
Q) Transforming I/O Requests to Hardware Operations
Users request data using file names, which must ultimately be mapped to specific
blocks of data from a specific device managed by a specific device driver.
DOS uses the colon separator to specify a particular device ( e.g. C:, LPT:, etc. )
UNIX uses a mount table to map filename prefixes ( e.g. /usr ) to specific mounted
devices. Where multiple entries in the mount table match different prefixes of the
filename the one that matches the longest prefix is chosen. ( e.g. /usr/home
instead of /usr where both exist in the mount table and both match the desired
file. )
UNIX uses special device files, usually located in /dev, to represent and access
physical devices directly.
Each device file has a major and minor number associated with it, stored and
displayed where the file size would normally go.
The major number is an index into a table of device drivers, and indicates which
device driver handles this device. ( E.g. the disk drive handler. )
The minor number is a parameter passed to the device driver, and indicates which specific device is to be accessed, out of the many which may be handled by a particular device driver. ( e.g. a particular disk drive or partition. )
The life cycle of an I/O request.
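On Linux, for instance, the major and minor numbers of a device file can be read back with stat( ), using the major( ) and minor( ) macros from <sys/sysmacros.h>; the device path below is only an example and may differ from system to system.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(void)
{
    struct stat st;
    const char *dev = "/dev/sda1";      /* example path; may differ per system */

    if (stat(dev, &st) != 0) { perror("stat"); return 1; }

    /* st_rdev holds the device number for block / character special files. */
    printf("%s: major=%u ( which driver ), minor=%u ( which device/partition )\n",
           dev, major(st.st_rdev), minor(st.st_rdev));
    return 0;
}
```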
Q) STREAMS
The streams mechanism in UNIX provides a bi-directional pipeline between a user
process and a device driver, onto which additional modules can be added.
The user process interacts with the stream head.
The device driver interacts with the device end.
Zero or more stream modules can be pushed onto the stream, using ioctl( ). These
modules may filter and/or modify the data as it passes through the stream.
Each module has a read queue and a write queue.
Flow control can be optionally supported, in which case each module will buffer
data until the adjacent module is ready to receive it. Without flow control, data is
passed along as soon as it is ready.
User processes communicate with the stream head using either read( ) and write(
) ( or putmsg( ) and getmsg( ) for message passing. )
Streams I/O is asynchronous ( non-blocking ), except for the interface between
the user process and the stream head.
The device driver must respond to interrupts from its device - If the adjacent
module is not prepared to accept data and the device driver's buffers are all full,
then data is typically dropped.
Streams are widely used in UNIX, and are the preferred approach for device
drivers. For example, UNIX implements sockets using streams.
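On systems that support STREAMS ( e.g. Solaris ), a module is pushed onto an open stream with the I_PUSH ioctl from <stropts.h>. The device path and module name below are just examples, and this will not build on systems without STREAMS support.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stropts.h>    /* STREAMS ioctls: I_PUSH, I_POP, ... ( not available on Linux ) */

int main(void)
{
    int fd = open("/dev/term/a", O_RDWR);   /* example STREAMS device on Solaris */
    if (fd < 0) { perror("open"); return 1; }

    /* Push a line-discipline module onto the stream; data written to fd now
       flows through this module on its way to the device driver. */
    if (ioctl(fd, I_PUSH, "ldterm") < 0)
        perror("ioctl(I_PUSH)");

    return 0;
}
```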
The STREAMS structure