Operating Systems: Internals and Design Principles
Ninth Edition
By William Stallings
Chapter 11: I/O Management and Disk Scheduling
© 2017 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Categories of I/O Devices
External devices that engage in I/O with computer
systems can be grouped into three categories:
Human readable
• Suitable for communicating with the computer user
• Printers, terminals, video display, keyboard, mouse
Machine readable
• Suitable for communicating with electronic equipment
• Disk drives, USB keys, sensors, controllers
Communication
• Suitable for communicating with remote devices
• Modems, digital line drivers
Differences in I/O Devices
Devices differ in a number of areas:
Data Rate
• There may be differences of several orders of magnitude between the data transfer rates
Application
• The use to which a device is put has an influence on the software
Complexity of Control
• The effect on the operating system is filtered by the complexity of the I/O module that controls the device
Unit of Transfer
• Data may be transferred as a stream of bytes or characters or in larger blocks
Data Representation
• Different data encoding schemes are used by different devices
Error Conditions
• The nature of errors, the way in which they are reported, their consequences, and the available range of responses differ from one device to another
Organization of the I/O Function
Three techniques for performing I/O are:
Programmed I/O
The processor issues an I/O command on behalf of a process to an I/O module; that
process then busy waits for the operation to be completed before proceeding
Interrupt-driven I/O
The processor issues an I/O command on behalf of a process
If non-blocking – processor continues to execute instructions from the process that
issued the I/O command
If blocking – the next instruction the processor executes is from the OS, which will
put the current process in a blocked state and schedule another process
Direct Memory Access (DMA)
A DMA module controls the exchange of data between main memory and an I/O
module
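As a contrast with DMA and interrupt-driven I/O, the busy waiting of programmed I/O can be sketched in a few lines. The `MockDevice` below is invented for illustration: a thread flips a status flag where real hardware would set a status register.

```python
import threading
import time

class MockDevice:
    """Hypothetical device: a background thread stands in for the hardware."""
    def __init__(self):
        self.ready = False
        self.data = None

    def start_read(self):
        def complete():
            time.sleep(0.01)        # simulated transfer latency
            self.data = b"sector-data"
            self.ready = True       # device sets its "done" status bit
        threading.Thread(target=complete).start()

def programmed_io_read(dev):
    """Programmed I/O: issue the command, then busy-wait on the status flag."""
    dev.start_read()
    while not dev.ready:            # the processor spins, doing no useful work
        pass
    return dev.data

print(programmed_io_read(MockDevice()))
```

The spin loop is exactly the cost the interrupt-driven and DMA techniques avoid: the processor is occupied for the full duration of the transfer.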
Table 11.1 I/O Techniques
Evolution of the I/O Function
1. Processor directly controls a peripheral device
2. A controller or I/O module is added
3. Same configuration as step 2, but now interrupts are employed
4. The I/O module is given direct control of memory via DMA
5. The I/O module is enhanced to become a separate processor, with a specialized instruction set tailored for I/O
6. The I/O module has a local memory of its own and is, in fact, a computer in its own right
Design Objectives
Efficiency
Major effort in I/O design
Important because I/O operations often form a bottleneck
Most I/O devices are extremely slow compared with main memory and the processor
The area that has received the most attention is disk I/O
Generality
Desirable to handle all devices in a uniform manner
Applies to the way processes view I/O devices and the way the operating system manages I/O devices and operations
Diversity of devices makes it difficult to achieve true generality
Use a hierarchical, modular approach to the design of the I/O function
Buffering
To avoid overheads and inefficiencies, it is sometimes convenient to perform
input transfers in advance of requests being made, and to perform output
transfers some time after the request is made
Block-oriented device
• Stores information in blocks that are usually of fixed size
• Transfers are made one block at a time
• Possible to reference data by its block number
• Disks and USB keys are examples
Stream-oriented device
• Transfers data in and out as a stream of bytes
• No block structure
• Terminals, printers, communications ports, mouse and other pointing devices, and most other devices that are not secondary storage are examples
No Buffer
Without a buffer, the OS directly accesses the device when it needs to
Single Buffer
The simplest type of support that the operating system can provide
When a user process issues an I/O request, the OS assigns a buffer in the system portion of main memory to the operation
Single Buffer
Input transfers are made to the system buffer
Reading ahead/anticipated input
Is done in the expectation that the block will eventually be needed
When the transfer is complete, the process moves the block into user space and
immediately requests another block
Approach generally provides a speedup compared to the lack of system buffering
The user process can be processing one block of data while the next block is being read
in
The OS is able to swap the process out because the input operation is taking place in
system memory rather than user process memory
Disadvantages:
Complicates the logic in the operating system
Swapping logic is also affected
Single Buffering for Stream-Oriented I/O
Can be used in a line-at-a-time fashion or a byte-at-a-time fashion
Line-at-a-time operation is appropriate for scroll-mode terminals (dumb terminals)
With this form of terminal, user input is one line at a time, with a carriage return signaling the end of a line
Output to the terminal is similarly one line at a time
Byte-at-a-time operation is used on forms-mode terminals, when each keystroke is significant, and for many other peripherals, such as sensors and controllers
Double Buffer
Assigns two system buffers to the operation
A process now transfers data to or from one buffer while the operating system empties or fills the other buffer
Also known as buffer swapping
Circular Buffer
When more than two buffers are used, the collection of buffers is itself referred to as a circular buffer
Each individual buffer is one unit in the circular buffer
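A circular buffer can be sketched as a fixed pool of unit buffers tracked by a head index and a count; this is an illustrative model, not kernel code.

```python
class CircularBuffer:
    """Fixed pool of N unit buffers: a producer (the device side) fills
    slots, a consumer (the process) drains them in FIFO order."""
    def __init__(self, nbufs):
        self.slots = [None] * nbufs
        self.head = 0    # index of the oldest filled unit
        self.count = 0   # number of filled units

    def put(self, block):
        if self.count == len(self.slots):
            return False           # full: the producer must wait
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = block
        self.count += 1
        return True

    def get(self):
        if self.count == 0:
            return None            # empty: the consumer must wait
        block = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return block

rb = CircularBuffer(3)
for b in ("b0", "b1", "b2", "b3"):
    rb.put(b)                      # the fourth put is rejected: buffer is full
print(rb.get(), rb.get())          # -> b0 b1, oldest units first
```

Double buffering is simply the `nbufs == 2` case of this structure.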
Disk Performance Parameters
The actual details of disk I/O operation depend on the:
Computer system
Operating system
Nature of the I/O channel and disk controller hardware
Disk Performance Parameters
When the disk drive is operating, the disk is rotating at constant speed
To read or write the head must be positioned at the desired track and at the
beginning of the desired sector on that track
Track selection involves moving the head in a movable-head system or
electronically selecting one head on a fixed-head system
On a movable-head system the time it takes to position the head at the
track is known as seek time
The time it takes for the beginning of the sector to reach the head is
known as rotational delay
The sum of the seek time and the rotational delay equals the access time
Seek Time
The time required to move the disk arm to the required track
Consists of two key components:
The initial startup time
The time taken to traverse the tracks that have to be crossed once the access
arm is up to speed
Settling time
Time after positioning the head over the target track until track identification
is confirmed
Much improvement comes from smaller and lighter disk components
A typical average seek time on contemporary hard disks is under
10ms
Disk Performance
Rotational delay
The time required for the addressed area of the disk to rotate into a
position where it is accessible by the read/write head
Disks rotate at speeds ranging from 3,600 rpm (for handheld devices such as digital cameras) up to 15,000 rpm
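Since the target sector is, on average, half a revolution away from the head, the average rotational delay follows directly from the rotation speed. A small sketch of the arithmetic:

```python
def avg_rotational_delay_ms(rpm):
    # one revolution takes 60,000/rpm milliseconds; on average the
    # target sector is half a revolution away when the seek completes
    return (60_000 / rpm) / 2

def access_time_ms(seek_ms, rpm):
    # access time = seek time + rotational delay (transfer time excluded)
    return seek_ms + avg_rotational_delay_ms(rpm)

print(avg_rotational_delay_ms(15_000))   # -> 2.0 ms at 15,000 rpm
print(access_time_ms(4.0, 15_000))       # -> 6.0 ms with a 4 ms seek
```

The 4 ms seek used above is only an illustrative figure, consistent with the "under 10 ms" typical average seek quoted earlier.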
Table 11.2 Comparison of Disk Scheduling Algorithms
First-In, First-Out (FIFO)
Processes requests in sequential order
Fair to all processes
Approximates random scheduling in performance if
there are many processes competing for the disk
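A FIFO schedule is easy to simulate by summing head movement over the queue in arrival order. The function is a generic sketch; the track numbers below are the ones used in the chapter's worked example, with the head initially at track 100.

```python
def fifo_head_movement(start, requests):
    """Total tracks traversed when requests are serviced in arrival order."""
    total, pos = 0, start
    for track in requests:
        total += abs(track - pos)
        pos = track
    return total

# request queue (track numbers) in arrival order, head at track 100
print(fifo_head_movement(100, [55, 58, 39, 18, 90, 160, 150, 38, 184]))
# -> 498 tracks in total, i.e. an average seek length of about 55.3
```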
Table 11.3 Disk Scheduling Algorithms
Priority (PRI)
Control of the scheduling is outside the control of disk management
software
Goal is not to optimize disk utilization but to meet other objectives
Short batch jobs and interactive jobs are given higher priority
Provides good interactive response time
Longer jobs may have to wait an excessively long time
A poor policy for database systems
Shortest Service Time First (SSTF)
Select the disk I/O request that requires the least movement of the disk arm from its current position
Always choose the minimum seek time
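SSTF is a greedy choice: at each step, service the pending request closest to the current head position. A sketch, again using the chapter's example sequence:

```python
def sstf_order(start, requests):
    """Greedy: repeatedly pick the pending request nearest the head."""
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda t: abs(t - pos))  # minimum seek
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

print(sstf_order(100, [55, 58, 39, 18, 90, 160, 150, 38, 184]))
# -> [90, 58, 55, 39, 38, 18, 150, 160, 184]
```

Note how the two distant requests (150, 160, 184) are postponed until everything nearby is done; under heavy load, requests far from the head can starve.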
SCAN
Also known as the elevator algorithm
Arm moves in one direction only, satisfying all outstanding requests until it reaches the last track in that direction; then the direction is reversed
Favors jobs whose requests are for tracks nearest to both innermost and outermost tracks, and favors the latest-arriving jobs
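A simplified SCAN can be modeled by splitting the pending requests around the head position and servicing one side, then the other. This toy version reverses at the last pending request rather than the physical last track:

```python
def scan_order(start, requests, direction="up"):
    """Service everything in the current direction, then reverse."""
    up   = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down if direction == "up" else down + up

print(scan_order(100, [55, 58, 39, 18, 90, 160, 150, 38, 184]))
# -> [150, 160, 184, 90, 58, 55, 39, 38, 18]
```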
C-SCAN (Circular SCAN)
Restricts scanning to one direction only
When the last track has been visited in one direction, the arm is returned to the opposite end of the disk and the scan begins again
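C-SCAN differs from the SCAN sketch only in the wrap-around: instead of reversing, the arm jumps back and resumes from the lowest pending track.

```python
def cscan_order(start, requests):
    """One-way scan: service upward, jump back, resume from the bottom."""
    up   = sorted(t for t in requests if t >= start)
    wrap = sorted(t for t in requests if t < start)
    return up + wrap

print(cscan_order(100, [55, 58, 39, 18, 90, 160, 150, 38, 184]))
# -> [150, 160, 184, 18, 38, 39, 55, 58, 90]
```

Because every track always sees the arm approach from the same direction, the expected waiting time is more uniform than with SCAN.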
N-Step-SCAN
Segments the disk request queue into subqueues of length N
Subqueues are processed one at a time, using SCAN
While a queue is being processed new requests must be added to some
other queue
If fewer than N requests are available at the end of a scan, all of them
are processed with the next scan
FSCAN
Uses two subqueues
When a scan begins, all of the requests are in one of the queues, with
the other empty
During scan, all new requests are put into the other queue
Service of new requests is deferred until all of the old requests have
been processed
RAID
(Redundant Array of Independent Disks)
Consists of seven levels, zero through six
Design architectures share three characteristics:
RAID is a set of physical disk drives viewed by the operating system as a single logical drive
Data are distributed across the physical drives of an array in a scheme known as striping
Redundant disk capacity is used to store parity information, which guarantees data recoverability in case of a disk failure
RAID
The term was originally coined in a paper by a group of researchers at the
University of California at Berkeley
The paper outlined various configurations and applications and
introduced the definitions of the RAID levels
Strategy employs multiple disk drives and distributes data in such a way as to
enable simultaneous access to data from multiple drives
Improves I/O performance and allows easier incremental increases in
capacity
The unique contribution is to address effectively the need for redundancy
Makes use of stored parity information that enables the recovery of data lost
due to a disk failure
Table 11.4 RAID Levels (page 498 in textbook); N = number of data disks, m proportional to log N
RAID Level 0
Not a true RAID because it does not include redundancy to improve performance or provide data protection
User and system data are distributed across all of the disks in the array
Logical disk is divided into strips
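Striping maps consecutive logical strips round-robin across the array; a minimal sketch of the (disk, offset) mapping:

```python
def locate_strip(strip, ndisks):
    """Round-robin striping: logical strip i lives on disk i mod N,
    at strip offset i div N within that disk."""
    return strip % ndisks, strip // ndisks

# logical strips 0..7 across a 4-disk array:
# strips 0-3 land on disks 0-3 (offset 0), strips 4-7 wrap (offset 1)
print([locate_strip(i, 4) for i in range(8)])
```

Because adjacent strips sit on different disks, a large sequential request can be serviced by all N disks in parallel, which is where the RAID 0 performance gain comes from.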
RAID Level 1
Redundancy is achieved by the simple expedient of duplicating all the data
There is no “write penalty”
When a drive fails, the data may still be accessed from the second drive
Principal disadvantage is the cost
RAID Level 2
Makes use of a parallel access technique
Data striping is used
Typically a Hamming code is used
Effective choice in an environment in which many disk errors occur
RAID Level 3
Requires only a single redundant disk, no matter how large the disk array
Employs parallel access, with data distributed in small strips
Can achieve very high data transfer rates
RAID Level 4
Makes use of an independent access technique
A bit-by-bit parity strip is calculated across corresponding strips on each data disk, and the parity bits are stored in the corresponding strip on the parity disk
Involves a write penalty when an I/O write request of small size is performed
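The small-write penalty arises because parity must be updated on every write; the saving grace is that only the old data strip and the old parity need to be read, since new parity = old parity XOR old data XOR new data. A sketch with small integers standing in for strips:

```python
def update_parity(old_parity, old_data, new_data):
    # new parity = old parity XOR old data XOR new data,
    # so the other data disks need not be read at all
    return old_parity ^ old_data ^ new_data

# three data strips and their parity (bitwise XOR across the strips)
strips = [0b1010, 0b0110, 0b0001]
parity = strips[0] ^ strips[1] ^ strips[2]

# small write to strip 1: read old data + old parity, write new data + new parity
new_strip1 = 0b1111
parity = update_parity(parity, strips[1], new_strip1)
strips[1] = new_strip1

# the invariant still holds: XOR of the data strips equals the parity strip
print(parity == strips[0] ^ strips[1] ^ strips[2])   # -> True
```

The same XOR invariant is what makes reconstruction possible: any one lost strip is the XOR of all the surviving strips in its stripe.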
RAID Level 5
Similar to RAID 4 but distributes the parity strips across all disks
Typical allocation is a round-robin scheme
Has the characteristic that the loss of any one disk does not result in data loss
RAID Level 6
Two different parity calculations are carried out and stored in separate blocks on different disks
Provides extremely high data availability
Incurs a substantial write penalty because each write affects two parity blocks
Summary of RAID
Of the seven RAID levels described, only four are commonly used: RAID
levels 0, 1, 5, and 6.
RAID 0
• Based on the striping concept
• Expands disk capacity, e.g., disk 1 = 40 GB, disk 2 = 40 GB, total = 80 GB
• Requires a minimum of 2 drives to implement
• The user and system data are distributed across all of the disks in the array
• Disadvantage: offers no fault tolerance; if a drive fails, data is lost
• Ideal use: non-critical storage of data that has to be read/written at high speed, e.g., a Photoshop image retouching station
RAID 1
• Based on the mirroring concept
• Example: two disks, 1 online and 1 hot swap
• When a drive fails, the data may still be accessed from the second drive
• Disadvantage: costly, because it needs twice the disk space of the logical disk that it supports
• Ideal use: mission-critical storage; also suitable for small servers in which only two disks will be used, e.g., applications such as accounting and payroll
RAID 5
• Example: four disks, 3 online and 1 hot swap
• Parity strips distributed across all disks
• Disadvantages: a disk failure affects throughput, and the technology is complex
• Ideal use: a good system that combines efficient storage with security and performance, e.g., file and application servers
Disk Cache
The term cache memory usually refers to a memory that is smaller and faster than main memory and that is interposed between main memory and the processor
Reduces average memory access time by exploiting the principle of locality
Disk cache is a buffer in main memory for disk sectors
Contains a copy of some of the sectors on the disk
When an I/O request is made for a particular sector, a check is made to determine if the sector is in the disk cache
If YES: the request is satisfied via the cache
If NO: the requested sector is read into the disk cache from the disk
Least Recently Used (LRU)
Most commonly used algorithm that deals with the design issue of
replacement strategy
The block that has been in the cache the longest with no reference to it
is replaced
A stack of pointers references the cache
Most recently referenced block is on the top of the stack
When a block is referenced or brought into the cache, it is placed on the top
of the stack
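A minimal LRU sketch, using Python's `OrderedDict` in place of the stack of pointers; this is illustrative only, not how a kernel implements it.

```python
from collections import OrderedDict

class LRUCache:
    """The most recently referenced block sits at one end of the
    ordering; the block at the other end is the eviction victim."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block number -> cached data

    def access(self, block_no, data=None):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)    # re-reference: move to top
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[block_no] = data
        return self.blocks[block_no]

cache = LRUCache(3)
for b in (1, 2, 3, 1, 4):    # block 2 is the LRU when 4 arrives
    cache.access(b)
print(list(cache.blocks))    # -> [3, 1, 4]
```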
Least Frequently Used (LFU)
The block that has experienced the fewest references is replaced
A counter is associated with each block
Counter is incremented each time block is accessed
When replacement is required, the block with the smallest count is
selected
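A minimal LFU sketch with a per-block reference counter, again illustrative only:

```python
class LFUCache:
    """One reference counter per cached block; the block with the
    smallest count is evicted when space is needed."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}     # block number -> reference count

    def access(self, block_no):
        if block_no in self.counts:
            self.counts[block_no] += 1      # count every reference
        else:
            if len(self.counts) >= self.capacity:
                victim = min(self.counts, key=self.counts.get)
                del self.counts[victim]     # evict the smallest count
            self.counts[block_no] = 1

cache = LFUCache(3)
for b in (1, 1, 2, 3, 2, 4):   # block 3 has the fewest references
    cache.access(b)
print(sorted(cache.counts))    # -> [1, 2, 4]
```

The counter makes LFU vulnerable to a block that is referenced in a brief burst and never again: its high count keeps it resident long after it has stopped being useful.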
UNIX Buffer Cache
Is essentially a disk cache
I/O operations with disk are handled through the buffer cache
The data transfer between the buffer cache and the user process space always
occurs using DMA
Does not use up any processor cycles
Does consume bus cycles
Three lists are maintained:
Free list
List of all slots in the cache that are available for allocation
Device list
List of all buffers currently associated with each disk
Driver I/O queue
List of buffers that are actually undergoing or waiting for I/O on a particular device
Character Queue
Used by character-oriented devices
Terminals and printers
Either written by the I/O device and read by the process or
vice versa
Producer/consumer model is used
Character queues may only be read once
As each character is read, it is effectively destroyed
Unbuffered I/O
Is simply DMA between device and process space
Is always the fastest method for a process to perform I/O
Process is locked in main memory and cannot be swapped out
I/O device is tied up with the process for the duration
of the transfer making it unavailable for other
processes
Table 11.5 Device I/O in UNIX
Linux I/O
Very similar to other UNIX implementations
Associates a special file with each I/O device driver
Block, character, and network devices are recognized
Default disk scheduler in Linux 2.4 is the Linux Elevator
For Linux 2.6 the Elevator algorithm has been
augmented by two additional algorithms:
• The deadline I/O scheduler
• The anticipatory I/O scheduler
The Elevator Scheduler
Maintains a single queue for disk read and write requests and performs
both sorting and merging functions on the queue
When a new request is added to the queue, four operations are considered
in order:
If the request is to the same on-disk sector or an immediately adjacent
sector to a pending request in the queue, then the existing request and the
new request are merged into one request
If a request in the queue is sufficiently old, the new request is inserted at
the tail of the queue
If there is a suitable location, the new request is inserted in sorted order
If there is no suitable location, the new request is placed at the tail of the
queue
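The insertion steps above can be caricatured in a few lines. This toy model uses bare sector numbers, handles only the merge and sorted-insert steps, and omits the aging rule, so it is far simpler than the real elevator:

```python
import bisect

def elevator_insert(queue, request):
    """Toy elevator insertion: merge with an adjacent pending request
    if possible, otherwise insert the request in sorted order."""
    if any(abs(r - request) <= 1 for r in queue):
        return queue                  # merge: adjacent request already pending
    bisect.insort(queue, request)     # otherwise keep the queue sorted
    return queue

q = [10, 40, 90]
elevator_insert(q, 41)    # adjacent to 40: merged, queue unchanged
elevator_insert(q, 60)    # no neighbor: inserted in sorted position
print(q)                  # -> [10, 40, 60, 90]
```

Keeping the queue sorted is what lets dispatch sweep the disk in one direction, elevator-style.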
Deadline Scheduler
Two problems manifest themselves with the elevator scheme:
A distant block request can be delayed for a substantial time because the queue is
dynamically updated
A stream of write requests can block a read request for a considerable time, and thus block a
process
To overcome these problems, a new deadline I/O scheduler was developed in 2002
This scheduler makes use of three queues: a sorted elevator queue, a read FIFO queue, and a write FIFO queue
In addition to each incoming request being placed in a sorted elevator queue as before, the
same request is placed at the tail of a read FIFO queue for a read request or a write FIFO
queue for a write request
When a request is satisfied, it is removed from the head of the sorted queue and also from the
appropriate FIFO queue
However, when the item at the head of one of the FIFO queues becomes older than its expiration time, then the
scheduler next dispatches from that FIFO queue, taking the expired request, plus the next few requests from
the queue
As each request is dispatched, it is also removed from the sorted queue
Anticipatory I/O Scheduler
Elevator and deadline scheduling can be counterproductive if there are
numerous synchronous read requests
In Linux, the anticipatory scheduler is superimposed on the deadline
scheduler
When a read request is dispatched, the anticipatory scheduler causes the
scheduling system to delay
There is a good chance that the application that issued the last read request
will issue another read request to the same region of the disk
That request will be serviced immediately
Otherwise the scheduler resumes using the deadline scheduling algorithm
The NOOP Scheduler
This is the simplest among Linux I/O schedulers
It is a minimal scheduler that inserts I/O requests into a FIFO
queue and uses merging
Its main uses include nondisk-based block devices such as memory
devices, and specialized software or hardware environments that do
their own scheduling and need only minimal support in the kernel
Completely Fair Queuing I/O
Scheduler (CFQ)
Was developed in 2003
Is the default I/O scheduler in Linux
The CFQ scheduler guarantees a fair allocation of the disk I/O bandwidth
among all processes
It maintains per process I/O queues
Each process is assigned a single queue
Each queue has an allocated timeslice
Requests are submitted into these queues and are processed in round robin
When the scheduler services a specific queue, and there are no more requests in
that queue, it waits in idle mode for a predefined time interval for new requests,
and if there are no requests, it continues to the next queue
Linux Page Cache
For Linux 2.4 and later there is a single unified page
cache for all traffic between disk and main memory
Benefits:
When it is time to write back dirty pages to disk, a
collection of them can be ordered properly and written
out efficiently
Because of the principle of temporal locality, pages in
the page cache are likely to be referenced again before
they are flushed from the cache, thus saving a disk I/O
operation
Basic I/O Facilities
Cache Manager
Maps regions of files into kernel virtual memory and then relies on the virtual memory manager to copy pages to and from the files on disk
Network Drivers
Windows includes integrated networking capabilities and support for remote file systems; the facilities are implemented as software drivers
File System Drivers
Sends I/O requests to the software drivers that manage the hardware device adapter
Hardware Device Drivers
The source code of Windows device drivers is portable across different processor types
Asynchronous and Synchronous I/O
Windows offers two modes of I/O operation:
Asynchronous
An application initiates an I/O operation and then can continue processing while the I/O request is fulfilled
Used whenever possible to optimize application performance
Synchronous
The application is blocked until the I/O operation completes
I/O Completion
Windows provides five different techniques for signaling I/O completion:
1. Signaling the file object
2. Signaling an event object
3. Asynchronous procedure call
4. I/O completion ports
5. Polling
Windows RAID Configurations
Windows supports two sorts of RAID configurations:
Hardware RAID
Separate physical disks combined into one or more logical disks by the disk controller or disk storage cabinet hardware
Software RAID
Noncontiguous disk space combined into one or more logical partitions by the fault-tolerant software disk driver, FTDISK
Volume Shadow Copies and Volume Encryption
Volume Shadow Copies
Efficient way of making consistent snapshots of volumes so they can be backed up
Also useful for archiving files on a per-volume basis
Implemented by a software driver that makes copies of data on the volume before it is overwritten
Volume Encryption
Windows uses BitLocker to encrypt entire volumes
More secure than encrypting individual files
Allows multiple interlocking layers of security
Summary
I/O devices
Organization of the I/O function
  The evolution of the I/O function
  Direct memory access
Operating system design issues
  Design objectives
  Logical structure of the I/O function
I/O buffering
  Single/double/circular buffer
  The utility of buffering
Disk scheduling
  Disk performance parameters
  Disk scheduling policies
RAID
  RAID levels 0–6
Disk cache
  Design and performance considerations
UNIX SVR4 I/O
  Buffer cache
  Character queue
  Unbuffered I/O
  UNIX devices
Linux I/O
  Disk scheduling
  Linux page cache
Windows I/O
  Basic I/O facilities
  Asynchronous and synchronous I/O
  Software RAID
  Volume shadow copies/encryption