Unit 6 Part - 2 OS Lecture Notes
UNIT - VI
OPERATING SYSTEMS – Disk Management
12.1 Overview of Mass-Storage Structure
In operation the disk rotates at high speed, such as 7200 rpm ( 120 revolutions per
second ). The time required to transfer data from the disk to the computer is
composed of several steps:
o The positioning time, a.k.a. the seek time or random access time, is the time
required to move the heads from one cylinder to another, and for the heads to
settle down after the move. This is typically the slowest step in the process and
the predominant bottleneck to overall transfer rates.
o The rotational latency is the amount of time required for the desired sector to
rotate around and come under the read-write head. This can range anywhere from
zero to one full revolution, and on average will equal one-half revolution. This
is another physical step and is usually the second slowest step, after seek time.
( For a disk rotating at 7200 rpm, the average rotational latency would be 1/2
revolution / 120 revolutions per second, or just over 4 milliseconds, a long time
by computer standards. )
o The transfer rate is the rate at which data moves electronically from the disk to
the computer. ( Some authors also use the term transfer rate to refer to the
overall effective rate, including seek time and rotational latency as well as the
electronic data transfer. )
Disk heads "fly" over the surface on a very thin cushion of air. If they should accidentally
contact the disk, then a head crash occurs, which may or may not permanently damage
the disk or even destroy it completely. For this reason it is normal to park the disk heads
when turning a computer off, which means to move the heads off the disk or to an area of
the disk where there is no data stored.
Floppy disks are normally removable. Hard drives can also be removable, and some are
even hot-swappable, meaning they can be removed while the computer is running, and a
new hard drive inserted in their place.
Disk drives are connected to the computer via a cable known as the I/O bus. Some of the
common interface formats include Enhanced Integrated Drive Electronics, EIDE;
Advanced Technology Attachment, ATA; Serial ATA, SATA; Universal Serial Bus,
USB; Fibre Channel, FC; and Small Computer Systems Interface, SCSI.
The host controller is at the computer end of the I/O bus, and the disk controller is built
into the disk itself. The CPU issues commands to the host controller via I/O ports. Data is
transferred between the magnetic surface and onboard cache by the disk controller, and
then the data is transferred from that cache to the host controller and the motherboard
memory at electronic speeds.
Magnetic tapes were once a common medium for secondary storage, before the days of
hard disk drives, but today they are used primarily for backups.
Accessing a particular spot on a magnetic tape can be slow, but once reading or writing
commences, access speeds are comparable to disk drives.
Capacities of tape drives can range from 20 to 200 GB, and compression can double that
capacity.
The traditional head-sector-cylinder, HSC, numbers are mapped to linear block addresses
by numbering the first sector on the first head on the outermost track as sector 0.
Numbering proceeds with the rest of the sectors on that same track, and then the rest of
the tracks on the same cylinder, before proceeding through the rest of the cylinders to the
center of the disk. ( A code sketch of this mapping follows the list below. ) In modern
practice these linear block addresses are used in place of the HSC numbers for a variety
of reasons:
1. The linear length of tracks near the outer edge of the disk is much longer than for
those tracks located near the center, and therefore it is possible to squeeze many
more sectors onto outer tracks than onto inner ones.
2. All disks have some bad sectors, and therefore disks maintain a few spare sectors
that can be used in place of the bad ones. The mapping of spare sectors to bad
sectors is managed internally by the disk controller.
3. Modern hard drives can have thousands of cylinders, and hundreds of sectors per
track on their outermost tracks. These numbers exceed the range of HSC numbers
for many ( older ) operating systems, and therefore disks can be configured for
any convenient combination of HSC values that falls within the total number of
sectors physically on the drive.
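The following is a minimal sketch of that mapping in C. The geometry constants ( heads per cylinder, sectors per track ) are illustrative assumptions rather than real drive values, and as noted above modern drives remap sectors internally anyway:

    #include <stdio.h>

    #define HEADS_PER_CYLINDER 16   /* hypothetical geometry */
    #define SECTORS_PER_TRACK  63   /* sectors are traditionally numbered from 1 */

    /* Linear block address for a ( cylinder, head, sector ) triple. */
    unsigned long chs_to_lba(unsigned long c, unsigned long h, unsigned long s)
    {
        return (c * HEADS_PER_CYLINDER + h) * SECTORS_PER_TRACK + (s - 1);
    }

    int main(void)
    {
        /* Sector 1 on head 0 of cylinder 0 is logical block 0. */
        printf("%lu\n", chs_to_lba(0, 0, 1));    /* prints 0    */
        printf("%lu\n", chs_to_lba(2, 3, 10));   /* prints 2214 */
        return 0;
    }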
There is a limit to how closely packed individual bits can be placed on a physical medium,
but that limit keeps increasing as technological advances are made.
Modern disks pack many more sectors into outer cylinders than inner ones, using one of
two approaches:
o With Constant Linear Velocity, CLV, the density of bits is uniform from cylinder
to cylinder. Because there are more sectors in outer cylinders, the disk spins
slower when reading those cylinders, causing the rate of bits passing under the
read-write head to remain constant. This is the approach used by modern CDs and
DVDs.
o With Constant Angular Velocity, CAV, the disk rotates at a constant angular
speed, with the bit density decreasing on outer cylinders. ( These disks would
have a constant number of sectors per track on all cylinders. )
Disk drives can be attached either directly to a particular host ( a local disk ) or to a network.
One common network-capable attachment technology is Fibre Channel, FC, which has two variants:
o A large switched fabric having a 24-bit address space. This variant allows for
multiple devices and multiple hosts to interconnect, forming the basis for the
storage-area networks, SANs, to be discussed in a future section.
o The arbitrated loop, FC-AL, that can address up to 126 devices ( drives and
controllers. )
As mentioned earlier, disk transfer speeds are limited primarily by seek times and
rotational latency. When multiple requests are to be processed there is also some
inherent delay in waiting for other requests to be processed.
Bandwidth is measured as the total amount of data transferred divided by the total time
from the first request being made to the last transfer being completed ( for a series of
disk requests ).
Both bandwidth and access time can be improved by processing requests in a good order.
Disk requests include the disk address, memory address, number of sectors to transfer,
and whether the request is for reading or writing.
First-Come First-Serve is simple and intrinsically fair, but not very efficient. Consider,
in the example request queue used below ( 98, 183, 37, 122, 14, 124, 65, 67, with the head
starting at cylinder 53 ), the wild swing from cylinder 122 to 14 and then back to 124.
Shortest Seek Time First scheduling is more efficient, but may lead to starvation if a
constant stream of requests arrives for the same general area of the disk.
SSTF reduces the total head movement to 236 cylinders, down from the 640 required for the
same set of requests under FCFS. Note, however, that the distance could be reduced still
further, to 208, by serving 37 and then 14 before processing the rest of the requests.
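As a concrete check of those numbers, here is a small C simulation of FCFS and SSTF over the example queue quoted above ( head initially at cylinder 53 ):

    #include <stdio.h>

    #define N 8

    static int dist(int a, int b) { return a > b ? a - b : b - a; }

    /* Serve requests strictly in arrival order. */
    int fcfs(int head, const int *q, int n)
    {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += dist(head, q[i]);
            head = q[i];
        }
        return total;
    }

    /* Always serve the pending request closest to the current head position. */
    int sstf(int head, const int *q, int n)
    {
        int total = 0, pending[N];
        for (int i = 0; i < n; i++) pending[i] = q[i];
        for (int served = 0; served < n; served++) {
            int best = -1;
            for (int i = 0; i < n; i++)   /* pick the nearest pending request */
                if (pending[i] >= 0 &&
                    (best < 0 || dist(head, pending[i]) < dist(head, pending[best])))
                    best = i;
            total += dist(head, pending[best]);
            head = pending[best];
            pending[best] = -1;           /* mark as served */
        }
        return total;
    }

    int main(void)
    {
        int q[N] = { 98, 183, 37, 122, 14, 124, 65, 67 };
        printf("FCFS: %d cylinders\n", fcfs(53, q, N));  /* 640 */
        printf("SSTF: %d cylinders\n", sstf(53, q, N));  /* 236 */
        return 0;
    }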
The SCAN algorithm, a.k.a. the elevator algorithm, moves back and forth from one end of
the disk to the other, similarly to an elevator processing requests in a tall building.
Under the SCAN algorithm, if a request arrives just ahead of the moving head then it will
be processed right away, but if it arrives just after the head has passed, then it will have to
wait for the head to pass going the other way on the return trip. This leads to a fairly wide
variation in access times, which can be improved upon.
Consider, for example, when the head reaches the high end of the disk: Requests with
high cylinder numbers just missed the passing head, which means they are all fairly
recent requests, whereas requests with low numbers may have been waiting for a much
longer time. Making the return scan from high to low then ends up accessing recent
requests first and making older requests wait that much longer.
The Circular-SCAN algorithm improves upon SCAN by treating all requests in a circular
queue fashion: once the head reaches the end of the disk, it returns to the other end
without processing any requests, and then starts again from the beginning of the disk.
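A minimal sketch of the resulting C-SCAN service order, assuming the head sweeps toward higher cylinder numbers ( the queue is the same example as above ):

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* Print the C-SCAN service order for a head moving toward high cylinders. */
    void cscan_order(int head, int *q, int n)
    {
        qsort(q, n, sizeof q[0], cmp);
        int start = 0;
        while (start < n && q[start] < head)   /* first request at/above the head */
            start++;
        for (int i = start; i < n; i++) printf("%d ", q[i]);  /* upward sweep     */
        for (int i = 0; i < start; i++) printf("%d ", q[i]);  /* after wrap-around */
        printf("\n");
    }

    int main(void)
    {
        int q[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
        cscan_order(53, q, 8);   /* prints: 65 67 98 122 124 183 14 37 */
        return 0;
    }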
LOOK scheduling improves upon SCAN by looking ahead at the queue of pending
requests, and not moving the heads any farther toward the end of the disk than is
necessary. ( The circular version, C-LOOK, applies the same idea to C-SCAN. )
With very low loads all algorithms are equal, since there will normally only be one
request to process at a time.
For slightly larger loads, SSTF offers better performance than FCFS, but may lead to
starvation when loads become heavy enough.
For busier systems, SCAN and LOOK algorithms eliminate starvation problems.
The actual optimal algorithm may be something even more complex than those discussed
here, but the incremental improvements are generally not worth the additional overhead.
Some improvement to overall filesystem access times can be made by intelligent
placement of directory and/or inode information. If those structures are placed in the
middle of the disk instead of at the beginning of the disk, then the maximum distance
from those structures to data blocks is reduced to only one-half of the disk size. If those
structures can be further distributed and furthermore have their data blocks stored as
close as possible to the corresponding directory structures, then that reduces still further
the overall time to find the disk block numbers and then access the corresponding data
blocks.
On modern disks the rotational latency can be almost as significant as the seek time.
However, it is not within the OS's control to account for that, because modern disks do
not reveal their internal sector mapping schemes ( particularly when bad blocks have
been remapped to spare sectors ).
o Some disk manufacturers provide for disk scheduling algorithms directly on their
disk controllers, ( which do know the actual geometry of the disk as well as any
remapping ), so that if a series of requests are sent from the computer to the
controller then those requests can be processed in an optimal order.
o Unfortunately there are some considerations that the OS must take into account
that are beyond the abilities of the on-board disk-scheduling algorithms, such as
priorities of some requests over others, or the need to process certain requests in a
particular order. For this reason OSes may elect to spoon-feed requests to the disk
controller one at a time in certain situations.
Modern systems typically swap out pages as needed, rather than swapping out entire
processes. Hence the swapping system is part of the virtual memory management system.
Managing swap space is obviously an important task for modern OSes.
The amount of swap space needed by an OS varies greatly according to how it is used.
Some systems require an amount equal to physical RAM; some want a multiple of that;
some want an amount equal to the amount by which virtual memory exceeds physical
RAM, and some systems use little or none at all!
Some systems support multiple swap spaces on separate disks in order to speed up the
virtual memory system.
Swap space can be physically located in one of two places:
As a large file which is part of the regular filesystem. This is easy to implement, but
inefficient. Not only must the swap space be accessed through the directory system, the
file is also subject to fragmentation issues. Caching the block location helps in finding the
physical blocks, but that is not a complete fix.
As a raw partition, possibly on a separate or little-used disk. This allows the OS more
control over swap space management, which is usually faster and more efficient.
Fragmentation of swap space is generally not a big issue, as the space is re-initialized
every time the system is rebooted. The downside of keeping swap space on a raw
partition is that it can only be grown by repartitioning the hard drive.
The general idea behind RAID is to employ a group of hard drives together with some
form of duplication, either to increase reliability or to speed up operations, ( or sometimes
both. )
RAID originally stood for Redundant Array of Inexpensive Disks, and was designed to
use a bunch of cheap small disks in place of one or two larger more expensive ones.
Today RAID systems employ large possibly expensive disks as their components,
switching the definition to Independent disks.
The more disks a system has, the greater the likelihood that one of them will go bad at
any given time. Hence increasing the number of disks on a system actually decreases the
Mean Time To Failure, MTTF, of the system.
If, however, the same data was copied onto multiple disks, then the data would not be lost
unless both ( or all ) copies of the data were damaged simultaneously, which is a MUCH
lower probability than for a single disk going bad. More specifically, the second disk
would have to go bad before the first disk was repaired, which brings the Mean Time To
Repair into play. For example if two disks were involved, each with a MTTF of 100,000
hours and a MTTR of 10 hours, then the Mean Time to Data Loss would be 500 * 10^6
hours, or 57,000 years!
This is the basic idea behind disk mirroring, in which a system contains identical data on
two or more disks.
o Note that a power failure during a write operation could cause both disks to
contain corrupt data, if both disks were writing simultaneously at the time of the
power failure. One solution is to write to the two disks in series, so that they will
not both become corrupted ( at least not in the same way ) by a power failure. An
alternate solution involves non-volatile RAM as a write cache, which is not lost in
the event of a power failure and which is protected by error-correcting codes.
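The mean-time-to-data-loss figure quoted above follows from a standard independence argument: data is lost only if the second disk fails within the repair window of the first, giving roughly MTTF^2 / ( 2 * MTTR ). A quick calculation in C:

    #include <stdio.h>

    int main(void)
    {
        double mttf = 100000.0;   /* hours, per disk          */
        double mttr = 10.0;       /* hours to replace a disk  */

        /* Mean time to data loss for a mirrored pair, assuming
           independent failures: MTTF^2 / (2 * MTTR).           */
        double mttdl = (mttf * mttf) / (2.0 * mttr);
        printf("MTTDL = %.0f hours (about %.0f years)\n",
               mttdl, mttdl / (24 * 365));   /* 500000000 hours, ~57,000 years */
        return 0;
    }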
There is also a performance benefit to mirroring, particularly with respect to reads. Since
every block of data is duplicated on multiple disks, read operations can be satisfied from
any available copy, and multiple disks can be reading different data blocks
simultaneously in parallel. ( Writes could possibly be sped up as well through careful
scheduling algorithms, but it would be complicated in practice. )
Another way of improving disk access time is with striping, which basically means
spreading data out across multiple disks that can be accessed simultaneously.
o With bit-level striping the bits of each byte are striped across multiple disks. For
example if 8 disks were involved, then each 8-bit byte would be read in parallel
by 8 heads on separate disks. A single disk read would access 8 * 512 bytes = 4K
worth of data in the time normally required to read 512 bytes. Similarly if 4 disks
were involved, then two bits of each byte could be stored on each disk, for 2K
worth of disk access per read or write operation.
o Block-level striping spreads a filesystem across multiple disks on a block-by-
block basis, so if block N were located on disk 0, then block N + 1 would be on
disk 1, and so on. This is particularly useful when filesystems are accessed in
clusters of physical blocks. Other striping possibilities exist, with block-level
striping being the most common.
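A minimal sketch of that block-level striping arithmetic, assuming a hypothetical stripe set of four disks: logical block N lives on disk N mod D at physical block N / D.

    #include <stdio.h>

    #define NDISKS 4   /* illustrative stripe width */

    /* Map a logical block number to ( disk, physical block ). */
    void map_block(unsigned long lblock, unsigned *disk, unsigned long *pblock)
    {
        *disk   = lblock % NDISKS;   /* consecutive blocks land on consecutive disks */
        *pblock = lblock / NDISKS;
    }

    int main(void)
    {
        for (unsigned long n = 0; n < 8; n++) {
            unsigned d; unsigned long p;
            map_block(n, &d, &p);
            printf("logical %lu -> disk %u, block %lu\n", n, d, p);
        }
        return 0;
    }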
Mirroring provides reliability but is expensive; striping improves performance, but does
not improve reliability. Accordingly there are a number of different schemes that
combine the principles of mirroring and striping in different ways, in order to balance
reliability versus performance versus cost. These are described by different RAID levels,
as follows: ( In the diagram that follows, "C" indicates a copy, and "P" indicates parity,
i.e. checksum bits. )
1. Raid Level 0 - This level includes striping only, with no mirroring.
2. Raid Level 1 - This level includes mirroring only, no striping.
3. Raid Level 2 - This level stores error-correcting codes on additional disks,
allowing for any damaged data to be reconstructed by subtraction from the
remaining undamaged data. Note that this scheme requires only three extra disks
to protect 4 disks worth of data, as opposed to full mirroring. ( The number of
disks required is a function of the error-correcting algorithms, and the means by
which the particular bad bit(s) is(are) identified. )
4. Raid Level 3 - This level is similar to level 2, except that it takes advantage of the
fact that each disk is still doing its own error-detection, so that when an error
occurs, there is no question about which disk in the array has the bad data. As a
result a single parity bit is all that is needed to recover the lost data from an array
of disks. Level 3 also includes striping, which improves performance. The
downside with the parity approach is that every disk must take part in every disk
access, and the parity bits must be constantly calculated and checked, reducing
performance. Hardware-level parity calculations and NVRAM cache can help
with both of those issues. In practice level 3 is greatly preferred over level 2.
There are also two RAID levels which combine RAID levels 0 and 1 ( striping and
mirroring ) in different combinations, designed to provide both performance and
reliability at the expense of increased cost.
o RAID level 0 + 1 disks are first striped, and then the striped disks mirrored to
another set. This level generally provides better performance than RAID level 5.
o RAID level 1 + 0 mirrors disks in pairs, and then stripes the mirrored pairs. The
storage capacity, performance, etc. are all the same, but there is an advantage to
this approach in the event of multiple disk failures, as illustrated below:
In diagram (a) below, the 8 disks have been divided into two sets of four,
each of which is striped, and then one stripe set is used to mirror the other
set.
If a single disk fails, it wipes out the entire stripe set, but the
system can keep on functioning using the remaining set.
However if a second disk from the other stripe set now fails, then
the entire system is lost, as a result of two disk failures.
In diagram (b), the same 8 disks are divided into four sets of two, each of
which is mirrored, and then the file system is striped across the four sets of
mirrored disks.
If a single disk fails, then that mirror set is reduced to a single disk,
but the system rolls on, and the other three mirror sets continue
mirroring.
Now if a second disk fails, ( that is not the mirror of the already
failed disk ), then another one of the mirror sets is reduced to a
single disk, but the system can continue without data loss.
In fact the second arrangement could handle as many as four
simultaneously failed disks, as long as no two of them were from
the same mirror pair.
Trade-offs in selecting the optimal RAID level for a particular application include cost,
volume of data, need for reliability, need for performance, and rebuild time, the latter of
which can affect the likelihood that a second disk will fail while the first failed disk is
being rebuilt.
Other decisions include how many disks are involved in a RAID set and how many disks
to protect with a single parity bit. More disks in the set increases performance but
increases cost. Protecting more disks per parity bit saves cost, but increases the likelihood
that a second disk will fail before the first bad disk is repaired.
12.7.5 Extensions
RAID concepts have been extended to tape drives ( e.g. striping tapes for faster backups
or parity checking tapes for reliability ), and for broadcasting of data.
RAID protects against physical errors, but not against any number of bugs or other errors
that could write erroneous data.
Another problem with traditional filesystems is that their sizes are fixed, and relatively
difficult to change. Where RAID sets are involved it becomes even harder to adjust
filesystem sizes, because a filesystem cannot span multiple volumes ( RAID sets ).
The concept of stable storage ( first presented in chapter 6 ) involves a storage medium in
which data is never lost, even in the face of equipment failure in the middle of a write
operation.
To implement this requires two ( or more ) copies of the data, with separate failure
modes.
An attempted disk write results in one of three possible outcomes:
1. The data is successfully and completely written.
2. The data is partially written, but not completely. The last block written may be
garbled.
3. No writing takes place at all.
Whenever an equipment failure occurs during a write, the system must detect it, and
return the system back to a consistent state. To do this requires two physical blocks for
every logical block, and the following procedure:
1. Write the data to the first physical block.
2. After step 1 has completed, write the data to the second physical block.
3. Declare the operation complete only after both physical writes have completed
successfully.
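A sketch of that procedure in C, with the device I/O replaced by hypothetical in-memory stand-ins so the control flow can be followed ( real code would write two blocks with independent failure modes and verify each write, e.g. with a checksum ):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 512

    static char disk[2][BLOCK_SIZE];   /* stand-in for two physical blocks */

    static bool write_block(int b, const void *data)   /* pretend device write */
    {
        memcpy(disk[b], data, BLOCK_SIZE);
        return true;                   /* real code would check device status */
    }

    static bool read_verify(int b, const void *data)   /* pretend read-back check */
    {
        return memcmp(disk[b], data, BLOCK_SIZE) == 0;
    }

    bool stable_write(const void *data)
    {
        /* 1. Write the first physical block and verify it.            */
        if (!write_block(0, data) || !read_verify(0, data))
            return false;
        /* 2. Only after step 1 succeeds, write the second block.      */
        if (!write_block(1, data) || !read_verify(1, data))
            return false;
        /* 3. Declare completion only when both copies are good.       */
        return true;
    }

    int main(void)
    {
        char buf[BLOCK_SIZE] = "important data";
        printf("stable_write: %s\n", stable_write(buf) ? "ok" : "failed");
        return 0;
    }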
Primary storage refers to computer memory chips; secondary storage refers to fixed-disk
storage systems ( hard drives ); and tertiary storage refers to removable media, such as
tape drives, CDs, DVDs, and to a lesser extent floppies, thumb drives, and other
detachable devices.
Tertiary storage is typically characterized by large capacity, low cost per MB, and slow
access times, although there are exceptions in any of these categories.
Tertiary storage is typically used for backups and for long-term archival storage of
completed work. Another common use for tertiary storage is to swap large little-used files
( or groups of files ) off of the hard drive, and then swap them back in as needed in a
fashion similar to secondary storage providing swap space for primary storage.
12.9.1 Tertiary-Storage Devices
Removable magnetic disks ( e.g. floppies ) can be nearly as fast as hard drives, but are at
greater risk for damage due to scratches. Variations of removable magnetic disks up to a
GB or more in capacity have been developed. ( Hot-swappable hard drives? )
A magneto-optical disk uses a magnetic disk covered in a clear plastic coating that
protects the surface.
o The heads sit a considerable distance away from the magnetic surface, and as a
result do not have enough magnetic strength to switch bits at normal room
temperature.
o For writing, a laser is used to heat up a specific spot on the disk, to a temperature
at which the weak magnetic field of the write head is able to flip the bits.
o For reading, a laser is shined at the disk, and the Kerr effect causes the
polarization of the light to become rotated either clockwise or counter-clockwise
depending on the orientation of the magnetic field.
Optical disks do not use magnetism at all, but instead use special materials that can be
altered ( by lasers ) to have relatively light or dark spots.
o For example the phase-change disk has a material that can be frozen into either a
crystalline or an amorphous state, the latter of which is less transparent and
reflects less light when a laser is bounced off a reflective surface under the
material.
Three powers of laser are used with phase-change disks: (1) a low power
laser is used to read the disk, without affecting the materials; (2) a
medium power erases the disk, by melting and re-freezing the medium
into a crystalline state; and (3) a high power writes to the disk by melting
the medium and re-freezing it into the amorphous state.
The most common examples of these disks are re-writable CD-RWs and
DVD-RWs.
An alternative to the disks described above are Write-Once Read-Many, WORM drives.
o The original version of WORM drives involved a thin layer of aluminum
sandwiched between two protective layers of glass or plastic.
Holes were burned in the aluminum to write bits.
Because the holes could not be filled back in, there was no way to re-write
to the disk. ( Although data could be erased by burning more holes. )
o WORM drives have important legal ramifications for data that must be stored for
a very long time and must be provable in court as unaltered since it was originally
written. ( Such as long-term storage of medical records. )
o Modern CD-R and DVD-R disks are examples of WORM drives that use organic
polymer inks instead of an aluminum layer.
Read-only disks are similar to WORM disks, except the bits are pressed onto the disk at
the factory, rather than being burned on one by one. ( See
http://en.wikipedia.org/wiki/CD_manufacturing#Premastering for more information on
CD manufacturing techniques. )
12.9.1.2 Tapes
Tape drives typically cost more than disk drives, but the cost per MB of the tapes
themselves is lower.
Tapes are typically used today for backups, and for enormous volumes of data stored by
certain scientific establishments. ( E.g. NASA's archive of space probe and satellite
imagery, which is currently being downloaded from numerous sources faster than anyone
can actually look at it. )
Robotic tape changers move tapes from drives to archival tape libraries upon demand.
( Never underestimate the bandwidth of a station wagon full of tapes rolling down the
highway! )
Solid State Disks, SSDs, are becoming more and more popular.
Holographic storage uses laser light to store images in a 3-D structure, and the entire
data structure can be transferred in a single flash of laser light.
Micro-Electronic Mechanical Systems, MEMS, employs the technology used for
computer chip fabrication to create VERY tiny little machines. One example packs
10,000 read-write heads within a square centimeter of space, and as media are passed
over it, all 10,000 heads can read data in parallel.
The OS must provide support for tertiary storage as removable media, including the
support to transfer data between different systems.
File systems are typically not stored on tapes. ( It might be technically possible, but it is
impractical. )
Tapes are also not low-level formatted, and do not use fixed-length blocks. Rather data is
written to tapes in variable length blocks as needed.
Tapes are normally accessed as raw devices, requiring each application to determine how
the data is to be stored and read back. Issues such as header contents and ASCII versus
binary encoding ( and byte-ordering ) are generally application specific.
Basic operations supported for tapes include locate( ), read( ), write( ), and read_position( ).
( Because of variable length writes ), writing to a tape erases all data that follows that
point on the tape.
o Writing to a tape places the End of Tape ( EOT ) marker at the end of the data
written.
o It is not possible to locate( ) to any spot past the EOT marker.
File naming conventions for removable media are neither fully standardized nor
necessarily consistent between different systems. ( Two removable disks may
contain files with the same name, and there is no clear way for the naming system to
distinguish between them. )
Fortunately music CDs have a common format, readable by all systems. Data CDs and
DVDs have only a few format choices, making it easy for a system to support all known
formats.
Hierarchical storage involves extending file systems out onto tertiary storage, swapping
files from hard drives to tapes in much the same manner as data blocks are swapped from
memory to hard drives.
A placeholder is generally left on the hard drive, storing information about the particular
tape ( or other removable medium ) to which the file has been swapped out.
A robotic system transfers data to and from tertiary storage as needed, generally
automatically upon demand of the file(s) involved.
12.9.3.1 Speed
Sustained Bandwidth is the rate of data transfer during a large file transfer, once the
proper tape is loaded and the file located.
Effective Bandwidth is the effective overall rate of data transfer, including any overhead
necessary to load the proper tape and find the file on the tape.
Access Latency is all of the accumulated waiting time before a file can be actually read
from tape. This includes the time it takes to find the file on the tape, the time to load the
tape from the tape library, and the time spent waiting in the queue for the tape drive to
become available.
Clearly tertiary storage access is much slower than secondary access, although removable
disks ( e.g. a CD jukebox ) have somewhat faster access than a tape library.
12.9.3.2 Reliability
Fixed hard drives are generally more reliable than removable drives, because they are less
susceptible to the environment.
Optical disks are generally more reliable than magnetic media.
A fixed hard drive crash can destroy all data, whereas an optical drive or tape drive
failure will often not harm the data media, ( and certainly can't damage any media not in
the drive at the time of the failure. )
Tape drives are mechanical devices, and can wear out tapes over time, ( as the tape head
is generally in much closer physical contact with the tape than disk heads are with
platters. )
o Some drives may only be able to read tapes a few times whereas other drives may
be able to re-use the same tapes millions of times.
o Backup tapes should be read after writing, to verify that the backup tape is
readable. ( Unfortunately that may have been the LAST time that particular tape
was readable, and the only way to be sure is to read it again, . . . )
o Long-term tape storage can cause degradation, as magnetic fields "drift" from one
layer of tape to the adjacent layers. Periodic fast-forwarding and rewinding of
tapes can help, by changing which section of tape lays against which other layers.
12.9.3.3 Cost
The cost per megabyte for removable media is its strongest selling feature, particularly as
the amount of storage involved ( i.e. the number of tapes, CDs, etc ) increases.
However the cost per megabyte for hard drives has dropped more rapidly over the years
than the cost of removable media, such that the currently most cost-effective backup
solution for many systems is simply an additional ( external ) hard drive.
( One good use for old unwanted PCs is to put them on a network as a backup server
and/or print server. The downside to this backup solution is that the backups are stored
on-site with the original data, and a fire, flood, or burglary could wipe out both the
original data and the backups. )
Operating Systems Unit -VI
One way of communicating with devices is through registers associated with each port.
Registers may be one to four bytes in size, and may typically include ( a subset of ) the
following four:
1. The data-in register is read by the host to get input from the device.
2. The data-out register is written by the host to send output.
3. The status register has bits read by the host to ascertain the status of the device,
such as idle, ready for input, busy, error, transaction complete, etc.
4. The control register has bits written by the host to issue commands or to change
settings of the device such as parity checking, word length, or full- versus half-
duplex operation.
Figure 13.2 shows some of the most common I/O port address ranges.
26 Dept. of CSE, Shri Vishnu Engineering College for Women (A), Bhimavaram
Operating Systems Unit -VI
13.2.1 Polling
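Polling means the host repeatedly reads a device's status register until the device is ready, busy-waiting through the handshake built on the registers described above. A minimal sketch of that handshake in C; the register addresses and bit positions here are purely illustrative ( this is device-level code, not runnable in a normal user program ):

    #include <stdint.h>

    #define STATUS_BUSY 0x01
    #define CMD_WRITE   0x02

    /* Hypothetical memory-mapped controller registers. */
    volatile uint8_t *status   = (uint8_t *)0xFFFF8000;
    volatile uint8_t *data_out = (uint8_t *)0xFFFF8001;
    volatile uint8_t *command  = (uint8_t *)0xFFFF8002;

    void polled_write_byte(uint8_t byte)
    {
        while (*status & STATUS_BUSY)   /* 1. spin until the device is idle;     */
            ;                           /*    these wasted cycles are the cost   */
                                        /*    of polling                         */
        *data_out = byte;               /* 2. place the byte in data-out         */
        *command  = CMD_WRITE;          /* 3. set the write command bit; the     */
                                        /*    controller sets busy, moves the    */
                                        /*    byte, then clears busy again       */
    }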
13.2.2 Interrupts
Interrupts allow devices to notify the CPU when they have data to transfer or when an
operation is complete, allowing the CPU to perform other duties when no I/O transfers
need its immediate attention.
The CPU has an interrupt-request line that is sensed after every instruction.
o A device's controller raises an interrupt by asserting a signal on the interrupt
request line.
o The CPU then performs a state save, and transfers control to the interrupt handler
routine at a fixed address in memory. ( The CPU catches the interrupt and
dispatches the interrupt handler. )
o The interrupt handler determines the cause of the interrupt, performs the
necessary processing, performs a state restore, and executes a return-from-interrupt
instruction to return control to the interrupted processing. ( The interrupt handler
clears the interrupt by servicing the device. )
( Note that the state restored does not need to be the same state as the one
that was saved when the interrupt went off. See below for an example
involving time-slicing. )
Figure 13.3 illustrates the interrupt-driven I/O procedure:
The above description is adequate for simple interrupt-driven I/O, but there are three
needs in modern computing which complicate the picture:
1. The need to defer interrupt handling during critical processing,
2. The need to determine which interrupt handler to invoke, without having to poll
all devices to see which one needs attention, and
3. The need for multi-level interrupts, so the system can differentiate between high-
and low-priority interrupts for proper response.
These issues are handled in modern computer architectures with interrupt-controller
hardware.
o Most CPUs now have two interrupt-request lines: One that is non-maskable for
critical error conditions and one that is maskable, that the CPU can temporarily
ignore during critical processing.
o The interrupt mechanism accepts an address, which is usually one of a small set
of numbers for an offset into a table called the interrupt vector. This table
( usually located at physical address zero ? ) holds the addresses of routines
prepared to process specific interrupts.
o The number of possible interrupt handlers still exceeds the range of defined
interrupt numbers, so multiple handlers can be interrupt chained. Effectively the
addresses held in the interrupt vectors are the head pointers for linked-lists of
interrupt handlers.
o Figure 13.4 shows the Intel Pentium interrupt vector. Interrupts 0 to 31 are non-
maskable and reserved for serious hardware and other errors. Maskable interrupts,
including normal device I/O interrupts begin at interrupt 32.
o Modern interrupt hardware also supports interrupt priority levels, allowing
systems to mask off only lower-priority interrupts while servicing a high-priority
interrupt, or conversely to allow a high-priority signal to interrupt the processing
of a low-priority one.
Most devices can be characterized as either block I/O, character I/O, memory-mapped file
access, or network sockets. A few devices are special, such as the time-of-day clock and
the system timer.
Most OSes also have an escape, or back door, which allows applications to send
commands directly to device drivers if needed. In UNIX this is the ioctl( ) system call (
I/O Control ). Ioctl( ) takes three arguments - The file descriptor for the device driver
being accessed, an integer indicating the desired function to be performed, and an address
used for communicating or transferring additional information.
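As a concrete illustration, the following asks the terminal driver for its window size via the TIOCGWINSZ request, a real ioctl( ) request on UNIX-like systems, used here only as an example of the three-argument pattern:

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        struct winsize ws;

        /* fd, request code, address for the returned information */
        if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == -1) {
            perror("ioctl");
            return 1;
        }
        printf("terminal is %u rows x %u cols\n", ws.ws_row, ws.ws_col);
        return 0;
    }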
Block devices are accessed a block at a time, and are indicated by a "b" as the first
character in a long listing on UNIX systems. Operations supported include read( ), write(
), and seek( ).
o Accessing blocks on a hard drive directly ( without going through the filesystem
structure ) is called raw I/O, and can speed up certain operations by bypassing the
buffering and locking normally conducted by the OS. ( It then becomes the
application's responsibility to manage those issues. )
o A new alternative is direct I/O, which uses the normal filesystem access, but
which disables buffering and locking operations.
Memory-mapped file I/O can be layered on top of block-device drivers.
o Rather than reading in the entire file, it is mapped to a range of memory
addresses, and then paged into memory as needed using the virtual memory
system.
o Access to the file is then accomplished through normal memory accesses, rather
than through read( ) and write( ) system calls. This approach is commonly used
for executable program code.
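A short POSIX example of the idea, using mmap( ) to read a file through ordinary pointer accesses; the file path is just an illustrative choice:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/etc/hostname", O_RDONLY);   /* any readable file will do */
        if (fd == -1) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* Map the file; pages are faulted in by the VM system on first touch. */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        fwrite(p, 1, st.st_size, stdout);   /* ordinary memory access, no read( ) */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }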
Character devices are accessed one byte at a time, and are indicated by a "c" in UNIX
long listings. Supported operations include get( ) and put( ), with more advanced
functionality such as reading an entire line supported by higher-level library routines.
Because network access is inherently different from local disk access, most systems
provide a separate interface for network devices.
One common and popular interface is the socket interface, which acts like a cable or
pipeline connecting two networked entities. Data can be put into the socket at one end,
and read out sequentially at the other end. Sockets are normally full-duplex, allowing for
bi-directional data transfer.
The select( ) system call allows servers ( or other applications ) to identify sockets which
have data waiting, without having to poll all available sockets.
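A minimal select( ) example: wait up to five seconds for data on standard input; a server would use the same pattern over a set of socket descriptors:

    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    int main(void)
    {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(STDIN_FILENO, &readfds);

        struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };

        /* Blocks until a descriptor is readable or the timeout expires. */
        int n = select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv);
        if (n > 0 && FD_ISSET(STDIN_FILENO, &readfds))
            printf("data is waiting on stdin\n");
        else if (n == 0)
            printf("timed out, no data\n");
        else
            perror("select");
        return 0;
    }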
o More timers than actually exist can be simulated by maintaining an ordered list of
timer events, and setting the physical timer to go off when the next scheduled
event should occur.
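A minimal sketch of that bookkeeping: pending expirations are kept sorted, and a hypothetical arm_hardware_timer( ) hook is always pointed at the earliest one:

    #include <stdio.h>

    #define MAX_TIMERS 16

    static unsigned long expiry[MAX_TIMERS];   /* sorted, soonest first */
    static int ntimers;

    static void arm_hardware_timer(unsigned long when)   /* hypothetical hook */
    {
        printf("hardware timer armed for t=%lu\n", when);
    }

    void add_timer(unsigned long when)
    {
        if (ntimers == MAX_TIMERS) return;   /* table full; real code would grow it */
        int i = ntimers++;
        while (i > 0 && expiry[i - 1] > when) {   /* insertion keeps the list sorted */
            expiry[i] = expiry[i - 1];
            i--;
        }
        expiry[i] = when;
        arm_hardware_timer(expiry[0]);   /* always arm for the next event to fire */
    }

    int main(void)
    {
        add_timer(300);
        add_timer(100);   /* becomes the new earliest event */
        add_timer(200);
        return 0;
    }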
On most systems the system clock is implemented by counting interrupts generated by
the PIT ( programmable interval timer ). Unfortunately this is limited in its resolution to
the interrupt frequency of the PIT, and may be subject to some drift over time. An alternate
approach is to provide direct access to a high frequency hardware counter, which provides
much higher resolution and accuracy, but which does not support interrupts.
With blocking I/O a process is moved to the wait queue when an I/O request is made, and
moved back to the ready queue when the request completes, allowing other processes to
run in the meantime.
With non-blocking I/O the I/O request returns immediately, whether the requested I/O
operation has ( completely ) occurred or not. This allows the process to check for
available data without getting hung completely if it is not there.
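For example, on POSIX systems a descriptor can be switched to non-blocking mode with fcntl( ), after which read( ) returns immediately with EAGAIN when no data is available:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Switch stdin to non-blocking mode. */
        int flags = fcntl(STDIN_FILENO, F_GETFL, 0);
        fcntl(STDIN_FILENO, F_SETFL, flags | O_NONBLOCK);

        char buf[128];
        ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
        if (n >= 0)
            printf("got %zd bytes\n", n);
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            printf("no data available right now; do other work\n");
        else
            perror("read");
        return 0;
    }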
One approach for programmers to implement non-blocking I/O is to have a multi-
threaded application, in which one thread makes blocking I/O calls ( say to read a
keyboard or mouse ), while other threads continue to update the screen or perform other
tasks.
A subtle variation of non-blocking I/O is asynchronous I/O, in which the I/O
request returns immediately, allowing the process to continue with other tasks, and
then the process is notified ( via changing a process variable, or a software interrupt, or a
callback function ) when the I/O operation has completed and the data is available for
use. ( The regular non-blocking I/O returns immediately with whatever results are
available, but does not complete the operation and notify the process later. )
Scheduling I/O requests can greatly improve overall efficiency. Priorities can also play a
part in request scheduling.
The classic example is the scheduling of disk accesses, as discussed in detail in chapter
12.
Buffering and caching can also help, and can allow for more flexible scheduling options.
On systems with many devices, separate request queues are often kept for each device:
13.4.2 Buffering
13.4.3 Caching
Caching involves keeping a copy of data in a faster-access location than where the data is
normally stored.
Buffering and caching are very similar, except that a buffer may hold the only copy of a
given data item, whereas a cache is just a duplicate copy of some other data stored
elsewhere.
Buffering and caching go hand-in-hand, and often the same storage space may be used
for both purposes. For example, after a buffer is written to disk, the copy in memory
can be used as a cached copy ( until that buffer is needed for other purposes ).
I/O requests can fail for many reasons, either transient ( e.g. a buffer overflow ) or
permanent ( e.g. a disk crash ).
I/O requests usually return an error bit ( or more ) indicating the problem. UNIX systems
also set the global variable errno to one of a hundred or so well-defined values to indicate
the specific error that has occurred. ( See errno.h for a complete listing, or man errno. )
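A tiny example of that convention: a failed system call returns -1 and sets errno, and strerror( ) maps the numeric code to its well-defined message:

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Opening a nonexistent file fails and sets errno (typically ENOENT). */
        if (open("/no/such/file", O_RDONLY) == -1)
            printf("open failed: errno=%d (%s)\n", errno, strerror(errno));
        return 0;
    }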
Some devices, such as SCSI devices, are capable of providing much more detailed
information about errors, and even keep an on-board error log that can be requested by
the host.
The I/O system must protect against either accidental or deliberate erroneous I/O.
User applications are not allowed to perform I/O in user mode - All I/O requests are
handled through system calls that must be performed in kernel mode.
Memory mapped areas and I/O ports must be protected by the memory management
system, but access to these areas cannot be totally denied to user programs. ( Video
games and some other applications need to be able to write directly to video memory for
optimal performance for example. ) Instead the memory protection system restricts access
so that only one process at a time can access particular parts of memory, such as the
portion of the screen memory corresponding to a particular window.
The kernel maintains a number of important data structures pertaining to the I/O system,
such as the open file table.
These structures are object-oriented, and flexible to allow access to a wide variety of I/O
devices through a common interface. ( See Figure 13.12 below. )
Windows NT carries the object-orientation one step further, implementing I/O as a
message-passing system from the source through various intermediaries to the device.
******