Os Note For IV r19
COURSE MATERIAL
1)RESOURCES
A resource can be a hardware device (e.g., a tape drive) or a piece of information (e.g., a locked record in
a database).
Resources come in two types: preemptable and nonpreemptable. A preemptable resource is one that can be taken away from the process owning it with no ill effects; memory is an example. A nonpreemptable resource, in contrast, is one that cannot be taken away from its current owner without
causing the computation to fail. A CD-ROM recorder in the middle of burning a disc is an example of a nonpreemptable resource.
The sequence of events required to use a resource is given below in an abstract form.
1. Request the resource.
2. Use the resource.
3. Release the resource.
If the resource is not available when it is requested, the requesting process is forced to wait. In some
operating systems, the process is automatically blocked when a resource request fails and awakened when
the resource becomes available.
Figure 6-1. Using a semaphore to protect resources. (a) One resource. (b) Two resources.
Sometimes processes need two or more resources. They can be acquired sequentially, as shown in Fig. 6-1(b). If more than two resources are needed, they are just acquired one after another.
Figure 6-2. (a) Deadlock-free code. (b) Code with a potential deadlock.
Now let us consider a situation with two processes, A and B, and two resources. Two scenarios are depicted
in Fig. 6-2. In Fig. 6-2(a), both processes ask for the resources in the same order. In Fig. 6-2(b), they ask for
them in a different order. This difference may seem minor, but it is not.
In Fig. 6-2(a), one of the processes will acquire the first resource before the other one. That process will
then successfully acquire the second resource and do its work. If the other process attempts to acquire
resource 1 before it has been released, it will simply block until resource 1 becomes available.
In Fig. 6-2(b), the situation is different. It might happen that one of the processes acquires both resources
and effectively blocks out the other process until it is done. However, it might also happen that process A
acquires resource 1 and process B acquires resource 2. Each one will now block when trying to acquire the
other one. Neither process will ever run again. This situation is a deadlock.
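The ordered-acquisition case of Fig. 6-2(a) can be sketched with two threads standing in for the two processes (a Python sketch, not from the notes; the lock and thread names are made up). Because both threads request resource_1 first, whichever gets it second simply blocks until it is released, so no deadlock is possible:

```python
import threading

resource_1 = threading.Lock()
resource_2 = threading.Lock()
finished = []

def worker(name):
    with resource_1:            # both threads acquire in the SAME order
        with resource_2:
            finished.append(name)   # "use" both resources

a = threading.Thread(target=worker, args=("A",))
b = threading.Thread(target=worker, args=("B",))
a.start(); b.start()
a.join(); b.join()
print(sorted(finished))         # both threads complete
```

Reversing the acquisition order in one of the threads turns this into the Fig. 6-2(b) situation, where both threads can block forever.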
2) INTRODUCTION TO DEADLOCKS
Four conditions must hold for there to be a resource deadlock:
1. Mutual exclusion condition. Each resource is either currently assigned to exactly one process or is
available.
2. Hold and wait condition. Processes currently holding resources that were granted earlier can request new
resources.
3. No preemption condition. Resources previously granted cannot be forcibly taken away from a process.
They must be explicitly released by the process holding them.
4. Circular wait condition. There must be a circular chain of two or more processes, each of which is
waiting for a resource held by the next member of the chain.
All four of these conditions must be present for a resource deadlock to occur. If one of them is absent, no
resource deadlock is possible.
The simplest approach is the ostrich algorithm: stick your head in the sand and pretend there is no problem at all.
This is reasonable if deadlocks occur very rarely and the cost of prevention is high.
UNIX and Windows take this approach for some of the more complex resource relationships they manage.
It is a trade-off between convenience (the engineering approach) and correctness (the mathematical approach).
To make this contrast more specific, consider an operating system that blocks the caller when an
open system call on a physical device such as a CD-ROM drive or a printer cannot be carried out because
the device is busy. Typically it is up to the device driver to decide what action to take under such
circumstances. Blocking or returning an error code are two obvious possibilities. If one process successfully
opens the CD-ROM drive and another successfully opens the printer and then each process tries to open the
other one and blocks trying, we have a deadlock. Few current systems will detect this.
A second technique is detection and recovery. When this technique is used, the system does not attempt to
prevent deadlocks from occurring. Instead, it lets them occur, tries to detect when this happens, and
then takes some action to recover after the fact. In this section we will look at some of the ways
deadlocks can be detected and some of the ways recovery from them can be handled.
Let us begin with the simplest case: only one resource of each type exists. Such a system might have one
scanner, one CD recorder, one plotter, and one tape drive, but no more than one of each class of
resource.
For such a system, we can construct a resource graph . If this graph contains one or more cycles, a
deadlock exists. Any process that is part of a cycle is deadlocked. If no cycles exist, the system is not
deadlocked.
As an example of a more complex system than the ones we have looked at so far, consider a system with
seven processes, A through G, and six resources, R through W. The state of which resources are
currently owned and which ones are currently being requested is as follows:
1. Process A holds R and wants S.
2. Process B holds nothing but wants T.
3. Process C holds nothing but wants S.
4. Process D holds U and wants S and T.
5. Process E holds T and wants V.
6. Process F holds W and wants S.
7. Process G holds V and wants U.
The question is: "Is this system deadlocked, and if so, which processes are involved?"
To answer this question, we can construct the resource graph of Fig. 6-5(a). This graph contains one
cycle, which can be seen by visual inspection. The cycle is shown in Fig. 6-5(b). From this cycle, we
can see that processes D, E, and G are all deadlocked. Processes A, C, and F are not deadlocked, because S
can be allocated to any one of them, which then finishes and returns it. Then the other two can take it in
turn and also complete.
Figure 6-5. (a) A resource graph, (b) A cycle extracted from (a).
Below we will give a simple one that inspects a graph and terminates either when it has found a cycle or
when it has shown that none exists. It uses one dynamic data structure, L, a list of nodes, as well as the
list of arcs. During the algorithm, arcs will be marked to indicate that they have already been inspected,
to
prevent repeated inspections.
The algorithm operates by carrying out the following steps as specified:
1. For each node, N in the graph, perform the following five steps with N as the starting node.
2. Initialize L to the empty list, and designate all the arcs as unmarked.
3. Add the current node to the end of L and check to see if the node now appears in L two times. If it does,
the graph contains a cycle (listed in L) and the algorithm terminates.
4. From the given node, see if there are any unmarked outgoing arcs. If so, go to step 5; if not, go to step
6.
5. Pick an unmarked outgoing arc at random and mark it. Then follow it to the new current node and go to
step 3.
6. If this node is the initial node, the graph does not contain any cycles and the algorithm terminates.
Otherwise, we have now reached a dead end. Remove it and go back to the previous node, that is, the
one that was current just before this one, make that one the current node, and go to step 3.
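As a sketch (one possible rendering of the prose algorithm above; the encoding of the graph as an adjacency dict is my own), the graph of Fig. 6-5(a) can be searched directly:

```python
def find_cycle(graph):
    for start in graph:                 # step 1: try every starting node
        L = [start]                     # step 2/3: L begins with the start node
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, arcs = stack[-1]
            advanced = False
            for nxt in arcs:            # steps 4-5: follow an uninspected arc
                L.append(nxt)
                if L.count(nxt) > 1:    # step 3: node appears twice -> cycle
                    return L[L.index(nxt):]
                stack.append((nxt, iter(graph.get(nxt, []))))
                advanced = True
                break
            if not advanced:            # step 6: dead end, backtrack
                stack.pop()
                L.pop()
    return None                         # no cycle anywhere in the graph

# Arcs of Fig. 6-5(a): resource -> owner, requester -> resource.
graph = {
    "R": ["A"], "A": ["S"], "B": ["T"], "C": ["S"],
    "D": ["S", "T"], "U": ["D"], "E": ["V"], "T": ["E"],
    "V": ["G"], "G": ["U"], "F": ["S"], "W": ["F"], "S": [],
}
print(find_cycle(graph))   # the cycle through T, E, V, G, U, D
```

Running it reproduces the trace in the text: starting from B, the search dead-ends at S, backtracks, and then closes the cycle at T.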
To see how the algorithm works in practice, let us use it on the graph of Fig. 6-5(a). The order of
processing the nodes is arbitrary, so let us just inspect them from left to right, top to bottom, first
running the algorithm starting at R, then successively A, B, C, S, D, T, E, F, and so forth. If we hit a
cycle, the algorithm stops.
We start at R and initialize L to the empty list. Then we add R to the list and move to the only possibility,
A, and add it to L, giving L = [R, A]. From A we go to S, giving L = [R, A, S]. S has no outgoing arcs,
so it is a dead end, forcing us to backtrack to A. Since A has no unmarked outgoing arcs, we backtrack
to R, completing our inspection of R.
Now we restart the algorithm starting at A, resetting L to the empty list. This search, too, quickly stops, so
we start again at B. From B we continue to follow outgoing arcs until we get to D, at which time L = [B,
T, E, V, G, U, D]. Now we must make a (random) choice. If we pick S we come to a dead end and
backtrack to D. The second time we pick T and update L to be [B, T, E, V, G, U, D, T], at which point
we discover the cycle and stop the algorithm.
When multiple copies of some of the resources exist, a different approach is needed to detect deadlocks.
We will now present a matrix-based algorithm for detecting deadlock among n processes, P1 through
Pn. The algorithm uses four data structures: the existing resource vector E, the available resource vector
A, the current allocation matrix C, and the request matrix R. E gives the total number of instances of each
resource in existence. For example, if class 1 is tape drives, then E1 = 2 means the system has two tape
drives.
Figure 6-6. The four data structures needed by the deadlock detection algorithm
The deadlock detection algorithm can now be given as follows.
1. Look for an unmarked process, Pi, for which the i-th row of R is less than or equal to A.
2. If such a process is found, add the i-th row of C to A, mark the process, and go back to step 1.
3. If no such process exists, the algorithm terminates.
When the algorithm finishes, all the unmarked processes, if any, are deadlocked.
As an example of how the deadlock detection algorithm works, consider Fig. 6-7. Here we have three
processes and four resource classes, which we have arbitrarily labeled tape drives, plotters,
scanners, and CD-ROM drives. Process 1 has one scanner. Process 2 has two tape drives and a CD-ROM
drive. Process 3 has a plotter and two scanners. Each process needs additional resources, as shown by the R matrix.
To run the deadlock detection algorithm, we look for a process whose resource request can be satisfied.
The first one cannot be satisfied because there is no CD-ROM drive available. The second cannot
be satisfied either, because there is no scanner free. Fortunately, the third one can be satisfied, so
process 3 runs
and eventually returns all its resources, giving
A = (2 2 2 0)
At this point process 2 can run and return its resources, giving
A=(4 2 2 1)
Now the remaining process can run. There is no deadlock in the system.
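The run above can be reproduced in a few lines. The C and R matrices below are assumptions reconstructed from the description (process 1 holds one scanner, and so on), with resources ordered tape drives, plotters, scanners, CD-ROM drives:

```python
E = [4, 2, 3, 1]                       # existing resources
A = [2, 1, 0, 0]                       # available resources
C = [[0, 0, 1, 0],                     # current allocation, one row per process
     [2, 0, 0, 1],
     [0, 1, 2, 0]]
R = [[2, 0, 0, 1],                     # outstanding requests
     [1, 0, 1, 0],
     [2, 1, 0, 0]]

finished = [False] * len(C)
progress = True
while progress:                        # step 1: look for a satisfiable process
    progress = False
    for i in range(len(C)):
        if not finished[i] and all(R[i][j] <= A[j] for j in range(len(A))):
            for j in range(len(A)):    # step 2: it runs and releases everything
                A[j] += C[i][j]
            finished[i] = True
            progress = True

deadlocked = [i + 1 for i in range(len(C)) if not finished[i]]
print(deadlocked, A)                   # no deadlock: A ends up equal to E
```

The intermediate values of A match the text: (2 2 2 0) after process 3 finishes and (4 2 2 1) after process 2.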
5) DEADLOCK AVOIDANCE
1. Resource trajectories
2. Safe and unsafe states
3. The banker's algorithm for a single resource
4. The banker's algorithm for multiple resources
Resource Trajectories:
Process A and B
•Resources: printer and plotter
•A needs printer from I1 to I3
•A needs plotter from I2 to I4
•B needs plotter from I5 to I7
•B needs printer from I6 to I8
The horizontal (vertical) axis represents the number of instructions executed by process A (B)
Every point in the diagram represents a joint state of the two processes
If the system ever enters the box bounded by I1 and I2 on the sides and I5 and I6 on the top and bottom,
it will eventually deadlock when it gets to the intersection of I2 and I6
At this point, A is requesting the plotter and B is requesting the printer, and both are already
assigned
The entire box is unsafe and must not be entered
At that point the only safe thing to do is run process A until it gets to I4; beyond that, any trajectory will do
When B requests a resource, the system must decide whether to grant it or not
If the grant is made, the system will enter an unsafe region and eventually deadlock.
Figure 3.5: Demonstration that one state is safe (upper) and another is not safe (lower).
First let's consider the situation when there is one resource type. Think of it as units of money (1K
dollars), a banker (the OS) who has a certain number of units in his bank, and a number of customers
who can each borrow a certain number of units from the bank and later pay the loan back (release the
resources). The customers have credit lines that cannot exceed the number of units initially in the
bank, and when they have borrowed their maximum number of units, they will pay their loans back.
The banker will grant a loan request only if it does not lead to an unsafe state. E.g., the banker
initially has 10K, and four customers A, B, C and D have credit lines 6K, 5K, 4K and 7K
respectively. The state when no loans have been made is then:
        Has  Max
   A     0    6
   B     0    5
   C     0    4
   D     0    7
   Free: 10

Later, suppose A has borrowed 1K, B 1K, C 2K, and D 4K:

        Has  Max
   A     1    6
   B     1    5
   C     2    4
   D     4    7
   Free: 2

This state is safe because with 2K left, C can borrow her max remaining credit, terminate, and
release her 4K; then D, and after that A and B, can do the same in turn.
Now suppose B asks for one more unit. Granting it would leave 1K free:

        Has  Max
   A     1    6
   B     2    5
   C     2    4
   D     4    7
   Free: 1

This is unsafe: if ALL customers ask for their MAXIMUM remaining credit, NONE
can be satisfied, and we have a potential deadlock. So, in this case the banker will not
grant B the loan, i.e., stays in the previous, safe state.
So a safe state is a state from which there is a sequence in which ALL processes can get their max
required resources (one at a time), finish, and release all their resources.
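A minimal sketch of the single-resource safety check, using the example's numbers (the function and variable names are my own):

```python
def is_safe(has, maxi, free):
    """Return True iff some order lets every customer finish."""
    done = [False] * len(has)
    while not all(done):
        for i in range(len(has)):
            # Customer i can finish if its remaining credit fits in free.
            if not done[i] and maxi[i] - has[i] <= free:
                free += has[i]      # lend the rest, then get max back: net +has[i]
                done[i] = True
                break
        else:
            return False            # nobody's remaining need fits -> unsafe
    return True

maxi = [6, 5, 4, 7]                 # credit lines of A, B, C, D
print(is_safe([1, 1, 2, 4], maxi, free=2))   # True: C, D, A, B can finish
print(is_safe([1, 2, 2, 4], maxi, free=1))   # False: granting B one more is unsafe
```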
         HAS            STILL NEEDS
     Tp Pl Pr CD       Tp Pl Pr CD
A     3  0  1  1        1  1  0  0       E = (6 3 4 2)   E: Existing resources
B     0  1  0  0        0  1  1  2       P = (5 3 2 2)   P: Possessed resources
C     1  1  1  0        3  1  0  0       A = (1 0 2 0)   A: Available resources
D     1  1  0  1        0  0  1  0
E     0  0  0  0        2  1  1  0

The column sums of the HAS matrix equal the vector P. Also, E = P + A, or A = E - P.
1. Find a process (row) R whose unmet resource needs are ALL less than or equal to A, i.e., a process that
can be granted all its unmet resources. If no such row exists, the state is unsafe.
2. Assume process R takes all its resources and finishes. Mark it as finished and release its resources back to
A.
3. Repeat steps 1 and 2 until either an unsafe state appears, in which case there is a potential for deadlock,
or all processes finish, in which case the original state was safe.
We begin with the state above. Only D's remaining request, (0 0 1 0), can be met from A = (1 0 2 0), so D
is granted its printer. D then has (1 1 1 1) and needs nothing, with P = (5 3 3 2) and A = (1 0 1 0). This is
safe, as D can still finish. D terminates and releases everything it holds, giving

P = (4 2 2 1),  A = (2 1 2 1)

Now E can go: its remaining request (2 1 1 0) fits in A. After the grant, E holds (2 1 1 0) and needs
nothing, with P = (6 3 3 1) and A = (0 0 1 1). This is safe, as E can still finish. When E terminates and
releases its resources, we are back to

P = (4 2 2 1),  A = (2 1 2 1)

Now A can go. Granting its remaining (1 1 0 0) leaves A holding (4 1 1 1) with nothing still needed,
P = (5 3 2 1) and A = (1 0 2 1). A then finishes and releases, giving

P = (1 2 1 0),  A = (5 1 3 2)

Now B can go: after the grant, B holds (0 2 1 2) and needs nothing, with P = (1 3 2 2) and A = (5 0 2 0).
This is safe, as B can still finish. B terminates, leaving

P = (1 1 1 0),  A = (5 2 3 2)

Now C can go: after the grant, C holds (4 2 1 0) and needs nothing, with P = (4 2 1 0) and A = (2 1 3 2).
This is safe, as C can still finish. When C terminates,

P = (0 0 0 0),  A = (6 3 4 2) = E

All processes can run to completion, so the initial state is safe.
So the banker's algorithm, upon receiving a request, "pencils" it into the state and checks whether the
resulting state is still safe.
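The multiple-resource check is the same idea with vectors; a sketch run on the HAS / STILL NEEDS data above (list layouts are my own):

```python
def is_safe(has, needs, avail):
    avail = list(avail)
    done = [False] * len(has)
    while not all(done):
        for i, need in enumerate(needs):
            # Step 1: a row whose unmet needs all fit in A.
            if not done[i] and all(n <= a for n, a in zip(need, avail)):
                # Step 2: it runs, finishes, and releases what it held.
                avail = [a + h for a, h in zip(avail, has[i])]
                done[i] = True
                break
        else:
            return False        # no runnable row -> unsafe
    return True

# Rows A..E; columns Tp, Pl, Pr, CD.
has   = [[3,0,1,1], [0,1,0,0], [1,1,1,0], [1,1,0,1], [0,0,0,0]]
needs = [[1,1,0,0], [0,1,1,2], [3,1,0,0], [0,0,1,0], [2,1,1,0]]
print(is_safe(has, needs, avail=[1, 0, 2, 0]))   # True: all five can finish
```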
DEADLOCK PREVENTION:
1)FILES:
File Naming:
Suffix Examples
File Structure
Byte sequences
Maximum flexibility-can put anything in
Unix and Windows use this approach
Fixed length records (card images in the old days)
Tree of records- uses key field to find records in the tree
File Types
File Access
• Sequential access- read from the beginning, can’t skip around
• Corresponds to magnetic tape
• Random access- start where you want to start
• Came into play with disks
• Necessary for many applications, e.g. airline reservation system
File Attributes
File operations
• Create -with no data, sets some attributes
• Delete-to free disk space
• Open- after create, gets attributes and disk addresses into main memory
• Close- frees table space used by attributes and addresses
• Read-usually from current pointer position. Need to specify buffer into which data is placed
• Write-usually to current position
• Append- at the end of the file
• Seek-puts file pointer at specific place in file. Read or write from that position on
• Get Attributes-e.g. make needs most recent modification times to arrange for group compilation
• Set Attributes-e.g. protection attributes
• Rename
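Most of these operations map directly onto a modern file API; a small Python walkthrough (the file name is hypothetical):

```python
import os, tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "notes.txt")

f = open(path, "w")            # create: new file, no data yet
f.write("hello, ")             # write at the current position
f.write("world")               # a further write continues (appends) from there
f.close()                      # close: frees the table space

f = open(path, "r")            # open: attributes and addresses fetched
f.seek(7)                      # seek: put the file pointer at byte 7
print(f.read())                # read from that position on -> "world"
f.close()

print(os.path.getsize(path))   # get attributes (here: size, 12 bytes)
os.remove(path)                # delete: free the disk space
```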
System calls:
2) DIRECTORIES
A file system may contain millions of files, and at that scale it is very hard to manage them. To
manage these files, they are grouped, and each group is loaded into one partition. Each such group is called a
directory. A directory structure provides a mechanism for organizing many files in the file system.
A single level directory system contains 4 files owned by 3 different people, A, B, and C
With this approach, there can be as many directories as are needed to group the files in natural ways.
Furthermore, if multiple users share a common file server, as is the case on many company networks, each user
can have a private root directory for his or her own hierarchy. This approach is shown in Fig. 4-7. Here, the
directories A, B, and C contained in the root directory each belong to a different user, two of whom have created
subdirectories for projects they are working on.
Path names
Absolute path name: /usr/carl/cs310/midterm/answers. Interpreted starting from the root directory.
Relative path name: cs310/midterm/answers. Interpreted relative to the current (working) directory.
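A quick check of the distinction with Python's posixpath module (the directory names are the example's):

```python
import posixpath

absolute = "/usr/carl/cs310/midterm/answers"
relative = "cs310/midterm/answers"

print(posixpath.isabs(absolute))   # True: interpreted starting at the root
print(posixpath.isabs(relative))   # False: interpreted from the working directory

# Prefixing the working directory turns a relative name into an absolute one:
print(posixpath.join("/usr/carl", relative))
```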
Fig: UNIX Path
• Superblock contains info about the fs (e.g. type of fs, number of blocks, …)
• i-nodes contain info about files
Implementing files:
• Most important implementation issue
• Methods
• Contiguous allocation
• Linked list allocation
• Linked list using table
• I-nodes
Contiguous Allocation
The good
• Simple, and sequential and random access are fast
The bad
• Fragmentation as files are deleted, and files cannot easily grow
Linked List Allocation
The good
• Gets rid of external fragmentation
The bad
• Random access is slow: need to chase pointers to get to a block
I-nodes
Keep data structure in memory only for active files
Data structure lists disk addresses of the blocks and attributes of the files
K active files, N blocks per file => K*N block addresses max!
Solves the growth problem
How big is N?
Solution: Last entry in table points to disk block which contains pointers to other disk blocks
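The arithmetic behind "how big is N" can be sketched; the block and address sizes below are assumed for illustration, not taken from the notes:

```python
block_size = 1024          # bytes per disk block (assumed)
addr_size = 4              # bytes per disk address (assumed)
direct = 10                # direct addresses held in the i-node itself

# One single-indirect block holds this many further addresses:
per_indirect = block_size // addr_size              # 256

max_blocks = direct + per_indirect                  # with one indirect pointer
max_bytes = max_blocks * block_size
print(max_blocks, max_bytes)                        # 266 blocks, 272384 bytes
```

Adding double- and triple-indirect blocks multiplies the reach by another factor of 256 each, which is the standard UNIX solution to the growth problem.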
Two ways of handling long file names in a directory. (a) In-line. (b) In a heap.
Shared Files
File system containing a shared file. The file system is then a directed acyclic graph (DAG), not a tree.
(a) Situation prior to linking. (b) After the link is created. (c) After the original owner removes the file.
Symbolic links
Symbolic link solves problem
Can have too many symbolic links and they take time to follow
Big advantage-can point to files on other machines
Log Structured File System
CPUs are faster, and disks and memories are (much) bigger, but disk seek time has not decreased
• Caches bigger-can do reads from cache
• Want to optimize writes because disk needs to be updated
• Structure disk as a log-collect writes and periodically send them to a segment in the disk . Writes tend to
be very small
• Segment has summary of contents (i-nodes, directories….).
• Keep i-node map on disk and cache it in memory to locate i-nodes
• Cleaner thread compacts log. Scans segment for current i-nodes, discarding ones not in use and sending
current ones to memory.
• Writer thread writes current ones out into new segment.
• Works well in Unix. Not compatible with most file systems
• Not used
Journaling File Systems
Want to guard against lost files when there are crashes. Consider what happens when a file has to be
removed.
• Remove the file from its directory.
• Release the i-node to the pool of free i-nodes.
• Return all the disk blocks to the pool of free disk blocks
If there is a crash somewhere in this process, have a mess.
o Keep a journal (i.e., a list) of actions before you take them, write the journal to disk, then perform the
actions. Can recover from a crash!
o Need to make operations idempotent. Must arrange data structures to do so.
o Marking block n as free is an idempotent operation.
o Adding freed blocks to the end of a list is not idempotent.
o NTFS (Windows) and Linux use journaling
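The idempotency point can be made concrete: crash recovery replays the journal, so replaying an action twice must be harmless. A toy model (the data structures are invented for the demo):

```python
free_bitmap = set()        # model: set membership = "block is free"
free_list = []             # model: a list that can hold duplicates

def mark_free(n):          # idempotent: doing it twice changes nothing
    free_bitmap.add(n)

def append_free(n):        # NOT idempotent: doing it twice adds a duplicate
    free_list.append(n)

for _ in range(2):         # recovery after a crash replays the journal
    mark_free(17)
    append_free(17)

print(len(free_bitmap))    # 1: the replay was harmless
print(free_list)           # [17, 17]: block 17 now appears free twice
```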
Virtual File Systems :
Have multiple fs on same machine
o Windows specifies fs (drives)
o Unix integrates into VFS
o Vfs calls from user
o Lower calls to actual fs
Supports Network File System-file can be on a remote machine
VFS-how it works
o File system registers with VFS (e.g. at boot time)
o At registration time, fs provides list of addresses of function calls the vfs wants
o Vfs gets info from the new fs i-node and puts it in a v-node
o Makes entry in fd table for process
o When process issues a call (e.g. read), function pointers point to concrete function calls
. A simplified view of the data structures and code used by the VFS and concrete file system to do a read.
Disk-Space Management
Since all files are normally stored on disk, one of the main concerns of a file system is the management of
disk space.
Block Size
The main question that arises when storing files in fixed-size blocks is the size of the block. If the block is too
large, space gets wasted; if the block is too small, time gets wasted. So, to choose a correct block size, some
information about the file-size distribution is required. Performance and space utilization are always in conflict.
Disk quotas
Multiuser operating systems often provide a mechanism for enforcing disk quotas. A system administrator
assigns each user a maximum allotment of files and blocks and the operating system makes sure that the users do
not exceed their quotas. Quotas are kept track of on a per-user basis in a quota table.
File-system Backups
If a computer's file system is irrevocably lost, whether due to hardware or software failure, restoring all the
information will be difficult, time consuming and in many cases impossible. So it is advised to always have
file-system backups.
Backing up files is time consuming and also occupies a large amount of space, so doing it efficiently and
conveniently is important. Below are a few points to be considered before creating backups of files.
There are two ways for dumping a disk to the backup disk:
Physical dump: The dump starts at block 0 of the disk, writes all the disk blocks onto the output
disk in order, and stops after copying the last one.
Advantages: Simplicity and great speed.
Disadvantages: inability to skip selected directories, make incremental dumps, and restore individual
files upon request
Logical dump: The dump starts at one or more specified directories and recursively dumps all
files and directories found that have been changed since some given base date. This is the most
commonly used way.
The above figure depicts a popular algorithm used in many UNIX systems, wherein squares depict directories and
circles depict files. This algorithm dumps all the files and directories that have been modified and also the ones on
the path to a modified file or directory. The dump algorithm maintains a bitmap indexed by i-node number, with
several bits per i-node. Bits will be set and cleared in this map as the algorithm proceeds. Although logical
dumping is straightforward, there are a few issues associated with it:
Since the free block list is not a file, it is not dumped and hence it must be reconstructed from scratch after
all the dumps have been restored
If a file is linked to two or more directories, it is important that the file is restored only one time and that
all the directories that are supposed to point to it do so
UNIX files may contain holes
Special files, named pipes and all other files that are not real should never be dumped.
File-system Consistency
To deal with inconsistent file systems, most computers have a utility program that checks file-system
consistency. For example, UNIX has fsck and Windows has sfc. This utility can be run whenever the system is
booted. The utility programs perform two kinds of consistency checks.
Blocks: To check block consistency, the program builds two tables, each containing a counter for every
block, initially set to 0. The counters in the first table keep track of how many times each block is
present in a file; the counters in the second table record how often each block is present in the free
list. If the file system is consistent, each block will have a 1 either in the first table or in the second
table, as you can see in the figure below.
If both tables have a 0 for some block, the block is missing and will be reported as a missing block. The two
other inconsistent situations are a block that is seen more than once in the free list and the same data block
present in two or more files.
In addition to checking to see that each block is properly accounted for, the file-system checker also
checks the directory system. It too uses a table of counters, but per file rather than per block. These
counts start at 1 when a file is created and are incremented each time a (hard) link is made to the file. In a
consistent file system, both counts will agree.
File-system Performance
Since the access to disk is much slower than access to memory, many file systems have been designed with
various optimizations to improve performance as described below.
Caching
The most common technique used to reduce disk access time is the block cache or buffer cache: a collection
of disk blocks that logically belong on the disk but are kept in memory for performance reasons. The most
common caching algorithm works as follows: when a disk access is initiated, the cache is checked first to see
if the disk block is present. If it is, the read request can be satisfied without a disk access; otherwise the disk
block is first copied into the cache and then the read request is processed.
The above figure depicts how to quickly determine whether a block is present in the cache: the block address
is hashed and the result is looked up in a hash table.
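A minimal sketch of such a cache, with a Python dict playing the role of the hash table (the "disk" here is simulated):

```python
disk = {n: f"data-{n}" for n in range(100)}   # stand-in for real disk blocks
cache = {}                                     # hash table: block number -> data
hits = misses = 0

def read_block(n):
    global hits, misses
    if n in cache:                # cache is checked first
        hits += 1
    else:                         # miss: copy the block from disk into the cache
        misses += 1
        cache[n] = disk[n]
    return cache[n]               # satisfy the read from memory

for n in [5, 9, 5, 5, 9, 2]:
    read_block(n)
print(hits, misses)               # 3 hits, 3 misses
```

A real buffer cache also bounds its size and evicts blocks (e.g. LRU), which this sketch omits.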
Defragmenting Disks
Due to continuous creation and removal of files the disks get badly fragmented with files and holes all over the
place. As a consequence, when a new file is created, the blocks used for it may be spread all over the disk, giving
poor performance. The performance can be restored by moving files around to make them contiguous and to put
all (or at least most) of the free space in one or more large contiguous regions on the disk.
PART 3
SECONDARY STORAGE STRUCTURE
In operation the disk rotates at high speed, such as 7200 rpm ( 120 revolutions per second ). The time
required to transfer data from the disk to the computer is made up of several components:
o The positioning time, a.k.a. the seek time or random access time is the time required to move the
heads from one cylinder to another, and for the heads to settle down after the move. This is
typically the slowest step in the process and the predominant bottleneck to overall transfer rates.
o The rotational latency is the amount of time required for the desired sector to rotate around and
come under the read-write head.This can range anywhere from zero to one full revolution, and on
the average will equal one-half revolution. This is another physical step and is usually the second
slowest step behind seek time. ( For a disk rotating at 7200 rpm, the average rotational latency
would be 1/2 revolution / 120 revolutions per second, or just over 4 milliseconds, a long time by
computer standards. )
o The transfer rate, which is the time required to move the data electronically from the disk to the
computer. ( Some authors may also use the term transfer rate to refer to the overall transfer rate,
including seek time and rotational latency as well as the electronic data transfer rate. )
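The 4-millisecond figure is easy to verify:

```python
rpm = 7200
revs_per_sec = rpm / 60                       # 120 revolutions per second
avg_latency_ms = 0.5 / revs_per_sec * 1000    # half a revolution, in ms
print(revs_per_sec, round(avg_latency_ms, 2))   # 120.0, 4.17 ms
```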
Disk heads "fly" over the surface on a very thin cushion of air. If they should accidentally contact the disk,
then a head crash occurs, which may or may not permanently damage the disk or even destroy it
completely. For this reason it is normal to park the disk heads when turning a computer off, which means
to move the heads off the disk or to an area of the disk where there is no data stored.
Floppy disks are normally removable. Hard drives can also be removable, and some are even hot-
swappable, meaning they can be removed while the computer is running, and a new hard drive inserted in
their place.
Disk drives are connected to the computer via a cable known as the I/O bus. Some of the common
interface formats include Enhanced Integrated Drive Electronics, EIDE; Advanced Technology
Attachment, ATA; Serial ATA, SATA; Universal Serial Bus, USB; Fibre Channel, FC; and Small
Computer Systems Interface, SCSI.
The host controller is at the computer end of the I/O bus, and the disk controller is built into the disk
itself. The CPU issues commands to the host controller via I/O ports. Data is transferred between the
magnetic surface and onboard cache by the disk controller, and then the data is transferred from that
cache to the host controller and the motherboard memory at electronic speeds.
2) DISK ATTACHMENT
Disk drives can be attached either directly to a particular host ( a local disk ) or to a network.
3) DISK SCHEDULING
As mentioned earlier, disk transfer speeds are limited primarily by seek times and rotational
latency. When multiple requests are to be processed there is also some inherent delay in waiting for other
requests to be processed.
Bandwidth is measured by the amount of data transferred divided by the total amount of time from the
first request being made to the last transfer being completed, ( for a series of disk requests. )
Both bandwidth and access time can be improved by processing requests in a good order.
Disk requests include the disk address, memory address, number of sectors to transfer, and whether the
request is for reading or writing.
4)RAID STRUCTURE
The general idea behind RAID is to employ a group of hard drives together with some form of duplication,
either to increase reliability or to speed up operations, ( or sometimes both. )
RAID originally stood for Redundant Array of Inexpensive Disks, and was designed to use a bunch of
cheap small disks in place of one or two larger more expensive ones. Today RAID systems employ large
possibly expensive disks as their components, switching the definition to Independent disks.
There are also two RAID levels which combine RAID levels 0 and 1 ( striping and mirroring ) in different
combinations, designed to provide both performance and reliability at the expense of increased cost.
o RAID level 0 + 1 disks are first striped, and then the striped disks mirrored to another set. This
level generally provides better performance than RAID level 5.
o RAID level 1 + 0 mirrors disks in pairs, and then stripes the mirrored pairs. The storage capacity,
performance, etc. are all the same, but there is an advantage to this approach in the event of
multiple disk failures, as illustrated below:
In diagram (a) below, the 8 disks have been divided into two sets of four, each of which is
striped, and then one stripe set is used to mirror the other set.
If a single disk fails, it wipes out the entire stripe set, but the system can keep on
functioning using the remaining set.
However if a second disk from the other stripe set now fails, then the entire system
is lost, as a result of two disk failures.
In diagram (b), the same 8 disks are divided into four sets of two, each of which is
mirrored, and then the file system is striped across the four sets of mirrored disks.
If a single disk fails, then that mirror set is reduced to a single disk, but the system
rolls on, and the other three mirror sets continue mirroring.
Now if a second disk fails, ( that is not the mirror of the already failed disk ), then
another one of the mirror sets is reduced to a single disk, but the system can
continue without data loss.
In fact the second arrangement could handle as many as four simultaneously failed
disks, as long as no two of them were from the same mirror pair.
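The failure argument above can be checked by enumerating every two-disk failure for an assumed 8-disk layout (the disk numbering is made up for the demo):

```python
from itertools import combinations

stripe_sets = [{0, 1, 2, 3}, {4, 5, 6, 7}]      # RAID 0+1: mirrored stripe sets
mirror_pairs = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]  # RAID 1+0: striped mirror pairs

def survives_01(failed):   # 0+1 needs at least one stripe set fully intact
    return any(not (s & failed) for s in stripe_sets)

def survives_10(failed):   # 1+0 needs every mirror pair to keep one disk
    return all(p - failed for p in mirror_pairs)

pairs = [set(c) for c in combinations(range(8), 2)]
print(sum(survives_01(f) for f in pairs), "of", len(pairs))   # 12 of 28
print(sum(survives_10(f) for f in pairs), "of", len(pairs))   # 24 of 28
```

RAID 1+0 survives every two-disk failure except the 4 that take out a whole mirror pair, while 0+1 only survives when both failures land in the same stripe set.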
Figure 10.12 - RAID 0 + 1 and 1 + 0
4.4 Selecting a RAID Level
Trade-offs in selecting the optimal RAID level for a particular application include cost, volume of data,
need for reliability, need for performance, and rebuild time, the latter of which can affect the likelihood
that a second disk will fail while the first failed disk is being rebuilt.
Other decisions include how many disks are involved in a RAID set and how many disks to protect with a
single parity bit. More disks in the set increases performance but increases cost. Protecting more disks per
parity bit saves cost, but increases the likelihood that a second disk will fail before the first bad disk is
repaired.
4.5 Extensions
RAID concepts have been extended to tape drives ( e.g. striping tapes for faster backups or parity checking
tapes for reliability ), and for broadcasting of data.
Figure 10.14 - (a) Traditional volumes and file systems. (b) a ZFS pool and file systems.
The concept of stable storage ( first presented in chapter 6 ) involves a storage medium in which data
is never lost, even in the face of equipment failure in the middle of a write operation.
To implement this requires two ( or more ) copies of the data, with separate failure modes.
An attempted disk write results in one of three possible outcomes:
1. The data is successfully and completely written.
2. The data is partially written, but not completely. The last block written may be garbled.
3. No writing takes place at all.
Whenever an equipment failure occurs during a write, the system must detect it, and return the system
back to a consistent state. To do this requires two physical blocks for every logical block, and the
following procedure:
1. Write the data to the first physical block.
2. After step 1 had completed, then write the data to the second physical block.
3. Declare the operation complete only after both physical writes have completed successfully.
During recovery the pair of blocks is examined.
o If both blocks are identical and there is no sign of damage, then no further action is necessary.
o If one block contains a detectable error but the other does not, then the damaged block is replaced
with the good copy. ( This will either undo the operation or complete the operation, depending on
which block is damaged and which is undamaged. )
o If neither block shows damage but the data in the blocks differ, then replace the data in the first
block with the data in the second block. ( Undo the operation. )
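A toy simulation of the write-and-recover procedure (CRCs stand in for the disk's own error detection; all names are invented):

```python
import zlib

def write_stable(pair, data, crash_after=None):
    """Write data to both physical blocks; crash_after simulates an
    equipment failure after that many physical writes (None = no crash)."""
    checksum = zlib.crc32(data)
    if crash_after == 0:
        return
    pair[0] = (data, checksum)      # step 1: first physical block
    if crash_after == 1:
        return
    pair[1] = (data, checksum)      # step 2: second physical block

def recover(pair):
    good = [b is not None and zlib.crc32(b[0]) == b[1] for b in pair]
    if good[0] and not good[1]:
        pair[1] = pair[0]           # complete the interrupted operation
    elif good[1] and not good[0]:
        pair[0] = pair[1]           # same, from the surviving copy
    elif good[0] and good[1] and pair[0][0] != pair[1][0]:
        pair[0] = pair[1]           # both undamaged but different: undo

pair = [None, None]
write_stable(pair, b"v1")                  # a normal, completed write
recover(pair)                              # identical blocks: nothing to do

write_stable(pair, b"v2", crash_after=1)   # crash between the two writes
recover(pair)
print(pair[0][0], pair[1][0])              # both b'v1': the update was undone
```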
Because the sequence of operations described above is slow, stable storage usually includes NVRAM as a
cache, and declares a write operation complete once it has been written to the NVRAM.