Lecture 18
Disk Structure
• Disk drives are addressed as large 1-dimensional
arrays of logical blocks, where the logical block is
the smallest unit of transfer
• The 1-dimensional array of logical blocks is
mapped into the sectors of the disk sequentially
– Sector 0 is the first sector of the first track on the outermost
cylinder
– Mapping proceeds in order through that track, then the rest
of the tracks in that cylinder, and then through the rest of the
cylinders from outermost to innermost
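The sequential mapping above can be sketched as a small function. This is only an illustration under simplifying assumptions that the slides do not state: one logical block per sector, the same number of sectors on every track, and a hypothetical geometry passed in as parameters. The name block_to_chs is made up for this sketch.

```python
def block_to_chs(block, tracks_per_cylinder, sectors_per_track):
    """Map a logical block number to (cylinder, track in cylinder, sector).

    Cylinder 0 is the outermost cylinder; numbering proceeds through one
    track, then the remaining tracks of the cylinder, then inward.
    """
    blocks_per_cylinder = tracks_per_cylinder * sectors_per_track
    cylinder = block // blocks_per_cylinder
    track = (block % blocks_per_cylinder) // sectors_per_track
    sector = block % sectors_per_track
    return cylinder, track, sector


# Hypothetical geometry: 4 tracks per cylinder, 16 sectors per track.
print(block_to_chs(0, 4, 16))   # (0, 0, 0)
print(block_to_chs(16, 4, 16))  # (0, 1, 0)
print(block_to_chs(64, 4, 16))  # (1, 0, 0)
```

Real drives usually put more sectors on outer tracks and remap bad sectors, so the true mapping is known only to the drive.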
Disk Scheduling
• The OS is responsible for minimizing access time and
maximizing bandwidth for disks
• Access time has two major components
– Seek time is the time for the disk arm to move the heads to the
cylinder containing the desired sector
– Rotational latency is the additional time waiting for the disk
to rotate the desired sector to the disk head
• Disk bandwidth is the total number of bytes
transferred, divided by the total time from the start of
the request to the completion of the last transfer
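The bandwidth definition above is a plain ratio, sketched here. The function name and the sample numbers are hypothetical; the point is that the elapsed time runs from the start of the request to the end of the last transfer, so seek and rotational delays lower the result.

```python
def disk_bandwidth(bytes_transferred, request_start, last_transfer_end):
    """Bytes per second over the whole interval, waits included."""
    elapsed = last_transfer_end - request_start
    if elapsed <= 0:
        raise ValueError("the interval must be positive")
    return bytes_transferred / elapsed


# Hypothetical workload: eight 4096-byte transfers that complete
# 0.5 seconds after the request was issued.
print(disk_bandwidth(8 * 4096, 0.0, 0.5))  # 65536.0
```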
• Requests are serviced immediately if the disk is not
busy; if the disk is busy, requests are queued
– Disk scheduling chooses the next request to service
Disk Scheduling
• First-come, first-serve (FCFS)
– As always, this is the simplest algorithm to implement
– This algorithm is intrinsically fair, but does not necessarily
provide the fastest service
– Consider requests for blocks on the following cylinders
98, 183, 37, 122, 14, 124, 65, 67
– Assume the disk head is initially on cylinder 53
• The next slide shows the head traversal
Disk Scheduling
• First-come, first-serve (FCFS)
– Total disk head movement is 640 cylinders
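The FCFS total can be checked with a short sketch that services the queue in arrival order and sums the distance of each move. The function name is made up for this example; the request queue and starting cylinder are the ones from the slides.

```python
def fcfs_head_movement(start, requests):
    """Total cylinders traversed when requests are served in arrival order."""
    total = 0
    position = start
    for cylinder in requests:
        total += abs(cylinder - position)
        position = cylinder
    return total


requests = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(53, requests))  # 640
```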
Disk Scheduling
• Shortest-seek-time-first (SSTF)
– Service requests closest to the current head position (i.e., the
minimum seek time)
– Is a form of Shortest-job-first
– May lead to starvation
– Next slide depicts head movement for previous example
Disk Scheduling
• Shortest-seek-time-first (SSTF)
– Total head movement is 236 cylinders
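A minimal sketch of SSTF on the same example: at each step it picks the pending request closest to the current head position. The function name is hypothetical, and ties are broken by queue order, which is a choice of this sketch and not something the slides specify.

```python
def sstf_schedule(start, requests):
    """Return (service order, total head movement) for SSTF."""
    pending = list(requests)
    position = start
    order = []
    total = 0
    while pending:
        # Closest pending cylinder; the earliest queued request wins a tie.
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        total += abs(nearest - position)
        position = nearest
        order.append(nearest)
    return order, total


order, total = sstf_schedule(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)  # [65, 67, 37, 14, 98, 122, 124, 183]
print(total)  # 236
```

A steady stream of requests near the head would keep pushing cylinder 183 to the back of the order, which is the starvation risk noted above.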
Disk Scheduling
• Elevator algorithm
– Also called SCAN
– The disk arm starts at one end of the disk
– Moves toward the other end of the disk, servicing requests
along the way
– When there are no more requests or it reaches the end, it
reverses direction and moves back toward the other end of
the disk, servicing requests along the way
– This process is continued repeatedly
– Next slide depicts head movement for previous example
Disk Scheduling
• Elevator algorithm
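– Total head movement is 208 cylinders if the head reverses at the
lowest pending request (cylinder 14)
• The figure shows a “strict” elevator that moves all the way
to the end before reversing directions; if the head first sweeps
down to cylinder 0, the total is 236 cylinders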
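The two elevator variants can be compared with a small sketch. It assumes the head first sweeps toward cylinder 0, that cylinder 0 is the lowest cylinder, and that the strict flag (a name invented here) selects whether the head travels to the end of the disk or reverses at the last pending request.

```python
def scan_head_movement(start, requests, lowest_cylinder=0, strict=True):
    """Total movement for an elevator sweep that goes down first, then up."""
    lower = [c for c in requests if c <= start]
    upper = [c for c in requests if c > start]
    if strict:
        turn = lowest_cylinder                   # go to the end of the disk
    else:
        turn = min(lower) if lower else start    # reverse at the last request
    total = start - turn
    if upper:
        total += max(upper) - turn
    return total


requests = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_head_movement(53, requests, strict=True))   # 236
print(scan_head_movement(53, requests, strict=False))  # 208
```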
Disk Scheduling
• C-SCAN
– “Circular” SCAN
– Provides a more uniform wait time than Elevator
– The head moves from one end of the disk to the other,
servicing requests as it goes
– When it reaches the other end, it immediately returns to the
beginning of the disk, without servicing any requests on the
return trip
– Treats the cylinders as a circular list that wraps around from
the last cylinder to the first one
Disk Scheduling
• C-SCAN
– Head movement for previous example
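The C-SCAN service order for the same example can be sketched as follows, assuming the head sweeps toward higher cylinder numbers. The function name is hypothetical, and the disk size is not needed to compute the order.

```python
def cscan_order(start, requests):
    """Service order for C-SCAN with an upward sweep.

    Requests at or beyond the head are served in increasing order; the head
    then returns to the first cylinder without servicing anything and serves
    the remaining requests, again in increasing order.
    """
    ahead = sorted(c for c in requests if c >= start)
    behind = sorted(c for c in requests if c < start)
    return ahead + behind


print(cscan_order(53, [98, 183, 37, 122, 14, 124, 65, 67]))
# [65, 67, 98, 122, 124, 183, 14, 37]
```

Because every request is reached by a sweep in the same direction, cylinders near the edges wait about as long as those in the middle, which is the more uniform wait time mentioned above.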
Disk Reliability
• Several improvements in disk-use techniques
involve the use of multiple disks working
cooperatively
• Disk striping uses a group of disks as one
– Improves performance by breaking blocks into sub-blocks
• RAID (redundant array of independent disks)
schemes improve performance and improve the
reliability of the storage system by storing
redundant data
– Mirroring or shadowing keeps a duplicate of each disk
– Block interleaved parity uses much less redundancy
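Parity needs less redundancy than mirroring because a single parity block protects a whole group of data blocks. A minimal sketch using bytewise XOR, with made-up function names and sample data, shows how one lost block is rebuilt from the survivors and the parity block.

```python
from functools import reduce


def parity_block(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """XOR of the survivors and the parity yields the missing block."""
    return parity_block(list(surviving_blocks) + [parity])


# Hypothetical data blocks, one per data disk.
data = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]
parity = parity_block(data)

# Pretend the second disk failed and rebuild its block.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])  # True
```

Three data disks plus one parity disk store three disks' worth of data; mirroring the same data would need six disks.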
RAID Schemes
• Defined in 1988 by Patterson, Gibson, and Katz at the
University of California, Berkeley
• RAID 0: Striping
• RAID 1: Mirroring
• RAID 2: Bit striping with ECC (uses Hamming code;
rarely used in practice)
• RAID 3: Bit striping with parity
• RAID 4: Striping with fixed parity
• RAID 5: Striping with striped parity
• RAID 10: Striped mirrors
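As an illustration of striping (RAID 0), the sketch below spreads logical blocks round-robin across the disks. Round-robin placement at block granularity is an assumption of this example, and the function name is hypothetical; real arrays choose their own stripe unit.

```python
def stripe_location(block, num_disks):
    """Return (disk index, block offset on that disk) for a logical block."""
    return block % num_disks, block // num_disks


# With 4 disks, consecutive blocks land on different disks, so a large
# sequential transfer can keep all of them busy at once.
for block in range(6):
    print(block, stripe_location(block, 4))
```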
Tertiary Storage
• Low cost is the defining characteristic of tertiary
storage
• Generally, tertiary storage is built using removable
media
• Common examples of removable media are floppy
disks, CD-ROMs, etc.
Tertiary Storage
• Floppy disk — thin flexible disk coated with
magnetic material, enclosed in a protective plastic
case
– Most floppies hold about 1 MB; similar technology is used
for removable disks that hold more than 1 GB
– Removable magnetic disks can be nearly as fast as hard
disks, but they are at a greater risk of damage from exposure
Tertiary Storage
• A magneto-optic disk records data on a rigid platter
coated with magnetic material
– Laser heat is used to amplify a large, weak magnetic field to
record a bit
– Laser light is also used to read data (Kerr effect)
• Polarization of laser is rotated depending on magnetic field
orientation
– The magneto-optic head flies much farther from the disk
surface than a magnetic disk head, and the magnetic material
is covered with a protective layer of plastic or glass; resistant
to head crashes
Tertiary Storage
• Optical disks do not use magnetism; they employ
special materials that are altered by laser light
– Phase-change disk
• Coated with material that can freeze into either a crystalline or
an amorphous state
• Different states reflect laser light with different strengths
• Laser is used at different power settings to melt and refreeze
spots on disk to change their state
– Dye-polymer disk
• Coated with plastic containing a dye that absorbs laser light
• Laser can heat a small spot to cause it to swell and form a bump
• Laser can reheat a bump to smooth it out
• High cost, low performance
Tertiary Storage
• Data on read/write disks can be modified many times
• WORM (“Write Once, Read Many Times”) disks can
be written only once
• Thin aluminum film sandwiched between two glass or
plastic platters
• To write a bit, the drive uses laser light to burn a
small hole through the aluminum; information can be
destroyed but not altered
• Very durable and reliable
Linux Swap Space Management
• Performed at the page level
– Requires hardware paging unit in the CPU
• Linux page table entry
– Present flag indicates whether a page is swapped out
– Remaining bits store location of page on disk
• Linux only swaps
– Pages belonging to an anonymous memory region of a
process (e.g., user mode stack)
– Modified pages belonging to a private memory mapping of a
process
– Pages belonging to an IPC shared memory region
– Other page types are either used by the kernel or to map files
on disk and are not swapped to the swap area
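The page table entry of a swapped-out page can be pictured with the sketch below. The bit layout is hypothetical (bit 0 as the Present flag, 7 bits for the swap area index, the remaining bits for the page slot); the slides only say that the Present flag is clear and that the remaining bits locate the page on disk.

```python
PRESENT = 0x1                 # bit 0: set while the page is in RAM
AREA_SHIFT, AREA_BITS = 1, 7  # hypothetical position of the swap area index
SLOT_SHIFT = 8                # hypothetical position of the page slot number


def make_swapped_entry(area, slot):
    """Build an entry for a swapped-out page; the Present bit stays clear."""
    if not 0 <= area < (1 << AREA_BITS):
        raise ValueError("swap area index out of range")
    return (slot << SLOT_SHIFT) | (area << AREA_SHIFT)


def decode_entry(entry):
    """Return (area, slot) for a swapped-out page, or None if it is present."""
    if entry & PRESENT:
        return None  # the other bits then describe a page frame, not a slot
    area = (entry >> AREA_SHIFT) & ((1 << AREA_BITS) - 1)
    return area, entry >> SLOT_SHIFT


entry = make_swapped_entry(area=2, slot=1000)
print(decode_entry(entry))  # (2, 1000)
```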
Linux Page Swap Selection
• Generally, the rule for selecting pages to swap is to
take from the process with largest number of pages
– Linux 2.4 changed this to select from the processes with the fewest page faults
• A choice must also be made among process pages
– As we know, LRU is a good selection algorithm
– The x86 processor does not provide hardware support for LRU
– Linux tries to approximate LRU by using the Accessed
flag in each page table entry; this flag is automatically set by
the hardware when the page is accessed
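One textbook way to approximate LRU with a per-page accessed bit is the second-chance (clock) scan sketched below. It is an illustration of the idea only, not the code Linux uses; the Page class and pick_victim function are invented for this example.

```python
class Page:
    def __init__(self, name):
        self.name = name
        self.accessed = False  # stands in for the hardware-set Accessed flag


def pick_victim(pages, hand=0):
    """Return (victim page, next hand position).

    Pages whose flag is set get a second chance: the flag is cleared and
    the scan moves on. The first page found with a clear flag is evicted.
    The list of pages must not be empty.
    """
    while True:
        page = pages[hand]
        hand = (hand + 1) % len(pages)
        if page.accessed:
            page.accessed = False
        else:
            return page, hand


pages = [Page("A"), Page("B"), Page("C")]
pages[0].accessed = True  # A and B were touched since the last scan
pages[1].accessed = True
victim, hand = pick_victim(pages)
print(victim.name)  # C
```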
Linux Swap Area
• Linux can support up to MAX_SWAPFILES swap
areas (usually set to 8)
• Each swap area consists of a sequence of page
slots, i.e., 4096-byte blocks
• The first slot of the swap area is used to store
information about the swap area
– The first slot is a swap_header union containing two
structures
• magic - contains the magic number that identifies the partition
as swap; stored at the end of the slot
• info - contains information such as swap algorithm version,
last usable page slot, number of bad page slots, and location of
bad page slots
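Checking the magic field can be sketched as reading the last bytes of the first page slot. The 10-byte length and the two signature strings below are the ones commonly documented for Linux swap areas; the slides do not list them, so treat them as an assumption of this sketch. The function names are hypothetical.

```python
PAGE_SIZE = 4096
MAGIC_LENGTH = 10                               # assumed signature length
KNOWN_MAGICS = (b"SWAP-SPACE", b"SWAPSPACE2")   # assumed signature values


def swap_magic(first_slot):
    """Return the signature stored at the end of the first page slot."""
    if len(first_slot) != PAGE_SIZE:
        raise ValueError("expected exactly one page slot")
    return first_slot[-MAGIC_LENGTH:]


def looks_like_swap(first_slot):
    return swap_magic(first_slot) in KNOWN_MAGICS


# Build a fake first slot in memory instead of reading a real device.
fake_slot = bytes(PAGE_SIZE - MAGIC_LENGTH) + b"SWAPSPACE2"
print(looks_like_swap(fake_slot))         # True
print(looks_like_swap(bytes(PAGE_SIZE)))  # False
```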
Linux Swap Area
• Swap area data structures
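[Figure: swap_info describes a swap area (a swap_device or swap_file) made up of free, occupied, and defective page slots, and points to swap_map, an array of counters with one entry per slot; the example counter values include 0, 1, 2, and 32768]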
Finding a Free Page Slot
• The goal is to store pages in contiguous slots
• Allocating from the start of the swap area increases
average swap-out time, while allocating from the
last allocated slot increases swap-in time
– Linux's approach tries to balance these issues
• Linux always searches for free slots from the last
allocated slot, unless one of the following holds, in which
case the search restarts from the beginning of the swap area
– The end of the swap area is reached
– The number of allocated slots since the last restart is greater
than or equal to SWAPFILE_CLUSTER (usually 256)
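The search policy above can be sketched with a toy swap area. The class and its fields are hypothetical and much simpler than the kernel's data structures; a counter of 0 marks a free slot, and slot 0 is reserved for the header.

```python
SWAPFILE_CLUSTER = 256  # value quoted in the slides


class SwapArea:
    def __init__(self, num_slots):
        self.swap_map = [0] * num_slots  # 0 means the slot is free
        self.swap_map[0] = 1             # slot 0 holds the header
        self.next_search = 1             # just past the last allocated slot
        self.allocated_since_restart = 0

    def allocate_slot(self):
        """Return a free slot number, or None if the area is full."""
        n = len(self.swap_map)
        if (self.next_search >= n
                or self.allocated_since_restart >= SWAPFILE_CLUSTER):
            self.next_search = 1         # restart from the beginning
            self.allocated_since_restart = 0
        for _ in range(2):  # scan to the end, then once more from the start
            for slot in range(self.next_search, n):
                if self.swap_map[slot] == 0:
                    self.swap_map[slot] = 1
                    self.next_search = slot + 1
                    self.allocated_since_restart += 1
                    return slot
            self.next_search = 1
            self.allocated_since_restart = 0
        return None


area = SwapArea(8)
first = [area.allocate_slot() for _ in range(3)]
area.swap_map[1] = 0                 # slot 1 becomes free again
print(first, area.allocate_slot())   # [1, 2, 3] 4
```

The freed slot 1 is skipped for now because the search continues from the last allocated slot; it is reused only after a restart.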
The Swap Cache
• It is possible that a page will get swapped out
multiple times, which is why the slot counter in the
swap descriptor may have a value greater than 1
– The counter is incremented each time a page is swapped out
• It is possible to swap out a shared page for all
processes at once, but it is inefficient since the kernel
does not know which processes share the page