0% found this document useful (0 votes)
15 views50 pages

15 Disks

The document discusses the internal components and operation of hard disk drives, including platters, surfaces, tracks, sectors, read/write heads, and the seek, rotation, and transfer times involved in reading a sector from disk. It provides an example of the performance differences between two disk models for sequential and random workloads, showing that sequential access is faster due to transfer-dominated timing while random access is slower due to seek and rotation overhead. The document also briefly mentions some techniques for improving disk performance like track skew and caching.

Uploaded by

chandreshpatel16
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views50 pages

15 Disks

The document discusses the internal components and operation of hard disk drives, including platters, surfaces, tracks, sectors, read/write heads, and the seek, rotation, and transfer times involved in reading a sector from disk. It provides an example of the performance differences between two disk models for sequential and random workloads, showing that sequential access is faster due to transfer-dominated timing while random access is slower due to seek and rotation overhead. The document also briefly mentions some techniques for improving disk performance like track skew and caching.

Uploaded by

chandreshpatel16
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 50

Fall 2017 :: CSE 306

Hard Disk
Drives
Nima Honarmand
(Based on slides by Prof. Andrea Arpaci-Dusseau)
Fall 2017 :: CSE 306

Storage Stack in the OS


Application
Virtual file system
Concrete file system
Generic block layer
Driver Build common interface
Disk drive on top of all disk drivers

Different types of drives: HDD, SSD, network mount, USB stick


Different types of interfaces: ATA, SATA, SCSI, USB, NVMe, etc.
Fall 2017 :: CSE 306

Basic Interface
• Disk has a sector-addressable address space
• Appears as an array of sectors to the OS

• Sectors are typically 512 bytes or 4096 bytes

• Main operations: reads + writes to sectors

• Mechanical (slow) nature makes management


“interesting”
Fall 2017 :: CSE 306

Platter

Disk Internals
Fall 2017 :: CSE 306

Platter is covered with a magnetic film.


Fall 2017 :: CSE 306

Spindle
Fall 2017 :: CSE 306

Surface

Surface
Fall 2017 :: CSE 306

Many platters may be bound to the spindle.


Fall 2017 :: CSE 306

Each surface is divided into rings called a track.


A stack of tracks (across platters) is called a cylinder.
Fall 2017 :: CSE 306

23 16
15 8
22 7 0 17
14 9
6 1
13 5 2 10
4 3
21 18
12 11
20 19

The tracks are divided into numbered sectors.


Fall 2017 :: CSE 306

23 16
15 8
22 7 0 17
14 9
6 1
13 5 2 10
4 3
21 18
12 11
20 19

Spindle/platters rapidly spin.

Heads on a moving arm can read from


each surface.
Fall 2017 :: CSE 306

Disk Terminology
spindle
read/write head
platter

surface

sector

track
cylinder
Fall 2017 :: CSE 306

Let’s Read Sector 12!


23 16
15 8
22 7 0 17
14 9
6 1
13 5 2 10
4 3
21 18
12 11
20 19
Fall 2017 :: CSE 306

Step 1: Seek to right track


Fall 2017 :: CSE 306

Step 1: Seek to right track

The disk keeps rotating at a constant speed,


even when seeking.
Fall 2017 :: CSE 306

Step 2: Wait for rotation


Fall 2017 :: CSE 306

Step 2: Wait for rotation


Fall 2017 :: CSE 306

Step 2: Wait for rotation


Fall 2017 :: CSE 306

Step 3: Transfer data


Fall 2017 :: CSE 306

Yay!
Fall 2017 :: CSE 306

HDD Video Demo


• https://www.youtube.com/watch?v=9eMWG3fwiE
U&feature=youtu.be&t=30s

• https://www.youtube.com/watch?v=L0nbo1VOF4
M
Fall 2017 :: CSE 306

Time to Read/Write a Sector


• Three components
1) Seek time
2) Rotation time
3) Transfer time

• Time = seek + rotation + transfer


Fall 2017 :: CSE 306

1) Seek Time
• Seek time: function of cylinder distance
• Not purely linear cost

• Must accelerate, coast, decelerate, settle

• Settling alone can take 0.5–2 ms

• Entire seeks often takes several milliseconds


• 4–10 ms

• Average seek distance?


• 1/3 max seek distance. Why?
Fall 2017 :: CSE 306

2) Rotation Time
• Depends on disk’s rotational speed: Rotations Per
Minute (RPM)
• 7200 RPM is common, 15000 RPM is high end.

• With 7200 RPM, how long to rotate around?


• 1 / 7200 RPM
= 1 minute / 7200 rotations
= 1 second / 120 rotations
= 8.3 ms / rotation

• Average rotation?
• 8.3 ms / 2 = 4.15 ms
Fall 2017 :: CSE 306

3) Transfer Time
• Pretty fast — depends on RPM and sector density.

• 100+ MB/s is typical for maximum transfer rate

• How long to transfer 512-bytes?


• 512 bytes / (100 MBps) = 5 us
Fall 2017 :: CSE 306

Workload Performance
• So…
• seeks are slow
• rotations are slow
• transfers are fast

• What kind of workload is faster for disks?


• Sequential: access sectors in order (transfer dominated)
• Random: access sectors arbitrarily (seek+rotation
dominated)
Fall 2017 :: CSE 306

Disk Spec
Cheetah Barracuda
Capacity 300 GB 1 TB
RPM 15,000 7,200
Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/s
Platters 4 4
Cache 16 MB 32 MB

• Sequential workload: what is throughput for each?


• Cheetah: 125 MB/s
• Barracuda: 105 MB/s
Fall 2017 :: CSE 306

Disk Spec
Cheetah Barracuda
Capacity 300 GB 1 TB
RPM 15,000 7,200
Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/s
Platters 4 4
Cache 16 MB 32 MB

• Random workload: what is throughput for each?

• Assume size of each random read is 16KB


Fall 2017 :: CSE 306

Cheetah Barracuda
RPM 15,000 7,200
Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/s

Time = seek + rotate + transfer

Seek = 4 ms
Full rotation = 60 / (15,000) = 4 ms
Half rotation = 2 ms
Transfer = 16 KB / 125 MBps = 125 us

Throughput = 16 KB / (6.125 ms) = 2.5 MBps


Fall 2017 :: CSE 306

Cheetah Barracuda
RPM 15,000 7,200
Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/s

Time = seek + rotate + transfer

Seek = 9 ms
Full rotation = 60 / (7,200) = 8.3 ms
Half rotation = 4.1 ms
Transfer = 16 KB / 100 MBps = 160 us

Throughput = 16 KB / (13.260 ms) = 1.2 MBps


Fall 2017 :: CSE 306

Cheetah Barracuda
Capacity 300 GB 1 TB
RPM 15,000 7,200
Avg Seek 4 ms 9 ms
Max Transfer 125 MB/s 105 MB/s
Platters 4 4
Cache 16 MB 32 MB

Cheetah Barracuda
Sequential 125 MB/s 105 MB/s
Random 2.5 MB/s 1.2 MB/s

This shows the importance of proper disk scheduling


to achieve good disk performance.
Fall 2017 :: CSE 306

Other Improvements
• Track Skew

• Zones

• Drive Cache
Fall 2017 :: CSE 306

Imagine sequential reading. How should sectors


numbers be laid out on disk?

23 16
15 8
22 17
14 9

13 10
21 18
12 11
20 19
Fall 2017 :: CSE 306

When reading 16 after 15, the head won’t settle


quick enough, so we need to do a rotation.

23 16
15 8
22 17
14 9

13 10
21 18
12 11
20 19
Fall 2017 :: CSE 306

Enough time to settle now!

21 22
15 8
20 23
14 9

13 10
19 16
12 11
18 17
Fall 2017 :: CSE 306

Other Improvements
• Track Skew

• Zones

• Drive Cache
Fall 2017 :: CSE 306

ZBR (Zoned bit recording): More sectors on outer zones

Within each zone, all tracks have the same number of


sectors per track.
Fall 2017 :: CSE 306

Other Improvements
• Track Skew

• Zones

• Drive Cache
Fall 2017 :: CSE 306

Drive Cache
• Drives may cache both reads and writes
• OS caches data too
• Disks contain internal memory (2−16MB) used as
cache
• A.k.a. “Track Buffer”
• Provides multiple benefits

1) Read-ahead
• Read contents of entire track into memory during
rotational delay
• Can send them to OS if it asks for them later
Fall 2017 :: CSE 306

Drive Cache (2)


2) Write caching
• Keep write data in the drive cache and claim completion
to OS
• “faster” response time
• But data could be lost on power failure

3) Tagged command queueing


• Have multiple outstanding requests to the disk
• Disk can reorder (schedule) requests for better
performance
• OS tags each request with an ID; disk uses tag to report
completion
Fall 2017 :: CSE 306

Disk Scheduling
Fall 2017 :: CSE 306

Disk Scheduling
• We saw importance of proper request ordering in
our throughput example
• 125 MBps for sequential vs. 2.5 MBps for random
workload

• Crux: Given a stream of requests, in what order


should they be served?
• Much different than CPU scheduling
• Performance dominated by seek+rotation
• Position of disk head relative to request position matters
more than job length (i.e., request size)
Fall 2017 :: CSE 306

First-Come-First-Serve (FCFS) Scheduler


• Assume seek+rotate = 10 ms for a random request
• How long (roughly) does below workload (1) below take?
• Requests are given in sector numbers
• 60 msec

• How about Workload (2)?


• 20 msec

(1) 300001, 700001, 300002, 700002, 300003, 700003


(2) 300001, 300002, 300003, 700001, 700002, 700003
• Small change in request ordering yielded 3x improvement in
disk bandwidth utilization!

Main objective in disk scheduling: to maximize bandwidth utilization


Fall 2017 :: CSE 306

How to Maximize BW?


• Given a set of requests, which one would you choose next
to maximize BW?

• Strategy: always choose the request with least positioning


time — Shortest Positioning Time First (SPTF)

• Where best to implement?


1) Disk Controller
2) OS

• In general, OS doesn’t know the disk geometry. It can


approximate SPTF by Nearest Block First (NBF)

• Disadvantage: easy for far-away requests to starve


Fall 2017 :: CSE 306

Detour: Where to Do Scheduling?


• OS:
• Positive: can know about all the OS
pending requests; a more global
view
• Negative: does not know disk Scheduler
geometry

• Disk:
• Positive : knows the geometry
• Negative: can only hold a few
requests to schedule among
Scheduler
• Reality: both — OS picks next few
“good” requests to send to disk; Disk
disk then schedules among them
Fall 2017 :: CSE 306

Tackling Starvation Problem


• Elevator algorithm (a.k.a. SCAN)
• Sweep back and forth, from one end of disk other, serving requests
as pass that track

• Variations to improve fairness:


• F-SCAN: freeze the queue of requests when doing a sweep in the
current direction
• To avoid starving requests waiting in the other direction
• C-SCAN: only sweep in one direction
• To be fair to the outer tracks; in original SCAN, middle tracks are passed
over twice as often as outer tracks

• Simple old algorithms — not used directly today


• But the idea is useful and can be part of more complex solutions
Fall 2017 :: CSE 306

Anticipatory Scheduling (1)


• Assume 2 processes each calling read() on different files
Assume OS uses something similar to C-SCAN
void reader(int fd) {
char buf[1024];
int rv;
while((rv = read(fd, buf)) != 0) {
assert(rv);
// takes short time, e.g., 1ms
process(buf, rv);
}
}

• Should OS serve P2’s read after finishing P1’s read?


Fall 2017 :: CSE 306

Anticipatory Scheduling (2)


• Let’s say disk is idle and OS receives a read request. Should
we send it immediately to disk?

• Work-preserving schedulers: yes, let’s do work if there is


work to be done

• Anticipatory schedulers: no, let’s wait for some time in case


a “better” request arrives soon

• Ideally, OS can observe each process’s behavior over time to


learn its access pattern
• And use that to decide whether to wait or not before switching to
requests from another process
Fall 2017 :: CSE 306

Completely Fair Queuing (CFQ)


• Linux’s current default scheduler

• Queue of requests for each process


• Weighted round-robin between queues, with IO
slice time proportional to priority
• Yield slice only if idle for a given time
• Emulating some sort of anticipatory algorithm
• A back-seek penalty to optimize order within queue
• Emulating some aspects of elevator algorithms
Fall 2017 :: CSE 306

Hard Disk Summary


• Storage devices provide common sector-based
interface

• On a hard disk: Never do random I/O unless you must!


• Quicksort is a terrible algorithm on disk

• It pays off to spend CPU time to do complex scheduling


on slow hard disk devices

• These scheduling algorithms were motivated by hard


disk properties; SSDs have different properties and thus
require different algorithms 
• Read the OSTEP chapter to learn about these other devices

You might also like