CA-chap6-IO System
6.1. Introduction
❑ I/O devices can be characterized by
l Behaviour: input, output, storage
l Partner: human or machine
l Data rate: bytes/sec, transfers/sec
[Figure: a typical I/O system — processor with cache and memory, plus I/O controllers for graphics output, network, and disks]
I/O System Characteristics
❑ Dependability is important
l Particularly for storage devices
❑ Performance measures
l Latency (response time)
l Throughput (bandwidth)
❑ Servers
l Mainly interested in throughput and expandability of devices
Response Time
[Figure: timeline distinguishing reaction time from the response times of two successive requests]
Dependability
❑ Service states
l Service accomplishment: service delivered as specified
l Service interruption: deviation from specified service
l A failure moves the system from accomplishment to interruption; restoration moves it back
❑ Fault: failure of a component
l May or may not lead to system failure
Dependability Measures
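The measures themselves are not spelled out on this slide; the following is a minimal sketch of the usual textbook ones — MTTF, MTTR, and availability = MTTF / (MTTF + MTTR) — which we assume is what the slide covers:

```python
# Sketch of the standard dependability measures (assumed, not listed on
# the slide): MTTF = mean time to failure, MTTR = mean time to repair,
# availability = fraction of time in the service-accomplishment state.

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is delivered as specified."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Example: a disk with MTTF = 1,000,000 hours and MTTR = 24 hours
a = availability(1_000_000, 24)
print(f"{a:.6f}")  # prints 0.999976
```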
6.2. Disk Storage
Disk Sectors and Access
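The access-time breakdown behind sector accesses — seek + rotational latency + transfer + controller overhead — can be sketched as follows (all parameter values here are invented for illustration):

```python
# Sketch of average disk access time; every number below is an assumed
# example value, not from the slides.

def avg_access_ms(seek_ms, rpm, bytes_xfer, xfer_rate_bps, ctrl_ms):
    rotation_ms = 0.5 * 60_000 / rpm              # average wait: half a rotation
    transfer_ms = 1000 * bytes_xfer / xfer_rate_bps
    return seek_ms + rotation_ms + transfer_ms + ctrl_ms

# 4 ms seek, 15,000 RPM, 4 KB transfer at 100 MB/s, 0.2 ms controller
t = avg_access_ms(4.0, 15_000, 4096, 100_000_000, 0.2)
print(f"{t:.2f} ms")  # prints 6.24 ms — seek and rotation dominate
```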
Flash Storage
❑ Nonvolatile semiconductor storage
l 100× – 1000× faster than disk
l Smaller, lower power, more robust
l But more $/GB (between disk and DRAM)
Flash Types
6.3. Interfacing with I/O System
❑ Need interconnections between
l CPU, memory, I/O controllers
Bus Types
❑ Processor-Memory buses
l Short, high speed
l Design is matched to memory organization
❑ I/O buses
l Longer, allowing multiple connections
l Specified by standards for interoperability
l Connect to processor-memory bus through a bridge
Bus Signals and Synchronization
❑ Data lines
l Carry address and data
l Multiplexed or separate
❑ Control lines
l Indicate data type, synchronize transactions
❑ Synchronous
l Uses a bus clock
❑ Asynchronous
l Uses request/acknowledge control lines for handshaking
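The asynchronous case can be sketched as a four-phase request/acknowledge exchange (an illustration of ours; the line names req/ack are generic, not from any particular bus standard):

```python
# Sketch (our illustration, not from the slides) of an asynchronous bus
# read using request/acknowledge handshaking instead of a shared clock.

class AsyncBus:
    def __init__(self):
        self.req = False   # control line driven by the requester
        self.ack = False   # control line driven by the responder
        self.data = None   # multiplexed data lines

    def read(self, device_word):
        log = []
        self.req = True                 # 1. requester raises the request line
        log.append("req=1")
        self.data = device_word         # 2. responder puts data on the bus...
        self.ack = True                 #    ...and raises acknowledge
        log.append("ack=1")
        word = self.data                # 3. requester latches the data,
        self.req = False                #    then drops request
        log.append("req=0")
        self.ack = False                # 4. responder drops ack; bus is idle
        log.append("ack=0")
        return word, log

word, log = AsyncBus().read(0x1234)
print(hex(word), log)  # 0x1234 ['req=1', 'ack=1', 'req=0', 'ack=0']
```

Each step waits on the other side's line rather than a clock edge, which is why asynchronous buses tolerate devices of very different speeds.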
I/O Bus Examples
Typical x86 PC I/O System
I/O Model
[Figure: I/O stack — applications and shell use OS kernel services; the HAL and I/O management layer issue I/O commands through the north/south bridges to I/O controller registers and devices; the controller notifies the CPU that something happened via polling or interrupt]
I/O Management
❑ I/O is mediated by the OS
l Multiple programs share I/O resources
- Need protection and scheduling
l I/O causes asynchronous interrupts
- Same mechanism as exceptions
l I/O programming is fiddly
- OS provides abstractions to programs
I/O Commands
❑ I/O devices are managed by I/O controller hardware
l Transfers data to/from device
l Synchronizes operations with software
❑ Command registers
l Cause device to do something
❑ Status registers
l Indicate what the device is doing and occurrence of
errors
❑ Data registers
l Write: transfer data to a device
l Read: transfer data from a device
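A minimal software-side model of these three register types (the class name, status codes, and disk behavior are our assumptions, not a real controller's interface):

```python
# Sketch (illustrative, names assumed) of the register interface described
# above: software drives a device through command, status, and data registers.

class DiskController:
    READY, BUSY, ERROR = 0, 1, 2

    def __init__(self):
        self.status = self.READY   # status register: what the device is doing
        self.data = 0              # data register: the transfer window
        self._storage = {}         # internal device state, invisible to software

    def command(self, op, sector):  # command register: cause the device to act
        if self.status != self.READY:
            self.status = self.ERROR   # status also reports error occurrence
            return
        self.status = self.BUSY
        if op == "read":
            self.data = self._storage.get(sector, 0)
        elif op == "write":
            self._storage[sector] = self.data
        self.status = self.READY   # device signals completion

# Driver-side usage: write sector 7, then read it back
ctrl = DiskController()
ctrl.data = 0xCAFE
ctrl.command("write", 7)
ctrl.command("read", 7)
print(hex(ctrl.data))  # prints 0xcafe
```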
I/O Register Mapping
❑ I/O instructions
l Separate instructions to access I/O registers
l Can only be executed in kernel mode
l Example: x86
Polling
❑ Periodically check I/O status register
l If device ready, do operation
l If error, take action
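The loop above can be sketched as follows (the status codes and helper names are illustrative assumptions):

```python
# Sketch of a polling loop: the CPU repeatedly reads the status register
# and acts when the device is ready, or takes action on error.

READY, BUSY, ERROR = 0, 1, 2   # assumed status-register encodings

def poll(read_status, do_operation, handle_error, max_spins=1_000_000):
    for _ in range(max_spins):
        status = read_status()        # periodically check the status register
        if status == READY:
            return do_operation()     # device ready: do the operation
        if status == ERROR:
            return handle_error()     # error: take action
        # BUSY: keep spinning — this busy-waiting is polling's main cost
    raise TimeoutError("device never became ready")

# Usage with a simulated device that is busy twice, then ready
statuses = iter([BUSY, BUSY, READY])
result = poll(lambda: next(statuses),
              do_operation=lambda: "data transferred",
              handle_error=lambda: "recovered")
print(result)  # prints data transferred
```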
Interrupts
❑ When a device is ready or error occurs
l Controller interrupts CPU
❑ Priority interrupts
l Devices needing more urgent attention get higher priority
l Can preempt the handler of a lower-priority interrupt
Interrupts: Examples
❑ Example with Asus K43SJ
l Table columns: IRQ number, number of interrupts handled per CPU core, interrupt type, device name
❑ Each CPU in the system has its own column and its own count of interrupts per IRQ.
❑ IRQ0: system timer; IRQ1 & IRQ12: keyboard & mouse.
I/O Data Transfer
❑ Two ways for software to learn when an I/O event happens:
l Polling
l Interrupt
DMA/VM Interaction
Throughput vs Response Time
[Figure: response time rises steeply as offered load approaches the system's maximum throughput]
Transaction Processing Benchmarks
❑ Transactions
l Small data accesses to a DBMS
l Interested in I/O rate, not data rate
❑ Measure throughput
l Subject to response time limits and failure handling
l ACID (Atomicity, Consistency, Isolation, Durability)
l Overall cost per transaction
File System & Web Benchmarks
❑ SPEC System File System (SFS)
l Synthetic workload for NFS server, based on monitoring real
systems
l Results
- Throughput (operations/sec)
- Response time (average ms/operation)
I/O vs. CPU Performance
❑ Amdahl’s Law
l Don’t neglect I/O performance as parallelism increases compute
performance
❑ Example
l Benchmark takes 90s CPU time, 10s I/O time
l Double the number of CPUs every 2 years
- I/O time unchanged
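Working the example through (the 90 s + 10 s split and the doubling schedule are from the slide; the six-year horizon is our choice):

```python
# Worked version of the slide's example: 90 s CPU time + 10 s I/O time,
# CPU count doubling every 2 years while I/O time stays constant.

cpu, io = 90.0, 10.0
total0 = cpu + io
for years in (0, 2, 4, 6):
    cpu_t = cpu / (2 ** (years // 2))    # CPU time halves with each doubling
    total = cpu_t + io
    speedup = total0 / total
    io_frac = io / total
    print(f"year {years}: total {total:5.2f}s, "
          f"speedup {speedup:4.2f}x, I/O is {io_frac:.0%} of runtime")
```

After six years the CPU part takes 11.25 s, the overall speedup is only about 4.7× (not 8×), and I/O has grown from 10% to nearly half of the runtime — Amdahl's Law applied to I/O.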
Amdahl and Gustafson’s Laws
❑ Amdahl’s Law: The speed up achieved through
parallelization of a program is limited by the
percentage of its workload that is inherently
serial.
Speedup(N) = 1 / (S + (1 − S)/N) < 1/S
N: number of processors; S: serial (non-parallelizable) fraction of the workload
Exercise
RAID
❑ Redundant Array of Inexpensive (Independent) Disks
l Use multiple smaller disks (c.f. one large disk)
l Parallelism improves performance
l Plus extra disk(s) for redundant data storage
RAID 1 & 0+1
❑ RAID 1: Mirroring
l N + N disks, replicate data
- Write data to both data disk and mirror disk
- On disk failure, read from mirror
RAID 2: Bit-Striped with ECC
RAID 3: Bit-Interleaved Parity
❑N + 1 disks
l Data striped across N disks at byte level
l Redundant disk stores parity (dedicated parity disk)
l Read access: Read all disks
l Write access: Generate new parity and update all disks
l On failure: Use parity to reconstruct missing data
RAID 4: Block-Interleaved Parity
❑ N + 1 disks
l Data striped across N disks at block level (16, 32, 64, or 128 KB)
l Redundant disk stores parity for a group of blocks
l Read access
- Read only the disk holding the required block
l Write access
- Just read disk containing modified block, and parity disk
- Calculate new parity, update data disk and parity disk
l On failure
- Use parity to reconstruct missing data
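The parity arithmetic behind block-interleaved RAID is plain XOR; a small sketch with invented block contents:

```python
# Sketch of RAID 4/5 block parity (block contents invented for illustration).
# The parity block is the XOR of the data blocks, so any single lost block
# can be rebuilt by XOR-ing the surviving blocks with the parity.

from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three data disks
parity = xor_blocks(data)            # block on the parity disk

# Disk 1 fails: rebuild its block from the survivors plus parity
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Small write: only the old data block, new data block, and old parity
# are needed (the RAID 4 "just read modified block and parity disk" rule)
new_block = b"DDDD"
new_parity = xor_blocks([parity, data[1], new_block])
data[1] = new_block
assert new_parity == xor_blocks(data)
```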
RAID 3 vs RAID 4
[Figure: RAID 3 stripes data at the byte level, RAID 4 at the block level]
RAID 5: Distributed Parity
❑ N + 1 disks
l Like RAID 4, but parity blocks distributed across disks
- Avoids parity disk being a bottleneck
❑ Widely used
RAID 6: P + Q Redundancy
❑ N + 2 disks
l Like RAID 5, but two lots of parity
l Greater fault tolerance through more redundancy
❑ Multiple RAID
l More advanced systems give similar fault tolerance with better
performance
RAID Summary
❑ RAID can improve performance and availability
l High availability requires hot swapping
I/O System Design
❑ Maximizing throughput
l Find “weakest link” (lowest-bandwidth component)
l Configure to operate at its maximum bandwidth
l Balance remaining components in the system
Server Computers
[Figure: rack units — a 1U server is 1.75" tall in a 19"-wide rack; a 2U server is twice that]
Rack-Mounted Servers
Sun Fire x4150 1U Server
[Figure: block diagram — processors with 4 cores each, north and south bridges, and 16 × 4 GB = 64 GB of DRAM]
I/O System Design Example
❑ Sequential reads
l 112MB/s / 64KB = 1750 IOs/sec per disk
l 14,000 ops/sec for 8 disks
Design Example (cont)
❑ PCI-E I/O rate
l 2GB/sec / 64KB = 31,250 IOs/sec
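Putting the two design-example slides together, the "weakest link" comparison is simple division (decimal units assumed, which matches the slides' 1750 and 31,250 figures):

```python
# Sketch of the bottleneck arithmetic from the design example, using
# decimal units (1 KB = 1000 B) as the slides' numbers imply.

io_size = 64_000                      # 64 KB per sequential read

disk_ios = 112_000_000 // io_size     # one disk at 112 MB/s
total_disk_ios = 8 * disk_ios         # eight disks in the system
pcie_ios = 2_000_000_000 // io_size   # PCI-E link at 2 GB/s

# The disks (14,000 IOs/sec), not the PCI-E link (31,250 IOs/sec),
# are the weakest link in this configuration.
print(disk_ios, total_disk_ios, pcie_ios)  # prints 1750 14000 31250
```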
Pitfall: Offloading to I/O Processors
Pitfall: Peak Performance
❑ Peak I/O rates are nearly impossible to achieve
l Usually, some other system component limits performance
l E.g., transfers to memory over a bus
- Collision with DRAM refresh
- Arbitration contention with other bus masters
l E.g., PCI bus: peak bandwidth ~133 MB/sec
- In practice, max 80MB/sec sustainable
Concluding Remarks
❑ I/O performance measures
l Throughput, response time
l Dependability and cost also important
❑ I/O benchmarks
l TPC, SPECSFS, SPECWeb
❑ RAID
l Improves performance and dependability