
Chapter 6: IO System

[with materials from Computer Organization and Design, 4th Edition,


Patterson & Hennessy, © 2008, MK]

1
6.1. Introduction
❑ I/O devices can be characterized by
l Behaviour: input, output, storage
l Partner: human or machine
l Data rate: bytes/sec, transfers/sec

❑ I/O bus connections


[Figure: processor (with cache) and main memory connect through a memory–I/O interconnect; I/O controllers for graphics output, network, and disks attach to the interconnect; devices signal the processor via interrupts]
2
I/O System Characteristics

❑ Dependability is important
l Particularly for storage devices

❑ Performance measures
l Latency (response time)
l Throughput (bandwidth)

❑ Desktops & embedded systems
l Mainly interested in: response time & diversity of devices
❑ Servers
l Mainly interested in: throughput & expandability of devices

3
Response Time

[Figure: timeline — user starts request, user finishes request, system starts execution, system starts response, system finishes response; reaction time, response time 1 (measured to the start of the response) and response time 2 (measured to the end of the response) are marked]

◼ Can have two measures of response time


◼ Both are acceptable, but measure 2 is preferred if execution takes long

4
Dependability

❑ Two service states
l Service accomplishment: service delivered as specified
l Service interruption: deviation from specified service
l Failure moves the system from accomplishment to interruption; restoration moves it back
❑ Fault: failure of a component
l May or may not lead to system failure

5
Dependability Measures

◼ Reliability: mean time to failure (MTTF)
◼ Service interruption: mean time to repair (MTTR)
◼ Mean time between failures: MTBF = MTTF + MTTR
◼ Availability = MTTF / (MTTF + MTTR)
❑ Improving Availability
l Increase MTTF: fault avoidance, fault tolerance, fault forecasting
l Reduce MTTR: improved tools and processes for diagnosis and repair
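A quick numeric check of these definitions (a minimal sketch in Python; the MTTF and MTTR figures below are illustrative, not from the slides):

# Availability from MTTF and MTTR (illustrative numbers)
mttf_hours = 1_000_000        # assumed mean time to failure
mttr_hours = 24               # assumed mean time to repair
mtbf_hours = mttf_hours + mttr_hours
availability = mttf_hours / (mttf_hours + mttr_hours)
print(mtbf_hours, round(availability, 6))   # 1000024 0.999976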

6
6.2. Disk Storage

❑ Nonvolatile, rotating magnetic storage

7
Disk Sectors and Access

❑ Each sector records


l Sector ID
l Data (512 bytes, 4096 bytes proposed)
l Error correcting code (ECC)
- Used to hide defects and recording errors
l Synchronization fields and gaps

❑ Access to a sector involves


l Queuing delay if other accesses are pending
l Seek: move the heads
l Rotational latency
l Data transfer
l Controller overhead
8
Disk Access Example

❑ Given
l 512B sector, 15,000rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk
❑ Average read time
l 4ms seek time
+ ½ rotation / (15,000/60 rotations per second) = 2ms rotational latency
+ 512B / 100MB/s = 0.005ms transfer time
+ 0.2ms controller delay
= 6.2ms
❑ If actual average seek time is 1ms
l Average read time = 3.2ms
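The same arithmetic as a small Python script (a sketch; it simply reproduces the numbers above):

# Average read time for one 512B sector (numbers from this slide)
seek_ms       = 4.0                        # average seek time
rotation_ms   = 0.5 * 60_000 / 15_000      # half a rotation at 15,000 rpm = 2 ms
transfer_ms   = 512 / 100e6 * 1000         # 512 B at 100 MB/s ≈ 0.005 ms
controller_ms = 0.2
print(seek_ms + rotation_ms + transfer_ms + controller_ms)   # ≈ 6.2 ms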
9
Disk Performance Issues

❑ Manufacturers quote average seek time


l Based on all possible seeks
l Locality and OS scheduling lead to smaller actual
average seek times (25%~33%)

❑ Smart disk controllers allocate physical sectors on disk
l Present a logical sector interface to the host
l SCSI, ATA, SATA
❑ Disk drives include caches
l Prefetch sectors in anticipation of access
l Avoid seek and rotational delay

10
Flash Storage
❑ Nonvolatile semiconductor storage
l 100× – 1000× faster than disk
l Smaller, lower power, more robust
l But more $/GB (between disk and DRAM)

11
Flash Types

❑ NOR flash: bit cell like a NOR gate


l Random read/write access
l Used for instruction memory in embedded systems

❑ NAND flash: bit cell like a NAND gate


l Denser (bits/area), but block-at-a-time access
l Cheaper per GB
l Used for USB keys, media storage, …

❑ Flash bits wear out after 1000s of accesses


l Not suitable for direct RAM or disk replacement
l Wear leveling: remap data to less used blocks
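A toy illustration of the remapping idea (purely hypothetical Python; real flash translation layers track erase counts per block and are far more elaborate):

# Toy wear leveling: place each write on the least-worn physical block
erase_counts = {0: 120, 1: 35, 2: 980, 3: 34}   # made-up erases per physical block
logical_to_physical = {}

def write_block(logical_block):
    physical = min(erase_counts, key=erase_counts.get)   # least-used block
    erase_counts[physical] += 1
    logical_to_physical[logical_block] = physical
    return physical

print(write_block(7))   # -> 3, the block with the fewest erases so far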

12
6.3. Interfacing with I/O System
❑ Need interconnections between
l CPU, memory, I/O controllers

❑ Bus: shared communication channel


l Parallel set of wires for data and synchronization of data transfer
l Can become a bottleneck

❑ Performance limited by physical factors


l Wire length, number of connections

❑ More recent alternative: high-speed serial connections with switches
l Like networks

13
Bus Types
❑ Processor-Memory buses
l Short, high speed
l Design is matched to memory organization

❑ I/O buses
l Longer, allowing multiple connections
l Specified by standards for interoperability
l Connect to processor-memory bus through a bridge

14
Bus Signals and Synchronization
❑ Data lines
l Carry address and data
l Multiplexed or separate

❑ Control lines
l Indicate data type, synchronize transactions

❑ Synchronous
l Uses a bus clock

❑ Asynchronous
l Uses request/acknowledge control lines for handshaking

15
I/O Bus Examples

❑ Firewire: external; 63 devices per channel; data width 4; peak bandwidth 50MB/s or 100MB/s; hot pluggable; max length 4.5m; standard IEEE 1394
❑ USB 2.0: external; 127 devices per channel; data width 2; peak bandwidth 0.2MB/s, 1.5MB/s, or 60MB/s; hot pluggable; max length 5m; standard USB Implementers Forum
❑ PCI Express: internal; 1 device per channel; data width 2 per lane; peak bandwidth 250MB/s per lane (1×, 2×, 4×, 8×, 16×, 32×); hot pluggability depends; max length 0.5m; standard PCI-SIG
❑ Serial ATA: internal; 1 device per channel; data width 4; peak bandwidth 300MB/s; hot pluggable; max length 1m; standard SATA-IO
❑ Serial Attached SCSI: external; 4 devices per channel; data width 4; peak bandwidth 300MB/s; hot pluggable; max length 8m; standard INCITS TC T10

16
Typical x86 PC I/O System

17
IO Model

[Figure: layered I/O model — applications and shell use OS services (kernel, HAL) for I/O management; the CPU reaches I/O controllers and their registers through the north/south bridges; I/O commands go from the CPU to the controllers, which drive the I/O devices; "something happened" events come back to the CPU via polling or interrupts]
18
I/O Management
❑ I/O is mediated by the OS
l Multiple programs share I/O resources
- Need protection and scheduling
l I/O causes asynchronous interrupts
- Same mechanism as exceptions
l I/O programming is fiddly
- OS provides abstractions to programs

19
I/O Commands

❑ I/O devices are managed by I/O controller hardware
l Transfers data to/from device
l Synchronizes operations with software

❑ Command registers
l Cause device to do something

❑ Status registers
l Indicate what the device is doing and occurrence of
errors

❑ Data registers
l Write: transfer data to a device
l Read: transfer data from a device
20
I/O Register Mapping

❑ Memory-mapped I/O
l Registers are addressed in the same space as memory
l Address decoder distinguishes between them
l OS uses the address translation mechanism to make them accessible only to the kernel

❑ I/O instructions
l Separate instructions to access I/O registers
l Can only be executed in kernel mode
l Example: x86
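A rough sketch of what memory-mapped register access looks like from software, assuming a Linux system, root privileges, and a purely hypothetical device register block at physical address 0xFE200000 (both the base address and the offset are made up):

# Read a (hypothetical) 32-bit status register through /dev/mem
import mmap, os, struct

REG_BASE   = 0xFE200000    # hypothetical, page-aligned physical base address
STATUS_OFF = 0x04          # hypothetical status-register offset

fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
regs = mmap.mmap(fd, 4096, mmap.MAP_SHARED, mmap.PROT_READ, offset=REG_BASE)
status = struct.unpack_from("<I", regs, STATUS_OFF)[0]   # little-endian 32-bit load
print(hex(status))
regs.close()
os.close(fd)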
21
Polling
❑ Periodically check I/O status register
l If device ready, do operation
l If error, take action

❑ Common in small or low-performance real-time


embedded systems
l Predictable timing
l Low hardware cost

❑ In other systems, wastes CPU time
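A minimal polling-loop sketch in Python; read_status, do_transfer and handle_error are hypothetical helpers standing in for real register accesses:

READY_BIT = 0x1
ERROR_BIT = 0x2

def poll_device(read_status, do_transfer, handle_error):
    while True:
        status = read_status()        # check the I/O status register
        if status & ERROR_BIT:
            handle_error(status)      # error: take action
        elif status & READY_BIT:
            do_transfer()             # device ready: do the operation
        # otherwise the device is busy; keep spinning (this is the wasted CPU time)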

22
Interrupts
❑ When a device is ready or error occurs
l Controller interrupts CPU

❑ Interrupt is like an exception


l But not synchronized to instruction execution
l Can invoke handler between instructions
l Cause information often identifies the interrupting device

❑ Priority interrupts
l Devices needing more urgent attention get higher priority
l Can interrupt the handler of a lower-priority interrupt

23
Interrupts: Examples

[Figure: interrupt table — columns give the IRQ number, the number of interrupts handled by each CPU core, the interrupt type, and the device name]
❑ Example from an Asus K43SJ laptop
❑ Each CPU in the system has its own column and its own number of interrupts per IRQ
❑ IRQ0: system timer; IRQ1 & IRQ12: keyboard & mouse
24
I/O Data Transfer

❑ When it happens: polling or interrupt
❑ How to transfer: mem/io → CPU → mem/io, or DMA (interacting with cache/virtual memory)

❑ Polling and interrupt-driven I/O


l CPU transfers data between memory and
I/O data registers
l Time consuming for high-speed devices

❑ Direct memory access (DMA)


l OS provides starting address in memory
l I/O controller transfers to/from memory
autonomously
l Controller interrupts on completion or error
25
DMA/Cache Interaction

❑ If DMA writes to a memory block that is cached


l Cached copy becomes stale

❑ If a write-back cache holds a dirty block and DMA reads that memory block
l DMA reads stale data

❑ Need to ensure cache coherence


l Flush blocks from cache if they will be used for DMA
l Or use non-cacheable memory locations for I/O


26
DMA/VM Interaction

❑ OS uses virtual addresses for memory


l DMA blocks may not be contiguous in
physical memory

❑ Should DMA use virtual addresses?


l Would require controller to do translation

❑ If DMA uses physical addresses


l May need to break transfers into page-sized chunks
l Or chain multiple transfers
l Or allocate contiguous physical pages for
DMA
27
6.4. Measuring I/O Performance
❑ I/O performance depends on

l Hardware: CPU, memory, controllers, buses
l Software: OS, DBMS, application
l Workload: request rates, patterns

❑ I/O system design can trade off between response time and throughput
l Measurements of throughput are often done with a constrained response time

28
Throughput vs Response Time

◼ Throughput increases as load increases, up to a point
◼ Nominal capacity is the ideal rate (e.g., 10 Mbps)
◼ Usable capacity is the rate actually achievable (e.g., 9.8 Mbps)
◼ The knee is where response time rises rapidly for a small increase in throughput
[Figure: throughput vs. load curve levelling off below nominal capacity, with knee capacity and usable capacity marked; response time vs. load curve rising sharply past the knee]

29
Transaction Processing Benchmarks
❑ Transactions
l Small data accesses to a DBMS
l Interested in I/O rate, not data rate

❑ Measure throughput
l Subject to response time limits and failure handling
l ACID (Atomicity, Consistency, Isolation, Durability)
l Overall cost per transaction

❑ Transaction Processing Council (TPC) benchmarks (www.tpc.org)
l TPC-APP: B2B application server and web services
l TPC-C: on-line order entry environment
l TPC-E: on-line transaction processing for a brokerage firm
l TPC-H: decision support — business oriented ad-hoc queries

30
File System & Web Benchmarks
❑ SPEC System File System (SFS)
l Synthetic workload for NFS server, based on monitoring real
systems
l Results
- Throughput (operations/sec)
- Response time (average ms/operation)

❑ SPEC Web Server benchmark


l Measures simultaneous user sessions, subject to required
throughput/session
l Three workloads: Banking, Ecommerce, and Support

31
I/O vs. CPU Performance

❑ Amdahl’s Law
l Don’t neglect I/O performance as parallelism increases compute
performance

❑ Example
l Benchmark takes 90s CPU time, 10s I/O time
l The number of CPUs doubles every 2 years
- I/O unchanged

Year CPU time I/O time Elapsed time % I/O time


now 90s 10s 100s 10%
+2 45s 10s 55s 18%
+4 23s 10s 33s 31%
+6 11s 10s 21s 47%
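A small Python script that reproduces the table (a sketch; CPU time is halved every 2 years while I/O time stays fixed, and the slide rounds CPU seconds to whole numbers):

cpu, io = 90.0, 10.0
for year in (0, 2, 4, 6):
    elapsed = cpu + io
    print(f"+{year}y: CPU {cpu:.1f}s, I/O {io:.0f}s, "
          f"elapsed {elapsed:.1f}s, I/O share {io / elapsed:.0%}")
    cpu /= 2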

32
Amdahl and Gustafson’s Laws
❑ Amdahl’s Law: The speed up achieved through
parallelization of a program is limited by the
percentage of its workload that is inherently
serial.
Speedup(N) = 1 / (S + (1-S)/N) < 1/S
N: number of processors, S: serial (non-parallelizable) fraction

❑ Gustafson’s Law: With increasing data size, the speedup obtained through parallelization increases, because the parallel work increases with data size.
Speedup(N) = N – S(N-1) = N(1-S) + S
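Both laws as one-line Python functions (a minimal sketch of the formulas above):

def amdahl_speedup(n, s):
    # s = serial fraction; the result is bounded above by 1/s
    return 1 / (s + (1 - s) / n)

def gustafson_speedup(n, s):
    # serial fraction stays fixed while the parallel work grows with data size
    return n - s * (n - 1)

print(amdahl_speedup(16, 0.1))     # ≈ 6.4
print(gustafson_speedup(16, 0.1))  # 14.5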

33
❑ AMDAHL'S AND GUSTAFSON'S LAWS
l Amdahl's Law: the speedup of a program that uses multiple processors for parallel computation is limited by the fraction of the program that must be computed serially (under the assumption that the problem size stays fixed when the computation is parallelized)
l Gustafson's Law: as the volume of data to be processed increases, the speedup obtained from parallelization grows, because the amount of work that can be computed in parallel also grows with the data

34
Exercise

❑ Assume a machine is accelerated by adding a vector mode. When a computation runs in vector mode, it is 20 times faster than the normal mode of execution.
However, the program cannot be fully vectorized, and the overall speedup achieved for this program is only 2. What percentage of the program cannot run in vector mode?
Answer:
❑ Speedup(N) = 1 / (S + (1-S)/N)
2 = 1 / (S + (1-S)/20) = 20 / (19S + 1)
→ S = 47.4%
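The same answer obtained by rearranging Amdahl's formula in Python (a sketch):

# Solve 2 = 1 / (S + (1 - S)/20) for S
n, target = 20, 2.0
s = (n / target - 1) / (n - 1)   # closed-form rearrangement
print(s)                         # ≈ 0.474, i.e. about 47.4% cannot be vectorized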

35
RAID
❑ Redundant Array of Inexpensive (Independent) Disks
l Use multiple smaller disks (c.f. one large disk)
l Parallelism improves performance
l Plus extra disk(s) for redundant data storage

❑ Provides fault tolerant storage system


l Especially if failed disks can be “hot swapped”
❑ RAID 0: striping
l No redundancy (“AID”?)
- Just stripe data over multiple disks
l But it does improve performance

36
RAID 1 & 0+1

❑ RAID 1: Mirroring
l N + N disks, replicate data
- Write data to both data disk and mirror disk
- On disk failure, read from mirror

❑ RAID 0+1: Striping + Mirroring
[Figure: RAID 1 and RAID 0+1 disk layouts]

37
RAID 2, bit striped

❑ RAID 2: Error correcting code (ECC)


l N + E disks (e.g., 10 + 4)

l Split data at bit level across N disks


l Generate E-bit ECC
l Too complex, not used in practice

38
RAID 3: Bit-Interleaved Parity

❑N + 1 disks
l Data striped across N disks at byte level
l Redundant disk stores parity (dedicated parity disk)
l Read access: Read all disks
l Write access: Generate new parity and update all
disks
l On failure: Use parity to reconstruct missing data

❑ Not widely used

39
RAID 4: Block-Interleaved Parity
❑ N + 1 disks
l Data striped across N disks at block level (16, 32, 64,128 kB)
l Redundant disk stores parity for a group of blocks
l Read access
- Read only the disk holding the required block
l Write access
- Just read disk containing modified block, and parity disk
- Calculate new parity, update data disk and parity disk
l On failure
- Use parity to reconstruct missing data

❑ Not widely used
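A tiny Python illustration of how XOR parity lets one lost block be rebuilt (the 4-byte blocks are made up):

# Parity = XOR of the data blocks; any single missing block can be recovered
blocks = [b"\x11\x22\x33\x44", b"\xaa\xbb\xcc\xdd", b"\x01\x02\x03\x04"]

def xor_blocks(bs):
    out = bytearray(len(bs[0]))
    for b in bs:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

parity  = xor_blocks(blocks)                           # stored on the parity disk
rebuilt = xor_blocks([blocks[0], blocks[2], parity])   # pretend disk 1 failed
assert rebuilt == blocks[1]                            # the lost block is recovered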

40
RAID 3 vs RAID 4

l RAID 3 (byte-interleaved): read 3 disks to get 3 bytes, then create the parity byte
l RAID 4 (block-interleaved): read 2 disks to get 2 blocks (including the parity block), then create the new parity block

41
RAID 5: Distributed Parity

❑ N + 1 disks
l Like RAID 4, but parity blocks distributed across disks
- Avoids parity disk being a bottleneck

❑ Widely used

42
RAID 6: P + Q Redundancy
❑ N + 2 disks
l Like RAID 5, but two lots of parity
l Greater fault tolerance through more redundancy

❑ Multiple RAID
l More advanced systems give similar fault tolerance with better
performance

43
RAID Summary
❑ RAID can improve performance and availability
l High availability requires hot swapping

❑ Assumes independent disk failures


l Too bad if the building burns down!

❑ See “Hard Disk Performance, Quality and Reliability”


l http://www.pcguide.com/ref/hdd/perf/index.htm

44
I/O System Design

❑ Satisfying latency requirements


l For time-critical operations
l If system is unloaded
- Add up latency of components

❑ Maximizing throughput
l Find “weakest link” (lowest-bandwidth component)
l Configure to operate at its maximum bandwidth
l Balance remaining components in the system

❑ If system is loaded, simple analysis is insufficient


l Need to use queuing models or simulation

45
Server Computers

❑ Applications are increasingly run on servers


l Web search, office apps, virtual worlds, cloud…

❑ Requires large data centre servers


l Multiple processors, networks connections, massive
storage
l Space and power constraints

❑ Server equipment is built for 19” racks
l Heights are multiples of 1.75” (1U)
[Figure: rack dimensions — 19” width; 1U = 1.75”; a 2U unit is twice as tall]
46
Rack-Mounted Servers

Sun Fire x4150 1U server

47
Sun Fire x4150 1U server

[Figure: Sun Fire x4150 block diagram — two processors with 4 cores each, north bridge and south bridge, 16 × 4GB = 64GB DRAM]
48
I/O System Design Example

❑ Given a Sun Fire x4150 system with


l Workload: 64KB disk reads
- Each I/O op requires 200,000 user-code instructions and
100,000 OS instructions
l Each CPU: 10^9 instructions/sec
l FSB: 10.6 GB/sec peak
l DRAM DDR2 667MHz: 5.336 GB/sec
l PCI-E 8× bus: 8 × 250MB/sec = 2GB/sec
l Disks: 15,000 rpm, 2.9ms avg. seek time, 112MB/sec
transfer rate

❑ What I/O rate can be sustained?


l For random reads, and for sequential reads
49
Design Example (cont)

❑ I/O rate for CPUs


l Per core: 10^9 / (100,000 + 200,000) = 3,333 IOs/sec
l 8 cores: 3,333 × 8 = 26,667 IOs/sec

❑ Random reads, I/O rate for disks
l Assume actual seek time is average/4
l Time/op = seek + rotational latency + transfer (+ controller time ≈ 0)
= 2.9ms/4 + 4ms/2 + 64KB/(112MB/s) = 3.3ms per I/O
l 1000/3.3 = 303 IOs/sec per disk; 2,424 IOs/sec for 8 disks

❑ Sequential reads
l 112MB/s / 64KB = 1,750 IOs/sec per disk
l 14,000 IOs/sec for 8 disks
50
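These disk figures as a quick Python calculation (a sketch; 64KB is treated as 64,000 bytes to match the slide's arithmetic):

# Random-read I/O rate per disk
seek_ms     = 2.9 / 4                  # assume actual seek = quoted average / 4
rotate_ms   = (60_000 / 15_000) / 2    # half a rotation at 15,000 rpm = 2 ms
transfer_ms = 64_000 / 112e6 * 1000    # 64 KB at 112 MB/s
per_io_ms   = seek_ms + rotate_ms + transfer_ms
print(1000 / per_io_ms)        # ≈ 303 random IOs/sec per disk
print(8 * 1000 / per_io_ms)    # ≈ 2,400 random IOs/sec for 8 disks
print(112e6 / 64_000)          # 1,750 sequential IOs/sec per disk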
Design Example (cont)
❑ PCI-E I/O rate
l 2GB/sec / 64KB = 31,250 IOs/sec

❑ DRAM I/O rate


l 5.336 GB/sec / 64KB = 83,375 IOs/sec

❑ FSB I/O rate


l Assume we can sustain half the peak rate
l 10.6 GB/sec /2 / 64KB = 81,540 IOs/sec per FSB
l 163,080 IOs/sec for 2 FSBs (2 Intel Xeon)
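And the bus and memory limits, computed the same way (a sketch; the FSB figure differs slightly from the slide's 81,540 because of rounding and GB-unit conventions):

io_size = 64_000                  # bytes per I/O, matching the slide's arithmetic
print(2e9 / io_size)              # PCI-E: 31,250 IOs/sec
print(5.336e9 / io_size)          # DRAM: ≈ 83,375 IOs/sec
print(10.6e9 / 2 / io_size)       # per FSB at half peak: ≈ 82,800 IOs/sec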

❑ Weakest link: disks
l 2,424 IOs/sec random, 14,000 IOs/sec sequential
l Other components have ample headroom to accommodate these rates
51
Fallacy: Disk Dependability

❑ If a disk manufacturer quotes MTTF as 1,200,000 hours (≈140 years)
l A disk will work that long

❑ Wrong: this is the mean time to failure


l What is the distribution of failures?
l What if you have 1000 disks
- How many will fail per year?

Failed disks per year = (1000 disks × (24 × 365) hrs/disk) / (1,200,000 hrs/failure) = 7.3
Annual Failure Rate (AFR) = 7.3 / 1000 = 0.73%
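The same failure-rate arithmetic in Python (a sketch):

disks      = 1000
hours_year = 24 * 365
mttf_hours = 1_200_000
failed_per_year = disks * hours_year / mttf_hours
print(failed_per_year)                    # 7.3 disks per year
print(f"{failed_per_year / disks:.2%}")   # 0.73% annual failure rate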
52
Fallacies

❑ Disk failure rates are as specified


l Studies of failure rates in the field
- Schroeder and Gibson: 2% to 4% vs. 0.6% to 0.8%
- Pinheiro, et al.: 1.7% (first year) to 8.6% (third year) vs. 1.5%
l Why?

❑ A 1GB/s interconnect transfers 1GB in one second
l But what's a GB?
l For bandwidth, use 1GB = 10^9 B
l For storage, use 1GB = 2^30 B = 1.074×10^9 B
l So 1GB/sec delivers only 0.93GB of storage in one second
- About 7% error
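The 7% figure, checked in Python (a sketch):

gb_bandwidth = 10**9      # GB as used for bandwidth
gb_storage   = 2**30      # GB as used for storage
print(gb_bandwidth / gb_storage)       # ≈ 0.93 storage-GB transferred per second
print(1 - gb_bandwidth / gb_storage)   # ≈ 0.07, i.e. about 7% error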

53
Pitfall: Offloading to I/O Processors

❑ Overhead of managing an I/O processor request may dominate
l Quicker to do small operation on
the CPU
l But I/O architecture may prevent
that

❑ I/O processor may be slower


l Since it’s supposed to be simpler

❑ Making it faster makes it into a major system component
l Might need its own coprocessors!
54
Pitfall: Backing Up to Tape
❑ Magnetic tape used to have advantages
l Removable, high capacity

❑ Advantages eroded by disk technology developments


❑ Makes better sense to replicate data
l E.g, RAID, remote mirroring

[Figure: IBM System Storage TS1130 tape drive]

55
Fallacy: Disk Scheduling

❑ Best to let the OS schedule disk accesses


l But modern drives deal with Logical Block Addresses
- Map to physical track, cylinder, sector locations
- Also, blocks are cached by the drive
l OS is unaware of physical locations
- Reordering can reduce performance
- Depending on placement and caching

56
Example: Disk Management

❑ Disk size = <sector count> × <sector size>
= 976,773,168 × 512 = 500,107,862,016 bytes ≈ 465GB
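The same calculation in Python (a sketch):

sectors     = 976_773_168
sector_size = 512
size_bytes  = sectors * sector_size
print(size_bytes)            # 500107862016 bytes
print(size_bytes / 2**30)    # ≈ 465.8 GB (storage GB, i.e. GiB)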

57
Pitfall: Peak Performance
❑ Peak I/O rates are nearly impossible to achieve
l Usually, some other system component limits performance
l E.g., transfers to memory over a bus
- Collision with DRAM refresh
- Arbitration contention with other bus masters
l E.g., PCI bus: peak bandwidth ~133 MB/sec
- In practice, max 80MB/sec sustainable

58
Concluding Remarks
❑ I/O performance measures
l Throughput, response time
l Dependability and cost also important

❑ Buses used to connect CPU, memory, I/O controllers
l Polling, interrupts, DMA

❑ I/O benchmarks
l TPC, SPECSFS, SPECWeb

❑ RAID
l Improves performance and dependability

59
