ECE/CS 250

Computer Architecture

Summer 2021

I/O

Tyler Bletsch
Duke University

Includes material adapted from Dan Sorin (Duke) and Amir Roth (Penn).
SSD material from Andrew Bondi (Colorado State).
Where We Are in This Course Right Now

• So far:
• We know how to design a processor that can fetch, decode, and
execute the instructions in an ISA
• We understand how to design caches and memory
• Now:
• We learn about the lowest level of storage (disks)
• We learn about input/output in general
• Next:
• Faster processor cores
• Multicore processors

2
This Unit: I/O

[Course stack diagram: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors — this unit covers I/O]

• I/O system structure
  • Devices, controllers, and buses
• Device characteristics
  • Disks: HDD and SSD
• I/O control
  • Polling and interrupts
  • DMA

3
Readings

• Patterson and Hennessy dropped the ball on this topic


• It used to be covered in depth (in previous editions)
• Now it’s sort of in Appendix A.8

4
Computers Interact with Outside World

• Input/output (I/O)
• Otherwise, how will we ever tell a computer what to do…
• …or exploit the results of its work?
• Computers without I/O are not useful
• ICQ: What kinds of I/O do computers have?

5
One Instance of I/O

• Have briefly seen one instance of I/O


• Disk: bottom of memory hierarchy
  • Holds whatever can’t fit in memory
  • ICQ: What else do disks hold?

[Diagram: memory hierarchy — CPU, I$ and D$, L2, Main Memory, Disk (swap)]

6
A More General/Realistic I/O System

• A computer system
• CPU, including cache(s)
• Memory (DRAM)
• I/O peripherals: disks, input devices, displays, network cards, ...
• With built-in or separate I/O (or DMA) controllers
• All connected by a system bus

[Diagram: CPU ($) and Main Memory connected by the “System” (memory-I/O) bus; Disk and NIC attach through DMA controllers, keyboard and display through an I/O controller — DMA is defined later]
7
Bus Design
[Diagram: a shared bus made up of data lines, address lines, and control lines]
• Goals
• High Performance: low latency and high bandwidth
• Standardization: flexibility in dealing with many devices
• Low Cost
• Processor-memory bus emphasizes performance, then cost
• I/O & backplane emphasize standardization, then performance
• Design issues
1. Width/multiplexing: are wires shared or separate?
2. Clocking: is bus clocked or not?
3. Switching: how/when is bus control acquired and released?
4. Arbitration: how do we decide who gets the bus next?

8
Standard Bus Examples

                   PCI               SCSI             USB
Type               Backplane         I/O              I/O
Width              32–64 bits        8–32 bits        1 bit
Multiplexed?       Yes               Yes              Yes
Clocking           33 (66) MHz       5 (10) MHz       Asynchronous
Data rate          133 (266) MB/s    10 (20) MB/s     0.2, 1.5, 60 MB/s
Arbitration        Distributed       Daisy chain      Weird
Maximum masters    1024              7–31             127
Maximum length     0.5 m             2.5 m            –

• USB (universal serial bus)


• Popular for low/moderate bandwidth external peripherals
+ Packetized interface (like TCP), extremely flexible
+ Also supplies power to the peripheral

9
This Unit: I/O

[Course stack diagram: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors — this unit covers I/O]

• I/O system structure
  • Devices, controllers, and buses
• Device characteristics
  • Disks: HDD and SSD
• I/O control
  • Polling and interrupts
  • DMA

10
Operating System (OS) Plays a Big Role

• I/O interface is typically under OS control


• User applications access I/O devices indirectly (e.g., SYSCALL)
• Why?
• Device drivers are “programs” that OS uses to manage devices
• Virtualization: same argument as for memory
• Physical devices shared among multiple programs
• Direct access could lead to conflicts – example?
• Synchronization
• Most have asynchronous interfaces, require unbounded waiting
• OS handles asynchrony internally, presents synchronous interface
• Standardization
• Devices of a certain type (disks) can/will have different interfaces
• OS handles differences (via drivers), presents uniform interface

11
I/O Device Characteristics

• Primary characteristic
• Data rate (aka bandwidth)
• Contributing factors
• Partner: humans have slower output data rates than machines
• Input or output or both (input/output)

Device         Partner    Input/Output    Data Rate (KB/s)
Keyboard       Human      Input           0.01
Mouse          Human      Input           0.02
Speaker        Human      Output          0.60
Printer        Human      Output          200
Display        Human      Output          240,000
Modem (old)    Machine    I/O             7
Ethernet       Machine    I/O             ~1,000,000
Disk           Machine    I/O             ~50,000

12
I/O Device: Disk

[Diagram: stack of platters with read/write heads; labels mark head, platter, track, and sector]

• Disk: like a stack of record players
  • Collection of platters
  • Each with read/write head
• Platters divided into concentric tracks
  • Head seeks (forward/backward) to track
  • All heads move in unison
• Each track divided into sectors
  • ZBR (zone bit recording): more sectors on outer tracks
  • Sectors rotate under head
• Controller
  • Seeks heads, waits for sectors
  • Turns heads on/off
  • May have its own cache (made w/ DRAM)
13
Disk Parameters

                      Seagate 6TB            Seagate Savvio   Toshiba MK1003
                      Enterprise HDD (2016)  (~2005)          (early 2000s)
Diameter              3.5”                   2.5”             1.8”
Capacity              6 TB                   73 GB            10 GB          (density improving)
RPM                   7200 RPM               10000 RPM        4200 RPM
Cache                 128 MB                 8 MB             512 KB         (caches improving)
Platters              ~6                     2                1
Average Seek          4.16 ms                4.5 ms           7 ms           (seek time not really improving!)
Sustained Data Rate   216 MB/s               94 MB/s          16 MB/s
Interface             SAS/SATA               SCSI             ATA
Use                   Desktop                Laptop           Ancient iPod
14
Disk Read/Write Latency

• Disk read/write latency has four components


• Seek delay (t_seek): head seeks to right track
  • Fixed delay plus a component proportional to distance
• Rotational delay (t_rotation): right sector rotates under head
  • Fixed delay on average (average = half rotation)
• Controller delay (t_controller): controller overhead (on either side)
  • Fixed cost
• Transfer time (t_transfer): data actually being transferred
  • Proportional to amount of data
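
Putting rough numbers on these four components makes the next slide's picture unsurprising. Below is a minimal sketch in C, assuming round values loosely based on the Savvio column of the previous slide (4.5 ms average seek, 10,000 RPM, ~94 MB/s sustained rate, a small fixed controller overhead); the exact figures are illustrative, not from a datasheet.

#include <stdio.h>

int main(void) {
    /* Illustrative parameters (not from any specific datasheet). */
    double t_seek       = 4.5e-3;                 /* average seek, seconds         */
    double t_rotation   = 0.5 * (60.0 / 10000);   /* half a rotation at 10,000 RPM */
    double t_controller = 0.2e-3;                 /* assumed fixed controller cost */
    double t_transfer   = 512.0 / 94e6;           /* one 512 B sector at 94 MB/s   */

    double t_random = t_seek + t_rotation + t_controller + t_transfer;

    printf("seek       : %8.3f ms\n", t_seek * 1e3);
    printf("rotation   : %8.3f ms\n", t_rotation * 1e3);
    printf("controller : %8.3f ms\n", t_controller * 1e3);
    printf("transfer   : %8.3f ms\n", t_transfer * 1e3);   /* a few microseconds            */
    printf("random read: %8.3f ms\n", t_random * 1e3);     /* ~7.7 ms: seek+rotation dominate */
    return 0;
}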

15
Understanding disk performance

• One 🕐 equals 1 microsecond


• Time to read the “next” 512-byte sector (no seek needed):
🕐 🕐 ~2μs

• Time to read a random 512-byte sector (with seek):


[Rows of 🕐 symbols fill the rest of the slide — thousands of microseconds, i.e., several milliseconds: seek time dominates random access!]

16
Disk Bandwidth

• Disk is bandwidth-inefficient for page-sized transfers


• Actual data transfer (t_transfer) is only a small part of the disk access (and cycle)

• Increase bandwidth: stripe data across multiple disks


• Striping strategy depends on disk usage model
• “File System” or “web server”: many small files
• Map entire files to disks
• “Supercomputer” or “database”: several large files
• Stripe single file across multiple disks

• Both bandwidth and individual transaction latency important
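
A minimal sketch of how block-level striping spreads a single file's blocks across disks, assuming a RAID-0-style round-robin layout with a one-block stripe unit; the function and constant names are illustrative.

#include <stdio.h>

#define NUM_DISKS 4   /* illustrative array size */

/* Round-robin ("striped") mapping from a logical block number to
 * a physical (disk, block-on-that-disk) pair. */
static void map_block(unsigned logical, unsigned *disk, unsigned *block) {
    *disk  = logical % NUM_DISKS;
    *block = logical / NUM_DISKS;
}

int main(void) {
    for (unsigned b = 0; b < 8; b++) {
        unsigned d, blk;
        map_block(b, &d, &blk);
        printf("logical block %u -> disk %u, block %u\n", b, d, blk);
    }
    return 0;   /* consecutive blocks land on different disks, so one large
                   transfer can keep all NUM_DISKS spindles busy at once */
}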

17
Error Correction: RAID

• Error correction: more important for disk than for memory


• Mechanical disk failure (entire disk lost) is a common failure mode
• Entire file system can be lost if files striped across multiple disks
• RAID (redundant array of inexpensive disks)
• Similar to DRAM error correction, but…
• Major difference: which disk failed is known
• Even parity can be used to recover from single failures
• Parity disk can be used to reconstruct the data of the faulty disk
• RAID design balances bandwidth and fault-tolerance
• Many flavors of RAID exist
• Tradeoff: extra disks (cost) vs. performance vs. reliability
• Deeper discussion of RAID in ECE 552 and ECE 554;
super-duper deep coverage in ECE 566
(“Enterprise Storage Architecture”)
• RAID doesn’t solve all problems → can you think of any examples?
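
A minimal sketch of the parity idea, assuming a RAID-4/5-style layout where the parity block is the XOR of the data blocks in its stripe; because we know which disk failed, XOR-ing the surviving disks with the parity block rebuilds it. Sizes and values are illustrative.

#include <stdio.h>
#include <string.h>

#define NUM_DATA_DISKS 3
#define STRIPE_BYTES   4

int main(void) {
    unsigned char data[NUM_DATA_DISKS][STRIPE_BYTES] = {
        {0xDE, 0xAD, 0xBE, 0xEF},
        {0x12, 0x34, 0x56, 0x78},
        {0xCA, 0xFE, 0xBA, 0xBE},
    };

    /* Parity disk = XOR of all data disks (maintained on every write). */
    unsigned char parity[STRIPE_BYTES] = {0};
    for (int d = 0; d < NUM_DATA_DISKS; d++)
        for (int i = 0; i < STRIPE_BYTES; i++)
            parity[i] ^= data[d][i];

    /* Disk 1 fails; we KNOW which disk failed, so XOR of the survivors
     * plus the parity block reconstructs its contents. */
    int failed = 1;
    unsigned char rebuilt[STRIPE_BYTES];
    memcpy(rebuilt, parity, STRIPE_BYTES);
    for (int d = 0; d < NUM_DATA_DISKS; d++)
        if (d != failed)
            for (int i = 0; i < STRIPE_BYTES; i++)
                rebuilt[i] ^= data[d][i];

    printf("rebuilt disk %d: %02X %02X %02X %02X\n",
           failed, rebuilt[0], rebuilt[1], rebuilt[2], rebuilt[3]);
    return 0;
}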
18
What about Solid State Drives (SSDs)?

[Photos: SSD vs. HDD]

19
Adapted from “Solid State Drives” by Andrew Bondi
SSDs

• Multiple NAND flash chips operated in parallel


• Pros:
• Extremely good “seek” times (since “seek” is no longer a thing)
• Almost instantaneous read and write times
• The ability to read or write in multiple locations at once
• The speed of the drive scales extremely well with the number of NAND ICs on
board
• Way cheaper than disk per IOP (performance)
• Cons:
• Way more expensive than disk per GB (capacity)
• Limited number of write cycles possible before it degrades
(getting less and less of a problem these days)
• Fundamental problem: Write amplification
• You can set bits in “pages” (~4kB) fast (microseconds), but
you can only clear bits in “blocks” (~512kB) slooow (milliseconds)
• Solution: controller that is managing NAND cells tries to hide this
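
A back-of-the-envelope sketch of why this matters, using the page and block sizes quoted above (~4 KB pages, ~512 KB erase blocks); the worst-case figure assumes a naive controller with no spare blocks or garbage collection.

#include <stdio.h>

int main(void) {
    const int page_bytes  = 4 * 1024;      /* unit you can program (set bits)  */
    const int block_bytes = 512 * 1024;    /* unit you must erase (clear bits) */

    /* Naive in-place update of one page: read the whole block, erase it,
     * rewrite every page -- one logical 4 KB write costs a whole block. */
    printf("worst-case write amplification: %dx\n", block_bytes / page_bytes); /* 128x */
    return 0;
}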

20
Adapted from “Solid State Drives” by Andrew Bondi
Typical read and write rates: SSD vs HDD

• Benchmark data from HD Tune (Windows benchmark)

[HD Tune benchmark screenshots: HDD vs. SSD]

21
This Unit: I/O

[Course stack diagram: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors — this unit covers I/O]

• I/O system structure
  • Devices, controllers, and buses
• Device characteristics
  • Disks: HDD and SSD
• I/O control
  • Polling and interrupts
  • DMA

22
I/O Control and Interfaces

• Now that we know how I/O devices and buses work…


• How does I/O actually happen?
• How does CPU give commands to I/O devices?
• How do I/O devices execute data transfers?
• How does CPU know when I/O devices are done?

23
Sending Commands to I/O Devices

• Remember: only OS can do this! Two options …


• I/O instructions
• OS only? Instructions must be privileged (only OS can execute)
• E.g., IA-32
• Memory-mapped I/O
• Portion of physical address space reserved for I/O
• OS maps physical addresses to I/O device control registers
• Stores/loads to these addresses are commands to I/O devices
• Main memory ignores them, I/O devices recognize and respond
• Address specifies both I/O device and command
• These addresses are not cached – why?
• OS only? I/O physical addresses only mapped in OS address space
• E.g., almost every architecture other than IA-32 (see pattern??)
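
A minimal sketch of memory-mapped I/O from the (privileged) software side, reusing the illustrative addresses from the next three example slides (1000 for the TTY, 1001 for the keyboard); volatile keeps the compiler from optimizing away or reordering the device accesses, and the OS maps these addresses uncached.

#include <stdint.h>

/* Illustrative device addresses (match the examples on the next slides). */
#define TTY_DATA ((volatile uint8_t *)1000)  /* store here: character goes to TTY   */
#define KBD_DATA ((volatile uint8_t *)1001)  /* load here: byte comes from keyboard */

void tty_putc(char c) {
    *TTY_DATA = (uint8_t)c;   /* an ordinary store, routed to the device, not DRAM */
}

char kbd_getc(void) {
    return (char)*KBD_DATA;   /* an ordinary load, answered by the keyboard        */
}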

24
Memory mapped IO example (1)

• Non-special read – comes from memory


25
Memory mapped IO example (2)

• Write to address 1000 – routed to TTY!


• Mem write disabled, TTY write enabled; signal goes to both
26
Memory mapped IO example (3)

• Read from address 1001 – data comes from keyboard


• Mux switches to keyboard for that address
27
Querying I/O Device Status

• Now that we’ve sent command to I/O device …


• How do we query I/O device status?
• So that we know if data we asked for is ready?
• So that we know if device is ready to receive next command?

• Polling: Ready now? How about now? How about now???


• Processor queries I/O device status register (e.g., with MM load)
• Loops until it gets status it wants (ready for next command)
• Or tries again a little later
+ Simple
– Waste of processor’s time
• Processor much faster than I/O device
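
A minimal sketch of a polling loop, assuming a hypothetical memory-mapped status/data register pair; the busy-wait is exactly the processor time the next two slides quantify.

#include <stdint.h>

/* Hypothetical device registers (addresses are illustrative). */
#define DEV_STATUS   ((volatile uint32_t *)0x2000)
#define DEV_DATA     ((volatile uint32_t *)0x2004)
#define STATUS_READY 0x1u

uint32_t read_when_ready(void) {
    while ((*DEV_STATUS & STATUS_READY) == 0) {
        /* spin: every trip through this loop is wasted CPU time */
    }
    return *DEV_DATA;
}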

28
Polling Overhead: Example #1

• Parameters
• 500 MHz CPU
• Polling event takes 400 cycles

• Overhead for polling a mouse 30 times per second?


• Cycles per second for polling = (30 poll/s)*(400 cycles/poll)
• → 12000 cycles/second for polling
• (12000 cycles/second)/(500 M cycles/second) = 0.002% overhead
+ Not bad

29
Polling Overhead: Example #2

• Same parameters
• 500 MHz CPU, polling event takes 400 cycles

• Overhead for polling a 4 MB/s disk with 16 B interface?


• Must poll often enough not to miss data from disk
• Polling rate = (4MB/s)/(16 B/poll) >> mouse polling rate
• Cycles per second for polling=[(4MB/s)/(16 B/poll)]*(400 cyc/poll)
• → 100 M cycles/second for polling
• (100 M cycles/second)/(500 M cycles/second) = 20% overhead
– Bad
• This is the overhead of polling, not actual data transfer
• Really bad if disk is not being used (pure overhead!)

30
Interrupt-Driven I/O

• Interrupts: alternative to polling


• I/O device generates interrupt when status changes, data ready
• OS handles interrupts just like exceptions (e.g., page faults)
• Identity of interrupting I/O device recorded in ECR
• ECR: exception cause register

• I/O interrupts are asynchronous


• Not associated with any one instruction
• Don’t need to be handled immediately

• I/O interrupts are prioritized


• Synchronous interrupts (e.g., page faults) have highest priority
• High-bandwidth I/O devices have higher priority than low-
bandwidth ones
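
A minimal sketch of the software half of interrupt-driven input, with hypothetical names: the handler below would be registered with the OS for the disk's interrupt, runs only when the device raises it, and moves one word per interrupt (which is why the overhead math on the next slide counts one handler invocation per 16 B).

#include <stdint.h>

#define DISK_DATA ((volatile uint32_t *)0x3000)  /* hypothetical data register */

static volatile int      data_ready;   /* set by the handler, checked by the OS */
static volatile uint32_t latest_word;

/* Invoked by the OS exception/interrupt machinery, not called directly. */
void disk_interrupt_handler(void) {
    latest_word = *DISK_DATA;  /* grab the word the device has ready           */
    data_ready  = 1;           /* a real OS would unblock the waiting process  */
}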

31
Interrupt Overhead

• Parameters
  • 500 MHz CPU
  • Polling event takes 400 cycles
  • Interrupt handler takes 400 cycles
  • Data transfer takes 100 cycles
  • 4 MB/s, 16 B interface disk, transfers data only 5% of time
  • Note: when the disk is transferring data, the interrupt rate is the same as the polling rate

• Percent of time processor spends transferring data


• 0.05 * (4 MB/s)/(16 B/xfer)*[(100 c/xfer)/(500M c/s)] = 0.25%
• Overhead for polling?
• (4 MB/s)/(16 B/poll) * [(400 c/poll)/(500M c/s)] = 20%
• Overhead for interrupts?
+ 0.05 * (4 MB/s)/(16 B/int) * [(400 c/int)/(500M c/s)] = 1%

32
Direct Memory Access (DMA)

• Interrupts remove overhead of polling…


• But still requires OS to transfer data one word at a time
• OK for low bandwidth I/O devices: mice, microphones, etc.
• Bad for high bandwidth I/O devices: disks, monitors, etc.

• Direct Memory Access (DMA)


• Transfer data between I/O and memory without processor control
• Transfers entire blocks (e.g., pages, video frames) at a time
• Can use bus “burst” transfer mode if available
• Only interrupts processor when done (or if error occurs)

33
DMA Controllers

• To do DMA, I/O device attached to DMA controller


• Multiple devices can be connected to one DMA controller
• Controller itself seen as a memory mapped I/O device
• Processor initializes start memory address, transfer size, etc.
• DMA controller takes care of bus arbitration and transfer details
• So that’s why buses support arbitration and multiple masters!
[Diagram: CPU ($) and Main Memory on the bus; Disk and display attach through DMA controllers, the NIC through an I/O controller]
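
A minimal sketch of how the OS might kick off a DMA transfer, assuming a hypothetical controller whose own registers are memory-mapped as described above; register names, addresses, and bit meanings are illustrative.

#include <stdint.h>

/* Hypothetical DMA controller registers (themselves memory-mapped). */
#define DMA_ADDR ((volatile uint32_t *)0x4000)  /* physical address of the buffer */
#define DMA_LEN  ((volatile uint32_t *)0x4004)  /* number of bytes to move        */
#define DMA_CTRL ((volatile uint32_t *)0x4008)  /* bit 0 = start the transfer     */

void start_disk_read(uint32_t phys_buf, uint32_t nbytes) {
    *DMA_ADDR = phys_buf;   /* OS supplies a physical address                      */
    *DMA_LEN  = nbytes;     /* e.g., one whole page                                */
    *DMA_CTRL = 0x1;        /* controller now arbitrates for the bus by itself and
                               interrupts the CPU only when the transfer is done   */
}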
34
DMA Overhead

• Parameters
• 500 MHz CPU
• Interrupt handler takes 400 cycles
• Data transfer takes 100 cycles
• 4 MB/s, 16 B interface, disk transfers data 50% of time
• DMA setup takes 1600 cycles, transfer 1 16KB page at a time

• Processor overhead for interrupt-driven I/O?


• 0.5 * (4M B/s)/(16 B/xfer)*[(500 c/xfer)/(500M c/s)] = 12.5%
• Processor overhead with DMA?
• Processor only gets involved once per page, not once per 16 B
+ 0.5 * (4M B/s)/(16K B/page) * [(2000 c/page)/(500M c/s)] = 0.05%
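
The two percentages above, spelled out as a tiny calculation (same parameters as this slide; the 500 and 2000 cycle figures are interrupt + transfer and DMA setup + completion interrupt, respectively):

#include <stdio.h>

int main(void) {
    /* Parameters from this slide. */
    double cpu_hz      = 500e6;       /* 500 MHz CPU                         */
    double disk_rate   = 4e6;         /* 4 MB/s disk                         */
    double busy        = 0.5;         /* disk transferring data 50% of time  */
    double iface_bytes = 16;          /* 16 B per interrupt-driven transfer  */
    double page_bytes  = 16 * 1024;   /* one DMA transfer moves a 16 KB page */
    double intr_cycles = 400, xfer_cycles = 100, dma_setup_cycles = 1600;

    /* Interrupt-driven: one (interrupt + transfer) per 16 B. */
    double intr_ovhd = busy * (disk_rate / iface_bytes)
                            * (intr_cycles + xfer_cycles) / cpu_hz;      /* 12.5%  */

    /* DMA: one (setup + completion interrupt) per 16 KB page. */
    double dma_ovhd  = busy * (disk_rate / page_bytes)
                            * (dma_setup_cycles + intr_cycles) / cpu_hz; /* ~0.05% */

    printf("interrupt-driven I/O overhead: %.2f%%\n", intr_ovhd * 100);
    printf("DMA overhead:                  %.3f%%\n", dma_ovhd  * 100);
    return 0;
}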

35
DMA and Memory Hierarchy

• DMA is good, but is not without challenges

• Without DMA: processor initiates all data transfers


• All transfers go through address translation
+ Transfers can be of any size and cross virtual page boundaries
• All values seen by cache hierarchy
+ Caches never contain stale data

• With DMA: DMA controllers initiate data transfers


• Do they use virtual or physical addresses?
• What if they write data to a cached memory location?

36
DMA and Caching

• Caches are good


• Reduce CPU’s observed instruction and data access latency
+ But also, reduce CPU’s use of memory…
+ …leaving majority of memory/bus bandwidth for DMA I/O

• But they also introduce a coherence problem for DMA


• Input problem
• DMA write into memory version of cached location
• Cached version now stale
• Output problem: write-back caches only
• DMA read from memory version of “dirty” cached location
• Output stale value

37
Solutions to Coherence Problem

• Route all DMA I/O accesses to cache?


+ Solves problem
– Expensive: CPU must contend for access to caches with DMA
• Disallow caching of I/O data?
+ Also works
– Expensive in a different way: CPU access to those regions slow
• Selective flushing/invalidations of cached data
• Flush all dirty blocks in “I/O region”
• Invalidate blocks in “I/O region” as DMA writes those addresses
+ The high performance solution
• Hardware cache coherence mechanisms for doing this
– Expensive in yet a third way: must implement this mechanism
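
A minimal sketch of the selective flushing/invalidation option from the driver's point of view. The two cache-maintenance primitives are stand-ins for whatever the platform actually provides (real kernels expose per-architecture equivalents); here they are empty stubs so the sketch is self-contained.

#include <stddef.h>

/* Stand-ins for platform-specific cache maintenance operations. */
static void cache_flush_range(void *buf, size_t len)      { (void)buf; (void)len; /* write dirty lines back    */ }
static void cache_invalidate_range(void *buf, size_t len) { (void)buf; (void)len; /* drop possibly-stale lines */ }

/* Output (memory -> device): flush first, so DMA reads the latest data,
 * not an older copy left in memory while dirty lines sit in a write-back cache. */
void before_dma_to_device(void *buf, size_t len) {
    cache_flush_range(buf, len);
}

/* Input (device -> memory): invalidate afterwards, so later loads miss in the
 * cache and fetch the freshly DMA-written data instead of stale cached values. */
void after_dma_from_device(void *buf, size_t len) {
    cache_invalidate_range(buf, len);
}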

38
H/W Cache Coherence (more later on this)

• D$ and L2 “snoop” bus traffic
  • Observe transactions
  • Check if written addresses are resident
  • Self-invalidate those blocks
  + Doesn’t require access to data part
  – Does require access to tag part
    • May need 2nd copy of tags for this
    • That’s OK, tags smaller than data

• Bus addresses are physical
  • L2 is easy (physical index/tag)
  • D$ is harder (virtual index/physical tag)

[Diagram: CPU with TLBs, I$ and D$ (virtually indexed, physically tagged), L2 (physically indexed), a bus snooped by the caches, DMA, Main Memory, and Disk]
39
Summary

• Storage devices
• HDD: Mechanical disk. Seeks are bad. Cheaper per GB.
• SSD: Flash storage. Cheaper per performance.
• Can combine drives with RAID to get aggregate performance/capacity
plus fault tolerance (can survive individual drive failures).
• Connectivity
• A bus is shared between CPU, memory, and/or multiple IO devices
• How does CPU talk to IO devices?
• Special instructions or memory-mapped IO
(certain addresses don’t lead to RAM, they lead to IO devices)
• Either requires OS privilege to use
• Methods of interaction:
• Polling (simple but wastes CPU)
• Interrupts (saves CPU but transfers tiny bit at a time)
• DMA+interrupts (saves CPU and is fast, but requires caches to snoop DMA
  traffic so they don’t hold stale data) 40
