BCS361: Computer Architecture
I/O Devices
Input/Output
[Figure: CPU and cache attached to a single bus shared with memory, disk, network, USB, DVD, …]
I/O Hierarchy
[Figure: CPU and cache connect to memory over the memory bus; an I/O controller links the memory bus to an I/O bus serving disk, network, USB, DVD, …]
Bus Design Review
• The bus is a shared resource – any device can send
data on the bus and all other devices can read this data off
the bus
• The address/control signals on the bus specify the
intended receiver of the message
• The length of a bus limits its speed, since longer wires mean
longer signal delays (hence, a hierarchy of a short, fast memory
bus and a longer, slower I/O bus makes sense)
• Buses can be synchronous or asynchronous
Memory-Mapped I/O
• Each I/O device has its own special address range
– The CPU issues commands such as these:
    sw [some-data] [some-address]
– Usually, memory services these requests… if the
address is in the I/O range, memory ignores it
– The data is written into some register in the
appropriate I/O device – this serves as the command
to the device
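The address decoding described above can be sketched as a small simulation. This is only an illustration: IO_BASE, IO_SIZE, and the Device and System classes are assumed names and values, not taken from any real machine.

```python
# Sketch of memory-mapped I/O address decoding.
# IO_BASE, IO_SIZE, Device, and System are illustrative assumptions.

IO_BASE = 0xFFFF0000   # hypothetical start of the I/O address range
IO_SIZE = 0x100        # hypothetical size of the I/O address range


class Device:
    """A toy I/O device with a single command register."""

    def __init__(self):
        self.command_register = None

    def write_register(self, value):
        # Writing the register acts as the command to the device.
        self.command_register = value


class System:
    """Routes each store either to memory or to the device."""

    def __init__(self, device):
        self.memory = {}
        self.device = device

    def store_word(self, data, address):
        """Model of: sw [some-data] [some-address]"""
        if IO_BASE <= address < IO_BASE + IO_SIZE:
            # Address is in the I/O range: memory ignores the store
            # and the device latches the data instead.
            self.device.write_register(data)
        else:
            # Ordinary address: memory services the request.
            self.memory[address] = data


system = System(Device())
system.store_word(42, 0x1000)        # normal store, goes to memory
system.store_word(0x1, IO_BASE + 4)  # store in the I/O range, goes to the device
```

The same store operation reaches either memory or the device; only the address decides which one responds.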
Polling Vs. Interrupt-Driven
• When the I/O device is ready to respond, it can send an
interrupt to the CPU; the CPU stops what it was doing;
the OS examines the interrupt and then reads the data
produced by the I/O device (and usually stores it in memory)
• In the polling approach, the CPU (OS) periodically checks
the status of the I/O device and if the device is ready with
data, the CPU reads it
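The two approaches can be contrasted with a toy simulation. ToyDevice, poll, and interrupt_driven are hypothetical names chosen for this sketch, and time is counted in abstract steps rather than real cycles.

```python
# Toy comparison of polling and interrupt-driven I/O.
# ToyDevice and its methods are hypothetical names for illustration.

class ToyDevice:
    """Becomes ready with data after a fixed number of time steps."""

    def __init__(self, ready_at, data):
        self.ready_at = ready_at
        self.data = data
        self.time = 0
        self.handler = None  # interrupt handler registered by the OS

    def ready(self):
        return self.time >= self.ready_at

    def tick(self):
        """Advance one time step; raise an interrupt when data is ready."""
        self.time += 1
        if self.time == self.ready_at and self.handler is not None:
            self.handler(self.data)


def poll(device, interval):
    """Polling: check the status every `interval` steps; count the checks."""
    checks = 0
    while True:
        checks += 1
        if device.ready():
            return device.data, checks
        for _ in range(interval):
            device.tick()


def interrupt_driven(device):
    """Interrupt-driven: the CPU does other work until the handler runs."""
    received = []
    device.handler = received.append   # OS installs the handler
    other_work = 0
    while not received:
        other_work += 1                # CPU keeps doing useful work
        device.tick()
    return received[0], other_work


polled_data, status_checks = poll(ToyDevice(ready_at=10, data="block"), interval=2)
irq_data, work_done = interrupt_driven(ToyDevice(ready_at=10, data="block"))
```

In this sketch the polling CPU spends six status checks waiting for the device, while the interrupt-driven CPU never checks the status and completes ten steps of other work before the handler delivers the data.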
Role of I/O
• Activities external to the CPU are typically orders of
magnitude slower
• Example: while CPU performance has improved by 50% per year,
disk latencies have improved by only 10% per year
• Typical strategy on an I/O request: switch contexts and work on
something else while the I/O completes
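To see how quickly the gap widens, the two yearly rates in the example can be compounded. The function name relative_gap and the 10-year horizon are assumptions made for this sketch; only the 50% and 10% figures come from the slide.

```python
# Compound the two yearly improvement rates quoted above.
CPU_RATE = 0.50    # CPU performance improves 50% per year (from the slide)
DISK_RATE = 0.10   # disk latency improves 10% per year (from the slide)


def relative_gap(years, cpu_rate=CPU_RATE, disk_rate=DISK_RATE):
    """Factor by which the CPU has pulled ahead of the disk after `years`."""
    cpu_gain = (1 + cpu_rate) ** years
    disk_gain = (1 + disk_rate) ** years
    return cpu_gain / disk_gain


gap_after_10_years = relative_gap(10)   # roughly 22x
```

At these rates the CPU gains about a factor of 22 on the disk in ten years, which is why waiting idle on I/O becomes steadily more costly.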
BCS361: Computer Architecture
Multiprocessors
Crossroads: Conventional Wisdom in Comp. Arch
• Old Conventional Wisdom (CW): Power is free, transistors are expensive
• New Conventional Wisdom: “Power wall”: power is expensive, transistors are free
(can put more transistors on a chip than we can afford to turn on)
• Old CW: Instruction Level Parallelism (ILP) can be increased sufficiently via
compilers and hardware innovation (out-of-order execution, speculation, …)
• New CW: “ILP wall”: diminishing returns from adding more hardware for ILP
• Old CW: Multiplies are slow, memory access is fast
• New CW: “Memory wall”: memory is slow, multiplies are fast
(200 clock cycles to access DRAM memory, 4 clock cycles for a multiply)
• Old CW: Uniprocessor performance doubles every 1.5 years
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
– Uniprocessor performance now doubles only every 5(?) years
– Sea change in chip design: multiple “cores”
(the number of processors per chip doubles about every 2 years)
• A larger number of simpler processors is more power efficient
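The doubling times and cycle counts above can be turned into per-year rates and ratios with a little arithmetic. The function name annual_growth is an assumption for this sketch, and the 5-year figure is taken at face value even though the slide marks it as uncertain.

```python
# Convert the doubling times quoted above into yearly improvement factors.

def annual_growth(doubling_years):
    """Yearly improvement factor implied by doubling every `doubling_years`."""
    return 2 ** (1 / doubling_years)


old_rate = annual_growth(1.5)   # about 1.59x per year
new_rate = annual_growth(5)     # about 1.15x per year (5 is the slide's "5(?)")

# Memory wall: multiplies that fit in the time of one DRAM access,
# using the cycle counts from the slide.
DRAM_CYCLES = 200
MULTIPLY_CYCLES = 4
multiplies_per_dram_access = DRAM_CYCLES // MULTIPLY_CYCLES   # 50
```

Read this way, uniprocessor improvement falls from roughly 59% per year to roughly 15% per year, and a single DRAM access costs as much as 50 multiplies.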