Common Framework For Memory Hierarchy

This document discusses I/O systems and performance. It covers where I/O devices can be placed, how blocks are found in caches and virtual memory, and replacement policies for blocks. It also discusses I/O performance metrics like throughput and response time, and how throughput can be increased but response time is difficult to reduce due to fundamental limits. Common I/O benchmarks for disks are also mentioned for different applications.


ECE 4680
Computer Organization and Architecture

I/O Systems

 Features of I/O devices: performance, benchmarks and types
 How to connect I/O devices: bus
 Bus: features, performance, types, connections, timing, arbitration

ECE4680 io.1 April 1, 2003

Common Framework for Memory Hierarchy

°Question 1: Where can a block be placed?
• Cache: direct mapped, n-way set associative
• VM: fully associative

°Question 2: How is a block found?
• Index
• Index the set and search among its elements
• Search all cache entries or a separate lookup table

°Question 3: Which block should be replaced?
• Random, LRU, NRU

°What happens on a write?
• Write-through vs. write-back
• Write-allocate vs. no-write-allocate on a write miss

ECE4680 io.2 April 1, 2003

The Big Picture: Where are We Now?

°Today's Topic: I/O Systems

[Figure: processor (control + datapath), memory, input, output, and network blocks]

ECE4680 io.3 April 1, 2003

I/O System Design Issues (§8.1)

°Design goals:
• Performance
• Expandability
• Resilience in the face of failure

[Figure: processor with cache and interrupts, main memory, and a memory-I/O bus connecting I/O controllers for disk, graphics, and network]

ECE4680 io.4 April 1, 2003

Example: Pentium System Organization

[Figure: processor/memory bus bridged to the PCI bus, which in turn connects to the I/O buses]

ECE4680 io.5 April 1, 2003

I/O System Performance (§8.1)

°I/O system performance depends on many aspects of the system (§8.9):
• The CPU
• The memory system:
- Internal and external caches
- Main memory
• The underlying interconnection (buses)
• The I/O controller
• The I/O device
• The speed of the I/O software
• The efficiency of the software's use of the I/O devices

°Two common performance metrics:
• Throughput: I/O bandwidth
• Response time: latency

ECE4680 io.6 April 1, 2003
Producer-Server Model

[Figure: producer → queue → server]

°Throughput:
• The number of tasks completed by the server in unit time
• In order to get the highest possible throughput:
- The server should never be idle
- The queue should never be empty

°Response time:
• Begins when a task is placed in the queue
• Ends when it is completed by the server
• In order to minimize the response time:
- The queue should be empty
- The server will be idle

ECE4680 io.7 April 1, 2003

Throughput versus Response Time

[Figure: response time (ms) rises sharply as the percentage of maximum throughput approaches 100%]

°There is a tradeoff between response time and throughput

°Example: grouping access requests that are close together may increase throughput but also increase the response time for some requests

ECE4680 io.8 April 1, 2003

Throughput Enhancement

[Figure: one producer feeding two queue-server pairs]

°In general, throughput can be improved by:
• Throwing more hardware at the problem

°Response time is much harder to reduce:
• Ultimately it is limited by the speed of light
• You cannot bribe God!

ECE4680 io.9 April 1, 2003

I/O Benchmarks for Magnetic Disks (§8.2)

°Supercomputer applications:
• Large-scale scientific problems

°Transaction processing:
• Examples: airline reservation systems and banks

°File systems:
• Example: UNIX file system

ECE4680 io.10 April 1, 2003

Supercomputer I/O

°Supercomputer I/O is dominated by:
• Access to large files on magnetic disks

°Supercomputer I/O consists of one large read (read in the data):
• Plus many writes to snapshot the state of the computation

°Supercomputer I/O consists of more output than input

°The overriding supercomputer I/O measure is data throughput:
• Bytes/second that can be transferred between disk and memory

ECE4680 io.11 April 1, 2003

Transaction Processing I/O

°Transaction processing:
• Examples: airline reservation systems, bank ATMs
• A lot of small changes to a large body of shared data

°Transaction processing requirements:
• Throughput and response time are important
• Must gracefully handle certain types of failure

°Transaction processing is chiefly concerned with I/O rate:
• The number of disk accesses per second

°Each transaction in a typical transaction processing system takes:
• Between 2 and 10 disk I/Os
• Between 5,000 and 20,000 CPU instructions per disk I/O

ECE4680 io.12 April 1, 2003
File System I/O

°Measurements of UNIX file systems in an engineering environment:
• 80% of accesses are to files less than 10 KB
• 90% of all file accesses are to data with sequential addresses on the disk
• 67% of the accesses are reads
• 27% of the accesses are writes
• 6% of the accesses are read-write accesses

ECE4680 io.13 April 1, 2003

Types and Characteristics of I/O Devices (§8.3)

°Behavior: how does an I/O device behave?
• Input: read only
• Output: write only, cannot be read
• Storage: can be reread and usually rewritten

°Partner:
• Either a human or a machine is at the other end of the I/O device
• Either feeding data on input or reading data on output

°Data rate:
• The peak rate at which data can be transferred:
- Between the I/O device and main memory
- Or between the I/O device and the CPU

ECE4680 io.14 April 1, 2003

I/O Device Examples

Device            Behavior         Partner   Data Rate (KB/sec)
Keyboard          Input            Human     0.01
Mouse             Input            Human     0.02
Line printer      Output           Human     1.00
Laser printer     Output           Human     100.00
Graphics display  Output           Human     30,000.00
Network (LAN)     Input or output  Machine   200.00
Floppy disk       Storage          Machine   50.00
Optical disk      Storage          Machine   500.00
Magnetic disk     Storage          Machine   2,000.00

ECE4680 io.15 April 1, 2003

Magnetic Disk

°Purpose:
• Long-term, nonvolatile storage
• Large, inexpensive, and slow
• Lowest level in the memory hierarchy (registers → cache → memory → disk)

°Two major types:
• Floppy disk
• Hard disk

°Both types of disks:
• Rely on a rotating platter coated with a magnetic surface
• Use a moveable read/write head to access the disk

°Advantages of hard disks over floppy disks:
• Platters are more rigid (metal or glass), so they can be larger
• Higher density, because the head can be controlled more precisely
• Higher data rate, because the disk spins faster
• Can incorporate more than one platter

ECE4680 io.16 April 1, 2003

Organization of a Hard Magnetic Disk

[Figure: a stack of platters, each surface with a magnetic coating divided into tracks and sectors]

°A stack of platters, each surface with a magnetic coating

°Typical numbers (depending on the disk size):
• 500 to 2,000 tracks per surface
• 32 to 128 sectors per track
- A sector is the smallest unit that can be read or written

°Traditionally, all tracks have the same number of sectors:
• Constant bit density would instead record more sectors on the outer tracks

ECE4680 io.17 April 1, 2003

Magnetic Disk Characteristics

[Figure: platter, head, track, sector, and cylinder]

°Disk head: each side of a platter has a separate disk head

°Cylinder: all the tracks under the heads at a given point on all surfaces

°Reading or writing data is a three-stage process:
• Seek time: position the arm over the proper track
• Rotational latency: wait for the desired sector to rotate under the read/write head
• Transfer time: transfer a block of bits (sector) under the read/write head

°Average seek time as reported by the industry:
• Typically in the range of 8 ms to 15 ms
• (Sum of the times for all possible seeks) / (total # of possible seeks)

°Due to locality of disk references, the actual average seek time may:
• Be only 25% to 33% of the advertised number

ECE4680 io.18 April 1, 2003
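The geometry numbers above support a quick back-of-envelope capacity estimate. A minimal sketch in Python, using illustrative values drawn from the stated ranges (the surface count and per-track figures are assumptions, not a specific disk):

```python
# Rough disk capacity and advertised-vs-actual seek time, using
# typical figures from the ranges above (illustrative values only).
surfaces = 6                  # e.g., 3 platters, 2 surfaces each (assumed)
tracks_per_surface = 2000     # top of the 500-2,000 range
sectors_per_track = 128       # top of the 32-128 range
bytes_per_sector = 512

capacity_mb = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector / 1e6
print(f"capacity = {capacity_mb:.0f} MB")    # 786 MB

advertised_seek_ms = 12.0
actual_seek_ms = advertised_seek_ms * 0.3    # locality: ~25-33% of advertised
print(f"actual average seek = {actual_seek_ms:.1f} ms")    # 3.6 ms
```

The point of the seek-time line is that benchmarks based on the advertised average can overstate seek cost by a factor of three or four.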
Typical Numbers for a Magnetic Disk

°Rotational latency:
• Most disks rotate at 3,600 / 5,400 / 7,200 RPM
• At 3,600 RPM, approximately 16 ms per revolution
• The average latency to the desired information is halfway around the disk: 8 ms at 3,600 RPM

°Transfer time is a function of:
• Transfer size (usually a sector): 1 KB per sector
• Rotation speed: 3,600 RPM to 5,400 RPM to 7,200 RPM
• Recording density: typical diameter ranges from 2 to 14 inches
• Typical values: 2 to 4 MB per second

ECE4680 io.19 April 1, 2003

Disk I/O Performance

[Figure: processor issuing requests at rate λ into per-disk queues, each drained by a disk controller and disk at service rate µ]

°Disk Access Time = Seek Time + Rotational Latency + Transfer Time + Controller Time + Queueing Delay

°Estimating queue length:
• Utilization: U = Request Rate / Service Rate
• Mean Queue Length = U / (1 - U)
• As Request Rate → Service Rate, Mean Queue Length → Infinity

ECE4680 io.20 April 1, 2003
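The queue-length estimate above is easy to sanity-check numerically. A minimal sketch (the function name is mine):

```python
# Mean queue length for the simple model on the slide: U / (1 - U),
# where U = request rate / service rate.
def mean_queue_length(request_rate: float, service_rate: float) -> float:
    u = request_rate / service_rate
    if u >= 1.0:
        raise ValueError("utilization >= 1: queue grows without bound")
    return u / (1.0 - u)

# As the request rate approaches the service rate, the queue blows up.
for rate in (50, 80, 90, 99):
    print(rate, round(mean_queue_length(rate, 100), 2))
```

Doubling utilization from 50% to nearly 100% takes the mean queue length from 1 toward infinity, which is why the response-time curve a few slides back turns vertical near maximum throughput.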

Magnetic Disk Examples (page 650)

Characteristics                 IBM 3090   IBM 0663   Integral 1820
Disk diameter (inches)          10.88      3.50       1.80
Formatted data capacity (MB)    22,700     1,000      21
MTTF (hours)                    50,000     400,000    100,000
Number of arms/box              12         1          1
Rotation speed (RPM)            3,600      4,318      3,800
Transfer rate (MB/sec)          4.2        4          1.9
Power/box (watts)               2,900      12         2
MB/watt                         8          102        10.5
Volume (cubic feet)             97         0.13       0.02
MB/cubic foot                   234        7,692      1,050

These disks represent the newest products of 1993. Compare with the newest disks of 1997 on page 650 to see how fast disks have developed.

ECE4680 io.21 April 1, 2003

Western Digital WD205BA

°Setup parameters: 16,383 cylinders, 63 sectors per track
°3 platters, 6 heads
°Bytes per sector: 512
°RPM: 7,200
°Transfer mode: 66.6 MB/s
°Average seek time: 9.0 ms (read), 9.5 ms (write)
°Average latency: 4.17 ms
°Physical dimensions: 1" x 4" x 5.75"
°Interleave: 1:1

ECE4680 io.22 April 1, 2003

Example (pp. 648-649)

°512-byte sector, disk rotates at 5,400 RPM, advertised seek time is 12 ms, transfer rate is 4 MB/sec, controller overhead is 1 ms, and the queue is idle (no queueing delay)

°Disk Access Time = Seek Time + Rotational Latency + Transfer Time + Controller Time + Queueing Delay

°Disk Access Time = 12 ms + 0.5 rotation / 5,400 RPM + 0.5 KB / (4 MB/s) + 1 ms + 0
°Disk Access Time = 12 ms + 0.5 / (90 rev/s) + (0.125 / 1024) s + 1 ms + 0
°Disk Access Time = 12 ms + 5.5 ms + 0.1 ms + 1 ms + 0 ms
°Disk Access Time = 18.6 ms

°If real seeks are 1/3 of the advertised seek time, the access time is 10.6 ms, with the rotational delay then accounting for about 50% of the total!

ECE4680 io.23 April 1, 2003

Disk Arrays (p. 692, 709)

°A new organization of disk storage:
• Arrays of small and inexpensive disks
• Increase potential throughput by having many disk drives:
- Data is spread over multiple disks
- Multiple accesses are made to several disks

°Reliability is lower than for a single disk:
• But availability can be improved by adding redundant disks: lost information can be reconstructed from the redundant information
• MTTR: mean time to repair is on the order of hours
• MTTF: mean time to failure of disks is three to five years

ECE4680 io.24 April 1, 2003
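The access-time example above can be reproduced in a few lines. Summing unrounded terms gives about 18.7 ms and 10.7 ms; the slide rounds each term first and reports 18.6 ms and 10.6 ms. The function name and parameterization are mine:

```python
# Reproduces the worked example: 512-byte (0.5 KB) sector, 5,400 RPM,
# 12 ms advertised seek, 4 MB/s transfer, 1 ms controller overhead.
def disk_access_time_ms(seek_ms, rpm, sector_kb, transfer_mb_s, controller_ms):
    rotational_ms = 0.5 / (rpm / 60.0) * 1000.0           # half a revolution
    transfer_ms = sector_kb / 1024.0 / transfer_mb_s * 1000.0
    return seek_ms + rotational_ms + transfer_ms + controller_ms

print(round(disk_access_time_ms(12.0, 5400, 0.5, 4.0, 1.0), 1))  # 18.7 ms
print(round(disk_access_time_ms(4.0, 5400, 0.5, 4.0, 1.0), 1))   # 10.7 ms (seek = 1/3)
```

With the realistic seek, rotational latency (5.6 ms) dominates, which is the slide's closing point.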
The Big Picture: Where are We Now?

[Figure: processor (control + datapath), memory, input, output, and network blocks, with the I/O connections highlighted]

°How do we connect I/O to the rest of the computer?

ECE4680 io.25 April 1, 2003

Buses: Connecting I/O to Processor and Memory (§8.4)

°A bus is a shared communication link
°Multiple sources and multiple destinations
°It uses one set of wires to connect multiple subsystems
°Different uses: data, address, and control

°A definition from a dictionary:
• An electrical connection between the components of a computer system along which signals or power are transmitted. Information is transferred along the buses from any one of many sources to any one of many destinations. The bus consists of several parallel wires, with separate wires serving various functions: addresses, data, synchronization, control, power, etc.

ECE4680 io.26 April 1, 2003

Advantages of Buses

[Figure: processor, memory, and several I/O devices attached to one shared bus]

°Versatility:
• New devices can be added easily
• Peripherals can be moved between computer systems that use the same bus standard

°Low cost:
• A single set of wires is shared in multiple ways

ECE4680 io.27 April 1, 2003

Disadvantages of Buses

°A bus creates a communication bottleneck:
• The bandwidth of the bus can limit the maximum I/O throughput

°The maximum bus speed is largely limited by:
• The length of the bus
• The number of devices on the bus
• The need to support a range of devices with:
- Widely varying latencies
- Widely varying data transfer rates

ECE4680 io.28 April 1, 2003

The General Organization of a Bus

[Figure: control lines and data lines shared by all devices]

°Control lines:
• Signal requests and acknowledgments
• Indicate what type of information is on the data lines

°Data lines carry information between the source and the destination:
• Data and addresses
• Complex commands

°A bus transaction includes two parts:
• Sending the address
• Receiving or sending the data

ECE4680 io.29 April 1, 2003

Master versus Slave

[Figure: the bus master sends the address; data can go either way between master and slave]

°A bus transaction includes two parts:
• Sending the address
• Receiving or sending the data

°The master is the one who starts the bus transaction by:
• Sending the address

°The slave is the one who responds to the address by:
• Sending data to the master if the master asks for data
• Receiving data from the master if the master wants to send data

ECE4680 io.30 April 1, 2003
Output Operation

°Output is defined as the processor sending data to the I/O device:

Step 1: Request memory – the processor puts a memory read request on the control lines and the memory address on the data lines

Step 2: Read memory – memory places the requested data on the bus

Step 3: Send data to the I/O device – the processor puts a device write request on the control lines, then the I/O device address and the data on the data lines

ECE4680 io.31 April 1, 2003

Input Operation

°Input is defined as the processor receiving data from the I/O device:

Step 1: Request memory – the processor puts a memory write request on the control lines and the memory address on the data lines

Step 2: Receive data – the processor puts an I/O read request on the control lines and the I/O device address on the data lines; the device then supplies the data

ECE4680 io.32 April 1, 2003

Types of Buses

°Processor-memory bus (design-specific or proprietary):
• Short and high speed
• Only needs to match the memory system
- Maximize memory-to-processor bandwidth
• Connects directly to the processor

°I/O bus (industry standard):
• Usually lengthy and slower
• Needs to match a wide range of I/O devices
• Connects to the processor-memory bus or backplane bus

°Backplane bus (industry standard):
• Backplane: an interconnection structure within the chassis
• Allows processors, memory, and I/O devices to coexist
• Cost advantage: one single bus for all components

ECE4680 io.33 April 1, 2003

A Computer System with One Bus: Backplane Bus

[Figure: processor, memory, and I/O devices all attached to a single backplane bus]

°A single bus (the backplane bus) is used for:
• Processor-to-memory communication
• Communication between I/O devices and memory

°Advantages: simple and low cost

°Disadvantages: slow, and the bus can become a major bottleneck

°Example: IBM PC

ECE4680 io.34 April 1, 2003

A Two-Bus System

[Figure: processor-memory bus with bus adaptors connecting to separate I/O buses]

°I/O buses tap into the processor-memory bus via bus adaptors:
• Processor-memory bus: mainly for processor-memory traffic
• I/O buses: provide expansion slots for I/O devices

°Apple Macintosh II:
• NuBus: processor, memory, and a few selected I/O devices
• SCSI bus: the rest of the I/O devices

ECE4680 io.35 April 1, 2003

A Three-Bus System

[Figure: processor-memory bus, a backplane bus attached through a bus adaptor, and I/O buses attached to the backplane bus through further adaptors]

°A small number of backplane buses tap into the processor-memory bus:
• The processor-memory bus is used for processor-memory traffic
• The I/O buses are connected to the backplane bus

°Advantages:
• Loading on the processor bus is greatly reduced
• The I/O system can be easily expanded

ECE4680 io.36 April 1, 2003
Synchronous and Asynchronous Buses

°Synchronous bus:
• Includes a clock in the control lines
• A fixed protocol for communication, relative to the clock
• Advantage: involves very little logic and can run very fast
• Disadvantages:
- Every device on the bus must run at the same clock rate
- To avoid clock skew, buses cannot be long if they are fast

°Asynchronous bus:
• It is not clocked
• It can accommodate a wide range of devices
• It can be lengthened without worrying about clock skew
• It requires a handshaking protocol

ECE4680 io.37 April 1, 2003

Simplest Bus Paradigm

°All agents operate synchronously
°All can source/sink data at the same rate
°=> simple protocol
• Just manage the source and target

ECE4680 io.38 April 1, 2003

Simple Synchronous Protocol

[Waveform: after bus request (BReq) and bus grant (BG), the master drives R/W and the command/address; the data words follow on successive clock edges]

ECE4680 io.39 April 1, 2003

Typical Synchronous Protocol

[Waveform: as above, but the slave asserts a Wait line, so Data1 is held for an extra cycle before Data2 is transferred]

°Even memory buses are more complex than this:
• Memory (the slave) may take time to respond
°The slave indicates when it is prepared for the data transfer:
• It needs to control the data rate
°The actual transfer goes at the bus rate

ECE4680 io.40 April 1, 2003

A Handshaking Protocol (page 661)

[Waveform: ReadReq, Data, Ack, and DataRdy lines, with numbered steps 1-7 of the read handshake]

°Three control lines:
• ReadReq: indicates a read request for memory; the address is put on the data lines at the same time
• DataRdy: indicates that the data word is now ready on the data lines; the data is put on the data lines at the same time
• Ack: acknowledges the ReadReq or the DataRdy of the other party

°This figure is for a read operation, but a write operation is almost the same

ECE4680 io.41 April 1, 2003

Increasing the Bus Bandwidth

°Separate versus multiplexed address and data lines:
• Address and data can be transmitted in one bus cycle if separate address and data lines are available
• Cost: (a) more bus lines, (b) increased complexity

°Data bus width:
• By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
• Example: the SPARCstation 20's memory bus is 128 bits wide
• Cost: more bus lines

°Block transfers:
• Allow the bus to transfer multiple words in back-to-back bus cycles
• Only one address needs to be sent at the beginning
• The bus is not released until the last word is transferred
• Cost: (a) increased complexity, (b) increased response time for other requests

ECE4680 io.42 April 1, 2003
Asynchronous Handshake – Write Operation

[Waveform: address, data, req, and ack lines over times t0-t4; the master drives the data]

° t0: Master has obtained control and asserts address, direction, and data
° Waits a specified amount of time for slaves to decode the target
° t1: Master asserts the request line
° t2: Slave asserts ack, indicating data received
° t3: Master releases req
° t4: Slave releases ack

ECE4680 io.43 April 1, 2003

Asynchronous Handshake – Read Operation

[Waveform: address, data, req, and ack lines over times t0-t4; the slave drives the data]

° t0: Master has obtained control and asserts address and direction
° Waits a specified amount of time for slaves to decode the target
° t1: Master asserts the request line
° t2: Slave asserts ack, indicating it is ready to transmit data
° t3: Master releases req; data received
° t4: Slave releases ack

ECE4680 io.44 April 1, 2003
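The four-phase read handshake above can be sketched as a toy req/ack state machine. The `Slave` class and `read` driver are illustrative models of the timing steps, not any real bus specification:

```python
# Toy simulation of the four-phase asynchronous read handshake:
# the master raises req, the slave answers with ack and data,
# then both sides release in turn.
class Slave:
    def __init__(self, memory):
        self.memory, self.ack, self.data = memory, False, None

    def observe(self, req, address):
        if req and not self.ack:          # t2: answer the pending request
            self.data, self.ack = self.memory[address], True
        elif not req and self.ack:        # t4: release ack after req drops
            self.ack = False

def read(slave, address):
    # t0/t1: master drives the address and raises req
    slave.observe(req=True, address=address)
    assert slave.ack                      # t2: slave signalled data valid
    data = slave.data                     # t3: master latches data, drops req
    slave.observe(req=False, address=address)
    assert not slave.ack                  # t4: slave released ack
    return data

print(hex(read(Slave({0x10: 0xAB}), 0x10)))   # 0xab
```

Because each step waits on the other side's signal rather than a clock edge, a slow slave simply stretches the t1-t2 interval, which is how the asynchronous bus accommodates devices of widely varying speed.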

Example: Performance Analysis (page 665)

°Consider a system with the following characteristics:
• Memory and bus support block accesses of 4 to 16 32-bit words
• 64-bit synchronous bus clocked at 200 MHz, with each 64-bit transfer taking 1 clock cycle, and 1 clock cycle required to send an address to memory
• Two clock cycles needed between each bus operation
• A memory access time of 200 ns for the first four words; each additional set of four words can be read in 20 ns. Assume that a bus transfer of the most recently read data and a read of the next four words can be overlapped.

°Find the sustained bandwidth and the latency for a read of 256 words using 4-word blocks
°Compute the effective number of bus transactions per second
°Repeat for transfers that use 16-word blocks

°With 4-word blocks (5 ns/cycle): 1 clock (address) + 40 clocks (200 ns memory read) + 2 clocks (data transfer) + 2 clocks (idle) = 45 clocks per transaction
• 64 transactions: latency = 2,880 clock cycles, bandwidth = 71.11 MB/sec

°With 16-word blocks: latency = 912 clock cycles, bandwidth = 224.56 MB/sec

ECE4680 io.45 April 1, 2003

Obtaining Access to the Bus

[Figure: the master initiates requests; data can go either way between master and slave]

°One of the most important issues in bus design:
• How is the bus reserved by a device that wishes to use it?

°Chaos is avoided by a master-slave arrangement:
• Only the bus master can control access to the bus: it initiates and controls all bus requests
• A slave responds to read and write requests

°The simplest system:
• The processor is the only bus master
• All bus requests must be controlled by the processor
• Major drawback: the processor is involved in every transaction

ECE4680 io.46 April 1, 2003
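The bus-performance example above can be checked mechanically. A sketch assuming the timing model stated on the slide (1 address clock, 40 clocks for the first 4-word set, 4 clocks per additional overlapped set, 2 transfer clocks for the final set, 2 idle clocks between operations); the function name is mine:

```python
CLOCK_NS = 5.0  # 200 MHz bus

def read_256_words(block_words):
    """Latency (clocks), bandwidth (MB/s), and transactions/s for
    reading 256 words in blocks of `block_words` 32-bit words."""
    transactions = 256 // block_words
    sets = block_words // 4               # 4-word sets per block
    # 1 clock address + 40 clocks (200 ns) for the first set,
    # 4 clocks (20 ns, overlapped with transfer) per additional set,
    # 2 clocks to transfer the final set, 2 idle clocks afterwards.
    clocks = transactions * (1 + 40 + (sets - 1) * 4 + 2 + 2)
    time_s = clocks * CLOCK_NS * 1e-9
    bandwidth_mb_s = 256 * 4 / time_s / 1e6
    return clocks, bandwidth_mb_s, transactions / time_s

print(read_256_words(4))    # 2880 clocks, ~71.11 MB/s
print(read_256_words(16))   # 912 clocks, ~224.56 MB/s
```

The larger block roughly triples sustained bandwidth because the fixed 200 ns first-access cost is amortized over four times as much data, with subsequent sets hidden behind the transfers.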

Split Bus Transaction (page 666: elaboration)

°Request-reply:
• CPU initiates a read or write transaction
- Address, data, and command
• Then waits for the reply from memory

°Split bus transaction:
• CPU initiates a read or write transaction
- Address, data, and command
• Memory initiates a reply transaction
- Data (read) or acknowledge (write)

°+ Bandwidth is improved
°- Latency for an individual read/write??

ECE4680 io.47 April 1, 2003

Multiple Potential Bus Masters: the Need for Arbitration

°Bus arbitration scheme:
• A bus master wanting to use the bus asserts the bus request
• A bus master cannot use the bus until its request is granted
• A bus master must signal the arbiter after it finishes using the bus

°Bus arbitration schemes usually try to balance two factors:
• Bus priority: the highest-priority device should be serviced first
• Fairness: even the lowest-priority device should never be completely locked out from the bus

°Bus arbitration schemes can be divided into four broad classes:
• Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus
• Distributed arbitration by collision detection: Ethernet uses this
• Daisy chain arbitration: see the next slide
• Centralized, parallel arbitration: see the slide after next

ECE4680 io.48 April 1, 2003
The Daisy Chain Bus Arbitration Scheme

[Figure: the arbiter's grant line passes from Device 1 (highest priority) through Device N (lowest priority); the release and request lines are shared]

°Advantage: simple

°Disadvantages:
• Cannot assure fairness: a low-priority device may be locked out indefinitely
• The use of the daisy-chained grant signal also limits the bus speed

ECE4680 io.49 April 1, 2003

Centralized Parallel Arbitration

[Figure: each device has its own request and grant lines to a central bus arbiter]

°Used in essentially all processor-memory buses and in high-speed I/O buses

ECE4680 io.50 April 1, 2003
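The fixed-priority behavior that creates the fairness problem can be shown in a few lines. A sketch of one arbitration cycle, not a model of any specific arbiter:

```python
# Fixed-priority arbitration: the lowest-indexed requesting device
# always wins, so a busy high-priority device can starve the others.
def arbitrate(requests):
    """requests: list of booleans, index 0 = highest priority.
    Returns the index of the granted device, or None if the bus idles."""
    for device, req in enumerate(requests):
        if req:
            return device
    return None

print(arbitrate([False, True, True]))    # device 1 wins over device 2
print(arbitrate([False, False, False]))  # None: no requests, bus idle
```

Real arbiters temper this with rotating or partitioned priority so that low-priority devices eventually get the bus.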

Centralized Arbitration with a Bus Arbiter

[Figure: a clocked arbiter takes ReqA, ReqB, and ReqC and produces GrantA, GrantB, and GrantC; highest priority: ReqA, lowest priority: ReqB]

[Timing diagram: with ReqA and ReqB both asserted, GrA is issued first; GrB follows once ReqA is released]

ECE4680 io.51 April 1, 2003

Simple Implementation of a Bus Arbiter

[Figure: the requests are latched in a 3-bit D register (P0-P2), pass through a priority block producing G0-G2, which drive the J inputs of three JK flip-flops whose outputs are GrantA, GrantB, and GrantC]

°What is inside the priority block? See the next slide.

°How is a JK flip-flop implemented? See the slide after next.

ECE4680 io.52 April 1, 2003

Priority Logic

[Figure: combinational priority logic with enable EN; in the usual fixed-priority form, G0 = EN·P0, G1 = EN·¬P0·P1, G2 = EN·¬P0·¬P1·P2]

ECE4680 io.53 April 1, 2003

JK Flip-Flop

°A JK flip-flop can be implemented with a D flip-flop:
• D = J·¬Q(t-1) + ¬K·Q(t-1)

 J   K   Q(t-1)   Q(t)
 0   0   0        0
 0   0   1        1
 0   1   x        0
 1   0   x        1
 1   1   0        1
 1   1   1        0

ECE4680 io.54 April 1, 2003
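The truth table above matches the standard JK characteristic equation D = J·¬Q + ¬K·Q, which can be verified directly (a sketch; the names are mine):

```python
# Next-state function of a JK flip-flop built from a D flip-flop,
# using the characteristic equation D = J*~Q + ~K*Q.
def jk_next(j, k, q):
    return int((j and not q) or (not k and q))

# (J, K, Q(t-1)) -> Q(t): hold, hold, reset, reset, set, set, toggle, toggle
table = [(0, 0, 0, 0), (0, 0, 1, 1), (0, 1, 0, 0), (0, 1, 1, 0),
         (1, 0, 0, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]
for j, k, q, expected in table:
    assert jk_next(j, k, q) == expected
print("JK truth table verified")
```

The four behaviors (hold, reset, set, toggle) are what make JK flip-flops convenient for grant logic: J sets the grant, K clears it.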
1993 MP Server Memory Bus Survey: GTL Revolution

Bus                MBus       Summit      Challenge   XDBus
Originator         Sun        HP          SGI         Sun
Clock rate (MHz)   40         60          48          66
Address lines      36         48          40          muxed
Data lines         64         128         256         144 (parity)
Data sizes (bits)  256        512         1024        512
Clocks/transfer    4          5           4?
Peak (MB/s)        320 (80)   960         1200        1056
Master             Multi      Multi       Multi       Multi
Arbitration        Central    Central     Central     Central
Slots              16         9           10
Buses/system       1          1           1           2
Length             13 inches  12? inches  17 inches

ECE4680 io.55 April 1, 2003

1993 Backplane/IO Bus Survey

Bus                 SBus     TurboChannel  MicroChannel   PCI
Originator          Sun      DEC           IBM            Intel
Clock rate (MHz)    16-25    12.5-25       async          33
Addressing          Virtual  Physical      Physical       Physical
Data sizes (bits)   8,16,32  8,16,24,32    8,16,24,32,64  8,16,24,32,64
Master              Multi    Single        Multi          Multi
Arbitration         Central  Central       Central        Central
32-bit read (MB/s)  33       25            20             33
Peak (MB/s)         89       84            75             111 (222)
Max power (W)       16       26            13             25

ECE4680 io.56 April 1, 2003

High-Speed I/O Bus

°Examples:
• Graphics
• Fast networks

°Limited number of devices

°Data transfers burst at the full rate

°DMA transfers are important:
• A small controller spools a stream of bytes to or from memory

°Either side may need to squelch the transfer:
• Buffers fill up

ECE4680 io.57 April 1, 2003

Break

ECE4680 io.58 April 1, 2003

PCI Transactions

°All signals sampled on the rising edge

°Centralized parallel arbitration:
• Overlapped with the previous transaction

°Bus parking:
• Retain the bus grant for the previous master until another makes a request
• The granted master can start the next transfer without arbitration

°All transfers are bursts, of arbitrary burst length:
• Initiator and target can exert flow control with xRDY
• Target can disconnect the request with STOP (abort or retry)
• Master can disconnect by deasserting FRAME
• Arbiter can disconnect by deasserting GNT

°Delayed (pended, split-phase) transactions:
• Free the bus after a request to a slow device

ECE4680 io.59 April 1, 2003

Summary of Bus Options (Fig 8.14, page 671)

Option         High performance                        Low cost
Bus width      Separate address & data lines           Multiplexed address & data lines
Data width     Wider is faster (e.g., 32 bits)         Narrower is cheaper (e.g., 8 bits)
Transfer size  Multiple words: less bus overhead       Single-word transfer: simpler
Bus masters    Multiple (requires arbitration)         Single master (no arbitration)
Clocking       Synchronous                             Asynchronous
Clock rate     Higher is faster (e.g., 66 MHz for PCI) Lower is easier (e.g., 5 MHz for SCSI)
Protocol       Pipelined                               Serial

ECE4680 io.60 April 1, 2003
