William Stallings Computer Organization and Architecture 7th Edition System Buses


William Stallings

Computer Organization
and Architecture
7th Edition

Chapter 3
System Buses
Program Concept
• Hardwired systems are inflexible
• General purpose hardware can do
different tasks, given correct control
signals
• Instead of re-wiring, supply a new set of
control signals
• Programming is now much easier. Instead of rewiring the
hardware for each new program, all we need to do is provide a
new sequence of codes. Each code is, in effect, an instruction, and
part of the hardware interprets each instruction and generates
control signals. To distinguish this new method of programming, a
sequence of codes or instructions is called software.
Continues
• Figure 3.1b indicates two major components of the
system: an instruction interpreter and a module of
general-purpose arithmetic and logic functions.
These two constitute the CPU. Several other
components are needed to yield a functioning
computer. Data and instructions must be put into the
system. For this we need some sort of input module.
This module contains basic components for
accepting data and instructions in some form and
converting them into an internal form of signals
usable by the system. A means of reporting results is
needed, and this is in the form of an output module.
Taken together, these are referred to as I/O
components.
What is a program?
• A sequence of steps
• For each step, an arithmetic or logical
operation is done
• For each operation, a different set of
control signals is needed
Function of Control Unit
• For each operation a unique code is
provided
—e.g. ADD, MOVE
• A hardware segment accepts the code and
issues the control signals

• We have a computer!
Components
• The Control Unit and the Arithmetic and
Logic Unit constitute the Central
Processing Unit
• Data and instructions need to get into the
system and results out
—Input/output
• Temporary storage of code and results is
needed
—Main memory
Computer Components:
Top Level View
Figure 3.2 illustrates these top-level components and
suggests the interactions among them. The CPU exchanges
data with memory. For this purpose, it typically makes use of
two internal (to the CPU) registers: a memory address
register (MAR), which specifies the address in memory for the
next read or write, and a memory buffer register (MBR),
which contains the data to be written into memory or receives
the data read from memory. Similarly, an I/O address
register (I/OAR) specifies a particular I/O device. An I/O
buffer (I/OBR) register is used for the exchange of data
between an I/O module and the CPU. A memory module
consists of a set of locations, defined by sequentially
numbered addresses. Each location contains a binary number
that can be interpreted as either an instruction or data. An
I/O module transfers data from external devices to CPU and
memory, and vice versa. It contains internal buffers for
temporarily holding these data until they can be sent on.
Instruction Cycle
• Two steps:
—Fetch
—Execute
Fetch Cycle
• Program Counter (PC) holds address of
next instruction to fetch
• Processor fetches instruction from
memory location pointed to by PC
• Increment PC
—Unless told otherwise
• Instruction loaded into Instruction
Register (IR)
• Processor interprets instruction and
performs required actions
Execute Cycle
• Processor-memory
—data transfer between CPU and main memory
• Processor I/O
—Data transfer between CPU and I/O module
• Data processing
—Some arithmetic or logical operation on data
• Control
—Alteration of sequence of operations
—e.g. jump
• Combination of above
Example of Program Execution
• 1. The PC contains 300, the address of the first instruction. This
instruction (the value 1940 in hexadecimal) is loaded into the instruction
register IR and the PC is incremented. Note that this process involves the
use of a memory address register (MAR) and a memory buffer register
(MBR). For simplicity, these intermediate registers are ignored.
• 2. The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is
to be loaded. The remaining 12 bits (three hexadecimal digits) specify the
address (940) from which data are to be loaded.
• 3. The next instruction (5941) is fetched from location 301 and the PC is
incremented.
• 4. The old contents of the AC and the contents of location 941 are added
and the result is stored in the AC.
• 5. The next instruction (2941) is fetched from location 302 and the PC is
incremented.
• 6. The contents of the AC are stored in location 941.
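The six steps above can be condensed into a short Python sketch of the hypothetical machine: 16-bit words, a 4-bit opcode, and a 12-bit address. The opcodes are taken from the walkthrough (1 = load AC, 2 = store AC, 5 = add to AC); halting on any other opcode is an assumption of the sketch, not part of the example.

```python
# Sketch of the hypothetical machine from the example. Opcodes per the
# walkthrough: 1 = load AC, 2 = store AC, 5 = add to AC.
def run(memory, pc):
    ac = 0                        # accumulator
    while pc in memory:
        ir = memory[pc]           # fetch: instruction into IR
        pc += 1                   # increment PC
        opcode, addr = ir >> 12, ir & 0x0FFF   # 4-bit op, 12-bit address
        if opcode == 1:
            ac = memory[addr]     # load AC from memory
        elif opcode == 2:
            memory[addr] = ac     # store AC to memory
        elif opcode == 5:
            ac += memory[addr]    # add memory word to AC
        else:
            break                 # halt on unrecognized opcode (assumption)
    return ac

# The program from steps 1-6: instructions at 300-302, data at 940-941.
mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
       0x940: 0x0003, 0x941: 0x0002}
run(mem, 0x300)
print(hex(mem[0x941]))   # 0x5 -- the sum 3 + 2 stored at location 941
```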
• For example, the PDP-11 processor includes an instruction,
expressed symbolically as ADD B,A, that stores the sum of the
contents of memory locations B and A into memory location A. A
single instruction cycle with the following steps occurs:
• • Fetch the ADD instruction.
• • Read the contents of memory location A into the processor.
• • Read the contents of memory location B into the processor. In
order that the contents of A are not lost, the processor must have
at least two registers for storing memory values, rather than a
single accumulator.
• • Add the two values.
• • Write the result from the processor to memory location A.
Thus, the execution cycle for a particular instruction may involve
more than one reference to memory. Also, instead of memory
references, an instruction may specify an I/O operation. With these
additional considerations in mind, Figure 3.6 provides a more
detailed look at the basic instruction cycle of Figure 3.3. The figure
is in the form of a state diagram. For any given instruction cycle,
some states may be null and others may be visited more than once.
Instruction Cycle State Diagram
• Instruction address calculation (iac): Determine the address of the
next instruction to be executed. Usually, this involves adding a fixed
number to the address of the previous instruction. For example, if each
instruction is 16 bits long and memory is organized into 16-bit words,
then add 1 to the previous address. If, instead, memory is organized as
individually addressable 8-bit bytes, then add 2 to the previous address.
• • Instruction fetch (if): Read instruction from its memory location into
the processor.
• • Instruction operation decoding (iod): Analyze instruction to
determine type of operation to be performed and operand(s) to be used.
• • Operand address calculation (oac): If the operation involves
reference to an operand in memory or available via I/O, then determine
the address of the operand.
• • Operand fetch (of): Fetch the operand from memory or read it in from
I/O.
• • Data operation (do): Perform the operation indicated in the
instruction.

• • Operand store (os): Write the result into memory or out to I/O.
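One way to picture the state diagram is as a generator of state names; since operand address calculation and fetch/store occur once per memory operand, some states are visited more than once, as noted above. The function and its parameters are invented for illustration.

```python
# Illustrative sketch: the states visited in one instruction cycle,
# using the abbreviations above. Each memory source operand adds an
# oac/of pair; each result written back adds an oac/os pair.
def cycle_states(operand_reads, result_writes):
    states = ["iac", "if", "iod"]          # fetch portion of the cycle
    for _ in range(operand_reads):
        states += ["oac", "of"]            # per source operand in memory
    states.append("do")                    # perform the operation
    for _ in range(result_writes):
        states += ["oac", "os"]            # per result stored
    return states

# A memory-to-memory add (like the earlier ADD B,A example):
print(cycle_states(operand_reads=2, result_writes=1))
```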
Interrupts
• Mechanism by which other modules (e.g.
I/O) may interrupt normal sequence of
processing
• Program
—e.g. overflow, division by zero
• Timer
—Generated by internal processor timer
—Used in pre-emptive multi-tasking
• I/O
—from I/O controller
• Hardware failure
—e.g. memory parity error
• Interrupts are provided primarily as a way to
improve processing efficiency. For example, most
external devices are much slower than the
processor. Suppose that the processor is
transferring data to a printer using the
instruction cycle scheme of Figure 3.3. After each
write operation, the processor must pause and
remain idle until the printer catches up. The
length of this pause may be on the order of many
hundreds or even thousands of instruction cycles
that do not involve memory. Clearly, this is a
very wasteful use of the processor.
Program Flow Control
• Figure 3.7a illustrates this state of affairs.
The user program performs a series of
WRITE calls interleaved with processing.
Code segments 1, 2, and 3 refer to
sequences of instructions that do not
involve I/O. The WRITE calls are to an I/O
program that is a system utility and that
will perform the actual I/O operation. The
I/O program consists of three sections:
• A sequence of instructions, labeled 4 in the figure, to
prepare for the actual I/O operation. This may include
copying the data to be output into a special buffer and
preparing the parameters for a device command.
• • The actual I/O command. Without the use of interrupts,
once this command is issued, the program must wait for
the I/O device to perform the requested function (or
periodically poll the device). The program might wait by
simply repeatedly performing a test operation to determine
if the I/O operation is done.
• • A sequence of instructions, labeled 5 in the figure, to
complete the operation. This may include setting a flag
indicating the success or failure of the operation.
• Because the I/O operation may take a relatively long time
to complete, the I/O program is hung up waiting for the
operation to complete; hence, the user program is stopped
at the point of the WRITE call for some considerable period
of time.
Interrupt cycle
• With interrupts, the processor can be engaged in executing
other instructions while an I/O operation is in progress.
Consider the flow of control in Figure 3.7b. As before, the
user program reaches a point at which it makes a system
call in the form of a WRITE call. The I/O program that is
invoked in this case consists only of the preparation code
and the actual I/O command. After these few instructions
have been executed, control returns to the user program.
Meanwhile, the external device is busy accepting data from
computer memory and printing it. This I/O operation is
conducted concurrently with the execution of instructions
in the user program.
Interrupt Cycle
• Added to instruction cycle
• Processor checks for interrupt
—Indicated by an interrupt signal
• If no interrupt, fetch next instruction
• If interrupt pending:
—Suspend execution of current program
—Save context
—Set PC to start address of interrupt handler
routine
—Process interrupt
—Restore context and continue interrupted
program
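The steps above can be sketched as a loop with the interrupt check appended after each execute cycle. This is a toy model: the handler table, context stack, and no-op execute step are hypothetical stand-ins for real hardware.

```python
# Sketch of the instruction cycle extended with an interrupt check.
pending = []                   # interrupt signals raised by other modules
context_stack = []             # saved PCs of suspended programs
handlers = {"io": 0x500}       # hypothetical handler start addresses

def step(pc):
    """One fetch-execute pass; execution itself is a no-op here."""
    return pc + 1

def instruction_cycle(pc, steps):
    for _ in range(steps):
        pc = step(pc)                  # fetch and execute as before
        if pending:                    # interrupt cycle: check for signal
            irq = pending.pop(0)
            context_stack.append(pc)   # save context of current program
            pc = handlers[irq]         # set PC to handler start address
    return pc

pending.append("io")                   # an I/O module raises an interrupt
pc = instruction_cycle(0x300, steps=1)
print(hex(pc), hex(context_stack[-1]))   # 0x500 0x301
```

On return from the handler, the saved PC would be popped to resume the interrupted program at the point of interruption.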
Transfer of Control via Interrupts
Instruction Cycle with Interrupts
• The processor now proceeds to the fetch cycle and fetches
the first instruction in the interrupt handler program, which
will service the interrupt. The interrupt handler program is
generally part of the operating system. Typically, this
program determines the nature of the interrupt and
performs whatever actions are needed. In the example we
have been using, the handler determines which I/O module
generated the interrupt and may branch to a program that
will write more data out to that I/O module. When the
interrupt handler routine is completed, the processor can
resume execution of the user program at the point of
interruption. It is clear that there is some overhead involved in this
process. Extra instructions must be executed (in the interrupt handler) to
determine the nature of the interrupt and to decide on the appropriate
action. Nevertheless, because of the relatively large amount of time that
would be wasted by simply waiting on an I/O operation, the processor can
be employed much more efficiently with the use of interrupts.
Program Timing
Short I/O Wait
Program Timing
Long I/O Wait
Multiple interrupts
• Multiple interrupts can occur. For
example, a program may be receiving
data from a communications line and
printing results. The printer will generate
an interrupt every time that it completes a
print operation. The communication line
controller will generate an interrupt every
time a unit of data arrives. The unit could
either be a single character or a block,
depending on the nature of the
communications discipline.
Instruction Cycle (with Interrupts) -
State Diagram
Multiple Interrupts solution
• Disable interrupts
—Processor will ignore further interrupts whilst
processing one interrupt
—Interrupts remain pending and are checked
after first interrupt has been processed
—Interrupts handled in sequence as they occur
• A disabled interrupt simply means that the processor can and will
ignore that interrupt request signal. If an interrupt occurs during
this time, it generally remains pending and will be checked by the
processor after the processor has enabled interrupts. Thus, when
a user program is executing and an interrupt occurs, interrupts are
disabled immediately. After the interrupt handler routine
completes, interrupts are enabled before resuming the user
program, and the processor checks to see if additional interrupts
have occurred. This approach is nice and simple, as interrupts are
handled in strict sequential order.
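The disable-interrupts approach amounts to a queue: requests that arrive while a handler runs remain pending and are serviced in arrival order once interrupts are re-enabled. A minimal sketch, with invented names:

```python
# Sketch: with interrupts disabled during handling, requests queue up
# and are handled strictly in the order they occurred -- no nesting.
def handle_sequentially(arrivals):
    pending = list(arrivals)   # raised while a handler runs: held pending
    order = []
    while pending:             # checked after each handler completes
        order.append(pending.pop(0))
    return order

print(handle_sequentially(["printer", "comm line", "disk"]))
# ['printer', 'comm line', 'disk'] -- strict order of occurrence
```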
Multiple Interrupts - Sequential
Multiple Interrupts – Nested
Multiple interrupts solution
• Define priorities
—Low priority interrupts can be interrupted by
higher priority interrupts
—When higher priority interrupt has been
processed, processor returns to previous
interrupt.
• As an example of this second approach, consider a system with three I/O
devices: a printer, a disk, and a communications line, with increasing
priorities of 2, 4, and 5, respectively. Figure 3.14, based on an example in
[TANE97], illustrates a possible sequence. A user program begins at t = 0. At
t = 10, a printer interrupt occurs; user information is placed on the system
stack and execution continues at the printer interrupt service routine
(ISR). While this routine is still executing, at t = 15, a communications
interrupt occurs. Because the communications line has higher priority than
the printer, the interrupt is honored. The printer ISR is interrupted, its state
is pushed onto the stack, and execution continues at the communications
ISR. While this routine is executing, a disk interrupt occurs (t = 20). Because
this interrupt is of lower priority, it is simply held, and the communications
ISR runs to completion.
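The preemption rule in this example can be sketched in a few lines: a request preempts the running routine only if its priority is strictly higher; otherwise it is held until the ISR stack unwinds. The names and data structures are invented for illustration.

```python
# Sketch of the priority rule from the example:
# printer = 2, disk = 4, communications line = 5.
PRIO = {"user": 0, "printer": 2, "disk": 4, "comm": 5}
stack, held = ["user"], []     # running-routine stack, held requests

def interrupt(irq):
    if PRIO[irq] > PRIO[stack[-1]]:
        stack.append(irq)      # state pushed; ISR starts immediately
        return "runs"
    held.append(irq)           # lower priority: held until stack unwinds
    return "held"

print(interrupt("printer"))    # t = 10: preempts the user program
print(interrupt("comm"))       # t = 15: higher priority than the printer
print(interrupt("disk"))       # t = 20: lower than the comm line
```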
Time Sequence of Multiple Interrupts
Connecting
• All the units must be connected
• Different type of connection for different
type of unit
—Memory
—Input/Output
—CPU
Computer Modules
Memory Connection
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
—Read
—Write
—Timing
Input/Output Connection(1)
• Similar to memory from computer’s
viewpoint
• Output
—Receive data from computer
—Send data to peripheral
• Input
—Receive data from peripheral
—Send data to computer
Input/Output Connection(2)
• Receive control signals from computer
• Send control signals to peripherals
—e.g. spin disk
• Receive addresses from computer
—e.g. port number to identify peripheral
• Send interrupt signals (control)
CPU Connection
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts
Buses
A bus is a communication pathway connecting two or more
devices. A key characteristic of a bus is that it is a shared
transmission medium. Multiple devices connect to the bus,
and a signal transmitted by any one device is available for
reception by all other devices attached to the bus. If two
devices transmit during the same time period, their signals
will overlap and become garbled. Thus, only one device at a
time can successfully transmit.

•There are a number of possible interconnection systems
•Single and multiple BUS structures are
most common
•e.g. Control/Address/Data bus (PC)
•e.g. Unibus (DEC-PDP)
What is a Bus?
• A communication pathway connecting two
or more devices
• Usually broadcast
• Often grouped
—A number of channels in one bus
—e.g. 32 bit data bus is 32 separate single bit
channels
• Power lines may not be shown
• Computer systems contain a number of different buses
that provide pathways between components at various
levels of the computer system hierarchy. A bus that
connects major computer components (processor,
memory, I/O) is called a system bus.
Data Bus
The data lines provide a path for moving
data among system modules. These
lines, collectively, are called the data bus.
•Carries data
—Remember that there is no difference between
“data” and “instruction” at this level
•Width is a key determinant of performance
—8, 16, 32, 64 bit
Address bus
• Identify the source or destination of data
• e.g. CPU needs to read an instruction
(data) from a given location in memory
• Bus width determines maximum memory
capacity of system
—e.g. 8080 has 16 bit address bus giving 64k
address space
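The capacity claim is simple arithmetic: n address lines can name 2 to the power n distinct locations.

```python
# Address bus width fixes maximum memory capacity: n lines -> 2**n
# addressable locations.
def address_space(width_bits):
    return 2 ** width_bits

print(address_space(16))   # 65536 -- the 8080's 64K address space
print(address_space(32))   # 4294967296 -- 4G locations on a 32-bit bus
```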
Control Bus
The control lines are used to control the
access to and the use of the data and
address lines. Because the data and address
lines are shared by all components,
there must be a means of controlling their
use.
•Control and timing information
—Memory read/write signal
—Interrupt request
—Clock signals
Continue . . .
Typical control lines include
• Memory write: Causes data on the bus to be written into the addressed
location
• Memory read: Causes data from the addressed location to be placed on
the bus
• I/O write: Causes data on the bus to be output to the addressed I/O port
• I/O read: Causes data from the addressed I/O port to be placed on the
bus
• Transfer ACK: Indicates that data have been accepted from or placed on
the bus
• Bus request: Indicates that a module needs to gain control of the bus
• Bus grant: Indicates that a requesting module has been granted control of
the bus
• Interrupt request: Indicates that an interrupt is pending
• Interrupt ACK: Acknowledges that the pending interrupt has been
recognized
• Clock: Is used to synchronize operations
• Reset: Initializes all modules
Operation of a bus
The operation of the bus is as follows. If one
module wishes to send data to another,
it must do two things: (1) obtain the use of
the bus, and (2) transfer data via
the bus. If one module wishes to request
data from another module, it must (1)
obtain the use of the bus, and (2) transfer a
request to the other module over the
appropriate control and address lines. It
must then wait for that second module to
send the data.
Bus Interconnection Scheme
Bus arrangement
Physically, the system bus is actually a
number of parallel electrical conductors.
In the classic bus arrangement, these
conductors are metal lines etched
in a card or board (printed circuit board).
The bus extends across all of the system
components, each of which taps into some
or all of the bus lines. The classic
physical arrangement is depicted in Figure
3.17
Big and Yellow?
• What do buses look like?
—Parallel lines on circuit boards
—Ribbon cables
—Strip connectors on mother boards
– e.g. PCI
—Sets of wires
Physical Realization of Bus Architecture
Continue
In this example, the bus consists of two vertical
columns of conductors. At regular intervals along
the columns, there are attachment points in the
form of slots that extend out horizontally to support
a printed circuit board. Each of the major system
components occupies one or more boards and
plugs into the bus at these slots. However, modern
systems tend to have all of the major components
on the same board with more elements on the
same chip as the processor. Thus, an on-chip bus
may connect the processor and cache memory,
whereas an on-board bus may connect the
processor to main memory and
other components.
Single Bus Problems
• Lots of devices on one bus leads to:
• In general, the more devices attached to the bus, the greater the
bus length and hence the greater the propagation delay. This
delay determines the time it takes for devices to coordinate the
use of the bus. When control of the bus passes from one device
to another frequently, these propagation delays can noticeably
affect performance.
• The bus may become a bottleneck as the aggregate data transfer
demand approaches the capacity of the bus. This problem can be
countered to some extent by increasing the data rate that the bus
can carry and by using wider buses (e.g., increasing the data bus
from 32 to 64 bits). However, because the data rates generated
by attached devices (e.g., graphics and video controllers, network
interfaces) are growing rapidly, this is a race that a single bus is
ultimately destined to lose.

– Most systems use multiple buses to overcome these problems
Traditional Bus
Accordingly, most computer systems use multiple buses,
generally laid out in a hierarchy. A typical traditional structure is
shown in Figure next slide. There is a local bus that connects the
processor to a cache memory and that may support one or more
local devices. The cache memory controller connects the cache
not only to this local bus, but to a system bus to which are
attached all of the main memory modules. As will be discussed
in Chapter 4, the use of a cache structure insulates the processor
from a requirement to access main memory frequently. Hence,
main memory can be moved off of the local bus onto a system
bus. In this way, I/O transfers to and from the main memory
across the system bus do not interfere with the processor’s
activity.
Traditional (ISA)
(with cache)
High Performance Bus
This traditional bus architecture is reasonably efficient but begins
to break down as higher and higher performance is seen in the I/O
devices. In response to these growing demands, a common
approach taken by industry is to build a high-speed bus that is
closely integrated with the rest of the system, requiring only a
bridge between the processor’s bus and the high-speed bus. This
arrangement is sometimes known as a mezzanine architecture.
Figure 3.18b shows a typical realization of this approach. Again, there is a
local bus that connects the processor to a cache controller, which is in turn
connected to a system bus that supports main memory. The cache controller is
integrated into a
bridge, or buffering device, that connects to the high-speed bus.
The advantage of this arrangement is that the high-speed bus
brings high demand devices into closer integration with the
processor.
High Performance Bus
Bus Types
• Dedicated & Multiplexed
• Dedicated
A dedicated bus line is permanently assigned either to one
function or to a physical subset of computer components.
An example of functional dedication is the use of separate dedicated
address and data lines, which is common on many buses.
However, it is not essential. For example, address and data
information may be transmitted over the same set of lines using
an Address Valid control line. At the beginning of a data transfer,
the address
is placed on the bus and the Address Valid line is activated. At this
point, each module has a specified period of time to copy the
address and determine if it is the addressed module. The address
is then removed from the bus, and the same bus connections are
used for the subsequent read or write data transfer. This
method of
using the same lines for multiple purposes is known as time
multiplexing.
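The time-multiplexing scheme above can be pictured as two time slots on the same AD lines: address first (with Address Valid asserted), then data. The dict layout and function name are invented for illustration.

```python
# Sketch of time multiplexing on shared address/data (AD) lines.
def multiplexed_transfer(address, data):
    slots = []
    # slot 1: address on the shared lines, Address Valid asserted;
    # each module has a fixed time to copy the address and decide
    # whether it is the addressed module
    slots.append({"ad_lines": address, "address_valid": True})
    # slot 2: address removed, same lines reused for the data transfer
    slots.append({"ad_lines": data, "address_valid": False})
    return slots

for slot in multiplexed_transfer(0x940, 0x1234):
    print(slot)
```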
Continue . . .
The advantage of time multiplexing is the use of fewer lines,
which saves space and, usually, cost. The disadvantage is that
more complex circuitry is needed within each module. Also,
there is a potential reduction in performance.
Physical dedication refers to the use of multiple buses, each of
which connects only a subset of modules. A typical example is
the use of an I/O bus to interconnect all I/O modules; this bus is
then connected to the main bus through some type of I/O adapter
module. The potential advantage of physical dedication is high
throughput, because there is less bus contention. A disadvantage
is the increased size and cost of
the system.
Bus Arbitration
In all but the simplest systems, more than one module may need control of the
bus. For example, an I/O module may need to read or write directly to memory,
without sending the data to the processor. Because only one unit at a time can
successfully transmit over the bus, some method of arbitration is needed.

•More than one module controlling the bus
•e.g. CPU and DMA controller
•Only one module may control bus at one
time
•Arbitration may be centralised or
distributed
Centralised or Distributed Arbitration
• Centralised
—Single hardware device controlling bus access
– Bus Controller
– Arbiter
—May be part of CPU or separate
• Distributed
—Each module may claim the bus
—Control logic on all modules
Timing
TIMING refers to the way in which events are coordinated on the bus. Buses
use either synchronous timing or asynchronous timing.
•Synchronous
The occurrence of events on the bus is determined by a clock. The bus
includes a clock line upon which a clock transmits a regular sequence of
alternating 1s and 0s of equal duration. A single 1–0 transmission is referred
to as a clock cycle or bus cycle and defines a time slot. All other devices on
the bus can read the clock line, and all events start at the beginning of a clock
cycle.
Synchronous Timing Diagram
Continue
In this simple example, the processor places a memory address
on the address lines during the first clock cycle and may assert
various status lines. Once the address lines have stabilized, the
processor issues an address enable signal. For a read operation,
the processor issues a read command at the start of the second
cycle. A memory module recognizes the address and, after a
delay of one cycle, places the data on the data lines. The
processor reads the data from the data lines and drops the read
signal.
For a write operation, the processor puts the data on the data
lines at the start of the second cycle, and issues a write command
after the data lines have stabilized. The memory module copies
the information from the data lines during the third clock cycle.
Asynchronous Timing
With asynchronous timing, the occurrence of one event on a bus follows
and depends on the occurrence of a previous event. In the simple
read example of Figure 3.20a, the processor places address and
status signals on the bus. After pausing for these signals to
stabilize, it issues a read command, indicating the presence
of valid address and control signals. The appropriate memory
module decodes the address and responds by placing the data on the data
lines. Once the data lines have stabilized, the memory module
asserts the acknowledge line to signal the processor
that the data are available. Once the master has read the data
from the data lines, it deasserts the read signal. This causes the
memory module to drop the data and acknowledge lines. Finally,
once the acknowledge line is dropped, the master removes the
address information.
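The read handshake can be traced step by step in code; each event waits on the previous one rather than on a clock. The trace strings paraphrase the text, and the function and names are illustrative.

```python
# Sketch of the asynchronous read handshake described above.
def async_read(memory, address):
    trace = ["master: place address and status signals",
             "master: assert read (signals have stabilized)"]
    data = memory[address]       # addressed module decodes and responds
    trace += ["memory: place data on data lines, assert acknowledge",
              "master: latch data, deassert read",
              "memory: drop data and acknowledge lines",
              "master: remove address (acknowledge dropped)"]
    return data, trace

data, trace = async_read({0x940: 0x0003}, 0x940)
print(data, len(trace))   # 3 6
```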
Asynchronous Timing – Read Diagram
Asynchronous Timing
Figure 3.20b shows a simple asynchronous
write operation. In this case, the
master places the data on the data lines at
the same time that it puts signals on the
status and address lines. The memory
module responds to the write command by
copying the data from the data lines and
then asserting the acknowledge line. The
master then drops the write signal and the
memory module drops the acknowledge
signal.
Asynchronous Timing – Write Diagram
Continue
Synchronous timing is simpler to implement
and test. However, it is less flexible
than asynchronous timing. Because all
devices on a synchronous bus are tied to
a fixed clock rate, the system cannot take
advantage of advances in device
performance.
With asynchronous timing, a mixture of slow
and fast devices, using older
and newer technology, can share a bus.
PCI Bus
The peripheral component interconnect (PCI) is a popular
high-bandwidth, processor-independent bus that can function
as a mezzanine or peripheral bus.
Compared with other common bus specifications, PCI delivers
better system performance for high-speed I/O subsystems
(e.g., graphic display adapters, network
interface controllers, disk controllers, and so on).
The current standard allows the use of up to 64
data lines at 66 MHz, for a raw transfer rate of 528
MByte/s, or 4.224 Gbps. But it is not just a high
speed that makes PCI attractive. PCI is specifically
designed to meet economically the I/O
requirements of modern systems; it requires very
few chips to implement and supports other buses
attached to the PCI bus.
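The quoted figures are consistent with each other, which a quick calculation confirms: 64 data lines, one transfer per 66 MHz clock at peak.

```python
# Checking the quoted PCI figures: 64 data lines clocked at 66 MHz.
lines, clock_hz = 64, 66_000_000
bits_per_second = lines * clock_hz   # one transfer per clock, peak rate
print(bits_per_second / 1e9)         # 4.224 (Gbps)
print(bits_per_second / 8 / 1e6)     # 528.0 (MByte/s)
```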
Continue
PCI is designed to support a variety of
microprocessor-based configurations, including
both single- and multiple-processor systems.
Accordingly, it provides a general-purpose set of
functions. It makes use of synchronous timing and
a centralized arbitration scheme.
Figure 3.22a shows a typical use of PCI in a
single-processor system. A combined
DRAM controller and bridge to the PCI bus
provides tight coupling with the
processor and the ability to deliver data at
high speeds. The bridge acts as a data
buffer so that the speed of the PCI bus may
differ from that of the processor’s I/O
capability.
Diagram
Multiprocessor
In a multiprocessor system (Figure 3.22b),
one or more PCI configurations
may be connected by bridges to the
processor’s system bus. The system bus
supports only the processor/cache units,
main memory, and the PCI bridges. Again,
the use of bridges keeps the PCI
independent of the processor speed yet
provides
the ability to receive and deliver data
rapidly.
Diagram
PCI Bus structure
• 49 mandatory signal lines
• Systems lines
— Including clock and reset
• Address & Data
— 32 time mux lines for address/data
— Interrupt & validate lines
• Interface Control
— Control the timing of transactions and provide coordination among initiators and targets
• Arbitration
— Not shared
— Direct connection to PCI bus arbiter
• Error lines
— Used to report parity and other errors
PCI Bus Lines (Optional)
• Interrupt lines
— Provided for PCI devices that must generate requests for service
— Not shared
— Each device has its own interrupt line.
• Cache support
— Pins needed to support a memory on PCI that can be cached in the processor or another device
• 64-bit Bus Extension
— 32 lines, time multiplexed for addresses and data, combined with the mandatory address/data lines to form a 64-bit address/data bus
• JTAG/Boundary Scan
—For testing procedures
PCI Commands
Bus activity occurs in the form of transactions between an
initiator, or master, and a target. When a bus master acquires
control of the bus, it determines the type of transaction that will
occur next. During the address phase of the transaction, the
C/BE lines are used to signal the transaction type. The
commands are as follows:
1) Interrupt Acknowledge
2) Special Cycle
3) I/O Read
4) I/O Write
5) Memory Read
6) Memory Read Line
7) Memory Read Multiple
8) Memory Write
9) Memory Write and Invalidate
10) Configuration Read
11) Configuration Write
12) Dual Address Cycle
Continues . . .
Interrupt Acknowledge is a read command intended for the
device that functions as an interrupt controller on the PCI bus.
The Special Cycle command is used by the initiator to broadcast
a message to one or more targets.
The I/O Read and Write commands are used to transfer data
between the initiator and an I/O controller. Each I/O device has
its own address space, and the address lines are used to indicate a
particular device and to specify the data to be transferred
to or from that device.
The memory read and write commands are used to specify the
transfer of a burst of data, occupying one or more clock cycles.
Continues . . . .
The Memory Write command is used to transfer data in one or
more data cycles to memory.
The Memory Write and Invalidate command transfers data in
one or more cycles to memory. In addition, it guarantees that at
least one cache line is written.
The two configuration commands enable a master to read and
update configuration parameters in a device connected to the
PCI. Each PCI device may include up to 256 internal registers
that are used during system initialization to configure
that device.
The Dual Address Cycle command is used by an initiator to
indicate that it is using 64-bit addressing.
Data transfer
Every data transfer on the PCI bus is a single transaction consisting of one
address phase and one or more data phases. In this discussion, we illustrate a
typical read operation; a write operation proceeds similarly. Figure 3.23
shows the timing of the read transaction. All events are synchronized to the
falling transitions of the clock, which occur in the middle of each clock cycle.
Bus devices sample the bus lines on the rising edge at the beginning of a bus
cycle. The following are the significant events, labeled on the diagram:
a. Once a bus master has gained control of the bus, it may begin the
transaction by asserting FRAME. This line remains asserted until the initiator
is ready to complete the last data phase. The initiator also puts the start
address on the address bus, and the read command on the C/BE lines.
b. At the start of clock 2, the target device will recognize its address on the
AD lines.
c. The initiator ceases driving the AD bus. A turnaround cycle
(indicated by the two circular arrows) is required on all signal lines
that may be driven by more than one device, so that the dropping
of the address signal will prepare the bus for use by the target
device. The initiator changes the information on the C/BE
lines to designate which AD lines are to be used for transfer for the
currently addressed data (from 1 to 4 bytes). The initiator also
asserts IRDY to indicate that it is ready for the first data item.
d. The selected target asserts DEVSEL to indicate that it has
recognized its address and will respond. It places the requested
data on the AD lines and asserts TRDY to indicate that valid data
are present on the bus.
e. The initiator reads the data at the beginning of clock 4 and
changes the byte enable lines as needed in preparation for the next
read.
f. In this example, the target needs some time to prepare the second
block of data for transmission. Therefore, it deasserts TRDY to
signal the initiator that there will not be new data during the coming
cycle. Accordingly, the initiator does not read the data lines at the
beginning of the fifth clock cycle and does not change byte enable
during that cycle. The block of data is read at the beginning of clock 6.
g. During clock 6, the target places the third data item on the bus. However, in
this example, the initiator is not yet ready to read the data item (e.g., it has a
temporary buffer full condition). It therefore deasserts IRDY. This will cause the
target to maintain the third data item on the bus for an extra clock cycle.
h. The initiator knows that the third data transfer is the last, and so it deasserts
FRAME to signal the target that this is the last data transfer. It also asserts IRDY
to signal that it is ready to complete that transfer.
i. The initiator deasserts IRDY, returning the bus to the idle state, and the target
deasserts TRDY and DEVSEL.
PCI Read Timing Diagram
PCI Arbitration
PCI makes use of a centralized, synchronous arbitration scheme
in which each master has a unique request (REQ) and grant
(GNT) signal. These signal lines are attached to a central arbiter
(Figure 3.24) and a simple request–grant handshake is used to
grant access to the bus. Figure 3.25 is an example in which
devices A and B are arbitrating for the bus.
The following sequence occurs:
a. At some point prior to the start of clock 1, A has asserted its
REQ signal. The arbiter samples this signal at the beginning of
clock cycle 1.
b. During clock cycle 1, B requests use of the bus by asserting its
REQ signal.
c. At the same time, the arbiter asserts GNT-A to grant bus
access to A.
d. Bus master A samples GNT-A at the beginning of clock 2 and learns that it has been
granted bus access. It also finds IRDY and TRDY deasserted, indicating that the bus is
idle. Accordingly, it asserts FRAME and places the address information on the address
bus and the command on the C/BE bus (not shown). It also continues to assert REQ-A,
because it has a second transaction to perform after this one.
e. The bus arbiter samples all REQ lines at the beginning of clock 3 and makes an
arbitration decision to grant the bus to B for the next transaction. It then asserts GNT-B
and deasserts GNT-A. B will not be able to use the bus until the bus returns to an idle state.
f. A deasserts FRAME to indicate that the last (and only) data transfer is in
progress. It puts the data on the data bus and signals the target with IRDY. The
target reads the data at the beginning of the next clock cycle.
g. At the beginning of clock 5, B finds IRDY and FRAME deasserted and so is able to
take control of the bus by asserting FRAME. It also deasserts its REQ line, because it
only wants to perform one transaction. Subsequently, master A is granted access to the
bus for its next transaction.
Notice that arbitration can take place at the same time that the current bus
master is performing a data transfer. Therefore, no bus cycles are lost in performing
arbitration. This is referred to as hidden arbitration.
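The request–grant handshake can be sketched as a small central arbiter that samples every master's REQ line each clock and drives at most one GNT. This is only a sketch: the PCI specification leaves the arbitration algorithm to the implementation (requiring only fairness), so the round-robin policy below is an assumed example, not the scheme of Figure 3.24:

```python
class RoundRobinArbiter:
    """Toy central PCI arbiter: samples one REQ line per master each clock
    and grants at most one master, rotating priority so no master starves.
    (Assumption: PCI mandates only fairness, not a particular algorithm.)"""

    def __init__(self, n_masters: int):
        self.n = n_masters
        self.last = self.n - 1  # so master 0 has priority on the first clock

    def clock(self, req: list) -> int:
        """Sample the REQ lines; return the index of the granted master
        (the asserted GNT line), or -1 if no one is requesting."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if req[candidate]:
                self.last = candidate
                return candidate
        return -1

# Two masters, A (index 0) and B (index 1), roughly as in Figure 3.25:
arb = RoundRobinArbiter(2)
print(arb.clock([True, False]))  # only A requests -> GNT-A (0)
print(arb.clock([True, True]))   # both request -> B's turn, GNT-B (1)
print(arb.clock([True, False]))  # B done -> back to A (0)
```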
PCI Bus Arbiter
PCI Bus Arbitration
Foreground Reading
• Stallings, chapter 3 (all of it)
• www.pcguide.com/ref/mbsys/buses/
• In fact, read the whole site!
• www.pcguide.com/