Block Interconnection: Today's Topics Divide Into Two
Block Interconnection
Today’s topics divide into two:
Problems of logical interfacing
❏ Standardization of interfaces
❍ AMBA
❍ OCP
❍ …
❏ Interface operation
IP blocks typically come as ‘black boxes’ and it is the function and the interface
which are of interest to the developer. Having some standard interfaces allows
blocks to be composed easily.
AMBA comes in several ‘flavours’, including:
❏ Advanced Peripheral Bus (APB)
❏ Advanced High-performance Bus (AHB)
❏ Advanced eXtensible Interface (AXI)
which are used here as examples.
‘Traditional’ bus
[Figure: example bus timing, with signals CS, Read, Write, Read data and Write data]
Although the data is shown here as unidirectional, off-chip buses typically use
bidirectional data signals so must be either reading or writing when active. This
is due to pin restrictions on the package (and wiring on the PCB).
On-chip buses are limited by distance but not (particularly) restricted by width
because there is a considerable wiring resource on a chip. However on-chip
signals are now ‘universally’ unidirectional so that electrical buffers can be inserted
along the wire to keep switching edge speed reasonably rapid.
[Figure labels: voltage threshold, ‘0’ time]
Possible solutions:
❏ Increase the drive strength
❏ Decrease the load
However the wires also have resistance which slows down the edge more at
greater distances from the driver. The first solution is therefore not as effective
as might be thought at driving longer wires.
The load can be decreased by ‘cutting’ the wire and inserting buffers (amplifiers)
at intervals. These also insert delay but keep the edges fast.
Buffers have an input and an output so the wires, necessarily, are unidirectional.
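The benefit of cutting a wire can be illustrated with a simple distributed-RC (Elmore) estimate, in which the delay of an unbuffered wire grows with the square of its length. The sketch below is an illustration only and is not taken from these notes: the resistance and capacitance per millimetre and the buffer delay are hypothetical round numbers, and the buffers' own input capacitance and output resistance are ignored.

```python
def wire_delay_ps(length_mm, r_ohm_per_mm=100.0, c_ff_per_mm=200.0):
    """Elmore estimate for a distributed RC wire: 0.5 * R_total * C_total."""
    r_total = r_ohm_per_mm * length_mm           # ohms
    c_total = c_ff_per_mm * length_mm * 1e-15    # farads
    return 0.5 * r_total * c_total * 1e12        # picoseconds


def buffered_delay_ps(length_mm, segments, buffer_delay_ps=20.0):
    """Delay when the wire is cut into equal segments, each driven by a buffer."""
    per_segment = wire_delay_ps(length_mm / segments)
    return segments * (per_segment + buffer_delay_ps)


if __name__ == "__main__":
    print(wire_delay_ps(10.0))           # one long wire
    print(buffered_delay_ps(10.0, 5))    # same wire cut into five pieces
```

With these assumed values a 10 mm wire is estimated at about 1000 ps unbuffered and about 300 ps when cut into five buffered segments: each buffer adds delay, but the quadratic wire term shrinks faster.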
Buffers
The term “buffer” as applied here refers purely to an electrical amplifier.
“Buffer” is also used to refer to, for example, latches which hold data and
are thus part of the logic. Beware of potential confusion!
University of Manchester School of Computer Science
APB
[Figure: timing diagram with signals Clk, Select, Write and Enable]
❏ Simple
❏ Single master
❏ Used for low speed peripherals
Example
A peripheral I/O device may have only a small number of registers (say 16) but
be allocated a ‘page’ (say 1KB) of the memory map. It could indicate if an
access was apparently to that device but not to one of its valid registers.
Alternatively, it could indicate an attempt to write to a read-only register.
This cannot be done by a typical MMU which will not resolve translations to
individual words, only pages.
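The check described in this example can be sketched as a small behavioural model. This is a minimal sketch, not a description of any real device: the 16 registers and 1 KB page come from the text, while the 4-byte register width, the `Response` names, the base address and the choice of read-only register are assumptions made for illustration.

```python
from enum import Enum


class Response(Enum):
    OKAY = 0
    ERROR = 1


class Peripheral:
    """A device with 16 word registers occupying a 1 KB page of the memory map."""

    PAGE_BYTES = 1024
    NUM_REGISTERS = 16
    WORD_BYTES = 4                              # assumption: 32-bit registers

    def __init__(self, base, read_only=()):
        self.base = base
        self.read_only = set(read_only)         # hypothetical read-only register indices
        self.regs = [0] * self.NUM_REGISTERS

    def selected(self, addr):
        """True if the address falls anywhere in this device's page."""
        return self.base <= addr < self.base + self.PAGE_BYTES

    def access(self, addr, write=False, data=0):
        """Return (response, read_data) for an access inside this page."""
        offset = addr - self.base
        index = offset // self.WORD_BYTES
        if offset % self.WORD_BYTES or index >= self.NUM_REGISTERS:
            return Response.ERROR, None         # in the page, but not a valid register
        if write:
            if index in self.read_only:
                return Response.ERROR, None     # write to a read-only register
            self.regs[index] = data
            return Response.OKAY, None
        return Response.OKAY, self.regs[index]
```

The device resolves individual words inside its own page, which is the granularity a page-based MMU cannot provide.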
❏ Moderately complex
❏ Multi-master via centralised arbitration
❏ Bus cycles can be extended or aborted
❏ Used for processor buses on medium performance devices (e.g. ARM9)
AHB
AHB operation is pipelined, so that as one set of data is transferred the subsequent
address can be sent.
[Figure: a bus master connected to three devices, with addr_0, addr_1 and data_0, data_1 shown in successive phases. Annotations: wait prevents other devices from starting; error causes master to remove command.]
AHB increases performance by pipelining. For example, in a read operation the bus
master outputs an address and status asking for the read on a rising clock edge. This is
decoded and selects the appropriate slave device.
On the next active clock edge the slave is expected to latch the address and start
the read. At this point the bus master can start the next cycle.
On the next active clock edge the master must:
❏ latch the first input data
❏ provide output data if the second cycle was a write operation
❏ start the third cycle (if appropriate)
This sequencing allows faster bus throughput but causes certain difficulties
when things don’t go smoothly.
❏ If a peripheral is slow and needs to insert wait states it does this in
the data phase. Other peripherals need to monitor this because, if
one is being addressed ‘next’, it needs to defer starting.
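The overlap of address and data phases can be sketched by counting clock cycles. This is a simplified model under stated assumptions, not the AMBA signalling: every transfer has a one-cycle address phase and a one-cycle data phase, a slow device extends only its data phase, and the function names are hypothetical.

```python
def pipelined_cycles(wait_states):
    """Cycles to finish all transfers when address and data phases overlap.

    wait_states[i] is the number of wait states inserted in the data
    phase of transfer i.
    """
    # One cycle for the first address phase; every later address phase is
    # hidden behind the previous transfer's data phase.
    return 1 + sum(1 + w for w in wait_states)


def unpipelined_cycles(wait_states):
    """Cycles if each transfer had to finish before the next address was sent."""
    return sum(2 + w for w in wait_states)


def data_phase_end(wait_states):
    """Cycle number (counting from 0) in which each data phase completes."""
    ends, cycle = [], 0              # address 0 is driven in cycle 0
    for w in wait_states:
        cycle += 1 + w               # a data phase plus its wait states
        ends.append(cycle)           # the next device defers until this point
    return ends
```

For example, `data_phase_end([0, 2, 0])` gives `[1, 4, 5]`: the third transfer's address is already on the bus while the second device is waiting, but its data phase cannot start until the second one completes.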
[Figure: timing diagram of Ready, Valid and Data, with data items A to E]
AXI-like pipeline
Consider a synchronous AXI pipeline stage.
[Figure: pipeline stage with signals valid_in, data_in, ready_in and valid_out, data_out, ready_out]
The intention is to pass data on every clock cycle.
Data moves across an interface if both valid and ready are active.
If you indicate (upstream) you are willing to accept data (ready) that is
a commitment.
There is not time to propagate a control signal throughout the pipe!
❏ Solution 1
❍ Don’t indicate possible acceptance until you are empty
❍ Benefit: simple to design
❍ Consequence: the pipeline will never be more than half full
❏ Solution 2
❍ Be prepared to accept new data even if you couldn’t pass on the current packet
❍ Benefit: full bandwidth available
❍ Consequence: twice as many flip-flops in each stage (half are normally unused)
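Solution 2 can be sketched as a behavioural model of one stage with a spare register, an arrangement often called a ‘skid buffer’. This is a minimal sketch rather than a definitive design: the class and attribute names are hypothetical, `None` stands for an empty register, and `upstream_ready` depends only on stored state so that no control signal has to ripple through the pipe within a cycle.

```python
class SkidStage:
    """One pipeline stage that can absorb a single item after a stall."""

    def __init__(self):
        self.main = None    # output register (None means empty)
        self.skid = None    # spare register, normally unused

    @property
    def upstream_ready(self):
        """Ready signal shown to the previous stage; a registered commitment."""
        return self.skid is None

    @property
    def valid(self):
        """Valid signal shown to the next stage."""
        return self.main is not None

    def clock(self, valid_in, data_in, downstream_ready):
        """Advance one clock edge; return the item passed downstream, if any."""
        accept = valid_in and self.upstream_ready
        sent = self.main if (self.valid and downstream_ready) else None
        if sent is not None or self.main is None:
            # Output register is free: refill from the spare first, to keep order.
            if self.skid is not None:
                self.main, self.skid = self.skid, None
            else:
                self.main = data_in if accept else None
        elif accept:
            # Stalled with a full output register: park the promised item.
            self.skid = data_in
        return sent
```

With the downstream side always ready the stage passes one item per clock; the spare register is only filled in the cycle where a stall begins, after which `upstream_ready` drops.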
[Figure: sequence of pipeline diagrams with ‘Stop!’ and ‘Go’ signals at successive stages]
With sparser occupancy data can stop safely; however throughput is reduced.
Note that in some pipelines there will be buffering implicit in the architecture to
‘even out’ such flow irregularities. Examples could include network routers
storing and forwarding packets.
Bus hierarchy
Simple example: Atmel AT91SAM9261
[Figure: block diagram with labels ARM, caches ($), TCM, RAM, AHB, APB bridge, bus crossbar switch, off-chip bus I/F, USB host, LCD ctrl, ROM]
❏ APB
❏ ROM
❏ USB host and LCD controller (for programming)
❏ External bus interface
❏ RAM
The crossbar switch allows parallel operations so different masters can have
access to different slave devices simultaneously. Clashes have to be resolved by
inserting wait states.
Bus occupancy can be reduced because the processor has:
❏ separate instruction and data caches
❏ direct access to the on-chip RAM as Tightly Coupled Memory (TCM)
[Figure: three masters, with labels RAM, bridge and I/O repeated for each, plus a further RAM]
This gives decreased latency for some (urgent) operations at the expense of
greater complexity, especially at the master where dependencies between
reordered transactions may have to be resolved.
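The crossbar behaviour described above, where masters proceed in parallel unless two of them want the same slave, can be sketched for a single bus cycle. This is an illustrative model only: the master and slave names are hypothetical, and the fixed alphabetical priority is an assumption, not the arbitration scheme of any particular device.

```python
def crossbar_cycle(requests):
    """Grant at most one master per slave for one cycle.

    requests maps a master name to the slave it wants this cycle.
    Returns (grants, waiting): grants maps slave -> master, and waiting
    lists the masters that must see wait states this cycle.
    """
    grants, waiting = {}, []
    for master in sorted(requests):       # assumption: fixed priority by name
        slave = requests[master]
        if slave in grants:
            waiting.append(master)        # clash: the slave is already in use
        else:
            grants[slave] = master        # different slaves proceed in parallel
    return grants, waiting


if __name__ == "__main__":
    print(crossbar_cycle({"cpu_data": "RAM", "cpu_instr": "ROM", "lcd": "RAM"}))
```

Here the two processor ports reach RAM and ROM simultaneously, while the third master clashes on RAM and has to wait.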
GALS
As clock speeds increase and wiring delays become more significant it is difficult
to maintain a synchronous clock model across a whole chip. This problem
was discussed in the section on timing (q.v.).
However one solution to this problem is to allow different IP blocks to be
clocked independently with an arbitrary phase and, possibly, at different
frequencies. It is then the job of the interconnection to cross the clock domains.
This form of interconnection is known as GALS (Globally Asynchronous,
Locally Synchronous). GALS frees the SoC designers from a number of timing
constraints which makes timing closure much easier. Each block is developed as
a synchronous circuit but there is no need for chip-wide skew-free clock
distribution.
Another advantage is the ability to run each block at its own ‘best’ frequency
with the possibility of consequent power reduction.
There can also be a reduction in power supply noise. In a synchronous circuit
logic begins to switch just after each active clock edge. Typically the number of
gates switching over time diminishes during the clock period because not all
logic paths are the same length. When gates switch they pull charge from the
power supply or dump it onto the ground. The demand for charge (a.k.a.
“current”) therefore varies periodically, setting up a regular AC signal in the
(extensive) power wiring. This both acts as a transmitting aerial (especially the wiring
into the chip) and may affect other gates’ switching. If a whole chip is
synchronous then this problem is at its worst; if there are several clocks with different
phases (or frequencies) the demand tends to even out, reducing noise problems.
There are also disadvantages to GALS’ unsynchronised communication. The
biggest is the need for synchronisation of signals when they arrive at their
destination. This inherently adds some latency to the signal; more if the reliability is
increased by adding longer waits for the resolution of any metastability.
Communication is therefore slowed down in some way.
Handshaking
The simplest communication mechanism is synchronous on a one-item-per-clock
basis; this relies on assumptions that data will always be available and
accepted on every cycle.
If data is not available on every cycle a ‘validity’ (or “request”) signal can be
used to indicate when data is available.
If the receiver may not always accept data then some sort of flow control must be
included. Across a synchronous interface (such as AXI, discussed earlier) this
can be another status bit.
With an asynchronous interface various assumptions cannot be made and some
form of handshake protocol is needed. This must be subject to synchronisation
to the local clock, with a concomitant latency penalty.
[Figure: handshake timing with Request, Acknowledge and Data]
Block transfers
A simple method of communication between asynchronous blocks is to
synchronise each data request and, subsequently, latch the data from the bus. This
results in a moderate latency but quite a low bandwidth because every
transmission requires two synchronisations, one for the forward request and another for
the reverse acknowledge.
Higher bandwidth can be achieved by buffering several data elements for a
single synchronisation. The transmitter ‘owns’ a RAM into which it writes a
message. When this is complete it passes the RAM to the receiver. After
synchronising with the receiver’s clock the data can be read out at full speed.
The overall latency is greater but the average bandwidth is also higher. This type
of mechanism may be further enhanced (at additional hardware cost) by double
buffering so that one RAM is filled whilst the previous one is emptied.
At its most extreme the interconnection may be asynchronous logic which can
implement an elastic FIFO between transmitter and receiver. This could be a
dual-port RAM which is written and read at different rates (synchronisation is
only necessary when the FIFO is almost empty or almost full) or truly
clock-free circuits.
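The trade-off between synchronising every item and synchronising once per block can be put into rough numbers. The sketch below is a back-of-envelope model under assumptions that are not in the notes: each synchronisation costs a fixed number of receiver clock cycles, each handover needs two of them (request and acknowledge), one item is written or read per cycle, and no double buffering is used.

```python
def items_per_cycle(block_size, sync_cycles):
    """Average throughput with one request/acknowledge handover per block."""
    handover = 2 * sync_cycles              # forward request plus reverse acknowledge
    return block_size / (handover + block_size)


def first_item_latency(block_size, sync_cycles):
    """Cycles from the first write until the receiver can read that item."""
    return block_size + sync_cycles         # fill the whole block, then synchronise


if __name__ == "__main__":
    for n in (1, 16):
        print(n, items_per_cycle(n, 2), first_item_latency(n, 2))
```

With an assumed two-cycle synchroniser, a block of one item achieves 0.2 items per cycle with a latency of 3 cycles, whereas a block of 16 achieves 0.8 items per cycle with a latency of 18 cycles: higher average bandwidth at the cost of greater latency.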
Serial buses
This slide is something of an aside, in that it is chiefly concerned with systems off chip.
For wider system interconnection it is common to use serial interconnection:
❏ Inherently slower
❏ Far fewer chip-pins required
❏ Cheaper interconnection medium (wires, connectors, …)
❏ Suitable for wireless applications
Examples include:
❏ Ethernet
❏ USB
❏ I2C
On SoC
Pin restrictions do not apply to intra-chip connections.
Nevertheless the reduction in wiring is becoming attractive for some SoC applications.
I2C
I2C (Inter-Integrated Circuit) is a Philips invention; to avoid legal complications
it is typically referred to as Two Wire Interface (TWI) by other manufacturers.
I2C is a fairly slow interconnection, suitable for driving entirely in software with
two PIO bits if required. It is typically used as a PCB level interconnection, for
example for adding memory to small microcontrollers. However it is a multi-
master bus where arbitration for mastery takes place via the same two wires.
Communication is synchronous as one wire is used as data, the other as a clock.
However the ‘clock’ – really more a strobe – need not be regular as it may be
software driven or paused by the receiving device if it is not ready.
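Arbitration over the shared wires can be sketched in software. The notes do not describe the mechanism; the model below relies on the general I2C convention that the data line is open-drain, so any device driving ‘0’ pulls it low (a wired-AND), and a master that sends ‘1’ but reads back ‘0’ withdraws. The function name and the bit patterns are hypothetical.

```python
def arbitrate(messages):
    """Return the indices of the masters still transmitting at the end.

    messages is a list of equal-length bit lists, one per master, all
    starting to transmit at the same time.
    """
    active = set(range(len(messages)))
    for bit in range(len(messages[0])):
        # Wired-AND: the line is low if any active master drives 0.
        bus = min(messages[m][bit] for m in active)
        # A master that sent 1 but sees 0 has lost and stops driving.
        active = {m for m in active if messages[m][bit] == bus}
    return sorted(active)


if __name__ == "__main__":
    print(arbitrate([[1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0]]))
```

The winner's message is never corrupted, because the bus always carried exactly the bits it was sending; masters sending identical bits would both survive in this model.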