UART Vs I2C Vs SPI
This article compares three serial interfaces, UART, I2C and SPI, and explains
how they differ. We look at each protocol and its advantages and
disadvantages, and give some examples of how each interface is used with
microcontrollers.
UART Interface
What is UART?
Stands for Universal Asynchronous Receiver/Transmitter (UART)
A simple serial communication protocol that allows the host to communicate
with the auxiliary device.
UART supports bi-directional, asynchronous and serial data transmission.
It has two data lines, one to transmit (TX) and another to receive (RX). On
Arduino boards these are digital pin 0 (RX) and digital pin 1 (TX).
TX and RX are connected between two devices (eg. a board and a computer over
USB), with each device's TX wired to the other device's RX.
UART can also handle synchronization management issues between
computers and external serial devices.
As UART has no clock line, it adds start and stop bits to each transfer to
mark the start and end of a message.
This helps the receiving UART know when to start and stop reading bits.
When the receiving UART detects a start bit, it will read the bits at the
defined BAUD rate.
UART data transmission speed is referred to as the baud rate and is often set
to 115,200 by default (baud rate is the symbol transmission rate, which for
UART is the same as the bit rate).
Both UARTs must operate at about the same baud rate. If the baud rates
differ by more than 10%, the timing of bits may be off and render the
data unusable. The user must also ensure that both UARTs are configured to
transmit and receive the same data packet structure.
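The framing described above can be sketched in a few lines of Python. This is only an illustration, not a driver: the function names uart_frame and frame_time_us are made up for this example, and it assumes 8 data bits sent least-significant bit first with one stop bit, which is the usual convention but is not stated above.

```python
def uart_frame(byte, parity=None):
    """Return the line levels (0/1) for one character, in transmit order."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    if parity not in (None, "even", "odd"):
        raise ValueError("parity must be None, 'even' or 'odd'")
    data = [(byte >> i) & 1 for i in range(8)]  # assumed LSB-first order
    bits = [0] + data                           # start bit pulls the idle-high line low
    if parity == "even":
        bits.append(sum(data) % 2)              # makes the total count of 1s even
    elif parity == "odd":
        bits.append(1 - sum(data) % 2)          # makes the total count of 1s odd
    bits.append(1)                              # stop bit returns the line high
    return bits


def frame_time_us(bits, baud=115200):
    """Time one frame occupies on the wire, in microseconds."""
    return len(bits) * 1_000_000 / baud


frame = uart_frame(ord("A"))
print(frame)
print(round(frame_time_us(frame), 1), "us at 115200 baud")
```

With no parity, each byte costs ten bit times on the wire (start, eight data bits, stop), so the useful data rate is lower than the baud rate.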
Once data is being transmitted by the transmit FIFO, the FIFO ‘BUSY’ flag is
asserted and stays active during the process.
FIFO = First in, First out. It’s a UART buffer that forces each
byte to be passed in sequence to the receiving UART.
The ‘BUSY’ bit only becomes inactive once the FIFO is empty and every bit,
including the stop bit, has been transmitted.
When the UART receiver is idle and the data input goes low (a start bit has
been received), the receive counter starts running and the input is sampled
on the 8th cycle of Baud16 (a clock running at 16 times the baud rate).
If RX is still low on the 8th cycle of Baud16, the start bit is valid.
Otherwise it is treated as a false start bit and ignored.
If the start bit is valid, data bits are sampled every 16th cycle of Baud16
(one bit period apart), according to the length of the data character. If
parity mode is enabled, the parity bit is also checked.
Finally, if RX is high, a valid stop bit is acknowledged. Otherwise, a framing
error has occurred.
When a complete data packet is received, the data is stored in the receiving
FIFO.
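The receive sequence above (validate the start bit on the 8th Baud16 cycle, sample each following bit 16 cycles later, then check the stop bit) can be sketched as a small simulation. The names oversample, receive_char and FramingError are made up for this example, parity is left out, and LSB-first data order is an assumption. Real hardware does this in logic, not software.

```python
class FramingError(Exception):
    """Raised when the stop bit is not high."""


def oversample(bits, n=16):
    """Expand each bit into n consecutive Baud16 samples."""
    return [b for b in bits for _ in range(n)]


def receive_char(samples, data_bits=8):
    """Decode one character from RX samples taken at 16x the baud rate."""
    i, n = 0, len(samples)
    while i < n:
        if samples[i] == 1:          # line idle (high): keep looking
            i += 1
            continue
        centre = i + 7               # 8th cycle, counting the falling edge as the 1st
        if centre >= n:
            return None
        if samples[centre] != 0:     # line went back high: false start bit, ignore
            i += 1
            continue
        last = centre + 16 * (data_bits + 1)   # sample position of the stop bit
        if last >= n:
            return None
        value = 0
        for k in range(data_bits):   # one sample per bit, 16 cycles apart
            value |= samples[centre + 16 * (k + 1)] << k
        if samples[last] != 1:
            raise FramingError("stop bit was low")
        return value
    return None


frame = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]         # start, 0x41 LSB first, stop
rx = [1] * 5 + oversample(frame) + [1] * 16
print(hex(receive_char(rx)))
```

Sampling near the middle of each bit is what gives UART its tolerance to small baud rate differences between the two ends.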
Interrupt Control
FIFO Operation
The UART module of the Stellaris family of ARM CPUs contains two 16-byte FIFOs:
one for transmission and one for reception.
They can be configured to trigger interrupts at various depths: 1/8, 1/4,
1/2, 3/4 or 7/8 full.
If the receive FIFO is set to trigger at 1/4, a receive interrupt is
raised when the UART has received 4 bytes.
1. When the hardware receives data, it is stored in the receive FIFO. When
the program reads the data, it is removed from the receive FIFO automatically,
which frees space for new data. If the data is not read out and the receive
FIFO fills up, further incoming data will be lost.
2. The transmit and receive FIFOs address the problem of the CPU being
interrupted too frequently by the UART. For UART communication, interrupt
mode is simpler and more efficient than polling. Without FIFOs, every byte
raises its own interrupt, which is inefficient. With FIFOs, a single interrupt
can cover several bytes sent or received back to back (up to 14), which
improves transmission and reception efficiency.
3. With the FIFOs in place, data is not normally lost, because incoming bytes
are buffered while the CPU is busy. As long as the UART is initialized and
the interrupt routine empties the receive FIFO in time, the transfer is
handled automatically.
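To put numbers on the trigger depths mentioned above, here is a small sketch for a 16-byte FIFO. The helper names are made up for this example, and it assumes that each interrupt drains the FIFO completely and that one final interrupt picks up any remainder (how real hardware flushes a partly filled FIFO is not covered above).

```python
from fractions import Fraction

FIFO_DEPTH = 16  # bytes, as in the 16-byte FIFOs described above
TRIGGER_LEVELS = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
                  Fraction(3, 4), Fraction(7, 8)]


def trigger_bytes(level, depth=FIFO_DEPTH):
    """Number of bytes in the FIFO at which the interrupt fires."""
    return int(level * depth)


def interrupts_needed(n_bytes, level, depth=FIFO_DEPTH):
    """Interrupts raised while receiving n_bytes (rounded up for a remainder)."""
    per_interrupt = trigger_bytes(level, depth)
    return -(-n_bytes // per_interrupt)  # ceiling division


for level in TRIGGER_LEVELS:
    print(level, "->", trigger_bytes(level), "bytes,",
          interrupts_needed(56, level), "interrupts for 56 bytes")
```

Without a FIFO the same 56 bytes would raise 56 interrupts, which is the inefficiency point 2 describes. The 7/8 level corresponds to the "up to 14" bytes per interrupt.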
Loopback
UART has an IrDA Serial Infrared (SIR) encoder/decoder module. The IrDA SIR
module translates between an asynchronous UART data stream and a half-
duplex serial SIR interface.
It is used to provide a digital coded output and a decoded input to the
UART. The UART signal pin can be connected to an infrared transceiver for
the IrDA SIR physical layer connection.
Advantages of Using UART
Simple to operate, well documented as it is a widely used method with a lot
of resources online
No clock needed
Parity bit to allow for error checking
All Arduino boards have at least one serial port (UART) which communicates
on digital pins 0 (RX) and 1 (TX) as well with the computer via USB.
This is an Arduino-compatible board, which is based on the ATmega328P MCU.
With an ATMEGA16U2 as a UART-to-USB converter, the board can basically
work like an FTDI chip and it can be programmed via a micro-USB cable.
Base Shield V2
Arduino Uno is the most popular Arduino board so far, however, it is
sometimes frustrating when your project requires a lot of sensors or LEDs
and your jumper wires are in a mess.
The purpose of this product is to help you get rid of the breadboard and
jumper wires. With the many Grove connectors on the base board, you can add
Grove modules to the Arduino Uno very conveniently!
These devices can be connected via UART and I2C (the next communication
peripheral, which I am going to touch on!)
I2C Interface
What is I2C?
Stands for Inter-Integrated Circuit (I2C)
It is a serial communications protocol similar to UART. However, it is not
used for PC-device communication but instead with modules and sensors.
It is a simple, bidirectional two-wire synchronous serial bus and requires
only two wires to transmit information between devices connected to the
bus.
It is useful for projects that require many different parts (eg. sensors,
pin expansions and drivers) working together, as up to 127 devices can be
connected to the mainboard while maintaining a clear communication
pathway!
This is because I2C uses an address system and a shared bus: many
different devices can be connected using the same two wires, with all data
transmitted on a single data wire, which keeps the pin count low. However,
the tradeoff for this simplified wiring is that it is slower than SPI.
The speed of I2C also depends on the data speed, wire quality and external
noise
The I2C protocol is also used as a two-wire interface to connect low-speed
devices like microcontrollers, EEPROMs, A/D and D/A converters, I/O
interfaces and other similar peripherals in embedded systems.
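To make the address system a little more concrete, here is a minimal sketch of how the first byte of an I2C transfer is formed. It relies on details that are standard I2C but not spelled out above: a 7-bit device address followed by a read/write bit (1 = read). The function name and the address 0x20 are arbitrary examples.

```python
def i2c_address_byte(address, read):
    """Combine a 7-bit device address with the read/write bit."""
    if not 0 <= address <= 0x7F:
        raise ValueError("address must fit in 7 bits")
    return (address << 1) | (1 if read else 0)


print(hex(i2c_address_byte(0x20, read=False)))  # write to device 0x20
print(hex(i2c_address_byte(0x20, read=True)))   # read from device 0x20
print(2 ** 7, "possible 7-bit addresses")
```

Every device on the bus sees this byte, and only the device whose address matches responds, which is why so many devices can share the same two wires.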
Clock Synchronisation
All masters generate their own clocks on the SCL line to transmit messages
on the I2C bus.
Data is only valid during the high period of the clock.
Clock synchronization is performed using the wired-AND connection of the I2C
interfaces to the SCL line. A high-to-low transition on the SCL line causes
the devices concerned to start counting off their low period. Once a device’s
clock goes low, it holds the SCL line in this state until it reaches the high
level of its clock.
If another clock is still in its low period, this low-to-high switch does not
change the state of the SCL line. The SCL line is therefore always held low
by the device with the longest low period. During this time, devices with a
shorter low period enter a high wait state.
When all relevant devices have completed their low period, the clock line
goes high.
After that, there is no difference in the state of the device clock and the SCL
line, and all devices begin to count their high period. The device that first
completes the high period will pull the SCL line low again.
The low period of the synchronous SCL clock is determined by the device
with the longest low clock period, while the high period is determined by the
device with the shortest high clock period.
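The synchronisation rule above can be checked with a small tick-based simulation. This is a sketch under simplifying assumptions: time advances in whole ticks, each master is described only by its low and high period, and the function name shared_scl is made up for this example.

```python
def shared_scl(masters, ticks):
    """Simulate SCL for masters given as (low_ticks, high_ticks) pairs.

    Returns the SCL level at each tick. The line behaves as a wired-AND:
    it is low while any master is pulling it low.
    """
    if any(lo < 1 or hi < 1 for lo, hi in masters):
        raise ValueError("periods must be at least one tick")
    state = [{"low": lo, "high": hi, "phase": "LOW", "count": 0}
             for lo, hi in masters]
    trace = []
    for _ in range(ticks):
        scl = 0 if any(m["phase"] == "LOW" for m in state) else 1
        trace.append(scl)
        for m in state:
            if m["phase"] == "LOW":
                m["count"] += 1                        # counting off own low period
            elif scl == 0 and m["phase"] == "HIGH":
                m["phase"], m["count"] = "LOW", 1      # another master pulled SCL low
            elif scl == 1:
                m["phase"] = "HIGH"                    # line is really high: count high period
                m["count"] += 1
                if m["count"] >= m["high"]:
                    m["phase"], m["count"] = "LOW", 0  # first to finish pulls SCL low
            # otherwise: released but SCL still low, so keep waiting
            if m["phase"] == "LOW" and m["count"] >= m["low"]:
                m["phase"], m["count"] = "WAIT", 0     # release SCL and wait for it to rise
    return trace


# Master A: low 2, high 3. Master B: low 4, high 2.
print(shared_scl([(2, 3), (4, 2)], 12))
```

The resulting clock has a low period of 4 ticks (the longest low period) and a high period of 2 ticks (the shortest high period), matching the rule stated above.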
Transmission Modes
Quick Mode:
High-Speed Mode:
Hs mode devices can transmit information at bit rates up to 3.4 Mbit/s and
remain fully backwards compatible with fast mode or standard mode (F/S
mode) devices for bi-directional communication in a mixed-speed bus
system.
The Hs mode transmission has the same serial bus principle and data format
as the F/S mode system, except that arbitration and clock synchronization
are not performed.
The I2C bus specification in high-speed mode is as follows:
In high speed (Hs) mode, the master device has an open-drain
output buffer for the high-speed data (SDAH) signal, and a
combination of an open-drain pull-down and a current-source pull-up
circuit on the high-speed serial clock (SCLH) output. This shortens
the rise time of the SCLH signal. At any time, only one master's
current source is active.
In the Hs mode of a multi-master system, arbitration and clock
synchronization are not performed, in order to speed up the bit
processing capability. The arbitration process always ends after
the master code is transmitted in F/S mode.
The Hs mode master device generates a serial clock signal with a
high-to-low ratio of 1:2, which relaxes the timing requirements
for setup and hold time.
The Hs mode device can have a built-in bridge. During Hs mode
transmission, the SDAH and SCLH lines of the Hs mode device are
separated from the SDA and SCL lines, which reduces the capacitive
loading of the SDAH and SCLH lines and makes rise and fall times faster.
The only difference between Hs mode slave devices and F/S slave
devices is the speed at which they operate.
The inputs of an Hs mode device can suppress glitches, and the SDAH
and SCLH inputs also have a Schmitt trigger.
The output buffer of the Hs mode device has a slope control
function for the falling edges of the SDAH and SCLH signals.
This provides an excellent tool for debugging I²C issues because you can
listen in on the conversation as it happens.
MCP 23017
Ref: Electronicwings, MCP23017 16-bit GPIO Expander.
16-bit, general-purpose parallel I/O expansion for the I2C bus. Similar to
MCP23S17 except for serial interface (I2C vs SPI).
Port expander that gives the user virtually identical ports compared to
standard microcontrollers.
PCF 8574
Ref: PCF8574 Serial Interface Module Board LCD Converter.
SPI Interface
What is SPI?
Stands for Serial Peripheral Interface (SPI)
It is similar to I2C in purpose, but it is a different form of serial
communications protocol, designed for microcontrollers to connect to peripherals.
Operates at full-duplex where data can be sent and received simultaneously.
Operates at faster data transmission rates of 8 Mbit/s or more
It is typically faster than I2C due to the simple protocol. Although the
data/clock lines are shared between devices, each device requires its own
select wire instead of an address.
Used in places where speed is important. (eg. SD cards, display modules or
when info updates and changes quickly like thermometers)
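Full-duplex operation is easiest to picture as two shift registers swapping contents. The sketch below is a simplified model, not real SPI driver code: spi_transfer is a made-up name, the data line names MOSI and MISO are the conventional ones (not introduced above), and it assumes 8-bit words sent most-significant bit first.

```python
def spi_transfer(master_byte, slave_byte):
    """Clock 8 bits in both directions; return (master_received, slave_received)."""
    m, s = master_byte & 0xFF, slave_byte & 0xFF
    for _ in range(8):                 # one loop pass per clock pulse
        mosi = (m >> 7) & 1            # master shifts out its top bit
        miso = (s >> 7) & 1            # slave shifts out its top bit at the same time
        m = ((m << 1) & 0xFF) | miso   # master shifts in what the slave sent
        s = ((s << 1) & 0xFF) | mosi   # slave shifts in what the master sent
    return m, s


got_from_slave, got_from_master = spi_transfer(0xA5, 0x3C)
print(hex(got_from_slave), hex(got_from_master))
```

After eight clock pulses the two bytes have changed places, so every write is also a read. There is no addressing or acknowledgement overhead, which is part of why the protocol is fast.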
Seeed offers a product with similar functions, the Grove I2C
ADC, but its communication peripheral is I2C.
The MCP 3008 is a 10-bit, 8-channel analogue-to-digital converter (ADC).
It connects to the Raspberry Pi over an SPI serial connection, using
either the hardware SPI bus or any four GPIO pins with software SPI.
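As an illustration of what goes over the SPI wires for an ADC like this, the sketch below builds the three-byte request for a single-ended reading and decodes the 10-bit result. The byte layout follows the commonly used scheme from the MCP3008 datasheet as I understand it, so treat it as an assumption and check the datasheet before relying on it. The function names and the 3.3 V reference are made up for this example.

```python
def mcp3008_command(channel):
    """Bytes to send for a single-ended conversion on the given channel (0-7)."""
    if not 0 <= channel <= 7:
        raise ValueError("channel must be 0-7")
    # start bit, then (single-ended flag + channel) in the top nibble, then padding
    return [0x01, (0x08 | channel) << 4, 0x00]


def mcp3008_decode(response):
    """Extract the 10-bit result from the three bytes clocked back."""
    return ((response[1] & 0x03) << 8) | response[2]


def code_to_volts(code, vref=3.3):
    """Convert a 10-bit code to volts, assuming a reference of vref."""
    return code * vref / 1024


print(mcp3008_command(0))
print(mcp3008_decode([0x00, 0x02, 0x00]))
print(code_to_volts(512))
```

On a Raspberry Pi the three bytes would typically be passed to an SPI library call that sends them and returns the bytes clocked back. That part is hardware-dependent and left out here.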
The Pi Zero ENC28J60 is a simple network adapter module for the Pi Zero that is
very easy to assemble and configure.
It allows your Raspberry Pi Zero to access the network smoothly, and makes
system updates and software installation easy.
Microchip’s ENC28J60 is a 28-pin, 10BASE-T stand-alone Ethernet controller
with an SPI interface.
The SPI interface serves as a communication channel between the host
controller and the ENC28J60.
SPI flash is very common, and by using a test clip, SPIDriver makes it
convenient to read and write SPI flash in-circuit. A short script is all it takes
to read or write an Atmel flash. SPI LED strips are also easy to hook up
to the SPIDriver, and you can control them directly, which makes
them much more fun!
Using SPI in this scenario is fast enough to smoothly animate long strips and
achieve POV effects. Short strips can also be powered directly by the
SPIDriver’s beefy 470 mA built-in supply.
Thus, you should pick the communication peripheral that suits your project
best. For example, if you want the fastest communication peripheral, SPI would be
the ideal pick. On the other hand, if you want to connect many devices without it
being too complex, I2C will be the ideal pick as it can connect up to 127 devices and
it is simple to manage.
Summary
In summary, I have compiled all the various advantages/disadvantages and
functions of the various communication protocols and compared them so you can
easily pick which is the best for your project. Do keep in mind that the device,
accessory, module or sensor you are using must support the communication
protocol as well.
                           UART              I2C                          SPI
Number of devices          Up to 2 devices   Up to 127, but gets complex  Many, but gets complex
Number of wires            2 (TX and RX)     2                            4
No. of masters and slaves  Single to Single  Multiple slaves and masters  1 master, multiple slaves
With an increasing number of functional blocks (IP) being integrated into SoC designs, the shared bus
protocols (AHB/ASB) started hitting their limits, and in 2003 the new AMBA 3 revision
introduced a point-to-point connectivity protocol: AXI (Advanced Extensible Interface). Later,
in 2010, an enhanced version, AXI4, was introduced. The following diagram illustrates this
evolution of protocols along with the SoC design trends in industry.
The following diagram illustrates how an AXI interconnect can be used to build an SoC with various
functional blocks talking through a master-slave protocol. The interconnect could be a custom
crossbar or switch design, or even an off-the-shelf NoC (Network on Chip) IP that supports
multiple AXI masters and slaves. The AXI interconnect helps to scale up connectivity to a larger
number of agents than the previous AHB/ASB buses. An AXI-to-APB bridge on one of the
slave ports is normally used to bridge communications to a set of peripherals shared on an APB
bus.
Further evolution happened in the era of mobile and smartphones, with SoCs
having dual/quad/octa core processors with shared caches integrated and the
need for hardware-managed coherency across the memory subsystem. This
led to the introduction of ACE (AXI Coherency Extensions) in AMBA
revision 4.
Lastly, in the current era of heterogeneous computing for HPC and data
center markets, the integration trend continues with an increasing number of
processor cores along with several heterogeneous computing elements like
GPUs, DSPs, FPGAs, memory controllers and IO subsystems. In 2013, AMBA
5 introduced the CHI (Coherent Hub Interface) protocol as a redesign of
the AXI/ACE protocol. The signal-based AXI/ACE protocol was replaced with
the new packet-based, layered CHI protocol, which can scale well for the near
term future.
Now that you hopefully understand how the protocols evolved and how each
of them fits into an SoC design, here are a few basics and references to
resources that you can use to learn more about each protocol in depth.
ARM has published all of the protocols openly, and all the specifications can be
downloaded free from the ARM website by signing up.
1. APB: The Advanced Peripheral Bus (APB) is used for connecting low-bandwidth peripherals. It is
a simple non-pipelined protocol that can be used to communicate (read or write) from a
bridge/master to a number of slaves through the shared bus. Reads and writes share the
same set of signals and no burst data transfers are supported. The latest spec (APB 2.0) is
available on the ARM website and is a relatively easy protocol to learn.
2. AHB: The Advanced High-performance Bus (AHB) is used for connecting components that need
higher bandwidth on a shared bus. These could be an internal memory or an external memory
interface, DMA, DSP etc., but the shared bus limits the number of agents. Similar to APB,
this is a shared bus protocol for multiple masters and slaves, but higher bandwidth is possible
through burst data transfers. The latest spec can be found on the ARM website and is relatively
easy to learn.
3. AHB-lite protocol is a simplified version of AHB. It supports only a single-master
design, which removes the need for any arbitration, retry, split transactions etc.
4. AXI: The Advanced Extensible Interface (AXI) is useful for high-bandwidth and low-latency
interconnects. This is a point-to-point interconnect and overcomes the limitations of a shared bus
protocol in terms of the number of agents that can be connected. The protocol is also an
enhancement over AHB in that it supports multiple outstanding data transfers (pipelined),
burst data transfers, separate read and write paths and different bus widths.
5. AXI-lite protocol is a simplified version of AXI. The simplification is that burst data
transfers are not supported.
6. AXI-stream protocol is another flavor of the AXI protocol that supports only streaming of data
from a master to a slave. There are no separate read/write channels in the stream protocol, unlike
full AXI or AXI-lite, as the intent is to stream in one direction only. Multiple streams of data can be
transferred (even with interleaving) between a master and a slave. This is useful in designs
such as video streaming applications.
7. The full AXI and AXI-lite specifications can be downloaded from the ARM website. The AXI-stream
protocol has a separate spec, which is also available for download.
8. ACE: The AXI Coherency Extensions protocol is an extension to the AXI4 protocol and evolved in the era
of multiple CPU cores with coherent caches being integrated on a single chip. The ACE protocol
extends the AXI read and write data channels by introducing separate snoop address, snoop data
and snoop response channels. These extra channels provide mechanisms to implement a
snoop-based coherency protocol. If you are new to coherency, understanding it is a
prerequisite to learning the ACE protocol. The spec is available for download from ARM as
part of the AXI4 spec.
9. ACE-Lite: ACE also has a simplified version of the protocol for agents that do not have
a cache of their own but are still part of the shareable coherency domain. Typical agents like DMA
or network interface agents implement this “one-way” coherency using the ACE-Lite protocol.
10. CHI (Coherent Hub Interface): The ACE protocol was developed as an extension to AXI to
support coherent interconnects. The ACE protocol used signal-level communication between
master and slave, and hence the interconnects needed a large number of wires, with added channels for
snoops and responses. This worked well for small coherent clusters in dual/quad core mobile
SoC designs. With an increasing number of coherent clusters on an SoC, along with other
heterogeneous compute elements and memory controllers, the AMBA 5 revision introduced the CHI
protocol as a complete redesign of the ACE protocol. The CHI protocol uses a layered, packet-based
communication protocol with protocol, link layer and physical layer implementations, and
also supports QoS-based flow control and retry mechanisms.
Essentially, AMBA protocols define how functional blocks communicate with each other.
The following diagram shows an example of an SoC design. This SoC has several functional
blocks that use AMBA protocols, like AXI, to communicate with each other:
Today, AMBA is widely used in a range of ASIC and SoC parts. These parts include
applications processors that are used in devices like IoT subsystems, smartphones, and
networking SoCs.
Efficient IP reuse
Flexibility
AMBA offers the flexibility to work with a range of SoCs. IP reuse requires a common
standard while supporting a wide variety of SoCs with different power, performance, and
area requirements. Arm offers a range of interface specifications that are optimized for
these different requirements.
Compatibility
Support
Bus interface standards like AMBA are differentiated through the performance that they enable.
The two main characteristics of bus interface performance are:
Bandwidth
The rate at which data can be driven across the interface. In a synchronous system, the
maximum bandwidth is limited by the product of the clock speed and the width of the
data bus.
Latency
The efficiency of your interface depends on the extent to which it achieves the maximum
bandwidth with zero latency.
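The bandwidth limit described above is just clock speed multiplied by data bus width. The sketch below works through one example. The function names and the figures used (a 100 MHz clock, a 32-bit bus, 256 bytes moved in 80 cycles) are illustrative only and do not describe any particular AMBA system.

```python
def max_bandwidth_bytes_per_s(clock_hz, data_width_bits):
    """Upper bound for a synchronous bus: one full-width transfer per clock."""
    return clock_hz * data_width_bits // 8


def efficiency(bytes_moved, cycles, data_width_bits):
    """Fraction of the theoretical maximum actually achieved over some cycles."""
    return bytes_moved / (cycles * data_width_bits / 8)


print(max_bandwidth_bytes_per_s(100_000_000, 32), "bytes/s maximum")
print(efficiency(256, 80, 32), "of the maximum achieved")
```

In this example 256 bytes would need only 64 cycles on a 32-bit bus, so taking 80 cycles means 16 cycles were spent waiting rather than moving data.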
APB is designed for low-bandwidth control accesses, for example, register interfaces on system
peripherals. This bus has a simple address and data phase and a low complexity signal list.
AMBA 2
In 1999, AMBA 2 added the AMBA High-performance Bus (AHB), which is a single clock-edge
protocol. A simple transaction on the AHB consists of an address phase and a subsequent data
phase. Access to the target device is controlled through a MUX, admitting access to one manager
at a time. AHB is pipelined for performance, while APB is not pipelined for design simplicity.
AMBA 3
In 2003, Arm introduced the third generation, AMBA 3, which includes ATB and AHB-Lite.
Advanced Trace Bus (ATB), is part of the CoreSight on-chip debug and trace solution.
AHB-Lite is a subset of AHB. This subset simplifies the design for a bus with a single manager.
Advanced eXtensible Interface (AXI), the third generation of AMBA interface defined in the
AMBA 3 specification, is targeted at high performance, high clock frequency system designs.
AXI includes features that make it suitable for high-speed submicrometer interconnect.
AMBA 4
In 2010, the AMBA 4 specifications were introduced, starting with AMBA 4 AXI4 and then
AMBA 4 AXI Coherency Extensions (ACE) in 2011.
ACE extends AXI with additional signaling introducing system-wide coherency. This system-
wide coherency allows multiple processors to share memory and enables technology like
big.LITTLE processing. At the same time, the ACE-Lite protocol enables one-way coherency.
One-way coherency enables a network interface to read from the caches of a fully coherent ACE
processor.
The AXI4-Stream protocol is designed for unidirectional data transfers from manager to
subordinate with reduced signal routing, which is ideal for implementation in FPGAs.
AMBA 5
In 2014, the AMBA 5 Coherent Hub Interface (CHI) specification was introduced, with a
redesigned high-speed transport layer and features designed to reduce congestion. There have
been several editions of the CHI protocol, and each new version adds new features.
In 2016, the AHB-Lite protocol was updated to AHB5, to complement the Armv8-M
architecture, and extend the TrustZone security foundation from the processor to the system.
In 2019, the AMBA Adaptive Traffic Profiles (ATP) was introduced. ATP complements the
existing AMBA protocols and is used for modeling high-level memory access behavior in a
concise, simple, and portable way.
AXI5, ACE5 and ACE5-Lite extend prior generations, to include a number of performance and
scalability features to align with and complement AMBA CHI. Some of the new features and
options include:
Support for high frequency, non-blocking coherent data transfer between many
processors.
A layered model to allow separation of communication and transport protocols
for flexible topologies, such as a cross-bar, ring, mesh or ad hoc.
Cache stashing to allow accelerators or IO devices to stash critical data within a
CPU cache for low latency access.
Far atomic operations enable the interconnect to perform high-frequency updates
to shared data.
End-to-end data protection and poisoning signalling.
The following diagram shows how AXI is used to interface an interconnect component:
There are only two AXI interface types, manager and subordinate. These interface types are
symmetrical. All AXI connections are between manager interfaces and subordinate interfaces.
AXI interconnect interfaces contain the same signals, which makes integration of different IP
relatively simple. The previous diagram shows how AXI connections join manager and
subordinate interfaces. The direct connection gives maximum bandwidth between the manager
and subordinate components with no extra logic. And with AXI, there is only a single protocol to
validate.
The AXI protocol defines the signals and timing of the point-to-point connections between
manager and subordinates.
The previous diagram shows that each AXI manager interface is connected to a single AXI
subordinate interface. Where multiple managers and subordinates are involved, an interconnect
fabric is required. This interconnect fabric also implements subordinate and manager interfaces,
where the AXI protocol is implemented.
The following diagram shows that the interconnect is a complex element that requires its own
AXI manager and subordinate interfaces to communicate with external function blocks:
The following diagram shows an example of an SoC with various processors and function
blocks:
The previous diagram shows all the connections where AXI is used. You can see that AXI3 and
AXI4 are used within the same SoC, which is common practice. In such cases, the interconnect
performs the protocol conversion between the different AXI interfaces.
AXI channels
The AXI specification describes a point-to-point protocol between two interfaces: a manager and
a subordinate. The following diagram shows the five main channels that each AXI interface uses
for communication:
Using separate address and data channels for read and write transfers helps to maximize the
bandwidth of the interface. There is no timing relationship between the groups of read and write
channels. This means that a read sequence can happen at the same time as a write sequence.
Each of these five channels contains several signals, and all these signals in each channel have
the prefix as follows:
AXI supports two different sets of channels, one for write operations, and one for read
operations. Having two independent sets of channels helps to improve the bandwidth
performance of the interfaces. This is because read and write operations can happen at
the same time.
AXI allows for multiple outstanding addresses. This means that a manager can issue
transactions without waiting for earlier transactions to complete. This can improve
system performance because it enables parallel processing of transactions.
For any burst that is made up of data transfers wider than one byte, the first bytes
accessed can be unaligned with the natural address boundary. For example, a 32-bit data
packet that starts at a byte address of 0x1002 is not aligned to the natural 32-bit address
boundary.
Out-of-order transaction completion is possible with AXI. The AXI protocol includes
transaction identifiers, and there is no restriction on the completion of transactions with
different ID values. This means that a single physical port can support out-of-order
transactions by acting as several logical ports, each of which handles its transactions in
order.
AXI managers only issue the starting address for the first transfer. For any following
transfers, the subordinate will calculate the next transfer address based on the burst
type.
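Two of the points above, unaligned start addresses and the subordinate working out each following address from the burst type, can be sketched together. The burst type names FIXED, INCR and WRAP come from the AXI specification, not from the text above, and the function names are made up for this example. Treat it as a simplified model of the address arithmetic, not a full implementation of the specification.

```python
def is_aligned(address, size_bytes):
    """True if the address sits on a natural boundary for this transfer size."""
    return address % size_bytes == 0


def burst_addresses(start, size_bytes, length, burst="INCR"):
    """Address of every transfer in a burst, given only the start address."""
    aligned = (start // size_bytes) * size_bytes
    if burst == "FIXED":                     # same address every time (eg. a FIFO)
        return [start] * length
    if burst == "INCR":                      # only the first transfer may be unaligned
        return [start] + [aligned + n * size_bytes for n in range(1, length)]
    if burst == "WRAP":                      # wraps around at a block boundary
        if start != aligned or length not in (2, 4, 8, 16):
            raise ValueError("WRAP needs an aligned start and length 2, 4, 8 or 16")
        block = size_bytes * length
        base = (start // block) * block
        return [base + (start - base + n * size_bytes) % block
                for n in range(length)]
    raise ValueError("unknown burst type")


print(is_aligned(0x1002, 4))
print([hex(a) for a in burst_addresses(0x1002, 4, 4)])
print([hex(a) for a in burst_addresses(0x1008, 4, 4, "WRAP")])
```

The first call uses the 0x1002 example from above: the address is not on a 32-bit boundary, so only the first transfer of the burst is unaligned and the rest fall on 4-byte boundaries.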
Channel handshake
The AXI4 protocol defines five different channels, as described in AXI channels. All of these
channels share the same handshake mechanism that is based on
the VALID and READY signals, as shown in the following diagram:
The VALID signal goes from the source to the destination, and READY goes from the destination to the
source.
Whether the source or destination is a manager or subordinate depends on which channel is
being used. For example, the manager is a source for the Read Address channel, but a
destination for the Read Data channel.
The source uses the VALID signal to indicate when valid information is available.
The VALID signal must remain asserted, meaning set to high, until the destination accepts the
information. Signals that remain asserted in this way are called sticky signals.
The destination indicates when it can accept information using the READY signal.
The READY signal goes from the channel destination to the channel source.
This mechanism is not an asynchronous handshake, and requires the rising edge of the clock for
the handshake to complete.
The final example shows both VALID and READY signals being asserted during the clock
cycle 3, as seen in the following diagram:
Again, the handshake completes on the rising edge of clock cycle 4, when
both VALID and READY are asserted.
In all three examples, information is passed down the channel when READY and VALID are
asserted on the rising edge of the clock signal.
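The handshake rule can be modelled in a few lines. This sketch numbers clock cycles from 1 and takes the cycle in which each side first asserts its signal. Both signals are held once asserted (VALID must be, as described above; holding READY as well is a simplification). The function name is made up for this example.

```python
def handshake_completes_at(valid_asserted_in, ready_asserted_in, max_cycles=32):
    """Return the clock cycle on whose rising edge the transfer happens."""
    valid = ready = False
    for cycle in range(1, max_cycles + 1):
        # Rising edge of this cycle: sample the levels driven during the last one.
        if valid and ready:
            return cycle
        # During this cycle each side may assert its signal, then holds it.
        if cycle >= valid_asserted_in:
            valid = True
        if cycle >= ready_asserted_in:
            ready = True
    return None


print(handshake_completes_at(3, 3))   # both asserted during cycle 3
print(handshake_completes_at(2, 3))   # VALID first, READY later
print(handshake_completes_at(3, 1))   # READY already waiting
```

In all three cases the transfer happens on the rising edge of cycle 4, the first edge at which both signals are seen high, regardless of which side asserted first.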
This handshake is where the manager communicates the address of the write to the subordinate. The
handshake has the following sequence of events:
After this first handshake, the manager transfers the data to the subordinate on the Write
(W) channel, as shown in the following diagram:
The data transfer has the following sequence of events:
Finally, the subordinate uses the Write Response (B) channel, to confirm that the write
transaction has completed once all WDATA has been received. This response is shown in the
following diagram:
Next, on the Read (R) channel, the subordinate transfers the data to the manager. The following
diagram shows the data transfer process:
1. In clock cycle n, the manager indicates that it is waiting to receive the data by
asserting RREADY.
2. The subordinate retrieves the data and places it on RDATA in clock cycle n+2. In this
case, because this is a single data transaction, the subordinate also sets the RLAST signal to
high.
At the same time, the subordinate uses RRESP to indicate the success or failure of the
read transaction to the manager, and asserts RVALID.
3. Because RREADY is already asserted by the manager, the handshake completes on the rising
edge of clock cycle n+3.
If an error is indicated for any of the transfers in the transaction, the full indicated length of the
transaction must still be completed. There is no such thing as early burst termination.
Active transactions
Active transactions are also known as outstanding transactions.
An active read transaction is a transaction for which the read address has been transferred, but
the last read data has not yet been transferred at the current point in time.
With reads, the data must come after the address, so there is a simple reference point for when
the transaction starts. This is shown in the following diagram:
For write transactions, the data can come after the address, but leading write data is also allowed. The
start of a write transaction can therefore be either of the following:
Therefore, an active write transaction is a transaction for which the write address or leading write
data has been transferred, but the write response has not yet been transferred.
The following diagram shows an active write transaction where the write address has been
transferred, but the write response has not yet been transferred:
The following diagram shows an active write transaction where the leading write data has been
transferred, but the write response has not yet been transferred:
What are the AMBA protocols?
February 15, 2021 By Nikhil Agnihotri
What is AMBA?
AMBA (Advanced Microcontroller Bus Architecture) is a freely-available, open standard
for interconnection and management of IP cores in a System-on-Chip (SoC) IC. It allows
right-first-time development of multi-processor chip designs in a modular, reusable, and
scalable manner. This helps in avoiding costly re-designs and reduces time-to-market
for integrated designs.
AMBA was first introduced in 1996 with Advanced Peripheral Bus (APB) and Advanced
System Bus (ASB) specifications. The second version of AMBA was introduced in 1999
and included Advanced High-Performance Bus (AHB) specifications. AMBA 3 that
included Advanced Extensible Interface (AXI), was introduced in 2003. AMBA 4
introduced AXI Coherency Extensions (ACE) in 2010 and AMBA 5, the latest version of
AMBA, introduced Coherent Hub Interface (CHI) in 2013.
SoCs for mobile phones and smartphones, in which multiple processor cores share a
common cache memory, require coherency to be managed across the memory
subsystem. For this, the ACE specifications were introduced in AMBA 4. The
AXI/ACE specifications were redesigned as CHI to manage communication
in heterogeneous computing systems. In contrast to the signal-based protocol of the
AXI/ACE specifications, CHI is a packet-based, layered protocol that can scale to
communication between heterogeneous functional blocks like Digital
Signal Processors (DSP), Graphics Processing Units (GPU), I/O subsystems, and memory
controllers.
AMBA specifications
AMBA is a set of interconnect protocols. The latest version AMBA 5 includes the
following specifications:
1. APB: The Advanced Peripheral Bus (APB) has been part of AMBA since its
first release. This is a simple non-pipelined protocol that is used
for master-slave communication with low-bandwidth peripherals. A number
of peripherals can be connected to a shared bus, which is managed via a
bridge (like an AXI-APB bridge) or directly by a master (processor/controller).
In APB, reads and writes share the same address and control signals,
and burst data transfers are not supported.
2. ASB: The Advanced System Bus (ASB) is a pipelined protocol for
communication mechanisms with high-bandwidth and high-frequency
components. It supports burst transfers and multiple bus masters. This bus
system supports interconnection between multiple masters and memories.
The bus consists of four types of blocks – Master, Arbiter, Slave, and
Decoder. At any time, only one master can access the bus. A master
gains access to the bus through the arbiter and selects a slave for
communication through the decoder. The master initiates a read or write
operation, and the selected slave responds to the request.
3. AHB: The Advanced High-Performance Bus (AHB) was introduced in AMBA 2.0. It
is an alternative to ASB where high-performance features are required. It supports
wider data bus configurations, single-cycle bus-master handover, split
transactions, and single clock-edge operation. Like ASB, AHB also
requires additional components to manage communication, such as a
read multiplexer, a write multiplexer, a decoder, an arbiter, and an address and
control multiplexer. The bus system consists of three buses: the address bus, the
write data bus, and the read data bus. The address is used to select a slave, the
write data bus moves data from the master to the slave, and the read data bus
moves read data from slaves to masters. A master accesses the bus by
sending a request to the arbiter and uses the decoder to select a slave. The bus is
granted to a master on the basis of a prioritization scheme, which differs
between designs. There are 20 different AHB signals in total, compared to
15 signals in ASB.
4. AHB-lite: It is a simplified version of AHB. It supports communication
mechanisms with a single master without the need for any arbiter. It also
excludes some high-performance features of AHB like split transactions and
retries.
5. AXI: The Advanced Extensible Interface (AXI) is a point-to-point interconnect
specification that overcomes the limitations of shared bus protocols in
connecting multiple agents. It was specifically designed to manage
communication mechanisms with multi-core processors and controllers. AXI
specifications were introduced in AMBA 3.0. Instead of using a system bus, it uses
well-defined interfaces for high-bandwidth and low-latency communication
mechanisms. It has several enhanced features compared to AHB, like multiple
pipelined transfers, separate read/write wires, wider data bus widths, and burst
data transfers.
6. AXI-lite: It is a simplified version of the AXI protocol. It lacks burst data transfers
compared to full AXI specifications.
7. AXI-stream: This is a modification of the AXI protocol for supporting data
streaming from masters to slaves. In this protocol, data is moved only in one
direction from master to slave. The read/write channels are not separate in AXI-
stream unlike in the full AXI specification. It is possible to transfer multiple
streams of data between master and slave. This protocol is useful in applications
like video streaming, game streaming, etc.
8. ACE: The AXI Coherency Extensions (ACE) specifications were introduced in
AMBA 4.0. This specification is used to manage communication mechanisms in
multi-core processors/controllers with coherent cache memories shared between
them. The ACE specification extends the AXI read and write channels with
separate snoop address, snoop data, and snoop response channels.
These additional channels implement a snoop-based coherency protocol.
9. ACE-lite: ACE-lite is a simplified version of the full ACE specification. It was
designed to manage communication mechanisms with agents that do not have a
cache memory of their own but still can participate in a sharable coherency
system using one-way coherency. Examples of such agents are DMA controllers
and Network-on-Chip blocks.
10. CHI: The Coherent Hub Interface (CHI) is a redesign of the ACE protocol for much
more complex heterogeneous computing systems. The ACE protocol uses signal-level
master-slave communication, which needs a large number of wires and
additional channels for snoops and responses. This works fine for small coherent
clusters like dual- or quad-core mobile SoCs. However, with many heterogeneous
components like DSPs, GPUs, NPUs, etc., AXI hits limitations because it is still a
signal-based protocol. CHI is a redesign of the AXI bus that uses packet-based
interface protocols instead of a signal-based bus system.
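The arbiter and decoder roles described for the shared buses (ASB and AHB) in the list above can
be sketched in a few lines. This is a simplified model under stated assumptions: the master
names, the fixed-priority order and the address map are all hypothetical, and real designs
choose their own arbitration scheme and memory map.

```python
def arbitrate(requests, priority):
    """Grant the bus to the highest-priority master that is requesting it."""
    for master in priority:
        if master in requests:
            return master
    return None  # no master is requesting the bus


def decode(address, address_map):
    """Select the slave whose address range contains the address."""
    for slave, (base, size) in address_map.items():
        if base <= address < base + size:
            return slave
    return None  # the address does not map to any slave


# Hypothetical system: two masters, fixed priority (DMA first), two slaves.
PRIORITY = ["dma", "cpu"]
ADDRESS_MAP = {
    "sram": (0x0000_0000, 0x1000),
    "apb_bridge": (0x4000_0000, 0x1000),
}

granted = arbitrate({"cpu", "dma"}, PRIORITY)  # only one master owns the bus
target = decode(0x4000_0010, ADDRESS_MAP)      # the decoder picks the slave
print(granted, target)
```

A fixed-priority arbiter is used here only because it is the shortest scheme to write down;
round-robin or other schemes fit the same interface.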
Conclusion
If you are in VLSI design, you most likely have heard or learned about AMBA protocols.
AMBA has evolved over the years to meet the needs of state-of-the-art SoC designs and
future IC developments. AMBA protocols are open-standard and can be downloaded
from the Arm website after free registration. This article gives you an overview of various
AMBA specifications. You can download the specifications from the Arm website and
learn about these chip design protocols in more depth.
APB Protocol
Introduction
Advanced Peripheral Bus (APB) is part of the Advanced Microcontroller Bus Architecture
(AMBA) family of protocols. The latest version of APB is v2.0, which was part of the AMBA 4
release. It is a low-cost interface optimized for minimal power consumption and
reduced interface complexity. Unlike AHB, it is a non-pipelined protocol, used to connect
low-bandwidth peripherals, mostly external peripherals to the SoC. In
APB, every transfer takes at least two clock cycles (the SETUP cycle and the ACCESS cycle) to
complete. It can also interface with AHB and AXI through bridges.
The above diagram depicts a block diagram of a system. The high-performance ARM
processor is the core of the system. Other components like the high-bandwidth on-chip RAM,
the DMA bus master and the high-bandwidth memory interface are connected to the core by the
system bus, which is AHB in this case. The low-bandwidth peripherals like the UART, timer,
keypad and PIO are connected to the system bus through the bridge by the peripheral bus,
which here is APB. In this scenario, the bridge acts as an AHB slave to the core
master, and as the APB master to the low-bandwidth
external peripherals.
Generally, no other component produces APB transfers: the AHB-to-APB
bridge is the only component that acts as the APB master in a system.
System bus slave interface – The system bus interface, which transfers the
AHB/AXI transactions to the APB bridge
PCLK – Clock; generally the system clock is connected directly to this
PRESETn – Active-low asynchronous reset
PADDR[31:0] – Address bus from master to slave, can be up to 32 bits wide
PWDATA[31:0] – Write data bus from master to slave, can be up to 32 bits wide
PRDATA[31:0] – Read data bus from slave to master, can be up to 32 bits wide
PSELx – Slave select signal. There is one PSEL signal for each slave connected to the
master, so a master connected to ‘n’ slaves drives ‘n’ select
signals (e.g. PSEL1, PSEL2, ..., PSELn)
PENABLE – Indicates the second and subsequent cycles of a transfer. When PENABLE is
asserted, the ACCESS phase of the transfer starts.
PWRITE – Indicates a write when HIGH, a read when LOW
PREADY – Used by the slave to insert wait states in the transfer, i.e. whenever the
slave is not ready to complete the transaction, it asks the master for more time
by de-asserting PREADY.
PSLVERR – Indicates the success or failure of the transfer. HIGH indicates failure and
LOW indicates success
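The relationship between PADDR and the per-slave PSELx signals can be illustrated with a small
address decoder of the kind a bridge might contain. The peripheral names, base addresses and
region sizes below are hypothetical; the point is only that at most one PSEL line is driven
HIGH for a given address.

```python
# Hypothetical APB address map: (slave name, base address, region size in bytes).
APB_MAP = [
    ("uart", 0x0000, 0x100),
    ("timer", 0x0100, 0x100),
    ("keypad", 0x0200, 0x100),
]


def psel_vector(paddr):
    """Return one PSEL bit per slave; at most one bit is 1 for any address."""
    return [int(base <= paddr < base + size) for _name, base, size in APB_MAP]


print(psel_vector(0x0104))  # address inside the timer region
print(psel_vector(0x0400))  # address outside every region: no slave selected
```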
Let’s see how typical write and read transfers are done in the APB protocol.
At T1, a READ transfer starts with the address PADDR, PWRITE and PSEL.
They are registered at the rising edge of PCLK.
This is the SETUP phase of the transfer.
After T2, PENABLE and PREADY are registered at the rising edge of PCLK.
When asserted, PENABLE indicates the start of the ACCESS phase.
When asserted, PREADY indicates that the slave can complete the transfer at the next rising
edge of PCLK by providing the data on PRDATA.
The slave must provide the data before the end of the read transfer, i.e. before T3.
During the ACCESS phase, when PENABLE is high, the slave can extend the transfer by
driving PREADY low.
The PADDR, PWRITE, PSEL, PENABLE and PPROT signals must remain unchanged while
PREADY is low.
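The read transfer and its wait states can be walked through cycle by cycle with a short
behavioural sketch. This is not RTL and not taken from the specification: the function name,
the dictionary-based trace and the `memory` argument are assumptions made for illustration,
and PREADY is shown as `None` in the SETUP cycle because it is not sampled there.

```python
def apb_read(paddr, memory, wait_states=0):
    """Return the per-cycle signal values of one APB read as a list of dicts."""
    # Address and control signals are held unchanged for the whole transfer.
    common = {"PADDR": paddr, "PWRITE": 0, "PSEL": 1}
    # SETUP phase: exactly one cycle with PENABLE low.
    trace = [dict(common, phase="SETUP", PENABLE=0, PREADY=None, PRDATA=None)]
    # Extended ACCESS phase: the slave drives PREADY low for each wait state.
    for _ in range(wait_states):
        trace.append(dict(common, phase="ACCESS", PENABLE=1, PREADY=0, PRDATA=None))
    # Final ACCESS cycle: PREADY is high and the read data is valid on PRDATA.
    trace.append(dict(common, phase="ACCESS", PENABLE=1, PREADY=1,
                      PRDATA=memory[paddr]))
    return trace


trace = apb_read(0x10, {0x10: 0xABCD}, wait_states=2)
for cycle, s in enumerate(trace):
    print(cycle, s["phase"], s["PENABLE"], s["PREADY"], s["PRDATA"])
```

With no wait states the trace has two entries, one SETUP cycle and one ACCESS cycle; each
wait state adds one more ACCESS cycle.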
ERROR Response
Whenever there is a problem in the transfer, the slave signals an error response
by asserting PSLVERR. PSLVERR is only considered valid during the last
cycle of an APB transfer, when PSEL, PENABLE and PREADY are all HIGH. It is recommended,
but not mandatory, that you drive PSLVERR low when it is not being sampled.
Transactions that receive an error response might or might not have changed the state of
the peripheral. For example, if an APB master performs a write transaction to an APB slave and
receives an error response, there is no guarantee that the data was not written to the slave
peripheral.
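The sampling rule for PSLVERR can be written as a small check. The function name and the
returned strings are hypothetical; the condition itself follows the rule stated above, that
PSLVERR only counts when PSEL, PENABLE and PREADY are all HIGH.

```python
def transfer_result(psel, penable, pready, pslverr):
    """Return None while the transfer is in progress, else 'OKAY' or 'ERROR'."""
    if not (psel and penable and pready):
        return None  # not the last cycle of a transfer: PSLVERR is ignored
    return "ERROR" if pslverr else "OKAY"


print(transfer_result(1, 1, 0, 1))  # wait state: PSLVERR is not sampled
print(transfer_result(1, 1, 1, 1))  # last cycle with PSLVERR high
print(transfer_result(1, 1, 1, 0))  # last cycle with PSLVERR low
```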
To support complex system designs, it is often necessary for both the interconnect and
other devices in the system to provide protection against illegal transactions. In the APB
protocol, this is provided by the protection unit, which is signalled through PPROT[2:0].
PPROT[0]:
LOW indicates Normal Access
HIGH indicates Privileged Access
PPROT[1]:
LOW indicates Secure Access
HIGH indicates Non-Secure Access
PPROT[2]:
LOW indicates Data Access
HIGH indicates Instruction Access
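The three PPROT bits listed above can be decoded with simple bit masks. The function name and
the dictionary keys are hypothetical; the bit meanings are the ones given in the list.

```python
def decode_pprot(pprot):
    """Decode a 3-bit PPROT value into its three protection attributes."""
    return {
        "privilege": "privileged" if pprot & 0b001 else "normal",   # PPROT[0]
        "security": "non-secure" if pprot & 0b010 else "secure",    # PPROT[1]
        "access": "instruction" if pprot & 0b100 else "data",       # PPROT[2]
    }


print(decode_pprot(0b010))  # normal, non-secure, data access
```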
Operating States
SETUP : When a transfer is required, PSELx is asserted and the bus moves into the SETUP state. The
bus remains in SETUP for only one clock cycle and always moves to the ACCESS state on the next
rising edge of the clock. So, the slave must be able to sample the address and control information
in the SETUP cycle itself.
ACCESS : PENABLE is asserted to enter the ACCESS state. The PADDR, PWRITE, PSELx and
PWDATA signals must remain stable during the ACCESS state.