0% found this document useful (0 votes)
88 views

Lecture 4 On Chip Interfaces 2023

Uploaded by

Pavan Dhake
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
88 views

Lecture 4 On Chip Interfaces 2023

Uploaded by

Pavan Dhake
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 42

SoC 101:

a.k.a., “Everything you wanted to know about a computer but were afraid to ask”

On-Chip Interconnect
Prof. Adam Teman
EnICS Labs, Bar-Ilan University

1 May 2023
This Lecture

Source: ARM

2 © Adam Teman,
May 1, 2023
Lecture Overview

3 © Adam Teman,
May 1, 2023
Higher
On-Chip Connecting with Simple Bus
Performance
Communication Peripherals Operation
Buses

On-Chip Communication
Typical Computing System
• On-Chip Interconnect
• Processors
• IP Blocks
• On Chip Memory
• Off-Chip Interconnect
• Off-chip peripherals
• Off-chip memory
• Off-chip ASICs
• In this lecture, we will focus
on On-Chip Interconnect

© Adam Teman,
May 1, 2023
Communication Considerations
System-level issues and specifications for choosing communication architecture:
• Communication Bandwidth
• Rate of information transfer (bytes/sec)
• Communication Latency
• Time delay between a request and response
• Application dependent, e.g., Video Streaming vs. two-way communication
• Master and Slave
• Who can control transactions? What can be controlled?
• Concurrency Requirement
• The number of independent simultaneous channels open in parallel.
• Multiple Clock Domains
• Different IPs may operate at different frequencies.
6 © Adam Teman,
May 1, 2023
System-level Trends
• Heterogeneity among components that need to be interconnected
• Increasing volume and diversity of traffic
• Complexity of
communication logic can
easily compare to a small
microprocessor!

7 © Adam Teman,
May 1, 2023
Interconnect Scaling Trends
• Global wires scale slower than
transistors/gates
• Gates, local wires scale with technology,
global wires do not
• Global on-chip comm to operation
delay changed from 2:1 to 9:1 over
a few technology generations

Source: Bill Dally, DAC 2009 keynote


© Adam Teman,
May 1, 2023
Source: ITRS
Need for Communication-centric Design
• Communication is THE most critical aspect affecting system performance
• Communication architecture consumes up to 50% of total on-chip power
• Ever increasing number of wires, repeaters, bus components
(arbiters, bridges, decoders etc.) increases system cost
• Communication architecture design, customization, exploration, verification
and implementation takes up the largest chunk of a design cycle

Communication Architectures in today’s complex systems


significantly affect performance, power, cost and time-to-market!

© Adam Teman,
May 1, 2023
On-Chip Communication Architecture Design
Three topics to consider when discussing on-chip communication architecture:
• Communication Topology
• How the communication resources are connected
• Simple shared bus, hierarchical bus structures,
rings, mesh, custom bus networks
• Protocols
• How you manage the communication resources
• Static priority, TDMA, round-robin, token passing
• Mapping of System Communications
• Which components connect where?
• e.g., exploit locality, by putting close
components on same bus Wingard, Kurosawa,
10 IEEE CICC, 1998
© Adam Teman,
May 1, 2023
Higher
On-Chip Connecting with Simple Bus
Performance
Communication Peripherals Operation
Buses

Connecting with
Peripherals

11
Connecting with Memory
• In our discussion of Microprocessors,
we assumed the existence of external memory components:
• In a Princeton Architecture, one homogenous memory space.
• In a Harvard Architecture, separate channels for Instruction and Data Memory

Source: Wolf,
Princeton Architecture Harvard Architecture Computers as Components

• Before we go into more complex interconnect options, let’s start by looking at


how these tightly-coupled memory blocks are interfaced with the CPU.
© Adam Teman,
May 1, 2023
Synchronous SRAM Interface 2mxn SRAM
A[m-1:0]

• A typical on-chip synchronous SRAM features: D[n-1:0] Q[n-1:0]

• Single-cycle write/read latency WEN[p-1:0]

• Byte write mask CEN


• Active low Write Enable (i.e., WEN=1 → Read Enable) CLK
• The timing diagram can be viewed, as follows:
(1) Rising edge of the clock results
CLK in WRITE, when WEN is low.

A A0 A1 A2 A3 (2) Rising edge of the clock results


in READ, when WEN is high.
D D0 D1 Valid data appears on the
output after a delay.
WEN
Q D2 D3
13 © Adam Teman,
May 1, 2023
Scaling to a larger network
• The previous SRAM interface is an example of a point-to-point (p2p) link
• P2P links are simple and fast, but not scalable
• Every additional link added requires a full (private) set of signals and control
• Such an approach cannot even accommodate a simple microcontroller,
much less a complex SoC.
• Large amounts of SRAM
• Slower, higher density
memory (DRAM, Flash)
• Peripherals and accelerators

• Therefore,
we need a System Bus.
Source: Greaves, U. Cambridge
14 © Adam Teman,
May 1, 2023
System Bus
• A collection of signals (wires) to which one or more IP components
(which need to communicate data with each other) are connected.
• In addition to the clock, a synchronous bus consists of:
• An Address Bus
• A Data Bus
• A Control Bus
• In a typical system, the CPU serves
as the bus master (a.k.a. “manager”)
Source: Wolf,
and initiates all transfers. Computers as Components

• Other devices are typically called slaves (a.k.a. “subordinates”)


and they react to transfers initiated by the master.
15 © Adam Teman,
May 1, 2023
Memory Mapping
• Amazingly, the three bus components described above (address, data and
control) can facilitate the majority of required control and data transfer.
• This is thanks to the concept of Memory Mapping
• An n-bit bus supplies 2n unique byte addresses
• With a wide bus (e.g., 32-bits) only a small portion of these
addresses are required for data storage (i.e., memory)
• Therefore, every other device connected to the
system is just treated as a memory address.
• For example, registers of peripherals and accelerators
are given addresses in the system memory map.
• These registers are used to control the devices Source: Peckol, Embedded Systems

(e.g., “start operation” command) as well as to transfer data to and from them.
16 © Adam Teman,
May 1, 2023
Bus Terminology
Slave
(Subordinate) 1

Address
Decoder Slave
(Subordinate) 2
Master
Multiplexor
(Manager) Select Slave
(Subordinate) 3
(Subordinate)
Multiplexor
Slave

Source: ARM
17 © Adam Teman,
May 1, 2023
Bus Terminology
• Master (or Manager)
• Component that initiates a read or write data transfer.
• Slave (or Subordinate) A bus can accommodate multiple
• Component that does not initiate transfers Masters and multiple Slaves
and only responds to incoming transfer requests.
• Decoder
• Determines which component a transfer is intended for.
• Bridge
• Connects two busses. Slave on one side and master on the other.
• Arbiter
• Controls access to the shared bus.
Selects master to grant access to bus.
18 © Adam Teman,
May 1, 2023
Basic Bus Topologies
• Shared bus
• Components share a single set of signals.
• Only one transaction can exist at a time.
• Hierarchical busses enable multiple transactions.
• Ring
• Very low cost connection (only connect to neighbor).
• Potential long propagation delay.
• Multiple concurrent transactions.
• Crossbar
• Point-to-point connection between all.
• Very high throughput, but very costly wiring.
• Can be reduced into partial crossbar/matrix.
19 © Adam Teman,
May 1, 2023
Higher
On-Chip Connecting with Simple Bus
Performance
Communication Peripherals Operation
Buses

Simple Bus Operation

20
How does communication work?
• Communication through ports consisting of:
• An Interface – set of pins/wires that connect the components.
• A Protocol – set of rules for changing the logic levels and meaning of data.
• Flow-control is implemented through handshaking
• Data is transferred when both the sender and receiver are happy to receive.
• “ack”s and “nack”s are used for communicating readiness.
• Communication is convened between:
• A Master – the port initiating the communication.
• A Slave – the port responding to the communication.
• Interfaces can be:
• Synchronous – both sides are clocked by the same clock.
• Asynchronous – data is transferred through a clock domain crossing.
21 © Adam Teman,
May 1, 2023
Handshaking
• In order to ensure that both devices are ready to communicate
over the bus, a handshaking protocol is required.
• A conceptual handshake protocol utilizes two signals:
• ENQ (enquiry) – from transmitter to receiver
• ACK (acknowledge) – from receiver to transmitter
• The four-cycle handshake process includes:
• Device 1 raises ENQ to initiate transfer
• Device 2 raises ACK, when
ready and transmission can start
• Device 2 lowers ACK to
signal that data was received
• Device 1 lowers ENQ to finish Source: Wolf,
Computers as Components
22 © Adam Teman,
May 1, 2023
Bus Arbitration
• Only one master can control the bus
• Need some way of deciding who is master

Arbiter
• And to make sure the right slave answers
• Arbitration Source: Ques10.com

• Decides which master can use the shared bus if


several masters request bus access simultaneously
• Common arbitration schemes include:
Random, Priority-based, Round Robin, Time Division Multiplexing (TDMA), etc.
• Decoding
• Determines the target for any transfer initiated by a master
• Tells the right slave to put the response on the bus
23 © Adam Teman,
May 1, 2023
A Typical Bus Operation Example
• The following steps illustrate a typical operation
to access a peripheral: Address bus
Select a peripheral

• The Master (e.g., a processor) selects one


Slave (e.g., peripheral or register) by giving the Control bus
Read operation,

Slave (Subordinate)
address to the address bus. transfer size at the same time

Master (Manager)
• At the same time, it sets control signals,
such as read or write and transfer size. Data bus
Send data back to processor

• The Master waits for the Slave to respond.


• Once the Slave is ready, it sends back the Control bus
Set ready signal at the same time

requested data to the Master.


• At the same time, it sets the Processor reads the data
and starts the next operation
ready signal on the control bus.
• Finally, the Master reads the transmitted data Address bus
Select a peripheral
and starts another communication cycle
Source: ARM
24 © Adam Teman,
May 1, 2023
Example: The Advanced Peripheral Bus (APB)
• APB is a simple, low performance bus, part of the AMBA specification by ARM.
• APB uses the following signals (Master/Slave):
• PCLK: the bus clock source (rising-edge triggered) PCLK
PSEL1
• PRESETn: the bus reset signal (active low) PRESETn PSEL2
• PADDR: the APB address bus (up to 32-bits wide) PRDATA 32

PSELn
• PSELx: the select line for each slave device PSLVERR
• PENABLE: indicates the 2nd cycle of an APB transfer APB
Bridge PENABLE
• PWRITE: indicates transfer direction (Write=H, Read=L) PWRITE
PWDATA: the write data bus (up to 32-bits wide) PREADY1
32
• PADDR
PREADY2
• PREADYX: used to extend a transfer 32
PWDATA
• PRDATA: the read data bus (up to 32-bits wide) PREADYn
• PSLVERR: indicates a transfer error (OKAY=L, ERROR=H)
25 © Adam Teman,
May 1, 2023
PWRITE

APB Write Transfer


PADDR[31:0]
PWDATA[31:0]

PWRITE
PSEL[1] Slave 1
1 2 3 PENABLE
PADDR[31:0] PRDATA[31:0]
PWDATA[31:0] PREADY[1]
PCLK PSEL[1:N]
Master PENABLE
PWRITE PRDATA[31:0] PWRITE
PREADY[1:N] PADDR[31:0]
PWDATA[31:0]
PADDR A0 A1 PSEL[2] Slave 2
PENABLE
PWDATA PRDATA[31:0]
D0 D1 PREADY[2]

PSEL1 Select • Setup Phase 1

Slave 1
• PWRITE, PADDR, PWDATA are set
Select
PSEL2 Slave 2 • PSEL is raised for selected Slave
PENABLE
• Access Phase 2

• PENABLE is raised, with other signals held


PRDATA • When selected Slave acks, PREADY is raised
Can extend
PREADY1
to delay write • Next Transfer 3

• PENABLE is lowered by Master


PREADY2
• PREADY may be lowered by Slave
© Adam Teman,
May 1, 2023
PWRITE

APB Read Transfer


PADDR[31:0]
PWDATA[31:0]

PWRITE
PSEL[1] Slave 1
1 2 3 PENABLE
PADDR[31:0] PRDATA[31:0]
PWDATA[31:0] PREADY[1]
PCLK PSEL[1:N]
Master PENABLE
PWRITE PRDATA[31:0] PWRITE
PREADY[1:N] PADDR[31:0]
PWDATA[31:0]
PADDR A0 A1 PSEL[2] Slave 2
PENABLE
PWDATA PRDATA[31:0]
PREADY[2]

PSEL1 Select
• Setup Phase 1
Slave 1 • PWRITE is lowered, PADDR is set
Select
PSEL2 Slave 2 • PSEL is raised for selected Slave
• Access Phase 2
PENABLE
• PENABLE is raised, with other signals held
PRDATA D0 • Slave raises PREADY and drives PRDATA
Can extend
to delay read • Next Transfer 3
PREADY1
• PENABLE is lowered by Master
PREADY2 • PREADY may be lowered by Slave
© Adam Teman,
May 1, 2023
See straightforward RTL implementation

APB State Diagram by Quick Silicon at:


https://www.edaplayground.com/x/AXrK

Only remains in the


SETUP state for one
clock cycle

Slave pulls PREADY


Enter ACCESS low to cause WAIT
state one cycle state
after SETUP state
28 Source: ARM © Adam Teman,
May 1, 2023
Higher
On-Chip Connecting with Simple Bus
Performance
Communication Peripherals Operation
Buses

Higher Performance Buses

29
Increasing Bus Performance
• The APB example was a simple low-performance bus
• Two-cycles to carry out a single transfer
• Each transfer requires the definition of address and data
• However, this can be easily improved by:
• Pipelining transfers:
Overlap Setup and Access phases
to achieve single-clock edge transfers.
• Burst operations:
Provide a single address with
multiple (sequential) data.
• Wide bus widths:
Support 64-bit, 128-bit, or even wider buses.

30 source: gruzovikpress.ru/
© Adam Teman,
May 1, 2023
Example: Advanced High-Performance Bus
• A more advanced bus defined within the AMBA specification
Decoder selects the correct is the Advanced High-Performance (AHB) bus.
slave according to the address. HWRITE
HREADY HADDR[31:0]
HRESP HWDATA[31:0]
HSIZE[2:0]
HRDATA[31:0] Master HBURST[2:0]
HRESETn HPROT[3:0]
HTRANS[1:0]
HCLK
HMASTLOCK

HSELx
HREADYOUT
HWRITE
HRESPx
HADDR[31:0]
HWDATA[31:0] HRDATAx[31:0]
HSIZE[1:0]
HBURST[2:0]
Slave
HPROT[3:0]
HTRANS[1:0] HRESETn
HMASTLOCK HCLK
HREADY
Source: ARM
Multiplexor multiplexes the correct read data bus
and response from the selected slave. © Adam Teman,
May 1, 2023
31
AHB Basic Read Operation
• An AHB transfer consists of two (overlapped) phases:
• The address phase: Master drives address and control signals onto the bus.
• The data phase: Selected Slave responds with HRDATA.
HREADY is lowered to extend the data phase
• The next address phase is applied during the current data phase.
HCLK

CONTROL Control 0 Control 1 Control 2 Control 3

HADDR [31:0] Address 0 Address 1 Address 2 Address 3

HRDATA [31:0]
Read Data 0 Read Data 1 Read Data 2

HWRITE

32 HREADY © Adam Teman,


May 1, 2023
AHB Burst Transfer
• AHB Supports bursts of different lengths
• Master provides one address and the burst length
• Several operations (W/R) are applied to incrementing addresses
• Allows reducing the overhead of the address phase
1 Provide address and burst length
Example: 4-beat Read Burst with “wait” state.
Lower HWRITE
First HTRANS=NONSEQ
2 Slave reads out first address
Master is not ready so sets BUSY HTRANS[1:0]

3 Master sets HTRANS=SEQ HADDR[31:0]

Read data (delayed) is ignored HWRITE

4 Master increments address HBURST[2:0]


Slave reads out second address HREADY
5 Slave reads out third address HRDATA[31:0]
But lowers HREADY to delay
6 Slave reads out last address
34 one cycle after raising HREADY © Adam Teman,
May 1, 2023
AHB with Multiple Masters
• Arbiter controls which master
initiates current transaction.

35 © Adam Teman,
May 1, 2023
Even More Performance
• To get even more performance, a bus can have the following capabilities:
• Independent read and write channels:
Simultaneous reads and writes → Improved bandwidth
• Multiple outstanding addresses:
Master can issue new transactions without waiting for previous to complete
• Out-of-order transaction completion:
A later transaction can complete before a previously launched one.
• Independent address and data operations:
If there is no strict timing relationship between address and data operations,
they can be arbitrarily separated
• These and other features are supported by the
Advanced eXtensible Interface (AXI) bus of the AMBA specification.
36 © Adam Teman,
May 1, 2023
The Advanced eXtensible Interface (AXI)
• AXI is an interface specification that defines the interface of IP blocks,
rather than the interconnect itself.
• AXI supports multiple masters (Managers)
and multiple slaves (Subordinates)
• AXI uses five main channels
(i.e., groups of signals) for communication:
• Write Address (AW)
• Write Data (W)
• Write Response (B)
• Read Address (AR)
• Read Data (R)
• Read response is passed as part of Read Data
37 Source: ARM © Adam Teman,
May 1, 2023
Channel handshake
• All channels have VALID (from source)
and READY (from destination) signals
• VALID remains high until
READY signal rises.
(1) Source Information is ready.
VALID goes high.

(2) Destination acknowledges


it is ready to receive information.
READY goes high.

(3) Information is passed from source to


destination at rising edge of clock.
(4) Transaction is complete.
VALID goes low. Information changed.
READY goes low.
38 * Note that READY can be asserted before VALID © Adam Teman,
May 1, 2023
Example: Write Transaction
(1) ADDRESS
Handshake.
(2) DATA Handshake.

(3) Burst transaction.


WVALID remains
high.
(4) WVALID falls.
Pause in
transaction
(5) WLAST indicates
final data.
(6) RESPONSE
handshake.
Note that SLAVE
is source.
39 © Adam Teman,
May 1, 2023
Transaction Ordering
• AXI Supports Interleaved/
Out-of-Order Transactions
• Example of a simple
transaction

• Example of a more
complex transaction

Source: ARM
40 © Adam Teman,
May 1, 2023
Multi-Level Buses
• A microprocessor system often has more than one bus.
• Complexity: High speed buses are more complex (wider and implement
sophisticated protocols), often not required for simple, slower devices.
• Parallelism: Breaking up the bus can provide less contention between devices
that operate independently.
• A bridge connects two buses:
• Acts as a slave on one bus
(e.g., the fast bus)
• Acts as a master on the second
bus (e.g., the slow bus)
• Provides protocol translation
and speed synchronization.
Source: Wolf,
Computers as Components
41 © Adam Teman,
May 1, 2023
AMBA Multi-Level Approach PULPino architecture

• AMBA is designed for multi-level buses


• Commonly use a bridge from a high-speed
bus (e.g., AXI) to a low-speed bus (e.g., APB)
to accommodate low-speed peripherals.

https://pulp-platform.org/

Source: ARM
42 © Adam Teman,
May 1, 2023
References
• Anand Raghunathan, ECE 695R: System-on-Chip Design
• https://nanohub.org/courses/ECE695R/o1a
• Lectures 1.7, 4.1, 4.2
• Pasricha, Dutt, “On-Chip Communication Architectures”, 2008
• Flynn, Luk “Computer System Design: System-on-Chip”, 2011
• University of Texas, EE319K Introduction to Embedded Systems
• Circuits Basics “BASICS OF UART COMMUNICATION
• ARM AMBA Bus specifications
• ARM Education Kits
• AXI Protocol Overview,
https://developer.arm.com/documentation/102202/0200/AXI-protocol-overview
43 © Adam Teman,
May 1, 2023

You might also like