System Busses / Networks-on-Chip: EECE 579 - Advanced Topics in VLSI Design Spring 2009 Brad Quinton

This document discusses different types of system bus architectures and networks-on-chip (NoC) used to connect components in system-on-chip (SoC) designs. It begins by describing simple system busses like Advanced Microcontroller Bus Architecture (AMBA) Advanced Peripheral Bus (APB), which allow a processor to communicate with peripherals through load and store instructions but are limited by a single master. More complex busses like AMBA Advanced High-performance Bus (AHB) aim to improve scalability. The document concludes by introducing Networks-on-Chip (NoC) as an alternative to busses that can provide better performance and scalability in large SoCs using techniques like topologies, protocols,


System Busses / Networks-on-Chip

EECE 579 - Advanced Topics in VLSI Design


Spring 2009
Brad Quinton

Outline
1. Simple systems busses
• Overview
• AMBA APB
• Advantages/Limitations
2. Complex systems busses
• Overview
• AMBA AHB
• Advantages/Limitations
3. Networks-on-Chip (NoC)
• Overview
• AMBA AXI
• Research Topics: Topology, Protocol, VLSI Implementation...
• Review: “A Generic Architecture for On-Chip Packet-Switched Interconnections”

Bluetooth “Platform” SoC

[Figure: block diagram of a Bluetooth “platform” SoC. Labelled regions: Processor; Application Specific Logic; System Bus / Hardware I/F; Low-speed I/O and Support Logic. Labelled blocks: ARM7TDMI, arbiter, decoder, memory controller, SMC, TIC, AHB/APB bridge, shared memory controller, DMA, LMC, shared memory, radio I/F, ADC, speech I/F, power & clock control, DAP I/F, PLL clocks, GPIO, PIC, timers, watchdog, two UARTs, ACI, USB.]


Simple System Busses

• The primary goal of a simple system bus is to allow software (running on a processor) to communicate with other hardware in the SoC

• There are many different implementations ... but they are all very similar

Embedded Processor I/O

• RISC-based embedded processors communicate with external hardware using two simple instructions:

– Load Operation: Copies a word of data from a specific address to a local register

– Store Operation: Copies a word of data from a local register to a specific address

• The simple system bus is just a direct extension of this model

[Figure: bus transaction diagram, annotated in three steps:]
1. Software sets up the register with the address and data ...
2. Blocks decode addresses to see if they are the targets ...
3. Data is transferred between the register and the hardware

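The load/store model and the address decode step can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the original slides: the `Peripheral` and `SimpleBus` classes, the memory map (a UART at 0x1000 and a timer at 0x2000), and the 16-bit word size are all assumptions made for the example.

```python
class Peripheral:
    """A hardware block that owns a contiguous range of bus addresses."""

    def __init__(self, name, base, size):
        self.name = name
        self.base = base
        self.size = size
        self.regs = [0] * size  # one word per address, for simplicity

    def is_target(self, addr):
        # Address decode: does this address fall inside my range?
        return self.base <= addr < self.base + self.size


class SimpleBus:
    """Single-master bus: the address is broadcast to every block."""

    def __init__(self, blocks):
        self.blocks = blocks

    def _decode(self, addr):
        for block in self.blocks:
            if block.is_target(addr):
                return block
        raise ValueError(f"no block decodes address {addr:#06x}")

    def store(self, addr, data):
        # Store: copy a word from a processor register to an address.
        block = self._decode(addr)
        block.regs[addr - block.base] = data & 0xFFFF  # assumed 16-bit word

    def load(self, addr):
        # Load: copy a word from an address to a processor register.
        block = self._decode(addr)
        return block.regs[addr - block.base]


# Hypothetical memory map, chosen only for this example.
uart = Peripheral("uart", base=0x1000, size=4)
timer = Peripheral("timer", base=0x2000, size=4)
bus = SimpleBus([uart, timer])

bus.store(0x1001, 0x00AB)   # software writes a UART register
value = bus.load(0x1001)    # software reads it back
```

Only the block whose range contains the address responds, so adding a new block amounts to adding one more entry to the list, which mirrors how easy it is to extend a simple bus functionally.
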
AMBA Specification

• AMBA: Advanced Microcontroller Bus Architecture

• Created by ARM to enable standardized interfaces to their embedded processors

• Actually three standards: APB (simple bus), AHB (complex bus), and AXI (NoC)

• Very commonly used for commercial IP cores

AMBA APB: Read Operation

[Figure: APB read timing diagram, annotated with: Target Address; Transaction Type; Address Decode; Optional (for asynchronous implementations ...); Read Data]

AMBA APB: Write Operation

[Figure: APB write timing diagram, annotated with: Common Signals Between Read and Write; Write Data]

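As a rough sketch of what the two timing diagrams convey, the snippet below lists the signal values a master drives during one APB transfer. The signal names (PADDR, PWRITE, PSEL, PENABLE, PWDATA) come from the AMBA APB specification, not from the slide text, and the helper `apb_transfer` and the example address are hypothetical. It assumes the basic two-cycle transfer with no wait states.

```python
def apb_transfer(addr, write, wdata=None):
    """Return the master-driven signals for the two cycles of one APB transfer."""
    # Signals common to reads and writes: address, type, and select.
    common = {"PADDR": addr, "PWRITE": write, "PSEL": True}
    if write:
        common["PWDATA"] = wdata          # only writes carry write data
    setup = dict(common, PENABLE=False)   # cycle 1: address and type presented
    enable = dict(common, PENABLE=True)   # cycle 2: data is transferred
    return [setup, enable]


read_cycles = apb_transfer(0x1004, write=False)
write_cycles = apb_transfer(0x1004, write=True, wdata=0x55)
```

On a read, the selected target drives the read data back during the second cycle; that return path is not modelled here.
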
Remember Our Case Study
Simple generic processor interface:

- data width: 16 bits
- address width: 16 bits
- read cycle time: 50 ns
- write cycle time: 50 ns

[Figure label: System bus]

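The case-study numbers imply an upper bound on what this interface can carry. A small worked calculation, assuming one full-width transfer completes every cycle time (a best case, not a figure stated in the slides):

```python
DATA_WIDTH_BITS = 16
ADDRESS_WIDTH_BITS = 16
CYCLE_TIME_NS = 50

# One transfer per 50 ns cycle.
transfers_per_second = 1e9 / CYCLE_TIME_NS
peak_bits_per_second = DATA_WIDTH_BITS * transfers_per_second
peak_megabytes_per_second = peak_bits_per_second / 8 / 1e6

# Number of distinct addresses a 16-bit address can select.
addressable_locations = 2 ** ADDRESS_WIDTH_BITS
```

Under these assumptions the interface peaks at 20 million transfers per second, or 40 MB/s, across 65,536 addressable locations.
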
Simple Bus Advantages

• Simple to implement
• Easy to understand
• Simple programming model
• Easy to add new hardware blocks
• Minimal hardware requirements (most of the signals are shared)

Simple Bus Limitations

• Single master - limits parallelism
• Scalability - performance suffers as the bus is loaded...
• Single outstanding request - poor throughput and a multi-threading performance bottleneck

Case Study: Single Master

• Imagine a new partition:
– APS Bit Error Monitor communicates directly with Switch

• Simple bus doesn't work... [Figure label: No Path]

• This can make software the bottleneck in the system....

Single Master Summary

• A bus that is limited to a single master:

– Makes inter-block communication inefficient
– Limits parallelism between hardware and software
– Increases reliance on interrupts
– Creates software performance bottlenecks
– Is not compatible with multiple processors

Scalability

[Figure: simple bus diagram, annotated:]
• Blocks are functionally easy to add, but....
• Each new block increases the delay on the address and data

Scalability Summary

• Simple busses are not scalable because:

– The address and data “fan-out” to each target
– Adding a new block increases the load on the bus
– Increased fan-out + greater load = reduced performance

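The effect of loading can be illustrated with a first-order electrical model. The sketch below treats the shared bus wire as a single lumped RC net, whose 50% delay is ln(2)·R·C. This model, the driver resistance, and the capacitance added per block are illustrative assumptions, not values from the slides; a real bus would use buffering and would need a distributed-wire model.

```python
import math


def bus_delay_ns(num_blocks, driver_res_ohm=1000.0, cap_per_block_ff=50.0):
    """50% delay of a lumped RC net: t = ln(2) * R * C_total.

    Each attached block is assumed to add a fixed capacitance
    (its input load plus the extra wire needed to reach it).
    """
    c_total_farads = num_blocks * cap_per_block_ff * 1e-15
    return math.log(2) * driver_res_ohm * c_total_farads * 1e9


# Delay for a growing number of attached blocks (hypothetical values).
delays = {n: bus_delay_ns(n) for n in (4, 8, 16, 32)}
```

In this model the delay grows linearly with the number of blocks, so doubling the block count doubles the delay on every shared signal.
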
Single Outstanding Request

[Figure: bus timing diagram, annotated:]
• Processor is stalled waiting for response...
• best-case <= 50% efficiency

Single Outstanding Request Summary

• Busses limited to a single outstanding request:

– Reduce software performance since the software must “stall” on the first transaction

– Are not able to achieve full bus throughput since the data bus is idle during the address phase

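The 50% figure follows directly from counting cycles. A minimal sketch, assuming each transaction has separate address and data phases that cannot overlap; the function name and the wait-cycle parameter are hypothetical:

```python
def data_bus_efficiency(address_cycles, data_cycles, wait_cycles=0):
    """Fraction of bus cycles in which the data bus actually carries data."""
    total_cycles = address_cycles + wait_cycles + data_cycles
    return data_cycles / total_cycles


# Best case: one address cycle followed by one data cycle.
best_case = data_bus_efficiency(address_cycles=1, data_cycles=1)

# A slower target that needs two extra cycles before responding.
slow_target = data_bus_efficiency(address_cycles=1, data_cycles=1, wait_cycles=2)
```

The best case gives 0.5, and any wait cycles push the efficiency lower, which is why the slide states the bound as "<= 50%".
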
Complex System Busses

• The complex system bus attempts to address some of the issues with the simple bus:

– Multi-master
– Pipelined transactions

• There are many different ways to go about this...

AMBA AHB

• AHB addresses many of the limitations of APB:

– multi-master
– multiple outstanding transactions (sort of...)
– back-to-back transactions

• Unfortunately, this adds significant complexity

Bring on the complexity...

[Figure: shared bus connecting CPU #1, CPU #2, and IP Blocks #1 to #4, annotated in sequence: Request; Grant; Transaction]

Bus Arbitration

• When multiple masters share a bus there must be some central resource to manage the bus: an arbiter

• Once there is competition for the bus, it is possible that it is not ready when you need it: backpressure

• Backpressure adds complexity and hurts performance

Request / Grant Protocol

[Figure: request/grant timing diagram, annotated in three steps:]
1. Before a transaction a master makes a request to the central arbiter
2. Eventually the request is granted
3. Then the transaction proceeds

[Figure label: Performance Impact]

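One way to picture the central arbiter is as a function from the current request lines to a single grant. The sketch below uses a round-robin policy, which is only one of many possible arbitration schemes and is not prescribed by the slides; the class name and interface are hypothetical.

```python
class RoundRobinArbiter:
    """Grants the bus to one requesting master per call, rotating priority."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start so that master 0 has the highest priority on the first call.
        self.last_granted = num_masters - 1

    def grant(self, requests):
        """requests: one bool per master. Returns the granted index, or None."""
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_granted + offset) % self.num_masters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None  # nobody is requesting the bus


arbiter = RoundRobinArbiter(num_masters=3)

# Masters 0 and 2 request the bus continuously; master 1 stays idle.
grants = [arbiter.grant([True, False, True]) for _ in range(4)]
idle = arbiter.grant([False, False, False])
```

With two competing masters, each one is granted the bus only every other time, so each waits while the other's transaction proceeds. That waiting is the performance impact of arbitration.
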
Pipelined Transactions

• To help improve bus efficiency the transactions on the bus can be pipelined

• This is really a simple implementation of multiple outstanding transactions

• The address for one transaction can be presented before the data from the previous transaction has been completed

[Figure: pipelined bus timing diagram, annotated with: Transaction A Starts; Transaction B Starts; Transaction A Completes; Notice backpressure]

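The overlap between address and data phases can be made concrete with a small schedule. This sketch assumes single-cycle address and data phases and no backpressure (no wait states); the function name is hypothetical.

```python
def pipelined_schedule(names):
    """Return (address_phase, data_phase) for each bus cycle.

    The address of transaction i+1 is presented in the same cycle
    as the data of transaction i.
    """
    slots = []
    for cycle in range(len(names) + 1):
        address_phase = names[cycle] if cycle < len(names) else None
        data_phase = names[cycle - 1] if cycle >= 1 else None
        slots.append((address_phase, data_phase))
    return slots


schedule = pipelined_schedule(["A", "B", "C"])
pipelined_cycles = len(schedule)
unpipelined_cycles = 2 * 3  # address + data cycle for each of 3 transactions
```

Three transactions take four cycles instead of six, and in the steady state the data bus is busy on every cycle. Backpressure from a slow target would insert extra cycles that stall both phases.
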
Advantages

• Relatively easy to add new blocks
• Still has the familiar bus structure
• Low hardware cost
• Bus arbitration “solves” many ordering problems

Disadvantages

• Busses that require arbitration:

– must route signals to the arbitration logic and back
– must find a “fair” way to share the bus
– slaves are not always available => backpressure
– difficult to provide performance guarantees...

• Still potentially a bandwidth bottleneck

• Still doesn't scale well when blocks are added

• Multiple outstanding transactions not handled well - no ordering information

Networks-on-Chip (NoCs)

• It is clear that even with significant design effort the bus-style interconnect is not going to be sufficient for large SoCs:

– the physical implementation does not scale: bus fanout, loading, and arbitration depth all reduce operating frequency

– the available bandwidth does not scale: the single bus must be shared by all masters and slaves

• Let's start again: leverage research from data networking

What do we want?

• The SoCs of the future will:

– have 100s of hardware blocks,
– have billions of transistors,
– have multiple processors,
– have large wire-to-gate delay ratios,
– handle large amounts of high-speed data,
– need to support “plug-and-play” IP blocks

• Our NoC needs to be ready for these SoCs...

The Ideal Network

• What would the ideal network look like?

– Low area overhead
– Simple implementation
– High-speed operation
– Low latency
– High bandwidth
– Operates at a constant frequency even with additional blocks
– Increases available bandwidth as blocks are added
– Provides performance guarantees
– Has a “universal” interface

• These are competing requirements: design a network that is the “best” fit.

What do we need to decide?

• Network Interface
• Network Protocol / Transaction Format
• Network Topology
• VLSI Implementation

Network Interface

• We want our network to be “plug-and-play” so industry standardization is key

• However, the standard must be universal enough to address many different needs

• AMBA AXI is an example of an attempt at this

AMBA AXI

• ARM added the AXI specification to Version 3.0 of the AMBA standard

• New approach: define the interface and leave the interconnect up to the designers

• Good plan since a specific bus implementation is no longer required

• It is possible to use AXI to build many different NoCs

AMBA AXI

• Interface divided into 5 channels:

– Write Address
– Write Data
– Write Response
– Read Address
– Read Data/Response

• Each channel is independent and uses two-way flow control

AMBA AXI Read Channels

[Figure: the two read channels, drawn as independent, annotated:]
– Read address channel: “Give me some data”
– Read data channel: “Here you go”
– Channels synchronized with ID # or “tags”

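Because the two read channels are independent, a master needs the ID tag to pair each response with the request that caused it. A minimal sketch of that bookkeeping follows. The signal names ARID, ARADDR, RID, and RDATA are taken from the AXI specification; the `ReadMaster` class, addresses, and data values are hypothetical.

```python
class ReadMaster:
    """Tracks outstanding reads and matches responses to requests by ID."""

    def __init__(self):
        self.pending = {}   # transaction ID -> address requested
        self.results = {}   # address -> data returned

    def issue(self, txn_id, addr):
        # Read address channel: "give me some data", tagged with an ID.
        self.pending[txn_id] = addr
        return {"ARID": txn_id, "ARADDR": addr}

    def receive(self, response):
        # Read data channel: the RID tag says which request this answers.
        addr = self.pending.pop(response["RID"])
        self.results[addr] = response["RDATA"]


master = ReadMaster()
master.issue(0, 0x100)
master.issue(1, 0x200)

# The responses return in the opposite order to the requests.
master.receive({"RID": 1, "RDATA": 0xBEEF})
master.receive({"RID": 0, "RDATA": 0xCAFE})
```

Without the tag, the master could not tell that the first response belongs to the second request. AXI permits this kind of reordering only between transactions that carry different IDs.
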
AMBA AXI Write Channels

[Figure: the three write channels, drawn as independent, annotated:]
– Write address channel: “I'm sending data. Please store it.”
– Write data channel: “Here is the data.”
– Write response channel: “I received that data correctly.”
– Channels synchronized with ID # or “tags”

AMBA AXI Flow-Control

• Information moves only when:

– Source is Valid, and
– Destination is Ready

• On each channel the master or slave can limit the flow

• Very flexible

[Figure: handshake timing diagram, annotated: Transfer]

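The two-way handshake can be sketched as a per-cycle check: an item moves only in a cycle where the source's valid and the destination's ready are both asserted. The function name, the example waveforms, and the payload labels below are hypothetical.

```python
def run_channel(valid, ready, payload):
    """Simulate one channel; return [(cycle, item)] for each transfer.

    valid: per-cycle booleans driven by the source.
    ready: per-cycle booleans driven by the destination.
    """
    transfers = []
    items = iter(payload)
    current = next(items, None)
    for cycle, (is_valid, is_ready) in enumerate(zip(valid, ready)):
        if current is None:
            break  # nothing left to send
        if is_valid and is_ready:
            transfers.append((cycle, current))
            current = next(items, None)
    return transfers


valid = [True, True, False, True, True]
ready = [False, True, True, True, False]
transfers = run_channel(valid, ready, ["D0", "D1", "D2"])
```

Either side can throttle the channel: the destination stalls cycle 0 and cycle 4 by holding ready low, and the source idles cycle 2 by holding valid low, so only D0 and D1 move in this window.
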
AMBA AXI Flow-Control

• This definition of very independent, fully flow-controlled channels is very useful

• However, there is a potential problem: DEADLOCK

• On a write transaction the master must not wait for AWREADY before asserting WVALID

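The deadlock rule can be demonstrated with a toy cycle-by-cycle simulation. It assumes a slave that only accepts the write address once it also sees the write data being offered. This simplified slave model and the function below are illustrative assumptions, not a description of any particular AXI implementation.

```python
def simulate_write(master_waits_for_awready, max_cycles=10):
    """Return the cycle in which the write completes, or None if it hangs."""
    address_done = False
    data_done = False
    for cycle in range(1, max_cycles + 1):
        # Master side.
        awvalid = not address_done
        if master_waits_for_awready:
            # Non-compliant: holds back the data until the address is accepted.
            wvalid = (not data_done) and address_done
        else:
            # Compliant: offers the data without waiting for AWREADY.
            wvalid = not data_done

        # Hypothetical slave: accepts the address only when data is offered too.
        awready = awvalid and wvalid
        wready = wvalid and (address_done or awready)

        if awvalid and awready:
            address_done = True
        if wvalid and wready:
            data_done = True
        if address_done and data_done:
            return cycle
    return None  # no progress: each side is waiting for the other


compliant = simulate_write(master_waits_for_awready=False)
deadlocked = simulate_write(master_waits_for_awready=True)
```

The compliant master finishes in the first cycle. The non-compliant master never asserts WVALID, the slave therefore never asserts AWREADY, and the transaction never completes.
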
AMBA AXI Read

[Figure: AXI read timing diagram showing the Read Address Channel and the Read Data Channel]

AMBA AXI Write

[Figure: AXI write timing diagram showing the Write Address Channel, the Write Data Channel, and the Write Response Channel]

A True Interface Specification

• Because of the channel independence and the two-way flow-control the interface does not dictate the network protocol, transaction format, network topology, or VLSI implementation

• For example:
– if you want to build a packet-based network, you can “backpressure” the data channel while you build the packet header from the address channel information,
– you can use store-and-forward, or cut-through,
– etc.

Network Protocol / Transaction Format

• There are many choices for network protocols and transaction formats:

– circuit-switched: plan and provision a connection before communication starts

– packet-switched: issue packets which compete for network resources

– hybrids: schedule connectivity (dynamic or static)

• There is still lots of research here....

Network Topology

• How should your network elements be interconnected:

– Fully Connected (N^2): high area cost, high performance
– Mesh: low area cost, potentially poor performance
– Hypercube: medium area, traffic dependent performance
– Fat-tree: medium area, traffic dependent performance
– Torus: medium area, traffic dependent performance

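The area ranking in this list can be made more tangible by counting bidirectional links for networks of the same size. The sketch below uses standard link-count formulas for four of the topologies, with link count as a rough proxy for area, which is an assumption. The fat-tree is left out because its link count depends on how it is constructed.

```python
def links_fully_connected(n):
    # Every pair of nodes has its own link: n(n-1)/2, which grows as N^2.
    return n * (n - 1) // 2


def links_mesh_2d(k):
    # k x k grid: k rows of (k-1) links plus k columns of (k-1) links.
    return 2 * k * (k - 1)


def links_torus_2d(k):
    # k x k grid with wrap-around links; formula holds for k >= 3.
    return 2 * k * k


def links_hypercube(d):
    # 2^d nodes, each with d neighbours, each link shared by two nodes.
    return d * 2 ** (d - 1)


# All four networks below connect 16 nodes.
link_counts = {
    "fully connected": links_fully_connected(16),
    "mesh 4x4": links_mesh_2d(4),
    "torus 4x4": links_torus_2d(4),
    "hypercube d=4": links_hypercube(4),
}
```

For 16 nodes this gives 120 links for the fully connected network, 24 for the mesh, and 32 each for the torus and hypercube, which is consistent with the high, low, and medium area labels above.
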
Network Topology

• There is a lot of research here....

Network Topology - Caveat

• There has been a lot of research on topologies for NoCs, however it is important to realize that the performance of a topology is highly dependent on the traffic patterns!

• Traffic patterns in an SoC that you are designing yourself are NOT random, therefore much of the topology research is not applicable to most SoCs!

VLSI Implementation

• Once you have a topology there is still the matter of implementing it on your SoC

• There are many considerations:

– Clocking: Synchronous, Asynchronous
– Buffer Insertion: Trade-off power, area, performance
– Register Insertion / Pipelining: Trade-off clock frequency, area, and latency
– Packet Buffers: Trade-off area, latency and throughput

• Again, lots of research on-going...


Bluetooth “Platform” SoC

[Figure: the Bluetooth “platform” SoC block diagram from the start of the lecture, repeated]


Research Paper

• Let's look at:

Guerrier, P.; Greiner, A., "A generic architecture for on-chip packet-switched interconnections," Design, Automation and Test in Europe Conference and Exhibition 2000, Proceedings, pp. 250-256, 2000
