AXI Interface Guide
Xilinx is providing this product documentation, hereinafter Information, to you AS IS with no warranty of any kind, express or implied.
Xilinx makes no representation that the Information, or any particular implementation thereof, is free from any claims of infringement. You
are responsible for obtaining any rights you may require for any implementation based on the Information. All specifications are subject to
change without notice.
XILINX EXPRESSLY DISCLAIMS ANY WARRANTY WHATSOEVER WITH RESPECT TO THE ADEQUACY OF THE INFORMATION OR
ANY IMPLEMENTATION BASED THEREON, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OR REPRESENTATIONS THAT
THIS IMPLEMENTATION IS FREE FROM CLAIMS OF INFRINGEMENT AND ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Except as stated herein, none of the Information may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or
transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without
the prior written consent of Xilinx.
Copyright 2012 Xilinx, Inc. XILINX, the Xilinx logo, Virtex, Spartan, Kintex, Artix, ISE, Zynq, and other designated brands included herein
are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.
ARM and AMBA are registered trademarks of ARM in the EU and other countries. All other trademarks are the property of their
respective owners.
Revision History
The following table shows the revision history for this document:
Date         Version    Description of Revisions
03/01/2011   13.1       Second Xilinx release. Added new AXI Interconnect features.
                        Corrected ARESETN description in Appendix A.
03/07/2011   13.1_web
07/06/2011   13.2
10/19/2011   13.3       Release updates:
                        - Added information about an AXI Interconnect option to delay
                          assertion of AWVALID/ARVALID signals until FIFO occupancy
                          permits uninterrupted burst transfers to AXI Interconnect Core
                          Features, page 14.
                        - Added a limitation related to CORE Generator use in AXI
                          Interconnect Core Limitations, page 17.
                        - Added the impact of the delay-assertion BRAM FIFO option as a
                          means of improving throughput to Table 5-1, page 88.
                        - Added the impact of the delay-assertion BRAM FIFO option as a
                          means of improving throughput to Throughput / Bandwidth
                          Optimization Guidelines, page 92.
                        - Added a reference to the AXI MPMC Application Note (XAPP739)
                          to AXI4-based Multi-Ported Memory Controller: AXI4 System
                          Optimization Example, page 94.
                        - Added information regarding the AXI Interconnect option to
                          delay assertion of AWVALID/ARVALID signals until FIFO
                          occupancy permits uninterrupted burst transfers to Refining
                          the AXI Interconnect Configuration, page 98.
                        - Added information about using the BSB for an AXI design in
                          Using Base System Builder Without Analyzing and Optimizing
                          Output, page 106.
                        - Added a reference to the AXI MPMC Application Note (XAPP739)
                          to Appendix C, Additional Resources.
01/18/2012   13.4       Modified:
                        - References to 7 series and Zynq Extensible Platform devices
                          in Introduction in Chapter 1, Introducing AXI for Xilinx
                          System Development.
                        - Figure 2-1, page 11 and Figure 2-4, page 13 to reflect the
                          new IP Catalog in the tools.
                        Added:
                        - References to new design templates, documented in
Table of Contents
Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Chapter 1
Introducing AXI for Xilinx System Development
Introduction
Xilinx adopted the Advanced eXtensible Interface (AXI) protocol for Intellectual Property
(IP) cores beginning with the Spartan-6 and Virtex-6 devices. Xilinx has continued the
use of the AXI protocol for IP targeting the 7 series and the Zynq-7000 Extensible
Processing Platform (EPP) devices. (Zynq is in Beta development.)
This document is intended to:
- Give an overview of the Xilinx tools you can use to create AXI-based IP
Note: This document is not intended to replace the Advanced Microcontroller Bus
Architecture (AMBA) ARM AXI4 specifications. Before beginning an AXI design, you need to
download, read, and understand the ARM AMBA AXI Protocol v2.0 Specification, along with the
AMBA4 AXI4-Stream Protocol v1.0.
These are the steps to download the specifications; you might need to fill out a brief
registration before downloading the documents:
1. Go to www.amba.com.
2. In the Contents pane on the left, click AMBA > AMBA Specifications > AMBA4.
3. Download both the AMBA AXI4-Stream Protocol Specification and the AMBA AXI Protocol
Specification v2.0.
What is AXI?
AXI is part of ARM AMBA, a family of microcontroller buses first introduced in 1996. The
first version of AXI was included in AMBA 3.0, released in 2003. AMBA 4.0, released
in 2010, includes the second version of AXI, AXI4.
There are three types of AXI4 interfaces:
- AXI4: for memory-mapped interfaces; allows bursts of up to 256 data transfer cycles
with just a single address phase.
- AXI4-Lite: a light-weight, memory-mapped interface that allows only one data transfer
per transaction.
- AXI4-Stream: removes the requirement for an address phase altogether and allows
unlimited data burst size. AXI4-Stream interfaces and transfers do not have address
phases and are therefore not considered to be memory-mapped.
Xilinx introduced these interfaces in the ISE Design Suite, release 12.3.
Data can move in both directions between the master and slave simultaneously, and data
transfer sizes can vary. The limit in AXI4 is a burst transaction of up to 256 data
transfers; AXI4-Lite allows only one data transfer per transaction.
Figure 1-1, page 5 shows how an AXI4 Read transaction uses the Read address and Read
data channels:
Figure 1-1: AXI4 Read Transaction Using the Read Address and Read Data Channels
Figure 1-2 shows how a write transaction uses the write address, write data, and write
response channels.
Figure 1-2: AXI4 Write Transaction Using the Write Address, Write Data, and Write
Response Channels
As shown in the preceding figures, AXI4 provides separate data and address connections
for reads and writes, which allows simultaneous, bidirectional data transfer. AXI4 requires
a single address and then bursts up to 256 words of data. The AXI4 protocol describes a
variety of options that allow AXI4-compliant systems to achieve very high data
throughput. Some of these features, in addition to bursting, are: data upsizing and
downsizing, multiple outstanding addresses, and out-of-order transaction processing.
At a hardware level, AXI4 allows a different clock for each AXI master-slave pair. In
addition, the AXI protocol allows the insertion of register slices (often called pipeline
stages) to aid in timing closure.
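The single-address burst model described above can be sketched behaviorally. The following Python fragment is illustrative only (not part of the AXI specification or any Xilinx tool); it expands an ARLEN value into the corresponding data beats, with RLAST marking the final beat:

```python
def read_burst_beats(arlen: int):
    """Expand an AXI4 read burst: ARLEN encodes (number of beats - 1),
    so a single address phase yields arlen + 1 data beats. RLAST is
    asserted only on the final beat. Returns (beat_index, rlast) pairs."""
    beats = arlen + 1
    assert 1 <= beats <= 256  # AXI4 INCR bursts allow up to 256 beats
    return [(i, i == beats - 1) for i in range(beats)]
```

For example, a burst with ARLEN = 3 produces four data beats after one address phase, which is what lets AXI4 sustain high throughput without per-beat addressing.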
AXI4-Lite is similar to AXI4 with some exceptions, the most notable of which is that
bursting is not supported. The AXI4-Lite chapter of the ARM AMBA AXI Protocol v2.0
Specification describes the AXI4-Lite protocol in more detail.
The AXI4-Stream protocol defines a single channel for transmission of streaming data. The
AXI4-Stream channel is modeled after the write data channel of AXI4. Unlike AXI4,
AXI4-Stream interfaces can burst an unlimited amount of data. There are additional,
optional capabilities described in the AXI4-Stream Protocol Specification. The specification
describes how AXI4-Stream-compliant interfaces can be split, merged, interleaved,
upsized, and downsized. Unlike AXI4, AXI4-Stream transfers cannot be reordered.
Note: Even if two pieces of IP are designed in accordance with the AXI4-Stream
specification and are compatible at a signaling level, higher-level system considerations
do not guarantee that the two components will function correctly together. Refer
to the AXI IP specifications at http://www.xilinx.com/support/documentation/
axi_ip_documentation.htm, and AXI4-Stream Signals, page 43 for more information.
IP Interoperability
The AXI specification provides a framework that defines protocols for moving data
between IP using a defined signaling standard. This standard ensures that IP can exchange
data with each other and that data can be moved across a system.
AXI IP interoperability is governed by the AXI protocol, which defines how data is
exchanged, transferred, and transformed, and which ensures an efficient, flexible, and
predictable means for transferring data.
About IP Compatibility
For more application-specific IP, like an Ethernet MAC (EMAC) or a Video Display IP
using AXI4-Stream, the compatibility of the IP is more limited to their respective
application spaces. For example, directly connecting an Ethernet MAC to the Video
Display IP would not be feasible.
Note: Even though two IP such as an EMAC and a Video Streaming IP can theoretically
exchange data with each other, they would not function together, because the two IP
interpret bit fields and data packets in completely different ways.
Infrastructure IP
Infrastructure IP is another form of IP used to build systems. Infrastructure IP tends to
be generic IP that moves or transforms data around the system using general-purpose AXI4
interfaces and does not interpret data.
Examples of infrastructure IP are:
- AXI Direct Memory Access (DMA) engines (memory mapped to stream conversion)
These IP are useful for connecting a number of IP together into a system, but are not
generally endpoints for data.
AXI4-Stream Protocol
The AXI4-Stream protocol is used for applications that typically focus on a data-centric
and data-flow paradigm where the concept of an address is not present or not required.
Each AXI4-Stream acts as a single unidirectional channel for a handshake data flow.
At this lower level of operation (compared to the memory mapped AXI protocol types), the
mechanism to move data between IP is defined and efficient, but there is no unifying
address context between IP. The AXI4-Stream IP can be better optimized for performance
in data flow applications, but also tends to be more specialized around a given application
space.
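The handshake data flow mentioned above can be sketched behaviorally. This is a simulation-style illustration in Python, not RTL; the signal names follow the AXI4-Stream TVALID/TREADY convention:

```python
def stream_transfers(tvalid: list, tready: list):
    """Per-cycle transfer flags for an AXI4-Stream handshake: a beat
    moves across the channel only in cycles where both TVALID (driven
    by the master) and TREADY (driven by the slave) are high."""
    return [bool(v) and bool(r) for v, r in zip(tvalid, tready)]
```

In cycles where the master holds TVALID but the slave deasserts TREADY, the beat simply stalls; no address is ever exchanged, which is why AXI4-Stream is not memory-mapped.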
Interface     Features            Replaces
AXI4                              PLBv3.4/v4.6, OPB, NPI, XCL
AXI4-Lite
AXI4-Stream   Data-only burst.
Chapter 2
1. Put the chipscope_axi_monitor into your bus interface System Assembly View (SAV).
2. Select the bus you want to probe from the Bus Name field. After you select the bus,
an M (for monitor) displays between your peripheral and the AXI Interconnect core IP.
3. Add a ChipScope ICON core to your system, and connect the control bus to the AXI
monitor.
4. In the SAV Ports tab, on the monitor core, set the MON_AXI_ACLK port of the core to
match the clock used by the AXI interface being probed. Optionally, you can assign the
MON_AXI_TRIG_OUT port and connect it to other chipscope_axi_monitor cores in the system.
The AXI4-Stream interface is supported in IP found in the System Generator AXI4 block
library.
The EDK Processor block lets you connect hardware circuits created in System Generator
to a Xilinx MicroBlaze processor; options to connect to the processor using either a
PLBv4.6 or an AXI4 interface are available.
You do not need to be familiar with the AXI4 nomenclature when using the System
Generator flow because the EDK Processor block provides an interface that is
memory-centric and works with multiple bus types.
You can create hardware that uses shared registers, shared FIFOs, and shared memories,
and the EDK Processor block manages the memory connection to the specified interface.
Figure 2-1 shows the EDK Processor Implementation tab with an AXI4 bus type selected.
Figure 2-1: EDK Processor Implementation Tab with an AXI4 Bus Type Selected
Port Groupings
System Generator groups together and color-codes blocks of AXI4-Stream channel signals.
In the example illustrated in the following figure, the top-most input port, data_tready,
and the top two output ports, data_tvalid and data_tdata, belong to the same
AXI4-Stream channel, as do phase_tready, phase_tvalid, and phase_tdata.
System Generator gives signals that are not part of any AXI4-Stream channel the same
background color as the block; the rst signal, shown in Figure 2-2, page 12, is an example.
Figure 2-2:
Figure 2-3: Multi-Channel TDATA
Note: Breaking out multi-channel TDATA does not add additional logic to the design, and
the data is correctly byte-aligned.
For more information about System Generator and AXI IP creation, see the following
Xilinx website: http://www.xilinx.com/tools/sysgen.htm.
Figure 2-4:
Figure 2-5 shows the IP catalog in PlanAhead with the equivalent AXI4 column and the
supported AXI4 interfaces in the IP details panel.
Figure 2-5: IP Catalog in PlanAhead Showing the AXI4 Column and Supported AXI4 Interfaces
New HDL designs for AXI4, AXI4-Lite, and AXI4-Stream masters and slaves can reference
AXI IP HDL design templates provided in the solution record:
http://www.xilinx.com/support/answers/37425.htm.
Check this answer record periodically for updates or new templates.
- Data Mover
- Centralized DMA
- Ethernet DMA
- Video DMA
Refer to Chapter 4, Migrating to Xilinx AXI Protocols, for more detailed usage
information. See the following for a list of all AXI IP:
http://www.xilinx.com/support/documentation/axi_ip_documentation.htm.
Appendix C, Additional Resources, also contains this link.
- Converts AXI4 bursts of more than 16 beats when targeting AXI3 slave devices by
splitting transactions.
- Generates REGION outputs for use by slave devices with multiple address decode ranges.
- Propagates USER signals on each channel, if any; independent USER signal width per
channel (optional).
- Propagates Quality of Service (QoS) signals, if any; not used by the AXI Interconnect
core (optional).
- AXI4-Lite: 32-bit data width.
The Slave Interface (SI) of the core can be configured to comprise 1-16 SI slots to
accept transactions from up to 16 connected master devices. The Master Interface (MI)
can be configured to comprise 1-16 MI slots to issue transactions to up to 16
connected slave devices.
When connecting one master to one slave, the AXI Interconnect core can optionally
perform address range checking. It can also perform any of the normal data-width,
clock-rate, or protocol conversions and pipelining.
When connecting one master to one slave and not performing any conversions or address
range checking, pathways through the AXI Interconnect core are implemented as wires,
consuming no resources and incurring no latency.
Note: When used in a non-embedded system such as CORE Generator, the AXI Interconnect core
connects multiple masters to one slave, typically a memory controller.
Each master and slave connection can independently use data widths of 32, 64, 128, 256,
512, or 1024 bits:
- The internal crossbar can be configured to have a native data-width of 32, 64, 128,
256, 512, or 1024 bits.
Each master and slave connection can use an independent clock rate:
- Synchronous integer-ratio (N:1 and 1:N) conversion to the internal crossbar native
clock-rate.
- Asynchronous clock conversion (uses more storage and incurs more latency than
synchronous conversion).
The AXI Interconnect core exports reset signals resynchronized to the clock input
associated with each SI and MI slot.
The AXI Interconnect core can connect to any mixture of AXI4 and AXI4-Lite
masters and slaves.
- The AXI Interconnect core saves transaction IDs and restores them during response
transfers when connected to an AXI4-Lite slave.
- The AXI Interconnect core detects illegal AXI4-Lite transactions from AXI4 masters,
such as any transaction that accesses more than one word. It generates a
protocol-compliant error response to the master, and does not propagate the illegal
transaction to the AXI4-Lite slave.
The AXI Interconnect core splits burst transactions of more than 16 beats from
AXI4 masters into multiple transactions of no more than 16 beats when connected
to an AXI3 slave.
Available on each AXI channel connecting to each master and each slave.
One latency cycle per register-slice, with no loss in data throughput under all AXI
handshaking conditions.
Available on write and read data paths connecting to each master and each slave.
Option to delay assertion of AWVALID/ARVALID until FIFO occupancy permits uninterrupted
burst transfers; for reads, ARVALID is delayed until the R-channel FIFO has enough
vacancy to store the entire burst length.
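Two of the protocol conversions listed in these features can be sketched in Python. This is a behavioral illustration only; the function names and byte-addressed arguments are ours, not parameters of the core:

```python
def is_legal_axi4lite(beats: int, size_bytes: int, data_width_bytes: int = 4) -> bool:
    """An AXI4 master transaction is legal for an AXI4-Lite slave only if
    it is a single transfer of no more than one data word (32 bits here)."""
    return beats == 1 and size_bytes <= data_width_bytes

def split_burst_for_axi3(start_addr: int, beats: int, beat_bytes: int):
    """Split an AXI4 INCR burst into AXI3-legal sub-bursts of at most
    16 beats, advancing the address by the bytes already transferred."""
    out, addr = [], start_addr
    while beats > 0:
        n = min(beats, 16)  # AXI3 bursts are limited to 16 beats
        out.append((addr, n))
        addr += n * beat_bytes
        beats -= n
    return out
```

A 40-beat AXI4 burst of 4-byte beats, for example, becomes three AXI3 transactions of 16, 16, and 8 beats; an illegal AXI4-Lite access is answered with an error response instead of being propagated.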
Parallel crossbar pathways for write data and read data channels. When more
than one write or read data source has data to send to different destinations,
data transfers may occur independently and concurrently, provided AXI
ordering rules are met.
One shared write address arbiter, plus one shared read address arbiter.
Arbitration latencies typically do not impact data throughput when transactions
average at least three data beats.
Shared write data, shared read data, and single shared address pathways.
Supports write response re-ordering, read data re-ordering, and read data
interleaving.
Configurable write and read transaction acceptance limits for each connected
master.
Configurable write and read transaction issuing limits for each connected slave.
For each ID thread issued by a connected master, the master can have outstanding
transactions to only one slave for writes and one slave for reads, at any time.
Round-robin arbitration is used among all connected masters configured with the
lowest priority setting (priority 0), when no higher priority master is requesting.
Any SI slot that has reached its acceptance limit, or is targeting an MI slot that has
reached its issuing limit, or is trying to access an MI slot in a manner that risks
deadlock, is temporarily disqualified from arbitration, so that other SI slots can be
granted arbitration.
Any non-secure accesses are blocked, and the AXI Interconnect core returns a DECERR
response to the master.
Support for read-only and write-only masters and slaves, resulting in reduced
resource utilization.
AXI4 QoS signals do not influence arbitration priority. QoS signals are propagated
from SI to MI.
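The round-robin behavior among equal-priority masters described above can be sketched as follows. This is illustrative Python; the core's actual arbiter also accounts for priority levels and the disqualification rules noted earlier:

```python
def round_robin_pick(requests: list, last_granted: int):
    """Grant the next requesting SI slot after the most recently granted
    one, wrapping around; returns None if no slot is requesting."""
    n = len(requests)
    for offset in range(1, n + 1):
        cand = (last_granted + offset) % n
        if requests[cand]:
            return cand
    return None
```

Because the search starts just after the last winner, every requesting priority-0 master is served before any master is served twice.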
The AXI Interconnect core does not convert multi-beat bursts into multiple single-beat
transactions when connected to an AXI4-Lite slave.
The AXI Interconnect core does not support low-power mode or propagate the AXI
C-channel signals.
The AXI Interconnect core does not time out if the destination of any AXI channel
transfer stalls indefinitely. All AXI slaves must respond to all received transactions, as
required by AXI protocol.
The AXI Interconnect core provides no built-in conversion to non-AXI protocols, such
as APB.
The AXI Interconnect core does not have clock-enable (ACLKEN) inputs. Consequently,
the use of ACLKEN is not supported among memory mapped AXI interfaces in Xilinx
systems.
Note: The ACLKEN signal is supported for Xilinx AXI4-Stream interfaces.
When used in the CORE Generator tool flow, the AXI Interconnect core can only be
configured with one MI port (one connected slave device), and therefore performs no
address decoding.
Figure 2-6: AXI Interconnect Core Diagram, showing the Slave Interface and Master
Interface, the SI-hemisphere and MI-hemisphere conversion units (Register Slices,
Protocol Converters, Up-sizers, Down-sizers, Clock Converters, and Data FIFOs), and the
central Crossbar
The AXI Interconnect core consists of the SI, the MI, and the functional units that comprise
the AXI channel pathways between them. The SI accepts Write and Read transaction
requests from connected master devices. The MI issues transactions to slave devices. At
the center is the crossbar that routes traffic on all the AXI channels between the various
devices connected to the SI and MI.
The AXI Interconnect core also comprises other functional units located between the
crossbar and each of the interfaces that perform various conversion and storage functions.
The crossbar effectively splits the AXI Interconnect core down the middle between the
SI-related functional units (SI hemisphere) and the MI-related units (MI hemisphere).
The following subsections describe the use models for the AXI Interconnect core:
- Pass Through
- Conversion Only
- N-to-1 Interconnect
- 1-to-N Interconnect
Pass Through
When there is only one master device and only one slave device connected to the AXI
Interconnect core, and the AXI Interconnect core is not performing any optional
conversion functions or pipelining, all pathways between the slave and master interfaces
degenerate into direct wire connections with no latency and consuming no logic
resources.
The AXI Interconnect core does, however, continue to resynchronize the
INTERCONNECT_ARESETN input to each of the slave and master interface clock domains
for any master or slave devices that connect to the ARESET_OUT_N outputs, which
consumes a small number of flip-flops.
Figure 2-7 is a diagram of the pass through use case.
Figure 2-7: Pass Through Use Case
Conversion Only
The AXI Interconnect core can perform various conversion and pipelining functions when
connecting one master device to one slave device, such as data-width conversion,
clock-rate conversion, and pipelining register slices. In these cases, the AXI
Interconnect core contains no arbitration, decoding, or routing logic. Latency might be
incurred, depending on the conversion being performed.
Figure 2-8: Conversion Only Use Case
N-to-1 Interconnect
A common degenerate configuration of AXI Interconnect core is when multiple master
devices arbitrate for access to a single slave device, typically a memory controller.
In these cases, address decoding logic might be unnecessary and omitted from the AXI
Interconnect core (unless address range validation is needed).
Conversion functions, such as data width and clock rate conversion, can also be performed
in this configuration. Figure 2-9 shows the N to 1 AXI interconnection use case.
Figure 2-9: N-to-1 Interconnect Use Case
1-to-N Interconnect
Another degenerative configuration of the AXI Interconnect core is when a single master
device, typically a processor, accesses multiple memory-mapped slave peripherals. In
these cases, arbitration (in the address and write data paths) is not performed.
Figure 2-10 shows the 1-to-N Interconnect use case.
Figure 2-10: 1-to-N Interconnect Use Case
Figure 2-11: Crossbar Write and Read Address Arbitration, showing the shared Write
Transaction Arbiter and Read Transaction Arbiter routing the AW and AR channels from
Masters 0-2 to Slaves 0-2
Figure 2-12 shows the sparse crossbar write and read data pathways.
Figure 2-12: Sparse Crossbar Write and Read Data Pathways
Parallel write and read data pathways connect each SI slot (attached to AXI masters on the
left) to all the MI slots (attached to AXI slaves on the right) that it can access, according to
the configured sparse connectivity map. When more than one source has data to send to
different destinations, data transfers can occur independently and concurrently, provided
AXI ordering rules are met.
The write address channels among all SI slots (if > 1) feed into a central address arbiter,
which grants access to one SI slot at a time, as is also the case for the read address channels.
The winner of each arbitration cycle transfers its address information to the targeted MI
slot and pushes an entry into the appropriate command queue(s) that enable various data
pathways to route data to the proper destination while enforcing AXI ordering rules.
[Figure: Shared write data, shared read data, and single shared address pathways,
showing Master AW, AR, W, and R channels routed through the Interconnect's shared
Address arbiter and shared Write Data and Read Data pathways to the Slave.]
Width Conversion
The AXI Interconnect core has a parametrically-defined, internal, native data-width that
supports 32, 64, 128, 256, 512, and 1024 bits. The AXI data channels that span the crossbar
are sized to the native width of the AXI Interconnect, as specified by the
C_INTERCONNECT_DATA_WIDTH parameter.
When any SI slots or MI slots are sized differently, the AXI Interconnect core inserts width
conversion units to adapt the slot width to the AXI Interconnect core native width before
transiting the crossbar to the other hemisphere.
The width conversion functions differ depending on whether the data path width gets
wider (up-sizing) or narrower (down-sizing) when moving in the direction from the SI
toward the MI. The width conversion functions are the same in either the SI hemisphere
(translating from the SI to the AXI Interconnect core native width) or the MI hemisphere
(translating from the AXI Interconnect core native width to the MI).
MI and SI slots have an associated individual parametric data-width value. The AXI
Interconnect core adapts each MI and SI slot automatically to the internal native
data-width as follows:
When the data width of an SI slot is wider than the internal native data width of the
AXI Interconnect, a down-sizing conversion is performed along the pathways of the
SI slot.
When the internal native data width of the AXI Interconnect core is wider than that of
an MI slot, a down-sizing conversion is performed along the pathways of the MI slot.
When the data width of an SI slot is narrower than the internal native data width of
the AXI Interconnect, an up-sizing conversion is performed along the pathways of the
SI slot.
When the internal native data width of the AXI Interconnect core is narrower than
that of an MI slot, an up-sizing conversion is performed along the pathways of the MI
slot.
Typically, the data-width of the AXI Interconnect core is matched to that of the most
throughput-critical peripheral, such as a memory controller, in the system design.
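The four slot-adaptation rules above reduce to a simple width comparison per slot. The following Python sketch uses our own naming (it is not an actual parameter of the core) to show which converter the AXI Interconnect inserts:

```python
def width_conversion(slot_width: int, native_width: int, side: str):
    """Converter inserted for one SI or MI slot, per the rules above.
    Data flows SI -> crossbar (native width) -> MI, so a wide SI slot is
    down-sized toward the crossbar, and a narrow MI slot is down-sized
    coming out of the crossbar."""
    if slot_width == native_width:
        return None  # slot already matches the crossbar; no converter
    if side == "SI":
        return "down-size" if slot_width > native_width else "up-size"
    # MI side: crossbar (native) wider than the MI slot means down-size
    return "down-size" if native_width > slot_width else "up-size"
```

For a 128-bit crossbar, a 256-bit SI slot is down-sized and a 32-bit MI slot is also down-sized, while narrower SI slots and wider MI slots are up-sized.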
The following subsections describe the down-sizing and up-sizing behavior.
Downsizing
Downsizers used in pathways connecting wide master devices are equipped to split burst
transactions that might exceed the maximum AXI burst length (even if such bursts are
never actually needed).
When the data width on the SI side is wider than that on the MI side and the transfer size
of the transaction is also wider than the data width on the MI side, then down-sizing is
performed and, in the transaction issued to the MI side, the number of data beats is
multiplied accordingly.
The AXI Interconnect core sets the RRESP for each output data beat (on the SI) to the
worst-case error condition encountered among the input data beats being merged,
according to the following descending precedence order: DECERR, SLVERR, OKAY,
EXOKAY.
When the transfer size of the transaction is equal to or less than the MI side data width, the
transaction (address channel values) remains unchanged, and data transfers pass through
unchanged except for byte-lane steering. This applies to both writes and reads.
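The beat-count multiplication and worst-case RRESP merging described above can be sketched as follows (illustrative Python under the stated precedence order; not the core's implementation):

```python
RRESP_PRECEDENCE = ["DECERR", "SLVERR", "OKAY", "EXOKAY"]  # worst first

def merge_rresp(mi_beat_resps: list) -> str:
    """Worst-case response among the narrow MI beats merged into one
    wide SI beat, per the descending precedence order above."""
    return min(mi_beat_resps, key=RRESP_PRECEDENCE.index)

def downsized_beats(si_beats: int, si_width: int, mi_width: int) -> int:
    """Beat count after down-sizing: each wide SI beat becomes
    (si_width / mi_width) narrow MI beats."""
    assert si_width % mi_width == 0 and si_width > mi_width
    return si_beats * (si_width // mi_width)
```

So a 4-beat, 128-bit burst crossing to a 32-bit MI becomes 16 beats, and if any of those narrow beats returns SLVERR, the merged SI beat reports SLVERR.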
Upsizing
For upsizers in the SI hemisphere, data packing is performed (for INCR and WRAP bursts),
provided the AW/ARCACHE[1] bit (Modifiable) is asserted.
In the resulting transaction issued to the MI side, the number of data beats is reduced
accordingly.
The AXI Interconnect core replicates the RRESP from each input data beat onto the RRESP
of each output data beat (on the SI).
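Correspondingly, the packing behavior above reduces the beat count by the width ratio. This sketch ignores address-alignment effects, which in practice can add one extra beat:

```python
import math

def upsized_beats(si_beats: int, si_width: int, mi_width: int,
                  modifiable: bool = True) -> int:
    """Beat count after up-sizing with data packing (INCR and WRAP bursts
    with the AW/ARCACHE[1] 'Modifiable' bit asserted); without packing,
    each narrow beat occupies one wide beat."""
    if not modifiable:
        return si_beats
    ratio = mi_width // si_width
    return math.ceil(si_beats / ratio)
```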
Clock Conversions
Clock conversion comprises the following:
- A clock-rate reduction module performs integer (N:1) division of the clock rate from
its input (SI) side to its output (MI) side.
- A clock-rate acceleration module performs integer (1:N) multiplication of the clock
rate from its input (SI) side to its output (MI) side.
- Asynchronous clock conversion handles clocks with no integer-ratio relationship.
For both the reduction and the acceleration modules, the sample cycle for the faster clock
domain is determined automatically. Each module is applicable to all five AXI channels.
The MI and SI each have a vector of clock inputs in which each bit synchronizes all the
signals of the corresponding interface slot. The AXI Interconnect core has its own native
clock input. The AXI Interconnect core adapts the clock rate of each MI and SI slot
automatically to the native clock rate of the AXI Interconnect.
Typically, the native clock input of the AXI Interconnect core is tied to the same clock
source as used by the highest frequency SI or MI slot in the system design, such as the MI
slot connecting to the main memory controller.
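The choice between synchronous integer-ratio conversion and asynchronous conversion can be sketched as follows (illustrative Python, frequencies in Hz; not a tool API):

```python
def clock_conversion(si_hz: int, mi_hz: int):
    """Synchronous N:1 (reduction) or 1:N (acceleration) conversion when
    the two clock rates have an integer ratio; otherwise asynchronous
    conversion, which costs more storage and latency."""
    if si_hz == mi_hz:
        return ("none", 1)
    hi, lo = max(si_hz, mi_hz), min(si_hz, mi_hz)
    if hi % lo == 0:
        return ("N:1" if si_hz > mi_hz else "1:N", hi // lo)
    return ("async", None)
```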
Pipelining
Under some circumstances, AXI Interconnect core throughput is improved by buffering
data bursts. This is commonly the case when the data rate at a SI or MI slot differs from the
native data rate of the AXI Interconnect core due to data width or clock rate conversion.
To accommodate the various rate change combinations, data burst buffers can be inserted
optionally at the various locations.
Additionally, an optional, two-deep register slice (skid buffer) can be inserted on each of
the five AXI channels at each SI or MI slot to help improve system timing closure.
The SI-side write data FIFO is located before the crossbar module, after any SI-side
width or clock conversion.
The MI-side write data FIFO is located after the crossbar module, before any MI slot
width, clock, or protocol conversion.
The MI-side read data FIFO is located before the crossbar module (on the MI side),
after any MI-side width or protocol conversion.
The SI-side read data FIFO is located after the crossbar module (on the SI side),
before any SI-side width or clock conversion.
Data FIFOs are synchronized to the AXI Interconnect core native clock. The width of each
data FIFO matches the AXI Interconnect core native data width.
For more detail and the required signals and parameters of the AXI Interconnect core IP,
refer to the AXI Interconnect IP (DS768). Appendix C, Additional Resources, also contains
this link.
AXI-To-AXI Connector
Features:
- Connects the master interface of one AXI Interconnect core module to the slave
interface of another AXI Interconnect core module.
- Directly connects all master interface signals to all slave interface signals.
Description
The AXI slave interface of the axi2axi_connector (connector) module always connects to
one attachment point (slot) of the master interface of one AXI Interconnect core module
(the upstream interconnect). The AXI master interface of the connector always connects
to one slave interface slot of a different AXI Interconnect core module (the downstream
interconnect) as shown in Figure 2-14.
Figure 2-14: Master and Slave Interface Modules Connecting Two AXI Interconnect Cores
Features:
- Connects an AXI master or slave interface to the AXI Interconnect core IP.
- Provides a master or slave AXI bus interface on one side and AXI ports on the other
side.
- Other ports are modeled as an I/O interface, which can be made external, thereby
providing the necessary signals that can be connected to a top-level master or slave.
Figure 2-15: External Master Connector (Axi_ext_master_conn attached to the
Axi_interconnect alongside the MicroBlaze ICAXI/DCAXI interfaces and a memory
controller)
Figure 2-16: External Slave Connector (Axi_ext_slave_conn with individual AXI ports
made external to the sub-system)
The Platform Studio IP Catalog contains the external master and external slave connectors.
For more information, refer to the Xilinx website:
http://www.xilinx.com/support/documentation/axi_ip_documentation.htm.
Appendix C, Additional Resources, also contains this link.
Data Mover
The AXI Data Mover is an important interconnect infrastructure IP that enables high
throughput transfer of data between the AXI4 memory-mapped domain and the AXI4-Stream
domain. It provides Memory Map to Stream and Stream to Memory Map channels that operate
independently in a full-duplex-like manner. The Data Mover IP has the following features:
It also provides byte-level data realignment, allowing memory reads and writes at any
byte offset location.
It is recommended to use the AXI DataMover as a bridge between AXI4-Stream and AXI4
memory-map interfaces for both write and read operations, where the AXI4-Stream master
controls data flow through the command and status bus. The AXI DataMover is available
in both CORE Generator and XPS. Figure 2-17 shows a block diagram of the Data Mover
functionality. For more information, see the product page:
http://www.xilinx.com/products/intellectual-property/axi_datamover.htm
Figure 2-17: Data Mover Block Diagram
Centralized DMA
Xilinx provides a Centralized DMA core for AXI. This core replaces legacy PLBv4.6
Centralized DMA with an AXI4 version that contains enhanced functionality and higher
performance. Figure 2-18 shows a typical embedded system architecture incorporating the
AXI (AXI4 and AXI4-Lite) Centralized DMA.
Figure 2-18: Typical Embedded System Architecture Incorporating the AXI Centralized DMA
The AXI4 Centralized DMA performs data transfers from one memory-mapped space to
another under the control of the system microprocessor, using the high-speed AXI4
bursting protocol.
Waits for the microprocessor to program and start the next transfer
Also, the AXI Centralized DMA includes an optional data realignment function for 32- and
64-bit bus widths. This feature allows addressing independence between the transfer
source and destination addresses.
- Use DataMover Lite for the main data transport (the Data Realignment Engine (DRE) and SG mode are not supported with this data transport mechanism)
- Include or omit the DRE function (available for 32- and 64-bit data transfer bus widths only)
- Specify the main data transfer bus width (32, 64, 128, 256, 512, or 1024 bits)
- Specify the maximum allowed AXI4 burst length the DataMover will use during data transfers
Interface        AXI Type          Data Width                    Description
Control          AXI4-Lite slave   32
Scatter Gather   AXI4 master       32
AXI4 Read        master            32, 64, 128, 256, 512, 1024
AXI4 Write       master            32, 64, 128, 256, 512, 1024
Ethernet DMA
The AXI4 protocol adoption in Xilinx embedded processing systems contains an Ethernet
solution with Direct Memory Access (DMA). This approach blends the performance
advantages of AXI4 with the effective operation of previous Xilinx Ethernet IP solutions.
Figure 2-19, page 32 provides a high-level block diagram of the AXI DMA.
Figure 2-19: AXI DMA Block Diagram
Figure 2-20 shows a typical system architecture for the AXI Ethernet.
Figure 2-20: Typical System Architecture for the AXI Ethernet
As shown in Figure 2-20, page 32, the AXI Ethernet is now paired with a new AXI DMA IP.
The AXI DMA replaces the legacy PLBv4.6 SDMA function that was part of the PLBv4.6
Multi-Port Memory Controller (MPMC).
The AXI DMA is used to bridge between the native AXI4-Stream protocol on the AXI
Ethernet and the AXI4 memory-mapped protocol needed by the embedded processing system.
The AXI DMA core can also be connected to a user system other than an Ethernet-based
AXI IP. In this case, the parameter C_SG_INCLUDE_STSCNTRL_STRM must be set to 0 to
exclude status and control information and use it for payload only.
Interface           AXI Type             Data Width                    Description
Control             AXI4-Lite slave      32
Scatter Gather      AXI4 master          32
Data MM Read        AXI4 Read master     32, 64, 128, 256, 512, 1024
Data MM Write       AXI4 Write master    32, 64, 128, 256, 512, 1024
Data Stream Out     AXI4-Stream master   32, 64, 128, 256, 512, 1024
Data Stream In      AXI4-Stream slave    32, 64, 128, 256, 512, 1024
Control Stream Out  AXI4-Stream master   32
Status Stream In    AXI4-Stream slave    32
Video DMA
The AXI4 protocol Video DMA (VDMA) provides a high bandwidth solution for Video
applications. It is a similar implementation to the Ethernet DMA solution.
Figure 2-21 shows a top-level AXI4 VDMA block diagram.
Figure 2-21: AXI VDMA Block Diagram
Figure 2-22, page 36 illustrates a typical system architecture for the AXI VDMA.
Figure 2-22: Typical System Architecture for the AXI VDMA
Interface        AXI Type            Data Width                          Description
Control          AXI4-Lite slave     32
Scatter Gather   AXI4 master         32
Data MM Read     AXI4 Read master    32, 64, 128, 256, 512, 1024
Data MM Write    AXI4 Write master   32, 64, 128, 256, 512, 1024
Data Stream Out  AXI4-Stream master  8, 16, 32, 64, 128, 256, 512, 1024
Data Stream In   AXI4-Stream slave   8, 16, 32, 64, 128, 256, 512, 1024
In the CORE Generator interface, through the Memory Interface Generator (MIG)
tool.
The underlying HDL code between the two packages is the same with different wrappers.
The flexibility of the AXI4 interface allows easy adaptation to both controller types.
Virtex-6
The Virtex-6 memory controller solution is provided by the Memory Interface Generator
(MIG) tool and is updated with an optional AXI4 interface.
This solution is also available through EDK, with an AXI4-only interface, as the
axi_v6_ddrx memory controller.
The axi_v6_ddrx memory controller uses the same Hardware Design Language (HDL)
logic and uses the same GUI, but is packaged for EDK processor support through XPS. The
Virtex-6 memory controller is adapted with an AXI4 Slave Interface (SI) through an AXI4
to User Interface (UI) bridge. The AXI4-to-UI bridge converts the AXI4 slave transactions
to the MIG Virtex-6 UI protocol. This supports the same options that were previously
available in the Virtex-6 memory solution.
The optimal AXI4 data width is the same as the UI data width, which is four times the
memory data width. The AXI4 memory interface data width can be smaller than the UI
interface width, but this is not recommended because it results in a larger,
lower-performance core to support the width conversion.
The AXI4 interface maps transactions over to the UI by breaking each AXI4
transaction into smaller, memory-sized transactions. The Virtex-6 memory
controller then handles the bank/row management for higher memory utilization.
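The splitting step can be sketched as follows (a simplified model: the function name, the (address, length) command representation, and the natural-alignment boundary rule are illustrative, not the controller's actual implementation):

```python
def split_axi_transaction(start_addr, total_bytes, mem_txn_bytes):
    """Break one AXI4 transaction into smaller, memory-sized transactions.

    Each emitted (address, length) command stays within one naturally
    aligned mem_txn_bytes region, mimicking the AXI4-to-UI mapping.
    """
    commands = []
    addr, remaining = start_addr, total_bytes
    while remaining > 0:
        to_boundary = mem_txn_bytes - (addr % mem_txn_bytes)
        chunk = min(to_boundary, remaining)
        commands.append((addr, chunk))
        addr += chunk
        remaining -= chunk
    return commands

# A 64-byte transaction starting at 0x10 with 32-byte memory transactions
# splits into three commands: (0x10, 0x10), (0x20, 0x20), (0x40, 0x10).
```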
Figure 2-23 shows a block diagram of the Virtex-6 memory solution with the AXI4
interface.
Figure 2-23: Virtex-6 Memory Solution with the AXI4 Interface (axi_v6_ddrx top level in EDK, memc_ui_top in CORE Generator)
- Converts AXI4 incremental (INCR) commands to MCB commands in a 1:1 fashion for transfers that are 16 beats or less.
- Breaks down AXI4 transfers greater than 16 beats into 16-beat maximum transactions sent over the MCB protocol.
This allows a balance between performance and latency in multi-ported systems. AXI4
WRAP commands can be broken into two MCB transactions to handle the wraps on the
MCB interface, which does not natively support WRAP commands.
The axi_s6_ddrx core and the Spartan-6 AXI MIG core from CORE Generator support all
native port configurations of the MCB, including 32-, 64-, and 128-bit wide interfaces
with up to 6 ports (depending on MCB port configuration). Figure 2-24 shows a block
diagram of the AXI Spartan-6 memory solution.
Figure 2-24: AXI Spartan-6 Memory Solution Block Diagram
For more detail on memory control, refer to the memory website documents at
http://www.xilinx.com/products/technology/memory-solutions/index.htm.
Chapter 3
Table 3-1:

AXI Feature: READY/VALID Handshake
Xilinx IP Support: Full forward and reverse direction flow control of the AXI
protocol-defined READY/VALID handshake.

AXI Feature: Transfer Length
Xilinx IP Support: AXI4 memory mapped burst lengths of:
IP can be defined with native data widths of 32, 64, 128, 256, 512, and 1024 bits wide.
For AXI4-Lite, the supported data width is 32 bits only.
The use of AXI4 narrow bursts is supported but is not recommended. Use of narrow bursts
can decrease system performance and increase system size.
Where Xilinx IP of different widths need to communicate with each other, the AXI
Interconnect provides data width conversion features.
Read/Write only
Designed to support AXI4 natively. Where AXI3 interoperability is required, the AXI
Interconnect contains the necessary conversion logic to allow AXI3 and AXI4 devices to
connect.
AXI3 write interleaving is not supported and should not be used with Xilinx IP.
Note: The AXI3 write Interleaving feature was removed from the AXI4 specification.
Table 3-1:
AXI Feature
Xilinx IP Support
Infrastructure IP passes protection and cache bits across a system, but Endpoint IP generally
do not contain support for dynamic protection or cache bits.
Protection bits should be constant at 000, signifying a constantly secure transaction type.
Cache bits should generally be constant at 0011 signifying a bufferable and modifiable
transaction.
This provides greater flexibility in the infrastructure IP to transport and modify transactions
passing through the system for greater performance.
Quality of Service (QoS)
Bits
REGION Bits
The Xilinx AXI Interconnect generates REGION bits based upon the Base/High address
decoder ranges defined in the address map for the AXI interconnect.
Xilinx infrastructure IP, such as register slices, pass region bits across a system.
Some Endpoint slave IP supporting multiple address ranges might use region bits to avoid
redundant address decoders.
AXI Master Endpoint IP do not generate REGION bits.
User Bits
Infrastructure IP passes user bits across a system, but Endpoint IP generally ignores user bits.
The use of user bits is discouraged in general purpose IP due to interoperability concerns.
However, the facility to transfer user bits around a system allows special-purpose
custom systems to be built that require additional transaction-based sideband
signaling. An example use of USER bits would be transferring parity or debug information.
Reset
Xilinx IP generally deasserts all VALID outputs within eight cycles of reset and has a
reset pulse-width requirement of 16 cycles or greater.
Holding AXI ARESETN asserted for 16 cycles of the slowest AXI clock is generally a sufficient
reset pulse width for Xilinx IP.
DSP IP have a requirement of 2 cycles for ARESETN on the AXI4-Stream interface.
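As a quick sanity check (a hypothetical helper, not from this guide), the 16-cycle guideline translates into a minimum ARESETN pulse duration in the slowest AXI clock domain:

```python
def min_reset_pulse_ns(slowest_aclk_mhz, cycles=16):
    """Minimum ARESETN pulse width, in nanoseconds, for a given clock rate."""
    period_ns = 1000.0 / slowest_aclk_mhz  # one clock period in ns
    return cycles * period_ns

# A 100 MHz slowest AXI clock needs ARESETN held for at least 160 ns.
```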
Low Power Interface: Not supported. The optional AXI low-power interfaces, CSYSREQ,
CSYSACK, and CACTIVE are not present on IP interfaces.
AXI4-Stream Signals
Table 3-2 lists the AXI4-Stream signals, status, and notes on usage.
Table 3-2: AXI4-Stream Signals

Signal   Status     Notes
TVALID   Required
TREADY   Optional
TDATA    Optional
TSTRB    Optional
TKEEP    Absent
TLAST    Optional
TID      Optional
TDEST    Optional
TUSER    Optional
The physical view describes how the logical view is mapped to bits and the
underlying AXI4-Stream signals.
Simple vectors of values represent numerical data at the logical level. Individual values
can be real or complex quantities, depending on the application. Similarly, the number of
elements in the vector is application-specific.
At the physical level, the logical view must be mapped to physical wires of the interface.
Logical values are represented physically by a fundamental base unit of bit width N, where
N is application-specific. In general:
N bits are interpreted as a fixed point quantity, but floating point quantities are also
permitted.
Complex values are represented as a pair of base units signifying the real component
followed by the imaginary component.
To aid interoperability, all logical values within a stream are represented using base units
of identical bit width.
Before mapping to the AXI4-Stream signal, TDATA, the N bits of each base unit are rounded
up to a whole number of bytes. As examples:
The AXI4-Stream protocol requires that TDATA ports of the IP have a width in multiples of
8. It is a specification violation to define an AXI4-Stream IP with a TDATA port width that
is not a multiple of 8, therefore, it is a requirement to round up TDATA widths to byte
multiples. This simplifies interfacing with memory-oriented systems, and also allows the
use of AXI infrastructure IP, such as the AXI Interconnect, to perform upsizing and
downsizing.
By convention, the additional packing bits are ignored at the input to a slave; they
therefore use no additional resources and are removed by the back-end tools. To simplify
diagnostics, masters drive the unused bits in a representative manner, as follows:
Signed quantities are sign-extended (the unused bits are copies of the sign bit).
The width of TDATA can allow multiple base units to be transferred in parallel in the same
cycle; for example, if the base unit is packed into 16 bits and the TDATA signal width is 64
bits, four base units can be transferred in parallel, corresponding to four scalar values or
two complex values. Base units forming the logical vector are mapped first spatially
(across TDATA) and then temporally (across consecutive transfers of TDATA).
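To make the byte-rounding and lane-packing rules concrete, here is a small Python sketch (illustrative only; the function name is not from this guide) that packs signed base units into a byte-aligned TDATA word, LSBs first:

```python
def pack_base_units(values, n_bits, tdata_bits):
    """Pack signed n_bits-wide base units into one TDATA word.

    Each base unit is sign-extended into a lane rounded up to a whole
    number of bytes; units are mapped spatially across TDATA (LSBs first).
    """
    lane_bits = ((n_bits + 7) // 8) * 8          # round up to a byte multiple
    assert tdata_bits % lane_bits == 0
    assert len(values) <= tdata_bits // lane_bits
    word = 0
    for i, v in enumerate(values):
        assert -(1 << (n_bits - 1)) <= v < (1 << (n_bits - 1))
        lane = v & ((1 << lane_bits) - 1)        # two's complement = sign-extended
        word |= lane << (i * lane_bits)
    return word

# Four 12-bit values packed into 16-bit lanes fill one 64-bit TDATA beat.
```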
Deciding whether multiple sub-fields of data (that are not byte multiples) should be
concatenated together before or after alignment to byte boundaries is generally
determined by considering how atomic the information is. Atomic information is data that
can be interpreted on its own, whereas non-atomic information is incomplete for the
purpose of interpreting the data.
For example, atomic data can consist of all the bits of information in a floating point
number. However, the exponent bits in the floating point number alone would not be
atomic. When packing information into TDATA, generally non-atomic bits of data are
concatenated together (regardless of bit width) until they form atomic units. The atomic
units are then aligned to byte boundaries using pad bits where necessary.
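The concatenate-then-pad rule can be sketched as follows (a hypothetical illustration; the sub-field layout of a 16-bit floating point number is used as the atomic unit):

```python
def pack_atomic_unit(fields):
    """Concatenate (value, width) sub-fields LSB-first, then pad to bytes.

    Non-atomic sub-fields are packed together regardless of bit width;
    only the resulting atomic unit is padded up to a byte multiple,
    with zero pad bits in the MSBs.
    """
    word, width = 0, 0
    for value, bits in fields:
        word |= (value & ((1 << bits) - 1)) << width
        width += bits
    padded_width = ((width + 7) // 8) * 8
    return word, padded_width

# A 16-bit float: 10-bit mantissa, 5-bit exponent, 1 sign bit -> already 2 bytes.
```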
Figure 3-1: Unsigned Real Example (16-bit base unit on a 16-bit TDATA signal)
Alternatively, scalar values can also be considered as vectors of unity length, in which case
TLAST should be driven active-High. As the value type is unsigned, the unused
packing bits are driven to 0 (zero-extended).
Similarly, for signed data the unused packing bits are driven with the sign bits
(sign-extended), as shown in Figure 3-2:
Figure 3-2: Signed Data Example (unused packing bits sign-extended)
Figure 3-3: Signed Complex Example (16-bit base units)
Where re(X) and im(X) represent the real and imaginary components of X respectively.
Note: For simplicity, sign extension into TDATA[15:12] is not illustrated here. A complex value is
transferred every two clock cycles.
The same data can be similarly represented on a channel with a TDATA signal width of 32
bits; the wider bus allows a complex value to be transferred every clock cycle, as shown in
Figure 3-4:
Figure 3-4: Signed Complex Example, 32-bit TDATA
The two representations in the preceding figures of the same data (serial and parallel)
show that data representation can be tailored to specific system requirements. For
example, a:
High throughput processing engine such as a Fast Fourier Transform (FFT) might
favor the parallel form
MAC-based Finite Impulse Response (FIR) might favor the serial form, thus enabling
Time Division Multiplexing (TDM) data path sharing
Use an AXI4-Stream-based upsizer to convert the serial form to the parallel form.
Use an AXI4-Stream-based downsizer to convert the parallel form to the serial form.
Figure 3-5: Signed Complex Vector Example (16-bit base units)
As for the scalar case, the same data can be represented on a channel with TDATA width of
32 bits, as shown in Figure 3-6:
Figure 3-6: Signed Complex Vector Example, 32-bit TDATA
The degree of parallelism can be increased further for a channel with TDATA width of 64
bits, as shown in Figure 3-7:
Figure 3-7: Signed Complex Vector Example, 64-bit TDATA
Full parallelism can be achieved with TDATA width of 128 bits, as shown in Figure 3-8:
Figure 3-8: Signed Complex Vector Example, 128-bit TDATA
As shown for the scalar data in the preceding figures, there are multiple representations
that can be tailored to the application.
Similarly, AXI4-Stream upsizers and downsizers can be used for conversion.
For Packetized Data, TKEEP might be needed to signal packet remainders. When the
TDATA width is greater than the atomic size (minimum data granularity) of the
stream, a remainder is possible because there may not be enough data bytes to fill an
entire data beat. The only supported use of TKEEP for Xilinx endpoint IP is packet
remainder signaling; deasserted TKEEP bits (called null bytes in the
AXI4-Stream Protocol v1.0) are only present in a data beat with TLAST asserted. For
non-packetized continuous streams or packetized streams where the data width is the
same size or smaller than the atomic size of data, there is no need for TKEEP. This
generally follows the Continuous Aligned Stream model described in the
AXI4-Stream protocol.
www.xilinx.com
The AXI4-Stream protocol describes the usage for TKEEP to encode trailing null bytes
to preserve packet lengths after size conversion, especially after upsizing an odd
length packet. This usage of TKEEP essentially encodes the remainder bytes after the
end of a packet which is an artifact of upsizing a packet beyond the atomic size of the
data.
Xilinx AXI master IP do not generate any packets that have trailing transfers with all
TKEEP bits deasserted. This guideline maximizes compatibility and throughput because
Xilinx IP do not originate packets containing trailing transfers with all TKEEP bits
deasserted. Any deasserted TKEEP bits must be associated with TLAST = 1 in the
same data beat to signal the byte location of the last data byte in the packet.
Xilinx AXI slave IP are generally not designed to be tolerant of receiving packets that
have trailing transfers with all TKEEP bits deasserted. Slave IP that have TKEEP inputs
only sample TKEEP when TLAST is asserted to determine the packet remainder bytes.
In general, if Xilinx IP are used in the system with other IP designed for Continuous
Aligned Streams as described in the AXI4-Stream specification, trailing transfers with
all TKEEP bits deasserted will not occur.
All streams entering a system of Xilinx IP must be fully packed upon entry into the
system (no leading or intermediate null bytes), in which case arbitrary size conversion
will only introduce TKEEP for packet remainder encoding and will not result in data
beats where all TKEEP bits are deasserted.
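The remainder encoding described above can be sketched in Python (hypothetical helper name, not from this guide):

```python
def last_beat_tkeep(packet_bytes, tdata_bytes):
    """TKEEP value for the final (TLAST) beat of a packet.

    All earlier beats are fully packed, so their TKEEP bits are all set;
    only the TLAST beat can carry deasserted (null-byte) TKEEP bits.
    """
    remainder = packet_bytes % tdata_bytes
    valid = remainder if remainder else tdata_bytes
    return (1 << valid) - 1                      # low bits mark valid bytes

# A 10-byte packet on a 4-byte TDATA bus ends with TKEEP = 0b0011.
```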
Sideband Signals
The AXI4-Stream interface protocol allows passing sideband signals using the TUSER bus.
From an interoperability perspective, use of TUSER on an AXI4-Stream channel is an issue
because the master and slave must agree not only on the interpretation of TDATA, but
also on that of TUSER.
Generally, Xilinx IP uses the TUSER field only to augment the TDATA field with
information that could prove useful, but ultimately can be ignored. Ignoring TUSER could
result in some loss of information, but the TDATA field still has some meaning.
For example, an FFT core implementation could use a TUSER output to indicate block
exponents to apply to the TDATA bus; if TUSER was ignored, the exponent scaling factor
would be lost, but TDATA would still contain un-scaled transform data.
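As an illustration of this "useful but ignorable" convention (a sketch; the scaling rule and names are assumptions for illustration, not taken from an FFT core datasheet), a downstream consumer could apply a TUSER block exponent like this:

```python
def apply_block_exponent(tdata_samples, tuser_block_exp):
    """Scale un-scaled FFT outputs by the block exponent carried on TUSER.

    Ignoring TUSER leaves valid (but un-scaled) transform data in TDATA.
    """
    return [sample * (2 ** tuser_block_exp) for sample in tdata_samples]

# With TUSER = 3, each sample is scaled by 2**3 = 8.
```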
Events
An event signal is a single wire interface used by a core to indicate that some specific
condition exists (for example: an input parameter is invalid, a buffer is empty or nearly
full, or the core is waiting on some additional information). Events are asserted while the
condition is present, are deasserted once the condition passes, and exhibit no latching
behavior. Depending on the core and how it is used in a system, an asserted event might
indicate an error, a warning, or information. Event signals can be viewed as AXI4-Stream
channels with a VALID signal only, without any optional signals. Event signals can also
be considered out-of-band information and treated like generic flags, interrupts, or status
signals.
Ignored:
Unless explicitly stated otherwise, a system can ignore all event conditions.
In general, a core continues to operate while an event is asserted, although potentially
in some degraded manner.
As Interrupts or GPIOs:
An event signal might be connected to a processor using a suitable interrupt controller
or general purpose I/O controller. System software is then free to respond to events as
necessary.
As Simulation Diagnostic:
Events can be useful during hardware simulation. They can indicate interoperability
issues between masters and slaves, or indicate misinterpretation of how subsystems
interact.
As Hardware Diagnostic:
Similarly, events can be useful during hardware diagnostics. You can route event
signals to diagnostic LEDs or test points, or connect them to the ChipScope Pro
Analyzer.
TLAST Events
Some slave channels do not require a TLAST signal to indicate packet boundaries. In such
cases, the core has a pair of events to indicate any discrepancy between the presented
TLAST and the internal concept of packet boundaries:
Depending on the system design these events might or might not indicate potential
problems.
For example, consider an FFT core used as a coprocessor to a CPU where data is streamed
to the core using a packet-oriented DMA engine.
The DMA engine can be configured to send a contiguous region of memory of a given
length to the FFT core, and to correctly assert TLAST at the end of the packet. The system
software can elect to use this coprocessor in a number of ways:
Single transforms:
The simplest mode of operation is for the FFT core and the DMA engine to operate in
a lockstep manner. If the FFT core is configured to perform an N point transform, then
the DMA engine should be configured to provide packets of N complex samples.
If a software or hardware bug is introduced that breaks this relationship, the FFT core
will detect TLAST mismatches and assert the appropriate event; in this case indicating
error conditions.
Grouped transforms:
Typically, for each packet transferred by the DMA engine, a descriptor is required
containing start address, length, and flags; generating descriptors and sending them to
the engine requires effort from the host CPU. If the size of transform is short and the
number of transforms is high, the overhead of descriptor management might begin to
overcome the advantage of offloading processing to the FFT core.
One solution is for the CPU to group transforms into a single DMA operation. For
example, if the FFT core is configured for 32-point transforms, the CPU could group 64
individual transforms into a single DMA operation. The DMA engine generates a
single 2048 sample packet containing data for the 64 transforms; however, as the DMA
engine is only sending a single packet, only the data for the last transform has a
correctly placed TLAST. The FFT core would report 63 individual missing TLAST
events for the grouped operation. In this case the events are entirely expected and do
not indicate an error condition.
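The arithmetic behind the grouped-transform example can be captured in a short check (hypothetical helper name):

```python
def expected_missing_tlast_events(packet_samples, transform_points):
    """Missing-TLAST events for one grouped DMA packet.

    Every transform boundary except the final one (which carries the
    packet's only TLAST) triggers a missing-TLAST event.
    """
    assert packet_samples % transform_points == 0
    return packet_samples // transform_points - 1

# 64 grouped 32-point transforms -> a 2048-sample packet and 63 events.
```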
In the example case, the unexpected TLAST event should not assert during normal
operation. At no point should a DMA transfer occur where TLAST does not align with
the end of an FFT transform. However, as for the described single transform example
case, a software or hardware error could result in this event being asserted. For
example, if the transform size is incorrectly changed in the middle of the grouped
packet, an error would occur.
Streaming transforms:
For large transforms it might be difficult to arrange to hold the entire input packet in a
single contiguous region of memory.
In such cases it might be necessary to send data to the FFT core using multiple smaller
DMA transfers, each completing with a TLAST signal. Depending on how the CPU
manages DMA transfers, it is possible that the TLAST signal never aligns correctly
with the internal concept of packet boundaries within the FFT core.
The FFT core would therefore assert both missing TLAST and unexpected
TLAST events as appropriate while the data is transferring. In this example case, both
events are entirely expected, and do not indicate an error condition.
A blocking case performs a transform only when both a control packet and a data
packet are presented to the core.
A non-blocking case performs a transform with just a data packet, with the core
reusing previous control information.
There are numerous tradeoffs related to the use of blocking versus non-blocking interfaces:
Feature             Blocking    Non-blocking
Synchronization     Automatic   Not automatic
Signaling Overhead  Small       Minimized
Connectivity        Simple      Complex
Resource Overhead   Small       None
Note: In many cases, DSP and Wireless IP have base units that do not usually fall on the 8 bit (Byte)
boundaries. Refer to the Numerical Data in an AXI4-Stream, page 43 for information on how to
handle data that does not fall on byte boundaries.
Video data enters the system through the input I/O interface and exits the system through
a similar I/O interface, which is, in many cases, connected to a monitor for display of the
processed video sequence. In a complex video system, the IP cores provide a register
interface that is used for setup and control by a central managing microprocessor. This type
of system is supported by Xilinx design tools such as the Embedded Development Kit
(EDK) using the Xilinx MicroBlaze processor.
Xilinx Video IP cores are used in a variety of video and imaging applications and a wide
range of markets, from Industrial, Scientific, and Medical (ISM), security and surveillance,
automotive, and consumer electronics, to professional broadcast markets. The interface and
protocol address the needs of multiple application domains.
Video IP using the AXI4-Stream interface provides a simple, versatile, high-performance
point-to-point communication interface between video IP cores that is easy for video
designers to use.
Using the industry standard AXI interface lets video cores connect to embedded
processors and infrastructure IP. Based upon a well-defined, standard interface and
protocol, video and system designers can leverage advanced Xilinx software tools to
connect video IP and to build video systems.
The following subsections provide the requirements, standards, recommendations, and
guidelines for Xilinx Video IP design to adapt AXI4-Stream interfaces, and harmonize
AXI4-Stream based Video IP development with AXI4-Stream based DSP IP, infrastructure
IP, and software tools development. The subsections also provide the details for defining
AXI4-Stream based Video IP interfaces, and describe the signals and protocols for
transmitting video using the AXI4-Stream interface, its applicability to a wide range of
video systems, and usage guidelines for interoperability.
This subsection also defines the:
Set of AXI4-Stream signals used for video data exchange between IP cores
List of supported data, such as RGB, 420 YCC, and the mapping of data to the TDATA
bus (see Table 3-6, page 67)
Video systems follow the general pipelined processing chain, shown in Figure 3-9.
Figure 3-9: Pipelined Video Processing Chain
Signaling Interface
Table 3-3 lists the mandatory interface signal names and functions for the input (slave) side
connectors. The AXI4-Stream Signal Name column lists the mandatory, top-level IP port
names.
Table 3-3: AXI4-Stream Video Input (Slave) Interface Signals

Function        Width  Direction  AXI4-Stream Signal Name  Video Specific Name
Video Data             IN         s_axis_video_tdata       DATA
Valid                  IN         s_axis_video_tvalid      VALID
Ready                  OUT        s_axis_video_tready      READY
Start Of Frame         IN         s_axis_video_tuser       SOF
End Of Line            IN         s_axis_video_tlast       EOL
For IP with multiple AXI4-Stream input interfaces, the s_ signal prefix is extended
to sk_, where k is the index of the respective input AXI4-Stream, as shown in
Figure 3-10.
Figure 3-10: Video IP with Multiple AXI4-Stream Input and Output Interfaces
Table 3-4 lists the mandatory interface signal names and functions for the output
(master) side signals.
Similarly, for IP with multiple AXI4-Stream output interfaces, the m_ signal prefix is
extended to mk_, where k is the index of the respective output AXI4-Stream.
Table 3-4: AXI4-Stream Video Output (Master) Interface Signals

Function        Width  Direction  AXI4-Stream Signal Name  Video Specific Name
Video Data             OUT        m_axis_video_tdata       DATA
Valid                  OUT        m_axis_video_tvalid      VALID
Ready                  IN         m_axis_video_tready      READY
Start Of Frame         OUT        m_axis_video_tuser       SOF
End Of Line            OUT        m_axis_video_tlast       EOL
The Video Specific Name column recommends short, descriptive signal names referring to
AXI4-Stream ports, to be used in HDL code, timing diagrams, and testbenches.
The name of the ACLK pin can be appended or prefixed to designate clock functionality,
such as m0_axis_aclk, or aclk_out for IP with multiple AXI4 interfaces using different
clocks.
Interface input signals are sampled on the rising edge of ACLK. Output signal changes
must occur after the rising edge of ACLK.
On Video IP interfaces, the ACLK pin is not part of the AXI4-Stream component interface;
ACLK signals associated with AXI4-Stream component interfaces are provided to Video IP
using one or multiple core clock signals. The clock signals can be shared by multiple
AXI4-Stream interfaces and signal processing circuitry within the IP.
Signals in each component interface must be synchronous to one of the core clock signals,
which are inputs to Video IP cores, but not directly part of the video over AXI4-Stream
interface. For example, if a core uses a single processing ACLK signal, to which all
operations within the core are synchronous, the master and slave AXI4-Stream video
interfaces should use this clock signal as their clock reference, as shown in Figure 3-11.
Figure 3-11: AXI4-Stream Video Interfaces Sharing a Single Core Clock
[Diagram: Video IP A and Video IP B connected over AXI4-Stream, each interface with its own m_axis_video_aclk or s_axis_video_aclk, aclken, and aresetn pins; the cores are clocked from clk_proc and clk_dvi_in with associated clk, ce, and sclr_n signals]
Figure 3-12:
TDATA Structure
DATA bits are represented using the (N-1 downto 0) or [N-1:0] bit numbering convention.
The components of implicit subfields of DATA shall be packed tightly together; for
example, three DW=10 bit RGB components pack into 30 bits. If necessary, the packed data
word is zero-padded with Most Significant Bits (MSBs) so that the width of the resulting
word is an integer multiple of 8.
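As an illustration, the packing rule can be sketched in Python (a hypothetical helper, not part of any Xilinx deliverable; the component order within TDATA depends on the video format code):

```python
def pack_tdata(components, dw):
    """Pack color components into one TDATA word, components[0] in the
    least significant DW bits, zero-padded with MSBs to a byte multiple."""
    word = 0
    for i, c in enumerate(components):
        if c >= (1 << dw):
            raise ValueError("component value wider than DW bits")
        word |= c << (i * dw)
    packed_bits = dw * len(components)
    tdata_bits = -(-packed_bits // 8) * 8  # round up to a multiple of 8
    return word, tdata_bits

# Three DW=10 components pack into 30 bits, carried in a 32-bit TDATA.
word, bits = pack_tdata([1, 2, 3], dw=10)
```

Here the three 10-bit components occupy bits [29:0], and bits [31:30] are the zero padding.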
Note: When two cores connect using AXI4-Stream interfaces where only the master or only the slave
interface has an ACLKEN port that is not permanently tied high, the two interfaces must be
connected using the AXI4 FIFO core to avoid data corruption. See the LogiCORE IP FIFO Generator
data sheet (DS317) for information about the core. Appendix C, Additional Resources, provides a
link to the data sheet.
Reset pins provided in conjunction with the corresponding clocks (hardware reset)
An Active-low reset pin, ARESETn, associated with ACLK, is required on the IP core
interface. For IP with multiple AXI4-Stream interfaces using different clocks, each clock
domain can have corresponding reset signals. The name of the ARESETn pin can be
appended to designate clock association, such as ARESETn_m0.
The ARESETn signal takes precedence over ACLKEN: cores with optional ACLKEN inputs
must reset when ARESETn is asserted (low), irrespective of the state of the associated
ACLKEN input.
Note: When a system with multiple clocks and corresponding reset signals is being reset, the
reset generator must ensure all reset signals are asserted/deasserted long enough that all
interfaces and clock domains in all IP cores are correctly reinitialized.
TID
Video IP must use designated AXI4-Stream interfaces to transfer video and data streams;
therefore, TID is not used in Video IP with an AXI4-Stream interface. Video IP is not
expected to forward a slave TID or to generate a TID; instead, the unconnected TID signal
is expected to default to 0.
TDEST
The TDEST signal is not used in Video IP with an AXI4-Stream interface. Video IP is not
expected to forward a slave TDEST or to generate a TDEST; instead, the unconnected TDEST
signal is expected to default to 0.
TUSER
TUSER bit 0, labeled Start of Frame (see Start of Frame Signal - SOF, page 63), is the only
TUSER bit used for video. Other TUSER bits are not propagated by video
cores.
Signaling Protocol
This section describes how you can use the interface signals and basic protocols of the
AXI4-Stream specification to construct streaming interfaces to meet the needs of various
video system applications. Generic AXI protocol signals are referenced using signal names
reflecting their video specific function.
Channel Structure
The interface contains a set of handshake signals, VALID and READY, and a set of
information-carrying signals, DATA, EOL, and SOF, that are conditioned by the handshake
signals.
AXI4-Stream interface signals must operate in the same clock domain; however, the master
and slave side can operate in different clock domains. In this case, proper clock-domain
crossing logic must be employed when connecting the interfaces.
Figure 3-13, page 61 is a simplified diagram showing the use of an asynchronous FIFO for
clock-domain crossing, omitting the enable and reset logic from the FIFO. A similar design
can be used to connect to third party IP with no ACLKEN, or to a master with no TREADY
input.
[Diagram: the Video IP A master interface (m_axis_video_tdata, m_axis_video_tvalid, m_axis_video_tready, m_axis_video_tlast, m_axis_video_tuser) feeding an asynchronous FIFO (data_wr, we, full on the clk_wr side; data_rd, re, empty on the clk_rd side), which drives the Video IP B slave interface (s_axis_video_*) across clk_domain_A and clk_domain_B; each side has its own aclk, aclken, and aresetn]
Figure 3-13:
Use of Asynchronous FIFO for Clock-Domain Crossing- No FIFO Enable or Reset Logic
In EDK, the AXI4-Stream Interconnect IP can be used to simplify connecting AXI4-Stream
interfaces in different clock domains, as shown in Figure 3-14.
Note: In this protocol specification, for the sake of simplicity, both master and slave AXI4-Stream
interfaces are assumed to operate in the same clock domain, synchronous to ACLK, with ACLKEN=1,
and ARESETn=1.
Figure 3-14:
For any given channel, signals propagate from the source (master) to the destination
(slave) with the exception of the READY signal.
Any other information-carrying or control signals that need to propagate in the opposite
direction must be part of a separate interface; READY is not used as a mechanism to
transfer opposite-direction information from a slave to a master.
READY/VALID Handshake
A valid transfer occurs when READY, VALID, ACLKEN, and ARESETn signals are high at
the rising edge of ACLK, as shown in Figure 3-15, page 62. During valid transfers, DATA
only carries active video data.
Note: Blank periods, audio, and ancillary data packets are not transferred in Video IP using
AXI4-Stream.
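The transfer qualification above can be modeled with a small Python sketch (illustrative only; the function names are invented for this example):

```python
def valid_transfer(tvalid, tready, aclken, aresetn):
    """A transfer occurs on a rising ACLK edge only when all four signals
    are high (ARESETn is active-Low, so high means not in reset)."""
    return bool(tvalid and tready and aclken and aresetn)

def count_transfers(cycles):
    """cycles: iterable of (tvalid, tready, aclken, aresetn) tuples,
    one per rising ACLK edge; returns how many beats were transferred."""
    return sum(valid_transfer(*c) for c in cycles)

# Only edges where every signal is high move a beat of DATA.
n = count_transfers([(1, 1, 1, 1), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 0, 1)])
```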
Figure 3-15:
with other devices, such as a processor, for access to memory. This could require the
memory controller to temporarily become unavailable by de-asserting TREADY while it
waits for access to memory.
After the controller grants access to the Frame Buffer write interface, it asserts READY and
takes data. In this example, having an AXI FIFO between the Video Input IP and the Frame
Buffer IP would allow the two to connect to each other. If the FIFO depth is selected
correctly by analyzing the memory arbitration process, no data would be lost due to a FIFO
overflow.
Can be asserted an arbitrary number of ACLK cycles before the first pixel value is
presented on DATA, as long as VALID is not asserted.
When the SOF_LATE condition is detected, the IP should drop (accept on the input,
but not propagate to the output) subsequent pixels until the SOF signal arrives.
Figure 3-16:
When the EOL_LATE condition is detected, the IP should generate its output EOL
signal according to the programmed/parameterized line-length, and drop (accept on
the input, but not propagate to the output) subsequent pixels until the EOL signal
arrives.
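The SOF_LATE/EOL_LATE recovery behavior (accept on the input, do not propagate) can be sketched as follows; this is an illustrative model with invented names, not the RTL of any core:

```python
def drop_until_sync(beats):
    """beats: list of (sync, pixel) tuples, where sync is the awaited SOF
    (or EOL) marker. Pixels are accepted but not propagated until the
    marker arrives; from then on, pixels pass through."""
    out, synced = [], False
    for sync, pixel in beats:
        if sync:
            synced = True
        if synced:
            out.append(pixel)
    return out
```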
Data Format
To transport video data, the DATA vector encodes logical channel subsets of physical DATA
signals. Various AXI4-Stream interfaces between the modules can facilitate transferring
video using different precision (for example, 8, 10, 12, or 16 bits per color channel) and/or
different formats (for example, RGB or YUV 4:2:0). A specific example of a typical image
pre-processing system is illustrated in Figure 3-17, which consists of a number of Xilinx IP
cores connected using AXI4-Stream to implement an imaging sensor processing pipeline.
[Diagram: imaging sensor processing pipeline of Sensor, Input Interface (IIF), Defective Pixel Correction (DPC), Color Filter Array Interpolation (CFA), Color Correction Matrix (CCM), Gamma Correction, RGB to YCrCb Color Space Converter (CSC), Noise Reduction, Edge Enhancement (Enhance), Chroma Resampler (CRS), and Image Statistics (Stats), connected over AXI4-Stream; per the legend, each core is controlled over AXI4-Lite by its software driver (IIF, DPC, CFA, CCM, Gamma, CSC, Noise, Enhance, CRS, and Stats drivers) on a MicroBlaze processor running sensor gain (AG), sensor exposure control (AE), lens focus (AF), auto white balance (AWB), and global contrast loops]
Figure 3-17:
AXI4-Stream channels in Figure 3-17 are annotated to reflect the transferred video format.
The DATA signal must not have any explicit subfields defined, either as separate ports or
with special signal suffixes.
For example, cores with DATA_Y and DATA_C signals are not permitted. Format
information is embedded in the IP-XACT representation of IP, as metadata tags attached to
AXI4-Stream ports.
Parameter Function
C_t_AXIS_DATA_WIDTH
C_t_VIDEO_FORMAT
C_t_AXIS_TDATA_WIDTH
Encoding
Table 3-6, page 67 lists the detailed representation of video data formats, with DW =
C_DATA_WIDTH and VF = C_VIDEO_FORMAT. Video data format codes follow the
examples of the following industry standards:
HDTV Standards and Practices for Digital Broadcasting: SMPTE 352M-2009 (available on
the web).
Table 3-6:

VF   Video Format   [4DW-1:3DW]   [3DW-1:2DW]   [2DW-1:DW]   [DW-1:0]
0    YUV 4:2:2                                  V/U, Cr/Cb   Y
1    YUV 4:4:4                    V, Cr         U, Cb        Y
2    RGB
3    YUV 4:2:0                                  V/U, Cr/Cb   Y
4    YUVA 4:2:2                                 V/U, Cr/Cb   Y
5    YUVA 4:4:4                   V, Cr         U, Cb        Y
6    RGBA
7*   YUVA 4:2:0                                 V/U, Cr/Cb   Y
8    YUVD 4:2:2                                 V/U, Cr/Cb   Y
9    YUVD 4:4:4                   V, Cr         U, Cb        Y
10   RGBD
11*  YUV 4:2:0                                  V/U, Cr/Cb   Y
12*  Bayer Sensor                                            RGB, CMY
13*  Luma Only                                               Y
14*  Reserved
15*  Reserved
Chapter 4
Were created using the Create and Import IP Wizard in a previous version of
Xilinx tools.
Cannot be altered, and needs to be used as-is with its existing PLBv4.6 interface.
IP created from scratch is not discussed in this section, except for using the AXI IP
templates provided in the answer record cited below; refer to Memory Mapped IP
Feature Adoption and Support, page 41, as well as the ARM AMBA AXI Protocol v2.0
Specification available from the ARM website. New IP should be designed to
the AXI protocol.
IP that was created using the Create and Import Peripheral (CIP) Wizard in a previous
version of Xilinx tools (before AXI was supported) can be migrated by rerunning the CIP
Wizard to create AXI-based template designs. Alternatively, the IP can be migrated using
AXI IP templates provided in the solution record:
http://www.xilinx.com/support/answers/37425.htm.
These templates provide example RTL level designs for AXI4, AXI4-Lite, and AXI4-Stream
masters and slaves. Check this answer record periodically for updates or new templates.
IP that needs to remain unchanged can be used in the Xilinx tools using the AXI to PLB
bridge. See the following section, The AXI to PLBv4.6 Bridge, page 70, for more
information.
Larger pieces of Xilinx IP (often called Connectivity or Foundation IP): This class of IP has
migration instructions in the respective documentation. This class of IP includes: PCIe,
Ethernet, Memory Core, and Serial Rapid I/O.
Local-Link Interface: Local-Link is a generic streaming, FIFO-like interface that has been
in service at Xilinx for a number of software generations. See Migrating Local-Link to
AXI4-Stream, page 72 for more information.
DSP IP: General guidelines on converting this broad class of IP is covered in Migrating
HDL Designs to use DSP IP with AXI4-Stream, page 79.
Features
The Xilinx AXI (AXI4 and AXI4-Lite) to PLBv4.6 Bridge is a soft IP core with the following
supported features:
- 32- or 64-bit data buses on AXI and PLB interfaces (1:1 ratio)
- Read/write interface
- Read-only interface
- Write-only interface
- Unaligned transactions
- Interrupt generation for partial data strobes except first and last data beat
- Supports 32-, 64-, and 128-bit PLBv4.6 data bus widths with required data mirroring
[Diagram: AXI4 (PORT 1) and AXI4-Lite (PORT 2) connections into the AXI4-to-PLBv4.6 Bridge and AXI4-Lite-to-PLBv4.6 Bridge logic with internal registers, attached to the PLBv4.6 bus]
Figure 4-1:
The AXI data bus width is 32- or 64-bit and the PLBv4.6 master is a 32- or 64-bit
device (for example, C_MPLB_NATIVE_DWIDTH = 32/64).
PLBv4.6 data bus widths of 32-bit, 64-bit, and 128-bit are supported with the AXI to
PLBv4.6 Bridge performing the required data mirroring.
AXI transactions are received on the AXI Slave Interface (SI), then translated to
PLBv4.6 transactions on the PLBv4.6 bus master interface.
Both read data and write data are buffered (when C_S_AXI_PROTOCOL=AXI4) in
the bridge, because of the mismatch of AXI and PLBv4.6 protocols where AXI
protocol allows the master to throttle data flow, but PLBv4.6 protocol does not allow
PLB masters to throttle data flow.
The bridge:
Buffers the write data input from the AXI port before initiating the PLBv4.6 write
transaction.
Implements a read and write data buffer of depth 32x32/64x32 to hold the data for
two PLB transfers of highest (16) burst length.
Signal Name  Direction   Mapping to AXI4-Stream Signals
CLK          Input       ACLK
RST_N        Input       ARESETN (or some other reset)
DATA         src to dst  TDATA
SRC_RDY_N    src to dst  TVALID
DST_RDY_N    dst to src  TREADY
SOF_N        src to dst  Optional TUSER
EOF_N        src to dst  TLAST
You can map clock and reset signals directly to the appropriate clock and reset for the
given interface. The ARESETN signal is not always present in IP cores, but you can
instead use another system reset signal.
In AXI4-Stream, the TDATA signal is optional. Because DATA is required for Local-Link, it is assumed that an interface converted from Local-Link to AXI4-Stream must
create the TDATA signal.
The Source and Destination Ready are active-Low signals. You can translate these
signals to TVALID and TREADY by inverting them to an active-High signal.
Note: The TREADY signal is optional. It is assumed that when you convert an interface, you
choose to use this signal.
The EOF_N is an active-Low signal used to indicate the end of a frame. With an
inversion, this will connect directly to TLAST, which is an optional signal.
The SOF_N signal does not have a direct map to AXI4-Stream. The start of frame is
implied, because it is always the first valid beat after the observation of TLAST (or
reset).
In most cases the signal is no longer needed for the interface. If a start of frame signal
is needed, it could be applied to the TUSER field.
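The implied start-of-frame rule (first valid beat after reset or after TLAST) can be reconstructed on the receive side, sketched here in Python with invented names:

```python
def implied_sof(beats):
    """beats: list of (tvalid, tlast) per clock. Returns a per-beat list
    marking the implicit start of frame: the first valid beat after
    reset or after an observed TLAST."""
    sofs, expect_sof = [], True  # True right after reset
    for tvalid, tlast in beats:
        sofs.append(bool(tvalid and expect_sof))
        if tvalid:
            expect_sof = bool(tlast)  # next valid beat starts a new frame
    return sofs
```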
[Waveform: Local-Link frame transfer of data beats P0 through P5]
Figure 4-2:
The preceding figure shows how the flow control signals (SRC_RDY_N and DST_RDY_N)
restrict data flow. Also, observe how SOF_N and EOF_N signals frame the data packet.
Figure 4-3 shows the same type of transaction with AXI4-Stream. Note the only major
difference is the absence of an SOF signal, which is now implied.
[Waveform: ACLK, TLAST, TVALID, TREADY, and TDATA beats P0 through P5]
Figure 4-3:
AXI4-Stream Waveform
Table 4-2:

Signal Name  Direction   Description                       Mapping to AXI
SOP_N        src to dst  Start-of-Packet: Packetization    TUSER
                         within a frame.
EOP_N        src to dst  End-of-Packet: Packetization      TUSER
                         within a frame.
REM          src to dst  Remainder: Specifies the          TKEEP
                         remainder of a packet.
SRC_DSC_N    src to dst  Source Discontinue: Indicates     TUSER
                         the source device is cancelling
                         a frame.
DST_DSC_N    dst to src  Destination Discontinue:          <none>
                         Indicates the destination
                         device is cancelling a frame.
CH           src to dst  Channel indicator.                TID
PARITY       src to dst  Parity for error checking.        TUSER
Any optional signal that is not represented in this table must be sent using the TUSER
signal.
The SOP_N and EOP_N signals are rarely used in Local-Link. They add granularity to
the SOF/EOF signals. If there is a need for them, they must be created in the TUSER
field.
The REM signal specifies the remainder of a packet. AXI4-Stream has a TKEEP bus that
can have deasserted bits when TLAST = 1 to signal the location of the last byte in the
packet.
The CH indicator can be mapped to the thread ID (TID). For parity, or any error
checking, the TUSER is a suitable resource.
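One plausible REM-to-TKEEP conversion is sketched below in Python. It assumes REM encodes the index of the last valid byte in the final beat; the actual REM encoding varies between Local-Link cores, so check the core's definition:

```python
def rem_to_tkeep(rem, bytes_per_beat):
    """Build the TKEEP vector for the beat where TLAST = 1: bytes up to
    and including index rem are kept, the rest are deasserted."""
    return [1 if i <= rem else 0 for i in range(bytes_per_beat)]

# Last beat of a packet whose final valid byte is byte 1 of a 4-byte bus.
tkeep = rem_to_tkeep(1, 4)
```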
Variations in Local-Link IP
There are some variations of Local-Link to be aware of:
Some users create their own signals for Local-Link. These signals are not defined in
the Local-Link specification.
In cases where the signal goes from the source to the destination, a suitable
location is TUSER.
If the signals go from the destination to the source, they cannot use TUSER or any
other AXI4-Stream signal. Instead, one of the preferable methods is to create a
second return AXI4-Stream interface. In this case, most of the optional AXI4-Stream signals would not be used; only the TVALID and TUSER signals.
Local-Link References
The Local-Link documentation is on the following website:
http://www.xilinx.com/products/design_resources/conn_central/locallink_member/sp06.pdf.
Resets
In System Generator, the resets on non-AXI IP are active-High. AXI IP in general, and in
System Generator, has an active-Low reset, aresetn. System Generator Inverter blocks
are necessary when using a single reset signal to reset both AXI and non-AXI IP.
A minimum aresetn active-Low pulse of two cycles is required, because the signal is
internally registered for performance. Additionally, aresetn always takes priority over
aclken.
Clock Enables
In System Generator, the clock enables on both non-AXI and AXI IP are active-High. AXI
IP in System Generator use an active-High clock-enable, aclken.
TDATA
In AXI protocols, data is consolidated onto a single TDATA input stream. This is consistent
with the top-level ports on DSP IP in CORE Generator. For ease of connecting data ports,
System Generator breaks TDATA down into individual ports in the block view.
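For example, a consolidated TDATA word can be split back into its fields in software, as in this hypothetical sketch; the actual field layout (which half holds which operand) must be taken from the IP symbol and implementation details tab:

```python
def split_tdata(tdata, field_width):
    """Split a consolidated TDATA word into (low_field, high_field),
    assuming two equally sized packed fields."""
    mask = (1 << field_width) - 1
    return tdata & mask, (tdata >> field_width) & mask

# A 32-bit TDATA carrying two 16-bit operands.
low, high = split_tdata((5 << 16) | 9, 16)
```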
An example of this is the AXI Complex Multiplier block as shown in Figure 4-4:
Figure 4-4:
Port Ordering
When comparing non-AXI and AXI IP, such as the Complex Multiplier 3.1 and 4.0,
respectively, the real and imaginary ports appear in opposite order when looking at the
block from top to bottom. You must be careful to not connect the AXI block with the data
paths accidentally crossed. Figure 4-5 shows an example of the port signals.
Figure 4-5:
Latency
With AXI IP in System Generator, the latency is handled in a different fashion than non-AXI IP. In non-AXI IP, you can specify latency directly, using either a GUI option or by
specifying -1 in the maximum performance option. With AXI IP, the latency is either
Automatic or Manual:
To manually set the latency, the parameter is called Minimum Latency for AXI blocks
because:
In blocking mode, the latency can be higher than the minimum latency specified if
the system has back pressure.
With DSP IP that supports the AXI4-Stream interface, each individual AXI4-Stream slave
channel can be categorized as either a blocking or a non-blocking channel. A slave channel
is blocking when some operation of the core is inhibited until a transaction occurs on that
channel. In general, the latency of the DSP IP AXI4-Stream interface is static for non-blocking and variable for blocking mode. To reduce errors while migrating your design,
pay attention to the Latency Changes and Instructions for Minimum Change
Migration sections of the IP data sheet.
The following tables list the master-FSL and slave-FSL to AXI4-Stream signal
conversion mappings.
Master FSL to AXI4-Stream signal mapping:

Signal         Direction  AXI Signal                  Direction
FSL_M_Clk      Out        M_AXIS_<Port_Name>ACLK      In
FSL_M_Write    Out        M_AXIS_<Port_Name>TVALID    Out
FSL_M_Full     In         M_AXIS_<Port_Name>TREADY    In
FSL_M_Data     Out        M_AXIS_<Port_Name>TDATA     Out
FSL_M_Control  Out        M_AXIS_<Port_Name>TLAST     Out

Slave FSL to AXI4-Stream signal mapping:

Signal         Direction  AXI Signal                  Direction
FSL_S_Clk      Out        S_AXIS_<Port_Name>ACLK      In
FSL_S_Exists   In         S_AXIS_<Port_Name>TVALID    In
FSL_S_Read     Out        S_AXIS_<Port_Name>TREADY    Out
FSL_S_Data     In         S_AXIS_<Port_Name>TDATA     In
FSL_S_Control  In         S_AXIS_<Port_Name>TLAST     In
Differences in Throttling
There are fundamental differences in throttling between FSL and AXI4-Stream, as follows:
The AXI_M_TVALID signal cannot be deasserted after being asserted unless a transfer
is completed with AXI_TREADY. However, AXI_TREADY can be asserted and
deasserted whenever the AXI4-Stream slave requires.
For FSL, the signals FSL_Full and FSL_Exists reflect the status of the interface; for
example, whether the slave is full or whether the master has valid data.
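The TVALID rule above can be expressed as a simple protocol check, sketched in Python (illustrative, not a verification IP):

```python
def tvalid_stable(beats):
    """AXI4-Stream rule: once TVALID is asserted, it must stay asserted
    until the transfer completes (TREADY high on the same edge).
    beats: list of (tvalid, tready) per clock. Returns True if obeyed."""
    pending = False  # a beat offered but not yet accepted
    for tvalid, tready in beats:
        if pending and not tvalid:
            return False  # TVALID dropped before the slave accepted
        pending = bool(tvalid and not tready)
    return True
```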
The MicroBlaze processor has an FSL test instruction that checks the current status of the
FSL interface. For this instruction to function on the AXI4-Stream, MicroBlaze has an
additional 32-bit Data Flip-Flop (DFF) for each AXI4-Stream master interface to act as an
output holding register.
When MicroBlaze executes a put fsl instruction, it writes to this DFF. The AXI4-Stream
logic inside MicroBlaze moves the value out from the DFF to the external AXI4-Stream
slave device as soon as the AXI4-Stream allows. Instead of checking the AXI4-Stream
TREADY/TVALID signals, the fsl test instruction checks whether the DFF contains valid
data, because the AXI_S_TREADY signal cannot be directly used for this purpose.
The additional 32-bit DFFs ensure that all current FSL instructions work seamlessly on
AXI4-Stream. There is no change needed in the software when converting from FSL to
AXI4-Stream.
For backward compatibility, the MicroBlaze processor supports keeping the FSL interfaces
while the normal memory mapped AXI interfaces are configured for AXI4.
This is accomplished by having a separate, independent MicroBlaze configuration
parameter (C_STREAM_INTERCONNECT) to determine if the stream interface should be
AXI4-Stream or FSL.
Demonstration Testbench
Latency Changes
Figure 4-6:
Demonstration Testbench
To assist with core migration, CORE Generator generates an example testbench in the
demo_tb directory under the CORE Generator project directory. The testbench instantiates
the generated core and demonstrates a simple example of how the DSP IP works with the
AXI4-Stream interface. This is a simple VHDL testbench that exercises the core.
The demonstration testbench source code is one VHDL file,
demo_tb/tb_<component_name>.vhd, in the CORE Generator output directory.
The source code is comprehensively commented. The demonstration testbench drives the
input signals of the core to demonstrate the features and modes of operation of the core
with the AXI4-Stream interface. For more information on how to use the generated
testbench refer to the Demonstration Testbench section in the individual IP data sheet.
Figure 4-7 shows the demo_tb directory structure.
Figure 4-7:
Figure 4-8:
Note: The upgrade mechanism alone will not create a core compatible with the latest version but will
provide a core that has equivalent parameter selection as the previous version of the core. The core
instantiation in the design must be updated to use the AXI4-Stream interface. The upgrade
mechanism also creates a backup of the old XCO file. The generated output is contained in the /tmp
folder of the CORE Generator project.
Latency Changes
With DSP IP that supports the AXI4-Stream interface, each individual AXI4-Stream slave
channel can be categorized as either a blocking or a non-blocking channel. A slave channel is
blocking when some operation of the core is inhibited until a transaction occurs on that
channel. In general, the latency of the DSP IP AXI4-Stream interface is static for non-blocking and variable for blocking mode. To reduce errors while migrating your design,
pay attention to the Latency Changes and Instructions for Minimum Change
Migration sections of the IP data sheet.
Resets: The DSP IP AXI4-Stream interface aresetn reset signal is active-Low and
must be asserted for a minimum length of two clock cycles. The aresetn reset signal
always takes priority over the aclken clock enable signal.
Therefore, your IP instantiation reset input must change from an active-High SCLR
signal with a minimum length of one clock cycle to an active-Low reset input with a
minimum of two clock cycles.
Input and Output TDATA port structure: The AXI specification calls for data to be
consolidated onto a single TDATA input stream. For ease of visualization you can view
the TDATA structure from the IP symbol and implementation details tab in the IP GUI.
For ease of IP instantiation the demonstration example testbench also shows how to
connect and functionally split signals from the TDATA structure. The demonstration
testbench assigns TDATA fields to aliases for easy waveform viewing during
simulation.
Figure 4-9, page 82 shows the TDATA port structure.
Figure 4-9:
For example, the compiler flag -mlittle-endian is added automatically when you build
board support packages and user applications in SDK-managed builds, but you must add
this option explicitly when directly calling the MicroBlaze GNU compiler from the
command line.
Libgen, XMD, and other command line software tools generate output or interact with the
processor system while taking into account its endianness. In addition, drivers are
properly associated with AXI embedded processing IP used in the design, including an
updated CPU driver for MicroBlaze v8.00.a.
However, the Xilinx Platform Studio IDE does not support the creation of board support
packages or user applications for AXI-based designs. Use SDK instead or invoke software
tools directly from the command line.
End-user applications written for a MicroBlaze big-endian system can work with the
equivalent little-endian system with minimal or no changes if the code is not sensitive to
endianness. Once the hardware migration has been completed, migrate software
applications using the same approach used when updating hardware designs previously
(see the SDK online help for suggested steps). Function calls in user applications to board
support package functions for OS libraries and drivers might need to be modified if the
API has changed.
User code, custom drivers, or other code might need to be rewritten if the code is sensitive
to the representation of data in big-endian or little-endian formats.
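The sensitivity comes down to byte ordering in memory; Python's struct module shows the difference for one 32-bit value:

```python
import struct

# The same 32-bit value laid out in memory on a big-endian system
# versus a little-endian one.
value = 0x11223344
big_endian = struct.pack(">I", value)
little_endian = struct.pack("<I", value)
```

Code that reads such bytes back assuming a fixed order, or indexes into them, behaves differently on the two systems unless it is written endian-safely.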
2.
Existing software that ran on a MicroBlaze PLBv4.6 big-endian system might need to
change:
a.
b. In SDK, import the new AXI hardware handoff, create new board software
packages, and import the software applications.
c. Modify driver calls as needed and ensure user code is endian-safe. User-developed
drivers might need to be re-written if their behavior is affected by endianness.
d. If running the GNU tools on the command line and writing make files, the correct
compiler flag -mlittle-endian must be used.
3.
a. Do not mix object files (.o) and libraries created with different endian data
representations.
b.
c. Do not use ELF files built for big-endian systems on a little-endian system (and vice
versa).
d. Do not use generated Xilinx data files that are affected by endianness (for example,
BIT files that include block RAM data, like ELF files) across systems.
e. Block RAM initialization and data sharing should reflect the endianness
requirements of the master.
f. Do not use old application data files if the application data is affected by
endianness (byte ordering in the file).
g.
[Table: 32-bit word byte ordering, showing byte addresses n through n+3 and the MSByte/LSByte placement for big-endian versus little-endian storage; bit labels run from 31 (MSBit) down to 0 (LSBit)]
Table 4-6: [16-bit halfword byte ordering, showing byte addresses n and n+1 and the MSByte/LSByte placement for big-endian versus little-endian storage; bit labels run from 15 (MSBit) down to 0 (LSBit)]
Table 4-7: [Byte addressing, showing the byte address with bit labels and MSBit/LSBit significance]
Chapter 5
Table 5-1: [Feature trade-off matrix, rating each configuration from -- to ++ on Timing/Fmax, Throughput, Latency, Ease of Use/Flexibility, Ease of Debug, and Size/Area. Features covered: Width Conversion (optional, default OFF; Downsizer and Upsizer), Connectivity Mode, protocol configuration (AXI4-Lite, AXI3), ID Threading (Single Thread; the Notes column observes that as more threads are used there is a greater potential for read reordering or read interleaving, and debug becomes more difficult), and Issuance/Acceptance (1 default; 2, 4, 8, 16, 32; 32 default)]
Table 5-1 (continued): [Ratings for Register Slice (optional, default OFF; Type 8, Automatic), Floorplanning (optional, default OFF), buffer implementation (SRL versus BRAM), and Arbitration Priority (optional, default Round Robin; ON settings)]
Table 5-2: [Trade-off ratings, -- to ++, for endpoint IP configuration: IP Protocol (AXI4 versus AXI4-Lite), Narrow Burst Support (ON default; OFF recommended), Threading (uses no thread or issues only a single thread, versus issues multiple threads, >1 up to 32), and Ability to Pipeline Transactions (short, 1-4, versus 32)]
Minimize clock domain conversions to reduce the logic associated with them. Use as
few clocks as possible and, if clock conversion is necessary, attempt to keep the clocks
at synchronous integer ratios.
2.
3.
4.
Avoid using AXI3 or AXI4 narrow bursts. Narrow bursts are defined in the AXI
protocol but are generally not used by master IP. When a master IP specifically
designates that it does not issue narrow bursts, some slaves (such as memory
controllers) can detect that they will therefore never receive a narrow burst transaction
and can omit narrow burst support logic.
5.
Minimize protocol conversions and use AXI4-Lite where possible. Protocol conversion
to AXI3 slaves utilizes logic. The AXI4-Lite protocol requires less logic to support,
especially when all devices on an AXI Interconnect are AXI4-Lite type. Using
AXI4-Lite protocols and grouping AXI4-Lite IP into a separate subsystem can reduce
logic.
6.
7.
Reduce data path width and minimize size and width conversions. Design systems to
the minimum required data path width while also minimizing width conversions. Be
careful not to inadvertently mismatch the AXI Interconnect core data width or core
clock with the width and clock of all the attached endpoints; this can result in an
excessive number of conversions. If possible, handle width conversion inside the user
IP instead of using a general-purpose memory mapped AXI width converter.
A protocol-compliant AXI memory mapped width converter block is complex due to
issues like address calculation, multi-thread support, transaction splitting, unaligned
bursts, and arbitrary burst length.
If width conversion can be performed more efficiently in the user IP or in the
application domain before reaching the AXI interconnect, the overall area is reduced.
Turn on register slices where appropriate. Register slices act as AXI pipeline stages to
break combinatorial timing paths across the register slice. AXI Interconnect provides
an optional register slice at the boundary of each attached endpoint. The FIFO
Generator can also generate standalone instances of AXI register slices. Different
register slice types, and the granularity to set them on individual AXI channels,
provide fine-grained control of register slice placement.
2.
Large and complex IP blocks such as processors, DDR3 memory controllers, and PCIe
bridges are good candidates for having register slices enabled. The register slice breaks
timing paths and allows more freedom for Place and Route (PAR) tools to move a large
IP block away from the congestion of the interconnect core and other IP logic.
a.
b.
3.
Reduce data path width and minimize size/width conversions (as described in
Size/Area Optimization Guidelines, step 7).
4.
5.
Separate IP using register slices then floorplan the IP blocks (this is an advanced
strategy). After placing register slices to provide timing isolation, IP blocks can be
floorplanned further away from the interconnect core to reduce congestion around
that block core.
2.
Increase data path widths. Wider data paths carry more information per clock cycle.
3.
Turn on data path FIFO buffers. Buffers can provide elasticity to hide temporary stalls
or backpressure in the data flow. Use of the additional BRAM FIFO option, to delay
AWVALID/ARVALID until FIFO occupancy permits uninterrupted burst transfers, can
further improve throughput at the expense of increased latency.
5. Increase transaction burst length. Longer bursts reduce the potential for stall cycles
caused by address arbitration and control logic overhead.
Longer bursts also signal to the AXI slave the intent to move a large amount of
contiguous data, so that slaves such as memory controllers can better optimize their
response, and they reduce the relative amount of AXI address channel bandwidth.
This reduces address channel congestion around the shared address arbiter logic in the
AXI Interconnect.
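The benefit of amortizing per-transaction arbitration and control overhead over longer bursts can be sketched with a small calculation; the overhead figure below is illustrative, not taken from the AXI specification:

```python
def burst_efficiency(burst_len, overhead_cycles):
    """Fraction of data-path cycles that carry data, assuming a fixed
    per-transaction overhead (address arbitration plus control cycles)."""
    return burst_len / (burst_len + overhead_cycles)

# Longer bursts amortize the fixed overhead across more data beats:
short = burst_efficiency(4, 4)     # 50% of cycles move data
long_ = burst_efficiency(256, 4)   # ~98.5% of cycles move data
```

The same fixed overhead that halves the efficiency of a 4-beat burst is nearly invisible on a 256-beat burst.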
7. Exploit the parallelism of the Sparse Crossbar AXI Interconnect. In Sparse Crossbar
mode, the AXI Interconnect supports parallel data flow when multiple masters transfer
data to multiple independent slaves.
8. Avoid issuing read/write address requests until the IP can ensure that it will source or
sink the data with minimal idle cycles in the data stream. Otherwise, when a read or
write data transfer is in progress, stalling the data phase of the transaction could
prevent the AXI Interconnect from servicing other read or write data transfers. If the
master or slave stalls, it could block other devices, limiting system throughput.
For higher throughput, IP should be designed to request reads or writes when they are
ready to be serviced with minimal stall cycles. The use of buffering might be beneficial.
The worst case is a very slow AXI master requesting write bursts. When the slow
master is granted arbitration, it will block other writes to the same slave until it
completes its slow write transaction; this can take many clock cycles to transfer each
beat of data.
The use of Data Path FIFOs (with delayed AWVALID/ARVALID feature) in the AXI
Interconnect can help mitigate the system throughput impact of slow masters.
1. Minimize clock and width conversions. Clock and width conversions require logic that
adds additional cycles of latency.
2. Avoid using AXI3/AXI4 narrow bursts. Some AXI slave devices, such as memory
controllers, must use logic to internally convert narrow bursts to full-width bursts. This
packing logic adds latency. If all masters connected to a given slave can designate that
they do not perform narrow bursts, the narrow burst logic in the slaves can be
disabled, thereby reducing area and latency.
3. Increase the arbitration priority of latency-sensitive masters. If some masters are more
latency sensitive than others, increasing the priority of the latency-sensitive master
helps its requests to be serviced more quickly.
4. Reduce transaction burst lengths to prevent prolonged head-of-line blocking. Long
burst lengths can tie up data paths for long periods while latency-sensitive masters
wait. Reducing burst length provides more frequent arbitration cycles in which a
latency-sensitive master can gain access.
5. Increase clock frequency while trying not to use register slices. This reduces the
absolute latency time. If register slices are not added, the number of clock cycles of
latency does not change, only the period of each clock cycle.
7. Arrange the system address map and address access patterns to exploit the row/bank
management features of AXI DDRx (MIG) memory controllers.
Accessing address locations of open banks and rows (pages) of memory reduces
DRAM memory access time.
1. Greater ease of use is accomplished by leaving each IP in its native, most convenient
clock, width, protocol, and so on, and using the per-port configurability of the AXI
Interconnect to adapt to the IP.
2. Utilizing full crossbar connectivity provides more flexibility to change the active
sources and destinations of transactions, whereas sparse connectivity limits which
masters can communicate with which slaves. An even simpler solution is to use the
Shared Address Shared Data (SASD) mode of the AXI Interconnect. SASD mode stalls
transactions so that only a single read or write transaction executes at a time, with no
overlapping or pipelining of transactions. This eases the debug and understanding of
transaction sequences.
3. The AXI4-Lite protocol is much simpler than the AXI3 or AXI4 protocol. If AXI4-Lite is
sufficient for an IP, using it simplifies the design.
4. Reducing the use of threading and transaction pipelining makes the system easier to
debug and analyze using the AXI ChipScope debug monitor. Threading and
pipelining make it more difficult to correlate activity on each of the AXI channels with
a logical transaction. High levels of threading and pipelining also might be more likely
to expose functional bugs in user IP.
5. Enabling the AXI ChipScope monitor permits full waveform capture and triggering in
hardware. This enables hardware runtime viewing/triggering of some or all AXI
signals at the boundary of the AXI Interconnect, which can help diagnose functional or
performance issues in hardware.
6. AXI hardware protocol checkers also help detect, and more quickly isolate, the source
of protocol violations due to functional errors.
Figure 5-1: AXI Multi-Port Memory Controller. (Multiple memory-mapped AXI masters
connect through the AXI ports of an AXI Interconnect to an AXI MIG slave, which drives
external memory.)
IP configuration decisions across the AXI masters, the AXI Interconnect, and AXI MIG can
greatly affect the characteristics of the system, such as size, Fmax, throughput, and latency.
By using the general optimization information described previously, the AXI MPMC can
be tuned for a balance of size and performance. This section works through an example of
applying system optimization techniques to tune the AXI MPMC.
For information on how to create an AXI MPMC design using EDK or Project Navigator,
see the AXI Multi-Ported Memory Controller Application Note, (XAPP739).
For an example of an AXI MPMC used in a high performance system, see Designing
High-Performance Video Systems with the AXI Interconnect, (XAPP740).
Appendix C, Additional Resources, also contains links to these documents.
Table 5-3: Maximum Theoretical Memory Bandwidth

Memory Width  Memory Clock  Data Rate       Max Theoretical Bandwidth
(bits)        (MHz)         (Mb/s per pin)  (MBytes/sec)
8             300           600             600
8             400           800             800
16            300           600             1200
16            400           800             1600
32            300           600             2400
32            400           800             3200
64            300           600             4800
64            400           800             6400
128           300           600             9600
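The bandwidth figures follow from the DDR data rate: data toggles on both clock edges, so each pin carries twice the clock frequency in Mb/s. A minimal sketch (the function name is illustrative):

```python
def ddr_bandwidth_mbytes(width_bits, clock_mhz):
    """Max theoretical DDR bandwidth: double data rate means
    2 x clock in Mb/s per pin; divide by 8 to convert bits to bytes."""
    data_rate_mbps = 2 * clock_mhz
    return width_bits * data_rate_mbps / 8

# Matches the table rows above:
assert ddr_bandwidth_mbytes(16, 400) == 1600
assert ddr_bandwidth_mbytes(128, 300) == 9600
```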
The two smallest memory configurations that would meet the bandwidth requirements of
the system use a 16-bit DDR3 running at a 300 to 400 MHz memory clock, providing
1200 to 1600 MBytes/sec of theoretical bandwidth (67% to 50% memory utilization at 800
MBytes/sec). In theory, an 8-bit DDR3 running at 400 MHz also meets the bandwidth, but
given the overhead (lost clock cycles) of refresh, write leveling, read-write bus turnaround
time, and row/bank address changes, some additional efficiency margin is required. With
AXI MIG, the AXI slave interface data width is natively four times the physical memory
data width and the AXI slave clock is one-half the memory clock frequency, so a 16-bit
DDR3 at 300 to 400 MHz directly corresponds to an AXI slave interface that is natively
64 bits wide at 150 to 200 MHz.
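That native width/clock mapping (4x the memory width at half the memory clock, consistent with the 300-to-150 MHz example above) can be sketched as follows; the function name is illustrative:

```python
def axi_mig_native_interface(mem_width_bits, mem_clock_mhz):
    """AXI MIG native slave interface: 4x the physical memory width,
    at one-half the memory clock frequency (per the example above)."""
    return 4 * mem_width_bits, mem_clock_mhz // 2

# 16-bit DDR3 at 300-400 MHz -> 64-bit AXI at 150-200 MHz:
assert axi_mig_native_interface(16, 300) == (64, 150)
assert axi_mig_native_interface(16, 400) == (64, 200)
```

Note that total bandwidth is preserved: quadrupling the width while halving the clock doubles the MB/s, matching the 2x (rising plus falling edge) DDR data rate.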
The recommendation for clock conversion is to use synchronous ratios over asynchronous
ratios to reduce logic. Instead of a 200 MHz Interconnect clock, the system can be
configured to attempt to remove asynchronous clock conversions by employing a:
Figure 5-2:
Longer bursts reduce address arbitration/control cycles and help keep the memory
controller in the same row, bank, and read/write direction longer. Long bursts would
normally impact latency, but assuming this application is not very latency sensitive and
that data path FIFOs are enabled for elasticity, the use of long bursts should not result in
head of line blocking/stalling.
No Narrow Burst Transactions
The AXI4 master should not issue any narrow burst transactions. Narrow bursts are
defined in the AXI specification as transactions where the transfer size is narrower than
the native data width of the interface. Such bursts are less efficient in terms of bus
utilization and require extra logic in the memory controller to repack narrow bursts into
full-width bursts. In this example:
Enable the modifiable bit on AXI transactions (AxCACHE[1] = 1) to ensure that any
downstream upsizer can fully pack data up to wider widths. This allows costly
narrow burst support logic to be removed from the memory controller.
In XPS, this is designated by the C_SUPPORTS_NARROW parameter that then allows XPS to
automatically configure AXI MIG to omit narrow burst support logic. In a CORE
Generator context, you must manually configure AXI MIG to omit narrow burst support
logic.
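The narrow-burst condition itself is simple to state: a beat is narrow when 2^AxSIZE bytes is smaller than the bus width. A minimal check (names are illustrative):

```python
def is_narrow_burst(axsize, data_width_bytes):
    """A burst is 'narrow' when each beat (2**AxSIZE bytes) is smaller
    than the native width of the data bus."""
    return (1 << axsize) < data_width_bytes

# On a 64-bit (8-byte) AXI interface:
assert is_narrow_burst(2, 8)       # 4-byte beats are narrow
assert not is_narrow_burst(3, 8)   # 8-byte beats use the full width
```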
Pipeline Transactions
Design the AXI4 master to pipeline transactions so it can issue new address requests while
servicing the data transfers for previous transactions. Pipelining transactions helps overlap
address and control cycles with data transfer cycles to improve data path efficiency and
throughput. However, new address requests should not be made until it is ensured that the
master can supply sufficient write data or has sufficient ability to accept read data to
complete a full burst with minimal stalling. A master that issues an address request and
excessively stalls the data transfer phase of its requested transaction could cause
backpressure that could eventually stall or slow the whole system.
Single Thread Transactions
Design the AXI4 master so that it operates using only a single thread for all transactions
(declared using the C_SUPPORTS_THREADS=0 parameter). By not using multiple threads,
the logic in the AXI4 master can be simplified because it can be designed to rely upon write
responses and read data being returned in order. The use of a single thread also benefits the
AXI Interconnect performance because the upsizer is active in this example system.
Upsizers in the AXI Interconnect stall when changing ID threads so using a single thread
avoids stalling of transactions passing through the upsizer. Ensure that the AXI4 master
declares itself not to use threads so that AXI Interconnect can be configured to omit its
multi-thread support logic which reduces area and improves timing. Using a single thread
also makes debug easier because AXI transactions observed in the ChipScope monitor are
easier to decode and correlate across a system.
When fine tuning the configuration of the AXI Interconnect, it is useful to understand the
AXI Interconnect converter bank block. The converter bank handles size, clock, and
protocol conversion in addition to register slice and data path FIFO features.
The converter bank can be independently configured at each endpoint of the AXI
Interconnect, as shown in Figure 5-3.
Figure 5-3: AXI Interconnect converter banks (X12047). In the SI hemisphere, each slave
interface passes through register slices, protocol converters, down-sizers, clock converters,
up-sizers, and data FIFOs before reaching the crossbar; the MI hemisphere mirrors this
chain toward each master interface.
Notice that from the perspective of the attached AXI master, shown in Figure 5-3, page 98,
the data path FIFOs are positioned after the upsizer and the clock converter so that the
FIFO interfaces to the interconnect core at its higher native width and clock.
Because the AXI masters are at a relatively lower bandwidth than the memory controller
(1/2 width, clock frequency), turning on data path FIFOs allows the interconnect to
buffer up the wider width transactions to and from the memory controller and service each
of the AXI masters at their slower rates on the other side of the FIFOs. Data path FIFOs
reduce stalling of the memory controller due to the slower data rate AXI masters. The AXI
Interconnect offers data path FIFOs in options of 32 deep or 512 deep FIFOs. Because the
AXI4 master is generating long bursts up to 256 beats in length, configure the FIFOs as 512
deep to fit an entire burst.
Beginning in release 13.3, the data path FIFOs have a new optional feature to delay
AWVALID/ARVALID until FIFO occupancy permits uninterrupted burst transfers
downstream. This feature causes:
Write address requests to be withheld from the crossbar until the write data path FIFO
has buffered all the data for the transaction.
Read address requests to be withheld from the crossbar until the read data path FIFO
has sufficient vacancy to store the entire transaction.
This feature ensures that the crossbar does not see a transaction request until the data path
FIFO can guarantee that it can source/sink the entire transaction at the full bandwidth of
the crossbar without introducing stall cycles in the data transfer. This feature is especially
useful in situations similar to the example design, shown in Figure 5-2, page 97, where the
master has a relatively lower bandwidth than the slave (memory controller).
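The gating rule described above can be sketched as two checks, one per direction (names are illustrative):

```python
def can_assert_awvalid(write_fifo_occupancy, burst_len):
    """Withhold the write address from the crossbar until the write
    data FIFO already holds the entire burst."""
    return write_fifo_occupancy >= burst_len

def can_assert_arvalid(read_fifo_vacancy, burst_len):
    """Withhold the read address until the read data FIFO has room
    to absorb the entire burst."""
    return read_fifo_vacancy >= burst_len

# A 256-beat burst is only presented once the FIFO can handle it whole:
assert not can_assert_awvalid(100, 256)
assert can_assert_arvalid(512, 256)
```

Because the check passes only when the whole burst fits, the crossbar can always run the transfer at full bandwidth with no mid-burst stalls.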
Timing Considerations
For timing, the AXI Interconnect should be configured to enable register slices at the
interface to the memory controller. Because the memory controller's AXI interface operates
at the highest width and clock frequency in the system, it is likely a critical path unless a
register slice is turned on. A Type 8 register slice can be enabled on all five channels at the
AXI interface of the memory controller to allow the AXI Interconnect to optimize the kind
of register slice best suited to each AXI channel. Note that a register slice at the AXI master
interface is not required, because the AXI master and the upsizer are both clocked by the
slower 48 MHz side of the clock converter. Also, the clock converter acts as a register slice,
since it provides timing isolation between the 48 MHz and 192 MHz clock domains.
Issuance/Acceptance Values of 2 or Higher
Also, issuance and acceptance values at each port of the AXI Interconnect can be optimized
to support transaction pipelining while limiting the pipelining so that head-of-line
blocking is bounded.
Given that there are 4 masters, an issuance of 2 means that the memory controller would
need an acceptance of 8 to fully pipeline 2 transactions from each master.
Given that transactions are all long bursts, pipelining more than 8 transactions at the
memory controller becomes excessive. An issuance setting of 4 at the masters is too
high because it would require the slave to accept up to 16 transactions to be fully utilized.
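The arithmetic behind these settings is simply masters times issuance; a sketch (the function name is illustrative):

```python
def required_acceptance(num_masters, issuance_per_master):
    """Acceptance a shared slave needs in order to fully absorb the
    pipelined transactions that every master is allowed to issue."""
    return num_masters * issuance_per_master

assert required_acceptance(4, 2) == 8    # the example in the text
assert required_acceptance(4, 4) == 16   # issuance of 4 is excessive here
```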
Whether the memory width or clock should be increased to provide more available
margin.
Whether to reduce the burst length of the other AXI masters to reduce the time that a
processor waits for a burst transaction to complete.
Whether the highest arbitration priority can be granted to the processor to minimize its
latency.
Whether the issuance/acceptance values for other devices might be reduced to limit
head-of-line blocking due to pipelined transactions.
Whether the system clocking can be altered so that the memory path of the processor has
no clock conversion, or only synchronous clock conversion.
Note: The MicroBlaze processor can support a native 128-bit, 256-bit, or 512-bit wide
AXI interface. This is an example of application-domain size conversion that is more efficient than
generic AXI width conversion. This MicroBlaze wide-cache configuration is ideal for connecting
to an equally wide memory controller to remove the latency impact of size conversion.
The optimizations described to improve processor performance are often the opposite of
those for maximizing system throughput. Therefore, either more margin is needed by
using a larger memory controller, or you must carefully optimize your software (to
minimize cache misses) and be willing to experiment with the system to find the right
balance between latency and throughput.
For example, when two AXI Interconnects connect directly to each other, a set of
back-to-back register slices can be enabled using one register slice from each adjacent
interconnect. This can be used to span longer routing distances in large AXI MPMC
systems.
In some cases using multiple AXI Interconnects can even reduce overall system size. When
an AXI MPMC requires a large number of upsizers, especially with large steps like 32- to
128-bits, separating the masters into subgroups using smaller width AXI Interconnects can
reduce the number of upsizers which consume area and impact timing.
Note: In XPS, connecting two cascaded AXI Interconnect instances requires that an
AXI-to-AXI Connector IP be instantiated. This bridge IP provides a tool mechanism for XPS to
cascade interconnects, but logically it contains only wires and consumes no logic.
You can place multiple AXI ChipScope monitors around the system and cross-trigger
between them to analyze more complex system-level activity. An AXI Hardware Protocol
Checker feature is also available that can trigger the AXI ChipScope monitor when some
types of AXI protocol violations occur, helping to isolate the source of protocol violations
more quickly.
Floorplanning
AXI IP connected to the AXI Interconnect can be floorplanned to improve placer results
and reduce routing congestion. To make floorplanning easier in large FPGAs, enable extra
register slices to provide a more distinct flip flop-based boundary at the AXI IP interface.
Note: After any significant changes to the AXI Interconnect configuration, floorplan locations might
need to be rechecked and updated as necessary. Otherwise subsequent changes to the AXI
Interconnect, such as turning on data path FIFOs, can change the footprint and necessary placement
of the AXI Interconnect.
Xilinx recommends that these BFMs be used for verification of user IP under development.
Given the potential complexity of understanding AXI transactions, especially across
pipelined transactions and multi-threaded traffic, it can be extremely difficult to debug
subtle functional errors or isolate the root cause of protocol violations solely in hardware.
The simulation domain is usually a far less expensive method for verifying and debugging
new AXI IP before use in complex systems. See the AXI Bus Functional Models User Guide
(UG783) and the AXI Bus Functional Model Data Sheet (DS824) for more information.
Reducing AXI clocks from 200 MHz to 150 MHz to improve timing.
Reductions in clock frequency could permit use of a slower speed grade FPGA
device
Allow crossbar to be reconfigured into SASD (this disables transaction pipelining and
multi-threading support)
Provides room for future system bandwidth expansion (you can later increase
clock frequencies, enable crossbar, and so forth.)
Allow masters to use shorter burst lengths to reduce latency or reduce the FIFO/
buffering requirements of the systems
Note: Increasing memory controller and AXI Interconnect data width could introduce new size
conversion requirements and board-level requirements into the system that might offset these AXI
system simplifications, so analysis and experimentation are required to determine whether this
approach is an overall improvement for the given application.
Because XPS handles width and clock conversion configurations automatically, connecting
the wrong interconnect clock or setting the wrong interconnect data width causes XPS to
automatically activate all the necessary conversions to make the system function. The
result is a system that might appear to function and completes all AXI transactions, but the
system bandwidth, area, latency, and timing could be very undesirable.
The recommended approach is to incrementally add register slices when timing fails
starting at the interfaces with highest clock frequencies and data widths. Register slices
might also be needed for large crossbar interconnects or at AXI interfaces where size
conversion is performed. If large numbers of register slices are required to meet timing in
a large system, floorplanning may be needed to help guide the place and route tools.
Do Not Place Register Slices on AXI4-Lite IP
For example, type 1 register slices support back-to-back data beats without inserting stalls,
while type 7 register slices use less area but insert a stall after every data transfer.
The type 7 register slice is ideal for AXI4-Lite interfaces or for AW, AR, and B channels
of an AXI interface where back-to-back transfers do not occur or occur infrequently.
The type 1 register slice is designed for R and W channels where burst transactions
occur.
Use transaction ordering and completion rules to manage traffic among multiple
AXI masters and slaves in a system.
The richness of the AXI protocol and the possible concurrency of data transfer in a
crossbar make hardware-only debug and verification of new AXI IP much more
challenging.
Verify New IP with BFM
New IP should be verified in simulation using AXI Bus Functional Models (such as the
Cadence BFM available for XPS) and AXI protocol checkers/assertions (available from
Cadence or from the ARM website).
Simulation-based verification results in far shorter debug cycle time, easier identification
and isolation of functional problems, and greater variation of AXI traffic than
hardware-only based verification.
Hardware-only based AXI IP verification requires full synthesis and Place and Route
(PAR) time per debug cycle, and the visibility of signals from an AXI ChipScope monitor is
more limited than in a simulation domain. The potential complexity of AXI4 traffic even in
a relatively typical system makes hardware-only verification very expensive.
Skipping Simulation-based AXI IP Verification is Highly Discouraged
The AXI system output from BSB should still be further adapted, optimized, and
incrementally transformed to fit the desired end application using the techniques
described in AXI System Optimization, page 91. Failure to tune the output of BSB to meet
the specific requirements of an application could result in poor quality of results and low
performance.
The architecture and optimizations necessary for a good AXI IP-based solution can differ
greatly from those of an IBM CoreConnect or a Xilinx MPMC-based system. The output
from BSB for AXI systems might not share the same type of system architecture as the
output from BSB for CoreConnect or MPMC-based systems, and it must be significantly
modified to achieve area, performance, and feature tradeoffs similar to those of a
CoreConnect or MPMC system created by BSB.
Appendix A
Signal
AXI4
AXI4-Lite
ACLK
Clock source.
ARESETN
Global reset source, active-Low. This signal is not present on the interface when a reset source (of either
polarity) is taken from another signal available to the IP. Xilinx IP generally must deassert VALID
outputs within 8 cycles of reset assertion, and generally requires a reset pulse width of 16 or more
clock cycles of the slowest clock.
Some Xilinx IP might document that they can accept ARESETN asserted for fewer than 16 cycles. For
example, DSP IP require ARESETN asserted for a minimum of 2 cycles on their AXI4-Stream interfaces.
Signal
AXI4
AXI4-Lite
Signal not present.
AWID
Fully supported.
Masters need only output the set of ID bits that they vary
(if any) to indicate re-orderable transaction threads.
Single-threaded master interfaces can omit this signal. Masters do not need to
output the constant portion that comprises the Master ID, because it is appended by the
AXI Interconnect.
AWADDR
Fully supported.
Width 32 bits, or larger as needed. High-order bits outside the native address range of a slave are
ignored (trimmed) by an endpoint slave, which could result in address aliasing within the slave.
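Address trimming means accesses alias modulo the slave's range. A small sketch assuming a power-of-two slave range (names are illustrative):

```python
def slave_local_address(axi_addr, slave_range_bytes):
    """High-order bits outside the slave's native range are trimmed,
    so AXI addresses alias modulo the (power-of-two) slave range."""
    return axi_addr & (slave_range_bytes - 1)

# Two distinct AXI addresses alias to the same location in a 4 KB slave:
assert slave_local_address(0x00001234, 0x1000) == 0x234
assert slave_local_address(0x000A1234, 0x1000) == 0x234
```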
Table A-2:
Signal
AXI4
AXI4-Lite
AWLEN
Fully supported.
Supported bursts:
Up to 256 beats for incrementing (INCR).
16 beats for WRAP.
AWSIZE
AWBURST
AWLOCK
AWCACHE
AWPROT
AWQOS
AWREGION
AWUSER
AWVALID
Fully supported.
AWREADY
Fully supported.
Signal
AXI4
AXI4-Lite
WDATA
WSTRB
Fully supported.
WLAST
Fully supported.
WUSER
Table A-3:
Signal
AXI4
WVALID
Fully supported.
WREADY
Fully supported.
AXI4-Lite
Signal
AXI4
AXI4-Lite
BID
Fully supported.
See AWID for more information.
BRESP
Fully supported.
BUSER
BVALID
Fully supported.
BREADY
Fully supported.
Signal
AXI4
AXI4-Lite
ARID
ARADDR
Fully supported.
Width 32 bits, or larger as needed. High-order bits outside the native address range of a slave are ignored
(trimmed) by an end-point slave, which could result in address aliasing within the slave.
ARSIZE
ARBURST
Table A-5:
Signal
AXI4
AXI4-Lite
ARLOCK
ARCACHE
ARPROT
ARQOS
ARREGION
ARUSER
ARVALID
Fully supported.
ARREADY
Fully supported.
Signal
AXI4
AXI4-Lite
RID
Fully supported.
See ARID for more information.
RDATA
RRESP
Fully supported.
RLAST
Fully supported.
RUSER
RVALID
Fully supported.
RREADY
Fully supported.
Default
(All Bits)
TVALID
No
N/A
No change.
TREADY
Yes
No change
TDATA
Yes
No change.
Xilinx AXI IP convention:
8 through 4096 bit widths are used by Xilinx AXI IP (establishes a testing limit).
TSTRB
Yes
Same as
TKEEP
else 1
TKEEP
Yes
In Xilinx IP, there is only limited use of null bytes, to encode the remainder bytes
at the end of packetized streams.
TKEEP is not used in Xilinx endpoint IP for leading or intermediate null bytes in the
middle of a stream.
TLAST
Yes
TID
Yes
No change.
Xilinx AXI IP convention:
Only 1-32 bit widths are used by Xilinx AXI IP (establishes a testing limit).
TDEST
Yes
No change
Xilinx AXI IP convention:
Only 1-32 bit widths are used by Xilinx AXI IP (establishes a testing limit).
TUSER
Yes
No change
Xilinx AXI IP convention:
Only 1-4096 bit widths are used by Xilinx AXI IP (establishes a testing limit).
Appendix B
AXI Terminology
Table B-1:
AXI Terminology
Term
Type
Description
Usage
AXI
Generic
General description.
Embedded and
memory cores.
Examples: MIG, block
Ram, EDK PCIe
Bridge, FIFO.
Management registers.
Examples: Interrupt
Controller, UART Lite,
IIC Bus Interface.
AXI4
AXI4-Lite
AXI4-Stream
Unidirectional links modeled after a single write channel. Unlimited burst length.
Used in DSP, Video, and communication applications.
AXI4
AXI4-Lite
AXI4-Stream
All.
Interface
AXI4
AXI4-Lite
AXI4-Stream
All.
Channel
Generic
Multiple-bit signal
(Not an interface or a channel).
All.
Bus
AXI4-Stream
Transaction
AXI4
AXI4-Lite
AXI4
AXI4-Lite
AXI4-Stream
All.
Transfer
AXI4
AXI4-Lite
AXI4-Stream
All.
Burst
AXI4
AXI4-Lite
AXI4-Stream
All.
master
AXI4
AXI4-Lite
AXI4-Stream
All.
slave
Term
Type
Description
master
interface
(generic)
AXI4
AXI4-Lite
AXI4-Stream
slave interface
(generic)
AXI4
AXI4-Lite
AXI4-Stream
AXI4
AXI4-Lite
EDK.
AXI Interconnect Slave Interface:
For the XPS flow, a vectored AXI slave interface receiving inbound AXI transactions from all
connected master devices.
For the CORE Generator tool flow, one of multiple slave
interfaces connecting to one master device.
MI
AXI4
AXI4-Lite
EDK.
AXI Interconnect Master Interface:
For the XPS flow, a vectored AXI master interface generating
outbound AXI transactions to all connected slave
devices.
For the CORE Generator tool flow, one master interface
connecting to one slave device.
AXI4
AXI4-Lite
EDK.
SI slot
AXI4
AXI4-Lite
EDK.
MI slot
SI-side
AXI4
AXI4-Lite
All.
MI-side
AXI4
AXI4-Lite
All.
upsizer
AXI4
AXI4-Lite
AXI4-Stream
All.
downsizer
AXI4
AXI4-Lite
AXI4-Stream
All.
SAMD
Topology
SASD
Topology
SI
Usage
All.
All.
Shared-Access Topology
Crossbar
Topology
All.
Crossbar
Structural
All.
Appendix C
Additional Resources
Additional reference documentation:
See the Introduction, page 3 for instructions on how to download the ARM AMBA AXI
specification from http://www.amba.com.
Additionally, this document references documents located at the following Xilinx website:
http://www.xilinx.com/support/documentation/axi_ip_documentation.htm
Xilinx Documentation
Xilinx Glossary:
http://www.xilinx.com/support/documentation/sw_manuals/glossary
Memory Control:
http://www.xilinx.com/products/technology/memory-solutions/index.htm
Local-Link:
http://www.xilinx.com/products/design_resources/conn_central/locallink_member/sp06.pdf