
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347)


Overview
Navigating Content by Design Process
Terminology
Introduction
Designing with the Core
Tandem Configuration
Overview
Supported Devices
Tandem + DFX
Enable the Tandem Configuration Solution
Deliver Programming Images to Silicon
Tandem Configuration Performance
Design Operation
Loading Tandem PCIe for Stage 2
Segmented Configuration
Known Issues and Limitations
QDMA Subsystem for CPM4
Overview
Product Specification
Design Flow Steps
Customizable Example Design (CED)
Debugging
Application Software Development
Upgrading
QDMA Subsystem for CPM5
Overview
Product Specification
Design Flow Steps
Customizable Example Design (CED)
Application Software Development
Debugging
Upgrading
AXI Bridge Subsystem for CPM4
Overview
Product Specification
Design Flow Steps
Debugging
Upgrading
AXI Bridge Subsystem for CPM5
Overview
Product Specification
Design Flow Steps
Debugging
Upgrading
XDMA Subsystem for CPM4
Overview

Product Specification
Design Flow Steps
Application Software Development
Debugging
Upgrading
GT Selection and Pin Planning for CPM4
CPM4 GT Selection
CPM4 Additional Considerations
CPM4 GTY Locations
GT Selection and Pin Planning for CPM5
General Guidance for CPM5
Guidance for CPM5 in Specifically Identified Engineering Sample Devices
Guidance for CPM5 Migration from Specifically Identified Engineering Sample Devices
CPM5 GTYP Locations
Using the High Speed Debug Port Over PCIe for Design Debug
Overview
Implementing the HSDP-over-PCIe Example Design
Interrupt Request (IRQ) Routing and Programming for CPM4
Interrupt Request (IRQ) Routing and Programming for CPM5
Migrating
Limitations
Additional Resources and Legal Notices
Finding Additional Documentation
Support Resources
References
Revision History
Please Read: Important Legal Notices


Overview
Navigating Content by Design Process
AMD Adaptive Computing documentation is organized around a set of standard design processes to
help you find relevant content for your current development task. You can access the AMD Versal™
adaptive SoC design processes on the Design Hubs page. You can also use the Design Flow
Assistant to better understand the design flows and find content that is specific to your intended
design needs. This document covers the following design processes:

System and Solution Planning


Identifying the components, performance, I/O, and data transfer requirements at a system level.
Includes application mapping for the solution to PS, PL, and AI Engine. Topics in this document
that apply to this design process include:

Introduction to the CPM4


Use Modes
Introduction to the CPM5
Use Modes

Embedded Software Development


Creating the software platform from the hardware platform and developing the application code
using the embedded CPU. Also covers XRT and Graph APIs. Topics in this document that apply
to this design process include:

CPM4
QDMA Subsystem
Register Space
Application Software Development
AXI Bridge Subsystem
Register Space
XDMA Subsystem
Register Space
Application Software Development
CPM5
QDMA Subsystem
Register Space
Application Software Development
AXI Bridge Subsystem
Register Space

Hardware, IP, and Platform Development
Creating the PL IP blocks for the hardware platform, creating PL kernels, functional simulation,
and evaluating the AMD Vivado™ timing, resource use, and power closure. Also involves
developing the hardware platform for system integration. Topics in this document that apply to
this design process include:

CPM4
QDMA Subsystem: Lab1: QDMA AXI MM Interface to NoC and DDR
QDMA Subsystem: Lab2: QDMA AXI MM Interface to NoC and DDR with Mailbox
XDMA Subsystem: XDMA AXI MM Interface to NoC and DDR Lab
CPM5
QDMA Subsystem: QDMA AXI MM Interface to NoC and DDR Lab

Terminology
Table: Terminology in this Guide

Acronym or Term              Description
AXI-ST                       AXI4-Stream
AXI-MM                       AXI4 Memory Mapped
Controller 0                 CPM PCIE Controller 0
Controller 1                 CPM PCIE Controller 1
QDMA                         Queue-based Direct Memory Access
Controller 0 QDMA or QDMA0   CPM PCIE Controller 0 with QDMA
Controller 1 QDMA or QDMA1   CPM PCIE Controller 1 with QDMA (only CPM5 contains a hardened QDMA with Controller 1)
SR-IOV                       Single root input/output virtualization

Introduction
Introduction to the CPM4

The integrated block for PCIe® Rev. 4.0 with DMA and CCIX Rev. 1.0 (CPM4) is shown in the
following figure.

Figure: CPM4 Sub-Block for PCIe Function (CPM4 PCIE)


CPM Components
The CPM includes multiple IP cores:

Controllers for PCIe


The CPM contains two instances of the AMD controller for PCIe, PCIE Controller 0 and PCIE
Controller 1. Both controllers can have CCIX capabilities. However, only PCIE Controller 0 is
capable of acting as an AXI bridge and as a DMA master. The controllers interface with the GTs
through the XPIPE interface.

Coherent Mesh Network
The CPM has a Coherent Mesh Network (CMN) (not shown) that forms the cache coherent
interconnect block in the CPM that is based on the Arm® CMN600 IP. There are two instances of
L2 cache and CHI PL Interface (CPI) blocks in the CPM (also not shown).

DMA / AXI Bridge


CPM Controller 0 has a hardened DMA/AXI Bridge core. CPM Controller 1 does not have a
hardened DMA core, but a soft DMA/Bridge can be implemented for Controller 1. CPM Controller 0
offers two possible direct memory access (DMA) IP cores: DMA Subsystem for PCIe (XDMA) and
Queue DMA Subsystem for PCIe (QDMA). The DMA cores are used for data transfer from
the programmable logic (PL) to the host, and from the host to the PL. The DMA cores can also
transfer data between the host and the Network-on-Chip (NoC), which provides high bandwidth
to other NoC ports, including the available DDR memory controllers (DDRMC). The CPM has an
AXI Bridge Subsystem for PCIe (AXI Bridge) IP for AXI-to-host communication.
The CPM includes a clock/reset block that houses a phase-locked loop (PLL) and clock dividers. The
CPM also includes the system-on-a-chip (SoC) debug component for transaction-level debug. Several
APB and AXI interfaces are used between blocks in the CPM for configuration.

DMA Data Transfers


DMA transfers can be categorized into two different datapaths.

Data path from CPM to NoC to PL


All AXI4 signals are connected from the DMA to the AXI interconnect. These signals are then
routed to the Non-Coherent interconnect in the CPM block. They then connect to the PS
interconnect and the NoC. From the NoC, the signal can be directed to any block (DDR or block
RAM) based on the user design. The figure below shows the datapath to NoC in red.

Data path from CPM directly to PL


All AXI4-Stream signals and other sideband signals, like clock and reset, are routed directly to
the PL. The figure below shows the data path to the PL in green.

Figure: DMA Data Paths


Use Modes

There are several use modes for DMA functionality in the CPM. You can select one of three options
for data transport from host to programmable logic (PL), or PL to host: QDMA, AXI Bridge, and XDMA.
To enable DMA transfers, customize the Control, Interfaces and Processing System (CIPS) IP core as
follows:

1. In the CPM4 Basic Configuration page, set the PCIE Controller 0 Mode to DMA.
2. Set the lane width value.

3. In the CPM4 PCIE Controller 0 Configuration page, set the PCIe Functional Mode for the desired
DMA transfer mode:
QDMA
AXI Bridge
XDMA


The following sections explain how you can further configure and use these different functional modes
for your application.

QDMA Subsystem

QDMA mode enables the use of PCIE Controller 0 with QDMA enabled. QDMA mode provides two
connectivity variants: AXI4-Stream and AXI4. Both variants can be enabled simultaneously.

AXI Streaming
QDMA Streaming mode can be used in applications where the nature of the data traffic is
streaming with source and destination IDs instead of a specific memory address location, such
as network accelerators, network quality of service managers, or firewalls.

AXI Memory Mapped


QDMA Memory Mapped mode can be used in applications where the nature of the data traffic is
addressable memory, such as moving data between a host and a card, or an acceleration
platform.

The main difference between XDMA mode and QDMA mode is that while XDMA mode supports up to
four independent data streams, QDMA mode can support up to 2048 independent data streams.
Based on this strength, QDMA mode is typically used for applications that require many queues or
data streams that need to be virtually independent of each other. QDMA mode is the only DMA mode
that can support multiple functions, either physical functions or single root I/O virtualization (SR-IOV)
virtual functions.
QDMA mode can be used with AXI Bridge mode. For more details on AXI Bridge mode, see the AXI
Bridge Subsystem section.
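
Conceptually, each of these queues is a ring of descriptors owned by host software, with producer and consumer indexes exchanged with the hardware. The following C fragment is a purely illustrative sketch of that model; the structure names, field layout, and doorbell handling shown here are assumptions for explanation only and do not reflect the actual CPM4 descriptor format or the AMD-provided QDMA driver API.

/* Conceptual sketch of a single QDMA queue as seen from host software.
 * Field names and sizes are illustrative assumptions only; the actual
 * descriptor layout and programming sequence are defined by the QDMA
 * register space and the AMD-provided QDMA drivers.
 */
#include <stdint.h>

#define RING_DEPTH 512            /* example ring size, programmable per queue */

struct h2c_desc {                 /* hypothetical host-to-card descriptor */
    uint64_t src_addr;            /* host buffer physical address */
    uint32_t len;                 /* transfer length in bytes */
    uint32_t flags;               /* e.g. start/end of packet markers */
};

struct qdma_queue {               /* hypothetical per-queue software context */
    uint16_t qid;                 /* one of up to 2048 queues on CPM4 */
    struct h2c_desc *ring;        /* descriptor ring in host memory */
    uint16_t pidx;                /* producer index written to the hardware */
    uint16_t cidx;                /* consumer index advanced by completions */
};

/* Post one descriptor: fill the next slot, then advance the producer index.
 * In a real driver the pidx update is a write to a per-queue doorbell register.
 */
static void qdma_post(struct qdma_queue *q, uint64_t addr, uint32_t len)
{
    struct h2c_desc *d = &q->ring[q->pidx % RING_DEPTH];
    d->src_addr = addr;
    d->len      = len;
    d->flags    = 0;
    q->pidx++;                    /* doorbell write omitted in this sketch */
}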

AXI Bridge Subsystem

AXI bridge mode enables you to interface the CPM4 PCIE Controller 0 with an AXI4 domain. This use
mode connects directly to the NoC which allows communication with other peripherals within the
Processing System (PS) and in the Programmable Logic (PL).

Figure: Bridge Data Paths

AXI bridge mode is typically used for light-traffic data paths, such as writes to or reads from Control and
Status registers. AXI bridge mode is also the only mode that can be configured for Root Port
applications, with the AXI4 interface used to communicate with a processor, typically the PS.
AXI bridge functionality is available in the following three modes:

Standalone AXI bridge mode


AXI bridge with QDMA mode
AXI bridge with XDMA mode

To enable AXI bridge in QDMA or XDMA mode, customize the core as follows:

1. For CPM4 Basic Configuration options, select DMA for PCIE controller 0 Mode.
2. In controller 0 Basic tab, set PCIE 0 Functional Mode to either XDMA or QDMA.
3. Set one or both of the following options:

In the Basic tab, select the Enable Bridge Slave Mode checkbox. This option enables the slave
AXI Bridge interface within the IP, which you can use to generate write or read transactions
from an AXI source peripheral to other PCIe devices.

In the PCIe BARs tab, select the BAR checkbox next to the AXI Bridge Master. This option
enables the master AXI Bridge interface within the IP, which you can use to receive write or
read transactions from a PCIe source device to AXI peripherals.

To enable standalone AXI Bridge mode, customize the core as follows:

1. CPM4 Basic Configuration:


PCIE controller 0 Mode: Select DMA
2. CPM4 PCIE Controller 0 Configuration:
PCIe 0 Functional Mode: Select AXI Bridge
3. Set one or both of the following options:

In the PCIe BARs tab, select the BAR checkbox for each BAR that is needed. This option
enables the Master AXI interface within the IP, which enables write/read transactions from
a PCIe source device to AXI peripherals (see the host-side access sketch after this list).
If AXI slave functionality is needed, select the Enable Bridge Slave Mode checkbox. This
option enables the slave AXI interface within the IP, which you can use to generate write or
read transactions from an AXI source peripheral to other PCIe devices.
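
Once the master AXI interface is enabled through a PCIe BAR as described above, a host application can reach the AXI peripherals behind the bridge by memory-mapping that BAR. The following is a minimal host-side sketch using the standard Linux sysfs resource files; the bus/device/function in the path and the register offsets are placeholders, and the 32-bit access pattern is an assumption about the downstream AXI peripheral.

/* Minimal host-side sketch: map an endpoint BAR exposed by the AXI Bridge
 * master and perform 32-bit register accesses to an AXI peripheral.
 * The device path (bus/device/function) and offsets are placeholders.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *bar = "/sys/bus/pci/devices/0000:01:00.0/resource0"; /* placeholder BDF/BAR */
    int fd = open(bar, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    size_t map_len = 4096;                        /* map one page of the BAR */
    volatile uint32_t *regs = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    uint32_t val = regs[0];                       /* read AXI register at BAR offset 0x0 */
    regs[1] = 0xdeadbeef;                         /* write AXI register at BAR offset 0x4 */
    printf("offset 0x0 = 0x%08x\n", val);

    munmap((void *)regs, map_len);
    close(fd);
    return 0;
}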

XDMA Subsystem

XDMA mode enables the use of PCIE Controller 0 with the XDMA enabled. XDMA mode provides two
connectivity variants: AXI Streaming and AXI Memory Mapped. Only one variant can be enabled at a
time.

AXI Streaming
XDMA Streaming mode can be used in applications where the nature of the data traffic is
streaming with source and destination IDs instead of a specific memory address location, such
as network accelerators, network quality of service managers, or firewalls.

AXI Memory Mapped


XDMA Memory mapped mode can be used in applications where the nature of the data traffic is
addressable memory, such as moving data between a host and a card as an acceleration
platform.

XDMA mode can be used in conjunction with AXI Bridge mode. For more details on AXI Bridge mode,
see the AXI Bridge Subsystem.

CPM4 Common Features

Supports 64, 128, 256, and 512-bit data path.


Supports x1, x2, x4, x8, or x16 link widths.
Supports Gen1, Gen2, Gen3, and Gen4 link speeds.

✎ Note: x16 Gen4 configuration is not available in the data path from CPM directly to PL. This is only
used with the CPM through AXI4 to NoC to PL datapath.
The IP configuration GUI shows the selectable link width options: x16, x8, and x4.
The PCIe specification requires devices to negotiate link width with the attached link partner during
link training. The IP is capable of training down to narrower link widths than the width set in the IP
configuration at design time. For designs intending to use narrower x2 or x1 link widths, configure the
IP as x4 and connect only the bottom lane(s).

QDMA Features

2048 queue sets
2048 H2C descriptor rings.
2048 C2H descriptor rings.
2048 C2H Completion (CMPT) rings.
Supports both the AXI4 Memory Mapped and AXI4-Stream interfaces per queue (AXI4-Stream is
not available when CPM4 is configured for 16 GT/s data rate with x16 lane width).
Supports Polling Mode (Status Descriptor Write Back) and Interrupt Mode.
Interrupts
2048 MSI-X vectors.
Up to 32 MSI-X vectors per PF, and 8 MSI-X vectors per VF.
Interrupt aggregation.
C2H Stream interrupt moderation.
C2H Stream Completion queue entry coalescence.
Descriptor and DMA customization through user logic
Allows custom descriptor format.
Traffic Management.
Supports SR-IOV with up to 4 Physical Functions (PF) and 252 Virtual Functions (VF)
Thin hypervisor model.
QID virtualization.
Allows only privileged/Physical functions to program contexts and registers.
Function level reset (FLR) support.
Mailbox.
Rich programmability on a per queue basis, such as AXI4 Memory Mapped versus AXI4-Stream
interfaces.

AXI Bridge Features

AXI Bridge functional mode features are supported when the AXI4 slave bridge is enabled in the
XDMA or QDMA or in standalone Bridge use mode.

Supports Multi-Vector Message Signaled Interrupts (MSI), MSI-X interrupts, and Legacy
interrupts.
AXI4 Slave access to PCIe address space.
PCIe access to AXI4 Master.
Tracks and manages Transaction Layer Packets (TLPs) completion processing.
Detects and indicates error conditions with interrupts in Root Port mode.
Supports six PCIe 32-bit or three 64-bit PCIe Base Address Registers (BARs) as an Endpoint.
Supports up to two PCIe 32-bit or a single PCIe 64-bit BAR as Root Port.

XDMA Features

64-bit source, destination, and descriptor addresses.


Up to four host-to-card (H2C/Read) data channels.
Up to four card-to-host (C2H/Write) data channels.
Selectable user interface.
Single AXI4 (MM) user interface.
AXI4-Stream user interface (each channel has its own AXI4-Stream interface; AXI4-Stream
is not available when CPM4 is configured for 16 GT/s data rate with x16 lane width).
AXI4 Bridge Master interface allows for PCIe traffic to bypass the DMA engine.
AXI Slave interface allows access to DMA status registers.
Scatter Gather descriptor list supporting unlimited list size.
256 MB max transfer size per descriptor.
Legacy, MSI, and MSI-X interrupts.
Block fetches of contiguous descriptors.
Poll Mode.
Descriptor Bypass interface.
Arbitrary source and destination address.
Parity check or Propagate Parity on DMA AXI interface.

Standards

The AMD Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express adheres to the following
standards:

AMBA AXI4-Stream Protocol Specification (ARM IHI 0051A)


PCI Express Base Specification v4.0 Version 1.0 and Errata updates
PCI Local Bus Specification
PCI-SIG® Single Root I/O Virtualization and Sharing (SR-IOV) Specification

For details, see PCI-SIG Specifications (https://www.pcisig.com/specifications).

Minimum Device Requirements

Table: CPM4 Controller with QDMA, Bridge, or XDMA Hard IP Subsystem Maximum
Configurations (Versal Prime, Versal AI Core, Versal AI Edge)

Speed Grade                -1         -1         -2         -2         -2         -3
Voltage Grade              L (0.70V)  M (0.80V)  L (0.70V)  M (0.80V)  H (0.88V)  H (0.88V)
Gen1 (2.5 GT/s per lane)   x16        x16        x16        x16        x16        x16
Gen2 (5 GT/s per lane)     x16        x16        x16        x16        x16        x16
Gen3 (8 GT/s per lane)     x16        x16        x16        x16        x16        x16
Gen4 (16 GT/s per lane)    x8         x8         x8         x8         x16 (1)    x16 (1)

1. Gen4x16 does not support AXI4 interfaces directly between CPM4 and the programmable
logic. This is supported only through NoC.

Licensing and Ordering

This AMD LogiCORE™ IP module is provided at no additional cost with the AMD Vivado™ Design
Suite under the terms of the End User License.
Information about other AMD LogiCORE™ IP modules is available at the Intellectual Property page.
For information about pricing and availability of other AMD LogiCORE IP modules and tools, contact
your local sales representative.

Introduction to the CPM5

The integrated block for PCIe® Rev. 5.0 with DMA and CCIX Rev. 1.0 (CPM5) is shown in the
following figure.

Figure: CPM5 Sub-Block for PCIe Function (CPM5 PCIE)


CPM Components
The CPM includes multiple IP cores:

Controllers for PCIe


The CPM contains two instances of the AMD controller for PCIe: PCIE Controller 0 and PCIE
Controller 1. Both controllers can have CCIX capabilities and they are capable of acting as an
AXI bridge and a DMA master. The controllers interface with the GTs through the XPIPE
interface.

Coherent Mesh Network
The CPM has a Coherent Mesh Network (CMN) (not shown) that forms the cache coherent
interconnect block in the CPM that is based on the Arm® CMN600 IP. There are two instances of
L2 cache and CHI PL Interface (CPI) blocks in the CPM (also not shown).

DMA / AXI Bridge


CPM Controller 0 and Controller 1 both have a hardened DMA/AXI Bridge core. The CPM has a
Queue DMA Subsystem for PCIe (QDMA) for data transfer using direct memory access (DMA). The
DMA cores are used for data transfer from the programmable logic (PL) to the host and from
the host to the PL. The DMA cores can also transfer data between the host and the network on chip
(NoC), which provides high bandwidth to other NoC ports, including the available DDR
memory controllers (DDRMC). The CPM has an AXI Bridge Subsystem for PCIe (AXI Bridge) IP
for AXI-to-host communication.

‼ Important: XDMA is not supported for CPM5.


The CPM includes a clock/reset block that houses a phase-locked loop (PLL) and clock dividers. The
CPM also includes the system-on-a-chip (SoC) debug component for transaction-level debug. Several
APB and AXI interfaces are used between blocks in the CPM for configuration.

DMA Data Transfers


DMA data transfers can be initiated from both Controller 0 and Controller 1. There are some
limitations based on which DMA controller is used.

PCIE Controller 0
Data transfer width can be x16, x8, x4, x2 or x1.
AXI4 data can only be transferred through NoC. From NoC, the data can be steered to
DDR or to the programmable logic.
PCIE Controller 1
Data transfer width can be x8, x4, x2 or x1 (Not x16).
AXI4 data can be transferred through NoC or directly to PL logic. This is possible by setting
the host profile programming. See Host Profile.

DMA transfers can be categorized into two different datapaths.

Data path from CPM to NoC to PL


All AXI Memory Mapped signals are connected from the DMA to the AXI interconnect. These
signals are then routed to the Non-Coherent interconnect in the CPM block. They then connect
to the PS interconnect and the NoC. From the NoC, the signal can be directed to any block
(DDR or block RAM) based on the user design. The figure below shows the datapath to NoC in
red.
Controller 1 QDMA can transfer AXI4 signals directly to the PL or to the NoC and then to PL
based on host profile programming.

Data path from CPM directly to PL


All AXI4-Stream signals and other side band signals, like clock and reset, are routed directly to
the PL. The figure below shows the data path to the PL in green.


Figure: Controller 0 QDMA Data Paths

Figure: Controller 1 QDMA Data Paths


Use Modes

There are several use modes for QDMA functionality in the CPM. You can select one of two
options for data transport from host to programmable logic (PL), or PL to host: QDMA or AXI Bridge.
DMA transfers can be initiated in PCIE Controller 0 or in PCIE Controller 1.
The following illustration shows the CPM5 PCIE Controller 0 selection; the same options apply to
Controller 1 QDMA. The CPM to PL option is available only for Controller 1.
To enable QDMA transfers, customize the Control, Interfaces and Processing System (CIPS) IP core
as follows:

1. In the CPM5 Basic Configuration page, set the PCIE Controller 0 Mode to DMA.

2. Set the lane width value.

3. In the CPM5 PCIE Controller 0 Configuration page, set the PCIe Functional Mode for the desired
DMA transfer mode:
QDMA
AXI Bridge

The following sections explain how to further configure and use these different functional modes for
your application.

QDMA Subsystem

QDMA mode enables the use of PCIE Controller 0 QDMA or PCIE Controller 1 QDMA. QDMA mode
provides two connectivity variants: AXI Streaming and AXI Memory Mapped. Both variants can be
enabled simultaneously.

AXI Streaming
QDMA Streaming mode can be used in applications where the nature of the data traffic is
streaming with source and destination IDs instead of a specific memory address location, such
as network accelerators, network quality of service managers, or firewalls.

AXI Memory Mapped


QDMA Memory Mapped mode can be used in applications where the nature of the data traffic is
addressable memory, such as moving data between a host and a card, such as an acceleration
platform.

The main difference between XDMA mode and QDMA mode is that while XDMA mode supports up to
4 independent data streams, QDMA mode can support up to 2048 independent data streams. Based
on this strength, QDMA mode is typically used for applications that require many queues or data
streams that need to be virtually independent of each other. QDMA mode is the only DMA mode that
can support multiple functions, either physical functions or single root I/O virtualization (SR-IOV)
virtual functions.
QDMA mode can be used with the AXI Bridge mode. More details on AXI Bridge mode are described
in the next section.

AXI Bridge Subsystem

AXI Bridge mode enables you to interface the CPM5 PCIE Controller 0 or CPM5 PCIE Controller 1
with an AXI4 domain. This use mode connects directly to the NoC or to PL logic (based on Controller
0 or Controller 1 selection) that allows communication with other peripherals within the Processing
System (PS) and in the Programmable Logic (PL).

Figure: Controller 0 Bridge Data Paths


Figure: Controller 1 Bridge Data Paths

AXI Bridge mode is typically used for light traffic data paths such as write to or read from Control and
Status registers. AXI Bridge mode is also the only mode that can be configured for Root Port
application with AXI4 interface used to interface with a processor, typically the PS.
AXI bridge functionality is available in the following two modes:

AXI Bridge with QDMA


Standalone AXI Bridge mode

AXI Bridge in QDMA Mode


To configure the IP core for AXI Bridge with QDMA, customize the core as follows:

1. In the Basic tab, set PCIe0 or PCIe1 Functional Mode to DMA.


2. Set one or both of the following options:

In the Basic tab, select the Enable Bridge Slave Mode checkbox. This option enables the Slave
AXI interface within the IP, which you can use to generate write or read transactions from an
AXI source peripheral to other PCIe devices.

In the PCIe BARs tab, select the BAR checkbox next to the AXI Bridge Master. This option
enables the Master AXI interface within the IP, which you can use to receive write or read
transactions from a PCIe source device to AXI peripherals.

Standalone AXI Bridge Mode


To configure the IP core in Standalone AXI Bridge mode, customize the core as follows:

1. CPM5 Basic Configuration:


PCIE Controller 0 Mode: Select DMA
Select the DMA option for Controller 0 and/or Controller 1, based on the requirement
2. CPM5 PCIE Controller 0 Configuration:
Functional Mode: Select AXI Bridge

3. Set one or both of the following options:
In the PCIe BARs tab, select the BAR checkbox for each BAR that is needed. This option
enables the Master AXI interface within the IP, which enables write/read transactions from a
PCIe source device to AXI peripherals.
If AXI Slave functionality is needed, select the Enable Bridge Slave Mode checkbox. This
option enables the Slave AXI interface within the IP, which you can use to generate write or
read transactions from an AXI source peripheral to other PCIe devices.

CPM5 Common Features

CPM5 has two PCIE controllers, and each controller has a hardened QDMA/AXI Bridge. Each
QDMA/AXI Bridge supports the following features:

Supports only 512-bit data path.


Supports x1, x2, x4, x8, or x16 link widths.
Supports Gen1, Gen2, Gen3, Gen4, and Gen5 link speeds.

QDMA Functional Mode

AXI4-Stream Interfaces
AXI4 Memory Mapped Interfaces
64-bit PCIe addresses
48-bit AXI MM addresses
10-bit tag support with maximum outstanding 512 PCIe tags (requester and completer)
MSI-X Interrupt type supported
Descriptor input interface
Descriptor output interface
Root Port support (in AXI Bridge Mode)
Endpoint support
4096 Descriptor rings
4096 CMPT rings
Programmable ring sizes for descriptor and CMPT rings
Per queue PASID
4096 functions (Up to 4 physical functions (PFs) and 240 virtual functions (VFs))*
Function level reset
Interrupt coalescing
A total of 8192 MSI-X vectors (a host-driver allocation sketch follows the note below)

✎ Note: pcie_qdma_mailbox IP support is limited to 4 PFs and 240 VFs.
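
For reference, a host physical-function driver typically allocates its MSI-X vectors through the generic Linux PCI API before wiring them to queue interrupts. The fragment below is a hedged, generic sketch of that allocation step only; it is not taken from the AMD QDMA driver, and the vector count, driver name string, and handler are placeholders.

/* Hedged sketch: allocate MSI-X vectors for a PF inside a Linux PCI driver
 * probe routine. Vector count and handler are illustrative only.
 */
#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t example_irq_handler(int irq, void *data)
{
        /* acknowledge/process completions for the owning queue here */
        return IRQ_HANDLED;
}

static int example_setup_msix(struct pci_dev *pdev)
{
        int nvec, i, err;

        /* Request up to 32 vectors; the core returns how many were granted. */
        nvec = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSIX);
        if (nvec < 0)
                return nvec;

        for (i = 0; i < nvec; i++) {
                err = request_irq(pci_irq_vector(pdev, i), example_irq_handler,
                                  0, "example-qdma", pdev);
                if (err)
                        return err;
        }
        return 0;
}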


AXI Bridge Functional Mode

AXI Bridge functional mode features are supported when the AXI4 slave bridge is enabled in QDMA
use mode or standalone AXI Bridge.

Supports Multi-Vector Message Signaled Interrupts (MSI), MSI-X interrupts, and Legacy
interrupts.
AXI4 Slave access to PCIe address space.
PCIe access to AXI4 Master.
Tracks and manages Transaction Layer Packets (TLPs) completion processing.
Detects and indicates error conditions with interrupts in Root Port mode.
Supports six PCIe 32-bit or three 64-bit PCIe Base Address Registers (BARs) as Endpoint.
Supports up to two PCIe 32-bit or a single PCIe 64-bit BAR as Root Port.

Standards

The CPM5 QDMA and Bridge Mode for PCI Express adheres to the following standards:

AMBA AXI4-Stream Protocol Specification (ARM IHI 0051A)


PCI Express Base Specification 5.0 (final) and Errata updates, available at PCI-SIG
Specifications (https://www.pcisig.com/specifications)

Minimum Device Requirements

Table: CPM5 Controller with QDMA or Bridge Hard IP Subsystem Maximum Configurations
(Versal Premium, Versal HBM, Versal Prime, Versal AI Core, Versal AI Edge)

Speed Grade                 -1          -1          -2          -2          -2          -2          -2              -3
Voltage Grade               L (0.70V)   M (0.80V)   L (0.70V)   L (0.70V)   M (0.80V)   M (0.80V)   H (0.88V)       H (0.88V)
Overdrive (1)               Don't Care  Don't Care  No          Yes         No          Yes         Not Applicable  Not Applicable
Speed File                  -           -           -2LP        -2LHP       -2MP        -2MHP       -               -
Gen1 (2.5 GT/s per lane)    x16         x16         x16         x16         x16         x16         x16             x16
Gen2 (5 GT/s per lane)      x16         x16         x16         x16         x16         x16         x16             x16
Gen3 (8 GT/s per lane)      x16         x16         x16         x16         x16         x16         x16             x16
Gen4 (16 GT/s per lane)     x8          x8          x8          x16         x8          x16         x16             x16
Gen5 (32 GT/s per lane) (2) N/A         N/A         N/A         x8          N/A         x8          x8              x8

1. For information on requirements for supplying Overdrive voltages, see Versal Premium
Series Data Sheet: DC and AC Switching Characteristics (DS959) and Power Design
Manager User Guide (UG1556).
2. CPM5 PCIe Gen5 support is available only in Versal Premium, Versal HBM, and Versal AI
Core series.

Licensing and Ordering

This AMD LogiCORE™ IP module is provided at no additional cost with the AMD Vivado™ Design
Suite under the terms of the End User License.
Information about other AMD LogiCORE™ IP modules is available at the Intellectual Property page.
For information about pricing and availability of other AMD LogiCORE IP modules and tools, contact
your local sales representative.

Designing with the Core


Designing with the Subsystem for CPM4

Clocking

✎ Note: USER_CLK (user_clk) in this section refers to pcie(n)_user_clk, which is also
described in the Clock and Reset Interface section.
The CPM programmable logic integrated block for PCIe (PL PCIE) requires a 100, 125, or 250 MHz
reference clock input. The following figure shows the clocking architecture. The user_clk clock is
available for use in the fabric logic. The user_clk clock can be used as the system clock.

Figure: USER_CLK Clocking Architecture

All user interface signals are timed with respect to the same clock (user_clk), which can have a
frequency of 62.5, 125, or 250 MHz depending on the configured link speed and width. The user_clk
should be used to interface with the CPM. With the user logic, any available clocks can be used.
Each link partner device shares the same reference clock source. The following figures show a
system using a 100 MHz reference clock. Even if the device is part of an embedded system, if the

system uses commercial PCI Express® root complexes or switches along with typical motherboard
clocking schemes, synchronous clocking should be used.
✎ Note: The following figures are high-level representations of the board layout. Ensure that
coupling, termination, and details are correct when laying out a board.

Figure: Embedded System Using 100 MHz Reference Clock

Figure: Open System Add-In Card Using 100 MHz Reference Clock

Resets

The fundamental resets for the CPM PCIe® controllers and associated GTs are perst0n and
perst1n. These resets should be provided as an input to the FPGA for both endpoint and root port
modes using the pins identified in GT Selection and Pin Planning for CPM4 and GT Selection and Pin
Planning for CPM5.

PERST# input to the IP is routed from one of the allowed PS/PMC MIO pins through dedicated
logic into the CPM.
Selection of the PS/PMC MIO pins is made in the CIPS/PS Wizard; it is not made in the CPM
sub-core but in the PS/PMC sub-core.
Users of the configurable example designs (CEDs) are advised that CEDs have PERST#
assignment(s) by default, but it is a best practice to visit the IP configuration GUI for the CIPS/PS
Wizard to ensure that the pin assignment matches the PERST# connectivity in the board schematic.

There is a power-on reset for the CPM driven by the platform management controller (PMC). When both
the PS reset and the power-on reset from the PMC are released, the CPM PCIE controllers and the
associated GTs come out of reset. After the reset is released, the core attempts to link train and resumes
normal operation.
There is a pcie(n)_user_reset given from the CPM PCIE controller to the user design present in
the fabric logic. Whenever CPM PCIE controller goes through a reset, or there is a link down, the
CPM PCIE controller issues a pcie(n)_user_reset to the user design in the programmable logic
(PL) region. After the PCIe link is up, pcie(n)_user_reset is released for the user design to come
out of reset.
To reset the DMA block, drive the active-Low dma_soft_resetn pin Low; by default it should be tied
High. This does not reset the entire CPM PCIE controller; it resets only the DMA
(XDMA/QDMA/AXI Bridge) block.

Reset in a Root Port Mode

Following are the two ways to activate fundamental reset in a Root Port mode:

Fundamental reset as output


Fundamental reset as input

Fundamental reset as output


Root Port drives a GPIO pin out to reset the End Point. Also, Root Port CPM is reset via
RST_PCIE_CORE0 and RST_PCIE_CORE1 registers. This is an asynchronous reset.

Figure: Fundamental Reset as Output

PCIe controllers can be reset manually by asserting and deasserting the following registers (a user-space access sketch follows the table):

Table: Reset PCIe Core Registers

Name             Address        Access Type   Value        Description
RST_PCIE_CORE0   0x0000000308   RW            0x00000001   Reset for PCIe controller 0.
RST_PCIE_CORE1   0x000000030C   RW            0x00000001   Reset for PCIe controller 1.
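
The following user-space fragment illustrates one way such a register toggle could be issued from software running on the PS, using /dev/mem. The base address macro is a placeholder: it assumes the 0x308/0x30C values in the table are offsets within a reset-control register block, so substitute the absolute addresses from the Versal register reference. Production systems normally perform this reset through platform firmware rather than raw /dev/mem accesses.

/* Hedged sketch: toggle RST_PCIE_CORE0 from user space via /dev/mem.
 * RST_BLOCK_BASE is a placeholder; 0x308 is the offset from the table above.
 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define RST_BLOCK_BASE  0xFF5E0000UL   /* placeholder: base of the block containing RST_PCIE_CORE0/1 */
#define RST_PCIE_CORE0  0x308          /* offset taken from the table above */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return 1;

    volatile uint32_t *rst = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, RST_BLOCK_BASE);
    if (rst == MAP_FAILED) { close(fd); return 1; }

    rst[RST_PCIE_CORE0 / 4] = 0x1;     /* assert reset for PCIe controller 0 */
    usleep(10);                        /* hold time is illustrative only */
    rst[RST_PCIE_CORE0 / 4] = 0x0;     /* deassert reset */

    munmap((void *)rst, 0x1000);
    close(fd);
    return 0;
}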

Fundamental reset as input


Fundamental reset as input via MIO pin. Root Port can drive reset to End Point via any GPIO pin
and the same reset is looped back as a fundamental reset input via MIO pin.

Figure: Fundamental Reset as Input

To set an MIO pin as input, you need to configure that in CIPS GUI as shown below:

Figure: PCIe Reset Configuration


The following pins are allowed as MIO input pins:

Dedicated pins for controller 0: PS MIO 18, PMC MIO 24, or PMC MIO 38.
Dedicated pins for controller 1: PS MIO 19, PMC MIO 25, or PMC MIO 39.

Figure: MIO Pin Selection

Designing with the Subsystem for CPM5

Clocking

DMA Clock
QDMA and AXI Bridge run on a clock that is provided by the user. This is a change from CPM4,
where the IP provides the clock. You must provide a clock, dma<n>_intrfc_clk, that is used by the
IP. All the input and output ports are driven or loaded using this clock. Because this is an independent
clock provided by the user, there are some restrictions on the clock frequency based on the IP
configuration, as listed below:

Table: Clock Frequency

Configuration Options   Frequency
Gen3x16                 250 MHz
Gen4x8                  250 MHz
Gen4x16                 433 MHz (1)
Gen5x8                  433 MHz (1)

1. For 433 MHz frequency, you need to have a -3HP device.

The input clock frequency (dma<n>_intrfc_clk and cpm_pl_axi<n>_clk) for Gen3x16 and Gen4x8
configurations is 250 MHz. For Gen4x16 and Gen5x8 configurations, the maximum input clock
frequency allowed is 433 MHz for a -3HP device. For other device speed grades, refer to the
corresponding device data sheet for the maximum frequency applicable to those devices.
For the QDMA1 AXI-MM interface, there are two more clock inputs that you must provide:
cpm_pl_axi0_clk and cpm_pl_axi1_clk.

PCIe Ref Clock


Each link partner device shares the same reference clock source. The following figures show a
system using a 100 MHz reference clock. Even if the device is part of an embedded system, if the
system uses commercial PCI Express® root complexes or switches along with typical motherboard
clocking schemes, synchronous clocking should be used.
✎ Note: The following figures are high-level representations of the board layout. Ensure that
coupling, termination, and details are correct when laying out a board.

Figure: Embedded System Using 100 MHz Reference Clock


Figure: Open System Add-In Card Using 100 MHz Reference Clock

Resets

The fundamental resets for the CPM PCIe® controllers and associated GTs are perst0n and
perst1n. The resets are driven by the I/O inside the PS.

PERST# input to the IP is routed from one of the allowed PS/PMC MIO pins through dedicated
logic into the CPM.
Selection of the PS/PMC MIO pins is made in the CIPS/PS Wizard; it is not made in the CPM
sub-core but in the PS/PMC sub-core.
Users of the configurable example designs (CEDs) are advised that CEDs have PERST#
assignment(s) by default, but it is a best practice to visit the IP configuration GUI for the CIPS/PS
Wizard to ensure that the pin assignment matches the PERST# connectivity in the board schematic.

In addition, there is a power-on-reset for CPM driven by the platform management controller (PMC).
When both PS and the power-on reset from PMC are released, CPM PCIE controllers and the
associated GTs come out of reset.
After the reset is released, the core attempts to link train and resumes normal operation.
Signals dma<n>_intrfc_clk and dma<n>_intrfc_resetn are both input to CPM5 IPs. CPM5
interface logic is cleared with dma<n>_intrfc_resetn signals that the user provides. These reset
signals are active-Low and should be held in the reset state (1'b0) until the input clock
dma<n>_intrfc_clk is stable. Once the clock is stable, you can deassert dma<n>_intrfc_resetn
signal.

The dma0_axi_aresetn signal is a reset that is given to the user logic. This signal is synchronized to the
input clock signal dma<n>_intrfc_clk. The user is responsible for using the dma<n>_axi_aresetn
signal to reset interface logic.

Reset in a Root Port Mode

To activate fundamental reset in a Root Port mode, see Reset in a Root Port Mode.

Data Bandwidth and Performance Tuning

The CPM offers a few different main data interfaces for you to use depending on the CPM subsystem
functional mode being used. The following table shows the available data interfaces to be used as the
primary data transfer interface for each functional mode.

Table: Available Data Interface for Each CPM Subsystem Functional Mode

Functional Mode   CPM_PCIE_NOC_0/1   NOC_CPM_PCIE_0   CPM_PL_AXI_0/1   AXI4 ST C2H/H2C
CPM4 QDMA         Yes (both)         No               N/A              Yes
CPM5 QDMA         Yes (both)         No               Yes (both)       Yes
CPM4 AXI Bridge   Yes (only one)     Yes              N/A              No
CPM5 AXI Bridge   Yes (only one)     Yes              Yes (only one)   No
CPM4 XDMA         Yes                No               N/A              Yes

1. CPM_PCIE_NOC_0/1: These interfaces are for AXI4-MM traffic which is mastered from
within the CPM and exits to the NoC towards DDRMC/PL connections. Examples of such
masters in the CPM include the CPM integrated DMA and the CPM integrated bridge master.
2. NOC_CPM_PCIE_0: This interface is for AXI4-MM traffic which is mastered from an internal
PS or PL connections and exits from the NoC towards CPM. Examples of such slaves in the
CPM include the CPM integrated bridge slave.
3. CPM_PL_AXI_0/1: These interfaces are for AXI4-MM traffic which is mastered from within the
CPM and exits to the PL directly. Examples of such masters in the CPM include the CPM
integrated DMA and the CPM integrated bridge master. These interfaces are only available to
CPM5 controller and DMA/Bridge instance 1 (not available to instance 0).
4. AXI4 ST C2H/H2C: This interface is for inbound and outbound AXI4-ST traffic for the CPM
integrated DMA.

✎ Note: Certain data interfaces are unavailable based on the selected feature set for that particular
functional mode. For more details on these restrictions, refer to the port description in the associated

CPM subsystems section.
✎ Note: Some data interfaces are shared with more than one feature set. Therefore, even though a
particular mode does not use certain data interfaces, those interfaces can still be enabled and
visible at the CPM boundary for other uses.
The raw capacity for each AXI4 data interface is determined by multiplying the data width and the
clock frequency. The net bandwidth depends on multiple factors, including but not limited to the
packet overhead for given packet types. Achievable bandwidth might vary.

CPM_PCIE_NOC and NOC_CPM_PCIE: Fixed 128-bit wide at CPM_TOPSW_CLK frequency.


The maximum frequency is dependent on the device speed grade.
CPM_PL_AXI: 64/128/256/512-bit data width is supported at cpm_pl_axi0_clk or
cpm_pl_axi1_clk pin frequency.
AXI4 ST C2H/H2C: 64/128/256/512 bit data width is supported at dma_intrfc_clk pin
frequency.
✎ Note: NoC clock frequency must be greater than the CPM_TOPSW_CLK clock frequency.
The raw capacity for the PCIe link is determined by multiplying the number of PCIe lanes
(x1/x2/x4/x8/x16) and their link speed (Gen1/Gen2/Gen3/Gen4/Gen5). The overhead of the link
comes from the link layer encoding and Ordered Sets, CRC fields, packet framing, TLP headers and
prefixes, and data bus alignment.
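As a worked example of both calculations, the short program below computes the raw capacity of a 512-bit AXI4 interface at 250 MHz and of a Gen4 x8 PCIe link, applying only the 128b/130b line encoding used at Gen3 and above. All other overheads listed above are ignored, so these figures are upper bounds rather than achievable throughput.

/* Raw (upper-bound) capacity estimates; protocol overheads other than
 * line encoding are ignored.
 */
#include <stdio.h>

int main(void)
{
    /* AXI4 interface: data width (bits) x clock frequency */
    double axi_bits   = 512.0;
    double axi_clk_hz = 250e6;
    double axi_gbytes = axi_bits * axi_clk_hz / 8.0 / 1e9;      /* ~16 GB/s */

    /* PCIe link: lanes x per-lane rate, scaled by 128b/130b encoding (Gen3+) */
    double lanes       = 8.0;
    double gtps        = 16.0;                                  /* Gen4: 16 GT/s per lane */
    double pcie_gbytes = lanes * gtps * (128.0 / 130.0) / 8.0;  /* ~15.75 GB/s */

    printf("AXI4 raw capacity: %.2f GB/s\n", axi_gbytes);
    printf("PCIe raw capacity: %.2f GB/s\n", pcie_gbytes);
    return 0;
}
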
In the event that a particular PCIe link configuration has a higher bandwidth than the available data
bus capacity of the AXI4 interface, more than one AXI4 interface must be used to sustain the
maximum link throughput. This can be achieved in various ways. Here are some examples:

Load balance data transfer by allocating half of the enabled DMA queues or DMA channels to
interface #0, and the other half to interface #1.
Share the available PCIe link bandwidth among different types of transfers. DMA streaming uses
AXI4 ST C2H/H2C interface while DMA Memory Mapped uses CPM_PCIE_NOC or
CPM_PL_AXI interfaces.

AXI Bridge functional mode alone might not be able to sustain full PCIe link bandwidth in some link
and device configurations due to the availability of only one data interface per bridge instance.
Therefore, the AXI Bridge functional mode is restricted to control and status accesses only; it is not
intended to be used as a primary data mover. However, it can be paired with a DMA functional mode
to make use of the remaining bandwidth. This functional mode has a variety of applications, including
but not limited to root complex (RC) memory bridging and add-in-card peer-to-peer (P2P) operation.
P2P use cases are complex with respect to the achievable bandwidth depending on many factors
including but not limited to CPM DMA/Bridge bandwidth capabilities, whether DMA or Bridge is active
depending on the initiator of the P2P operation, the peer capability, and the capability of any
intervening switch component or root complex integrated switch.
You must also analyze the potential of head of line blocking or the request and response buffer size
for each interface and ensure that data transfer initiated within a system does not cause cyclic
dependencies between interfaces or different transfers. PCIe and AXI specifications have data types,
IDs, and request/response ordering requirements and CPM upholds those requirements. For more
details on CPM_PCIe_NOC and NOC_CPM_PCIe interfaces, refer to Versal Adaptive SoC
Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide

(PG313). CPM_PL_AXI_0/1 and AXI4 ST C2H/H2C interfaces are direct interfaces to the user PL
region and give you the flexibility to attach your own data buffer and interconnect as required.

Tandem Configuration
Overview
PCI Express® is a plug-and-play protocol, meaning that at power-up the PCIe® host enumerates the
system. This process consists of the host enumerating PCIe devices and assigning them base
addresses. As such, PCIe interfaces must be ready when the host queries them or they do not get
assigned a base address. The PCI Express specification states that PERST# can deassert 100 ms
after the power good of the systems has occurred, and a PCI Express port must be ready to link train
20 ms after PERST# has deasserted. This is commonly referred to as the 100 ms boot time
requirement, even though 120 ms is the fundamental goal.
AMD devices can meet this 120 ms link training requirement by using Tandem Configuration, a
solution that splits the programming image into two stages. The first stage quickly configures the PCIe
endpoint(s) so the endpoint is ready for link training within 120 ms. The second stage then configures
the rest of the device. Two variants are supported:

Tandem PROM
Loads both stages from a single programming image from a standard primary boot interface.

Tandem PCIe
Loads the first stage from a primary boot interface, then the second stage is delivered via one of
the CPM PCIE controllers.

Within the AMD Versal™ device, the CPM consists of two PCIE controllers, DMA features, CCIX features,
and network-on-chip (NoC) integration. This enables direct access to the two high-performance,
independently customizable PCIE controllers. You can select to have one or both of these controllers
enumerate within 120 ms using Tandem Configuration.
✎ Note: While Tandem Configuration is designed to meet the 120 ms link training goal, additional
considerations are required. Configuration memory device selection is a key factor, as some options
(such as OSPI or QSPI) are much faster than others (such as SD or eMMC). Secure boot features
such as encryption and authentication increase the stage 1 loading time, putting the goal of 120 ms at
risk. Use the Versal Adaptive SoC - Boot Time Estimator to calculate the time expected to complete
the stage 1 load.
While the term Tandem Configuration is carried forward from prior iterations of this technology applied
in 7 series, AMD UltraScale™ and AMD UltraScale+™ silicon, the solution is fundamentally different
in an AMD Versal device given how the PCIE controllers are built and configured. No programmable
logic is needed to activate an endpoint, so only CPM, XPIPE or CPIPE, NoC, and GTY or GTYP
resources are included along with the PMC in the first stage.
‼ Important: Only production AMD Versal devices are supported for Tandem Configuration for certain
devices. Do NOT use this solution on engineering silicon (ES1) for VC1902, VC1802 or VM1802
devices.


Supported Devices
The Integrated Block for PCIe with DMA and CCIX and AMD Vivado™ tool flow support
implementations targeting AMD reference boards and specific part/package combinations.
Tandem Configuration is available as a production solution for monolithic Versal devices, but only
those with CPM resources. Tandem Configuration supports the devices described in the following
table:

Table: Tandem Configuration Supported Devices

Series            Part      CPM Type   Support Level
Versal AI Core    VC1502    CPM4       Production
Versal AI Core    VC1702    CPM4       Production
Versal AI Core    VC1802    CPM4       Production
Versal AI Core    VC1902    CPM4       Production
Versal AI Core    VC2602    CPM5       Production
Versal AI Core    VC2802    CPM5       Production
Versal AI Edge    VE1752    CPM4       Production
Versal AI Edge    VE2002    None       Unsupported
Versal AI Edge    VE2102    None       Unsupported
Versal AI Edge    VE2202    None       Unsupported
Versal AI Edge    VE2302    None       Unsupported
Versal AI Edge    VE2602    CPM5       Production
Versal AI Edge    VE2802    CPM5       Production
Versal Prime      VM1102    None       Unsupported
Versal Prime      VM1302    CPM4       Production
Versal Prime      VM1402    CPM4       Production
Versal Prime      VM1502    CPM4       Production
Versal Prime      VM1802    CPM4       Production
Versal Prime      VM2202    CPM5       Production
Versal Prime      VM2302    None       Unsupported
Versal Prime      VM2502    CPM5       Production
Versal Prime      VM2902    None       Unsupported
Versal Premium    VP1002    CPM4       Production
Versal Premium    VP1052    CPM4       Production
Versal Premium    VP1102    None       Unsupported
Versal Premium    VP1202    CPM5       Production
Versal Premium    VP1402    None       Unsupported
Versal Premium    VP1502    CPM5       Production
Versal Premium    VP1552    CPM5       Production
Versal Premium    VP1702    CPM5       Production
Versal Premium    VP1802    CPM5       Production
Versal Premium    VP2502    CPM5       Production
Versal Premium    VP2802    CPM5       Production
Versal HBM        VH1522    CPM5       Production
Versal HBM        VH1542    CPM5       Production
Versal HBM        VH1582    CPM5       Production
Versal HBM        VH1742    CPM5       Production
Versal HBM        VH1782    CPM5       Production

Device image generation is disabled by default for all ES silicon. Engineering Silicon devices might
not be tested through software and/or hardware and Tandem PDI generation is gated by a parameter.
Any device not listed in this table is either currently unsupported, or does not contain a CPM site
necessary to enable support.

Tandem + DFX
The Dynamic Function eXchange (DFX) feature, supported by much of the AMD silicon portfolio and
the Vivado Design Suite, allows for the reconfiguration of various modules within an active device. It
gives system architects the flexibility to switch a portion of the design in and out depending on the
system requirements, removing the need to multiplex multiple functions in a larger device, which
saves on part cost and power and improves system uptime. Taking advantage of the PCIe link with CPM
for delivery of reconfigurable partition bitstream data to the PMC allows for high throughput and

Displayed in the footer


Page 37 of 490
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347)
Displayed in the header
minimal design requirements and it is simplified by the provided software and drivers. Delivery of DFX
partial images is done in the same way as Tandem stage 2 images.
Tandem Configuration and Dynamic Function eXchange (DFX) are solutions for different phases of
the device operation. Tandem Configuration is only used at the initial power-up of the device or after a
full device reconfiguration request to bring up the device in stages. DFX, on the other hand, is used to
deliver the programming images that modify a portion of the programmable logic while the rest of the
device remains operational. As of Vivado 2024.1 release, all Versal Adaptive SoC Gen 1 devices,
including those using SSI technology, can take advantage of Tandem Configuration and DFX in the
same design.
Both features can be selected in a single design, allowing you to benefit from both sets of
advantages. Support is limited to all the production devices listed in the preceding table. Monolithic
device support was introduced in Vivado 2023.2, and support for devices using SSI technology was
introduced in Vivado 2024.1.
✎ Note: Care must be taken to keep any DFX Pblock away from the PS-PL boundary where soft
logic associated with the CPM is placed. Failure to do so might result in an unrouteable design.

Enable the Tandem Configuration Solution


An option in the CIPS customization GUI allows you to pick the option that suits your needs. In the
CPM options, a Tandem Configuration selection is enabled when a PCIe Endpoint is selected for
PCIe Controller 0 or Controller 1. The three available options are:

Tandem PROM
Tandem PCIe
None

Figure: Customizing the PCIe Controllers


Tandem PROM is the simpler mode for Tandem Configuration, where both stages reside in a single
programming image. If 120 ms enumeration is required, the selection of this option essentially comes
for free, as there is no overhead in design complexity or requirement for programmable logic to build.
The programming ordering simply starts with the CPM and other necessary elements before moving
on to the rest of the device.
Because Tandem PCIe uses the PCIe link to program the stage 2 portion of the design, the design
must include connectivity from the enabled CPM Master(s) to the PMC Slave. This should be
accomplished through the block design connectivity. When PCIE Controller 0 is set to DMA, this CPM
interface is set automatically, along with appropriate mapping of the slave into the CPM master address
space(s). This includes enabling the CPM to NoC 0 Interface by checking the appropriate box on the
CPM Basic Configuration customization page. The specific aperture within the PMC slave that must
be accessible from the host is the Slave Boot Interface (SBI), which is available at AMD Versal device
address 0x102100000.
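
As an illustration of this stage 2 delivery path, the fragment below sketches how a host application might stream the stage 2 PDI through a QDMA memory-mapped queue to the SBI aperture noted above. The character device name is a placeholder for whatever queue node the installed QDMA driver exposes, the chunk size is arbitrary, and the fixed-offset write behavior of the SBI aperture is an assumption; the MCAP VSEC path described in PG346 is an alternative that does not use the DMA.

/* Hedged sketch: push the Tandem PCIe stage 2 PDI to the SBI through a
 * QDMA AXI-MM queue. Device node name and chunk size are placeholders.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define SBI_ADDR   0x102100000ULL        /* Slave Boot Interface aperture (from the text above) */
#define CHUNK      (1024 * 1024)         /* illustrative transfer granularity */

int main(void)
{
    int pdi = open("design_1_wrapper_tandem2.pdi", O_RDONLY);
    int dma = open("/dev/qdma_mm_q0", O_WRONLY);   /* placeholder queue node */
    if (pdi < 0 || dma < 0) { perror("open"); return 1; }

    static uint8_t buf[CHUNK];
    ssize_t n;
    while ((n = read(pdi, buf, CHUNK)) > 0) {
        /* Each chunk targets the SBI aperture; the fixed offset assumes the
         * SBI behaves as a FIFO/keyhole aperture (assumption). */
        if (pwrite(dma, buf, (size_t)n, (off_t)SBI_ADDR) != n) {
            perror("pwrite");
            return 1;
        }
    }
    close(dma);
    close(pdi);
    return 0;
}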

Figure: CPM Master to PMC Slave Connection for Loading Tandem PCIe Stage 2 to SBI


Figure: Slave Boot Interface FIFO Assigned in CPM Address Map

To deliver stage 2 images using MCAP VSEC, see Versal Adaptive SoC CPM Mode for PCI Express
Product Guide (PG346).
✎ Note: If these interfaces are not used, tie the corresponding ready signals to 1:
dma0_mgmt_cpl_rdy, dma0_st_rx_msg_tready, and dma_tm_dsc_sts_rdy must be tied to 1 if not
used.
To deliver stage 2 images using PCIe DMA, the DMA BAR must be set to BAR0. The driver probes
BAR0 to find the DMA BAR. If BAR0 is routed to the PL, the transaction does not complete because
the PL is not yet configured.
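As a quick sanity check from the host, you can inspect the endpoint's BAR assignments with lspci. This is an illustrative example only; substitute your device's Bus:Device.Function for 01:00.0 and confirm that Region 0 (BAR0) is present as a memory BAR:

$> sudo lspci -s 01:00.0 -vv | grep -i region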

Figure: PCIe BARs


Confirmation that Vivado parameters and Tandem Configuration in general have been applied can be
seen in the log when write_device_image is run. Following is a snippet of the log for a Tandem
PROM run during the write_device_image step:
INFO: [Designutils 12-2358] Enabled Tandem boot bitstream.
Creating bitmap...
INFO: [Bitstream 40-812] Reading NPI Startup sequence definitions
INFO: [Bitstream 40-811] Reading NPI Shutdown sequence definitions
INFO: [Bitstream 40-810] Reading NPI Preconfig sequence definitions
Creating bitstream...
Tandem stage1 bitstream contains 1243712 bits.
Writing NPI partition ./Versal_CPM_Tandem_tandem1.rnpi...
Creating bitstream...
Tandem stage2 bitstream contains 15633728 bits.
Writing CDO partition ./Versal_CPM_Tandem_tandem2.rcdo...
Writing NPI partition ./Versal_CPM_Tandem_tandem2.rnpi...
Generating bif file Versal_CPM_Tandem_tandemPROM.bif for Tandem PROM.

The resulting run creates (in addition to the files mentioned above) a single .pdi image for this design
called Versal_CPM_Tandem.pdi.
When Tandem PCIe is enabled through CIPS IP customization, two .pdi files are generated:

<design>_tandem1.pdi
This file should be added to the device configuration flash.

<design>_tandem2.pdi
This file should be programmed into the device through the PCIe link once it becomes active.

The resulting report in the write_device_image log looks nearly identical, but the file name for the
.bif is slightly different:
INFO: [Designutils 12-2358] Enabled Tandem boot bitstream.
Creating bitmap...
INFO: [Bitstream 40-812] Reading NPI Startup sequence definitions
INFO: [Bitstream 40-811] Reading NPI Shutdown sequence definitions
INFO: [Bitstream 40-810] Reading NPI Preconfig sequence definitions
Creating bitstream...
Tandem stage1 bitstream contains 1243712 bits.
Writing NPI partition ./Versal_CPM_Tandem_tandem1.rnpi...
Creating bitstream...
Tandem stage2 bitstream contains 15633728 bits.
Writing CDO partition ./Versal_CPM_Tandem_tandem2.rcdo...
Writing NPI partition ./Versal_CPM_Tandem_tandem2.rnpi...
Generating bif file Versal_CPM_Tandem_tandem1.bif for Tandem stage-1.

In addition to the files mentioned above, the resulting run creates two .pdi images for this design
called design_1_wrapper_tandem1.pdi and design_1_wrapper_tandem2.pdi. The _tandem1 and
_tandem2 suffixes are automatically added to differentiate the stages.
‼ Important: Stage 1 and stage 2 bitstreams must remain paired. While this is trivial for Tandem
PROM because both stages are stored in a single PDI image, this is a critical consideration for
Tandem PCIe. If any part of the design is modified such that a full recompilation is triggered, both
stage 1 and stage 2 images must be updated. Always update both stages when any change is made.
✎ Note: Tandem PDI generation for new devices is gated until a device reaches production status. A
parameter is available to build images for pre-production or ES silicon; contact support for information
and to confirm that there are no issues with the new device. Without the parameter,
write_device_image is expected to fail with the following error:
ERROR: [Vivado 12-4165] Tandem Configuration bitstream generation is not supported for part
<device>.
In AMD UltraScale+ devices, the Field Updates solution enables you to build reconfigurable stage 2
regions, where you can not only pick a stage 2 image from a list of compatible images, but also
reconfigure that stage 2 area with another stage 2 image to provide dynamic field updates. In an AMD
Versal device, the solution is similar but not exactly the same. The first part (for the initial boot of a
device) can be supported in the future to allow you to lock a stage 1 image in a small local boot flash;
the second part (dynamic reconfiguration) requires a Tandem + DFX-based approach to allow for
dynamic reconfiguration of a subsection of the PL.
For test and debug purposes, the HD.TANDEM_BITSTREAMS property can be set on the implemented
design before .pdi file generation to split a single Tandem PROM .pdi file into separate
tandem1.pdi and tandem2.pdi files.

set_property HD.TANDEM_BITSTREAMS Separate [current_design]

Similarly, Tandem PROM or Tandem PCIe file generation can be disabled entirely by setting the
HD.TANDEM_BITSTREAMS property on the implemented design before .pdi file generation. The
following command can be used to do this.

set_property HD.TANDEM_BITSTREAMS NONE [current_design]


Deliver Programming Images to Silicon


The purpose of Tandem Configuration is to be able to program the PCIe endpoint(s) within 120 ms
before link training starts. The stage 1 file configures the CPM, GTs, and NoC connection required for
stage 1 operation. These blocks then become active and can interact with the host to perform PCIe
enumeration.
When using Tandem PROM, configuration of the remainder of the device continues to load from the
same programming image on the same primary boot device while the PCIe endpoint(s) are up and
running. There are no PERSIST requirements or similar restrictions, as there were with
UltraScale+ and prior FPGA architectures.
When using Tandem PCIe, the remainder of the device programming is done by delivering the stage 2
.pdi from the PCIe host as a secondary boot device. There are four data paths in the CPM through
which the tandem2.pdi file can be loaded from PCIe into the PMC Slave Boot Interface. The specific
path you select is determined based on the IP configuration desired for your specific application.
These are enumerated below.

QDMA MM Data Path


If the QDMA Memory Mapped data path is enabled, it can be used to download through PCIe
into the adaptive SoC slave boot interface at a maximum rate of 3.2 GB/s. For CPM4, this is
limited by the programming rate of the Slave Boot Interface. AMD provides sample QDMA
drivers and software (see Note below). For CPM4, this data path can only be used with PCIe
controller 0 because CPM controller 1 does not support hardened QDMA operation.

XDMA MM Data Path


If the XDMA Memory Mapped data path is enabled, it can be used to download through PCIe
into the adaptive SoC slave boot interface at a maximum rate of 3.2 GB/s. This is limited by the
programming rate of the adaptive SoC slave boot interface. AMD provides sample XDMA drivers
and software (see Note below). For CPM4, this data path can only be enabled with PCIe
controller 0 because CPM controller 1 does not support hardened XDMA operation.

AXI Master Bridge


If the AXI Master bridge data path is enabled, it can be used to download through PCIe into the
adaptive SoC slave boot interface at a rate of 700 MB/s for 64-byte transfers from the host.
The rate depends on the host's ability to generate PCIe transactions. AMD does not provide
sample drivers and software for this because either the XDMA or QDMA data paths are typically
enabled with this mode and allow for higher transfer rates. For CPM4, this data path can only be
enabled with PCIe controller 0 because CPM controller 1 does not support hardened AXI Master
Bridge operation.

Versal MCAP VSEC


Both controllers in CPM can enable the MCAP Vendor Specific Extended Capability. This
capability functions differently from UltraScale and UltraScale+ solutions. It can be used to
download through PCIe into the adaptive SoC slave boot interface at a maximum rate limited by
the host’s ability to generate 32-bit PCIe configuration transactions (typically lower than 1 MB/s).
This mode of configuration should only be used when the other three data paths are not
available. This includes PCIe Streaming and CCIX modes on either controller. For more
information, see Versal Adaptive SoC CPM Mode for PCI Express Product Guide (PG346).


✎ Note: AMD provides sample drivers and software to enable stage 2 programming. These drivers
can be found at https://github.com/Xilinx/dma_ip_drivers.

Tandem Configuration Performance


The initial programming image is stored in and retrieved from a primary boot device. For Tandem
PROM, this image contains both stages, and for Tandem PCIe, this image contains only stage 1. To
calculate configuration performance, use the stage 1 image size as reported in the
write_device_image log files.
When using Tandem PCIe, the stage 2 image is delivered over a secondary boot device. Depending
on the delivery path and PCIE controller options, configuration performance can be up to 3.2 GB/s
while configuring the programmable logic.

CPM QDMA-MM to SBI: Gen3x16+ maximum expected bandwidth 3.2 GB/s


CPM XDMA-MM to SBI: Gen3x16+ maximum expected bandwidth 3.2 GB/s
CPM AXI-Bridge to SBI: Maximum expected Bandwidth 700 MB/s
CPM MCAP VSEC to SBI: Maximum expected bandwidth 1 MB/s or slower
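As a rough, illustrative estimate only: using the stage 2 size from the example write_device_image log above (15,633,728 bits) and the 3.2 GB/s figure for the QDMA-MM or XDMA-MM path, the stage 2 load time can be approximated as follows (actual results depend on the host and system):

$> awk 'BEGIN { bytes = 15633728 / 8; printf "stage 2 ~ %.2f MB, ~%.2f ms at 3.2 GB/s\n", bytes / 1e6, bytes / 3.2e9 * 1e3 }'
stage 2 ~ 1.95 MB, ~0.61 ms at 3.2 GB/s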

For more information regarding Versal Configuration and Boot, please consult Versal Adaptive SoC
System Software Developers Guide (UG1304).

Design Operation
Though the CPM is a hardened integrated block, many features and options that can be selected
during CIPS configuration require implementation in programmable soft logic (PL). Any part of the
design that has been implemented in the PL is configured in stage 2. Design configurations that
require PL resources during PCIe enumeration should not be used with Tandem PROM or Tandem
PCIe. Specifically, the PCIe extended capability interface should not be enabled for Tandem modes
because these registers are addressed during enumeration and are not present in the stage 1 portion
of the design. Moreover, any other resource in the Versal device, such as the R5 or A72 processors in
the Scalar Engines, is programmed after the CPM and its PCI Express endpoint(s). While future
enhancements to the Tandem Configuration solution may open opportunities to quickly boot other
dedicated parts of a target device, the current solution focuses exclusively on PCI Express endpoints
in the CPM, for the sole purpose of meeting the 100 ms boot time requirement.

Loading Tandem PCIe for Stage 2


Design Requirements

The Tandem PCIe and DFX features inherently operate on the same datapaths for this discussion:
both use the PCIe link to deliver bitstream data to the slave boot interface (SBI) buffer, which is
read by a PMC DMA block and delivered to the platform processing unit (PPU) for processing and
programming of the configurable device resources. The SBI buffer is an 8 KB FIFO, and any write to
the aperture occurs in order, regardless of the target address. The mechanism for delivery through
the CPM varies depending on the chosen methodology, but all methods have specific hardware design
requirements and accompanying software and driver components.

CPM4 Controller 1 does not have a hardened DMA controller connected to it, so QDMA, XDMA, and
Master Bridge delivery of a bitstream to the SBI would require the use of soft IP or the Versal MCAP
VSEC. CPM5 does not support the use of the XDMA controller.

Table: Bitstream Delivery Details from PCIe to SBI

Delivery Method | CPM4 HW Capable (Ctrlr 0 / Ctrlr 1) | CPM5 HW Capable (Ctrlr 0 / Ctrlr 1) | Software Driver
QDMA            | Yes / No                            | Yes / Yes                           | Yes
XDMA            | Yes / No                            | No / No                             | Yes
MCAP VSEC       | Yes / Yes                           | Yes / Yes                           | Yes
Master Bridge   | Yes / No                            | Yes / Yes                           | Yes

✎ Note: It is possible to take advantage of CPM in streaming only mode with the RQ/RC/CQ/CC
interfaces to the PL and deliver reconfigurable partitions to SBI, but this would require a user-written
soft IP DMA or bridge and custom software and drivers. It is recommended to use the MCAP VSEC
instead for ease of use unless higher throughput is required. The MCAP VSEC can also be used with
a user DMA or bridge to load a Tandem PCIe Stage 2 bitstream since the path to the SBI does not
require PL logic to be present.
For details on configuring a design in the Vivado Design Suite to support using MCAP VSEC or DMA
transfers for Tandem PCIe, refer to Enable the Tandem Configuration Solution. The requirements to
load a reconfigurable partition are the same as what’s described for Tandem PCIe since the datapaths
are the same. To configure a design for DFX and generate partial bitstreams for reconfigurable
modules, refer to Vivado Design Suite User Guide: Dynamic Function eXchange (UG909).

Using the Provided Software and Drivers

The open-source, AMD-provided drivers and user space applications for the MCAP VSEC, QDMA,
and XDMA IPs can be found at https://github.com/Xilinx/dma_ip_drivers to be cloned or downloaded.
There is also extensive documentation on each of the drivers, contained in the repository and linked to
external pages. The following sections show examples of the required commands, assuming a
compatible bitstream has already been loaded to the device, the PCIe link is up, and the stage 2 or
partial bitstream(s) are ready to be delivered to the device. The examples in the following sections
assume the Bus:Device.Function of the PCIe device is 01:00.0.
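For reference, fetching the drivers might look like the following; the destination directory is arbitrary:

$> git clone https://github.com/Xilinx/dma_ip_drivers.git
$> cd dma_ip_drivers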
✎ Note: Before loading a DFX PDI using the QDMA/XDMA driver or MCAP, you need to enable the SBI
data path by writing 0x29 to the SBI control register (0xF1220004).
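One possible way to perform this write, assuming a Linux console is available on the Versal PS with the busybox devmem utility, is shown below; the register address is taken from the note above, and any equivalent register-write method (for example, xsdb over JTAG, or the xvsecctl sbi option described later in this section) can be used instead:

$> devmem 0xF1220004 32 0x29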

QDMA

1. Navigate to driver root directory


a. $> cd <parent-path>/dma_ip_drivers/QDMA/linux-kernel
2. Compile driver and applications
a. $> make TANDEM_BOOT_SUPPORTED=1
3. Copy driver and application executables to standard destinations
a. $> make install

4. Generate the qdma.conf file manually or using the qdma_generate_conf_file.sh script and place
it in /etc/modprobe.d if this is the first time using the driver on the host system. It is also
recommended to blacklist the driver modules for boot; refer to the README for instructions
5. Insert the driver into the kernel
a. $> modprobe qdma-pf
6. Set the maximum number of possible queues using sysfs
a. $> echo 1 > /sys/bus/pci/devices/0000:01:00.0/qdma/qmax
7. Add the queue; must use memory mapped DMA and direction is host to card
a. $> dma-ctl qdma01000 q add idx 0 mode mm dir h2c
8. Start the queue; must set aperture size so DMA transfer acts on a keyhole
a. $> dma-ctl qdma01000 q start idx 0 dir h2c aperture_sz 4096
9. Perform the DMA transfer to SBI
a. $> dma-to-device -d /dev/qdma01000-MM-0 -f <.pdi file> -s <size> -a
0x102100000
10. For CPM4 QDMA, perform the following additional steps to reload the qdma driver:
a. rmmod qdma-pf
b. modprobe qdma-pf
c. echo 1 > /sys/bus/pci/devices/0000:01:00.0/qdma/qmax
d. dma-ctl qdma01000 q add idx 0 mode mm dir h2c
e. dma-ctl qdma01000 q start idx 0 dir h2c

XDMA (CPM4 Only)

1. Navigate to driver root directory


a. $> cd <parent-path>/dma_ip_drivers/XDMA/linux-kernel
2. Compile and install driver to /lib/modules directory
a. $> cd xdma && make install
3. Compile user applications
a. $> cd ../tools && make all
4. Insert the driver into the kernel
a. $> modprobe xdma
5. Perform the DMA transfer to SBI; must set aperture size so DMA transfer acts on keyhole
a. $> dma_to_device -d /dev/xdma0_h2c_0 -k 4096 -f <.pdi file> -s <size> -a
0x102100000

MCAP VSEC

‼ Important: The MCAP VSEC can only natively address the lower 4 GB of the address map, as it
can only issue 32-bit address transactions. To reach the SBI buffer address, the NOC NMU address
remapping capability must be employed. The following recommended command shows an example of
remapping from the 32-bit address space to the 48-bit address space for the SBI buffer.

set_property CONFIG.REMAPS {M00_AXI {{0xF122_0000 0x1_0122_0000 64K} {0xF210_0000 0x1_0210_0000 64K}}} [get_bd_intf_pins /axi_noc_0/S00_AXI]

1. Navigate to driver root directory
a. $> cd <parent-path>/dma_ip_drivers/XVSEC/linux-kernel
2. Compile driver and user space applications
a. $> make all
3. Copy driver and application executables to standard destinations
a. $> make install
4. Install the driver into the kernel
a. $> modprobe xvsec
5. Perform the transfer to SBI; must set transfer mode to fixed address, issue 128-bit transactions,
and target the remapped address range
a. $> xvsecctl -b 0x01 -F 0x0 -c 1 -p mode 128b type fixed 0xF2100000 <.pdi
file> tr_mode fast [sbi 0xF1220000]
✎ Note: Depending on the way the previous bitstreams are loaded to the device through the
SBI, it might be required to set the SBI to accept data from the AXI slave interface. The SBI
control and status registers are located at base address 0x101220000; examine the SBI_CTRL
register in the block. The xvsecctl application can automatically set the register by appending the
sbi <reg block target base address> argument as shown above. To use this functionality,
it is assumed that the PMC slave target (the slave boot control and status register block) is
assigned in the CPM address map and NOC NMU address remapping is employed to reach the
SBI buffer, as demonstrated in the important note above.

Design Version Compatibility Checks

A critical detail when using Tandem Configuration, specifically Tandem PCIe® , is to ensure that stage
1 and stage 2 PDI images remain paired. The fundamental solution splits a single implemented
design image into two sections, and there is no mechanism to guarantee either section will pair
successfully with a section from an independent design. It is the designer’s responsibility to keep the
two stages in sync. AMD Versal™ adaptive SoC devices have safeguards to help with this effort in
the form of unique identifiers (UIDs) that are checked by the PLM when stage 2 programming images
are delivered. Four 32-bit fields are embedded in the PDI as described in the following table:

Table: Unique Identifiers

ID Name     | Description                                                                 | Defined by | PDI Mapping
Node ID     | Defines the configuration node in the PL                                    | Vivado     | id (0x18)
Unique ID   | A unique hash value to identify the module                                  | Vivado     | unique_id (0x24)
Parent ID   | A reference to the Unique ID of the module that precedes the target module  | Vivado     | parent_unique_id (0x28)
Function ID | Identification of the function residing in the target module                | User       | function_id (0x2c)
The Node ID is a fixed value that is incremented for each partition in the design. The stage 1 image
represents the initial configuration of the device and it is given an ID of 0x18700000. A value of
0x18700001 is assigned to stage 2 of the Tandem PCIe design. This value is useful in differentiating
the partitions in the design, though the file sizes and contents also make this quite apparent. If DFX is
also included, each node (RM) in the design has an incremented value starting with 0x18700002.
The Unique ID is a hash value generated automatically by Vivado. The value is deterministic and is
calculated from a number of factors within the design, so any change to a design's results (a code
change, new synthesis or implementation options, or design constraints) could produce a new unique
ID for each design section.
The Parent ID is a reference to the unique ID of the module that precedes the current one. The Parent
ID of the stage 1 design is always zero (0x00000000) because it is the first file to be programmed in any
tandem design. The Parent ID of stage 2 is the unique ID of the stage 1 design it was initially compiled
with. This is the comparison done in the PLM to ensure compatibility between images. If DFX is also
enabled in the tandem design, the Parent ID of each reconfigurable module PDI is the unique ID of the
stage 2 PDI, as the partial images must follow the stage 2 image.
The Function ID is a user-defined field for embedding any custom identifier that can be parsed in the PDI
header. Users can apply a 32-bit value by setting a property on the instance, which enables them to
quickly differentiate functions contained in the programming images.
✎ Note: The Function ID feature is not supported currently. This capability will be added in a future
Vivado release. Until then, the value of the Function ID remains at the default 0x00000000 for any
design partition.
These unique identifiers are automatically stored in the Tandem PDI files generated by
write_device_image. You can read these values in a number of ways, but the easiest way is to parse
them using Bootgen. Call Bootgen with the -read option to see the details of the PDI contents. Find
the four UID fields in the pl_cfi portion of the output.

bootgen -arch versal -read <pdi file>

Information seen in the stage 1 PDI:

...
--------------------------------------------------------------------------------
IMAGE HEADER (pl_cfi)
--------------------------------------------------------------------------------
pht_offset (0x00) : 0x00014a24 section_count (0x04) : 0x00000001
mHdr_revoke_id (0x08) : 0x00000000 attributes (0x0c) : 0x00001800
name (0x10) : pl_cfi
id (0x18) : 0x18700000 unique_id (0x24) : 0x738c16a7
parent_unique_id (0x28) : 0x00000000 function_id (0x2c) : 0x00000000
memcpy_address_lo (0x30) : 0x00000000 memcpy_address_hi (0x34) : 0x00000000
checksum (0x3c) : 0x10a2b15d
attribute list -

owner [plm] memcpy [no]
load [now] handoff [now]
dependentPowerDomains [spd][pld]

Information seen in the stage 2 PDI:

...
--------------------------------------------------------------------------------
IMAGE HEADER (pl_cfi)
--------------------------------------------------------------------------------
pht_offset (0x00) : 0x00000034 section_count (0x04) : 0x00000002
mHdr_revoke_id (0x08) : 0x00000000 attributes (0x0c) : 0x00001800
name (0x10) : pl_cfi
id (0x18) : 0x18700001 unique_id (0x24) : 0xf278b7b9
parent_unique_id (0x28) : 0x738c16a7 function_id (0x2c) : 0x00000000
memcpy_address_lo (0x30) : 0x00000000 memcpy_address_hi (0x34) : 0x00000000
checksum (0x3c) : 0x1e2b4392
attribute list -
owner [plm] memcpy [no]
load [now] handoff [now]
dependentPowerDomains [spd][pld]

Note how the Unique ID of the stage 1 output matches the Parent ID of the stage 2 output. These
PDIs are from the same design and are therefore compatible.
This data can also be found via cdoutil by grepping for the string "ldr_set_image_info" in a Tandem or
DFX PDI image. The four UID fields are always listed in the same order: Node ID, Unique ID, Parent
ID, and Function ID. The stage 1 PDI has the UID information for the stage 1 domain (clearly
identified by the Parent ID, the third field, set to 0) followed by the stage 2, whereas the stage 2 PDI
only shows the UID information for the PL portion of the design, unless DFX is also enabled. If DFX is
enabled, the stage 2 PDI lists all RMs in the design, each with an incrementing Node ID and a Parent
ID pointing back to stage 2.
Example without DFX:

> cdoutil design_tandem1.pdi | grep ldr_set_image_info


ldr_set_image_info 0x18700000 0x738c16a7 0 0
ldr_set_image_info 0x18700001 0xf278b7b9 0x738c16a7 0
> cdoutil design_tandem2.pdi | grep ldr_set_image_info
ldr_set_image_info 0x18700001 0xf278b7b9 0x738c16a7 0

Example with DFX:

> cdoutil Versal_CPM_Tandem_PCIe_top_tandem1.pdi | grep ldr_set_image_info


ldr_set_image_info 0x18700000 0x6be3877e 0 0
ldr_set_image_info 0x18700001 0x9163c3cf 0x6be3877e 0
> cdoutil Versal_CPM_Tandem_PCIe_top_tandem2.pdi | grep ldr_set_image_info
ldr_set_image_info 0x18700001 0x9163c3cf 0x6be3877e 0
ldr_set_image_info 0x18700002 0xb01c9cde 0x9163c3cf 0
ldr_set_image_info 0x18700003 0xebb76422 0x9163c3cf 0

> cdoutil Versal_CPM_Tandem_PCIe_DFX_pl_bram_partial.pdi | grep ldr_set_image_info
ldr_set_image_info 0x18700002 0xb01c9cde 0x9163c3cf 0

During the stage 2 image load, the PLM examines these contents and confirms that the identifiers
match. The Parent ID of the stage 2 must match the Unique ID of the stage 1 image already loaded in
the Versal device. If they do not, a failure is reported and the stage 2 PDI is rejected before it is
programmed, allowing you to take corrective action. Following are example logs for passing and
failing conditions:
Passing example:

[root@machine linux-kernel]# ll pdi/


-rw-r--r-- 1 root root 1426912 Apr 14 12:32 design_tandem2.pdi
[root@machine linux-kernel]# dma-to-device -d /dev/qdma65000-MM-0 -f
./pdi/design_tandem2.pdi -s 1426912 -a 0x102100000
size=1426912 Average BW = 36.272916 MB/sec

Failing example:

[root@machine linux-kernel]# ll pdi/


-rw-r--r-- 1 root root 1426912 Apr 14 12:32 design_tandem2.pdi
[root@machine linux-kernel]# dma-to-device -d /dev/qdma65000-MM-0 -f
./pdi/design_tandem2.pdi -s 1426912 -a 0x102100000
/dev/qdma65000-MM-0, W off 0x102100000, 0x15ecf0 failed -1.
write file: Input/output error

Tandem PCIe and DFX Configurable Example Design

A set of example designs is hosted on GitHub in the XilinxCEDStore repository. This repository
can be accessed through AMD Vivado (the example list can be refreshed with a valid internet
connection) and includes an AMD Versal device CPM Tandem PCIe and DFX example design. You can
also download or clone the GitHub repository to your local machine and point Vivado to that location
on your PC. The example design can be generated for CPM4 (VCK190) and CPM5 (VPK120) targets.
The following diagram illustrates the Tandem PCIe and DFX features:

Figure: Tandem CED Block


To open an example design, perform the following steps:

1. Launch AMD Vivado.


2. Navigate to the set of example designs for selection.
3. From the Quick Start menu, select File > Project > Open Example.
4. From the Select Project Template window, select Versal CPM Tandem PCIe > Versal CPM
Tandem PCIe and DFX and navigate through the menu to select a project location and board
part.


5. In the Flow Navigator, click Generate Device Image to run synthesis, implementation, and to
generate a programmable device image (.pdi) file that can be loaded to the target Versal device.

Instructions on how to download, install, and use the DMA drivers for this design are provided with the drivers in the dma_ip_drivers GitHub repository referenced above.

Segmented Configuration
Segmented Configuration is a solution that enables designers to boot the processors in a Versal
device and access DDR memory before the programmable logic (PL) is configured. This allows DDR-
based software like Linux to boot first followed by the PL, which can be configured later if needed via
any primary or secondary boot device or through a DDR image store. The Segmented Configuration
feature is intended to give the Versal boot sequence flexibility to configure the PL similar to what can
be done with AMD Zynq™ UltraScale+™ MPSoCs.
This solution uses a standard Vivado tool flow through implementation; the only additional
annotation required is the identification of NoC path segments to be included in the initial boot image,
which occurs automatically after the project property enabling the feature is set. Programming image
generation (write_device_image) automatically splits the programming images into two PDI files to be
stored and delivered separately. The entire PL is dynamic, and it can be completely reloaded while any
operating system and DDR memory access remain active.
Segmentation of Versal adaptive SoC programming images allows the processing domain, which
includes the CPM, to be available much more quickly than with a monolithic programming solution.
The difference between Tandem Configuration and Segmented Configuration is where the split is
done. Tandem includes only the necessary elements for link training in stage 1: the CPM, GTY, and
PMC. Segmented includes everything except the programmable logic (PL) domain and the NoC
resources within that domain.

Additional tuning of the boot PDI is done to combine the Tandem approach with Segmented
Configuration to ensure the 120 ms link training goal is met, but only for specific Versal Premium and
HBM devices in this release. When Segmented Configuration is enabled for a design with one or both
CPM5 controllers set to end point mode, the resulting boot PDI is structured as a Tandem image. You
can deliver the PLD PDI over the PCIe link using QDMA or any other available interface. The
equivalent support for CPM4 devices is planned for a future release.

Table: Segmented Configuration Feature

Feature                                         | CPM4 Devices                           | CPM5 Devices
Tandem Configuration                            | Available (Tandem PROM or Tandem PCIe) | Available (Tandem PROM or Tandem PCIe)
120 ms enumeration with Tandem Configuration    | Yes                                    | Yes
Segmented Configuration                         | Available                              | Available
120 ms enumeration with Segmented Configuration | No                                     | Yes

✎ Note: In the Vivado 2024.2 release, select one feature or the other. Selecting both Tandem and
Segmented options results in an error during write_device_image.
To load PLD (Segmented) or partial (DFX) PDI images over the CPM QDMA interface, PCIe must be
declared as a secondary boot interface. In the boot.bif generated by the Vivado flow (found in the
implementation runs directory), add a single line. Insert boot_device { pcie } after line 5 (id = 0x2).
This is automatically managed when Tandem PCIe is selected, but must be declared when using
Segmented Configuration or DFX.
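For reference, the relevant portion of the edited boot.bif might look like the following minimal sketch. The file name, attribute values, and image contents shown here are illustrative only and differ per design; only the boot_device { pcie } line is the addition described above:

new_bif:
{
 id_code = 0x14ca8093
 extended_id_code = 0x01
 id = 0x2
 boot_device { pcie }
 image
 {
  ...
 }
}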

Figure: boot.bif


For more information on Segmented Configuration, including design requirements and a tutorial walk-
through, see the Segmented Configuration tutorial available on the AMD Vivado GitHub repository.


Known Issues and Limitations


Engineering Silicon (ES) is not supported for VC1902, VC1802, and VM1802 devices. Only
production silicon is supported for these devices.
PCIe features incompatible with Tandem Configuration:
PCIe Extended Configuration Space as this requires PL logic.
QDMA multi-function is not supported. This feature uses PL mailbox which is probed during
driver load.
Stage 1 and stage 2 PDI images must remain linked. If you update one, update the other to
ensure both stages have been generated from the same implemented design.
Do not reload the same or a new stage 2 image on the fly. Isolation circuitry on the periphery of
the CPM is designed to ensure safe operation of the PCI Express endpoints prior to the rest of
the device becoming active. Upon stage 2 completion, this isolation is released and it is not re-
enabled for a dynamic reload of the stage 2 image.
For the Versal architecture, CPM Tandem PROM configuration can be used with post
configuration flash update. This is different from previous (UltraScale+ and older) architectures
that did not support Tandem PROM post configuration flash update. Given the lack of dual-mode
configuration pins, there is no PERSIST requirement to keep a configuration port active.
At this point there is no Tandem with Field Updates predefined use case. You can create a
Tandem + DFX solution noted above where both features are enabled in the same design, but
creation of the design hierarchy and floorplan as well as insertion of any decoupling logic are the
responsibility of the designer. The dynamic (DFX) portion of the solution would be limited to
programmable logic and NoC resources, and not parts of the Scalar Engines (processors).
No support is planned for Tandem Configuration for PL-based PCIe sites. The complexity and
inefficiency of such a solution in Versal device makes it very difficult to meet the 120 ms goal for
Tandem PROM and impossible for Tandem PCIe. If compliant link training is a fundamental
requirement, be sure to use a device with the CPM and enable the Tandem feature within CIPS
customization.
When debugging within a Tandem design, the debug_nets.ltx file should only be included after stage 2
is loaded. Loading the debug_nets.ltx file when only stage 1 is present causes tool errors when trying
to access the debug cores that exist as part of the stage 2 design.

QDMA Subsystem for CPM4


Overview
The Queue-based Direct Memory Access (QDMA) subsystem is a PCI Express® ( PCIe® ) based
DMA engine that is optimized for both high bandwidth and high packet count data transfers. The
QDMA is composed of the AMD Versal™ Integrated Block for PCI Express, and an extensive DMA
and bridge infrastructure that enables the ultimate in performance and flexibility.
The QDMA offers a wide range of setup and use options, many selectable on a per-queue basis, such
as memory-mapped DMA or stream DMA, interrupt mode and polling. The functional mode provides
many options for customizing the descriptor and DMA through user logic to provide complex traffic
management capabilities.

The primary mechanism to transfer data using the QDMA is for the QDMA engine to operate on
instructions (descriptors) provided by the host operating system. Using the descriptors, the QDMA can
move data in both the Host to Card (H2C) direction, or the Card to Host (C2H) direction. You can
select on a per-queue basis whether DMA traffic goes to an AXI4 memory map (MM) interface or to an
AXI4-Stream interface. In addition, the QDMA has the option to implement both an AXI4 MM Master
port and an AXI4 MM Slave port, allowing PCIe traffic to bypass the DMA engine completely.
The main difference between QDMA and other DMA offerings is the concept of queues. The idea of
queues is derived from the “queue set” concepts of Remote Direct Memory Access (RDMA) from high
performance computing (HPC) interconnects. These queues can be individually configured by
interface type, and they function in many different modes. Based on how the DMA descriptors are
loaded for a single queue, each queue provides a very low overhead option for setup and continuous
update functionality. By assigning queues as resources to multiple PCIe Physical Functions (PFs) and
Virtual Functions (VFs), a single QDMA core and PCI Express interface can be used across a wide
variety of multifunction and virtualized application spaces.
The QDMA can be used and exercised with an AMD provided QDMA reference driver, and then built
out to meet a variety of application spaces.

QDMA Architecture

The following figure shows the block diagram of the QDMA.

Figure: QDMA Architecture

DMA Engines

Descriptor Engine

The Host to Card (H2C) and Card to Host (C2H) descriptors are fetched by the Descriptor Engine in
one of two modes: Internal mode, and Descriptor bypass mode. The descriptor engine maintains per
queue contexts where it tracks software (SW) producer index pointer (PIDX), consumer index pointer
(CIDX), base address of the queue (BADDR), and queue configurations for each queue. The
descriptor engine uses a round robin algorithm for fetching the descriptors. The descriptor engine has
separate buffers for H2C and C2H queues, and ensures it never fetches more descriptors than
available space. The descriptor engine will have only one DMA read outstanding per queue at a time
and can read as many descriptors as can fit in a MRRS. The descriptor engine is responsible for
reordering the out of order completions and ensures that descriptors for queues are always in order.
The descriptor bypass can be enabled on a per-queue basis and the fetched descriptors, after
buffering, are sent to the respective bypass output interface instead of directly to the H2C or C2H
engine. In internal mode, based on the context settings, the descriptors are delivered to the
respective H2C memory mapped (MM), C2H MM, H2C Stream, or C2H Stream engine.
The descriptor engine is also responsible for generating the status descriptor for the completion of the
DMA operations. With the exception of C2H Stream mode, all modes use this mechanism to convey
completion of each DMA operation so that software can reclaim descriptors and free up any
associated buffers. This is indicated by the CIDX field of the status descriptor.
✎ Recommended: If a queue is associated with interrupt aggregation, AMD recommends that the
status descriptor be turned off, and instead the DMA status be received from the interrupt aggregation
ring.
To put a limit on the number of fetched descriptors (for example, to limit the amount of buffering
required to store the descriptor), it is possible to turn-on and throttle credit on a per-queue basis. In
this mode, the descriptor engine fetches the descriptors up to available credit, and the total number of
descriptors fetched per queue is limited to the credit provided. The user logic can return the credit
through the dsc_crdt interface. The credit is in the granularity of the size of the descriptor.
To help a user-developed traffic manager prioritize the workload, the available descriptor to be fetched
(incremental PIDX value) of the PIDX update is sent to the user logic on the tm_dsc_sts interface.
Using this interface it is possible to implement a design that can prioritize and optimize the descriptor
storage.

H2C MM Engine

The H2C MM Engine moves data from the host memory to card memory through the H2C AXI-MM
interface. The engine generates reads on PCIe, splitting descriptors into multiple read requests based
on the MRRS and the requirement that PCIe reads do not cross 4 KB boundaries. Once completion
data for a read request is received, an AXI write is generated on the H2C AXI-MM interface. For
source and destination addresses that are not aligned, the hardware will shift the data and split writes
on AXI-MM to prevent 4 KB boundary crossing. Each completed descriptor is checked to determine
whether a writeback and/or interrupt is required.
For Internal mode, the descriptor engine delivers memory mapped descriptors straight to the H2C MM
engine. The user logic can also inject the descriptor into the H2C descriptor bypass interface to move
data from host to card memory. This gives the ability to do interesting things such as mixing control
and DMA commands in the same queue. Control information can be sent to a control processor
indicating the completion of DMA operation.

C2H MM Engine

The C2H MM Engine moves data from card memory to host memory through the C2H AXI-MM
interface. The engine generates AXI reads on the C2H AXI-MM bus, splitting descriptors into multiple
requests based on 4 KB boundaries. Once completion data for the read request is received on the
AXI4 interface, a PCIe write is generated using the data from the AXI read as the contents of the
write. For source and destination addresses that are not aligned, the hardware will shift the data and
split writes on PCIe to obey Maximum Payload Size (MPS) and prevent 4 KB boundary crossings.
Each completed descriptor is checked to determine whether a writeback and/or interrupt is required.
For Internal mode, the descriptor engine delivers memory mapped descriptors straight to the C2H MM
engine. As with H2C MM Engine, the user logic can also inject the descriptor into the C2H descriptor
bypass interface to move data from card to host memory.
For multi-function configuration support, the PCIe function number information will be provided in the
aruser bits of the AXI-MM interface bus to help virtualization of card memory by the user logic. A
parity bus, separate from the data and user bus, is also provided for end-to-end parity support.

H2C Stream Engine

The H2C stream engine moves data from the host to the H2C Stream interface. For internal mode,
descriptors are delivered straight to the H2C stream engine; for a queue in bypass mode, the
descriptors can be reformatted and fed to the bypass input interface. The engine is responsible for
breaking up DMA reads to MRRS size, guaranteeing the space for completions, and also makes sure
completions are reordered to ensure H2C stream data is delivered to user logic in-order.
The engine has sufficient buffering for up to 256 descriptor reads and up to 32 KB of data. DMA
fetches the data and aligns to the first byte to transfer on the AXI4 interface side. This allows every
descriptor to have random offset and random length. The total length of all descriptors put together
must be less than 64 KB.
For internal mode queues, each descriptor defines a single AXI4-Stream packet to be transferred to
the H2C AXI-ST interface. A packet with multiple descriptors straddling is not allowed due to the lack
of per queue storage. However, packets with multiple descriptors straddling can be implemented
using the descriptor bypass mode. In this mode, the H2C DMA engine can be initiated when the user
logic has enough descriptors to form a packet. The DMA engine is initiated by delivering the multiple
descriptors straddled packet along with other H2C ST packet descriptors through the bypass
interface, making sure they are not interleaved. Also, through the bypass interface, the user logic can
control the generation of the status descriptor.

C2H Stream Engine

The C2H streaming engine is responsible for receiving data from the user logic and writing to the Host
memory address provided by the C2H descriptor for a given Queue.
The C2H engine has two major blocks to accomplish C2H streaming DMA, Descriptor Prefetch Cache
(PFCH), and the C2H-ST DMA Write Engine. The PFCH has per queue context to enhance the
performance of its function and the software that is expected to program it.
PFCH cache has three main modes, on a per queue basis, called Simple Bypass Mode, Internal
Cache Mode, and Cached Bypass Mode.

In Simple Bypass Mode, the engine does not track anything for the queue, and the user logic
can define its own method to receive descriptors. The user logic is then responsible for
delivering the packet and associated descriptor through the simple bypass interface. The
ordering of the descriptors fetched by a queue in the bypass interface and the C2H stream
interface must be maintained across all queues in bypass mode.
In Internal Cache Mode and Cached Bypass Mode, the PFCH module offers storage for up to
512 descriptors and these descriptors can be used by up to 64 different queues. In this mode,
the engine controls the descriptors to be fetched by managing the C2H descriptor queue credit
on demand based on received packets in the pipeline. Pre-fetch mode can be enabled on a per
queue basis, and when enabled, causes the descriptors to be opportunistically pre-fetched so
that descriptors are available before the packet data is available. The status can be found in
prefetch context. This significantly reduces the latency by allowing packet data to be transferred
to the PCIe integrated block almost immediately, instead of having to wait for the relevant
descriptor to be fetched. The size of the data buffer is fixed for a queue (PFCH context) and the
engine can scatter the packet across as many as seven descriptors. In Cached Bypass Mode, the
descriptor is bypassed to the user logic for further processing, such as address translation, and is
sent back on the bypass in interface. This mode does not assume any ordering between the descriptor
and the C2H stream packet interface, and the pre-fetch engine can match the packets and descriptors.
When pre-fetch mode is enabled, do not give credits to the IP; the pre-fetch engine takes care of
credit management.

Completion Engine

The Completion (CMPT) Engine is used to write to the completion queues. Although the Completion
Engine can be used with an AXI-MM interface and Stream DMA engines, the C2H Stream DMA
engine is designed to work closely with the Completion Engine. The Completion Engine can also be
used to pass immediate data to the Completion Ring. The Completion Engine can be used to write
Completions of up to 64B in the Completion ring. When used with a DMA engine, the completion is
used by the driver to determine how many bytes of data were transferred with every packet. This
allows the driver to reclaim the descriptors.
The Completion Engine maintains the Completion Context. This context is programmed by the Driver
and is maintained on a per-queue basis. The Completion Context stores information like the base
address of the Completion Ring, PIDX, CIDX and a number of aspects of the Completion Engine,
which can be controlled by setting the fields of the Completion Context.
The engine also can be configured on a per-queue basis to generate an interrupt or a completion
status update, or both, based on the needs of the software. If the interrupts for multiple queues are
aggregated into the interrupt aggregation ring, the status descriptor information is available in the
interrupt aggregation ring as well.
The CMPT Engine has a cache of up to 64 entries to coalesce the multiple smaller CMPT writes into
64B writes to improve the PCIe efficiency. At any time, completions can be simultaneously coalesced
for up to 64 queues. Beyond this, any additional queue that needs to write a CMPT entry will cause
the eviction of the least recently used queue from the cache. The depth of the cache used for this
purpose is configurable with possible values of 8, 16, 32, and 64.

Bridge Interfaces

AXI Memory Mapped Bridge Master Interface

The AXI MM Bridge Master interface is used for high bandwidth access to AXI Memory Mapped space
from the host. The interface supports up to 32 outstanding AXI reads and writes. One or more PCIe
BAR of any physical function (PF) or virtual function (VF) can be mapped to the AXI-MM bridge
master interface. This selection must be made prior to design compilation.
Virtual function group (VFG) refers to the VF group number. It is equivalent to the PF number
associated with the corresponding VF. VFG_OFFSET refers to the VF number with respect to a
particular PF. Note that this is not the FIRST_VF_OFFSET of each PF.
For example, if both PF0 and PF1 have 8 VFs, FIRST_VF_OFFSET for PF0 and PF1 is 4 and 11.
Each host initiated access can be uniquely mapped to the 64 bit AXI address space through the PCIe
to AXI BAR translation.
Since all functions share the same AXI Master address space, a mechanism is needed to map
requests from different functions to a distinct address space on the AXI master side. An example
provided below shows how PCIe to AXI translation vector is used. Note that all VFs belonging to the
same PF share the same PCIe to AXI translation vector. Therefore, the AXI address space of each VF
is concatenated together. Use VFG_OFFSET to calculate the actual starting address of AXI for a
particular VF.
To summarize, AXI master write or read address is determined as:

For PF, address = pcie2axi_vec + axi_offset.


For VF, address = pcie2axi_vec + (VFG_OFFSET + 1)*vf_bar_size + axi_offset.

Where pcie2axi_vec is PCIe to AXI BAR translation (that can be set when the IP core is configured
from the Vivado IP Catalog).
And axi_offset is the address offset in the requested target space.
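For illustration only, the following quick calculation uses hypothetical values that do not come from this guide (pcie2axi_vec = 0x4_0000_0000, vf_bar_size = 64 KB, VFG_OFFSET = 2, axi_offset = 0x100):

$> printf 'PF: 0x%x\n' $(( 0x400000000 + 0x100 ))
PF: 0x400000100
$> printf 'VF: 0x%x\n' $(( 0x400000000 + (2 + 1) * 0x10000 + 0x100 ))
VF: 0x400030100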

PCIe to AXI BARs

For each physical function, the PCIe configuration space consists of a set of six 32-bit memory BARs
and one 32-bit Expansion ROM BAR. When SR-IOV is enabled, an additional six 32-bit BARs are
enabled for each Virtual Function. These BARs provide address translation to the AXI4 memory
mapped space capability, interface routing, and AXI4 request attribute configuration. Any pairs of
BARs can be configured as a single 64-bit BAR. Each BAR can be configured to route its requests to
the QDMA register space, or the AXI MM bridge master interface.

Request Memory Type


The memory type can be set for each PCIe BAR through GUI configuration.

AxCache[1] is set to 1 for modifiable, and 0 for non-modifiable. Selecting the AxCache box will
set AxCache[1] to 1.

AXI Memory Mapped Bridge Slave Interface

The AXI-MM Bridge Slave interface is used for high bandwidth memory transfers between the user
logic and the Host. AXI to PCIe translation is supported through the AXI to PCIe BARs. The interface

will split requests as necessary to obey PCIe MPS and 4 KB boundary crossing requirements. Up to
32 outstanding read and write requests are supported.

AXI to PCIe BARs

In the Bridge Slave interface, there are six BARs which can be configured as 32 bits or 64 bits. These
BARs provide address translation from an AXI address space to the PCIe address space. The
address translation is configured through GUI configuration; for more information, see BAR and
Address Translation.

Interrupt Module

The IRQ module aggregates interrupts from various sources. The interrupt sources are queue-based
interrupts, user interrupts and error interrupts.
Queue-based interrupts and user interrupts are allowed on PFs and VFs, but error interrupts are
allowed only on PFs. If the SR-IOV is not enabled, each PF has the choice of MSI-X or Legacy
Interrupts. With SR-IOV enabled, only MSI-X interrupts are supported across all functions.
MSI-X interrupt is enabled by default. Host system (Root Complex) will enable one or all of the
interrupt types supported in hardware. If MSI-X is enabled, it takes precedence.
Up to eight interrupts per function are available. To allow many queues on a given function and each
to have interrupts, the QDMA offers a novel way of aggregating interrupts from multiple queues to a
single interrupt vector. In this way, all 2048 queues could in principle be mapped to a single interrupt
vector. QDMA offers 256 interrupt aggregation rings that can be flexibly allocated among the 256
available functions.

PCIe Block Interface

PCIe CQ/CC

The PCIe Completer Request (CQ)/Completer Completion (CC) modules receive and process TLP
requests from the remote PCIe agent. This interface to the PCIE Controller operates in address
aligned mode. The module uses the BAR information from the Integrated Block for PCIE Controller to
determine where the request should be forwarded. The possible destinations for these requests are:

DMA configuration module


AXI4 MM Bridge interface to Network on Chip (NoC)

Non-posted requests are expected to receive completions from the destination, which are forwarded
to the remote PCIe agent. For further details, see the Versal Adaptive SoC CPM Mode for PCI
Express Product Guide (PG346).

PCIe RQ/RC

The PCIe Requester Request (RQ)/Requester Completion (RC) interface generates PCIeTLPs on the
RQ bus and processes PCIe Completion TLPs from the RC bus. This interface to the PCIE Controller
operates in DWord aligned mode. With a 512-bit interface, straddling will be enabled. While straddling

is supported, all combinations of RQ straddled transactions may not be implemented. For further
details, see the Versal Adaptive SoC CPM Mode for PCI Express Product Guide (PG346).

PCIe Configuration

Several factors can throttle outgoing non-posted transactions. Outgoing non-posted transactions are
throttled based on flow control information from the PCIE Controller to prevent head of line blocking of
posted requests. The DMA will meter non-posted transactions based on the PCIe Receive FIFO
space.

General Design of Queues

The multi-queue DMA engine of the QDMA uses RDMA model queue pairs to allow RNIC
implementation in the user logic. Each queue set consists of Host to Card (H2C), Card to Host (C2H),
and a C2H Stream Completion (CMPT). The elements of each queue are descriptors.
H2C and C2H are always written by the driver/software; hardware always reads from these queues.
H2C carries the descriptors for the DMA read operations from Host. C2H carries the descriptors for
the DMA write operations to the Host.
In internal mode, H2C descriptors carry address and length information and are called gather
descriptors. They support 32 bits of metadata that can be passed from software to hardware along
with every descriptor. The descriptor can be memory mapped (where it carries host address, card
address, and length of DMA transfer) or streaming (only host address, and length of DMA transfer)
based on context settings. Through descriptor bypass, an arbitrary descriptor format can be defined,
where software can pass immediate data and/or additional metadata along with packet.
C2H queue memory mapped descriptors include the card address, the host address and the length.
In streaming internal cached mode, descriptors carry only the host address. The buffer size of the
descriptor, which is programmed by the driver, is expected to be of fixed size for the whole queue.
Actual data transferred associated with each descriptor does not need to be the full length of the
buffer size.
The software advertises valid descriptors for H2C and C2H queues by writing its producer index
(PIDX) to the hardware. The status descriptor is the last entry of the descriptor ring, except for a C2H
stream ring. The status descriptor carries the consumer index (CIDX) of the hardware so that the
driver knows when to reclaim the descriptor and deallocate the buffers in the host.
For the C2H stream mode, C2H descriptors will be reclaimed based on the CMPT queue entry.
Typically, this carries one entry per C2H packet, indicating one or more C2H descriptors is consumed.
The CMPT queue entry carries enough information for software to claim all the descriptors consumed.
Through external logic, this can be extended to carry other kinds of completions or information to the
host.
CMPT entries written by the hardware to the ring can be detected by the driver using either the color
bit in the descriptor or the status descriptor at the end of the CMPT ring. Each CMPT entry can carry
metadata for a C2H stream packet and can also serve as a custom completion or immediate
notification for the user application.
The base address of all ring buffers (H2C, C2H, and CMPT) should be aligned to a 4 KB address.

Figure: Queue Ring Architecture


The software can program 16 different ring sizes. The ring size for each queue can be selected through
context programming. The last queue entry is the descriptor status, and the number of allowable entries
is (queue size - 1).
For example, if queue size is 8, which contains the entry index 0 to 7, the last entry (index 7) is
reserved for status. This index should never be used for PIDX update, and PIDX update should never
be equal to CIDX. For this case, if CIDX is 0, the maximum PIDX update would be 6.
In the example above, if traffic has already started and the CIDX is 4, the maximum PIDX update is 3.
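
As an illustration of this rule, the following C sketch (not part of any QDMA driver API; the helper name and 16-bit index handling are assumptions) computes how many descriptors software may still post for a given ring:

/*
 * Sketch only: entry (ring_size - 1) holds the status descriptor, so
 * descriptor indices wrap modulo (ring_size - 1), and PIDX must never be
 * advanced onto CIDX.
 */
#include <stdint.h>

static inline uint16_t qdma_ring_free_slots(uint16_t ring_size,
                                            uint16_t pidx,
                                            uint16_t cidx)
{
    uint16_t wrap = ring_size - 1;  /* usable indices are 0 .. ring_size-2 */
    uint16_t used = (pidx >= cidx) ? (uint16_t)(pidx - cidx)
                                   : (uint16_t)(wrap - cidx + pidx);
    return (uint16_t)((ring_size - 2) - used);
}

/* Example: ring_size = 8, PIDX = CIDX = 0 -> 6 free slots (maximum PIDX
 * update is 6). With CIDX = 4, PIDX may advance up to index 3, wrapping
 * past index 6, which matches the example above.                        */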

H2C and C2H Queues

H2C/C2H queues are rings located in host memory. For both types of queues, the producer is software
and the consumer is the descriptor engine. The software maintains a producer index (PIDX) and a copy of
the hardware consumer index (HW CIDX) to avoid overwriting unread descriptors. The descriptor engine
also maintains consumer index (CIDX) and a copy of SW PIDX, which is to make sure the engine
does not read unwritten descriptors. The last entry in the queue is dedicated for the status descriptor
where the engine writes the HW CIDX and other status.
The engine maintains a total of 2048 H2C and 2048 C2H contexts in local memory. The context
stores properties of the queue, such as base address (BADDR), SW PIDX, CIDX, and depth of the
queue.

Figure: Simple H2C and C2H Queue


The figure above shows the H2C and C2H fetch operation.

1. For H2C, the driver writes payload into host buffer, forms the H2C descriptor with the payload
buffer information and puts it into H2C queue at the PIDX location. For C2H, the driver forms the
descriptor with available buffer space reserved to receive the packet write from the DMA.
2. The driver sends the posted write to PIDX register in the descriptor engine for the associated
Queue ID (QID) with its current PIDX value.
3. Upon reception of the PIDX update, the engine calculates the absolute QID of the pointer update
based on address offset and function ID. Using the QID, the engine will fetch the context for the
absolute QID from the memory associated with the QDMA.
4. The engine determines the number of descriptors that are allowed to be fetched based on the
context. The engine calculates the descriptor address using the base address (BADDR), CIDX,
and descriptor size, and the engine issues the DMA read request.
5. After the descriptor engine receives the read completion from the host memory, the descriptor
engine delivers them to the H2C Engine or C2H Engine in internal mode. In case of bypass, the
descriptors are sent out to the associated descriptor bypass output interface.
6. For memory mapped or H2C stream queues programmed as internal mode, after the fetched
descriptor is completely processed, the engine writes the CIDX value to the status descriptor.
For queues programmed as bypass mode, user logic controls the write back through bypass in
interface. The status descriptor could be moderated based on context settings. C2H stream
queues always use the CMPT ring for the completions.

For C2H, the fetch operation is implicit through the CMPT ring.
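
The first two steps of this flow can be sketched in driver-style C. The ring bookkeeping structure and the doorbell helper below are illustrative assumptions; the actual PIDX update is a write to the queue's direct-mapped QDMA_DMAP_SEL_H2C_DSC_PIDX/QDMA_DMAP_SEL_C2H_DSC_PIDX space described later in this section.

#include <stdint.h>
#include <string.h>

struct h2c_ring {
    void     *desc;        /* descriptor ring base, 4 KB aligned              */
    uint16_t  pidx;        /* software producer index                         */
    uint16_t  ring_size;   /* total entries, including the status descriptor  */
    uint16_t  desc_sz;     /* 8, 16, or 32 bytes, per the context dsc_sz      */
};

/* Placeholder for the MMIO posted write to the queue's PIDX register. */
static void qdma_pidx_db(uint16_t qid, uint16_t pidx)
{
    (void)qid;
    (void)pidx;
}

static void h2c_post_descriptor(struct h2c_ring *r, uint16_t qid,
                                const void *desc)
{
    /* Step 1: place the descriptor at the current PIDX slot. */
    memcpy((uint8_t *)r->desc + (size_t)r->pidx * r->desc_sz,
           desc, r->desc_sz);
    r->pidx = (uint16_t)((r->pidx + 1) % (r->ring_size - 1));

    /* Step 2: posted write of the new PIDX for this queue ID. */
    qdma_pidx_db(qid, r->pidx);
}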

Completion Queue

The Completion (CMPT) queue is a ring located in host memory. The consumer is software, and the
producer is the CMPT engine. The software maintains the consumer index (CIDX) and a copy of
hardware producer index (HW PIDX) to avoid reading unwritten completions. The CMPT engine also
maintains PIDX and a copy of software consumer index (SW CIDX) to make sure that the engine
does not overwrite unread completions. The last entry in the queue is dedicated for the status
descriptor which is where the engine writes the hardware producer index (HW PIDX) and other status.
The engine maintains a total of 2048 CMPT contexts in local memory. The context stores properties of
the queue, such as base address, SW CIDX, PIDX, and depth of the queue.

Figure: Simple Completion Queue Flow

C2H stream is expected to use the CMPT queue for completions to host, but it can also be used for
other types of completions or for sending messages to the driver. The message through the CMPT is
guaranteed to not bypass the corresponding C2H stream packet DMA.
The simple flow of DMA CMPT queue operation with respect to the numbering above follows:

1. The CMPT engine receives the completion message through the CMPT interface, but the QID
for the completion message comes from the C2H stream interface. The engine reads the QID
index of CMPT context RAM.
2. The DMA writes the CMPT entry to address BASE+PIDX.
3. If all conditions are met, the engine optionally writes the PIDX to the status descriptor of the CMPT
queue along with the color bit.
4. If interrupt mode is enabled, the CMPT engine generates the interrupt event message to the
interrupt module.
5. The driver can be in polling or interrupt mode. Either way, the driver identifies the new CMPT
entry either by matching the color bit or by comparing the PIDX value in the status descriptor
against its current software CIDX value.
6. The driver updates CIDX for that queue. This allows the hardware to reuse the descriptors again.
After the software finishes processing the CMPT, that is, before it stops polling or leaving the
interrupt handler, the driver issues a write to CIDX update register for the associated queue.
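
A minimal sketch of the driver-side detection in steps 5 and 6 follows. It assumes an 8B completion entry with the color bit in bit 0; the actual CMPT entry format is configurable and defined elsewhere in this guide, and the structure and helper names are illustrative only.

#include <stdbool.h>
#include <stdint.h>

struct cmpt_ring {
    volatile uint64_t *entries;   /* CMPT ring in host memory                 */
    uint16_t           cidx;      /* software consumer index                  */
    uint16_t           size;      /* entries, including the status descriptor */
    uint8_t            color;     /* color expected for newly written entries */
};

/* Returns true and advances CIDX if a new completion entry is present. */
static bool cmpt_poll_one(struct cmpt_ring *r, uint64_t *entry_out)
{
    uint64_t e = r->entries[r->cidx];

    if ((uint8_t)(e & 0x1) != r->color)          /* color mismatch: nothing new */
        return false;

    *entry_out = e;
    if (++r->cidx == (uint16_t)(r->size - 1)) {  /* wrap before the status slot */
        r->cidx = 0;
        r->color ^= 1;                           /* expected color flips per wrap */
    }
    return true;
}

/* After processing a batch of entries, and before leaving the interrupt
 * handler or stopping polling, the driver writes r->cidx to the completion
 * CIDX update register for the queue (step 6 above).                       */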

SR-IOV Support

The QDMA provides an optional feature to support Single Root I/O Virtualization (SR-IOV). The PCI-
SIG® Single Root I/O Virtualization and Sharing (SR-IOV) specification (available from PCI-SIG
Specifications, www.pcisig.com/specifications) standardizes the method for bypassing VMM
involvement in datapath transactions and allows a single endpoint to appear as multiple separate
endpoints. SR-IOV classifies the functions as:

Physical Functions (PF): Full featured PCIe® functions which include SR-IOV capabilities among
others.
Virtual Functions (VF): PCIe functions featuring configuration space with Base Address
Registers (BARs) but lacking the full configuration resources and controlled by the PF
configuration. The main role of the VF is data transfer.

Apart from the PCIe defined configuration space, the QDMA Subsystem for PCI Express virtualizes data path
operations, such as pointer updates for queues and interrupts. The rest of the management and
configuration functionality is deferred to the physical function driver. Drivers that do not have
sufficient privilege must communicate with the privileged driver through the mailbox interface, which is
provided as part of the QDMA Subsystem for PCI Express.
Security is an important aspect of virtualization. The QDMA Subsystem for PCI Express offers the
following security functionality:

QDMA allows only privileged PF to configure the per queue context and registers. VFs inform the
corresponding PFs of any queue context programming.
Drivers are allowed to do pointer updates only for the queue allocated to them.
The system IOMMU can be turned on to check that the DMA access is being requested by PFs
or VFs. The ARID comes from queue context programmed by a privileged function.

Any PF or VF can communicate to a PF (not itself) through mailbox. Each function implements one
128B inbox and 128B outbox. These mailboxes are visible to the driver in the DMA BAR (typically
BAR0) of its own function. At any given time, any function can have one outgoing mailbox and one
incoming mailbox message outstanding per function.
The diagram below shows how a typical system can use QDMA with different functions and operating
systems. Different Queues can be allocated to different functions, and each function can transfer DMA
packets independent of each other.

Figure: QDMA in a System


Limitations

The limitation of the QDMA is as follows:

The DMA supports a maximum of 256 Queues on any VF function.

Applications

The QDMA is used in a broad range of networking, computing, and data storage applications. A
common usage example for the QDMA is to implement Data Center and Telco applications, such as
Compute acceleration, Smart NIC, NVMe, RDMA-enabled NIC (RNIC), server virtualization, and NFV
in the user logic. Multiple applications can be implemented to share the QDMA by assigning different
queue sets and PCIe functions to each application. These Queues can then be scaled in the user
logic to implement rate limiting, traffic priority, and custom work queue entry (WQE).

Product Specification

QDMA Performance Optimization

Performance Optimization Based on Available Cache/Buffer Size

Table: CPM4 QDMA

Name | Entry/Depth | Description
C2H Descriptor Cache Depth | 512 | Total number of outstanding C2H stream descriptor fetches for cache bypass and internal mode. This cache depth is not relevant in simple bypass mode, where the user logic can maintain a longer descriptor cache.
Prefetch Cache Depth | 64 | Number of C2H prefetch tags available. With more than 64 active queues for packets smaller than 512B, performance can drop depending on the data pattern. If performance degradation is observed, simple bypass mode can be implemented, where all descriptor flows are maintained in the user logic.
C2H Payload FIFO Depth | 512 | Units of 64B. Amount of C2H data that the C2H engine can buffer. This buffer can sustain a host read latency of up to 2 us (512 * 4 ns). If the latency is more than 2 us, performance can degrade.
MM Reorder Buffer Depth | 512 | Units of 64B. Amount of MM read data that can be stored to absorb host read latency.
Desc Eng Reorder Buffer Depth | 512 | Units of 64B. Amount of descriptor fetch data that can be stored to absorb host read latency.
H2C-ST Reorder Buffer Depth | 512 | Units of 64B. Amount of H2C-ST data that can be stored to absorb host read latency.

QDMA Operations

Descriptor Engine

The descriptor engine is responsible for managing the consumer side of the Host to Card (H2C) and
Card to Host (C2H) descriptor ring buffers for each queue. The context for each queue determines
how the descriptor engine will process each queue individually. When descriptors are available and
other conditions are met, the descriptor engine will issue read requests to PCIe to fetch the
descriptors. Received descriptors are offloaded to either the descriptor bypass out interface (bypass
mode) or delivered directly to a DMA engine (internal mode). When an H2C Stream or Memory
Mapped DMA engine completes a descriptor, status can be written back to the status descriptor, and an
interrupt and/or a marker response can be generated to inform software and user logic of the current
DMA progress. The descriptor engine also provides a Traffic Manager Interface which notifies user
logic of certain status for each queue. This allows the user logic to make informed decisions if
customization and optimization of DMA behavior is desired.

Descriptor Context

The Descriptor Engine stores per queue configuration, status and control information in descriptor
context that can be stored in block RAM or UltraRAM, and the context is indexed by H2C or C2H QID.
Prior to enabling the queue, the hardware and credit context must first be cleared. After this is done,
the software context can be programmed and the qen bit can be set to enable the queue. After the
queue is enabled, the software context should only be updated through the direct mapped address
space to update the Producer Index and Interrupt ARM bit, unless the queue is being disabled. The
hardware context and credit context contain only status. It is only necessary to interact with the
hardware and credit contexts as part of queue initialization in order to clear them to all zeros. Once
the queue is enabled, context is dynamically updated by hardware. Any modification of the context
through the indirect bus when the queue is enabled can result in unexpected behavior. Reading the
context when the queue is enabled is not recommended as it can result in reduced performance.
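
The initialization order described above can be summarized in the following pseudocode-style C sketch. The software/hardware context selector values come from the section headings below; the credit context selector value and the two helper functions are placeholders standing in for the indirect context programming interface, not a driver API defined by this guide.

#include <stdint.h>

enum ctxt_sel {
    CTXT_SW_C2H = 0x0, CTXT_SW_H2C = 0x1,
    CTXT_HW_C2H = 0x2, CTXT_HW_H2C = 0x3,
    CTXT_CRD_H2C = 0xFF          /* placeholder value: see the register map */
};

/* Placeholders for indirect context clear/write accesses. */
static int qdma_ctxt_clear(uint16_t qid, enum ctxt_sel sel)
{ (void)qid; (void)sel; return 0; }
static int qdma_ctxt_write(uint16_t qid, enum ctxt_sel sel,
                           const uint32_t *w, int nwords)
{ (void)qid; (void)sel; (void)w; (void)nwords; return 0; }

static int h2c_queue_setup(uint16_t qid, const uint32_t sw_ctxt[4])
{
    int rv;

    /* 1. The hardware and credit contexts must be cleared first. */
    if ((rv = qdma_ctxt_clear(qid, CTXT_HW_H2C)) != 0)
        return rv;
    if ((rv = qdma_ctxt_clear(qid, CTXT_CRD_H2C)) != 0)
        return rv;

    /* 2. Program the software context with qen set. From this point on,
     *    only the direct-mapped PIDX/irq_arm space should touch the queue
     *    until it is disabled.                                            */
    return qdma_ctxt_write(qid, CTXT_SW_H2C, sw_ctxt, 4);
}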

Software Descriptor Context Structure (0x0 C2H and 0x1 H2C)

The descriptor context is used by the descriptor engine. All descriptor rings must be aligned to a 4K
address boundary.

Table: Software Descriptor Context Structure Definition

Bit | Bit Width | Field Name | Description
[127:64] | 64 | dsc_base | 4K aligned base address of the descriptor ring.
[63] | 1 | is_mm | Applicable only for internal mode. If this field is set, the descriptors are delivered to the associated H2C or C2H MM engine.
[62] | 1 | mrkr_dis | If set, disables the marker response in internal mode. Not applicable for C2H ST.
[61] | 1 | irq_req | Interrupt due to error waiting to be sent (waiting for irq_arm). This bit should be cleared when the queue context is initialized. Not applicable for C2H ST.
[60] | 1 | err_wb_sent | A writeback/interrupt was sent for an error. Once this bit is set, no more writebacks or interrupts are sent for the queue. This bit should be cleared when the queue context is initialized. Not applicable for C2H ST.
[59:58] | 2 | err | Error status. Bit[1] dma: an error occurred during a DMA operation; check the engine status registers. Bit[0] dsc: an error occurred during descriptor fetch or update; check the descriptor engine status registers. This field should be set to 0 when the queue context is initialized.
[57] | 1 | irq_no_last | No interrupt was sent and pidx/cidx was idle in internal mode. When the irq_arm bit is set, the interrupt is sent. This bit clears automatically when the interrupt is sent or if the PIDX of the queue is updated. This bit should be initialized to 0 when the queue context is initialized. Not applicable for C2H ST.
[56:54] | 3 | port_id | Port ID. The port ID that is sent on the user interfaces for events associated with this queue.
[53] | 1 | irq_en | Interrupt enable. An interrupt to the host is sent on host status updates. Set to 0 for C2H ST.
[52] | 1 | wbk_en | Writeback enable. A memory write to the status descriptor is sent on host status updates.
[51] | 1 | mm_chn | For AXI-MM transfers, set to 0 to target Channel 0 or 1 to target Channel 1. For AXI-ST, set to 0.
[50] | 1 | bypass | If set, the queue operates in bypass mode; otherwise it is in internal mode.
[49:48] | 2 | dsc_sz | Descriptor size. 0: 8B, 1: 16B, 2: 32B, 3: reserved. 32B is required for Memory Mapped DMA, 16B for H2C Stream DMA, and 8B for C2H Stream DMA.
[47:44] | 4 | rng_sz | Descriptor ring size index into the ring size registers.
[43:36] | 8 | fnc_id | Function ID. The function to which this queue belongs.
[35] | 1 | wbi_intvl_en | Writeback/interrupt interval. Enables periodic status updates based on the number of descriptors processed. Applicable to internal mode. Not applicable to C2H ST. The writeback interval is determined by QDMA_GLBL_DSC_CFG.wb_acc_int.
[34] | 1 | wbi_chk | Writeback/interrupt after pending check. Enables status updates when the queue has completed all available descriptors. Applicable to internal mode.
[33] | 1 | fcrd_en | Enable fetch credit. The number of descriptors fetched is qualified by the number of credits given to this queue. Set to 1 for C2H ST.
[32] | 1 | qen | Indicates that the queue is enabled.
[31:17] | 15 | rsv | Reserved.
[16] | 1 | irq_arm | Interrupt Arm. When this bit is set, the queue is allowed to generate an interrupt.
[15:0] | 16 | pidx | Producer Index.
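
As an illustration, the 128-bit structure above can be packed for indirect programming as shown below. Only a subset of fields is populated, the word count and ordering (lowest 32 bits first) are assumptions, and the number of data words the indirect interface actually consumes is defined by the register map, not here.

#include <stdint.h>

struct sw_ctxt_cfg {
    uint64_t dsc_base;   /* [127:64] 4K-aligned descriptor ring base */
    uint8_t  dsc_sz;     /* [49:48]  0: 8B, 1: 16B, 2: 32B           */
    uint8_t  rng_sz;     /* [47:44]  ring size index                 */
    uint8_t  fnc_id;     /* [43:36]  owning function                 */
    uint8_t  wbk_en;     /* [52]     writeback enable                */
    uint8_t  fcrd_en;    /* [33]     fetch credit enable             */
};

static void sw_ctxt_pack(const struct sw_ctxt_cfg *c, uint32_t w[4])
{
    uint64_t lo = 0;

    lo |= (uint64_t)1                  << 32;   /* qen: enable the queue */
    lo |= (uint64_t)(c->fcrd_en & 0x1) << 33;
    lo |= (uint64_t)c->fnc_id          << 36;
    lo |= (uint64_t)(c->rng_sz & 0xF)  << 44;
    lo |= (uint64_t)(c->dsc_sz & 0x3)  << 48;
    lo |= (uint64_t)(c->wbk_en & 0x1)  << 52;
    /* pidx [15:0] is left at 0 at initialization. */

    w[0] = (uint32_t)lo;
    w[1] = (uint32_t)(lo >> 32);
    w[2] = (uint32_t)c->dsc_base;               /* dsc_base [127:64]     */
    w[3] = (uint32_t)(c->dsc_base >> 32);
}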

Hardware Descriptor Context Structure (0x2 C2H and 0x3 H2C)

Table: Hardware Descriptor Structure Definition

Bit | Bit Width | Field Name | Description
[47:43] | 5 | reserved | Reserved
[42] | 1 | fetch_pnd | Descriptor fetch pending.
[41] | 1 | idl_stp_b | Queue invalid and no descriptors pending. This bit is set when the queue is enabled. The bit is cleared when the queue has been disabled (software context qen bit) and no more descriptors are pending.
[40] | 1 | dsc_pnd | Descriptors pending. Descriptors are defined to be pending if the last CIDX completed does not match the current PIDX.
[39:32] | 8 | reserved | Reserved
[31:16] | 16 | crd_use | Credits consumed. Applicable if fetch credits are enabled in the software context.
[15:0] | 16 | cidx | Consumer Index of the last fetched descriptor.

Credit Descriptor Context Structure

Table: Credit Descriptor Context Structure Definition

Bit | Bit Width | Field Name | Description
[31:16] | 16 | reserved | Reserved
[15:0] | 16 | credt | Fetch credits received. Applicable if fetch credits are enabled in the software context.

The credit descriptor context is for internal DMA use only and can be read from the indirect bus for
debug. This context stores credits for each queue that have been received through the Descriptor
Credit Interface with the CREDIT_ADD operation. If the credit operation has the fence bit set, credits are
added only when the read request for the descriptor is generated.

Descriptor Fetch

Figure: Descriptor Fetch Flow


1. The descriptor engine is informed of the availability of descriptors through an update to a


queue’s descriptor PIDX. This portion of the context is direct mapped to the
QDMA_DMAP_SEL_H2C_DSC_PIDX and QDMA_DMAP_SEL_C2H_DSC_PIDX address
space.
2. On a PIDX update, the descriptor engine evaluates the number of descriptors available based on
the last fetched consumer index (CIDX). The availability of new descriptors is communicated to
the user logic through the Traffic Manager Status Interface.
3. If fetch crediting is enabled, the user logic is required to provide a credit for each descriptor that
should be fetched.
4. If descriptors are available and either fetch credits are disabled or are non-zero, the descriptor
engine will generate a descriptor fetch to PCIe. The number of fetched descriptors is further
qualified by the PCIe Max Read Request Size (MRRS) and descriptor fetch credits, if enabled. A
descriptor fetch can also be stalled due to insufficient completion space. In each direction, C2H
and H2C are allocated 256 entries for descriptor fetch completions. Each entry is the width of the
datapath. If sufficient space is available, the fetch is allowed to proceed. A given queue can only
have one descriptor fetch pending on PCIe at any time.
5. The host receives the read request and provides the descriptor read completion to the descriptor
engine.
6. Descriptors are stored in a buffer until they can be offloaded. If the queue is configured in bypass
mode, the descriptors are sent to the Descriptor Bypass Output port. Otherwise they are
delivered directly to a DMA engine. Once delivered, the descriptor fetch completion buffer space
is deallocated.

✎ Note: Available descriptors are always <ring size> - 2. At any time, the software should not update
the PIDX to more than <ring size> - 2.
For example, if queue size is 8, which contains the entry index 0 to 7, the last entry (index 7) is
reserved for status. This index should never be used for the PIDX update, and the PIDX update
should never be equal to CIDX. For this case, if CIDX is 0, the maximum PIDX update would be 6.

Internal Mode

A queue can be configured to operate in Descriptor Bypass mode or Internal mode by setting the
software context bypass field. In internal mode, the queue requires no external user logic to handle
descriptors. Descriptors that are fetched by the descriptor engine are delivered directly to the
appropriate DMA engine and processed. Internal mode allows credit fetching and status updates to
the user logic for run time customization of the descriptor fetch behavior.

Internal Mode Writeback and Interrupts (AXI MM and H2C ST)

Status writebacks and/or interrupts are generated automatically by hardware based on the queue
context. When wbi_intvl_en is set, writebacks/interrupts will be sent based on the interval selected
in the register QDMA_GLBL_DSC_CFG.wb_intvl. Due to the slow nature of interrupts, in interval
mode, interrupts may be late or skip intervals. If the wbi_chk context bit is set, a writeback/interrupt
will be sent when the descriptor engine has detected that the last descriptor at the current PIDX has
completed. It is recommended that the wbi_chk bit be set for all internal mode operation, including when
interval mode is enabled. An interrupt will not be generated until the irq_arm bit has been set by
software. Once an interrupt has been sent the irq_arm bit is cleared by hardware. Should an interrupt
be needed when the irq_arm bit is not set, the interrupt will be held in a pending state until the
irq_arm bit is set.
Descriptor completion is defined to be when the descriptor data transfer has completed and its write
data has been acknowledged on AXI (H2C bresp for AXI MM, Valid/Ready of ST), or been accepted
by the PCIE Controller’s transaction layer for transmission (C2H MM).

Descriptor Bypass Mode

Descriptor Bypass mode also supports crediting and status updates to user logic. In addition,
Descriptor Bypass mode allows the user logic to customize processing of descriptors and status
updates. Descriptors fetched by the descriptor engine are delivered to user logic through the
descriptor bypass out interface. This allows user logic to pre-process or store the descriptors, if
desired. On the descriptor bypass out interface, the descriptors can be a custom format (adhering to
the descriptor size). To perform DMA operations, the user logic drives descriptors (must be QDMA
format) into the descriptor bypass input interface.

Descriptor Bypass Mode Writeback/Interrupts

In bypass mode, the user logic has explicit control over status updates to the host, and marker
responses back to user logic. Along with each descriptor submitted to the Descriptor Bypass Input
Port for a Memory Mapped Engine (H2C and C2H) or H2C Stream DMA engine, there are CIDX and
sdi fields. The CIDX is used to identify which descriptor has completed in any status update (host
writeback, marker response, or coalesced interrupt) generated at the completion of the descriptor. If
the sdi field of the descriptor is set on input, a writeback to the host is generated if the context
wbk_en bit is set. An interrupt can also be sent for a descriptor with sdi set if the context irq_en and irq_arm
bits are set.
If interrupts are enabled, the user logic must monitor the traffic manager output for the irq_arm. After
the irq_arm bit has been observed for the queue, a descriptor with the sdi bit will be sent to the
DMA. Once a descriptor with the sdi bit has been sent, another irq_arm assertion must be observed
before another descriptor with the sdi bit can be sent. If the user sets the sdi bit when the arm bit
has not been properly observed, an interrupt may or may not be sent, and software might hang
indefinitely waiting for an interrupt. When interrupts are not enabled, setting the sdi bit has no
restriction. However, excessive writeback events can severely reduce the descriptor engine
performance and consume write bandwidth to the host.
Descriptor completion is defined to be when the descriptor data transfer has completed and its write
data has been acknowledged on AXI4 (H2C bresp for AXI MM, Valid/Ready of ST), or been accepted
by the PCIE Controller’s transaction layer for transmission (C2H MM).

Marker Response

Marker responses can be generated for any descriptor by setting the mrkr_req bit. Marker responses
are generated after the descriptor is completed. Similar to host writebacks, excessive marker
response requests can reduce descriptor engine performance. Marker responses to the user logic can
also be sent with the sbi bit if configured in the context. The Marker responses are sent on Queue
Status ports which can be identified by the queue id.
Descriptor completion is defined as when the descriptor data transfer has completed and its write data
is acknowledged on AXI (H2C bresp for AXI MM, Valid/Ready of ST), or is accepted by the PCIE
Controller’s transaction layer for transmission (C2H MM).

Traffic Manager Output Interface

The traffic manager interface provides details of a queue’s status to user logic, allowing user logic to
manage descriptor fetching and execution. In normal operation, for an enabled queue, each time the
irq_arm bit is asserted or PIDX of a queue is updated, the descriptor engine asserts
tm_dsc_sts_valid. The tm_dsc_sts_avl signal indicates the number of new descriptors available
since the last update. Through this mechanism, user logic can track the amount of work available for
each queue. This can be used for prioritizing fetches through the descriptor engine’s fetch crediting
mechanism or other user optimizations. On the valid cycle, the tm_dsc_sts_irq_arm indicates that
the irq_arm bit was zero and was set. In bypass mode, this is essentially a credit for an interrupt for
this queue. When a queue is invalidated by software or due to error, the tm_dsc_sts_qinv bit will be
set. If this bit is observed, the descriptor engine will have halted new descriptor fetches for that queue.
In this case, the contents on tm_dsc_sts_avl indicate the number of available fetch credits held by
the descriptor engine. This information can be used to help user logic reconcile the number of credits
given to the descriptor engine, and the number of descriptors it should expect to receive. Even after
tm_dsc_sts_qinv is asserted, valid descriptors already in the fetch pipeline will continue to be
delivered to the DMA engine (internal mode) or delivered to the descriptor bypass output port (bypass
mode).
Other fields of the tm_dsc_sts interface identify the queue id, DMA direction (H2C or C2H), internal
or bypass mode, stream or memory mapped mode, queue enable status, queue error status, and port
ID.
While the tm_dsc_sts interface is a valid/ready interface, it should not be back-pressured for optimal
performance. Since multiple events trigger a tm_dsc_sts cycle, if internal buffering is filled, descriptor
fetching will be halted to prevent generation of new events.

Descriptor Credit Input Interface

The credit interface is relevant when a queue’s fcrd_en context bit is set. It allows the user logic to
prioritize and meter descriptors fetched for each queue. You can specify the DMA direction, qid, and
credit value. For a typical use case, the descriptor engine uses credit inputs to fetch descriptors.
Internally, credits received and consumed are tracked for each queue. If credits are added when the
queue is not enabled, the credits will be returned through the Traffic Manager Output Interface with
tm_dsc_sts_qinv asserted, and the credits in tm_dsc_sts_avl are not valid. Monitor the tm_dsc_sts
interface to track, for each queue, how many credits have been consumed.

Errors

Errors can potentially occur during both descriptor fetch and descriptor execution. In both cases, once
an error is detected for a queue it will invalidate the queue, log an error bit in the context, stop fetching
new descriptors for the queue which encountered the error, and can also log errors in status registers.
If enabled for writeback, interrupts, or marker response, the DMA will generate a status update to
these interfaces. Once this is done, no additional writeback, interrupts, or marker responses (internal
mode) will be sent for the queue until the queue context is cleared. As a result of the queue
invalidation due to an error, a Traffic Manager Output cycle will also be generated to indicate the error
and queue invalidation. After the queue is invalidated, if there is an error you can determine the cause
by reading the error registers and context for that queue. You must clear and remove that queue, and
then add the queue back later when needed.
Although additional descriptor fetches will be halted, fetches already in the pipeline will continue to be
processed and descriptors will be delivered to a DMA engine or Descriptor Bypass Out interface as
usual. If the descriptor fetch itself encounters an error, the descriptor will be marked with an error bit.
If the error bit is set, the contents of the descriptor should be considered invalid. It is possible that
subsequent descriptor fetches for the same queue do not encounter an error and will not have the
error bit set.

Memory Mapped DMA

In memory mapped DMA operations, both the source and destination of the DMA are memory
mapped space. In an H2C transfer, the source address belongs to PCIe address space while the
destination address belongs to AXI MM address space. In a C2H transfer, the source address belongs
to AXI MM address space while the destination address belongs to PCIe address space. PCIe-to-
PCIe, and AXI MM-to-AXI MM DMAs are not supported. Aside from the direction of the DMA transfer,
H2C and C2H DMA behave similarly and share the same descriptor format.

Operation

The memory mapped DMA engines (H2C and C2H) are enabled by setting the run bit in the Memory
Mapped Engine Control Register. When the run bit is deasserted, descriptors can be dropped. Any
descriptors that have already started the source buffer fetch will continue to be processed.
Reassertion of the run bit will result in resetting internal engine state and should only be done when
the engine is quiesced. Descriptors are received from either the descriptor engine directly or the
Descriptor Bypass Input interface. Any queue that is in internal mode should not be given descriptors
through the Descriptor Bypass Input interface. Any descriptor sent to an MM engine that is not running
will be dropped. For configurations where a mix of Internal Mode queues and Bypass Mode queues
are enabled, round robin arbitration is performed to establish order.
The DMA Memory Mapped engine first generates the read request to the source interface, splitting
the descriptor at alignment boundaries specific to the interface. Both PCIe and AXI read interfaces
can be configured to split at different alignments. Completion space for read data is preallocated
when the read is issued. Likewise for the write requests, the DMA engine will split at appropriate
alignments. On the AXI interface each engine will use a single AXI ID. The DMA engine will reorder
the read completion/write data to the order in which the reads were issued. Once sufficient read
completion data is received the write request will be issued to the destination interface in the same
order that the read data was requested. Before the request is retired, the destination interfaces must
accept all the write data and provide a completion response. For PCIe the write completion is issued
when the write request has been accepted by the transaction layer and will be sent on the link next.
For the AXI Memory Mapped interface, the bresp is the completion criteria. Once the completion
criteria has been met, the host writeback, interrupt and/or marker response is generated for the
descriptor as appropriate.
The DMA Memory Mapped engines also support the no_dma field of the Descriptor Bypass Input, and
zero-length DMA. Both cases are treated identically in the engine. The descriptors propagate through
the DMA engine as all other descriptors, so descriptor ordering within a queue is still observed.
However no DMA read or write requests are generated. The status update (writeback, interrupt,
and/or marker response) for zero-length/no_dma descriptors is processed when all previous
descriptors have completed their status update checks.

Errors

There are two primary error categories for the DMA Memory Mapped Engine. The first is an error bit
that is set with an incoming descriptor. In this case, the DMA operation of the descriptor is not
processed but the descriptor will proceed through the engine to status update phase with an error
indication. This should result in a writeback, interrupt, and/or marker response depending on context
and configuration. It will also result in the queue being invalidated. The second category of errors for
the DMA Memory Mapped Engine are errors encountered during the execution of the DMA itself. This
can include PCIe read completions errors, and AXI bresp errors (H2C), or AXI bresp errors and PCIe
write errors due to bus master enable or function level reset (FLR), as well as RAM ECC errors. The
first enabled error is logged in the DMA engine. Please refer to the Memory Mapped Engine error
logs. If an error occurs on the read, the DMA write will be aborted if possible. If the error was detected
when pulling write data from RAM, it is not possible to abort the request. Instead invalid data parity
will be generated to ensure the destination is aware of the problem. After the descriptor which
encountered the error has gone through the DMA engine, it will proceed to generate status updates
with an error indication. As with descriptor errors, it will result in the queue being invalidated.

AXI Memory Mapped Descriptor for H2C and C2H (32B)

Table: AXI Memory Mapped Descriptor Structure for H2C and C2H

Bit | Bit Width | Field Name | Description
[255:192] | 64 | reserved | Reserved
[191:128] | 64 | dst_addr | Destination Address
[127:92] | 36 | reserved | Reserved
[91:64] | 28 | lengthInByte | Read length in bytes
[63:0] | 64 | src_addr | Source Address


Internal mode memory mapped DMA must configure the descriptor queue to be 32B and follow the
above descriptor format. In bypass mode, the descriptor format is defined by the user logic, which must
drive the H2C or C2H MM bypass input port.
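
One possible C view of the 32B internal-mode descriptor follows. Bit-field layout is compiler and ABI dependent, so a production driver would typically pack the fields explicitly; this struct is only a sketch of the field arrangement, not a definition taken from this guide.

#include <stdint.h>

struct qdma_mm_desc {
    uint64_t src_addr;          /* [63:0]    source address      */
    uint64_t len   : 28;        /* [91:64]   DMA length in bytes */
    uint64_t rsvd0 : 36;        /* [127:92]  reserved            */
    uint64_t dst_addr;          /* [191:128] destination address */
    uint64_t rsvd1;             /* [255:192] reserved            */
};

/* sizeof(struct qdma_mm_desc) is expected to be 32 bytes on an LP64
 * target, matching the 32B descriptor size required for MM queues.  */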

AXI Memory Mapped Writeback Status Structure for H2C and C2H

The MM writeback status register is located after the last entry of the (H2C or C2H) descriptor.

Table: AXI Memory Mapped Writeback Status Structure for H2C and C2H

Bit | Bit Width | Field Name | Description
[63:48] | 16 | reserved | Reserved
[47:32] | 16 | pidx | Producer Index at the time of writeback
[31:16] | 16 | cidx | Consumer Index
[15:2] | 14 | reserved | Reserved
[1:0] | 2 | err | Error. Bit 1: descriptor fetch error. Bit 0: DMA error.
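
A sketch of how a driver might read this status descriptor to reclaim completed descriptors is shown below; the helper, the field-extraction macros, and the volatile 64-bit read are assumptions for the example.

#include <stdint.h>

#define WB_CIDX(s)  ((uint16_t)(((s) >> 16) & 0xFFFF))   /* [31:16] */
#define WB_PIDX(s)  ((uint16_t)(((s) >> 32) & 0xFFFF))   /* [47:32] */
#define WB_ERR(s)   ((uint8_t)((s) & 0x3))               /* [1:0]   */

/* Returns the number of descriptors newly completed since last_cidx,
 * or a negative value if the status word reports an error.          */
static int mm_wb_poll(const volatile uint64_t *status_desc,
                      uint16_t last_cidx, uint16_t ring_size)
{
    uint64_t s = *status_desc;
    uint16_t cidx;

    if (WB_ERR(s))
        return -(int)WB_ERR(s);  /* bit 1: descriptor fetch error, bit 0: DMA error */

    /* Descriptor indices wrap modulo (ring_size - 1); see the ring rules above. */
    cidx = WB_CIDX(s);
    return (cidx >= last_cidx) ? (cidx - last_cidx)
                               : (ring_size - 1 - last_cidx + cidx);
}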

Stream Mode DMA

H2C Stream Engine

The H2C Stream Engine is responsible for transferring streaming data from the host and delivering it
to the user logic. The H2C Stream Engine operates on H2C stream descriptors. Each descriptor
specifies the start address and the length of the data to be transferred to the user logic. The H2C
Stream Engine parses the descriptor and issues read requests to the host over PCIe, splitting the
read requests at the MRRS boundary. There can be up to 256 requests outstanding in the H2C
Stream Engine to hide the host read latency. The H2C Stream Engine implements a re-ordering buffer
of 32 KB to re-order the TLPs as they come back. Data is issued to the user logic in order of the
requests sent to PCIe.
If the status descriptor is enabled in the associated H2C context, the engine could additionally send a
status write back to host once it is done issuing data to the user logic.

Internal and Bypass Modes

Each queue in QDMA can be programmed in either of the two H2C Stream modes: internal and
bypass. This is done by specifying the mode in the queue context. The H2C Stream Engine knows
whether the descriptor being processed is for a queue in internal or bypass mode.
The following figures show the internal mode and bypass mode flows.

Figure: H2C Internal Mode Flow

Figure: H2C Bypass Mode Flow

For a queue in internal mode, after the descriptor is fetched from the host, it is fed straight to the H2C
Stream Engine for processing. In this case, a packet of data cannot span over multiple descriptors.
Thus for a queue in internal mode, each descriptor generates exactly one AXI4-Stream packet on the
QDMA H2C AXI4-Stream output. If the packet is present in host memory in non-contiguous space,
then it has to be defined by more than one descriptor and this requires that the queue be programmed
in bypass mode.
In the bypass mode, after the descriptors are fetched from the host, they are sent straight to the user
logic via the QDMA bypass output port. The QDMA does not parse these descriptors at all. The user
logic can store these descriptors and then send the required information from these descriptors back
to QDMA using the QDMA H2C Stream descriptor bypass-in interface. Using this information, the
QDMA constructs descriptors which are then fed to the H2C Stream Engine for processing.
When fcrd_en is enabled in the software context, the DMA waits for the user application to provide
credits ("Credit return" in the figure above). When fcrd_en is not set, the DMA uses the pointer update
to fetch descriptors and send them out; the user application should not send in credits, and "Credit
return" in the figure above does not apply in this case.
The following are the advantages of using the bypass mode:

The user logic can have a custom descriptor format. This is possible because QDMA does not
parse descriptors for queues in bypass mode. The user logic parses these descriptors and
provides the information required by the QDMA on the H2C Stream bypass-in interface.
Immediate data can be passed from the software to the user logic without DMA operation.
The user logic can do traffic management by sending the descriptors to the QDMA when it is
ready to sink all the data. Descriptors can be cached in local RAM.
Perform address translation.

There are some requirements imposed on the user logic when using the bypass mode. Because the
bypass mode allows a packet to span multiple descriptors, the user logic needs to indicate to QDMA
which descriptor marks the Start-Of-Packet (SOP) and which marks the End-Of-Packet (EOP). At the
QDMA H2C Stream bypass-in interface, among other pieces of information, the user logic needs to
provide: Address, Length, SOP, and EOP. It is required that once the user logic feeds SOP descriptor
information into QDMA, it must eventually feed EOP descriptor information also. Descriptors for these
multi-descriptor packets must be fed in sequentially. Other descriptors not belonging to the packet
must not be interleaved within the multi-descriptor packet. The user logic must accumulate the
descriptors up to the EOP descriptor, before feeding them back to QDMA. Not doing so can result in a
hang. The QDMA will generate a TLAST at the QDMA H2C AXI Stream data output once it issues the
last beat for the EOP descriptor. This is guaranteed because the user is required to submit the
descriptors for a given packet sequentially.
The H2C stream interface is shared by all the queues, and has the potential for a head of line
blocking issue if the user logic does not reserve the space to sink the packet. Quality of service can
be severely affected if the packet sizes are large. The Stream engine is designed to saturate PCIe for
packet sizes as low as 128B, so AMD recommends that you restrict the packet size to be host page
size or maximum transfer unit as required by the user application.
A performance control provided in the H2C Stream Engine is the ability to stall requests from being
issued to the PCIe RQ/RC if a certain amount of data is outstanding on the PCIe side as seen by the
H2C Stream Engine. To use this feature, the SW must program a threshold value in the
H2C_REQ_THROT (0xE24) register. After the H2C Stream Engine has more data outstanding to be
delivered to the user logic than this threshold, it stops sending further read requests to the PCIe
RQ/RC. This feature is disabled by default and can be enabled with the H2C_REQ_THROT (0xE24)
register. This feature helps improve the C2H Stream performance, because the H2C Stream Engine
can make requests at a much faster rate than the C2H Stream Engine. This can potentially use up the
PCIe side resources for H2C traffic which results in C2H traffic suffering. The H2C_REQ_THROT
(0xE24) register also allows the SW to separately enable and program the threshold of the maximum
number of read requests that can be outstanding in the H2C Stream engine. Thus, this register can
be used to individually enable and program the thresholds for the outstanding requests and data in
the H2C Stream engine.
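
Conceptually, enabling the throttle amounts to programming an enable bit and a data threshold into H2C_REQ_THROT (0xE24). The sketch below is illustrative only: the field positions and masks are placeholders, not the real bit layout, which must be taken from the register map file.

#include <stdint.h>

#define QDMA_H2C_REQ_THROT_OFF  0xE24u   /* offset given in the text above */

/* Hypothetical field encodings, for illustration only. */
#define THROT_DATA_EN           (1u << 31)
#define THROT_DATA_THRESH(x)    ((uint32_t)(x) & 0x1FFFFu)

static void h2c_req_throttle_enable(volatile uint32_t *qdma_regs,
                                    uint32_t data_thresh)
{
    /* Register space assumed to be mapped as 32-bit words. */
    qdma_regs[QDMA_H2C_REQ_THROT_OFF / 4] =
        THROT_DATA_EN | THROT_DATA_THRESH(data_thresh);
}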

H2C Stream Descriptor (16B)

Table: H2C Descriptor Structure

Bit | Bit Width | Field Name | Description
[127:96] | 32 | addr_h | Address High. Higher 32 bits of the source address in the host.
[95:64] | 32 | addr_l | Address Low. Lower 32 bits of the source address in the host.
[63:48] | 16 | reserved | Reserved
[47:32] | 16 | len | Packet Length. Length of the data to be fetched for this descriptor. This is also the packet length, because in internal mode a packet cannot span multiple descriptors. The maximum length of the packet can be 64K-1 bytes.
[31:0] | 32 | metadata | Metadata. QDMA passes this field on the H2C-ST TUSER along with the data on every beat. For a queue in internal mode, it can be used to pass messages from SW to the user logic along with the data.

This H2C descriptor format is only applicable for internal mode. For bypass mode, the user logic can
define its own format as needed by the user application.
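
A C view of the 16B internal-mode H2C stream descriptor might look like the following; the struct itself is illustrative and assumes a little-endian, naturally packed layout.

#include <stdint.h>

struct qdma_h2c_st_desc {
    uint32_t metadata;   /* [31:0]   passed to user logic on the H2C-ST TUSER */
    uint16_t len;        /* [47:32]  packet length, up to 64K-1 bytes         */
    uint16_t rsvd;       /* [63:48]  reserved                                 */
    uint32_t addr_l;     /* [95:64]  lower 32 bits of the host source address */
    uint32_t addr_h;     /* [127:96] upper 32 bits of the host source address */
};

/* sizeof(struct qdma_h2c_st_desc) == 16, matching the 16B descriptor size
 * required for H2C stream queues in internal mode.                          */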

Descriptor Metadata

Similar to bypass mode, the internal mode also provides a mechanism to pass information directly
from the software to the user logic. In addition to address and length, the H2C Stream descriptor also
has a 32b metadata field. This field is not used by the QDMA for the DMA operation. Instead, it is
passed on to the user logic on the H2C AXI4-Stream tuser on every beat of the packet. Passing
metadata on the tuser is not supported for a queue in bypass mode and consequently there is no
input to provide the metadata on the QDMA H2C Stream bypass-in interface.

Zero Length Descriptor

The length field in a descriptor can be zero. In this case, the H2C Stream Engine will issue a zero
byte read request on PCIe. After the QDMA receives the completion for the request, the H2C Stream
Engine will send out one beat of data with tlast on the QDMA H2C AXI4-Stream interface. The zero
byte packet will be indicated on the interface by setting the zero_b_dma bit in the tuser. The user
logic must set both the SOP and EOP for a zero byte descriptor. If not done, an error will be flagged
by the H2C Stream Engine.

H2C Stream Status Descriptor Writeback

When feeding the descriptor information on the bypass input interface, the user logic can request the
QDMA to send a status write back to the host when it is done fetching the data from the host. The
user logic can also request that a status be issued to it when the DMA is done. These behaviors can
be controlled using the sdi and mrkr_req inputs in the bypass input interface.
The H2C writeback status register is located after the last entry of the H2C descriptor list.
✎ Note: The format of the H2C-ST status descriptor written to the descriptor ring is different from
that written into the interrupt coalesce entry.

Table: AXI4-Stream H2C Writeback Status Descriptor Structure

Bit | Bit Width | Field Name | Description
[63:48] | 16 | reserved | Reserved
[47:32] | 16 | pidx | Producer Index
[31:16] | 16 | cidx | Consumer Index
[15:2] | 14 | reserved | Reserved
[1:0] | 2 | error | Error. 0x0: No error. 0x1: A descriptor or data error was encountered on this queue. 0x2 and 0x3: Reserved.

H2C Stream Data Aligner

The H2C engine has a data aligner that aligns the data to a zero byte (0B) offset before issuing it
to the user logic. This allows the start address of a descriptor to be arbitrarily aligned while the data is
still received on the H2C AXI4-Stream data bus without any holes at the beginning of the data. The user
logic can send a batch of descriptors from SOP to EOP with arbitrary address and length alignments
for each descriptor. The aligner will align and pack the data from the different descriptors and will
issue a continuous stream of data on the H2C AXI4-Stream data bus. The tlast on that interface will
be asserted when the last beat for the EOP descriptor is being issued.

Handling Descriptors With Errors

If an error is encountered while fetching a descriptor, the QDMA Descriptor Engine flags the descriptor
with error. For a queue in internal mode, the H2C Stream Engine handles the error descriptor by not
performing any PCIe or DMA activity. Instead, it waits for the error descriptor to pass through the
pipeline and forces a writeback after it is done. For a queue in bypass mode, it is the responsibility of
the user logic to not issue a batch of descriptors with an error descriptor. Instead, it must send just
one descriptor with error input asserted on the H2C Stream bypass-in interface and set the SOP,
EOP, no_dma signal, and sdi or mrkr-req signal to make the H2C Stream Engine send a writeback
to Host.

Handling Errors in Data From PCIe

If the H2C Stream Engine encounters an error coming from PCIe on the data, it keeps the error sticky
across the full packet. The error is indicated to the user on the err bit on the H2C Stream Data
Output. Once the H2C Stream sends out the last beat of a packet that saw a PCIe data error, it also
sends a Writeback to the Software to inform it about the error.

C2H Stream Engine

The C2H Stream Engine DMA writes the stream packets to the host memory buffers described by the
descriptors provided by the host driver through the C2H descriptor queue.
The Prefetch Engine is responsible for calculating the number of descriptors needed for the DMA that
is writing the packet. The buffer size is fixed on a per-queue basis. For internal and cached bypass mode,
the prefetch module can fetch up to 512 descriptors for a maximum of 64 different queues at any
given time.
The Prefetch Engine also offers a low latency feature (pfch_en = 1), where the engine can prefetch up
to qdma_c2h_pfch_cfg.num_pfch descriptors upon receiving the packet, so that subsequent packets
can avoid the PCIe latency.
The QDMA requires software to post the full ring size so the C2H stream engine can fetch the needed
number of descriptors for all received packets. If there are not enough descriptors in the descriptor
ring, the QDMA will stall the packet transfer. For performance reasons, the software is required to post
the PIDX as soon as possible to ensure there are always enough descriptors in the ring.
C2H stream packet data length is limited to 7 * C2H buffer size. The C2H buffer size can be programmed
at addresses 0xAB0 to 0xAEC. For details, see cpm4-qdma-v2-1-registers.csv available in the register
map files.
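
The length limit above can be captured in a trivial check; the helper name is illustrative, and the buffer size is whatever has been programmed for the queue.

#include <stdbool.h>
#include <stdint.h>

/* C2H stream packet data length is limited to 7 * C2H buffer size. */
static bool c2h_pkt_len_ok(uint32_t pkt_len, uint32_t c2h_buf_size)
{
    return pkt_len <= 7u * c2h_buf_size;
}

/* Example: with a 4 KB buffer size the largest C2H stream packet is
 * 7 * 4096 = 28672 bytes.                                            */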

C2H Stream Descriptor (8B)

Table: AXI4-Stream C2H Descriptor Structure

Bit | Bit Width | Field Name | Description
[63:0] | 64 | addr | Destination Address


C2H Prefetch Engine

The prefetch engine sits between the descriptor fetch engine and the C2H DMA write engine to pair
up each descriptor with its payload.

Table: C2H Prefetch Context Structure

Bit | Bit Width | Field Name | Description
[45] | 1 | valid | Context is valid.
[44:29] | 16 | sw_crdt | Software credit. This field is written by the hardware for internal use. The software must initialize it to 0 and then treat it as read-only.
[28] | 1 | pfch | Queue is in prefetch. This field is written by the hardware for internal use. The software must initialize it to 0 and then treat it as read-only.
[27] | 1 | pfch_en | Enable prefetch.
[26] | 1 | err | Error detected on this queue. Any error detected during the descriptor prefetch process is logged here, on a per-queue basis.
[25:8] | 18 | reserved | Reserved
[7:5] | 3 | port_id | Port ID
[4:1] | 4 | buf_size_idx | Buffer size index
[0] | 1 | bypass | C2H is in bypass mode.

C2H Stream Modes

The C2H descriptors can come from the descriptor fetch engine or from the C2H bypass input interface. The
descriptors from the descriptor fetch engine are always in cache mode, and the prefetch engine keeps
the order of the descriptors to pair with the C2H data packets from the user. The C2H bypass input is a
single interface that serves both simple mode and cache mode (both simple bypass and cache bypass
use the same interface). For simple mode, the user application keeps the order of the descriptors to
pair with the C2H data packets. For cache mode, the prefetch engine keeps the order of the
descriptors to pair with the C2H data packets from the user.
The prefetch context has a bypass bit. When it is 1'b1, the user application sends the credits for the
descriptors. When it is 1'b0, the prefetch engine handles the credits for the descriptors.

The descriptor context has a bypass bit. When it is 1'b1, the descriptor fetch engine sends out the
descriptors on the C2H bypass output interface. The user application can convert it and later loop it
back to the QDMA on the C2H bypass input interface. When the bypass context bit is 1'b0, the
descriptor fetch engine sends the descriptors to the prefetch engine directly.
On a per queue basis, three cases are supported.
✎ Note: If you already have the descriptor cached on the device, there is no need to fetch one from
the host and you should follow the simple bypass mode for the C2H Stream application. In simple
bypass mode, do not provide credits to fetch the descriptor, and instead, you need to send in the
descriptor on the descriptor bypass interface.
✎ Note: AXI4-Stream C2H Simple Bypass mode and Cache Bypass mode both use same bypass in
ports (c2h_byp_in_st_csh_* ports).

Table: C2H Stream Modes

Mode | c2h_byp_in | desc_ctxt.desc_byp | pfch_ctxt.bypass
Simple bypass mode | simple byp in | 1 | 1
Cache bypass mode | cache byp in | 1 | 0
Cache internal mode | N/A | 0 | 0

Simple Bypass Mode


For simple bypass mode, the descriptor fetch engine sends the descriptors out on the C2H bypass
out interface. The user application converts the descriptor and loops it back to the QDMA on the
simple mode C2H bypass input interface. The user application sends the credits for the descriptors,
and it also keeps the order of the descriptors.

Figure: C2H Simple Bypass Mode Flow


If you already have descriptors, there is no need to update the pointers or provide credits. Instead,
send the descriptors in the descriptor bypass interface, and send the data and Completion (CMPT)
packets.

Cache Bypass Mode


For cache bypass mode, the descriptor fetch engine sends the descriptors out on the C2H bypass
output interface. The user application converts the descriptor and loops it back to the QDMA on the
cache mode C2H bypass input interface. The prefetch engine sends the credits for the descriptors,
and it keeps the order of the descriptors.
For cache internal mode, the descriptor fetch engine sends the descriptors to the prefetch engine.
The prefetch engine sends out the credits for the descriptors and keeps the order of the descriptors.
In this case, the descriptors do not go out on the C2H bypass output and do not come back on the
C2H bypass input interfaces.
In cache bypass or internal mode, prefetch mode can be turned on, which prefetches the descriptors
and reduces transfer latency significantly. When prefetch mode is enabled, the user application cannot
send credits on the QDMA Descriptor Credit input ports; credits for all queues are
maintained by the prefetch engine.
In cache bypass mode, the c2h_byp_out_pfch_tag[6:0] signal should be looped back as the input
c2h_byp_in_st_csh_pfch_tag[6:0]. The prefetch tag points to the CAM that stores the active
queues in the prefetch engine.

Figure: C2H Cache Bypass Mode Flow


C2H Stream Packet Type

The following are some of the different C2H stream packets.

Regular Data Packet


The regular C2H data packet can span multiple beats.

dma0_s_axis_c2h_ctrl_qid = qid of the packet
dma0_s_axis_c2h_ctrl_len = length of the packet
dma0_s_axis_c2h_mty = empty bytes in the beat

Immediate Data Packet


The user logic can mark a data packet as 'immediate' to write to just the Completion ring without
having a corresponding data packet transfer to the host. For the immediate data packet, the QDMA
will not send the data payload, but it will write to the CMPT Queue. The immediate packet does not
consume a descriptor.
The following is the setting of the immediate data packet:

1 beat of data
dma0_s_axis_c2h_ctrl_imm_data = 1’b1
dma0_s_axis_c2h_ctrl_len = datapath width in bytes (for example, 64 if the data width is 512 bits)
dma0_s_axis_c2h_mty = 0


Marker Packet
The C2H Stream Engine of the QDMA provides a way for the user logic to insert a marker into the
QDMA along with a C2H packet. This marker then propagates through the C2H Engine pipeline and
comes out on the C2H Stream Descriptor Bypass Out interface. The marker is inserted by setting the
marker bit in the C2H Stream Control input. The marker response is indicated by QDMA to the user
logic by setting the mrkr_rsp bit on the C2H Stream Descriptor Bypass Out interface. For a marker,
QDMA does not send out a payload packet but still writes to the Completion Ring. Not all marker
responses are generated because of a corresponding marker request. The QDMA sometimes
generates marker responses when it encounters exceptional events. See the following section for
details about when QDMA internally generates marker responses.
The primary purpose of giving the user logic the ability to send a marker into the QDMA is to
determine when all the traffic prior to the marker has been flushed. This can be used in the shutdown
sequence in the user logic. Although not a requirement, the user logic should set the user_trig bit
when sending the marker into the QDMA. This allows the QDMA to generate
an interrupt and truly ensures that all traffic prior to the marker is flushed out. The QDMA Completion
Engine takes the following actions when it receives a marker from the user logic:

Sends the Completion that came along with the marker to the C2H Stream Completion Ring
Generates Status Descriptor if enabled (if user_trig was set when the marker was inserted)
Generates an Interrupt if enabled and not outstanding
Sends the marker response. If an Interrupt was not sent due to it being enabled but outstanding,
the ‘retry_mrkr’ bit in the marker response is set to inform the user that an Interrupt could not be
sent for this marker request. See the C2H Stream Descriptor Bypass Output interface
description for details of these fields.

The following is the setting of the marker data packet:

1 beat of data
dma0_s_axis_c2h_ctrl_marker = 1’b1
dma0_s_axis_c2h_ctrl_len = data width in bytes (for example, 64 if the data width is 512 bits)
dma0_s_axis_c2h_mty = 0

The immediate data packet and the marker packet don't consume the descriptor, but they write to the
C2H Completion Ring. The software needs to size the C2H Completion Ring large enough to
accommodate the outstanding immediate packets and the marker packets.

Zero Length Packet


The length of the data packet can be 0. On the input, the user needs to send 1 beat of data. The zero
length packet consumes the descriptor. The QDMA will send out 1DW payload data.
The following is the setting of the zero length packet:

1 beat of data
dma0_s_axis_c2h_ctrl_len = 0
dma0_s_axis_c2h_mty = 0


Disable Completion Packet


The user can disable the completion for a specific packet. The QDMA will DMA the payload, but it will
not write to the C2H Completion Ring.
The following is the setting of the disable completion packet:

dma0_s_axis_c2h_ctrl_disable_cmp = 1

Handling Descriptors With Errors

If an error is encountered while fetching a descriptor (in pre-fetch or regular mode), the QDMA
Descriptor Engine flags the descriptor with error. For a queue in internal mode, the C2H Stream
Engine handles the error descriptor by not performing any PCIe or DMA activity. Instead, it waits for
the error descriptor to pass through the pipeline and forces a writeback after it is done. For a queue in
bypass mode, it is the responsibility of the user logic to not issue a batch of descriptors with an error
descriptor. Instead, it must send just one descriptor with error input asserted on the C2H Stream
bypass-in interface and set the SOP, EOP, no_dma signal, and sdi or mrkr_req signal to make the
C2H Stream Engine send a writeback to Host.

C2H Completion

When the DMA write of the data payload is done, the QDMA writes the CMPT packet into the CMPT
queue. Besides the user defined data, the CMPT packet also includes some other information, such
as error, color, and the length. It also has a desc_used bit to indicate if the packet consumes a
descriptor. A C2H data packet of immediate-data or marker type does not consume any descriptor.

C2H Completion Context Structure

The completion context is used by the completion engine.

Table: C2H Completion Context Structure Definition

Bit Bit Width Field Name Description

[127:126] 2 rsvd Reserved

[125] 1 full_upd Full Update


If reset, then the Completion-CIDX-update
is allowed to update only the CIDX in this
context.
If set, then the Completion CIDX update
can update the following fields in this
context:
timer_ix
counter_ix
trig_mode
en_int



en_stat_desc

[124] 1 timer_running If set, it indicates that a timer is running on


this queue. This timer is for the purpose of
C2H interrupt moderation. Ideally, the
software must ensure that there is no
running timer on this QID before shutting
the queue down. This is a field used
internally by HW. The SW must initialize it
to 0 and then treat it as read-only.

[123] 1 user_trig_pend If set, it indicates that a user logic initiated


interrupt is pending to be generated. The
user logic can request an interrupt through
the dma0_s_axis_c2h_ctrl_user_trig signal.
This bit is set when the user logic requests
an interrupt while another one is already
pending on this QID. When the next
Completion CIDX update is received by
QDMA, this pending bit may or may not
generate an interrupt depending on
whether or not there are entries in the
Completion ring waiting to be read. This is
a field used internally by HW. The SW must
initialize it to 0 and then treat it as read-
only.

[122:121] 2 err Indicates that the C2H Completion Context


is in error. This is a field written by HW. The
SW must initialize it to 0 and then treat it as
read-only. The following errors are indicated
here:
0: No error.
1: A bad CIDX update from software was
detected.
2: A descriptor error was detected.
3: A Completion packet was sent by the
user logic when the Completion Ring was
already full.

[120] 1 valid Context is valid.

[119:104] 16 cidx Current value of the hardware copy of the


Completion Ring Consumer Index.


[103:88] 16 pidx Completion Ring Producer Index. This is a


field written by HW. The SW must initialize
it to 0 and then treat it as read-only.

[87:86] 2 desc_size Completion Entry Size:


0: 8B
1: 16B
2: 32B
3: Unknown

[85:28] 58 baddr_64 Base address of Completion ring – bit


[63:6].

[27:24] 4 qsize_idx Completion ring size index to ring size


registers.

[23] 1 color Color bit to be used on Completion.

[22:21] 2 int_st Interrupt State:


0: ISR
1: TRIG
This is a field used internally by HW. The
SW must initialize it to 0 and then treat it as
read-only.
Out of reset, the HW initializes into the ISR state and is therefore not sensitive to trigger
events. If SW desires interrupts or status writes, it must send an initial Completion
CIDX update. This moves the HW into the TRIG state, making it sensitive to any trigger
conditions.

[20:17] 4 timer_idx Index to timer register for TIMER based


trigger modes.

[16:13] 4 counter_idx Index to counter register for COUNT based


trigger modes.

[12:5] 8 fnc_id Function ID

[4:2] 3 trig_mode Interrupt and Completion Status Write


Trigger Mode:
0x0: Disabled
0x1: Every
0x2: User_Count
0x3: User
0x4: User_Timer



0x5: User_Timer_Count

[1] 1 en_int Enable Completion interrupts.

[0] 1 en_stat_desc Enable Completion Status writes.

C2H Completion Status Structure

The C2H completion status is located at the last location of the completion ring, that is, at Completion
Ring Base Address + (completion entry size in bytes (8, 16, or 32) * (Completion Ring Size – 1)).
When C2H Streaming Completion is enabled, after the packet is transferred, CMPT entry and CMPT
status are written to C2H Completion ring. PIDX in the Completion status can be used to indicate the
currently available completion to be processed.
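For reference, the byte address of the completion status descriptor follows directly from this formula. The short C sketch below restates it; the function and variable names are illustrative only and are not part of any QDMA driver API.

#include <stdint.h>

/* Address of the C2H completion status descriptor (the last location of
 * the Completion Ring).
 * cmpt_base   : Completion Ring base address
 * entry_bytes : completion entry size in bytes (8, 16, or 32)
 * ring_size   : total number of entries in the Completion Ring
 */
static uint64_t c2h_cmpt_status_addr(uint64_t cmpt_base,
                                     uint32_t entry_bytes,
                                     uint32_t ring_size)
{
    return cmpt_base + (uint64_t)entry_bytes * (ring_size - 1);
}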

Table: AXI4-Stream C2H Completion Status Structure

Bit Bit Width Field Name Description

[63:35] 29 Reserve Reserved

[34:33] 2 int_state Interrupt State.


0: ISR
1: TRIG

[32] 1 color Color status bit

[31:16] 16 cidx Consumer Index (RO)

[15:0] 16 pidx Producer Index

C2H Completion Entry Structure

The following is the C2H Completion ring entry structure for User format when the data format bit is
set to 1’b1.

Table: C2H Completion Entry User Format Structure

Name                                     Size      Index

User defined bits for 32 Bytes settings  252 bits  [255:4]
User defined bits for 16 Bytes settings  124 bits  [127:4]
User defined bits for 8 Bytes settings   60 bits   [63:4]
desc_used                                1         [3:3]
err                                      1         [2:2]
color                                    1         [1:1]
Data format                              1         [0:0]


The following is the C2H Completion ring entry structure for Standard format when the data format bit
is set to 1’b0.

Table: C2H Completion Entry Standard Format Structure

Name                                     Size      Index

User defined bits for 32 Bytes settings  236 bits  [255:20]
User defined bits for 16 Bytes settings  108 bits  [127:20]
User defined bits for 8 Bytes settings   44 bits   [63:20]
Len                                      16        [19:4]
desc_used                                1         [3:3]
err                                      1         [2:2]
color                                    1         [1:1]
Data format                              1         [0:0]

C2H Completion Input Packet

The Completion Ring entry structure is shown in C2H Stream Engine.


When the user application sends the C2H data packet, it also sends the CMPT (completion) packet to
the QDMA. The CMPT packet has two formats: Standard Format and User Format.
The following is the CMPT packet from the user application in the standard format, which is when the
data format bit is 1’b0.

Table: CMPT Packet in Standard Format

Name Size Index

User defined 44 bits-236 bits [255:20]

rsvd 8 [19:12]

Qid 11 [11:1]

Data format 1 [0:0]

The following is the CMPT packet from the user application in the user format, which is when the data
format bit is 1’b1.

Table: CMPT Packet in User Format

Name Size Index

User defined 61 bits-253 bits [255:3]

rsvd 2 [2:1]

Data format 1 [0:0]

The CMPT packet has three types: 8B, 16B, or 32B. When it is 8B or 16B, it only needs one beat of
the data. When it is 32B, it needs two beats of data. Each data beat is 128 bits.

C2H Interrupt/Completion Status Moderation

The QDMA provides a means to moderate the C2H completion interrupts and Completion status
writes on a per queue basis. The software can select one out of five modes for each queue. The
selected mode for a queue is stored in the QDMA in the C2H completion ring context for that queue.
After a mode has been selected for a queue, the driver can always select another mode when it
sends the completion ring CIDX update to QDMA.
The C2H completion interrupt moderation is handled by the completion engine inside the C2H engine.
The completion engine stores the C2H completion ring contexts of all the queues. It is possible to
individually enable or disable the sending of interrupts and C2H completion status descriptors for
every queue and this information is present in the completion ring context. It is worth mentioning that
the modes being described here moderate not only interrupts but also completion status writes. Also,
since interrupts and completion status writes can be individually enabled/disabled for each queue,
these modes will work only if the interrupt/completion status is enabled in the Completion context for
that queue.
The QDMA keeps only one interrupt outstanding per queue. This policy is enforced by QDMA even if
all other conditions to send an interrupt have been met for the mode. The way the QDMA considers
an interrupt serviced is by receiving a CIDX update for that queue from the driver.
The basic policy followed in all the interrupt moderation modes is that when there is no interrupt
outstanding for a queue, the QDMA keeps monitoring the trigger conditions to be met for that mode.
Once the conditions are met, an interrupt is sent out. While the QDMA subsystem is waiting for the
interrupt to be served, it remains sensitive to interrupt conditions being met and remembers them.
When the CIDX update is received, the QDMA subsystem evaluates whether the conditions are still
being met. If they are still being met, another interrupt is sent out. If they are not met, no interrupt is
sent out and QDMA resumes monitoring for the conditions to be met again.
Note that the interrupt moderation modes that the QDMA subsystem provides are not necessarily
precise. Thus, if the user application sends two C2H packets with an indication to send an interrupt, it
is not necessary that two interrupts will be generated. The main reason for this behavior is that when
the driver is interrupted to read the completion ring, it is under no obligation to read exactly up to
the completion for which the interrupt was generated. Thus, the driver may not read up to the
interrupting completion descriptor, or it may even read beyond it if there are valid descriptors to be
read there. This behavior requires the QDMA to re-evaluate the trigger conditions every time it
receives the CIDX update from the driver.
The detailed description of each mode is given below:

TRIGGER_EVERY
This mode is the most aggressive in terms of interruption frequency. The idea behind this mode
is to send an interrupt whenever the completion engine determines that an unread completion
descriptor is present in the completion ring.

TRIGGER_USER
The QDMA provides a way to send a C2H packet to the subsystem with an indication to send out
an interrupt when the subsystem is done sending the packet to the host. This allows the user
application to perform interrupt moderation when the TRIGGER_USER mode is set.

TRIGGER_USER_COUNT
In this mode, the QDMA is sensitive to either of two triggers. One of these triggers is sent by
the user along with the C2H packet. The other trigger is the presence of more than a
programmed threshold of unread Completion entries in the Completion Ring, as seen by the HW.
This threshold is driver programmable on a per-queue basis. The QDMA evaluates whether or
not to send an interrupt when either of these triggers is detected. As explained in the preceding
sections, other conditions must be satisfied in addition to the triggers for an interrupt to be sent.

TRIGGER_USER_TIMER
In this mode, the QDMA is sensitive to either of two triggers. One of these triggers is sent by the
user along with the C2H packet. The other trigger is the expiration of the timer that is associated
with the C2H queue. The period of the timer is driver programmable on a per-queue basis. The
QDMA evaluates whether or not to send an interrupt when either of these triggers is detected. As
explained in the preceding sections, other conditions must be satisfied in addition to the triggers
for an interrupt to be sent. For more information, see C2H Timer.

TRIGGER_USER_TIMER_COUNT
In this mode, the QDMA is sensitive to any of three triggers. One of these triggers is sent by
the user along with the C2H packet. The second trigger is the expiration of the timer that is
associated with the C2H queue. The period of the timer is driver programmable on a per-queue
basis. The third trigger is the presence of more than a programmed threshold of unread
Completion entries in the Completion Ring, as seen by the HW. This threshold is driver
programmable on a per-queue basis. The QDMA evaluates whether or not to send an interrupt
when any of these triggers is detected. As explained in the preceding sections, other conditions
must be satisfied in addition to the triggers for an interrupt to be sent.

TRIGGER_DIS
In this mode, the QDMA does not send C2H completion interrupts even if they are enabled
for a given queue. The only way the driver can read the completion ring in this case is by
regularly polling the ring. The driver must make use of the color bit feature provided in the
completion ring when this mode is set, because this mode also disables the sending of any completion
status descriptors to the completion ring.
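As noted above, the driver selects or changes a queue's trigger mode when it issues the Completion CIDX update. The following C sketch only illustrates packing such an update; the structure layout and the update helper are assumptions for illustration and must be taken from the register reference, while the trig_mode encodings match the C2H Completion Context table above.

#include <stdint.h>

/* Trigger modes, per the trig_mode field of the C2H Completion Context. */
enum cmpt_trig_mode {
    TRIG_DISABLED         = 0x0,
    TRIG_EVERY            = 0x1,
    TRIG_USER_COUNT       = 0x2,
    TRIG_USER             = 0x3,
    TRIG_USER_TIMER       = 0x4,
    TRIG_USER_TIMER_COUNT = 0x5,
};

/* Illustrative only: the real bit packing of the Completion CIDX update
 * must be taken from the register map. */
struct cmpt_cidx_update {
    uint16_t cidx;         /* new software consumer index                 */
    uint8_t  counter_idx;  /* index into the count threshold registers    */
    uint8_t  timer_idx;    /* index into the QDMA_C2H_TIMER_CNT registers */
    uint8_t  trig_mode;    /* one of enum cmpt_trig_mode                  */
    uint8_t  en_stat_desc; /* enable Completion Status writes             */
    uint8_t  en_int;       /* enable Completion interrupts                */
};

/* Hypothetical helper that writes the packed update for one queue. */
void qdma_cmpt_cidx_update(uint32_t qid, const struct cmpt_cidx_update *upd);

static void select_user_timer_mode(uint32_t qid, uint16_t sw_cidx, uint8_t timer_idx)
{
    struct cmpt_cidx_update upd = {
        .cidx         = sw_cidx,
        .counter_idx  = 0,
        .timer_idx    = timer_idx,
        .trig_mode    = TRIG_USER_TIMER,
        .en_stat_desc = 1,
        .en_int       = 1,
    };
    qdma_cmpt_cidx_update(qid, &upd);
}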

The following are the flowcharts of different modes. These flowcharts are from the point of view of the
C2H Completion Engine. The Completion packets come in from the user logic and are written to the

Completion Ring. The software (SW) update refers to the Completion Ring CIDX update sent from
software to hardware.

Figure: Flowchart for EVERY Mode

Figure: Flowchart for USER Mode


Figure: Flowchart for USER_COUNT Mode


Figure: Flowchart for USER_TIMER Mode


C2H Timer

The C2H timer is a trigger mode in the Completion context. It supports 2048 queues, and each queue
has its own timer. When the timer expires, a timer expire signal is sent to the Completion module. If
multiple timers expire at the same time, then they are sent out in a round robin manner.

Reference Timer
The reference timer is based on the timer tick. The register QDMA_C2H_INT (0xB0C) defines the
value of a timer tick. The 16 registers QDMA_C2H_TIMER_CNT (0xA00-0xA3C) hold the timer
counts, expressed in timer ticks. The timer_idx in the Completion context is the index into the 16
QDMA_C2H_TIMER_CNT registers. Each queue can choose its own timer_idx.
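A minimal C sketch of populating the timer count registers is shown below. Only the offsets 0xA00-0xA3C, the 16-register depth, and the timer_idx usage come from the text; the register write helper and the mapped base pointer are assumptions.

#include <stdint.h>

#define QDMA_C2H_TIMER_CNT_BASE 0xA00u   /* 16 registers, 0xA00-0xA3C */
#define QDMA_C2H_TIMER_CNT_NUM  16u

/* Hypothetical MMIO write helper; regs is the mapped QDMA register space. */
static void reg_write(volatile uint32_t *regs, uint32_t offset, uint32_t val)
{
    regs[offset / 4] = val;
}

/* Program the 16 timer count registers; counts are expressed in timer ticks. */
static void program_c2h_timer_counts(volatile uint32_t *regs,
                                     const uint32_t counts[QDMA_C2H_TIMER_CNT_NUM])
{
    for (uint32_t i = 0; i < QDMA_C2H_TIMER_CNT_NUM; i++)
        reg_write(regs, QDMA_C2H_TIMER_CNT_BASE + 4u * i, counts[i]);
}

/* A queue then selects one of these 16 counts through the timer_idx field
 * (0-15) of its Completion context. */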

Handling Exception Events

C2H Completion On Invalid Queue


When QDMA receives a Completion on a queue which has an invalid context as indicated by the Valid
bit in the C2H CMPT Context, the Completion is silently dropped.

C2H Completion On A Full Ring


The maximum number of Completion entries in the Completion Ring is 2 less than the total number of
entries in the Completion Ring. The C2H Completion Context has PIDX and CIDX in it. This allows the
QDMA to calculate the number of Completions in the Completion Ring. When the QDMA receives a
Completion on a queue that is full, QDMA takes the following actions:

Invalidates the C2H Completion Context for that queue.


Marks the C2H Completion Context with error.
Drops the Completion.
If enabled, sends a Status Descriptor marked with error.
If enabled and not outstanding, sends an Interrupt.
Sends a Marker Response with error.
Logs the error in the C2H Error Status Register.

C2H Completion With Descriptor Error


When the QDMA C2H Engine encounters a Descriptor Error, the following actions are taken in the
context of the C2H Completion Engine:

Invalidates the C2H Completion Context for that queue.


Marks the C2H Completion Context with error.
Sends the Completion out to the Completion Ring. It is marked with an error.
If enabled, sends a Status Descriptor marked with error.
If enabled and not outstanding, sends an Interrupt.
Sends a Marker Response with error.

C2H Completion With Invalid CIDX


The C2H Completion Engine has logic to detect that the CIDX value in the CIDX update points to an
empty location in the Completion Ring. When it detects such an error, the C2H Completion Engine:

Invalidates the Completion Context.


Marks the Completion Context with error.
Logs an error in the C2H error status register.

Bridge

The Bridge core is an interface between the AXI4 and the PCI Express integrated block. It contains
the memory mapped AXI4 to AXI4-Stream Bridge, and the AXI4-Stream Enhanced Interface Block for
PCIe. The memory mapped AXI4 to AXI4-Stream Bridge contains a register block and two functional
half bridges, referred to as the Slave Bridge and Master Bridge.

The slave bridge connects to the AXI4 Interconnect as a slave device to handle any issued AXI4
master read or write requests.
The master bridge connects to the AXI4 Interconnect as a master to process the PCIe generated
read or write TLPs.
The register block contains registers used in the Bridge core for dynamically mapping the AXI4
memory mapped (MM) address range provided using the AXIBAR parameters to an address for
PCIe range.

The core uses a set of interrupts to detect and flag error conditions.

AXI Transaction for PCIe

The following tables are the translation tables for AXI4-Stream and memory-mapped transactions.

Table: AXI4 Memory-Mapped Transactions to AXI4-Stream PCIe TLPs

AXI4 Memory-Mapped Transaction AXI4-Stream PCIe TLPs

INCR Burst Read of AXIBAR MemRd 32 (3DW)

INCR Burst Write to AXIBAR MemWr 32 (3DW)

INCR Burst Read of AXIBAR MemRd 64 (4DW)

INCR Burst Write to AXIBAR MemWr 64 (4DW)

Table: AXI4-Stream PCIe TLPs to AXI4 Memory Mapped Transactions

AXI4-Stream PCIe TLPs AXI4 Memory-Mapped Transaction

MemRd 32 (3DW) of PCIEBAR INCR Burst Read

MemWr 32 (3DW) to PCIEBAR INCR Burst Write

MemRd 64 (4DW) of PCIEBAR INCR Burst Read

MemWr 64 (4DW) to PCIEBAR INCR Burst Write

For PCIe® requests with lengths greater than 1 Dword, the size of the data burst on the Master AXI
interface will always equal the width of the AXI data bus even when the request received from the
PCIe link is shorter than the AXI bus width.
The slave AXI wstrb can be used to facilitate data alignment to an address boundary. The slave AXI
wstrb can equal 0 at the beginning of a valid data cycle, and the bridge appropriately calculates an
offset to the given address. However, the valid data identified by the slave AXI wstrb must be
contiguous from the first byte enable to the last byte enable.

Transaction Ordering for PCIe

The functional mode conforms to PCIe® transaction ordering rules. See the PCI-SIG Specifications
for the complete rule set. The following behaviors are implemented in the functional mode to enforce
the PCIe transaction ordering rules on the highly-parallel AXI bus of the bridge.

The bresp to the remote (requesting) AXI4 master device for a write to a remote PCIe device is
not issued until the MemWr TLP transmission is guaranteed to be sent on the PCIe link before any
subsequent TX-transfers.
If Relaxed Ordering bit is not set within the TLP header, then a remote PCIe device read to a
remote AXI slave is not permitted to pass any previous remote PCIe device writes to a remote
AXI slave received by the functional mode. The AXI read address phase is held until the
previous AXI write transactions have completed and bresp has been received for the AXI write
transactions. If the Relaxed Ordering attribute bit is set within the TLP header, then the remote
PCIe device read is permitted to pass.
Read completion data received from a remote PCIe device are not permitted to pass any remote
PCIe device writes to a remote AXI slave received by the functional mode prior to the read
completion data. The bresp for the AXI write(s) must be received before the completion data is
presented on the AXI read data channel.

✎ Note: The transaction ordering rules for PCIe might have an impact on data throughput in heavy
bidirectional traffic.

BAR and Address Translation

BAR Addressing

Aperture_Base_Address_n represents the Aperture Base Address of the nth BAR in the GUI.
Aperture_High_Address_n represents the Aperture High Address of the nth BAR in the GUI.
AXI_to_PCIe_Translation_n represents the AXI to PCIe Translation of the nth BAR in the GUI.
Aperture_Base_Address_n and Aperture_High_Address_n are used to calculate the size of
AXI BAR n and during address translation to the PCIe address.

Aperture_Base_Address_n provides the low address where AXI BAR n starts and will be
regarded as address offset 0x0 when the address is translated.
Aperture_High_Address_n is the high address of the last valid byte address of AXI BAR n (for
more details on how the address gets translated, see Address Translation.).

The difference between Aperture_Base_Address_n and Aperture_High_Address_n is the AXI
BAR n size. These values must be set such that the AXI BAR n size is a power of two and
is at least 4K.
When a packet is sent to the core (outgoing PCIe packets), the packet must have an address that is
in the range of Aperture_Base_Address_n and Aperture_High_Address_n. Any packet that is
received by the core that has an address outside of this range will be responded to with a SLVERR.
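The aperture rules stated above can be checked with a few lines of C. This is only a sketch of the constraints (power-of-two size, at least 4K, address must fall inside the aperture); it is not part of any tool flow, and the function names are illustrative.

#include <stdbool.h>
#include <stdint.h>

/* AXI BAR size implied by the aperture base/high addresses. */
static uint64_t axi_bar_size(uint64_t base, uint64_t high)
{
    return high - base + 1;
}

/* The AXI BAR size must be a power of two and at least 4K. */
static bool axi_bar_size_valid(uint64_t base, uint64_t high)
{
    uint64_t size = axi_bar_size(base, high);
    return size >= 0x1000 && (size & (size - 1)) == 0;
}

/* Packets with an address outside [base, high] are responded to with SLVERR. */
static bool axi_addr_in_bar(uint64_t addr, uint64_t base, uint64_t high)
{
    return addr >= base && addr <= high;
}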

When the IP integrator is used, these parameters are derived from the Address Editor tab within the
IP integrator. The Address Editor sets the AXI Interconnect as well as the core so the address range
matches, and the packet is routed to the core only when the packet has an address within the valid
range.
AXI Address width is limited to 48 bits.

Addressing Translation

The address space for PCIe® is different than the AXI address space. To access one address space
from another address space requires an address translation process. On the AXI side, the bridge
supports mapping to PCIe on up to six 32-bit or 64-bit AXI base address registers (BARs).
Four examples follow:

Example 1 (32-bit PCIe Address Mapping) demonstrates how to set up three AXI BARs and
translate the AXI address to a 32-bit address for PCIe.
Example 2 (64-bit PCIe Address Mapping) demonstrates how to set up three AXI BARs and
translate the AXI address to a 64-bit address for PCIe.
Example 3 demonstrates how to set up two 64-bit PCIe BARs and translate the address for PCIe
to an AXI address.
Example 4 demonstrates how to set up a combination of two 32-bit AXI BARs and two 64 bit AXI
BARs, and translate the AXI address to an address for PCIe.

Example 1 (32-bit PCIe Address Mapping)

This example shows the generic settings to set up three independent AXI BARs and address
translation of AXI addresses to a remote 32-bit address space for PCIe. This setting of AXI BARs
does not depend on the BARs for PCIe in the functional mode.
In this example, number of AXI BARs are 3, the following assignments for each range are made.

Aperture_Base_Address_0 =0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 Kbytes)
AXI_to_PCIe_Translation_0=0x00000000_56710000 (Bits 63-32 are zero in order to produce a 32-bit PCIe TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 Kbytes)
AXI_to_PCIe_Translation_1=0x00000000_FEDC0000 (Bits 63-32 are zero in order to produce a 32-bit PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 13 bits are invalid translation values.)

Aperture_Base_Address_2 =0x00000000_FE000000
Aperture_High_Address_2 =0x00000000_FFFFFFFF (32 Mbytes)
AXI_to_PCIe_Translation_2=0x00000000_40000000 (Bits 63-32 are zero in order to produce a 32-bit PCIe TLP. Bits 24-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 25 bits are invalid translation values.)

Figure: Example 1 Settings

Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the AXI bus yields
0x56710ABC on the bus for PCIe.

Figure: AXI to PCIe Address Translation

Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the AXI bus yields
0xFEDC1123 on the bus for PCIe.
Accessing the Bridge AXI BAR_2 with address 0x0000_FFFEDCBA on the AXI bus yields
0x41FEDCBA on the bus for PCIe.
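The translation in these examples keeps the offset within the aperture and replaces the upper bits with the AXI_to_PCIe_Translation value. A small C sketch of that calculation, checked against the Example 1 numbers, follows; the helper name is illustrative only.

#include <assert.h>
#include <stdint.h>

/* Translate an AXI address that hits an AXI BAR into an address for PCIe:
 * the offset inside the aperture is preserved and the upper bits are taken
 * from the AXI_to_PCIe_Translation value. The aperture size is assumed to
 * be a power of two. */
static uint64_t axi_to_pcie_addr(uint64_t axi_addr,
                                 uint64_t aperture_base,
                                 uint64_t aperture_high,
                                 uint64_t translation)
{
    uint64_t size = aperture_high - aperture_base + 1;
    return translation | (axi_addr & (size - 1));
}

int main(void)
{
    /* Example 1, AXI BAR_0: 0x12340ABC on AXI yields 0x56710ABC for PCIe. */
    assert(axi_to_pcie_addr(0x12340ABCull, 0x12340000ull,
                            0x1234FFFFull, 0x56710000ull) == 0x56710ABCull);
    /* Example 1, AXI BAR_2: 0xFFFEDCBA on AXI yields 0x41FEDCBA for PCIe. */
    assert(axi_to_pcie_addr(0xFFFEDCBAull, 0xFE000000ull,
                            0xFFFFFFFFull, 0x40000000ull) == 0x41FEDCBAull);
    return 0;
}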

Example 2 (64-bit PCIe Address Mapping)

This example shows the generic settings to set up to three independent AXI BARs and address
translation of AXI addresses to a remote 64-bit address space for PCIe. This setting of AXI BARs
does not depend on the BARs for PCIe within the Bridge.
In this example, number of AXI BARs are three, the following assignments for each range are made:

Aperture_Base_Address_0 =0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 Kbytes)
AXI_to_PCIe_Translation_0=0x50000000_56710000 (Bits 63-32 are non-zero in order to produce a 64-bit PCIe TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 Kbytes)
AXI_to_PCIe_Translation_1=0x60000000_FEDC0000 (Bits 63-32 are non-zero in order to produce a 64-bit PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 13 bits are invalid translation values.)

Aperture_Base_Address_2 =0x00000000_FE000000
Aperture_High_Address_2 =0x00000000_FFFFFFFF (32 Mbytes)
AXI_to_PCIe_Translation_2=0x70000000_40000000 (Bits 63-32 are non-zero in order to produce a 64-bit PCIe TLP. Bits 24-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 25 bits are invalid translation values.)

Figure: Example 2 Settings


Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the AXI bus yields
0x5000000056710ABC on the bus for PCIe.
Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the AXI bus yields
0x60000000FEDC1123 on the bus for PCIe.
Accessing the Bridge AXI BAR_2 with address 0x0000_FFFEDCBA on the AXI bus yields
0x7000000041FEDCBA on the bus for PCIe.

Example 3

This example shows the generic settings to set up two independent BARs for PCIe® and address
translation of addresses for PCIe to a remote AXI address space. This setting of BARs for PCIe does
not depend on the AXI BARs within the bridge.
In this example, number of PCIe BAR are two, the following range assignments are made.

Aperture_Base_Address_0 =0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 KB)
AXI_to_PCIe_Translation_0=0x00000000_56710000 (Bits 63-32 are zero to produce a 32-bit PCIe TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 KB)
AXI_to_PCIe_Translation_1=0x50000000_FEDC0000 (Bits 63-32 are non-zero to produce a 64-bit PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 13 bits are invalid translation values.)

Figure: Example 3 Settings


Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the AXI bus yields
0x56710ABC on the bus for PCIe.
Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the AXI bus yields
0x50000000FEDC1123 on the bus for PCIe.

Figure: PCIe to AXI Translation

Example 4

This example shows the generic settings of four AXI BARs and address translation of AXI addresses
to a remote 32-bit and 64-bit addresses for PCIe® . This setting of AXI BARs do not depend on the
BARs for PCIe within the Bridge.
In this example, where number AXI BAR's are 4, the following assignments for each range are made:

Aperture_Base_Address_0 =0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 KB)
AXI_to_PCIe_Translation_0=0x00000000_56710000 (Bits 63-32 are zero to produce a 32-bit PCIe TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 KB)
AXI_to_PCIe_Translation_1=0x50000000_FEDC0000 (Bits 63-32 are non-zero to produce a 64-bit PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 13 bits are invalid translation values.)

Aperture_Base_Address_2 =0x00000000_FE000000
Aperture_High_Address_2 =0x00000000_FFFFFFFF (32 MB)
AXI_to_PCIe_Translation_2=0x00000000_40000000 (Bits 63-32 are zero to produce a 32-bit PCIe TLP. Bits 24-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 25 bits are invalid translation values.)

Aperture_Base_Address_3 =0x00000000_00000000
Aperture_High_Address_3 =0x00000000_00000FFF (4 KB)
AXI_to_PCIe_Translation_3=0x60000000_87654000 (Bits 63-32 are non-zero to produce a 64-bit PCIe TLP. Bits 11-0 must be zero based on the AXI BAR aperture size. Non-zero values in the lower 12 bits are invalid translation values.)

Figure: Example 4 Settings


Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the AXI bus yields
0x56710ABC on the bus for PCIe.
Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the AXI bus yields
0x50000000FEDC1123 on the bus for PCIe.
Accessing the Bridge AXI BAR_2 with address 0x0000_FFFEDCBA on the AXI bus yields
0x41FEDCBA on the bus for PCIe.
Accessing the Bridge AXI BAR_3 with address 0x0000_00000071 on the AXI bus yields
0x6000000087654071 on the bus for PCIe.

Addressing Checks

When setting the following parameters for PCIe® address mapping, C_PCIEBAR2AXIBAR_n and
PF0_BARn_APERTURE_SIZE, be sure these are set to allow for the addressing space on the AXI
system. For example, the following setting is illegal and results in an invalid AXI address.

C_PCIEBAR2AXIBAR_n=0x00000000_FFFFF000
PF0_BARn_APERTURE_SIZE=0x06 (8 KB)

For an 8 Kilobyte BAR, the lower 13 bits must be zero. As a result, the C_PCIEBAR2AXIBAR_n value
should be modified to be 0x00000000_FFFFE000. Also, check for a larger value on
PF0_BARn_APERTURE_SIZE compared to the value assigned to the C_PCIEBAR2AXIBAR_n parameter.
An example parameter setting follows.

C_PCIEBAR2AXIBAR_n=0xFFFF_E000
PF0_BARn_APERTURE_SIZE=0x0D (1 MB)

To keep the AXIBAR upper address bits as 0xFFFF_E000 (to reference bits [31:13]), the
PF0_BARn_APERTURE_SIZE parameter must be set to 0x06 (8 KB).
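The alignment rule described above can be expressed in a few lines of C. The aperture-size decoding below merely matches the two examples in the text (0x06 -> 8 KB, 0x0D -> 1 MB) and is an assumption, not a normative definition of PF0_BARn_APERTURE_SIZE.

#include <stdbool.h>
#include <stdint.h>

/* Aperture size in bytes from the PF0_BARn_APERTURE_SIZE encoding
 * (assumed: size = 2^(encoding + 7), which matches 0x06 -> 8 KB and
 * 0x0D -> 1 MB). */
static uint64_t aperture_bytes(uint32_t aperture_size_enc)
{
    return 1ull << (aperture_size_enc + 7);
}

/* C_PCIEBAR2AXIBAR_n must have zeros in all address bits covered by the
 * aperture, i.e., it must be aligned to the aperture size. */
static bool pciebar2axibar_aligned(uint64_t pciebar2axibar,
                                   uint32_t aperture_size_enc)
{
    return (pciebar2axibar & (aperture_bytes(aperture_size_enc) - 1)) == 0;
}

/* Per the example above: 0xFFFFF000 with an 8 KB aperture (0x06) fails this
 * check, while 0xFFFFE000 passes. */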

Malformed TLP

The integrated block for PCI Express® detects a malformed TLP. For the IP configured as an
Endpoint core, a malformed TLP results in a fatal error message being sent upstream if error reporting
is enabled in the Device Control register.

Abnormal Conditions

This section describes how the Slave side and Master side (see the following tables) of the functional
mode handle abnormal conditions.

Slave Bridge Abnormal Condition

Slave bridge abnormal conditions are classified as: Illegal Burst Type and Completion TLP Errors. The
following sections describe the manner in which the Bridge handles these errors.
Illegal Burst Type
The slave bridge monitors AXI read and write burst type inputs to ensure that only the INCR
(incrementing burst) type is requested. Any other value on these inputs is treated as an error condition
and the Slave Illegal Burst (SIB) interrupt is asserted. In the case of a read request, the Bridge
asserts SLVERR for all data beats and arbitrary data is placed on the Slave AXI4-MM read data bus.
In the case of a write request, the Bridge asserts SLVERR for the write response and all write data is
discarded.
Completion TLP Errors
Any request to the bus for PCIe (except for a posted Memory write) requires a completion TLP to
complete the associated AXI request. The Slave side of the Bridge checks the received completion
TLPs for errors and checks for completion TLPs that are never returned (Completion Timeout). Each
of the completion TLP error types are discussed in the subsequent sections.
Unexpected Completion
When the slave bridge receives a completion TLP, it matches the header RequesterID and Tag to the
outstanding RequesterID and Tag. A match failure indicates the TLP is an Unexpected Completion
which results in the completion TLP being discarded and a Slave Unexpected Completion (SUC)
interrupt strobe being asserted. Normal operation then continues.
Unsupported Request
A device for PCIe might not be capable of satisfying a specific read request. For example, if the read
request targets an unsupported address for PCIe, the completer returns a completion TLP with a
completion status of 0b001 - Unsupported Request. A completion TLP returned with a completion
status of Reserved must also be treated as an Unsupported Request status, according to
the PCI Express Base Specification v3.0. When the slave bridge receives an unsupported request
response, the Slave Unsupported Request (SUR) interrupt is asserted and the DECERR response is
asserted with arbitrary data on the AXI4 memory mapped bus.
Completion Timeout
A Completion Timeout occurs when a completion (Cpl) or completion with data (CplD) TLP is not
returned after an AXI to PCIe memory read request, or after a PCIe Configuration Read/Write request.

For PCIe Configuration Read/Write request, completions must complete within the C_COMP_TIMEOUT
parameter selected value from the time the request is issued. For PCIe Memory Read request,
completions must complete within the value set in the Device Control 2 register in the PCIe
Configuration Space register. When a completion timeout occurs, an OKAY response is asserted with
all 1s data on the memory mapped AXI4 bus.
Poison Bit Received on Completion Packet
An Error Poison occurs when the completion TLP EP bit is set, indicating that there is poisoned data
in the payload. When the slave bridge detects the poisoned packet, the Slave Error Poison (SEP)
interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.
Completer Abort
A Completer Abort occurs when the completion TLP completion status is 0b100 - Completer Abort.
This indicates that the completer has encountered a state in which it was unable to complete the
transaction. When the slave bridge receives a completer abort response, the Slave Completer Abort
(SCA) interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.

Table: Slave Bridge Response to Abnormal Conditions

Transfer Type  Abnormal Condition                    Bridge Response

Read           Illegal burst type                    SIB interrupt is asserted. SLVERR response given with arbitrary read data.
Write          Illegal burst type                    SIB interrupt is asserted. Write data is discarded. SLVERR response given.
Read           Unexpected completion                 SUC interrupt is asserted. Completion is discarded.
Read           Unsupported Request status returned   SUR interrupt is asserted. DECERR response given with arbitrary read data.
Read           Completion timeout                    SCT interrupt is asserted. SLVERR response given with arbitrary read data.
Read           Poison bit in completion              Completion data is discarded. SEP interrupt is asserted. SLVERR response given with arbitrary read data.
Read           Completer Abort (CA) status returned  SCA interrupt is asserted. SLVERR response given with arbitrary read data.


PCIe Error Handling

RP_ERROR_FIFO (RP only)


Error handling is as follows:

1. Read register 0xE10 (INT_DEC) and check whether any of the following bits is set: [9] (correctable),
[10] (non_fatal), or [11] (fatal).
2. Read register 0xE20 (RP_CSR) and check if bit [16] is set to see if efifo_not_empty is set.
3. If FIFO is not empty read FIFO by reading 0xE2C (RP_FIFO_READ)
a. Error message indicates where the error comes from (i.e, requestor ID) and Error type.
4. To clear the error, write to 0xE2C (RP_FIFO_READ). Value does not matter
5. Repeat steps 2 and 3 until 0xE2C (RP_FIFO_READ) bit [18] valid bit is cleared.
6. Write 1 to register 0xE10 (INT_DEC) to clear bits [9] (correctable), [10] (non_fatal) or [11] (fatal).
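The sequence above can be sketched in C as follows. The register offsets and bit positions are those listed in the steps; the read32/write32 helpers and the mapped base pointer are assumptions for illustration.

#include <stdint.h>

#define INT_DEC      0xE10u
#define RP_CSR       0xE20u
#define RP_FIFO_READ 0xE2Cu

/* Hypothetical MMIO helpers over the mapped bridge register space. */
static uint32_t read32(volatile uint32_t *regs, uint32_t off) { return regs[off / 4]; }
static void write32(volatile uint32_t *regs, uint32_t off, uint32_t val) { regs[off / 4] = val; }

static void rp_drain_error_fifo(volatile uint32_t *regs)
{
    /* [9] correctable, [10] non_fatal, [11] fatal */
    uint32_t err_bits = read32(regs, INT_DEC) & (0x7u << 9);
    if (!err_bits)
        return;

    /* Drain the FIFO while efifo_not_empty (RP_CSR[16]) is set and the
     * read data is valid (RP_FIFO_READ[18]). */
    while (read32(regs, RP_CSR) & (1u << 16)) {
        uint32_t entry = read32(regs, RP_FIFO_READ);  /* requester ID and error type */
        if (!(entry & (1u << 18)))
            break;
        (void)entry;                                  /* decode the error message here */
        write32(regs, RP_FIFO_READ, 0);               /* pop/clear; value does not matter */
    }

    /* Write 1 to clear bits [9]/[10]/[11] of INT_DEC. */
    write32(regs, INT_DEC, err_bits);
}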

RP_PME_FIFO (RP only)


Error handling is as follows:

1. Read register 0xE10 (INT_DEC) and check if bit [17] is set, which indicates a PM_PME message
has been received.
2. Read register 0xE20 (RP_CSR) and check if bit [18] is set to see if pfifo_not_empty is set.
3. If FIFO is not empty, read FIFO by reading 0xE30 (RP_PFIFO).
a. Message will indicate where the message comes from (i.e., requestor ID).
4. To clear the error, write to 0xE30 (RP_PFIFO). Value does not matter.
5. Repeat steps 2 and 3 until 0xE30 (RP_PFIFO) bit [31] valid bit is cleared.
6. Write 1 to register 0xE10 (INT_DEC) to clear bits [17].

Master Bridge Abnormal Condition

The following sections describe the manner in which the master bridge handles abnormal conditions.

AXI DECERR Response

When the master bridge receives a DECERR response from the AXI bus, the request is discarded
and the Master DECERR (MDE) interrupt is asserted. If the request was non-posted, a completion
packet with the Completion Status = Unsupported Request (UR) is returned on the bus for PCIe.

AXI SLVERR Response

When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.

Max Payload Size for PCIe, Max Read Request Size or 4K Page Violated

When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.

Completion Packets

When the MAX_READ_REQUEST_SIZE is greater than the MAX_PAYLOAD_SIZE, a read request for PCIe
can ask for more data than the master bridge can insert into a single completion packet. When this
situation occurs, multiple completion packets are generated up to MAX_PAYLOAD_SIZE, with the Read
Completion Boundary (RCB) observed.

Poison Bit

When the poison bit is set in a transaction layer packet (TLP) header, the payload following the
header is corrupted. When the master bridge receives a memory request TLP with the poison bit set,
it discards the TLP and asserts the Master Error Poison (MEP) interrupt strobe.

Zero Length Requests

When the master bridge receives a read request with the Length = 0x1, FirstBE = 0x00, and LastBE =
0x00, it responds by sending a completion with Status = Successful Completion.
When the master bridge receives a write request with the Length = 0x1, FirstBE = 0x00, and LastBE
= 0x00 there is no effect.

Table: Master Bridge Response to Abnormal Conditions

Transfer Type  Abnormal Condition         Bridge Response

Read           DECERR response            MDE interrupt strobe asserted. Completion returned with Unsupported Request status.
Write          DECERR response            MDE interrupt strobe asserted.
Read           SLVERR response            MSE interrupt strobe asserted. Completion returned with Completer Abort status.
Write          SLVERR response            MSE interrupt strobe asserted.
Write          Poison bit set in request  MEP interrupt strobe asserted. Data is discarded.

Link Down Behavior

The normal operation of the functional mode is dependent on the integrated block for PCIe
establishing and maintaining the point-to-point link with an external device for PCIe. If the link has
been lost, it must be re-established to return to normal operation.
When a Hot Reset is received by the functional mode, the link goes down and the PCI Configuration
Space must be reconfigured.
Initiated AXI4 write transactions that have not yet completed on the AXI4 bus when the link goes down
have a SLVERR response given and the write data is discarded. Initiated AXI4 read transactions that
have not yet completed on the AXI4 bus when the link goes down have a SLVERR response given,
with arbitrary read data returned.
Any MemWr TLPs for PCIe that have been received, but the associated AXI4 write transaction has not
started when the link goes down, are discarded.

Interrupts

The QDMA supports up to 2K total MSI-X vectors. A single MSI-X vector can be used to support
multiple queues.
The QDMA supports Interrupt Aggregation. Each vector has an associated Interrupt Aggregation Ring.
The QID and status of queues requiring service are written into the Interrupt Aggregation Ring. When
a PCIe® MSI-X interrupt is received by the Host, the software reads the Interrupt Aggregation Ring to
determine which queue needs service. The mapping of queues to vectors is programmable through the
vector number provided in the queue context. MSI-X interrupt modes are supported for both SR-IOV
and non-SR-IOV.

Asynchronous and Queue Based Interrupts

The QDMA supports both asynchronous interrupts and queue-based interrupts.


The asynchronous interrupts are used for capturing events that are not synchronous to any DMA
operations, namely, errors, status, and debug conditions.
Interrupts are broadcast to all PFs, and maintain status for each PF in a queue based scheme. The
queue based interrupts include the interrupts from the H2C MM, H2C stream, C2H MM, and C2H
stream.

Interrupt Engine

The QDMA Interrupt Engine handles the queue based interrupts and the error interrupt.
This block diagram is of the Interrupt Engine.

Figure: Interrupt Engine Block Diagram


When an H2C or C2H interrupt occurs, the Interrupt Engine first reads the QID to vector table. The table has 2K entries to
support up to 2K queues. Each entry of the table includes two portions: one for H2C interrupts, and
one for C2H interrupts. The table maps the QID to the vector, and indicates if the interrupt is direct
interrupt mode or indirect interrupt mode. If it is direct interrupt mode, the vector is used to generate
the PCIe MSI-X message. If it is indirect interrupt mode, the vector is the ring index, which is the index
of the Interrupt Context for the Interrupt Aggregation Ring.
The following is the data in the QID to vector table.

Table: QID to Vector Table

Signal Bit Owner Description

h2c_en_coal [17:17] Driver 1’b1: indirect interrupt mode.


1’b0: direct interrupt mode for H2C
interrupt.

h2c_vector [16:9] Driver For direct interrupt, it is the interrupt


vector index of MSI-X table. For indirect
interrupt, it is the ring index.

c2h_en_coal [8:8] Driver 1’b1: indirect interrupt mode.


1’b0: direct interrupt mode for C2H
interrupt.

c2h_vector [7:0] Driver For direct interrupt, it is the interrupt


vector index of MSI-X table. For indirect
interrupt, it is the ring index.

The QID to Vector table is programmed by the context access.

Context access through QDMA_TRQ_SEL_IND:


QDMA_IND_CTXT_CMD.Qid = Qid
QDMA_IND_CTXT_CMD.Sel = MDMA_CTXT_SEL_INT_QID2VEC (0xC)
QDMA_IND_CTXT_CMD.Op = MDMA_CTXT_CMD_WR or MDMA_CTXT_CMD_RD
(MDMA_CTXT_CMD_CLR and MDMA_CTXT_CMD_INV are not supported for Qid to
Vector table)
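A minimal C sketch of issuing such an indirect context command follows. The field names (Qid, Sel, Op) and the selector value mirror the text; the command encodings, the data staging, and the helper functions are assumptions and must be taken from the register reference.

#include <stdint.h>

#define MDMA_CTXT_SEL_INT_QID2VEC 0xCu
#define MDMA_CTXT_CMD_WR          0x1u   /* encoding assumed for illustration */
#define MDMA_CTXT_CMD_RD          0x2u   /* encoding assumed for illustration */

/* Illustrative view of QDMA_IND_CTXT_CMD; the real field packing must be
 * taken from the register map. */
struct ind_ctxt_cmd {
    uint16_t qid; /* queue ID */
    uint8_t  sel; /* context selector, 0xC for the QID to Vector Table */
    uint8_t  op;  /* WR or RD (CLR/INV are not supported for this table) */
};

/* Hypothetical helpers: stage the context data and issue the command. */
void qdma_ind_ctxt_data_write(const uint32_t *data, unsigned nwords);
void qdma_ind_ctxt_cmd_issue(const struct ind_ctxt_cmd *cmd);

/* entry holds the packed h2c/c2h vector and en_coal fields of the table. */
static void program_qid2vec(uint16_t qid, uint32_t entry)
{
    struct ind_ctxt_cmd cmd = {
        .qid = qid,
        .sel = MDMA_CTXT_SEL_INT_QID2VEC,
        .op  = MDMA_CTXT_CMD_WR,
    };
    qdma_ind_ctxt_data_write(&entry, 1);
    qdma_ind_ctxt_cmd_issue(&cmd);
}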

Direct Interrupt

For direct interrupt, the QDMA processes the interrupt with the following steps.

Look up the QID to Vector Table.


Send out the PCIe MSI-X message.

Interrupt Aggregation Ring

For indirect interrupt, it does interrupt aggregation. The following are some restrictions for the interrupt
aggregation.

Each Interrupt Aggregation Ring can only be associated with one function. But multiple rings can
be associated with the same function.
The interrupt engine supports up to three interrupts from the same source, until software services
the interrupts.

In the indirect interrupt, the QDMA processes the interrupt with the following steps.

Look up the QID to Vector Table.


Look up the Interrupt Context.
Write to the Interrupt Aggregation Ring.
Send out the PCIe MSI-X message.

This block diagram is of the indirect interrupt.

Figure: Indirect Interrupt

The Interrupt Context includes the information of the Interrupt Aggregation Ring. It has 256 entries to
support up to 256 Interrupt Aggregation Rings.
The following is the Interrupt Context Structure (0x8).

Table: Interrupt Context Structure (0x8)

Signal Bit Owner Description

pidx [75:64] DMA Producer Index

page_size [63:61] Driver Interrupt Aggregation Ring size:



0: 4 KB
1: 8 KB
2: 12 KB
3: 16 KB
4: 20 KB
5: 24 KB
6: 28 KB
7: 32 KB

baddr_4k [60:9] Driver Base address of Interrupt Aggregation


Ring – bit[63:12]

color [8] DMA Color bit

int_st [7] DMA Interrupt State:


0: WAIT_TRIGGER
1: ISR_RUNNING

reserved [6] NA Reserved

vec [5:1] Driver Interrupt vector index in MSI-X table

valid [0] Driver Valid

The software needs to size the Interrupt Aggregation Ring appropriately. Each source can send up to
three messages to the ring. Therefore, the size of the ring must satisfy the following formula:
Number of entries >= 3 * (number of queues + error interrupts that are mapped to this ring)
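This sizing rule can be restated directly in code; the helper below is only a convenience for the formula above, with illustrative names.

#include <stdint.h>

/* Minimum number of Interrupt Aggregation Ring entries: each source (queue
 * or error interrupt mapped to this ring) can post up to three messages
 * before software services them. */
static uint32_t min_intr_aggr_ring_entries(uint32_t num_queues,
                                           uint32_t num_error_interrupts)
{
    return 3u * (num_queues + num_error_interrupts);
}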
The Interrupt Context is programmed by the context access. The QDMA_IND_CTXT_CMD.Qid has
the ring index, which is from the Qid to Vector Table. The operation of MDMA_CTXT_CMD_CLR can
clear all of the bits in the Interrupt Context. The MDMA_CTXT_CMD_INV can clear the valid bit.

Context access through QDMA_TRQ_SEL_IND:


QDMA_IND_CTXT_CMD.Qid = Ring index
QDMA_IND_CTXT_CMD.Sel = MDMA_CTXT_SEL_INT_COAL (0x8)
QDMA_IND_CTXT_CMD.cmd.Op =
MDMA_CTXT_CMD_WR,
MDMA_CTXT_CMD_RD,
MDMA_CTXT_CMD_CLR , or
MDMA_CTXT_CMD_INV.

After it looks up the Interrupt Context, it then writes to the Interrupt Aggregation Ring. It also updates
the Interrupt Context with the new PIDX, color, and the interrupt state.
This is the Interrupt Aggregation Ring entry structure. It has 8B data.

Table: Interrupt Aggregation Ring Entry Structure

Signal Bit Owner Description


coal_color [63:63] DMA The color bit of the Interrupt Aggregation


Ring. This bit inverts every time pidx
wraps around on the Interrupt
Aggregation Ring.

qid [62:52] DMA This is from Interrupt source. Queue ID.

int_type [51:51] DMA 0: H2C


1: C2H

err_int [50:50] DMA 0: non-error interrupt


1: error interrupt

reserved [49:39] DMA Reserved

stat_desc [38:0] DMA This is the status descriptor of the


Interrupt source.
The following is the information in the stat_desc.

Table: stat_desc Information

Signal Bit Owner Description

error [38:35] DMA This is from interrupt source:


{c2h_err[1:0], h2c_err[1:0]}

int_st [34:33] DMA This is from Interrupt source. Interrupt


state.
0: WRB_INT_ISR
1: WRB_INT_TRIG
2: WRB_INT_ARMED

color [32:32] DMA This is from Interrupt source. This bit


inverts every time pidx wraps around
and this field gets copied to color field of
descriptor.

cidx [31:16] DMA This is from Interrupt source.


Cumulative consumed pointer

pidx [15:0] DMA This is from Interrupt source.


Cumulative pointer of total interrupt
Aggregation Ring entry written

When the software allocates the memory space for the Interrupt Aggregation Ring, the coal_color
starts with 1’b0. The software needs to initialize the color bit of the Interrupt Context to 1’b1.
When the hardware writes to the Interrupt Aggregation Ring, it reads the color bit from the Interrupt
Context and writes it to the entry. When the ring (PIDX) wraps around, the hardware flips the color
bit in the Interrupt Context. In this way, when the software reads from the Interrupt Aggregation Ring,
it can identify which entries were written by the hardware by looking at the color bit.
The software reads the Interrupt Aggregation Ring to get the qid, the int_type (H2C or C2H), and
the err_int. From the qid, the software can identify whether the queue is stream or MM.
When the err_int is set, it is an error interrupt. The software can then check the error status register
of the Central Error Aggregator QDMA_GLBL_ERR_STAT (0x248). The register shows the error
source. The software can then read the error status register of the Leaf Error Aggregator of the
corresponding error.
The stat_desc in the Interrupt Aggregation Ring is the status descriptor from the Interrupt source.
When the status descriptor is disabled, the software can get the status descriptor information from the
Interrupt Aggregation Ring.
The two cases are as follows:

If the interrupt source is C2H stream, then it is the status descriptor of the C2H Completion Ring.
The software can read the pidx of the C2H Completion Ring.
If the interrupt source is any other source (H2C stream, H2C MM, C2H MM), then it is the status
descriptor of that source. The software can read the cidx.

Finally, the QDMA sends out the PCIe MSI-X message using the interrupt vector from the Interrupt
Context.
When the PCIe MSI-X interrupt is received by the Host, the software reads the Interrupt Aggregation
Ring to determine which queue needs service. After the software reads the Interrupt Aggregation
Ring, it will do a dynamic pointer update for the software CIDX to indicate the cumulative pointer that
the software reads to. The software does the dynamic pointer update using the register
QDMA_DMAP_SEL_INT_CIDX[2048] (0x6400). If the software cidx is equal to the pidx, this will
trigger a write to the Interrupt Context on the interrupt state of that queue. This indicates to the
QDMA that the software has already read all of the entries in the Interrupt Aggregation Ring. If the
software cidx is not equal to the pidx, it will send out another PCIe MSI-X message. Therefore, the
software can read the Interrupt Aggregation Ring again. After that, the software can perform a pointer
update of the interrupt source ring. For example, for a C2H stream interrupt, the software will update
the pointer of the interrupt source ring, which is the C2H Completion Ring.
These are the steps for the software:

1. After the software gets the PCIe MSI-X message, it reads the Interrupt Aggregation Ring entries.
2. The software uses the coal_color bit to identify the written entries. Each entry has Qid and
Int_type (H2C or C2H). From the Qid and Int_type, the software can check if it is stream or
MM. This points to a corresponding source ring. For example, if it is C2H stream, the source ring
is the C2H Completion Ring. The software can then read the source ring to get information, and
do a dynamic pointer update of the source ring after that.
3. After the software finishes reading all written entries, it does one dynamic pointer update of the
software cidx using the register QDMA_DMAP_SEL_INT_CIDX[2048] (0x6400). The Qid in the
register is the Qid in the last written entry. This tells the hardware the pointer of the Interrupt
Aggregation Ring that the software has read to.
If the software cidx is not equal to the PIDX, the hardware sends out another PCIe MSI-X
message, so that the software can read the Interrupt Aggregation Ring again.
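A simplified C sketch of this service loop follows. The entry fields mirror the Interrupt Aggregation Ring Entry Structure above; the bit-field layout, the ring bookkeeping structure, and the CIDX update helper are assumptions for illustration, and the update must use the virtual qid (translation from the absolute qid in the entry is omitted here).

#include <stdint.h>

/* One 8B Interrupt Aggregation Ring entry (fields per the table above;
 * the C bit-field layout shown here is illustrative only). */
struct intr_aggr_entry {
    uint64_t stat_desc  : 39;
    uint64_t rsvd       : 11;
    uint64_t err_int    : 1;
    uint64_t int_type   : 1;   /* 0: H2C, 1: C2H */
    uint64_t qid        : 11;
    uint64_t coal_color : 1;
};

struct intr_aggr_ring {
    volatile struct intr_aggr_entry *base;
    uint32_t size;       /* number of entries           */
    uint32_t sw_cidx;    /* next entry to consume       */
    uint8_t  exp_color;  /* expected color, starts at 1 */
};

/* Hypothetical helper: dynamic pointer update via QDMA_DMAP_SEL_INT_CIDX. */
void qdma_int_cidx_update(uint16_t qid, uint32_t sw_cidx);

static void service_intr_aggr_ring(struct intr_aggr_ring *ring)
{
    uint16_t last_qid = 0;
    int consumed = 0;

    /* Consume entries whose color matches the expected color. */
    while (ring->base[ring->sw_cidx].coal_color == ring->exp_color) {
        struct intr_aggr_entry e = ring->base[ring->sw_cidx];
        last_qid = (uint16_t)e.qid;
        /* ... read the source ring indicated by qid/int_type (for example,
         * the C2H Completion Ring for a C2H stream queue) and update its
         * pointer ... */
        consumed = 1;
        if (++ring->sw_cidx == ring->size) {
            ring->sw_cidx = 0;
            ring->exp_color ^= 1;   /* hardware flips the color on wrap */
        }
    }

    /* One dynamic pointer update, using the qid of the last written entry;
     * if sw_cidx still differs from the hardware PIDX, another MSI-X
     * message follows and this routine runs again. */
    if (consumed)
        qdma_int_cidx_update(last_qid, ring->sw_cidx);
}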

When the software performs the dynamic pointer update for the Interrupt Aggregation Ring using the
register QDMA_DMAP_SEL_INT_CIDX[2048] (0x6400), it needs to use the virtual qid. The FMAP
block in the hardware translates the virtual qid to the absolute qid. The Interrupt Engine uses the
absolute qid when it looks up the qid-to-vector table.
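The steps above can be summarized as driver pseudo-code. The following is a minimal C sketch only; the reg_write() helper, the ring_entry_t layout, and the field packing used in the QDMA_DMAP_SEL_INT_CIDX write are assumptions made for illustration and must be taken from the register reference and the driver's data structures in a real implementation.

#include <stdint.h>

/* Hypothetical register write helper (for example, through a mapped DMA BAR). */
extern void reg_write(uint32_t offset, uint32_t value);

#define QDMA_DMAP_SEL_INT_CIDX_BASE  0x6400u  /* per-queue dynamic pointer update */

/* Assumed layout of one Interrupt Aggregation Ring entry (illustrative only). */
typedef struct {
    uint16_t qid;        /* virtual queue id of the interrupt source    */
    uint8_t  int_type;   /* 0 = H2C, 1 = C2H                            */
    uint8_t  coal_color; /* toggles each time the hardware wraps the ring */
    uint8_t  err_int;    /* set for error interrupts                    */
} ring_entry_t;

/* Called from the MSI-X handler associated with the aggregation ring's vector. */
void service_intr_aggr_ring(ring_entry_t *ring, uint32_t ring_size,
                            uint32_t *sw_cidx, uint8_t *expected_color)
{
    uint16_t last_qid = 0;

    /* Steps 1-2: walk entries whose color matches; each points at a source ring. */
    while (ring[*sw_cidx % ring_size].coal_color == *expected_color) {
        ring_entry_t *e = &ring[*sw_cidx % ring_size];

        last_qid = e->qid;
        if (e->err_int) {
            /* Error interrupt: consult QDMA_GLBL_ERR_STAT, then the leaf registers. */
        } else if (e->int_type == 1 /* C2H */) {
            /* Service the C2H Completion Ring for e->qid, then update its pointer. */
        } else {
            /* H2C stream / MM: read the status descriptor information, update CIDX. */
        }

        (*sw_cidx)++;
        if ((*sw_cidx % ring_size) == 0)
            *expected_color ^= 1;   /* color flips when the ring wraps */
    }

    /* Step 3: one dynamic pointer update with the virtual qid of the last entry read.
     * The packing of the qid and CIDX fields below is an assumption for the sketch. */
    reg_write(QDMA_DMAP_SEL_INT_CIDX_BASE,
              ((uint32_t)last_qid << 16) | (*sw_cidx & 0xFFFFu));
}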

Figure: Interrupt Engine

The following diagram shows the indirect interrupt flow. The Interrupt module gets the interrupt
requests. It first writes to the Interrupt Aggregation Ring. Then it waits for the write completions. After
that, it sends out the PCIe MSI-X message. The interrupt requests can keep on coming, and the
Interrupt module keeps on processing them. In the meantime, the software reads the Interrupt
Aggregation Ring and it does the dynamic pointer update. If the software CIDX is not equal to the
PIDX, it will send out another PCIe MSI-X message.

Figure: Interrupt Flow


Legacy Interrupt

The QDMA supports the legacy interrupt for the physical function, and it is expected that a single queue
will be associated with the interrupt.
To enable the legacy interrupt, the software needs to set the en_lgcy_intr bit in the register
QDMA_GLBL_INTERRUPT_CFG (0x288). When en_lgcy_intr is set, the QDMA will not send out
MSI-X interrupts.
When the legacy interrupt wire INTA, INTB, INTC, or INTD is asserted, the QDMA hardware sets the
lgcy_intr_pending bit in the QDMA_GLBL_INTERRUPT_CFG (0x288) register. When the software
receives the legacy interrupt, it needs to clear the lgcy_intr_pending bit. The hardware will keep
the legacy interrupt wire asserted until the software clears the lgcy_intr_pending bit.
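A minimal C sketch of this enable/acknowledge sequence is shown below, assuming a generic reg_read()/reg_write() register access pair and placeholder bit positions for en_lgcy_intr and lgcy_intr_pending; the actual bit positions and clear semantics come from the register reference.

#include <stdint.h>

extern uint32_t reg_read(uint32_t offset);
extern void     reg_write(uint32_t offset, uint32_t value);

#define QDMA_GLBL_INTERRUPT_CFG   0x288u
/* Bit positions below are placeholders; take the real ones from the register map. */
#define EN_LGCY_INTR              (1u << 0)
#define LGCY_INTR_PENDING         (1u << 1)

/* Enable legacy INTx signalling instead of MSI-X (PF only). */
static void qdma_enable_legacy_intr(void)
{
    reg_write(QDMA_GLBL_INTERRUPT_CFG,
              reg_read(QDMA_GLBL_INTERRUPT_CFG) | EN_LGCY_INTR);
}

/* Legacy interrupt service routine: clear the pending bit so the wire deasserts.
 * Whether the bit is cleared by writing 0 or by writing 1 must be confirmed in
 * the register description. */
static void qdma_ack_legacy_intr(void)
{
    uint32_t cfg = reg_read(QDMA_GLBL_INTERRUPT_CFG);

    if (cfg & LGCY_INTR_PENDING)
        reg_write(QDMA_GLBL_INTERRUPT_CFG, cfg & ~LGCY_INTR_PENDING);
}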

User Interrupt

Figure: Interrupt

Error Interrupt

There are Leaf Error Aggregators in different places. They log the errors and propagate them to
the Central Error Aggregator. Each Leaf Error Aggregator has an error status register and an error
mask register. The error mask is an enable mask. Irrespective of the enable mask value, the error status
register always logs the errors. Only when the error mask is enabled does the Leaf Error Aggregator
propagate the error to the Central Error Aggregator.
The Central Error Aggregator aggregates all of the errors together. When any error occurs, it can
generate an Error Interrupt if the err_int_arm bit is set in the error interrupt register
QDMA_GLBL_ERR_INT (0xB04). The err_int_arm bit is set by the software and cleared by the
hardware when the Error Interrupt is taken by the Interrupt Engine. The Error Interrupt is for all of the
errors, including the H2C errors and C2H errors. The software must set the err_int_arm bit again to
generate another interrupt.
The Error Interrupt supports the direct interrupt only. Register QDMA_GLBL_ERR_INT bit [23],
en_coal, must always be programmed to 0 (direct interrupt).
The Error Interrupt gets the vector from the error interrupt register QDMA_GLBL_ERR_INT. For the
direct interrupt, the vector is the interrupt vector index of the MSI-X table.
The Error Interrupt process is as follows:

1. The hardware reads the Error Interrupt register QDMA_GLBL_ERR_INT (0xB04) to get the function and
vector numbers.
2. The hardware sends out the PCIe MSI-X message.

The following figure shows the error interrupt register block diagram.

Figure: Error Interrupt Handling

Queue Management

Function Map Table

The Function Map Table is used to allocate queues to each function. The index into the RAM is the
function number. Each entry contains the base number of the physical QID and the number of queues
allocated to the function. It provides a function based, queue access protection mechanism by
translating and checking accesses to logical queues (through QDMA_TRQ_SEL_QUEUE_PF and
QDMA_TRQ_SEL_QUEUE_VF address space) to their physical queues. Direct register accesses to
queue space beyond what is allocated to the function in the table will be canceled and an error will be
logged.
The table can be programmed through the QDMA_TRQ_SEL_FMAP address space. Because this
space only exists in the PF address map, only a physical function can modify this table.

Context Programming

Program all mask registers to 1. They are QDMA_IND_CTXT_MASK_0 (0x814) to
QDMA_IND_CTXT_MASK_7 (0x820).
Program context values in the following registers: QDMA_IND_CTXT_DATA_0 (0x804) to
QDMA_IND_CTXT_DATA_7 (0x810).
Refer to Software Descriptor Context Structure and C2H Prefetch Context Structure to program the
context data registers.
Program the context for the corresponding queue through the context command register
QDMA_IND_CTXT_CMD (0x824):
Qid is given in bits [17:7].
Opcode bits [6:5] select the operation to be performed:
0 = QDMA_CTXT_CLR: All content of the context is zeroed out. Qinv will be sent out on tm_dsc_sts.
1 = QDMA_CTXT_WR: Write context.
2 = QDMA_CTXT_RD: Read context.
3 = QDMA_CTXT_INV: Qen is set to zero and the other context values remain intact. Qinv will be
sent out on tm_dsc_sts and unused credits will be sent out.
The context that is accessed is given in bits [4:1]:
0 = QDMA_CTXT_SELC_DEC_SW_C2H; C2H Descriptor SW Context
1 = QDMA_CTXT_SELC_DEC_SW_H2C; H2C Descriptor SW Context
2 = QDMA_CTXT_SELC_DEC_HW_C2H; C2H Descriptor HW Context
3 = QDMA_CTXT_SELC_DEC_HW_H2C; H2C Descriptor HW Context
4 = QDMA_CTXT_SELC_DEC_CR_C2H; C2H Descriptor Credit Context
5 = QDMA_CTXT_SELC_DEC_CR_H2C; H2C Descriptor Credit Context
6 = QDMA_CTXT_SELC_WRB; CMPT / used ring Context
7 = QDMA_CTXT_SELC_PFTCH; C2H PFCH Context
8 = QDMA_CTXT_SELC_INT_COAL; Interrupt Aggregation Context
Context programming (write/read) does not occur while bit [0] is set.
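The indirect context programming sequence above can be expressed as a small helper. The following is a minimal C sketch only; reg_read()/reg_write() and the offset arrays are placeholders for whatever register access mechanism and exact register offsets the driver uses.

#include <stdint.h>

extern uint32_t reg_read(uint32_t offset);
extern void     reg_write(uint32_t offset, uint32_t value);

/* Offsets of the eight QDMA_IND_CTXT_DATA and QDMA_IND_CTXT_MASK registers,
 * filled in from the register map (DATA_0 at 0x804, MASK_0 at 0x814, and so on). */
extern const uint32_t ind_ctxt_data_off[8];
extern const uint32_t ind_ctxt_mask_off[8];

#define QDMA_IND_CTXT_CMD         0x824u

/* Command word: Qid in bits [17:7], opcode in [6:5], context select in [4:1]. */
#define CTXT_CMD(qid, op, sel)    (((uint32_t)(qid) << 7) | ((uint32_t)(op) << 5) | \
                                   ((uint32_t)(sel) << 1))
#define QDMA_CTXT_WR              1u
#define QDMA_CTXT_SELC_DEC_SW_H2C 1u   /* H2C Descriptor SW Context */

/* Write one context (up to 8 data words) for a queue. */
static void qdma_ind_ctxt_write(uint16_t qid, uint32_t sel, const uint32_t ctxt[8])
{
    int i;

    for (i = 0; i < 8; i++)                        /* all mask bits set */
        reg_write(ind_ctxt_mask_off[i], 0xFFFFFFFFu);

    for (i = 0; i < 8; i++)                        /* context data */
        reg_write(ind_ctxt_data_off[i], ctxt[i]);

    reg_write(QDMA_IND_CTXT_CMD, CTXT_CMD(qid, QDMA_CTXT_WR, sel));

    /* Context programming does not occur while bit [0] is set; wait for it to clear. */
    while (reg_read(QDMA_IND_CTXT_CMD) & 0x1u)
        ;
}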

Queue Setup

Clear Descriptor Software Context.


Clear Descriptor Hardware Context.
Clear Descriptor Credit Context.
Set-up Descriptor Software Context.
Clear Prefetch Context.
Clear Completion Context.
Set-up Completion Context.
If interrupts/status writes are desired (enabled in the Completion Context), an initial
Completion CIDX update is required to send the hardware into a state where it is sensitive
to trigger conditions. This initial CIDX update is required, because when out of reset, the
hardware initializes into an unarmed state.
Set-up Prefetch Context.
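The ordering of the bullets above is illustrated in the following minimal C sketch. qdma_ind_ctxt_cmd() is a hypothetical wrapper around the indirect context command sequence shown in Context Programming, and the context data arrays are assumed to be prepared by the caller.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical wrapper: issues one QDMA_IND_CTXT_CMD operation; ctxt may be NULL
 * for clear operations. Opcode and select encodings follow the lists in Context
 * Programming. */
extern void qdma_ind_ctxt_cmd(uint16_t qid, uint32_t opcode, uint32_t sel,
                              const uint32_t ctxt[8]);

#define OP_CLR       0u
#define OP_WR        1u
#define SEL_SW_C2H   0u   /* C2H Descriptor SW Context     */
#define SEL_HW_C2H   2u   /* C2H Descriptor HW Context     */
#define SEL_CR_C2H   4u   /* C2H Descriptor Credit Context */
#define SEL_CMPT     6u   /* CMPT / used ring Context      */
#define SEL_PFTCH    7u   /* C2H PFCH Context              */

/* Setup ordering for a C2H stream queue with completions enabled. */
void qdma_c2h_st_queue_setup(uint16_t qid, const uint32_t sw_ctxt[8],
                             const uint32_t cmpt_ctxt[8],
                             const uint32_t pfch_ctxt[8])
{
    qdma_ind_ctxt_cmd(qid, OP_CLR, SEL_SW_C2H, NULL);   /* clear SW context     */
    qdma_ind_ctxt_cmd(qid, OP_CLR, SEL_HW_C2H, NULL);   /* clear HW context     */
    qdma_ind_ctxt_cmd(qid, OP_CLR, SEL_CR_C2H, NULL);   /* clear credit context */
    qdma_ind_ctxt_cmd(qid, OP_WR,  SEL_SW_C2H, sw_ctxt);
    qdma_ind_ctxt_cmd(qid, OP_CLR, SEL_PFTCH,  NULL);
    qdma_ind_ctxt_cmd(qid, OP_CLR, SEL_CMPT,   NULL);
    qdma_ind_ctxt_cmd(qid, OP_WR,  SEL_CMPT,   cmpt_ctxt);
    /* If interrupts/status writes are enabled in the Completion Context, an initial
     * Completion CIDX dynamic pointer update for this qid is required here to arm
     * the trigger logic (the hardware comes out of reset unarmed). */
    qdma_ind_ctxt_cmd(qid, OP_WR,  SEL_PFTCH,  pfch_ctxt);
}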

Queue Teardown

Queue Tear-down (C2H Stream):

Send Marker packet to drain the pipeline.


Wait for Marker completion.
Invalidate/Clear Descriptor Software Context.
Invalidate/Clear Prefetch Context.
Invalidate/Clear Completion Context.
Invalidate Timer Context (clear cmd is not supported).

Queue Tear-down (H2C Stream & MM):

Invalidate/Clear Descriptor Software Context.
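The context operations of the C2H stream teardown can be sketched as follows in C, reusing the hypothetical qdma_ind_ctxt_cmd() wrapper introduced under Context Programming; the marker packet on the AXI4-Stream C2H interface and its completion are assumed to have been handled already.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical wrapper from the Context Programming sketch: issues one
 * QDMA_IND_CTXT_CMD operation (ctxt may be NULL for clear/invalidate). */
extern void qdma_ind_ctxt_cmd(uint16_t qid, uint32_t opcode, uint32_t sel,
                              const uint32_t ctxt[8]);

#define OP_CLR       0u
#define OP_INV       3u
#define SEL_SW_C2H   0u   /* C2H Descriptor SW Context */
#define SEL_CMPT     6u   /* CMPT / used ring Context  */
#define SEL_PFTCH    7u   /* C2H PFCH Context          */

/* Teardown ordering for a C2H stream queue. The Timer Context invalidation is
 * omitted because its context-select encoding is not listed above. */
void qdma_c2h_st_queue_teardown(uint16_t qid)
{
    qdma_ind_ctxt_cmd(qid, OP_INV, SEL_SW_C2H, NULL);   /* or OP_CLR */
    qdma_ind_ctxt_cmd(qid, OP_INV, SEL_PFTCH,  NULL);
    qdma_ind_ctxt_cmd(qid, OP_INV, SEL_CMPT,   NULL);
}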

Virtualization

QDMA implements SR-IOV passthrough virtualization where the adapter exposes a separate virtual
function (VF) for use by a virtual machine (VM). A physical function (PF) can optionally be made
privileged with full access to QDMA registers and resources, while VFs implement only the per-queue
pointer update registers and interrupts. VF drivers must communicate with the driver attached to the
PF through the mailbox for configuration, resource allocation, and exception handling. The QDMA
implements function level reset (FLR) to enable the operating system on a VM to reset the device without
interfering with the rest of the platform.

Table: Privileged Access

Type | Notes
Queue context/other control registers | Registers for Context access only controlled by PFs (All 4 PFs).
Status and statistics registers | Mainly PF only registers. VFs need to coordinate with a PF driver for error handling. VFs need to communicate through the mailbox with the driver attached to the PF.
Data path registers | Both PFs and VFs must be able to write the registers involved in the data path without needing to go through a hypervisor. Pointer updates for H2C/C2H Descriptor Fetch can be done directly by a VF or PF for the queues associated with the function using its own BAR space. Any pointer update to a queue that does not belong to the function will be dropped with an error logged.
Other protection recommendations | Turn on the IOMMU to protect against bad memory accesses from VMs.
PF driver and VF driver communication | The VF driver needs to communicate with the PF driver to request operations that have a global effect. This communication channel needs the ability to pass messages and generate interrupts. This communication channel utilizes a set of hardware mailboxes for each VF.

Mailbox

In a virtualized environment, the driver attached to a PF has enough privilege to program and access
QDMA registers. For all the lesser privileged functions, certain PFs and all VFs must communicate
with privileged drivers using the mailbox mechanism. The communication API must be defined by the
driver. The QDMA IP does not define it.
Each function (both PF and VF) has an inbox and an outbox that can fit a message size of 128B. A VF
accesses its own mailbox, and a PF accesses its own mailbox and all the functions (PF or VF)
associated with that PF.
✎ Note: Enabling mailbox will increase PL utilization.
The QDMA mailbox allows the following access:

From a VF to the associated PF.


From a PF to any VF belonging to its own virtual function group (VFG).
From a PF (typically a driver that does not have access to QDMA registers) to another PF.

Figure: Mailbox

VF To PF Messaging
A VF is allowed to post only one message to the target PF mailbox at a time; the message remains
posted until the target function (PF) accepts it. Before posting the message, the source function
should make sure its o_msg_status is cleared; the VF can then write the message to its Outgoing
Message Registers. After finishing the message write, the VF driver sends the msg_send command by
writing 0x1 to the control/status register (CSR) at address 0x5004. The mailbox hardware then informs
the PF driver by asserting the i_msg_status field.
The function driver should enable periodic polling of i_msg_status to check the availability of
incoming messages. At the PF side, i_msg_status = 0x1 indicates that one or more messages are pending
for the PF driver to pick up. The cur_src_fn field in the Mailbox Status Register gives the function ID of
the first pending message. The PF driver should then set the Mailbox Target Function Register to the
source function ID of the first pending message. Access to a PF's Incoming Message Registers
is indirect, which means the mailbox hardware always returns the corresponding message bytes
sent by the target function. Upon finishing the message read, the PF driver should also send the
msg_rcv command by writing 0x2 to the CSR address. The hardware then deasserts the
o_msg_status at the source function side. The following figure illustrates the messaging flow from a
VF to a PF at both the source and destination sides.
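The VF-side portion of this flow is sketched below in C. The CSR offset 0x5004 and the msg_send command value come from the description above; vf_mbox_read()/vf_mbox_write(), the outgoing message base offset, and the o_msg_status bit position are placeholders for illustration.

#include <stdint.h>
#include <stdbool.h>

extern uint32_t vf_mbox_read(uint32_t offset);
extern void     vf_mbox_write(uint32_t offset, uint32_t value);

#define MBOX_VF_CSR        0x5004u   /* VF control/status register              */
#define MBOX_CMD_MSG_SEND  0x1u
/* Offsets below are placeholders: the Outgoing Message Registers and the
 * o_msg_status field location come from the mailbox register map. */
#define MBOX_VF_OUT_MSG    0x5020u   /* hypothetical outgoing message base      */
#define MBOX_VF_STATUS     0x5000u   /* hypothetical status register            */
#define O_MSG_STATUS_BIT   0x1u

/* Post one 128-byte message from a VF to its PF. Returns false if the previous
 * message has not yet been accepted by the PF. */
bool vf_post_message(const uint32_t msg[32])
{
    int i;

    if (vf_mbox_read(MBOX_VF_STATUS) & O_MSG_STATUS_BIT)
        return false;                          /* outgoing message still pending */

    for (i = 0; i < 32; i++)                   /* 128B message = 32 dwords */
        vf_mbox_write(MBOX_VF_OUT_MSG + 4 * i, msg[i]);

    vf_mbox_write(MBOX_VF_CSR, MBOX_CMD_MSG_SEND);   /* msg_send command */
    return true;
}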

Figure: VF to PF Messaging Flow

PF To VF Messaging
The messaging flow from a PF to the VFs that belong to its VFG is slightly different from the VF to PF
flow because a PF can send messages to multiple destination functions and, therefore, may receive
multiple acknowledgments when checking the status. As illustrated in the following figure, a PF
driver must set the Mailbox Target Function Register to the destination function ID before doing any
message operation; for example, checking the incoming message status, writing the message, or sending
the command. At the VF side (receiving side), whenever a VF driver gets i_msg_status = 0x1, the
VF driver should read its Incoming Message Registers to pick up the message. Depending on the
application, the VF driver can send the msg_rcv immediately after reading the message or after the
corresponding message has been processed.
To avoid one-by-one polling of the status of outgoing messages, the mailbox hardware provides a set
of Acknowledge Status Registers (ASR) for each PF. Upon receiving the msg_rcv command from a VF,
the mailbox deasserts the o_msg_status field of the source PF and also sets the corresponding bit in
the Acknowledge Status Registers. For a given VF with function ID <N>, the acknowledge status is at:

Acknowledge Status Register address: <N> / 32 + <0x22420 Register Address>

Acknowledge Status bit location: <N> mod 32

The mailbox hardware asserts the ack_status field in the Status Register (0x22400) when any bit is
asserted in the Acknowledge Status Registers (ASR). The PF driver can poll ack_status before
actually reading out the Acknowledge Status Registers. The PF driver may detect multiple completions
through one register access. After the completions are processed, the PF driver should write the value
back to the same register address to clear the status.

Figure: PF to VF Messaging Flow

Mailbox Interrupts
The mailbox module supports interrupts as an alternative event notification mechanism. Each mailbox
has an Interrupt Control Register (at offset 0x22410 for a PF, or at offset 0x5010 for a VF). Write
1 to this register to enable the interrupt. Once the interrupt is enabled, the mailbox sends the
interrupt to the QDMA whenever there is any pending event for the mailbox to process, namely, any
incoming message pending or any acknowledgment for the outgoing messages. Configure the
interrupt vector through the Function Interrupt Vector Register (0x22408 for a PF, or 0x5008 for a VF)
according to the driver configuration.
Enabling the interrupt does not change the event logging mechanism, which means the user must
check the pending events by reading the Function Status Registers. The first step in responding to
an interrupt request is disabling the interrupt. It is possible that the actual number of pending
events is more than the number of events at the moment the mailbox sent the interrupt.
✎ Recommended: AMD recommends that the user application interrupt handler process all the
pending events present in the status register. Upon finishing the interrupt response, the user
application re-enables the interrupt.
The mailbox checks its event status at the time the interrupt control changes from disabled to
enabled. If any new events arrived at the mailbox between reading the interrupt status and
re-enabling the interrupt, the mailbox generates a new interrupt request immediately.
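A minimal C outline of the enable/handle/re-enable sequence is shown below, assuming a generic mbox_write() helper; the PF offsets come from the text above, writing 0 to disable the interrupt is an assumption, and process_pending_events() stands in for the status-register polling.

#include <stdint.h>

extern void mbox_write(uint32_t offset, uint32_t value);

/* PF-side offsets from the text; the VF equivalents are 0x5010 and 0x5008. */
#define MBOX_PF_INTR_CTRL  0x22410u
#define MBOX_PF_INTR_VEC   0x22408u

void mbox_pf_enable_intr(uint32_t msix_vector)
{
    mbox_write(MBOX_PF_INTR_VEC, msix_vector);
    mbox_write(MBOX_PF_INTR_CTRL, 1u);
}

/* Placeholder for reading the Function Status Registers and handling every
 * pending incoming message and outgoing-message acknowledgment. */
extern void process_pending_events(void);

void mbox_pf_irq_handler(void)
{
    mbox_write(MBOX_PF_INTR_CTRL, 0u);   /* disable first                         */
    process_pending_events();            /* drain all events in the status regs   */
    mbox_write(MBOX_PF_INTR_CTRL, 1u);   /* re-enable; an event that arrived in   */
                                         /* between raises a fresh interrupt      */
}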

Function Level Reset

The function level reset (FLR) mechanism enables the software to quiesce and reset Endpoint
hardware with function-level granularity. When a VF is reset, only the resources associated with this
VF are reset. When a PF is reset, all resources of the PF, including that of its associated VFs, are
reset. Since FLR is a privileged operation, it must be performed by the PF driver running in the
management system.

Use Mode
The hypervisor requests FLR when a function is attached or detached (i.e., powered on and off).
You can request FLR as follows:

echo 1 > /sys/bus/pci/devices/$BDF/reset

where $BDF is the bus device function number of the targeted function.

FLR Process
A complete FLR process involves three major steps.

1. Pre-FLR: Pre-FLR resets all QDMA context structure, mailbox, and user logic of the target
function.
Each function has a register called MDMA_PRE_FLR_STATUS, which keeps track of the
pre-FLR status of the function. The offset is calculated as
MDMA_PRE_FLR_STATUS_OFFSET = MB_base + 0x100, which is located at offset
0x100 from the mailbox memory space of the function. Note that PF and VF have different
MB_base. The definition of MDMA_PRE_FLR_STATUS is shown in the table below.
The software writes 1 to MDMA_PRE_FLR_STATUS[0] (bit 0) of the target function to
initiate pre-FLR. Hardware will clear MDMA_PRE_FLR_STATUS[0] when pre-FLR
completes. The software keeps polling on MDMA_PRE_FLR_STATUS[0], and only
proceeds to the next step when it returns 0.


Table: MDMA_PRE_FLR_STATUS Register

Offset | Field | R/W Type | Width | Default | Description
0x100 | pre_flr_st | RW | 32 | 0 | [31:1]: Reserved. [0]: 1 initiates pre-FLR; [0]: 0 indicates pre-FLR done. Bit [0] is set by the driver and cleared by the hardware.
2. Quiesce: The software must ensure all pending transactions are completed. This can be done by
polling the Transaction Pending bit in the Device Status register (in the PCIe Configuration Space)
until it is cleared, or until a timeout after a certain period of time.
3. PCIe-FLR: PCIe-FLR resets all resources of the target function in the PCIe controller.
✎ Note: The Initiate Function Level Reset bit (bit 15 of the PCIe Device Control register) of the target
function should be set to 1 to trigger the FLR process in PCIe.
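The three steps can be outlined in C as shown below. mb_read()/mb_write() and cfg_read16()/cfg_write16() are hypothetical accessors for the target function's mailbox space and PCIe configuration space, and the simple count-based timeouts are placeholders.

#include <stdint.h>
#include <stdbool.h>

extern uint32_t mb_read(uint32_t offset);
extern void     mb_write(uint32_t offset, uint32_t value);
extern uint16_t cfg_read16(uint32_t offset);
extern void     cfg_write16(uint32_t offset, uint16_t value);

#define MDMA_PRE_FLR_STATUS_OFF   0x100u      /* MB_base + 0x100                    */
#define PCIE_DEV_STATUS_TRPND     (1u << 5)   /* Transactions Pending               */
#define PCIE_DEV_CTRL_INIT_FLR    (1u << 15)  /* Initiate Function Level Reset      */

/* dev_ctrl_off / dev_status_off are the config-space offsets of the target
 * function's Device Control and Device Status registers. */
bool qdma_do_flr(uint32_t dev_ctrl_off, uint32_t dev_status_off, int timeout)
{
    /* 1. Pre-FLR: set bit 0, then wait for the hardware to clear it. */
    mb_write(MDMA_PRE_FLR_STATUS_OFF, 0x1u);
    while (mb_read(MDMA_PRE_FLR_STATUS_OFF) & 0x1u)
        if (timeout-- == 0)
            return false;

    /* 2. Quiesce: wait for all pending non-posted requests to complete. */
    while (cfg_read16(dev_status_off) & PCIE_DEV_STATUS_TRPND)
        if (timeout-- == 0)
            break;                        /* proceed after the timeout expires */

    /* 3. PCIe-FLR: trigger the function level reset in the PCIe controller. */
    cfg_write16(dev_ctrl_off,
                (uint16_t)(cfg_read16(dev_ctrl_off) | PCIE_DEV_CTRL_INIT_FLR));
    return true;
}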

OS Support
If the PF driver is loaded and alive (i.e., use mode 1), all three aforementioned steps are performed by
the driver. However, for Versal, if a user wants to perform FLR before loading the PF driver (as
defined in Use Mode above), an OS kernel patch is provided to allow the OS to perform the correct FLR
sequence through functions defined in //…/source/drivers/pci/quick.c.

Mailbox IP

You need to add a new IP from the IP catalog to instantiate the pcie_qdma_mailbox Mailbox IP. This IP is
needed for function virtualization. The pcie_qdma_mailbox IP should be connected to the versal_cips
IP as shown in the following diagram:

Figure: CPM4 Mailbox Connection


To connect the Mailbox IP connection, follow these steps.

Add PCIe QDMA Mailbox IP. To do so,


1. Configure the IP for the number of PFs (should be the same as the number of PFs selected in the
QDMA configuration).
2. Configure the IP for the number of VFs in each PF (should be the same as the number of VFs selected
in the QDMA configuration).
✎ Note: It is important to match the number of PFs and VFs to the numbers configured in the QDMA
IP. If not, the design will not work.
Re-configure the NoC IP to add one extra AXI Master port. To do so,
1. Assign one more AXI clock.
2. In the Outputs tab, assign M00_AXI to aclk2.
3. In the Connectivity tab, select the M00_AXI option for both S00_AXI and
S01_AXI (ps_pcie).
Add AXI SmartConnect IP.
1. Configure the IP to have one Master, one Slave, one clock, and one reset.

Follow the above diagram to make all necessary connections. The Mailbox IP has two clocks, axi_aclk
and ip_clk, and two resets, axi_aresetn and ip_resetn. Connect the two clocks together and the two
resets together.

Connect dma0_usr_irq of the CIPS IP to the corresponding output of the Mailbox IP.
Connect dma0_usr_flr of the CIPS IP to the corresponding output of the Mailbox IP.
Make the usr_flr and usr_irq interfaces of the Mailbox IP external pins.

✎ Note: Mailbox access can be steered to the NoC0 or NoC1 port based on the CIPS GUI configuration.
You should configure the NoC based on the CIPS GUI selection.

Port ID

Port ID is a categorization of some queues on the FPGA side. When the DMA is shared by more
than one user application, the port ID provides indirection to the QID so that all the interfaces can be
further demuxed at lower cost. However, when the DMA is used by a single application, the port ID can
be ignored and the port_id inputs driven to 0.

System Management

Resets

The QDMA supports all the PCIe defined resets, such as link down, reset, hot reset, and function level
reset (FLR) (supports only Quiesce mode).

VDM

Vendor Defined Messages (VDMs) are an expansion of the existing messaging capabilities with PCI
Express. PCI Express Specification defines additional requirements for Vendor Defined Messages,
header formats and routing information. For details, see PCI-SIG Specifications
(https://www.pcisig.com/specifications).
QDMA allows the transmission and reception of VDMs. To enable this feature, select Enable Bridge
Slave Mode in the Vivado Customize IP dialog box. This enables the st_rx_msg interface.
RX Vendor Defined Messages are stored in a shallow FIFO before they are transmitted to the output
port. When there are many back-to-back VDM messages, the FIFO will overflow and these messages will
be dropped, so it is better to repeat VDM messages at regular intervals.
Throughput for VDMs depends on several factors: PCIe speed, data width, message length, and the
internal VDM pipeline.
Internal VDM pipelines must be replaced with the internal RX VDM FIFO interface for network on chip
(NoC) access, which has a shallow buffer of 64B.
✎ Note: New VDM messages will be dropped if more than 64B of VDM are received before the FIFO
is serviced through the NoC.
The internal RX VDM FIFO interface cannot handle back-to-back messages. The pipeline throughput can
only handle one in every four accesses, which is about 25% efficiency from the host access.
‼ Important: Do not use back-to-back VDM access.
RX Vendor Defined Messages:

1. When QDMA receives a VDM, the incoming messages will be received on the st_rx_msg port.
2. The incoming data stream will be captured on the st_rx_msg_data port (per-DW).
3. The user application needs to drive the st_rx_msg_rdy to signal if it can accept the incoming
VDMs.
4. Once st_rx_msg_rdy is High, the incoming VDM is forwarded to the user application.
5. The user application needs to store the incoming VDMs and keep track of how many packets were
received.

TX Vendor Defined Messages:

1. To enable transmission of VDM from QDMA, program the TX Message registers in the Bridge
through the AXI4 Slave interface.
2. Bridge has TX Message Control, Header L (bytes 8-11), Header H (bytes 12-15) and TX
Message Data registers as shown in the PCIe TX Message Data FIFO Register
(TX_MSG_DFIFO).
3. Issue a Write to offset 0xE64 through AXI4 Slave interface for the TX Message Header L
register.
4. Program offset 0xE68 for the required VDM TX Header H register.
5. Program up to 16DW of Payload for the VDM message starting from DW0 – DW15 by sending
Writes to offset 0xE6C one by one.
6. Program the msg_routing, msg_code, data length, requester function, and msg_execute fields in the
TX_MSG_CTRL register at offset 0xE60 to send the VDM TX packet.
7. The TX Message Control register also indicates the completion status of the message in bit 23.
The user needs to read this bit to confirm the successful transmission of the VDM packet.
8. All the fields in the registers are RW except bit 23 (msg_fail) in the TX Control register, which is
cleared by writing a 1.
9. The VDM TX packet will be sent on the AXI-ST RQ transmit interface.
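A minimal C sketch of the programming sequence above is shown below; brg_read()/brg_write() are placeholders for accesses to the Bridge register space through the AXI4 Slave interface, and the packing of the TX_MSG_CTRL fields is left to the caller.

#include <stdint.h>

extern void     brg_write(uint32_t offset, uint32_t value);
extern uint32_t brg_read(uint32_t offset);

#define TX_MSG_CTRL      0xE60u
#define TX_MSG_HDR_L     0xE64u   /* VDM header bytes 8-11  */
#define TX_MSG_HDR_H     0xE68u   /* VDM header bytes 12-15 */
#define TX_MSG_DFIFO     0xE6Cu   /* payload DW0..DW15      */
#define TX_MSG_FAIL_BIT  (1u << 23)

/* tx_msg_ctrl must be built per the TX_MSG_CTRL register description
 * (msg_routing, msg_code, data length, requester function, msg_execute). */
void qdma_send_vdm(uint32_t hdr_l, uint32_t hdr_h,
                   const uint32_t *payload, unsigned ndw,
                   uint32_t tx_msg_ctrl)
{
    unsigned i;

    brg_write(TX_MSG_HDR_L, hdr_l);
    brg_write(TX_MSG_HDR_H, hdr_h);

    for (i = 0; i < ndw && i < 16; i++)    /* up to 16 DW of payload */
        brg_write(TX_MSG_DFIFO, payload[i]);

    brg_write(TX_MSG_CTRL, tx_msg_ctrl);   /* kicks off the VDM transmit */

    /* Bit 23 (msg_fail) reports the completion status; it is cleared by writing 1. */
    if (brg_read(TX_MSG_CTRL) & TX_MSG_FAIL_BIT)
        brg_write(TX_MSG_CTRL, TX_MSG_FAIL_BIT);
}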

Config Extend

PCIe extended interface can be selected for more configuration space. When the Configuration
Extend Interface is selected, you are responsible for adding logic to extend the interface to make it
work properly.

Expansion ROM

If selected, the Expansion ROM is activated and can be a value from 2 KB to 4 GB. According to the
PCI Local Bus Specification ( PCI-SIG Specifications (https://www.pcisig.com/specifications)), the
maximum size for the Expansion ROM BAR should be no larger than 16 MB. Selecting an address
space larger than 16 MB can result in a non-compliant core.

Errors

Bridge Errors

Slave Bridge Abnormal Conditions

Slave bridge abnormal conditions are classified as: Illegal Burst Type and Completion TLP Errors. The
following sections describe the manner in which the Bridge handles these errors.
Illegal Burst Type
The slave bridge monitors AXI read and write burst type inputs to ensure that only the INCR
(incrementing burst) type is requested. Any other value on these inputs is treated as an error condition
and the Slave Illegal Burst (SIB) interrupt is asserted. In the case of a read request, the Bridge
asserts SLVERR for all data beats and arbitrary data is placed on the Slave AXI4-MM read data bus.
In the case of a write request, the Bridge asserts SLVERR for the write response and all write data is
discarded.

Completion TLP Errors
Any request to the bus for PCIe (except for a posted Memory write) requires a completion TLP to
complete the associated AXI request. The Slave side of the Bridge checks the received completion
TLPs for errors and checks for completion TLPs that are never returned (Completion Timeout). Each
of the completion TLP error types are discussed in the subsequent sections.
Unexpected Completion
When the slave bridge receives a completion TLP, it matches the header RequesterID and Tag to the
outstanding RequesterID and Tag. A match failure indicates the TLP is an Unexpected Completion
which results in the completion TLP being discarded and a Slave Unexpected Completion (SUC)
interrupt strobe being asserted. Normal operation then continues.
Unsupported Request
A device for PCIe might not be capable of satisfying a specific read request. For example, if the read
request targets an unsupported address for PCIe, the completer returns a completion TLP with a
completion status of 0b001 - Unsupported Request. The completer that returns a completion TLP
with a completion status of Reserved must be treated as an unsupported request status, according to
the PCI Express Base Specification v3.0. When the slave bridge receives an unsupported request
response, the Slave Unsupported Request (SUR) interrupt is asserted and the DECERR response is
asserted with arbitrary data on the AXI4 memory mapped bus.
Completion Timeout
A Completion Timeout occurs when a completion (Cpl) or completion with data (CplD) TLP is not
returned after an AXI to PCIe memory read request, or after a PCIe Configuration Read/Write request.
For PCIe Configuration Read/Write request, completions must complete within the C_COMP_TIMEOUT
parameter selected value from the time the request is issued. For PCIe Memory Read request,
completions must complete within the value set in the Device Control 2 register in the PCIe
Configuration Space register. When a completion timeout occurs, an OKAY response is asserted with
all 1s data on the memory mapped AXI4 bus.
Poison Bit Received on Completion Packet
An Error Poison occurs when the completion TLP EP bit is set, indicating that there is poisoned data
in the payload. When the slave bridge detects the poisoned packet, the Slave Error Poison (SEP)
interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.
Completer Abort
A Completer Abort occurs when the completion TLP completion status is 0b100 - Completer Abort.
This indicates that the completer has encountered a state in which it was unable to complete the
transaction. When the slave bridge receives a completer abort response, the Slave Completer Abort
(SCA) interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.

Table: Slave Bridge Response to Abnormal Conditions

Transfer Type | Abnormal Condition | Bridge Response
Read | Illegal burst type | SIB interrupt is asserted. SLVERR response given with arbitrary read data.
Write | Illegal burst type | SIB interrupt is asserted. Write data is discarded. SLVERR response given.
Read | Unexpected completion | SUC interrupt is asserted. Completion is discarded.
Read | Unsupported Request status returned | SUR interrupt is asserted. DECERR response given with arbitrary read data.
Read | Completion timeout | SCT interrupt is asserted. SLVERR response given with arbitrary read data. Completion data is discarded.
Read | Poison bit in completion | SEP interrupt is asserted. SLVERR response given with arbitrary read data.
Read | Completer Abort (CA) status returned | SCA interrupt is asserted. SLVERR response given with arbitrary read data.
PCIe Error Handling

Master Bridge Abnormal Conditions

The following sections describe the manner in which the master bridge handles abnormal conditions.
AXI DECERR Response
When the master bridge receives a DECERR response from the AXI bus, the request is discarded
and the Master DECERR (MDE) interrupt is asserted. If the request was non-posted, a completion
packet with the Completion Status = Unsupported Request (UR) is returned on the bus for PCIe.
AXI SLVERR Response
When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.
Max Payload Size for PCIe, Max Read Req
Completion Packets
When the MAX_READ_REQUEST_SIZE is greater than the MAX_PAYLOAD_SIZE, a read request for PCIe
can ask for more data than the master bridge can insert into a single completion packet. When this
situation occurs, multiple completion packets are generated up to MAX_PAYLOAD_SIZE, with the Read
Completion Boundary (RCB) observed.

Poison Bit
When the poison bit is set in a transaction layer packet (TLP) header, the payload following the
header is corrupted. When the master bridge receives a memory request TLP with the poison bit set,
it discards the TLP and asserts the Master Error Poison (MEP) interrupt strobe.
Zero Length Requests
When the master bridge receives a read request with the Length = 0x1, FirstBE = 0x00, and LastBE =
0x00, it responds by sending a completion with Status = Successful Completion.
When the master bridge receives a write request with the Length = 0x1, FirstBE = 0x00, and LastBE
= 0x00 there is no effect.

Table: Master Bridge Response to Abnormal Conditions

Transfer Type | Abnormal Condition | Bridge Response
Read | DECERR response | MDE interrupt strobe asserted. Completion returned with Unsupported Request status.
Write | DECERR response | MDE interrupt strobe asserted.
Read | SLVERR response | MSE interrupt strobe asserted. Completion returned with Completer Abort status.
Write | SLVERR response | MSE interrupt strobe asserted.
Write | Poison bit set in request | MEP interrupt strobe asserted. Data is discarded.

Linkdown Errors

If the PCIe link goes down during DMA operations, transactions may be lost and the DMA may not be
able to complete. In such cases, the AXI4 interfaces will continue to operate. Outstanding read
requests on the C2H Bridge AXI4 MM interface receive correct completions or completions with a
slave error response. The DMA will log a link down error in the status register. It is the responsibility of
the driver to have a timeout and handle recovery of a link down situation.

Data Path Errors

Data protection is supported on the primary data paths. CRC errors can occur on C2H streaming and H2C
streaming. Parity errors can occur on the Memory Mapped, Bridge Master, and Bridge Slave interfaces.
Errors on the write payload can occur on C2H streaming, Memory Mapped, and Bridge Slave. A double bit
error on the write payload and read completions for the Bridge Slave interface causes a parity error.
Requests to the PCIe with parity errors are dropped by the core, and a fatal error is logged by the PCIe.
Parity errors are not recoverable and can result in unexpected behavior. Any DMA transfer during and
after the parity error should be considered invalid. If there is a parity error and the transfer hangs or
stops, the DMA logs the error. You must investigate and fix the parity issues. Once the issues are fixed,
clear and reopen the queue to start a new transfer.

DMA Errors

All DMA errors are logged in their respective error status registers. Each block has an error status and
an error mask register so that errors can be passed on to a higher level and eventually to the
QDMA_GLBL_ERR_STAT register.
Errors can be fatal based on the register settings. If there is a fatal error, the DMA stops the transfer
and sends an interrupt if enabled. After debug and analysis, you must invalidate and restart the queue
to start the DMA transfer.

Error Aggregator

There are Leaf Error Aggregators in different places. They log the errors and propagate them to the
central place. The Central Error Aggregator aggregates the errors from all of the Leaf Error
Aggregators.
The QDMA_GLBL_ERR_STAT register is the error status register of the Central Error Aggregator. The
bit fields indicate the locations of Leaf Error Aggregators. Then, look for the error status register of the
individual Leaf Error Aggregator to find the exact error.
The register QDMA_GLBL_ERR_MASK is the error mask register of the Central Error Aggregator. It
has the mask bits for the corresponding errors. When a mask bit is set to 1'b1, it enables the
corresponding error to be propagated to the next level to generate an interrupt. Detailed information
about error-generated interrupts is described in the interrupt section. The Error Interrupt is controlled
by the register QDMA_GLBL_ERR_INT (0xB04).
Each Leaf Error Aggregator has an error status register and an error mask register. The error status
register logs the error. The hardware sets the bit when the error happens, and the software can write
1'b1 to clear the bit if needed. The error mask register has the mask bits for the corresponding errors.
When the mask bit is set to 1'b1, it will enable the propagation of the corresponding error to the
Central Error Aggregator. The error mask register does not affect the error logging to the error status
register.
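The error handling flow can be outlined as follows in C; reg_read()/reg_write() are generic register access placeholders, and the bit positions shown are illustrative only, not the actual field assignments.

#include <stdint.h>

extern uint32_t reg_read(uint32_t offset);
extern void     reg_write(uint32_t offset, uint32_t value);

#define QDMA_GLBL_ERR_STAT   0x248u
#define QDMA_GLBL_ERR_INT    0xB04u
#define QDMA_C2H_ERR_STAT    0xAF0u
/* The bit positions within QDMA_GLBL_ERR_STAT and the err_int_arm bit of
 * QDMA_GLBL_ERR_INT are placeholders; take the real values from the register map. */
#define GLBL_ERR_C2H_ST_BIT  (1u << 0)
#define ERR_INT_ARM_BIT      (1u << 24)

/* Error-interrupt handler outline. */
void qdma_handle_error_interrupt(void)
{
    uint32_t glbl = reg_read(QDMA_GLBL_ERR_STAT);

    if (glbl & GLBL_ERR_C2H_ST_BIT) {
        /* Leaf aggregator: read, decode, and clear (write-1-to-clear). */
        uint32_t leaf = reg_read(QDMA_C2H_ERR_STAT);
        reg_write(QDMA_C2H_ERR_STAT, leaf);
    }
    /* ...repeat for the other leaves reported in QDMA_GLBL_ERR_STAT... */

    /* Re-arm so that the next error generates another interrupt. */
    reg_write(QDMA_GLBL_ERR_INT,
              reg_read(QDMA_GLBL_ERR_INT) | ERR_INT_ARM_BIT);
}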

Figure: Error Aggregator


The error status registers and the error mask registers of the Leaf Error Aggregators are as follows.

C2H Streaming Error


QDMA_C2H_ERR_STAT (0xAF0): This is the error status register of the C2H streaming errors.
QDMA_C2H_ERR_MASK (0xAF4): This is the error mask register. The software can set the bit to
enable the corresponding C2H streaming error to be propagated to the Central Error Aggregator.
QDMA_C2H_FIRST_ERR_QID (0xB30): This is the Qid of the first C2H streaming error.

C2H MM Error
QDMA_C2H MM Status (0x1040)
C2H MM Error Code Enable Mask (0x1054)
C2H MM Error Code (0x1058)
C2H MM Error Info (0x105C)

QDMA H2C0 MM Error


H2C0 MM Status (0x1240)
H2C MM Error Code Enable Mask (0x1254)
H2C MM Error Code (0x1258)
H2C MM Error Info (0x125C)

TRQ Error
QDMA_GLBL_TRQ_ERR_STS (0x264): This is the error status register of the Trq errors.
QDMA_GLBL_TRQ_ERR_MSK (0x268): This is the error mask register.
QDMA_GLBL_TRQ_ERR_LOG_A (0x26C): This is the error logging register. It shows the select,
function and the address of the access when the error happens.


Descriptor Error
QDMA_GLBL_DSC_ERR_STS (0x254): This is the error status register of the descriptor errors.
QDMA_GLBL_DSC_ERR_MSK (0x258): This is the error mask register.
QDMA_GLBL_DSC_ERR_LOG0 (0x25C): This is the error logging register. It has the QID, DMA
direction, and the consumer index of the error.

RAM Double Bit Error


QDMA_RAM_DBE_STS_A (0xFC)
QDMA_RAM_DBE_MSK_A (0xF8)

RAM Single Bit Error


QDMA_RAM_SBE_STS_A (0xF4)
QDMA_RAM_SBE_MSK_A (0xF0)

C2H Streaming Fatal Error Handling

QDMA_C2H_FATAL_ERR_STAT (0xAF8): The error status register of the C2H streaming fatal
errors.
QDMA_C2H_FATAL_ERR_MASK (0xAFC): The error mask register. The software can set the bit to
enable the corresponding C2H fatal error to be sent to the C2H fatal error handling logic.
QDMA_C2H_FATAL_ERR_ENABLE (0xB00): This register enables two C2H streaming fatal
error handling processes:

1. Stop the data transfer by disabling the WRQ from the C2H DMA Write Engine.
2. Invert the WPL parity on the data transfer.

Port Descriptions

QDMA Global Signals

Table: QDMA Global Port Descriptions

Port Name I/O Description

gt_refclk0_p/gt_refclk0_n I GT reference clock

pci_gt_txp/pci_gt_txn O PCIe TX serial interface.


[PL_LINK_CAP_MAX_LINK_WIDTH-
1:0]

pci_gt_rxp/pci_gt_rxn I PCIe RX serial interface.


[PL_LINK_CAP_MAX_LINK_WIDTH-
1:0]


pcie0_user_lnk_up O Output active-High identifies that the PCI Express


core is linked up with a host device.

pcie0_user_clk O User clock out. PCIe derived clock output for all
interface signals output/input to the QDMA. Use this
clock to drive inputs and gate outputs from QDMA.

dma0_axi_aresetn O User reset out. AXI reset signal synchronous with the
clock provided on the pcie0_user_clk output. This
reset should drive all corresponding AXI Interconnect
aresetn signals.

dma0_soft_resetn I Soft reset (active-Low). Use this port to assert reset


and reset the DMA logic. This will reset only the DMA
logic. User should assert and de-assert this port.
All AXI interfaces are clocked out and in by the pcie0_user_clk signal. You are responsible for using
pcie0_user_clk to drive all signals into the CPM.
pcie0_user_clk should be used to interface with the CPM. In the user logic, any available clocks
can be used.

AXI Slave Interface

AXI Bridge Slave ports are connected from the AMD Versal device Network on Chip (NoC) to the
CPM DMA internally. For Slave Bridge AXI4 details, see the Versal Adaptive SoC Programmable
Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
To access QDMA registers, you must follow the protocols outlined in the AXI Slave Bridge Register
Limitations section.
Related Information
Slave Bridge Registers Limitations

AXI4 Memory Mapped Interface

AXI4 (MM) Master ports are connected from the CPM to the AMD Versal device Network on Chip
(NoC) internally. For details, see the Versal Adaptive SoC Programmable Network on Chip and
Integrated Memory Controller LogiCORE IP Product Guide (PG313). The AXI4 Master interface can
be connected to the DDR memory or to the PL user logic, depending on the NoC configuration.

AXI4-Lite Master Interface

AMD Versal device Network on Chip (NoC) provides only AXI4 interface. If you need AXI4-Lite
interface, use SmartConnect IP to convert NoC output AXI4 interface to AXI4-Lite interface. For
details, see the SmartConnect LogiCORE IP Product Guide (PG247).

AXI4-Stream H2C Interface

Table: AXI4-Stream H2C Interface Descriptions

Port Name I/O Description

dma0_m_axis_h2c_tdata O Data output for H2C AXI4-Stream.


[AXI_DATA_WIDTH-1:0]

dma0_m_axis_h2c_par O Odd parity calculated bit-per-byte over


[AXI_DATA_WIDTH/8-1 : 0] dma0_m_axis_h2c_tdata.
dma0_m_axis_h2c_dpar[0] is parity calculated over
dma0_m_axis_h2c_tdata[7:0].
dma0_m_axis_h2c_dpar[1] is parity calculated over
dma0_m_axis_h2c_tdata[15:8], and so on.

dma0_m_axis_h2c_tuser_qid[10:0] O Queue ID

dma0_m_axis_h2c_tuser_port_id[2:0]
O Port ID

dma0_m_axis_h2c_err O If set, indicates the packet has an error. The error


could be coming from the PCIe, or the QDMA might
have encountered a double bit error.

dma0_m_axis_h2c_mdata[31:0] O Metadata
In internal mode, QDMA passes the lower 32 bits of
the H2C AXI4-Stream descriptor on this field.

dma0_m_axis_h2c_mty[5:0] O The number of bytes that are invalid on the last beat
of the transaction. This field is 0 for a 64B transfer.

dma0_m_axis_h2c_zero_byte O When set, it indicates that the current beat is an


empty beat (zero bytes are being transferred).

dma0_m_axis_h2c_tvalid O Valid

dma0_m_axis_h2c_tlast O Indicates that this is the last cycle of the packet


transfer.

dma0_m_axis_h2c_tready I Ready

AXI4-Stream C2H Interface

Table: AXI4-Stream C2H Interface Descriptions

Port Name I/O Description

dma0_s_axis_c2h_tdata I Supports 4 data widths: 64 bits, 128 bits, 256 bits,


[AXI_DATA_WIDTH-1:0] and 512 bits. Every C2H data packet has a
corresponding C2H completion packet.


dma0_s_axis_c2h_dpar I Odd parity computed as bit per byte.


[AXI_DATA_WIDTH/8-1 : 0]

dma0_s_axis_c2h_ctrl_len I Length of the packet. For 0 (zero) byte write, the


[15:0] length is 0. C2H stream packet data length is limited
to 7 * descriptor size.

dma0_s_axis_c2h_ctrl_qid I Queue ID.


[10:0]

dma0_s_axis_c2h_ctrl_imm_data I Immediate data. This allows only the completion and


no DMA on the data payload.

dma0_s_axis_c2h_ctrl_dis_cmpt I Disable completion

dma0_s_axis_c2h_ctrl_marker I Marker message used for making sure pipeline is


completely flushed. After that, you can safely perform
queue invalidation.

dma0_s_axis_c2h_ctrl_port_id I Port ID.


[2:0]

dma0_s_axis_c2h_ctrl_user_trig I User trigger. This can trigger the interrupt and the
status descriptor write if they are enabled.

dma0_s_axis_c2h_mty [5:0] I Empty byte should be set in last beat.

dma0_s_axis_c2h_tvalid I Valid.

dma0_s_axis_c2h_tlast I Indicate last packet.

dma0_s_axis_c2h_tready O Ready.

dma0_s_axis_c2h_cmpt_data[127:0]I Completion data from the user application. This


contains information that is written to the completion
ring in the host. This information includes the length of
the packet transferred in bytes, error, color bit, and
user data. Based on completion size, this could be 1
or 2 beats. Every C2H completion packet has a
corresponding C2H data packet.

dma0_s_axis_c2h_cmpt_size[1:0] I 00: 8B completion.


01: 16B completion.
10: 32B completion.
11: unknown.

dma0_s_axis_c2h_cmpt_dpar[3:0] I Odd parity computed as bit per word.
dma0_s_axis_c2h_cmpt_dpar[0] is parity over
dma0_s_axis_c2h_cmpt_data[31:0].
dma0_s_axis_c2h_cmpt_dpar[1] is parity over
dma0_s_axis_c2h_cmpt_data[63:32], and so on.

dma0_s_axis_c2h_cmpt_tvalid I Valid

dma0_s_axis_c2h_cmpt_tlast I Indicates the end of the completion data transfer.

dma0_s_axis_c2h_cmpt_tready O Ready

AXI4-Stream Status Interface

Table: AXI-ST C2H Status Interface Descriptions

Port Name I/O Description

dma0_axis_c2h_status_valid O Valid per descriptor.

dma0_axis_c2h_status_qid[10:0] O QID of the packet.

dma0_axis_c2h_status_drop O The QDMA drops the packet if it does not have


enough descriptors to transfer the full packet to the
host. This bit indicates if the packet was dropped or
not. A packet that is not dropped is considered as
having been accepted.
0: Packet is not dropped.
1: Packet is dropped.

AXI4-Stream C2H Write Cmp Interface

Table: AXI-ST C2H Write Cmp Interface Descriptions

Port Name I/O Description

dma0_axis_c2h_dmawr_cmp O This signal is asserted when the last data payload


write request of the packet gets the write completion.
It is one pulse per packet.

VDM Interface

Table: VDM Port Descriptions

Port Name I/O Description

dma0_st_rx_msg_tvalid O Valid

dma0_st_rx_msg_tdata[31:0] O Beat 1:
{REQ_ID[15:0], VDM_MSG_CODE[7:0],
VDM_MSG_ROUTING[2:0],



VDM_DW_LENGTH[4:0]}
Beat 2:
VDM Lower Header [31:0]
or
{(Payload_length=0), VDM Higher Header [31:0]}
Beat 3 to Beat <n>:
VDM Payload

dma0_st_rx_msg_tlast O Indicates the last beat

dma0_st_rx_msg_tready I Ready.
✎ Note: When this interface is not used, Ready must
be tied-off to 1.

✎ Recommended: RX Vendor Defined Messages are stored in shallow FIFO before they are
transmitted to output ports. When there are many back-to-back VDM messages, the FIFO overflows
and these messages are dropped. It is best to repeat VDM messages at regular intervals.

FLR Interface

Table: FLR Port Descriptions

Port Names I/O Description

dma0_usr_flr_fnc [7:0] O Function


The function number of the FLR status change.

dma0_usr_flr_set O Set
Asserted for 1 cycle indicating that the FLR status of
the function indicated on dma0_usr_flr_fnc[7:0] is
active.

dma0_usr_flr_clr O Clear
Asserted for 1 cycle indicating that the FLR status of
the function indicated on dma0_usr_flr_fnc[7:0] is
completed.

dma0_usr_flr_done_fnc [7:0] I Done Function


The function for which FLR has been completed.

dma0_usr_flr_done_vld I Done Valid


Assert for one cycle to signal that FLR for the function
on dma0_usr_flr_done_fnc[7:0] has been completed.

QDMA Descriptor Bypass Input Interface

Table: QDMA H2C-Streaming Bypass Input Interface Descriptions


Port Name I/O Description

dma0_h2c_byp_in_st_addr[63:0] I 64-bit starting address of the DMA transfer.

dma0_h2c_byp_in_st_len[15:0] I The number of bytes to transfer.

dma0_h2c_byp_in_st_sop I Indicates start of packet. Set for the first descriptor.


Reset for the rest of the descriptors.

dma0_h2c_byp_in_st_eop I Indicates end of packet. Set for the last descriptor.


Reset for the rest of the descriptors

dma0_h2c_byp_in_st_sdi I H2C Bypass In Status Descriptor/Interrupt


If set, it is treated as an indication from the user
application to the QDMA to send the status descriptor
to host, and to generate an interrupt to host when the
QDMA has fetched the last byte of the data
associated with this descriptor. The QDMA honors the
request to generate an interrupt only if interrupts have
been enabled in the H2C SW context for this QID and
armed by the driver. This can only be set for an EOP
descriptor.
QDMA will hang if the last descriptor without
h2c_byp_in_st_sdi has an error. This results in a
missing writeback, and hw_ctxt.dsc_pend bit that are
asserted indefinitely. The workaround is to send a
zero length descriptor to trigger the Completion
(CMPT) Status.

dma0_h2c_byp_in_st_mrkr_req I H2C Bypass In Marker Request


When set, the descriptor passes through the H2C
Engine pipeline and once completed, produces a
marker response on the interface. This can only be
set for an EOP descriptor.

dma0_h2c_byp_in_st_no_dma I H2C Bypass In No DMA


When sending a descriptor through the interface with
this signal asserted, it informs the QDMA to not send
any PCIe requests for this descriptor. Because no
PCIe request is sent out, no corresponding DMA data
is issued on the H2C Streaming output interface.
This signal is typically used in conjunction with
h2c_byp_in_st_sdi to cause Status
Descriptor/Interrupt when the user logic is out of the
actual descriptors and still wants to drive the
h2c_byp_in_st_sdi signal.
If dma0_h2c_byp_in_st_mrkr_req and
h2c_byp_in_st_sdi are reset when sending in a no-



DMA descriptor, the descriptor is treated as a NOP
and is completely consumed inside the QDMA without
any interface activity.
If dma0_h2c_byp_in_st_no_dma is set, both
dma0_h2c_byp_in_st_sop and
dma0_h2c_byp_in_st_eop must be set.
If dma0_h2c_byp_in_st_no_dma is set, the QDMA
ignores the address and length fields of this interface.

dma0_h2c_byp_in_st_qid[10:0] I The QID associated with the H2C descriptor ring.

dma0_h2c_byp_in_st_error I This bit can be set to indicate an error for the queue.
The descriptor will not be processed. Context will be
updated to reflect an error in the queue

dma0_h2c_byp_in_st_func[7:0] I PCIe function ID

dma0_h2c_byp_in_st_cidx[15:0] I The CIDX that will be used for the status descriptor
update and/or interrupt (aggregation mode). Generally
the CIDX should be left unchanged from when it was
received from the descriptor bypass output interface.

dma0_h2c_byp_in_st_port_id[2:0] I QDMA port ID

dma0_h2c_byp_in_st_valid I Valid. High indicates descriptor is valid. One pulse for


one descriptor.

dma0_h2c_byp_in_st_ready O Ready to take in descriptor

Table: QDMA H2C-MM Descriptor Bypass Input Port Descriptions

Port Name I/O Description

dma0_h2c_byp_in_mm_radr[63:0] I The read address for the DMA data.

dma0_h2c_byp_in_mm_wadr[63:0] I The write address for the DMA data.

dma0_h2c_byp_in_mm_len[27:0] I The DMA data length.


The upper 12 bits must be tied to 0. Thus only the
lower 16 bits of this field can be used for specifying
the length.

dma0_h2c_byp_in_mm_sdi I H2C-MM Bypass In Status Descriptor/Interrupt


If set, the signal is treated as an indication from the
user logic to the QDMA to send the status descriptor
to the host and generate an interrupt to the host when
the QDMA has fetched the last byte of the data
associated with this descriptor. The QDMA will honor

Displayed in the footer


Page 145 of 490
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347)
Displayed in the header

Port Name I/O Description


the request to generate an interrupt only if interrupts
have been enabled in the H2C ring context for this
QID and armed by the driver.
QDMA will hang if the last descriptor without
dma0_h2c_byp_in_mm_sdi has an error. This results
in a missing writeback, and the hw_ctxt.dsc_pend bit
is asserted indefinitely. The workaround is to send a
zero length descriptor to trigger the Completion
(CMPT) Status.

dma0_h2c_byp_in_mm_mrkr_req I H2C-MM Bypass In Completion Request


Indication from the user logic that the QDMA must
send a completion status to the user logic after the
QDMA has completed the data transfer of this
descriptor.

dma0_h2c_byp_in_mm_qid[10:0] I The QID associated with the H2C descriptor ring.

dma0_h2c_byp_in_mm_error I This bit can be set to indicate an error for the queue.
The descriptor will not be processed. Context will be
updated to reflect an error in the queue.

dma0_h2c_byp_in_mm_func[7:0] I PCIe function ID

dma0_h2c_byp_in_mm_cidx[15:0] I The CIDX that will be used for the status descriptor
update and/or interrupt (aggregation mode). Generally
the CIDX should be left unchanged from when it was
received from the descriptor bypass output interface.

dma0_h2c_byp_in_mm_port_id[2:0] I QDMA port ID

dma0_h2c_byp_in_mm_valid I Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma0_h2c_byp_in_mm_ready O Ready to take in descriptor

Table: QDMA C2H-Streaming Cache Bypass Input Port Descriptions

Port Name I/O Description

dma0_c2h_byp_in_st_csh_addr I 64 bit address where DMA writes data.


[63:0]

dma0_c2h_byp_in_st_csh_qid I The QID associated with the C2H descriptor ring.


[10:0]

dma0_c2h_byp_in_st_csh_error I This bit can be set to indicate an error for the queue.
The descriptor will not be processed. Context will be updated to reflect an error in the queue.

dma0_c2h_byp_in_st_csh_func I PCIe function ID


[7:0]

dma0_c2h_byp_in_st_csh_port_id[2:0]
I QDMA port ID

dma0_c2h_byp_in_st_csh_valid I Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma0_c2h_byp_in_st_csh_ready O Ready to take in descriptor.

Table: QDMA C2H-Streaming Simple Bypass Input Port Descriptions

Port Name I/O Description

dma0_c2h_byp_in_st_sim_addr I 64-bit address where DMA writes data.


[63:0]

dma0_c2h_byp_in_st_sim_qid I The QID associated with the C2H descriptor ring.


[10:0]

dma0_c2h_byp_in_st_sim_error I This bit can be set to indicate an error for the queue.
The descriptor will not be processed. Context will be
updated to reflect an error in the queue.

dma0_c2h_byp_in_st_sim_func I PCIe function ID


[7:0]

dma0_c2h_byp_in_st_sim_port_id[2:0]
I QDMA port ID

dma0_c2h_byp_in_st_sim_valid I Valid. High indicates descriptor is valid. One pulse for


one descriptor.

dma0_c2h_byp_in_st_sim_ready O Ready to take in descriptor.

Table: QDMA C2H-MM Descriptor Bypass Input Port Descriptions

Port Name I/O Description

dma0_c2h_byp_in_mm_raddr I The read address for the DMA data.


[63:0]

dma0_c2h_byp_in_mm_wadr[63:0] I The write address for the DMA data.

dma0_c2h_byp_in_mm_len[27:0] I The DMA data length.

dma0_c2h_byp_in_mm_sdi I C2H Bypass In Status Descriptor/Interrupt


If set, it is treated as an indication from the user logic
to the QDMA to send the status descriptor to host,



and generate an interrupt to host when the QDMA has
fetched the last byte of the data associated with this
descriptor. The QDMA will honor the request to
generate an interrupt only if interrupts have been
enabled in the C2H ring context for this QID and
armed by the driver.

dma0_c2h_byp_in_mm_mrkr_req I C2H Bypass In Marker Request


Indication from the user logic that the QDMA must
send a completion status to the user logic after the
QDMA has completed the data transfer of this
descriptor.

dma0_c2h_byp_in_mm_qid I The QID associated with the C2H descriptor ring.


[10:0]

dma0_c2h_byp_in_mm_error I This bit can be set to indicate an error for the queue.
The descriptor will not be processed. Context will be
updated to reflect an error in the queue.

dma0_c2h_byp_in_mm_func I PCIe function ID


[7:0]

dma0_c2h_byp_in_mm_cidx I The User must echo the CIDX from the descriptor that
[15:0] it received on the bypass-out interface.

dma0_c2h_byp_in_mm_port_id[2:0] I QDMA port ID

dma0_c2h_byp_in_mm_valid I Valid. High indicates descriptor is valid. One pulse for


one descriptor.

dma0_c2h_byp_in_mm_ready O Ready to take in descriptor.

QDMA Descriptor Bypass Output Interface

Table: QDMA H2C Descriptor Bypass Output Interface Descriptions

Port Name I/O Description

dma0_h2c_byp_out_dsc[255:0] O The H2C descriptor fetched from the host.


For H2C AXI-MM, the QDMA uses all 256 bits, and
the structure of the bits are the same as found in AXI
Memory Mapped Writeback Status Structure for H2C
and C2H.
For H2C AXI-ST, the QDMA uses [127:0] bits, and the
structure of the bits are the same as found in H2C
Stream Status Descriptor Writeback.


dma0_h2c_byp_out_st_mm O Indicates whether this is a streaming data descriptor


or memory-mapped descriptor.
0: Streaming
1: Memory-mapped

dma0_h2c_byp_out_dsc_sz[1:0] O Descriptor size. This field indicates the size of the


descriptor.
0: 8B
1: 16B
2: 32B
3: 64B - 64B descriptors will be transferred with two
valid/ready cycles. The first cycle has the least
significant 32 bytes. The second cycle has the most
significant 32 bytes. CIDX and other queue
information is valid only on the second beat of a 64B
descriptor .

dma0_h2c_byp_out_qid[10:0] O The QID associated with the H2C descriptor ring.

dma0_h2c_byp_out_error O Indicates that an error was encountered in descriptor


fetch or execution of a previous descriptor.

dma0_h2c_byp_out_func[7:0] O PCIe function ID

dma0_h2c_byp_out_cidx[15:0] O H2C Bypass Out Consumer Index


The ring index of the descriptor fetched. The User
must echo this field back to QDMA when submitting
the descriptor on the bypass-in interface.

dma0_h2c_byp_out_port_id[2:0] O QDMA port ID

dma0_h2c_byp_out_mrkr_rsp O Indicates completion status in response to


h2c_byp_in_st_mrkr_req (Stream) or
h2c_byp_in_mm_mrkr_req (MM).

dma0_h2c_byp_out_valid O Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma0_h2c_byp_out_ready I Ready. When this interface is not used, Ready must


be tied-off to 1.

Table: QDMA C2H Descriptor Bypass Output Port Descriptions

Port Name I/O Description

dma0_c2h_byp_out_dsc[255:0]  O  The C2H descriptor fetched from the host. For C2H AXI-MM, the QDMA uses all 256 bits, and the structure of the bits is the same as found in AXI Memory Mapped Writeback Status Structure for H2C and C2H. For C2H AXI-ST, the QDMA uses bits [63:0], and the structure of the bits is the same as found in C2H Stream Descriptor (8B). The remaining bits are ignored.

dma0_c2h_byp_out_st_mm  O  Indicates whether this is a streaming data descriptor or a memory-mapped descriptor.
0: Streaming
1: Memory-mapped

dma0_c2h_byp_out_dsc_sz[1:0]  O  Descriptor size. This field indicates the size of the descriptor.
0: 8B
1: 16B
2: 32B
3: 64B. 64B descriptors are transferred with two valid/ready cycles. The first cycle has the least significant 32 bytes; the second cycle has the most significant 32 bytes. CIDX and other queue information is valid only on the second beat of a 64B descriptor.

dma0_c2h_byp_out_qid[10:0]  O  The QID associated with the C2H descriptor ring.

dma0_c2h_byp_out_error  O  Indicates that an error was encountered in descriptor fetch or execution of a previous descriptor.

dma0_c2h_byp_out_func[7:0]  O  PCIe function ID.

dma0_c2h_byp_out_cidx[15:0]  O  C2H Bypass Out Consumer Index. The ring index of the descriptor fetched. The user must echo this field back to the QDMA when submitting the descriptor on the bypass-in interface.

dma0_c2h_byp_out_port_id[2:0]  O  QDMA port ID.

dma0_c2h_byp_out_mrkr_rsp  O  Indicates completion status in response to dma0_s_axis_c2h_ctrl_marker (Stream) or c2h_byp_in_mm_mrkr_req (MM). For the completion status for dma0_s_axis_c2h_ctrl_marker (Stream), the details are given in the table below.

dma0_c2h_byp_out_valid  O  Valid. High indicates the descriptor is valid; one pulse per descriptor.

dma0_c2h_byp_out_ready  I  Ready. When this interface is not used, Ready must be tied off to 1.

Table: QDMA C2H Descriptor Bypass out Marker Response Description

Field Name  Location  Description

err[1:0]  [1:0]  Error code reported by the C2H Engine.
0: No error
1: SW gave a bad Completion CIDX update
2: Descriptor error received while processing the C2H packet
3: Completion dropped by the C2H Engine because the Completion Ring was full

retry_marker_req  [2]  The marker request could not be completed because an interrupt could not be generated in spite of being enabled. This happens when an interrupt is already outstanding on the queue when the marker request was received. The user logic must wait and retry the marker request.

rsv  [255:3]  Reserved

It is common for dma0_h2c_byp_out_valid or dma0_c2h_byp_out_valid to be asserted with the CIDX value. This occurs when the descriptor bypass mode option is not set in the context programming selection. You must set the descriptor bypass mode during QDMA IP core customization in the AMD Vivado™ IDE to see the descriptor bypass output ports. When the descriptor bypass option is selected in the Vivado IDE but the descriptor bypass bit is not set in the context programming, you will see valid signals asserted with CIDX updates.

QDMA Descriptor Credit Input Interface

Table: QDMA Descriptor Credit Input Port Descriptions

Port Name I/O Description

dma0_dsc_crdt_in_valid  I  Valid. When asserted, the user must present valid data on the bus and maintain the bus values until both valid and ready are asserted on the same cycle.

dma0_dsc_crdt_in_rdy  O  Ready. Assertion of this signal indicates that the DMA is ready to accept data from this bus.

dma0_dsc_crdt_in_sel  I  Indicates whether the credits are for the H2C or C2H descriptor ring.
0: H2C
1: C2H

dma0_dsc_crdt_in_qid[10:0]  I  The QID associated with the descriptor ring for which the credits are being added.

dma0_dsc_crdt_in_crdt[15:0]  I  The number of descriptor credits that the user application is giving to the QDMA to fetch descriptors from the host.

QDMA Traffic Manager Credit Output Interface

Table: QDMA TM Credit Output Port Descriptions

Port Name I/O Description

dma0_tm_dsc_sts_valid  O  Valid. Indicates valid data on the output bus. Valid data on the bus is held until dma0_tm_dsc_sts_rdy is asserted by the user.

dma0_tm_dsc_sts_rdy  I  Ready. Assertion indicates that the user logic is ready to accept the data on this bus. When this interface is not used, Ready must be tied off to 1.

dma0_tm_dsc_sts_byp  O  Shows the bypass bit in the SW descriptor context.

dma0_tm_dsc_sts_dir  O  Indicates whether the status update is for an H2C or C2H descriptor ring.
0: H2C
1: C2H

dma0_tm_dsc_sts_mm  O  Indicates whether the status update is for a streaming or memory-mapped queue.
0: Streaming
1: Memory-mapped

dma0_tm_dsc_sts_qid[10:0]  O  The QID of the ring.

dma0_tm_dsc_sts_avl[15:0]  O  If dma0_tm_dsc_sts_qinv is set, this is the number of credits available in the descriptor engine. If dma0_tm_dsc_sts_qinv is not set, this is the number of new descriptors that have been posted to the ring since the last time this update was sent.

dma0_tm_dsc_sts_qinv  O  If set, indicates that the queue has been invalidated. This is used by the user application to reconcile the credit accounting between the user application and the QDMA.

dma0_tm_dsc_sts_qen  O  The current queue enable status.

dma0_tm_dsc_sts_irq_arm  O  If set, indicates that the driver is ready to accept interrupts.

dma0_tm_dsc_sts_error  O  Set to 1 if the PIDX update rolled over the current CIDX of the associated queue.

dma0_tm_dsc_sts_port_id[2:0]  O  The port ID associated with the queue from the queue context.

User Interrupts

Table: User Interrupts Port Descriptions

Port Name I/O Description

dma0_usr_irq_vld  I  Valid. An assertion indicates that an interrupt associated with the vector, function, and pending fields on the bus should be generated to PCIe. Once asserted, dma0_usr_irq_vld must remain high until dma0_usr_irq_ack is asserted by the DMA.

dma0_usr_irq_vec[4:0]  I  Vector. The MSI-X vector to be sent. The vector number is the index into the MSI-X table. Vector 0 is the first vector for that function. Each function has 8 vectors.

dma0_usr_irq_fnc[7:0]  I  Function. The function of the vector to be sent.

dma0_usr_irq_ack  O  Interrupt Acknowledge. An assertion of the acknowledge bit indicates that the interrupt was transmitted on the link. The user logic must wait for this pulse before signaling another interrupt condition.

dma0_usr_irq_fail  O  Interrupt Fail. An assertion of fail indicates that the interrupt request was aborted before transmission on the link.

✎ Note: The IP can support a maximum of 32 vectors per function, but only 2K vectors are supported for all functions combined. The AMD driver supports only 8 vectors per function.

NoC Ports

✎ Note: NoC ports must always be connected to the NoC. They cannot be left unconnected or connected to any other blocks; doing so results in synthesis and implementation errors. For a connection reference, see the following figure:

Table: NoC Ports

Port Name I/O Description

CPM_PCIE_NOC_0 O AXI4 MM 0 port from CPM to NoC

CPM_PCIE_NOC_1 O AXI4 MM 1 port from CPM to NoC

cpm_pcie_noc_axi0_clk* O Clock for AXI4 MM 0 Port

cpm_pcie_noc_axi1_clk* O Clock for AXI4 MM 1 Port

NOC_CPM_PCIE_0  I  AXI4 MM port from NoC to CPM. This port is enabled when the AXI Slave Bridge is enabled.

noc_cpm_pcie_axi0_clk*  O  Clock for the AXI4 MM port from the NoC. This port is enabled when the AXI Slave Bridge is enabled.

✎ Note: * All NoC-related clock frequencies can be modified in the PS PMC GUI settings. By default, all the clock frequencies are set to the maximum frequency for the corresponding configuration.

Figure: CPM4 NoC Connection


CPM4 Mailbox Ports

✎ Note: Mailbox ports are always connected to the Mailbox IP. If the Mailbox IP is not used, leave the port unconnected (floating). For a connection reference, see the figure below.

Table: Mailbox Ports

Port Name I/O Description

dma0_mgmt  I  These DMA management ports should be connected to the Mailbox IP.

Register Space

QDMA PF Address Register Space

All the physical function (PF) registers are found in cpm4-qdma-v2-1-registers.csv available in the
register map files.
To locate the register space information:

1. Download the register map files.


2. Extract the ZIP file contents into any write-accessible location.
3. Refer to cpm4-qdma-v2-1-registers.csv.

Table: QDMA PF Address Register Space

Register Name Base (Hex) Byte Size (Dec) Register List and Details

QDMA_CSR  0x0000  8192  QDMA Configuration Space Register (CSR) found in cpm4-qdma-v2-1-registers.csv.

QDMA_TRQ_MSIX  0x2000  512  Also found in QDMA_TRQ_MSIX (0x2000). Maximum of 32 vectors per function.

QDMA_PF_MAILBOX  0x2400  16384  Also found in QDMA_PF_MAILBOX (0x2400).

QDMA_TRQ_SEL_QUEUE_PF  0x6400  32768  Also found in QDMA_TRQ_SEL_QUEUE_PF (0x6400).

QDMA_CSR (0x0000)

QDMA Configuration Space Register (CSR) descriptions are found in cpm4-qdma-v2-1-registers.csv. See above for details.

QDMA_TRQ_MSIX (0x2000)

Table: QDMA_TRQ_MSIX (0x2000)

Byte Offset Bit Default Access Type Field Description

0x2000  [31:0]  0  NA  addr  MSIX_Vector0_Address[31:0]. MSI-X vector0 message lower address.

0x2004  [31:0]  0  RO  addr  MSIX_Vector0_Address[63:32]. MSI-X vector0 message upper address.

0x2008  [31:0]  0  RO  data  MSIX_Vector0_Data[31:0]. MSI-X vector0 message data.

0x200C  [31:0]  0  RO  control  MSIX_Vector0_Control[31:0]. MSI-X vector0 control.
Bit positions:
31:1: Reserved.
0: Mask. When set to 1, this MSI-X vector is not used to generate a message. When reset to 0, this MSI-X vector is used to generate a message.

The MSI-X table PBA offset is at 0x1400.
✎ Note: The table above represents MSI-X table entry 0. Each function can support up to 32 vectors.
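
For reference, the 16-byte entry layout above can be expressed as a simple C structure. This is a hedged sketch only; the struct and field names are illustrative and are not part of the register map.

#include <stdint.h>

/* One MSI-X table entry as laid out above (16 bytes per vector). */
typedef struct {
    uint32_t addr_lo;   /* +0x0: message address [31:0]  */
    uint32_t addr_hi;   /* +0x4: message address [63:32] */
    uint32_t data;      /* +0x8: message data            */
    uint32_t control;   /* +0xC: bit 0 = vector mask     */
} msix_entry_t;

/* Entry n of a function's table sits at table_base + n * sizeof(msix_entry_t). */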

QDMA_PF_MAILBOX (0x2400)

Table: QDMA_PF_MAILBOX (0x2400) Register Space

Register Address Description

Function Status Register (0x2400)  0x2400  Status bits

Function Command Register (0x2404)  0x2404  Command register bits

Function Interrupt Vector Register (0x2408)  0x2408  Interrupt vector register

Target Function Register (0x240C)  0x240C  Target function register

Function Interrupt Control Register (0x2410)  0x2410  Interrupt control register

RTL Version Register (0x2414)  0x2414  RTL version register

PF Acknowledgment Registers (0x2420-0x243C)  0x2420-0x243C  PF acknowledge

FLR Control/Status Register (0x2500)  0x2500  FLR control and status

Incoming Message Memory (0x2C00-0x2C7C)  0x2C00-0x2C7C  Incoming message (128 bytes)

Outgoing Message Memory (0x3000-0x307C)  0x3000-0x307C  Outgoing message (128 bytes)

Mailbox Addressing
PF addressing
Addr = PF_Bar_offset + CSR_addr

VF addressing
Addr = VF_Bar_offset + VF_Start_offset + VF_offset + CSR_addr
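
A minimal C sketch of these two formulas follows. The BAR and VF offsets are assumptions supplied by your own BAR mapping or driver; only the formulas themselves come from this guide, and the function names are illustrative.

#include <stdint.h>

/* PF mailbox register address: PF_Bar_offset + CSR_addr. */
static inline uint64_t pf_mailbox_addr(uint64_t pf_bar_offset, uint32_t csr_addr)
{
    return pf_bar_offset + csr_addr;   /* e.g., csr_addr = 0x2400 for the PF Function Status Register */
}

/* VF mailbox register address: VF_Bar_offset + VF_Start_offset + VF_offset + CSR_addr. */
static inline uint64_t vf_mailbox_addr(uint64_t vf_bar_offset, uint64_t vf_start_offset,
                                       uint64_t vf_offset, uint32_t csr_addr)
{
    return vf_bar_offset + vf_start_offset + vf_offset + csr_addr;   /* e.g., csr_addr = 0x1000 */
}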

Function Status Register (0x2400)

Table: Function Status Register (0x2400)

Bit  Default  Access Type  Field  Description

[31:12]  0  NA  Reserved  Reserved

[11:4]  0  RO  cur_src_fn  This field is for PF use only. The source function number of the message at the top of the incoming request queue.

[2]  0  RO  ack_status  This field is for PF use only. The status bit is set when any bit in the acknowledgment status register is asserted.

[1]  0  RO  o_msg_status  For VF: The status bit is set when the VF driver writes msg_send to its command register. When the associated PF driver sends an acknowledgment to this VF, the hardware clears this field. The VF driver is not allowed to update any content in its outgoing mailbox memory (OMM) while o_msg_status is asserted. Any illegal write to the OMM is discarded (optionally, this can cause an error in the AXI4-Lite response channel).
For PF: The field indicates the message status of the target FN, which is specified in the Target FN Register. The status bit is set when the PF driver sends the msg_send command. When the corresponding function driver sends an acknowledgment by sending msg_rcv, the hardware clears this field. The PF driver is not allowed to update any content in its outgoing mailbox memory (OMM) while o_msg_status(target_fn_id) is asserted. Any illegal write to the OMM is discarded (optionally, this can cause an error in the AXI4-Lite response channel).

[0]  0  RO  i_msg_status  For VF: When asserted, a message in the VF's incoming mailbox memory is pending for processing. The field is cleared once the VF driver writes msg_rcv to its command register.
For PF: When asserted, the messages in the incoming mailbox memory are pending for processing. The field is cleared only when the event queue is empty.

Function Command Register (0x2404)

Table: Function Command Register (0x2404)

Bit Default Access Type Field Description

[31:3]  0  NA  Reserved  Reserved

[2]  0  RO  Reserved  Reserved

[1]  0  RW  msg_rcv  For VF: The VF marks the message in its incoming mailbox memory as received. The hardware asserts the acknowledgment bit of the associated PF.
For PF: The PF marks the message sent by target_fn as received. The hardware refreshes the i_msg_status of the PF and clears the o_msg_status of the target_fn.

[0]  0  RW  msg_send  For VF: The VF marks the current message in its own outgoing mailbox as valid.
For PF:
Current target_fn_id belongs to a VF: The PF finished writing a message into the incoming mailbox memory of the VF with target_fn_id. The hardware sets the i_msg_status field of the target FN's status register.
Current target_fn_id belongs to a PF: The PF finished writing a message into its own outgoing mailbox memory. The hardware pushes the message to the event queue of the PF with target_fn_id.
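
As an illustration of how the status, command, target-function, and outgoing-message registers work together, the following hedged C sketch shows one PF sending a 128-byte message to another PF. The register-access helpers and the mailbox base pointer are placeholders for your own access layer; only the offsets and field meanings come from the tables above, and the AMD QDMA driver sources remain the production reference.

#include <stdint.h>

#define MBOX_FN_STATUS    0x2400u   /* Function Status Register            */
#define MBOX_FN_CMD       0x2404u   /* Function Command Register           */
#define MBOX_TARGET_FN    0x240Cu   /* Target Function Register            */
#define MBOX_OUT_MSG      0x3000u   /* Outgoing Message Memory (128 bytes) */
#define MBOX_STS_O_MSG    (1u << 1) /* o_msg_status                        */
#define MBOX_CMD_MSG_SEND (1u << 0) /* msg_send                            */

/* Placeholder accessors over a memory-mapped mailbox region. */
static inline void mbox_wr32(volatile uint8_t *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(base + off) = val;
}
static inline uint32_t mbox_rd32(volatile uint8_t *base, uint32_t off)
{
    return *(volatile uint32_t *)(base + off);
}

/* Send a 32-dword message from this PF to the PF numbered target_fn. */
void pf_mbox_send_to_pf(volatile uint8_t *pf_mbox_base, uint8_t target_fn,
                        const uint32_t msg[32])
{
    /* Select the target function for the status check and the send. */
    mbox_wr32(pf_mbox_base, MBOX_TARGET_FN, target_fn);

    /* The OMM must not be updated while o_msg_status(target_fn_id) is set. */
    while (mbox_rd32(pf_mbox_base, MBOX_FN_STATUS) & MBOX_STS_O_MSG)
        ;

    /* Write up to 128 bytes of payload into the outgoing message memory. */
    for (int i = 0; i < 32; i++)
        mbox_wr32(pf_mbox_base, MBOX_OUT_MSG + 4u * (uint32_t)i, msg[i]);

    /* msg_send: hardware pushes the message to the target PF's event queue. */
    mbox_wr32(pf_mbox_base, MBOX_FN_CMD, MBOX_CMD_MSG_SEND);
}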

Function Interrupt Vector Register (0x2408)

Table: Function Interrupt Vector Register (0x2408)

Bit Default Access Type Field Description

[31:5] 0 NA Reserved Reserved

[4:0]  0  RW  int_vect  5-bit interrupt vector assigned by the driver.

Target Function Register (0x240C)

Table: Target Function Register (0x240C)

Bit Default Access Type Field Description

[31:8] 0 NA Reserved Reserved

[7:0]  0  RW  target_fn_id  This field is for PF use only. The FN number that the current operation is targeting.

Function Interrupt Control Register (0x2410)

Table: Function Interrupt Control Register (0x2410)

Bit Default Access Type Field Description

[31:1] 0 NA Reserved Reserved

[0] 0 RW int_en Interrupt enable.

RTL Version Register (0x2414)

Table: RTL Version Register (0x2414)

Bit Default Access Type Description

[31:16] 0x1fd3 RO QDMA ID

[15:0]  0  RO  Vivado version. 0x1000: CPM QDMA Vivado version 2020.1.

PF Acknowledgment Registers (0x2420-0x243C)

Table: PF Acknowledgment Registers (0x2420-0x243C)

Register Addr Default Access Type Width Description

Ack0  0x2420  0  RW  32  Acknowledgment from FN 31~0

Ack1  0x2424  0  RW  32  Acknowledgment from FN 63~32

Ack2  0x2428  0  RW  32  Acknowledgment from FN 95~64

Ack3  0x242C  0  RW  32  Acknowledgment from FN 127~96

Ack4  0x2430  0  RW  32  Acknowledgment from FN 159~128

Ack5  0x2434  0  RW  32  Acknowledgment from FN 191~160

Ack6  0x2438  0  RW  32  Acknowledgment from FN 223~192

Ack7  0x243C  0  RW  32  Acknowledgment from FN 255~224

FLR Control/Status Register (0x2500)

Table: FLR Control/Status Register (0x2500)

Bit Default Access Type Field Description

[31:1] 0 NA Reserved Reserved

[0]  0  RW  flr_status  Software writes 1 to initiate the Function Level Reset (FLR) for the associated function. The field is kept asserted during the FLR process. After the FLR is done, the hardware deasserts this field.

Incoming Message Memory (0x2C00-0x2C7C)

Table: Incoming Message Memory (0x2C00-0x2C7C)

Register  Addr  Default  Access Type  Width  Description

i_msg_i  0x2C00 + i*4  0  RW  32  The ith word of the incoming message (0 ≤ i < 32).

Outgoing Message Memory (0x3000-0x307C)

Table: Outgoing Message Memory (0x3000-0x307C)

Register  Addr  Default  Access Type  Width  Description

o_msg_i  0x3000 + i*4  0  RW  32  The ith word of the outgoing message (0 ≤ i < 32).

QDMA_TRQ_SEL_QUEUE_PF (0x6400)

Table: QDMA_TRQ_SEL_QUEUE_PF (0x6400) Register Space

Register Address Description

QDMA_DMAP_SEL_INT_CIDX[2048] (0x6400)  0x6400-0xB3F0  Interrupt Ring Consumer Index (CIDX)

QDMA_DMAP_SEL_H2C_DSC_PIDX[2048] (0x6404)  0x6404-0xB3F4  H2C Descriptor Producer Index (PIDX)

QDMA_DMAP_SEL_C2H_DSC_PIDX[2048] (0x6408)  0x6408-0xB3F8  C2H Descriptor Producer Index (PIDX)

QDMA_DMAP_SEL_CMPT_CIDX[2048] (0x640C)  0x640C-0xB3FC  C2H Completion Consumer Index (CIDX)

There are 2048 queues, and each queue has the four registers listed above. All these registers can be dynamically updated at any time. This set of registers is accessed based on the queue number. The queue number is the absolute Qnumber [0 to 2047].

Interrupt CIDX address = 0x6400 + Qnumber*16
H2C PIDX address = 0x6404 + Qnumber*16
C2H PIDX address = 0x6408 + Qnumber*16
Write Back CIDX address = 0x640C + Qnumber*16

For Queue 0:

0x6400 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x6404 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x6408 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x640C corresponds to QDMA_DMAP_SEL_WRB_CIDX

For Queue 1:

0x6410 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x6414 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x6418 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x641C corresponds to QDMA_DMAP_SEL_WRB_CIDX

For Queue 2:

0x6420 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x6424 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x6428 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x642C corresponds to QDMA_DMAP_SEL_WRB_CIDX
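
A minimal C sketch of this per-queue address arithmetic follows. The helper and enum names are illustrative; only the 0x6400 base, the 16-byte per-queue stride, and the four register offsets come from this section.

#include <stdint.h>

#define QDMA_PF_QUEUE_BASE  0x6400u   /* QDMA_TRQ_SEL_QUEUE_PF base            */
#define QDMA_QUEUE_STRIDE   16u       /* four 32-bit registers per queue       */

enum qdma_q_reg {
    QDMA_DMAP_SEL_INT_CIDX_OFF     = 0x0,   /* interrupt ring CIDX              */
    QDMA_DMAP_SEL_H2C_DSC_PIDX_OFF = 0x4,   /* H2C descriptor PIDX              */
    QDMA_DMAP_SEL_C2H_DSC_PIDX_OFF = 0x8,   /* C2H descriptor PIDX              */
    QDMA_DMAP_SEL_CMPT_CIDX_OFF    = 0xC    /* completion (writeback) CIDX      */
};

static inline uint32_t qdma_pf_q_reg_off(uint32_t qnum, enum qdma_q_reg r)
{
    /* Queue 0: 0x6400/0x6404/0x6408/0x640C, Queue 1: 0x6410..., and so on. */
    return QDMA_PF_QUEUE_BASE + qnum * QDMA_QUEUE_STRIDE + (uint32_t)r;
}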

QDMA_DMAP_SEL_INT_CIDX[2048] (0x6400)

Table: QDMA_DMAP_SEL_INT_CIDX[2048] (0x6400)

Bit Default Access Type Field Description

[31:24] 0 NA Reserved Reserved

[23:16]  0  RW  ring_idx  Ring index of the Interrupt Aggregation Ring

[15:0]  0  RW  sw_cidx  Software Consumer Index (CIDX)

QDMA_DMAP_SEL_H2C_DSC_PIDX[2048] (0x6404)

Table: QDMA_DMAP_SEL_H2C_DSC_PIDX[2048] (0x6404)

Bit Default Access Type Field Description

[31:17] 0 NA Reserved Reserved

[16]  0  RW  irq_arm  Interrupt arm. Set this bit to 1 for the next interrupt generation.

[15:0]  0  RW  h2c_pidx  H2C Producer Index

QDMA_DMAP_SEL_C2H_DSC_PIDX[2048] (0x6408)

Table: QDMA_DMAP_SEL_C2H_DSC_PIDX[2048] (0x6408)

Bit Default Access Type Field Description

[31:17] 0 NA Reserved Reserved

[16]  0  RW  irq_arm  Interrupt arm. Set this bit to 1 for the next interrupt generation.

[15:0] 0 RW c2h_pidx C2H Producer Index

QDMA_DMAP_SEL_CMPT_CIDX[2048] (0x640C)

Table: QDMA_DMAP_SEL_CMPT_CIDX[2048] (0x640C)

Bit Default Access Type Field Description

[31:29] 0 NA Reserved Reserved

[28]  0  RW  irq_en_wrb  Interrupt arm. Set this bit to 1 for the next interrupt generation.

[27]  0  RW  en_sts_desc_wrb  Enable Status Descriptor for CMPT

[26:24]  0  RW  trigger_mode  Interrupt and Status Descriptor Trigger Mode:
0x0: Disabled
0x1: Every
0x2: User_Count
0x3: User
0x4: User_Timer
0x5: User_Timer_Count

[23:20]  0  RW  c2h_timer_cnt_index  Index to QDMA_C2H_TIMER_CNT

[19:16]  0  RW  c2h_count_threshold  Index to QDMA_C2H_CNT_TH

[15:0]  0  RW  wrb_cidx  CMPT Consumer Index (CIDX)

QDMA VF Address Register Space

Table: QDMA VF Address Register Space

Target Name  Base (Hex)  Byte Size (Dec)  Notes

QDMA_TRQ_SEL_QUEUE_VF (0x3000)  00003000  4096  VF direct QCSR (16B per queue, up to a maximum of 256 queues per function)

QDMA_TRQ_MSIX_VF (0x4000)  00004000  4096  Space for 32 MSI-X vectors and PBA

QDMA_VF_MAILBOX (0x1000)  00001000  8192  Mailbox address space

QDMA_TRQ_SEL_QUEUE_VF (0x3000)

VF functions can access the direct-update registers per queue at offset 0x3000. The description for this register space is the same as QDMA_TRQ_SEL_QUEUE_PF (0x6400).
This set of registers is accessed based on the queue number. The queue number is the absolute Qnumber [0 to 255].

Interrupt CIDX address = 0x3000 + Qnumber*16
H2C PIDX address = 0x3004 + Qnumber*16
C2H PIDX address = 0x3008 + Qnumber*16
Completion CIDX address = 0x300C + Qnumber*16

For Queue 0:

0x3000 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x3004 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x3008 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x300C corresponds to QDMA_DMAP_SEL_WRB_CIDX

For Queue 1:

0x3010 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x3014 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x3018 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x301C corresponds to QDMA_DMAP_SEL_WRB_CIDX

QDMA_TRQ_MSIX_VF (0x4000)

VF functions access the MSI-X table at the QDMA_TRQ_MSIX_VF offset within that function. The description for this register space is the same as QDMA_TRQ_MSIX (0x2000).

QDMA_VF_MAILBOX (0x1000)

Table: QDMA_VF_MAILBOX (0x1000) Register Space

Registers (Address)  Address  Description

Function Status Register (0x1000)  0x1000  Status register bits

Function Command Register (0x1004)  0x1004  Command register bits

Function Interrupt Vector Register (0x1008)  0x1008  Interrupt vector register

Target Function Register (0x100C)  0x100C  Target function register

Function Interrupt Control Register (0x1010)  0x1010  Interrupt control register

RTL Version Register (0x1014)  0x1014  RTL version register

Incoming Message Memory (0x1800-0x187C)  0x1800-0x187C  Incoming message (128 bytes)

Outgoing Message Memory (0x1C00-0x1C7C)  0x1C00-0x1C7C  Outgoing message (128 bytes)

Function Status Register (0x1000)

Table: Function Status Register (0x1000)

Bit Index Default Access Type Field Description

[31:12]  0  NA  Reserved  Reserved

[11:4]  0  RO  cur_src_fn  This field is for PF use only. The source function number of the message at the top of the incoming request queue.

[2]  0  RO  ack_status  This field is for PF use only. The status bit is set when any bit in the acknowledgment status register is asserted.

[1]  0  RO  o_msg_status  For VF: The status bit is set when the VF driver writes msg_send to its command register. When the associated PF driver sends an acknowledgment to this VF, the hardware clears this field. The VF driver is not allowed to update any content in its outgoing mailbox memory (OMM) while o_msg_status is asserted. Any illegal writes to the OMM are discarded (optionally, this can cause an error in the AXI4-Lite response channel).
For PF: The field indicates the message status of the target FN, which is specified in the Target FN Register. The status bit is set when the PF driver sends the msg_send command. When the corresponding function driver sends an acknowledgment through msg_rcv, the hardware clears this field. The PF driver is not allowed to update any content in its outgoing mailbox memory (OMM) while o_msg_status(target_fn_id) is asserted. Any illegal writes to the OMM are discarded (optionally, this can cause an error in the AXI4-Lite response channel).

[0]  0  RO  i_msg_status  For VF: When asserted, a message in the VF's incoming mailbox memory is pending for processing. The field is cleared after the VF driver writes msg_rcv to its command register.
For PF: When asserted, the messages in the incoming mailbox memory are pending for processing. The field is cleared only when the event queue is empty.

Function Command Register (0x1004)

Table: Function Command Register (0x1004)

Bit Index Default Access Type Field Description

[31:3]  0  NA  Reserved  Reserved

[2]  0  RO  Reserved  Reserved

[1]  0  RW  msg_rcv  For VF: The VF marks the message in its incoming mailbox memory as received. The hardware asserts the acknowledgment bit of the associated PF.
For PF: The PF marks the message sent by target_fn as received. The hardware refreshes the i_msg_status of the PF, and clears the o_msg_status of the target_fn.

[0]  0  RW  msg_send  For VF: The VF marks the current message in its own outgoing mailbox as valid.
For PF:
Current target_fn_id belongs to a VF: The PF finished writing a message into the incoming mailbox memory of the VF with target_fn_id. The hardware sets the i_msg_status field of the target FN's status register.
Current target_fn_id belongs to a PF: The PF finished writing a message into its own outgoing mailbox memory. The hardware pushes the message to the event queue of the PF with target_fn_id.

Function Interrupt Vector Register (0x1008)

Table: Function Interrupt Vector Register (0x1008)

Bit Index Default Access Type Field Description

[31:5] 0 NA Reserved Reserved

[4:0]  0  RW  int_vect  5-bit interrupt vector assigned by the driver software.

Target Function Register (0x100C)

Table: Target Function Register (0x100C)

Bit Index Default Access Type Field Description

[31:8] 0 NA Reserved Reserved

[7:0]  0  RW  target_fn_id  This field is for PF use only. The FN number that the current operation is targeting.

Function Interrupt Control Register (0x1010)

Table: Function Interrupt Control Register (0x1010)


Bit Index Default Access Type Field Description

[31:1] 0 NA res Reserved

[0] 0 RW int_en Interrupt enable.

RTL Version Register (0x1014)

Table: RTL Version Register (0x1014)

Bit Default Access Type Description

[31:16] 0x1fd3 RO QDMA ID

[15:0]  0  RO  Vivado version. 0x1000: CPM QDMA Vivado version 2020.1.

Incoming Message Memory (0x1800-0x187C)

Table: Incoming Message Memory (0x1800-0x187C)

Register Addr Default Access Type Width Description

i_msg_i  0x1800 + i*4  0  RW  32  The ith word of the incoming message (0 ≤ i < 32).

Outgoing Message Memory (0x1C00-0x1C7C)

Table: Outgoing Message Memory (0x1C00-0x1C7C)

Register  Addr  Default  Access Type  Width  Description

o_msg_i  0x1C00 + i*4  0  RW  32  The ith word of the outgoing message (0 ≤ i < 32).

AXI Slave Register Space

The DMA register space can be accessed using the AXI Slave interface. When AXI Slave Bridge mode is enabled (based on the GUI settings), you can also access the Bridge registers and the host memory space.

Table: AXI4 Slave Register Space

Register Space AXI Slave Interface Address range Details

Bridge registers  0x6_0000_0000  Described in the Bridge register space CSV file. See Bridge Register Space for details.

DMA registers  0x6_1000_0000  Described in QDMA PF Address Register Space and QDMA VF Address Register Space.

Slave Bridge access to host memory space  0xE001_0000 - 0xEFFF_FFFF, 0x6_1101_0000 - 0x7_FFFF_FFFF, 0x80_0000_0000 - 0xBF_FFFF_FFFF  The address range for Slave Bridge access is set during IP customization in the Address Editor tab of the Vivado IDE.

To access the QDMA register space from the AXI Slave interface, the offset is 0x6_1000_0000. The QDMA supports 256 functions, and 64 KB of space is allocated for each function. To access any CSR or queue-space register of a function, use: AXI Slave address offset + function offset + register offset.
Function offset (64 KB): 0x1_0000
The register offset within each function is listed in QDMA PF Address Register Space.
For example, to access the queue-space registers for function 0: 0x6_1000_6400.
For example, to access the queue-space registers for function 1: 0x6_1001_6400.
For example, to access the queue-space registers for function 2: 0x6_1002_6400.
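
A minimal C sketch of this address computation follows. The macro and function names are illustrative; only the 0x6_1000_0000 base, the 64 KB per-function stride, and the per-function register offsets come from this section.

#include <stdint.h>

#define CPM_AXI_SLAVE_DMA_BASE  0x610000000ULL   /* 0x6_1000_0000                */
#define CPM_FUNC_STRIDE         0x10000ULL       /* 64 KB allocated per function */

static inline uint64_t qdma_axi_slave_reg_addr(uint32_t func, uint32_t reg_off)
{
    /* e.g., func = 1, reg_off = 0x6400 -> 0x6_1001_6400 */
    return CPM_AXI_SLAVE_DMA_BASE + (uint64_t)func * CPM_FUNC_STRIDE + reg_off;
}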

Bridge Register Space

Bridge register addresses start at 0xE00. Addresses from 0x00 to 0xDFF are directed to the PCIe configuration register space.
All the bridge registers are listed in cpm4-bridge-v2-1-registers.csv, available in the register map files.
To locate the register space information:

1. Download the register map files.
2. Extract the ZIP file contents into any write-accessible location.
3. Refer to cpm4-bridge-v2-1-registers.csv.

DMA Register Space

The DMA register space is described in the following sections:

QDMA PF Address Register Space


QDMA VF Address Register Space

Design Flow Steps


This section describes customizing and generating the functional mode, constraining the functional
mode, and the simulation, synthesis, and implementation steps that are specific to this IP functional
mode. More detailed information about the standard AMD Vivado™ design flows and the IP integrator
can be found in the following Vivado Design Suite user guides:

Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
Vivado Design Suite User Guide: Designing with IP (UG896)
Vivado Design Suite User Guide: Getting Started (UG910)
Vivado Design Suite User Guide: Logic Simulation (UG900)

Lab1: QDMA AXI MM Interface to NoC and DDR

This lab describes the process of generating an AMD Versal™ device QDMA design with AXI4
interface connected to network on chip (NoC) IP and DDR memory. This design has the following
configurations:

AXI4 memory mapped (AXI MM) connected to DDR through the NoC IP
Gen3 x 16
MSI-X interrupts

This lab provides step by step instructions to configure a Control, Interfaces and Processing System
(CIPS) QDMA design and network on chip (NoC) IP. The following figure shows the AXI4 Memory
Mapped (AXI-MM) interface to DDR using the NoC IP. At the end of this lab, you can synthesize and
implement the design, and generate a Programmable Device Image (PDI) file. The PDI file is used to
program the Versal device and run data traffic on a system. For the AXI-MM interface host to chip
(H2C) transfers, data is read from Host and sent to DDR memory. For chip to host (C2H) transfers,
data is read from DDR memory and written to host.
This lab targets a xcvc1902-vsvd1760-1LP-e-S-es1 part on a VCK5000 board. This lab connects to
DDR memory found outside the Versal device. For more information, see QDMA AXI MM Interface to
NoC and DDR Lab.

Lab2: QDMA AXI MM Interface to NoC and DDR with Mailbox

This lab describes the process of generating an AMD Versal™ device QDMA design containing 4
PFs, 252 VFs and AXI4 Memory Mapped interface connected to the network on chip (NoC) IP and
DDR memory. This design has the following configurations:

AXI4 memory mapped (AXI MM) connected to DDR through the NoC IP
Gen3 x 16
4 physical functions (PFs) and 252 virtual functions (VFs) with Mailbox connections.
MSI-X interrupts

This lab provides step by step instructions to configure a Control, Interfaces and Processing System
(CIPS) QDMA design and network on chip (NoC) IP. The following figure shows the AXI4 Memory
Mapped (AXI-MM) interface to DDR using the NoC IP. At the end of this lab, you can synthesize and
implement the design, and generate a Programmable Device Image (PDI) file. The PDI file is used to
program the Versal device and run data traffic on a system. For the AXI-MM interface host to chip
(H2C) transfers, data is read from Host and sent to DDR memory. For chip to host (C2H) transfers,
data is read from DDR memory and written to host.
This lab targets a xcvc1902-vsvd1760-1LP-e-S-es1 part on a VCK5000 board. This lab connects to
DDR memory found outside the Versal device. For more information, see QDMA AXI MM Interface to
NoC and DDR with Mailbox.


Customizable Example Design (CED)


The following table describes the available CPM4 example designs. All the listed designs are based on the VCK190 evaluation board:

Table: QDMA Example Design

Top CED Name  Preset  Simulation/Implementation  Description

Versal_CPM_QDMA_EP_Design  CPM4_QDMA_Gen4x8_MM_ST_Design  Implementation  Functional example design.

Versal_CPM_QDMA_EP_Design  CPM4_QDMA_Gen4x8_MM_ST_Performance_Design  Implementation  Performance example design.

Versal_CPM_QDMA_EP_Simulation_Design  No preset  Simulation  Functional example design.

Versal_CPM_Bridge RP_Design  No preset  Implementation  Root port design.

The associated drivers can be downloaded from GitHub. For more information about CED, see Versal
Adaptive SoC CPM Example Designs.

CED Generation Steps

Following are the steps to generate a CED:

1. Launch Vivado.
2. Check whether the designs are installed, and update them if required.
3. Click Vivado Store.
4. Go to the Example Designs tab and click Refresh to refresh the catalog.
5. Click Install for any newly added designs, or click Update for any updated designs, and close the window.
6. Click Open Example Project > Next, select the appropriate design, and click Next.
7. Create the project and click Next.
8. Choose the board or part option available. Based on the board selected, the appropriate CPM block is enabled. For example, for the VCK190 board, the CPM4 block is enabled. Similarly, for the VPK120 board, the CPM5 block is enabled. Also, choose the appropriate preset if applicable.
9. Click Finish.

Versal_CPM_QDMA_EP_Design

The following are the presets available for you to select:

CPM4_QDMA_Gen4x8_MM_ST_Design

CPM4 QDMA0 Gen4x8 Functional example design.

This design has CPM4 – QDMA0 enabled in Gen4x8 configuration as an End Point
The design targets VCK190 board and it supports Synthesis and Implementation flows
The associated drivers can be downloaded from GitHub
Enables QDMA AXI4-MM and QDMA AXI-ST functionality with 4 PF and 252 VFs
Capable of exercising AXI4-MM, AXI-ST path, and descriptor bypass
Design also includes DDR

Example design registers can only be controlled through the AXI4-Lite master interface. To test the
QDMA's AXI4-Stream interface, ensure that the AXI4-Lite master interface is present. Following are
the example design registers:

Example Design Registers

Table: Example Design Registers

Registers Address Description

C2H_ST_QID (0x000)  0x000  AXI-ST C2H queue ID

C2H_ST_LEN (0x004)  0x004  AXI-ST C2H transfer length

C2H_CONTROL_REG (0x008)  0x008  AXI-ST C2H pattern generator control

H2C_CONTROL_REG (0x00C)  0x00C  AXI-ST H2C control

H2C_STATUS (0x010)  0x010  AXI-ST H2C status

C2H_STATUS (0x018)  0x018  AXI-ST C2H status

C2H_PACKET_COUNT (0x020)  0x020  AXI-ST C2H number of packets to transfer

C2H_COMPLETION_DATA_0 (0x030) to C2H_COMPLETION_DATA_7 (0x04C)  0x030-0x04C  AXI-ST C2H completion data

C2H_COMPLETION_SIZE (0x050)  0x050  AXI-ST completion data type

SCRATCH_REG0 (0x060)  0x060  Scratch register 0

SCRATCH_REG1 (0x064)  0x064  Scratch register 1

C2H_PACKETS_DROP (0x088)  0x088  AXI-ST C2H packets drop count

C2H_PACKETS_ACCEPTED (0x08C)  0x08C  AXI-ST C2H packets accepted count

DESCRIPTOR_BYPASS (0x090)  0x090  C2H and H2C descriptor bypass loopback

USER_INTERRUPT (0x094)  0x094  User interrupt, vector number, function number

USER_INTERRUPT_MASK (0x098)  0x098  User interrupt mask

USER_INTERRUPT_VECTOR (0x09C)  0x09C  User interrupt vector

DMA_CONTROL (0x0A0)  0x0A0  DMA control

VDM_MESSAGE_READ (0x0A4)  0x0A4  VDM message read

C2H_ST_QID (0x000)

Table: C2H_ST_QID (0x000)

Bit Default Access Type Field Description

[31:11] 0 NA Reserved

[10:0] 0 RW c2h_st_qid AXI4-Stream C2H Queue ID

C2H_ST_LEN (0x004)

Table: C2H_ST_LEN (0x004)

Bit Default Access Type Field Description

[31:16] 0 NA Reserved

[15:0]  0  RW  c2h_st_len  AXI4-Stream packet length

C2H_CONTROL_REG (0x008)

Table: C2H_CONTROL_REG (0x008)

Bit Default Access Type Description

[31:6]  0  NA  Reserved

[5]  0  RW  C2H Stream marker request. The C2H Stream marker response is registered at address 0x18, bit [0].

[4]  0  NA  Reserved

[3]  0  RW  Disable completion. For this packet, there will not be any completion.

[2]  0  RW  Immediate data. When set, the data generator sends immediate data. This is a self-clearing bit. Write 1 to initiate the transfer.

[1]  0  RW  Starts the AXI-ST C2H transfer. This is a self-clearing bit. Write 1 to initiate the transfer.

[0]  0  RW  Streaming loopback. When set, the data packet from the H2C streaming port on the card side is looped back to the C2H streaming ports.

For a normal C2H stream packet transfer, set address offset 0x08 to 0x2.
For a C2H immediate data transfer, set address offset 0x08 to 0x4.
For C2H/H2C stream loopback, set address offset 0x08 to 0x1.
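
A hedged C sketch of starting a normal AXI-ST C2H transfer through these example design registers follows. usr_bar is a placeholder for the mapped AXI4-Lite (user) BAR, and the queue is assumed to be already set up by the QDMA driver; only the register offsets and bit meanings come from the tables above.

#include <stdint.h>

#define EX_C2H_ST_QID        0x000u
#define EX_C2H_ST_LEN        0x004u
#define EX_C2H_CONTROL_REG   0x008u
#define EX_C2H_PACKET_COUNT  0x020u
#define EX_C2H_CTRL_START    (1u << 1)   /* self-clearing start bit */

static inline void wr32(volatile uint8_t *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(base + off) = val;
}

void start_c2h_stream(volatile uint8_t *usr_bar, uint16_t qid,
                      uint16_t len, uint16_t num_pkts)
{
    wr32(usr_bar, EX_C2H_ST_QID, qid);              /* target C2H queue     */
    wr32(usr_bar, EX_C2H_ST_LEN, len);              /* bytes per packet     */
    wr32(usr_bar, EX_C2H_PACKET_COUNT, num_pkts);   /* packets per transfer */
    wr32(usr_bar, EX_C2H_CONTROL_REG, EX_C2H_CTRL_START);   /* 0x08 = 0x2   */
}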

H2C_CONTROL_REG (0x00C)

Table: H2C_CONTROL_REG (0x00C)

Bit Default Access Type Description

[31:30] 0 NA Reserved

[0]  0  RW  Clear match bit for H2C transfer.

H2C_STATUS (0x010)

Table: H2C_STATUS (0x010)

Bit Default Access Type Description

[31:15] 0 NA Reserved

[14:4]  0  R  H2C transfer queue ID

[3:1] 0 NA Reserved

[0] 0 R H2C transfer match

C2H_STATUS (0x018)

Table: C2H_STATUS (0x018)

Bit Default Access Type Description

[31:30] 0 NA Reserved

[0] 0 R C2H Marker response

C2H_PACKET_COUNT (0x020)

Table: C2H_PACKET_COUNT (0x020)

Bit Default Access Type Description

[31:10] 0 NA Reserved

[9:0] 0 RW AXI-ST C2H number of packet to transfer

C2H_PREFETCH_TAG(0x024)

Table: C2H_PREFETCH_TAG (0x024)

Bit Default Access Type Description

[31:27] 0 NA Reserved

[26:16] 0 RW Qid for prefetch tag

[15:7] 0 NA Reserved

[6:0] 0 RW Prefetch tag value

C2H_COMPLETION_DATA_0 (0x030)

Table: C2H_COMPLETION_DATA_0 (0x030)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [31:0]

C2H_COMPLETION_DATA_1 (0x034)

Table: C2H_COMPLETION_DATA_1 (0x034)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [63:32]

C2H_COMPLETION_DATA_2 (0x038)

Table: C2H_COMPLETION_DATA_2 (0x038)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [95:64]

C2H_COMPLETION_DATA_3 (0x03C)

Table: C2H_COMPLETION_DATA_3 (0x03C)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [127:96]

C2H_COMPLETION_DATA_4 (0x040)

Table: C2H_COMPLETION_DATA_4 (0x040)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [159:128]

C2H_COMPLETION_DATA_5 (0x044)

Table: C2H_COMPLETION_DATA_5 (0x044)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [191:160]

C2H_COMPLETION_DATA_6 (0x048)

Table: C2H_COMPLETION_DATA_6 (0x048)

Bit Default Access Type Field Description

[31:0] 0 NA NA AXI-ST C2H Completion Data [223:192]

C2H_COMPLETION_DATA_7 (0x04C)

Table: C2H_COMPLETION_DATA_7 (0x04C)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [255:224]

C2H_COMPLETION_SIZE (0x050)

Table: C2H_COMPLETION_SIZE (0x050)

Bit Default Access Type Description

[31:13]  0  NA  Reserved

[12]  0  RW  Completion type.
1'b1: NO_PLD_BUT_WAIT
1'b0: HAS PLD

[10:8]  0  RW  s_axis_c2h_cmpt_ctrl_err_idx[2:0]. Completion error bit index.
3'b000: Selects the 0th register.
3'b111: No error bit is reported.

[6:4]  0  RW  s_axis_c2h_cmpt_ctrl_col_idx[2:0]. Completion color bit index.
3'b000: Selects the 0th register.
3'b111: No color bit is reported.

[3]  0  RW  s_axis_c2h_cmpt_ctrl_user_trig. Completion user trigger.

[1:0]  0  RW  AXI4-Stream C2H completion data size.
00: 8 Bytes
01: 16 Bytes
10: 32 Bytes
11: 64 Bytes

SCRATCH_REG0 (0x060)

Table: SCRATCH_REG0 (0x060)

Bit Default Access Type Description

[31:0] 0 RW Scratch register

SCRATCH_REG1 (0x064)

Table: SCRATCH_REG1 (0x064)

Bit Default Access Type Description

[31:0] 0 RW Scratch register

C2H_PACKETS_DROP (0x088)

Table: C2H_PACKETS_DROP (0x088)

Bit Default Access Type Description

[31:0]  0  R  The number of AXI-ST C2H packets (descriptors) dropped per transfer

Each AXI-ST C2H transfer can contain one or more descriptors, depending on the transfer size and C2H buffer size. This register represents how many of the descriptors were dropped in the current transfer. This register resets to 0 at the beginning of each transfer.

C2H_PACKETS_ACCEPTED (0x08C)

Table: C2H_PACKETS_ACCEPTED (0x08C)

Bit Default Access Type Description

[31:0]  0  R  The number of AXI-ST C2H packets (descriptors) accepted per transfer

Each AXI-ST C2H transfer can contain one or more descriptors depending on the transfer size and
C2H buffer size. This register represents how many of the descriptors were accepted in the current
transfer. This register will reset to 0 at the beginning of each transfer.

DESCRIPTOR_BYPASS (0x090)

Table: Descriptor Bypass (0x090)

Bit Default Access Type Field Description

[31:3]  0  NA  Reserved

[2:1]  0  RW  c2h_dsc_bypass  C2H descriptor bypass loopback. When set, the C2H descriptor bypass-out port is looped back to the C2H descriptor bypass-in port.
2'b00: No bypass loopback.
2'b01: C2H MM descriptor bypass loopback and C2H Stream cache bypass loopback.
2'b10: C2H Stream simple descriptor bypass loopback.
2'b11: H2C stream 64-byte descriptors are looped back to the Completion interface.

[0]  0  RW  h2c_dsc_bypass  H2C descriptor bypass loopback. When set, the H2C descriptor bypass-out port is looped back to the H2C descriptor bypass-in port.
1'b1: H2C MM and H2C Stream descriptor bypass loopback
1'b0: No descriptor loopback

USER_INTERRUPT (0x094)

Table: User Interrupt (0x094)

Bit Default Access Type Field Description

[31:20]  0  NA  Reserved

[19:12]  0  RW  usr_irq_in_fun  User interrupt function number

[11:9]  0  NA  Reserved

[8:4]  0  RW  usr_irq_in_vec  User interrupt vector number

[3:1]  0  NA  Reserved

[0]  0  RW  usr_irq  User interrupt. When set, the example design generates a user interrupt.

To generate a user interrupt:

1. Write the function number at bits [19:12]. This corresponds to the function that generates the usr_irq_in_fnc user interrupt.
2. Write the MSI-X vector number at bits [8:4]. This corresponds to the entry in the MSI-X table that is set up for the usr_irq_in_vec user interrupt.
3. Write 1 to bit [0] to generate the user interrupt. This bit clears itself after usr_irq_out_ack from the DMA is generated.

All three above steps can be done at the same time, with a single write.
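
A hedged C sketch of that single combined write follows. usr_bar is a placeholder for the mapped AXI4-Lite (user) BAR; only the 0x094 offset and the bit positions come from the table above.

#include <stdint.h>

#define EX_USER_INTERRUPT  0x094u

static inline void wr32(volatile uint8_t *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(base + off) = val;
}

void fire_user_interrupt(volatile uint8_t *usr_bar, uint8_t func, uint8_t vec)
{
    uint32_t v = ((uint32_t)func << 12)          /* bits [19:12]: function number     */
               | ((uint32_t)(vec & 0x1F) << 4)   /* bits [8:4]: MSI-X vector number   */
               | 1u;                             /* bit [0]: trigger (self-clearing)  */
    wr32(usr_bar, EX_USER_INTERRUPT, v);
}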
Following is the user interrupt timing diagram:

Figure: Interrupt

USER_INTERRUPT_MASK (0x098)

Table: User Interrupt Mask (0x098)

Bit Default Access Type Description

[31:0] 0 RW User Interrupt Mask

USER_INTERRUPT_VECTOR (0x09C)

Table: User Interrupt Vector (0x09C)

Bit Default Access Type Description

[31:0] 0 RW User Interrupt Vector

The user_interrupt_mask[31:0] and user_interrupt_vector[31:0] registers are provided in the example design for user interrupt aggregation, which can generate a user interrupt for a function. The user_interrupt_mask[31:0] is combined (bitwise AND) with user_interrupt_vector[31:0], and a user interrupt is generated. The user_interrupt_vector[31:0] is a clear-on-read register.
To generate a user interrupt:

1. Write the function number at user_interrupt[19:12]. This corresponds to which function generates the usr_irq_in_fnc user interrupt.
2. Write the MSI-X vector number at user_interrupt[8:4]. This corresponds to which entry in the MSI-X table is set up for the usr_irq_in_vec user interrupt.
3. Write the mask value in the user_interrupt_mask[31:0] register.
4. Write the interrupt vector value in the user_interrupt_vector[31:0] register.

This generates a user interrupt to the DMA block.

There are two ways to generate a user interrupt:

Write to user_interrupt[0], or
Write to the user_interrupt_vector[31:0] register with the mask set.
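
A hedged C sketch of the mask/vector aggregation path described above follows. Again, usr_bar and the helper are placeholders; only the register offsets and bit positions come from this section, and an interrupt is raised when the masked vector bits are set for the selected function/vector.

#include <stdint.h>

#define EX_USER_INTERRUPT         0x094u
#define EX_USER_INTERRUPT_MASK    0x098u
#define EX_USER_INTERRUPT_VECTOR  0x09Cu

static inline void wr32(volatile uint8_t *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(base + off) = val;
}

void fire_aggregated_user_interrupt(volatile uint8_t *usr_bar, uint8_t func,
                                    uint8_t vec, uint32_t mask, uint32_t bits)
{
    /* Select the PCIe function and MSI-X vector; bit [0] is left at 0. */
    wr32(usr_bar, EX_USER_INTERRUPT,
         ((uint32_t)func << 12) | ((uint32_t)(vec & 0x1F) << 4));
    wr32(usr_bar, EX_USER_INTERRUPT_MASK, mask);
    /* Writing the clear-on-read vector register with masked bits set
     * triggers the user interrupt toward the DMA block. */
    wr32(usr_bar, EX_USER_INTERRUPT_VECTOR, bits);
}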


DMA_CONTROL (0x0A0)

Table: DMA Control (0x0A0)

Bit Default Access Type Field Description

[31:1]  0  NA  Reserved

[0]  0  RW  gen_qdma_reset  When set, generates a soft reset to the DMA block. This bit is cleared after 100 cycles.

Writing a 1 to DMA_CONTROL[0] generates a soft reset on soft_reset_n (active-Low). The reset is asserted for 100 cycles, after which the signal is deasserted.

VDM_MESSAGE_READ (0x0A4)

Table: VDM Message Read (0x0A4)

Bit Default Access Type Description

[31:0] RO VDM message read

Vendor Defined Message (VDM) messages, st_rx_msg_data, are stored in a FIFO in the example design. A read to this register (0x0A4) pops out one 32-bit message at a time.

CPM4_QDMA_Gen4x8_MM_ST_Performance_Design

CPM4 Controller 0 QDMA Gen4x8 performance example design:

This design has CPM4 – QDMA0 enabled in Gen4x8 configuration as an End Point
The design targets VCK190 board and it supports Synthesis and Implementation flows
The associated drivers can be downloaded from: https://github.com/Xilinx/dma_ip_drivers
Enables QDMA AXI4-MM and QDMA AXI-ST functionality
Capable of demonstrating the QDMA MM and ST performance
Design also includes DDR

Versal_CPM_QDMA_EP_Simulation_Design

CPM4 Controller 0 QDMA Gen4x8 functional example design that can be used for simulation.

This design has CPM4 – QDMA0 enabled in Gen4x8 configuration as an End Point
This design supports simulation
Capable of exercising AXI4, AXI-ST path, and descriptor bypass
Design also includes DDR
The design includes Root Port testbench, which simulates QDMA MM and ST datapath

Example design registers are listed under CPM4_QDMA_Gen4x8_MM_ST_Design section.


✎ Note: For CPM simulation, you must use Synopsys VCS or Siemens Questa simulation tools.
Versal_CPM_Bridge RP_Design

CPM4 Controller 0 QDMA AXI-Bridge in Root Port mode:

This design has CPM4 – QDMA0 AXI Bridge mode enabled in Gen4x8 configuration as Root
Port
The design targets VCK190 board and it supports Synthesis and Implementation flows
The design implements the Root Complex functionality. It includes the CIPS IP, which enables both the CPM and PS

Debugging
This appendix includes details about resources available on the AMD Support website and debugging
tools.
If the IP requires a license key, the key must be verified. The AMD Vivado™ design tools have several
license checkpoints for gating licensed IP through the flow. If the license check succeeds, the IP can
continue generation. Otherwise, generation halts with an error. License checkpoints are enforced by
the following tools:

Vivado Synthesis
Vivado Implementation
write_bitstream (Tcl command)

Finding Help with AMD Adaptive Computing Solutions

To help in the design and debug process when using the functional mode, the Support web page
contains key resources such as product documentation, release notes, answer records, information
about known issues, and links for obtaining further product support. The Community Forums are also
available where members can learn, participate, share, and ask questions about AMD Adaptive
Computing solutions.

Documentation

This product guide is the main document associated with the functional mode. This guide, along with
documentation related to all products that aid in the design process, can be found on the AMD
Adaptive Support web page or by using the AMD Adaptive Computing Documentation Navigator.
Download the Documentation Navigator from the Downloads page. For more information about this
tool and the features available, open the online help after installation.

Debug Guide

For more information on PCIe Debug, see PCIe Debug K-Map.

Answer Records

Answer Records include information about commonly encountered problems, helpful information on
how to resolve these problems, and any known issues with an AMD Adaptive Computing product.
Answer Records are created and maintained daily to ensure that users have access to the most
accurate information available.
Answer Records for this functional mode can be located by using the Search Support box on the main
AMD Adaptive Support web page. To maximize your search results, use keywords such as:

Product name
Tool message(s)
Summary of the issue encountered

A filter search is available after results are returned to further target the results.

Master Answer Record for the Core

AR 75396.

Technical Support

AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:

Implement the solution in devices that are not defined in the documentation.
Customize the solution beyond that allowed in the product documentation.
Change any section of the design labeled DO NOT MODIFY.

To ask questions, navigate to the Community Forums.

Hardware Debug

Hardware issues can range from link bring-up to problems seen after hours of testing. This section
provides debug steps for common issues. The AMD Vivado™ debug feature is a valuable resource to
use in hardware debug. The signal names mentioned in the following individual sections can be
probed using the debug feature for debugging the specific problems.

General Checks

Ensure that all the timing constraints for the core were properly incorporated from the example design
and that all constraints were met during implementation.

Does it work in post-place and route timing simulation? If problems are seen in hardware but not
in timing simulation, this could indicate a PCB issue. Ensure that all clock sources are active and
clean.
If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the
locked port.
If your outputs go to 0, check your licensing.

Soft Reset

Reset the QDMA logic through the dma0_soft_reset_n port. This port needs to be held in reset for a
minimum of 100 clock cycles (pcie0_user_clk cycles).
This signal resets only the DMA portion of logic. It does not reset the PCIe hard block.

Soft Reset Use Cases


The use cases that prompt the use of dma0_soft_reset include:

DMA does not respond, and the user application is not getting proper values.
DMA transfer has errors, but the PCIe links are good.
DMA records some asynchronous errors.

After dma0_soft_reset, you must reinitialize the queues and program all queue context.

Registers

A complete list of registers and attributes for the QDMA Subsystem is available in the Versal Adaptive
SoC Register Reference (AM012). Reviewing the registers and attributes might be helpful for
advanced debugging.
✎ Note: The attributes are set during IP customization in the Vivado IP catalog. After core
customization, attributes are read-only.

Debug Tools

There are many tools available to address QDMA design issues. It is important to know which tools
are useful for debugging various situations.

Vivado Design Suite Debug Feature

The AMD Vivado™ Design Suite debug feature inserts logic analyzer and virtual I/O cores directly into
your design. The debug feature also allows you to set trigger conditions to capture application and
integrated block port signals in hardware. Captured signals can then be analyzed. This feature in the
Vivado IDE is used for logic debugging and validation of a design running in AMD devices.
The Vivado logic analyzer is used to interact with the logic debug LogiCORE IP cores, including:

ILA 2.0 (and later versions)


VIO 2.0 (and later versions)

See the Vivado Design Suite User Guide: Programming and Debugging (UG908).

Application Software Development


Device Drivers

Figure: Device Drivers

The above figure shows the usage model of the Linux and Windows QDMA software drivers. The QDMA
example design is implemented on an AMD adaptive SoC, which is connected to an x86 host through
PCI Express.

In the first usage mode, the QDMA driver runs in kernel space on Linux, while the test application
runs in user space.
In the second usage mode, the Data Plane Development Kit (DPDK) is used to develop a QDMA Poll
Mode Driver (PMD) that runs entirely in user space and uses the UIO and VFIO kernel frameworks to
communicate with the adaptive SoC.
In the third usage mode, the QDMA driver runs in kernel space on Windows, while the test application
runs in user space.

Linux QDMA Software Architecture (PF/VF)

Figure: Linux DMA Software Architecture


The QDMA driver consists of the following three major components:

Device control tool: Creates a netlink socket for PCIe device queries, queue management, reading the
context of a queue, and so on.
DMA tool: The user space application used to initiate a DMA transaction. You can use the standard
Linux utilities dd or fio, or use the example application in the driver package.
Kernel space driver: Creates the descriptors and translates the user space function calls into
low-level commands that interact with the Versal device.

Using the Drivers

Linux, DPDK and Windows drivers and the corresponding documentation are available at AMD DMA
IP Drivers.
‼ Important: 8 MSI-X vectors are needed on all functions (PF/VF) to use the AMD QDMA IP driver.

Reference Software Driver Flow

AXI4 Memory Map Flow Chart

Figure: AXI4-Memory Map Flow Chart


AXI4 Memory Mapped C2H Flow

Figure: AXI4 Memory Mapped Card to Host (C2H) Flow Diagram


AXI4 Memory Mapped H2C Flow

Figure: AXI4 Memory Mapped H2C Flow


AXI4-Stream Flow Chart

Figure: AXI4-Stream Flow Chart


AXI4-Stream C2H Flow

Figure: AXI4-Stream C2H Flow


AXI4-Stream H2C Flow

Figure: AXI4-Stream H2C Flow


Upgrading
This appendix is not applicable for the first release of the functional mode.

QDMA Subsystem for CPM5


Overview
The Queue-based Direct Memory Access (QDMA) subsystem is a PCI Express® (PCIe®) based
DMA engine that is optimized for both high bandwidth and high packet count data transfers. The

QDMA is composed of the AMD Versal™ Integrated Block for PCI Express, and an extensive DMA
and bridge infrastructure that enables the ultimate in performance and flexibility.
The QDMA offers a wide range of setup and use options, many selectable on a per-queue basis, such
as memory-mapped DMA or stream DMA, interrupt mode and polling. The functional mode provides
many options for customizing the descriptor and DMA through user logic to provide complex traffic
management capabilities.
The primary mechanism to transfer data using the QDMA is for the QDMA engine to operate on
instructions (descriptors) provided by the host operating system. Using the descriptors, the QDMA can
move data in both the Host to Card (H2C) direction, or the Card to Host (C2H) direction. You can
select on a per-queue basis whether DMA traffic goes to an AXI4 memory map (MM) interface or to an
AXI4-Stream interface. In addition, the QDMA has the option to implement both an AXI4 MM Master
port and an AXI4 MM Slave port, allowing PCIe traffic to bypass the DMA engine completely.
The main difference between QDMA and other DMA offerings is the concept of queues. The idea of
queues is derived from the “queue set” concepts of Remote Direct Memory Access (RDMA) from high
performance computing (HPC) interconnects. These queues can be individually configured by
interface type, and they function in many different modes. Based on how the DMA descriptors are
loaded for a single queue, each queue provides a very low overhead option for setup and continuous
update functionality. By assigning queues as resources to multiple PCIe Physical Functions (PFs) and
Virtual Functions (VFs), a single QDMA core and PCI Express interface can be used across a wide
variety of multifunction and virtualized application spaces.
The QDMA can be used and exercised with an AMD provided QDMA reference driver, and then built
out to meet a variety of application spaces.

QDMA Architecture

The following figure shows the block diagram of the QDMA.

Figure: QDMA Architecture


DMA Engines

Descriptor Engine

The Host to Card (H2C) and Card to Host (C2H) descriptors are fetched by the Descriptor Engine in
one of two modes: Internal mode, and Descriptor bypass mode. The descriptor engine maintains per
queue contexts where it tracks software (SW) producer index pointer (PIDX), consumer index pointer
(CIDX), base address of the queue (BADDR), and queue configurations for each queue. The
descriptor engine uses a round robin algorithm for fetching the descriptors. The descriptor engine has
separate buffers for H2C and C2H queues, and ensures it never fetches more descriptors than
available space. The descriptor engine will have only one DMA read outstanding per queue at a time
and can read as many descriptors as can fit in an MRRS. The descriptor engine is responsible for
reordering the out of order completions and ensures that descriptors for queues are always in order.
The descriptor bypass can be enabled on a per-queue basis and the fetched descriptors, after
buffering, are sent to the respective bypass output interface instead of directly to the H2C or C2H
engine. In internal mode, based on the context settings, the descriptors are delivered directly to the
H2C memory mapped (MM), C2H MM, H2C Stream, or C2H Stream engine.
The descriptor engine is also responsible for generating the status descriptor for the completion of the
DMA operations. With the exception of C2H Stream mode, all modes use this mechanism to convey
completion of each DMA operation so that software can reclaim descriptors and free up any
associated buffers. This is indicated by the CIDX field of the status descriptor.
✎ Recommended: If a queue is associated with interrupt aggregation, AMD recommends that the
status descriptor be turned off, and instead the DMA status be received from the interrupt aggregation
ring.
To put a limit on the number of fetched descriptors (for example, to limit the amount of buffering
required to store the descriptor), it is possible to turn-on and throttle credit on a per-queue basis. In

this mode, the descriptor engine fetches the descriptors up to available credit, and the total number of
descriptors fetched per queue is limited to the credit provided. The user logic can return the credit
through the dsc_crdt interface. The credit is in the granularity of the size of the descriptor.
To help a user-developed traffic manager prioritize the workload, the number of descriptors available
to be fetched (the incremental PIDX value) is sent to the user logic on the tm_dsc_sts interface with
each PIDX update.
Using this interface it is possible to implement a design that can prioritize and optimize the descriptor
storage.

H2C MM Engine

The H2C MM Engine moves data from the host memory to card memory through the H2C AXI-MM
interface. The engine generates reads on PCIe, splitting descriptors into multiple read requests based
on the MRRS and the requirement that PCIe reads do not cross 4 KB boundaries. Once completion
data for a read request is received, an AXI write is generated on the H2C AXI-MM interface. For
source and destination addresses that are not aligned, the hardware will shift the data and split writes
on AXI-MM to prevent 4 KB boundary crossing. Each completed descriptor is checked to determine
whether a writeback and/or interrupt is required.
For Internal mode, the descriptor engine delivers memory mapped descriptors straight to the H2C MM
engine. The user logic can also inject the descriptor into the H2C descriptor bypass interface to move
data from host to card memory. This gives the ability to do interesting things such as mixing control
and DMA commands in the same queue. Control information can be sent to a control processor
indicating the completion of DMA operation.
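The MRRS and 4 KB splitting rule described above can be illustrated with a small host-side model. This is a conceptual sketch only: the splitting is performed by the hardware, and the function below and its printed output are illustrative assumptions rather than IP or driver code.

#include <stdint.h>
#include <stdio.h>

/* Conceptual sketch: split one H2C transfer into PCIe read requests that
 * respect the Max Read Request Size (MRRS) and never cross a 4 KB boundary. */
static void split_h2c_reads(uint64_t host_addr, uint32_t len, uint32_t mrrs)
{
    while (len != 0) {
        /* Bytes remaining before the next 4 KB boundary. */
        uint32_t to_4k   = 4096u - (uint32_t)(host_addr & 0xFFFu);
        uint32_t req_len = len;

        if (req_len > mrrs)  req_len = mrrs;
        if (req_len > to_4k) req_len = to_4k;

        printf("PCIe read: addr=0x%llx len=%u\n",
               (unsigned long long)host_addr, req_len);

        host_addr += req_len;
        len       -= req_len;
    }
}

int main(void)
{
    /* Example: a 10 KB transfer starting 256 B below a 4 KB boundary, MRRS = 512 B. */
    split_h2c_reads(0x1000F00ULL, 10 * 1024, 512);
    return 0;
}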

C2H MM Engine

The C2H MM Engine moves data from card memory to host memory through the C2H AXI-MM
interface. The engine generates AXI reads on the C2H AXI-MM bus, splitting descriptors into multiple
requests based on 4 KB boundaries. Once completion data for the read request is received on the
AXI4 interface, a PCIe write is generated using the data from the AXI read as the contents of the
write. For source and destination addresses that are not aligned, the hardware will shift the data and
split writes on PCIe to obey Maximum Payload Size (MPS) and prevent 4 KB boundary crossings.
Each completed descriptor is checked to determine whether a writeback and/or interrupt is required.
For Internal mode, the descriptor engine delivers memory mapped descriptors straight to the C2H MM
engine. As with H2C MM Engine, the user logic can also inject the descriptor into the C2H descriptor
bypass interface to move data from card to host memory.
For multi-function configuration support, the PCIe function number information will be provided in the
aruser bits of the AXI-MM interface bus to help virtualization of card memory by the user logic. A
parity bus, separate from the data and user bus, is also provided for end-to-end parity support.

H2C Stream Engine

The H2C stream engine moves data from the host to the H2C Stream interface. For internal mode,
descriptors are delivered straight to the H2C stream engine; for a queue in bypass mode, the
descriptors can be reformatted and fed to the bypass input interface. The engine is responsible for
breaking up DMA reads to MRRS size, guaranteeing the space for completions, and also makes sure
completions are reordered to ensure H2C stream data is delivered to user logic in-order.

The engine has sufficient buffering for up to 256 descriptor reads and up to 32 KB of data. DMA
fetches the data and aligns to the first byte to transfer on the AXI4 interface side. This allows every
descriptor to have random offset and random length. The total length of all descriptors put together
must be less than 64 KB.
For internal mode queues, each descriptor defines a single AXI4-Stream packet to be transferred to
the H2C AXI-ST interface. A packet with multiple descriptors straddling is not allowed due to the lack
of per queue storage. However, packets with multiple descriptors straddling can be implemented
using the descriptor bypass mode. In this mode, the H2C DMA engine can be initiated when the user
logic has enough descriptors to form a packet. The DMA engine is initiated by delivering the multiple
descriptors straddled packet along with other H2C ST packet descriptors through the bypass
interface, making sure they are not interleaved. Also, through the bypass interface, the user logic can
control the generation of the status descriptor.

C2H Stream Engine

The C2H streaming engine is responsible for receiving data from the user logic and writing to the Host
memory address provided by the C2H descriptor for a given Queue.
The C2H engine has two major blocks to accomplish C2H streaming DMA, Descriptor Prefetch Cache
(PFCH), and the C2H-ST DMA Write Engine. The PFCH has per queue context to enhance the
performance of its function and the software that is expected to program it.
PFCH cache has three main modes, on a per queue basis, called Simple Bypass Mode, Internal
Cache Mode, and Cached Bypass Mode.

In Simple Bypass Mode, the engine does not track anything for the queue, and the user logic
can define its own method to receive descriptors. The user logic is then responsible for
delivering the packet and associated descriptor through the simple bypass interface. The
ordering of the descriptors fetched by a queue in the bypass interface and the C2H stream
interface must be maintained across all queues in bypass mode.
In Internal Cache Mode and Cached Bypass Mode, the PFCH module offers storage for up to
512 descriptors and these descriptors can be used by up to 64 different queues. In this mode,
the engine controls the descriptors to be fetched by managing the C2H descriptor queue credit
on demand based on received packets in the pipeline. Pre-fetch mode can be enabled on a per
queue basis, and when enabled, causes the descriptors to be opportunistically pre-fetched so
that descriptors are available before the packet data is available. The status can be found in
prefetch context. This significantly reduces the latency by allowing packet data to be transferred
to the PCIe integrated block almost immediately, instead of having to wait for the relevant
descriptor to be fetched. The size of the data buffer is fixed for a queue (PFCH context) and the
engine can scatter the packet across as many as seven descriptors. In Cached Bypass Mode, the
descriptor is bypassed to the user logic for further processing, such as address translation, and is sent
back on the bypass in interface. This mode does not assume any ordering between the descriptor and
C2H stream packet interfaces, and the pre-fetch engine matches packets with descriptors. When
pre-fetch mode is enabled, do not give credits to the IP; the pre-fetch engine takes care of credit
management.

Completion Engine

The Completion (CMPT) Engine is used to write to the completion queues. Although the Completion
Engine can be used with an AXI-MM interface and Stream DMA engines, the C2H Stream DMA
engine is designed to work closely with the Completion Engine. The Completion Engine can also be
used to pass immediate data to the Completion Ring. The Completion Engine can be used to write
Completions of up to 64B in the Completion ring. When used with a DMA engine, the completion is
used by the driver to determine how many bytes of data were transferred with every packet. This
allows the driver to reclaim the descriptors.
The Completion Engine maintains the Completion Context. This context is programmed by the Driver
and is maintained on a per-queue basis. The Completion Context stores information like the base
address of the Completion Ring, PIDX, CIDX and a number of aspects of the Completion Engine,
which can be controlled by setting the fields of the Completion Context.
The engine also can be configured on a per-queue basis to generate an interrupt or a completion
status update, or both, based on the needs of the software. If the interrupts for multiple queues are
aggregated into the interrupt aggregation ring, the status descriptor information is available in the
interrupt aggregation ring as well.
The CMPT Engine has a cache of up to 64 entries to coalesce the multiple smaller CMPT writes into
64B writes to improve the PCIe efficiency. At any time, completions can be simultaneously coalesced
for up to 64 queues. Beyond this, any additional queue that needs to write a CMPT entry causes the
eviction of the least recently used queue from the cache. The depth of the cache is set to 64.

Bridge Interfaces

AXI Memory Mapped Bridge Master Interface

The AXI MM Bridge Master interface is used for high bandwidth access to AXI Memory Mapped space
from the host. The interface supports up to 32 outstanding AXI reads and writes. One or more PCIe
BAR of any physical function (PF) or virtual function (VF) can be mapped to the AXI-MM bridge
master interface. This selection must be made prior to design compilation.
Virtual function group (VFG) refers to the VF group number. It is equivalent to the PF number
associated with the corresponding VF. VFG_OFFSET refers to the VF number with respect to a
particular PF. Note that this is not the FIRST_VF_OFFSET of each PF.
For example, if both PF0 and PF1 have 8 VFs, FIRST_VF_OFFSET for PF0 and PF1 is 16 and 23.
Each host initiated access can be uniquely mapped to the 64-bit AXI address space through the PCIe
to AXI BAR translation.
Since all functions share the same AXI Master address space, a mechanism is needed to map
requests from different functions to a distinct address space on the AXI master side. An example
provided below shows how PCIe to AXI translation vector is used. Note that all VFs belonging to the
same PF share the same PCIe to AXI translation vector. Therefore, the AXI address space of each VF
is concatenated together. Use VFG_OFFSET to calculate the actual starting address of AXI for a
particular VF.
To summarize, AXI master write or read address is determined as:

For PF, address = pcie2axi_vec + axi_offset.


For VF, address = pcie2axi_vec + (VFG_OFFSET + 1)*vf_bar_size + axi_offset.

Where pcie2axi_vec is PCIe to AXI BAR translation (that can be set when the IP core is configured
from the Vivado IP Catalog).
And axi_offset is the address offset in the requested target space.
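The translation above can be captured in a small helper. This is a hedged sketch: the formula is taken directly from the text, while the function name, parameter packaging, and the example values in main() are illustrative assumptions rather than driver code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of the PCIe-to-AXI address mapping described above. */
static uint64_t axi_master_addr(uint64_t pcie2axi_vec, /* PCIe to AXI BAR translation  */
                                uint64_t axi_offset,   /* offset in the target space   */
                                bool     is_vf,
                                uint32_t vfg_offset,   /* VF number relative to its PF */
                                uint64_t vf_bar_size)
{
    if (!is_vf)
        return pcie2axi_vec + axi_offset;                                        /* PF */
    return pcie2axi_vec + ((uint64_t)vfg_offset + 1) * vf_bar_size + axi_offset; /* VF */
}

int main(void)
{
    /* Hypothetical example: translation vector 0x8000_0000, 64 KB VF BARs. */
    printf("PF access : 0x%llx\n",
           (unsigned long long)axi_master_addr(0x80000000ULL, 0x100, false, 0, 0));
    printf("VF2 access: 0x%llx\n",
           (unsigned long long)axi_master_addr(0x80000000ULL, 0x100, true, 2, 0x10000));
    return 0;
}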

PCIe to AXI BARs

For each physical function, the PCIe configuration space consists of a set of five 32-bit memory BARs
and one 32-bit Expansion ROM BAR. When SR-IOV is enabled, an additional five 32-bit BARs are
enabled for each Virtual Function. These BARs provide address translation to the AXI4 memory
mapped space, interface routing, and AXI4 request attribute configuration. Any pair of
BARs can be configured as a single 64-bit BAR. Each BAR can be configured to route its requests to
the QDMA register space, or the AXI MM bridge master interface.

Request Memory Type


The memory type can be set for each PCIe BAR through GUI configuration.

AxCache[1] is set to 1 for modifiable, and 0 for non-modifiable. Selecting the AxCache box sets
AxCache[1] to 1.

AXI Memory Mapped Bridge Slave Interface

The AXI-MM Bridge Slave interface is used for high bandwidth memory transfers between the user
logic and the Host. AXI to PCIe translation is supported through the AXI to PCIe BARs. The interface
will split requests as necessary to obey PCIe MPS and 4 KB boundary crossing requirements. Up to
32 outstanding read and write requests are supported.

AXI to PCIe BARs

In the Bridge Slave interface, there is one BAR, which can be configured as 32-bit or 64-bit. This
BAR provides address translation from the AXI address space to the PCIe address space. The address
translation is configured through BDF table programming. Refer to the Slave Bridge section for BDF
programming.

Interrupt Module

The IRQ module aggregates interrupts from various sources. The interrupt sources are queue-based
interrupts, user interrupts and error interrupts.
Queue-based interrupts and user interrupts are allowed on PFs and VFs, but error interrupts are
allowed only on PFs. If the SR-IOV is not enabled, each PF has the choice of MSI-X or Legacy
Interrupts. With SR-IOV enabled, only MSI-X interrupts are supported across all functions.
MSI-X interrupt is enabled by default. Host system (Root Complex) will enable one or all of the
interrupt types supported in hardware. If MSI-X is enabled, it takes precedence.
Up to eight interrupts per function are available. To allow many queues on a given function and each
to have interrupts, the QDMA offers a novel way of aggregating interrupts from multiple queues to a
single interrupt vector. In this way, all 2048 queues could in principle be mapped to a single interrupt

vector. QDMA offers 256 interrupt aggregation rings that can be flexibly allocated among the 256
available functions.

PCIe Block Interface

PCIe CQ/CC

The PCIe Completer Request (CQ)/Completer Completion (CC) modules receive and process TLP
requests from the remote PCIe agent. This interface to the PCIe Controller operates in an address
aligned mode. The module uses the BAR information from the Integrated Block for IP PCIe Controller
to determine where the request should be forwarded. The possible destinations for these requests
are:

DMA configuration module


AXI4 Bridge interface to Network on Chip (NoC)

Non-posted requests are expected to receive completions from the destination, which are forwarded
to the remote PCIe agent. For further details, see the Versal Adaptive SoC CPM Mode for PCI
Express Product Guide (PG346).

PCIe RQ/RC

The PCIe Requester Request (RQ)/Requester Completion (RC) interface generates PCIe TLPs on
the RQ bus and processes PCIe Completion TLPs from the RC bus. This interface to the PCIe
Controller operates in DWord aligned mode. With a 512-bit interface, straddling is enabled. While
straddling is supported, all combinations of RQ straddled transactions might not be implemented. For
further details, see the Versal Adaptive SoC CPM Mode for PCI Express Product Guide (PG346).

PCIe Configuration

Several factors can throttle outgoing non-posted transactions. Outgoing non-posted transactions are
throttled based on flow control information from the PCIe Controller to prevent head-of-line blocking of
posted requests. The DMA will meter non-posted transactions based on the PCIe Receive FIFO
space.

General Design of Queues

The multi-queue DMA engine of the QDMA uses RDMA model queue pairs to allow RNIC
implementation in the user logic. Each queue set consists of Host to Card (H2C), Card to Host (C2H),
and a C2H Stream Completion (CMPT). The elements of each queue are descriptors.
H2C and C2H are always written by the driver/software; hardware always reads from these queues.
H2C carries the descriptors for the DMA read operations from Host. C2H carries the descriptors for
the DMA write operations to the Host.
In internal mode, H2C descriptors carry address and length information and are called gather
descriptors. They support 32 bits of metadata that can be passed from software to hardware along
with every descriptor. The descriptor can be memory mapped (where it carries host address, card
address, and length of DMA transfer) or streaming (only host address, and length of DMA transfer)

based on context settings. Through descriptor bypass, an arbitrary descriptor format can be defined,
where software can pass immediate data and/or additional metadata along with packet.
C2H queue memory mapped descriptors include the card address, the host address and the length.
In streaming internal cached mode, descriptors carry only the host address. The buffer size of the
descriptor, which is programmed by the driver, is expected to be of fixed size for the whole queue.
Actual data transferred associated with each descriptor does not need to be the full length of the
buffer size.
The software advertises valid descriptors for H2C and C2H queues by writing its producer index
(PIDX) to the hardware. The status descriptor is the last entry of the descriptor ring, except for a C2H
stream ring. The status descriptor carries the consumer index (CIDX) of the hardware so that the
driver knows when to reclaim the descriptor and deallocate the buffers in the host.
For the C2H stream mode, C2H descriptors will be reclaimed based on the CMPT queue entry.
Typically, this carries one entry per C2H packet, indicating one or more C2H descriptors is consumed.
The CMPT queue entry carries enough information for software to claim all the descriptors consumed.
Through external logic, this can be extended to carry other kinds of completions or information to the
host.
CMPT entries written by the hardware to the ring can be detected by the driver using either the color
bit in the descriptor or the status descriptor at the end of the CMPT ring. Each CMPT entry can carry
metadata for a C2H stream packet and can also serve as a custom completion or immediate
notification for the user application.
The base address of all ring buffers (H2C, C2H, and CMPT) should be aligned to a 4 KB address.

Figure: Queue Ring Architecture

The software can program 16 different ring sizes. The ring size for each queue can be selected through
context programming. The last queue entry is the descriptor status, and the number of allowable entries
is (queue size - 1).
For example, if queue size is 8, which contains the entry index 0 to 7, the last entry (index 7) is
reserved for status. This index should never be used for PIDX update, and PIDX update should never
be equal to CIDX. For this case, if CIDX is 0, the maximum PIDX update would be 6.
In the example above, if traffic has already started and the CIDX is 4, the maximum PIDX update is 3.
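The index rule in this example can be written as a small helper. This is a sketch under the assumption that PIDX and CIDX wrap over the (queue size - 1) descriptor slots, which is one reading consistent with both examples above; it is illustrative only and not driver code.

#include <assert.h>
#include <stdint.h>

/* The last ring entry holds the status descriptor, so at most (ring_size - 2)
 * descriptors may be outstanding and PIDX must never equal CIDX. */
static uint16_t max_pidx_update(uint16_t cidx, uint16_t ring_size)
{
    /* Furthest PIDX the software may advance to for the given CIDX. */
    return (uint16_t)((cidx + ring_size - 2u) % (ring_size - 1u));
}

static uint16_t descriptors_free(uint16_t pidx, uint16_t cidx, uint16_t ring_size)
{
    uint16_t pending = (uint16_t)((pidx + (ring_size - 1u) - cidx) % (ring_size - 1u));
    return (uint16_t)((ring_size - 2u) - pending);
}

int main(void)
{
    /* Matches the examples in the text: ring size 8, status at index 7. */
    assert(max_pidx_update(0, 8) == 6);
    assert(max_pidx_update(4, 8) == 3);
    assert(descriptors_free(0, 0, 8) == 6);
    assert(descriptors_free(6, 0, 8) == 0);
    return 0;
}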

H2C and C2H Queues

H2C/C2H queues are rings located in host memory. For both types of queues, the producer is software
and consumer is the descriptor engine. The software maintains producer index (PIDX) and a copy of
hardware consumer index (HW CIDX) to avoid overwriting unread descriptors. The descriptor engine
also maintains consumer index (CIDX) and a copy of SW PIDX, which is to make sure the engine
does not read unwritten descriptors. The last entry in the queue is dedicated for the status descriptor
where the engine writes the HW CIDX and other status.
The engine maintains a total of 2048 H2C and 2048 C2H contexts in local memory. The context
stores properties of the queue, such as base address (BADDR), SW PIDX, CIDX, and depth of the
queue.

Figure: Simple H2C and C2H Queue

The figure above shows the H2C and C2H fetch operation.

1. For H2C, the driver writes payload into host buffer, forms the H2C descriptor with the payload
buffer information and puts it into H2C queue at the PIDX location. For C2H, the driver forms the
descriptor with available buffer space reserved to receive the packet write from the DMA.
2. The driver sends the posted write to PIDX register in the descriptor engine for the associated
Queue ID (QID) with its current PIDX value.

3. Upon reception of the PIDX update, the engine calculates the absolute QID of the pointer update
based on address offset and function ID. Using the QID, the engine will fetch the context for the
absolute QID from the memory associated with the QDMA.
4. The engine determines the number of descriptors that are allowed to be fetched based on the
context. The engine calculates the descriptor address using the base address (BADDR), CIDX,
and descriptor size, and the engine issues the DMA read request.
5. After the descriptor engine receives the read completion from the host memory, the descriptor
engine delivers them to the H2C Engine or C2H Engine in internal mode. In case of bypass, the
descriptors are sent out to the associated descriptor bypass output interface.
6. For memory mapped or H2C stream queues programmed as internal mode, after the fetched
descriptor is completely processed, the engine writes the CIDX value to the status descriptor.
For queues programmed as bypass mode, user logic controls the write back through bypass in
interface. The status descriptor could be moderated based on context settings. C2H stream
queues always use the CMPT ring for the completions.

For C2H, the fetch operation is implicit through the CMPT ring.
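Steps 1 and 2 of the flow above can be sketched from the driver side. The descriptor layout, ring size, and doorbell helper below are hypothetical placeholders used only to show the order of operations; the real descriptor formats and PIDX register offsets are defined in the QDMA register documentation cited earlier.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical 32B H2C memory mapped descriptor (illustrative only). */
struct h2c_mm_desc {
    uint64_t src_addr;   /* host buffer (DMA read source) */
    uint64_t dst_addr;   /* card address (AXI-MM write)   */
    uint64_t len;        /* transfer length in bytes      */
    uint64_t rsvd;
};

#define RING_SIZE 64u                      /* last entry holds the status descriptor */
static struct h2c_mm_desc ring[RING_SIZE]; /* host-resident descriptor ring          */
static uint16_t sw_pidx;                   /* software producer index                */

/* Placeholder for the posted MMIO write to the queue's direct-mapped PIDX register. */
static void qdma_write_h2c_pidx(uint32_t qid, uint16_t pidx)
{
    printf("PIDX doorbell: qid=%u pidx=%u\n", qid, pidx);
}

static void h2c_post(uint32_t qid, uint64_t src, uint64_t dst, uint64_t len)
{
    /* Step 1: form the descriptor at the current PIDX slot. */
    ring[sw_pidx].src_addr = src;
    ring[sw_pidx].dst_addr = dst;
    ring[sw_pidx].len      = len;

    /* Advance PIDX over the usable slots (index RING_SIZE - 1 is the status descriptor). */
    sw_pidx = (uint16_t)((sw_pidx + 1u) % (RING_SIZE - 1u));

    /* Step 2: posted write of the new PIDX to the descriptor engine for this QID. */
    qdma_write_h2c_pidx(qid, sw_pidx);
}

int main(void)
{
    h2c_post(0, 0x100000, 0x0, 4096);   /* queue 0: move 4 KB from host to card */
    return 0;
}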

Completion Queue

The Completion (CMPT) queue is a ring located in host memory. The consumer is software, and the
producer is the CMPT engine. The software maintains the consumer index (CIDX) and a copy of
hardware producer index (HW PIDX) to avoid reading unwritten completions. The CMPT engine also
maintains PIDX and a copy of software consumer index (SW CIDX) to make sure that the engine
does not overwrite unread completions. The last entry in the queue is dedicated for the status
descriptor which is where the engine writes the hardware producer index (HW PIDX) and other status.
The engine maintains a total of 2048 CMPT contexts in local memory. The context stores properties of
the queue, such as base address, SW CIDX, PIDX, and depth of the queue.

Figure: Simple Completion Queue Flow


C2H stream is expected to use the CMPT queue for completions to host, but it can also be used for
other types of completions or for sending messages to the driver. The message through the CMPT is
guaranteed to not bypass the corresponding C2H stream packet DMA.
The simple flow of DMA CMPT queue operation with respect to the numbering above follows:

1. The CMPT engine receives the completion message through the CMPT interface, but the QID
for the completion message comes from the C2H stream interface. The engine reads the QID
index of CMPT context RAM.
2. The DMA writes the CMPT entry to address BASE+PIDX.
3. If all conditions are met, optionally writes PIDX to the status descriptor of the CMPT queue with
color bit.
4. If interrupt mode is enabled, the CMPT engine generates the interrupt event message to the
interrupt module.
5. The driver can be in polling or interrupt mode. Either way, the driver identifies the new CMPT
entry either by matching the color bit or by comparing the PIDX value in the status descriptor
against its current software CIDX value.
6. The driver updates CIDX for that queue. This allows the hardware to reuse the descriptors again.
After the software finishes processing the CMPT, that is, before it stops polling or leaving the
interrupt handler, the driver issues a write to CIDX update register for the associated queue.
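The driver-side part of this flow (steps 5 and 6) can be sketched as a simple poll loop. The 8 B completion entry with a color bit in bit 0, the color flip on wrap, and the CIDX doorbell stub are assumptions made for illustration; the real CMPT entry formats and register offsets come from the QDMA documentation.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cmpt_entry { uint64_t data; };   /* bit 0 = color, remaining bits = user data */

/* Placeholder doorbell; a real driver performs an MMIO write here. */
static void qdma_write_cmpt_cidx(uint32_t qid, uint16_t cidx)
{
    printf("CMPT CIDX update: qid=%u cidx=%u\n", qid, cidx);
}

static void poll_cmpt(uint32_t qid, volatile struct cmpt_entry *ring,
                      uint16_t ring_size, uint16_t *cidx, bool *exp_color)
{
    uint16_t processed = 0;

    /* Consume entries whose color bit matches the expected phase. */
    while ((ring[*cidx].data & 1u) == (uint64_t)*exp_color) {
        /* ... hand the completion (packet length, user bits) to the application ... */
        processed++;
        *cidx = (uint16_t)(*cidx + 1u);
        if (*cidx == (uint16_t)(ring_size - 1u)) {  /* last entry is the status descriptor */
            *cidx = 0;
            *exp_color = !*exp_color;               /* expected color flips on every wrap  */
        }
    }

    if (processed)                                  /* step 6: return entries to hardware  */
        qdma_write_cmpt_cidx(qid, *cidx);
}

int main(void)
{
    struct cmpt_entry cmpt_ring[8] = { {0} };
    uint16_t cidx = 0;
    bool exp_color = true;

    cmpt_ring[0].data = 1;   /* pretend hardware wrote two completions with color = 1 */
    cmpt_ring[1].data = 1;
    poll_cmpt(0, cmpt_ring, 8, &cidx, &exp_color);
    return 0;
}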

SR-IOV Support

The QDMA provides an optional feature to support Single Root I/O Virtualization (SR-IOV). The PCI-
SIG® Single Root I/O Virtualization and Sharing (SR-IOV) specification (available from PCI-SIG
Specifications (www.pcisig.com/specifications)) standardizes the method for bypassing the VMM

involvement in datapath transactions and allows a single endpoint to appear as multiple separate
endpoints. SR-IOV classifies the functions as:

Physical Functions (PF): Full featured PCIe® functions which include SR-IOV capabilities among
others.
Virtual Functions (VF): PCIe functions featuring configuration space with Base Address
Registers (BARs) but lacking the full configuration resources and controlled by the PF
configuration. The main role of the VF is data transfer.

Apart from PCIe defined configuration space, QDMA Subsystem for PCI Express virtualizes data path
operations, such as pointer updates for queues, and interrupts. The rest of the management and
configuration functionality is deferred to the physical function driver. Drivers that do not have
sufficient privilege must communicate with the privileged driver through the mailbox interface, which is
provided as part of the QDMA Subsystem for PCI Express.
Security is an important aspect of virtualization. The QDMA Subsystem for PCI Express offers the
following security functionality:

QDMA allows only privileged PF to configure the per queue context and registers. VFs inform the
corresponding PFs of any queue context programming.
Drivers are allowed to do pointer updates only for the queue allocated to them.
The system IOMMU can be turned on to check that the DMA access is being requested by PFs
or VFs. The ARID comes from queue context programmed by a privileged function.

Any PF or VF can communicate to a PF (not itself) through mailbox. Each function implements one
128B inbox and 128B outbox. These mailboxes are visible to the driver in the DMA BAR (typically
BAR0) of its own function. At any given time, any function can have one outgoing mailbox and one
incoming mailbox message outstanding per function.
The diagram below shows how a typical system can use QDMA with different functions and operating
systems. Different Queues can be allocated to different functions, and each function can transfer DMA
packets independent of each other.

Figure: QDMA in a System


Limitations

The limitations of the QDMA are as follows:

The DMA supports a maximum of 256 Queues on any VF function.


SR-IOV is not supported in bridge mode.

Applications

The QDMA is used in a broad range of networking, computing, and data storage applications. A
common usage example for the QDMA is to implement Data Center and Telco applications, such as
Compute acceleration, Smart NIC, NVMe, RDMA-enabled NIC (RNIC), server virtualization, and NFV
in the user logic. Multiple applications can be implemented to share the QDMA by assigning different
queue sets and PCIe functions to each application. These Queues can then be scaled in the user
logic to implement rate limiting, traffic priority, and custom work queue entry (WQE).

Product Specification

QDMA Performance Optimization

AMD provides multiple example designs for you to experiment with. All example designs can be
downloaded from GitHub. The performance example design can be selected from the CED example
designs.

Clock Settings
To achieve maximum performance of 200 Gb/s line rate, PCIe configuration needs to be set at Gen5
speed and x8 width.

The NoC needs to run at a much higher frequency to reach the maximum data throughput. Set the
NoC clock to 1000 MHz as shown below.
PL logic that interacts with the QDMA IP needs to run at 420 to 430 MHz. This clock can be
generated from the CPM5 IP as shown in the following settings. PL CLK 0 is set to 430 MHz to
achieve the desired performance.
Not all PL logic needs to run at this rate, only the data path and control path that interact with
the QDMA IP. The pcie_qdma_mailbox IP's internal logic can run at a lower frequency of 250 MHz.
This clock is generated from the IP's PL CLK1.

Figure: Clock Settings


✎ Note: You can get all clock details from the performance CED.
Following are the QDMA register settings recommended by AMD for better performance. Performance
numbers vary depending on the system and OS used.

Table: QDMA Performance Registers

Address 0xB08, PFCH CFG (register value 0x100_0100): evt_pfch_fl_th[15:0] = 256, pfch_fl_th[15:0] = 256

Address 0xA80, PFCH_CFG_1 (register value 0x78_007C): evt_qcnt_th[15:0] = 120, pfch_qcnt[15:0] = 124

Address 0xA84, PFCH_CFG_2 (register value 0x8040_03C8): fence = 1, rsvd[1:0] = 0, var_desc_no_drop = 0, pfch_ll_sz_th[15:0] = 1024, var_desc_num_pfch[5:0] = 15, num_pfch[5:0] = 8

Address 0x1400, CRDT_COAL_CFG_1 (register value 0x4010): rsvd[12:0] = 0, dis_fence_fix = 0, pld_fifo_th[7:0] = 16, crdt_timer_th[9:0] = 16

Address 0x1404, CRDT_COAL_CFG_2 (register value 0x78_0060): rsv2[7:0] = 0, crdt_fifo_th[7:0] = 120, rsv1[4:0] = 0, crdt_cnt_th[10:0] = 96

Address 0xE24, H2C_REQ_THROT_PCIE (register value 0x8E04_E000): req_throt_en_req = 1, req_throt = 448, req_throt_en_data = 1, data_thresh = 57344

Address 0xE2C, H2C_REQ_THROT_AXIMM (register value 0x8E05_0000): req_throt_en_req = 0, req_throt = 448, req_throt_en_data = 0, data_thresh = 65536

Address 0x250, QDMA_GLBL_DSC_CFG (register value 0x00_0015): c2h_uodsc_limit = 0, h2c_uodsc_limit = 0, reserved = 0, Max_dsc_fetch = 5, wb_acc_int = 1

Address 0x4C, CONFIG_BLOCK_MISC_CONTROL (register value 0x81_001F): 10b_tag_en = 1, reserved = 0, axi_wbk = 0, axi_dsc = 0, num_tags = 512, reserved = 0, rq_metering_multiplier = 31
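The recommended values above can be applied with plain 32-bit MMIO writes through the QDMA register space, for example during driver initialization. The sketch below uses the addresses and register values exactly as listed in the table; the BAR pointer and write helper are placeholders, and a real driver should use its platform's MMIO accessors.

#include <stdint.h>

static inline void qdma_wr(volatile uint8_t *bar, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(bar + off) = val;   /* 32-bit register write */
}

/* Apply the recommended performance settings; dma_bar points at the mapped
 * QDMA register space (how it is obtained is outside this sketch). */
void apply_perf_tuning(volatile uint8_t *dma_bar)
{
    qdma_wr(dma_bar, 0x0B08, 0x01000100);  /* PFCH CFG                  */
    qdma_wr(dma_bar, 0x0A80, 0x0078007C);  /* PFCH_CFG_1                */
    qdma_wr(dma_bar, 0x0A84, 0x804003C8);  /* PFCH_CFG_2                */
    qdma_wr(dma_bar, 0x1400, 0x00004010);  /* CRDT_COAL_CFG_1           */
    qdma_wr(dma_bar, 0x1404, 0x00780060);  /* CRDT_COAL_CFG_2           */
    qdma_wr(dma_bar, 0x0E24, 0x8E04E000);  /* H2C_REQ_THROT_PCIE        */
    qdma_wr(dma_bar, 0x0E2C, 0x8E050000);  /* H2C_REQ_THROT_AXIMM       */
    qdma_wr(dma_bar, 0x0250, 0x00000015);  /* QDMA_GLBL_DSC_CFG         */
    qdma_wr(dma_bar, 0x004C, 0x8100001F);  /* CONFIG_BLOCK_MISC_CONTROL */
}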

QDMA_C2H_INT_TIMER_TICK (0xB0C) set to 50. Corresponding to 100 ns (1 tick = 4 ns for
250 MHz user clock)
C2H trigger mode set to User + Timer, with counter set to 64 and timer to match round trip
latency. Global register for timer should have a value of 30 for 3 μs
TX/RX API burst size = 64, ring depth = 2048. The driver should update TX/RX PIDX in batches
of 64
PCIe MPS = 256 bytes, MRRS = 4K bytes, 10-bit Tag Enabled, Relaxed Ordering Enabled
QDMA_C2H_WRB_COAL_CFG (0xB50) bits[31:26] set to 63. This is the maximum buffer size for
WRB.
The driver updates the completion CIDX in batches of 64 to reduce the number of MMIO writes
before updating the C2H PIDX.
The driver should update the H2C PIDX in batches of 64, and also update for the last descriptor
of the scatter gather list.
C2H context:
bypass = 0 (Internal mode)
frcd_en = 1
qen = 1
wbk_en = 1
irq_en = irq_arm = int_aggr = 0
C2H prefetch context:
pfch = 1
bypass = 0
valid = 1
C2H CMPT context:
en_stat_desc = 1
en_int = 0 (Poll_mode)
int_aggr = 0 (Poll mode)
trig_mode = 4
counter_idx = corresponding to 64
timer_idx = corresponding to 3 μs
valid = 1
H2C context:
bypass = 0 (Internal mode)
frcd_en = 0
fetch_max = 0
qen = 1
wbk_en = 1
wbi_chk = 1
wbi_intvl_en = 1
irq_en = 0 (Poll mode)
irq_arm = 0 (Poll mode)
int_aggr = 0 (Poll mode)
For optimal QDMA streaming performance, packet buffers of the descriptor ring should be aligned to
at least 256 bytes.


✎ Recommended: AMD recommends that you limit the total outstanding descriptor fetch to be less
than 8 KB on the PCIe. For example, limit the outstanding credits across all queues to 512 for a 16B
descriptor.

Performance in Descriptor Bypass Mode


QDMA supports both internal mode and descriptor bypass mode. Depending on the number of active
queues needed for the design, you need to select either Internal mode or Descriptor bypass mode. If the
number of active queues is less than 64, Internal mode works fine. If the number of queues is more than
64, it is better to use Descriptor bypass mode.
In Descriptor bypass mode, it is your responsibility to maintain descriptors for the corresponding
queues and to control their priority when sending the descriptors back to the IP.
When the design is configured in Descriptor bypass mode, all of the above settings apply. The following
information provides recommendations to improve performance in bypass mode.

When the bypass input dma<0/1>_h2c_byp_in_st_sdi port is set, the QDMA IP generates a status
writeback for every packet. AMD recommends that this port be asserted once every 32 or 64
packets, and if there are no more descriptors left, that dma<0/1>_h2c_byp_in_st_sdi be asserted
on the last descriptor. This requirement applies on a per-queue basis, and applies to AXI4 (H2C and
C2H) bypass transfers and AXI4-Stream H2C transfers. A conceptual sketch of this policy follows
this list.
For AXI4-Stream C2H Simple bypass mode, the dma<0/1>_dsc_crdt_in_fence port should be
set to 1 for performance reasons. This recommendation assumes the user design has already
coalesced credits for each queue and sent them to the IP. In internal mode, set the fence bit in
the QDMA_C2H_PFCH_CFG_2 (0xA84) register.
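The sdi recommendation in the first item can be modeled conceptually: assert sdi roughly once every 32 or 64 packets per queue, and always on the final descriptor when nothing more is pending. The helper below is only a conceptual model of that policy (the names and the per-queue counter are assumptions), not RTL or driver code.

#include <stdbool.h>
#include <stdint.h>

#define SDI_INTERVAL 32u   /* assert sdi once every 32 packets (32 or 64 recommended) */

static uint32_t pkt_count[2048];   /* per-queue packet counter kept by the user logic */

/* Decide whether the sdi field should be set on the descriptor being driven
 * into the bypass input interface for this queue. */
static bool set_sdi(uint32_t qid, bool last_descriptor_pending)
{
    pkt_count[qid]++;

    if (last_descriptor_pending)            /* no more descriptors left for this queue */
        return true;
    if (pkt_count[qid] % SDI_INTERVAL == 0) /* periodic status writeback               */
        return true;
    return false;
}

int main(void)
{
    /* Example: 100 packets on queue 3, the last one with nothing left behind it. */
    int asserted = 0;
    for (int i = 0; i < 100; i++)
        asserted += set_sdi(3, i == 99) ? 1 : 0;
    return asserted == 4 ? 0 : 1;   /* sdi at packets 32, 64, 96 and the final one */
}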

Performance Optimization Based on Available Cache/Buffer Size

Table: CPM5 QDMA

Name | Entry/Depth | Description

C2H Descriptor Cache Depth | 2048 | Total number of outstanding C2H stream descriptor fetches for cached bypass and internal modes. This cache depth is not relevant in Simple Bypass mode, where the user logic can maintain a longer descriptor cache.

Prefetch Cache Depth | 128 | Number of C2H prefetch tags available. If more than 128 queues are active with packets smaller than 512B, performance can drop depending on the data pattern. If performance degradation is observed, Simple Bypass mode can be used instead, where the user logic maintains the entire descriptor flow.

C2H Payload FIFO Depth | 1024 | Units of 64B. Amount of C2H data that the C2H engine can buffer. This buffer can sustain a host read latency of up to 2 μs (1024 * 2 ns). If the latency is more than 2 μs, performance can degrade.

MM Reorder Buffer Depth | 512 | Units of 64B. Amount of MM read data that can be stored to absorb host read latency.

Desc Eng Reorder Buffer Depth | 512 | Units of 64B. Amount of descriptor fetch data that can be stored to absorb host read latency.

H2C-ST Reorder Buffer Depth | 1024 | Units of 64B. Amount of H2C-ST data that can be stored to absorb host read latency.

QDMA Operations

Descriptor Engine

The descriptor engine is responsible for managing the consumer side of the Host to Card (H2C) and
Card to Host (C2H) descriptor ring buffers for each queue. The context for each queue determines
how the descriptor engine will process each queue individually. When descriptors are available and
other conditions are met, the descriptor engine will issue read requests to PCIe to fetch the
descriptors. Received descriptors are offloaded to either the descriptor bypass out interface (bypass
mode) or delivered directly to a DMA engine (internal mode). When a H2C Stream or Memory
Mapped DMA engine completes a descriptor, status can be written back to the status descriptor, an
interrupt, and/or a marker response can be generated to inform software and user logic of the current
DMA progress. The descriptor engine also provides a Traffic Manager Interface which notifies user
logic of certain status for each queue. This allows the user logic to make informed decisions if
customization and optimization of DMA behavior is desired.

Descriptor Context

The Descriptor Engine stores per queue configuration, status and control information in descriptor
context that can be stored in block RAM or UltraRAM, and the context is indexed by H2C or C2H QID.
Prior to enabling the queue, the hardware and credit context must first be cleared. After this is done,
the software context can be programmed and the qen bit can be set to enable the queue. After the
queue is enabled, the software context should only be updated through the direct mapped address
space to update the Producer Index and Interrupt Arm bit, unless the queue is being disabled. The
hardware context and credit context contain only status. It is only necessary to interact with the
hardware and credit contexts as part of queue initialization to clear them to all zeros. Once the queue
is enabled, context is dynamically updated by hardware. Any modification of the context through the
indirect bus when the queue is enabled can result in unexpected behavior. Reading the context when
the queue is enabled is not recommended as it can result in reduced performance.

Software Descriptor Context Structure (0x0 C2H and 0x1 H2C)

The descriptor context is used by the descriptor engine.

Table: Software Descriptor Context Structure Definition

Bit Bit Width Field Name Description

[255:148] 108 reserved Reserved. Set to 0s.

[147:144] 4 host_id host_id is index into the host_profile


registers to determine the steering for AXI4
MM transfers.

[143:140] 4 reserved Reserved. Set to 0s.

[139] 1 int_aggr If set, interrupts will be aggregated in


interrupt ring.

[138:128] 11 vec MSI-X vector used for interrupts for direct
interrupt or interrupt aggregation entry for
aggregated interrupts.

[127:64] 64 dsc_base 4K aligned base address of descriptor ring.

[63] 1 is_mm This field determines if the queue is


Memory Mapped or not. If this field is set,
the descriptors will be delivered to
associated H2C or C2H MM engine.
1: Memory Mapped
0: Stream

[62] 1 mrkr_dis If set, disables the marker response in


internal mode.
Not applicable for C2H ST.

[61] 1 irq_req Interrupt due to error waiting to be sent


(waiting for irq_arm). This bit should be
cleared when the queue context is
initialized.
Not applicable for C2H ST.

[60] 1 err_wb_sent A writeback/interrupt was sent for an error.


Once this bit is set no more writebacks or
interrupts will be sent for the queue. This bit
should be cleared when the queue context
is initialized.
Not applicable for C2H ST.

[59:58] 2 err Error status.


Bit Bit Width Field Name Description


Bit[1] dma – An error occurred during DMA
operation. Check engine status registers.
Bit[0] dsc – An error occurred during
descriptor fetch or update. Check descriptor
engine status registers. This field should be
set to 0 when the queue context is
initialized.

[57] 1 irq_no_last No interrupt was sent and the producer


index (PIDX) or consumer index (CIDX)
was idle in internal mode. When the
irq_arm bit is set, the interrupt will be sent.
This bit will clear automatically when the
interrupt is sent or if the PIDX of the queue
is updated.
This bit should be initialized to 0 when the
queue context is initialized.
Not applicable for C2H ST.

[56:54] 3 port_id Port_id


The port id that will be sent on user
interfaces for events associated with this
queue.

[53] 1 irq_en Interrupt enable.


An interrupt to the host will be sent on host
status updates.
Set to 0 for C2H ST.

[52] 1 wbk_en Writeback enable.


A memory write to the status descriptor will
be sent on host status updates.

[51] 1 mm_chn For AXI-MM transfer, set to 0 to target


Channel 0 or set to 1 to target Channel 1.
For AXI-ST set to 0.

[50] 1 bypass If set, the queue will operate under Bypass


mode, otherwise it will be in Internal mode.

[49:48] 2 dsc_sz Descriptor fetch size. 0: 8B, 1: 16B; 2: 32B;


3: 64B.
If bypass mode is not enabled, 32B is
required for Memory Mapped DMA, 16B is
required for H2C Stream DMA, and 8B is
required for C2H Stream DMA.


Bit Bit Width Field Name Description


If the queue is configured for bypass mode,
any descriptor size can be selected. The
descriptors will be delivered on the bypass
output interface. It is up to the user logic to
process the descriptors before they are fed
back into the descriptor bypass input.

[47:44] 4 rng_sz Descriptor ring size index. This index


selects one of 16 register (offset
0x204:0x240) which has different ring
sizes.

[43:41] 3 reserved Reserved

[40:37] 4 fetch_max Maximum number of descriptor fetches


outstanding for this queue. The maximum outstanding is fetch_max + 1. A higher value can increase single-queue performance.

[36] 1 at Address type of base address.


0: untranslated
1: translated
This will be the address type (AT) used on
PCIe for descriptor fetches and status
descriptor writebacks.

[35] 1 wbi_intvl_en Write back/Interrupt interval.


Enables periodic status updates based on
the number of descriptors processed.
Applicable to Internal mode.
Not Applicable to C2H ST. The writeback
interval is determined by register
QDMA_GLBL_DSC_CFG (0x250) bits[2:0].

[34] 1 wbi_chk Writeback/Interrupt after pending check.


Enable status updates when the queue has
completed all available descriptors.
Applicable to Internal mode.

[33] 1 fcrd_en Enable fetch credit.


The number of descriptors fetched will be
qualified by the number of credits given to
this queue.
Set to 1 for C2H ST.


Bit Bit Width Field Name Description

[32] 1 qen Indicates that the queue is enabled.

[31:25] 7 reserved Reserved

[24:17] 8 fnc_id Function ID

[16] 1 irq_arm Interrupt arm. When this bit is set, the


queue is allowed to generate an interrupt.

[15:0] 16 pidx Producer index.

Hardware Descriptor Context Structure (0x2 C2H and 0x3 H2C)

Table: Hardware Descriptor Structure Definition

Bit Bit Width Field Name Description

[47] 1 reserved Reserved

[46:43] 4 fetch_pnd Descriptor fetch pending

[42] 1 evt_pnd Event pending

[41] 1 idl_stp_b Queue invalid and no descriptors pending.


This bit is set when the queue is enabled.
The bit is cleared when the queue has been
disabled (software context qen bit) and no
more descriptors are pending.
[0] Queue is disabled (software context qen
bit) and no more descriptors pending.
[1] Queue is enabled.

[40] 1 dsc_pnd Descriptors pending. Descriptors are


defined to be pending if the last CIDX
completed does not match the current
PIDX.

[39:32] 8 Reserved

[31:16] 16 crd_use Credits consumed. Applicable if fetch


credits are enabled in the software context.

[15:0] 16 cidx Consumer index of last fetched descriptor.

Credit Descriptor Context Structure

Table: Credit Descriptor Context Structure Definition


Bit Bit Width Field Name Description

[31:16] 16 reserved Reserved

[15:0] 16 credt Fetch credits received.


Applicable if fetch credits are enabled in the
software context.
The credit descriptor context is for internal DMA use only and it can be read from the indirect bus for
debug. This context stores credits for each queue received through the Descriptor Credit Interface
with the CREDIT_ADD operation. If the credit operation has the dsc_crdt_in_fence bit set to 1,
credits are added only as the read request for the descriptor is generated.

Descriptor Fetch

Figure: Descriptor Fetch Flow

1. The descriptor engine is informed of the availability of descriptors through an update to a


queue’s descriptor PIDX. This portion of the context is direct mapped to the
QDMA_DMAP_SEL_H2C_DSC_PIDX and QDMA_DMAP_SEL_C2H_DSC_PIDX address
space.
2. On a PIDX update, the descriptor engine evaluates the number of descriptors available based on
the last fetched consumer index (CIDX). The availability of new descriptors is communicated to
the user logic through the Traffic Manager Status Interface.
3. If fetch crediting is enabled, the user logic is required to provide a credit for each descriptor that
should be fetched.
4. If descriptors are available and either fetch credits are disabled or are non-zero, the descriptor
engine will generate a descriptor fetch to PCIe. The number of fetched descriptors is further
qualified by the PCIe Max Read Request Size (MRRS) and descriptor fetch credits, if enabled. A

descriptor fetch can also be stalled due to insufficient completion space. In each direction, C2H
and H2C are allocated 256 entries for descriptor fetch completions. Each entry is the width of the
datapath. If sufficient space is available, the fetch is allowed to proceed. A given queue can only
have one descriptor fetch pending on PCIe at any time.
5. The host receives the read request and provides the descriptor read completion to the descriptor
engine.
6. Descriptors are stored in a buffer until they can be offloaded. If the queue is configured in bypass
mode, the descriptors are sent to the Descriptor Bypass Output port. Otherwise they are
delivered directly to a DMA engine. Once delivered, the descriptor fetch completion buffer space
is deallocated.

✎ Note: Available descriptors are always <ring size> - 2. At any time, the software should not update
the PIDX to more than <ring size> - 2.
For example, if queue size is 8, which contains the entry index 0 to 7, the last entry (index 7) is
reserved for status. This index should never be used for the PIDX update, and the PIDX update
should never be equal to CIDX. For this case, if CIDX is 0, the maximum PIDX update would be 6.
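The qualifiers in step 4 can be summarized as a minimum over the stated limits. The function below is a simplified conceptual model (for example, completion-buffer space is counted in descriptors here rather than datapath-width entries), not the exact hardware arbitration.

#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Conceptual model of how many descriptors a single fetch may cover. */
static uint32_t dsc_fetch_count(uint32_t avail,       /* PIDX minus last fetched CIDX    */
                                uint32_t credits,     /* fetch credits, if enabled       */
                                int      fcrd_en,     /* fetch crediting enabled?        */
                                uint32_t mrrs,        /* PCIe Max Read Request Size, B   */
                                uint32_t dsc_sz,      /* descriptor size in bytes        */
                                uint32_t cmpl_space)  /* free completion buffer entries  */
{
    uint32_t n = avail;

    if (fcrd_en)
        n = min_u32(n, credits);
    n = min_u32(n, mrrs / dsc_sz);     /* one read request per queue at a time  */
    n = min_u32(n, cmpl_space);        /* fetch stalls without completion space */
    return n;
}

int main(void)
{
    /* Example: 100 descriptors available, 64 credits, 16B descriptors, MRRS = 512B. */
    return dsc_fetch_count(100, 64, 1, 512, 16, 256) == 32 ? 0 : 1;
}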

Internal Mode

A queue can be configured to operate in Descriptor Bypass mode or Internal mode by setting the
software context bypass field. In internal mode, the queue requires no external user logic to handle
descriptors. Descriptors that are fetched by the descriptor engine are delivered directly to the
appropriate DMA engine and processed. Internal mode allows credit fetching and status updates to
the user logic for run time customization of the descriptor fetch behavior.

Internal Mode Writeback and Interrupts (AXI MM and H2C ST)

Status writebacks and/or interrupts are generated automatically by hardware based on the queue
context. When wbi_intvl_en is set, writebacks/interrupts are sent based on the interval selected in
the register QDMA_GLBL_DSC_CFG (0x250) bits[2:0]. Due to the slow nature of interrupts, in
interval mode, interrupts might be late or skip intervals. If the wbi_chk context bit is set, a
writeback/interrupt is sent when the descriptor engine has detected that the last descriptor at the
current PIDX has completed. It is recommended the wbi_chk bit be set for all internal mode
operation, including when interval mode is enabled. An interrupt is not generated until the irq_arm bit
is set by the software. After an interrupt is sent, the irq_arm bit is cleared by the hardware. Should an
interrupt be needed when the irq_arm bit is not set, the interrupt is held in a pending state until the
irq_arm bit is set.
Descriptor completion is defined to be when the descriptor data transfer has completed and its write
data is acknowledged on AXI (H2C bresp for AXI MM, Valid/Ready of ST), or it is accepted by the
PCIe Controller’s transaction layer for transmission (C2H MM).

Descriptor Bypass Mode

Descriptor Bypass mode also supports crediting and status updates to user logic. In addition,
Descriptor Bypass mode allows the user logic to customize processing of descriptors and status
updates. Descriptors fetched by the descriptor engine are delivered to user logic through the

descriptor bypass out interface. This allows user logic to pre-process or store the descriptors, if
desired. On the descriptor bypass out interface, the descriptors can be a custom format (adhering to
the descriptor size). To perform DMA operations, the user logic drives descriptors (must be QDMA
format) into the descriptor bypass input interface.

Descriptor Bypass Mode Writeback/Interrupts

In bypass mode, the user logic has explicit control over status updates to the host, and marker
responses back to user logic. Along with each descriptor submitted to the Descriptor Bypass Input
Port for a Memory Mapped Engine (H2C and C2H) or H2C Stream DMA engine, there is a CIDX, and
sdi field. The CIDX is used to identify which descriptor has completed in any status update (host writeback, marker response, or coalesced interrupt) generated at the completion of the descriptor. If the sdi field is set on the input descriptor, a writeback to the host is generated, provided the context wbk_en bit is set. An interrupt can also be sent for an sdi descriptor if the context irq_en and irq_arm bits are set.
If interrupts are enabled, the user logic must monitor the traffic manager output for the irq_arm. After the irq_arm bit is observed for the queue, a descriptor with the sdi bit can be sent to the DMA. Once a descriptor with the sdi bit is sent, another irq_arm assertion must be observed before another descriptor with the sdi bit can be sent. If you set the sdi bit when the arm bit has not been properly observed, an interrupt might or might not be sent, and software might hang indefinitely waiting for an interrupt. When interrupts are not enabled, setting the sdi bit has no restriction. However, excessive
writeback events can severely reduce the descriptor engine performance and consume write
bandwidth to the host.
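The arm-then-sdi handshake can be summarized with a small C sketch of the bookkeeping the user logic needs to perform; the structure and helper names are assumptions for illustration.

    /* Illustrative user-side bookkeeping for the irq_arm/sdi handshake in
     * bypass mode: only send a descriptor with sdi set after an irq_arm
     * assertion has been observed on the traffic manager output for that
     * queue, and consume that observation when sdi is used. */
    #include <stdbool.h>

    struct byp_q_state {
        bool irq_arm_credit;  /* set when tm_dsc_sts reports irq_arm for this qid */
    };

    /* Returns true if it is safe to set sdi on the next descriptor for this
     * queue; consumes the arm credit when it does. */
    static inline bool take_sdi_credit(struct byp_q_state *q)
    {
        if (!q->irq_arm_credit)
            return false;
        q->irq_arm_credit = false;  /* one sdi per observed irq_arm assertion */
        return true;
    }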
Descriptor completion is defined to be when the descriptor data transfer has completed and its write
data has been acknowledged on AXI4 (H2C bresp for AXI MM, Valid/Ready of ST), or been accepted
by the PCIe Controller’s transaction layer for transmission (C2H MM).

Marker Response

Marker responses can be generated for any descriptor by setting the mrkr_req bit. Marker responses
are generated after the descriptor is completed. Similar to host writebacks, excessive marker
response requests can reduce descriptor engine performance. Along with mrkr_req signals, sdi can
also be set. In this case, the marker response is sent on queue status ports and writeback is sent to
the host. The marker responses are sent on queue status ports that can be identified by the queue id.
Descriptor completion is defined as when the descriptor data transfer has completed and its write data
is acknowledged on AXI (H2C bresp for AXI4, Valid/Ready of ST), or is accepted by the PCIe
Controller’s transaction layer for transmission (C2H MM).

Traffic Manager Output Interface

✎ Note: The ports described below have a prefix of dma<n>_, which can be either dma0_ for QDMA
Port 0 or dma1_ for QDMA Port 1.
The traffic manager interface provides details of a queue’s status to user logic, allowing user logic to
manage descriptor fetching and execution. In normal operation, for an enabled queue, each time the
irq_arm bit is asserted or PIDX of a queue is updated, the descriptor engine asserts
dma<n>_tm_dsc_sts_valid. The dma<n>_tm_dsc_sts_avl signal indicates the number of new

descriptors available since the last update. Through this mechanism, user logic can track the amount
of work available for each queue. This can be used for prioritizing fetches through the descriptor
engine’s fetch crediting mechanism or other user optimizations. On the valid cycle, the
dma<n>_tm_dsc_sts_irq_arm indicates that the dma<n>_irq_arm bit was zero and was set. In
bypass mode, this is essentially a credit for an interrupt for this queue. When a queue is invalidated
by software or due to error, the dma<n>_tm_dsc_sts_qinv signal will be set. If this bit is observed,
the descriptor engine will have halted new descriptor fetches for that queue. In this case, the contents
on dma<n>_tm_dsc_sts_avl indicate the number of available fetch credits held by the descriptor
engine. This information can be used to help user logic reconcile the number of credits given to the
descriptor engine, and the number of descriptors it should expect to receive. Even after
dma<n>_tm_dsc_sts_qinv is asserted, valid descriptors already in the fetch pipeline will continue to
be delivered to the DMA engine (internal mode) or delivered to the descriptor bypass output port
(bypass mode).
Other fields of the dma<n>_tm_dsc_sts interface identify the queue id, DMA direction (H2C or C2H),
internal or bypass mode, stream or memory mapped mode, queue enable status, queue error status,
and port ID.
While the dma<n>_tm_dsc_sts interface is a valid/ready interface, it should not be back-pressured for
optimal performance. Since multiple events trigger a dma<n>_tm_dsc_sts cycle, if internal buffering is
filled, descriptor fetching will be halted to prevent generation of new events.

Descriptor Credit Input Interface

The credit interface is relevant when a queue’s fcrd_en context bit is set. It allows the user logic to
prioritize and meter descriptors fetched for each queue. You can specify the DMA direction, qid, and
credit value. For a typical use case, the descriptor engine uses credit inputs to fetch descriptors.
Internally, credits received and consumed are tracked for each queue. If credits are added when the queue is not enabled, the credits will be returned through the Traffic Manager Output Interface with tm_dsc_sts_qinv asserted, and the credits in tm_dsc_sts_avl are not valid. Monitor the tm_dsc_sts interface to keep an account, per queue, of how many credits have been consumed.
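A non-normative C sketch of per-queue credit bookkeeping driven by the tm_dsc_sts interface is shown below; the structure and field names are assumptions for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative per-queue accounting: credits pushed on the credit input
     * interface versus descriptor availability reported on tm_dsc_sts. */
    struct q_credit_state {
        uint32_t credits_given;   /* credits pushed on the credit input       */
        uint32_t avail_reported;  /* running sum of tm_dsc_sts_avl for the qid */
        bool     invalidated;     /* tm_dsc_sts_qinv seen for this queue       */
    };

    static inline void on_tm_dsc_sts(struct q_credit_state *q,
                                     uint32_t avl, bool qinv)
    {
        if (qinv) {
            /* On invalidation, avl reports the fetch credits still held by the
             * descriptor engine (per the Traffic Manager Output description). */
            q->invalidated = true;
            q->credits_given -= (avl <= q->credits_given) ? avl : q->credits_given;
        } else {
            q->avail_reported += avl;
        }
    }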

Errors

Errors can potentially occur during both descriptor fetch and descriptor execution. In both cases, once
an error is detected for a queue it will invalidate the queue, log an error bit in the context, stop fetching
new descriptors for the queue which encountered the error, and can also log errors in status registers.
If enabled for writeback, interrupts, or marker response, the DMA will generate a status update to
these interfaces. Once this is done, no additional writeback, interrupts, or marker responses (internal
mode) will be sent for the queue until the queue context is cleared. As a result of the queue
invalidation due to an error, a Traffic Manager Output cycle will also be generated to indicate the error
and queue invalidation. After the queue is invalidated, if there is an error you can determine the cause
by reading the error registers and context for that queue. You must clear and remove that queue, and
then add the queue back later when needed.
Although additional descriptor fetches will be halted, fetches already in the pipeline will continue to be
processed and descriptors will be delivered to a DMA engine or Descriptor Bypass Out interface as
usual. If the descriptor fetch itself encounters an error, the descriptor will be marked with an error bit.
If the error bit is set, the contents of the descriptor should be considered invalid. It is possible that

subsequent descriptor fetches for the same queue do not encounter an error and will not have the
error bit set.

Memory Mapped DMA

In memory mapped DMA operations, both the source and destination of the DMA are memory
mapped space. In an H2C transfer, the source address belongs to PCIe address space while the
destination address belongs to AXI MM address space. In a C2H transfer, the source address belongs
to AXI MM address space while the destination address belongs to PCIe address space. PCIe-to-
PCIe and AXI MM-to-AXI MM DMAs are not supported. Aside from the direction of the transfer, H2C and C2H DMA behave similarly and share the same descriptor format.

Operation

The memory mapped DMA engines (H2C and C2H) are enabled by setting the run bit in the Memory
Mapped Engine Control Register. When the run bit is deasserted, descriptors can be dropped. Any
descriptors that have already started the source buffer fetch will continue to be processed.
Reassertion of the run bit will result in resetting internal engine state and should only be done when
the engine is quiesced. Descriptors are received from either the descriptor engine directly or the
Descriptor Bypass Input interface. Any queue that is in internal mode should not be given descriptors
through the Descriptor Bypass Input interface. Any descriptor sent to an MM engine that is not running
will be dropped. For configurations where a mix of Internal Mode queues and Bypass Mode queues
are enabled, round robin arbitration is performed to establish order.
The DMA Memory Mapped engine first generates the read request to the source interface, splitting
the descriptor at alignment boundaries specific to the interface. Both PCIe and AXI read interfaces
can be configured to split at different alignments. Completion space for read data is preallocated
when the read is issued. Likewise for the write requests, the DMA engine will split at appropriate
alignments. On the AXI interface each engine will use a single AXI ID. The DMA engine will reorder
the read completion/write data to the order in which the reads were issued. Once sufficient read
completion data is received the write request will be issued to the destination interface in the same
order that the read data was requested. Before the request is retired, the destination interfaces must
accept all the write data and provide a completion response. For PCIe the write completion is issued
when the write request has been accepted by the transaction layer and will be sent on the link next.
For the AXI Memory Mapped interface, the bresp is the completion criteria. Once the completion
criteria has been met, the host writeback, interrupt and/or marker response is generated for the
descriptor as appropriate.
The DMA Memory Mapped engines also support the no_dma field of the Descriptor Bypass Input, and
zero-length DMA. Both cases are treated identically in the engine. The descriptors propagate through
the DMA engine as all other descriptors, so descriptor ordering within a queue is still observed.
However no DMA read or write requests are generated. The status update (writeback, interrupt,
and/or marker response) for zero-length/no_dma descriptors is processed when all previous
descriptors have completed their status update checks.

Errors

There are two primary error categories for the DMA Memory Mapped Engine. The first is an error bit
that is set with an incoming descriptor. In this case, the DMA operation of the descriptor is not
processed but the descriptor proceeds through the engine to status update phase with an error
indication. This should result in a writeback, interrupt, and/or marker response depending on context
and configuration. It also results in the queue being invalidated. The second category of errors for the
DMA Memory Mapped Engine are errors encountered during the execution of the DMA itself. This can include PCIe read completion errors and AXI bresp errors (H2C), or AXI bresp errors and PCIe write errors due to bus master enable or function level reset (FLR), as well as RAM ECC errors. The
first enabled error is logged in the DMA engine. Refer to the Memory Mapped Engine error logs. If an
error occurs on the read, the DMA write is aborted if possible. If the error was detected when pulling
write data from RAM, it is not possible to abort the request. Instead invalid data parity is generated to
ensure the destination is aware of the problem. After the descriptor which encountered the error has
gone through the DMA engine, it proceeds to generate status updates with an error indication. As with
descriptor errors, it results in the queue being invalidated.

AXI Memory Mapped Descriptor for H2C and C2H (32B)

Table: AXI Memory Mapped Descriptor Structure for H2C and C2H

Bit        Bit Width  Field Name    Description

[255:192]  64         reserved      Reserved

[191:128]  64         dst_addr      Destination Address

[127:92]   36         reserved      Reserved

[91:64]    28         lengthInByte  Read length in bytes (maximum 64K-1)

[63:0]     64         src_addr      Source Address

Internal mode memory mapped DMA must configure the descriptor queue to be 32B and follow the above descriptor format. In bypass mode, the descriptor format is defined by the user logic, which must drive the H2C or C2H MM bypass input port.
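For illustration, a possible C view of this 32B descriptor is sketched below. The struct layout is an assumption (little-endian host, naturally aligned fields); it is not a definitive driver definition.

    #include <stdint.h>

    /* Illustrative layout of the 32B AXI-MM H2C/C2H descriptor, assuming a
     * little-endian host. Bit positions follow the table above. */
    struct qdma_mm_desc {
        uint64_t src_addr;   /* [63:0]    source address                        */
        uint64_t len_rsvd;   /* [91:64]   length in bytes, [127:92] reserved    */
        uint64_t dst_addr;   /* [191:128] destination address                   */
        uint64_t reserved;   /* [255:192] reserved                              */
    };

    /* Example: pack a length into the second quadword (lower 28 bits). */
    static inline void qdma_mm_desc_set_len(struct qdma_mm_desc *d, uint32_t len)
    {
        d->len_rsvd = (uint64_t)(len & 0x0FFFFFFFu);
    }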

AXI Memory Mapped Writeback Status Structure for H2C and C2H

The MM writeback status register is located after the last entry of the (H2C or C2H) descriptor.

Table: AXI Memory Mapped Writeback Status Structure for H2C and C2H

Bit      Bit Width  Field Name  Description

[63:48]  16         reserved    Reserved

[47:32]  16         pidx        Producer Index at time of writeback

[31:16]  16         cidx        Consumer Index

[15:2]   14         reserved    Reserved

[1:0]    2          err         Error
                                bit 1: Descriptor fetch error
                                bit 0: DMA error
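Similarly, a hedged C sketch of the 8B writeback status entry, assuming a little-endian host; this is illustrative only and not a driver definition.

    #include <stdint.h>

    /* Illustrative view of the 8B MM writeback status entry. */
    struct qdma_mm_wb_status {
        uint16_t err_rsvd;  /* [1:0] err (bit 0: DMA error, bit 1: fetch error),
                               [15:2] reserved                                   */
        uint16_t cidx;      /* [31:16] consumer index                            */
        uint16_t pidx;      /* [47:32] producer index at time of writeback       */
        uint16_t reserved;  /* [63:48] reserved                                  */
    };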

Stream Mode DMA

H2C Stream Engine

The H2C Stream Engine is responsible for transferring streaming data from the host and delivering it
to the user logic. The H2C Stream Engine operates on H2C stream descriptors. Each descriptor
specifies the start address and the length of the data to be transferred to the user logic. The H2C
Stream Engine parses the descriptor and issues read requests to the host over PCIe, splitting the
read requests at the MRRS boundary. There can be up to 256 requests outstanding in the H2C
Stream Engine to hide the host read latency. The H2C Stream Engine implements a re-ordering buffer
of 32 KB to re-order the TLPs as they come back. Data is issued to the user logic in order of the
requests sent to PCIe.
If the status descriptor is enabled in the associated H2C context, the engine can additionally send a status writeback to the host once it is done issuing data to the user logic.

Internal and Bypass Modes

Each queue in QDMA can be programmed in either of the two H2C Stream modes: internal and
bypass. This is done by specifying the mode in the queue context. The H2C Stream Engine knows
whether the descriptor being processed is for a queue in internal or bypass mode.
The following figures show the internal mode and bypass mode flows.

Figure: H2C Internal Mode Flow


Figure: H2C Bypass Mode Flow

For a queue in the Internal mode, after the descriptor is fetched from the host it is fed straight to the
H2C Stream Engine for processing. In this case, a packet of data cannot span over multiple
descriptors. Thus for a queue in internal mode, each descriptor generates exactly one AXI4-Stream
packet on the QDMA H2C AXI4-Stream output. If the packet is present in host memory in non-
contiguous space, then it has to be defined by more than one descriptor and this requires that the
queue be programmed in bypass mode.
In the Bypass mode, after the descriptors are fetched from the host they are sent straight to the user
logic using the QDMA bypass output port. The QDMA does not parse these descriptors at all. The
user logic can store these descriptors and then send the required information from these descriptors
back to QDMA using the QDMA H2C Stream descriptor bypass-in interface. Using this information,
the QDMA constructs descriptors which are then fed to the H2C Stream Engine for processing.
When fcrd_en is enabled in the software context, the DMA waits for the user application to provide credits (the Credit return step in the figure above). When fcrd_en is not set, the DMA fetches descriptors on a pointer update and sends the descriptors out; the user application should not send in credits, and the Credit return step in the figure does not apply in this case.
The following are the advantages of using the bypass mode:

The user logic can have a custom descriptor format. This is possible because QDMA does not
parse descriptors for queues in bypass mode. The user logic parses these descriptors and
provides the information required by the QDMA on the H2C Stream bypass-in interface.
Immediate data can be passed from the software to the user logic without DMA operation.
The user logic can do traffic management by sending the descriptors to the QDMA when it is
ready to sink all the data. Descriptors can be cached in local RAM.
Perform address translation.

There are some requirements imposed on the user logic when using the bypass mode. Because the
bypass mode allows a packet to span multiple descriptors, the user logic needs to indicate to QDMA
which descriptor marks the Start-Of-Packet (SOP) and which marks the End-Of-Packet (EOP). At the
QDMA H2C Stream bypass-in interface, among other pieces of information, the user logic needs to
provide: Address, Length, SOP, and EOP. It is required that once the user logic feeds SOP descriptor
information into QDMA, it must eventually feed EOP descriptor information also. Descriptors for these
multi-descriptor packets must be fed in sequentially. Other descriptors not belonging to the packet
must not be interleaved within the multi-descriptor packet. The user logic must accumulate the
descriptors up to the EOP descriptor, before feeding them back to QDMA. Not doing so can result in a
hang. The QDMA will generate a TLAST at the QDMA H2C AXI4-Stream data output once it issues
the last beat for the EOP descriptor. This is guaranteed because the user is required to submit the
descriptors for a given packet sequentially.
The H2C stream interface is shared by all the queues, and has the potential for a head of line
blocking issue if the user logic does not reserve the space to sink the packet. Quality of service can
be severely affected if the packet sizes are large. The Stream engine is designed to saturate PCIe for
packet sizes as low as 128B, so AMD recommends that you restrict the packet size to be host page
size or maximum transfer unit as required by the user application.
A performance control provided in the H2C Stream Engine is the ability to stall requests from being
issued to the PCIe RQ/RC if a certain amount of data is outstanding on the PCIe side as seen by the
H2C Stream Engine. To use this feature, the SW must program a threshold value in the
H2C_REQ_THROT (0xE24) register. After the H2C Stream Engine has more data outstanding to be
delivered to the user logic than this threshold, it stops sending further read requests to the PCIe
RQ/RC. This feature is disabled by default and can be enabled with the H2C_REQ_THROT (0xE24)
register. This feature helps improve the C2H Stream performance, because the H2C Stream Engine
can make requests at a much faster rate than the C2H Stream Engine. This can potentially use up the
PCIe side resources for H2C traffic which results in C2H traffic suffering. The H2C_REQ_THROT
(0xE24) register also allows the SW to separately enable and program the threshold of the maximum
number of read requests that can be outstanding in the H2C Stream engine. Thus, this register can
be used to individually enable and program the thresholds for the outstanding requests and data in
the H2C Stream engine.

H2C Stream Descriptor (16B)

Table: H2C Descriptor Structure

Bit       Bit Width  Field Name  Description

[127:96]  32         addr_h      Address High. Higher 32 bits of the source address in Host.

[95:64]   32         addr_l      Address Low. Lower 32 bits of the source address in Host.

[63:48]   16         reserved    Reserved

[47:32]   16         len         Packet Length. Length of the data to be fetched for this
                                 descriptor. This is also the packet length since in internal mode,
                                 a packet cannot span multiple descriptors. The maximum length of
                                 the packet can be 64K-1 bytes.

[31:0]    32         metadata    Metadata. QDMA passes this field on the H2C-ST TUSER along with
                                 the data on every beat. For a queue in internal mode, it can be
                                 used to pass messages from SW to user logic along with the data.
This H2C descriptor format is only applicable for internal mode. For bypass mode, the user logic can
define its own format as needed by the user application.
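A possible C view of this 16B internal-mode descriptor is sketched below; the layout is an assumption (little-endian host) and is shown only to make the field packing concrete.

    #include <stdint.h>

    /* Illustrative layout of the 16B H2C stream descriptor (internal mode),
     * assuming a little-endian host; field order follows the table above. */
    struct qdma_h2c_st_desc {
        uint32_t metadata;  /* [31:0]   passed to user logic on H2C-ST TUSER */
        uint16_t len;       /* [47:32]  packet length in bytes (max 64K-1)   */
        uint16_t reserved;  /* [63:48]  reserved                             */
        uint32_t addr_lo;   /* [95:64]  lower 32 bits of host source address */
        uint32_t addr_hi;   /* [127:96] upper 32 bits of host source address */
    };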

Descriptor Metadata

Similar to bypass mode, the internal mode also provides a mechanism to pass information directly
from the software to the user logic. In addition to address and length, the H2C Stream descriptor also
has a 32b metadata field. This field is not used by the QDMA for the DMA operation. Instead, it is
passed on to the user logic on the H2C AXI4-Stream tuser on every beat of the packet. Passing
metadata on the tuser is not supported for a queue in bypass mode and consequently there is no
input to provide the metadata on the QDMA H2C Stream bypass-in interface.

Zero Length Descriptor

The length field in a descriptor can be zero. In this case, the H2C Stream Engine will issue a zero
byte read request on PCIe. After the QDMA receives the completion for the request, the H2C Stream
Engine will send out one beat of data with tlast on the QDMA H2C AXI4-Stream interface. The zero
byte packet will be indicated on the interface by setting the zero_b_dma bit in the tuser. The user
logic must set both the SOP and EOP for a zero byte descriptor. If not done, an error will be flagged
by the H2C Stream Engine.

H2C Stream Status Descriptor Writeback

When feeding the descriptor information on the bypass input interface, the user logic can request the
QDMA to send a status write back to the host when it is done fetching the data from the host. The
user logic can also request that a status be issued to it when the DMA is done. These behaviors can
be controlled using the sdi and mrkr_req inputs in the bypass input interface.
The H2C writeback status register is located after the last entry of the H2C descriptor list.
✎ Note: The format of the H2C-ST status descriptor written to the descriptor ring is different from
that written into the interrupt coalesce entry.

Table: AXI4-Stream H2C Writeback Status Descriptor Structure

Bit      Bit Width  Field Name  Description

[63:48]  16         reserved    Reserved

[47:32]  16         pidx        Producer Index

[31:16]  16         cidx        Consumer Index

[15:2]   14         reserved    Reserved

[1:0]    2          error       Error
                                0x0: No Error
                                0x1: Descriptor or data error was encountered on this queue
                                0x2 and 0x3: Reserved

H2C Stream Data Aligner

The H2C engine has a data aligner that aligns the data to zero Bytes (0B) boundary before issuing it
to the user logic. This allows the start address of a descriptor to be arbitrarily aligned and still receive
the data on the H2C AXI4-Stream data bus without any holes at the beginning of the data. The user
logic can send a batch of descriptors from SOP to EOP with arbitrary address and length alignments
for each descriptor. The aligner will align and pack the data from the different descriptors and will
issue a continuous stream of data on the H2C AXI4-Stream data bus. The tlast on that interface will
be asserted when the last beat for the EOP descriptor is being issued.

Handling Descriptors With Errors

If an error is encountered while fetching a descriptor, the QDMA Descriptor Engine flags the descriptor
with error. For a queue in internal mode, the H2C Stream Engine handles the error descriptor by not
performing any PCIe or DMA activity. Instead, it waits for the error descriptor to pass through the
pipeline and forces a writeback after it is done. For a queue in bypass mode, it is the responsibility of
the user logic to not issue a batch of descriptors with an error descriptor. Instead, it must send just
one descriptor with the error input asserted on the H2C Stream bypass-in interface and set the SOP, EOP, and no_dma signals, and the sdi or mrkr_req signal, to make the H2C Stream Engine send a writeback to the Host.

Handling Errors in Data From PCIe

If the H2C Stream Engine encounters an error coming from PCIe on the data, it keeps the error sticky
across the full packet. The error is indicated to the user on the err bit on the H2C Stream Data
Output. Once the H2C Stream sends out the last beat of a packet that saw a PCIe data error, it also
sends a Writeback to the Software to inform it about the error.

C2H Stream Engine

The C2H Stream Engine writes stream packets from the user logic to host memory, into the buffers described by the descriptors that the host driver provides through the C2H descriptor queue.
The Prefetch Engine is responsible for calculating the number of descriptors needed for the DMA that is writing the packet. The buffer size is fixed on a per-queue basis. For internal and cached bypass mode, the prefetch module can fetch up to 512 descriptors for a maximum of 64 different queues at any given time.
The Prefetch Engine also offers a low latency feature (pfch_en = 1), where the engine can prefetch up to qdma_c2h_pfch_cfg.num_pfch descriptors upon receiving a packet, so that subsequent packets can avoid the PCIe latency.
The QDMA requires the software to post the full ring size so the C2H stream engine can fetch the needed number of descriptors for all received packets. If there are not enough descriptors in the descriptor ring, the QDMA will stall the packet transfer. For performance reasons, the software is required to post the PIDX as soon as possible to ensure there are always enough descriptors in the ring.
C2H stream packet data length is limited to 31 * C2H buffer size. The C2H buffer size can be programmed through the registers at addresses 0xAB0 to 0xAEC; for details, refer to the cpm5-qdma-v4-0-pf-registers.csv file.

C2H Stream Descriptor (8B)

Table: AXI4-Stream C2H Descriptor Structure

Bit Bit Width Field Name Description

[63:0] 64 addr Destination Address

C2H Prefetch Engine

The prefetch engine interacts between the descriptor fetch engine and C2H DMA write engine to pair
up the descriptor and its payload.

Table: C2H Prefetch Context Structure

Bit      Bit Width  Field Name    Description

[45]     1          valid         Context is valid

[44:29]  16         sw_crdt       Software credit. This field is written by the hardware for internal
                                  use. The software must initialize it to 0 and then treat it as
                                  read-only.

[28]     1          pfch          Queue is in prefetch. This field is written by the hardware for
                                  internal use. The software must initialize it to 0 and then treat
                                  it as read-only.

[27]     1          pfch_en       Enable prefetch

[26]     1          err           Error detected on this queue. If any errors are detected during the
                                  descriptor prefetch process, they are logged here. This is on a per
                                  queue basis.

[25:8]   18         reserved      Reserved

[7:5]    3          port_id       Port ID

[4:1]    4          buf_size_idx  Buffer size index

[0]      1          bypass        C2H bypass mode, set this bit for simple bypass mode.

C2H Stream Modes

The C2H descriptors can come from the descriptor fetch engine or the C2H bypass input interface. The descriptors from the descriptor fetch engine are always in cache mode, and the prefetch engine keeps the order of the descriptors to pair with the C2H data packets from the user. The C2H bypass input is a single interface that serves both simple mode and cache mode (simple bypass and cache bypass use the same interface). For simple mode, the user application keeps the order of the descriptors to pair with the C2H data packets. For cache mode, the prefetch engine keeps the order of the descriptors to pair with the C2H data packets from the user.
The prefetch context has a bypass bit. When it is 1'b1, the user application sends the credits for the descriptors. When it is 1'b0, the prefetch engine handles the credits for the descriptors.
The descriptor context also has a bypass bit. When it is 1'b1, the descriptor fetch engine sends the descriptors out on the C2H bypass output interface; the user application can convert them and later loop them back to the QDMA on the C2H bypass input interface. When the bypass context bit is 1'b0, the descriptor fetch engine sends the descriptors to the prefetch engine directly.
There is a 2K-entry descriptor buffer that takes in descriptors from the bypass input ports; this buffer is shared by all the queues.
On a per queue basis, three modes are supported:

Cache Internal Mode
Cache Bypass Mode
Simple Bypass Mode

The selection between Simple Bypass Mode and Cache Bypass Mode is done by setting the bypass
bits in Software Descriptor Context and C2H Prefetch Context as shown in the following table.
✎ Note: If you already have the descriptor cached on the device, there is no need to fetch one from
the host and you should follow the simple bypass mode for the C2H Stream application. In simple
bypass mode, do not provide credits to fetch the descriptor, and instead, you need to send in the
descriptor on the descriptor bypass interface.
✎ Note: AXI4-Stream C2H Simple Bypass mode and Cache Bypass mode both use same bypass in
ports (c2h_byp_in_st_csh_* ports).


Table: C2H Stream Modes

Mode                 c2h_byp_in port        desc_byp                 bypass
                                            (software descriptor     (C2H prefetch
                                            context)                 context)

Simple bypass mode   c2h_byp_in_st_csh_*    1                        1

Cache bypass mode    c2h_byp_in_st_csh_*    1                        0

Cache internal mode  N/A                    0                        0

Simple Bypass Mode


For simple bypass mode, the descriptor fetch engine sends the descriptors out on the C2H bypass
out interface. The user application converts the descriptor and loops it back to the QDMA on the
simple mode C2H bypass input interface. The user application sends the credits for the descriptors,
and it also keeps the order of the descriptors.
For simple bypass transfer to work, a prefetch tag is needed and it can be fetched from the QDMA IP.
The user application must request a prefetch tag before sending any traffic for a simple bypass queue
through the C2H ST engine. Invalid queues or non-bypass queues should not request any tags using
this method, as it might reduce performance by freezing tags that never get used.
The prefetch tag needs to be reserved upfront before any traffic can start. One prefetch tag per target
host is required. In most applications, one prefetch tag for a host is needed. In Simple Bypass mode,
the tag is not tied to any descriptor ring. For the queues that share the same prefetch tag, the data
and descriptors need to come in the same order. For Simple Bypass, the data and descriptors are
both controlled by the user, so they need to guarantee the order is maintained.
For example, when the data stream has packets in the order Q0, Q1, Q2, you cannot send the descriptors in the order Q1, Q2, Q0 on the descriptor input. The order must be maintained.
The user application writes to the MDMA_C2H_PFCH_BYP_QID (0x1408) register with the qid for a
simple bypass queue, then reads from MDMA_C2H_PFCH_BYP_TAG (0x140C) register to retrieve
the corresponding prefetch tag. This tag must be driven with all bypass_in descriptors for as long as
the current qid is valid. If a current qid is invalidated, a new prefetch tag must be requested with a
valid qid.
The prefetched tag must be assigned to the input port c2h_byp_in_st_csh_pfch_tag[6:0] for all transfers. The prefetch tag points to the CAM that stores the active queues in the prefetch engine. Also, the qid that was used to fetch the prefetch tag must be used as the qid for all simple bypass packets. Assign this qid to dma<n>_s_axis_c2h_ctrl_qid.
The steps to fetch the prefetch tag are as follows:

1. Software instruction:
a. Initialize a queue (qid).
b. Write to MDMA_C2H_PFCH_BYP_QID 0x1408 with valid qid.
c. Read MDMA_C2H_PFCH_BYP_TAG 0x140C to obtain the prefetch tag.

d. The prefetch tag and the qid that was used to fetch the tag should be used for all simple
bypass packets. This information needs to be communicated to the user side.
2. User side:
a. Assign the qid used to fetch the tag to dma<n>_s_axis_c2h_ctrl_qid.
b. Assign the actual qid of the packet transfer to dma<n>_s_axis_c2h_cmpt_ctrl_qid.
c. Assign the prefetch tag value to c2h_byp_in_st_csh_pfch_tag.
d. Assign the actual qid of the packet transfer to c2h_byp_in_st_csh_qid.
✎ Note: The c2h_byp_in_st_csh_pfch_tag[6:0] port can have the same prefetch_tag for as
long as the original qid is valid.
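A minimal C sketch of this tag-fetch sequence is shown below. The register offsets come from the steps above; the MMIO accessor functions and the base-address argument are hypothetical placeholders.

    #include <stdint.h>

    #define MDMA_C2H_PFCH_BYP_QID  0x1408u  /* write the simple bypass qid here     */
    #define MDMA_C2H_PFCH_BYP_TAG  0x140Cu  /* then read the prefetch tag from here */

    /* Hypothetical MMIO accessors; replace with the platform's register API. */
    extern void     reg_write32(uint64_t bar, uint32_t off, uint32_t val);
    extern uint32_t reg_read32(uint64_t bar, uint32_t off);

    /* Returns the prefetch tag to drive on c2h_byp_in_st_csh_pfch_tag[6:0]
     * for all simple bypass transfers that use this qid. */
    static uint32_t qdma_fetch_pfch_tag(uint64_t dma_bar, uint32_t qid)
    {
        reg_write32(dma_bar, MDMA_C2H_PFCH_BYP_QID, qid);
        return reg_read32(dma_bar, MDMA_C2H_PFCH_BYP_TAG) & 0x7Fu; /* 7-bit tag */
    }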

The simple bypass flow shown below does not include the fetch of the "prefetch_tag".

Figure: C2H Simple Bypass Mode Flow

✎ Note: No sequence is required between descriptor bypass in, data payload and completion
packets.
If you already have descriptors, there is no need to update the pointers or provide credits. Instead,
send the descriptors in the descriptor bypass interface, and send the data and Completion (CMPT)
packets.
When simple bypass mode is selected, the queue that is used to fetch the prefetch tag acts as a management queue, and it controls the buffer sizes.
The buffer size that is set for this management queue is used for all the queues, irrespective of the buffer size set for the other queues. In simple bypass mode, you provide the descriptors and data packets, so the buffer size should be set properly to accommodate all packet sizes in all queues. You must make sure to send adequate descriptors for each data packet. Number of descriptors = packet size / management queue buffer size, and this applies to all queues.
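As a worked example of the descriptor-count rule above, the following C snippet computes how many descriptors to supply for one packet; rounding up for packets that are not an exact multiple of the buffer size is an assumption.

    #include <stdint.h>

    /* Sketch: number of descriptors to supply for one C2H packet in simple
     * bypass mode, based on the management queue's buffer size (assumed
     * non-zero). Rounding up for partial buffers is assumed. */
    static inline uint32_t c2h_desc_count(uint32_t packet_len_bytes,
                                          uint32_t mgmt_q_buf_size)
    {
        return (packet_len_bytes + mgmt_q_buf_size - 1u) / mgmt_q_buf_size;
    }

    /* Example: a 10,000-byte packet with a 4 KB management queue buffer size
     * requires c2h_desc_count(10000, 4096) = 3 descriptors. */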

Cache Bypass Mode


For Cache Bypass mode, the descriptor fetch engine sends the descriptors out on the C2H bypass
output interface. The user application converts the descriptor and loops it back to the QDMA on the
cache mode C2H bypass input interface. The prefetch engine sends the credits for the descriptors,
and it keeps the order of the descriptors.
For Cache Internal mode, the descriptor fetch engine sends the descriptors to the prefetch engine.
The prefetch engine sends out the credits for the descriptors and keeps the order of the descriptors.
In this case, the descriptors do not go out on the C2H bypass output and do not come back on the
C2H bypass input interfaces. In Cache Internal mode prefetch tag is maintained by the IP internally.
In Cache Bypass or Cache Internal mode, prefetch mode can be turned on, which prefetches the descriptors and reduces transfer latency significantly. When prefetch mode is enabled, the user application cannot send credits on the QDMA Descriptor Credit input ports; credits for all queues are maintained by the prefetch engine.
In cache bypass mode, the prefetch tag is maintained by the IP internally. The signal c2h_byp_out_pfch_tag[6:0] should be looped back as the input c2h_byp_in_st_csh_pfch_tag[6:0]. The prefetch tag points to the CAM that stores the active queues in the prefetch engine.

Figure: C2H Cache Bypass Mode Flow

✎ Note: No sequence is required between payload and completion packets.


C2H Stream Packet Type

The following are some of the different C2H stream packets.

Regular Packet
The regular C2H packet has both the data packet and Completion (CMPT) packet. They are a one-to-
one match.
The regular C2H data packet can be multiple beats.

dma<n>_s_axis_c2h_ctrl_qid = C2H descriptor queue ID.
dma<n>_s_axis_c2h_ctrl_len = length of the packet.
dma<n>_s_axis_c2h_mty = empty bytes; should be set in the last beat.
dma<n>_s_axis_c2h_ctrl_has_cmpt = 1'b1. This data packet has a corresponding CMPT packet.

The regular C2H CMPT packet is one beat.

dma<n>_s_axis_c2h_cmpt_ctrl_qid = Completion queue ID of the packet. This can be different from the C2H descriptor QID.
dma<n>_s_axis_c2h_cmpt_ctrl_cmpt_type = HAS_PLD. This completion packet has a corresponding data packet.
dma<n>_s_axis_c2h_cmpt_ctrl_wait_pld_pkt_id = This completion packet has to wait for the data packet with this ID to be sent before the CMPT packet can be sent.

When the user application sends data packets, it must count the packet ID for each packet. The first data packet has a packet ID of 1, and the ID increments for each data packet.
For the regular C2H packet, the data packet and the completion packet are a one-to-one match. Therefore, the number of data packets with dma<n>_s_axis_c2h_ctrl_has_cmpt as 1'b1 should be equal to the number of CMPT packets with dma<n>_s_axis_c2h_cmpt_ctrl_cmpt_type as HAS_PLD.
The QDMA has a shallow completion input FIFO of depth 2. For better performance, add a FIFO for the completion input as shown in the diagram below. The depth and width of the FIFO depend on the use case: the width depends on the largest CMPT size for the application, and the depth depends on performance needs. For best performance with a 64-byte CMPT, a depth of 512 is recommended.
When the user application sends the data payload, it counts every packet. The first packet starts with a pkt_pld_id of 1, the second packet has a pkt_pld_id of 2, and so on. It is a 16-bit counter; once the count reaches 16'hffff it wraps around to 0 and continues counting.
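A brief C sketch of the payload packet ID counter described above. The wrap-to-zero behavior follows the text; this is illustrative user-side bookkeeping, not a required implementation.

    #include <stdint.h>

    /* Illustrative 16-bit payload packet ID counter. The first data packet is
     * tagged 1; after 0xFFFF the counter wraps to 0 and keeps counting. */
    static uint16_t pkt_pld_id = 0;

    static inline uint16_t next_pkt_pld_id(void)
    {
        pkt_pld_id = (uint16_t)(pkt_pld_id + 1u);  /* wraps 0xFFFF -> 0 */
        return pkt_pld_id;
    }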
The user application defines the CMPT type.

If the dma<n>_s_axis_c2h_cmpt_ctrl_cmpt_type is HAS_PLD, the CMPT has a
corresponding data payload. The user application must place pkt_pld_id of that packet in the
dma<n>_s_axis_c2h_cmpt_ctrl_wait_pld_pkt_id field. The DMA only sends out this CMPT
after it sends out the corresponding data payload packet.
If the dma<n>_s_axis_c2h_cmpt_ctrl_cmpt_type is NO_PLD_NO_WAIT, the CMPT does not
have any data payload, and it does not need to wait for payload. Then the DMA sends out this
CMPT.
If the dma<n>_s_axis_c2h_cmpt_ctrl_cmpt_type is NO_PLD_BUT_WAIT, the CMPT does not
have a corresponding data payload packet. The CMPT must wait for a particular data payload
packet before the CMPT is sent out. Therefore, the user application must place the pld_pkt_id
of that particular data payload into the dma<n>_s_axis_c2h_cmpt_ctrl_wait_pld_pkt_id
field. The DMA does not send out the CMPT until the data payload with that pld_pkt_id is sent
out.

Figure: CMPT Input FIFO

Immediate Data Packet


The user application can have a packet that only writes to the Completion Ring without having a
corresponding data packet transfer to the host. This type of packet is called immediate data packet.
For the immediate data packet, the QDMA does not send the data payload, but it writes to the CMPT
Queue. The immediate packet does not consume a descriptor.
For the immediate data packet, the user application only sends the CMPT packet to the DMA, and it
does not send the data packet.
The following is the setting of the immediate completion packet. There is no corresponding data
packet.
In some applications, the immediate completion packet does not need to wait for any data packet. But
in some applications, it might still need to wait for the data payload packet. When the completion type
is NO_PLD_NO_WAIT, the completion packet can be sent out without waiting for any data packet. When
the completion type is NO_PLD_BUT_WAIT, the completion packet must specify the data packet ID that
it needs to wait for.

dma<n>_s_axis_c2h_cmpt_user_cmpt_type = NO_PLD_NO_WAIT or NO_PLD_BUT_WAIT.


dma<n>_s_axis_c2h_cmpt_ctrl_wait_pld_pkt_id = Do not increment packet count.


Zero Length Packet


The length of the data packet can be zero. On the input, the user needs to send one beat of data. The
zero length packet consumes the descriptor. The QDMA sends out 1DW payload data.
The following is the setting of the zero length packet:

1 beat of data
dma<n>_s_axis_c2h_ctrl_len = 0
dma<n>_s_axis_c2h_mty = 0

✎ Note: Zero Byte packets are not supported in Internal mode and Cache bypass mode. The QDMA might hang if zero byte packets are dropped because no descriptor is available. Zero Byte packets are supported in Simple bypass mode.

Disable Completion Packet


The user application can disable the completion for a specific packet. The QDMA provides direct
memory access (DMA) to the payload, but does not write to the C2H Completion Ring. The user
application only sends the data packet to the DMA, and does not send the CMPT packet.
The following is the setting of the disable completion packet:

dma<n>_s_axis_c2h_ctrl_has_cmpt = 1'b0

Handling Descriptors With Errors

If an error is encountered while fetching a descriptor (in pre-fetch or regular mode), the QDMA
Descriptor Engine flags the descriptor with error. For a queue in internal mode, the C2H Stream
Engine handles the error descriptor by not performing any PCIe or DMA activity. Instead, it waits for
the error descriptor to pass through the pipeline and forces a writeback after it is done. For a queue in
bypass mode, it is the responsibility of the user logic to not issue a batch of descriptors with an error
descriptor. Instead, it must send just one descriptor with error input asserted on the C2H Stream
bypass-in interface and set the SOP, EOP, no_dma signal, and sdi or mrkr_req signal to make the
C2H Stream Engine send a writeback to Host.

Completion Engine

The Completion Engine writes the C2H AXI4-Stream Completion (CMPT) in the CMPT queue. The
user application sends a CMPT packet and other information, such as, but not limited to, CMPT QID,
and CMPT_TYPE to the QDMA. The QDMA uses this information to process the CMPT packet. The
QDMA can be instructed to write the CMPT packet unchanged in the CMPT queue. Alternatively, the
user application can instruct the QDMA to insert certain fields, like error and color, in the CMPT packet
before writing it into the CMPT queue. Additionally, using the CMPT interface signals, the user
application instructs the QDMA to order the writing of the CMPT packet in a specific way, relative to
traffic on the C2H data input. Although not a requirement, a CMPT is typically used with a C2H queue.
In such a case, the CMPT is used to inform the SW that a certain number of C2H descriptors are used

up by the DMA of C2H data. This allows the SW to reclaim the C2H descriptors. A CMPT can also be
used without a corresponding C2H DMA operation, in which case, it is known as Immediate Data.
The user-defined portion of the CMPT packet typically needs to specify the length of the data packet
transferred and whether or not descriptors were consumed as a result of the data packet transfer.
Immediate and marker type packets do not consume any descriptors. The exact contents of the user-
defined data are up to the user to determine.
✎ Note: The maximum buffer size register 0xB50 bits[31:26] is programmed to 0 by default. This value might result in an overflow depending on the simulator or the synthesis tool used. To avoid overflow, set 0xB50 bits[31:26] to the maximum value of 63.
✎ Note: The calculation of the completion ring size must account for completion data, immediate
data, and marker packets. Therefore, you must assign the completion ring size accurately.
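A hedged C sketch of the read-modify-write suggested in the first note above; the register name macro and the MMIO accessors are hypothetical placeholders.

    #include <stdint.h>

    #define QDMA_C2H_MAX_BUF_SZ_REG  0xB50u  /* hypothetical name for the register in the note */

    extern void     reg_write32(uint64_t bar, uint32_t off, uint32_t val);
    extern uint32_t reg_read32(uint64_t bar, uint32_t off);

    /* Set bits [31:26] of register 0xB50 to the maximum value of 63, as
     * recommended in the note to avoid overflow. */
    static void set_max_buf_size_field(uint64_t dma_bar)
    {
        uint32_t v = reg_read32(dma_bar, QDMA_C2H_MAX_BUF_SZ_REG);
        v = (v & ~(0x3Fu << 26)) | (63u << 26);
        reg_write32(dma_bar, QDMA_C2H_MAX_BUF_SZ_REG, v);
    }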

Completion Context Structure

The completion context is used by the Completion Engine.

Table: Completion Context Structure Definition

Bit        Bit Width  Field Name      Description

[256:183]  17                         Reserved. Initialize to 0.

[182:180]  3          port_id         Port ID. The Completion Engine checks the port_id of events
                                      received at its input against the port_id configured here. If the
                                      check fails, the input is dropped, and an error is logged in the
                                      C2H_ERR_STAT register. The following are checked for port_id:
                                      - All events on the dma<n>_s_axis_c2h_cmpt interface. These include
                                        CMPTs, immediate data, and markers.
                                      - CMPT CIDX pointer updates (checked only when the update is coming
                                        from the AXI side).

[179]      1                          Reserved. Initialize to 0's.

[178:175]  4          baddr4_low      Since the minimum alignment supported is 64B in this case, this
                                      field must be 0.

[174:147]  28                         Reserved. Initialize to 0's.

[146]      1          dir_c2h         DMA direction is C2H. The CMPT engine can be used to manage the
                                      completion/used ring of a C2H as well as an H2C queue.
                                      0x0: DMA direction is H2C
                                      0x1: DMA direction is C2H

[145]      1                          Reserved. Initialize to 0.

[144]      1          dis_int_on_vf   Disable interrupt with VF.

[143]      1          int_aggr        Interrupt Aggregation. Set to configure the QID in interrupt
                                      aggregation mode.

[142:132]  11         vec             Interrupt Vector Number.

[131]      1          at              Address Translation. This bit is used to determine whether the
                                      queue addresses are translated or untranslated. This information is
                                      sent to the PCIe on CMPT and Status writes.
                                      0: Address is untranslated
                                      1: Address is translated

[130]      1          ovf_chk_dis     Completion Ring Overflow Check Disable. If set, the CMPT Engine
                                      does not check whether writing a completion entry in the Completion
                                      Ring will overflow the Ring or not. The result is that QDMA
                                      invariably sends out Completions without first checking if it is
                                      going to overflow the Completion Ring, and does not take any of the
                                      actions that it normally takes when it encounters a Completion Ring
                                      overflow scenario. It is up to the software and user logic to
                                      negotiate and ensure that they do not cause a Completion Ring
                                      overflow.

[129]      1          full_upd        Full Update. If reset, all fields other than the CIDX of a
                                      Completion-CIDX-update are ignored. Only the CIDX field will be
                                      copied from the update to the context.
                                      If set, the Completion CIDX update can update the following fields
                                      in this context: timer_ix, counter_ix, trig_mode, en_int,
                                      en_stat_desc.

[128]      1          timer_running   If set, it indicates that a timer is running on this queue. This
                                      timer is for the purpose of CMPT interrupt moderation. Ideally, the
                                      software must ensure that there is no running timer on this QID
                                      before shutting the queue down. This is a field used internally by
                                      the hardware. The software must initialize it to 0 and then treat
                                      it as read-only.

[127]      1          user_trig_pend  If set, it indicates that a user logic initiated interrupt is
                                      pending to be generated. The user logic can request an interrupt
                                      through the dma<n>_s_axis_c2h_cmpt_ctrl_user_trig signal. This bit
                                      is set when the user logic requests an interrupt while another one
                                      is already pending on this QID. When the next Completion CIDX
                                      update is received by QDMA, this pending bit might or might not
                                      generate an interrupt depending on whether or not there are entries
                                      in the Completion ring waiting to be read. This is a field used
                                      internally by the hardware. The software must initialize it to 0
                                      and then treat it as read-only.

[126:125]  2          err             Indicates that the Completion Context is in error. This is a field
                                      written by the hardware. The software must initialize it to 0 and
                                      then treat it as read-only. The following errors are indicated
                                      here:
                                      0: No error.
                                      1: A bad CIDX update from software was detected.
                                      2: A descriptor error was detected.
                                      3: A Completion packet was sent by the user logic when the
                                      Completion Ring was already full.

[124]      1          valid           Context is valid.

[123:108]  16         cidx            Current value of the hardware copy of the Completion Ring Consumer
                                      Index.

[107:92]   16         pidx            Completion Ring Producer Index. This is a field written by the
                                      hardware. The software must initialize it to 0 and then treat it as
                                      read-only.

[91:90]    2          desc_size       Completion Entry Size:
                                      0: 8B
                                      1: 16B
                                      2: 32B
                                      3: 64B

[89:32]    58         baddr           64B aligned base address of Completion ring – bit [63:6].

[31:28]    4          qsize_idx       Completion ring size index. This index selects one of 16 registers
                                      (offset 0x204 to 0x240) which have different ring sizes.

[27]       1          color           Color bit to be used on Completion.

[26:25]    2          int_st          Interrupt State:
                                      0: ISR
                                      1: TRIG
                                      This is a field used internally by the hardware. The software must
                                      initialize it to 0 and then treat it as read-only. When out of
                                      reset, the hardware initializes into ISR state, and is not
                                      sensitive to trigger events. If the software needs interrupts or
                                      status writes, it must send an initial Completion CIDX update. This
                                      makes the hardware move into TRIG state and as a result it becomes
                                      sensitive to any trigger conditions.

[24:21]    4          timer_idx       Index to timer register for TIMER based trigger modes.

[20:17]    4          counter_idx     Reserved

[16:13]    4                          Reserved. Initialize to 0.

[12:5]     8          fnc_id          Function ID

[4:2]      3          trig_mode       Interrupt and Completion Status Write Trigger Mode:
                                      0x0: Disabled
                                      0x1: Every
                                      0x2: reserved
                                      0x3: User
                                      0x4: User_Timer
                                      0x5: reserved

[1]        1          en_int          Enable Completion interrupts.

[0]        1          en_stat_desc    Enable Completion Status writes.

Completion Status Structure

The Completion Status is located at the last location of the Completion ring, that is, at Completion Ring Base Address + (completion entry size in bytes (8, 16, or 32) × (Completion Ring Size – 1)).
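The address calculation above can be expressed as the following C helper; the function name and arguments are illustrative only.

    #include <stdint.h>

    /* Sketch: address of the Completion Status entry for a ring, per the
     * formula above. entry_size_bytes is the configured completion entry size
     * and ring_size is the number of ring entries (status occupies the last). */
    static inline uint64_t cmpt_status_addr(uint64_t ring_base,
                                            uint32_t entry_size_bytes,
                                            uint32_t ring_size)
    {
        return ring_base + (uint64_t)entry_size_bytes * (ring_size - 1u);
    }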
To make the QDMA write the Completion Status to the Completion ring, Completion Status must be enabled in the Completion context. In addition to affecting interrupts, the trigger mode defined in the Completion context also moderates the writing of Completion Statuses. Subject to Interrupt/Status moderation, a Completion Status can be written when any of the following happens:

1. A CMPT packet is written to the Completion ring.


2. A CMPT-CIDX update from the SW is received, and indicates that more Completion entries are
waiting to be read.
3. The timer associated with the respective CMPT QID expires and is programmed in a timer-
based trigger mode.

Table: AXI4-Stream Completion Status Structure

Bit      Bit Width  Field Name  Description

[63:37]  27                     Reserved

[36:35]  2          error       Error.
                                0x0: No error
                                0x1: Bad CIDX update received
                                0x2: Descriptor error
                                0x3: CMPT ring overflow error

[34:33]  2          int_state   Interrupt State.
                                0: ISR
                                1: TRIG

[32]     1          color       Color status bit

[31:16]  16         cidx        Consumer Index (RO)

[15:0]   16         pidx        Producer Index

Completion Entry Structure

The size of a Completion (CMPT) Ring entry is 512-bits. This includes user defined data, an optional
error bit, and an optional color bit. The user defined data has four size options: 8B, 16B, 32B and 64B.
The bit locations of the optional error and color bits in the CMPT entry are configurable individually.
This is done by specifying the locations of these fields using the AMD Vivado™ IDE IP customization
options while compiling the QDMA. There are seven color bit location options and eight error bit
location options. The location is specified as an offset from the LSB bit of the Completion entry.
When the user application drives a Completion packet into the QDMA, it provides a
dma<n>_s_axis_cmpt_ctrl_col_idx[2:0] value and a
dma<n>_s_axis_cmpt_ctrl_err_idx[2:0] value at the interface. These indices are used by the
QDMA to use the correct locations of the color and error bits. For example, if
dma<n>_s_axis_cmpt_ctrl_col_idx[2:0] = 0 and dma<n>_s_axis_cmpt_ctrl_err_idx[2:0] =
1, then the QDMA uses the C2H Stream Completion Color bits position option 0 for color location, and
C2H Stream Completion Error bits position option 1 for error location. An index of seven for color or
error signals implies that the DMA will not update the corresponding color or error bits when
Completion entry is updated (those fields are ignored). The C2H Stream Completions bits options are
set in the PCIe DMA Tab in the AMD Vivado™ IDE.
The error and color bit location values that are used at compile time are available for the software to read from the MMIO registers. There are seven registers for this purpose, QDMA_C2H_CMPT_FORMAT_0 (0xBC4) to QDMA_C2H_CMPT_FORMAT_6 (0xBDC). Each of these registers holds one color and one error bit location.

C2H Stream Completions bits option 0 for color bit location and option 0 for error bit location are
available through the QDMA_C2H_CMPT_FORMAT_0 register.
C2H Stream Completions bits option 1 for color bit location and option 1 for error bit location are
available through the QDMA_C2H_CMPT_FORMAT_1 register.
And so on.

Based on the CMPT data size selection (8, 16, 32 or 64 Bytes), the data in
s_axis_c2h_cmpt_tdata[511:0] signal is registered in the completion entry as shown in the
following table.

Table: Completion Entry Structure

Name                                     Size (Bits)  Index

User-defined bits for 64 Bytes settings  510-512      Depending on whether there are color and
                                                      error bits present.

User-defined bits for 32 Bytes settings  254-256      Depending on whether there are color and
                                                      error bits present.

User-defined bits for 16 Bytes settings  126-128      Depending on whether there are color and
                                                      error bits present.

User-defined bits for 8 Bytes settings   62-64        Depending on whether there are color and
                                                      error bits present.

Err                                                   The Error bit location is defined by registers
                                                      QDMA_C2H_CMPT_FORMAT_0 (0xBC4) to
                                                      QDMA_C2H_CMPT_FORMAT_6 (0xBDC). These
                                                      registers show the error bit position that is
                                                      user defined during IP generation. You can
                                                      index into these registers based on the input
                                                      CMPT port
                                                      dma<n>_s_axis_c2h_cmpt_ctrl_err_idx[2:0].
                                                      You can choose not to include the err bit
                                                      (index value 7). In such a case, user-defined
                                                      data takes up that space.

Color                                                 The Color bit location is defined by registers
                                                      QDMA_C2H_CMPT_FORMAT_0 (0xBC4) to
                                                      QDMA_C2H_CMPT_FORMAT_6 (0xBDC). These
                                                      registers show the color bit position that is
                                                      user defined during IP generation. You can
                                                      index into these registers based on the input
                                                      CMPT port
                                                      dma<n>_s_axis_c2h_cmpt_ctrl_col_idx[2:0].
                                                      If you do not include a color bit (index value
                                                      7), the user-defined data takes up that space.

Completion Input Packet

The user application sends the CMPT packet to the QDMA.


The CMPT packet and data packet do not require a one-to-one match. For example, the immediate
data packet only has the CMPT packet, and does not have the data packet. The disable completion
packet only has the data packet and does not have the CMPT packet.
Each CMPT packet has a CMPT ID. It is the ID for the associated CMPT queue. Each CMPT queue
has a CMPT Context. The driver sets up the mapping of the C2H descriptor queue to the CMPT

queue. There also can be a CMPT queue that is not associated to a C2H queue.
The following is the CMPT packet from the user application.

Table: CMPT Input Packet

Name                                  Size      Index

Data (s_axis_c2h_cmpt_tdata[511:0])   512 bits  [511:0]

The CMPT packet has four size options (8, 16, 32, or 64 Bytes). It is presented as one beat of 512-bit data.

Completion Status/Interrupt Moderation

The QDMA provides a means to moderate the Completion interrupts and Completion Status writes on
a per queue basis. The software can select one out of five modes for each queue. The selected mode
for a queue is stored in the QDMA in the Completion ring context for that queue. After a mode has
been selected for a queue, the driver can always select another mode when it sends the completion
ring CIDX update to the QDMA.
The Completion interrupt moderation is handled by the Completion engine. The Completion engine
stores the Completion ring contexts of all the queues. It is possible to individually enable or disable
the sending of interrupts and Completion Statuses for every queue and this information is present in
the Completion ring context. It is worth mentioning that the modes being described here moderate not
only interrupts but also Completion Status writes. Also, since interrupts and Completion Status writes
can be individually enabled/disabled for each queue, these modes work only if the
interrupt/Completion Status is enabled in the Completion context for that queue.
The QDMA keeps only one interrupt outstanding per queue. This policy is enforced by QDMA even if
all other conditions to send an interrupt are met for the mode. The way the QDMA considers an
interrupt serviced is by receiving a CIDX update for that queue from the driver.
The basic policy followed in all the interrupt moderation modes is that when there is no interrupt
outstanding for a queue, the QDMA keeps monitoring the trigger conditions to be met for that mode.
Once the conditions are met, an interrupt is sent out. While the QDMA subsystem is waiting for the
interrupt to be served, it remains sensitive to interrupt conditions being met and remembers them.
When the CIDX update is received, the QDMA subsystem evaluates whether the conditions are still
being met. If they are still being met, another interrupt is sent out. If they are not met, no interrupt is
sent out and the QDMA resumes monitoring for the conditions to be met again.
The interrupt moderation modes that the QDMA subsystem provides are not necessarily precise.
Thus, if the user application sends two CMPT packets with an indication to send an interrupt, it is not
guaranteed that two interrupts are generated. The main reason for this behavior is that when the driver
is interrupted to read the Completion ring, it is under no obligation to read exactly up to the
Completion for which the interrupt was generated. Thus, the driver might not read up to the
interrupting Completion, or it might even read beyond the interrupting Completion descriptor if there
are valid descriptors to be read there. This behavior requires the QDMA to re-evaluate the trigger
conditions every time it receives the CIDX update from the driver.
The detailed description of each mode is given below:

TRIGGER_EVERY
This mode is the most aggressive in terms of interruption frequency. The idea behind this mode
is to send an interrupt whenever the completion engine determines that an unread completion
descriptor is present in the Completion ring.

TRIGGER_USER
The QDMA provides a way to send a CMPT packet to the subsystem with an indication to send
out an interrupt when the subsystem is done sending the packet to the host. This allows the user
application to perform interrupt moderation when the TRIGGER_USER mode is set.

TRIGGER_USER_TIMER
In this mode, the QDMA is sensitive to either of two triggers. One of these triggers is sent by the
user along with the CMPT packet. The other trigger is the expiration of the timer that is
associated with the CMPT queue. The period of the timer is driver programmable on a per-
queue basis. The QDMA evaluates whether or not to send an interrupt when either of these
triggers is detected. As explained in the preceding sections, other conditions must be satisfied in
addition to the triggers for an interrupt to be sent. For more information, see Completion Timer.

TRIGGER_DIS
In this mode, the QDMA does not send Completion interrupts in spite of them being enabled for
a given queue. The only way that the driver can read the completion ring in this case is when it
regularly polls the ring. The driver must make use of the color bit feature provided in the
Completion ring when this mode is set, because this mode also disables the sending of any
Completion Status descriptors to the Completion ring.

When a queue is programmed in TRIGGER_USER_TIMER_COUNT mode, the software can choose
to not read all the Completion entries available in the Completion ring as indicated by an interrupt (or
a Completion Status write). In such a case, the software can give a Completion CIDX update for the
partial read. This works because the QDMA restarts the timer upon reception of the CIDX update and,
once the timer expires, another interrupt is generated. This process repeats until all the completion
entries are read.
However, in the TRIGGER_EVERY, TRIGGER_USER and TRIGGER_USER_COUNT modes, an
interrupt is sent, if at all, as a result of a Completion packet being received by the QDMA from the
user logic. For every request by the user logic to send an interrupt, the QDMA sends one and only
one interrupt. Thus in this case, if the software does not read all the Completion entries available to
be read and the user logic does not send any more Completions requesting interrupts, the QDMA
does not generate any more interrupts. This results in the residual Completions sitting in the
Completion ring indefinitely. To avoid this from happening, when in TRIGGER_EVERY,
TRIGGER_USER and TRIGGER_USER_COUNT mode, the software must read all the Completion
entries in the Completion ring as indicated by an interrupt (or a Completion Status write).
The following are the flowcharts of different modes. These flowcharts are from the point of view of the
Completion Engine. The Completion packets come in from the user logic and are written to the
Completion Ring. The software (SW) update refers to the Completion Ring CIDX update sent from
software to hardware.

Figure: Flowchart for EVERY Mode


Figure: Flowchart for USER Mode

Figure: Flowchart for USER_COUNT Mode


Figure: Flowchart for USER_TIMER Mode


Figure: Flowchart for USER_TIMER_COUNT Mode


Completion Timer

The Completion Timer engine supports the timer trigger mode in the Completion context. It supports
2048 queues, and each queue has its own timer. When the timer expires, a timer expire signal is sent
to the Completion module. If multiple timers expire at the same time, they are sent out in a round
robin manner.

Reference Timer
The reference timer is based on the timer tick. The register QDMA_C2H_INT (0xB0C) defines the
value of a timer tick. The 16 registers QDMA_C2H_TIMER_CNT (0xA00-0xA3C) hold the timer counts
based on the timer tick. The timer_idx in the Completion context is the index to the 16
QDMA_C2H_TIMER_CNT registers. Each queue can choose its own timer_idx.

Handling Exception Events

C2H Completion On Invalid Queue


When QDMA receives a Completion on a queue which has an invalid context as indicated by the Valid
bit in the C2H CMPT Context, the Completion is silently dropped.


C2H Completion On A Full Ring


The maximum number of Completion entries in the Completion Ring is 2 less than the total number of
entries in the Completion Ring. The C2H Completion Context has PIDX and CIDX in it. This allows the
QDMA to calculate the number of Completions in the Completion Ring. When the QDMA receives a
Completion on a queue that is full, QDMA takes the following actions:

Invalidates the C2H Completion Context for that queue.


Marks the C2H Completion Context with error.
Drops the Completion.
If enabled, sends a Status Descriptor marked with error.
If enabled and not outstanding, sends an Interrupt.
Sends a Marker Response with error.
Logs the error in the C2H Error Status Register.

C2H Completion With Descriptor Error


When the QDMA C2H Engine encounters a Descriptor Error, the following actions are taken in the
context of the C2H Completion Engine:

Invalidates the C2H Completion Context for that queue.


Marks the C2H Completion Context with error.
Sends the Completion out to the Completion Ring. It is marked with an error.
If enabled and not outstanding, sends a Status Descriptor marked with error.
If enabled and not outstanding, sends an Interrupt. Note that the Completion Engine can only
send an interrupt and/or status descriptor if not outstanding. One implication of this is that if the
interrupt happens to be outstanding when the descriptor error is encountered, a queue interrupt
will not be sent to the software. Despite that, the error is logged and an error interrupt is still sent,
if not masked by the software
Sends a Marker Response with error.

C2H Completion With Invalid CIDX


The C2H Completion Engine has logic to detect that the CIDX value in the CIDX update points to an
empty location in the Completion Ring. When it detects such an error, the C2H Completion Engine:

Invalidates the Completion Context.


Marks the Completion Context with error.
Logs an error in the C2H error status register.

Port ID Mismatch
The CMPT context specifies the port over which CMPTs are expected for that CMPT queue. If the
port_id in the incoming CMPT is not the same as the port_id in the CMPT context, the CMPT

Engine treats the incoming CMPT as a mis-directed CMPT and drops it. It also logs an error. Note that
the CMPT queue is not invalidated when a port_id mismatch occurs.

Bridge

The Bridge core is an interface between the AXI4 and the PCI Express integrated block. It contains
the memory mapped AXI4 to AXI4-Stream Bridge, and the AXI4-Stream Enhanced Interface Block for
PCIe. The memory mapped AXI4 to AXI4-Stream Bridge contains a register block and two functional
half bridges, referred to as the Slave Bridge and Master Bridge.

The slave bridge connects to the AXI4 Interconnect as a slave device to handle any issued AXI4
master read or write requests.
The master bridge connects to the AXI4 Interconnect as a master to process the PCIe generated
read or write TLPs.
The register block contains registers used in the Bridge core for dynamically mapping the AXI4
memory mapped (MM) address range provided using the AXIBAR parameters to an address for
PCIe range.

The core uses a set of interrupts to detect and flag error conditions.
Related Information
Bridge Register Space

Slave Bridge

The slave bridge provides termination of memory-mapped AXI4 transactions from an AXI4 master
device (such as a processor). The slave bridge provides a way to translate addresses that are
mapped within the AXI4 memory mapped address domain to the domain addresses for PCIe. Write
transactions to the Slave Bridge are converted into one or more MemWr TLPs, depending on the
configured Max Payload Size setting, which are passed to the integrated block for PCI Express. When
a remote AXI master initiates a read transaction to the slave bridge, the read address and qualifiers
are captured and a MemRd request TLP is passed to the core and a completion timeout timer is
started. Completions received through the core are correlated with pending read requests and read
data is returned to the AXI4 master. The slave bridge can support up to 32 AXI4 write requests, and
32 AXI4 read requests.
CPM does not do any SMID checks for slave AXI4 transfers. Any value is accepted.
✎ Note: If slave reads and writes are both valid, the IP prioritizes reads over writes. It is recommended
to provide proper arbitration (leave some gaps between reads so writes can pass through).

BDF Table

Address translation for AXI addresses is done based on BDF table programming (0x2420 to 0x2434).
These BDF table entries can be programmed through the NoC AXI Slave interface. There are three
regions that you can use for slave data transfers. Each region can be further divided into many
windows for different address translations. These regions and the number of windows should be
configured in the IP wizard configuration. Each entry in the BDF table programming represents one
window. If you need two windows, two entries need to be programmed, and so on.

There are some restrictions on programming the BDF table:

1. All PCIe slave bridge data transfers must be quiesced before programming the BDF table.
2. There are six registers for each BDF table entry. All six registers must be programmed to make a
valid entry. Even if some registers hold 0s, you need to program 0s in those registers.
3. All six registers must be programmed in the following order for an entry to be valid:
a. 0x2420
b. 0x2424
c. 0x2428
d. 0x242C
e. 0x2430
f. 0x2434

BDF table entry start address = 0x2420 + (0x20 * i), where i = table entry number.
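The entry stride and register programming order above can be captured in a small helper. The following
is a minimal C sketch, assuming a hypothetical reg_write32() routine that performs a 32-bit write to the
bridge register space at the given offset; it is illustrative only, not a definitive driver implementation.

    #include <stdint.h>

    /* Hypothetical register access routine supplied by your platform code. */
    extern void reg_write32(uint32_t offset, uint32_t value);

    #define BDF_TABLE_BASE   0x2420u
    #define BDF_ENTRY_STRIDE 0x20u   /* entry i starts at 0x2420 + (0x20 * i) */

    /* Program one BDF table entry: all six registers, in the required order.
     * vals[0..5] correspond to offsets +0x00 through +0x14 (address translation
     * low/high, PASID, function number, control/window size, reserved).
     * Registers that carry no information must still be written as 0. */
    static void bdf_program_entry(uint32_t entry, const uint32_t vals[6])
    {
        uint32_t base = BDF_TABLE_BASE + (BDF_ENTRY_STRIDE * entry);
        for (int i = 0; i < 6; i++)
            reg_write32(base + (uint32_t)(4 * i), vals[i]);
    }

With the values from Example 1 below, the call would be
bdf_program_entry(0, (const uint32_t[6]){ 0x0000E000, 0x0, 0x0, 0x0, 0xC0000001, 0x0 }).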

Protection
Specifying protection levels for different windows within a BAR is facilitated by the AXI4 prot field via
TrustZone. Any access from the PMC has a*prot[1]=0 and therefore gets full access.
For the BDF space, the protection domain ID itself is stored in the BDF table. When a request comes
in with a*prot[1]=0, it is allowed full access. Requests with a*prot[1]=1 are only allowed to
access BDF entries that have a lower protection level.
The following table describes this behavior:

Table: AXI BAR Protection Levels

Access Type                              BDF Table Value (prot[2:0])   Value in a*prot[2:0] (AXI Interface)   Action
Secure access                            3'bXXX                        3'bX0X (bit 1 = 0)                     Allow
Non-secure access to secure entry        3'bX0X                        3'bX1X (bit 1 = 1)                     Do not allow
Non-secure access to less secure entry   3'bX1X                        3'bX1X (bit 1 = 1)                     Allow if bits [2] and [0] match between a*prot and the BDF entry

Address Translation

Slave bridge data transfers can be performed over three regions. You have options to set the size of
each region and the number of windows needed for different address translations per region. If
address translation is not needed for a window, you still need to program the BDF table with an
address translation value of 0x0.
Address translations for slave bridge transfers are described in the following examples:

Slave Address Translation Examples

Example 1: BAR Size of 64 KB, with 1 Window Size 4 KB


Window 0: 4 KB with address translation of 0x7 for bits [63:13].

1. Selections in Vivado IP configuration in the AXI BARs tab are as follows:


AXI BAR size 64K: 0xFFFF bits [15:0]
Set Aperture Base Address: 0x0000_0000_0000_0000
Set Aperture High Address: 0x0000_0000_0000_FFFF
2. The BDF table programming:
Program 1 entry for 1 window.
Window Size = AXI BAR size/8 = 64K / 8 = 0x1FFF = 8 KB (13 bits). Each window max size
is 8 KB.
In this example for window size of 4K, 0x1 is programmed at 0x2430 bits [25:0].
Address translation for bits [63:13] are programmed at 0x2420 and 0x2424.
In this example, address translation for bits [63:13] are set to 0x7.

Table: BDF Table Programming

Program Value   Registers
0x0000_E000     Address translation value Low
0x0             Address translation value High
0x0             PASID / Reserved
0x0             [11:0]: Function Number
0xC0000001      [31:30]: Read/Write Access permission; [29]: R0 access Error; [28:26]: Protection ID; [25:0]: Window Size ([25:0] * 4K = actual size of the window)
0x0             Reserved

For this example, Slave address 0x0000_0000_0000_0100 will be address translated to
0x0000_0000_0000_E100.

Example 2: BAR Size of 64 KB, with 1 Window 8 KB


Window 0:8 KB with address translation of 0x6 ('b110) for bits [63:13].

1. Selections in Vivado IP configuration in the AXI BARs tab are as follows:

AXI BAR size 64K: 0xFFFF bits [15:0]
Set Aperture Base Address: 0x0000_0000_0000_0000
Set Aperture High Address: 0x0000_0000_0000_FFFF
2. The BDF table programming:
Program 1 entry for 1 window.
Window Size = AXI BAR size/8 = 64K / 8 = 0x1FFF = 8 KB (13 bits). Each window max size
is 8 KB.
In this example for window size of 8K, 0x2 is programmed at 0x2430 bits [25:0].
Address translation for bits [63:13] are programmed at 0x2420 and 0x2424.
In this example, address translation for bits [63:13] are set to 0x6 ('b110).

Table: BDF Table Programming

Offset   Program Value   Registers
0x2420   0x0000_C000     Address translation value Low
0x2424   0x0             Address translation value High
0x2428   0x0             PASID / Reserved
0x242C   0x0             [11:0]: Function Number
0x2430   0xC0000002      [31:30]: Read/Write Access permission; [29]: R0 access Error; [28:26]: Protection ID; [25:0]: Window Size ([25:0] * 4K = actual size of the window)
0x2434   0x0             Reserved

For this example, the Slave address 0x0000_0000_0000_0100 will be address translated to
0x0000_0000_0000_C100.

Example 3: BAR Size of 32 GB, and 4 Windows of Various Sizes


Window 0: 4 KB with address translation of 0x7 for bits [63:32].
Window 1: 4 GB with address translation of 0x0 for bits [63:32].
Window 2: 64 KB with address translation of 0xBBBB for bits [63:32].
Window 3: 1 GB with address translation of 0x11111 for bits [63:32].

1. Selections in AMD Vivado™ IP configuration in the AXI BARs tab are as follows:
AXI BAR size 32G: 0x7_FFFF_FFFF bits [34:0].
Set Aperture Base Address: 0x0000_0000_0000_0000.
Set Aperture High Address: 0x0000_0007_FFFF_FFFF.
2. The BDF table programming:

Window Size = AXI BAR size/8 = 32 GB / 8 = 0xFFFF_FFFF = 4 GB (32 bits). Each window
max size is 4 GB.
Program 4 entries for 4 windows:
BDF entry 0 table starts at 0x2420.
BDF entry 1 table starts at 0x2440.
BDF entry 2 table starts at 0x2460.
BDF entry 3 table starts at 0x2480.
Window 0 size 4 KB.
Program 0x1 to 0x2430 bits [25:0].
Address translation for bits [63:32] are programmed at 0x2420 and 0x2424.
Program 0x0000_0000 to 0x2420.
Program 0x0000_0007 to 0x2424
Window 1 size 4 GB.
Program 0x10_0000 to 0x2450 bits [25:0].
Address translation for bits [63:32] are programmed at 0x2440 and 0x2444.
Program 0x0000_0000 to 0x2440.
Program 0x0000_0000 to 0x2444
Window 2 size 64 KB.
Program 0x10 to 0x2470 bits [25:0].
Address translation for bits [63:32] are programmed at 0x2460 and 0x2464.
Program 0x0000_0000 to 0x2460
Program 0x0000_BBBB to 0x2464
Window 3 size 1 GB.
Program 0x4_0000 to 0x2490 bits [25:0].
Address translation for bits [63:32] are programmed at 0x2480 and 0x2484.
Program 0x0000_0000 to 0x2480.
Program 0x0001_1111 to 0x2484

Table: BDF Table Programming Entry 0

Offset   Program Value   Registers
0x2420   0x0000_0000     Address translation value Low
0x2424   0x7             Address translation value High
0x2428   0x0             PASID / Reserved
0x242C   0x0             [11:0]: Function Number
0x2430   0xC0000001      [31:30]: Read/Write Access permission; [29]: R0 access Error; [28:26]: Protection ID; [25:0]: Window Size ([25:0] * 4K = actual size of the window)
0x2434   0x0             Reserved

Displayed in the footer


Page 259 of 490
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347)
Displayed in the header

Table: BDF Table Programming Entry 1

Offset   Program Value   Registers
0x2440   0x0000_0000     Address translation value Low
0x2444   0x0             Address translation value High
0x2448   0x0             PASID / Reserved
0x244C   0x0             [11:0]: Function Number
0x2450   0xC010_0000     [31:30]: Read/Write Access permission; [29]: R0 access Error; [28:26]: Protection ID; [25:0]: Window Size ([25:0] * 4K = actual size of the window)
0x2454   0x0             Reserved

Table: BDF Table Programming Entry 2

Offset   Program Value   Registers
0x2460   0x0000_0000     Address translation value Low
0x2464   0xBBBB          Address translation value High
0x2468   0x0             PASID / Reserved
0x246C   0x0             [11:0]: Function Number
0x2470   0xC000_0010     [31:30]: Read/Write Access permission; [29]: R0 access Error; [28:26]: Protection ID; [25:0]: Window Size ([25:0] * 4K = actual size of the window)
0x2474   0x0             Reserved

Table: BDF Table Programming Entry 3

Offset   Program Value   Registers
0x2480   0x0000_0000     Address translation value Low
0x2484   0x1_1111        Address translation value High
0x2488   0x0             PASID / Reserved
0x248C   0x0             [11:0]: Function Number
0x2490   0xC004_0000     [31:30]: Read/Write Access permission; [29]: R0 access Error; [28:26]: Protection ID; [25:0]: Window Size ([25:0] * 4K = actual size of the window)
0x2494   0x0             Reserved


For the above example (a sketch of this translation follows the list):

The slave address 0x0000_0000_0000_0100 is translated to 0x0000_0007_0000_0100.
The slave address 0x0000_0001_0000_0100 is translated to 0x0000_0000_0000_0100.
The slave address 0x0000_0002_0000_0100 is translated to 0x0000_BBBB_0000_0100.
The slave address 0x0000_0003_0000_0100 is translated to 0x0001_1111_0000_0100.
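To make the mapping concrete, the following is a minimal C sketch that models the Example 3
translation, assuming the window entry is selected by AXI address bits [34:32] (one 4 GB slot per
entry) and that the programmed translation value simply replaces address bits [63:32]; the lookup
table and function names are illustrative only and do not reflect internal IP behavior.

    #include <stdint.h>
    #include <stdio.h>

    /* Translation values (bits [63:32]) programmed in Example 3, indexed by window 0-3. */
    static const uint64_t xlate_hi[4] = { 0x7, 0x0, 0xBBBB, 0x11111 };

    /* Illustrative model: bits [34:32] select the window entry, and the
     * programmed translation value replaces bits [63:32] of the address. */
    static uint64_t translate(uint64_t axi_addr)
    {
        unsigned window = (unsigned)((axi_addr >> 32) & 0x3); /* only windows 0-3 exist here */
        return (xlate_hi[window] << 32) | (axi_addr & 0xFFFFFFFFull);
    }

    int main(void)
    {
        const uint64_t addrs[4] = {
            0x0000000000000100ull, 0x0000000100000100ull,
            0x0000000200000100ull, 0x0000000300000100ull,
        };
        for (int i = 0; i < 4; i++)
            printf("0x%016llx -> 0x%016llx\n",
                   (unsigned long long)addrs[i],
                   (unsigned long long)translate(addrs[i]));
        return 0;
    }

Running this prints the four translated addresses listed above.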

The slave bridge does not support narrow burst AXI transfers. To avoid narrow burst transfers,
connect the AXI SmartConnect module, which converts narrow bursts to full-burst AXI transfers.

Master Bridge

The master bridge processes both PCIe MemWr and MemRd request TLPs received from the integrated
block for PCI Express and provides a means to translate addresses that are mapped within the
address for PCIe domain to the memory mapped AXI4 address domain. Each PCIe MemWr request
TLP header is used to create an address and qualifiers for the memory mapped AXI4 bus and the
associated write data is passed to the addressed memory mapped AXI4 Slave. The Master Bridge
can support up to 32 active PCIe MemWr request TLPs.
Each PCIe MemRd request TLP header is used to create an address and qualifiers for the memory
mapped AXI4 bus. Read data is collected from the addressed memory mapped AXI4 bridge slave and
used to generate completion TLPs which are then passed to the integrated block for PCI Express.
The Master Bridge in AXI Bridge mode can support up to 32 active PCIe MemRd request TLPs with
pending completions for improved AXI4 pipe-lining performance.
All AXI4_MM master transfers can be directed to modules based on the QDMA controller selection and
the steering selection in the GUI, as shown in the following table:

Table: Controller Steering Options

Controller   Steering Options
CTRL 0       CPM PCIE NoC 0
             CPM PCIE NoC 1
             CCI PS AXI 0
CTRL 1       CPM PCIE NoC 0
             CPM PCIE NoC 1
             CCI PS AXI 0
             PL AXI0
             PL AXI1

All AXI4_MM master transfers have SMID set to 0.

Interrupts

The QDMA supports up to 2K total MSI-X vectors. A single MSI-X vector can be used to support
multiple queues. Each function can support up to 8 vectors (8 vectors × 256 functions = 2K vectors).
The QDMA supports Interrupt Aggregation. Each vector has an associated Interrupt Aggregation Ring.
The QID and status of queues requiring service are written into the Interrupt Aggregation Ring. When
a PCIe® MSI-X interrupt is received by the Host, the software reads the Interrupt Aggregation Ring to
determine which queue needs service. Mapping of queues to vectors is programmable through the
vector number provided in the queue context. MSI-X interrupt modes are supported for both SR-IOV
and non-SR-IOV.

Asynchronous and Queue Based Interrupts

The QDMA supports both asynchronous interrupts and queue-based interrupts.

The asynchronous interrupts are used for capturing events that are not synchronous to any DMA
operation, namely errors, status, and debug conditions. Interrupts are broadcast to all PFs, and status
is maintained for each PF in a queue-based scheme.
The queue-based interrupts include the interrupts from the H2C MM, H2C stream, C2H MM, and C2H
stream.

Interrupt Engine

The Interrupt Engine handles the queue based interrupts and the error interrupt.
The following figure shows the Interrupt Engine block diagram.

Figure: Interrupt Engine Block Diagram

Displayed in the footer


Page 262 of 490
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347)
Displayed in the header

The Interrupt Engine gets the interrupts from H2C MM, H2C stream, C2H MM, C2H stream, or error
interrupt.
It handles the interrupts in two ways: direct interrupt or indirect interrupt. The interrupt source has the
information that shows whether it is a direct interrupt or an indirect interrupt. It also has the information
of the vector. For a direct interrupt, the vector is the interrupt vector that is used to generate the PCIe
MSI-X message (the interrupt vector index of the MSI-X table). For an indirect interrupt, the vector is
the ring index of the Interrupt Aggregation Ring. The interrupt source gets the information of interrupt
type and vector from the Descriptor Software Context, the Completion Context, or the error interrupt
register.

Direct Interrupt

For direct interrupt, the Interrupt Engine gets the interrupt vector from the source, and it then sends
out the PCIe MSI-X message directly.

Interrupt Aggregation Ring

For the indirect interrupt, it does interrupt aggregation. The following are some restrictions for the
interrupt aggregation.

Each Interrupt Aggregation Ring can only be associated with one function. But multiple rings can
be associated with the same function.
The interrupt engine supports up to three interrupts from the same source, until the software
services the interrupts.
Interrupt aggregation ring size needs to be > 3 * number of Qs.

The Interrupt Engine processes the indirect interrupt with the following steps.

The interrupt source provides the index of the interrupt ring to which it belongs.
Reads interrupt context for that queue.
Writes to the Interrupt Aggregation Ring.
Sends out the PCIe MSI-X message.

This following figure shows the indirect interrupt block diagram.

Figure: Indirect Interrupt


The Interrupt Context includes the information of the Interrupt Aggregation Ring. It has 256 entries to
support up to 256 Interrupt Aggregation Rings.
The color bit is added so that the software does not read more entries than it should. When the
software allocates the memory space for the Interrupt Aggregation Ring, coal_color starts with 1'b0.
The software needs to initialize the color bit of the Interrupt Context to 1'b1. When the hardware
completes the entire ring and wraps to the first entry in the next pass, it also flips the color value to 0
and starts writing 0 in the color bit space. The software does the same: after it completes the last entry
with a color value of 1, it goes to the first entry in the second pass and expects a color value of 0. If
the software does not see a color value of 0, which indicates an old entry, it waits for a new entry with
a color value of 0.
The software reads the Interrupt Aggregation Ring to get the Qid, and the int_type (H2C or C2H).
From the Qid, the software can identify whether the queue is stream or MM.
The stat_desc in the Interrupt Aggregation Ring is the status descriptor from the Interrupt source.
When the status descriptor is disabled, the software can get the status descriptor information from the
Interrupt Aggregation Ring.
There can be two cases:

The interrupt source is C2H stream. Then it is the status descriptor of the C2H Completion Ring.
The software can read the pidx of the C2H Completion Ring.
The interrupt source is others (H2C stream, H2C MM, C2H MM). Then it is the status descriptor
of that source. The software can read the cidx.

Finally, the Interrupt Engine sends out the PCIe MSI-X message using the interrupt vector from the
Interrupt Context. When there is an interrupt from any source, the interrupt engine updates the PIDX
and checks the int_st of that interrupt context. If int_st is 0 (WAITING_TRIGGER), the interrupt
engine sends an interrupt. If int_st is 1 (ISR_RUNNING), the interrupt engine does not send an
interrupt. When the interrupt engine sends an interrupt, it updates int_st to 1; once the software
updates the CIDX and the CIDX matches the PIDX, int_st is cleared. The process is explained below.
When the PCIe MSI-X interrupt is received by the Host, the software reads the Interrupt Aggregation
Ring to determine which queue needs service. After the software reads the ring, it will do a dynamic
pointer update for the software CIDX to indicate the cumulative pointer that the software reads to. The
software does the dynamic pointer update using the register QDMA_DMAP_SEL_INT_CIDX[2048]
(0x18000). If the software CIDX is equal to the PIDX, this triggers a write to the Interrupt Context to
clear int_st, the interrupt state of that queue. This indicates to the QDMA that the software has
already read all of the entries in the Interrupt Aggregation Ring. If the software CIDX is not equal to
the PIDX, the interrupt engine sends out another PCIe MSI-X message so that the software can read
the Interrupt Aggregation Ring again. After that, the software can do a pointer update of the

Displayed in the footer


Page 264 of 490
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347)
Displayed in the header
interrupt source ring. For example, if it is a C2H stream interrupt, the software will update the pointer
of the interrupt source ring, which is the C2H Completion Ring.
These are the steps for the software:

1. After the software gets the PCIe MSI-X message, it reads the Interrupt Aggregation Ring entries.
2. The software uses the coal_color bit to identify the written entries. Each entry has Qid and
Int_type (H2C or C2H). From the Qid and Int_type, the software can check if it is stream or
MM. This points to a corresponding source ring. For example, if it is C2H stream, the source ring
is the C2H Completion Ring. The software can then read the source ring to get information, and
do a dynamic pointer update of the source ring after that.
3. After the software finishes reading of all written entries in the Interrupt Aggregation Ring, it does
one dynamic pointer update of the software cidx using the register
QDMA_DMAP_SEL_INT_CIDX[2048] (0x18000). This communicates to the hardware of the
Interrupt Aggregation Ring pointer used by the software.
If the software cidx is not equal to the pidx, the hardware will send out another PCIe MSI-X
message, so that the software can read the Interrupt Aggregation Ring again.

When the software does the dynamic pointer update for the Interrupt Aggregation Ring using the
register QDMA_DMAP_SEL_INT_CIDX[2048] (0x18000), it sends the ring index of the Interrupt
Aggregation Ring.
The following diagram shows the indirect interrupt flow. The Interrupt module gets the interrupt
requests. It first writes to the Interrupt Aggregation Ring. Then it waits for the write completions. After
that, it sends out the PCIe MSI-X message. The interrupt requests can keep on coming, and the
Interrupt module keeps on processing them. In the meantime, the software reads the Interrupt
Aggregation Ring, and it does the dynamic pointer update. If the software CIDX is not equal to the
PIDX, it will send out another PCIe MSI-X message.
Interrupt Context Structure
The following is the Interrupt Context Structure (0x8).

Table: Interrupt Context Structure (0x8)

Signal      Bit         Owner    Description
rsvd        [255:126]   Driver   Reserved. Initialize to 0s.
func        [125:114]   Driver   Function number.
rsvd        [113:83]    Driver   Reserved. Initialize to 0s.
at          [82]        Driver   1'b0: un-translated address. 1'b1: translated address.
pidx        [81:70]     DMA      Producer Index, updated by the DMA IP.
page_size   [69:67]     Driver   Interrupt Aggregation Ring size: 0: 4 KB, 1: 8 KB, 2: 12 KB, 3: 16 KB, 4: 20 KB, 5: 24 KB, 6: 28 KB, 7: 32 KB.
baddr_4k    [66:15]     Driver   Base address of the Interrupt Aggregation Ring – bits [63:12].
color       [14]        DMA      Color bit.
int_st      [13]        DMA      Interrupt State: 0: WAIT_TRIGGER, 1: ISR_RUNNING.
rsvd        [12]        NA       Reserved.
vec         [11:1]      Driver   Interrupt vector index in the MSI-X table.
valid       [0]         Driver   Valid.


The software needs to size the Interrupt Aggregation Ring appropriately. Each source can send up to
three messages to the ring. Therefore, the size of the ring needs to satisfy the following formula:
Number of entries ≥ 3 × number of queues
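As a rough illustration, the following C sketch picks a ring page_size code from the number of queues
mapped to the ring, assuming the 8-byte entry size and the page_size encodings (4 KB to 32 KB in 4 KB
steps) described in this section; the helper name is illustrative only.

    #include <stdint.h>

    #define IAR_ENTRY_BYTES   8u      /* each aggregation ring entry is 8 bytes  */
    #define IAR_PAGE_STEP     4096u   /* page_size code N selects (N + 1) * 4 KB */

    /* Return the smallest page_size code (0..7) whose ring can hold at least
     * 3 * num_queues entries, or -1 if even the 32 KB ring is too small. */
    static int iar_pick_page_size(uint32_t num_queues)
    {
        uint32_t needed_bytes = 3u * num_queues * IAR_ENTRY_BYTES;

        for (int code = 0; code <= 7; code++) {
            uint32_t ring_bytes = (uint32_t)(code + 1) * IAR_PAGE_STEP;
            if (ring_bytes >= needed_bytes)
                return code;
        }
        return -1; /* spread the queues over more aggregation rings */
    }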
The Interrupt Context is programmed by the context access. The QDMA_IND_CTXT_CMD.Qid has
the ring index, which is from the interrupt source. The operation of MDMA_CTXT_CMD_CLR can
clear all of the bits in the Interrupt Context. The MDMA_CTXT_CMD_INV can clear the valid bit.

Context access through QDMA_TRQ_SEL_IND:


QDMA_IND_CTXT_CMD.Qid = Ring index
QDMA_IND_CTXT_CMD.Sel = MDMA_CTXT_SEL_INT_COAL (0x8)
QDMA_IND_CTXT_CMD.cmd.Op =
MDMA_CTXT_CMD_WR
MDMA_CTXT_CMD_RD
MDMA_CTXT_CMD_CLR
MDMA_CTXT_CMD_INV

After the interrupt engine looks up the Interrupt Context, the interrupt engine writes to the Interrupt
Aggregation Ring. The interrupt engine also updates the Interrupt Context with the new PIDX, color,
and the interrupt state.
Interrupt Aggregation Entry
This is the Interrupt Aggregation Ring entry structure. It has 8B data.

Table: Interrupt Aggregation Ring Entry Structure

Signal       Bit       Owner   Description
Coal_color   [63]      DMA     The color bit of the Interrupt Aggregation Ring. This bit inverts every time the pidx wraps around on the Interrupt Aggregation Ring.
Qid          [62:39]   DMA     This is from the Interrupt source. Queue ID.
Int_type     [38:38]   DMA     0: H2C, 1: C2H
Rsvd         [37:37]   DMA     Reserved
Stat_desc    [36:0]    DMA     This is the status descriptor of the Interrupt source.
The following is the information in the stat_desc.

Table: stat_desc Information

Signal   Bit       Owner   Description
Error    [36:35]   DMA     This is from the Interrupt source: c2h_err[1:0] or h2c_err[1:0].
Int_st   [34:33]   DMA     This is from the Interrupt source. Interrupt state. 0: WRB_INT_ISR, 1: WRB_INT_TRIG, 2: WRB_INT_ARMED
Color    [32:32]   DMA     This is from the Interrupt source. This bit inverts every time the pidx wraps around, and this field gets copied to the color field of the descriptor.
Cidx     [31:16]   DMA     This is from the Interrupt source. Cumulative consumed pointer.
Pidx     [15:0]    DMA     This is from the Interrupt source. Cumulative pointer of total Interrupt Aggregation Ring entries written.
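The 8-byte entry layout above can be decoded with plain shifts and masks. The following is a minimal C
sketch, assuming the entry has been read from ring memory as a little-endian 64-bit word; the struct and
function names are illustrative only.

    #include <stdint.h>

    /* Decoded view of one 8-byte Interrupt Aggregation Ring entry. */
    struct iar_entry {
        uint8_t  coal_color;  /* [63]    ring color bit                 */
        uint32_t qid;         /* [62:39] queue ID from interrupt source */
        uint8_t  int_type;    /* [38]    0: H2C, 1: C2H                 */
        uint64_t stat_desc;   /* [36:0]  status descriptor              */
        /* stat_desc sub-fields */
        uint8_t  error;       /* [36:35] c2h_err[1:0] or h2c_err[1:0]   */
        uint8_t  int_st;      /* [34:33] WRB_INT_* state                */
        uint8_t  color;       /* [32]    source ring color              */
        uint16_t cidx;        /* [31:16] cumulative consumed pointer    */
        uint16_t pidx;        /* [15:0]  cumulative produced pointer    */
    };

    static struct iar_entry iar_decode(uint64_t raw)
    {
        struct iar_entry e;
        e.coal_color = (uint8_t)((raw >> 63) & 0x1);
        e.qid        = (uint32_t)((raw >> 39) & 0xFFFFFF);   /* 24 bits */
        e.int_type   = (uint8_t)((raw >> 38) & 0x1);
        e.stat_desc  = raw & 0x1FFFFFFFFFull;                /* 37 bits */
        e.error      = (uint8_t)((raw >> 35) & 0x3);
        e.int_st     = (uint8_t)((raw >> 33) & 0x3);
        e.color      = (uint8_t)((raw >> 32) & 0x1);
        e.cidx       = (uint16_t)((raw >> 16) & 0xFFFF);
        e.pidx       = (uint16_t)(raw & 0xFFFF);
        return e;
    }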

Interrupt Flow

Figure: Interrupt Flow


Error Interrupt

There are Leaf Error Aggregators in different places. They log the errors and propagate the errors to
the Central Error Aggregator. Each Leaf Error Aggregator has an error status register and an error
mask register. The error mask is enable mask. Irrespective of the enable mask value, the error status
register always logs the errors. Only when the error mask is enabled, the Leaf Error Aggregator will
propagate the error to the Central Error Aggregator.
The Central Error Aggregator aggregates all of the errors together. When any error occurs, it can
generate an Error Interrupt if the err_int_arm bit is set in the error interrupt register
QDMA_GLBL_ERR_INT (0B04). The err_int_arm bit is set by the software and cleared by the
hardware when the Error Interrupt is taken by the Interrupt Engine. The Error Interrupt is for all of the
errors including the H2C errors and C2H errors. The Software must set this err_int_arm bit to
generate interrupt again.
The Error Interrupt supports the direct interrupt only. Register QDMA_GLBL_ERR_INT bit[23],
en_coal must always be programmed to 0 (direct interrupt).
The Error Interrupt gets the vector from the error interrupt register QDMA_GLBL_ERR_INT. For the
direct interrupt, the vector is the interrupt vector index of the MSI-X table.
Here are the processes of the Error Interrupt.

1. Reads the Error Interrupt register QDMA_C2H_GLBL_INT (0B04) to get function and vector
numbers.
2. Sends out the PCIe MSI-X message.

The following figure shows the error interrupt register block diagram.


Figure: Error Interrupt Handling

User Interrupt

You can generate an interrupt to the host system using the user interrupt interface. You need to provide
the usr_irq_in_fnc, usr_irq_in_vec, and usr_irq_in_vld signals, and they should be held active
until usr_irq_out_ack is returned. usr_irq_in_fnc is the function number associated with an
interrupt. For an MSI-X interrupt, usr_irq_in_vec should be provided. For a legacy interrupt, the
vector is not needed.

Figure: Interrupt

Queue Management

Function Map Table

The Function Map Table is used to allocate queues to each function. The index into the RAM is the
function number. Each entry contains the base number of the physical QID and the number of queues
allocated to the function. It provides a function based, queue access protection mechanism by
translating and checking accesses to logical queues (through QDMA_TRQ_SEL_QUEUE_PF and
QDMA_TRQ_SEL_QUEUE_VF address space) to their physical queues. Direct register accesses to
queue space beyond what is allocated to the function in the table will be canceled and an error will be
logged.
The Function Map can be accessed through the indirect context register space QDMA_IND_CTXT_CMD
registers, with QDMA_IND_CTXT_CMD.sel = 0xC. When accessed through the indirect context register
space, the context structure is defined by the Function Map Context Structure table. Along with FMAP
table programming in the IP, you must program the FMAP table in the Mailbox IP. This is needed for the
function level reset (FLR) procedure.

Displayed in the footer


Page 269 of 490
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express Product Guide (PG347)
Displayed in the header

FMAP Programming in QDMA IP


1. Program Function Map Context structure in QDMA_IND_CTXT_DATA (0x804 - 0x820) registers
as listed in the following table.
2. Program QDMA_IND_CTXT_CMD registers
a. [19:7] : function number
b. [6:5] : 2'h1 (write a context data)
c. [4:1] : 4'hC (FMAP)

For more information on QDMA_IND_CTXT_CMD (0x844), refer to qdma_v5_0_pf_registers.csv,
available in the Register Reference File. A sketch of this programming sequence is shown after the
following table.
Because these spaces exist only in the PF address map, only a physical function can modify this
table.

Table: Function Map Context Structure (0xC)

Bits        Bit Width   Field Name   Description
[255:44]                             Reserved. Set to 0.
[44:32]     13          Qid_max      Maximum number of queues this function has.
[31:12]                              Reserved. Set to 0.
[11:0]      12          Qid_base     The base queue ID for the function.
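The FMAP programming steps above can be expressed as a short helper. The following is a minimal C
sketch, assuming hypothetical reg_write32()/reg_read32() accessors into the QDMA PF register space;
the field packing follows the Function Map Context Structure and the QDMA_IND_CTXT_CMD bit
positions given in this section, but it is illustrative only and not a complete driver.

    #include <stdint.h>

    extern void     reg_write32(uint32_t offset, uint32_t value);
    extern uint32_t reg_read32(uint32_t offset);

    #define QDMA_IND_CTXT_DATA_0  0x804u   /* context data registers 0x804-0x820 */
    #define QDMA_IND_CTXT_CMD     0x844u

    /* Program one function's FMAP entry: base queue ID and number of queues. */
    static void fmap_program(uint16_t func, uint16_t qid_base, uint16_t qid_max)
    {
        /* Function Map Context: Qid_base in [11:0], Qid_max in [44:32], rest 0.
         * Context bits [44:32] land in bits [12:0] of the second data register. */
        reg_write32(QDMA_IND_CTXT_DATA_0 + 0x0, (uint32_t)(qid_base & 0xFFFu));
        reg_write32(QDMA_IND_CTXT_DATA_0 + 0x4, (uint32_t)(qid_max & 0x1FFFu));
        for (uint32_t i = 2; i < 8; i++)
            reg_write32(QDMA_IND_CTXT_DATA_0 + 4u * i, 0);

        /* QDMA_IND_CTXT_CMD: [19:7] function number, [6:5] 2'h1 (write),
         * [4:1] 4'hC (FMAP). Per this guide, the access does not occur while
         * bit [0] is set, so wait for it to clear first. */
        while (reg_read32(QDMA_IND_CTXT_CMD) & 0x1u)
            ;
        reg_write32(QDMA_IND_CTXT_CMD,
                    ((uint32_t)func << 7) | (0x1u << 5) | (0xCu << 1));
    }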

FMAP Programming in Mailbox IP


The FMAP table in the Mailbox can only be updated from a PF. The FMAP table in the mailbox is at
0x43100 + (function number * 4).

Address

Function 0 FMAP address is at 0x43100.
The remaining 255 functions follow at 0x43100 + (func * 4).

Data
See the following table for the data description.

Table: Function Map Table in Mailbox

Bits      Bit Width   Field Name   Description
[31:22]               Reserved     Reserved.
[23:12]   12          Qid_max      Maximum number of queues this function has.
[11:0]    12          Qid_base     The base queue ID for the function.
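The following is a minimal C sketch of a mailbox FMAP entry update, assuming a hypothetical
mbox_write32() accessor into the mailbox register space and the bit packing from the table above
(queue count in [23:12], base queue ID in [11:0]); treat it as illustrative only.

    #include <stdint.h>

    extern void mbox_write32(uint32_t offset, uint32_t value);

    #define MBOX_FMAP_BASE 0x43100u   /* function N entry at 0x43100 + N * 4 */

    /* Mirror a function's queue allocation into the mailbox FMAP table.
     * Must be done from a PF, alongside the FMAP programming in the QDMA IP. */
    static void mbox_fmap_program(uint16_t func, uint16_t qid_base, uint16_t qid_max)
    {
        uint32_t entry = ((uint32_t)(qid_max & 0xFFFu) << 12) |
                         (uint32_t)(qid_base & 0xFFFu);
        mbox_write32(MBOX_FMAP_BASE + 4u * func, entry);
    }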

Context Programming

Program all mask registers to 1. They are QDMA_IND_CTXT_MASK_0 (0x824) to
QDMA_IND_CTXT_MASK_7 (0x840).
Program context values onto the following registers: QDMA_IND_CTXT_DATA_0 (0x804) to
QDMA_IND_CTXT_DATA_7 (0x820).
A host profile table context needs to be programmed before any context settings
(QDMA_CTXT_SELC_HOST_PROFILE). Select 0xA in QDMA_IND_CTXT_CMD (0x844), write all
data fields to 0s, and program the context. All other values are reserved.
Refer to the software descriptor context structure, C2H prefetch context structure, and C2H CMPT
context structure to program the context data registers.
Program any context to the corresponding queue in the following context command register:
QDMA_IND_CTXT_CMD (0x844).
Qid is given in bits [19:7].
Opcode bits [6:5] select what operation must be done.
0 = QDMA_CTXT_CLR : All content of context is zeroed out. Qinv is sent out on
tm_dsc_sts
1 = QDMA_CTXT_WR : Write context
2 = QDMA_CTXT_RD : Read context
3 = QDMA_CTXT_INV : Qen is set to zero and other context values remain intact. Qinv is
sent out on tm_dsc_sts and unused credits are sent out.
The context that is accessed is given in bits [4:1].
4'h0 = QDMA_CTXT_SELC_DEC_SW_C2H; C2H Descriptor SW Context
4'h1 = QDMA_CTXT_SELC_DEC_SW_H2C; H2C descriptor SW context
4'h2 = QDMA_CTXT_SELC_DEC_HW_C2H; C2H Descriptor HW Context
4'h3 = QDMA_CTXT_SELC_DEC_HW_H2C; H2C Descriptor HW Context
4'h4 = QDMA_CTXT_SELC_DEC_CR_C2H; C2H Descriptor Credit Context
4'h5 = QDMA_CTXT_SELC_DEC_CR_H2C; H2C Descriptor Credit Context
4'h6 = QDMA_CTXT_SELC_WRB; CMPT / used ring Context
4'h7 = QDMA_CTXT_SELC_PFTCH; C2H PFCH Context
4'h8 = QDMA_CTXT_SELC_INT_COAL; Interrupt Aggregation Context
4'h9 = Reserved
4'hA = QDMA_CTXT_SELC_HOST_PROFILE; Host Profile Table (Only
QDMA_CTXT_CMD_WR and QDMA_CTXT_CMD_RD supported)
4'hB = QDMA_CTXT_SELC_TIMER; Timer Context (Only QDMA_CTXT_CMD_INV
supported)
4'hC = QDMA_CTXT_SELC_FMAP FMAP table write (Only QDMA_CTXT_CMD_WR
and QDMA_CTXT_CMD_RD supported)
4'hD = QDMA_CTXT_SELC_FNC_STS (Per function BME enable/Disable)
Context programming write/read does not occur when bit [0] is set. For more information on
register 0x844, refer to qdma_v5_0_pf_registers.csv, available in the Register Reference File.
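As a summary of the sequence above, the following is a minimal C sketch of a generic indirect context
write, again assuming hypothetical reg_write32()/reg_read32() accessors and a caller that has already
packed the context into eight 32-bit data words; treat it as a sketch of the flow described here rather than
a definitive implementation.

    #include <stdint.h>

    extern void     reg_write32(uint32_t offset, uint32_t value);
    extern uint32_t reg_read32(uint32_t offset);

    #define QDMA_IND_CTXT_DATA_0  0x804u   /* 0x804 - 0x820 */
    #define QDMA_IND_CTXT_MASK_0  0x824u   /* 0x824 - 0x840 */
    #define QDMA_IND_CTXT_CMD     0x844u

    /* Opcodes for QDMA_IND_CTXT_CMD bits [6:5]. */
    enum { QDMA_CTXT_CLR = 0, QDMA_CTXT_WR = 1, QDMA_CTXT_RD = 2, QDMA_CTXT_INV = 3 };

    /* Write one context structure (sel = selector from the list above, for
     * example 4'h6 for CMPT or 4'h8 for interrupt aggregation) for a queue. */
    static void qdma_ctxt_write(uint16_t qid, uint8_t sel, const uint32_t data[8])
    {
        for (uint32_t i = 0; i < 8; i++) {
            reg_write32(QDMA_IND_CTXT_MASK_0 + 4u * i, 0xFFFFFFFFu); /* all mask bits = 1 */
            reg_write32(QDMA_IND_CTXT_DATA_0 + 4u * i, data[i]);
        }

        /* A context write/read does not occur while bit [0] of the command
         * register is set, so wait for it to clear before issuing the command. */
        while (reg_read32(QDMA_IND_CTXT_CMD) & 0x1u)
            ;

        /* [19:7] Qid, [6:5] opcode, [4:1] context selector. */
        reg_write32(QDMA_IND_CTXT_CMD,
                    ((uint32_t)qid << 7) | ((uint32_t)QDMA_CTXT_WR << 5) |
                    ((uint32_t)(sel & 0xFu) << 1));
    }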

Queue Setup

Clear Descriptor Software Context.


Clear Descriptor Hardware Context.
Clear Descriptor Credit Context.
Set-up Descriptor Software Context.
Clear Prefetch Context.
Clear Completion Context.
Set-up Completion Context.
If interrupts/status writes are desired (enabled in the Completion Context), an initial
Completion CIDX update is required to send the hardware into a state where it is sensitive
to trigger conditions. This initial CIDX update is required, because when out of reset, the
hardware initializes into an unarmed state.
Set-up Prefetch Context.

Queue Teardown

Queue Tear-down (C2H Stream):

Send marker packet to drain the pipeline.


Wait for marker completion.
Invalidate/clear prefetch context.
Invalidate/clear completion context.
Invalidate/clear descriptor software context.
Invalidate timer context (clear cmd is not supported).

Queue Tear-down (H2C Stream & MM):

Invalidate/clear descriptor software context.

Virtualization

QDMA implements SR-IOV passthrough virtualization where the adapter exposes a separate virtual
function (VF) for use by a virtual machine (VM). A physical function (PF) can be optionally made
privileged with full access to QDMA registers and resources, but only VFs implement per queue
pointer update registers and interrupts. VF drivers must communicate with the driver attached to the
PF through the mailbox for configuration, resource allocation, and exception handling. The QDMA
implements function level reset (FLR) to enable the operating system on a VM to reset the device
without interfering with the rest of the platform.

Table: Privileged Access

Type                                     Notes
Queue context/other control registers    Registers for Context access are only controlled by PFs (all 4 PFs).
Status and statistics registers          Mainly PF-only registers. VFs need to coordinate with a PF driver for error handling. VFs need to communicate through the mailbox with the driver attached to the PF.
Data path registers                      Both PFs and VFs must be able to write the registers involved in the data path without needing to go through a hypervisor. Pointer updates for H2C/C2H Descriptor Fetch can be done directly by a VF or PF for the queues associated with the function using its own BAR space. Any pointer update to a queue that does not belong to the function is dropped and an error is logged.
Other protection recommendations         Turn on the IOMMU to protect against bad memory accesses from VMs.
PF driver and VF driver communication    The VF driver needs to communicate with the PF driver to request operations that have a global effect. This communication channel needs the ability to pass messages and generate interrupts. The channel utilizes a set of hardware mailboxes for each VF.

Mailbox

In a virtualized environment, the driver attached to a PF has enough privilege to program and access
QDMA registers. For all the lesser privileged functions, certain PFs and all VFs must communicate
with privileged drivers using the mailbox mechanism. The communication API must be defined by the
driver. The QDMA IP does not define it.
Each function (both PF and VF) has an inbox and an outbox that can fit a message size of 128B. A VF
accesses its own mailbox, and a PF accesses its own mailbox and all the functions (PF or VF)
associated with that PF.
✎ Note: The pcie_qdma_mailbox IP supports up to 4 PFs and 240 VFs. You can build a mailbox system
in the PL to support a larger number of PFs and VFs (the CPM5 limit is 16 PFs and 2K VFs). Adding the
pcie_qdma_mailbox IP increases PL utilization.
The QDMA mailbox allows the following access:

From a VF to the associated PF.


From a PF to any VF belonging to its own virtual function group (VFG).
From a PF (typically a driver that does not have access to QDMA registers) to another PF.

Figure: Mailbox


VF To PF Messaging
A VF is allowed to post one message to a target PF mailbox until the target function (PF) accepts it.
Before posting the message, the source function should make sure its o_msg_status is cleared; then
the VF can write the message to its Outgoing Message Registers. After finishing the message write, the
VF driver sends the msg_send command by writing 0x1 to the control/status register (CSR) address
0x5004. The mailbox hardware then informs the PF driver by asserting the i_msg_status field.
The function driver should enable periodic polling of i_msg_status to check the availability of
incoming messages. At the PF side, i_msg_status = 0x1 indicates one or more messages are pending
for the PF driver to pick up. The cur_src_fn in the Mailbox Status Register gives the function ID of
the first pending message. The PF driver should then set the Mailbox Target Function Register to the
source function ID of the first pending message. Access to a PF's Incoming Message Registers is
indirect, which means the mailbox hardware always returns the corresponding message bytes sent by
the target function. Upon finishing the message read, the PF driver should also send the msg_rcv
command by writing 0x2 to the CSR address. The hardware then deasserts the o_msg_status at the
source function side. The following figure illustrates the messaging flow from a VF to a PF at both the
source and destination sides.
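The following is a minimal C sketch of the VF-side send flow described above. Only the VF command
CSR (0x5004, msg_send = 0x1, msg_rcv = 0x2) and the 128-byte message size come from the text; the
vf_mbox_* accessors, the outgoing-buffer offset, and the o_msg_status polling helper are placeholders
for register details documented elsewhere, so treat everything else as an assumption.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical accessors into this VF's mailbox register space. */
    extern void vf_mbox_write32(uint32_t offset, uint32_t value);

    /* Placeholder: returns nonzero while o_msg_status indicates the previous
     * outgoing message has not yet been accepted by the PF. */
    extern int vf_outgoing_pending(void);

    #define VF_MBOX_CMD_CSR   0x5004u  /* write 0x1 = msg_send, 0x2 = msg_rcv     */
    #define VF_MBOX_OUT_BASE  0x0u     /* placeholder: Outgoing Message Registers */
    #define MBOX_MSG_BYTES    128u

    /* Post one 128-byte message from this VF to its associated PF. */
    static void vf_mbox_send(const uint8_t msg[MBOX_MSG_BYTES])
    {
        while (vf_outgoing_pending())          /* wait for o_msg_status to clear */
            ;

        for (uint32_t i = 0; i < MBOX_MSG_BYTES; i += 4) {
            uint32_t word;
            memcpy(&word, &msg[i], sizeof(word));
            vf_mbox_write32(VF_MBOX_OUT_BASE + i, word);
        }

        vf_mbox_write32(VF_MBOX_CMD_CSR, 0x1); /* msg_send */
    }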

Figure: VF to PF Messaging Flow


PF To VF Messaging
The messaging flow from a PF to the VFs that belong to its VFG is slightly different from the VF to PF
flow because a PF can send messages to multiple destination functions and therefore might receive
multiple acknowledgments at the moment it checks the status. As illustrated in the following figure, a
PF driver must set the Mailbox Target Function Register to the destination function ID before doing
any message operation; for example, checking the incoming message status, writing a message, or
sending the command. At the VF side (receiving side), whenever a VF driver gets
i_msg_status = 0x1, the VF driver should read its Incoming Message Registers to pick up the
message. Depending on the application, the VF driver can send msg_rcv immediately after reading
the message or after the corresponding message has been processed.
To avoid one-by-one polling of the status of outgoing messages, the mailbox hardware provides a set
of Acknowledge Status Registers (ASR) for each PF. Upon the mailbox receiving the msg_rcv
command from a VF, it deasserts the o_msg_status field of the source PF and it also sets the
corresponding bit in the Acknowledge Status Registers. For a given VF with function ID <N>,
acknowledge status is at:

Acknowledge Status Register address: <N> / 32 + <0x22420 Register Address>
Acknowledge Status bit location: <N> % 32

The mailbox hardware asserts the ack_status field in the Status Register (0x22400) when any bit is
asserted in the Acknowledge Status Registers (ASR). The PF driver can poll the

ack_status before actually reading out the Acknowledge Status Registers. The PF driver might detect
multiple completions through one register access. After the acknowledgments are processed, the PF
driver should write the value back to the same register address to clear the status.

Figure: PF to VF Messaging Flow

Mailbox Interrupts
The mailbox module supports interrupts as an alternative event notification mechanism. Each mailbox
has an Interrupt Control Register (at offset 0x22410 for a PF, or at offset 0x5010 for a VF). Set this
register to 1 to enable the interrupt. Once the interrupt is enabled, the mailbox sends the interrupt to
the QDMA whenever there is any pending event for the mailbox to process, namely any pending
incoming message or any acknowledgment for outgoing messages. Configure the interrupt vector
through the Function Interrupt Vector Register (0x22408 for a PF, or 0x5008 for a VF) according to the
driver configuration.
Enabling the interrupt does not change the event logging mechanism, which means the user must
check the pending events by reading the Function Status Registers. The first step in responding to an
interrupt request is disabling the interrupt. It is possible that the actual number of pending events is
more than the number of events at the moment the mailbox sent the interrupt.
✎ Recommended: AMD recommends that the user application interrupt handler process all the
pending events present in the status register. Upon finishing the interrupt response, the user
application re-enables the interrupt.
The mailbox checks its event status at the time the interrupt control changes from disabled to
enabled. If any new event arrived at the mailbox between reading the interrupt status and re-enabling
the interrupt, the mailbox generates a new interrupt request immediately.

Function Level Reset

The function level reset (FLR) mechanism enables the software to quiesce and reset Endpoint
hardware with function-level granularity. When a VF is reset, only the resources associated with this
VF are reset. When a PF is reset, all resources of the PF, including that of its associated VFs, are
reset. Because FLR is a privileged operation, it must be performed by the PF driver running in the
management system.

Use Mode
The hypervisor requests FLR when a function is attached or detached (that is, power on and off).
You can request FLR as follows:

echo 1 > /sys/bus/pci/devices/$BDF/reset

where $BDF is the bus device function number of the targeted function.

FLR Process
A complete FLR process involves three major steps.

1. Pre-FLR: Pre-FLR resets all QDMA context structure, mailbox, and user logic of the target
function.
Each function has a register called FLR Control Status register, which keeps track of the
pre-FLR status of the function. The offset is calculated as FLR Control Status register offset
= MB_base + 0x100, which is located at offset 0x100 from the mailbox memory space of
the function. Note that PF and VF have different MB_base. The definition of FLR Control
Status register is shown in the following table.
The software writes 1 to bit [0] flr_status of the target function to initiate pre-FLR. The
hardware clears bit [0] flr_status when pre-FLR completes. The software keeps polling
bit [0] flr_status and only proceeds to the next step when it returns 0 (a polling sketch is
shown after this list).

Table: FLR Control Status Register

Offset   Field        R/W Type   Width   Default   Description
0x100    pre_flr_st   RW         32      0         [31:1]: Reserved. [0]: 1 initiates pre-FLR; 0 indicates pre-FLR done. Bit [0] is set by the driver and cleared by the hardware.

2. Quiesce: The software must ensure all pending transactions are completed. This can be done by
polling the Transaction Pending bit in the Device Status register (in PCIe Configuration Space),

until it is cleared or times out after a certain period of time.
3. PCIe-FLR: PCIe-FLR resets all resources of the target function in the PCIe controller.
✎ Note: The Initiate Function Level Reset bit (bit 15 of the PCIe Device Control Register) of the target
function should be set to 1 to trigger the FLR process in PCIe.
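The following is a minimal C sketch of the pre-FLR step, assuming a hypothetical
mbox_read32()/mbox_write32() pair that accesses the target function's mailbox memory space (MB_base
differs between a PF and a VF) and a simple retry budget; it is illustrative only.

    #include <stdint.h>

    extern void     mbox_write32(uint64_t addr, uint32_t value);
    extern uint32_t mbox_read32(uint64_t addr);

    #define FLR_CTRL_STATUS_OFFSET 0x100u  /* FLR Control Status = MB_base + 0x100 */

    /* Initiate pre-FLR on the target function and poll until the hardware
     * clears bit [0], or the retry budget is exhausted. Returns 0 on success. */
    static int pre_flr(uint64_t mb_base, unsigned max_polls)
    {
        uint64_t reg = mb_base + FLR_CTRL_STATUS_OFFSET;

        mbox_write32(reg, 0x1);                     /* bit [0] = 1 starts pre-FLR */

        for (unsigned i = 0; i < max_polls; i++) {
            if ((mbox_read32(reg) & 0x1u) == 0)     /* cleared by hardware = done */
                return 0;
        }
        return -1;                                  /* timed out */
    }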

OS Support
If the PF driver is loaded and alive (i.e., use mode 1), all three aforementioned steps are performed by
the driver. However, for an AMD Versal device, if you want to perform FLR before loading the PF
driver (as defined in Use Mode above), an OS kernel patch is provided to allow the OS to perform the
correct FLR sequence through functions defined in //…/source/drivers/pci/quick.c.

CPM5 Mailbox IP

You need to add a new IP from the IP catalog to instantiate the pcie_qdma_mailbox Mailbox IP. This IP
is needed for function virtualization. The pcie_qdma_mailbox IP should be connected to the
versal_cips IP as shown in the following diagram:

Figure: CPM5 Mailbox Connection

✎ Note: Mailbox ports are always connected to Mailbox IP. If Mailbox IP is not used, leave the port
unconnected (floating). See the preceding figure for connection reference.
The following connections are related to the above example design. To connect the Mailbox IP, follow
these steps:

1. Add PCIe QDMA Mailbox IP. To do so:


a. Configure the IP for the number of PFs (should be same as number of PFs selected in
QDMA configuration).

b. Configure the IP for the number of VFs in each PF (should be same as number of VFs
selected in QDMA configuration).
✎ Note: It is important to match the number of PFs and VFs to the numbers configured in the QDMA
IP. If not, the design will not work.
2. Re-configure the NoC IP to add one extra AXI Master port. To do so:
a. Assign one more AXI clock.
b. In the Outputs tab, assign M02_AXI to aclk2.
c. In the Connectivity tab, select the M02_AXI PL option for both S00_AXI and S01_AXI
ps_pcie.
3. Add AXI SmartConnect IP.
a. Configure the IP to have one master, one slave, one clock, and one reset.

Mailbox IP has two clocks and two resets, as shown in the preceding figure. In this example, both
clocks are generated from the PMC block.

axi_aclk
The Mailbox IP runs at 250 MHz; this clock is used internally in the Mailbox IP.

ip_clk
Depending on the configuration, the PL might need to run at a higher frequency to satisfy the
data throughput. For example, a Gen5x8 PL needs to run at 433 MHz to satisfy the data
throughput; ip_clk should be connected to the 433 MHz clock in this case.

ip_resetn
It is synchronous with ip_clk and is derived from the CIPS IP.

axi_aresetn
It is synchronous with axi_aclk. Use the proc_sys_reset IP to generate a reset synchronous to
axi_aclk.

For some configurations, such as Gen3x16 and Gen4x8, the PL clock's maximum speed is 250 MHz.
In those cases the PL clock runs at 250 MHz and the pcie_qdma_mailbox IP also runs at 250 MHz, so
connect ip_clk to axi_aclk and ip_resetn to axi_aresetn.
Follow the CPM5 Mailbox Connection figure to make the following connections:
Follow the CPM5 Mailbox Connection figure to make the following connections:

1. Connect the M02_AXI interface to Smartconnect1; from Smartconnect1, connect M00_AXI to the
Mailbox IP.
2. Connect dma0_usr_irq from the CIPS IP to the corresponding Mailbox IP output.
3. Connect dma0_usr_flr from the CIPS IP to the corresponding Mailbox IP output.
4. Make the usr_flr and usr_irq interfaces of the Mailbox IP external pins.

✎ Note: Mailbox access can be steered to NoC0 or NoC1 port based on the CIPS GUI configuration.
You should configure the NoC based on the CIPS GUI selection.

Port ID

Port ID is a categorization of some queues on the FPGA side. When the DMA is shared by more
than one user application, the port ID provides an indirection to the QID so that all the interfaces can
be further demuxed at lower cost. However, when the DMA is used by a single application, the port ID
can be ignored; drive the port ID inputs to 0.

Host Profile

The host profile must be programmed to represent the Root Port host. The host profile can be
programmed through indirect context programming: select QDMA_CTXT_SELC_HOST_PROFILE (4'hA) in
QDMA_IND_CTXT_CMD. The host profile context structure is given in the following table.

Table: Host Profile Context Structure

Bit | Bit Width | Field Name | Description
[255:202] | 54 | Reserved | Reserved
[201:192] | 10 | smid | System Management ID. smid[9] is reserved. smid[8:0] valid range is 0x100 to 0x1FF; smid bit [8] should be set to 1.
[191:188] | 4 | Reserved | Reserved
[187:186] | 2 | H2C AXI4-MM write awprot | -
[185:182] | 4 | H2C AXI4-MM write awcache | -
[181:178] | 4 | H2C AXI4-MM Steering | -
[177:104] | 74 | Reserved | Reserved
[103:102] | 2 | C2H AXI4-MM read arprot | -
[101:98] | 4 | C2H AXI4-MM read arcache | -
[97:94] | 4 | C2H AXI4-MM Steering | -
[93:0] | 94 | Reserved | Reserved

The host profile context must be programmed for any QDMA AXI4-MM transfers, and it must be
programmed before any queue context programming. This affects only AXI4-MM DMA transfers, not
streaming transfers.

The System Management ID (SMID) is used to uniquely identify all masters in the device. For AXI-MM
master bridge (BAR access) transfers, the SMID is set to 0x100. For all AXI-MM transfers (DMA and
BAR access), the SMID should be set within the range of 0x100 to 0x1FF (bit 8 should be set to 1).
Steering values should be set based on the address space location in the Versal design. The steering
table below shows the values that need to be set and their corresponding Versal locations.

Table: CPM5 QDMA Steering

Steering values Description

4'b0000 NoC Channel 0 (CPM PCIE NoC 0)

4'b0001 NoC channel 1 (CPM PCIE NoC 1)

4'b0010 Coherent PS Channel 0 (CCI PS AXI0)

4'b0100 AXI4-MM PL channel 0 (PL AXI 0) Applicable for only QDMA1

4'b0101 AXI4-MM PL channel 1 (PL AXI 1) Applicable for only QDMA1

All other values are reserved NA

There are some restrictions based on the QDMA selection. For more information, see the Controller
Steering Options table in the Master Bridge section.
The following example illustrates how a host profile can be programmed to direct some queues to a
specific location.
The example uses QDMA0 and two Host IDs. Host ID 0 targets Queues from 0 to 9 to NoC Channel 0
and Host ID 1 targets Queues from 10 to 19 to NoC Channel 1.

Host ID 0 for Queues 0 to 9

Context Data registers: program registers 0x804 to 0x820 based on the Host Profile Context
Structure table above.
Set H2C AXI4-MM Steering and C2H AXI4-MM Steering to 0x0 to target NoC Channel 0.
Set the other fields based on the Host Profile Context Structure table above.
Context CMD register: program register 0x844.
qid [19:7] : 0x0 (Host ID 0)
op [6:5] : 0x1 (WR)
sel [4:1] : 0xA (HOST_PROFILE)
busy [0] : x
Write 0x34 to the Context CMD register.


Host ID 1 for Queues 10 to 19

Context Data registers: program registers 0x804 to 0x820 based on the Host Profile Context
Structure table above.
Set H2C AXI4-MM Steering and C2H AXI4-MM Steering to 0x1 to target NoC Channel 1.
Set all other values in the table to 0.
Context CMD register: program register 0x844.
qid [19:7] : 0x1 (Host ID 1)
op [6:5] : 0x1 (WR)
sel [4:1] : 0xA (HOST_PROFILE)
busy [0] : x
Write 0xB4 to the Context CMD register.

This sets up two Host IDs, 0 and 1.
Use Host ID 0 when setting up Queues 0 to 9: write 0 in the host_id field during context programming.
Use Host ID 1 when setting up Queues 10 to 19: write 1 in the host_id field during context programming.
A register-programming sketch of this host profile setup follows.
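
The following is a minimal C sketch of the host profile programming shown above. qdma_reg_wr is a
hypothetical 32-bit write into the QDMA register space (through whichever access path your design
uses), the ctxt_set_field helper is illustrative only, and the smid value 0x100 is just one example from
the permitted range; the offsets and field positions are taken from the text and tables above.

#include <stdint.h>

/* Hypothetical 32-bit register write into the QDMA register space. */
extern void qdma_reg_wr(uint32_t offset, uint32_t val);

#define QDMA_IND_CTXT_DATA_BASE  0x804u   /* eight data words: 0x804 ... 0x820 */
#define QDMA_IND_CTXT_CMD        0x844u
#define CTXT_SEL_HOST_PROFILE    0xAu     /* 4'hA, QDMA_CTXT_SELC_HOST_PROFILE */
#define CTXT_OP_WR               0x1u

/* Place 'val' into bits [msb:lsb] of the 256-bit host profile image. */
static void ctxt_set_field(uint32_t img[8], int msb, int lsb, uint64_t val)
{
    for (int b = lsb; b <= msb; b++) {
        uint32_t bit = (uint32_t)((val >> (b - lsb)) & 1u);
        img[b / 32] = (img[b / 32] & ~(1u << (b % 32))) | (bit << (b % 32));
    }
}

/* Program one host profile entry (host_id) with the given AXI4-MM steering. */
static void qdma_program_host_profile(uint32_t host_id, uint32_t steering)
{
    uint32_t img[8] = {0};

    ctxt_set_field(img, 201, 192, 0x100);     /* smid: bit 8 set, example value */
    ctxt_set_field(img, 181, 178, steering);  /* H2C AXI4-MM Steering           */
    ctxt_set_field(img,  97,  94, steering);  /* C2H AXI4-MM Steering           */

    for (int i = 0; i < 8; i++)
        qdma_reg_wr(QDMA_IND_CTXT_DATA_BASE + 4u * (uint32_t)i, img[i]);

    /* qid[19:7] = Host ID, op[6:5] = WR, sel[4:1] = HOST_PROFILE, busy[0] = 0.
     * host_id 0 gives 0x34 and host_id 1 gives 0xB4, matching the text above. */
    qdma_reg_wr(QDMA_IND_CTXT_CMD,
                (host_id << 7) | (CTXT_OP_WR << 5) | (CTXT_SEL_HOST_PROFILE << 1));
}

For example, qdma_program_host_profile(0, 0x0) targets NoC Channel 0 and
qdma_program_host_profile(1, 0x1) targets NoC Channel 1.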

System Management

Resets

The QDMA supports all the PCIe-defined resets, such as link down, reset, hot reset, and function level
reset (FLR; only Quiesce mode is supported).

VDM

Vendor Defined Messages (VDMs) are an expansion of the existing messaging capabilities with PCI
Express. PCI Express Specification defines additional requirements for Vendor Defined Messages,
header formats and routing information. For details, see PCI-SIG Specifications
(https://www.pcisig.com/specifications).
QDMA allows the transmission and reception of VDMs. To enable this feature, select Enable Bridge
Slave Mode in the Vivado Customize IP dialog box. This enables the st_rx_msg interface.
RX Vendor Defined Messages are stored in a shallow FIFO before they are transmitted to the output
port. When many back-to-back VDM messages arrive, the FIFO overflows and those messages are
dropped, so it is better to space VDM messages at regular intervals.
Throughput for VDMs depends on several factors: PCIe speed, data width, message length, and the
internal VDM pipeline.
For network on chip (NoC) access, the internal VDM pipeline must be replaced with the internal RX
VDM FIFO interface, which has a shallow buffer of 64B.
✎ Note: New VDM messages will be dropped if more than 64B of VDM are received before the FIFO
is serviced through the NoC.
The internal RX VDM FIFO interface cannot handle back-to-back messages. The pipeline throughput
can only handle one in every four accesses, which is about 25% efficiency for host accesses.
‼ Important: Do not use back-to-back VDM access.
RX Vendor Defined Messages:

1. When the QDMA receives a VDM, the incoming message is received on the st_rx_msg port.
2. The incoming data stream is captured on the st_rx_msg_data port (per DW).
3. The user application needs to drive st_rx_msg_rdy to signal whether it can accept the incoming
VDMs.
4. Once st_rx_msg_rdy is High, the incoming VDM is forwarded to the user application.
5. The user application needs to store the incoming VDMs and keep track of how many packets
were received.

TX Vendor Defined Messages:

1. To enable transmission of VDM from QDMA, program the TX Message registers in the Bridge
through the AXI4 Slave interface.
2. Bridge has TX Message Control, Header L (bytes 8-11), Header H (bytes 12-15) and TX
Message Data registers as shown in the PCIe TX Message Data FIFO Register
(TX_MSG_DFIFO).
3. Issue a Write to offset 0xE64 through AXI4 Slave interface for the TX Message Header L
register.
4. Program offset 0xE68 for the required VDM TX Header H register.
5. Program up to 16DW of Payload for the VDM message starting from DW0 – DW15 by sending
Writes to offset 0xE6C one by one.
6. Program the msg_routing, msg_code, data length, requester function field and msg_execute
field in the TX_MSG_CTRL register in offset 0xE60 to send the VDM TX packet.
7. The TX Message Control register also indicates the completion status of the message in bit 23.
Read this bit to confirm the successful transmission of the VDM packet.
8. All fields in these registers are RW, except bit 23 (msg_fail) in the TX Message Control register,
which is cleared by writing a 1.
9. The VDM TX packet is sent on the AXI-ST RQ transmit interface. A register-programming sketch
of this sequence follows the list.
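
The following is a minimal C sketch of this TX VDM programming sequence. brdg_wr and brdg_rd are
hypothetical 32-bit accessors for the Bridge registers reached through the AXI4 Slave interface, and
the TX_MSG_CTRL field layout (msg_routing, msg_code, length, requester function, msg_execute) is
not reproduced here, so the caller is assumed to compose that control word per the register reference.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical 32-bit accessors for the Bridge register space (AXI4 Slave). */
extern void     brdg_wr(uint32_t offset, uint32_t val);
extern uint32_t brdg_rd(uint32_t offset);

#define TX_MSG_CTRL   0xE60u      /* msg_routing/msg_code/length/function/execute */
#define TX_MSG_HDR_L  0xE64u      /* VDM header bytes 8-11                        */
#define TX_MSG_HDR_H  0xE68u      /* VDM header bytes 12-15                       */
#define TX_MSG_DFIFO  0xE6Cu      /* payload DW0..DW15, one write per DW          */
#define TX_MSG_FAIL   (1u << 23)  /* msg_fail/completion status, write 1 to clear */

/* Send one VDM with up to 16 DWs of payload. 'ctrl' must be composed per the
 * TX_MSG_CTRL field layout in the register reference.                          */
static bool send_vdm(uint32_t hdr_l, uint32_t hdr_h,
                     const uint32_t *payload, unsigned ndw, uint32_t ctrl)
{
    brdg_wr(TX_MSG_HDR_L, hdr_l);
    brdg_wr(TX_MSG_HDR_H, hdr_h);

    for (unsigned i = 0; i < ndw && i < 16; i++)
        brdg_wr(TX_MSG_DFIFO, payload[i]);    /* DW0 .. DW15                     */

    brdg_wr(TX_MSG_CTRL, ctrl);               /* sends the VDM TX packet         */

    /* Bit 23 reports the completion status; clear it by writing 1 if it is set. */
    if (brdg_rd(TX_MSG_CTRL) & TX_MSG_FAIL) {
        brdg_wr(TX_MSG_CTRL, TX_MSG_FAIL);    /* clear msg_fail (write 1)        */
        return false;
    }
    return true;
}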

Expansion ROM

If selected, the Expansion ROM is activated and can be a value from 2 KB to 4 GB. According to the
PCI Local Bus Specification ( PCI-SIG Specifications (https://www.pcisig.com/specifications)), the
maximum size for the Expansion ROM BAR should be no larger than 16 MB. Selecting an address
space larger than 16 MB can result in a non-compliant core.

Errors

Bridge Errors

Slave Bridge Abnormal Conditions

Slave bridge abnormal conditions are classified as: Illegal Burst Type and Completion TLP Errors. The
following sections describe the manner in which the Bridge handles these errors.
Illegal Burst Type
The slave bridge monitors AXI read and write burst type inputs to ensure that only the INCR
(incrementing burst) type is requested. Any other value on these inputs is treated as an error condition
and the Slave Illegal Burst (SIB) interrupt is asserted. In the case of a read request, the Bridge
asserts SLVERR for all data beats and arbitrary data is placed on the Slave AXI4-MM read data bus.

In the case of a write request, the Bridge asserts SLVERR for the write response and all write data is
discarded.
Completion TLP Errors
Any request to the bus for PCIe (except for a posted Memory write) requires a completion TLP to
complete the associated AXI request. The Slave side of the Bridge checks the received completion
TLPs for errors and checks for completion TLPs that are never returned (Completion Timeout). Each
of the completion TLP error types are discussed in the subsequent sections.
Unexpected Completion
When the slave bridge receives a completion TLP, it matches the header RequesterID and Tag to the
outstanding RequesterID and Tag. A match failure indicates the TLP is an Unexpected Completion
which results in the completion TLP being discarded and a Slave Unexpected Completion (SUC)
interrupt strobe being asserted. Normal operation then continues.
Unsupported Request
A device for PCIe might not be capable of satisfying a specific read request. For example, if the read
request targets an unsupported address for PCIe, the completer returns a completion TLP with a
completion status of 0b001 - Unsupported Request. The completer that returns a completion TLP
with a completion status of Reserved must be treated as an unsupported request status, according to
the PCI Express Base Specification v3.0. When the slave bridge receives an unsupported request
response, the Slave Unsupported Request (SUR) interrupt is asserted and the DECERR response is
asserted with arbitrary data on the AXI4 memory mapped bus.
Completion Timeout
A Completion Timeout occurs when a completion (Cpl) or completion with data (CplD) TLP is not
returned after an AXI to PCIe memory read request, or after a PCIe Configuration Read/Write request.
For PCIe Configuration Read/Write request, completions must complete within the C_COMP_TIMEOUT
parameter selected value from the time the request is issued. For PCIe Memory Read request,
completions must complete within the value set in the Device Control 2 register in the PCIe
Configuration Space register. When a completion timeout occurs, an OKAY response is asserted with
all 1s data on the memory mapped AXI4 bus.
Poison Bit Received on Completion Packet
An Error Poison occurs when the completion TLP EP bit is set, indicating that there is poisoned data
in the payload. When the slave bridge detects the poisoned packet, the Slave Error Poison (SEP)
interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.
Completer Abort
A Completer Abort occurs when the completion TLP completion status is 0b100 - Completer Abort.
This indicates that the completer has encountered a state in which it was unable to complete the
transaction. When the slave bridge receives a completer abort response, the Slave Completer Abort
(SCA) interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.

Table: Slave Bridge Response to Abnormal Conditions

Transfer Type | Abnormal Condition | Bridge Response
Read | Illegal burst type | SIB interrupt is asserted. SLVERR response given with arbitrary read data.
Write | Illegal burst type | SIB interrupt is asserted. Write data is discarded. SLVERR response given.
Read | Unexpected completion | SUC interrupt is asserted. Completion is discarded.
Read | Unsupported Request status returned | SUR interrupt is asserted. DECERR response given with arbitrary read data.
Read | Completion timeout | SCT interrupt is asserted. SLVERR response given with arbitrary read data. Completion data is discarded.
Read | Poison bit in completion | SEP interrupt is asserted. SLVERR response given with arbitrary read data.
Read | Completer Abort (CA) status returned | SCA interrupt is asserted. SLVERR response given with arbitrary read data.

Master Bridge Abnormal Conditions

The following sections describe the manner in which the master bridge handles abnormal conditions.
AXI DECERR Response
When the master bridge receives a DECERR response from the AXI bus, the request is discarded
and the Master DECERR (MDE) interrupt is asserted. If the request was non-posted, a completion
packet with the Completion Status = Unsupported Request (UR) is returned on the bus for PCIe.
AXI SLVERR Response
When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.
Max Payload Size for PCIe, Max Read Request Size
Completion Packets
When the MAX_READ_REQUEST_SIZE is greater than the MAX_PAYLOAD_SIZE, a read request for PCIe
can ask for more data than the master bridge can insert into a single completion packet. When this

situation occurs, multiple completion packets are generated up to MAX_PAYLOAD_SIZE, with the Read
Completion Boundary (RCB) observed.
Poison Bit
When the poison bit is set in a transaction layer packet (TLP) header, the payload following the
header is corrupted. When the master bridge receives a memory request TLP with the poison bit set,
it discards the TLP and asserts the Master Error Poison (MEP) interrupt strobe.
Zero Length Requests
When the master bridge receives a read request with the Length = 0x1, FirstBE = 0x00, and LastBE =
0x00, it responds by sending a completion with Status = Successful Completion.
When the master bridge receives a write request with the Length = 0x1, FirstBE = 0x00, and LastBE
= 0x00 there is no effect.

Table: Master Bridge Response to Abnormal Conditions

Transfer Type | Abnormal Condition | Bridge Response
Read | DECERR response | MDE interrupt strobe asserted. Completion returned with Unsupported Request status.
Write | DECERR response | MDE interrupt strobe asserted.
Read | SLVERR response | MSE interrupt strobe asserted. Completion returned with Completer Abort status.
Write | SLVERR response | MSE interrupt strobe asserted.
Write | Poison bit set in request | MEP interrupt strobe asserted. Data is discarded.

Linkdown Errors

If the PCIe link goes down during DMA operations, transactions might be lost and the DMA might not
be able to complete. In such cases, the AXI4 interfaces continue to operate. Outstanding read
requests on the C2H Bridge AXI4 MM interface receive correct completions or completions with a
slave error response. The DMA logs a link down error in the status register. It is the responsibility of
the driver to have a timeout and handle recovery of a link down situation.

Data Path Errors

Data protection is supported on the primary data paths. CRC errors can occur on C2H streaming and
H2C streaming. Parity errors can occur on the Memory Mapped, Bridge Master, and Bridge Slave
interfaces. Errors on the write payload can occur on C2H streaming, Memory Mapped, and Bridge
Slave. A double-bit error on the write payload or on read completions for the Bridge Slave interface
causes a parity error. Parity errors on requests to the PCIe are dropped by the core, and a fatal error is
logged by the PCIe. Parity errors are not recoverable and can result in unexpected behavior. Any DMA
during and after the parity error should be considered invalid. If there is a parity error and the transfer
hangs or stops, the DMA logs the error. You must investigate and fix the parity issues. Once the issues
are fixed, clear that queue and reopen the queue to start a new transfer.

DMA Errors

All DMA errors are logged in their respective error status registers. Each block has an error status and
an error mask register, so errors can be passed on to a higher level and eventually to the
QDMA_GLBL_ERR_STAT register.
Errors can be fatal based on register settings. If there is a fatal error, the DMA stops the transfer and
sends an interrupt if enabled. After debug and analysis, you must invalidate and restart the queue to
start the DMA transfer.

Error Aggregator

There are Leaf Error Aggregators in different places. They log the errors and propagate them to the
central place. The Central Error Aggregator aggregates the errors from all of the Leaf Error
Aggregators.
The QDMA_GLBL_ERR_STAT register is the error status register of the Central Error Aggregator. The
bit fields indicate the locations of Leaf Error Aggregators. Then, look for the error status register of the
individual Leaf Error Aggregator to find the exact error.
The register QDMA_GLBL_ERR_MASK is the error mask register of the Central Error Aggregator. It
has the mask bits for the corresponding errors. When the mask bit is set to 1'b1, it will enable the
corresponding error to be propagated to the next level to generate an interrupt. Detailed information
about the interrupt generated for errors is described in the interrupt section. The error interrupt is
controlled by the QDMA_GLBL_ERR_INT register (0xB04).
Each Leaf Error Aggregator has an error status register and an error mask register. The error status
register logs the error. The hardware sets the bit when the error happens, and the software can write
1'b1 to clear the bit if needed. The error mask register has the mask bits for the corresponding errors.
When the mask bit is set to 1'b1, it will enable the propagation of the corresponding error to the
Central Error Aggregator. The error mask register does not affect the error logging to the error status
register.

Figure: Error Aggregator


The error status registers and the error mask registers of the Leaf Error Aggregators are as follows.

C2H Streaming Error


QDMA_C2H_ERR_STAT (0xAF0): This is the error status register of the C2H streaming errors.
QDMA_C2H_ERR_MASK (0xAF4): This is the error mask register. The software can set a bit to
enable the corresponding C2H streaming error to be propagated to the Central Error Aggregator.
QDMA_C2H_FIRST_ERR_QID (0xB30): This is the QID of the first C2H streaming error.
A sketch showing how these leaf registers can be serviced follows this list.
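
As an illustration, the following C sketch services this C2H streaming leaf aggregator: it enables
propagation to the Central Error Aggregator through the mask register, then reads, decodes, and
clears the status register. qdma_reg_rd and qdma_reg_wr are hypothetical 32-bit accessors for the
QDMA register space.

#include <stdint.h>

/* Hypothetical 32-bit accessors for the QDMA register space. */
extern uint32_t qdma_reg_rd(uint32_t offset);
extern void     qdma_reg_wr(uint32_t offset, uint32_t val);

#define QDMA_C2H_ERR_STAT       0xAF0u
#define QDMA_C2H_ERR_MASK       0xAF4u
#define QDMA_C2H_FIRST_ERR_QID  0xB30u

static void c2h_leaf_error_service(void)
{
    /* Mask bit = 1 lets the corresponding error propagate to the Central
     * Error Aggregator (it does not affect logging in the status register). */
    qdma_reg_wr(QDMA_C2H_ERR_MASK, 0xFFFFFFFFu);

    uint32_t stat = qdma_reg_rd(QDMA_C2H_ERR_STAT);
    if (stat) {
        uint32_t qid = qdma_reg_rd(QDMA_C2H_FIRST_ERR_QID);
        /* ... decode 'stat' and handle the queue identified by 'qid' ...     */
        (void)qid;
        qdma_reg_wr(QDMA_C2H_ERR_STAT, stat);   /* write 1'b1 to clear         */
    }
}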

C2H MM Error
QDMA_C2H MM Status (0x1040)
C2H MM Error Code Enable Mask (0x1054)
C2H MM Error Code (0x1058)
C2H MM Error Info (0x105C)

QDMA H2C0 MM Error


H2C0 MM Status (0x1240)
H2C MM Error Code Enable Mask (0x1254)
H2C MM Error Code (0x1258)
H2C MM Error Info (0x125C)

TRQ Error
QDMA_GLBL_TRQ_ERR_STS (0x264): This is the error status register of the Trq errors.
QDMA_GLBL_TRQ_ERR_MSK (0x268): This is the error mask register.
QDMA_GLBL_TRQ_ERR_LOG_A (0x26C): This is the error logging register. It shows the select,
function and the address of the access when the error happens.


Descriptor Error
QDMA_GLBL_DSC_ERR_STS (0x254): This is the error status register of the descriptor errors.
QDMA_GLBL_DSC_ERR_MSK (0x258): This is the error mask register.
QDMA_GLBL_DSC_ERR_LOG0 (0x25C): This is the error logging register. It has the QID, DMA
direction, and the consumer index of the error.

RAM Double Bit Error


QDMA_RAM_DBE_STS_A (0xFC)
QDMA_RAM_DBE_MSK_A (0xF8)

RAM Single Error


QDMA_RAM_SBE_STS_A (0xF4)
QDMA_RAM_SBE_MSK_A (0xF0)

C2H Streaming Fatal Error Handling

QDMA_C2H_FATAL_ERR_STAT (0xAF8): The error status register of the C2H streaming fatal
errors.
QDMA_C2H_FATAL_ERR_MASK (0xAFC): The error mask register. The SW can set the bit to
enable the corresponding C2H fatal error to be sent to the C2H fatal error handling logic.
QDMA_C2H_FATAL_ERR_ENABLE (0xB00): This register enables two C2H streaming fatal
error handling processes (a programming sketch follows this list):

bit[0]
Stop the data transfer by disabling the write request from the C2H DMA write engine by
setting enable_wrq_dis bit [0] to 1.

bit[1]
Invert the write payload parity on the data transfer by setting enable_wpl_par_inv bit [1]
to 1.
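
A minimal C sketch of this programming follows, assuming a hypothetical 32-bit register write helper
(qdma_reg_wr): it unmasks all C2H streaming fatal errors and selects the bit[0] behavior (stop the data
transfer by disabling write requests from the C2H DMA write engine).

#include <stdint.h>

/* Hypothetical 32-bit register write into the QDMA register space. */
extern void qdma_reg_wr(uint32_t offset, uint32_t val);

#define QDMA_C2H_FATAL_ERR_MASK    0xAFCu
#define QDMA_C2H_FATAL_ERR_ENABLE  0xB00u
#define ENABLE_WRQ_DIS             (1u << 0)  /* stop C2H write engine requests */
#define ENABLE_WPL_PAR_INV         (1u << 1)  /* invert write payload parity    */

static void c2h_fatal_error_setup(void)
{
    /* Route all C2H fatal errors to the fatal error handling logic. */
    qdma_reg_wr(QDMA_C2H_FATAL_ERR_MASK, 0xFFFFFFFFu);

    /* Select the "stop the data transfer" handling process (bit[0]). */
    qdma_reg_wr(QDMA_C2H_FATAL_ERR_ENABLE, ENABLE_WRQ_DIS);
}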

Port Descriptions

AMD Versal™ device CPM5 has two QDMA IPs, which can be selected in the AMD Vivado™ IP
customization GUI.

Controller 0 QDMA (QDMA0)


Controller 1 QDMA (QDMA1)

Based on the GUI selection, QDMA0 or QDMA1 ports will be enabled. Ports described below have a
prefix of dma<n>_, which can be dma0_ for QDMA Port 0 or dma1_ for QDMA Port 1.
✎ Note: Ports without prefix dma0_ or dma1_ are global ports.


Table: Parameters

Parameter Name | Description
PL_LINK_CAP_MAX_LINK_WIDTH | PHY lane width
C_M_AXI_ADDR_WIDTH | AXI4 Master interface address width
C_M_AXI_ID_WIDTH | AXI4 Master interface ID width
C_M_AXI_DATA_WIDTH | AXI4 Master interface data width (512 bits)
C_S_AXI_ID_WIDTH | AXI4 Bridge Slave interface ID width
C_S_AXI_ADDR_WIDTH | AXI4 Bridge Slave interface address width
C_S_AXI_DATA_WIDTH | AXI4 Bridge Slave interface data width (512 bits)
AXI_DATA_WIDTH | AXI4 DMA transfer data width (512 bits)

QDMA Global Ports

Table: QDMA Global Port Descriptions

Port Name | I/O | Description
gt_refclk0_clk_p/gt_refclk0_clk_n | I | GT reference clock.
PCIE0_GT_gtx_n/PCIE0_GT_gtx_p [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | O | PCIe TX serial interface.
PCIE0_GT_grx_n/PCIE0_GT_grx_p [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | I | PCIe RX serial interface.
dma<n>_intrfc_clk | I | User clock input. Connect this pin to an independent clock source. All DMA interface outputs and inputs need to be driven/flopped from this clock source.
dma<n>_intrfc_resetn | I | Interface reset signal. Release this signal (1'b1) after the clock dma<n>_intrfc_clk is stable; until then, all the interface signals are held in the reset state.
cpm_pl_axi0_clk | I | AXI4-MM interface clock for channel 0. This clock is exposed only when QDMA1 with the AXI4-MM interface to PL is enabled; the maximum frequency should be 250 MHz.
cpm_pl_axi1_clk | I | AXI4-MM interface clock for channel 1. This clock is exposed only when QDMA1 with the AXI4-MM interface to PL is enabled; the maximum frequency should be 250 MHz.
dma<n>_axi_aresetn | O | User reset out, synchronous to the clock provided on dma<n>_intrfc_clk. This reset should drive all interface reset logic. dma<n>_axi_aresetn is independent of the PCIe reset. When the DMA engine comes out of reset, dma<n>_axi_aresetn is de-asserted.
cpm_misc_irq | O | Reserved (future use).
cpm_cor_irq | O | Reserved (future use).
cpm_uncor_irq | O | Reserved (future use).
cpm_irq0 | I | Reserved (future use); tie this port to 1'b0.
cpm_irq1 | I | Reserved (future use); tie this port to 1'b0.

AXI Slave Interface

AXI Bridge Slave ports are connected from the AMD Versal device Network on Chip (NoC) to the
CPM DMA internally. For Slave Bridge AXI4 details, see the Versal Adaptive SoC Programmable
Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
To access QDMA registers, you must follow the protocols outlined in the AXI Slave Bridge Register
Limitations section.

AXI4 Memory Mapped Interface

Controller 0 QDMA AXI4 Memory Mapped interface (QDMA0)


QDMA0 IP AXI4 Memory Mapped (MM) Master ports are connected from the CPM to the AMD Versal
device Network on Chip (NoC) internally. For details, see the Versal Adaptive SoC Programmable
Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313). QDMA0
AXI4 MM Master interface can be connected to the DDR memory, or to the PL user logic, depending
on the NoC configuration.


Controller 1 QDMA AXI4 Memory Mapped interface (QDMA1)


QDMA1 AXI4 Memory Mapped ports can be directed to the NoC or to PL user logic. If the QDMA1 AXI4
MM ports to NoC are enabled, see the NoC details in the Versal Adaptive SoC Programmable Network
on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
If the QDMA1 AXI4 MM ports to PL user logic are enabled, the table below lists all ports. There are two
sets of AXI4 MM ports: one for Channel 0 and one for Channel 1. Based on the context settings, AXI4
MM transfers are directed to the AXI4 MM Channel 0 or AXI4 MM Channel 1 ports.

Table: AXI4 Memory Mapped DMA Read Address Interface Signals

Signal Name Direction Description

CPM_PL_AXI0_araddr O This signal is the address for a memory mapped read


[C_M_AXI_ADDR_WIDTH- to the user logic from the DMA.
1:0]
CPM_PL_AXI1_araddr
[C_M_AXI_ADDR_WIDTH-
1:0]

CPM_PL_AXI0_arid [3:0] O Standard AXI4 description, which is found in the AXI4


CPM_PL_AXI1_arid [3:0] Protocol Specification AMBA AXI4-Stream Protocol
Specification (ARM IHI 0051A).

CPM_PL_AXI0_aruser[31:0] O CPM_PL_AXI<0/1>_aruser[18:0] = reserved


CPM_PL_AXI1_aruser[31:0] CPM_PL_AXI<0/1>_aruser[31:19] = queue number

CPM_PL_AXI0_arlen[7:0] O Master read burst length.


CPM_PL_AXI1_arlen[7:0]

CPM_PL_AXI0_arsize[2:0] O Master read burst size.


CPM_PL_AXI1_arsize[2:0]

CPM_PL_AXI0_arprot[2:0] O Protection type.


CPM_PL_AXI1_arprot[2:0]

CPM_PL_AXI0_arvalid O The assertion of this signal means there is a valid


CPM_PL_AXI1_arvalid read request to the address on
CPM_PL_AXI<0/1>_araddr.

CPM_PL_AXI0_arready I Master read address ready.


CPM_PL_AXI1_arready

CPM_PL_AXI0_arlock O Lock type.


CPM_PL_AXI1_arlock

CPM_PL_AXI0_arcache[3:0] O Memory type.


CPM_PL_AXI1_arcache[3:0]


CPM_PL_AXI0_arburst[1:0] O Master read burst type.


CPM_PL_AXI1_arburst[1:0]

Table: AXI4 Memory Mapped DMA Read Interface Signals

Signal Name Direction Description

CPM_PL_AXI0_rdata I Master read data.


[C_M_AXI_DATA_WIDTH-
1:0]
CPM_PL_AXI1_rdata
[C_M_AXI_DATA_WIDTH-
1:0]

CPM_PL_AXI0_rid [3:0] I Master read ID.


CPM_PL_AXI1_rid [3:0]

CPM_PL_AXI0_rresp[1:0] I Master read response.


CPM_PL_AXI1_rresp[1:0]

CPM_PL_AXI0_rlast I Master read last.


CPM_PL_AXI1_rlast

CPM_PL_AXI0_rvalid I Master read valid.


CPM_PL_AXI1_rvalid

CPM_PL_AXI0_rready O Master read ready.


CPM_PL_AXI1_rready

CPM_PL_AXI0_ruser I Master read odd data parity, per byte. This port is
[C_M_AXI_DATA_WIDTH/8- enabled only in Data Protection mode.
1:0]
CPM_PL_AXI1_ruser
[C_M_AXI_DATA_WIDTH/8-
1:0]

Table: AXI4 Memory Mapped DMA Write Address Interface Signals

Signal Name Direction Description

CPM_PL_AXI0_awaddr O This signal is the address for a memory mapped write


[C_M_AXI_ADDR_WIDTH- to the user logic from the DMA.
1:0]
CPM_PL_AXI1_awaddr
[C_M_AXI_ADDR_WIDTH-
1:0]


CPM_PL_AXI0_awid[3:0] O Master write address ID.


CPM_PL_AXI1_awid[3:0]

CPM_PL_AXI0_awuser[31:0] O CPM_PL_AXI<0/1>_awuser[18:0] = reserved


CPM_PL_AXI1_awuser[31:0] CPM_PL_AXI<0/1>_awuser[31:19] = queue number

CPM_PL_AXI0_awlen[7:0] O Master write address length.


CPM_PL_AXI1_awlen[7:0]

CPM_PL_AXI0_awsize[2:0] O Master write address size.


CPM_PL_AXI1_awsize[2:0]

CPM_PL_AXI0_awburst[1:0] O Master write address burst type.


CPM_PL_AXI1_awburst[1:0]

CPM_PL_AXI0_awprot[2:0] O Protection type.


CPM_PL_AXI1_awprot[2:0]

CPM_PL_AXI0_awvalid O The assertion of this signal means there is a valid
CPM_PL_AXI1_awvalid write request to the address on
CPM_PL_AXI<0/1>_awaddr.

CPM_PL_AXI0_awready I Master write address ready.


CPM_PL_AXI1_awready

CPM_PL_AXI0_awlock O Lock type.


CPM_PL_AXI1_awlock

CPM_PL_AXI0_awcache[3:0] O Memory type.


CPM_PL_AXI1_awcache[3:0]

Table: AXI4 Memory Mapped DMA Write Interface Signals

Signal Name Direction Description

CPM_PL_AXI0_wdata O Master write data.


[C_M_AXI_DATA_WIDTH-
1:0]
CPM_PL_AXI1_wdata
[C_M_AXI_DATA_WIDTH-
1:0]

CPM_PL_AXI0_wlast O Master write last.


CPM_PL_AXI1_wlast

CPM_PL_AXI0_wstrb[31:0] O Master write strobe.


CPM_PL_AXI1_wstrb[31:0]


CPM_PL_AXI0_wvalid O Master write valid.


CPM_PL_AXI1_wvalid

CPM_PL_AXI0_wready I Master write ready.


CPM_PL_AXI1_wready

CPM_PL_AXI0_wuser O Master write user.


[C_M_AXI_DATA_WIDTH/8- CPM_PL_AXI<0/1>_wuser[C_M_AXI_DATA_WIDTH/8-
1:0] 1:0] = write data odd parity, per byte. This port is
CPM_PL_AXI1_wuser enabled only in Data Protection mode.
[C_M_AXI_DATA_WIDTH/8-
1:0]

Table: AXI4 Memory Mapped DMA Write Response Interface Signals

Signal Name Direction Description

CPM_PL_AXI0_bvalid I Master write response valid.


CPM_PL_AXI1_bvalid

CPM_PL_AXI0_bresp[1:0] I Master write response.


CPM_PL_AXI1_bresp[1:0]

CPM_PL_AXI0_bid[3:0] I Master response ID.


CPM_PL_AXI1_bid[3:0]

CPM_PL_AXI0_bready O Master response ready.


CPM_PL_AXI1_bready

AXI4-Lite Master Interface

AMD Versal device Network on Chip (NoC) provides only AXI4 interface. If you need AXI4-Lite
interface use SmartConnect IP to convert NoC output AXI4 interface to AXI4-Lite interface.
For details, see SmartConnect LogiCORE IP Product Guide (PG247).

AXI4-Stream H2C Ports

Table: AXI4-Stream H2C Port Descriptions

Port Name I/O Description

dma<n>_m_axis_h2c_tdata O Data output for H2C AXI4-Stream.


[AXI_DATA_WIDTH-1:0]

dma<n>_m_axis_h2c_tcrc O 32-bit CRC value for that beat.


[31:0] IEEE 802.3 CRC-32 Polynomial


dma<n>_m_axis_h2c_qid[1:0] O Queue ID

dma<n>_m_axis_h2c_port_id[2:0] O Port ID

dma<n>_m_axis_h2c_err O If set, indicates the packet has an error. The error


could come from the PCIe, or the error could be in the
DMA transfer. AMD recommends that you look at the
error registers and context for details.
When the DMA first detects the error, the error bit will
be set to 1. And the error bit will be set for the
remainder of that packet. The error bit will be reset to
0 for the next packet if there are no errors in that
packet.

dma<n>_m_axis_h2c_mdata[31:0] O Metadata
In internal mode, QDMA passes the lower 32 bits of
the H2C AXI4-Stream descriptor on this field.

dma<n>_m_axis_h2c_mty[5:0] O The number of bytes that are invalid on the last beat
of the transaction. This field is 0 for a 64B transfer.

dma<n>_m_axis_h2c_zero_byte O When set, it indicates that the current beat is an


empty beat (zero bytes are being transferred).

dma<n>_m_axis_h2c_tvalid O Valid

dma<n>_m_axis_h2c_tlast O Indicates that this is the last cycle of the packet


transfer

dma<n>_m_axis_h2c_tready I Ready

AXI4-Stream C2H Ports

Table: AXI4-Stream C2H Port Descriptions

Port Name I/O Description

dma<n>_s_axis_c2h_tdata I It supports 4 data widths: 64 bits, 128 bits, 256 bits,


[AXI_DATA_WIDTH-1:0] and 512 bits. Every C2H data packet has a
corresponding C2H completion packet

dma<n>_s_axis_c2h_tcrc I 32 bit CRC value for that beat


[31:0] IEEE 802.3 CRC-32 Polynomial
IP samples CRC value only when
dma<n>_s_axis_c2h_tlast is asserted


dma<n>_s_axis_c2h_ctrl_len I Length of the packet. For ZERO byte write, the length
[15:0] is 0.
C2H stream packet data length is limited to 31 * c2h
buffer size
ctrl_len is in bytes and should be valid in first beat of
the packet

dma<n>_s_axis_c2h_ctrl_qid I Queue ID
[11:0]

dma<n>_s_axis_c2h_ctrl_has_cmpt I 1'b1: The data packet has a completion;


1'b0: The data packet doesn't have a completion

dma<n>_s_axis_c2h_ctrl_marker I Marker message used for making sure pipeline is


completely flushed. After that, you can safely do
queue invalidation

dma<n>_s_axis_c2h_ctrl_port_id I Port ID
[2:0]

dma<n>_s_axis_c2h_ecc[6:0] I Output of the AMD Error Correction Code (ECC) core.


ECC IP input is described below

dma<n>_s_axis_c2h_mty I Empty byte should be set in last beat


[5:0]

dma<n>_s_axis_c2h_tvalid I Valid

dma<n>_s_axis_c2h_tlast I Indicate last packet

dma<n>_s_axis_c2h_tready O Ready

To generate the ECC signals for the C2H control bus dma<n>_s_axis_c2h_ctrl_ecc[6:0], use the AMD
Error Correction Code (ECC) IP. The input signals to the ECC IP are listed below, and you must
maintain the order shown in the list.

Input to ECC IP using ecc_data_in[56:0]:

assign ecc_data_in[56:0] = {24'h0,                            // reserved
                            dma<n>_s_axis_c2h_ctrl_has_cmpt,  // has_cmpt
                            dma<n>_s_axis_c2h_ctrl_marker,    // marker
                            dma<n>_s_axis_c2h_ctrl_port_id,   // port_id
                            dma<n>_s_axis_c2h_ctrl_qid,       // qid
                            dma<n>_s_axis_c2h_ctrl_len};      // length

AXI4-Stream C2H Completion Ports

Table: AXI4-Stream C2H Completion Port Descriptions

Port Name I/O Description

dma<n>_s_axis_c2h_cmpt_data[511:0]
I Completion data from the user application. This
contains information that is written to the completion
ring in the host.

dma<n>_s_axis_c2h_cmpt_size I 00: 8B completion.


[1:0] 01: 16B completion.
10: 32B completion.
11: 64B completion

dma<n>_s_axis_c2h_cmpt_dpar I Odd parity computed per 32 bits.
[15:0] dma<n>_s_axis_c2h_cmpt_dpar[0] is parity over
dma<n>_s_axis_c2h_cmpt_data[31:0],
dma<n>_s_axis_c2h_cmpt_dpar[1] is parity over
dma<n>_s_axis_c2h_cmpt_data[63:32], and so on.
A parity computation sketch follows this table.

dma<n>_s_axis_c2h_cmpt_ctrl_qid[1:0]
I Completion queue ID.

dma<n>_s_axis_c2h_cmpt_ctrl_marker
I Marker message used for making sure pipeline is
completely flushed. After that, you can safely do
queue invalidation.

dma<n>_s_axis_c2h_cmpt_ctrl_user_trig
I User can trigger the interrupt and the status descriptor
write if they are enabled.

dma<n>_s_axis_c2h_cmpt_ctrl_cmpt_type[1:0]
I 2’b00: NO_PLD_NO_WAIT. The CMPT packet does
not have a corresponding payload packet, and it does
not need to wait.
2’b01: NO_PLD_BUT_WAIT. The CMPT packet does
not have a corresponding payload packet; however, it
still needs to wait for the payload packet to be sent
before sending the CMPT packet.
2’b10: RSVD.
2’b11: HAS_PLD. The CMPT packet has a
corresponding payload packet, and it needs to wait for
the payload packet to be sent before sending the
CMPT packet.

dma<n>_s_axis_c2h_cmpt_ctrl_wait_pld_pkt_id[15:0]
I The data payload packet ID that the CMPT packet
needs to wait for before it can be sent.

dma<n>_s_axis_c2h_cmpt_ctrl_port_id[2:0]
I Port ID.


dma<n>_s_axis_c2h_cmpt_ctrl_col_idx[2:0]
I Color index that defines if the user wants to have the
color bit in the CMPT packet and the bit location of the
color bit if present.

dma<n>_s_axis_c2h_cmpt_ctrl_err_idx[2:0]
I Error index that defines if the user wants to have the
error bit in the CMPT packet and the bit location of the
error bit if present.

dma<n>_s_axis_c2h_cmpt_ctrl_no_wrb_marker
I Disables CMPT packet during Marker transfer.
1'b0 : CMPT packets are sent to CMPT ring
1'b1 : CMPT packets are not sent to CMPT ring.

dma<n>_s_axis_c2h_cmpt_tvalid I Valid. dma<n>_s_axis_c2h_cmpt_tvalid must be


asserted until dma<n>_s_axis_c2h_cmpt_tready is
asserted.

dma<n>_s_axis_c2h_cmpt_tready O Ready.
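
The following C model illustrates the per-32-bit odd parity referenced in the
dma<n>_s_axis_c2h_cmpt_dpar description above. It assumes the usual odd-parity convention (the
parity bit makes the 33-bit group, data plus parity, contain an odd number of ones); confirm the
convention against the register reference before relying on it.

#include <stdint.h>

/* Odd parity over one 32-bit word. */
static inline uint32_t odd_parity32(uint32_t w)
{
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return (~w) & 1u;   /* 1 when the data word has an even number of ones */
}

/* dpar bit i covers cmpt_data[32*i + 31 : 32*i] for a 512-bit completion. */
static uint32_t cmpt_dpar(const uint32_t data[16])
{
    uint32_t dpar = 0;
    for (int i = 0; i < 16; i++)
        dpar |= odd_parity32(data[i]) << i;
    return dpar;
}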

AXI4-Stream Status Ports

Table: AXI-ST C2H Status Port Descriptions

Port Name I/O Description

dma<n>_axis_c2h_status_valid O Valid per descriptor.

dma<n>_axis_c2h_status_qid O QID of the packet.


[10:0]

dma<n>_axis_c2h_status_drop O The QDMA drops the packet if it does not have


enough descriptors to transfer the full packet to the
host. This bit indicates if the packet was dropped or
not. A packet that is not dropped is considered as
having been accepted.
0: Packet is not dropped.
1: Packet is dropped.

dma<n>_axis_c2h_status_last O Last descriptor.

dma<n>_axis_c2h_status_cmp O 0: Dropped packet or C2H packet with has_cmpt of


1'b0.
1: C2H packet that has completions.

dma<n>_axis_c2h_status_error O When axis_c2h_status_error is set to 1, the descriptor


fetched has an error. When set to 0, there is no error.

AXI4-Stream C2H Write Completion Ports

Table: AXI-ST C2H Write Completion Port Descriptions

Port Name I/O Description

dma<n>_axis_c2h_dmawr_cmp O This signal is asserted when the last data payload


write request of the packet gets the write completion.
It is one pulse per packet.

dma<n>_axis_c2h_dmawr_target_vch
O Target virtual channel

dma<n>_c2h_dmawr_port_id O Port ID

VDM Ports

Table: VDM Port Descriptions

Port Name I/O Description

dma<n>_st_rx_msg_tvalid O Valid

dma<n>_st_rx_msg_tdata[31:0] O Beat 1:
{REQ_ID[15:0], VDM_MSG_CODE[7:0],
VDM_MSG_ROUTING[2:0], VDM_DW_LENGTH[4:0]}
Beat 2:
VDM Lower Header [31:0]
or
{(Payload_length=0), VDM Higher Header [31:0]}
Beat 3 to Beat <n>:
VDM Payload
(An unpacking sketch for beat 1 follows this table.)

dma<n>_st_rx_msg_tlast O Indicate the last beat

dma<n>_st_rx_msg_tready I Ready.
✎ Note: When this interface is not used, Ready must
be tied-off to 1.

RX Vendor Defined Messages are stored in a shallow FIFO before they are transmitted to the output
ports. When there are many back-to-back VDM messages, the FIFO overflows and those messages
are dropped. It is best to repeat VDM messages at regular intervals.
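
The following C sketch unpacks beat 1 of dma<n>_st_rx_msg_tdata[31:0] according to the field list in
the table above, assuming the concatenation places REQ_ID in the most significant bits (standard
Verilog concatenation order).

#include <stdint.h>

/* Fields packed into beat 1 of dma<n>_st_rx_msg_tdata[31:0]. */
struct vdm_rx_beat1 {
    uint16_t req_id;       /* [31:16] requester ID            */
    uint8_t  msg_code;     /* [15:8]  VDM message code        */
    uint8_t  msg_routing;  /* [7:5]   VDM message routing     */
    uint8_t  dw_length;    /* [4:0]   VDM payload length, DWs */
};

static struct vdm_rx_beat1 vdm_unpack_beat1(uint32_t tdata)
{
    struct vdm_rx_beat1 b;
    b.req_id      = (uint16_t)(tdata >> 16);
    b.msg_code    = (uint8_t)((tdata >> 8) & 0xFFu);
    b.msg_routing = (uint8_t)((tdata >> 5) & 0x7u);
    b.dw_length   = (uint8_t)(tdata & 0x1Fu);
    return b;
}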

FLR Ports

Table: FLR Port Descriptions

Port Names I/O Description


dma<n>_usr_flr_fnc [12:0] O Function


The function number of the FLR status change.

dma<n>_usr_flr_set O Set
Asserted for 1 cycle indicating that the FLR status of
the function indicated on
dma<n>_usr_flr_fnc[12:0] is active.

dma<n>_usr_flr_clear O Reserved

dma<n>_usr_flr_done_fnc I Done Function


[12:0] The function for which FLR has been completed by
user logic.

dma<n>_usr_flr_done_vld I Done Valid


Assert for one cycle to signal that FLR for the function
on dma<n>_usr_flr_done_fnc[12:0] has been
completed.

QDMA Descriptor Bypass Input Ports

Table: QDMA H2C-Streaming Bypass Input Port Descriptions

Port Name I/O Description

dma<n>_h2c_byp_in_st_addr I 64-bit starting address of the DMA transfer.


[63:0]

dma<n>_h2c_byp_in_st_len I The number of bytes to transfer.


[15:0]

dma<n>_h2c_byp_in_st_sop I Indicates start of packet. Set for the first descriptor.


Reset for the rest of the descriptors.

dma<n>_h2c_byp_in_st_eop I Indicates end of packet. Set for the last descriptor.


Reset for the rest of the descriptors

dma<n>_h2c_byp_in_st_sdi I H2C Bypass In Status Descriptor/Interrupt


If set, it is treated as an indication from the user
application to the QDMA to send the status descriptor
to host, and to generate an interrupt to host when the
QDMA has fetched the last byte of the data
associated with this descriptor. The QDMA honors the
request to generate an interrupt only if interrupts have
been enabled in the H2C SW context for this QID and
armed by the driver. This can only be set for an EOP
descriptor.



QDMA hangs if the last descriptor without
h2c_byp_in_st_sdi has an error. This results in a
missing writeback and hw_ctxt.dsc_pend bit that are
asserted indefinitely. The workaround is to send a
zero length descriptor to trigger the Completion
(CMPT) Status.
✎ Recommended: For performance reasons, AMD
recommends that this port be asserted once in 32 or
64 descriptors and assert at the last descriptor if there
are no more descriptors left.

dma<n>_h2c_byp_in_st_mrkr_req I H2C Bypass In Marker Request


When set, the descriptor passes through the H2C
Engine pipeline and once completed, produces a
marker response on the interface. This can only be
set for an EOP descriptor.

dma<n>_h2c_byp_in_st_no_dma I H2C Bypass In No DMA


When sending in a descriptor through this interface
with this signal asserted, it informs the QDMA to not
send any PCIe requests for this descriptor. Because
no PCIe request is sent out, no corresponding DMA
data is issued on the H2C Streaming output interface.
This is typically used in conjunction with
dma<n>_h2c_byp_in_st_sdi to cause Status
Descriptor/Interrupt when the user logic is out of the
actual descriptors and still wants to drive the
dma<n>_h2c_byp_in_st_sdi signal.
If dma<n>_h2c_byp_in_st_mrkr_req and
h2c_byp_in_st_sdi are reset when sending in a no-
DMA descriptor, the descriptor is treated as a NOP
and is completely consumed inside the QDMA without
any interface activity.
If dma<n>_h2c_byp_in_st_no_dma is set, then both
dma<n>_h2c_byp_in_st_sop and
dma<n>_h2c_byp_in_st_eop must be set.
If dma<n>_h2c_byp_in_st_no_dma is set, the QDMA
ignores the address and length fields of this interface.

dma<n>_h2c_byp_in_st_qid I The QID associated with the H2C descriptor ring.


[1:0]

dma<n>_h2c_byp_in_st_error I This bit can be set to indicate an error for the queue.
The descriptor is not processed. Context is updated
to reflect an error in the queue.


dma<n>_h2c_byp_in_st_func I PCIe function ID


[11:0]

dma<n>_h2c_byp_in_st_cidx I The CIDX that is used for the status descriptor update
[15:0] and/or interrupt (aggregation mode). Generally the
CIDX should be left unchanged from when it is
received from the descriptor bypass output interface.

dma<n>_h2c_byp_in_st_port_id I QDMA port ID


[2:0]

dma<n>_h2c_byp_in_st_valid I Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma<n>_h2c_byp_in_st_ready O Ready to take in descriptor


The following is an example timing diagram for H2C Streaming Bypass Input:

Figure: H2C-Streaming Bypass Input

Table: QDMA H2C-MM Channel 0 Descriptor Bypass Input Port Descriptions

Port Name I/O Description


dma<n>_h2c_byp_in_mm_<y>_radr[63:0]
I The read address for the DMA data.

dma<n>_h2c_byp_in_mm_<y>_wadr[63:0]
I The write address for the dma data.

dma<n>_h2c_byp_in_mm_<y>_no_dma
I H2C Bypass In No DMA
When sending in a descriptor through this interface
with this signal asserted, this signal informs the
QDMA to not send any PCIe requests for this
descriptor. Because no PCIe request is sent out, no
corresponding DMA data is issued on the H2C MM
output interface.
This is typically used in conjunction with
h2c_byp_in_mm_sdi to cause Status
Descriptor/Interrupt when the user logic is out of the
actual descriptors and still wants to drive the
h2c_byp_in_mm_sdi signal.
If h2c_byp_in_mm_mrkr_req and
h2c_byp_in_mm_sdi are reset when sending in a no-
DMA descriptor, the descriptor is treated as a No
Operation (NOP) and is completely consumed inside
the QDMA without any interface activity.
If h2c_byp_in_mm_no_dma is set, the QDMA ignores
the address. The length field should be set to 0.

dma<n>_h2c_byp_in_mm_<y>_len[27:0]
I The DMA data length.
The upper 12 bits must be tied to 0. Thus only the
lower 16 bits of this field can be used for specifying
the length.

dma<n>_h2c_byp_in_mm_<y>_sdi I H2C-MM Bypass In Status Descriptor/Interrupt


If set, it is treated as an indication to QDMA to send
the status descriptor to host and generate an interrupt
to host when the QDMA has fetched the last byte of
the data associated with this descriptor. The QDMA
honors the request to generate an interrupt only if
interrupts have been enabled in the H2C ring context
for this QID and armed by the driver.
QDMA hangs if the last descriptor without
h2c_byp_in_mm_sdi has an error. This results in a
missing writeback and hw_ctxt.dsc_pend bit that are
asserted indefinitely. The workaround is to send a
zero length descriptor to trigger the Completion
(CMPT) Status.

dma<n>_h2c_byp_in_mm_<y>_mrkr_req
I H2C-MM Bypass In Marker Request



Indication from the User that the QDMA must send a
completion status to the User once the QDMA has
completed the data transfer of this descriptor.

dma<n>_h2c_byp_in_mm_<y>_qid I The QID associated with the H2C descriptor ring.


[10:0]

dma<n>_h2c_byp_in_mm_<y>_error I This bit can be set to indicate an error for the queue.
The descriptor is not processed. Context is updated to
reflect an error in the queue.

dma<n>_h2c_byp_in_mm_<y>_func I PCIe function ID


[11:0]

dma<n>_h2c_byp_in_mm_<y>_cidx I The CIDX that is used for the status descriptor update
[15:0] and/or interrupt (aggregation mode). Generally the
CIDX should be left unchanged from when it was
received from the descriptor bypass output interface.

dma<n>_h2c_byp_in_mm_<y>_port_id
I QDMA port ID
[2:0]

dma<n>_h2c_byp_in_mm_<y>_validI Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma<n>_h2c_byp_in_mm_<y>_ready
O Ready to take in descriptor

1. The variable <n> represents the different QDMAs (QDMA0 or QDMA1).
2. The variable <y> represents the different channels (Channel 0 or Channel 1).

The following is an example timing diagram for H2C AXI-MM Bypass input:

Figure: H2C AXI-MM Bypass Input


Table: QDMA C2H-Streaming Bypass Input Port Descriptions 1

Port Name I/O Description

dma<n>_c2h_byp_in_st_csh_addr I 64 bit address where DMA writes data.


[63:0]

dma<n>_c2h_byp_in_st_csh_qid I The QID associated with the C2H descriptor ring.


[1:0]

dma<n>_c2h_byp_in_st_csh_error I This bit can be set to indicate an error for the queue.
The descriptor is not processed. Context is updated to
reflect an error in the queue. The error port is not valid
in Simple Bypass mode; you are responsible for feeding
in good descriptors. If there is a descriptor with an error
on Bypass Out, you need to fix the error first.

dma<n>_c2h_byp_in_st_csh_func I PCIe function ID


[11:0]

dma<n>_c2h_byp_in_st_csh_port_id[2:0]
I QDMA port ID

dma<n>_c2h_byp_in_st_csh_pfch_tag[6:0]
I Prefetch tag. The prefetch tag points to the CAM that
stores the active queues in the prefetch engine. In
Cache Bypass mode, you must loop back
dma<n>_c2h_byp_out_pfch_tag[6:0] to
dma<n>_c2h_byp_in_st_csh_pfch_tag[6:0]. In
Simple Bypass mode, it is used to pass in the prefetch
tag value from the MDMA_C2H_PFCH_BYP_TAG (0x140C)
register.

dma<n>_c2h_byp_in_st_csh_valid I Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma<n>_c2h_byp_in_st_csh_ready O Ready to take in descriptor.

1. AXI-Stream C2H Simple Bypass mode and Cache Bypass mode both use the same bypass
ports, dma<n>_c2h_byp_in_st_csh_*.

The following is an example timing diagram for C2H AXI-Stream Bypass Input:

Figure: C2H AXI-Stream Bypass Input

Table: QDMA C2H-MM Channel 0 Descriptor Bypass Input Port Descriptions

Port Name I/O Description

dma<n>_c2h_byp_in_mm_<y>_raddr I The read address for the DMA data.
[63:0]

dma<n>_c2h_byp_in_mm_<y>_wadr[63:0]
I The write address for the DMA data.

dma<n>_c2h_byp_in_mm_<y>_no_dma
I C2H Bypass In No DMA
When sending in a descriptor through this interface
with this signal asserted, this signal informs the
QDMA to not send any PCIe requests for this
descriptor. Because no PCIe request is sent out, no
corresponding DMA data is read from C2H MM
interface.



This is typically used in conjunction with
dma<n>_c2h_byp_in_mm_<y>_sdi to cause Status
Descriptor/Interrupt when the user logic is out of the
actual descriptors and still wants to drive the
dma<n>_c2h_byp_in_mm_<y>_sdi signal.
If dma<n>_c2h_byp_in_mm_<y>_mrkr_req and
dma<n>_c2h_byp_in_mm_<y>_sdi are reset when
sending in a no-DMA descriptor, the descriptor is
treated as a NOP and is completely consumed inside
the QDMA without any interface activity.
If dma<n>_c2h_byp_in_mm_<y>_no_dma is set, the
QDMA ignores the address. The length field should be
set to 0.

dma<n>_c2h_byp_in_mm_<y>_len[27:0]
I The DMA data length. The upper 12 bits must be tied
to 0. Thus, only the lower 16 bits of this field can be
used for specifying the length.

dma<n>_c2h_byp_in_mm_<y>_sdi I C2H Bypass In Status Descriptor/Interrupt


If set, it is treated as an indication from the User to
QDMA to send the status descriptor to host, and
generate an interrupt to host when the QDMA has
fetched the last byte of the data associated with this
descriptor. The QDMA will honor the request to
generate an interrupt only if interrupts have been
enabled in the C2H ring context for this QID and
armed by the driver.

dma<n>_c2h_byp_in_mm_<y>_mrkr_req
I C2H Bypass In Marker Request
You must send an indication that the QDMA must
send a completion status after the QDMA completes
the data transfer of this descriptor.

dma<n>_c2h_byp_in_mm_<y>_qid I The QID associated with the C2H descriptor ring


[1:0]

dma<n>_c2h_byp_in_mm_<y>_error I This bit can be set to indicate an error for the queue.
The descriptor is not processed. Context is updated to
reflect an error in the queue.

dma<n>_c2h_byp_in_mm_<y>_func I PCIe function ID


[11:0]

dma<n>_c2h_byp_in_mm_<y>_cidx I You must echo the CIDX from the descriptor that it
[15:0] received on the bypass-out interface.


dma<n>_c2h_byp_in_mm_<y>_port_id[2:0]
I QDMA port ID

dma<n>_c2h_byp_in_mm_<y>_validI Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma<n>_c2h_byp_in_mm_<y>_ready
O Ready to take in descriptor.

1. The variable <n> represents the different QDMAs (QDMA0 or QDMA1).
2. The variable <y> represents the different channels (Channel 0 or Channel 1).

The following is an example timing diagram for C2H AXI-MM Bypass input:

Figure: C2H AXI-MM Bypass Input

QDMA Descriptor Bypass Output Ports

Table: QDMA H2C Descriptor Bypass Output Port Descriptions

Port Name I/O Description

dma<n>_h2c_byp_out_dsc O The H2C descriptor fetched from the host.


[255:0] For H2C AXI-MM, the functional mode uses all 256
bits, and the structure of the bits are the same as this
table.



For H2C AXI-ST, the functional mode uses [127:0]
bits, and the structure of the bits are the same as this
table.

dma<n>_h2c_byp_out_st_mm O Indicates whether this is a streaming data descriptor


or memory-mapped descriptor.
0: streaming
1: memory-mapped

dma<n>_h2c_byp_out_dsc_sz O Descriptor size. This field indicates the size of the


[1:0] descriptor.
0: 8B
1: 16B
2: 32B
3: 64B - 64B descriptors will be transferred with two
valid/ready cycles. The first cycle has the least
significant 32 bytes. The second cycle has the most
significant 32 bytes. CIDX and other queue
information is valid only on the second beat of a 64B
descriptor .

dma<n>_h2c_byp_out_qid O The QID associated with the H2C descriptor ring.


[1:0]

dma<n>_h2c_byp_out_error O Indicates that an error was encountered in descriptor


fetch or execution of a previous descriptor.

dma<n>_h2c_byp_out_func O PCIe function ID


[11:0]

dma<n>_h2c_byp_out_cidx O H2C Bypass Out Consumer Index


[15:0] The ring index of the descriptor fetched. The User
must echo this field back to QDMA when submitting
the descriptor on the bypass-in interface.

dma<n>_h2c_byp_out_port_id O QDMA port ID


[2:0]

dma<n>_h2c_byp_out_fmt[2:0] O Format
The encoding for this field is as follows.
0x0: Standard descriptor
0x1 - 0x7: Reserved

dma<n>_h2c_byp_out_mm_chn O Channel number. Based on context settings it could


be 0 or 1.


dma<n>_h2c_byp_out_valid O Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma<n>_h2c_byp_out_ready I Ready. When this interface is not used, Ready must


be tied-off to 1.
The following is an example timing diagram for H2C Bypass Output:

Figure: H2C Bypass Output

Table: QDMA C2H Descriptor Bypass Output Port Descriptions

Port Name I/O Description

dma<n>_c2h_byp_out_dsc O The C2H descriptor fetched from the host.


[255:0] For C2H AXI-MM, the functional mode uses all 256
bits, and the structure of the bits is the same as this
table.
For C2H AXI-ST, the functional mode uses [63:0] bits,
and the structure of the bits is the same as this table.
The remaining bits are ignored.

dma<n>_c2h_byp_out_st_mm O Indicates whether this is a streaming data descriptor


or memory-mapped descriptor.
0: streaming
1: memory-mapped

dma<n>_c2h_byp_out_dsc_sz O Descriptor size. This field indicates the size of the


[1:0] descriptor.
0: 8B
1: 16B
2: 32B
3:64B - 64B descriptors will be transferred with two
valid/ready cycles. The first cycle has the least
significant 32 bytes. The second cycle has the most
significant 32 bytes. CIDX and other queue
information is valid only on the second beat of a 64B
descriptor.

dma<n>_c2h_byp_out_qid O The QID associated with the C2H descriptor ring.


[1:0]

dma<n>_c2h_byp_out_error O Indicates that an error was encountered in descriptor


fetch or execution of a previous descriptor.

dma<n>_c2h_byp_out_func O PCIe function ID.


[11:0]

dma<n>_c2h_byp_out_cidx O C2H Bypass Out Consumer Index


[15:0] The ring index of the descriptor fetched. The User
must echo this field back to QDMA when submitting
the descriptor on the bypass-in interface.

dma<n>_c2h_byp_out_port_id O QDMA port ID


[2:0]

dma<n>_c2h_byp_out_pfch_tag[6:0]O Prefetch tag. The prefetch tag points to the cam that
stores the active queues in prefetch engine

dma<n>_c2h_byp_out_fmt[2:0] O Format
The encoding for this field is as follows.
0x0 : Standard descriptor
0x1 - 0x7 : Reserved

dma<n>_c2h_byp_out_mm_chn O Channel number. Based on context settings it could


be 0 or 1.

dma<n>_c2h_byp_out_valid O Valid. High indicates descriptor is valid, one pulse for


one descriptor.

dma<n>_c2h_byp_out_ready I Ready. When this interface is not used, Ready must


be tied-off to 1.
The following is an example timing diagram for C2H bypass Output:

Figure: C2H Bypass Output

It is common for dma<n>_h2c_byp_out_vld or dma<n>_c2h_byp_out_vld to be asserted with the


CIDX value; this occurs when the Descriptor bypass mode option is not set in the context
programming selection. You must set the Descriptor bypass mode during QDMA IP core
customization in the AMD Vivado™ IDE to see descriptor bypass output ports. When Descriptor
bypass option is selected in the AMD Vivado™ IDE but the descriptor bypass bit is not set in context
programming, you see valid signals getting asserted with CIDX updates.

QDMA Descriptor Credit Input Ports

Table: QDMA Descriptor Credit Input Port Descriptions

Port Name I/O Description

dma<n>_dsc_crdt_in_valid I Valid. When asserted the user must be presenting


valid data on the bus and maintain the bus values
until both valid and ready are asserted on the same
cycle.

dma<n>_dsc_crdt_in_rdy O Ready. Assertion of this signal indicates the DMA is


ready to accept data from this bus.

dma<n>_dsc_crdt_in_dir I Indicates whether credits are for H2C or C2H


descriptor ring.
0: H2C
1: C2H

dma<n>_dsc_crdt_in_fence I If the fence bit is set, the credits are not coalesced,
and the queue is guaranteed to generate a descriptor
fetch before subsequent credit updates are
processed. The fence bit should only be set for a
queue that is enabled, and has both descriptors and
credits available, otherwise a hang condition might
occur.

dma<n>_dsc_crdt_in_qid I The QID associated with the descriptor ring for which the
[1:0] credits are being added.

dma<n>_dsc_crdt_in_crdt I The number of descriptor credits that the user


[15:0] application is giving to QDMA to fetch descriptors
from the host.

QDMA Traffic Manager Credit Output Ports

Table: QDMA TM Credit Output Port Descriptions

Port Name I/O Description

dma<n>_tm_dsc_sts_valid O Valid. Indicates valid data on the output bus. Valid


data on the bus is held until tm_dsc_sts_rdy is
asserted by the user.

dma<n>_tm_dsc_sts_rdy I Ready. Assertion indicates that the user logic is ready
to accept the data on this bus. When this interface is
not used, Ready must be tied-off to 1.

dma<n>_tm_dsc_sts_byp O Shows the bypass bit in the SW descriptor context

dma<n>_tm_dsc_sts_dir O Indicates whether the status update is for a H2C or


C2H descriptor ring.
0: H2C
1: C2H

dma<n>_tm_dsc_sts_mm O Indicates whether the status update is for a streaming


or memory-mapped queue.
0: streaming
1: memory-mapped

dma<n>_tm_dsc_sts_qid O The QID of the ring


[11:0]

dma<n>_tm_dsc_sts_avl O If dma<n>_tm_dsc_sts_qinv is set, this is the number


[15:0] of credits available in the descriptor engine. If
dma<n>_tm_dsc_sts_qinv is not set this is the
number of new descriptors that have been posted to
the ring since the last time this update was sent.

dma<n>_tm_dsc_sts_qinv O If set, it indicates that the queue has been invalidated.


This is used by the user application to reconcile the
credit accounting between the user application and
QDMA.

dma<n>_tm_dsc_sts_qen O The current queue enable status.

dma<n>_tm_dsc_sts_irq_arm O If set, it indicates that the driver is ready to accept


interrupts

dma<n>_tm_dsc_sts_error O Set to 1 if the PIDX update is rolled over the current


CIDX of associated queue.

dma<n>_tm_dsc_sts_pidx[15:0] O PIDX of the Queue

dma<n>_tm_dsc_sts_port_id O The port id associated with the queue from the queue
[2:0] context.

User Interrupts

Table: User Interrupts Port Descriptions

Port Name I/O Description

dma<n>_usr_irq_valid I Valid
An assertion indicates that an interrupt associated
with the vector, function, and pending fields on the
bus should be generated to PCIe. Once asserted,
dma<n>_usr_irq_valid must remain high until
dma<n>_usr_irq_ack is asserted by the DMA.

dma<n>_usr_irq_vec [4:0] I Vector


The MSIX vector to be sent.

Vectors range from 0 to a maximum of 31; vector 0 is the first vector.

dma<n>_usr_irq_in_fnc [12:0] I Function


The function of the vector to be sent.

dma<n>_usr_irq_ack O Interrupt Acknowledge
An assertion of the acknowledge bit indicates that the
interrupt was transmitted on the link. The user logic
must wait for this pulse before signaling another
interrupt condition.

dma<n>_usr_irq_fail O Interrupt Fail


An assertion of fail indicates that the interrupt request
was aborted before transmission on the link.
The interrupt fail can happen for several reasons; for
details, see the following MSIX Interrupt Options
section.

✎ Note: Maximum eight vectors are allowed per function.


MSIX Interrupt Options

The following table describes the possible scenarios and outcomes:

Table: MSIX Interrupt Options

MSIX Enable (MSIX capability) | MSIX Mask (MSIX capability) | Mask Per Vector (in MSIX table) | usr_irq_ack/Fail Outcome | Interrupt at Host

0 | 0 | 0 | usr_irq_fail asserted | Not Received
0 | 0 | 1 | usr_irq_fail asserted | Not Received
0 | 1 | 0 | usr_irq_fail asserted | Not Received
0 | 1 | 1 | usr_irq_fail asserted | Not Received
1 | 0 | 0 | usr_irq_ack asserted | Received
1 | 0 | 1 | usr_irq_ack asserted with PBA bit set | Not Received
1 | 1 | 0 | usr_irq_ack asserted with PBA bit set | Not Received
1 | 1 | 1 | usr_irq_ack asserted with PBA bit set | Not Received

Queue Status Ports

Table: Queue Status Ports

Port Name I/O Description

dma<n>_qsts_out_op[7:0] O Opcode This indicates the type of packet being


issued. Encoding of this field is as follows.
0x0: CMPT Marker Response
0x1: H2C-ST Marker Response
0x2: C2H-MM Marker Response
0x3: H2C-MM Marker Response
0x4-0xff: reserved

dma<n>_qsts_out_data[63:0] O The data field for the individual opcodes are defined
in the tables below.

dma<n>_qsts_out_port_id[2:0] O Port ID

dma<n>_qsts_out_qid[12:0] O Queue ID

dma<n>_qsts_out_vld O Queue status valid

dma<n>_qsts_out_rdy I Queue status ready. Ready must be tied to 1 so


status output will not be blocked. Even if this
interface is not used, the ready port must be tied to
1.

Queue Status Data

Table: Queue Status Data

qsts_out_data Field Description

[1:0] err Error code reported by the CMPT Engine.


0: No error
1: SW gave bad Completion CIDX update
2: Descriptor error received while
processing the C2H packet
3: Completion dropped by the C2H Engine
because Completion Ring was full

[2] retry_marker_req An Interrupt could not be generated in spite


of being enabled. This happens when an
Interrupt is already outstanding on the
queue when the marker request was
received. The user logic must wait and retry
the marker request again if an Interrupt is
desired to be sent.

[26:3] marker_cookie When the CMPT Engine sends a marker to


the interface, it sends the lower 24b of the
CMPT as part of the marker response on the
interface. Thus the user logic can place a
24b value in the CMPT when making the
marker request and it will receive the same
24b with the marker response. When the
marker is generated as a result of an error
that the CMPT Engine encountered (as
opposed to a marker request made by the
user logic), then this 24b field is don't care.
✎ Note: Even if the user has enabled
stamping of error and/or color bits in the
CMPT writes to the host, the marker_cookie
does not contain them. It is exactly the lower
24-bits of the CMPT that the user logic
provided to the QDMA when making the
marker request.

[63:27] rsv Reserved

NoC Ports

✎ Note: The NoC ports must always be connected to the NoC IP block; you cannot leave them unconnected
or connect them to any other block. Doing so results in synthesis and implementation errors. For connection
reference, see the following figure:

Table: NoC Ports

Port Name I/O Description

CPM_PCIE_NOC_0 O AXI4 MM 0 port from CPM to NoC

CPM_PCIE_NOC_1 O AXI4 MM 1 port from CPM to NoC

cpm_pcie_noc_axi0_clk O Clock for AXI4 MM 0 Port

cpm_pcie_noc_axi1_clk O Clock for AXI4 MM 1 Port

NOC_CPM_PCIE_0 I AXI4 MM port from NoC to CPM. This port is
enabled when AXI Slave Bridge is enabled.

noc_cpm_pcie_axi0_clk O Clock for AXI4 MM port from NoC. This port is
enabled when AXI Slave Bridge is enabled.


Figure: CPM5 NoC Connection

QDMA Management Ports

Table: QDMA Management Ports

Port Name I/O Description

dma0_mgmt_req_adr[31:0] I QDMA register address

dma0_mgmt_req_dat[31:0] I data value to be written.

dma0_mgmt_req_cmd[1:0] I 2'b00 : read


2'b01 : write
2'b10 : reserved
2'b11 : reserved

dma0_mgmt_req_fnc[12:0] I Function number

dma0_mgmt_req_msc[5:0] I Reserved. Assign all zeroes

dma0_mgmt_req_rdy O Ready

dma0_mgmt_req_vld I Valid, asserted for one clock cycle if rdy is asserted

dma0_mgmt_cpl_dat[31:0] O Data from QDMA IP

dma0_mgmt_cpl_rdy I Ready

dma0_mgmt_cpl_sts[1:0] O Completion status.
bit[0]: 1 = error, 0 = good
bit[1]: 1 = write response, 0 = read response

dma0_mgmt_cpl_vld O Valid asserted for one clock cycle, if rdy is asserted

The QDMA management ports should be connected to the mailbox ports as described in CPM5 Mailbox IP.

Register Space

This section provides register space information for the QDMA.


In register space descriptions, configuration register attributes are defined as follows:

NA
Reserved

RO
Read-Only - Register bits are read-only and cannot be altered by the software.

RW
Read-Write - Register bits are read-write and are permitted to be either Set or Cleared by the
software to the desired state.

RW1C
Write-1-to-clear-status - Register bits indicate status when read. A Set bit indicates a status
event which is Cleared by writing a 1b. Writing a 0b to RW1C bits has no effect.

W1C
Non-readable-write-1-to-clear-status - Register will return 0 when read. Writing 1b Clears the
status for that bit index. Writing a 0b to W1C bits has no effect.

W1S
Non-readable-write-1-to-set - Register will return 0 when read. Writing 1b Sets the control set for
that bit index. Writing a 0b to W1S bits has no effect.
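As a brief illustration of the write-1-to-clear semantics above, the following C sketch clears one status bit by writing 1b to that bit position. The reg_write() helper is hypothetical and not part of this guide.

#include <stdint.h>

extern void reg_write(uint32_t addr, uint32_t val);  /* hypothetical 32-bit register write helper */

/* Clear one RW1C/W1C status bit: writing 1b clears the status for that bit
 * index, and the 0b written to every other bit position has no effect. */
static void clear_status_bit(uint32_t status_reg_addr, unsigned bit_index)
{
    reg_write(status_reg_addr, 1u << bit_index);
}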

QDMA PF Address Register Space

All the physical function (PF) registers are listed in the cpm5-qdma-v4-0-pf-registers.csv available in
the register map files.

Table: QDMA PF Address Register Space

Register Name Base (Hex) Byte Size (Dec) Register List and Details

QDMA_CSR 0x0000 8192 QDMA Configuration Space


Register (CSR) found in
cpm5-qdma-v4-0-pf-
registers.csv.

QDMA_TRQ_SEL_QUEUE_PF 0x18000 32768 Also found in


QDMA_TRQ_SEL_QUEUE_PF
(0x18000).

QDMA_PF_MAILBOX 0x42400 16384 Also found in


QDMA_PF_MAILBOX
(0x42400).

QDMA_TRQ_MSIX 0x50000 262144 Also found in


QDMA_TRQ_MSIX
(0x50000).

QDMA_TRQ_MSIX_PBA 0x54000 65536 MSIX PBA entries

QDMA_CSR (0x0000)

QDMA Configuration Space Register (CSR) descriptions are accessible in cpm5-qdma-v4-0-pf-


registers.csv available in the register map files.

QDMA_TRQ_SEL_QUEUE_PF (0x18000)

Table: QDMA_TRQ_SEL_QUEUE_PF (0x18000) Register Space

Register Address Description

QDMA_DMAP_SEL_INT_CIDX[2048] 0x18000- Interrupt Ring Consumer Index


(0x18000) 0x1CFF0 (CIDX)

QDMA_DMAP_SEL_H2C_DSC_PIDX[2048]
0x18004- H2C Descriptor Producer index
(0x18004) 0x1CFF4 (PIDX)

QDMA_DMAP_SEL_C2H_DSC_PIDX[2048]
0x18008- C2H Descriptor Producer Index
(0x18008) 0x1CFF8 (PIDX)

QDMA_DMAP_SEL_CMPT_CIDX[2048] 0x1800C- C2H Completion Consumer Index


(0x1800C) 0x1CFFC (CIDX)

There are 2048 Queues, and each Queue has a set of four registers. All of these registers can be
dynamically updated at any time. This set of registers can be accessed based on the Queue number.

Queue number is absolute Qnumber [0 to 2047].
Interrupt CIDX address = 0x18000 + Qnumber*16
H2C PIDX address = 0x18004 + Qnumber*16
C2H PIDX address = 0x18008 + Qnumber*16
Write Back CIDX address = 0x1800C + Qnumber*16

For Queue 0:

0x18000 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x18004 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x18008 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x1800C corresponds to QDMA_DMAP_SEL_CMPT_CIDX

For Queue 1:

0x18010 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x18014 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x18018 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x1801C corresponds to QDMA_DMAP_SEL_CMPT_CIDX

For Queue 2:

0x18020 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x18024 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x18028 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x1802C corresponds to QDMA_DMAP_SEL_CMPT_CIDX
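As a minimal sketch of this address arithmetic, the following C helpers compute the four per-queue direct-update register addresses from an absolute queue number. The helper names are illustrative; only the 0x18000 base and 16-byte-per-queue stride come from this section.

#include <stdint.h>

#define QDMA_TRQ_SEL_QUEUE_PF_BASE 0x18000u  /* PF queue space base */

/* Absolute Qnumber is 0 to 2047; each queue owns 16 bytes of register space. */
static inline uint32_t pf_int_cidx_addr(uint32_t qnum)  { return QDMA_TRQ_SEL_QUEUE_PF_BASE + qnum * 16u + 0x0u; }
static inline uint32_t pf_h2c_pidx_addr(uint32_t qnum)  { return QDMA_TRQ_SEL_QUEUE_PF_BASE + qnum * 16u + 0x4u; }
static inline uint32_t pf_c2h_pidx_addr(uint32_t qnum)  { return QDMA_TRQ_SEL_QUEUE_PF_BASE + qnum * 16u + 0x8u; }
static inline uint32_t pf_cmpt_cidx_addr(uint32_t qnum) { return QDMA_TRQ_SEL_QUEUE_PF_BASE + qnum * 16u + 0xCu; }

For example, pf_h2c_pidx_addr(2) returns 0x18024, matching the Queue 2 list above.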

QDMA_DMAP_SEL_INT_CIDX[2048] (0x18000)

Table: QDMA_DMAP_SEL_INT_CIDX[2048] (0x18000)

Bit Default Access Type Field Description

[31:24] 0 NA Reserved Reserved

[23:16] 0 RW ring_idx Ring index of the Interrupt Aggregation


Ring

[15:0] 0 RW sw_cidx Software Consumer Index (CIDX)

QDMA_DMAP_SEL_H2C_DSC_PIDX[2048] (0x18004)

Table: QDMA_DMAP_SEL_H2C_DSC_PIDX[2048] (0x18004)

Bit Default Access Type Field Description

[31:17] 0 NA Reserved Reserved

[16] 0 RW irq_arm Interrupt arm. Set this bit to 1 for next


interrupt generation.

[15:0] 0 RW h2c_pidx H2C Producer Index

QDMA_DMAP_SEL_C2H_DSC_PIDX[2048] (0x18008)

Table: QDMA_DMAP_SEL_C2H_DSC_PIDX[2048] (0x18008)

Bit Default Access Type Field Description

[31:17] 0 NA Reserved Reserved

[16] 0 RW irq_arm Interrupt arm. Set this bit to 1 for next


interrupt generation.

[15:0] 0 RW c2h_pidx C2H Producer Index

QDMA_DMAP_SEL_CMPT_CIDX[2048] (0x1800C)

Table: QDMA_DMAP_SEL_CMPT_CIDX[2048] (0x1800C)

Bit Default Access Type Field Description

[31:29] 0 NA Reserved Reserved

[28] 0 RW irq_en_wrb Interrupt arm. Set this bit to 1 for next


interrupt generation.

[27] 0 RW en_sts_desc_wrb Enable Status Descriptor for CMPT

[26:24] 0 RW trigger_mode Interrupt and Status Descriptor Trigger


Mode:
0x0: Disabled
0x1: Every
0x2: User_Count
0x3: User
0x4: User_Timer
0x5: User_Timer_Count

[23:20] 0 RW c2h_timer_cnt_index Index to QDMA_C2H_TIMER_CNT

[19:16] 0 RW c2h_count_threshhold Index to QDMA_C2H_CNT_TH

[15:0] 0 RW wrb_cidx CMPT Consumer Index (CIDX)
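A minimal sketch of composing a completion CIDX update from the fields above, assuming a hypothetical reg_write() helper and the 0x1800C + Qnumber*16 per-queue address described in the previous section.

#include <stdint.h>

extern void reg_write(uint32_t addr, uint32_t val);  /* hypothetical helper */

/* Compose and write QDMA_DMAP_SEL_CMPT_CIDX for one queue: irq_en_wrb in
 * bit [28], en_sts_desc_wrb in bit [27], trigger_mode in [26:24],
 * c2h_timer_cnt_index in [23:20], c2h_count_threshold index in [19:16],
 * and the CMPT CIDX in [15:0]. */
static void cmpt_cidx_update(uint32_t qnum, uint32_t irq_en, uint32_t en_sts,
                             uint32_t trig_mode, uint32_t timer_idx,
                             uint32_t cnt_idx, uint32_t cidx)
{
    uint32_t val = ((irq_en & 0x1u)    << 28) |
                   ((en_sts & 0x1u)    << 27) |
                   ((trig_mode & 0x7u) << 24) |
                   ((timer_idx & 0xFu) << 20) |
                   ((cnt_idx & 0xFu)   << 16) |
                   (cidx & 0xFFFFu);
    reg_write(0x1800C + qnum * 16u, val);
}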

QDMA_PF_MAILBOX (0x42400)

Table: QDMA_PF_MAILBOX (0x42400) Register Space

Register Address Description

Function Status Register (0x42400) 0x42400 Status bits

Function Command Register 0x42404 Command register bits


(0x42404)

Function Interrupt Vector Register 0x42408 Interrupt vector register


(0x42408)

Target Function Register (0x4240C) 0x4240C Target Function register

Function Interrupt Control Register 0x42410 Interrupt Control Register
(0x42410)

RTL Version Register (0x42414) 0x42414 RTL Version Register

PF Acknowledgment Registers 0x42420- PF acknowledge
(0x42420-0x4243C) 0x4243C

FLR Control/Status Register 0x42500 FLR control and status


(0x42500)

Incoming Message Memory 0x42C00- Incoming message (128 bytes)


(0x42C00-0x42C7C) 0x42C7C

Outgoing Message Memory 0x43000- Outgoing message (128 bytes)


(0x43000-0x4307C) 0x4307C

FMAP Programming (0x43100- 0x43100- Queue count and Q base.


0x434FC) 0x434FC

Mailbox Addressing
PF addressing
Addr = PF_Bar_offset + CSR_addr

VF addressing
Addr = VF_Bar_offset + VF_Start_offset + VF_offset + CSR_addr
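The following C sketch restates the two addressing formulas above. All of the offset variables are placeholders that come from your BAR mapping and SR-IOV layout; only the formula structure is taken from this section.

#include <stdint.h>

/* Hypothetical inputs from the system's BAR mapping and SR-IOV layout. */
uint64_t pf_bar_offset;    /* PF BAR offset                      */
uint64_t vf_bar_offset;    /* VF BAR offset                      */
uint64_t vf_start_offset;  /* offset of the first VF's region    */
uint64_t vf_offset;        /* offset of the selected VF's region */

/* PF addressing: Addr = PF_Bar_offset + CSR_addr */
static uint64_t pf_mailbox_addr(uint64_t csr_addr)
{
    return pf_bar_offset + csr_addr;
}

/* VF addressing: Addr = VF_Bar_offset + VF_Start_offset + VF_offset + CSR_addr */
static uint64_t vf_mailbox_addr(uint64_t csr_addr)
{
    return vf_bar_offset + vf_start_offset + vf_offset + csr_addr;
}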

Function Status Register (0x42400)

Table: Function Status Register (0x42400)

Bit Default Access Type Field Description

[31:12] 0 NA Reserved Reserved

[11:4] 0 RO cur_src_fn This field is for PF use only.

The source function number of the
message on the top of the incoming
request queue.

[2] 0 RO ack_status This field is for PF use only.


The status bit will be set when any bit in
the acknowledgment status register is
asserted.

[1] 0 RO o_msg_status For VF: The status bit is set when the VF
driver writes msg_send to its command
register. When the associated PF driver
sends an acknowledgment to this VF, the
hardware clears this field. The VF driver is
not allowed to update any content in its
outgoing mailbox memory (OMM) while
o_msg_status is asserted. Any illegal
write to the OMM is discarded
(optionally, this can cause an error on the
AXI4-Lite response channel).
For PF: The field indicates the message
status of the target FN which is specified
in the Target FN Register. The status bit
is set when the PF driver sends the
msg_send command. When the
corresponding function driver sends an
acknowledgment by sending msg_rcv, the
hardware clears this field. The PF driver is
not allowed to update any content in its
outgoing mailbox memory (OMM) while
o_msg_status(target_fn_id) is asserted.
Any illegal write to the OMM is
discarded (optionally, this can cause an
error on the AXI4-Lite response channel).

[0] 0 RO i_msg_status For VF: When asserted, a message in the
VF's incoming Mailbox memory is
pending for processing. The field is
cleared once the VF driver writes msg_rcv
to its command register.
For PF: When asserted, the messages in
the incoming Mailbox memory are
pending for processing. The field is
cleared only when the event queue is
empty.

Function Command Register (0x42404)

Table: Function Command Register (0x42404)

Bit Default Access Type Field Description

[31:3] 0 NA Reserved Reserved

[2] 0 RO Reserved Reserved

[1] 0 RW msg_rcv For VF: VF marks the message in its


Incoming Mailbox Memory as received.
Hardware asserts the acknowledgement
bit of the associated PF.
For PF: PF marks the message sent by
target_fn as received. The hardware
refreshes the i_msg_status of the PF, and
clears the o_msg_status of the target_fn.

[0] 0 RW msg_send For VF: VF marks the current message in


its own Outgoing Mailbox as valid.
For PF:

Current target_fn_id belongs to a
VF: PF finished writing a message
into the Incoming Mailbox memory of
the VF with target_fn_id. The
hardware sets the i_msg_status field
of the target FN’s status register.
Current target_fn_id belongs to a
PF: PF finished writing a message
into its own outgoing Mailbox
memory. Hardware will push the
message to the event queue of the
PF with target_fn_id.

Function Interrupt Vector Register (0x42408)

Table: Function Interrupt Vector Register (0x42408)

Bit Default Access Type Field Description

[31:5] 0 NA Reserved Reserved

[4:0] 0 RW int_vect 5-bit interrupt vector assigned by the


driver.

Target Function Register (0x4240C)

Table: Target Function Register (0x4240C)

Bit Default Access Type Field Description

[31:8] 0 NA Reserved Reserved

[7:0] 0 RW target_fn_id This field is for PF use only.
The FN number that the current
operation is targeting.

Function Interrupt Control Register (0x42410)

Table: Function Interrupt Control Register (0x42410)

Bit Default Access Type Field Description

[31:1] 0 NA Reserved Reserved

[0] 0 RW int_en Interrupt enable.

RTL Version Register (0x42414)

Table: RTL Version Register (0x42414)

Bit Default Access Type Description

[31:16] 0x1fd3 RO QDMA ID

[15:12] 4'h2 RO Device ID


4'b0010 : CPM5 QDMA
All other options are reserved.

[11:8] 4'h0 RO Vivado version 2022.1

[7:0] 8'h0 RO Reserved

PF Acknowledgment Registers (0x42420-0x4243C)

Table: PF Acknowledgment Registers (0x42420-0x4243C)

Register Addr Default Access Type Width Description

Ack0 0x42420 0 RW 32 Acknowledgment from FN


31~0

Ack1 0x42424 0 RW 32 Acknowledgment from FN


63~32

Ack2 0x42428 0 RW 32 Acknowledgment from FN


95~64

Ack3 0x4242C 0 RW 32 Acknowledgment from FN


127~96

Ack4 0x42430 0 RW 32 Acknowledgment from FN


159~128

Ack5 0x42434 0 RW 32 Acknowledgment from FN


191~160

Ack6 0x42438 0 RW 32 Acknowledgment from FN


223~192

Ack7 0x4243C 0 RW 32 Acknowledgment from FN


255~224

FLR Control/Status Register (0x42500)

Table: FLR Control/Status Register (0x42500)

Bit Default Access Type Field Description

[31:1] 0 NA Reserved Reserved

[0] 0 RW Flr_status Software write 1 to initiate the Function


Level Reset (FLR) for the associated
function. The field is kept asserted during
the FLR process. After the FLR is done,
the hardware de-asserts this field.

Incoming Message Memory (0x42C00-0x42C7C)

Table: Incoming Message Memory (0x42C00-0x42C7C)

Register Addr Default Access Type Width Description

i_msg_i 0x42C00 + i*4 0 RW 32 The ith word of the
incoming message (0 ≤ i < 32).

Outgoing Message Memory (0x43000-0x4307C)

Table: Outgoing Message Memory (0x43000-0x4307C)

Register Addr Default Access Type Width Description

o_msg_i 0x43000 + i*4 0 RW 32 The ith word of the
outgoing message (0 ≤ i < 32).

FMAP Programming (0x43100-0x434FC)

Table: FMAP Programming (0x43100-0x434FC)

Register Addr Default Access Type Width Description

FMAP 0x43100+fun*4 0 RW 32 [23:12] Qid_max. Number


of Queues for this
function.
[11:0] Qid_base. Base
Qid for this function.
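A minimal sketch of programming one FMAP entry, assuming a hypothetical reg_write() helper; the 0x43100 base, 4-byte-per-function stride, and the Qid_base/Qid_max bit positions come from the table above.

#include <stdint.h>

#define QDMA_PF_FMAP_BASE 0x43100u  /* FMAP base from the table above */

extern void reg_write(uint32_t addr, uint32_t val);  /* hypothetical helper */

/* Program the FMAP entry for one function: Qid_base in [11:0],
 * Qid_max (number of queues for this function) in [23:12]. */
static void qdma_fmap_program(uint32_t func, uint32_t qid_base, uint32_t qid_max)
{
    uint32_t addr = QDMA_PF_FMAP_BASE + func * 4u;
    uint32_t val  = ((qid_max & 0xFFFu) << 12) | (qid_base & 0xFFFu);
    reg_write(addr, val);
}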

QDMA_TRQ_MSIX (0x50000)

Table: QDMA_TRQ_MSIX (0x50000)

Byte Offset Bit Default Access Type Field Description

0x50000 [31:0] 0 RW addr MSI-X vector0 message lower
address.
MSIX_Vector0_Address[31:0]

0x50004 [31:0] 0 RW addr MSI-X vector0 message upper


address.
MSIX_Vector0_Address[63:32]

0x50008 [31:0] 0 RW data MSIX_Vector0_Data[31:0]


MSI-X vector0 message data.

0x5000C [31:0] 0 RW control MSIX_Vector0_Control[31:0]


MSI-X vector0 control.
Bit Position:
31:1: Reserved.
0: Mask. When set to 1, this MSI-X
vector is not used to generate a
message. When reset to 0, this MSI-
X vector is used to generate a
message.

✎ Note: The table above represents MSI-X table entry 0. There are 2K MSI-X table entries for
the QDMA.

QDMA VF Address Register Space

All the virtual function (VF) registers are listed in the cpm5-qdma-v4-0-vf-registers.csv available in the
register map files.

Table: QDMA VF Address Register Space

Target Name Base (Hex) Byte Size (Dec) Notes

QDMA_TRQ_SEL_QUEUE_VF 00003000 4096 VF Direct QCSR (16B per


(0x3000) Queue, up to max of 256
Queue per function)

QDMA_TRQ_MSIX_VF 00004000 4096 Space for 32 MSIX vectors


(0x4000) and PBA

QDMA_VF_MAILBOX 00005000 8192 Mailbox address space


(0x5000)

QDMA_TRQ_SEL_QUEUE_VF (0x3000)

VF functions can access direct update registers per queue with offset (0x3000). The description for
this register space is the same as QDMA_TRQ_SEL_QUEUE_PF (0x18000).

This set of registers can be accessed based on the Queue number. The Queue number is the relative Qnumber
for that VF.

Interrupt CIDX address = 0x3000 + Qnumber*16


H2C PIDX address = 0x3004 + Qnumber*16
C2H PIDX address = 0x3008 + Qnumber*16
Completion CIDX address = 0x300C + Qnumber*16

For Queue 0:

0x3000 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x3004 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x3008 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x300C corresponds to QDMA_DMAP_SEL_WRB_CIDX

For Queue 1:

0x3010 corresponds to QDMA_DMAP_SEL_INT_CIDX
0x3014 corresponds to QDMA_DMAP_SEL_H2C_DSC_PIDX
0x3018 corresponds to QDMA_DMAP_SEL_C2H_DSC_PIDX
0x301C corresponds to QDMA_DMAP_SEL_WRB_CIDX

QDMA_TRQ_MSIX_VF (0x4000)

VF functions can access the MSI-X table at offset (0x4000) from that function. The description for
this register space is the same as QDMA_TRQ_MSIX (0x50000).

QDMA_VF_MAILBOX (0x5000)

Table: QDMA_VF_MAILBOX (0x05000) Register Space

Registers (Address) Address Description

Function Status Register 0x5000 Status register bits


(0x5000)

Function Command Register 0x5004 Command register bits


(0x5004)

Function Interrupt Vector 0x5008 Interrupt vector register


Register (0x5008)

Target Function Register 0x500C Target Function register


(0x500C)

Function Interrupt Control 0x5010 Interrupt Control Register


Register (0x5010)

RTL Version Register 0x5014 RTL Version Register


(0x5014)

Incoming Message Memory 0x5800-0x587C Incoming message (128


(0x5800-0x587C) bytes)

Outgoing Message Memory 0x5C00-0x5C7C Outgoing message (128


(0x5C00-0x5C7C) bytes)

Function Status Register (0x5000)

Table: Function Status Register (0x5000)

Bit Index Default Access Type Field Description

[31:12] 0 NA Reserved Reserved

[11:4] 0 RO cur_src_fn This field is for PF use only.


The source function number of the
message on the top of the incoming
request queue.

[2] 0 RO ack_status This field is for PF use only.


The status bit will be set when any bit in
the acknowledgement status register is
asserted.

[1] 0 RO o_msg_status For VF: The status bit is set when the VF
driver writes msg_send to its command
register. When the associated PF driver
sends acknowledgement to this VF, the
hardware clears this field. The VF driver is
not allowed to update any content in its
outgoing mailbox memory (OMM) while
o_msg_status is asserted. Any illegal
writes to the OMM are discarded
(optionally, this can cause an error on the
AXI4-Lite response channel).
For PF: The field indicates the message
status of the target FN which is specified
in the Target FN Register. The status bit is
set when the PF driver sends the msg_send
command. When the corresponding
function driver sends acknowledgement
through msg_rcv, the hardware clears this
field. The PF driver is not allowed to update
any content in its outgoing mailbox
memory (OMM) while

o_msg_status(target_fn_id) is asserted.
Any illegal writes to the OMM are
discarded (optionally, this can cause an
error on the AXI4-Lite response channel).

[0] 0 RO i_msg_status For VF: When asserted, a message in the


VF's incoming Mailbox memory is
pending for processing. The field is cleared
after the VF driver writes msg_rcv to its
command register.
For PF: When asserted, the messages in
the incoming Mailbox memory are
pending for processing. The field is cleared
only when the event queue is empty.

Function Command Register (0x5004)

Table: Function Command Register (0x5004)

Bit Index Default Access Type Field Description

[31:3] 0 NA Reserved Reserved

[2] 0 RO Reserved Reserved

[1] 0 RW msg_rcv For VF: VF marks the message in its


Incoming Mailbox Memory as received.
The hardware asserts the
acknowledgement bit of the associated
PF.
For PF: PF marks the message sent by
target_fn as received. The hardware
refreshes the i_msg_status of the PF, and
clears the o_msg_status of the target_fn.

[0] 0 RW msg_send For VF: VF marks the current message in


its own Outgoing Mailbox as valid.
For PF:
Current target_fn_id belongs to a VF: PF
finished writing a message into the
Incoming Mailbox memory of the VF with
target_fn_id. The hardware sets the
i_msg_status field of the target FN's
status register.
Current target_fn_id belongs to a PF: PF
finished writing a message into its own

outgoing Mailbox memory. The hardware
pushes the message to the event queue
of the PF with target_fn_id.

Function Interrupt Vector Register (0x5008)

Table: Function Interrupt Vector Register (0x5008)

Bit Index Default Access Type Field Description

[31:5] 0 NA Reserved Reserved

[4:0] 0 RW int_vect 5-bit interrupt vector assigned by the


driver software.

Target Function Register (0x500C)

Table: Target Function Register (0x500C)

Bit Index Default Access Type Field Description

[31:8] 0 NA Reserved Reserved

[7:0] 0 RW target_fn_id This field is for PF use only.


The FN number that the current operation
is targeting.

Function Interrupt Control Register (0x5010)

Table: Function Interrupt Control Register (0x5010)

Bit Index Default Access Type Field Description

[31:1] 0 NA res Reserved

[0] 0 RW int_en Interrupt enable.

RTL Version Register (0x5014)

Table: RTL Version Register (0x5014)

Bit Default Access Type Field Description

[31:16] 0x1fd3 RO . QDMA ID

[15:0] 0 RO . Vivado versions

Incoming Message Memory (0x5800-0x587C)

Table: Incoming Message Memory (0x5800-0x587C)

Register Addr Default Access Type Width Description

i_msg_i 0x5800 + i*4 0 RW 32 The ith word of the
incoming message (0 ≤ i < 32).

Outgoing Message Memory (0x5C00-0x5C7C)

Table: Outgoing Message Memory (0x5C00-0x5C7C)

Register Addr Default Access Type Width Description

o_msg_i 0x5C00 + i*4 0 RW 32 The ith word of the
outgoing message (0 ≤ i < 32).

AXI Slave Register Space

You can access the QDMA0 or QDMA1 register space using the AXI Slave interface. When AXI Slave Bridge
mode is enabled (based on GUI settings), you can also access the Bridge registers in QDMA0 or
QDMA1 and the Host memory space. The Host memory address offset varies based on the
QDMA0 and/or QDMA1 selection.
If only QDMA0 is enabled, the table below shows the address ranges and limitations.
✎ Note: You cannot access the QDMA CSR register space through the AXI Slave Bridge interface. You can
only access the QDMA Queue space registers.

Table: AXI4 Slave Register Space for QDMA0

Register Space AXI Slave Interface Address range Details

Bridge registers 0x6_0000_0000 - Described in Bridge register


0x6_0FFF_FFFF space CSV file. See Bridge
Register Space for details.

DMA registers 0x6_1000_0000 - Described in QDMA Queue


0x6_107F_FFFF space register.
QDMA_TRQ_SEL_QUEUE_PF
(0x18000) and
QDMA_TRQ_SEL_QUEUE_VF
(0x3000).

Slave Bridge access to Host 0xE000_0000 - 0xEFFF_FFFF Address range for Slave
memory space 0x6_1101_0000 - bridge access is set during IP
0x7_FFFF_FFFF


0x80_0000_0000 - customization in the Address
0xBF_FFFF_FFFF Editor tab of the Vivado IDE.
If QDMA1 is enabled, the table below shows address ranges and limitations.

Table: AXI4 Slave Register Space for QDMA1

Register Space AXI Slave Interface Address range Details

Bridge registers 0x7_0000_0000 - Described in Bridge register


0x7_0FFF_FFFF space CSV file. See Bridge
Register Space for details.

DMA registers 0x7_1000_0000 - Described in QDMA Queue


0x7_107F_FFFF space register.
QDMA_TRQ_SEL_QUEUE_PF
(0x18000) and
QDMA_TRQ_SEL_QUEUE_VF
(0x3000).

Slave Bridge access to Host 0xE800_0000 - 0xEFFF_FFFF Address range for Slave
memory space 0x7_1101_0000 - bridge access is set during IP
0x7_FFFF_FFFF customization in the Address
0xA0_0000_0000 - Editor tab of the Vivado IDE.
0xBF_FFFF_FFFF

When both QDMA0 and QDMA1 controllers are enabled, the table above remains the same for
QDMA1 controller. The table shown below represents the QDMA0 controller.

Table: AXI4 Slave Register Space for QDMA0

Register Space AXI Slave Interface Address range Details

Bridge registers 0x6_0000_0000 - Described in Bridge register


0x6_0FFF_FFFF space CSV file. See Bridge
Register Space for details.

DMA registers 0x6_1000_0000 - Described in QDMA Queue


0x6_107F_FFFF space register.
QDMA_TRQ_SEL_QUEUE_PF
(0x18000) and
QDMA_TRQ_SEL_QUEUE_VF
(0x3000).

Slave Bridge access to Host 0xE000_0000 - 0xE7FF_FFFF Address range for Slave
memory space 0x6_1101_0000 - bridge access is set during IP
0x6_FFFF_FFFF customization in the Address
0x80_0000_0000 - Editor tab of the Vivado IDE.
0x9F_FFFF_FFFF

Bridge Register Space

Bridge register addresses start at 0xE00. Addresses from 0x00 to 0xE00 are directed to the PCIe
configuration register space.
All the bridge registers are listed in the cpm5-qdma-v4-0-bridge-registers.csv available in the register
map files.
To locate the register space information:

1. Download the register map files.


2. Extract the ZIP file contents into any write-accessible location.
3. Refer to cpm5-qdma-v4-0-bridge-registers.csv.

QDMA Register Space

The QDMA Queue register space is described in the following sections:

QDMA_TRQ_SEL_QUEUE_PF (0x18000)
QDMA_TRQ_SEL_QUEUE_VF (0x3000)

Queue space register access from the AXI Slave interface should use the Slave address range low value
(see the table above) + 0x18000.
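As an illustration, the following C sketch forms the AXI Slave address of a per-queue PF register for the QDMA0 controller, assuming the 0x6_1000_0000 DMA registers base from the tables above; the helper name is illustrative.

#include <stdint.h>

/* DMA registers base for QDMA0 on the AXI Slave interface (from the table
 * above) plus the 0x18000 PF queue-space offset. */
#define QDMA0_AXI_SLAVE_DMA_BASE   0x610000000ULL
#define QDMA_TRQ_SEL_QUEUE_PF_OFF  0x18000ULL

/* Hypothetical helper: AXI Slave address of one per-queue register, where
 * reg_off is 0x0/0x4/0x8/0xC for INT CIDX, H2C PIDX, C2H PIDX, CMPT CIDX. */
static inline uint64_t qdma0_axi_queue_reg(uint32_t qnum, uint32_t reg_off)
{
    return QDMA0_AXI_SLAVE_DMA_BASE + QDMA_TRQ_SEL_QUEUE_PF_OFF
           + (uint64_t)qnum * 16u + reg_off;
}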

Design Flow Steps


This section describes customizing and generating the functional mode, constraining the functional
mode, and the simulation, synthesis, and implementation steps that are specific to this IP functional
mode. More detailed information about the standard AMD Vivado™ design flows and the IP integrator
can be found in the following Vivado Design Suite user guides:

Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
Vivado Design Suite User Guide: Designing with IP (UG896)
Vivado Design Suite User Guide: Getting Started (UG910)
Vivado Design Suite User Guide: Logic Simulation (UG900)

CPM5 GUI Customization

For more information on CPM5 GUI Customization, see AR000034477.

QDMA AXI MM Interface to NoC and DDR Lab

This lab describes the process of generating an AMD Versal™ device QDMA design with AXI4
interface connected to network on chip (NoC) IP and DDR memory. This design has the following
configurations:

AXI4 memory mapped (AXI MM) connected to DDR through the NoC IP
Gen3 x 16
4 physical functions (PFs) and 252 virtual functions (VFs)
MSI-X interrupts

This lab provides step by step instructions to configure a Control, Interfaces and Processing System
(CIPS) QDMA design and network on chip (NoC) IP. The following figure shows the AXI4 Memory
Mapped (AXI-MM) interface to DDR using the NoC IP. At the end of this lab, you can synthesize and
implement the design, and generate a Programmable Device Image (PDI) file. The PDI file is used to
program the Versal device and run data traffic on a system. For the AXI-MM interface host to chip
(H2C) transfers, data is read from Host and sent to DDR memory. For chip to host (C2H) transfers,
data is read from DDR memory and written to host.
This lab targets xcvp1202-vsva2785-2MP-e-S part. This lab connects to DDR memory found outside
the Versal device. A constraints file is provided and added to the design during the lab. The
constraints file lists all DDR pins and their placement. You can modify the constraint file based on your
requirements and DDR part selection. For more information, see QDMA AXI MM Interface to NoC and
DDR.

Simulation

Simulation example designs are listed in the customizable example design (CED). You can download
the simulation example design from the Vivado store (Versal_CPM5_QDMA_Simultion_Design). The
list of Versal PCIe example designs is available here. The simulation design has a fixed configuration as
follows:

Gen4x8
AXI4 and AXI-ST
4 PFs and 250 VFs
Each function has two BARs: one for the QDMA configuration space and one for bypass access to
the PL.
Descriptor bypass and Internal Mode

Following is the procedure to generate a CPM5 QDMA simulation design:

1. Open Vivado and select Open Example Project option under Quick Start.
✎ Note: In simulation, you might receive warnings from internal RAMs within the CPM5 block
indicating that a write/read contention has occurred on a multi-port RAM. This is normal as the
contention is resolved separately outside of the RAM blocks. These warnings can be safely
ignored.


2. From the Template options, select Versal CPM5 QDMA Simulation Design under PCIe section.
You can see the corresponding diagram on the right-hand side description section. click Next.


The Vivado wizard guides you through the board/part selection. This example design is fixed to
the VPK120 board.
3. Select the project name and directory and click Next to generate the project.
4. A new simulation project is displayed as shown below.

This project has cpm_qdma (EP design) and design_rp (root port model), and all the relevant
files that are needed for simulation.

Basic Simulation

Simulation models for the AXI-MM and AXI-ST options can be generated and simulated. The simple
simulation model options enable you to develop complex designs.

AXI-MM Mode
The example design for the AXI4 Memory Mapped (AXI-MM) mode has 512 KB block RAM on the
user side, where the data can be written to the block RAM, and read from block RAM to the host.
After the host to Card (H2C) transfer is started, the DMA reads data from the host memory, and writes
to the block RAM. After the transfer is completed, the DMA updates the write back status and
generates an interrupt (if enabled). Then, the card to host (C2H) transfer is started, and the DMA
reads data from the block RAM and writes to the host memory. The original data is compared with the
C2H write data. H2C and C2H are set up with one descriptor each, and the total transfer size is 128
bytes.

AXI-ST Mode
The example design for the AXI4-Stream (AXI-ST) mode has a data check that checks the data from
the H2C transfer, and has a data generator that generates the data for C2H transfer.
After the H2C transfer is started, the DMA engine reads data from the host memory, and writes to the
user side. After the transfer is completed, the DMA updates write back status and generates an
interrupt (if enabled). The data checker on the user side checks for a predefined data to be present,
and the result is posted in a predefined address for the user application to read.
After the C2H transfer is started, the data generator generates predefined data and associated control
signals, and sends them to the DMA. The DMA transfers data to the Host, updates the completion
(CMPT) ring entry/status, and generates an interrupt (if enabled).
H2C and C2H are set up with 16 descriptors each, and the total transfer size is 128 bytes.

PIPE Mode Simulation

The QDMA supports the PIPE mode simulation where the PIPE interface of the core is connected to
the PIPE interface of the link partner. This mode increases the simulation speed.
Use the Enable PIPE Simulation option on the Basic tab of the Customize IP dialog box to enable
PIPE mode simulation in the current AMD Vivado™ Design Suite solution example design, in either
Endpoint or Root Port mode. The External PIPE interface signals are generated at the core boundary
for access to the external device. Enabling this feature also provides the necessary hooks to use
third-party PCI Express® VIPs/BFMs instead of the Root Port model provided with the example
design.


Customizable Example Design (CED)

CPM5 QDMA
The following table describes the available CPM5 example designs. All the listed example designs are
based on the VPK120 evaluation board or an equivalent part.

Table: QDMA Example design

Top CED Name | Preset | Simulation/Implementation | Description

Versal_CPM_QDMA_EP_Design | CPM5_QDMA_Gen4x8_MM_ST_Design | Implementation | Functional example design.
Versal_CPM_QDMA_EP_Design | CPM5_QDMA_Gen5x8_MM_Performance_Design | Implementation | AXI4 performance design.
Versal_CPM_QDMA_EP_Design | CPM5_QDMA_Gen4x8_ST_Performance_Design | Implementation | AXI-ST performance design.
Versal_CPM_QDMA_EP_Design | CPM5_QDMA_Dual_Gen4x8_MM_ST_Design | Implementation | Functional example design.
Versal_CPM_QDMA_EP_Simulation_Design | No preset | Simulation | QDMA full functional simulation design.
Versal_CPM_Bridge_RP_Design | CPM5_PCIe_Controller0_Gen4x8_RootPort_Design | Implementation | RP design.
Versal_CPM_Bridge_RP_Design | CPM5_PCIe_Controller1_Gen4x8_RootPort_Design | Implementation | RP design.
Versal_CPM_QDMA_EP_Design (Part Based) | CPM5_QDMA_Gen5x8_ST_Performance_Design | Implementation | AXI-ST performance design.
Versal_CPM_QDMA_EP_Design (Part Based) | CPM5_QDMA_Dual_Gen5x8_ST_Performance_Design | Implementation | AXI-ST performance design.

The associated drivers can be downloaded from GitHub.

CED Generation Steps

Following are the steps to generate a CED:

1. Launch Vivado.

2. Check whether the designs are installed and update if required.
3. Click Vivado Store.

4. Go to Example Designs tab, click Refresh to refresh the catalog.

5. Click install for any newly added designs or click Update for any updates to the designs and
close it.


6. Click Open Example Project > Next, select the appropriate design, click Next.

7. Create project, click Next.

8. Choose the board or part option available. Based on the board selected, appropriate CPM block
is enabled. For example, for VCK190 board, CPM4 block is enabled. Similarly, for VPK120
board, CPM5 block is enabled.
a. In CPM5, the Gen 5 link speed is available for -2 MHP speed grade variant of VPK120
board. Ensure to choose -2 MHP variant in the Switch Part option while selecting the board.


Also, choose the appropriate preset if applicable.

9. Click Finish.

Versal_CPM_QDMA_EP_Design

The following are the presets available for you to select:

CPM5_QDMA_Gen4x8_MM_ST_Design

CPM5 QDMA1 Gen4x8 Functional example design:

This design has CPM5 – QDMA1 enabled in Gen4x8 configuration as an End Point
The design targets VPK120 board and it supports synthesis and implementation flows
Enables QDMA AXI4 and QDMA AXI-ST functionality with 4 PF and 252 VFs
Capable of exercising AXI4, AXI-ST path, and descriptor bypass

Example Design Registers

Table: Example Design Registers

Registers Address Description

C2H_ST_QID (0x000) 0x000 AXI-ST C2H Queue id

C2H_ST_LEN (0x004) 0x004 AXI-ST C2H transfer length

C2H_CONTROL_REG (0x008) 0x008 AXI-ST C2H pattern generator


control

H2C_CONTROL_REG (0x00C) 0x00C AXI-ST H2C Control

H2C_STATUS (0x010) 0x010 AXI-ST H2C Status

C2H_STATUS (0x018) 0x018 AXI-ST C2H Status

C2H_PACKET_COUNT (0x020) 0x020 AXI-ST C2H number of packets to


transfer

C2H_PREFETCH_TAG (0x024) 0x024 AXI-ST C2H simple bypass prefetch


Tag and Q details

C2H_COMPLETION_DATA_0 (0x030) 0x030-0x04C AXI-ST C2H completion data
to
C2H_COMPLETION_DATA_7 (0x04C)

C2H_COMPLETION_SIZE (0x050) 0x050 AXI-ST completion data type

SCRATCH_REG0 (0x060) 0x060 Scratch register 0

SCRATCH_REG1 (0x064) 0x064 Scratch register 1

C2H_PACKETS_DROP (0x088) 0x088 AXI-ST C2H Packets drop count

C2H_PACKETS_ACCEPTED 0x08C AXI-ST C2H Packets accepted count


(0x08C)

DESCRIPTOR_BYPASS (0x090) 0x090 C2H and H2C descriptor bypass


loopback

USER_INTERRUPT (0x094) 0x094 User interrupt, vector number,


function number

USER_INTERRUPT_MASK (0x098) 0x098 User interrupt mask

USER_INTERRUPT_VECTOR 0x09C User interrupt vector


(0x09C)

DMA_CONTROL (0x0A0) 0x0A0 DMA control

VDM_MESSAGE_READ (0x0A4) 0x0A4 VDM message read

C2H_ST_QID (0x000)

Table: C2H_ST_QID (0x000)

Bit Default Access Type Field Description

[31:11] 0 NA Reserved

[10:0] 0 RW c2h_st_qid AXI4-Stream C2H Queue ID

C2H_ST_LEN (0x004)

Table: C2H_ST_LEN (0x004)

Bit Default Access Type Field Description

[31:16] 0 NA Reserved

[15:0] 0 RW c2h_st_len AXI4-Stream


packet length

C2H_CONTROL_REG (0x008)

Table: C2H_CONTROL_REG (0x008)

Bit Default Access Type Description

[31:6] 0 NA Reserved

[5] 0 RW C2H Stream Marker


request. C2H Stream
Marker response is
registered at address
0x18, bit [0].

[4] 0 NA reserved

[3] 0 RW Disable completion.


For this packet, there
will not be any
completion.

[2] 0 RW Immediate data.


When set, the data
generator sends
immediate data. This
is a self-clearing bit.
Write 1 to initiate
transfer.

[1] 0 RW Starts AXI-ST C2H


transfer. This is a self-

clearing bit. Write 1 to
initiate transfer.

[0] 0 RW Streaming loop back.


When set, the data
packet from H2C
streaming port in the
Card side is looped
back to the C2H
streaming ports.
For Normal C2H stream packet transfer, set address offset 0x08 to 0x2.
For C2H immediate data transfer, set address offset 0x8 to 0x4.
For C2H/H2C stream loopback, set address offset 0x8 to 0x1.
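The following C sketch shows one way to start a normal AXI-ST C2H transfer through the example design registers described above. The reg_write() helper is hypothetical; only the register offsets and the 0x2 control value come from this section.

#include <stdint.h>

extern void reg_write(uint32_t offset, uint32_t val);  /* hypothetical BAR write helper */

/* Start a normal AXI-ST C2H transfer using the example design registers. */
static void start_c2h_stream(uint32_t qid, uint32_t len, uint32_t num_pkts)
{
    reg_write(0x000, qid);       /* C2H_ST_QID: stream queue ID            */
    reg_write(0x004, len);       /* C2H_ST_LEN: packet length in bytes     */
    reg_write(0x020, num_pkts);  /* C2H_PACKET_COUNT: packets to transfer  */
    reg_write(0x008, 0x2);       /* C2H_CONTROL_REG: start normal transfer */
}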

H2C_CONTROL_REG (0x00C)

Table: H2C_CONTROL_REG (0x00C)

Bit Default Access Type Description

[31:30] 0 NA Reserved

[0] 0 RW Clear match bit for


H2C transfer.

H2C_STATUS (0x010)

Table: H2C_STATUS (0x010)

Bit Default Access Type Description

[31:15] 0 NA Reserved

[14:4] 0 R H2C transfer Queue


ID

[3:1] 0 NA Reserved

[0] 0 R H2C transfer match

C2H_STATUS (0x018)

Table: C2H_STATUS (0x018)

Bit Default Access Type Description

[31:30] 0 NA Reserved

[0] 0 R C2H Marker response

C2H_PACKET_COUNT (0x020)

Table: C2H_PACKET_COUNT (0x020)

Bit Default Access Type Description

[31:10] 0 NA Reserved

[9:0] 0 RW AXI-ST C2H number of packets to transfer

C2H_PREFETCH_TAG (0x024)

Table: C2H_PREFETCH_TAG (0x024)

Bit Default Access Type Description

[31:27] 0 NA Reserved

[26:16] 0 RW Qid for prefetch tag

[15:7] 0 NA Reserved

[6:0] 0 RW Prefetch tag value for Simple Bypass mode

C2H_COMPLETION_DATA_0 (0x030)

Table: C2H_COMPLETION_DATA_0 (0x030)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [31:0]

C2H_COMPLETION_DATA_1 (0x034)

Table: C2H_COMPLETION_DATA_1 (0x034)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [63:32]

C2H_COMPLETION_DATA_2 (0x038)

Table: C2H_COMPLETION_DATA_2 (0x038)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [95:64]

C2H_COMPLETION_DATA_3 (0x03C)

Table: C2H_COMPLETION_DATA_3 (0x03C)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [127:96]

C2H_COMPLETION_DATA_4 (0x040)

Table: C2H_COMPLETION_DATA_4 (0x040)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [159:128]

C2H_COMPLETION_DATA_5 (0x044)

Table: C2H_COMPLETION_DATA_5 (0x044)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [191:160]

C2H_COMPLETION_DATA_6 (0x048)

Table: C2H_COMPLETION_DATA_6 (0x048)

Bit Default Access Type Field Description

[31:0] 0 NA NA AXI-ST C2H Completion Data [223:192]

C2H_COMPLETION_DATA_7 (0x04C)

Table: C2H_COMPLETION_DATA_7 (0x04C)

Bit Default Access Type Description

[31:0] 0 NA AXI-ST C2H Completion Data [255:224]

C2H_COMPLETION_SIZE (0x050)

Table: C2H_COMPLETION_SIZE (0x050)

Bit Default Access Type Description

[31:13] 0 NA Reserved

[12] 0 RW Completion Type.


1'b1: NO_PLD_BUT_WAIT
1'b0: HAS PLD

[10:8] 0 RW s_axis_c2h_cmpt_ctrl_err_idx[2:0] Completion Error


Bit Index.
3'b000: Selects 0th register.
3'b111: No error bit is reported.

[6:4] 0 RW s_axis_c2h_cmpt_ctrl_col_idx[2:0] Completion Color


Bit Index.
3'b000: Selects 0th register.
3'b111: No color bit is reported.

[3] 0 RW s_axis_c2h_cmpt_ctrl_user_trig Completion user


trigger

[1:0] 0 RW AXI4-Stream C2H completion data size.


00: 8 Bytes
01: 16 Bytes
10: 32 Bytes
11: 64 Bytes

SCRATCH_REG0 (0x060)

Table: SCRATCH_REG0 (0x060)

Bit Default Access Type Description

[31:0] 0 RW Scratch register

SCRATCH_REG1 (0x064)

Table: SCRATCH_REG1 (0x064)

Bit Default Access Type Description

[31:0] 0 RW Scratch register

C2H_PACKETS_DROP (0x088)

Table: C2H_PACKETS_DROP (0x088)

Bit Default Access Type Description

[31:0] 0 R The number of AXI-ST C2H packets (descriptors)


dropped per transfer
Each AXI-ST C2H transfer can contain one or more descriptors depending on transfer size and C2H
buffer size. This register represents how many of the descriptors were dropped in the current transfer.
This register will reset to 0 in the beginning of each transfer.

C2H_PACKETS_ACCEPTED (0x08C)

Table: C2H_PACKETS_ACCEPTED (0x08C)

Bit Default Access Type Description

[31:0] 0 R The number of AXI-ST C2H packets (descriptors)


accepted per transfer

Each AXI-ST C2H transfer can contain one or more descriptors depending on the transfer size and
C2H buffer size. This register represents how many of the descriptors were accepted in the current
transfer. This register will reset to 0 at the beginning of each transfer.

DESCRIPTOR_BYPASS (0x090)

Table: Descriptor Bypass (0x090)

Bit Default Access Type Field Description

[31:3] 0 NA Reserved

[2:1] 0 RW c2h_dsc_bypass C2H descriptor bypass loopback. When


set, the C2H descriptor bypass-out port is
looped back to the C2H descriptor
bypass-in port.
2'b00: No bypass loopback.
2'b01: C2H MM desc bypass loopback
and C2H Stream cache bypass loopback.
2'b10: C2H Stream Simple descriptor
bypass loopback.
2'b11: H2C stream 64 byte descriptors
are looped back to Completion interface.

[0] 0 RW h2c_dsc_bypass H2C descriptor bypass loopback. When


set, the H2C descriptor bypass-out port is
looped back to the H2C descriptor
bypass-in port.
1'b1: H2C MM and H2C Stream
descriptor bypass loopback
1'b0: No descriptor loopback

USER_INTERRUPT (0x094)

Table: User Interrupt (0x094)

Bit Default Access Type Field Description

[31:20] 0 NA Reserved

[19:12] 0 RW usr_irq_in_fun User interrupt function number

[11:9] 0 NA Reserved

[8:4] 0 RW usr_irq_in_vec User interrupt vector number

[3:1] 0 NA Reserved

[0] 0 RW usr_irq User interrupt. When set, the example


design generates a user interrupt.

To generate a user interrupt:

1. Write the function number to bits [19:12]. This corresponds to the function that generates the
usr_irq_in_fnc user interrupt.
2. Write the MSI-X vector number to bits [8:4]. This corresponds to the entry in the MSI-X table that is
set up for the usr_irq_in_vec user interrupt.
3. Write 1 to bit [0] to generate the user interrupt. This bit clears itself after usr_irq_out_ack from the
DMA is generated.

All three steps above can be done at the same time with a single write.
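The following is a minimal C sketch of that single write, assuming bar2 is a host-virtual pointer to the mapped example design AXI4-Lite BAR (the pointer name and mapping are illustrative, not part of the example design or driver API):

#include <stdint.h>

/* Compose USER_INTERRUPT (0x094) in one 32-bit write: function number in
 * bits [19:12], MSI-X vector in bits [8:4], trigger in bit [0]. Bit [0]
 * clears itself after usr_irq_out_ack is returned by the DMA. */
static void trigger_user_irq(volatile uint32_t *bar2, uint32_t func, uint32_t vec)
{
    bar2[0x094 / 4] = ((func & 0xFFu) << 12) | ((vec & 0x1Fu) << 4) | 0x1u;
}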
Following is the user interrupt timing diagram:

Figure: Interrupt

USER_INTERRUPT_MASK (0x098)

Table: User Interrupt Mask (0x098)

Bit Default Access Type Description

[31:0] 0 RW User Interrupt Mask

USER_INTERRUPT_VECTOR (0x09C)

Table: User Interrupt Vector (0x09C)

Bit Default Access Type Description

[31:0] 0 RW User Interrupt Vector

The user_interrupt_mask[31:0] and user_interrupt_vector[31:0] registers are provided in the
example design for user interrupt aggregation, which can generate a user interrupt for a function. The
user_interrupt_mask[31:0] is ANDed (bitwise AND) with user_interrupt_vector[31:0] and a
user interrupt is generated. The user_interrupt_vector[31:0] is a clear-on-read register.
To generate a user interrupt:
To generate a user interrupt:

1. Write the function number at user_interrupt[19:12]. This corresponds to which function
generates the usr_irq_in_fnc user interrupt.
2. Write the MSI-X vector number at user_interrupt[8:4]. This corresponds to which entry in the
MSI-X table is set up for the usr_irq_in_vec user interrupt.
3. Write the mask value in the user_interrupt_mask[31:0] register.
4. Write the interrupt vector value in the user_interrupt_vector[31:0] register.

This generates a user interrupt to the DMA block.

There are two ways to generate a user interrupt:

Write to user_interrupt[0], or
Write to the user_interrupt_vector[31:0] register with the mask set.
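The following minimal C sketch shows the aggregated flow, assuming bar2 is a pointer to the mapped example design AXI4-Lite BAR (the pointer name and mapping are illustrative):

#include <stdint.h>

/* Generate a user interrupt through the mask/vector aggregation registers:
 * select the function and MSI-X entry in USER_INTERRUPT (0x094) without
 * setting the trigger bit, then program the mask and write the vector. */
static void trigger_user_irq_aggregated(volatile uint32_t *bar2,
                                        uint32_t func, uint32_t vec_num,
                                        uint32_t mask, uint32_t vector)
{
    bar2[0x094 / 4] = ((func & 0xFFu) << 12) | ((vec_num & 0x1Fu) << 4);
    bar2[0x098 / 4] = mask;    /* USER_INTERRUPT_MASK */
    bar2[0x09C / 4] = vector;  /* USER_INTERRUPT_VECTOR, clear-on-read */
}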

DMA_CONTROL (0x0A0)

Table: DMA Control (0x0A0)

Bit Default Access Type Field Description

[31:1] NA Reserved

[0] 0 RW gen_qdma_reset When this bit is set, the example design generates a signal that resets the QDMA interface logic. This bit is cleared after 100 cycles.

Writing a 1 to DMA_control[0] generates a soft reset on dma<0/1>_intrfc_resetn (active-Low). The reset
is asserted for 100 cycles, following which the signal is deasserted.
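A minimal sketch of issuing the soft reset in C, assuming bar2 points to the mapped example design AXI4-Lite BAR (illustrative only):

/* Write 1 to DMA_CONTROL (0x0A0); the bit self-clears after 100 cycles
 * and dma<0/1>_intrfc_resetn is then deasserted. */
bar2[0x0A0 / 4] = 0x1u;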

VDM_MESSAGE_READ (0x0A4)

Table: VDM Message Read (0x0A4)

Bit Default Access Type Description

[31:0] RO VDM message read

Vendor Defined Message (VDM) messages, st_rx_msg_data, are stored in a FIFO in the example
design. A read to this register (0x0A4) pops out one 32-bit message at a time.
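A minimal C sketch of draining the message FIFO, assuming bar2 points to the mapped example design AXI4-Lite BAR and that the number of 32-bit words to read is known to the application (both assumptions are illustrative):

#include <stdint.h>

/* Each read of VDM_MESSAGE_READ (0x0A4) pops one 32-bit VDM word. */
static void read_vdm_words(volatile uint32_t *bar2, uint32_t *dst, unsigned num_words)
{
    for (unsigned i = 0; i < num_words; i++)
        dst[i] = bar2[0x0A4 / 4];
}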

CPM5_QDMA_Gen5x8_MM_Performance_Design

CPM5 QDMA1 Gen5x8 AXI4 performance example design:

This design has CPM5 – QDMA1 enabled in Gen5x8 configuration as an End Point
The design targets VPK120 board and it supports synthesis and implementation flows
The design has AXI4 datapath accessing DDR over NoC
Capable of demonstrating the QDMA MM performance

To achieve maximum performance for AXI4 transfers, you need to use both NoC channels 0 and 1.
Both NoC channels can be used by programming the Host Profile settings. For more information,
see Host Profile.
For example, during queue context programming, all even queues can be assigned Host ID 0 and all
odd queues can be assigned Host ID 1. In this manner there is equal traffic on the NoC 0 and NoC 1
channels, which maximizes the MM throughput from the CPM through the NoC to DDR memory.
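The following is an illustrative C sketch of that queue-to-host-ID assignment; set_queue_host_id() is a hypothetical helper standing in for whatever queue context programming mechanism the design uses (it is not a driver API):

extern void set_queue_host_id(unsigned qid, unsigned host_id);  /* hypothetical helper */

/* Balance MM traffic across both CPM-to-NoC channels by alternating the
 * host profile: even queues use Host ID 0, odd queues use Host ID 1. */
void assign_host_ids(unsigned num_queues)
{
    for (unsigned qid = 0; qid < num_queues; qid++)
        set_queue_host_id(qid, qid % 2u);
}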

CPM5_QDMA_Gen4x8_ST_Performance_Design

CPM5 QDMA1 Gen4x8 AXI4-Stream performance example design:

This design has CPM5 – QDMA1 enabled in Gen4x8 configuration as an End Point
The design targets VPK120 board and it supports Synthesis and Implementation flows
Capable of demonstrating the QDMA AXI4-Stream performance

CPM5_QDMA_Dual_Gen4x8_MM_ST_Design

CPM5 Dual Controller QDMA0 and QDMA1 with Gen4x8 AXI4 and AXI4-Stream functional example
design:

This design has CPM5–QDMA0 and CPM5-QDMA1 enabled in Gen4x8 configuration as an End
Point
The design targets VPK120 board and it supports Synthesis and Implementation flows
Enables QDMA AXI4 and QDMA AXI-ST functionality on each controller with 4 PFs and 252 VFs
Capable of exercising the AXI4 path, the AXI-ST path, and descriptor bypass

Example design registers are listed under the CPM5_QDMA_Gen4x8_MM_ST_Design section.

Versal_CPM_QDMA_EP_Simulation_Design

CPM5 QDMA1 Gen4x8 Functional simulation example design:

This design has CPM5 – QDMA1 enabled in Gen4x8 configuration as an End Point
The design supports simulation
Enables QDMA AXI4 and QDMA AXI-ST functionality with 4 PFs and 252 VFs
The design includes a Root Port testbench, which simulates the QDMA AXI4 and AXI4-Stream
datapaths.

Example design registers are listed under the CPM5_QDMA_Gen4x8_MM_ST_Design section.


✎ Note: Before doing data transfers, you must clear the contexts for that queue. First clear the
prefetch context and the software context, followed by the hardware context and the credit context,
for simulation to work properly.
✎ Note: For CPM simulation, you must use Synopsys VCS or Siemens Questa simulation tools.
Versal_CPM_Bridge_RP_Design

CPM5 QDMA1 AXI-Bridge in Root Port mode:

This design has CPM5 – QDMA1 AXI Bridge mode enabled in Gen4x8 configuration as Root
Port
The design targets the VPK120 board and it supports Synthesis and Implementation flows
The design implements the Root Complex functionality. It includes the CIPS IP, which enables both
CPM and PS

Versal_CPM_QDMA_EP_Design (Part Based)

The following are the presets available for you to select:

CPM5_QDMA_Gen5x8_ST_Performance_Design

CPM5 QDMA1 with Gen5x8 AXI4-Stream performance example design:

The design targets "xcvp1202-3HP-e-S" part and it supports synthesis and implementation flows
This design has CPM5 – QDMA1 enabled in Gen5x8, AXI4-Stream configuration as an End
Point
Capable of demonstrating AXI-ST performance
Capable of performing in internal modes or cache bypass mode or in simple bypass mode
To enable the Simple bypass mode in the example design, you need to set the register offset
0x98 to 0x4
To change the data packet size on C2H direction, you need to set the example design register
offset 0x90 to the desired value

Performance Example Design Simple Bypass Mode Flow


In this example design, the QDMA configuration BAR is BAR0 and the performance example design is
controlled through the AXI4-Lite BAR, which is BAR2. First determine the BAR offsets for the QDMA
configuration BAR (BAR0) and the AXI4-Lite BAR (BAR2). A minimal C sketch of this sequence follows
the steps below.

1. Put the example design in pause mode. Set offset 0x8 bit [30] to 1; all other bit values should not be
changed.
2. Set the example design in simple bypass mode. Set offset 0x98 to 0x4.
3. Set the desired packet size. Set offset 0x90 to the desired value.
4. Enable and start the data transfer from the host application/driver.
At this time no data transfer happens because the example design is paused.
5. Fetch the prefetch tag from the QDMA IP (configuration BAR):
Write 0 to QDMA configuration BAR (BAR0) offset 0x1048, and write offset 0x1408 with value
0x0.
Read the prefetch tag from the QDMA configuration BAR (BAR0) at offset 0x140C.
Write that tag value to the example design (BAR2) at offset 0x24.
6. After the prefetch tag exchange, release the example design. Set offset 0x8 bit [30] to 0; all other
bits should not be changed.
7. You can now see the data transfer from the example design to the host.
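The following C sketch implements the sequence above, assuming bar0 and bar2 are host-virtual pointers to the mapped QDMA configuration BAR (BAR0) and example design AXI4-Lite BAR (BAR2); the pointer names and mapping are illustrative:

#include <stdint.h>

static void run_simple_bypass(volatile uint32_t *bar0, volatile uint32_t *bar2,
                              uint32_t pkt_size)
{
    uint32_t ctrl, tag;

    /* 1. Pause the example design: set only bit [30] of offset 0x8. */
    ctrl = bar2[0x08 / 4];
    bar2[0x08 / 4] = ctrl | (1u << 30);

    /* 2. Simple bypass mode and 3. desired C2H packet size. */
    bar2[0x98 / 4] = 0x4;
    bar2[0x90 / 4] = pkt_size;

    /* 4. Start the transfer from the host driver/application here. */

    /* 5. Prefetch tag exchange with the QDMA configuration BAR. */
    bar0[0x1048 / 4] = 0x0;
    bar0[0x1408 / 4] = 0x0;
    tag = bar0[0x140C / 4];
    bar2[0x24 / 4] = tag;

    /* 6. Release the example design: clear bit [30] of offset 0x8. */
    ctrl = bar2[0x08 / 4];
    bar2[0x08 / 4] = ctrl & ~(1u << 30);
}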

CPM5_QDMA_Dual_Gen5x8_ST_Performance_Design

CPM5 Dual Controller QDMA0 and QDMA1 with Gen5x8 AXI4 and AXI4-Stream performance
example design

The design targets "vsva2785-3HP-e-S" part and it supports synthesis and implementation flows
This design has both CPM5–QDMA0 and CPM5-QDMA1 enabled in Gen5x8, AXI4-Stream
configuration as an End Point
Capable of demonstrating AXI-ST performance
Capable of operating in internal mode, cache bypass mode, or simple bypass mode
To change the data packet size in the C2H direction, set the example design register offset
0x90 to the desired value


Application Software Development


Device Drivers

Figure: Device Drivers

The above figure shows the usage model of Linux QDMA software drivers. The QDMA example
design is implemented on an AMD adaptive SoC, which is connected to an X86 host through PCI
Express® .

In the first use mode, the QDMA driver in kernel space runs on Linux, whereas the test
application runs in user space.
In the second use mode, the Data Plane Development Kit (DPDK) is used to develop a QDMA Poll Mode
Driver (PMD) running entirely in user space, which uses the UIO and VFIO kernel frameworks to
communicate with the adaptive SoC.

For device driver documentation click DMA IP Drivers.

Linux QDMA Software Architecture (PF/VF)

Figure: Linux DMA Software Architecture


The QDMA driver consists of the following three major components:

Device control tool


Creates a netlink socket for PCIe device query, queue management, reading the context of a
queue, etc.

DMA tool
Is the user space application used to initiate DMA transactions. You can use the standard Linux
utilities dd or fio, or use the example application in the driver package.

Kernel space driver


Creates the descriptors and translates the user space function calls into low-level commands to
interact with the AMD Versal device.

For AMD QDMA Linux driver documentation, click here.

Using the Drivers

Drivers and the corresponding documentation are available at DMA IP Drivers.


✎ Note: Starting with the 2022.1 release of the Linux driver for QDMA, if a design uses streaming
queues, they must be explicitly enabled through the API because they are not configured at module
load. If a design uses the tandem PCIe methodology at power-on, the enablement must occur after
Stage 2 is loaded to the device.
‼ Important: 8 MSI-X vectors are needed on all functions (PF/VF) for using the QDMA IP driver.

Reference Software Driver Flow

AXI4 Memory Map Flow Chart

Figure: AXI4 Memory Map Flow Chart


AXI4 Memory Mapped C2H Flow

Figure: AXI4 Memory Mapped Card to Host (C2H) Flow Diagram


AXI4 Memory Mapped H2C Flow

Figure: AXI4 Memory Mapped Host to Card (H2C) Flow Diagram


AXI4-Stream Flow Chart

Figure: AXI4-Stream Flow Chart


AXI4-Stream C2H Flow

Figure: AXI4-Stream C2H Flow Diagram


AXI4-Stream H2C Flow

Figure: AXI4-Stream H2C Flow Diagram


Debugging
This appendix includes details about resources available on the AMD Support website and debugging
tools.

Finding Help with AMD Adaptive Computing Solutions

To help in the design and debug process when using the functional mode, the Support web page
contains key resources such as product documentation, release notes, answer records, information
about known issues, and links for obtaining further product support. The Community Forums are also
available where members can learn, participate, share, and ask questions about AMD Adaptive
Computing solutions.

Documentation

This product guide is the main document associated with the functional mode. This guide, along with
documentation related to all products that aid in the design process, can be found on the AMD
Adaptive Support web page or by using the AMD Adaptive Computing Documentation Navigator.
Download the Documentation Navigator from the Downloads page. For more information about this
tool and the features available, open the online help after installation.

Debug Guide

For more information on PCIe debug, see PCIe Debug K-Map.

Answer Records

Answer Records include information about commonly encountered problems, helpful information on
how to resolve these problems, and any known issues with an AMD Adaptive Computing product.
Answer Records are created and maintained daily to ensure that users have access to the most
accurate information available.
Answer Records for this functional mode can be located by using the Search Support box on the main
AMD Adaptive Support web page. To maximize your search results, use keywords such as:

Product name
Tool message(s)
Summary of the issue encountered

A filter search is available after results are returned to further target the results.

Master Answer Record for the Core

AR 75396.

Technical Support

AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:

Implement the solution in devices that are not defined in the documentation.
Customize the solution beyond that allowed in the product documentation.
Change any section of the design labeled DO NOT MODIFY.

To ask questions, navigate to the Community Forums.

Hardware Debug

Hardware issues can range from link bring-up to problems seen after hours of testing. This section
provides debug steps for common issues. The AMD Vivado™ debug feature is a valuable resource to

use in hardware debug. The signal names mentioned in the following individual sections can be
probed using the debug feature for debugging the specific problems.

General Checks

Ensure that all the timing constraints for the core were properly incorporated from the example design
and that all constraints were met during implementation.

Does it work in post-place and route timing simulation? If problems are seen in hardware but not
in timing simulation, this could indicate a PCB issue. Ensure that all clock sources are active and
clean.
If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the
locked port.
If your outputs go to 0, check your licensing.

Registers

A complete list of registers and attributes for the QDMA Subsystem is available in the Versal Adaptive
SoC Register Reference (AM012). Reviewing the registers and attributes might be helpful for
advanced debugging.
✎ Note: The attributes are set during IP customization in the Vivado IP catalog. After core
customization, attributes are read-only.

Upgrading
This appendix is not applicable for the first release of the functional mode.

AXI Bridge Subsystem for CPM4


Overview
The AXI Bridge Subsystem is designed for the AMD Vivado™ IP integrator in the AMD Vivado™
Design Suite. The AXI Bridge Subsystem provides an interface between an AXI4 user logic interface
and PCI Express® using the AMD Versal™ Integrated Block for PCI Express. The AXI Bridge
functional mode provides the translation level between the AXI4 embedded system to the PCI
Express system. The AXI Bridge functional mode translates the AXI4 memory read or writes to PCI®
Transaction Layer Packets (TLP) packets and translates PCIe memory read and write request TLP
packets to AXI4 interface commands.
The architecture of the AXI Bridge is shown in the following figure.

Figure: High-Level AXI Bridge Architecture


Limitations

1. The achievable bandwidth for this subsystem depends on multiple factors, including but not
limited to the IP configuration, the data path options used with the IP, the host system
performance, and the methods by which data movements are programmed. For more
information on the CPM4 AXI Bridge, see the Data Bandwidth and Performance Tuning section. The
bandwidth ceiling is limited by the lower of the raw capacity of the designed PCIe link
configuration and the internal data interface used. The Data Bandwidth and Performance Tuning
section provides guidance on the related clock frequency settings and high-level guidance on
performance expectations. Achievable bandwidth might vary.
2. Bridge is compliant with all MPS and MRRS settings; however, all the traffic initiated from the
Bridge is limited to 256 bytes (max).
3. AXI address width is limited to 48 bits.

Product Specification
The Register block contains registers used in the AXI Bridge functional mode for dynamically mapping
the AXI4 memory mapped (MM) address range provided using the AXIBAR parameters to an address
for PCIe® range.
The slave bridge provides termination of memory-mapped AXI4 transactions from an AXI master
device (such as a processor). The slave bridge provides a way to translate addresses that are
mapped within the AXI4 memory mapped address domain to the domain addresses for PCIe. Write
transactions to the Slave Bridge are converted into one or more MemWr TLPs, depending on the
configured Max Payload Size setting, which are passed to the integrated block for PCI Express. The
slave bridge can support up to 32 active AXI4 Write requests. When a remote AXI master initiates a
read transaction to the slave bridge, the read address and qualifiers are captured and a MemRd
request TLP is passed to the core and a completion timeout timer is started. Completions received

through the core are correlated with pending read requests and read data is returned to the AXI
master. The slave bridge can support up to 32 active AXI4 Read requests with pending completions.
The master bridge processes both PCIe MemWr and MemRd request TLPs received from the
Integrated Block for PCI Express and provides a means to translate addresses that are mapped
within the address for PCIe domain to the memory mapped AXI4 address domain. Each PCIe MemWr
request TLP header is used to create an address and qualifiers for the memory mapped AXI4 bus and
the associated write data is passed to the addressed memory mapped AXI4 Slave. The Master Bridge
can support up to 32 active PCIe MemWr request TLPs. PCIe MemWr request TLPs support is as
follows:

4 for 64-bit AXI data width


8 for 128-bit AXI data width
16 for 256-bit AXI data width
32 for 512-bit AXI data width

Each PCIe MemRd request TLP header is used to create an address and qualifiers for the memory-
mapped AXI4 bus. Read data is collected from the addressed memory mapped AXI4 slave and used
to generate completion TLPs which are then passed to the integrated block for PCI Express. The
Master Bridge can support up to 32 active PCIe MemRd request TLPs with pending completions for
improved AXI4 pipelining performance.
The instantiated AXI4-Stream Enhanced PCIe block contains submodules including the
Requester/Completer interfaces to the AXI bridge and the Register block. The Register block contains
the status, control, and interrupt registers.

AXI Bridge Operations

AXI Transactions for PCIe

The following tables are the translation tables for AXI4-Stream and memory-mapped transactions.

Table: AXI4 Memory-Mapped Transactions to AXI4-Stream PCIe TLPs

AXI4 Memory-Mapped Transaction AXI4-Stream PCIe TLPs

INCR Burst Read of AXIBAR MemRd 32 (3DW)

INCR Burst Write to AXIBAR MemWr 32 (3DW)

INCR Burst Read of AXIBAR MemRd 64 (4DW)

INCR Burst Write to AXIBAR MemWr 64 (4DW)

Table: AXI4-Stream PCIe TLPs to AXI4 Memory Mapped Transactions

AXI4-Stream PCIe TLPs AXI4 Memory-Mapped Transaction

MemRd 32 (3DW) of PCIEBAR INCR Burst Read

MemWr 32 (3DW) to PCIEBAR INCR Burst Write

MemRd 64 (4DW) of PCIEBAR INCR Burst Read

MemWr 64 (4DW) to PCIEBAR INCR Burst Write


For PCIe® requests with lengths greater than 1 Dword, the size of the data burst on the Master AXI
interface will always equal the width of the AXI data bus even when the request received from the
PCIe link is shorter than the AXI bus width.
The slave AXI write strobe (wstrb) signal can be used to facilitate data alignment to an address boundary.
The write strobe signal can be 0 at the beginning of a valid data cycle, and the IP appropriately calculates
an offset to the given address. However, the valid data identified by the write strobe signal must be
continuous from the first byte enable to the last byte enable.
All transactions initiated at the Slave Bridge interface are modified and metered by the IP as
necessary. The Slave Bridge interface conforms to the AXI4 specification and allows burst sizes up to
4 KB, and the IP splits the transaction automatically according to the PCIe Max Read Request Size
(MRRS), Max Payload Size (MPS), and Read Completion Boundary (RCB). As a result of this
operation, one request in the AXI domain can result in multiple requests in the PCIe domain, and the IP
adjusts the number of issued PCIe requests accordingly to avoid oversubscribing the available
completion buffer.
The Slave Bridge does not support narrow transfers natively on its AXI Slave interface. It is highly
recommended that the AXI Master interfacing to the Slave Bridge never generate narrow transfers.
However, the Bridge core can be customized to enable AXI Slave narrow burst support to allow
interfacing to an AXI master that generates narrow transfers. These narrow transfers are typically
generated as a result of interfacing with a smaller-width AXI Master through an AXI Interconnect IP.
When this option is enabled in the Bridge core, an AXI Upsizer IP is added as a sub-core and attached to
the Slave Bridge AXI Slave interface. The internal AXI Upsizer IP is configured to modify the AXI
transaction regardless of the modifiable bit in the AxCache signal to guarantee full transfers at the AXI
Slave interface, and it can produce a new AXI request that is longer than the original AXI request
before it is processed into PCIe packets. The AXI Master that originated the read request, or the
destination PCIe device that is the recipient of the write request, never receives extra data;
however, care must be taken when reading a destination device with destructive read behavior (such
as FIFOs or registers that are clear on read) because extra bytes read by the core can alter the contents
of those devices. Consequently, the extra null data beat that is generated as a result of the modified
AXI request can impact the performance of the Bridge core. For more information on the AXI Upsizer IP,
see AXI Interconnect LogiCORE IP Product Guide (PG059).

Transaction Ordering for PCIe

The AXI Bridge functional mode conforms to PCIe® transaction ordering rules. See the PCI-SIG
Specifications for the complete rule set. The following behaviors are implemented in the AXI Bridge
functional mode to enforce the PCIe transaction ordering rules on the highly-parallel AXI bus of the
bridge.

The bresp to the remote (requesting) AXI4 master device for a write to a remote PCIe device is
not issued until the MemWr TLP transmission is guaranteed to be sent on the PCIe link before any
subsequent TX-transfers.
If Relaxed Ordering bit is not set within the TLP header, then a remote PCIe device read to a
remote AXI slave is not permitted to pass any previous remote PCIe device writes to a remote
AXI slave received by the AXI Bridge functional mode. The AXI read address phase is held until
the previous AXI write transactions have completed and bresp has been received for the AXI
write transactions. If the Relaxed Ordering attribute bit is set within the TLP header, then the
remote PCIe device read is permitted to pass.
Read completion data received from a remote PCIe device are not permitted to pass any remote
PCIe device writes to a remote AXI slave received by the AXI Bridge functional mode prior to the
read completion data. The bresp for the AXI write(s) must be received before the completion
data is presented on the AXI read data channel.

✎ Note: The transaction ordering rules for PCIe might have an impact on data throughput in heavy
bidirectional traffic.

BAR and Address Translation

BAR Addressing

Aperture_Base_Address_n represents Aperture Base Address of nth BAR in GUI


Aperture_High_Address_n represents Aperture High Address of nth BAR in GUI
AXI to PCIe Translation_n represents AXI to PCIe_translation of nth BAR in GUI
Aperture_Base_Address n and Aperture_High_Address_n are used to calculate the size of the
AXI BAR n and during address translation to PCIe address.

Aperture_Base_Address_n provides the low address where AXI BAR n starts and will be
regarded as address offset 0x0 when the address is translated.
Aperture_High_Address_n is the high address of the last valid byte address of AXI BAR n. (For
more details on how the address gets translated, see Address Translation.)

The difference between Aperture_Base_Address_n and Aperture_High_Address_n is your AXI
BAR n size. These values must be set such that the AXI BAR n size is a power of two and is at least 4K.
When a packet is sent to the Bridge core (outgoing PCIe packets), the packet must have an address
that is in the range of Aperture_Base_Address_n and Aperture_High_Address_n. For a design
that has an AXI Master with a potential of generating an address outside of this range, you must
attach an additional AXI Interconnect core or provide an external address filtering logic. Any packet
that is received by the AXI Interconnect core that has an address outside of this range is responded to
with a SLVERR and prevented from progressing into the Bridge core. When the IP integrator is used,
these parameters are derived from the Address Editor tab within the IP integrator. The Address Editor
sets the AXI Interconnect as well as the core so the address range matches, and the packet is routed
to the core only when the packet has an address within the valid range. AXI Address width is limited to
48 bits.

Address Translation

There are two ways to change the PCIe address translation:

Address translation from the IP GUI configuration.


Address translation through registers.

Address translation from the IP GUI configuration


Set Aperture Base Address and Aperture High Address to a desired value during the IP
configuration. You should set AXI to PCIE Translation to all 0s.

Address translation through registers


The AXI base address translation registers are listed in the cpm4-bridge-v2-1-register.csv file, at
offsets 0xEE0 to 0xF0C. This set of registers can be used in two ways based on the width of
the PCIe address. When the PCIe address space is 32 bits, the translation
vector should be placed into the AXIBAR_<n>L registers, where n is the AXI BAR number (0 to 5).
When the PCIe address space is 64 bits, the most significant 32 bits are written into AXIBAR_<n>U
and the least significant 32 bits are written into AXIBAR_<n>L. Take care that
invalid values are not written to the address translation registers.
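A minimal C sketch of programming one translation, assuming bridge_regs points to the mapped bridge register space; the AXIBAR0_L/AXIBAR0_U offsets below are placeholders and must be taken from cpm4-bridge-v2-1-register.csv for the actual design:

#include <stdint.h>

#define AXIBAR0_L 0xEE0u   /* placeholder offset: confirm against the register CSV */
#define AXIBAR0_U 0xEE4u   /* placeholder offset: confirm against the register CSV */

/* Program the AXI BAR 0 translation. For a 64-bit PCIe address, the upper
 * 32 bits go to AXIBAR_0U and the lower 32 bits to AXIBAR_0L; for a 32-bit
 * address the upper register is written as zero. The value must be aligned
 * to the AXI BAR aperture size. */
static void set_axibar0_translation(volatile uint32_t *bridge_regs, uint64_t pcie_addr)
{
    bridge_regs[AXIBAR0_U / 4] = (uint32_t)(pcie_addr >> 32);
    bridge_regs[AXIBAR0_L / 4] = (uint32_t)(pcie_addr & 0xFFFFFFFFu);
}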

Four examples follow:

Example 1 (32-bit PCIe Address Mapping) demonstrates how to set up three AXI BARs and
translate the AXI address to a 32-bit address for PCIe.
Example 2 (64-bit PCIe Address Mapping) demonstrates how to set up three AXI BARs and
translate the AXI address to a 64-bit address for PCIe.
Example 3 demonstrates how to set up two 64-bit PCIe BARs and translate the address for PCIe
to an AXI address.
Example 4 demonstrates how to set up a combination of two 32-bit AXI BARs and two 64 bit AXI
BARs, and translate the AXI address to an address for PCIe.

Example 1 (32-bit PCIe Address Mapping)

This example shows the generic settings to set up three independent AXI BARs and address
translation of AXI addresses to a remote 32-bit address space for PCIe. This setting of AXI BARs
does not depend on the BARs for PCIe in the functional mode.
In this example, where the number of AXI BARs is 3, the following assignments are made for each range.

Aperture_Base_Address_0=0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 Kbytes)
AXI_to_PCIe_Translation_0=0x00000000_56710000 (Bits 63-32 are zero in order to produce a
32-bit PCIe TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero
values in the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 Kbytes)
AXI_to_PCIe_Translation_1=0x00000000_FEDC0000 (Bits 63-32 are zero in order to produce a
32-bit PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero
values in the lower 13 bits are invalid translation values.)

Aperture_Base_Address_2 =0x00000000_FE000000
Aperture_High_Address_2 =0x00000000_FFFFFFFF (32 Mbytes)
AXI_to_PCIe_Translation_2=0x00000000_40000000 (Bits 63-32 are zero in order to produce a
32-bit PCIe TLP. Bits 24-0 must be zero based on the AXI BAR aperture size. Non-zero
values in the lower 25 bits are invalid translation values.)

Figure: Example 1 Settings

Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the AXI bus yields
0x56710ABC on the bus for PCIe.

Figure: AXI to PCIe Address Translation


Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the AXI bus yields
0xFEDC1123 on the bus for PCIe.
Accessing the Bridge AXI BAR_2 with address 0x0000_FFFEDCBA on the AXI bus yields
0x41FEDCBA on the bus for PCIe.
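The translation arithmetic behind these examples can be summarized in a short C sketch (illustrative only); the offset within the AXI BAR is preserved and the translation value replaces the BAR base:

#include <stdint.h>

/* Returns the PCIe address produced for an AXI address that falls inside
 * the BAR defined by aperture_base/aperture_high (aperture_high is the
 * last valid byte address). Returns UINT64_MAX if the address is outside
 * the BAR and therefore not translated by this BAR. */
static uint64_t axi_to_pcie(uint64_t axi_addr, uint64_t aperture_base,
                            uint64_t aperture_high, uint64_t translation)
{
    if (axi_addr < aperture_base || axi_addr > aperture_high)
        return UINT64_MAX;
    return translation + (axi_addr - aperture_base);
}

/* Example: axi_to_pcie(0x12340ABC, 0x12340000, 0x1234FFFF, 0x56710000)
 * returns 0x56710ABC, matching AXI BAR_0 above. */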

Example 2 (64-bit PCIe Address Mapping)

This example shows the generic settings to set up to three independent AXI BARs and address
translation of AXI addresses to a remote 64-bit address space for PCIe. This setting of AXI BARs
does not depend on the BARs for PCIe within the Bridge.
In this example, where the number of AXI BARs is three, the following assignments are made for each range:

Aperture_Base_Address_0 =0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 Kbytes)
AXI_to_PCIe_Translation_0=0x5000000056710000 (Bits 63-32 are non-zero in order to produce a
64-bit PCIe TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero
values in the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 Kbytes)
AXI_to_PCIe_Translation_1=0x60000000_FEDC0000 (Bits 63-32 are non-zero in order to produce a
64-bit PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero
values in the lower 13 bits are invalid translation values.)

Aperture_Base_Address_2 =0x00000000_FE000000
Aperture_High_Address_2 =0x00000000_FFFFFFFF (32 Mbytes)
AXI_to_PCIe_Translation_2=0x70000000_40000000 (Bits 63-32 are non-zero in order to produce a
64-bit PCIe TLP. Bits 24-0 must be zero based on the AXI BAR aperture size. Non-zero
values in the lower 25 bits are invalid translation values.)

Figure: Example 2 Settings

Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the bus yields
0x5000000056710ABC on the bus for PCIe.
Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the bus yields
0x60000000FEDC1123 on the bus for PCIe.
Accessing the Bridge AXI BAR_2 with address 0x0000_FFFEDCBA on the bus yields
0x7000000041FEDCBA on the bus for PCIe.

Example 3

This example shows the generic settings to set up two independent BARs for PCIe® and address
translation of addresses for PCIe to a remote AXI address space. This setting of BARs for PCIe does
not depend on the AXI BARs within the bridge.
In this example, where the number of PCIe BARs is two, the following range assignments are made.

Aperture_Base_Address_0 =0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 KB)
AXI_to_PCIe_Translation_0=0x00000000_56710000 (Bits 63-32 are zero to produce a 32-bit PCIe
TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero values in
the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 KB)
AXI_to_PCIe_Translation_1=0x50000000_FEDC0000 (Bits 63-32 are non-zero to produce a 64-bit
PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero values
in the lower 13 bits are invalid translation values.)

Figure: Example 3 Settings

Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the AXI bus yields
0x56710ABC on the bus for PCIe.
Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the AXI bus yields
0x50000000FEDC1123 on the bus for PCIe.

Figure: PCIe to AXI Translation

Example 4

This example shows the generic settings of four AXI BARs and address translation of AXI addresses
to remote 32-bit and 64-bit addresses for PCIe®. This setting of AXI BARs does not depend on the
BARs for PCIe within the Bridge.
In this example, where the number of AXI BARs is 4, the following assignments are made for each range:

Aperture_Base_Address_0 =0x00000000_12340000
Aperture_High_Address_0 =0x00000000_1234FFFF (64 KB)
AXI_to_PCIe_Translation_0=0x00000000_56710000 (Bits 63-32 are zero to produce a 32-bit PCIe
TLP. Bits 15-0 must be zero based on the AXI BAR aperture size. Non-zero values in
the lower 16 bits are invalid translation values.)

Aperture_Base_Address_1 =0x00000000_ABCDE000
Aperture_High_Address_1 =0x00000000_ABCDFFFF (8 KB)
AXI_to_PCIe_Translation_1=0x50000000_FEDC0000 (Bits 63-32 are non-zero to produce a 64-bit
PCIe TLP. Bits 12-0 must be zero based on the AXI BAR aperture size. Non-zero values
in the lower 13 bits are invalid translation values.)

Aperture_Base_Address_2 =0x00000000_FE000000
Aperture_High_Address_2 =0x00000000_FFFFFFFF (32 MB)
AXI_to_PCIe_Translation_2=0x00000000_40000000 (Bits 63-32 are zero to produce a 32-bit PCIe
TLP. Bits 24-0 must be zero based on the AXI BAR aperture size. Non-zero values in
the lower 25 bits are invalid translation values.)

Aperture_Base_Address_3 =0x00000000_00000000
Aperture_High_Address_3 =0x00000000_00000FFF (4 KB)
AXI_to_PCIe_Translation_3=0x60000000_87654000 (Bits 63-32 are non-zero to produce a 64-bit
PCIe TLP. Bits 11-0 must be zero based on the AXI BAR aperture size. Non-zero values
in the lower 12 bits are invalid translation values.)

Figure: Example 4 Settings


Accessing the Bridge AXI BAR_0 with address 0x0000_12340ABC on the AXI bus yields
0x56710ABC on the bus for PCIe.
Accessing the Bridge AXI BAR_1 with address 0x0000_ABCDF123 on the AXI bus yields
0x50000000FEDC1123 on the bus for PCIe.
Accessing the Bridge AXI BAR_2 with address 0x0000_FFFEDCBA on the AXI bus yields
0x41FEDCBA on the bus for PCIe.
Accessing the Bridge AXI BAR_3 with address 0x0000_00000071 on the AXI bus yields
0x6000000087654071 on the bus for PCIe.

Addressing Checks

When setting the following parameters for PCIe® address mapping, C_PCIEBAR2AXIBAR_n and
PF0_BARn_APERTURE_SIZE, be sure these are set to allow for the addressing space on the AXI
system. For example, the following setting is illegal and results in an invalid AXI address.

C_PCIEBAR2AXIBAR_n=0x00000000_FFFFF000
PF0_BARn_APERTURE_SIZE=0x06 (8 KB)

For an 8 Kilobyte BAR, the lower 13 bits must be zero. As a result, the C_PCIEBAR2AXIBAR_n value
should be modified to be 0x00000000_FFFFE000. Also, check for a larger value on
PF0_BARn_APERTURE_SIZE compared to the value assigned to the C_PCIEBAR2AXIBAR_n parameter.
An example parameter setting follows.

C_PCIEBAR2AXIBAR_n=0xFFFF_E000
PF0_BARn_APERTURE_SIZE=0x0D (1 MB)

To keep the AXIBAR upper address bits as 0xFFFF_E000 (to reference bits [31:13]), the
PF0_BARn_APERTURE_SIZE parameter must be set to 0x06 (8 KB).
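A minimal C check for this alignment rule (illustrative only); the translation value must have zeros in all address bits covered by the BAR aperture:

#include <stdint.h>

/* Returns 1 if bar2axibar is aligned to the BAR aperture (aperture_size in
 * bytes, a power of two), 0 otherwise. For example, with an 8 KB aperture,
 * 0xFFFFF000 is rejected while 0xFFFFE000 is accepted. */
static int pciebar2axibar_is_valid(uint64_t bar2axibar, uint64_t aperture_size)
{
    return (bar2axibar & (aperture_size - 1)) == 0;
}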

Malformed TLP

The integrated block for PCI Express® detects a malformed TLP. For the IP configured as an
Endpoint core, a malformed TLP results in a fatal error message being sent upstream if error reporting
is enabled in the Device Control register.

Abnormal Conditions

This section describes how the Slave side and Master side (see the following tables) of the AXI Bridge
functional mode handle abnormal conditions.

Slave Bridge Abnormal Conditions

Illegal Burst Type

The slave bridge monitors AXI read and write burst type inputs to ensure that only the INCR
(incrementing burst) type is requested. Any other value on these inputs is treated as an error condition
and the Slave Illegal Burst (SIB) interrupt is asserted. In the case of a read request, the Bridge
asserts SLVERR for all data beats and arbitrary data is placed on the Slave AXI4-MM read data bus.
In the case of a write request, the Bridge asserts SLVERR for the write response and all write data is
discarded.

Completion TLP Errors

Any request to the bus for PCIe (except for a posted Memory write) requires a completion TLP to
complete the associated AXI request. The Slave side of the Bridge checks the received completion
TLPs for errors and checks for completion TLPs that are never returned (Completion Timeout). Each
of the completion TLP error types are discussed in the subsequent sections.
Unexpected Completion
When the slave bridge receives a completion TLP, it matches the header RequesterID and Tag to the
outstanding RequesterID and Tag. A match failure indicates the TLP is an Unexpected Completion
which results in the completion TLP being discarded and a Slave Unexpected Completion (SUC)
interrupt strobe being asserted. Normal operation then continues.
Unsupported Request
A device for PCIe might not be capable of satisfying a specific read request. For example, if the read
request targets an unsupported address for PCIe, the completer returns a completion TLP with a
completion status of 0b001 - Unsupported Request. The completer that returns a completion TLP
with a completion status of Reserved must be treated as an unsupported request status, according to
the PCI Express Base Specification v3.0. When the slave bridge receives an unsupported request
response, the Slave Unsupported Request (SUR) interrupt is asserted and the DECERR response is
asserted with arbitrary data on the AXI4 memory mapped bus.
Completion Timeout

A Completion Timeout occurs when a completion (Cpl) or completion with data (CplD) TLP is not
returned after an AXI to PCIe memory read request, or after a PCIe Configuration Read/Write request.
For PCIe Configuration Read/Write request, completions must complete within the C_COMP_TIMEOUT
parameter selected value from the time the request is issued. For PCIe Memory Read request,
completions must complete within the value set in the Device Control 2 register in the PCIe
Configuration Space register. When a completion timeout occurs, an OKAY response is asserted with
all 1s data on the memory mapped AXI4 bus.
Poison Bit Received on Completion Packet
An Error Poison occurs when the completion TLP EP bit is set, indicating that there is poisoned data
in the payload. When the slave bridge detects the poisoned packet, the Slave Error Poison (SEP)
interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.
Completer Abort
A Completer Abort occurs when the completion TLP completion status is 0b100 - Completer Abort.
This indicates that the completer has encountered a state in which it was unable to complete the
transaction. When the slave bridge receives a completer abort response, the Slave Completer Abort
(SCA) interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.

Table: Slave Bridge Response to Abnormal Conditions

Transfer Type Abnormal Condition Bridge Response

Read Illegal burst type SIB interrupt is asserted. SLVERR response given with arbitrary read data.

Write Illegal burst type SIB interrupt is asserted. Write data is discarded. SLVERR response given.

Read Unexpected completion SUC interrupt is asserted. Completion is discarded.

Read Unsupported Request status returned SUR interrupt is asserted. DECERR response given with arbitrary read data.

Read Completion timeout SCT interrupt is asserted. SLVERR response given with arbitrary read data.

Read Poison bit in completion SEP interrupt is asserted. Completion data is discarded. SLVERR response given with arbitrary read data.

Read Completer Abort (CA) status returned SCA interrupt is asserted. SLVERR response given with arbitrary read data.

PCIe Error Handling

RP_ERROR_FIFO (RP only)


Error handling is as follows:

1. Read register 0xE10 (INT_DEC) and check whether one of the following bits is set: [9] (correctable),
[10] (non_fatal), or [11] (fatal).
2. Read register 0xE20 (RP_CSR) and check whether bit [16] (efifo_not_empty) is set.
3. If the FIFO is not empty, read the FIFO by reading 0xE2C (RP_FIFO_READ).
a. The error message indicates where the error comes from (that is, the requester ID) and the error type.
4. To clear the error, write to 0xE2C (RP_FIFO_READ). The value does not matter.
5. Repeat steps 2 and 3 until the 0xE2C (RP_FIFO_READ) bit [18] valid bit is cleared.
6. Write 1 to register 0xE10 (INT_DEC) to clear bits [9] (correctable), [10] (non_fatal), or [11] (fatal).
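A minimal C sketch of this error FIFO flow, assuming bridge_regs points to the mapped bridge register space and that the INT_DEC bits are write-1-to-clear as described in step 6 (both assumptions are illustrative):

#include <stdint.h>

static void drain_rp_error_fifo(volatile uint32_t *bridge_regs)
{
    const uint32_t err_bits = (1u << 9) | (1u << 10) | (1u << 11);
    uint32_t intdec = bridge_regs[0xE10 / 4];       /* step 1 */

    if ((intdec & err_bits) == 0)
        return;                                     /* no error reported */

    while (bridge_regs[0xE20 / 4] & (1u << 16)) {   /* step 2: efifo_not_empty */
        uint32_t msg = bridge_regs[0xE2C / 4];      /* step 3: read error message */
        if ((msg & (1u << 18)) == 0)                /* step 5: valid bit cleared */
            break;
        /* Decode the requester ID and error type from msg here. */
        bridge_regs[0xE2C / 4] = 0;                 /* step 4: clear, value ignored */
    }

    /* Step 6: clear the correctable/non_fatal/fatal indication bits. */
    bridge_regs[0xE10 / 4] = intdec & err_bits;
}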

RP_PME_FIFO (RP only)


Error handling is as follows:

1. Read register 0xE10 (INT_DEC) and check whether bit [17] is set, which indicates a PM_PME message
has been received.
2. Read register 0xE20 (RP_CSR) and check whether bit [18] (pfifo_not_empty) is set.
3. If the FIFO is not empty, read the FIFO by reading 0xE30 (RP_PFIFO).
a. The message indicates where it comes from (that is, the requester ID).
4. To clear the error, write to 0xE30 (RP_PFIFO). The value does not matter.
5. Repeat steps 2 and 3 until the 0xE30 (RP_PFIFO) bit [31] valid bit is cleared.
6. Write 1 to register 0xE10 (INT_DEC) to clear bit [17].

Master Bridge Abnormal Conditions

The following sections describe the manner in which the master bridge handles abnormal conditions.

AXI DECERR Response

When the master bridge receives a DECERR response from the AXI bus, the request is discarded
and the Master DECERR (MDE) interrupt is asserted. If the request was non-posted, a completion
packet with the Completion Status = Unsupported Request (UR) is returned on the bus for PCIe.

AXI SLVERR Response

When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.

Max Payload Size for PCIe, Max Read Request Size or 4K Page Violated

When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.

Completion Packets

When the MAX_READ_REQUEST_SIZE is greater than the MAX_PAYLOAD_SIZE, a read request for PCIe
can ask for more data than the master bridge can insert into a single completion packet. When this
situation occurs, multiple completion packets are generated up to MAX_PAYLOAD_SIZE, with the Read
Completion Boundary (RCB) observed.

Poison Bit

When the poison bit is set in a transaction layer packet (TLP) header, the payload following the
header is corrupted. When the master bridge receives a memory request TLP with the poison bit set,
it discards the TLP and asserts the Master Error Poison (MEP) interrupt strobe.

Zero Length Requests

When the master bridge receives a read request with the Length = 0x1, FirstBE = 0x00, and LastBE =
0x00, it responds by sending a completion with Status = Successful Completion.
When the master bridge receives a write request with the Length = 0x1, FirstBE = 0x00, and LastBE
= 0x00 there is no effect.

Table: Master Bridge Response to Abnormal Conditions

Transfer Type Abnormal Condition Bridge Response

Read DECERR response MDE interrupt strobe asserted. Completion returned with Unsupported Request status.

Write DECERR response MDE interrupt strobe asserted.

Read SLVERR response MSE interrupt strobe asserted. Completion returned with Completer Abort status.

Write SLVERR response MSE interrupt strobe asserted.

Write Poison bit set in request MEP interrupt strobe asserted. Data is discarded.

Link Down Behavior

The normal operation of the functional mode is dependent on the integrated block for PCIe
establishing and maintaining the point-to-point link with an external device for PCIe. If the link has
been lost, it must be re-established to return to normal operation.
When a Hot Reset is received by the functional mode, the link goes down and the PCI Configuration
Space must be reconfigured.
Initiated AXI4 write transactions that have not yet completed on the AXI4 bus when the link goes down
have a SLVERR response given and the write data is discarded. Initiated AXI4 read transactions that
have not yet completed on the AXI4 bus when the link goes down have a SLVERR response given,
with arbitrary read data returned.
Any MemWr TLPs for PCIe that have been received, but the associated AXI4 write transaction has not
started when the link goes down, are discarded.

Endpoint

When configured to support Endpoint functionality, the AXI Bridge functional mode fully supports
Endpoint operation as supported by the underlying block. There are a few details that need special
consideration. The following subsections contain information and design considerations about
Endpoint support.

Interrupts

The interrupt modes described in the following section apply to AXI Bridge mode only.
Multiple interrupt modes can be configured during IP configuration, however only one interrupt mode
is used at runtime. If multiple interrupt modes are enabled by the host after PCI bus enumeration at
runtime, MSI-X interrupt takes precedence over MSI interrupt, and MSI interrupt takes precedence
over Legacy interrupt. All of these interrupt modes are sent using the same xdma0_usr_irq_*
interface and the core automatically picks the best available interrupt mode at runtime.

Legacy Interrupts

Asserting one or more bits of xdma0_usr_irq_req when legacy interrupts are enabled causes the IP
to issue a legacy interrupt over PCIe. Multiple bits may be asserted simultaneously but each bit must
remain asserted until the corresponding xdma0_usr_irq_ack bit has been asserted. After a
xdma0_usr_irq_req bit is asserted, it must remain asserted until the corresponding
xdma0_usr_irq_ack bit is asserted and the interrupt has been serviced and cleared by the Host. The

xdma0_usr_irq_ack assertion indicates the requested interrupt has been sent on the PCIe block.
This will ensure interrupt pending register within the IP remains asserted when queried by the Host's
Interrupt Service Routine (ISR) to determine the source of interrupts. You must implement a
mechanism in the user application to know when the interrupt routine has been serviced. This
detection can be done in many different ways depending on your application and your use of this
interrupt pin. This typically involves a register (or array of registers) implemented in the user
application that is cleared, read, or modified by the Host software when an interrupt is serviced.
After the xdma0_usr_irq_req bit is deasserted, it cannot be reasserted until the corresponding
xdma0_usr_irq_ack bit has been asserted for a second time. This indicates the deassertion
message for the legacy interrupt has been sent over PCIe. After a second xdma0_usr_irq_ack
occurred, the xdma0_usr_irq_req wire can be reasserted to generate another legacy interrupt.
The xdma0_usr_irq_req bit can be mapped to legacy interrupt INTA, INTB, INTC, INTD through the
configuration registers. The following figure shows the legacy interrupts.
This figure shows only the handshake between xdma0_usr_irq_req and xdma0_usr_irq_ack. The
user application might not clear or service the interrupt immediately, in which case, you must keep
xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: Legacy Interrupts

MSI and Internal MSI-X Interrupts

Asserting one or more bits of xdma0_usr_irq_req causes the generation of an MSI or MSI-X
interrupt if MSI or MSI-X is enabled. If both MSI and MSI-X capabilities are enabled, an MSI-X
interrupt is generated. The Internal MSI-X interrupts mode is enabled when you set the MSI-X
Implementation Location option to Internal in the PCIe Misc Tab.
After a xdma0_usr_irq_req bit is asserted, it must remain asserted until the corresponding
xdma0_usr_irq_ack bit is asserted and the interrupt has been serviced and cleared by the Host. The
xdma0_usr_irq_ack assertion indicates the requested interrupt has been sent on the PCIe block.
This will ensure the interrupt pending register within the IP remains asserted when queried by the
Host's Interrupt Service Routine (ISR) to determine the source of interrupts. You must implement a
mechanism in the user application to know when the interrupt routine has been serviced. This
detection can be done in many different ways depending on your application and your use of this
interrupt pin. This typically involves a register (or array of registers) implemented in the user
application that is cleared, read, or modified by the Host software when an Interrupt is serviced.
Configuration registers are available to map xdma0_usr_irq_req to MSI or MSI-X vectors. For MSI-X
support, there is also a vector table and PBA table. The following figure shows the MSI interrupt.
This figure shows only the handshake between xdma0_usr_irq_req and xdma0_usr_irq_ack. Your
application might not clear or service the interrupt immediately, in which case, you must keep
xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: MSI Interrupts


The following figure shows the MSI-X interrupt.


This figure shows only the handshake between xdma0_usr_irq_req and xdma0_usr_irq_ack. Your
application might not clear or service the interrupt immediately, in which case, you must keep
xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: MSI-X Interrupts

Root Port

When configured to support Root Port functionality, the AXI Bridge functional mode fully supports
Root Port operation as supported by the underlying block. There are a few details that need special
consideration. The following subsections contain information and design considerations about Root
Port support.

Enhanced Configuration Access Memory

When the functional mode is configured as a Root Port, configuration traffic is generated by using the
PCI Express enhanced configuration access mechanism (ECAM). ECAM functionality is available
only when the core is configured as a Root Port. Reads and writes to a certain memory aperture are
translated to configuration reads and writes, as specified in the PCI Express Base Specification
(v3.0), §7.2.2.
The address breakdown is defined in the following table. ECAM is used in conjunction with the Bridge
Register Memory Map only when used in both AXI Bridge for PCIe Gen3 core as well as DMA/Bridge
Subsystem for PCIe in AXI Bridge mode core. The DMA/Bridge Subsystem for PCIe Register Memory
Map does not have ECAM functionality.
When an ECAM access is attempted to the primary bus number, which defaults as bus 0 from reset,
then access to the type 1 PCI™ Configuration Header of the integrated block in the Enhanced
Interface for PCIe is performed. When an ECAM access is attempted to the secondary bus number,
then type 0 configuration transactions are generated. When an ECAM access is attempted to a bus
number that is in the range defined by the secondary bus number and subordinate bus number (not
including the secondary bus number), then type 1 configuration transactions are generated. The
primary, secondary, and subordinate bus numbers are written by Root Port software to the type 1 PCI
Configuration Header of the Enhanced Interface for PCIe in the beginning of the enumeration
procedure.
When an ECAM access is attempted to a bus number that is out of the bus_number and subordinate
bus number range, the bridge does not generate a configuration request and signals a SLVERR response
on the AXI4-Lite bus. When the Bridge is configured for EP (PL_UPSTREAM_FACING = TRUE), the
underlying Integrated Block configuration space and the core memory map are available at the
beginning of the memory space. The memory space looks like a simple PCI Express® configuration

space. When the Bridge is configured for RC (PL_UPSTREAM_FACING = FALSE), the same is true, but
it also looks like an ECAM access to primary bus, Device 0, Function 0.
When the functional mode is configured as a Root Port, the reads and writes of the local ECAM are
Bus 0. Because the adaptive SoC only has a single Integrated Block for PCIe core, all local ECAM
operations to Bus 0 return the ECAM data for Device 0, Function 0.
Configuration write accesses across the PCI Express bus are non-posted writes and block the AXI4-
Lite interface while they are in progress. Because of this, system software is not able to service an
interrupt if one were to occur. However, interrupts due to abnormal terminations of configuration
transactions can generate interrupts. ECAM read transactions block subsequent Requester read
TLPs until the configuration read completions packet is returned to allow unique identification of the
completion packet.

Table: ECAM Addressing

Bits | Name | Description
1:0 | Byte Address | Ignored for this implementation. The s<n>_axi_wstrb signals define byte enables for ECAM accesses.
7:2 | Register Number | Register within the configuration space to access.
11:8 | Extended Register Number | Along with Register Number, allows access to the PCI Express Extended Configuration Space.
14:12 | Function Number | Function Number to completer.
19:15 | Device Number | Device Number to completer.
27:20 | Bus Number | Bus Number to completer.
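
The address breakdown above maps directly to a simple offset computation. The following C sketch, provided for illustration only, composes an ECAM byte offset from the bus, device, function, and register fields and performs one configuration dword read; the mapped ECAM base pointer and the helper names are assumptions that depend on your address map and software environment.

```c
#include <stdint.h>

/* Compose an ECAM byte offset from the fields in the ECAM Addressing table:
 * [27:20] bus, [19:15] device, [14:12] function,
 * [11:8] extended register number, [7:2] register number, [1:0] byte address.
 * dword_reg combines the extended register and register numbers (10 bits).
 */
static uint32_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t dword_reg)
{
    return ((uint32_t)bus << 20) |
           (((uint32_t)dev & 0x1F) << 15) |
           (((uint32_t)fn  & 0x7)  << 12) |
           (((uint32_t)dword_reg & 0x3FF) << 2);
}

/* Read one configuration dword; ecam_base is the mapped base of the ECAM
 * aperture (a design-specific assumption). dword_reg = 0 returns the
 * Vendor ID / Device ID of the addressed function.
 */
static uint32_t ecam_read32(volatile uint8_t *ecam_base,
                            uint8_t bus, uint8_t dev, uint8_t fn, uint16_t dword_reg)
{
    return *(volatile uint32_t *)(ecam_base + ecam_offset(bus, dev, fn, dword_reg));
}
```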

Root Port Enumeration

Any of the six BARs can be programmed within the three regions listed below. The full space of some
regions is not accessible because DMA registers reside there.

Table: Root Port Settings

Region | Address Range | Comments
Region 0 | 0xE001_0000 to 0xEFFF_FFFF | 0xE000_0000 to 0xE000_FFFF is used for DMA registers and cannot be used for the bridge.
Region 1 | 0x6_1101_0000 to 0x7_FFFF_FFFF | 0x6_0000_0000 to 0x6_1100_FFFF is used for DMA registers and cannot be used for the bridge.
Region 2 | 0x80_0000_0000 to 0xBF_FFFF_FFFF |

✎ Note: For Root Port, most operating systems do not support address translation; therefore, AMD
recommends Region 0 for 32-bit address space or non-prefetchable memory allocation. If the RP design
needs to support a 64-bit BAR, it is recommended that the EP select Region 1 or 2. Refer to Root Port BAR
for more information.
In the Versal architecture, address maps are fixed. If the AXI Bridge Master BAR is not selected, all
transactions pass through as is, that is, with no address translation. For information about the Versal
adaptive SoC global address map, see Versal Adaptive SoC Technical Reference Manual (AM011).
The Root Port configuration address offsets are not listed correctly: the next pointer below AER does not
point to the proper address, which can result in wrong configuration values. All listed values up to AER are
correct. You can read the extended configuration space capabilities below AER using fixed target
addresses. The target address values are as follows:

Table: PCIe Extended Capability for Root Port

Capability | PF0 Start Address
AER | 0x100
2nd PCIE | 0x1C0
VC | 0x1F0
Loopback VSEC | 0x330
DLL Feature Cap | 0x3A0
16 GT Cap | 0x3B0
Margining Cap | 0x400
ACS | 0x450
VC Arb Table | 0x504
PASID | 0x5F0
Extend-Large | 0x600
Extend-Small | 0xE00
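
Because the next capability pointers are not reliable past AER, software reads these capabilities at the fixed offsets above instead of walking the chain. The following C sketch shows the idea; config_read32() is a hypothetical helper that performs a configuration read of the Root Port function (for example through the local ECAM at bus 0, device 0, function 0).

```c
#include <stdint.h>

/* Hypothetical configuration-space dword read of the Root Port (PF0). */
extern uint32_t config_read32(uint16_t byte_offset);

/* Standard PCI Express extended capability header:
 * [15:0] capability ID, [19:16] version, [31:20] next capability offset.
 */
struct ext_cap_header {
    uint16_t cap_id;
    uint8_t  version;
    uint16_t next_offset;   /* do not rely on this field past AER */
};

static struct ext_cap_header read_ext_cap(uint16_t fixed_offset)
{
    uint32_t dw = config_read32(fixed_offset);
    struct ext_cap_header hdr = {
        .cap_id      = (uint16_t)(dw & 0xFFFFu),
        .version     = (uint8_t)((dw >> 16) & 0xFu),
        .next_offset = (uint16_t)((dw >> 20) & 0xFFFu),
    };
    return hdr;
}

/* Example: read the Secondary PCI Express capability header at its fixed
 * PF0 start address from the table above.
 */
/* struct ext_cap_header sec_pcie = read_ext_cap(0x1C0); */
```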

Coherent Data Path

The Root Port Bridge IP has two choices for the AXI4 data path. The coherent data path is routed to the NoC
by selecting the CPM to NoC Port 0 route in the CIPS CPM IP customization GUI. Alternatively, the
non-coherent data path is routed directly to the NoC by selecting the CPM to NoC Port 1 route in the
CIPS CPM IP customization GUI.

Power Limit Message TLP

The AXI Bridge functional mode automatically sends a Power Limit Message TLP when the Master
Enable bit of the Command Register is set. The software must set the Requester ID register before
setting the Master Enable bit to ensure that the desired Requester ID is used in the Message TLP.

Root Port Configuration Read

When an ECAM access is performed to the primary bus number, self-configuration of the integrated
block for PCIe is performed. A PCIe configuration transaction is not performed and is not presented
on the link. When an ECAM access is performed to the bus number that is equal to the secondary bus
value in the Enhanced PCIe Type 1 configuration header, then Type 0 configuration transactions are
generated.
When an ECAM access is attempted to a bus number that is in the range defined by the secondary
bus number and subordinate bus number range (not including secondary bus number), then Type 1
configuration transactions are generated. The primary, secondary and subordinate bus numbers are
written and updated by Root Port software to the Type 1 PCI™ Configuration Header of the AXI
Bridge functional mode in the enumeration procedure.
When an ECAM access is attempted to a bus number that is outside the range defined by the
secondary bus number and subordinate bus number, the bridge does not generate a configuration
request and signals a SLVERR response on the AXI4 bus.
When an Unsupported Request (UR) response is received for a configuration read request, all ones
are returned on the AXI4 bus to signify that a device does not exist at the requested device address. It
is the responsibility of the software to ensure configuration write requests are not performed to device
addresses that do not exist. However, the AXI Bridge functional mode asserts SLVERR response on
the AXI4 bus when a configuration write request is performed on device addresses that do not exist or
a UR response is received.

Root Port BAR

The Root Port BAR does not support packet filtering (all TLPs received from the PCIe link are forwarded to
the user logic); however, address translation can be enabled or disabled, depending on the IP
configuration.
During core customization in the AMD Vivado™ Design Suite, when no BAR is enabled, the RP
passes all received packets to the user application without address translation or address filtering.
When a BAR is enabled, by default the BAR address starts at 0x0000_0000 unless programmed
separately. Any packet received from the PCIe® link that hits a BAR is translated according to the
PCIE-to-AXI Address Translation rules.
✎ Note: The IP must not receive any TLPs outside of the PCIe BAR range from the PCIe link when
the RP BAR is enabled. If this rule cannot be enforced, it is recommended that the PCIe BAR be disabled
and that address filtering and/or translation be performed outside of the IP.
The Root Port BAR customization options in the Vivado Design Suite are found in the PCIe BARs
Tab.

Configuration Transaction Timeout

Configuration transactions are non-posted transactions. The AXI Bridge functional mode has a timer
for timeout termination of configuration transactions that have not completed on the PCIe link. An
OKAY response and 0s data are given on the AXI4 memory mapped bus.
✎ Note: Multiple configuration reads (PCIe CFG reads) can block configuration writes (PCIe CFG
writes). You need a throttling mechanism for CFG reads so that CFG writes can pass through.

Abnormal Configuration Transaction Termination Responses

Responses to abnormal terminations of configuration transactions are shown in the following table.

Table: Responses of Bridge to Abnormal Configuration Terminations

Transfer Type | Abnormal Condition | Bridge Response
Config Read or Write | Bus number not in the range of primary bus number through subordinate bus number. | SLVERR response is asserted.
Config Read or Write | Completion timeout. | For a PCIe Configuration Read/Write request, an OKAY response and 0s data are given on the AXI4 memory mapped bus.
Config Write | Bus number in the range of secondary bus number through subordinate bus number and UR is returned. | SLVERR response is asserted.

Port Description

Global Signals

The interface signals for the Bridge are described in the following table.

Table: Global Signals

Signal Name | I/O | Description
gt_refclk0_p/gt_refclk0_n | I | GT reference clock.
pci_gt_txp/pci_gt_txn [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | O | PCIe TX serial interface.
pci_gt_rxp/pci_gt_rxn [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | I | PCIe RX serial interface.
pcie0_user_lnk_up | O | Active-High output that identifies that the PCI Express core is linked up with a host device.
pcie0_user_clk | O | User clock out. PCIe derived clock output for all interface signals output/input to the AXI Bridge. Use this clock to drive inputs and gate outputs from the AXI Bridge.
dma0_user_reset | O | User reset out. AXI reset signal synchronous with the clock provided on the pcie0_user_clk output. This reset should drive all corresponding AXI Interconnect signals.
cpm_cor_irq | O | Reserved
cpm_misc_irq | O | Reserved
cpm_uncor_irq | O | Reserved
cpm_irq0 | I | Reserved
cpm_irq1 | I | Reserved

AXI Slave Interface

AXI Bridge Slave ports are connected from the AMD Versal™ device programmable Network on Chip
(NoC) to the CPM DMA internally. For slave bridge AXI4 details and configuration, see Versal
Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP
Product Guide (PG313).

AXI Master Interface

AXI4 (MM) Master ports are connected from the AMD Versal device Network on Chip (NoC) to the
CPM DMA internally. For details, see Versal Adaptive SoC Programmable Network on Chip and
Integrated Memory Controller LogiCORE IP Product Guide (PG313). The AXI4 Master interface can
be connected to the DDR or the PL, depending on the NoC configuration.

AXI Bridge for PCIe Interrupts

Table: AXI Bridge for PCIe Interrupts

Signal Name | I/O | Description
bridge0_usr_irq_in | I | User interrupt request. Assert to generate an interrupt and maintain assertion until the interrupt is serviced.
bridge0_usr_irq_ack[NUM_USR_IRQ-1:0] | O | User interrupt acknowledge. Indicates that the interrupt has been sent on PCIe. Two acks are generated for a legacy interrupt. One ack is generated for MSI/MSI-X interrupts.

NUM_USR_IRQ is selectable and ranges from 0 to 15. Each bit in the bridge0_usr_irq_req bus
corresponds to the same bit in bridge0_usr_irq_ack. For example, bridge0_usr_irq_ack[0]
represents an ack for bridge0_usr_irq_req[0].

Register Space

The Bridge register space can be accessed using the AXI Slave interface, and the user can also access the
Host memory space.

Table: AXI Slave Bridge Register Space

Register Space | AXI Slave Interface Address Range | Details
Bridge registers | 0x6_0000_0000 | Described in the Bridge register space CSV file. See Bridge Register Space for details.
Slave Bridge access to Host memory space | 0xE001_0000 - 0xEFFF_FFFF, 0x6_1101_0000 - 0x7_FFFF_FFFF, 0x80_0000_0000 - 0xBF_FFFF_FFFF | The address range for Slave bridge access is set during IP customization in the Address Editor tab of the Vivado IDE.

Bridge register descriptions are found in cpm4-bridge-v2-1-registers.csv available in the register map
files.
To locate the register space information:

1. Download the register map files.


2. Extract the ZIP file contents into any write-accessible location.
3. Refer to the cpm4-bridge-v2-1-registers.csv file.

Slave Bridge Registers Limitations

The register spaces mentioned in this document can also be accessed through the AXI4 Memory
Mapped Slave interface. All accesses to these registers are based on the following AXI base
addresses:

For QDMA registers: Base Address = 0x6_1000_0000
For XDMA registers: Base Address = 0x6_1002_0000
For Bridge registers: Base Address = 0x6_0000_0000

The offsets within each register space are the same as listed for the PCIe BAR accesses.
Make sure that all transactions targeting these register spaces have AWCACHE[1] and
ARCACHE[1] set to 1'b0 (Non-Modifiable) and access them only with 4-byte transactions (see the
sketch after this list):

All transactions originating from the Programmable Logic (PL) region must have an AXI Master that
sets AxCACHE[1] = 1'b0 before it enters the AXI NoC.
All transactions originating from the APU or RPU must be defined by a Memory Attribute of nGnRnE
or nGnRE to ensure AxCACHE[1] = 1'b0.
Transactions originating from the PPU have no additional requirements.
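
The following C sketch illustrates the 4-byte access rule for these register spaces. It is a minimal example under stated assumptions: the register apertures are mapped with Device or strongly ordered memory attributes (so AxCACHE[1] = 1'b0 is guaranteed), the mapping mechanism itself (an MMU table entry in bare metal, or mmap() from Linux user space) is outside the scope of the sketch, and the offsets used with these helpers are the same as those listed for PCIe BAR accesses.

```c
#include <stdint.h>

/* AXI base addresses of the register spaces listed above. */
#define QDMA_REG_BASE    0x610000000ULL   /* 0x6_1000_0000 */
#define XDMA_REG_BASE    0x610020000ULL   /* 0x6_1002_0000 */
#define BRIDGE_REG_BASE  0x600000000ULL   /* 0x6_0000_0000 */

/* 4-byte accessors; mapped_base is the CPU-visible mapping of one of the
 * bases above (Device/strongly ordered memory so AxCACHE[1] = 0).
 */
static inline uint32_t reg_read32(volatile void *mapped_base, uint32_t offset)
{
    return *(volatile uint32_t *)((volatile uint8_t *)mapped_base + offset);
}

static inline void reg_write32(volatile void *mapped_base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)((volatile uint8_t *)mapped_base + offset) = value;
}
```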

Design Flow Steps


This section describes customizing and generating the functional mode, constraining the functional
mode, and the simulation, synthesis, and implementation steps that are specific to this IP functional
mode. More detailed information about the standard AMD Vivado™ design flows and the IP integrator
can be found in the following Vivado Design Suite user guides:

Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
Vivado Design Suite User Guide: Designing with IP (UG896)
Vivado Design Suite User Guide: Getting Started (UG910)
Vivado Design Suite User Guide: Logic Simulation (UG900)

Debugging
This appendix includes details about resources available on the AMD Support website and debugging
tools.

Finding Help with AMD Adaptive Computing Solutions

To help in the design and debug process when using the functional mode, the Support web page
contains key resources such as product documentation, release notes, answer records, information
about known issues, and links for obtaining further product support. The Community Forums are also
available where members can learn, participate, share, and ask questions about AMD Adaptive
Computing solutions.

Documentation

This product guide is the main document associated with the functional mode. This guide, along with
documentation related to all products that aid in the design process, can be found on the AMD
Adaptive Support web page or by using the AMD Adaptive Computing Documentation Navigator.
Download the Documentation Navigator from the Downloads page. For more information about this
tool and the features available, open the online help after installation.

Debug Guide

For more information on PCIe Debug, see PCIe Debug K-Map.

Answer Records

Answer Records include information about commonly encountered problems, helpful information on
how to resolve these problems, and any known issues with an AMD Adaptive Computing product.
Answer Records are created and maintained daily to ensure that users have access to the most
accurate information available.
Answer Records for this functional mode can be located by using the Search Support box on the main
AMD Adaptive Support web page. To maximize your search results, use keywords such as:

Product name
Tool message(s)
Summary of the issue encountered

A filter search is available after results are returned to further target the results.

Master Answer Record for the Core

AR 75396.

Technical Support

AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:

Implement the solution in devices that are not defined in the documentation.
Customize the solution beyond that allowed in the product documentation.
Change any section of the design labeled DO NOT MODIFY.

To ask questions, navigate to the Community Forums.

Hardware Debug

Hardware issues can range from link bring-up to problems seen after hours of testing. This section
provides debug steps for common issues. The AMD Vivado™ debug feature is a valuable resource to
use in hardware debug. The signal names mentioned in the following individual sections can be
probed using the debug feature for debugging the specific problems.

General Checks

Ensure that all the timing constraints for the core were properly incorporated from the example design
and that all constraints were met during implementation.

Does it work in post-place and route timing simulation? If problems are seen in hardware but not
in timing simulation, this could indicate a PCB issue. Ensure that all clock sources are active and
clean.
If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the
locked port.
If your outputs go to 0, check your licensing.

Registers

A complete list of registers and attributes for the AXI Bridge Subsystem is available in the Versal
Adaptive SoC Register Reference (AM012). Reviewing the registers and attributes might be helpful
for advanced debugging.
✎ Note: The attributes are set during IP customization in the Vivado IP catalog. After core
customization, attributes are read-only.

Upgrading
This appendix is not applicable for the first release.

AXI Bridge Subsystem for CPM5


Overview
✎ Note: The information about AXI Bridge Subsystem for CPM5 in this section may not reflect the
latest information, and is subject to change.
The AXI Bridge Subsystem is designed for the AMD Vivado™ IP integrator in the AMD Vivado™
Design Suite. The AXI Bridge functional mode provides an interface between an AXI4 customer user
interface and PCI Express® using the AMD Versal™ Integrated Block for PCI Express. The AXI
Bridge functional mode provides the translation level between the AXI4 embedded system and the PCI
Express system. The AXI Bridge functional mode translates AXI4 memory reads or writes to PCI®
Transaction Layer Packets (TLPs) and translates PCIe memory read and write request TLPs to AXI4
interface commands.
The architecture of the Bridge is shown in the following figure.

Figure: High-Level AXI Bridge Architecture


Limitations

1. The achievable bandwidth for this subsystem depends on multiple factors, including but not
limited to the IP configuration, the data path options used with the IP, the host system
performance, and the methods by which data movements are programmed. See Data Bandwidth
and Performance Tuning for more information on CPM4 AXI Bridge. The bandwidth ceiling is
limited by the lower of the raw capacity of the designed PCIe link configuration and the internal
data interface used. The Data Bandwidth and Performance Tuning section provides guidance on
the related clock frequency settings and high-level guidance on performance expectations.
Achievable bandwidth might vary.
2. The Bridge is compliant with all MPS and MRRS settings; however, all traffic initiated from the bridge
is limited to 256 bytes (max).
3. AXI Address width is limited to 48 bits.
4. Writes to the Slot Capability register in ECAM space do not retain values, but the functionality
executes as expected. Reads of the Slot Capability register do not return the written values.
5. AXI Bridge in Root Port mode does not support ASPM L1 or L0.

Product Specification
The Register block contains registers used in the AXI Bridge functional mode for dynamically mapping
the AXI4 memory mapped (MM) address range provided using the AXIBAR parameters to an address
for PCIe® range.
The slave bridge provides termination of memory-mapped AXI4 transactions from an AXI master
device (such as a processor). The slave bridge provides a way to translate addresses that are
mapped within the AXI4 memory mapped address domain to the domain addresses for PCIe. Write
transactions to the Slave Bridge are converted into one or more MemWr TLPs, depending on the
configured Max Payload Size setting, which are passed to the integrated block for PCI Express. The
slave bridge can support up to 32 active AXI4 Write requests. When a remote AXI master initiates a
read transaction to the slave bridge, the read address and qualifiers are captured and a MemRd
request TLP is passed to the core and a completion timeout timer is started. Completions received
through the core are correlated with pending read requests and read data is returned to the AXI
master. The slave bridge can support up to 32 active AXI4 Read requests with pending completions.
The master bridge processes both PCIe MemWr and MemRd request TLPs received from the
Integrated Block for PCI Express and provides a means to translate addresses that are mapped
within the address for PCIe domain to the memory mapped AXI4 address domain. Each PCIe MemWr
request TLP header is used to create an address and qualifiers for the memory mapped AXI4 bus and
the associated write data is passed to the addressed memory mapped AXI4 Slave. The Master Bridge
can support up to 32 active PCIe MemWr request TLPs. PCIe MemWr request TLPs support is as
follows:

4 for 64-bit AXI data width


8 for 128-bit AXI data width
16 for 256-bit AXI data width
32 for 512-bit AXI data width

Each PCIe MemRd request TLP header is used to create an address and qualifiers for the memory-
mapped AXI4 bus. Read data is collected from the addressed memory mapped AXI4 slave and used
to generate completion TLPs which are then passed to the integrated block for PCI Express. The
Master Bridge can support up to 32 active PCIe MemRd request TLPs with pending completions for
improved AXI4 pipelining performance.
The instantiated AXI4-Stream Enhanced PCIe block contains submodules including the
Requester/Completer interfaces to the AXI bridge and the Register block. The Register block contains
the status, control, and interrupt registers.

AXI Bridge Operations

AXI Transactions for PCIe

The following tables are the translation tables for AXI4-Stream and memory-mapped transactions.

Table: AXI4 Memory-Mapped Transactions to AXI4-Stream PCIe TLPs

AXI4 Memory-Mapped Transaction | AXI4-Stream PCIe TLPs
INCR Burst Read of AXIBAR | MemRd 32 (3DW)
INCR Burst Write to AXIBAR | MemWr 32 (3DW)
INCR Burst Read of AXIBAR | MemRd 64 (4DW)
INCR Burst Write to AXIBAR | MemWr 64 (4DW)

Table: AXI4-Stream PCIe TLPs to AXI4 Memory Mapped Transactions

AXI4-Stream PCIe TLPs | AXI4 Memory-Mapped Transaction
MemRd 32 (3DW) of PCIEBAR | INCR Burst Read
MemWr 32 (3DW) to PCIEBAR | INCR Burst Write
MemRd 64 (4DW) of PCIEBAR | INCR Burst Read
MemWr 64 (4DW) to PCIEBAR | INCR Burst Write


For PCIe® requests with lengths greater than 1 Dword, the size of the data burst on the Master AXI
interface will always equal the width of the AXI data bus even when the request received from the
PCIe link is shorter than the AXI bus width.
The slave AXI write strobe (wstrb) signal can be used to facilitate data alignment to an address boundary.
The write strobe signal can be 0 at the beginning of a valid data cycle, and the bridge calculates an
appropriate offset to the given address. However, the valid data identified by the write strobe signal must
be continuous from the first byte enable to the last byte enable.

Transaction Ordering for PCIe

The AXI Bridge functional mode conforms to PCIe® transaction ordering rules. See the PCI-SIG
Specifications for the complete rule set. The following behaviors are implemented in the AXI Bridge
functional mode to enforce the PCIe transaction ordering rules on the highly-parallel AXI bus of the
bridge.

The bresp to the remote (requesting) AXI4 master device for a write to a remote PCIe device is
not issued until the MemWr TLP transmission is guaranteed to be sent on the PCIe link before any
subsequent TX-transfers.
If Relaxed Ordering bit is not set within the TLP header, then a remote PCIe device read to a
remote AXI slave is not permitted to pass any previous remote PCIe device writes to a remote
AXI slave received by the AXI Bridge functional mode. The AXI read address phase is held until
the previous AXI write transactions have completed and bresp has been received for the AXI
write transactions. If the Relaxed Ordering attribute bit is set within the TLP header, then the
remote PCIe device read is permitted to pass.
Read completion data received from a remote PCIe device are not permitted to pass any remote
PCIe device writes to a remote AXI slave received by the AXI Bridge functional mode prior to the
read completion data. The bresp for the AXI write(s) must be received before the completion
data is presented on the AXI read data channel.

✎ Note: The transaction ordering rules for PCIe might have an impact on data throughput in heavy
bidirectional traffic.

Bridge

The Bridge core is an interface between the AXI4 and the PCI Express integrated block. It contains
the memory mapped AXI4 to AXI4-Stream Bridge, and the AXI4-Stream Enhanced Interface Block for
PCIe. The memory mapped AXI4 to AXI4-Stream Bridge contains a register block and two functional
half bridges, referred to as the Slave Bridge and Master Bridge.

The slave bridge connects to the AXI4 Interconnect as a slave device to handle any issued AXI4
master read or write requests.
The master bridge connects to the AXI4 Interconnect as a master to process the PCIe generated
read or write TLPs.
The register block contains registers used in the Bridge core for dynamically mapping the AXI4
memory mapped (MM) address range provided using the AXIBAR parameters to an address for
PCIe range.

The core uses a set of interrupts to detect and flag error conditions.

Slave Bridge

The slave bridge provides termination of memory-mapped AXI4 transactions from an AXI4 master
device (such as a processor). The slave bridge provides a way to translate addresses that are
mapped within the AXI4 memory mapped address domain to the domain addresses for PCIe. Write
transactions to the Slave Bridge are converted into one or more MemWr TLPs, depending on the
configured Max Payload Size setting, which are passed to the integrated block for PCI Express. When
a remote AXI master initiates a read transaction to the slave bridge, the read address and qualifiers
are captured and a MemRd request TLP is passed to the core and a completion timeout timer is
started. Completions received through the core are correlated with pending read requests and read
data is returned to the AXI4 master. The slave bridge can support up to 32 AXI4 write requests, and
32 AXI4 read requests.
CPM does not do any SMID checks for slave AXI4 transfers. Any value is accepted.
✎ Note: If slave reads and writes are both valid, the IP prioritizes reads over writes. It is recommended to
implement proper arbitration (leave some gaps between reads so writes can pass through).

BDF Table

Address translation for AXI addresses is done based on BDF table programming (0x2420 to 0x2434).
These BDF table entries can be programmed through the NoC AXI Slave interface. There are three
regions that you can use for slave data transfers. Each region can be further divided into many
windows, each with a different address translation. These regions and the number of windows should be
configured in the IP wizard configuration. Each entry in the BDF table programming represents one
window. If you need two windows, then two entries need to be programmed, and so on.
There are some restrictions on programming the BDF table:

1. All PCIe slave bridge data transfers must be quiesced before programming the BDF table.
2. There are six registers for each BDF table entry. All six registers must be programmed to make a
valid entry. Even if some registers are 0, you need to program 0s in those registers.
3. All six registers need to be programmed in order for an entry to be valid. The order is listed
below.
a. 0x2420
b. 0x2424
c. 0x2428
d. 0x242C
e. 0x2430

f. 0x2434

BDF table entry start address = 0x2420 + (0x20 * i), where i = table entry number.
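
The following C sketch shows one way to program a single BDF table entry in the required register order. It is a sketch under stated assumptions: bridge_write32() is a hypothetical helper that performs a 4-byte write into the bridge register space, and the packing of the fifth register follows the field descriptions in the address translation examples below.

```c
#include <stdint.h>

/* Hypothetical 4-byte write into the bridge register space. */
extern void bridge_write32(uint32_t offset, uint32_t value);

#define BDF_ENTRY_BASE(i)  (0x2420u + (0x20u * (i)))  /* entry start address */

/* Program BDF table entry 'i'. All six registers must be written, in this
 * order, and all slave bridge traffic must be quiesced beforehand.
 *   xlate_lo/xlate_hi : translation value for the upper address bits
 *   ctrl              : [31:30] R/W permission, [29] R0 access error,
 *                       [28:26] protection ID, [25:0] window size in 4 KB units
 */
static void program_bdf_entry(uint32_t i,
                              uint32_t xlate_lo, uint32_t xlate_hi,
                              uint32_t pasid, uint32_t function_number,
                              uint32_t ctrl)
{
    uint32_t base = BDF_ENTRY_BASE(i);

    bridge_write32(base + 0x00, xlate_lo);        /* translation value low   */
    bridge_write32(base + 0x04, xlate_hi);        /* translation value high  */
    bridge_write32(base + 0x08, pasid);           /* PASID / reserved        */
    bridge_write32(base + 0x0C, function_number); /* [11:0] function number  */
    bridge_write32(base + 0x10, ctrl);            /* permissions + window    */
    bridge_write32(base + 0x14, 0x0);             /* reserved, write 0       */
}
```

For instance, Example 1 below corresponds to program_bdf_entry(0, 0x0000E000, 0x0, 0x0, 0x0, 0xC0000001).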

Address Translation

Slave bridge data transfers can be performed over three regions. You have options to set the size of each
region and also how many windows are needed for different address translations per region. If
address translation is not needed for a window, you still need to program the BDF table with an address
translation value of 0x0.
Address translation for Slave Bridge transfers is described in the following examples:

Slave Address Translation Examples

Example 1: BAR Size of 64 KB, with 1 Window Size 4 KB


Window 0: 4 KB with address translation of 0x7 for bits [63:13].

1. Selections in Vivado IP configuration in the AXI BARs tab are as follows:


AXI BAR size 64K: 0xFFFF bits [15:0]
Set Aperture Base Address: 0x0000_0000_0000_0000
Set Aperture High Address: 0x0000_0000_0000_FFFF
2. The BDF table programming:
Program 1 entries for 1 window
Window Size = AXI BAR size/8 = 64K / 8 = 0x1FFF = 8 KB (13 bits). Each window max size
is 8 KB.
In this example for window size of 4K, 0x1 is programmed at 0x2430 bits [25:0].
Address translation for bits [63:13] are programmed at 0x2420 and 0x2424.
In this example, address translation for bits [63:13] are set to 0x7.

Table: BDF Table Programming

Program Value | Registers
0x0000_E000 | Address translation value Low
0x0 | Address translation value High
0x0 | PASID / Reserved
0x0 | [11:0]: Function Number
0xC0000001 | [31:30] Read/Write Access permission; [29]: R0 access Error; [28:26] Protection ID; [25:0] Window Size ([25:0] * 4K = actual size of the window)
0x0 | Reserved

For this example, the slave address 0x0000_0000_0000_0100 will be translated to
0x0000_0000_0000_E100.
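
As a cross-check of Example 1, the following C sketch reproduces the translation arithmetic: the low 13 bits (the 8 KB window span) are kept from the incoming slave address and the programmed translation value replaces bits [63:13]. This is a simplified illustration of the address math only, not the hardware implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Example 1: window span is 8 KB (13 address bits kept); the translation
 * value for bits [63:13] is 0x7, i.e. the low register holds 0x7 << 13 = 0xE000.
 */
static uint64_t translate_example1(uint64_t slave_addr)
{
    const uint64_t xlate_63_13 = 0x7;
    return (xlate_63_13 << 13) | (slave_addr & 0x1FFFULL);
}

int main(void)
{
    /* Prints 0x000000000000E100, matching the result above. */
    printf("0x%016llX\n", (unsigned long long)translate_example1(0x100));
    return 0;
}
```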

Example 2: BAR Size of 64 KB, with 1 Window 8 KB


Window 0:8 KB with address translation of 0x6 ('b110) for bits [63:13].

1. Selections in Vivado IP configuration in the AXI BARs tab are as follows:


AXI BAR size 64K: 0xFFFF bits [15:0]
Set Aperture Base Address: 0x0000_0000_0000_0000
Set Aperture High Address: 0x0000_0000_0000_FFFF
2. The BDF table programming:
Program 1 entries for 1 window.
Window Size = AXI BAR size/8 = 64K / 8 = 0x1FFF = 8 KB (13 bits). Each window max size
is 8 KB.
In this example for window size of 8K, 0x2 is programmed at 0x2430 bits [25:0].
Address translation for bits [63:13] are programmed at 0x2420 and 0x2424.
In this example, address translation for bits [63:13] are set to 0x6 ('b110).

Table: BDF Table Programming

Offset | Program Value | Registers
0x2420 | 0x0000_C000 | Address translation value Low
0x2424 | 0x0 | Address translation value High
0x2428 | 0x0 | PASID / Reserved
0x242C | 0x0 | [11:0]: Function Number
0x2430 | 0xC0000002 | [31:30] Read/Write Access permission; [29]: R0 access Error; [28:26] Protection ID; [25:0] Window Size ([25:0] * 4K = actual size of the window)
0x2434 | 0x0 | Reserved

For this example, the Slave address 0x0000_0000_0000_0100 will be address translated to
0x0000_0000_0000_C100.

Example 3: BAR Size of 32 GB, and 4 Windows of Various Sizes


Window 0: 4 KB with address translation of 0x7 for bits [63:32].
Window 1: 4 GB with address translation of 0x0 for bits [63:32].
Window 2: 64 KB with address translation of 0xBBBB for bits [63:32].
Window 3: 1 GB with address translation of 0x11111 for bits [63:32].

1. Selections in AMD Vivado™ IP configuration in the AXI BARs tab are as follows:
AXI BAR size 32G: 0x7_FFFF_FFFF bits [34:0].
Set Aperture Base Address: 0x0000_0000_0000_0000.
Set Aperture High Address: 0x0000_0007_FFFF_FFFF.
2. The BDF table programming:
Window Size = AXI BAR size/8 = 32 GB / 8 = 0xFFFF_FFFF = 4 GB (32 bits). Each window
max size is 4 GB.
Program 4 entries for 4 windows:
BDF entry 0 table starts at 0x2420.
BDF entry 1 table starts at 0x2440.
BDF entry 2 table starts at 0x2460.
BDF entry 3 table starts at 0x2480.
Window 0 size 4 KB.
Program 0x1 to 0x2430 bits [25:0].
Address translation for bits [34:32] are programmed at 0x2420 and 0x2424.
Program 0x0000_0000 to 0x2420.
Program 0x0000_0007 to 0x2424
Window 1 size 4 GB.
Program 0x10_0000 to 0x2450 bits [25:0].
Address translation for bits [63:32] are programmed at 0x2440 and 0x2444.
Program 0x0000_0000 to 0x2440.
Program 0x0000_0000 to 0x2444
Window 2 size 64 KB.
Program 0x10 to 0x2470 bits [25:0].
Address translation for bits [63:32] are programmed at 0x2460 and 0x2464.
Program 0x0000_0000 to 0x2460
Program 0x0000_BBBB to 0x2464
Window 3 size 1 GB.
Program 0x4_0000 to 0x2490 bits [25:0].
Address translation for bits [63:32] are programmed at 0x2480 and 0x2484.
Program 0x0000_0000 to 0x2480.
Program 0x0001_1111 to 0x2484

Table: BDF Table Programming Entry 0

Offset | Program Value | Registers
0x2420 | 0x0000_0000 | Address translation value Low
0x2424 | 0x7 | Address translation value High
0x2428 | 0x0 | PASID / Reserved
0x242C | 0x0 | [11:0]: Function Number
0x2430 | 0xC0000001 | [31:30] Read/Write Access permission; [29]: R0 access Error; [28:26] Protection ID; [25:0] Window Size ([25:0] * 4K = actual size of the window)
0x2434 | 0x0 | Reserved

Table: BDF Table Programming Entry 1

Offset | Program Value | Registers
0x2440 | 0x0000_0000 | Address translation value Low
0x2444 | 0x0 | Address translation value High
0x2448 | 0x0 | PASID / Reserved
0x244C | 0x0 | [11:0]: Function Number
0x2450 | 0xC010_0000 | [31:30] Read/Write Access permission; [29]: R0 access Error; [28:26] Protection ID; [25:0] Window Size ([25:0] * 4K = actual size of the window)
0x2454 | 0x0 | Reserved

Table: BDF Table Programming Entry 2

Offset | Program Value | Registers
0x2460 | 0x0000_0000 | Address translation value Low
0x2464 | 0xBBBB | Address translation value High
0x2468 | 0x0 | PASID / Reserved
0x246C | 0x0 | [11:0]: Function Number
0x2470 | 0xC000_0010 | [31:30] Read/Write Access permission; [29]: R0 access Error; [28:26] Protection ID; [25:0] Window Size ([25:0] * 4K = actual size of the window)
0x2474 | 0x0 | Reserved

Table: BDF Table Programming Entry 3

Offset | Program Value | Registers
0x2480 | 0x0000_0000 | Address translation value Low
0x2484 | 0x1_1111 | Address translation value High
0x2488 | 0x0 | PASID / Reserved
0x248C | 0x0 | [11:0]: Function Number
0x2490 | 0xC004_0000 | [31:30] Read/Write Access permission; [29]: R0 access Error; [28:26] Protection ID; [25:0] Window Size ([25:0] * 4K = actual size of the window)
0x2494 | 0x0 | Reserved


For the above example:

The slave address 0x0000_0000_0000_0100 translated to 0x0000_0007_0000_0100.


The slave address 0x0000_0001_0000_0100 translated to 0x0000_0000_0000_0100.
The slave address 0x0000_0002_0000_0100 translated to 0x0000_BBBB_0000_0100.
The slave address 0x0000_0003_0000_0100 translated to 0x0001_1111_0000_0100.

The slave bridge does not support narrow burst AXI transfers. To avoid narrow burst transfers,
connect the AXI SmartConnect module, which converts narrow bursts to full burst AXI transfers.

Master Bridge

The master bridge processes both PCIe MemWr and MemRd request TLPs received from the integrated
block for PCI Express and provides a means to translate addresses that are mapped within the
address for PCIe domain to the memory mapped AXI4 address domain. Each PCIe MemWr request
TLP header is used to create an address and qualifiers for the memory mapped AXI4 bus and the
associated write data is passed to the addressed memory mapped AXI4 Slave. The Master Bridge
can support up to 32 active PCIe MemWr request TLPs. PCIe MemWr request TLPs support is as
follows:
Each PCIe MemRd request TLP header is used to create an address and qualifiers for the memory
mapped AXI4 bus. Read data is collected from the addressed memory mapped AXI4 bridge slave and
used to generate completion TLPs which are then passed to the integrated block for PCI Express.
The Master Bridge in AXI Bridge mode can support up to 32 active PCIe MemRd request TLPs with
pending completions for improved AXI4 pipe-lining performance.
All AXI4_MM master transfers can be directed to modules based on the QDMA controller selection and
the steering selection in the GUI as shown in the following table:

Table: Controller Steering Options

Controller | Steering Options
CTRL 0 | CPM PCIE NoC 0, CPM PCIE NoC 1, CCI PS AXI 0
CTRL 1 | CPM PCIE NoC 0, CPM PCIE NoC 1, CCI PS AXI 0, PL AXI0, PL AXI1

All AXI4_MM master transfers have the SMID set to 0.

Root Port

When the AXI bridge is configured as a root port, the transfers are directed based on the GUI
selections as shown in the following table:

Table: Controller Steering Options for Root Port

Controller | Steering Options
CTRL 0 | CCI PS AXI 0
CTRL 1 | CPM PCIE NoC 0

✎ Note: Root port mode is not supported in QDMA0 and QDMA1 controllers at the same time. You
can enable root port mode only in one controller at a time.

Malformed TLP

The integrated block for PCI Express® detects a malformed TLP. For the IP configured as an
Endpoint core, a malformed TLP results in a fatal error message being sent upstream if error reporting
is enabled in the Device Control register.

Abnormal Conditions

This section describes how the Slave side and Master side (see the following tables) of the AXI Bridge
functional mode handle abnormal conditions.

Slave Bridge Abnormal Conditions

Slave bridge abnormal conditions are classified as: Illegal Burst Type and Completion TLP Errors. The
following sections describe the manner in which the Bridge handles these errors.

Illegal Burst Type

The slave bridge monitors AXI read and write burst type inputs to ensure that only the INCR
(incrementing burst) type is requested. Any other value on these inputs is treated as an error condition
and the Slave Illegal Burst (SIB) interrupt is asserted. In the case of a read request, the Bridge
asserts SLVERR for all data beats and arbitrary data is placed on the Slave AXI4-MM read data bus.
In the case of a write request, the Bridge asserts SLVERR for the write response and all write data is
discarded.

Completion TLP Errors

Any request to the bus for PCIe (except for a posted Memory write) requires a completion TLP to
complete the associated AXI request. The Slave side of the Bridge checks the received completion
TLPs for errors and checks for completion TLPs that are never returned (Completion Timeout). Each
of the completion TLP error types are discussed in the subsequent sections.
Unexpected Completion
When the slave bridge receives a completion TLP, it matches the header RequesterID and Tag to the
outstanding RequesterID and Tag. A match failure indicates the TLP is an Unexpected Completion
which results in the completion TLP being discarded and a Slave Unexpected Completion (SUC)
interrupt strobe being asserted. Normal operation then continues.
Unsupported Request
A device for PCIe might not be capable of satisfying a specific read request. For example, if the read
request targets an unsupported address for PCIe, the completer returns a completion TLP with a
completion status of 0b001 - Unsupported Request. The completer that returns a completion TLP
with a completion status of Reserved must be treated as an unsupported request status, according to
the PCI Express Base Specification v3.0. When the slave bridge receives an unsupported request
response, the Slave Unsupported Request (SUR) interrupt is asserted and the DECERR response is
asserted with arbitrary data on the AXI4 memory mapped bus.
Completion Timeout
A Completion Timeout occurs when a completion (Cpl) or completion with data (CplD) TLP is not
returned after an AXI to PCIe memory read request, or after a PCIe Configuration Read/Write request.
For PCIe Configuration Read/Write request, completions must complete within the C_COMP_TIMEOUT
parameter selected value from the time the request is issued. For PCIe Memory Read request,
completions must complete within the value set in the Device Control 2 register in the PCIe
Configuration Space register. When a completion timeout occurs, an OKAY response is asserted with
all 1s data on the memory mapped AXI4 bus.
Poison Bit Received on Completion Packet
An Error Poison occurs when the completion TLP EP bit is set, indicating that there is poisoned data
in the payload. When the slave bridge detects the poisoned packet, the Slave Error Poison (SEP)
interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.
Completer Abort
A Completer Abort occurs when the completion TLP completion status is 0b100 - Completer Abort.
This indicates that the completer has encountered a state in which it was unable to complete the
transaction. When the slave bridge receives a completer abort response, the Slave Completer Abort
(SCA) interrupt is asserted and the SLVERR response is asserted with arbitrary data on the memory
mapped AXI4 bus.

Table: Slave Bridge Response to Abnormal Conditions

Transfer Type | Abnormal Condition | Bridge Response
Read | Illegal burst type | SIB interrupt is asserted. SLVERR response given with arbitrary read data.
Write | Illegal burst type | SIB interrupt is asserted. Write data is discarded. SLVERR response given.
Read | Unexpected completion | SUC interrupt is asserted. Completion is discarded.
Read | Unsupported Request status returned | SUR interrupt is asserted. DECERR response given with arbitrary read data.
Read | Completion timeout | SCT interrupt is asserted. SLVERR response given with arbitrary read data.
Read | Poison bit in completion | Completion data is discarded. SEP interrupt is asserted. SLVERR response given with arbitrary read data.
Read | Completer Abort (CA) status returned | SCA interrupt is asserted. SLVERR response given with arbitrary read data.

PCIe Error Handling

RP_ERROR_FIFO (RP only)

Error handling is as follows (a C sketch of this sequence is shown after the steps):

1. Read register 0xE10 (INT_DEC) and check whether one of bits [9] (correctable), [10] (non_fatal), or
[11] (fatal) is set.
2. Read register 0xE20 (RP_CSR) and check whether bit [16] (efifo_not_empty) is set.
3. If the FIFO is not empty, read the FIFO by reading 0xE2C (RP_FIFO_READ).
a. The error message indicates where the error comes from (that is, the requester ID) and the error type.
4. To clear the error, write to 0xE2C (RP_FIFO_READ). The value does not matter.
5. Repeat steps 2 and 3 until the 0xE2C (RP_FIFO_READ) bit [18] valid bit is cleared.
6. Write 1 to register 0xE10 (INT_DEC) to clear bits [9] (correctable), [10] (non_fatal), or [11] (fatal).
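
A minimal C sketch of this sequence is shown below. reg_read32() and reg_write32() are hypothetical 4-byte accessors into the bridge register space, and the bit positions follow the steps above.

```c
#include <stdint.h>

extern uint32_t reg_read32(uint32_t offset);
extern void     reg_write32(uint32_t offset, uint32_t value);

#define INT_DEC       0xE10   /* [9] correctable, [10] non_fatal, [11] fatal */
#define RP_CSR        0xE20   /* [16] efifo_not_empty                        */
#define RP_FIFO_READ  0xE2C   /* [18] valid; requester ID and error type     */

static void drain_rp_error_fifo(void)
{
    const uint32_t err_bits = (1u << 9) | (1u << 10) | (1u << 11);
    uint32_t int_dec = reg_read32(INT_DEC);

    if ((int_dec & err_bits) == 0)
        return;                                   /* no error interrupt pending */

    for (;;) {
        if ((reg_read32(RP_CSR) & (1u << 16)) == 0)
            break;                                /* error FIFO is empty */

        uint32_t msg = reg_read32(RP_FIFO_READ);  /* requester ID and error type */
        if ((msg & (1u << 18)) == 0)
            break;                                /* valid bit cleared */

        /* ... decode and log 'msg' here ... */

        reg_write32(RP_FIFO_READ, 0);             /* pop the entry; value is don't-care */
    }

    reg_write32(INT_DEC, int_dec & err_bits);     /* write 1 to clear the error bits */
}
```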

RP_PME_FIFO (RP only)

Error handling is as follows:

1. Read register 0xE10 (INT_DEC) and check whether bit [17] is set, which indicates that a PM_PME
message has been received.
2. Read register 0xE20 (RP_CSR) and check whether bit [18] (pfifo_not_empty) is set.
3. If the FIFO is not empty, read the FIFO by reading 0xE30 (RP_PFIFO).
a. The message indicates where the message comes from (that is, the requester ID).
4. To clear the error, write to 0xE30 (RP_PFIFO). The value does not matter.
5. Repeat steps 2 and 3 until the 0xE30 (RP_PFIFO) bit [31] valid bit is cleared.
6. Write 1 to register 0xE10 (INT_DEC) to clear bit [17].

Master Bridge Abnormal Conditions

The following sections describe the manner in which the master bridge handles abnormal conditions.

AXI DECERR Response

When the master bridge receives a DECERR response from the AXI bus, the request is discarded
and the Master DECERR (MDE) interrupt is asserted. If the request was non-posted, a completion
packet with the Completion Status = Unsupported Request (UR) is returned on the bus for PCIe.

AXI SLVERR Response

When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.

Max Payload Size for PCIe, Max Read Request Size or 4K Page Violated

When the master bridge receives a SLVERR response from the addressed AXI slave, the request is
discarded and the Master SLVERR (MSE) interrupt is asserted. If the request was non-posted, a
completion packet with the Completion Status = Completer Abort (CA) is returned on the bus for PCIe.

Completion Packets

When the MAX_READ_REQUEST_SIZE is greater than the MAX_PAYLOAD_SIZE, a read request for PCIe
can ask for more data than the master bridge can insert into a single completion packet. When this
situation occurs, multiple completion packets are generated up to MAX_PAYLOAD_SIZE, with the Read
Completion Boundary (RCB) observed.

Poison Bit

When the poison bit is set in a transaction layer packet (TLP) header, the payload following the
header is corrupted. When the master bridge receives a memory request TLP with the poison bit set,
it discards the TLP and asserts the Master Error Poison (MEP) interrupt strobe.

Zero Length Requests

When the master bridge receives a read request with the Length = 0x1, FirstBE = 0x00, and LastBE =
0x00, it responds by sending a completion with Status = Successful Completion.
When the master bridge receives a write request with the Length = 0x1, FirstBE = 0x00, and LastBE
= 0x00 there is no effect.

Table: Master Bridge Response to Abnormal Conditions

Transfer Type | Abnormal Condition | Bridge Response
Read | DECERR response | MDE interrupt strobe asserted. Completion returned with Unsupported Request status.
Write | DECERR response | MDE interrupt strobe asserted.
Read | SLVERR response | MSE interrupt strobe asserted. Completion returned with Completer Abort status.
Write | SLVERR response | MSE interrupt strobe asserted.
Write | Poison bit set in request | MEP interrupt strobe asserted. Data is discarded.

Link Down Behavior

The normal operation of the functional mode is dependent on the integrated block for PCIe
establishing and maintaining the point-to-point link with an external device for PCIe. If the link has
been lost, it must be re-established to return to normal operation.
When a Hot Reset is received by the functional mode, the link goes down and the PCI Configuration
Space must be reconfigured.
Initiated AXI4 write transactions that have not yet completed on the AXI4 bus when the link goes down
have a SLVERR response given and the write data is discarded. Initiated AXI4 read transactions that
have not yet completed on the AXI4 bus when the link goes down have a SLVERR response given,
with arbitrary read data returned.

Any MemWr TLPs for PCIe that have been received, but the associated AXI4 write transaction has not
started when the link goes down, are discarded.

Endpoint

When configured to support Endpoint functionality, the AXI Bridge functional mode fully supports
Endpoint operation as supported by the underlying block. There are a few details that need special
consideration. The following subsections contain information and design considerations about
Endpoint support.

Interrupts

The Interrupt modes in the following section applies to AXI Bridge mode only.
Multiple interrupt modes can be configured during IP configuration, however only one interrupt mode
is used at runtime. If multiple interrupt modes are enabled by the host after PCI bus enumeration at
runtime, MSI-X interrupt takes precedence over MSI interrupt, and MSI interrupt takes precedence
over Legacy interrupt. All of these interrupt modes are sent using the same xdma0_usr_irq_*
interface and the core automatically picks the best available interrupt mode at runtime.

Legacy Interrupts

Asserting one or more bits of xdma0_usr_irq_req when legacy interrupts are enabled causes the IP
to issue a legacy interrupt over PCIe. Multiple bits may be asserted simultaneously but each bit must
remain asserted until the corresponding xdma0_usr_irq_ack bit has been asserted. After a
xdma0_usr_irq_req bit is asserted, it must remain asserted until the corresponding
xdma0_usr_irq_ack bit is asserted and the interrupt has been serviced and cleared by the Host. The
xdma0_usr_irq_ack assertion indicates the requested interrupt has been sent on the PCIe block.
This will ensure interrupt pending register within the IP remains asserted when queried by the Host's
Interrupt Service Routine (ISR) to determine the source of interrupts. You must implement a
mechanism in the user application to know when the interrupt routine has been serviced. This
detection can be done in many different ways depending on your application and your use of this
interrupt pin. This typically involves a register (or array of registers) implemented in the user
application that is cleared, read, or modified by the Host software when an interrupt is serviced.
After the xdma0_usr_irq_req bit is deasserted, it cannot be reasserted until the corresponding
xdma0_usr_irq_ack bit has been asserted for a second time. This indicates the deassertion
message for the legacy interrupt has been sent over PCIe. After a second xdma0_usr_irq_ack
occurred, the xdma0_usr_irq_req wire can be reasserted to generate another legacy interrupt.
The xdma0_usr_irq_req bit can be mapped to legacy interrupt INTA, INTB, INTC, INTD through the
configuration registers. The following figure shows the legacy interrupts.
This figure shows only the handshake between xdma0_usr_irq_req and xdma0_usr_irq_ack. The
user application might not clear or service the interrupt immediately, in which case, you must keep
xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: Legacy Interrupts
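
The following C sketch illustrates one such host-side mechanism under stated assumptions: USR_IRQ_STATUS and USR_IRQ_CLEAR are hypothetical registers implemented in the user application and exposed through a PCIe BAR, and bar_read32()/bar_write32() are hypothetical accessors into that BAR. The actual register layout and clearing scheme are defined by your design.

```c
#include <stdint.h>

/* Hypothetical accessors into a BAR-mapped user-application register block. */
extern uint32_t bar_read32(uint32_t offset);
extern void     bar_write32(uint32_t offset, uint32_t value);

#define USR_IRQ_STATUS  0x0000  /* pending interrupt sources in the user logic */
#define USR_IRQ_CLEAR   0x0004  /* write-1-to-clear the serviced sources       */

/* Host interrupt service routine for the legacy interrupt. */
void example_isr(void)
{
    uint32_t pending = bar_read32(USR_IRQ_STATUS);

    /* ... service each pending source here ... */

    /* Clearing the serviced sources allows the user logic to deassert
     * xdma0_usr_irq_req, after which a new interrupt can be generated.
     */
    bar_write32(USR_IRQ_CLEAR, pending);
}
```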


MSI and Internal MSI-X Interrupts

Asserting one or more bits of xdma0_usr_irq_req causes the generation of an MSI or MSI-X
interrupt if MSI or MSI-X is enabled. If both MSI and MSI-X capabilities are enabled, an MSI-X
interrupt is generated. The Internal MSI-X interrupts mode is enabled when you set the MSI-X
Implementation Location option to Internal in the PCIe Misc Tab.
After a xdma0_usr_irq_req bit is asserted, it must remain asserted until the corresponding
xdma0_usr_irq_ack bit is asserted and the interrupt has been serviced and cleared by the Host. The
xdma0_usr_irq_ack assertion indicates the requested interrupt has been sent on the PCIe block.
This will ensure the interrupt pending register within the IP remains asserted when queried by the
Host's Interrupt Service Routine (ISR) to determine the source of interrupts. You must implement a
mechanism in the user application to know when the interrupt routine has been serviced. This
detection can be done in many different ways depending on your application and your use of this
interrupt pin. This typically involves a register (or array of registers) implemented in the user
application that is cleared, read, or modified by the Host software when an Interrupt is serviced.
Configuration registers are available to map xdma0_usr_irq_req and DMA interrupts to MSI or MSI-
X vectors. For MSI-X support, there is also a vector table and PBA table. The following figure shows
the MSI interrupt.
This figure shows only the handshake between xdma0_usr_irq_req and xdma0_usr_irq_ack. Your
application might not clear or service the interrupt immediately, in which case, you must keep
xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: MSI Interrupts

The following figure shows the MSI-X interrupt.


This figure shows only the handshake between xdma0_usr_irq_req and xdma0_usr_irq_ack. Your
application might not clear or service the interrupt immediately, in which case, you must keep
xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: MSI-X Interrupts

Root Port

When configured to support Root Port functionality, the AXI Bridge functional mode fully supports
Root Port operation as supported by the underlying block. There are a few details that need special
consideration. The following subsections contain information and design considerations about Root
Port support.

Enhanced Configuration Access Memory Map

When the functional mode is configured as a Root Port, configuration traffic is generated by using the
PCI Express enhanced configuration access mechanism (ECAM). ECAM functionality is available
only when the core is configured as a Root Port. Reads and writes to a certain memory aperture are
translated to configuration reads and writes, as specified in the PCI Express Base Specification
(v3.0), §7.2.2.
The address breakdown is defined in the following table. ECAM is used in conjunction with the Bridge
Register Memory Map only when used in both AXI Bridge for PCIe Gen3 core as well as DMA/Bridge
Subsystem for PCIe in AXI Bridge mode core. The DMA/Bridge Subsystem for PCIe Register Memory
Map does not have ECAM functionality.
When an ECAM access is attempted to the primary bus number, which defaults as bus 0 from reset,
then access to the type 1 PCI™ Configuration Header of the integrated block in the Enhanced
Interface for PCIe is performed. When an ECAM access is attempted to the secondary bus number,
then type 0 configuration transactions are generated. When an ECAM access is attempted to a bus
number that is in the range defined by the secondary bus number and subordinate bus number (not
including the secondary bus number), then type 1 configuration transactions are generated. The
primary, secondary, and subordinate bus numbers are written by Root Port software to the type 1 PCI
Configuration Header of the Enhanced Interface for PCIe in the beginning of the enumeration
procedure.
When an ECAM access is attempted to a bus number that is outside the bus_number and subordinate
bus number range, the bridge does not generate a configuration request and signals a SLVERR response
on the AXI4-Lite bus. When the Bridge is configured for EP (PL_UPSTREAM_FACING = TRUE), the
underlying Integrated Block configuration space and the core memory map are available at the
beginning of the memory space. The memory space looks like a simple PCI Express® configuration
space. When the Bridge is configured for RC (PL_UPSTREAM_FACING = FALSE), the same is true, but
it also looks like an ECAM access to primary bus, Device 0, Function 0.
When the functional mode is configured as a Root Port, the reads and writes of the local ECAM are
Bus 0. Because the adaptive SoC only has a single Integrated Block for PCIe core, all local ECAM
operations to Bus 0 return the ECAM data for Device 0, Function 0.
Configuration write accesses across the PCI Express bus are non-posted writes and block the AXI4-
Lite interface while they are in progress. Because of this, system software is not able to service an
interrupt if one were to occur. However, interrupts due to abnormal terminations of configuration
transactions can generate interrupts. ECAM read transactions block subsequent Requester read
TLPs until the configuration read completions packet is returned to allow unique identification of the
completion packet.

Table: ECAM Addressing

Bits | Name | Description
1:0 | Byte Address | Ignored for this implementation. The s<n>_axi_wstrb signals define byte enables for ECAM accesses.
7:2 | Register Number | Register within the configuration space to access.
11:8 | Extended Register Number | Along with Register Number, allows access to PCI Express Extended Configuration Space.
14:12 | Function Number | Function Number to completer.
19:15 | Device Number | Device Number to completer.
27:20 | Bus Number | Bus Number to completer.
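
The following minimal C sketch shows how an ECAM byte offset could be assembled from the fields in the table above; the helper name and the separate aperture base are illustrative assumptions, not part of the IP deliverables.

    #include <stdint.h>

    /* Assemble an ECAM byte offset from the fields defined in the table above.
     * Bits [1:0] (byte address) are left at zero; byte enables come from the
     * s<n>_axi_wstrb signals. */
    static inline uint32_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                                       uint8_t ext_reg, uint8_t reg)
    {
        return ((uint32_t)bus             << 20) |  /* [27:20] Bus Number             */
               ((uint32_t)(dev & 0x1F)    << 15) |  /* [19:15] Device Number          */
               ((uint32_t)(fn & 0x7)      << 12) |  /* [14:12] Function Number        */
               ((uint32_t)(ext_reg & 0xF) << 8)  |  /* [11:8]  Extended Register Num. */
               ((uint32_t)(reg & 0x3F)    << 2);    /* [7:2]   Register Number        */
    }

    /* Example: offset of configuration register 0 (Vendor ID/Device ID) of
     * bus 1, device 0, function 0. Add this to the ECAM aperture base address
     * before issuing the AXI read or write. */
    /* uint32_t off = ecam_offset(1, 0, 0, 0, 0); */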

Root Port Enumeration

Root Port PCIe enumeration is done through the ECAM register space. Each PCIe device or function is
allocated a 4 KB address space that holds its PCIe Configuration Space registers. The upper address
field of the ECAM register space consists of the PCIe Bus/Device/Function number used to select the
target device. The ECAM register space automatically routes and generates the appropriate PCIe
Configuration Request TLP type based on the target PCIe Bus/Device/Function number as well as the
programmed Primary Bus Number, Secondary Bus Number, and Subordinate Bus Number fields.
The enumeration process through the Root Port PCIe Bridge IP follows the standard PCIe bus discovery
and PCIe Configuration Space programming sequence, as defined by the PCIe Base Specification.
The Root Port PCIe Bridge lists all the PCIe capabilities enabled in the Root Port up to the AER
Capability register. The remaining PCIe capability registers in the Root Port Configuration Space are
not visible in the standard PCIe Configuration Space linked list; however, they follow the standard
PCIe Configuration Space layout. Bridge registers in the Root Port use one of the PCIe User Extended
Configuration Space regions and are accessible when targeting the Root Port Bus/Device/Function
number. All downstream devices (PCIe switches, Endpoints) attached to the Root Port show all PCIe
capability registers without any limitation.
The Root Port configuration address offsets are not listed correctly: the next capability pointer below
AER does not point to the proper address, which can result in wrong configuration values being read.
All listed values up to AER are correct. You can read the extended configuration space below AER by
using fixed target addresses; the target address values are listed below.

Table: PCIe Extended Capability for Root Port

Capability | PF0 Start Address
AER | 0x100
2nd PCIE | 0x1C0
VC | 0x1F0
Loopback VSEC | 0x330
DLL Feature Cap | 0x3A0
16 GT Cap | 0x3B0
Margining Cap | 0x400
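
Because the capability list pointers below AER cannot be relied on, software can access these capabilities directly at the fixed offsets listed above. A minimal sketch, assuming a hypothetical cfg_read32() helper that performs a 32-bit configuration read through the local ECAM aperture at the Root Port's own Bus/Device/Function:

    #include <stdint.h>

    /* Fixed PF0 start addresses from the table above. */
    #define RP_CAP_AER             0x100u
    #define RP_CAP_SECONDARY_PCIE  0x1C0u
    #define RP_CAP_VC              0x1F0u
    #define RP_CAP_LOOPBACK_VSEC   0x330u
    #define RP_CAP_DLL_FEATURE     0x3A0u
    #define RP_CAP_16GT            0x3B0u
    #define RP_CAP_MARGINING       0x400u

    /* Example (hypothetical helper): read the Margining capability header of
     * the Root Port itself through the local ECAM aperture.
     *
     *     uint32_t hdr = cfg_read32(root_port_bdf, RP_CAP_MARGINING);
     */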

Coherent Data Path


Root Port Bridge IP has two choices for AXI4 data path. A coherent data path is routed through
CCI-500 interconnects in the FPD by selecting the CPM-NOC-0 route in the CIPS CPM IP
customization GUI. Alternatively, a non-coherent data path is routed directly to the NoC in the
LPD by selecting the CPM-NOC-1 route in the CIPS CPM IP customization GUI.

Power Limit Message TLP

The AXI Bridge functional mode automatically sends a Power Limit Message TLP when the Master
Enable bit of the Command Register is set. The software must set the Requester ID register before
setting the Master Enable bit to ensure that the desired Requester ID is used in the Message TLP.

Root Port Configuration Read

When an ECAM access is performed to the primary bus number, self-configuration of the integrated
block for PCIe is performed. A PCIe configuration transaction is not performed and is not presented
on the link. When an ECAM access is performed to the bus number that is equal to the secondary bus
value in the Enhanced PCIe Type 1 configuration header, then Type 0 configuration transactions are
generated.
When an ECAM access is attempted to a bus number that is in the range defined by the secondary
bus number and subordinate bus number range (not including secondary bus number), then Type 1
configuration transactions are generated. The primary, secondary and subordinate bus numbers are
written and updated by Root Port software to the Type 1 PCI™ Configuration Header of the AXI
Bridge functional mode in the enumeration procedure.
When an ECAM access is attempted to a bus number that is out of the range defined by the
secondary bus number and subordinate bus number, the bridge does not generate a configuration
request and signals an SLVERR response on the AXI4 MM bus.
When an Unsupported Request (UR) response is received for a configuration read request, all ones
are returned on the AXI bus to signify that a device does not exist at the requested device address. It
is the responsibility of the software to ensure configuration write requests are not performed to device
addresses that do not exist. However, the AXI Bridge functional mode asserts an SLVERR response on
the AXI bus when a configuration write request is performed on a device address that does not exist
or a UR response is received.

Root Port BAR

The Root Port BAR does not support packet filtering (all TLPs received from the PCIe link are forwarded
to the user logic); however, address translation can be enabled or disabled, depending on the IP
configuration.
During core customization in the AMD Vivado™ Design Suite, when there is no BAR enabled, RP
passes all received packets to the user application without address translation or address filtering.
When BAR is enabled, by default the BAR address starts at 0x0000_0000 unless programmed
separately. Any packet received from the PCIe® link that hits a BAR is translated according to the
PCIE-to-AXI Address Translation rules.
✎ Note: The IP must not receive any TLPs outside of the PCIe BAR range from the PCIe link when
the RP BAR is enabled. If this rule cannot be enforced, it is recommended that the PCIe BAR be disabled
and that address filtering and/or translation be performed outside of the IP.
The Root Port BAR customization options in the Vivado Design Suite are found in the PCIe BARs
Tab.

Configuration Transaction Timeout

Configuration transactions are non-posted transactions. The AXI Bridge functional mode has a timer
for timeout termination of configuration transactions that have not completed on the PCIe link. SLVERR
is returned when a configuration timeout occurs. Timeouts of configuration transactions are flagged by
an interrupt as well.
✎ Note: Multiple configuration reads (PCIe CFG reads) can block configuration writes (PCIe CFG
writes). You must implement a throttling mechanism for CFG reads so that CFG writes can make progress.

Abnormal Configuration Transaction Termination Responses

Responses on AXI to abnormal terminations to configuration transactions are shown in the following
table.

Table: Responses of Bridge to Abnormal Configuration Terminations

Transfer Type | Abnormal Condition | Bridge Response
Config Read or Write | Bus number not in the range of primary bus number through subordinate bus number. | SLVERR response is asserted.
Config Read or Write | Valid bus number and completion timeout occurs. | SLVERR response is asserted.
Config Read or Write | Completion timeout. | SLVERR response is asserted.
Config Write | Bus number in the range of secondary bus number through subordinate bus number and UR is returned. | SLVERR response is asserted.

Receiving Interrupts

In Root Port mode, you can choose one of two ways to handle incoming interrupts:

Legacy Interrupt FIFO mode: Legacy Interrupt FIFO mode is the default. It is available in earlier
Bridge IP variants and versions, and will continue to be available. Legacy Interrupt FIFO mode is
geared towards compatibility with legacy designs.
Interrupt Decode mode: Interrupt Decode mode is available in the CPM AXI Bridge. Interrupt
Decode mode can be used to mitigate the Interrupt FIFO overflow condition that can occur in a
design that receives interrupts at a high rate, and it avoids the performance penalty incurred when
such a condition occurs.

Port Description

Global Signals

The interface signals for the Bridge are described in the following table.

Table: Global Signals

Signal Name | I/O | Description
gt_refclk0_p/gt_refclk0_n | I | GT reference clock.
pci_gt_txp/pci_gt_txn [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | O | PCIe TX serial interface.
pci_gt_rxp/pci_gt_rxn [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | I | PCIe RX serial interface.
pcie0_user_lnk_up | O | Output active-High identifies that the PCI Express core is linked up with a host device.
pcie0_user_clk | O | User clock out. PCIe derived clock output for all interface signals output/input to AXI Bridge. Use this clock to drive inputs and gate outputs from AXI Bridge.
dma0_user_reset | O | User reset out. AXI reset signal synchronous with the clock provided on the pcie0_user_clk output. This reset should drive all corresponding AXI Interconnect signals.
cpm_cor_irq | O | Reserved
cpm_misc_irq | O | Reserved
cpm_uncor_irq | O | Reserved
cpm_irq0 | I | Reserved
cpm_irq1 | I | Reserved

AXI Slave Interface

AXI Bridge Slave ports are connected from the AMD Versal™ device programmable Network on Chip
(NoC) to the CPM DMA internally. For slave bridge AXI-MM details and configuration, see Versal
Adaptive SoC Programmable Network on Chip and Integrated Memory Controller LogiCORE IP
Product Guide (PG313).

AXI Master Interface

AXI4 (MM) Master ports are connected from the AMD Versal device Network on Chip (NoC) to the
CPM DMA internally. For details, see Versal Adaptive SoC Programmable Network on Chip and
Integrated Memory Controller LogiCORE IP Product Guide (PG313). The AXI4 Master interface can
be connected to the DDR or the PL, depending on the NoC configuration.

AXI4-Lite Master Interface

The CIPS IP does not support the AXI4-Lite Master interface. Use the SmartConnect IP to connect
the NoC to the AXI4-Lite Master interface. For details, see SmartConnect LogiCORE IP Product
Guide (PG247).

AXI Bridge for PCIe Interrupts

Table: AXI Bridge for PCIe Interrupts

Signal Name | I/O | Description
xdma0_usr_irq_req[NUM_USR_IRQ-1:0] | I | User interrupt request. Assert to generate an interrupt and maintain assertion until the interrupt is serviced.
xdma0_usr_irq_ack[NUM_USR_IRQ-1:0] | O | User interrupt acknowledge. Indicates that the interrupt has been sent on PCIe. Two acks are generated for legacy interrupts. One ack is generated for MSI/MSI-X interrupts.
xdma0_usr_irq_fnc[7:0] | I | The function of the vector to be sent.

✎ Note: The xdma0_ prefix in the above signal names will be changed to dma0_* in a future release.
NUM_USR_IRQ is selectable and ranges from 0 to 15. Each bit in the xdma0_usr_irq_req bus
corresponds to the same bit in xdma0_usr_irq_ack. For example, xdma0_usr_irq_ack[0]
represents an ack for xdma0_usr_irq_req[0].

Register Space

You can access the register space when AXI Slave Bridge mode is enabled (based on GUI settings).
You can access Bridge registers in Controller0 or Controller1, and you can also access the Host
memory space. The Host memory address offset varies based on the Controller0 and/or Controller1
selection. If only Controller0 is enabled, the table below shows the address ranges and limitations.

Table: AXI Slave Bridge Register Space for Controller0

Register Space | AXI Slave Interface Address Range | Details
Bridge registers (ECAM space) | 0x6_0000_0000 - 0x6_0FFF_FFFF | Described in the Bridge register space CSV file. See Bridge Register Space for details. ✎ Note: Bridge registers are not accessible from the PL area. Bridge registers can be accessed through APB access; see Versal Adaptive SoC Register Reference (AM012).
Slave Bridge access to Host memory space | 0xE000_0000 - 0xEFFF_FFFF; 0x6_1101_0000 - 0x7_FFFF_FFFF; 0x80_0000_0000 - 0xBF_FFFF_FFFF | Address range for Slave bridge access is set during IP customization in the Address Editor tab of the Vivado IDE.

The table below shows the address ranges and limitations when Controller1 is enabled.

Table: AXI4 Slave Register Space for Controller1

Register Space | AXI Slave Interface Address Range | Details
Bridge registers (ECAM space) | 0x7_0000_0000 - 0x7_0FFF_FFFF | Described in the Bridge register space CSV file. See Bridge Register Space for details. ✎ Note: Bridge registers are not accessible from the PL area. Bridge registers can be accessed through APB access; see Versal Adaptive SoC Register Reference (AM012).
Slave Bridge access to Host memory space | 0xE800_0000 - 0xEFFF_FFFF; 0x7_1101_0000 - 0x7_FFFF_FFFF; 0xA0_0000_0000 - 0xBF_FFFF_FFFF | Address range for Slave bridge access is set during IP customization in the Address Editor tab of the Vivado IDE.

When both Controller0 and Controller1 are enabled, the table above remains the same for
Controller1. The table below applies to Controller0.

Table: AXI4 Slave Register Space for Controller0

Register Space | AXI Slave Interface Address Range | Details
Bridge registers (ECAM space) | 0x6_0000_0000 - 0x6_0FFF_FFFF | Described in the Bridge register space CSV file. See Bridge Register Space for details. ✎ Note: Bridge registers are not accessible from the PL area. Bridge registers can be accessed through APB access; see Versal Adaptive SoC Register Reference (AM012).
Slave Bridge access to Host memory space | 0xE000_0000 - 0xE7FF_FFFF; 0x6_1101_0000 - 0x6_FFFF_FFFF; 0x80_0000_0000 - 0x9F_FFFF_FFFF | Address range for Slave bridge access is set during IP customization in the Address Editor tab of the Vivado IDE.

Bridge register descriptions are found in cpm5-qdma-v4-0-bridge-registers.csv available in the register map files.
To locate the register space information:

1. Download the register map files.
2. Extract the ZIP file contents into any write-accessible location.
3. Refer to the cpm5-qdma-v4-0-bridge-registers.csv file.

Slave Bridge Registers Limitations

The register spaces mentioned in this document can also be accessed through the AXI4 Memory
Mapped Slave interface. All accesses to these registers are based on the following AXI base
addresses:

For QDMA registers: Base Address = 0x6_1000_0000


For Bridge registers: Base Address = 0x6_0000_0000

The offsets within each register space are the same as listed for the PCIe BAR accesses.
Make sure that all transactions targeting these register spaces have AWCACHE[1] and
ARCACHE[1] set to 1'b0 (Non-Modifiable), and access them only with 4-byte transactions.

All transactions originating from the Programmable Logic (PL) region must come from an AXI master
that sets AxCACHE[1] = 1'b0 before the transaction enters the AXI NoC.
All transactions originating from the APU or RPU must use a memory attribute of nGnRnE
or nGnRE to ensure AxCACHE[1] = 1'b0.
Transactions originating from the PPU have no additional requirements.
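
For processor-side accesses, a minimal bare-metal sketch is shown below. It assumes a 64-bit APU/RPU build using the standalone BSP I/O helpers (Xil_In32/Xil_Out32) with a device-memory (nGnRE/nGnRnE) mapping of these addresses; the helper names are illustrative, and the register offsets themselves come from the register map CSV files.

    #include <stdint.h>
    #include "xil_io.h"   /* Xil_In32/Xil_Out32 from the standalone BSP */

    /* AXI base addresses from this section. */
    #define CPM_BRIDGE_REG_BASE  0x600000000ULL   /* 0x6_0000_0000 */
    #define CPM_QDMA_REG_BASE    0x610000000ULL   /* 0x6_1000_0000 */

    /* 32-bit (4-byte) accesses only; the mapping must make the transaction
     * Non-Modifiable (AxCACHE[1] = 0), as required above. */
    static inline uint32_t bridge_reg_read(uint32_t offset)
    {
        return Xil_In32(CPM_BRIDGE_REG_BASE + offset);
    }

    static inline void bridge_reg_write(uint32_t offset, uint32_t value)
    {
        Xil_Out32(CPM_BRIDGE_REG_BASE + offset, value);
    }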

Design Flow Steps


This section describes customizing and generating the functional mode, constraining the functional
mode, and the simulation, synthesis, and implementation steps that are specific to this IP functional
mode. More detailed information about the standard AMD Vivado™ design flows and the IP integrator
can be found in the following Vivado Design Suite user guides:

Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
Vivado Design Suite User Guide: Designing with IP (UG896)
Vivado Design Suite User Guide: Getting Started (UG910)
Vivado Design Suite User Guide: Logic Simulation (UG900)

Debugging
This appendix includes details about resources available on the AMD Support website and debugging
tools.

Finding Help with AMD Adaptive Computing Solutions

To help in the design and debug process when using the functional mode, the Support web page
contains key resources such as product documentation, release notes, answer records, information
about known issues, and links for obtaining further product support. The Community Forums are also
available where members can learn, participate, share, and ask questions about AMD Adaptive
Computing solutions.

Documentation

This product guide is the main document associated with the functional mode. This guide, along with
documentation related to all products that aid in the design process, can be found on the Support web
page or by using the AMD Adaptive Computing Documentation Navigator. Download the
Documentation Navigator from the Downloads page. For more information about this tool and the
features available, open the online help after installation.

Debug Guide

For more information on PCIe debug, see PCIe Debug K-Map.

Answer Records

Answer Records include information about commonly encountered problems, helpful information on
how to resolve these problems, and any known issues with an AMD Adaptive Computing product.
Answer Records are created and maintained daily to ensure that users have access to the most
accurate information available.
Answer Records for this functional mode can be located by using the Search Support box on the main
Support web page. To maximize your search results, use keywords such as:

Product name
Tool message(s)
Summary of the issue encountered

A filter search is available after results are returned to further target the results.

Master Answer Record for the Core

AR 75396.

Technical Support

AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:

Implement the solution in devices that are not defined in the documentation.
Customize the solution beyond that allowed in the product documentation.
Change any section of the design labeled DO NOT MODIFY.

To ask questions, navigate to the Community Forums.

Hardware Debug

Hardware issues can range from link bring-up to problems seen after hours of testing. This section
provides debug steps for common issues. The AMD Vivado™ debug feature is a valuable resource to
use in hardware debug. The signal names mentioned in the following individual sections can be
probed using the debug feature for debugging the specific problems.

General Checks

Ensure that all the timing constraints for the core were properly incorporated from the example design
and that all constraints were met during implementation.

If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the
locked port.

Registers

A complete list of registers and attributes for the AXI Bridge Subsystem is available in the Versal
Adaptive SoC Register Reference (AM012). Reviewing the registers and attributes might be helpful
for advanced debugging.
✎ Note: The attributes are set during IP customization in the Vivado IP catalog. After core
customization, attributes are read-only.

Upgrading
This appendix is not applicable for the first release.

XDMA Subsystem for CPM4


Overview
The XDMA Subsystem can be configured as a high performance direct memory access (DMA) data
mover between the PCI Express® and AXI memory spaces. As a DMA, the functional mode can be
configured with either an AXI (memory mapped) interface or with an AXI streaming interface to allow
for direct connection to RTL logic. Either interface can be used for high performance block data
movement between the PCIe® address space and the AXI address space using the provided
character driver. In addition to the basic DMA functionality, the DMA supports up to four upstream and
downstream channels, the ability for PCIe traffic to bypass the DMA engine (Host DMA Bypass), and
an optional descriptor bypass to manage descriptors from the AMD Versal™ Adaptive SoC for
applications that demand the highest performance and lowest latency.

Figure: XDMA Subsystem


This diagram refers to the Requester Request (RQ)/Requester Completion (RC) interfaces, and the
Completer Request (CQ)/Completer Completion (CC) interfaces.

Limitations

The limitations of the XDMA are as follows:

SR-IOV
Example design not supported for all configurations
Narrow burst (not supported on the master interface)
Invalid descriptors can cause a system crash; the user/driver is responsible for generating valid
descriptors.

Architecture

Internally, the subsystem can be configured to implement up to eight independent physical DMA
engines (up to four H2C and four C2H). These DMA engines can be mapped to individual AXI4-
Stream interfaces or a shared AXI4 memory mapped (MM) interface to the user application. On the
AXI4 MM interface, the XDMA Subsystem generates requests and expects completions. The AXI4-
Stream interface is data-only.
The type of channel configured determines the transactions on which bus:

A Host-to-Card (H2C) channel generates read requests to PCIe and provides the data or
generates a write request to the user application.
A Card-to-Host (C2H) channel either waits for data on the user side or generates a read request
on the user side and then generates a write request containing the data received to PCIe.

The XDMA also enables the host to access the user logic. Write requests that reach PCIe to DMA
bypass Base Address Register (BAR) are processed by the DMA. The data from the write request is
forwarded to the user application through the NoC interface to the PL logic.
The host access to the configuration and status registers in the user logic is provided through an AXI
master port. These requests are 32-bit reads or writes. The user application also has access to
internal DMA configuration and status registers through an AXI slave port.
When multiple channels for H2C and C2H are enabled, transactions on the AXI4 Master interface are
interleaved between all selected channels. Simple round robin protocol is used to service all
channels. Transaction granularity depends on host Max Payload Size (MPS), page size, and other
host settings.

Target Bridge

The target bridge receives requests from the host. Based on BARs, the requests are directed to the
internal registers, or the CQ bypass port. After the downstream user logic has returned data for a non-
posted request, the target bridge generates a read completion TLP and sends it to the PCIe IP over
the CC bus.
In the following tables, the PCIe BARs selection corresponds to the options set in the PCIe BARs tab
in the IP Configuration GUI.

H2C Channel

The number of H2C channels is configured in the AMD Vivado™ Integrated Design Environment
(IDE). The H2C channel handles DMA transfers from the host to the card. It is responsible for splitting
read requests based on maximum read request size, and available internal resources. The DMA
channel maintains a maximum number of outstanding requests based on the RNUM_RIDS, which is the
number of outstanding H2C channel request ID parameter. Each split, if any, of a read request
consumes an additional read request entry. A request is outstanding after the DMA channel has
issued the read to the PCIe RQ block until it receives confirmation that the write has completed on
the user interface in-order. After a transfer is complete, the DMA channel issues a writeback or
interrupt to inform the host.
The H2C channel also splits transactions on both its read and write interfaces. On the read interface to
the host, transactions are split to meet the maximum read request size configured, and based on
available Data FIFO space. Data FIFO space is allocated at the time of the read request to ensure
space for the read completion. The PCIe RC block returns completion data to the allocated Data
Buffer locations. To minimize latency, upon receipt of any completion data, the H2C channel begins
issuing write requests to the user interface. It also breaks the write requests into maximum payload
size. On an AXI4-Stream user interface, this splitting is transparent.
When multiple channels are enabled, transactions on the AXI4 Master interface are interleaved
between all selected channels. Simple round robin protocol is used to service all channels.
Transaction granularity depends on host Max Payload Size (MPS), page size, and other host
settings.

C2H Channel

The C2H channel handles DMA transfers from the card to the host. The instantiated number of C2H
channels is controlled in the AMD Vivado™ IDE. Similarly the number of outstanding transfers is
configured through the WNUM_RIDS, which is the number of C2H channel request IDs. In an AXI4-
Stream configuration, the details of the DMA transfer are set up in advance of receiving data on the
AXI4-Stream interface. This is normally accomplished through receiving a DMA descriptor. After the
request ID has been prepared and the channel is enabled, the AXI4-Stream interface of the channel
can receive data and perform the DMA to the host. In an AXI4 MM interface configuration, the request
IDs are allocated as the read requests to the AXI4 MM interface are issued. Similar to the H2C
channel, a given request ID is outstanding until the write request has been completed. In the case of
the C2H channel, write request completion is when the write request has been issued as indicated by
the PCIe IP.
When multiple channels are enabled, transactions on the AXI4 Master interface are interleaved
between all selected channels. Simple round robin protocol is used to service all channels.
Transaction granularity depends on host Max Payload Size (MPS), page size, and other host
settings.

Host-to-Card Bypass Master

Host requests that reach the PCIe to DMA bypass BAR are sent to this module. The bypass master
port is an AXI4 MM interface and supports read and write accesses.

IRQ Module

The IRQ module receives a configurable number of interrupt wires from the user logic and one
interrupt wire from each DMA channel. This module is responsible for generating an interrupt over
PCIe. Support for MSI-X, MSI, and legacy interrupts can be specified during IP configuration.
✎ Note: The Host can enable one or more interrupt types from the specified list of supported
interrupts during IP configuration. The IP only generates one interrupt type at a given time even when
there are more than one enabled. MSI-X interrupt takes precedence over MSI interrupt, and MSI
interrupt takes precedence over Legacy interrupt. The Host software must not switch (either enable or
disable) an interrupt type while there is an interrupt asserted or pending.

Legacy Interrupts

Asserting one or more bits of xdma0_usr_irq_req when legacy interrupts are enabled causes the
DMA to issue a legacy interrupt over PCIe. Multiple bits may be asserted simultaneously but each bit
must remain asserted until the corresponding xdma0_usr_irq_ack bit has been asserted. After a
xdma0_usr_irq_req bit is asserted, it must remain asserted until both corresponding
xdma0_usr_irq_ack bit is asserted and the interrupt is serviced and cleared by the Host. This
ensures interrupt pending register within the IP remains asserted when queried by the Host's Interrupt
Service Routine (ISR) to determine the source of interrupts. The xdma0_usr_irq_ack assertion
indicates the requested interrupt has been sent to the PCIe block. You must implement a mechanism
in the user application to know when the interrupt routine has been serviced. This detection can be
done in many different ways depending on your application and your use of this interrupt pin. This
typically involves a register (or array of registers) implemented in the user application that is cleared,
read, or modified by the Host software when an interrupt is serviced.
After the xdma0_usr_irq_req bit is deasserted, it cannot be reasserted until the corresponding
xdma0_usr_irq_ack bit has been asserted for a second time. This indicates the deassertion
message for the legacy interrupt has been sent over PCIe. After a second xdma0_usr_irq_ack has
occurred, the xdma0_usr_irq_req wire can be reasserted to generate another legacy interrupt.
The xdma0_usr_irq_req bit and DMA interrupts can be mapped to legacy interrupt INTA, INTB,
INTC, and INTD through the configuration registers. The following figure shows the legacy interrupts.
✎ Note: This figure shows only the handshake between xdma0_usr_irq_req and
xdma0_usr_irq_ack. Your application might not clear or service the interrupt immediately, in which
case, you must keep xdma0_usr_irq_req asserted past xdma0_usr_irq_ack. The figure below
shows one possible scenario where usr_irq_ack is deasserted at the same cycle for both
requests[1:0], which might not be the case in other situations.

Figure: Legacy Interrupts

MSI and MSI-X Interrupts

Asserting one or more bits of xdma0_usr_irq_req causes the generation of an MSI or MSI-X
interrupt if MSI or MSI-X is enabled. If both MSI and MSI-X capabilities are enabled, an MSI-X
interrupt is generated.
After a xdma0_usr_irq_req bit is asserted, it must remain asserted until the corresponding
xdma0_usr_irq_ack bit is asserted and the interrupt has been serviced and cleared by the Host. The
xdma0_usr_irq_ack assertion indicates the requested interrupt has been sent to the PCIe block.
This will ensure the interrupt pending register within the IP remains asserted when queried by the
Host's Interrupt Service Routine (ISR) to determine the source of interrupts. You must implement a
mechanism in the user application to know when the interrupt routine has been serviced. This
detection can be done in many different ways depending on your application and your use of this
interrupt pin. This typically involves a register (or array of registers) implemented in the user
application that is cleared, read, or modified by the Host software when an Interrupt is serviced.
Configuration registers are available to map xdma0_usr_irq_req and DMA interrupts to MSI or MSI-
X vectors. For MSI-X support, there is also a vector table and PBA table. The following figure shows
the MSI interrupt.
✎ Note: This figure shows only the handshake between xdma0_usr_irq_req and
xdma0_usr_irq_ack. Your application might not clear or service the interrupt immediately, in which
case, you must keep xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: MSI Interrupts


The following figure shows the MSI-X interrupt.


✎ Note: This figure shows only the handshake between xdma0_usr_irq_req and
xdma0_usr_irq_ack. Your application might not clear or service the interrupt immediately, in which
case, you must keep xdma0_usr_irq_req asserted past xdma0_usr_irq_ack.

Figure: MSI-X Interrupts

Config Block

The config module is the DMA register space that contains PCIe® solution IP configuration
information and DMA control registers; it stores the PCIe IP configuration information that is relevant
to the XDMA. This configuration information can be read through register reads to the appropriate
register offset within the config module.

Product Specification
XDMA Operations

Descriptors

The XDMA Subsystem uses a linked list of descriptors that specify the source, destination, and length
of the DMA transfers. Descriptor lists are created by the driver and stored in host memory. The DMA
channel is initialized by the driver with a few control registers to begin fetching the descriptor lists and
executing the DMA operations.
Descriptors describe the memory transfers that the XDMA should perform. Each channel has its own
descriptor list. The start address of each channel's descriptor list is initialized in hardware registers by
the driver. After the channel is enabled, the descriptor channel begins to fetch descriptors from the
initial address. Thereafter, it fetches from the Nxt_adr[63:0] field of the last descriptor that was
fetched. Descriptors must be aligned to a 32 byte boundary.
The size of the initial block of adjacent descriptors is specified with the Dsc_Adj register. After the
initial fetch, the descriptor channel uses the Nxt_adj field of the last fetched descriptor to determine
the number of descriptors at the next descriptor address. A block of adjacent descriptors must not
cross a 4K address boundary. The descriptor channel fetches as many descriptors in a single request
as it can, limited by the MRRS, the number of adjacent descriptors, and the available space in the
channel's descriptor buffer.
✎ Note: Because MRRS in most host systems is 512 bytes or 1024 bytes, having more than 32
adjacent descriptors is not allowed on a single request. However, the design will allow a maximum 64
descriptors in a single block of adjacent descriptors if needed.

Every descriptor in the descriptor list must accurately describe the descriptor or block of descriptors
that follows. In a block of adjacent descriptors, the Nxt_adj value decrements from the first descriptor
to the second to last descriptor which has a value of zero. Likewise, each descriptor in the block
points to the next descriptor in the block, except for the last descriptor which might point to a new
block or might terminate the list.
Termination of the descriptor list is indicated by the Stop control bit. After a descriptor with the Stop
control bit is observed, no further descriptor fetches are issued for that list. The Stop control bit can
only be set on the last descriptor of a block.
When using an AXI4 memory mapped interface, DMA addresses to the card are not translated. If the
Host does not know the card address map, the descriptor must be assembled in the user logic and
submitted to the DMA using the descriptor bypass interface.

Table: Descriptor Format

Offset Fields

0x0 Magic[15:0] Rsv[1:0] Nxt_adj[5:0] Control[7:0]

0x04 4’h0, Len[27:0]

0x08 Src_adr[31:0]

0x0C Src_adr[63:32]

0x10 Dst_adr[31:0]

0x14 Dst_adr[63:32]

0x18 Nxt_adr[31:0]

0x1C Nxt_adr[63:32]

Table: Descriptor Fields

Offset | Field | Bit Index | Sub Field | Description
0x0 | Magic | 15:0 | - | 16'had4b. Code to verify that the driver generated descriptor is valid.
0x0 | Rsv | 1:0 | - | Reserved, set to 0's.
0x0 | Nxt_adj | 5:0 | - | The number of additional adjacent descriptors after the descriptor located at the next descriptor address field. A block of adjacent descriptors cannot cross a 4k boundary.
0x0 | Control | 7, 6, 5 | - | Reserved.
0x0 | Control | 4 | EOP | End of packet for stream interface.
0x0 | Control | 3, 2 | - | Reserved.
0x0 | Control | 1 | Completed | Set to 1 to interrupt after the engine has completed this descriptor. This requires the global IE_DESCRIPTOR_COMPLETED control flag set in the H2C/C2H Channel control register.
0x0 | Control | 0 | Stop | Set to 1 to stop fetching descriptors for this descriptor list. The stop bit can only be set on the last descriptor of an adjacent block of descriptors.
0x04 | Length | 31:28 | - | Reserved, set to 0's.
0x04 | Length | 27:0 | - | Length of the data in bytes.
0x0C-0x8 | Src_adr | 63:0 | - | Source address for H2C and memory mapped transfers. Metadata writeback address for C2H transfers.
0x14-0x10 | Dst_adr | 63:0 | - | Destination address for C2H and memory mapped transfers. Not used for H2C stream.
0x1C-0x18 | Nxt_adr | 63:0 | - | Address of the next descriptor in the list.
The DMA has a 32 KB (512 × 512-bit) descriptor FIFO to hold descriptors in the descriptor engine. This
descriptor FIFO is shared by all selected channels and is used only in internal mode (not in
descriptor bypass mode).
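
As a worked illustration of the layout in the tables above, the following C sketch shows one way a driver could represent a 32-byte descriptor. The struct name and helper are illustrative, not part of the provided driver, and the GCC alignment attribute is only one way to meet the 32-byte alignment requirement.

    #include <stdint.h>

    /* One 32-byte XDMA descriptor, per the descriptor format above.
     * Descriptors must be aligned to a 32-byte boundary in host memory. */
    struct xdma_desc {
        uint32_t word0;       /* [31:16] magic 0xAD4B, [15:14] rsvd,
                                 [13:8] nxt_adj, [7:0] control           */
        uint32_t len;         /* [27:0] length of the data in bytes      */
        uint32_t src_addr_lo; /* source address [31:0]                   */
        uint32_t src_addr_hi; /* source address [63:32]                  */
        uint32_t dst_addr_lo; /* destination address [31:0]              */
        uint32_t dst_addr_hi; /* destination address [63:32]             */
        uint32_t nxt_addr_lo; /* next descriptor address [31:0]          */
        uint32_t nxt_addr_hi; /* next descriptor address [63:32]         */
    } __attribute__((aligned(32)));

    #define XDMA_DESC_MAGIC      0xAD4Bu
    #define XDMA_DESC_STOP       (1u << 0)   /* last descriptor of the list */
    #define XDMA_DESC_COMPLETED  (1u << 1)   /* writeback/interrupt on done */
    #define XDMA_DESC_EOP        (1u << 4)   /* end of packet (AXI4-Stream) */

    /* Assemble word 0 from the adjacent-descriptor count and control bits. */
    static inline uint32_t xdma_desc_word0(uint8_t nxt_adj, uint8_t control)
    {
        return ((uint32_t)XDMA_DESC_MAGIC << 16) |
               ((uint32_t)(nxt_adj & 0x3F) << 8) | control;
    }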

Descriptor Bypass

The descriptor fetch engine can be bypassed on a per channel basis through AMD Vivado™ IDE
parameters. A channel with descriptor bypass enabled accepts descriptors from its respective
c2h_dsc_byp or h2c_dsc_byp bus. Before the channel accepts descriptors, the Control register Run
bit must be set. The NextDescriptorAddress, NextAdjacentCount, and Magic descriptor fields are
not used when descriptors are bypassed. The ie_descriptor_stopped bit in the Control register
does not prevent the user logic from writing additional descriptors. All descriptors written to the
channel are processed, barring writing of new descriptors when the channel buffer is full.
When the XDMA is configured in descriptor bypass mode, there is an 8-deep descriptor FIFO that is
common to all descriptor channels from the user.
✎ Note: To enable descriptor bypass for any channel, you should write to register 0x3060. Refer to
cpm4-xdma-v2-1-registers.csv available in the register map files.

Poll Mode

Each engine is capable of writing back completed descriptor counts to host memory. This allows the
driver to poll host memory to determine when the DMA is complete instead of waiting for an interrupt.
For a given DMA engine, the completed descriptor count writeback occurs when the DMA completes
a transfer for a descriptor, and ie_descriptor_completed and Pollmode_wb_enable are set. The
completed descriptor count reported is the total number of completed descriptors since the DMA was
initiated (not just those descriptors with the Completed flag set). The writeback address is defined by
the Pollmode_hi_wb_addr and Pollmode_lo_wb_addr registers.

Table: Completed Descriptor Count Writeback Format

Offset | Fields
0x0 | Sts_err | 7'h0 | Compl_descriptor_count[23:0]

Table: Completed Descriptor Count Writeback Fields

Field | Description
Sts_err | The bitwise OR of any error status bits in the channel Status register.
Compl_descriptor_count[23:0] | The lower 24 bits of the Complete Descriptor Count register.
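
A minimal polling sketch based on the writeback format above; it assumes the writeback buffer has been allocated in host memory and its address programmed into the Pollmode_hi_wb_addr/Pollmode_lo_wb_addr registers. The helper name is illustrative.

    #include <stdint.h>

    /* Writeback word: bit 31 = Sts_err, bits 30:24 = 0,
     * bits 23:0 = completed descriptor count. */
    static inline int xdma_poll_done(const volatile uint32_t *wb,
                                     uint32_t expected_count)
    {
        uint32_t w = *wb;
        if (w & (1u << 31))
            return -1;                                  /* error status set */
        return ((w & 0x00FFFFFFu) >= expected_count) ? 1 : 0;
    }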

DMA H2C Stream

For host-to-card transfers, data is read from the host at the source address, but the destination
address in the descriptor is unused. Packets can span multiple descriptors. The termination of a
packet is indicated by the EOP control bit. A descriptor with an EOP bit asserts tlast on the AXI4-
Stream user interface on the last beat of data.
Data delivered to the AXI4-Stream interface will be packed for each descriptor. tkeep is all 1s except
for the last cycle of a data transfer of the descriptor if it is not a multiple of the datapath width. The
DMA does not pack data across multiple descriptors.

DMA C2H Stream

For card-to-host transfers, the data is received from the AXI4-Stream interface and written to the
destination address. Packets can span multiple descriptors. The C2H channel accepts data when it is
enabled, and has valid descriptors. As data is received, it fills descriptors in order. When a descriptor
is filled completely or closed due to an end of packet on the interface, the C2H channel writes back
information to the writeback address on the host with pre-defined WB Magic value 16'h52b4 (Table
2), and updated EOP and Length as appropriate. For valid data cycles on the C2H AXI4-Stream
interface, all data associated with a given packet must be contiguous.
✎ Note: C2H Channel Writeback information is different from Poll mode updates. C2H Channel
Writeback information provides the driver with the current length status of a particular descriptor. This is
different from Pollmode_*, as described in Poll Mode.
The tkeep bits must be all 1s except for the last data transfer of a packet. On the last transfer of a
packet, when tlast is asserted, you can specify a tkeep that is not all 1s to indicate a data cycle that
is not the full datapath width. The asserted tkeep bits need to be packed to the LSB, indicating
contiguous data. If tlast is asserted and tkeep is all zeros, this is not a valid combination for the DMA
to function properly.
The length of a C2H Stream descriptor (the size of the destination buffer) must always be a multiple of
64 bytes.


Table: C2H Stream Writeback Format

Offset | Fields
0x0 | WB Magic[15:0] | Reserved[14:0] | Status[0]
0x04 | Length[31:0]

Table: C2H Stream Writeback Fields

Field | Bit Index | Sub Field | Description
Status | 0 | EOP | End of packet
Reserved | 14:0 | - | Reserved
WB Magic | 15:0 | - | 16'h52b4. Code to verify the C2H writeback is valid.
Length | 31:0 | - | Length of the data in bytes.

✎ Note: C2H Streaming writeback address cannot cross 4K boundary.
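
A small C view of the writeback entry described above; the struct and helper names are illustrative, while the bit positions follow the tables.

    #include <stdint.h>

    /* C2H stream writeback entry: word 0 holds the magic, reserved bits,
     * and the EOP status; word 1 holds the byte length. */
    struct xdma_c2h_wb {
        uint32_t status;   /* [31:16] WB magic 0x52B4, [15:1] reserved, [0] EOP */
        uint32_t length;   /* bytes written for this descriptor                 */
    };

    static inline int c2h_wb_valid(const volatile struct xdma_c2h_wb *wb)
    {
        return ((wb->status >> 16) & 0xFFFFu) == 0x52B4u;
    }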


Address Alignment

Table: Address Alignment

Interface Type | Datapath Width | Address Restriction
AXI4 MM | 64, 128, 256, 512 | None
AXI4-Stream | 64, 128, 256, 512 | None
AXI4 MM fixed address | 64 | Source_addr[2:0] == Destination_addr[2:0] == 3'h0
AXI4 MM fixed address | 128 | Source_addr[3:0] == Destination_addr[3:0] == 4'h0
AXI4 MM fixed address | 256 | Source_addr[4:0] == Destination_addr[4:0] == 5'h0
AXI4 MM fixed address | 512 | Source_addr[5:0] == Destination_addr[5:0] == 6'h0

Length Granularity

Table: Length Granularity

Interface Type | Datapath Width | Length Granularity Restriction
AXI4 MM | 64, 128, 256, 512 | None
AXI4-Stream | 64, 128, 256, 512 | None 1
AXI4 MM fixed address | 64 | Length[2:0] == 3'h0
AXI4 MM fixed address | 128 | Length[3:0] == 4'h0
AXI4 MM fixed address | 256 | Length[4:0] == 5'h0
AXI4 MM fixed address | 512 | Length[5:0] == 6'h0

1. Each C2H descriptor must be sized as a multiple of 64 Bytes. However, there are no
restrictions to the total number of Bytes in the actual C2H transfer.
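
For the AXI4 MM fixed-address rows, the address and length restrictions above amount to alignment on the datapath width in bytes. A small illustrative check (the helper name is an assumption):

    #include <stdbool.h>
    #include <stdint.h>

    /* Fixed-address mode: source address, destination address, and length
     * must be aligned to the datapath width in bytes (for example, 64 bytes
     * for a 512-bit datapath). */
    static bool xdma_fixed_addr_ok(uint64_t src, uint64_t dst,
                                   uint32_t len, uint32_t datapath_bits)
    {
        uint64_t mask = (datapath_bits / 8) - 1;   /* 0x3F for 512 bits */
        return ((src | dst | (uint64_t)len) & mask) == 0;
    }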

Parity

Set the Propagate Parity option in the PCIe DMA Tab in the AMD Vivado™ IDE to check for parity.
Otherwise, no parity checking occurs.
When Propagate Parity is enabled, the XDMA propagates parity to the user AXI interface. You are
responsible for checking and generating parity in the AXI Interface. Parity is valid every clock cycle
when a data valid signal is asserted, and parity bits are valid only for valid data bytes. Parity is
calculated for every byte; total parity bits are DATA_WIDTH/8.

Parity information is sent and received on *_tuser ports in AXI4-Stream (AXI_ST) mode.
Parity information is sent and received on *_ruser and *_wuser ports in AXI4 Memory Mapped
(AXI-MM) mode.

Odd parity is used for parity checking. By default, parity checking is not enabled.
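
A minimal sketch of the per-byte odd-parity calculation described above; each parity bit covers one data byte, so a beat carries DATA_WIDTH/8 parity bits (shown here for a 64-bit beat).

    #include <stdint.h>

    /* Odd parity for one byte: the parity bit makes the total number of ones
     * in the byte plus the parity bit odd. */
    static inline uint8_t odd_parity_bit(uint8_t b)
    {
        b ^= b >> 4;
        b ^= b >> 2;
        b ^= b >> 1;
        return (~b) & 1u;   /* 1 when the data byte has an even number of ones */
    }

    /* Parity bits for one 64-bit data beat, one bit per byte. */
    static uint8_t beat_parity(uint64_t data)
    {
        uint8_t parity = 0;
        for (int i = 0; i < 8; i++)
            parity |= (uint8_t)(odd_parity_bit((uint8_t)(data >> (8 * i))) << i);
        return parity;
    }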

Port Description

Global Signals

Table: Global Signals

Signal Name | Direction | Description
gt_refclk0_p/gt_refclk0_n | I | GT reference clock
pci_gt_txp/pci_gt_txn [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | O | PCIe TX serial interface.
pci_gt_rxp/pci_gt_rxn [PL_LINK_CAP_MAX_LINK_WIDTH-1:0] | I | PCIe RX serial interface.
pcie0_user_lnk_up | O | Output active-High identifies that the PCI Express core is linked up with a host device.
pcie0_user_clk | O | User clock out. PCIe derived clock output for all interface signals output/input to the DMA. Use this clock to drive inputs and gate outputs from the DMA.
dma0_axi_aresetn | O | User reset out. AXI reset signal synchronous with the clock provided on the pcie0_user_clk output. This reset should drive all corresponding AXI Interconnect aresetn signals.
dma0_soft_resetn | I | Soft reset (active-Low). Use this port to assert reset and reset the DMA logic. This will reset only the DMA logic. The user should assert and deassert this port.

AXI Slave Interface

AXI Bridge Slave ports are connected from the AMD Versal device Network on Chip (NoC) to the
CPM DMA internally. For Slave Bridge AXI4 details, see Versal Adaptive SoC Programmable Network
on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
To access XDMA registers, you must follow the protocols outlined in the AXI Slave Bridge Register
Limitations section.
Related Information
Slave Bridge Registers Limitations

AXI4 Memory Mapped Interface

AXI4 (MM) Master ports are connected from the CPM to the AMD Versal device Network on Chip
(NoC) internally. For details, see Versal Adaptive SoC Programmable Network on Chip and Integrated
Memory Controller LogiCORE IP Product Guide (PG313). The AXI4 Master interface can be
connected to DDR or to the PL user logic, depending on the NoC configuration.

AXI4-Lite Master Interface

The CIPS IP does not support the AXI4-Lite Master interface. Use the SmartConnect IP to connect
the NoC to the AXI4-Lite Master interface.
For details, see SmartConnect LogiCORE IP Product Guide (PG247).

H2C Channel 0-3 AXI4-Stream Interface Signals

Table: H2C Channel 0-3 AXI4-Stream Interface Signals

Signal Name 1 | Direction | Description
dma0_m_axis_h2c_x_tready | I | Assertion of this signal by the user logic indicates that it is ready to accept data. Data is transferred across the interface when dma0_m_axis_h2c_tready and dma0_m_axis_h2c_tvalid are asserted in the same cycle. If the user logic deasserts the signal when the valid signal is High, the DMA keeps the valid signal asserted until the ready signal is asserted.
dma0_m_axis_h2c_x_tlast | O | The DMA asserts this signal in the last beat of the DMA packet to indicate the end of the packet.
dma0_m_axis_h2c_x_tdata[DATA_WIDTH-1:0] | O | Transmit data from the DMA to the user logic.
dma0_m_axis_h2c_x_tkeep[DATA_WIDTH/8-1:0] | O | tkeep will be all 1s except when dma0_m_axis_h2c_x_tlast is asserted.
dma0_m_axis_h2c_x_tvalid | O | The DMA asserts this whenever it is driving valid data on dma0_m_axis_h2c_tdata.
dma0_m_axis_h2c_x_tuser[DATA_WIDTH/8-1:0] | O | Parity bits. This port is enabled only in Propagate Parity mode.

1. _x in the signal name changes based on the channel number 0, 1, 2, and 3. For example, for
channel 0 use the dma0_m_axis_h2c_tready_0 port, and for channel 1 use the
dma0_m_axis_h2c_tready_1 port.

C2H Channel 0-3 AXI4-Stream Interface Signals

Table: C2H Channel 0-3 AXI4-Stream Interface Signals

Signal Name 1 | Direction | Description
dma0_s_axis_c2h_x_tready | O | Assertion of this signal indicates that the DMA is ready to accept data. Data is transferred across the interface when dma0_s_axis_c2h_tready and dma0_s_axis_c2h_tvalid are asserted in the same cycle. If the DMA deasserts the signal when the valid signal is High, the user logic must keep the valid signal asserted until the ready signal is asserted.
dma0_s_axis_c2h_x_tlast | I | The user logic asserts this signal to indicate the end of the DMA packet.
dma0_s_axis_c2h_x_tdata[DATA_WIDTH-1:0] | I | Transmits data from the user logic to the DMA.
dma0_s_axis_c2h_x_tkeep[DATA_WIDTH/8-1:0] | I | tkeep must be all 1s for all cycles except when dma0_s_axis_c2h_x_tlast is asserted. The asserted tkeep bits need to be packed to the LSB, indicating contiguous data.
dma0_s_axis_c2h_x_tvalid | I | The user logic asserts this whenever it is driving valid data on dma0_s_axis_c2h_tdata.
dma0_s_axis_c2h_x_tuser[DATA_WIDTH/8-1:0] | I | Parity bits. This port is enabled only in Propagate Parity mode.

1. _x in the signal name changes based on the channel number 0, 1, 2, and 3. For example, for
channel 0 use the dma0_s_axis_c2h_tready_0 port, and for channel 1 use the
dma0_s_axis_c2h_tready_1 port.

Interrupt Interface

Table: Interrupt Interface

Signal Name | Direction | Description
dma0_usr_irq_req[NUM_USR_IRQ-1:0] | I | Assert to generate an interrupt. Maintain assertion until the interrupt is serviced.
dma0_usr_irq_ack[NUM_USR_IRQ-1:0] | O | Indicates that the interrupt has been sent on PCIe. Two acks are generated for legacy interrupts. One ack is generated for MSI interrupts.
dma0_usr_irq_func[7:0] | I | In most cases these signals are tied to 0s for function 0.

NUM_USR_IRQ is selectable and ranges from 0 to 15. Each bit in the dma0_usr_irq_req bus
corresponds to the same bit in dma0_usr_irq_ack. For example, dma0_usr_irq_ack[0] represents
an ack for dma0_usr_irq_req[0].

Channel 0-3 DMA Status Interface

Table: Channel 0-3 DMA Status Interface

Signal Name | Direction | Description
dma0_h2c_sts_x[7:0] | O | Status bits for each channel.
  Bit 6: Control register 'Run' bit
  Bit 5: IRQ event pending
  Bit 4: Packet Done event (AXI4-Stream)
  Bit 3: Descriptor Done event. Pulses for one cycle for each descriptor that is completed, regardless of the Descriptor.Completed field
  Bit 2: Status register Descriptor_stop bit
  Bit 1: Status register Descriptor_completed bit
  Bit 0: Status register busy bit
dma0_c2h_sts_x[7:0] | O | Status bits for each channel.
  Bit 6: Control register 'Run' bit
  Bit 5: IRQ event pending
  Bit 4: Packet Done event (AXI4-Stream)
  Bit 3: Descriptor Done event. Pulses for one cycle for each descriptor that is completed, regardless of the Descriptor.Completed field
  Bit 2: Status register Descriptor_stop bit
  Bit 1: Status register Descriptor_completed bit
  Bit 0: Status register busy bit

1. _x in the signal name changes based on the channel number 0, 1, 2, and 3. For example, for
channel 0 use the dma0_c2h_sts_0 port, and for channel 1 use the dma0_c2h_sts_1 port.

Descriptor Bypass Interface

These ports are present if either Descriptor Bypass for Read (H2C) or Descriptor Bypass for Write
(C2H) are selected in the PCIe DMA Tab in the Vivado IDE. Each binary bit corresponds to a channel,
and LSB corresponds to Channel 0. Value 1 in bit positions means the corresponding channel
descriptor bypass is enabled.

Table: H2C 0-3 Descriptor Bypass Interface description

Port | Direction | Description
dma0_h2c_dsc_byp_x_ready | O | Channel is ready to accept new descriptors. After dma0_h2c_dsc_byp_ready is deasserted, one additional descriptor can be written. The Control register 'Run' bit must be asserted before the channel accepts descriptors.
dma0_h2c_dsc_byp_x_load | I | Write the descriptor presented at dma0_h2c_dsc_byp_data into the channel's descriptor buffer.
dma0_h2c_dsc_byp_src_x_addr[63:0] | I | Descriptor source address to be loaded.
dma0_h2c_dsc_byp_dst_x_addr[63:0] | I | Descriptor destination address to be loaded.
dma0_h2c_dsc_byp_x_len[27:0] | I | Descriptor length to be loaded.
dma0_h2c_dsc_byp_x_ctl[15:0] | I | Descriptor control to be loaded.
  [0]: Stop. Set to 1 to stop fetching next descriptor.
  [1]: Completed. Set to 1 to interrupt after the engine has completed this descriptor.
  [3:2]: Reserved.
  [4]: EOP. End of Packet for AXI-Stream interface.
  [15:5]: Reserved.
  All reserved bits can be forced to 0s.

1. _x in the signal name changes based on the channel number 0, 1, 2, and 3. For example, for
channel 0 use the dma0_h2c_dsc_byp_0_ctl[15:0] port, and for channel 1 use the
dma0_h2c_dsc_byp_1_ctl[15:0] port.

Table: C2H 0-3 Descriptor Bypass Ports

Port | Direction | Description
dma0_c2h_dsc_byp_x_ready | O | Channel is ready to accept new descriptors. After dma0_c2h_dsc_byp_ready is deasserted, one additional descriptor can be written. The Control register 'Run' bit must be asserted before the channel accepts descriptors.
dma0_c2h_dsc_byp_x_load | I | Descriptor presented at dma0_c2h_dsc_byp_* is valid.
dma0_c2h_dsc_byp_src_x_addr[63:0] | I | Descriptor source address to be loaded.
dma0_c2h_dsc_byp_dst_x_addr[63:0] | I | Descriptor destination address to be loaded.
dma0_c2h_dsc_byp_x_len[27:0] | I | Descriptor length to be loaded.
dma0_c2h_dsc_byp_x_ctl[15:0] | I | Descriptor control to be loaded.
  [0]: Stop. Set to 1 to stop fetching next descriptor.
  [1]: Completed. Set to 1 to interrupt after the engine has completed this descriptor.
  [3:2]: Reserved.
  [4]: EOP. End of Packet for AXI-Stream interface.
  [15:5]: Reserved.
  All reserved bits can be forced to 0s.

1. _x in the signal name changes based on the channel number 0, 1, 2, and 3. For example, for
channel 0 use the dma0_c2h_dsc_byp_0_ctl[15:0] port, and for channel 1 use the
dma0_c2h_dsc_byp_1_ctl[15:0] port.
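
The control half-word has the same bit layout on both the H2C and C2H bypass ports. The sketch below shows the bit packing in C purely for illustration; in practice the user logic drives dma0_<h2c|c2h>_dsc_byp_x_ctl[15:0] directly in RTL, and the macro and helper names are assumptions.

    #include <stdint.h>

    #define DSC_BYP_CTL_STOP       (1u << 0)  /* stop fetching after this descriptor    */
    #define DSC_BYP_CTL_COMPLETED  (1u << 1)  /* interrupt when this descriptor is done */
    #define DSC_BYP_CTL_EOP        (1u << 4)  /* end of packet (AXI4-Stream)            */

    /* Bits [3:2] and [15:5] are reserved and can be driven as 0. */
    static inline uint16_t dsc_byp_ctl(int stop, int completed, int eop)
    {
        return (uint16_t)((stop      ? DSC_BYP_CTL_STOP      : 0) |
                          (completed ? DSC_BYP_CTL_COMPLETED : 0) |
                          (eop       ? DSC_BYP_CTL_EOP       : 0));
    }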

The following timing diagram shows how to input the descriptor in descriptor bypass mode. When
dma0_<h2c|c2h>_dsc_byp_ready is asserted, a new descriptor can be pushed in with the
dma0_<h2c|c2h>_dsc_byp_load signal.

Figure: Timing Diagram for Descriptor Bypass Mode

‼ Important: Immediately after dma0_<h2c|c2h>_dsc_byp_ready is deasserted, one more


descriptor can be pushed in. In the above timing diagram, a descriptor is pushed in when
dma0_<h2c|c2h>_dsc_byp_ready is deasserted.

NoC Ports

✎ Note: NoC ports are always connected to the NoC IP. You cannot leave these ports unconnected or
connect them to any other blocks; doing so results in synthesis and implementation errors. For a
connection reference, see the figure below.


Table: NoC Ports

Port Name | I/O | Description
CPM_PCIE_NOC_0 | I | AXI4 MM 0 port from CPM to NoC
CPM_PCIE_NOC_1 | I | AXI4 MM 1 port from CPM to NoC
cpm_pcie_noc_axi0_clk | O | Clock for AXI4 MM 0 Port
cpm_pcie_noc_axi1_clk | O | Clock for AXI4 MM 1 Port

Figure: CPM4 NoC Connection

Register Space

Configuration and status registers internal to the XDMA Subsystem and those in the user logic can be
accessed from the host through mapping the read or write request to a Base Address Register (BAR).
Based on the BAR hit, the request is routed to the appropriate location. For PCIe BAR assignments,
see Target Bridge.

XDMA Address Register Space

All the registers are found in cpm4-xdma-v2-1-registers.csv available in the register map files.
To locate the register space information:
1. Download the register map files.
2. Extract the ZIP file contents into any write-accessible location.
3. Refer to the cpm4-xdma-v2-1-registers.csv file.

PCIe to AXI Bridge Master Address Map

Transactions that hit the PCIe to AXI Bridge Master are routed to the AXI4 Memory Mapped user
interface.

PCIe to DMA Address Map

Transactions that hit the PCIe to DMA space are routed to the XDMA Subsystem internal
configuration register bus. This bus supports 32 bits of address space and 32-bit read and write
requests.
XDMA registers can be accessed from the host or from the AXI Slave interface. These registers
should be used for programming the DMA and checking status.

PCIe to DMA Address Format

Table: PCIe to DMA Address Format

31:16 15:12 11:8 7:0

Reserved Target Channel Byte Offset

Table: PCIe to DMA Address Field Descriptions

Bit Index Field Description

15:12 Target The destination submodule within the DMA:
4’h0: H2C Channels
4’h1: C2H Channels
4’h2: IRQ Block
4’h3: Config
4’h4: H2C SGDMA
4’h5: C2H SGDMA
4’h6: SGDMA Common
4'h8: MSI-X

11:8 Channel ID[3:0] This field is only applicable for the H2C Channel, C2H Channel, H2C SGDMA,
and C2H SGDMA targets. It indicates which engine is being addressed for these targets. For all
other targets this field must be 0.

7:0 Byte Offset The byte address of the register to be accessed within the target. Bits [1:0] must be 0.
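
For illustration, the following C sketch (not part of the IP or driver) computes a register offset from the target, channel ID, and byte offset fields defined in the tables above.

#include <stdint.h>

/* Target encodings from the PCIe to DMA address format table. */
enum xdma_target {
    TGT_H2C_CHANNEL  = 0x0,
    TGT_C2H_CHANNEL  = 0x1,
    TGT_IRQ_BLOCK    = 0x2,
    TGT_CONFIG       = 0x3,
    TGT_H2C_SGDMA    = 0x4,
    TGT_C2H_SGDMA    = 0x5,
    TGT_SGDMA_COMMON = 0x6,
    TGT_MSIX         = 0x8,
};

/* Offset within the PCIe-to-DMA BAR: [15:12]=target, [11:8]=channel, [7:0]=byte offset. */
static inline uint32_t xdma_reg_offset(enum xdma_target target,
                                       uint32_t channel, uint32_t byte_offset)
{
    /* byte_offset[1:0] must be 0 for 32-bit aligned register accesses. */
    return ((uint32_t)target << 12) | ((channel & 0xFu) << 8) | (byte_offset & 0xFCu);
}

/* Example: offset of byte 0x04 in the C2H channel 1 register block:
 * xdma_reg_offset(TGT_C2H_CHANNEL, 1, 0x04) == 0x1104 */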

AXI Slave Register Space

The DMA register space can be accessed through the AXI Slave interface. When AXI Slave Bridge mode
is enabled (based on GUI settings), you can also access the Bridge registers and the host memory
space.

Table: AXI4 Slave Register Space

Register Space AXI Slave Interface Address Range Details

Bridge registers 0x6_0000_0000 Described in the Bridge register space CSV file.
See Bridge Register Space for details.

DMA registers 0x6_1002_0000 Described in XDMA Address Register Space.

Slave Bridge access to Host memory space 0xE001_0000 - 0xEFFF_FFFF,
0x6_1100_0000 - 0x7_FFFF_FFFF,
0x80_0000_0000 - 0xBF_FFFF_FFFF
The address range for Slave bridge access is set during IP customization in the Address Editor
tab of the Vivado IDE.
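
As an illustration only, the following C sketch shows one possible way a Linux application running on a processing-system master could map the DMA register window listed above through /dev/mem and read a register. It assumes the AXI Slave interface is reachable at 0x6_1002_0000 in your design's address map, that the application is built as 64-bit, and that /dev/mem access is permitted on your system.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define XDMA_SLAVE_BASE 0x610020000ULL /* DMA registers via AXI Slave (from the table above) */
#define MAP_SPAN        0x10000UL      /* 64 KB window; adjust to your design */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    volatile uint32_t *regs = mmap(NULL, MAP_SPAN, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, (off_t)XDMA_SLAVE_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Read the register at byte offset 0x0 of the DMA register space. */
    printf("reg[0x0] = 0x%08x\n", (unsigned)regs[0]);

    munmap((void *)regs, MAP_SPAN);
    close(fd);
    return 0;
}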

Bridge Register Space

Bridge register addresses start at 0xE00. Addresses from 0x00 to 0xE00 are directed to the PCIe
configuration register space.
All the bridge registers are listed in the cpm4-bridge-v2-1-registers.csv available in the register map
files.
To locate the register space information:

1. Download the register map files.


2. Extract the ZIP file contents into any write-accessible location.
3. Refer to the cpm4-bridge-v2-1-registers.csv file.

DMA Register Space

The DMA register space is described in XDMA Address Register Space.

Design Flow Steps


This section describes customizing and generating the functional mode, constraining the functional
mode, and the simulation, synthesis, and implementation steps that are specific to this IP functional
mode. More detailed information about the standard AMD Vivado™ design flows and the IP integrator
can be found in the following Vivado Design Suite user guides:

Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
Vivado Design Suite User Guide: Designing with IP (UG896)
Vivado Design Suite User Guide: Getting Started (UG910)
Vivado Design Suite User Guide: Logic Simulation (UG900)

XDMA AXI MM Interface to NoC and DDR Lab

This lab describes the process of generating an AMD Versal™ device XDMA design with AXI4
interface connecting to DDR memory. This lab explains a step by step procedure to configure a

Control, Interfaces and Processing System (CIPS) XDMA design and network on chip (NoC) IP. The
following figure shows the AXI4 Memory Mapped (AXI4) interface to DDR using the NoC IP. At the
end of this lab, you can synthesize and implement the design, and generate a Programmable Device
Image (PDI) file. The PDI file is used to program the AMD Versal device and run data traffic on a
system. For host to chip (H2C) transfers, data is read from Host, and sent to DDR memory. For chip
to host (C2H) transfers, data is read from DDR memory and written to Host. Transfer can be initiated
on all 4 channels.
This lab targets a xcvc1902-vsvd1760-1LP-e-S-es1 part on a VCK5000 board. This lab connects to
DDR found outside the Versal device. For more information, see XDMA AXI MM Interface to NoC and
DDR.

Application Software Development


This section provides details about the Linux device driver that is provided with the core.

Device Drivers

Figure: Device Drivers

The above figure shows the usage model of Linux XDMA software drivers. The XDMA example
design is implemented on an AMD adaptive SoC, which is connected to an X86 host through PCI
Express. In this mode, the XDMA driver in kernel space runs on Linux, whereas the test application
runs in user space.

Linux Device Driver

The Linux device driver has the following character device interfaces:

User character device for access to user components.
Control character device for controlling XDMA Subsystem components.
Events character device for waiting for interrupt events.
SGDMA character devices for high performance transfers.
The user accessible devices are as follows:

XDMA0_control
Used to access XDMA Subsystem registers.

XDMA0_user
Used to access AXI-Lite master interface.

XDMA0_bypass
Used to access DMA Bypass interface.

XDMA0_events_*
Used to recognize user interrupts.

Using the Driver

The XDMA drivers can be downloaded from the DMA IP Drivers page.

Interrupt Processing

Legacy Interrupts

There are four types of legacy interrupts: A, B, C, and D. You can select any of these interrupts in the
PCIe Misc tab under Legacy Interrupt Settings. You must program the corresponding values for both the
IRQ Block Channel Vector and the IRQ Block User Vector. The values for the legacy interrupts are A = 0,
B = 1, C = 2, and D = 3. The host recognizes interrupts only based on these values.

MSI Interrupts

For MSI interrupts, you can select from 1 to 32 vectors in the PCIe Misc tab under MSI Capabilities,
which consists of a maximum of 16 usable DMA interrupt vectors and a maximum of 16 usable user
interrupt vectors. The Linux operating system (OS) supports only 1 vector. Other operating systems
might support more vectors and you can program different vectors values in the IRQ Block Channel
Vector and in the IRQ Block User Vector to represent different interrupt sources. The AMD Linux
driver supports only 1 MSI vector.

MSI-X Interrupts

The DMA supports up to 32 different interrupt sources for MSI-X, which consist of a maximum of 16
usable DMA interrupt vectors and a maximum of 16 usable user interrupt vectors. The DMA has 32
MSI-X tables, one for each source. For MSI-X channel interrupt processing, the driver should use the
Engine’s Interrupt Enable Mask for H2C and C2H to disable and enable interrupts.

User Interrupts

The user logic must hold usr_irq_req active-High even after receiving usr_irq_ack (acks) to keep
the interrupt pending register asserted. This enables the Interrupt Service Routine (ISR) within the
driver to determine the source of the interrupt. Once the driver receives a user interrupt, the driver or
software can clear the user interrupt source, to which the hardware should respond by deasserting
usr_irq_req.

Example H2C Flow

In the example H2C flow, loaddriver.sh loads devices for all available channels. The dma_to_device
user program transfers data from host to Card.
The example H2C flow sequence is as follows:

1. Open the H2C device and initialize the DMA.


2. The user program reads the data file, allocates a buffer pointer, and passes the pointer to write
function with the specific device (H2C) and data size.
3. The driver creates a descriptor based on the input data/size and initializes the DMA with the
descriptor start address and the number of adjacent descriptors, if any.
4. The driver writes a control register to start the DMA transfer.
5. The DMA reads descriptor from the host and starts processing each descriptor.
6. The DMA fetches data from the host and sends the data to the user side. After all data is
transferred based on the settings, the DMA generates an interrupt to the host.
7. The ISR driver processes the interrupt to find out which engine is sending the interrupt and
checks the status to see if there are any errors. It also checks how many descriptors are
processed.
8. After the status is good, the driver returns the transferred byte length to the user side so it can be
checked against the requested size. A minimal user-space sketch of this flow is shown after this list.
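
The following minimal user-space sketch illustrates the H2C flow (the C2H flow is analogous, using the C2H device and read()). The character device name is an assumption based on the typical naming used by the XDMA Linux driver; the packaged dma_to_device application from the DMA IP Drivers page remains the reference implementation.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed H2C character device name created by the XDMA driver. */
#define H2C_DEV "/dev/xdma0_h2c_0"

int main(void)
{
    size_t size = 4096;
    char *buf = malloc(size);
    if (!buf) return 1;
    for (size_t i = 0; i < size; i++) buf[i] = (char)i;   /* test pattern */

    int fd = open(H2C_DEV, O_WRONLY);
    if (fd < 0) { perror("open " H2C_DEV); free(buf); return 1; }

    /* The driver builds the descriptors, starts the engine, and returns the
     * number of bytes transferred once the DMA interrupt has been serviced. */
    ssize_t done = write(fd, buf, size);
    if (done != (ssize_t)size)
        fprintf(stderr, "short transfer: %zd of %zu bytes\n", done, size);

    close(fd);
    free(buf);
    return 0;
}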

Example C2H Flow

In the example C2H flow, loaddriver.sh loads the devices for all available channels. The
dma_from_device user program transfers data from Card to host.
The example C2H flow sequence is as follows:

1. Open device C2H and initialize the DMA.


2. The user program allocates a buffer pointer (based on size) and passes the pointer to the read
function with the specific device (C2H) and data size.
3. The driver creates a descriptor based on the size and initializes the DMA with the descriptor start
address and the number of adjacent descriptors, if any.
4. The driver writes control register to start the DMA transfer.
5. The DMA reads descriptor from host and starts processing each descriptor.
6. The DMA fetches data from Card and sends data to host. After all data is transferred based on
the settings, the DMA generates an interrupt to host.
7. The ISR driver processes the interrupt to find out which engine is sending the interrupt and
checks the status to see if there are any errors and also checks how many descriptors are
processed.

8. After the status is good, the driver returns the transferred byte length to the user side so it can be
checked against the requested size.

Debugging
This appendix includes details about resources available on the AMD Support website and debugging
tools.

Finding Help with AMD Adaptive Computing Solutions

To help in the design and debug process when using the functional mode, the Support web page
contains key resources such as product documentation, release notes, answer records, information
about known issues, and links for obtaining further product support. The Community Forums are also
available where members can learn, participate, share, and ask questions about AMD Adaptive
Computing solutions.

Documentation

This product guide is the main document associated with the functional mode. This guide, along with
documentation related to all products that aid in the design process, can be found on the AMD
Adaptive Support web page or by using the AMD Adaptive Computing Documentation Navigator.
Download the Documentation Navigator from the Downloads page. For more information about this
tool and the features available, open the online help after installation.

Debug Guide

For more information on PCIe Debug, see PCIe Debug K-Map.

Answer Records

Answer Records include information about commonly encountered problems, helpful information on
how to resolve these problems, and any known issues with an AMD Adaptive Computing product.
Answer Records are created and maintained daily to ensure that users have access to the most
accurate information available.
Answer Records for this functional mode can be located by using the Search Support box on the main
AMD Adaptive Support web page. To maximize your search results, use keywords such as:

Product name
Tool message(s)
Summary of the issue encountered

A filter search is available after results are returned to further target the results.

Master Answer Record for the Core

AR 75396.

Technical Support

AMD Adaptive Computing provides technical support on the Community Forums for this AMD
LogiCORE™ IP product when used as described in the product documentation. AMD Adaptive
Computing cannot guarantee timing, functionality, or support if you do any of the following:

Implement the solution in devices that are not defined in the documentation.
Customize the solution beyond that allowed in the product documentation.
Change any section of the design labeled DO NOT MODIFY.

To ask questions, navigate to the Community Forums.

Hardware Debug

Hardware issues can range from link bring-up to problems seen after hours of testing. This section
provides debug steps for common issues. The AMD Vivado™ debug feature is a valuable resource to
use in hardware debug. The signal names mentioned in the following individual sections can be
probed using the debug feature for debugging the specific problems.

General Checks

Ensure that all the timing constraints for the core were properly incorporated from the example design
and that all constraints were met during implementation.

Does it work in post-place and route timing simulation? If problems are seen in hardware but not
in timing simulation, this could indicate a PCB issue. Ensure that all clock sources are active and
clean.
If using MMCMs in the design, ensure that all MMCMs have obtained lock by monitoring the
locked port.
If your outputs go to 0, check your licensing.

Registers

A complete list of registers and attributes for the XDMA Subsystem is available in the Versal Adaptive
SoC Register Reference (AM012). Reviewing the registers and attributes might be helpful for
advanced debugging.
✎ Note: The attributes are set during IP customization in the Vivado IP catalog. After core
customization, attributes are read-only.

Upgrading
This appendix is not applicable for the first release of the functional mode.

GT Selection and Pin Planning for CPM4


This appendix provides guidance on gigabit transceiver (GT) selection for applicable AMD Versal™
devices and some key recommendations that should be considered when selecting the GT locations.

This appendix provides guidance for CPM, PL PCIe® and PHY IP based solutions. In this guide, the
CPM IP related guidance is of primary importance, while the other related guidance might be relevant
and is provided for informational purposes.
A GT Quad comprises four GT lanes. GT Quad and reference clock locations for CPM4 are fixed,
depending on the desired link configuration (see GT Quad Locations). When selecting GT Quads for the
PHY IP based solution with the AMD PCIe MAC, AMD recommends that you use the GT Quads most
adjacent to the AMD PCIe macro. While this is not required, it improves place, route, and
timing for the design.

Link widths of x1, x2, and x4 require one bonded GT Quad and should not split lanes between
two GT Quads.
A link width of x8 requires two adjacent GT Quads that are bonded and are in the same SLR.
A link width of x16 requires four adjacent GT Quads that are bonded and are in the same SLR.
PL PCIE blocks should use GTs adjacent to the PCIe block where possible.
CPM has a fixed connectivity to GTs based on the CPM configuration.

For GTs on the left side of the device, it is suggested that PCIe lane 0 is placed in the bottom-most
GT of the bottom-most GT Quad. Subsequent lanes use the next available GTs moving vertically up
the device as the lane number increments. This means that the highest PCIe lane number uses the
top-most GT in the top-most GT Quad that is used for PCIe.
For GTs on the right side of the device, it is suggested that PCIe lane 0 is placed in the top-most GT
of the top-most GT Quad. Subsequent lanes use the next available GTs moving vertically down the
device as the lane number increments. This means that the highest PCIe lane number uses the
bottom-most GT in the bottom-most GT Quad that is used for PCIe.
✎ Note: For more information on GT Quad location, refer to the Device Diagram Overview section in
Versal Adaptive SoC Packaging and Pinouts Architecture Manual (AM013). Understand that the
device diagram view might not be the same as IO device view in Vivado because of the device
packaging.
✎ Note: The implemented device view in Vivado shows lane 0 on the bottom-most GT of the bottom-
most Quad on the right side of the device, but lane re-ordering is handled in logic to place lane 0 on
the top-most GT of the top-most GT Quad. The GT Quad IP does not allow channel level control to
remap the GT pins.
The PCIe reference clock uses GTREFCLK0 in the PCIe lane 0 GT Quad for x1, x2, x4, and x8
configurations. For x16 configurations the PCIe reference clock should use GTREFCLK0 on a GT
Quad associated with lanes 8-11. This allows the clock to be forwarded to all 16 PCIe lanes.
✎ Note: The reference clock cannot be forwarded between the CPM4 GTs and GTs used by the PL.
CPM4 and PL IPs must have separate reference clocks.
The PCIe reset pins for CPM designs must connect to one of the specified pins for each of the two PCIe
controllers. The PCIe reset pin for PL PCIE and PHY IP designs can be connected to any compatible
PL pin location, or to the CPM PCIE reset pins when the corresponding CPM PCIE controller is not in
use. This is summarized in the following table:

Table: PCIe Controller Reset Pin Locations

Versal PCIe Controller Versal Reset Pin Location


CPM PCIE Controller 0 PS MIO 18

PMC MIO 24

PMC MIO 38

CPM PCIE Controller 1 PS MIO 19

PMC MIO 25

PMC MIO 39

PL PCIE Controllers Any compatible single-ended PL I/O pin.

Versal Adaptive SoC PHY IP Any compatible single-ended PL I/O pin.

PCIe PHY IP has the following two Vivado Tcl parameters:

lane_reversal with values true or false (Default).


lane_order which is only applicable to x1 and x2 configurations with values Bottom (Default) or
Top.

For example, in a x2 design, by default the PIPE signals of the PCIe MAC[1:0] connect to the PIPE
signals of the GT QUAD[1:0]. When you apply lane_reversal {true}, the PIPE signals of the PCIe
MAC[1:0] connect to the PIPE signals of the GT QUAD[0:1]. When you apply lane_order {Top}, the PIPE
signals of the PCIe MAC[1:0] connect to the PIPE signals of the GT QUAD[3:2].
Following are the commands for using lane_reversal and lane_order:

lane_reversal
set_property -dict [list CONFIG.lane_reversal {true}] [get_bd_cells
<ip_name>]

lane_order
set_property -dict [list CONFIG.lane_order {Top}] [get_bd_cells <ip_name>]

CPM4 GT Selection
The CPM block within Versal devices has a fixed set of GTs that can be used for each of the two PCIE
controllers. These GTs are shared between the two PCIE controllers and the High Speed Debug Port
(HSDP); as such, x16 link widths are only supported when a single PCIE controller is in use and HSDP
is disabled. When two CPM PCIE controllers, or one PCIE controller and HSDP, are enabled, each link is
limited to a x8 link width. GT Quad allocation for CPM happens at GT Quad granularity and must
include all GT Quads from the one most adjacent to the CPM up to the topmost GT Quad that is in use
by the CPM. GT Quads that are used by the CPM, or that lie between GT Quads used by the CPM (for
either PCIe or HSDP), cannot be shared with PL resources even if GTs within the quad are not in use.


CPM in Single Controller Mode


When a single PCIE controller in the CPM is being used and HSDP is disabled, PCIe x1, x2, x4, x8,
and x16 link widths are supported. PCIe lane 0 is placed at the bottom-most GT of the bottom-most GT
Quad that is directly above the CPM. Subsequent lanes use the next available GTs moving vertically
up the device as the lane number increments. This means the highest PCIe lane number uses the
top-most GT in the top-most GT Quad that is used for PCIe. Because the GT locations and lane
ordering for CPM are fixed, they cannot be modified through IP customization.
As stated previously, GT Quad allocation happens at GT Quad granularity and cannot share unused
GT Quad resources with the PL. This means that CPM PCIe controller 0 configurations that use x1 or
x2 link widths do not use all the GTs within the Quad, and these GTs cannot be used in the PL for
additional GT connectivity. Unused GT Quads in this configuration can be used by the PL to
implement PL GT based solutions.
When CPM PCIe controller 0 and the High Speed Debug Port (HSDP) are both enabled, a PCIe link
width of x16 cannot be used and the CPM uses all three GT Quads that are directly above the CPM
regardless of PCIe link width. In this configuration, these GT Quads are allocated to the CPM and cannot
be shared with PL resources. CPM PCIe lanes 0-7 are unchanged in their GT selection and lane
ordering. HSDP uses the bottom-most GT in the third GT Quad away from the CPM. This
corresponds to the same location as PCIe lane 8 for a x16 link configuration. The fourth GT Quad in
this configuration is not used by the CPM and can be used to implement PL GT based solutions.

CPM in Dual Controller Mode


When the CPM is configured to use two PCIE controllers, the High Speed Debug Port (HSDP) cannot be
used because it shares GTs with the two PCIE controllers. Each PCIE controller can support x1, x2,
x4, and x8 link widths in this configuration. This configuration uses at least the bottom three GT
Quads closest to the CPM. These GT Quads cannot be used by PL resources. If CPM PCIE controller
1 is using a link width of x1, x2, or x4, the CPM uses three GT Quads. In this case the fourth GT
Quad can be used by PL resources to implement GT based solutions. If CPM PCIE controller 1 is
using a x8 link width, all four GT Quads are used by the CPM and cannot be used by PL
resources.
CPM PCIE controller 0 lane 0 is placed at the bottom-most GT of the bottom-most GT Quad that is
directly above the CPM. Subsequent lanes use the next available GTs moving vertically up the device
as the lane number increments. CPM PCIE controller 0 lane 7 connects to the top-most GT in the
second GT Quad away from the CPM.
CPM PCIE controller 1 lane 0 is placed at the bottom-most GT of the third GT Quad above the CPM.
Subsequent lanes use the next available GTs moving vertically up the device as the lane number
increments. CPM PCIE controller 1 lane 7 connects to the top-most GT in the fourth GT Quad away
from the CPM.

High Speed Debug Port (HSDP) Only Modes


When the CPM is configured to use the High Speed Debug Port (HSDP) without enabling either PCIe
controller, the bottom-most GT in the bottom-most GT Quad closest to the CPM should be used. This
allows the CPM to use only one GT Quad and allows the next three GT Quads to be used by PL
resources.
HSDP can also be enabled on the bottom-most GT in the third GT Quad up from the CPM. In this
scenario the CPM uses three GT Quads while only using one GT. The remaining unused GTs cannot be
used or shared by PL resources. As a result, HSDP is typically not used in this configuration.

CPM4 Additional Considerations


To facilitate migration from AMD UltraScale™ or AMD UltraScale+™ designs, boards might be
designed to use either CPM4 or PL PCIE integrated blocks to implement PCIe solutions. When
designing a board to use either CPM4 or the PL PCIE hardblock, the CPM4 pin selection and
planning guidelines should be followed because they are more restrictive. By doing this a board can
be designed that works for either a CPM4 or PL PCIE implementation. To route the PCIe reset from
the CPM4 to the PL for use with a PL PCIE implementation, the following option needs to be enabled
under PS-PMC in the CIPS IP customization GUI.

Figure: Configuring the PS PMC

When this option is enabled the PCIe reset for each disabled CPM4 PCIE controller is routed to the
PL. The same CPM4 pin selection limitations apply and the additional PCIe reset output pins are

exposed at the boundary of the CIPS IP. If the CPM4 PCIE controller is enabled, the PCIe reset is
used internal to the CPM4 and is not routed to the PL for connectivity to PL PCIE controllers.

CPM4 GTY Locations


GT Quad Locations

The following table identifies the GT Quad(s) that can be used for each PCIE controller location. The
Quad shown in bold is the most adjacent GT Quad for each PCIe block location.

Table: CPM4 GTY Locations

XCVM1302, XCVM1402 (all packages):
CPM Controller 1 - GT Quads for x16: N/A; for x8: GTY_QUAD_X0Y3, GTY_QUAD_X0Y2;
for x4: GTY_QUAD_X0Y2
CPM Controller 0 - GT Quads for x16: GTY_QUAD_X0Y3, GTY_QUAD_X0Y2, GTY_QUAD_X0Y1,
GTY_QUAD_X0Y0; for x8: GTY_QUAD_X0Y1, GTY_QUAD_X0Y0; for x4: GTY_QUAD_X0Y0

XCVC1902, XCVM1502, XCVM1802, XCVC1702, XCVC1802, XCVE1752 (all packages):
CPM Controller 1 - GT Quads for x16: N/A; for x8: GTY_QUAD_X0Y6, GTY_QUAD_X0Y5;
for x4: GTY_QUAD_X0Y5
CPM Controller 0 - GT Quads for x16: GTY_QUAD_X0Y6, GTY_QUAD_X0Y5, GTY_QUAD_X0Y4,
GTY_QUAD_X0Y3; for x8: GTY_QUAD_X0Y4, GTY_QUAD_X0Y3; for x4: GTY_QUAD_X0Y3

XCVC1502 (all packages):
CPM Controller 1 - GT Quads for x16: N/A; for x8: GTY_QUAD_X0Y4, GTY_QUAD_X0Y3;
for x4: GTY_QUAD_X0Y3
CPM Controller 0 - GT Quads for x16: GTY_QUAD_X0Y4, GTY_QUAD_X0Y3, GTY_QUAD_X0Y2,
GTY_QUAD_X0Y1; for x8: GTY_QUAD_X0Y2, GTY_QUAD_X0Y1; for x4: GTY_QUAD_X0Y1

GT Selection and Pin Planning for CPM5


⚠ CAUTION! The guidance provided in this appendix contains preliminary information and is subject
to change without notice.
This appendix provides guidance on gigabit transceiver (GTYP) selection for CPM5 and key
recommendations to be considered during pin planning. For each PCIe interface, these include:

GTYP quad placement
REFCLK placement
RESET placement

CPM5 has dedicated connectivity to a specific set of four GTYP quads which are adjacent to each
other, and adjacent to CPM5. If unused by CPM5, certain quads might be available for use with the
high-speed debug port (HSDP), but no quad can be bypassed to the programmable logic. The
remaining GTs in the device are available for other use cases, provided the GTs of interest offer the
necessary protocol support required for the desired use cases.
Through the GTYP quads with dedicated connectivity to the CPM5, specific REFCLK inputs must be
used to provide a reference clock to GTYP quads, which internally provide derived clocks to the
CPM5. In the common case of add-in-card designs, the reference clock is sourced from the edge
connector. In other cases, such as system-board designs, embedded designs, and cabled
interconnect, a local oscillator is typically required.
✎ Recommended: Although the CPM5 can support a variety of reference clock frequencies, AMD
recommends that designers selecting local oscillators use a 100 MHz reference clock as described in
the PCI Express Card Electromechanical Specification unless there is a compelling reason to use a
different supported frequency.
As part of the AMD Versal™ architecture integrated shell, specific reset inputs must be used to
provide a reset to the GTYP and the CPM5. In the common case of add-in-card designs, the reset is
sourced from the edge connector. In the case of a system-board or embedded design, the system is
responsible for generating reset signals and sourcing them to devices as required for the desired use
case. Where cabled interconnect is used, consult the cable specification for information about if and
how it accommodates sideband signaling for reset.
The remainder of this appendix is divided into these sections:

1. General Guidance for CPM5


Most designers working with CPM5 devices require guidance in this section.
2. Guidance for CPM5 in Specifically Identified Engineering Sample Devices
Designers working with CPM5 in specifically identified engineering sample devices require
guidance in this section.
3. Guidance for CPM5 Migration from Specifically Identified Engineering Sample Devices
Designers intending to migrate their design containing CPM5 from specifically identified
engineering sample devices into other devices will require the guidance in all sections.

General Guidance for CPM5


GTYP Quad and REFCLK Placements

The allowable GTYP quad placements are shown in the following table. Placements are determined
by CIPS IP configuration GUI as part of CPM configuration selections.

Table: Allowable GTYP Quad Placement - Production Silicon

Board width x16 (Controller 0 x16):
CPM5 PCIe controller channels: Controller 0 [0:15] on Quad 3 (Bank 105) through Quad 0 (Bank 102).
GTYP reference clock: Quad 104, REFCLK0. Other supported widths: x8, x4. Lane reversal: Not required.

Board width x8, x8 (Controller 1 x8, Controller 0 x8):
CPM5 PCIe controller channels: Controller 1 [0:7] on Quad 3 (Bank 105) and Quad 2 (Bank 104);
Controller 0 [0:7] on Quad 1 (Bank 103) and Quad 0 (Bank 102).
GTYP reference clocks: Quad 104, REFCLK0 and Quad 102, REFCLK0. Other supported widths: x8, x4;
x4, x8; x4, x4. Lane reversal: Not required.

Board width x8 (Controller 1 x8):
CPM5 PCIe controller channels: Controller 1 [0:7] on Quad 3 (Bank 105) and Quad 2 (Bank 104).
GTYP reference clock: Quad 104, REFCLK0. Other supported width: x4. Lane reversal: Not required.

Board width x8 (Controller 0 x8):
CPM5 PCIe controller channels: Controller 0 [0:7] on Quad 1 (Bank 103) and Quad 0 (Bank 102).
GTYP reference clock: Quad 102, REFCLK0. Other supported width: x4. Lane reversal: Not required.

Board width x4, x4 (Controller 1 x4, Controller 0 x4):
CPM5 PCIe controller channels: Controller 1 [3:0] on Quad 2 (Bank 104); Controller 0 [3:0] on Quad 0
(Bank 102). (See note 1.)
GTYP reference clocks: Quad 104, REFCLK0 and Quad 102, REFCLK0. Lane reversal: On PCB.

Board width x4 (Controller 1 x4):
CPM5 PCIe controller channels: Controller 1 [3:0] on Quad 2 (Bank 104). (See note 1.)
GTYP reference clock: Quad 104, REFCLK0. Lane reversal: On PCB.

Board width x4 (Controller 0 x4):
CPM5 PCIe controller channels: Controller 0 [3:0] on Quad 0 (Bank 102). (See note 1.)
GTYP reference clock: Quad 102, REFCLK0. Lane reversal: On PCB.

Board width x4, x8 (Controller 1 x4, Controller 0 x8):
CPM5 PCIe controller channels: Controller 1 [3:0] on Quad 2 (Bank 104) (see note 1); Controller 0 [0:7]
on Quad 1 (Bank 103) and Quad 0 (Bank 102).
GTYP reference clocks: Quad 104, REFCLK0 and Quad 102, REFCLK0. Other supported widths: x4, x4.
Lane reversal: x4 on PCB, x8 not required.

Board width x8, x4 (Controller 1 x8, Controller 0 x4):
CPM5 PCIe controller channels: Controller 1 [0:7] on Quad 3 (Bank 105) and Quad 2 (Bank 104);
Controller 0 [3:0] on Quad 0 (Bank 102) (see note 1).
GTYP reference clocks: Quad 104, REFCLK0 and Quad 102, REFCLK0. Other supported widths: x4, x4.
Lane reversal: x8 not required, x4 on PCB.

1. A x1 link width uses lane 0 of the x4 configuration. A x2 link width uses lanes 1:0 of the x4
configuration. This should be reversed on the PCB.

Board designs for x2 and x1 must use x4 guidance. For x2 board designs, based on the controller to
be used, connect to controller lane numbers Controller 0 [1:0] or Controller 1 [1:0]. Similarly, for x1
board designs, connect to controller lane numbers Controller 0 [0] or Controller 1 [0]. Note that
controller lane numbers might not be the same as physical GTYP channel numbers in a quad.
Consult the provided placement table.

RESET Placements

Allowed placements are shown in the table below. Placements are selected in CIPS IP configuration
GUI as part of PS PMC peripheral and I/O configuration selections.

Table: Allowed Reset Pin Placements

CPM5 PCIE Controller and Port Type RESET Pin Location Options

0: Endpoint, Switch Ports (Up/Down) PS MIO 18 (Default)
PMC MIO 24
PMC MIO 38

1: Endpoint, Switch Ports (Up/Down) PS MIO 19 (Default)
PS MIO 25
PS MIO 39

0: Root Port PS MIO 0 (Default)
PS MIO 0 – 25
PMC MIO 0 – 51

1: Root Port PS MIO 1 (Default)
PS MIO 0 – 25
PMC MIO 0 – 51

1. CPM5 PCIE Controller0 and Controller1 cannot both be enabled in the AXI Bridge functional
mode with the port type set to Root Port; all other combinations are possible.

CPM5 Configuration Notes

The GTYP lane and quad ordering above typically results in lanes crossing for x4 and x2 endpoint
configurations. In this scenario, AMD recommends physically reversing the lanes in the PCB board
traces. This typically results in a bow-tie in the PCB traces between the device and the PCIe edge
connector.

Guidance for CPM5 in Specifically Identified Engineering Sample Devices

Specifically identified engineering sample devices listed in the table below contain a CPM5 based on
an earlier CPM5 design database. Subsequent AMD plans include additional lane remapping support
to further increase flexibility of CPM5 across a variety of use cases.

Table: Engineering Sample Devices

Versal Adaptive SoC Series Part Number

Versal Premium VP1202 VSVA2785 ES1
VP1502 VSVA2785 ES1
VP1502 VSVA3340 ES1
VP1552 VSVA2785 ES1
VP1702 VSVA3340 ES1
VP1802 LSVC4072 ES1

Versal HBM VH1522 VSVA3697 ES1
VH1542 VSVA3697 ES1
VH1582 VSVA3697 ES1

GTYP Quad and REFCLK Placements

Allowed placements are shown in the table below. Placements are determined by CIPS IP
configuration GUI as part of CPM configuration selections.


Table: Allowable GTYP Quad Placement - Engineering Samples Silicon

Board width x16 (Controller 0 x16):
CPM5 PCIE controller channels: Controller 0 [15:0] on Quad 3 (Bank 105) through Quad 0 (Bank 102).
GTYP reference clock: Quad 104, Refclk 0. Lane reversal: By IP.

Board width x8, x8 (Controller 1 x8, Controller 0 x8):
CPM5 PCIE controller channels: Controller 1 [7:0] on Quad 3 (Bank 105) and Quad 2 (Bank 104);
Controller 0 [7:0] on Quad 1 (Bank 103) and Quad 0 (Bank 102).
GTYP reference clocks: Quad 104, Refclk 0 and Quad 102, Refclk 0. Lane reversal: By IP.

Board width x8 (Controller 1 x8):
CPM5 PCIE controller channels: Controller 1 [7:0] on Quad 3 (Bank 105) and Quad 2 (Bank 104).
GTYP reference clock: Quad 104, Refclk 0. Lane reversal: By IP.

Board width x8 (Controller 0 x8):
CPM5 PCIE controller channels: Controller 0 [7:0] on Quad 1 (Bank 103) and Quad 0 (Bank 102).
GTYP reference clock: Quad 102, Refclk 0. Lane reversal: By IP.

Board width x4, x4 (Controller 1 x4, Controller 0 x4):
CPM5 PCIE controller channels: Controller 1 [3:0] on Quad 2 (Bank 104); Controller 0 [3:0] on Quad 0
(Bank 102).
GTYP reference clocks: Quad 104, Refclk 0 and Quad 102, Refclk 0. Lane reversal: On PCB.

Board width x4 (Controller 1 x4):
CPM5 PCIE controller channels: Controller 1 [3:0] on Quad 2 (Bank 104).
GTYP reference clock: Quad 104, Refclk 0. Lane reversal: On PCB.

Board width x4 (Controller 0 x4):
CPM5 PCIE controller channels: Controller 0 [3:0] on Quad 0 (Bank 102).
GTYP reference clock: Quad 102, Refclk 0. Lane reversal: On PCB.

Board width x4, x8 (Controller 1 x4, Controller 0 x8):
CPM5 PCIE controller channels: Controller 1 [3:0] on Quad 2 (Bank 104); Controller 0 [7:0] on Quad 1
(Bank 103) and Quad 0 (Bank 102).
GTYP reference clocks: Quad 104, Refclk 0 and Quad 102, Refclk 0. Lane reversal: x4 on PCB, x8 by IP.

Board width x8, x4 (Controller 1 x8, Controller 0 x4):
CPM5 PCIE controller channels: Controller 1 [7:0] on Quad 3 (Bank 105) and Quad 2 (Bank 104);
Controller 0 [3:0] on Quad 0 (Bank 102).
GTYP reference clocks: Quad 104, Refclk 0 and Quad 102, Refclk 0. Lane reversal: x8 by IP, x4 on PCB.

Board designs for x2 and x1 must use x4 guidance. For x2 board designs, based on the controller to
be used, connect to controller lane numbers Controller 0 [1:0] or Controller 1 [1:0]. Similarly, for x1
board designs, connect to controller lane numbers Controller 0 [0] or Controller 1 [0].


✎ Note: Controller lane numbers might not be the same as physical GTYP channel numbers in a
quad. Refer to the provided placement table.

RESET Placements

Allowed placements are shown in the following table. Placements are selected in CIPS IP
configuration GUI as part of PS PMC peripheral and I/O configuration selections.

Table: Allowed Reset Pin Placements

CPM5 PCIE Controller and Port Type RESET Pin Location Options

0: Endpoint, Switch Ports (Up/Down) PS MIO 18 (Default)
PMC MIO 24
PMC MIO 38

1: Endpoint, Switch Ports (Up/Down) PS MIO 19 (Default)
PS MIO 25
PS MIO 39

0: Root Port PS MIO 0 (Default)
PS MIO 0 – 25
PMC MIO 0 – 51

1: Root Port PS MIO 1 (Default)
PS MIO 0 – 25
PMC MIO 0 – 51

1. Root port mode can be enabled simultaneously on both controller 0 and controller 1 as long
as one of them is in non-DMA mode.

CPM5 Configuration Notes

In many cases that naturally arise from the allowed GTYP quad placements and lane ordering,
the PCB designer might conclude it is not feasible to meet length, loss, or other signaling
requirements while physically implementing lane reversal on the PCB.
This is likely with x16 and x8 link widths; therefore, use lane reversal by the IP rather than physically
implementing lane reversal on the PCB. With lane reversal by the IP, the CPM5 link width selection in
the CIPS IP configuration GUI must match the PCB designed link width to ensure lane reversal by the
IP functions.
For x4 or narrower link widths, the feasibility of physically implementing lane reversal on the PCB is
greater; therefore, use this approach instead.


Guidance for CPM5 Migration from Specifically Identified Engineering Sample Devices

GTYP Quad and REFCLK Considerations

In migration, the lane ordering for each controller configured for x16 or x8 link widths reverses within
the GTYP quads accessible to each controller. For these designs, the lane reversal is transparent
under the assumption that lane reversal by the IP is used. REFCLK placements for x16 or x8 link
widths do not change.
For designs using x4 or narrower link widths, the lane ordering is unchanged during migration.
REFCLK placements also do not change.
Refer to the provided placement tables. For additional migration support, contact your AMD
representative.

RESET Considerations

RESET placement options do not change.

CPM5 Configuration Considerations

Design migration requires IP update of the CPM5 as well as a re-implementation of the design to
generate a new programmable design image (PDI).

CPM5 GTYP Locations


Table: CPM5 GTYP Locations

XCVC2602, XCVC2802, XCVE2602, XCVE2802, XCVM2202, XAVE2602, XAVE2802 (all packages):
CPM Controller 1 - GT Quads for x16: N/A; for x8: GTY_QUAD_X0Y3, GTY_QUAD_X0Y2;
for x4: GTY_QUAD_X0Y2
CPM Controller 0 - GT Quads for x16: GTY_QUAD_X0Y3, GTY_QUAD_X0Y2, GTY_QUAD_X0Y1,
GTY_QUAD_X0Y0; for x8: GTY_QUAD_X0Y1, GTY_QUAD_X0Y0; for x4: GTY_QUAD_X0Y0

XCVH1522, XCVH1542, XCVH1582, XCVH1742, XCVH1782, XCVM2502, XCVP1202, XCVP1502,
XCVP1552, XCVP1702, XCVP1802, XCVP2502, XCVP2802, XQVP1202, XQVP1502, XQVP1702,
XQVP2502 (all packages):
CPM Controller 1 - GT Quads for x16: N/A; for x8: GTY_QUAD_X0Y5, GTY_QUAD_X0Y4;
for x4: GTY_QUAD_X0Y4
CPM Controller 0 - GT Quads for x16: GTY_QUAD_X0Y5, GTY_QUAD_X0Y4, GTY_QUAD_X0Y3,
GTY_QUAD_X0Y2; for x8: GTY_QUAD_X0Y3, GTY_QUAD_X0Y2; for x4: GTY_QUAD_X0Y2

Using the High Speed Debug Port Over PCIe for Design Debug

The high speed debug port (HSDP) allows the Vivado Design Suite to connect to the FPGA debug
cores through non-JTAG interfaces. The standard Vivado Design Suite debug feature uses JTAG to
connect to the hardware FPGA resources and performs debug through Vivado. This appendix focuses
on using PCIe to perform debug over a PCIe link rather than the standard JTAG debug interface. This
is referred to as HSDP-over-PCIe and allows for Vivado ILA waveform capture, VIO debug control,
and interaction with other AMD debug cores using the PCIe link as the communication channel.
HSDP-over-PCIe should be used to perform FPGA debug remotely using the Vivado Design Suite
debug feature when JTAG debug is not available. This is commonly used for data center applications
where the FPGA is connected to a PCIe host system without any other connections to the hardware
device.
Using debug over PCIe requires software, driver, and FPGA hardware design components. Because
there is an FPGA hardware design component to HSDP-over-PCIe debug, you cannot perform debug
until the FPGA is already loaded with a FPGA hardware design that implements HSDP-over-PCIe and
PCIe link to the host PC is established. This is achieved by loading an HSDP-over-PCIe enabled
design into the configuration flash on the board prior to inserting the card into the data center location.
Because debug using HSDP-over-PCIe is dependent on the PCIe communication channel, this
should not be used to debug PCIe link related issues.

Overview
The AMD Versal™ device has an integrated debug subsystem that resides in the PMC. The integrated
debug subsystem includes the test access port (TAP) controller, the Arm® debug access port (DAP)
controller, and the debug packet controller (DPC). The DPC receives command packets, referred to
as debug and trace packets (DTP), from one or more debug host interfaces, then generates reply
packets and transmits them back to the debug host. The Versal device has four debug host interfaces
that are connected to the DPC for interaction with external debug hosts.

Serial, low-speed JTAG interface attached to the debug access port (DAP) controller
Serial, high-speed debug port (HSDP) connected to the Aurora protocol unit
Parallel, high-speed PCIe interface with debug protocols connected to the GTY quad
transceivers
Parallel PL path for potential soft Aurora IP or other debug interface and protocol in the PL

The focus of this appendix is on employing the PCIe link as the communication channel with the DPC,
referred to as management (mgmt) mode for HSDP-over-PCIe, and on user mode for HSDP-over-PCIe,
which is a slower and more restrictive mode for enabling debug over a PCIe link. For more information
about Versal device integrated debug and other communication channels, refer to the Versal Adaptive
SoC Technical Reference Manual (AM011).
There are three primary components that enable HSDP-over-PCIe debug:

Host PC HSDP-PCIe driver


Host PC hw_server application
HSDP-over-PCIe enabled FPGA design

The following figure shows the role of each component when performing debug over PCIe.

Figure: HSDP-over-PCIe Hardware and Software Components

Host PC HSDP-PCIe Driver

The HSDP-PCIe driver provides connectivity to the debug over PCIe enabled FPGA hardware
resource that is connected to the Host PC via PCIe link. It acts as a bridge between the user-space
hw_server application and the FPGA hardware. The driver, depending on the FPGA design and the
arguments specified to hw_server, can function in mgmt mode or user mode. The hardware design
requirements for HSDP-over-PCIe for both mgmt and user mode and their differences are specified in
a later section. You must supply the required parameters associated with the hardware design to the
driver via a configuration header file before compilation and module installation. The HSDP-PCIe
driver for Linux can be downloaded from GitHub.

Host PC hw_server Application

The Vivado IDE connects to the hw_server application to use the debug feature for remote
or local FPGA targets, including when using HSDP-over-PCIe. The Vivado IDE application can be
running on a remote or local PC and connects over a TCP/IP socket to the host PC, which is running
hw_server and is connected to the hardware target via the PCIe link. The HSDP-over-PCIe driver acts
as a conduit for hw_server to send and receive debug and trace data to and from the target device
and display it in the Vivado IDE.

HSDP-over-PCIe Enabled FPGA Design

Traditionally, hardware debug through AMD Vivado is performed over a JTAG interface. For Versal
devices, the JTAG datapath to the DPC is hardened and abstracted away by the Vivado IDE. Making
debug seamless requires that connections are established between the debug target(s) and the DPC,
that the debug target clock(s) are active, and that the reset(s) are deasserted. To enable the
HSDP-over-PCIe feature on a Versal device, several design requirements must be met. As mentioned
previously, there are two distinct methods to exercise the HSDP-over-PCIe feature: mgmt mode and
user mode. Each of these methods has its own design requirements and supporting driver code.

User Mode

The user mode method for HSDP-over-PCIe imposes fewer requirements on the hardware design,
but it is also slower than management mode and does not allow for debug access to hardened blocks
like SYSMON, DDRMC, and IBERT. User mode must employ a PCIe BAR to access a fabric debug
hub from the host PC, which bypasses the DPC entirely and does not operate using DTPs. Instead,
the host PC uses memory mapped reads and writes to directly access the debug hub to issue and
collect debug data. User mode is identical on CPM4 and CPM5 capable devices and it is
recommended to use CPM in DMA mode to easily make use of the AXI Bridge Master type for the
PCIe BAR required to reach the debug hub. Debug transactions are then routed through the NoC to
the fabric, which opens the possibility to use NoC NMU remapping to ensure the PCIe BAR size
remains small and the device address map does not become fragmented.

Figure: Example Block Diagram for HSDP-over-PCIe User Mode Debug for CPM4/CPM5

Figure: Example Minimum Address Map for HSDP-over-PCIe User Mode Debug for
CPM4/CPM5

Management Mode

The management mode method for HSDP-over-PCIe imposes more design requirements on the
target device and requires more setup, but its throughput is faster and allows for debug of hardened
debug cores. Management mode for HSDP-over-PCIe uses the HSDP DMA block to transfer DTPs to
DPC and coordinates responses to the Host PC. The HSDP DMA block must be accessible for setup
from a PCIe BAR through CPM’s AXI Master Bridge at base address 0xFE5F0000. The physical
address is fixed and cannot be remapped in the FPGA address space, as the HSDP DMA is only
accessible from an interconnect switch between the CPM interconnect and the NoC. This means that
NoC NMU address remapping cannot be employed, and the PCIe BAR must be large enough to
reach the HSDP DMA or the master bridge itself must perform address translation.

Figure: CIPS IP PS PMC Configuration for Management Mode Debug Using DPC

✎ Note: GTYP mapping is package or device dependent.


To enable management mode debug for a Versal device that has CPM4, the CPM must have at least one
AXI BAR enabled, and the slave bridge must also be enabled for the DMA transfers to target and to
set up address translation to host memory. The PMC master must have the debug hub slave mapped
to its address space and the CPM master must have the CPM slave mapped to its address space.
For CPM4, you can configure up to 6 AXI BARs, each with address translation, which allows for only 6
apertures. The address translation registers are programmed from the slave bridge interface by the
Host PC through the master bridge.

Figure: Block Diagram for HSDP-over-PCIe Management Mode Debug for CPM4

Figure: Address Map for HSDP-over-PCIe Management Mode Debug for CPM4

Figure: AXI BARs for HSDP-over-PCIe Management Mode Debug for CPM4

Figure: PCIe BARs for HSDP-over-PCIe Debug for CPM4/CPM5


To enable mgmt mode debug for a Versal device that has CPM5 within it, the CPM must have at least
one AXI BAR enabled and the slave bridge must also be enabled for the DMA transfers to target. The
PMC master must have the debug slave mapped into its address space and the CPM master must
have the CPM registers mapped into its address space. Address translation differs significantly
between CPM4 and CPM5. For CPM5, the concept of the BDF table was introduced, which allows for
significantly more granularity for address translation within each AXI BAR, even with fewer AXI BARs.
The BDF table registers are located in the CPM register space, which is only accessible through the
PMC interface by the Host PC from the master bridge.

Figure: Block Diagram for HSDP-over-PCIe Management Mode Debug for CPM5

Figure: Address Map for HSDP-over-PCIe Management Mode Debug for CPM5

Figure: AXI BARs for HSDP-over-PCIe Management Mode Debug for CPM5


Implementing the HSDP-over-PCIe Example Design


An HSDP-over-PCIe example design is provided for Vivado users to test the feature in hardware with
a Versal device development board. This section provides the necessary instructions to generate the
design, implement the appropriate software, and begin running debug traffic across a PCIe link.
✎ Note: The provided example design is for reference only; it might or might not comply with the
requirements for a production design. Users are advised to perform extensive testing and verification
before attempting to use it in a production design.

Opening the Example Design and Generating a Bitstream

A set of example designs, including the HSDP-over-PCIe example design, is hosted on GitHub in the
AMD CED Store repository and displayed through Vivado, where the list can be refreshed with a valid
internet connection. You can also download or clone the GitHub repository to your local machine and
point to that location on your PC. To open the example design, perform the following steps:

1. Launch Vivado.
2. Navigate to the set of example designs for selection
From the Quick Start menu, select Open Example Project, or
Select File > Project > Open Example.
3. From the Select Project Template window, select Versal CPM PCIe Debug and navigate through
the menus to select a project location and board part.
4. In the Flow Navigator, click Generate Device Image to run synthesis, implementation, and
generate a programmable device image (.pdi) file that can be loaded to the target Versal device
and a probes (.ltx) file used to specify debug information.

✎ Note: You can download or clone the GitHub repository to a local machine from
https://github.com/Xilinx/XilinxCEDStore and set the following parameter so that local example
designs are displayed in the Select Project template window.

set_param CED.repoPaths <parent-path>/XilinxCEDStore/<optional-path-to-subset-of-


designs>

System Bring-Up

The first step is to program the FPGA and power on the system such that the PCIe link is detected by
the host system. This can be accomplished by either:

Programming the design file into the flash present on the FPGA board, or
Programming the device directly via JTAG

If the card is powered by the Host PC, it will need to be powered on to perform this programming
using JTAG and then restarted to allow the PCIe link to enumerate. After the system is up and
running, you can use the Linux lspci utility to list the details for the FPGA-based PCIe device.

Compiling and Loading the Driver

The provided HSDP-PCIe driver can be downloaded from GitHub. The driver should be compiled and
installed on the Host PC that is connected to the target FPGA via the PCIe link. Before compiling the
driver, you must specify the relevant parameters of the hardware design to the driver through a
configuration header file for management and/or user mode. Refer to the comments within the header
file for more information on cross-referencing the variable values to the hardware design parameters.
The values provided within the driver should already match the example design’s hardware
configuration, but might require selectively commenting a specific section of the configuration file in
or out. A Makefile is provided with the driver to simplify compilation, installation, and removal.
✎ Note: You can download the driver from GitHub.
1. Navigate to the driver directory

$> cd <parent-path>/hsdp-pcie-driver

2. Edit the configuration header file, if necessary, at /src/hsdp_pcie_user_config.h


3. Compile the driver and copy to /lib/modules

$> make install

4. Insert the driver into the kernel

$> make insmod

Launch hw_server on the Remote or Local Host PC

After installing the HSDP-PCIe driver in the previous step, character device file(s) for user mode
and/or management mode are located under /dev/ on the system, with the BDF (Bus:Device.Function)
of the PCIe device appended to the name, if module compilation and installation was successful.
To launch hw_server and specify a mgmt mode connection to the target FPGA, issue the following
command on the remote Host PC and replace <BB:DD.F> with the BDF of the PCIe device.

$> hw_server -e “set dpc-pcie /dev/hsdp_mgmt_<BB:DD.F>”

To launch hw_server and specify a user mode connection to the target FPGA, issue the
following command on the remote Host PC and replace <BB:DD.F> with the BDF of the PCIe device
and <name> with the name, if one was specified in the configuration header file.


$> hw_server -e “set pcie-debug-hub /dev/hsdp_user_<BB:DD.F>_<name>”

Connecting the Vivado IDE to the hw_server Application for Debug Over PCIe

At this point, the FPGA design has been loaded, the PCIe link has been established, the HSDP-PCIe
driver has been compiled and installed with the correct configuration values, and the hw_server
application has been started on the debug Host PC. The remaining step is to connect to hw_server
and begin connecting to the debug cores to exchange and display debug data.

1. Launch Vivado.
2. Select Open Hardware Manager from the Flow Navigator.
3. In the Hardware Manager, select Open target > Open New Target.
4. Connect to the hw_server application from the Vivado IDE.
If the debug host is remote, in the Hardware Server Settings window, modify the host name
field to the remote server that is running hw_server and the port number field, if using the
non-default port.
If the debug host is local, in the Hardware Server Settings window, select the Local Server
option for the Connect to: field.
5. If successful, a hardware target should be populated for selection, then click through to Finish.

6. The target device should appear in the Hardware window. After opening the hardware target, a
probes file can be specified in the Hardware Device Properties window, and the debug core data is
displayed.


✎ Note: If using mgmt mode for debug, the hard block debug cores are accessible for debug,
while only user debug cores are present when using user mode for debug.
7. If using mgmt mode for debug, a user can connect to the debug host PC through the XSDB
application and issue direct AXI reads and writes through the PMC.

Interrupt Request (IRQ) Routing and Programming for CPM4

This appendix includes guidance on Interrupt Request (IRQ) pin routing and programming for CPM4.
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express provides three independent IRQ
pins routed to the Programmable Logic (PL) region as well as three independent IRQ pins routed to
the hardened Processing System (PS) region. These IRQ pins are shared between the two PCIE
controllers and all CPM4 use modes and can be programmed by the user to route one or many
interrupt sources.
The three IRQ pins routed to the PL region are named cpm_misc_irq, cpm_cor_irq, and
cpm_uncor_irq and are visible in the Vivado Block Diagram (BD) canvas at the Versal CIPS IP
boundary. Although the IRQ pins are named miscellaneous, correctable, and uncorrectable,
respectively, they function identically and have the same list of interrupt sources to select from.
Therefore, you can treat these IRQ pins as three separate general-purpose IRQ pins.
The three IRQ pins routed to the PS region are named similarly; however, they are not visible in the
Vivado BD canvas because they use hardened silicon routing. These paths are always enabled, and no
extra customization is required during CIPS IP customization to use them. These IRQ pins also
function identically, have the same list of interrupt sources to select from as their PL IRQ pin
counterparts, and can be used together with the PL IRQ pins.
There are many interrupt sources to select from, and the complete list is available in the Versal
Adaptive SoC Register Reference (AM012). Versal Adaptive SoC CPM DMA and Bridge Mode for PCI
Express is only available on Controller 0. This appendix provides some example use cases to
showcase how the IRQ pin mux registers are programmed and includes firmware guidance for
servicing the interrupt request.

Example: Generate Interrupt Request for Hot Reset


In this example, you generate an interrupt whenever a hot reset is received by the PCIE0 controller.
You route the interrupt generated from the PCIE0 controller to the PS region. The cpm_misc_irq pin
is used as an example, but any of the other pins can also be used. Note that this particular example
uses the pcie_local_event interrupt line at the CPM_SLCR mux level, which is driven by the
XDMA_REG.INT_DEC register. A high-level block diagram of the interrupt routing is shown in the
following figure:

Figure: Interrupt Routing Diagram

Register programming to enable Hot Reset event interrupt


The following registers must be programmed at runtime to enable the interrupt; an illustrative C sketch follows the list:

For PCIE0 controller
1. CPM_SLCR.PS_MISC_IR_ENABLE set to 0x2 to select "pcie_local_event"
2. Read to confirm that CPM_SLCR.PS_MISC_IR_MASK is cleared for bit[1]
"pcie_local_event"
3. XDMA_REG.INT_MASK set to 0x8 to select "hot_reset"
4. Read to confirm that XDMA_REG.INT_MASK is set for bit[3] "hot_reset"
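
The following is a minimal bare-metal C sketch of this enable sequence, assuming simple memory-mapped
32-bit register access. The reg_write/reg_read helpers, the function name, and the address parameters
are illustrative only; the physical addresses of CPM_SLCR.PS_MISC_IR_ENABLE,
CPM_SLCR.PS_MISC_IR_MASK, and XDMA_REG.INT_MASK must be taken from the Versal Adaptive SoC
Register Reference (AM012).

    #include <stdint.h>

    /* Plain 32-bit MMIO helpers (illustrative). */
    static inline void reg_write(uintptr_t addr, uint32_t val)
    {
        *(volatile uint32_t *)addr = val;
    }
    static inline uint32_t reg_read(uintptr_t addr)
    {
        return *(volatile uint32_t *)addr;
    }

    /* Enable the Hot Reset interrupt for the PCIE0 controller.
     * The three parameters are the AM012 addresses of CPM_SLCR.PS_MISC_IR_ENABLE,
     * CPM_SLCR.PS_MISC_IR_MASK, and XDMA_REG.INT_MASK. Returns 0 on success,
     * -1 if a confirmation read does not show the expected value. */
    int cpm4_enable_hot_reset_irq(uintptr_t ps_misc_ir_enable,
                                  uintptr_t ps_misc_ir_mask,
                                  uintptr_t xdma_int_mask)
    {
        reg_write(ps_misc_ir_enable, 0x2u);            /* 1: select pcie_local_event (bit[1]) */
        if (reg_read(ps_misc_ir_mask) & 0x2u)          /* 2: confirm bit[1] is unmasked       */
            return -1;
        reg_write(xdma_int_mask, 0x8u);                /* 3: select hot_reset (bit[3])        */
        if ((reg_read(xdma_int_mask) & 0x8u) == 0u)    /* 4: confirm bit[3] is set            */
            return -1;
        return 0;
    }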

Interrupt service routine


The following steps outline the recommended procedure to service the interrupt request; an illustrative C sketch follows the steps:

For PCIE0 controller


1. Upon receiving interrupt, read CPM_SLCR.PS_MISC_IR_STATUS to confirm bit[1]
"pcie_local_event" is asserted
2. CPM_SLCR.PS_MISC_IR_DISABLE set to 0x2 to temporarily mask
"pcie_local_event" so further interrupt is not received while existing interrupt is being
serviced
3. Read XDMA_REG.INT_DEC to confirm bit [3] "hot_reset" is asserted.
4. XDMA_REG.INT_MASK set to 0x0 to temporarily mask "hot_reset"
5. Execute user-defined task for servicing Hot Reset event
6. XDMA_REG.INT_DEC set to 0x8 to clear "hot_reset"
7. CPM_SLCR.PS_MISC_IR_STATUS set to 0x2 to clear "pcie_local_event"
8. Re-enable / unmask Hot Reset event interrupt source by programming the
CPM_SLCR.PS_MISC_IR_ENABLE and XDMA_REG.INT_MASK registers
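
A corresponding interrupt service routine sketch is shown below, reusing the reg_write/reg_read helpers
from the enable sketch above. The function name, parameters, and the on_hot_reset callback are
illustrative assumptions; the addresses are the AM012 locations of the named registers, and the clear
writes follow the write-one-to-clear behavior described in the steps above.

    /* Service the Hot Reset interrupt on the PCIE0 controller (steps 1-8 above).
     * on_hot_reset() is the user-defined task. */
    void cpm4_hot_reset_isr(uintptr_t ps_misc_ir_status,
                            uintptr_t ps_misc_ir_disable,
                            uintptr_t ps_misc_ir_enable,
                            uintptr_t xdma_int_dec,
                            uintptr_t xdma_int_mask,
                            void (*on_hot_reset)(void))
    {
        if ((reg_read(ps_misc_ir_status) & 0x2u) == 0u)   /* 1: pcie_local_event asserted? */
            return;
        reg_write(ps_misc_ir_disable, 0x2u);              /* 2: mask pcie_local_event      */
        if (reg_read(xdma_int_dec) & 0x8u) {              /* 3: hot_reset asserted?        */
            reg_write(xdma_int_mask, 0x0u);               /* 4: mask hot_reset             */
            on_hot_reset();                               /* 5: user-defined task          */
            reg_write(xdma_int_dec, 0x8u);                /* 6: clear hot_reset            */
        }
        reg_write(ps_misc_ir_status, 0x2u);               /* 7: clear pcie_local_event     */
        reg_write(xdma_int_mask, 0x8u);                   /* 8: re-enable hot_reset        */
        reg_write(ps_misc_ir_enable, 0x2u);               /* 8: re-enable pcie_local_event */
    }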

Example: Generate Interrupt Request for MSI Interrupt as Root Port


In this example, you generate an interrupt whenever an MSI vector 0 interrupt is received by the
PCIE0 controller. You route the interrupt generated from the PCIE0 controller to the PS region. The
cpm_misc_irq pin is used as an example, but any of the other pins can also be used. Note that this
particular example uses the pcie_msi0 interrupt line at the CPM_SLCR mux level, which is driven by
the XDMA_REG.MSI_DEC_31_0 register. A high-level block diagram of the interrupt routing is shown
in the following figure:

Figure: Interrupt Routing Diagram

Register programming to enable MSI vector 0 event interrupt


The following registers must be programmed at runtime to enable the interrupt; an illustrative C sketch follows the list:

For PCIE0 controller


1. CPM_SLCR.PS_MISC_IR_ENABLE set to 0x4 to select "pcie_msi0"
2. Read to confirm that CPM_SLCR.PS_MISC_IR_MASK is cleared for bit[2]
"pcie_msi0"
3. XDMA_REG.MSI_MASK_31_0 set to 0x1 to select "MSI vector 0"
4. Read to confirm that XDMA_REG.MSI_MASK_31_0 is set for bit[0] "MSI vector 0"
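
A compact sketch of this enable sequence, mirroring the Hot Reset example and reusing the
reg_write/reg_read helpers defined earlier in this appendix; the function name and address parameters
are illustrative, and the register locations come from AM012.

    /* Enable MSI vector 0 interrupt delivery for the PCIE0 controller. */
    int cpm4_enable_msi0_irq(uintptr_t ps_misc_ir_enable,
                             uintptr_t ps_misc_ir_mask,
                             uintptr_t xdma_msi_mask_31_0)
    {
        reg_write(ps_misc_ir_enable, 0x4u);                     /* 1: select pcie_msi0 (bit[2])    */
        if (reg_read(ps_misc_ir_mask) & 0x4u)                   /* 2: confirm bit[2] is unmasked   */
            return -1;
        reg_write(xdma_msi_mask_31_0, 0x1u);                    /* 3: select MSI vector 0 (bit[0]) */
        return (reg_read(xdma_msi_mask_31_0) & 0x1u) ? 0 : -1;  /* 4: confirm bit[0] is set        */
    }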

Interrupt service routine


The following steps outline the recommended procedure to service the interrupt request; an illustrative C sketch follows the steps:

For PCIE0 controller


1. Upon receiving interrupt, read CPM_SLCR.PS_MISC_IR_STATUS to confirm bit[2]
"pcie_msi0" is asserted
2. CPM_SLCR.PS_MISC_IR_DISABLE set to 0x4 to temporarily mask "pcie_msi0" so
further interrupt is not received while existing interrupt is being serviced
3. Read XDMA_REG.MSI_DEC_31_0 to confirm bit [0] "MSI vector 0" is asserted.
4. XDMA_REG.MSI_MASK_31_0 set to 0x0 to temporarily mask "MSI vector 0"
5. Execute user-defined task for servicing MSI interrupt
6. XDMA_REG.MSI_DEC_31_0 set to 0x1 to clear "MSI vector 0"
7. CPM_SLCR.PS_MISC_IR_STATUS set to 0x4 to clear "pcie_msi0"
8. Re-enable / unmask the MSI vector 0 interrupt source by programming the
CPM_SLCR.PS_MISC_IR_ENABLE and XDMA_REG.MSI_MASK_31_0 registers
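
An illustrative service routine for this example, structured like the Hot Reset ISR sketch and reusing
the same MMIO helpers; the names and address parameters are assumptions, and the register locations
come from AM012.

    /* Service the MSI vector 0 interrupt on the PCIE0 controller (steps 1-8 above).
     * on_msi0() is the user-defined task. */
    void cpm4_msi0_isr(uintptr_t ps_misc_ir_status,
                       uintptr_t ps_misc_ir_disable,
                       uintptr_t ps_misc_ir_enable,
                       uintptr_t xdma_msi_dec_31_0,
                       uintptr_t xdma_msi_mask_31_0,
                       void (*on_msi0)(void))
    {
        if ((reg_read(ps_misc_ir_status) & 0x4u) == 0u)   /* 1: pcie_msi0 asserted?    */
            return;
        reg_write(ps_misc_ir_disable, 0x4u);              /* 2: mask pcie_msi0         */
        if (reg_read(xdma_msi_dec_31_0) & 0x1u) {         /* 3: MSI vector 0 asserted? */
            reg_write(xdma_msi_mask_31_0, 0x0u);          /* 4: mask MSI vector 0      */
            on_msi0();                                    /* 5: user-defined task      */
            reg_write(xdma_msi_dec_31_0, 0x1u);           /* 6: clear MSI vector 0     */
        }
        reg_write(ps_misc_ir_status, 0x4u);               /* 7: clear pcie_msi0        */
        reg_write(xdma_msi_mask_31_0, 0x1u);              /* 8: re-enable MSI vector 0 */
        reg_write(ps_misc_ir_enable, 0x4u);               /* 8: re-enable pcie_msi0    */
    }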

Interrupt Request (IRQ) Routing and Programming for CPM5
This appendix includes guidance on Interrupt Request (IRQ) pin routing and programming for CPM5.
Versal Adaptive SoC CPM DMA and Bridge Mode for PCI Express provides three independent IRQ
pins routed to the Programmable Logic (PL) region as well as three independent IRQ pins routed to
the hardened Processing System (PS) region. These IRQ pins are shared between the two PCIE
controllers and all CPM5 use modes, and can be programmed by the user to route one or more
interrupt sources.
The three IRQ pins routed to the PL region are named cpm_misc_irq, cpm_cor_irq, and
cpm_uncor_irq and are visible in the Vivado Block Diagram (BD) canvas at the Versal CIPS IP
boundary. Although the IRQ pins are named miscellaneous, correctable, and uncorrectable
respectively, they function identically and have the same list of interrupt sources to select from.
Therefore, you can treat these IRQ pins as three separate general-purpose IRQ pins.
The three IRQ pins routed to the PS region are named similarly; however, they are not visible in the
Vivado BD canvas because they use hardened silicon routing. These paths are always enabled and
no extra customization is required during CIPS IP customization to use them. These IRQ pins also
function identically and have the same list of interrupt sources to select from as their PL IRQ pin
counterparts, and can be used together with the PL IRQ pins.
There are many interrupt sources to select from; the complete list is available in the Versal
Adaptive SoC Register Reference (AM012). This appendix provides a use case example to show how
the IRQ pin mux registers are programmed and includes firmware guidance to service the interrupt
request.

Example: Generate Interrupt Request for Hot Reset


In this example, you generate an interrupt whenever a hot reset is received by the PCIE0 controller or
the PCIE1 controller. You route the interrupt generated from the PCIE0 controller to the PS region,
while the interrupt generated from the PCIE1 controller is routed to the PL region. The cpm_misc_irq
pin is used as an example, but any of the other pins can also be used. Note that this particular
example uses the pcie_local_event interrupt line at the CPM5_SLCR mux level, which is driven by
the CPM5_DMA_CSR.INT_DEC register. A high-level block diagram of the interrupt routing is shown
in the following figure:

Figure: Interrupt Routing Diagram

Register programming to enable Hot Reset event interrupt


The following registers must be programmed at runtime to enable the interrupt; an illustrative C sketch follows the list:

For PCIE0 controller


1. CPM5_SLCR.PS_MISC_IR_ENABLE set to 0x2 to select pcie0_err
2. Read to confirm that CPM5_SLCR.PS_MISC_IR_MASK is cleared for bit[1] pcie0_err
3. CPM5_SLCR.PCIE0_IR_ENABLE set to 0x1 to select pcie_local_event
4. Read to confirm that CPM5_SLCR.PCIE0_IR_MASK is cleared for bit[0]
pcie_local_event
5. CPM5_DMA0_CSR.INT_MASK set to 0x8 to select hot_reset
6. Read to confirm that CPM5_DMA0_CSR.INT_MASK is set for bit[3] hot_reset
For PCIE1 controller
1. CPM5_SLCR.PL_MISC_IR_ENABLE set to 0x4 to select pcie1_err
2. Read to confirm that CPM5_SLCR.PL_MISC_IR_MASK is cleared for bit[2] pcie1_err
3. CPM5_SLCR.PCIE1_IR_ENABLE set to 0x1 to select pcie_local_event
4. Read to confirm that CPM5_SLCR.PCIE1_IR_MASK is cleared for bit[0]
pcie_local_event
5. CPM5_DMA1_CSR.INT_MASK set to 0x8 to select hot_reset
6. Read to confirm that CPM5_DMA1_CSR.INT_MASK is set for bit[3] hot_reset
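
Because the PCIE0 and PCIE1 sequences differ only in which MISC-level register set and bit they use,
the following minimal C sketch parameterizes the controller. The MMIO helpers, function name, and
parameters are illustrative only; the register addresses come from the Versal Adaptive SoC Register
Reference (AM012). For PCIE0, pass the CPM5_SLCR.PS_MISC_IR_* registers with err_bit = 0x2
(pcie0_err); for PCIE1, pass the CPM5_SLCR.PL_MISC_IR_* registers with err_bit = 0x4 (pcie1_err).

    #include <stdint.h>

    /* Plain 32-bit MMIO helpers (illustrative). */
    static inline void reg_write(uintptr_t addr, uint32_t val)
    {
        *(volatile uint32_t *)addr = val;
    }
    static inline uint32_t reg_read(uintptr_t addr)
    {
        return *(volatile uint32_t *)addr;
    }

    /* Enable the Hot Reset interrupt for one CPM5 controller.
     * misc_* are the PS_MISC (PCIE0) or PL_MISC (PCIE1) IR registers,
     * pciex_* are the per-controller PCIE0_IR/PCIE1_IR registers, and
     * dma_int_mask is CPM5_DMA0_CSR.INT_MASK or CPM5_DMA1_CSR.INT_MASK. */
    int cpm5_enable_hot_reset_irq(uintptr_t misc_ir_enable, uintptr_t misc_ir_mask,
                                  uintptr_t pciex_ir_enable, uintptr_t pciex_ir_mask,
                                  uintptr_t dma_int_mask, uint32_t err_bit)
    {
        reg_write(misc_ir_enable, err_bit);                 /* 1: select pcie0_err/pcie1_err       */
        if (reg_read(misc_ir_mask) & err_bit)               /* 2: confirm the err bit is unmasked  */
            return -1;
        reg_write(pciex_ir_enable, 0x1u);                   /* 3: select pcie_local_event (bit[0]) */
        if (reg_read(pciex_ir_mask) & 0x1u)                 /* 4: confirm bit[0] is unmasked       */
            return -1;
        reg_write(dma_int_mask, 0x8u);                      /* 5: select hot_reset (bit[3])        */
        return (reg_read(dma_int_mask) & 0x8u) ? 0 : -1;    /* 6: confirm bit[3] is set            */
    }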

Interrupt service routine


The following steps outline the recommended procedure to service the interrupt request; an illustrative C sketch follows the steps:

For PCIE0 controller
1. Upon receiving interrupt, read CPM5_SLCR.PS_MISC_IR_STATUS to confirm bit[1]
pcie0_err is asserted
2. CPM5_SLCR.PS_MISC_IR_DISABLE set to 0x2 to temporarily mask pcie0_err so
further interrupt is not received while existing interrupt is being serviced
3. Read CPM5_SLCR.PCIE0_IR_STATUS to confirm bit[0] pcie_local_event is asserted
4. CPM5_SLCR.PCIE0_IR_DISABLE set to 0x1 to temporarily mask pcie_local_event
5. Read CPM5_DMA0_CSR.INT_DEC to confirm bit [3] hot_reset is asserted.
6. CPM5_DMA0_CSR.INT_MASK set to 0x0 to temporarily mask hot_reset
7. Execute user-defined task for servicing Hot Reset event
8. CPM5_DMA0_CSR.INT_DEC set to 0x8 to clear hot_reset
9. CPM5_SLCR.PCIE0_IR_STATUS set to 0x1 to clear pcie_local_event
10. CPM5_SLCR.PS_MISC_IR_STATUS set to 0x2 to clear pcie0_err
11. Re-enable / unmask Hot Reset event interrupt source by programming the
CPM5_SLCR.PS_MISC_IR_ENABLE, CPM5_SLCR.PCIE0_IR_ENABLE, and
CPM5_DMA0_CSR.INT_MASK registers
For PCIE1 controller
1. Upon receiving interrupt, read CPM5_SLCR.PL_MISC_IR_STATUS to confirm bit[2]
pcie1_err is asserted
2. CPM5_SLCR.PL_MISC_IR_DISABLE set to 0x4 to temporarily mask pcie1_err so
further interrupt is not received while existing interrupt is being serviced
3. Read CPM5_SLCR.PCIE1_IR_STATUS to confirm bit[0] pcie_local_event is asserted
4. CPM5_SLCR.PCIE1_IR_DISABLE set to 0x1 to temporarily mask pcie_local_event
5. Read CPM5_DMA1_CSR.INT_DEC to confirm bit [3] hot_reset is asserted.
6. CPM5_DMA1_CSR.INT_MASK set to 0x0 to temporarily mask hot_reset
7. Execute user-defined task for servicing Hot Reset event
8. CPM5_DMA1_CSR.INT_DEC set to 0x8 to clear hot_reset
9. CPM5_SLCR.PCIE1_IR_STATUS set to 0x1 to clear pcie_local_event
10. CPM5_SLCR.PL_MISC_IR_STATUS set to 0x4 to clear pcie1_err
11. Re-enable / unmask Hot Reset event interrupt source by programming the
CPM5_SLCR.PL_MISC_IR_ENABLE, CPM5_SLCR.PCIE1_IR_ENABLE, and
CPM5_DMA1_CSR.INT_MASK registers
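
The matching service routine sketch below follows the eleven steps for either controller, reusing the
MMIO helpers and the err_bit convention from the enable sketch above; all names and parameters are
illustrative assumptions.

    /* Service the Hot Reset interrupt on one CPM5 controller (steps 1-11 above). */
    void cpm5_hot_reset_isr(uintptr_t misc_ir_status, uintptr_t misc_ir_disable,
                            uintptr_t misc_ir_enable, uintptr_t pciex_ir_status,
                            uintptr_t pciex_ir_disable, uintptr_t pciex_ir_enable,
                            uintptr_t dma_int_dec, uintptr_t dma_int_mask,
                            uint32_t err_bit, void (*on_hot_reset)(void))
    {
        if ((reg_read(misc_ir_status) & err_bit) == 0u)   /* 1: pcie0_err/pcie1_err asserted? */
            return;
        reg_write(misc_ir_disable, err_bit);              /* 2: mask the err source           */
        if (reg_read(pciex_ir_status) & 0x1u) {           /* 3: pcie_local_event asserted?    */
            reg_write(pciex_ir_disable, 0x1u);            /* 4: mask pcie_local_event         */
            if (reg_read(dma_int_dec) & 0x8u) {           /* 5: hot_reset asserted?           */
                reg_write(dma_int_mask, 0x0u);            /* 6: mask hot_reset                */
                on_hot_reset();                           /* 7: user-defined task             */
                reg_write(dma_int_dec, 0x8u);             /* 8: clear hot_reset               */
            }
            reg_write(pciex_ir_status, 0x1u);             /* 9: clear pcie_local_event        */
        }
        reg_write(misc_ir_status, err_bit);               /* 10: clear the err source         */
        reg_write(dma_int_mask, 0x8u);                    /* 11: re-enable the full chain     */
        reg_write(pciex_ir_enable, 0x1u);
        reg_write(misc_ir_enable, err_bit);
    }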

Example: Generate Interrupt Request for MSI Interrupt as Root Port


In this example, you generate an interrupt whenever an MSI vector 0 interrupt is received by the
PCIE0 controller or the PCIE1 controller. You route the interrupt generated from the PCIE0 controller
to the PS region, while the interrupt generated from the PCIE1 controller is routed to the PL region.
The cpm_misc_irq pin is used as an example, but any of the other pins can also be used. Note that
this particular example uses the pcie_msi0 interrupt line at the CPM5_SLCR mux level, which is
driven by the CPM5_DMA_CSR.MSI_DEC_31_0 register. A high-level block diagram of the interrupt
routing is shown in the following figure:

Figure: Interrupt Routing Diagram

Register programming to enable MSI vector 0 event interrupt


The following registers must be programmed at runtime to enable the interrupt; an illustrative C sketch follows the list:

For PCIE0 controller


1. CPM5_SLCR.PS_MISC_IR_ENABLE set to 0x2 to select pcie0_err
2. Read to confirm that CPM5_SLCR.PS_MISC_IR_MASK is cleared for bit[1] pcie0_err
3. CPM5_SLCR.PCIE0_IR_ENABLE set to 0x2 to select pcie_msi0
4. Read to confirm that CPM5_SLCR.PCIE0_IR_MASK is cleared for bit[1] pcie_msi0
5. CPM5_DMA0_CSR.MSI_MASK_31_0 set to 0x1 to select MSI vector 0
6. Read to confirm that CPM5_DMA0_CSR.MSI_MASK_31_0 is set for bit[0] MSI vector
0
For PCIE1 controller
1. CPM5_SLCR.PL_MISC_IR_ENABLE set to 0x4 to select pcie1_err
2. Read to confirm that CPM5_SLCR.PL_MISC_IR_MASK is cleared for bit[2] pcie1_err
3. CPM5_SLCR.PCIE1_IR_ENABLE set to 0x2 to select pcie_msi0
4. Read to confirm that CPM5_SLCR.PCIE1_IR_MASK is cleared for bit[1] pcie_msi0
5. CPM5_DMA1_CSR.MSI_MASK_31_0 set to 0x1 to select MSI vector 0
6. Read to confirm that CPM5_DMA1_CSR.MSI_MASK_31_0 is set for bit[0] MSI vector
0
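
A compact parameterized sketch of this sequence, following the same conventions (MMIO helpers and
err_bit selection) as the CPM5 Hot Reset sketches earlier in this appendix; names and parameters are
illustrative, and the register addresses come from AM012.

    /* Enable MSI vector 0 interrupt delivery for one CPM5 controller. */
    int cpm5_enable_msi0_irq(uintptr_t misc_ir_enable, uintptr_t misc_ir_mask,
                             uintptr_t pciex_ir_enable, uintptr_t pciex_ir_mask,
                             uintptr_t dma_msi_mask_31_0, uint32_t err_bit)
    {
        reg_write(misc_ir_enable, err_bit);                    /* 1: select pcie0_err/pcie1_err      */
        if (reg_read(misc_ir_mask) & err_bit)                  /* 2: confirm the err bit is unmasked */
            return -1;
        reg_write(pciex_ir_enable, 0x2u);                      /* 3: select pcie_msi0 (bit[1])       */
        if (reg_read(pciex_ir_mask) & 0x2u)                    /* 4: confirm bit[1] is unmasked      */
            return -1;
        reg_write(dma_msi_mask_31_0, 0x1u);                    /* 5: select MSI vector 0 (bit[0])    */
        return (reg_read(dma_msi_mask_31_0) & 0x1u) ? 0 : -1;  /* 6: confirm bit[0] is set           */
    }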

Interrupt service routine


The following steps outline the recommended procedure to service the interrupt request; an illustrative C sketch follows the steps:

For PCIE0 controller
1. Upon receiving interrupt, read CPM5_SLCR.PS_MISC_IR_STATUS to confirm bit[1]
pcie0_err is asserted
2. CPM5_SLCR.PS_MISC_IR_DISABLE set to 0x2 to temporarily mask pcie0_err so
further interrupt is not received while existing interrupt is being serviced
3. Read CPM5_SLCR.PCIE0_IR_STATUS to confirm bit[1] pcie_msi0 is asserted
4. CPM5_SLCR.PCIE0_IR_DISABLE set to 0x2 to temporarily mask pcie_msi0
5. Read CPM5_DMA0_CSR.MSI_DEC_31_0 to confirm bit [0] MSI vector 0 is asserted.
6. CPM5_DMA0_CSR.MSI_MASK_31_0 set to 0x0 to temporarily mask MSI vector 0
7. Execute user-defined task for servicing MSI interrupt
8. CPM5_DMA0_CSR.MSI_DEC_31_0 set to 0x1 to clear MSI vector 0
9. CPM5_SLCR.PCIE0_IR_STATUS set to 0x2 to clear pcie_msi0
10. CPM5_SLCR.PS_MISC_IR_STATUS set to 0x2 to clear pcie0_err
11. Re-enable / unmask the MSI vector 0 interrupt source by programming the
CPM5_SLCR.PS_MISC_IR_ENABLE, CPM5_SLCR.PCIE0_IR_ENABLE, and
CPM5_DMA0_CSR.MSI_MASK_31_0 registers
For PCIE1 controller
1. Upon receiving interrupt, read CPM5_SLCR.PL_MISC_IR_STATUS to confirm bit[2]
pcie1_err is asserted
2. CPM5_SLCR.PL_MISC_IR_DISABLE set to 0x4 to temporarily mask pcie1_err so
further interrupt is not received while existing interrupt is being serviced
3. Read CPM5_SLCR.PCIE1_IR_STATUS to confirm bit[1] pcie_msi0 is asserted
4. CPM5_SLCR.PCIE1_IR_DISABLE set to 0x2 to temporarily mask pcie_msi0
5. Read CPM5_DMA1_CSR.MSI_DEC_31_0 to confirm bit [0] MSI vector 0 is asserted.
6. CPM5_DMA1_CSR.MSI_MASK_31_0 set to 0x0 to temporarily mask MSI vector 0
7. Execute user-defined task for servicing MSI interrupt
8. CPM5_DMA1_CSR.MSI_DEC_31_0 set to 0x1 to clear MSI vector 0
9. CPM5_SLCR.PCIE1_IR_STATUS set to 0x2 to clear pcie_msi0
10. CPM5_SLCR.PL_MISC_IR_STATUS set to 0x4 to clear pcie1_err
11. Re-enable / unmask the MSI vector 0 interrupt source by programming the
CPM5_SLCR.PL_MISC_IR_ENABLE, CPM5_SLCR.PCIE1_IR_ENABLE, and
CPM5_DMA1_CSR.MSI_MASK_31_0 registers
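
An illustrative service routine covering the eleven steps for either controller, again reusing the
MMIO helpers and err_bit convention introduced with the CPM5 Hot Reset sketches; names and parameters
are assumptions.

    /* Service the MSI vector 0 interrupt on one CPM5 controller (steps 1-11 above). */
    void cpm5_msi0_isr(uintptr_t misc_ir_status, uintptr_t misc_ir_disable,
                       uintptr_t misc_ir_enable, uintptr_t pciex_ir_status,
                       uintptr_t pciex_ir_disable, uintptr_t pciex_ir_enable,
                       uintptr_t dma_msi_dec_31_0, uintptr_t dma_msi_mask_31_0,
                       uint32_t err_bit, void (*on_msi0)(void))
    {
        if ((reg_read(misc_ir_status) & err_bit) == 0u)   /* 1: pcie0_err/pcie1_err asserted? */
            return;
        reg_write(misc_ir_disable, err_bit);              /* 2: mask the err source           */
        if (reg_read(pciex_ir_status) & 0x2u) {           /* 3: pcie_msi0 asserted?           */
            reg_write(pciex_ir_disable, 0x2u);            /* 4: mask pcie_msi0                */
            if (reg_read(dma_msi_dec_31_0) & 0x1u) {      /* 5: MSI vector 0 asserted?        */
                reg_write(dma_msi_mask_31_0, 0x0u);       /* 6: mask MSI vector 0             */
                on_msi0();                                /* 7: user-defined task             */
                reg_write(dma_msi_dec_31_0, 0x1u);        /* 8: clear MSI vector 0            */
            }
            reg_write(pciex_ir_status, 0x2u);             /* 9: clear pcie_msi0               */
        }
        reg_write(misc_ir_status, err_bit);               /* 10: clear the err source         */
        reg_write(dma_msi_mask_31_0, 0x1u);               /* 11: re-enable the full chain     */
        reg_write(pciex_ir_enable, 0x2u);
        reg_write(misc_ir_enable, err_bit);
    }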

Migrating
For information about migrating QDMA/AXI Bridge 4.0/5.0 Soft IP to Versal CPM4 QDMA/AXI Bridge
Hard IP, see AR 33054.
For information about migrating QDMA/AXI Bridge 4.0/5.0 Soft IP to Versal CPM5 QDMA/AXI Bridge
Hard IP, see AR 33056.

Limitations
Speed Change Related Issue #1
Description
Repeated speed changes can result in the link not coming up to the intended target speed.

Workaround
A follow-on attempt should bring the link back. In extremely rare scenarios, a full reboot might be
required.

Speed Change Related Issue #2


Description
In extremely rare cases, repeated link rate changes might also result in the following:

1. PCIe access becoming unresponsive.


2. While traffic is ongoing in the system and PM D3 is also enabled with rate changes, the host
might receive a completion timeout for the read when the pre-read performed before the PM D3
sequence targets the EP ECAM space.

Workaround
In the case of PM D3, AMD recommends that any valid EP address except the ECAM space be
used for the pre-read before initiating the PM D3 sequence.
In all other cases, waiting approximately 20 ms after the link rate change and before attempting
any PCIe access can help.
However, in scenarios where the transaction still does not complete, a full reboot (power cycle
and re-programming of the image) is required.

Speed Change Related Issue #3


Description
In RP configuration with a core clock of 1 GHz, when the PCIe link rate changes from Gen1/Gen2
to Gen3/Gen4/Gen5, the link can fail to reach the intended speed or can go down in rare cases.

Workaround
An additional write with value 1 to the Perform Equalization bit in the Link Control 3 register of
the Root Complex PCIe configuration space is required when the rate change is performed from
Gen1/Gen2 to Gen3, Gen4, or Gen5 speeds.

Speed Change Related Issue #4


Description
In rare cases where DMA traffic is in progress and repeated speed changes are performed, it is
possible that an MSI-X interrupt is not generated.

Workaround
Remove the queue and add it back after the speed change is complete.

Link Autonomous Bandwidth Status (LABS) Bit


Description
As a Root Complex, when performing link width/rate changes, the link width change works as
expected. However, the PCIe protocol requires the LABS bit to be set after the link width/rate
change, and this bit is not being set.
✎ Note: This is an informational bit and does not impact actual functionality.
Workaround
Ensure the software/application ignores the LABS bit, as it is an informational bit and does not
impact functionality.
✎ Note: For any application, AMD recommends that you make sure the link is quiesced and no
transactions are pending before performing any link rate changes.

QDMA data transfer ordering


While the PCIe Bridge master follows PCIe ordering rules, there is no ordering enforcement between
the PCIe AXI Bridge Master path and the internal DMA registers or DMA data paths. In some cases,
this can cause a race condition between AXI Bridge Master and DMA register transfers. The following
is a workaround:

1. Assign a separate BAR to access the QDMA queue space registers and set the steering to route
it to the NoC. You can then loop back AXI Master transfers onto the AXI Slave interface. Set the
BAR size to 256K.
2. Do not make any BAR a DMA BAR; instead, make it a separate AXI BAR that maps the QDMA
base registers and set the steering to route it to the NoC. Making it a DMA BAR terminates the
access internally and ordering is not maintained. To work around this, the access must go to the
AXI Master, which is then looped back onto the AXI Slave interface.
3. The address offsets of the queue space registers are listed in the AXI Slave register space
section. For controller 0, the DMA registers are at 0x6_1000_0000. For controller 1, they are at
0x7_1000_0000.

Bridge mode
SR-IOV is not supported in Bridge mode.

Relaxed Ordering in Bridge Setup


Description
With any read request from the Slave Bridge, the request TLP does not have the Relaxed Ordering bit set.

MPS Limitation
Description
Only an MPS of up to 512 bytes is supported in DMA and Bridge modes.

Secondary Bus Reset (SBR)


Description
If SBR is issued on H10 devices, 10 ms of additional delay is required after SBR de-assertion.

Master Bridge AER Errors


Description
If a packet is dropped in the PCIe domain for any reason, an AER error is logged. However, if a
packet is dropped in the AXI-MM domain due to a decode error or slave error, no AER error is logged.

Slave Bridge Transaction Ordering


Description
Ordering between writes, reads, and sideband transactions is not strictly enforced at the AXI
Slave Bridge and user_irq* interfaces.

Workaround
If strict ordering is required, users should wait for the appropriate AXI response before issuing
the dependent transaction.

Power Management - ASPM L1/L0s/PM D3


Description

1. Enabling ASPM L0s/ASPM L1 could result in correctable errors being reported on the link by
both link partners, such as replay timer timeout, replay timer rollover, and receiver error.
2. A PCIe Endpoint device might also log errors when Configuration PM D3 transition request
comes in during non-quiesced traffic mode.
3. A PCIe Root Port device does not support ASPM L1 or L0s.

Workaround

1. It is recommended that the application disable correctable error reporting or ignore
correctable errors reported in the event the link transitions to ASPM L0s / ASPM L1.
2. For transition to D3Hot, software needs to make sure that the link is quiesced. To ensure
Memory Write packets are finished, issue a Memory Read request to the same location.
When the completion packet is received, it indicates that the link is quiesced and PM D3
request can be issued.

Concurrent MSI-X Capability and MSI Capability


Description
CPM5 cannot be configured at compile time with both MSI-X internal capability and MSI
capability enabled.

Workaround

1. For XDMA and AXI4 Bridge modes, the MSI-X internal capability is used; therefore, no
workaround is available. The choice to enable either the MSI-X or MSI capability must be made
when configuring the CPM5 IP.
2. This limitation does not apply to QDMA mode because MSI interrupts are not supported.

Root Port Configuration for Both CPM5 PCIe Controllers


Description
CPM5 PCIe Controller0 and Controller1 cannot both be enabled in the AXI Bridge functional
mode with the port type set to Root Port, because the processor subsystem, including the NoC,
cannot implement two independent Root Port datapaths. All other combinations are possible.

Workaround
For dual controller use cases where both CPM5 PCIe controllers are needed as Root Ports,
at least one of them must be used in a non-DMA mode. For example, CPM5 PCIe Controller0
can be enabled in the AXI Bridge functional mode with the Root Port type; the Processing
System (PS) via the NoC can enable the datapath for it. CPM5 PCIe Controller1 cannot be
enabled in the same configuration as CPM5 PCIe Controller0. However, the AXI Bridge
functional mode with Root Port type for CPM5 PCIe Controller1 can be realized by configuring
it in PCIE mode with the Root Port type and implementing the AXI Bridge functionality in the
PL logic. For the latter, the existing soft QDMA IP in AXI Bridge mode can be used.

CPM5 Interrupt Usage Limitation


CPM5 to PS Interrupt sources are shared across PS_CORR, PS_UNCORR, and PS_MISC Interrupt
lines. There are three interrupt lines that go from CPM to PS (PS_CORR, PS_UNCORR, and
PS_MISC). The status and control registers for these interrupts can be found in the CPM5_SLCR
(0xfcdd0000) register block. Each interrupt source in these registers is replicated across all three
interrupt lines. If one of the interrupt sources in these registers is unmasked for more than one of
these interrupt lines, spurious interrupts are received on both.
For CPM5 QDMA and Bridge modes, the PS_UNCORR interrupt is used by the PLM to reset the
QDMA and bridge logic on PCIe link_down events through the PCIe0_err and PCIe1_err sources.
User applications should not use or service the PS_UNCORR interrupt or PCIe link_down events.
Doing so results in race conditions between the PLM and user application interrupt service routines. If
the PCIe0_err or PCIe1_err sources are used for PS_CORR and PS_MISC by the user
application, the application receives spurious interrupts for PCIe link_down events. User applications
should not service interrupts for PCIe link_down events. This interrupt is serviced by the PLM for CPM5
QDMA and Bridge modes.

Additional Resources and Legal Notices


Finding Additional Documentation

Technical Information Portal


The AMD Technical Information Portal is an online tool that provides robust search and navigation for
documentation using your web browser. To access the Technical Information Portal, go to
https://docs.amd.com.

Documentation Navigator
Documentation Navigator (DocNav) is an installed tool that provides access to AMD Adaptive
Computing documents, videos, and support resources, which you can filter and search to find
information. To open DocNav:

From the AMD Vivado™ IDE, select Help > Documentation and Tutorials.
On Windows, click the Start button and select Xilinx Design Tools > DocNav.
At the Linux command prompt, enter docnav.

✎ Note: For more information on DocNav, refer to the Documentation Navigator User Guide
(UG968).

Design Hubs
AMD Design Hubs provide links to documentation organized by design tasks and other topics, which
you can use to learn key concepts and address frequently asked questions. To access the Design
Hubs:

In DocNav, click the Design Hubs View tab.


Go to the Design Hubs web page.

Support Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Support.

References
These documents provide supplemental material useful with this guide:

1. Control, Interface and Processing System LogiCORE IP Product Guide (PG352)


2. Versal Adaptive SoC DMA and Bridge Subsystem for PCI Express Product Guide (PG344)
3. Versal Adaptive SoC CPM Mode for PCI Express Product Guide (PG346)
4. Versal Adaptive SoC Integrated Block for PCI Express LogiCORE IP Product Guide (PG343)

5. Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller
LogiCORE IP Product Guide (PG313)
6. SmartConnect LogiCORE IP Product Guide (PG247)
7. QDMA Subsystem for PCI Express Product Guide (PG302)
8. DMA/Bridge Subsystem for PCI Express Product Guide (PG195)
9. AXI Bridge for PCI Express Gen3 Subsystem Product Guide (PG194)
10. AXI Interconnect LogiCORE IP Product Guide (PG059)
11. Versal Adaptive SoC Register Reference (AM012)
12. PCI-SIG Specifications (https://www.pcisig.com/specifications)
13. AMBA AXI4-Stream Protocol Specification (ARM IHI 0051A)
14. Vivado Design Suite User Guide: Designing with IP (UG896)
15. Vivado Design Suite User Guide: Logic Simulation (UG900)
16. Vivado Design Suite User Guide: Programming and Debugging (UG908)
17. Vivado Design Suite User Guide: Getting Started (UG910)
18. Vivado Design Suite User Guide: Designing IP Subsystems using IP Integrator (UG994)
19. Power Design Manager User Guide (UG1556)
20. Versal Adaptive SoC System Software Developers Guide (UG1304)
21. Vivado Design Suite User Guide: Dynamic Function eXchange (UG909)
22. Versal Premium Series Data Sheet: DC and AC Switching Characteristics (DS959)
23. Versal Adaptive SoC Packaging and Pinouts Architecture Manual (AM013)
24. Versal Adaptive SoC Technical Reference Manual (AM011)

Revision History
The following table shows the revision history for this document.

Section Revision Summary

11/22/2024 Version 3.4

Resets Updated section.

Segmented Configuration Added section.

Using the Provided Software and Drivers Added a note.

C2H Stream Modes Updated section.

Enable the Tandem Configuration Solution Updated section.

Completion Engine Added a note.

QDMA Performance Optimization Updated section.

Reset in a Root Port Mode Added new section.

Deliver Programming Images to Silicon Updated section.

AXI to PCIe BARs Updated section.

Customizable Example Design (CED) Updated section.

NoC Ports Updated section.

QDMA Descriptor Bypass Input Ports Added figures.

QDMA Descriptor Bypass Output Ports Added figures.

Limitations Updated section.

05/30/2024 Version 3.4

Terminology Updated terminology.

Supported Devices Updated Tandem Configuration Supported Devices table.

Enable the Tandem Configuration Solution Updated figures.

Design Version Compatibility Checks Added new section.

Tandem PCIe and DFX Configurable Example Design Updated section.

Function Map Table Updated for FMAP programming in mailbox IP.

Limitations Updated.

MSIX Interrupt Options Added new section.

11/20/2023 Version 3.4

CPM4_QDMA_Gen4x8_MM_ST_Design Added new section.

Example Design Registers Added new section.

CED Generation Steps Added new section.

CPM4_QDMA_Gen4x8_MM_ST_Performance_Design Added new section.

Versal_CPM_QDMA_EP_Simulation_Design Added new section.

Versal_CPM_Bridge RP_Design Added new section.

CPM5_QDMA_Gen4x8_MM_ST_Design Added new section.

CPM5_QDMA_Gen5x8_MM_Performance_Design Added new section.

CPM5_QDMA_Gen4x8_ST_Performance_Design Added new section.

CPM5_QDMA_Dual_Gen4x8_MM_ST_Design Added new section.

Versal_CPM_QDMA_EP_Simulation_Design Added new section.

Versal_CPM_Bridge RP_Design Added new section.

CPM5_QDMA_Gen5x8_ST_Performance_Design Added new section.

CPM5_QDMA_Dual_Gen5x8_ST_Performance_Design Added new section.

Interrupt Request (IRQ) Routing and Programming for CPM4 Added new section.

Interrupt Request (IRQ) Routing and Programming for CPM5 Added new section.

Supported Devices Added new section.

Tandem + DFX Added new section.

QDMA Performance Optimization Updated.

Enable the Tandem Configuration Solution Updated.

AXI Bridge Subsystem Added new figure.

AXI Bridge Subsystem Added new figures.

05/16/2023 Version 3.3

General updates Entire document.

Data Bandwidth and Performance Tuning Updated.

Enable the Tandem Configuration Solution Added a note.

Tandem PCIe and DFX Configurable Example Design Added new section.

Known Issues and Limitations Updated.

Simulation Added new section.

Customizable Example Design (CED) Added new chapter.

QDMA Performance Optimization Added QDMA Performance Registers table.

User Interrupts Updated User Interrupts Port Descriptions table.

11/02/2022 Version 3.3

General updates Entire document.

Clocking Updated clock frequency.

Tandem Configuration Updated for clarification.

Master Bridge Updated for clarification.

Function Map Table Updated description.

Context Programming Updated context programming for CPM4.

Context Programming Updated context programming for CPM5.

06/15/2022 Version 3.0

General updates Entire document.

04/29/2022 Version 3.0

General updates Updated for Versal Premium adaptive SoC support.

12/17/2021 Version 3.0

General Update Updated for the CIPS IP v3.0.

Limitations Added known issues for the release.

QDMA Features and XDMA Features Added clarifying details regarding AXI4-Stream
interface data rate support.

12/04/2020 Version 2.1

Initial release. N/A

Please Read: Important Legal Notices


The information presented in this document is for informational purposes only and may contain
technical inaccuracies, omissions, and typographical errors. The information contained herein is
subject to change and may be rendered inaccurate for many reasons, including but not limited to
product and roadmap changes, component and motherboard version changes, new model and/or
product releases, product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that
cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise
correct or revise this information. However, AMD reserves the right to revise this information and to
make changes from time to time to the content hereof without obligation of AMD to notify any person
of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO
REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND
ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT

MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY
PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY
RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING
FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AUTOMOTIVE APPLICATIONS DISCLAIMER


AUTOMOTIVE PRODUCTS (IDENTIFIED AS "XA" IN THE PART NUMBER) ARE NOT
WARRANTED FOR USE IN THE DEPLOYMENT OF AIRBAGS OR FOR USE IN APPLICATIONS
THAT AFFECT CONTROL OF A VEHICLE ("SAFETY APPLICATION") UNLESS THERE IS A
SAFETY CONCEPT OR REDUNDANCY FEATURE CONSISTENT WITH THE ISO 26262
AUTOMOTIVE SAFETY STANDARD ("SAFETY DESIGN"). CUSTOMER SHALL, PRIOR TO USING
OR DISTRIBUTING ANY SYSTEMS THAT INCORPORATE PRODUCTS, THOROUGHLY TEST
SUCH SYSTEMS FOR SAFETY PURPOSES. USE OF PRODUCTS IN A SAFETY APPLICATION
WITHOUT A SAFETY DESIGN IS FULLY AT THE RISK OF CUSTOMER, SUBJECT ONLY TO
APPLICABLE LAWS AND REGULATIONS GOVERNING LIMITATIONS ON PRODUCT LIABILITY.

Copyright
© Copyright 2020-2024 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, UltraScale,
UltraScale+, Versal, Vivado, Zynq, and combinations thereof are trademarks of Advanced Micro
Devices, Inc. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and used under license. AMBA,
AMBA Designer, Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are
trademarks of Arm Limited in the US and/or elsewhere. Other product names used in this publication
are for identification purposes only and may be trademarks of their respective companies.
