Kintex-7 FPGA Base Targeted Reference Design User Guide
Revision History
The following table shows the revision history for this document.
Date | Version | Revision
08/03/12 | 1.2 | Updated last paragraph of Hardware Test Setup Requirements. Added Vivado tools to Rebuilding the Base TRD. Added step 6 to Generating the MIG IP Core through CORE Generator Tool. Added Implementing the Design Using the Vivado Tools. Added Vivado tools to Configuration Requirements.
Introduction
The Kintex™-7 Base Targeted Reference Design (TRD) delivers all the basic components of
a targeted design platform for high performance in a single package. Targeted Design
Platforms from Xilinx provide customers with simple, smart design platforms for the
creation of FPGA-based solutions in a wide variety of industries.
This user guide details a TRD developed for high performance on a Kintex-7 FPGA. The
aim is to accelerate the design cycle and enable FPGA designers to spend less time
developing the infrastructure of an application and more time creating a unique value-add
design. The primary components of the Kintex-7 Base TRD are the Kintex-7 FPGA
integrated Endpoint block for PCI Express®, Northwest Logic Packet DMA, Memory
Interface Solutions for DDR3, and AXI Interconnect IP block.
Figure 1-1: Kintex-7 FPGA Base TRD Block Diagram (software GUI and driver on the host; Multi-Channel DMA for PCIe, AXI Interconnect, MIG DDR3 Memory Controller at 1,600 Mb/s, Packetized Virtual FIFO controllers, and per-channel generator, checker, and loopback blocks in hardware)
Note: The arrows in Figure 1-1 indicate AXI interface directions from master to slave. They do not
indicate data flow directions.
• AXI Interconnect IP core with the Memory Controller supports multiple ports on
the memory.
• The Packetized Virtual FIFO controller controls the addressing of the DDR3
memory for each port, allowing the DDR3 memory to be used as Virtual Packet
FIFO.
• Software driver for a 32-bit Linux platform
• Configures the hardware design parameters
• Generates and consumes traffic
• Provides a Graphical User Interface (GUI) to report status and performance
statistics
The 7 Series FPGAs Integrated Block for PCI Express core and the Packet DMA are
responsible for data transfers from the host system to the Endpoint card (S2C) and
Endpoint card to host system (C2S). Data to and from the host is stored in a Virtual FIFO
built around the DDR3 memory. This Multiport Virtual FIFO abstraction layer around the
DDR3 memory allows the traffic to be moved efficiently without the need to manage
addressing and arbitration on the memory interface. It also provides a larger depth when
compared to storage implemented using Block RAMs.
The Integrated Block for PCI Express core, Packet DMA, and Multiport Virtual FIFO can be
considered as the base system. The base system can bridge the host system to any user
application running on the other end. The Raw Data Packet module is a dummy
application which generates and consumes packets. It can be replaced by any user-specific
protocol like Aurora or XAUI.
The software driver runs on the host system. It generates raw data traffic for transmit
operations in the S2C direction. It also consumes the data looped back or generated at the
application end in the C2S direction.
The modular architecture of the Base TRD hardware and software components simplifies
reuse and customization of the architecture to specific user requirements.
Getting Started
This chapter is a quick start guide enabling the user to test the Kintex-7 FPGA Base
Targeted Reference Design (TRD) in hardware with the software driver provided and also
simulate it. It provides step-by-step instructions for testing the design in hardware.
Note: The screen captures in this document are conceptual representations of their subjects and
provide general information only.
Requirements
This section lists the prerequisites for hardware testing and simulation of the Base TRD.
Simulation Requirements
The tools required to simulate the Base TRD are:
• ISE Design Suite, Logic Edition
• ModelSim simulation software, v6.6d or later
Figure 2-1 and Figure 2-2: KC705 Board Switch and Jumper Settings (SW13, J29, J30, and J32)
2. Connect one of the spare 4-pin connectors from the PC’s 12V ATX power supply to J49
on the KC705 board using a 4-pin to 6-pin PCIe adapter cable. Toggle the KC705 board
power switch SW15 to the ON position. Figure 2-3 shows the 12V power supply
connection and power switch SW15.
Figure 2-3: 12V ATX Power Supply 4-pin Connector Plugged Into the J49 6-pin Connector, and Power Switch SW15
3. Confirm the connectors are latched tight and power on the PC.
Note: If the user wishes to boot Linux from the Fedora 16 Live DVD, place the DVD in the PC’s
CD-ROM drive as soon as the PC system is powered on.
4. Check the status of the design on the KC705 board LEDs. The Base TRD provides
status on the GPIO LEDs on the front side of the KC705 board near the upper right
edge (Figure 2-4). When the PC is powered on and the Base TRD has successfully
configured on the FPGA, the LED status (right to left) should indicate:
• LED 0 - ON (the PCIe link is up)
• LED 1 - FLASHING (the PCIe user clock is present)
• LED 2 - ON (lane width is what is expected, else LED 2 flashes—expected lane
width is 4 for a x4 design and 8 for a x8 design)
• LED 3 - ON (Memory calibration is done)
• LED 4 to LED 7 - Not connected
Note: The BIOS boot order settings might have to be changed to make sure that the CD-ROM
is the first drive in the boot order. To set the boot order, enter the BIOS menu by pressing the DEL
or F2 key when the system is powered on. Set the boot order and save the changes.
The DEL or F2 key is used by most PC systems to enter the BIOS setup. Some PCs might have
a different way to enter the BIOS setup.
While booting from the CD-ROM drive, the PC displays the images shown in Figure 2-5.
Driver Installation
To set up and run the TRD demonstration, the software driver should be installed on the
PC system.
Installation of the software driver involves:
• Building the kernel objects and the GUI.
• Inserting the driver modules into the kernel.
After the driver modules are loaded, the application GUI can be invoked. The user can set
parameters through the GUI and run the TRD.
When the user is done running the TRD, the application GUI can be closed and the drivers
can be removed.
A script is provided to execute these actions. To run this script:
1. Double-click k7_trd_lin_quickstart in the k7_pcie_dma_ddr3 folder
(Figure 2-7).
2. When the window prompt shown in Figure 2-8 appears, click Run in Terminal.
The application GUI is invoked. Proceed to Using the Application GUI to set design
parameters and run the Base TRD.
In case issues are encountered or if the user wants to understand driver details, the user
can run the individual steps detailed in Appendix D, Compiling Linux Drivers.
Figure 2-9: Test Setup and Payload Statistics Screen – Raw Data TX>RX Loopback
Figure 2-10: Test Setup and Payload Statistics Screen – Raw Data TX Only
Figure 2-11: Test Setup and Payload Statistics Screen – Raw Data RX Only
Select Enable TX Checker and Enable RX Generator and click Start Test to
enable both the data checker and the data generator. Packets are generated and
checked in both directions. The GUI plots the number of bytes transmitted and
received by the Packet DMA. Click Stop Test to stop packet generation. The screen
in Figure 2-12 shows the data throughput obtained from the S2C and C2S DMA
engines for the raw data Path0 with Enable TX Checker and Enable RX
Generator selected.
Figure 2-12: Test Setup and Payload Statistics Screen – Raw Data TX and RX
System Status
Click the System Status tab to view the system status screen (see Figure 2-13). This
screen shows the throughput numbers reported by the DMA engines for raw data
Path0 and the performance monitor on the transaction layer of the Kintex-7 FPGA. For
more details on the System Status window, refer to Figure 3-10.
Transaction Statistics
Click the PCIe Statistics tab (Figure 2-14) to view the PCIe transaction statistics
screen. This screen plots the data bus utilization statistics on the AXI4-Stream
interface. After the base TRD has run successfully, close the application GUI. Wait for
the drivers to be removed, and then proceed to Shutting Down the System, page 25.
The designs can also be re-implemented using the ISE or Vivado™ tools. Before running
any command line scripts, refer to the “Platform-Specific Installation Instructions” section
in UG798, Xilinx Design Tools: Installation and Licensing Guide [Ref 2] to learn how to set the
appropriate environment variables for the operating system. All scripts mentioned in this
user guide assume the XILINX environment variables have been set.
Note: The development machine does not have to be the hardware test machine with the PCIe slots
used to run the Base TRD.
Copy the k7_pcie_dma_ddr3_base files to the PC with the ISE or Vivado tools installed.
The LogiCORE™ IP blocks required for the Base TRD are shipped as a part of the package.
These cores and netlists are located in the k7_pcie_dma_ddr3_base/design/ip_cores directory:
• pcie
• fifo
• axi_ic
The MIG IP core cannot be delivered as a part of the Base TRD source, because customers
have to accept a license agreement for the Micron simulation models. These models are
used when simulating the Base TRD. Users must generate the MIG IP core using the ISE
CORE Generator tool before trying to simulate or implement the Base TRD with the ISE
tools.
For users of the Vivado tools, the IP Catalog project files are in the
k7_pcie_dma_ddr3_base/design/ip_catalog directory, and the IP cores will be
generated automatically when the synthesis step is initiated.
button and then OK for the pop-up window that appears afterward. Then click Next to
go on.
Figure 2-15: CORE Generator Tool GUI to Generate the MIG IP Core
Note: The version you see in Figure 2-15 might not be the version on your screen.
7. Click Next until the Micron Tech Inc Simulation Model License Agreement page appears.
Select Accept and click Next. This selection generates the memory models required for
simulation.
8. On the following page, click Next. Then click Generate to create the MIG IP core.
9. Close the Readme Notes window and then the CORE Generator tool GUI.
Additionally, a golden set of XCO files is also provided under the
k7_pcie_dma_ddr3_base/design/ip_cores/reference directory so that the cores
can be regenerated, if desired.
To regenerate the core, copy mig.xco and mig.prj from the design/ip_cores/reference directory.
3. At the command line of a terminal window (Linux) or ISE Design Suite Command
Prompt (Windows), use one of these commands to invoke the ISE software tools and
produce a BIT file and an MCS file in the results folder for downloading to the KC705
board:
$ source implement.sh x4 gen2 (for Linux)
$ source implement.sh x8 gen1 (for Linux)
Configuration Requirements
1. Check the KC705 board switch and jumper settings as shown in Table 2-1 and
Figure 2-1. Connect the micro USB cable and use the wall power adapter to provide
12V power to the 6-pin connector as shown in Figure 2-16.
If the design has been rebuilt according to the instructions in Rebuilding the Base TRD,
page 26, navigate to the k7_pcie_dma_ddr3_base/design/implement directory.
The BIT and MCS files generated during implementation and the scripts to program
the KC705 board are located in the results directory.
Navigate to the results directory and run the FPGA programming script at the
command prompt to configure the KC705 board with the design built in the
implement folder.
For the designs rebuilt using the PlanAhead design tool, the MCS files and the FPGA
programming scripts are available at
k7_pcie_dma_ddr3_base/design/implement/planahead_flow_x4gen2 and
k7_pcie_dma_ddr3_base/design/implement/planahead_flow_x8gen1.
For the designs rebuilt using the Vivado design tools, the MCS files and the FPGA
programming scripts are available at
k7_pcie_dma_ddr3_base/design/implement/vivado_flow_x4gen2 and
k7_pcie_dma_ddr3_base/design/implement/vivado_flow_x8gen1.
Simulation
The out-of-box simulation environment consists of the design under test (DUT) connected
to the Kintex-7 FPGA Root Port Model for PCI Express. This simulation environment
demonstrates the basic functionality of the Base TRD through various test cases. The
out-of-box simulation environment covers these traffic flows:
• Raw Data Transmit: Raw data traffic from the Root Port Model through the Endpoint
PCIe, Packet DMA, and DDR3 memory to the Loopback module
• Raw Data Receive: Raw data traffic from the Loopback module through the DDR3
memory, Packet DMA, and Endpoint PCIe to the Root Port Model
The Root Port Model for PCI Express is a limited test bench environment that provides a
test program interface. The purpose of the Root Port Model is to provide a source
mechanism for generating downstream PCI Express traffic to simulate the DUT and a
destination mechanism for receiving upstream PCI Express traffic from the DUT in a
simulation environment.
The out-of-box simulation environment (see Figure 2-18) consists of:
• Root Port Model for PCI Express connected to the DUT
• Transaction Layer Packet (TLP) generation tasks for various programming operations
• Test cases to generate different traffic scenarios
Figure 2-18: Out-of-Box Simulation Environment (the Kintex-7 FPGA Root Port Model for PCI Express, driven by TLP generation tasks and command-line or user-defined test parameters, connects over the PCIe x4 Gen2/x8 Gen1 link to the Kintex-7 FPGA PCIe_DMA_DDR3 design and a DDR3 memory model)
The simulation environment creates log files during simulation. These log files contain a
detailed record of every TLP that was received and transmitted by the Root Port Model.
User-Controlled Macros
The simulation environment allows the user to define macros that control DUT
configuration. These values can be changed in the user_defines.v file.
Test Selection
For the raw data path, fixed length packets of 1024 bytes are generated.
Table 2-3 describes the various tests provided by the out-of-box simulation environment.
The name of the test to be run can be specified on the command line while invoking
relevant simulators in the provided scripts. By default, the simulation script file specifies
the basic test to be run using this syntax:
+TESTNAME=basic_test
The test selection can be changed by specifying a different test case as specified in
Table 2-3.
Functional Description
This chapter describes the hardware design and software driver components. It also
describes how the data and control information flow through the various connected IPs.
Hardware Architecture
Figure 3-1 provides a detailed block level overview of the TRD. The base system
components and the applications components enable data flow to/from the host memory
at high data rates.
Figure 3-1: Base TRD Detailed Block Diagram (the Integrated Endpoint Block for PCI Express on a x4 Gen2/x8 Gen1 link, the Packet DMA, the AXI Interconnect and DDR3 Memory Controller, per-channel Packetized VFIFO controllers with S2C and C2S FIFOs and address managers, and the Raw Packet Data generator, checker, and loopback blocks)
PCI Express
The Kintex-7 FPGA Integrated Block for PCI Express provides a wrapper around the
integrated block in the FPGA. The integrated block is compliant with the PCI Express v2.0
specification. It supports x1, x2, x4, x8 lane widths operating at 2.5 Gb/s (Gen1) or 5 Gb/s
(Gen2) line rate per direction. The wrapper combines the Kintex-7 FPGA Integrated Block
for PCI Express with transceivers, clocking, and reset logic to provide an industry standard
AXI4-Stream interface as the user interface.
For details on the Kintex-7 FPGA integrated Endpoint block for PCI Express, refer to
UG477, 7 Series FPGAs Integrated Block for PCI Express User Guide [Ref 4].
Note: Start of packet is derived from the values of the source valid, destination ready, and end of
packet signals. The first clock cycle on which source valid is asserted after end of packet has been
deasserted indicates the start of a new packet.
Four counters collect information on the transactions on the AXI4-Stream interface:
• TX Byte Count. This counter counts bytes transferred when the s_axis_tx_tvalid and
s_axis_tx_tready signals are asserted between the Packet DMA and the
Kintex-7 FPGA Integrated Block for PCI Express. This value indicates the raw
utilization of the PCIe transaction layer in the transmit direction, including overhead
such as headers and non-payload data such as register access.
• RX Byte Count. This counter counts bytes transferred when the m_axis_rx_tvalid and
m_axis_rx_tready signals are asserted between the Packet DMA and the
Kintex-7 FPGA Integrated Block for PCI Express. This value indicates the raw
utilization of the PCIe transaction layer in the receive direction, including overhead
such as headers and non-payload data such as register access.
• TX Payload Count. This counter counts all memory writes and completions in the
transmit direction from the Packet DMA to the host. This value indicates how much
traffic on the PCIe transaction layer is from data, which includes the DMA buffer
descriptor updates, completions for register reads, and the packet data moving from
the user application to the host.
• RX Payload Count. This counter counts all memory writes and completions in the
receive direction from the host to the DMA. This value indicates how much traffic on
the PCIe transaction layer is from data, which includes the host writing to internal
registers in the hardware design, completions for buffer description fetches, and the
packet data moving from the host to user application.
The actual packet payload by itself is not reported by the performance monitor. This value
can be read from the DMA register space. The method of taking performance snapshots is
similar to the Northwest Logic DMA performance monitor (refer to the Northwest Logic
DMA Back-End Core User Guide and Northwest Logic DMA AXI DMA Back-End Core User
Guide, available in the k7_pcie_dma_ddr3_base/design/ipcores/dma/doc
directory). The byte counts are truncated to a four-byte resolution, and the last two bits of
the register indicate the sampling period. The last two bits transition every second from 00
to 01 to 10 to 11. The software polls the performance register every second. If the
sampling bits are the same as the previous read, then the software needs to discard the
second read and try again. When the one-second timer expires, the new byte counts are
loaded into the registers, overwriting the previous values.
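As an illustration of this sampling scheme, the following user-space sketch polls one such register through a pointer to the mapped BAR0 region; the register offset is a placeholder, not a value from Appendix B.

#include <stdint.h>
#include <unistd.h>

/* Placeholder offset for one performance monitor byte-count register. */
#define PCIE_TX_BYTE_COUNT_OFFSET 0x9008u

/* Poll the register once per second. Bits [1:0] hold the sample count,
 * which the hardware advances every second (00 -> 01 -> 10 -> 11); if it
 * matches the previous value, the snapshot is stale and is re-read. The
 * remaining bits hold the byte count at four-byte resolution. */
uint32_t poll_tx_byte_count(volatile uint32_t *bar0)
{
    uint32_t prev = bar0[PCIE_TX_BYTE_COUNT_OFFSET / 4] & 0x3u;

    for (;;) {
        uint32_t val = bar0[PCIE_TX_BYTE_COUNT_OFFSET / 4];
        if ((val & 0x3u) != prev)
            return val & ~0x3u;   /* byte count for the last one-second window */
        sleep(1);                 /* same sample count: discard and try again */
    }
}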
Figure 3-2: S2C Buffer Descriptor and C2S Buffer Descriptor Layout (each descriptor carries status and control flags, a ByteCount[19:0] field, and a NextDescPtr[31:5] field with the low five address bits fixed at zero)
The NextDescPtr field points to the next descriptor in the linked list. All descriptors are 32-byte aligned.
Packet Transmission
The software driver prepares a ring of descriptors in system memory and writes the start
and end addresses of the ring to the relevant S2C channel registers of the DMA. When
enabled, the DMA fetches the descriptor followed by the data buffer it points to. Data is
fetched from the host memory and made available to the user application through the
DMA S2C streaming interface.
The packet interface signals (for example, user control and the end of packet) are built from
the control fields in the descriptor. The information present in the user control field is made
available during the start of packet. The reference design does not use the user control
field.
To indicate data fetch completion corresponding to a particular descriptor, the DMA
engine updates the first doubleword of the descriptor by setting the complete bit of the
Status and Byte Count field to 1. The software driver examines the complete bit to free
the buffer memory and reuse it for later transmit operations.
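As an illustration of this completion handshake, the sketch below walks a transmit descriptor ring and reclaims buffers whose complete bit is set. The structure layout and the bit position are placeholders; the authoritative layout is in the Northwest Logic DMA documentation referenced in this guide.

#include <stdint.h>
#include <stddef.h>

/* Placeholder 32-byte descriptor layout, for illustration only. */
struct s2c_desc {
    uint32_t status_bytecount;   /* status flags plus ByteCount[19:0] */
    uint32_t user_control[2];
    uint32_t next_desc_ptr;      /* NextDescPtr[31:5], 32-byte aligned */
    uint32_t buffer_addr_lo;
    uint32_t buffer_addr_hi;
    uint32_t reserved[2];
};

#define DESC_COMPLETE (1u << 31) /* assumed position of the complete bit */

/* Free transmit buffers whose descriptors the DMA engine has completed. */
static void reclaim_tx(struct s2c_desc *ring, size_t n,
                       void (*free_buffer)(size_t index))
{
    for (size_t i = 0; i < n; i++) {
        if (ring[i].status_bytecount & DESC_COMPLETE) {
            free_buffer(i);                           /* buffer can be reused */
            ring[i].status_bytecount &= ~DESC_COMPLETE;
        }
    }
}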
Figure 3-3 shows the system to card data transfer.
Note: Start of packet is derived from the values of the source valid (s2c_tvalid), destination ready
(s2c_tready), and end of packet (s2c_tlast) signals. The first clock cycle on which source valid is
asserted after end of packet has been deasserted indicates the start of a new frame.
Figure 3-3: System to Card Data Transfer
Packet Reception
The software driver prepares a ring of descriptors with each descriptor pointing to an
empty buffer. It then programs the start and end addresses of the ring in the relevant C2S
DMA channel registers. The DMA reads the descriptors and waits for the user application
to provide data on the C2S streaming interface. When the user application provides data,
the DMA writes the data into one or more empty data buffers pointed to by the prefetched
descriptors. When a packet fragment is written to host memory, the DMA updates the
status fields of the descriptor. The c2s_tuser signal on the C2S interface is valid only during
c2s_tlast. Hence, when updating the EOP field, the DMA engine also needs to update the
User Status fields of the descriptor. In all other cases, the DMA updates only the Status and
Byte Count field. The completed bit in the updated status field indicates to the software
driver that data was received from the user application. When the software driver
processes the data, it frees the buffer and reuses it for later receive operations.
Figure 3-4 shows the card to system data transfer.
Figure 3-4: Card to System Data Transfer
Note: Start of packet is derived from the values of the source valid (c2s_tvalid), destination ready
(c2s_tready), and end of packet (c2s_tlast) signals. The first clock cycle on which source valid is
asserted after end of packet has been deasserted indicates the start of a new frame.
The software periodically updates the end address register on the Transmit and Receive
DMA channels to ensure uninterrupted data flow to and from the DMA.
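A minimal sketch of that periodic update, assuming a hypothetical register offset for the software descriptor pointer (the actual offset is given in the DMA register documentation):

#include <stdint.h>

/* Hypothetical offset of the end address (software descriptor pointer)
 * register within a DMA channel's register block. */
#define REG_SW_DESC_PTR 0x10u

/* Advance the channel's end address so the engine can keep fetching
 * descriptors up to, but not including, the new tail descriptor. */
static void update_end_address(volatile uint32_t *chan_regs,
                               uint32_t new_tail_desc_addr)
{
    chan_regs[REG_SW_DESC_PTR / 4] = new_tail_desc_addr;
}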
Figure 3-5: Multiport Virtual Packet FIFO (four Packetized VFIFO controllers connect to 64-bit, 200 MHz slave interfaces of the AXI4 Interconnect; the single master interface drives the AXI4 slave port of the Memory Interface Generator DDR3 controller and the DDR3 I/O)
The master interface (MI) on the Interconnect drives the single-port Memory Controller interface,
which is a slave.
Figure 3-6: Packetized Virtual FIFO Controller (control-word insert and strip logic on the AXI-Stream interfaces, asynchronous preview FIFOs, and a Virtual FIFO controller built from the Ingress FIFO, Egress FIFO, and address manager)
The DDR3 memory is required to store packets (with sizes ranging from 64 B to 32 KB). Because
the interface width on DDR3 is 64 bits, there are no extra bits to store control information.
Therefore, the reference design needs a way to find the start and end of a packet, and the
valid data bytes, when data is read out of the DDR3. Inserting a control word into a data
packet before it is written into the DDR3, and using that control word when data is read out
of the DDR3, is a simple scheme for determining packet delineation. The Packetizer
logic does not have a large performance impact and avoids the use of a store-and-forward
scheme.
The input and output interfaces on Packetized VFIFO controller are AXI-Stream
compliant. The DMA engines and the user application interface with this module. On the
write port, the packet length should be available with the first data beat to enable control
word insertion. This control word is discarded before the packet is made available on the
read port.
The Virtual FIFO controller comprises three modules: the Ingress FIFO, the Egress FIFO, and the
address manager.
The address manager implements the addressing scheme to manage the DDR3 as a FIFO. Users
can set the DDR3 start and end address boundaries to be used as the FIFO, and can also set the
burst size to be used on the write and read AXI-MM interfaces. These values are guidance for
the maximum burst size: for example, if the burst size is programmed as 256 for both the read
and write interfaces, an effort is made to operate at this burst size, but a sub-optimal burst
(less than 256) can be issued based on a timeout in lean traffic scenarios.
(Refer to Memory Controller Registers in Appendix B to set the start and end addresses
and burst size.)
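A minimal sketch of that programming step, with placeholder per-port register offsets (the real offsets are listed under Memory Controller Registers in Appendix B):

#include <stdint.h>

/* Placeholder per-port register offsets; see Appendix B for the real map. */
#define VFIFO_START_ADDR  0x00u
#define VFIFO_END_ADDR    0x04u
#define VFIFO_BURST_SIZE  0x08u

/* Assign one virtual FIFO port a region of DDR3 and set the maximum burst
 * size for its AXI-MM write and read interfaces. The burst size is only a
 * ceiling: shorter bursts can be issued on timeout under lean traffic. */
static void program_vfifo_port(volatile uint32_t *port_regs,
                               uint32_t ddr3_start, uint32_t ddr3_end,
                               uint32_t max_burst)
{
    port_regs[VFIFO_START_ADDR / 4] = ddr3_start;
    port_regs[VFIFO_END_ADDR   / 4] = ddr3_end;
    port_regs[VFIFO_BURST_SIZE / 4] = max_burst;   /* e.g., 256 */
}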
The Ingress and Egress FIFO blocks communicate to the AXI Interconnect block based on
the control signals from the address manager.
The Ingress FIFO block handles the write data and is responsible for driving the write
interface of AXI-MM. The asynchronous preview FIFO in this block helps with clock
domain crossing. It also allows storing up enough data to create a transaction of write
burst size and then sending it to the AXI Interconnect.
The Egress FIFO block handles the read data and is responsible for draining the read data
interface of AXI-MM after a read command is issued and when read data is available. The
asynchronous preview FIFO in this block helps with clock domain crossing. It also allows
storing up enough data to store a transaction of read burst size received from AXI
Interconnect.
The preview FIFOs in the Ingress FIFO block and Egress FIFO block are generated using
the LogiCORE FIFO Generator IP with a data width of 64-bit and depth of 1024. The FIFOs
internally use Block RAMs.
Table 3-3 shows the signals on the Multiport Virtual Packet FIFO. The read and write
interface signals and the user-specified register widths scale with the number of ports.
Application Components
This section describes the block that interfaces with the base components to support Raw
Packet Data flow. It is a simple application which can be replaced with any other
application protocol like XAUI or Aurora.
Note: The data uses a fixed pattern to enable data checking. The data could be any random data
otherwise.
Table 3-5 shows the ports on the Raw Packet Data module.
Table 3-5: Ports on Raw Packet Data Module

Port Name | Type | Description
clk | Input | 250 MHz clock
reset | Input | Synchronous reset
Read Interface
axi_str_tx_tdata | Input | Data available from VFIFO transmit
axi_str_tx_tkeep | Input | Number of bytes valid per data beat on axi_str_tx_tdata
axi_str_tx_tvalid | Input | Indicates data on axi_str_tx_tdata is valid
axi_str_tx_tlast | Input | Indicates the end of packet on axi_str_tx_tdata
axi_str_tx_tuser | Input | VFIFO passes the length of the packet being transmitted
Clocking
This section describes the clocking requirements for this Kintex-7 FPGA Base TRD.
Two differential clocks are needed in this TRD:
• 200 MHz clock and 800 MHz clock for the Memory Controller
• 100 MHz clock for PCI Express integrated Endpoint block
On the KC705 board used for this reference design, a 100 MHz differential clock from the
PCIe edge connector is passed to the PCIe wrapper. The 200 MHz differential clock for the
DDR3 Memory Controller comes from an oscillator on the KC705 board, and an 800 MHz
clock is generated inside the Memory Controller by adjusting the MMCM multipliers and
dividers.
Figure 3-7: Clocking Connections (clk_250 250 MHz clock domain for the DMA, Packetized Virtual FIFO controllers, and Raw Packet Data module; 200 MHz domain for the AXI Interconnect and Memory Controller)
Figure 3-7 shows the clocking connections. The wrapper for PCI Express generates a 250
MHz single-ended clock that goes to the DMA, Packetized Virtual FIFO Controller, and
Raw Packet Data modules. The DDR3 Memory Controller generates a single-ended 200
MHz clock for the Packetized Virtual FIFO Controller and AXI Interconnect, and an 800
MHz single/differential clock for various parts of the Memory Controller and the external
DDR3 device.
Resets
This section describes the reset requirements for Kintex-7 FPGA Base TRD.
Table 3-6 shows how the different blocks get reset depending on the events that can
happen. The primary reset for the Kintex-7 FPGA Base TRD is driven from the PERSTn pin
of the PCIe edge connector. When this asynchronous pin is active (Low), the
Kintex-7 FPGA Integrated Block for PCI Express, GT transceivers for PCIe and DDR3
Memory Controller IP are held in reset. When PERSTn is released, the initialization
sequences start on these blocks. The initialization sequence for each of these blocks takes a
long time, which is why they get the PERSTn pin directly. Each of these blocks has an
output that reflects the status of its initialization sequence. PCIe asserts user_lnk_up, and
the Memory Controller asserts init_calib_complete when the respective initialization is
complete. These status signals are combined to generate the user logic resets. Figure 3-8
shows the connections for the resets used in the design.
Figure 3-8: Reset Connections (perstn from the PCIe edge connector resets the PCI Express Endpoint wrapper and Memory Controller; user_lnk_up and init_calib_complete, combined with software resets written to the DMA reset registers, generate the per-channel axi_str_*_areset_n signals and the per-port wr_reset_n and rd_reset_n signals of the Packetized VFIFO controller)
Software Architecture
Figure 3-9 shows the software components of the Kintex-7 FPGA Base TRD. The software
comprises several Linux kernel-space drivers and a user-space application.
Figure 3-9: Software Architecture (the xpmon control GUI runs in user space; the kernel-space driver comprises an application layer and a DMA layer, with driver entry points for open, close, ioctl, and read)
The kernel-space driver performs these functions:
• Generation and transfer of raw data streams from host memory to hardware
(Transmit), and transfer of the looped-back or generated streaming data back to the host
memory (Receive)
The user-space application (xpmon) is a graphical user interface (GUI) used to:
• Manage the driver and device, for example, setting configuration controls for packet
generation and display options
• Display performance statistics reported by the PCIe performance monitor and the DMA
performance monitor
The software developed:
• Can generate adequate data to enable the hardware design to operate at throughput
rates of up to 10 Gb/s end to end.
• Showcases the ability of the multi-channel DMA to transfer large amounts of data.
• Provides a user interface that is easy to use and intuitive.
• Is modular and allows for reuse in similar designs.
Kernel Components
Driver Entry Points
The driver has several entry points, some of which are described here. The system invokes
the driver entry function when a hardware match is detected after driver insertion (when
the PCIe device probed by the driver is found). After reading the device's configuration
space, various initialization actions are done. These are initialization of the DMA engine(s),
setting up of receive and transmit buffer descriptor rings, and, finally, initialization of
interrupts. The other driver entry points are when the GUI starts up and shuts down; when
a new performance test is started or stopped; and to convey periodic status information
and performance statistics results to the GUI.
On a Linux OS, the system invokes the probe() function when a hardware match is
detected. A device node is created for xdma (the node name is fixed and the major/minor
numbers are allocated by the system). The base DMA driver appears as a device table entry
in Linux.
DMA Operations
For each DMA channel, the driver sets up a buffer descriptor ring. At initialization, the
receive ring (associated with a C2S channel) is fully populated with buffers meant to store
incoming packets, and the full receive ring is submitted for DMA. On the other hand, the
transmit ring (associated with S2C channel) is empty. As packets arrive for transmission,
they are added to the buffer descriptor ring, and submitted for DMA.
Interrupt Operations
If interrupts are enabled (by setting the compile-time macro TH_BH_ISR), the interrupt
service routine (ISR) handles interrupts from the DMA engine and other errors from
hardware, if any. The driver sets up the DMA engine to interrupt after every N descriptors
that it processes. This value of N can be set by a compile-time macro. The ISR invokes the
functionality in the block handler routines pertaining to handling received data and
housekeeping of completed transmit and receive buffers.
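A simplified kernel-space sketch of that flow is shown below; the device structure and helper functions are hypothetical stand-ins, not the driver's actual code.

#include <linux/interrupt.h>

/* Hypothetical device context and helpers; the real driver defines its own. */
struct xdma_dev {
    void __iomem *bar0;
};

static void xdma_ack_interrupts(struct xdma_dev *xd)       { /* clear engine interrupt status */ }
static void xdma_process_completed_rx(struct xdma_dev *xd) { /* hand received packets up */ }
static void xdma_reclaim_completed_tx(struct xdma_dev *xd) { /* housekeeping of sent buffers */ }

/* Compiled in only when interrupts are enabled with TH_BH_ISR; the DMA
 * engine is programmed to interrupt after every N completed descriptors. */
#ifdef TH_BH_ISR
static irqreturn_t xdma_isr(int irq, void *dev_id)
{
    struct xdma_dev *xd = dev_id;

    xdma_ack_interrupts(xd);
    xdma_process_completed_rx(xd);
    xdma_reclaim_completed_tx(xd);
    return IRQ_HANDLED;
}
#endif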
Performance Monitor
The Performance Monitor is a handler that reads all the performance-related registers
(PCIe link level, DMA Engine level). Each of these is read periodically at an interval of one
second.
Control
The GUI allows the user to specify these items before starting a test:
• Packet size
• Enable Loopback or Enable Generator or Enable Checker
When the user starts a test, the GUI informs the DMA driver of the parameters of the test
(unidirectional or bidirectional, the fixed buffer size). The driver sets up the test
parameters and informs the Raw Data Packet Handler, which then starts setting up data
buffers for transmission, reception or both. Similarly, if the user were to abort a test, the
GUI informs the driver, which stops the packet generation mechanism. The test is aborted
by stopping the transmit side flow, and then allowing the receive side flow to drain.
Monitor
The driver always maintains information on the status of the hardware. The GUI
periodically invokes an ioctl() to read this status information.
• PCIe link status, device status
• DMA Engine status
• BDs and buffer information from drivers
• Interrupt status
The driver maintains a set of arrays to hold per-second sampling points of different kinds
of statistics, which are periodically collected by the performance monitor handler. The
arrays are handled in a circular fashion. The GUI periodically invokes an ioctl() to read
these statistics, and then displays them.
• PCIe link statistics provided by hardware
• DMA engine statistics provided by DMA hardware
• Graph display of all of the above
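A minimal user-space sketch of that periodic ioctl() polling; the /dev/xdma node follows the xdma device node mentioned earlier, while the ioctl command number and statistics structure are illustrative placeholders rather than the definitions shared by the driver and xpmon.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Placeholder statistics layout and ioctl command for illustration. */
struct xdma_stats {
    unsigned long long pcie_tx_mbps;   /* PCIe transmit utilization */
    unsigned long long pcie_rx_mbps;   /* PCIe receive utilization */
    unsigned long long dma_mbps[4];    /* per-engine DMA payload throughput */
};
#define XDMA_GET_STATS _IOR('x', 1, struct xdma_stats)

int main(void)
{
    struct xdma_stats stats;
    int fd = open("/dev/xdma", O_RDONLY);

    if (fd < 0) { perror("open /dev/xdma"); return 1; }
    for (int i = 0; i < 10; i++) {            /* sample for ten seconds */
        if (ioctl(fd, XDMA_GET_STATS, &stats) == 0)
            printf("PCIe TX %llu Mb/s, RX %llu Mb/s\n",
                   stats.pcie_tx_mbps, stats.pcie_rx_mbps);
        sleep(1);                              /* one-second sampling interval */
    }
    close(fd);
    return 0;
}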
Figure 3-10 shows a screen capture of the GUI with the System Status tab selected.
Figure 3-10: Application GUI with the System Status Tab Selected (numbered callouts 1 through 17 identify the GUI fields described below)
The GUI fields (indicated by the numbers in Figure 3-10) are explained here.
1. Stop Test: Test start/stop control for raw data Path0.
2. Start Test: Test start/stop control for raw data Path1.
3. Packet Size: Fixed packet size selection in bytes for the raw data path.
4. PCIe Statistics tab: Plots the PCIe transactions on the AXI4-Stream interface.
5. Payload Statistics tab: Shows the payload statistics graphs based on DMA
engine performance monitor.
6. Throughput (Gb/s): DMA payload throughput in gigabits per second for each
engine.
7. DMA Active Time (ns): The time in nanoseconds that the DMA engine has been
active in the last second.
8. DMA Wait Time (ns): The time in nanoseconds that the DMA was waiting for the
software to provide more descriptors.
9. BD Errors: Indicates a count of descriptors that caused a DMA error, as indicated by
the error status field in the descriptor update.
10. BD Short Errors: Indicates a short error in descriptors in the transmit direction
when the entire buffer specified by length in the descriptor could not be fetched. This
field is not applicable for the receive direction.
11. # SW BDs: Indicates the count of total descriptors set up in the descriptor ring.
12. # SW Buffers: Indicates the count of total data buffers associated with the ring.
13. Interrupts Enabled: Indicates the interrupt enable status for that DMA engine.
The driver enables interrupts on a DMA engine by writing to the DMA engine's
register space. To enable interrupts, the compile-time macro TH_BH_ISR needs to be
set.
14. PCIe Transmit (writes) (Gb/s): Reports the transmit (Endpoint card to host)
utilization as obtained from the PCIe performance monitor in hardware.
15. PCIe Receive (reads) (Gb/s): Reports the receive (host to Endpoint card)
utilization as obtained from the PCIe performance monitor in hardware.
16. PCIe Endpoint Status: Reports the status of various PCIe fields as reported in
the Endpoint's configuration space, including the host system's initial flow control
credits (the credits advertised by the host system after link training with the
Endpoint). A value of zero implies infinite flow control credits.
17. The text pane at the bottom shows informational messages, warnings, or errors.
• The software and hardware are each able to independently work on a set of buffer
descriptors in a supplier-consumer model
• The software is informed of packets being received and transmitted as it happens
• On the receive side, the software needs a way of knowing the size of the packet
The rest of this section describes how the driver uses the features provided by DMA to
achieve the above requirements. Refer to Scatter Gather Packet DMA, page 39 and the
Northwest Logic Packet DMA User Guide to get an overview of the DMA descriptors and
DMA register space [Ref 17].
Figure 3-11 and Figure 3-12: Receive Descriptor Ring States (the SW_Next, HW_Next, and HW_Completed pointers track the software and hardware positions in the ring)
This process continues as the DMA engine keeps adding received packets in the ring, and
the driver keeps consuming them. Because the descriptors are already arranged in a ring,
post-processing of descriptors is minimal and dynamic allocation of descriptors is not
required.
Performance Estimation
This chapter presents a theoretical estimation of performance on the PCI Express interface
and the Packetized Virtual FIFO. It also presents a method to measure performance.
Term | Description
MRD | Memory Read transaction
MWR | Memory Write transaction
CPLD | Completion with Data
C2S | Card to System
S2C | System to Card
Calculations are done considering unidirectional data traffic that is either transmit (data
transfer from System to Card) or receive (data transfer from Card to System).
Each transaction and its ACK travel on either the upstream (Card to System) or the downstream
(System to Card) PCIe link; only the traffic that shares a link with the data counts toward that
link's overhead.
The C2S DMA engine (which deals with data reception, i.e., writing data to system
memory) first does a buffer descriptor fetch. Using the buffer address in the descriptor, it
issues Memory Writes to the system. After the actual payload is transferred to the system,
it sends a Memory Write to update the buffer descriptor. Table 4-1 shows the overhead
incurred during data transfer in the C2S direction.
Table 4-1: PCI Express Performance Estimation with DMA in the C2S Direction

Transaction | Overhead | ACK Overhead | Comment
MRD for C2S Desc | 20/4096 = 0.625/128 | 8/4096 = 0.25/128 | One descriptor fetch from the C2S engine for 4 KB of data (TRN-TX); 20 bytes of TLP overhead and 8 bytes of DLLP overhead
CPLD for C2S Desc | (20+32)/4096 = 1.625/128 | 8/4096 = 0.25/128 | Descriptor reception by the C2S engine (TRN-RX). The CPLD header is 20 bytes, and the C2S Desc data is 32 bytes.
MWR for C2S buffer | 20/128 | 8/128 | MPS = 128B; buffer write from the C2S engine (TRN-TX)
MWR for C2S Desc update | (20+12)/4096 = 1/128 | 8/4096 = 0.25/128 | Descriptor update from the C2S engine (TRN-TX). The MWR header is 20 bytes, and the C2S Desc update data is 12 bytes.
For every 128 bytes of data sent from the card to the system, the overhead on the upstream
link is 21.875 bytes.
% Overhead = 21.875/(128 + 21.875) = 14.60%
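The 21.875-byte figure is the sum of the Table 4-1 contributions that travel upstream, namely the descriptor fetch request, the ACK DLLP for the received descriptor completion, the buffer writes, and the descriptor update:

\[
0.625 + 0.25 + 20 + 1 = 21.875 \;\text{bytes per 128 bytes}, \qquad
\frac{21.875}{128 + 21.875} \approx 14.6\%
\]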
The throughput per PCIe lane is 2.5 Gb/s, but because of 8B/10B encoding, the throughput
comes down to 2 Gb/s.
Maximum theoretical throughput per lane for
Receive = (100 – 14.60)/100 * 2 Gb/s = 1.70 Gb/s
Maximum theoretical throughput for a
x4 Gen2 or x8 Gen1 link for Receive = 13.6 Gb/s
The S2C DMA engine (which deals with data transmission, i.e., reading data from system
memory) first does a buffer descriptor fetch. Using the buffer address in the descriptor, it
issues Memory Read requests and receives data from system memory through
completions. After the actual payload is transferred from the system, it sends a Memory
Write to update the buffer descriptor. Table 4-2 shows the overhead incurred during data
transfer in the S2C direction.
Table 4-2: PCI Express Performance Estimation with DMA in the S2C Direction

Transaction | Overhead | ACK Overhead | Comment
MRD for S2C Desc | 20/4096 = 0.625/128 | 8/4096 = 0.25/128 | Descriptor fetch from the S2C engine (TRN-TX)
CPLD for S2C Desc | (20+32)/4096 = 1.625/128 | 8/4096 = 0.25/128 | Descriptor reception by the S2C engine (TRN-RX). The CPLD header is 20 bytes, and the S2C Desc data is 32 bytes.
MRD for S2C buffer | 20/128 | 8/128 | Buffer fetch from the S2C engine (TRN-TX). MRRS = 128B
CPLD for S2C buffer | 20/64 = 40/128 | 8/64 = 16/128 | Buffer reception by the S2C engine (TRN-RX). Because RCB = 64B, two completions are received for every 128-byte read request
MWR for S2C Desc update | (20+4)/4096 = 0.75/128 | 8/4096 = 0.25/128 | Descriptor update from the S2C engine (TRN-TX). The MWR header is 20 bytes, and the S2C Desc update data is 4 bytes.
For every 128 bytes of data sent from the system to the card, the overhead on the downstream
link is 50.125 bytes.
% Overhead = 50.125/(128 + 50.125) = 28.14%
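Likewise, the 50.125-byte figure is the sum of the Table 4-2 contributions that travel downstream, namely the two completions received by the S2C engine plus the ACK DLLPs for its descriptor fetch, buffer fetch, and descriptor update:

\[
1.625 + 40 + 0.25 + 8 + 0.25 = 50.125 \;\text{bytes per 128 bytes}, \qquad
\frac{50.125}{128 + 50.125} \approx 28.14\%
\]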
The throughput per PCIe lane is 2.5 Gb/s, but because of 8B/10B encoding, the
throughput comes down to 2 Gb/s.
Maximum theoretical throughput per lane for Transmit = (100 – 28.14)/100 * 2 =
1.43 Gb/s
Maximum theoretical throughput for a x4 Gen2 or x8 Gen1 link for Transmit =
11.44 Gb/s.
Because the TRD has two raw data paths, there are two C2S DMA engines and two S2C
DMA engines. Each C2S and S2C engine should theoretically be able to operate at
13.6 Gb/s and 11.44 Gb/s, respectively. If both data paths are enabled, the DMA splits the
available bandwidth between the two C2S engines and the two S2C engines.
The throughput numbers are theoretical and could decrease further due to other factors,
such as:
• With an increase in lane width, PCIe credits are consumed at a faster rate, which could
lead to throttling on the PCIe link reducing throughput.
• The transaction interface of PCIe is 64 bits wide. The data sent is not always 64-bit
aligned, and this could cause some reduction in throughput.
• Changes in MPS, MRRS, RCB, and buffer descriptor size also have significant impact
on the throughput. The MPS and MRRS values are negotiated between the host PC
and all the endpoints plugged into the host PC. The RCB value is specific to the host
PC.
• If bidirectional traffic is enabled, then overhead incurred further reduces throughput.
• Software overhead/latencies also contribute to reducing throughput.
Table 4-3: Projected Performance of Packetized Virtual FIFO with DDR3 Running at 800 MHz

Virtual FIFO | Throughput (Gb/s) | Comments
Total throughput | 102.4 × 0.8 = 81.92 | 80% efficiency
Total throughput | 102.4 × 0.9 = 92.16 | 90% efficiency
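The 102.4 Gb/s raw figure follows from the 64-bit DDR3 interface running at 1,600 Mb/s per pin:

\[
64 \;\text{bits} \times 1{,}600 \;\text{Mb/s} = 102.4 \;\text{Gb/s}
\]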
If one S2C and one C2S DMA engine are enabled, the throughput required on the DDR3
interface is
[13.6 Gb/s (S2C) + 11.44 Gb/s (C2S)] × 2 (writes and reads, in and out of the DDR3) = 50.08 Gb/s
Because the maximum theoretical throughput on the PCIe link, including the DMA
overhead, is less than what the Virtual FIFO can handle, the limiting components in this Base
TRD's system performance are the PCIe link and the DMA.
Measuring Performance
This section shows how performance is measured in the TRD.
It should be noted that PCI Express performance depends on factors like maximum
payload size, maximum read request size, and read completion boundary, which are
dependent on the systems used. With higher MPS values, performance improves as packet
size increases.
Hardware provides the registers listed in Table 4-4 for software to aid performance
measurement.
These registers are updated once every second by hardware. Software can read them
periodically at one second intervals to directly get the throughput.
The PCIe monitor registers can be read to understand PCIe transaction layer utilization.
The DMA registers provide throughput measurement for actual payload transferred.
These registers give a good estimate of the TRD performance.
Software-Only Modifications
This section describes modifications to the platform done directly in the software driver.
The same hardware design (BIT/MCS files) works. After any software modification, the
code needs to be recompiled. The Linux driver compilation procedure is detailed in
Appendix D, Compiling Linux Drivers.
Macro-Based Modifications
This section describes the modifications that can be realized by compiling the software
driver with various macro options, either in the Makefile or in the driver source code.
Hardware-Only Modifications
This section outlines the changes that require only hardware re-implementation.
Architectural Modifications
This section describes architecture level changes to the functionality of the platform. These
changes include adding or deleting IP with similar interfaces used in the framework.
Aurora IP Integration
The LogiCORE IP Aurora 8B/10B core implements the Aurora 8B/10B protocol using the
high-speed Kintex-7 FPGA GTX transceivers. The core is a scalable, lightweight link layer
protocol for high-speed serial communication. It is used to transfer data between two
devices using transceivers. It provides an AXI4-Stream compliant user interface.
A 4-lane Aurora design with 2-byte user interface data width presents a 64-bit
AXI4-Stream user interface, which matches the Raw Packet Data module's interface within
the framework. Hence, a customer can accelerate the task of creating a PCIe-to-Aurora
bridge design through these high-level steps:
1. Generate a four-lane (3.125 Gb/s line rate per lane), two-byte Aurora 8B/10B
LogiCORE IP from the CORE Generator tool.
2. Remove the Raw Packet Data block instance and insert the Aurora LogiCORE IP into the
framework (see Figure 5-1).
3. Add an MMCM block to generate a 156.25 MHz clock, or use an external clock source,
to drive a 156.25 MHz clock into the Aurora LogiCORE IP.
4. Simulate the design with the out-of-box simulation framework with appropriate
modifications to include the Aurora files.
5. Implement the design and run the design with Aurora in loopback mode with minimal
changes to the implementation flow.
Figure 5-1: Aurora IP Integrated into the Base TRD Framework (the Aurora LogiCORE IP and GTX transceiver, running at 156.25 MHz, replace the Raw Packet Data block on channel 0; the Multi-Channel DMA for PCIe, AXI Interconnect, MIG DDR3 memory at 1,600 Mb/s, and VFIFO controllers are unchanged, and channel 1 retains the Raw Packet Data generator, checker, and loopback)
Aurora IP does not support throttling in the receive direction, because the core has no
internal buffers. The Multiport Virtual FIFO in the data path allows the user to drain
packets at the line rate. The Native Flow Control feature of Aurora can also be used to
manage flow control. As per the Aurora protocol, the round trip delay through the Aurora
interfaces between the NFC request and the first pause arriving at the originating channel
partner must not exceed 256 symbol times.
For 4 lanes, the time taken to transmit 4 symbols (one per lane), with each lane running at
3.125 Gb/s, is 40 bits / 4 lanes × 1/(3.125 Gb/s) = 3.2 ns (1 symbol = 10 bits because of the
8B/10B encoding scheme).
For 256 symbols, the time taken to transmit is 256/4 × 3.2 ns ≈ 205 ns.
For a 156.25 MHz clock (6.4 ns period), this is approximately 32 clock cycles (the worst-case
delay), amounting to a FIFO depth of 32, which is required to hold the data received on the
Aurora RX interface after an NFC request to pause data is initiated. The user must
configure the preview FIFO full and empty thresholds in the Multiport Virtual Packet FIFO
with this value in mind to prevent overflows.
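Restating the calculation with the 6.4 ns period of the 156.25 MHz user clock:

\[
\frac{40 \;\text{bits}}{4 \;\text{lanes} \times 3.125 \;\text{Gb/s}} = 3.2 \;\text{ns}, \qquad
\frac{256}{4} \times 3.2 \;\text{ns} \approx 205 \;\text{ns}, \qquad
\frac{205 \;\text{ns}}{6.4 \;\text{ns}} \approx 32 \;\text{clock cycles}
\]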
The Raw Packet Data driver can be reused for Aurora with some modifications. The data
generated by the block handler for Raw Packet Data could now drive traffic over Aurora.
The Aurora serial interface needs to be looped back externally or connected to another
Aurora link partner.
The maximum theoretical throughput that can be achieved on the Aurora path is 10 Gb/s
(64 bit * 156.25 MHz). Refer to UG766, LogiCORE IP Aurora 8B/10B v7.1 User Guide (AXI) for
throughput efficiency [Ref 3].
Resource Utilization
Table A-1 and Table A-2 list resource utilization obtained from the map report during the
implementation phase. The XC7K325T-2FFG900C is the target FPGA.
Note: The reported utilization numbers are obtained with the specific options set for synthesis and
implementation of the design. Refer to the implement script to find the options that are set. A change
in the default options results in a change in the utilization numbers.
Table A-1: Resources for the TRD with the PCIe Link Configured as x4 at a 5 Gb/s Link Rate

Resource | Utilization | Total Available | Percentage Utilization (%)
Slice registers | 55,797 | 407,600 | 13%
Slice LUTs | 47,551 | 203,800 | 23%
Bonded IOB | 126 | 500 | 25%
RAMB36E1 | 69 | 445 | 15%
RAMB18E1 | 4 | 890 | 1%
BUFG/BUFGCNTRL | 8 | 32 | 25%
MMCM_ADV | 1 | 10 | 10%
GTXE2_CHANNELS | 4 | 16 | 25%
GTXE2_COMMONS | 1 | 4 | 25%
PCIE_2_1 | 1 | 1 | 100%
Table A-2: Resources for the TRD with the PCIe Link Configured as x8 at a 2.5 Gb/s Link Rate

Resource | Utilization | Total Available | Percentage Utilization (%)
Slice registers | 56,793 | 407,600 | 13%
Slice LUTs | 47,668 | 203,800 | 23%
Bonded IOB | 126 | 500 | 25%
RAMB36E1 | 69 | 445 | 15%
RAMB18E1 | 4 | 890 | 1%
BUFG/BUFGCNTRL | 8 | 32 | 25%
MMCM_ADV | 1 | 10 | 10%
GTXE2_CHANNELS | 8 | 16 | 50%
GTXE2_COMMONS | 2 | 4 | 50%
PCIE_2_1 | 1 | 1 | 100%
Register Description
This appendix describes registers most commonly accessed by the software driver.
The registers implemented in hardware are mapped to base address register (BAR0) in the
PCIe integrated Endpoint block.
Table B-1 shows the mapping of multiple DMA channel registers across the BAR.
Table B-1: DMA Channel Register Address

DMA Channel | Offset from BAR0
Channel-0 S2C | 0x0
Channel-1 S2C | 0x100
Channel-0 C2S | 0x2000
Channel-1 C2S | 0x2100
Registers in DMA for interrupt handling are grouped under a category called common
registers, which are at an offset of 0x4000 from BAR0.
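As a convenience, the BAR0 offsets in Table B-1 and the common register block map onto a small set of constants; a sketch:

#include <stdint.h>

/* DMA channel register base offsets from BAR0 (Table B-1) plus the
 * common (interrupt) register block at 0x4000. */
#define CH0_S2C_BASE    0x0000u
#define CH1_S2C_BASE    0x0100u
#define CH0_C2S_BASE    0x2000u
#define CH1_C2S_BASE    0x2100u
#define DMA_COMMON_BASE 0x4000u

/* Return a pointer to one register block within the mapped BAR0 region. */
static inline volatile uint32_t *dma_regs(volatile uint8_t *bar0,
                                          uint32_t base_offset)
{
    return (volatile uint32_t *)(bar0 + base_offset);
}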
Figure B-1 shows the layout of registers.
Figure B-1: Register Layout (BAR0 maps the PCIe performance monitor registers, the target interface configuration and status registers, the per-engine registers such as Reg_next_desc_ptr, Reg_sw_desc_ptr, and DMA Completed Byte Count, and the DMA common registers)
DMA Registers
This section describes the DMA registers used most frequently by the software driver. For a
detailed description of all available registers, refer to the Northwest
Logic DMA Back-End Core User Guide and Northwest Logic DMA AXI DMA Back-End Core
User Guide, available in the k7_pcie_dma_ddr3_base/design/ipcores/dma/doc
directory.
Common Registers
The registers described in this section are common to all engines. The register addresses
are located at the given offsets from BAR0.
Bit Position | Mode | Default Value | Description
1:0 | RO | 00 | Sample count - increments every second
PCIe Credits Status - Initial Non Posted Data Credits for Downstream Port (0x9024)

Table B-16: PCIe Performance Monitor - Initial NPD Credits Register

Bit Position | Mode | Default Value | Description
11:0 | RO | 00 | INIT_FC_NPD captures initial flow control credits for non-posted data for the host system

PCIe Credits Status - Initial Non Posted Header Credits for Downstream Port (0x9028)

Table B-17: PCIe Performance Monitor - Initial NPH Credits Register

Bit Position | Mode | Default Value | Description
7:0 | RO | 00 | INIT_FC_NPH captures initial flow control credits for non-posted headers for the host system

PCIe Credits Status - Initial Posted Data Credits for Downstream Port (0x902C)

Table B-18: PCIe Performance Monitor - Initial PD Credits Register

Bit Position | Mode | Default Value | Description
11:0 | RO | 00 | INIT_FC_PD captures initial flow control credits for posted data for the host system

PCIe Credits Status - Initial Posted Header Credits for Downstream Port (0x9030)

Table B-19: PCIe Performance Monitor - Initial PH Credits Register

Bit Position | Mode | Default Value | Description
7:0 | RO | 00 | INIT_FC_PH captures initial flow control credits for posted headers for the host system
Directory Structure
This appendix describes the directory structure and explains the organization of various
files and folders.
Figure C-1: Directory Structure (the k7_pcie_dma_ddr3_base directory contains the xrawdata0, tb, xpmon, implement, and ip_cores folders, the k7_trd_lin_quickstart script, and a Makefile)
• The k7_trd_lin_quickstart script is used to build and insert driver and GUI
modules, invoke the GUI, and remove the driver modules when the user closes the
GUI window.
3. GUI compilation: Steps are provided for compiling and invoking the GUI.
To compile and invoke the GUI, navigate to the k7_pcie_dma_ddr3_base/
linux_driver/xpmon folder and follow these steps:
a. To clean the area, type:
$ make clean
b. To compile the files, type:
$ make
c. To invoke the GUI, type:
$ ./xpmon
To run the application GUI, go to Using the Application GUI, page 19.
4. Remove the device drivers. Steps are provided for unloading the driver.
To unload the driver modules, navigate to the k7_pcie_dma_ddr3_base/
linux_driver folder and execute this command at the command line in the
terminal:
$ make remove
Additional Resources
Xilinx Resources
To search the Answer database of silicon, software, and IP questions and answers, or to
create a technical support WebCase, see the Xilinx Support website at:
http://www.xilinx.com/support.
For a glossary of technical terms used in Xilinx documentation, see:
http://www.xilinx.com/support/documentation/sw_manuals/glossary.pdf.
References
These documents provide supplemental material useful with this user guide.
1. UG882, Kintex-7 FPGA Base Targeted Reference Design User Guide (this guide)
2. UG798, Xilinx Design Tools: Installation and Licensing Guide
3. UG766, LogiCORE IP Aurora 8B/10B v7.1 User Guide
4. UG477, 7 Series FPGAs Integrated Block for PCI Express User Guide
5. UG626, Synthesis and Simulation Design Guide
6. WP350, Understanding Performance of PCI Express Systems
7. UG476, 7 Series FPGAs GTX Transceivers User Guide
8. UG810, KC705 Evaluation Board for the Kintex-7 FPGA User Guide
9. UG586, 7 Series FPGAs Memory Interface Solutions User Guide
10. UG883, Kintex-7 FPGA Base Targeted Reference Design Getting Started Guide
11. AXI Interconnect IP:
http://www.xilinx.com/products/intellectual-property/axi_interconnect.htm