System Generator Tutorial


Workshop on FPGA based Digital Design

Day 3: Xilinx System Generator Tutorial (11/06/2014)


Introduction to System Generator
System Generator is a system-level modelling tool that facilitates FPGA
hardware design. It extends Simulink in many ways to provide a modelling
environment that is well suited to hardware design. The tool provides high
level abstractions that are automatically compiled to an FPGA at the push of a
button.
It provides two tools: a set of blocks to build the model, and a hardware
generator that translates the model into HDL code. Simulink/Matlab provides the
test environment for the design: it is used to generate input test vectors and
to visualize and analyse the outputs of the design.
Block Diagram View for System Generator Design (figure labels: Matlab,
Simulink, System Generator, Source 1, Source 2, Xilinx blocks)

System Generator Block Set Libraries


The Xilinx block set contains building blocks for constructing DSP and other
digital systems in FPGAs using Simulink. These blocks are grouped into libraries
according to their function, and some blocks with broad applicability are linked
to multiple libraries. The following libraries are provided:
Basic Element Blocks: Includes standard building blocks for digital logic.
Communication Blocks: Includes forward error correction and modulator blocks,
commonly used in digital communication.
Control Logic Blocks: Includes blocks for control circuitry and state machines.
Data Type Blocks: Includes blocks that convert data types (including gateways).
DSP Blocks: Includes Digital Signal Processing blocks.
Math Blocks: Includes blocks that implement mathematical functions.
Memory Blocks: Includes blocks that implement and access memories.
Shared Memory Blocks: Includes blocks that implement and access Xilinx shared
memories.
Tool Blocks: Includes utility blocks, e.g. code generation (System Generator
Token), resource estimation and HDL co-simulation.
Common Options in Block Parameter Dialog boxes
Each Xilinx block has several controls and configurable parameters, seen
in its block parameters dialog box. This dialog box can be accessed by
double-clicking on the block. Many of these parameters are specific to the
block. The controls and parameters that are common to most of the blocks are
discussed below:
Precision: Most blocks give you the option of choosing the precision, i.e.
the number of bits and binary point position. By default, the output of
Xilinx blocks is full precision; that is, sufficient precision to represent the
result without error. Most blocks have a User-Defined precision option
that fixes the number of total and fractional bits.
Arithmetic Type: In the Type field of the block parameters dialog box,
you can choose unsigned or signed (two's complement) as the data type
of the output signal.
Number of bits: Fixed-point numbers are stored in data types
characterized by their word size, as specified by the number of bits, binary
point, and arithmetic type parameters. The maximum number of bits
supported is 4096.
Binary point: The binary point is the means by which fixed-point
numbers are scaled. The binary point parameter indicates the number of
bits to the right of the binary point (i.e., the size of the fraction) for the
output port. The binary point position must be between zero and the
specified number of bits.
Overflow and Quantization: When user-defined precision is selected,
errors may result from overflow or quantization. Overflow errors occur
when a value lies outside the representable range. Quantization errors
occur when the number of fractional bits is insufficient to represent the
fractional portion of a value. The Xilinx fixed-point data type supports
several options for user-defined precision. For overflow, the options
are to Saturate to the largest positive/smallest negative value, to Wrap
(i.e., to discard bits to the left of the most significant representable bit),
or to Flag as error (report an overflow as a Simulink error) during simulation.
For quantization, the options are to Round to the nearest representable
value (or to the value furthest from zero if there are two equidistant
nearest representable values), or to Truncate (i.e., to discard bits to the
right of the least significant representable bit).
Latency: Many elements in the Xilinx block set have a latency option.
This defines the number of sample periods by which the block's output is
delayed.
Provide Synchronous reset port: Selecting the Provide Synchronous
Reset Port option activates an optional reset (rst) pin on the block. When
the reset signal is asserted, the block goes back to its initial state. The
reset signal has precedence over the optional enable signal available on the
block. The reset signal has to run at a multiple of the block's sample rate.
The signal driving the reset port must be Boolean.
Provide Enable Port: Selecting the Provide Enable Port option activates
an optional enable (en) pin on the block. When the enable signal is not
asserted, the block holds its current state until the enable signal is
asserted again or the reset signal is asserted. The reset signal has
precedence over the enable signal. The enable signal has to run at a
multiple of the block's sample rate. The signal driving the enable port
must be Boolean.
Sample Period: Data streams are processed at a specific sample rate as
they flow through Simulink. Typically, each block detects the input
sample rate and produces the correct sample rate on its output. If you
select Specify explicit sample period rather than the default, you may set
the sample period required for all the block outputs. This is useful when
implementing features such as feedback loops in your design. In a
feedback loop, it is not possible for System Generator to determine a
default sample rate, because the loop makes an input sample rate
depend on a yet-to-be-determined output sample rate. System
Generator under these circumstances requires you to supply a hint to
establish sample periods throughout a loop.
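The overflow and quantization options described above can be illustrated with a small Python model. This is only a behavioural sketch, not System Generator code: the helper name to_fixed, its argument names, and the formats used in the usage lines (8 bits with 6 or 2 fractional bits) are assumptions chosen for illustration.

```python
import math

def to_fixed(value, n_bits, bin_pt, signed=True,
             overflow="saturate", quantization="round"):
    """Return the real value a fixed-point signal would hold (hypothetical helper)."""
    scaled = value * (1 << bin_pt)            # value expressed in LSB units
    if quantization == "round":
        # nearest code; ties go to the value furthest from zero
        code = int(math.floor(abs(scaled) + 0.5))
        if scaled < 0:
            code = -code
    elif quantization == "truncate":
        # dropping bits right of the LSB of a two's complement number is a floor
        code = math.floor(scaled)
    else:
        raise ValueError("unknown quantization mode")

    lo = -(1 << (n_bits - 1)) if signed else 0
    hi = (1 << (n_bits - 1)) - 1 if signed else (1 << n_bits) - 1
    if code < lo or code > hi:
        if overflow == "saturate":
            code = max(lo, min(hi, code))     # clamp to the representable range
        elif overflow == "wrap":
            code &= (1 << n_bits) - 1         # keep only the n_bits low-order bits
            if signed and code > hi:
                code -= 1 << n_bits           # reinterpret as two's complement
        elif overflow == "error":
            raise OverflowError("value %r does not fit" % (value,))
        else:
            raise ValueError("unknown overflow mode")
    return code / (1 << bin_pt)

print(to_fixed(10, 8, 6, overflow="saturate"))       # 1.984375
print(to_fixed(10, 8, 6, overflow="wrap"))           # -2.0
print(to_fixed(0.4, 8, 2, quantization="round"))     # 0.5
print(to_fixed(0.4, 8, 2, quantization="truncate"))  # 0.25
```

With 8 bits and 6 fractional bits the representable range is -2 to 1.984375, so a value of 10 either clamps to the upper limit (Saturate) or lands on an unrelated value (Wrap).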
Things to be noted:
Every model needs a System Generator token.
The System Generator token configures the simulation and hardware parameters.
o Relates the Sample Period to the hardware clock.
o Is used to synthesize the model.
o Sets the target FPGA device for the model.
All models start and end with Gateways.
o Gateway In: Converts from double to fixed-point format.
o Gateway Out: Converts from fixed-point to double format.
Any Simulink block can be used outside the Gateways for data sources
and output analysis.
Only Xilinx blocks can be used inside the Gateways.
On synthesizing the model, Gateways are treated as the ports.

Exercise 1: Introduction to basic building blocks in XSG


Section 1: In this exercise, you will create a model with a Sine Wave source
element and a Scope sink element, together with a Delay element.
Step 1: Open a Blank Model from the Simulink Library Browser.
Step 2: From the library browser choose Simulink -> Sources -> Sine Wave. Copy
the block to the current model.
Step 3: Similarly, add the Scope from Simulink -> Sinks -> Scope.
Step 4: Add the System Generator Token, Gateway In and Gateway Out from
Xilinx Blockset -> Basic Elements.
Step 5: Add the Delay element block from Xilinx Blockset -> Basic Elements.

Completed model

Step 6: Double Click on the Sine Wave block and change the parameters:
Amplitude : 1
Frequency : 2*pi*1/150
Step 7: Change the Gateway In parameters as shown in the figure.

Gateway In parameters
Step 8: The only parameter to be changed in the Delay element is the latency.
Set the latency value to 1.
Step 9: Now, to view both the source and the output from the System Generator
blocks together in the Scope, double click on the Scope icon and set the Number
of axes to 2.

Changing Scope Parameters

Running the simulations:


From the Model window choose Simulation -> Configuration Parameters.
From the Configuration Parameters dialog box, enter 150 in the Stop
time field, and set the following Solver options:
o Type: Fixed-step
o Solver: Discrete (no continuous states)
o Tasking mode: SingleTasking
Setting these parameters allows your simulation to run for 150 time units.

Scope Output
Observe that the output from the Xilinx blocks (first plot) is delayed by one
sample.
1) Vary the Latency of the Delay element and observe the output.
2) Vary the Stop time parameter and observe the output.
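The behaviour expected on the Scope can be sketched in plain Python. This is a rough model under stated assumptions, not the Simulink model itself: it uses the Sine Wave settings from Step 6, a sample period of 1, ignores the fixed-point conversion in Gateway In, and the function names sine_source and delay are hypothetical.

```python
import math

def sine_source(n_samples, amplitude=1.0,
                freq_rad=2 * math.pi / 150, sample_period=1.0):
    """Samples of amplitude*sin(freq_rad*t) taken at t = 0, T, 2T, ..."""
    return [amplitude * math.sin(freq_rad * k * sample_period)
            for k in range(n_samples)]

def delay(samples, latency=1, initial=0.0):
    """Delay a sample stream by `latency` sample periods."""
    padded = [initial] * latency + list(samples)
    return padded[:len(samples)]          # keep the original length

source = sine_source(151)                 # t = 0, 1, ..., 150: one full cycle
delayed = delay(source, latency=1)

# The delayed stream repeats the source one sample later.
for k in range(1, 4):
    print(k, round(source[k - 1], 4), round(delayed[k], 4))
```

Changing the latency argument mimics part 1) of the exercise: each extra unit of latency shifts the output one more sample to the right.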

Modifying the time parameters for simulation:


Verify that the current sampling rate of the system is 1 Hz.
Now, to change the sample rate of the system to 100 Hz, double click on
the System Generator Token -> Clocking and set the Simulink system period to
1/100. By doing so, the system clock will be generated at 100 Hz.

Setting the System Period


Now, to sample the input at 100 Hz, change the Sample Period within the
Gateway In block to 1/100.
Modifying the Block Parameters for simulation:
Sine Wave:
Double Click on the Sine Wave block and change the amplitude to 10.
Run the Simulation and Observe the Waveform.

Scope Output
The waveform gets clipped.
Explain what has happened?
Answer :

Gateway In:
Change the Overflow parameter to Wrap and observe the waveform.
Explain what has happened?
Answer :

Now set the Overflow parameter to Flag as Error and run the simulation.
This option generates an error message whenever an overflow is detected.


To handle the overflow properly (i.e. to fix the error), the number of bits
used for the representation has to be chosen so that the full range of the
signal can be represented.
Vary the number of bits (in Gateway In) and the binary point. Observe the
output for various cases.
Explain what happens?
Answer:
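One way to reason about the choice of word length is to compute the range that a given format can represent. The sketch below is illustrative only: the Gateway In settings used in this tutorial are given in a figure that is not reproduced here, so the format of 8 bits with 6 fractional bits is an assumption, and fixed_range and min_total_bits are hypothetical helper names.

```python
def fixed_range(n_bits, bin_pt, signed=True):
    """Smallest and largest real values representable in the given format."""
    scale = 1 << bin_pt
    if signed:
        return -(1 << (n_bits - 1)) / scale, ((1 << (n_bits - 1)) - 1) / scale
    return 0.0, ((1 << n_bits) - 1) / scale

def min_total_bits(amplitude, bin_pt):
    """Smallest signed word length whose range covers -amplitude..+amplitude."""
    n_bits = bin_pt + 1                   # at least a sign bit plus the fraction
    while True:
        lo, hi = fixed_range(n_bits, bin_pt)
        if lo <= -amplitude and hi >= amplitude:
            return n_bits
        n_bits += 1

print(fixed_range(8, 6))        # (-2.0, 1.984375): too narrow for amplitude 10
print(min_total_bits(10, 6))    # 11
print(fixed_range(11, 6))       # (-16.0, 15.984375)
```

Under these assumptions, a sine wave of amplitude 10 needs at least 11 bits when 6 fractional bits are kept; with fewer bits the signal saturates, wraps, or raises an error depending on the Overflow setting.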

Section 2: Hardware co-simulation in XSG


In this section, you will learn to generate the bit stream and download it to the
target FPGA using the System Generator Token.
Step 1: Double click on the System Generator Token. Set the Compilation
parameter to Hardware Co-Simulation -> New Compilation Target.

Step 1: Hardware Cosimulation


Step 2: Choose the compilation target parameters as:
Board Name : ML505
System Clock : 100 MHz
Pin Location : AH15
Boundary Scan Positions : 5
Ensure that the board is connected and powered up, then click on Detect.
From Add Targetable Devices, add the Virtex 5 board.
Click on Install. A new blank window appears.


Step 3: Now, in the model, double click on the System Generator Token and
choose the Compilation Target as ML505. Click on Generate.
Step 4: After generation, a new window with the generated hardware block
appears.
Step 5: Copy the generated block and make connections as shown in the figure.

Hardware cosimulation
Step 6: Run the simulation and verify that the result matches the previous
ones.
Note: All Gateway In and Gateway Out blocks will be mapped to input and
output ports in hardware co-simulation. If a Gateway Out block need not be
mapped to an output port, uncheck its Translate to Output port option.

Section 3: Timing and Power Analysis Using XSG


Step 1: Double click on the System Generator Token and choose Timing and
Power Analysis from the Compilation parameters.
Step 2: Choose the proper Target Device. This tutorial is based on Virtex 5
(XC5VLX110T Evaluation Platform).
Step 3: The timing and power reports will be saved in a folder named timing,
which is created automatically within the current folder.
Step 4: Click on Generate.
Step 5: A Timing Analyzer window appears.

Timing analyzer
This window provides various options for identifying slow paths, charts
showing details of various paths, operating frequency and period, Trace and
ISE reports and Power analysis reports.
Since the design has no register-to-register paths (it contains only a single
Delay register), slow paths and charts are not displayed.

Step 6: Click on the various options and tabulate the results below.
Minimum Period :
Maximum Operating Frequency :
Step 7: From the ISE reports, note down the resource utilization, minimum
period and the maximum operating frequency.

Step 8: Click on the Power Analysis tab. It launches XPower Analyzer which
provides detailed power analysis.

Section 4: Implementing Using Xilinx ISE Tools


In this section, we will learn how to generate the HDL code and then download
the bit stream to the target device.
We will consider a new model with a delay value equal to 4. For this, change
the Latency value of the Delay element in the previous model to 4.

Step 1: Double click on the System Generator Token. A dialog box appears. Set
the parameters for Compilation as shown in the figure.

Compilation Parameters
Ensure that the proper target device is chosen. This tutorial is based on Virtex 5
(XC5VLX110T Evaluation Platform).
Step 2: Click on Generate. A new directory named netlist appears in the
current working folder. The HDL code (here, Verilog) and an ISE project,
together with many other files, are created in the netlist folder.
delay4_cw.v : This is the top-level module, which forms the HDL wrapper
for the design. Depending on the type of multi-rate implementation
selected, it drives either the clock enables or the clocks in the design.
delay4.v : This contains most of the HDL for the design.
In addition to the signals in the System Generator model, various other signals
are also present in the generated code.
Clock (clk) : Clock signal for the design. All operations of the core are
synchronised with the rising edge of the clock.
Clock Enable (ce) : It is attached to the clock enable pins of the flip-flops.
A flip-flop responds to a rising edge of the clock only when the ce signal
attached to its CE pin is high at that edge (mainly of use in multirate
systems).
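The role of the clock enable can be pictured with a small behavioural model of the 4-stage delay. This is a sketch in Python, not the generated Verilog: the class name DelayLine and the method rising_edge are hypothetical, and the model assumes all stages share one clk and one ce.

```python
class DelayLine:
    """Chain of `depth` flip-flops sharing one clock and one clock enable."""

    def __init__(self, depth, init=0):
        self.stages = [init] * depth

    def rising_edge(self, d, ce=True):
        """Apply one rising clock edge and return the output after the edge."""
        if ce:
            # every flip-flop captures the value of the stage before it
            self.stages = [d] + self.stages[:-1]
        # with ce low, all flip-flops hold their current state
        return self.stages[-1]

line = DelayLine(depth=4)
print([line.rising_edge(v) for v in range(1, 7)])   # [0, 0, 0, 1, 2, 3]

held = DelayLine(depth=4)
print(held.rising_edge(9, ce=False))                # 0, the edge is ignored
```

In a multirate design the same idea lets a slower block run from the fast clock: its ce is high only on the edges at which that block is supposed to update.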
Step 3: These files can be taken to Xilinx ISE in order to begin the stages of
taking the design to the FPGA.
Step 4: Open Xilinx Project Navigator. Open the generated project from
File -> Open Project and browse to your current folder (where the files are
generated).
Step 5: Observe that various files are added automatically into the project.
Step 6: Synthesize the design. Expand the Synthesize - XST option and click on
View RTL Schematic.

Double click on the RTL Schematic to see inside the block. Observe that there
are blocks other than the main block (delay4_cw). The extra
blocks are used for generating the clk and clock enable for the System
Generator model.
Step 7: In the model, there are 4 delay elements (since the delay is set to 4).
Synthesis and mapping options can be set to control how the design
is implemented on the FPGA:
o The implementation can either utilize the IOBs (Input/Output Blocks) or use
only the flip-flops within the logic slices.
o The delay can be implemented either using one flip-flop for each clock cycle
or using a shift register.
Step 8: To make use of flip-flops rather than shift registers, modify
the synthesis options. Right click on Synthesize - XST -> Process Properties ->
HDL Options and uncheck Shift Register Extraction.
Step 9: To make use of the IOBs, right click on Synthesize - XST -> Process
Properties -> Xilinx Specific Options and set Pack I/O Registers into IOBs to Yes.
Step 10: Click on Generate Programming File. This will go through a series of
steps: Synthesis -> Translate -> Map -> Place and Route. Finally, the bit
stream is generated.
Step 11: Tabulate the results for resource utilization, operating frequency
and minimum time period (Post-PAR Static Timing Report).

Subsection: Inspecting the design using FPGA Editor


FPGA Editor allows the placement and routing of the design to be inspected,
and modifications to be made if required.
Step 1: To view the routed design, the actual hardware used and its location
on the FPGA, expand the Place and Route option in the Processes pane.
Double click on View/Edit Routed Design (FPGA Editor). A new window
appears.
Step 2: Zoom in to the view and observe the various connections.

Exercise : Change the synthesis options back to their default settings. Implement
the design and view the placed and routed design in FPGA Editor. Observe the
differences between the two cases.

Section 5: Hardware verification with ChipScope Pro


In this section, a simple system comprising an 8 bit counter and a 10 sample
delay is implemented in hardware and testing is performed using ChipScope
Pro.

Step 1: Set the Gateway In data type to Boolean.
Step 2: Add a Counter block (Xilinx Blockset -> Basic Elements -> Counter). Set the
number of bits to 8. Change the Counter type to Count limited and set the
value to 100. Check Provide Synchronous Reset Port.
Step 3: Change the Delay latency to 10.
Step 4: Add the ChipScope block (Xilinx Blockset -> Tools -> ChipScope). Change
the parameters within the ChipScope block as shown in the figure.

Step 5: Run the simulation and verify the output.

Step 6: To verify the output in hardware, the reset signal has to be driven
from outside. Hence, change the Gateway In parameters: Implementation ->
IOB pad locations: {AC24}. This is the pin location for GPIO DIP switch 8. This
switch can be used to reset the counter.
Step 7: Generate HDL code using the System Generator Token. Open the
generated project in Xilinx ISE.
Step 8: Modify the generated UCF file to map the clock signal.
# LOC constraints
NET gateway_in*0+ LOC = AC24;
NET clk LOC = AH15;
Click on Generate BitStream. The generated bit stream includes the ChipScope
core as well as the design under test.
Step 9: Click on Analyze using ChipScope. Click on Open Cable/JTAG connection.
Ensure that the board is connected and powered on.
Step 10: The ChipScope Pro Analyzer window appears. Click on OK. Click on
Device -> DEV4 (XC5VLX110T) -> Configure. Click on OK.

Step 11: Next, we need to import the file containing the bus information,
which was originally defined in the ChipScope block in System Generator. Go to
File -> Import -> Select new file -> counter_delay.cdc. Click OK.
Step 12: In the project panel, click on Trigger Setup and Bus Plot. Check the
boxes for data0 and data1.
Step 13: In the Trigger Setup window, set the match unit value to 0. This
corresponds to the reset signal of the counter, which is connected to the trigger
port of the ChipScope block.
Step 14: Before arming the trigger, ensure that the GPIO DIP switch is
switched ON. Now, arm the trigger. ChipScope now waits for the trigger signal.
Step 15: Switch the DIP switch to OFF. This triggers the capture of
data and resets the counter.
Step 16: View the signal in Bus Plot. The captured data is plotted and
should match the results obtained from System Generator.
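As a reference for what the Bus Plot should show, the counter and delay can be modelled in a few lines of Python. This is a behavioural sketch under assumptions: it assumes the Count limited counter starts at 0, counts up to and including 100 and then returns to 0, and that an asserted reset takes effect on the next sample; run_counter and delay are hypothetical names.

```python
def run_counter(resets, limit=100, n_bits=8):
    """Output of a count-limited up counter, one value per sample period."""
    if limit >= (1 << n_bits):
        raise ValueError("limit does not fit in n_bits")
    outputs, count = [], 0
    for rst in resets:
        outputs.append(count)             # value visible during this sample
        if rst or count == limit:
            count = 0                     # synchronous reset or wrap at the limit
        else:
            count += 1
    return outputs

def delay(samples, latency, initial=0):
    """Delay a sample stream by `latency` samples, keeping its length."""
    return ([initial] * latency + list(samples))[:len(samples)]

counts = run_counter([False] * 120)
delayed = delay(counts, 10)
print(counts[98:104])     # [98, 99, 100, 0, 1, 2]
print(delayed[8:14])      # [0, 0, 0, 1, 2, 3]
```

The two streams correspond to the two buses observed in ChipScope (data0 and data1 in this tutorial): the same ramp, with the second one lagging the first by 10 samples.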

Exercise 2: MAC based FIR filter design


In this section, you will create a low pass FIR filter using a single multiplier
and accumulator (MAC) unit.

Basic FIR filter structure


Objective: To design a low pass filter (LPF) that eliminates the high frequency
component of a given signal. In this tutorial, our aim is to remove a 300 Hz
signal from a mixture of sinusoids.
Design Characteristics:
Low pass filter of order 6.
Coefficients generated using FDATool.
Sampling frequency of 1 kHz.
Input consists of a combination of 2 sine waves: a low frequency and a
high frequency.
Input Signal Characteristics:
Sine waves of frequencies 5 Hz and 300 Hz.
Amplitude of each sine wave is set to 1.
Generating Filter coefficients:
Step 1: Filter coefficients are generated using FDATool in Matlab. Type
fdatool in the command window.
Step 2: The Filter Design and Analysis Tool opens up. Set the following parameters.
Response Type : Lowpass
Design Method : FIR -> Least-squares
Specify Order : 6
Units : Normalized to [0 1]
wpass : 0.1
wstop : 0.25
With the frequency axis normalized to half the sampling frequency (500 Hz),
these band edges correspond to 50 Hz and 125 Hz.
Click on Design Filter.

Step 3: Export the generated coefficients using File -> Export. Save the
coefficients as filter_coeff.
Click Export. This variable appears in the Matlab Workspace.
In the design, these values will be stored in memory. During the filtering
operation, they are read and used.

Developing the Model


Step 1: Start a new model. Add the System Generator Token and set the sample
period to 1/1000.
Step 2: Add 2 Sine Wave sources and set their frequencies to 5 Hz and 300 Hz
respectively (2*pi*5 and 2*pi*300 rad/sec). Set the amplitude of each of them
to 1. Rename the sources. Add both the signals using an adder
(Simulink -> Math Operations -> Sum).

Step 3: Set the parameters of Gateway In as shown in the figure.

Step 4: The inputs to the filter are passed through a delay line (as in the block
diagram shown above). The delay line is implemented using an Addressable
Shift Register (Xilinx Blockset -> Memory -> Addressable Shift Register) of depth
equal to the number of filter coefficients.

Step 5: The generated filter coefficients are stored within a ROM (Xilinx
Blockset -> Memory -> ROM). Double click on the ROM block and set the
parameters.

Step 6: The addresses to the delay line and the memory are generated using a
counter (Xilinx Blockset -> Basic Elements -> Counter).

The sample period of the counter is set to 1/7000 because, for every new input
sample, the filter has to process 7 samples (it is a 7-tap filter). So, the
memory and the delay line must operate at 7 times the rate of the rest of
the elements.
Step 7: All blocks in the model operate according to the Simulink clock. Hence,
the Simulink clock must run at the highest rate at which any block operates.
Set the Simulink time period to 1/7000.
Step 8: Add a multiplier block (Xilinx Blockset -> Math -> Mult). Set the
parameters as shown in the figure.

The model developed so far appears as shown in the figure.

The delay element placed after the delay line compensates for the latency of
the ROM (filter coefficients).
Step 9: Add an accumulator block (Xilinx Blockset -> Math -> Accumulator) at
the output of the multiplier. For every input, the accumulator needs to operate 7
times. Whenever a new input arrives, the accumulator is reset.
The reset signal is generated using control logic.

Control Logic

Accumulator Parameters

Step 10: For each input sample, the accumulator generates 7 outputs (one per
filter tap). Only the last of these 7 outputs is the valid
result. This value is captured using a register (Xilinx Blockset -> Basic Elements)
that is enabled only when a valid output is available.
Step 11: The MAC unit runs at a faster rate than the input
sample rate. For the output sample time to match that of the
input, the value obtained from the capture register is down-sampled by a
factor equal to the number of filter taps. This is done using a Down Sample
block (Xilinx Blockset -> Basic Elements -> Down Sample).

Step 12: Finally, the output is sent to a Gateway Out block and the results are
viewed using a Scope.
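The time-multiplexed structure built in the steps above can be checked against a direct FIR sum with a short Python model. This is a behavioural sketch only: it ignores block latencies and fixed-point effects, the function names are hypothetical, and COEFFS holds placeholder values, not the coefficients exported from FDATool (which are not listed in this tutorial).

```python
import math

# Placeholder 7-tap coefficients for illustration (NOT the FDATool output).
COEFFS = [0.05, 0.12, 0.20, 0.26, 0.20, 0.12, 0.05]
N_TAPS = len(COEFFS)

def mac_fir(samples, coeffs=COEFFS):
    """Single-MAC FIR: one multiply-accumulate per fast clock cycle."""
    taps = len(coeffs)
    delay_line = [0.0] * taps             # addressable shift register
    acc = 0.0                             # accumulator
    capture_reg = 0.0                     # capture register
    fast_rate_out = []                    # capture register seen at the fast rate
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        for addr in range(taps):          # address counter: 0 .. taps-1
            product = coeffs[addr] * delay_line[addr]
            # the accumulator is reset when a new input arrives
            acc = product if addr == 0 else acc + product
            if addr == taps - 1:          # only the last sum is valid
                capture_reg = acc
            fast_rate_out.append(capture_reg)
    return fast_rate_out[taps - 1::taps]  # down-sample by the number of taps

def direct_fir(samples, coeffs=COEFFS):
    """Reference: y[n] = sum over k of c[k] * x[n-k]."""
    return [sum(coeffs[k] * samples[n - k]
                for k in range(len(coeffs)) if n - k >= 0)
            for n in range(len(samples))]

fs = 1000                                 # input sample rate in Hz
x = [math.sin(2 * math.pi * 5 * n / fs) + math.sin(2 * math.pi * 300 * n / fs)
     for n in range(50)]
y_mac = mac_fir(x)
y_ref = direct_fir(x)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(y_mac, y_ref)))
print("MAC rate:", N_TAPS * fs, "Hz")     # 7 operations per input sample
```

The inner loop runs 7 times per input sample, which is why the counter, ROM and delay line in the model are clocked with a period of 1/7000 while the input arrives with a period of 1/1000.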

The Completed Model

Section 1: Running the Simulations


Step 1: Ensure that the Simulink sample period is set to 1/7000.
Step 2: Set the Stop time to 0.7. Run the simulation.

Section 2: Timing and Power Analysis


Verify that the results obtained match the ones tabulated below:
Minimum Period : 8.963 ns
Max. Operating Frequency : 111.570 MHz
Number of Slice Registers : 101
Number of Slice LUTs : 85
Number of Bonded IOBs : 33
Total Power : 1.061 W
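The first two rows of the table are two views of the same quantity: the maximum operating frequency is the reciprocal of the minimum period. A minimal check in Python, where the function name max_frequency_mhz is a hypothetical helper:

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    return 1e3 / min_period_ns            # 1/ns equals 1000 MHz

f_max = max_frequency_mhz(8.963)
print(round(f_max, 3))                    # 111.57
```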

Section 3: Hardware Co-Simulation


Generate the hardware block using the procedure discussed above.
Run the simulation and verify that the results are the same as those from the
software model.
