
Unit V-Seca1605-Programming in HDL

The document outlines the FPGA design flow using Xilinx ISE, detailing steps such as design entry, synthesis, implementation, and programming. It describes the architecture of the Artix-7 FPGA, including the components of Logic Blocks, DSP slices, and the functionality of SLICEM and SLICEL slices. Additionally, it emphasizes best practices for design flow, resource utilization, and pinout planning to optimize FPGA performance.

Uploaded by

shreyarbm

SCHOOL OF ELECTRICAL AND ELECTRONICS

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

UNIT - V
PROGRAMMING IN HDL – SEC1406

V. REALIZING APPLICATIONS IN FPGA

FPGA Design Flow


The ISE® design flow comprises the following steps: design entry, design synthesis, design
implementation, and Xilinx® device programming. Design verification, which includes both
functional verification and timing verification, takes place at different points during the
design flow. This section describes what to do during each step. The steps are listed below
Figure 5.1.

Figure 5.1 : ISE design flow


• Design Entry
• Design Synthesis and Verification
• Design Implementation and Verification
• Device Programming
• In-Circuit Verification

Design Entry
Create an ISE project as follows:
1. Create a project.
2. Create files and add them to the project, including a user constraints file (UCF).
3. Add any existing files to the project.
4. Assign constraints such as timing constraints, pin assignments, and area constraints.

Functional Verification
We can verify the functionality of our design at different points in the design flow as follows:
• Before synthesis, run behavioral simulation (also known as RTL simulation).
• After Translate, run functional simulation (also known as gate-level simulation), using
the SIMPRIM library.
• After device programming, run in-circuit verification.

Design Synthesis
The synthesis process checks code syntax and analyzes the hierarchy of our design, which
ensures that the design is optimized for the device architecture that we have selected. The
resulting netlist is saved to an NGC file (for Xilinx® Synthesis Technology (XST)) or an EDIF
file (for Precision, or Synplify/Synplify Pro).
The synthesis process can be used with the following synthesis tools:
• Xilinx Synthesis Technology (XST)
• Precision from Mentor Graphics Inc.
• Synplify and Synplify Pro from Synplicity Inc.

Design Implementation
Implement the design as follows:
1. Implement design, which includes the following steps:
• Translate
• Map
• Place and Route
2. Review reports generated by the Implement Design process, such as the Map Report or
Place & Route Report, and change any of the following to improve our design:
• Process properties
• Constraints
• Source files
3. Synthesize and implement our design again until design requirements are met.

Timing Verification
We can verify the timing of our design at different points in the design flow as follows:
Run static timing analysis at the following points in the design flow:
• After Map
• After Place & Route
Run timing simulation at the following points in the design flow:
• After Map (for a partial timing analysis of CLB and IOB delays)
• After Place and Route (for full timing analysis of block and net delays)

Xilinx Device Programming


Program the Xilinx device as follows:
• Create a programming file (BIT) to program our FPGA.
• Generate a PROM or ACE file for debugging or to download to device. Optionally,
create a JTAG file.
• Use iMPACT to program the device with a programming cable.

Xilinx Artix-7 architecture


The Artix-7 FPGA consists of Logic Blocks, Block RAM, DSP blocks, and a global routing
network. We will spend most of our time discussing the Logic Blocks. But before we do, realize
that modern reconfigurable logic exists because logic designs can easily be expressed in terms
of medium scale logic building blocks such as registers, shift registers, multiplexers, counters,
adders, subtractors, and comparators. Consider the following output from the Xilinx ISE
during the synthesis of Lab 4 on the Spartan-6 FPGA.
ISE decomposed my VHDL design into basic building blocks. This is one reason we insisted on
certain coding practices throughout the semester - they increase the likelihood that our design
will be efficiently mapped into these basic building blocks. However, consider how the actual
FPGA can be configured to realize these basic building blocks. This complicated process is
what the Xilinx software does.

Logic Blocks
A configurable logic block (CLB) is the basic block used to implement the logic behind the
VHDL designs we have been working on all semester. In FPGAs, hundreds or thousands of
CLBs are laid out in an array and connected, typically through switch matrices, to the global
routing network. Through this network, all of the CLBs on the FPGA can be connected to each
other. On the Artix-7 (and other Xilinx 7-series devices), each CLB contains two logic slices
(discussed in the following section). The logical layout of a CLB can be seen in Figure 5.2.

Figure 5.2 : CLB diagram

Logic Slices
The Artix-7 on our board (Artix-7 7A200T) has a total of 33,650 logic slices (16,825 CLBs).
Each logic slice contains four 6-input LUTs and eight flip-flops. This corresponds to 134,600
total 6-input LUTs.
There are three possible types of logic slices in Xilinx devices: SLICEM, SLICEL, and SLICEX.
However, the Artix-7 has no SLICEX slices; of the 33,650 logic slices, 22,100 are SLICEL and
11,550 are SLICEM.
In the logic slices, three major SLICEM subsystems can be seen: 1. the four 6-input LUTs, 2.
the eight flip-flops, and 3. the fast carry logic.
1. Look-up tables
As a refresher on how a hardware LUT works: in a SLICEM, there are four 64x1
RAMs which are used to realize 5 or 6-variable functions; the truth table for the function is
stored in the RAM and the inputs are used as the address inputs. As an example, let's try to
realize a full adder using RAM. In class, we will derive the truth table for sum and carry and
show how they can be inserted into a LUT. It is important for the further development of the
lecture to point out that sum = a xor b xor cin and that you can represent cout = ((a xor b) and
cin) or (a and b). This last form is pretty nutty, but is also very useful.
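The full-adder example can be sketched in Python to show what "storing the truth table in the RAM" means. This is only an illustration: the helper names `build_lut` and `lut_read`, and the choice of `a` as the most significant address bit, are assumptions made for this sketch and are not taken from the Xilinx tools.

```python
def build_lut(func, n_inputs):
    """Return the truth table of func as a list indexed by the input address."""
    table = []
    for address in range(2 ** n_inputs):
        # Unpack the address into individual input bits, most significant first.
        bits = [(address >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *bits):
    """Look up the stored output: the inputs form the RAM address."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return table[address]


# The two full-adder functions from the text, each stored as an 8-entry table.
SUM_LUT = build_lut(lambda a, b, cin: a ^ b ^ cin, 3)
COUT_LUT = build_lut(lambda a, b, cin: ((a ^ b) & cin) | (a & b), 3)

print(SUM_LUT)   # [0, 1, 1, 0, 1, 0, 0, 1]
print(COUT_LUT)  # [0, 0, 0, 1, 0, 1, 1, 1]
```

Once the tables are filled in, evaluating the adder is a pure memory read; no gates are "computed" at run time, which is why the LUT delay does not depend on the function stored.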
2. Flip Flops
There are 8 flip flops in each logic slice.

3. Fast Carry Logic
The fast carry logic is designed explicitly to realize a variation of a carry look-ahead adder.
Consider the construction of a 4-bit adder with inputs A=a3,a2,a1,a0 , B=b3,b2,b1,b0 , and a
carry in c0. Each slice of the adder can either generate a carry bit or propagate its carry in to
the carry out.
• Propagate -- pi is equal to 1 when the inputs to a bit slice are such that any carry in will
be propagated.
• Generate -- gi is equal to 1 when the inputs to a bit slice are such that a carry will be
generated.
We can represent the cout of a slice as cout = g + p*cin. This arrangement is effectively what is
happening in the carry logic block in the middle of each logic slice.
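A minimal Python sketch of the generate/propagate recurrence follows. It assumes the usual definitions p = a xor b and g = a and b, which agree with the full-adder form of cout given earlier; the function name and the bit-serial loop are illustrative only and do not model the actual carry primitive or its timing.

```python
def add_with_carry_chain(a, b, c0=0, width=4):
    """Add two unsigned numbers bit by bit using cout = g + p*cin."""
    carry = c0
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi                   # propagate: a carry in is passed on
        g = ai & bi                   # generate: a carry is created here
        total |= (p ^ carry) << i     # sum bit for this position
        carry = g | (p & carry)       # cout = g + p*cin
    return total, carry


print(add_with_carry_chain(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```

In the sketch, each loop iteration stands for one bit position of the adder; the returned carry is the carry out of the most significant bit.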

Interconnect
A logical view of how the CLBs on the Artix-7 are interconnected can be seen
in Figure 5.3.

Figure 5.3 : Interconnect diagram

DSP Slice
Apart from the slices which make up the CLBs discussed above, the Artix-7 also contains
DSP slices. The Artix-7 we are using contains 740 DSP48E1 slices. Each DSP48E1 slice
contains a pre-adder, a 25 x 18 multiplier, an adder, and an accumulator. A picture of a DSP
slice can be seen in Figure 5.4.
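The data path described above (pre-adder, multiplier, adder, accumulator) can be sketched behaviorally in Python. This is a simplified model under stated assumptions: the names `dsp_step` and `dot_product` are hypothetical, operand widths are not enforced, and pipeline registers and operating modes of the real slice are ignored.

```python
def dsp_step(acc, a, d, b):
    """One multiply-accumulate step: acc + (a + d) * b."""
    pre = a + d          # pre-adder output (feeds the 25-bit multiplier input)
    product = pre * b    # multiplier (b is the 18-bit operand)
    return acc + product # adder plus accumulator register


def dot_product(samples, coeffs):
    """Use the step repeatedly, as a filter or dot product would."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc = dsp_step(acc, x, 0, c)  # pre-adder unused here (d = 0)
    return acc


print(dsp_step(10, 3, 4, 5))                 # 45
print(dot_product([1, 2, 3], [4, 5, 6]))     # 32
```

The point of the sketch is that one slice performs the whole multiply-accumulate step, so such arithmetic does not need to be built out of LUTs.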

Figure 5.4 : DSP Slice

Configurable Logic Blocks (CLB)


The 7 series configurable logic block (CLB) provides advanced, high-performance FPGA
logic:
• Real 6-input look-up table (LUT) technology
• Dual LUT5 (5-input LUT) option
• Distributed Memory and Shift Register Logic capability
• Dedicated high-speed carry logic for arithmetic functions
• Wide multiplexers for efficient utilization
CLBs are the main logic resources for implementing sequential as well as combinatorial
circuits. Each CLB element is connected to a switch matrix for access to the general routing
matrix (shown in Figure 5.5). A CLB element contains a pair of slices.

Figure 5.5 : CLB diagram

The LUTs in 7 series FPGAs can be configured as either a 6-input LUT with one output, or as
two 5-input LUTs with separate outputs but common addresses or logic inputs. Each 5-input
LUT output can optionally be registered in a flip-flop. Four such 6-input LUTs and their eight
flip-flops as well as multiplexers and arithmetic carry logic form a slice, and two slices form a
CLB. Four flip-flops per slice (one per LUT) can optionally be configured as latches. In that
case, the remaining four flip-flops in that slice must remain unused. Approximately two-thirds
of the slices are SLICEL logic slices and the rest are SLICEM, which can also use their LUTs
as distributed 64-bit RAM or as 32-bit shift registers (SRL32) or as two SRL16s. Modern
synthesis tools take advantage of these highly efficient logic, arithmetic, and memory features.
Expert designers can also instantiate them.

7 Series CLB Features


The 7 series CLB is identical to that in the Virtex®-6 FPGA family. The CLB is very similar to
that of the Spartan®-6 FPGA family with these differences:
• Columnar architecture
• Scales easily to higher densities
• More routing between CLBs
• SLICEL and SLICEM only (no Spartan-6 FPGA SLICEX)
• All slices support carry logic
• More optimized
The common features in the CLB structure simplify design migration from the Spartan-6 and
Virtex-6 families to the 7 series devices. The unique floor plan means that location constraints
should be removed before implementing designs originally targeted to earlier FPGAs. The
interconnect routing resources are increased in size, quantity, and flexibility relative to the
Virtex-6 FPGA family, improving the quality of automatic place and route results.

Device Resources
The CLB resources are scalable across all the 7 series families, providing a common
architecture that improves efficiency, IP implementation, and design migration. The number
of CLBs and the ratio between CLBs and other device resources differentiates the 7 series
families. Migration between the 7 series families does not require any design changes for the
CLBs.
Device capacity is often measured in terms of logic cells, which are the logical equivalent of a
classic four-input LUT and a flip-flop. The 7 series FPGA CLB six-input LUT, abundant flip-
flops and latches, carry logic, and the ability to create distributed RAM or shift registers in the
SLICEM, increase the effective capacity. The ratio between the number of logic cells and 6-
input LUTs is 1.6:1.
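As a worked example of the 1.6:1 ratio, the following Python sketch applies it to the slice count quoted earlier for the 7A200T (33,650 slices). The variable names are illustrative, and integer arithmetic is used so the result is exact.

```python
SLICES = 33_650          # slice count quoted for the Artix-7 7A200T
LUTS_PER_SLICE = 4
FLIP_FLOPS_PER_SLICE = 8

luts = SLICES * LUTS_PER_SLICE
flip_flops = SLICES * FLIP_FLOPS_PER_SLICE
logic_cells = luts * 16 // 10   # ratio of logic cells to 6-input LUTs is 1.6:1

print(luts)         # 134600
print(flip_flops)   # 269200
print(logic_cells)  # 215360
```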

Recommended Design Flow


CLB resources are inferred for generic design logic and do not require instantiation. Good
HDL design is sufficient. A few items to note:
• CLB flip-flops have either a set or a reset. The designer must not use both set and reset.
• Flip-flops are abundant. Pipelining should be considered to improve performance.
• Control inputs are shared across a slice or CLB. The number of unique control inputs
required for a design should be minimized. Control inputs include clock, clock enable,
set/reset, and write enable.
• A 6-input LUT can be used as a 32-bit shift register for efficient implementation.
• A 6-input LUT can be used as a 64 x 1 memory for small storage requirements.
• Dedicated carry logic implements arithmetic functions effectively.
These steps indicate the recommended design flow:
1. Implement the design using preferred methodologies (HDL, IP, etc.).
2. Evaluate utilization reports to determine resources used. Check to make sure arithmetic
logic, distributed RAM, and SRL are used, when helpful.
3. Consider flip-flop usage:
a. Pipeline for performance.
b. Use dedicated flip-flops at the outputs of dedicated resources (block RAM, DSP).
c. Allow shift registers to use SRL (avoid set/resets).
4. Minimize the use of set/resets.

Pinout Planning
Although the use of most resources affects the resulting device pinout, CLB usage has little
effect on pinouts because they are distributed throughout the device. The ASMBL™
architecture provides maximum flexibility with CLBs on both sides of most I/Os.
The best approach is to let the tools choose the I/O locations based on the FPGA requirements.
Results can be adjusted if necessary for board layout considerations. The timing constraints
should be set so that the tools can choose optimal placement for the design requirements.
Carry logic cascades vertically up a column, so wide arithmetic buses might drive a vertical
orientation to other logic, including I/O.
While most 7 series devices are available in flip-chip packages, taking full advantage of the
distributed I/O in the ASMBL architecture, the smaller devices are available in wire-bond
packages at a lower cost. In these packages, some pins are naturally closer to the I/Os and
special resources than others, so pin placement should be done after the internal logic is
defined.

Slice Description
Every slice contains:
• Four logic-function generators (or look-up tables)
• Eight storage elements
• Wide-function multiplexers
• Carry logic
These elements are used by all slices to provide logic, arithmetic, and ROM functions.
In addition, some slices support two additional functions: storing data using distributed RAM
and shifting data with 32-bit registers. Slices that support these additional functions are called
SLICEM; others are called SLICEL. SLICEM represents a superset of elements and
connections found in all slices. Each CLB can contain two SLICEL or a SLICEL and a
SLICEM.

SLICEM and SLICEL


The components discussed in the following sections all exist as pieces within a slice. This
does not mean that the whole is simply the sum of its parts! There are some unique features
to the slices themselves that allow an FPGA to expand its functionality.
First, the slices within a CLB are not connected to each other. They are physically oriented in
a similar fashion to the above diagram so that they may be connected with the same slice type
(SLICEM or SLICEL) within CLBs above or below, creating columns. This allows
interconnections between SLICEM or SLICEL in a column to create large scale functions.
The distinguishing feature of the two slice types is the configurability of the SLICEM.
SLICEM can be configured so that the look-up tables within it can act as shift registers or as
data storage (creating distributed memory on the chip) in addition to its normal logic
functionality.
A note on naming: the ‘M’ may be an indication of its ability to act as distributed memory,
while the ‘L’ may be an indication of its exclusive logic functionality. This is just speculative
but it can be helpful to remember which is which.

Figure 5.6 : SLICEM

Figure 5.7 : SLICEL

Look-Up Table (LUT)


The function generators in 7 series FPGAs are implemented as six-input look-up tables
(LUTs). There are six independent inputs (A inputs - A1 to A6) and two independent outputs
(O5 and O6) for each of the four function generators in a slice (A, B, C, and D). The function
generators can implement:
• Any arbitrarily defined six-input Boolean function
• Two arbitrarily defined five-input Boolean functions, as long as these two functions
share common inputs
• Two arbitrarily defined Boolean functions of 3 and 2 inputs or less
A six-input function uses:
• A1-A6 inputs
• O6 output
Two five-input or less functions use:
• A1–A5 inputs
• A6 driven High
• O5 and O6 outputs
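The two-output behavior can be sketched in Python by treating the LUT contents as a 64-bit integer. This is a simplified model: the function names are hypothetical, and the bit ordering (A1 as the least significant address bit, O5 reading the lower 32 entries, O6 reading all 64) is an assumption of the sketch.

```python
def lut6(init, a1, a2, a3, a4, a5, a6):
    """Return (O5, O6) for a LUT whose 64 stored bits are held in init."""
    addr5 = a1 | (a2 << 1) | (a3 << 2) | (a4 << 3) | (a5 << 4)
    o5 = (init >> addr5) & 1                  # lower 32 entries only
    o6 = (init >> (addr5 | (a6 << 5))) & 1    # full 64 entries
    return o5, o6


def pack_two_functions(f_o5, f_o6):
    """Store two 5-input functions in one LUT: f_o5 low half, f_o6 high half."""
    init = 0
    for addr in range(32):
        bits = [(addr >> i) & 1 for i in range(5)]
        init |= (f_o5(*bits) & 1) << addr
        init |= (f_o6(*bits) & 1) << (addr + 32)
    return init


def xor5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


def and5(a, b, c, d, e):
    return a & b & c & d & e


init = pack_two_functions(xor5, and5)
# With A6 driven High, O6 reads the upper half while O5 reads the lower half.
print(lut6(init, 1, 1, 1, 1, 1, 1))  # (1, 1)
print(lut6(init, 1, 0, 0, 0, 0, 1))  # (1, 0)
```

Driving A6 High is what lets O6 address the second stored function while O5 continues to address the first, so both share the A1 to A5 inputs.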
The propagation delay through a LUT is independent of the function implemented. Signals
from the function generators can:
• Exit the slice (through A, B, C, D output for O6 or AMUX, BMUX, CMUX, DMUX
output for O5)
• Enter the XOR dedicated gate from an O6 output
• Enter the carry-logic chain from an O5 output
• Enter the select line of the carry-logic multiplexer from O6 output
• Feed the D input of the storage element
• Go to F7AMUX/F7BMUX wide multiplexers from O6 output
In addition to the basic LUTs, slices contain three multiplexers (F7AMUX, F7BMUX, and
F8MUX). These multiplexers are used to combine up to four function generators to provide
any function of seven or eight inputs in a slice.
• F7AMUX: Used to generate seven input functions from LUTs A and B
• F7BMUX: Used to generate seven input functions from LUTs C and D
• F8MUX: Used to combine all LUTs to generate eight input functions.
Functions with more than eight inputs can be implemented using multiple slices. There are no
direct connections between slices to form function generators greater than eight inputs within
a CLB.
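The way the three multiplexers combine four LUTs into one eight-input function can be sketched in Python as follows. The helper names are hypothetical, and which LUT is chosen for a select value of 0 or 1 is an assumption of this sketch rather than a statement about the actual slice wiring.

```python
def mux2(sel, in0, in1):
    """Two-input multiplexer."""
    return in1 if sel else in0


def build_slice_luts(func8):
    """Split an 8-input function into four 64-entry tables (LUTs A to D)."""
    return [[func8(quarter * 64 + addr) & 1 for addr in range(64)]
            for quarter in range(4)]


def eval_slice(luts, x):
    """Evaluate the 8-input function for input word x (0..255)."""
    lut_a, lut_b, lut_c, lut_d = luts
    low6 = x & 0x3F          # six inputs shared by all four LUTs
    s7 = (x >> 6) & 1        # seventh input: select for F7AMUX and F7BMUX
    s8 = (x >> 7) & 1        # eighth input: select for F8MUX
    f7a = mux2(s7, lut_a[low6], lut_b[low6])   # seven-input function from A, B
    f7b = mux2(s7, lut_c[low6], lut_d[low6])   # seven-input function from C, D
    return mux2(s8, f7a, f7b)                  # eight-input function


def parity8(x):
    return bin(x).count("1") & 1


luts = build_slice_luts(parity8)
print(eval_slice(luts, 0b10110001))  # 0 (four ones, even parity)
```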

Storage Elements
There are eight storage elements per slice. Four can be configured as either edge-triggered
D-type flip-flops or level-sensitive latches. The D input can be driven directly by a LUT output
via AFFMUX, BFFMUX, CFFMUX, or DFFMUX, or by the BYPASS slice inputs bypassing
the function generators via AX, BX, CX, or DX input. When configured as a latch, the latch is
transparent when the CLK is Low.
There are four additional storage elements that can only be configured as edge-triggered D-
type flip-flops. The D input can be driven by the O5 output of the LUT or the BYPASS slice
inputs via AX, BX, CX, or DX input. When the original four storage elements are configured
as latches, these four additional storage elements cannot be used.
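The difference between the two configurations can be sketched behaviorally in Python. This is an illustration only: the class names are hypothetical, the flip-flop is assumed to capture on the rising clock edge, and set/reset and clock enable are left out.

```python
class DLatch:
    """Level-sensitive latch: transparent while CLK is Low, as in the text."""

    def __init__(self):
        self.q = 0

    def update(self, clk, d):
        if clk == 0:
            self.q = d       # output follows D while transparent
        return self.q        # otherwise the stored value is held


class DFlipFlop:
    """Edge-triggered D flip-flop (rising edge assumed for this sketch)."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def update(self, clk, d):
        if self._prev_clk == 0 and clk == 1:
            self.q = d       # capture D only on the clock edge
        self._prev_clk = clk
        return self.q


stimulus = [(0, 1), (0, 0), (1, 1), (1, 0), (0, 1)]  # (clk, d) pairs
latch, ff = DLatch(), DFlipFlop()
latch_trace = [latch.update(clk, d) for clk, d in stimulus]
ff_trace = [ff.update(clk, d) for clk, d in stimulus]
print(latch_trace)  # [1, 0, 0, 0, 1]
print(ff_trace)     # [0, 0, 1, 1, 1]
```

The latch output changes whenever D changes during the Low phase, while the flip-flop output changes only at the clock edge.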

Programmable Interconnect

In Fig 5.8, a hierarchy of interconnect resources can be seen. There are long lines that can be
used to connect critical CLBs that are physically far from each other on the chip without
inducing much delay. These long lines can also be used as buses within the chip.
There are also short lines that are used to connect individual CLBs that are located physically
close to each other. Transistors are used to turn on or off connections between different lines.
There are also several programmable switch matrices in the FPGA to connect these long and
short lines together in specific, flexible combinations.
Three-state buffers are used to connect many CLBs to a long line, creating a bus. Special long
lines, called global clock lines, are specially designed for low impedance and thus fast
propagation times. These are connected to the clock buffers and to each clocked element in
each CLB. This is how the clocks are distributed throughout the FPGA, ensuring minimal
skew between clock signals arriving at different flip-flops within the chip.
In an ASIC, the majority of the delay comes from the logic in the design, because logic is
connected with metal lines that exhibit little delay. In an FPGA, however, most of the delay in
the chip comes from the interconnect, because the interconnect, like the logic, is fixed on the
chip. Connecting one CLB to another CLB in a different part of the chip often
requires a connection through many transistors and switch matrices, each of which introduces
extra delay.

Figure 5.7 : SLICEL

Macros
Macros are created from multiple design element primitives. The following are the different
types of macros:
• Hard Macro (.nmc)
When we add a hard macro to our design, we are adding an instance of a library hard macro.
Our design can contain multiple instances of the same library hard macro, but each hard
macro must have a unique name. We can use FPGA Editor to create hard macros using either
of the following methods:
• Save a design as a hard macro.
• Create a hard macro. This method is only recommended if we have advanced hand
routing skills and knowledge of our targeted architecture.
Note: RPMs are recommended instead of hard macros wherever possible, because hard
macros do not allow timing analysis. A hard macro is seen as a "black box" by the Xilinx®
timing tools. Timing can be analyzed to the input and output of the hard macro, but we must
manually verify the timing paths within the hard macro.
• Relationally Placed Macro (RPM)
RPMs define the spatial relationship of the primitives that comprise the RPM. We can define
the relative placements of these primitives to create our own RPMs, using constraints in a
UCF file. After creating the RPM, we can use FPGA Editor to view the placement of the RPM
and to verify that it was created as expected.

Combinational Logic Implemented by Xilinx XC4000 CLB
• Any function of up to four variables, plus any second function of up to four unrelated
variables, plus any third function of up to three unrelated variables
• Any single function of five variables
• Any function of four variables together with some functions of six variables
• Some functions of up to nine variables.
A single function of five variables can be decomposed as:
F(a, b, c, d, e) = a•F(a=1) + a’•F(a=0)
- Both F(a=1) and F(a=0) are four-input functions

Figure 5.8 : Four variable implementation

Any function of four variables together with some functions of six variables can be
implemented by a single CLB
F(a,b,c,d,e,f) = a•b•F1 + a•b’•F2 + a’•b•F3 + a’•b’•F4
F1 = F(a=1, b=1);
F2 = F(a=1, b=0);
F3 = F(a=0, b=1);
F4 = F(a=0, b=0)
Condition: Among F1-F4, three of them are constant (e.g. F1=1, F2=F3=0)
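The condition above can be checked mechanically. The Python sketch below computes the four cofactors F1 to F4 of a six-variable function and tests whether at least three of them are constant. It checks only the condition as stated in the text; the function names are hypothetical and nothing here models the internal structure of the XC4000 CLB.

```python
from itertools import product


def cofactors(func):
    """Truth tables of F1..F4 = F(a=1,b=1), F(a=1,b=0), F(a=0,b=1), F(a=0,b=0)."""
    tables = []
    for a, b in [(1, 1), (1, 0), (0, 1), (0, 0)]:
        table = tuple(func(a, b, c, d, e, f) & 1
                      for c, d, e, f in product((0, 1), repeat=4))
        tables.append(table)
    return tables


def meets_condition(func):
    """True if at least three of the four cofactors are constant."""
    constant = [len(set(table)) == 1 for table in cofactors(func)]
    return sum(constant) >= 3


def and6(a, b, c, d, e, f):
    return a & b & c & d & e & f


def xor6(a, b, c, d, e, f):
    return a ^ b ^ c ^ d ^ e ^ f


print(meets_condition(and6))  # True: F1 = c*d*e*f, F2 = F3 = F4 = 0
print(meets_condition(xor6))  # False: no cofactor is constant
```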

Figure 5.9 : Four variable CLB implementation
Decoding Circuits
2-to-4 Decoding circuit

Figure 5.10 : 2 to 4 decoding circuit

10-to-1024 Decoding circuit

Figure 5.11 : 10 to 1024 decoding circuit


F1= x4•x5•x6•x7
F2= x0•x1•x2•x3
F3= x8•F1•F2
F4= x9•F3
F5= x9’•F3
Disadvantages
• It needs 1024 CLBs; expensive to implement.
• It is a two-level implementation, resulting in a large delay.
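The equations F1 to F5 can be evaluated in Python to see which decoder outputs they produce. This sketch reads F3 as x8•F1•F2 and takes x0 as the least significant input bit; both are assumptions about the figure, which is not reproduced here, and the function name is hypothetical.

```python
def decode_outputs(x):
    """Evaluate F4 and F5 for a 10-bit input word x (0..1023)."""
    bits = [(x >> i) & 1 for i in range(10)]          # bits[i] is x_i
    f1 = bits[4] & bits[5] & bits[6] & bits[7]        # first level
    f2 = bits[0] & bits[1] & bits[2] & bits[3]        # first level
    f3 = bits[8] & f1 & f2                            # combines F1 and F2
    f4 = bits[9] & f3                                 # second level
    f5 = (1 - bits[9]) & f3                           # second level
    return f4, f5


print(decode_outputs(1023))  # (1, 0): all ten inputs are 1
print(decode_outputs(511))   # (0, 1): x9 = 0, x8..x0 = 1
print(decode_outputs(100))   # (0, 0)
```

Under these assumptions, F4 is active for exactly one input combination and F5 for exactly one other, which illustrates why a full decoder built this way needs so many CLBs.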

Dedicated Decoding Circuits in Xilinx FPGAs


• Four dedicated programmable decoding circuits are included in Xilinx FPGAs.
• The number of decoder inputs ranges from 42 to 132 for different devices.
• The decoding circuits use wired-AND gate structures (like the AND plane in a PAL).

Figure 5.12 : Dedicated decoding circuit


FPGA Implementation of Sequential Logic
Sequential Circuit: the circuit outputs depend not only on the current values of the inputs but
also on previous input values.

Figure 5.13 : Sequential Logic

Storage Elements in Xilinx CLB

Figure 5.14 : Storage Elements in Xilinx CLB

• Each CLB contains two edge-triggered D flip-flops. They can be configured as
positive-edge-triggered or negative-edge-triggered.
• Each D flip-flop has a clock enable signal E, which is active high.
• Each D flip-flop can be set or reset by the SR signal. A global set or reset signal is also
available to set or reset all D flip-flops once the device is powered up.
FPGA Implementation of Finite State Machines
Example of Finite State Machine

Figure 5.15 : State transition diagram

Figure 5.16 : State Table

State Encoding
Binary encoding: minimum number of D flip-flops

It needs two D flip-flops


Implementation Using Binary Encoding
Excitation table



Combinational functions needed to be implemented
D1 = x’+ y + Q0 (F1)
D0 = Q1•Q0 + y’•Q0’ + x’•Q0’ (F2)
a = Q1•Q0’ (F3)
b = Q1’•Q0 (F4)
c = Q1’•Q0’ (F5)
d = Q0’ + Q1’ (F6)
e = Q1’ (F7)
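The functions F1 to F7 can be evaluated directly in Python, which is a quick way to tabulate the next state and outputs for any present state and input. The sketch only transcribes the equations as printed; the state table and diagram are figures that are not reproduced here, so the function names and the interpretation of (D1, D0) as the next value of (Q1, Q0) are assumptions.

```python
def next_state(q1, q0, x, y):
    """Return (D1, D0), the flip-flop inputs for the next clock edge."""
    nx, ny, nq0 = 1 - x, 1 - y, 1 - q0
    d1 = nx | y | q0                              # F1
    d0 = (q1 & q0) | (ny & nq0) | (nx & nq0)      # F2
    return d1, d0


def outputs(q1, q0):
    """Return the outputs a..e for the present state (F3 to F7)."""
    nq1, nq0 = 1 - q1, 1 - q0
    return {
        "a": q1 & nq0,     # F3
        "b": nq1 & q0,     # F4
        "c": nq1 & nq0,    # F5
        "d": nq0 | nq1,    # F6
        "e": nq1,          # F7
    }


print(next_state(0, 0, 0, 0))  # (1, 1)
print(next_state(0, 0, 1, 0))  # (0, 1)
print(outputs(1, 0))           # {'a': 1, 'b': 0, 'c': 0, 'd': 1, 'e': 0}
```

Each of these functions has at most four inputs, which is why each one fits in a single function generator.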

FPGA implementation

Figure 5.17 : FPGA implementation
