Module 3 - Full
S5 HONOURS
MODULE 3
Typical architectures for FPGAs
• FPGA architecture or organization refers to the manner or topology in which the
logic blocks and interconnect resources are distributed inside the FPGA.
• FPGAs can be classified into four basic architectures or
topologies:
• Matrix-based (symmetrical array) architectures
• Row-based architectures
• Hierarchical PLD architectures
• Sea-of-gates architecture
Matrix based (Symmetrical arrays) architecture:
• Logic blocks in this type of FPGA are organized in a matrix-like fashion
• Most Xilinx FPGAs belong to this category
• This architecture consists of logic elements (called CLBs) arranged in rows and
columns of a matrix, with interconnect laid out between them.
• These architectures typically contain 8x8 arrays in the smaller chips and 100x100
or larger arrays in the bigger chips.
• This symmetrical matrix is surrounded by I/O blocks, which connect it to the outside
world.
• The routing resources are interspersed between the logic blocks.
• The routing in these architectures is often called two-dimensional channeled
routing since routing resources are generally available in horizontal and vertical
directions.
Row based architecture:
• These architectures were inspired by traditional gate arrays
• Row-based architecture consists of alternating rows of logic modules and
programmable interconnect tracks.
• Routing tracks are divided into smaller segments connected by anti-fuse elements
between them.
• One row may be connected to adjacent rows via vertical interconnect.
• Traditional mask-programmable gate arrays use very similar architectures.
• The routing in these architectures is often called one-dimensional channeled
routing, because the routing resources are located as a channel in between rows of
logic resources.
Hierarchical PLD architecture:
• In Altera APEX20 and APEX II FPGAs, 10 or so logic elements are connected to
form what Altera calls a Logic Array Block (LAB), and then several LABs are
connected to form a MEGALAB.
• These FPGAs contain clusters of logic blocks with localized resources for
interconnection.
• The global interconnect network is used for the interconnections between the
clusters of logic blocks in these FPGAs.
• Each logic module has combinatorial as well as sequential functional
elements.
• Each of these functional elements is controlled by the programmed memory.
• Input/output blocks surround this arrangement of logic blocks and interconnect.
Sea of gates:
• The sea-of-gates architecture is yet another manner to organize the logic blocks
and interconnect in an FPGA.
• The general FPGA fabric consists of a large number of gates, with an
interconnect superimposed on the sea of gates, as illustrated in the figure.
• Plessey, a manufacturer that was in the FPGA market in the mid-1990s, made
FPGAs of this architecture.
• The basic cell used was a NAND gate, in contrast to the larger basic cells used
by manufacturers such as Xilinx.
• While the terminology sea of gates is the most popular, there are also the
terminologies sea of cells and sea of tiles to indicate the topology of FPGAs
with a large number of fine-grain logic cells.
• The Microsemi Fusion FPGAs contain a sea of tiles, where each tile can be
configured as a 3-input logic function or a flip-flop/latch.
Granularity
● FPGA logic blocks differ greatly in their size and implementation capability.
● The two-transistor logic block used in the Crosspoint FPGA can implement only an
inverter, but it is very small.
● The look-up table logic block used in the Xilinx 3000 series FPGA can implement
any five-input logic function but is significantly larger.
● To capture these differences, logic blocks are classified by their granularity.
● Granularity can be defined in various ways, for example, as the number of Boolean
functions that the logic block can implement, the number of equivalent two-input
NAND gates, the total number of transistors, total normalized area, or the number of
inputs and outputs.
Fine-Grained Logic Blocks
● Fine-grain logic blocks closely resemble MPGA basic cells.
● The finest-grain logic block would be identical to a basic cell of an MPGA
and would consist of a few transistors that can be programmably interconnected.
Each logic block can implement only a very simple function.
● For example, the logic block might be configured to act as any three-input
function, such as a primitive logic gate (AND, OR, NAND, etc.), or as a storage
element (D flip-flop, D latch, etc.).
Fine-grained FPGA families include:
● The Crosspoint FPGA
● The Plessey FPGA
The Crosspoint FPGA:
● The FPGA from Crosspoint Solutions uses a single transistor pair in
the logic block.
● In addition to the transistor-pair tiles, the Crosspoint FPGA has a second type of logic
block, called a RAM logic tile, that is tuned for the implementation of random access
memory but can also be used to build random logic functions.
The Plessey FPGA
● A second example of a fine-grain FPGA architecture is the FPGA from Plessey.
● Here the basic block is a two-input NAND gate, and logic is formed in the usual
way by connecting NAND gates.
● Algotronix: uses a two-input function block that can perform
any function of two inputs, implemented using a
configurable set of multiplexers.
● Concurrent Logic: the logic block of Concurrent Logic's FPGA
contains a two-input AND gate and a two-input EXCLUSIVE-OR
gate.
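To make the fine-grained idea concrete, the sketch below treats each two-input NAND as one cell (as in the Plessey FPGA) and builds an XOR from four of them, then shows that a half adder fits in a single cell containing one AND and one XOR gate (the Concurrent Logic style). All function names are hypothetical and the model is purely behavioural, on 0/1 integers.

```python
from itertools import product


def nand2(a: int, b: int) -> int:
    """One fine-grained cell: a two-input NAND gate."""
    return 1 - (a & b)


def xor_from_nands(a: int, b: int) -> int:
    """Classic four-NAND XOR; each call to nand2 stands for one cell."""
    n1 = nand2(a, b)
    n2 = nand2(a, n1)
    n3 = nand2(b, n1)
    return nand2(n2, n3)


def and_xor_cell(a: int, b: int) -> tuple[int, int]:
    """Hypothetical model of a cell holding one 2-input AND and one 2-input XOR."""
    return a & b, a ^ b


def half_adder(a: int, b: int) -> tuple[int, int]:
    """A half adder maps onto a single AND/XOR cell; returns (sum, carry)."""
    carry, total = and_xor_cell(a, b)
    return total, carry


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, xor_from_nands(a, b), half_adder(a, b))
```

The contrast illustrates the routing cost mentioned below: the XOR needs four cells and the wires between them, whereas a slightly larger cell absorbs it.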
ADVANTAGES:
● The usable blocks are fully utilized, since it is easier to use small logic gates
efficiently, and the logic synthesis techniques for such blocks are very similar
to those for conventional mask-programmed gate arrays and standard cells.
DISADVANTAGES:
● They require a relatively large number of wire segments and programmable
switches.
● Such routing resources are costly in delay and area.
● As a result, FPGAs employing fine-grain blocks are in general slower and
achieve lower densities than those employing coarse grain blocks.
Coarse-grained Logic block
● In a coarse-grained architecture, each block contains a relatively large
amount of logic compared to its fine-grained counterparts.
● For example, a logic block might contain four 4-input LUTs, four multiplexers,
four D-type flip-flops and some fast carry logic.
● Examples of coarse-grained FPGA families:
○ Actel
○ QuickLogic
○ Xilinx
● The Actel logic block is based on the ability of a multiplexer to implement
different logic functions by connecting each of its inputs to a constant or to a
signal.
● By connecting together a number of multiplexers and basic logic gates, a
logic block can be constructed which can implement a large number of
functions in this manner.
QuickLogic
● Coarse-grained architectures use a bus interconnect and processing elements (PEs) that
perform more than just bitwise operations, such as ALUs and multipliers.
● Some coarse-grained architectures comprise an array of nodes, where each
node is a highly complex processing element, ranging from an algorithmic
structure such as an FFT all the way up to a general-purpose microprocessor core.
● Between the fine-grained and coarse-grained extremes lie medium-grained architectures.
● Medium-grained architectures are classified as LUT-based and MUX-based.
Effect of Logic Block Granularity on FPGA Density and Performance
● Effect of Granularity on Density
○ As the granularity of a logic block increases, the number of blocks needed to
implement a design should decrease.
○ On the other hand, a more functional (larger granularity) logic block requires
more circuitry to implement it and therefore occupies more area.
○ This tradeoff suggests the existence of an “optimal” logic block granularity for
which the FPGA area devoted to logic implementation is minimized.
● An FPGA contains logic cells replicated in a regular array across the chip.
● The logic blocks vary in the basic components they use:
○ Look-up table (LUT) based logic blocks (Xilinx)
○ Multiplexers and logic gates (Microsemi/Actel)
○ PLD blocks (Altera)
○ Simple building blocks, such as
■ Transistor pairs (e.g., Crosspoint FPGAs)
■ NAND gates (e.g., Plessey)
Look-Up-Table–Based Programmable Logic Blocks
● Many look-up-table–based FPGAs use a 4-variable look-up table (often denoted by
the short form LUT4) plus a flip-flop as the basic element and then combine several
of them in various topologies.
● The LUT4 can also be called a 4-variable function generator since it can generate
any function of four variables.
● The inputs to the X-function generator are called X1, X2, X3, and X4.
● The functions can be steered to the output of the block (X) in combinational or latched
form.
● The D flip-flop can have clock enable, direct set, and direct reset inputs.
● A multiplexer selects between the combinatorial output and the latched version of the
output.
● The memory cell beneath the multiplexer provides the select signal that chooses
between the latched and unlatched forms of the function.
● Examples are
○ Xilinx Spartan/Virtex
○ Altera Cyclone II/APEX II
○ QuickLogic Eclipse/PolarPro
○ Lattice Semiconductor ECP
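The LUT4-plus-flip-flop block described above can be sketched behaviourally as follows. This is a minimal model under stated assumptions, not any vendor's actual cell: the 16-bit integer stands in for the LUT's SRAM cells, the `registered` flag stands in for the memory cell that drives the output multiplexer, and the class and method names (`Lut4Block`, `clock_edge`, `truth_table_from`) are hypothetical.

```python
def truth_table_from(func) -> int:
    """Pack a 4-input Boolean function into a 16-bit LUT configuration.

    Bit i holds f(X1, X2, X3, X4) where i = X4*8 + X3*4 + X2*2 + X1.
    """
    table = 0
    for address in range(16):
        x1, x2, x3, x4 = ((address >> k) & 1 for k in range(4))
        if func(x1, x2, x3, x4):
            table |= 1 << address
    return table


class Lut4Block:
    """Behavioural model: LUT4 + D flip-flop + output-select multiplexer."""

    def __init__(self, truth_table: int, registered: bool) -> None:
        self.truth_table = truth_table & 0xFFFF  # the 16 SRAM cells
        self.registered = registered             # config cell driving the mux
        self.q = 0                               # flip-flop state

    def lut(self, x1: int, x2: int, x3: int, x4: int) -> int:
        # The four inputs act as the SRAM address lines.
        address = (x4 << 3) | (x3 << 2) | (x2 << 1) | x1
        return (self.truth_table >> address) & 1

    def clock_edge(self, x1, x2, x3, x4, clock_enable: int = 1) -> None:
        # On an active clock edge the flip-flop captures the LUT output,
        # but only while clock enable is asserted.
        if clock_enable:
            self.q = self.lut(x1, x2, x3, x4)

    def direct_set(self) -> None:
        self.q = 1  # asynchronous set

    def direct_reset(self) -> None:
        self.q = 0  # asynchronous reset

    def output(self, x1, x2, x3, x4) -> int:
        # Output mux: registered (latched) form or combinational form.
        return self.q if self.registered else self.lut(x1, x2, x3, x4)


if __name__ == "__main__":
    parity = truth_table_from(lambda a, b, c, d: a ^ b ^ c ^ d)
    print(hex(parity))                           # 0x6996
    comb = Lut4Block(parity, registered=False)
    print(comb.output(1, 0, 0, 0))               # 1
```

Reprogramming the block to a different function only changes the 16 stored bits; the hardware (address decoding, flip-flop, output mux) stays the same.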
Logic Blocks Based on Multiplexers and Gates
● Any combinational function can be implemented using multiplexers alone.
● A 4-to-1 multiplexer can generate any 2-input function.
● Logic blocks similar to these were used in early Actel (now Microsemi) FPGAs
such as the ACT I and ACT II.
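The claim that a 4-to-1 multiplexer can generate any 2-input function can be checked directly: drive the two select lines with the function's inputs and tie the four data inputs to the constants of the desired truth table. The sketch below assumes that wiring; the function names are hypothetical.

```python
from itertools import product


def mux4(d0: int, d1: int, d2: int, d3: int, s1: int, s0: int) -> int:
    """4-to-1 multiplexer: (s1, s0) selects one of d0..d3."""
    return (d0, d1, d2, d3)[(s1 << 1) | s0]


def two_input_function(truth: tuple, a: int, b: int) -> int:
    """Realise f(a, b) by tying the data inputs to the constants
    truth = (f(0,0), f(0,1), f(1,0), f(1,1)) and using a, b as selects."""
    return mux4(*truth, a, b)


def all_two_input_functions_realisable() -> bool:
    # Check every one of the 2**4 = 16 possible two-input truth tables.
    return all(
        two_input_function(truth, a, b) == truth[2 * a + b]
        for truth in product((0, 1), repeat=4)
        for a, b in product((0, 1), repeat=2)
    )


if __name__ == "__main__":
    XOR = (0, 1, 1, 0)
    print([two_input_function(XOR, a, b) for a, b in product((0, 1), repeat=2)])
    print(all_two_input_functions_realisable())
```

Only the constants change between functions, which is why a multiplexer is attractive as a programmable logic element.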
FPGA logic cells
This section discusses the basic logic cell architectures of three major FPGA vendors:
❖ Xilinx LCA (Xilinx 2000, 3000 and 4000)
❖ Altera MAX
❖ Actel ACT 1, 2 and 3
Xilinx LCA
● Xilinx LCA (a trademark denoting logic cell array) basic logic cells, called configurable logic
blocks or CLBs, are bigger and more complex than the Actel or QuickLogic cells.
● The Xilinx LCA basic logic cell is an example of a coarse-grain architecture.
● The Xilinx CLBs contain both combinational logic and flip-flops.
● The basis for the Xilinx logic block is an SRAM functioning as a look-up table (LUT).
● The truth table for a K-input logic function is stored in a 2^K x 1 SRAM.
● The address lines of the SRAM function as inputs and the output of the SRAM provides the
value of the logic function.
● The advantage of look-up tables is that they exhibit high functionality
○ a K-input LUT can implement any function of K inputs.
● The disadvantage is that they become unacceptably large for more than about five inputs, since the
number of memory cells needed for a K-input look-up table is 2^K.
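A K-input LUT can be sketched as a list of 2^K stored bits indexed by the inputs, which shows both its generality (any truth table can be loaded) and its cost (each extra input doubles the memory). The helper names below are hypothetical, and reading the inputs MSB first is an assumption chosen to match the ordering of `itertools.product`.

```python
from itertools import product


def build_lut(k: int, func) -> list[int]:
    """Return the 2**k SRAM bits storing the truth table of a k-input function.
    Entry order follows the address, with the inputs read MSB first."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_read(sram: list[int], inputs) -> int:
    """Use the input values as SRAM address lines and return the stored bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return sram[address]


def memory_cells(k: int) -> int:
    # A k-input LUT needs 2**k cells, so each extra input doubles the size.
    return 2 ** k


if __name__ == "__main__":
    majority = build_lut(3, lambda a, b, c: a + b + c >= 2)
    print(majority)                        # [0, 0, 0, 1, 0, 1, 1, 1]
    print(lut_read(majority, (1, 1, 0)))   # 1
    for k in range(2, 9):
        print(k, memory_cells(k))
```

Going from 4 to 8 inputs multiplies the storage by 16 (from 16 to 256 cells), which is the growth that limits practical LUT sizes.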
XC 3000
● The XC3000 CLB has five logic inputs (A–E), a common clock input (K), an asynchronous direct-reset
input (RD), and an enable (EC).
● Using programmable MUXes connected to the SRAM programming cells, you can independently
connect each of the two CLB outputs (X and Y) to the output of the flip-flops (QX and QY) or to the
output of the combinational logic (F and G).
Actel ACT
● The basic logic cells in the Actel ACT family of FPGAs are called Logic
Modules.
● Actel ACT 1 uses one type of logic module, while ACT 2 and ACT 3 use two different
types of logic modules.
● It is possible to build a logic function by connecting logic signals to some or all
of the Logic Module inputs, and by connecting any remaining Logic Module
inputs to VDD or GND.
● The figure also shows the implementation of the logic function F = A.B + B.C’ + D
in the ACT 1 logic module with the help of Shannon’s expansion theorem
(Assignment 1).
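One possible mapping of F = A.B + B.C’ + D onto the ACT 1 Logic Module can be verified exhaustively. The sketch assumes the commonly described ACT 1 structure, which is not spelled out in these notes: two 2:1 multiplexers (inputs A0/A1 with select SA, and B0/B1 with select SB) feeding an output 2:1 multiplexer whose select is S0 OR S1. Expanding F about (A + D), which drives that output select, gives F = B + D when A + D = 1 and F = B.C’ when A + D = 0. The function names are hypothetical, and this is only one of several valid assignments.

```python
from itertools import product


def act1_module(a0, a1, sa, b0, b1, sb, s0, s1):
    """Assumed ACT 1 Logic Module: two 2:1 muxes feeding an output 2:1 mux
    whose select is the OR of S0 and S1."""
    m1 = a1 if sa else a0
    m2 = b1 if sb else b0
    return m2 if (s0 | s1) else m1


def f_target(a, b, c, d):
    """F = A.B + B.C' + D"""
    return (a & b) | (b & (1 - c)) | d


def f_on_act1(a, b, c, d):
    # Shannon expansion about (A + D), which drives the output mux select:
    #   A + D = 1  ->  F = B + D   (second mux: SB = D, B0 = B, B1 = 1)
    #   A + D = 0  ->  F = B.C'    (first mux:  SA = C, A0 = B, A1 = 0)
    return act1_module(a0=b, a1=0, sa=c, b0=b, b1=1, sb=d, s0=d, s1=a)


def mapping_is_correct() -> bool:
    # Exhaustive check over all 16 input combinations.
    return all(f_on_act1(*v) == f_target(*v) for v in product((0, 1), repeat=4))


if __name__ == "__main__":
    print(mapping_is_correct())
```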
● ACT 2 and ACT 3 use two different types of logic modules: the C-module and the
S-module.
● ACT 2 C-module
○ Combinational module
○ 5-input functions
● S-module (sequential module)
○ Same combinational function capability as the C-module, but with an
additional sequential element that can be configured as a flip-flop.
FPGA timing model
● Many FPGA and CPLD vendors provide a timing model in their data sheets that
allows estimation of path delays.
● Some example path delays that are of interest:
○ Minimum pin-to-pin (combinational) delay
■ (through an input pin, one combinational logic element and one output pin)
○ Minimum register-to-register delay
■ (from the clock input pin, through the clock-to-Q delay of a logic element's DFF,
through one combinational logic element, to the setup time at the DFF input)
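The two path delays listed above are simple sums of the data-sheet parameters along the path. The sketch below adds them up and converts a register-to-register delay into a maximum clock frequency (1 / delay). All delay values are hypothetical, the optional routing term is an assumption (data sheets may fold routing into other parameters), and the frequency estimate follows this simplified model rather than a full timing analysis.

```python
def pin_to_pin_delay(t_input_pin, t_logic, t_output_pin):
    """Combinational path: input pin -> one logic element -> output pin (ns)."""
    return t_input_pin + t_logic + t_output_pin


def register_to_register_delay(t_clock_pin, t_clk_to_q, t_logic, t_setup,
                               t_routing=0.0):
    """Clock pin -> clock-to-Q of the launching DFF -> one logic element
    -> setup time at the capturing DFF, plus an optional routing term (ns)."""
    return t_clock_pin + t_clk_to_q + t_logic + t_routing + t_setup


def max_clock_mhz(path_delay_ns):
    # f_max = 1 / t_path, and 1 / (1 ns) = 1000 MHz.
    return 1000.0 / path_delay_ns


if __name__ == "__main__":
    # Hypothetical data-sheet values, in ns.
    t_reg = register_to_register_delay(1.5, 1.0, 4.5, 3.0)
    print(t_reg, "ns ->", max_clock_mhz(t_reg), "MHz")   # 10.0 ns -> 100.0 MHz
    print(pin_to_pin_delay(2.0, 4.5, 3.5), "ns")         # 10.0 ns
```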
● Static timing analysis (STA) breaks a design down into timing paths, calculates the signal
propagation delay along each path, and checks for violations of timing constraints.
● Another way to perform timing analysis is to use dynamic simulation, which determines the full
behavior of the circuit for a given set of input stimulus vectors. Compared to dynamic
simulation, static timing analysis is much faster because it is not necessary to simulate the
logical operation of the circuit. STA is also more thorough because it checks all timing paths,
not just the logical conditions that are sensitized by a set of test vectors. However, STA can
only check the timing, not the functionality, of a circuit design.
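The core of STA can be sketched as a longest-path computation over an acyclic timing graph: every path is covered without simulating any input vectors. This is a minimal illustration only. The netlist, node names and delays are hypothetical, and real STA tools also model clock skew, separate rise/fall delays, hold checks and much more.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def latest_arrival_times(edges, launch_times):
    """Longest-path arrival time at every node of an acyclic timing graph.

    edges: {(from_node, to_node): delay_ns}
    launch_times: {start_node: time the signal leaves it, e.g. clock-to-Q}
    """
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(dst, set()).add(src)
        predecessors.setdefault(src, set())
    arrival = {}
    # static_order() yields every predecessor before the nodes it feeds.
    for node in TopologicalSorter(predecessors).static_order():
        incoming = [arrival[p] + edges[(p, node)] for p in predecessors[node]]
        arrival[node] = max(incoming, default=launch_times.get(node, 0.0))
    return arrival


def setup_slack(arrival, endpoints, clock_period_ns, t_setup_ns):
    """Slack = required time - arrival time; negative slack is a violation."""
    return {e: clock_period_ns - t_setup_ns - arrival[e] for e in endpoints}


# Hypothetical netlist: two launching flip-flops, two logic elements and
# the D input of one capturing flip-flop (ff3_d). Delays are made up.
EXAMPLE_EDGES = {
    ("ff1", "lut_a"): 1.5,
    ("ff2", "lut_a"): 1.0,
    ("lut_a", "lut_b"): 2.0,
    ("ff2", "lut_b"): 0.5,
    ("lut_b", "ff3_d"): 1.0,
}
EXAMPLE_LAUNCH = {"ff1": 1.0, "ff2": 1.0}  # clock-to-Q of the launching FFs

if __name__ == "__main__":
    arrival = latest_arrival_times(EXAMPLE_EDGES, EXAMPLE_LAUNCH)
    print(arrival["ff3_d"])                                # 5.5
    print(setup_slack(arrival, ["ff3_d"], 8.0, 1.0))       # {'ff3_d': 1.5}
```

Only the worst (latest) arrival at each node is kept, which is why STA checks every path in one linear pass over the graph instead of enumerating input vectors.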
● tSUD: Setup time is the minimum amount of time for which the input to a sequential
device must be stable before a clock edge.
● tH: Hold time is similar to setup time, but it deals with events after a clock
edge occurs. Hold time is the minimum amount of time for which the input
to a sequential device must be stable after a clock edge.
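The two definitions together describe a window around each clock edge in which the data input must not change. The sketch below flags data transitions that fall inside that window; the function name, the treatment of a change exactly at the edge (counted as a setup violation here) and the numbers in the example are assumptions for illustration.

```python
def classify_transitions(change_times, clock_edge, t_setup, t_hold):
    """Flag data transitions that fall inside the setup/hold window.

    Data must be stable from (clock_edge - t_setup) until (clock_edge + t_hold).
    A change in the part before the edge violates setup; a change after the
    edge but before the hold time has elapsed violates hold.
    """
    violations = []
    for t in change_times:
        if clock_edge - t_setup < t <= clock_edge:
            violations.append((t, "setup"))
        elif clock_edge < t < clock_edge + t_hold:
            violations.append((t, "hold"))
    return violations


if __name__ == "__main__":
    # Hypothetical: clock edge at 10 ns, tSUD = 2 ns, tH = 1 ns.
    print(classify_transitions([5.0, 9.0, 10.5, 12.0], 10.0, 2.0, 1.0))
    # [(9.0, 'setup'), (10.5, 'hold')]
```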
● For the Actel family, devices are sorted into speed grades according to the Logic Module
propagation delay tPD.
● The propagation delay is defined as the average of the rising (tPLH) and falling (tPHL)
propagation delays of a Logic Module.
● If the designer uses fully synchronous design techniques, one more
timing parameter needs to be considered, called the worst-case timing, which is
the maximum delay the design may encounter.
● The critical path delay between registers is given below.
Xilinx LCA Timing Model
● The above figure shows the timing model of Xilinx LCA FPGAs.
● Xilinx FPGAs use two speed grade systems.
● The first uses the maximum guaranteed toggle rate of a CLB flip-flop,
measured in MHz, as a suffix, so the higher the toggle rate, the faster the device.
● The second uses the approximate delay time of the combinational logic in a
CLB in nanoseconds, so the lower the delay, the faster the device.
● For example, an XC4010-6 has tILO (combinational logic delay) equal to 6.0 ns
(the correspondence between speed grade and tILO is fairly accurate for
XC2000, XC4000 and XC5200 but it is less accurate for XC3000).
Altera MAX timing model
● The above figure shows the Altera MAX timing model for local signals.
● In figure (a), an internal signal I1 enters the local array (LAB interconnect) with a
fixed delay t1 = tLOCAL = 0.5 ns, then passes through the AND array (delay t2 = tLAD =
4.0 ns) to the macrocell flip-flop (setup time t3 = tSU = 3.0 ns, and clock-to-Q
or register delay t4 = tRD = 1.0 ns). Thus the total path delay = 0.5 + 4.0 + 3.0 + 1.0
= 8.5 ns.
● Figure (c) shows the use of the parallel logic expander and figure (e) the shared
expander.
● With the parallel logic expander, the extra product term is generated in parallel
with the other product terms, whereas with the shared expander it is generated in
series.
● When the parallel expander is used, its delay tPEXP = 1.0 ns is added to the total
path delay, making it 9.5 ns in this case.
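The Altera MAX arithmetic above can be reproduced by summing the quoted parameters. The delay values are taken from the text; the constant and function names are hypothetical.

```python
# Delays (ns) quoted above for the Altera MAX local-signal path.
T_LOCAL = 0.5  # LAB local interconnect
T_LAD = 4.0    # AND (logic array) delay
T_SU = 3.0     # macrocell flip-flop setup time
T_RD = 1.0     # register (clock-to-Q) delay
T_PEXP = 1.0   # parallel expander delay


def max_local_path_delay(use_parallel_expander: bool = False) -> float:
    """Sum the delays along the path; the parallel expander adds tPEXP."""
    total = T_LOCAL + T_LAD + T_SU + T_RD
    if use_parallel_expander:
        total += T_PEXP
    return total


if __name__ == "__main__":
    print(max_local_path_delay())        # 8.5
    print(max_local_path_delay(True))    # 9.5
```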
Power Dissipation (Actel)
The power dissipation of FPGAs depends on factors such as utilization, average
operating frequency and load conditions, unlike most PALs and PLDs, which
have a fixed power consumption.
Power (µW) = CEQ × VCC² × F
Where: CEQ is the equivalent capacitance expressed in pF, VCC is the power supply in volts, and F is
the switching frequency in MHz.
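A quick numerical check of the equivalent-capacitance estimate, assuming it takes the form P = CEQ × VCC² × F with the units listed above. The 10 pF, 5 V, 20 MHz operating point is invented purely for illustration, and the function name is hypothetical.

```python
def active_power_uw(c_eq_pf: float, vcc_v: float, f_mhz: float) -> float:
    """P = CEQ * VCC^2 * F.

    Units: pF * V^2 * MHz = 1e-12 * 1e6 W = 1e-6 W, so the result is in uW.
    """
    return c_eq_pf * vcc_v ** 2 * f_mhz


if __name__ == "__main__":
    p = active_power_uw(10.0, 5.0, 20.0)   # hypothetical operating point
    print(p, "uW =", p / 1000.0, "mW")     # 5000.0 uW = 5.0 mW
```

Because the estimate scales linearly with F and with CEQ (which grows with the amount of logic actually switching), FPGA power depends on frequency and utilization in a way a fixed-consumption PAL's does not.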
Input characteristics:
● Inputs from the pad can be brought into the interior of the chip directly,
registered, or both.
● The polarity of each clock line is programmable.
● Input clamping diodes are provided for electrostatic protection.
● Both direct input (from IOB pin I) and registered input (from IOB pin Q) signals
are available for interconnect.
● For reliable operation, inputs should have transition times of less than 100 ns
and should not be left floating.
● Each user IOB includes a programmable high-impedance pull-up resistor,
which may be selected by the program to provide a constant High for
otherwise undriven package pins.
Output characteristics:
● Output signals can be inverted or not inverted, and can pass directly to
the pad or be stored in an edge-triggered flip-flop.
● Optionally, an output enable signal can be used to place the output buffer
in a high-impedance state, implementing 3-state outputs or bidirectional
I/O.
● Under configuration control, the output (OUT) and output enable (OE)
signals can be inverted.
● The slew rate of the output buffer can be reduced to minimize power bus
transients when switching non-critical signals.
● Programmable pull-up and pull-down resistors are useful for tying unused
pins to VCC or ground to minimize power consumption.
Input characteristics:
● The XC5200 inputs can be globally configured for either TTL (1.2V) or CMOS
thresholds, using an option in the bitstream generation software.
● The inputs of XC5200-Series 5-Volt devices can be driven by the outputs of any
3.3-Volt device, if the 5-Volt inputs are in TTL mode.
● The data input to the register can optionally be delayed by several nanoseconds.
● The XC5200 IOB has a one-tap delay element: either the delay is inserted
(default), or it is not. The delay guarantees a zero hold time with respect to clocks
routed through any of the XC5200 global clock buffers. For a shorter input register
setup time, with non-zero hold, attach a NODELAY attribute or property to the flip-
flop or input buffer.
Output characteristics:
● Output signals can be optionally inverted within the IOB, and pass directly to the
pad or can be made registered.
● An active-High 3-state signal can be used to place the output buffer in a high-
impedance state, implementing 3-state outputs or bidirectional I/O. Under
configuration control, the output (OUT) and output 3-state (T) signals can be
inverted. The polarity of these signals is independently configured for each IOB.
● The XC5200 devices provide a guaranteed output sink current of 8 mA.
● An output can be configured as open-drain (open-collector) by placing an OBUFT
symbol in a schematic or HDL code, then tying the 3-state pin (T) to the output
signal, and the input pin (I) to Ground.
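The open-drain trick above follows from the active-High 3-state control: with I tied to Ground and T tied to the signal, a 0 drives the pad low and a 1 releases it to high impedance. The behavioural sketch below assumes that logic; the pull-up resolution (an enabled pull-up, internal or external, pulling an undriven pad High) and all function names are illustrative assumptions, not part of any Xilinx library.

```python
HIGH_Z = "Z"


def obuft(i, t):
    """3-state output buffer with an active-High 3-state control:
    T = 1 puts the output in high impedance, T = 0 drives I onto the pad."""
    return HIGH_Z if t else i


def open_drain(signal):
    # Tie T to the signal and I to Ground: a 0 is driven low,
    # a 1 releases the pad (high impedance).
    return obuft(i=0, t=signal)


def pad_level(driven, pull_up=True):
    """Resolve the pad: an undriven pad is pulled High if a pull-up is present."""
    if driven == HIGH_Z:
        return 1 if pull_up else HIGH_Z
    return driven


if __name__ == "__main__":
    for s in (0, 1):
        print(s, open_drain(s), pad_level(open_drain(s)))
```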