Unit Iii PDF
-Ankita Tijare
ASIC Logic Cells
All FPGAs contain a basic logic cell replicated in a
regular array across the chip.
There are three different types of basic logic cells:
multiplexer based
look-up table based
programmable array logic
ACT1 Logic Module:
Logic functions can be built by connecting logic signals to some or all of the
Logic Module's inputs and by connecting the remaining Logic Module
inputs to VDD or GND.
Figure 1 The Actel ACT1 architecture. (a) Organization of the basic cells. (b) The ACT1 logic
module. (c) An implementation using pass transistors. (d) An example logic macro.
Shannon's Expansion Theorem
We can use Shannon's expansion theorem to expand a
function:
F = A F (A = 1) + A' F (A = 0)
where F(A=1) is the function evaluated with A=1 and
F(A=0) is the function evaluated with A=0
Example: F = A' B + A B C' + A' B' C
= A (B C') + A' (B + B' C)
F (A = '1') = B C' is the cofactor of F with respect to (wrt) A,
or F_A
Shannon's Expansion Theorem (contd.)
Eventually we reach the unique canonical form, which uses
only minterms (a minterm is a product term that contains all
the variables of F, such as A B' C)
The final result for the example above should be:
F = A' B C + A' B' C + A B C' + A' B C'
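The expansion and the canonical form above can be checked by brute force. The following is a minimal sketch, not part of the original slides; the function names are hypothetical, and minterms are numbered with A as the most significant bit (an assumption about the numbering).

```python
from itertools import product

def F(a, b, c):
    """F = A'B + ABC' + A'B'C from the example."""
    return ((not a) and b) or (a and b and (not c)) or ((not a) and (not b) and c)

def shannon_wrt_a(a, b, c):
    """A . F(A=1) + A' . F(A=0), built from the two cofactors."""
    f1 = F(True, b, c)    # cofactor F(A=1), which reduces to B C'
    f0 = F(False, b, c)   # cofactor F(A=0), which reduces to B + B'C
    return (a and f1) or ((not a) and f0)

def minterms(f):
    """Minterm numbers of f, with A as the most significant bit (assumed ordering)."""
    return [4 * a + 2 * b + c
            for a, b, c in product((0, 1), repeat=3)
            if f(bool(a), bool(b), bool(c))]

# The expansion must agree with F on all eight input combinations.
expansion_ok = all(bool(F(a, b, c)) == bool(shannon_wrt_a(a, b, c))
                   for a, b, c in product((False, True), repeat=3))
print(expansion_ok, minterms(F))
```

The four minterm numbers printed correspond to A' B' C, A' B C', A' B C and A B C', the same four product terms as the canonical form.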
Using Shannon's Expansion Theorem to Map a
Function to an ACT1 Logic Module
Another example: F = (A B) + (B' C) + D
Expand F wrt B: F = B (A + D) + B' (C + D)
= B F2 + B' F1
Where F1= (C + D) and F2 = (A + D)
The function F can be implemented by a 2:1 MUX, with B
selecting between two inputs: F (B = '1') and F (B = '0')
F also describes the output of the ACT 1 LM
Now we need to split up F1 and F2
Expand F1 wrt C: F1 = C + D = (C · 1) + (C' · D)
Expand F2 wrt A: F2 = A + D = (A · 1) + (A' · D)
C connects to the select line of a first-level mux in the ACT1
LM with 1 and D as the inputs to the mux
A connects to the select line of another first-level mux in the
ACT1 LM with 1 and D as inputs to the mux
B connects to the select line of the output mux with F1 and F2,
the outputs of the first-level muxes, connected to the inputs
See Figure 5.1(d) for implementation
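The mapping just described can be verified exhaustively. This is a sketch under stated assumptions: the ACT1 LM is modelled as two first-level 2:1 muxes feeding an output mux whose select is S0 OR S1, and the names mux, act1_lm and the pin names are illustrative rather than taken from a vendor library.

```python
from itertools import product

def mux(a0, a1, sa):
    """2:1 MUX: returns a0 when sa is 0 and a1 when sa is 1."""
    return a1 if sa else a0

def act1_lm(a0, a1, sa, b0, b1, sb, s0, s1):
    """Assumed ACT1 model: two first-level muxes and an output mux selected by S0 OR S1."""
    return mux(mux(a0, a1, sa), mux(b0, b1, sb), s0 | s1)

def mapped_f(a, b, c, d):
    # F1 = C + D on the first mux (passed to the output when B = 0),
    # F2 = A + D on the second mux (passed to the output when B = 1).
    return act1_lm(a0=d, a1=1, sa=c, b0=d, b1=1, sb=a, s0=b, s1=0)

def target_f(a, b, c, d):
    """F = (A B) + (B' C) + D."""
    return (a & b) | ((1 - b) & c) | d

# Compare the mapped module against F on all 16 input combinations.
mapping_ok = all(mapped_f(*v) == target_f(*v) for v in product((0, 1), repeat=4))
print(mapping_ok)
```

Tying S1 to '0' is one way to make the OR gate pass B unchanged to the output mux select.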
Ways to Arrange a Karnaugh Map of 2 Variables
Fig. The logic functions of two variables.
Boolean Functions of Two Variables Using a 2:1 MUX
No. | Function, F   | F =       | Canonical form            | Minterms   | Minterm code | Function number | M1: A0 | A1 | SA
1   | '0'           | '0'       | '0'                       | none       | 0000         | 0               | 0      | 0  | 0
2   | NOR1-1(A, B)  | (A + B')' | A' B                      | 1          | 0010         | 2               | B      | 0  | A
3   | NOT(A)        | A'        | A' B' + A' B              | 0, 1       | 0011         | 3               | 1      | 0  | A
4   | AND1-1(A, B)  | A B'      | A B'                      | 2          | 0100         | 4               | A      | 0  | B
5   | NOT(B)        | B'        | A' B' + A B'              | 0, 2       | 0101         | 5               | 1      | 0  | B
6   | BUF(B)        | B         | A' B + A B                | 1, 3       | 1010         | 10              | 0      | B  | 1
7   | AND(A, B)     | A B       | A B                       | 3          | 1000         | 8               | 0      | B  | A
8   | BUF(A)        | A         | A B' + A B                | 2, 3       | 1100         | 12              | 0      | A  | 1
9   | OR(A, B)      | A + B     | A' B + A B' + A B         | 1, 2, 3    | 1110         | 14              | B      | 1  | A
10  | '1'           | '1'       | A' B' + A' B + A B' + A B | 0, 1, 2, 3 | 1111         | 15              | 1      | 1  | 1
ACT1 LM as a Function Wheel (cont.)
A 2:1 MUX is a function wheel that can generate BUF, INV,
AND-11, AND1-1, OR, AND
Define a function
WHEEL (A, B) = MUX (A0, A1, SA)
MUX (A0, A1, SA) = A0 SA' + A1 SA
Each of the inputs (A0, A1, and SA) may be A, B, '0', or '1'
The ACT 1 LM is built from two function wheels, a 2:1 MUX,
and a two-input OR gate:
ACT 1 LM = MUX [WHEEL1, WHEEL2, OR (S0, S1)]
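To see which two-input functions a single function wheel can reach, one can enumerate every choice of A0, A1 and SA from A, B, '0' and '1'. The sketch below assumes the MUX definition given above (A0 SA' + A1 SA); the names SOURCES and wheel_functions are hypothetical.

```python
from itertools import product

def mux(a0, a1, sa):
    """2:1 MUX: A0 SA' + A1 SA."""
    return a1 if sa else a0

# Each wheel input may be tied to A, B, '0' or '1'.
SOURCES = {
    "A": lambda a, b: a,
    "B": lambda a, b: b,
    "0": lambda a, b: 0,
    "1": lambda a, b: 1,
}

def wheel_functions():
    """Map each reachable truth table (F at AB = 00, 01, 10, 11) to one (A0, A1, SA) choice."""
    found = {}
    for n0, n1, ns in product(SOURCES, repeat=3):
        table = tuple(
            mux(SOURCES[n0](a, b), SOURCES[n1](a, b), SOURCES[ns](a, b))
            for a, b in product((0, 1), repeat=2)
        )
        found.setdefault(table, (n0, n1, ns))
    return found

functions = wheel_functions()
for table, inputs in sorted(functions.items()):
    print(table, inputs)
```

In this model AND and OR are reachable, while functions that need an inversion of the mux output, such as NAND and XOR, are not, which is why the full Logic Module combines two wheels with an output MUX.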
ACT1 LM as a Function Wheel
Fig. The ACT1 logic module as a Boolean function generator.
(a) A 2:1 MUX viewed as a logic wheel.
(b) The ACT1 logic module viewed as two function wheels.
Example of Implementing a Function
with an ACT1 LM
Example of using the WHEEL functions to implement:
F = NAND (A, B) = (A B)'
1. First express F as the output of a 2:1 MUX:
expand F wrt A (or wrt B, since F is symmetric)
F = A (B') + A' ('1')
2. Assign WHEEL1 to implement '1' and WHEEL2 to implement
INV (B), because the output MUX passes WHEEL1 when its select
is '0' and WHEEL2 when its select is '1'
3. Set the select input to the MUX connecting WHEEL1 and
WHEEL2, S0 + S1 = A. We can do this using S0 = A, S1 =
'0'
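A brute-force search is a quick way to confirm which wheel and select assignments actually give NAND. This sketch assumes the convention MUX (A0, A1, SA) = A0 SA' + A1 SA for the output MUX, with OR (S0, S1) as its select; the function name nand_assignments and the candidate lists are illustrative.

```python
from itertools import product

def mux(a0, a1, sa):
    """2:1 MUX: A0 SA' + A1 SA."""
    return a1 if sa else a0

def nand_assignments():
    """List (WHEEL1, WHEEL2, S0, S1) choices for which the output MUX gives NAND(A, B)."""
    wheels = {"'1'": lambda a, b: 1, "INV(B)": lambda a, b: 1 - b}
    selects = {"A": lambda a, b: a, "'0'": lambda a, b: 0, "'1'": lambda a, b: 1}
    hits = []
    for w1, w2, s0, s1 in product(wheels, wheels, selects, selects):
        if all(
            mux(wheels[w1](a, b), wheels[w2](a, b),
                selects[s0](a, b) | selects[s1](a, b)) == 1 - (a & b)
            for a, b in product((0, 1), repeat=2)
        ):
            hits.append((w1, w2, s0, s1))
    return hits

print(nand_assignments())
```

Under these assumptions every working assignment has '1' on the wheel selected when A = 0 and INV (B) on the wheel selected when A = 1, with the OR gate reducing to A (for example S0 = A, S1 = '0').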
A single Actel ACT1 LM can implement all combinational
two-input functions, most three-input functions and many
four-input functions
A transparent D latch can be implemented with one ACT1 LM
and an edge-triggered D flip-flop can be implemented with two
LMs
ACT 2, ACT 3 Logic Modules:
ACT 1 requires 2 LMs per flip-flop, with unknown
interconnect capacitance
ACT 2 and ACT 3 use two types of LMs, one of which includes a
D flip-flop
ACT 2 C-Module is similar to the ACT 1 LM but can
implement five-input logic functions. The C-Module
(combinatorial module) implements combinational logic
(blame MMI for the misuse of terms)
ACT 2 S-Module (sequential module) contains a C-Module
and a sequential element
Actel ACT2 and ACT3 Logic Modules
Figure 5.4 The ACT2 and ACT3 logic modules. (a) The C-module. (b) The ACT2 S-module.
(c) The ACT3 S-module. (d) The equivalent circuit of the SE. (e) The SE configured as a
positive edge-triggered D flip-flop.
Actel Timing Model
Fig. The Actel ACT timing model. (a) The timing parameters for a std
speed grade ACT3. (b) Flip-flop timing. (c) An example of flip-flop timing
based on ACT3 parameters.
Actel Timing Model:
tSUD - setup time inside the S-module of the flip-flops
tH - hold time inside the S-module of the flip-flops
tCO - clock propagation delay
tPD - propagation delay of the combinational logic inside
the S-module
tCLKD - the delay of the combinational logic that drives
the flip-flop clock signal
Xilinx Configurable Logic Block (CLB)
Fig. The Xilinx XC3000 CLB (configurable logic block).
The combinational function in a CLB is implemented with
a 32 bit look-up table (LUT)
LUT values are stored in 32 bits of SRAM
CLB delay is fixed and equal to the LUT access time.
The XC3000 CLB, shown in the figure, has five logic inputs
(A-E), a common clock input (K), an asynchronous
direct-reset input (RD), and an enable (EC).
Using programmable MUXes connected to the SRAM
programming cells, you can independently connect each of
the two CLB outputs (X and Y) to the output of the flip-
flops (QX and QY) or to the output of the combinational
logic (F and G).
Xilinx CLB (cont.)
There are several ways to use the LUT:
You can use five of the seven possible inputs (A-E,
QX, QY) with the entire LUT - the outputs (F, G) are
identical
You can split the 32-bit LUT in half to implement two
functions of four variables
The input variables can be chosen from A-E, QX, QY
Two of the inputs must come from A-E
You can split the 32-bit LUT in half, using one of the seven
input variables as a select input to a 2:1 MUX that switches
between F and G. This allows you to implement some
functions of six and seven variables.
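The LUT modes above can be illustrated with a small model in which the SRAM contents are a list of bits addressed by the inputs. This is a sketch only: the helper names are hypothetical, the example functions (parity, AND, OR) are arbitrary choices, and for brevity the two half-LUTs share the same four inputs, which a real CLB does not require.

```python
def make_lut(func, n_inputs):
    """Fill a 2**n_inputs-entry table (the SRAM contents) from a Boolean function."""
    table = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]  # bit i is input i
        table.append(func(*bits) & 1)
    return table

def read_lut(table, *inputs):
    """Look up the output: the inputs together form the SRAM address."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]

# One function of five variables uses all 32 bits (here: five-input parity).
parity5 = make_lut(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e, 5)

# Splitting the 32 bits in half gives two independent four-input functions F and G.
lut_f = make_lut(lambda a, b, c, d: a & b & c & d, 4)
lut_g = make_lut(lambda a, b, c, d: a | b | c | d, 4)

def f_g_mux(select, a, b, c, d):
    """A further variable selects between F and G through a 2:1 MUX."""
    return read_lut(lut_g, a, b, c, d) if select else read_lut(lut_f, a, b, c, d)

print(len(parity5), len(lut_f) + len(lut_g), read_lut(parity5, 1, 0, 1, 1, 0))
```

Because the output is a table lookup, the delay does not depend on which function is stored, which matches the fixed CLB delay noted earlier.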
Xilinx XC4000 Logic Block
Fig. The Xilinx XC4000 CLB (configurable logic block).
Figure shows the CLB used in the XC4000 series of
Xilinx FPGAs. This is a fairly complicated basic logic
cell containing 2 four-input LUTs that feed a three-input
LUT.
The XC4000 CLB also has special fast carry logic hard-
wired between CLBs.
MUX control logic maps four control inputs (C1-C4) into
the four inputs: LUT input H1, direct in (DIN), enable
clock (EC), and a set / reset control (S/R) for the flip-
flops.
The control inputs (C1-C4) can also be used to control
the use of the F' and G' LUTs as 32 bits of SRAM.
Xilinx XC5200 Logic Block
Basic Cell is called a Logic Cell (LC) and is similar to, but
simpler than, CLBs in other Xilinx families
Term CLB is used here to mean a group of 4 LCs (LC0-LC3)
Fig. The Xilinx XC5200 LC (logic cell) and CLB (configurable logic block).
The XC5200 LC contains a four-input LUT, a flip-flop, and
MUXes to handle signal switching. The arithmetic carry logic
is separate from the LUTs.
A limited capability to cascade functions is provided (using the
MUX labeled F5_MUX in logic cells LC0 and LC2) to gang
two LCs in parallel to provide the equivalent of a five-input
LUT.
Altera FLEX Architecture
Figure The Altera FLEX architecture. (a) Chip floorplan. (b) LAB (Logic Array
Block). (c) Details of the LE (logic element).
Figure shows the basic logic cell, a Logic Element ( LE ),
that Altera uses in its FLEX 8000 series of FPGAs.
The FLEX LE uses a four-input LUT, a flip-flop, cascade
logic, and carry logic. Eight LEs are stacked to form a
Logic Array Block (the same term as used in the MAX
series, but with a different meaning).
PROGRAMMABLE ASIC I/O CELLS
Different types of I/O requirements:
DC output
AC output
DC input
AC input
Clock input
Power input
DC Output
Fig. shows a robot arm driven by three small
motors together with switches to control the
motors. The motor armature current varies
between 50 mA and nearly 0.5 A when the
motor is stalled. Can we replace the switches
with an FPGA and drive the motors directly?
Fig. shows a CMOS complementary output
buffer used in many FPGA I/O cells and its DC
characteristics.
Values for the Xilinx XC5200 are as follows:
VOLmax = 0.4 V, low-level output voltage at
IOLmax = 8.0 mA.
VOHmin = 4.0 V, high-level output voltage at
IOHmax = 8.0 mA
If we force the output voltage , VO , of an output buffer,
using a voltage supply, and measure the output current,
IO , that results, we find that a buffer is capable of
sourcing and sinking far more than the specified
IOHmax and IOLmax values.
Most vendors do not specify output characteristics
because they are difficult to measure in production.
Thus we normally do not know the value of IOLpeak or
IOHpeak; typical values range from 50 to 200 mA.
Can we drive the motors by connecting several output buffers in
parallel to reach a peak drive current of 0.5 A?
Some FPGA vendors do specifically allow you to connect adjacent
output cells in parallel to increase the output drive.
If the output cells are not adjacent or are on different chips, there is
a risk of contention.
Contention will occur if, due to delays in the signal arriving at two
output cells, one output buffer tries to drive an output high while the
other output buffer is trying to drive the same output low.
If this happens we essentially short VDD to GND for a brief period.
Although contention for short periods may not be destructive, it
increases power dissipation and should be avoided.
It is thus possible to parallel outputs to increase the DC
drive capability, but it is not a good idea to do so
because we may damage or destroy the chip (by
exceeding the maximum metal electromigration limits).
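A rough count shows why paralleling is unattractive even before contention and electromigration are considered. The sketch below simply divides the 0.5 A stall current by the per-buffer current, using the specified 8 mA and the 50 mA and 200 mA ends of the typical peak range; the function name is hypothetical and equal current sharing between buffers is assumed.

```python
import math

def buffers_needed(load_current_ma, per_buffer_current_ma):
    """Smallest number of paralleled buffers whose combined current meets the load."""
    return math.ceil(load_current_ma / per_buffer_current_ma)

STALL_CURRENT_MA = 500  # stalled motor, 0.5 A

at_spec = buffers_needed(STALL_CURRENT_MA, 8)        # at the specified IOLmax
at_low_peak = buffers_needed(STALL_CURRENT_MA, 50)   # at a 50 mA peak
at_high_peak = buffers_needed(STALL_CURRENT_MA, 200) # at a 200 mA peak
print(at_spec, at_low_peak, at_high_peak)
```

Staying within the specified current would tie up dozens of output pads per motor, and relying on the unspecified peak current means operating outside the data sheet.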
An alternative is a simple circuit to boost the drive capability
of the output buffers.
If we need more power we could use two operational
amplifiers ( op-amps ) connected as voltage followers in
a bridge configuration.
For even more power we could use discrete power
MOSFETs or power op-amps.
Totem-Pole Output
Figure shows a totem-pole output buffer and its DC
characteristics. It is similar to the TTL totem-pole output
from which it gets its name (the totem-pole circuit has two
stacked transistors of the same type, whereas a
complementary output uses transistors of opposite types).
The high-level voltage, VOHmin, for a totem pole is lower
than VDD.
Clamp Diodes
Figure (c) shows the connection of clamp diodes (D1
and D2) that prevent voltage excursions at the I/O pad
greater than VDD and less than VSS.
Figure (d) shows the resulting characteristics.
AC Output:
Fig. shows an example of an off-chip three-state bus. Chips
that have inputs and outputs connected to a bus are called
bus transceivers .
Can we use FPGAs to perform the role of bus
transceivers?
We will focus on one bit, B1, on bus BUSA, and we shall
call it BUSA.B1. CHIP1.OE means the signal OE inside
CHIP1.
Initially CHIP2 drives BUSA.B1 high (CHIP2.D1 is '1' and CHIP2.OE is '1').
The buffer output enable on CHIP2 (CHIP2.OE) goes low, floating the bus. The
bus will stay high because we have a bus keeper, BK1.
The buffer output enable on CHIP3 (CHIP3.OE) goes high and the buffer drives
a low onto the bus (CHIP3.D1 is '0').
We wish to calculate the delays involved in driving the off-chip bus
in the figure. In order to find tfloat, we need to understand how Actel
specifies the delays for its I/O cells.
Fig (a) shows the circuit used for measuring I/O delays for the ACT
FPGAs.
Notice in Fig. (a) that when the output enable E is '0' the output is three-stated.
To measure the buffer delay (measured from the change in the
enable signal, E) Actel uses a resistor load (RL = 1 kΩ for
ACT2). The resistor pulls the buffer output high or low depending
on whether we are measuring:
tENZL, when the output switches from hi-Z to '0'.
tENLZ, when the output switches from '0' to hi-Z.
tENZH, when the output switches from hi-Z to '1'.
tENHZ, when the output switches from '1' to hi-Z.
Supply Bounce:
Fig. (a) shows an n -channel transistor, M1, that is part of an output
buffer driving an output pad, OUT1; M2 and M3 form an inverter
connected to an input pad, IN1; and M4 and M5 are part of another
output buffer connected to an output pad, OUT2.
As M1 sinks current pulling OUT1 low (Vo1 in Fig. b), a substantial
current IOL may flow in the resistance, RS, and inductance, LS, that
are between the on-chip GND net and the off-chip, external ground
connection.
As the pull-down device, M1, switches, it causes the
GND net (value VSS) to bounce
The supply bounce is dependent on the output slew rate
Ground bounce can cause other output buffers to
generate a logic glitch
Bounce can also cause errors on other inputs.
Transmission lines
A printed-circuit board (PCB) trace is a transmission (TX) line
(Z0 = 50 Ω to 100 Ω)
A driver launches an incident wave, which is reflected at the end of
the line
A connection starts to look like a TX line when the rise time is
about 2 × line delay (2 tf)
Transmission line termination
(a) Open-circuit or capacitive termination
(b) Parallel resistive termination
(c) Thévenin termination
(d) Series termination at the source
(e) Parallel termination using a voltage bias
(f) Parallel termination with a series capacitor
Open-circuit or capacitive termination. The bus termination is the
input capacitance of the receivers (usually less than 20 pF). The PCI
bus uses this method.
Parallel resistive termination. This requires substantial DC current
(5 V / 100 Ω = 50 mA for a 100 Ω line). It is used by bipolar logic,
for example emitter-coupled logic (ECL), where we typically do not
care how much power we use.
Thévenin termination. Connecting 300 Ω in parallel with 150 Ω
across a 5 V supply is equivalent to a 100 Ω termination connected
to a 1.6 V source. This reduces the DC current drain on the drivers
but adds a resistance directly across the supply.
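The equivalent termination values can be worked out directly from the divider. This sketch assumes the 300 Ω resistor goes to the 5 V supply and the 150 Ω resistor goes to ground (the reverse arrangement would give a different voltage); the function name thevenin is hypothetical.

```python
def thevenin(v_supply, r_top, r_bottom):
    """Thevenin equivalent of a divider: r_top to the supply, r_bottom to ground."""
    r_th = r_top * r_bottom / (r_top + r_bottom)     # the two resistors in parallel
    v_th = v_supply * r_bottom / (r_top + r_bottom)  # open-circuit divider voltage
    return r_th, v_th

r_th, v_th = thevenin(5.0, 300.0, 150.0)
static_current = 5.0 / (300.0 + 150.0)  # DC current drawn through the divider itself
print(r_th, round(v_th, 2), round(static_current * 1e3, 1))
```

The resistance comes out at 100 Ω and the voltage at roughly 1.7 V, close to the 1.6 V quoted above; the divider itself draws about 11 mA from the supply, which is the cost of the resistance across the supply.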
Series termination at the source. Adding a resistor in series with the
driver so that the sum of the driver source resistance (which is
usually 50 Ω or even less) and the termination resistor matches the
line impedance (usually around 100 Ω).
The disadvantage is that it generates reflections that may be close to
the switching threshold.
Parallel termination with a voltage bias. This is awkward because it
requires a third supply and is normally used only for a specialized
high-speed bus.
Parallel termination with a series capacitance. This removes the
requirement for DC current but introduces other problems.
DC Input:
A switch input:
(a) A pushbutton switch connected to an input buffer with a pull-
up resistor
(b) As the switch bounces several pulses may be generated
We might have to debounce this signal using an SR flip-flop or
small state machine
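One common form of the small state machine is a counter that accepts a new input value only after it has been stable for several consecutive samples. The sketch below is an illustration under assumptions not in the slides: the input is sampled at a regular rate, the idle level is '1' because of the pull-up, and the names debounce and stable_count are hypothetical.

```python
def debounce(samples, stable_count=3):
    """Counter-based debouncer: the output changes only after the input has
    held a new value for stable_count consecutive samples."""
    output = samples[0]
    candidate, run = output, 0
    result = []
    for s in samples:
        if s == output:
            run = 0                      # input agrees with the output: nothing pending
        elif s == candidate and run > 0:
            run += 1                     # the pending value was seen again
        else:
            candidate, run = s, 1        # a new pending value starts its run
        if run >= stable_count:
            output, run = candidate, 0   # accept the new value
        result.append(output)
    return result

# Pull-up idle level is 1; the switch bounces on both press and release.
bouncy = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
clean = debounce(bouncy)
print(clean)
```

The debounced output makes a single transition for the press and a single transition for the release, at the cost of a delay of stable_count samples.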
DC input
(a) A Schmitt-trigger inverter: lower switching threshold 2 V,
upper switching threshold 3 V; the difference between the thresholds is the
hysteresis
(b) A noisy input signal
(c) Output from an inverter with no hysteresis
(d) Hysteresis helps prevent glitches
(e) A typical FPGA input buffer with a hysteresis of 200 mV and a
threshold of 1.4 V
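The effect of hysteresis can be shown with a behavioural model. The sketch below uses the 2 V and 3 V thresholds from panel (a); the single 2.5 V threshold for the plain inverter, the sample voltages and the function names are assumptions made for illustration.

```python
def schmitt_inverter(voltages, v_low=2.0, v_high=3.0, initial_out=1):
    """Inverting Schmitt trigger: the output falls when the input rises above v_high
    and rises again only when the input drops below v_low."""
    out = initial_out
    result = []
    for v in voltages:
        if v > v_high:
            out = 0
        elif v < v_low:
            out = 1
        result.append(out)  # between the thresholds the previous output is held
    return result

def plain_inverter(voltages, threshold=2.5):
    """Inverter with a single threshold and no hysteresis."""
    return [0 if v > threshold else 1 for v in voltages]

def transitions(bits):
    """Count output changes between consecutive samples."""
    return sum(1 for x, y in zip(bits, bits[1:]) if x != y)

# A slow rising then falling input with noise around the mid-point.
noisy = [0.5, 2.4, 2.6, 2.4, 2.7, 3.2, 2.6, 2.4, 2.8, 1.8, 0.5]
print(plain_inverter(noisy), transitions(plain_inverter(noisy)))
print(schmitt_inverter(noisy), transitions(schmitt_inverter(noisy)))
```

With this input the plain inverter glitches each time the noise crosses its threshold, while the Schmitt trigger switches once on the way up and once on the way down.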
Noise margins
Transfer characteristics of a CMOS inverter with the lowest switching
threshold
The highest switching threshold
A graphical representation of CMOS logic thresholds
Logic thresholds at the inputs and outputs of a logic gate or an ASIC
The switching thresholds viewed as a plug and socket
CMOS plugs fit CMOS sockets and the clearances are the noise margins
Fig. (c) depicts the following relationships between the various
voltage levels at the inputs and outputs of a logic gate:
A logic '1' output must be between V OHmin and V DD .
A logic '0' output must be between V SS and V OLmax .
A logic '1' input must be above the high-level input voltage, V IHmin.
A logic '0' input must be below the low-level input voltage, V ILmax.
Clamp diodes prevent an input exceeding V DD or going lower than V SS.
TTL and CMOS logic thresholds
(a) TTL logic thresholds
(b) Typical CMOS logic thresholds
(c) A TTL plug will not fit in a CMOS socket
(d) Raising VOHmin solves the problem
AC Input:
Suppose we wish to connect an input bus containing sampled
data from an analog-to-digital converter ( A/D ) that is running
at a clock frequency of 100 kHz to an FPGA that is running
from a system clock on a bus at 10 MHz (a NuBus).
We have to perform some filtering and calculations on the
sampled data before placing it on the NuBus.
Metastability
If we change the data input to a flip-flop (or a latch) too close
to the clock edge (called a setup or hold-time violation ), we
run into a problem called metastability , illustrated in Fig.
In this situation the flip-flop cannot decide whether its output
should be a '1' or a '0' for a long time.
If the flip-flop makes a decision, at a time tr after the clock
edge, as to whether its output is a '1' or a '0', there is a small,
but finite, probability that the flip-flop will decide the output is
a '1' when it should have been a '0' or vice versa.
This situation, called an upset, can happen when the data is
coming from the outside world and the flip-flop cannot determine
when it will arrive; this is an asynchronous signal, because it
is not synchronized to the chip clock.
Experimentally we find that the probability of upset, p, is
p = T0 exp(-tr / tc)
(per data event, per clock edge, in one second, with units
1/Hz · 1/Hz · 1/s), where tr is the time a
sampler (flip-flop or latch) has to resolve the sampler output;
T0 and tc are constants of the sampler circuit design. Let us
see how serious this problem is in practice.
If tr = 5 ns, tc = 0.1 ns, and T0 = 0.1 s,
p = 0.1 exp(-5 / 0.1) = 2 × 10^-23 s,
which is very small, but the data and clock may be running at
several MHz, causing the sampler plenty of opportunities for
upset.
The mean time between upsets (MTBU, similar to MTBF,
mean time between failures) is
MTBU = 1 / (p fclock fdata) = exp(tr / tc) / (T0 fclock fdata)
where fclock is the clock frequency and fdata is the data
frequency.
If tr = 5 ns, tc = 0.1 ns, T0 = 0.1 s (as in the previous
example), fclock = 100 MHz, and fdata = 1 MHz, then
MTBU = 5.2 × 10^8 seconds, or about 16 years.
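The two metastability figures can be reproduced numerically. This is a minimal sketch using the formulas and values from the text; the function names and the conversion to years (365.25 days per year) are the only additions.

```python
import math

def upset_probability(t_r, t_c, t_0):
    """p = T0 * exp(-tr / tc)."""
    return t_0 * math.exp(-t_r / t_c)

def mtbu(t_r, t_c, t_0, f_clock, f_data):
    """MTBU = 1 / (p * fclock * fdata), in seconds."""
    return 1.0 / (upset_probability(t_r, t_c, t_0) * f_clock * f_data)

p = upset_probability(t_r=5e-9, t_c=0.1e-9, t_0=0.1)
t = mtbu(t_r=5e-9, t_c=0.1e-9, t_0=0.1, f_clock=100e6, f_data=1e6)
years = t / (365.25 * 24 * 3600)
print(f"p = {p:.1e} s, MTBU = {t:.1e} s, about {years:.0f} years")
```

Because tr sits inside the exponential, small changes in the resolution time move the MTBU by orders of magnitude, which is why giving the sampler more time to resolve is the usual remedy.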
Clock Input
When we bring the clock signal onto a chip, we may need
to adjust the logic level (clock signals are often driven by
TTL drivers with a high current output capability) and
then we need to distribute the clock signal around the
chip as it is needed.
FPGAs normally provide special clock buffers and clock
networks. We need to minimize the clock delay (or
latency), but we also need to minimize the clock skew.
Registered Inputs
Some FPGAs provide a flip-flop or latch that you can use as
part of the I/O circuit (registered I/O). For other FPGAs you
have to use a flip-flop or latch using the basic logic cell in the
core.
In either case the important parameter is the input setup time.
We can measure the setup with respect to the clock signal at
the flip-flop or the clock signal at the clock input pad.
The difference between these two parameters is the clock
delay.
Figure shows part of the I/O timing model for a Xilinx
XC4005-6.
tPICK is the fixed setup time for a flip-flop relative to
the flip-flop clock.
tskew is the variable clock skew, the signed delay
between two clock edges.
tPG is the variable clock delay or latency.
To calculate the flip-flop setup time ( tPSUFmin )
relative to the clock pad (which is the parameter system
designers need to know), we subtract the clock delay, so
that
tPSUF = tPICK - tPG
FIGURE Clock input. (a) Timing model with values for
a Xilinx
XC4005-6. (b) A simplified view of clock distribution.
(c) Timing diagram. Xilinx eliminates the variable
internal delay t PG , by specifying a pin-to-pin setup
time, t PSUFmin = 2 ns.
The problem is that we cannot easily calculate tPG,
since it depends on the clock distribution scheme and
where the flip-flop is on the chip. Instead Xilinx
specifies tPSUFmin directly, measured from the data pad
to the clock pad; this time is called a pin-to-pin timing
parameter. Notice tPSUFmin = 2 ns ≠
tPICK - tPGmax = 1 ns.
Power Input
The last item that we need to bring onto an FPGA is the
power. We may need multiple VDD and GND power pads to
reduce supply bounce or separate VDD pads for mixed-voltage
supplies. We may also need to provide power for on-chip
programming (in the case of antifuse or EPROM programming
technology).
The package type and number of pins will determine the
number of power pins, which, in turn, affects the number of
simultaneously switching outputs (SSOs) you can have in a design.
Power Dissipation
As a general rule a plastic package can dissipate about 1 W,
and more expensive ceramic packages can dissipate up to
about 2 W.
ASIC power consumption may dictate your choice of
packages. Actel provides a formula for calculating typical
dynamic chip power consumption of their FPGAs.
The formulas for the ACT 2 and ACT 3 FPGAs are
complex; therefore we shall use the simpler formula for the
ACT 1 FPGAs as an example:
Total chip power = 0.2 (N × F1) + 0.085 (M × F2) + 0.8 (P × F3) mW (6.7)
where
F1 = average logic module switching rate in MHz
F2 = average clock pin switching rate in MHz
F3 = average I/O switching rate in MHz
M = number of logic modules connected to the clock pin
N = number of logic modules used on the chip
P = number of I/O pairs used (input + output), with 50 pF load
As an example of a power-dissipation calculation, consider an
Actel 1020B-2 with a 20 MHz clock.
We shall initially assume 100 percent utilization of the 547
Logic Modules and assume that each switches at an average
speed of 5 MHz.
We shall also initially assume that we use all of the 69 I/O
Modules and that each switches at an average speed of 5 MHz.
Using Eq. 6.7, the Logic Modules dissipate
P LM = (0.2)(547)(5) = 547 mW,
and the I/O Module dissipation is
P IO = (0.8)(69)(5) = 276 mW.
If we assume the clock buffer drives 20 percent of the Logic
Modules, then the additional power dissipation due to the
clock buffer is
P CLK = (0.085)(547)(0.2)(5) = 46.495 mW.
The total power dissipation is thus
P D = (547 + 276 + 46.5) = 869.5 mW
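The worked example can be reproduced with a few lines of code. This sketch applies Eq. 6.7 with the same numbers as the text, including the 5 MHz figure that the worked example uses in the clock term; the function and argument names are hypothetical.

```python
def act1_power_mw(n, f1, m, f2, p, f3):
    """Eq. 6.7: 0.2 (N x F1) + 0.085 (M x F2) + 0.8 (P x F3), all results in mW."""
    logic = 0.2 * n * f1
    clock = 0.085 * m * f2
    io = 0.8 * p * f3
    return logic, clock, io, logic + clock + io

# Values from the worked example: 547 Logic Modules and 69 I/O Modules at 5 MHz,
# with the clock buffer driving 20 percent of the Logic Modules.
logic, clock, io, total = act1_power_mw(n=547, f1=5, m=547 * 0.2, f2=5, p=69, f3=5)
print(round(logic, 1), round(clock, 3), round(io, 1), round(total, 1))
```

The total is within the roughly 1 W that a plastic package can dissipate, but with little margin, so utilization and switching rates matter when choosing the package.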
Xilinx I/O Block
The Xilinx I/O cell is the input/output block ( IOB )
The outputs contain features that allow you to do the following:
Switch between a totem-pole and a complementary output
(XC4000H).
Include a passive pull-up or pull-down (both n-channel
devices) with a typical resistance of about 50 kΩ.
Invert the three-state control (output enable OE or three-state,
TS).
Include a flip-flop, or latch, or a direct connection in the output
path.
Control the slew rate of the output.
The features on the inputs allow you to do the following:
Configure the input buffer with TTL or CMOS thresholds.
Include a flip-flop, or latch, or a direct connection in the input
path.
Switch in a delay to eliminate an input hold time.