18-643 Lecture 2: Basic FPGA Fabric: James C. Hoe Department of ECE Carnegie Mellon University

18-643-F21-L02-S1, James C. Hoe, CMU/ECE/CALCM, ©2021

Housekeeping
• Your goal today: know enough to build a basic
FPGA (even if not a very good one)
• Notices
– Complete survey on Canvas, due noon, 9/8
– Handout #2: lab 0, due noon, 9/13
– Make friends, make teams, due noon, 9/13
• Readings (see lecture schedule online)
– Altera 2006 white paper (see course website)
– skim databooks referenced for more details

What it means:
“Field Programmable” “Gate Array”

SSI MSI LSI VLSI
From Quora, “How did people design integrated circuits in early years?”
How to democratize 100K gates
[Figure: gate-array transistors between VCC and GND, wired as (AB)’ on inputs A, B and as (X+Y)’ on inputs X, Y]

Idea behind Gate Arrays
• Mass produce identical gate array wafers
• Finish into any design by custom metal layers (2)
– so called Mask-Programmable GA (MPGA)
– reduced design effort (more automation, no layout)
– reduced mask and fab cost
– faster fab turn-around
• Proliferation of ASIC design starts
– don’t need volume for economy of scale
– small design team could keep up with Moore’s law
Of course, not as efficient as full-custom
or standard-cell designs
How about no mask, no fab?
i.e., “field programmable”
• Again, mass produce identical devices but this
time fully-finalized
• Then what can be changed?
– bits: SRAM, EPROM, (anti)fuse
– connections: pass gate, mux, diode
[Figure: each {1,0} configuration bit controls a pass gate, a mux select, or a diode link among wires A, B, C]
programmable vs reprogrammable
Configurable “Logic Gates”

Reconfigurable Logic
• Arbitrary logic (combinational and sequential) can
be formed by wiring up enough NANDs or muxes
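[Figure: 2-to-1 mux controlled by X selects f(…,0,…) on input 0 or f(…,1,…) on input 1 to produce f(…,X,…)]
Shannon expansion: f(…,X,…) = X’·f(…,0,…) + X·f(…,1,…)
• Lookup table as universal logic primitive
– arbitrary n-input function from 2^n-entry table
– for n = 3, this is an 8-by-1 bit “memory”
[Figure: 3-LUT addressed by A, B, C; entries f(0,0,0), f(0,0,1), …, f(1,1,0), f(1,1,1); output f(A,B,C)]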
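The lookup-table idea and the Shannon expansion can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real device: the helper names (`make_lut`, `lut_read`, `shannon`) and the choice of the first input as the most significant address bit are assumptions made for this example.

```python
from itertools import product

def make_lut(f, n):
    """Tabulate an n-input Boolean function into a 2**n-entry list of bits."""
    return [f(*bits) & 1 for bits in product((0, 1), repeat=n)]

def lut_read(table, *inputs):
    """Use the inputs as an address (first input is the MSB) and read one bit."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return table[addr]

def shannon(f, i):
    """Split f on input i into the cofactors f(..,0,..) and f(..,1,..)."""
    f0 = lambda *rest: f(*rest[:i], 0, *rest[i:])
    f1 = lambda *rest: f(*rest[:i], 1, *rest[i:])
    return f0, f1

# Example function (an assumption for illustration): 3-input majority.
majority = lambda a, b, c: (a & b) | (b & c) | (a & c)

lut3 = make_lut(majority, 3)   # the 8-by-1 bit "memory"
f0, f1 = shannon(majority, 0)  # two 2-input cofactors, selected by input A

for a, b, c in product((0, 1), repeat=3):
    via_lut = lut_read(lut3, a, b, c)
    via_mux = f1(b, c) if a else f0(b, c)  # 2-to-1 mux controlled by A
    assert via_lut == via_mux == majority(a, b, c)
```

Changing the function only changes the table contents, never the read logic, which is what makes the LUT a universal primitive.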
Size of Lookup Tables (aka LUTs)
• n-input function from 2^n-entry LUT
• Counting only the 6T SRAM cells, an n-LUT has 6·2^n T
• Some points of reference
– 2-input NAND = 4T
– 3-input NAND = 6T
– 3-input full-adder (a, b, cin)
• s = a ⊕ b ⊕ cin = 8T
• cout = b·cin + a·cin + a·b = 18T
– 10-input 5-bit adder = 130T
– basic flip-flop = 16T (compare to 2 LUTs per latch)

n-LUT   T-count
2       24
3       48
4       96
5       192
6       384
7       768
8       1536
9       3072
10      6144
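The T-count column follows directly from the 6·2^n rule. The short sketch below regenerates it and compares a full adder built from gates (8T + 18T, from the reference points above) with one built from two 3-LUTs. The function name `lut_transistors` is made up for this example, and the count covers only the SRAM cells, as on the slide.

```python
SRAM_CELL_T = 6  # transistors per 6T SRAM cell

def lut_transistors(n):
    """Transistors in the configuration memory of an n-input LUT (SRAM cells only)."""
    return SRAM_CELL_T * 2 ** n

for n in range(2, 11):
    print(f"{n:>2}-LUT: {lut_transistors(n):>5} T")

# Full adder: custom gates vs. one 3-LUT each for s and cout.
full_adder_gates = 8 + 18                 # s = 8T, cout = 18T
full_adder_luts = 2 * lut_transistors(3)  # two 3-LUTs
print(full_adder_gates, full_adder_luts)
```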
Choosing LUT Granularity
• Small LUTs
+ shorter propagation delay (per LUT)
– a given fxn consumes many LUTs (comes with wiring cost and delay)
– high “interpretation overhead” if too small
• Big LUTs
– longer propagation delay (per LUT)
+ a given fxn consumes fewer (but bigger) LUTs
– high “interpretation overhead” if too large (and fxn has exploitable structure, e.g., 5-bit ripple add)
– wastage if not all inputs are used in a LUT
Where is the sweet spot?
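The 5-bit ripple-add case can be worked through as a rough sketch. The assumptions here are not from the slides: each LUT has a single output, the flat version spends one 10-LUT per output bit (5 sum bits plus carry-out), and the structured version uses two 3-LUTs per bit position with the first carry-in tied to 0. Only SRAM-cell transistors are counted.

```python
BITS = (0, 1)

# One full adder as two 8-entry tables, addressed by (a, b, carry_in).
SUM_LUT = [a ^ b ^ c for a in BITS for b in BITS for c in BITS]
CARRY_LUT = [(a & b) | (b & c) | (a & c) for a in BITS for b in BITS for c in BITS]

def ripple_add5(x, y):
    """Add two 5-bit numbers with a chain of 3-LUT pairs; returns a 6-bit result."""
    carry, total = 0, 0
    for i in range(5):
        addr = (((x >> i) & 1) << 2) | (((y >> i) & 1) << 1) | carry
        total |= SUM_LUT[addr] << i
        carry = CARRY_LUT[addr]
    return total | (carry << 5)

# The structured version really is an adder.
assert all(ripple_add5(x, y) == x + y for x in range(32) for y in range(32))

ripple_cost = 5 * 2 * 6 * 2 ** 3  # 5 bit positions, two 3-LUTs each
flat_cost = 6 * 6 * 2 ** 10       # 6 output bits, one 10-LUT each
print(ripple_cost, flat_cost)
```

Under these assumptions the small-LUT version needs far fewer configuration transistors because it exploits the adder's structure, but it pays for that with a five-LUT-deep carry chain and the wiring between LUTs.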
A Quantitative Look at LUT Sizing
e.g., 2006 Altera White Paper on Stratix-II ALMs
• Large-enough functions have shorter total delay using bigger LUTs
• But bigger LUTs cost more and are prone to “internal fragmentation”
– 3-LUTs: 50+% fully utilized
– 6-LUTs: less than 40% fully utilized
• No one LUT size is optimal ⇒ “adaptive” LUT approach
LUT-based Configurable Logic Block
(simplified sketch)
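[Figure: two 3-LUTs on inputs A, B, C produce f(A,B,C) and g(A,B,C); a fourth input D combines them into h(A,B,C,D); {2,1,0} configuration muxes drive outputs X and Y; one FF (also latch mode) with a {1,0} configuration bit]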
• 2 fxns (f & g) of 3 inputs OR 1 fxn (h) of 4 inputs

• hardwired FFs (too expensive/slow to fake)
• Just 10s of these in the earliest FPGAs
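A behavioral sketch of the combinational part of such a block is shown below. Several details are assumptions made for illustration rather than facts about the sketched CLB: D = 0 selects f and D = 1 selects g when forming h, each output mux picks among (f, g, h) with select values 0, 1, 2, and the flip-flop is left out entirely.

```python
BITS = (0, 1)

def lut3(table, a, b, c):
    """Read an 8-entry table with (a, b, c) as the address, a being the MSB."""
    return table[(a << 2) | (b << 1) | c]

class SimpleCLB:
    """Two 3-LUTs plus a D-controlled mux; x_sel/y_sel are configuration values."""

    def __init__(self, f_table, g_table, x_sel, y_sel):
        self.f_table, self.g_table = f_table, g_table
        self.x_sel, self.y_sel = x_sel, y_sel

    def evaluate(self, a, b, c, d):
        f = lut3(self.f_table, a, b, c)
        g = lut3(self.g_table, a, b, c)
        h = g if d else f  # Shannon expansion on D (assumed polarity)
        choices = (f, g, h)
        return choices[self.x_sel], choices[self.y_sel]

# Configure the block as 4-input parity on X, while Y still exposes f.
f_table = [a ^ b ^ c for a in BITS for b in BITS for c in BITS]  # cofactor for D = 0
g_table = [1 - v for v in f_table]                               # cofactor for D = 1
clb = SimpleCLB(f_table, g_table, x_sel=2, y_sel=0)

for a in BITS:
    for b in BITS:
        for c in BITS:
            for d in BITS:
                x, y = clb.evaluate(a, b, c, d)
                assert x == a ^ b ^ c ^ d
                assert y == a ^ b ^ c
```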
Xilinx XC2000 CLB (1980s)

[XC2064, XC2018 Logic Cell Array]
Contemporary Xilinx CLB Architecture
• each 6LUT is two 5LUTs
• LUTs can also be used as small SRAMs
• special paths for addition and multiplexer
• 2 slices per CLB
• Largest devices (many $K each) have several 100K slices
• Largest extreme in 2021 has over 1M slices
[Figure 2-3: 7 Series FPGAs CLB User Guide]
Still Coarser Logic Blocks?
• So called Coarse-Grain Reconfigurable Arrays
(CGRAs) based on complete adders or ALUs
– native arithmetic units have low interpretation
overhead if you are doing arithmetic
– poor fit if you are working with narrow data or bit-
level manipulations
• Even coarser is to use many tiny processors
– still a spatial computing paradigm
– not programmed with RTLs
– converging with software multicores

More on this later on
Brief Aside: Mapping Logic To LUTs
• Start from primary outputs and inputs to registers, cover logic graph with cuts of at most K input edges
• K-cuts correspond to K-LUT realizable functions
[Figure 13.1: “Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation”]
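One common way to find candidate cuts is bottom-up cut enumeration: the cuts of a node are the node itself plus every union of one cut per fanin that stays within K inputs. The sketch below shows that technique on a tiny graph. It is not necessarily the method in the referenced figure, and the function name, the graph encoding, and the example netlist are assumptions made for this example.

```python
from itertools import product

def enumerate_cuts(fanins, k):
    """Return {node: set of cuts}; each cut is a frozenset of at most k nodes.

    fanins maps each node to a tuple of its fanin nodes and must list
    nodes in topological order (primary inputs, with no fanins, first).
    """
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}  # trivial cut: the node itself
        if ins:
            # Pick one cut per fanin and merge; keep merges that fit in a K-LUT.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# Example netlist (hypothetical): s = (a xor b) xor cin, with t = a xor b.
netlist = {"a": (), "b": (), "cin": (), "t": ("a", "b"), "s": ("t", "cin")}

cuts3 = enumerate_cuts(netlist, k=3)
cuts2 = enumerate_cuts(netlist, k=2)
print(sorted(sorted(c) for c in cuts3["s"]))
```

With K = 3 the cut {a, b, cin} lets a single 3-LUT absorb both XORs, while with K = 2 the mapper has to keep t as a separate LUT. A real mapper would then choose among the enumerated cuts to minimize area or depth.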
Placement

[Vivado Implementation Screenshot]
… and Route

[Vivado Implementation Screenshot]
Configurable “Wires”

PLA-style Configurable Routing
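[Figure: programmable AND plane feeding a programmable OR plane; inputs I0, I1, …, In-1; outputs O0, O1, …, Om-1; each “?” crosspoint is a configurable connection]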
Island Style Routing Architecture
• CLB islands in sea of interconnects
• Flexible routing to support ASIC style netlists
• Note regularity in structure
[Figure: 5-by-5 grid of CLBs (C) separated by routing channels]

Configurable Routing
(1980s Xilinx simplified)
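[Figure: two CLBs joined to the routing channels through a Connection Block, with channels meeting at a Switch Block; labeled signals A, B, X, C, Y, D]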

Reconfigurable Routing is Expensive!
• Routing resource area is on par with logic
• Each configurable connection costs
– area of configuration bit
– area of configurable connection
– and don’t forget propagation delay
• Too much: cost for everyone who doesn’t need it
• Too little: congestion leaves unreachable CLBs
unused
– worse for larger arrays/designs (why?)
– buy a $10K FPGA and only get to use 70%?

Rent’s Rule
• T ∝ g^p
– T = number of inputs and outputs
– g = number of internal components
– p typically between 0.5 (regular) and 0.8 (random)
• In a square, perimeter = 4·area^0.5
– unless regular, I/O signals grow faster than available routes exiting a design area
• Need hierarchy of progressively longer additional routing resources
– long routes also reduce delay when going far
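To see why the mismatch worsens with size, the sketch below compares Rent's-rule terminal demand with the perimeter of a square region. It rests on simplifying assumptions that are not from the slides: the proportionality constant is 1, each component occupies one unit of area (so area = g), and routes exiting the region scale with its perimeter.

```python
def rent_terminals(g, p, t=1.0):
    """Rent's rule: terminals needed by a block of g components (t is assumed)."""
    return t * g ** p

def square_perimeter(area):
    """Perimeter of a square with the given area."""
    return 4 * area ** 0.5

def demand_over_supply(g, p):
    """Terminal demand relative to perimeter, for g unit-area components."""
    return rent_terminals(g, p) / square_perimeter(g)

for g in (100, 10_000, 1_000_000):
    regular = demand_over_supply(g, 0.5)
    random_logic = demand_over_supply(g, 0.8)
    print(f"g={g:>9}: p=0.5 ratio {regular:.2f}, p=0.8 ratio {random_logic:.2f}")
```

The ratio scales as g^(p - 0.5): it stays flat for regular designs (p = 0.5) and keeps growing for random logic, which is one way to read the argument for adding longer routing resources as arrays get larger.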

Virtex-II Routing Architecture
[Figure 48: Virtex-II Platform FPGAs: Complete Data Sheet]

Virtex-II Routing Architecture
Later architectures extended in reach and in diagonals
Separate, dedicated clock trees
[Figure 49: Virtex-II Platform FPGAs: Complete Data Sheet]
Between-Die Routing in 2.5D IC
Virtex7 Stacked Silicon Interconnect (SSI), 2011
• Longest routes go across dies carried on interposer
• No change to design tool and abstraction
[Figure 1, Stacked & Loaded: Xilinx SSI, 28-Gbps I/O Yield Amazing FPGAs, Xcell, Q1 2011]
Intel Stratix-X HyperFlex
• Long routes need buffered repeaters; very long
routes need pipelining
• Add (bypassable) pipeline registers throughout
• RTL designs have to be pipelined explicitly to benefit; high-level synthesized designs leverage directly
• a high-freq strategy ⇒ e.g., 0.5× logic at 2× freq for perf. parity
[Figure 2: Understanding How the New HyperFlex Architecture Enables Next-Generation High-Performance Systems]
Don’t Forget Configurable I/O
[Figure: I/O Block around a PAD, with Dout and Din paths, an FF, In/Out direction control, a {fast,slow} output setting, and {1,0} configuration bits]
- real devices more complicated
- modern devices support special signaling and protocols
Putting it all together:
a Universal ASIC
• programmable lookup tables (LUT) and flip-flops (FF), aka “soft logic” or “fabric”
• programmable routing (interconnect)
• I/O pins

Bitstream defines the chip
• After power up, SRAM FPGA loads bitstream from
somewhere before becoming the “chip”
– a bonus “feature” for sensitive devices that need to forget what they do
• Many built-in loading options
• Non-trivial amount of time; must control reset
timing and sequence with the rest of the system
• Reverse-engineering concerns ameliorated by
– encryption
– proprietary knowledge
Return to this later in term . . . .
Parting Thoughts
• Birth of FPGAs rooted entirely in digital logic and
ASIC concerns; today, you can use an FPGA
without knowing any of this stuff
• You can find a lot of specific details on-line
(databooks and research papers)
• So far still just the basic fabric . . . .
. . . more next time
- saving “configuration” for later in term
- won’t say anything about low-level EDA
