
Lecture11 FPGA


inst.eecs.berkeley.edu/~eecs151

EECS151/251A
Introduction to Digital Design and ICs
Lecture 11: FPGA
Sophia Shao
Intel’s Altera Acquisition
On June 1, 2015, Altera and Intel announced that Intel would acquire
Altera in an all-cash transaction valued at approximately $16.7 billion.

AMD Acquires Xilinx
Semiconductor designer Advanced Micro Devices Inc (AMD) (AMD.O) said on Monday
it has finalized the purchase of Xilinx Inc (XLNX.O), giving it an extra edge in the key
data center market.

Intel acquired Altera in 2015. AMD acquired Xilinx in 2022.
https://newsroom.intel.com/news-releases/intel-completes-acquisition-of-altera/#gs.i5jkp3
https://www.amd.com/en/corporate/xilinx-acquisition
Shao Fall 2022 © UCB
• RISC-V Pipelining
  • 5-Stage Pipeline
  • Pipeline Hazards
    • Structural
    • Data
    • Control

2 EECS151 L11 FPGA Shao Fall 2022 © UCB


Pipelined Datapath: When is the target PC calculated?

[Figure: five-stage pipelined RISC-V datapath showing the PC and +4 adder, IMEM,
register file (Reg[ ]), Imm. Gen, Branch Comp, ALU, DMEM and write-back mux, with
the instruction pipeline registers InstD, InstX, InstM and InstW and the
forwarding logic.]


Control Hazards

beq t0, t1, Label
sub t2, s0, t5          # executed regardless of branch outcome!
or  t6, s0, t3          # executed regardless of branch outcome!
xor t5, t1, s0          # executed regardless of branch outcome!
Label: sw s0, 8(t3)     # PC updated reflecting branch outcome

Observation
• If the branch is not taken, the instructions fetched sequentially after the
branch are correct
• If a branch or jump is taken, the incorrectly fetched instructions must be
flushed from the pipeline by converting them to NOPs


Kill Instructions after Branch if Taken

beq t0, t1, label       # taken branch
sub t2, s0, t5          # convert to NOP
or  t6, s0, t3          # convert to NOP
add t2, t3, s0          # convert to NOP
label: xxxxxx           # PC updated reflecting branch outcome
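The kill step above can be sketched in a few lines of Python. This is only an illustration, not the course datapath: the `resolve_branch` helper and the list-of-strings view of the in-flight instructions are assumptions made for this example.

```python
NOP = "nop"

def resolve_branch(in_flight, taken):
    """Decide what continues down the pipeline once the branch resolves.

    in_flight: instructions fetched sequentially after the branch while the
    branch outcome was still unknown (hypothetical representation).
    """
    if taken:
        # Wrong-path instructions are killed by turning them into NOPs.
        return [NOP for _ in in_flight]
    # Branch not taken: the sequentially fetched instructions were correct.
    return list(in_flight)

wrong_path = ["sub t2, s0, t5", "or t6, s0, t3", "add t2, t3, s0"]
print(resolve_branch(wrong_path, taken=True))
print(resolve_branch(wrong_path, taken=False))
```

With a taken branch all three younger instructions become NOPs; with a not-taken branch they pass through unchanged.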


Reducing Branch Penalties
• In our simple pipelined datapath, every taken branch costs 3 dead cycles
• To improve performance, use “branch prediction” to guess which way the branch
will go earlier in the pipeline
• Only flush the pipeline if the branch prediction was incorrect
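A rough back-of-the-envelope model shows why prediction helps. The sketch below assumes one cycle per instruction plus 3 dead cycles per flush and ignores pipeline fill; the instruction and branch counts are made-up numbers for illustration only.

```python
def total_cycles(num_instructions, num_flushes, penalty=3):
    """One cycle per instruction plus `penalty` dead cycles per flush.

    Pipeline fill time and other stalls are ignored (simplifying assumption).
    """
    return num_instructions + num_flushes * penalty

instructions = 1000     # hypothetical program length
taken_branches = 150    # hypothetical: every taken branch flushes without prediction
mispredictions = 15     # hypothetical: only wrong guesses flush with prediction

no_prediction = total_cycles(instructions, taken_branches)
with_prediction = total_cycles(instructions, mispredictions)
print(no_prediction, with_prediction)
```

Under these assumed numbers the flush overhead drops from 450 dead cycles to 45, because only mispredicted branches pay the penalty.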


Branch Prediction

beq t0, t1, label       # taken branch
label: .....            # guess next PC!
.....

Check whether the guess was correct.


Summary
• We have covered single-cycle, multi-cycle and pipelined datapaths
• Pipeline Hazards:
  • Structural
  • Data
  • Control
• Need to be understood and handled appropriately
  • More resources
  • Forwarding
  • Stall

Nikolić, Shao Fall 2019 © UCB

• FPGA
  • Overview
  • Key Configurable Resources
    • Configurable Logic Blocks (CLBs)
      • Look-Up Tables
      • Slices
    • Configurable Interconnect
  • BRAM, DSP, and AI Engine


Chip Hall of Fame: Xilinx XC2064 FPGA
• Back in the early 1980s, chip designers tried to get the most out of each
transistor on their circuits.
• Ross Freeman came up with a chip packed with transistors that formed loosely
organized logic blocks with connections that could be configured and
reconfigured with software.
• As a result, sometimes a bunch of transistors wouldn’t be used, but Freeman
was betting that Moore’s Law would eventually make transistors so cheap that
no one would care.
https://spectrum.ieee.org/tech-history/silicon-revolution/chip-hall-of-fame-xilinx-xc2064-fpga


Why are FPGAs Interesting?
• Technical viewpoint:
  • For hardware/system designers, FPGAs are quite similar to ASICs but can be
    designed faster:
    • “Tape-out” new design every few minutes/hours
    • “Reconfigurability” or “reprogrammability” may offer other advantages
      over fixed logic
  • On the other hand, the relative flexibility comes at the expense of larger
    die area, slower circuits, and more energy per operation.
• Modern FPGAs are “reconfigurable systems on a chip”


Field Programmable Gate Arrays (FPGAs)
• An integrated circuit designed to be
configured by a customer or a designer
after manufacturing, i.e., field
programmable.
• Low NRE cost compared to ASIC
• The FPGA configuration is generally
specified using a hardware description
language, similar to that used for ASICs.
• Two dominant FPGA makers:
  • Xilinx (now AMD)
  • Altera (now Intel)


FPGA Overview
• Basic idea:
  • Two-dimensional array of logic blocks and flip-flops with means for the
    user to configure:
    • The function of each block
    • The interconnection between blocks
• Configurable Logic Blocks (CLBs)
  • FPGA’s functional units
• Configurable Interconnect
  • Connecting CLBs together

[Figure: Xilinx early FPGA]
Source: Trimberger, IEEE Micro’2015

State-of-the-art Xilinx FPGAs

[Figure: Virtex UltraScale]




Background
• A MUX, or multiplexer, is a combinational logic circuit that chooses between
$2^N$ inputs under the control of N control signals.

• A latch is a 1-bit memory (similar to a flip-flop).
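As a concrete reading of the MUX definition, here is a minimal Python sketch of a 2^N-to-1 multiplexer. The function name `mux` and the convention that the first select bit is the most significant are assumptions for illustration.

```python
def mux(inputs, select_bits):
    """Choose one of 2**N inputs using N select bits (first bit = MSB)."""
    n = len(select_bits)
    if len(inputs) != 2 ** n:
        raise ValueError("a mux with N select bits needs exactly 2**N inputs")
    index = 0
    for bit in select_bits:
        index = (index << 1) | (bit & 1)   # build the input index bit by bit
    return inputs[index]

# 4-to-1 mux: two select bits pick among four data inputs.
print(mux(["d0", "d1", "d2", "d3"], [1, 0]))
```

The same selection mechanism is what reads a value out of a look-up table: the stored bits are the mux data inputs and the logic inputs are the select signals.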


Configurable Logic Blocks (CLBs)
• Basic FPGA functional unit
• Implements both combinational and sequential logic
• Includes:
  • Look-up table
  • Register (Flip-Flop)
  • Multiplexers


Look-Up Table Implementation
• Implement truth table in small memories
  • Latches/SRAM
• An n-input LUT is implemented as a $2^n \times 1$-bit memory:
  • Inputs choose one of $2^n$ memory locations.
  • Memory locations (latches) are normally loaded with values from the user’s
    configuration bitstream.
  • Inputs to mux control are the CLB inputs.
• Result is a general-purpose “logic gate”.
  • An n-LUT can implement any function of n inputs!
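The LUT idea can be modeled directly in software. The sketch below is an assumption-laden illustration, not vendor behavior: the `LUT` class, the `program_lut` helper, and the choice of the first input as the address MSB are all invented for this example. "Configuration" here just means filling the 2^n-entry memory from a truth table.

```python
from itertools import product

class LUT:
    """n-input LUT modeled as a 2**n x 1-bit memory (illustrative only)."""

    def __init__(self, n, config_bits):
        if len(config_bits) != 2 ** n:
            raise ValueError("an n-input LUT needs exactly 2**n configuration bits")
        self.n = n
        self.mem = [b & 1 for b in config_bits]

    def read(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        addr = 0
        for bit in inputs:                 # inputs act as the memory address
            addr = (addr << 1) | (bit & 1)
        return self.mem[addr]

def program_lut(n, func):
    """Build the configuration bits by evaluating func on every input combination."""
    bits = [int(bool(func(*combo))) for combo in product((0, 1), repeat=n)]
    return LUT(n, bits)

# Any 3-input function fits in a 3-LUT, for example a majority gate.
majority = program_lut(3, lambda a, b, c: (a & b) | (b & c) | (a & c))
print(majority.mem, majority.read(1, 0, 1))
```

Changing the function only changes the stored bits, never the structure, which is why a LUT behaves like a general-purpose gate.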


Look-Up Table Implementation
Example: 4-LUT
• An n-LUT is a direct implementation of a function truth-table.
• Each location holds the value of the function corresponding to one input
combination.
• LUT size grows exponentially with the number of inputs.
  • A 64-input LUT requires $2^{64} \approx 1.84 \times 10^{19}$ bits of storage.
  • Typical sizes: 4-input to 8-input LUTs
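The exponential growth is easy to check numerically. This short sketch (the helper name `lut_storage_bits` is hypothetical) prints the storage needed for a few LUT sizes, including the 64-input case from the slide.

```python
def lut_storage_bits(n_inputs):
    """An n-input LUT stores one bit per input combination: 2**n bits."""
    return 2 ** n_inputs

for n in (4, 6, 8, 64):
    print(f"{n}-input LUT: {lut_storage_bits(n)} bits ({lut_storage_bits(n):.3g})")
```

Each extra input doubles the storage, which is why practical LUTs stay small and wider functions are built by combining several LUTs.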


Slices
• Each CLB contains two slices in 7-series devices.
• LUTs and registers are split across slices.
  • 4 LUTs and 8 FFs per slice in 7-series.
• Two types of slices:
  • SLICEM: full slice
    • LUT can be used for logic *and* memory/shift registers.
    • Has wide multiplexers and carry chain
  • SLICEL: logic and arithmetic only
    • LUT can only be used for logic (no memory)
    • Has wide multiplexers and carry chain


Constructing a SLICE
• 5-Input Look-Up Table
  • Computes any 5-input logic function.
  • Timing is independent of function.
  • Latches set during configuration.

[Figure: 32 latches addressed by A[6:2]; example contents shown below]

A[6:2]   Latch value
00000    1
00001    0
00010    1
...
11101    0
11110    0
11111    1


Constructing a SLICE
• 6-input LUT
  • May be used as one 6-input LUT (D6 out) ...
  • ... or as two 5-input LUTs (D6 and D5)
  • Combinational logic (post configuration)
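One way to picture the dual use of a 6-LUT is as two 32-entry halves sharing five inputs, with the sixth input choosing between them. The model below is a simplified sketch under that assumption; the function name `lut6`, the half ordering, and the input naming are illustrative and not taken from a datasheet. The two return values play the role of the D6 and D5 outputs on the slide.

```python
from itertools import product

def lut6(config, a6, a5_to_a1):
    """Simplified fracturable 6-LUT: returns (o6, o5).

    config: 64 bits; entries 0..31 form the lower 5-LUT, 32..63 the upper one
    (assumed layout). a5_to_a1: the five shared inputs, first element = MSB.
    """
    addr = 0
    for bit in a5_to_a1:
        addr = (addr << 1) | (bit & 1)
    lower = config[addr]
    upper = config[32 + addr]
    o5 = lower                      # second output: always the lower 5-LUT
    o6 = upper if a6 else lower     # main output: sixth input selects the half
    return o6, o5

# Two 5-input functions packed into one 6-LUT (a6 held at 1).
combos = list(product((0, 1), repeat=5))
and5 = [int(all(c)) for c in combos]
parity5 = [sum(c) % 2 for c in combos]
config = and5 + parity5

print(lut6(config, 1, (1, 0, 0, 0, 0)))
```

With the sixth input driven as a real signal instead, the same 64 bits act as a single 6-input truth table.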


The Simplest View of A Slice
Four 6-LUTs

Four Flip-Flops

Switching fabric may see combinational and registered outputs.

An actual Virtex slice adds many small features to this simplified diagram.
We show them one by one ...


How about 7-input LUT in a slice?

Two 7-LUTs

Extra MUX (F7AMUX, F7BMUX)

Extra inputs (AX and CX)


How about 8-input LUT in a slice?

Third MUX (F8MUX)

Third input (BX)
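The wide-function muxes can be understood as Shannon expansion: each extra input selects between two smaller tables. The sketch below is a hedged model, with helper names (`make_lut6`, `build_lut8`, `read_lut8`) invented for this example. The select signals `s7` and `s8` stand in for the extra slice inputs: `s7` for AX and CX, which would carry the same signal, and `s8` for BX. The mux polarity is an assumption.

```python
from itertools import product

def make_lut6(func):
    """Truth table (64 bits) of a 6-input function."""
    return [int(bool(func(*c))) for c in product((0, 1), repeat=6)]

def read_lut6(table, inputs):
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | (bit & 1)
    return table[addr]

def mux2(d0, d1, sel):
    return d1 if sel else d0

def build_lut8(func):
    """Split an 8-input function into four 6-LUT tables, one per (s8, s7) value."""
    return {
        (s8, s7): make_lut6(lambda *x, s8=s8, s7=s7: func(s8, s7, *x))
        for s8 in (0, 1) for s7 in (0, 1)
    }

def read_lut8(tables, s8, s7, inputs):
    # Two first-level muxes (the F7 muxes) each combine a pair of 6-LUTs.
    f7a = mux2(read_lut6(tables[(0, 0)], inputs), read_lut6(tables[(0, 1)], inputs), s7)
    f7b = mux2(read_lut6(tables[(1, 0)], inputs), read_lut6(tables[(1, 1)], inputs), s7)
    # The second-level mux (the F8 mux) combines the two 7-input results.
    return mux2(f7a, f7b, s8)

parity8 = lambda *bits: sum(bits) % 2
tables = build_lut8(parity8)
print(read_lut8(tables, 1, 0, (1, 1, 0, 0, 0, 0)))
```

Using only one F7 mux gives a 7-input function from two 6-LUTs; using both plus the F8 mux consumes all four LUTs of the slice for one 8-input function.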


Extra MUXes to choose LUT outputs

From eight 5-LUTs ... to one 8-LUT.

Combinational or registered outs.


Extra Carry Chain

We can map ripple-carry addition onto the carry-chain block.
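A minimal sketch of that mapping, assuming the commonly described arrangement in which each LUT computes a propagate bit and the dedicated carry logic supplies a carry mux and a sum XOR per bit. The function name and the 4-bit default width are choices made for this example, not details from the slide.

```python
def carry_chain_add(a, b, width=4, carry_in=0):
    """Ripple-carry add built from per-bit propagate, carry mux and sum XOR."""
    carry = carry_in
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi                    # computed by the LUT (assumption)
        sum_bit = propagate ^ carry            # XOR inside the carry block
        carry = carry if propagate else ai     # carry mux: pass carry or generate/kill
        result |= sum_bit << i
    return result, carry                       # (sum bits, carry out)

print(carry_chain_add(5, 6))
print(carry_chain_add(9, 8))
```

When the two operand bits differ the incoming carry is passed along; when they are equal the carry out equals either operand bit, so no general-purpose routing is needed between adjacent bits.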


Putting it all together ... a SLICEL.
The previous slides explain all SLICEL features.

About 50% of the slices are SLICELs.

The other slices are SLICEMs, and have extra features.


Administrivia
• Lab 3 and 4 walkthrough videos posted.
• Useful to understand conceptual questions

• Homework 4 due this week.


• Homework 5 will be released.



Configurable Interconnect
• Between rows and columns of CLBs are wiring channels.
• These are programmable. Each wire can be connected in many ways.
• Switch Box:
  • Each interconnection has a transistor switch.
  • Each switch is controlled by a 1-bit configuration register.
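The switch-box description can be modeled as a set of candidate wire-to-wire connections, each guarded by one configuration bit. The sketch below is an illustration only: the `SwitchBox` class, the wire names N/E/S/W, and the six-switch topology are assumptions, not a real device layout.

```python
class SwitchBox:
    """Each candidate connection has a switch controlled by one config bit."""

    def __init__(self, switches):
        self.switches = list(switches)            # pairs of wire names
        self.config = [0] * len(self.switches)    # all switches open initially

    def program(self, bits):
        if len(bits) != len(self.switches):
            raise ValueError("one configuration bit per switch is required")
        self.config = [b & 1 for b in bits]

    def connected(self, src, dst):
        """Follow closed switches from src and report whether dst is reachable."""
        seen = {src}
        frontier = [src]
        while frontier:
            wire = frontier.pop()
            for (a, b), bit in zip(self.switches, self.config):
                if not bit:
                    continue
                if a == wire:
                    nxt = b
                elif b == wire:
                    nxt = a
                else:
                    continue
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return dst in seen

# Hypothetical box with one wire per side and a switch for every pair.
box = SwitchBox([("N", "S"), ("N", "E"), ("N", "W"), ("E", "S"), ("E", "W"), ("S", "W")])
box.program([1, 0, 0, 0, 1, 0])   # close N-S and E-W only
print(box.connected("N", "S"), box.connected("N", "E"))
```

Routing a design then amounts to choosing the configuration bits so that each net's source reaches its sinks without shorting to another net.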




Diverse Resources on FPGA
Colors represent different types of resources:
• Logic
• Block RAM
• DSPs
• Clocking
• I/O
• Serial I/O + PCI
A routing fabric runs throughout the chip to wire everything together.

[Figure: Virtex-5 die photo]

Block RAM
• Block Random Access Memory
• Used for storing large amounts of data:
  • 18Kb or 36Kb
  • Configurable bitwidth
  • 2 read and write ports
• More recently
  • UltraRAM in UltraScale+ devices
    • 288Kb

Source: Xilinx Datasheet
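Configurable bitwidth means the same block can be viewed as a deep, narrow memory or a shallow, wide one. The arithmetic can be sketched as follows; the helper name `bram_depth`, the reading of Kb as 1024 bits, and the example widths are assumptions for illustration, not a list of supported modes.

```python
def bram_depth(capacity_kb, width_bits):
    """Number of addressable words when a block is configured with a given width."""
    total_bits = capacity_kb * 1024          # assumes 1 Kb = 1024 bits
    if total_bits % width_bits != 0:
        raise ValueError("width does not divide the block capacity evenly")
    return total_bits // width_bits

for width in (9, 18, 36):                    # illustrative widths only
    print(f"36Kb block, {width}-bit words: {bram_depth(36, width)} entries")
```

Halving the word width doubles the depth, so the total capacity stays fixed while the aspect ratio follows the design's needs.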


DSP Slice
[Figure: DSP slice block diagram showing the A, B and C input registers, the
multiplier with its M register, the X/Y/Z multiplexers with 17-bit shifts, the
ALU with OpMode, ALUMode and CarryIn controls, the P output register, and
pattern detect. Source: Xilinx]

Efficient implementation of multiply, add, bit-wise logical.
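The multiply-add role of a DSP slice can be sketched as P = A × B + C. The model below is a deliberate simplification: it takes the 25-bit, 18-bit and 48-bit widths visible in the diagram labels, treats all values as two's-complement, and ignores the pipeline registers, OpMode/ALUMode selection and pattern detect. The function names are hypothetical.

```python
def wrap_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value

def dsp_mult_add(a, b, c):
    """Simplified multiply-add: 25-bit A times 18-bit B plus 48-bit C, wrapped to 48 bits."""
    a = wrap_signed(a, 25)
    b = wrap_signed(b, 18)
    c = wrap_signed(c, 48)
    return wrap_signed(a * b + c, 48)

def dot_product(xs, ws):
    """Accumulate products by feeding the previous result back in as C."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = dsp_mult_add(x, w, acc)
    return acc

print(dsp_mult_add(3, 4, 5), dot_product([1, 2, 3], [4, 5, 6]))
```

Feeding the output back as the addend is what makes a single slice usable as a multiply-accumulate unit for filters and dot products.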

AI Engine
• Versal AI Core

[Figure: Versal AI Core. Source: Xilinx, HotChips’2019]

State-of-the-art Xilinx FPGA Platform
• Versal (ACAP: Adaptive Compute Acceleration Platform)

[Figure: Versal platform with the AI Engine highlighted. Source: Xilinx, HotChips’2019]


Summary
• FPGAs are widely used for hardware prototyping and accelerating key
applications.
• Core FPGA building blocks:
  • Configurable Logic Blocks (CLBs)
    • Slices
      • Look-Up Tables
      • Flip-flops
      • Carry chain
  • Configurable Interconnect
    • Switch boxes
• Modern FPGA designs:
  • BRAMs, DSPs, and AI Engines
