DSD ch-5 Building Blocks

This document discusses basic building blocks for FPGA designs, including different types of computational blocks, embedded processors, and architectural options for adders, multipliers, and shifters. Some key points:
- Modern FPGAs contain dedicated blocks like multipliers, adders, and DSP slices to improve performance for tasks like signal processing.
- Common components in FPGAs include 18x18 multipliers (Altera, Xilinx), 8x8 multipliers and 16-bit adders (QuickLogic), and DSP48 blocks (Xilinx).
- FPGAs also contain embedded processors like PowerPC, ARM, or MicroBlaze for control functions.
- Architectural …

Uploaded by

hafsa
Copyright © All Rights Reserved

Basic building blocks design options
Introduction
• For many signal processing applications, FPGAs outperform their traditional competing technology, digital signal processors (DSPs).
• No matter how many MACs a DSP vendor can place on a chip, it still cannot compete with the hundreds of these units available on a high-end FPGA device.
• Modern FPGAs come with embedded processors, standard interfaces and signal processing building blocks consisting of multipliers, adders, registers and multiplexers.
• The number and types of these building blocks on FPGAs are expected to keep increasing.
Dedicated computational blocks
• 18x18 multipliers in Virtex-II, Virtex-II Pro and Spartan-3 FPGAs
• 8x8 multipliers and 16-bit adders in QuickLogic FPGAs
• 18x18 multipliers and adders in Altera FPGAs
• DSP48 blocks (DSP48E1 slices) in Xilinx 7 series FPGAs
18x18 multiplier
18x18 multiplier and adder in Altera
DSP48 blocks in Xilinx
Embedded processors
• FPGA vendors are also incorporating programmable processor cores and numerous high-speed interfaces.
• High-end Xilinx FPGAs embed hard IP cores such as PowerPC or ARM Cortex processors, or use the MicroBlaze soft IP core, along with standard interfaces such as PCI Express and Gigabit Ethernet.
FPGA embedded processors and other interfaces
Instantiation of Embedded Blocks
• Example 1: a second-order infinite impulse response (IIR) filter in Direct Form II (DF-II) realization
RTL Code
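The RTL itself is not reproduced here. As a hedged illustration, the following Python sketch is a floating-point behavioral model of a second-order DF-II section, of the kind that could serve as a golden reference when checking an RTL or fixed-point version. The function name `df2_biquad` and the sign convention (denominator 1 + a1 z^-1 + a2 z^-2) are assumptions of this sketch, not taken from the slides.

```python
def df2_biquad(x, b, a):
    """Second-order IIR filter in Direct Form II (behavioral sketch).

    b = (b0, b1, b2) are the feed-forward coefficients and a = (a1, a2) the
    feedback coefficients, with a0 assumed to be 1:
        w[n] = x[n] - a1*w[n-1] - a2*w[n-2]
        y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]
    """
    b0, b1, b2 = b
    a1, a2 = a
    w1 = w2 = 0.0  # the two delay elements shared by both sections
    y = []
    for xn in x:
        w0 = xn - a1 * w1 - a2 * w2  # feedback (recursive) part
        y.append(b0 * w0 + b1 * w1 + b2 * w2)  # feed-forward part
        w2, w1 = w1, w0  # shift the delay line
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(df2_biquad(impulse, b=(1.0, 0.0, 0.0), a=(-0.5, 0.0)))
```

With these hypothetical coefficients the impulse response is 1, 0.5, 0.25, 0.125. Only two delay elements are needed because DF-II shares one delay line between the feedback and feed-forward parts.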
Spartan-3 architecture
• Multiplication is implemented using 18x18
dedicated multipliers
Device utilization summary
Spartan-3
Virtex-4 architecture
• The multiplication and addition operations are mapped onto DSP48 multiply-accumulate (MAC) embedded blocks.
Device utilization summary
Virtex-4
Design optimization by pipelining
• Example 2: 8-tap FIR filter
RTL code: FIR
RTL code: FIR
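A minimal behavioral sketch of an 8-tap direct-form FIR filter, assuming integer samples and hypothetical coefficients; `fir_direct` is an illustrative name, not the slides' RTL module. Every product is summed in the same cycle, which is what makes the unpipelined design slow.

```python
def fir_direct(x, h):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].

    All len(h) products are summed in the same cycle, so in hardware the
    critical path runs through a multiplier and all the adders that
    combine the products.
    """
    taps = [0] * len(h)  # tapped delay line; taps[0] holds the newest sample
    y = []
    for xn in x:
        taps = [xn] + taps[:-1]  # shift the new sample into the delay line
        y.append(sum(hk * tk for hk, tk in zip(h, taps)))
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical 8-tap coefficients
    impulse = [1] + [0] * 9
    print(fir_direct(impulse, h))  # the impulse response reproduces h
```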
Schematic Spartan-3
Device utilization
Design optimization
• Introducing pipeline registers
RTL code
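The effect of a pipeline register can be modeled at cycle level. This sketch assumes a single register stage between the multipliers and the adder (the slides may place registers differently); the hypothetical `fir_pipelined` produces the same samples as the direct form, one clock cycle later, while in hardware each stage has a shorter critical path.

```python
def fir_pipelined(x, h):
    """FIR filter with one pipeline register between multipliers and adder.

    Cycle-level model: products formed in cycle n are registered and only
    summed in cycle n+1, so the output equals the direct-form output
    delayed by one clock cycle (the register is assumed to reset to zero).
    """
    taps = [0] * len(h)      # tapped delay line
    prod_reg = [0] * len(h)  # pipeline register holding last cycle's products
    y = []
    for xn in x:
        y.append(sum(prod_reg))  # adder stage works on registered products
        taps = [xn] + taps[:-1]
        prod_reg = [hk * tk for hk, tk in zip(h, taps)]  # multiplier stage
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical coefficients
    print(fir_pipelined([1] + [0] * 9, h))  # h appears one cycle later
```

The extra cycle of latency is the price paid for the shorter register-to-register path.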
Synthesis report
• According to the synthesis report, the pipelined design is 9 times faster
Basic Building Blocks architecture
• Having discussed the use of dedicated multipliers and MAC blocks, we now look at the architectures of the basic building blocks themselves.
• Several architectural options are available when selecting an appropriate HW block for operations such as addition, multiplication and shifting:
– Parallel adders
– Barrel shifters
– Parallel multipliers
Adders
• Adders are used in addition, subtraction, multiplication and division.
• The speed of any digital design for a signal processing or communication system depends heavily on these functional units.
• The ripple carry adder (RCA) is the slowest member of the adder family.
• To overcome slow carry propagation, fast adders are designed; they speed up both carry generation and carry propagation.
Fast adders
• Carry look ahead adder
• Conditional sum adder
• Carry select adder
• Hierarchical carry select adder
Single-bit full adder: gate-level options
Ripple carry adder
• Slowest adder, because the carry has to ripple through every bit position
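A bit-level Python sketch of a full adder and an n-bit ripple carry adder built from it; the names `full_adder` and `ripple_carry_add` are illustrative. The loop mirrors the hardware chain: bit i cannot finish until the carry from bit i-1 is known.

```python
def full_adder(a, b, cin):
    """Single-bit full adder: sum = a XOR b XOR cin, carry = majority."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, n, cin=0):
    """n-bit ripple carry adder: a chain of n full adders.

    The carry out of bit i feeds bit i+1, so the worst-case delay grows
    linearly with n. Returns (n-bit sum, carry out).
    """
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_carry_add(200, 100, 8))  # 300 = 256 + 44
```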
Logic placement in CLBs
Carry look ahead adder
• A simple consideration of full adder logic shows that a carry c(i+1) is generated if a(i) = b(i) = 1, and a carry is propagated if either a(i) or b(i) is 1. This can be written as:
$g_i = a_i b_i, \quad p_i = a_i + b_i, \quad c_{i+1} = g_i + p_i c_i$
Carry look ahead adder
CLA logic
Grouping of CLAs
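• Industrial practice is to use 4-bit-wide blocks. Each block computes carries only up to c3; c4 is not computed within the block. Instead, the first four terms in c4 are grouped as G0 and the product p3p2p1p0 in the last term is tagged as P0, as given here:
$c_4 = G_0 + P_0 c_0$, where $G_0 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$ and $P_0 = p_3 p_2 p_1 p_0$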
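The 4-bit block described above can be sketched in Python as follows. It is an illustrative model, assuming p_i = a_i OR b_i as in the text; `cla4` and `add4_with_group_carry` are hypothetical names. The second function shows c4 being formed outside the block as G0 + P0 c0.

```python
def cla4(a, b, c0=0):
    """One 4-bit carry look-ahead block (bit 0 is the least significant).

    g_i = a_i AND b_i, p_i = a_i OR b_i. The carries c1..c3 are flat
    sum-of-products expressions, so no carry ripples through the block.
    c4 is not produced here; the block returns the group generate G0 and
    group propagate P0 so that the next level can form c4 = G0 + P0*c0.
    """
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x | y for x, y in zip(a_bits, b_bits)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    G0 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P0 = p[3] & p[2] & p[1] & p[0]
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (a_bits[i] ^ b_bits[i] ^ carries[i]) << i  # sum bit s_i
    return s, G0, P0


def add4_with_group_carry(a, b, c0=0):
    """Full 5-bit result: c4 is formed outside the block as G0 + P0*c0."""
    s, G0, P0 = cla4(a, b, c0)
    c4 = G0 | (P0 & c0)
    return (c4 << 4) | s
```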
Grouping of CLAs
• Similarly, bits 4 to 7 are grouped together, and c5, c6 and c7 are computed in the first level of CLA logic using c4 from the second level of CLA logic. The first-level CLA block for these bits also generates G1 and P1.
A 16-bit carry look-ahead adder using two levels of CLA logic
A 64-bit carry look-ahead adder using three levels of CLA logic
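A sketch of the 16-bit two-level arrangement, under the same assumptions (4-bit groups, p_i = a_i OR b_i); the helper names are hypothetical. The second level reuses the same look-ahead equations on the group signals (G_j, P_j) to produce c4, c8 and c12 in one step, and each first-level block then uses its carry-in to form its internal carries.

```python
def group_gp(g, p):
    """Group generate/propagate over four signals (index 0 = least significant)."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def lookahead_carries(g, p, cin):
    """Carry-in of each of four positions, as flat sum-of-products terms."""
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    return [cin, c1, c2, c3]


def cla16(a, b, c0=0):
    """16-bit adder with two levels of CLA logic; returns (sum, c16)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(16)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(16)]
    # Level 1: group signals (G_j, P_j) of the four 4-bit blocks
    gp = [group_gp(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4]) for j in range(4)]
    Gs = [G for G, _ in gp]
    Ps = [P for _, P in gp]
    # Level 2: block carries c0, c4, c8, c12 from the group signals
    block_cin = lookahead_carries(Gs, Ps, c0)
    G_all, P_all = group_gp(Gs, Ps)
    c16 = G_all | (P_all & c0)
    # Level 1 again: carries inside each block, then the sum bits
    s = 0
    for j in range(4):
        carries = lookahead_carries(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4], block_cin[j])
        for k in range(4):
            i = 4 * j + k
            s |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carries[k]) << i
    return s, c16
```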
Conditional sum adder
16-bit carry select adder with uniform groups
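A behavioral sketch of a 16-bit carry select adder with uniform 4-bit groups, assumed here to match the slide's figure. Each group's two candidate results are computed with Python integer addition as a stand-in for ripple carry blocks; in hardware both candidates are computed in parallel, and only the multiplexer select waits for the incoming carry. `carry_select_add16` is a hypothetical name.

```python
def carry_select_add16(a, b, cin=0, group=4):
    """16-bit carry select adder with uniform groups; returns (sum, carry out).

    Every group after the first is evaluated twice, assuming a carry-in of
    0 and of 1; the real carry from the previous group then selects the
    correct (sum, carry) pair, like a 2:1 multiplexer.
    """
    mask = (1 << group) - 1
    result = 0
    carry = cin
    for j in range(16 // group):
        x = (a >> (group * j)) & mask
        y = (b >> (group * j)) & mask
        if j == 0:
            total = x + y + carry  # first group: its carry-in is already known
        else:
            total0 = x + y      # candidate assuming carry-in = 0
            total1 = x + y + 1  # candidate assuming carry-in = 1
            total = total1 if carry else total0  # multiplexer selection
        result |= (total & mask) << (group * j)
        carry = total >> group  # carry out of this group selects the next
    return result, carry
```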
Hierarchical carry select adder
Barrel Shifter
Barrel Shifter
• The circuit should support
– Logical right shift: x>>S
– Logical left shift: x<<S
– Arithmetic right shift: x>>>S
– Arithmetic left shift: x<<<S (gives the same result as the logical left shift)
8-bit arithmetic shift
8-bit logical and arithmetic shift
Multi-stage barrel shifter
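A sketch of an 8-bit multi-stage barrel shifter with log2(8) = 3 stages that shift by 4, 2 and 1 positions, each enabled by one bit of S. The function name and the operation labels ('srl', 'sll', 'sra', 'sla') are invented for this example.

```python
WIDTH = 8


def barrel_shift(x, s, op):
    """8-bit multi-stage barrel shifter (stages of 4, 2 and 1 positions).

    op: 'srl' logical right, 'sll' logical left,
        'sra' arithmetic right, 'sla' arithmetic left (same bits as 'sll').
    Stage k shifts by 2**k when bit k of the shift amount s is set.
    """
    if op not in ("srl", "sll", "sra", "sla"):
        raise ValueError("unknown shift operation: " + op)
    mask = (1 << WIDTH) - 1
    x &= mask
    sign = (x >> (WIDTH - 1)) & 1
    for k in (2, 1, 0):
        if (s >> k) & 1:
            amt = 1 << k
            if op in ("sll", "sla"):
                x = (x << amt) & mask  # zeros enter from the LSB side
            else:
                x >>= amt  # zeros enter from the MSB side
                if op == "sra" and sign:
                    # refill the vacated MSBs with copies of the sign bit
                    x |= (mask << (WIDTH - amt)) & mask
    return x


if __name__ == "__main__":
    print(bin(barrel_shift(0b10110000, 3, "sra")))
```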
Parallel Multipliers
Carry Save Addition
• Carry save addition, while reducing three operands to two, does not propagate carries; rather, each carry is saved to the next more significant bit position. This addition therefore reduces three operands to two without any carry propagation delay
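Carry save addition can be applied to whole words at once, since each bit position is an independent full adder. A minimal sketch for non-negative integers (the function name is illustrative):

```python
def carry_save_add(x, y, z):
    """Reduce three non-negative operands to two without propagating carries.

    Each bit position is an independent full adder: the sum bit stays in
    place and the carry bit is saved one position to the left.
    """
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


if __name__ == "__main__":
    s, c = carry_save_add(7, 5, 3)
    print(s, c, s + c)  # the two outputs still add up to 7 + 5 + 3
```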
Dot notation
• Dot notation facilitates description of different
reduction schemes
• Dots are used to represent each bit of the
partial product
Parallel multiplier circuits
• A CSA is one of the fundamental building
blocks of most parallel multiplier
architectures. The partial products are first
reduced to two numbers using a CSA tree.
These two numbers are then added to get the
final product.
Three components of a multiplier
Partial Product Generation
• Partial products PP[i] are generated by ANDing each bit a(i) of the multiplier with all the bits of the multiplicand b
• Each PP[i] is shifted to the left by i bit positions
PP generation code
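The PP generation code in the slides is not reproduced here. An equivalent Python sketch for unsigned operands, with a hypothetical function name, builds each row from AND gates and then shifts it by i:

```python
def partial_products(a, b, n_a, n_b):
    """Unsigned partial products for an n_a x n_b multiplier.

    PP[i] = (a_i AND every bit of b), shifted left by i positions.
    """
    pps = []
    for i in range(n_a):
        a_i = (a >> i) & 1
        row = 0
        for j in range(n_b):
            row |= (a_i & ((b >> j) & 1)) << j  # one AND gate per bit
        pps.append(row << i)
    return pps


if __name__ == "__main__":
    pps = partial_products(13, 11, 4, 4)
    print(pps, sum(pps))  # the rows add up to 13 * 11
```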
Partial Product Reduction
• For a general N1 x N2 multiplier, the following four techniques are generally used to reduce the N1 layers of partial products to two layers for their final addition using any carry propagate adder (CPA):
– carry save reduction
– dual carry save reduction
– Wallace tree reduction
– Dadda tree reduction.
Carry Save Reduction
• The first three layers of the PPs are reduced to two
layers using carry save addition (CSA).
• Isolated bits in a column of the three layers are simply dropped down to the same column
• Columns with two bits are reduced to two bits using
half adders and the columns with three bits are
reduced to two bits using full adders
• Once the first three PPs are reduced to two layers,
the fourth partial product is grouped with them to
make a new group of three layers.
• The process is repeated until only two layers are left, which are added using a CPA
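A row-level sketch of carry save reduction for unsigned operands; the names are hypothetical. It treats each PP as a whole row and counts CSA levels, which for N rows comes to N - 2 because each new row waits for the previous level.

```python
def csa(x, y, z):
    """One carry save adder row: three operands in, sum and saved-carry rows out."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def carry_save_reduce(rows):
    """Linear carry save reduction of PP rows down to two rows.

    The first three rows pass through one CSA level; its two outputs are
    grouped with the next row, and so on until two rows are left.
    Returns (two remaining rows, number of CSA levels used).
    """
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        s, c = csa(rows[0], rows[1], rows[2])
        rows = [s, c] + rows[3:]
        levels += 1
    return rows, levels


def multiply_carry_save(a, b, n):
    """Unsigned n x n multiplication (n >= 2) with a final CPA."""
    pps = [(b << i) if (a >> i) & 1 else 0 for i in range(n)]
    (r0, r1), levels = carry_save_reduce(pps)
    return r0 + r1, levels  # the final carry propagate addition


if __name__ == "__main__":
    print(multiply_carry_save(45, 58, 6))  # product and number of CSA levels
```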
12x12 multiplier PPs reduction
Carry save reduction scheme layout for a 6x6 multiplier
• Level 0
Carry save reduction scheme layout for a 6x6 multiplier
• Level 1
Carry save reduction scheme layout for a 6x6 multiplier
Tree diagram 6x6
Tree diagram
Dual Carry Save Reduction
• The partial products are divided into two equal-size groups
• The carry save reduction scheme is applied to both groups simultaneously
• This results in two partial product layers in each group
• The resulting four layers are then reduced to two using carry save reduction
• The last two layers are added using any CPA
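A row-level sketch of dual carry save reduction under the same assumptions, with n taken to be even and at least 4; the names are hypothetical. Because the two groups are reduced at the same time, the level count is the larger of the two group counts plus the two levels needed to reduce the final four rows.

```python
def csa(x, y, z):
    """One carry save adder row: three operands in, sum and saved-carry rows out."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def carry_save_reduce(rows):
    """Linear carry save reduction; returns (two rows, CSA levels used)."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        s, c = csa(rows[0], rows[1], rows[2])
        rows = [s, c] + rows[3:]
        levels += 1
    return rows, levels


def multiply_dual_carry_save(a, b, n):
    """Unsigned n x n multiplication (n even, n >= 4) by dual carry save reduction."""
    pps = [(b << i) if (a >> i) & 1 else 0 for i in range(n)]
    half = n // 2
    top, lv_top = carry_save_reduce(pps[:half])        # group 1
    bottom, lv_bottom = carry_save_reduce(pps[half:])  # group 2, in parallel
    rows, lv_final = carry_save_reduce(top + bottom)   # four layers down to two
    levels = max(lv_top, lv_bottom) + lv_final
    r0, r1 = rows
    return r0 + r1, levels  # the final CPA
```

For an 8x8 multiplier this gives 4 CSA levels instead of the 6 needed by plain carry save reduction.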
Tree diagram 8x8
Tree diagram
Wallace Tree Multipliers
• One of the most commonly used multiplier architectures
• The number of adder levels grows logarithmically with the number of partial products
Wallace Tree Multipliers
• Group the rows into sets of three and apply CSA reduction to all groups in parallel
• Each CSA produces two rows from three
• Repeat the above two steps until only two rows are left
• Add the final two rows using a CPA to obtain the final product
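A row-level sketch of Wallace reduction, assuming whole rows are grouped in threes (a real Wallace tree works column by column on individual bits); the names are hypothetical. For 12 rows the row count goes 12, 8, 6, 4, 3, 2, giving five CSA levels, compared with ten for linear carry save reduction.

```python
def csa(x, y, z):
    """One carry save adder row: three operands in, sum and saved-carry rows out."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_reduce(rows):
    """Row-level Wallace reduction.

    Each level groups the rows in threes and applies a CSA to every group
    in parallel; one or two leftover rows pass straight to the next level.
    Returns (the two remaining rows, number of CSA levels).
    """
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3  # rows that form complete groups of three
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        rows = nxt + rows[full:]
        levels += 1
    return rows, levels


def multiply_wallace(a, b, n):
    """Unsigned n x n multiplication (n >= 2): PPs, Wallace reduction, final CPA."""
    pps = [(b << i) if (a >> i) & 1 else 0 for i in range(n)]
    (r0, r1), levels = wallace_reduce(pps)
    return r0 + r1, levels
```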
12x12 multiplication using Wallace Reduction Tree
12x12 multiplication using Wallace Reduction Tree
12x12 multiplication using Wallace Reduction Tree
Wallace tree diagram 12x12
Wallace tree
Wallace Reduction layout for a 6x6 array of PPs
Wallace Reduction layout for a 6x6 array of PPs
Wallace Reduction layout for a 6x6 array of PPs
Adder delays in Wallace tree
A Decomposed Multiplier
• Four NxN multipliers can be combined to make a 2N x 2N multiplier
A 16x16-bit multiplier decomposed into four 8x8 multipliers
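The decomposition follows from splitting each 16-bit operand into high and low bytes, a = aH*2^8 + aL and b = bH*2^8 + bL, so that a*b = aH*bH*2^16 + (aH*bL + aL*bH)*2^8 + aL*bL. A Python sketch for unsigned operands, where `mul8x8` is a hypothetical stand-in for one dedicated 8x8 multiplier block:

```python
def mul8x8(x, y):
    """Stand-in for one dedicated 8x8 unsigned multiplier block."""
    assert 0 <= x < 256 and 0 <= y < 256
    return x * y


def mul16x16_decomposed(a, b):
    """16x16 unsigned multiplication built from four 8x8 multipliers."""
    aH, aL = a >> 8, a & 0xFF
    bH, bL = b >> 8, b & 0xFF
    return ((mul8x8(aH, bH) << 16)                        # high x high
            + ((mul8x8(aH, bL) + mul8x8(aL, bH)) << 8)    # cross terms
            + mul8x8(aL, bL))                             # low x low
```

The shifts are plain wiring in hardware; only the three additions need adders.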
Two’s Complement Signed Multiplier
• 4x4-bit signed-by-signed multiplication
– The sign bits of the first three PPs are extended
– The two's complement of the last PP is taken, because the multiplier's sign bit carries a negative weight
– The HW implementation of sign extension results in additional logic
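A Python sketch of the 4x4 signed-by-signed scheme above, with an 8-bit result and explicit sign extension; the function name is hypothetical. The last PP is subtracted, via its two's complement, because a's sign bit has weight -2^3.

```python
def signed_mul4x4(a, b):
    """4x4 two's complement multiplication with explicit sign extension.

    PP[0..2] = a_i * b, sign-extended to 8 bits and shifted left by i.
    PP[3] belongs to a's sign bit (weight -2^3), so its two's complement
    is added instead. All arithmetic is modulo 2^8, as in an 8-bit adder array.
    """
    assert -8 <= a <= 7 and -8 <= b <= 7
    mask = 0xFF
    b8 = b & mask  # b sign-extended to 8 bits
    total = 0
    for i in range(3):
        if (a >> i) & 1:
            total += b8 << i
    if (a >> 3) & 1:  # sign bit of a
        total += ((~b8 + 1) & mask) << 3  # two's complement of the last PP
    total &= mask
    return total - 256 if total & 0x80 else total  # read back as signed 8-bit
```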
The End

Q&A
Appendix
8x8 multiplier and 16 bit adder
Tree diagram
Sign - extension Elimination
• Flip the sign bit, extend the number with all 1s and add a 1 at
the location of the sign bit
• Irrespective of the sign of the number, the technique makes
all the extended bits into 1s
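A small sketch, with a hypothetical function name, checking that the flip-and-add rule gives the same bits as ordinary sign extension. It works because flipping the sign bit adds 2^(w-1) to the value, while the 1s plus the extra 1 at the sign position add -2^(w-1) modulo 2^W.

```python
def sign_extend_by_ones(x, w, W):
    """Sign-extend a w-bit two's complement pattern x to W bits without
    copying the sign bit: flip the sign bit, fill bits w..W-1 with 1s and
    add 1 at the sign-bit position (all modulo 2^W)."""
    flipped = x ^ (1 << (w - 1))                # flip the sign bit
    ones = ((1 << W) - 1) & ~((1 << w) - 1)     # 1s in bits w..W-1
    return (flipped + ones + (1 << (w - 1))) & ((1 << W) - 1)


if __name__ == "__main__":
    print(bin(sign_extend_by_ones(0b1010, 4, 8)))  # -6 extended to 8 bits
```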
Applied to multiplication
• First, the MSBs of all the PPs except the last one are flipped, a 1 is added at the sign-bit location, and each number is extended with all 1s.
• For the last PP, the two's complement is computed by flipping all the bits and adding 1 at the LSB position.
• The MSB of the last PP is then flipped again, and a 1 is added at this bit location for sign extension.
• All these 1s are added together to form a correction vector (CV).
• All the 1s are then removed and the CV is simply added instead; it takes care of the entire sign extension logic.
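A Python sketch of the complete scheme for an n x n signed multiplier (n = 4 by default), assuming the multiplier a selects the PPs and the multiplicand b supplies their bits; the function names are hypothetical. For n = 4 the computed correction vector is 1001 0000.

```python
def correction_vector(n=4):
    """Sum of all constant 1s from the sign-extension trick, modulo 2^(2n)."""
    w = 2 * n
    cv = 0
    for i in range(n):
        sign_pos = i + n - 1  # sign bit of PP[i] after shifting by i
        ones = ((1 << w) - 1) & ~((1 << (sign_pos + 1)) - 1)  # 1s above the sign bit
        cv += ones + (1 << sign_pos)  # plus the 1 added at the sign bit
    cv += 1 << (n - 1)  # the +1 at the LSB of the last PP (two's complement)
    return cv & ((1 << w) - 1)


def signed_mul_cv(a, b, n=4):
    """n x n two's complement multiplication using a correction vector."""
    w = 2 * n
    mask = (1 << w) - 1
    sign = 1 << (n - 1)
    b_bits = b & ((1 << n) - 1)
    total = correction_vector(n)
    for i in range(n):
        pp = b_bits if (a >> i) & 1 else 0  # n-bit pattern a_i AND b
        if i < n - 1:
            pp ^= sign  # flip the MSB
        else:
            pp = (~pp) & ((1 << n) - 1)  # invert all bits; the +1 lives in the CV
            pp ^= sign  # flip the MSB again
        total += pp << i
    total &= mask
    return total - (1 << w) if total >> (w - 1) else total  # read back as signed
```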
Correction vector
4x4 multiplication example
