
IMPLEMENTATION METHODS

• PAST METHODS
– Use of standard parts, such as logic-family chips, on a Printed Circuit Board (PCB) or a breadboard for prototyping
• Benefits include a low technology demand and modest performance, but modification or bug fixing is difficult

– Use of embedded processors, where the main digital function is implemented as a program (normally called firmware) running on a microcontroller or DSP chip
• The hardware is readily available, and all one needs is the appropriate development system to compile and debug the firmware. Modification or enhancement presents no problem, but performance is usually poor

MODERN METHODS
• To obtain the best performance, it is desirable to implement a design in an integrated circuit. However, the cost can be high or even prohibitive, and the turnaround time can be long. The modern methods aim to meet these challenges and make IC technology more accessible to designers who do not have knowledge of fabrication processes and design rules, or sophisticated IC design skills.
– FPGAs or PLDs
• Sold as standard parts, but users can program the device, much like an EPROM, to perform a hardwired digital function
• Since it is a finished product, the design and manufacturing costs are shared among users. The turnaround time is literally minutes

– ASIC
• A semi-custom IC whose logic functions are determined by the user
• The physical design of those logic functions is pre-designed by the vendor, and they are integrated into an IC with automation tools
• No saving in manufacturing cost, but reductions in design cost and time

SYNCHRONOUS DESIGNS
• A circuit operated in synchronous mode is
often preferred in today’s digital design. A
synchronous design has all operations
governed by a clock signal

– The speed of the clock dictates the speed of the circuit
– All timings take reference from clock cycles
– Timing control is separated from function operations
– This simplifies the design process and timing verification
SYNCHRONOUS DESIGN GENERAL MODEL
• Timing is controlled by the state machine and
function operations are realized in the
datapath
[Block diagram: a STATE MACHINE, driven by the clock and by instructions, issues control signals to a DATAPATH and receives feedback signals from it; the datapath transforms data in into data out.]

SYNCHRONOUS TIMING
• The basic sub-structure of a synchronous design contains two flip-
flops in cascade with a combinational logic block in between
• The principal timing requirement is that the clock period must be long enough for the data at the input of the first flip-flop to get through the flip-flop itself and the combinational logic block before reaching the input of the next flip-flop
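As a sketch of this requirement in symbols (generic notation, not taken from the slides): T_clock >= t_clk-to-Q + t_comb(max) + t_setup, where t_clk-to-Q is the clock-to-output delay of the first flip-flop, t_comb(max) the worst-case delay of the combinational block, and t_setup the setup time of the next flip-flop.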
TIMINGS ASSOCIATED WITH FLIP-FLOP

• Setup time
– The D input must be stable for at least the setup time before the active clock edge
• Hold time
– The D input must be kept stable for at least the hold time after the active clock edge
• Minimum clock width
– The clock pulse must be wide enough to give the flip-flop sufficient time to respond
STATE MACHINE DESIGN
• Formal Technique
– State Diagram
– State Table
– State Assignment
– Excitation Conditions
• There are other more ad hoc techniques
– Counter
– One-hot shift register
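As a rough illustration of the one-hot approach, the following Python model (not from the slides; names are illustrative) steps a single 1 around a 4-bit shift register, so that each bit in turn enables one phase of the control sequence:

    # One-hot sequencer: state i is a word with a single 1 in bit position i.
    def one_hot_step(state, n_states=4):
        # Rotate the single 1 to the next position on every clock tick.
        return ((state << 1) | (state >> (n_states - 1))) & ((1 << n_states) - 1)

    state = 0b0001                       # reset state: phase 0 active
    for _ in range(6):
        print(format(state, '04b'))      # 0001, 0010, 0100, 1000, 0001, ...
        state = one_hot_step(state)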
DATA PROCESSING
• Parallel
– Normal realization, operations are performed on the whole data word
• Serial
– A data word is processed bit by bit
• Bit-Slice
– A data word is processed in chunks; an intermediate between parallel
and serial
• Pipelining
– Dividing up the whole process into sub-operations. The pipeline is a
cascaded chain of registers storing intermediate results, and in
between the registers, a sub-operation is realized
PIPELINING EXAMPLE
• 8-bit Adder
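The slide figure is not reproduced here; as a behavioural sketch under an assumed two-stage split (lower nibble in the first stage, upper nibble plus the registered carry in the second), in Python:

    # Two-stage pipelined 8-bit adder, modelled behaviourally.  Registering of
    # the upper operand nibbles between stages is omitted for brevity.
    def pipelined_add8(a, b):
        # --- stage 1 (cycle 1): add the lower nibbles ---
        lo = (a & 0xF) + (b & 0xF)
        reg_sum_lo, reg_carry = lo & 0xF, lo >> 4        # pipeline registers
        # --- stage 2 (cycle 2): add the upper nibbles plus the stored carry ---
        hi = (a >> 4) + (b >> 4) + reg_carry
        return ((hi & 0x1F) << 4) | reg_sum_lo           # 9-bit result

    assert pipelined_add8(200, 100) == 300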
DESIGN OF ARITHMETIC UNITS
• Addition is the elementary operation in any arithmetic
manipulation from multiplication to complex signal processing
algorithms. An efficient realization of addition is the key to the
design of arithmetic units.
• The performance of an Adder is defined by the delay in the
carry path. In the worst case, the carry signal has to propagate
through all adder stages, k stages if k is the word length of the
operands.
• There are four approaches to deal with carry propagation delay
– Limit carry propagation
– Detect carry completion
– Speed up carry propagation
– Carry-free
RIPPLE CARRY ADDER
• Parallel – one full adder per bit; the carry ripples from each stage to the next
• Serial – a single full adder with inputs A, B and Cin produces SUM and Cout; a flip-flop (F/F) feeds Cout back as Cin for the next bit
• Full Adder Block – for inputs Ai, Bi and Ci:
Si = Ai ⊕ Bi ⊕ Ci
Ci+1 = AiBi + AiCi + BiCi
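A minimal bit-level sketch of these equations in Python (function names are illustrative), building a ripple-carry adder out of the full-adder block:

    def full_adder(a, b, c):
        s = a ^ b ^ c                          # Si = Ai xor Bi xor Ci
        c_out = (a & b) | (a & c) | (b & c)    # Ci+1 = AiBi + AiCi + BiCi
        return s, c_out

    def ripple_carry_add(a_bits, b_bits, c0=0):
        # a_bits/b_bits are lists of bits, LSB first; the carry ripples upward.
        c, sum_bits = c0, []
        for a, b in zip(a_bits, b_bits):
            s, c = full_adder(a, b, c)
            sum_bits.append(s)
        return sum_bits, c                     # sum bits (LSB first), carry-out

    assert ripple_carry_add([1, 1, 1, 0], [1, 0, 0, 1]) == ([0, 0, 0, 0], 1)  # 7 + 9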
LIMIT CARRY PROPAGATION
• Residue Number System
Find the residues of a number with respect to a set of moduli; each residue then forms a digit of the number

e.g. RNS(8|7|5|3) system


700 mod 8 = 4
700 mod 7 = 0
700 mod 5 = 0
700 mod 3 = 1
Therefore 70010 = (4|0|0|1)RNS

For arithmetic operations, each digit is operated on independently; carries do not propagate between digits. For this RNS example, the maximum number of bits needed to represent a digit is 3 (for the modulus 8). As a result, the maximum carry propagation for an addition is limited to 3 stages
e.g. 70010 + 2110 = 72110
(4|0|0|1) RNS + (5|0|1|0)RNS = (1|0|1|1)RNS
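A small Python sketch of the same example (moduli and values from the slide):

    MODULI = (8, 7, 5, 3)

    def to_rns(x, moduli=MODULI):
        return tuple(x % m for m in moduli)            # one residue per modulus

    def rns_add(a, b, moduli=MODULI):
        # Digit-wise addition, each digit modulo its own modulus; no carries
        # propagate between digits.
        return tuple((da + db) % m for da, db, m in zip(a, b, moduli))

    assert to_rns(700) == (4, 0, 0, 1)
    assert to_rns(21) == (5, 0, 1, 0)
    assert rns_add(to_rns(700), to_rns(21)) == to_rns(721)   # (1, 0, 1, 1)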
DETECT CARRY COMPLETION
• Use of differential encoding to detect completion; differential encoding
offers a means to indicate the validity of a signal
• For the addition at a particular bit, there are circumstances in which the carry out of this stage can be determined without regard to the carry propagated from the lower bits. These circumstances are:
Adding two zeros implies the carry must be zero
Adding two ones implies the carry must be one
• Example of differential encoding
(bi , ci) = (0 , 0) carry not yet known
(0 , 1) carry known to be 1
(1 , 0) carry known to be 0
When alldone is asserted, carry generation
for the whole addition is completed
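As a behavioural sketch of the idea (Python, not the gate-level circuit; the settling loop stands in for gate delays), each stage carries the dual-rail pair (carry known to be 0, carry known to be 1), and alldone is the AND of "known" over all stages:

    def carry_completion_add(a_bits, b_bits):            # bit lists, LSB first
        n = len(a_bits)
        known0 = [0] * (n + 1)                           # rail: carry known to be 0
        known1 = [0] * (n + 1)                           # rail: carry known to be 1
        known0[0] = 1                                    # carry-in is 0
        delays = 0
        while not all(z | o for z, o in zip(known0, known1)):   # alldone?
            nxt0, nxt1 = known0[:], known1[:]
            for i, (a, b) in enumerate(zip(a_bits, b_bits)):
                if a and b:          nxt0[i+1], nxt1[i+1] = 0, 1        # generate
                elif not (a or b):   nxt0[i+1], nxt1[i+1] = 1, 0        # kill
                else:                nxt0[i+1], nxt1[i+1] = known0[i], known1[i]
            known0, known1, delays = nxt0, nxt1, delays + 1
        sums = [a ^ b ^ known1[i] for i, (a, b) in enumerate(zip(a_bits, b_bits))]
        return sums, known1[n], delays   # sum bits, carry-out, settling steps

The returned number of settling steps depends on the operands (the longest run of propagate positions), which is exactly what completion detection exploits: the average case is much shorter than the worst case.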
SPEED UP CARRY PROPAGATION
• Carry-lookahead Adder
Ai Bi Ci+1 remark
0 0 0 No carry
0 1 Ci Carry propagate (Pi)
1 0 Ci Carry propagate (Pi)
1 1 1 Carry generate (Gi)

Pi = Ai ⊕ Bi    Gi = AiBi

The sum and carry can be derived from P and G:

Si = Ai ⊕ Bi ⊕ Ci = Pi ⊕ Ci
Ci+1 = AiBi + AiCi + BiCi
     = AiBi + Ci (Ai + Bi)
     = AiBi + Ci (Ai ⊕ Bi)
     = Gi + PiCi
SPEED UP CARRY PROPAGATION
• Expressed in terms of P and G, the carry logic can be written
as
C1 = G0 + P0C0
C2 = G1 + P1C1 = G1 + P1G0 + P1P0C0
C3 = G2 + P2G1 + P2P1G0 + P2P1P0C0
C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0C0
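A direct Python transcription of these expanded equations for a 4-bit group (bit lists are LSB first):

    def cla4(a_bits, b_bits, c0=0):
        p = [a ^ b for a, b in zip(a_bits, b_bits)]      # Pi = Ai xor Bi
        g = [a & b for a, b in zip(a_bits, b_bits)]      # Gi = Ai and Bi
        c1 = g[0] | (p[0] & c0)
        c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
        c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
              | (p[2] & p[1] & p[0] & c0))
        c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
              | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
        s = [p[i] ^ c for i, c in enumerate([c0, c1, c2, c3])]   # Si = Pi xor Ci
        return s, c4                                     # sum bits and carry-out

    assert cla4([1, 1, 1, 1], [1, 0, 0, 0]) == ([0, 0, 0, 0], 1)   # 15 + 1 = 16

Every carry is a two-level function of the P, G signals and C0, so no carry has to ripple through the lower stages.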
CARRY-FREE
• Redundant Number Systems
Concept: when adding two decimal digits (each in [0,9]), allowing result digits in the range [0,18] means there is no carry to worry about. However, further additions would cause the digits to overflow again. This is dealt with by a transfer digit (similar to a carry, but it is not transitive, i.e. it propagates by at most one position)
CARRY-FREE
• For the binary system, carry-free (carry-save) addition can be realized using the digit set [0,2]
CARRY-FREE
• Hardware Realization
The digit set [0,2] is encoded as
0: (0,0)   1: (0,1) or (1,0)   2: (1,1)

• In the end, it may be necessary to convert the result from carry-save form to a normal binary number using a conventional adder. However, this is done only once, as the final operation after a series of accumulations.
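A small Python sketch of carry-save (3:2) accumulation: each step reduces three operands to a sum word and a carry word with no carry propagation, and only the final conversion uses a conventional adder (modelled here by Python's +):

    def carry_save_add(x, y, z):
        sum_word = x ^ y ^ z                             # per-bit sums, no ripple
        carry_word = ((x & y) | (x & z) | (y & z)) << 1  # per-bit carries, shifted
        return sum_word, carry_word

    def accumulate(values):
        s, c = 0, 0
        for v in values:                                 # chain of 3:2 compressors
            s, c = carry_save_add(s, c, v)
        return s + c                                     # one conventional add at the end

    assert accumulate([23, 45, 67, 89]) == 23 + 45 + 67 + 89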
DESIGN TECHNIQUES CONGENIAL TO FPGA
• Unlike a DSP, an FPGA is better suited to bit-level implementation. The unique architecture of an FPGA also calls for unconventional approaches to increase performance
• Number representations
FIXED-POINT REPRESENTATIONS
• Fixed-point implementations have higher speed and
lower cost, while floating-point has higher dynamic range
and no need for scaling. Fixed-point is generally preferred
in FPGA implementations.
– Unsigned Integer
– Signed-Magnitude (SM)
– Two’s Complement (2C)
– One’s Complement (1C)
– Diminished One (D1)
– Bias System (Bias)
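As a quick illustration (not from the slides), the value -5 encoded as an 8-bit word under several of these systems; the unsigned and diminished-one cases are omitted since -5 is negative, and the bias shown is the common choice of 2^(N-1):

    N, X = 8, 5
    sm   = (1 << (N - 1)) | X         # signed-magnitude: sign bit + |x|
    ones = (~X) & ((1 << N) - 1)      # one's complement: bitwise inversion
    twos = (-X) & ((1 << N) - 1)      # two's complement: 2^N - |x|
    bias = -X + (1 << (N - 1))        # bias-128: x + 128
    print(format(sm, '08b'), format(ones, '08b'),
          format(twos, '08b'), format(bias, '08b'))
    # -> 10000101 11111010 11111011 01111011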
BINARY ADDERS
• Ripple Carry Adder

• FPGA Implementation – several bits are grouped into one LUT, and the carry chain uses dedicated fast carry logic
PIPELINED ADDERS
• Pipelining can speed up an operation by breaking it into smaller sub-operations. The result of each sub-operation is saved in registers, and the next sub-operation is then executed in the following cycle, and so on
• Direct implementation of a pipelined adder vs. the FPGA-optimized version
Notes: allocate registers so as to reduce the cell requirement; pipelining must take into account the logic resources and their organization
MULTIPLIERS
• Serial/Parallel – one operand is used in parallel and the second operand is used bitwise
• Serial/Serial – only one full adder is needed
• Parallel/Parallel – shift-add in different stages; popular in ASIC implementation
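A behavioural Python sketch of the serial/parallel scheme: operand A is available in parallel, operand B is consumed one bit per cycle (LSB first), and a shift-add is performed per cycle:

    def serial_parallel_multiply(a, b, b_width=8):
        acc = 0
        for i in range(b_width):              # one clock cycle per bit of B
            if (b >> i) & 1:                  # current serial bit of B
                acc += a << i                 # add A, weighted by the bit position
        return acc

    assert serial_parallel_multiply(23, 45) == 23 * 45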
FAST ARRAY MULTIPLIER
• Generate all partial
products in parallel and add
in a binary tree of adders
• Convenient to introduce
pipeline stages after each
tree level
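A behavioural sketch of the same idea in Python: all partial products are generated at once and then summed pairwise, level by level; in hardware a pipeline register stage could be placed after each tree level (marked in the comment):

    def tree_multiply(a, b, width=8):
        # All partial products of a * b, generated in parallel.
        partials = [(a << i) if ((b >> i) & 1) else 0 for i in range(width)]
        while len(partials) > 1:                                  # one tree level
            nxt = [x + y for x, y in zip(partials[0::2], partials[1::2])]
            if len(partials) % 2:                                 # odd count: pass last one through
                nxt.append(partials[-1])
            partials = nxt
            # (a pipeline register stage would be inserted here)
        return partials[0]

    assert tree_multiply(23, 45) == 23 * 45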
DISTRIBUTED ARITHMETIC
• It is used in computing the sum of products where
the coefficients are known
• Sum of products: Y = Σ_{n=1..N} c[n]·x[n], where the coefficients c[n] are constants
DISTRIBUTED ARITHMETIC
• The variable x[n] (taken here as an unsigned B-bit word) is represented by its bits:
x[n] = Σ_{b=0..B-1} xb[n]·2^b, where xb[n] is bit b of x[n]

• Y can then be represented as
Y = Σ_{n=1..N} c[n] · Σ_{b=0..B-1} xb[n]·2^b

• Redistributing the order of summation
Y = Σ_{b=0..B-1} 2^b · Σ_{n=1..N} c[n]·xb[n]

• In compact form
Y = Σ_{b=0..B-1} 2^b · f(c, xb), where f(c, xb) = Σ_{n=1..N} c[n]·xb[n]

• Since each xb[n] is a single bit, the term f(c[n], xb[n]) takes only 2^N distinct values and can be realized with one LUT addressed by the N bits xb[1] … xb[N]

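A Python sketch of the scheme for fixed coefficients (unsigned B-bit inputs assumed): all 2^N values of f are precomputed into a table, and the result is accumulated one bit plane at a time:

    def da_sum_of_products(c, xs, B=8):
        N = len(c)
        # LUT: one entry for every possible combination of the N address bits.
        lut = [sum(c[n] for n in range(N) if (addr >> n) & 1)
               for addr in range(1 << N)]
        y = 0
        for b in range(B):                           # one lookup per bit plane
            addr = sum(((xs[n] >> b) & 1) << n for n in range(N))
            y += lut[addr] << b                      # weight the entry by 2^b
        return y

    c, xs = [3, -1, 4, 2], [10, 20, 30, 40]
    assert da_sum_of_products(c, xs) == sum(ci * xi for ci, xi in zip(c, xs))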
TABLE PARTITIONING
• If the number of coefficients N is too large to implement the function with a single LUT (the LUT would need N inputs), one can use partial tables and add the results

• If pipeline registers are also added, the modification will not reduce speed, but it can dramatically reduce the size of the design
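A sketch of the partitioning for N = 8 coefficients (Python, illustrative): two 16-entry partial tables replace one 256-entry table, and a small adder combines their outputs; a pipeline register could sit between the tables and that adder:

    def partitioned_lut(c):                      # c holds 8 known coefficients
        lo, hi = c[:4], c[4:]
        t_lo = [sum(lo[n] for n in range(4) if (a >> n) & 1) for a in range(16)]
        t_hi = [sum(hi[n] for n in range(4) if (a >> n) & 1) for a in range(16)]
        return lambda addr: t_lo[addr & 0xF] + t_hi[addr >> 4]

    c = [3, -1, 4, 2, 7, 0, -5, 6]
    f = partitioned_lut(c)
    full = [sum(c[n] for n in range(8) if (a >> n) & 1) for a in range(256)]
    assert all(f(a) == full[a] for a in range(256))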
WORD PARALLEL
• A basic DA architecture, for a length-N sum-of-products computation, accepts one bit from each of the N words per cycle. If two bits per word are accepted, the computational speed is essentially doubled. The maximum speed is achieved with a fully pipelined word-parallel architecture
