Unit 4 VLSI EC3552
For a full adder, it is sometimes useful to define Generate (G), Propagate (P), and Kill (K)
signals. The adder generates a carry when Cout is true independent of Cin, so G = A·B.
The adder kills a carry when Cout is false independent of Cin, so K = A'·B' = (A + B)'.
The adder propagates a carry, i.e. it produces a carry-out if and only if it receives a carry-in,
when exactly one input is true: P = A ⊕ B.
The sum and carry-out signals in terms of G and P can be given by:
Co(G, P) = G + P·Ci
S(G, P) = P ⊕ Ci
Inverting property of the RCA: inverting all inputs to a full adder results in inverted outputs. With
S = A ⊕ B ⊕ Ci
Co = A·B + B·Ci + A·Ci
this can be expressed as
S'(A, B, Ci) = S(A', B', Ci')
Co'(A, B, Ci) = Co(A', B', Ci')
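As a quick behavioural check of these definitions (a sketch in Python; the function name and bit-level encoding are illustrative, not part of any library), the sum and carry-out can be formed from G and P, and the inverting property verified exhaustively:

def full_adder_gpk(a, b, ci):
    # Behavioural sketch of a full adder using the generate/propagate/kill
    # signals defined above; a, b, ci are 0/1 integers.
    g = a & b                 # generate: Cout = 1 regardless of Cin
    k = (1 - a) & (1 - b)     # kill: Cout = 0 regardless of Cin
    p = a ^ b                 # propagate: Cout follows Cin
    s = p ^ ci                # S = P xor Ci
    co = g | (p & ci)         # Co = G + P.Ci
    return s, co, g, p, k

# Inverting property: complementing all inputs complements both outputs.
for a in (0, 1):
    for b in (0, 1):
        for ci in (0, 1):
            s, co = full_adder_gpk(a, b, ci)[:2]
            si, coi = full_adder_gpk(1 - a, 1 - b, 1 - ci)[:2]
            assert si == 1 - s and coi == 1 - co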
• A Manchester carry chain adder uses a cascade of pass transistors to implement the
carry chain.
• During the precharge phase (Φ = 0), all intermediate nodes of the pass-transistor carry
chain are precharged to Vdd.
• During evaluation, the nodes are discharged when there is an incoming carry and the
propagate and generate signals are high.
• The worst-case delay of the carry chain adder is modeled by a linearized RC network.
• Increasing the transistor width reduces the time constant, but it loads the gates in the
previous stage.
• Therefore, the transistor size is limited by the input loading capacitance.
• The distributed RC nature of the carry chain results in a propagation delay that is
quadratic in the number of bits N.
• To avoid this, it is necessary to insert signal-buffering inverters.
• Adding inverters makes the overall propagation delay a linear function of N, as is the
case with ripple-carry adders.
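A first-order sketch of the quadratic behaviour, using the Elmore delay of the linearized RC model (the per-stage resistance r and node capacitance c here are illustrative parameters, not extracted values):

def manchester_chain_delay(n_bits, r, c):
    # Elmore (50%) delay of the linearized RC chain: node i sees the series
    # resistance of i pass-transistor sections, so the total delay grows as
    # n(n+1)/2, i.e. quadratically in the number of bits.
    return 0.69 * sum(c * (i * r) for i in range(1, n_bits + 1))
    # equivalently: 0.69 * r * c * n_bits * (n_bits + 1) / 2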
LOOK AHEAD ADDER DESIGN
LOOK AHEAD – BASIC IDEA
• Carry look ahead logic uses the concepts of generating and propagating carries.
• A carry-lookahead adder improves speed by reducing the amount of time required to
determine carry bits.
• The carry-lookahead adder calculates one or more carry bits before the sum. This
reduces the wait time to calculate the result of the larger-value bits. The Kogge-Stone
adder and the Brent-Kung adder are examples of this type of adder.
• Carry lookahead depends on two things:
- Calculating, for each digit position, whether that position is going to propagate a
carry if one comes in from the right.
- Combining these calculated values so as to deduce quickly, for each group of digits,
whether that group is going to propagate a carry that comes in from the right.
Suppose that groups of 4 digits are chosen. Then the sequence of events goes something like
this:
-All 1-bit adders calculate their results. Simultaneously, the lookahead units perform their
calculations.
- Suppose that a carry arises in a particular group. Within at most 5 gate delays, that carry will
emerge at the left-hand end of the group and start propagating through the group to its left.
- If that carry is going to propagate all the way through the next group, the lookahead unit will
already have deduced this. Accordingly, before the carry emerges from the next group, the
lookahead unit is immediately (within 1 gate delay) able to tell the next group to the left that it
is going to receive a carry – and, at the same time, to tell the next lookahead unit to the left that
a carry is on its way.
CARRY-LOOK-AHEAD ADDERS:
• Objective – generate all incoming carries in parallel.
• Feasible – the carries depend only on xn−1, xn−2, …, x0 and yn−1, yn−2, …, y0 – information available
to all stages for calculating the incoming carry and sum bit.
• Requires a large number of inputs to each stage of the adder – impractical.
• The number of inputs at each stage can be reduced – find out from the inputs whether new
carries will be generated and whether they will be propagated.
CARRY PROPAGATION
• If xi = yi = 1 – a carry-out is generated regardless of the incoming carry – no additional
information needed.
• If xi yi = 10 or xi yi = 01 – the incoming carry is propagated.
• If xi = yi = 0 – no carry propagation.
• Gi = xi·yi – generated carry; Pi = xi + yi – propagated carry.
• ci+1 = xi·yi + ci·(xi + yi) = Gi + ci·Pi
• Substituting ci = Gi−1 + ci−1·Pi−1 gives ci+1 = Gi + Gi−1·Pi + ci−1·Pi−1·Pi
• Further substitutions –
ci+1 = Gi + Gi−1·Pi + Gi−2·Pi−1·Pi + ci−2·Pi−2·Pi−1·Pi = …
= Gi + Gi−1·Pi + Gi−2·Pi−1·Pi + … + c0·P0·P1·…·Pi
• All carries can be calculated in parallel from xn−1, xn−2, …, x0, yn−1, yn−2, …, y0 and the forced
carry c0.
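The same expansion can be sketched behaviourally (illustrative Python, not a circuit description): every carry is formed directly from the Gi, Pi signals and c0, without waiting for lower carries to ripple.

def lookahead_carries(x, y, c0, n):
    # Gi = xi.yi (generated carry) and Pi = xi + yi (propagated carry) per bit.
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(n)]
    p = [((x >> i) & 1) | ((y >> i) & 1) for i in range(n)]
    c = [c0]
    for i in range(n):
        # Expand ci+1 = Gi + Gi-1.Pi + ... + c0.P0.P1...Pi directly.
        carry, prod = g[i], p[i]
        for j in range(i - 1, -1, -1):
            carry |= g[j] & prod
            prod &= p[j]
        carry |= c0 & prod
        c.append(carry)
    # Sum bits: si = xi xor yi xor ci
    s = [((x >> i) & 1) ^ ((y >> i) & 1) ^ c[i] for i in range(n)]
    return s, c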
Mirror implementation of Look Ahead Carry Adder
Look-Ahead: Topology
• Consider the four-bit adder as in the figure above. The values of Ak and Bk (k = 0…3)
are such that all propagate signals Pk (k = 0…3) are high.
• An incoming carry Ci,0 = 1 propagates under those conditions through the complete
adder chain and causes an outgoing carry Co,3 = 1. In other words, if (P0·P1·P2·P3 = 1)
then Co,3 = Ci,0, else either DELETE or GENERATE occurred.
• This information can be used to speed up the operation of the adder, as in the figure. When
BP = P0·P1·P2·P3 = 1, the incoming carry is forwarded immediately to the next block
through the bypass transistor Mb – hence the name carry-bypass adder or carry-skip
adder.
• tsetup: the fixed overhead time to create the generate and propagate signals.
• tcarry: the propagation delay through a single bit. The worst-case carry-propagation
delay through a single stage of M bits is approximately M times larger.
• tbypass: the propagation delay through the bypass multiplexer of a single stage.
• tsum: the time to generate the sum of the final stage.
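Putting these components together, the worst-case delay of an N-bit carry-bypass adder split into N/M equal stages of M bits each is commonly approximated as
tadd = tsetup + M·tcarry + (N/M − 1)·tbypass + (M − 1)·tcarry + tsum
(a first-order estimate, assuming the standard stage decomposition: the carry ripples through the first stage, skips the intermediate stages through their bypass multiplexers, ripples through the final stage, and the last sum bit is then formed).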
MULTIPLICATION
• Multiplication needs M cycles using an N-bit adder.
• In shift-and-add multiplication:
- M partial products are added.
- Each partial product is the AND of a multiplier bit with the multiplicand, followed by a ‘shift’.
PARTIAL PRODUCT GENERATION:
To form the various product terms, an array of AND gates is used before the adder array. An
array multiplier is a vast improvement in speed over the traditional bit-serial multipliers, in
which only one full adder along with a storage memory was used to carry out all the bit
additions involved, and also over the row-serial multipliers, in which the product rows (also known
as the partial products) were sequentially added one by one using only one multi-bit
adder.
The tradeoff for this extra speed is the extra hardware required to lay down the adder array. But
with the much-decreased costs of these adders, this extra hardware has become quite affordable
to a designer. In spite of the vast improvement in speed, there is still a level of delay that is
involved in an array multiplier before the final product is achieved. Before committing
hardware resources to the circuit, it is important for the designer to calculate the aforementioned
delay in order to make sure that the circuit is compatible with the timing requirements of the
user.
Fig: Array Multiplier
• N partial products of M bits each.
• N×M two-input AND gates; (N − 1) M-bit adders.
• The layout need not be staggered; the routing will take care of the shift.
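For an M×N array multiplier built from ripple-carry rows, a commonly quoted first-order estimate of the critical-path delay (a sketch, not a layout-extracted figure; tand is the AND-gate delay and tcarry, tsum the per-bit carry and sum delays) is
tmult ≈ [(M − 1) + (N − 2)]·tcarry + (N − 1)·tsum + tand
so the delay grows roughly linearly with M + N rather than with the number of partial-product bits.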
1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other,
yielding n² results. Depending on the position of the multiplied bits, the wires carry
different weights; for example, the wire carrying the product of bits whose positions sum
to 7 has weight 128 (2⁷).
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires into two numbers, and add them with a conventional adder.
The second step works as follows (a behavioural sketch of the full procedure is given after this list). As long as there are three or more wires with the same
weight, add a further layer:
• Take any three wires with the same weight and input them into a full adder. The result
will be an output wire of the same weight and an output wire with a higher weight for
each group of three input wires.
• If there are two wires of the same weight left, input them into a half adder.
• If there is just one wire left, connect it to the next layer.
• These computations only consider gate delays and don't deal with wire delays, which can
also be very substantial.
• The Wallace tree can also be represented by a tree of 3/2 or 4/2 adders (compressors).
• It is sometimes combined with Booth encoding.
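A behavioural sketch of the three steps above (illustrative Python; wires are modelled as 0/1 values grouped by weight index, and the function names are hypothetical):

from collections import defaultdict

def full_adder(x, y, z):
    # 3:2 compressor: three wires of weight w -> a sum wire of weight w
    # and a carry wire of weight 2w.
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)

def half_adder(x, y):
    return x ^ y, x & y

def wallace_multiply(a, b, n=8):
    # Step 1: n*n partial products; the AND of bits i and j has weight 2**(i+j).
    cols = defaultdict(list)
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    # Step 2: layers of full/half adders until every column holds at most two wires.
    while any(len(v) > 2 for v in cols.values()):
        nxt = defaultdict(list)
        for w in sorted(cols):
            wires = cols[w]
            while len(wires) >= 3:                    # three wires -> full adder
                s, c = full_adder(wires.pop(), wires.pop(), wires.pop())
                nxt[w].append(s)
                nxt[w + 1].append(c)
            if len(wires) == 2:                       # two wires -> half adder
                s, c = half_adder(wires.pop(), wires.pop())
                nxt[w].append(s)
                nxt[w + 1].append(c)
            nxt[w].extend(wires)                      # a lone wire passes through
        cols = nxt
    # Step 3: group the remaining wires into two numbers and add them conventionally.
    row0 = sum(c[0] << w for w, c in cols.items() if len(c) > 0)
    row1 = sum(c[1] << w for w, c in cols.items() if len(c) > 1)
    return row0 + row1

assert wallace_multiply(13, 11, 4) == 143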
DIVIDER:
Unsigned non-restoring division:
Input: an n-bit dividend and an m-bit divisor
Output: the quotient and remainder
Begin:
1. Load the divisor and dividend into registers M and D, respectively, clear the partial remainder
register R and set the loop count cnt equal to n − 1.
2. Left shift the register pair R:D one bit.
3. Compute R = R − M.
4. Repeat
   if (R < 0) begin D[0] = 0; left shift R:D one bit; R = R + M; end
   else begin D[0] = 1; left shift R:D one bit; R = R − M; end
   cnt = cnt − 1;
   until (cnt == 0)
5. If (R < 0) begin D[0] = 0; R = R + M; end else begin D[0] = 1; end
Fig: Sequential Implementation of Non-Restoring Divider.
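A behavioural sketch of the algorithm above (illustrative Python; register names follow the description, not any particular RTL):

def nonrestoring_divide(dividend, divisor, n):
    # Registers: M holds the divisor, D the dividend (and finally the quotient),
    # R the partial remainder; n is the dividend width in bits.
    M, D, R = divisor, dividend, 0

    def shift_rd():
        # Left shift the register pair R:D one bit (the MSB of D moves into R).
        nonlocal R, D
        R = (R << 1) | ((D >> (n - 1)) & 1)
        D = (D << 1) & ((1 << n) - 1)

    shift_rd()                      # step 2
    R -= M                          # step 3
    for _ in range(n - 1):          # step 4: cnt = n-1 iterations
        if R < 0:
            D &= ~1                 # D[0] = 0
            shift_rd()
            R += M
        else:
            D |= 1                  # D[0] = 1
            shift_rd()
            R -= M
    if R < 0:                       # step 5: final correction
        D &= ~1
        R += M
    else:
        D |= 1
    return D, R                     # quotient, remainder

assert nonrestoring_divide(13, 3, 4) == (4, 1)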
BARREL SHIFTER:
Any general-purpose n-bit shifter should be able to shift incoming data up to n − 1 places in the right-shift or
left-shift direction. If we now further specify that all shifts should be on an end-around basis, so that any bit
shifted out at one end of a data word will be shifted in at the other end of the word, then the problem of left
shift or right shift is greatly eased.
For a 4-bit word, a 1-bit right shift is equal to a 3-bit left shift and a 2-bit right shift is equal to a 2-bit left shift,
etc. Thus, we can achieve the capability to shift left or right by zero, one, two or three places by designing a
circuit which will shift right only, by one, two or three places.
The barrel shifter is an adaptation of the crossbar switch which recognizes the fact that we can couple the switch
gates together in groups of four and also form four separate groups corresponding to shifts of zero, one,
two and three bits.
The arrangement is readily adapted so that the in-lines also run horizontally. The resulting arrangement is
known as a barrel shifter. The inter-bus switches have their gate inputs connected in a staircase fashion in
groups of four, and there are now four shift-control inputs which must be mutually exclusive in the active
state. The structure of the barrel shifter is highly regular and general.
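Functionally, an end-around barrel shifter is simply a rotate; a minimal Python sketch (the function name is illustrative):

def barrel_rotate_right(word, shift, width=4):
    # End-around (rotating) right shift of a `width`-bit word; a left rotate
    # by k is obtained as a right rotate by (width - k).
    shift %= width
    mask = (1 << width) - 1
    return ((word >> shift) | (word << (width - shift))) & mask

assert barrel_rotate_right(0b1011, 1) == 0b1101   # 1-bit right rotate == 3-bit left rotate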
DESIGNING OF MEMORY AND ARRAY STRUCTURES
Memory Classification: Classification criteria
I. Size: Depending upon the level of abstraction, different means are used to express the size of
a memory unit. The circuit designer tends to define the size of a memory in terms of the
number of bits, which is equivalent to the number of individual cells (flip-flops) needed to
store the data. The chip designer expresses the memory size in bytes or their multiples. The
system designer likes to quote the storage requirement in words.
II. Timing Parameters:
The time it takes to retrieve data (read) from the memory is called the read-access time, which is
equal to the delay between the read request and the moment the data is available at the output.
This time is different from the write-access time, which is the time elapsed between a write
request and the final writing of the input data into the memory. The read or write cycle time of the
memory is the minimum time required between successive reads or writes.
This design does not address the issue of memory aspect ratio (the height is very large compared
to the width). This results in a design which cannot be implemented. Besides the bizarre shape
factor, the resulting design is extremely slow: the vertical wires connecting the storage cells
to the input/output become excessively long. To address this problem, memory arrays are
organized so that the vertical and horizontal dimensions are of the same order of magnitude, thus
the aspect ratio approaches unity. Multiple words are stored in a single row and are selected
simultaneously. To route the correct word to the input/output terminals, an extra piece of
circuitry called the column decoder is needed. The address word is partitioned into a column
address (A0 to AK−1) and a row address (AK to AL−1). The row address enables one row of the
memory for read/write, while the column address picks one particular word from the selected row.
For larger memories, the memory is partitioned into P smaller blocks. The composition of each
of the individual blocks is identical to the above figure. A word is selected based on the row
and column addresses that are broadcast to all the blocks. An extra address word, called the block
address, selects one of the P blocks to be read or written. This approach has a dual advantage:
1. The length of the local word lines and bit lines, i.e. the length of the lines within the blocks, is
kept within bounds, resulting in faster access times.
2. The block address can be used to activate only the addressed block. Non-active blocks
are put in power-saving mode, with the sense amplifiers and the row and column decoders
disabled. This results in a substantial and desirable power saving.
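The resulting address partitioning can be sketched as follows (illustrative Python; the field widths k_col, l_row and p_blk are parameters of this sketch, not fixed by the text):

def split_address(addr, k_col, l_row, p_blk):
    # Bits A0..A(K-1) select the word within the row (column address),
    # bits AK..A(L-1) select the row, and the remaining bits select the block.
    col = addr & ((1 << k_col) - 1)
    row = (addr >> k_col) & ((1 << l_row) - 1)
    blk = (addr >> (k_col + l_row)) & ((1 << p_blk) - 1)
    return blk, row, col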
For this particular case, the address is partitioned into sections of 2 bits that are decoded in
advance. The resulting signals are combined using 4-input NAND gates to produce the fully
decoded array of WL signals.
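A behavioural sketch of this predecoded row decoder (illustrative Python; an 8-bit row address is assumed, and the active-high word line assumes an inverter after each NAND, which is an assumption rather than part of the description above):

def wl_selected(address, row, n_bits=8):
    # Each 2-bit section of the address is predecoded into 4 one-hot lines;
    # every word line has a 4-input NAND fed with one predecoded line per
    # section, chosen by the corresponding bits of the row index.
    for i in range(0, n_bits, 2):
        section = (address >> i) & 0b11
        predecoded = [int(section == v) for v in range(4)]   # 2-to-4 predecode
        if not predecoded[(row >> i) & 0b11]:
            return False     # at least one NAND input low -> word line not asserted
    return True              # all NAND inputs high -> this word line is asserted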
Dynamic Decoders:
Since only one transition determines the decoder speed, it is interesting to evaluate other circuit
implementations.
Column and Block decoders:
The functionality of a column and block decoder is best described as a 2^K-input multiplexer,
where K stands for the size of the address word. One implementation is based on the CMOS
pass-transistor multiplexer. The control signals of the pass transistors are generated using a K-
to-2^K predecoder. The schematic of a 4-to-1 column decoder using only NMOS transistors is
shown. The main advantage of this implementation is its speed: only a single pass transistor is
inserted in the signal path, which introduces only a minimal extra resistance. The column
decoding is one of the last actions to be performed in the read sequence, so the predecoding
can be executed in parallel with other operations such as memory access and sensing, and can
be performed as soon as the column address is available. Consequently, its propagation delay
does not add to the overall memory access time.
A more efficient implementation is offered by a tree decoder that uses a binary reduction
scheme. Notice that no predecoder is required. The number of devices is drastically reduced, as
shown:
Ntree = 2^K + 2^(K−1) + … + 4 + 2 = 2(2^K − 1)
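A one-line check of this device count (illustrative Python):

def tree_decoder_devices(k):
    # 2**k pass transistors in the first level, 2**(k-1) in the next, down to 2.
    return sum(2 ** i for i in range(1, k + 1))   # equals 2 * (2**k - 1)

assert tree_decoder_devices(2) == 6               # a 4-to-1 tree needs 4 + 2 devices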