Avlsi - Module 1word Notesn

word format of AVLSI

Uploaded by

Akash

21EC71 Advanced VLSI

MODULE 1

• Introduction to ASICs
• Full-custom ASIC, semi-custom ASIC
• Programmable ASICs
• ASIC design flow
• ASIC cell libraries
• CMOS logic: datapath logic cells
– Datapath elements
– Adders: carry skip, carry bypass, carry save, carry select, conditional sum
– Multiplier (Booth encoding)
– Datapath operators
• I/O cells, cell compilers


ECE Dept, Vemana IT 1 Prepared by : Prathima A

History of integration and evolution of ICs:

• SSI (Small-Scale Integration) (1962)
– Tens of transistors
– NAND, NOR gates
• MSI (Medium-Scale Integration) (late 1960s)
– Hundreds of transistors
– Counters
• LSI (Large-Scale Integration) (mid 1970s)
– Tens of thousands of transistors
– First microprocessor
• VLSI (Very Large-Scale Integration) (1980s)
– Started at hundreds of thousands of transistors and grew to several billion transistors by 2009
– 64-bit microprocessors with cache memory and floating-point arithmetic units
• ULSI (Ultra Large-Scale Integration) (late 1980s)
– More than about one million circuit elements on a single chip
– The Intel 486 and Pentium microprocessors use ULSI technology

History of technology :

• Bipolar
– More accuracy
• MOS
– Gate-Aluminium
– Low power consumption
– Low cost
• CMOS
– Gate-Poly-Silicon
– Low power consumption
– Low cost
• BiCMOS
- High Current drive

Origin of ASICs:

• Standard parts (standard ICs): initially used to design microelectronic systems.
• Glue logic: microelectronic system design then becomes a matter of defining the functions that we can implement using standard ICs and then implementing the remaining logic functions (sometimes called glue logic) with one or more custom ICs.
• Standard parts were gradually replaced with a combination of glue logic, custom ICs, dynamic random-access memory (DRAM), and static RAM (SRAM).
• ASIC (Application-Specific Integrated Circuit), pronounced "a-sick".
• ASSPs (Application-Specific Standard Products) are a cross between standard parts and ASICs.

ASICs and Non-ASICs :

• Examples of ICs that are not ASICs include standard parts such as:

– memory chips sold as a commodity item—ROMs, DRAM, and SRAM;

- microprocessors;

– TTL or TTL-equivalent ICs at SSI, MSI, and LSI levels.

• Examples of ICs that are ASICs include:

– a chip for a toy bear that talks;

– a chip for a satellite;

– a chip designed to handle the interface between memory and a microprocessor

for a workstation CPU;

– a chip containing a microprocessor as a cell together with other logic.

• ASSPs (two ICs that might or might not be considered ASICs):
– A controller chip for a PC and a chip for a modem.
– Both of these examples are specific to an application (shades of an ASIC) but are sold to many different system vendors (shades of a standard part). ASICs such as these are sometimes called application-specific standard products (ASSPs).

Measurement of IC :

• Gate equivalent
– Number of gates or transistors
– A gate refers to a two-input NAND gate
– In CMOS, each two-input NAND gate consists of 4 transistors
– Example: a 10k-gate IC contains the equivalent of 10,000 two-input NAND gates, or 40,000 transistors in CMOS
• Feature size (smallest feature size = λ)
– λ is half of the smallest transistor length
– Example: for a 0.5 µm IC, the feature size is λ = 0.25 µm

Types of ASICs

• Full-Custom ASICs: possibly all logic cells and all mask layers are customized
• Semi-Custom ASICs: all logic cells are predesigned and some (possibly all) mask layers are customized

Full-Custom ASICs
• Include some (possibly all) customized logic cells
• Have all their mask layers customized
• Manufacturing lead time is typically 8 weeks (the time taken to make the IC, not including design time)
• Full-custom ASIC design makes sense only when:
– no suitable existing cell libraries exist, or
– existing library cells are not fast enough, or
– the available predesigned/pretested cells consume more power than the design can allow, or
– the available logic cells are not compact enough to fit, or
– the ASIC technology is new and/or so special that no cell library exists.
• Advantages:
– Offer the highest performance
– Lowest part cost (smallest die size)
• Disadvantages:
– Increased design time
– Increased complexity
– Higher design cost
– Higher risk
• Some examples:
– Microprocessors
– High-voltage automobile control chips
– Analog/digital communication chips
– Sensors and actuators

Semi-Custom ASICs: all logic cells are predesigned and some (possibly all) mask layers are customized

1) Standard-Cell Based ASICs (CBIC, pronounced "sea-bick")

• Use predesigned logic cells (called standard cells) from standard-cell libraries
• May combine standard cells with larger predesigned cells, known by various names:
– megacells (microcontrollers or microprocessors)
– full-custom blocks
– System-Level Macros (SLMs)
– Functional Standard Blocks (FSBs)
– cores, etc.
• All mask layers are customized: transistors and interconnect
• Manufacturing lead time is about 8 weeks
• Custom blocks can be embedded

Standard-Cell Based Array (Typical CBIC)

• Wiring cells in standard-cell based ASICs
– Feedthrough: either the piece of metal that is used to pass a signal through a cell, or a space in a cell waiting to be used as a feedthrough.
– Spacer cells: the width of each row of standard cells is adjusted using spacer cells so that the rows may be aligned.
– Row-end cells: the power buses, or rails, are connected to additional vertical power rails using row-end cells at the aligned ends of each standard-cell block.
– Power cells: if the rows of standard cells are long, vertical power rails can also be run in metal2 through the cell rows using special power cells that just connect to VDD and GND.

Usually the designer manually controls the number and width of the vertical power rails connected to the standard-cell blocks during physical design.

• Standard cells in a flexible block of a CBIC
– Each standard cell in the library is constructed using full-custom design methods, so it offers the same performance and flexibility while reducing design time and risk.
– The ASIC designer defines only the placement of the standard cells.
– Standard cells can be placed anywhere on the silicon.

• Flexible blocks in a CBIC
– Standard cells are designed to fit together like bricks in a wall.
– Groups of standard cells fit horizontally to form rows.
– The rows stack vertically to form flexible blocks, which can be reshaped during design.
– Flexible blocks can be connected to other standard-cell blocks or to full-custom blocks.

• Advantages of CBICs
– Save time and money, and reduce risk
– Each standard cell can be optimized individually for speed or area

• Disadvantages of CBICs
– Time needed to design the standard-cell library
– Expense of designing the standard-cell library
– Time needed to fabricate all layers of the ASIC for each new design

2) Gate-Array Based ASICs

• Transistors are predefined on the silicon wafer.
• The predefined pattern of transistors on a gate array is the base array.
• The smallest element that is repeated to form the base array is the base cell (or primitive cell).
• Only the top few layers of metal, which define the interconnect between transistors, are defined by the designer using custom masks.
• It is often called a masked gate array (MGA) or a prediffused array.
• It uses predesigned macros (books) and offers a short turnaround time (a few days to a couple of weeks).
• There are three types:
a) Channeled gate arrays b) Channel-less gate arrays c) Structured gate arrays

a) Channeled gate arrays

Similar to a CBIC, but the space for interconnect between the rows of cells is fixed.

b) Channel-less gate arrays

• The key difference between a channel-less gate array and a channeled gate array is that there are no predefined areas set aside for routing between cells on a channel-less gate array.
• When we use an area of transistors for routing in a channel-less array, we do not make any contacts to the devices lying underneath; we simply leave the transistors unused.
• The logic density (the amount of logic that can be implemented in a given silicon area) is higher for channel-less gate arrays.
• This is usually attributed to the difference in structure between the two types of array. In fact, the difference occurs because the contact mask is customized in a channel-less gate array but is not usually customized in a channeled gate array. This leads to denser cells in the channel-less architectures.
• Customizing the contact layer in a channel-less gate array allows us to increase the density of gate-array cells because we can route over the top of unused contact sites.

c) Structured gate arrays

• An embedded gate array or structured gate array (also known as master slice or
master image ) combines some of the features of CBICs and MGAs.

• One of the disadvantages of the MGA is the fixed gate-array base cell. This makes
the implementation of memory, for example, difficult and inefficient.

• In an embedded gate array we set aside some of the IC area and dedicate it to a
specific function.

• This embedded area either can contain a different base cell that is more suitable for
building memory cells, or it can contain a complete circuit block, such as a
microcontroller.

Channeled gate array
Adv:
• Specific space is set aside for interconnect
Disadv:
• Unlike a CBIC, the space for interconnect is not adjustable

Channel-less gate array
Adv:
• Logic density is higher
• The contact layer is customized
Disadv:
• No specific area is set aside for routing
• Rows of transistors used for routing cannot be used for any other purpose

Structured gate array
Adv:
• Part of the IC area is set aside as an embedded block dedicated to a specific, customized function
• Improved area efficiency and performance, as in a CBIC
• Low cost and fast turnaround, as in an MGA
Disadv:
• The embedded function is fixed

3) Programmable ASICs
• PLDs – Programmable Logic Devices are low-density devices which contain 1k – 10 k
gates and are available both in Bipolar and CMOS technologies [PLA, PAL or GAL]
• CPLDs or FPLDs or FPGAs -
FPGAs combine architecture of gate arrays with programmability of PLDs.
• User Configurable
• Contain Regular Structures - circuit elements such as AND, OR, NAND/NOR gates,
FFs, Mux, RAMs
• Allow Different Programming Technologies
• Allow both Matrix and Row- based Architectures
• Programmable logic devices ( PLDs ) are standard ICs
• Available in standard configurations
• Sold in very high volume to many different customers.
• PLDs may be configured or programmed to create a part customized to a
specific application
• PLDs use different technologies to allow programming of the device.

• The important features of PLDs:


– No customized mask layers or logic cells
– Fast design turnaround
• Structure of programmable logic device (PLD)
– A single large block of programmable interconnect
– A matrix of logic macrocells that usually consist of programmable array logic
followed by a flip-flop or latch
Examples of PLD:
• The simplest type of programmable IC is a Read-Only Memory ( ROM ).
• The most common types of ROM use a metal fuse that can be blown permanently (a
programmable ROM or PROM ).
• An electrically programmable ROM (EPROM) uses programmable MOS transistors whose characteristics are altered by applying a high voltage.
• An EPROM can be erased in one of two ways:
– by using another high voltage (an electrically erasable PROM, or EEPROM), or
– by exposing the device to ultraviolet light (a UV-erasable PROM, or UVPROM).
• There is another type of ROM that can be placed on any ASIC—a mask programmable
ROM (mask-programmed ROM or masked ROM).
– A masked ROM is a regular array of transistors permanently programmed using
custom mask patterns.
– An embedded masked ROM is thus a large, specialized, logic cell.
Type of PLDs-PLA and PAL:

• We can place a logic array as a cell on a custom ASIC. This type of logic array is called a programmable logic array (PLA).
• A PLA has a programmable AND logic array, or AND plane, followed by a programmable OR logic array, or OR plane.
• A PAL has a programmable AND plane and, in contrast to a PLA, a fixed OR plane.
• Depending on how the PLD is programmed, we can have an:
– Erasable PLD (EPLD)
– Mask-programmed PLD (sometimes called a masked PLD, but usually just PLD)
• The first bipolar-based PALs, PLAs, and PLDs used programmable fuses or links.
• CMOS PLDs usually employ floating-gate transistors.

FPGA (Field Programmable Gate Array) and its characteristics:


• None of the mask layers are customized
• A method of programming the basic logic cells and interconnect
• Core-regular array of Programmable basic logic cells implement combinational or
sequential logic
• Matrix of programmable interconnects surround the basic logic cells
• Programmable I/O cells surround the core

• Design turnaround is a few hours.

• Difference between a PLD and an FPGA:
– FPGAs are larger and more complex than PLDs (an FPGA can be viewed as a complex PLD).

FPGA - Field Programmable Gate Arrays

Why FPGA-based ASIC Design?

• The choice is based on many factors:
– Speed
– Gate density
– Development time
– Prototyping and simulation time
– Manufacturing lead time
– Future modifications
– Inventory risk
– Cost

Different Categorizations of FPGAs:

• Based on functional unit / logic cell structure
– Transistor pairs
– Basic logic gates: NAND/NOR
– MUX
– Look-up tables (LUTs)
– Wide fan-in AND-OR gates
• Based on programming technology
– Antifuse technology
– SRAM technology
– EPROM technology
• Based on gate density
• Based on chip architecture (routing style)

Different Types of Logic Cells :


Case Study : SPARC station


1. Better performance at lower cost
2. Compact size, reduced power, and quiet operation
3. Reduced number of parts, easier assembly, and improved reliability

ASIC Design Flow


Datapath Logic Cells / elements


What is the difference between datapath and standard cells?
• Standard Cell Based Design: Cells are placed together in rows but there is generally
no regularity to the arrangement of the cells within the rows—we let software arrange
the cells and complete the interconnect.
• Datapath layout automatically takes care of most of the interconnect between the
cells with the following advantages:
– Regular layout produces predictable and equal delay for each bit.
– Interconnect between cells can be built into each cell.
Disadvantages: Overhead , Harder Design, Software is more complex


Suppose we want to design a full adder (FA). Its two functions are:

SUM = A ⊕ B ⊕ CIN
COUT = A · B + A · CIN + B · CIN = MAJ(A, B, CIN)

We can combine the two functions into a single FA logic cell:

ADD(A[i], B[i], CIN, S[i], COUT)

The layout of bus-wide logic that operates on data signals is called a datapath.
The module ADD is called a datapath element.

Ripple-carry adder: n-bit adder built from full adders.


Delay of ripple-carry adder goes through all carry bits.


• In RCA, every stage adder depends on previous stage carry .


• As the carry bit ripples from one stage to another , the last stage of the adder has to
wait long for getting the carry from previous additions.
• The worst case delay is maximum delay which is the delay incurred in the rippling of
the carry all the way from LSB to MSB
• Delay is proportional to the number of the bits
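
The ripple behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original notes: the helper names `full_adder`, `ripple_carry_add`, `to_bits`, and `from_bits` are assumptions, and bits are stored LSB first.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin                          # SUM = A xor B xor CIN
    cout = (a & b) | (a & cin) | (b & cin)   # COUT = MAJ(A, B, CIN)
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each stage needs the carry of the previous stage: the carry ripples.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """Hypothetical helper: integer to an n-element bit list, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Hypothetical helper: bit list (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

The loop makes the linear delay visible: stage i cannot finish before stage i – 1 has produced its carry, so an n-bit addition needs n full-adder carry delays in the worst case.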

Carry Look Ahead Adder :


• A carry-look ahead adder (CLA) is a type of adder used in digital logic.
• A carry-look ahead adder improves speed by reducing the amount of time required to
determine carry bits.
• The carry-look ahead adder calculates one or more carry bits before the sum, which
reduces the wait time to calculate the result of the larger value bits.
• To reduce the computation time, there are faster ways to add two binary numbers by
using carry lookahead adders.
• They work by creating two signals, P and G, known as carry propagate and carry generate.
• The carry propagate signal indicates that an incoming carry is passed on to the next stage, whereas the carry generate signal produces an output carry regardless of the input carry.

A   B   Cout
0   0   Kill
0   1   Propagate
1   0   Propagate
1   1   Generate

• How do we make an n bit adder?
• The delay of the adder chain needs to be optimized.
• The block diagram of a 4-bit Carry Lookahead Adder is shown here below

• First compute carry propagate and generate:

P[i] = A[i] ⊕ B[i]
G[i] = A[i] · B[i]

• Compute sum and carry from P and G:

S[i] = P[i] ⊕ C[i]
C[i+1] = G[i] + P[i] · C[i]

Carry-lookahead expansion:
• Can recursively expand carry formula.
• Having these we could design the circuit. We can now write the Boolean function for
the carry output of each stage and substitute for each Ci its value from the previous
equations

• Expanded formula does not depend on intermediate carries.
• Allows carry for each bit to be computed independently.
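
As a rough illustration of the expanded carry formulas, the sketch below models a 4-bit CLA in Python. It assumes the usual definitions P = A xor B and G = A and B (consistent with the kill/propagate/generate table), and the function name `cla_add4` is hypothetical.

```python
def cla_add4(a, b, c0=0):
    """4-bit carry-lookahead addition of integers a, b in 0..15.

    Returns (sum, carry_out).
    """
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    P = [x ^ y for x, y in zip(A, B)]   # propagate: P_i = A_i xor B_i
    G = [x & y for x, y in zip(A, B)]   # generate:  G_i = A_i and B_i

    # Expanded carries: each one depends only on P, G and c0,
    # never on another intermediate carry.
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0])
          | (P[3] & P[2] & P[1] & P[0] & c0))

    carries = [c0, c1, c2, c3]
    S = [P[i] ^ carries[i] for i in range(4)]   # S_i = P_i xor C_i
    return sum(bit << i for i, bit in enumerate(S)), c4
```

Each carry expression is a two-level AND-OR function of the inputs, which is why all carries can be evaluated in parallel; the growing number of product terms is the hardware cost of that speed.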

• The CLA is designed to minimize the carry-propagation delay by generating the carry signals for each bit position in advance.
• It uses a set of logic equations to compute the carry signals independently, leading to faster addition.
• A CLA is much faster than an RCA because it uses the propagate and generate signals.
• A CLA gains speed by reducing the time required to determine the carry bits.
• A CLA reduces the propagation delay at the cost of more complex hardware.

Carry Bypass Adder(CBA)/ Carry Skip Adder(CSKA):


• A CBA is an adder implementation that improves on the delay of an RCA with little extra effort compared to other adders.
• The worst-case delay is improved by using several carry-skip adders to form a block carry-skip adder.
• A CBA introduces additional logic to bypass the carry chain when possible.
• A CBA is composed of cascaded full adders with additional carry logic circuitry.
• Modification to the RCA to reduce the delay of the carry. Consider the following functions for each bit:

Generate : G = A · B

Propagate : P = A ⊕ B

Delete : D = NOT(A) · NOT(B)

Condition : If BP = P0 · P1 · P2 · P3 = 1, then Co,3 = Ci,0.

Else a Delete or Generate occurs within the block.


• When BP = P0 · P1 · P2 · P3 = 1, the incoming carry is bypassed straight to the next block, hence the name.

• Otherwise the block behaves like a normal RCA.

• The delay includes the setup time needed to evaluate the generate and propagate functions.

To design an N-bit adder: divide the adder into N/M equal-length bypass stages of M bits each (for example, N = 16 and M = 4).

tsetup: Time taken to create generate and propagate signals.

tbypass: propagation delay through MUX

tcarry : propagation delay through single bit

tsum : Time to generate the sum of the final stage where N=2M
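
A minimal Python sketch of the bypass idea follows. The function names are hypothetical, N is assumed to be a multiple of M, and the delay expression is the commonly quoted worst-case estimate t = tsetup + M·tcarry + (N/M – 1)·tbypass + (M – 1)·tcarry + tsum, which is an assumption here since the notes list only the individual terms.

```python
def carry_bypass_add(a, b, n=16, m=4, cin=0):
    """Add two n-bit integers using m-bit ripple blocks with a bypass MUX.

    Assumes n is a multiple of m. Returns (sum, carry_out).
    """
    carry, total = cin, 0
    for base in range(0, n, m):
        block_cin = carry
        block_propagate = 1
        for i in range(base, base + m):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            total |= (p ^ carry) << i      # sum bit of this position
            carry = g | (p & carry)        # carry ripples inside the block
            block_propagate &= p           # BP = P0 . P1 ... P(m-1)
        # Bypass MUX: if every bit propagates, carry-out equals carry-in.
        carry = block_cin if block_propagate else carry
    return total, carry


def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay estimate for n bits split into n // m bypass stages."""
    return (t_setup + m * t_carry + (n // m - 1) * t_bypass
            + (m - 1) * t_carry + t_sum)
```

In software the bypass selects a value the ripple chain would have produced anyway; in hardware the point is that the MUX output is available after one `t_bypass` instead of M carry delays.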

Advantages

• Improves delay compared to the RCA with minimal effort and complexity

Disadvantages

• Probabilistic speed improvement : the speed of a CBA improves for only some input
combinations

• Overhead of Multiplexer circuits

Applications

• In some DSP systems, control systems, and high-speed digital systems

• In computer processors to perform fast addition operations

Carry Save Adder (CSA)

• A CSA is a digital adder that can efficiently add three or more binary numbers.

• The CSA outputs two numbers, a partial sum and a carry, instead of a single sum.

• A CSA uses bit-level compressors to break the carry chain, so the numbers are summed without carry propagation.

• It is a high-speed multi-operand adder that operates on three numbers at a time.

• It is used in high-speed applications, mostly in binary multipliers.

• It is often faster than conventional addition.

• The basic CSA is made up of a number of (3,2) counters in parallel, with no carry links.

Main idea: do not propagate the carry signal until the last possible stage.


• This adder adds three bits at once and produces two outputs (a sum bit and a carry bit).

• The final sum is obtained by adding the two outputs.

• The carry is not propagated through the stages of this adder.

• Instead, the carry is stored within the current stage and added in as an operand at the next stage.

• A simple n-bit RCA (or another carry-propagate adder) is used at the last level, where the final sum is computed.
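
The following Python sketch illustrates the carry-save principle on whole words. It is an illustrative model only: `carry_save_add` and `multi_operand_add` are hypothetical names, and operands are assumed to be non-negative integers.

```python
def carry_save_add(x, y, z):
    """(3,2) counters in parallel: reduce three numbers to two.

    Returns (partial_sum, carry) with partial_sum + carry == x + y + z.
    """
    partial_sum = x ^ y ^ z                       # per-bit sum, no carry links
    carry = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, one place up
    return partial_sum, carry


def multi_operand_add(operands):
    """Sum several non-negative integers with CSAs and one final addition."""
    operands = list(operands)
    while len(operands) > 2:
        s, c = carry_save_add(operands[0], operands[1], operands[2])
        operands = operands[3:] + [s, c]
    # Only this last step needs a carry-propagate adder.
    return sum(operands)
```

For example, `carry_save_add(5, 6, 7)` yields a partial sum of 4 and a carry word of 14, which add up to 18. No bit position waits for its neighbour until the final addition.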

Advantages

• The CSA reduces the addition of 3 numbers to the addition of 2 numbers.

• It consumes less power than other types of adders because there are few carry-propagation stages.

• Constant delay: there is no need to wait for the carry to propagate through the word.

• There is no carry-propagation delay within a CSA tree; carries propagate only in the final adder.

Disadvantages

• At each stage, only partial sums are known.

• Not suitable for simple two-operand additions.

• Sign detection is not easy.

• It is energy efficient only for operations on many bits; power consumption and delay are high for few-bit operations.

Applications

• To calculate products in integer multiplication

• Used in high-speed multipliers, where it performs better than an RCA or another carry-propagate adder (CPA)

• Used to build multi-operand adders without significantly increasing the hardware used

Carry Select Adder(CSLA)

• A CSLA is a fast adder used in high-speed arithmetic, processing applications, memory architectures, and digital communication systems.

• A CSLA precomputes the result for each possible value of the carry that is needed for the final output. This helps to reduce the carry-propagation delay.

• A CSLA uses blocks of two ripple-carry adders, one with a constant 0 carry-in and the other with a constant 1 carry-in. This allows both blocks to calculate in parallel.

• A carry-select adder is often used as the fast adder in a datapath library because its layout is regular.

• In a carry-select adder we duplicate two small adders (usually 4-bit or 8-bit adders, often CLAs) for the cases Cin = '0' and Cin = '1' and then use a MUX to select the case that we need: wasteful, but fast [Bedrij, 1962].
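
A small Python model of the select step is sketched below. The names `block_add` and `carry_select_add` are hypothetical, the block adder is modelled with ordinary integer addition rather than gates, and n is assumed to be a multiple of the block width.

```python
def block_add(a, b, cin, width):
    """Reference block adder: returns (sum mod 2**width, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n=16, block=4, cin=0):
    """Add two n-bit integers block by block with carry selection.

    Assumes n is a multiple of block. Returns (sum, carry_out).
    """
    mask = (1 << block) - 1
    carry, result = cin, 0
    for base in range(0, n, block):
        a_blk, b_blk = (a >> base) & mask, (b >> base) & mask
        sum0, cout0 = block_add(a_blk, b_blk, 0, block)   # assumes Cin = 0
        sum1, cout1 = block_add(a_blk, b_blk, 1, block)   # assumes Cin = 1
        # MUX: the real carry only selects between two ready results.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << base
    return result, carry
```

In hardware both candidate results of every block are computed at the same time, so the critical path is one block addition plus one MUX delay per block.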

Advantages

• The adder speeds up addition by operating on the lower and upper parts of the word simultaneously.

Disadvantages

• The price paid is additional hardware: a duplicate adder for each block, a set of multiplexers, and the associated interconnect wiring. The area requirement is large because of the multiple pairs of RCAs.

• The design is favourable only when speed is more important than area.

Conditional Sum Adder

• A conditional-sum adder is a recursive structure based on the carry-select adder (CSLA).

• The carry-select adder is a fast addition scheme that divides the n-bit operands into smaller groups, so that the otherwise serial carry propagation is carried out in parallel within the groups.

• In the conditional-sum adder, the MUX level chooses between two n/2-bit inputs that are themselves built as conditional-sum adders.

• We can extend the idea behind a carry-select adder as follows. Suppose we have an n-bit adder that generates two sums: one sum assumes a carry-in condition of '0', the other sum assumes a carry-in condition of '1'.

• We can split this n-bit adder into an i-bit adder for the i LSBs and an (n – i)-bit adder for the (n – i) MSBs. Both of the smaller adders generate two conditional sums as well as true and complement carry signals.

• The two (true and complement) carry signals from the LSB adder are used to select between the two (n – i + 1)-bit conditional sums from the MSB adder using 2(n – i + 1) two-input MUXes.

• The simplest form of an n-bit conditional-sum adder uses n single-bit conditional adders, H (each with four outputs: two conditional sums, true carry, and complement carry), together with a tree of 2:1 MUXes (Qi_j).

• The conditional-sum adder is usually the fastest of all the adders we have discussed (it is the fastest when logic cell delay increases with the number of inputs; this is true for all ASICs except FPGAs).

• We can recursively apply this technique. For example, we can split a 16-bit adder using i = 8 and (n – i) = 8; then we can split one or both 8-bit adders again, and so on.
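
The recursive splitting can be sketched in Python as follows. This is an illustrative model with hypothetical function names; it always splits in the middle, so n is assumed to be a power of two and the operands are assumed to fit in n bits.

```python
def conditional_sum(a, b, n):
    """Return ((sum0, cout0), (sum1, cout1)) for carry-in 0 and carry-in 1.

    Assumes n is a power of two and 0 <= a, b < 2**n.
    """
    if n == 1:
        # Single-bit conditional cell H: both sums and both carries.
        return ((a ^ b, a & b), ((a ^ b) ^ 1, a | b))
    half = n // 2
    mask = (1 << half) - 1
    lo = conditional_sum(a & mask, b & mask, half)      # the i LSBs
    hi = conditional_sum(a >> half, b >> half, half)    # the (n - i) MSBs
    results = []
    for lo_sum, lo_cout in lo:
        # MUX: the carry from the LSB half selects the MSB result.
        hi_sum, hi_cout = hi[lo_cout]
        results.append((lo_sum | (hi_sum << half), hi_cout))
    return tuple(results)


def conditional_sum_add(a, b, n=16, cin=0):
    """Pick the conditional result that matches the actual carry-in."""
    return conditional_sum(a, b, n)[cin]
```

Every level of the recursion corresponds to one level of 2:1 MUXes, so the depth of the MUX tree grows with log2(n) rather than with n.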


Advantages

• Fast addition: Parallel addition speeds up addition

• Structure is modular and suitable for ICs

• Synchronous and rapid

• Extendable for multi-operand summation

Disadvantages

• Cost of implementation is high as it requires pairs of adders

• Area requirement is high

Booth’s Multiplier

• Booth's algorithm is a method for multiplying two binary integers in signed 2's-complement representation more efficiently than straightforward shift-and-add multiplication.

• It uses fewer additions and subtractions by recoding runs of 1s in the multiplier; the operands are represented as 2's-complement numbers.

• The algorithm loads the multiplicand and multiplier into registers, initializes a third register to 0, and performs bitwise shifts and arithmetic operations (addition/subtraction of the multiplicand) on the registers based on the values of bits from the multiplier.

Fig : Flowchart of Booth Algorithm


• This process builds up the product one bit at a time in a third register.
• The function of the algorithm is to determine the beginning and the end of the string of
ones in the multiplier and perform multiplicand addition-accumulation at the end of
the string or perform multiplicand subtraction-accumulation at the beginning of the
string.
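
The register-level procedure can be sketched in Python as below. This is a minimal model of the textbook radix-2 algorithm, not the exact flowchart of the notes: the register names `acc`, `q`, and `q_minus1` are assumptions, and with an n-bit accumulator the multiplicand must lie in the range –(2^(n–1) – 1) to 2^(n–1) – 1.

```python
def booth_multiply(multiplicand, multiplier, n=8):
    """Multiply two n-bit two's-complement integers with Booth's algorithm.

    The multiplicand must not be the most negative n-bit value.
    """
    mask = (1 << n) - 1
    m = multiplicand & mask
    acc = 0                    # third register, initialised to 0
    q = multiplier & mask      # multiplier register
    q_minus1 = 0               # extra bit to the right of q

    for _ in range(n):
        pair = (q & 1, q_minus1)
        if pair == (1, 0):     # beginning of a string of 1s: subtract
            acc = (acc - m) & mask
        elif pair == (0, 1):   # end of a string of 1s: add
            acc = (acc + m) & mask
        # Arithmetic right shift of the combined register (acc, q, q_minus1).
        q_minus1 = q & 1
        q = ((q >> 1) | ((acc & 1) << (n - 1))) & mask
        acc = (acc >> 1) | (acc & (1 << (n - 1)))   # keep the sign bit

    product = (acc << n) | q
    if product & (1 << (2 * n - 1)):   # interpret the 2n-bit result as signed
        product -= 1 << (2 * n)
    return product
```

A multiplier such as '0111 1000' needs only one subtraction and one addition, because only the two ends of the string of 1s trigger an arithmetic operation.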


Advantages

• Faster than traditional multiplication methods, as it requires fewer steps to produce the same result.

• Reduces partial products: making it useful for multipliers with long operands

• Efficient for signed numbers

Disadvantages

• Higher power consumption: Requires large number of adder cells

• Circuit to generate partial products in the Booth encoding is complex

Other Datapath Operators

• Figure below shows symbols for some other datapath elements.

• The combinational datapath cells, NAND, NOR, and so on, and sequential datapath cells
(flip-flops and latches) have standard-cell equivalents and function identically.

• Bold outlines(1 point) are used for datapath cells instead of the regular (0.5 point) that
is used for scalar symbols. We call a set of identical cells a vector of datapath elements
in the same way that a bold symbol, A , represents a vector and A represents a scalar.


1) A subtracter is similar to an adder, except that in a full subtracter we have a borrow-in signal, BIN; a borrow-out signal, BOUT; and a difference signal, DIFF.

• DIFF = A ⊕ NOT(B) ⊕ NOT(BIN) = SUM(A, NOT(B), NOT(BIN))

• NOT(BOUT) = A · NOT(B) + A · NOT(BIN) + NOT(B) · NOT(BIN) = MAJ(A, NOT(B), NOT(BIN))

• These equations are the same as those for the FA except that the B input is inverted and the sense of the carry chain is inverted.

• To build a subtracter that calculates (A – B) we invert the entire B input bus and connect the BIN[0] input to VDD (not to VSS as we did for CIN[0] in an adder).

• As an example, to subtract B = '0011' from A = '1001' we calculate:

'1001' + '1100' + '1' = '0110'.

• As with an adder, the true overflow is XOR(BOUT[MSB], BOUT[MSB – 1]).

• We can build a ripple-borrow subtracter (a type of borrow-propagate subtracter), a borrow-save subtracter, and a borrow-select subtracter in the same way we built these adder architectures.

• An adder/subtracter has a control signal that gates the A input with an exclusive-OR
cell (forming a programmable inversion) to switch between an adder or subtracter.

• Some adder/subtracters gate both inputs to allow us to compute (–A – B). We must be
careful to connect the input to the LSB of the carry chain (CIN[0] or BIN[0]) when
changing between addition (connect to VSS) and subtraction (connect to VDD).
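
The invert-and-add-one scheme can be checked with a short Python sketch. The function name `add_sub` is hypothetical, and this version applies the programmable inversion to the B bus so that the result is A – B; it models the arithmetic only, not the gate-level cells.

```python
def add_sub(a, b, subtract, n=4):
    """n-bit adder/subtracter: A + B when subtract = 0, A - B when subtract = 1.

    Returns (result, carry_out_of_msb).
    """
    mask = (1 << n) - 1
    # XOR cells form a programmable inversion of the B bus.
    b_gated = (b ^ (mask if subtract else 0)) & mask
    # The LSB carry input is tied to '1' (VDD) for subtraction, '0' (VSS) otherwise.
    total = (a & mask) + b_gated + (1 if subtract else 0)
    return total & mask, total >> n
```

With A = '1001' and B = '0011', `add_sub(0b1001, 0b0011, 1)` computes '1001' + '1100' + '1' and returns the 4-bit result '0110'.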

2) A barrel shifter rotates or shifts an input bus by a specified amount. For example if we
have an eight input barrel shifter with input '1111 0000' and we specify a shift of '0001
0000' (3, coded by bit position) the right-shifted 8-bit output is '0001 1110’.

• A barrel shifter may rotate left or right (or switch between the two under a separate
control).

• A barrel shifter may also have an output width that is smaller than the input.

• To use a simple example, we may have an 8-bit input and a 4-bit output.

• This situation is equivalent to having a barrel shifter with two 4-bit inputs and a 4-bit
output.

• Barrel shifters are used extensively in floating-point arithmetic to align (we call this
normalize and denormalize ) floating-point numbers (with sign, exponent, and
mantissa).
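
A behavioural Python sketch of a barrel shifter is given below. The names are hypothetical, and the decoding of the one-hot shift code (counted from the MSB, so that '1000 0000' means a shift of 0 and '0001 0000' a shift of 3) is an assumption inferred from the example above.

```python
def one_hot_to_amount(select, n=8):
    """Decode a one-hot shift code counted from the MSB."""
    if select == 0 or (select & (select - 1)) != 0:
        raise ValueError("select must have exactly one bit set")
    return (n - 1) - (select.bit_length() - 1)


def barrel_shift(value, select, n=8, direction="right", rotate=False):
    """Shift or rotate an n-bit value by the amount encoded in select."""
    mask = (1 << n) - 1
    k = one_hot_to_amount(select, n)
    value &= mask
    if direction == "right":
        shifted = value >> k
        wrapped = (value << (n - k)) & mask   # bits that fall off the right
    else:
        shifted = (value << k) & mask
        wrapped = value >> (n - k)            # bits that fall off the left
    return (shifted | wrapped) if rotate else shifted
```

With this convention `barrel_shift(0b11110000, 0b00010000)` reproduces the right-shifted output '0001 1110' of the example.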

3) A leading-one detector is used with a normalizing (left-shift) barrel shifter to align mantissas in floating-point numbers.

• The input is an n -bit bus A, the output is an n -bit bus, S, with a single '1’ in the bit
position corresponding to the most significant '1' in the input.
• Thus, for example, if the input is A = '0000 0101' the leading-one detector output
is S = '0000 0100', indicating the leading one in A is in bit position 2 (bit 7 is the
MSB, bit zero is the LSB).
• If we feed the output, S, of the leading-one detector to the shift select input of a
normalizing (left-shift) barrel shifter, the shifter will normalize the input A.
• In our example, with an input of A = '0000 0101', and a left-shift of S = '0000
0100', the barrel shifter will shift A left by five bits and the output of the shifter is
Z = '1010 0000’.
• Now that Z is aligned (with the MSB equal to '1') we can multiply Z with another
normalized number.
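
The detector and the normalizing shift can be modelled together in a few lines of Python. The function names are hypothetical; bit 0 is the LSB, and the shift amount is taken as the distance of the leading one from the MSB.

```python
def leading_one_detect(a, n=8):
    """Return an n-bit word with a single '1' at the most significant '1' of a."""
    a &= (1 << n) - 1
    return 0 if a == 0 else 1 << (a.bit_length() - 1)


def normalize(a, n=8):
    """Left-shift a until its MSB is '1'; returns (normalized value, shift)."""
    mask = (1 << n) - 1
    s = leading_one_detect(a, n)
    if s == 0:
        return 0, 0                            # nothing to normalize
    shift = (n - 1) - (s.bit_length() - 1)     # distance from the MSB
    return (a << shift) & mask, shift
```

For A = '0000 0101' the detector returns '0000 0100' and `normalize` returns '1010 0000' with a shift of 5, as in the example above.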

4) The output of a priority encoder is the binary-encoded position of the leading one in an
input.

For example, with an input A = '0000 0101' the leading 1 is in bit position 3 so the output
of a 4-bit priority encoder would be Z = '0011' (3).

In some cell libraries the encoding is reversed so that the MSB has an output code of zero;
in this case Z = '0101' (5).

This second, reversed, encoding scheme is useful in floating-point arithmetic.

If A is a mantissa and we normalize A to '1010 0000', we have to subtract 5 from the
exponent; this exponent correction is equal to the output of the priority encoder.
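The link between the reversed encoding and the exponent correction can be checked with a short sketch. The names `reversed_priority_encode` and `normalize_float` are hypothetical, and the starting exponent of 10 is an arbitrary value chosen only for the example.

```python
def reversed_priority_encode(a: int, width: int = 8) -> int:
    """Reversed encoding: a leading one in the MSB gives code 0."""
    a &= (1 << width) - 1
    if a == 0:
        raise ValueError("no leading one in an all-zeros input")
    return width - a.bit_length()


def normalize_float(mantissa: int, exponent: int, width: int = 8):
    """Left-align the mantissa and subtract the shift from the exponent."""
    shift = reversed_priority_encode(mantissa, width)
    mask = (1 << width) - 1
    return (mantissa << shift) & mask, exponent - shift


code = reversed_priority_encode(0b00000101)
mantissa, exponent = normalize_float(0b00000101, 10)
print(code, format(mantissa, "08b"), exponent)
```

The encoder output (5) is both the shift amount for the barrel shifter and the amount subtracted from the exponent, which is why the reversed scheme is convenient here.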

5) An accumulator is an adder/subtracter and a register. Sometimes these are combined
with a multiplier to form a multiplier–accumulator (MAC). An incrementer adds 1 to the
input bus, Z = A + 1, so we can use this function, together with a register, to negate a
two's complement number, for example.

• The implementation is Z[i] = XOR(A[i], CIN[i]), and

COUT[i] = AND(A[i], CIN[i]).

• The carry-in control input, CIN[0], thus acts as an enable: If it is set to '0' the output is
the same as the input.

• The implementation of arithmetic cells is often a little more complicated than we have
explained.

• CMOS logic is naturally inverting, so that it is faster to implement an incrementer as:

Z[i (even)] = XOR(A[i], CIN[i]) and COUT[i (even)] = NAND(A[i], CIN[i]).

• This inverts COUT, so that in the following stage we must invert it again. If we push an
inverting bubble to the input CIN we find that:

Z[i (odd)] = XNOR(A[i], CIN[i]) and COUT[i (odd)] = NOR(NOT(A[i]), CIN[i]).

• In many datapath implementations all odd-bit cells operate on inverted carry signals,
and thus the odd-bit and even-bit datapath elements are different.

• In fact, all the adder and subtracter datapath elements we have described may use this
technique.

• Normally this is completely hidden from the designer in the datapath assembly and
any output control signals are inverted, if necessary, by inserting buffers.
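To see that the inverting form computes the same result, the sketch below compares a plain ripple incrementer with one whose even stages use NAND and whose odd stages use XNOR and NOR on an inverted carry. It is a bit-level Python model under the assumption that bit 0 is the LSB and is an even stage; the function names are hypothetical.

```python
def to_bits(value, width):
    """Split an integer into a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def incrementer(a, width=8, cin0=1):
    """Reference form: Z[i] = XOR(A[i], CIN[i]), COUT[i] = AND(A[i], CIN[i])."""
    z, cin = [], cin0
    for a_i in to_bits(a, width):
        z.append(a_i ^ cin)
        cin = a_i & cin
    return from_bits(z)


def incrementer_inverting(a, width=8, cin0=1):
    """Same function; odd stages receive and consume an inverted carry."""
    z, carry = [], cin0          # carry is CIN on even stages, NOT CIN on odd
    for i, a_i in enumerate(to_bits(a, width)):
        if i % 2 == 0:
            z.append(a_i ^ carry)              # XOR with the true carry
            carry = 1 - (a_i & carry)          # NAND: inverted carry out
        else:
            z.append(1 - (a_i ^ carry))        # XNOR with the inverted carry
            carry = 1 - ((1 - a_i) | carry)    # NOR(NOT A, CIN): true carry out
    return from_bits(z)


print(incrementer(0b00001111), incrementer_inverting(0b00001111))
```

Setting `cin0=0` in either version passes the input through unchanged, which is the enable behaviour of CIN[0] described above.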

• A decrementer subtracts 1 from the input bus; the logical implementation is:

Z[i] = XOR(A[i], CIN[i]) and COUT[i] = AND(NOT(A[i]), CIN[i]).

• The implementation may invert the odd carry signals, with CIN[0] again acting as an
enable.

• An incrementer/decrementer has a second control input that gates the input, inverting
the input to the carry chain.

• This has the effect of selecting either the increment or decrement function.
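A combined incrementer/decrementer can be modelled the same way. In this sketch the second control input is a Python flag named `decrement` (a hypothetical name) that inverts A[i] on its way into the carry chain, and `enable` plays the role of CIN[0].

```python
def inc_dec(a, width=4, decrement=False, enable=1):
    """Z[i] = XOR(A[i], CIN[i]); the carry chain sees A[i] or NOT(A[i])."""
    z, cin = 0, enable
    for i in range(width):
        a_i = (a >> i) & 1
        z |= (a_i ^ cin) << i
        chain_input = (1 - a_i) if decrement else a_i  # control gates the input
        cin = chain_input & cin                        # AND carry (or borrow)
    return z


for value in (0b0000, 0b0111, 0b1111):
    print(value, inc_dec(value), inc_dec(value, decrement=True))
```

Both functions wrap around at the ends of the range, so decrementing '0000' gives '1111' in this 4-bit model.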

6) All-zeros detectors and all-ones detectors are used to test a bus for zero:

• For a 4-bit number, for example, zero in ones' complement arithmetic is '1111' or
'0000', and zero in signed magnitude arithmetic is '1000' or '0000'.
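The two representations of zero can be tested with all-zeros and all-ones detectors as in the sketch below. The function names are hypothetical, and the signed magnitude check assumes the sign is the MSB and simply ignores it.

```python
def all_zeros(a, width=4):
    """True when every bit of the bus is '0'."""
    return (a & ((1 << width) - 1)) == 0


def all_ones(a, width=4):
    """True when every bit of the bus is '1'."""
    mask = (1 << width) - 1
    return (a & mask) == mask


def is_zero_ones_complement(a, width=4):
    """Ones' complement zero is either all zeros or all ones."""
    return all_zeros(a, width) or all_ones(a, width)


def is_zero_signed_magnitude(a, width=4):
    """Signed magnitude zero: magnitude bits all zero, sign bit ignored."""
    magnitude = a & ((1 << (width - 1)) - 1)
    return all_zeros(magnitude, width - 1)


print(is_zero_ones_complement(0b1111), is_zero_signed_magnitude(0b1000))
```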

7) A register file (or scratchpad memory) is a bank of flip-flops arranged across the bus;
sometimes these have the option of multiple ports (multiport register files) for read and
write.

• Normally these register files are the densest logic and hardest to fit in a datapath. For
large register files it may be more appropriate to use a multiport memory. We can add
control logic to a register file to create a first-in first-out register (FIFO), or last-in
first-out register (LIFO).
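One way to picture the control logic that turns a register file into a FIFO is a pair of read and write pointers plus an occupancy count. The class below is a behavioural sketch only; the class name, the depth of four words, and the use of exceptions for full and empty conditions are all assumptions made for the example.

```python
class RegisterFileFifo:
    """A small register file with read/write pointers acting as a FIFO."""

    def __init__(self, depth=4):
        self.depth = depth
        self.regs = [0] * depth    # the register file itself
        self.wr = 0                # write pointer
        self.rd = 0                # read pointer
        self.count = 0             # number of words currently stored

    def push(self, word):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.regs[self.wr] = word
        self.wr = (self.wr + 1) % self.depth
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        word = self.regs[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return word


fifo = RegisterFileFifo()
for word in (1, 2, 3):
    fifo.push(word)
print(fifo.pop(), fifo.pop())   # words leave in the order they arrived
```

A LIFO would use a single stack pointer in place of the two pointers.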

I/O Cells

The Figure below shows a three-state bidirectional output buffer (Tri-State® is a
registered trademark of National Semiconductor).

When the output enable (OE) signal is high, the circuit functions as a noninverting
buffer driving the value of DATAout onto the I/O pad.

• When OE is low, the output transistors or drivers, M1 and M2, are disconnected. This allows
multiple drivers to be connected on a bus.
• It is up to the designer to make sure that a bus never has two drivers active at the same
time, a problem known as contention.
• In order to prevent the problem opposite to contention (a bus floating to an
intermediate voltage when there are no bus drivers) we can use a bus keeper or bus-hold
cell (TI calls this Bus-Friendly logic).

• A bus keeper normally acts like two weak (low drive-strength) cross-coupled inverters
that act as a latch to retain the last logic state on the bus, but the latch is weak enough
that it may be driven easily to the opposite state.

• Even though bus keepers act like latches, and will simulate like latches, they should
not be used as latches, since their drive strength is weak.
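The interaction of three-state drivers, contention, and a bus keeper can be sketched as a small behavioural model. The class name `Bus` and the choice to flag any two enabled drivers as contention (even when they agree) are assumptions for this example; a floating bus is represented by `None`.

```python
class Bus:
    """Shared bus driven by three-state buffers, with an optional bus keeper."""

    def __init__(self, keeper=True):
        self.keeper = keeper
        self.value = None               # None models a floating bus

    def resolve(self, drivers):
        """drivers is a list of (oe, data) pairs, one per three-state buffer."""
        active = [data for oe, data in drivers if oe]
        if len(active) > 1:
            raise RuntimeError("bus contention: more than one driver enabled")
        if active:
            self.value = active[0]      # the single enabled driver wins
        elif not self.keeper:
            self.value = None           # no driver and no keeper: bus floats
        return self.value               # with a keeper the last state is held


bus = Bus(keeper=True)
print(bus.resolve([(1, 1), (0, 0)]))    # one driver enabled
print(bus.resolve([(0, 1), (0, 0)]))    # all drivers off, keeper holds the value
```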

• Transistors M1 and M2 have to drive large off-chip loads.

• If we wish to change the voltage on a C = 200 pF load by 5 V in 5 ns (a slew rate of
1 V ns⁻¹) we will require a current in the output transistors of:

• IDS = C (dV/dt) = (200 × 10⁻¹²)(5 / (5 × 10⁻⁹)) = 0.2 A, or 200 mA.

• Such large currents flowing in the output transistors must also flow in the power
supply bus and can cause problems.

• There is always some inductance in series with the power supply, between the point at
which the supply enters the ASIC package and reaches the power bus on the chip.

• The inductance is due to the bond wire, lead frame, and package pin.

• If we have a power-supply inductance of 2 nH and a current changing from zero to 1 A
(32 I/O cells on a bus switching at 30 mA each) in 5 ns, we will have a voltage spike on
the power supply (called power-supply bounce) of:

L (dI/dt) = (2 × 10⁻⁹)(1 / (5 × 10⁻⁹)) = 0.4 V.
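Both estimates are simple enough to check numerically. The sketch below repeats the two calculations in Python using the values quoted above; the function names are hypothetical, and the last line shows how the bounce changes if we use 32 × 30 mA exactly instead of rounding to 1 A.

```python
def slew_current(c_load, delta_v, delta_t):
    """Current needed to slew a capacitive load: I = C (dV/dt)."""
    return c_load * delta_v / delta_t


def supply_bounce(l_supply, delta_i, delta_t):
    """Voltage spike across the supply inductance: V = L (dI/dt)."""
    return l_supply * delta_i / delta_t


i_ds = slew_current(200e-12, 5.0, 5e-9)         # 200 pF, 5 V, 5 ns
v_bounce = supply_bounce(2e-9, 1.0, 5e-9)       # 2 nH, 1 A, 5 ns
v_exact = supply_bounce(2e-9, 32 * 30e-3, 5e-9) # 32 cells at 30 mA each

print(f"IDS = {i_ds * 1e3:.0f} mA")
print(f"bounce = {v_bounce:.2f} V (using 1 A), {v_exact:.3f} V (using 0.96 A)")
```

Halving the number of simultaneously switching outputs, or doubling the transition time, halves the bounce in this model, which is the reasoning behind the remedies that follow.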

• Several things can be done to alleviate this problem:

• We can limit the number of simultaneously switching outputs (SSOs).

• We can limit the number of I/O drivers that can be attached to any one VDD and GND
pad.

• We can design the output buffer to limit the slew rate of the output (we call these
slew-rate limited I/O pads).

• Quiet-I/O cells also use two separate power supplies and two sets of I/O drivers:

• An AC supply (clean or quiet supply) with small AC drivers for the I/O circuits that
start and stop the output slewing at the beginning and end of an output transition.

• A DC supply (noisy or dirty supply) for the transistors that handle large currents as
they slew the output.

• The three-state buffer allows us to employ the same pad for input and output
(bidirectional I/O).

• When we want to use the pad as an input, we set OE low and take the data from
DATAin.

• Of course, it is not necessary to have all these features on every pad: we can build
output-only or input-only pads.

• We can also use many of these output cell features for input cells that have to drive
large on-chip loads (a clock pad cell, for example).

• Some gate arrays simply turn an output buffer around to drive a grid of interconnect
that supplies a clock signal internally.

• With a typical interconnect capacitance of 0.2 pF cm⁻¹, a grid of 100 cm (consisting of
10 by 10 lines running all the way across a 1 cm chip) presents a load of 20 pF to the
clock buffer.

• Some libraries include I/O cells that have passive pull-ups or pull-downs (resistors)
instead of the transistors, M1 and M2 (the resistors are normally still constructed from
transistors with long gate lengths).

• We can also omit one of the driver transistors, M1 or M2, to form open-drain outputs
that require an external pull-up or pull-down.

• We can design the output driver to produce TTL output levels rather than CMOS logic
levels.

• We may also add input hysteresis (using a Schmitt trigger) to the input buffer, to accept
input data signals that contain glitches (from bouncing switch contacts, for example) or
that are slow rising.
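The effect of input hysteresis can be illustrated by comparing a single-threshold buffer with a two-threshold Schmitt trigger on a noisy, slowly rising input. The threshold voltages below (1.5 V and 3.5 V, with 2.5 V for the plain buffer) are illustrative assumptions for a 5 V supply, not values taken from any cell library.

```python
def simple_buffer(samples, threshold=2.5):
    """Single switching threshold: every crossing toggles the output."""
    return [1 if v >= threshold else 0 for v in samples]


def schmitt_trigger(samples, v_low=1.5, v_high=3.5, initial=0):
    """Output rises only above v_high and falls only below v_low."""
    out, state = [], initial
    for v in samples:
        if v >= v_high:
            state = 1
        elif v <= v_low:
            state = 0
        out.append(state)       # between the thresholds the old state is kept
    return out


# A slowly rising input with glitches around mid-supply, then a falling edge.
samples = [0.0, 2.0, 2.6, 2.4, 2.7, 3.6, 2.4, 2.6, 1.4]
print(simple_buffer(samples))
print(schmitt_trigger(samples))
```

The plain buffer output chatters each time the input crosses 2.5 V, while the Schmitt trigger output changes only once in each direction.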

• The input buffer can also include a level shifter to accept TTL input levels and shift the
input signal to CMOS levels.

• The gate oxide in CMOS transistors is extremely thin (100 Å or less). This leaves the
gate oxide of the I/O cell input transistors susceptible to breakdown from static
electricity ( electrostatic discharge , or ESD ).

• ESD arises when we or machines handle the package leads (like the shock I sometimes
get when I touch a doorknob after walking across the carpet at work).

• Sometimes this problem is called electrical overstress (EOS) since most ESD-related
failures are caused not by gate-oxide breakdown, but by the thermal stress (melting)
that occurs when the n-channel transistor in an output driver overheats due to
the large current that can flow in the drain diffusion connected to a pad during an ESD
event.

• To protect the I/O cells from ESD, the input pads are normally tied to device structures
that clamp the input voltage to below the gate breakdown voltage (which can be as low
as 10 V with a 100 Å gate oxide).

• Some I/O cells use transistors with a special ESD implant that increases breakdown
voltage and provides protection.

• I/O driver transistors can also use elongated drain structures (ladder structures) and
large drain-to-gate spacing to help limit current.

• In a salicide process, which lowers the drain resistance, ladder structures are difficult
to build. One solution is to mask the I/O cells during the salicide step.

• Another solution is to use pnpn and npnp diffusion structures called silicon-controlled
rectifiers (SCRs) to clamp voltages and divert current to protect the I/O circuits from
ESD.

• There are several ways to model the capability of an I/O cell to withstand EOS.

• The human-body model (HBM) represents ESD by a 100 pF capacitor discharging
through a 1.5 kΩ resistor (this is an International Electrotechnical Commission, IEC,
specification).

• Typical voltages generated by the human body are in the range of 2–4 kV, and we often
see an I/O pad cell rated by the voltage it can withstand using the HBM.

• The machine model (MM) represents an ESD event generated by automated
machine handlers. Typical MM parameters use a 200 pF capacitor (typically charged to
200 V) discharged through a 25 Ω resistor, corresponding to a peak initial current of
nearly 10 A.

• The charged-device model (CDM, also called device charge–discharge) represents the
problem when an IC package is charged, in a shipping tube for example, and then
grounded.

• If the maximum charge on a package is 3 nC (a typical measured figure) and the
package capacitance to ground is 1.5 pF, we can simulate this event by charging a 1.5
pF capacitor to 2 kV and discharging it through a 1 Ω resistor.
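The three models can be compared with an idealized RC estimate: the peak initial current is V/R and the discharge time constant is RC. The sketch below uses only the component values quoted above; the function names are hypothetical, and the estimate ignores series inductance and other parasitics, so it should be read as an order-of-magnitude check only.

```python
def peak_current(voltage, resistance):
    """Initial current of an ideal RC discharge: I = V / R."""
    return voltage / resistance


def time_constant(resistance, capacitance):
    """RC time constant of the discharge."""
    return resistance * capacitance


# Human-body model: 100 pF through 1.5 kOhm, at the low end of 2-4 kV.
hbm_peak = peak_current(2000.0, 1500.0)

# Machine model: 200 pF charged to 200 V, through 25 Ohm.
mm_peak = peak_current(200.0, 25.0)

# Charged-device model: 3 nC on 1.5 pF gives V = Q / C, through 1 Ohm.
cdm_voltage = 3e-9 / 1.5e-12
cdm_tau = time_constant(1.0, 1.5e-12)

print(f"HBM peak at 2 kV: {hbm_peak:.2f} A")
print(f"MM peak: {mm_peak:.1f} A")
print(f"CDM voltage: {cdm_voltage:.0f} V, time constant: {cdm_tau * 1e12:.1f} ps")
```

The simple V/R figure for the machine model comes out at 8 A, the same order as the figure quoted above, and the Q/C calculation reproduces the 2 kV used for the charged-device model.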

• If the diffusion structures in the I/O cells are not designed with care, it is possible to
construct an SCR structure unwittingly, and instead of protecting the transistors the
SCR can enter a mode where it is latched on and conducting large enough currents to
destroy the chip.

• This failure mode is called latch-up.

• Latch-up can occur if the pn-diodes on a chip become forward-biased and inject
minority carriers (electrons in p-type material, holes in n-type material) into the
substrate.

• The source–substrate and drain–substrate diodes can become forward-biased due to
power-supply bounce or output undershoot (the cell outputs fall below VSS) or
overshoot (outputs rise to greater than VDD), for example.

• These injected minority carriers can travel fairly large distances and interact with
nearby transistors causing latch-up.

• I/O cells normally surround the I/O transistors with guard rings (a continuous ring of
n-diffusion in an n-well connected to VDD, and a ring of p-diffusion in a p-well
connected to VSS) to collect these minority carriers.

• This is a problem that can also occur in the logic core, and this is one reason that we
normally include substrate and well connections to the power supplies in every cell.

Cell Compilers

• The process of hand crafting circuits and layout for a full-custom IC is a tedious, time-
consuming, and error-prone task.

• There are two types of automated layout assembly tools, often known as silicon
compilers.

• The first type produces a specific kind of circuit, a RAM compiler or multiplier
compiler, for example.

• The second type of compiler is more flexible, usually providing a programming
language that assembles or tiles layout from an input command file, but this is
full-custom IC design.

• A register file can be built from latches or flip-flops, but, at 4.5–6.5
gates (18–26 transistors) per bit, this is an expensive way to build memory.

• Dynamic RAM (DRAM) can use a cell with only one transistor, storing charge on a
capacitor that has to be periodically refreshed as the charge leaks away.

• ASIC RAM is invariably static (SRAM), so we do not need to refresh the bits.

• When we refer to RAM in an ASIC environment we almost always mean SRAM. Most
ASIC RAMs use a six-transistor cell (four transistors to form two cross-coupled
inverters that form the storage loop, and two more transistors to allow us to read from
and write to the cell).
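A quick count shows why a compiled RAM is preferred over a register file for larger memories. The sketch below uses only the per-bit figures quoted above (18 to 26 transistors for a latch or flip-flop, six for an SRAM cell, one for a DRAM cell) and a memory of 1024 bits chosen arbitrarily for illustration. It counts storage cells only and ignores decoders, sense amplifiers, and other peripheral circuits.

```python
# Transistors per stored bit, taken from the figures quoted in the notes.
TRANSISTORS_PER_BIT = {
    "register file (low)": 18,
    "register file (high)": 26,
    "SRAM, six-transistor cell": 6,
    "DRAM, one-transistor cell": 1,
}


def storage_transistors(bits, per_bit):
    """Transistors used by the storage cells alone."""
    return bits * per_bit


BITS = 1024   # arbitrary example size
for name, per_bit in TRANSISTORS_PER_BIT.items():
    print(f"{name}: {storage_transistors(BITS, per_bit)} transistors")
```

On these figures a six-transistor SRAM array needs between a third and a quarter of the transistors of the same capacity built from flip-flops.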

• RAM compilers are available that produce single-port RAM (a single shared bus for
read and write) as well as dual-port RAMs and multiport RAMs.

• In a multi-port RAM the compiler may or may not handle the problem of address
contention (attempts to read and write to the same RAM address simultaneously).

• RAM can be asynchronous (the read and write cycles are triggered by control and/or
address transitions asynchronous to a clock) or synchronous (using the system clock).
