
ADVANCED VLSI – 21EC71 [MODULE – 1]

MODULE -1
Introduction to ASICs: Full custom, Semi-custom and Programmable ASICs, ASIC Design
flow, ASIC cell libraries. CMOS Logic: Data path Logic Cells: Data Path Elements, Adders:
Carry skip, Carry bypass, Carry save, Carry select, Conditional sum, Multiplier (Booth
encoding), Data path Operators, I/O cells, Cell Compilers

Pre-requisite: Basic concepts of CMOS Logic, Basic concepts of logic Design, Basic
concepts of EDA

Introduction
ASIC stands for Application Specific Integrated Circuit. It is built for a specific application or purpose. Compared with a general-purpose device performing the same function, an ASIC typically offers higher speed. Basically, it is an integrated circuit that has been specified for one specific purpose and is not software programmable to perform a wide variety of different tasks. ASICs are widely used in applications including auto emission control, environmental monitoring, and personal digital assistants. An ASIC often has an embedded CPU to manage suitable tasks.

The semiconductor industry has evolved from the first ICs of the early 1970s and matured rapidly since then. Early small-scale integration (SSI) ICs contained a few (1 to 10) logic gates (NAND gates, NOR gates, and so on), amounting to a few tens of transistors. The era of medium-scale integration (MSI) increased the range of integrated logic available to counters and similar, larger scale, logic functions. The era of large-scale integration (LSI) packed even larger logic functions, such as the first microprocessors, into a single chip. The era of very large-scale integration (VLSI) now offers 64-bit microprocessors, complete with cache memory and floating-point arithmetic units: well over a million transistors on a single piece of silicon. As CMOS process technology improves, transistors continue to get smaller and ICs hold more and more transistors.

The earliest ICs used bipolar technology, and the majority of logic ICs used either transistor-transistor logic (TTL) or emitter-coupled logic (ECL). Although invented before the bipolar transistor, the metal-oxide-silicon (MOS) transistor was initially difficult to manufacture because of problems with the oxide interface. As these problems were gradually solved, metal-gate n-channel MOS (nMOS or NMOS) technology developed in the 1970s. At that time MOS technology required fewer masking steps, was denser, and consumed less power than equivalent bipolar ICs. This meant that, for a given performance, an MOS IC was cheaper than a bipolar IC, which led to investment and growth of the MOS IC market.

DR.K.EZHILARASAN, ASSOCIATE PROFESSOR, DEPT. OF ECE, SJCIT 1

By the early 1980s the aluminum gates of the transistors were replaced by polysilicon gates, but the name MOS remained. The introduction of polysilicon as a gate material was a major improvement in CMOS technology, making it easier to make two types of transistors, n-channel MOS and p-channel MOS transistors, on the same IC: a complementary MOS (CMOS, never cMOS) technology. The principal advantage of CMOS over NMOS is lower power consumption. Another advantage of a polysilicon gate was a simplification of the fabrication process, allowing devices to be scaled down in size.

There are four CMOS transistors in a two-input NAND gate (and in a two-input NOR gate too), so to convert between gates and transistors, you multiply the number of gates by 4 to obtain the number of transistors. We can also measure an IC by the smallest feature size (roughly half the length of the smallest transistor) imprinted on the IC. Transistor dimensions are measured in microns (a micron, 1 µm, is a millionth of a meter). Thus we talk about a 0.5 µm IC, or say an IC is built in (or with) a 0.5 µm process, meaning that the smallest transistors are 0.5 µm in length. We give a special label, λ or lambda, to this smallest feature size. Since lambda is equal to half of the smallest transistor length, λ ≈ 0.25 µm in a 0.5 µm process. Layout drawings often use a scale marked in lambda for the same reason we place a scale on a map.
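The two rules of thumb in this paragraph (four transistors per two-input gate, and lambda equal to half the process size) can be captured in a few lines. This is only a sketch: the helper names `gates_to_transistors` and `lambda_um` are hypothetical, and the factor of 4 is the two-input NAND/NOR rule above, not an exact transistor count for a real design.

```python
# Hypothetical helpers illustrating the rules of thumb above.
TRANSISTORS_PER_GATE = 4  # a two-input CMOS NAND (or NOR) gate uses 4 transistors


def gates_to_transistors(gate_count: int) -> int:
    """Approximate transistor count from a count of two-input gate equivalents."""
    return gate_count * TRANSISTORS_PER_GATE


def lambda_um(process_um: float) -> float:
    """Lambda is half the smallest transistor length (the process size), in microns."""
    return process_um / 2


print(gates_to_transistors(10_000))  # 40000
print(lambda_um(0.5))                # 0.25
```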

A modern submicron CMOS process is now just as complicated as a submicron bipolar or BiCMOS (a combination of bipolar and CMOS) process. However, CMOS ICs have established a dominant position and are manufactured in much greater volume than any other technology; therefore, because of the economy of scale, the cost of a CMOS IC is less than that of a bipolar or BiCMOS IC for the same function. Bipolar and BiCMOS ICs are still used for special needs. For example, bipolar technology is generally capable of handling higher voltages than CMOS. This makes bipolar and BiCMOS ICs useful in power electronics, cars, telephone circuits, and so on.


Some digital logic ICs and their analog counterparts (analog/digital converters, for example) are standard parts, or standard ICs. You can select standard ICs from catalogs and data books and buy them from distributors. Systems manufacturers and designers can use the same standard part in a variety of different microelectronic systems (systems that use microelectronics or ICs).
With the advent of VLSI in the 1980s, engineers began to realize the advantages of designing an IC that was customized or tailored to a particular system or application rather than using standard ICs alone. Microelectronic system design then becomes a matter of defining the functions that you can implement using standard ICs and then implementing the remaining logic functions (sometimes called glue logic) with one or more custom ICs. As VLSI became possible, you could build a system from a smaller number of components by combining many standard ICs into a few custom ICs. Building a microelectronic system with fewer ICs allows you to reduce cost and improve reliability.

ASIC Categories

Figure: Classification of ASICs into full-custom, semi-custom (gate-array based and cell-array based) and programmable ASICs.

• Full-Custom ASICs
• Semi-Custom ASICs
o Gate Array Based Design
▪ Channeled Array
▪ Channelless Array
o Cell Array Based Design
▪ Standard Cell
▪ Macro Cell
• Programmable ASICs


Full Custom ASIC:

In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or layout specifically for one ASIC. This means the designer abandons the approach of using pretested and precharacterized cells for all or part of that design. It makes sense to take this approach only if there are no suitable existing cell libraries available that can be used for the entire design. This might be because existing cell libraries are not fast enough, or the logic cells are not small enough or consume too much power. You may need to use full-custom design if the ASIC technology is new or so specialized that there are no existing cell libraries, or because the ASIC is so specialized that some circuits must be custom designed. Fewer and fewer full-custom ICs are being designed because of the problems with these special parts of the ASIC.

Bipolar technology has historically been used for precision analog functions. There are some fundamental reasons for this. In all integrated circuits the matching of component characteristics between chips is very poor, while the matching of characteristics between components on the same chip is excellent.

Suppose we have transistors T1, T2, and T3 on an analog/digital ASIC. The three transistors are
all the same size and are constructed in an identical fashion. Transistors T1 and T2 are located
adjacent to each other and have the same orientation. Transistor T3 is the same size as T1 and
T2 but is located on the other side of the chip from T1 and T2 and has a different orientation.
ICs are made in batches called wafer lots. A wafer lot is a group of silicon wafers that are all
processed together. Usually there are between 5 and 30 wafers in a lot. Each
wafer can contain tens or hundreds of chips depending on the size of the IC and the wafer.

If we were to make measurements of the characteristics of transistors T1, T2, and T3 we would
find the following:
• Transistor T1 will have virtually identical characteristics to T2 on the same IC. We say that the transistors match well or the tracking between devices is excellent.
• Transistor T3 will match transistors T1 and T2 on the same IC very well, but not as closely as T1 matches T2 on the same IC.
• Transistors T1, T2, and T3 will match fairly well with transistors T1, T2, and T3 on a different IC on the same wafer. The matching will depend on how far apart the two ICs are on the wafer.
• Transistors on ICs from different wafers in the same wafer lot will not match very well.
• Transistors on ICs from different wafer lots will match very poorly.
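The list above is a qualitative ranking, so it can be expressed as a decision procedure that compares where two same-size transistors were made. This is a minimal sketch: the `Site` fields and the wording of the returned labels are assumptions chosen to mirror the list, and no numeric mismatch figures are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Where a transistor was made (all fields are hypothetical identifiers)."""
    lot: int
    wafer: int
    die: int
    pair: int  # devices with the same pair id are adjacent and share orientation


def matching(a: Site, b: Site) -> str:
    """Qualitative matching between two same-size transistors, per the list above."""
    if a.lot != b.lot:
        return "very poor"       # different wafer lots
    if a.wafer != b.wafer:
        return "not very good"   # different wafers in the same lot
    if a.die != b.die:
        return "fairly good"     # different ICs on the same wafer
    if a.pair != b.pair:
        return "very good"       # same IC, but apart or differently oriented
    return "excellent"           # same IC, adjacent, same orientation


t1 = Site(lot=1, wafer=3, die=7, pair=0)
t2 = Site(lot=1, wafer=3, die=7, pair=0)
t3 = Site(lot=1, wafer=3, die=7, pair=1)
print(matching(t1, t2), "/", matching(t1, t3))  # excellent / very good
```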

For many analog designs the close matching of transistors is crucial to circuit operation.
For these circuit designs pairs of transistors are used, located adjacent to each other. Device
physics dictates that a pair of bipolar transistors will always match more precisely than
CMOS transistors of a comparable size.

Bipolar technology has historically been more widely used for full-custom analog design
because of its improved precision. Despite its poorer analog properties, the use of CMOS
technology for analog functions is increasing. There are two reasons for this.
• The first reason is that CMOS is now by far the most widely available IC
technology. Many more CMOS ASICs and CMOS standard products are now being
manufactured than bipolar ICs.
• The second reason is that increased levels of integration require mixing analog and
digital functions on the same IC: this has forced designers to find ways to use
CMOS technology to implement analog functions. Circuit designers, using clever
new techniques, have been very successful in finding new ways to design analog
CMOS circuits that can approach the accuracy of bipolar analog designs.

The main advantage of full-custom ASICs over other IC designs is that they deliver the highest possible performance at the smallest possible die size. But this high performance and small size come at the price of increased design time, greater design complexity, and a higher overall cost of the IC itself. Some of the most common full-custom ASICs are microprocessors, memories, analog processors, analog/digital communication devices, sensors, transducers, high-voltage ICs for automobiles, etc.


Semi-Custom ASICs:

In this type of design, logic cells are taken from predesigned (standard) libraries, i.e., they are not handcrafted as in full-custom design. Depending on the type, all or only some of the mask layers are customized. Based on the type of logic cells taken from the library and the amount of customization allowed for the interconnect, these ASICs are divided into two types: standard-cell-based ASICs and gate-array-based ASICs.

Gate Array Based ASIC:

This type of semi-custom ASIC has predefined transistors on the silicon wafer, i.e., the designer cannot change the placement of the transistors present on the die. The base array is the predefined pattern of transistors on the gate array, and the base cell is the smallest repetitive element of the base array.

The designer can change only the interconnection between transistors, which is defined by the top few metal layers of the die, and chooses logic cells from the gate-array library. These are often called masked gate arrays (MGAs). Gate-array-based ASICs are of three types:
• Channeled gate array
• Channelless gate array
• Structured gate array

Channeled Gate Array:

In this type of gate array, wiring space is left between rows of transistors. It is similar to a cell-based IC (CBIC) in that space is left for interconnect between rows of cells, but in a channeled gate array the space between rows is fixed in height, whereas in a CBIC this space can be adjusted.
The important features of this type of MGA are as follows:
• Only the interconnect is customized.
• The interconnect uses predefined spaces between rows of base cells.
• Manufacturing lead time is between two days and two weeks.


Figure: A channeled gate-array die. The spaces between rows of the base cells are set aside for interconnect.

Channelless Gate Array:

There is no free space left for routing between rows of cells, as there is in the channeled gate array. Instead, routing is done over the top of the gate-array cells, which is possible because the contacts between metal1 and the transistors are customized. Transistors lying in the path of the routing are simply left unused. The manufacturing lead time is short: at most about two weeks.

The important features of this type of MGA are as follows:
• Only some (the top few) mask layers, those that define the interconnect, are customized.
• Manufacturing lead time is between two days and two weeks.

Figure: A channelless gate-array or sea-of-gates (SOG) array die. The core area of the die is completely filled with an array of base cells (the base array).


The key difference between a channelless gate array and channeled gate array is that there are no
predefined areas set aside for routing between cells on a channelless gate array. Instead we
route over the top of the gate-array devices. We can do this because we customize the contact
layer that defines the connections between metal1, the first layer of metal, and the transistors.
When we use an area of transistors for routing in a channelless array, we do not make any
contacts to the devices lying underneath; we simply leave the transistors unused.

The logic density (the amount of logic that can be implemented in a given silicon area) is higher for channelless gate arrays than for channeled gate arrays. This is usually attributed to the difference in structure between the two types of array.

In fact, the difference occurs because the contact mask is customized in a channelless gate array but is not usually customized in a channeled gate array. This leads to denser cells in the channelless architectures. Customizing the contact layer in a channelless gate array allows us to increase the density of gate-array cells because we can route over the top of unused contact sites.

Structured Gate Array:

This type of gate array has an embedded block in addition to the rows of base cells. A structured gate array offers the higher area efficiency of a CBIC together with the lower cost and faster turnaround of an MGA. However, the fixed size of the embedded function is a limitation of the structured gate array. For example, if the gate array contains an area reserved for a 32 k-bit memory but an application needs only a 16 k-bit memory, the remaining area is wasted. All the gate arrays have a turnaround time of two days to two weeks, and all have customized interconnect.


Figure: A structured or embedded gate-array die showing an embedded block in the upper left corner (a
static random-access memory, for example). The rest of the die is filled with an array of base cells.

The important features of this type of Masked Gate Array (MGA) are the following:
• Only the interconnect is customized.
• Custom blocks (the same for each design) can be embedded.
• Manufacturing lead time is between two days and two weeks.

An embedded gate array gives the improved area efficiency and increased performance of a CBIC
but with the lower cost and faster turnaround of an MGA. One disadvantage of an embedded gate
array is that the embedded function is fixed. For example, if an embedded gate array contains an
area set aside for a 32 k-bit memory, but we only need a 16 k-bit memory, then we may have to
waste half of the embedded memory function. However, this may still be more efficient and
cheaper than implementing a 32 k-bit memory using macros on a SOG array.
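The waste in this example is simple arithmetic: the unused fraction is one minus the ratio of the memory needed to the memory embedded. A minimal sketch with a hypothetical helper name; the result does not depend on whether "k" means 1000 or 1024, because only the ratio matters.

```python
def wasted_fraction(embedded_bits: int, needed_bits: int) -> float:
    """Fraction of a fixed embedded memory block left unused (hypothetical helper)."""
    if embedded_bits <= 0 or not 0 <= needed_bits <= embedded_bits:
        raise ValueError("need embedded_bits > 0 and 0 <= needed_bits <= embedded_bits")
    return 1 - needed_bits / embedded_bits


print(wasted_fraction(32 * 1024, 16 * 1024))  # 0.5, i.e. half the embedded memory
```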

Cell-Based ASIC:

A cell-based ASIC (cell-based IC, or CBIC, a common term in Japan, pronounced "sea-bick") uses predesigned logic cells (AND gates, OR gates, multiplexers, and flip-flops, for example) known as standard cells. We could apply the term CBIC to any IC that uses cells, but it is generally accepted that a cell-based ASIC or CBIC means a standard-cell-based ASIC.


Standard Cell Based ASIC:

A standard-cell-based ASIC uses predesigned logic cells such as gates, multiplexers, flip-flops, and adders. These logic cells, known as standard cells, are already designed and stored in a library. The library is imported into the CAD tool, and the design is then built from the library components.
Typically, standard-cell-based designs are organized as rows of constant-height cells on the chip, just like rows of bricks. When combined with other logic-level components, standard-cell-based designs can be used to implement complex functions such as multipliers and memory arrays.

The ASIC designer defines only the placement of the standard cells and the interconnect in a CBIC.
However, the standard cells can be placed anywhere on the silicon; this means that all the mask
layers of a CBIC are customized and are unique to a particular customer. The advantage of CBICs
is that designers save time, money, and reduce risk by using a predesigned, pretested, and
precharacterized standard-cell library. In addition, each standard cell can be optimized
individually. During the design of the cell library each and every transistor in every standard cell
can be chosen to maximize speed or minimize area, for example. The disadvantages are the time
or expense of designing or buying the standard-cell library and the time needed to fabricate all
layers of the ASIC for each new design.

Figure: A cell-based ASIC (CBIC) die with a single standard-cell area (a flexible block) together
with four fixed blocks. The flexible block contains rows of standard cells. The small squares
around the edge of the die are bonding pads that are connected to the pins of the ASIC package.


Example:
Standard cells are designed to fit together like bricks in a wall. The figure shows an example of a simple standard cell (it is simple in the sense that it is not maximized for density, but it is ideal for showing its internal construction). Power and ground buses (VDD and GND or VSS) run horizontally on metal lines inside the cells.

Standard-cell design allows the automation of the process of assembling an ASIC. Groups of standard cells fit horizontally together to form rows. The rows stack vertically to form flexible rectangular blocks (which you can reshape during design). You may then connect a flexible block built from several rows of standard cells to other standard-cell blocks or other full-custom logic blocks.
For example, you might want to include a custom interface to a standard, predesigned microcontroller together with some memory. The microcontroller block may be a fixed-size megacell, you might generate the memory using a memory compiler, and the custom logic and memory controller will be built from flexible standard-cell blocks, shaped to fit in the empty spaces on the chip.
Both cell-based and gate-array ASICs use predefined cells, but there is a difference: we can change the transistor sizes in a standard cell to optimize speed and performance, but the device sizes in a gate array are fixed. This results in a trade-off between performance and area in a gate array at the silicon level. For a standard-cell ASIC, the trade-off between area and performance is made at the library level.

Modern CMOS ASICs use two, three, or more levels (or layers) of metal for interconnect. This allows wires to cross over different layers in the same way that we use copper traces on different layers of a printed-circuit board.
• In a two-level metal CMOS technology, connections to the standard-cell inputs and outputs are usually made using the second level of metal (metal2, the upper level of metal) at the tops and bottoms of the cells.
• In a three-level metal technology, connections may be internal to the logic cell. This allows more sophisticated routing programs to take advantage of the extra metal layer to route interconnect over the top of the logic cells. A connection that needs to cross over a row of standard cells uses a feedthrough.
• The term feedthrough can refer either to the piece of metal that is used to pass a signal through a cell or to a space in a cell waiting to be used as a feedthrough (very confusing). The figure shows two feedthroughs: one in cell A.14 and one in cell A.23. In both two-level and three-level metal technology, the power buses (VDD and GND) inside the standard cells normally use the lowest (closest to the transistors) layer of metal (metal1). The width of each row of standard cells is adjusted so that the rows may be aligned using spacer cells. The power buses, or rails, are then connected to additional vertical power rails using row-end cells at the aligned ends of each standard-cell block.
• If the rows of standard cells are long, then vertical power rails can also be run in metal2 through the cell rows using special power cells that just connect to VDD and GND. Usually the designer manually controls the number and width of the vertical power rails connected to the standard-cell blocks during physical design. A diagram of the power distribution scheme for a CBIC is shown in the figure.
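Aligning rows with spacer cells amounts to padding every row out to the width of the longest one, so that the row-end cells line up. The sketch below assumes each row is just a list of cell widths in lambda (hypothetical values); a real placement tool weighs many other constraints, so treat this only as an illustration of the idea.

```python
from typing import List


def spacer_widths(rows: List[List[int]]) -> List[int]:
    """Spacer width to add to each row so that all rows end at the same x position.

    rows: cell widths (in lambda, hypothetical units) for each standard-cell row.
    """
    row_widths = [sum(cells) for cells in rows]
    target = max(row_widths, default=0)  # the longest row sets the block width
    return [target - width for width in row_widths]


rows = [[8, 12, 16], [10, 10], [24, 8]]
print(spacer_widths(rows))  # [0, 16, 4]
```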

All the mask layers of a CBIC are customized. This allows megacells (SRAM, a SCSI controller, or an MPEG decoder, for example) to be placed on the same IC with standard cells. Megacells are usually supplied by an ASIC or library company complete with behavioral models and some way to test them (a test strategy). ASIC library companies also supply compilers to generate flexible DRAM, SRAM, and ROM blocks. Since all mask layers on a standard-cell design are customized, memory design is more efficient and denser than for gate arrays.
For logic that operates on multiple signals across a data bus (a datapath, or DP), the use of standard cells may not be the most efficient ASIC design style. Some ASIC library companies provide a datapath compiler that automatically generates datapath logic. A datapath library typically contains cells such as adders, subtracters, multipliers, and simple arithmetic and logical units (ALUs). The connectors of datapath library cells are pitch-matched to each other so that they fit together. Connecting datapath cells to form a datapath usually, but not always, results in faster and denser layout than using standard cells or a gate array.
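A datapath is regular because the same function is repeated once per bit of the bus, which is what allows pitch-matched cells to be abutted. The behavioural sketch below builds an n-bit adder from identical one-bit full-adder slices, one per bit, passing the carry between neighbouring slices. It models function only, not layout, and the ripple-carry structure is an assumption chosen for simplicity; it is not the output of any particular datapath compiler.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit slice: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_adder(a: int, b: int, width: int) -> Tuple[int, int]:
    """Add two width-bit words using one full-adder slice per bit."""
    carry, result = 0, 0
    for i in range(width):  # slice i handles bit i of the bus
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry  # (width-bit sum, final carry out)


print(ripple_adder(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17 = 0b10001
```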

Standard-cell and gate-array libraries may contain hundreds of different logic cells, including combinational functions (NAND, NOR, AND, OR gates) with multiple inputs, as well as latches and flip-flops with different combinations of reset, preset, and clocking options. The ASIC library company provides designers with a data book in paper or electronic form with all of the functional descriptions and timing information for each library element.

Programmable ASIC
There are two types of programmable ASICs: PLDs and FPGAs.

PLDs (Programmable Logic Devices)

PLDs are standard ICs that are readily available in standard configurations. Because we can program a PLD to customize it for a particular application, PLDs are also considered ASICs. Different methods and software can be used to program a PLD. A PLD contains a regular matrix of logic cells, usually programmable array logic followed by flip-flops or latches, and its interconnect is a single large programmable block.
The PROM is a common example of this type of IC. An EPROM uses programmable MOS transistors whose characteristics are altered by applying a high voltage, which is how it is programmed. PLDs have no customized mask layers or logic cells, and they have a fast design turnaround.


Figure: A programmable logic device (PLD) die. The macrocells typically consist of programmable array logic followed by a flip-flop or latch. The macrocells are connected using a large programmable interconnect block.

All PLDs have the following important features in common:
• No customized mask layers or logic cells
• Fast design turnaround
• A single large block of programmable interconnect
• A matrix of logic macrocells that usually consist of programmable array logic followed by a flip-flop or latch
The simplest type of programmable IC is a read-only memory (ROM). The most common types of ROM use a metal fuse that can be blown permanently (a programmable ROM or PROM). An electrically programmable ROM, or EPROM, uses programmable MOS transistors whose characteristics are altered by applying a high voltage. You can erase an EPROM either by using another high voltage (an electrically erasable PROM, or EEPROM) or by exposing the device to ultraviolet light (UV-erasable PROM, or UVPROM).

There is another type of ROM that can be placed on any ASIC: a mask-programmable ROM (mask-programmed ROM or masked ROM). A masked ROM is a regular array of transistors permanently programmed using custom mask patterns. An embedded masked ROM is thus a large, specialized logic cell.
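A ROM used as programmable logic is essentially a lookup table: the logic inputs form the address, and the word stored at that address is the output. Programming (blowing fuses, altering the MOS transistors, or choosing mask patterns) only fixes the stored words. The software model below illustrates that idea; the helper names and the three-input majority function used as the stored contents are hypothetical examples.

```python
from typing import Callable, List


def program_rom(n_inputs: int, func: Callable[[int], int]) -> List[int]:
    """'Program' a 2**n_inputs-word ROM by storing func(address) at every address."""
    return [func(address) for address in range(2 ** n_inputs)]


def majority(address: int) -> int:
    """Example contents: output 1 when at least two of the three input bits are 1."""
    a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
    return int(a + b + c >= 2)


rom = program_rom(3, majority)
print(rom)     # [0, 0, 0, 1, 0, 1, 1, 1]
print(rom[6])  # inputs a=1, b=1, c=0 -> 1
```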


The same programmable technologies used to make ROMs can be applied to more flexible logic structures. By using the programmable devices in a large array of AND gates and an array of OR gates, we create a family of flexible and programmable logic devices called logic arrays. The company Monolithic Memories (bought by AMD) was the first to produce Programmable Array Logic (PAL®, a registered trademark of AMD) devices that you can use, for example, as transition decoders for state machines.
A PAL can also include registers (flip-flops) to store the current state information so that you can use a PAL to make a complete state machine. Just as we have a mask-programmable ROM, we could place a logic array as a cell on a custom ASIC. This type of logic array is called a programmable logic array (PLA). There is a difference between a PAL and a PLA: a PLA has a programmable AND logic array, or AND plane, followed by a programmable OR logic array, or OR plane; a PAL has a programmable AND plane and, in contrast to a PLA, a fixed OR plane.
Depending on how the PLD is programmed, we can have an erasable PLD (EPLD) or a mask-programmed PLD (sometimes called a masked PLD, but usually just PLD). The first PALs, PLAs, and PLDs were based on bipolar technology and used programmable fuses or links. CMOS PLDs usually employ floating-gate transistors.
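The PAL/PLA distinction is about which plane can be programmed. In the behavioural sketch below (all names are hypothetical), each AND-plane product term is a map from input index to required value. In the PLA model the OR plane is also programmable, so each output may select any of the product terms, and a term may be shared by several outputs; in the PAL model the OR plane is fixed, so each output simply ORs its own dedicated group of terms. Both are programmed here as a half adder (sum = a XOR b, carry = a AND b).

```python
from typing import Dict, List, Sequence

Term = Dict[int, int]  # input index -> required value; unlisted inputs are "don't care"


def product(term: Term, inputs: Sequence[int]) -> int:
    """One AND-plane product term."""
    return int(all(inputs[i] == value for i, value in term.items()))


def pla(and_plane: List[Term], or_plane: List[List[int]], inputs: Sequence[int]) -> List[int]:
    """PLA: programmable AND plane and programmable OR plane.

    or_plane[k] lists the indices of the product terms that output k ORs together,
    so one product term can be shared by several outputs.
    """
    products = [product(t, inputs) for t in and_plane]
    return [int(any(products[j] for j in selected)) for selected in or_plane]


def pal(groups: List[List[Term]], inputs: Sequence[int]) -> List[int]:
    """PAL: programmable AND plane, fixed OR plane.

    groups[k] holds the product terms hard-wired to OR gate k; terms are not shared.
    """
    return [int(any(product(t, inputs) for t in group)) for group in groups]


A_NOT_B: Term = {0: 1, 1: 0}
NOT_A_B: Term = {0: 0, 1: 1}
A_B: Term = {0: 1, 1: 1}

for a in (0, 1):
    for b in (0, 1):
        sum_carry_pla = pla([A_NOT_B, NOT_A_B, A_B], [[0, 1], [2]], [a, b])
        sum_carry_pal = pal([[A_NOT_B, NOT_A_B], [A_B]], [a, b])
        print(a, b, sum_carry_pla, sum_carry_pal)
```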

Field-Programmable Gate Arrays

A step above the PLD in complexity is the field-programmable gate array (FPGA). There is very little difference between an FPGA and a PLD: an FPGA is usually just larger and more complex than a PLD. In fact, some companies that manufacture programmable ASICs call their products FPGAs and some call them complex PLDs. FPGAs are the newest member of the ASIC family and are rapidly growing in importance, replacing TTL in microelectronic systems. Even though an FPGA is a type of gate array, we do not consider the term gate-array-based ASICs to include FPGAs. This may change as FPGAs and MGAs start to look more alike.
The essential characteristics of an FPGA are:
• None of the mask layers are customized.
• A method for programming the basic logic cells and the interconnect.
• The core is a regular array of programmable basic logic cells that can implement combinational as well as sequential logic (flip-flops).
• A matrix of programmable interconnect surrounds the basic logic cells.
• Programmable I/O cells surround the core.
• Design turnaround is a few hours.
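The ASIC styles described so far differ mainly in how many mask layers must be made specifically for each design. The small table below collects only what these notes state about each style; the dictionary layout and the helper `needs_custom_masks` are hypothetical, and `None` marks a turnaround that the notes do not quantify.

```python
from typing import Dict, Optional, Tuple

# style -> (mask layers customized for each design, turnaround stated in these notes)
ASIC_STYLES: Dict[str, Tuple[str, Optional[str]]] = {
    "full-custom": ("all", None),
    "standard-cell (CBIC)": ("all", None),
    "masked gate array (MGA)": ("interconnect only", "two days to two weeks"),
    "PLD": ("none", None),  # the notes only say "fast design turnaround"
    "FPGA": ("none", "a few hours"),
}


def needs_custom_masks(style: str) -> bool:
    """True if at least one mask layer must be made specifically for the design."""
    masks, _turnaround = ASIC_STYLES[style]
    return masks != "none"


for style, (masks, turnaround) in ASIC_STYLES.items():
    print(f"{style:24} masks customized: {masks:18} turnaround: {turnaround or 'not stated'}")
```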

ASIC Design Flow

• Design Entry: In this step, the logic design is created using a hardware description language (HDL) such as VHDL or Verilog, or with the help of schematic entry.
• Logic Synthesis: Once the logic has been described in an HDL or by schematic entry, the next step is to produce a description of the logic cells and their interconnections. This description is called a netlist.
• System Partitioning: The next step is to divide the entire system into ASIC-sized blocks.
• Pre-layout Simulation: Before going on to the actual physical layout of the design, a simulation tool checks that the circuit works properly. In fact, simulation is performed at every step, so that any errors found can be corrected at the stage where they appear. The steps up to this point are usually regarded as logical design; the steps after this are concerned with the actual physical layout of the design.
• Floorplanning: The first step in physical design is to arrange all the blocks of the circuit on the chip.
• Placement: In this step, the locations of the logic cells within each block are set.


• Routing: Once the placement of the blocks and cells is complete, the connections between the cells and the blocks are created.
• Extraction: The next step is to determine the resistance and capacitance of the interconnect just created, since they determine the signal delays. The delays are also calculated at this stage.
• Post-layout Simulation: Once the physical design is complete, the circuit is tested again for correct operation, this time taking the delays calculated during extraction into account.
• Design Rule Check (DRC): The final step is to verify that the layout of the entire circuit complies with the design rule specifications.
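The ordered flow, and the split between logical and physical design described under pre-layout simulation, can be written down as data. This is a sketch only: the step names follow the list above, while the `Phase` enum and the helpers `phase_of` and `next_step` are hypothetical and do not correspond to any EDA tool's API. In practice, as noted above, simulation results can send the designer back to an earlier step, which a linear list does not capture.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Phase(Enum):
    LOGICAL = "logical design"
    PHYSICAL = "physical design"


# Steps in the order listed above, tagged with the phase they belong to.
FLOW: List[Tuple[str, Phase]] = [
    ("design entry", Phase.LOGICAL),
    ("logic synthesis", Phase.LOGICAL),
    ("system partitioning", Phase.LOGICAL),
    ("pre-layout simulation", Phase.LOGICAL),
    ("floorplanning", Phase.PHYSICAL),
    ("placement", Phase.PHYSICAL),
    ("routing", Phase.PHYSICAL),
    ("extraction", Phase.PHYSICAL),
    ("post-layout simulation", Phase.PHYSICAL),
    ("design rule check", Phase.PHYSICAL),
]
STEPS = [name for name, _ in FLOW]


def phase_of(step: str) -> Phase:
    """Phase (logical or physical design) that a step belongs to."""
    return dict(FLOW)[step]


def next_step(step: str) -> Optional[str]:
    """Step that follows `step`, or None after the final check."""
    i = STEPS.index(step)
    return STEPS[i + 1] if i + 1 < len(STEPS) else None


for name, phase in FLOW:
    print(f"{phase.value:15} | {name}")
```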


Applications
ASICs are used almost everywhere that performance, customization, or small size is needed. Some of the common categories of application are listed below.
• Sensors and Transducers
• Automotive and Avionic Components
• Satellite, Radar and related Communication processors
• Microprocessors, Memories, Microcontrollers

ASIC Cell Libraries


The cell library is the key part of ASIC design. For a programmable ASIC the FPGA company
supplies you with a library of logic cells in the form of a design kit , you normally do not have a
choice, and the cost is usually a few thousand dollars. For MGAs and CBICs you have three
choices: the ASIC vendor (the company that will build your ASIC) will supply a cell library, or
you can buy a cell library from a third-party library vendor , or you can build your own cell
library.
The first choice, using an ASIC-vendor library, requires you to use a set of design tools
approved by the ASIC vendor to enter and simulate your design. You have to buy the tools,
and the cost of the cell library is folded into the nonrecurring-engineering (NRE) charge.

Some ASIC vendors (especially for MGAs, masked gate arrays) supply tools that they have
developed in-house. The more common model in Japan is to use tools supplied by the ASIC
vendor, whereas in the United States, Europe, and elsewhere designers prefer to choose their own
tools. This may be because the relationship between customer and supplier is closer in Japan
than it is elsewhere.

An ASIC-vendor library is normally a phantom library: the cells are empty boxes, or phantoms,
that contain just enough information for layout (for example, you would see only the bounding box
or abutment box in a phantom version of the cell in Figure 1.3). After you complete layout, you
hand off a netlist to the ASIC vendor, who fills in the empty boxes (phantom instantiation) before
manufacturing your chip.

The second and third choices require you to make a buy-or-build decision. If you complete
an ASIC design using a cell library that you bought, you also own the masks (the tooling) that are
used to manufacture your ASIC. This is called customer-owned tooling (COT, pronounced
see-oh-tee). A library vendor normally develops a cell library using information about a process
supplied by an ASIC foundry. An ASIC foundry (in contrast to an ASIC vendor) provides only
manufacturing, with no design help. If the cell library meets the foundry specifications, we call
it a qualified cell library. These cell libraries are normally expensive (possibly several hundred
thousand dollars), but if a library is qualified at several foundries, you can shop around for the
most attractive terms. For high-volume production, buying an expensive library can therefore be
cheaper in the long run than the other solutions.

The third choice is to develop a cell library in-house. Many large computer and electronics
companies make this choice. Most of the cell libraries designed today are still developed in-house
despite the fact that the process of library development is complex and very expensive.
However created, each cell in an ASIC cell library must contain the following:
• A physical layout
• A behavioral model
• A Verilog/VHDL model
• A detailed timing model
• A test strategy
• A circuit schematic
• A cell icon
• A wire-load model
• A routing model

The ASIC designer needs a high-level, behavioral model for each cell because simulation at
the detailed timing level takes too long for a complete ASIC design.
• For a NAND gate a behavioral model is simple.
• A multiport RAM model can be very complex.
• The designer may require Verilog and VHDL models in addition to the models for a
particular logic simulator.
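For a simple cell, the behavioral model can be a one-line Verilog description that captures the logic function and, at most, one lumped delay. A minimal sketch, assuming a hypothetical cell name NAND2_X1 and an illustrative delay of one time unit (not characterized data):

```verilog
// Hypothetical behavioral model of a two-input NAND library cell.
// The cell name NAND2_X1 and the 1-time-unit delay are illustrative only;
// a real library supplies characterized delays in its detailed timing model.
module NAND2_X1 (
  input  A,
  input  B,
  output ZN
);
  assign #1 ZN = ~(A & B);   // logic function plus a single lumped delay
endmodule
```

A multiport RAM model, by contrast, has to describe address decoding, port conflicts, and read/write timing, which is why it can be very complex.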

ASIC designers also need a detailed timing model for each cell to determine the performance of
the critical pieces of an ASIC. It is too difficult, too time-consuming, and too expensive to build
every cell in silicon and measure the cell delays. Instead, library engineers simulate the delay of
each cell, a process known as characterization. Characterizing a standard-cell or gate-array library
involves circuit extraction from the full-custom cell layout for each cell. The extracted schematic
includes all the parasitic resistance and capacitance elements. Library engineers then simulate
each cell, including the parasitic elements, to determine the switching delays.

All ASICs need to be production tested (programmable ASICs may be tested by the manufacturer
before they are customized, but they still need to be tested). Simple cells in small or medium-size
blocks can be tested using automated techniques, but large blocks such as RAMs or multipliers
need a planned test strategy.

The cell schematic (a netlist description) describes each cell so that the cell designer can perform
simulation for complex cells. You may not need the detailed cell schematic for all cells, but you
need enough information to compare what you think is on the silicon (the schematic) with what is
actually on the silicon (the layout); this is a layout versus schematic (LVS) check.

If the ASIC designer uses schematic entry, each cell needs a cell icon together with connector and
naming information that can be used by design tools from different vendors. One of the
advantages of using logic synthesis rather than schematic entry is that it eliminates the problems
with icons, connectors, and cell names. Logic synthesis also makes moving an ASIC between
different cell libraries, or retargeting, much easier.
In order to estimate the parasitic capacitance of wires before we actually complete any routing, we
need a statistical estimate of the capacitance of a net in a circuit block of a given size. This usually
takes the form of a look-up table known as a wire-load model. We also need a routing model for
each cell. Large cells are too complex for the physical design or layout tools to handle directly, and
we need a simpler representation, a phantom of the physical layout, that still contains all the
necessary information. The phantom may include information that tells the automated routing tool
where it can and cannot place wires over the cell, as well as the location and types of the
connections to the cell.

Open-source tools for semi-custom and full-custom design: KLayout, Magic VLSI, Electric VLSI,
OpenROAD, Yosys

Semiconductor Foundry Company List


• Taiwan Semiconductor Manufacturing Company (TSMC) Limited
• GlobalFoundries Inc.
• United Microelectronics Corporation (UMC)
• Semiconductor Manufacturing International Corporation (SMIC)
• Samsung Electronics Co. Ltd (Samsung Foundry)
• Dongbu Hitek Co. Ltd
• Intel Corporation
• Hua Hong Semiconductor Limited
• Powerchip Technology Corporation
• STMicroelectronics NV
• Tower Semiconductor Ltd.
• Vanguard International Semiconductor Corporation
• X-FAB Silicon Foundries
• NXP Semiconductors NV
• Renesas Electronics Corporation
• Microchip Technology Inc.
• Texas Instruments Inc.

CMOS Logic:
Data Path Logic Cells (Example: n-bit adder):

A datapath adder consists of the following:
• (a) A full-adder (FA) cell with data inputs A and B, a carry in, CIN, a sum output, S, and a carry
out, COUT.
• (b) A 4-bit adder.
• (c) The layout, using two-level metal, with data in m1 and control in m2.
o In this example the wiring is completed outside the cell; it is also possible to design
the datapath cells to contain the wiring. Using three levels of metal, it is possible to
wire over the top of the datapath cells.
• (d) The datapath layout.
What is the difference between using a datapath, standard cells, or gate arrays?
• Cells are placed together in rows on a CBIC or an MGA, but there is generally no
regularity to the arrangement of the cells within the rows; we let software arrange the
cells and complete the interconnect.
• Datapath layout automatically takes care of most of the interconnect between the cells, with
the following advantages:
o Regular layout produces predictable and equal delay for each bit.
o Interconnect between cells can be built into each cell.
• There are some disadvantages of using a datapath:
o The overhead (buffering and routing the control signals, for example) can make a
narrow (small number of bits) datapath larger and slower than a standard-cell (or
even gate-array) implementation.
o Datapath cells have to be predesigned (otherwise we are using full-custom design)
for use in a wide range of datapath sizes. Datapath cell design can be harder than
designing gate-array macros or standard cells.
o Software to assemble a datapath is more complex and not as widely used as
software for assembling standard cells or gate arrays.
• There are some newer standard-cell and gate-array tools that can take advantage of
regularity in a design and position cells carefully. The problem is in finding the
regularity if it is not specified. Using a datapath is one way to specify regularity to
ASIC design tools.

Data Path Elements

Ripple Carry Adder:
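The ripple-carry adder (RCA) is the datapath adder described earlier: identical full-adder bit slices, with COUT of each slice wired to CIN of the next. A minimal Verilog sketch, assuming a hypothetical module name rca_sketch and a width parameter N; the generate loop mirrors the regular bit-slice structure of a datapath:

```verilog
// Sketch of an n-bit ripple-carry datapath adder: one identical full-adder
// bit slice per bit, with the carry out of slice i driving the carry in of
// slice i+1. Module and parameter names are illustrative, not a library API.
module rca_sketch #(parameter N = 4) (
  input  [N-1:0] a, b,
  input          cin,
  output [N-1:0] s,
  output         cout
);
  wire [N:0] c;                 // c[0] = carry in, c[N] = carry out
  assign c[0] = cin;
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : slice
      assign s[i]   = a[i] ^ b[i] ^ c[i];                       // sum bit
      assign c[i+1] = (a[i] & b[i]) | (c[i] & (a[i] ^ b[i]));   // MAJ(a, b, c)
    end
  endgenerate
  assign cout = c[N];
endmodule
```

The worst-case delay runs through every slice of the carry chain, which is the critical path the faster adders below try to shorten.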

Carry Save Adder:


A carry-save adder (CSA) is a digital adder used to compute the sum of three or more binary
numbers efficiently. A CSA is commonly used inside a binary multiplier, because a multiplier has
to add many partial products. Using CSAs, a multi-operand adder can be built that is much faster
than adding the numbers one pair at a time with ordinary carry-propagate adders.

The carry-save adder circuit diagram is shown below. This type of adder differs from the other
types because it does not propagate the intermediate carries to the next bit position; instead, it
saves the carries and adds them to the sum in the next stage with another full adder (FA).

When binary numbers are added this way, the first stage saves the carry and sum bits and passes
them to the second stage. The second stage acts as a ripple-carry adder (RCA), in which the stored
carry and sum bits are added. The adder has three operands, A, B, and C, where C is a 4-bit input
that takes the place of the carry input. For 4-bit operands, four FAs are used, one for each bit
position of A, B, and C.
Each FA produces a sum bit and a carry bit. The carry bits are not passed to the next FA; instead,
they are saved and added to the next (more significant) sum bit by the ripple-carry adder.

CSA Working
A carry-save adder is built by assembling k FAs without any horizontal (carry) connections between
them. Its function is to add three k-bit integers, A, B, and C, to produce two integers: a sum word, S,
and a carry word, C'. Each FA works only on its own three input bits, producing
S[i] = A[i] ⊕ B[i] ⊕ C[i] and a carry C'[i] = MAJ(A[i], B[i], C[i]) that has twice the weight of S[i].
Two functions are involved in forming each carry: carry propagate (Cp), which passes an incoming
carry on to the next level, and carry generate (Cg), which produces an output carry irrespective of
the input carry.

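Example:
Let X = 19, Y = 25, and Z = 11. The CSA produces the sum word S and the carry word C shown
below. Each carry bit has twice the weight of the sum bit in the same column, so the carry word is
shifted one place to the left before it is added.

X = 19 =   1 0 0 1 1
Y = 25 =   1 1 0 0 1
Z = 11 =   0 1 0 1 1
……………………………………….
S      =   0 0 0 0 1      (S = 1)
C << 1 = 1 1 0 1 1 0      (C = 27, shifted left = 54)
……………………………………….
55     = 1 1 0 1 1 1

In the above CSA example, adding the three binary numbers X, Y, and Z gives a sum word and a
carry word at the next stage. Adding the sum and carry words with a carry-propagate adder gives
the final value, 1 + 54 = 55.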
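The two CSA steps above (per-bit sum and majority carry with no carry chain, then one carry-propagate addition of S and the shifted carry word) can be sketched in Verilog. A minimal sketch, assuming hypothetical module names csa_sketch and csa_add3_sketch; the behavioral "+" in the second module stands in for the final carry-propagate adder (CPA), which could be the RCA:

```verilog
// Sketch of a k-bit carry-save adder (CSA): k independent full adders with
// no carry connections between bit positions. Names are hypothetical.
module csa_sketch #(parameter K = 5) (
  input  [K-1:0] x, y, z,
  output [K-1:0] s,      // sum word
  output [K-1:0] c       // carry word (bit i has weight 2^(i+1))
);
  assign s = x ^ y ^ z;                     // per-bit sum, no carry chain
  assign c = (x & y) | (x & z) | (y & z);   // per-bit majority = carry out
endmodule

// Three-operand adder: CSA stage followed by a carry-propagate addition of
// S and the carry word shifted left by one place.
module csa_add3_sketch #(parameter K = 5) (
  input  [K-1:0] x, y, z,
  output [K+1:0] total
);
  wire [K-1:0] s, c;
  csa_sketch #(.K(K)) u_csa (.x(x), .y(y), .z(z), .s(s), .c(c));
  assign total = {2'b00, s} + {1'b0, c, 1'b0};   // CPA: S + (C << 1)
endmodule
```

With x = 19, y = 25, and z = 11 this gives s = 1, c = 27, and total = 55, matching the worked example.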

Figure (2.23) Details
• (a) A CSA cell.
• (b) A 4-bit CSA.
• (c) Symbol for a CSA.
• (d) A four-input CSA.
• (e) The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder (RCA)
as the final stage.
• (f) A pipelined adder.
• (g) The datapath for the pipelined version showing the pipeline registers as well as the
clock control lines that use m2.

We can register the CSA stages by adding vectors of flip-flops, as shown in Figure 2.23(f). This
reduces the adder delay to that of the slowest adder stage, usually the carry-propagate adder
(CPA). By using registers between stages of combinational logic we use pipelining to increase
the speed; we pay a price of increased area (for the registers) and introduce latency. It takes a few
clock cycles (the latency, equal to n clock cycles for an n-stage pipeline) to fill the pipeline, but
once it is filled, the answers emerge every clock cycle. Ferris wheels work much the same way.
When the fair opens it takes a while (latency) to fill the wheel, but once it is full the people can
get on and off every few seconds.
(We can also pipeline the RCA of Figure 2.20. We add i registers on the A and B inputs before
ADD[i] and add (n - i) registers after the output S[i], with a single register before each C[i].)
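Registering the CSA stages can be sketched as a two-stage pipeline: stage 1 is the CSA, whose sum and carry words are captured in registers, and stage 2 is the CPA. A minimal sketch, assuming a hypothetical module name pipe_add3_sketch and width parameter K; the latency is two clock cycles, after which one new result emerges every clock:

```verilog
// Sketch of a pipelined three-operand adder: CSA -> registers -> CPA -> register.
// Latency is two clock cycles; throughput is one result per cycle.
// Module and port names are hypothetical.
module pipe_add3_sketch #(parameter K = 8) (
  input              clk,
  input  [K-1:0]     x, y, z,
  output reg [K+1:0] total
);
  reg [K-1:0] s_q, c_q;                            // pipeline registers
  always @(posedge clk) begin
    s_q   <= x ^ y ^ z;                            // stage 1: CSA sum word
    c_q   <= (x & y) | (x & z) | (y & z);          // stage 1: CSA carry word
    total <= {2'b00, s_q} + {1'b0, c_q, 1'b0};     // stage 2: CPA on last cycle's words
  end
endmodule
```

Operands applied before one rising clock edge appear on total after the following edge, so a new operand set can be applied every cycle while earlier results are still in flight.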

Carry Bypass Adder:

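The problem with an RCA is that every stage has to wait to make its carry decision, C[i], until the
previous stage has calculated C[i - 1]. If we examine the propagate signals we can bypass this
critical path. For example, to bypass the carries for bits 4 to 7 (stages 5 to 8) of an adder, we can
compute BYPASS = P[4].P[5].P[6].P[7] and then use a MUX as follows:
C[7] = (G[7] + P[7].C[6]).BYPASS' + C[3].BYPASS
If all four propagate signals are '1', the carry entering bit 4, C[3], passes straight through to C[7],
so the MUX can select C[3] directly instead of waiting for the carry to ripple through bits 4 to 7.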
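The bypass equation can be sketched as a 4-bit block: an ordinary ripple chain computes the sums, and a MUX driven by BYPASS (the AND of the four propagate signals) selects either the rippled carry or the block's carry in. A minimal sketch, assuming a hypothetical module name bypass4_sketch written for a generic 4-bit slice (cin plays the role of C[3] and cout the role of C[7] in the text):

```verilog
// Sketch of a 4-bit carry-bypass block implementing
//   BYPASS = P[0].P[1].P[2].P[3]
//   COUT   = (ripple carry).BYPASS' + CIN.BYPASS
// Module and port names are hypothetical.
module bypass4_sketch (
  input  [3:0] a, b,
  input        cin,      // carry into the block (C[3] in the text)
  output [3:0] s,
  output       cout      // carry out of the block (C[7] in the text)
);
  wire [3:0] p = a ^ b;  // propagate
  wire [3:0] g = a & b;  // generate
  wire [4:0] c;          // internal ripple carries
  assign c[0] = cin;
  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : rip
      assign c[i+1] = g[i] | (p[i] & c[i]);
      assign s[i]   = p[i] ^ c[i];
    end
  endgenerate
  wire bypass = &p;                    // every bit propagates
  assign cout = bypass ? cin : c[4];   // MUX: skip the ripple chain
endmodule
```

Logically, c[4] already equals cin whenever bypass is '1', so the MUX does not change the result; its benefit is delay, because in hardware the block's carry out no longer has to wait for the four-stage ripple when all bits propagate.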

Carry Skip Adder:


A carry-skip adder consists of a simple ripple-carry adder with a special speed-up carry chain
called a skip chain. This chain defines the distribution of the ripple-carry blocks that make up the
skip adder. The addition of two binary digits at stage i (where i is not equal to 0) of the ripple-carry
adder depends on the carry in, Ci, which is the carry out of the previous stage, i - 1. Therefore, in
order to calculate the sum and the carry out, Ci+1, of stage i, the carry in, Ci, must be known in
advance. Note, however, that in some cases Ci+1 can be calculated without knowledge of Ci: if
Ai = Bi = 1 the stage always produces a carry, and if Ai = Bi = 0 it never does; only when Ai ≠ Bi
does the carry in simply propagate to the carry out.

Carry Lookahead Adder:


A carry-lookahead adder reduces the propagation delay by introducing more complex hardware.
In this design, the ripple-carry design is transformed so that the carry logic over fixed groups of
bits of the adder is reduced to two-level logic.
In this adder, the carry input at any stage is independent of the carry bits generated at the
intermediate stages. The carry into any stage depends only on the input bits of the less significant
stages and on the carry input provided to the first stage. Hence, a stage does not have to wait for
the carry bit from the previous stage, and all the carry bits can be evaluated at the same time.
For deriving the truth table of this adder, two new terms are introduced: carry generate and
carry propagate. Carry generate G[i] = 1 whenever stage i generates a carry C[i+1] by itself. It
depends only on the A[i] and B[i] inputs: G[i] is 1 when both A[i] and B[i] are 1, so G[i] = A[i].B[i].
Carry propagate P[i] is associated with the propagation of a carry from C[i] to C[i+1]. It is calculated
as P[i] = A[i] ⊕ B[i]. The truth table of this adder can be derived by modifying the truth table of a
full adder.
Using the G[i] and P[i] terms, the sum S[i] and carry C[i+1] are given as below:
• S[i] = P[i] ⊕ C[i].
• C[i+1] = G[i] + P[i].C[i].

Expanding the carry equation recursively, for example C[2] = G[1] + P[1].G[0] + P[1].P[0].C[0],
shows that each carry C[i+1] depends only on the G and P terms and on the input carry C[0], not on
the intermediate carry bits.
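The expanded equations can be written out directly for a 4-bit block. A minimal sketch, assuming a hypothetical module name cla4_sketch; each carry is in its fully expanded two-level (sum-of-products) form, so no carry waits for the one before it:

```verilog
// Sketch of a 4-bit carry-lookahead adder using G[i] = A[i].B[i],
// P[i] = A[i] xor B[i], S[i] = P[i] xor C[i], and fully expanded carries.
// Module name is hypothetical.
module cla4_sketch (
  input  [3:0] a, b,
  input        c0,
  output [3:0] s,
  output       c4
);
  wire [3:0] g = a & b;   // carry generate
  wire [3:0] p = a ^ b;   // carry propagate
  wire c1 = g[0] | (p[0] & c0);
  wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
  wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0);
  assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0);
  assign s = p ^ {c3, c2, c1, c0};   // S[i] = P[i] xor C[i]
endmodule
```

The product terms grow with each bit, which is why practical lookahead adders use fixed-size groups (or a tree such as Brent-Kung) rather than expanding every carry across the full width.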

Figure: Brent-Kung carry-lookahead adder

Carry Select Adder:


“duplicates two small adders for the cases CIN='0' and CIN='1' and then
uses a MUX to select the case that we need”
Figure: An 8-bit carry-select adder, built as a cascade of a 1-bit full adder, a 3-bit carry-select
block, and a 4-bit carry-select block.
The problem with the ripple-carry adder is that each stage has to wait for the arrival of its
carry-input signal before the actual addition can start. The basic idea of the carry-select adder is
to use blocks of two ripple-carry adders, one of which is fed with a constant 0 carry-in while the
other is fed with a constant 1 carry-in. Both blocks can therefore calculate in parallel. When the
actual carry-in signal for the block arrives, multiplexers select the correct one of the two
precalculated partial sums. The resulting carry-out is also selected and propagated to the next
carry-select block.
In total, the carry propagation time through an n-bit adder is reduced from O(n) to roughly the delay
of the first block plus the number of blocks times the multiplexer delay. Naturally, using n blocks of
1-bit carry-select adders would require a chain of n multiplexers, again resulting in O(n) delay.
Therefore, a partition with (slowly) increasing block size is chosen. In the example, the first
(least-significant) block consists of a simple full adder, followed by a 3-bit carry-select block, and
finally a 4-bit carry-select block. A common choice for a 16-bit carry-select adder is a 6-4-3-2-1 bit
partitioning. While the delay of the standard ripple-carry adder with n bits is O(n), the delay through
the carry-select adder behaves as O(sqrt(n)), at a hardware cost of about 3n cells (two adder bits
and one multiplexer for each bit), which is still O(n).
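The 1-3-4 partitioning of the 8-bit example can be sketched in Verilog. A minimal sketch, assuming hypothetical module names csel_block_sketch and csel8_sketch; inside each block the behavioral "+" stands in for the two ripple-carry adders, one computed with carry-in 0 and one with carry-in 1:

```verilog
// One carry-select block: compute the W-bit sum for both possible carry-ins,
// then let the real carry-in pick the result. Names are hypothetical.
module csel_block_sketch #(parameter W = 4) (
  input  [W-1:0] a, b,
  input          cin,
  output [W-1:0] s,
  output         cout
);
  wire [W:0] r0 = a + b;              // precomputed with carry-in = 0
  wire [W:0] r1 = a + b + 1'b1;       // precomputed with carry-in = 1
  assign {cout, s} = cin ? r1 : r0;   // MUX once the carry arrives
endmodule

// 8-bit adder: 1-bit full adder, then 3-bit and 4-bit carry-select blocks.
module csel8_sketch (
  input  [7:0] a, b,
  input        cin,
  output [7:0] s,
  output       cout
);
  wire c1, c4;
  // bit 0: plain full adder
  assign s[0] = a[0] ^ b[0] ^ cin;
  assign c1   = (a[0] & b[0]) | (cin & (a[0] ^ b[0]));
  // bits 1-3 and bits 4-7: carry-select blocks
  csel_block_sketch #(.W(3)) blk1 (.a(a[3:1]), .b(b[3:1]), .cin(c1),
                                   .s(s[3:1]), .cout(c4));
  csel_block_sketch #(.W(4)) blk2 (.a(a[7:4]), .b(b[7:4]), .cin(c4),
                                   .s(s[7:4]), .cout(cout));
endmodule
```

Because both candidate results of each block are ready while the lower blocks are still working, the carry only has to pass through one multiplexer per block.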

Conditional Sum Adder (8-bit):


• An extension of the carry-select adder
An n-bit conditional-sum adder uses n single-bit conditional adders, H (each with four outputs: two
conditional sums, true carry, and complement carry), together with a tree of 2:1 MUXes. The
conditional-sum adder is usually the fastest of all the adders (it is the fastest when logic-cell delay
increases with the number of inputs, which is true for all ASICs except FPGAs).
The conditional-sum adder is built on the carry-select idea: an n-bit adder can be designed from
smaller n/2-bit or n/4-bit adders using the same carry-select concept, applied recursively. For
example, a 4-bit adder can be built using seven 1-bit adders.

Figure: A 1-bit conditional adder that calculates the sum and carry out assuming the carry in is
either '1' or '0'

Example:
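// 4-bit conditional-sum adder
module cond_sum_add(a, b, c0, cout, s);
  input  [3:0] a, b;
  input        c0;
  output [3:0] s;
  output       cout;
  wire c1, c2, c3, c4, c5, c6, c7, c23, c670, c671;
  wire s1, s2, s3, s4, s5, s6, s670, s671;

  // bit 0: ordinary full adder driven by the real carry in
  fa m1(a[0], b[0], c0, s[0], c1, );

  // bits 1-3: conditional adders for carry in = 0 and carry in = 1
  fa m2(a[1], b[1], 1'b0, s1, c2, );
  fa m3(a[1], b[1], 1'b1, s2, c3, );
  fa m4(a[2], b[2], 1'b0, s3, c4, );
  fa m5(a[2], b[2], 1'b1, s4, c5, );
  fa m6(a[3], b[3], 1'b0, s5, c6, );
  fa m7(a[3], b[3], 1'b1, s6, c7, );

  // level 1: resolve bit 1 with c1; pair bit 3 with each conditional carry out of bit 2
  mux_df mx1(c2, c3, c1, c23);      // carry out of bit 1
  mux_df mx2(s1, s2, c1, s[1]);
  mux_df mx3(c6, c7, c4, c670);     // carry out of bit 3 if carry into bit 2 = 0
  mux_df mx4(s5, s6, c4, s670);
  mux_df mx5(c6, c7, c5, c671);     // carry out of bit 3 if carry into bit 2 = 1
  mux_df mx6(s5, s6, c5, s671);

  // level 2: the carry out of bit 1 (c23) selects the results for bits 2 and 3
  mux_df mx7(c670, c671, c23, cout);
  mux_df mx8(s3, s4, c23, s[2]);
  mux_df mx9(s670, s671, c23, s[3]);
endmodule

// Verilog code for Half Adder (used by the full adder below)
module ha(a, b, sum, co);
  input a, b;
  output sum, co;
  assign sum = a ^ b;
  assign co  = a & b;
endmodule

// Verilog code for Full Adder
module fa(a, b, cin, sum, co, t1);
  input a, b, cin;
  output sum, co, t1;   // t1 = a ^ b (propagate), left unconnected above
  wire t2, t4;
  ha X1(a, b, t1, t2);
  ha X2(cin, t1, sum, t4);
  assign co = t2 | t4;
endmodule

// Verilog code for 2x1 Multiplexer (y = s ? b : a)
module mux_df(input a, b, s, output y);
  wire sbar;
  assign sbar = ~s;
  assign y = (a & sbar) | (s & b);
endmodule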

Figure: Data path Adder

Multiplier (Booth Encoding):

Data Path Operators:


The combinational datapath cells (NAND, NOR, and so on) and the sequential datapath cells
(flip-flops and latches) have standard-cell equivalents and function identically.

FIGURE: Symbols for datapath elements. (a) An array or vector of flip-flops (a register). (b) A
two-input NAND cell with databus inputs. (c) A two-input NAND cell with a control input. (d) A
buswide MUX. (e) An incrementer/decrementer. (f) An all-zeros detector. (g) An all-ones detector.
(h) An adder/subtracter.
(Thick line → vector data, thin line → scalar data)
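A subtracter is similar to an adder, except that in a full subtracter we have the following signals
in place of the carry-in, carry-out, and sum:
• BIN - borrow-in signal
• BOUT - borrow-out signal
• DIFF - difference signal
• MAJ - majority function ('1' if the majority of the inputs are '1'); for a full subtracter,
DIFF = A ⊕ B ⊕ BIN and BOUT = MAJ(NOT(A), B, BIN).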

Subtracter: A - B is computed as twos-complement addition, A + NOT(B) + 1.
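An adder/subtracter can reuse one adder for both operations: a control input conditionally inverts B and also supplies the "+1" as the carry in. A minimal sketch, assuming hypothetical module and port names (addsub_sketch, sub):

```verilog
// Sketch of an n-bit adder/subtracter. When sub = 1 the B input is inverted
// and sub also enters as the carry in, so Z = A + NOT(B) + 1 = A - B.
// Module and port names are hypothetical.
module addsub_sketch #(parameter N = 8) (
  input  [N-1:0] a, b,
  input          sub,     // 0: add, 1: subtract
  output [N-1:0] z,
  output         cout     // for subtraction, cout = 1 means no borrow (A >= B)
);
  wire [N-1:0] b_eff = b ^ {N{sub}};    // conditionally invert B
  assign {cout, z} = a + b_eff + sub;   // sub doubles as the carry in
endmodule
```

For subtraction the carry out is the complement of the borrow out, so cout = 1 exactly when A is greater than or equal to B (unsigned).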


Barrel Shifter:
• A barrel shifter rotates or shifts an input bus by a specified amount. For example if we
have an eight-input barrel shifter with input '1111 0000' and we specify a shift of
'0001 0000' (3, coded by bit position) the right-shifted 8-bit output is '0001 1110'.
• A barrel shifter may rotate left or right (or switch between the two under a separate
control).
• A barrel shifter may also have an output width that is smaller than the input.
To use a simple example, we may have an 8-bit input and a 4-bit output. This situation
is equivalent to having a barrel shifter with two 4-bit inputs and a 4-bit output.
• Barrel shifters are used extensively in floating-point arithmetic to align floating-point
numbers (with sign, exponent, and mantissa); we call this normalizing and denormalizing.
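A common way to build the barrel shifter is as a logarithmic shifter: three MUX stages that shift by 1, 2, and 4 places. A minimal sketch, assuming a hypothetical module name barrel8_sketch and a binary-coded shift amount (unlike the one-hot "coded by bit position" select in the example above); rotate selects between the two figures' behaviors, the 8-bit logical right shifter (zero fill) and the 8-bit right rotator (wrap-around):

```verilog
// Sketch of an 8-bit logarithmic barrel shifter/rotator. Each stage shifts
// right by 1, 2 or 4 places when the matching bit of sh is set.
// rotate = 1: bits shifted out re-enter at the top (right rotator).
// rotate = 0: zeros are shifted in (logical right shifter).
// Names are hypothetical.
module barrel8_sketch (
  input  [7:0] a,
  input  [2:0] sh,       // binary shift amount, 0..7
  input        rotate,
  output [7:0] z
);
  wire [7:0] st1, st2;
  assign st1 = sh[0] ? {(rotate ? a[0]     : 1'b0),    a[7:1]}   : a;     // by 1
  assign st2 = sh[1] ? {(rotate ? st1[1:0] : 2'b00),   st1[7:2]} : st1;   // by 2
  assign z   = sh[2] ? {(rotate ? st2[3:0] : 4'b0000), st2[7:4]} : st2;   // by 4
endmodule
```

With a = '1111 0000', sh = 3, and rotate = 0, the output is '0001 1110', as in the example. The log-depth structure needs only three MUX levels for any shift from 0 to 7.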

Figure: IEEE floating point single precision data format

Figure: IEEE floating point double precision data format

Figure: 8-bit logical right shifter

Figure: 8-bit logical right rotator

Leading one Detector:


• A leading-one detector is used with a normalizing (left-shift) barrel shifter to align
mantissas in floating-point numbers.
• The input is an n -bit bus A, the output is an n -bit bus, S, with a single '1' in the bit position
corresponding to the most significant '1' in the input.
• For example, if the input is A = '0000 0101' the leading-one detector output is S = '0000
0100', indicating the leading one in A is in bit position 2 (bit 7 is the MSB, bit zero is the
LSB).
• If we feed the output, S, of the leading-one detector to the shift select input of a normalizing
(left-shift) barrel shifter, the shifter will normalize the input A.
• In our example, with an input of A = '0000 0101', and a left-shift of S ='0000 0100', the
barrel shifter will shift A left by five bits and the output of the shifter is Z = '1010 0000'.
Now that Z is aligned (with the MSB equal to '1') we can multiply Z with another
normalized number.
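The leading-one detector keeps only the most significant '1' of its input. A minimal sketch, assuming a hypothetical module name lod8_sketch: a chain marks every bit that has a '1' somewhere above it, and those bits are masked off:

```verilog
// Sketch of an 8-bit leading-one detector: s has a single '1' in the
// position of the most significant '1' of a (s = 0 when a = 0).
// Module name is hypothetical.
module lod8_sketch (
  input  [7:0] a,
  output [7:0] s
);
  wire [7:0] higher;            // higher[i] = 1 if any bit above i is 1
  assign higher[7] = 1'b0;
  genvar i;
  generate
    for (i = 0; i < 7; i = i + 1) begin : scan
      assign higher[i] = higher[i+1] | a[i+1];
    end
  endgenerate
  assign s = a & ~higher;       // keep only the first '1' from the MSB end
endmodule
```

For A = '0000 0101' the output is S = '0000 0100', as in the example above. The chain shown is linear; a faster design would compute the higher signals with a prefix tree.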

Priority Encoder:
• The output of a priority encoder is the binary-encoded position of the leading one in an
input. For example, with an input A = '0000 0101' the leading 1 is in bit position 2
(MSB is bit position 7), so the output of a 4-bit priority encoder would be Z = '0010' (2).
• In some cell libraries the encoding is reversed so that the MSB has an output code of
zero; in this case Z = '0101' (5). This second, reversed, encoding scheme is useful in
floating-point arithmetic.
• If A is a mantissa and we normalize A to '1010 0000', we have to subtract 5 from the
exponent; this exponent correction is equal to the output of the reversed priority encoder.
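Both encodings can be produced by one small block. A minimal sketch, assuming hypothetical module and port names (prienc8_sketch, z_rev, valid); z uses MSB = position 7, z_rev uses the reversed code, which equals the left shift and exponent correction needed to normalize the input:

```verilog
// Sketch of an 8-input priority encoder with both encodings.
//   z     : binary position of the leading '1' (MSB = 7)
//   z_rev : reversed code (MSB = 0), i.e. 7 - z
//   valid : 0 when a is all zeros (z and z_rev are then meaningless)
// Names are hypothetical.
module prienc8_sketch (
  input      [7:0] a,
  output reg [3:0] z,
  output     [3:0] z_rev,
  output           valid
);
  integer k;
  always @* begin
    z = 4'd0;
    for (k = 0; k < 8; k = k + 1)
      if (a[k]) z = k;          // later (higher) positions override: MSB wins
  end
  assign valid = |a;
  assign z_rev = 4'd7 - z;
endmodule
```

For A = '0000 0101' this gives z = 2 and z_rev = 5, the shift and exponent correction used in the normalization example.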
Accumulator:
• An accumulator is an adder/subtracter and a register. Sometimes these are combined
with a multiplier to form a multiplier/accumulator ( MAC ).
• An incrementer adds 1 to the input bus, Z = A + 1, so we can use this function, together
with a register, to negate a twos complement number for example.
o Z[ i ] = XOR(A[ i ], CIN[ i ])
o COUT[ i ] = AND(A[ i ], CIN[ i ]).

• The carry-in control input, CIN[0], thus acts as an enable: If it is set to '0' the output is the
same as the input.
Incrementer:
• Z[i (even)] = XOR(A[i], CIN[i]) and COUT[i (even)] = NAND(A[i], CIN[i]).
This inverts COUT, so the following stage must invert it again.
• If we push an inverting bubble to the input CIN, we find that:
o Z[i (odd)] = XNOR(A[i], CIN[i])
o COUT[i (odd)] = NOR(NOT(A[i]), CIN[i]).
• In many datapath implementations all odd-bit cells operate on inverted carry signals,
and thus the odd-bit and even-bit datapath elements are different.
Decrementer:
• A decrementer subtracts 1 from the input bus, the logical implementation is
o Z[ i ] = XOR(A[ i ], CIN[ i ])
o COUT[ i ] = AND(NOT(A[ i ]), CIN[ i ]).
• The implementation may invert the odd carry signals, with CIN[0] again acting as an
enable.
Incrementer/Decrementer:
• An incrementer/decrementer has a second control input that gates the input, inverting
the input to the carry chain. This has the effect of selecting either the increment or
decrement function.
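The incrementer/decrementer equations, including the alternating carry polarity, can be sketched together. A minimal sketch, assuming hypothetical module and port names (incdec8_sketch, cin0 for CIN[0], dec for the second control input): the gated chain input x = A xor dec gives COUT = AND(A, CIN) when incrementing and AND(NOT(A), CIN) when decrementing, and the even bits pass an inverted carry to the odd bits:

```verilog
// Sketch of an 8-bit incrementer/decrementer with alternating carry polarity:
//   even bits: Z = XOR(A, CIN),  COUT = NAND(x, CIN)          (inverted carry out)
//   odd bits:  Z = XNOR(A, CIN), COUT = NOR(NOT(x), CIN)      (true carry out)
// where x = A xor dec gates the input to the carry chain.
// cin0 (CIN[0]) is the enable; names are hypothetical.
module incdec8_sketch (
  input  [7:0] a,
  input        cin0,   // 0: Z = A (pass through)
  input        dec,    // 0: Z = A + 1, 1: Z = A - 1 (when enabled)
  output [7:0] z
);
  wire [7:0] x = a ^ {8{dec}};   // gated input to the carry chain
  wire [8:0] c;                  // c[i]: true carry for even i, inverted for odd i
  assign c[0] = cin0;
  genvar i;
  generate
    for (i = 0; i < 8; i = i + 1) begin : bit_cell
      if (i % 2 == 0) begin : even
        assign z[i]   = a[i] ^ c[i];          // XOR(A, CIN)
        assign c[i+1] = ~(x[i] & c[i]);       // NAND: inverted carry out
      end else begin : odd
        assign z[i]   = ~(a[i] ^ c[i]);       // XNOR(A, inverted CIN)
        assign c[i+1] = ~(~x[i] | c[i]);      // NOR(NOT(x), inverted CIN): true carry
      end
    end
  endgenerate
endmodule
```

Because every odd cell absorbs the inversion produced by the even cell before it, no extra inverters are needed in the carry chain, which is why the odd-bit and even-bit cells differ.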
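All-Zeros Detector and All-Ones Detector:
• An all-zeros detector outputs '1' only when every bit of its input bus is '0' (a wide NOR), and
an all-ones detector outputs '1' only when every bit is '1' (a wide AND).
• Note that zero does not always have a single code: for a 4-bit number, for example, zero in ones
complement arithmetic is '1111' or '0000', and zero in signed magnitude arithmetic is '1000' or '0000'.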
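In Verilog the two detectors are reduction operators, and the number-format-dependent zero tests follow from them. A minimal sketch, assuming a hypothetical module name zero_detect_sketch and that the sign bit of a signed-magnitude number is the MSB:

```verilog
// Sketch of all-zeros / all-ones detectors plus zero tests for ones
// complement and signed magnitude numbers. Module name is hypothetical;
// the MSB is assumed to be the sign bit for signed magnitude.
module zero_detect_sketch #(parameter N = 4) (
  input  [N-1:0] a,
  output         all_zeros,       // wide NOR
  output         all_ones,        // wide AND
  output         ones_comp_zero,  // '0000' or '1111' when N = 4
  output         sign_mag_zero    // '0000' or '1000' when N = 4
);
  assign all_zeros      = ~|a;
  assign all_ones       = &a;
  assign ones_comp_zero = all_zeros | all_ones;
  assign sign_mag_zero  = ~|a[N-2:0];   // ignore the sign bit
endmodule
```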
Register File:
• A register file (or scratchpad memory) is a bank of flip-flops arranged across the bus;
sometimes these have the option of multiple ports (multiport register files) for read and
write.
• Normally these register files are the densest logic and hardest to fit in a datapath.
• For large register files it may be more appropriate to use a multiport memory.
• We can add control logic to a register file to create a first-in first-out register (FIFO) or a
last-in first-out register (LIFO).
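A small multiport register file can be sketched as a bank of flip-flops with one write port and two read ports. A minimal sketch, assuming a hypothetical 4-word by 8-bit size and hypothetical port names; the FIFO or LIFO pointer logic mentioned above is not shown:

```verilog
// Sketch of a 4-word x 8-bit register file built from flip-flops, with one
// synchronous write port and two asynchronous read ports.
// Sizes and names are hypothetical.
module regfile_sketch (
  input        clk,
  input        we,
  input  [1:0] waddr, raddr_a, raddr_b,
  input  [7:0] wdata,
  output [7:0] rdata_a, rdata_b
);
  reg [7:0] regs [0:3];            // one row of flip-flops per word
  always @(posedge clk)
    if (we) regs[waddr] <= wdata;  // write port
  assign rdata_a = regs[raddr_a];  // read port A
  assign rdata_b = regs[raddr_b];  // read port B
endmodule
```

Every extra read port adds another bus-wide MUX across all the words, which is part of why multiport register files become dense and hard to fit in a datapath.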

I/O Cells:
Three State Bi-directional output buffer
• When the output enable (OE) signal is high, the circuit functions as a noninverting buffer
driving the value of DATAin onto the I/O pad.
• When OE is low, the output transistors or drivers, M1 and M2, are disconnected. This
allows multiple drivers to be connected on a bus. It is up to the designer to make sure that
a bus never has two drivers, a problem known as contention.
• To prevent the opposite problem, a bus floating to an intermediate voltage when there are
no bus drivers, we can use a bus keeper or bus-hold cell (TI calls this Bus-Friendly logic).
o A bus keeper normally acts like two weak (low drive-strength) cross-coupled
inverters that act as a latch to retain the last logic state on the bus, but the latch is
weak enough that it may be driven easily to the opposite state.
o Even though bus keepers act like latches, and will simulate like latches, they
should not be used as latches, since their drive strength is weak.
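The enable behavior of the three-state bidirectional buffer maps directly onto Verilog's 'z' (high impedance) and 'x' (unknown) values. A minimal behavioral sketch, assuming hypothetical port names (data_out from the core, data_in to the core) that do not correspond to the labels in the figure; the transistor-level details of M1 and M2 are not modeled:

```verilog
// Behavioral sketch of a three-state bidirectional I/O cell.
// oe = 1: the output driver puts data_out on the pad.
// oe = 0: the driver is high impedance ('z'), so the pad can act as an input
//         or another device can drive the shared bus.
// Port names are hypothetical.
module bidir_pad_sketch (
  input  oe,
  input  data_out,    // from the core, to be driven off-chip
  output data_in,     // to the core, read back from the pad
  inout  pad
);
  assign pad     = oe ? data_out : 1'bz;   // three-state output driver
  assign data_in = pad;                    // input buffer
endmodule
```

In simulation, two enabled drivers with opposite values resolve the pad to 'x' (contention), and no enabled driver leaves it at 'z' (the floating bus that a bus keeper is meant to prevent).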

Figure: three State bi-directional output buffer[When the output enable, OE, is '1' the output
section is enabled and drives the I/O pad. When OE is '0' the output buffer is placed in a
high-impedance state]
• Transistors M1 and M2 have to drive large off-chip loads. If we wish to change the
voltage on a C = 200 pF load by 5 V in 5 ns (a slew rate of 1 V/ns), we will require a
current in the output transistors of
o $I_{DS} = C\,\frac{dV}{dt} = (200\times10^{-12}\ \mathrm{F})\left(\frac{5\ \mathrm{V}}{5\times10^{-9}\ \mathrm{s}}\right) = 0.2\ \mathrm{A} = 200\ \mathrm{mA}$


• Such large currents flowing in the output transistors must also flow in the power
supply bus and can cause problems. There is always some inductance in series
with the power supply, between the point at which the supply enters the ASIC
package and reaches the power bus on the chip. The inductance is due to the bond
wire, lead frame, and package pin.
• If we have a power-supply inductance of 2 nH and a current changing from zero to 1 A (32
I/O cells on a bus switching at 30 mA each) in 5 ns, we will have a voltage spike on the
power supply (called power-supply bounce) of
o $L\,\frac{dI}{dt} = (2\times10^{-9}\ \mathrm{H})\left(\frac{1\ \mathrm{A}}{5\times10^{-9}\ \mathrm{s}}\right) = 0.4\ \mathrm{V}$
• Ways to reduce this problem:
o limit the number of simultaneously switching outputs (SSOs)
o limit the number of I/O drivers that can be attached to any one VDD and GND pad
o design the output buffer to limit the slew rate of the output (we call these slew-rate-limited
I/O pads)
o use quiet-I/O cells, which have two separate power supplies and two sets of I/O
drivers:
▪ an AC supply (clean or quiet supply) with small AC drivers for the I/O
circuits that start and stop the output slewing at the beginning and end of an
output transition
▪ a DC supply (noisy or dirty supply) for the transistors that handle large
currents as they slew the output
• The three-state buffer allows us to employ the same pad for input and output (bidirectional
I/O).
• When we want to use the pad as an input, we set OE low and take the data from DATAin.
Of course, it is not necessary to have all these features on every pad: we can build
output-only or input-only pads.
• We can also use many of these output-cell features for input cells that have to
drive large on-chip loads (a clock pad cell, for example).
• Some gate arrays simply turn an output buffer around to drive a grid of interconnect that
supplies a clock signal internally. With a typical interconnect capacitance of 0.2 pF/cm, a
grid of 100 cm (consisting of 10 by 10 lines running all the way across a 1 cm chip) presents
a load of 20 pF to the clock buffer.
• Some libraries include I/O cells that have passive pull-ups or pull-downs
(resistors) instead of the transistors, M1 and M2 (the resistors are normally still
constructed from transistors with long gate lengths).
• We can also omit one of the driver transistors, M1 or M2, to form open-drain outputs that
require an external pull-up or pull-down.
• We can design the output driver to produce TTL output levels rather than CMOS logic
levels. We may also add input hysteresis (using a Schmitt trigger) to the input buffer to
accept input data signals that contain glitches (from bouncing switch contacts, for example)
or that are slow rising.
• The input buffer can also include a level shifter to accept TTL input levels and shift the
input signal to CMOS levels.
• The gate oxide in CMOS transistors is extremely thin (100 Å or less). This leaves the gate
oxide of the I/O cell input transistors susceptible to breakdown from static electricity
(electrostatic discharge, or ESD).
• ESD arises when we or machines handle the package leads (like the shock I sometimes get
when I touch a doorknob after walking across the carpet at work).
• Sometimes this problem is called electrical overstress (EOS) since most ESD-related
failures are caused not by gate-oxide breakdown, but by the thermal stress (melting) that
occurs when the n -channel transistor in an output driver overheats (melts) due to the large
current that can flow in the drain diffusion connected to a pad during an ESD event.

Cell Compilers:
• The process of hand crafting circuits and layout for a full-custom IC is a tedious, time-
consuming, and error-prone task.
• There are two types of automated layout assembly tools, often known as silicon
compilers:
o The first type produces a specific kind of circuit, such as a RAM compiler or a
multiplier compiler.
o The second type of compiler is more flexible, usually providing a programming
language that assembles or tiles layout from an input command file, but using it is,
in effect, full-custom IC design.
• We can build a register file from latches or flip-flops, but at 18 to 26 transistors
per bit, this is an expensive way to build memory.
• Dynamic RAM (DRAM) can use a cell with only one transistor, storing charge on a
capacitor that has to be periodically refreshed as the charge leaks away.
• ASIC RAM is invariably static (SRAM), so we do not need to refresh the bits. When we
refer to RAM in an ASIC environment we almost always mean SRAM.
• Most ASIC RAMs use a six-transistor cell (four transistors to form two cross-coupled
inverters that form the storage loop, and two more transistors to allow us to read from and
write to the cell).
• RAM compilers are available that produce single-port RAM (a single shared bus for read
and write) as well as dual-port RAMs and multiport RAMs.
• In a multiport RAM, the compiler may or may not handle the problem of address
contention (attempts to read and write the same RAM address simultaneously). RAM
can be asynchronous (the read and write cycles are triggered by control and/or address
transitions asynchronous to a clock) or synchronous (using the system clock).
• In addition to producing layout we also need a model compiler so that we can verify the
circuit at the behavioral level, and we need a netlist from a netlist compiler so that we can
simulate the circuit and verify that it works correctly at the structural level.
• Silicon compilers are thus complex pieces of software. We assume that a silicon compiler
will produce working silicon even though not every configuration has been tested.
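A model compiler produces a behavioral description of a compiled RAM, and one of its jobs is to expose address contention. The Python sketch that follows is a hypothetical behavioral model of a synchronous dual-port SRAM with one write port and one read port. It is not the output of any real RAM compiler. When both ports access the same address on the same clock edge, this model returns the old data (a read-before-write policy, which is an assumption here) and counts the collision so that a testbench can flag it. Real compiled RAMs may instead return the new data or leave the result undefined.

```python
"""Behavioral sketch of a compiled synchronous dual-port SRAM.

Hypothetical model: one write port and one read port share a clock. On an
address collision it returns the old data (read-before-write, an assumed
policy) and counts the event so a testbench can flag it.
"""
from typing import List, Optional


class DualPortRAM:
    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.mask = (1 << width) - 1      # keep stored words to `width` bits
        self.mem: List[int] = [0] * depth
        self.contentions = 0              # read/write collisions observed

    def _check(self, addr: int) -> None:
        if not 0 <= addr < self.depth:
            raise IndexError(f"address {addr} out of range 0..{self.depth - 1}")

    def clock(self, write_addr: Optional[int] = None, write_data: int = 0,
              read_addr: Optional[int] = None) -> Optional[int]:
        """Model one active clock edge and return the read-port data, if any."""
        read_data = None
        if read_addr is not None:
            self._check(read_addr)
            read_data = self.mem[read_addr]   # sample before the write lands
        if write_addr is not None:
            self._check(write_addr)
            if write_addr == read_addr:
                self.contentions += 1         # same address on both ports
            self.mem[write_addr] = write_data & self.mask
        return read_data


if __name__ == "__main__":
    ram = DualPortRAM(depth=16, width=8)
    ram.clock(write_addr=3, write_data=0xAB)
    print(hex(ram.clock(read_addr=3)))                                  # 0xab
    print(hex(ram.clock(write_addr=3, write_data=0x1FF, read_addr=3)))  # 0xab (old data)
    print(ram.contentions, hex(ram.clock(read_addr=3)))                 # 1 0xff
```

For scale, the storage cells alone for this 16 × 8 array would take 128 × 6 = 768 transistors as six-transistor SRAM cells. Built from 18- to 26-transistor flip-flops, the same array would take 2,304 to 3,328 transistors. Peripheral circuits such as decoders and sense amplifiers are not included in either count.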
