Module -1 - Introduction to ASIC
MODULE -1
Introduction to ASICs: Full custom, Semi-custom and Programmable ASICs, ASIC Design
flow, ASIC cell libraries. CMOS Logic: Data path Logic Cells: Data Path Elements, Adders:
Carry skip, Carry bypass, Carry save, Carry select, Conditional sum, Multiplier (Booth
encoding), Data path Operators, I/O cells, Cell Compilers
Pre-requisite: Basic concepts of CMOS Logic, Basic concepts of logic Design, Basic
concepts of EDA
Introduction
ASIC stands for Application-Specific Integrated Circuit. An ASIC is an integrated circuit built for one
specific application or purpose; it is not software programmable to perform a wide variety of
different tasks. Because it is tailored to a single task, an ASIC is usually faster than a
general-purpose device doing the same job. ASICs are widely used in applications
including auto emission control, environmental monitoring, and personal digital assistants. An ASIC
often has an embedded CPU to manage suitable tasks.
The semiconductor industry has evolved from the first ICs of the early 1970s and matured rapidly
since then. Early small-scale integration (SSI) ICs contained a few (1 to 10) logic gates (NAND
gates, NOR gates, and so on), amounting to a few tens of transistors. The era of medium-scale
integration (MSI) increased the range of integrated logic available to counters and similar, larger
scale, logic functions. The era of large-scale integration (LSI) packed even larger logic functions,
such as the first microprocessors, into a single chip. The era of very large-scale integration (VLSI)
now offers 64-bit microprocessors, complete with cache memory and floating-point arithmetic
units (well over a million transistors) on a single piece of silicon. As CMOS process technology
improves, transistors continue to get smaller and ICs hold more and more transistors.
The earliest ICs used bipolar technology and the majority of logic ICs used either
transistor-transistor logic (TTL) or emitter-coupled logic (ECL). Although invented before the bipolar
transistor, the metal-oxide-silicon (MOS) transistor was initially difficult to manufacture because
of problems with the oxide interface. As these problems were gradually solved, metal-gate
n-channel MOS (nMOS or NMOS) technology developed in the 1970s. At that time MOS
technology required fewer masking steps, was denser, and consumed less power than equivalent
bipolar ICs. This meant that, for a given performance, an MOS IC was cheaper than a bipolar IC
and led to investment and growth of the MOS IC market.
By the early 1980s the aluminum gates of the transistors were replaced by polysilicon gates, but
the name MOS remained. The introduction of polysilicon as a gate material was a major
improvement in CMOS technology, making it easier to make two types of transistors, n-channel
MOS and p-channel MOS transistors, on the same IC: a complementary MOS (CMOS, never
cMOS) technology. The principal advantage of CMOS over NMOS is lower power consumption.
Another advantage of a polysilicon gate was a simplification of the fabrication process, allowing
devices to be scaled down in size.
There are four CMOS transistors in a two-input NAND gate (and a two-input NOR gate too), so
to convert between gates and transistors, you multiply the number of gates by 4 to obtain the
number of transistors. We can also measure an IC by the smallest feature size (roughly half the
length of the smallest transistor) imprinted on the IC. Transistor dimensions are measured in
microns (a micron, 1 µm, is a millionth of a meter). Thus we talk about a 0.5 µm IC or say an IC
is built in (or with) a 0.5 µm process, meaning that the smallest transistors are 0.5 µm in length.
We give a special label, λ or lambda, to this smallest feature size. Since lambda is equal to half of
the smallest transistor length, λ ≈ 0.25 µm in a 0.5 µm process. Many of the drawings in this book
use a scale marked with lambda for the same reason we place a scale on a map.
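The two rules of thumb in this paragraph (four transistors per two-input gate, and lambda equal to half the smallest transistor length) reduce to simple arithmetic. The following is a minimal sketch; the function names are illustrative and do not come from the text.

```python
def gates_to_transistors(gate_count: int) -> int:
    """Estimate the transistor count, assuming 4 transistors per 2-input NAND/NOR gate."""
    return 4 * gate_count


def lambda_um(min_transistor_length_um: float) -> float:
    """Return lambda in microns, taken as half the smallest transistor length."""
    return min_transistor_length_um / 2


print(gates_to_transistors(10_000))  # 40000
print(lambda_um(0.5))                # 0.25
```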
Some digital logic ICs and their analog counterparts (analog/digital converters, for example) are
standard parts, or standard ICs. You can select standard ICs from catalogs and data books and buy
them from distributors. Systems manufacturers and designers can use the same standard part in a
variety of different microelectronic systems (systems that use microelectronics or ICs).
With the advent of VLSI in the 1980s engineers began to realize the advantages of designing an
IC that was customized or tailored to a particular system or application rather than using standard
ICs alone. Microelectronic system design then becomes a matter of defining the functions that you
can implement using standard ICs and then implementing the remaining logic functions
(sometimes called glue logic) with one or more custom ICs. As VLSI became possible you
could build a system from a smaller number of components by combining many standard ICs into
a few custom ICs. Building a microelectronic system with fewer ICs allows you to reduce cost and
improve reliability.
ASIC Categories
Figure: ASIC categories. ASICs are divided into full-custom ASICs, semi-custom ASICs, and
programmable ASICs.
In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or layout
specifically for one ASIC. This means the designer abandons the approach of using pretested
and precharacterized cells for all or part of that design. It makes sense to take this approach only
if there are no suitable existing cell libraries available that can be used for the entire design. This
might be because existing cell libraries are not fast enough, or the logic cells are not small enough
or consume too much power. You may need to use full-custom design if
the ASIC technology is new or so specialized that there are no existing cell libraries or because
the ASIC is so specialized that some circuits must be custom designed. Fewer and fewer
full-custom ICs are being designed because of the problems with these special parts of the ASIC.
Bipolar technology has historically been used for precision analog functions. There are some
fundamental reasons for this. In all integrated circuits the matching of component characteristics
between chips is very poor, while the matching of characteristics between components on the
same chip is excellent.
Suppose we have transistors T1, T2, and T3 on an analog/digital ASIC. The three transistors are
all the same size and are constructed in an identical fashion. Transistors T1 and T2 are located
adjacent to each other and have the same orientation. Transistor T3 is the same size as T1 and
T2 but is located on the other side of the chip from T1 and T2 and has a different orientation.
ICs are made in batches called wafer lots. A wafer lot is a group of silicon wafers that are all
processed together. Usually there are between 5 and 30 wafers in a lot. Each
wafer can contain tens or hundreds of chips depending on the size of the IC and the wafer.
If we were to make measurements of the characteristics of transistors T1, T2, and T3 we would
find the following:
• Transistor T1 will have virtually identical characteristics to T2 on the same IC. We say
that the transistors match well or the tracking between devices is excellent.
• Transistor T3 will match transistors T1 and T2 on the same IC very well, but not as
closely as T1 matches T2 on the same IC.
• Transistors T1, T2, and T3 will match fairly well with transistors T1, T2, and T3 on a
different IC on the same wafer. The matching will depend on how far apart the two ICs
are on the wafer.
• Transistors on ICs from different wafers in the same wafer lot will not match very well.
• Transistors on ICs from different wafer lots will match very poorly.
For many analog designs the close matching of transistors is crucial to circuit operation.
For these circuit designs pairs of transistors are used, located adjacent to each other. Device
physics dictates that a pair of bipolar transistors will always match more precisely than
CMOS transistors of a comparable size.
Bipolar technology has historically been more widely used for full-custom analog design
because of its improved precision. Despite its poorer analog properties, the use of CMOS
technology for analog functions is increasing. There are two reasons for this.
• The first reason is that CMOS is now by far the most widely available IC
technology. Many more CMOS ASICs and CMOS standard products are now being
manufactured than bipolar ICs.
• The second reason is that increased levels of integration require mixing analog and
digital functions on the same IC: this has forced designers to find ways to use
CMOS technology to implement analog functions. Circuit designers, using clever
new techniques, have been very successful in finding new ways to design analog
CMOS circuits that can approach the accuracy of bipolar analog designs.
The main advantage of full-custom ASICs over other IC designs is that they deliver the highest
possible performance at the smallest possible die size. But this high performance and small size
come at the price of increased design time, design complexity, and overall cost of the IC itself.
Some of the most common full-custom ASICs are microprocessors, memories, analog
processors, analog/digital communication devices, sensors, transducers, and high-voltage
ICs for automobiles.
In a semi-custom ASIC the logic cells are taken from standard libraries, i.e. they are not handcrafted
as in full-custom design. Some masks are customized while others are taken from the predesigned
library. Based on the type of logic cells taken from the library and the amount of customization
allowed for interconnect, these ASICs are divided into two types: standard-cell-based ASICs and
gate-array-based ASICs.
In a gate-array-based ASIC the designer can change only the interconnect between transistors,
using the first few metal layers of the die. The designer chooses cells from the gate-array library.
These ASICs are often called masked gate arrays (MGAs). Gate-array-based ASICs are of three types:
• Channeled gate array
• Channelless gate array
• Structured gate array
Figure: A channeled gate-array die. The spaces between rows of the base cells are set aside for
interconnect
In a channelless gate array, unlike the channeled gate array, no free space is left for routing
between rows of cells. Routing is instead done over the top of the gate-array cells, which is possible
because we can customize the connections between metal1 and the transistors. The transistors lying
in the path of the routing are left unused. The manufacturing lead time is about two weeks.
Figure: A channelless gate-array or sea-of-gates (SOG) array die. The core area of the die is
completely filled with an array of base cells (the base array).
The key difference between a channelless gate array and channeled gate array is that there are no
predefined areas set aside for routing between cells on a channelless gate array. Instead we
route over the top of the gate-array devices. We can do this because we customize the contact
layer that defines the connections between metal1, the first layer of metal, and the transistors.
When we use an area of transistors for routing in a channelless array, we do not make any
contacts to the devices lying underneath; we simply leave the transistors unused.
The logic density (the amount of logic that can be implemented in a given silicon area) is higher
for channelless gate arrays than for channeled gate arrays. This is usually attributed to the
difference in structure between the two types of array.
In fact, the difference occurs because the contact mask is customized in a channelless gate array,
but is not usually customized in a channeled gate array. This leads to denser cells in the
channelless architectures. Customizing the contact layer in a channelless gate array allows us to
increase the density of gate-array cells because we can route over the top of unused contact sites.
A structured (or embedded) gate array has an embedded block alongside the gate-array rows. It
combines the higher area efficiency of a CBIC with the lower cost and faster turnaround of a masked
gate array. The fixed size of the embedded function is a limitation: for example, if the gate array
reserves an area for a 32 k-bit controller but an application needs only a 16 k-bit controller, the
remaining area is wasted. All gate arrays have a turnaround time of two days to two weeks, and all
have customized interconnect.
Figure: A structured or embedded gate-array die showing an embedded block in the upper left corner (a
static random-access memory, for example). The rest of the die is filled with an array of base cells.
The important features of this type of Masked Gate Array (MGA) are the following:
• Only the interconnect is customized.
• Custom blocks (the same for each design) can be embedded.
• Manufacturing lead time is between two days and two weeks.
An embedded gate array gives the improved area efficiency and increased performance of a CBIC
but with the lower cost and faster turnaround of an MGA. One disadvantage of an embedded gate
array is that the embedded function is fixed. For example, if an embedded gate array contains an
area set aside for a 32 k-bit memory, but we only need a 16 k-bit memory, then we may have to
waste half of the embedded memory function. However, this may still be more efficient and
cheaper than implementing a 32 k-bit memory using macros on a SOG array.
The ASIC designer defines only the placement of the standard cells and the interconnect in a CBIC.
However, the standard cells can be placed anywhere on the silicon; this means that all the mask
layers of a CBIC are customized and are unique to a particular customer. The advantage of CBICs
is that designers save time, money, and reduce risk by using a predesigned, pretested, and
precharacterized standard-cell library. In addition, each standard cell can be optimized
individually. During the design of the cell library each and every transistor in every standard cell
can be chosen to maximize speed or minimize area, for example. The disadvantages are the time
or expense of designing or buying the standard-cell library and the time needed to fabricate all
layers of the ASIC for each new design.
Figure: A cell-based ASIC (CBIC) die with a single standard-cell area (a flexible block) together
with four fixed blocks. The flexible block contains rows of standard cells. The small squares
around the edge of the die are bonding pads that are connected to the pins of the ASIC package.
Example:
Standard cells are designed to fit together like bricks in a wall. The figure shows an example of a
simple standard cell (simple in the sense that it is not maximized for density but is ideal for showing
its internal construction). Power and ground buses (VDD and GND or VSS) run horizontally on metal
lines inside the cells.
Standard-cell design allows the automation of the process of assembling an ASIC. Groups of
standard cells fit horizontally together to form rows. The rows stack vertically to form flexible
rectangular blocks (which you can reshape during design). You may then connect a flexible block
built from several rows of standard cells to other standard-cell blocks or other full-custom logic
blocks.
For example, you might want to include a custom interface to a standard, predesigned
microcontroller together with some memory. The microcontroller block may be a fixed-size
megacell, you might generate the memory using a memory compiler, and the custom logic and
memory controller will be built from flexible standard-cell blocks, shaped to fit in the empty spaces
on the chip.
Both cell-based and gate-array ASICs use predefined cells, but there is a difference: we can change
the transistor sizes in a standard cell to optimize speed and performance, but the device sizes in a
gate array are fixed. This results in a trade-off in performance and area in a gate array at the
silicon level. The trade-off between area and performance is made at the library level for a
standard-cell ASIC.
Modern CMOS ASICs use two, three, or more levels (or layers) of metal for interconnect. This
allows wires to cross over different layers in the same way that we use copper traces on different
layers on a printed-circuit board.
• In a two-level metal CMOS technology, connections to the standard-cell inputs and outputs
are usually made using the second level of metal (metal2, the upper level of metal) at the
tops and bottoms of the cells.
• In a three-level metal technology, connections may be internal to the logic cell. This allows
more sophisticated routing programs to take advantage of the extra metal layer to route
interconnect over the top of the logic cells. A connection that needs to cross over a row of
standard cells uses a feedthrough.
• The term feedthrough can refer either to the piece of metal that is used to pass a signal
through a cell or to a space in a cell waiting to be used as a feedthrough, which can be confusing.
The figure shows two feedthroughs: one in cell A.14 and one in cell A.23. In both two-level
and three-level metal technology, the power buses (VDD and GND) inside the standard
cells normally use the lowest (closest to the transistors) layer of metal (metal1). The width
of each row of standard cells is adjusted so that they may be aligned using spacer cells.
The power buses, or rails, are then connected to additional vertical power rails using
row-end cells at the aligned ends of each standard-cell block.
• If the rows of standard cells are long, then vertical power rails can also be run in metal2
through the cell rows using special power cells that just connect to VDD and GND. Usually
the designer manually controls the number and width of the vertical power rails connected
to the standard-cell blocks during physical design. A diagram of the power distribution
scheme for a CBIC is shown in Figure
All the mask layers of a CBIC are customized. This allows megacells (SRAM, a SCSI controller,
or an MPEG decoder, for example) to be placed on the same IC with standard cells. Megacells are
usually supplied by an ASIC or library company complete with behavioral models and some way
to test them (a test strategy). ASIC library companies also supply compilers to generate flexible
DRAM, SRAM, and ROM blocks. Since all mask layers on a standard-cell design are customized,
memory design is more efficient and denser than for gate arrays.
For logic that operates on multiple signals across a data bus (a datapath, DP) the
use of standard cells may not be the most efficient ASIC design style.
Some ASIC library companies provide a datapath compiler that automatically generates
datapath logic. A datapath library typically contains cells such as adders, subtracters, multipliers,
and simple arithmetic and logical units (ALUs). The connectors of datapath library cells are
pitch-matched to each other so that they fit together. Connecting datapath cells to form a datapath
usually, but not always, results in faster and denser layout than using standard cells or a gate array.
Programmable ASIC
There are two types of programmable ASICs: programmable logic devices (PLDs) and
field-programmable gate arrays (FPGAs).
Figure: A programmable logic device (PLD) die. The macrocells typically consist of programmable array
logic followed by a flip-flop or latch. The macrocells are connected using a large programmable
interconnect block
There is another type of ROM that can be placed on any ASIC: a mask-programmable ROM
(mask-programmed ROM or masked ROM). A masked ROM is a regular array of transistors
permanently programmed using custom mask patterns. An embedded masked ROM is thus a
large, specialized, logic cell.
The same programmable technologies used to make ROMs can be applied to more flexible logic
structures. By using the programmable devices in a large array of AND gates and an array of OR
gates, we create a family of flexible and programmable logic devices called logic arrays. The
company Monolithic Memories (bought by AMD) was the first to produce Programmable Array
Logic (PAL®, a registered trademark of AMD) devices that you can use, for example, as transition
decoders for state machines.
A PAL can also include registers (flip-flops) to store the current state information so that you can
use a PAL to make a complete state machine. Just as we have a mask-programmable ROM, we
could place a logic array as a cell on a custom ASIC. This type of logic array is called a
programmable logic array (PLA). There is a difference between a PAL and a PLA: a PLA has a
programmable AND logic array, or AND plane, followed by a programmable OR logic array, or
OR plane; a PAL has a programmable AND plane and, in contrast to a PLA, a fixed OR plane.
Depending on how the PLD is programmed, we can have an erasable PLD (EPLD), or
mask-programmed PLD (sometimes called a masked PLD but usually just PLD). The first PALs,
PLAs, and PLDs were based on bipolar technology and used programmable fuses or links.
CMOS PLDs usually employ floating-gate transistors.
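The AND-plane/OR-plane structure of a logic array can be illustrated with a small behavioral model. This is only a sketch: the data layout (`and_plane` as a list of product terms, `or_plane` as a list of term indices per output) and the example programming are assumptions made for illustration, not a description of any real device. In a PLA both tables would be programmable; in a PAL the `or_plane` table would be fixed by the manufacturer.

```python
from itertools import product


def logic_array_eval(inputs, and_plane, or_plane):
    """Evaluate a two-level AND-OR logic array.

    and_plane: list of product terms; each term maps an input index to the
               bit value that input must have for the term to be true.
    or_plane:  list of outputs; each output lists the product-term indices
               that are ORed together.
    """
    products = [all(inputs[i] == bit for i, bit in term.items()) for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]


# Hypothetical programming for two inputs (a, b).
AND_PLANE = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]  # a.b', a'.b, a.b
OR_PLANE = [[0, 1], [2]]                                 # out0 = a XOR b, out1 = a AND b

for a, b in product((0, 1), repeat=2):
    print(a, b, logic_array_eval((a, b), AND_PLANE, OR_PLANE))
```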
• Routing: Once the placement of the blocks and cells is completed, then it is time to create
the connections between the cells and the blocks.
• Extraction: The next step is to determine the resistance and capacitance of the
interconnections previously made, since they decide the delay of the signal. Also, the
delays are calculated at this stage.
• Post-layout Simulation: Once the physical design is complete, the circuit is simulated again
to verify that it still works. The delays previously calculated are also taken into
consideration for the simulation process.
• Design Rule Check (DRC): The final step is to verify the layout of the entire circuit and check
whether it complies with the design rule specifications.
Applications
The area of applications of ASICs is very wide as they are basically used everywhere where there
is a need for performance, customization and size. Some of the common categories of application
are mentioned below.
• Sensors and Transducers
• Automotive and Avionic Components
• Satellite, Radar and related Communication processors
• Microprocessors, Memories, Microcontrollers
Some ASIC vendors (especially for MGAs) supply tools that they have developed in-house. For
some reason the more common model in Japan is to use tools supplied by the ASIC vendor, but in
the United States, Europe, and elsewhere designers want to choose their own tools. Perhaps this
has to do with the relationship between customer and supplier being a lot closer in Japan than it is
elsewhere. (MGA – Masked Gate Array)
An ASIC vendor library is normally a phantom library: the cells are empty boxes, or phantoms, but
contain enough information for layout (for example, you would only see the bounding box or
abutment box in a phantom version of the cell in Figure 1.3). After you complete layout you hand
off a netlist to the ASIC vendor, who fills in the empty boxes (phantom instantiation) before
manufacturing your chip.
The second and third choices require you to make a buy-or-build decision. If you complete
an ASIC design using a cell library that you bought, you also own the masks (the tooling) that are
used to manufacture your ASIC. This is called customer-owned tooling (COT, pronounced
see-oh-tee). A library vendor normally develops a cell library using information about a process
supplied by an ASIC foundry. An ASIC foundry (in contrast to an ASIC vendor) only provides
manufacturing, with no design help. If the cell library meets the foundry specifications, we call
this a qualified cell library. These cell libraries are normally expensive (possibly several hundred
thousand dollars), but if a library is qualified at several foundries this allows you to shop around
for the most attractive terms. This means that buying an expensive library can be cheaper in the
long run than the other solutions for high-volume production.
The third choice is to develop a cell library in-house. Many large computer and electronics
companies make this choice. Most of the cell libraries designed today are still developed in-house
despite the fact that the process of library development is complex and very expensive.
However created, each cell in an ASIC cell library must contain the following:
• A physical layout
• A behavioral model
• A Verilog/VHDL model
• A detailed timing model
• A test strategy
• A circuit schematic
• A cell icon
• A wire-load model
• A routing model
The ASIC designer needs a high-level, behavioral model for each cell because simulation at
the detailed timing level takes too long for a complete ASIC design.
• For a NAND gate a behavioral model is simple.
• A multiport RAM model can be very complex.
• The designer may require Verilog and VHDL models in addition to the models for a
particular logic simulator.
ASIC designers also need a detailed timing model for each cell to determine the performance of
the critical pieces of an ASIC.
It is too difficult, too time-consuming, and too expensive to build every cell in silicon and
measure the cell delays.
Instead library engineers simulate the delay of each cell, a process known as characterization.
Characterizing a standard-cell or gate-array library involves circuit extraction from the full-custom
cell layout for each cell. The extracted schematic includes all the parasitic resistance and
capacitance elements. Then library engineers perform a simulation of each cell including the
parasitic elements to determine the switching delays.
All ASICs need to be production tested (programmable ASICs may be tested by the manufacturer
before they are customized, but they still need to be tested). Simple cells in small or medium-size
blocks can be tested using automated techniques, but large blocks such as RAM or multipliers need
a planned strategy.
The cell schematic (a netlist description) describes each cell so that the cell designer can perform
simulation for complex cells. You may not need the detailed cell schematic for all cells, but you
need enough information to compare what you think is on the silicon (the schematic) with what is
actually on the silicon (the layout); this is a layout versus schematic (LVS) check.
If the ASIC designer uses schematic entry, each cell needs a cell icon together with connector and
naming information that can be used by design tools from different vendors. One of the advantages
of using logic synthesis rather than schematic design entry is eliminating the problems with icons,
connectors, and cell names. Logic synthesis also makes moving an ASIC between different cell
libraries, or retargeting, much easier.
In order to estimate the parasitic capacitance of wires before we actually complete any routing, we
need a statistical estimate of the capacitance for a net in a circuit block of a given size. This
usually takes the form of a look-up table known as a wire-load model. We also need a routing model
for each cell. Large cells are too complex for the physical design or layout tools to handle directly
and we need a simpler representation (a phantom) of the physical layout that still contains all the
necessary
information. The phantom may include information that tells the automated routing tool where it
can and cannot place wires over the cell, as well as the location and types of the connections to the
cell.
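A wire-load look-up table can be sketched as follows. Everything specific here is an assumption made for illustration: the table is indexed by fanout (the text only says the estimate is statistical and depends on block size), the capacitance values are invented, and linear interpolation between rows is just one possible choice.

```python
from bisect import bisect_left

# Hypothetical table for one block size: (fanout, estimated net capacitance in pF).
# The numbers are invented for illustration only.
WIRE_LOAD_TABLE = [(1, 0.010), (2, 0.018), (4, 0.035), (8, 0.070)]


def estimate_net_capacitance(fanout: int) -> float:
    """Look up the estimated capacitance, interpolating linearly between rows."""
    fanouts = [f for f, _ in WIRE_LOAD_TABLE]
    if fanout <= fanouts[0]:
        return WIRE_LOAD_TABLE[0][1]
    if fanout >= fanouts[-1]:
        return WIRE_LOAD_TABLE[-1][1]
    hi = bisect_left(fanouts, fanout)          # first row with fanout >= requested
    (f0, c0), (f1, c1) = WIRE_LOAD_TABLE[hi - 1], WIRE_LOAD_TABLE[hi]
    return c0 + (c1 - c0) * (fanout - f0) / (f1 - f0)


print(estimate_net_capacitance(3))
```

A pre-layout timing estimate would use such a value in place of the extracted capacitance, which is only available after routing.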
Open-source tools for semi-custom and full-custom design: KLayout, Magic VLSI, Electric
VLSI, OpenROAD, Yosys
CMOS Logic:
Data Path Logic Cells: (Example – n- bit adder)
o The overhead (buffering and routing the control signals, for example) can make a
narrow (small number of bits) datapath larger and slower than a standard-cell (or
even gate-array) implementation.
o Datapath cells have to be predesigned (otherwise we are using full-custom design)
for use in a wide range of datapath sizes. Datapath cell design can be harder than
designing gate-array macros or standard cells.
o Software to assemble a datapath is more complex and not as widely used as
software for assembling standard cells or gate arrays.
• There are some newer standard-cell and gate-array tools that can take advantage of
regularity in a design and position cells carefully. The problem is in finding the
regularity if it is not specified. Using a datapath is one way to specify regularity to
ASIC design tools.
The carry-save adder (CSA) circuit diagram is shown below. This type of adder differs from the
other types because it does not pass the intermediate carries on to the next bit position. Instead, it
saves the carries and adds them to the sum bits in a following stage, using another full adder (FA)
per bit.
In the first stage of the addition each FA produces a sum bit and a carry bit, and both are saved and
passed to the second stage. The second stage acts like a ripple-carry adder (RCA) and adds the
saved carry and sum bits. The adder takes three operands, A, B and C, where C is applied to the
carry inputs of the FAs. For 4-bit operands, four FAs are used, one for each bit position of A, B
and C.
Every FA produces a sum bit and a carry bit. The carry bits are not passed to the next FA; they are
simply saved and added to the next-higher sum bits by the ripple-carry adder.
CSA Working
The carry-save adder works by assembling k FAs without any horizontal connection. The main
function of this adder is to add three k-bit integers A, B and C to generate two integers, a sum S
and a carry C'. Carry propagation and carry generation are the two functions in the carry-save
adder: the carry propagate (Cp) is passed to the next level, whereas the carry generate (Cg) is used
to generate the output carry, irrespective of the input carry.
Example:
Let X = 19, Y = 25 and Z = 11. We compute the sum S and the carry C' as shown below.
X = 19 =   1 0 0 1 1
Y = 25 =   1 1 0 0 1
Z = 11 =   0 1 0 1 1
……………………………………….
Sum    =   0 0 0 0 1
Carry  = 1 1 0 1 1 0   (carry bits 1 1 0 1 1 shifted left by one position)
……………………………………….
55     = 1 1 0 1 1 1
In the above CSA example, adding the three binary numbers X, Y and Z gives a sum word and a
carry word. Adding the sum word to the carry word (shifted left by one bit) gives the final value.
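The example can be reproduced with a short behavioral model. This is a sketch rather than a gate-level design: each bit position is treated as an independent full adder, and the final carry-propagate addition is done with ordinary integer addition. The function name is illustrative.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three operands to a (sum, carry) pair with no carry propagation.

    Each bit position acts as an independent full adder:
    sum bit = x XOR y XOR z, carry bit = majority(x, y, z).
    """
    sum_bits = x ^ y ^ z
    carry_bits = (x & y) | (y & z) | (x & z)
    return sum_bits, carry_bits


s, c = carry_save_add(19, 25, 11)
print(f"{s:05b} {c:05b}")  # 00001 11011
# The carry word has twice the weight of the sum word, so it is shifted
# left by one bit before the final carry-propagate addition.
print(s + (c << 1))        # 55
```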
The problem with an RCA is that every stage has to wait to make its carry decision, C[i], until the
previous stage has calculated C[i-1]. If we examine the propagate signals
we can bypass this critical path. Thus, for example, to bypass the carries for bits 4 to 7
(stages 5 to 8) of an adder we can compute BYPASS = P[4].P[5].P[6].P[7] and then use a
MUX as follows:
C[7] = (G[7] + P[7].C[6]).BYPASS' + C[3].BYPASS
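The bypass multiplexer can be checked with a small behavioral model of an 8-bit adder. This is a sketch under stated assumptions: C[i] denotes the carry out of bit i (so the carry into bit 4 is C[3]), the adder width of 8 bits and the function name are chosen for illustration, and the model says nothing about delay, which is the actual reason for the bypass.

```python
def bypass_adder_8bit(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit ripple adder whose carry out of bit 7 can bypass bits 4 to 7.

    Returns (8-bit sum, carry out of bit 7).
    """
    A = [(a >> i) & 1 for i in range(8)]
    B = [(b >> i) & 1 for i in range(8)]
    G = [x & y for x, y in zip(A, B)]   # generate
    P = [x ^ y for x, y in zip(A, B)]   # propagate
    C, S, carry = [], [], cin
    for i in range(8):                  # ordinary ripple chain
        S.append(P[i] ^ carry)
        carry = G[i] | (P[i] & carry)
        C.append(carry)
    bypass = P[4] & P[5] & P[6] & P[7]
    # MUX: take C[3] directly when all four stages propagate,
    # otherwise take the rippled carry G[7] + P[7].C[6].
    c7 = C[3] if bypass else (G[7] | (P[7] & C[6]))
    total = sum(bit << i for i, bit in enumerate(S))
    return total, c7


print(bypass_adder_8bit(0xF8, 0x08))  # (0, 1)
```

When BYPASS is 1 every generate bit in bits 4 to 7 is 0, so the rippled carry would equal C[3] anyway; the multiplexer only makes that value available sooner.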
In a carry look-ahead adder, the circuit at any stage does not have to wait for the generation of
the carry bit from the previous stage; the carry bit can be evaluated at any instant of time.
For deriving the truth table of this adder, two new terms are introduced: carry generate and
carry propagate. Carry generate G[i] = 1 whenever a carry C[i+1] is generated regardless of C[i].
It depends only on the A[i] and B[i] inputs: G[i] is 1 when both A[i] and B[i] are 1. Hence,
G[i] = A[i].B[i].
Carry propagate P[i] is associated with the propagation of the carry from C[i] to C[i+1]. It is
calculated as P[i] = A[i] ⊕ B[i]. The truth table of this adder can be derived by modifying the
truth table of a full adder.
Using the G[i] and P[i] terms, the sum S[i] and carry C[i+1] are given as below:
• S[i] = P[i] ⊕ C[i].
• C[i+1] = C[i].P[i] + G[i].
Expanding the second equation repeatedly shows that the carry C[i+1] depends only on the input
carry C[0] and the G and P terms, not on the intermediate carry bits.
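The expansion of the carry recurrence can be made concrete with a behavioral sketch. Here C[i] is the carry into bit i, each carry is computed directly from the G and P terms and C[0], and the sum bit is formed as P[i] XOR C[i]. The function names and the default width of 4 bits are illustrative assumptions.

```python
def lookahead_carries(a: int, b: int, c0: int, n: int = 4) -> list[int]:
    """Return [C[0], ..., C[n]], each computed directly from G, P and C[0].

    Expanded form: C[i+1] = G[i] + P[i].G[i-1] + ... + P[i]...P[0].C[0]
    """
    G = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    P = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    carries = [c0]
    for i in range(n):
        c = 0
        prefix = 1                      # running product P[i].P[i-1]...P[j+1]
        for j in range(i, -1, -1):
            c |= G[j] & prefix
            prefix &= P[j]
        c |= prefix & c0                # term P[i]...P[0].C[0]
        carries.append(c)
    return carries


def cla_add(a: int, b: int, c0: int = 0, n: int = 4) -> tuple[int, int]:
    """Add two n-bit numbers; returns (n-bit sum, carry out)."""
    C = lookahead_carries(a, b, c0, n)
    s = 0
    for i in range(n):
        p = ((a >> i) ^ (b >> i)) & 1
        s |= (p ^ C[i]) << i            # S[i] = P[i] XOR C[i]
    return s, C[n]


print(cla_add(0b1011, 0b0110))  # (1, 1), i.e. 11 + 6 = 17
```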
Each block of a carry-select adder consists of two ripple-carry adders, one of which is fed with a
constant 0 carry-in while the other is fed
with a constant 1 carry-in. Therefore, both blocks can calculate in parallel. When the actual carry-
in signal for the block arrives, multiplexers are used to select the correct one of both precalculated
partial sums. Also, the resulting carry-out is selected and propagated to the next carry-select block.
In total, the carry propagation time through an n-bit adder block is reduced from O(n) to the number
of stages times the delay of the multiplexers. Naturally, using n blocks of 1-bit carry-select adders
would incur a complexity of n multiplexers, again resulting in O(n) delay. Therefore, a partition
with (slowly) increasing block-size is chosen. In the example, the first (least-significant) block
consists of a simple full adder, followed by a 3-bit carry-select block, and finally a 4-bit carry-
select block. A common choice for a 16-bit carry-select adder is to use a 6-4-3-2-1 bit partitioning.
While the delay of the standard ripple-carry adder with n-bits is O(n), the delay through the carry-
select adder behaves as O(sqrt(n)) at a hardware cost of O(3*n).
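The block structure can be sketched in Python as follows. This is a behavioural model, not a gate-level one: the names are hypothetical, the block sizes are listed from the least-significant block upwards (the 6-4-3-2-1 partition read from the right), and for simplicity every block, including the first, computes both candidate results.

```python
def block_add(a_blk, b_blk, cin, size):
    """Stand-in for one ripple-carry block: returns (sum, carry_out)."""
    total = a_blk + b_blk + cin
    return total & ((1 << size) - 1), total >> size


def carry_select_add(a, b, cin=0, block_sizes=(1, 2, 3, 4, 6)):
    """16-bit carry-select addition; returns (sum, carry_out)."""
    result, shift, carry = 0, 0, cin
    for size in block_sizes:
        mask = (1 << size) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidates are computed without knowing the real carry-in
        sum0, cout0 = block_add(a_blk, b_blk, 0, size)
        sum1, cout1 = block_add(a_blk, b_blk, 1, size)
        # Multiplexers: the arriving carry selects one precalculated pair
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << shift
        shift += size
    return result, carry


print(carry_select_add(0xFFFF, 0x0001))   # (0, 1)
```

The only sequential dependency between blocks is the select step, which is why the delay grows with the number of blocks rather than the number of bits.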
Figure: A 1-bit conditional adder that calculates the sum and carry out assuming the carry in is
either '1' or '0'
Example:
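// 4-bit conditional-sum adder
// Assumes fa(a, b, cin, sum, cout) and mux_df(in0, in1, sel, out) are defined elsewhere.
module cond_sum_add(a,b,c0,cout,s);
input [3:0] a, b;
input c0;
output cout;
output [3:0] s;
wire c1, c2, c3, c4, c5, c6, c7, c23, c670, c671;
wire s1, s2, s3, s4, s5, s6, s670, s671;
// bit 0: plain full adder on the real carry-in (stage assumed; it drives c1 and s[0])
fa m1(a[0],b[0],c0,s[0],c1);
// bits 1 to 3: one full adder per assumed carry-in (0 and 1)
fa m2(a[1],b[1],1'b0,s1,c2);
fa m3(a[1],b[1],1'b1,s2,c3);
fa m4(a[2],b[2],1'b0,s3,c4);
fa m5(a[2],b[2],1'b1,s4,c5);
fa m6(a[3],b[3],1'b0,s5,c6);
fa m7(a[3],b[3],1'b1,s6,c7);
// bit 1 results selected by the real carry c1
mux_df mx1(c2,c3,c1,c23);
mux_df mx2(s1,s2,c1,s[1]);
// bit 3 results for carry into bit 2 = 0 (c670, s670) and = 1 (c671, s671)
mux_df mx3(c6,c7,c4,c670);
mux_df mx4(s5,s6,c4,s670);
mux_df mx5(c6,c7,c5,c671);
mux_df mx6(s5,s6,c5,s671);
// final selection by the carry out of bit 1
mux_df mx7(c670,c671,c23,cout);
mux_df mx8(s3,s4,c23,s[2]);
mux_df mx9(s670,s671,c23,s[3]);
endmodule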
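The selection network of cond_sum_add can be checked against ordinary addition with a small Python model. This is a sketch under assumptions: fa is taken to return (sum, carry out), mux_df is taken to pass its first input when the select is 0, and bit 0 is assumed to be a plain full adder driven by c0 that produces c1.

```python
def fa(a, b, cin):
    """Full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def mux(in0, in1, sel):
    """2-to-1 multiplexer: in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0


def cond_sum_add(a, b, c0):
    """4-bit conditional-sum adder mirroring the instance names of the listing."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]
    s0, c1 = fa(abit[0], bbit[0], c0)        # bit 0 uses the real carry-in
    s1, c2 = fa(abit[1], bbit[1], 0)
    s2, c3 = fa(abit[1], bbit[1], 1)
    s3, c4 = fa(abit[2], bbit[2], 0)
    s4, c5 = fa(abit[2], bbit[2], 1)
    s5, c6 = fa(abit[3], bbit[3], 0)
    s6, c7 = fa(abit[3], bbit[3], 1)
    c23 = mux(c2, c3, c1)                    # carry out of bit 1
    sum1 = mux(s1, s2, c1)
    c670, s670 = mux(c6, c7, c4), mux(s5, s6, c4)   # carry into bit 2 assumed 0
    c671, s671 = mux(c6, c7, c5), mux(s5, s6, c5)   # carry into bit 2 assumed 1
    cout = mux(c670, c671, c23)
    sum2 = mux(s3, s4, c23)
    sum3 = mux(s670, s671, c23)
    return s0 | (sum1 << 1) | (sum2 << 2) | (sum3 << 3), cout


print(cond_sum_add(0b1001, 0b0111, 0))   # (0, 1): 9 + 7 = 16
```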
Priority Encoder:
• The output of a priority encoder is the binary-encoded position of the leading one in an
input. For example, with an input A = '0000 0101' the leading 1 is in bit position 2
(MSB is bit position 7, LSB is bit position 0) so the output of a 4-bit priority encoder would
be Z = '0010' (2).
• In some cell libraries the encoding is reversed so that the MSB has an output code of
zero; in this case Z = '0101' (5). This second, reversed, encoding scheme is useful in
floating-point arithmetic.
• If A is a mantissa and we normalize A to '1010 0000' we have to subtract 5 from the
exponent; this exponent correction is equal to the output of the (reversed) priority encoder.
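Both encodings can be sketched in a few lines of Python. The function name and the reversed_code flag are hypothetical, and returning None for an all-zero input is an assumption, since the text does not say what the cell outputs in that case.

```python
def priority_encode(a, width=8, reversed_code=False):
    """Position of the leading 1 in a, with bit 0 as the LSB.

    With reversed_code=True the MSB has code 0, so the result is the
    left shift needed to normalize a.
    """
    if a == 0:
        return None                      # assumption: no leading 1 to report
    position = a.bit_length() - 1
    return (width - 1 - position) if reversed_code else position


print(priority_encode(0b00001000))                           # 3
shift = priority_encode(0b00000101, reversed_code=True)
print(shift, format(0b00000101 << shift, '08b'))             # 5 10100000
```

The second call reproduces the normalization case: the reversed code is exactly the shift count, and therefore the amount to subtract from the exponent.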
Accumulator:
• An accumulator is an adder/subtracter and a register. Sometimes these are combined
with a multiplier to form a multiplier/accumulator (MAC).
Incrementer:
• An incrementer adds 1 to the input bus, Z = A + 1, so we can use this function, together
with a register, to negate a twos complement number, for example. The logical implementation is
o Z[ i ] = XOR(A[ i ], CIN[ i ])
o COUT[ i ] = AND(A[ i ], CIN[ i ]).
• The carry-in control input, CIN[0], thus acts as an enable: if it is set to '0' the output is the
same as the input.
• The implementation may invert the odd carry signals. For the even bits:
o Z[ i (even)] = XOR(A[ i ], CIN[ i ])
o COUT[ i (even)] = NAND(A[ i ], CIN[ i ]).
This inverts COUT, so that in the following stage we must invert it again.
• If we push an inverting bubble to the input CIN (so that CIN of an odd bit is the inverted
carry) we find that:
o Z[ i (odd)] = XNOR(A[ i ], CIN[ i ])
o COUT[ i (odd)] = NOR(NOT(A[ i ]), CIN[ i ]).
• In many datapath implementations all odd-bit cells operate on inverted carry signals,
and thus the odd-bit and even-bit datapath elements are different.
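The alternating-polarity carry chain can be checked with a short Python model. This is a sketch under assumptions: the function name is hypothetical, bit 0 is treated as an even cell that receives a true-polarity carry, and each odd cell receives the inverted carry produced by the even cell before it.

```python
def increment(a, enable=1, width=8):
    """Incrementer built from alternating even and odd cells; returns Z."""
    z = 0
    carry = enable                       # CIN[0], true polarity, acts as enable
    for i in range(width):
        a_i = (a >> i) & 1
        if i % 2 == 0:
            z_i = a_i ^ carry                    # XOR(A, CIN)
            carry = 1 - (a_i & carry)            # NAND(A, CIN): inverted carry out
        else:
            z_i = 1 - (a_i ^ carry)              # XNOR(A, CIN) on inverted carry
            carry = 1 - ((1 - a_i) | carry)      # NOR(NOT(A), CIN): true carry out
        z |= z_i << i
    return z


print(increment(0b00001011))             # 12
print(increment(0b00001011, enable=0))   # 11, output equals input
```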
Decrementer:
• A decrementer subtracts 1 from the input bus; the logical implementation is
o Z[ i ] = XOR(A[ i ], CIN[ i ])
o COUT[ i ] = AND(NOT(A[ i ]), CIN[ i ]).
• The implementation may invert the odd carry signals, with CIN[0] again acting as an
enable.
Incrementer/Decrementer:
• An incrementer/decrementer has a second control input that gates the input, inverting
the input to the carry chain. This has the effect of selecting either the increment or
decrement function.
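The decrementer equations and the combined incrementer/decrementer can be sketched together in Python. The function name and the decrement flag (standing in for the second control input) are hypothetical; the model uses true-polarity carries throughout rather than inverting the odd ones.

```python
def inc_dec(a, decrement=False, enable=1, width=8):
    """Z = A + 1 or Z = A - 1 (modulo 2**width), selected by the decrement control."""
    z = 0
    carry = enable                               # CIN[0] acts as an enable
    for i in range(width):
        a_i = (a >> i) & 1
        z |= (a_i ^ carry) << i                  # Z[i] = XOR(A[i], CIN[i])
        # The control input inverts A on its way into the carry chain:
        # AND(A, CIN) when incrementing, AND(NOT(A), CIN) when decrementing
        chain_in = (1 - a_i) if decrement else a_i
        carry = chain_in & carry
    return z


print(inc_dec(0b00010000, decrement=True))   # 15
print(inc_dec(0b00001111))                   # 16
```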
All-Zero Detector & All-One Detector:
• All-zero and all-one detectors are useful because some number systems have two
representations of zero. For a 4-bit number, for example, zero in ones complement arithmetic is
'1111' or '0000', and zero in signed magnitude arithmetic is '1000' or '0000'.
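A minimal Python sketch of how the two detectors are combined to test for zero in each number system. The function names are hypothetical and a 4-bit word is assumed by default.

```python
def all_zero(a, width=4):
    """All-zero detector: true when every bit of a is 0."""
    return (a & ((1 << width) - 1)) == 0


def all_one(a, width=4):
    """All-one detector: true when every bit of a is 1."""
    mask = (1 << width) - 1
    return (a & mask) == mask


def is_zero_ones_complement(a, width=4):
    """Zero is '0000' or '1111' in ones complement."""
    return all_zero(a, width) or all_one(a, width)


def is_zero_signed_magnitude(a, width=4):
    """Zero is '0000' or '1000': all magnitude bits are 0, sign bit ignored."""
    return all_zero(a, width - 1)


print(is_zero_ones_complement(0b1111), is_zero_signed_magnitude(0b1000))   # True True
```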
Register File:
• A register file (or scratchpad memory) is a bank of flip-flops arranged across the bus;
sometimes these have the option of multiple ports (multiport register files) for read and
write.
• Normally these register files are the densest logic and hardest to fit in a datapath.
• For large register files it may be more appropriate to use a multiport memory.
• We can add control logic to a register file to create a first-in first-out register (FIFO), or
last-in first-out register (LIFO).
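As an illustration of the control logic meant here, the Python sketch below wraps a small register bank with read and write pointers to obtain FIFO behaviour. The class name, the depth of 4 and the return conventions are assumptions made for this example only.

```python
class RegisterFileFifo:
    """FIFO built from a bank of registers plus pointer and count logic."""

    def __init__(self, depth=4):
        self.regs = [0] * depth      # the register file itself
        self.depth = depth
        self.read_ptr = 0
        self.write_ptr = 0
        self.count = 0

    def push(self, value):
        """Write one word; returns False when the FIFO is full."""
        if self.count == self.depth:
            return False
        self.regs[self.write_ptr] = value
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.count += 1
        return True

    def pop(self):
        """Read the oldest word; returns None when the FIFO is empty."""
        if self.count == 0:
            return None
        value = self.regs[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.depth
        self.count -= 1
        return value


fifo = RegisterFileFifo()
for word in (3, 1, 4):
    fifo.push(word)
print(fifo.pop(), fifo.pop(), fifo.pop())   # 3 1 4
```

A LIFO would use a single pointer that moves up on a write and down on a read instead of two independent pointers.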
I/O Cells:
Three State Bi-directional output buffer
• When the output enable (OE) signal is high, the circuit functions as a noninverting buffer
driving the value of DATAin onto the I/O pad.
• When OE is low, the output transistors or drivers, M1 and M2, are disconnected. This
allows multiple drivers to be connected on a bus. It is up to the designer to make sure that
a bus never has two active drivers at once, a problem known as contention.
• The opposite problem to contention is a bus floating to an intermediate voltage when there
are no active bus drivers. To prevent it we can use a bus keeper or bus-hold cell (TI calls
this Bus-Friendly logic).
o A bus keeper normally acts like two weak (low drive-strength) cross-coupled
inverters that act as a latch to retain the last logic state on the bus, but the latch is
weak enough that it may be driven easily to the opposite state.
o Even though bus keepers act like latches, and will simulate like latches, they
should not be used as latches, since their drive strength is weak.
Figure: Three-state bi-directional output buffer [When the output enable, OE, is '1' the output
section is enabled and drives the I/O pad. When OE is '0' the output buffer is placed in a
high-impedance state]
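The bus behaviour described above (one enabled driver wins, two enabled drivers are contention, no enabled driver leaves the bus to the keeper) can be summarized in a small Python model. The function name and the (OE, DATAin) tuple representation are assumptions made for illustration.

```python
def resolve_bus(drivers, kept_value):
    """Resolve a shared bus with a bus keeper.

    drivers: list of (oe, data_in) pairs, one per three-state buffer.
    kept_value: last logic state held by the bus keeper.
    """
    active = [data_in for oe, data_in in drivers if oe]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    if active:
        return active[0]     # noninverting buffer drives DATAin onto the bus
    return kept_value        # all drivers high-impedance: keeper retains the state


print(resolve_bus([(0, 1), (1, 0), (0, 1)], kept_value=1))   # 0
print(resolve_bus([(0, 1), (0, 0)], kept_value=1))           # 1
```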
• Transistors M1 and M2 have to drive large off-chip loads. If we wish to change the
voltage on a C = 200 pF load by 5 V in 5 ns (a slew rate of 1 V/ns) we will require a
current in the output transistors of
I = C (dV/dt) = (200 × 10^-12 F)(5 V / 5 × 10^-9 s) = 0.2 A, or 200 mA.
• A clock grid of 100 cm (consisting of 10 by 10 lines running all the way across a 1 cm chip)
presents a load of 20 pF to the clock buffer.
• Some libraries include I/O cells that have passive pull-ups or pull-downs
(resistors) instead of the transistors, M1 and M2 (the resistors are normally still
constructed from transistors with long gate lengths).
• We can also omit one of the driver transistors, M1 or M2, to form open-drain outputs that
require an external pull-up or pull-down.
• We can design the output driver to produce TTL output levels rather than CMOS logic
levels. We may also add input hysteresis (using a Schmitt trigger) to the input buffer to
accept input data signals that contain glitches (from bouncing switch contacts, for example)
or that are slow rising.
• The input buffer can also include a level shifter to accept TTL input levels and shift the
input signal to CMOS levels.
• The gate oxide in CMOS transistors is extremely thin (100 Å or less). This leaves the gate
oxide of the I/O cell input transistors susceptible to breakdown from static electricity
(electrostatic discharge, or ESD).
• ESD arises when we or machines handle the package leads (like the shock I sometimes get
when I touch a doorknob after walking across the carpet at work).
• Sometimes this problem is called electrical overstress (EOS), since most ESD-related
failures are caused not by gate-oxide breakdown but by the thermal stress that occurs when
the n-channel transistor in an output driver overheats (melts) due to the large current that
can flow in the drain diffusion connected to a pad during an ESD event.
Cell Compilers:
• The process of hand crafting circuits and layout for a full-custom IC is a tedious, time-
consuming, and error-prone task.
• There are two types of automated layout assembly tools, often known as silicon
compilers.
o The first type produces a specific kind of circuit, a RAM compiler or multiplier
compiler