7th Sem - ASIC-1st-Module-NOTES
PART - B
• CMOS Logic
• Data path Logic Cells
  o Data Path Elements
  o Adders:
    ▪ Carry skip
    ▪ Carry bypass
    ▪ Carry save
    ▪ Carry select
    ▪ Conditional sum
  o Multiplier (Booth encoding)
• Data path Operators
• I/O cells
• Cell Compilers
TEXT BOOK:
INTRODUCTION TO ASICs
An ASIC (pronounced “a-sick”) is an application-specific integrated circuit: an Integrated Circuit (IC)
designed to perform a specific function for a specific application, as opposed to a standard, general-purpose,
off-the-shelf part such as a commercial microprocessor or a 7400-series IC.
Gate equivalent - a unit of size measurement corresponding to a four-transistor logic gate (e.g., a 2-input
NOR gate).
History of integration:
Integrated circuit is a circuit in which all or some of the circuit elements are inseparably associated and
electrically interconnected to form a complete functional device. Advances in IC technology, primarily
smaller features and larger chips, have allowed the number of transistors in an integrated circuit to double
every two years, a trend known as Moore's law. This increased capacity has been used to decrease cost and
increase functionality. As a result, the various levels of integration (SSI, MSI, LSI, VLSI) emerged, tracking Moore's Law.
History of technology:
1. Bipolar technology
2. Transistor–transistor logic (TTL)
3. Metal-Oxide-Silicon (MOS) technology, which came later because it was at first difficult to make
metal-gate n-channel MOS (nMOS or NMOS)
4. Complementary MOS (CMOS) greatly reduced power.
The feature size is the smallest shape you can make on a chip; in scalable design rules it is often expressed in units of λ (lambda).
Origin of ASICs:
The standard parts, initially used to design microelectronic systems, were gradually replaced
with a combination of glue logic, custom ICs, dynamic random access memory (DRAM) and static
RAM (SRAM).
History of ASICs:
The IEEE Custom Integrated Circuits Conference (CICC) and the IEEE International
ASIC Conference document the development of ASICs.
Application-specific standard products (ASSPs) are a cross between standard parts and ASICs.
Types of ASIC
ICs are made on a wafer. Circuits are built up with successive mask layers.
The number of masks used to define the interconnect and other layers is different between various
categories of ASICs.
1. Full custom ASIC
2. Standard cell based ASIC and Gate Array based ASIC.
3. Programmable ASICs: PLDs (FPGAs, CPLDs, etc.)
1. Full Custom ASIC:
• All mask layers are customized in a full-custom ASIC.
• It only makes sense to design a full-custom IC if there are no libraries available.
• Full-custom offers the highest performance and lowest part cost (smallest die size) with
the disadvantages of increased design time, complexity, design expense, and highest risk.
• Microprocessors were exclusively full-custom, but designers are increasingly turning to
semicustom ASIC techniques in this area too.
• Other examples of full-custom ICs or ASICs are requirements for high-voltage
(automobile), analog/digital (communications), or sensors and actuators.
Design Flow:
Economics of ASICs
We’ll compare the most popular types of ASICs: an FPGA, a masked gate array (MGA), and a cell-based IC (CBIC).
• On a parts-only basis, an FPGA is more expensive per gate than an MGA, which is in turn more
expensive than a CBIC.
• The key is that the fixed cost of a CBIC is higher than that of an MGA, which in turn is higher than
that of an FPGA, because of:
1. Design cost
2. Fabrication cost
In a product cost there are fixed costs and variable costs:
Total Product Cost = Fixed Product Cost + Variable Product Cost × Products Sold
In a product made from parts the total cost for any part is
Total Part Cost = Fixed Part Cost + Variable Cost Per Part × Volume Of Parts
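The cost equations above can be explored with a short Python sketch. The fixed and per-part figures below are illustrative assumptions, not vendor data:

```python
# Break-even analysis between two ASIC options using
# Total Part Cost = Fixed Part Cost + Variable Cost Per Part x Volume.
def total_cost(fixed, var_per_part, volume):
    return fixed + var_per_part * volume

def break_even_volume(fixed_a, var_a, fixed_b, var_b):
    # Volume at which option A (low fixed cost, high per-part cost) and
    # option B (high fixed cost, low per-part cost) cost the same.
    return (fixed_b - fixed_a) / (var_a - var_b)

# Hypothetical numbers: an FPGA has low fixed cost but high per-part cost;
# an MGA is the reverse.
v = break_even_volume(20_000, 40.0, 120_000, 10.0)
print(round(v))  # 3333: above this volume the MGA option is cheaper
```

Below the break-even volume the low-fixed-cost part (FPGA) wins; above it, the higher NRE of the MGA or CBIC is amortized over more parts.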
Break-even graph
The profit model is the linear, deterministic algebraic model used implicitly by most cost
accountants. Starting from profit = sales − costs, it provides a structure for modeling cost
elements such as materials, losses, multi-products, learning, depreciation, etc. It provides a mutable
conceptual base for spreadsheet modelers, enabling them to run deterministic simulations on the
product to see the impact of price, cost, or quantity changes on profitability.
In commerce, time to market (TTM) is the length of time it takes from a product being conceived
until it is available for sale. TTM is important in industries where products become outmoded quickly.
A library of cells is used by the designer to implement the logic functions of an ASIC.
Options for cell library:
1. Use a design kit from the ASIC vendor
• Usually requires the use of ASIC vendor approved tools
• Cells are “phantoms”: empty boxes that get filled in by the vendor when you deliver,
or “hand off,” the netlist
• Vendor may provide more of a “guarantee” that design will work
2. Buy an ASIC-vendor library from a library vendor
• Library vendor is different from fabricator (foundry)
• Library may be approved by the foundry (qualified cell library)
• Allows the designer to own the masks (tooling) for the part when finished
3. You can build your own cell library
• Difficult and costly.
A complete ASIC library (suitable for commercial use) must include the following for each cell
and macro:
• A physical layout
• A behavioral model
• A VHDL or Verilog model
• A detailed timing model
• A test strategy
• A circuit schematic
• A cell icon (symbol)
• A wire-load model
• A routing model
CMOS Logic:
A CMOS transistor (or device) has four terminals: gate, source, drain, and a fourth terminal that we shall
ignore until the next section. A CMOS transistor is a switch. The switch must be conducting or on to allow
current to flow between the source and drain terminals (using open and closed for switches is confusing
for the same reason we say a tap is on and not that it is closed). The transistor source and drain terminals
are equivalent as far as digital signals are concerned, just as we do not worry about labeling the two
terminals of an electrical switch.
We turn a transistor on or off using the gate terminal. There are two kinds of CMOS transistors: n-channel
transistors and p-channel transistors. An n-channel transistor requires a logic '1' (from now on I’ll just say
a '1') on the gate to make the switch conducting (to turn the transistor on). A p-channel transistor requires
a logic '0' (again, from now on I’ll just say a '0') on the gate to make the switch conducting (to turn the
transistor on). The p-channel transistor symbol has a bubble on its gate to remind us that the gate has to
be a '0' to turn the transistor on. All this is shown in (a) and (b). If we connect an n-channel transistor in
series with a p-channel transistor, as shown in Figure (c), we form an inverter.
CMOS logic.
(a) A two-input NAND logic cell. (b) A two-input NOR logic cell. The n-channel and p-channel
transistor switches implement the '1's and '0's of a Karnaugh map.
Other Logics: The AND-OR-INVERT (AOI) and the OR-AND-INVERT (OAI) logic cells are particularly
efficient in CMOS.
Full-adder truth table (inputs A, B, CIN; outputs SUM, COUT):

A  B  CIN | SUM  COUT
0  0   0  |  0     0
0  0   1  |  1     0
0  1   0  |  1     0
0  1   1  |  0     1
1  0   0  |  1     0
1  0   1  |  0     1
1  1   0  |  0     1
1  1   1  |  1     1
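The truth table can be reproduced with a few lines of Python (a study sketch, not part of the text):

```python
# One-bit full adder, matching the truth table above.
def full_adder(a, b, cin):
    s = a ^ b ^ cin                          # SUM is the three-input XOR
    cout = (a & b) | (b & cin) | (a & cin)   # COUT is the majority function
    return s, cout

# Print all eight rows of the truth table.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, *full_adder(a, b, cin))
```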
Data Path Adder:
The basic datapath adder is a ripple-carry adder (RCA).
Ripple Carry Adder:
A ripple-carry adder is a logic circuit in which the carry-out of each full adder is the carry-in of the
next most significant full adder. It is called a ripple-carry adder because each carry bit ripples into
the next stage.
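A minimal bit-level model of a ripple-carry adder (a Python sketch; bit lists are LSB-first):

```python
# A ripple-carry adder: the carry-out of each stage feeds the carry-in
# of the next most significant stage.
def full_adder(a, b, cin):
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)

def ripple_carry_add(a_bits, b_bits, cin=0):
    # a_bits, b_bits: lists of bits, index 0 = LSB. Returns (sum_bits, cout).
    carry = cin
    s = []
    for a, b in zip(a_bits, b_bits):
        bit, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        s.append(bit)
    return s, carry

# 5 + 6 = 11 with 4-bit operands (LSB-first bit lists)
s, cout = ripple_carry_add([1, 0, 1, 0], [0, 1, 1, 0])
print(s, cout)  # [1, 1, 0, 1] 0  (binary 1011 = 11)
```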
The figure above shows typical datapath symbols for an adder (people rarely use the IEEE standards in ASIC
datapath libraries). I use heavy lines (1.5 point wide) with a stroke to denote a data bus (which flows
in the horizontal direction in a datapath), and regular lines (0.5 point) to denote control signals (which
flow vertically in a datapath). At the risk of adding confusion where there is none, this stroke to indicate a
data bus has nothing to do with mixed-logic conventions. For a bus, A[31:0] denotes a 32-bit bus with
A[31] as the leftmost or most significant bit (MSB), and A[0] as the least significant bit (LSB).
Sometimes we shall use A[MSB] or A[LSB] to refer to these bits. Notice that if we have an n-bit bus and
LSB = 0, then MSB = n – 1. Also, for example, A[4] is the fifth bit on the bus (counting from the LSB). We use
an 'S' or 'ADD' inside the symbol to denote an adder instead of '+', so we can attach '–' or '+/–' to the inputs
for a subtracter or adder/subtracter.
Some schematic datapath symbols include only data signals and omit the control signals—but we must not
forget them. In Figure (c), for example, we may need to explicitly tie CIN[0] to VSS and use COUT[MSB]
and COUT[MSB – 1] to detect overflow.
Adders:
We can view addition in terms of generate, G[i], and propagate, P[i], signals:
G[i] = A[i] · B[i]
P[i] = A[i] ⊕ B[i]
SUM[i] = P[i] ⊕ C[i – 1]
C[i] = G[i] + P[i] · C[i – 1]
where C[i] is the carry-out signal from stage i, equal to the carry-in of stage (i + 1). Thus, C[i] = COUT[i]
= CIN[i + 1]. We need to be careful because C[0] might represent either the carry in or the carry out of the
LSB stage. For an adder we set the carry in to the first stage (stage zero), C[–1] or CIN[0], to '0'.
Consider a conventional RCA. The delay of an n-bit RCA is proportional to n and is limited by the
propagation of the carry signal through all of the stages. We can reduce delay by using pairs of “go-faster”
bubbles to change AND and OR gates to fast two-input NAND gates as shown in Figure (a). Alternatively,
we can write the equations for the carry signal in two different ways:
C[i] = A[i] · B[i] + (A[i] ⊕ B[i]) · C[i – 1]
or
C[i] = A[i] · B[i] + (A[i] + B[i]) · C[i – 1]
(the propagate term may use either XOR or OR, because when A[i] = B[i] = '1' the generate term dominates).
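The two common forms of the carry equation use an XOR or an OR propagate term. A brute-force check (a study sketch, assuming those two forms) confirms that they agree for every input combination:

```python
# Verify that the two ways of writing the carry are equivalent:
#   XOR form: C[i] = A[i]&B[i] | (A[i]^B[i]) & C[i-1]
#   OR form:  C[i] = A[i]&B[i] | (A[i]|B[i]) & C[i-1]
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            xor_form = (a & b) | ((a ^ b) & c)
            or_form = (a & b) | ((a | b) & c)
            assert xor_form == or_form
print("both carry forms agree")
```

The two forms differ only when A[i] = B[i] = 1, and in that case the generate term A[i]·B[i] forces the carry to 1 anyway.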
The carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol for a CSA. (d) A four-input
CSA. (e) The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder (RCA) as the final
stage. (f) A pipelined adder. (g) The datapath for the pipelined version showing the pipeline registers as
well as the clock control lines that use m2.
(We can also pipeline the RCA. We add i registers on the A and B inputs before ADD[i] and add
(n – i) registers after the output S[i], with a single register before each C[i].)
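The CSA idea can be sketched in Python: three operands in, a sum word and a carry word out, with no carry propagation inside the cell. Bit lists are LSB-first, and the final carry-propagate step is modeled as ordinary integer addition:

```python
# A carry-save adder (CSA): each bit position is an independent full adder,
# producing a sum word and a carry word with no inter-bit propagation.
def csa(a_bits, b_bits, c_bits):
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, c_bits)]
    carries = [(a & b) | (b & c) | (a & c)
               for a, b, c in zip(a_bits, b_bits, c_bits)]
    return sums, carries

def to_int(bits):
    # bits are LSB-first
    return sum(b << i for i, b in enumerate(bits))

# Reduce three 4-bit numbers (3, 6, 5) to sum + carry words; the carry word
# has weight 2, so a final carry-propagate adder computes s + 2*c.
a, b, c = [1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0]
s, cy = csa(a, b, c)
print(to_int(s) + 2 * to_int(cy))  # 14
```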
The problem with an RCA is that every stage has to wait to make its carry decision, C[i], until the previous
stage has calculated C[i – 1]. If we examine the propagate signals we can bypass this critical path. Thus,
for example, to bypass the carries for bits 4–7 (stages 5–8) of an adder we can compute
BYPASS = P[4] · P[5] · P[6] · P[7]
and, when BYPASS is '1', route C[3] straight to C[7] (otherwise C[7] comes from the ripple chain).
Adders based on this principle are called carry-bypass adders (CBA). Large, custom adders
employ Manchester-carry chains to compute the carries and the bypass operation using TGs or just pass
transistors. These types of carry chains may be part of a predesigned ASIC adder cell, but are not used by
ASIC designers.
Instead of checking the propagate signals we can check the inputs. For example, we can compute
SKIP = (A[i – 1] ⊕ B[i – 1]) · (A[i] ⊕ B[i]) and then use a 2:1 MUX to select C[i]: when SKIP is '1',
C[i] is taken directly from the block's carry-in; otherwise it comes from the ripple chain.
This is a carry-skip adder. Carry-bypass and carry-skip adders may include redundant logic (since the carry
is computed in two different ways; we just take the first signal to arrive). We must be careful that the
redundant logic is not optimized away during logic synthesis.
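A 2-bit carry-skip block can be sketched as follows (Python; the block size and function names are illustrative, not from the text):

```python
# Carry-skip sketch for one 2-bit block: when both bit positions propagate
# (SKIP true), the block carry-in skips straight to the block carry-out.
def mux(sel, d1, d0):
    return d1 if sel else d0

def ripple_carry_out(a1, b1, a0, b0, cin):
    # Plain ripple carry through the two bit positions.
    c0 = (a0 & b0) | ((a0 ^ b0) & cin)
    return (a1 & b1) | ((a1 ^ b1) & c0)

def carry_skip_out(a1, b1, a0, b0, cin):
    skip = (a0 ^ b0) & (a1 ^ b1)          # all positions propagate
    rippled = ripple_carry_out(a1, b1, a0, b0, cin)
    return mux(skip, cin, rippled)        # 2:1 MUX selects the carry

# Exhaustive check: the skip path never changes the result, it only
# arrives earlier in hardware.
for bits in range(32):
    a1, b1, a0, b0, cin = [(bits >> k) & 1 for k in range(5)]
    assert carry_skip_out(a1, b1, a0, b0, cin) == ripple_carry_out(a1, b1, a0, b0, cin)
print("carry-skip matches ripple carry")
```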
If we compute the carries recursively, looking ahead at the carry possibilities, the computation becomes
easier to parallelize. The following equations represent a 4-bit carry-lookahead adder, with C[0] = Cin:
C[1] = G[0] + P[0] · C[0]
C[2] = G[1] + P[1] · G[0] + P[1] · P[0] · C[0]
C[3] = G[2] + P[2] · G[1] + P[2] · P[1] · G[0] + P[2] · P[1] · P[0] · C[0]
C[4] = G[3] + P[3] · G[2] + P[3] · P[2] · G[1] + P[3] · P[2] · P[1] · G[0] + P[3] · P[2] · P[1] · P[0] · C[0]
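The lookahead recursion can be checked with a short sketch. In hardware the recursion is unrolled into product-of-P terms evaluated in parallel; here it is evaluated iteratively for clarity (a study sketch, not the hardware structure):

```python
# Carry-lookahead sketch: compute all carries from G and P, with C[0] = Cin.
def cla_carries(a_bits, b_bits, cin):
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    c = [cin]
    for i in range(len(a_bits)):
        c.append(g[i] | (p[i] & c[i]))            # C[i+1] = G[i] + P[i].C[i]
    return p, c

def cla_add(a_bits, b_bits, cin=0):
    p, c = cla_carries(a_bits, b_bits, cin)
    return [pi ^ ci for pi, ci in zip(p, c)], c[-1]

s, cout = cla_add([1, 1, 1, 1], [1, 0, 0, 0])  # 15 + 1
print(s, cout)  # [0, 0, 0, 0] 1
```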
The Brent–Kung carry-lookahead adder (CLA). (a) Carry generation in a 4-bit CLA. (b) A cell to generate
the lookahead terms, C[0]–C[3]. (c) Cells L1, L2, and L3 are rearranged into a tree that has less delay. Cell
L4 is added to calculate C[2] that is lost in the translation. (d) and (e) Simplified representations of parts a
and c. (f) The lookahead logic for an 8-bit adder. The inputs, 0–7, are the propagate and carry terms formed
from the inputs to the adder. (g) An 8-bit Brent–Kung CLA.
The outputs of the look ahead logic are the carry bits that (together with the inputs) form the sum. One
advantage of this adder is that delays from the inputs to the outputs are more nearly equal than in other
adders. This tends to reduce the number of unwanted and unnecessary switching events and thus reduces
power dissipation.
In a carry-select adder we duplicate two small adders (usually 4-bit or 8-bit adders—often CLAs) for the
cases CIN = '0' and CIN = '1' and then use a MUX to select the case that we need—wasteful, but fast. A
carry-select adder is often used as the fast adder in a datapath library because its layout is regular.
We can use the carry-select, carry-bypass, and carry-skip architectures to split a 12-bit adder, for example,
into three blocks. The delay of the adder is then partly dependent on the delays of the MUX between each
block. Suppose the delay due to 1-bit in an adder block (we shall call this a bit delay) is approximately
equal to the MUX delay. In this case it may be faster to make the blocks 3, 4, and 5 bits long instead of being
equal in size. Now the delays into the final MUX are equal—3 bit-delays plus 2 MUX delays for the carry
signal from bits 0–6 and 5 bit-delays for the carry from bits 7–11. Adjusting the block size reduces the
delay of large adders (more than 16 bits).
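A carry-select adder with fixed-size blocks can be sketched as follows (Python; the block count and widths are illustrative):

```python
# Carry-select sketch: each block computes two sums (for carry-in 0 and 1)
# in parallel, and a MUX picks the right one when the real carry arrives.
def add_block(a, b, cin, width):
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width   # (sum, carry-out)

def carry_select_add(a, b, cin=0, width=4, blocks=2):
    result, shift, carry = 0, 0, cin
    mask = (1 << width) - 1
    for _ in range(blocks):
        s0, c0 = add_block(a & mask, b & mask, 0, width)  # precomputed, CIN=0
        s1, c1 = add_block(a & mask, b & mask, 1, width)  # precomputed, CIN=1
        s, carry = (s1, c1) if carry else (s0, c0)        # the 2:1 MUX
        result |= s << shift
        a, b, shift = a >> width, b >> width, shift + width
    return result, carry

print(carry_select_add(200, 100))  # (44, 1): 300 mod 256 = 44, carry out 1
```

Duplicating each block is wasteful in area, but the blocks all start adding at once instead of waiting for the carry to ripple in.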
We can extend the idea behind a carry-select adder as follows. Suppose we have an n-bit adder that
generates two sums: one sum assumes a carry-in condition of '0', the other assumes a carry-in
condition of '1'. We can split this n-bit adder into an i-bit adder for the i LSBs and an (n – i)-bit adder for
the (n – i) MSBs. Both of the smaller adders generate two conditional sums as well as true and complement
carry signals. The two (true and complement) carry signals from the LSB adder are used to select between
the two (n – i + 1)-bit conditional sums from the MSB adder using 2(n – i + 1) two-input MUXes. This is
a conditional-sum adder (also often abbreviated to CSA). We can apply this technique recursively. For
example, we can split a 16-bit adder using i = 8 and n = 16, then we can split one or both 8-bit adders
again, and so on.
The figure above shows the simplest form of an n-bit conditional-sum adder that uses n single-bit conditional
adders, H (each with four outputs: two conditional sums, a true carry, and a complement carry), together with
a tree of 2:1 MUXes (Qi_j). The conditional-sum adder is usually the fastest of all the adders we have
discussed.
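The recursive splitting can be sketched directly (Python; each call returns both conditional results, and the LSB half's carry performs the MUX selection):

```python
# Conditional-sum sketch: every subrange returns BOTH results, one for
# carry-in 0 and one for carry-in 1; the LSB half's carry-out selects
# which MSB-half result to keep.
def cond_sum(a_bits, b_bits):
    # Returns {cin: (sum_bits, cout)} for cin in (0, 1); bits are LSB-first.
    if len(a_bits) == 1:
        a, b = a_bits[0], b_bits[0]
        return {cin: ([a ^ b ^ cin], (a & b) | ((a ^ b) & cin))
                for cin in (0, 1)}
    i = len(a_bits) // 2
    lo = cond_sum(a_bits[:i], b_bits[:i])
    hi = cond_sum(a_bits[i:], b_bits[i:])
    out = {}
    for cin in (0, 1):
        lo_sum, lo_carry = lo[cin]
        hi_sum, hi_carry = hi[lo_carry]      # the MUX selection
        out[cin] = (lo_sum + hi_sum, hi_carry)
    return out

s, cout = cond_sum([1, 0, 1, 0], [1, 1, 0, 1])[0]   # 5 + 11, cin = 0
print(s, cout)  # [0, 0, 0, 0] 1  (5 + 11 = 16)
```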
Multipliers:
The figure below shows a symmetric 6-bit array multiplier (an n-bit multiplier multiplies two n-bit numbers;
we shall say an n-bit by m-bit multiplier if the lengths are different). Adders a0–f0 may be eliminated, which
then eliminates adders a1–a6, leaving an asymmetric CSA array of 30 (5 × 6) adders (including one half
adder). An n-bit array multiplier has a delay proportional to n plus the delay of the CPA (carry-propagate adder).
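The array multiplier computes the same result as a shift-and-add of partial products: each row of the array adds the multiplicand (ANDed with one multiplier bit) shifted by that bit's weight. A few lines make this concrete:

```python
# Shift-and-add view of an n-bit array multiplier: each row of the array
# adds one partial product (A AND one bit of B) shifted by its weight.
def array_multiply(a, b, n=6):
    product = 0
    for i in range(n):
        if (b >> i) & 1:          # multiplier bit i selects the row
            product += a << i     # partial product row, weighted by 2^i
    return product

print(array_multiply(13, 11))  # 143
```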
There are two items we can attack to improve the performance of a multiplier: the number of partial products and the speed with which we add them.
Suppose we wish to multiply 15 (the multiplicand) by 19 (the multiplier) mentally. It is easier to calculate
15 × 20 and subtract 15. In effect we complete the multiplication as 15 × (20 – 1), and we could write this
as 15 × 2 1̄ , with the overbar representing a minus sign (the digits 2 and 1̄ stand for 20 – 1 = 19). Now
suppose we wish to multiply an 8-bit binary number, A, by B = 00010111 (decimal 16 + 4 + 2 + 1 = 23). It
is easier to multiply A by the canonical signed-digit vector (CSD vector) D = 0 0 1 0 1̄ 0 0 1̄ (decimal
32 – 8 – 1 = 23), since this requires only three add or subtract operations (and a subtraction is as easy as
an addition). We say B has a weight of 4 and D has a weight of 3. By using D instead of B we have reduced
the number of partial products by 1 (= 4 – 3).
We can recode (or encode) any binary number, B, as a CSD vector, D, as follows (canonical means there
is only one CSD vector for any number):
D[i] = B[i] + C[i] – 2C[i + 1]
where C[i + 1] is the carry from the sum B[i + 1] + B[i] + C[i] (we start with C[0] = 0).
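The recoding rule can be applied mechanically (a Python sketch; bit lists are LSB-first and digits take values −1, 0, or 1):

```python
# CSD recoding following D[i] = B[i] + C[i] - 2*C[i+1], where C[i+1] is the
# carry out of B[i+1] + B[i] + C[i] and C[0] = 0.
def csd(b_bits):
    # b_bits: LSB-first binary digits; returns signed digits, LSB-first.
    b = b_bits + [0, 0]                   # pad so B[i+1] exists for every i
    c = [0]
    for i in range(len(b_bits) + 1):
        c.append(int(b[i + 1] + b[i] + c[i] >= 2))
    return [b[i] + c[i] - 2 * c[i + 1] for i in range(len(b_bits) + 1)]

def digits_to_int(d):
    return sum(di << i for i, di in enumerate(d))

d = csd([1, 1, 1, 0, 1, 0, 0, 0])         # B = 00010111 = 23, LSB-first
print(digits_to_int(d), sum(1 for x in d if x))  # value 23, weight 3
```

Recoding B = 23 (weight 4) gives a signed-digit vector of weight 3, matching the worked example above.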
Tree-based multiplication. (a) The portion of the figure above that calculates the sum bit, P5, using a chain
of adders (cells a0–f5). (b) We can collapse this chain to a Wallace tree (cells 5.1–5.5). (c) The stages of
multiplication.
I/O Cells:
A three-state bidirectional output buffer (Tri-State ® is a registered trademark of National Semiconductor).
When the output enable (OE) signal is high, the circuit functions as a noninverting buffer driving the value
of DATAin onto the I/O pad. When OE is low, the output transistors or drivers, M1 and M2, are
disconnected. This allows multiple drivers to be connected on a bus. It is up to the designer to make sure
that a bus never has two active drivers at the same time, a problem known as contention.
In order to prevent the problem opposite to contention—a bus floating to an intermediate voltage when
there are no bus drivers—we can use a bus keeper or bus-hold cell (TI calls this Bus-Friendly logic). A bus
keeper normally acts like two weak (low drive-strength) cross-coupled inverters that act as a latch to retain
the last logic state on the bus, but the latch is weak enough that it may be driven easily to the opposite state.
Even though bus keepers act like latches, and will simulate like latches, they should not be used as latches,
since their drive strength is weak.
The three-state buffer allows us to employ the same pad for input and output (bidirectional I/O). When
we want to use the pad as an input, we set OE low and take the data from DATAin. Of course, it is not
necessary to have all these features on every pad: we can build output-only or input-only pads.
Cell Compiler:
The process of hand crafting circuits and layout for a full-custom IC is a tedious, time-consuming, and
error-prone task.
There are two types of automated layout assembly tools, often known as silicon compilers.
1. The first type produces a specific kind of circuit: a RAM compiler, a multiplier compiler, and so on.
2. The second type of compiler is more flexible, usually providing a programming language
that assembles or tiles layout from an input command file, but this is full-custom IC design.
We can build a register file from latches or flip-flops, but, at 4.5–6.5 gates (18–26 transistors) per bit,
this is an expensive way to build memory. Dynamic RAM (DRAM) can use a cell with only one transistor,
storing charge on a capacitor that has to be periodically refreshed as the charge leaks away. ASIC RAM is
invariably static (SRAM), so we do not need to refresh the bits. When we refer to RAM in an ASIC
environment we almost always mean SRAM. Most ASIC RAMs use a six-transistor cell (four transistors
to form two cross-coupled inverters that form the storage loop, and two more transistors to allow us to read
from and write to the cell). RAM compilers are available that produce single-port RAM (a single shared
bus for read and write) as well as dual-port RAMs and multiport RAMs. In a multiport RAM the compiler
may or may not handle the problem of address contention (attempts to read and write to the same RAM
address simultaneously). RAM can be asynchronous (the read and write cycles are triggered by control
and/or address transitions asynchronous to a clock) or synchronous (using the system clock).
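The kind of behavioral model a model compiler might produce for a single-port synchronous RAM can be sketched as follows (an illustrative Python model with assumed class and method names, not actual compiler output):

```python
# Behavioral sketch of a single-port synchronous SRAM: one shared
# address/data port, sampled on each clock() call; write when we is True,
# otherwise read.
class SinglePortSRAM:
    def __init__(self, depth, width):
        self.mem = [0] * depth            # storage array (no refresh needed)
        self.mask = (1 << width) - 1
        self.dout = 0
    def clock(self, addr, din=0, we=False):
        if we:
            self.mem[addr] = din & self.mask   # write cycle
        else:
            self.dout = self.mem[addr]         # read cycle
        return self.dout

ram = SinglePortSRAM(depth=256, width=8)
ram.clock(addr=7, din=0xAB, we=True)
print(hex(ram.clock(addr=7)))  # 0xab
```

Because there is only one port, a read and a write to the same address cannot collide; a dual-port or multiport model would have to arbitrate that address contention explicitly.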
In addition to producing layout we also need a model compiler so that we can verify the circuit at the
behavioral level, and we need a netlist from a netlist compiler so that we can simulate the circuit and verify
that it works correctly at the structural level. Silicon compilers are thus complex pieces of software. We
assume that a silicon compiler will produce working silicon even if every configuration has not been tested.
This is still ASIC design, but now we are relying on the fact that the tool works correctly and therefore the
compiled blocks are correct by construction.