AICD Notes
EC2V12-ASIC DESIGN
History of integration:
An integrated circuit (IC) is a circuit in which all or some of the circuit elements are inseparably associated and
electrically interconnected to form a complete functional device. Advances in IC technology, primarily
smaller features and larger chips, have allowed the number of transistors in an integrated circuit to double
roughly every two years, a trend known as Moore's law. This increased capacity has been used to decrease cost and
increase functionality. As a result, successive levels of integration emerged, tracking Moore's law.
History of technology:
1. Bipolar technology
2. Transistor–transistor logic (TTL)
3. Metal-oxide-silicon (MOS) technology (early on it was difficult to make metal-gate n-channel MOS, nMOS or NMOS)
4. Complementary MOS (CMOS) greatly reduced power.
The feature size is the smallest shape you can make on a chip and is measured in λ or lambda.
Origin of ASICs:
The standard parts, initially used to design microelectronic systems, were gradually replaced
with a combination of glue logic, custom ICs, dynamic random access memory (DRAM) and static
RAM (SRAM).
ASIC DESIGN Unit - I Introduction to ASICs
History of ASICs:
The IEEE Custom Integrated Circuits Conference (CICC) and IEEE International
ASIC Conference document the development of ASICs
Application-specific standard products (ASSPs) are a cross between standard parts and ASICs.
Types of ASIC
ICs are made on a wafer. Circuits are built up with successive mask layers.
The number of masks used to define the interconnect and other layers is different between various
categories of ASICs.
1. Full custom ASIC
2. Standard cell based ASIC and Gate Array based ASIC.
3. Programmable ASIC: PLDs (FPGA, CPLD etc…)
1. Full Custom ASIC:
All mask layers are customized in a full-custom ASIC.
It only makes sense to design a full-custom IC if there are no libraries available.
Full-custom offers the highest performance and lowest part cost (smallest die size) with
the disadvantages of increased design time, complexity, design expense, and highest risk.
Microprocessors were exclusively full-custom, but designers are increasingly turning to
semicustom ASIC techniques in this area too.
Other examples of full-custom ICs or ASICs are requirements for high-voltage
(automobile), analog/digital (communications), or sensors and actuators.
Design Flow:
CMOS Logic:
A CMOS transistor (or device) has four terminals: gate, source, drain, and a fourth terminal that we shall
ignore until the next section. A CMOS transistor is a switch. The switch must be conducting or on to allow
current to flow between the source and drain terminals (using open and closed for switches is confusing
for the same reason we say a tap is on and not that it is closed ). The transistor source and drain terminals
are equivalent as far as digital signals are concerned—we do not worry about labeling an electrical switch
with two terminals.
We turn a transistor on or off using the gate terminal. There are two kinds of CMOS transistors: n-channel
transistors and p-channel transistors. An n-channel transistor requires a logic '1' (from now on I’ll just say
a '1') on the gate to make the switch conducting (to turn the transistor on). A p-channel transistor requires
a logic '0' (again from now on, I’ll just say a '0') on the gate to make the switch conducting (to turn the
transistor on). The p-channel transistor symbol has a bubble on its gate to remind us that the gate has to
be a '0' to turn the transistor on. All this is shown in Figure (a) and (b). If we connect an n-channel transistor in
series with a p-channel transistor, as shown in Figure (c), we form an inverter.
CMOS logic. (a) A two-input NAND logic cell. (b) A two-input NOR logic cell. The n-channel and p-channel
transistor switches implement the '1's and '0's of a Karnaugh map.
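The switch view of the NAND and NOR cells can be sketched in a few lines of Python. This is only a behavioral model under the assumption that each cell has an n-channel pull-down network to VSS and a complementary p-channel pull-up network to VDD; the function names `nand2` and `nor2` are illustrative and do not come from the notes.

```python
def nand2(a: int, b: int) -> int:
    """Two-input CMOS NAND modelled as two switch networks."""
    pull_down = (a == 1) and (b == 1)  # n-channel switches in series to VSS
    pull_up = (a == 0) or (b == 0)     # p-channel switches in parallel to VDD
    assert pull_up != pull_down        # exactly one network conducts
    return 1 if pull_up else 0


def nor2(a: int, b: int) -> int:
    """Two-input CMOS NOR modelled as two switch networks."""
    pull_down = (a == 1) or (b == 1)   # n-channel switches in parallel to VSS
    pull_up = (a == 0) and (b == 0)    # p-channel switches in series to VDD
    assert pull_up != pull_down        # exactly one network conducts
    return 1 if pull_up else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NAND:", nand2(a, b), "NOR:", nor2(a, b))
```

The internal assertion checks that the two networks are complementary, which is what keeps the output from floating or shorting VDD to VSS.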
Other Logics: The AND-OR-INVERT (AOI) and the OR-AND-INVERT (OAI) logic cells are particularly
efficient in CMOS.
Full adder truth table (inputs A, B, CIN; outputs SUM, COUT):
A  B  CIN  SUM  COUT
0  0  0    0    0
0  0  1    1    0
0  1  0    1    0
0  1  1    0    1
1  0  0    1    0
1  0  1    0    1
1  1  0    0    1
1  1  1    1    1
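The truth table can be reproduced with a small Python sketch. It assumes the usual full-adder equations, SUM = A ⊕ B ⊕ CIN and COUT = A·B + CIN·(A ⊕ B); the function name `full_adder` is illustrative.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (SUM, COUT) for one-bit inputs a, b, and cin."""
    s = a ^ b ^ cin                    # sum is the XOR of all three inputs
    cout = (a & b) | (cin & (a ^ b))   # carry out when at least two inputs are 1
    return s, cout


print("A B CIN SUM COUT")
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            print(a, b, cin, " ", s, " ", cout)
```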
Data Path Adder:
The datapath adder considered here is a ripple-carry adder.
Ripple Carry Adder:
A ripple carry adder is a logic circuit in which the carry-out of each full adder is the carry in of the
succeeding next most significant full adder. It is called a ripple carry adder because each carry bit gets
rippled into the next stage.
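A behavioral sketch of the ripple-carry structure, with bits stored LSB first, may help make the carry chain concrete. The helper names (`to_bits`, `from_bits`, `ripple_carry_add`) are illustrative and not part of the notes.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def to_bits(x: int, n: int) -> list[int]:
    """Split x into n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Rebuild an integer from LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits: list[int], b_bits: list[int], cin: int = 0):
    """Add two equal-length bit vectors; the carry ripples from LSB to MSB."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry out feeds the next stage
        sum_bits.append(s)
    return sum_bits, carry


s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), cout)  # 11 + 6 = 17, i.e. sum bits 1 with carry out 1
```

Because each stage waits for the carry of the previous one, the loop order mirrors the critical path of the hardware.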
The figure above shows typical datapath symbols for an adder (people rarely use the IEEE standards in ASIC
datapath libraries). I use heavy lines (they are 1.5 point wide) with a stroke to denote a data bus (that flows
in the horizontal direction in a datapath), and regular lines (0.5 point) to denote the control signals (that
flow vertically in a datapath). At the risk of adding confusion where there is none, this stroke to indicate a
data bus has nothing to do with mixed-logic conventions. For a bus, A[31:0] denotes a 32-bit bus with
A[31] as the leftmost or most-significant bit (MSB), and A[0] as the least-significant bit (LSB).
Sometimes we shall use A[MSB] or A[LSB] to refer to these bits. Notice that if we have an n-bit bus and
LSB = 0, then MSB = n – 1. Also, for example, A[4] is the fifth bit on the bus (from the LSB). We use a
'Σ' or 'ADD' inside the symbol to denote an adder instead of '+', so we can attach '–' or '+/–' to the inputs
for a subtracter or adder/subtracter.
Some schematic datapath symbols include only data signals and omit the control signals—but we must not
forget them. In Figure (C), for example, we may need to explicitly tie CIN[0] to VSS and use COUT[MSB]
and COUT[MSB – 1] to detect overflow.
Adders:
We can view addition in terms of generate, G[i], and propagate, P[i], signals.
Where C[i] is the carry-out signal from stage i , equal to the carry in of stage (i + 1). Thus, C[i]= COUT[i]
= CIN[i + 1]. We need to be careful because C[0] might represent either the carry in or the carry out of the
LSB stage. For an adder we set the carry in to the first stage (stage zero), C[–1] or CIN[0], to '0'.
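As a sketch of the generate/propagate view, the code below assumes the common definitions G[i] = A[i]·B[i], P[i] = A[i] ⊕ B[i], C[i] = G[i] + P[i]·C[i–1], and S[i] = P[i] ⊕ C[i–1]. These definitions are an assumption for illustration (the notes do not spell them out here), and the function name `add_with_gp` is hypothetical.

```python
def add_with_gp(a_bits: list[int], b_bits: list[int], cin: int = 0):
    """Add LSB-first bit vectors using generate and propagate signals.

    Returns (sum_bits, carries) where carries[i] is C[i], the carry out of stage i.
    """
    carry = cin  # this is C[-1], also called CIN[0]
    sum_bits, carries = [], []
    for a, b in zip(a_bits, b_bits):
        g = a & b                   # generate: the stage creates a carry by itself
        p = a ^ b                   # propagate: the stage passes an incoming carry
        sum_bits.append(p ^ carry)  # S[i] = P[i] xor C[i-1]
        carry = g | (p & carry)     # C[i] = G[i] + P[i].C[i-1]
        carries.append(carry)
    return sum_bits, carries


a, b = 0b1011, 0b0110
a_bits = [(a >> i) & 1 for i in range(4)]
b_bits = [(b >> i) & 1 for i in range(4)]
print(add_with_gp(a_bits, b_bits))
```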
Consider a conventional RCA. The delay of an n-bit RCA is proportional to n and is limited by the
propagation of the carry signal through all of the stages. We can reduce delay by using pairs of “go-faster”
bubbles to change AND and OR gates to fast two-input NAND gates as shown in Figure (a). Alternatively,
we can write the equations for the carry signal in two different ways:
or
The carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol for a CSA. (d) A four-input
CSA. (e) The datapath for a four-input, 4-bit adder using CSAs with a ripple-carry adder (RCA) as the final
stage. (f) A pipelined adder. (g) The datapath for the pipelined version showing the pipeline registers as
well as the clock control lines that use m2.
(We can also pipeline the RCA. We add i registers on the A and B inputs before ADD[ i ] and add
( n – i) registers after the output S[ i ], with a single register before each C[ i ].)
The problem with an RCA is that every stage has to wait to make its carry decision, C[ i ], until the previous
stage has calculated C[ i – 1]. If we examine the propagate signals we can bypass this critical path. Thus,
for example, to bypass the carries for bits 4–7 (stages 5–8) of an adder we can compute
Adders based on this principle are called carry-bypass adders (CBA). Large, custom adders
employ Manchester-carry chains to compute the carries and the bypass operation using TGs or just pass
transistors. These types of carry chains may be part of a predesigned ASIC adder cell, but are not used by
ASIC designers.
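A minimal sketch of the bypass idea for bits 4–7 follows. It assumes a bypass signal of the form BYPASS = P[4]·P[5]·P[6]·P[7] with P[i] = A[i] ⊕ B[i], and a 2:1 MUX that selects C[3] when BYPASS is '1' and the rippled carry otherwise. That form is an assumption made for illustration, and the function name `c7_with_bypass` is hypothetical.

```python
def c7_with_bypass(a: int, b: int, cin: int = 0) -> int:
    """Carry out of bit 7 of an 8-bit adder, with a bypass around bits 4-7."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(8)]  # propagate signals
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(8)]  # generate signals

    carry = cin
    for i in range(4):                   # ripple through bits 0-3 to get C[3]
        carry = g[i] | (p[i] & carry)
    c3 = carry

    for i in range(4, 8):                # slow path: ripple through bits 4-7
        carry = g[i] | (p[i] & carry)
    rippled_c7 = carry

    bypass = p[4] & p[5] & p[6] & p[7]   # all four stages propagate
    return c3 if bypass else rippled_c7  # 2:1 MUX


print(c7_with_bypass(0b11110001, 0b00001111))  # bits 4-7 all propagate, so C[7] = C[3]
```

In hardware both MUX inputs are computed in parallel and the bypass path simply arrives first; the sequential code only checks that the two paths agree on the value.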
Instead of checking the propagate signals we can check the inputs. For example we can compute
SKIP = (A[ i – 1] ⊕ B[ i – 1]) + (A[ i] ⊕ B[ i ] ) and then use a 2:1 MUX to select C[ i ]. Thus,
This is a carry-skip adder. Carry-bypass and carry-skip adders may include redundant logic (since the carry
is computed in two different ways—we just take the first signal to arrive). We must be careful that the
redundant logic is not optimized away during logic synthesis.
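If we expand the carry recursion so that every carry is written directly in terms of the generate and propagate
signals and the carry in, all of the carries can be computed in parallel (looked ahead) instead of rippling from
stage to stage. The following equations represent a carry-lookahead adder for 4 bits, with C[0] = Cin.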
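A sketch of 4-bit carry lookahead in Python is given below. It assumes G[i] = A[i]·B[i], P[i] = A[i] ⊕ B[i], and the indexing in which C[0] is the carry in and C[i+1] is the carry out of bit i; the expanded product terms are the standard lookahead expansion, stated here as an assumption. The function name `cla4` is illustrative.

```python
def cla4(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """4-bit carry-lookahead addition; returns (4-bit sum, carry out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]

    # Every carry depends only on G, P, and C[0]; none waits for another carry.
    c0 = cin
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))  # S[i] = P[i] xor C[i]
    return total, c4


print(cla4(0b1011, 0b0110))  # 11 + 6 = 17, i.e. (1, 1)
```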
The Brent–Kung carry-lookahead adder (CLA). (a) Carry generation in a 4-bit CLA. (b) A cell to generate
the lookahead terms, C[0]–C[3]. (c) Cells L1, L2, and L3 are rearranged into a tree that has less delay. Cell
L4 is added to calculate C[2] that is lost in the translation. (d) and (e) Simplified representations of parts a
and c. (f) The lookahead logic for an 8-bit adder. The inputs, 0–7, are the propagate and carry terms formed
from the inputs to the adder. (g) An 8-bit Brent–Kung CLA.
The outputs of the look ahead logic are the carry bits that (together with the inputs) form the sum. One
advantage of this adder is that delays from the inputs to the outputs are more nearly equal than in other
adders. This tends to reduce the number of unwanted and unnecessary switching events and thus reduces
power dissipation.
In a carry-select adder we duplicate two small adders (usually 4-bit or 8-bit adders—often CLAs) for the
cases CIN = '0' and CIN = '1' and then use a MUX to select the case that we need—wasteful, but fast. A
carry-select adder is often used as the fast adder in a datapath library because its layout is regular.
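The carry-select idea can be sketched behaviorally as follows. The block width of 4 and total width of 16 are example values, and the names `add_block` and `carry_select_add` are illustrative; each block's internal adder is modelled with integer addition rather than a particular adder circuit.

```python
def add_block(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Model of one small adder block: returns (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, n: int = 16, block: int = 4, cin: int = 0):
    """n-bit carry-select addition using blocks of `block` bits."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)  # duplicate adder for CIN = '0'
        s1, c1 = add_block(a_blk, b_blk, 1, block)  # duplicate adder for CIN = '1'
        s, carry = (s1, c1) if carry else (s0, c0)  # MUX picks the case we need
        result |= s << lo
    return result, carry


print(carry_select_add(0x0FFF, 0x0001))  # (4096, 0)
```

Both candidate results of a block are available before the incoming carry arrives, so only the MUX delay is added per block.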
We can use the carry-select, carry-bypass, and carry-skip architectures to split a 12-bit adder, for example,
into three blocks. The delay of the adder is then partly dependent on the delays of the MUX between each
block. Suppose the delay due to 1 bit in an adder block (we shall call this a bit delay) is approximately
equal to the MUX delay. In this case it may be faster to make the blocks 3, 4, and 5 bits long instead of being
equal in size. Now the delays into the final MUX are equal: 3 bit-delays plus 2 MUX delays for the carry
signal from bits 0–6 and 5 bit-delays for the carry from bits 7–11. Adjusting the block size reduces the
delay of large adders (more than 16 bits).
We can extend the idea behind a carry-select adder as follows. Suppose we have an n -bit adder that
generates two sums: One sum assumes a carry-in condition of '0', the other sum assumes a carry-in
condition of '1'. We can split this n -bit adder into an i -bit adder for the i LSBs and an ( n – i ) bit adder for
the n – i MSBs. Both of the smaller adders generate two conditional sums as well as true and complement
carry signals. The two (true and complement) carry signals from the LSB adder are used to select between
the two ( n– i + 1) bit conditional sums from the MSB adder using 2( n – i + 1) two-input MUXes. This is
a conditional-sum adder (also often abbreviated to CSA). We can recursively apply this technique. For
example, we can split a 16-bit adder (n = 16) using i = 8 into two 8-bit adders, then we can split one or both
8-bit adders again, and so on.
Figure above shows the simplest form of an n -bit conditional-sum adder that uses n single-bit conditional
adders, H (each with four outputs: two conditional sums, true carry, and complement carry), together with
a tree of 2:1 MUXes (Qi_j). The conditional-sum adder is usually the fastest of all the adders we have
discussed.
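The recursive splitting can be sketched in Python. This is a behavioral model, not the gate-level H cells and Qi_j MUXes of the figure: each call returns the sum and carry for both carry-in conditions, and the upper half is selected by the carry of the lower half. The function name `conditional_sum` and the LSB-first bit lists are illustrative choices.

```python
def conditional_sum(a_bits: list[int], b_bits: list[int]):
    """Return ((sum0, carry0), (sum1, carry1)) for carry in '0' and '1'.

    Bit lists are LSB first and must have equal, nonzero length.
    """
    n = len(a_bits)
    if n == 1:
        a, b = a_bits[0], b_bits[0]
        # Single-bit conditional adder: results for both carry-in assumptions.
        return ([a ^ b], a & b), ([a ^ b ^ 1], a | b)

    i = n // 2
    lo0, lo1 = conditional_sum(a_bits[:i], b_bits[:i])  # i LSBs
    hi0, hi1 = conditional_sum(a_bits[i:], b_bits[i:])  # n - i MSBs

    def select(lo):
        lo_sum, lo_carry = lo
        hi_sum, hi_carry = hi1 if lo_carry else hi0     # MUX on the LSB-block carry
        return lo_sum + hi_sum, hi_carry

    return select(lo0), select(lo1)


def to_bits(x: int, n: int) -> list[int]:
    return [(x >> k) & 1 for k in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


(sum0, c0), (sum1, c1) = conditional_sum(to_bits(200, 8), to_bits(100, 8))
print(from_bits(sum0), c0, from_bits(sum1), c1)
```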
Multipliers:
Figure below shows a symmetric 6-bit array multiplier (an n -bit multiplier multiplies two n -bit numbers;
we shall use n -bit by m -bit multiplier if the lengths are different). Adders a0–f0 may be eliminated, which
then eliminates adders a1–a6, leaving an asymmetric CSA array of 30 (5 × 6) adders (including one half
adder). An n -bit array multiplier has a delay proportional to n plus the delay of the CPA.
There are two items we can attack to improve the performance of a multiplier: the number of partial products and
the addition of the partial products.
Suppose we wish to multiply 15 (the multiplicand) by 19 (the multiplier) mentally. It is easier to calculate
15 × 20 and subtract 15. In effect we complete the multiplication as 15 × (20 – 1) and we could write this
as 15 × 2 1̄ (the digits 2 and 1̄), with the overbar representing a minus sign. Now suppose we wish to multiply an 8-bit binary
number, A, by B = 00010111 (decimal 16 + 4 + 2 + 1 = 23). It is easier to multiply A by the canonical
signed-digit vector (CSD vector) D = 0 0 1 0 1̄ 0 0 1̄ (decimal 32 – 8 – 1 = 23) since this requires only three
add or subtract operations (and a subtraction is as easy as an addition). We say B has a weight of 4 and D
has a weight of 3. By using D instead of B we have reduced the number of partial products by 1 (= 4 – 3).
We can recode (or encode) any binary number, B, as a CSD vector, D, as follows (canonical means there
is only one CSD vector for any number):
$$D_i = B_i + C_i - 2C_{i+1}$$
where C_{i+1} is the carry from the sum B_{i+1} + B_i + C_i (we start with C_0 = 0).
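The recoding rule can be tried out with a short Python sketch. It applies D_i = B_i + C_i – 2C_{i+1} with C_0 = 0, treating "the carry from the sum" as 1 when B_{i+1} + B_i + C_i is at least 2. The function name `csd_recode` and the choice to emit one digit more than the bit length of B (so the final carry is always absorbed) are illustrative assumptions.

```python
def csd_recode(b: int) -> list[int]:
    """Recode a non-negative integer as CSD digits in {-1, 0, 1}, LSB first."""
    n = b.bit_length() + 1                          # one extra digit absorbs the last carry
    bits = [(b >> i) & 1 for i in range(n + 1)]     # padded so that B[i+1] always exists
    digits = []
    carry = 0                                       # C_0 = 0
    for i in range(n):
        next_carry = 1 if bits[i + 1] + bits[i] + carry >= 2 else 0
        digits.append(bits[i] + carry - 2 * next_carry)  # D_i = B_i + C_i - 2C_{i+1}
        carry = next_carry
    return digits


d = csd_recode(0b00010111)  # B = 23
print(d)                    # [-1, 0, 0, -1, 0, 1], i.e. 32 - 8 - 1
print(sum(digit << i for i, digit in enumerate(d)))  # 23
print(sum(1 for digit in d if digit != 0))           # weight 3
```

Each nonzero digit corresponds to one add or subtract of a shifted copy of the multiplicand, so the weight is the number of partial products.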
Tree-based multiplication. (a) The portion of above Figure that calculates the sum bit, P 5 , using a chain
of adders (cells a0–f5). (b) We can collapse this chain to a Wallace tree (cells 5.1–5.5). (c) The stages of
multiplication.
Two choices for sequential logic: multiphase clocks or synchronous design. We choose
the latter.
[Figure: A CMOS latch and a CMOS flip-flop. The flip-flop consists of a master latch and a slave latch. Labels in the figure: input D, internal nodes M and S, outputs Q and QN, inverters I1–I9, clock signals CLKN and CLKP derived from CLK, and symbol pins 1D and C1.]
CMOS flip-flop, key terms:
• master latch
• slave latch
• active clock edge
• negative-edge–triggered flip-flop
• setup time (tSU)
• hold time (tH)
• clock-to-Q propagation delay (tPD)
• decision window
Transistors as Resistors
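[Figure: A CMOS inverter (n-channel transistor m1, p-channel transistor m2) with input in1 and output out1, driving a load Cout with parasitic output capacitance Cp. Panels (a)–(c) show the input v(in1) stepping to VDD at t' = 0, the output v(out1) falling from VDD through 0.5VDD to 0.35VDD while m1 goes from off through saturation to the linear region, and the equivalent circuits with pull-up resistance Rpu and pull-down resistance Rpd.]

The output falls as v(out1) = VDD exp[–t' / (Rpd (Cp + Cout))]. The falling propagation delay tPDf is the time at which the output reaches 0.35VDD:

$$0.35\,V_{DD} = V_{DD} \exp\left(\frac{-t_{PDf}}{R_{pd}\,(C_{out} + C_p)}\right)$$

so that tPDf ≈ Rpd (Cp + Cout).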
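The approximation tPDf ≈ Rpd (Cp + Cout) follows from solving the exponential for the 0.35VDD trip point, since ln(1/0.35) ≈ 1.05. A short Python check is sketched below; the resistance and capacitance values are made-up examples, not values from the notes, and the function name `fall_delay` is illustrative.

```python
import math


def fall_delay(r_pd: float, c_out: float, c_p: float, trip: float = 0.35) -> float:
    """Time for an RC discharge from VDD to trip * VDD."""
    # trip = exp(-t / (R * C))  =>  t = -R * C * ln(trip)
    return -r_pd * (c_out + c_p) * math.log(trip)


# Hypothetical example values: 1 kilohm pull-down, 0.1 pF load, 0.02 pF parasitic.
r_pd, c_out, c_p = 1e3, 0.1e-12, 0.02e-12
exact = fall_delay(r_pd, c_out, c_p)
approx = r_pd * (c_out + c_p)
print(f"exact = {exact:.3e} s, RC approximation = {approx:.3e} s")
print(f"ratio = {exact / approx:.4f}")
```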
[Figure: (a) v(out1)/V versus v(in1)/V, both axes from 0 to 3 V, showing the equilibrium path and the nonequilibrium path. (b) max(IDSn, –IDSp)/mA (axis from 0.0 to 0.4) versus v(in1)/V (0 to 3 V), showing the equilibrium path along which IDSn = –IDSp.]
[Figure: Transistor parasitic capacitances for an n-channel transistor, panels (a)–(h), in layout and cross-section views. Labelled quantities: gate capacitances CGS, CGD, and CGB; overlap capacitances CGSOV, CGDOV, and CGBOV; junction capacitances CBSJ, CBSSW, CBDJ, and CBDSW, plus the channel-edge terms CBSJGATE and CBDJGATE; dimensions W, L, WEFF, LEFF, LD, Tox, and TFOX; source area AS and perimeter PS. The totals are CBD = CBDJ + CBDSW + CBDJGATE and CBS = CBSJ + CBSSW + CBSJGATE.]
edge) 10–16 F
Capacitance | Definition | Value
CBS | CBS = CBSJ + CBSSW | CBS = 4.032 × 10^-15 + 4.2 × 10^-16 = 4.45 × 10^-15 F
CBSJ | CBSJ = AS CJ (1 + VSB/φB)^(-mJ) | AS CJ = (7.2 × 10^-12)(5.6 × 10^-4) = 4.03 × 10^-15 F
CBSSW | CBSSW = PS CJSW (1 + VSB/φB)^(-mJSW) | PS CJSW = (8.4 × 10^-6)(5 × 10^-11) = 4.2 × 10^-16 F
CGSOV | CGSOV = WEFF CGSO; WEFF = W – 2WD | CGSOV = (6 × 10^-6)(3 × 10^-10) = 1.8 × 10^-15 F
CGDOV | CGDOV = WEFF CGDO | CGDOV = (6 × 10^-6)(3 × 10^-10) = 1.8 × 10^-15 F
CGBOV | CGBOV = LEFF CGBO; LEFF = L – 2LD | CGBOV = (0.5 × 10^-6)(4 × 10^-10) = 2 × 10^-16 F
CGS | CGS/CO = 0 (off), 0.5 (lin.), 0.66 (sat.); CO (oxide capacitance) = WEFF LEFF εox / Tox | CO = (6 × 10^-6)(0.5 × 10^-6)(0.00345) = 1.03 × 10^-14 F; CGS = 0.0 F
CGD | CGD/CO = 0 (off), 0.5 (lin.), 0 (sat.) | CGD = 0.0 F
CGB | CGB = 0 (on), = CO in series with CS (off) | CGB = 3.88 × 10^-15 F; CS = depletion capacitance
SPICE input:
.MODEL CMOSN NMOS LEVEL=3 PHI=0.7 TOX=10E-09 XJ=0.2U TPG=1 VTO=0.65 DELTA=0.7
+ LD=5E-08 KP=2E-04 UO=550 THETA=0.27 RSH=2 GAMMA=0.6 NSUB=1.4E+17 NFS=6E+11
+ VMAX=2E+05 ETA=3.7E-02 KAPPA=2.9E-02 CGDO=3.0E-10 CGSO=3.0E-10 CGBO=4.0E-10
+ CJ=5.6E-04 MJ=0.56 CJSW=5E-11 MJSW=0.52 PB=1
m1 out1 in1 0 0 cmosn W=6U L=0.6U AS=7.2P AD=7.2P PS=8.4U PD=8.4U
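The zero-bias capacitances can be recomputed from the model and instance parameters with a few lines of Python, as a cross-check on the hand calculation. The sketch assumes SI units, uses W directly as WEFF (no width reduction), and takes εox = 3.45 × 10^-11 F/m, a value assumed here so that εox/Tox = 0.00345 F/m². The function name `m1_capacitances` is illustrative.

```python
def m1_capacitances() -> dict[str, float]:
    """Zero-bias parasitic capacitances (in farads) for transistor m1."""
    # Instance parameters (W=6U L=0.6U AS=7.2P PS=8.4U)
    w, l = 6e-6, 0.6e-6
    area_s, perim_s = 7.2e-12, 8.4e-6
    # Model parameters
    ld, tox = 5e-8, 10e-9
    cj, cjsw = 5.6e-4, 5e-11
    cgso, cgdo, cgbo = 3.0e-10, 3.0e-10, 4.0e-10
    eps_ox = 3.45e-11                 # F/m, assumed oxide permittivity

    l_eff = l - 2 * ld                # LEFF = L - 2LD
    c_bsj = area_s * cj               # junction area term at VSB = 0
    c_bssw = perim_s * cjsw           # sidewall term at VSB = 0
    return {
        "CBSJ": c_bsj,
        "CBSSW": c_bssw,
        "CBS": c_bsj + c_bssw,
        "CGSOV": w * cgso,
        "CGDOV": w * cgdo,
        "CGBOV": l_eff * cgbo,
        "CO": w * l_eff * eps_ox / tox,
    }


for name, value in m1_capacitances().items():
    print(f"{name:6s} = {value:.3e} F")
```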
Junction Capacitance
• Junction capacitances, CBD and CBS, consist of two parts: junction area and sidewall
• The two parts have different physical characteristics, with parameters CJ and MJ
for the junction area, CJSW and MJSW for the sidewall, and PB common to both
• CBD and CBS depend on the voltage across the junction (VDB and VSB)
• The sidewalls facing the channel (CBSJGATE and CBDJGATE) are different from the sidewalls that face the field
• It is a mistake to exclude the gate edge on the assumption that it is in the rest of the model; it is not
• In HSPICE there is a separate mechanism to account for the channel edge capacitance (using parameters ACM and CJGATE)
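The voltage dependence of the junction capacitance can be sketched as below, assuming the form C = A·CJ·(1 + V/PB)^(–MJ) + P·CJSW·(1 + V/PB)^(–MJSW) with V the reverse bias across the junction. The default parameter values are the CJ, MJ, CJSW, MJSW, and PB values quoted in these notes; the channel-edge term is deliberately left out, and the function name `junction_cap` is illustrative.

```python
def junction_cap(area: float, perimeter: float, v_reverse: float,
                 cj: float = 5.6e-4, mj: float = 0.56,
                 cjsw: float = 5e-11, mjsw: float = 0.52,
                 pb: float = 1.0) -> float:
    """Junction capacitance in farads (area plus sidewall, no channel-edge term)."""
    bottom = area * cj * (1 + v_reverse / pb) ** (-mj)             # junction area part
    sidewall = perimeter * cjsw * (1 + v_reverse / pb) ** (-mjsw)  # sidewall part
    return bottom + sidewall


# Source junction of m1 (AS = 7.2 pm^2, PS = 8.4 um) at a few reverse biases.
for v in (0.0, 1.0, 3.0):
    print(f"VSB = {v:.1f} V: CBS = {junction_cap(7.2e-12, 8.4e-6, v):.3e} F")
```

The capacitance falls as the reverse bias grows, because the depletion region widens.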
Overlap Capacitance
• The overlap capacitance calculations for CGSOV and CGDOV account for lateral diffusion
• SPICE parameter LD=5E-08 or LD = 0.05 µm
• Not all SPICE versions use the equivalent parameter for width reduction, WD, in calculating CGDOV
• Not all SPICE versions subtract WD to form WEFF
Gate Capacitance
• The gate capacitance depends on the operating region
• The gate–source capacitance CGS varies from zero (off) to 0.5CO in the linear region to
(2/3)CO in the saturation region
• The gate–drain capacitance CGD varies from zero (off) to 0.5CO (linear region) and
back to zero (saturation region)
• The gate–bulk capacitance CGB is two capacitors in series: the fixed gate-oxide capacitance, CO, and the variable depletion capacitance, CS
• As the transistor turns on, the channel shields the bulk from the gate, and CGB falls to
zero
• Even with VGS = 0 V, the depletion width under the gate is finite and thus CGB is less than
CO
Logical Effort
We extend the prop–ramp model with a “catch all” term, tq, that includes:
• delay due to internal parasitic capacitance
• the time for the input to reach the switching threshold of the cell
• the dependence of the delay on the slew rate of the input waveform
$$t_{PD} = RC\,\frac{C_{out}}{C_{in}} + RC_p + s\,t_q$$
The time constant tau, τ = Rinv Cinv , is a basic property of any CMOS technology
The delay equation is the sum of three terms, d = f + p + q or
delay = effort delay + parasitic delay + nonideal delay
The effort delay f is the product of logical effort, g, and electrical effort, h: f = gh
Thus, delay = logical effort × electrical effort + parasitic delay + nonideal delay
• R and C will change as we scale a logic cell, but the RC product stays the same
• Logical effort is independent of the size of a logic cell
• We can find logical effort by scaling a logic cell to have the same drive as a 1X
minimum-size inverter
• Then the logical effort, g, is the ratio of the input capacitance, Cin, of the 1X logic cell to
Cinv
[Figure: Finding the logical effort of a two-input NAND cell in three steps. (1) Measure the input capacitance of a minimum-size inverter (p-channel 2/1, n-channel 1/1): Cinv = 2 + 1 = 3 units of gate capacitance. (2) Make the cell have the same drive strength as a minimum-size inverter (all transistors 2/1). (3) Measure the ratio of the cell input capacitance to that of a minimum-size inverter: Cin = 2 + 2 = 4, so g = Cin/Cinv = 4/3.]
Logical effort
• For a two-input NAND cell, the logical effort, g = 4/3
(a) Find the input capacitance, Cinv, looking into the input of a minimum-size inverter in
terms of the gate capacitance of a minimum-size device
(b) Size a logic cell to have the same drive strength as a minimum-size inverter (assuming
a logic ratio of 2). The input capacitance looking into one of the logic-cell terminals is then
Cin
(c) The logical effort of a cell is Cin/ Cinv
The electrical effort h depends only on the load capacitance Cout connected to the output of the logic
cell and the input capacitance of the logic cell, Cin; thus h = Cout / Cin.
Cell effort, parasitic delay, and nonideal delay (in units of τ) for single-stage CMOS cells
Cell | Cell effort (logic ratio = 2) | Cell effort (logic ratio = r) | Parasitic delay/τ | Nonideal delay/τ
inverter | 1 (by definition) | 1 (by definition) | pinv (by definition) | qinv (by definition)
n-input NAND | (n+2)/3 | (n+r)/(r+1) | n pinv | n qinv
n-input NOR | (2n+1)/3 | (nr+1)/(r+1) | n pinv | n qinv
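The cell-effort formulas in the table are easy to tabulate. The sketch below uses exact fractions so that results such as 4/3 print as in the notes; the function names `nand_effort` and `nor_effort` are illustrative.

```python
from fractions import Fraction


def nand_effort(n: int, r=2) -> Fraction:
    """Logical effort of an n-input NAND for logic ratio r: (n + r)/(r + 1)."""
    return Fraction(n + r, r + 1)


def nor_effort(n: int, r=2) -> Fraction:
    """Logical effort of an n-input NOR for logic ratio r: (nr + 1)/(r + 1)."""
    return Fraction(n * r + 1, r + 1)


for n in (2, 3, 4):
    print(f"{n}-input NAND g = {nand_effort(n)}, {n}-input NOR g = {nor_effort(n)}")

# A non-integer logic ratio can be passed as a Fraction, for example r = 1.5:
print(nor_effort(3, Fraction(3, 2)))  # 11/5, i.e. 2.2
```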
$$d = gh + p + q = \frac{0.3 \times 10^{-12}}{(2)(0.036 \times 10^{-12})} + (3)(1) + (3)(1.7) = 4.1666667 + 3 + 5.1 = 12.266667\,\tau$$
This is equivalent to an absolute delay, tPD ≈ 12.3 × 0.06 ns = 0.74 ns.
The delay for a 2X drive, three-input NOR logic cell is tPD = (0.03 + 0.72Cout + 0.60) ns
With Cout =0.3pF, tPD = 0.03 + (0.72)·(0.3) + 0.60 = 0.846 ns compared to our prediction of
0.74ns
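The arithmetic of this prediction can be reproduced in Python. The sketch reads the numbers in the notes as follows, which is an interpretation rather than something stated explicitly: Cinv = 0.036 pF, τ = 0.06 ns, pinv = 1, qinv = 1.7, and gh = Cout / (drive × Cinv) because the input capacitance of a cell scales as drive × g × Cinv, so g cancels. The function name `logical_effort_delay` is illustrative.

```python
def logical_effort_delay(c_out: float, drive: float, n_inputs: int,
                         c_inv: float = 0.036e-12, p_inv: float = 1.0,
                         q_inv: float = 1.7, tau: float = 0.06e-9):
    """Return (delay in units of tau, delay in seconds) for an n-input NAND or NOR."""
    gh = c_out / (drive * c_inv)  # effort delay; g cancels since Cin = drive * g * Cinv
    p = n_inputs * p_inv          # parasitic delay, n * pinv
    q = n_inputs * q_inv          # nonideal delay, n * qinv
    d = gh + p + q
    return d, d * tau


d, t_pd = logical_effort_delay(c_out=0.3e-12, drive=2, n_inputs=3)
print(f"d = {d:.3f} tau, tPD = {t_pd * 1e9:.2f} ns")

# Data-book style equation quoted in the notes, with Cout in pF:
print(f"data-book tPD = {0.03 + 0.72 * 0.3 + 0.60:.3f} ns")
```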
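[Figure: Transistor-level schematics of the OAI221 and AOI221 logic cells, with inputs A, B, C, D, E and output Z. In the OAI221 cell the p-channel transistors are 4/1 (inputs A–D) and 2/1 (input E) and the n-channel transistors are 3/1; in the AOI221 cell the p-channel transistors are 6/1.]
An OAI221 logic cell
• Logical-effort vector g = (7/3, 7/3, 5/3)
• The logical area is 33 logical squares
An AOI221 logic cell
• Logical-effort vector g = (8/3, 8/3, 7/3)
• The logical area is 39 logical squares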
Logical Paths
Multistage Cells
[Figure: (a) Delay d1 along a path through a multistage AOI221 cell (inputs A1, A2, B1, B2, C; output ZN; load CL). Stage parameters: g0 = 1, p0 = 1, q0 = 1.7, h0 = 1.4; g1 = (2, 1.6), p1 = 3, q1 = 5.4; g2 = 1.4, p2 = 2, q2 = 3.4, h2 = 1.0; g3 = 1.4, p3 = 2, q3 = 3.4, h3 = 0.7; g4 = 1, p4 = 1, q4 = 1.7, h4 = CL.]

$$d_1 = (g_0 h_0 + p_0 + q_0) + (g_2 h_2 + p_2 + q_2) + (g_3 h_3 + p_3 + q_3) + (g_4 h_4 + p_4 + q_4)$$
$$d_1 = (1 \times 1.4 + 1 + 1.7) + (1.4 \times 1 + 2 + 3.4) + (1.4 \times 0.7 + 2 + 3.4) + (1 \times C_L + 1 + 1.7) \approx 20 + C_L$$
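The path delay d1 is a sum of per-stage terms g·h + p + q, which can be checked numerically. The sketch below uses the stage values quoted for d1 (stages 0, 2, 3, and 4); the names `path_delay`, `FIXED_STAGES`, and `d1` are illustrative.

```python
def path_delay(stages) -> float:
    """Sum of stage delays; each stage is a tuple (g, h, p, q) in units of tau."""
    return sum(g * h + p + q for g, h, p, q in stages)


# (g, h, p, q) for stages 0, 2, and 3 of the d1 path
FIXED_STAGES = [
    (1.0, 1.4, 1.0, 1.7),
    (1.4, 1.0, 2.0, 3.4),
    (1.4, 0.7, 2.0, 3.4),
]


def d1(c_l: float) -> float:
    """Delay d1 with the last stage (g4 = 1, p4 = 1, q4 = 1.7) driving h4 = CL."""
    return path_delay(FIXED_STAGES + [(1.0, c_l, 1.0, 1.7)])


for c_l in (0.0, 1.0, 4.0):
    print(f"CL = {c_l}: d1 = {d1(c_l):.2f} tau")
```

The constant part sums to 19.98, which the notes round to 20.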
Path electrical effort:
$$H = \prod_{i \in \text{path}} h_i = \frac{C_{out}}{C_{in}}$$
Cout is the load and Cin is the first input capacitance on the path
Path effort: F = GH
$$P + Q = \sum_{i \in \text{path}} (p_i + q_i)$$