DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING
LECTURE NOTES
EC8095-VLSI DESIGN
(2017 Regulation)
Year/Semester: III/VI ECE
Prepared by
Mrs. P. Vijayasri, Assistant Professor
Department of ECE
Syllabus:
EC8095 VLSI DESIGN SYLLABUS REGULATION 2017
UNIT I  INTRODUCTION TO MOS TRANSISTOR  9
MOS Transistor, CMOS Logic, Inverter, Pass Transistor, Transmission Gate, Layout Design Rules, Gate Layouts, Stick Diagrams, Long-Channel I-V Characteristics, C-V Characteristics, Non-ideal I-V Effects, DC Transfer Characteristics, RC Delay Model, Elmore Delay, Linear Delay Model, Logical Effort, Parasitic Delay, Delay in Logic Gates, Scaling.
UNIT II  COMBINATIONAL MOS LOGIC CIRCUITS  9
Circuit Families: Static CMOS, Ratioed Circuits, Cascode Voltage Switch
Logic, Dynamic Circuits, Pass Transistor Logic, Transmission Gates, Domino,
Dual Rail Domino, CPL, DCVSPG, DPL, Circuit Pitfalls. Power: Dynamic Power,
Static Power, Low Power Architecture.
INDEX
3.5 Pipelining
3.6 Schmitt Trigger
3.7 Monostable Sequential Circuits
3.8 Astable Sequential Circuits
3.9 Timing Issues: Timing Classification of Digital Systems
3.10 Synchronous Design
2 Marks Questions & Answers
16 Marks Questions
UNIT I
MOS TRANSISTOR
PRINCIPLE
REFERRED BOOK:
TEXT BOOKS:
Fig. 1.1 (a) nMOS transistor and (b) pMOS transistor
Each transistor consists of a conducting gate, an insulating layer of silicon dioxide, and a substrate (also called the body or bulk). Gates of early transistors were built from metal, so the stack was called metal-oxide-semiconductor, or MOS. Since the 1970s, the gate has been formed from polycrystalline silicon (polysilicon).
1.1.1 nMOS Transistor
An nMOS transistor is built with a p-type body and has regions of n-type semiconductor adjacent to the gate called the source and drain. They are physically equivalent and interchangeable. The body is typically grounded.
Operation
The gate is a control input: It affects the flow of electrical current between the
source and drain. In an nMOS transistor, the body is generally grounded so the p–n
junctions of the source and drain to body are reverse-biased. If the gate is also
grounded, no current flows through the reverse-biased junctions. Hence, we say the
transistor is OFF. If the gate voltage is raised, it creates an electric field that starts to
attract free electrons to the underside of the Si–SiO2 interface. If the voltage is
raised enough, the electrons outnumber the holes and a thin region under the gate
called the channel is inverted to act as an n-type semiconductor. Hence, a
conducting path of electron carriers is formed from source to drain and current can
flow. The transistor is ON. As the gate voltage increases, the potential at the silicon
surface at some point reaches a critical value, where the semiconductor surface
inverts to n-type material. Further increases in the gate voltage produce no further
changes in the depletion layer width, but result in additional electrons in the thin
inversion layer directly under the oxide. These are drawn into the inversion layer from the heavily doped n+ source region. Hence, a continuous n-type channel is formed
between the source and drain regions, the conductivity of which is modulated by the
gate-source voltage.
This picture changes somewhat when a substrate bias voltage VSB is applied (VSB is normally positive for n-channel devices). This causes the surface potential required for strong inversion to increase to |–2φF + VSB|. The charge stored in the depletion region is then expressed by
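Presumably the standard depletion-charge expression is intended here (NA is the substrate doping and εsi the permittivity of silicon, neither defined elsewhere in these notes):

Q_B = \sqrt{2\, q\, N_A\, \varepsilon_{si}\left(\left|-2\phi_F\right| + V_{SB}\right)}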
The value of VGS where strong inversion occurs is called the threshold voltage VT.
VT is a function of several components, most of which are material constants such
as the difference in the work function between gate and substrate material, the oxide thickness, the Fermi voltage, the charge of impurities trapped at the surface between the channel and gate oxide, and the dosage of ions implanted for threshold adjustment.
The threshold voltage under different body-biasing conditions can then be
determined in the following manner,
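The equation itself is not reproduced in the notes; presumably the standard body-bias relation is intended, with VT0 the threshold voltage at VSB = 0:

V_T = V_{T0} + \gamma\left(\sqrt{\left|-2\phi_F\right| + V_{SB}} - \sqrt{\left|-2\phi_F\right|}\right)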
The parameter γ is called the body-effect coefficient, and expresses the impact of changes in VSB. The threshold voltage has a positive value for a typical NMOS
device, while it is negative for a normal PMOS transistor.
A CMOS inverter consists of one nMOS transistor and one pMOS transistor. The bar at the top indicates VDD and the triangle at the bottom indicates GND. When the input A is 0, the nMOS transistor is OFF and the pMOS transistor is ON. Thus, the output Y is pulled up to 1 because it is connected to VDD but not to GND. Conversely, when A is 1, the nMOS is ON, the pMOS is OFF, and Y is pulled down to '0'. This is summarized in the table.
k-input NAND gates are constructed using k series nMOS transistors and k parallel pMOS
transistors. For example, a 3-input NAND gate is shown in Figure. When any of the inputs
are 0, the output is pulled high through the parallel pMOS transistors. When all of the inputs
are 1, the output is pulled low through the series nMOS transistors.
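As a rough illustration of this series/parallel construction, the following Python sketch (not from the notes; the function name is hypothetical) evaluates a k-input static CMOS NAND gate by modeling the series nMOS pull-down and the parallel pMOS pull-up networks:

def nand_output(inputs):
    """Output of a k-input static CMOS NAND: 1, 0, 'Z' (floating) or 'X' (contention)."""
    pull_down_on = all(a == 1 for a in inputs)   # series nMOS: every transistor must be ON
    pull_up_on = any(a == 0 for a in inputs)     # parallel pMOS: any one ON transistor suffices
    if pull_up_on and not pull_down_on:
        return 1      # output connected only to VDD
    if pull_down_on and not pull_up_on:
        return 0      # output connected only to GND
    if not pull_up_on and not pull_down_on:
        return 'Z'    # floating; never occurs for complementary networks
    return 'X'        # contention; never occurs for complementary networks

# Truth table of a 3-input NAND gate
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, '->', nand_output((a, b, c)))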
1.2.2 CMOS Logic Gates
The inverter and NAND gates are examples of static CMOS logic gates, also called
complementary CMOS gates. In general, a static CMOS gate has an nMOS pull-down
network to connect the output to 0 (GND) and pMOS pull-up network to connect the output
to 1 (VDD), as shown in Figure.
The networks are arranged such that one is ON and the other OFF for any input
pattern.
The pull-up and pull-down networks in the inverter each consist of a single transistor.
The NAND gate uses a series pull-down network and a parallel pullup network.
More elaborate networks are used for more complex gates.
Two or more transistors in series are ON only if all of the series transistors are ON.
Two or more transistors in parallel are ON if any of the parallel transistors are ON.
This is illustrated in Figure for nMOS and pMOS transistor pairs. By using
combinations of these constructions, CMOS combinational gates can be constructed.
Such static CMOS gates are the most widely used. In general, when we join a pull-up network to a pull-down network to form a logic gate as shown in the figure, they both will attempt to exert a logic level at the output.
The possible levels at the output are shown in Table . From this table it can be seen
that the output of a CMOS logic gate can be in four states.
The 1 and 0 levels have been encountered with the inverter and NAND gates, where
either the pull-up or pull-down is OFF and the other structure is ON.
When both pull-up and pull-down are OFF, the high-impedance or floating Z output state results.
This is of importance in multiplexers, memory elements, and tristate bus drivers. The
crowbarred (or contention) X level exists when both pull-up and pull-down are
simultaneously turned ON.
Contention between the two networks results in an indeterminate output level and
dissipates static power. It is usually an unwanted condition.
Fig. 1.5 General logic gate using pull-up and pull-down networks
where the permittivity εox = 3.9 εo = 3.9 × 8.85 × 10⁻¹² F/m = 3.45 × 10⁻¹¹ F/m. Here εox is the permittivity of the silicon dioxide and εo is the permittivity of free space, 8.85 × 10⁻¹⁴ F/cm. Often the εox/tox term is called Cox, the capacitance per unit area of the gate oxide.
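A quick numeric check in Python (illustrative only; the 10 nm oxide thickness is an assumed value, matching the exercise at the end of this unit):

eps_0 = 8.85e-14        # permittivity of free space, F/cm
eps_ox = 3.9 * eps_0    # permittivity of SiO2, F/cm
tox = 10e-7             # assumed oxide thickness: 10 nm expressed in cm
Cox = eps_ox / tox      # gate-oxide capacitance per unit area, F/cm^2
print(f"Cox = {Cox:.3e} F/cm^2")   # about 3.45e-7 F/cm^2, i.e. roughly 3.45 fF/um^2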
Substituting the value of Vgc into equation (1.1) gives the drain-to-source current Ids in the linear region.
(iii) Saturation region
When Vgs > Vt and Vds > Vgs – Vt, the switch is turned on and a channel has been created that allows current to flow between the drain and source. Since the gate-to-drain voltage is now less than the threshold, the portion of the channel near the drain is turned off. The onset of this region is known as pinch-off. The drain current is now independent of the drain voltage and is controlled only by the gate voltage.
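For reference, the long-channel (Shockley) current equations summarizing the three regions are presumably the standard ones, with β = µCox(W/L):

I_{ds} = 0                                                  (cutoff: V_{gs} < V_t)
I_{ds} = \beta\left(V_{gs} - V_t - V_{ds}/2\right)V_{ds}    (linear: V_{ds} < V_{gs} - V_t)
I_{ds} = \frac{\beta}{2}\left(V_{gs} - V_t\right)^2         (saturation: V_{ds} > V_{gs} - V_t)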
Fig. 1.10 I-V characteristics of ideal (a) nMOS and (b) pMOS transistors
Assume the source voltage is close to the body voltage so Vdb~Vds. Hence,
increasing Vds decreases the effective channel length. Shorter channel length
results in higher current; thus Ids increases with Vds in saturation as shown.
Fig. 1.13 I-V characteristics of nMOS transistor with channel length modulation
This can be modeled by multiplying the saturation-region value of Ids by (1 + λVds).
1.4.4 Body Effect
In a transistor, the body is an implicit fourth terminal. The potential difference
between the source and body, Vsb, affects the threshold voltage. The threshold voltage can be modeled as
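The model equation is not reproduced in the notes; consistent with the terms defined below, it is presumably

V_t = V_{t0} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)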
where Vt0 is the threshold voltage when the source is at the body potential, Φs is the surface potential at threshold, and γ is the body effect coefficient.
1.4.5 Subthreshold Conduction
In the ideal transistor I-V model, current flows from S to D only when Vgs > Vt. In
real transistors, current does not abruptly cut off below threshold, but rather drops off
exponentially as given below. This conduction is also known as leakage and often
results in undesired current when a transistor is nominally OFF.
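The exponential expression is not reproduced in the notes; consistent with the terms defined below, the standard subthreshold model is presumably

I_{ds} = I_{ds0}\, e^{\frac{V_{gs} - V_t}{n\, v_T}}\left(1 - e^{-\frac{V_{ds}}{v_T}}\right)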
where Ids0 is the current at threshold and depends on process and device geometry; n is a process-dependent term affected by the depletion region and is typically in the range of 1.4–1.5 for CMOS processes; and vT is the thermal voltage.
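The notes then refer to junction (diode) leakage of the reverse-biased source/drain diffusions; the diode equation being referenced is presumably

I_D = I_S\left(e^{\frac{V_D}{v_T}} - 1\right)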
where Is depends on doping levels and on the area and perimeter of the diffusion region, and VD is the diode voltage.
Fig 1.16 plots gate leakage current density JG against voltage for various oxide
thicknesses.
1.4.8 Temperature Dependence
Transistor characteristics are influenced by temperature; carrier mobility decreases with increasing temperature. An approximate relation is
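The relation is not reproduced here; the usual empirical form, with Tr a reference temperature and kµ a fitting exponent of roughly 1.5, is

\mu(T) = \mu(T_r)\left(\frac{T}{T_r}\right)^{-k_\mu}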
• The mechanism involved shows the key parasitic components associated with a p-well structure in which an inverter circuit (for example) has been formed.
The bottom plate of the capacitor is the channel, which is not one of the transistor’s
terminals. When the transistor is on, the channel extends from the source (and reaches the
drain if the transistor is unsaturated, or stops short in saturation).
Thus, we often approximate the gate capacitance as terminating at the source and
call the capacitance Cgs. Most transistors used in logic are of minimum manufacturable
length because this results in greatest speed and lowest dynamic power consumption. Thus,
taking this minimum L as a constant for a particular process, we can define
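The definition itself is missing from the notes; presumably it is the gate capacitance per micron of transistor width,

C_{\text{permicron}} = \frac{C_g}{W} = C_{ox}\, L = \frac{\varepsilon_{ox}}{t_{ox}}\, L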
Notice that if we develop a more advanced manufacturing process in which both the
channel length and oxide thickness are reduced by the same factor, Cpermicron remains
unchanged.
In addition to the gate, the source and drain also have capacitances. These
capacitances are not fundamental to operation of the devices, but do impact circuit
performance and hence are called parasitic capacitors.
The source and drain capacitances arise from the p–n junctions between the source
or drain diffusion and the body, and hence are also called diffusion capacitances Csb and Cdb. A
depletion region with no free carriers forms along the junction. The depletion region acts as
an insulator between the conducting p- and n-type regions, creating capacitance across the
junction.
The capacitance of these junctions depends on the area and perimeter of the source
and drain diffusion, the depth of the diffusion, the doping levels, and the voltage. As diffusion
has both high capacitance and high resistance, it is generally made as small as possible in
the layout. Three types of diffusion regions are frequently seen, illustrated by the two series
transistors in Figure .
In Figure 1(a), each source and drain has its own isolated region of contacted diffusion. In Figure 1(b), the drain of the bottom transistor and the source of the top transistor form a shared contacted diffusion region. In Figure 1(c), the source and drain are merged into an uncontacted region. The average capacitance of each of these types of regions can be calculated or measured from simulation as a transistor switches between VDD and GND.
The MOS gate sits above the channel and may partially overlap the source and drain
diffusion areas. Therefore, the gate capacitance has two components:
The intrinsic capacitance was approximated as a simple parallel plate with capacitance C0 = WLCox. However, the bottom plate of the capacitor depends on the mode of operation of
the transistor.
Fig. 1.20 Intrinsic gate capacitance components (Cgs, Cgd, Cgb) as a function of (a) Vgs and (b) Vds
1. Cutoff. When the transistor is OFF (Vgs <Vt), the channel is not inverted and charge on
the gate is matched with opposite charge from the body. This is called Cgb, the gate-to-body
capacitance. For negative Vgs, the transistor is in accumulation and Cgb = C0. As Vgs increases but remains below the threshold, a depletion region forms at the surface. This effectively
moves the bottom plate downward from the oxide, reducing the capacitance, as shown in
Figure.
2. Linear. When Vgs>Vt, the channel inverts and again serves as a good conductive bottom
plate. However, the channel is connected to the source and drain, rather than the body, so
Cgb drops to 0. At low values of Vds, the channel charge is roughly shared between source and drain, so Cgs = Cgd = C0/2. As Vds increases, the region near the drain becomes less
inverted, so a greater fraction of the capacitance is attributed to the source and a smaller
fraction to the drain, as shown in Figure.
3. Saturation. At Vds>Vdsat, the transistor saturates and the channel pinches off. At this point,
all the intrinsic capacitance is to the source, as shown in Figure. Because of pinchoff, the
capacitance in saturation reduces to Cgs= 2/3 C0 for an ideal transistor. The behavior in these
three regions can be approximated as shown in Table1 .
The gate overlaps the source and drain in a real device and also has fringing fields
terminating on the source and drain. This leads to additional overlap capacitances, as shown
in Figure. These capacitances are proportional to the width of the transistor. Typical values
are Cgsol = Cgdol = 0.2 – 0.4 fF/µm. They should be added to the intrinsic gate capacitance
to find the total.
Figure 4(a) shows the Cgs and Cgd of a long-channel n-transistor. Figure 4(b) shows the Cgs and Cgd of a short-channel device (L = 0.75 µm). Note that Cgd is finite, i.e., Cgd > 0. This is due to channel-side fringing fields between the gate and drain.
The p–n junction between the source diffusion and the body contributes parasitic
capacitance across the depletion region. The capacitance depends on both the area AS and
sidewall perimeter PS of the source diffusion region. The geometry is illustrated in Figure 5 .
The area is AS = WD. The perimeter is PS = 2W + 2D. Of this perimeter, W abuts the
channel and the remaining W + 2D does not.
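The capacitance expression is not reproduced; it is presumably the usual area-plus-perimeter form,

C_{sb} = AS \cdot C_{jbs} + PS \cdot C_{jbssw}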
where Cjbs (the capacitance of the junction between the body and the bottom of the source) has units of capacitance/area and Cjbssw (the capacitance of the junction between the body and the sidewalls of the source) has units of capacitance/length. Because the depletion region thickness depends on the bias conditions, these parasitics are nonlinear.
The area junction capacitance term is
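The term itself is missing; the standard voltage-dependent junction capacitance, with CJ the zero-bias capacitance per unit area, MJ the junction grading coefficient, and ψ0 the built-in potential, is presumably

C_{jbs} = C_J\left(1 + \frac{V_{sb}}{\psi_0}\right)^{-M_J}, \qquad \psi_0 = v_T \ln\frac{N_A N_D}{n_i^2}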
vT is the thermal voltage from thermodynamics, not to be confused with the threshold voltage Vt. It has a value equal to kT/q (26 mV at room temperature), where k = 1.380 × 10⁻²³ J/K is Boltzmann's constant, T is the absolute temperature (300 K at room temperature), and q = 1.602 × 10⁻¹⁹ C is the charge of an electron. NA and ND are the doping levels of the body and source diffusion region. ni is the intrinsic carrier concentration in undoped silicon and has a value of 1.45 × 10¹⁰ cm⁻³ at 300 K.
The sidewall capacitance term is of a similar form but uses different coefficients.
In some SPICE models, the capacitance of this sidewall abutting the gate and channel is
specified with another set of parameters:
The drain diffusion has a similar parasitic capacitance dependent on AD, PD, and Vdb.
Equivalent relationships hold for pMOS transistors, but doping levels differ. As the
capacitances are voltage-dependent, the most useful information to digital designers is the
value averaged across a switching transition.
Let us derive the DC transfer function (Vout vs. Vin) for the static CMOS inverter shown in the figure. We begin with the table, which outlines the various regions of operation for the n- and p-transistors. In this table, Vtn is the threshold voltage of the n-channel device, and Vtp is the threshold voltage of the p-channel device.
Note that Vtp is negative. The equations are given both in terms of Vgs/Vds and Vin/Vout. As the source of the nMOS transistor is grounded, Vgsn = Vin and Vdsn = Vout. As the source of the pMOS transistor is tied to VDD, Vgsp = Vin – VDD and Vdsp = Vout – VDD.
Table Relationship between Voltages for the three regions of operation of a CMOS inverter
The objective is to find the variation in output voltage (Vout) as a function of the input voltage (Vin).
Given Vin, we must find Vout subject to the constraint that Idsn = |Idsp|.
For simplicity, we assume Vtp = –Vtn and that the pMOS transistor is 2–3 times as wide as
the nMOS transistor so βn = βp.
Figure 8(a) plots Idsn and Idsp in terms of Vdsn and Vdsp for various values of Vgsn and Vgsp. Figure 8(b) shows the same plot of Idsn and |Idsp|, now in terms of Vout for various values of Vin.
The possible operating points of the inverter, marked with dots, are the values of Vout where Idsn = |Idsp| for a given value of Vin. These operating points are plotted on Vout vs. Vin axes in Figure 8(c) to show the inverter DC transfer characteristics. The supply current IDD = Idsn = |Idsp| is also plotted against Vin in Figure 8(d), showing that both transistors are momentarily ON as Vin passes through voltages between GND and VDD, resulting in a pulse of current drawn from the power supply.
The operation of the CMOS inverter can be divided into five regions indicated on Figure 8(c).
The state of each transistor in each region is shown in Table 3 .
In region A, the nMOS transistor is OFF, so the pMOS transistor pulls the output to VDD. In region B, the nMOS transistor starts to turn ON, pulling the output down. In region C, both transistors are in saturation. Notice that ideal transistors are only in region C for Vin = VDD/2. Also notice that the inverter's current consumption is ideally zero, neglecting leakage, when the input is within a threshold voltage of the VDD or GND rails. This feature is important for low-power operation.
Region A: This region is defined by 0 ≤ Vin < Vtn, in which the n-device is cut off (Idsn = 0) and the p-device is in the linear region. Since Idsn = –Idsp, the drain-to-source current Idsp for the p-device is also zero. With Vdsp = Vout – VDD and Idsp = 0, we get Vdsp = 0, so the output voltage is Vout = VDD.
Region B: This region is characterized by Vtn ≤ Vin < VDD/2, in which the p-device is in its nonsaturated region (Vds ≠ 0) while the n-device is in saturation. The equivalent circuit for the inverter in this region can be represented by a resistor for the p-transistor and a current source for the n-transistor. The saturation current Idsn for the n-device is obtained by setting Vgs = Vin. This results in
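The resulting equation is omitted in the notes; it is presumably the square-law saturation current

I_{dsn} = \frac{\beta_n}{2}\left(V_{in} - V_{tn}\right)^2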
Region C: In this region, both the n- and p-devices are in saturation. Setting the two saturation currents equal, Idsn = |Idsp|, and solving for Vin yields
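The intermediate algebra is omitted; equating the two square-law saturation currents gives the standard switching-voltage result, presumably

V_{inv} = \frac{V_{DD} + V_{tp} + V_{tn}\sqrt{\beta_n/\beta_p}}{1 + \sqrt{\beta_n/\beta_p}}

which reduces to VDD/2 when βn = βp and Vtp = –Vtn.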
This implies that region C exists only for one value of Vin. We have assumed that a MOS device in saturation behaves like an ideal current source, with the drain-to-source current being independent of Vds. In reality, as Vds increases, Ids also increases slightly; thus region C has a finite slope. The significant factor to be noted is that in region C we have two current sources in series, which is an "unstable" condition.
Thus a small change in the input voltage has a large effect at the output. This makes the output transition very steep, which contrasts with the equivalent nMOS inverter characteristic. The above expression is particularly useful since it provides the basis for defining the gate threshold voltage Vinv, which corresponds to the state where Vout = Vin. This region also defines the "gain" of the CMOS inverter when used as a small-signal amplifier.
Region D: This region is described by VDD/2 < Vin ≤ VDD + Vtp. The p-device is in saturation while the n-device operates in its nonsaturated region. The two currents may be written as
Region E: This region is defined by the input condition Vin ≥ VDD + Vtp, in which the p-device is cut off (Idsp = 0) and the n-device is in the linear mode. Here, Vgsp = Vin – VDD, which is more positive than Vtp. The output in this region is Vout = 0.
From the transfer curve, it may be seen that the transition between the two states is very steep. This characteristic is very desirable because the noise immunity is maximized.
The gate-threshold voltage Vinv, where Vin = Vout, is dependent on βn/βp. Thus, for a given process, if we want to change βn/βp we need to change the channel dimensions, i.e., channel length L and channel width W.
Therefore it can be seen that as the ratio βn/βp is decreased, the transition region
shifts from left to right; however, the output voltage transition remains sharp.
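To illustrate this shift numerically, the short Python sketch below (not part of the notes; the supply and threshold values are assumed) evaluates the switching-voltage expression above for a few beta ratios:

from math import sqrt

VDD, Vtn, Vtp = 3.3, 0.7, -0.7        # assumed supply and threshold voltages

def vinv(beta_ratio):
    """Switching voltage for a given beta_n/beta_p ratio (long-channel model)."""
    r = sqrt(beta_ratio)
    return (VDD + Vtp + r * Vtn) / (1.0 + r)

for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"beta_n/beta_p = {ratio:4.2f}  ->  Vinv = {vinv(ratio):.2f} V")

Decreasing βn/βp raises Vinv (the transition moves to the right), while βn = βp gives Vinv = VDD/2.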
Inverters with different beta ratios r = βp /βn are called skewed inverters [Sutherland99]. If r
> 1, the inverter is HI-skewed. If r < 1, the inverter is LO-skewed. If r = 1, the inverter has
normal skew or is unskewed.
A HI-skew inverter has a stronger pMOS transistor. Therefore, if the input is VDD/2, we would expect the output to be greater than VDD/2. In other words, the input threshold must be higher than for an unskewed inverter. Similarly, a LO-skew inverter has a weaker pMOS transistor and thus a lower switching threshold.
Figure explores the impact of skewing the beta ratio on the DC transfer characteristics. As
the beta ratio is changed, the switching threshold moves. However, the output voltage
transition remains sharp. Gates are usually skewed by adjusting the widths of transistors
while maintaining minimum length for speed.
Noise margin is closely related to the DC voltage characteristics. This parameter allows
you to determine the allowable noise voltage on the input of a gate so that the output will not
be corrupted. The specification most commonly used to describe noise margin (or noise
immunity) uses two parameters:
With reference to the figure, NML is defined as the difference between the maximum LOW input voltage recognized by the receiving gate and the maximum LOW output voltage produced by the driving gate.
The value of NMH is the difference between the minimum HIGH output voltage of the driving
gate and the minimum HIGH input voltage recognized by the receiving gate. Thus,
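The two defining equations are omitted in the notes; from the descriptions above they are

NM_L = V_{IL} - V_{OL}, \qquad NM_H = V_{OH} - V_{IH}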
where
VIH = minimum HIGH input voltage
VIL = maximum LOW input voltage
VOH= minimum HIGH output voltage
VOL = maximum LOW output voltage
Inputs between VIL and VIH are said to be in the indeterminate region or forbidden zone and
do not represent legal digital logic levels. Therefore, it is generally desirable to have VIH as close as possible to VIL and for this value to be midway in the "logic swing," VOL to VOH.
This implies that the transfer characteristic should switch abruptly; that is, there should be
high gain in the transition region. For the purpose of calculating noise margins, the transfer
characteristic of the inverter and the definition of voltage levels VIL, VOL, VIH, and VOH are
shown in Figure.
Logic levels are defined at the unity gain point where the slope is –1. This gives a
conservative bound on the worst case static noise margin.
For the inverter shown, the NML is 0.46 VDD while the NMH is 0.13 VDD. Note that the
output is slightly degraded when the input is at its worst legal value; this is called noise feed
through or propagated noise.
If either NML or NMH for a gate are too small, the gate may be disturbed by noise that
occurs on the inputs. An unskewed gate has equal noise margins, which maximizes
immunity to arbitrary noise sources. If a gate sees more noise in the high or low input state,
the gate can be skewed to improve that noise margin at the expense of the other. Note that
if |Vtp| = Vtn , then NMH and NML increase as threshold voltages are increased.
DC analysis gives us the static noise margins specifying the level of noise that a gate may
see for an indefinite duration. Larger noise pulses may be acceptable if they are brief.
Unfortunately, there is no simple amplitude-duration product that conveniently specifies
dynamic noise margins.
Note that both the control input and its complement are required by the transmission
gate. This is called double rail logic. Some circuit symbols for the transmission gate are
shown in Figure.
In all of our examples so far, the inputs drive the gate terminals of nMOS transistors
in the pull-down network and pMOS transistors in the complementary pull-up network, as
was shown in Figure.
Thus, the nMOS transistors only need to pass 0s and the pMOS only pass 1s, so the
output is always strongly driven and the levels are never degraded. This is called a fully
restored logic gate and simplifies circuit design considerably.
In contrast to other forms of logic, where the pull-up and pull-down switch networks
have to be ratioed in some manner, static CMOS gates operate correctly independently of
the physical sizes of the transistors. Moreover, there is never a path through ‘ON’ transistors
from the 1 to the 0 supplies for any combination of inputs (in contrast to single-channel MOS,
GaAs technologies, or bipolar).
In a specific design there will be a number of logic paths, called the critical paths, that require attention to timing details. The critical paths can be addressed at four main levels of the design:
i. The architecture/microarchitecture level.
ii. The logic level: Here tradeoffs include types of functional blocks, the number of
stages of gates in the cycle, and the fan-in and fan-out of the gates. Note, however, that no amount of skillful logic design can overcome a poor microarchitecture.
iii. The circuit level: Once the logic has been selected, the delay can be tuned at the circuit level by choosing transistor sizes or using other styles of CMOS logic.
iv. The layout level: Finally, delay is dependent on the layout. The floorplan is of great importance because it determines the wire lengths that can dominate delay.
We will focus on the logic and circuit optimizations of selecting the number of stages
of logic, the types of gates, and the transistor sizes.
Quick delay estimation is essential to designing critical paths. Timing details of the
critical path can be recognized by a timing analyzer, which is a design tool that
automatically finds the slowest paths in a logic design.
Simulation or timing analysis only provide the details of how fast a particular circuit
operates; they do not specify how the circuit could be modified to operate faster.
Simple delay models can be applied to the design to rapidly estimate delay, understand its origin, and figure out how it can be reduced.
Contamination delay time, tcd = minimum time from the input crossing 50% to the
output crossing 50%
So when an input changes, the output will retain its old value for at least the contamination delay and take on its new value in at most the propagation delay.
Propagation and contamination delay times are also called max-time and min-time.
The RC delay model estimates the delay of a logic gate as the RC product of the effective driver resistance and the load capacitance. The gate that charges or discharges a node is called the driver, and the gates and wire being driven are called the load.
Usually, logic gates use minimum-length devices for least delay, area, and power
consumption. Given this, the delay of a logic gate depends on the widths of the
transistors in the gate and the capacitance of the load that must be driven.
An nMOS transistor with width of one unit is defined to have effective resistance R.
The unit-width pMOS transistor has a higher resistance than the nMOS transistor; let us assume this resistance is 2R.
Wider transistors have lower resistance. For example, a pMOS transistor of double-
unit width has effective resistance R.
Parallel and series transistors combine like conventional resistors. When multiple
transistors are in series, their resistance is the sum of each individual resistance.
When multiple transistors are in parallel, the resistance is lower if they are all ON. For worst-case delay estimation, however, only one of the parallel transistors is assumed to be ON, and the effective resistance is then just that of the single ON transistor.
In many processes the capacitances are approximately equal and can be labeled C = Cg = Cdiff to keep estimation simple. The second terminal of the diffusion capacitor
is the body, which is usually tied to ground (for nMOS) or VDD (for pMOS).
As the DC voltage on the second terminal is irrelevant to delay, we often draw both
capacitances to ground for simplicity. The gate capacitance includes fields
terminating on the channel, source, and drain.
Usually, gate capacitance can be determined directly from the transistor widths in the
schematic. Diffusion capacitance depends on the layout.
In a good layout, diffusion nodes are shared wherever possible to reduce the
diffusion capacitance. Moreover, the uncontacted diffusion nodes between series
transistors are usually smaller than those that must be contacted. Such uncontacted
nodes have less capacitance.
The Elmore delay model estimates the delay of an RC ladder network as the sum, over each node in the ladder, of the effective resistance Rn-i between that node and a supply multiplied by the capacitance on the node:
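Written out, the Elmore delay of the ladder is the sum over its nodes,

t_{pd} = \sum_{\text{nodes } i} R_{n\text{-}i}\, C_i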
Observe that the delay consists of two components. The parasitic delay is determined by the
gate driving its own internal diffusion capacitance. Boosting the width of the transistors
decreases the resistance but increases the capacitance so the parasitic delay is ideally
independent of the gate size.
The delay is normalized to the delay unit τ of an ideal, parasitic-free inverter driving an identical inverter, d = tpd/τ, where τ = 3RC.
Hence, the rising delay of the 2- input NAND gate is d = (4/3)h + 2. The RC delay model
similarly predicts an inverter with real parasitics driving h identical inverters to have a delay
of h+1.
If the load is not identical copies of the gate, the electrical effort can be computed as h = Cout/Cin, where Cout is the capacitance of the external load being driven and Cin is the input capacitance of the gate. Figure (b) plots normalized delay vs. electrical effort for an idealized inverter and a 2-input NAND gate. The y-intercepts indicate the parasitic delay, i.e., the delay when the gate drives no load. The slope of the lines is the logical effort.
The inverter has a slope of 1 by definition. The NAND has a slope of 4/3. The logical
effort and parasitic delay can be estimated using RC models.
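A small Python sketch (illustrative, not from the notes) of the linear delay model d = gh + p, using the inverter and 2-input NAND values quoted above:

def normalized_delay(g, p, h):
    """Linear delay model: d = g*h + p, in units of tau."""
    return g * h + p

gates = {"inverter": (1.0, 1.0), "2-input NAND": (4.0 / 3.0, 2.0)}   # (g, p) from the text
for name, (g, p) in gates.items():
    for h in (1, 2, 4):
        print(f"{name:12s} h = {h}:  d = {normalized_delay(g, p, h):.2f}")

The slope of d versus h is the logical effort g and the intercept is the parasitic delay p, as described above.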
These parameters are related to the logical effort terms as given in Table. The
effective resistance of a gate increases with the logical effort of the gate but
decreases with the gate size.
Some designers use the term drive for the reciprocal of resistance; the drive of a gate is its input capacitance divided by its logical effort, x = Cin/g. Delay can be expressed in terms of drive as
Logical effort of a gate is defined as the ratio of the input capacitance of the gate to
the input capacitance of an inverter that can deliver the same output current.
Equivalently, logical effort indicates how much worse a gate is at producing output
current as compared to an inverter, given that each input of the gate may only present as
much input capacitance as the inverter.
Logical effort can be measured in simulation from delay vs. fanout plots as the ratio
of the slope of the delay of the gate to the slope of the delay of an inverter.
The inverter presents three units of input capacitance. The 3-input NAND presents five units of capacitance on each input, so its logical effort is 5/3. Similarly, the 3-input NOR presents seven units of capacitance, so its logical effort is 7/3.
This matches our expectation that NANDs are better than NORs because NORs
have slow pMOS transistors in series.
Table lists the logical effort of common gates. The effort tends to increase with the
number of inputs. NAND gates are better than NOR gates because the series transistors are
nMOS rather than pMOS. Exclusive-OR gates are particularly costly and have different
logical efforts for different inputs.
The parasitic delay of a gate is the delay of the gate when it drives zero load. It can
be estimated with RC delay models.
A crude method good for hand calculations is to count only diffusion capacitance on
the output node. For example, consider the gates in Figure (2.1), assuming each transistor
on the output node has its own drain diffusion contact. Transistor widths were chosen to give
a resistance of R in each gate. The inverter has three units of diffusion capacitance on the
output, so the parasitic delay is 3RC =τ.
Table estimates the parasitic delay of common gates. Increasing transistor sizes
reduces resistance but increases capacitance correspondingly, so parasitic delay is, on first
order, independent of gate size.
The parasitic delay also depends on the ratio of diffusion capacitance to gate
capacitance.
Nevertheless, it is important to realize that parasitic delay grows more than linearly with the
number of inputs in a real NAND or NOR circuit.
For example, Figure 2.2 shows a model of an n-input NAND gate in which the upper inputs were all 1 and the bottom input rises. The gate must discharge the diffusion capacitances of all of the internal nodes as well as the output. The Elmore delay is
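The expression itself is not reproduced here. A sketch of it, under the common assumptions that each of the n series nMOS transistors (of width n) has resistance R/n, each internal node carries diffusion capacitance nC, and the output node carries 3nC, is

t_{pd} = \sum_{i=1}^{n-1}\left(\frac{iR}{n}\right)(nC) + R\,(3nC) = \left(\frac{n(n-1)}{2} + 3n\right)RC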
This delay grows quadratically with the number of series transistors n, indicating that beyond
a certain point it is faster to split a large gate into a cascade of two smaller gates.
Designers often need to choose the fastest circuit topology and gate sizes for a
particular logic function and to estimate the delay of the design. The method of Logical Effort
provides a simple method “on the back of an envelope” to choose the best topology and
number of stages of logic for a function. Based on the linear delay model, it allows the
designer to quickly estimate the best number of stages for a path, the minimum possible
delay for the given topology, and the gate sizes that achieve this delay.
Figure shows the logical and electrical efforts of each stage in a multistage path as a
function of the sizes of each stage. The path of interest (the only path in this case) is marked
with the dashed blue line.
Observe that logical effort is independent of size, while electrical effort depends on
sizes. This section develops some metrics for the path as a whole that are independent of
sizing decisions.
The path electrical effort H can be given as the ratio of the output capacitance the path must
drive divided by the input capacitance presented by the path. This is more convenient than
defining path electrical effort as the product of stage electrical efforts because we do not
know the individual stage electrical efforts until gate sizes are selected.
The path effort F is the product of the stage efforts of each stage. Recall that the stage effort
of a single stage is f = gh. Can we by analogy state F = GH for a path?
In paths that branch, F ≠ GH. This is illustrated in the figure, a circuit with a two-way branch. Consider a path from the primary input to one of the outputs. The path logical effort is G = 1 × 1 = 1. The path electrical effort is H = 90/5 = 18. Thus, GH = 18. But F = f1 f2 = g1 h1 g2 h2 = 1 × 6 × 1 × 6 = 36. In other words, F = 2GH in this path on account of the two-way branch.
We must introduce a new kind of effort to account for branching between stages of a path.
This branching effort b is the ratio of the total capacitance seen by a stage to the
capacitance on the path; in Figure 4.30 it is (15 + 15)/15 = 2.
The path branching effort B is the product of the branching efforts between stages.
Now we can define the path effort F as the product of the logical, electrical, and branching
efforts of the path. Note that the product of the electrical efforts of the stages is actually BH,
not just H.
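A quick recomputation of the branching example above, as a Python sketch (not from the notes):

G = 1 * 1              # path logical effort: two inverters in the path
H = 90 / 5             # path electrical effort: Cout/Cin = 18
B = (15 + 15) / 15     # branching effort of the branching stage = 2
F = G * B * H          # path effort
print("G =", G, " H =", H, " B =", B, " F =", F)   # F = 36 = 2*G*H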
The path delay D is the sum of the delays of each stage. It can also be written as the sum of
the path effort delay DF and path parasitic delay P:
The product of the stage efforts is F, independent of gate sizes. The path effort delay is the
sum of the stage efforts. The sum of a set of numbers whose product is constant is
minimized by choosing all the numbers to be equal. In other words, the path delay is
minimized when each stage bears the same effort. If a path has N stages and each bears
the same effort, that effort must be
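The expression is omitted in the notes; each stage then bears the stage effort

\hat{f} = g_i h_i = F^{1/N}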
Thus, the minimum possible delay of an N-stage path with path effort F and path parasitic
delay P is
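The resulting minimum-delay expression, omitted here, is presumably

D = N\, F^{1/N} + P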
This is a key result of Logical Effort. It shows that the minimum delay of the path can be
estimated knowing only the number of stages, path effort, and parasitic delays without the
need to assign transistor sizes. This is superior to simulation, in which delay depends on
sizes and you never achieve certainty that the sizes selected are those that offer minimum
delay.
To choose transistor sizes that achieve this minimum delay, we apply the capacitance transformation formula to find the best input capacitance for a gate given the output capacitance it drives.
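The formula is not reproduced in the notes; in the Logical Effort method it is presumably

C_{in_i} = \frac{g_i\, C_{out_i}}{\hat{f}}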
Starting with the load at the end of the path, work backward applying the capacitance
transformation to determine the size of each stage. Check the arithmetic by verifying that the
size of the initial stage matches the specification.
Given a specific circuit topology, we now know how to estimate delay and choose gate sizes.
However, there are many different topologies that implement a particular logic function.
Logical Effort tells us that NANDs are better than NORs and that gates with few inputs are
better than gates with many. In this section, we will also use Logical Effort to predict the best
number of stages to use.
Logic designers sometimes estimate delay by counting the number of stages of logic,
assuming each stage has a constant “gate delay.” This is potentially misleading because it
implies that the fastest circuits are those that use the fewest stages of logic.
Of course, the gate delay actually depends on the electrical effort, so sometimes using fewer stages results in more delay. The following example illustrates this point: consider appending additional inverters to a path, each of which contributes its own parasitic delay. The delay of the new path is
Differentiating with respect to N and setting to 0 allows us to solve for the best number of
stages, which we will call N. The result can be expressed more compactly by defining
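The definition is omitted; in the Logical Effort method the best stage effort ρ satisfies

p_{inv} + \rho\left(1 - \ln\rho\right) = 0

and the best number of stages is then N̂ = log F / log ρ; for pinv = 1, ρ ≈ 3.59.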
A path achieves least delay by using N̂ = log F / log ρ stages. It is important to understand not only the best stage effort and number of stages but also the sensitivity to using a different number of stages. The figure plots the delay increase using a particular number of stages against the total number of stages, for pinv = 1. The x-axis plots the ratio of the actual number of stages to the ideal number. The y-axis plots the ratio of the actual delay to the best achievable.
1.14 SCALING
The only constant in VLSI design is constant change. Figure 1.6 showed the unrelenting
march of technology, in which feature size has reduced by 30% every two to three years.
Dennard's Scaling Law predicts that the basic operational characteristics of a MOS transistor can be preserved and the performance improved if the critical parameters of a device are scaled by a dimensionless factor S. These parameters include all device dimensions (L, W, tox), the supply voltage VDD, and the threshold voltage Vt; the doping levels are correspondingly increased by S.
This approach is also called constant field scaling because the electric fields remain the
same as both voltage and distance shrink. In contrast, constant voltage scaling shrinks the
devices but not the power supply. Another approach is lateral scaling, in which only the gate
length is scaled. This is commonly called a gate shrink because it can be done easily to an
existing mask database for a design.
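The first-order consequences of constant-field scaling can be tabulated with a short Python sketch (the standard Dennard relations, stated here as an assumption rather than taken from the notes):

S = 1.4   # example scaling factor between adjacent generations (assumed)

scaling_factors = {
    "dimensions (L, W, tox)":    1 / S,
    "voltages (VDD, Vt)":        1 / S,
    "gate capacitance C":        1 / S,
    "saturation current I":      1 / S,
    "gate delay (~ CV/I)":       1 / S,
    "power per gate (~ CV^2 f)": 1 / S**2,
    "power density":             1.0,     # unchanged: the point of constant-field scaling
}
for quantity, factor in scaling_factors.items():
    print(f"{quantity:27s} x {factor:.2f}")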
Figure 2.10 shows how voltage has scaled with feature size. Historically, feature
sizes were shrunk from 6 µm to 1 µm while maintaining a 5 V supply voltage. This constant
voltage scaling offered quadratic delay improvement as well as cost reduction. It also
maintained continuity in I/O voltage standards.
Constant voltage scaling increased the electric fields in devices. By the 1µm
generation, velocity saturation was severe enough that decreasing feature size no longer
improved device current. Device breakdown from the high field was another risk. And power
consumption became unacceptable. Therefore, Dennard scaling has been the rule since the
half-micron node.
Maintaining a constant field has the further benefit that many nonlinear factors and wearout
mechanisms are essentially unaffected. Unfortunately, voltage scaling has dramatically
slowed since the 90 nm generation because of leakage, and this may ultimately limit CMOS
scaling.
The FO4 inverter delay will scale as 1/S assuming ideal constant-field scaling.
Wires also tend to be scaled equally in width and thickness to maintain an aspect ratio close to 2. Table 7.5 shows the resistance, capacitance, and delay per unit length. Local wires run within functional units and use the bottom layers of metal. Semiglobal (or scaled) wires run across larger blocks or cores, typically using middle layers of metal. Both local and semiglobal wires scale with feature size.
Global wires run across the entire chip using upper levels of metal. For example, global
wires might connect cores to a shared cache. Global wires do not scale with feature size;
indeed, they may get longer (by a factor of Dc , on the order of 1.1) because die size has
been gradually increasing.
Most local wires are short enough that their resistance does not matter. Like gates, their capacitance per unit length remains constant, so their delay is improving just like that of gates. Semiglobal wires long enough to require repeaters are speeding up, but not as fast as gates.
This is a relatively minor problem. Global wires, even with optimal repeaters, are getting
slower as technology scales. The time to cross a chip in a nanometer process can be
multiple cycles, and this delay must be accounted for in the microarchitecture.
Observe that when wire thickness is scaled, the capacitance per unit length remains
constant. Hence, a reasonable initial estimate of the capacitance of a minimum-pitch wire is
about 0.2 fF/µm, independent of the process. In other words, wire capacitance is roughly 1/5
of gate capacitance per unit length.
When two or more ‘sticks’ of the same type cross or touch each other that represents
electrical contact.
1) Power and ground lines run horizontally in metal 1.
2) The input and output are accessible from the top or bottom of the cell and will be in Metal
2 running vertically.
3) To draw the stick diagrams the conventions used in this book are shown in
Figure. These conventions are: For Metal-1 use thick solid line, for Metal-2 use thin solid
line, for poly use a thick dashed line, for active (n+ or p+) use a thin dashed line, for contacts use "X", and for vias use "O".
Fig. Stick diagram conventions
From the designer's viewpoint, all CMOS designs have the following entities:
Two different substrates and/or wells: which are p-type for NMOS and n-type
for PMOS.
Diffusion regions (p+ and n+): these define the areas where transistors can be formed. These regions are also called active areas.
Diffusion of an inverse type is needed to implement contacts to the well or to
substrate. These are called select regions.
Transistor gate electrodes : Polysilicon layer
Metal interconnect layers
Interlayer contacts and via layers.
The layers for typical CMOS processes are represented in various figures in terms
of:
A color scheme (Mead-Conway colors).
Other color schemes designed to differentiate CMOS structures.
Varying stipple patterns
Varying line styles
1.17.3 Gate Layouts
For many applications, a straightforward layout is good enough and can be
automatically generated or rapidly built by hand. This section presents a simple
layout style based on a "line of diffusion" rule that is commonly used for standard cells in automated layout systems.
The power and ground lines are often called supply rails. Polysilicon lines run vertically to form transistor gates. Metal wires within the cell connect the transistors appropriately.
Part-A (2 marks)
1. What is Moore’s law?
Moore's law states that the number of transistors on a chip doubles approximately every 18 months.
Layouts must conform to a set of geometric constraints or rules that specify the minimum allowable line widths for physical objects on-chip (such as metal and polysilicon interconnects or diffusion areas), minimum feature dimensions, and minimum allowable separations between two layers.
10. What is DRC?
A Design Rule Check (DRC) program looks for design rule violations in the layout. It checks for minimum spacing and minimum size, and ensures that combinations of layers form legal components.
Part-B (16 marks)
2. (i) Discuss in detail, with necessary equations, the operation of the MOSFET and its current-voltage characteristics.
3. (a) (i) An NMOS transistor has the following parameters: gate oxide thickness = 10 nm, relative permittivity of gate oxide = 3.9, electron mobility = 520 cm²/V-sec, threshold voltage = 0.7 V, permittivity of free space = 8.85 × 10⁻¹⁴ F/cm, and (W/L) = 8. Calculate the drain current when (VGS = 2 V and VDS = 1.2 V) and (VGS = 2 V and VDS = 2 V), and also compute the gate oxide capacitance per unit area. Note that W and L refer to the width and length of the channel respectively. (3 + 3 + 2) Dec 2011
4. Draw and explain the DC and transfer characteristics of a CMOS inverter with
necessary conditions for the different regions of operation. (8) May-16
5. (a) Explain in detail the ideal I-V characteristics and non-ideal I-V characteristics of NMOS and PMOS devices. (16) May-16
6. (a) Discuss the C-V characteristics and DC transfer characteristics of CMOS. (16)
7. Discuss the principles of constant field scaling and lateral scaling. Write the effects of the above scaling methods on device characteristics. May-16
8. Explain the dynamic behaviour of the MOSFET transistor with a neat diagram. April/May 2018
9. Write the layout design rules and draw the diagrams for four-input NAND & NOR gates. April/May 2018
10.Explain the basic principles of transmission gate in CMOS design. April/May 2018
UNIT II
COMBINATIONAL MOS LOGIC CIRCUITS
REFERRED BOOK:
1. Jan Rabaey, Anantha Chandrakasan, B. Nikolic, "Digital Integrated Circuits: A Design Perspective", Second Edition, Prentice Hall of India, 2003.
3. D. A. Pucknell, Kamran Eshraghian, "Basic VLSI Design", Third Edition, Prentice Hall of India, 2007.
4. R. Jacob Baker, Harry W. Li, David E. Boyce, "CMOS Circuit Design, Layout and Simulation", Prentice Hall of India, 2005.
I. CIRCUIT FAMILIES
Static CMOS circuits with complementary nMOS pulldown and pMOS pullup
networks are used for the vast majority of logic gates in integrated circuits.
They have good noise margins, and are fast, low power, insensitive to device
variations, easy to design, widely supported by CAD tools, and readily available in
standard cell libraries.
When noise does exceed the margins, the gate delay increases because of the
glitch, but the gate eventually will settle to the correct answer. Most design teams
now use static CMOS exclusively for combinational logic.
Compound gates are particularly useful to perform complex functions with relatively
low logical efforts.
When a particular input is known to arrive latest, the gate can be optimized to favor that input.
Similarly, when either the rising or falling edge is known to be more critical, the gate
can be optimized to favor that edge.
We have focused on building gates with equal rising and falling delays; however,
using smaller pMOS transistors can reduce power, area, and delay.
CMOS stages are inherently inverting, so AND and OR functions must be built from NAND
and NOR gates. DeMorgan’s law helps with this conversion
These relations are illustrated graphically in Figure 3.1. A NAND gate is equivalent to an OR of inverted inputs. A NOR gate is equivalent to an AND of inverted inputs. The same
relationship applies to gates with more inputs. Switching between representations is easy to
do on a whiteboard and is often called bubble pushing.
SOLUTION: By inspection, the circuit consists of two ANDs and an OR, shown in Figure (a).
In Figure(b), the ANDs and ORs are converted to basic CMOS stages. In Figure (c and d),
bubble pushing is used to simplify the logic to three NANDs.
FIGURE Bubble pushing to convert ANDs and ORs to NANDs and NORs
FIG. Logic using an AOI22 gate
Static CMOS also efficiently handles compound gates computing various inverting
combinations of AND/OR functions in a single stage.
The function F = AB + CD can be computed with an AND-OR INVERT- 22 (AOI22) gate and
an inverter, as shown in Figure 3.2.
In general, logical effort of compound gates can be different for different inputs.
Figure 3.2 shows how logical efforts can be estimated for the AOI21, AOI22, and a more
complex compound AOI gate. The transistor widths are chosen to give the same drive as a
unit inverter. The logical effort of each input is the ratio of the input capacitance of that input
to the input capacitance of the inverter.
For the AOI21 gate, this means the logical effort is slightly lower for the OR terminal (C) than
for the two AND terminals (A, B). The parasitic delay is crudely estimated from the total
diffusion capacitance on the output node by summing the sizes of the transistors attached to
the output.
The logical effort and parasitic delay of different gate inputs are often different.
Some logic gates, like the AOI21 are inherently asymmetric in that one input sees less
capacitance than another.
Other gates, like NANDs and NORs, are nominally symmetric but actually have slightly
different logical effort and parasitic delays for the different inputs.
Figure 3.3 shows a 2-input NAND gate annotated with diffusion parasitics. Consider
the falling output transition occurring when one input held a stable 1 value and the other
rises from 0 to 1. If input B rises last, node x will initially be at VDD – Vt (approximately VDD) because it was pulled up through the nMOS transistor on input A.
The Elmore delay is (R/2)(2C) + R(6C) = 7RC ≈ 2.33τ. On the other hand, if input A rises last, node x will initially be at 0 V because it was discharged through the nMOS transistor on input B. No charge must be delivered to node x, so the Elmore delay is simply R(6C) = 6RC = 2τ.
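The same comparison, recomputed in a short Python sketch (illustrative only; R and C are normalized to 1):

R, C = 1.0, 1.0          # normalized unit resistance and capacitance
tau = 3 * R * C          # delay unit tau = 3RC

def elmore(terms):
    """Sum of (shared resistance to GND) * (node capacitance) over all nodes."""
    return sum(r * c for r, c in terms)

# Outer input B switches last: internal node x (2C) must also be discharged.
t_outer = elmore([(R / 2, 2 * C), (R, 6 * C)])
# Inner input A switches last: node x is already at 0 V.
t_inner = elmore([(R, 6 * C)])
print(f"outer input last: {t_outer:.0f} RC = {t_outer / tau:.2f} tau")
print(f"inner input last: {t_inner:.0f} RC = {t_inner / tau:.2f} tau")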
In general, we define the outer input to be the input closer to the supply rail (e.g., B)
and the inner input to be the input closer to the output (e.g., A). The parasitic delay is
smallest when the inner input switches last because the intermediate nodes have already
been discharged.
Therefore, if one signal is known to arrive later than the others, the gate is fastest
when that signal is connected to the inner input.
The logical efforts are lower than initial estimates might predict because of velocity
saturation. Interestingly, the inner input has a slightly higher logical effort because the
intermediate node x tends to rise and cause negative feedback when the inner input turns
ON.
FIGURE 3.3 NAND gate delay estimation
FIGURE 3.4 Paths with transistor widths
When one input is far less critical than another, even nominally symmetric gates can be
made asymmetric to favor the late input at the expense of the early one.
In a series network, this involves connecting the early input to the outer transistor and
making the transistor wider so that it offers less series resistance when the critical
input arrives.
For example, consider the path in Figure 3.4(a). Under ordinary conditions, the path acts as
a buffer between A and Y. When reset is asserted, the path forces the output low. If reset
only occurs under exceptional circumstances and can take place slowly, the circuit should be
optimized for input-to-output delay at the expense of reset.
This can be done with the asymmetric NAND gate in Figure 3.4(b). The pulldown resistance is R/4 + R/(4/3) = R, so the gate still offers the same drive as a unit inverter.
However, the capacitance on input A is only 10/3, so the logical effort is 10/9. This is
better than 4/3, which is normally associated with a NAND gate. In the limit of an infinitely
large reset transistor and unit-sized nMOS transistor for input A, the logical effort
approaches 1, just like an inverter.
The improvement in logical effort of input A comes at the cost of much higher effort
on the reset input. Note that the pMOS transistor on the reset input is also shrunk. This
reduces its diffusion capacitance and parasitic delay at the expense of slower response to
reset.
CMOS transistors are usually velocity saturated, and thus series transistors carry
more current than the long-channel model would predict. For asymmetric gates, the
equivalent width is that of the inner (narrower) transistor.
In other cases, one input transition is more important than the other; the HI-skew and LO-skew gates defined earlier favor one transition over the other. This favoring can be done by decreasing the size of the noncritical transistor. The logical efforts for the rising (up) and falling (down) transitions are called gu and gd, respectively, and are the ratio of the input capacitance of the skewed gate to the input capacitance of an unskewed inverter with equal drive for that transition.
Figure 3.5 (a) shows how a HI-skew inverter is constructed by downsizing the nMOS
transistor. This maintains the same effective resistance for the critical transition while
reducing the input capacitance relative to the unskewed inverter of Figure 3.5(b), thus
reducing the logical effort on that critical transition to gu= 2.5/3 = 5/6.
The improvement comes at the expense of the effort on the noncritical transition. The
logical effort for the falling transition is estimated by comparing the inverter to a smaller
unskewed inverter with equal pulldown current, shown in Figure3.5(c), giving a logical effort
of gd =2.5/1.5 =5/3.
The degree of skewing (e.g., the ratio of effective resistance for the fast transition
relative to the slow transition) impacts the logical efforts and noise margins; a factor of two is common. Figure 3.6 catalogs HI-skew and LO-skew gates with a skew factor of two.
Notice in Figure 3.6 that the average logical effort of the LO-skew NOR2 is actually better
than that of the unskewed gate. The pMOS transistors in the unskewed gate are enormous
in order to provide equal rise delay. They contribute input capacitance for both transitions,
while only helping the rising delay.
By accepting a slower rise delay, the pMOS transistors can be downsized to reduce input
capacitance and average delay significantly.
In general, what is the best P/N ratio for logic gates (i.e., the ratio of pMOS to nMOS transistor width)? For processes with a mobility ratio of µn/µp = 2, as we have generally been assuming, the best ratios are shown in Figure 3.7.
Some paths can be slower than average if they trigger the worst edge of each gate.
Excessively slow rising outputs can also cause hot electron degradation. And reducing the
pMOS size also moves the switching point lower and reduces the inverter’s noise margin.
In summary, the P/N ratio of a library of cells should be chosen on the basis of area,
power, and reliability, not average delay.
For NOR gates, reducing the size of the pMOS transistors significantly improves both
delay and area. In most standard cell libraries, the pitch of the cell determines the P/N ratio
that can be achieved in any particular gate. Ratios of 1.5–2 are commonly used for inverters.
1.1.1.7 Multiple Threshold Voltages
Some CMOS processes offer two or more threshold voltages. Transistors with lower
threshold voltages produce more ON current, but also leak exponentially more OFF current.
1.2 Ratioed Circuits
Ratioed circuits depend on the proper size or resistance of devices for correct operation.
As shown in Figure 3.8, the ratioed gate conceptually consists of an nMOS pulldown network and some pullup device called the static load.
When the pulldown network is OFF, the static load pulls the output to 1.
When the pulldown network turns ON, it fights the static load.
The static load must be weak enough that the output pulls down to an acceptable 0.
Hence, there is a ratio constraint between the static load and pulldown network.
Stronger static loads produce faster rising outputs, but increase VOL, degrade the
noise margin, and burn more static power when the output should be 0.
CMOS logic eventually displaced nMOS logic because the static power became
unacceptable as the number of gates increased.
However, ratioed circuits are occasionally still useful in special applications. A resistor is a
simple static load, but large resistors consume a large layout area in typical MOS processes.
Another technique is to use an nMOS transistor with its gate tied to VGG. If VGG = VDD, the nMOS transistor will only pull up to VDD – Vt. Worse yet, the threshold is increased by the body effect. Thus, using VGG > VDD was attractive. To eliminate this extra supply voltage, some nMOS
processes offered depletion mode transistors. These transistors, indicated with the thick bar,
are identical to ordinary enhancement mode transistors except that an extra ion implantation
was performed to create a negative threshold voltage. The depletion mode pullups have
their gate wired to the source so Vgs=0 and the transistor is always weakly ON.
1.2.1 Pseudo-nMOS
The DC transfer characteristics are derived by finding Vout for which Idsn = |Idsp| for
a given Vin, as shown in Figure 3.9(b–c) for a 180 nm process.
The beta ratio affects the shape of the transfer characteristics and the VOL of the
inverter. Larger relative pMOS transistor sizes offer faster rise times but less sharp transfer
characteristics.
Figure 3.9(d) shows that when the nMOS transistor is turned on, a static DC current flows in
the circuit.
Figure 3.10 shows several pseudo-nMOS logic gates. The pulldown network is like that of an ordinary static gate, but the pullup network has been replaced with a single pMOS transistor whose gate is grounded so it is always ON.
The pMOS transistor widths are selected to be about 1/4 the strength (i.e., 1/2 the
effective width) of the nMOS pulldown network as a compromise between noise margin and
speed; this best size is process-dependent, but is usually in the range of 1/3 to 1/6.
The logical effort for each transition is computed as the ratio of the input capacitance
to that of a complementary CMOS inverter with equal current for that transition. For the
falling transition, the pMOS transistor effectively fights the nMOS pulldown. The output
current is estimated as the pulldown current minus the pullup current, (4I/3 – I/3) =I.
For example, the logical effort for a falling transition of the pseudo-nMOS inverter is the ratio of its input capacitance (4/3) to that of a unit complementary CMOS inverter (3), i.e., 4/9. The rising logical effort gu is three times as great, i.e., 4/3, because the pullup current is only 1/3 as much.
The parasitic delay is also found by counting output capacitance and comparing it to
an inverter with equal current. For example, the pseudo-nMOS NOR has 10/3 units of
diffusion capacitance as compared to 3 for a unit-sized complementary CMOS inverter, so its
parasitic delay pulling down is 10/9.
The pullup current is 1/3 as great, so the parasitic delay pulling up is 10/3. As can be
seen, pseudo-nMOS is slower on average than static CMOS for NAND structures. However,
pseudo-nMOS works well for NOR structures.
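To make the arithmetic above concrete, the following Python lines (an illustrative calculation, not part of the original notes) reproduce the falling and rising logical efforts of the pseudo-nMOS inverter and the parasitic delays of the pseudo-nMOS NOR quoted above:

# Pseudo-nMOS inverter per the text: input nMOS sized for current 4I/3,
# pMOS pullup sized for current I/3. Unit CMOS inverter: Cin = 3, current I.
cin = 4/3                     # input capacitance of the pseudo-nMOS inverter
i_dn = 4/3 - 1/3              # falling output current = pulldown - pullup = I
i_up = 1/3                    # rising output current = pullup current only

g_d = cin / (3 * i_dn)        # falling logical effort = 4/9
g_u = cin / (3 * i_up)        # rising logical effort  = 4/3

# Parasitic delays of the pseudo-nMOS NOR quoted in the text:
c_out = 10/3                  # output diffusion capacitance (units)
p_d = c_out / (3 * i_dn)      # pulling down: 10/9
p_u = c_out / (3 * i_up)      # pulling up:   10/3
print(g_d, g_u, p_d, p_u)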
The logical effort is independent of the number of inputs in wide NORs, so pseudo-
nMOS is useful for fast wide NOR gates or NOR-based structures like ROMs and PLAs
when power permits.
Pseudo-nMOS gates will not operate correctly if VOL > VIL of the receiving gate. This
is most likely in the SF design corner where nMOS transistors are weak and pMOS
transistors are strong.
Designing for acceptable noise margin in the SF corner forces a conservative choice
of weak pMOS transistors in the normal corner. A biasing circuit can be used to reduce
process sensitivity, as shown in Figure 3.11.
The goal of the biasing circuit is to create a Vbias that causes P2 to deliver 1/3 the
current of N2, independent of the relative mobilities of the pMOS and nMOS transistors.
Transistor N2 has width of 3/2 and hence produces current 3I/2 when ON.
Transistor N1 is tied ON to act as a current source with 1/3 the current of N2, i.e., I/2.
P1 acts as a current mirror using feedback to establish the bias voltage sufficient to provide
equal current as N1, I/2. The size of P1 is noncritical so long as it is large enough to produce
sufficient current and is equal in size to P2.
Now, P2 ideally also provides I/2. In summary, when A is low, the pseudo-nMOS
gate pulls up with a current of I/2. When A is high, the pseudo-nMOS gate pulls down with
an effective current of (3I/2 – I/2) =I. To first order, this biasing technique sets the relative
currents strictly by transistor widths, independent of relative pMOS and nMOS mobilities.
Such replica biasing permits the 1/3 current ratio rather than the conservative ¼ ratio in the
previous circuits, resulting in lower logical effort.
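A quick sanity check of the replica-bias currents described above (illustrative arithmetic only):

# Replica bias example: N2 carries 3I/2; N1 mirrors 1/3 of that, i.e. I/2,
# which P1/P2 copy. Effective pulldown current when A is high is then I.
I = 1.0
i_n2 = 1.5 * I
i_pullup = i_n2 / 3          # = I/2, current delivered by P2
i_eff = i_n2 - i_pullup      # = I, matching the text
print(i_pullup, i_eff)       # 0.5 1.0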
The bias voltage Vbias can be distributed to multiple pseudo-nMOS gates. Ideally, Vbias will adjust itself to keep VOL constant across process corners. Unfortunately, the currents through
the two pMOS transistors do not exactly match because their drain voltages are unequal, so
this technique still has some process sensitivity. Also note that this bias is relative to VDD,
so any noise on either the bias voltage line or the VDD supply rail will impact circuit
performance.
Figure 3.12 illustrates pairs of CMOS inverters ganged together. The truth table is given in Table 2, showing that the pair computes the NOR function. Such a circuit is sometimes called a symmetric 2-input NOR, or more generally, ganged CMOS.
When one input is 0 and the other 1, the gate can be viewed as a pseudo-
nMOS circuit with appropriate ratio constraints.
When both inputs are 0, both pMOS transistors turn on in parallel, pulling the output high faster than they would in an ordinary pseudo-nMOS gate.
Moreover, when both inputs are 1, both pMOS transistors turn OFF, saving
static power dissipation.
As in pseudo-nMOS, the transistors are sized so the pMOS are about 1/4 the
strength of the nMOS and the pulldown current matches that of a unit inverter. Hence, the
symmetric NOR achieves both better performance and lower power dissipation than a 2-
input pseudo-nMOS NOR.
Cascode Voltage Switch Logic (CVSL) seeks the benefits of ratioed circuits without the static power consumption.
It uses both true and complementary input signals and computes both true and
complementary outputs using a pair of nMOS pulldown networks, as shown in Figure
3.13(a).
The pulldown network f implements the logic function as in a static CMOS gate, while the complementary network f̄ uses inverted inputs feeding transistors arranged in the conduction complement.
For any given input pattern, one of the pulldown networks will be ON and the other
OFF. The pulldown network that is ON will pull that output low. This low output turns
ON the pMOS transistor to pull the opposite output high. When the opposite output
rises, the other pMOS transistor turns OFF so no static power dissipation occurs.
Figure 3.13(b) shows a CVSL AND/NAND gate. Observe how the pulldown networks
are complementary, with parallel transistors in one and series in the other.
Figure 3.13(c) shows a 4-input XOR gate. The pulldown networks share the A and Ā transistors to reduce the transistor count by two.
Advantages
CVSL has a potential speed advantage because all of the logic is performed with
nMOS transistors, thus reducing the input capacitance.
Unlike pseudo-nMOS, the feedback tends to turn off the pMOS, so the outputs will eventually settle to a legal logic level. However, a small pMOS transistor is slow at pulling the complementary output high. In addition, the CVSL gate requires both the low- and high-going transitions, adding more delay. Contention current during the switching period also increases power consumption.
Unfortunately, CVSL also requires the complement, a slow tall NAND structure.
Therefore, CVSL is poorly suited to general NAND and NOR logic.
Ratioed circuits reduce the input capacitance by replacing the pMOS transistors connected to the inputs with a single resistive pullup.
Drawbacks
The drawbacks of ratioed circuits include slow rising transitions, contention on the falling transition, static power dissipation, and a nonzero VOL. Dynamic circuits circumvent these drawbacks by using a clocked pullup transistor rather than a pMOS that is always ON.
Figure 3.14 compares (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters.
FIGURE 3.14 Comparison of (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters
Dynamic circuit operation is divided into two modes, as shown in Figure 3.15.
During precharge, the clock φ is 0, so the clocked pMOS is ON and initializes the
output Y high.
During evaluation, the clock is 1 and the clocked pMOS turns OFF.
The output may remain high or may be discharged low through the pulldown network.
Dynamic circuits are the fastest commonly used circuit family because they have lower input
capacitance and no contention during switching. They also have zero static power
dissipation.
However, they require careful clocking, consume significant dynamic power, and are
sensitive to noise during evaluation.
In Figure 3.14(c), if the input A is 1 during precharge, contention will take place because
both the pMOS and nMOS transistors will be ON.
When the input cannot be guaranteed to be 0 during precharge, an extra clocked evaluation
transistor can be added to the bottom of the nMOS stack to avoid contention as shown in
Figure 3.15(a).
The extra transistor is sometimes called a foot. The same figure also shows generic footed and unfooted gates.
Figure 3.16 estimates the falling logical effort of both footed and unfooted dynamic
gates. As usual, the pulldown transistors’ widths are chosen to give unit resistance.
Precharge occurs while the gate is idle and often may take place more slowly.
Therefore, the precharge transistor width is chosen for twice unit resistance. This reduces
the capacitive load on the clock and the parasitic capacitance at the expense of greater
rising delays.
We see that the logical efforts are very low. Footed gates have higher logical effort than their
unfooted counterparts but are still an improvement over static logic.
Like pseudo-nMOS gates, dynamic gates are particularly well suited to wide NOR functions
or multiplexers because the logical effort is independent of the number of inputs.
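For reference, a small sketch of the falling logical efforts of dynamic gates (the transistor widths assumed here follow the usual unit-resistance convention described above; they are not values stated in these notes):

# Falling logical effort of dynamic gates, assuming unit-resistance pulldowns
# and comparing input capacitance to a unit CMOS inverter (Cin = 3).
def g_unfooted_nor(n):
    return 1 / 3            # each input drives a single unit nMOS, regardless of n

def g_footed_nor(n):
    return 2 / 3            # input devices doubled so the series foot still gives unit resistance

def g_unfooted_nand(n):
    return n / 3            # n series transistors, each of width n

def g_footed_nand(n):
    return (n + 1) / 3      # n+1 series transistors (including the foot), each of width n+1

print(g_footed_nand(2), g_unfooted_nand(2))   # 1.0  0.666...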
Monotonicity
While a dynamic gate is in evaluation, the inputs must be monotonically rising. That is, the
input can start LOW and remain LOW, start LOW and rise HIGH, start HIGH and remain
HIGH, but not start HIGH and fall LOW.
Figure 3.17 shows waveforms for a footed dynamic inverter in which the input violates
monotonicity.
During precharge, the output is pulled HIGH. When the clock rises, the input is HIGH
so the output is discharged LOW through the pulldown network, as you would want to have
happen in an inverter.
The input later falls LOW, turning off the pulldown network. However, the precharge
transistor is also OFF so the output floats, staying LOW rather than rising as it would in a
normal inverter. The output will remain low until the next precharge step.
In summary, the inputs must be monotonically rising for the dynamic gate to compute
the correct function. Unfortunately, the output of a dynamic gate begins HIGH and
monotonically falls LOW during evaluation.
This monotonically falling output X is not a suitable input to a second dynamic gate
expecting monotonically rising signals, as shown in Figure 3.18. Dynamic gates sharing the
same clock cannot be directly connected.
The monotonicity problem can be solved by placing a static CMOS inverter between
dynamic gates, as shown in Figure 3.19(a).
This converts the monotonically falling output into a monotonically rising signal
suitable for the next gate, as shown in Figure 3.19(b).
A single clock can be used to precharge and evaluate all the logic gates within the
chain. The dynamic output is monotonically falling during evaluation, so the static inverter
output is monotonically rising.
Therefore, the static inverter is usually a HI-skew gate to favor this rising output.
Observe that precharge occurs in parallel, but evaluation occurs sequentially. The symbols
for the dynamic NAND, HI-skew inverter, and domino AND are shown in Figure 3.19(c).
In general, more complex inverting static CMOS gates such as NANDs or NORs can be
used in place of the inverter. This mixture of dynamic and static logic is called compound
domino.
Dual-rail domino gates encode each signal with a pair of wires; the two rails of each input and output signal pair are denoted with _h and _l suffixes.
The _h wire is asserted to indicate that the output of the gate is “high” or 1. The _l wire is
asserted to indicate that the output of the gate is “low” or 0. When the gate is precharged,
neither _h nor _l is asserted. The pair of lines should never be both asserted simultaneously
during correct operation.
Dual-rail domino gates accept both true and complementary inputs and compute both true
and complementary outputs, as shown in Figure 3.20(a).
Observe that this is identical to static CVSL circuits from Figure 3.13 except that the cross-
coupled pMOS transistors are instead connected to the precharge clock.
Therefore, dual-rail domino can be viewed as a dynamic form of CVSL, sometimes called
DCVS.
Figure 3.20(b) shows a dual-rail AND/NAND gate and Figure 3.20(c) shows a dual-rail
XOR/XNOR gate. The gates are shown with clocked evaluation transistors, but can also be
unfooted.
Dual-rail domino is a complete logic family in that it can compute all inverting and
noninverting logic functions. However, it requires more area, wiring, and power.
Dual-rail structures also lose the efficiency of wide dynamic NOR gates because they require
complementary tall dynamic NAND stacks.
Dual-rail domino not only signals the result of a computation but also indicates when the
computation is done. Before computation completes, both rails are precharged. When the
computation completes, one rail will be asserted. A NAND gate can be used for completion
detection, as shown in Figure 3.21.
Dynamic circuits also suffer from charge leakage on the dynamic node.
If a dynamic node is precharged high and then left floating, the voltage on the
dynamic node will drift over time due to subthreshold, gate, and junction
leakage.
Moreover, dynamic circuits have poor input noise margins. If the input rises
above Vt while the gate is in evaluation, the input transistors will turn on
weakly and can incorrectly discharge the output.
Both leakage and noise margin problems can be addressed by adding a keeper circuit.
The keeper is a weak transistor that holds, or staticizes, the output at the correct
level when it would otherwise float.
When the dynamic node X is high, the output Y is low and the keeper is ON to
prevent X from floating. When X falls, the keeper initially opposes the transition so it
must be much weaker than the pulldown network.
Eventually Y rises, turning the keeper OFF and avoiding static power dissipation.
The keeper must be strong (i.e., wide) enough to compensate for any leakage
current drawn when the output is floating and the pulldown stack is OFF.
Strong keepers also improve the noise margin because when the inputs are slightly
above Vt, the keeper can supply enough current to hold the output high.
For small dynamic gates, the keeper must be weaker than a minimum-sized transistor. This
is achieved by increasing the keeper length, as shown in Figure 3.23(a).
Long keeper transistors increase the capacitive load on the output Y. This can be avoided by
splitting the keeper, as shown in Figure 3.23(b).
Figure 3.24 shows a differential keeper for a dual-rail domino buffer. When the gate is
precharged, both keeper transistors are OFF and the dynamic outputs float. However, as
soon as one of the rails evaluates low, the opposite keeper turns ON.
The differential keeper is fast because it does not oppose the falling rail. As long as one of
the rails is guaranteed to fall promptly, the keeper on the other rail will turn on before
excessive leakage or noise causes failure.
Another variation on domino is shown in Figure 3.27(a). The HI-skew inverting static
gates are replaced with predischarged dynamic gates using pMOS logic.
For example, a footed dynamic p-logic NAND gate is shown in Figure 3.27(b).
When φ is 0, the first and third stages precharge high while the second stage
predischarges low.
When φ rises, all the stages evaluate.
The logical effort of footed p-logic gates is generally worse than that of HI-skew gates
(e.g., 2 vs. 3/2 for NOR2 and 4/3 vs. 1 for NAND2).
Secondly, this style, known as NP Domino or NORA, is extremely susceptible to noise.
In an ordinary dynamic gate, the input has a low noise margin (about Vt), but is strongly driven by a static CMOS gate. The floating dynamic output is more prone to noise from coupling and charge sharing, but drives another static CMOS gate with a larger noise margin. In NORA, however, the sensitive dynamic inputs are driven by noise-prone dynamic outputs.
Given these drawbacks and the extra clock phase required, there is little reason to
use NORA. Zipper domino is a closely related technique that leaves the precharge
transistors slightly ON during evaluation by using precharge clocks that swing between 0
and VDD – |Vtp| for the pMOS precharge and between Vtn and VDD for the nMOS precharge. This plays
much the same role as a keeper.
In the circuit families we have explored so far, inputs are applied only to the gate
terminals of transistors. In pass-transistor circuits, inputs are also applied to the source/drain
diffusion terminals.
These circuits build switches using either nMOS pass transistors or parallel pairs of
nMOS and pMOS transistors called transmission gates.
For the purpose of comparison, Figure 3.28 shows a 2-input multiplexer constructed in a wide variety of pass-transistor circuit families along with static CMOS, pseudo-nMOS, CVSL, and single- and dual-rail domino.
Some of the circuit families are dual-rail, producing both true and complementary outputs, while others are single-rail and may require an additional inversion if the other polarity of output is needed. In certain other cases, U XOR V can be computed with exactly the same multiplexer logic by driving the select inputs with U and its complement and the data inputs with V and its complement. This shows that static CMOS is particularly poorly suited to XOR because the complex gate and two additional inverters are required; hence, pass-transistor circuits become attractive.
In comparison, static CMOS NAND and NOR gates are relatively efficient and benefit less
from pass transistors.
The transmission gate multiplexer uses two transmission gates. The circuit is nonrestoring; i.e., the logic levels on the output are no better than those on the input, so a cascade of such circuits may accumulate noise. To buffer the output and restore levels, a static CMOS output inverter can be added, as shown in Figure 3.28 (CMOSTG).
A single nMOS or pMOS pass transistor suffers from a threshold drop. If used alone,
additional circuitry may be needed to pull the output to the rail. Transmission gates solve this
problem but require two transistors in parallel.
Estimate the effective resistance of a unit transistor passing a value in its poor
direction as twice the usual value: 2R for nMOS and 4R for pMOS.
Boosting the size of the pMOS transistor only slightly improves the effective resistance while
significantly increasing the capacitance.
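Combining these estimates, a rough calculation of the effective resistance of a unit transmission gate might look like the sketch below (it assumes the usual R and 2R values in the good direction for nMOS and pMOS, together with the 2R and 4R poor-direction estimates above):

# Rough effective resistance of a unit transmission gate (illustrative only).
def parallel(r1, r2):
    return r1 * r2 / (r1 + r2)

R = 1.0
r_pass_1 = parallel(2 * R, 2 * R)   # passing a 1: nMOS poor (2R) || pMOS good (2R) = R
r_pass_0 = parallel(R, 4 * R)       # passing a 0: nMOS good (R)  || pMOS poor (4R) = 0.8R
print(r_pass_1, r_pass_0)           # 1.0 0.8 -> roughly R in either direction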
Figure 3.30(a) redraws the multiplexer to include the inverters from the previous
stage that drive the diffusion inputs but to exclude the output inverter.
Figure 3.30(b) shows this multiplexer drawn at the transistor level. Observe that this
is identical to the static CMOS multiplexer of Figure 3.28 except that the intermediate nodes
in the pullup and pulldown networks are shorted together as N1 and N2.
The effective resistance decreases somewhat (especially for rising outputs) because
the output is pulled up or down through the parallel combination of both pass
transistors rather than through a single transistor.
The effective capacitance increases slightly because of the extra diffusion and wire capacitance required for this shorting.
Note that the circuit in Figure 3.31(d) interchanges the A and enable terminals. It is
logically equivalent, but electrically inferior because if the output is tristated but A toggles,
charge from the internal nodes may disturb the floating output node.
Note that the parasitic delay of transmission gate circuits with multiple series
transmission gates increases rapidly because of the internal diffusion capacitance, so it is
seldom beneficial to use more than two transmission gates in series without buffering.
Disadvantages of CVSL:
CVSL is slow because one side of the gate pulls down, and then the cross-
coupled pMOS transistor pulls the other side up.
Figure 3.33(a) shows the CPL multiplexer from Figure 3.28 rotated sideways. If a
path consists of a cascade of CPL gates, the inverters can be viewed equally well as being
on the output of one stage or the input of the next.
Figure 3.33(b) redraws the mux to include the inverters from the previous stage that drive the diffusion inputs, but to exclude the output inverters.
Figure 3.33(c) shows the mux drawn at the transistor level. Observe that this is
identical to the CVSL gate from Figure 3.28 except that the internal node of the stack can be
pulled up through the weak pMOS transistors in the inverters.
When the gate switches, one side pulls down well through its nMOS transistors. The
other side pulls up. CPL can be constructed without cross-coupled pMOS transistors, but the
outputs would only rise to VDD – Vt (or slightly lower because the nMOS transistors experience
the body effect). This costs static power because the output inverter will be turned slightly
ON.
Adding weak cross-coupled devices helps bring the rising output to the supply rail
while only slightly slowing the falling output. The output inverters can be LO-skewed to
reduce sensitivity to the slowly rising output.
1.8 POWER DISSIPATION
In CMOS chips, power was long a secondary consideration behind speed and area. As transistor counts and clock frequencies have increased, power consumption has become a primary design constraint.
Some important definitions follow. The instantaneous power P(t) drawn from the power supply is the product of the supply current iDD(t) and the supply voltage VDD:
P(t) = iDD(t) · VDD
The energy consumed over some time interval T is the integral of the
instantaneous power
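Written out (a standard definition, consistent with the expression for P(t) above), the energy over the interval T and the corresponding average power are
E = ∫0→T P(t) dt = VDD ∫0→T iDD(t) dt, and Pavg = E/T.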
Considering the static CMOS inverter shown in Figure (a), if the input = '0,' the
associated nMOS transistor is OFF and the pMOS transistor is ON. The output
voltage is VDD or logic '1'.
When the input = '1,' the associated nMOS transistor is ON and the pMOS
transistor is OFF. The output voltage is 0 volts (GND).
Note that one of the transistors is always OFF when the gate is in either of
these logic states. Ideally, no current flows through the OFF transistor so the
power dissipation is zero.
The leakage current is constant so instantaneous and average power are the same; the static power dissipation is the product of the total leakage current and the supply voltage.
Pstatic = IstaticVDD
SiO2 is a very good insulator, so leakage current through the gate dielectric
historically was very low. However, it is possible for electrons to tunnel across
very thin insulators; the probability drops off exponentially with oxide
thickness.
Current flows from VDD to the load to charge it. Current then flows from the
load to GND during discharge. In one complete charge/discharge cycle, a
total charge of Q = CVDD is thus transferred from VDD to GND.
Taking the integral of the current over some interval T gives the total charge delivered during that interval; the energy drawn from the supply in one complete charge/discharge cycle is therefore QVDD = CVDD².
Now the dynamic power dissipation may be rewritten in terms of an activity factor α and the clock frequency f: Pdynamic = α C VDD² f.
A clock has an activity factor of α = 1 because it rises and falls every cycle. Most data has a maximum activity factor of 0.5 because it transitions at most once per cycle.
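As a worked example of the dynamic power expression (all numbers below are hypothetical, chosen only to illustrate the formula):

# Dynamic power P = alpha * C * VDD^2 * f  (illustrative numbers, not from the notes)
alpha = 0.1          # average node activity factor
C = 150e-9           # total switched capacitance: 150 nF
VDD = 1.0            # supply voltage in volts
f = 1e9              # clock frequency: 1 GHz

P_dynamic = alpha * C * VDD**2 * f
print(P_dynamic, "W")    # 15.0 W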
Because the input rise/fall time is greater than zero, both nMOS and pMOS
transistors will be ON for a short period of time while the input is between Vtn
and VDD – |Vtp|.
This results in a "short-circuit" current pulse from VDD to GND and typically increases power dissipation by about 10%. Short-circuit power dissipation
occurs as both pullup and pulldown networks are partially ON while the input
switches.
It increases as edge rates become slower because both networks are ON for
more time. However, it decreases as load capacitance increases because
with large loads the output only switches a small amount during the input
transition, leading to a small Vds across one of the transistors.
It is good to use relatively crisp edge rates at the inputs to gates with wide
transistors to minimize their short circuit current.
Power dissipation has become extremely important to VLSI designers. Total power
dissipation is the sum of the static and dynamic dissipation components.
Dynamic dissipation has historically been far greater than static power when
systems are active, and hence, static power is often ignored.
The maximum power that a package and heatsink can remove increases slowly with advances in heatsink technology and can be increased significantly with expensive liquid cooling, but has not kept pace with the growing power demands of systems.
Power reduction techniques can be divided into those that reduce dynamic
power and those that reduce static power.
Activity factor reduction is very important. Static logic has an inherently low
activity factor. Dynamic circuit families have clocked nodes and a high internal
activity factor, so they are also costly in power. Clock gating can be used to stop
portions of the chip that are idle; for example, a floating point unit can be turned off
when executing integer code and a second level cache can be idled if the data is
found in the primary cache.
For example, buffers driving I/O pads or long wires may use a stage effort of
8-12 to reduce the buffer size. Interconnect switching capacitance is most
effectively reduced through careful floor-planning, placing communicating units
near each other to reduce wire lengths.
Frequency can also be traded for power. For example, in a digital signal processing
system primarily concerned with throughput, two multipliers running at half speed
can replace a single multiplier at full speed.
At first, this may not appear to be a good idea because it maintains constant power
and performance while doubling area. However, if the power supply can also be
reduced because the frequency requirement is lowered, overall power consumption
goes down.
Commonly used metrics in low-power design are power, the power-delay product,
and the energy-delay product. Power alone is a questionable metric because it can
be reduced simply by computing more slowly. The power-delay product is also
suspect because the energy can be reduced by computing more slowly at a lower
supply voltage. The energy-delay product is less prone to such gaming.
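The following sketch compares two hypothetical design points using these three metrics; the capacitance, voltage, and frequency values are invented purely for illustration:

# Compare two design points by power, power-delay product (PDP = energy per
# operation) and energy-delay product (EDP). Values are illustrative only.
def metrics(C, VDD, f, alpha=0.5):
    P = alpha * C * VDD**2 * f      # dynamic power
    delay = 1 / f                   # take one clock period as the delay per operation
    E = P * delay                   # energy per operation (PDP)
    return P, E, E * delay          # power, PDP, EDP

fast = metrics(C=100e-9, VDD=1.2, f=2e9)
slow = metrics(C=100e-9, VDD=0.9, f=1e9)   # half speed at a reduced supply
print(fast)   # lower delay, higher power and energy
print(slow)   # lower power and energy, but a worse EDP in this example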
Static power reduction involves minimizing Istatic. Some circuit techniques such as
analog current sources and pseudo-nMOS gates intentionally draw static power.
They can be turned off when they are not needed.
In the subthreshold leakage current expression, the η term describes drain-induced barrier lowering and the γ term describes the body effect.
Another way to control leakage is through the body voltage using the body
effect. For example, low-Vt devices can be used and a reverse body bias
(RBB) can be applied during idle mode to reduce leakage. Alternatively, higher-Vt devices can be used, and then a forward body bias (FBB) can be applied during active mode to increase performance.
Too much reverse body bias leads to greater junction leakage through a
mechanism called band-to-band tunneling, while too much forward body bias
leads to substantial current through the body to source diodes.
An adaptive body bias (ABB) can compensate and achieve more uniform
transistor performance despite the variations. In any case, the body bias
should be kept to less than about 0.5 V.
Applying a body bias requires additional power supply rails to distribute the
substrate and well voltages. For example, an RBB scheme for a 1.8 V n-well process could bias the p-type substrate at VBBn = –0.4 V and the n-well at VBBp = 2.2 V.
Alternatively, the source voltage can be raised in sleep mode. This has the
double benefit of reducing Vds as well as increasing Vsb. However, the source
does carry significant current, so generating a stable and adjustable source
voltage rail is challenging.
Fig.2.56(c): MTCMOS
The high-Vt device is connected between the true VDD and the virtual VDDV rails
connected to the logic gates. The extra transistor increases the impedance
between the true and virtual power supply, causing greater power supply noise
and gate delay.
Bypass capacitance between VDDV and GND stabilizes the supply somewhat,
but the capacitance is discharged each time VDDV is disconnected, contributing to
the power consumption.
The leakage through two series OFF transistors is much lower than that of a single transistor because of the stack effect. In Figure 2.57(d), the single transistor has a relatively low threshold because of drain-induced barrier lowering from the high drain voltage. With two OFF transistors in series, the intermediate node x rises to about 100 mV, which reduces the leakage of the stack.
Low-power systems can take advantage of this stack effect to put gates with
series transistors into a low-leakage sleep mode by applying an input pattern to
turn off both transistors. Silicon on Insulator (SOI) circuits are attractive for low-leakage designs because they have a sharper subthreshold current rolloff.
Part A
1. Define the Elmore delay model. Give the expression for Elmore delay and state the various parameters associated with it. (Nov-14, May-16, May-17)
It is an analytical method used to estimate the RC delay in a network. The Elmore delay model estimates the delay of an RC ladder as the sum, over each node in the ladder, of the resistance of the shared path between that node and the source multiplied by the capacitance on that node.
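For reference, the expression requested is tpd = Σi Ris Ci, where Ci is the capacitance at node i and Ris is the resistance of the portion of the path from the source to node i that is shared with the path to the output. A minimal Python sketch for an RC ladder (hypothetical values):

# Elmore delay of an RC ladder: t = R1*C1 + (R1+R2)*C2 + ... + (R1+...+Rn)*Cn
def elmore_ladder(R, C):
    delay, r_shared = 0.0, 0.0
    for r, c in zip(R, C):
        r_shared += r             # resistance shared with every downstream node
        delay += r_shared * c
    return delay

print(elmore_ladder([1e3, 1e3], [10e-15, 10e-15]))   # 3e-11 s for two identical segments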
They do not have a path from VDD to GND and do not dissipate standby power (static power dissipation).
13. What is a transmission gate?
It is a circuit constructed with the parallel connection of a pMOS and an nMOS transistor with shorted drain and source terminals. The gate terminals use two complementary select signals; when the select signal s is high, the transmission gate passes the signal at its input. The main advantage of the transmission gate is that it eliminates the threshold voltage drop.
14. Why has low power become an important issue in present-day VLSI circuit realization?
In deep submicron technology, power has become one of the most important issues because of:
Increasing transistor count – the number of transistors doubles roughly every 18 months, based on Moore's law.
producing output current as compared to an inverter, given that each input of the
gate may only present as much input capacitance as the inverter.
27. What is complementary pass transistor logic? State its advantages over CVSL. (Nov-14)
CVSL is slow because one side of the gate pulls down, and then the cross-coupled pMOS transistor pulls the other side up. In CPL, the internal nodes can also be pulled up through the weak pMOS transistors in the output inverters, so the rising output is faster and the gate does not rely solely on the cross-coupled pullup.
PART B Questions
1. (i) Explain the static and dynamic power dissipation in CMOS circuits with necessary diagrams and expressions. (10) DEC 2011
2. a) Implement the equation X = ((A+B)(C+D+E)+F)G using CMOS technology and draw the layout for this CMOS circuit. DEC 2012
3.Derive an expression for the rise time, fall time and propagation delay of a
CMOS inverter.(16)
4.Explain the various ways to minimize the static and dynamic power
dissipation. (16) DEC 2013
5. Discuss in detail about the ratioed circuit and dynamic circuit CMOS logic
configurations
6. Describe the basic principle of operation of dynamic CMOS, domino and NP domino logic with neat diagrams.
7. Explain the static and dynamic power dissipation in CMOS circuits with necessary diagrams and expressions.
8. Discuss the design techniques to reduce switching activity in static and dynamic CMOS circuits.
9. Briefly discuss the classification of circuit families and the comparison of circuit families.
10. i) Draw the static CMOS logic circuit for the given expressions (May-16)
Y = A · B · C · D
Y = D(…)
ii) Discuss in detail the characteristics of the CMOS transmission gate.
11. What are the sources of power dissipation in CMOS? Discuss various design techniques to reduce power dissipation in CMOS circuits. (May-16)
UNIT III
SEQUENTIAL LOGIC
CIRCUITS
REFERRED BOOK:
1. Jan Rabaey, Anantha Chandrakasan, B. Nikolic, "Digital Integrated Circuits: A Design Perspective", Second Edition, Prentice Hall of India, 2003.
2. N. Weste, K. Eshraghian, "Principles of CMOS VLSI Design", Second Edition, Addison-Wesley, 1993.
3.1 INTRODUCTION
Combinational logic circuits that were described earlier have the property that the output of a logic block is only a function of the current input values, assuming that enough time has elapsed for the logic gates to settle. Yet virtually all useful systems require storage of state information, leading to another class of circuits called sequential logic circuits. In these circuits, the output not only depends upon the current values of the inputs, but also upon preceding input values. In other words, a sequential circuit remembers some of the past history of the system — it has memory.
Figure 3.1 shows a block diagram of a generic finite state machine (FSM) that consists of combinational logic and registers that hold the system state. The system depicted here belongs to the class of synchronous sequential systems, in which all registers are under control of a single global clock. The outputs of the FSM are a function of the current Inputs and the Current State. The Next State is determined based on the Current State and the current Inputs and is fed to the inputs of the registers. On the rising edge of the clock, the Next State bits are copied to the outputs of the registers (after some propagation delay), and a new cycle begins. The register then ignores changes in the input signals until the next rising edge. In general, registers can be positive edge-triggered (where the input data is copied on the positive edge) or negative edge-triggered (where the input data is copied on the negative edge of the clock, as is indicated by a small circle at the clock input).
The hold time (thold) is the time the data input must remain valid after the clock edge. Assuming that the set-up and hold times are met, the data at the D input is copied to the Q output after a worst-case propagation delay (with reference to the clock edge) denoted by tc-q. Given the timing information for the registers and the combinational logic, some system-level timing constraints can be derived.
T ≥ tc-q + tpd,logic + tsu ------------------------------------------------------------- (3.1)
The hold time of the register imposes an extra constraint for proper operation,
tcd,register + tcd,logic ≥ thold --------------------------------------------- (3.2)
where tcd,register is the minimum propagation delay (or contamination delay) of the register. As seen from Eq. (3.1), it is important to minimize the values of the timing parameters associated with the register, as these directly affect the rate at which a sequential circuit can be clocked. In fact, modern high-performance systems are characterized by a very low logic depth, and the register propagation delay and set-up times account for a significant portion of the clock period.
For example, the DEC Alpha EV6 microprocessor has a maximum logic depth of 12 gates, and the register overhead stands for approximately 15% of the clock period. In general, the requirement of Eq. (3.2) is not hard to meet, although it becomes an issue when there is little or no logic between registers, or when the clocks at different registers are somewhat out of phase due to clock skew, as will be discussed in a later chapter.
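A minimal sketch of how the constraints of Eq. (3.1) and Eq. (3.2) might be checked for one register-to-register path (all parameter values below are hypothetical):

# Check the register timing constraints of Eq. (3.1) and Eq. (3.2).
# All times in ns; values are illustrative.
def check_path(T, tc_q, tpd_logic, tsu, tcd_register, tcd_logic, thold):
    setup_ok = T >= tc_q + tpd_logic + tsu            # Eq. (3.1): minimum clock period
    hold_ok = tcd_register + tcd_logic >= thold       # Eq. (3.2): hold (race) constraint
    return setup_ok, hold_ok

print(check_path(T=2.0, tc_q=0.2, tpd_logic=1.5, tsu=0.1,
                 tcd_register=0.1, tcd_logic=0.1, thold=0.05))   # (True, True)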
The bistable element is its most popular representative, but other elements such as monostable and astable circuits are also frequently used. Dynamic memories store state for a short period of time — on the order of milliseconds. They are based on the principle of temporary charge storage on parasitic capacitors associated with MOS devices. As with dynamic logic discussed earlier, the capacitors have to be refreshed periodically to compensate for charge leakage.
The resulting circuit has only three possible operation points (A, B, and C), as demonstrated on the combined VTC. The following important conjecture is easily proven to be valid: under the condition that the gain of the inverter in the transient region is larger than 1, only A and B are stable operation points, and C is a metastable operation point. Suppose that the cross-coupled inverter pair is biased at point C and a small deviation δ is applied to Vi1. This deviation is amplified by the gain of the inverter.
The enlarged divergence is applied to the second inverter and amplified once more. The bias point moves away from C until one of the operation points A or B is reached. In conclusion, C is an unstable operation point. Every deviation (even the smallest one) causes the operation point to run away from its original bias. The chance is indeed very small that the cross-coupled inverter pair is biased at C and stays there. Operation points with this property are termed metastable.
In order to change the stored value, we must be able to bring the circuit from state A to B and vice-versa. Since the precondition for stability is that the loop gain G is smaller than unity, we can achieve this by making A (or B) temporarily unstable by increasing G to a value larger than 1. This is generally done by applying a trigger pulse at Vi1 or Vi2. For instance, assume that the system is in position A (Vi1 = 0, Vi2 = 1). Forcing Vi1 to 1 causes both inverters to be on simultaneously for a short time and the loop gain G to be larger than 1. The positive feedback regenerates the effect of the trigger pulse, and the circuit moves to the other state (B in this case). The width of the trigger pulse need be only a little larger than the total propagation delay around the circuit loop, which is twice the average propagation delay of the inverters.
This circuit is similar to the cross-coupled inverter pair, with NOR gates replacing the inverters. The second input of each NOR gate is connected to the trigger inputs (S and R), which make it possible to force the outputs Q and Q̄ to a given state. These outputs are complementary (except for the SR = 11 state).
When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their value (a NOR gate with one of its inputs being 0 looks like an inverter, and the structure looks like a cross-coupled inverter pair). If a positive (or 1) pulse is applied to the S input, the Q output is forced into the 1 state (with Q̄ going to 0). Vice versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0.
There are many approaches for constructing latches. One very common technique involves the use of transmission gate multiplexers. Multiplexer-based latches can provide similar functionality to the SR latch, but have the important added advantage that the sizing of devices only affects performance and is not critical to the functionality.
Unlike the SR FF, the feedback does not have to be overridden to write the memory and hence the sizing of transistors is not critical for realizing correct functionality. The number of transistors that the clock touches is important since it has an activity factor of 1. This particular latch implementation is not particularly efficient by this metric, as it presents a load of 4 transistors to the CLK signal.
When CLK is high, the latch samples the D input, while a low clock signal enables the feedback loop and puts the latch in the hold mode. While attractive for its simplicity, the use of NMOS-only pass transistors results in the passing of a degraded high voltage of VDD–VTn to the input of the first inverter.
This impacts both the noise margin and the switching performance, especially in the case of low values of VDD and high values of VTn. It also causes static power dissipation in the first inverter. Since the maximum input voltage to the inverter equals VDD–VTn, the PMOS device of the inverter is never turned off, resulting in a static current flow.
Figure 3.9 Multiplexer-based NMOS latch using NMOS-only pass transistors for the multiplexer
On the low phase of the clock, the master stage is transparent and the D input is passed to the master stage output, QM. During this period, the slave stage is in the hold mode, keeping its previous value using feedback. On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling. During the high phase of the clock, the slave stage samples the output of the master stage (QM), while the master stage remains in a hold mode. Since QM is constant during the high phase of the clock, the output Q makes only one transition per cycle. The value of Q is the value of D right before the rising edge of the clock, achieving the positive edge-triggered effect. A negative edge-triggered register can be constructed using the same principle by simply switching the order of the positive and negative latch (i.e., placing the positive latch first).
As discussed earlier, there are three important timing metrics in registers: the set-up time, the hold time and the propagation delay. It is important to understand the factors that affect these timing parameters and to develop the intuition to manually estimate them. Assume that the propagation delay of each inverter is tpd_inv and the propagation delay of the transmission gate is tpd_tx.
Also assume that the contamination delay is 0 and that the inverter used to derive CLK̄ from CLK has zero delay. The set-up time is the time before the rising edge of the clock that the input data D must become valid. Another way to ask the question is: how long before the rising edge does the D input have to be stable such that QM samples the value reliably?
The propagation delay is the time for the value of QM to propagate to the output Q. Note that since we included the delay of I2 in the set-up time, the output of I4 is valid before the rising edge of the clock. Therefore the delay tc-q is simply the delay through T3 and I6 (tc-q = tpd_tx + tpd_inv). The hold time represents the time that the input must be held stable after the rising edge of the clock. In this case, the transmission gate T1 turns off when the clock goes high, and therefore any changes in the D input after the clock goes high are not seen by the input.
Therefore, the hold time is 0. As mentioned earlier, the drawback of the transmission gate register is the high capacitive load presented to the clock signal. The clock load per register is important since it directly impacts the power dissipation of the clock network. Ignoring the overhead required to invert the clock signal (since the buffer inverter overhead can be amortized over multiple register bits), each register has a clock load of 8 transistors. One approach to reduce the clock load at the cost of robustness is to make the circuit ratioed. Figure 3.12 shows that the feedback transmission gate can be eliminated by directly cross-coupling the inverters.
Figure 3.12 Reduced clock load static master-slave register
The penalty for the reduced clock load is increased design complexity. The transmission gate (T1) and its source driver must overpower the feedback inverter (I2) to switch the state of the cross-coupled inverter. The sizing requirements for the transmission gates can be derived using an analysis similar to the one performed for the SR flip-flop. The input to the inverter I1 must be brought below its switching threshold in order to make a transition. If minimum-sized devices are to be used in the transmission gates, it is essential that the transistors of inverter I2 be made weak.
• When the clock goes high, the slave stage should stop sampling the master stage output and go into a hold mode. However, since CLK and CLK̄ are both high for a short period of time (the overlap period), both sampling pass transistors conduct and there is a direct path from the D input to the Q output. As a result, data at the output can change on the rising edge of the clock, which is undesired for a negative edge-triggered register. This is known as a race condition, in which the value of the output Q is a function of whether the input D arrives at node X before or after the falling edge of CLK. If node X is sampled in the metastable state, the output will switch to a value determined by noise in the system.
However, with the use of conditional clocks, it is possible that registers are idle for extended periods, and the leakage energy expended by registers can be quite significant. Many solutions are being explored to address the problem of high leakage during idle periods. One approach involves the use of multiple-threshold devices, as shown in the figure.
Only the negative latch is shown here. The shaded inverters and transmission gates are implemented with low-threshold devices. The low-threshold inverters are gated using high-threshold devices to eliminate leakage. During the normal mode of operation, the sleep devices are turned on. When the clock is low, the D input is sampled and propagates to the output. When the clock is high, the latch is in the hold mode.
A stored value can hence only be kept for a limited amount of time, typically in the range of milliseconds. If one wants to preserve signal integrity, a periodic refresh of its value is necessary; hence the name dynamic storage. Reading the value of the stored signal from a capacitor without disrupting the charge requires the availability of a device with a high input impedance.
One important consideration for such a dynamic register is that the storage nodes (i.e., the state) have to be refreshed at periodic intervals to prevent a loss due to charge leakage, caused by diode leakage as well as sub-threshold currents. In datapath circuits, the refresh rate is not an issue since the registers are periodically clocked, and the storage nodes are constantly updated.
Clock overlap is an important concern for this register. Consider the clock waveforms shown in Figure 3.17. During the 0-0 overlap period, the NMOS of T1 and the PMOS of T2 are simultaneously on, creating a direct path for data to flow from the D input of the register to the Q output. This is known as a race condition. The output Q can change on the falling edge if the overlap period is large — obviously an undesirable effect for a positive edge-triggered register. The same is true for the 1-1 overlap region, where an input-output path exists through the PMOS of T1 and the NMOS of T2.
The latter case is taken care of by enforcing a hold time constraint; that is, the data must be stable during the high-high overlap period. The former situation (0-0 overlap) can be addressed by making sure that there is enough delay between the D input and node 2, ensuring that new data sampled by the master stage does not propagate through to the slave stage. Generally, the built-in single inverter delay should be sufficient, and the overlap period constraint is given as:
1. CLK = 0 (CLK̄ = 1): The first tri-state driver is turned on, and the master stage acts as an inverter sampling the inverted version of D on the internal node X. The master stage is in the evaluation mode. Meanwhile, the slave section is in a high-impedance (hold) mode.
2. The roles are reversed when CLK = 1: The master stage section is in hold mode (M3-M4 off), while the second section evaluates (M7-M8 on). The value stored on CL1 propagates to the output node through the slave stage, which acts as an inverter. The overall circuit operates as a positive edge-triggered master-slave register — very similar to the transmission-gate based register presented earlier.
To prove the above statement, we examine both the (0-0) and (1-1) overlap cases (Figure 3.17). In the (0-0) overlap case, the circuit simplifies to the network shown in Figure 3.19a, in which both PMOS devices are on during this period. The question is whether any new data sampled during the overlap window propagates to the output Q.
This is not desirable, since data should not change on the negative edge for a positive edge-triggered register. Indeed, new data is sampled on node X through the series PMOS devices M2-M4, and node X can make a 0-to-1 transition during the overlap period. However, this data cannot propagate to the output since the NMOS device M7 is turned off.
At the end of the overlap period, CLK̄ = 1 and both M7 and M8 turn off, putting the slave stage in the hold mode. Therefore, any new data sampled on the falling clock edge is not seen at the slave output Q, since the slave stage is off till the next rising edge of the clock. As the circuit consists of a cascade of inverters, signal propagation requires one pull-up followed by a pull-down, or vice-versa, which is not feasible in the situation presented. The (1-1) overlap case (Figure 3.19b), where both NMOS devices M3 and M7 are turned on, is somewhat more contentious.
The question is again whether new data sampled during the overlap period (right after the clock goes high) propagates to the Q output. A positive edge-triggered register may only pass data that is presented at the input before the rising edge. If the D input changes during the overlap period, node X can make a 1-to-0 transition, but cannot propagate to the output. However, as soon as the overlap period is over, the PMOS M8 is turned on and the 0 propagates to the output. This effect is not desirable.
The problem is fixed by imposing a hold time constraint on the input data D; in other words, the data D should be stable during the overlap period. In summary, it can be stated that the C2MOS latch is insensitive to clock overlaps because those overlaps activate either the pull-up or the pull-down networks of the latches, but never both of them simultaneously. If the rise and fall times of the clock are sufficiently slow, however, there exists a time slot where both the NMOS and PMOS transistors are conducting.
This creates a path between input and output that can destroy the state of the circuit. Simulations have shown that the circuit operates correctly as long as the clock rise time (or fall time) is smaller than approximately five times the propagation delay of the register. This criterion is not too stringent, and is easily met in practical designs.
As a result of the dual-stage approach, no signal can ever propagate from the input of the latch to the output in this mode. A register can be constructed by cascading positive and negative latches. The clock load is similar to a conventional transmission gate register, or C2MOS register. The main advantage is the use of a single clock phase. The disadvantage is the slight increase in the number of transistors — 12 transistors are required.
While the set-up time of this latch has increased over the one shown in Figure 3.20, the overall performance of the digital circuit (that is, the clock period of a sequential circuit) has improved: the increase in set-up time is typically smaller than the delay of an AND gate. This approach of embedding logic into latches has been used extensively in the design of the EV4 DEC Alpha microprocessor and many other high-performance processors.
where tc-q and tsu are the propagation delay and the set-up time of the register, respectively. We assume that the registers are edge-triggered D registers. The term tpd,logic stands for the worst-case delay path through the combinational network, which consists of the adder, absolute value, and logarithm functions.
The result for the data set (a1, b1) only appears at the output after three clock periods. At that time, the circuit has already performed parts of the computations for the next data sets, (a2, b2) and (a3, b3). The computation is performed in an assembly-line fashion, hence the name pipeline.
Table 3.1 Example of pipelined computations
Each stage in the pipeline contains less logic than the original function. This effectively reduces the value of the minimum allowable clock period:
That is, logic is introduced between the master and slave latches of a master-slave system. In the following discussion, we use, without loss of generality, the CLK–CLK̄ notation to denote a two-phase clock system.
The value stored on C2 at the end of the CLK low phase is the result of passing the previous input (stored on the falling edge of CLK on C1) through the logic function F. When overlap exists between CLK and CLK̄, the next input is already being applied to F, and its effect might propagate to C2 before CLK goes low (assuming that the contamination delay of F is small).
In other words, a race develops between the previous input and the current one. Which value wins depends upon the logic function F, the overlap time, and the value of the inputs, since the propagation delay is often a function of the applied inputs. The latter factor makes the detection and elimination of race conditions non-trivial.
The reasoning for the above argument is similar to the argument made in the construction of a C2MOS register. During a (0-0) overlap between CLK and CLK̄, all C2MOS latches simplify to pure pull-up networks.
The only way a signal can race from stage to stage under this condition is when the logic function F is inverting, as illustrated in Figure 3.25, where F is replaced by a single, static CMOS inverter.
Similar considerations are valid for the (1-1) overlap. Based on this concept, a logic circuit style called NORA-CMOS was conceived. It combines C2MOS pipeline registers and NORA dynamic logic function blocks.
Dynamic and static logic can be mixed freely, and both CLKp and CLKn dynamic blocks can be used in cascaded or in pipelined form. With this freedom of design, efficient pipelined datapaths can be constructed.
(a) CLK-module
(b) CLK̄-module
Figure 3.26 Examples of NORA CMOS modules.
• The dynamic-logic rule: Inputs to a dynamic CLKn (CLKp) block are only allowed to make a single 0→1 (1→0) transition during the evaluation period.
• The C2MOS rule: In order to avoid races, the number of static inversions between C2MOS latches should be even.
This translates into the following rule: the number of static inversions between the last dynamic block in a logic function and the C2MOS latch should be even. This and similar considerations lead to a reformulated C2MOS rule.
3.7 Clocking Strategy
Introduction
A synchronous signal is one that has the exact same frequency, and a known fixed phase offset, with respect to the local clock. In such a timing methodology, the signal is "synchronized" with the clock, and the data can be sampled directly without any uncertainty. In digital logic design, synchronous systems are the most straightforward type of interconnect, where the flow of data in a circuit proceeds in lockstep with the system clock, as shown below.
Here, the input data signal In is sampled with register R1 to give signal Cin, which is synchronous with the system clock and then passed along to the combinational logic block. After a suitable settling period, the output Cout becomes valid and can be sampled by R2, which synchronizes the output with the clock. In a sense, the "certainty period" of signal Cout, or the period where data is valid, is synchronized with the system clock, which allows register R2 to sample the data with complete confidence. The length of the "uncertainty period," or the period where data is not valid, places an upper bound on how fast a synchronous interconnect system can be clocked.
A mesochronous signal is one that has the same frequency but an unknown phase offset with respect to the local clock ("meso" from Greek is middle). For example, if data is being passed between two different clock domains, then the data signal transmitted from the first module can have an unknown phase relationship to the clock of the receiving module.
Figure 3.28 Mesochronous communication approach using a variable delay line.
A plesiochronous signal is one that has nominally the same, but slightly different, frequency as the local clock ("plesio" from Greek is near). In effect, the phase difference drifts in time. This scenario can easily arise when two interacting modules have independent clocks generated from separate crystal oscillators.
Since the clock frequencies of the originating and receiving modules are mismatched, data might have to be dropped if the transmit frequency is faster, and data can be duplicated if the transmit frequency is slower than the receive frequency. However, by making the FIFO large enough, and periodically resetting the system whenever an overflow condition occurs, robust communication can be achieved.
Asynchronous signals can transition at any arbitrary time, and are not slaved to any local clock. As a result, it is not straightforward to map these arbitrary transitions into a synchronized data stream.
For a positive edge-triggered system, the rising edge of the clock is used to denote the beginning and completion of a clock cycle. In the ideal world, assuming the clock paths from a central distribution point to each register are perfectly balanced, the phase of the clock (i.e., the position of the clock edge relative to a reference) at various points in the system is going to be exactly equal.
At the same time, the hold time of the destination register must be shorter than the minimum propagation delay through the logic network:
thold < tc-q,cd + tlogic,cd -----------------------------------------------------------(2)
The above analysis is simplistic since the clock is never ideal. As a result of process and environmental variations, the clock signal can have spatial and temporal variations.
Clock Skew
The spatial variation in arrival time of a clock transition on an integrated circuit
is commonlyreferred to as clock skew. The clock skew between two points i and j on
a IC isgiven by δ(i,j) = ti- tj, where ti and tj are the position of the rising edge of the
clock withrespect to a reference. Consider the transfer of data between registers R1
and R2 in Figure3.30. The clock skew can be positive or negative depending upon
the routing direction andposition of the clock source. The timing diagram for the case
with positive skew is shownin Figure 3.31. As the figure illustrates, the rising clock
edge is delayed by a positive δ atthe second register.
Figure 3.31 Timing diagram to study the impact of clock skew on performance and
functionality. In this sample timing diagram, δ > 0.
Clock skew is caused by static path-length mismatches in the clock load and
by definition skew is constant from cycle to cycle. That is, if in one cycle CLK2
lagged CLK1 by δ, then on the next cycle it will lag it by the same amount. It is
important to note that clock skew does not result in clock period variation, but rather
phase shift.
Figure 3.32 Timing diagram for the case when δ < 0. The rising edge of CLK2
arrives earlier than the edge of CLK1.
The above equation suggests that clock skew actually has the potential to
improve the performance of the circuit. That is, the minimum clock period required to
operate the circuit reliably reduces with increasing clock skew! This is indeed correct,
but unfortunately, increasing skew makes the circuit more susceptible to race
conditions and may harm the correct operation of sequential systems.
As above, assume that input In is sampled on the rising edge of CLK1 at edge
1 into R1. The new value at the output of R1 propagates through the combinational
logic and should be valid before edge 4 at CLK2. However, if the minimum delay of
the combinational logic block is small, the inputs to R2 may change before the clock
edge 2, resulting in incorrect evaluation.
Figure 3.32 shows the timing diagram for the case when δ < 0. For this case,
the rising edge of CLK2 happens before the rising edge of CLK1. On the rising edge
of CLK1, a new input is sampled by R1. The new sampled data propagates through
the combinational logic and is sampled by R2 on the rising edge of CLK2, which
corresponds to edge 4. As can be seen from Figure 3.32 and Eq. (1), a negative skew
directly impacts the performance of the sequential system. However, a negative skew
implies that the system never fails, since edge 2 happens before edge 1! This can
also be seen from Eq. (2), which is always satisfied since δ < 0.
Example scenarios for positive and negative clock skew are shown in Figure
• δ < 0: When the clock is routed in the opposite direction of the data (Figure
3.32b), the skew is negative and condition (4) is unconditionally met. The circuit
operates correctly independent of the skew. The skew reduces the time available for
actual computation, so the clock period has to be increased by |δ|. In summary,
routing the clock in the opposite direction of the data avoids disasters but hampers
the circuit performance.
Unfortunately, since a general logic circuit can have data flowing in both
directions (for example, circuits with feedback), this solution to eliminate races will not
always work (Figure 3.33). The skew can assume both positive and negative values
depending on the direction of the data transfer.
Clock jitter refers to the temporal variation of the clock period at a given point
on the chip; that is, the clock period can reduce or expand on a cycle-by-cycle
basis. It is strictly a temporal uncertainty measure and is often specified at a given
point on the chip.
The above equation illustrates that jitter directly reduces the performance of a
sequential circuit. Care must be taken to reduce jitter in the clock network to
maximize performance.
T ≥ tc-q + tlogic + tsu + 2·tjitter − δ ----------(6)
Figure 3.35 Sequential circuit to study the impact of skew and jitter on edge-
triggered systems. In this example, a positive skew (δ) is assumed.
As the above equation illustrates, while positive skew can provide a potential
performance advantage, jitter has a negative impact on the minimum clock period. To
formulate the minimum delay constraint, consider the case when the leading edge of
the CLK1 cycle arrives early (edge 1) and the leading edge of the current cycle of CLK2
arrives late (edge 6). The separation between edge 1 and edge 6 should be smaller than
the minimum delay through the network. This results in
δ + tjitter,1 + tjitter,2 ≤ tc-q,cd + tlogic,cd − thold ----------(7)
The above relation indicates that the acceptable skew is reduced by the jitter
of the two signals. Now consider the case when the skew is negative (δ < 0), as shown
in Figure 3.36. For the timing shown, |δ| > tjitter,2. It can be easily verified that the
worst-case timing is exactly the same as in the previous analysis, with δ taking a negative
value. That is, negative skew reduces performance.
Figure 3.36 Consider a negative clock skew (δ); the skew is assumed to be
larger than the jitter.
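The following sketch (not part of the original notes) turns the reconstructed constraints (6) and (7) into two simple checks for an edge-triggered path with skew and jitter. All delay values in the example are made-up illustrative numbers in nanoseconds, and the function and parameter names are hypothetical.

def max_delay_ok(T, t_cq, t_logic, t_su, skew, t_jitter):
    # Setup (max-delay) check: T + skew - 2*t_jitter must cover the
    # worst-case register, logic and setup delays, per constraint (6).
    return T + skew - 2 * t_jitter >= t_cq + t_logic + t_su

def min_delay_ok(t_cq_cd, t_logic_cd, t_hold, skew, t_jitter1, t_jitter2):
    # Hold (min-delay) check: the contamination delays must exceed the
    # hold time plus skew plus the jitter of both edges, per constraint (7).
    return t_cq_cd + t_logic_cd >= t_hold + skew + t_jitter1 + t_jitter2

if __name__ == "__main__":
    T = 2.0          # clock period (ns), assumed
    skew = 0.2       # positive skew: the receiving clock edge arrives late
    jit = 0.05       # cycle-to-cycle jitter at each register
    print(max_delay_ok(T, t_cq=0.3, t_logic=1.4, t_su=0.2, skew=skew, t_jitter=jit))
    print(min_delay_ok(t_cq_cd=0.15, t_logic_cd=0.25, t_hold=0.1,
                       skew=skew, t_jitter1=jit, t_jitter2=jit))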
Pulse Registers
Until now, we have used the master-slave configuration to create an edge-triggered
register. A fundamentally different approach for constructing a register uses pulse signals.
The idea is to construct a short pulse around the rising (or falling) edge of the clock. This
pulse acts as the clock input to a latch (e.g., a TSPC flavor is shown in Figure 7.35a),
sampling the input only in a short window. Race conditions are thus avoided by keeping the
opening time (i.e., the transparent period) of the latch very short. The combination of the
glitch-generation circuitry and the latch results in a positive edge-triggered register. Figure
7.35b shows an example circuit for constructing a short intentional glitch on each rising edge
of the clock. When CLK = 0, node X is charged up to VDD (MN is off since CLKG is low). On
the rising edge of the clock, there is a short period of time when both inputs of the AND gate
are high, causing CLKG to go high. This in turn activates MN, pulling X and eventually
CLKG low (Figure 7.35c). The length of the pulse is controlled by the delay of the AND gate
and the two inverters. Note that there exists also a delay between the rising edges of the
input clock (CLK) and the glitch clock (CLKG) — also equal to the delay of the AND gate and
the two inverters. If every register on the chip uses the same clock generation mechanism,
this sampling delay does not matter. However, process variations and load variations may
cause the delays through the glitch clock circuitry to be different. This must be taken into
account when performing timing verification and clock skew analysis (which is the topic of a
later Chapter). If set-up time and hold time are measured in reference to the rising edge of
the glitch clock, the set-up time is essentially zero, the hold time is equal to the length of the
pulse (if the contamination delay is zero for the gates), and the propagation delay (tc-q)
equals two gate delays. The advantage of the approach is the reduced clock load and the
small number of transistors required. The glitch-generation circuitry can be amortized over
multiple register bits. The disadvantage is a substantial increase in verification complexity.
This has prevented widespread use. Pulse registers do, however, provide an alternative to
conventional schemes, and have been adopted in some high-performance processors.
Another version of the pulsed register is shown in Figure 7.36 When the clock is low, M3
and M6 are off, and device P1 is turned on. Node X is precharged to VDD, the output node
(Q) is decoupled from X and is held at its previous state. CLKD is a delay-inverted version of
CLK. On the rising edge of the clock, M3 and M6 turn on while devices M1 and M4 stay on
for a short period, determined by the delay of the three inverters. During this interval, the
circuit is transparent and the input data D is sampled by the latch. Once CLKD goes low,
node X is decoupled from the D input and is either held or starts to precharge to VDD by
PMOS device P2. On the falling edge of the clock, node X is held at VDD and the output is
held stable by the cross-coupled inverters.
Note that this circuit also uses a one-shot, but the one-shot is integrated into the register.
The transparency period also determines the hold time of the register. The window
must be wide enough for the input data to propagate to the Q output. In this particular circuit,
the set-up time can be negative. This is the case if the transparency window is longer
than the delay from input to output. This is attractive, as data can arrive at the register even
after the clock goes high, which means that time is borrowed from the previous cycle.
When the clock is low, the evaluate transistor is turned off, ensuring that the differential inputs don't affect the output during the low phase of
the clock. On the rising edge of the clock, the evaluate transistor turns on and the differential
input pair (M2 and M3) is enabled, and the difference between the input signals is amplified
on the output nodes L1 and L2. The cross-coupled inverter pair flips to one of its
stable states based on the value of the inputs. For example, if IN is 1, L1 is pulled to 0, and
L2 remains at VDD. Due to the amplifying properties of the input stage, it is not necessary
for the input to swing all the way up to VDD, which enables the use of low-swing signaling on
the input wires. The shorting transistor, M4, is used to provide a DC leakage path from either
node L3, or L4, to ground. This is necessary to accommodate the case where the inputs
change their value after the positive edge of CLK has occurred, resulting in either L3 or L4
being left in a high-impedance state with a logical low voltage level stored on the node.
Without the leakage path, that node would be susceptible to charging by leakage currents.
The latch could then actually change state prior to the next rising edge of CLK! This is best
illustrated graphically, as shown in Figure 7.39.
Figure: Positive edge-triggered register based on a sense amplifier.
Non-Bistable Sequential Circuits
In the preceding sections, we have focused on one single type of sequential element, that is,
the latch (and its sibling, the register). The most important property of such a circuit is that it
has two stable states, and is hence called bistable. The bistable element is not the only
sequential circuit of interest. Other regenerative circuits can be catalogued as astable and
monostable. The former act as oscillators and can, for instance, be used for on-chip clock
generation. The latter serve as pulse generators, also called one-shot circuits. Another
interesting regenerative circuit is the Schmitt trigger. This component has the useful property
of showing hysteresis in its dc characteristics—its switching threshold is variable and
depends upon the direction of the transition (low-to-high or high-to-low). This peculiar feature
can come in handy in noisy environments.
7.6.1 The Schmitt Trigger
Definition
A Schmitt trigger [Schmitt38] is a device with two important properties:
1. It responds to a slowly changing input waveform with a fast transition time at the
output.
2. The voltage-transfer characteristic of the device displays different switching thresholds
for positive- and negative-going input signals. This is demonstrated in Figure 7.46, where a
typical voltage-transfer characteristic of the Schmitt trigger is shown (and its schematic
symbol). The switching thresholds for the low-to-high and high-to-low transitions are called
VM+ and VM−, respectively. The hysteresis voltage is defined as the difference between the
two. One of the main uses of the Schmitt trigger is to turn a noisy or slowly varying input
signal into a clean digital output signal. This is illustrated in Figure 7.47. Notice how the
hysteresis suppresses the ringing on the signal. At the same time, the fast low-to-high (and
high-to-low) transitions of the output signal should be observed. For instance, steep signal
slopes are beneficial in reducing power consumption by suppressing direct-path currents.
The “secret” behind the Schmitt trigger concept is the use of positive feedback.
CMOS Implementation
One possible CMOS implementation of the Schmitt trigger is shown in Figure 7.48. The idea
behind this circuit is that the switching threshold of a CMOS inverter is determined by the
(kn/kp) ratio between the NMOS and PMOS transistors. Increasing the ratio results in a
reduction of the threshold, while decreasing it results in an increase in VM. Adapting the ratio
depending upon the direction of the transition results in a shift in the switching threshold and
a hysteresis effect. This adaptation is achieved with the aid of feedback. Suppose that Vin is
initially equal to 0, so that Vout = 0 as well. The feedback loop biases the PMOS transistor
M4 in the conductive mode while M3 is off. The input signal effectively connects to an
inverter consisting of two PMOS transistors in parallel (M2 and M4) as a pull-up network,
and a single NMOS transistor (M1) in the pull-down chain. This modifies the effective
transistor ratio of the inverter to kM1/(kM2+kM4), which moves the switching threshold
upwards. Once the inverter switches, the feedback loop turns off M4, and the NMOS device
M3 is activated. This extra pull-down device speeds up the transition and produces a clean
output signal with steep slopes. A similar behavior can be observed for the high-to-low
transition. In this case, the pull-down network originally consists of M1 and M3 in parallel,
while the pull-up network is formed by M2. This reduces the value of the switching threshold
to VM–.
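As a rough numeric illustration (not from the notes), the sketch below applies the simple long-channel inverter threshold formula VM = (VDD − |Vtp| + r·Vtn)/(1 + r), with r = sqrt(kn/kp), using the effective kn/kp ratio set by the feedback devices M3 and M4 as described above. The supply, threshold voltages and relative device strengths are assumed values, and second-order effects such as body effect and velocity saturation are ignored.

from math import sqrt

def vm(vdd, vtn, vtp, kn_eff, kp_eff):
    # Simple long-channel inverter switching threshold with both devices saturated
    r = sqrt(kn_eff / kp_eff)
    return (vdd - abs(vtp) + r * vtn) / (1 + r)

VDD, VTN, VTP = 2.5, 0.4, -0.4
k1, k2, k3, k4 = 1.0, 1.0, 1.0, 1.0      # assumed relative strengths of M1..M4

# Low-to-high input transition: M4 strengthens the pull-up (M2 || M4), M3 is off
VM_plus = vm(VDD, VTN, VTP, kn_eff=k1, kp_eff=k2 + k4)

# High-to-low input transition: M3 strengthens the pull-down (M1 || M3), M4 is off
VM_minus = vm(VDD, VTN, VTP, kn_eff=k1 + k3, kp_eff=k2)

print(f"VM+ = {VM_plus:.2f} V, VM- = {VM_minus:.2f} V, "
      f"hysteresis = {VM_plus - VM_minus:.2f} V")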
td is created. The delay circuit can be realized in many different ways, such as an RC
network or a chain of basic gates.
Astable Circuits
An astable circuit has no stable states. The output oscillates back and forth between two
quasi-stable states with a period determined by the circuit topology and parameters (delay,
power supply, etc.). One of the main applications of oscillators is the on-chip generation of
clock signals. This application is discussed in detail in a later chapter (on timing). The ring
oscillator is a simple example of an astable circuit. It consists of an odd number of inverters
connected in a circular chain. Due to the odd number of inversions, no stable operating point
exists, and the circuit oscillates with a period equal to 2 × N × tp, with N the number of
inverters in the chain and tp the propagation delay of each inverter. The ring oscillator
composed of cascaded inverters produces a waveform with a fixed oscillating frequency
determined by the delay of an inverter in the CMOS process. In many applications, it is
necessary to control the frequency of the oscillator. An example of such a circuit is the
voltage-controlled oscillator (VCO), whose oscillation frequency is a function (typically non-
linear) of a control voltage. The standard ring oscillator can be modified into a VCO by
replacing the standard inverter with a current-starved inverter as shown in Figure 7.53
[Jeong87]. The mechanism for controlling the delay of each inverter is to limit the current
available to discharge the load capacitance of the gate. In this modified inverter circuit, the
maximal discharge current of the inverter is limited by adding an extra series device. Note
that the low-to-high transition on the inverter can also be controlled by adding a PMOS
device in series with M2. The added NMOS transistor M3 is controlled by an analog control
voltage Vcntl, which determines the available discharge current. Lowering Vcntl reduces the
discharge current and, hence, increases tpHL. The ability to alter the propagation delay per
stage allows us to control the frequency of the ring structure. The control voltage is generally
set using feedback techniques. Under low operating current levels, the current-starved
inverter suffers from slow fall times at its
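A small numeric sketch of the period relation T = 2 × N × tp quoted above, extended with a crude current-starved delay model (tp ≈ C·VDD/(2·Id)); the delay model and all numeric values are assumptions for illustration only.

def ring_oscillator_freq(n_stages, tp):
    # f = 1 / (2 * N * tp) for a ring of N inverters (N must be odd)
    assert n_stages % 2 == 1, "a ring oscillator needs an odd number of inverters"
    return 1.0 / (2.0 * n_stages * tp)

def starved_stage_delay(c_load, vdd, i_discharge):
    # Crude average stage delay when the discharge current is limited by
    # the control transistor: tp ~ (C * VDD / 2) / Id
    return (c_load * vdd / 2.0) / i_discharge

if __name__ == "__main__":
    # Fixed-delay ring: 5 stages of 50 ps each oscillate at 2 GHz
    print(ring_oscillator_freq(5, 50e-12))
    # VCO-style sweep: lowering the control current lowers the frequency
    for i_ctl in (100e-6, 50e-6, 25e-6):
        tp = starved_stage_delay(c_load=20e-15, vdd=2.5, i_discharge=i_ctl)
        print(i_ctl, ring_oscillator_freq(5, tp))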
Consider a typical personal computer. All operations within the system are
strictly orchestrated by a central clock that provides a time reference. This reference
determines what happens within the computer system at any point in time.
The resulting value is neither low nor high but undefined. At that point, it is not
clear if the key was pressed or not. Feeding the undefined signal into the computer
could be the source of all types of trouble, especially when it is routed to different
functions or gates that might interpret it differently.
For instance, one function might decide that the key is pushed and start a
certain action, while another function might lean the other way and issue a competing
command. This results in a conflict and a potential crash. Therefore, the undefined
state must be resolved in one way or another before it is interpreted further.
It does not really matter what decision is made, as long as a unique result is
available. For instance, it is either decided that the key is not yet pressed, which will
be corrected in the next poll of the keyboard, or it is concluded that the key is already
pressed. Thus, an asynchronous signal must be resolved to be either in the high or
low state before it is fed into the synchronous environment.
The designer's task is to ensure that the probability of such a failure is small
enough that it is not likely to disturb the normal system behavior. Typically, this
probability can be reduced in an exponential fashion by waiting longer before making
a decision. This is not too troublesome in the keyboard example, but in general,
waiting affects system performance and should therefore be avoided to the
maximum extent possible.
There are numerous digital applications that require the on-chip generation of
a periodic signal. Synchronous circuits need a global periodic clock reference to drive
sequential elements. Current microprocessors and high-performance digital circuits
require clock frequencies in the gigahertz range. Crystal oscillators generate
accurate, low-jitter clocks with a frequency range from tens of megahertz to
approximately 200 MHz.
Typically, as shown in Figure 3.58, a reference clock is sent along with the
parallel data being communicated (in this example only the transmit path from chip 1
to chip 2 is shown). Since chip-to-chip communication occurs at a lower rate than the
on-chip clock rate, the reference clock is a divided but in-phase version of the system
clock.
The reference clock synchronizes all input flip-flops on chip 2; this can present
a significant clock load for wide data busses. Introducing clock buffers to deal with
this problem unfortunately introduces skew between the data and sample clock. A
PLL, using feedback, can align (i.e., de-skew) the output of the clock buffer with
respect to the data. In addition, for the configuration shown in Figure 3.58, the PLL
can multiply the frequency of the incoming reference clock, allowing the reference
clock to be a fraction of the data rate.
the two signals. The relative phase is defined as the difference between the two
phases.
This is a function of the rest of the blocks (i.e., the feedback loop) in the PLL. The
feedback loop is critical to tracking process and environmental variations. The
feedback also allows frequency multiplication.
jitter. Note that the PLL structure is a feedback structure, and the addition of extra
phase shifts, as is done by a high-order filter, may result in instability.
TWO MARKS
NORA [NO Race] has two major drawbacks. The logical effort of footed p-
logic gates is generally worse than that of Hi-skew gates. Secondly, NORA is
extremely susceptible to noise.
11. Define Pass-transistor logic? [MAY 2010]
It reduces the count of transistors used to make different logic gates, by
eliminating redundant transistors. Transistors are used as switches to pass logic
levels between nodes of a circuit, instead of as switches connected directly to
supply voltages.
In pass-transistor circuits, inputs are also applied to the source/drain diffusion
terminals. These circuits build switches using either nMOS pass transistors or
parallel pairs of nMOS and pMOS transistors called transmission gates.
15. Compare CMOS combinational logic gates with reference to the Equivalent
n-mos depletion load logic with reference to the Area requirement. [MAY2012]
For CMOS, the area required is 533 μm²; for pseudo-NMOS, the area required is
288 μm².
16. What are the advantages of using a pseudo-NMOS gate instead of a full
CMOS gate? [MAY 2012]
Pseudo-NMOS gates need fewer transistors and less area than full CMOS gates.
However, being ratioed circuits, they dissipate power continually in certain states and
have poorer noise margins than complementary circuits.
17. Enumerate the features of synchronizers. [MAY 2013]
The goal of a digital system designer should be to ensure that, given
asynchronous inputs, the probability of encountering a metastable voltage is
sufficiently small. To guarantee good logic levels, all asynchronous inputs should
be passed through synchronizers.
18. List the various power losses in CMOS circuits. [MAY 2013]
1. Static power dissipation (due to leakage current when the circuit is idle)
2. Dynamic power dissipation (when the circuit is switching)
3. Short-circuit power dissipation during switching of the transistors.
19. Draw the NAND gate logic gate diagram and its layout diagram [Nov 2012]
Contamination delay (tcd): The amount of time needed for a change in a logic input
to result in an initial change at an output; that is, the combinational logic is
guaranteed not to show any output change in response to an input change before
tcd time units have passed.
33. Define setup time and hold time.
Setup time (tsetup): The amount of time for which the data input D must be stable
before the rising clock edge arrives.
Hold time (thold): The amount of time for which the data input D must be held stable
after the clock edge arrives, in order for the flip-flop to latch the correct value.
Hold time is always measured from the rising clock edge to a point after the
clock edge.
34. Difference between latches and flip-flops.
A latch is level-sensitive: it is transparent and passes its input to the output while the
clock is at its active level. A flip-flop is edge-triggered: it samples its input only on a
clock edge and holds that value for the rest of the cycle.
35. Define pipelining.
Pipelining is a popular design technique often used to accelerate the
operation of the datapath in digital processors. Additional advantages of pipelining
are reduced glitching in complex logic networks and lower energy due to
operand isolation.
36. How are the limitations of a ROM-based realization overcome in a PLA-based
realization?
In a ROM, only the encoder part is programmable, and the use of ROMs to realize
Boolean functions is wasteful in many situations because a significant part of the
array carries no cross-connections.
This wastage can be overcome by using a Programmable Logic Array (PLA),
which requires much less chip area.
37. In what way do DRAMs differ from SRAMs?
Both SRAMs and DRAMs are volatile in nature, i.e., the information is lost if the power
supply is removed. However, SRAMs provide higher switching speed and better noise
margins, but require a larger chip area than DRAMs.
38. Explain the read and write operations for a one-transistor DRAM cell.
A significant improvement in the DRAM evolution was the 1-T DRAM
cell. One additional capacitor is explicitly fabricated for storage purposes. To store '1',
the capacitor is charged; to store '0', it is discharged to 0 V. The read operation is
destructive, so a sense amplifier is needed for reading, and every read operation is
followed by a restore operation.
39. What is MTBF?
MTBF (mean time between failures) = 1/P(failure) = Tc·e^((Tc − tsetup)/τ) / (T0·N),
where Tc is the clock period, τ is the regeneration time constant of the synchronizer,
T0 is a circuit-dependent constant, and N is the number of asynchronous input events
per second.
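A small numeric sketch based on the reconstructed MTBF expression above; the synchronizer parameters (τ, T0) and the asynchronous event rate N are assumed example values, not data from the notes.

from math import exp

def synchronizer_mtbf(Tc, t_setup, tau, T0, N):
    # MTBF = Tc * exp((Tc - t_setup)/tau) / (T0 * N)
    return Tc * exp((Tc - t_setup) / tau) / (T0 * N)

# Example: 500 MHz clock (Tc = 2 ns), tsetup = 100 ps, tau = 50 ps,
# T0 = 20 ps, N = 10 million asynchronous events per second (all assumed).
mtbf_seconds = synchronizer_mtbf(Tc=2e-9, t_setup=0.1e-9,
                                 tau=50e-12, T0=20e-12, N=10e6)
print(f"MTBF ~ {mtbf_seconds:.3e} s ({mtbf_seconds / 3.15e7:.1f} years)")
# Waiting longer before using the output grows the exponent, which is why
# the failure probability drops exponentially with the waiting time.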
40. What do you mean by max delay constraint and min delay constraint?
Min delay constraint: The path begins with the rising edge of the clock
triggering F1. The data may begin to change at Q1 after a clk-to-Q contamination
delay. However, it must not reach D2 until at least the hold time after the clock edge,
lest it corrupt the contents of F2. Hence,
we solve for the minimum logic contamination delay:
tcd >= thold − tccq
Max delay constraint: The path begins with the rising edge of the clock
triggering F1. The data must propagate to the output of the flip-flop Q1 and through
the combinational logic to D2, setting up at F2 before the next rising clock edge.
Under ideal conditions, the worst-case propagation delays determine the minimum
clock period for this sequential circuitry:
Tc >= tpcq + tpd + tsetup
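The two constraints above translate directly into the following checks; the numeric values are illustrative only.

def max_logic_delay(Tc, t_pcq, t_setup):
    # Largest combinational propagation delay that still meets Tc >= tpcq + tpd + tsetup
    return Tc - t_pcq - t_setup

def min_logic_contamination(t_hold, t_ccq):
    # Smallest combinational contamination delay that still meets tcd >= thold - tccq
    # (zero if the flip-flop's own contamination delay already covers the hold time)
    return max(0.0, t_hold - t_ccq)

Tc = 1.0   # clock period in ns, assumed
print("tpd must be <=", max_logic_delay(Tc, t_pcq=0.15, t_setup=0.10), "ns")
print("tcd must be >=", min_logic_contamination(t_hold=0.08, t_ccq=0.05), "ns")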
41.Draw switch level schematic of multiplexer based NMOS latch using NMOS
only pass transistors for multiplexer.(May-16)
UNIVERSITY QUESTIONS
PART B
1.Explain in detail about the pipelining concept used in sequential circuits. (16) [MAY
2013]
2.Discuss the techniques to reduce switching activity in a static and dynamic CMOS
circuits.(16) [MAY 2013]
3.(i) For a two-input NAND gate, derive an expression for the drain current. (8)
[MAY 2012]
(ii) Draw a CMOS NOR 2 gate and its complementary operation with necessary
equations. (4)
(iii) Obtain a CMOS logic design realizing the Boolean function z = a(d + e) + bc. (8)
[MAY 2012]
4.(i) Draw a circuit diagram of the CMOS SR latch and explain in detail. (8) [MAY
2012]
(ii) Explain, along with the necessary input and output waveforms, the CMOS
negative edge-triggered master-slave D flip-flop. (8) [MAY 2012]
5. Write the basic principle of low power logic design. (4) [NOV 2011]
6.(i) For a resistive load inverter circuit with VDD = 5 V, Kn’= 20 µA/V2, VTO= 0.8 V,
RL= 200 kΩ and = 2. Calculate the critical voltages on the voltage transfer
characteristics and find the noise margins of the circuit. [May 2011].
(ii) Explain the details about pseudo – nMOS gates with neat circuit diagram. [May
2011]
7.(i) Design a transistor level schematic of the one bit full adder circuit and explain.
(6) [May 2011]
(ii) Discuss in detail the characteristics of CMOS transmission gate. (10) [May 2011]
8.(i) What is meant by transmission gate? List the applications of transmission gates
and design a 2 x 1 mux operation circuit using transmission gates.(8) [NOV 2012]
(ii) Realize the AND logic gate and OR logic gate using NOR logic gates. (8) [NOV
2012]
9. Realize the AND logic gate and OR logic gate using NAND logic gates. (8) [NOV
2012]
10.Explain in detail about the design of a six transistor static RAM cell with dynamic
refreshment logic.(16) [NOV 2012]
11.(i)Compare static and dynamic logic circuits with example.(8)(DEC 2014)
(ii)Explain the dynamic and static power reduction in low power design of VLSI
circuits.(8)
12. Explain the methodology of sequential circuit design of latches and Flip-flops.(16)
(MAY 2014)
13.Explain the operation of Master slave based edge triggered register.May-16
14.Discuss in detail various pipelining approaches to optimize sequential
circuits.May-16
UNIT IV
DESIGNING
ARITHMETIC
BUILDING BLOCKS
REFERRED BOOKS:
1. Jan Rabaey, AnanthaChandrakasan, B.Nikolic, “Digital Integrated
Circuits: A Design Perspective”, Second Edition, Prentice Hall of India,
2003.
Introduction
Fig 4.1 shows the bit-sliced datapath organization. Data in a processor are
operated on in a word-based manner. Datapaths in a microprocessor are 32 or 64 bits
wide. The signal-processing datapaths in digital subscriber line modems, magnetic
disk drives, or compact disc players are 5 to 24 bits wide.
The most commonly used arithmetic operation is addition. The adder is often the
speed-limiting element, so careful optimization of the adder design is needed. The
optimization is made either at the logic level or at the circuit level.
The carry-lookahead adder is an example of logic-level optimization; it rearranges
the Boolean equations. Circuit optimizations manipulate transistor sizes
and circuit topology to optimize the speed.
A full adder has three inputs and two outputs. Let A and B be the adder
inputs, Cin the carry input, S the sum output and Cout the carry output. The table
below shows the truth table of a binary full adder.
A B Ci S Co Carry status
0 0 0 0 0 delete
0 0 1 1 0 delete
0 1 0 1 0 propagate
0 1 1 0 1 propagate
1 0 0 1 0 propagate
1 0 1 0 1 propagate
1 1 0 0 1 generate/propagate
1 1 1 1 1 generate/propagate
S = A ⊕ B ⊕ Cin
Cout = A·B + B·Cin + A·Cin
G = A·B (generate)
D = A′·B′ (delete)
P = A ⊕ B (propagate)
Co(G,P) = G + P·Cin
S(G,P) = P ⊕ Cin
G and P are functions of A and B only and are not dependent upon Cin. Similarly,
expressions for S(D,P) and Co(D,P) can be derived.
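The relations above can be checked exhaustively at the bit level; the short Python sketch below is an illustration of the equations, not a gate-level circuit.

def full_adder(a, b, cin):
    g = a & b                    # generate: carry out regardless of Cin
    d = (1 - a) & (1 - b)        # delete: carry out is 0 regardless of Cin
    p = a ^ b                    # propagate: carry out equals Cin
    s = p ^ cin                  # S = A xor B xor Cin
    cout = g | (p & cin)         # Co(G,P) = G + P*Cin
    return s, cout, g, p, d

# Exhaustive check against the truth table given above
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout, g, p, d = full_adder(a, b, cin)
            assert s == (a + b + cin) % 2 and cout == (a + b + cin) // 2
            print(a, b, cin, "->", s, cout, "  G,P,D =", g, p, d)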
For some input signals, no ripple occurs at all, but for some cases the carry
has to ripple all the way from the LSB to the MSB. The propagation delay along this
path, called the critical path, is defined as the worst-case delay over all possible input
patterns.
The worst-case delay occurs when a carry is generated at the LSB position. This
carry is consumed at the last stage to produce the sum. The delay is proportional to
the number of bits N in the input words and is given by
tadder ≈ (N−1)·tcarry + tsum
Conclusions observed:
(ii) When designing the full adder cell for a fast ripple-carry adder, it is more important
to optimize tcarry than tsum, since the latter has only a minor influence on the total
value of tadder.
Inverting all inputs to a full adder results in inverted values for all outputs. This
is called the inverting property. The inverting property is useful when optimizing the
speed of the ripple-carry adder. The property is expressed as
S(A′, B′, Ci′) = S′(A, B, Ci)
Co(A′, B′, Ci′) = Co′(A, B, Ci)
This method makes use of logic gates so as to look at the lower order bits of
the augend and addend to see whether a higher order carry is to be generated or
not.
Pi = Ai ⊕ Bi
Gi = Ai Bi
The sum output and carry output can be expressed as
Si = Pi ⊕ Ci
C i +1 = Gi + Pi Ci
where Gi is the carry generate, which produces a carry when both Ai and Bi are one,
regardless of the input carry; Pi is the carry propagate and is associated with the
propagation of the carry from Ci to Ci+1.
This high-level model has some hidden dependencies. The assumption of a constant
addition time is wishful thinking: the delay increases linearly with the number of bits,
as shown in the figure.
The carry output Boolean function of each stage in a four-stage carry-lookahead adder
can be expressed as
C1 = G0 + P0 Cin
C2 = G1 + P1 C1
= G1 + P1 G0 + P1 P0 Cin
C3 = G2 + P2 C2
= G2 + P2 G1+ P2 P1 G0 + P2 P1 P0 Cin
C4 = G3 + P3 C3
= G3 + P3 G2+ P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 Cin
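The four lookahead carry expressions above can be verified against a plain ripple computation; the sketch below does this exhaustively for all 4-bit operands and both carry-in values (illustration only).

def gp(a_bits, b_bits):
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    return g, p

def lookahead_carries(g, p, cin):
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))
    return [c1, c2, c3, c4]

def ripple_carries(a_bits, b_bits, cin):
    carries, c = [], cin
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | (a & c) | (b & c)      # majority function
        carries.append(c)
    return carries

for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            ab = [(a >> i) & 1 for i in range(4)]
            bb = [(b >> i) & 1 for i in range(4)]
            g, p = gp(ab, bb)
            assert lookahead_carries(g, p, cin) == ripple_carries(ab, bb, cin)
print("lookahead carries match ripple carries for all 4-bit inputs")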
Disadvantages
i). For a group of N bits the transistor implementation has N+1 parallel
branches and N+1 transistors in the stack
ii). Wide gates and large stacks display poor performance
iii). The computation has to be limited to up to two or four bits.
4.8 Carry-Bypass Adder
Fig. 4.11 Carry-Bypass Adder
The number of bits in each carry-select block can be uniform or variable. In
the uniform case, the optimal delay occurs for a block size of √N. When
variable, each block should have a delay, from the addition inputs A and B to the
carry out, equal to that of the multiplexer chain leading into it, so that the carry out is
calculated just in time. The √N result is derived from uniform sizing, where the
ideal number of full-adder elements per block is equal to the square root of the
number of bits being added, since that will yield an equal number of MUX delays.
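As a rough illustration of the square-root sizing argument, the sketch below sweeps the block size M under a delay model t ≈ tsetup + M·tcarry + (N/M)·tmux + tsum; the model and the unit delay values are assumptions, not figures from the notes.

from math import sqrt

def select_adder_delay(N, M, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    # Continuous approximation: N/M blocks of M bits each
    blocks = N / M
    return t_setup + M * t_carry + blocks * t_mux + t_sum

N = 64
best = min(range(1, N + 1), key=lambda M: select_adder_delay(N, M))
print("best block size:", best, "  sqrt(N):", round(sqrt(N), 2))
print("delay at best block size:", select_adder_delay(N, best))
print("linear ripple delay for comparison:", 1.0 + N * 1.0 + 1.0)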
4.11 Multipliers
Multiplications are expensive and slow operations. The performance of many
computational circuits is decided by the speed at which a multiplication operation can
be executed, so the design and integration of multiplication units are very
important. Multipliers have complex adder arrays.
Definition: Consider two binary words A and B, M and N bits wide, to be multiplied.
Multiplication can be performed using a single two-input N-bit adder. The
multiplication then takes M cycles, and the shift-and-add algorithm for
multiplication adds together M partial products.
Fast multiplication is done in a way similar to manual computation. All the partial
products are generated at the same time and organized in an array. To compute the
final product, a multi-operand addition is used as shown in the figure. This structure is
called an array multiplier. The array multiplier has the following three functions.
is a short notation for −1. This format needs to add only two partial products, but the
final adder has to perform subtraction as well. This type of transformation is called
Booth's recoding. Booth's recoding reduces the number of partial products to at
most one half: it ensures that for every two consecutive bits, at most one bit will be
1 or −1. Reducing the number of partial products is equivalent to reducing the
number of additions, which speeds up the operation and reduces the area. This
transformation is equivalent to formatting the multiplier word into a base-4 scheme
instead of the binary format. The format is
Y = Σ (j = 0 to (N−1)/2) yj·4^j, with yj ∈ {−2, −1, 0, 1, 2}
The pattern 1010...10 represents the worst-case multiplier input because it generates
the most partial products. Multiplication with {0, 1} is equivalent to an AND operation,
while multiplying with {−2, −1, 0, 1, 2} requires a combination of inversion and shift logic.
In modified Booth's recoding, the multiplier is partitioned into 3-bit groups that
overlap by one bit. Each group, as shown in the table, forms one partial product. The
number of partial products equals half the multiplier width. The input bits to the
recoding process are the two current bits, combined with the upper bit from the next
group, moving from the most significant bit to the least significant bit.
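The sketch below performs the radix-4 (modified Booth) recoding described above, using the standard group-to-digit table and the worst-case 1010...10 pattern mentioned in the text as the example; the multiplier word is interpreted in two's complement.

BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def to_signed(y, n_bits):
    # Interpret an n_bits pattern as a two's-complement number
    return y - (1 << n_bits) if (y >> (n_bits - 1)) & 1 else y

def booth_recode(y, n_bits):
    # Radix-4 Booth digits, least-significant group first
    assert n_bits % 2 == 0
    bits = [(y >> i) & 1 for i in range(n_bits)]
    digits = []
    for j in range(0, n_bits, 2):
        prev = bits[j - 1] if j > 0 else 0        # the overlap bit
        digits.append(BOOTH_TABLE[(bits[j + 1], bits[j], prev)])
    return digits

y = 0b10101010                 # the 1010...10 worst-case pattern from the text
digits = booth_recode(y, 8)
print("recoded digits (LSB group first):", digits)
# Only N/2 partial products, and the recoded digits reproduce the value:
assert sum(d * 4 ** j for j, d in enumerate(digits)) == to_signed(y, 8)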
tmult ≈ [(M−1) + (N−2)]·tcarry + (N−1)·tsum + tand
where tcarry is the delay between the input and output carry of the full adder, tsum is the
delay between the input carry and the sum bit of the full adder, and tand is the delay of
the AND gate forming a partial-product bit.
All critical paths have the same length. Speeding up only one of them, for instance by
replacing one adder by a faster one such as a carry-select adder, does not make much
sense. All critical paths have to be attacked at the same time. Minimization of
tmult requires the minimization of both tcarry and tsum.
The first type of operator used to cover the array is the full adder, which takes
three inputs and produces two outputs: the sum, located in the same column, and
the carry, located in the next one.
For this reason the full adder is called a 3-2 compressor. It is denoted by a
circle covering three bits. The other operator is the half adder, which takes two input
bits in a column and produces two outputs. The half adder is denoted by a circle
covering two bits.
To obtain a minimal implementation, the tree is covered with full adders and half
adders, starting from its densest part. A first half adder is introduced in columns 4 and 3,
as shown in fig. (b); the reduced tree is shown in figure (c). Only three full adders and
three half adders are used for the reduction process, compared with six full adders in
the carry-save multiplier.
Disadvantages
i) Very irregular
ii) Complicated layout
4.13 SHIFTERS
The shift operation is another essential arithmetic operation that requires
adequate hardware support. It is used extensively in floating point units, scalers and
multiplications by constant numbers. The latter can be implemented as a
combination of add and shift operations. Shifting a data word left to right over a
constant amount is a trivial hardware operation and is implemented by the
appropriate signal wiring. A programmable shifter is more complex and requires
active circuitry.
A major advantage of this shifter is that the signal has to pass through at most
one transmission gate. In other words, the propagation delay is theoretically constant
and independent of the shift value or shifter size. This is not true in reality, however,
because the capacitance at the input of the buffers rises linearly with the maximum
shift width.
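To make the constant-delay property concrete, the sketch below models a small 4-bit barrel shifter at the behavioural level: each output bit is taken from exactly one input bit selected by the shift amount. Treating the shift as a rotation is an assumption made for the example.

def barrel_shift_right(data_bits, shift, rotate=True):
    # data_bits is LSB-first; returns the shifted (or rotated) bit list
    n = len(data_bits)
    out = []
    for i in range(n):
        src = (i + shift) % n if rotate else i + shift
        out.append(data_bits[src] if src < n else 0)
    return out

word = [1, 0, 1, 1]            # LSB first, i.e. the value 0b1101
for sh in range(4):
    print("shift", sh, "->", barrel_shift_right(word, sh))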
One third of the total energy of a digital system is consumed by the clock
distribution network. A common method to reduce power in idle mode is the clock
gating technique. Clock gating does not reduce the leakage power of the idle block.
Complicated schemes to lower the standby power are used, such as increasing the
transistor thresholds or switching off the power rails. The following are techniques to
reduce the power.
iii). Circuit size is not only determined by the number and size of the transistors, but
also by other factors such as wiring and the number of vias and contacts.
iv). Optimization helps to get better result, but results in an irregular and convoluted
topology.
v). Power and speed can be traded off through a choice of circuit sizing, supply
voltages and transistor thresholds.
Part A
1. How can a datapath be implemented in a VLSI system?
A datapath is best implemented in a bit-sliced fashion: a single layout is
reused for every bit in the data word. This regular approach eases the
design effort and results in fast and dense layouts.
2. Comment on the performance of the ripple-carry adder.
A ripple-carry adder has a delay that is linearly proportional to the
number of bits.
Circuit optimizations concentrate on reducing the delay of the carry path. A number
of circuit topologies exist; careful optimization of the circuit topology
and the transistor sizes helps to reduce the capacitance on the carry bit.
3. What logic optimizations are used in adders to increase performance?
Other adder structures use logic optimizations to increase the performance
(carry bypass, carry select, carry lookahead). The performance increase comes at the
cost of area.
4. What is a multiplier circuit?
A multiplier is nothing more than a collection of cascaded adders. Its critical
path is far more complex, and the optimizations are different compared to adders.
5.Which factors dominate the performance of programmable shifter ?
The performance and the area of a programmable shifter are dominated by
the wiring.
6. What is meant by a datapath? (May 2016)
A datapath is a collection of functional units, such as arithmetic logic units or
multipliers, that perform data-processing operations, together with registers and
buses. Along with the control unit, it composes the central processing unit.
7. Write down the expression for the worst-case delay of an RCA.
t = (N−1)·tc + ts
8.Write down the expression to obtain delay for N-bit carry select adder.(May
2016)
A carry-lookahead adder (CLA) or fast adder is a type of adder used in digital logic.
A carry-lookahead adder improves speed by reducing the amount of time required to
determine carry bits.
Latency is the delay from input into a system to desired outcome; the term is
understood slightly differently in various contexts and latency issues also vary from
one system to another.
Part B
1. Explain the structure of the Booth multiplier and list its advantages.
2. Design a 3-bit barrel shifter.
3. What is a 4×4 carry-save multiplier? Calculate its critical path delay.
4. Explain the following circuits: 1. Datapath circuit 2. Any one adder circuit
5. Explain, with a neat diagram, the Baugh-Wooley multiplier.
6. Explain the ripple-carry adder. (Nov-16)
7. Describe the carry-lookahead adder and its carry generation and
propagation. (Nov-16, Nov-17)
8. Design 16-bit carry-bypass and carry-select adders and discuss their features. (May-16)
9. Design a 4×4 array multiplier and write down the equation for its delay. (May-16)
10. Explain the operation of the Booth multiplier with suitable examples. Justify how the
Booth algorithm speeds up the multiplication process. (Nov-16)
11. Design a multiplier for 5 bits by 3 bits. Explain its operation and summarize the
number of adders. Compare it with the Wallace multiplier. (Nov-17)
UNIT V
IMPLEMENTATION
STRATEGIES AND
TESTING
REFERRED BOOK:
1. Jan Rabaey, AnanthaChandrakasan, B.Nikolic, “Digital Integrated
Circuits: A Design Perspective”, Second Edition, Prentice Hall of India,
2003.
5.1 INTRODUCTION
FPGAs are the newest member of the ASIC family and are rapidly growing in
importance, replacing TTL in microelectronic systems. Even though an FPGA is a type of
gate array, we do not consider the term gate-array based ASICs to include FPGAs. This may
change as FPGAs and MGAs start to look more alike.
All FPGAs contain a regular structure of programmable basic logic cells surrounded
by programmable interconnect. The exact type, size, and number of the programmable basic
logic cells varies tremendously.
The input to the global router is a floorplan that includes the locations of all the fixed
and flexible blocks; the placement information for flexible blocks; and the locations of all the
logic cells. The goal of global routing is to provide complete instructions to the detailed router
on where to route every net. The objectives of global routing are one or more of the
following:
3. Use a detailed router to select specific wiring segments and routing switches for
each connection, within the restrictions set by the global router.
The advantage of this approach is that each of the routing tools can more effectively
solve a smaller part of the routing problem. More specifically, since a global router need
not be concerned with allocating wiring segments or routing switches, it can concentrate
on more global issues, like balancing the usage of the routing channels. Similarly, with
the reduced number of detailed routing alternatives that are available for each connection
because of the restrictions introduced by a global router, a detailed router can focus on
the problem of achieving connectivity. Its limited scope enables a detailed router to
concentrate on resolving contention for routing resources that may exist among different
nets. The above routing strategy has been adopted in this thesis for FPGA routing. The
routing resources are partitioned into horizontal and vertical routing channels.
2.2.3 Introduction to Global Routing
This section introduces global routing by describing the LocusRoute global routing
algorithm for standard cells. Although there are many other published techniques for global
routing, this specific algorithm is described as an example because a modified version of it is
employed for FPGA global routing in this thesis.
This algorithm has been chosen for FPGAs because, as described below, its primary
goal is to balance the usage of the routing channels. This is important for FPGAs because the
number of tracks per channel is pre-determined. Note that the description below is based on
the standard-cell version of LocusRoute, and the main difference between this and the FPGA
version is the definitions of the routing channels - the standard-cell program assumes only
horizontal routing channels, whereas the FPGA version uses both horizontal and vertical
channels.
2.2.3.1 The LocusRoute Global Routing Algorithm
The LocusRoute global router views the global routing problem as consisting of
three main tasks:
1. For nets comprising more than two pins, determine which pairs of pins to connect
together. This step decomposes a multi-point net into a set of two-point connections.
2. Determine a path through the routing channels for each connection.
3. Optimize the solution so that the usage of all of the routing channels is balanced.
The first task is solved by finding a minimum-spanning tree for each net.
Basically, this technique breaks a net into a set of two point connections such that the
total amount of interconnect required is minimized.
To solve the second task, LocusRoute models each routing channel as an array of
grids, as shown in Figure 2.2. Each grid location contains a counter, originally set to zero,
which is incremented by one for each connection that is globally routed through it. In this
way, the algorithm is able to maintain a detailed account of the usage of each routing channel,
so that it can avoid congestion.
The algorithm considers alternative ways of routing each connection and chooses the
one that passes through the least congested routing grids. Note that LocusRoute does not
consider all of the possible ways that a connection can be routed, but rather it evaluates only a
subset of the paths that have "two or fewer bends", as explained in .
After all of the connections have been globally routed once, LocusRoute optimizes
the solution by sequentially ripping up and re-routing each connection. After repeating this
procedure a small number of times, the final solution is output in a format suitable for the
detailed router to be employed.
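A toy sketch of the congestion-balancing idea described above: each channel grid location carries a usage counter, candidate low-bend paths are scored by the congestion they cross, and the cheapest path updates the counters. It considers only the two one-bend paths per connection and is not the actual LocusRoute implementation.

def range_inc(a, b):
    step = 1 if b >= a else -1
    return list(range(a, b + step, step))

def l_paths(src, dst):
    # The two one-bend ('L'-shaped) paths between two grid points
    (r1, c1), (r2, c2) = src, dst
    horiz_first = [(r1, c) for c in range_inc(c1, c2)] + [(r, c2) for r in range_inc(r1, r2)][1:]
    vert_first = [(r, c1) for r in range_inc(r1, r2)] + [(r2, c) for c in range_inc(c1, c2)][1:]
    return [horiz_first, vert_first]

def route(connections, rows, cols):
    usage = [[0] * cols for _ in range(rows)]
    for src, dst in connections:
        # Pick the candidate path crossing the least congested grid locations
        best = min(l_paths(src, dst), key=lambda p: sum(usage[r][c] for r, c in p))
        for r, c in best:
            usage[r][c] += 1          # record the chosen path's usage
    return usage

grid = route([((0, 0), (2, 3)), ((0, 3), (2, 0)), ((1, 0), (1, 3))], rows=3, cols=4)
for row in grid:
    print(row)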
The main advantage of a maze router is that it is guaranteed to find a path from one
end of a connection to the other, if one exists at the time the connection is routed. On the
other hand, because of its sequential nature, a maze router is unable to consider the
side effects that the routing of one connection may have on another. Correspondingly, the
main disadvantage of maze routing is the unnecessary blockage of as-yet-unrouted
connections because of previous routing decisions.
Commercially Available FPGAs
This section provides a detailed description of three commercially available FPGA
families, including those from Xilinx Co., Actel, and Altera. These particular FPGAs
have been chosen because they are representative examples of state-of-the-art devices
and they are in widespread use.
Each device is described in terms of its general architecture, its choice of
programmable cell, its routing architecture, and its CAD routing tools. Enough details are
given, and in some cases specific comments are made, to show how the routing architecture
of each device relates to the research contained in this thesis. In addition, at the end of the
section, several recently introduced FPGAs are briefly
described.
The XC2000 CLB, shown in Figure 2.6, consists of a four-input look-up table and a D flip-flop
[Cart86]. The look-up table can generate any function of up to four variables
or any two functions of three variables. Both of the CLB outputs can be combinational, or
one output can be registered. As illustrated in Figure 2.7, the XC2000 routing architecture
employs three types of routing resources: Direct interconnect, General Purpose
interconnect, and Long Lines. Note that for clarity the routing switches that connect to the
CLB pins are not shown in the figure. The Direct interconnect (shown only for the CLB
marked with an ’*’) provides connections from the output of a CLB to its right, top, and
bottom neighbours. For connections that span more than one CLB, the General Purpose
interconnect provides horizontal and vertical wiring segments, with four segments per row
and five segments per column. Each wiring segment spans only the length or width of one
CLB, but longer wires can be formed because each switch matrix holds a number of routing
switches that can interconnect the wiring segments on its four sides. Note that a connection
routed with the General Purpose interconnect will incur significant routing delays because it
must pass through a routing switch at each switch matrix. Connections that are required
to reach several CLBs with low skew can use the Long Lines, which traverse at most one
routing switch to span the entire length or width of the FPGA.
2.3.1.2 Xilinx XC3000
The XC3000 [Hsie88] is an enhanced version of the XC2000, featuring a more complex CLB
and more routing resources. The CLB, as shown in Figure 2.8, houses a look-up table that
can implement any function of five variables, any two functions of four variables, and some
functions of up to seven variables. The CLB has two outputs, both of which may be either
combinational or registered. Figure 2.9 shows that the XC3000 routing architecture is similar
to that in the XC2000, having Direct interconnect, General Purpose interconnect, and Long
Lines. Each resource is enhanced: the Direct interconnect can additionally reach a CLB’s left
neighbour, the General Purpose interconnect has an extra wiring segment per row, and
there are more Long Lines. The XC3000 also contains switch matrices that are similar to
those in the XC2000. Figure 2.9 depicts the internal structure of an XC3000 switch matrix by
showing, as an example, that the wiring segment marked with an ’*’ can connect through
routing switches to six other wiring segments. Although not shown in the figure, the other
wiring segments are similarly connected, though not always to the same number of
segments. This detail is included here because the results shown in Chapter 4 of this thesis
suggest recommended values for the number of routing switches connectable to any wiring
segment, as well as the number of wiring segments in a row or column. Those results
indicate that, in terms of routability, the XC3000 contains too many routing switches per
switch matrix and too few wiring segments in its rows and columns.
four variables together with some functions of five variables, or some functions of up to nine
variables. The CLB has two outputs, which may be either combinational or registered.
The XC4000 routing architecture is significantly different from the earlier Xilinx FPGAs, with
the most obvious difference being the replacement of the Direct interconnect and General
Purpose interconnect with two new resources, called Single-length Lines and Double-length
Lines. The Single-length Lines, which are intended for relatively short connections or those
that do not have critical timing requirements, are shown in Figure 2.11, where each X
indicates a routing switch. This figure illustrates three architectural enhancements in the
XC4000 series:
1. There are more wiring segments in the XC4000. While the number shown in the
figure is only suggestive, the XC4000 contains more than twice as many wiring segments
as does the XC3000.
2. Most CLB pins can connect to a high percentage of the wiring segments. This
represents an increase in connectivity over the XC3000.
3. Each wiring segment that enters a switch matrix can connect to only three others,
which is half the number found in the XC3000.
It is interesting to note these three enhancements here because they are all supported
by the architectural research that appears in Chapter 4 of this thesis. The remaining routing
resources in the XC4000, which includes the Double-length Lines and the Long Lines, are
shown in Figure 2.12. As the figure shows, the Double-length Lines are similar to the Single-
length Lines, except that each one passes through half as many switch matrices. This
scheme offers lower routing delays for moderately long connections that are not appropriate
for the low-skew Long Lines. For clarity, neither the Single-length Lines nor the routing
switches that connect to the CLB pins are shown in Figure 2.12.
speed performance of the routing architecture. As Figure 2.14 shows, the Act-1 LM is
based on a configuration of multiplexers, which can implement any function of two variables,
most functions of three, some of four, up to a total of 702 logic functions [Mail90]. The Act-1
routing architecture is illustrated in Figure 2.15, which for clarity shows only the routing
resources connected to the LM in the middle of the picture. The Act-1 employs four distinct
types of routing resources: Input segments, Output segments, Clock tracks, and Wiring
segments. Input segments connect four of the LM inputs to the Wiring segments above the
LM and four to those below, while an Output segment connects the LM output to several
channels, both above and below the module. The Wiring segments consist of straight metal
lines of various lengths that can be connected together through anti-fuses to form longer
lines. The Act-1 features 22 tracks of Wiring segments in each routing channel and, although
not shown in the figure, 13 vertical tracks that lie directly on top of each LM column. Clock
tracks are special low-delay lines that are used for signals that must reach many LMs with
minimum skew.
the FPGA families described thus far. A LAB can be thought of as an efficient PLD, as will be
explained in the following paragraphs. Each LAB, as seen in Figure 2.17, consists of two
major blocks, called the Macrocell Array and the Expander Product Terms. The Macrocell
Array is a one-dimensional array of elements called Macrocells, where the number of
elements in the array varies with each Altera device. As illustrated
in Figure 2.18, each Macrocell comprises three wide AND gates that feed an OR gate which
connects to an XOR gate, and a flip-flop. The XOR gate generates the Macrocell output and
can optionally be registered. In Figure 2.18, the inputs to the Macrocell are shown as single-
input AND gates because each is generated as a wired-AND (called a p-term) of the signals
drawn on the left-hand side of the figure. A p-term can include any signal in the PIA, any of
the LAB Expander Product Terms (described below), or the output of any other Macrocell.
With this arrangement the Macrocell Array functions much like a PLD, but with fewer product
terms per register (there are usually at least eight product terms per register in a PLD).
Altera claims [Alt90] that this makes the LAB more efficient because most logic functions do
not require the large number of p-terms found in PLDs and the LAB supports wide functions
by way of the Expander Product Terms.
As illustrated in Figure 2.19, each Expander Product Terms block consists of a number of p-
terms (the number shown in the figure is only suggestive) that are inverted and fed back to
the Macrocell Array, and to itself. This arrangement permits the implementation of very wide
logic functions because any Macrocell has access to these extra p-terms. The Altera routing
structure, the PIA, consists of a number of long wiring segments that pass adjacent to every
LAB. The PIA provides complete connectivity because each LAB input can be programmably
connected to the output of any LAB, without constraints. With this arrangement, routing an
Altera FPGA is trivial, since there are no routing constraints. However, as mentioned
previously for Actel FPGAs, this level of connectivity is excessive and could probably be
reduced, given an appropriate routing algorithm.
(a) The LCA architecture (notice the matrix element size is larger than a CLB).
A 4 x 5 array of Logic Array Blocks (LABs), the same size as the EMP9400
chip.
A simplified block diagram of the interconnect architecture showing the
connection of the FastTrack buses to a LAB.
(b) The ACT 1 Logic Module (LM, the Actel basic logic cell). The ACT 1 family
uses just one type of LM. ACT 2 and ACT 3 FPGA families both use two
different types of LM
(d) An example logic macro. Connect logic signals to some or all of the LM
inputs, the remaining inputs to VDD or GND.
Fig. 5.16 Actel ACT 1
Actel uses a fine-grain architecture which allows you to use almost all of the
FPGA.
Synthesis can map logic efficiently to a fine-grain architecture.
Physical symmetry simplifies place-and-route (swapping equivalent pins on
opposite sides of the LM to ease routing).
Matched to small antifuse programming technology.
LMs balance efficiency of implementation and efficiency of utilization.
A simple LM reduces performance, but allows fast and robust place-and-route.
The implementation details vary among the families, but the basic features: wide
programmable-AND array, narrow fixed-OR array, logic expanders, and
programmable inversion—are very similar. Each family has the following individual
characteristics:
A typical MAX 5000 chip has: 8 dedicated inputs (with both true and
complement forms); 24 inputs from the chipwide interconnect (true and
complement); and either 32 or 64 shared expander terms (single polarity).
The MAX 5000 LAB looks like a 32V16 PLD (ignoring the expander terms).
The MAX 7000 LAB has 36 inputs from the chipwide interconnect and 16
shared expander terms; the MAX 7000 LAB looks like a 36V16 PLD.
The MAX 9000 LAB has 33 inputs from the chipwide interconnect and 16 local
feedback inputs (as well as 16 shared expander terms); the MAX 9000 LAB
looks like a 49V16 PLD.
circuit. Observability is the ability to observe, either directly or indirectly, the state of any
node in the circuit.
Ad hoc testing
Scan-based approaches
Built-in self-test (BIST)
Ad hoc test techniques, as their name suggests, are collections of ideas aimed at
reducing the combinational explosion of testing. They are summarized here for historical
reasons.
They are only useful for small designs where scan, ATPG, and BIST are not
available. A complete scan-based testing methodology is recommended for all digital
circuits. Having said that, the following are common techniques for ad hoc testing:
A technique classified in this category is the use of the bus in a bus-oriented system for
test purposes. Each register has been made loadable from the bus and capable of being
driven onto the bus.
Here, the internal logic values that exist on a data bus are enabled onto the bus for
testing purposes. Frequently, multiplexers can be used to provide alternative signal paths
during testing. In CMOS, transmission gate multiplexers provide low area and delay
overhead.
Any design should always have a method of resetting the internal state of the chip within
a single cycle or at most a few cycles. Apart from making testing easier, this also makes
simulation faster, as only a few cycles are required to initialize the chip.
In general, ad hoc testing techniques represent a bag of tricks developed over the years
by designers to avoid the overhead of a systematic approach to testing. While these general
approaches are still quite valid, process densities and chip complexities necessitate a
structured approach to testing.
The scan-design strategy for testing has evolved to provide observability and
controllability at each register. In designs with scan, the registers operate in one of two
modes.
Therefore, scan mode gives easy observability and controllability of every register in
the system. Modern scan is based on the use of scan registers, as shown in Figure 6.1 . The
scan register is a D flip-flop preceded by a multiplexer.
For the circuit to load the scan chain, SCAN is asserted and CLK is pulsed eight
times to load the first two ranks of 4-bit registers with data. SCAN is deasserted and CLK is
asserted for one cycle to operate the circuit normally with predefined inputs. SCAN is then
reasserted and CLK asserted eight times to read the stored data out.
At the same time, the new register contents can be shifted in for the next test.
Testing proceeds in this manner of serially clocking the data through the scan register to the
right point in the circuit, running a single system clock cycle and serially clocking the data out
for observation.
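The serial-scan procedure above can be summarized with a small Python sketch (not part of the original notes): a test vector is shifted into the chain, one functional clock captures the combinational outputs, and the response is shifted out while the next vector shifts in. The chain length and the toy combinational block are assumptions made only for illustration.

# Minimal sketch of serial scan testing (illustrative assumptions:
# an 8-bit scan chain feeding a toy combinational block).

def comb_logic(bits):
    """Example combinational block: invert each bit (stand-in for real logic)."""
    return [b ^ 1 for b in bits]

class ScanChain:
    def __init__(self, length):
        self.regs = [0] * length            # flip-flop contents

    def shift(self, scan_in_bits):
        """SCAN asserted: serially shift a pattern in, capturing what falls out."""
        scan_out = []
        for bit in scan_in_bits:
            scan_out.append(self.regs[-1])  # bit leaving the chain (SO pin)
            self.regs = [bit] + self.regs[:-1]
        return scan_out

    def capture(self):
        """SCAN deasserted for one clock: registers load the combinational outputs."""
        self.regs = comb_logic(self.regs)

chain = ScanChain(8)
test_vector = [1, 0, 1, 1, 0, 0, 1, 0]
chain.shift(test_vector)          # 8 clocks with SCAN asserted
chain.capture()                   # 1 normal clock with predefined inputs
response = chain.shift([0] * 8)   # shift the response out (next vector could shift in)
print("captured response:", response)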
In this scheme, every input to the combinational block can be controlled and every
output can be observed. In addition, running a random pattern of 1s and 0s through the scan
chain can test the chain itself.
Test generation for this type of test architecture can be highly automated. ATPG
techniques can be used for the combinational blocks and, as mentioned, the scan chain is
easily tested.
The prime disadvantage is the area and delay impact of the extra multiplexer in the
scan register. Designers (and managers alike) are in widespread agreement that this cost is
more than offset by the savings in debug time and production test cost.
6.2.1PARALLEL SCAN
You can imagine that serial scan chains can become quite long, and the loading and
unloading can dominate testing time. A fairly simple idea is to split the chains into smaller
segments. This can be done on a module-by-module basis or completed automatically to
some specified scan length. Extending this to the limit yields an extension to serial scan
called random access scan.
To some extent, this is similar to that used inside FPGAs to load and read the control
RAM. The basic idea is shown in Figure 6.2. The figure shows a two-by-two register section.
Each register receives a column (column<m>) and row (row<n>) access signal along with a
row data line (data<n>).
A global write signal (write) is connected to all registers. By asserting the row and
column access signals in conjunction with the write signal, any register can be read or
written in exactly the same method as a conventional RAM.
The notional logic is shown to the right of the four registers. Implementing the logic required
at the transistor level can reduce the overhead for each register.
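As a rough illustration of this RAM-like addressing, the Python sketch below (not from the notes) models a two-by-two register array matching Figure 6.2; the read/write interface names are assumptions.

# Sketch of random access scan: registers addressed like a small RAM
# (2x2 array as in the figure; the row/col/write interface is illustrative).

class RandomAccessScan:
    def __init__(self, rows=2, cols=2):
        self.cell = [[0] * cols for _ in range(rows)]

    def write(self, row, col, data):
        """Assert row<n>, column<m> and the global write signal to load one register."""
        self.cell[row][col] = data & 1

    def read(self, row, col):
        """Assert row<n> and column<m> without write to observe one register."""
        return self.cell[row][col]

ras = RandomAccessScan()
ras.write(1, 0, 1)            # set a single internal register directly
print(ras.read(1, 0))         # observe it without shifting a whole chain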
The setup time increases by the delay of the extra transmission gate in series with the D input as compared to the ordinary static flip-flop. Figure 6.4(c) shows a circuit using clock gating to obtain nearly the same setup time as the ordinary flip-flop.
In either design, if a clock enable is used to stop the clock to unused portions of the chip, care must be taken that the clock always toggles during scan mode.
Self-test and built-in test techniques, as their names suggest, rely on augmenting circuits to allow them to perform operations upon themselves that prove correct operation.
These techniques add area to the chip for the test logic, but reduce the test time required and thus can lower the overall system cost. The testing literature offers extensive coverage of the subject from the implementer’s perspective. One method of testing a module is to use signature analysis or cyclic redundancy checking.
This involves using a pseudo-random sequence generator (PRSG) to produce the input signals for a section of combinational circuitry and a signature analyzer to observe the output signals. A PRSG of length n is constructed from a linear feedback shift register (LFSR), which in turn is made of n flip-flops connected in a serial fashion, as shown in Figure 6.5(a). The XOR of particular outputs is fed back to the input of the LFSR. An n-bit LFSR will cycle through 2^n – 1 states before repeating the sequence.
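A short Python sketch of such a PRSG follows (not from the notes); the 4-bit width and the feedback taps, corresponding to the polynomial x^4 + x^3 + 1, are assumptions chosen to give a maximal-length sequence.

# Sketch of an n-bit LFSR used as a pseudo-random sequence generator (PRSG).
# Taps for a maximal-length 4-bit LFSR (x^4 + x^3 + 1) are assumed for illustration.

def lfsr_sequence(width=4, taps=(3, 2), seed=0b0001):
    """Yield successive LFSR states; a maximal-length LFSR visits 2**width - 1 states."""
    state = seed
    seen = set()
    while state not in seen:
        seen.add(state)
        yield state
        feedback = 0
        for t in taps:                    # XOR of the tapped flip-flop outputs
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & ((1 << width) - 1)

states = list(lfsr_sequence())
print(len(states), "states before repeating")   # 15 = 2**4 - 1 (all-zeros state excluded)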
A complete feedback shift register (CFSR), shown in Figure 6.5(b), includes the zero
state that may be required in some test situations.
Otherwise, the sequence is the same. Alternatively, the bottom n bits of an n + 1-bit
LFSR can be used to cycle through the all zeros state without the delay of the NOR gate. A
signature analyzer receives successive outputs of a combinational logic block and produces
a syndrome that is a function of these outputs. The syndrome is reset to 0, and then XORed
with the output on each cycle. The syndrome is swizzled each cycle so that a fault in one bit
is unlikely to cancel itself out.
At the end of a test sequence, the LFSR contains the syndrome that is a function of all
previous outputs. This can be compared with the correct syndrome (derived by running a
test program on the good logic) to determine whether the circuit is good or bad.
If the syndrome contains enough bits, it is improbable that a defective circuit will produce
the correct syndrome.
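The signature-analysis idea can be sketched in the same style (again only an illustration, not the notes' exact circuit): the syndrome is XORed with each output word and then swizzled through LFSR-style feedback, so a single-bit fault is unlikely to cancel out. The word width and feedback taps below are assumptions.

# Sketch of signature analysis: the syndrome is XORed with each output word
# and swizzled by LFSR-style feedback every cycle. Width and taps are assumed.

WIDTH = 8
TAPS = (7, 5, 4, 3)
MASK = (1 << WIDTH) - 1

def swizzle(syndrome):
    """One LFSR-style shift so a fault does not simply cancel on a later cycle."""
    feedback = 0
    for t in TAPS:
        feedback ^= (syndrome >> t) & 1
    return ((syndrome << 1) | feedback) & MASK

def signature(outputs):
    """Compress a stream of circuit outputs into a single syndrome."""
    syndrome = 0                       # syndrome is reset to 0 at the start of the test
    for word in outputs:
        syndrome = swizzle(syndrome ^ word)
    return syndrome

good = signature([0x3A, 0x15, 0xF0, 0x07])
bad  = signature([0x3A, 0x15, 0xF1, 0x07])   # one faulty bit in one cycle
print(hex(good), hex(bad), good != bad)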
6.3.1 BIST
The combination of signature analysis and the scan technique creates a structure known as
BIST—for Built-In Self-Test or BILBO—for Built-In Logic Block Observation. The 3-bit BIST
register shown in Figure 6.6 is a scannable, resettable register that also can serve as a
pattern generator and signature analyzer. C[1:0] specifies the mode of operation.
In the reset mode (10), all the flip-flops are synchronously initialized to 0. In normal
mode (11), the flip-flops behave normally with their D input and Q output. In scan mode (00),
the flip-flops are configured as a 3-bit shift register between SI and SO.
Note that there is an inversion between each stage. In test mode (01), the register
behaves as a pseudo-random sequence generator or signature analyzer. If all the D inputs
are held low, the Q outputs loop through a pseudo-random bit sequence, which can serve as
the input to the combinational logic.
If the D inputs are taken from the combinational logic output, they are swizzled with
the existing state to produce the syndrome. In summary, BIST is performed by first resetting
the syndrome in the output register. Then both registers are placed in the test mode to
produce the pseudo-random inputs and calculate the syndrome.
Finally, the syndrome is shifted out through the scan chain. Various companies have
commercial design aid packages that automatically replace ordinary registers with scannable
BIST registers, check the fault coverage, and generate scripts for production testing.
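A behavioural Python sketch of the 3-bit BIST (BILBO) register and its four C[1:0] modes is given below; it is only an illustration, and the particular feedback taps and placement of the inter-stage inversions are assumptions rather than the exact circuit of Figure 6.6.

# Sketch of a 3-bit BIST (BILBO) register: C[1:0] selects reset, normal,
# scan, or test (PRSG / signature analysis) mode. Taps are illustrative.

class BILBO3:
    def __init__(self):
        self.q = [0, 0, 0]

    def clock(self, c, d=(0, 0, 0), si=0):
        q = self.q
        if c == (1, 0):                  # reset: synchronously clear all flip-flops
            self.q = [0, 0, 0]
        elif c == (1, 1):                # normal: D inputs load into Q outputs
            self.q = list(d)
        elif c == (0, 0):                # scan: shift SI through the stages (assumed inversions)
            self.q = [si, q[0] ^ 1, q[1] ^ 1]
        elif c == (0, 1):                # test: LFSR feedback swizzled with the D inputs
            fb = q[2] ^ q[1]             # assumed feedback taps
            self.q = [fb ^ d[0], q[0] ^ d[1], q[1] ^ d[2]]
        return q[2]                      # serial output (value before this clock edge)

reg = BILBO3()
reg.clock((1, 0))                        # reset mode
reg.clock((1, 1), d=(1, 0, 0))           # normal mode: load a nonzero seed
for _ in range(6):                       # test mode with D held low: PRSG behaviour
    reg.clock((0, 1))
    print(reg.q)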
On many chips, memories account for the majority of the transistors. A robust testing methodology must be applied to provide reliable parts. In a typical memory BIST (MBIST) scheme, multiplexers are placed on the address, data, and control inputs of the memory to allow direct access during test.
During testing, a state machine uses these multiplexers to directly write a
checkerboard pattern of alternating 1s and 0s. The data is read back, checked, then the
inverse pattern is also applied and checked. ROM testing is even simpler: The contents are
read out to a signature analyzer to produce a syndrome.
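The checkerboard MBIST sequence can be sketched as follows (illustrative only; the memory depth, 8-bit data width, and function names are assumptions):

# Sketch of memory BIST: write a checkerboard of alternating 1s and 0s,
# read it back and check, then repeat with the inverse pattern.

DEPTH = 16                                 # assumed memory depth; data is 8 bits wide
memory = [0] * DEPTH                       # stand-in for the RAM under test

def checkerboard(addr, invert=False):
    pattern = 0x55 if addr % 2 == 0 else 0xAA
    return pattern ^ (0xFF if invert else 0x00)

def mbist(mem):
    for invert in (False, True):           # pattern, then its inverse
        for addr in range(DEPTH):          # write phase (via the test multiplexers)
            mem[addr] = checkerboard(addr, invert)
        for addr in range(DEPTH):          # read-and-compare phase
            if mem[addr] != checkerboard(addr, invert):
                return False               # mismatch: the memory fails
    return True

print("memory passes MBIST:", mbist(memory))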
On-chip speeds are usually so high that directly observing internal behavior for testing can be difficult or impossible. Designers have included on-chip logic analyzers and oscilloscopes to deal with this problem. Such systems typically require a trigger signal to initiate data collection, a high-speed timing generator, analog or digital sampling, and a buffer to store the results until they can be off-loaded at lower speed.
A drawback is that the nodes to be observed must be selected at design time, and these
may not be the problem circuits. Nevertheless, probing major busses and critical analog/RF
nodes can be helpful. Also, on-chip scopes have been used to characterize power supply
noise. Analog/digital converter testing requires real-time access to the digital output of the
ADC. Providing parallel digital test ports by reassigning pins on the chip I/O can facilitate this
testing.
If this is impossible, a “capture RAM” on chip can be used to capture results in real-time
and then the contents can be transferred off-chip at a slower rate for analysis. If both ADCs
and DACs are present, a loopback strategy can be employed, as shown in Figure 6.8.
Both analog and digital signals can loop back. Communication and graphics systems
frequently have I/O systems that can be configured as shown. It is often worthwhile to add a
DAC and an ADC to a system to allow a level of analog self-test.
6.3.4 IDDQ
A method of testing for bridging faults is called IDDQ test (VDD supply current
Quiescent) or supply current monitoring. This relies on the fact that when a CMOS logic gate
is not switching, it draws no DC current (except for leakage).
When a bridging fault occurs, then for some combination of input conditions, a
measurable DC IDD will flow. Testing consists of applying the normal vectors, allowing the
signals to settle, and then measuring IDD.
As potentially only one gate is affected, the IDDQ test has to be very sensitive. In
addition, to be effective, any circuits that draw DC power such as pseudo-nMOS gates or
analog circuits have to be disabled. Dynamic gates can also cause problems.
As current measuring is slow, the tests must be run slower (of the order of 1 ms per
vector) than normal, which increases the test time. IDDQ testing can be completed externally
to the chip by measuring the current drawn on the VDD line or internally using specially
constructed test circuits. This technique gives a form of indirect massive observability at little
circuit overhead.
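The overall IDDQ test flow amounts to the loop sketched below (an illustration only; the current-measurement stub and the pass/fail threshold are assumptions, since a real tester measures the VDD supply current directly):

# Sketch of the IDDQ test flow. The measure_idd() stub and the pass/fail
# threshold are illustrative assumptions.

IDDQ_LIMIT_UA = 10.0                       # assumed quiescent-current limit in microamps

def measure_idd(vector):
    """Stand-in for the tester's current measurement after signals settle."""
    return 0.5 if vector != (1, 0, 1) else 250.0   # pretend one vector excites a bridge

def iddq_test(vectors):
    failures = []
    for v in vectors:
        # apply the vector, allow signals to settle, then measure (slow: about 1 ms/vector)
        if measure_idd(v) > IDDQ_LIMIT_UA:
            failures.append(v)
    return failures

vectors = [(0, 0, 0), (1, 0, 1), (1, 1, 0)]
print("vectors that expose excess IDDQ:", iddq_test(vectors))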
In this type of tester, often called a bed-of-nails tester, the board under test is lowered onto a set of test points (nails)
that probe points of interest on the board. These can be sensed (the observable points) and
driven (the controllable points) to test the complete board. At the chassis level, software
programs are frequently used to test a complete board set.
For instance, when a computer boots, it might run a memory test on the installed
memory to detect possible faults.
Boundary scan was originally developed by the Joint Test Access Group and hence
is commonly referred to as JTAG. Boundary scan has become a popular standard interface
for controlling BIST features as well.
The IEEE 1149 boundary scan architecture is shown in Figure 7.1. All of the I/O
pins of each IC on the board are connected serially in a standardized scan chain accessed
through the Test Access Port (TAP) so that every pin can be observed and controlled
remotely through the scan chain. At the board level, ICs obeying the standard can be
connected in series to form a scan chain spanning the entire board.
Connections between ICs are tested by scanning values into the outputs of each chip
and checking that those values are received at the inputs of the chips they drive. Moreover,
chips with internal scan chains and BIST can access those features through boundary scan
to provide a unified testing framework.
When the chip is in normal mode, TRST and TCK are held low and TMS is held high to disable boundary scan.
To prevent race conditions, inputs are sampled on the rising edge of TCK and outputs toggle on the falling edge.
The boundary scan register is associated with all of the inputs and outputs on the chip, so that boundary scan can observe and control the chip I/Os.
The bypass register is a single flip-flop used to accelerate testing by avoiding shifting data into the boundary scan registers of idle chips when only a single chip on the board is being tested.
The TAP controller is a 16-state FSM that proceeds from state to state based on the TCK and TMS signals.
It provides signals that control the test-data registers and the instruction register. These include serial shift clocks and update clocks.
It moves from one state to the next on the rising edge of TCK based on the value of TMS.
A typical test sequence involves clocking TCK at some rate, setting TRST* to 0 for a few cycles, and then returning this signal to 1 to reset the TAP controller state machine.
TMS is then toggled to traverse the state machine for whatever operation is required. These operations include serially loading an instruction register or serially loading or reading data registers that are used to test the chip.
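The 16-state TAP controller can be modeled directly from its TMS-driven transition table. The Python sketch below (not from the notes) uses the standard IEEE 1149.1 state names; the example TMS sequence that walks the controller to Shift-IR is included only for illustration.

# Minimal model of the IEEE 1149.1 TAP controller: 16 states, next state chosen
# by TMS on each rising edge of TCK.

NEXT = {  # state: (next if TMS=0, next if TMS=1)
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",    "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",      "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",      "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",      "Update-DR"),
    "Pause-DR":         ("Pause-DR",      "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",      "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",    "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",      "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",      "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",      "Update-IR"),
    "Pause-IR":         ("Pause-IR",      "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",      "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def run_tap(tms_bits, state="Test-Logic-Reset"):
    """Advance the TAP controller by one TCK rising edge per TMS bit."""
    trace = [state]
    for tms in tms_bits:
        state = NEXT[state][tms]
        trace.append(state)
    return trace

# Five TMS=1 clocks always return the controller to Test-Logic-Reset,
# then the sequence 0,1,1,0,0 walks it to Shift-IR so an instruction can be shifted in.
print(run_tap([1, 1, 1, 1, 1] + [0, 1, 1, 0, 0]))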
The instruction register has to be at least two bits long. Recall that boundary scan requires at least two data registers.
The instruction register specifies which data register will be placed in the scan chain when the DR is selected. It also determines where the DR will load its value from in the Capture-DR state and whether the values will be driven to output pads or core logic.
bypass—This instruction places the bypass register in the DR chain so that the path from TDI to TDO involves only a single flip-flop. This instruction is represented with all 1s in the IR.
extest—This instruction allows for the testing of off-chip circuitry. It is similar to sample/preload, but also drives the values from the DRs onto the output pads.
The instruction register cells are loaded with a constant value from the Data input in the Capture-IR state and then shifted out in the Shift-IR state while new values are shifted in.
In the Update-IR state, the contents of the shift register are copied in parallel to the IR output to load the entire instruction at once.
The test data registers are used to set the inputs of modules to be tested and collect
the results of running tests.
A multiplexer under the control of the TAP controller selects which data register is routed to the TDO pin.
When internal data registers are added, the IR decoder must produce extra control signals to select which one is in the DR chain for a particular instruction.
The boundary scan register connects to all of the I/O circuitry. Like the instruction register, it contains a shift register for the scan chain and an additional bank of flip-flops to update the outputs in parallel.
An extra multiplexer on the output allows the boundary scan register to override the normal path through the I/O pad so it can observe and control inputs and outputs.
The schematic and symbol for a single bit of the boundary scan register are shown in Figure.
The boundary scan register can be configured as an input pad or output pad, as shown in Figure (a and b).
As an input, the register receives Data in from the pad and sends Qout to the core logic in the chip.
As an output, the register receives Data in from the core logic and drives Qout to a pad.
Tristate and bidirectional pads use two or three boundary scan register cells, as shown in Figure (c and d).
The Mode signal determines whether Qout should be taken from Data in or from the boundary scan register.
Separate mode_in and mode_out signals are used for input and output pads so they can be controlled separately.
In normal chip operation, both mode signals are 0, so the boundary scan registers are ignored.
For the extest instruction, mode_out = 1, so the outputs can be controlled by the boundary scan registers.
For the intest or runbist instructions, mode_in and mode_out are both 1, so the core logic receives its inputs from the boundary scan registers and the outputs are also driven to known safe values by the boundary scan registers.
When executing the bypass instruction, the single-bit Bypass register is connected between TDI and TDO. It consists of a single flip-flop that is cleared during Capture-DR and then scanned during Shift-DR, as shown in Figure.
The TDO pin shifts out the least significant bit of the IR during Shift-IR, or the least significant bit of one of the data registers during Shift-DR, depending on which instruction is active.
The IEEE boundary scan specification requires that TDO change on the falling edge of
TCK and be tristated except during the Shift states.
This prevents race conditions when the value is clocked into the next chip on the rising edge of TCK, and allows multiple chips to be connected in parallel with their TDO pins tied together to reduce the length of the boundary scan chain.
Figure: TDO driver.
Typical commercial products target a defect rate of 350–1000 defects per million
(DPM) chips shipped. The customer then assembles systems from the chips, tests the
systems, and discards or repairs defective systems.
A high defect rate leads to unhappy customers. A critical factor in all VLSI design is the
need to incorporate methods of testing circuits. This task should proceed concurrently with
architectural considerations and not be left until fabricated parts are available (as is a
recurring temptation to designers).
To deal with the existence of good and bad parts, it is necessary to propose a fault
model; i.e., a model for how faults occur and their impact on circuits. The most
popular model is called the Stuck-At model. The Short Circuit/ Open Circuit model
can be a closer fit to reality, but is harder to incorporate into logic simulation tools.
In the Stuck-At model, a faulty gate input is modeled as stuck at zero (Stuck-At-0, S-A-0) or stuck at one (Stuck-At-1, S-A-1). This model dates from board-level designs, where it was determined to be adequate for modeling faults. Figure illustrates how an S-A-0 or S-A-1 fault might occur. These faults most frequently occur due to gate oxide shorts (the nMOS gate to GND or the pMOS gate to VDD) or metal-to-metal shorts.
Other models include stuck-open or shorted models. Two bridging or shorted faults are shown in Figure (a).
The short S1 results in an S-A-0 fault at input A, while short S2 modifies the function of the gate. It is evident that to ensure the most accurate modeling, faults should be modeled at the transistor level, because it is only at this level that the complete circuit structure is known.
For instance, in the case of a simple NAND gate, the intermediate node between the series nMOS transistors is hidden by the schematic. This implies that test generation should ideally take account of possible shorts and open circuits at the switch level [Galiay80]. Expediency dictates that most existing systems rely on Boolean logic representations of circuits and stuck-at fault modeling.
Figure (b): A CMOS open fault that causes sequential faults. Figure (c): A defect that causes static IDD current.
A particular problem that arises with CMOS is that it is possible for a fault to convert
a combinational circuit into a sequential circuit. This is illustrated in Figure b for the case of a
2-input NOR gate in which one of the transistors is rendered ineffective. If nMOS transistor A
is stuck open, then the function displayed by the gate will be
Z = ~(A + B) + A·~B·Z′
where Z′ is the previous state of the gate. As another example, if either pMOS transistor is
missing, the node would be arbitrarily charged (i.e., it might be high due to some weird
charging sequence) until one of the nMOS transistors discharged the node. Thereafter, it
would remain at zero, barring charge leakage effects.
This could physically occur if stray metal (caused by a speck of dust at the
photolithography stage) overlapped the VDD line and drain connection as shown.
If we apply the test vector 01 or 10 to the A and B inputs and measure the static IDD current, we will notice that it rises to a value determined by the size of the nMOS transistors.
5.2 OBSERVABILITY
The observability of a particular circuit node is the degree to which you can observe
that node at the outputs of an integrated circuit (i.e., the pins). This metric is relevant when
you want to measure the output of a gate within a larger circuit to check that it operates
correctly. Given the limited number of nodes that can be directly observed, it is the aim of
good chip designers to have easily observed gate outputs.
Adoption of some basic design for test techniques can aid tremendously in this
respect. Ideally, you should be able to observe directly or with moderate indirection (i.e., you
may have to wait a few cycles) every gate output within an integrated circuit.
While at one time this aim was hindered by the expense of extra test circuitry and a
lack of design methodology, current processes and design practices allow you to approach
this ideal.
5.3 CONTROLLABILITY
The controllability of an internal circuit node within a chip is a measure of the ease of
setting the node to a 1 or 0 state. This metric is of importance when assessing the degree of
difficulty of testing a particular signal within a circuit. An easily controllable node would be
directly settable via an input pad.
A node with little controllability, such as the most significant bit of a counter, might
require many hundreds or thousands of cycles to get it to the right state. Often, you will find it
impossible to generate a test sequence to set a number of poorly controllable nodes into the
right state. It should be the aim of good chip designers to make all nodes easily controllable.
In common with observability, the adoption of some simple design for test
techniques can aid in this respect tremendously. Making all flip-flops resettable via a global
reset signal is one step toward good controllability.
Each circuit node is taken in sequence and held to 0 (S-A-0), and the circuit is
simulated with the test vectors comparing the chip outputs with a known good machine––a
circuit with no nodes artificially set to 0 (or 1).
When a discrepancy is detected between the faulty machine and the good machine,
the fault is marked as detected and the simulation is stopped. This is repeated for setting the
node to 1 (S-A-1). In turn, every node is stuck (artificially) at 1 and 0 sequentially.
The fault coverage of a set of test vectors is the percentage of the total faults that are detected when the vectors are applied (coverage = detected faults / total faults × 100%). To achieve world-class quality levels, circuits are required to have in excess of 98.5% fault coverage. The Verification Methodology Manual is a standard reference for fault coverage techniques.
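Serial stuck-at fault simulation and the fault-coverage calculation can be sketched as follows (a toy two-gate circuit and a handful of vectors are assumed purely for illustration; commercial fault simulators work on full gate-level netlists):

# Sketch of serial stuck-at fault simulation on a toy circuit
# (Y = NOT(A AND B) OR C). The netlist and vectors are illustrative assumptions.

def good_circuit(a, b, c, fault=None):
    """Evaluate the toy circuit, optionally forcing one node to a stuck value."""
    def node(name, value):
        if fault is not None and fault[0] == name:
            return fault[1]              # stuck-at value overrides the computed value
        return value
    A = node("A", a)
    B = node("B", b)
    C = node("C", c)
    N1 = node("N1", 1 - (A & B))         # NAND gate
    Y = node("Y", N1 | C)                # OR gate
    return Y

def fault_coverage(vectors):
    faults = [(n, sv) for n in ("A", "B", "C", "N1", "Y") for sv in (0, 1)]
    detected = 0
    for f in faults:
        # a fault is detected if some vector shows a discrepancy vs. the good machine
        if any(good_circuit(*v) != good_circuit(*v, fault=f) for v in vectors):
            detected += 1
    return 100.0 * detected / len(faults)

vectors = [(0, 0, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1)]
print("fault coverage: %.1f%%" % fault_coverage(vectors))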
Historically, in the IC industry, logic and circuit designers implemented the functions at
the RTL or schematic level, mask designers completed the layout, and test engineers wrote
the tests. In many ways, the test engineers were the Sherlock Holmes of the industry,
reverse engineering circuits and devising tests that would test the circuits in an adequate
manner.
For the longest time, test engineers implored circuit designers to include extra circuitry to
ease the burden of test generation. Happily, as processes have increased in density and
chips have increased in complexity, the inclusion of test circuitry has become less of an
overhead for both the designer and the manager worried about the cost of the die.
In addition, as tools have improved, more of the burden for generating tests has fallen on
the designer. To deal with this burden, Automatic Test Pattern Generation (ATPG) methods
have been invented. The use of some form of ATPG is standard for most digital designs.
Commercial ATPG tools can achieve excellent fault coverage.
However, they are computation-intensive and often must be run on servers or compute
farms with many parallel processors. Some tools use statistical algorithms to predict the fault
coverage of a set of vectors without performing as much simulation.
The fault models dealt with until this point have neglected timing. Failures that occur in
CMOS could leave the functionality of the circuit untouched, but affect the timing.
If an open circuit occurs in one of the nMOS transistor source connections to GND, then
the gate would still function but with increased tpdf. In addition, the fault now becomes
sequential as the detection of the fault depends on the previous state of the gate.
Delay faults may be caused by crosstalk. Delay faults can also occur more often in SOI logic through the history effect. Software has been developed to model the effect of delay faults, which are becoming a more important failure mode as processes scale.
PART-A
1. What is a FPGA?
A field programmable gate array (FPGA) is a programmable logic device
that supports implementation of relatively large logic circuits. FPGAs can be
used to implement a logic circuit with more than 20,000 gates, whereas a CPLD can
implement circuits of up to about 20,000 equivalent gates.
What is a feedthrough cell?
A connection that needs to cross over a row of standard cells uses a feedthrough cell.
What is ULSI?
ULSI is short for ultra large scale integration, which refers loosely to placing more than
about one million circuit elements on a single chip. The Intel 486 and Pentium
microprocessors, for example, use ULSI technology.
i) Detailed Routing
What are functionality tests?
Functionality tests verify that the chip performs its intended function. These tests assert that
all the gates in the chip, acting in concert, achieve a desired function. These tests are
usually used early in the design cycle to verify the functionality of the circuit.
What is fault sampling?
An approach to fault analysis is known as fault sampling. This is used in circuits where it is
impossible to fault every node in the circuit. Nodes are randomly selected and faulted. The
resulting fault detection rate may be statistically inferred from the number of faults that are
detected in the fault set and the size of the set. The randomly selected faults are unbiased. It
will determine whether the fault coverage exceeds a desired level.
29. What are the factors that cause timing failures?[MAY 2012]
Temperature and supply-voltage variations, and the large RC delays of long interconnects.
32. In saturation region, what are the factors that affect Ids? [MAY 2011]
i. Distance between source and drain
ii. Channel width
iii. Threshold voltage
iv. Thickness of oxide layer
v. Dielectric constant of gate insulator
vi. Carrier mobility
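These factors enter directly through the standard first-order (long-channel) saturation-current expression:
Ids(sat) = (μ·Cox/2)·(W/L)·(VGS − VT)^2, with Cox = εox/tox
where L is the source-to-drain channel length, W the channel width, VT the threshold voltage, tox the oxide thickness, εox the dielectric constant of the gate insulator, and μ the carrier mobility.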
34. What are the test features required to test a chip? [MAY 2010]
A test fixture is a device or setup designed to hold the device under test in place and allow
it to be tested by being subjected to controlled electronic test signals.
E.g.: 1. Socket-style test fixtures
2. Semi-custom test fixtures
PART B
1. (a) Explain in detail the sequence of scan-based techniques. (16) [MAY 2011]
(b) With the essential circuit modules, explain in detail the BIST technique. (16) [MAY 2011]
2. (a) Explain the manufacturing test principles in detail. (16) [DEC 2011]
3. (b) Describe the ad hoc testing and scan-based approaches to design for testability in detail. (16) [DEC 2011]
4. (a) Explain the design for testability (DFT) concepts. [MAY 2013]
5. (b) Explain the following terms: [MAY 2013]
(i) Silicon debug principles. (8)
(ii) Boundary scan techniques. (8)
6. (a) Describe in detail the various manufacturing tests in CMOS testing. [DEC 2013]
7. (b) Explain in detail boundary scan testing. (16) [DEC 2013]
8. (a) Discuss the need for testing and explain the silicon debugging principles. (16) [MAY 2014]
10. Explain the general architecture of FPGA and bring out the different programmable blocks used. [NOV 2015] [NOV 2017]