
Vlsi Notes 16-17 Even Final-2

The document provides lecture notes for the EC8095 VLSI Design course, detailing the syllabus and topics covered in various units including MOS transistors, combinational and sequential circuit design, arithmetic building blocks, and implementation strategies. It includes references to key textbooks and outlines specific concepts such as CMOS logic, power architecture, and design for testability. The notes are prepared by Mrs. P. Vijayasri, Assistant Professor in the Department of Electronics & Communication Engineering at SKR Engineering College.


(Approved by AICTE, affiliated to AU, NAAC & NBA accredited)

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

LECTURE NOTES
EC8095-VLSI DESIGN
(2017 Regulation)
Year/Semester: III/VI ECE

Prepared by
Mrs. P. Vijayasri, Assist. Prof.
Department of ECE EC8095-VLSI DESIGN

Syllabus:
EC8095 VLSI DESIGN SYLLABUS REGULATION 2017

UNIT I INTRODUCTION TO MOS TRANSISTOR 9
MOS Transistor, CMOS Logic, Inverter, Pass Transistor, Transmission Gate, Layout Design Rules, Gate Layouts, Stick Diagrams, Long-Channel I-V Characteristics, C-V Characteristics, Non-ideal I-V Effects, DC Transfer Characteristics, RC Delay Model, Elmore Delay, Linear Delay Model, Logical Effort, Parasitic Delay, Delay in Logic Gate, Scaling.

UNIT II COMBINATIONAL MOS LOGIC CIRCUITS 9
Circuit Families: Static CMOS, Ratioed Circuits, Cascode Voltage Switch Logic, Dynamic Circuits, Pass Transistor Logic, Transmission Gates, Domino, Dual Rail Domino, CPL, DCVSPG, DPL, Circuit Pitfalls. Power: Dynamic Power, Static Power, Low Power Architecture.

UNIT III SEQUENTIAL CIRCUIT DESIGN 9
Static Latches and Registers, Dynamic Latches and Registers, Pulse Registers, Sense Amplifier Based Register, Pipelining, Schmitt Trigger, Monostable Sequential Circuits, Astable Sequential Circuits. Timing Issues: Timing Classification of Digital Systems, Synchronous Design.

UNIT IV DESIGN OF ARITHMETIC BUILDING BLOCKS AND SUBSYSTEM 9
Arithmetic Building Blocks: Data Paths, Adders, Multipliers, Shifters, ALUs, Power and Speed Tradeoffs, Case Study: Design as a Tradeoff. Designing Memory and Array Structures: Memory Architectures and Building Blocks, Memory Core, Memory Peripheral Circuitry.

UNIT V IMPLEMENTATION STRATEGIES AND TESTING 9
FPGA Building Block Architectures, FPGA Interconnect Routing Procedures. Design for Testability: Ad Hoc Testing, Scan Design, BIST, IDDQ Testing, Design for Manufacturability, Boundary Scan.
TEXT BOOKS:
1. Neil H. E. Weste, David Money Harris, "CMOS VLSI Design: A Circuits and Systems Perspective", 4th Edition, Pearson, 2017. (UNIT I, II, V)
2. Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, "Digital Integrated Circuits: A Design Perspective", Second Edition, Pearson, 2016. (UNIT III, IV)

REFERENCES
1. M. J. Smith, "Application Specific Integrated Circuits", Addison-Wesley, 1997.
2. Sung-Mo Kang, Yusuf Leblebici, Chulwoo Kim, "CMOS Digital Integrated Circuits: Analysis & Design", 4th Edition, McGraw Hill Education, 2013.
3. Wayne Wolf, "Modern VLSI Design: System On Chip", Pearson Education, 2007.
4. R. Jacob Baker, Harry W. Li, David E. Boyce, "CMOS Circuit Design, Layout and Simulation", Prentice Hall of India, 2005.

SKR Engineering College

INDEX

EC8095-VLSI DESIGN

S.NO TOPICS
UNIT I MOS TRANSISTOR PRINCIPLE
1.1 MOS transistors
1.2 CMOS logic Inverter
1.3 Pass Transistor
1.4 Transmission gate
1.5 Layout Design Rules
1.6 Gate Layouts
1.7 Stick Diagrams
1.8 Long-Channel I-V Characteristics
1.9 C-V Characteristics
Non-ideal I-V Effects
DC Transfer characteristics
RC Delay Model
Elmore Delay
Linear Delay Model
Logical effort, Parasitic Delay
Delay in Logic Gate
Scaling

2 Marks Questions & Answers


16 Marks Questions
UNIT II COMBINATIONAL LOGIC CIRCUITS
2.1 Circuit Families: Static CMOS
2.2 Ratioed Circuits
2.3 Cascode Voltage Switch Logic
2.4 Dynamic Circuits
2.5 Pass Transistor Logic
2.6 Transmission Gates
2.7 Domino, Dual Rail Domino
CPL, DCVSPG, DPL
Circuit Pitfalls
Power: Dynamic Power, Static Power
Low Power Architecture

2 Marks Questions & Answers


16 Marks Questions
UNIT III SEQUENTIAL LOGIC CIRCUITS
3.1 Static latches and Registers
3.2 Dynamic latches and Registers
3.3 Pulse Registers
3.4 Sense Amplifier Based Register


3.5 Pipelining
3.6 Schmitt Trigger
3.7 Monostable Sequential Circuits
3.8 Astable Sequential Circuits
3.9 Timing Issues: Timing Classification of Digital Systems
3.10 Synchronous Design
2 Marks Questions & Answers
16 Marks Questions

UNIT IV DESIGNING ARITHMETIC BUILDING BLOCKS


4.1 Arithmetic Building Blocks: Data Paths
4.2 Adders
4.3 Multipliers
4.4 Shifters
4.5 ALUs
4.6 power and speed tradeoffs
4.7 Case Study: Design as a tradeoff
Designing Memory and Array structures: Memory Architectures and
Building Blocks
Memory Core
Memory Peripheral Circuitry
2 Marks Questions & Answers
16 Marks Questions
UNIT V IMPLEMENTATION STRATEGIES
5.1 FPGA Building Block Architectures
5.2 FPGA Interconnect Routing Procedures
5.3 Design for Testability: Ad Hoc Testing
5.4 Scan Design
5.5 BIST, IDDQ Testing
5.6 Design for Manufacturability
5.7 Boundary Scan
2 Marks Questions & Answers
16 Marks Questions


UNIT I
MOS TRANSISTOR PRINCIPLE

REFERRED BOOK:
TEXT BOOKS:
1. Neil H. E. Weste, David Money Harris, "CMOS VLSI Design: A Circuits and Systems Perspective", 4th Edition, Pearson, 2017. (UNIT I, II, V)
2. Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic, "Digital Integrated Circuits: A Design Perspective", Second Edition, Pearson, 2016. (UNIT III, IV)

STAFF IN-CHARGE HOD


1.1 MOS transistors
A Metal-Oxide-Semiconductor (MOS) structure is created by superimposing several layers of conducting and insulating materials to form a sandwich-like structure. These structures are manufactured using a series of steps involving oxidation of the silicon, selective addition of dopants, and deposition and etching of metal wires and contacts. Transistors are built on single crystals of silicon. CMOS technology provides two types of transistors (also called devices): an n-type transistor (nMOS) and a p-type transistor (pMOS).
Transistor operation is controlled by electric fields, so the devices are also called Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs) or simply FETs. Cross-sections and symbols of these transistors are shown in Fig. 1.1. The n+ and p+ regions indicate heavily doped n- or p-type silicon.

Fig. 1.1 nMOS transistor (a) and pMOS transistor (b)
Each transistor consists of a conducting gate, an insulating layer of silicon dioxide (SiO2), and the substrate, also called the body or bulk. Gates of early transistors were built from metal, so the stack was called metal-oxide-semiconductor, or MOS. Since the 1970s, the gate has been formed from polycrystalline silicon (polysilicon).

1.1.1 nMOS transistor
An nMOS transistor is built with a p-type body and has regions of n-type semiconductor adjacent to the gate, called the source and drain. They are physically equivalent and interchangeable. The body is typically grounded.
Operation
The gate is a control input: it affects the flow of electrical current between the source and drain. In an nMOS transistor, the body is generally grounded, so the p–n junctions of the source and drain to the body are reverse-biased. If the gate is also grounded, no current flows through the reverse-biased junctions, and the transistor is said to be OFF. If the gate voltage is raised, it creates an electric field that starts to attract free electrons to the underside of the Si–SiO2 interface. If the voltage is raised enough, the electrons outnumber the holes and a thin region under the gate, called the channel, is inverted to act as an n-type semiconductor. A conducting path of electron carriers is then formed from source to drain and current can flow: the transistor is ON.

As the gate voltage increases, the potential at the silicon surface eventually reaches a critical value at which the semiconductor surface inverts to n-type material. Further increases in the gate voltage produce no further change in the depletion-layer width, but result in additional electrons in the thin inversion layer directly under the oxide. These electrons are drawn into the inversion layer from the heavily doped n+ source region. Hence, a continuous n-type channel is formed between the source and drain regions, and its conductivity is modulated by the gate-source voltage.


Fig. 1.2 Cross-section of an nMOS transistor


In the presence of an inversion layer, the charge stored in the depletion region is fixed and equals

$$Q_{B0} = \sqrt{2 q N_A \varepsilon_{si} \left| -2\phi_F \right|}$$

where q is the electron charge, N_A is the acceptor doping of the substrate, ε_si is the permittivity of silicon and φ_F is the Fermi potential of the substrate.

This picture changes somewhat when a substrate bias voltage V_SB is applied (V_SB is normally positive for n-channel devices). The bias increases the surface potential required for strong inversion to |-2φ_F + V_SB|, and the charge stored in the depletion region becomes

$$Q_B = \sqrt{2 q N_A \varepsilon_{si} \left| -2\phi_F + V_{SB} \right|}$$

The value of V_GS at which strong inversion occurs is called the threshold voltage V_T. V_T is a function of several components, most of them material constants, such as the work-function difference between the gate and substrate materials, the oxide thickness, the Fermi voltage, the charge of impurities trapped at the surface between the channel and the gate oxide, and the dose of ions implanted for threshold adjustment. The threshold voltage under different body-biasing conditions can then be determined as follows:

$$V_T = V_{T0} + \gamma \left( \sqrt{\left| -2\phi_F + V_{SB} \right|} - \sqrt{\left| -2\phi_F \right|} \right), \qquad \gamma = \frac{\sqrt{2 q \varepsilon_{si} N_A}}{C_{ox}}$$

where V_T0 is the threshold voltage at V_SB = 0. The parameter γ is called the body-effect coefficient and expresses the impact of changes in V_SB. The threshold voltage is positive for a typical NMOS device and negative for a normal PMOS transistor.
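As a numerical illustration of the expressions above, the Python sketch below evaluates |φ_F|, Q_B0 and γ for one doping level. It is a minimal sketch under stated assumptions: uniform substrate doping, T ≈ 300 K (thermal voltage ≈ 25.9 mV), a silicon relative permittivity of 11.7, φ_F handled as a positive magnitude (so |-2φ_F| = 2|φ_F|), and SI units throughout. The helper names and the example doping and oxide thickness are hypothetical, chosen only for illustration.

```python
import math

Q = 1.602e-19           # electron charge, C
EPS0 = 8.854e-12        # vacuum permittivity, F/m
EPS_SI = 11.7 * EPS0    # silicon permittivity (relative permittivity 11.7 assumed)
EPS_OX = 3.9 * EPS0     # SiO2 permittivity
V_TH = 0.0259           # thermal voltage kT/q near 300 K, V
N_I = 1.45e16           # intrinsic carrier concentration, m^-3 (= 1.45e10 cm^-3)

def fermi_potential(n_a):
    """Magnitude of phi_F for a p-type body with acceptor density n_a (m^-3)."""
    return V_TH * math.log(n_a / N_I)

def depletion_charge(n_a, phi_f, v_sb=0.0):
    """Q_B = sqrt(2 q N_A eps_si (2|phi_F| + V_SB)) in C/m^2; v_sb = 0 gives Q_B0."""
    return math.sqrt(2 * Q * n_a * EPS_SI * (2 * phi_f + v_sb))

def body_effect_coefficient(n_a, t_ox):
    """gamma = sqrt(2 q eps_si N_A) / C_ox in V^0.5, with C_ox = eps_ox / t_ox."""
    return math.sqrt(2 * Q * EPS_SI * n_a) / (EPS_OX / t_ox)

# Hypothetical example: N_A = 1e17 cm^-3 (1e23 m^-3), t_ox = 10 nm
n_a, t_ox = 1e23, 10e-9
phi_f = fermi_potential(n_a)
print(f"|phi_F| = {phi_f:.3f} V")
print(f"Q_B0    = {depletion_charge(n_a, phi_f):.3e} C/m^2")
print(f"gamma   = {body_effect_coefficient(n_a, t_ox):.3f} V^0.5")
```

Heavier substrate doping raises both Q_B and γ, which is why the threshold of a heavily doped device is more sensitive to V_SB.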

1.1.2 pMOS transistor


A pMOS transistor is just the opposite, consisting of p-type source and drain regions with an n-type body. For a pMOS transistor, the body is held at a positive voltage. When the gate is also at a positive voltage, the source and drain junctions are reverse-biased and no current flows, so the transistor is OFF. When the gate voltage is lowered, positive charges are attracted to the underside of the Si–SiO2 interface. A sufficiently low gate voltage inverts the channel and a conducting path of positive carriers (holes) is formed from source to drain, so the transistor is ON. Notice that the symbol for the pMOS transistor has a bubble on the gate, indicating that its behavior is the opposite of the nMOS. The positive voltage is usually called VDD and represents a logic 1 value in digital circuits. The low voltage is called GROUND (GND) or VSS and represents a logic 0. It is normally 0 volts.

1.2 The Inverter


Fig. 1.3 shows the schematic and symbol for a CMOS inverter, or NOT gate, using one nMOS transistor and one pMOS transistor. The bar at the top indicates VDD and the triangle at the bottom indicates GND. When the input A is 0, the nMOS transistor is OFF and the pMOS transistor is ON. Thus, the output Y is pulled up to 1 because it is connected to VDD but not to GND. Conversely, when A is 1, the nMOS is ON, the pMOS is OFF, and Y is pulled down to 0. This is summarized in the truth table below.

A   Y
0   1
1   0

Fig.1.3 CMOS inverter


1.2.1 The NAND Gate
Fig. 1.4 shows a 2-input CMOS NAND gate. It consists of two series nMOS transistors between Y and GND and two parallel pMOS transistors between Y and VDD. If either input A or B is 0, at least one of the nMOS transistors will be OFF, breaking the path from Y to GND, while at least one of the pMOS transistors will be ON, creating a path from Y to VDD. Hence, the output Y will be 1. If both inputs are 1, both nMOS transistors will be ON and both pMOS transistors will be OFF. Hence, the output will be 0. The truth table is given below and the symbol is shown in Fig. 1.4. Note that by DeMorgan's Law, the inversion bubble may be placed on either side of the gate.

A   B   Y
0   0   1
0   1   1
1   0   1
1   1   0

Fig. 1.4 Two-input NAND gate schematic and symbol

k-input NAND gates are constructed using k series nMOS transistors and k parallel pMOS transistors. For example, a 3-input NAND gate uses three series nMOS transistors and three parallel pMOS transistors. When any of the inputs is 0, the output is pulled high through the parallel pMOS transistors. When all of the inputs are 1, the output is pulled low through the series nMOS transistors.
1.2.2 CMOS Logic Gates
The inverter and NAND gates are examples of static CMOS logic gates, also called complementary CMOS gates. In general, a static CMOS gate has an nMOS pull-down network to connect the output to 0 (GND) and a pMOS pull-up network to connect the output to 1 (VDD), as shown in Fig. 1.5.
The networks are arranged such that one is ON and the other OFF for any input pattern.
The pull-up and pull-down networks in the inverter each consist of a single transistor.
The NAND gate uses a series pull-down network and a parallel pull-up network.
More elaborate networks are used for more complex gates.


Two or more transistors in series are ON only if all of the series transistors are ON.
Two or more transistors in parallel are ON if any of the parallel transistors is ON.
This is illustrated in Fig. 1.6 for nMOS and pMOS transistor pairs. By using combinations of these constructions, CMOS combinational gates can be constructed. Static CMOS gates built in this way are the most widely used. In general, when a pull-up network is joined to a pull-down network to form a logic gate, as shown in Fig. 1.5, both networks attempt to exert a logic level at the output.
The possible levels at the output are shown in the table below. From this table it can be seen that the output of a CMOS logic gate can be in four states.

                 Pull-up OFF   Pull-up ON
Pull-down OFF    Z             1
Pull-down ON     0             X (crowbarred)

The 1 and 0 levels have been encountered with the inverter and NAND gates, where either the pull-up or pull-down is OFF and the other structure is ON.
When both pull-up and pull-down are OFF, the high-impedance or floating Z output state results. This is of importance in multiplexers, memory elements, and tristate bus drivers.
The crowbarred (or contention) X level exists when both pull-up and pull-down are simultaneously turned ON. Contention between the two networks results in an indeterminate output level and dissipates static power. It is usually an unwanted condition.
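The series/parallel rules and the four output states can be checked with a small switch-level model. This Python sketch is illustrative only: it treats each transistor as an ideal switch (an nMOS is ON when its gate is 1, a pMOS when its gate is 0), and all function names are hypothetical. The NOR gate uses the dual arrangement of the NAND (parallel nMOS, series pMOS), as in Fig. 1.7.

```python
from itertools import product

def series_on(*switches):
    """A series stack conducts only if every transistor in it is ON."""
    return all(switches)

def parallel_on(*switches):
    """A parallel group conducts if any transistor in it is ON."""
    return any(switches)

def resolve(pull_up_on, pull_down_on):
    """Output level produced by the pull-up and pull-down networks."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbarred: both networks fight (contention)
    if pull_up_on:
        return "1"
    if pull_down_on:
        return "0"
    return "Z"      # both OFF: high impedance (floating)

def nand2(a, b):
    # nMOS is ON when its gate is 1; pMOS is ON when its gate is 0.
    pull_down = series_on(a == 1, b == 1)    # two series nMOS to GND
    pull_up = parallel_on(a == 0, b == 0)    # two parallel pMOS to VDD
    return resolve(pull_up, pull_down)

def nor2(a, b):
    pull_down = parallel_on(a == 1, b == 1)  # two parallel nMOS to GND
    pull_up = series_on(a == 0, b == 0)      # two series pMOS to VDD
    return resolve(pull_up, pull_down)

for a, b in product((0, 1), repeat=2):
    print(f"A={a} B={b}  NAND={nand2(a, b)}  NOR={nor2(a, b)}")
```

Because the two networks of a static CMOS gate are complementary, nand2 and nor2 never return Z or X; those states appear only when the networks are not duals, for example resolve(False, False) for a disabled tristate driver.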

Fig. 1.5 General logic gate using pull-up and pull-down networks


Fig. 1.6 Connection and behaviour of series and parallel transistors

Fig. 1.7 Two-input NOR gate schematic and symbol

1.3 Ideal I-V Characteristics


This section derives the current-voltage relationships for various bias conditions in a MOS transistor. MOS transistors have three regions of operation:
• Cutoff or subthreshold region
• Linear or non-saturation region
• Saturation region
We shall derive the current-voltage relationship separately for the cutoff, linear and saturation regions of operation. The fundamental operation of a MOS transistor arises from the gate voltage Vgs creating a channel between the source and the drain, attracting majority carriers from the source and causing them to move towards the drain under the influence of the electric field due to the voltage Vds. The corresponding current Ids depends on both Vgs and Vds.
(i) Cutoff
In the cutoff region (Vgs < Vt), there is no channel and almost zero current flows from drain to source. In the other regions, the gate attracts carriers (electrons) to form a channel. The electrons drift from source to drain at a rate proportional to the electric field between these regions. Thus, currents can be computed if we know the amount of charge in the channel and the rate at which it moves.
The conventional current flowing from the drain to the source is given by

$$I_{ds} = -I_{sd} = \frac{\text{charge induced in channel}}{\text{electron transit time}} = \frac{Q_{channel}}{\zeta_n}$$
We know that the charge on each plate of a capacitor is Q = CV; thus the charge in the channel, Qchannel, is

$$Q_{channel} = C_g \left( V_{gc} - V_t \right)$$

where Cg is the capacitance of the gate to the channel, and Vgc - Vt is the amount of voltage attracting charge to the channel beyond the minimum required to invert it from p to n.
We can model the gate as a parallel-plate capacitor with capacitance proportional to area over thickness. If the gate has length L and width W and the oxide thickness is tox, as shown in Fig. 1.8, the capacitance is

$$C_g = \frac{\varepsilon_{ox} W L}{t_{ox}} = C_{ox} W L$$

where the permittivity of the oxide is εox = 3.9 ε0 = 3.9 × 8.85 × 10^-12 = 3.45 × 10^-11 F/m. Here εox is the permittivity (dielectric constant) of silicon dioxide and ε0 is the permittivity of free space, 8.85 × 10^-12 F/m = 8.85 × 10^-14 F/cm. The εox/tox term is often called Cox, the capacitance per unit area of the gate oxide.
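To put numbers on Cox and Cg, the Python sketch below applies the parallel-plate expressions above. It is a minimal illustration, assuming the εox value used in the text and hypothetical dimensions (tox = 2 nm, W = 1 µm, L = 0.1 µm); the function names are illustrative only.

```python
EPS0 = 8.85e-12        # permittivity of free space, F/m (value used in the text)
EPS_OX = 3.9 * EPS0    # permittivity of SiO2, about 3.45e-11 F/m

def c_ox(t_ox):
    """Gate-oxide capacitance per unit area, F/m^2, for oxide thickness t_ox in metres."""
    return EPS_OX / t_ox

def gate_cap(w, l, t_ox):
    """Parallel-plate gate capacitance C_g = C_ox * W * L in farads (W, L in metres)."""
    return c_ox(t_ox) * w * l

# Hypothetical dimensions: t_ox = 2 nm, W = 1 um, L = 0.1 um
print(f"C_ox = {c_ox(2e-9) * 1e3:.2f} fF/um^2")   # 1 F/m^2 = 1e3 fF/um^2
print(f"C_g  = {gate_cap(1e-6, 0.1e-6, 2e-9) * 1e15:.3f} fF")
```

Halving tox doubles Cox, which is why thinner oxides give more channel charge (and current) for the same gate overdrive.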

Fig. 1.8 nMOS Transistor Dimensions


The time required for an electron or other charge carrier to travel between two electrodes in a transistor is the transit time:

$$\zeta_n = \frac{\text{length of the channel}}{\text{electron velocity}} = \frac{L}{\nu}$$

The velocity ν is given by the electron mobility μ and the electric field E, ν = μE. The electric field is the voltage difference between drain and source, Vds, divided by the channel length, E = Vds/L. So the velocity is ν = μVds/L, and the transit time is

$$\zeta_n = \frac{L}{\nu} = \frac{L}{\mu V_{ds}/L} = \frac{L^2}{\mu V_{ds}}$$
(ii) Linear
This region of operation implies the existence of an uninterrupted channel between the source and the drain. The voltage between the gate and the channel varies linearly with the distance x from the source due to the IR drop in the channel.

$$I_{ds} = \frac{Q_{channel}}{\zeta_n} \qquad (1)$$

Now the gate voltage is referenced to the channel, which is not grounded. The gate-to-channel voltage is Vgs at the source end and Vgd at the drain end, so its average is Vgc = (Vgs + Vgd)/2. Since Vds = Vgs - Vgd, we have Vgd = Vgs - Vds. So the mean difference between the gate and channel potentials, Vgc, is

$$V_{gc} = \frac{V_{gs} + (V_{gs} - V_{ds})}{2} = V_{gs} - \frac{V_{ds}}{2}$$

Fig. 1.9 Average gate to channel voltage

Substituting this Vgc, Qchannel = Cox W L (Vgc - Vt) and ζn = L²/(μVds) into equation (1) gives the drain-to-source current

$$I_{ds} = \mu C_{ox} \frac{W}{L} \left( V_{gs} - V_t - \frac{V_{ds}}{2} \right) V_{ds} = \beta \left( V_{gs} - V_t - \frac{V_{ds}}{2} \right) V_{ds}, \qquad \beta = \mu C_{ox} \frac{W}{L}$$

(iii) Saturation region
When Vgs > Vt and Vds > Vgs - Vt, the channel has been created and current flows between the drain and source. However, because Vgd = Vgs - Vds is now below Vt, the channel is no longer inverted near the drain, and that end of the channel is pinched off. The onset of this region is known as pinch-off. Beyond pinch-off, the drain current is (ideally) independent of the drain voltage and is controlled only by the gate voltage. Its value is obtained by setting Vds = Vgs - Vt in the linear-region current:

$$I_{ds} = \frac{\beta}{2} \left( V_{gs} - V_t \right)^2, \qquad \beta = \mu C_{ox} \frac{W}{L}$$
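The three regions derived above can be combined into one piecewise function. This is a minimal Python sketch of the ideal long-channel model for an nMOS transistor only, assuming Vds ≥ 0 and ignoring the non-ideal effects of Section 1.4. β = μCox W/L is passed in directly, and the example values (Vt = 0.4 V, β = 200 µA/V², Vgs = 1.0 V) are illustrative assumptions.

```python
def ids_ideal(vgs, vds, vt, beta):
    """Ideal long-channel nMOS drain current in amperes.

    beta = mu * Cox * W / L in A/V^2; voltages in volts; Vds >= 0 assumed.
    """
    vgt = vgs - vt                            # gate overdrive Vgs - Vt
    if vgt <= 0:
        return 0.0                            # cutoff: no channel (leakage ignored)
    if vds < vgt:
        return beta * (vgt - vds / 2) * vds   # linear region
    return beta / 2 * vgt ** 2                # saturation: independent of Vds

# Illustrative sweep with assumed Vt = 0.4 V, beta = 200 uA/V^2, Vgs = 1.0 V
for vds in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"Vds = {vds:.1f} V  Ids = {ids_ideal(1.0, vds, 0.4, 2e-4) * 1e6:.1f} uA")
```

The linear and saturation expressions meet at Vds = Vgs - Vt, so the modelled current is continuous across the boundary.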

Fig. 1.10 I-V characteristics of ideal (a) nMOS and (b) pMOS transistors

1.4 Non-Ideal I-V Effects


1.4.1 Velocity Saturation and Mobility Degradation
Carrier drift velocity (ν = μE), and hence current, increases linearly with the lateral electric field Elat = Vds/L between source and drain. At high field strength, drift velocity rolls off due to carrier scattering and eventually saturates at vsat, as shown in Fig. 1.11. The carrier velocity now becomes

$$\nu = \frac{\mu E_{lat}}{1 + \dfrac{E_{lat}}{E_{sat}}}$$

where Esat is determined empirically and vsat = μEsat.
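The roll-off described above can be seen by tabulating the velocity-saturation expression against the linear ν = μE. This Python sketch is illustrative only: the mobility (400 cm²/V·s) and vsat (10^5 m/s) are assumed round numbers, not values from the text, and the function name is hypothetical.

```python
def drift_velocity(e_lat, mu, e_sat):
    """Carrier drift velocity (m/s) with velocity saturation: mu*E / (1 + E/E_sat)."""
    return mu * e_lat / (1 + e_lat / e_sat)

# Assumed illustrative values: mu = 0.04 m^2/(V s) (400 cm^2/V s), v_sat = 1e5 m/s
MU = 0.04
V_SAT = 1e5
E_SAT = V_SAT / MU      # from v_sat = mu * E_sat, about 2.5e6 V/m

for e in (1e4, 1e5, 1e6, E_SAT, 1e7, 1e8):
    v = drift_velocity(e, MU, E_SAT)
    print(f"E = {e:.1e} V/m  v = {v:.3e} m/s  (mu*E = {MU * e:.3e} m/s)")
```

At low fields the two columns agree; at E = Esat the velocity is exactly vsat/2, and at very high fields it approaches vsat.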

Fig. 1.11 Carrier velocity vs. electric field


1.4.2 Mobility Degradation
Mobility is important because the current in a MOSFET depends on the mobility of the charge carriers (holes and electrons). Mobility degradation can be described by two effects:
(i) Lateral field effect: In short-channel devices, as the lateral field is increased, the channel mobility becomes field-dependent and eventually velocity saturation occurs (as described in the previous section). This results in current saturation.
(ii) Vertical field effect: As channel lengths shrink, the vertical electric field also increases, which causes scattering of carriers near the surface. Hence the surface mobility is reduced.

1.4.3 Channel Length Modulation


Ideally, Ids is independent of Vds for a transistor in saturation, making the transistor a perfect current source. However, the reverse-biased p–n junction between the drain and body forms a depletion region with a width Ld that increases with Vdb. The depletion region effectively shortens the channel length to

$$L_{eff} = L - L_d$$

Fig. 1.12 Depletion region shortens effective channel length

Assume the source voltage is close to the body voltage, so Vdb ≈ Vds. Hence, increasing Vds decreases the effective channel length. A shorter channel results in higher current; thus Ids increases with Vds in saturation, as shown in Fig. 1.13.

Fig. 1.13 I-V characteristics of nMOS transistor with channel length modulation

This can be modeled by multiplying the saturation-region Ids by (1 + λVds):

$$I_{ds} = \frac{\beta}{2} \left( V_{gs} - V_t \right)^2 \left( 1 + \lambda V_{ds} \right), \qquad \beta = \mu C_{ox} \frac{W}{L}$$

The parameter λ is an empirical channel length modulation factor.
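The channel-length-modulation model above can be sketched in Python as follows. The values (β = 200 µA/V², λ = 0.1 /V) are assumptions for illustration, the function names are hypothetical, and the output-resistance helper is a derived quantity (the inverse of dIds/dVds from the same formula) rather than something stated in the text.

```python
def ids_sat_clm(vgs, vds, vt, beta, lam):
    """(beta/2) * (Vgs - Vt)^2 * (1 + lambda * Vds), valid only in saturation."""
    if vgs <= vt or vds < vgs - vt:
        raise ValueError("transistor is not in saturation")
    return beta / 2 * (vgs - vt) ** 2 * (1 + lam * vds)

def output_resistance(vgs, vt, beta, lam):
    """1 / (dIds/dVds) in saturation = 1 / (lambda * (beta/2) * (Vgs - Vt)^2), in ohms."""
    return 1 / (lam * beta / 2 * (vgs - vt) ** 2)

# Assumed illustrative values: Vgs = 1.0 V, Vt = 0.4 V, beta = 200 uA/V^2, lambda = 0.1 /V
for vds in (0.7, 0.8, 1.0):
    print(f"Vds = {vds:.1f} V  Ids = {ids_sat_clm(1.0, vds, 0.4, 2e-4, 0.1) * 1e6:.2f} uA")
print(f"r_o = {output_resistance(1.0, 0.4, 2e-4, 0.1) / 1e3:.0f} kOhm")
```

With λ = 0 the function reduces to the ideal saturation current, i.e. an infinite output resistance.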

1.4.4 Body Effect
In a transistor, the body is an implicit fourth terminal. The potential difference between the source and body, Vsb, affects the threshold voltage. The threshold voltage can be modeled as

$$V_t = V_{t0} + \gamma \left( \sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s} \right)$$

where Vt0 is the threshold voltage when the source is at the body potential, φs is the surface potential at threshold, and γ is the body effect coefficient.
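The body-effect expression can be evaluated directly to see how much the threshold rises with source-body bias. This Python sketch assumes illustrative parameters (Vt0 = 0.3 V, γ = 0.4 V^0.5, φs = 0.8 V) that are not from the text; the function name is hypothetical.

```python
import math

def vt_body(vt0, gamma, phi_s, vsb):
    """Threshold with body effect: Vt0 + gamma * (sqrt(phi_s + Vsb) - sqrt(phi_s))."""
    return vt0 + gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))

# Assumed illustrative parameters: Vt0 = 0.3 V, gamma = 0.4 V^0.5, phi_s = 0.8 V
for vsb in (0.0, 0.5, 1.0):
    print(f"Vsb = {vsb:.1f} V -> Vt = {vt_body(0.3, 0.4, 0.8, vsb):.3f} V")
```

With these values the threshold rises from 0.300 V at Vsb = 0 to about 0.479 V at Vsb = 1 V.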

1.4.5 Subthreshold Conduction
In the ideal transistor I-V model, current flows from S to D only when Vgs > Vt. In real transistors, current does not abruptly cut off below threshold, but rather drops off exponentially, as given below. This conduction is also known as leakage and often results in undesired current when a transistor is nominally OFF.

$$I_{ds} = I_{ds0} \, e^{\frac{V_{gs} - V_t}{n v_T}} \left( 1 - e^{-\frac{V_{ds}}{v_T}} \right)$$

where Ids0 is the current at threshold and depends on the process and device geometry; n is a process-dependent term affected by the depletion region and is typically in the range of 1.4-1.5 for CMOS processes; and vT is the thermal voltage (kT/q).
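The exponential dependence above implies a fixed gate-voltage step per decade of current, n·vT·ln 10, often called the subthreshold swing (a derived quantity, not stated in the text). This Python sketch assumes vT = 26 mV and illustrative device values (Vt = 0.4 V, Ids0 = 1 µA, n = 1.5); the function names are hypothetical.

```python
import math

V_T = 0.026  # thermal voltage kT/q near room temperature, V

def ids_subthreshold(vgs, vds, vt, ids0, n):
    """Subthreshold drain current: Ids0 * exp((Vgs - Vt)/(n vT)) * (1 - exp(-Vds/vT))."""
    return ids0 * math.exp((vgs - vt) / (n * V_T)) * (1 - math.exp(-vds / V_T))

def subthreshold_swing(n):
    """Gate-voltage change (V) needed for a 10x change in subthreshold current."""
    return n * V_T * math.log(10)

# Assumed illustrative values: Vt = 0.4 V, Ids0 = 1 uA, n = 1.5
print(f"swing = {subthreshold_swing(1.5) * 1e3:.1f} mV/decade")
print(f"leakage at Vgs = 0, Vds = 1 V: {ids_subthreshold(0.0, 1.0, 0.4, 1e-6, 1.5):.2e} A")
```

The (1 - exp(-Vds/vT)) factor is essentially 1 once Vds exceeds a few vT, so OFF-state leakage is set mainly by how far Vgs sits below Vt.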


Fig.1.14 I-V characteristics of nMOS transistor with subthreshold conduction


1.4.6 Junction Leakage
The p–n junctions between diffusion and the substrate or well form diodes, as shown in Fig. 1.15. The well-to-substrate junction is another diode. The substrate and well are tied to GND or VDD to ensure these diodes remain reverse-biased. However, reverse-biased diodes still conduct a small amount of current ID:

$$I_D = I_S \left( e^{\frac{V_D}{v_T}} - 1 \right)$$

where IS depends on doping levels and on the area and perimeter of the diffusion region, VD is the diode voltage, and vT is the thermal voltage. Under reverse bias (VD negative by many vT), ID ≈ -IS.
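A quick evaluation of the diode equation shows why the reverse current saturates at -IS. This Python sketch assumes vT = 26 mV and an arbitrary IS = 1 fA chosen only for illustration; the function name is hypothetical. math.expm1 is used because it computes e^x - 1 accurately when x is close to zero.

```python
import math

V_T = 0.026  # thermal voltage near room temperature, V

def diode_current(v_d, i_s, v_t=V_T):
    """Diode equation I_D = I_S * (exp(V_D / vT) - 1); V_D < 0 means reverse bias."""
    return i_s * math.expm1(v_d / v_t)

# Assumed illustrative saturation current I_S = 1 fA
for v in (-1.0, -0.5, -0.1, 0.0, 0.3, 0.6):
    print(f"V_D = {v:+.1f} V  I_D = {diode_current(v, 1e-15):+.3e} A")
```

Once the reverse bias exceeds a few vT, the current is pinned at about -IS regardless of voltage, so junction leakage is set by IS (and hence by diffusion area, perimeter and temperature).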

Fig. 1.15 Reverse-biased diodes in nMOS transistor


1.4.7 Tunnelling
There is a finite probability that carriers will tunnel through the gate oxide. The probability of tunneling drops off exponentially with oxide thickness. This results in gate leakage current flowing into the gate. For oxides thinner than about 15-20 Å, tunneling current becomes a factor and may become comparable to subthreshold leakage in advanced processes.
Tunneling current is an order of magnitude higher for nMOS than for pMOS transistors with SiO2 gate dielectrics.


Fig 1.16 plots gate leakage current density JG against voltage for various oxide
thicknesses.
1.4.8 Temperature Dependence
Transistor characteristics are influenced by temperature. Carrier mobility decreases with temperature; an approximate relation is

$$\mu(T) = \mu(T_r) \left( \frac{T}{T_r} \right)^{-k_\mu}$$

where T is the absolute temperature, Tr is room temperature, and kμ is a fitting parameter generally in the range of 1.2-2.0.
The magnitude of the threshold voltage decreases nearly linearly with temperature and may be approximated by

$$\left| V_t(T) \right| = \left| V_t(T_r) \right| - k_{vt} \left( T - T_r \right)$$

where kvt is typically in the range of 0.5 to 3.0 mV/K.
Junction leakage also increases with temperature because Is is strongly temperature dependent. The combined effect, shown in Fig. 1.17, is that ON current decreases and OFF current increases with temperature. Because the ON current Idsat falls as temperature rises (a negative temperature coefficient), circuit performance is generally worst at high temperature.
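The two temperature models above can be evaluated side by side. This Python sketch assumes Tr = 300 K, kμ = 1.5 and kvt = 1 mV/K (values inside the ranges quoted in the text), plus illustrative room-temperature values μ = 400 cm²/V·s and |Vt| = 0.4 V; the function names are hypothetical.

```python
def mobility_at(t, mu_r, t_r=300.0, k_mu=1.5):
    """mu(T) = mu(Tr) * (T / Tr) ** (-k_mu); k_mu is a fitting parameter (1.2-2.0)."""
    return mu_r * (t / t_r) ** (-k_mu)

def vt_magnitude_at(t, vt_r, t_r=300.0, k_vt=1e-3):
    """|Vt|(T) = |Vt|(Tr) - k_vt * (T - Tr), with k_vt in V/K."""
    return vt_r - k_vt * (t - t_r)

# Assumed illustrative values: mu(Tr) = 400 cm^2/(V s), |Vt|(Tr) = 0.4 V, Tr = 300 K
for t in (300.0, 350.0, 400.0):
    mu = mobility_at(t, 400.0)
    vt = vt_magnitude_at(t, 0.4)
    print(f"T = {t:.0f} K  mu = {mu:.0f} cm^2/Vs  |Vt| = {vt:.3f} V")
```

The falling mobility reduces ON current, while the falling |Vt| raises subthreshold (OFF) current, matching the combined trend described above.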

Fig. 1.17 Input voltage-output voltage characteristics

1.4.9 Latch-up for CMOS
• Latch-up may be induced by glitches on the supply rails or by incident radiation.
• The mechanism involved is illustrated in Fig. 1.18, which shows the key parasitic components associated with a p-well structure in which an inverter circuit (for example) has been formed.

Fig. 1.18 (a) Latch-up for CMOS

1.5 C-V CHARACTERISTICS


Each terminal of an MOS transistor has capacitance to the other terminals. In
general, these capacitances are nonlinear and voltage dependent (C-V); however, they can
be approximated as simple capacitors when their behavior is averaged across the switching
voltages of a logic gate.

(A) SIMPLE MOS CAPACITANCE MODELS

The gate of an MOS transistor is a good capacitor. Indeed, its capacitance is necessary to attract charge to invert the channel, so high gate capacitance is required to obtain high Ids. The gate capacitor can be viewed as a parallel-plate capacitor with the gate on top and the channel on the bottom, with the thin oxide dielectric between them.

Therefore, the capacitance is

$$C_g = C_{ox} W L$$

The bottom plate of the capacitor is the channel, which is not one of the transistor’s
terminals. When the transistor is on, the channel extends from the source (and reaches the
drain if the transistor is unsaturated, or stops short in saturation).

Thus, we often approximate the gate capacitance as terminating at the source and call the capacitance Cgs. Most transistors used in logic are of minimum manufacturable length because this results in the greatest speed and lowest dynamic power consumption. Thus, taking this minimum L as a constant for a particular process, we can define

$$C_g = C_{permicron} \times W, \qquad C_{permicron} = C_{ox} L = \frac{\varepsilon_{ox}}{t_{ox}} L$$

Notice that if we develop a more advanced manufacturing process in which both the channel length and the oxide thickness are reduced by the same factor, Cpermicron remains unchanged.
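The invariance of Cpermicron under equal scaling of L and tox can be checked numerically. This Python sketch uses the εox value from the text and a hypothetical pair of processes (L = 100 nm, tox = 2 nm, and a version shrunk by a factor of 2); the function name is illustrative only.

```python
EPS_OX = 3.9 * 8.85e-12  # permittivity of SiO2, F/m (value used in the text)

def c_permicron(l, t_ox):
    """Gate capacitance per micron of width, Cox * L, in fF/um (l and t_ox in metres)."""
    c_per_metre_width = EPS_OX / t_ox * l      # F per metre of gate width
    return c_per_metre_width * 1e15 * 1e-6     # F/m -> fF/um

# Hypothetical process pair: the second shrinks L and t_ox by the same factor of 2
old = c_permicron(l=100e-9, t_ox=2e-9)
new = c_permicron(l=50e-9, t_ox=1e-9)
print(f"old process: {old:.3f} fF/um, scaled process: {new:.3f} fF/um")
```

Both processes give about 1.73 fF per micron of gate width, so gate capacitance per unit width stays roughly constant across such a shrink.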


In addition to the gate, the source and drain also have capacitances. These capacitances are not fundamental to the operation of the devices, but they do impact circuit performance and hence are called parasitic capacitances.

The source and drain capacitances arise from the p–n junctions between the source or drain diffusion and the body, and hence are also called diffusion capacitances, Csb and Cdb. A depletion region with no free carriers forms along the junction. The depletion region acts as an insulator between the conducting p- and n-type regions, creating capacitance across the junction.

The capacitance of these junctions depends on the area and perimeter of the source and drain diffusion, the depth of the diffusion, the doping levels, and the voltage. As diffusion has both high capacitance and high resistance, it is generally made as small as possible in the layout. Three types of diffusion regions are frequently seen, illustrated by the two series transistors in Fig. 1.19.

In Fig. 1.19(a), each source and drain has its own isolated region of contacted diffusion. In Fig. 1.19(b), the drain of the bottom transistor and the source of the top transistor form a shared contacted diffusion region. In Fig. 1.19(c), the source and drain are merged into an uncontacted region. The average capacitance of each of these types of regions can be calculated or measured from simulation as a transistor switches between VDD and GND.

Fig 1.19 Diffusion region geometries

(B) DETAILED MOS GATE CAPACITANCE MODEL

The MOS gate sits above the channel and may partially overlap the source and drain diffusion areas. Therefore, the gate capacitance has two components:
• the intrinsic capacitance Cgc (over the channel)
• the overlap capacitances Cgol (to the source and drain).

The intrinsic capacitance was approximated as a simple parallel plate with capacitance C0 = WLCox. However, the bottom plate of the capacitor depends on the mode of operation of the transistor.

Fig. 1.20 Intrinsic gate capacitance Cgc = Cgs + Cgd + Cgb as a function of (a) Vgs and (b) Vds

1. Cutoff. When the transistor is OFF (Vgs < Vt), the channel is not inverted and the charge on the gate is matched with opposite charge from the body. This is called Cgb, the gate-to-body capacitance. For negative Vgs, the transistor is in accumulation and Cgb = C0. As Vgs increases but remains below threshold, a depletion region forms at the surface. This effectively moves the bottom plate downward from the oxide, reducing the capacitance, as shown in Fig. 1.20.

2. Linear. When Vgs > Vt, the channel inverts and again serves as a good conductive bottom plate. However, the channel is connected to the source and drain, rather than the body, so Cgb drops to 0. At low values of Vds, the channel charge is roughly shared between source and drain, so Cgs = Cgd = C0/2. As Vds increases, the region near the drain becomes less inverted, so a greater fraction of the capacitance is attributed to the source and a smaller fraction to the drain, as shown in Fig. 1.20.

3. Saturation. At Vds > Vdsat, the transistor saturates and the channel pinches off. At this point, all the intrinsic capacitance is to the source, as shown in Fig. 1.20. Because of pinch-off, the capacitance in saturation reduces to Cgs = (2/3)C0 for an ideal transistor. The behavior in these three regions can be approximated as shown in Table 1.

Table 1 Approximation for intrinsic MOS gate capacitance
Parameter              Cutoff   Linear   Saturation
Cgb                    ≤ C0     0        0
Cgs                    0        C0/2     (2/3)C0
Cgd                    0        C0/2     0
Cg = Cgs + Cgd + Cgb   ≤ C0     C0       (2/3)C0

The gate overlaps the source and drain in a real device and also has fringing fields terminating on the source and drain. This leads to additional overlap capacitances, as shown in Fig. 3. These capacitances are proportional to the width of the transistor; typical values are Cgsol = Cgdol = 0.2-0.4 fF per µm of width. They should be added to the intrinsic gate capacitance to find the total.
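The region-dependent intrinsic capacitance and the width-proportional overlap terms can be combined in a short Python sketch. It follows the three-region description above, uses the accumulation value C0 as an upper bound for cutoff, and assumes illustrative values (W = 1 µm, L = 0.1 µm, Cox = 16 fF/µm², Cgsol = Cgdol = 0.3 fF/µm) that are not from the text; the function names are hypothetical.

```python
def intrinsic_gate_caps(region, c0):
    """Approximate (Cgb, Cgs, Cgd) of an ideal transistor for one operating region."""
    if region == "cutoff":
        return (c0, 0.0, 0.0)          # upper bound: Cgb = C0 in accumulation, less in depletion
    if region == "linear":
        return (0.0, c0 / 2, c0 / 2)   # low-Vds case: channel charge shared by source and drain
    if region == "saturation":
        return (0.0, 2 * c0 / 3, 0.0)  # pinched off: intrinsic capacitance all to the source
    raise ValueError(f"unknown region: {region!r}")

def total_gate_cap(region, w_um, l_um, cox_ff_per_um2, c_ol_ff_per_um):
    """Intrinsic plus overlap gate capacitance in fF (W and L in um)."""
    c0 = cox_ff_per_um2 * w_um * l_um            # C0 = W * L * Cox
    cgb, cgs, cgd = intrinsic_gate_caps(region, c0)
    c_overlap = 2 * c_ol_ff_per_um * w_um        # Cgsol + Cgdol, each proportional to W
    return cgb + cgs + cgd + c_overlap

# Assumed illustrative values: W = 1 um, L = 0.1 um, Cox = 16 fF/um^2, Cgsol = Cgdol = 0.3 fF/um
for region in ("cutoff", "linear", "saturation"):
    print(f"{region:10s} Cg = {total_gate_cap(region, 1.0, 0.1, 16.0, 0.3):.3f} fF")
```

For such a short device the overlap terms are a large fraction of the total, which is why they cannot be neglected.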


Fig.3 Overlap Capacitance

It is convenient to view the gate capacitance as a single-terminal capacitor attached to the gate (with the other side not switching). Because the source and drain actually form second terminals, the effective gate capacitance varies with the switching activity of the source and drain.

Fig. 1.21(a) shows Cgs and Cgd of a long-channel n-transistor. Fig. 1.21(b) shows Cgs and Cgd of a short-channel device (L = 0.75 µm). Note that in the short-channel device Cgd remains finite (Cgd > 0) in saturation rather than falling to zero. This is due to channel-side fringing fields between the gate and drain.

Fig. 1.21 Total gate capacitance as a function of Vds: (a) long-channel device, (b) short-channel device


(C) DETAILED MOS DIFFUSION CAPACITANCE MODEL

The p–n junction between the source diffusion and the body contributes parasitic
capacitance across the depletion region. The capacitance depends on both the area AS and
sidewall perimeter PS of the source diffusion region. The geometry is illustrated in Figure 5 .
The area is AS = WD. The perimeter is PS = 2W + 2D. Of this perimeter, W abuts the
channel and the remaining W + 2D does not.

The total source parasitic capacitance is

$$C_{sb} = A_S\,C_{jbs} + P_S\,C_{jbssw}$$

Fig.1.22 Diffusion Region Geometry

where Cjbs (the capacitance of the junction between the body and the bottom of the
source) has units of capacitance/area and Cjbssw (the capacitance of the junction between the
body and the side walls of the source) has units of capacitance/length. Because the
depletion region thickness depends on the bias conditions, these parasitics are nonlinear.
The area junction capacitance term is

$$C_{jbs} = C_J\left(1 + \frac{V_{sb}}{\psi_0}\right)^{-M_J}$$

CJ is the junction capacitance at zero bias and is highly process-dependent. MJ is the
junction grading coefficient, typically in the range of 0.5 to 0.33 depending on the abruptness
of the diffusion junction. ψ0 is the built-in potential, which depends on the doping levels:

$$\psi_0 = v_T \ln\frac{N_A N_D}{n_i^2}$$

vT is the thermal voltage from thermodynamics, not to be confused with the threshold voltage
Vt. It has a value equal to kT/q (26 mV at room temperature), where k = 1.380 × 10⁻²³ J/K is
Boltzmann's constant, T is absolute temperature (300 K at room temperature), and q = 1.602
× 10⁻¹⁹ C is the charge of an electron. NA and ND are the doping levels of the body and
source diffusion region. ni is the intrinsic carrier concentration in undoped silicon and has a
value of 1.45 × 10¹⁰ cm⁻³ at 300 K.

The sidewall capacitance term is of a similar form but uses different coefficients:

$$C_{jbssw} = C_{JSW}\left(1 + \frac{V_{sb}}{\psi_{SW}}\right)^{-M_{JSW}}$$

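In some SPICE models, the capacitance of this sidewall abutting the gate and channel is
specified with another set of parameters:

$$C_{jbsswg} = C_{JSWG}\left(1 + \frac{V_{sb}}{\psi_{SWG}}\right)^{-M_{JSWG}}$$

In that case the W of perimeter abutting the channel uses Cjbsswg, and only the remaining W + 2D uses Cjbssw.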


The drain diffusion has a similar parasitic capacitance dependent on AD, PD, and Vdb.
Equivalent relationships hold for pMOS transistors, but doping levels differ. As the
capacitances are voltage-dependent, the most useful information to digital designers is the
value averaged across a switching transition.
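The diffusion-capacitance expressions can be evaluated numerically. The sketch below computes ψ0 = vT ln(NA·ND/ni²) from the doping levels, evaluates the bias-dependent area and sidewall terms C(V) = C(0)(1 + V/ψ)^(–M), and averages the total AS·Cjbs + PS·Cjbssw over a 0 to VDD swing, which equals the charge-based value ΔQ/ΔV. It assumes the basic two-term form (the gate-side sidewall parameters of some SPICE models are ignored) and reuses ψ0 for the sidewall term; all numeric parameter values in the example are illustrative assumptions. The same functions apply to the drain with AD, PD and Vdb.

```python
import math

K_BOLTZMANN = 1.380e-23   # J/K
Q_ELECTRON = 1.602e-19    # C
N_I_300K = 1.45e10        # cm^-3, intrinsic carrier concentration at 300 K


def built_in_potential(na_cm3, nd_cm3, temp_k=300.0):
    """psi0 = vT * ln(NA * ND / ni^2), in volts (ni taken at 300 K)."""
    v_t = K_BOLTZMANN * temp_k / Q_ELECTRON   # thermal voltage, about 26 mV
    return v_t * math.log(na_cm3 * nd_cm3 / N_I_300K ** 2)


def junction_cap(c_zero_bias, v_rev, psi, m):
    """Reverse-biased junction: C = C(0) * (1 + V / psi) ** (-m)."""
    return c_zero_bias * (1.0 + v_rev / psi) ** (-m)


def diffusion_cap(w, d, v_rev, cj, mj, cjsw, mjsw, psi0):
    """Area * Cj(V) + perimeter * Cjsw(V) with area = W*D, perimeter = 2W + 2D."""
    area = w * d
    perimeter = 2 * w + 2 * d
    return (area * junction_cap(cj, v_rev, psi0, mj)
            + perimeter * junction_cap(cjsw, v_rev, psi0, mjsw))


def average_diffusion_cap(w, d, vdd, cj, mj, cjsw, mjsw, psi0, steps=1000):
    """Mean capacitance as the reverse bias sweeps 0..VDD (midpoint rule)."""
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * vdd / steps
        total += diffusion_cap(w, d, v, cj, mj, cjsw, mjsw, psi0)
    return total / steps


if __name__ == "__main__":
    psi0 = built_in_potential(na_cm3=1e17, nd_cm3=1e20)
    # Illustrative values: W, D in um; CJ in fF/um^2; CJSW in fF/um.
    c0 = diffusion_cap(1.0, 0.5, 0.0, cj=1.0, mj=0.5, cjsw=0.1, mjsw=0.33,
                       psi0=psi0)
    cav = average_diffusion_cap(1.0, 0.5, 1.0, cj=1.0, mj=0.5, cjsw=0.1,
                                mjsw=0.33, psi0=psi0)
    print(f"psi0 = {psi0:.3f} V, C(0 V) = {c0:.3f} fF, "
          f"average over 0..1 V = {cav:.3f} fF")
```

The averaged value is smaller than the zero-bias value because the depletion region widens under reverse bias.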

In summary, an MOS transistor can be viewed as a four-terminal device with capacitances
between each terminal pair, as shown in Figure 6. The gate capacitance includes an intrinsic
component (to the body, source and drain, or source alone, depending on operating regime)
and overlap terms with the source and drain. The source and drain have parasitic diffusion
capacitance to the body.

Fig.1.23 Capacitance of an MOS transistor

1.6 DC TRANSFER CHARACTERISTICS


The DC transfer characteristics of a circuit relate the output voltage to the input voltage,
assuming the input changes slowly enough that capacitances have plenty of time to charge
or discharge. Specific ranges of input and output voltages are defined as valid 0 and 1 logic
levels.

(A) STATIC CMOS INVERTER DC CHARACTERISTICS

Let us derive the DC transfer function (Vout vs. Vin) for the static CMOS inverter shown
in Figure. We begin with Table, which outlines various regions of operation for the n- and
p-transistors. In this table, Vtn is the threshold voltage of the n-channel device, and Vtp is the
threshold voltage of the p-channel device.

Note that Vtp is negative. The equations are given both in terms of Vgs/Vds and Vin/Vout.
As the source of the nMOS transistor is grounded, Vgsn = Vin and Vdsn = Vout. As the
source of the pMOS transistor is tied to VDD,
Vgsp = Vin – VDD and Vdsp = Vout – VDD.
Table Relationship between Voltages for the three regions of operation of a CMOS inverter


Fig 1.24.CMOS Inverter

The objective is to find the variation in output voltage (Vout) as a function of the input voltage
(Vin).

Given Vin, we must find Vout subject to the constraint that Idsn = |Idsp|.

For simplicity, we assume Vtp = –Vtn and that the pMOS transistor is 2–3 times as wide as
the nMOS transistor so βn = βp.

Figure 8(a) shows Idsn and Idsp in terms of Vdsn and Vdsp for various values of
Vgsn and Vgsp. Figure 8(b) shows the same plot of Idsn and |Idsp|, now in terms of Vout, for
various values of Vin.

The possible operating points of the inverter, marked with dots, are the values of Vout
where Idsn = |Idsp| for a given value of Vin. These operating points are plotted on Vout vs. Vin
axes in Figure 8(c) to show the inverter DC transfer characteristics. The supply current IDD =
Idsn = |Idsp| is also plotted against Vin in Figure 8(d), showing that both transistors are
momentarily ON as Vin passes through voltages between GND and VDD, resulting in a
pulse of current drawn from the power supply.
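The graphical procedure (find the Vout where Idsn = |Idsp| for each Vin) can be reproduced numerically. The sketch below assumes the ideal long-channel square-law model with no channel-length modulation, Vtp = –Vtn and βn = βp, with illustrative values VDD = 1.0 V and Vtn = 0.3 V. Bisection on Vout works because the net current into the output node, |Idsp| – Idsn, never increases as Vout rises. At exactly Vin = VDD/2 the ideal model gives a vertical segment, so the returned Vout there is just one point on it.

```python
def ids_shockley(vgs, vds, vt, beta):
    """Ideal long-channel drain current magnitude for vds >= 0."""
    vov = vgs - vt
    if vov <= 0:
        return 0.0                                   # cutoff
    if vds < vov:
        return beta * (vov * vds - vds * vds / 2)    # linear
    return beta / 2 * vov * vov                      # saturation


def inverter_vout(vin, vdd=1.0, vtn=0.3, vtp=-0.3, beta_n=1.0, beta_p=1.0):
    """Find Vout where Idsn = |Idsp| by bisection on [0, VDD]."""
    def net_current_into_output(vout):
        i_p = ids_shockley(vdd - vin, vdd - vout, -vtp, beta_p)  # pMOS
        i_n = ids_shockley(vin, vout, vtn, beta_n)               # nMOS
        return i_p - i_n          # never increases as vout rises

    lo, hi = 0.0, vdd
    for _ in range(60):
        mid = (lo + hi) / 2
        if net_current_into_output(mid) > 0:
            lo = mid              # pMOS supplies more current: Vout is higher
        else:
            hi = mid
    return (lo + hi) / 2


def supply_current(vin, vdd=1.0, vtn=0.3, vtp=-0.3, beta_n=1.0, beta_p=1.0):
    """IDD = Idsn = |Idsp| at the DC operating point."""
    vout = inverter_vout(vin, vdd, vtn, vtp, beta_n, beta_p)
    return ids_shockley(vin, vout, vtn, beta_n)


if __name__ == "__main__":
    for k in range(11):
        vin = k / 10
        print(f"Vin={vin:.1f} V  Vout={inverter_vout(vin):.3f} V  "
              f"IDD={supply_current(vin):.4f} (units of beta*V^2)")
```

The printed IDD is zero when Vin is within a threshold of either rail and peaks at Vin = VDD/2, which is the current pulse described above.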

The operation of the CMOS inverter can be divided into five regions indicated on Figure 8(c).
The state of each transistor in each region is shown in Table 3 .

In region A, the nMOS transistor is OFF so the pMOS transistor pulls the output to VDD. In
region B, the nMOS transistor starts to turn ON, pulling the output down. In region C, both
transistors are in saturation. Notice that ideal transistors are only in region C for Vin = VDD/2.
Real transistors have finite output resistances on account of channel-length modulation, and
thus have finite slopes over a broader region C. In region D, the pMOS transistor is partially
ON and in region E, it is completely OFF, leaving the nMOS transistor to pull the output down
to GND.

Also notice that the inverter's current consumption is ideally zero, neglecting leakage,
when the input is within a threshold voltage of the VDD or GND rails. This feature is important
for low-power operation.


Fig 1.25 Graphical derivation of CMOS inverter DC Characteristic

Each region is described in detail below.

Region A: This region is defined by 0 ≤ Vin < Vtn, in which the n-device is cut off (Idsn = 0)
and the p-device is in the linear region. Since Idsn = –Idsp, the drain-to-source current Idsp of
the p-device is also zero. A p-device in the linear region can carry zero current only if
Vdsp = Vout – VDD = 0, so the output voltage is Vout = VDD.

Region B: This region is characterized by Vtn ≤ Vin < VDD/2, in which the p-device is in its
nonsaturated region (Vds ≠ 0) while the n-device is in saturation. The equivalent circuit for
the inverter in this region can be represented by a resistor for the p-transistor and a current
source for the n-transistor. The saturation current Idsn of the n-device is obtained by
setting Vgs = Vin. This results in

$$I_{dsn} = \frac{\beta_n}{2}\left(V_{in} - V_{tn}\right)^2, \qquad \beta_n = \mu_n C_{ox}\frac{W_n}{L_n}$$


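where Vtn is the threshold voltage of the n-device, µn is the mobility of electrons, Wn is the
channel width of the n-device and Ln is the channel length of the n-device. The current for the
p-device can be obtained by noting that Vgs = (Vin – VDD) and Vds = (Vout – VDD). Therefore,

$$I_{dsp} = -\beta_p\left[\left(V_{in} - V_{DD} - V_{tp}\right)\left(V_{out} - V_{DD}\right) - \frac{\left(V_{out} - V_{DD}\right)^2}{2}\right], \qquad \beta_p = \mu_p C_{ox}\frac{W_p}{L_p}$$

where Vtp is the threshold voltage of the p-device, µp is the mobility of holes, Wp is the
channel width of the p-device and Lp is the channel length of the p-device. Setting
Idsn = –Idsp, the output voltage Vout can be expressed as

$$V_{out} = \left(V_{in} - V_{tp}\right) + \sqrt{\left(V_{in} - V_{tp}\right)^2 - 2\left(V_{in} - \frac{V_{DD}}{2} - V_{tp}\right)V_{DD} - \frac{\beta_n}{\beta_p}\left(V_{in} - V_{tn}\right)^2}$$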

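Region C: In this region both the n- and p-devices are in saturation. The saturation currents
for the two devices are given by

$$I_{dsp} = -\frac{\beta_p}{2}\left(V_{in} - V_{DD} - V_{tp}\right)^2, \qquad I_{dsn} = \frac{\beta_n}{2}\left(V_{in} - V_{tn}\right)^2$$

Setting Idsn = –Idsp and solving for Vin yields

$$V_{in} = \frac{V_{DD} + V_{tp} + V_{tn}\sqrt{\beta_n/\beta_p}}{1 + \sqrt{\beta_n/\beta_p}}$$

By setting βn = βp and Vtn = –Vtp, this reduces to Vin = VDD/2, which implies that region C
exists only for one value of Vin. We have assumed that a MOS device in saturation behaves
like an ideal current source, with drain-to-source current independent of Vds. In reality, as
Vds increases, Ids also increases slightly; thus region C has a finite slope. The significant
factor to be noted is that in region C, we have two current sources in series, which is an
"unstable" condition.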

Thus a small change in input voltage has a large effect at the output. This makes the output
transition very steep, which contrasts with the equivalent nMOS inverter characteristics. The
above expression for Vin is particularly useful since it provides the basis for defining the gate
threshold Vinv, which corresponds to the state where Vout = Vin. This region also defines the
“gain” of the CMOS inverter when used as a small-signal amplifier.
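For ideal square-law devices, equating the two saturation currents gives the switching threshold Vinv = (VDD + Vtp + Vtn√(βn/βp)) / (1 + √(βn/βp)). The sketch below simply evaluates that expression; the function name and the values VDD = 1.0 V, Vtn = 0.3 V and Vtp = –0.3 V are illustrative assumptions.

```python
import math


def switching_threshold(vdd, vtn, vtp, beta_n, beta_p):
    """Vinv = (VDD + Vtp + Vtn*sqrt(bn/bp)) / (1 + sqrt(bn/bp)); vtp < 0."""
    k = math.sqrt(beta_n / beta_p)
    return (vdd + vtp + vtn * k) / (1 + k)


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 2.0):            # beta_n / beta_p
        v = switching_threshold(1.0, 0.3, -0.3, ratio, 1.0)
        print(f"beta_n/beta_p = {ratio}: Vinv = {v:.3f} V")
```

With these values the matched case gives VDD/2, and lowering βn/βp raises the threshold, which is the shift discussed under beta ratio effects.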


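Region D: This region is described by VDD/2 < Vin ≤ VDD + Vtp. The p-device is in saturation
while the n-device is operating in its nonsaturated region. The two currents may be written
as

$$I_{dsp} = -\frac{\beta_p}{2}\left(V_{in} - V_{DD} - V_{tp}\right)^2, \qquad I_{dsn} = \beta_n\left[\left(V_{in} - V_{tn}\right)V_{out} - \frac{V_{out}^2}{2}\right]$$

with Idsn = –Idsp.

The output voltage becomes

$$V_{out} = \left(V_{in} - V_{tn}\right) - \sqrt{\left(V_{in} - V_{tn}\right)^2 - \frac{\beta_p}{\beta_n}\left(V_{in} - V_{DD} - V_{tp}\right)^2}$$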

Region E: This region is defined by the input condition Vin ≥ VDD + Vtp, in which the p-device
is cut off (Idsp = 0) and the n-device is in the linear mode. Here, Vgsp = Vin – VDD, which is more
positive than Vtp. The output in this region is Vout = 0.

From the transfer curve, it may be seen that the transition between the two states is very
steep. This characteristic is very desirable because the noise immunity is maximized.

(B) BETA RATIO EFFECTS

The gate-threshold voltage Vinv, where Vin = Vout, depends on βn/βp. Thus, for a
given process, if we want to change βn/βp we need to change the channel dimensions,
i.e., the channel length L and channel width W.

As the ratio βn/βp is decreased, the transition region shifts from left to right; however,
the output voltage transition remains sharp.

Fig 1.26 Transfer characteristics of skewed inverters

Inverters with different beta ratios r = βp /βn are called skewed inverters [Sutherland99]. If r
> 1, the inverter is HI-skewed. If r < 1, the inverter is LO-skewed. If r = 1, the inverter has
normal skew or is unskewed.


A HI-skew inverter has a stronger pMOS transistor. Therefore, if the input is VDD/2, we would
expect the output to be greater than VDD/2. In other words, the input threshold must be
higher than for an unskewed inverter. Similarly, a LO-skew inverter has a weaker pMOS
transistor and thus a lower switching threshold.

Figure explores the impact of skewing the beta ratio on the DC transfer characteristics. As
the beta ratio is changed, the switching threshold moves. However, the output voltage
transition remains sharp. Gates are usually skewed by adjusting the widths of transistors
while maintaining minimum length for speed.
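Running the threshold expression in reverse indicates how much skew a target threshold requires. Solving Vinv = (VDD + Vtp + Vtn√(βn/βp)) / (1 + √(βn/βp)) for r = βp/βn gives r = ((Vinv – Vtn)/(VDD + Vtp – Vinv))². The sketch assumes ideal square-law devices and illustrative thresholds; converting r into a width ratio would additionally require the process's electron-to-hole mobility ratio, which is not given here.

```python
import math


def beta_ratio_for_threshold(vinv, vdd, vtn, vtp):
    """Return r = beta_p / beta_n that puts the switching threshold at vinv.

    From Vinv = (VDD + Vtp + Vtn*k) / (1 + k) with k = sqrt(beta_n/beta_p):
    k = (VDD + Vtp - Vinv) / (Vinv - Vtn), so r = 1 / k**2.
    Valid for Vtn < vinv < VDD + Vtp (vtp is negative).
    """
    if not vtn < vinv < vdd + vtp:
        raise ValueError("target threshold must lie between Vtn and VDD + Vtp")
    return ((vinv - vtn) / (vdd + vtp - vinv)) ** 2


if __name__ == "__main__":
    for target in (0.4, 0.5, 0.6):
        r = beta_ratio_for_threshold(target, vdd=1.0, vtn=0.3, vtp=-0.3)
        if math.isclose(r, 1.0):
            skew = "unskewed"
        else:
            skew = "HI-skew" if r > 1 else "LO-skew"
        print(f"Vinv = {target} V -> r = beta_p/beta_n = {r:.3f} ({skew})")
```

Moving the threshold up by 0.1 V from VDD/2 already calls for r = 9 with these thresholds, which shows how quickly heavy skew becomes expensive in pMOS width.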

(C) NOISE MARGIN

Noise margin is closely related to the DC voltage characteristics. This parameter allows
you to determine the allowable noise voltage on the input of a gate so that the output will not
be corrupted. The specification most commonly used to describe noise margin (or noise
immunity) uses two parameters:

• The LOW noise margin, NML
• The HIGH noise margin, NMH.

With reference to Figure, NML is defined as the difference between the maximum LOW input
voltage recognized by the receiving gate and the maximum LOW output voltage produced by the
driving gate.

The value of NMH is the difference between the minimum HIGH output voltage of the driving
gate and the minimum HIGH input voltage recognized by the receiving gate. Thus,

$$NM_L = V_{IL} - V_{OL}, \qquad NM_H = V_{OH} - V_{IH}$$

where
VIH = minimum HIGH input voltage
VIL = maximum LOW input voltage
VOH = minimum HIGH output voltage
VOL = maximum LOW output voltage

Fig.1.27 Noise margin definitions

Inputs between VIL and VIH are said to be in the indeterminate region or forbidden zone and
do not represent legal digital logic levels. Therefore, it is generally desirable to have VIH as
close as possible to VIL and for this value to be midway in the “logic swing,” VOL to VOH.


This implies that the transfer characteristic should switch abruptly; that is, there should be
high gain in the transition region. For the purpose of calculating noise margins, the transfer
characteristic of the inverter and the definition of voltage levels VIL, VOL, VIH, and VOH are
shown in Figure.

Logic levels are defined at the unity gain point where the slope is –1. This gives a
conservative bound on the worst case static noise margin.
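The unity-gain definition can be applied numerically to a transfer curve. The sketch below builds the curve from the ideal square-law model (Vtp = –Vtn, VDD = 1 V and Vtn = 0.3 V are illustrative assumptions), takes VIL and VIH at the edges of the region where dVout/dVin < –1, reads VOH and VOL off the curve at those inputs, and applies NML = VIL – VOL and NMH = VOH – VIH. These idealized numbers are not expected to match the figure, whose device models are not given; for the symmetric case they come out equal (about 0.40 VDD each).

```python
def ids(vgs, vds, vt, beta):
    """Ideal square-law drain current magnitude (vds >= 0)."""
    vov = vgs - vt
    if vov <= 0:
        return 0.0                                   # cutoff
    if vds < vov:
        return beta * (vov * vds - vds * vds / 2)    # linear
    return beta / 2 * vov * vov                      # saturation


def vout_of(vin, vdd, vtn, vtp, bn, bp):
    """Solve Idsn = |Idsp| for Vout by bisection."""
    lo, hi = 0.0, vdd
    for _ in range(60):
        mid = (lo + hi) / 2
        if ids(vdd - vin, vdd - mid, -vtp, bp) > ids(vin, mid, vtn, bn):
            lo = mid                                 # pull-up stronger
        else:
            hi = mid
    return (lo + hi) / 2


def noise_margins(vdd=1.0, vtn=0.3, vtp=-0.3, bn=1.0, bp=1.0, n=2000):
    """Return (NML, NMH) from the unity-gain (slope = -1) points."""
    vins = [vdd * k / n for k in range(n + 1)]
    vouts = [vout_of(v, vdd, vtn, vtp, bn, bp) for v in vins]
    steep = [k for k in range(n)
             if (vouts[k + 1] - vouts[k]) / (vins[k + 1] - vins[k]) < -1]
    if not steep:
        raise ValueError("transfer curve never reaches unity gain")
    k_il, k_ih = steep[0], steep[-1] + 1   # edges of the high-gain region
    v_il, v_oh = vins[k_il], vouts[k_il]
    v_ih, v_ol = vins[k_ih], vouts[k_ih]
    return v_il - v_ol, v_oh - v_ih        # NML = VIL - VOL, NMH = VOH - VIH


if __name__ == "__main__":
    for bp in (1.0, 2.0):
        nml, nmh = noise_margins(bp=bp)
        print(f"beta_p/beta_n = {bp}: NML = {nml:.3f} V, NMH = {nmh:.3f} V")
```

Doubling βp (a HI-skew inverter) trades NML for NMH, consistent with the skewing argument in this section.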

Fig 1.28 CMOS inverter noise margins

For the inverter shown, NML is 0.46 VDD while NMH is 0.13 VDD. Note that the
output is slightly degraded when the input is at its worst legal value; this is called noise
feedthrough or propagated noise.

If either NML or NMH for a gate is too small, the gate may be disturbed by noise that
occurs on the inputs. An unskewed gate has equal noise margins, which maximizes
immunity to arbitrary noise sources. If a gate sees more noise in the high or low input state,
the gate can be skewed to improve that noise margin at the expense of the other. Note that
if |Vtp| = Vtn, then NMH and NML increase as threshold voltages are increased.

DC analysis gives us the static noise margins, specifying the level of noise that a gate may
see for an indefinite duration. Larger noise pulses may be acceptable if they are brief;
unfortunately, there is no simple amplitude-duration product that conveniently specifies
dynamic noise margins.

1.7 Pass Transistors and Transmission Gates


The strength of a signal is measured by how closely it approximates an ideal voltage
source. In general, the stronger a signal, the more current it can source or sink. The power
supplies, or rails, (VDD and GND) are the source of the strongest 1s and 0s.
An nMOS transistor is an almost perfect switch when passing a 0 and thus we say it
passes a strong 0. However, the nMOS transistor is imperfect at passing a 1. The high
voltage level is somewhat less than VDD. It passes a degraded or weak 1. A pMOS
transistor again has the opposite behavior, passing strong 1s but degraded 0s. The
transistor symbols and behaviors are summarized in Figure with g, s, and d indicating gate,
source, and drain.
When an nMOS or pMOS is used alone as an imperfect switch, we sometimes call it
a pass transistor. By combining an nMOS and a pMOS transistor in parallel, we obtain a
switch that turns ON when a 1 is applied to g (with its complement applied to the pMOS gate)
and that passes both 0s and 1s in an acceptable fashion. We term this a transmission gate or
pass gate. In a circuit where only a 0 or a 1 has to be passed, the appropriate transistor
(n or p) can be deleted, reverting to a single nMOS or pMOS device.
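A first-order way to quantify "strong" and "degraded" levels: an ON nMOS pass transistor stops conducting once its output rises to within Vtn of its gate voltage (VDD), and an ON pMOS stops once its output falls to within |Vtp| of its gate voltage (0 V). The sketch below encodes that rule for single devices and for the parallel pair of a transmission gate. It ignores the body effect, which makes real degraded levels somewhat worse, and the threshold values are illustrative.

```python
def pass_levels(device, vdd, vtn, vtp):
    """Return (output level when passing a 1, output level when passing a 0).

    First-order model with the switch turned ON; body effect is ignored
    and vtp is negative.
    """
    n_levels = (vdd - vtn, 0.0)   # nMOS: degraded 1 (VDD - Vtn), strong 0
    p_levels = (vdd, -vtp)        # pMOS: strong 1, degraded 0 (|Vtp|)
    if device == "nmos":
        return n_levels
    if device == "pmos":
        return p_levels
    if device == "tgate":
        # In parallel, the output keeps charging while either device still
        # conducts (take the higher 1) and likewise keeps discharging
        # (take the lower 0).
        return (max(n_levels[0], p_levels[0]), min(n_levels[1], p_levels[1]))
    raise ValueError(f"unknown device: {device}")


if __name__ == "__main__":
    for dev in ("nmos", "pmos", "tgate"):
        one, zero = pass_levels(dev, vdd=1.0, vtn=0.3, vtp=-0.3)
        print(f"{dev:5s}: passes 1 as {one:.2f} V, passes 0 as {zero:.2f} V")
```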

Figure1.29 Pass transistors strong and degraded outputs

Note that both the control input and its complement are required by the transmission
gate. This is called double rail logic. Some circuit symbols for the transmission gate are
shown in Figure.
In all of our examples so far, the inputs drive the gate terminals of nMOS transistors
in the pull-down network and pMOS transistors in the complementary pull-up network, as
was shown in Figure.
Thus, the nMOS transistors only need to pass 0s and the pMOS only pass 1s, so the
output is always strongly driven and the levels are never degraded. This is called a fully
restored logic gate and simplifies circuit design considerably.
In contrast to other forms of logic, where the pull-up and pull-down switch networks
have to be ratioed in some manner, static CMOS gates operate correctly independently of
the physical sizes of the transistors. Moreover, there is never a path through ‘ON’ transistors
from the 1 to the 0 supplies for any combination of inputs (in contrast to single-channel MOS,
GaAs technologies, or bipolar).

Fig 1.30 Transmission gate

1.8 DELAY ESTIMATION

In a typical design, some logic paths, called the critical paths, require careful attention to
timing details.

The critical paths can be affected at four main levels of the design:

i. The architectural/microarchitectural level: The most efficient design is achieved with
a good microarchitecture. This requires a broad knowledge of how many gate delays
fit in a clock cycle, how fast addition occurs, how fast memories are accessed, and
how long signals take to propagate along a wire. Tradeoffs at the microarchitectural
level include the number of pipeline stages, the number of execution units, and the
size of memories.

ii. The logic level: Here tradeoffs include types of functional blocks, the number of
stages of gates in the cycle, and the fan-in and fan-out of the gates. Note, however, that no
amount of skillful logic design can overcome a poor microarchitecture.

iii. The circuit level: Once the logic has been selected, the delay can be tuned at the
circuit level by choosing transistor sizes or using other styles of CMOS logic.

iv. The layout level: Finally, delay is dependent on the layout. The floor plan is of great
importance because it determines the wire lengths that can dominate delay.

• We will focus on the logic and circuit optimizations of selecting the number of stages
of logic, the types of gates, and the transistor sizes.

• Quick delay estimation is essential to designing critical paths. The critical paths
themselves can be identified by a timing analyzer, a design tool that automatically finds
the slowest paths in a logic design.

• Simulation or timing analysis only shows how fast a particular circuit operates; it does
not specify how the circuit could be modified to operate faster.

• Simple delay models can be applied to the design to rapidly estimate delay, understand
its origin, and figure out how it can be reduced.

1.8.1 RC DELAY MODEL

• The RC delay model estimates the delay of a logic gate as the RC product of the
effective driver resistance and the load capacitance.

• Some important definitions used when calculating delays are:
– Rise time, tr = time for a waveform to rise from 20% to 80% of its steady-state value
– Fall time, tf = time for a waveform to fall from 80% to 20% of its steady-state value
– Edge rate, trf = (tr + tf)/2
– Propagation delay time, tpd = maximum time from the input crossing 50% to the
output crossing 50%
– Contamination delay time, tcd = minimum time from the input crossing 50% to the
output crossing 50%

• When an input changes, the output will retain its old value for at least the
contamination delay and take on its new value in at most the propagation delay.
Propagation and contamination delay times are also called max-time and min-time.

• The gate that charges or discharges a node is called the driver, and the gates and wire
being driven are called the load.


• Usually, logic gates use minimum-length devices for least delay, area, and power
consumption. Given this, the delay of a logic gate depends on the widths of the
transistors in the gate and the capacitance of the load that must be driven.
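The timing definitions above translate directly into measurements on sampled waveforms. The sketch below finds threshold crossings by linear interpolation and uses them for tr, tf and a 50%-to-50% delay. The helper names and the synthetic ramp waveforms are illustrative assumptions rather than output from a particular simulator; tpd and tcd would be the maximum and minimum of such 50%-to-50% delays over all relevant input transitions.

```python
def crossing_time(times, volts, level, rising):
    """First time the waveform crosses `level` in the given direction,
    using linear interpolation between samples."""
    for k in range(len(times) - 1):
        v0, v1 = volts[k], volts[k + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[k], times[k + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, volts, vdd):
    """tr: time from 20% to 80% of a 0 -> VDD transition."""
    return (crossing_time(times, volts, 0.8 * vdd, True)
            - crossing_time(times, volts, 0.2 * vdd, True))


def fall_time(times, volts, vdd):
    """tf: time from 80% to 20% of a VDD -> 0 transition."""
    return (crossing_time(times, volts, 0.2 * vdd, False)
            - crossing_time(times, volts, 0.8 * vdd, False))


def delay_50(times, vin, vout, vdd, in_rising, out_rising):
    """Time from the input crossing 50% to the output crossing 50%."""
    return (crossing_time(times, vout, 0.5 * vdd, out_rising)
            - crossing_time(times, vin, 0.5 * vdd, in_rising))


if __name__ == "__main__":
    # Synthetic ramps (times in ps): input rises over 0-10 ps,
    # output falls over 5-25 ps.
    ts = [float(t) for t in range(31)]
    vin = [min(max(t / 10, 0.0), 1.0) for t in ts]
    vout = [1.0 - min(max((t - 5) / 20, 0.0), 1.0) for t in ts]
    tr, tf = rise_time(ts, vin, 1.0), fall_time(ts, vout, 1.0)
    print(f"tr = {tr:.1f} ps, tf = {tf:.1f} ps, trf = {(tr + tf) / 2:.1f} ps, "
          f"delay = {delay_50(ts, vin, vout, 1.0, True, False):.1f} ps")
```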

1.8.2 Effective Resistance and Capacitance:

• An nMOS transistor with width of one unit is defined to have effective resistance R.
The unit-width pMOS has higher resistance than the nMOS transistor because of the lower
hole mobility; let us assume this resistance is 2R.

• Wider transistors have lower resistance. For example, a pMOS transistor of double-unit
width has effective resistance R.

• Parallel and series transistors combine like conventional resistors. When multiple
transistors are in series, their resistance is the sum of each individual resistance.

• When multiple transistors are in parallel, the resistance is lower if they are all ON. For
worst-case delay, however, we usually assume that only one of them is ON, in which case
the effective resistance is just that of a single transistor.

• Capacitance consists of gate capacitance and source/drain diffusion capacitance. Let
us define the gate capacitance of a unit transistor to be Cg and the diffusion
capacitance of its contacted source and drain to each be Cdiff. Cg and Cdiff are
proportional to transistor width.

• In many processes the capacitances are approximately equal and can be labeled
C = Cg = Cdiff to keep estimation simple. The second terminal of the diffusion capacitor
is the body, which is usually tied to ground (for nMOS) or VDD (for pMOS).

• As the DC voltage on the second terminal is irrelevant to delay, we often draw both
capacitances to ground for simplicity. The gate capacitance includes fields
terminating on the channel, source, and drain.

1.8.3 Diffusion Capacitance Layout Effects:

• Usually, gate capacitance can be determined directly from the transistor widths in the
schematic. Diffusion capacitance depends on the layout.

• In a good layout, diffusion nodes are shared wherever possible to reduce the
diffusion capacitance. Moreover, the uncontacted diffusion nodes between series
transistors are usually smaller than those that must be contacted. Such uncontacted
nodes have less capacitance.

• A conservative method of estimating capacitances before layout is to assume
uncontacted diffusion between series transistors and contacted diffusion on all other
nodes.

1.8.4 Elmore Delay Model

• Viewing ON transistors as resistors, we see that a chain of transistors can be
represented as an RC ladder as shown in Figure (a). The Elmore delay model
estimates the delay of an RC ladder as the sum over each node in the ladder of the
resistance R_{i-to-source} between that node and a supply multiplied by the capacitance
Ci on the node:

$$t_{pd} = \sum_{i} R_{i\text{-to-source}}\,C_i$$

Fig. 1.31 RC ladder for Elmore model
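The Elmore sum tpd = Σ R_i-to-source·Ci can be evaluated by walking the ladder from the supply outward and accumulating resistance along the way. In the sketch below resistances are in units of R and capacitances in units of C. The usage example is a hypothetical 2-input NAND pull-down sized for unit resistance (two series nMOS of width 2, so R/2 each) with an assumed 2C on the internal node and 6C on the output. A unit inverter (R driving its own 3C of diffusion) gives 3RC, matching the inverter parasitic delay quoted in Section 1.11.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder: sum over nodes of R(node-to-source) * C(node).

    resistances[i] is the series resistance between node i-1 and node i,
    with node 0 attached to the supply through resistances[0];
    capacitances[i] is the capacitance on node i.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance and one capacitance per node")
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r          # cumulative resistance back to the supply
        delay += r_to_source * c
    return delay


if __name__ == "__main__":
    # Hypothetical 2-input NAND pull-down, units of R and C:
    # two series nMOS of width 2 (R/2 each), 2C internal node, 6C output.
    print("NAND2 pull-down:", elmore_delay([0.5, 0.5], [2.0, 6.0]), "RC")
    # Unit inverter driving only its own 3C of output diffusion.
    print("Inverter:", elmore_delay([1.0], [3.0]), "RC")
```

Note how the output capacitance is weighted by the full series resistance while the internal node only sees the bottom transistor, which is why node ordering matters.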

Observe that the delay consists of two components. The parasitic delay is determined by the
gate driving its own internal diffusion capacitance. Boosting the width of the transistors
decreases the resistance but increases the capacitance so the parasitic delay is ideally
independent of the gate size.

• The effort delay depends on the ratio of external load capacitance to input
capacitance and thus changes with transistor widths. The capacitance ratio is called
the electrical effort or fanout, and the term indicating gate complexity is called the
logical effort.

• The delay of an ideal fanout-of-1 inverter with no parasitic capacitance is τ = 3RC.
We denote the normalized delay as multiples of this inverter delay:

$$d = \frac{t_{pd}}{\tau}$$

For example, the RC delay model gives the rising delay of a 2-input NAND gate as
d = (4/3)h + 2. The RC delay model similarly predicts that an inverter with real parasitics
driving h identical inverters has a delay of h + 1.
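Both results are instances of the linear form d = gh + p: the inverter has g = 1 and p = 1, and the 2-input NAND (rising) has g = 4/3 and p = 2. The sketch below evaluates that form over a few fanouts using exact fractions; the value of τ in picoseconds is an illustrative assumption used only to convert back to absolute delay.

```python
from fractions import Fraction


def normalized_delay(g, h, p):
    """Linear delay model in units of tau: d = g*h + p."""
    return g * h + p


INVERTER = (Fraction(1), Fraction(1))        # (g, p) for the inverter
NAND2_RISE = (Fraction(4, 3), Fraction(2))   # (g, p), 2-input NAND rising delay

if __name__ == "__main__":
    tau_ps = 10.0    # illustrative tau = 3RC, used only for absolute delay
    for h in (1, 2, 4, 8):
        d_inv = normalized_delay(INVERTER[0], h, INVERTER[1])
        d_nand = normalized_delay(NAND2_RISE[0], h, NAND2_RISE[1])
        print(f"h = {h}: inverter d = {d_inv} ({float(d_inv) * tau_ps:.0f} ps), "
              f"NAND2 d = {d_nand} ({float(d_nand) * tau_ps:.1f} ps)")
```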

1.9 LINEAR DELAY MODEL:

• In general, the propagation delay of a gate can be written as

$$d = f + p$$

where p is the parasitic delay of the gate when no load is attached, and f = gh is the
effort delay, which depends on the complexity and fanout of the gate.

• The complexity is represented by the logical effort, g. An inverter is defined to have a
logical effort of 1. More complex gates have greater logical efforts, indicating that
they take longer to drive a given fanout.

• If the load is not identical copies of the gate, electrical effort can be computed as

$$h = \frac{C_{out}}{C_{in}}$$

where Cout is the capacitance of the external load being driven and Cin is the input
capacitance of the gate. Figure (b) plots normalized delay vs. electrical effort for an
idealized inverter and 2-input NAND gate. The y-intercepts indicate the parasitic delay, i.e.,
the delay when the gate drives no load. The slope of the lines is the logical effort.


• The inverter has a slope of 1 by definition. The NAND has a slope of 4/3. The logical
effort and parasitic delay can be estimated using RC models.

Fig. 1.32: Normalized delay vs. fanout


• A properly calibrated linear delay model is widely used by CAD tools such as logic
synthesizers and static timing analyzers, although the notation varies from tool to
tool. For example, the popular Synopsys Design Compiler tool uses the following
basic model to define delay for a library of gates:

• These parameters are related to the logical effort terms as given in Table. The
effective resistance of a gate increases with the logical effort of the gate but
decreases with the gate size.

• Some designers use the term drive as the reciprocal of resistance: drive = Cin/g.
Delay can be expressed in terms of drive as

$$d = \frac{C_{out}}{\text{drive}} + p$$

1.10 LOGICAL EFFORT

Logical effort of a gate is defined as the ratio of the input capacitance of the gate to
the input capacitance of an inverter that can deliver the same output current.


Equivalently, logical effort indicates how much worse a gate is at producing output
current as compared to an inverter, given that each input of the gate may only present as
much input capacitance as the inverter.

Logical effort can be measured in simulation from delay vs. fanout plots as the ratio
of the slope of the delay of the gate to the slope of the delay of an inverter.

Alternatively, it can be estimated by sketching gates. Figure (2.1) shows inverter, 3-input
NAND, and 3-input NOR gates with transistor widths chosen to achieve unit resistance,
assuming pMOS transistors have twice the resistance of nMOS transistors.

The inverter presents three units of input capacitance. The NAND presents five units
of capacitance on each input, so the logical effort is 5/3. Similarly, the NOR presents seven
units of capacitance, so the logical effort is 7/3.

This matches our expectation that NANDs are better than NORs because NORs
have slow pMOS transistors in series.

Table lists the logical effort of common gates. The effort tends to increase with the
number of inputs. NAND gates are better than NOR gates because the series transistors are
nMOS rather than pMOS. Exclusive-OR gates are particularly costly and have different
logical efforts for different inputs.
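The sizing argument above generalizes to n inputs. Assuming, as in the text, that a unit pMOS has twice the resistance of a unit nMOS and that each gate is sized for the same drive as an inverter with 2 units of pMOS and 1 unit of nMOS width, an n-input NAND needs series nMOS of width n and parallel pMOS of width 2, while an n-input NOR needs parallel nMOS of width 1 and series pMOS of width 2n. Dividing the capacitance on one input by the inverter's 3 units gives the logical effort. The function names are illustrative.

```python
from fractions import Fraction

INVERTER_INPUT_CAP = 3   # unit inverter: pMOS width 2 + nMOS width 1


def nand_logical_effort(n):
    """Each input drives one series nMOS (width n) and one parallel pMOS (width 2)."""
    return Fraction(n + 2, INVERTER_INPUT_CAP)


def nor_logical_effort(n):
    """Each input drives one parallel nMOS (width 1) and one series pMOS (width 2n)."""
    return Fraction(1 + 2 * n, INVERTER_INPUT_CAP)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n}-input: NAND g = {nand_logical_effort(n)}, "
              f"NOR g = {nor_logical_effort(n)}")
```

For n = 3 this reproduces 5/3 and 7/3, and the NOR effort grows twice as fast with n because its series devices are the slower pMOS.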

Fig 1.33 Logic gates sized for unit resistance

1.11 PARASITIC DELAY

The parasitic delay of a gate is the delay of the gate when it drives zero load. It can
be estimated with RC delay models.


A crude method good for hand calculations is to count only diffusion capacitance on
the output node. For example, consider the gates in Figure (2.1), assuming each transistor
on the output node has its own drain diffusion contact. Transistor widths were chosen to give
a resistance of R in each gate. The inverter has three units of diffusion capacitance on the
output, so the parasitic delay is 3RC = τ; in normalized units this is pinv.

pinv is the ratio of diffusion capacitance to gate capacitance in a particular process. It is
usually close to 1 and will be considered to be 1. The 3-input NAND and NOR each have
9 units of diffusion capacitance on the output, so the parasitic delay is three times as great
(3pinv, or simply 3).

Table estimates the parasitic delay of common gates. Increasing transistor sizes
reduces resistance but increases capacitance correspondingly, so parasitic delay is, on first
order, independent of gate size.

The parasitic delay also depends on the ratio of diffusion capacitance to gate
capacitance.

Nevertheless, it is important to realize that parasitic delay grows more than linearly with the
number of inputs in a real NAND or NOR circuit.

For example, Figure2.2 shows a model of an n-input NAND gate in which the upper
inputs were all 1 and the bottom input rises. The gate must discharge the diffusion
capacitances of all of the internal nodes as well as the output. The Elmore delay is

Fig 1.34 n-input NAND gate parasitic delay

This delay grows quadratically with the number of series transistors n, indicating that beyond
a certain point it is faster to split a large gate into a cascade of two smaller gates.
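The quadratic growth can be illustrated with the Elmore model. The sketch below assumes unit-resistance sizing (n series nMOS of width n, so R/n each, and n parallel pMOS of width 2), assigns nC of diffusion to each of the n – 1 internal nodes and 3nC to the output (n pMOS drains of 2C plus one nMOS drain of nC), and discharges every node through the bottom transistor. These capacitance assignments are assumptions, not values read from the figure; under them the delay works out to (n²/2 + 5n/2)RC.

```python
from fractions import Fraction


def nand_parasitic_elmore(n):
    """Elmore parasitic delay, in units of RC, of an n-input NAND when the
    bottom input rises and every internal node must be discharged.

    Assumed sizing and loading (illustrative): each series nMOS has width n
    and resistance R/n; each of the n - 1 internal nodes carries nC; the
    output carries 3nC. No external load.
    """
    delay = Fraction(0)
    r_to_gnd = Fraction(0)
    for node in range(1, n + 1):        # node 1 is next to GND, node n is the output
        r_to_gnd += Fraction(1, n)      # one more R/n between this node and GND
        cap = 3 * n if node == n else n
        delay += r_to_gnd * cap
    return delay


if __name__ == "__main__":
    for n in range(1, 7):
        d = nand_parasitic_elmore(n)
        closed_form = Fraction(n * n + 5 * n, 2)    # n^2/2 + 5n/2
        print(f"n = {n}: {d} RC (closed form {closed_form} RC), "
              f"{float(d) / 3:.2f} in units of tau = 3RC")
```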

2.1 LOGICAL EFFORT OF PATHS

Designers often need to choose the fastest circuit topology and gate sizes for a
particular logic function and to estimate the delay of the design. The method of Logical Effort
provides a simple method “on the back of an envelope” to choose the best topology and
number of stages of logic for a function. Based on the linear delay model, it allows the
designer to quickly estimate the best number of stages for a path, the minimum possible
delay for the given topology, and the gate sizes that achieve this delay.

1.12 DELAY IN MULTISTAGE LOGIC NETWORKS

Figure shows the logical and electrical efforts of each stage in a multistage path as a
function of the sizes of each stage. The path of interest (the only path in this case) is marked
with the dashed blue line.

Observe that logical effort is independent of size, while electrical effort depends on
sizes. This section develops some metrics for the path as a whole that are independent of
sizing decisions.

Fig 1.35 Multistage logic network


The path logical effort G can be expressed as the product of the logical efforts of each
stage along the path:

$$G = \prod_i g_i$$

The path electrical effort H can be given as the ratio of the output capacitance the path must
drive to the input capacitance presented by the path:

$$H = \frac{C_{out(path)}}{C_{in(path)}}$$

This is more convenient than defining path electrical effort as the product of stage electrical
efforts because we do not know the individual stage electrical efforts until gate sizes are
selected.

The path effort F is the product of the stage efforts of each stage:

$$F = \prod_i f_i = \prod_i g_i h_i$$

Recall that the stage effort of a single stage is f = gh. Can we by analogy state F = GH for
a path?

In paths that branch, F ≠ GH. This is illustrated in Figure 1.36, a circuit with a two-way branch.
Consider a path from the primary input to one of the outputs. The path logical effort is G = 1
× 1 = 1. The path electrical effort is H = 90/5 = 18. Thus, GH = 18. But F = f1 f2 = g1h1g2h2
= 1 × 6 × 1 × 6 = 36. In other words, F = 2GH in this path on account of the two-way branch.
We must introduce a new kind of effort to account for branching between stages of a path.
This branching effort b is the ratio of the total capacitance seen by a stage to the
capacitance on the path, b = (Conpath + Coffpath)/Conpath; in Figure 1.36 it is
(15 + 15)/15 = 2.


FIGURE 1.36 Circuit with two-way branch

The path branching effort B is the product of the branching efforts between stages:

$$B = \prod_i b_i$$

Now we can define the path effort F as the product of the logical, electrical, and branching
efforts of the path:

$$F = GBH$$

Note that the product of the electrical efforts of the stages is actually BH, not just H.

We can now compute the delay of a multistage network.

The path delay D is the sum of the delays of each stage. It can also be written as the sum of
the path effort delay DF and path parasitic delay P:

$$D = \sum_i d_i = D_F + P, \qquad D_F = \sum_i f_i, \qquad P = \sum_i p_i$$

The product of the stage efforts is F, independent of gate sizes. The path effort delay is the
sum of the stage efforts. The sum of a set of numbers whose product is constant is
minimized by choosing all the numbers to be equal. In other words, the path delay is
minimized when each stage bears the same effort. If a path has N stages and each bears
the same effort, that effort must be

$$\hat{f} = g_i h_i = F^{1/N}$$

Thus, the minimum possible delay of an N-stage path with path effort F and path parasitic
delay P is

$$D = N F^{1/N} + P$$

This is a key result of Logical Effort. It shows that the minimum delay of the path can be
estimated knowing only the number of stages, path effort, and parasitic delays without the
need to assign transistor sizes. This is superior to simulation, in which delay depends on
sizes and you never achieve certainty that the sizes selected are those that offer minimum
delay.

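Rearranging the definition of stage effort, f̂ = g·Cout/Cin, gives the capacitance
transformation formula to find the best input capacitance for a gate given the output
capacitance it drives:

$$C_{in_i} = \frac{g_i\,C_{out_i}}{\hat{f}}$$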


Starting with the load at the end of the path, work backward applying the capacitance
transformation to determine the size of each stage. Check the arithmetic by verifying that the
size of the initial stage matches the specification.
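The whole procedure (compute F = GBH, take f̂ = F^(1/N), then walk backward with Cin_i = g_i·Cout_i/f̂) fits in a short function. The sketch below assumes the branching effort b_i is given for the output of each stage, so the total load of stage i is b_i times the on-path input capacitance of the next stage; the function name and argument layout are illustrative. The example reproduces the two-way branch of Figure 1.36 (G = 1, B = 2, H = 18, so F = 36 and f̂ = 6), taking p = 1 for each inverter, and ends with the check that the first computed size equals the specified input capacitance.

```python
def size_path(gs, bs, ps, c_in, c_out):
    """Logical-effort design of an N-stage path.

    gs[i], ps[i]: logical effort and parasitic delay of stage i.
    bs[i]: branching effort at the output of stage i (1.0 if no branch).
    c_in, c_out: path input capacitance and final load capacitance.
    Returns (F, f_hat, D, list of stage input capacitances).
    """
    n = len(gs)
    G = B = 1.0
    for g, b in zip(gs, bs):
        G *= g                      # path logical effort
        B *= b                      # path branching effort
    H = c_out / c_in                # path electrical effort
    F = G * B * H                   # path effort
    f_hat = F ** (1.0 / n)          # best stage effort
    D = n * f_hat + sum(ps)         # minimum path delay
    sizes = [0.0] * n
    load = c_out                    # on-path load seen by the last stage
    for i in reversed(range(n)):
        # Capacitance transformation: Cin_i = g_i * Cout_i / f_hat, where
        # Cout_i is the total load, i.e. b_i times the on-path load.
        sizes[i] = gs[i] * bs[i] * load / f_hat
        load = sizes[i]
    return F, f_hat, D, sizes


if __name__ == "__main__":
    # Two-way branch example: two inverter stages, input 5, load 90, b = 2.
    F, f_hat, D, sizes = size_path([1, 1], [2, 1], [1, 1], 5.0, 90.0)
    print(f"F = {F:g}, f_hat = {f_hat:g}, D = {D:g}, sizes = {sizes}")
    # Arithmetic check: the first computed size equals the specified input.
    assert abs(sizes[0] - 5.0) < 1e-9
```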

1.13 CHOOSING THE BEST NUMBER OF STAGES

Given a specific circuit topology, we now know how to estimate delay and choose gate sizes.
However, there are many different topologies that implement a particular logic function.
Logical Effort tells us that NANDs are better than NORs and that gates with few inputs are
better than gates with many. In this section, we will also use Logical Effort to predict the best
number of stages to use.

Logic designers sometimes estimate delay by counting the number of stages of logic,
assuming each stage has a constant “gate delay.” This is potentially misleading because it
implies that the fastest circuits are those that use the fewest stages of logic.

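Of course, the gate delay actually depends on the electrical effort, so sometimes using fewer
stages results in more delay. To find the best number of stages, consider a path of $n_1$
stages with path effort F and extend it by adding $N - n_1$ inverters at its end. The inverters
do not change the path effort, but each adds a parasitic delay $p_{inv}$. The delay of the new
path is

$$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1)\, p_{inv}$$

Differentiating with respect to N and setting the result to 0 allows us to solve for the best
number of stages, which we will call $\hat{N}$. The result can be expressed more compactly by
defining

$$\rho = F^{1/\hat{N}}$$

to be the best stage effort, which satisfies

$$p_{inv} + \rho\,(1 - \ln \rho) = 0$$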

A path achieves least delay by using $\hat{N} = \log_\rho F$ stages. It is important to understand
not only the best stage effort and number of stages but also the sensitivity to using a
different number of stages. Figure 1.37 plots the delay increase caused by using a particular
number of stages, for $p_{inv} = 1$. The x-axis plots the ratio of the actual number of stages to
the ideal number. The y-axis plots the ratio of the actual delay to the best achievable.
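The condition for the best stage effort has no closed-form solution when $p_{inv} > 0$, so it is usually solved numerically. Below is a minimal bisection sketch, assuming the delay model D(N) = N F^(1/N) + N p_inv (the parasitic delay of the original $n_1$ gates is dropped because it does not depend on N); the function names are our own.

```python
import math


def best_stage_effort(p_inv, lo=1.0, hi=10.0, tol=1e-12):
    """Solve p_inv + rho * (1 - ln rho) = 0 for rho by bisection.

    The left side equals p_inv + 1 > 0 at rho = 1 and decreases for rho > 1,
    so the root is bracketed by [1, 10] for 0 <= p_inv < 13.
    """
    def f(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def path_delay(N, F, p_inv):
    """Delay terms that depend on N: N * F^(1/N) + N * p_inv."""
    return N * F ** (1.0 / N) + N * p_inv


rho = best_stage_effort(1.0)
F = 1000.0
n_hat = math.log(F) / math.log(rho)          # best (non-integer) number of stages
for N in (4, 5, 6, 7):
    print(N, round(path_delay(N, F, 1.0) / path_delay(n_hat, F, 1.0), 3))
```

With $p_{inv} = 1$ the sketch gives ρ ≈ 3.59, and with $p_{inv} = 0$ it gives e ≈ 2.72. For F = 1000 the best number of stages is about 5.4; using 5 or 6 stages costs less than 1% extra delay under this model, which illustrates how flat the optimum is.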

Fig 1.37 Sensitivity of delay to number of stages


1.14 SCALING

The only constant in VLSI design is constant change. Figure 1.6 showed the unrelenting
march of technology, in which feature size has reduced by 30% every two to three years.

Such scaling is unprecedented in the history of technology. However, scaling also
exacerbates reliability issues, increases complexity, and introduces new problems.
Designers need to be able to predict the effect of this feature size scaling on chip
performance to plan future products, ensure existing products will scale gracefully to future
processes for cost reduction, and anticipate looming design challenges.

1.14.1 TRANSISTOR SCALING

Dennard’s Scaling Law predicts that the basic operational characteristics of a MOS transistor
can be preserved and the performance improved if the critical parameters of a device are
scaled by a dimensionless factor S. These parameters include the following:

• All dimensions (in the x, y, and z directions)
• Device voltages
• Doping concentration densities

This approach is also called constant field scaling because the electric fields remain the
same as both voltage and distance shrink. In contrast, constant voltage scaling shrinks the
devices but not the power supply. Another approach is lateral scaling, in which only the gate
length is scaled. This is commonly called a gate shrink because it can be done easily to an
existing mask database for a design.

The effects of these types of scaling are illustrated in Table .
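The effect of each scaling style on device characteristics can be derived by tracking exponents of S through simple proportionalities. This is a minimal sketch, assuming long-channel behavior (I_ds ∝ β(V_DD − V_t)²), unchanged mobility, and V_t scaled together with V_DD; these model assumptions and the function name `derived_scaling` are ours, chosen to show the reasoning rather than to reproduce the table exactly.

```python
def derived_scaling(length, voltage):
    """Exponent k of S (quantity scales as S**k) for derived device quantities.

    length:  exponent for L, W and tox (e.g. -1 means they shrink as 1/S)
    voltage: exponent for VDD and Vt
    """
    L = W = tox = length
    V = voltage
    beta = W - L - tox            # beta ∝ mu * W / (L * tox)
    current = beta + 2 * V        # Ids ∝ beta * (VDD - Vt)^2
    resistance = V - current      # R ∝ VDD / Ids
    cap = W + L - tox             # gate capacitance ∝ W * L / tox
    delay = resistance + cap      # gate delay ∝ R * C
    freq = -delay                 # clock frequency ∝ 1 / delay
    energy = cap + 2 * V          # switching energy ∝ C * VDD^2
    power = energy + freq         # switching power ∝ energy * frequency
    area = W + L                  # area per gate
    return {"current": current, "resistance": resistance, "capacitance": cap,
            "delay": delay, "energy": energy, "power": power,
            "power_density": power - area}


constant_field = derived_scaling(length=-1, voltage=-1)
constant_voltage = derived_scaling(length=-1, voltage=0)
for name, table in (("constant field", constant_field),
                    ("constant voltage", constant_voltage)):
    print(name, table)
```

For constant field scaling the sketch gives delay ∝ 1/S (consistent with the FO4 delay scaling as 1/S stated later in this section), power per gate ∝ 1/S² and constant power density. For constant voltage scaling it gives delay ∝ 1/S², the quadratic delay improvement described below, but power density ∝ S³.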

Figure 1.38 shows how voltage has scaled with feature size. Historically, feature
sizes were shrunk from 6 µm to 1 µm while maintaining a 5 V supply voltage. This constant
voltage scaling offered quadratic delay improvement as well as cost reduction. It also
maintained continuity in I/O voltage standards.

Fig 1.38 Voltage scaling with feature size

Constant voltage scaling increased the electric fields in devices. By the 1 µm
generation, velocity saturation was severe enough that decreasing feature size no longer
improved device current. Device breakdown from the high field was another risk, and power
consumption became unacceptable. Therefore, constant field (Dennard) scaling has been the
rule since the half-micron node.

Maintaining a constant field has the further benefit that many nonlinear factors and wearout
mechanisms are essentially unaffected. Unfortunately, voltage scaling has dramatically
slowed since the 90 nm generation because of leakage, and this may ultimately limit CMOS
scaling.

The FO4 inverter delay will scale as 1/S assuming ideal constant-field scaling.


1.14.2 INTERCONNECT SCALING

Wires also tend to be scaled equally in width and thickness to maintain an aspect ratio close
to 2. Table 7.5 shows the resistance, capacitance, and delay per unit length. Local wires run
within functional units and use the bottom layers of metal. Semiglobal (or scaled) wires run
across larger blocks or cores, typically using middle layers of metal. Both local and
semiglobal wires scale with feature size.

Global wires run across the entire chip using upper levels of metal. For example, global
wires might connect cores to a shared cache. Global wires do not scale with feature size;
indeed, they may get longer (by a factor of $D_c$, on the order of 1.1) because die size has
been gradually increasing.

Most local wires are short enough that their resistance does not matter. Like gates, their
capacitance per unit length remains constant while their length shrinks with feature size, so
their delay improves just as gate delay does.

Semiglobal wires long enough to require repeaters are speeding up, but not as fast as gates.
This is a relatively minor problem. Global wires, even with optimal repeaters, are getting
slower as technology scales. The time to cross a chip in a nanometer process can be
multiple cycles, and this delay must be accounted for in the microarchitecture.

Observe that when wire thickness is scaled, the capacitance per unit length remains
constant. Hence, a reasonable initial estimate of the capacitance of a minimum-pitch wire is
about 0.2 fF/µm, independent of the process. In other words, wire capacitance is roughly 1/5
of gate capacitance per unit length.

1.15 PROPAGATION DELAYS


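Propagation delay is introduced when logic signals have to pass through a chain
of pass transistors. Each transistor contributes a series resistance R and a node
capacitance C, so the chain behaves as an RC ladder whose delay increases drastically
as the number of pass transistors in series increases. As seen from the figure, the
response at node V2 is given by

$$C \frac{dV_2}{dt} = \frac{(V_1 - V_2) - (V_2 - V_3)}{R}$$

For a long network, with R and C taken per unit length, this becomes

$$RC \frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$$

so the delay grows as the square of the distance x along the chain (delay ∝ x²).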


Fig. 1.39 Propagation Delays
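The quadratic growth can be checked with the Elmore delay of a lumped RC ladder. This is a minimal sketch, assuming each pass transistor is modeled as a resistance R followed by a node capacitance C to ground and that the chain is driven from an ideal source; the function name is our own.

```python
def elmore_chain_delay(n, R, C):
    """Elmore delay from the input to the far end of an n-section RC ladder.

    Each node capacitance C is charged through all the resistance between the
    source and that node, i.e. i * R for the i-th node.
    """
    delay = 0.0
    for i in range(1, n + 1):
        upstream_resistance = i * R
        delay += upstream_resistance * C
    return delay                      # equals R * C * n * (n + 1) / 2


for n in (1, 2, 4, 8, 16):
    print(n, elmore_chain_delay(n, 1.0, 1.0))
```

Doubling the chain from 8 to 16 transistors raises the delay from 36RC to 136RC, close to the factor of 4 expected when delay grows as the square of the chain length.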

1.16 STICK DIAGRAMS


A popular method of symbolic design is "Sticks" layout. In this method, the designer
draws a freehand sketch of a layout, using colored lines to represent the various
process layers such as diffusion, metal and polysilicon. Where polysilicon crosses
diffusion, transistors are created, and where metal wires join diffusion or polysilicon,
contacts are formed.
This notation indicates only the relative positioning of the various design
components. The absolute coordinates of these elements are determined
automatically by the editor using a compactor. The compactor translates the design
rules into a set of constraints on the component positions and solves a constrained
optimization problem that attempts to minimize the area or another cost function.
The advantage of this symbolic approach is that the designer does not have
to worry about design rules, because the compactor ensures that the final layout is
physically correct.
The disadvantage of the symbolic approach is that the outcome of the
compaction phase is often unpredictable. The resulting layout can be less dense
than what is obtained with the manual approach. In addition, a stick diagram does not
show exact placement, transistor sizes, wire lengths, wire widths, or tub boundaries.
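One classic way to turn design rules into positions is one-dimensional constraint-graph compaction. The sketch below is a simplified model of what a compactor might do, not the algorithm of any particular editor: each rule becomes a constraint x[b] ≥ x[a] + d, and the smallest legal coordinates are longest-path distances in the (assumed acyclic) constraint graph. Element names and spacing values in the example are hypothetical.

```python
from collections import defaultdict, deque


def compact(elements, constraints):
    """Minimum x-coordinates satisfying constraints (a, b, d): x[b] >= x[a] + d.

    Assumes the constraint graph is acyclic; cyclic graphs (e.g. from
    maximum-distance constraints) would need a Bellman-Ford style solver.
    """
    succ = defaultdict(list)
    indeg = {e: 0 for e in elements}
    for a, b, d in constraints:
        succ[a].append((b, d))
        indeg[b] += 1
    x = {e: 0 for e in elements}
    ready = deque(e for e in elements if indeg[e] == 0)
    visited = 0
    while ready:                          # topological order = longest path
        a = ready.popleft()
        visited += 1
        for b, d in succ[a]:
            x[b] = max(x[b], x[a] + d)
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    if visited != len(elements):
        raise ValueError("constraint graph has a cycle")
    return x


# Hypothetical spacing rules in units of lambda.
layout = compact(
    ["left_edge", "diff", "poly", "metal", "right_edge"],
    [("left_edge", "diff", 2), ("diff", "poly", 3), ("left_edge", "poly", 1),
     ("poly", "metal", 4), ("diff", "metal", 5), ("metal", "right_edge", 2)],
)
print(layout)
```

In the example, the cell width is set by the tightest chain of rules (left edge, diffusion, poly, metal, right edge), which is why a compacted layout can end up less dense than a hand layout that rearranges elements to break such chains.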

Figure 1.40: Stick Diagram of a CMOS Inverter

Fig. 1.41 Simplified λ based design rules


• Stick diagrams convey layer information through colour codes (or monochrome encoding).


Fig 1.42: Stick Diagram of a CMOS Inverter

Stick Diagrams – Notations

When two or more 'sticks' of the same type cross or touch each other, they are in
electrical contact.
1) Power and ground lines run horizontally in Metal 1.
2) The input and output are accessible from the top or bottom of the cell and run vertically
in Metal 2.
3) The drawing conventions used here are shown in the figure below: Metal 1 is drawn as a
thick solid line, Metal 2 as a thin solid line, polysilicon as a thick dashed line, active (n+ or
p+) as a thin dashed line, a contact as "X" and a via as "O".

Fig-Stick-Diagrams


1.17 LAYOUT DESIGN RULES


Layout rules, also referred to as design rules, can be considered a
prescription for preparing the photomasks that are used in the fabrication of
integrated circuits. The rules are defined in terms of feature sizes (widths),
separations, and overlaps.
The main objective of the layout rules is to build reliably functional circuits in
as small an area as possible. In general, design rules represent a compromise
between performance and yield. The more conservative the rules are, the more likely
it is that the circuit will function. However, the more aggressive the rules are, the
greater the opportunity for improvements in circuit performance and size.
Design rules specify to the designer certain geometric constraints on the
layout artwork so that the patterns on the processed wafer will preserve the topology
and geometry of the designs. It is important to note that design rules do not
represent some hard boundary between correct and incorrect fabrication. Rather,
they represent a tolerance that ensures high probability of correct fabrication and
subsequent operation.
1.17.1 Types Of Design Rules
The design rules primarily address two issues:
• The geometrical reproduction of features by the mask-making and lithographic
processes
• The interaction between different layers.
There are primarily two approaches to describing the design rules.
1. Scalable Design Rules (e.g. SCMOS, λ-based design rules):
In this approach, all rules are defined in terms of a single parameter λ. The rules are
chosen so that a design can be easily ported across a cross section of industrial
processes, making the layout portable. Scaling is done simply by changing the value
of λ.
The key disadvantages of this approach are:
• Linear scaling is possible only over a limited range of dimensions.
• Scalable design rules are conservative. This results in over-dimensioned and
less dense designs.
• In practice, these rules are seldom used for production designs.
2. Absolute Design Rules (e.g. µ-based design rules):
In this approach, the design rules are expressed in absolute dimensions (e.g.
0.75 µm) and therefore can exploit the features of a given process to a maximum
degree. Here, scaling and porting are more demanding and have to be performed
either manually or using CAD tools. Also, these rules tend to be more complex,
especially for deep submicron processes.
The fundamental unit in the definition of a set of design rules is the minimum line
width. It stands for the minimum mask dimension that can be safely transferred to the
semiconductor material. Even for the same minimum dimension, design rules tend to
differ from company to company, and from process to process.
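Porting a λ-based layout amounts to re-evaluating every rule with a new value of λ. The sketch below illustrates this; the rule names and λ multiples are hypothetical placeholders rather than values from Fig. 1.41 or any real rule deck, and λ is simply given in microns.

```python
# Hypothetical lambda rule set: names and multiples are placeholders, not a real deck.
LAMBDA_RULES = {
    "poly_min_width": 2,
    "poly_min_spacing": 2,
    "metal1_min_width": 3,
    "metal1_min_spacing": 3,
}


def rules_in_microns(lambda_um):
    """Evaluate every lambda-based rule for a process with the given lambda (um)."""
    return {name: multiple * lambda_um for name, multiple in LAMBDA_RULES.items()}


# The same layout database ported to two processes only by changing lambda.
for lam in (0.25, 0.125):
    print(f"lambda = {lam} um:", rules_in_microns(lam))
```

With λ = 0.25 µm the hypothetical Metal 1 width rule evaluates to 0.75 µm; changing λ to 0.125 µm halves every dimension at once. That single knob is what makes λ rules portable, and it is also why they cannot exploit process-specific dimensions the way absolute rules can.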
1.17.2 LAYER REPRESENTATIONS
With the increasing complexity of CMOS processes, visualizing all the mask levels
used in the actual fabrication process becomes difficult. The layer concept translates
these masks into a set of conceptual layout levels that are easier for the circuit
designer to visualize.


From the designer's viewpoint, all CMOS designs have the following entities:
• Two different substrates and/or wells, which are p-type for NMOS and n-type
for PMOS.
• Diffusion regions (p+ and n+), which define the areas where transistors can
be formed. These regions are also called active areas.
• Diffusion of the inverse type, which is needed to implement contacts to the well or
to the substrate. These are called select regions.
• Transistor gate electrodes: the polysilicon layer
• Metal interconnect layers
• Interlayer contacts and via layers.
The layers for typical CMOS processes are represented in various figures in terms
of:
• A color scheme (Mead-Conway colors)
• Other color schemes designed to differentiate CMOS structures
• Varying stipple patterns
• Varying line styles

Fig. 1.43 Figure representations of layers

1.17.3 Gate Layouts
For many applications, a straightforward layout is good enough and can be
automatically generated or rapidly built by hand. This section presents a simple
layout style based on a “line of diffusion” rule that is commonly used for standard
cells in automated layout systems.

This style consists of four horizontal strips: metal ground at the bottom of the cell,
n-diffusion, p-diffusion, and metal power at the top.

The power and ground lines are often called supply rails. Polysilicon lines run
vertically to form transistor gates. Metal wires within the cell connect the transistors
appropriately.


Fig. 1.44 Layout of an inverter


Figure 1.44 shows such a layout for an inverter. The input A can be connected from
the top, bottom, or left in polysilicon. The output Y is available at the right side of
the cell in metal. Recall that the p-substrate and n-well must be tied to ground and
power, respectively. An inverter has a single nMOS and a single pMOS transistor; in a
NAND gate drawn in the same style, the nMOS transistors would be connected in series
while the pMOS transistors are connected in parallel.

Part-A (2 marks)
1. What is Moore's law?
Moore's law states that the number of transistors on an integrated circuit doubles roughly
every 18 months to two years.

2. What is CMOS technology?

Complementary Metal Oxide Semiconductor (CMOS) is a technology in which both n-channel
MOS and p-channel MOS transistors are fabricated on the same IC.

3. What are the advantages of CMOS over NMOS technology?

In CMOS technology, the aluminum gates of the transistors are replaced by polysilicon
gates. The main advantage of CMOS over NMOS is low power consumption: in steady state
either the pull-up or the pull-down network is OFF, so apart from leakage no static current
flows from VDD to ground. CMOS device sizes can also be scaled more easily than NMOS.

4. What are the advantages of CMOS technology?

• Low power consumption.
• High performance.
• Scalable threshold voltage.
• High noise margin.

5. What are the disadvantages of CMOS technology?

• Low resistance to process deviations and temperature changes.
• Low switching speed at large values of capacitive loads.
• Low output drive current.

6. What is a design rule?

Design rules are the communication link between the designer specifying
requirements and the fabricator who materializes them. They are a set of geometric
constraints that specify the minimum allowable line widths for physical objects on chip,
such as metal and polysilicon interconnects or diffusion areas, the minimum feature
dimensions, and the minimum allowable separations between two layers.

7. What is a stick diagram?

A stick diagram is a symbolic layout sketch that conveys layer information through a
color code. It shows the relative placement of transistors, wires and contacts without
exact dimensions.

8. What is a micron design rule?

Micron rules state layout constraints, such as minimum feature sizes and minimum
allowable feature separations, in terms of absolute dimensions in micrometers.

9. What is a Lambda design rule?

Lambda rules state layout constraints, such as minimum feature sizes and minimum
allowable feature separations, in terms of a single parameter (λ), and thus allow linear,
proportional scaling of all geometrical constraints.

10. What is DRC?
A Design Rule Check (DRC) program looks for design rule violations in the layout. It checks
for minimum spacing and minimum size and ensures that combinations of layers form
legal components.
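The two most basic DRC checks, minimum width and minimum spacing, can be illustrated on axis-aligned rectangles of a single layer. This is only a sketch under simplifying assumptions (rectangles instead of polygons, one layer, shapes that touch treated as connected); the `Rect` type and function names are our own, and real DRC decks cover many more rule types.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float   # assumes x1 < x2 and y1 < y2


def width_violations(rects, min_width):
    """Rectangles whose narrower side is below min_width."""
    return [r for r in rects if min(r.x2 - r.x1, r.y2 - r.y1) < min_width]


def spacing(a, b):
    """Edge-to-edge (Euclidean) distance; 0 if the rectangles touch or overlap."""
    dx = max(b.x1 - a.x2, a.x1 - b.x2, 0.0)
    dy = max(b.y1 - a.y2, a.y1 - b.y2, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def spacing_violations(rects, min_space):
    """Pairs of separate shapes closer than min_space (touching shapes are merged)."""
    return [(a, b) for a, b in combinations(rects, 2)
            if 0.0 < spacing(a, b) < min_space]


metal1 = [Rect(0, 0, 3, 10), Rect(5, 0, 8, 10), Rect(9, 0, 10, 10)]
print(width_violations(metal1, 3))
print(spacing_violations(metal1, 3))
```

With a minimum width and spacing of 3 units, the third wire is too narrow, and both neighbouring pairs of wires are too close together.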

11. Mention the MOS transistor characteristics.

A Metal Oxide Semiconductor (MOS) transistor is treated as a three-terminal device having
a source, drain and gate (the body, or substrate, is normally tied to a supply).
The resistance of the path between the drain and the source is controlled by the voltage
applied to the gate. The normal conduction characteristics of a MOS transistor can be
categorized into the cutoff region, the nonsaturated region and the saturated region.

12. Compare NMOS and PMOS

13. Compare enhancement and depletion mode devices


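14. What is threshold voltage?

It is defined as the minimum gate-to-source voltage at which the device starts
conducting (i.e., turns on), that is, the voltage at which a conducting channel forms
between source and drain.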

15. What are the different operating modes of a MOS transistor?

• Accumulation mode
• Depletion mode
• Inversion mode

16. What is accumulation mode?

When the gate-to-source voltage (Vgs) is negative, well below the threshold voltage (Vt),
the negative charge on the gate attracts holes from the p-type substrate to the region
beneath the gate. This is termed accumulation mode. There is no conduction between
source and drain; the device is turned off.

17. What is depletion mode?

When a small positive gate-to-source voltage is applied (0 < Vgs < Vt), the positive
charge on the gate repels holes from the region beneath the gate, leaving a depletion
region of negatively charged acceptor ions. This is called depletion mode. The device is
still off.

18. What is inversion mode?

When Vgs is raised above Vt, more electrons are attracted to the region beneath the gate.
Under such a condition the surface of the underlying p-type silicon is said to be
inverted to n-type, and it provides a conduction path between source and drain. The
device is turned on. This is called inversion mode.

19. What are the three operating regions of a MOS transistor?

• Cut-off region
• Nonsaturated region
• Saturated region

20. What is the cut-off region?
The region where the current flow is essentially zero is called the cut-off region,
i.e., Ids = 0 for Vgs ≤ Vt.

21. What is body effect?

The threshold voltage Vt is not constant: it increases with the voltage difference between
the source and the substrate (body), Vsb. This dependence of Vt on Vsb is called the body
effect.

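22. Write the threshold voltage equation including the body effect.

$$V_t = V_{t0} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$

where $V_{t0}$ is the threshold voltage at $V_{sb} = 0$, γ is the body effect coefficient and
$\phi_s$ is the surface potential at threshold:

$$\gamma = \frac{t_{ox}}{\varepsilon_{ox}}\sqrt{2 q \varepsilon_{si} N_A}, \qquad \phi_s = 2 v_T \ln\frac{N_A}{n_i}$$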
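The body-effect expression can be evaluated numerically to see how much the threshold rises. This is a minimal sketch, assuming the standard square-root model Vt = Vt0 + γ(√(φs + Vsb) − √φs); the default parameter values are illustrative assumptions, not data for a real process.

```python
import math


def threshold_voltage(vsb, vt0=0.4, gamma=0.4, phi_s=0.88):
    """nMOS threshold voltage (V) for source-to-body voltage vsb >= 0 (V).

    vt0 (V), gamma (V^0.5) and phi_s (V) are assumed example values.
    """
    return vt0 + gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))


for vsb in (0.0, 0.5, 1.0, 1.5):
    print(f"Vsb = {vsb:.1f} V -> Vt = {threshold_voltage(vsb):.3f} V")
```

With these assumed values the threshold rises from 0.40 V at Vsb = 0 to about 0.57 V at Vsb = 1 V. This is why an nMOS pass transistor or pullup whose source rises toward VDD turns off even earlier than VDD − Vt0.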


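23. What is latchup? How can latchup be prevented? (May-16)

Latchup is a condition in which parasitic components (the parasitic pnp and npn bipolar
transistors formed by the wells, substrate and diffusions) give rise to low-resistance
conducting paths between VDD and VSS, with disastrous results. Careful control during
fabrication and layout is necessary to avoid this problem, for example by placing substrate
and well contacts (taps) close to the transistors to keep the parasitic well and substrate
resistances low.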
24. State channel length modulation. Write down the equation describing channel length
modulation in an NMOS transistor. (May-16, May-17)
Ideally, the current between the drain and source terminals in saturation is constant and
independent of the drain-to-source voltage. This is not entirely correct: the effective length
of the conductive channel is modulated by the applied Vds. Increasing Vds causes the
depletion region at the drain junction to grow, reducing the length of the effective channel,
so the saturation current increases slightly with Vds:

$$I_{ds} = \frac{\beta}{2}(V_{gs} - V_t)^2 (1 + \lambda V_{ds})$$

where λ is the channel length modulation coefficient (unrelated to the λ of design rules).
PART B
1. (ii) Discuss in detail, with a neat layout, the design rules for a CMOS inverter.

2. (i) Discuss in detail, with the necessary equations, the operation of a MOSFET and its
current-voltage characteristics.
3. (a) (i) An NMOS transistor has the following parameters: gate oxide thickness = 10
nm, relative permittivity of gate oxide = 3.9, electron mobility = 520 cm²/V·s,
threshold voltage = 0.7 V, permittivity of free space = 8.85 × 10⁻¹⁴ F/cm and (W/L)
= 8. Calculate the drain current when (VGS = 2 V and VDS = 1.2 V) and (VGS = 2 V
and VDS = 2 V), and also compute the gate oxide capacitance per unit area. Note
that W and L refer to the width and length of the channel respectively. (3 + 3 + 2) Dec
2011
4. Draw and explain the DC and transfer characteristics of a CMOS inverter with the
necessary conditions for the different regions of operation. (8) May-16
5. (a) Explain in detail the ideal I-V characteristics and non-ideal I-V
characteristics of NMOS and PMOS devices. (16) May-16
6. (a) Discuss the CV characteristics and DC transfer characteristics of CMOS. (16)
7. Discuss the principles of constant field scaling and lateral scaling. Write the
effects of the above scaling methods on device characteristics. May-16
8. Explain the dynamic behaviour of a MOSFET with a neat diagram. April/May
2018

9. Write the layout design rules and draw the diagrams for four-input NAND & NOR gates.
April/May 2018

10. Explain the basic principles of the transmission gate in CMOS design. April/May 2018
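Question 3 of Part B is a direct application of the long-channel (Shockley) model. The sketch below works it out; the choice of that model is our assumption, since the question gives no velocity-saturation or channel-length-modulation data.

```python
EPS0 = 8.85e-14        # F/cm, permittivity of free space (given)
EPS_R_OX = 3.9         # relative permittivity of the gate oxide (given)
T_OX = 10e-7           # cm (10 nm)
MU_N = 520.0           # cm^2/(V*s)
VT = 0.7               # V
W_OVER_L = 8.0

C_OX = EPS_R_OX * EPS0 / T_OX          # gate oxide capacitance, F/cm^2
BETA = MU_N * C_OX * W_OVER_L          # A/V^2


def drain_current(vgs, vds):
    """nMOS drain current (A) in the cutoff, nonsaturated or saturated region."""
    vov = vgs - VT                     # overdrive voltage Vgs - Vt
    if vov <= 0:
        return 0.0                     # cutoff
    if vds < vov:
        return BETA * (vov - vds / 2) * vds   # nonsaturated (linear)
    return BETA / 2 * vov ** 2         # saturated


print(f"Cox = {C_OX:.4e} F/cm^2 = {C_OX * 1e15 / 1e8:.3f} fF/um^2")
print(f"Id(2 V, 1.2 V) = {drain_current(2.0, 1.2) * 1e3:.3f} mA")
print(f"Id(2 V, 2.0 V) = {drain_current(2.0, 2.0) * 1e3:.3f} mA")
```

The oxide capacitance comes out at about 3.45 × 10⁻⁷ F/cm² (about 3.45 fF/µm²). With VGS − Vt = 1.3 V, the device is nonsaturated at VDS = 1.2 V (Id ≈ 1.21 mA) and saturated at VDS = 2 V (Id ≈ 1.21 mA). The two currents are close because 1.2 V is only slightly below the saturation voltage.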


UNIT 2
COMBINATIONAL LOGIC CIRCUITS
REFERENCE BOOKS:
1. Jan Rabaey, Anantha Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A
Design Perspective”, Second Edition, Prentice Hall of India, 2003.

2. M. J. Smith, “Application Specific Integrated Circuits”, Addison-Wesley, 1997.

3. N. Weste, K. Eshraghian, “Principles of CMOS VLSI Design”, Second Edition,
Addison-Wesley, 1993.

4. R. Jacob Baker, Harry W. Li, David E. Boyce, “CMOS Circuit Design, Layout
and Simulation”, Prentice Hall of India, 2005.

5. A. Pucknell, Kamran Eshraghian, “Basic VLSI Design”, Third Edition, Prentice Hall
of India, 2007.


I. CIRCUIT FAMILIES


• Static CMOS circuits with complementary nMOS pulldown and pMOS pullup
networks are used for the vast majority of logic gates in integrated circuits.

• They have good noise margins, and are fast, low power, insensitive to device
variations, easy to design, widely supported by CAD tools, and readily available in
standard cell libraries.

• When noise does exceed the margins, the gate delay increases because of the
glitch, but the gate eventually settles to the correct answer. Most design teams
now use static CMOS exclusively for combinational logic.

• The most important alternative is dynamic circuits. However, we begin by considering
ratioed circuits, which are simpler and offer a helpful conceptual transition between
static and dynamic. We also consider pass transistors, which were once widely used for
general-purpose logic and still appear in specialized applications.

1.1 STATIC CMOS

• Designers accustomed to AND and OR functions must learn to think in terms of
NAND and NOR to take advantage of static CMOS.

• In manual circuit design, this is often done through bubble pushing.

• Compound gates are particularly useful to perform complex functions with relatively
low logical efforts.

• When a particular input is known to be latest, the gate can be optimized to favor that
input.

• Similarly, when either the rising or falling edge is known to be more critical, the gate
can be optimized to favor that edge.

• We have focused on building gates with equal rising and falling delays; however,
using smaller pMOS transistors can reduce power, area, and delay.

• In processes with multiple threshold voltages, multiple flavors of gates can be
constructed with different speed/leakage power trade-offs.

1.1.1 Bubble Pushing

CMOS stages are inherently inverting, so AND and OR functions must be built from NAND
and NOR gates. DeMorgan's law helps with this conversion:

$$\overline{A \cdot B} = \overline{A} + \overline{B}, \qquad \overline{A + B} = \overline{A} \cdot \overline{B}$$

These relations are illustrated graphically in Figure 3.1. A NAND gate is equivalent to an OR
of inverted inputs. A NOR gate is equivalent to an AND of inverted inputs. The same
relationship applies to gates with more inputs. Switching between representations is easy to
do on a whiteboard and is often called bubble pushing.


FIGURE 3.1 Bubble pushing with DeMorgan’s law


EXAMPLE

Design a circuit to compute F = AB + CD using NANDs and NORs.

SOLUTION: By inspection, the circuit consists of two ANDs and an OR, shown in Figure (a).
In Figure (b), the ANDs and ORs are converted to basic CMOS stages. In Figure (c and d),
bubble pushing is used to simplify the logic to three NANDs.

FIGURE Bubble pushing to convert ANDs and ORs to NANDs and NORs
FIG. Logic using AOI22 gate
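Because the bubble-pushed forms must be logically identical, the example can be checked exhaustively over all 16 input combinations. This is a minimal sketch; the gate helper functions are our own illustrative definitions.

```python
from itertools import product


def nand(*xs):
    return int(not all(xs))


def nor(*xs):
    return int(not any(xs))


def inv(x):
    return 1 - x


def aoi22(a, b, c, d):
    """AND-OR-INVERT-22: Y = not(AB + CD)."""
    return inv((a & b) | (c & d))


for a, b, c, d in product((0, 1), repeat=4):
    f = (a & b) | (c & d)                      # reference: two ANDs and an OR
    assert nand(nand(a, b), nand(c, d)) == f   # three NAND2 gates
    assert inv(aoi22(a, b, c, d)) == f         # AOI22 followed by an inverter
    # DeMorgan: NAND = OR of inverted inputs, NOR = AND of inverted inputs
    assert nand(a, b) == (inv(a) | inv(b))
    assert nor(a, b) == (inv(a) & inv(b))
print("all 16 input combinations agree")
```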

1.1.2 Compound Gates

Static CMOS also efficiently handles compound gates computing various inverting
combinations of AND/OR functions in a single stage.

The function F = AB + CD can be computed with an AND-OR-INVERT-22 (AOI22) gate and
an inverter, as shown in Figure 3.2.

In general, logical effort of compound gates can be different for different inputs.

Figure 3.2 shows how logical efforts can be estimated for the AOI21, AOI22, and a more
complex compound AOI gate. The transistor widths are chosen to give the same drive as a
unit inverter. The logical effort of each input is the ratio of the input capacitance of that input
to the input capacitance of the inverter.

For the AOI21 gate, this means the logical effort is slightly lower for the OR terminal (C) than
for the two AND terminals (A, B). The parasitic delay is crudely estimated from the total
diffusion capacitance on the output node by summing the sizes of the transistors attached to
the output.
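The estimation rule above can be applied mechanically with exact fractions. This sketch assumes µn/µp = 2 (unit inverter: nMOS 1, pMOS 2, input capacitance 3) and the usual AOI21 sizing for unit drive: A and B nMOS in series (width 2 each), C nMOS alone (width 1), A and B pMOS in parallel in series with the C pMOS (width 4 each), with the C pMOS adjacent to the output. These widths are assumed here rather than read from Figure 3.2.

```python
from fractions import Fraction

INV_CIN = 3                                  # unit inverter: nMOS 1 + pMOS 2

# input -> (nMOS width, pMOS width)
aoi21 = {"A": (2, 4), "B": (2, 4), "C": (1, 4)}

# Widths whose diffusion touches the output node: the top series nMOS (2),
# the C nMOS (1) and the C pMOS (4).
output_diffusion = 2 + 1 + 4

logical_effort = {name: Fraction(n + p, INV_CIN) for name, (n, p) in aoi21.items()}
parasitic_delay = Fraction(output_diffusion, INV_CIN)
print(logical_effort, parasitic_delay)
```

Under these assumptions g = 2 for inputs A and B, g = 5/3 for input C, and p = 7/3. This shows numerically why the OR input sees a slightly lower logical effort.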


FIGURE 3.2 Logical efforts and parasitic delays of AOI gates

1.1.3 Input Ordering Delay Effect

The logical effort and parasitic delay of different gate inputs are often different.

Some logic gates, like the AOI21 are inherently asymmetric in that one input sees less
capacitance than another.

Other gates, like NANDs and NORs, are nominally symmetric but actually have slightly
different logical effort and parasitic delays for the different inputs.

Figure 3.3 shows a 2-input NAND gate annotated with diffusion parasitics. Consider
the falling output transition that occurs when one input holds a stable 1 value and the other
rises from 0 to 1. If input B rises last, node x will initially be at VDD − Vt ≈ VDD because it was
pulled up through the nMOS transistor on input A.

The Elmore delay is (R/2)(2C) + R(6C) = 7RC ≈ 2.33τ (with τ = 3RC). On the other hand, if
input A rises last, node x will initially be at 0 V because it was discharged through the nMOS
transistor on input B. No charge must be delivered to node x, so the Elmore delay is simply
R(6C) = 6RC = 2τ.

In general, we define the outer input to be the input closer to the supply rail (e.g., B)
and the inner input to be the input closer to the output (e.g., A). The parasitic delay is
smallest when the inner input switches last because the intermediate nodes have already
been discharged.

Therefore, if one signal is known to arrive later than the others, the gate is fastest
when that signal is connected to the inner input.

The logical efforts are lower than initial estimates might predict because of velocity
saturation. Interestingly, the inner input has a slightly higher logical effort because the
intermediate node x tends to rise and cause negative feedback when the inner input turns
ON.


FIGURE 3.3 NAND gate delay estimation
FIGURE Paths with transistor widths

1.1.4 Asymmetric Gates

When one input is far less critical than another, even nominally symmetric gates can be
made asymmetric to favor the late input at the expense of the early one.

• In a series network, this involves connecting the early input to the outer transistor and
making that transistor wider so that it offers less series resistance when the critical
input arrives.

• In a parallel network, the early input is connected to a narrower transistor to reduce
the parasitic capacitance.

For example, consider the path in Figure 3.4(a). Under ordinary conditions, the path acts as
a buffer between A and Y. When reset is asserted, the path forces the output low. If reset
only occurs under exceptional circumstances and can take place slowly, the circuit should be
optimized for input-to-output delay at the expense of reset.

This can be done with the asymmetric NAND gate in Figure 3.4(b). The pulldown resistance
is R/4 + R/(4/3) = R, so the gate still offers the same drive as a unit inverter.

However, the capacitance on input A is only 10/3, so the logical effort is 10/9. This is
better than 4/3, which is normally associated with a NAND gate. In the limit of an infinitely
large reset transistor and unit-sized nMOS transistor for input A, the logical effort
approaches 1, just like an inverter.

The improvement in logical effort of input A comes at the cost of much higher effort
on the reset input. Note that the pMOS transistor on the reset input is also shrunk. This
reduces its diffusion capacitance and parasitic delay at the expense of slower response to
reset.
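The numbers for the asymmetric NAND can be verified with exact fractions. This sketch assumes resistance in units of R for a width-1 nMOS and µn/µp = 2, with widths: reset nMOS 4 (outer), A nMOS 4/3 (inner). The A pMOS width of 2 is inferred from the stated input capacitance of 10/3 rather than read from the figure, and the helper names are our own.

```python
from fractions import Fraction as Fr


def nmos_r(width):
    """Effective resistance (units of R) of an nMOS of the given width."""
    return 1 / Fr(width)


def logical_effort(c_in, inv_c_in=3):
    """Input capacitance relative to a unit inverter (nMOS 1 + pMOS 2)."""
    return Fr(c_in) / inv_c_in


w_reset, w_a_n, w_a_p = Fr(4), Fr(4, 3), Fr(2)
pulldown_r = nmos_r(w_reset) + nmos_r(w_a_n)   # series stack: R/4 + 3R/4
g_a = logical_effort(w_a_n + w_a_p)            # input A: 10/3 over 3
g_nand_unit = logical_effort(Fr(2) + Fr(2))    # ordinary NAND2: nMOS 2, pMOS 2
print(pulldown_r, g_a, g_nand_unit)
```

The check confirms unit-inverter drive (pulldown resistance R), a logical effort of 10/9 for input A, and 4/3 for an ordinary NAND2.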

CMOS transistors are usually velocity saturated, and thus series transistors carry
more current than the long-channel model would predict. For asymmetric gates, the
equivalent width is that of the inner (narrower) transistor.


FIGURE3.4 Resettable buffer optimized for data input

1.1.5 Skewed Gates

In other cases, one input transition is more important than the other. We define:

• HI-skew gates to favor the rising output transition
• LO-skew gates to favor the falling output transition.

This favoring can be done by decreasing the size of the noncritical transistor. The
logical efforts for the rising (up) and falling (down) transitions are called g_u and g_d,
respectively, and are the ratio of the input capacitance of the skewed gate to the input
capacitance of an unskewed inverter with equal drive for that transition.

Figure 3.5(a) shows how a HI-skew inverter is constructed by downsizing the nMOS
transistor. This maintains the same effective resistance for the critical transition while
reducing the input capacitance relative to the unskewed inverter of Figure 3.5(b), thus
reducing the logical effort on that critical transition to g_u = 2.5/3 = 5/6.

The improvement comes at the expense of the effort on the noncritical transition. The
logical effort for the falling transition is estimated by comparing the inverter to a smaller
unskewed inverter with equal pulldown current, shown in Figure 3.5(c), giving a logical effort
of g_d = 2.5/1.5 = 5/3.

The degree of skewing (e.g., the ratio of effective resistance for the fast transition
relative to the slow transition) impacts the logical efforts and noise margins; a factor of two is
common. Figure 3.6 catalogs HI-skew and LO-skew gates with a skew factor of two.

Skewed gates are sometimes denoted with an H or an L on their symbol in a schematic.
Alternating HI-skew and LO-skew gates can be used when only one transition is
important.
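The skewed-inverter calculation generalizes to any skew factor. This sketch assumes µn/µp = 2 (unit inverter: nMOS 1, pMOS 2) and a HI-skew inverter that keeps the pMOS at width 2 while shrinking the nMOS by a skew factor k; `hi_skew_inverter` is our own helper name.

```python
from fractions import Fraction


def hi_skew_inverter(k):
    """(g_u, g_d) of a HI-skew inverter with nMOS width 1/k and pMOS width 2."""
    wn, wp = Fraction(1, k), Fraction(2)
    c_in = wn + wp
    # Rising output: compare with the unskewed inverter of equal pullup (1, 2).
    g_u = c_in / 3
    # Falling output: compare with the unskewed inverter of equal pulldown (wn, 2*wn).
    g_d = c_in / (3 * wn)
    return g_u, g_d


for k in (1, 2, 4):
    print(k, hi_skew_inverter(k))
```

A skew factor of 2 reproduces g_u = 5/6 and g_d = 5/3 from the text, and k = 1 gives back the unskewed inverter (g_u = g_d = 1). Larger k keeps lowering g_u while g_d climbs quickly.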

FIGURE 3.5 Catalog of skewed gates


FIGURE 3.6 Catalog of skewed gates

1.1.6 P/N Ratios

Notice in Figure 3.6 that the average logical effort of the LO-skew NOR2 is actually better
than that of the unskewed gate. The pMOS transistors in the unskewed gate are enormous
in order to provide equal rise delay. They contribute input capacitance for both transitions,
while only helping the rising delay.

By accepting a slower rise delay, the pMOS transistors can be downsized to reduce input
capacitance and average delay significantly.

In general, what is the best P/N ratio for logic gates (i.e., the ratio of pMOS to nMOS
transistor width)? For processes with a mobility ratio of µn/µp = 2, as we have generally been
assuming, the best ratios are shown in Figure 3.7.

FIGURE 3.7 Gates with P/N ratios giving least delay

Some paths can be slower than average if they trigger the worst edge of each gate.
Excessively slow rising outputs can also cause hot electron degradation. And reducing the
pMOS size also moves the switching point lower and reduces the inverter’s noise margin.


In summary, the P/N ratio of a library of cells should be chosen on the basis of area,
power, and reliability, not average delay.

For NOR gates, reducing the size of the pMOS transistors significantly improves both
delay and area. In most standard cell libraries, the pitch of the cell determines the P/N ratio
that can be achieved in any particular gate. Ratios of 1.5–2 are commonly used for inverters.

1.1.7 Multiple Threshold Voltages

Some CMOS processes offer two or more threshold voltages. Transistors with lower
threshold voltages produce more ON current, but also leak exponentially more OFF current.

1.2 RATIOED CIRCUITS

• Ratioed circuits depend on the proper size or resistance of devices for correct
operation.

• As shown in Figure 3.8, a ratioed gate conceptually consists of an nMOS pulldown
network and some pullup device called the static load.

• When the pulldown network is OFF, the static load pulls the output to 1.
• When the pulldown network turns ON, it fights the static load.

• The static load must be weak enough that the output pulls down to an acceptable 0.
Hence, there is a ratio constraint between the static load and the pulldown network.

• Stronger static loads produce faster rising outputs, but increase VOL, degrade the
noise margin, and burn more static power when the output should be 0.

FIGURE 3.8 nMOS ratioed gates

• CMOS logic eventually displaced nMOS logic because the static power became
unacceptable as the number of gates increased.

However, ratioed circuits are occasionally still useful in special applications. A resistor is a
simple static load, but large resistors consume a large layout area in typical MOS processes.

Another technique is to use an nMOS transistor with the gate tied to VGG. If VGG = VDD, the nMOS transistor will only pull up to VDD – Vt. Worse yet, the threshold is increased by the body effect.

Thus, using VGG > VDD was attractive. To eliminate this extra supply voltage, some nMOS processes offered depletion mode transistors. These transistors, indicated with the thick bar, are identical to ordinary enhancement mode transistors except that an extra ion implantation was performed to create a negative threshold voltage. The depletion mode pullups have their gate wired to the source, so Vgs = 0 and the transistor is always weakly ON.

1.2.1 Pseudo-nMOS

Figure 3.9(a) shows a pseudo-nMOS inverter. Neither high-value resistors nor depletion mode transistors are readily available as static loads in most CMOS processes. Instead, the static load is built from a single pMOS transistor that has its gate grounded so it is always ON.

FIGURE 3.9 Pseudo-nMOS inverter and DC transfer characteristics

The DC transfer characteristics are derived by finding Vout for which Idsn = |Idsp| for a given Vin, as shown in Figure 3.9(b–c) for a 180 nm process.

The beta ratio affects the shape of the transfer characteristics and the VOL of the
inverter. Larger relative pMOS transistor sizes offer faster rise times but less sharp transfer
characteristics.
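A hand calculation makes the ratio constraint and the effect of the beta ratio on VOL concrete. This sketch assumes the long-channel square-law transistor model, which these notes do not derive. It also assumes the input is held at VDD and uses hypothetical process values. The name `pseudo_nmos_vol` is illustrative.

```python
import math


def pseudo_nmos_vol(vdd, vtn, vtp_abs, beta_ratio):
    """Low output voltage of a pseudo-nMOS inverter with Vin = VDD.

    Long-channel (Shockley) model, assumed for illustration:
      nMOS (Vgs = VDD) in the linear region:
          I_n = beta_n * ((VDD - Vtn) * Vout - Vout**2 / 2)
      pMOS (gate grounded) in saturation:
          I_p = beta_p / 2 * (VDD - |Vtp|)**2
    beta_ratio = beta_p / beta_n. Setting I_n = I_p and taking the
    smaller root of the resulting quadratic gives VOL.
    """
    a = vdd - vtn
    disc = a ** 2 - beta_ratio * (vdd - vtp_abs) ** 2
    if disc < 0:
        raise ValueError("pMOS too strong: no valid low output level")
    vol = a - math.sqrt(disc)
    if vol > vtp_abs:
        # The pMOS is only saturated while VDD - VOL >= VDD - |Vtp|.
        raise ValueError("pMOS leaves saturation; model not valid")
    return vol


# Hypothetical 1.8 V process, |Vt| = 0.4 V, pMOS 1/6 to 1/3 as strong as the nMOS.
for ratio in (1 / 6, 1 / 4, 1 / 3):
    print(f"beta_p/beta_n = {ratio:.3f}: VOL = {pseudo_nmos_vol(1.8, 0.4, 0.4, ratio):.3f} V")
```

In this example, raising βp/βn from 1/6 to 1/3 roughly doubles VOL (about 0.12 V to 0.26 V). This is the noise-margin cost of a stronger, faster pullup.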

Figure 3.9(d) shows that when the nMOS transistor is turned on, a static DC current flows in
the circuit.

Figure 3.10 shows several pseudo-nMOS logic gates. The pulldown network is like that of an ordinary static gate, but the pullup network has been replaced with a single pMOS transistor whose gate is grounded so it is always ON.

The pMOS transistor widths are selected to be about 1/4 the strength (i.e., 1/2 the
effective width) of the nMOS pulldown network as a compromise between noise margin and
speed; this best size is process-dependent, but is usually in the range of 1/3 to 1/6.

FIGURE 3.10 Pseudo-nMOS logic gates

To calculate the logical effort of pseudo-nMOS gates, suppose a complementary CMOS unit inverter delivers current I in both rising and falling transitions. For the widths shown, the pMOS transistors produce I/3 and the nMOS networks produce 4I/3.

The logical effort for each transition is computed as the ratio of the input capacitance
to that of a complementary CMOS inverter with equal current for that transition. For the
falling transition, the pMOS transistor effectively fights the nMOS pulldown. The output
current is estimated as the pulldown current minus the pullup current, (4I/3 – I/3) =I.

Because this net falling current equals that of a unit inverter, we compare each gate to a unit inverter to calculate gd, the logical effort of the falling transition.

For example, the logical effort for a falling transition of the pseudo-nMOS inverter is the ratio of its input capacitance (4/3) to that of a unit complementary CMOS inverter (3), i.e., 4/9. The rising logical effort gu is three times as great (4/3) because the pullup current is only 1/3 as much.

The parasitic delay is also found by counting output capacitance and comparing it to an inverter with equal current. For example, the 2-input pseudo-nMOS NOR has 10/3 units of diffusion capacitance, compared with 3 for a unit-sized complementary CMOS inverter, so its parasitic delay pulling down is 10/9.

The pullup current is 1/3 as great, so the parasitic delay pulling up is 10/3. As can be
seen, pseudo-nMOS is slower on average than static CMOS for NAND structures. However,
pseudo-nMOS works well for NOR structures.

The logical effort is independent of the number of inputs in wide NORs, so pseudo-
nMOS is useful for fast wide NOR gates or NOR-based structures like ROMs and PLAs
when power permits.
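The pseudo-nMOS bookkeeping can be scripted. This sketch uses the widths from the text: the pMOS has width 2/3 and each nMOS has width 4/3. It assumes output diffusion capacitance is the sum of the widths of the transistors on the output node, which reproduces the 10/3 figure quoted for the NOR2. The function name is hypothetical, and n_inputs = 1 is the inverter.

```python
from fractions import Fraction

UNIT_INV_CAP = 3                 # unit inverter input and diffusion capacitance
PULLUP_WIDTH = Fraction(2, 3)    # always-ON pMOS, delivers I/3
PULLDOWN_WIDTH = Fraction(4, 3)  # each nMOS, delivers 4I/3


def pseudo_nmos_nor(n_inputs):
    """Logical effort g and parasitic delay p (up/down) of a pseudo-nMOS NOR.

    Each transition is compared with a unit inverter delivering the same current.
    """
    i_up = Fraction(1, 3)                     # pullup current alone
    i_down = Fraction(4, 3) - Fraction(1, 3)  # pulldown minus contending pullup = I
    c_in = PULLDOWN_WIDTH                     # each input drives a single nMOS
    c_out = n_inputs * PULLDOWN_WIDTH + PULLUP_WIDTH
    g = {"up": c_in / (UNIT_INV_CAP * i_up), "down": c_in / (UNIT_INV_CAP * i_down)}
    p = {"up": c_out / (UNIT_INV_CAP * i_up), "down": c_out / (UNIT_INV_CAP * i_down)}
    return g, p


for n in (1, 2, 4, 8):
    g, p = pseudo_nmos_nor(n)
    print(f"NOR{n}: g_up={g['up']}, g_down={g['down']}, "
          f"p_up={p['up']}, p_down={p['down']}")
```

The logical effort stays at 4/3 (rising) and 4/9 (falling) for any number of inputs, while the parasitic delay grows with each added nMOS drain.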

Pseudo-nMOS gates will not operate correctly if VOL > VIL of the receiving gate. This is most likely in the SF design corner, where nMOS transistors are slow (weak) and pMOS transistors are fast (strong).

Designing for acceptable noise margin in the SF corner forces a conservative choice
of weak pMOS transistors in the normal corner. A biasing circuit can be used to reduce
process sensitivity, as shown in Figure 3.11.

The goal of the biasing circuit is to create a Vbias that causes P2 to deliver 1/3 the
current of N2, independent of the relative mobilities of the pMOS and nMOS transistors.
Transistor N2 has width of 3/2 and hence produces current 3I/2 when ON.

Transistor N1 is tied ON to act as a current source with 1/3 the current of N2, i.e., I/2.
P1 acts as a current mirror using feedback to establish the bias voltage sufficient to provide
equal current as N1, I/2. The size of P1 is noncritical so long as it is large enough to produce
sufficient current and is equal in size to P2.

Now, P2 ideally also provides I/2. In summary, when A is low, the pseudo-nMOS
gate pulls up with a current of I/2. When A is high, the pseudo-nMOS gate pulls down with
an effective current of (3I/2 – I/2) =I. To first order, this biasing technique sets the relative
currents strictly by transistor widths, independent of relative pMOS and nMOS mobilities.

FIGURE 3.11 Replica biasing of pseudo-nMOS gates

Such replica biasing permits the 1/3 current ratio rather than the conservative ¼ ratio in the
previous circuits, resulting in lower logical effort.

The bias voltage Vbias can be distributed to multiple pseudo-nMOS gates. Ideally, Vbias will adjust itself to keep VOL constant across process corners. Unfortunately, the currents through the two pMOS transistors do not exactly match because their drain voltages are unequal, so this technique still has some process sensitivity. Also note that this bias is relative to VDD, so any noise on either the bias voltage line or the VDD supply rail will impact circuit performance.

1.2.2 Ganged CMOS

Figure 3.12 illustrates a pair of CMOS inverters ganged together. The truth table is given in Table 2, showing that the pair computes the NOR function. Such a circuit is sometimes called a symmetric 2-input NOR, or more generally, ganged CMOS.

• When one input is 0 and the other 1, the gate can be viewed as a pseudo-nMOS circuit with appropriate ratio constraints.
• When both inputs are 0, both pMOS transistors turn on in parallel, pulling the output high faster than the single pullup of an ordinary pseudo-nMOS gate.

• Moreover, when both inputs are 1, both pMOS transistors turn OFF, saving static power dissipation.

As in pseudo-nMOS, the transistors are sized so the pMOS are about 1/4 the
strength of the nMOS and the pulldown current matches that of a unit inverter. Hence, the
symmetric NOR achieves both better performance and lower power dissipation than a 2-
input pseudo-nMOS NOR.

FIGURE 3.12 Symmetric 2-input NOR gate

TABLE 2 Operation of symmetric NOR
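The contents of Table 2 are not reproduced in these notes, so this sketch regenerates the operation described in the text at the switch level. It is an idealized model that assumes the ratio constraint is satisfied, so any conducting nMOS wins. It ignores analog output levels. The function name is hypothetical.

```python
def symmetric_nor2(a, b):
    """Switch-level view of two ganged inverters with shorted outputs.

    Each input drives one pMOS (ON when the input is 0) and one nMOS
    (ON when the input is 1). Returns (output, pmos_on_count, static_current).
    """
    pmos_on = (a == 0) + (b == 0)
    nmos_on = (a == 1) + (b == 1)
    out = 0 if nmos_on else 1                     # ratioed: any ON nMOS wins
    static_current = pmos_on > 0 and nmos_on > 0  # pullup fights pulldown
    return out, pmos_on, static_current


print("A B | Y  pMOS-ON  static current")
for a in (0, 1):
    for b in (0, 1):
        y, n_p, i_static = symmetric_nor2(a, b)
        print(f"{a} {b} | {y}  {n_p}        {'yes' if i_static else 'no'}")
```

The two mixed-input rows are the pseudo-nMOS cases with contention. The all-zeros row has both pullups in parallel, and the all-ones row draws no static current.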

1.3 Cascode Voltage Switch Logic

• Cascode Voltage Switch Logic (CVSL) seeks the benefits of ratioed circuits without the static power consumption.

• It uses both true and complementary input signals and computes both true and complementary outputs using a pair of nMOS pulldown networks, as shown in Figure 3.13(a).

• The pulldown network f implements the logic function as in a static CMOS gate, while the complementary network f̄ uses inverted inputs feeding transistors arranged in the conduction complement.

• For any given input pattern, one of the pulldown networks will be ON and the other OFF. The pulldown network that is ON will pull that output low. This low output turns ON the pMOS transistor to pull the opposite output high. When the opposite output rises, the other pMOS transistor turns OFF, so no static power dissipation occurs.
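The complementary-network property can be checked exhaustively. This sketch treats each pulldown network as a Boolean "conducts?" function built from series and parallel switch combinations, which is a common switch-level abstraction assumed here. For the AND/NAND case, it verifies that exactly one network conducts for every input pattern. The helper names are hypothetical.

```python
from itertools import product


def series(*switches):
    """Series connection conducts only if every element conducts."""
    return lambda env: all(s(env) for s in switches)


def parallel(*switches):
    """Parallel connection conducts if any element conducts."""
    return lambda env: any(s(env) for s in switches)


def nmos(name, inverted=False):
    """nMOS switch driven by signal `name`, or by its complement if inverted."""
    return lambda env: (not env[name]) if inverted else bool(env[name])


# AND/NAND: f is the series stack A*B; f_bar uses inverted inputs in parallel.
f_net = series(nmos("A"), nmos("B"))
fbar_net = parallel(nmos("A", inverted=True), nmos("B", inverted=True))

for a, b in product((0, 1), repeat=2):
    env = {"A": a, "B": b}
    on_f, on_fbar = f_net(env), fbar_net(env)
    assert on_f != on_fbar, "exactly one network must conduct"
    print(f"A={a} B={b}: f network {'ON' if on_f else 'OFF'}, "
          f"f_bar network {'ON' if on_fbar else 'OFF'}")
```

The check is De Morgan's law in circuit form. Series true inputs and parallel complemented inputs never conduct at the same time, and they are never both OFF.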

• Figure 3.13(b) shows a CVSL AND/NAND gate. Observe how the pulldown networks are complementary, with parallel transistors in one and series in the other.

• Figure 3.13(c) shows a 4-input XOR gate. The pulldown networks share the A and Ā transistors to reduce the transistor count by two.

Advantages and Drawbacks

• CVSL has a potential speed advantage because all of the logic is performed with nMOS transistors, thus reducing the input capacitance.

• As in pseudo-nMOS, the size of the pMOS transistor is important. It fights the pulldown network, so a large pMOS transistor will slow the falling transition.

• Unlike pseudo-nMOS, the feedback tends to turn off the pMOS, so the outputs will eventually settle to a legal logic level. However, a small pMOS transistor is slow at pulling the complementary output high.

• In addition, the CVSL gate requires both the low- and high-going transitions, adding more delay. Contention current during the switching period also increases power consumption.

• Pseudo-nMOS works well for wide NOR structures. Unfortunately, CVSL also requires the complement, a slow, tall NAND structure. Therefore, CVSL is poorly suited to general NAND and NOR logic.

FIGURE 3.13 CVSL gates

1.4 Dynamic Circuits

Ratioed circuits reduce the input capacitance by replacing the pMOS transistors
connected to the inputs with a single resistive pullup.

Drawbacks

The drawbacks of ratioed circuits include

• Slow rising transitions,
• Contention on the falling transitions,
• Static power dissipation, and
• A nonzero VOL.

Dynamic circuits circumvent these drawbacks by using a clocked pullup transistor rather
than a pMOS that is always ON.

Figure 3.14 compares (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters.

FIGURE 3.14 Comparison of (a) static CMOS, (b) pseudo-nMOS, and (c) dynamic inverters

Dynamic circuit operation is divided into two modes, as shown in Figure 3.15.

• During precharge, the clock φ is 0, so the clocked pMOS is ON and initializes the output Y high.
• During evaluation, the clock is 1 and the clocked pMOS turns OFF.
• The output may remain high or may be discharged low through the pulldown network.

Dynamic circuits are the fastest commonly used circuit family because they have lower input
capacitance and no contention during switching. They also have zero static power
dissipation.

However, they require careful clocking, consume significant dynamic power, and are
sensitive to noise during evaluation.

In Figure 3.14(c), if the input A is 1 during precharge, contention will take place because
both the pMOS and nMOS transistors will be ON.

When the input cannot be guaranteed to be 0 during precharge, an extra clocked evaluation transistor can be added to the bottom of the nMOS stack to avoid contention, as shown in Figure 3.15(b).

The extra transistor is sometimes called a foot. Figure 3.15(c) shows generic footed and unfooted gates.

FIGURE 3.15 (a) Precharge and evaluation of dynamic gates, (b) footed dynamic inverter, (c) generalized footed and unfooted dynamic gates

Figure 3.16 estimates the falling logical effort of both footed and unfooted dynamic
gates. As usual, the pulldown transistors’ widths are chosen to give unit resistance.
Precharge occurs while the gate is idle and often may take place more slowly.

FIGURE 3.16 Catalog of dynamic gates

Therefore, the precharge transistor width is chosen for twice unit resistance. This reduces
the capacitive load on the clock and the parasitic capacitance at the expense of greater
rising delays.

We see that the logical efforts are very low. Footed gates have higher logical effort than their
unfooted counterparts but are still an improvement over static logic.
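The catalog values can be reproduced from the sizing rule that the pulldown transistors give unit resistance. In a k-high stack, counting the foot, each nMOS is made k wide, and only the input's own transistor loads that input. This sketch does that bookkeeping with the unit complementary inverter (input capacitance 3) as the reference. The function name is hypothetical.

```python
from fractions import Fraction


def dynamic_falling_effort(stack_height, footed):
    """Falling logical effort of one input of a dynamic gate (sketch).

    stack_height -- series input nMOS on the pulldown path, excluding the foot
    footed       -- True if a clocked foot transistor is also in series
    Every device in a k-high stack is k wide for unit resistance; the
    precharge pMOS does not load the inputs.
    """
    k = stack_height + (1 if footed else 0)
    return Fraction(k, 3)


gates = {"inverter": 1, "NAND2": 2, "NOR2": 1, "NOR8": 1}
for name, height in gates.items():
    print(f"{name:8s} unfooted g_d={dynamic_falling_effort(height, False)}, "
          f"footed g_d={dynamic_falling_effort(height, True)}")
```

NOR8 comes out the same as NOR2, consistent with dynamic gates being well suited to wide NOR functions. The foot costs one extra series device on every path.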

Like pseudo-nMOS gates, dynamic gates are particularly well suited to wide NOR functions
or multiplexers because the logical effort is independent of the number of inputs.

Monotonicity

A fundamental difficulty with dynamic circuits is the monotonicity requirement.

While a dynamic gate is in evaluation, the inputs must be monotonically rising. That is, the
input can start LOW and remain LOW, start LOW and rise HIGH, start HIGH and remain
HIGH, but not start HIGH and fall LOW.

Figure 3.17 shows waveforms for a footed dynamic inverter in which the input violates
monotonicity.

During precharge, the output is pulled HIGH. When the clock rises, the input is HIGH
so the output is discharged LOW through the pulldown network, as you would want to have
happen in an inverter.

The input later falls LOW, turning off the pulldown network. However, the precharge
transistor is also OFF so the output floats, staying LOW rather than rising as it would in a
normal inverter. The output will remain low until the next precharge step.

In summary, the inputs must be monotonically rising for the dynamic gate to compute
the correct function. Unfortunately, the output of a dynamic gate begins HIGH and
monotonically falls LOW during evaluation.

FIGURE 3.17 Monotonicity problem
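A switch-level, step-by-step sketch reproduces the monotonicity failure. It assumes ideal switches and a floating node that holds its last value perfectly, so leakage is ignored. The waveforms are hypothetical samples, one per time step.

```python
def footed_dynamic_inverter(clock, inp):
    """Switch-level simulation of a footed dynamic inverter.

    clock, inp -- equal-length lists of 0/1 samples
    While clock is 0 the output precharges to 1. While clock is 1 the output
    is pulled to 0 if the input is 1; otherwise it floats and keeps its value.
    """
    out, y = [], 1
    for phi, a in zip(clock, inp):
        if phi == 0:
            y = 1  # precharge (foot OFF, so no contention)
        elif a == 1:
            y = 0  # evaluate: pulldown conducts
        # else: the node floats and holds its previous value
        out.append(y)
    return out


clock = [0, 1, 1, 1, 0]
good = [0, 0, 1, 1, 0]  # input rises during evaluation (monotonic)
bad = [0, 1, 1, 0, 0]   # input falls during evaluation (violates monotonicity)
print("monotonic input :", footed_dynamic_inverter(clock, good))
print("non-monotonic   :", footed_dynamic_inverter(clock, bad))
print("ideal inverter  :", [1 - a for a in bad])
```

In the non-monotonic case, the output stays low after the input falls, where a static inverter would rise. It recovers only at the next precharge.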

This monotonically falling output X is not a suitable input to a second dynamic gate
expecting monotonically rising signals, as shown in Figure 3.18. Dynamic gates sharing the
same clock cannot be directly connected.

FIGURE 3.18 Incorrect connection of dynamic gates

1.4.1 Domino Logic

The monotonicity problem can be solved by placing a static CMOS inverter between
dynamic gates, as shown in Figure 3.19(a).

This converts the monotonically falling output into a monotonically rising signal
suitable for the next gate, as shown in Figure 3.19(b).

The dynamic-static pair together is called a domino gate because precharge resembles setting up a chain of dominos and evaluation causes the gates to fire like dominos tipping over, each triggering the next.

A single clock can be used to precharge and evaluate all the logic gates within the
chain. The dynamic output is monotonically falling during evaluation, so the static inverter
output is monotonically rising.

Therefore, the static inverter is usually a HI-skew gate to favor this rising output.
Observe that precharge occurs in parallel, but evaluation occurs sequentially. The symbols
for the dynamic NAND, HI-skew inverter, and domino AND are shown in Figure 3.19(c).
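A unit-delay sketch shows why the static inverter fixes the problem. The model makes several simplifying assumptions:

- switches are ideal, and each stage has one time step of delay;
- the primary input is low during precharge and held at its final value during evaluation;
- every dynamic stage is a footed dynamic inverter on the shared clock.

Dynamic and static stages are both logically inverting, so the hypothetical helper `ideal` simply inverts stage by stage.

```python
def ideal(stages, a):
    """Logical values if every stage simply inverts its input."""
    vals, x = [], a
    for _ in stages:
        x = 1 - x
        vals.append(x)
    return vals


def simulate(stages, a, steps=10):
    """Unit-delay simulation of a chain of 'dyn' and 'inv' stages driven by a."""
    vals, prev = [], 0          # primary input assumed low during precharge
    for kind in stages:         # precharge: dynamic nodes high, inverters follow
        prev = 1 if kind == "dyn" else 1 - prev
        vals.append(prev)
    for _ in range(steps):      # evaluation: one unit delay per stage per step
        old = vals[:]
        for i, kind in enumerate(stages):
            x = a if i == 0 else old[i - 1]
            if kind == "dyn":
                vals[i] = 0 if x == 1 else old[i]  # can only discharge
            else:
                vals[i] = 1 - x
    return vals


for chain in (["dyn", "dyn"], ["dyn", "inv", "dyn", "inv"]):
    print(chain, "simulated:", simulate(chain, 1), "ideal:", ideal(chain, 1))
```

With two dynamic stages connected directly, the second output is wrongly discharged because its input is still high (precharged) when evaluation begins. Inserting static inverters, which forms a chain of domino buffers, gives the ideal result for either input value.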

FIGURE 3.19 Domino gates

In general, more complex inverting static CMOS gates such as NANDs or NORs can be
used in place of the inverter. This mixture of dynamic and static logic is called compound
domino.

1.4.2 Dual-Rail Domino Logic

Dual-rail domino gates encode each signal with a pair of wires, denoted with the suffixes _h and _l (for example, A_h and A_l).

The following table summarizes the encoding.

TABLE Dual-rail domino signal encoding

The _h wire is asserted to indicate that the output of the gate is “high” or 1. The _l wire is asserted to indicate that the output of the gate is “low” or 0. When the gate is precharged, neither _h nor _l is asserted. The pair of lines should never be both asserted simultaneously during correct operation.
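The encoding can be written as a lookup. This sketch restates the table's content as the preceding paragraph describes it. Neither rail is asserted while precharged, exactly one rail is asserted after evaluation, and asserting both is illegal. The function names are hypothetical.

```python
def decode_dual_rail(h_rail, l_rail):
    """Interpret one dual-rail domino signal pair (_h, _l)."""
    table = {
        (0, 0): "precharged",
        (0, 1): "0",
        (1, 0): "1",
        (1, 1): "invalid",
    }
    return table[(h_rail, l_rail)]


def encode_dual_rail(bit):
    """Return the (_h, _l) pair asserted for a computed logic value."""
    return (1, 0) if bit else (0, 1)


for h_rail in (0, 1):
    for l_rail in (0, 1):
        print(f"_h={h_rail} _l={l_rail}: {decode_dual_rail(h_rail, l_rail)}")
```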

FIGURE 3.20 Dual-rail domino gates

Dual-rail domino gates accept both true and complementary inputs and compute both true
and complementary outputs, as shown in Figure 3.20(a).

Observe that this is identical to static CVSL circuits from Figure 3.13 except that the cross-
coupled pMOS transistors are instead connected to the precharge clock.

Therefore, dual-rail domino can be viewed as a dynamic form of CVSL, sometimes called
DCVS.

Figure 3.20(b) shows a dual-rail AND/NAND gate and Figure 3.20(c) shows a dual-rail
XOR/XNOR gate. The gates are shown with clocked evaluation transistors, but can also be
unfooted.

Dual-rail domino is a complete logic family in that it can compute all inverting and
noninverting logic functions. However, it requires more area, wiring, and power.

Dual-rail structures also lose the efficiency of wide dynamic NOR gates because they require
complementary tall dynamic NAND stacks.

Dual-rail domino signals not only the result of a computation but also when the computation is done. Before computation completes, both rails are precharged. When the computation completes, one rail will be asserted. A NAND gate can be used for completion detection, as shown in Figure 3.21.
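Here is a logic-level sketch of a dual-rail AND/NAND and the NAND-based completion detector. It assumes the detector NAND is driven by the two precharged dynamic nodes, which are the complements of the output rails, so its output equals the OR of the rails. The rail equations Y_h = A_h·B_h and Y_l = A_l + B_l follow from the series/parallel structure of an AND/NAND pulldown pair. The function names are hypothetical.

```python
def dual_rail_and(a, b):
    """Dual-rail AND/NAND on (h, l) rail pairs.

    Y_h uses the series stack (A_h and B_h); Y_l uses the parallel stack
    (A_l or B_l), so Y_l can assert as soon as either input is known to be 0.
    """
    (a_h, a_l), (b_h, b_l) = a, b
    return (a_h & b_h, a_l | b_l)


def done(pair):
    """Completion detector: NAND of the two precharged dynamic nodes.

    Each dynamic node is the complement of its output rail, so the NAND
    output is 1 exactly when at least one rail has been asserted.
    """
    h_rail, l_rail = pair
    dyn_h, dyn_l = 1 - h_rail, 1 - l_rail
    return 1 - (dyn_h & dyn_l)


PRECHARGED, ONE, ZERO = (0, 0), (1, 0), (0, 1)
for a, b in ((PRECHARGED, PRECHARGED), (ONE, ONE), (ONE, ZERO)):
    y = dual_rail_and(a, b)
    print(f"A={a} B={b} -> Y={y}, done={done(y)}")
```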

FIGURE 3.21 Dual-rail domino gate with completion detection


1.4.3 Keepers

Drawbacks of Dynamic Circuits

• Dynamic circuits also suffer from charge leakage on the dynamic node.
• If a dynamic node is precharged high and then left floating, the voltage on the dynamic node will drift over time due to subthreshold, gate, and junction leakage.

• The time constants tend to be in the millisecond to nanosecond range, depending on process and temperature. This problem is analogous to leakage in dynamic RAMs.

• Moreover, dynamic circuits have poor input noise margins. If the input rises above Vt while the gate is in evaluation, the input transistors will turn on weakly and can incorrectly discharge the output.
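The millisecond-to-nanosecond range can be estimated by treating leakage as a constant current that discharges the node capacitance, giving t = C·ΔV/I. The capacitance, allowed droop, and leakage currents below are hypothetical. Real leakage depends strongly on process and temperature.

```python
def droop_time(c_node, i_leak, delta_v):
    """Time for leakage to discharge a floating node by delta_v.

    Treats the leakage as a constant current: t = C * dV / I.
    """
    return c_node * delta_v / i_leak


# Hypothetical values: 10 fF dynamic node, 0.3 V allowable droop.
for i_leak in (1e-12, 1e-9, 1e-6):
    print(f"I_leak = {i_leak:.0e} A -> t = {droop_time(10e-15, i_leak, 0.3):.1e} s")
```

Sweeping the leakage from 1 pA to 1 µA moves the droop time from about 3 ms to about 3 ns, spanning the range quoted above. The same leakage current is the minimum a keeper must supply to hold the node.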

Both leakage and noise margin problems can be addressed by adding a keeper circuit.

Figure 3.22 shows a conventional keeper on a domino buffer.

• The keeper is a weak transistor that holds, or staticizes, the output at the correct level when it would otherwise float.

• When the dynamic node X is high, the output Y is low and the keeper is ON to prevent X from floating. When X falls, the keeper initially opposes the transition, so it must be much weaker than the pulldown network.

• Eventually Y rises, turning the keeper OFF and avoiding static power dissipation.

• The keeper must be strong (i.e., wide) enough to compensate for any leakage current drawn when the output is floating and the pulldown stack is OFF.

• Strong keepers also improve the noise margin because when the inputs are slightly above Vt, the keeper can supply enough current to hold the output high.

FIGURE 3.22 Conventional keeper

For small dynamic gates, the keeper must be weaker than a minimum-sized transistor. This
is achieved by increasing the keeper length, as shown in Figure 3.23(a).

Long keeper transistors increase the capacitive load on the output Y. This can be avoided by
splitting the keeper, as shown in Figure 3.23(b).

FIGURE 3.23 Weak keeper implementations

Figure 3.24 shows a differential keeper for a dual-rail domino buffer. When the gate is
precharged, both keeper transistors are OFF and the dynamic outputs float. However, as
soon as one of the rails evaluates low, the opposite keeper turns ON.

The differential keeper is fast because it does not oppose the falling rail. As long as one of
the rails is guaranteed to fall promptly, the keeper on the other rail will turn on before
excessive leakage or noise causes failure.

FIGURE 3.24 Differential keeper

1.4.5 NP and Zipper Domino

Another variation on domino is shown in Figure 3.27(a). The HI-skew inverting static
gates are replaced with predischarged dynamic gates using pMOS logic.

For example, a footed dynamic p-logic NAND gate is shown in Figure 3.27(b).

• When φ is 0, the first and third stages precharge high while the second stage predischarges low.
• When φ rises, all the stages evaluate.

Domino connections are possible, as shown in Figure 3.27(c).

The design style is called NP Domino or NORA Domino (NO RAce).

NORA has two major drawbacks.

• First, the logical effort of footed p-logic gates is generally worse than that of HI-skew gates (e.g., 2 vs. 3/2 for NOR2 and 4/3 vs. 1 for NAND2).
• Second, NORA is extremely susceptible to noise.

In an ordinary dynamic gate, the input has a low noise margin (about Vt), but is strongly driven by a static CMOS gate. The floating dynamic output is more prone to noise from coupling and charge sharing, but drives another static CMOS gate with a larger noise margin. In NORA, however, the sensitive dynamic inputs are driven by noise-prone dynamic outputs.

Given these drawbacks and the extra clock phase required, there is little reason to use NORA. Zipper domino is a closely related technique that leaves the precharge transistors slightly ON during evaluation by using precharge clocks that swing between 0 and VDD – |Vtp| for the pMOS precharge and between Vtn and VDD for the nMOS precharge. This plays much the same role as a keeper.

FIGURE 3.27 NP Domino

1.5 Pass-Transistor Circuits

In the circuit families we have explored so far, inputs are applied only to the gate
terminals of transistors. In pass-transistor circuits, inputs are also applied to the source/drain
diffusion terminals.

These circuits build switches using either nMOS pass transistors or parallel pairs of
nMOS and pMOS transistors called transmission gates.

For the purpose of comparison, Figure 3.28 shows a 2-input multiplexer constructed in a wide variety of pass-transistor circuit families along with static CMOS, pseudo-nMOS, CVSL, and single- and dual-rail domino.

Some of the circuit families are dual-rail, producing both true and complementary outputs, while others are single-rail and may require an additional inversion if the other polarity of output is needed. U XOR V can be computed with exactly the same logic using S = U, S̄ = Ū, A = V̄, B = V. This shows that static CMOS is particularly poorly suited to XOR because the complex gate and two additional inverters are required; hence, pass-transistor circuits become attractive.
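The XOR claim can be checked exhaustively. This sketch assumes the multiplexer computes Y = S·A + S̄·B, with S̄ supplied as a separate input. If Figure 3.28 uses the opposite select convention, A and B simply swap roles. The function name is hypothetical.

```python
from itertools import product


def mux2(s, s_bar, a, b):
    """Assumed 2-input mux convention: Y = S*A + S_bar*B."""
    return (s & a) | (s_bar & b)


for u, v in product((0, 1), repeat=2):
    y = mux2(s=u, s_bar=1 - u, a=1 - v, b=v)
    print(f"U={u} V={v}: mux output {y}, U XOR V = {u ^ v}")
```

When U = 1 the mux passes V̄, and when U = 0 it passes V, which is exactly XOR.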

In comparison, static CMOS NAND and NOR gates are relatively efficient and benefit less
from pass transistors.

FIGURE 3.28 Comparison of circuit families for 2-input multiplexers

1.5.1 CMOS with Transmission Gates

Structures such as tristates, latches, and multiplexers are often drawn as transmission gates in conjunction with simple static CMOS logic.

A transmission gate multiplexer can be built from two transmission gates. Such a circuit is nonrestoring; i.e., the logic levels on the output are no better than those on the input, so a cascade of such circuits may accumulate noise. To buffer the output and restore levels, a static CMOS output inverter can be added, as shown in Figure 3.28 (CMOSTG).

A single nMOS or pMOS pass transistor suffers from a threshold drop. If used alone,
additional circuitry may be needed to pull the output to the rail. Transmission gates solve this
problem but require two transistors in parallel.

The resistance of a unit-sized transmission gate can be estimated as R for the purpose of delay estimation. Current flows through the parallel combination of the nMOS and pMOS transistors. One of the transistors is passing the value well and the other is passing it poorly; for example, a logic 1 is passed well through the pMOS but poorly through the nMOS.

Estimate the effective resistance of a unit transistor passing a value in its poor
direction as twice the usual value: 2R for nMOS and 4R for pMOS.

Figure 3.29 shows the parallel combination of resistances. When passing a 0, the resistance is R || 4R = (4/5)R. The effective resistance passing a 1 is 2R || 2R = R. Hence, a transmission gate made from unit transistors is approximately R in either direction. Note that transmission gates are commonly built using equal-sized nMOS and pMOS transistors. Boosting the size of the pMOS transistor only slightly improves the effective resistance while significantly increasing the capacitance.
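The parallel-resistance arithmetic can be written out directly. This sketch applies the rules given in the text:

- a unit nMOS is R passing 0 and 2R passing 1;
- a unit pMOS is 2R passing 1 and 4R passing 0.

It additionally assumes resistance scales inversely with transistor width, so the `wp` parameter can explore an upsized pMOS. Capacitance is not modeled, and the function names are hypothetical.

```python
def parallel_r(r1, r2):
    """Resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def tgate_resistance(passing, wn=1.0, wp=1.0, r=1.0):
    """Effective resistance (in units of r) of a transmission gate.

    passing -- logic value being passed (0 or 1)
    wn, wp  -- nMOS and pMOS widths relative to unit size (assumed 1/width scaling)
    """
    if passing == 0:
        return parallel_r(r / wn, 4 * r / wp)   # nMOS good, pMOS poor
    return parallel_r(2 * r / wn, 2 * r / wp)   # nMOS poor, pMOS good


print("passing 0:", tgate_resistance(0))
print("passing 1:", tgate_resistance(1))
```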

FIGURE 3.29 Effective resistance of a unit transmission gate

Figure 3.30(a) redraws the multiplexer to include the inverters from the previous
stage that drive the diffusion inputs but to exclude the output inverter.

Figure 3.30(b) shows this multiplexer drawn at the transistor level. Observe that this
is identical to the static CMOS multiplexer of Figure 3.28 except that the intermediate nodes
in the pullup and pulldown networks are shorted together as N1 and N2.

The shorting of the intermediate nodes has two effects on delay.

• The effective resistance decreases somewhat (especially for rising outputs) because the output is pulled up or down through the parallel combination of both pass transistors rather than through a single transistor.

• The effective capacitance increases slightly because of the extra diffusion and wire capacitance required for this shorting.

FIGURE 3.30 Alternate representations of CMOSTG in a 2-input inverting multiplexer

Figure 3.31 shows a similar transformation of a tristate inverter from transmission gate form to conventional static CMOS by unshorting the intermediate node and redrawing the gate.

Note that the circuit in Figure 3.31(d) interchanges the A and enable terminals. It is logically equivalent, but electrically inferior because if the output is tristated but A toggles, charge from the internal nodes may disturb the floating output node.

FIGURE 3.31 Tristate inverter

The logical effort of circuits involving transmission gates is computed by drawing stages that begin at gate inputs rather than diffusion inputs, as in Figure 3.32 for a transmission gate multiplexer. The effect of the shorting can be ignored, so the logical effort from either the A or B terminals is 6/3 = 2, just as in a static CMOS multiplexer.

Note that the parasitic delay of transmission gate circuits with multiple series
transmission gates increases rapidly because of the internal diffusion capacitance, so it is
seldom beneficial to use more than two transmission gates in series without buffering.

FIGURE 3.32 Logical effort of transmission gate circuit

1.5.2 Complementary Pass Transistor Logic (CPL)

Disadvantages of CVSL:

• CVSL is slow because one side of the gate pulls down, and then the cross-coupled pMOS transistor pulls the other side up.

• The size of the cross-coupled device is an inherent compromise between a large transistor that fights the pulldown excessively and a small transistor that is slow pulling up.

CPL can be understood as an improvement on CVSL. CPL resolves this problem by making one half of the gate pull up while the other half pulls down.

Figure 3.33(a) shows the CPL multiplexer from Figure 3.28 rotated sideways. If a
path consists of a cascade of CPL gates, the inverters can be viewed equally well as being
on the output of one stage or the input of the next.

Figure 3.33(b) redraws the mux to include the inverters from the previous stage that drive the diffusion input, but to exclude the output inverters.

Figure 3.33(c) shows the mux drawn at the transistor level. Observe that this is
identical to the CVSL gate from Figure 3.28 except that the internal node of the stack can be
pulled up through the weak pMOS transistors in the inverters.

When the gate switches, one side pulls down well through its nMOS transistors. The other side pulls up. CPL can be constructed without cross-coupled pMOS transistors, but the outputs would only rise to VDD – Vt (or slightly lower because the nMOS transistors experience the body effect). This costs static power because the output inverter will be turned slightly ON.

Adding weak cross-coupled devices helps bring the rising output to the supply rail
while only slightly slowing the falling output. The output inverters can be LO-skewed to
reduce sensitivity to the slowly rising output.

FIGURE 3.33 Alternate representations of CPL

2.18 POWER DISSIPATION

• For many years, power was a secondary consideration behind speed and area in many chips. As transistor counts and clock frequencies have increased, power consumption has become a primary design constraint.

• Power dissipation in CMOS circuits comes from two components:

• Static dissipation due to
  – subthreshold conduction through OFF transistors
  – tunneling current through gate oxide
  – leakage through reverse-biased diodes
  – contention current in ratioed circuits
• Dynamic dissipation due to
  – charging and discharging of load capacitances
  – short-circuit current while both pMOS and nMOS networks are partially ON

$$P_{total} = P_{static} + P_{dynamic}$$

• Some important definitions follow. The instantaneous power P(t) drawn from the power supply is proportional to the supply current iDD(t) and the supply voltage VDD:

$$P(t) = i_{DD}(t)\,V_{DD}$$

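• The energy consumed over some time interval T is the integral of the instantaneous power:

$$E = \int_0^T i_{DD}(t)\,V_{DD}\,dt$$

• The average power over this interval is

$$P_{avg} = \frac{E}{T} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt$$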
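The energy and average-power definitions can be evaluated numerically from a sampled supply-current waveform. This sketch uses the trapezoidal rule. The waveform and the function name are hypothetical.

```python
def energy_and_avg_power(t, i_dd, vdd):
    """Trapezoidal estimate of E = integral of i_DD(t) * VDD dt and P_avg = E / T.

    t    -- increasing sample times in seconds
    i_dd -- supply current samples in amperes
    """
    energy = 0.0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        energy += 0.5 * (i_dd[k] + i_dd[k - 1]) * vdd * dt
    total_time = t[-1] - t[0]
    return energy, energy / total_time


# Hypothetical waveform: a triangular 2 mA spike lasting 100 ps, then no
# current for the rest of a 1 ns interval, at VDD = 1.0 V.
times = [0.0, 50e-12, 100e-12, 1e-9]
currents = [0.0, 2e-3, 0.0, 0.0]
e, p = energy_and_avg_power(times, currents, 1.0)
print(f"E = {e:.2e} J, P_avg = {p:.2e} W")
```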

2.19 STATIC DISSIPATION:

• Consider the static CMOS inverter shown in Figure 2.54(a). If the input is '0', the associated nMOS transistor is OFF and the pMOS transistor is ON, so the output voltage is VDD, or logic '1'.

• When the input is '1', the associated nMOS transistor is ON and the pMOS transistor is OFF. The output voltage is 0 volts (GND).

• Note that one of the transistors is always OFF when the gate is in either of these logic states. Ideally, no current flows through the OFF transistor, so the power dissipation is zero.

• However, secondary effects including subthreshold conduction, tunneling, and leakage lead to small amounts of static current flowing through the OFF transistor.

• The leakage current is constant, so instantaneous and average power are the same; the static power dissipation is the product of the total leakage current and the supply voltage:

$$P_{static} = I_{static}\,V_{DD}$$
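The leakage components add, so static power is their sum times VDD. The component values in this sketch are hypothetical chip-level numbers chosen only to show the bookkeeping. Contention current would be nonzero only if ratioed gates such as pseudo-nMOS are present.

```python
def static_power(vdd, **leakage_currents):
    """P_static = I_static * VDD, with I_static the sum of the components (A)."""
    i_static = sum(leakage_currents.values())
    return i_static * vdd


# Hypothetical chip-level leakage components, in amperes.
p = static_power(
    1.0,
    subthreshold=20e-3,
    gate_tunneling=5e-3,
    junction=0.1e-3,
    contention=0.0,  # no pseudo-nMOS gates in this example
)
print(f"P_static = {p * 1e3:.1f} mW")
```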

Fig.2.54(a): CMOS inverter model for static power dissipation evaluation

• Even OFF transistors conduct a small amount of subthreshold current. As subthreshold current is exponentially dependent on threshold voltage, it has increased dramatically as threshold voltages have scaled down.

• SiO2 is a very good insulator, so leakage current through the gate dielectric historically was very low. However, it is possible for electrons to tunnel across very thin insulators; the probability drops off exponentially with oxide thickness.

• Tunneling current becomes important for transistors around the 130 nm generation with gate oxides of 20 Å or thinner. There is also some small static dissipation due to reverse-biased diode leakage between diffusion regions, wells, and the substrate.

• In modern processes, diode leakage is generally much smaller than the subthreshold or gate leakage and may be neglected. Historically, leakage power was of concern only to ultra-low-power systems.

• Static dissipation can also occur in gates such as pseudo-nMOS gates, where there is a direct path between power and ground. If such gates are used, contention current must be factored into the total static power dissipation of the chip.

2.20. DYNAMIC DISSIPATION

• The primary dynamic dissipation component is charging the load capacitance. Suppose a load C is switched between GND and VDD at an average frequency fsw. Over a given interval of time T, the load will be charged and discharged T fsw times.

• Current flows from VDD to the load to charge it. Current then flows from the load to GND during discharge. In one complete charge/discharge cycle, a total charge of Q = CVDD is thus transferred from VDD to GND.

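• The average dynamic power dissipation is

$$P_{dynamic} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt$$

Taking the integral of the current over the interval T as the total charge delivered during T (T fsw cycles, each transferring Q = CVDD):

$$P_{dynamic} = \frac{V_{DD}}{T}\left[T f_{sw} C V_{DD}\right] = C V_{DD}^2 f_{sw}$$

Now the dynamic power dissipation may be rewritten in terms of an activity factor α times the clock frequency f, since fsw = αf:

$$P_{dynamic} = \alpha C V_{DD}^2 f$$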


• A clock has an activity factor of α = 1 because it rises and falls every cycle. Most data has a maximum activity factor of α = 0.5 because it transitions at most once each cycle.

• Because the input rise/fall time is greater than zero, both nMOS and pMOS transistors will be ON for a short period of time while the input is between Vtn and VDD – |Vtp|.

• This results in a "short-circuit" current pulse from VDD to GND that typically increases power dissipation by about 10%. Short-circuit power dissipation occurs because both the pullup and pulldown networks are partially ON while the input switches.

• It increases as edge rates become slower because both networks are ON for more time. However, it decreases as load capacitance increases because, with large loads, the output switches only a small amount during the input transition, leaving only a small Vds across one of the transistors.

• It is good practice to use relatively crisp edge rates at the inputs to gates with wide transistors to minimize their short-circuit current.

2.21. LOW-POWER DESIGN

Power dissipation has become extremely important to VLSI designers. Total power
dissipation is the sum of the static and dynamic dissipation components.

• Dynamic dissipation has historically been far greater than static power when systems are active; hence, static power is often ignored.

• For high-performance systems such as workstations and servers, dynamic power consumption per chip is often limited to about 150 W by the amount of heat that can be managed with air-cooled systems and cost-effective heatsinks.

• This number increases slowly with advances in heatsink technology and can be increased significantly with expensive liquid cooling, but it has not kept pace with the growing power demands of systems.

• Power reduction techniques can be divided into those that reduce dynamic power and those that reduce static power.

(i). Dynamic Power Reduction

• The Pdynamic equation shows that dynamic power is reduced by decreasing the activity factor, the switching capacitance, the power supply voltage, or the operating frequency.


• Activity factor reduction is very important. Static logic has an inherently low activity factor. Dynamic circuit families have clocked nodes and a high internal activity factor, so they are also costly in power. Clock gating can be used to stop portions of the chip that are idle; for example, a floating-point unit can be turned off when executing integer code, and a second-level cache can be idled if the data is found in the primary cache.

• A drawback of activity factor reduction is that if the system transitions rapidly from an idle mode with little switching to a fully active mode, a large di/dt spike will occur. This leads to inductive noise in the power supply network.

• Device switching capacitance is reduced by choosing small transistors. Minimum-sized gates can be used on non-critical paths. Although Logical Effort finds that the best stage effort is about 4, using a larger stage effort increases delay only slightly and greatly reduces transistor sizes.

• For example, buffers driving I/O pads or long wires may use a stage effort of 8 to 12 to reduce the buffer size. Interconnect switching capacitance is most effectively reduced through careful floorplanning, placing communicating units near each other to reduce wire lengths.

• Choosing a lower power supply significantly reduces power consumption. As many transistors are operating in a velocity-saturated regime, the lower power supply may not reduce performance as much as first-order models predict.

• Voltage can be adjusted based on operating mode; for example, a laptop processor may operate at high voltage and high speed when plugged into an AC adapter, but at lower voltage and speed when on battery power.

Frequency can also be traded for power. For example, in a digital signal processing
system primarily concerned with throughput, two multipliers running at half speed
can replace a single multiplier at full speed.

At first, this may not appear to be a good idea because it maintains constant power
and performance while doubling area. However, if the power supply can also be
reduced because the frequency requirement is lowered, overall power consumption
goes down.
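The parallel-multiplier argument can be checked numerically with the same P = αCVDD²f model. The sketch below assumes a 1 pF switched capacitance per multiplier, a 1.0 V nominal supply, a 500 MHz clock, and that halving the frequency lets the supply drop to 0.7 V; it ignores the extra multiplexing and wiring overhead of the duplicated hardware.

```python
def dynamic_power(c_switched: float, vdd: float, f: float, alpha: float = 1.0) -> float:
    """Dynamic power in watts: alpha * C * VDD^2 * f."""
    return alpha * c_switched * vdd ** 2 * f


# Assumed reference design: one multiplier, 1 pF switched capacitance, 1.0 V, 500 MHz.
C, VDD, F = 1e-12, 1.0, 500e6
p_single = dynamic_power(C, VDD, F)

# Two multipliers: twice the switched capacitance, each clocked at half speed.
p_parallel_same_v = dynamic_power(2 * C, VDD, F / 2)
# Assumption: the relaxed timing allows the supply to be lowered to 0.7 V.
p_parallel_low_v = dynamic_power(2 * C, 0.7, F / 2)

print(f"same VDD: {p_parallel_same_v / p_single:.2f}x, 0.7 V: {p_parallel_low_v / p_single:.2f}x")
```

With these assumptions, the duplicated design at the original supply consumes the same power as the single multiplier (ratio 1.0), while the reduced-voltage version consumes about 0.49 of it, because the saving scales as (0.7/1.0)².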

Commonly used metrics in low-power design are power, the power-delay product,
and the energy-delay product. Power alone is a questionable metric because it can
be reduced simply by computing more slowly. The power-delay product is also
suspect because the energy can be reduced by computing more slowly at a lower
supply voltage. The energy-delay product is less prone to such gaming.

Overall, the energy-delay product, expressed inversely as Performance²/Watt and normalized for process, varies by only about a factor of two across a wide range of general-purpose microprocessor architectures.
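The sketch below compares the three metrics for three hypothetical operating points of the same design. The power and delay numbers are assumed for illustration, and the third point assumes that a 0.8 V supply still meets the halved clock rate.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    """One hypothetical way of running a design (assumed numbers)."""
    name: str
    power_w: float   # average power (W)
    delay_s: float   # time per operation (s)

    @property
    def energy_j(self) -> float:
        """Power-delay product: energy per operation (J)."""
        return self.power_w * self.delay_s

    @property
    def energy_delay(self) -> float:
        """Energy-delay product (J*s); lower is better."""
        return self.energy_j * self.delay_s


designs = [
    OperatingPoint("baseline, 1.0 V", 10.0, 1e-9),
    OperatingPoint("half clock, 1.0 V", 5.0, 2e-9),
    OperatingPoint("half clock, 0.8 V", 3.2, 2e-9),  # 5 W scaled by (0.8 / 1.0)^2
]
for p in designs:
    print(f"{p.name:18s} P = {p.power_w:4.1f} W  PDP = {p.energy_j:.2e} J  EDP = {p.energy_delay:.2e} J*s")
```

Halving the clock halves the power but leaves the energy per operation (the power-delay product) unchanged; lowering the supply as well improves both power and energy, yet the energy-delay product ends up worse than the baseline, so it is the metric that exposes the slowdown.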

(ii). Static Power Reduction


Static power reduction involves minimizing Istatic. Some circuit techniques such as
analog current sources and pseudo-nMOS gates intentionally draw static power.
They can be turned off when they are not needed.

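The subthreshold leakage current equation for Vgs < Vt is

$$I_{ds} = I_{ds0}\, e^{\frac{V_{gs} - V_t}{n v_T}} \left(1 - e^{\frac{-V_{ds}}{v_T}}\right)$$

where

$$V_t = V_{t0} - \eta V_{ds} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$

Here n is the subthreshold slope factor and vT = kT/q is the thermal voltage; the η term describes drain-induced barrier lowering and the γ term describes the body effect.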

• Subthreshold leakage power was already a major problem for battery-powered designs in the 180 nm generation, and it grows exponentially as power supplies and threshold voltages are scaled down in newer processes.

• High-performance requirements call for relatively low thresholds, which cause excessive leakage current in idle mode. Selective application of multiple threshold voltages can maintain performance on critical paths with low-Vt transistors while reducing leakage on other paths with high-Vt transistors.
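Using the subthreshold equation with Vgs = 0 (an OFF device), the sketch below estimates how much a 100 mV higher threshold reduces leakage. The slope factor n = 1.5, the two thresholds (0.25 V and 0.35 V) and the 1.0 V supply are assumed values, and the DIBL and body-effect terms (η and γ) are ignored for simplicity.

```python
import math

V_THERMAL = 0.026   # thermal voltage kT/q near room temperature (V)
N_SLOPE = 1.5       # assumed subthreshold slope factor n


def subthreshold_current(vgs: float, vds: float, vt: float, i_ds0: float = 1.0) -> float:
    """Subthreshold current normalised to i_ds0; DIBL and body effect ignored."""
    return i_ds0 * math.exp((vgs - vt) / (N_SLOPE * V_THERMAL)) * (1.0 - math.exp(-vds / V_THERMAL))


VDD = 1.0
i_low_vt = subthreshold_current(0.0, VDD, vt=0.25)   # OFF low-Vt device
i_high_vt = subthreshold_current(0.0, VDD, vt=0.35)  # OFF high-Vt device
ratio = i_low_vt / i_high_vt
print(f"low-Vt device leaks {ratio:.1f}x more than the high-Vt device")
```

The ratio is e^(0.1/(n vT)), roughly 13 for these values, which is why placing high-Vt devices on non-critical paths removes most of their leakage.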

• Another way to control leakage is through the body voltage, using the body effect. For example, low-Vt devices can be used and a reverse body bias (RBB) can be applied during idle mode to reduce leakage. Alternatively, higher-Vt devices can be used, and then a forward body bias (FBB) can be applied during active mode to increase performance.

• Too much reverse body bias leads to greater junction leakage through a mechanism called band-to-band tunneling, while too much forward body bias leads to substantial current through the body-to-source diodes.

• An adaptive body bias (ABB) can compensate for process variations and achieve more uniform transistor performance. In any case, the body bias should be kept to less than about 0.5 V.

• Applying a body bias requires additional power supply rails to distribute the substrate and well voltages. For example, an RBB scheme for a 1.8 V n-well process could bias the p-type substrate at VBBn = -0.4 V and the n-well at VBBp = 2.2 V.
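The effect of the reverse body bias in the 1.8 V example can be estimated from the γ term of the threshold equation. The sketch assumes γ = 0.4 V^1/2, φs = 0.85 V and n = 1.5 (illustrative values, not from these notes) and an nMOS source at GND, so VBBn = -0.4 V gives Vsb = 0.4 V.

```python
import math

V_THERMAL = 0.026   # thermal voltage kT/q near room temperature (V)


def body_effect_shift(vsb: float, gamma: float = 0.4, phi_s: float = 0.85) -> float:
    """Threshold increase from source-body reverse bias (assumed gamma, phi_s)."""
    return gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))


def leakage_reduction(delta_vt: float, n: float = 1.5) -> float:
    """Factor by which subthreshold leakage falls when Vt rises by delta_vt."""
    return math.exp(delta_vt / (n * V_THERMAL))


v_bbn = -0.4            # substrate bias from the 1.8 V RBB example (V)
vsb = 0.0 - v_bbn       # nMOS source at GND, body at VBBn, so Vsb = 0.4 V
dvt = body_effect_shift(vsb)
print(f"dVt = {dvt * 1000:.0f} mV, leakage reduced about {leakage_reduction(dvt):.1f}x")
```

With these assumptions the threshold rises by roughly 78 mV and subthreshold leakage falls by about 7.5x; the real benefit depends strongly on the process body-effect coefficient.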

• Figure 2.55(b) shows a schematic and cross-section of an inverter using body bias. In an n-well process, all nMOS transistors share the same p substrate and must use the same VBBn. The well and substrate carry little current, so the bias voltages are relatively easy to generate and distribute.


• Alternatively, the source voltage can be raised in sleep mode. This has the double benefit of reducing Vds as well as increasing Vsb. However, the source does carry significant current, so generating a stable and adjustable source voltage rail is challenging.

Fig.2.55(b): Body bias of a CMOS inverter


• Yet another method of reducing idle leakage current in low-power systems is to turn off the power supply entirely. Multiple-threshold CMOS (MTCMOS) circuits use low-Vt transistors for computation and a high-Vt transistor as a switch to disconnect the power supply during idle mode, as shown in Figure 2.56(c).

Fig.2.56(c): MTCMOS

• The high-Vt device is connected between the true VDD rail and the virtual VDDV rail that supplies the logic gates. The extra transistor increases the impedance between the true and virtual power supplies, causing greater power supply noise and gate delay.

• Bypass capacitance between VDDV and GND stabilizes the supply somewhat, but the capacitance is discharged each time VDDV is disconnected, contributing to the power consumption.


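• The leakage through two series OFF transistors is much lower than that of a single transistor because of the stack effect. In Figure 2.57(d), the single transistor has a relatively low threshold because of drain-induced barrier lowering from the high drain voltage. In the two-transistor stack, the intermediate node x rises to about 100 mV. The upper transistor then has a negative Vgs, a smaller Vds (hence less DIBL) and a threshold raised by the body effect, so the stack leaks far less.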

Fig.2.57(d): Leakage Stack Effect

• Low-power systems can take advantage of this stack effect to put gates with series transistors into a low-leakage sleep mode by applying an input pattern that turns off both transistors. Silicon-on-insulator (SOI) circuits are attractive for low-leakage designs because they have a sharper subthreshold current roll-off.
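The stack effect can be reproduced with the subthreshold equation, including the η (DIBL) and γ (body effect) terms. The sketch below finds the intermediate node voltage x of two series OFF nMOS transistors by bisection, setting the upper and lower device currents equal, and compares the result with a single OFF transistor. All device parameters (Vt0 = 0.3 V, η = 0.1, γ = 0.3 V^1/2, φs = 0.85 V, n = 1.4, VDD = 1.0 V) are assumed illustrative values.

```python
import math

V_THERMAL = 0.026  # thermal voltage kT/q (V)
# Assumed illustrative device parameters (not from the notes):
VT0, ETA, GAMMA, PHI_S, N_SLOPE, VDD = 0.3, 0.1, 0.3, 0.85, 1.4, 1.0


def i_sub(vgs: float, vds: float, vsb: float) -> float:
    """Normalised subthreshold current with DIBL (eta) and body effect (gamma)."""
    vt = VT0 - ETA * vds + GAMMA * (math.sqrt(PHI_S + vsb) - math.sqrt(PHI_S))
    return math.exp((vgs - vt) / (N_SLOPE * V_THERMAL)) * (1.0 - math.exp(-vds / V_THERMAL))


def stack_node_voltage(tol: float = 1e-9) -> float:
    """Bisect for the node voltage x where both OFF transistors carry equal current."""
    lo, hi = 0.0, VDD
    while hi - lo > tol:
        x = 0.5 * (lo + hi)
        i_top = i_sub(vgs=-x, vds=VDD - x, vsb=x)   # upper device: gate at 0, source at x
        i_bot = i_sub(vgs=0.0, vds=x, vsb=0.0)      # lower device: source at GND
        if i_top > i_bot:
            lo = x      # net current charges node x, so x must be higher
        else:
            hi = x
    return 0.5 * (lo + hi)


vx = stack_node_voltage()
i_single = i_sub(0.0, VDD, 0.0)   # single OFF transistor with the full VDD across it
i_stack = i_sub(0.0, vx, 0.0)     # stack current equals the lower device current
print(f"node x = {vx * 1000:.0f} mV, stack leakage is {i_single / i_stack:.0f}x lower")
```

With these parameters, x settles somewhat below 100 mV, of the same order as the value quoted above, and the stack leaks roughly an order of magnitude less than the single device; the exact numbers depend on the assumed parameters.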

Part A

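1. Define Elmore delay model. Give the expression for Elmore delay and state the various parameters associated with it. (Nov-14, May-16, May-17)
It is an analytical method used to estimate the RC delay of a network. The Elmore delay model estimates the delay of an RC ladder as the sum, over each node in the ladder, of the resistance between that node and the source multiplied by the capacitance on that node:

$$t_{pd} = \sum_i R_{i\text{-}to\text{-}source}\, C_i = R_1C_1 + (R_1+R_2)C_2 + \dots + (R_1+R_2+\dots+R_N)C_N$$

where Ri is the resistance of the i-th segment and Ci is the capacitance from node i to ground.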
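For an RC ladder, the resistance shared with the source grows node by node, so the Elmore sum can be accumulated in a single pass. The function below is a minimal sketch of that calculation; the function name, argument layout and the example wire values are illustrative assumptions.

```python
def elmore_delay_ladder(resistances, capacitances):
    """Elmore delay to the far end of an RC ladder driven at its first node.

    resistances[i]: series resistance between node i-1 (or the source) and node i.
    capacitances[i]: capacitance from node i to ground.
    Returns sum_i C_i * (total resistance from the source to node i).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need exactly one resistance and one capacitance per node")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # total resistance between the source and this node
        delay += upstream_r * c
    return delay


# Hypothetical wire split into three segments of 100 ohm and 10 fF each.
t_wire = elmore_delay_ladder([100.0] * 3, [10e-15] * 3)
print(f"Elmore delay = {t_wire * 1e12:.1f} ps")
```

For this wire the delay is (100 + 200 + 300) Ω × 10 fF = 6 ps; the far-end capacitor contributes the most because it sees the whole series resistance.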

2. What are the general properties of the Elmore delay model?
The Elmore delay model applies to an RC network that has:
A single input node
All capacitors connected between a node and ground
No resistive loops
3. What are the types of power dissipation?
Static power dissipation (due to leakage current when the circuit is idle).
Dynamic power dissipation (when the circuit is switching).
Short-circuit power dissipation during switching of transistors.
4. What is static power dissipation?
Power dissipation due to leakage current when the circuit is idle is called static power dissipation. Static power is due to:
Subthreshold conduction through OFF transistors
Tunneling current through gate oxide
Leakage through reverse-biased diodes
Contention current in ratioed circuits.
5. What is dynamic power dissipation?
The power dissipated in charging and discharging the load capacitance at a particular node at the operating frequency is called dynamic power dissipation. The dynamic power dissipation at a particular output node is given by

$$P_d = \alpha C_L V_{dd}^2 f_{clk}$$

where
CL = load capacitance;
α = activity factor;
Vdd = power supply voltage;
fclk = operating frequency.
6. What are the methods to reduce dynamic power dissipation?
1. Reduce the product of capacitance and its switching frequency.
2. Eliminate logic switching that is not necessary for computation.
3. Reduce the activity factor.
4. Reduce the supply voltage.
7. What are the methods to reduce static power dissipation?
1. By using multiple threshold voltages: low-Vt transistors on critical paths to maintain performance, and high-Vt transistors on other paths to reduce leakage.
2. By using two operating modes, active and standby, for each function block.
3. By adjusting the body bias, i.e., applying FBB (Forward Body Bias) in active mode to increase performance and RBB (Reverse Body Bias) in standby mode to reduce leakage.
4. By using sleep transistors to isolate the supply from the block to achieve significant leakage power savings.
8. What is short-circuit power dissipation?
During switching, both nMOS and pMOS transistors conduct simultaneously and provide a direct path between VDD and the ground rail, resulting in short-circuit power dissipation.
9. Define design margin.
The additional performance capability above the required standard basic system parameters, specified by a system designer to compensate for uncertainties, is called design margin. Design margin is required because there are three sources of variation: two environmental (supply voltage and temperature) and one manufacturing (process).
10. Write the applications of transmission gates.
Multiplexing element of a path selector
A latch element
An analog switch
A voltage-controlled resistor connecting the input and output.
11. What is a pass transistor?
A pass transistor is a MOS transistor whose gate is driven by a control signal, with its drain connected to a constant or variable voltage (the input) and its source acting as the output. When the control signal is high, the input is passed to the output; when the control signal is low, the output floats. Circuits built with this topology are called pass-transistor circuits.
12. List the advantages of pass transistors.
Pass transistor logic (PTL) circuits are often superior to standard CMOS circuits in terms of layout density, circuit delay and power consumption.
They have no path from VDD to GND and do not dissipate standby (static) power.
13. What is a transmission gate?
A transmission gate is constructed by connecting a pMOS and an nMOS transistor in parallel, with their drain and source terminals shorted together. The gate terminals are driven by two complementary select signals, s and s̄; when s is high, the transmission gate passes the signal at the input to the output. The main advantage of the transmission gate is that it eliminates the threshold voltage drop.
14. Why has low power become an important issue in present-day VLSI circuit realization?
In deep submicron technology, power has become one of the most important issues because of:
Increasing transistor count: the number of transistors doubles roughly every 18 months according to Moore's law.
Higher speed of operation: dynamic power dissipation is proportional to the clock frequency.
Greater device leakage current: in nanometer technology, leakage becomes a significant percentage of the total power, and leakage current increases at a faster rate than dynamic power across technology generations.
15. What are the various ways to reduce the delay time of a CMOS inverter?
Various ways of reducing the delay time are given below:
a) The width of the MOS transistors can be increased to reduce delay. This is known as gate sizing, which will be discussed later in more detail.
b) The load capacitance can be reduced to reduce delay. This is achieved by using transistors of smaller dimensions as feature sizes shrink in newer technology generations.
c) Delay can also be reduced by increasing the supply voltage Vdd and/or reducing the threshold voltage Vt of the MOS transistors.
16. Explain the basic operation of a 2-phase dynamic circuit.
The operation of the circuit can be explained using precharge logic, in which the output is precharged to a HIGH level during the Φ2 clock and evaluated during the Φ1 clock.
17. What makes dynamic CMOS circuits faster than static CMOS circuits?
Dynamic CMOS circuits require fewer transistors, so less capacitance has to be driven. This makes dynamic circuits faster.
18. What is glitching power dissipation?
Because of the finite delay of the gates used to realize Boolean functions, different signals cannot reach the inputs of a gate simultaneously. This leads to spurious transitions at the output before it settles down to its final value. These spurious transitions charge and discharge the outputs, causing glitching power dissipation. It can be minimized by a balanced realization in which the signals arriving at each gate's inputs have the same delay.
19. List the various sources of leakage current.
The various sources of leakage current are listed below:
I1 = Reverse-bias p-n junction diode leakage current
I2 = Band-to-band tunneling current
I3 = Subthreshold leakage current
I4 = Gate oxide tunneling current
I5 = Gate current due to hot-carrier injection
I6 = Channel punch-through current
I7 = Gate-induced drain leakage current


20. Compare and contrast clock gating versus power gating approaches.
Clock gating minimizes dynamic power by stopping unnecessary transitions, whereas power gating minimizes leakage power by inserting a high-Vt transistor in series with low-Vt logic blocks.

21. What are the two components of power dissipation? [MAY 2013]
Power dissipation in CMOS circuits comes from two components:
Static dissipation due to
• Subthreshold conduction through OFF transistors
• Tunneling current through gate oxide
• Leakage through reverse-biased diodes
• Contention current in ratioed circuits
Dynamic dissipation due to
• Charging and discharging of load capacitances
• "Short-circuit" current while both pMOS and nMOS networks are partially ON

$$P_{total} = P_{static} + P_{dynamic}$$
22. Define Rise time
Rise time, tr is the time taken for a waveform to rise from 10% to 90% of its
steady-state value.

23. Define Fall time


Fall time, tf is the time taken for a waveform to fall from 90% to 10% of its
steady-state value.
24. Define Delay time
Delay time, td is the time difference between input transition (50%) and the
50% output level. This is the time taken for a logic transition to pass from input to
output.
25. What is meant by propagation delay & contamination delay time? May-17
Propagation delay time, tpd = maximum time from the input crossing 50% to
the output crossing 50%
Contamination delay time, tcd = minimum time from the input crossing 50% to
the output crossing 50% .
26. Define logical effort.
Logical effort of a gate is defined as "the ratio of the input capacitance of the gate to the input capacitance of an inverter that can deliver the same output current". Equivalently, logical effort indicates how much worse a gate is at producing output current as compared to an inverter, given that each input of the gate may only present as much input capacitance as the inverter.
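Under the usual assumption that a pMOS transistor needs twice the width of an nMOS transistor for equal drive (so the reference inverter has input capacitance 1 + 2 = 3 units), the logical effort of NAND and NOR gates follows directly from this definition. The sketch below encodes those sizing rules; the 2:1 mobility ratio and the function names are assumptions for illustration.

```python
from fractions import Fraction

# Assumption: pMOS mobility is half that of nMOS, so a unit inverter uses
# nMOS width 1 and pMOS width 2, giving an input capacitance of 3 units.
INVERTER_INPUT_CAP = 3


def logical_effort_nand(n: int) -> Fraction:
    """n series nMOS are each sized n; the n parallel pMOS stay at width 2."""
    return Fraction(n + 2, INVERTER_INPUT_CAP)


def logical_effort_nor(n: int) -> Fraction:
    """n parallel nMOS stay at width 1; the n series pMOS are each sized 2n."""
    return Fraction(2 * n + 1, INVERTER_INPUT_CAP)


for n in (2, 3, 4):
    print(f"{n}-input: NAND g = {logical_effort_nand(n)}, NOR g = {logical_effort_nor(n)}")
```

This gives 4/3 for a 2-input NAND and 5/3 for a 2-input NOR, showing why NAND gates are generally preferred over NOR gates in CMOS.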
27. What is complementary pass transistor logic? State its advantages over CVSL. (Nov-14)
• CVSL is slow because one side of the gate pulls down, and only then does the cross-coupled pMOS transistor pull the other side up.
• The size of the cross-coupled device is an inherent compromise between a large transistor that fights the pulldown excessively and a small transistor that is slow pulling up.
CPL can be understood as an improvement on CVSL. CPL resolves this problem by making one half of the gate pull up while the other half pulls down.

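28. Why can single-phase dynamic logic structures not be cascaded? Justify. (May 2016)
If several stages of the previous CMOS dynamic logic circuit are cascaded together using the same clock φ, a problem in evaluation involving a built-in "race condition" will exist. All outputs are precharged high, so when evaluation begins the inputs of the second stage are high. If the first-stage output should fall, the second stage may already have been wrongly discharged during that delay, and a dynamic node cannot recover its charge until the next precharge.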
29. State the advantages of transmission gates.

PART B Questions

1. (i) Explain the static and dynamic power dissipation in CMOS circuits with necessary diagrams and expressions. (10) DEC 2011
2. a) Implement the equation X=((A +B )(C+D+E)+F)G using CMOS technology and draw the layout for this CMOS circuit. DEC 2012

3. Derive an expression for the rise time, fall time and propagation delay of a CMOS inverter. (16)

4. Explain the various ways to minimize the static and dynamic power dissipation. (16) DEC 2013
5. Discuss in detail the ratioed circuit and dynamic circuit CMOS logic configurations.
6. Describe the basic principle of operation of dynamic CMOS, domino and NP domino logic with neat diagrams.
7. Explain the static and dynamic power dissipation in CMOS circuits with necessary diagrams and expressions.
8. Discuss the design techniques to reduce switching activity in static and dynamic CMOS circuits.
9. Briefly discuss the classification of circuit families and compare the circuit families.
10. i) Draw the static CMOS logic circuit for the given expressions. (May-16)
Y= A . B . C . D


Y= D ¿ ¿)
ii) Discuss in detail the characteristics of a CMOS transmission gate.
11. What are the sources of power dissipation in CMOS? Discuss various design techniques to reduce power dissipation in CMOS circuits. (May-16)


UNIT III
SEQUENTIAL LOGIC
CIRCUITS
REFERRED BOOKS:
1. Jan Rabaey, Anantha Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design Perspective”, Second Edition, Prentice Hall of India, 2003.
2. N. Weste, K. Eshraghian, “Principles of CMOS VLSI Design”, Second Edition, Addison Wesley, 1993.
3. R. Jacob Baker, Harry W. Li, David E. Boyce, “CMOS Circuit Design, Layout and Simulation”, Prentice Hall of India, 2005.
4. A. Pucknell, Kamran Eshraghian, “Basic VLSI Design”, Third Edition, Prentice Hall of India, 2007.

STAFF IN-CHARGE HOD


UNIT III

3.1 INTRODUCTION

Combinational logic circuits that were described earlier have the property that the output of a logic block is only a function of the current input values, assuming that enough time has elapsed for the logic gates to settle. Yet virtually all useful systems require storage of state information, leading to another class of circuits called sequential logic circuits. In these circuits, the output not only depends upon the current values of the inputs, but also upon preceding input values. In other words, a sequential circuit remembers some of the past history of the system: it has memory.

Figure 3.1 shows a block diagram of a generic finite state machine (FSM) that consists of combinational logic and registers that hold the system state. The system depicted here belongs to the class of synchronous sequential systems, in which all registers are under control of a single global clock. The outputs of the FSM are a function of the current Inputs and the Current State. The Next State is determined based on the Current State and the current Inputs and is fed to the inputs of the registers.

On the rising edge of the clock, the Next State bits are copied to the outputs of the registers (after some propagation delay), and a new cycle begins. The register then ignores changes in the input signals until the next rising edge. In general, registers can be positive edge-triggered (where the input data is copied on the positive edge) or negative edge-triggered (where the input data is copied on the negative edge of the clock, as is indicated by a small circle at the clock input).

Figure 3.1 Block diagram of a finite state machine using positive edge-triggered registers.

This chapter discusses the CMOS implementation of the most important sequential building blocks. A variety of choices in sequential primitives and clocking methodologies exist; making the correct selection is getting increasingly important in modern digital circuits, and can have a great impact on performance, power, and/or design complexity. Before embarking on a detailed discussion of the various design options, a revision of the design metrics and a classification of the sequential elements is necessary.

3.2 Timing Metrics for Sequential Circuits

There are three important timing parameters associated with a register, as illustrated in Figure 3.2. The set-up time (tsu) is the time that the data input (D input) must be valid before the clock transition (that is, the 0-to-1 transition for a positive edge-triggered register).

The hold time (thold) is the time the data input must remain valid after the clock edge. Assuming that the set-up and hold times are met, the data at the D input is copied to the Q output after a worst-case propagation delay (with reference to the clock edge) denoted by tc-q. Given the timing information for the registers and the combinational logic, some system-level timing constraints can be derived.

Assume that the worst-case propagation delay of the logic equals tplogic, while its minimum delay (also called the contamination delay) is tcdlogic. The minimum clock period T required for proper operation of the sequential circuit is given by

$$T \geq t_{c-q} + t_{plogic} + t_{su} \qquad (3.1)$$

Figure 3.2 Definition of set-up time, hold time, and propagation delay of a synchronous register

The hold time of the register imposes an extra constraint for proper operation:

$$t_{cdregister} + t_{cdlogic} \geq t_{hold} \qquad (3.2)$$
where tcdregister is the minimum propagation delay (or contamination delay) of the register. As seen from Eq. (3.1), it is important to minimize the values of the timing parameters associated with the register, as these directly affect the rate at which a sequential circuit can be clocked. In fact, modern high-performance systems are characterized by a very low logic depth, and the register propagation delay and set-up times account for a significant portion of the clock period.

For example, the DEC Alpha EV6 microprocessor has a maximum logic depth of 12 gates, and the register overhead accounts for approximately 15% of the clock period. In general, the requirement of Eq. (3.2) is not hard to meet, although it becomes an issue when there is little or no logic between registers, or when the clocks at different registers are somewhat out of phase due to clock skew (as will be discussed in a later chapter).
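The two timing constraints, T ≥ tc-q + tplogic + tsu and tcdregister + tcdlogic ≥ thold, can be checked with a few lines of code. The sketch below works in picoseconds; the register and logic delays are assumed illustrative numbers, and clock skew is ignored.

```python
def min_clock_period(t_clk_q: float, t_plogic: float, t_setup: float) -> float:
    """Smallest clock period allowed by T >= t_c-q + t_plogic + t_su."""
    return t_clk_q + t_plogic + t_setup


def hold_ok(t_cd_register: float, t_cd_logic: float, t_hold: float) -> bool:
    """Hold constraint: t_cdregister + t_cdlogic >= t_hold."""
    return t_cd_register + t_cd_logic >= t_hold


# Assumed register timing (ps): clock-to-Q, set-up, hold, register contamination delay.
T_CQ, T_SU, T_HOLD, T_CD_REG = 50.0, 30.0, 40.0, 30.0

period = min_clock_period(T_CQ, 400.0, T_SU)   # 400 ps of worst-case logic delay
overhead = (T_CQ + T_SU) / period               # fraction of the period spent in the register
print(f"T_min = {period:.0f} ps, f_max = {1e3 / period:.2f} GHz, overhead = {overhead:.0%}")

# Back-to-back registers (no logic) violate hold; 20 ps of minimum logic delay fixes it.
print(hold_ok(T_CD_REG, 0.0, T_HOLD), hold_ok(T_CD_REG, 20.0, T_HOLD))
```

With these numbers the 80 ps of register overhead is about 17% of the 480 ps period, and the hold check fails only when there is no logic between the registers.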


3.3 Classification of Memory Elements

Foreground versus Background Memory
At a high level, memory is classified into background and foreground memory. Memory that is embedded into logic is foreground memory, and is most often organized as individual registers or register banks. Large amounts of centralized memory core are referred to as background memory. In this chapter, we focus on foreground memories.

Static versus Dynamic Memory

Memories can be static or dynamic. Static memories preserve the state as long as the power is turned on. Static memories are built using positive feedback or regeneration, where the circuit topology consists of intentional connections between the output and the input of a combinational circuit. Static memories are most useful when the register won’t be updated for extended periods of time.

An example of such is configuration data, loaded at power-up time. This condition also holds for most processors that use conditional clocking (i.e., gated clocks) where the clock is turned off for unused modules. In that case, there are no guarantees on how frequently the registers will be clocked, and static memories are needed to preserve the state information. Memory based on positive feedback falls under the class of elements called multivibrator circuits.

The bistable element is its most popular representative, but other elements such as monostable and astable circuits are also frequently used. Dynamic memories store state for a short period of time, on the order of milliseconds. They are based on the principle of temporary charge storage on parasitic capacitors associated with MOS devices. As with dynamic logic discussed earlier, the capacitors have to be refreshed periodically to compensate for charge leakage.

Dynamic memories tend to be simpler, resulting in significantly higher performance and lower power dissipation. They are most useful in datapath circuits that require high performance levels and are periodically clocked. It is possible to use dynamic circuitry even when circuits are conditionally clocked, if the state can be discarded when a module goes into idle mode.
Latches vs. Registers
A latch is an essential component in the construction of an edge-triggered register. It is a level-sensitive circuit that passes the D input to the Q output when the clock signal is high. This latch is said to be in transparent mode. When the clock is low, the input data sampled on the falling edge of the clock is held stable at the output for the entire phase, and the latch is in hold mode. The inputs must be stable for a short period around the falling edge of the clock to meet set-up and hold requirements.

A latch operating under the above conditions is a positive latch. Similarly, a negative latch passes the D input to the Q output when the clock signal is low. The signal waveforms for a positive and negative latch are shown in Figure 3.3.


A wide variety of static and dynamic implementations exists for the realization of latches. Contrary to level-sensitive latches, edge-triggered registers only sample the input on a clock transition: 0-to-1 for a positive edge-triggered register, and 1-to-0 for a negative edge-triggered register. They are typically built using the latch primitives of Figure 3.3. A most-often recurring configuration is the master-slave structure that cascades a positive and a negative latch. Registers can also be constructed using one-shot generators of the clock signal (“glitch” registers), or using other specialized structures. Examples of these are shown later in this chapter.

Figure 3.3 Timing of positive and negative latches.
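The difference between a level-sensitive latch and an edge-triggered master-slave register can be seen in a zero-delay, step-by-step Python model. In this sketch, a positive edge-triggered register is built from a negative latch (master) followed by a positive latch (slave); the class names and the sampled clock/data sequences are illustrative assumptions, and set-up and hold times are not modeled.

```python
class Latch:
    """Level-sensitive D latch: transparent while clk equals active_level."""

    def __init__(self, active_level: int):
        self.active_level = active_level  # 1 = positive latch, 0 = negative latch
        self.q = 0

    def update(self, d: int, clk: int) -> int:
        if clk == self.active_level:
            self.q = d        # transparent mode: Q follows D
        return self.q         # hold mode otherwise: Q keeps its last value


class MasterSlaveRegister:
    """Negative latch (master) feeding a positive latch (slave): rising-edge register."""

    def __init__(self):
        self.master = Latch(active_level=0)
        self.slave = Latch(active_level=1)

    def update(self, d: int, clk: int) -> int:
        return self.slave.update(self.master.update(d, clk), clk)


clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]

positive_latch = Latch(active_level=1)
register = MasterSlaveRegister()
latch_out = [positive_latch.update(di, ci) for di, ci in zip(d, clk)]
reg_out = [register.update(di, ci) for di, ci in zip(d, clk)]
print("latch   :", latch_out)
print("register:", reg_out)
```

The positive latch follows D whenever the clock is high (its output changes at step 2 while the clock is still high), whereas the register output changes only when the clock rises, taking the value D had just before the edge.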


3.4 Static Latches and Registers
In static latches and registers, the following are discussed:
I) Bistability principle
II) Multiplexer-based latches
III) Master-slave edge-triggered registers
IV) Low-voltage static latches
V) Static SR flip-flops
3.4.1 The Bistability Principle
Static memories use positive feedback to create a bistable circuit, a circuit having two stable states that represent 0 and 1. The basic idea is shown in Figure 3.4a, which shows two inverters connected in cascade along with a voltage-transfer characteristic typical of such a circuit. Also plotted are the VTCs of the first inverter, that is, Vo1 versus Vi1, and the second inverter (Vo2 versus Vo1). The latter plot is rotated to accentuate that Vi2 = Vo1. Assume now that the output of the second inverter Vo2 is connected to the input of the first, Vi1, as shown by the dotted lines in Figure 3.4a.

The resulting circuit has only three possible operation points (A, B, and C), as demonstrated on the combined VTC. The following important conjecture is easily proven to be valid: under the condition that the gain of the inverter in the transition region is larger than 1, only A and B are stable operation points, and C is a metastable operation point. Suppose that the cross-coupled inverter pair is biased at point C.

A small deviation from this bias point, possibly caused by noise, is amplified and regenerated around the circuit loop. This is a consequence of the gain around the loop being larger than 1. The effect is demonstrated in Figure 3.4a. A small deviation δ is applied to Vi1 (biased in C). This deviation is amplified by the gain of the inverter. The enlarged divergence is applied to the second inverter and amplified once more. The bias point moves away from C until one of the operation points A or B is reached. In conclusion, C is an unstable operation point. Every deviation (even the smallest one) causes the operation point to run away from its original bias. The chance is indeed very small that the cross-coupled inverter pair is biased at C and stays there. Operation points with this property are termed metastable.

On the other hand, A and B are stable operation points, as demonstrated in Figure 3.4b. In these points, the loop gain is much smaller than unity. Even a rather large deviation from the operation point is reduced in size and disappears. Hence the cross-coupling of two inverters results in a bistable circuit, that is, a circuit with two stable states, each corresponding to a logic state. The circuit serves as a memory, storing either a 1 or a 0 (corresponding to positions A and B).

In order to change the stored value, we must be able to bring the circuit from state A to B and vice versa. Since the precondition for stability is that the loop gain G is smaller than unity, we can achieve this by making A (or B) temporarily unstable by increasing G to a value larger than 1. This is generally done by applying a trigger pulse at Vi1 or Vi2. For instance, assume that the system is in position A (Vi1 = 0, Vi2 = 1). Forcing Vi1 to 1 causes both inverters to be on simultaneously for a short time and the loop gain G to be larger than 1. The positive feedback regenerates the effect of the trigger pulse, and the circuit moves to the other state (B in this case). The width of the trigger pulse needs to be only a little larger than the total propagation delay around the circuit loop, which is twice the average propagation delay of the inverters.

In summary, a bistable circuit has two stable states. In the absence of any triggering, the circuit remains in a single state (assuming that the power supply remains applied to the circuit), and hence remembers a value. A trigger pulse must be applied to change the state of the circuit. Another common name for a bistable circuit is flip-flop (unfortunately, an edge-triggered register is also referred to as a flip-flop).


Figure 3.4 Two cascaded inverters (a) and their VTCs (b).


3.4.2 SR Flip-Flops

The cross-coupled inverter pair shown in the previous section provides an approach to store a binary variable in a stable way. However, extra circuitry must be added to enable control of the memory states. The simplest incarnation accomplishing this is the well-known SR (set-reset) flip-flop, an implementation of which is shown in Figure 3.5a.

This circuit is similar to the cross-coupled inverter pair, with NOR gates replacing the inverters. The second input of each NOR gate is connected to a trigger input (S or R), which makes it possible to force the outputs Q and Qbar to a given state. These outputs are complementary (except for the SR = 11 state).

When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their value (a NOR gate with one of its inputs at 0 behaves like an inverter, so the structure looks like a cross-coupled inverter pair). If a positive (or 1) pulse is applied to the S input, the Q output is forced into the 1 state (with Qbar going to 0). Vice versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0.


Figure 3.5 NOR-based SR flip-flop.


These results are summarized in the characteristic table of the flip-flop, shown in Figure 3.5c. The characteristic table is the truth table of the gate and lists the output states as functions of all possible input conditions. When both S and R are high, both Q and Qbar are forced to zero. Since this does not correspond with our constraint that Q and Qbar must be complementary, this input mode is considered to be forbidden. An additional problem with this condition is that when the input triggers return to their zero levels, the resulting state of the latch is unpredictable and depends on whichever input is last to go low.
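The characteristic table can also be generated from a gate-level model of the two cross-coupled NOR gates. The sketch below is a deliberately simple abstraction (zero-delay gates, simultaneous updates, hypothetical helper names) intended only to reproduce the table, including the forbidden S = R = 1 row.

```python
def nor(a, b):
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))


def sr_latch(s, r, q, qbar, max_iter=10):
    """Settle the cross-coupled NOR pair starting from the state (q, qbar).

    Q = NOR(R, Qbar) and Qbar = NOR(S, Q) are updated simultaneously each
    step (zero-delay gate model, not a timing simulation). Raises
    RuntimeError if the outputs keep toggling.
    """
    for _ in range(max_iter):
        nxt = (nor(r, qbar), nor(s, q))
        if nxt == (q, qbar):
            return q, qbar
        q, qbar = nxt
    raise RuntimeError("outputs did not settle")


# Characteristic table: every (S, R) combination from both stored states.
for s, r in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    for q0 in (0, 1):
        print(f"S={s} R={r} Q(prev)={q0} -> (Q, Qbar) = {sr_latch(s, r, q0, 1 - q0)}")

# Releasing S = R = 1 to S = R = 0 at the same instant, from Q = Qbar = 0:
try:
    sr_latch(0, 0, 0, 0)
except RuntimeError:
    print("no stable state reached: the outcome depends on which input falls last")
```

In this zero-delay model the simultaneous release from S = R = 1 oscillates indefinitely; in a real circuit, unequal gate delays break the tie, which is why the resulting state is described above as unpredictable.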

Figure 3.5 also shows the schematic symbol of the SR flip-flop. The SR flip-flops discussed so far are asynchronous and do not require a clock signal. Most systems, however, operate in a synchronous fashion, with transition events referenced to a clock.

One possible realization of a clocked SR flip-flop, a level-sensitive positive latch, is shown in Figure 3.6. It consists of a cross-coupled inverter pair plus 4 extra transistors to drive the flip-flop from one state to another and to provide clocked operation.
Observe that the number of transistors is identical to the implementation of Figure 3.5, but the circuit has the added feature of being clocked. The drawback of saving some transistors over a fully complementary CMOS implementation is that transistor sizing becomes critical in ensuring proper functionality.
Consider the case where Q is high and an R pulse is applied. The combination of transistors M4, M7, and M8 forms a ratioed inverter. In order to make the latch switch, we must succeed in bringing Q below the switching threshold of the inverter M1-M2. Once this is achieved, the positive feedback causes the flip-flop to invert states. This requirement forces us to increase the sizes of transistors M5, M6, M7, and M8.
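The ratioed-inverter requirement can be estimated by hand with long-channel square-law equations. The sketch below is illustrative only: all device parameters are hypothetical, M4 is taken to be the PMOS holding Q high (gate at Qbar = 0), M7 and M8 are the series NMOS devices driven by CLK and R (lumped into one device of half their W/L), and the switching threshold of M1-M2 is assumed to be VDD/2. Bisection finds the voltage at which the pull-down and pull-up currents balance.

```python
VDD = 2.5        # supply voltage (V), hypothetical
VTN = 0.4        # NMOS threshold (V), hypothetical
VTP = 0.4        # |PMOS threshold| (V), hypothetical
KN = 100e-6      # NMOS k' = mu_n * Cox (A/V^2), hypothetical
KP = 40e-6       # PMOS k' = mu_p * Cox (A/V^2), hypothetical
VM = VDD / 2     # switching threshold of inverter M1-M2, assumed symmetric


def drain_current(k, vov, vds):
    """Long-channel square-law current for overdrive vov > 0 and vds >= 0."""
    if vds < vov:                          # linear (triode) region
        return k * (vov * vds - vds ** 2 / 2)
    return k / 2 * vov ** 2                # saturation


def q_during_reset(wl_n, wl_p):
    """Voltage on Q while series M7-M8 (gates at VDD) fight the on PMOS M4."""
    k_down = KN * wl_n / 2                 # two equal series devices ~ half the W/L
    k_up = KP * wl_p
    lo, hi = 0.0, VDD
    for _ in range(60):
        vq = (lo + hi) / 2
        i_down = drain_current(k_down, VDD - VTN, vq)
        i_up = drain_current(k_up, VDD - VTP, VDD - vq)
        if i_down < i_up:
            lo = vq                        # pull-up still wins: Q settles higher
        else:
            hi = vq
    return (lo + hi) / 2


for wl_n in (1, 2, 4, 8):
    vq = q_during_reset(wl_n, wl_p=2)
    print(f"W/L of M7, M8 = {wl_n}: Q = {vq:.2f} V, latch flips: {vq < VM}")
```

Because equal thresholds are assumed, Q drops below VDD/2 exactly when the lumped pull-down transconductance exceeds that of M4, here when the W/L of M7 and M8 exceeds 1.6 for a PMOS W/L of 2. Short-channel effects and the body effect are ignored, so a real design would confirm the sizing in a circuit simulator.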

Figure 3.6 CMOS clocked SR flip-flop.


The presented flip-flop does not consume any static power. In steady state, one inverter resides in the high state, while the other one is low. No static paths between VDD and GND can exist except during switching.
3.4.3 Multiplexer Based Latches


There are many approaches for constructing latches. One very common technique involves the use of transmission-gate multiplexers. Multiplexer-based latches can provide similar functionality to the SR latch, but have the important added advantage that the sizing of devices only affects performance and is not critical to the functionality.

Figure 3.7 shows an implementation of static positive and negative latches based on multiplexers. For a negative latch, when the clock signal is low, input 0 of the multiplexer is selected, and the D input is passed to the output. When the clock signal is high, input 1 of the multiplexer, which connects to the output of the latch, is selected. The feedback holds the output stable while the clock signal is high. Similarly, in the positive latch, the D input is selected when the clock is high, and the output is held (using feedback) when the clock is low.
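At the behavioural level, both multiplexer-based latches reduce to a 2:1 selection between D and the latch's own output. A minimal functional sketch, ignoring all delays (the class name `MuxLatch` is hypothetical):

```python
class MuxLatch:
    """Behavioural model of a multiplexer-based static latch.

    positive=True : D is selected while clk == 1, feedback holds while clk == 0.
    positive=False: D is selected while clk == 0, feedback holds while clk == 1.
    """

    def __init__(self, positive, q=0):
        self.positive = positive
        self.q = q

    def evaluate(self, clk, d):
        transparent = (clk == 1) if self.positive else (clk == 0)
        # Multiplexer: input D when transparent, otherwise the latch's own output.
        self.q = d if transparent else self.q
        return self.q


neg, pos = MuxLatch(positive=False), MuxLatch(positive=True)
for clk, d in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    print(f"CLK={clk} D={d}: negative latch Q={neg.evaluate(clk, d)}, "
          f"positive latch Q={pos.evaluate(clk, d)}")
```

The negative latch follows D while the clock is low and holds while it is high; the positive latch does the opposite.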

A transistor-level implementation of a positive latch based on multiplexers is shown in Figure 3.8. When CLK is high, the bottom transmission gate is on and the latch is transparent, that is, the D input is copied to the Q output. During this phase, the feedback loop is open since the top transmission gate is off.

Unlike the SR flip-flop, the feedback does not have to be overridden to write the memory, and hence sizing of transistors is not critical for realizing correct functionality. The number of transistors that the clock touches is important since the clock has an activity factor of 1. This particular latch implementation is not particularly efficient by this metric, as it presents a load of 4 transistors to the CLK signal.

Figure 3.7 Negative and positive latches based on multiplexers.


Figure 3.8 Transistor-level implementation of a positive latch built using transmission gates.
It is possible to reduce the clock load to two transistors by implementing the multiplexers with NMOS-only pass transistors, as shown in Figure 3.9. The advantage of this approach is the reduced clock load of only two NMOS devices.

When CLK is high, the latch samples the D input, while a low clock signal enables the feedback loop and puts the latch in the hold mode. While attractive for its simplicity, the use of NMOS-only pass transistors results in the passing of a degraded high voltage of VDD-VTn to the input of the first inverter.

This impacts both the noise margin and the switching performance, especially in the case of low values of VDD and high values of VTn. It also causes static power dissipation in the first inverter. Since the maximum input voltage to the inverter equals VDD-VTn, the PMOS device of the inverter sees a source-gate voltage of only VTn and is never fully turned off (it conducts whenever VTn exceeds |VTp|), resulting in a static current flow.
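The size of the degraded high level can be estimated by solving Vout = VDD - VTn(Vout), where VTn includes the body effect because the pass transistor's source is the output node. The parameters below are hypothetical long-channel values; the sketch bisects for the fixed point and then checks how much source-gate voltage is left on the first inverter's PMOS.

```python
import math

VDD = 2.5      # supply (V), hypothetical
VT0 = 0.43     # zero-bias NMOS threshold (V), hypothetical
GAMMA = 0.4    # body-effect coefficient (V^0.5), hypothetical
PHI2F = 0.6    # |2 * phi_F| (V), hypothetical
VTP = 0.4      # |PMOS threshold| of the first inverter (V), hypothetical


def vtn(vsb):
    """NMOS threshold including body effect (source-bulk voltage vsb)."""
    return VT0 + GAMMA * (math.sqrt(PHI2F + vsb) - math.sqrt(PHI2F))


def degraded_high():
    """Solve Vout = VDD - VTn(Vout); Vout + VTn(Vout) grows with Vout, so bisect."""
    lo, hi = 0.0, VDD
    for _ in range(60):
        mid = (lo + hi) / 2
        if mid + vtn(mid) < VDD:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


v_high = degraded_high()
vsg_pmos = VDD - v_high          # source-gate voltage left on the inverter's PMOS
print(f"pass-transistor output high level: {v_high:.2f} V (VDD = {VDD} V)")
print(f"PMOS source-gate voltage: {vsg_pmos:.2f} V, still conducting: {vsg_pmos > VTP}")
```

With these assumptions the node stops roughly 0.7 V below VDD, and the PMOS of the first inverter keeps a source-gate voltage above its threshold, which is the static-current mechanism described above.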

Figure 3.9 Multiplexer-based NMOS latch using NMOS-only pass transistors for the multiplexers

3.4.4 Master-Slave Based Edge Triggered Register

The most common approach for constructing an edge-triggered register is to use a master-slave configuration, as shown in Figure 3.10. The register consists of a negative latch (master stage) cascaded with a positive latch (slave stage). A multiplexer-based latch is used in this particular implementation, though any latch can be used to realize the master and slave stages.

On the low phase of the clock, the master stage is transparent and the D input is passed to the master stage output, QM. During this period, the slave stage is in the hold mode, keeping its previous value using feedback. On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling.

During the high phase of the clock, the slave stage samples the output of the master stage (QM), while the master stage remains in a hold mode. Since QM is constant during the high phase of the clock, the output Q makes only one transition per cycle. The value of Q is the value of D right before the rising edge of the clock, achieving the positive edge-triggered effect. A negative edge-triggered register can be constructed using the same principle by simply switching the order of the positive and negative latch (i.e., placing the positive latch first).
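The edge-triggered behaviour can be illustrated by cascading two behavioural latch models, one sample per time step. This is a functional sketch only (no delays; the helper names `latch` and `master_slave` are hypothetical): the master is evaluated before the slave in every step, mirroring the data flow D → QM → Q.

```python
def latch(q, clk, d, positive):
    """One multiplexer-based latch: pass D when transparent, else hold q."""
    transparent = (clk == 1) if positive else (clk == 0)
    return d if transparent else q


def master_slave(clk_wave, d_wave):
    """Negative (master) latch followed by a positive (slave) latch."""
    qm = q = 0
    out = []
    for clk, d in zip(clk_wave, d_wave):
        qm = latch(qm, clk, d, positive=False)   # master: transparent while CLK = 0
        q = latch(q, clk, qm, positive=True)     # slave: transparent while CLK = 1
        out.append(q)
    return out


clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
d   = [1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
q_wave = master_slave(clk, d)
for step, (c, x, q) in enumerate(zip(clk, d, q_wave)):
    print(f"step {step:2d}: CLK={c} D={x} Q={q}")
```

Q changes only in steps where CLK has just risen (steps 6 and 10), taking the D value present just before that edge; the change of D at step 3, while CLK is high, is never seen.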

Figure 3.10 Positive edge-triggered register based on a master-slave configuration.
A complete transistor-level implementation of the master-slave positive edge-triggered register is shown in Figure 3.11. The multiplexer is implemented using transmission gates as discussed in the previous section. When the clock is low (CLKbar = 1), T1 is on and T2 is off, and the D input is sampled onto node QM. During this period, T3 is off and T4 is on, and the cross-coupled inverters (I5, I6) hold the state of the slave latch. When the clock goes high, the master stage stops sampling the input and goes into a hold mode. T1 is off and T2 is on, and the cross-coupled inverters I2 and I3 hold the state of QM. Also, T3 is on and T4 is off, and QM is copied to the output Q.

Figure 3.11 Transistor-level implementation of a master-slave positive edge-triggered register using multiplexers
Timing Properties of the Multiplexer-Based Master-Slave Register

As discussed earlier, there are three important timing metrics in registers: the set-up time, the hold time and the propagation delay. It is important to understand the factors that affect these timing parameters and to develop the intuition to estimate them manually. Assume that the propagation delay of each inverter is tpd_inv and the propagation delay of the transmission gate is tpd_tx.


Also assume that the contamination delay is 0 and that the inverter used to derive CLKbar from CLK has zero delay. The set-up time is the time before the rising edge of the clock by which the input data D must become valid. Another way to ask the question is: how long before the rising edge does the D input have to be stable such that QM samples the value reliably?

For the transmission-gate multiplexer-based register, the input D has to propagate through I1, T1, I3 and I2 before the rising edge of the clock. This is to ensure that the node voltages on both terminals of the transmission gate T2 are at the same value. Otherwise, it is possible for the cross-coupled pair I2 and I3 to settle to an incorrect value. The set-up time is therefore equal to 3 × tpd_inv + tpd_tx.

The propagation delay is the time for the value of QM to propagate to the output Q. Note that since we included the delay of I2 in the set-up time, the output of I4 is valid before the rising edge of the clock. Therefore the delay tc-q is simply the delay through T3 and I6 (tc-q = tpd_tx + tpd_inv). The hold time represents the time that the input must be held stable after the rising edge of the clock. In this case, the transmission gate T1 turns off when the clock goes high, and therefore any changes in the D input after the clock goes high are not seen by the master stage.

Therefore, the hold time is 0.
As mentioned earlier, the drawback of the transmission-gate register is the high capacitive load presented to the clock signal. The clock load per register is important since it directly impacts the power dissipation of the clock network. Ignoring the overhead required to invert the clock signal (since the buffer inverter overhead can be amortized over multiple register bits), each register has a clock load of 8 transistors. One approach to reduce the clock load at the cost of robustness is to make the circuit ratioed. Figure 3.12 shows that the feedback transmission gate can be eliminated by directly cross-coupling the inverters.
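The hand analysis of the transmission-gate master-slave register reduces to three expressions. The sketch below simply encodes them under the same assumptions as the text (zero contamination delay, an ideal clock inverter); the delay values in the example call are hypothetical.

```python
def tg_register_timing(t_inv, t_tx):
    """Set-up, clock-to-Q and hold time of the transmission-gate master-slave register."""
    t_setup = 3 * t_inv + t_tx    # D must pass I1, T1, I3 and I2 before the rising edge
    t_c_q = t_tx + t_inv          # QM passes T3 and I6 to reach Q
    t_hold = 0.0                  # T1 turns off on the rising edge
    return t_setup, t_c_q, t_hold


# Hypothetical delays in ps: 25 ps per inverter, 20 ps per transmission gate.
t_su, t_cq, t_h = tg_register_timing(t_inv=25, t_tx=20)
print(f"t_su = {t_su} ps, t_c-q = {t_cq} ps, t_hold = {t_h} ps")
```

The set-up time is dominated by the three inverter delays in the master loop, whereas the clock-to-Q delay only involves one transmission gate and one inverter.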

Figure 3.12 Reduced clock load static master-slave register
The penalty for the reduced clock load is increased design complexity. The transmission gate (T1) and its source driver must overpower the feedback inverter (I2) to switch the state of the cross-coupled inverter pair. The sizing requirements for the transmission gates can be derived using an analysis similar to the one performed for the SR flip-flop. The input to the inverter I1 must be brought below its switching threshold in order to make a transition. If minimum-sized devices are to be used in the transmission gates, it is essential that the transistors of inverter I2 be made even weaker. This can be accomplished by making their channel lengths larger than minimum.
Using minimum or close-to-minimum-sized devices in the transmission gates is desirable to reduce the power dissipation in the latches and the clock distribution network. Another problem with this scheme is reverse conduction: the second stage can affect the state of the first latch. When the slave stage is on (Figure 3.13), it is possible for the combination of T2 and I4 to influence the data stored in the I1-I2 latch. As long as I4 is a weak device, this is fortunately not a major problem.

Figure 3.13 Reverse conduction possible in the transmission gate.


3.4.5 Non-ideal clock signals

So far, we have assumed that CLKbar is a perfect inversion of CLK, or in other words, that the delay of the generating inverter is zero. Even if this were possible, this would still not be a good assumption. Variations can exist in the wires used to route the two clock signals, or the load capacitances can vary based on data stored in the connecting latches. This effect, known as clock skew, is a major problem and causes the two clock signals to overlap, as shown in Figure 3.14b. Clock overlap can cause two types of failures, as illustrated for the NMOS-only negative master-slave register of Figure 3.14a.

Figure 3.14 Master-slave register based on NMOS-only pass transistors.

• When the clock goes high, the slave stage should stop sampling the master stage output and go into a hold mode. However, since CLK and CLKbar are both high for a short period of time (the overlap period), both sampling pass transistors conduct and there is a direct path from the D input to the Q output. As a result, data at the output can change on the rising edge of the clock, which is undesired for a negative edge-triggered register. This is known as a race condition, in which the value of the output Q is a function of whether the input D arrives at node X before or after the falling edge of CLKbar. If node X is sampled in the metastable state, the output will switch to a value determined by noise in the system.

• The primary advantage of the multiplexer-based register is that the feedback loop is open during the sampling period, and therefore sizing of devices is not critical to functionality. However, if there is clock overlap between CLK and CLKbar, node A can be driven by both D and B, resulting in an undefined state.

Those problems can be avoided by using two non-overlapping clocks PHI1 and PHI2 instead (Figure 3.15), and by keeping the non-overlap time tnon_overlap between the clocks large enough that no overlap occurs even in the presence of clock-routing delays.
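Non-overlapping clocks and the effect of routing skew can be visualized with sampled waveforms. In this sketch the period, the non-overlap time and the skew are all hypothetical integer time units; PHI2 is delayed to mimic a routing delay, and the check reports whether PHI1 and PHI2 are ever high together.

```python
def two_phase_clocks(period, t_nonoverlap, n_cycles=2):
    """Sampled PHI1/PHI2 waveforms, one sample per time unit.

    PHI1 is high for the first half-period minus the non-overlap gap, both
    phases are then low for t_nonoverlap, PHI2 is high, and both are low again.
    """
    half = period // 2
    phi1, phi2 = [], []
    for t in range(n_cycles * period):
        p = t % period
        phi1.append(1 if p < half - t_nonoverlap else 0)
        phi2.append(1 if half <= p < period - t_nonoverlap else 0)
    return phi1, phi2


def delay(wave, skew):
    """Delay a periodic sampled waveform by `skew` samples (models routing delay)."""
    return wave[-skew:] + wave[:-skew] if skew else list(wave)


def overlaps(a, b):
    """True if both waveforms are ever high in the same sample."""
    return any(x and y for x, y in zip(a, b))


phi1, phi2 = two_phase_clocks(period=20, t_nonoverlap=2)
for skew in range(5):
    print(f"PHI2 delayed by {skew}: overlap = {overlaps(phi1, delay(phi2, skew))}")
```

With a non-overlap time of 2 units, the phases stay disjoint for skews up to 2 units and overlap from 3 units on, which is why tnon_overlap must exceed the worst-case clock-routing delay difference.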

During the non-overlap time, the flip-flop is in the high-impedance state: the feedback loop is open, the loop gain is zero, and the input is disconnected. Leakage will destroy the state if this condition holds for too long a time. Hence the name pseudostatic: the register employs a combination of static and dynamic storage approaches depending upon the state of the clock.

Figure 3.15 Pseudostatic two-phase D register


Low-Voltage Static Latches
The scaling of supply voltages is critical for low-power operation. Unfortunately, certain latch structures don't function at reduced supply voltages. For example, without the scaling of device thresholds, NMOS-only pass transistors (e.g., Figure 3.9) don't scale well with supply voltage due to their inherent threshold drop. At very low power supply voltages, the input to the inverter cannot be raised above the switching threshold, resulting in incorrect evaluation.

Even with the use of transmission gates, performance degrades significantly at reduced supply voltages. Scaling to low supply voltages hence requires the use of reduced-threshold devices. However, this has the negative effect of exponentially increasing the sub-threshold leakage power, as discussed in Chapter 6. When the registers are constantly accessed, the leakage energy is typically insignificant compared to the switching power.
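The exponential dependence can be quantified with the usual sub-threshold model, in which the leakage current scales as 10^(-VT/S), S being the sub-threshold swing. The swing of 90 mV/decade and the threshold values below are hypothetical, and temperature and DIBL effects are ignored.

```python
def leakage_ratio(vt_high, vt_low, swing=0.09):
    """Growth factor of sub-threshold leakage when VT is lowered.

    Assumes I_sub is proportional to 10 ** (-VT / S), with S the sub-threshold
    swing in volts per decade (0.09 V/decade is a hypothetical value).
    """
    return 10 ** ((vt_high - vt_low) / swing)


# Hypothetical example: lowering VT from 0.45 V to 0.25 V.
print(f"leakage grows by a factor of {leakage_ratio(0.45, 0.25):.0f}")
```

Every reduction of VT by one swing (90 mV here) multiplies the leakage by ten, so a few hundred millivolts of threshold reduction costs two or more orders of magnitude in idle leakage.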

However, with the use of conditional clocks, it is possible that registers are idle for extended periods, and the leakage energy expended by registers can be quite significant. Many solutions are being explored to address the problem of high leakage during idle periods. One approach involves the use of multiple-threshold devices, as shown in the figure.

Only the negative latch is shown here. The shaded inverters and transmission gates are implemented in low-threshold devices. The low-threshold inverters are gated using high-threshold devices to eliminate leakage. During the normal mode of operation, the sleep devices are turned on. When the clock is low, the D input is sampled and propagates to the output. When the clock is high, the latch is in the hold mode.

The feedback transmission gate conducts and the cross-coupled feedback is enabled. Note that there is an extra inverter, needed for storage of state when the latch is in the sleep state. During idle mode, the high-threshold devices in series with the low-threshold inverter are turned off (the SLEEP signal is high), eliminating leakage. It is assumed that the clock is in the high state when the latch is in the sleep state. The feedback low-threshold transmission gate is turned on and the cross-coupled high-threshold devices maintain the state of the latch.

Figure 3.16 One solution for the leakage problem in low-voltage operation using MTCMOS.
3.5 Dynamic Latches and Registers
Storage in a static sequential circuit relies on the concept that a cross-coupled inverter pair produces a bistable element and can thus be used to memorize binary values. This approach has the useful property that a stored value remains valid as long as the supply voltage is applied to the circuit, hence the name static.


The major disadvantage of the static gate, however, is its complexity. When registers are used in computational structures that are constantly clocked, such as a pipelined datapath, the requirement that the memory should hold state for extended periods of time can be significantly relaxed.

This results in a class of circuits based on temporary storage of charge on parasitic capacitors. The principle is exactly identical to the one used in dynamic logic: charge stored on a capacitor can be used to represent a logic signal. The absence of charge denotes a 0, while its presence stands for a stored 1. No capacitor is ideal, unfortunately, and some charge leakage is always present.

A stored value can hence only be kept for a limited amount of time, typically in the range of milliseconds. If one wants to preserve signal integrity, a periodic refresh of its value is necessary. Hence the name dynamic storage. Reading the value of the stored signal from a capacitor without disrupting the charge requires the availability of a device with a high input impedance.
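A rough estimate of how long a dynamic node holds its value follows from t = C·ΔV/I_leak. The node capacitance, tolerable voltage droop and leakage current in this sketch are hypothetical round numbers, and the leakage is assumed constant although in practice it depends on the node voltage.

```python
def retention_time(c_node, delta_v, i_leak):
    """Time for a constant leakage current i_leak to remove delta_v from c_node."""
    return c_node * delta_v / i_leak


# Hypothetical values: 10 fF node, 0.5 V tolerable droop, 1 pA leakage.
t_store = retention_time(c_node=10e-15, delta_v=0.5, i_leak=1e-12)
print(f"retention time ~ {t_store * 1e3:.1f} ms")  # 5.0 ms
```

The result lands in the millisecond range mentioned above; a register clocked at hundreds of MHz refreshes the node many orders of magnitude more often than that.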

3.5.1 Dynamic Transmission-Gate Based Edge-Triggered Registers


A fully dynamic positive edge-triggered register based on the master-slave concept is shown in Figure 3.16. When CLK = 0, the input data is sampled on storage node 1, which has an equivalent capacitance of C1 consisting of the gate capacitance of I1, the junction capacitance of T1, and the overlap gate capacitance of T1.
During this period, the slave stage is in a hold mode, with node 2 in a high-impedance (floating) state. On the rising edge of the clock, the transmission gate T2 turns on, and the value sampled on node 1 right before the rising edge propagates to the output Q (note that node 1 is stable during the high phase of the clock since the first transmission gate is turned off). Node 2 now stores the inverted version of node 1.
This implementation of an edge-triggered register is very efficient, as it requires only 8 transistors. The sampling switches can be implemented using NMOS-only pass transistors, resulting in an even simpler 6-transistor implementation. The reduced transistor count is attractive for high-performance and low-power systems.

Figure 3.16 Dynamic edge-triggered register.


The set-up time of this circuit is simply the delay of the transmission gate, and corresponds to the time it takes node 1 to sample the D input. The hold time is approximately zero, since the transmission gate is turned off on the clock edge and further input changes are ignored. The propagation delay (tc-q) is equal to two inverter delays plus the delay of the transmission gate T2.


One important consideration for such a dynamic register is that the storage nodes (i.e., the state) have to be refreshed at periodic intervals to prevent the stored value from being lost through charge leakage, caused by diode leakage as well as sub-threshold currents. In datapath circuits, the refresh rate is not an issue since the registers are periodically clocked and the storage nodes are constantly updated.

Clock overlap is an important concern for this register. Consider the clock waveforms shown in Figure 3.17. During the 0-0 overlap period, the NMOS of T1 and the PMOS of T2 are simultaneously on, creating a direct path for data to flow from the D input of the register to the Q output. This is known as a race condition. The output Q can change on the falling edge if the overlap period is large, which is obviously an undesirable effect for a positive edge-triggered register. The same is true for the 1-1 overlap region, where an input-output path exists through the PMOS of T1 and the NMOS of T2.

The latter case is taken care of by enforcing a hold time constraint. That is, the data must be stable during the high-high overlap period. The former situation (0-0 overlap) can be addressed by making sure that there is enough delay between the D input and node 2, ensuring that new data sampled by the master stage does not propagate through to the slave stage. Generally the built-in single inverter delay should be sufficient, and the overlap period constraint is given as:

$t_{overlap,0-0} < t_{T1} + t_{I1} + t_{T2}$

Similarly, the constraint for the 1-1 overlap is given as:

$t_{overlap,1-1} < t_{hold}$

Figure 3.17 Impact of non-overlapping clocks

3.5.2 C2MOS Dynamic Register: A Clock Skew Insensitive Approach


The C2MOS Register
Figure 3.18 shows an ingenious positive edge-triggered register based on the master-slave concept which is insensitive to clock overlap. This circuit is called the C2MOS (Clocked CMOS) register. The register operates in two phases.

1. CLK= 0 (CLK = 1): The first tri-state driver is turned on, and the master
stage actsas an inverter sampling the inverted version of D on the internal node X.
The masterstage is in the evaluation mode. Meanwhile, the slave section is in a high-
SKR Engineering College
Department of ECE EC8095-VLSI DESIGN

impedancemode, or in a hold mode. Both transistors M7 and M8 are off, decoupling


the outputfrom the input. The output Q retains its previous value stored on the output
capacitorCL2.

2. The roles are reversed when CLK = 1: the master stage section is in hold mode (M3-M4 off), while the second section evaluates (M7-M8 on). The value stored on CL1 propagates to the output node through the slave stage, which acts as an inverter. The overall circuit operates as a positive edge-triggered master-slave register, very similar to the transmission-gate based register presented earlier.

However, there is an important difference: a C2MOS register with CLK-CLKbar clocking is insensitive to overlap, as long as the rise and fall times of the clock edges are sufficiently small.

To prove the above statement, we examine both the (0-0) and (1-1) overlap cases (Figure 3.17). In the (0-0) overlap case, the circuit simplifies to the network shown in Figure 3.19a, in which both PMOS devices are on during this period. The question is whether any new data sampled during the overlap window propagates to the output Q.

This is not desirable since data should not change on the negative edge for a positive edge-triggered register. Indeed, new data is sampled on node X through the series PMOS devices M2-M4, and node X can make a 0-to-1 transition during the overlap period. However, this data cannot propagate to the output since the NMOS device M7 is turned off.

At the end of the overlap period, CLKbar = 1 and both M7 and M8 turn off, putting the slave stage in the hold mode. Therefore, any new data sampled on the falling clock edge is not seen at the slave output Q, since the slave stage is off until the next rising edge of the clock. As the circuit consists of a cascade of inverters, signal propagation requires one pull-up followed by a pull-down, or vice versa, which is not feasible in the situation presented. The (1-1) overlap case (Figure 3.19b), where both NMOS devices M3 and M7 are turned on, is somewhat more contentious.

The question is again whether new data sampled during the overlap period (right after the clock goes high) propagates to the Q output. A positive edge-triggered register may only pass data that is presented at the input before the rising edge. If the D input changes during the overlap period, node X can make a 1-to-0 transition, but it cannot propagate to the output. However, as soon as the overlap period is over, the PMOS M8 is turned on and the 0 propagates to the output. This effect is not desirable.


Figure 3.18 C2MOS master-slave positive edge-triggered register.

Figure 3.19 C2MOS D FF during overlap periods. No feasible signal path can exist between In and D, as illustrated by the arrows.

The problem is fixed by imposing a hold time constraint on the input data D; in other words, the data D should be stable during the overlap period. In summary, it can be stated that the C2MOS latch is insensitive to clock overlaps because those overlaps activate either the pull-up or the pull-down networks of the latches, but never both of them simultaneously. If the rise and fall times of the clock are sufficiently slow, however, there exists a time slot where both the NMOS and PMOS transistors are conducting.

This creates a path between input and output that can destroy the state of the circuit. Simulations have shown that the circuit operates correctly as long as the clock rise time (or fall time) is smaller than approximately five times the propagation delay of the register. This criterion is not too stringent and is easily met in practical designs.
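The pull-up/pull-down argument for the C2MOS register can be checked exhaustively over the four possible clock-level combinations. This is a switch-level abstraction under an assumed gating that follows the text (master PMOS on CLK, master NMOS on CLKbar, slave NMOS M7 on CLK, slave PMOS M8 on CLKbar); it only asks whether a new D value could pass both inverting stages while the clocks sit at fixed levels. The function names are hypothetical.

```python
from itertools import product


def enabled(clk, clkb):
    """Which half-network of each clocked inverter is enabled (assumed gating)."""
    return {
        "master_up": clk == 0,     # PMOS gated by CLK
        "master_down": clkb == 1,  # NMOS gated by CLKbar
        "slave_up": clkb == 0,     # PMOS M8 gated by CLKbar
        "slave_down": clk == 1,    # NMOS M7 gated by CLK
    }


def new_data_reaches_q(clk, clkb):
    """New D reaches Q only via a master pull-up then a slave pull-down
    (X rises, Q falls) or a master pull-down then a slave pull-up (X falls, Q rises)."""
    e = enabled(clk, clkb)
    return (e["master_up"] and e["slave_down"]) or (e["master_down"] and e["slave_up"])


for clk, clkb in product((0, 1), repeat=2):
    label = {(0, 0): "0-0 overlap", (1, 1): "1-1 overlap"}.get((clk, clkb), "normal phase")
    print(f"CLK={clk} CLKbar={clkb} ({label}): D->Q path = {new_data_reaches_q(clk, clkb)}")
```

No clock-level combination opens a path, including both overlap cases. The model deliberately leaves out the two effects the text handles separately: slow clock edges, during which NMOS and PMOS of a stage conduct together, and the delayed propagation after a 1-1 overlap, which is covered by the hold-time constraint.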

3.5.3 True Single-Phase Clocked Register (TSPCR)


In the two-phase clocking schemes described above, care must be taken in routing the two clock signals to ensure that overlap is minimized. While the C2MOS register provides a skew-tolerant solution, it is possible to design registers that use only a single-phase clock. The True Single-Phase Clocked Register (TSPCR) proposed by Yuan and Svensson uses a single clock (without an inverse clock).

The basic single-phase positive and negative latches are shown in Figure 3.20. For the positive latch, when CLK is high, the latch is in the transparent mode and corresponds to two cascaded inverters; the latch is non-inverting and propagates the input to the output. On the other hand, when CLK = 0, both inverters are disabled, and the latch is in hold mode. Only the pull-up networks are still active, while the pull-down circuits are deactivated.

As a result of the dual-stage approach, no signal can ever propagate from the input of the latch to the output in this mode. A register can be constructed by cascading positive and negative latches. The clock load is similar to that of a conventional transmission-gate register or C2MOS register. The main advantage is the use of a single clock phase. The disadvantage is the slight increase in the number of transistors: 12 transistors are required.

TSPC offers an additional advantage: the possibility of embedding logic functionality into the latches. This reduces the delay overhead associated with the latches. Figure 3.21a outlines the basic approach for embedding logic, while Figure 3.21b shows an example of a positive latch that implements the AND of In1 and In2 in addition to performing the latching function.

While the set-up time of this latch has increased over the one shown in Figure 3.20, the overall performance of the digital circuit (that is, the clock period of a sequential circuit) has improved: the increase in set-up time is typically smaller than the delay of an AND gate. This approach of embedding logic into latches has been used extensively in the design of the EV4 DEC Alpha microprocessor and many other high-performance processors.

Figure 3.20 True Single Phase Latches.


Figure 3.21 True Single Phase Latches.

3.6 Pipelining: An approach to optimize sequential circuits

Pipelining is a popular design technique often used to accelerate the operation of the datapaths in digital processors. The idea is easily explained with the example of Figure 3.22a. The goal of the presented circuit is to compute log(|a + b|), where both a and b represent streams of numbers, that is, the computation must be performed on a large set of input values. The minimal clock period Tmin necessary to ensure correct evaluation is given as:

$T_{min} = t_{c-q} + t_{pd,logic} + t_{su}$

where tc-q and tsu are the propagation delay and the set-up time of the register, respectively. We assume that the registers are edge-triggered D registers. The term tpd,logic stands for the worst-case delay path through the combinational network, which consists of the adder, absolute value, and logarithm functions.

In conventional systems (that don't push the edge of technology), the latter delay is generally much larger than the delays associated with the registers and dominates the circuit performance.

Assume that each logic module has an equal propagation delay. We note that each logic module is then active for only 1/3 of the clock period (if the delay of the register is ignored). For example, the adder unit is active during the first third of the period and remains idle (that is, it does no useful computation) during the other 2/3 of the period. Pipelining is a technique to improve the resource utilization and increase the functional throughput. Assume that we introduce registers between the logic blocks, as shown in Figure 3.22b. This causes the computation for one set of input data to spread over a number of clock periods, as shown in Table 3.1.


(a) Nonpipelined version

(b) Pipelined version

Figure 3.22 Datapath for the computation of log(|a + b|).

The result for the data set (a1, b1) only appears at the output after three clock periods. At that time, the circuit has already performed parts of the computations for the next data sets, (a2, b2) and (a3, b3). The computation is performed in an assembly-line fashion, hence the name pipeline.
Table 3.1 Example of pipelined computations
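The schedule summarized in Table 3.1 follows a simple rule: in clock period k, stage s (counting from 0) works on data set k - s. The sketch below generates that schedule for the three-stage datapath; it is a bookkeeping illustration, not a timing model, and the names are hypothetical.

```python
STAGES = ["adder", "absolute value", "logarithm"]


def pipeline_schedule(n_inputs, n_stages=len(STAGES)):
    """Rows of (clock period, [data-set index in each stage, or None if idle])."""
    rows = []
    for period in range(1, n_inputs + n_stages):
        row = []
        for stage in range(n_stages):
            i = period - stage          # data set i entered the pipeline in period i
            row.append(i if 1 <= i <= n_inputs else None)
        rows.append((period, row))
    return rows


labels = ["a{i}+b{i}", "|a{i}+b{i}|", "log|a{i}+b{i}|"]
print("period", *STAGES, sep="\t")
for period, row in pipeline_schedule(5):
    cells = [labels[s].format(i=i) if i else "-" for s, i in enumerate(row)]
    print(period, *cells, sep="\t")
```

The first result, log|a1+b1|, leaves the logarithm stage in period 3, while the adder is already working on (a3, b3), which is the assembly-line behaviour described above.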

The advantage of pipelined operation becomes apparent when examining the
minimum clock period of the modified circuit. The combinational circuit block has
been partitioned into three sections, each of which has a smaller propagation delay
than the original function. This effectively reduces the value of the minimum allowable
clock period:

$$T_{min,pipe} = t_{c-q} + \max(t_{pd,add},\, t_{pd,abs},\, t_{pd,log}) + t_{su}$$


Suppose that all logic blocks have approximately the same propagation delay,
and that the register overhead is small with respect to the logic delays. Under these
assumptions, the pipelined network outperforms the original circuit by a factor of three
(i.e., $T_{min,pipe} = T_{min}/3$). The increased performance comes at the relatively small
cost of two additional registers and an increased latency. This explains why
pipelining is popular in the implementation of very high-performance datapaths.
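The factor-of-three argument can be checked numerically. This Python sketch evaluates $T_{min}$ for the nonpipelined datapath and for the pipelined one; the delay values are hypothetical, and it assumes each pipeline stage pays the register overhead ($t_{c-q}$ plus $t_{su}$) once.

```python
def t_min_nonpipelined(t_cq, t_su, logic_delays):
    """Register overhead plus the sum of all logic delays, which are in series."""
    return t_cq + sum(logic_delays) + t_su


def t_min_pipelined(t_cq, t_su, logic_delays):
    """With a register after every block, the slowest stage sets the period;
    every stage still pays tc-q and tsu once."""
    return t_cq + max(logic_delays) + t_su


if __name__ == "__main__":
    # Hypothetical delays in picoseconds (not taken from the text).
    t_cq, t_su = 100, 100
    blocks = [3000, 3000, 3000]  # adder, absolute value, logarithm
    t0 = t_min_nonpipelined(t_cq, t_su, blocks)
    t1 = t_min_pipelined(t_cq, t_su, blocks)
    print(f"non-pipelined: {t0} ps, pipelined: {t1} ps, speed-up {t0 / t1:.2f}")
    print(f"latency of one result: {t0} ps vs {len(blocks) * t1} ps")
```

With the 100 ps register overhead assumed here the speed-up is slightly below three, and the latency of one result grows from one long period to three short ones.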

3.6.1 Latch- vs. Register-Based Pipelines


Pipelined circuits can be constructed using level-sensitive latches instead of
edge-triggered registers. Consider the pipelined circuit of Figure 3.23. The pipeline
system is implemented based on pass-transistor-based positive and negative latches
instead of edge-triggered registers.

That is, logic is introduced between the master and slave latches of a master-slave
system. In the following discussion, we use, without loss of generality, the CLK-CLK̄
notation to denote a two-phase clock system.

Latch-based systems give significantly more flexibility in implementing a
pipelined system, and often offer higher performance. When the clocks CLK and
CLK̄ are nonoverlapping, correct pipeline operation is obtained.

Input data is sampled on C1 at the negative edge of CLK and the
computation of logic block F starts; the result of the logic block F is stored on C2 on
the falling edge of CLK̄, and the computation of logic block G starts. The
nonoverlapping of the clocks ensures correct operation.

The value stored on C2 at the end of the CLK low phase is the result of
passing the previous input (stored on the falling edge of CLK on C1) through the logic
function F. When overlap exists between CLK and CLK̄, the next input is
already being applied to F, and its effect might propagate to C2 before CLK̄ goes low
(assuming that the contamination delay of F is small).

In other words, a race develops between the previous input and the current
one. Which value wins depends upon the logic function F, the overlap time, and the
value of the inputs, since the propagation delay is often a function of the applied
inputs. The latter factor makes the detection and elimination of race conditions
non-trivial.
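Because the outcome depends on the input values, a designer often starts from a conservative, input-independent bound. The Python sketch below is such a simplified model (ours, not the text's): it flags a possible race whenever the fastest path through the input latch and logic block F, that is, the sum of their contamination delays, is shorter than the time during which CLK and CLK̄ overlap.

```python
def overlap_race_possible(t_overlap, t_cd_latch, t_cd_logic):
    """Conservative race check for the two-phase latch pipeline.

    t_overlap  : time during which both latches are transparent
                 (0 or less means the clocks are nonoverlapping)
    t_cd_latch : contamination (shortest) delay through the input latch
    t_cd_logic : contamination delay through logic block F

    Returns True if a new input could reach C2 while the C2 latch is still
    transparent, i.e. the fastest path is shorter than the overlap.
    """
    if t_overlap <= 0:
        return False  # nonoverlapping clocks: the race cannot start
    return t_cd_latch + t_cd_logic < t_overlap
```

Because contamination delays are a lower bound over all inputs, a False result here is safe; a True result only means a race is possible for some input values.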

Figure 3.23 Operation of two-phase pipelined circuit using dynamic registers.

3.6.2 NORA-CMOS— A Logic Style for Pipelined Structures


The latch-based pipeline circuit can also be implemented using C2MOS
latches, as shown in Figure 3.24. The operation is similar to the one discussed
above. This topology has one additional, important property:
A C2MOS-based pipelined circuit is race-free as long as all the logic
functions F (implemented using static logic) between the latches are noninverting.

The reasoning for the above argument is similar to the argument made in the
construction of a C2MOS register. During a (0-0) overlap between CLK and CLK̄, all
C2MOS latches simplify to pure pull-up networks.

The only way a signal can race from stage to stage under this condition is
when the logic function F is inverting, as illustrated in Figure 3.25, where F is replaced
by a single, static CMOS inverter.

Similar considerations are valid for the (1-1) overlap. Based on this concept, a
logic circuit style called NORA-CMOS was conceived. It combines C2MOS pipeline
registers and NORA dynamic logic function blocks.

Each module consists of a block of combinational logic that can be a mixture
of static and dynamic logic, followed by a C2MOS latch. Logic and latch are clocked
in such a way that both are simultaneously in either evaluation, or hold (precharge)
mode.

Figure 3.24 Pipelined datapath using C2MOS latches.

Figure 3.25 Potential race condition during (0-0) overlap in C2MOS-based design.

A module that is in evaluation during CLK = 1 is called a CLK-module, while the inverse is called
a CLK̄-module. The operation modes of the modules are summarized in Table 3.2.

Table 3.2 Operation modes for NORA logic modules.

A NORA datapath consists of a chain of alternating CLK and CLK̄ modules.
While one class of modules is precharging with its output latch in hold mode,
preserving the previous output value, the other class is evaluating. Data is passed in
a pipelined fashion from module to module. NORA offers designers a wide range of
design choices.
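The phase relationships can be tabulated with a few lines of code. The Python sketch below is an illustration, not a reproduction of Table 3.2: it encodes only the rule stated in the text (logic and latch are in evaluation together, or in precharge and hold together; a CLK-module evaluates while CLK = 1 and a CLK̄-module, written "CLKbar" here, while CLK = 0) and shows that adjacent modules of an alternating chain are always in opposite phases.

```python
def nora_mode(module, clk):
    """Return (logic_mode, latch_mode) of a NORA module for clock level clk.

    module: "CLK" for a CLK-module (evaluates while CLK = 1) or
            "CLKbar" for a CLK-bar module (evaluates while CLK = 0).
    Logic and latch are always in the same phase: both evaluate, or the
    logic precharges while the output latch holds.
    """
    if module not in ("CLK", "CLKbar"):
        raise ValueError(f"unknown module type: {module!r}")
    if clk not in (0, 1):
        raise ValueError("clk must be 0 or 1")
    evaluating = (clk == 1) if module == "CLK" else (clk == 0)
    return ("evaluate", "evaluate") if evaluating else ("precharge", "hold")


if __name__ == "__main__":
    chain = ["CLK", "CLKbar", "CLK", "CLKbar"]  # alternating NORA datapath
    for clk in (0, 1):
        print(f"CLK = {clk}:", [nora_mode(m, clk) for m in chain])
```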

Dynamic and static logic can be mixed freely, and both CLKp and CLKn
dynamic blocks can be used in cascaded or in pipelined form. With this freedom of
design, extra inverter stages, as required in DOMINO-CMOS, are most often
avoided.
Figure 3.26 Examples of NORA CMOS modules: (a) CLK-module; (b) CLK̄-module.

Design rules:
In order to ensure correct operation, two important rules should always be followed:

• The dynamic-logic rule: Inputs to a dynamic CLKn (CLKp) block are only
allowed to make a single 0→1 (1→0) transition during the evaluation period.

• The C2MOS rule: In order to avoid races, the number of static inversions
between C2MOS latches should be even.
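The dynamic-logic rule lends itself to a simple check on sampled waveforms. The hypothetical helper below takes the logic levels of one input observed during a single evaluation period and reports whether they respect the rule for a CLKn block ("n") or a CLKp block ("p"); it is a sketch for reasoning about the rule, not a timing-verification tool.

```python
def obeys_dynamic_logic_rule(samples, block):
    """Check the dynamic-logic rule for one input of a NORA dynamic block.

    samples: logic levels (0/1) of the input, sampled in time order during a
             single evaluation period.
    block:   "n" for a CLKn block (at most one 0->1 transition allowed),
             "p" for a CLKp block (at most one 1->0 transition allowed).
    """
    if block not in ("n", "p"):
        raise ValueError("block must be 'n' or 'p'")
    allowed = (0, 1) if block == "n" else (1, 0)
    # Collect every change of level between consecutive samples.
    transitions = [(a, b) for a, b in zip(samples, samples[1:]) if a != b]
    return len(transitions) <= 1 and all(t == allowed for t in transitions)
```

A glitch such as 0 → 1 → 0 at the input of a CLKn block fails the check, which matches the intuition that a precharged node, once discharged, cannot recover during evaluation.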

The presence of dynamic logic circuits requires the introduction of some
extensions to the latter rule. Consider the situation pictured in Figure 3.26a. During
precharge (CLK = 0), the output register of the module has to be in hold mode,
isolating the output node from the internal events in the module. Assume now that a
(0-0) overlap occurs.

Node A gets precharged to VDD, while the latch simplifies to a pull-up network
(Figure 3.26b). It can be observed that under those circumstances the output node
charges to VDD, and the stored value is erased! This malfunction is caused by the
fact that the number of static inversions between the last dynamic node in the module
and the latch is odd, which creates an active path between the precharged node and
the output.

This translates into the following rule: The number of static inversions between
the last dynamic block in a logic function and the C2MOS latch should be even. This
and similar considerations lead to a reformulated C2MOS rule.
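The parity conditions can be checked by counting inversions along a path. The Python sketch below encodes only the two conditions stated in the text (an even number of static inversions between C2MOS latches, and an even number between the last dynamic gate and the latch); the full reformulated C2MOS rule is not spelled out here, so the function should be treated as a partial check. The stage labels "inv", "buf" and "dyn" are our own.

```python
def static_inversions_ok(path):
    """Check the inversion-parity conditions stated in the text for the logic
    between two C2MOS latches.

    path: stages in signal order from one latch to the next, each one of
          "inv" - static stage with a net inversion (e.g. inverter, NAND)
          "buf" - static noninverting stage (e.g. AND, buffer)
          "dyn" - dynamic (precharged) gate
    Without dynamic gates, the total number of static inversions must be
    even (C2MOS rule). Otherwise, the number of static inversions between
    the last dynamic gate and the latch must be even.
    """
    dyn_positions = [i for i, stage in enumerate(path) if stage == "dyn"]
    tail = path[dyn_positions[-1] + 1:] if dyn_positions else path
    return sum(1 for stage in tail if stage == "inv") % 2 == 0
```

The case of Figure 3.26a, a dynamic node followed by a single static inverter before the latch, corresponds to `["dyn", "inv"]` and is rejected.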

3.7 Clocking Strategy

Selecting a clocking strategy is an important first step in a design. The
various clocking strategies are:
i) Pseudo 2-phase clocking
ii) Pseudo 2-phase memory structures
iii) Pseudo 2-phase logic structure
iv) 2-phase clocking
v) 2-phase memory structures
vi) 2-phase logic structure
vii) 4-phase clocking
viii) 4-phase memory structures
ix) 4-phase logic structure
x) Pseudo 4-phase clocking

3.8 TIMING ISSUES IN DIGITAL CIRCUITS

Introduction

All sequential circuits have one property in common: a well-defined ordering
of the switching events must be imposed if the circuit is to operate correctly. If this
were not the case, wrong data might be written into the memory elements, resulting
in a functional failure. The synchronous system approach, in which all memory
elements in the system are simultaneously updated using a globally distributed
periodic synchronization signal (that is, a global clock signal), represents an effective
and popular way to enforce this ordering.

Functionality is ensured by imposing some strict constraints on the generation
of the clock signals and their distribution to the memory elements distributed over the
chip; non-compliance often leads to malfunction.

This chapter starts with an overview of the different timing methodologies.
The majority of the text is devoted to the popular synchronous approach. We analyze
the impact of spatial variations of the clock signal, called clock skew, and temporal
variations of the clock signal, called clock jitter, and introduce techniques to cope with
them. These variations fundamentally limit the performance that can be achieved using a
conventional design methodology.

At the other end of the design spectrum is an approach called asynchronous
design, which avoids the problem of clock uncertainty altogether by eliminating the
need for globally distributed clocks.

After discussing the basics of the asynchronous design approach, we analyze the
associated overhead and identify some practical applications. The important issue of
synchronization, which is required when interfacing different clock domains or when
sampling an asynchronous signal, also deserves some in-depth treatment. Finally,
the fundamentals of on-chip clock generation using feedback are introduced, along
with trends in timing.

Digital Systems timing classification

In digital systems, signals can be classified depending on how they are
related to a local clock. Signals that transition only at predetermined periods in time
can be classified as synchronous, mesochronous, or plesiochronous with respect to a
system clock. A signal that can transition at arbitrary times is considered
asynchronous.
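The classification can be summarised as a small decision procedure. In this Python sketch the frequency tolerance that separates "nominally the same" from "unrelated" (`plesio_tol`) is a hypothetical parameter chosen for illustration; the text does not give a number.

```python
def classify_signal(f_signal, f_clock, phase_known, plesio_tol=1e-3):
    """Classify a signal with respect to a local clock.

    f_signal:    frequency at which the signal may transition, or None if it
                 can transition at arbitrary times
    f_clock:     frequency of the local clock
    phase_known: True if the phase offset to the clock is known and fixed
    plesio_tol:  hypothetical relative tolerance treated as "nominally the
                 same" frequency (1e-3 = 0.1 %)
    """
    if f_signal is None:
        return "asynchronous"
    mismatch = abs(f_signal - f_clock) / f_clock
    if mismatch == 0:
        return "synchronous" if phase_known else "mesochronous"
    if mismatch <= plesio_tol:
        return "plesiochronous"
    return "unclassified"  # periodic, but unrelated to the local clock
```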

3.9.1 Synchronous Interconnect

A synchronous signal is one that has the exact same frequency, and a known
fixed phase offset, with respect to the local clock. In such a timing methodology, the
signal is “synchronized” with the clock, and the data can be sampled directly without
any uncertainty. In digital logic design, synchronous systems are the most
straightforward type of interconnect, where the flow of data in a circuit proceeds in
lockstep with the system clock, as shown in Figure 3.27.

Here, the input data signal In is sampled with register R1 to give signal Cin,
which is synchronous with the system clock and is then passed along to the
combinational logic block. After a suitable settling period, the output Cout becomes
valid and can be sampled by R2, which synchronizes the output with the clock. In a
sense, the “certainty period” of signal Cout, or the period where data is valid, is
synchronized with the system clock, which allows register R2 to sample the data with
complete confidence. The length of the “uncertainty period,” or the period where data
is not valid, places an upper bound on how fast a synchronous interconnect system
can be clocked.

Figure 3.27 Synchronous interconnect methodology.

3.9.2 Mesochronous interconnect

A mesochronous signal is one that has the same frequency but an unknown
phase offset with respect to the local clock (“meso” from Greek is middle). For
example, if data is being passed between two different clock domains, then the data
signal transmitted from the first module can have an unknown phase relationship to
the clock of the receiving module.

In such a system, it is not possible to directly sample the output at the
receiving module because of the uncertainty in the phase offset. A (mesochronous)
synchronizer can be used to synchronize the data signal with the receiving clock, as
shown in Figure 3.28. The synchronizer serves to adjust the phase of the received signal to
ensure proper sampling. In Figure 3.28, signal D1 is synchronous with respect to
ClkA.

Figure 3.28 Mesochronous communication approach using variable delay line.

However, D1 and D2 are mesochronous with ClkB because of the unknown
phase difference between ClkA and ClkB and the unknown interconnect delay in the
path between Block A and Block B. The role of the synchronizer is to adjust the
variable delay line such that the data signal D3 (a delayed version of D2) is aligned
properly with the system clock of block B.

In this example, the variable delay element is adjusted by measuring the
phase difference between the received signal and the local clock. After register R2
samples the incoming data during the certainty period, signal D4 becomes
synchronous with ClkB.
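One way to pick the delay-line setting from the measured phase difference is sketched below in Python. It assumes the synchronizer aims to place the data transitions of D3 halfway between rising edges of ClkB, so that R2 samples in the middle of the certainty period; that target, like the function name, is an assumption, since the text only says the data must be aligned properly.

```python
def delay_line_setting(data_edge_phase, period):
    """Delay to program into the variable delay line (hypothetical helper).

    data_edge_phase: measured time from a rising edge of ClkB to the next
                     transition of the received data D2 (0 <= phase < period)
    period:          common clock period of ClkA and ClkB
    Returns a delay in [0, period) that moves the data transitions of D3 to
    the middle of the ClkB cycle, so R2 samples mid-way through the
    certainty period.
    """
    return (period / 2 - data_edge_phase) % period
```

Because the two clocks have the same frequency, the measured phase stays constant, so the setting only needs to be found once (or tracked slowly); this is what distinguishes the mesochronous case from the plesiochronous one that follows.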

3.9.3 Plesiochronous Interconnect

A plesiochronous signal is one that has nominally the same, but slightly
different, frequency as the local clock (“plesio” from Greek is near). In effect, the
phase difference drifts in time. This scenario can easily arise when two interacting
modules have independent clocks generated from separate crystal oscillators.

Since the transmitted signal can arrive at the receiving module at a different
rate than the local clock, one needs to utilize a buffering scheme to ensure all data is
received. Typically, plesiochronous interconnection occurs in distributed systems like
long-distance communications, since chip- or even board-level circuits typically utilize
a common oscillator to derive local clocks. A possible framework for plesiochronous
interconnect is shown in Figure 3.29.

Figure 3.29 Plesiochronous communications using FIFO.


In this digital communications framework, the originating module issues data
at some unknown rate characterized by C1, which is plesiochronous with respect to
C2. The timing recovery unit is responsible for deriving clock C3 from the data
sequence and buffering the data in a FIFO. As a result, C3 will be synchronous with
the data at the input of the FIFO and will be mesochronous with C1.

Since the clock frequencies of the originating and receiving modules are
mismatched, data might have to be dropped if the transmit frequency is faster, and
data can be duplicated if the transmit frequency is slower than the receive frequency.
However, by making the FIFO large enough, and periodically resetting the system
whenever an overflow condition occurs, robust communication can be achieved.
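The drop and duplicate behaviour can be reproduced with a small event-driven model. In this Python sketch, written under our own simplifying assumptions, the transmitter writes one word per tick of its clock, the receiver reads one word per tick of its clock, the FIFO starts half full, an overflow drops the incoming word and an underflow makes the receiver repeat its last word; the periodic reset mentioned in the text is not modelled.

```python
from collections import deque


def simulate_fifo(f_tx, f_rx, depth, duration):
    """Count dropped and duplicated words in a plesiochronous FIFO link.

    f_tx, f_rx: write (transmit) and read (receive) rates in words/second
    depth:      FIFO capacity in words; the FIFO starts half full
    duration:   simulated time in seconds
    Returns (dropped, duplicated).
    """
    fifo = deque([0] * (depth // 2))  # pre-filled with dummy words
    events = [(k / f_tx, 0) for k in range(int(duration * f_tx))]   # writes
    events += [(k / f_rx, 1) for k in range(int(duration * f_rx))]  # reads
    events.sort()  # at equal times a write (0) is handled before a read (1)
    dropped = duplicated = 0
    word = 0
    for _, kind in events:
        if kind == 0:                 # transmitter writes a word
            word += 1
            if len(fifo) < depth:
                fifo.append(word)
            else:
                dropped += 1          # overflow: transmit clock is faster
        elif fifo:
            fifo.popleft()            # receiver reads the oldest word
        else:
            duplicated += 1           # underflow: receiver repeats its last word
    return dropped, duplicated


if __name__ == "__main__":
    for f_tx in (990.0, 1000.0, 1010.0):
        print(f_tx, simulate_fifo(f_tx, 1000.0, depth=8, duration=1.0))
```

With equal rates the FIFO level never changes; a 1 % mismatch in either direction overflows or drains an 8-word FIFO within one simulated second, which is why the FIFO must be sized for the drift expected between resets.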

3.9.4 Asynchronous Interconnect

Asynchronous signals can transition at any arbitrary time, and are not slaved
to any local clock. As a result, it is not straightforward to map these arbitrary
transitions into a synchronized data stream.

Although it is possible to synchronize asynchronous signals by detecting
events and introducing latencies into a data stream synchronized to a local clock, a
more natural way to handle asynchronous signals is to simply eliminate the use of
local clocks and utilize a self-timed asynchronous design approach.

In such an approach, communication between modules is controlled through a
handshaking protocol to perform the proper ordering of commands.

Figure 3.30 Asynchronous design methodology for simple pipeline interconnect.

When a logic block completes an operation, it will generate a completion
signal DV to indicate that output data is valid. The handshaking signals then initiate a
data transfer to the next block, which latches in the new data and begins a new
computation by asserting the initialization signal I.

Asynchronous designs are advantageous because computations are
performed at the native speed of the logic, where block computations occur
whenever data becomes available.

There is no need to manage clock skew, and the design methodology leads to
a very modular approach where interaction between blocks simply occurs through a
handshaking procedure. However, these handshaking protocols result in increased
complexity and overhead in communication that can reduce performance.
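The "native speed" argument can be illustrated with a toy model. The Python sketch below compares a self-timed pipeline, where each block starts as soon as its input is valid and it has finished its previous item, with a clocked pipeline whose period must cover the worst-case block delay. The delays, the per-transfer handshake overhead and the neglect of back-pressure are all simplifying assumptions of ours.

```python
def self_timed_finish(delays, handshake=0.0):
    """Completion time of a self-timed pipeline (toy model).

    delays[i][s]: data-dependent delay of block s for data item i
    handshake:    fixed overhead per transfer (DV/I signalling)
    Block s starts item i once item i is valid at its input (DV from block
    s-1) and block s has finished item i-1. Back-pressure from the next
    block is ignored, which is a simplification.
    """
    n_items, n_stages = len(delays), len(delays[0])
    finish = [[0.0] * n_stages for _ in range(n_items)]
    for i in range(n_items):
        for s in range(n_stages):
            input_valid = finish[i][s - 1] if s > 0 else 0.0
            block_free = finish[i - 1][s] if i > 0 else 0.0
            finish[i][s] = max(input_valid, block_free) + delays[i][s] + handshake
    return finish[-1][-1]


def synchronous_finish(delays, overhead=0.0):
    """Completion time of the same pipeline with a global clock: the period
    must cover the worst-case delay of any block for any item."""
    n_items, n_stages = len(delays), len(delays[0])
    period = max(max(row) for row in delays) + overhead
    return (n_items + n_stages - 1) * period


if __name__ == "__main__":
    # Hypothetical delays (ns): one slow computation among fast ones.
    delays = [[1, 1], [1, 5], [1, 1], [1, 1]]
    print("self-timed :", self_timed_finish(delays))
    print("synchronous:", synchronous_finish(delays))
```

With these numbers the self-timed pipeline finishes in 9 ns while the clocked one needs 25 ns; a nonzero `handshake` value shows how the communication overhead mentioned above eats into that advantage.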

3.10 Synchronous Design

3.10.1 Synchronous Timing Basics


Virtually all systems designed today use a periodic synchronization signal or
clock. The generation and distribution of a clock has a significant impact on
performance and power dissipation.

For a positive edge-triggered system, the rising edge of the clock is used
to denote the beginning and completion of a clock cycle. In the ideal world, assuming
the clock paths from a central distribution point to each register are perfectly
balanced, the phase of the clock (i.e., the position of the clock edge relative to a
reference) at various points in the system is going to be exactly equal.

However, the clock is neither perfectly periodic nor perfectly simultaneous.
This results in performance degradation and/or circuit malfunction. Figure 3.31 shows
the basic structure of a synchronous pipelined datapath. In the ideal scenario, the
clocks at registers 1 and 2 have the same clock period and transition at the exact
same time. The following timing parameters characterize the timing of the sequential
circuit:
• The contamination (minimum) delay $t_{c-q,cd}$ and maximum propagation delay
$t_{c-q}$ of the register.
• The set-up time ($t_{su}$) and hold time ($t_{hold}$) of the registers.
• The contamination delay $t_{logic,cd}$ and maximum delay $t_{logic}$ of the
combinational logic.
• $t_{clk1}$ and $t_{clk2}$, corresponding to the position of the rising edge of the clock
relative to a global reference.
Under ideal conditions ($t_{clk1} = t_{clk2}$), the worst-case propagation delays determine
the minimum clock period required for this sequential circuit. The period must be
long enough for the data to propagate through the registers and logic and be set up
at the destination register before the next rising edge of the clock.
This constraint is given by

$$T > t_{c-q} + t_{logic} + t_{su} \qquad (1)$$

At the same time, the hold time of the destination register must be shorter than the
minimum propagation delay through the register and the logic network:

$$t_{hold} < t_{c-q,cd} + t_{logic,cd} \qquad (2)$$

The above analysis is simplistic since the clock is never ideal. As a result of process
and environmental variations, the clock signal can have spatial and temporal
variations.
Clock Skew
The spatial variation in arrival time of a clock transition on an integrated circuit
is commonly referred to as clock skew. The clock skew between two points i and j on
an IC is given by $\delta(i,j) = t_i - t_j$, where $t_i$ and $t_j$ are the positions of the rising edge of the
clock with respect to a reference. Consider the transfer of data between registers R1
and R2 in Figure 3.31. The clock skew can be positive or negative depending upon
the routing direction and position of the clock source. The timing diagram for the case
with positive skew is shown in Figure 3.31. As the figure illustrates, the rising clock
edge is delayed by a positive δ at the second register.

Figure Timing diagram to study the impact of clock skew on performance and
functionality. In this sample timing diagram, δ > 0.

Clock skew is caused by static path-length mismatches in the clock load and
by definition skew is constant from cycle to cycle. That is, if in one cycle CLK2
lagged CLK1 by δ, then on the next cycle it will lag it by the same amount. It is
important to note that clock skew does not result in clock period variation, but rather
phase shift.

Figure 3.32 Timing diagram for the case when δ < 0. The rising edge of CLK2
arrives earlier than the edge of CLK1.

Skew has strong implications on the performance and functionality of a sequential
system. First consider the impact of clock skew on performance. From Figure 3.32, a
new input In sampled by R1 at edge 1 will propagate through the combinational logic
and be sampled by R2 on edge 4. If the clock skew is positive, the time available for the
signal to propagate from R1 to R2 is increased by the skew. The output of the
combinational logic must be valid one set-up time before the rising edge of CLK2
(point 4). The constraint on the minimum clock period can then be derived as:

$$T + \delta \geq t_{c-q} + t_{logic} + t_{su}$$

or

$$T \geq t_{c-q} + t_{logic} + t_{su} - \delta \qquad (3)$$

The above equation suggests that clock skew actually has the potential to
improve the performance of the circuit. That is, the minimum clock period required to
operate the circuit reliably reduces with increasing clock skew! This is indeed correct,
but unfortunately, increasing skew makes the circuit more susceptible to race
conditions, which may harm the correct operation of sequential systems.
As above, assume that input In is sampled on the rising edge of CLK1 at edge
1 into R1. The new values at the output of R1 propagate through the combinational
logic and should be valid before edge 4 at CLK2. However, if the minimum delay of
the combinational logic block is small, the inputs to R2 may change before the clock
edge 2, resulting in incorrect evaluation.

To avoid races, we must ensure that the minimum propagation delay through
the register and logic is long enough such that the inputs to R2 are valid for a
hold time after edge 2. The constraint can be formally stated as

$$\delta + t_{hold} < t_{c-q,cd} + t_{logic,cd}$$

or

$$\delta < t_{c-q,cd} + t_{logic,cd} - t_{hold} \qquad (4)$$

Figure 3.32 shows the timing diagram for the case when δ < 0. For this case,
the rising edge of CLK2 happens before the rising edge of CLK1. On the rising edge
of CLK1, a new input is sampled by R1. The new sampled data propagates through
the combinational logic and is sampled by R2 on the rising edge of CLK2, which
corresponds to edge 4. As can be seen from Figure 3.32 and Eq. (3), a negative skew
directly impacts the performance of the sequential system. However, a negative skew
implies that the system never fails, since edge 2 happens before edge 1! This can
also be seen from Eq. (4), which is always satisfied since δ < 0.
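The trade-off between Eq. (3) and Eq. (4) shows up clearly when the skew is swept. The Python sketch below uses hypothetical delays in picoseconds and takes δ = tclk2 - tclk1, consistent with the definition of skew given earlier.

```python
def skew_constraints(delta, t_cq, t_cq_cd, t_logic, t_logic_cd, t_su, t_hold):
    """Apply Eq. (3) and Eq. (4) for a clock skew delta = tclk2 - tclk1.

    Returns (t_min, race_free):
      t_min     - minimum clock period from Eq. (3)
      race_free - True if Eq. (4) holds (no hold-time violation)
    """
    t_min = t_cq + t_logic + t_su - delta              # Eq. (3)
    race_free = delta < t_cq_cd + t_logic_cd - t_hold  # Eq. (4)
    return t_min, race_free


if __name__ == "__main__":
    # Hypothetical register and logic delays in picoseconds.
    params = dict(t_cq=80, t_cq_cd=60, t_logic=700, t_logic_cd=20, t_su=50, t_hold=40)
    for delta in (-100, 0, 30, 60):
        print(delta, skew_constraints(delta, **params))
```

With a hold margin of 40 ps, a skew of 30 ps shortens the minimum period to 800 ps but 60 ps causes a race, whatever the clock period; a skew of -100 ps is race-free but lengthens the period to 930 ps.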

Example scenarios for positive and negative clock skew are shown in Figure 3.32.

Figure 3.32 Positive and negative clock skew.


• δ > 0: This corresponds to a clock routed in the same direction as the flow of
the data through the pipeline (Figure 3.32a). In this case, the skew has to be strictly
controlled and satisfy Eq. (4). If this constraint is not met, the circuit malfunctions
independent of the clock period. Reducing the clock frequency of an edge-triggered
circuit does not help get around skew problems! On the other hand, positive skew
increases the throughput of the circuit as expressed by Eq. (3), because the clock
period can be shortened by δ. The extent of this improvement is limited, as large
values of δ soon provoke violations of Eq. (4).

• δ < 0: When the clock is routed in the opposite direction of the data (Figure 3.32b),
the skew is negative and condition (4) is unconditionally met. The circuit
operates correctly independent of the skew. The skew reduces the time available for
actual computation, so the clock period has to be increased by |δ|. In summary,
routing the clock in the opposite direction of the data avoids disasters but hampers
the circuit performance.

Unfortunately, since a general logic circuit can have data flowing in both
directions (for example, circuits with feedback), this solution to eliminate races will not
always work (Figure 3.33). The skew can assume both positive and negative values
depending on the direction of the data transfer.

Figure 3.33 Datapath structure with feedback.


Under these circumstances, the designer has to account for the worst-case skew
condition. In general, routing the clock so that only negative skew occurs is not
feasible. Therefore, the design of a low-skew clock network is essential.
Clock Jitter

Clock jitter refers to the temporal variation of the clock period at a given point
on the chip; that is, the clock period can reduce or expand on a cycle-by-cycle
basis. It is strictly a temporal uncertainty measure and is often specified at a given
point on the chip.

Jitter can be measured and cited in one of many ways. Cycle-to-cycle jitter
refers to the time-varying deviation of a single clock period, and for a given spatial
location i is given as

$$T_{jitter,i}(n) = T_{i,n+1} - T_{i,n} - T_{CLK}$$

where $T_{i,n}$ and $T_{i,n+1}$ are the times of the clock edges that start periods n and n+1
at location i (so that $T_{i,n+1} - T_{i,n}$ is the actual length of period n), and $T_{CLK}$ is the
nominal clock period.
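Given a list of measured edge times at one point on the chip, the deviation of each period from nominal follows directly from the expression above. This Python sketch assumes, as the equation implies, that $T_{i,n}$ is the time of the edge that starts period n; the sample numbers are hypothetical.

```python
def period_deviation(edge_times, t_clk):
    """Per-period jitter T_{i,n+1} - T_{i,n} - T_CLK at one location i.

    edge_times: successive rising-edge times observed at location i
    t_clk:      nominal clock period
    """
    return [nxt - cur - t_clk for cur, nxt in zip(edge_times, edge_times[1:])]


if __name__ == "__main__":
    # Hypothetical edge times in ps for a nominal 1000 ps clock.
    dev = period_deviation([0, 1010, 1990, 3000], 1000)
    print(dev, "worst case:", max(abs(x) for x in dev), "ps")
```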

Jitter directly impacts the performance of a sequential system. Figure 3.34
shows the nominal clock period as well as the variation in period. Ideally the clock period
starts at edge 2 and ends at edge 5, with a nominal clock period of $T_{CLK}$.
However, as a result of jitter, the worst-case scenario happens when the leading
edge of the current clock period is delayed (edge 3), and the leading edge of the next
clock period occurs early (edge 4). As a result, the total time available to complete
the operation is reduced by $2t_{jitter}$ in the worst case and is given by

$$T_{CLK} - 2t_{jitter} \geq t_{c-q} + t_{logic} + t_{su} \quad \text{or} \quad T \geq t_{c-q} + t_{logic} + t_{su} + 2t_{jitter} \qquad (5)$$

The above equation illustrates that jitter directly reduces the performance of a
sequential circuit. Care must be taken to reduce jitter in the clock network to
maximize performance.

Figure 3.34 Circuit for studying the impact of jitter on performance.


Impact of Skew and Jitter on Performance
In this section, the combined impact of skew and jitter is studied with respect
to conventional edge-triggered clocking. Consider the sequential circuit shown in
Figure 3.35. Assume that nominally ideal clocks are distributed to both registers (the
clock period is identical every cycle and the skew is 0). In reality, there is static skew
δ between the two clock signals (assume that δ > 0). Assume that CLK1 has a jitter of
$t_{jitter1}$ and CLK2 has a jitter of $t_{jitter2}$. To determine the constraint on the minimum
clock period, we must look at the minimum available time to perform the required
computation. The worst case happens when the leading edge of the current clock
period on CLK1 happens late (edge 3) and the leading edge of the next cycle of
CLK2 happens early (edge 10). This results in the following constraint:

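$$T_{CLK} + \delta - t_{jitter1} - t_{jitter2} \geq t_{c-q} + t_{logic} + t_{su} \quad \text{or} \quad T \geq t_{c-q} + t_{logic} + t_{su} - \delta + t_{jitter1} + t_{jitter2} \qquad (6)$$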

Figure 3.35 Sequential circuit to study the impact of skew and jitter on edge-triggered
systems. In this example, a positive skew (δ) is assumed.

As the above equation illustrates, while positive skew can provide a potential
performance advantage, jitter has a negative impact on the minimum clock period. To
formulate the minimum delay constraint, consider the case when the leading edge of
the CLK1 cycle arrives early (edge 1) and the leading edge of the current cycle of CLK2
arrives late (edge 6). The separation between edges 1 and 6 should be smaller than
the minimum delay through the network. This results in

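$$\delta + t_{jitter1} + t_{jitter2} + t_{hold} < t_{c-q,cd} + t_{logic,cd} \quad \text{or} \quad \delta < t_{c-q,cd} + t_{logic,cd} - t_{hold} - t_{jitter1} - t_{jitter2} \qquad (7)$$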

The above relation indicates that the acceptable skew is reduced by the jitter
of the two signals. Now consider the case when the skew is negative (δ < 0), as shown
in Figure 3.36. For the timing shown, $|\delta| > t_{jitter2}$. It can be easily verified that the
worst-case timing is exactly the same as the previous analysis, with δ taking a negative
value. That is, negative skew reduces performance.
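Combining skew and jitter, the edge analysis above leads to $T \geq t_{c-q} + t_{logic} + t_{su} - \delta + t_{jitter1} + t_{jitter2}$ and $\delta < t_{c-q,cd} + t_{logic,cd} - t_{hold} - t_{jitter1} - t_{jitter2}$. The Python sketch below evaluates both bounds; the delays are hypothetical and the function names are ours.

```python
def min_period(t_cq, t_logic, t_su, delta=0, jitter1=0, jitter2=0):
    """Minimum clock period with skew and jitter on both clocks.
    With delta = 0 and equal jitters it reduces to Eq. (5); without jitter
    it reduces to Eq. (3)."""
    return t_cq + t_logic + t_su - delta + jitter1 + jitter2


def max_skew(t_cq_cd, t_logic_cd, t_hold, jitter1=0, jitter2=0):
    """Upper bound on delta: the skew must stay strictly below this value.
    Without jitter it reduces to Eq. (4)."""
    return t_cq_cd + t_logic_cd - t_hold - jitter1 - jitter2


if __name__ == "__main__":
    # Hypothetical values in picoseconds.
    for delta in (-30, 0, 30):
        print(delta, min_period(80, 700, 50, delta, jitter1=20, jitter2=20))
    print("max skew without / with jitter:",
          max_skew(60, 20, 40), max_skew(60, 20, 40, jitter1=20, jitter2=20))
```

In this example 20 ps of jitter on each clock consumes the entire 40 ps hold margin, so no positive skew at all can be tolerated.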

Figure 3.36 Consider a negative clock skew (δ); the skew is assumed to be
larger than the jitter.

Pulse Registers
Until now, we have used the master-slave configuration to create an edge-triggered
register. A fundamentally different approach for constructing a register uses pulse signals.
The idea is to construct a short pulse around the rising (or falling) edge of the clock. This
pulse acts as the clock input to a latch (e.g., a TSPC flavor is shown in Figure 7.35a),
sampling the input only in a short window. Race conditions are thus avoided by keeping the
opening time (i.e., the transparent period) of the latch very short. The combination of the
glitch-generation circuitry and the latch results in a positive edge-triggered register. Figure
7.35b shows an example circuit for constructing a short intentional glitch on each rising edge
of the clock. When CLK = 0, node X is charged up to VDD (MN is off since CLKG is low). On
the rising edge of the clock, there is a short period of time when both inputs of the AND gate
are high, causing CLKG to go high. This in turn activates MN, pulling X and eventually
CLKG low (Figure 7.35c). The length of the pulse is controlled by the delay of the AND gate
and the two inverters. Note that there exists also a delay between the rising edges of the
input clock (CLK) and the glitch clock (CLKG) — also equal to the delay of the AND gate and
the two inverters. If every register on the chip uses the same clock generation mechanism,
this sampling delay does not matter. However, process variations and load variations may
cause the delays through the glitch clock circuitry to be different. This must be taken into
account when performing timing verification and clock skew analysis (which is the topic of a
later Chapter). If set-up time and hold time are measured in reference to the rising edge of
the glitch clock, the set-up time is essentially zero, the hold time is equal to the length of the
pulse (if the contamination delay is zero for the gates), and the propagation delay (tc-q)
equals two gate delays. The advantage of the approach is the reduced clock load and the
small number of transistors required. The glitch-generation circuitry can be amortized over
multiple register bits. The disadvantage is a substantial increase in verification complexity.
This has prevented their widespread use. They do, however, provide an alternative approach to
conventional schemes, and have been adopted in some high-performance processors.
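The timing figures quoted for the glitch-clock register reduce to simple arithmetic. The Python sketch below takes them at face value from the text (pulse width and CLK-to-CLKG delay both equal to the AND delay plus two inverter delays, $t_{su} \approx 0$, $t_{hold}$ equal to the pulse width, $t_{c-q}$ of two gate delays) and shows how a mismatch between two glitch generators turns into clock skew. All delay values are hypothetical.

```python
def glitch_clock_delay(t_and, t_inv):
    """Pulse width of CLKG and delay from the CLK edge to the CLKG edge;
    per the text, both equal the AND-gate delay plus two inverter delays."""
    return t_and + 2 * t_inv


def pulse_register_timing(t_and, t_inv, t_gate):
    """Register parameters referenced to the rising edge of CLKG, assuming
    zero contamination delay in the gates."""
    width = glitch_clock_delay(t_and, t_inv)
    return {"t_su": 0, "t_hold": width, "t_cq": 2 * t_gate}


if __name__ == "__main__":
    # Two registers whose glitch generators differ because of process or
    # load variations (hypothetical delays in ps); the difference is skew.
    nominal = glitch_clock_delay(t_and=30, t_inv=20)
    slow = glitch_clock_delay(t_and=36, t_inv=22)
    print(pulse_register_timing(30, 20, t_gate=25))
    print("extra skew between the two registers:", slow - nominal, "ps")
```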
Another version of the pulsed register is shown in Figure 7.36. When the clock is low, M3
and M6 are off, and device P1 is turned on. Node X is precharged to VDD, and the output node
(Q) is decoupled from X and held at its previous state. CLKD is a delay-inverted version of
CLK. On the rising edge of the clock, M3 and M6 turn on while devices M1 and M4 stay on
for a short period, determined by the delay of the three inverters. During this interval, the
circuit is transparent and the input data D is sampled by the latch. Once CLKD goes low,
node X is decoupled from the D input and is either held or starts to precharge to VDD by
PMOS device P2. On the falling edge of the clock, node X is held at VDD and the output is
held stable by the cross-coupled inverters.
Note that this circuit also uses a one-shot, but the one-shot is integrated into the register.
The transparency period also determines the hold time of the register. The window
must be wide enough for the input data to propagate to the Q output. In this particular circuit,
the set-up time can be negative. This is the case if the transparency window is longer
than the delay from input to output. This is attractive, as data can arrive at the register even
after the clock goes high, which means that time is borrowed from the previous cycle.
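The negative set-up time follows from requiring that a change on D still reach Q before the transparency window closes. The Python sketch below uses that simplified model (our own formulation of the argument in the text): with a window $t_{window}$ and an input-to-output delay $t_{dq}$, the set-up time is $t_{dq} - t_{window}$, and a transition arriving up to $t_{window} - t_{dq}$ after the clock edge is still captured.

```python
def pulsed_latch_setup(t_window, t_dq):
    """Set-up time relative to the rising clock edge: data must still reach
    Q before the transparency window closes. Negative when t_window > t_dq."""
    return t_dq - t_window


def captured_value(d_initial, d_changes, t_edge, t_window, t_dq):
    """Value held on Q after the window that opens at t_edge.

    d_changes: time-ordered (time, value) transitions on input D.
    A change is captured only if it reaches Q (t_dq later) before the window
    closes at t_edge + t_window.
    """
    deadline = t_edge + t_window - t_dq
    value = d_initial
    for t, v in d_changes:
        if t <= deadline:
            value = v
    return value


if __name__ == "__main__":
    # Hypothetical: 100 ps window, 60 ps D-to-Q delay -> set-up time of -40 ps.
    print(pulsed_latch_setup(100, 60))
    print(captured_value(0, [(30, 1)], t_edge=0, t_window=100, t_dq=60))
```

In the example, D changes 30 ps after the clock edge and is still captured, which is the time borrowing described above.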

Sense-Amplifier Based Registers


So far, we have presented two fundamental approaches towards building edge-triggered
registers: the master-slave concept and the glitch technique. Figure 7.38 introduces another
technique that uses a sense amplifier structure to implement an edge-triggered register
[Montanaro96]. Sense-amplifier circuits accept small input signals and amplify them to
generate rail-to-rail swings. As we will see, sense amplifier circuits are used extensively in
memory cores and in low swing bus drivers to amplify small voltage swings present in
heavily loaded wires. There are many techniques to construct these amplifiers, with the use
of feedback (e.g., cross-coupled inverters) being one common approach. The circuit shown
in Figure 7.38 uses a precharged front-end amplifier that samples the differential input signal
on the rising edge of the clock signal. The outputs of the front-end are fed into a NAND
cross-coupled SR FF that holds the data and guarantees that the differential outputs switch only
once per clock cycle. The differential inputs in this implementation don’t have to have rail-to-
rail swing and hence this register can be used as a receiver for a reduced swing differential
bus. The core of the front-end consists of a cross-coupled inverter (M5-M8), whose outputs
(L1 and L2) are precharged using devices M9 and M10 during the low phase of the clock. As
a result, PMOS transistors M7 and M8 to be turned off and the NAND FF is holding its
previous state. Transistor M1 is similar to an evaluate switch in dynamic circuits and is

SKR Engineering College


Department of ECE EC8095-VLSI DESIGN

turned off, ensuring that the differential inputs don't affect the output during the low phase of
the clock. On the rising edge of the clock, the evaluate transistor turns on and the differential
input pair (M2 and M3) is enabled, and the difference between the input signals is amplified
on the output nodes L1 and L2. The cross-coupled inverter pair flips to one of its
stable states based on the value of the inputs. For example, if IN is 1, L1 is pulled to 0, and
L2 remains at VDD. Due to the amplifying properties of the input stage, it is not necessary
for the input to swing all the way up to VDD, which enables the use of low-swing signaling on
the input wires. The shorting transistor, M4, is used to provide a DC leakage path from either
node L3 or L4 to ground. This is necessary to accommodate the case where the inputs
change their value after the positive edge of CLK has occurred, resulting in either L3 or L4
being left in a high-impedance state with a logical low voltage level stored on the node. Without
the leakage path, that node would be susceptible to charging by leakage currents. The latch
could then actually change state prior to the next rising edge of CLK! This is best illustrated
graphically, as shown in Figure 7.39.

Figure 7.38 Positive edge-triggered register based on a sense amplifier.
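The register's behavior at the logic level can be sketched as follows. This is only a behavioral sketch under ideal-device assumptions: the front-end is reduced to "sign of (IN - INB) on the rising edge", L1 and L2 drive the active-low set and reset inputs of the NAND SR latch, and the M4 leakage path is not modeled. The class and method names and the input voltages are hypothetical.

```python
class SenseAmpRegister:
    """Logic-level sketch of the sense-amplifier register (ideal devices)."""

    def __init__(self):
        self.q, self.qb = 0, 1   # arbitrary power-up state of the NAND SR latch
        self.l1, self.l2 = 1, 1  # front-end outputs, precharged high

    def _settle_latch(self):
        # Cross-coupled NAND pair: L1 acts as active-low set, L2 as active-low reset.
        for _ in range(4):       # a few passes are enough to reach a fixed point
            self.q = int(not (self.l1 and self.qb))
            self.qb = int(not (self.l2 and self.q))

    def clock_low(self):
        # Precharge phase: M9/M10 pull L1 and L2 high, so the latch holds its state.
        self.l1, self.l2 = 1, 1
        self._settle_latch()

    def rising_edge(self, v_in, v_inb):
        # Evaluate phase: the front-end resolves the sign of (IN - INB) into a
        # full-swing pair; IN > INB pulls L1 to 0 (a tie is resolved as 0 here).
        if v_in > v_inb:
            self.l1, self.l2 = 0, 1
        else:
            self.l1, self.l2 = 1, 0
        self._settle_latch()
        return self.q


reg = SenseAmpRegister()
samples = []
for v_in, v_inb in [(0.65, 0.55), (0.55, 0.65), (0.62, 0.60)]:  # hypothetical low-swing inputs (V)
    reg.clock_low()
    samples.append(reg.rising_edge(v_in, v_inb))
print(samples)  # [1, 0, 1]
```

Only a 20 mV difference is applied in the last cycle, yet Q still receives a full logic value, which is the point of using a sense-amplifier front-end.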
Non-Bistable Sequential Circuits
In the preceding sections, we have focused on a single type of sequential element, namely
the latch (and its sibling, the register). The most important property of such a circuit is that it
has two stable states, and is hence called bistable. The bistable element is not the only
sequential circuit of interest. Other regenerative circuits can be catalogued as astable and
monostable. The former act as oscillators and can, for instance, be used for on-chip clock
generation. The latter serve as pulse generators, also called one-shot circuits. Another
interesting regenerative circuit is the Schmitt trigger. This component has the useful property
of showing hysteresis in its dc characteristics—its switching threshold is variable and
depends upon the direction of the transition (low-to-high or high-to-low). This peculiar feature
can come in handy in noisy environments.
7.6.1 The Schmitt Trigger
Definition
A Schmitt trigger [Schmitt38] is a device with two important properties:
1. It responds to a slowly changing input waveform with a fast transition time at the
output.
2. The voltage-transfer characteristic of the device displays different switching thresholds
for positive- and negative-going input signals. This is demonstrated in Figure 7.46, where a
typical voltage-transfer characteristic of the Schmitt trigger is shown (along with its schematic
symbol). The switching thresholds for the low-to-high and high-to-low transitions are called
VM+ and VM-, respectively. The hysteresis voltage is defined as the difference between the
two. One of the main uses of the Schmitt trigger is to turn a noisy or slowly varying input
signal into a clean digital output signal. This is illustrated in Figure 7.47. Notice how the
hysteresis suppresses the ringing on the signal. At the same time, observe the fast low-to-high
(and high-to-low) transitions of the output signal. Such steep signal
slopes are beneficial in reducing power consumption by suppressing direct-path currents.
The “secret” behind the Schmitt trigger concept is the use of positive feedback.
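The effect of the two thresholds can be shown with a behavioral model. The sketch below treats the Schmitt trigger as an ideal non-inverting comparator with memory: the output goes high only when the input rises above VM+ and goes low only when it drops below VM-. The threshold values and the sampled waveform are hypothetical.

```python
def schmitt_trigger(samples, vm_plus, vm_minus, out=0):
    """Behavioral, non-inverting Schmitt trigger (requires vm_minus < vm_plus).

    The output switches high only above vm_plus and low only below
    vm_minus; between the two thresholds it keeps its previous value.
    """
    result = []
    for v in samples:
        if out == 0 and v > vm_plus:
            out = 1
        elif out == 1 and v < vm_minus:
            out = 0
        result.append(out)
    return result


# Hypothetical slowly varying input with ringing around mid-supply (VDD = 2.5 V)
noisy = [0.2, 0.8, 1.3, 1.1, 1.4, 1.2, 1.6, 1.5, 2.0, 1.3, 1.0, 1.2, 0.6, 0.9, 0.3]
out = schmitt_trigger(noisy, vm_plus=1.5, vm_minus=1.0)
print(out)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

A single comparator threshold at 1.25 V would toggle several times on the same waveform. The hysteresis window reduces the output to exactly one rising and one falling transition.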

CMOS Implementation
One possible CMOS implementation of the Schmitt trigger is shown in Figure 7.48. The idea
behind this circuit is that the switching threshold of a CMOS inverter is determined by the
(kn/kp) ratio between the NMOS and PMOS transistors. Increasing the ratio results in a
reduction of the threshold, while decreasing it results in an increase in VM. Adapting the ratio
depending upon the direction of the transition results in a shift in the switching threshold and
a hysteresis effect. This adaptation is achieved with the aid of feedback. Suppose that Vin is
initially equal to 0, so that Vout = 0 as well. The feedback loop biases the PMOS transistor
M4 in the conductive mode while M3 is off. The input signal effectively connects to an
inverter consisting of two PMOS transistors in parallel (M2 and M4) as a pull-up network,
and a single NMOS transistor (M1) in the pull-down chain. This modifies the effective
transistor ratio of the inverter to kM1/(kM2+kM4), which moves the switching threshold
upwards, to VM+. Once the inverter switches, the feedback loop turns off M4, and the NMOS device
M3 is activated. This extra pull-down device speeds up the transition and produces a clean
output signal with steep slopes. A similar behavior can be observed for the high-to-low
transition. In this case, the pull-down network originally consists of M1 and M3 in parallel,
while the pull-up network is formed by M2. This reduces the value of the switching threshold
to VM–.

Figure 7.48 CMOS Schmitt trigger.
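The threshold shift described above can be estimated numerically. The sketch below assumes the long-channel (square-law) expression for an inverter's switching threshold, VM = (VTn + r(VDD - |VTp|)) / (1 + r) with r = sqrt(kp/kn), and ignores velocity saturation and body effect. It also assumes that M1 to M4 are identical unit devices, so that two devices in parallel act as one device of doubled k. All numbers are hypothetical.

```python
import math


def switching_threshold(kn, kp, vdd, vtn, vtp_abs):
    """Square-law estimate of an inverter's switching threshold VM (V)."""
    r = math.sqrt(kp / kn)
    return (vtn + r * (vdd - vtp_abs)) / (1 + r)


# Hypothetical process values (not from the text)
VDD, VTN, VTP = 2.5, 0.4, 0.4
k = 100e-6  # A/V^2, same for each unit device M1..M4 (assumption)

vm_nominal = switching_threshold(k, k, VDD, VTN, VTP)    # M1 against M2 alone
vm_plus = switching_threshold(k, 2 * k, VDD, VTN, VTP)   # M2 || M4 pull up: kM1/(kM2+kM4)
vm_minus = switching_threshold(2 * k, k, VDD, VTN, VTP)  # M1 || M3 pull down
print(f"VM+ = {vm_plus:.3f} V, VM- = {vm_minus:.3f} V, "
      f"hysteresis = {vm_plus - vm_minus:.3f} V")
```

Doubling the effective pull-up strength raises VM, and doubling the pull-down strength lowers it, which is exactly the ratio adaptation that the feedback through M3 and M4 performs.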

Monostable Sequential Circuits


A monostable element is a circuit that generates a pulse of a predetermined width every time
the quiescent circuit is triggered by a pulse or transition event. It is called monostable
because it has only one stable state (the quiescent one). A trigger event, which is either a
signal transition or a pulse, causes the circuit to go temporarily into another quasi-stable
state. This means that it eventually returns to its original state after a time period determined
by the circuit parameters. This circuit, also called a one-shot, is useful in generating pulses
of a known length. This functionality is required in a wide range of applications. We have
already seen the use of a one-shot in the construction of glitch registers. Another well-known
example is the address transition detection (ATD) circuit, used for timing generation in
static memories. This circuit detects a change in a signal, or group of signals, such as the
address or data bus, and produces a pulse to initialize the subsequent circuitry. The most
common approach to the implementation of one-shots is the use of a simple delay element
to control the duration of the pulse. The concept is illustrated in Figure 7.51. In the quiescent
state, both inputs to the XOR are identical, and the output is low. A transition on the input
causes the XOR inputs to differ temporarily and the output to go high. After a delay td (of the
delay element), this disruption is removed, and the output goes low again. A pulse of length
td is created. The delay circuit can be realized in many different ways, such as an RC
network or a chain of basic gates.
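The XOR-plus-delay one-shot can be sketched in discrete time. The sketch assumes an ideal delay element of exactly td time steps and an ideal XOR. The function name and the input pattern are hypothetical.

```python
def one_shot(signal, td):
    """Discrete-time model of the XOR-plus-delay one-shot.

    signal: list of 0/1 samples, one per time step.
    td: delay of the (ideal) delay element, in time steps.
    The XOR compares the input with its delayed copy, so each input
    transition produces an output pulse td steps wide.
    """
    delayed = [signal[0]] * td + signal[:len(signal) - td]
    return [a ^ b for a, b in zip(signal, delayed)]


inp = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pulses = one_shot(inp, td=3)
print(pulses)  # [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
```

Because an XOR is used, both the rising and the falling input transitions trigger a pulse of length td, which is the behavior an address transition detector needs.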

Astable Circuits
An astable circuit has no stable states. The output oscillates back and forth between two
quasi-stable states with a period determined by the circuit topology and parameters (delay,
power supply, etc.). One of the main applications of oscillators is the on-chip generation of
clock signals. This application is discussed in detail in a later chapter (on timing). The ring
oscillator is a simple example of an astable circuit. It consists of an odd number of inverters
connected in a circular chain. Due to the odd number of inversions, no stable operating point
exists, and the circuit oscillates with a period equal to 2 × tp × N, with N the number of
inverters in the chain and tp the propagation delay of each inverter. The ring oscillator
composed of cascaded inverters produces a waveform with a fixed oscillating frequency
determined by the delay of an inverter in the CMOS process. In many applications, it is
necessary to control the frequency of the oscillator. An example of such a circuit is the
voltage-controlled oscillator (VCO), whose oscillation frequency is a function (typically non-
linear) of a control voltage. The standard ring oscillator can be modified into a VCO by
replacing the standard inverter with a current-starved inverter as shown in Figure 7.53
[Jeong87]. The mechanism for controlling the delay of each inverter is to limit the current
available to discharge the load capacitance of the gate. In this modified inverter circuit, the
maximal discharge current of the inverter is limited by adding an extra series device. Note
that the low-to-high transition on the inverter can also be controlled by adding a PMOS
device in series with M2. The added NMOS transistor M3 is controlled by an analog control
voltage Vcntl, which determines the available discharge current. Lowering Vcntl reduces the
discharge current and, hence, increases tpHL. The ability to alter the propagation delay per
stage allows us to control the frequency of the ring structure. The control voltage is generally
set using feedback techniques. Under low operating current levels, the current-starved
inverter suffers from slow fall times at its output. This can result in significant short-circuit
current in the following stage. This is resolved by feeding its output into a CMOS inverter or,
better yet, a Schmitt trigger. Because each stage then contains two inversions, an extra
inverter is needed at the end to keep the total number of inversions odd and ensure that the
structure oscillates.

Figure 7.53 Voltage-controlled oscillator based on current-starved inverters.
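The ring-oscillator period T = 2 × tp × N and the effect of the starving transistor can be combined in a rough numerical sketch. The assumptions are: tpHL is taken as the time to discharge C_L by VDD/2 with the square-law saturation current of M3, tpLH is held constant, and tp is the average of the two. The function names and all parameter values are hypothetical and only show the trend, not a real design.

```python
def ring_oscillator_frequency(n_stages, tp):
    """Oscillation frequency 1 / (2 * tp * N) of an N-stage ring oscillator."""
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 1.0 / (2.0 * tp * n_stages)


def starved_tphl(c_load, vdd, k_n, v_cntl, v_tn):
    """Rough tpHL of a current-starved inverter: the load is discharged by
    VDD/2 with the saturation current of the starving transistor M3
    (square-law model; velocity saturation and M1's resistance ignored)."""
    if v_cntl <= v_tn:
        raise ValueError("M3 is off: Vcntl must exceed its threshold")
    i_d = 0.5 * k_n * (v_cntl - v_tn) ** 2
    return c_load * (vdd / 2) / i_d


# Hypothetical parameters, not taken from the text
VDD, C_L, K_N, V_TN, TPLH = 2.5, 10e-15, 200e-6, 0.4, 40e-12
freqs = []
for v_cntl in (0.8, 1.2, 2.0):
    tp = 0.5 * (starved_tphl(C_L, VDD, K_N, v_cntl, V_TN) + TPLH)  # average stage delay
    freqs.append(ring_oscillator_frequency(5, tp))
    print(f"Vcntl = {v_cntl:.1f} V -> f = {freqs[-1] / 1e6:.0f} MHz")
```

Raising Vcntl increases the discharge current, shortens tpHL and therefore raises the oscillation frequency. The relationship is clearly non-linear, as noted for VCOs above.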

3.12 Synchronizers and Arbiters*


3.12.1 Synchronizers—Concept and Implementation

Even though a complete system may be designed in a synchronous fashion, it
must still communicate with the outside world, which is generally asynchronous. An
asynchronous input can change value at any time relative to the clock edges of the
synchronous system, as is illustrated in Figure 3.55.

Figure 3.55 Asynchronous-synchronous interface

Consider a typical personal computer. All operations within the system are
strictly orchestrated by a central clock that provides a time reference. This reference
determines what happens within the computer system at any point in time.

This synchronous computer has to communicate with a human through the
mouse or the keyboard, who has no knowledge of this time reference and might
decide to press a keyboard key at any point in time. The way a synchronous system
deals with such an asynchronous signal is to sample or poll it at regular intervals and
to check its value. If the sampling rate is high enough, no transitions will be missed;
this is known as the Nyquist criterion in the communication community. However, it
might happen that the signal is polled in the middle of a transition.

The resulting value is neither low nor high but undefined. At that point, it is not
clear if the key was pressed or not. Feeding the undefined signal into the computer
could be the source of all types of trouble, especially when it is routed to different
functions or gates that might interpret it differently.

For instance, one function might decide that the key is pushed and start a
certain action, while another function might lean the other way and issue a competing
command. This results in a conflict and a potential crash. Therefore, the undefined
state must be resolved in one way or another before it is interpreted further.
It does not really matter what decision is made, as long as a unique result is
available. For instance, it is either decided that the key is not yet pressed, which will
be corrected in the next poll of the keyboard, or it is concluded that the key is already
pressed. Thus, an asynchronous signal must be resolved to be either in the high or
low state before it is fed into the synchronous environment.

A circuit that implements such a decision-making function is called a
synchronizer. Now comes the bad news: building a perfect synchronizer that always
delivers a legal answer is impossible! A synchronizer needs some time to come to a
decision, and in certain cases this time might be arbitrarily long. An
asynchronous/synchronous interface is thus always prone to errors called
synchronization failures.

The designer’s task is to ensure that the probability of such a failure is small
enough that it is not likely to disturb the normal system behavior. Typically, this
probability can be reduced in an exponential fashion by waiting longer before making
a decision. This is not too troublesome in the keyboard example, but in general,
waiting affects system performance and should therefore be avoided to a
maximal extent.

To illustrate why waiting helps reduce the failure rate of a synchronizer,
consider a synchronizer as shown in Figure 3.56. This circuit is a latch that is
transparent during the low phase of the clock and samples the input on the
rising edge of the clock CLK. However, since the sampled signal is not synchronized
to the clock signal, there is a finite probability that the set-up time or hold time of the
latch is violated (the probability is a strong function of the transition frequencies of the
input and the clock). As a result, once the clock goes high, there is a chance that
the output of the latch resides somewhere in the undefined transition zone. The
sampled signal eventually evolves into a legal 0 or 1 even in the latter case, as
the latch has only two stable states.
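The exponential benefit of waiting can be made concrete with the synchronizer MTBF expression MTBF = Tc e^((Tc - tsetup)/τ) / (N T0), the same form used in the two-mark answer on MTBF. The sketch assumes that the latch output is sampled one clock period later, so its resolution time is roughly Tc - tsetup. The latch constants τ, T0 and the input rate N used here are hypothetical.

```python
import math


def synchronizer_mtbf(t_c, t_setup, tau, t0, n_rate):
    """MTBF in seconds: Tc * exp((Tc - tsetup) / tau) / (N * T0).

    t_c: clock period (the time the latch is given to resolve)
    t_setup: setup time of the next stage
    tau: resolution time constant of the latch
    t0: latch-dependent constant
    n_rate: average number of asynchronous input changes per second
    """
    return t_c * math.exp((t_c - t_setup) / tau) / (n_rate * t0)


SECONDS_PER_YEAR = 3600 * 24 * 365
# Hypothetical latch parameters: tsetup = 50 ps, tau = T0 = 20 ps, N = 1e8 changes/s
for t_c in (1e-9, 2e-9):
    mtbf = synchronizer_mtbf(t_c, 50e-12, 20e-12, 20e-12, 1e8)
    print(f"Tc = {t_c * 1e9:.0f} ns -> MTBF = {mtbf / SECONDS_PER_YEAR:.2e} years")
```

Every additional τ of waiting multiplies the MTBF by roughly e. This is why giving the synchronizer more time reduces the failure rate exponentially, at the cost of latency.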

3.13 Clock Synthesis and Synchronization Using a Phase-Locked Loop


There are numerous digital applications that require the on-chip generation of
a periodic signal. Synchronous circuits need a global periodic clock reference to drive
sequential elements. Current microprocessors and high-performance digital circuits
require clock frequencies in the gigahertz range. Crystal oscillators generate
accurate, low-jitter clocks with a frequency range from tens of megahertz to
approximately 200 MHz.

To generate a higher frequency required by digital circuits, a phase-locked
loop (PLL) structure is typically used. A PLL takes an external low-frequency
reference crystal frequency signal and multiplies its frequency by a rational number N
(see the left side of Figure 3.58). PLLs are also used to perform synchronization of
communication between chips.

Typically, as shown in Figure 3.58, a reference clock is sent along with the
parallel data being communicated (in this example only the transmit path from chip 1
to chip 2 is shown). Since chip-to-chip communication occurs at a lower rate than the
on-chip clock rate, the reference clock is a divided but in-phase version of the system
clock.

The reference clock synchronizes all input flip-flops on chip 2; this can present
a significant clock load for wide data busses. Introducing clock buffers to deal with
this problem unfortunately introduces skew between the data and sample clock. A
PLL, using feedback, can align (i.e., de-skew) the output of the clock buffer with
respect to the data. In addition, for the configuration shown in Figure 3.58, the PLL
can multiply the frequency of the incoming reference clock, allowing the reference
clock to be a fraction of the data rate.

Figure 3.58 Applications of Phase Locked Loops (PLL).


3.13.1 Basic Concept
Periodic signals of known frequency can be described exactly by only one
parameter, their phase. More accurately, a set of two or more periodic signals of the
same frequency can be well defined if we know one of them and its phase with
respect to the other signals, as in Figure 3.59. Here Φ1 and Φ2 represent the phase of
the two signals. The relative phase is defined as the difference between the two
phases.

Figure 3.59 Relative and absolute phase of two periodic signals


A PLL is a complex, nonlinear feedback circuit, and its basic operation is
understood with the aid of Figure 3.60. The voltage-controlled oscillator (VCO) takes
an analog control input and generates a clock signal of the desired frequency. In
general, there is a non-linear relationship between the control voltage (vcont) and the
output frequency. To synthesize a system clock of a particular frequency, it is
necessary to set the control voltage to the appropriate value.

This is the function of the rest of the blocks (i.e., the feedback loop) in the PLL. The
feedback loop is critical to tracking process and environmental variations. The
feedback also allows frequency multiplication.

The reference clock is typically generated off-chip from an accurate crystal
reference. The reference clock is compared to a divided version of the system clock
(i.e., the local clock). The local clock and reference clock are compared using a
phase detector that measures the phase difference between the signals and
produces an Up or Down signal when the local clock lags or leads the reference
signal, respectively.
It detects which of the two input signals arrives earlier and produces an
appropriate output signal. Next, the Up and Down signals are fed into a charge pump,
which translates the digitally encoded control information into an analog voltage. An Up
signal increases the value of the control voltage and speeds up the VCO, which
causes the local signal to catch up with the reference clock. A Down signal, on the
other hand, slows down the oscillator and eliminates the phase lead of the local
clock.
Passing the output of the charge pump directly into the VCO creates a jittery
clock signal. The edge of the local clock jumps back and forth instantaneously and
oscillates around the targeted position. As discussed earlier, clock jitter is highly
undesirable, since it reduces the time available for logic computation, and it
should therefore be kept within a given percentage of the clock period. This is partially
accomplished by the introduction of the loop filter.

This low-pass filter removes the high-frequency components from the
VCO control voltage and smooths out its response, which results in a reduction of the
jitter. Note that the PLL structure is a feedback structure and the addition of extra
phase shifts, as is done by a high-order filter, may result in instability.

Important properties of a PLL are its lock range, the range of input
frequencies over which the loop can maintain functionality; the lock time, the time it
takes for the PLL to lock onto a given input signal; and the jitter. When in lock, the
system clock is N times the reference clock frequency. A PLL is an analog circuit and
is inherently sensitive to noise and interference. This is especially true for the loop
filter and VCO, where induced noise has a direct effect on the
resulting clock jitter. A major source of interference is the noise coupling through the
supply rails and the substrate. This is particularly a concern in digital environments, where
noise is introduced due to a variety of sources. Analog circuits with a high supply
rejection, such as differential VCOs, are therefore desirable. In summary, integrating
a highly sensitive component into a hostile digital environment is nontrivial and
requires expert analog design.

Figure 3.60 Composition of a phase-locked loop (PLL).
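The loop described above can be sketched as a linearized, continuous-time model integrated step by step. This is not how a real phase detector and charge pump behave cycle by cycle. The assumptions are: the phase detector output is idealized as proportional to the phase error (phases in cycles, not wrapped), the charge pump delivers an average current proportional to that error, and the loop filter is a series resistor-capacitor (one common choice). All parameter values and names are hypothetical, in normalized units.

```python
def simulate_pll(f_ref=1.0, n_div=8, f_free=6.0, k_vco=2.0,
                 i_cp=4.0, r=0.7, c=4.0, dt=1e-3, t_end=100.0):
    """Linearized PLL model integrated with forward Euler (normalized units)."""
    theta_ref = theta_vco = 0.0  # accumulated phase, in cycles
    q = 0.0                      # charge on the loop-filter capacitor
    f_vco, err = f_free, 0.0
    for _ in range(int(round(t_end / dt))):
        # Phase detector: err > 0 means the divided (local) clock lags -> "Up".
        err = theta_ref - theta_vco / n_div
        i = i_cp * err                # average charge-pump current
        q += i * dt                   # capacitor integrates the current
        v_cntl = i * r + q / c        # series R-C filter; R adds a stabilizing zero
        f_vco = f_free + k_vco * v_cntl
        theta_ref += f_ref * dt
        theta_vco += f_vco * dt
    return f_vco, err


f_out, phase_err = simulate_pll()
print(f"f_vco = {f_out:.6f} (target 8.0), final phase error = {phase_err:.1e} cycles")
```

With these values the loop starts with the VCO running at 6 and settles at N × f_ref = 8, with the phase error driven to zero. Without the resistor term the two integrators in the loop would leave the model without damping, which is one way to see why the loop filter matters for stability.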

TWO MARKS

1. What is bubble pushing?


Bubble pushing is a technique used to transform certain gates into others -
usually NANDs to ORs and NORs to ANDs. The technique can also be used to
simplify the logic of a combinational circuit. The rule is that pushing one "bubble"
(circle representing NOT) from each input changes the gate (AND to OR or OR to
AND) and yields one bubble on the output.
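The bubble-pushing rule is De Morgan's theorem, and it can be checked exhaustively with a short truth-table sketch. The helper names are hypothetical.

```python
from itertools import product


def nand(a, b):
    return int(not (a and b))


def nor(a, b):
    return int(not (a or b))


def or_with_bubbled_inputs(a, b):
    return int((not a) or (not b))


def and_with_bubbled_inputs(a, b):
    return int((not a) and (not b))


def identities_hold():
    # NAND == OR with inverted inputs, NOR == AND with inverted inputs,
    # checked over all four input combinations.
    return all(
        nand(a, b) == or_with_bubbled_inputs(a, b)
        and nor(a, b) == and_with_bubbled_inputs(a, b)
        for a, b in product((0, 1), repeat=2)
    )


print(identities_hold())  # True
```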
2. Define time borrowing?
When a system uses transparent latches, the data can depart the first latch on
the rising edge of the clock, but does not have to set up until the falling edge of
the clock on the receiving latch. If one half-cycle or stage of a pipeline has too
much logic, it can borrow time into the next half-cycle.

3. Define clock skew?


In circuit designs, clock skew (sometimes timing skew) is a phenomenon in
synchronous circuits in which the clock signal (sent from the clock circuit) arrives
at different components at different times. This can be caused by many different
things, such as wire-interconnect length, temperature variations, variation in
intermediate devices, capacitive coupling, material imperfections, and differences
in input capacitance on the clock inputs of devices using the clock. As the clock
rate of a circuit increases, timing becomes more critical and less variation can be
tolerated if the circuit is to function properly.
4. What is meant by clock gating?
Clock gating is one of the power-saving techniques used on many
synchronous circuits. To save power, clock gating support adds more logic to a
circuit to prune the clock tree, thus disabling portions of the circuitry so that its
flip-flops do not change state: their switching power consumption goes to zero,
and only leakage currents are incurred.
5. Define metastability?
A latch is a bistable device. Under the right conditions, however, the latch can enter a
state in which the output is at an indeterminate level between 0 and 1. This point is
called metastable because the voltages are self-consistent and can remain there
indefinitely.
6. Define synchronizers? [MAY2013]
It is a circuit that accepts an input that can change at arbitrary times and
produces an output aligned to the synchronizer's clock.
7. Specify some of the drawbacks of ratioed circuits?
The drawbacks of ratioed circuits include
• slow rising transitions
• contention on the falling transitions
• static power dissipation
• non-zero VOL

8. What is meant by monotonically rising?

While a dynamic gate is in evaluation, the inputs must be monotonically rising.
That is, the input can start LOW and remain LOW, start LOW and rise HIGH, start
HIGH and remain HIGH, but not start HIGH and fall LOW.
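The rule can be stated as a check on a sampled input during one evaluation phase. A falling input is illegal because, if it starts HIGH, the dynamic node may already have discharged, and it cannot be recharged until the next precharge. The function name and the sample waveforms are hypothetical.

```python
def is_monotonically_rising(waveform):
    """True if a sampled 0/1 input never falls (1 -> 0) during evaluation."""
    return all(not (prev == 1 and cur == 0)
               for prev, cur in zip(waveform, waveform[1:]))


print(is_monotonically_rising([0, 0, 1, 1]))  # True: starts LOW, rises HIGH
print(is_monotonically_rising([1, 1, 0]))     # False: starts HIGH, falls LOW
```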
9. What is MODL?[Nov 2012]
It is often necessary to compute multiple functions where one is a subfunction of
another or shares a subfunction. Multiple-output domino logic (MODL) saves area by
combining all of the computations into a multiple-output gate.
10. What are the main drawbacks of NORA?
NORA [NO Race] has two major drawbacks. The logical effort of footed p-
logic gates is generally worse than that of Hi-skew gates. Secondly, NORA is
extremely susceptible to noise.
11. Define Pass-transistor logic? [MAY 2010]
It reduces the count of transistors used to make different logic gates, by
eliminating redundant transistors. Transistors are used as switches to pass logic
levels between nodes of a circuit, instead of as switches connected directly to
supply voltages.
In pass-transistor circuits, inputs are also applied to the source/drain diffusion
terminals. These circuits build switches using either nMOS pass transistors or
parallel pairs of nMOS and pMOS transistors called transmission gates.

Eg.: NAND gate


12. Explain about LEAP?
It is a single-ended logic family in that the complementary network is not
required, thus saving area and power. The output is buffered with an inverter,
which can be LO-skewed to favor the asymmetric response of an nMOS
transistor. The nMOS network only pulls up to VDD - Vt so a pMOS feedback
transistor is necessary to pull the internal node fully high, avoiding power
consumption in the output inverter.
13. What circuit families are best for low power?
Dynamic and pseudo-nMOS gates appear attractive because they eliminate
the bulky p-MOS transistors that account for 2/3 of the gate width in
complementary CMOS logic. However, pseudo-nMOS static power dissipation will dwarf
the dynamic power in most applications. Complementary CMOS, in contrast,
benefits from efficient layout of simple gates, no swing restoration circuitry, single-rail
logic design, and better scaling at low VDD/Vt ratios, and pass-transistor circuits have
been shown to have a higher power-delay product in most situations. Hence,
complementary CMOS is generally the best choice for low power.
14. What are the advantages of differential flip flops? [Nov-2011]
They are built from clocked sense amplifiers, so they can respond rapidly to
small differential input voltages. They work well with low-swing inputs.


15. Compare CMOS combinational logic gates with reference to the Equivalent
n-mos depletion load logic with reference to the Area requirement. [MAY2012]
For CMOS, the area required is 533 μm², whereas for pseudo-nMOS it is
288 μm².
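16. What are the advantages of using a pseudo-nMOS gate instead of a full
CMOS gate? [MAY 2012]
A pseudo-nMOS gate needs only N+1 transistors instead of 2N, so it has a smaller area
and a lower input capacitance. The price is that ratioed circuits dissipate power
continually in certain states and have poorer noise margins than complementary circuits.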
17. Enumerate the features of synchronizers [MAY2013]
The goal of a digital system designer should be to ensure that, given
asynchronous inputs, the probability of encountering a metastable voltage is
sufficiently small. To guarantee good logic levels, all asynchronous inputs should
be passed through synchronizers.
18. List the various power losses in CMOS circuits [MAY2013]
1. Static power dissipation (due to leakage current when the circuit is idle)
2. Dynamic power dissipation (when the circuit is switching), and
3. Short-circuit power dissipation during switching of transistors.
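The three components can be estimated with the usual first-order expressions: dynamic power α·C·VDD²·f (α is the activity factor), static power I_leak·VDD, and short-circuit power approximated here as an average short-circuit current times VDD. The function name and all numbers below are hypothetical.

```python
def cmos_power(alpha, c_load, vdd, f_clk, i_leak, i_sc_avg=0.0):
    """First-order estimate of the three CMOS power components (W).

    dynamic       = alpha * C * VDD^2 * f
    short-circuit ~ I_sc_avg * VDD  (average short-circuit current, an assumption)
    static        = I_leak * VDD
    """
    p_dyn = alpha * c_load * vdd ** 2 * f_clk
    p_sc = i_sc_avg * vdd
    p_static = i_leak * vdd
    return p_dyn, p_sc, p_static


# Hypothetical chip: 1 nF switched capacitance, 1.0 V, 1 GHz, 10% activity
p_dyn, p_sc, p_static = cmos_power(0.1, 1e-9, 1.0, 1e9, i_leak=10e-3, i_sc_avg=5e-3)
print(f"dynamic {p_dyn:.3f} W, short-circuit {p_sc:.3f} W, static {p_static:.3f} W")
```

The quadratic dependence on VDD in the dynamic term is why supply scaling is the most effective lever for reducing switching power.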
19. Draw the NAND gate logic gate diagram and its layout diagram [Nov 2012]

20. Draw a pseudo NMOS inverter? [MAY2010]


21. What is the classification of CMOS circuit families?

• Static CMOS circuits.
• Dynamic CMOS circuits.
• Ratioed circuits.
• Pass-transistor circuits.
22. What are the characteristics of static CMOS design?
A static CMOS circuit is a combination of two networks, the pull-up network
(PUN) and the pull-down network (PDN), in which at every point in time each gate
output is connected to either VDD or VSS via a low-resistance path.
23. List the important properties of static CMOS design.
• At any instant of time, the output of the gate is directly connected to either VDD
or VSS.
• The function of the PUN is to provide a connection between the output and
VDD.
• The function of the PDN is to provide a connection between the output and
VSS.
• Both PDN and PUN are constructed in a mutually exclusive way such that
one and only one of the networks is conducting in steady state. That is, the output
node is always a low-impedance node in steady state.
24. What is dynamic CMOS logic?
Dynamic circuits rely on the temporary storage of signal values on the
capacitance of high-impedance nodes.
• Requires only N+2 transistors.
• Takes a sequence of precharge and conditional evaluation phases to
realize logic functions.

25. What are the properties of dynamic logic?

• Logic function is implemented by the pull-down network only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed.
• Faster switching speeds.
• Needs a precharge clock.
26. What are the disadvantages of dynamic CMOS technology?
• A fundamental difficulty with dynamic circuits is a loss of noise immunity
and a serious timing restriction on the inputs of the gate.
• Inputs that are not monotonically rising during the evaluation phase cause
incorrect outputs.
27. What is CMOS domino logic?
CMOS domino logic places a static CMOS inverter between cascaded dynamic gates;
the inverter eliminates the monotonicity problem of cascaded dynamic circuits.
28. What are static and dynamic sequencing elements?
A sequencing element with static storage employs some sort of feedback to
retain its output value indefinitely.
A sequencing element with dynamic storage generally maintains its value as
charge on a capacitor that will leak away if not refreshed for a long period of time.
29. What is clock skew?
In reality, clocks have some uncertainty in their arrival times, which cuts into
the time available for useful computation; this uncertainty is called clock skew.
30. What are synchronizers?
Synchronizers are used to reduce the probability of metastability failures. They
ensure reliable synchronization between an asynchronous input and a synchronous system.
31. What is the difference between Mealy and Moore state machines?
In a Mealy state machine, both the next state and the output are computed from the
present input and the present state. In a Moore state machine, the next state is
computed from the input and the present state, but the output depends only on the
present state.
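The difference can be seen with a hypothetical rising-edge detector implemented both ways. In the Mealy version the output depends on the current input, so it responds in the same cycle as the 0-to-1 edge. In the Moore version the output is a function of the state alone, so it appears one cycle later. The state names and function names are illustrative only.

```python
def mealy_edge_detector(inputs):
    """Mealy: output = f(present state, input). The state remembers the
    previous input bit; the output is 1 in the same cycle as a 0->1 edge."""
    state, outputs = 0, []
    for x in inputs:
        outputs.append(int(state == 0 and x == 1))  # depends on the input
        state = x                                   # next state
    return outputs


def moore_edge_detector(inputs):
    """Moore: output = f(present state) only. States ZERO, EDGE, ONE; the
    output is 1 only while in EDGE, i.e. one cycle after the edge."""
    state, outputs = "ZERO", []
    for x in inputs:
        outputs.append(int(state == "EDGE"))        # depends on the state only
        if x == 0:
            state = "ZERO"
        else:
            state = "EDGE" if state == "ZERO" else "ONE"
    return outputs


bits = [0, 1, 1, 0, 1, 0]
print(mealy_edge_detector(bits))  # [0, 1, 0, 0, 1, 0]
print(moore_edge_detector(bits))  # [0, 0, 1, 0, 0, 1]
```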

32. Define propagation delay and contamination delay?

Propagation delay (tpd): The maximum time needed for a change in a logic input
to result in a permanent change at an output; that is, the combinational logic will not
show any further output changes in response to an input change after tpd time units.
Contamination delay (tcd): The minimum time needed for a change in a logic input
to result in an initial change at an output; that is, the combinational logic is
guaranteed not to show any output change in response to an input change before
tcd time units have passed.
33. Define Setup time and Hold time.
Setup time (tsetup): The amount of time before the rising clock edge that the data
input D must be stable.
Hold time (thold): The amount of time after the clock edge arrives that the data
input D must be held stable in order for the FF to latch the correct
value. Hold time is always measured from the rising clock edge to a point after the
clock edge.
34. Difference between latches and Flip-Flop.

35.Define Pipelining.
Pipelining is a popular design technique often used to accelerate the
operation of the data path in digital processors. Other advantages of pipelining
are reduced glitching in complex logic networks and lower energy due to
operand isolation.
36. How are the limitations of a ROM-based realization overcome in a PLA-based
realization?
In a ROM, only the encoder part is programmable, and the use of ROMs to realize
Boolean functions is wasteful in many situations because a significant part of the
array has no cross-connects.
This wastage can be overcome by using a Programmable Logic Array (PLA),
which requires much less chip area.
37.In what way the DRAMs differ from SRAMs?
Both SRAMs and DRAMs are volatile in nature, i.e., information is lost if the power
supply is removed. However, SRAMs provide high switching speed and good noise margin
but require a larger chip area than DRAMs.


38.Explain the read and write operations for a one-transistor DRAM cell.
A significant improvement in the DRAM evolution was to realize the 1-T DRAM
cell, in which one additional capacitor is explicitly fabricated for storage. To store '1',
the capacitor is charged; to store '0', it is discharged to 0 V. The read operation is
destructive, and a sense amplifier is needed for reading.
The read operation is followed by a restoration operation.
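Why the read is destructive and why a sense amplifier is needed both follow from charge sharing between the storage capacitor Cs and the much larger bit-line capacitance CBL: ΔV = (Vcell - Vprecharge) · Cs / (Cs + CBL). The sketch assumes a bit line precharged to VDD/2 (a common choice). The supply and capacitance values are hypothetical.

```python
def bitline_swing(v_cell, v_precharge, c_s, c_bl):
    """Bit-line voltage change after charge sharing with the cell:
    dV = (Vcell - Vprecharge) * Cs / (Cs + CBL)."""
    return (v_cell - v_precharge) * c_s / (c_s + c_bl)


VDD = 1.2                     # hypothetical supply (V)
C_S, C_BL = 30e-15, 300e-15   # hypothetical cell and bit-line capacitances (F)
for bit, v_cell in (("1", VDD), ("0", 0.0)):
    dv = bitline_swing(v_cell, VDD / 2, C_S, C_BL)
    print(f"stored {bit}: bit line moves by {dv * 1e3:+.1f} mV")
```

The resulting swing is only tens of millivolts, so a sense amplifier must restore a full logic level. Because the cell voltage also ends up close to the bit-line voltage after sharing, the stored value must be written back (restored) after each read.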
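39. What is MTBF?
The mean time between failures (MTBF) of a synchronizer is
$\mathrm{MTBF} = \frac{1}{N \cdot P(\text{failure})} = \frac{T_c \, e^{(T_c - t_{setup})/\tau}}{N \, T_0}$,
where Tc is the clock period, tsetup is the setup time, τ is the time constant with which the latch resolves a metastable state, T0 is a latch-dependent constant, and N is the average rate at which the asynchronous input changes.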
40. What is meant by the max-delay constraint and the min-delay constraint?
Min-delay constraint: The path begins with the rising edge of the clock triggering F1. The data may begin to change at Q1 after a clk-to-Q contamination delay. However, it must not reach D2 until at least the hold time after the clock edge, lest it corrupt the contents of F2. Hence, we solve for the minimum logic contamination delay:
$t_{cd} \ge t_{hold} - t_{ccq}$
Max-delay constraint: The path begins with the rising edge of the clock triggering F1. The data must propagate to the output of the flip-flop Q1 and through the combinational logic to D2, setting up at F2 before the next rising clock edge. Under ideal conditions, the worst-case propagation delays determine the minimum clock period for this sequential circuit:
$T_c \ge t_{pcq} + t_{pd} + t_{setup}$
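Both constraints can be checked together with a small Python sketch. It assumes a single flip-flop to logic to flip-flop path with no clock skew. The function name and the picosecond values are hypothetical.

```python
def path_slacks(t_c, t_pcq, t_ccq, t_pd, t_cd, t_setup, t_hold):
    """Setup and hold slack of a flip-flop -> logic -> flip-flop path (no clock skew).

    Max-delay constraint: T_c  >= t_pcq + t_pd + t_setup
    Min-delay constraint: t_cd >= t_hold - t_ccq
    A negative slack means the corresponding constraint is violated.
    """
    setup_slack = t_c - (t_pcq + t_pd + t_setup)
    hold_slack = t_cd - (t_hold - t_ccq)
    return setup_slack, hold_slack


# Hypothetical delays in picoseconds.
setup_slack, hold_slack = path_slacks(t_c=500, t_pcq=50, t_ccq=35,
                                      t_pd=380, t_cd=10, t_setup=60, t_hold=30)
print(setup_slack, hold_slack)  # 10 15
```

A negative setup slack can be fixed by lengthening the clock period. A negative hold slack cannot, because the min-delay constraint does not depend on T_c.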
41.Draw switch level schematic of multiplexer based NMOS latch using NMOS
only pass transistors for multiplexer.(May-16)

42.What is clocked CMOS register?(May-16)


Clocked CMOS register is a positive edge triggered register that is based on a
master-slave concept insensitive to clock overlap.


UNIVERSITY QUESTIONS
PART B
1.Explain in detail about the pipelining concept used in sequential circuits. (16) [MAY
2013]
2.Discuss the techniques to reduce switching activity in a static and dynamic CMOS
circuits.(16) [MAY 2013]
3.(i) For a two-input NAND gate, derive an expression for the drain current. (8)
[MAY 2012]
(ii) Draw a CMOS NOR2 gate and explain its complementary operation with the necessary
equations. (4)
(iii) Obtain a CMOS logic design realizing the Boolean function z = a(d+e) + bc. (8)
[MAY 2012]
4.(i) Draw a circuit diagram of the CMOS SR latch and explain in detail. (8) [MAY
2012]
(ii) Explain the CMOS negative-edge-triggered master-slave D flip-flop, along with the necessary input and output waveforms. (8) [MAY 2012]
5. Write the basic principle of low power logic design. (4) [NOV 2011]
6.(i) For a resistive-load inverter circuit with VDD = 5 V, k'n = 20 µA/V², VT0 = 0.8 V,
RL = 200 kΩ and W/L = 2, calculate the critical voltages on the voltage transfer
characteristic and find the noise margins of the circuit. [May 2011]
(ii) Explain pseudo-nMOS gates in detail with a neat circuit diagram. [May
2011]
7.(i) Design a transistor level schematic of the one bit full adder circuit and explain.
(6) [May 2011]
(ii) Discuss in detail the characteristics of CMOS transmission gate. (10) [May 2011]
8.(i) What is meant by a transmission gate? List the applications of transmission gates
and design a 2:1 multiplexer circuit using transmission gates. (8) [NOV 2012]
(ii) Realize the AND logic gate and OR logic gate using NOR logic gates. (8) [NOV
2012]
9. Realize the AND logic gate and OR logic gate using NAND logic gates. (8) [NOV
2012]
10.Explain in detail about the design of a six transistor static RAM cell with dynamic
refreshment logic.(16) [NOV 2012]
11.(i)Compare static and dynamic logic circuits with example.(8)(DEC 2014)
(ii)Explain the dynamic and static power reduction in low power design of VLSI
circuits.(8)
12. Explain the methodology of sequential circuit design of latches and flip-flops. (16)
[MAY 2014]
13. Explain the operation of a master-slave-based edge-triggered register. [MAY 2016]
14. Discuss in detail various pipelining approaches to optimize sequential
circuits. [MAY 2016]


UNIT IV
DESIGNING ARITHMETIC BUILDING BLOCKS


Introduction


The overall system performance is dominated by the speed and power consumption of circuits such as adders, multipliers and shifters, so careful design optimization is needed in these circuits. For every module, multiple equivalent logic and circuit topologies exist, and each topology has its own advantages and disadvantages in terms of area, speed or power.

4.1 Data Path Circuits


A digital processor has components such as the data path, memory, control and input/output blocks. All computations are performed in the data path. The other components of the processor are support units that store the results produced by the data path or help to determine the actions to be taken in the next cycle.

A data path consists of an interconnection of combinational functions such as arithmetic or logical operations. The design of the data path depends on the application. For example, in PC processors speed is the main constraint, while in another application the maximum dissipated power may be the constraint.

Fig 4.1 shows the bit-sliced data path organization. Data in a processor are operated on in a word-based manner. Data paths in a microprocessor are 32 or 64 bits wide, while the signal-processing data paths in Digital Subscriber Line (DSL) modems, magnetic disk drives or compact disc players are 5 to 24 bits wide.

A 32-bit processor operates on data words that are 32 bits wide. Because the same operation is performed on each bit of the data word, the data path consists of 32 bit slices, each operating on a single bit. This is why the organization is called bit-sliced.

Fig:4.1 Bit sliced data path organization
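The bit-sliced idea can be modelled in Python as one slice function replicated across the word. This is a minimal sketch. The slice here is a hypothetical logic unit (AND/OR/XOR) chosen only because its slices are independent. Arithmetic slices such as adders additionally pass a carry from each slice to its neighbour.

```python
def bit_slice(a_bit: int, b_bit: int, op: str) -> int:
    """One bit slice of a hypothetical logic unit; the same cell serves every bit position."""
    if op == "and":
        return a_bit & b_bit
    if op == "or":
        return a_bit | b_bit
    if op == "xor":
        return a_bit ^ b_bit
    raise ValueError(f"unsupported operation: {op!r}")


def bit_sliced_datapath(a: int, b: int, op: str, width: int = 32) -> int:
    """Replicate the identical slice for bit positions 0 .. width-1 and reassemble the word."""
    result = 0
    for k in range(width):
        result |= bit_slice((a >> k) & 1, (b >> k) & 1, op) << k
    return result


print(hex(bit_sliced_datapath(0xF0F0, 0x0FF0, "xor", width=16)))  # 0xff00
```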

4.2 Ripple Carry Adder


Addition is the most commonly used arithmetic operation. The adder is often the speed-limiting element, so careful optimization of the adder design is needed. The optimization is made either at the logic level or at the circuit level.

Logic-level optimizations rearrange the Boolean equations to obtain a faster circuit; the carry-lookahead adder is an example. Circuit-level optimizations manipulate transistor sizes and circuit topology to optimize the speed.

A full adder has three inputs and two outputs. Let A and B be the adder inputs, Cin the carry input, S the sum output and Cout the carry output. Table 4.1 shows the truth table of the binary full adder.

A B Ci S Co Carry status
0 0 0 0 0 delete
0 0 1 1 0 delete
0 1 0 1 0 propagate
0 1 1 0 1 propagate
1 0 0 1 0 propagate
1 0 1 0 1 propagate
1 1 0 0 1 generate/propagate
1 1 1 1 1 generate/propagate

Table 4.1.Truth table of full adder


The outputs S and Cout are defined as functions of some intermediate signals: G (generate), D (delete) and P (propagate). If G = 1, a carry bit is generated (1) at Cout independent of Cin. If D = 1, a carry bit is deleted (0) at Cout independent of Cin. P = 1 guarantees that an incoming carry Cin will propagate to Cout.

$S = A \oplus B \oplus C_{in}$
$C_{out} = AB + BC_{in} + AC_{in}$
$G = AB$
$D = \overline{A}\,\overline{B}$
$P = A \oplus B$

S and Cout can now be written as functions of P and G:

$C_o(G,P) = G + PC_{in}$

$S(G,P) = P \oplus C_{in}$

G and P are functions of A and B only and do not depend on Cin. Similarly,
expressions for S(D,P) and Co(D,P) can be derived.
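The generate/propagate/delete formulation can be checked in Python against the plain Boolean equations. The sketch below takes P = A XOR B, as in the equations above. With that choice, exactly one of G, P and D is active for each input pair. The function names are illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """Reference full adder: S = A xor B xor Cin, Cout = AB + BCin + ACin."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def full_adder_gpd(a: int, b: int, cin: int):
    """Same adder written through the generate / propagate / delete signals."""
    g = a & b              # generate: Cout = 1 whatever Cin is
    p = a ^ b              # propagate: Cout = Cin
    d = (1 - a) & (1 - b)  # delete: Cout = 0 whatever Cin is
    assert g + p + d == 1  # exactly one of G, P, D is active for each (A, B)
    return p ^ cin, g | (p & cin)  # S = P xor Cin, Co = G + P Cin


# Exhaustive check over the eight rows of the truth table.
for a, b, cin in product((0, 1), repeat=3):
    assert full_adder_gpd(a, b, cin) == full_adder(a, b, cin)
```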

Fig:4.1. Ripple carry adder

An N-bit adder can be constructed by cascading N full-adder (FA) circuits in series, connecting $C_{o,k-1}$ to $C_{i,k}$ for k = 1 to N-1 and setting the first carry-in $C_{i,0}$ to 0. In a ripple-carry adder, the carry bit ripples from one stage to the next. The circuit delay depends on the number of logic stages that must be traversed, and is therefore a function of the applied input signals.

For some input signals, no rippling occurs at all, but in other cases the carry has to ripple all the way from the LSB to the MSB. The propagation delay of this path, called the critical path, is defined as the worst-case delay over all possible input patterns.

The worst-case delay occurs when a carry is generated at the LSB position and ripples through every stage, to be consumed in the last stage to produce the sum. The delay is proportional to the number of bits N in the input words and is given by

$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$

where $t_{carry}$ is the propagation delay from Cin to Cout, and

$t_{sum}$ is the propagation delay from Cin to S.
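Here is a minimal Python sketch of an N-bit ripple-carry adder, together with the delay estimate above. The unit delays in the example are hypothetical. The chosen input pattern generates a carry at the LSB that every higher stage then propagates, which is the worst case described above.

```python
def ripple_carry_add(a: int, b: int, n: int, cin: int = 0):
    """N-bit ripple-carry adder: N full adders, the carry-out of stage k-1 feeds stage k."""
    s, carry = 0, cin
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        s |= (ak ^ bk ^ carry) << k                      # sum bit of stage k
        carry = (ak & bk) | (bk & carry) | (ak & carry)  # carry into stage k+1
    return s, carry


def ripple_adder_delay(n: int, t_carry: float, t_sum: float) -> float:
    """Worst-case delay estimate: t_adder ~ (N - 1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum


# Worst case for 4 bits: bit 0 generates a carry, bits 1-3 propagate it (A_k xor B_k = 1).
print(ripple_carry_add(0b0001, 0b1111, 4))              # (0, 1)
print(ripple_adder_delay(32, t_carry=1.0, t_sum=1.5))   # 32.5
```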

Conclusions observed:

i) The propagation delay of the ripple-carry adder is linearly proportional to N.

ii) When designing the full-adder cell for a fast ripple-carry adder, it is far more important to optimize $t_{carry}$ than $t_{sum}$, since the latter has only a minor influence on the total value of $t_{adder}$.

Inverting all inputs to a full adder results in inverted values for all outputs. This is called the inverting property, and it is useful when optimizing the speed of a ripple-carry adder. The property is expressed as

$\overline{S}(A,B,C_i) = S(\overline{A},\overline{B},\overline{C_i})$

$\overline{C_o}(A,B,C_i) = C_o(\overline{A},\overline{B},\overline{C_i})$

Fig:4.2. Full adder-inverting property

Fig:4.3. Ripple carry adder-inverting property
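The inverting property can be confirmed exhaustively with a short Python check over all eight input combinations. The helper names are illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """Full adder: S = A xor B xor Cin, Co = AB + BCin + ACin."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def inverted(bit: int) -> int:
    return 1 - bit


# Inverting property: complementing all three inputs complements both outputs.
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    s_n, co_n = full_adder(inverted(a), inverted(b), inverted(ci))
    assert (s_n, co_n) == (inverted(s), inverted(co))
print("inverting property holds for all 8 input combinations")
```

This is what allows alternate stages of a ripple-carry adder to operate on inverted signals, so that no inverter is needed in the carry path.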

4.3 Complementary Static CMOS Full Adder

With simple modifications, the complementary static CMOS full adder can be implemented with 28 transistors.


Fig:4.4 Modified CMOS Full adder design


4.4. The Mirror Adder Design

• The NMOS and PMOS chains are completely symmetrical. This guarantees identical rising and falling transitions if the NMOS and PMOS devices are properly sized.
• A maximum of two series transistors can be observed in the carry-generation circuitry.
• When laying out the cell, the most critical issue is the minimization of the capacitance at node Cout. The reduction of the diffusion capacitances is particularly important.
• The capacitance at node Cout is composed of four diffusion capacitances, two internal gate capacitances and six gate capacitances in the connecting adder cell.
• The transistors connected to Cin are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be of minimal size.
$S = A \oplus B \oplus C_{in}$

$C_{out} = AB + BC_{in} + AC_{in}$

$G = AB$

$D = \overline{A}\,\overline{B}$

$P = A \oplus B$


Fig:4.5 Mirror design of CMOS Full adder design


The mirror adder requires 24 transistors.
4.5 Transmission Gate based Full Adder

Fig:4.6 Transmission gate based CMOS Full adder design


4.6 Manchester Carry Chain Adder
• The carry chain is built from switches controlled by Gi and Pi.
• The total delay consists of:
  • the time to form the switch control signals Gi and Pi,
  • the setup time for the switches, and
  • the signal propagation delay through N switches in the worst case.
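The switch behaviour of a Manchester chain can be modelled in Python. This is a behavioural sketch, not a circuit simulation. Each stage either passes the incoming carry through a switch (when Pi = 1) or drives the carry node directly (to 1 when Gi = 1, to 0 otherwise). The function also reports the longest run of pass switches in series, which equals N in the worst case.

```python
def manchester_carries(a: int, b: int, n: int, cin: int = 0):
    """Behavioural model of a Manchester carry chain.

    Stage i: if P_i = A_i xor B_i is 1, a pass switch connects node C_i to C_{i+1};
    otherwise C_{i+1} is driven directly, to 1 if G_i = A_i B_i and to 0 otherwise.
    Returns the carry nodes C_0..C_n and the longest chain of pass switches in series.
    """
    carries = [cin]
    run = longest = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi
        if p:
            carries.append(carries[-1])  # carry travels through one more switch
            run += 1
        else:
            carries.append(g)            # node forced by G_i (1) or by delete (0)
            run = 0
        longest = max(longest, run)
    return carries, longest


# Worst case: every stage propagates, so Cin passes through all N switches.
print(manchester_carries(0b1010, 0b0101, 4, cin=1))  # ([1, 1, 1, 1, 1], 4)
```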


Fig:4.7 Manchester full adder design

Fig:4.8 Manchester 4 bit full adder design


4.7 Monolithic Lookahead Adder
A carry-lookahead adder is a fast parallel adder. It reduces the propagation delay by using more complex hardware, and is therefore costlier. In this design, the carry logic over fixed groups of bits of the adder is reduced to two-level logic, which is a transformation of the ripple-carry design.

This method uses logic gates to look at the lower-order bits of the augend and addend and determine whether a higher-order carry is to be generated.


Fig:4.9. CLA Full adder design


Consider the full-adder circuit shown above with its corresponding truth table. If we define two variables, the carry generate Gi and the carry propagate Pi, then

$P_i = A_i \oplus B_i$
$G_i = A_i B_i$
The sum output and carry output can be expressed as

$S_i = P_i \oplus C_i$
$C_{i+1} = G_i + P_i C_i$
Here Gi is the carry generate, which produces a carry when both Ai and Bi are 1, regardless of the input carry. Pi is the carry propagate, which is associated with the propagation of the carry from Ci to Ci+1.

Fig:4.10 Mirror adder design of CLA


This high-level model has some hidden dependencies. A constant addition time is wishful thinking: in practice the delay increases linearly with the number of bits, as shown in the figure.

In the schematic diagram of the mirror implementation of the 4-bit lookahead adder, the circuit exploits the self-duality and the recursivity of the carry-lookahead equations. The large fan-in of the circuit makes it prohibitively slow for larger values of N, while implementing it with simpler gates requires multiple logic levels.

In both cases the propagation delay increases. The fan-out on some of the signals also tends to grow excessively, slowing down the adder even more. The area of the implementation also grows progressively with N.

The carry output Boolean function of each stage in a 4-stage carry-lookahead adder can be expressed as

$C_1 = G_0 + P_0 C_{in}$

$C_2 = G_1 + P_1 C_1 = G_1 + P_1 G_0 + P_1 P_0 C_{in}$

$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{in}$

$C_4 = G_3 + P_3 C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{in}$
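The flattened carry equations can be written out directly in Python. In the sketch below, every carry is computed as a two-level AND-OR expression of G, P and Cin, with no carry taken from a previous stage. The result is then checked against ordinary addition. The function names are illustrative.

```python
def cla4_carries(a: int, b: int, cin: int):
    """Carries of a 4-bit carry-lookahead adder, each a two-level AND-OR function."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # G_i = A_i B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # P_i = A_i xor B_i
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))
    return p, [cin, c1, c2, c3, c4]


def cla4_add(a: int, b: int, cin: int = 0):
    """S_i = P_i xor C_i for each bit; C_4 is the carry-out."""
    p, c = cla4_carries(a, b, cin)
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]


print(cla4_add(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```

The growing product terms in c3 and c4 are exactly the wide gates and tall transistor stacks listed as disadvantages.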

Disadvantages
i) For a group of N bits, the transistor implementation has N+1 parallel
branches and N+1 transistors in the stack.
ii) Wide gates and large stacks display poor performance.
iii) The lookahead computation therefore has to be limited to groups of two or four bits.
4.8 Carry-Bypass Adder

A carry-skip adder (also known as a carry-bypass adder) is an adder implementation that improves on the delay of a ripple-carry adder with little effort compared to other adders. The improvement of the worst-case delay is achieved by using several carry-skip adders to form a block-carry-skip adder.

If $P_0 P_1 P_2 P_3 = 1$, then $C_{o,3} = C_{i,0}$; otherwise the block itself either kills (deletes) or generates the carry.
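A carry-bypass adder can be modelled functionally in Python. Each block ripples internally. When the product of all its Pi is 1, the block carry-out is taken directly from the block carry-in, standing in for the bypass multiplexer. The block size of 4 is a hypothetical choice. The bypassed value is logically identical to the rippled carry; the benefit in hardware is the shorter path.

```python
def carry_bypass_add(a: int, b: int, n: int, block: int = 4, cin: int = 0):
    """Carry-bypass (carry-skip) adder model: ripple inside each block, skip when all P_i = 1."""
    s, carry = 0, cin
    for base in range(0, n, block):
        block_cin = carry
        block_p = 1  # P_0 P_1 ... P_{M-1} of this block
        for i in range(base, min(base + block, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            block_p &= p
            s |= (p ^ carry) << i  # sum bits still come from the local ripple
            carry = g | (p & carry)
        if block_p:
            # Bypass multiplexer: the block carry-out is taken from the block carry-in.
            carry = block_cin
    return s, carry


print(carry_bypass_add(0xF0, 0x0F, 8, cin=1))  # (0, 1): every stage propagates
```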


Fig:4.11Carry-Bypass Adder

4.9. Linear Carry-Select Adder


The carry-select adder generally consists of two ripple-carry adders and a multiplexer. Adding two n-bit numbers with a carry-select adder is done with two adders (hence two ripple-carry adders). The calculation is performed twice: once assuming that the carry-in is zero and once assuming that it is one. After the two results are calculated, the correct sum and the correct carry-out are selected with the multiplexer once the correct carry-in is known.

The number of bits in each carry-select block can be uniform or variable. In the uniform case, the optimal delay occurs for a block size of √n. When the block sizes vary, each block should have a delay, from the addition inputs A and B to its carry-out, equal to that of the multiplexer chain leading into it, so that the carry-out is calculated just in time. The √n result follows from uniform sizing: the ideal number of full-adder elements per block equals the square root of the number of bits being added, since that yields an equal number of multiplexer delays.
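The selection principle can be captured in a Python sketch. Each block is added twice, once for a carry-in of 0 and once for 1, and the incoming carry then picks one result. The block-size lists are caller-supplied. In the example, [4, 4, 4, 4] is the uniform √16 sizing and [2, 3, 4, 7] is a hypothetical variable sizing.

```python
def ripple(a: int, b: int, width: int, cin: int):
    """Plain ripple-carry addition of two width-bit blocks."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def carry_select_add(a: int, b: int, block_sizes, cin: int = 0):
    """Carry-select adder: every block is added for carry-in 0 and 1; a mux picks one."""
    s, carry, pos = 0, cin, 0
    for w in block_sizes:
        mask = (1 << w) - 1
        aw, bw = (a >> pos) & mask, (b >> pos) & mask
        sum0, carry0 = ripple(aw, bw, w, 0)  # precomputed assuming carry-in = 0
        sum1, carry1 = ripple(aw, bw, w, 1)  # precomputed assuming carry-in = 1
        blk_sum, carry = (sum1, carry1) if carry else (sum0, carry0)  # the multiplexer
        s |= blk_sum << pos
        pos += w
    return s, carry


print(carry_select_add(0xBEEF, 0x4321, [4, 4, 4, 4]))  # (528, 1)
print(carry_select_add(0xBEEF, 0x4321, [2, 3, 4, 7]))  # (528, 1)
```

Only the multiplexer selection depends on the incoming carry, which is why the critical path runs through the multiplexer chain rather than through every full adder.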

Fig:4.12 Linear Carry-Select Adder


The worst case delay is given as

4.10. Square root Carry-Select Adder


The square-root carry-select adder uses the same principle as the linear carry-select adder. Each block computes its sum and carry-out twice, assuming carry-in values of 0 and 1, and a multiplexer selects the correct result once the actual carry-in is known. The difference is that the blocks have variable size: each block is made larger than the previous one, so that its results are ready just when the carry arrives through the multiplexer chain. As a result, the worst-case delay grows with the square root of the number of bits rather than linearly.

Fig:4.13 Linear CarrySelect Adder

The worst case delay is given as


Fig:4.14 Comparison of adders

4.11 Multipliers
Multiplications are expensive and slow operations. The performance of many computational circuits is determined by the speed at which a multiplication can be executed, so the design and integration of multiplication units are very important. Multipliers are, in effect, complex adder arrays.
Definition: Consider two N-bit sequences A and B to be multiplied; their product is given as

Multiplication can be performed using a single two-input adder. For operands that are M and N bits wide, the multiplication takes M cycles using an N-bit adder, because the shift-and-add algorithm adds together M partial products.


Each partial product is generated by multiplying the multiplicand by one bit of the multiplier and shifting the result according to the position of that multiplier bit.
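The sequential shift-and-add algorithm fits in a few lines of Python. One loop iteration stands for one cycle of the single adder. This is a word-level sketch: Python integers are unbounded, whereas a hardware version would use an N-bit adder and a shifting accumulator.

```python
def shift_and_add_multiply(x: int, y: int, m: int) -> int:
    """Sequential shift-and-add multiplication.

    x is the multiplicand, y the M-bit multiplier. One partial product is added per
    cycle, so M cycles (and M partial products) are needed with a single adder.
    """
    acc = 0
    for j in range(m):                      # cycle j handles multiplier bit y_j
        partial = x if (y >> j) & 1 else 0  # multiplicand AND multiplier bit
        acc += partial << j                 # shift by the bit position, then add
    return acc


print(shift_and_add_multiply(13, 11, 4))  # 143
```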

Fast multiplication is done in a way similar to manual computation: all the partial products are generated at the same time and organized in an array. A multioperand addition is then used to compute the final product, as shown in the figure. This structure is called an array multiplier. The array multiplier performs the following three functions:

i) Partial product generation
ii) Partial product accumulation
iii) Final addition

4.11.1 Partial product Generation


The figure shows the partial-product generation logic. Partial products result from the logical AND of the multiplicand X with a multiplier bit Y. Each row in the partial-product array is therefore either a copy of the multiplicand or a row of zeroes. Careful optimization of partial-product generation can lead to reduced delay and area. In most cases, the partial-product array has many zero rows that have no impact on the result and only waste effort when added. If the multiplier consists of all ones, all the partial products exist, while in the case of all zeros there are none. This observation can be exploited to reduce the number of generated partial products by half.
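The partial-product array can be generated in Python as one AND per multiplicand bit in each row. The sketch also counts the nonzero rows. For the multiplier 01111110, used in the recoding example that follows, six of the eight rows are nonzero. The 4-bit multiplicand 1011 is an arbitrary choice.

```python
def partial_product_rows(x: int, y: int, m: int, n: int):
    """Partial-product array: row j holds the bits x_i AND y_j for i = 0..M-1.

    x is the M-bit multiplicand, y the N-bit multiplier; each row is therefore either
    a copy of the multiplicand or a row of zeros.
    """
    rows = []
    for j in range(n):
        y_j = (y >> j) & 1
        rows.append(sum((((x >> i) & 1) & y_j) << i for i in range(m)))  # M AND gates
    return rows


rows = partial_product_rows(0b1011, 0b01111110, m=4, n=8)
print(sum(1 for r in rows if r))                                        # 6
print(sum(r << j for j, r in enumerate(rows)) == 0b1011 * 0b01111110)  # True
```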


Fig:4.15 Partial Product Generation


Consider an 8-bit multiplier of the form 01111110, which produces six nonzero partial-product rows. The number of nonzero rows can be reduced by recoding this number ($2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 = 126$) into a different format. We can verify that the form $100000\bar{1}0$, where $\bar{1}$ is a short notation for -1, represents the same number ($2^7 - 2^1 = 126$). This format needs to add only two partial products, but the final adder has to perform subtraction as well. This type of transformation is called Booth's recoding.

Booth's recoding reduces the number of partial products to at most one half by ensuring that, for every two consecutive bits, at most one bit will be 1 or -1. Reducing the number of partial products is equivalent to reducing the number of additions, which speeds up the operation and reduces the area. The transformation is equivalent to formatting the multiplier word in a base-4 scheme instead of the binary format:

$Y = \sum_{j=0}^{(N-1)/2} Y_j\, 4^j \quad \text{with } Y_j \in \{-2,-1,0,1,2\}$

The pattern 1010...10 represents the worst-case multiplier input, because it generates the most partial products. Multiplication by {0,1} is equivalent to an AND operation, whereas multiplying by {-2,-1,0,1,2} requires a combination of inversion and shift logic.

In modified Booth's recoding, the multiplier is partitioned into 3-bit groups that overlap by one bit. Each group, as listed in the table, forms one partial product, so the number of partial products equals half of the multiplier width. The input bits to the recoding process are the two current bits combined with the upper bit of the next (lower-order) group, moving from the most significant bit to the least significant bit.
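Modified (radix-4) Booth recoding can be sketched in Python. Each digit is formed from an overlapping 3-bit group as Y_j = −2·y(2j+1) + y(2j) + y(2j−1), with y(−1) = 0. This sketch assumes the multiplier is an N-bit two's-complement number with even N. An unsigned multiplier would need one extra zero bit of padding. The function names are illustrative.

```python
def booth_radix4_digits(y: int, n: int):
    """Modified (radix-4) Booth recoding of an N-bit two's-complement multiplier.

    Digit j comes from the overlapping group (y_{2j+1}, y_{2j}, y_{2j-1}) with y_{-1} = 0:
        Y_j = -2*y_{2j+1} + y_{2j} + y_{2j-1},  so Y_j is in {-2, -1, 0, 1, 2}.
    """
    assert n % 2 == 0, "this sketch assumes an even word length"

    def bit(i: int) -> int:
        return (y >> i) & 1 if i >= 0 else 0

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(n // 2)]


def booth_value(digits):
    """Recombine the digits: Y = sum_j Y_j * 4**j."""
    return sum(d * 4 ** j for j, d in enumerate(digits))


print(booth_radix4_digits(0b01111110, 8))  # [-2, 0, 0, 2]: only two nonzero partial products
print(booth_radix4_digits(0b10101010, 8))  # [-2, -1, -1, -1]: worst case, N/2 of them
```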


Fig:4.16 Booths Recoding


4.11.2 Partial product Accumulation
Once the partial products are generated, they have to be summed. This accumulation is a multioperand addition, performed by a number of adders that form an array, which gives the array multiplier its name. The following methods are used for the accumulation process:

i) Array multiplier
ii) Carry-save multiplier
iii) Tree multiplier
4.11.2.1 Array multiplier
The figure shows a 4x4-bit array multiplier. Generation of the N partial products requires N×M two-input AND gates. Most of the area is used to add the N partial products, which requires N-1 M-bit adders. Shifting of the partial products for proper alignment is performed by simple routing and does not require any logic. The overall structure can be compacted into a rectangle, giving a very efficient layout. FA represents a full adder and HA a half adder, i.e., an adder with two inputs.

The propagation delay of array-organized circuits is difficult to determine. The partial-sum adders are implemented using ripple-carry structures, as shown in the figure. Performance optimization requires that the critical timing path be identified first, which turns out to be nontrivial, because a large number of paths have nearly the same length. Analysing these paths yields an approximate expression for the propagation delay:

$t_{mult} \approx [(M-1) + (N-2)]\,t_{carry} + (N-1)\,t_{sum} + t_{and}$


Fig:4.17 Array multiplier


where $t_{carry}$ is the propagation delay between the input and output carry,

$t_{sum}$ is the delay between the input carry and the sum bit of the full adder, and

$t_{and}$ is the AND gate delay.

All critical paths have the same length. Speeding up just one of them, for instance by replacing one adder with a faster one such as a carry-select adder, does not make much sense: all critical paths have to be attacked at the same time. Minimization of $t_{mult}$ requires the minimization of both $t_{carry}$ and $t_{sum}$.

4.11.2.2 Carry save multiplier


A more efficient realization is obtained by noticing that the multiplication result does not change when the output carry bits are passed diagonally downwards instead of only to the right. The figure shows a 4X4 carry-save multiplier. An extra adder, called the vector-merging adder, is added to generate the final result. The resulting multiplier is called a carry-save multiplier, because the carry bits are not immediately added but are saved for the next adder stage. In the final stage, carry and sum are merged using a fast carry-propagate adder such as a carry-lookahead adder. Assuming $t_{add} = t_{carry}$, the worst-case critical path is given by

$t_{mult} = t_{and} + (N-1)\,t_{carry} + t_{merge}$


Advantage
i)Shorter worst case critical path
Disadvantage
i)Increased area cost(because of one extra adder)
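A word-level Python model of the carry-save multiplier follows. Each row is a 3:2 carry-save step: the bitwise full adders produce a sum vector and a carry vector, and no carry propagates within the row. The single carry-propagate addition at the end plays the role of the vector-merging adder. The function names are illustrative. This models the behaviour, not the exact cell placement.

```python
def carry_save_add(a: int, b: int, c: int):
    """3:2 carry-save step on whole words: sum and carry vectors, no carry propagation."""
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


def carry_save_multiply(x: int, y: int, n: int):
    """Carry-save multiplier: one carry-save row per partial product, then a merging adder."""
    rows = [(x << j) if (y >> j) & 1 else 0 for j in range(n)]  # shifted partial products
    sum_vec, carry_vec = rows[0], 0
    csa_rows = 0
    for row in rows[1:]:
        sum_vec, carry_vec = carry_save_add(sum_vec, carry_vec, row)  # carries are saved
        csa_rows += 1
    return sum_vec + carry_vec, csa_rows  # vector-merging adder: the only carry-propagate add


print(carry_save_multiply(13, 11, 4))  # (143, 3): N - 1 = 3 carry-save rows
```

The N − 1 carry-save rows correspond to the (N − 1)·t_carry term in the delay expression.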

4.11.2.3 Wallace Tree multiplier


The partial-sum adders are rearranged in a tree-like fashion to reduce both the critical path and the number of adder cells. Consider four partial products, each four bits wide, as shown in the figure. The number of full adders needed for this operation can be reduced by observing that only column 3 in the array has to add four bits; all other columns are less complex. The original matrix of partial products is then reorganized into a tree shape to visually illustrate its varying depth. The challenge is to realize the complete matrix with a minimum depth and a minimum number of adder elements.

The first type of operator used to cover the array is the full adder, which takes three inputs and produces two outputs: the sum, located in the same column, and the carry, located in the next one.


For this reason the full adder is called a 3-2 compressor. It is denoted by a circle covering three bits. The other operator is the half adder, which takes two input bits in a column and produces two outputs; it is denoted by a circle covering two bits.
To obtain a minimal implementation, the tree is covered with full adders and half adders, starting from its densest part. Half adders are first introduced in columns 4 and 3, as shown in Fig (b), and the reduced tree is shown in Fig (c). Only three full adders and three half adders are used for the reduction process, compared with six full adders in the carry-save multiplier.

Fig.4.18.Transforming a partial product tree


The final stage consists of a simple two-input adder, for which any type of adder can be used. This structure is called the Wallace tree multiplier, and its implementation is shown in the figure. The tree multiplier realizes substantial hardware savings for larger multipliers. The propagation delay is reduced to $O(\log_{3/2} N)$, so this structure is substantially faster than the carry-save structure for large multiplier word lengths.

Disadvantage
i).Very irregular
ii).Complicated layout
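The tree idea can be shown with a word-level Python sketch. Groups of three operands are compressed in parallel by 3:2 carry-save steps until only two operands remain, and a two-input adder then finishes the product. This simplified model does not track individual columns or the half adders used in the figure. It does show the depth growing roughly as log base 3/2 of the number of partial products, instead of the N − 1 rows of the carry-save array.

```python
def carry_save_add(a: int, b: int, c: int):
    """Full adders applied bitwise: a 3:2 compressor on whole words."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_reduce(operands):
    """Reduce operands with parallel levels of 3:2 compressors until two remain."""
    levels = 0
    while len(operands) > 2:
        groups = len(operands) // 3
        next_level = []
        for g in range(groups):
            next_level.extend(carry_save_add(*operands[3 * g:3 * g + 3]))
        next_level.extend(operands[3 * groups:])  # at most two leftovers pass through
        operands = next_level
        levels += 1
    return operands, levels


def wallace_multiply(x: int, y: int, n: int):
    assert n >= 2, "needs at least two partial products"
    rows = [(x << j) if (y >> j) & 1 else 0 for j in range(n)]
    (a, b), levels = wallace_reduce(rows)
    return a + b, levels  # final two-input adder


print(wallace_multiply(13, 11, 4))  # (143, 2)
print(wallace_reduce([0] * 16)[1])  # 6 levels for 16 partial products
```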


Fig.4.19 Wallace tree Multiplier


Final addition
The final step in completing the multiplication is to combine the result in the final adder. The choice of the adder style depends on the structure of the accumulation array. A carry-lookahead adder is the preferable option if all input bits to the adder arrive at the same time, as it yields the smallest possible delay.

4.13 SHIFTERS
The shift operation is another essential arithmetic operation that requires adequate hardware support. It is used extensively in floating-point units, scalers and multiplications by constant numbers; the latter can be implemented as a combination of add and shift operations. Shifting a data word left or right over a constant amount is a trivial hardware operation and is implemented by the appropriate signal wiring. A programmable shifter is more complex and requires active circuitry.


Fig 4.23 One bit programmable shifter


4.13.1 Barrel shifter
It consists of an array of transistors in which the number of rows equals the word length of the data and the number of columns equals the maximum shift width. In this case, both are set equal to four. The control wires are routed diagonally through the array.

A major advantage of this shifter is that the signal has to pass through at most one transmission gate. In other words, the propagation delay is theoretically constant and independent of the shift value or shifter size. This is not true in reality, however, because the capacitance at the input of the buffers rises linearly with the maximum shift width.

Fig 4.24 Barrel shifter


Advantages

i) The signal has to pass through at most one transmission gate.
ii) The propagation delay is theoretically constant and independent of the shifter size.
iii) The layout is not dominated by the transistors.
Disadvantages

i) It is practical only for small shift widths.
ii) A separate decoder is necessary for decoding the control bits.
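The switch array of the barrel shifter can be modelled behaviourally in Python. A one-hot control vector turns on one diagonal of switches, so each output row is connected to exactly one input bit through a single switch. This one-hot vector is what the separate decoder produces from a binary shift amount. The wrap-around (rotate) behaviour is an assumption of this sketch; it stands in for whatever extra input bits a real array feeds in.

```python
def barrel_shift(data_bits, shift_onehot):
    """Behavioural model of a barrel shifter built as a switch array.

    data_bits    -- input word as a list of bits, index 0 = LSB (one row per bit)
    shift_onehot -- one-hot list; shift_onehot[k] = 1 turns on the diagonal of switches
                    that connects output row i to input bit i + k (shift right by k)
    Assumption of this sketch: input indices wrap modulo the word length (a rotate).
    """
    width = len(data_bits)
    assert len(shift_onehot) == width and sum(shift_onehot) == 1, "control must be one-hot"
    out = []
    for i in range(width):
        # Exactly one switch on each output row is on, so every bit passes one gate.
        on_switches = [data_bits[(i + k) % width] for k in range(width) if shift_onehot[k]]
        out.append(on_switches[0])
    return out


print(barrel_shift([1, 0, 0, 0], [0, 1, 0, 0]))  # [0, 0, 0, 1]: 0001 rotated right by 1
```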

4.14 Speed and Area Trade Off


With a fixed data-path architecture, speed, area and power can be traded off through the choice of the supply voltage, transistor thresholds and device sizes. This enables a variety of power-minimization techniques, classified as follows.

i) Enabling time: Some techniques are applied at design time; for example, transistor widths and lengths are fixed at design time. Supply voltages and transistor thresholds are either assigned statically during the design phase or changed dynamically at run time. Other techniques address the time that a function or module is in idle or standby mode, where the power dissipation of the module should be minimal.

ii) Targeted dissipation source: This classification depends on the source of power dissipation being addressed, such as active power or leakage power. Lowering the supply voltage not only reduces the energy consumed per transition but also reduces the leakage current.

One third of the total energy of a digital system can be consumed by the clock distribution network. A common method to reduce power in idle mode is the clock-gating technique. Clock gating does not reduce the leakage power of the idle block, so more complicated schemes are used to lower the standby power, such as increasing the transistor thresholds or switching off the power rails. The following are the techniques used to reduce power:

i) Design-time power reduction
ii) Run-time power management
iii) Power reduction in standby or sleep mode
iv) Design as a tradeoff

Design-time power reduction
The following techniques reduce the power at design time:
a) Supply voltage reduction
b) Using multiple supply voltages
c) Using multiple device thresholds
d) Reducing switching capacitance through transistor sizing
e) Reducing switching capacitance through logic and architecture optimization

a) Supply voltage reduction
The reduction in supply voltage results in quadratic power savings. However, the delay of CMOS gates increases as the supply voltage is reduced. At the data-path level, this loss of performance is compensated by logical and architectural optimizations. For example, a ripple-carry adder can be replaced by a lookahead adder: the lookahead adder is larger and more complex, but it can run at a lower supply voltage for the same performance.

b) Using multiple supply voltages

This technique selectively decreases the supply voltage on some of the gates, namely those that:

i) correspond to fast paths and finish the computation early, or

ii) drive large capacitances, accepting the increased delay.

A separate supply voltage is provided for the I/O for compatibility, while the logic core is powered from lower-voltage supplies. Multiple voltages can also be assigned on a gate-by-gate basis. The two different ways of using multiple supply voltages are:

i) Module-level voltage selection
ii) Multiple supplies inside a block
c) Using multiple device thresholds
The use of devices with multiple thresholds offers another way of trading off speed for power. Most sub-0.25 µm CMOS technologies offer two types of both n-type and p-type transistors, with thresholds differing by about 100 mV. Low-threshold devices are used in timing-critical paths, while high-threshold devices are used everywhere else. The use of multiple thresholds does not require level converters or any other special circuits. The assignment of thresholds can reduce the leakage power by 80-90%.

d) Reducing switching capacitance through transistor sizing

The input capacitance of a complementary CMOS gate is proportional to its size, which also determines its speed. For minimum delay, the gates in a logic path should be sized to have an effective fanout of approximately 4. In an inverter chain with a given load and number of stages, the minimum delay is achieved when the size of each inverter in the chain is the geometric mean of its neighbours. When the minimum delay achieved by sizing is below the desired delay, an optimization problem is formulated that minimizes the switching capacitance under delay constraints. The optimal approach is to adjust the tapering factor at each stage.
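The minimum-delay sizing rule for an inverter chain can be computed with a short Python sketch. It assumes a constant tapering factor f = (C_L/C_in)^(1/N) and ignores self-loading (an assumption). With that choice, every stage is the geometric mean of its neighbours. The 64× load and 3 stages are hypothetical values that give a fanout of 4 per stage.

```python
def inverter_chain_sizes(c_in: float, c_load: float, n_stages: int):
    """Input capacitances of an N-stage inverter chain sized for minimum delay.

    Simplifying assumption: self-loading is ignored, so the optimum uses a constant
    tapering factor f = (C_L / C_in) ** (1 / N); each stage is then the geometric mean
    of its neighbours. The returned list ends with the load capacitance.
    """
    f = (c_load / c_in) ** (1.0 / n_stages)
    return [c_in * f ** i for i in range(n_stages + 1)]


sizes = inverter_chain_sizes(1.0, 64.0, 3)
print([round(s, 3) for s in sizes])  # [1.0, 4.0, 16.0, 64.0]: fanout of 4 per stage
```

When the delay target is looser than this minimum, the energy-oriented approach described above departs from the constant tapering factor and adjusts it stage by stage.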

e) Reducing switching capacitance through logic and architecture optimization

The effective capacitance is the product of the physical capacitance and the switching activity. Logic and architectural optimizations reduce the switching activity without degrading the performance. The following are methods of reducing switching activity:
i) Switching activity reduction by resource allocation
ii) Glitch reduction through path balancing
Run-time power management
The following methods are used for run-time power management:
i) Dynamic supply voltage scaling
ii) Dynamic threshold scaling
i) Dynamic supply voltage scaling
Lowering the clock frequency at reduced workloads reduces the power, but it does not save energy, because every operation is still executed at the same high supply voltage. If both the supply voltage and the frequency are lowered simultaneously, the energy is reduced. To maintain the required throughput for high workloads and minimize energy for low workloads, both the supply voltage and the frequency must be varied dynamically according to the requirements of the application currently being executed. This technique is called dynamic voltage scaling.
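The difference between frequency scaling alone and dynamic voltage scaling can be illustrated with the switching-power relations P = C_eff·Vdd²·f and E = C_eff·Vdd² per operation. In this Python sketch, all numbers are normalised and hypothetical. In particular, it assumes that Vdd = 0.6 (normalised) still meets timing at half the clock frequency.

```python
def switching_power_and_energy(c_eff: float, vdd: float, f_clk: float):
    """Dynamic power P = C_eff * Vdd^2 * f and energy per operation E = C_eff * Vdd^2."""
    energy_per_op = c_eff * vdd ** 2
    return energy_per_op * f_clk, energy_per_op


# Normalised, hypothetical numbers: the workload halves, so half the clock rate suffices.
# Assumption: at half frequency the circuit still meets timing at Vdd = 0.6 (normalised).
p_full, e_full = switching_power_and_energy(1.0, vdd=1.0, f_clk=1.0)
p_fs, e_fs = switching_power_and_energy(1.0, vdd=1.0, f_clk=0.5)    # frequency scaling only
p_dvs, e_dvs = switching_power_and_energy(1.0, vdd=0.6, f_clk=0.5)  # voltage and frequency scaled
print(e_fs / e_full)   # 1.0: frequency scaling alone saves no energy per operation
print(e_dvs / e_full)  # about 0.36
```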

ii) Dynamic threshold scaling

For low-latency computation, the threshold is lowered to its minimal value. For low-speed computation, it is increased. In standby mode, it is set to the highest possible value to minimize the leakage current. The substrate bias is the control used to vary the threshold voltages dynamically, which requires four-terminal operation (an independently biased body) for both n-type and p-type devices.

Power reduction in standby or sleep mode

In idle mode no active switching occurs, so the power dissipation is due to leakage only; it can be minimized using the dynamic threshold scaling (DTS) technique. A simple power-down scheme uses large sleep transistors to switch off the power supply rails when the circuit is in sleep mode. This is implemented by using a power switch on the supply rail.


Fig 4.25 Power leakage

Fig 4.25 Power reduction due to sleep transistor


The sleep signal is high in normal operation, and the sleep transistor should present as small a resistance as possible. The finite resistance of the transistor results in noise on the supply rails, so the sizing and threshold selection of the sleep transistors is subject to a trade-off.

To minimize fluctuations in the supply voltage, the sleep transistors must have a very low on-resistance and are therefore very wide. With high-threshold sleep transistors, leakage is reduced further. Adding a sleep transistor increases the transistor stack height, which results in leakage reductions of the order of 10 times for low-threshold switches and 1000 times for high-threshold switches.

4.15 Design as a tradeoff


The important design concepts are:
i) Select the right structure before starting an elaborate circuit optimization. Optimizing
the transistor sizes and topology of a poorly chosen structure will probably not give the
best result.
ii) Determine the critical timing path through the circuit and focus optimization efforts on
that part of the circuit. Computer-aided design tools are available to find the critical
paths and size the transistors. Transistors on non-critical paths can be downsized to reduce
power consumption.
iii) Circuit size is determined not only by the number and size of the transistors, but also
by other factors such as wiring and the number of vias and contacts.
iv) Optimization helps to get a better result, but it can result in an irregular and
convoluted topology.
v) Power and speed can be traded off through the choice of circuit sizing, supply voltages and
transistor thresholds.
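The power-speed trade-off of item v) can be illustrated with the alpha-power delay model t ∝ VDD / (VDD − VT)^α together with the switching energy E ∝ VDD². The model choice, VT = 0.3 V, α = 1.3 and the 1.2 V reference supply are assumptions; the sweep only shows the direction of the trade-off.

```python
VT = 0.3      # threshold voltage (V), assumed
ALPHA = 1.3   # velocity-saturation index of the alpha-power law, assumed
REF = 1.2     # nominal supply used for normalisation (V), assumed


def delay(vdd):
    """Gate delay up to a constant factor: t ~ Vdd / (Vdd - VT)^alpha."""
    return vdd / (vdd - VT) ** ALPHA


def energy(vdd):
    """Switching energy up to a constant factor: E ~ Vdd^2."""
    return vdd ** 2


supplies = [0.6, 0.8, 1.0, 1.2]
for v in supplies:
    d, e = delay(v) / delay(REF), energy(v) / energy(REF)
    print(f"Vdd = {v:.1f} V  delay x{d:.2f}  energy x{e:.2f}  energy-delay x{d * e:.2f}")
```

Lowering VDD always saves energy but always costs speed, so the supply (and, similarly, sizing and threshold) is chosen for the required performance rather than maximized or minimized in isolation.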
Part A
1. How can a datapath be implemented in a VLSI system?
A datapath is best implemented in a bit-sliced fashion: a single layout is used repeatedly
for every bit in the data word. This regular approach eases the design effort and results in
fast and dense layouts.
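The bit-slice idea can be mimicked in software: the same one-bit cell is instantiated once per bit position and the cells are chained together. Here a full adder is used as a hypothetical example of the per-bit operation; this is a behavioural sketch, not a layout.

```python
def full_adder(a, b, cin):
    """One bit slice: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def bit_sliced_add(x, y, width=8):
    """Add two unsigned words by repeating the same slice for every bit position."""
    carry, result = 0, 0
    for i in range(width):                 # slice i handles bit i of the word
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = full_adder(a, b, carry) # the carry is the only signal between slices
        result |= s << i
    return result, carry                   # final carry-out of the last slice


print(bit_sliced_add(200, 100))   # (44, 1): 300 does not fit in 8 bits
```

Because every slice is identical, widening the datapath only means repeating the cell; this is the regularity that makes bit-sliced layouts dense.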
2. Comment on the performance of a ripple carry adder.
The delay of a ripple carry adder is linearly proportional to the number of bits. Circuit
optimizations therefore concentrate on reducing the delay of the carry path. A number of
circuit topologies exist, and careful optimization of the circuit topology and the transistor
sizes helps to reduce the capacitance on the carry bit.
3. How can the logic of an adder be optimized to increase its performance?
Other adder structures use logic optimizations to increase the performance (carry bypass,
carry select, carry lookahead). The performance increase comes at the cost of area.
4. What is a multiplier circuit?
A multiplier is essentially a collection of cascaded adders. Its critical path is far more
complex, and its optimizations differ from those used for adders.
5.Which factors dominate the performance of programmable shifter ?
The performance and the area of a programmable shifter are dominated by
the wiring.
6. What is meant by a datapath? (May 2016)
A datapath is a collection of functional units, such as arithmetic logic units or
multipliers, that perform data processing operations, together with registers and buses.
Along with the control unit, it composes the central processing unit.
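7. Write down the expression for the worst-case delay of an RCA.
$t = (n-1)\,t_c + t_s$
where $n$ is the number of bits, $t_c$ is the carry delay of one stage and $t_s$ is the sum
delay of the last stage.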
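The worst-case expression t = (n − 1)·t_c + t_s can be evaluated directly, and a small helper shows which operands actually trigger the worst case: a carry generated in bit 0 that propagates through every higher bit. Unit stage delays and the helper names are assumptions for illustration.

```python
def rca_worst_delay(n, t_c, t_s):
    """Worst-case ripple-carry delay t = (n - 1) * t_c + t_s."""
    return (n - 1) * t_c + t_s


def longest_carry_chain(x, y, n):
    """Longest run of bit positions over which one carry travels (carry-in assumed 0)."""
    longest = run = 0
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        if a & b:              # generate: a new carry starts at this bit
            run = 1
        elif (a ^ b) and run:  # propagate: the existing carry moves one bit further
            run += 1
        else:                  # kill, or propagate with no carry arriving
            run = 0
        longest = max(longest, run)
    return longest


print(rca_worst_delay(16, 1.0, 1.0))        # delays in arbitrary units
print(longest_carry_chain(0xFF, 0x01, 8))   # 11111111 + 00000001 ripples through all 8 bits
```

For most operand pairs the actual carry chain is much shorter, but the adder must be clocked for the worst case.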
8. Write down the expression to obtain the delay of an N-bit carry select adder. (May 2016)

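For an N-bit adder divided into stages of M bits, the carry select adder delay is
$t_{add} = t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum}$
The related expression for an N-bit carry bypass adder is
$t_{adder} = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + (M - 1)\,t_{carry} + t_{sum}$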
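The carry-bypass expression t_setup + M·t_carry + (N/M − 1)·t_bypass + (M − 1)·t_carry + t_sum can be evaluated numerically to pick the stage size M. This sketch assumes all component delays equal one unit and only tries values of M that divide N; real delays depend on the circuit.

```python
def bypass_delay(n, m, t_setup=1.0, t_carry=1.0, t_bypass=1.0, t_sum=1.0):
    """Carry-bypass adder delay for n bits split into n/m stages of m bits."""
    if n % m:
        raise ValueError("this sketch assumes m divides n")
    return (t_setup + m * t_carry + (n // m - 1) * t_bypass
            + (m - 1) * t_carry + t_sum)


def best_stage_size(n, **delays):
    """Try every stage size that divides n and keep the fastest one."""
    sizes = [m for m in range(1, n + 1) if n % m == 0]
    return min(sizes, key=lambda m: bypass_delay(n, m, **delays))


for n in (16, 32, 64):
    m = best_stage_size(n)
    print(f"N = {n}: best M = {m}, bypass delay = {bypass_delay(n, m)} units, "
          f"ripple (N-1)+1 = {n} units")
```

The optimum M grows roughly with the square root of N, so the benefit over a ripple carry adder appears only for wider words.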


9. Define Braun multiplier.
The simplest multiplier is the Braun multiplier. All the partial products are computed in
parallel and then collected through a cascade of carry save adders. The completion time is
limited by the depth of the carry save array and by the carry propagation in the final adder.
This multiplier is suitable only for unsigned (positive) operands.
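The Braun dataflow can be modelled at word level: AND gates form all partial products in parallel, a cascade of carry-save adder rows absorbs them one at a time, and a final carry-propagate adder resolves the sum and carry words. The hardware array is bit-level; this word-level model is an assumed simplification that only mirrors the dataflow.

```python
def csa(x, y, z):
    """One row of carry-save adders: three operands in, sum and carry words out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def braun_multiply(a, b, n=4):
    """Unsigned n x n multiply: AND-array partial products, CSA cascade, final adder."""
    # Partial product j is (a AND b_j) shifted left by j; all are formed in parallel.
    pps = [(a if (b >> j) & 1 else 0) << j for j in range(n)]
    s, c = pps[0], 0
    for pp in pps[1:]:        # each CSA row absorbs one more partial product
        s, c = csa(s, c, pp)
    return s + c              # final carry-propagate adder


print(braun_multiply(13, 11))
```

Each CSA row preserves the total (x + y + z = s + c), so only the last addition needs carry propagation; that final adder plus the array depth set the completion time.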
10. What is the advantage of Booth's algorithm?
Booth's algorithm reduces the number of multiplicand multiples (partial products) that have
to be added. For a given range of numbers to be represented, a higher representation radix
leads to fewer digits; radix-4 (modified) Booth recoding roughly halves the number of partial
products.
11. Draw the truth table for the modified Booth's algorithm.
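The radix-4 (modified) Booth recoding examines the multiplier bit triplet (b2i+1, b2i, b2i-1) and selects 0, ±X or ±2X as the partial product, where X is the multiplicand. The sketch below writes that table as a Python dictionary and checks it by multiplying signed numbers; the function names and the 8-bit default width are illustrative choices.

```python
# Radix-4 Booth recoding: (b[2i+1], b[2i], b[2i-1]) -> multiple of the multiplicand.
BOOTH_TABLE = {
    (0, 0, 0): 0,  (0, 0, 1): +1, (0, 1, 0): +1, (0, 1, 1): +2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y, n):
    """Recode an n-bit two's-complement multiplier (n even) into n/2 radix-4 digits."""
    digits, prev = [], 0                  # b[-1] is taken as 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(BOOTH_TABLE[(b1, b0, prev)])
        prev = b1
    return digits                         # least-significant digit first


def booth_multiply(x, y, n=8):
    """Signed multiply with one partial product (0, +-x or +-2x) per radix-4 digit."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_digits(y, n)))


print(booth_digits(7, 4))   # [-1, 2], i.e. 7 = -1 + 2*4
```

An 8-bit multiplier therefore needs 4 partial products instead of 8, and each one is only a shift and/or negation of the multiplicand.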

12. List the different types of shifter.

• Array shifter
• Barrel shifter
• Logarithmic shifter
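A logarithmic shifter uses log2(width) stages; stage k either passes the word through or shifts it by 2^k, controlled by bit k of the shift amount. The behavioural sketch below assumes a logical right shift of an 8-bit word; the function name is illustrative.

```python
def log_shift_right(word, amount, width=8):
    """Logarithmic shifter: stage k shifts by 2**k when bit k of amount is set."""
    stage = 0
    while (1 << stage) < width:              # log2(width) stages of 2:1 multiplexers
        if (amount >> stage) & 1:
            word = word >> (1 << stage)      # this stage shifts by 1, 2, 4, ...
        stage += 1
    return word


print(log_shift_right(0b10110000, 5))   # 5 = 0b101 (stages 0 and 2 are active)
```

The number of stages grows only logarithmically with the word width, so the wiring, not the logic, tends to dominate the area, as noted in question 5.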
13. What is meant by bit-sliced datapath organization? (May-16)

Data in a processor are operated on in a word-based manner. Datapaths in a microprocessor are
typically 32 or 64 bits wide, while the signal processing datapaths in Digital Subscriber Line
modems, magnetic disk drives, or compact disc players are 5 to 24 bits wide.

A 32-bit processor operates on data words that are 32 bits wide. Because the same operation is
performed on each bit of the data word, the datapath consists of 32 bit slices, each operating
on a single bit; hence it is called bit-sliced.
14. Determine the propagation delay of an N-bit carry select adder. (May-16)
For an N-bit adder divided into N/M stages of M bits each, the propagation delay of the carry
select adder is
$t_{add} = t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum}$
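The linear carry-select expression t_setup + M·t_carry + (N/M)·t_mux + t_sum can be swept over M in the same way. The sketch assumes every component delay equals one unit and only tries stage sizes that divide N.

```python
def select_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Linear carry-select adder delay for n bits in n/m stages of m bits."""
    if n % m:
        raise ValueError("this sketch assumes m divides n")
    return t_setup + m * t_carry + (n // m) * t_mux + t_sum


for n in (16, 64):
    sizes = [m for m in range(1, n + 1) if n % m == 0]
    best = min(sizes, key=lambda m: select_delay(n, m))
    print(f"N = {n}: best M = {best}, delay = {select_delay(n, best)} units")
```

The first stage must ripple through M bits, after which each later stage costs only one multiplexer delay, because both possible sums were computed in parallel.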
15. Why is a barrel shifter very useful in the design of arithmetic circuits? (Nov 2016)
A barrel shifter is a digital circuit that can shift a data word by a specified number of bits
without the use of any sequential logic, using only pure combinational logic. A shift by any
amount therefore completes in a single combinational pass rather than one bit per clock cycle.
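A barrel shifter can be modelled as an array in which each output bit is connected to every input bit through one switch per possible shift amount, with a one-hot control vector turning on exactly one switch per output. The sketch below assumes a rotate-right variant and LSB-first bit lists; the function name is illustrative.

```python
def barrel_rotate_right(bits, sh):
    """Barrel shifter model: sh is one-hot; output bit i takes input bit (i + k) mod n."""
    n = len(bits)
    if sorted(sh) != [0] * (n - 1) + [1]:
        raise ValueError("shift control must be one-hot with one line per bit position")
    k = sh.index(1)
    # Only the switch for shift amount k conducts, for every output at once.
    return [bits[(i + k) % n] for i in range(n)]


print(barrel_rotate_right([1, 0, 0, 0], [0, 1, 0, 0]))   # 0001 rotated right by 1 -> 1000
```

Every output passes through exactly one switch, whatever the shift amount, which is why the delay is independent of the shift; the price is an n-by-n array of switches and the wiring to connect it.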

16. Write the principle of any one fast multiplier. (Nov-2016)

A binary multiplier is an electronic circuit used in digital electronics, such as a computer,
to multiply two binary numbers. Most techniques involve computing a set of partial products,
and then summing the partial products together.

17. How to design a high speed adder?(Nov-2017)

A carry-lookahead adder (CLA) or fast adder is a type of adder used in digital logic.
A carry-lookahead adder improves speed by reducing the amount of time required to
determine carry bits.
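The lookahead idea can be sketched for a 4-bit block: with generate g_i = a_i·b_i and propagate p_i = a_i XOR b_i, the recurrence c_{i+1} = g_i + p_i·c_i is expanded so that every carry is a flat sum of products of g, p and c0, computed in parallel rather than rippled. The 4-bit width and the function names are assumptions for illustration.

```python
def cla_carries(a_bits, b_bits, c0=0):
    """4-bit carry-lookahead block (LSB-first bit lists); returns (sum bits, carry-out)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


def cla_add(x, y, c0=0):
    """Add two 4-bit unsigned integers using the lookahead carries."""
    sums, c4 = cla_carries([(x >> i) & 1 for i in range(4)],
                           [(y >> i) & 1 for i in range(4)], c0)
    return sum(s << i for i, s in enumerate(sums)) + (c4 << 4)


print(cla_add(9, 7))
```

The wide AND/OR terms grow with the block size, which is why practical lookahead adders are built from small blocks arranged in a tree.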

18. What is latency?(Nov-2017)

Latency is the delay from input into a system to desired outcome; the term is
understood slightly differently in various contexts and latency issues also vary from
one system to another.

Part B
1. Explain the structure of a Booth multiplier and list its advantages.
2. Design a 3-bit barrel shifter.
3. What is a 4x4 carry save multiplier? Calculate its critical path delay.
4. Explain the following circuits: (1) datapath circuit, (2) any one adder circuit.
5. Explain with a neat diagram the Baugh-Wooley multiplier.
6. Explain the ripple carry adder. (Nov-16)
7. Describe the carry look-ahead adder and its carry generation and propagation. (Nov-16, Nov-17)
8. Design 16-bit carry bypass and carry select adders and discuss their features. (May-16)

9. Design a 4x4 array multiplier and write down the equation for its delay. (May-16)
10. Explain the operation of a Booth multiplier with suitable examples. Justify how the Booth
algorithm speeds up the multiplication process. (Nov-16)
11. Design a 5-bit by 3-bit multiplier. Explain its operation and summarize the number of
adders. Compare it with the Wallace multiplier. (Nov-17)

UNIT V
IMPLEMENTATION STRATEGIES AND TESTING
REFERRED BOOK:
1. Jan Rabaey, Anantha Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design
Perspective”, Second Edition, Prentice Hall of India, 2003.

2. M.J. Smith, “Application Specific Integrated Circuits”, Addison-Wesley, 1997.

STAFF IN-CHARGE HOD

5.1 INTRODUCTION

Introduction to Field-Programmable Gate Arrays


Field-Programmable Gate Arrays (FPGAs) are a revolutionary new type of user-programmable
integrated circuits that provide fast, inexpensive access to customized VLSI. An FPGA consists
of an array of logic cells that can be interconnected via programmable routing switches, where
the routing structures are sufficiently general to allow the configuration of multiple levels
of the FPGA’s logic cells. FPGAs represent a combination of the features of Mask Programmable
Gate Arrays (MPGAs) and Programmable Logic Devices (PLDs).
From MPGAs, FPGAs have adopted a two-dimensional array of logic cells, and from PLDs the
user-programmability. The research reported in this thesis is focused on FPGA routing
algorithms and routing architectures. Following their introduction in 1985 by the Xilinx
Company [Cart86], FPGAs have evolved considerably as various new devices have been introduced.
FPGAs have quickly gained widespread use, which can be attributed to the reduced
manufacturing time and relatively low costs of these large-capacity user-programmable
devices. As an implementation medium for customized VLSI circuits, FPGAs offer unique
advantages over the alternative technologies (MPGAs, standard cells, and full custom
design):
(1) FPGAs provide a reduction in the cost of manufacturing a customized VLSI circuit
from tens of thousands of dollars to about one hundred dollars.
(2) FPGAs reduce the manufacturing time from months to minutes.

These advantages, which are attributable to the user-programmability of FPGAs, provide a
faster time-to-market and less pressure on designers, because multiple design iterations can
be done quickly and inexpensively.
However, user-programmability also has drawbacks: the logic density and speed performance of
FPGAs are considerably lower than those of the alternatives. While developments over the last
few years have shown significant improvements in FPGAs, much research is still needed before
the best FPGA designs are discovered.
5.2.5 Field-Programmable Gate Arrays

FPGAs are the newest member of the ASIC family and are rapidly growing in
importance, replacing TTL in microelectronic systems. Even though an FPGA is a type of
gate array, we do not consider the term gate-array based ASICs to include FPGAs. This may
change as FPGAs and MGAs start to look more alike.

The essential characteristics of an FPGA:

• None of the mask layers are customized.
• A method for programming the basic logic cells and the interconnect.
• The core is a regular array of programmable basic logic cells that can implement
combinational as well as sequential logic (flip-flops).
• A matrix of programmable interconnect surrounds the basic logic cells.
• Programmable I/O cells surround the core.
• Design turnaround is a few hours.

All FPGAs contain a regular structure of programmable basic logic cells surrounded
by programmable interconnect. The exact type, size, and number of the programmable basic
logic cells varies tremendously.

Fig.5.9 FPGA Architecture

5.3 ROUTING ARCHITECTURE

The input to the global router is a floorplan that includes the locations of all the fixed
and flexible blocks; the placement information for flexible blocks; and the locations of all the
logic cells. The goal of global routing is to provide complete instructions to the detailed router
on where to route every net. The objectives of global routing are one or more of the
following:

• Minimize the total interconnect length.
• Maximize the probability that the detailed router can complete the routing.
• Minimize the critical path delay.

General Approach to Routing


Because of the combinatorial complexity involved, the solution of large routing problems
usually requires a "divide and conquer" strategy. Following this philosophy, routing can be
solved by a three-step process [Loren89]:
1. Partition the routing resources into routing areas that are appropriate for both the
device to be routed and the routing algorithms to be employed.
2. Use a global router to assign each net to a subset of the routing areas. The global
router does not choose specific wiring segments and routing switches for each connection,
but rather it creates a new set of restricted routing problems.

3. Use a detailed router to select specific wiring segments and routing switches for
each connection, within the restrictions set by the global router.
The advantage of this approach is that each of the routing tools can more effectively
solve a smaller part of the routing problem. More specifically, since a global router need
not be concerned with allocating wiring segments or routing switches, it can concentrate
on more global issues, like balancing the usage of the routing channels. Similarly, with
the reduced number of detailed routing alternatives that are available for each connection
because of the restrictions introduced by a global router, a detailed router can focus on
the problem of achieving connectivity. Its limited scope enables a detailed router to
concentrate on resolving contention for routing resources that may exist among different
nets. The above routing strategy has been adopted in this thesis for FPGA routing. The
routing resources are partitioned into horizontal and vertical routing channels.
2.2.3 Introduction to Global Routing
This section introduces global routing by describing the LocusRoute global routing
algorithm for standard cells. Although there are many other published techniques for global
routing, this specific algorithm is described as an example because a modified version of it is
employed for FPGA global routing in this thesis.
This algorithm has been chosen for FPGAs because, as described below, its primary
goal is to balance the usage of the routing channels. This is important for FPGAs because the
number of tracks per channel is pre-determined. Note that the description below is based on
the standard-cell version of LocusRoute, and the main difference between this and the FPGA
version is the definitions of the routing channels - the standard-cell program assumes only
horizontal routing channels, whereas the FPGA version uses both horizontal and vertical
channels.
2.2.3.1 The LocusRoute Global Routing Algorithm
The LocusRoute global router views the global routing problem as consisting of
three main tasks:
1. For nets comprising more than two pins, determine which pairs of pins to connect
together. This step decomposes a multi-point net into a set of two-point connections.
2. Determine a path through the routing channels for each connection.
3. Optimize the solution so that the usage of all of the routing channels is balanced.
The first task is solved by finding a minimum-spanning tree for each net.
Basically, this technique breaks a net into a set of two-point connections such that the
total amount of interconnect required is minimized.
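The decomposition into two-point connections can be sketched with Prim's minimum-spanning-tree algorithm. The notes do not give LocusRoute's exact procedure or distance metric, so the use of Prim's algorithm, rectilinear (Manhattan) distance and the pin coordinates below are assumptions.

```python
def manhattan(p, q):
    """Rectilinear wire length between two pins."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def net_to_connections(pins):
    """Prim's minimum spanning tree: split a multi-pin net into two-point connections."""
    in_tree = {0}                        # start the tree from the first pin
    connections = []
    while len(in_tree) < len(pins):
        # Cheapest edge from a pin already in the tree to a pin not yet connected.
        i, j = min(((i, j) for i in in_tree for j in range(len(pins)) if j not in in_tree),
                   key=lambda e: manhattan(pins[e[0]], pins[e[1]]))
        in_tree.add(j)
        connections.append((pins[i], pins[j]))
    return connections


net = [(0, 0), (4, 0), (4, 3), (10, 0)]   # hypothetical pin coordinates of one net
for a, b in net_to_connections(net):
    print(a, "->", b, "length", manhattan(a, b))
```

A net with k pins always becomes k − 1 connections, each of which is then routed through the channels as a separate two-point problem.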
To solve the second task, LocusRoute models each routing channel as an array of
grids, as shown in Figure 2.2. Each grid location contains a counter, originally set to zero,
which is incremented by one for each connection that is globally routed through it. In this
way, the algorithm is able to maintain a detailed account of the usage of each routing channel,
so that it can avoid congestion.
The algorithm considers alternative ways of routing each connection and chooses the
one that passes through the least congested routing grids. Note that LocusRoute does not
consider all of the possible ways that a connection can be routed, but rather it evaluates only a
subset of the paths that have "two or fewer bends", as explained in .
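The counter-based path selection can be sketched as follows. The grid-of-counters dictionary, the cost function (sum of the counters along a path) and the enumeration of every monotone path with two or fewer bends are simplifying assumptions for illustration, not LocusRoute's actual code.

```python
def segment(p, q):
    """Grid cells on a straight horizontal or vertical run from p to q, inclusive."""
    (x1, y1), (x2, y2) = p, q
    if y1 == y2:                                # horizontal run (or a single cell)
        step = 1 if x2 >= x1 else -1
        return [(x, y1) for x in range(x1, x2 + step, step)]
    step = 1 if y2 >= y1 else -1                # vertical run
    return [(x1, y) for y in range(y1, y2 + step, step)]


def cells_of(corners):
    """Join the straight runs between successive corner points into one list of cells."""
    cells = []
    for p, q in zip(corners, corners[1:]):
        for cell in segment(p, q):
            if not cells or cells[-1] != cell:   # a corner is shared by two runs
                cells.append(cell)
    return cells


def candidate_paths(src, dst):
    """Every path with two or fewer bends: H-V-H and V-H-V shapes (L-shapes included)."""
    (x1, y1), (x2, y2) = src, dst
    paths = [cells_of([src, (xm, y1), (xm, y2), dst])
             for xm in range(min(x1, x2), max(x1, x2) + 1)]
    paths += [cells_of([src, (x1, ym), (x2, ym), dst])
              for ym in range(min(y1, y2), max(y1, y2) + 1)]
    return paths


def route_connection(usage, src, dst):
    """Choose the least congested candidate path and increment its grid counters."""
    best = min(candidate_paths(src, dst),
               key=lambda cells: sum(usage.get(c, 0) for c in cells))
    for c in best:
        usage[c] = usage.get(c, 0) + 1
    return best


usage = {}                                         # grid cell -> connections through it
first = route_connection(usage, (0, 0), (3, 2))
second = route_connection(usage, (0, 0), (3, 2))   # steers around the first route
print(first)
print(second)
```

Rip-up and re-route fits the same structure: decrement the counters along a connection's path, then call route_connection again.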
After all of the connections have been globally routed once, LocusRoute optimizes
the solution by sequentially ripping up and re-routing each connection. After repeating this
procedure a small number of times, the final solution is output in a format suitable for the
detailed router to be employed.

Introduction to Detailed Routing


This section provides an introduction to detailed routing by describing the maze
routing technique. Although there exist many other detailed routing algorithms, maze
routing will be discussed because it is widely used due to its general applicability, and a
variant of a maze router is employed as a comparison against the detailed routing algorithm
for FPGAs.
2.2.4.1 The Lee Maze Router
Most maze routers can be considered to be a variant of the algorithm described in . This
technique models the entire routing surface as a rectangular array of cells, where the size of
each cell is defined so as not to violate the spacing rules for wiring segments. Connections
are formed one at a time by selecting adjacent cells that reach from one end of a connection
to the other. Once a grid location is occupied, either by a connection or by some sort of
obstruction, it is marked as unusable. An array of routing cells is illustrated in Figure 2.3,
where unusable cells are shaded and usable ones are not. The figure shows the detailed
routes of three connections as they might be produced by a maze router.
The Lee algorithm implements the array of cells as a regular graph, with one vertex
for each cell and one edge joining each pair of adjacent cells. A connection is routed by
beginning at one of its ends and traversing the graph in a breadth-first fashion until the
other end is reached. The result is a diamond-shaped wavefront that emanates from the
first point, as illustrated in Figure 2.4. The numbers in the figure correspond to each step
as the wavefront is propagated.
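The wavefront expansion and the retrace step can be sketched as a breadth-first search on a grid in which 1 marks an unusable cell. The grid representation, the four-neighbour connectivity and the function name are assumptions for illustration.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Lee maze routing; returns the list of cells from src to dst, or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {src: 0}                     # wavefront labels (step numbers)
    frontier = deque([src])
    while frontier:                     # breadth-first expansion of the wavefront
        r, c = frontier.popleft()
        if (r, c) == dst:
            break
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0
                    and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    if dst not in dist:
        return None                     # no path exists with the current obstructions
    # Retrace: repeatedly step to a neighbour whose label is one smaller.
    path = [dst]
    while path[-1] != src:
        r, c = path[-1]
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if dist.get(nb) == dist[(r, c)] - 1:
                path.append(nb)
                break
    return path[::-1]


layout = [[0, 0, 0],
          [1, 1, 0],
          [0, 0, 0]]                    # 1 = obstruction or previously routed wire
route = lee_route(layout, (0, 0), (2, 0))
print(route)
for r, c in route:                      # mark the new wire as unusable for later nets
    layout[r][c] = 1
```

The final loop is where the sequential nature shows: once cells are marked, later connections must route around them even if a different choice for this connection would have left room for both.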

The main advantage of a maze router is that it is guaranteed to find a path from one
end of a connection to the other, if one exists at the time the connection is routed. On the
other hand, because of its sequential nature a maze router is unable to consider the
side effects that the routing of one connection may have on another. Correspondingly, the
main disadvantage of maze routing is the unnecessary blockage of as yet unrouted
connections because of previous routing decisions.
Commercially Available FPGAs
This section provides a detailed description of three commercially available FPGA
families, including those from Xilinx Co., Actel, and Altera. These particular FPGAs
have been chosen because they are representative examples of state-of-the-art devices
and they are in widespread use.
Each device is described in terms of its general architecture, its choice of
programmable cell, its routing architecture, and its CAD routing tools. Enough details are
given, and in some cases specific comments are made, to show how the routing architecture
of each device relates to the research contained in this thesis. In addition, at the end of the
section, several recently introduced FPGAs are briefly
described.

2.3.1 Xilinx FPGAs


The general architecture of Xilinx FPGAs is shown in Figure 2.5. It consists of a two-
dimensional array of programmable cells, called Configurable Logic Blocks (CLBs), with
horizontal routing channels between rows of cells and vertical routing channels between
columns. Programmable resources are configured by Static RAM cells, and each routing
switch is implemented as a specially designed transistor controlled by an SRAM bit. There
are three families of Xilinx FPGAs, called the XC2000, XC3000, and XC4000 corresponding
to first, second, and third generation devices. Table 2.1 gives an indication of the logic
capacities of each generation by showing the number of CLBs and an equivalent gate count.
The gate count measure is given in terms of "equivalent to an MPGA of the same size." All
FPGA manufacturers quote logic capacity by this measure, but it is questionable whether the
figures quoted by each are realistic. The numbers given in Table 2.1, and in similar tables
that appear later in this chapter, should be interpreted accordingly. The design of the Xilinx
CLB and routing architecture differs for each generation, so they will each be described in
turn.
2.3.1.1 Xilinx XC2000

The XC2000 CLB, shown in Figure 2.6, consists of a four-input look-up table and a D flip-flop
[Cart86]. The look-up table can generate any function of up to four variables
or any two functions of three variables. Both of the CLB outputs can be combinational, or
one output can be registered. As illustrated in Figure 2.7, the XC2000 routing architecture
employs three types of routing resources: Direct interconnect, General Purpose
interconnect, and Long Lines. Note that for clarity the routing switches that connect to the
CLB pins are not shown in the figure. The Direct interconnect (shown only for the CLB
marked with an ’*’) provides connections from the output of a CLB to its right, top, and
bottom neighbours. For connections that span more than one CLB, the General Purpose
interconnect provides horizontal and vertical wiring segments, with four segments per row
and five segments per column. Each wiring segment spans only the length or width of one
CLB, but longer wires can be formed because each switch matrix holds a number of routing
switches that can interconnect the wiring segments on its four sides. Note that a connection
routed with the General Purpose interconnect will incur significant routing delays because it
must pass through a routing switch at each switch matrix. Connections that are required
to reach several CLBs with low skew can use the Long Lines, which traverse at most one
routing switch to span the entire length or width of the FPGA.
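Why a four-input look-up table can generate any function of four variables is easiest to see in a behavioural model: programming the LUT stores the function's 16 output values, and evaluating it simply uses the inputs as an address. The bit ordering (first input as the least significant address bit) and the function names are assumptions.

```python
def make_lut(func, k=4):
    """'Program' a k-input LUT: store func's output for all 2**k input combinations."""
    return [func(*[(idx >> i) & 1 for i in range(k)]) & 1 for idx in range(2 ** k)]


def lut_eval(table, *inputs):
    """Read the LUT: the input bits form the address of one stored configuration bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]


at_least_three = make_lut(lambda a, b, c, d: int(a + b + c + d >= 3))
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(at_least_three)   # the 16 configuration bits held in the CLB's memory cells
```

Any Boolean function of four inputs is just a different set of 16 bits, so the LUT delay is the same whatever function is programmed.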
2.3.1.2 Xilinx XC3000
The XC3000 [Hsie88] is an enhanced version of the XC2000, featuring a more complex CLB
and more routing resources. The CLB, as shown in Figure 2.8, houses a look-up table that
can implement any function of five variables, any two functions of four variables, and some
functions of up to seven variables. The CLB has two outputs, both of which may be either
combinational or registered. Figure 2.9 shows that the XC3000 routing architecture is similar
to that in the XC2000, having Direct interconnect, General Purpose interconnect, and Long
Lines. Each resource is enhanced: the Direct interconnect can additionally reach a CLB’s left
neighbour, the General Purpose interconnect has an extra wiring segment per row, and
there are more Long Lines. The XC3000 also contains switch matrices that are similar to
those in the XC2000. Figure 2.9 depicts the internal structure of an XC3000 switch matrix by
showing, as an example, that the wiring segment marked with an ’*’ can connect through
routing switches to six other wiring segments. Although not shown in the figure, the other
wiring segments are similarly connected, though not always to the same number of
segments. This detail is included here because the results shown in Chapter 4 of this thesis
suggest recommended values for the number of routing switches connectable to any wiring
segment, as well as the number of wiring segments in a row or column. Those results
indicate that, in terms of routability, the XC3000 contains too many routing switches per
switch matrix and too few wiring segments in its rows and columns.

2.3.1.3 Xilinx XC4000


The XC4000 [Hsie90] features several enhancements over its predecessors. The CLB,
illustrated in Figure 2.10, utilizes a hierarchical arrangement of look-up tables that yields a
greater logic capacity per CLB than in the XC3000. The XC4000 CLB can implement two
independent functions of four variables, any single function of five variables, any function of
four variables together with some functions of five variables, or some functions of up to nine
variables. The CLB has two outputs, which may be either combinational or registered.

The XC4000 routing architecture is significantly different from the earlier Xilinx FPGAs, with
the most obvious difference being the replacement of the Direct interconnect and General
Purpose interconnect with two new resources, called Single-length Lines and Double-length
Lines. The Single-length Lines, which are intended for relatively short connections or those
that do not have critical timing requirements, are shown in Figure 2.11, where each X
indicates a routing switch. This figure illustrates three architectural enhancements in the
XC4000 series:
1. There are more wiring segments in the XC4000. While the number shown in the
figure is only suggestive, the XC4000 contains more than twice as many wiring segments
as does the XC3000.
2. Most CLB pins can connect to a high percentage of the wiring segments. This
represents an increase in connectivity over the XC3000.
3. Each wiring segment that enters a switch matrix can connect to only three others,
which is half the number found in the XC3000.
It is interesting to note these three enhancements here because they are all supported
by the architectural research that appears in Chapter 4 of this thesis. The remaining routing
resources in the XC4000, which includes the Double-length Lines and the Long Lines, are
shown in Figure 2.12. As the figure shows, the Double-length Lines are similar to the Single-length
Lines, except that each one passes through half as many switch matrices. This
scheme offers lower routing delays for moderately long connections that are not appropriate
for the low-skew Long Lines. For clarity, neither the Single-length Lines nor the routing
switches that connect to the CLB pins are shown in Figure 2.12.

2.3.1.4 Xilinx CAD Routing Tools


Xilinx routing tools are based on maze routers that are customized for the particular routing
resources in each part. It was noted earlier in this chapter that maze routers are unable to
consider the side effects that routing some connection in a particular fashion may have on
other connections. This is a serious shortcoming because Xilinx routing structures have
limited connectivity, and for this reason maze routing is probably not the best technique to
use for Xilinx devices.
2.3.2 Actel FPGAs
The basic architecture of Actel FPGAs, depicted in Figure 2.13, is similar to that found in
MPGAs, consisting of rows of programmable cells, called Logic Modules (LMs), with
horizontal routing channels between the rows. Each routing switch in these FPGAs is
implemented by a novel device called an anti-fuse [ElAy88], which normally resides in a
high-impedance state but takes on a low resistance (about 500 ohms) when "programmed"
by a high voltage pulse. Actel currently has two generations of FPGAs, called the Act-1
[ElAy88] and Act-2 [Ahre90], whose logic capacities are shown in Table 2.2.
2.3.2.1 Actel Act-1
The Act-1 LM that is shown in Figure 2.14 illustrates a very different approach from that
found in Xilinx FPGAs. Namely, while Xilinx utilizes a large, complex CLB, Actel advocates a
small, simple LM. Research has shown [Sing91] that both of these approaches have their
merits, and the best choice for a programmable cell depends on the
speed performance of the routing architecture. As Figure 2.14 shows, the Act-1 LM is
based on a configuration of multiplexers, which can implement any function of two variables,
most functions of three, some of four, up to a total of 702 logic functions [Mail90]. The Act-1
routing architecture is illustrated in Figure 2.15, which for clarity shows only the routing
resources connected to the LM in the middle of the picture. The Act-1 employs four distinct
types of routing resources: Input segments, Output segments, Clock tracks, and Wiring
segments. Input segments connect four of the LM inputs to the Wiring segments above the
LM and four to those below, while an Output segment connects the LM output to several
channels, both above and below the module. The Wiring segments consist of straight metal
lines of various lengths that can be connected together through anti-fuses to form longer
lines. The Act-1 features 22 tracks of Wiring segments in each routing channel and, although
not shown in the figure, 13 vertical tracks that lie directly on top of each LM column. Clock
tracks are special low-delay lines that are used for signals that must reach many LMs with
minimum skew.
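The claim that a multiplexer-based module can implement any function of two variables can be checked by brute force. The sketch assumes the commonly described Act-1 organisation of two 2:1 multiplexers feeding a third, whose select is the OR of two inputs; it ties each of the eight inputs to 0, 1, a or b and collects the distinct functions of (a, b) that result. It does not attempt to reproduce the 702-function count, which involves more variables.

```python
from itertools import product


def mux2(d0, d1, sel):
    """2:1 multiplexer."""
    return d1 if sel else d0


def act1_lm(a0, a1, sa, b0, b1, sb, s0, s1):
    """Assumed Act-1 logic module: two 2:1 muxes feeding a third selected by S0 OR S1."""
    return mux2(mux2(a0, a1, sa), mux2(b0, b1, sb), s0 | s1)


def functions_of_two_variables():
    """Tie the 8 inputs to 0, 1, a or b in every way; collect the distinct truth tables."""
    found = set()
    for tie in product("01ab", repeat=8):
        table = []
        for a, b in product((0, 1), repeat=2):
            env = {"0": 0, "1": 1, "a": a, "b": b}
            table.append(act1_lm(*(env[t] for t in tie)))
        found.add(tuple(table))
    return found


print(len(functions_of_two_variables()))   # number of distinct 2-input functions reached
```

Inversion comes for free: a multiplexer with select b and data inputs tied to 1 and 0 produces NOT b, which is how functions such as XOR fit into a module with no explicit inverters.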

2.3.2.2 Actel Act-2


The Act-2 device, an enhanced version of the Act-1, contains two different programmable
cells, called the C (Combinational) module and the S (Sequential) module. The C module is
very similar to the Act-1 LM, although slightly more complex, while the S module is optimized
to implement sequential elements. The Act-2 routing architecture is also similar to that found
in the Act-1. It features the same four types of routing resources, but the number of tracks is
boosted to 36 in each routing channel and 15 in each column.
2.3.2.3 Actel CAD Routing Tools
The key CAD tool that is used to route Actel FPGAs is the segmented channel router
described in [Green90]. This router uses a novel algorithm that guarantees that every
connection will pass through at most a given maximum number of anti-fuses, if such a
solution exists, and in this sense the algorithm produces an optimal result. Although channel
routers are not generally appropriate for FPGAs, for reasons given in Chapter 3, it is
possible to use this technique for Actel designs because of their high connectivity. Every LM
input connects to all of the tracks either above or below it and each LM output connects to all
the tracks in the channels spanned by its output segment. However, it is worthy of note that
the research reported in Chapter 4 of this thesis indicates that this connectivity can be
reduced, in which case it might be necessary to modify the routing algorithm to handle the
reduced horizontal-vertical connectivity.
2.3.3 Altera FPGAs
Altera FPGAs [Alt90] are considerably different from the others discussed above because
they resemble large Programmable Logic Devices. Nonetheless, they are functionally
equivalent to FPGAs because they employ a two-dimensional array of programmable cells
and a programmable routing structure, they can implement multi-level logic, and they are
user-programmable. Altera’s general architecture, which is based on an EPROM
programming technology, is illustrated in Figure 2.16. It consists of an array of programmable
cells, called Logic Array Blocks (LABs), interconnected by a routing resource called the
Programmable Interconnect Array (PIA). The logic capacities of the two generations of Altera
FPGAs are listed in Table 2.3. The Altera LAB is by far the most complex logic cell of any of
the FPGA families described thus far. A LAB can be thought of as an efficient PLD, as will be
explained in the following paragraphs. Each LAB, as seen in Figure 2.17, consists of two
major blocks, called the Macrocell Array and the Expander Product Terms. The Macrocell
Array is a one-dimensional array of elements called Macrocells, where the number of
elements in the array varies with each Altera device. As illustrated in Figure 2.18, each
Macrocell comprises three wide AND gates that feed an OR gate which connects to an XOR gate,
and a flip-flop. The XOR gate generates the Macrocell output and can optionally be registered.
In Figure 2.18, the inputs to the Macrocell are shown as single-input AND gates because each
is generated as a wired-AND (called a p-term) of the signals
drawn on the left-hand side of the figure. A p-term can include any signal in the PIA, any of
the LAB Expander Product Terms (described below), or the output of any other Macrocell.
With this arrangement the Macrocell Array functions much like a PLD, but with fewer product
terms per register (there are usually at least eight product terms per register in a PLD).
Altera claims [Alt90] that this makes the LAB more efficient because most logic functions do
not require the large number of p-terms found in PLDs and the LAB supports wide functions
by way of the Expander Product Terms.
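The Macrocell path (p-terms, OR gate, XOR gate) can be sketched behaviourally. The dictionary representation of signals and p-terms, the limit of three p-terms and modelling the second XOR input as a fixed programmable polarity bit are assumptions; the optional flip-flop is omitted.

```python
def p_term(signals, literals):
    """Wired-AND product term: literals maps a signal name to 1 (true) or 0 (complement)."""
    return int(all(signals[name] == value for name, value in literals.items()))


def macrocell(signals, pterms, invert=0):
    """Assumed Macrocell model: up to three p-terms ORed, then XORed with a polarity bit."""
    if len(pterms) > 3:
        raise ValueError("this sketch models three p-terms per Macrocell")
    sop = int(any(p_term(signals, lits) for lits in pterms))
    return sop ^ invert


sig = {"a": 1, "b": 1, "c": 1, "d": 0}
# f = a.b + c'.d uses two p-terms with the XOR passing the OR output unchanged.
print(macrocell(sig, [{"a": 1, "b": 1}, {"c": 0, "d": 1}]))
# NOT(a.b.c) needs three p-terms as a sum of products, but only one with the XOR inverting.
print(macrocell(sig, [{"a": 1, "b": 1, "c": 1}], invert=1))
```

The XOR output polarity is one reason a few p-terms per Macrocell can be sufficient: a function whose complement has fewer product terms can be built from the complement.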

As illustrated in Figure 2.19, each Expander Product Terms block consists of a number of
p-terms (the number shown in the figure is only suggestive) that are inverted and fed back to
the Macrocell Array and to itself. This arrangement permits the implementation of very wide
logic functions because any Macrocell has access to these extra p-terms. The Altera routing
structure, the PIA, consists of a number of long wiring segments that pass adjacent to every
LAB. The PIA provides complete connectivity because each LAB input can be programmably
connected to the output of any LAB, without constraints. With this arrangement, routing an
Altera FPGA is trivial, since there are no routing constraints. However, as mentioned
previously for Actel FPGAs, this level of connectivity is excessive and could probably be
reduced, given an appropriate routing algorithm.
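The p-term, Macrocell and expander behavior described above can be modeled in a few lines. This is a minimal behavioral sketch, not Altera's netlist: the signal names, the `~` prefix for a complemented literal, and the helper names `p_term`, `expander` and `macrocell` are all hypothetical, and the optional flip-flop is left out so that only the combinational path (three p-terms, OR gate, XOR polarity control) is shown.

```python
from typing import Dict, List

Signals = Dict[str, int]


def p_term(signals: Signals, literals: List[str]) -> int:
    """Wired-AND of literals; '~x' selects the complement of signal x."""
    result = 1
    for lit in literals:
        if lit.startswith("~"):
            result &= 1 - signals[lit[1:]]
        else:
            result &= signals[lit]
    return result


def expander(signals: Signals, literals: List[str]) -> int:
    """Expander product term: a p-term that is inverted before being fed back."""
    return 1 - p_term(signals, literals)


def macrocell(signals: Signals, pterms: List[List[str]], invert: int = 0) -> int:
    """OR of up to three p-terms, then the XOR gate used as a programmable inverter."""
    if len(pterms) > 3:
        raise ValueError("a Macrocell has only three p-terms")
    sop = 0
    for literals in pterms:
        sop |= p_term(signals, literals)
    return sop ^ invert
```

For example, `macrocell(s, [["a", "~b"], ["~a", "b"]])` gives a XOR b, and setting `invert=1` gives XNOR. An expander output is just another signal a p-term can use: with `s["e"] = expander(s, ["c", "d"])`, the single p-term `["a", "e"]` realizes a AND NOT(c AND d) without spending another Macrocell's p-terms on the inner product.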


5.3.2 Xilinx LCA

(a) The LCA architecture (notice the matrix element size is larger than a CLB).

(b) A simplified representation of the interconnect resources.

 Each of the lines is a bus.
 The vertical lines and horizontal lines run between CLBs.
 The general-purpose interconnect joins switch boxes (also known as magic boxes
or switching matrices).
 The long lines run across the entire chip.
 It is possible to form internal buses using long lines and the three-state buffers that
are next to each CLB.
 The direct connections (not used on the XC4000) bypass the switch matrices and
directly connect adjacent CLBs.
 The Programmable Interconnection Points (PIPs) are programmable pass
transistors that connect the CLB inputs and outputs to the routing network.
 The bi-directional (BIDI) interconnect buffers restore the logic level and logic
strength on long interconnect paths.


Fig.5.12. Interconnect architecture of XILINX LCA

Fig.5.13 (a) A portion of the interconnect around the CLBs; (b) A switching matrix;
(c) A detailed view inside the switching matrix showing the pass-transistor arrangement;
(d) The equivalent circuit for the connection between nets 6 and 20 using the matrix;
(e) A view of the interconnect at a Programmable Interconnection Point (PIP); (f) and
(g) The equivalent schematic of a PIP connection; (h) The complete RC delay path
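Panel (h) treats a routed connection as an RC path: every PIP and switching-matrix pass transistor adds series resistance, and every wire segment adds capacitance. A common first-order estimate for such a ladder is the Elmore delay, the sum over nodes of the capacitance at each node times the total resistance between the driver and that node. The sketch below is a generic Elmore calculation under that assumption, not Xilinx timing data; the function name is hypothetical and the resistance and capacitance values in the usage note are arbitrary illustrative numbers.

```python
from typing import Sequence


def elmore_delay(resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Elmore delay (seconds) at the far end of an RC ladder driven from one end.

    resistances[i] is the series resistance (ohms) of stage i, for example one
    pass transistor plus its wire segment; capacitances[i] is the capacitance
    (farads) at the node that follows it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance from the driver to this node
        delay += upstream_r * c  # this node's contribution
    return delay
```

For n identical stages of resistance R and capacitance C the result is RC·n(n+1)/2, so the delay grows roughly with the square of the number of pass transistors in series. For example, `elmore_delay([1e3] * 4, [100e-15] * 4)` gives about 1 ns (RC = 0.1 ns, times 10). That quadratic growth is one reason long connections use the long lines, and why the BIDI buffers restore signals on long paths.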

5.3.3 Altera MAX

Altera MAX 9000 interconnect scheme

 A 4 x 5 array of Logic Array Blocks (LABs), the same size as the EPM9400
chip.
 A simplified block diagram of the interconnect architecture showing the
connection of the FastTrack buses to a LAB.

Fig.5.14. Interconnect scheme of ALTERA MAX

5.4 FPGA ARCHITECTURE

5.4.1 XC4000 CLB

 A 32-bit look-up table (LUT).
 CLB propagation delay is fixed (the LUT access time) and independent of the
logic function.
 7 inputs to the XC3000 CLB: 5 CLB inputs (A–E), and 2 flip-flop outputs (QX
and QY).
 2 outputs from the LUT (F and G). Since a 32-bit LUT requires only five
variables to form a unique address (32 = 2^5), there are several ways to use
the LUT:
 Use 5 of the 7 possible inputs (A–E, QX, QY) with the entire 32-bit LUT (the
CLB outputs F and G are then identical).
 Split the 32-bit LUT in half to implement 2 functions of 4 variables each;
choose 4 input variables from the 7 inputs (A–E, QX, QY). You have to
choose 2 of the inputs from the 5 CLB inputs (A–E); then one function output
connects to F and the other output connects to G.
 You can split the 32-bit LUT in half, using one of the 7 input variables as a
select input to a 2:1 MUX that switches between F and G (to implement some
functions of 6 and 7 variables).
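The three ways of using the 32-bit LUT can be checked with a small behavioral model. This is a sketch under stated assumptions: the truth table is packed into a Python integer with bit i holding the output for address i, the lower 16 bits are taken as the F half and the upper 16 bits as the G half, and the function names are hypothetical. The real XC3000 configuration-bit ordering and the restriction on which of the 7 inputs each half may use are not modeled.

```python
from typing import Callable, Sequence, Tuple


def to_address(bits: Sequence[int]) -> int:
    """Input bits (first element = LSB) to a LUT address."""
    return sum(bit << k for k, bit in enumerate(bits))


def pack_truth_table(func: Callable[..., int], n_inputs: int) -> int:
    """Store func(bit0, bit1, ...) for every address as one bit of an int."""
    table = 0
    for addr in range(1 << n_inputs):
        bits = [(addr >> k) & 1 for k in range(n_inputs)]
        if func(*bits):
            table |= 1 << addr
    return table


def mode_f(lut32: int, inputs5: Sequence[int]) -> Tuple[int, int]:
    """One function of 5 variables uses all 32 bits; F and G are identical."""
    out = (lut32 >> to_address(inputs5)) & 1
    return out, out


def mode_fg(lut32: int, f_inputs4: Sequence[int], g_inputs4: Sequence[int]) -> Tuple[int, int]:
    """Two functions of 4 variables, one per 16-bit half of the LUT."""
    f = (lut32 >> to_address(f_inputs4)) & 1         # lower half drives F
    g = (lut32 >> (16 + to_address(g_inputs4))) & 1  # upper half drives G
    return f, g


def mode_fgm(lut32: int, f_inputs4: Sequence[int], g_inputs4: Sequence[int], select: int) -> int:
    """A 2:1 MUX, controlled by one more input, chooses between F and G."""
    f, g = mode_fg(lut32, f_inputs4, g_inputs4)
    return g if select else f
```

Packing a 4-input AND into the lower half and a 4-input OR into the upper half (`pack_truth_table(and4, 4) | pack_truth_table(or4, 4) << 16`) lets `mode_fg` produce both functions at once. When both halves see the same four inputs and the select is a fifth input, `mode_fgm` is equivalent to `mode_f`; giving the two halves different input sets is what lets the MUX option cover some functions of 6 and 7 variables.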


Fig.5.15. Xilinx XC4000 family CLB

5.4.1.2 XC4000 I/O Block

Fig.5.15. Xilinx XC4000 I/O Block

5.4.2 Actel ACT

The Actel ACT architecture:


(a) Organization of the basic logic cells.

(b) The ACT 1 Logic Module (LM, the Actel basic logic cell). The ACT 1 family
uses just one type of LM. ACT 2 and ACT 3 FPGA families both use two
different types of LM.

(c) An example LM implementation using pass transistors.

(d) An example logic macro. Connect logic signals to some or all of the LM
inputs, the remaining inputs to VDD or GND.

Fig.5.16. ACTEL ACT 1
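Panel (d) describes building a logic macro by tying unused LM inputs to VDD or GND. The sketch below assumes the commonly described structure of the ACT 1 LM: two 2:1 multiplexers (A0/A1 selected by SA, and B0/B1 selected by SB) feeding a third 2:1 multiplexer whose select is the OR of S0 and S1. The pin names follow that description; the helper names and the two example macros are illustrative and are not Actel library cells.

```python
def mux2(in0: int, in1: int, sel: int) -> int:
    """2:1 multiplexer: in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def act1_lm(a0: int, a1: int, sa: int, b0: int, b1: int, sb: int, s0: int, s1: int) -> int:
    """F = (A0·SA' + A1·SA)·(S0 + S1)' + (B0·SB' + B1·SB)·(S0 + S1)."""
    m1 = mux2(a0, a1, sa)
    m2 = mux2(b0, b1, sb)
    return mux2(m1, m2, s0 | s1)


GND, VDD = 0, 1


def and2_macro(x: int, y: int) -> int:
    # Only the A-side MUX is used: S0 = S1 = GND; A0 = GND, A1 = y, SA = x.
    return act1_lm(GND, y, x, GND, GND, GND, GND, GND)


def or2_macro(x: int, y: int) -> int:
    # A0 = y, A1 = VDD, SA = x gives "x ? 1 : y", i.e. x OR y.
    return act1_lm(y, VDD, x, GND, GND, GND, GND, GND)
```

Checking all four input combinations of `and2_macro` and `or2_macro` against `&` and `|` confirms the pin assignments. Other macros are formed the same way, by choosing which LM inputs receive logic signals and which are tied to GND or VDD.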

5.4.2.2 Logic Module Analysis

 Actel uses a fine-grain architecture which allows you to use almost all of the
FPGA.
 Synthesis can map logic efficiently to a fine-grain architecture.
 Physical symmetry simplifies place-and-route (swapping equivalent pins on
opposite sides of the LM to ease routing).
 Matched to small antifuse programming technology.
 LMs balance efficiency of implementation and efficiency of utilization.
 A simple LM reduces performance, but allows fast and robust place-and-route.

5.4.3 ALTERA MAX

The implementation details vary among the families, but the basic features (a wide
programmable-AND array, a narrow fixed-OR array, logic expanders, and programmable
inversion) are very similar. Each family has the following individual
characteristics:


 A typical MAX 5000 chip has: 8 dedicated inputs (with both true and
complement forms); 24 inputs from the chipwide interconnect (true and
complement); and either 32 or 64 shared expander terms (single polarity).
 The MAX 5000 LAB looks like a 32V16 PLD (ignoring the expander terms).
 The MAX 7000 LAB has 36 inputs from the chipwide interconnect and 16
shared expander terms; the MAX 7000 LAB looks like a 36V16 PLD.
 The MAX 9000 LAB has 33 inputs from the chipwide interconnect and 16 local
feedback inputs (as well as 16 shared expander terms); the MAX 9000 LAB
looks like a 49V16 PLD.

Fig.5.16. ALTERA MAX architecture

VI. DESIGN FOR TESTABILITY


The keys to designing circuits that are testable are controllability and observability.
Controllability is the ability to set (to 1) and reset (to 0) every node internal to the
circuit. Observability is the ability to observe, either directly or indirectly, the state of any
node in the circuit.

Good observability and controllability reduce the cost of manufacturing testing because
they allow high fault coverage with relatively few test vectors. Moreover, they can be
essential to silicon debug because physically probing internal signals has become so
difficult. We will first cover three main approaches to what is commonly called Design for
Testability (DFT). These may be categorized as follows:

 Ad hoc testing
 Scan-based approaches
 Built-in self-test (BIST)

6.1 AD HOC TESTING

Ad hoc test techniques, as their name suggests, are collections of ideas aimed at
reducing the combinational explosion of testing. They are summarized here for historical
reasons.

They are only useful for small designs where scan, ATPG, and BIST are not
available. A complete scan-based testing methodology is recommended for all digital
circuits. Having said that, the following are common techniques for ad hoc testing:

 Partitioning large sequential circuits
 Adding test points
 Adding multiplexers
 Providing for easy state reset

A technique classified in this category is the use of the bus in a bus-oriented system for
test purposes. Each register is made loadable from the bus and capable of being driven
onto the bus.

Here, internal logic values are enabled onto the data bus for testing purposes.
Frequently, multiplexers can be used to provide alternative signal paths during testing. In
CMOS, transmission gate multiplexers provide low area and delay overhead.

Any design should always have a method of resetting the internal state of the chip within
a single cycle or at most a few cycles. Apart from making testing easier, this also makes
simulation faster because only a few cycles are required to initialize the chip.

In general, ad hoc testing techniques represent a bag of tricks developed over the years
by designers to avoid the overhead of a systematic approach to testing. While these general
approaches are still quite valid, process densities and chip complexities necessitate a
structured approach to testing.

6.2 SCAN DESIGN


The scan-design strategy for testing has evolved to provide observability and
controllability at each register. In designs with scan, the registers operate in one of two
modes.

 In normal mode, they behave as expected.
 In scan mode, they are connected to form a giant shift register called a scan
chain spanning the whole chip. By applying N clock pulses in scan mode, all
N bits of state in the system can be shifted out and new N bits of state can be
shifted in.

Therefore, scan mode gives easy observability and controllability of every register in
the system. Modern scan is based on the use of scan registers, as shown in Figure 6.1. The
scan register is a D flip-flop preceded by a multiplexer.

When the SCAN signal is deasserted, the register behaves as a conventional register,
storing data on the D input. When SCAN is asserted, the data is loaded from the SI pin,
which is connected in shift register fashion to the previous register Q output in the scan
chain.

To load the scan chain of the circuit in Figure 6.1, SCAN is asserted and CLK is pulsed
eight times to load the first two ranks of 4-bit registers with data. SCAN is then deasserted
and CLK is asserted for one cycle to operate the circuit normally with the predefined inputs.
SCAN is then reasserted and CLK asserted eight times to read the stored data out.

At the same time, the new register contents can be shifted in for the next test.
Testing proceeds in this manner of serially clocking the data through the scan register to the
right point in the circuit, running a single system clock cycle and serially clocking the data out
for observation.
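The shift, capture, shift sequence above can be sketched with a behavioral model of a chain of mux-D scan flip-flops. This is a simplified illustration rather than the circuit of Figure 6.1: the class and function names are hypothetical, the chain is a single list of bits, and the combinational block is any Python function mapping the current register contents to the values presented at the D inputs.

```python
from typing import Callable, List, Optional

Bits = List[int]


class ScanChain:
    """N scan flip-flops: each loads D in normal mode, or its predecessor's Q in scan mode."""

    def __init__(self, n: int):
        self.q: Bits = [0] * n

    def clock(self, scan: bool, d: Optional[Bits] = None, si: int = 0) -> int:
        """Apply one CLK edge and return the SO bit that was visible before the edge."""
        so = self.q[-1]
        if scan:
            self.q = [si] + self.q[:-1]        # SI -> Q[0] -> Q[1] -> ... -> SO
        else:
            if d is None or len(d) != len(self.q):
                raise ValueError("normal mode needs one D value per register")
            self.q = list(d)
        return so


def scan_test(chain: ScanChain, logic: Callable[[Bits], Bits], pattern: Bits) -> Bits:
    """Shift a pattern in, run one system clock, and shift the captured response out."""
    n = len(chain.q)
    for bit in reversed(pattern):              # last bit first, so pattern[k] ends in Q[k]
        chain.clock(scan=True, si=bit)
    chain.clock(scan=False, d=logic(chain.q))  # one normal-mode capture cycle
    out = [chain.clock(scan=True, si=0) for _ in range(n)]
    return out[::-1]                           # SO delivers Q[n-1] first
```

With `logic = lambda q: [1 - b for b in q]` (a bank of inverters), `scan_test(ScanChain(4), logic, [1, 0, 1, 1])` returns `[0, 1, 0, 0]`. In a real flow the zeros shifted in while unloading would be replaced by the next test pattern, which is how loading and unloading overlap; shifting an arbitrary bit sequence straight through the chain and checking that it emerges n clocks later is the chain self-test mentioned above.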

In this scheme, every input to the combinational block can be controlled and every
output can be observed. In addition, running a random pattern of 1s and 0s through the scan
chain can test the chain itself.

Test generation for this type of test architecture can be highly automated. ATPG
techniques can be used for the combinational blocks and, as mentioned, the scan chain is
easily tested.

The prime disadvantage is the area and delay impact of the extra multiplexer in the
scan register. Designers (and managers alike) are in widespread agreement that this cost is
more than offset by the savings in debug time and production test cost.


FIGURE 6.1 Scan-based testing

6.2.1 PARALLEL SCAN

You can imagine that serial scan chains can become quite long, and the loading and
unloading can dominate testing time. A fairly simple idea is to split the chains into smaller
segments. This can be done on a module-by-module basis or completed automatically to
some specified scan length. Extending this to the limit yields an extension to serial scan
called random access scan.
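A first-order cycle count shows why loading and unloading dominate and why splitting helps. Assuming each pattern needs one full shift of the longest chain plus one capture cycle, with the unload of one pattern overlapped with the load of the next, P patterns on a chain of length L take about P(L + 1) + L clock cycles. The function name and the flip-flop and pattern counts in the usage note are hypothetical.

```python
import math


def scan_test_cycles(n_flops: int, n_chains: int, n_patterns: int) -> int:
    """Approximate tester cycles for balanced parallel scan chains.

    Assumes the flip-flops are split as evenly as possible, so the longest chain
    has ceil(n_flops / n_chains) bits, and that shifting out pattern k overlaps
    shifting in pattern k + 1: P * (L + 1) shift/capture cycles plus one final unload.
    """
    length = math.ceil(n_flops / n_chains)
    return n_patterns * (length + 1) + length
```

With a hypothetical 10 000 flip-flops and 1 000 patterns, a single chain needs 10 011 000 cycles, while 16 balanced chains of 625 bits need 626 625, roughly a 16-fold reduction. The cost is 16 scan-in and 16 scan-out ports, or some on-chip means of sharing them.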

To some extent, this is similar to the scheme used inside FPGAs to load and read the control
RAM. The basic idea is shown in Figure 6.2. The figure shows a two-by-two register section.
Each register receives a column (column<m>) and row (row<n>) access signal along with a
row data line (data<n>).

A global write signal (write) is connected to all registers. By asserting the row and
column access signals in conjunction with the write signal, any register can be read or
written in exactly the same way as a conventional RAM.

The notional logic is shown to the right of the four registers. Implementing the logic required
at the transistor level can reduce the overhead for each register.
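Random access scan addresses each register the way a RAM addresses a cell. The sketch below is a behavioral model of the register grid only, assuming one row select and one column select are active at a time and that the selected register drives (on read) or samples (on write) its row's data line; the class and method names are hypothetical and the transistor-level logic is not represented.

```python
from typing import List


class RandomAccessScan:
    """Grid of scan registers accessed through row<n>, column<m>, data<n> and write."""

    def __init__(self, rows: int, cols: int):
        self.reg: List[List[int]] = [[0] * cols for _ in range(rows)]

    def capture(self, values: List[List[int]]) -> None:
        """Normal-mode clock: every register loads its functional D input."""
        self.reg = [list(row) for row in values]

    def access(self, row: int, column: int, write: int, data: int = 0) -> int:
        """Read or write the single register where the selected row and column cross."""
        if write:
            self.reg[row][column] = data
        return self.reg[row][column]
```

Unlike serial scan, observing one register after a capture costs a single access rather than a shift of the whole chain: after `capture([[0, 1], [1, 0]])`, `access(1, 0, write=0)` returns 1 directly.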


FIGURE 6.2 Parallel scan: basic structure

6.2.2 SCANNABLE REGISTER DESIGN

As we have seen, an ordinary flip-flop can be made scannable by adding a multiplexer on
the data input, as shown in Figure 6.4(a). Figure 6.4(b) shows a circuit design for such a
scan register using a transmission-gate multiplexer.

The setup time increases by the delay of the extra transmission gate in series with the D
input as compared to the ordinary static flip-flop. Figure 6.4(c) shows a circuit using clock
gating to obtain nearly the same setup time as the ordinary flip-flop.

In either design, if a clock enable is used to stop the clock to unused portions of the chip,
care must be taken that the clock always toggles during scan mode.


FIGURE 6.4 Scannable flip-flops

6.3 BUILT-IN SELF-TEST (BIST)

Self-test and built-in test techniques, as their names suggest, rely on augmenting circuits
to allow them to perform operations upon themselves that prove correct operation.

These techniques add area to the chip for the test logic, but reduce the test time required
and thus can lower the overall system cost. The subject is covered extensively in the
literature from the implementer's perspective. One method of testing a module is to use
signature analysis or cyclic redundancy checking.

This involves using a pseudo-random sequence generator (PRSG) to produce the input
signals for a section of combinational circuitry and a signature analyzer to observe the output
signals. A PRSG of length n is constructed from a linear feedback shift register (LFSR),
which in turn is made of n flip-flops connected in a serial fashion, as shown in Figure 6.5(a).

The XOR of particular outputs is fed back to the input of the LFSR. With properly chosen
feedback taps, an n-bit LFSR will cycle through 2^n – 1 states before repeating the sequence;
the all-zeros state is excluded because the register would stay in it forever.

A complete feedback shift register (CFSR), shown in Figure 6.5(b), includes the zero
state that may be required in some test situations.
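The LFSR and CFSR sequences can be generated with a short behavioral sketch. Assumptions: stage 1 receives the feedback, taps are given as 1-based stage numbers, taps (2, 3) for n = 3 and (3, 4) for n = 4 serve as maximal-length examples, and the CFSR variant XORs an (n – 1)-input NOR of stages 1 to n – 1 into the feedback. These tap choices and the function names are illustrative and are not read from Figure 6.5.

```python
from typing import List, Sequence, Tuple

State = Tuple[int, ...]


def next_state(state: State, taps: Sequence[int], complete: bool = False) -> State:
    """One clock of the shift register; the feedback bit enters stage 1."""
    feedback = 0
    for t in taps:
        feedback ^= state[t - 1]
    if complete:
        # CFSR: NOR of all but the last stage inserts the all-zeros state.
        feedback ^= int(not any(state[:-1]))
    return (feedback,) + state[:-1]


def cycle_from(seed: State, taps: Sequence[int], complete: bool = False) -> List[State]:
    """States visited from seed until the sequence repeats."""
    states, state = [], seed
    for _ in range(1 << len(seed)):
        states.append(state)
        state = next_state(state, taps, complete)
        if state == seed:
            return states
    raise ValueError("seed is not on a cycle of this register")
```

`cycle_from((1, 1, 1), (2, 3))` visits 7 = 2^3 – 1 states and never the all-zeros state. The same call with `complete=True` visits all 8, passing 001 → 000 → 100, which is the 0…01 → 0…00 → 10…0 behavior of the CFSR; every other transition is unchanged.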

An n-bit LFSR is converted to an n-bit CFSR by adding an (n – 1)-input NOR gate
connected to all but the last bit. When in state 0…01, the next state is 0…00. When in state
0…00, the next state is 10…0. Otherwise, the sequence is the same. Alternatively, the bottom
n bits of an (n + 1)-bit LFSR can be used to cycle through the all-zeros state without the
delay of the NOR gate.

FIGURE 6.5 Pseudo-random sequence generator

A signature analyzer receives successive outputs of a combinational logic block and produces
a syndrome that is a function of these outputs. The syndrome is reset to 0, and then XORed
with the output on each cycle. The syndrome is swizzled each cycle so that a fault in one bit
is unlikely to cancel itself out.

At the end of a test sequence, the LFSR contains the syndrome that is a function of all
previous outputs. This can be compared with the correct syndrome (derived by running a
test program on the good logic) to determine whether the circuit is good or bad.

If the syndrome contains enough bits, it is improbable that a defective circuit will produce
the correct syndrome.
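One common way to build the signature analyzer is a multiple-input signature register: each cycle the syndrome is shifted through LFSR-style feedback (the swizzle) and XORed with the output word. The sketch below assumes a 4-bit register with feedback from stages 3 and 4 and output words already available as bit lists; the function name and tap choice are illustrative. Because this update is invertible, any single flipped output bit always changes the final syndrome; several errors can occasionally cancel (aliasing), which is why the syndrome needs enough bits.

```python
from typing import List, Sequence


def signature(outputs: Sequence[Sequence[int]], taps: Sequence[int] = (3, 4)) -> List[int]:
    """Compress a non-empty stream of n-bit output words into an n-bit syndrome."""
    n = len(outputs[0])
    syndrome = [0] * n                                     # syndrome is reset to 0
    for word in outputs:
        feedback = 0
        for t in taps:
            feedback ^= syndrome[t - 1]
        shifted = [feedback] + syndrome[:-1]                # swizzle the previous syndrome
        syndrome = [s ^ w for s, w in zip(shifted, word)]  # XOR in this cycle's outputs
    return syndrome
```

A fault-free run gives the reference syndrome; re-running with any one output bit inverted produces a different syndrome, so the comparison flags the part as bad.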

6.3.1 BIST

The combination of signature analysis and the scan technique creates a structure known as
BIST (Built-In Self-Test) or BILBO (Built-In Logic Block Observation). The 3-bit BIST
register shown in Figure 6.6 is a scannable, resettable register that also can serve as a
pattern generator and signature analyzer. C[1:0] specifies the mode of operation.

In the reset mode (10), all the flip-flops are synchronously initialized to 0. In normal
mode (11), the flip-flops behave normally with their D input and Q output. In scan mode (00),
the flip-flops are configured as a 3-bit shift register between SI and SO.

Note that there is an inversion between each stage. In test mode (01), the register
behaves as a pseudo-random sequence generator or signature analyzer. If all the D inputs
are held low, the Q outputs loop through a pseudo-random bit sequence, which can serve as
the input to the combinational logic.

If the D inputs are taken from the combinational logic output, they are swizzled with
the existing state to produce the syndrome. In summary, BIST is performed by first resetting
the syndrome in the output register. Then both registers are placed in the test mode to
produce the pseudo-random inputs and calculate the syndrome. Finally, the syndrome is
shifted out through the scan chain.

Various companies have commercial design aid packages that automatically replace ordinary
registers with scannable BIST registers, check the fault coverage, and generate scripts for
production testing.
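The four BIST register modes can be summarized in a behavioral model. It follows the mode encoding given above (10 reset, 11 normal, 00 scan, 01 test) but is a sketch, not the gate-level circuit of Figure 6.6: the inversions between stages are not modeled, and the test-mode feedback is assumed to be an XNOR of stages 2 and 3 so that the all-zeros reset state lies on the 7-state pseudo-random cycle instead of locking it up. The class and method names are hypothetical.

```python
from typing import List


class BistRegister:
    """3-bit scannable, resettable register that doubles as PRSG and signature analyzer."""

    def __init__(self):
        self.q: List[int] = [0, 0, 0]

    def clock(self, c: str, d: List[int], si: int = 0) -> int:
        """One clock edge in mode c ('10', '11', '00' or '01'); returns SO before the edge."""
        so = self.q[-1]
        if c == "10":                                  # reset: synchronously clear
            self.q = [0, 0, 0]
        elif c == "11":                                # normal: Q <= D
            self.q = list(d)
        elif c == "00":                                # scan: SI -> Q0 -> Q1 -> Q2 -> SO
            self.q = [si] + self.q[:-1]
        elif c == "01":                                # test: LFSR step XORed with D
            feedback = 1 - (self.q[1] ^ self.q[2])     # assumed XNOR feedback
            shifted = [feedback] + self.q[:-1]
            self.q = [s ^ x for s, x in zip(shifted, d)]
        else:
            raise ValueError("unknown mode " + c)
        return so
```

Resetting and then clocking in test mode with D held at 0 walks through seven distinct states, which serve as the pseudo-random inputs. Feeding D from the logic under test instead folds the responses into the syndrome, which scan mode then shifts out on SO.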

6.3.2 Memory BIST

On many chips, memories account for the majority of the transistors. A robust testing
methodology must be applied to provide reliable parts. In a typical memory BIST (MBIST)
scheme, multiplexers are placed on the address, data, and control inputs for the memory to
allow direct access during test.
During testing, a state machine uses these multiplexers to directly write a
checkerboard pattern of alternating 1s and 0s. The data is read back, checked, then the
inverse pattern is also applied and checked. ROM testing is even simpler: The contents are
read out to a signature analyzer to produce a syndrome.
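The checkerboard procedure can be sketched as a small test loop over a model of the memory. Assumptions: a bit-oriented array addressed by (row, column), a checkerboard defined as (row + column) mod 2, and a fault model limited to stuck-at cells. The class and function names are hypothetical, and a production MBIST controller often runs richer march sequences than this.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]


class Memory:
    """Bit-oriented RAM model with optional stuck-at cells: {(row, col): value}."""

    def __init__(self, rows: int, cols: int, stuck: Optional[Dict[Cell, int]] = None):
        self.rows, self.cols = rows, cols
        self.cells = [[0] * cols for _ in range(rows)]
        self.stuck = stuck or {}

    def write(self, row: int, col: int, value: int) -> None:
        self.cells[row][col] = self.stuck.get((row, col), value)

    def read(self, row: int, col: int) -> int:
        return self.cells[row][col]


def checkerboard_test(mem: Memory) -> bool:
    """Write a checkerboard, read it back, then repeat with the inverse pattern."""
    for invert in (0, 1):
        for r in range(mem.rows):
            for c in range(mem.cols):
                mem.write(r, c, ((r + c) & 1) ^ invert)
        for r in range(mem.rows):
            for c in range(mem.cols):
                if mem.read(r, c) != ((r + c) & 1) ^ invert:
                    return False
    return True
```

A fault-free `Memory(4, 4)` passes. A cell stuck at the value the first pattern happens to expect slips through the first pass but fails on the inverse pattern, which is why both patterns are applied.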

FIGURE 6.6 BIST (a) 3-bit register, (b) use in a system

6.3.3 Other On-Chip Test Strategies

On-chip speeds are usually so high that directly observing internal behavior for testing
can be difficult or impossible. Designers have included on-chip logic analyzers and
oscilloscopes to deal with this problem. Such systems typically require a trigger signal to
initiate data collection, a high-speed timing generator, analog or digital sampling, and a
buffer to store the results until they can be off-loaded at lower speed.

A drawback is that the nodes to be observed must be selected at design time, and these
may not be the problem circuits. Nevertheless, probing major buses and critical analog/RF
nodes can be helpful. Also, on-chip scopes have been used to characterize power supply
noise. Analog/digital converter testing requires real-time access to the digital output of the
ADC. Providing parallel digital test ports by reassigning pins on the chip I/O can facilitate this
testing.


If this is impossible, a “capture RAM” on chip can be used to capture results in real-time
and then the contents can be transferred off-chip at a slower rate for analysis. If both ADCs
and DACs are present, a loopback strategy can be employed, as shown in Figure 6.8.

Both analog and digital signals can loop back. Communication and graphics systems
frequently have I/O systems that can be configured as shown. It is often worthwhile to add a
DAC and an ADC to a system to allow a level of analog self-test.

FIGURE 6.8 Analog and digital loopback

6.3.4 IDDQ

A method of testing for bridging faults is called IDDQ test (VDD supply current
Quiescent) or supply current monitoring. This relies on the fact that when a CMOS logic gate
is not switching, it draws no DC current (except for leakage).

When a bridging fault occurs, then for some combination of input conditions, a
measurable DC IDD will flow. Testing consists of applying the normal vectors, allowing the
signals to settle, and then measuring IDD.

As potentially only one gate is affected, the IDDQ test has to be very sensitive. In
addition, to be effective, any circuits that draw DC power such as pseudo-nMOS gates or
analog circuits have to be disabled. Dynamic gates can also cause problems.

As current measuring is slow, the tests must be run slower (of the order of 1 ms per
vector) than normal, which increases the test time. IDDQ testing can be completed externally
to the chip by measuring the current drawn on the VDD line or internally using specially
constructed test circuits. This technique gives a form of indirect massive observability at little
circuit overhead.

However, as subthreshold leakage current increases, IDDQ testing ceases to be effective
because variations in subthreshold leakage exceed currents caused by the faults.

VII. BOUNDARY SCAN

Up to this point we have concentrated on the methods of testing individual chips. Many
system defects occur at the board level, including open or shorted printed circuit board
traces and incomplete solder joints. At the board level, “bed-of-nails” testers historically were
used to test boards.

In this type of a tester, the board-under-test is lowered onto a set of test points (nails)
that probe points of interest on the board. These can be sensed (the observable points) and
driven (the controllable points) to test the complete board. At the chassis level, software
programs are frequently used to test a complete board set.

For instance, when a computer boots, it might run a memory test on the installed
memory to detect possible faults.

The increasing complexity of boards and the movement to technologies such as surface
mount technologies (with an absence of through-board vias) resulted in system designers
agreeing on a unified scan-based methodology called boundary scan for testing chips at the
board (and system) level.

Boundary scan was originally developed by the Joint Test Action Group and hence
is commonly referred to as JTAG. Boundary scan has become a popular standard interface
for controlling BIST features as well.

The IEEE 1149 boundary scan architecture is shown in Figure 7.1. All of the I/O
pins of each IC on the board are connected serially in a standardized scan chain accessed
through the Test Access Port (TAP) so that every pin can be observed and controlled
remotely through the scan chain. At the board level, ICs obeying the standard can be
connected in series to form a scan chain spanning the entire board.

Connections between ICs are tested by scanning values into the outputs of each chip
and checking that those values are received at the inputs of the chips they drive. Moreover,
chips with internal scan chains and BIST can access those features through boundary scan
to provide a unified testing framework.
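The interconnect test can be illustrated with a walking-ones sequence: each pattern drives a single net high through the boundary scan cells of the driving chip, and the receiving chip's boundary scan cells capture what actually arrived. The board model below is hypothetical: `receive` stands in for the whole extest shift, update and capture sequence, and the two faults (a net stuck high and a wired-AND short between two nets) are chosen only for illustration.

```python
from typing import Callable, List

Board = Callable[[List[int]], List[int]]


def walking_ones(n_nets: int, receive: Board) -> List[int]:
    """Return the nets whose received value ever differs from the driven value."""
    bad = set()
    for i in range(n_nets):
        driven = [1 if k == i else 0 for k in range(n_nets)]
        seen = receive(driven)
        bad.update(k for k in range(n_nets) if seen[k] != driven[k])
    return sorted(bad)


def good_board(driven: List[int]) -> List[int]:
    return list(driven)


def net0_stuck_high(driven: List[int]) -> List[int]:
    seen = list(driven)
    seen[0] = 1                                   # hypothetical fault: net 0 stuck high
    return seen


def nets_1_3_shorted(driven: List[int]) -> List[int]:
    seen = list(driven)
    seen[1] = seen[3] = driven[1] & driven[3]     # hypothetical wired-AND bridge
    return seen
```

`walking_ones(4, good_board)` reports no failing nets, `walking_ones(4, net0_stuck_high)` reports `[0]`, and `walking_ones(4, nets_1_3_shorted)` reports `[1, 3]`. Walking ones is the simplest choice of patterns; it needs one pattern per net.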

FIGURE 7.1 Boundary scan architecture

7.1 THE TEST ACCESS PORT (TAP)

The Test Access Port has four or five single-bit connections:

 TCK: test clock input
 TMS: test mode select input
 TDI: test data input
 TDO: test data output
 TRST*: optional active-low test reset input

 When the chip is in normal mode, TRST* and TCK are held low and TMS is held high to
disable boundary scan.

 To prevent race conditions, inputs are sampled on the rising edge of TCK and outputs
toggle on the falling edge.

The Test Logic Architecture and Test Access Port

 The basic test architecture is shown in Figure. It consists of:
 The TAP interface pins
 A set of two or more test-data registers (DRs) to collect data from the chip
 An instruction register (IR) specifying the type of test to perform
 A TAP controller, which controls the scan of bits through the instruction and test-data
registers
 The TAP controller is a small finite-state machine that configures the system.
 In one mode, it scans an instruction into the instruction register specifying what
boundary scan should do.
 In another mode, it scans data in and out of the test-data registers.
 The specification requires at least two test-data registers: the boundary scan register
and the bypass register.

7.2 The Test Logic Architecture and Test Access Port

 The boundary scan register is associated with all the inputs and outputs on the
chip so that boundary scan can observe and control the chip I/Os.

 The bypass register is a single flip-flop used to accelerate testing by avoiding
shifting data into the boundary scan registers of idle chips when only a single chip on the
board is being tested.

 Internal scan chains, BIST, or configuration registers can be treated as optional
additional data registers controlled by boundary scan.


7.3 The TAP Controller

The TAP controller is a 16-state FSM that proceeds from state to state based on
the TCK and TMS signals.

It provides signals that control the test-data registers and the instruction
register. These include serial shift clocks and update clocks.

The state transition diagram is shown in Figure. The TAP controller is initialized to
Test-Logic-Reset on power-up by TRST* or an internal power-up detection circuit.

It moves from one state to the next on the rising edge of TCK based on the value of TMS.

A typical test sequence will involve clocking TCK at some rate and setting TRST* to 0
for a few cycles and then returning this signal to 1 to reset the TAP controller state machine.

TMS is then toggled to traverse the state machine for whatever operation is
required. These operations include serially loading an instruction register or serially loading
or reading data registers that are used to test the chip.
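The 16 states and their TMS-driven transitions can be written as a table. The sketch below encodes the standard IEEE 1149.1 TAP state diagram as the next state for TMS = 0 and TMS = 1, taken on each rising TCK edge; the table and function names are hypothetical. A useful property it confirms is that five TCK cycles with TMS held at 1 return the controller to Test-Logic-Reset from any state, which is how a tester can reset a chip that has no TRST* pin.

```python
from typing import Dict, Sequence, Tuple

# state: (next state when TMS = 0, next state when TMS = 1)
TAP: Dict[str, Tuple[str, str]] = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state: str, tms_bits: Sequence[int]) -> str:
    """Apply one rising TCK edge per TMS value and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state
```

From Test-Logic-Reset, the TMS sequence 0, 1, 1, 0, 0 reaches Shift-IR, ready to shift an instruction in, and 0, 1, 0, 0 reaches Shift-DR.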


7.4 THE INSTRUCTION REGISTER

The instruction register has to be at least two bits long. Recall that boundary scan requires
at least two data registers.

The instruction register specifies which data register will be placed in the scan chain when
the DR is selected. It also determines where the DR will load its value from in the
Capture-DR state and whether the values will be driven to output pads or core logic.

Three instructions are required to be supported: bypass, sample/preload, and extest. The
intest and runbist instructions are optional.

 bypass: This instruction places the bypass register in the DR chain so that the path
from TDI to TDO involves only a single flip-flop. This instruction is represented with all 1s
in the IR.

 sample/preload: This instruction places the boundary scan registers in the
DR chain. In the Capture-DR state, it copies the chip's I/O values into the DRs.

 extest: This instruction allows for the testing of off-chip circuitry. It is similar to
sample/preload, but also drives the values from the DRs onto the output pads.

 intest: This instruction allows for single-step testing of internal circuitry via the
boundary scan registers.

 runbist: This instruction is used to activate internal self-testing procedures within a
chip.

A typical IR bit is shown in Figure. Observe that it contains two flip-flops. The Clock IR
flip-flops of each bit are connected to form a shift register.

They are loaded with a constant value from the Data input in the Capture-IR state and
then shifted out in the Shift-IR state while new values are shifted in.

In the Update-IR state, the contents of the shift register are copied in parallel to the
IR output to load the entire instruction at once.

On reset, the IR should be asynchronously loaded with an innocuous instruction such
as bypass that does not interfere with the normal behavior of the core logic.


7.5 Test Data Registers

 The test data registers are used to set the inputs of modules to be tested and collect
the results of running tests.

 The simplest data register configuration consists of a boundary scan register and a
bypass register. Figure shows a generalized view of the data registers in which an
internal data register has been added.

 A multiplexer under the control of the TAP controller selects which data register is
routed to the TDO pin.

 When internal data registers are added, the IR decoder must produce extra control
signals to select which one is in the DR chain for a particular instruction.


Fig.Test Data Registers

7.5.1 Boundary Scan Register

 The boundary scan register connects to all of the I/O circuitry. Like the instruction
register, it consists of a shift register for the scan chain and an additional bank of flip-flops
to update the outputs in parallel.

 An extra multiplexer on the output allows the boundary scan register to override the
normal path through the I/O pad so it can observe and control inputs and outputs.

 The schematic and symbol for a single bit of the boundary scan register are shown
in Figure.

 The boundary scan register can be configured as an input pad or output pad, as
shown in Figure (a and b).

 As an input, the register receives Data in from the pad and sends Qout to the core
logic in the chip.

 As an output, the register receives Data in from the core logic and drives Qout to a
pad.

 Tristate and bidirectional pads use two or three boundary scan register cells, as
shown in Figure (c and d).

 The Mode signal determines whether Qout should be taken from Data in or the
boundary scan register.

 Separate mode_in and mode_out signals are used for input and output pads so they can be
controlled separately.

 In normal chip operation, both mode signals are 0, so the boundary scan registers are ignored.

 For the extest instruction, mode_out = 1, so the outputs can be controlled by the boundary
scan registers.

 For intest or runbist instructions, mode_in and mode_out are both 1, so the core logic receives
its inputs from the boundary scan registers and the outputs are also driven to known safe
values by the boundary scan register.
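A single boundary scan cell can be modeled behaviorally as a capture/shift flip-flop, an update flip-flop, and the Mode multiplexer on Qout. The class and method names below (`capture`, `shift`, `update`, `qout`) are hypothetical labels for the Capture-DR, Shift-DR and Update-DR actions and the output multiplexer; clock and control-signal timing is omitted.

```python
class BoundaryScanCell:
    """One boundary scan register bit: shift stage, update stage and Mode MUX."""

    def __init__(self):
        self.shift_ff = 0    # part of the serial scan chain
        self.update_ff = 0   # holds the value applied when Mode = 1

    def capture(self, data_in: int) -> None:
        """Capture-DR: sample Data in (a pad input or a core output)."""
        self.shift_ff = data_in

    def shift(self, scan_in: int) -> int:
        """Shift-DR: pass the old bit toward TDO and take a new one from the TDI side."""
        scan_out, self.shift_ff = self.shift_ff, scan_in
        return scan_out

    def update(self) -> None:
        """Update-DR: copy the shifted bit to the parallel output stage."""
        self.update_ff = self.shift_ff

    def qout(self, data_in: int, mode: int) -> int:
        """Mode = 0: normal path; Mode = 1: the boundary scan register drives Qout."""
        return self.update_ff if mode else data_in
```

With Mode at 0 the cell is transparent, as in normal operation. For extest on an output pad, a 1 is shifted in and updated, after which `qout(data_in, mode=1)` drives 1 regardless of what the core logic produces. Sample/preload corresponds to `capture` followed by `shift`, which returns the sampled pin value.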

7.5.2 Bypass Register

 When executing the bypass instruction, the single-bit Bypass register is connected
between TDI and TDO. It consists of a single flip-flop that is cleared during Capture-DR, and
then scanned during Shift-DR, as shown in Figure.

7.5.3 TDO Driver

 The TDO pin shifts out the least significant bit of the IR during Shift-IR or the least
significant bit of one of the data registers during Shift-DR, depending on which instruction is
active.

 The IEEE boundary scan specification requires that TDO change on the falling edge of
TCK and be tristated except during the Shift states.


 This prevents race conditions when the value is clocked into the next chip on the rising
edge of TCK, and allows multiple chips to be connected in parallel with TDO pins tied together
to reduce the length of the boundary scan chain.

 Figure shows a possible implementation of the TDO driver. The multiplexers choose among
the possible shift registers, including the instruction register, boundary scan register, and
bypass register.

 Additional multiplexers would be used if more data registers were included. A flip-flop or
latch delays the TDO signal until the falling edge of TCK. The tristate buffer drives TDO during
Shift-IR or Shift-DR.

TDO Driver

MANUFACTURING TEST PRINCIPLES


Integrated circuits have a yield of less than 100%. The purpose of manufacturing test
is to screen out most of the defective parts before they are shipped to customers.

Typical commercial products target a defect rate of 350–1000 defects per million
(DPM) chips shipped. The customer then assembles systems from the chips, tests the
systems, and discards or repairs defective systems.

A high defect rate leads to unhappy customers. A critical factor in all VLSI design is the
need to incorporate methods of testing circuits. This task should proceed concurrently with
architectural considerations and not be left until fabricated parts are available (as is a
recurring temptation to designers).

5.1 FAULT MODELS

To deal with the existence of good and bad parts, it is necessary to propose a fault
model; i.e., a model for how faults occur and their impact on circuits. The most
popular model is called the Stuck-At model. The Short Circuit/ Open Circuit model
can be a closer fit to reality, but is harder to incorporate into logic simulation tools.

5.1.1 STUCK-AT FAULTS

In the Stuck-At model, a faulty gate input is modeled as a stuck at zero (Stuck-At-0,
S-A-0) or stuck at one (Stuck-At-1, S-A-1). This model dates from board-level designs, where
it was determined to be adequate for modeling faults. Figure illustrates how an S-A-0 or
S-A-1 fault might occur. These faults most frequently occur due to gate oxide shorts (the
nMOS gate to GND or the pMOS gate to VDD) or metal-to-metal shorts.

FIGURE CMOS stuck-at faults
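The gate in the figure is not reproduced here, so this minimal illustration assumes a 2-input NAND gate. The sketch forces input A to a stuck value and lists the input vectors (A, B) that expose each fault at the output. A vector detects the fault only if two conditions hold. First, it drives A to the opposite of the stuck value. Second, it sets B so that the value on A reaches the output. The function names are hypothetical.

```python
from itertools import product

def nand2(a: int, b: int) -> int:
    """Fault-free 2-input NAND."""
    return 1 - (a & b)

def nand2_input_a_stuck(a: int, b: int, stuck_value: int) -> int:
    """NAND whose input A is stuck at stuck_value: the applied A is ignored."""
    return nand2(stuck_value, b)

def detecting_vectors(stuck_value: int):
    """All (A, B) vectors for which the good and faulty outputs differ."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if nand2(a, b) != nand2_input_a_stuck(a, b, stuck_value)]

print("A S-A-0 detected by", detecting_vectors(0))  # [(1, 1)]
print("A S-A-1 detected by", detecting_vectors(1))  # [(0, 1)]
```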

5.1.2 SHORT-CIRCUIT AND OPEN-CIRCUIT FAULTS

Other models include stuck-open or shorted models. Two bridging or shorted faults are
shown in Figure (a).

The short S1 results in an S-A-0 fault at input A, while short S2 modifies the function
of the gate. It is evident that to ensure the most accurate modeling, faults should be modeled
at the transistor level because it is only at this level that the complete circuit structure is known.

FIGURE (a) CMOS bridging faults

For instance, in the case of a simple NAND gate, the intermediate node between the
series nMOS transistors is hidden by the schematic. This implies that test generation should
ideally take account of possible shorts and open circuits at the switch level [Galiay80].
Expediency dictates that most existing systems rely on Boolean logic representations of
circuits and stuck-at fault modeling.

FIGURE (b) A CMOS open fault that causes sequential faults
FIGURE (c) A defect that causes static IDD current

A particular problem that arises with CMOS is that it is possible for a fault to convert
a combinational circuit into a sequential circuit. This is illustrated in Figure (b) for the case of a
2-input NOR gate in which one of the transistors is rendered ineffective. If nMOS transistor A
is stuck open, then the function displayed by the gate will be

$$Z = \overline{A + B} + A\,\overline{B}\,Z'$$

where Z’ is the previous state of the gate. As another example, if either pMOS transistor is
missing, the node would be arbitrarily charged (i.e., it might be high due to some earlier
charging sequence) until one of the nMOS transistors discharged the node. Thereafter, it
would remain at zero, barring charge leakage effects.
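The stuck-open behavior can be checked with a switch-level sketch of the faulty NOR gate. The sketch assumes that a floating output simply holds its previous value, and it ignores leakage. The class and function names are hypothetical. The sketch shows why a single vector is not enough. The pattern 10 exposes the fault only if an earlier pattern such as 00 has first charged the output high.

```python
def nor2_good(a: int, b: int) -> int:
    """Fault-free 2-input NOR."""
    return 1 - (a | b)

class Nor2StuckOpenA:
    """2-input NOR whose nMOS transistor A is stuck open.
    Assumption: a floating output keeps its previous value Z' (no leakage)."""

    def __init__(self, initial: int = 0):
        self.z = initial  # previous output Z'

    def apply(self, a: int, b: int) -> int:
        pull_up = (a == 0 and b == 0)  # series pMOS A and B both on
        pull_down = (b == 1)           # only nMOS B can still discharge Z
        if pull_up:
            self.z = 1
        elif pull_down:
            self.z = 0
        # A=1, B=0: neither network conducts, so Z keeps Z'
        return self.z

def detects(sequence, initial: int = 0) -> bool:
    """True if any vector in the sequence gives a faulty output."""
    gate = Nor2StuckOpenA(initial)
    return any(gate.apply(a, b) != nor2_good(a, b) for a, b in sequence)

print(detects([(1, 0)]))          # False: output was already 0
print(detects([(0, 0), (1, 0)]))  # True: 00 initializes Z=1, 10 exposes it
```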

It is also possible for transistors to exhibit a stuck-open or stuck-closed state.

Stuck-closed states can be detected by observing the static VDD current (IDD) while applying
test vectors. Consider the fault shown in Figure (c), where the drain connection on a pMOS
transistor in a 2-input NOR gate is shorted to VDD.

This could physically occur if stray metal (caused by a speck of dust at the
photolithography stage) overlapped the VDD line and drain connection as shown.

If we apply the test vector 01 or 10 to the A and B inputs and measure the static IDD
current, we will notice that it rises to some value determined by the size of the nMOS transistors.
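Here is a minimal sketch of the IDDQ idea. It assumes that the defect ties the NOR output node Z directly to VDD, because the exact geometry in Figure (c) is not reproduced here. Under that assumption, static current flows for any input vector that turns on the nMOS pull-down network while the short holds the output at VDD. The function names are hypothetical.

```python
from itertools import product

def pull_up_on(a: int, b: int, output_shorted_to_vdd: bool) -> bool:
    # Series pMOS network conducts only for A=0, B=0; the assumed defect
    # adds a permanent path from VDD to the output node.
    return (a == 0 and b == 0) or output_shorted_to_vdd

def pull_down_on(a: int, b: int) -> bool:
    # Parallel nMOS network conducts if either input is high.
    return a == 1 or b == 1

def iddq_vectors(output_shorted_to_vdd: bool):
    """Input vectors (A, B) that create a static VDD-to-GND path."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if pull_up_on(a, b, output_shorted_to_vdd) and pull_down_on(a, b)]

print(iddq_vectors(False))  # []: the fault-free gate draws no static current
print(iddq_vectors(True))   # [(0, 1), (1, 0), (1, 1)]
```

The fault-free gate never has both networks on at once, so it draws no static current for any vector. Under this simplified model, the vector 11 also activates the defect, in addition to the 01 and 10 vectors mentioned above.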

5.2 OBSERVABILITY

The observability of a particular circuit node is the degree to which you can observe
that node at the outputs of an integrated circuit (i.e., the pins). This metric is relevant when
you want to measure the output of a gate within a larger circuit to check that it operates
correctly. Given the limited number of nodes that can be directly observed, it is the aim of
good chip designers to have easily observed gate outputs.

Adoption of some basic design for test techniques can aid tremendously in this
respect. Ideally, you should be able to observe directly or with moderate indirection (i.e., you
may have to wait a few cycles) every gate output within an integrated circuit.

While at one time this aim was hindered by the expense of extra test circuitry and a
lack of design methodology, current processes and design practices allow you to approach
this ideal.

5.3 CONTROLLABILITY

The controllability of an internal circuit node within a chip is a measure of the ease of
setting the node to a 1 or 0 state. This metric is of importance when assessing the degree of
difficulty of testing a particular signal within a circuit. An easily controllable node would be
directly settable via an input pad.

A node with little controllability, such as the most significant bit of a counter, might
require many hundreds or thousands of cycles to get it to the right state. Often, you will find it
impossible to generate a test sequence to set a number of poorly controllable nodes into the
right state. It should be the aim of good chip designers to make all nodes easily controllable.

In common with observability, the adoption of some simple design for test
techniques can aid in this respect tremendously. Making all flip-flops resettable via a global
reset signal is one step toward good controllability.
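The counter example can be quantified. Assume an n-bit binary up-counter that starts from the reset state of all zeros and has no direct way to load its MSB. Its MSB first becomes 1 after 2^(n-1) clock cycles. The sketch checks this closed form against a cycle-by-cycle simulation; both function names are hypothetical.

```python
def cycles_to_set_msb(n_bits: int) -> int:
    """Closed form: an n-bit up-counter reset to 0 first has MSB=1 after 2**(n-1) clocks."""
    return 2 ** (n_bits - 1)

def simulate_cycles_to_set_msb(n_bits: int) -> int:
    """Clock the counter from the reset state until its MSB becomes 1."""
    msb_mask = 1 << (n_bits - 1)
    count, cycles = 0, 0
    while not count & msb_mask:
        count = (count + 1) % (1 << n_bits)
        cycles += 1
    return cycles

for n in (8, 12, 16):
    print(n, "bits:", simulate_cycles_to_set_msb(n), "cycles")
```

An 8-bit counter already needs 128 cycles, and a 16-bit counter needs 32,768. This is why a direct reset or load path (or a scan path) improves controllability so much.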


5.4 FAULT COVERAGE

A measure of the quality of a set of test vectors is the fault coverage it
achieves. That is, for the vectors applied, what percentage of the chip’s internal nodes were
checked? Conceptually, the fault coverage is calculated as follows.

Each circuit node is taken in sequence and held to 0 (S-A-0), and the circuit is
simulated with the test vectors comparing the chip outputs with a known good machine––a
circuit with no nodes artificially set to 0 (or 1).

When a discrepancy is detected between the faulty machine and the good machine,
the fault is marked as detected and the simulation is stopped. This is repeated for setting the
node to 1 (S-A-1). In turn, every node is stuck (artificially) at 1 and 0 sequentially.

The fault coverage of a set of test vectors is the percentage of the total nodes that
can be detected as faulty when the vectors are applied. To achieve world-class quality
levels, circuits are required to have in excess of 98.5% fault coverage. The Verification
Methodology Manual is the bible for fault coverage techniques.
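The procedure above (serial fault simulation with fault dropping) can be sketched in Python. The example circuit Y = (A AND B) OR (NOT C) and its node names are hypothetical. Every node, including the primary inputs, gets both a stuck-at-0 and a stuck-at-1 fault. Coverage is reported as the percentage of those faults that the vector set detects.

```python
from itertools import product

# Hypothetical example circuit: n1 = A AND B, n2 = NOT C, Y = n1 OR n2
NODES = ["A", "B", "C", "n1", "n2", "Y"]

def simulate(vector, fault=None):
    """Evaluate Y for an (A, B, C) vector; fault is (node, stuck_value) or None."""
    v = {}

    def drive(name, value):
        # A stuck-at fault overrides whatever value the node would carry.
        if fault is not None and fault[0] == name:
            value = fault[1]
        v[name] = value

    for name, bit in zip(("A", "B", "C"), vector):
        drive(name, bit)
    drive("n1", v["A"] & v["B"])
    drive("n2", 1 - v["C"])
    drive("Y", v["n1"] | v["n2"])
    return v["Y"]

def fault_coverage(vectors):
    """Percentage of the 2 x len(NODES) stuck-at faults detected by the vectors."""
    faults = [(node, sv) for node in NODES for sv in (0, 1)]
    detected = 0
    for fault in faults:
        for vec in vectors:
            if simulate(vec, fault) != simulate(vec):  # compare with the good machine
                detected += 1
                break  # fault detected: stop simulating it
    return 100.0 * detected / len(faults)

print(fault_coverage([(1, 1, 1)]))
print(fault_coverage([(1, 1, 1), (0, 1, 1), (1, 0, 1), (0, 0, 0)]))
print(fault_coverage(list(product((0, 1), repeat=3))))
```

With only the vector 111, four of the twelve faults are detected (about 33%). Adding 011, 101, and 000 raises coverage to 100% for this small circuit.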

5.5 AUTOMATIC TEST PATTERN GENERATION (ATPG)

Historically, in the IC industry, logic and circuit designers implemented the functions at
the RTL or schematic level, mask designers completed the layout, and test engineers wrote
the tests. In many ways, the test engineers were the Sherlock Holmes of the industry,
reverse engineering circuits and devising tests that would test the circuits in an adequate
manner.

For the longest time, test engineers implored circuit designers to include extra circuitry to
ease the burden of test generation. Happily, as processes have increased in density and
chips have increased in complexity, the inclusion of test circuitry has become less of an
overhead for both the designer and the manager worried about the cost of the die.

In addition, as tools have improved, more of the burden for generating tests has fallen on
the designer. To deal with this burden, Automatic Test Pattern Generation (ATPG) methods
have been invented. The use of some form of ATPG is standard for most digital designs.
Commercial ATPG tools can achieve excellent fault coverage.

However, they are computation-intensive and often must be run on servers or compute
farms with many parallel processors. Some tools use statistical algorithms to predict the fault
coverage of a set of vectors without performing as much simulation.
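Commercial ATPG tools rely on structural search algorithms that are far more efficient than brute force. As a naive stand-in for illustration only, the sketch below generates tests for a hypothetical three-input circuit. It searches the input space for a vector that detects each remaining stuck-at fault. It then drops every other fault that the chosen vector also detects (fault dropping). Faults that no vector can expose are reported as untestable.

```python
from itertools import product

# Hypothetical circuit: n1 = A AND B, n2 = B OR C, Y = n1 XOR n2
NODES = ["A", "B", "C", "n1", "n2", "Y"]

def simulate(vector, fault=None):
    """Evaluate Y for an (A, B, C) vector; fault is (node, stuck_value) or None."""
    v = {}

    def drive(name, value):
        if fault is not None and fault[0] == name:
            value = fault[1]  # the stuck value overrides the logic value
        v[name] = value

    for name, bit in zip(("A", "B", "C"), vector):
        drive(name, bit)
    drive("n1", v["A"] & v["B"])
    drive("n2", v["B"] | v["C"])
    drive("Y", v["n1"] ^ v["n2"])
    return v["Y"]

def generate_tests():
    """Brute-force test generation with fault dropping."""
    all_vectors = list(product((0, 1), repeat=3))
    remaining = [(node, sv) for node in NODES for sv in (0, 1)]
    tests, untestable = [], []
    while remaining:
        target = remaining.pop(0)
        vec = next((cand for cand in all_vectors
                    if simulate(cand, target) != simulate(cand)), None)
        if vec is None:
            untestable.append(target)  # no input vector exposes this fault
            continue
        tests.append(vec)
        # Fault dropping: discard every other fault this vector also detects.
        remaining = [f for f in remaining if simulate(vec, f) == simulate(vec)]
    return tests, untestable

tests, untestable = generate_tests()
print("tests:", tests)
print("untestable:", untestable)
```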


FIGURE An example of a delay fault

5.6 DELAY FAULT TESTING

The fault models dealt with until this point have neglected timing. Failures that occur in
CMOS could leave the functionality of the circuit untouched, but affect the timing.

If an open circuit occurs in one of the nMOS transistor source connections to GND, then
the gate would still function but with an increased falling propagation delay (tpdf). In addition,
the fault now becomes sequential as the detection of the fault depends on the previous state of the gate.

Delay faults may be caused by crosstalk. Delay faults can also occur more often in SOI
logic through the history effect. Software has been developed to model the effect of delay
faults, which are becoming a more important failure mode as processes scale.
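Here is a minimal sketch of why such a delay fault needs a two-pattern, at-speed test. It rests on several assumptions. The pull-down is built from two equal parallel nMOS devices, one of which has its source connection open. The falling delay scales inversely with the number of working devices. The 20 ps nominal delay and the clock periods are illustrative numbers, not measured values, and the function names are hypothetical.

```python
def falling_delay_ps(working_fingers: int, total_fingers: int = 2,
                     nominal_ps: float = 20.0) -> float:
    """Assumed RC model: pull-down current scales with the number of working
    parallel nMOS devices, so the falling delay scales inversely."""
    return nominal_ps * total_fingers / working_fingers

def captured_value(v1_out: int, v2_out: int, delay_ps: float, period_ps: float) -> int:
    """Two-pattern test: V1 initializes the output, V2 launches a transition,
    and the output is sampled one clock period later."""
    if v1_out == v2_out:
        return v2_out  # no transition launched, so the delay is never exercised
    # A gate slower than the clock period still shows the old value.
    return v2_out if delay_ps <= period_ps else v1_out

good = falling_delay_ps(working_fingers=2)    # 20 ps
faulty = falling_delay_ps(working_fingers=1)  # 40 ps
for period in (30.0, 50.0):
    detected = captured_value(1, 0, good, period) != captured_value(1, 0, faulty, period)
    print(f"period {period} ps: fault detected = {detected}")
```

The fault is caught only under two conditions. The first pattern must set the output high, and the second must launch a falling transition that is sampled at a period shorter than the faulty delay. A static test or a slow clock lets the fault escape.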


PART-A

1. What is an FPGA?
A field programmable gate array (FPGA) is a programmable logic device
that supports implementation of relatively large logic circuits. FPGAs can be
used to implement logic circuits with more than 20,000 gates, whereas a CPLD
can implement circuits of up to about 20,000 equivalent gates.

2. What are the different methods of programming PALs?

The programming of PALs is done in three main ways:
• Fusible links
• UV-erasable EPROM
• EEPROM (E²PROM): Electrically Erasable Programmable ROM

3. What is an antifuse? (Nov 2016)
An antifuse is normally a high-resistance structure (>100 MΩ). On application of
appropriate programming voltages, the antifuse is changed permanently to a
low-resistance structure (200-500 Ω).

4. What are the types of programmable devices?

Types of programmable devices are

• Programmable logic structure
• Programmable Interconnect
• Reprogrammable Gate Array

5. What are the characteristics of an FPGA?

• None of the mask layers are customized.
• A method of programming the basic logic cells and the interconnect.
• The core is an array of programmable basic logic cells that can
implement combinational as well as sequential logic (flip-flops).
• A matrix of programmable interconnect surrounds the basic logic cells.
• Design turnaround is a few hours.

6. What is a programmable logic array?

A programmable logic array (PLA) is a programmable device used to
implement combinational logic circuits. The PLA has a set of programmable AND
gates (the AND plane), which link to a set of programmable OR gates (the OR plane),
whose outputs can then be conditionally complemented to produce the outputs. This
layout allows a large number of logic functions to be synthesized in the sum of products
(sometimes product of sums) canonical forms.
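A PLA's AND and OR planes can be modeled directly. In this sketch, the personality is a hypothetical example. It specifies which literals feed each product term, which product terms feed each output, and which outputs are complemented. In the AND-plane strings, '1' means the true input, '0' its complement, and '-' no connection.

```python
from typing import List, Sequence

# Hypothetical PLA personality for three inputs (x0, x1, x2) and two outputs.
# AND plane: one string per product term.
AND_PLANE = ["11-", "-01", "0-1"]   # P0 = x0 x1, P1 = x1' x2, P2 = x0' x2
# OR plane: which product terms are summed into each output.
OR_PLANE = [[0, 1], [1, 2]]          # F0 = P0 + P1, F1 = P1 + P2
# Conditional complement applied to each OR-plane output.
INVERT = [False, True]               # output 1 is complemented

def pla(inputs: Sequence[int]) -> List[int]:
    # AND plane: a product term is 1 when every connected literal is satisfied.
    products = [all(c == '-' or int(c) == bit for c, bit in zip(term, inputs))
                for term in AND_PLANE]
    outputs = []
    for terms, invert in zip(OR_PLANE, INVERT):
        total = any(products[t] for t in terms)   # OR plane
        outputs.append(int(total != invert))      # optional complement
    return outputs

print(pla((1, 1, 0)), pla((0, 0, 1)))  # [1, 1] [1, 0]
```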

7. What are feedthrough cells? (May 2016)

A connection that needs to cross over a row of standard cells uses a feedthrough
cell.

15. What is ULSI? (Nov 2017)

ULSI is short for ultra-large-scale integration, which refers loosely to placing more than
about one million circuit elements on a single chip. The Intel 486 and Pentium
microprocessors, for example, use ULSI technology.

16. Write the various routing procedures. (Nov 2017)

i) Detailed Routing

ii) Channel Routing

iii) Global Routing

1. Mention the levels at which a chip can be tested.

a) At the wafer level b) At the packaged-chip level c) At the board level d) At the system
level e) In the field

2. What are the categories of testing?


a) Functionality tests b) Manufacturing tests

3. Write notes on functionality tests?



Functionality tests verify that the chip performs its intended function. These tests assert that
all the gates in the chip, acting in concert, achieve a desired function. These tests are
usually used early in the design cycle to verify the functionality of the circuit.

4. Write notes on manufacturing tests?


Manufacturing tests verify that every gate and register in the chip functions correctly. These
tests are used after the chip is manufactured to verify that the silicon is intact.

5. Mention the defects that occur in a chip?


a) layer-to-layer shorts b) Discontinuous wires c) thin-oxide shorts to substrate or
well

6. Give some circuit maladies caused by these defects.


i. nodes shorted to power or ground ii. nodes shorted to each other

iii. inputs floating/outputs disconnected

7. What are the tests for I/O integrity?


i. I/O level test ii. Speed test iii. IDD test

8. What is meant by fault models?


Fault model is a model for how faults occur and their impact on circuits.

9. Give some examples of fault models?


i. Stuck-At Faults ii. Short-Circuit and Open-Circuit Faults

8. What is stuck – at fault?


With this model, a faulty gate input is modeled as a “stuck at zero” or “stuck at one”. These
faults most frequently occur due to thin-oxide shorts or metal-to-metal shorts.

9. What is meant by observability?


The observability of a particular internal circuit node is the degree to which one can observe
that node at the outputs of an integrated circuit.

10. What is meant by controllability?


The controllability of an internal circuit node within a chip is a measure of the ease of setting
the node to a 1 or 0 state.

11. What is known as percentage-fault coverage?


The total number of nodes that, when set to 1 or 0, do result in the detection of the fault,
divided by the total number of nodes in the circuit, is called the percentage-fault coverage.

12. What is fault grading?


Fault grading consists of two steps. First, a simulation is run with no faults inserted, and the
results of this simulation are saved. Second, each node or line to be faulted is selected in turn,
set to 0 and then 1, and the test vector set is applied. If and when a discrepancy is detected
between the faulted circuit response and the good circuit response, the fault is said to be
detected and the simulation is stopped.

13. Mention the ideas to increase the speed of fault simulation?


a. parallel simulation b. concurrent simulation

14. What is fault sampling?



An approach to fault analysis is known as fault sampling. It is used in circuits where it is
impractical to fault every node. Nodes are randomly selected and faulted, and the fault detection
rate of the whole circuit is statistically inferred from the number of faults detected in the sample
and the size of the sample. Because the faults are selected randomly, the sample is unbiased, so
the result can show whether the fault coverage exceeds a desired level.
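Fault sampling can be sketched as follows. Several parts are hypothetical: the fault list, the is_detected callback (which stands in for fault simulation of one fault), and the sample size. The 95% interval uses the normal approximation to the binomial distribution. That is one common choice, not the method prescribed by any particular tool.

```python
import math
import random
from typing import Callable, Sequence, Tuple

def sampled_fault_coverage(faults: Sequence, is_detected: Callable[[object], bool],
                           sample_size: int, seed: int = 0) -> Tuple[float, float]:
    """Fault-simulate only a random sample of faults.
    Returns (estimated coverage, approximate 95% half-width)."""
    rng = random.Random(seed)
    sample = rng.sample(list(faults), sample_size)  # unbiased random selection
    hits = sum(1 for f in sample if is_detected(f))
    p = hits / sample_size
    half_width = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, half_width

# Hypothetical fault list in which 90% of faults are detected by the test set.
faults = list(range(10_000))
estimate, margin = sampled_fault_coverage(faults, lambda f: f % 10 != 0, 1_000)
target = 0.85
print(f"coverage ~ {estimate:.3f} +/- {margin:.3f}; "
      f"exceeds {target}: {estimate - margin > target}")
```

If the lower end of the interval (estimate minus margin) is above the target coverage, the sample supports the claim that the test set exceeds that target.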

15. What are the approaches in design for testability?


a. ad hoc testing b. scan-based approaches c. self-test and built-in
testing

16. Mention the common techniques involved in ad hoc testing?


1. partitioning large sequential circuits
2. adding test points
3. adding multiplexers
4. providing for easy state reset

17. What are the scan-based test techniques?


a) Level sensitive scan design b) Serial scan c) Partial serial scan d) Parallel scan

18. What are the two tenets in LSSD?


The circuit is level-sensitive. Each register may be converted to a serial shift register.
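The scan idea behind these techniques, in which registers are reconfigured into one serial shift register, can be sketched with a behavioral model. The three-flip-flop chain and the bitwise-NOT "logic under test" are hypothetical. The sketch shows the three phases of a scan test. A pattern is shifted in, one capture clock is applied in normal mode, and the response is shifted out.

```python
from typing import List, Optional

class ScanChain:
    """Hypothetical serial scan chain of n flip-flops.
    scan_enable=True: each clock shifts the chain one position
    (scan_in -> ff[0], ff[-1] -> scan_out).
    scan_enable=False: each clock captures next-state values in parallel."""

    def __init__(self, n: int):
        self.ff: List[int] = [0] * n

    def clock(self, scan_enable: bool, scan_in: int = 0,
              next_state: Optional[List[int]] = None) -> Optional[int]:
        if scan_enable:
            scan_out = self.ff[-1]
            self.ff = [scan_in] + self.ff[:-1]
            return scan_out
        if next_state is None:
            raise ValueError("a capture clock needs next_state")
        self.ff = list(next_state)
        return None

def shift_in(chain: ScanChain, pattern: List[int]) -> None:
    # The first bit shifted in travels to the far end, so feed the pattern reversed.
    for bit in reversed(pattern):
        chain.clock(True, bit)

def shift_out(chain: ScanChain) -> List[int]:
    # The bit nearest scan_out emerges first; reverse to restore register order.
    out = [chain.clock(True, 0) for _ in range(len(chain.ff))]
    return out[::-1]

def logic_under_test(state: List[int]) -> List[int]:
    return [1 - b for b in state]  # hypothetical combinational block: bitwise NOT

chain = ScanChain(3)
shift_in(chain, [1, 1, 0])                                 # 1. load the test pattern
chain.clock(False, next_state=logic_under_test(chain.ff))  # 2. one capture clock
print(shift_out(chain))                                    # 3. unload: [0, 0, 1]
```

In practice, the response of one test is shifted out while the next pattern is shifted in, so the two phases overlap.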

19. What are the self-test techniques?


a. Signature analysis and BILBO b. Memory self-test c. Iterative logic array testing

20. What is known as BILBO?


Signature analysis can be merged with the scan technique to create a structure known as
BILBO (Built-In Logic Block Observation).
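Signature analysis compresses a long stream of test responses into a short signature held in a linear-feedback shift register (LFSR). A minimal serial sketch follows. The 4-bit width and the feedback taps are assumptions for illustration, not the configuration of any particular BILBO design.

```python
from typing import Iterable, Tuple

def signature(bits: Iterable[int], width: int = 4,
              taps: Tuple[int, ...] = (3, 2)) -> int:
    """Serial signature analysis: each response bit is XORed with the LFSR
    feedback from the assumed tap positions and shifted into the register."""
    state = 0
    mask = (1 << width) - 1
    for b in bits:
        feedback = b
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
    return state

good_response = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
faulty_response = good_response.copy()
faulty_response[4] ^= 1  # a single erroneous bit from a faulty circuit
print(hex(signature(good_response)), hex(signature(faulty_response)))
```

Because the feedback includes the most significant bit, the register update is invertible, so any single-bit error in the response stream changes the signature. Some multi-bit error patterns can still alias to the good signature. That is one reason practical signature registers are wider.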

21. What is known as IDDQ testing?[MAY 2011]


A popular method of testing for bridging faults is called IDDQ or current supply monitoring.
This relies on the fact that when a complementary CMOS logic gate is not switching, it draws
no DC current. When a bridging fault occurs, for some combination of input conditions a
measurable DC IDD will flow.

22. What are the applications of chip level test techniques?


a. Regular logic arrays b. Memories c. Random logic

23. What is boundary scan?


The increasing complexity of boards and the movement to technologies like multichip
modules and surface-mount technologies resulted in system designers agreeing on a unified
scan-based methodology for testing chips at the board. This is called boundary scan.

24. What is the test access port?


The Test Access Port (TAP) is a definition of the interface that needs to be included in an IC
to make it capable of being included in a boundary-scan architecture. The port has four or
five single-bit connections, as follows:
• TCK (The Test Clock Input)
• TMS (The Test Mode Select)
• TDI (The Test Data Input)
• TDO (The Test Data Output)
It also has an optional signal:
• TRST* (The Test Reset Signal)


25. What are the contents of the test architecture?


The test architecture consists of:
The TAP interface pins
A set of test-data registers
An instruction register
A TAP controller

26. What is the TAP controller?


The TAP controller is a 16-state FSM that proceeds from state to state based on the TCK
and TMS signals. It provides signals that control the test data registers, and the instruction
register. These include serial-shift clocks and update clocks.
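The TAP controller's 16 states and TMS-driven transitions can be captured in a next-state table. The table below follows the IEEE 1149.1 state diagram; the function name run_tap is hypothetical. The sketch checks one useful property: holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state.

```python
# state -> (next state if TMS=0, next state if TMS=1), one step per rising TCK edge
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def run_tap(tms_bits, state="Test-Logic-Reset"):
    """Apply one TMS bit per TCK cycle and return the list of states visited."""
    visited = []
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
        visited.append(state)
    return visited

print(run_tap([0, 1, 0, 0]))  # ends in Shift-DR
```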

27. What is known as test data register?


The test-data registers are used to set the inputs of modules to be tested, and to collect the
results of running tests.

28. What is known as boundary scan register?


The boundary scan register is a special case of a data register. It allows circuit-board
interconnections to be tested, external components tested, and the state of chip digital I/Os to be
sampled.

29. What are the factors that cause timing failures? [MAY 2012]
Temperature variation and large interconnect delays.

30. What is the advantage of the single stuck-at fault model? [MAY 2012]

Faults are modeled at the transistor level; only at this level is the complete circuit structure
known.

31. List the basic types of CMOS testing. [MAY 2013]


1.Characterization Testing
2.Production testing
3.Burn in testing
4.Incoming inspection

32. In the saturation region, what are the factors that affect Ids? [MAY 2011]
i. Distance between source and drain (channel length)
ii. Channel width
iii. Threshold voltage
iv. Thickness of the oxide layer
v. Dielectric constant of the gate insulator
vi. Carrier mobility
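The factors listed above all appear in the long-channel (Shockley) square-law model of the saturation current. This simple model ignores effects such as velocity saturation and channel-length modulation:

$$
I_{ds} = \frac{\beta}{2}\left(V_{gs} - V_t\right)^2,
\qquad
\beta = \mu \, \frac{\varepsilon_{ox}}{t_{ox}} \, \frac{W}{L}
$$

In this model:
• L is the channel length (the source-drain distance).
• W is the channel width.
• V_t is the threshold voltage.
• t_ox is the oxide thickness.
• ε_ox is the permittivity of the gate insulator (its dielectric constant times ε_0).
• μ is the carrier mobility.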

33. State the objective of functionality testing. [MAY 2010]


Structural information can facilitate testing.
Organization/Architecture information can make testing
of microprocessors and memories practical.
Develop fault models when we are to use this kind of information.

34. What are the test fixtures required to test a chip? [MAY 2010]
A test fixture is a device or setup designed to hold the device under test in place and allow
it to be tested by being subjected to controlled electronic test signals.
Eg: 1. Socket-style test fixtures
2. Semi-custom test fixtures
3. Curve trace test fixtures

PART B
1. (a) Explain in detail the sequence of scan-based techniques. (16) (MAY 2011)
   (b) With the essential circuit modules, explain in detail the BIST technique. (16) (MAY 2011)
2. (a) Explain the manufacturing test principles in detail. (16) (DEC 2011)
   (b) Describe the ad hoc testing and scan-based approaches to design for testability in detail. (16) (DEC 2011)
3. (a) Explain the design for testability (DFT) concepts. (MAY 2013)
   (b) Explain the following terms: (MAY 2013)
       (i) Silicon debug principles. (8)
       (ii) Boundary scan techniques. (8)
4. (a) Describe in detail the various manufacturing tests in CMOS testing. (DEC 2013)
   (b) Explain in detail boundary scan testing. (16) (DEC 2013)
5. (a) Discuss the need for testing and explain the silicon debugging principles. (16) (MAY 2014)
   (b) Explain the method of boundary scan test in detail. (16) (MAY 2014)
6. Explain the general architecture of an FPGA and bring out the different programmable blocks used. (NOV 2015) (NOV 2017)
7. Explain the programmable interconnects and I/O blocks used in an FPGA. (MAY 2016)
8. Describe the different types of programming technology used in an FPGA. (DEC 2016)
