
Layout Lec 02 Var Rel v01

This lecture discusses variability and reliability in integrated circuits. It explains that manufacturing variations in process, voltage, and temperature (PVT) must be considered in robust IC design. Variability arises from random dopant fluctuations, line edge roughness, and other sources. Monte Carlo simulation is used to predict yield given variability. Variation-tolerant design techniques like adaptive control are becoming important to compensate for increasing variability without degrading performance or power.

Uploaded by

Ahmed Metwaly
Copyright
© All Rights Reserved

“And you have been given only a little knowledge.”
26 March 2018 (9 Rajab 1439)

IC Layout

Lecture 02
Variability and Reliability

Dr. Hesham A. Omran


Integrated Circuits Lab (ICL)
Electronics and Communications Eng. Dept.
Faculty of Engineering
Ain Shams University
This lecture is mainly based on “CMOS VLSI Design”, 4th edition, by N. Weste and D. Harris and
its accompanying lecture notes
Robustness
 A central challenge in building integrated circuits is to get millions
or billions of transistors to all function
– Not just once, but for a quintillion consecutive cycles
 Despite challenges, engineers routinely build robust integrated
circuits
– lifetimes exceed ten years of continuous operation
 Conventional static CMOS circuits are exceptionally well-suited to
the task
– have great noise margins
– are minimally sensitive to variations in transistor parameters
– will eventually recover even if a noise event occurs

20: Variability and Reliability 2


Variability (PVT Variations)
 Manufacturing variations: process variations
 Environmental variations: voltage and temperature
 PVT: Process, voltage, and temperature

 Statistical distribution
1. Uniform distribution
2. Normal (Gaussian) distribution



Variability (PVT Variations)
 Uniform distribution
– Tolerate all variations within half-range = a
– E.g., VDD = 1 V ± 10% → half-range a = 100 mV



Variability (PVT Variations)
 Normal (Gaussian) distribution:
– 68.3–95.5–99.7 rule
• More accurately, 68.27%, 95.45% and 99.73%
– 3-sigma range is usually acceptable (0.27% of parts rejected)
– Components replicated millions of times (e.g., memory cells)
are designed to tolerate 5- to 7-sigma variations
– Note: if only the variations in one direction (e.g., too slow)
matter, the reject rate is halved
– Ex: for clock period, how many sigma for 97.7% yield?
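The coverage percentages and the one-sided exercise above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the lecture.

```python
import math
from statistics import NormalDist

def two_sided_yield(k):
    """Fraction of a normal population within +/- k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

def one_sided_yield(k):
    """Fraction of a normal population below mean + k sigma."""
    return NormalDist().cdf(k)

for k in (1, 2, 3):
    print(f"{k} sigma: two-sided {two_sided_yield(k):.2%}, "
          f"one-sided {one_sided_yield(k):.2%}")

# Sigma needed when only one direction matters (e.g., parts that are too slow)
k_needed = NormalDist().inv_cdf(0.977)
print(f"a one-sided 97.7% yield needs about {k_needed:.2f} sigma")
```

Because only the slow tail is rejected for a clock-period spec, the one-sided figure is the relevant one, and it comes out at roughly two sigma.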



Process Variations
 Examples of process variations
– Channel length
• Lithography limitations
• Varying etch rates
• Line edge roughness
– Threshold voltage
• Random dopant
fluctuations
• High Vt has higher doping
and higher variations
 E.g., Intel CPU binning
– 1.8 GHz, 2.0 GHz, 2.2 GHz
– i3, i5, i7
Process Variations
 Process Variations Classification
– Lot-to-lot (L2L)
– Wafer-to-wafer (W2W)
– Die-to-die (D2D) or inter-die
– Within-die (WID) or intra-die
• Most important for analog
• a.k.a. mismatch



Pelgrom’s Model
 The standard deviation of random WID variations is inversely
proportional to the square root of the transistor area (WL)
 Ex:

 This makes sense intuitively because variations tend to average out
over a larger area



Quiz
 $\sigma_{V_t} = A_{V_t} / \sqrt{WL}$, with $A_{V_t} = 2$ mV·µm for a 45 nm process
 Calculate the standard deviation in threshold voltage for the
following devices
1. $W/L = 2\,\mu\text{m} / 0.5\,\mu\text{m}$
2. $W/L = 200\,\text{nm} / 50\,\text{nm}$
 If the subthreshold slope is 100mV/decade, what is the percent
increase in leakage current for three sigma variation?



Quiz
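 $\sigma_{V_t} = A_{V_t} / \sqrt{WL}$, with $A_{V_t} = 2$ mV·µm for a 45 nm process

1. $W/L = 2\,\mu\text{m} / 0.5\,\mu\text{m}$: $WL = 1\,\mu\text{m}^2$, so $\sigma_{V_t} = 2$ mV
$10^{3 \cdot 2/100} - 1 \approx 15\%$

2. $W/L = 200\,\text{nm} / 50\,\text{nm}$: $WL = 0.01\,\mu\text{m}^2$, so $\sigma_{V_t} = 20$ mV
$10^{3 \cdot 20/100} - 1 \approx 300\%$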
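The quiz arithmetic can be reproduced with a short script. This is a sketch under the quiz's own numbers (2 mV·µm Pelgrom coefficient, 100 mV/decade subthreshold slope); the function names are illustrative.

```python
import math

A_VT_MV_UM = 2.0           # Pelgrom coefficient from the quiz, in mV*um
SLOPE_MV_PER_DECADE = 100  # subthreshold slope from the quiz

def sigma_vt_mv(w_um, l_um, a_vt=A_VT_MV_UM):
    """Pelgrom's model: sigma_Vt = A_Vt / sqrt(W * L)."""
    return a_vt / math.sqrt(w_um * l_um)

def leakage_increase(delta_vt_mv, slope=SLOPE_MV_PER_DECADE):
    """Fractional leakage increase when Vt shifts down by delta_vt_mv."""
    return 10 ** (delta_vt_mv / slope) - 1

for w_um, l_um in ((2.0, 0.5), (0.2, 0.05)):
    sigma = sigma_vt_mv(w_um, l_um)
    increase = leakage_increase(3 * sigma)
    print(f"W={w_um} um, L={l_um} um: sigma_Vt = {sigma:.1f} mV, "
          f"3-sigma leakage increase = {increase:.0%}")
```

The minimum-size device has 100 times less area, so its sigma is 10 times larger, and the exponential dependence of leakage on Vt turns that into a far larger leakage spread.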



Voltage Variations
 Supply voltage may vary around its nominal value
– Regulator tolerances
– IR drops
– di/dt noise
 Voltage varies in both space (across chip) and time
 Typically tolerate 10% variation
– To first order, leads to 10% delay variations
 Voltage droop map:



Temperature Variations
 Drain current decreases with temperature
 Ambient temp ranges:
– Commercial (0 to 70 °C)
– Industrial (−40 to 85 °C)
– Military (−55 to 125 °C)
 Junction temp may significantly exceed the ambient
– Function of power consumption and package thermal resistance
– Commercial parts commonly verified at 125 °C junction temp
 Temperature varies in both space (across chip, a.k.a. temperature
gradients) and time (temperature fluctuations)
– Circuits within a 1 mm diameter see nearly the same temperature
– Temperature varies in time on a scale of milliseconds



Variations Classification
 Systematic: Can be modeled and nulled out at design time (static)
– Ex: polysilicon gates may systematically be etched narrower in
regions of high polysilicon density than low density
 Random: Can be nulled out by a single calibration step after
manufacturing (static)
– Ex: number of dopant atoms implanted in a transistor
 Drift: Can be nulled by compensation circuits that recalibrate faster
than the drift (dynamic – slow)
– Ex: aging and temperature variation
 Jitter: The most difficult cause of mismatch (dynamic – fast)
– Ex: voltage variations or crosstalk



Design Corners
 MOS: Slow (S), typical or nominal (T), and fast (F)
 Voltage: S (0.9VDD), T (VDD), and F (1.1VDD)
 Temp: S (125 °C), T (70 °C), and F (0 °C)
 Binning: Faster parts are rated for higher frequency and sold for
more money
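To see how quickly the corner count grows, the three S/T/F axes above can be enumerated exhaustively. This is only a sketch: the 1.0 V nominal supply and the three-letter corner naming (process, voltage, temperature) are assumptions made for illustration.

```python
from itertools import product

NOMINAL_VDD = 1.0  # assumed nominal supply in volts (illustrative only)

# Corner tables from the slide: S = slow, T = typical, F = fast
PROCESS = ("S", "T", "F")
VDD_SCALE = {"S": 0.9, "T": 1.0, "F": 1.1}
TEMP_C = {"S": 125, "T": 70, "F": 0}

def all_corners(vdd=NOMINAL_VDD):
    """Return every process/voltage/temperature combination as a dict."""
    return [
        {"name": p + v + t, "process": p,
         "vdd": VDD_SCALE[v] * vdd, "temp_c": TEMP_C[t]}
        for p, v, t in product(PROCESS, VDD_SCALE, TEMP_C)
    ]

corners = all_corners()
print(len(corners), "corners")
slowest = next(c for c in corners if c["name"] == "SSS")
print("slowest corner:", slowest)
```

Even this simplified three-axis model already gives 27 combinations.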



Design Corners
 Few corners for old technologies: simulate all
 Thousands of corners for DSM nodes: sophisticated CAD tools are
required to identify what corners are important for a given circuit

 Burn-in corner during testing


– 125 to 140 °C externally, corresponding to an even higher
internal temperature
– 1.3–1.7× nominal VDD
– Very high leakage



Monte Carlo Simulation
 As process variation increases, the worst-case corners become too
pessimistic for practical design
 Monte Carlo: repeated simulations with parameters randomly
varied each time
 Look at scatter plot of results to predict yield
 Ex: impact of Vt variation
– ON-current
– leakage
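The Monte Carlo idea can be sketched with a toy model of leakage versus Vt variation. All numbers here (20 mV sigma, 100 mV/decade slope, a 4× leakage limit) are assumptions chosen for illustration, and a real flow would run a circuit simulator for each sample instead of the one-line leakage model.

```python
import random

SIGMA_VT_MV = 20.0           # assumed Vt standard deviation
SLOPE_MV_PER_DECADE = 100.0  # assumed subthreshold slope
LEAKAGE_LIMIT = 4.0          # assumed spec: leakage <= 4x nominal

def relative_leakage(delta_vt_mv):
    """Leakage relative to nominal; a lower Vt (negative delta) leaks more."""
    return 10 ** (-delta_vt_mv / SLOPE_MV_PER_DECADE)

def monte_carlo_yield(trials=100_000, seed=0):
    """Estimate parametric yield by sampling Vt shifts from a normal distribution."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        delta_vt = rng.gauss(0.0, SIGMA_VT_MV)
        if relative_leakage(delta_vt) <= LEAKAGE_LIMIT:
            passed += 1
    return passed / trials

print(f"estimated parametric yield: {monte_carlo_yield():.2%}")
```

Fixing the seed keeps the run repeatable; the estimate tightens as the number of trials grows.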



Yield
 Yield (Y) is the fraction of manufactured chips that are operational
or that work according to specification
 Functional yield: chips fail because of gross problems such as open
or short circuits caused by contaminants during manufacturing
– Follows a Poisson distribution: $Y = e^{-DA}$, where D is the defect
density and A is the area
 Parametric yield: operational chips are rejected because they are
too slow or consume too much power or have insufficient noise
margin
 Increasing variability tends to reduce parametric yield, but
designers are introducing adaptive techniques to compensate
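The Poisson functional-yield model above is easy to tabulate. The defect density of 0.5 defects/cm² and the die areas below are assumed values for illustration only.

```python
import math

def functional_yield(defect_density_per_cm2, area_cm2):
    """Poisson yield model: Y = exp(-D * A)."""
    return math.exp(-defect_density_per_cm2 * area_cm2)

ASSUMED_DEFECT_DENSITY = 0.5  # defects per cm^2 (illustrative)

for area_cm2 in (0.5, 1.0, 2.0):
    y = functional_yield(ASSUMED_DEFECT_DENSITY, area_cm2)
    print(f"A = {area_cm2} cm^2 -> functional yield = {y:.1%}")
```

Under this model, doubling the die area squares the yield, which is one reason large dies are disproportionately expensive.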



Example
 Suppose the offset voltage in a sense amplifier is a normally
distributed zero-mean random variable with a standard deviation
of 10 mV.
 If a memory contains 4096 sense amplifiers, how much offset
voltage must it tolerate to achieve a 99.9% parametric yield
overall?
 For an N-component system: $Y_{system} = Y_{component}^{N}$

 $Y_s = 0.999$ and $N = 4096$ → $Y_c = 0.99999976$

 This requires tolerating about five standard deviations, or 50 mV of
amplifier offset.
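The sense-amplifier example can be worked numerically. This sketch assumes the offset fails symmetrically (too positive or too negative), so the per-component reject rate is split between two tails; the function name is illustrative.

```python
from statistics import NormalDist

def required_sigma(system_yield, n_components):
    """Return (component yield, number of sigma each component must tolerate)."""
    component_yield = system_yield ** (1.0 / n_components)
    # Two-sided spec: half of the rejects fall in each tail.
    tail = (1.0 - component_yield) / 2.0
    return component_yield, -NormalDist().inv_cdf(tail)

SIGMA_OFFSET_MV = 10.0  # standard deviation from the example

y_c, k = required_sigma(0.999, 4096)
print(f"component yield = {y_c:.8f}")
print(f"required tolerance = {k:.2f} sigma = {k * SIGMA_OFFSET_MV:.1f} mV")
```

The computed value lands slightly above five sigma, consistent with the rounded "about five standard deviations" in the example.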



Variation-Tolerant Design
 Variation has traditionally been handled by margining to ensure a
good parametric yield
 As variability increases, the growing margins severely degrade the
performance and power of a chip
 Variation-tolerant designs are becoming more important
1. Adaptive control:
• Chip can measure its operating conditions and adjust
parameters such as supply voltage, body bias, or frequency,
on the fly to compensate for variability
2. Fault tolerance:
• Next slide



Variation-Tolerant Design
 Fault tolerance: providing spare parts and performing error
detection and correction
– Ex: an 8-core processor could be sold as a 6-core if two cores
were defective
– For 16-core processor, two spare cores
may improve yield from 19% to 79%
– Memory fault tolerance
• Error detecting and correcting codes
– Logic fault tolerance
1. Master-checker configuration
– Two or more cores can operate in lockstep
2. Triple-mode redundancy (TMR)
– Ideal for real-time systems
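The voting step behind TMR can be illustrated in a few lines. The slide does not describe the voter, so this is a sketch assuming three redundant copies whose results are combined by a bitwise majority vote; the values are made up for the demonstration.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three redundant results."""
    return (a & b) | (b & c) | (a & c)

good = 0b1011_0010
faulty = good ^ 0b0000_1000  # one copy suffers a single bit flip

voted = majority_vote(good, faulty, good)
print(f"voted result = {voted:#010b}, matches good copy: {voted == good}")
```

A single faulty copy is outvoted without stopping or retrying, which is why the approach suits real-time systems. Two copies failing on the same bit would defeat it.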



Reliability
 Reliability is the probability that a part will last at least a specified
time under specified experimental conditions
 Total device-hours (TDH): No. of devices (units) x operation time
 Failure rate (λ) is measured in FIT (failures in time) units
– 1 FIT = 1 failure in $10^9$ TDH
• No. of failures / (thousand hours × million devices)
• $10^9$ × No. of failures / (no. of hours × no. of devices)
 Mean time to/between failures (MTTF or MTBF)
– MTTF = TDH / no. of failures = $10^9 / \lambda$, where λ is in FITs
 E.g., 500 failures for $10^6$ components operating for 1000 hours
– Failure rate = λ = 500 FIT
– MTTF = 2 million device-hours
 Overall MTTF for a system of n components = $10^9 / (\lambda_1 + \lambda_2 + \dots + \lambda_n)$
Quiz
 If a system contains 100 chips each rated at 1000 FIT and a
customer purchases 10 systems, what is the overall MTBF in days?



Quiz
 If a system contains 100 chips each rated at 1000 FIT and a
customer purchases 10 systems, what is the overall MTBF in days?

 The failure rate is: λ = 100 × 1000 × 10 = $10^6$ FIT

 MTTF = $10^9 / \lambda$ = 1000 hours ≈ 42 days
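The FIT and MTTF arithmetic from the reliability slides can be scripted directly. This is a minimal sketch; the function names are illustrative.

```python
HOURS_PER_FIT_BASE = 1e9  # 1 FIT = 1 failure per 10^9 device-hours

def failure_rate_fit(failures, devices, hours):
    """Failure rate in FIT from an observed number of failures."""
    return failures * HOURS_PER_FIT_BASE / (devices * hours)

def mttf_hours(fit_rates):
    """MTTF in hours for components whose FIT rates add up."""
    return HOURS_PER_FIT_BASE / sum(fit_rates)

# Slide example: 500 failures, 10^6 components, 1000 hours
print(failure_rate_fit(500, 1e6, 1000), "FIT")

# Quiz: 10 systems x 100 chips, each chip rated at 1000 FIT
chip_rates = [1000] * (100 * 10)
hours = mttf_hours(chip_rates)
print(f"MTTF = {hours:.0f} hours = {hours / 24:.1f} days")
```

Failure rates simply add across components, so a larger installed base shortens the time to the first expected failure.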



Reliability Bathtub Curve
 Soon after birth, systems with weak components tend to fail
 Systems should be aged past infant mortality before shipping
 Aging: Transistors change over time as they wear out
– Hot carriers (HC)
– Negative bias temperature instability (NBTI)
– Time-dependent dielectric breakdown (TDDB)
 Causes threshold voltage changes, more on this later…



Accelerated Lifetime Testing
 Expected reliability typically exceeds 10 years
 But products come to market in 1-2 years
 Accelerated lifetime testing is required to predict adequate
long-term reliability
 Wearout depends exponentially on voltage and temperature
– Aging is accelerated by stressing the part at the “burn-in” corner
– Burn-in corner: 125 to 140 °C externally and 1.3–1.7× nominal VDD



Hard Errors
 Hard errors: Cause permanent failure
 Oxide wearout:
– Hot carriers (HC): High energy carriers damage the oxide
– Negative bias temperature instability (NBTI): Traps develop at
the Si-SiO2 interface when strong negative bias is applied to
PMOS at elevated temperature
– Time-dependent dielectric breakdown (TDDB): Gate leakage
current gradually increases over time
 Interconnect wearout:
– Electromigration: High current density causes metal to migrate
over time. Cu wires can tolerate higher currents than Al.
– Self-heating: Hot wires have higher resistance and delay and are
more prone to electromigration. Wires may even melt.



Overvoltage Failure
 Tiny transistors in DSM nodes can be easily damaged
– Overvoltage at the gate accelerates oxide wearout and may
cause breakdown
• Breakdown occurs at 3 V in the 65 nm node
– Overvoltage at the drain leads to punch-through causing high
current flow and ultimately self-destructive overheating
 Overvoltage failure may be triggered by power supply transients or
electrostatic discharge (ESD)
 I/O cells use thick-oxide transistors with longer channels to endure
higher voltages



Soft Errors
 Soft errors: Cause a system crash or data loss
 Soft errors are random nonrecurring errors triggered by radiation
striking a chip
– E.g., radiation may cause a DRAM bit to flip
– The problem is worse for DSM nodes because the capacitances
are smaller, i.e., it is easier to flip a bit with a small amount of
charge
 Memories use error detecting and correcting codes to tolerate soft
errors
 The problem is much worse at aircraft flight altitudes
– Radiation-hardening techniques aim at designs that can tolerate
a larger amount of radiation



Latchup
 Ordinarily, both parasitic bipolar transistors are OFF.
 If substantial current flows in the substrate:
– Vsub ↑ → turning ON the npn transistor
– Vwell ↓ → turning ON the pnp transistor
 A positive feedback loop is triggered with a large current flowing
between VDD and GND

Find the mistake



Latchup
 Latchup prevention is easily accomplished
by minimizing Rsub and Rwell
– Use MANY substrate/well taps close to
each transistor
 I/O pads are more prone to latchup
because external voltages can ring below
GND or above VDD
– Guard rings should be used (a low
resistance path to collect stray currents)
 Latchup is not important for:
– SOI processes (no parasitic BJT)
– DSM nodes with VDD < 1.4 V (not enough
voltage to turn on two BJTs)



Thank you!

