FIT Rate Aware EM Analysis: Govind Saraswat, William Au, Qing He and Subramanian Venkateswaran
1 All authors are with Oracle Inc; correspondence at govind.saraswat at oracle dot com

Abstract: We present a novel method for performing Electromigration (EM) verification for VLSI interconnects. It provides an accurate metric of EM reliability for the entire design, so it is not affected by the inherent pessimism of the current state-of-art methods used by the industry. In this method, failure rates are computed for each wire interconnect and accumulated in the design. It then becomes possible to determine the overall failure rate margin (the real pass/fail criteria). This paper delves into the relationship between EM current density, technology parameters and the failure rate for each interconnect, which culminates in the EM reliability equation. This equation is used to calculate the failure rate for each interconnect.

I. INTRODUCTION

Electromigration (EM) is a physical phenomenon in which metal atoms migrate along the direction of electron flow under an applied electric field [1]. This gradual migration results from momentum transfer caused by the random bombardment of the conducting electrons, as shown in Figure 1. EM is emerging as a significant problem in modern integrated circuits. The rapid increase in total wire length and current densities, combined with decreasing wire widths, is making it difficult to guarantee EM reliability [2]. As a result, EM sign-off will become increasingly difficult and will require more design iterations. Reliability of an IC is measured in FIT (Failure-In-Time) rate, where 1 FIT means 1 failure in 1 billion device hours. Traditionally, FIT rates are budgeted in advance across all wire segments somewhat uniformly (static-FIT) and a reference current limit for this fractional rate is imposed. The process of converting the product failure rate to a current limit on an individual metal segment is a form of abstraction. When the failure rate is completely abstracted away, it is impossible to determine exactly how much margin is left after each EM check. This leads to excess pessimism in the EM analysis, which in turn leads to over-design.

Fig. 1. Electromigration phenomenon: metal ions are displaced by conducting electrons.

II. THEORETICAL CONSIDERATIONS

The relation between the EM current in a wire segment and the corresponding failure rate, under given environment conditions, is outlined next.

A. Reliability Engineering Basics

Reliability engineering deals with managing the ability of a system or component to function properly within the target conditions over its planned lifetime. Reliability management of this kind requires analysis, measurements, and predictions of failures over time. Reliability can be measured as a probability distribution of cumulative failures as well as by failure rates. Here, Time to Failure (TTF) is the random variable whose probability distribution is studied for reliability. Four important stochastic functions are used to quantify reliability [3]:

• The probability that the unit will fail within time $t$ is called the Cumulative Distribution Function (CDF) or Failure Function and denoted by $F(t)$. It is defined as:

$$F(t) = P(TTF < t) \tag{1}$$

• Conversely, the probability that the unit will survive beyond time $t$ is called the Reliability Function and denoted by $R(t)$. It is defined as:

$$R(t) = P(TTF > t) = 1 - F(t) \tag{2}$$

• Also, the Probability Density Function (PDF), denoted as $f(t)$, is defined as:

$$f(t) = \frac{dF(t)}{dt} \tag{3}$$

• The failure rate, denoted as $\lambda(t)$, is the rate of change in failure probability over the survival probability at time $t$ and is given by:

$$\lambda(t) = \frac{dF(t)/dt}{R(t)} = \frac{f(t)}{R(t)} \tag{4}$$

$\lambda(t)$ is the PDF $f(t)$ conditioned on failure not having occurred by time $t$. One important quantity of interest in reliability analysis is the Average Failure Rate (AFR), which is given by

$$AFR(t) = \frac{1}{t}\int_0^t \lambda(t)\,dt \tag{5}$$

By some abuse of notation, let us represent this AFR with $\lambda$ only. Given there are no failures at time $t = 0$ we can solve
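As a numerical illustration of definitions (1) to (5), the following minimal sketch evaluates the four reliability functions and the AFR. It assumes a lognormal TTF with median `t50` and shape `sigma`; the distribution choice, the parameter values and all function names are illustrative assumptions, not taken from the paper. The AFR is integrated numerically and can be cross-checked against the standard identity that the integral of $\lambda$ from 0 to $t$ equals $-\ln R(t)$ when $R(0) = 1$.

```python
import math

# Assumption: TTF is lognormal with median t50 and shape sigma.
# All names and numeric values here are hypothetical.

def cdf(t, t50, sigma):
    """F(t) = P(TTF < t), definition (1)."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def reliability(t, t50, sigma):
    """R(t) = 1 - F(t), definition (2)."""
    return 1.0 - cdf(t, t50, sigma)

def pdf(t, t50, sigma):
    """f(t) = dF/dt, definition (3), in closed form for the lognormal."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - math.log(t50)) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def failure_rate(t, t50, sigma):
    """lambda(t) = f(t) / R(t), definition (4)."""
    return pdf(t, t50, sigma) / reliability(t, t50, sigma)

def average_failure_rate(t, t50, sigma, steps=20000):
    """AFR(t) = (1/t) * integral of lambda over [0, t], definition (5),
    evaluated with the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (failure_rate(0.0, t50, sigma) + failure_rate(t, t50, sigma))
    for i in range(1, steps):
        total += failure_rate(i * h, t50, sigma)
    return total * h / t

afr_numeric = average_failure_rate(5.0, 10.0, 0.5)
afr_identity = -math.log(reliability(5.0, 10.0, 0.5)) / 5.0
print(afr_numeric, afr_identity)
```

The two printed values should agree closely, which is a quick sanity check that (4) and (5) are implemented consistently.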
Fig. 2. Here the hierarchical FIT aware EM analysis is outlined. FIT is calculated for each library cell and then propagated up the hierarchy while
calculating total FIT for every block/cluster of the chip. Similarly cluster/block FIT is propagated up while calculating total FIT of the chip.
Fig. 3. FIT is calculated and accumulated for each block and then compared with the set target FIT rate for that block. Here it is assumed that the target
FIT rate for the entire chip is 10.
Further using (10), we get:

$$Z_{des} = Z_{ref} + \frac{E_a}{\sigma}\left(\frac{1}{T_{ref}} - \frac{1}{T_{des}}\right) \tag{19}$$

Now we can calculate the design FIT rate $\lambda_{des}$ corresponding to the operating temperature $T_{des}$ by using equations (6), (12) and (19):

III. STATE-OF-ART EM ANALYSIS

The main aim of EM analysis is to achieve a fixed target FIT rate ($\lambda_{chip}$) for the entire chip. Two strategies are predominantly used:

• Static FIT: Traditionally, FIT rates are budgeted in advance across all wire segments somewhat uniformly (static-FIT) and a reference current limit for this fractional rate is imposed for each wire.

• Critical Length Ratio: A critical length ratio ($LR_{crit}$) is chosen for the chip. Here it is assumed that the number of wires operating at the maximum allowed current density $j_{max,ref}$ is $LR_{crit}$. This corresponds to a failure rate of $\lambda_{ref}$, which depends on $LR_{crit}$ and is given by $\lambda_{chip} / LR_{crit}$. Thus for a given target failure rate $\lambda_{chip}$, we can calculate $\lambda_{ref}$ and thus calculate $j_{max,ref}$, or vice versa. But to know the optimum $LR_{crit}$ we have to know $j_{max,ref}$, and we need $LR_{crit}$ to find $j_{max,ref}$, which presents a classic 'chicken vs egg' problem. The wires which are not designed at $j_{max,ref}$ still contribute to the total failure rate. Thus, this method is very approximate in nature and does not provide the real picture. Furthermore, many design iterations are needed to find the optimum $LR_{crit}$.

Another strategy, proposed in [5], [6] and [7], gives an approximate value of the total reliability of the chip. There, wire segments are classified into discrete classes, and then the total length of wires of each class is calculated (similar to the Critical Length Ratio method). Thus the strategies which are used in industry are either overly pessimistic or very approximate. We next outline the new method for EM analysis.

IV. NEW METHOD

We propose a new FIT aware EM analysis where the failure rate is computed for each wire segment and accumulated in the design. With this method, we can determine the overall failure rate margin (the real pass/fail criteria). No assumptions are made on the reliability performance (as is the case in the Critical Length Ratio method), and thus an accurate picture is provided by this method. FIT aware EM analysis is implemented in a hierarchical manner as depicted in Figure 2.
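The hierarchical accumulation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Block` class, the block names and the per-wire FIT values are hypothetical, and simple summation of FIT rates assumes that wire failures are independent and that any single wire failure fails the chip.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    """A node in the design hierarchy (library cell, cluster, block or chip).

    wire_fits holds FIT rates of wire segments owned directly by this node;
    children holds instantiated sub-blocks. Both are hypothetical inputs.
    """
    name: str
    wire_fits: List[float] = field(default_factory=list)
    children: List["Block"] = field(default_factory=list)

    def total_fit(self) -> float:
        # FIT of this node = its own wires + FIT propagated up from children.
        own = sum(self.wire_fits)
        return own + sum(child.total_fit() for child in self.children)

# Hypothetical design: one library cell instantiated twice inside a cluster.
nand2 = Block("nand2", wire_fits=[0.001, 0.002])
alu = Block("alu", wire_fits=[0.01], children=[nand2, nand2])
cache = Block("cache", wire_fits=[2.0])
chip = Block("chip", wire_fits=[0.5], children=[alu, cache])

TARGET_FIT = 10.0  # example chip-level target, as in Figure 3
margin = TARGET_FIT - chip.total_fit()
print(f"chip FIT = {chip.total_fit():.3f}, margin = {margin:.3f}")
```

Because each cell's FIT is computed once and reused for every instance, the chip-level margin falls out of a single bottom-up pass rather than a per-wire pass/fail check.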
An EM analysis which is FIT aware also makes it easy to apply dynamic FIT budgeting and to take the thermal map of the chip into account. Dynamic FIT budgeting recognizes that not all wires require the same FIT target. It attempts to redistribute the remaining balance from wires that do not need more margin to wires that need it, thus reducing pessimism in the analysis. The use of the thermal map provides further relaxation to the EM analysis. Dynamic FIT budgeting happens during analysis instead of before it, by performing real time failure rate calculation and rate target budgeting for each block of the chip (see Figure 3). If total FIT of the chip is more than the target FIT rate (10, for example), target FIT rate for each block can be dynamically changed depending on the calculated FIT rate of the block.
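A minimal sketch of the contrast between static and dynamic budgeting is given below. The redistribution policy (each block keeps its calculated FIT and receives an equal share of the unused chip margin), the function names and the block values are assumptions chosen for illustration; the paper does not specify the policy at this level of detail.

```python
from typing import Dict, List, Optional

def static_violations(block_fits: Dict[str, float], chip_target: float) -> List[str]:
    """Blocks that fail when the chip target is split uniformly in advance."""
    budget = chip_target / len(block_fits)
    return [name for name, fit in block_fits.items() if fit > budget]

def dynamic_budgets(block_fits: Dict[str, float],
                    chip_target: float) -> Optional[Dict[str, float]]:
    """Re-derive per-block targets from the calculated FIT rates.

    Assumed policy: every block is granted its calculated FIT plus an equal
    share of the remaining chip-level margin. Returns None when the summed
    FIT already exceeds the chip target, since no redistribution can then
    make every block pass.
    """
    total = sum(block_fits.values())
    if total > chip_target:
        return None
    share = (chip_target - total) / len(block_fits)
    return {name: fit + share for name, fit in block_fits.items()}

# Hypothetical calculated FIT rates per block; chip target of 10 as in Figure 3.
block_fits = {"core": 4.0, "cache": 1.0, "io": 0.5, "noc": 0.5}
print(static_violations(block_fits, 10.0))
print(dynamic_budgets(block_fits, 10.0))
```

In this example the uniform budget of 2.5 per block flags `core`, even though the chip as a whole uses only 6 of its 10 FIT; the dynamic budgets sum to the chip target and leave every block passing.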
As mentioned earlier, the effect of temperature on failure rate is of great importance, and the thermal map of a chip is extremely useful in quantifying the actual failure rate of the device [8], [9]. We can use the method outlined in subsection E to relate changes in FIT rate to changes in the temperature of a wire. These changes are calculated using some of the parameter values provided by the foundry and plotted in Figure 4. Figure 4 clearly depicts the near-exponential dependence of failure rate on temperature when everything else is kept constant. A change of only around 7°C results in a change of two orders of magnitude in FIT rate. Thus the thermal map of a chip can be used for calculating an accurate FIT rate for each wire segment.
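The temperature scaling of the FIT rate can be sketched with the form of equation (19). Several assumptions are made here and are not taken from the paper: equations (6) and (12) are not available in this excerpt, so the sketch assumes a lognormal TTF with $F = \Phi(Z)$ and uses the standard identity AFR $= -\ln(1 - F)/t$; the Boltzmann constant is written explicitly, which presumes $E_a$ in (19) is expressed in temperature units; and all parameter values are hypothetical rather than foundry data.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def z_at_temperature(z_ref, ea_ev, sigma, t_ref_k, t_des_k):
    """Shift the standard normal variate from T_ref to T_des, following the
    form of equation (19); the explicit Boltzmann constant is an assumption."""
    return z_ref + (ea_ev / (K_BOLTZMANN_EV * sigma)) * (1.0 / t_ref_k - 1.0 / t_des_k)

def fit_from_z(z, lifetime_hours):
    """Average failure rate in FIT (failures per 1e9 device hours).

    Assumes F = Phi(z) for a lognormal TTF and AFR = -ln(1 - F) / t.
    """
    f = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return -math.log1p(-f) / lifetime_hours * 1e9

# Hypothetical values: reference variate, activation energy (eV),
# lognormal shape, temperatures in kelvin, and a 10-year lifetime in hours.
Z_REF, EA, SIGMA = -5.0, 0.9, 0.3
T_REF, T_HOT = 378.15, 385.15  # 105 C and 112 C
LIFETIME = 10 * 365 * 24

fit_ref = fit_from_z(Z_REF, LIFETIME)
fit_hot = fit_from_z(z_at_temperature(Z_REF, EA, SIGMA, T_REF, T_HOT), LIFETIME)
print(f"FIT at T_ref = {fit_ref:.3e}, FIT at T_ref + 7 K = {fit_hot:.3e}")
```

With a per-wire temperature taken from the thermal map, the same two functions give a per-wire FIT rate; the size of the change for a given temperature step depends strongly on the assumed activation energy and shape parameter, so the numbers printed here are not a reproduction of Figure 4.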
Fig. 5. Log-log graph of the number of interconnects plotted against the FIT rate of interconnects for two examples where (a) total FIT rate is less than 10 and (b) total FIT rate is greater than 10.
inaccuracies of the $LR_{crit}$ method.

of the FIT rate calculated for each wire segment.