Interconnect Limits on Gigascale Integration (GSI) in the 21st Century
Invited Paper
Twenty-first century opportunities for GSI will be governed in part by a hierarchy of physical limits on interconnects whose levels are codified as fundamental, material, device, circuit, and system. Fundamental limits are derived from the basic axioms of electromagnetic, communication, and thermodynamic theories, which immutably restrict interconnect performance, energy dissipation, and noise reduction. At the material level, the conductor resistivity increases substantially in sub-50-nm technology due to scattering mechanisms that are controlled by quantum mechanical phenomena and structural/morphological effects. At the device and circuit level, interconnect scaling significantly increases interconnect crosstalk and latency. Reverse scaling of global interconnects causes inductance to influence on-chip interconnect transients such that even with ideal return paths, mutual inductance increases crosstalk by up to 60% over that predicted by conventional RC models. At the system level, the number of metal levels explodes for highly connected 2-D logic megacells that double in size every two years such that by 2014 the number is significantly larger than ITRS projections. This result emphasizes that changes in design, technology, and architecture are needed to cope with the onslaught of wiring demands. One potential solution is 3-D integration of transistors, which is expected to significantly improve interconnect performance. Increasing the number of active layers, including the use of separate layers for repeaters, and optimizing the wiring network, yields an improvement in interconnect performance of up to 145% at the 50-nm node.

Keywords—Crosstalk, epitaxial growth, interconnections, modeling, scattering, technology forecasting, thin films, thin-film transistors, transmission lines, wafer bonding, wiring.

Manuscript received February 15, 2000; revised October 1, 2000.
J. A. Davis, R. Venkatesan, and J. D. Meindl are with the Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.
A. Kaloyeros and M. Beylansky are with the Center for Advanced Thin Film Technology and Department of Physics, The State University of New York (SUNY) at Albany, Albany, NY USA.
S. J. Souri, K. Banerjee, and K. C. Saraswat are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9505 USA.
A. Rahman and R. Reif are with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA.
Publisher Item Identifier S 0018-9219(01)02068-0.

I. INTRODUCTION

The International Technology Roadmap for Semiconductors (ITRS) projects that by 2011 over one billion transistors will be integrated into a single monolithic die [1]. The wiring system of this billion-transistor die will deliver power to each transistor, provide a low-skew synchronizing clock to latches and dynamic circuits, and distribute data and control signals throughout the chip. The resulting design and modeling complexity of this GSI multilevel interconnect network is enormous such that over 10 coupling inductances and capacitances throughout a nine-to-ten-level metal stack must be managed. A seminal paper [2] focuses on the transistor limits for a GSI system; therefore, this paper will address the limits that on-chip interconnects place on a GSI system design in the 21st century.

Interconnect limits potentially threaten to decelerate or halt the historical progression of the semiconductor industry because the miniaturization of interconnects, unlike transistors, does not enhance their performance. Scaling transistors to the nanometer regime is plagued with many challenges, such as drain-induced-barrier lowering (DIBL), quantum mechanical gate tunneling, mobility degradation, and reliability problems due to random placement of dopant atoms in a host silicon lattice [1], but once overcome, MOSFET channel scaling will enhance intrinsic gate delay [1]. For instance, scaling MOSFET channel length from 1000 to 100 nm to 35 nm dramatically reduces the intrinsic MOSFET switching time as seen in Table 1. Scaling interconnects into the nanometer regime is also plagued with many challenges, such as resistivity degradation, material integration issues, high-aspect ratio via and wire coverage, planarity control, and reliability problems due to electrical, thermal, and mechanical stresses in a multilevel wire stack [1], and once these challenges are overcome, minimum interconnect scaling will still degrade interconnect delay. For example, Table 1 also illustrates that the intrinsic interconnect delay of a 1-mm length interconnect at the 35-nm technology node overwhelms the transistor delay by two orders of magnitude.

A potential solution to this interconnect dilemma is to reverse scale longer semiglobal and global interconnects such that they have “fat” cross-sectional dimensions [3], [4]. This strategy enhances interconnect performance, but at the expense of wire density. For example, to balance the interconnect delay of a 1-mm interconnect length with the transistor switching delay, the wire size at the 35-nm generation must be almost five times larger than the minimum lithographic size as seen in Table 1. Because die area is directly related to cost, the area penalties of the reverse scaled strategies could hinder the exponential reduction in cost per function that has propelled semiconductor technology over the past several decades.

The central thesis of this paper is that in the 21st century opportunities for GSI will be governed in part by a hierarchy of physical limits on interconnects whose levels are codified as fundamental, material, device, circuit, and system [2], [6]. In Section II, fundamental limits are derived from the basic axioms of electromagnetic, communication, and thermodynamic theories. In Section III, material limits are determined by the transformation of bulk properties of metallic interconnects as they are scaled into the nanometer regime. In Section IV, device limits deal directly with the problems of interconnect miniaturization and provide a rationale for reverse-scaling strategies. New metrics for crosstalk with and without on-chip inductive effects are presented. At the circuit level in Section V, the impact of transistor driver output resistance on interconnect performance and crosstalk is investigated. Finally, in Section VI, system limits imposed by reverse-scaled multilevel interconnect networks are investigated using a compact wire-length distribution model to predict the wiring requirements of future GSI products. Wire area limits of reverse-scaled multilevel networks in a two-dimensional (2-D) planar transistor process are projected, and the opportunity for three-dimensional (3-D) integration of transistors is rigorously explored to help alleviate interconnect delay and density problems.

II. FUNDAMENTAL LIMITS

This discourse on interconnect limits begins through examination of several of the most basic principles that govern the physical world. The limits discussed in this section are immutable and are unchanged through the use of advanced materials, sophisticated device structures, inventive circuit techniques, or novel instruction set architectures. These limits, therefore, are defined as fundamental and will irrevocably limit interconnect performance, energy dissipation, and signal integrity in the 21st century.

A. Performance Limits

The role of GSI global interconnects is to transmit binary switching events that are generated from constituent computational elements. The fundamental limit, therefore, on interconnect performance is set by the shortest delay between a binary switching event in a transmitter and a binary transition detected at a receiver. To determine the shortest possible delay, the communication channel connecting the transmitter to the receiver is assumed to be a perfect noise-free lossless interconnect.

The maximum transmission speed is limited by the speed of an electromagnetic wave propagating in free space and is a well-known quantity derived from Maxwell’s equations [7]. Assuming that free space surrounds a lossless interconnect, then the Helmholtz equations, which are derived from Maxwell’s equations, describe the propagation of electric and magnetic fields. A key result obtained from the Helmholtz equation is that the free-space wave propagation speed is given by

$c_0 = 1/\sqrt{\mu_0 \varepsilon_0}$    (1)

where $\mu_0$ and $\varepsilon_0$ are, respectively, the permeability and the permittivity of free space. The latency $\tau$ in communicating a binary transition event from the transmitter to the receiver must be greater than

$\tau > L / c_0$    (2)

where $L$ is the transmission distance.

This fundamental limit is clearly represented in the reciprocal length squared versus time delay plane as seen in Fig. 1 after [2]. The region to the left of the line with a slope of negative two in logarithmic scaling in this plane is a forbidden region of interconnect operation.
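As a minimal numerical sketch of the time-of-flight bound in (2), the following Python snippet evaluates the free-space propagation speed as one over the square root of the product of the free-space permeability and permittivity, and then the smallest possible latency as the transmission distance divided by that speed. The helper names `free_space_speed` and `min_latency_s` and the sample interconnect lengths are illustrative assumptions and do not come from the paper; the constants are standard SI values.

```python
import math

# Standard SI values; MU_0 uses the classical value 4*pi*1e-7 H/m.
MU_0 = 4.0e-7 * math.pi      # permeability of free space, H/m
EPS_0 = 8.8541878128e-12     # permittivity of free space, F/m


def free_space_speed() -> float:
    """Free-space wave propagation speed, 1/sqrt(mu_0 * eps_0), in m/s."""
    return 1.0 / math.sqrt(MU_0 * EPS_0)


def min_latency_s(length_m: float) -> float:
    """Lower bound on latency for a lossless interconnect in free space.

    Hypothetical helper: returns length / c0, the time-of-flight limit.
    """
    if length_m < 0:
        raise ValueError("length must be non-negative")
    return length_m / free_space_speed()


if __name__ == "__main__":
    # Illustrative lengths only; they are not taken from the paper.
    for length_mm in (1.0, 10.0, 30.0):
        tau = min_latency_s(length_mm * 1e-3)
        print(f"L = {length_mm:5.1f} mm -> tau_min = {tau * 1e12:.2f} ps")
```

Under these assumptions a 1-mm interconnect cannot deliver a transition in less than roughly 3.3 ps, regardless of material or circuit technique; a real on-chip dielectric would only increase this bound.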
where
maximum channel capacity measured in bits/s;
average signal power of the input;
Johnson thermal noise power delivered to a matched load [8];
bandwidth of the receiver;
Boltzmann’s constant (1.38 × 10^-23 J/K);
temperature (300 K) [8].

Assuming that the average energy per bit is , then solving for in (3) gives

(4)

Setting the derivative of (6) equal to zero or and employing L’Hospital’s rule gives

(7)

Note that is tantamount to calculating the energy transfer of an infinitely long bit or a single binary transition. If the energy transferred during a binary transmission on an interconnect is less than , then the binary transition cannot be differentiated from thermal noise regardless of advanced error-correcting encoding techniques.

This energy also sets a lower limit on low-swing interconnect buses. In the limit, the smallest swing of an interconnect

[V] (5)

C. Noise Limits

In digital circuits an important metric of a binary transition is its potential swing, and in the presence of thermal noise this potential is perturbed from its nominal value. The best metric for this perturbation is the standard deviation of thermal noise voltage across a resistor, which is derived by Nyquist [2] to be

(6)

III. MATERIAL LIMITS

Device feature sizes are crossing a critical physical threshold below which the performance of extremely narrow interconnect lines is controlled primarily by: 1) the properties of their surfaces and interfaces, as driven by one- and two-dimensional scattering effects; and 2) the characteristics of their impurity and defect densities, as governed by the type and distribution of grain boundaries, dislocations, and junctions. This transition represents a major show stopper in the successful development of the material and process (M&P) technologies necessary to ensure maximum signal transmission in sub-50-nm device nodes through reduced resistance capacitance ($RC$) time delay. In particular, the physics of resistivity behavior in extremely fine conductor lines represents a daunting and potentially insurmountable challenge that needs to be understood and resolved in order to ensure the extendibility of today’s chip architecture below the 50-nm device node.

In this respect, the resistivity of thin-film conductors is given by [9], [10]

$\rho(\text{thin film}) = \rho(\text{thermal}) + \rho(\text{extrinsic})$    (8)

where $\rho(\text{thermal})$ is the contribution due to electron–phonon “coupling” (i.e., electronic interactions with thermally induced lattice vibrations), and $\rho(\text{extrinsic})$ is the contribution
(16)

where is a th-order modified Bessel function, is the interconnect length, is time, , , and are the distributed inductance, resistance, and capacitance per unit length, respectively, is the reflection coefficient at the source, is the current reflection number given by , the notation is defined as the decimal truncation of (i.e., ), and and are determined to obtain the desired accuracy of solution (in the limit they both go to infinity) [23].

Using a near wave-front approximation to (15) and a distributed $RC$ model in [24], the 50% time delay of a single

(17)

B. Crosstalk Limits

1) Resistance and Capacitance ($RC$) Effects: Even in high-speed GSI multilevel interconnect networks, distributed $RC$ models are still needed to determine the transient behavior of local and semiglobal interconnects and, therefore, are used to investigate the limits on crosstalk for shorter high-speed interconnects. Local interconnects, which make up the majority of on-chip interconnects [25], will continue to scale to minimum feature size dimensions to maximize wire density. An existing distributed $RC$ interconnect model with a step-response excitation voltage predicts that the peak crosstalk (at the load of the quiescent line) between the two parallel wires is length, scaling, and material independent for homogeneous dielectrics [24].
(19)
(20)
(23)

where
voltage along the active line;
voltage along the quiescent line;
self-inductance of each line;
mutual inductance between the lines.

Empirical expressions for the capacitance [27] and inductance matrices [28] are used for parasitic estimation. The transient response along the quiescent line is calculated using the compact distributed expression

(24)

where is defined in (15).

Effects of mutual inductance pose significant limitations on peak crosstalk reduction. Using (22) and (24), Fig. 12 shows the length dependence of crosstalk with and without the inclusion of inductance on two coupled lines with negligible source impedance ( ). Using the distributed $RC$ models with a step-response voltage in [24], the crosstalk is length independent; however, with the inclusion of inductance a strong nonlinear length dependence of crosstalk emerges as seen in Fig. 12. For , the distributed crosstalk is roughly 60% higher than that predicted by $RC$ models. The expression for this maximum crosstalk voltage with the inclusion of inductance, which is derived from (24), is given by [23]

(25)

The peak crosstalk is approximately times larger than predicted by a distributed $RC$ model in [24].

To help control crosstalk, gigahertz interconnect network ground planes or dedicated ground wires may be necessary for the suppression of unpredictable crosstalk caused by inductance. For distributed and high-speed global interconnects, (25) reveals that providing ground planes sufficiently close to interconnect structures can be an effective strategy for controlling crosstalk. For local, semiglobal, and global interconnects, further reduction in crosstalk can be achieved by increasing wire spacing.

V. CIRCUIT LIMITS

To gain insight into interconnect circuit limits, simple models that retain only the essence of the problem under attack are engaged. To this end, a transistor is modeled as an equivalent resistance in series with an ideal voltage source that drives an active interconnect in isolation or in proximity to an identical quiescent wire. In addition, the limits to reducing circuit delay and crosstalk are determined through the use of ideal current return paths for each interconnect structure. Such assumptions clearly elucidate the effects of source resistance on interconnect performance and crosstalk. The key conclusion of this section is that transistor output resistance exacerbates interconnect circuit delay and crosstalk.

A. Circuit Delay Limits

The effects of delay can be approximated using a near wave-front approximation to a Bessel function expansion similar to (15) and a distributed $RC$ model after [24]. Uniting these two models and assuming that the wire capacitance dominates the transistor input capacitance ( ), the approximate time for the transient voltage of an interconnect load to reach is given by

(26)

where and is the equivalent transistor output impedance. The 90% (i.e., ) interconnect latency limit for a very “fat” global wire ( ) is given by

(27)
describes the input–output (I/O) requirements of arbitrarily sized megacells. The wiring distribution of a 2-D megacell is based upon Rent’s Rule [29] and is given in [25].

The complete wiring distribution along with interconnect performance and noise models are used to construct the architecture of a GSI multilevel wiring network. In this network it is assumed that interconnects on adjacent metal levels in a multilevel network are routed orthogonally. The wire dimensions on each orthogonal wiring pair are calculated to ensure that the latency of the longest interconnect does not exceed 90% of the clock period, and each pair of levels is occupied with interconnects by equating the required interconnect area to the available interconnect area. To determine the absolute limits on system signal integrity, it is assumed that ultrahigh-speed designs have low-impedance ground planes that are inserted between each orthogonal pair of wire levels to control the vast number of coupling inductances in an unshielded GSI multilevel interconnect network.

This stochastic wiring distribution is used to illustrate the limitations of historical approaches to microprocessor and ASIC design. Starting with the assumption that one million highly connected logic gates are contained in a logic megacell for 1999, the number of metal levels is projected over the next 15 years by doubling the number of highly connected logic gates in a megacell every two years. Logic megacell areas for projected designs are calculated by using the projected transistor densities, minimum feature size, and clock frequencies outlined in the ITRS [1]. As seen in Fig. 15, the number of required metal levels approaches unrealistic values beyond 2005. In fact, the number of projected levels at 2014 is almost an order of magnitude larger than the number of levels prescribed by the ITRS at 2014. As an alternative to Moore’s Law scaling, for example, Fig. 15 also shows that saturating the maximum number of highly connected gates at a value around 10 M keeps the number of metal levels per megacell to a controllable number through 2014. Without significant changes to traditional microprocessor or ASIC 2-D transistor technologies, design methodologies, or architectures, Fig. 15 suggests that interconnect limits could undermine Moore’s law.

B. 3-D Integration Opportunities

Interconnect delays are increasingly dominating IC performance due to increases in chip size and reduction in the minimum feature size [30]. In spite of new materials like Cu with low-k dielectric, interconnect delay is expected to be substantial below the 130-nm technology node, thereby severely limiting chip performance [31]. Therefore, the need exists for alternative technologies to overcome this problem. One such promising technique is 3-D ICs with multiple active Si layers. 3-D integration (schematically illustrated in Fig. 16) to create multilayer Si ICs is a concept that can significantly alleviate interconnect delay problems, increase transistor packing density, and reduce chip area. Each Si layer in the 3-D structure can have multiple layers of interconnect. Each of these layers is connected together with vertical interlayer interconnects (VILICs) and common global interconnects as shown schematically in Fig. 16. In a 3-D structure a large number of long horizontal interconnects commonly used in 2-D structures can be replaced by short vertical interconnects. Additionally, the 3-D architecture offers extra flexibility in system design, placement, and routing. For instance, logic gates on a critical path can be placed very close to each other using multiple active layers. This would result in reduced chip footprint leading to a significant reduc-
The value of is estimated such that the total number of point-to-point interconnects in a 2-D or 3-D IC is conserved. is estimated by taking into account the equidistant gate pairs located within a device layer and between device layers [33]. is estimated by applying Rent’s rule where the source and sink gate pairs, connected by a wire, can be located on the same or different device layers [33]. In our analysis, two limiting cases of the 3-D wire-length distribution are considered. In the symmetric interconnection scheme, for any source logic gate, the sink logic gate can be located on the same or other device layers, and there is a comparable number of interconnections between gate pairs on the same and different device layers. In the asymmetric interconnection scheme, we assume the number of interconnections between the logic gates on different device layers is negligible compared to the number of interconnections within the device layers.

The wire-length distributions for homogeneous random logic networks in 2-D and 3-D ICs are shown in Figs. 18 and 19. In a 3-D IC, as more device layers are added, the wire-length distribution becomes narrower, resulting in fewer and shorter semiglobal and global wires. In both 3-D interconnect schemes, the average and total wire lengths are shorter. However, a symmetric interconnection scheme results in shorter average and total wire lengths compared to an asymmetric interconnection scheme.

b) Simulation results: Using the wire-length distribution and the interconnect delay criteria, some interesting tradeoff analysis can be performed between 2-D and 3-D ICs. For example: 1) chip area can be estimated for fixed clock frequency; 2) clock frequency can be estimated for fixed chip area; or 3) number of interconnect levels can be estimated for fixed chip area and clock frequency. Simulation results of some of these tradeoff analyses are presented here.

To estimate the clock frequency, we use a critical path model that has a logic depth of 15. The logic gates are approximated by NAND gates with fan-in and fan-out of three. We assume all the logic gates drive average length wires, while one logic gate drives a chip-edge length wire [4]. We assume the chip area is interconnect limited, and it is estimated by equating the available chip area with the required chip area [34]. The available chip area is a function of the number of device layers, the chip/die size, total number of interconnect layers, and the wiring efficiency in each interconnect layer. The required chip area is the product of the wiring pitches and the total wire length of local, semiglobal, and global wires. The wiring efficiency model presented in [4] can be extended to estimate the wiring efficiency of 3-D ICs.

To make a fair comparison between different 2-D and 3-D technologies, we introduce a cost/complexity function. We define a cost function, c.f. , where is the number of interconnect levels per device layer, and is the number of interdevice layer bonding steps, and is the number of device layers. For example, in a 2-D IC c.f. = 6 implies that there are six interconnect levels. For the same cost function in a 3-D IC with two device layers, there are five interconnect levels/device layer and one bonding step.

The input parameters of our analysis are presented in Table 3. These parameters are consistent with the technology requirement for microprocessors in the 0.18-μm technology node [37]. The clock frequency is estimated by keeping the total chip area, $N_z A_c$, fixed and applying the cost constraint. The simulation results are shown in Fig. 20. The improvement in clock frequency in a 3-D IC results from the reduction in interconnect delay of the average length and chip-edge length wires due to their shorter wire lengths and larger wiring pitch. The total wire length in a 3-D IC is shorter than that of a 2-D IC. Since the wiring area is proportional to the product of the wiring pitch and the total wire length, for comparable available wiring area, the wiring pitch in a 3-D IC can be increased to reduce the interconnect delay. In a 3-D IC, due to the constant cost function, c.f. , fewer interconnect levels per device layer are available as more device layers are integrated. Wiring area is also reduced due to the via blockage of VILICs. Based on our modeling approach, there is an optimum number of device layers that can be integrated profitably to improve the clock frequency. For the example being considered, it appears to be three to four.

To estimate the impact of 3-D integration on chip area, another set of tradeoff analyses can be performed. In this case the clock frequency and the cost function are kept constant, and the total chip area is estimated. The required chip area of 2-D and 3-D ICs for 450-MHz clock frequency and c.f. = 6 is shown in Fig. 21. Assuming the interconnect delay is proportional to - , for similar interconnect delay constraint, since the wire length in a 3-D IC is shorter, the wiring pitch can be reduced. Both the shorter wire length and the flexibility to reduce the wiring pitch for fixed clock frequency constraint lead to the lower chip area in a 3-D IC.

Fig. 21. Simulation results of total chip area for fixed clock frequency and cost constraint. The total chip area is given by $N_z A_c$, where $N_z$ is the number of device layers and $A_c$ is the die area.

The analysis presented so far was for a 180-nm 3-D technology for a fixed cost function. Next we extend this analysis to study the effect of scaling the technology to smaller feature size, increasing the number of available metal layers and active Si layers. In the next set of analyses, the 3-D interconnect scheme being considered is shown in Fig. 16(b). However, similar analysis can be carried out for other approaches to 3-D integration as well.

Interconnect delay as a function of technology is calculated (Fig. 22) using data projected by the NTRS for 2-D ICs. Also shown are delays for 3-D ICs with two active layers, where wire pitches are increased to match the 2-D IC areas, calculated using the 3-D chip area estimation model described above. Interconnect delay is reduced by 64% as a result. In all these calculations the number of metal levels is conserved between 2-D and 3-D ICs. This assumption can be relaxed such that each active layer in 3-D ICs may have its own associated lower metal tiers with a universal global tier used for connecting the active-layer networks. The total number of metal layers is thus increased in this 3-D case.

Fig. 22. Interconnect delay limits IC performance with scaling. Moving repeaters to upper active tiers reduces interconnect delay by 9%. 3-D (two active layers) shows significant delay reduction (64%). Increasing the number of metal levels in 3-D reduces interconnect delay by a further 35%.
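The constant-cost constraint used in the tradeoff analyses above can be made concrete with a short sketch. It rests on an assumption that is not stated explicitly in the text: that the cost function is the sum of the interconnect levels per device layer and the number of bonding steps, with one bonding step for each device layer beyond the first. This reading reproduces both worked examples (six levels for a 2-D IC; five levels per device layer plus one bonding step for two device layers) and the remark that fewer levels per device layer are available as device layers are added, but it remains an assumption of this sketch. The names `StackConfig` and `configs_for_cost` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StackConfig:
    device_layers: int     # number of device layers
    bonding_steps: int     # ASSUMED: device_layers - 1
    levels_per_layer: int  # interconnect levels per device layer


def configs_for_cost(cost: int, max_device_layers: int) -> List[StackConfig]:
    """Enumerate 2-D/3-D stacks that share one cost value.

    ASSUMPTION (not confirmed by the source):
        cost = levels_per_layer + bonding_steps
        bonding_steps = device_layers - 1
    """
    configs = []
    for layers in range(1, max_device_layers + 1):
        bonds = layers - 1
        levels = cost - bonds
        if levels < 1:
            break  # no interconnect levels left for this many layers
        configs.append(StackConfig(layers, bonds, levels))
    return configs


if __name__ == "__main__":
    for cfg in configs_for_cost(cost=6, max_device_layers=4):
        print(cfg)
```

With a cost of 6, the sketch lists one device layer with six levels, two device layers with five levels each and one bonding step, and so on, which illustrates why the per-layer wiring resources shrink as layers are added under a fixed cost.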