POWER DISTRIBUTION
NETWORK DESIGN
FOR VLSI

QING K. ZHU
Intel Corporation
Matrix Semiconductor Inc., U.S.A.

A JOHN WILEY & SONS, INC., PUBLICATION


Copyright © 2004 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax
(978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should
be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representation or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a professional where appropriate. Neither the
publisher nor author shall be liable for any loss of profit or any other commercial damages, including
but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication is available.

ISBN 0-471-65720-4

Printed in the United States of America.


CONTENTS

Preface

1 Introduction
1.1 Power Supply Noise
1.2 Power Network Modeling
1.3 Modeling of Switching Currents
1.4 On-Chip Decoupling Capacitance
1.5 On-Chip Inductance
1.6 Process Scaling Impacts
1.7 Summary

2 Design Perspectives
2.1 Planning for Communication Chips
2.2 Planning for Microprocessor Chips
2.3 IBM CAD Methodology
2.4 Design for IR Drop
2.5 Package-Level Methodology
2.6 Summary

3 Electromigration
3.1 Basic Definitions and EM Rules
3.2 EM Analysis Tool
3.3 Full-Chip EM Methodology
3.4 Summary

4 IR Voltage Drop
4.1 Causes of IR Drop
4.2 Overview of IR Analysis
4.3 Static Analysis Approach
4.4 Dynamic Analysis Approach
4.5 Circuit Analysis with IR Drop Impacts
4.6 Summary

5 Power Grid Analysis
5.1 Introduction
5.2 Executing the Tool
5.3 Advanced Static Analysis
5.4 Dynamic Analysis
5.5 Layout Exploration
5.6 Summary

6 Microprocessor Design Examples
6.1 Intel IA-32 Pentium-III
6.2 Sun UltraSPARC
6.3 Hitachi SuperH Microprocessor
6.4 IBM S/390 Microprocessor
6.5 Sun SPARC 64b Microprocessor
6.6 Intel IA-64 Microprocessor
6.7 Summary

7 Package and I/O Design for Power Delivery
7.1 Flip-Chip Package
7.2 Simultaneous Switching Noise (SSN)
7.3 Case Study of a Microprocessor-Like Chip
7.4 Power Supply Measurement
7.5 I/O Pads for Power/Ground Supplies

Glossary

References

Index

PREFACE

This book provides detailed information on power distribution
network design in integrated circuit chips. Power distribution
network design is a critical part of circuit design and physical
integration for high-speed chips.
The IR drop and di/dt noise associated with power distribution
networks are crucial to circuit timing and performance. Because of
the complexity of the millions of gates and interconnects in
modern VLSI chips, power network analysis is accomplished using
CAD tools. These tools take the layout database, usually in
GDSII files, extract the RC parasitics of the power distribution
network, and model the current consumption of the switching
devices.
A fast circuit simulation is done for the electrical model of the
power distribution network in order to determine the IR drop or
other supply voltage noises, as well as the current density of met-
al power lines for checking electromigration failures.
In addition, the decoupling capacitors are inserted into the
power network for stabilizing the supply voltages in local regions
where current surges occur from time to time due to clock and log-
ic operations. The decoupling capacitors and power distribution
networks are required in some optimal form not only on-chip, but
also on the package and at system levels.
This book will explain the design issues, guidelines, examples,
and CAD tools for the power distribution of the VLSI chip and
package. The user guide of the VoltageStorm™ tool from Cadence
Design Systems, Inc. is referred to throughout [51], together with
the author’s experience using this tool in designs.
The book is organized into seven chapters. Chapter 1 is an in-
troduction to the power supply network, power network modeling,
decoupling capacitors, and process scaling trends. Chapter 2 illus-
trates the design perspectives for the power distribution network,
including power network planning, layout specifications, decou-
pling capacitance insertion, modeling and analysis of power net-
works, and IR drop analysis and reduction. Chapter 3 explores
electromigration phenomena for the on-chip power distribution
network.
Chapter 4 discusses IR drop analysis methodology. It is taken
primarily from the VoltageStorm™ tool, using both static and
dynamic analysis methods. The static method performs some level of
worst-case IR drop analysis without knowledge of the input
vectors at the chip’s primary inputs. Chapter 5 describes the
commands and user interfaces of the VoltageStorm™ tool from
Cadence Design Systems, Inc. [51]. Chapter 6 lists microprocessor
design examples, with a focus on on-chip power distribution.
Readers will gain insight into industry chip design for power
distribution networks from these examples.
Chapter 7 discusses the flip-chip and package design issues,
since the package is a part of the global power distribution. A case
study has been provided in this chapter for selecting the package
options, based on the performance requirements for the power
supply. Power network measurement techniques from silicon are
also discussed at the end of Chapter 7.
A glossary of key words and basic terms is provided at the end
of the book to help understand the basic concepts in VLSI design
and power distribution.
With continually decreasing supply voltages and increasing
on-chip transistor switching currents, on-chip power supply noise
remains a challenging issue for high-performance chip design.
More research will be needed in CAD tools for switching current
modeling and accurate power network analysis. The design
methodology for power delivery will need to consider performance,
layout area, and package technology optimization for future chips.
The author would like to thank Mr. George J. Telecki at John
Wiley & Sons, Inc. for providing the chance to get this book
published. He also thanks his co-workers at Intel Corporation,
including David Ayers, Alex Waizman, and Bendik Kleveland. Finally,
he appreciates the strong support from his family members,
including his wife Huiling Song and two sons, Phillip and Michael.

1
INTRODUCTION

As the power supply voltage continues to drop with VLSI technology
scaling, and the number of devices on a die increases
significantly, power network design becomes a very challenging
task for a chip with millions of transistors. The common
task in VLSI power network design is to provide enough power
lines across the chip to reduce the voltage drops from the power
pads to the center of the chip. The voltage drops are mainly
caused by the resistance and inductance of the power network
metal lines.
The power network can be modeled as a low-pass filter with RL
segments in series, attached with capacitors at each end. The cur-
rent sources of the switching gates and the intentional decoupling
capacitors are also inserted in the model. The IR drop is propor-
tional to the average current consumed by the circuit in the chip.
The L · di/dt drop is proportional to the time-domain change of
the current, due to the switching of logic gates in the chip opera-
tions.
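As a rough numerical illustration of the two noise terms described above, the following sketch evaluates an IR drop and an L · di/dt drop for a single supply path. The function names and all of the numbers (10 mΩ, 0.5 nH, 5 A average current, a 0.1 A current step in 1 ns) are illustrative assumptions, not values taken from this chapter.

```python
def ir_drop(resistance_ohm, avg_current_a):
    """Resistive drop: proportional to the average current."""
    return resistance_ohm * avg_current_a


def ldidt_drop(inductance_h, delta_current_a, delta_time_s):
    """Inductive drop: proportional to the rate of change of the current."""
    return inductance_h * delta_current_a / delta_time_s


# Hypothetical supply path: 10 mOhm, 0.5 nH, 5 A average, 0.1 A step in 1 ns.
v_ir = ir_drop(0.010, 5.0)
v_l = ldidt_drop(0.5e-9, 0.1, 1e-9)
print(f"IR drop = {v_ir * 1e3:.1f} mV, L*di/dt drop = {v_l * 1e3:.1f} mV")
```

Even a small inductance can produce a drop comparable to the resistive one when the current changes within a nanosecond, which is why both terms appear in the model.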
This chapter is organized into seven sections. Section 1.1 dis-
cusses the general trend of power supply noise with the process
technology scaling. Section 1.2 shows the modeling methodology
for on-chip power networks. Section 1.3 discusses the switching
current modeling methodology for the power distribution net-
work, which is critical for the accuracy of power grid analysis.
Once we obtain the models, the power network can be character-
ized as a linear network with R, L, C, and current sources, in or-
der to solve the voltage distributions across the power network.


Section 1.4 discusses a special topic in power network design:
decoupling capacitor optimization, which allocates enough
decoupling capacitance between the Vdd and Vss nets without
over-allocating it and thereby enlarging the die area. Section 1.5
discusses the effects of on-chip inductance on power network
modeling. We show the metal configurations used in power line
design in order to minimize the inductance delay. In general, many
narrow Vdd and Vss lines interleaved with each other are preferred
in the power distribution network, because they minimize the area
of the return current loop and hence the on-chip inductance.
Section 1.6 discusses the impact of process technology scaling on
future power network design, considering two scenarios.
Section 1.7 summarizes this chapter.

1.1 POWER SUPPLY NOISE

Noise problems in microprocessor power distribution networks
have been discussed in the literature [1, 2, 3, 4, 5, 6]. The supply
voltage is continually dropping in microprocessor designs, both to
reduce power consumption and to match the reduced gate oxide
thickness of scaled IC process technology generations. Figure
1-1(a) shows the supply voltage trend in new technologies, and
Figure 1-1(b) shows the gate oxide thickness reduction with
process scaling.
The on-chip decoupling capacitor is constructed from dummy
transistors, with the gate connected to Vcc and the drain and
source connected to Vss. A conventional method for on-chip
decoupling capacitance allocation reserves a fixed percentage
(e.g., 10%) of the area in each layout window (e.g., 100 × 100 µm)
for the decoupling capacitance.
The decoupling capacitors are inserted near the large-size
buffers, such as clock buffers or phase-locked loops. The conven-
tional method, based on the layout area percentage, is not opti-
mal, either being overestimated for a large layout area or under-
estimated for meeting the power noise requirements.
The power distribution design techniques used for DEC Alpha
chips, such as the C4 package and on-chip power planes, can be
found in [1]. The decoupling capacitance optimization technique,
based on the layout floor plan graph and path-finding algorithm,
can be found in [2]. The power network modeling and analysis
techniques for PowerPC microprocessors can be found in [3]. A
power network modeling and simulation CAD tool is described in
[4].

Figure 1-1. Power supply (a) and gate oxide scaling (b) trends. [Plots of (a) supply voltage (V) and (b) gate oxide thickness (Å) versus minimum feature size (0.25, 0.18, 0.13, and 0.1 µm).]

The reliability problems (e.g., electromigration) and a CAD tool
for the power network are discussed in [5]. The basics of VLSI
power distribution can be found in [6]. A high-performance power
network scaling model and a decoupling capacitance optimization
method are proposed in [7]. A criterion for including inductance
in on-chip interconnect modeling is discussed in [8]. VLSI design
basics relevant to power network design, such as metal sizing
equations, can be found in [9]. Interconnect scaling issues in
deep-submicron processes can be found in [10].

1.2 POWER NETWORK MODELING

The layout and C4 package of a high-performance microprocessor
power network are illustrated in Figure 1-2. It is a five-metal
process, and M5 and M4 (the top two metal layers in this process)
are used for the full-chip power distribution, although signal lines
can still be routed in the spaces between the power lines in
these top metal layers. Note that the local power networks are not
shown in Figure 1-2; they are routed on lower metal layers to
deliver the power to the circuits.
The on-chip power lines are modeled as RLC segments, as
illustrated in Figure 1-3. Rvcc and Lvcc are the unit-length resistance
and unit-length inductance (self and mutual) of the power line,
multiplied by the line length between two nodes in the power grid.
Rd and Cd are the series resistance and capacitance used
to model the decoupling capacitor that is implemented by the
dummy transistors. Is is the time-varying switching current of the
devices. Rs and Cs represent the turn-on resistance and the
capacitance load of the devices connected at the power grid nodes
(AC, BD, etc.).
The model in Figure 1-3 contains only the linear elements such
as R, L, C, and current sources. It suggests to us that a linear cir-
cuit simulator can be used to speed up the large-size microproces-
sor power network analysis based on the proposed model. The key
parameters of decoupling capacitors (dummy transistors) are
Cdecap and Rdecap, as shown in Figure 1-4.
The charge stored in Cdecap supplies Csw (the switching gates)
and helps stabilize the supply voltage before charge eventually
arrives from the supply voltage source via the long current loop
through the package.
To improve the efficiency of the decoupling capacitors, Rdecap
needs to be sufficiently small. When Vcc is applied to the
gate, as shown in Figure 1-5, an inversion channel is created
between D and S with the resistance Rds-on. Rds-on is the
reciprocal of the slope of the transistor's I-V curve at Vds = 0 V.
Rds-on and Cgate form a distributed RC network. Cgate is in series
with two Rds-on/2 resistors connected in parallel, resulting in
Rds-on/4 added in series with Cgate, as shown in Figure 1-5.

Figure 1-2. Power distribution for high-performance microprocessors.

Figure 1-3. On-chip power grid RLC modeling. [Schematic: a Vcc line (Rvcc, Lvcc segments; power grid nodes A, B) and a Vss line (Rvss, Lvss segments; nodes C, D); at each grid node, a decoupling capacitor (Rd, Cd) and a switching circuit (Is, Rs, Cs) sit between the two lines.]

The simulation of the power network depends on the accuracy
and turnaround time of the power grid modeling. In most cases,
only the resistance and capacitance of the power lines are needed,
excluding the metal inductances of the on-chip power network.
Many CAD tools are available for extracting the interconnect RC
of power grids, as summarized in Table 1-1.
The on-chip inductance of the power grid can be ignored when
special design rules shorten the return loop of Vdd and Vss, for
example by implementing the power grid with several interleaved
Vss and Vdd lines, as shown in Figure 1-2.

Figure 1-4. Switching model of decoupling capacitor. [Schematic: Cdecap with Rdecap (initial voltage Vc(t=0) = Vcc) and Csw with Rsw (initial voltage Vc(t=0) = 0), joined through switch SW between Vccdie and Vssdie, behind the inductances Lvcc and Lvss. Conditions: Cdecap >> Csw, RCdecap << RCsw.]

Figure 1-5. Decoupling capacitor modeling.

During the RC modeling process, each metal segment can be
represented in two forms as follows: (1) the lumped capacitive
parasitic, or (2) the distributed RC parasitic, as shown in Figure
1-6(a). The lumped capacitive parasitic represents the total wire
capacitance from each driver circuit in the signal net. The distrib-
uted RC parasitic includes the resistance (R) of the metal line in
the modeling.
Power grid modeling usually uses the RC model, since the met-
al line resistance of the power grid is significant at the full-chip
level. A long metal line can be broken into multiple RC segments,
as shown in Figure 1-6(b).

Table 1-1. Well-known RC extraction CAD tools

Tool            Manufacturer
Fire & Ice      Cadence Design Systems
Star-RCXT       Synopsys
xCalibre        Mentor Graphics
HyperExtract    Cadence Design Systems
Arcadia         Synopsys
Columbus        Sequence Design
Nautilus        Cadence Design Systems
QuickCap        Random Logic

Figure 1-6. Lumped and distributed RC models. [Panel (a) labels: net, pattern matching, RC library, segment. Panel (b) labels: break lines 1 to 4.]

Each RC segment is modeled with a series resistor, together
with two capacitors at the two ends of the resistor. The metal
segment capacitance is divided evenly between the two capacitors.
This is usually called the π-RC model, since it looks like a pi (π)
symbol, as shown in Figure 1-6(b).
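The segmentation of a long line into RC segments with half of each segment's capacitance at either end can be sketched as follows. This is a minimal illustration that assumes uniform segments; the function name and the example values (20 Ω, 400 fF, 4 segments) are hypothetical.

```python
def pi_segments(r_total, c_total, n_segments):
    """Split a uniform line into n pi-type RC segments.

    Returns (resistors, node_caps): n series resistors and n + 1 node
    capacitors. Each segment places half of its capacitance at each end,
    so every interior node collects two halves.
    """
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    resistors = [r_seg] * n_segments
    node_caps = [c_seg / 2] + [c_seg] * (n_segments - 1) + [c_seg / 2]
    return resistors, node_caps


# Hypothetical line: 20 ohm and 400 fF in total, broken into 4 segments.
resistors, node_caps = pi_segments(20.0, 400e-15, 4)
print(resistors)
print(node_caps)
```

The total resistance and total capacitance of the line are preserved regardless of the number of segments; only the granularity of the distributed model changes.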
The extracted RC data from the layout are saved in a standard
parasitic format (SPF) file. It includes a list of nets and detailed
RC values. The node names of the R and C elements are specified
either as schematic-based labels or as layout-based labels,
depending on the options used in the RC netlisting stage.
The schematic node names are preferred in the SPF, since this
SPF can be back-annotated to the prelayout schematic netlist [33,
34]. In addition, the SPF can include a device section that models
the devices extracted from the physical layout.
In general, the capacitance can be formed between any poly-
gons in the layout, although the closer ones have more significant
capacitances, and thus have more impact on the total capacitance
of the net. Figure 1-7 shows the possible capacitances between the
gates and metal lines in the physical layouts.
The capacitance to the substrate is dominant over other coupling
capacitances in older one- or two-metal technologies. The
situation changes in the latest submicron technologies with seven
to eight metal layers: the top-level metals are far away
from the substrate, so their total capacitance is affected more
by the coupling capacitances to adjacent lines in the same layer
or in adjacent layers of the layout.
In addition, the spacing between metal lines is continually
scaled down, so the coupling capacitance between neighboring metal
lines becomes more and more important. The resistance or
capacitance can be calculated through the direct solution of
the well-known Maxwell’s EM equations or Green’s functions
[17].
A complex geometrical layout can require an extremely long
computational time using the direct EM field solution. Therefore,
equations or capacitance models are usually adopted in the
capacitance calculation for a large-scale layout.

Figure 1-7. Coupling capacitances between conductors in a VLSI layout [33].

Once the capacitance equations have been established, they are
used in the RC extraction, which is fast enough to handle a large-
scale layout. The RC extraction works on the physical database
together with the specified RC equations.
Let us review the basic resistance equation:

$$R = \frac{s\,l}{w}\ \ (\Omega) \tag{1-1}$$

In Equation (1-1), s is the sheet resistance in ohm/square, l is
the length of the line in µm, and w is the width of the line in µm.
Table 1-2 shows the sheet resistance data in a 0.18 µm technology.
Metal four and metal five have significantly lower resistances,
making them suitable for long metal routes. The polysilicon
and metal one layers have high resistance, making them
suitable for short metal connects.
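Equation (1-1) can be applied directly to the sheet resistances listed in Table 1-2. The sketch below is illustrative only: the dictionary keys, the function name, and the 1000 µm × 2 µm route are assumptions chosen for the example.

```python
# Sheet resistances from Table 1-2 (ohm/square), 0.18 um technology.
SHEET_RESISTANCE = {
    "poly": 5.5,
    "metal1": 0.1,
    "metal2": 0.05,
    "metal3": 0.05,
    "metal4": 0.01,
    "metal5": 0.01,
}


def wire_resistance(layer, length_um, width_um):
    """Equation (1-1): R = s * l / w, with l and w in the same unit."""
    return SHEET_RESISTANCE[layer] * length_um / width_um


# Hypothetical 1000 um long, 2 um wide route on two different layers.
r_m1 = wire_resistance("metal1", 1000.0, 2.0)
r_m5 = wire_resistance("metal5", 1000.0, 2.0)
print(f"metal1: {r_m1:.1f} ohm, metal5: {r_m5:.1f} ohm")
```

For the same geometry the resistance scales with the sheet resistance alone, which is why the top metal layers are preferred for long power routes.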
The contacts or vias between layers, as shown in Figure
1-8, are usually modeled as resistors. Each contact or via has a
fixed resistance based on the design rules. The contact represents
the metal hole between metal one and the diffusion or poly layer,
whereas the via represents the metal hole between two metal
layers, such as metal one and metal two. Contacts and vias
introduce many RC segments and significantly increase the RC
parasitic file size and simulation time.
The unit-length capacitance models are based on the results in
[41] as follows.

a. Overlap capacitance: the bottom/top surface of one line to
the bottom and top surfaces of another line in two layers.
Two lines are overlapped in the vertical direction. The overlap
capacitance is modeled as $C_a = \varepsilon_0 \varepsilon_r \cdot A / d_{l_1 l_2}$, where $A$ is
the overlap area of lines $l_1$ and $l_2$, $\varepsilon_0$ is the permittivity of free
space ($8.854 \cdot 10^{-14}$ F/cm), $\varepsilon_r$ is the relative permittivity
between $l_1$ and $l_2$, and $d_{l_1 l_2}$ is the vertical spacing between the
two lines.
b. Fringe capacitance: the side surface of one line to the bottom
or top surface of another line in two layers. Two lines may or
may not be overlapped in the vertical direction. The fringe
capacitance is modeled as $C_{fr} = C_{fr0} \cdot l \cdot (e^{-x_1/x_0} - e^{-x_2/x_0})$. $x_1$ is
the distance from $l_1$ (side edge) to $l_2$ (near-end edge), and $x_2$
is the distance to $l_2$ (far-end edge). $l$ is the length of $l_1$ (side
edge). $C_{fr0}$ and $x_0$ are model coefficients that are characterized
based on different vertical profiles. In a special case,
two side edges may coincide in $l_1$ and $l_2$ ($x_1 = 0$ and $x_2$ =
width of $l_2$) and the model becomes $C_{fr} = C_{fr0} \cdot l \cdot (1 - e^{-x_2/x_0})$.
c. Lateral capacitance: the side surface of one line to the side
surface of the adjacent line in the same layer. The lateral
capacitance is modeled as $C_{lt} = F_{l_1 l_2}(d) \cdot l$, with $F_{l_1 l_2}(d) = C_0 +
C_1/d + C_2/d^2 + C_3/d^3 + C_4/d^4$. $l$ is the parallel length of two
neighboring lines or conductors, $F_{l_1 l_2}(d)$ is the lateral
capacitance per unit length, and $d$ is the spacing between the two
lines. $C_0$, $C_1$, $C_2$, $C_3$, and $C_4$ are coefficients that are
characterized for the given process technology.

Table 1-2. Metal sheet resistances in 0.18 µm technology

Layer                          Polysilicon  Metal 1  Metal 2  Metal 3  Metal 4  Metal 5
Sheet resistance (Ω/square)    5.5          0.1      0.05     0.05     0.01     0.01

Figure 1-8. Contacts and vias [9].
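The three capacitance models (overlap, fringe, and lateral) translate directly into small functions. The sketch below is a hedged illustration: the function names are hypothetical, the model coefficients must come from process characterization, and the geometry in the usage example (1 µm² overlap, 1 µm spacing, relative permittivity 3.9) is assumed purely for demonstration.

```python
import math

EPS0_F_PER_CM = 8.854e-14  # permittivity of free space, F/cm


def overlap_cap(eps_r, area_cm2, spacing_cm):
    """Overlap model: Ca = eps0 * eps_r * A / d."""
    return EPS0_F_PER_CM * eps_r * area_cm2 / spacing_cm


def fringe_cap(c_fr0, length, x1, x2, x0):
    """Fringe model: Cfr = Cfr0 * l * (exp(-x1/x0) - exp(-x2/x0))."""
    return c_fr0 * length * (math.exp(-x1 / x0) - math.exp(-x2 / x0))


def lateral_cap(coeffs, spacing, length):
    """Lateral model: Clt = F(d) * l, F(d) = C0 + C1/d + ... + C4/d^4."""
    f_d = sum(c / spacing ** k for k, c in enumerate(coeffs))
    return f_d * length


# Hypothetical overlap: 1 um^2 area (1e-8 cm^2), 1 um spacing (1e-4 cm).
c_a = overlap_cap(3.9, 1e-8, 1e-4)
print(f"overlap capacitance = {c_a:.3e} F")
```

Setting `x1 = 0` in `fringe_cap` reproduces the special case in item b, where the two side edges coincide.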

1.3 MODELING OF SWITCHING CURRENTS

The high current consumption in some regions of the die produces
“hot spots.” In these hot spots, significant current transitions
occur and the power network voltage fluctuation is high.
Accurate transition current modeling and power network simulation
are necessary to calculate the noise and temperature
distributions across the entire chip power network.
Figure 1-9(a) shows the current waveforms of multiple nearby
drivers with three combinations of the transition patterns for
these drivers. The simulation results are obtained when all
drivers are charging (case: ALL UP), all are discharging (case: ALL
DN), and half are charging and half discharging (case: UP_DN).
In Figure 1-9(a), the X-coordinate is the time (ns) and the
Y-coordinate is the voltage (V).
The waveforms illustrate the need to include the driver
transition patterns (UP/DOWN) when modeling the transition
currents. In our simulation, a 295.2 µm long bus with 130 signals is
simulated at the minimum M5 width and pitch. Figure 1-9(b) shows
the circuit schematic to be simulated. Figure 1-9(c) shows the
entire power grid model for the simulation. Figure 1-9(d) shows
the structure of the bus lines and Vcc/Vss lines on the M5 layer
included in the simulation.
In general, the total current consumption I(t) of the CMOS
circuit shown in Figure 1-10 consists of three components: Id, Isc,
and Il. Id is the charge or discharge current to the output load:

$$I_d = C_{load} V_{cc} f \tag{1-2}$$

In Equation (1-2), Cload is the total output load of the driver,
including the gate load and interconnect load; Vcc is the supply
voltage; and f is the switching activity (switching frequency) of
Cload. Although the charge and discharge dynamic current Id is the
predominant component of the total current consumption, the other
two current components (Isc, Il) are still significant in
submicron CMOS processes.
The short-circuit current Isc flows because the pMOS and
nMOS transistors both conduct while the inverter is in its
transition region. The leakage current Il is due to the leakage of
the reverse-biased diodes between the diffusion regions and the
substrate or well. Although the sum of the short-circuit and
leakage currents accounts for less than 15% of the total current
consumption of the microprocessor chip, the percentage will go up
in future CMOS processes.
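Equation (1-2) and the 15% figure can be combined into a quick estimate. The sketch below is illustrative: the function names are hypothetical, and the inputs (14 nF of switched load, a 1.3 V supply, 330 MHz) are assumed values picked for the example rather than a measured design point. Because the text states that Isc and Il together account for less than 15% of the total, the second function yields an upper estimate.

```python
def dynamic_current(c_load_f, vcc_v, f_hz):
    """Equation (1-2): Id = Cload * Vcc * f."""
    return c_load_f * vcc_v * f_hz


def total_current_bound(i_dynamic_a, other_fraction=0.15):
    """Total current if Isc + Il make up `other_fraction` of the total."""
    return i_dynamic_a / (1.0 - other_fraction)


# Illustrative inputs: 14 nF switched load, 1.3 V supply, 330 MHz.
i_d = dynamic_current(14e-9, 1.3, 330e6)
i_total = total_current_bound(i_d)
print(f"Id = {i_d:.3f} A, total (upper estimate) = {i_total:.3f} A")
```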
Figure 1-10(b) shows the current waveforms based on the estimated
current components; the waveform is assumed to be a triangle.
The current waveforms are back-annotated into the power
network model, as shown in Figure 1-3. To improve the accuracy
of the current waveforms, a current simulation tool such as
Synopsys, Inc.’s PowerMill™ can be used, although the result largely
depends on the (0, 1) patterns at the input ports.

Figure 1-9. Switching noise simulation based on power grid modeling. (a) Simulation result. (b) Simulated circuit. (c) M5 and M6 power grid modeling. (d) Bus lines layout structure.

Figure 1-10. Modeling of switching currents. [(a) CMOS circuit between Vcc and Vss driving Cload, with supply current I(t). (b) I(t) versus t: triangular pulses i(n) and i(p) in successive half-periods Tp/2, with tr and tf marked on the time axis.]
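The triangular waveform assumption can be sketched as a simple time-domain function. This is a minimal illustration under stated assumptions: one pulse starts at the beginning of every half-period Tp/2, both pulses share the same peak (in Figure 1-10(b) the i(n) and i(p) pulses may differ), and the function and parameter names are hypothetical.

```python
def triangular_current(t, period, t_rise, t_fall, i_peak):
    """Current at time t: a triangle at the start of each half-period."""
    half = period / 2.0
    tau = t % half  # time since the start of the current half-period
    if tau < t_rise:
        return i_peak * tau / t_rise  # rising edge
    if tau < t_rise + t_fall:
        return i_peak * (1.0 - (tau - t_rise) / t_fall)  # falling edge
    return 0.0  # no switching current for the rest of the half-period


def peak_from_charge(c_load, vcc, t_rise, t_fall):
    """Peak chosen so the triangle area equals the charge Cload * Vcc."""
    return 2.0 * c_load * vcc / (t_rise + t_fall)


# Time values in ns: 3 ns period, 0.1 ns rise, 0.2 ns fall, 1 A peak.
samples = [triangular_current(t / 20.0, 3.0, 0.1, 0.2, 1.0) for t in range(60)]
print(max(samples))
```

A list of such samples is the kind of piecewise-linear current source that can be attached to a node of a linear power grid model.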

1.4 ON-CHIP DECOUPLING CAPACITANCE

To prevent the supply level from collapsing when many gates
switch simultaneously at the same clock transition, it is necessary
to add decoupling capacitors at “hot spots” to reduce the peak
voltage drops. These decoupling capacitors should be designed
so that they do not occupy an excessively large area, which
would decrease the yield.
It is important to realize that on-chip decoupling capacitors
reduce the di/dt noise generated by the on-chip circuitry, but do
not reduce the noise due to the simultaneous switching of off-chip
drivers. The transient noise due to off-chip drivers should
instead be minimized by placing many low-inductance decoupling
capacitors on the package and board and by providing multiple
low-inductance power/ground pins for the output buffers.
If decoupling capacitors are placed, an upper bound on
the transient voltage fluctuation can be determined by modeling
the power lines behind the capacitor as an infinitely large
inductor. Immediately after switching, based on the decoupling
capacitor model shown in Figure 1-4, no current flows through this
large inductor, and a capacitive divider is established based on
the charge conservation law:

$$C_{decap} V_{CC} = (V_{CC} + \Delta V)(C_{decap} + C_{sw})$$

$$\Delta V = -\frac{C_{sw}}{C_{decap} + C_{sw}}\, V_{CC} \tag{1-3}$$

Based on Equation (1-3), to ensure a small voltage fluctuation ΔV,
Cdecap (the decoupling capacitance) should be much larger than
Csw (the switching capacitance). Accordingly, for a microprocessor
chip with a 14 nF load, we need 10 · 14 nF = 140 nF to achieve a
10% Vdd power noise threshold in the worst case; with
Cdecap = 10 Csw, Equation (1-3) gives |ΔV| = Vcc/11, or about 9.1%
of the supply. Equation (1-3) provides the calculation of an upper
bound on the total on-chip decoupling capacitance needed to
satisfy the voltage fluctuation ΔV bound.
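The capacitive-divider relation of Equation (1-3) can be checked numerically for the 14 nF example. The sketch below uses hypothetical function names; `decap_for_droop` simply solves Equation (1-3) for Cdecap at a given allowed droop fraction, which is an algebraic rearrangement and not a procedure stated in the text.

```python
def voltage_droop(c_decap, c_sw, vcc):
    """Equation (1-3): dV = -Csw / (Cdecap + Csw) * Vcc."""
    return -c_sw / (c_decap + c_sw) * vcc


def decap_for_droop(c_sw, max_droop_fraction):
    """Cdecap at which |dV| / Vcc equals max_droop_fraction."""
    return c_sw * (1.0 / max_droop_fraction - 1.0)


c_sw = 14e-9                             # switching capacitance from the text
rule_of_thumb = 10 * c_sw                # the 140 nF figure used in the text
at_threshold = decap_for_droop(c_sw, 0.10)

print(f"10x rule:   {rule_of_thumb * 1e9:.0f} nF, "
      f"droop fraction {abs(voltage_droop(rule_of_thumb, c_sw, 1.0)):.4f}")
print(f"exactly 10%: {at_threshold * 1e9:.0f} nF")
```

The tenfold rule is slightly conservative: it yields a droop just under the 10% threshold, while the threshold itself is reached at nine times the switching capacitance.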
The objective of the decoupling capacitance optimization problem
is to minimize the total amount of decoupling capacitance
while all the nodes in the power network model satisfy
the specified supply voltage noise thresholds. Formally,
we can describe the objective and constraints as follows [83]:

$$\min \sum_i (C_d)_i \quad \text{subject to} \quad V_1 \le V(n_i) \le V_2 \tag{1-4}$$

In Equation (1-4), (Cd)i is the decoupling capacitance and V(ni) the
voltage at node ni of the power network model, as shown in Figure
1-3; V1 and V2 are the lower and upper thresholds required for
feasible supply voltages. We define a noisy node in the power
network model as one whose voltage, at some time, falls outside
the required [V1, V2] thresholds, as shown in Figure 1-11.
The thresholds are set as lower and upper bounds around
the nominal supply voltage to guarantee correct circuit
timing. For example, with a nominal voltage of 1.3 V and a 10%
deviation allowed, the lower and upper thresholds are
[V1, V2] = [1.17 V, 1.43 V].
The power network, with each node’s transient voltages in the
electrical model satisfying the given thresholds, is called a feasible
power network. Adding decoupling capacitors at noisy nodes
will turn a power network into a feasible one. Figure 1-12(a) shows
one example with the simulated voltages of two nodes (Node 25 and
Node 10) in the power network.
The minimum voltages (0.47 V and 1.15 V) of these nodes are
less than the required lower threshold (1.17 V), and thus they are
noisy nodes. A decoupling capacitor is added at each of these
two noisy nodes and the voltages eventually satisfy the required
thresholds, as shown in Figure 1-12(b).

Figure 1-11. Supply voltage thresholds and noisy nodes definition [83]. [Voltage waveform at a node versus time: nominal voltage 1.3 V, thresholds at 1.17 V and 1.43 V, with a marked violation.]

Figure 1-12. Adding decoupling capacitors at noisy nodes [83]. (a) Nodes 10 and 25 are noisy. (b) Adding more capacitors on Nodes 10 and 25. [Both panels: nominal Vcc = 1.3 V, voltage thresholds 1.17 V to 1.43 V. Panel (a) noisy nodes: Node 25 (min V = 0.47 V), Node 10 (min V = 1.15 V). Panel (b) noisy nodes: none; decoupling capacitors at Node 25 and Node 10.]

Figure 1-13 shows the high-level decoupling capacitance opti-
mization flow [83]. Procedure I adds the decoupling capacitors at
the noisy nodes. Procedure II removes the unnecessary decou-
pling capacitance overallocated initially.
We have done experiments on a power network model with
about 100 RLC grids and decoupling capacitors. Current sources
have been added at each node in the model for transistor transi-
tions with the current waveforms, as shown in Figure 1-10(b). The

Procedure I: Decoupling Capacitance Increment

Simulate the power network model with RLC elements and current sources.
Identify the “noisy” nodes by comparing the voltage results with the specified thresholds.
While (there is a “noisy” node){
    For (each “noisy” node){
        Add a step size of the decoupling capacitance.
    }
    Simulate the power network model with the updated decoupling capacitance.
    Identify “noisy” nodes by comparing simulation voltages with the required thresholds.
}

Procedure II: Decoupling Capacitance Decrement

For (each node){
    Mark the node as “deductible”;
}
While (there is still a “deductible” node){
    Deduct a step size of decoupling capacitance from each “deductible” node;
    Simulate the power network model with the updated decoupling capacitance;
    Identify the “noisy” nodes by comparing simulation voltages with the required thresholds;
    For (each “noisy” node){
        Add a step size of the decoupling capacitance;
        Mark the node as “nondeductible”;
    }
}

Figure 1-13. Decoupling capacitance optimization flow [83].
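
The two procedures in Figure 1-13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the circuit simulator is replaced by a toy droop model (droop = charge / capacitance, with independent nodes), only the lower threshold is checked, and the names simulate_min_voltage, add_decap, trim_decap, STEP_PF, and all numeric inputs are hypothetical rather than taken from [83]. Nodes holding less than one step of capacitance are treated as nondeductible so that the capacitance never goes negative.

```python
V_NOM = 1.3      # nominal supply voltage (V), as in the text
V_LOW = 1.17     # lower threshold (V), 10% below nominal
STEP_PF = 1.0    # hypothetical step size of decoupling capacitance (pF)


def simulate_min_voltage(charge_pc, c_intrinsic_pf, decap_pf):
    """Toy stand-in for the simulator: minimum voltage at each node.

    Assumes droop = Q / (C_intrinsic + C_decap) and independent nodes.
    """
    return [V_NOM - q / (c0 + cd)
            for q, c0, cd in zip(charge_pc, c_intrinsic_pf, decap_pf)]


def noisy_nodes(vmin):
    """Indices of nodes whose minimum voltage violates the lower threshold."""
    return [i for i, v in enumerate(vmin) if v < V_LOW]


def add_decap(charge_pc, c_intrinsic_pf, decap_pf):
    """Procedure I: add one step at every noisy node until none remain."""
    decap = list(decap_pf)
    noisy = noisy_nodes(simulate_min_voltage(charge_pc, c_intrinsic_pf, decap))
    while noisy:
        for i in noisy:
            decap[i] += STEP_PF
        noisy = noisy_nodes(simulate_min_voltage(charge_pc, c_intrinsic_pf, decap))
    return decap


def trim_decap(charge_pc, c_intrinsic_pf, decap_pf):
    """Procedure II: deduct steps until each node becomes nondeductible."""
    decap = list(decap_pf)
    deductible = {i for i, cd in enumerate(decap) if cd >= STEP_PF}
    while deductible:
        for i in deductible:
            decap[i] -= STEP_PF
        vmin = simulate_min_voltage(charge_pc, c_intrinsic_pf, decap)
        for i in noisy_nodes(vmin):
            decap[i] += STEP_PF          # restore the last step
            deductible.discard(i)        # mark the node as nondeductible
        # Assumption: a node with less than one step left cannot be deducted.
        deductible = {i for i in deductible if decap[i] >= STEP_PF}
    return decap


if __name__ == "__main__":
    charge = [0.5, 2.0, 6.0]          # hypothetical switched charge per node (pC)
    c0 = [10.0, 10.0, 10.0]           # hypothetical intrinsic capacitance (pF)
    start = [5.0, 5.0, 5.0]           # hypothetical initial allocation (pF)
    after_add = add_decap(charge, c0, start)
    print(after_add, trim_decap(charge, c0, after_add))
```

In this toy setting the first node starts over-allocated, so Procedure II removes its capacitance, while the other two nodes keep exactly the amount Procedure I found necessary.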

cycle time is 3 ns (a clock frequency of about 330 MHz) in the
experiments. Two voltage sources are added to model the C4
package power pads. The RL parasitics (200 Ω and 0.5 nH) of the
package layer are included in the model. The nominal supply
voltage is 1.3 V.
The power grid simulation is done using a fast linear circuit
simulator [20]. The flow shown in Figure 1-13 is used to deter-
mine the locations and amounts of on-chip decoupling capacitors.
Figure 1-14 shows the results of a study of how sensitive the
required decoupling capacitance is to the model parameters. The
required decoupling capacitance is most sensitive to changes in
the noise margin and the device transition currents.
This suggests that an accurate model of the current consumption
is the key to obtaining accurate voltage drops and decoupling
capacitance amounts. In addition, the on-chip decoupling
capacitance can be reduced by improving the noise margin, which
can be achieved by improving the power distribution on the
package and the board. Changes in the power line RLC values, and
in the absolute supply voltage with the same noise thresholds,
do not show a significant impact on the decoupling capacitance.
In the experiment, we assigned the initial RLC values at each
node of the power network as follows: R = 40 Ω, L = 0.005 nH, C =
0.3 pF (without the decoupling capacitance at this initial
assignment). The change of on-chip power line inductance does not
lead to a lot of variation in decoupling capacitance, as shown in
Figure 1-14(b); this is due to the very small L/R delay (0.12 ps)
compared to the RC delay (12 ps) in this example.
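
The two time constants quoted above follow directly from the initial R, L, and C values. The short check below reproduces them; the function names are ours, introduced only for illustration.

```python
def lr_delay(l_henry, r_ohm):
    """L/R time constant in seconds."""
    return l_henry / r_ohm


def rc_delay(r_ohm, c_farad):
    """RC time constant in seconds."""
    return r_ohm * c_farad


if __name__ == "__main__":
    R = 40.0          # ohms, from the text
    L = 0.005e-9      # henries (0.005 nH), from the text
    C = 0.3e-12       # farads (0.3 pF), from the text
    tau_lr = lr_delay(L, R)
    tau_rc = rc_delay(R, C)
    print(f"L/R = {tau_lr * 1e12:.3f} ps, RC = {tau_rc * 1e12:.1f} ps, "
          f"ratio RC/(L/R) = {tau_rc / tau_lr:.0f}")
```

With these values the RC delay is roughly two orders of magnitude larger than the L/R delay, which is consistent with the weak dependence on line inductance reported for Figure 1-14(b).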
The decoupling capacitor can be improved by using either the
PN junction or a MOS varactor device [43]. As shown in Figure 1-
15(a), the PN junction is formed by diffusing p+ doping in an n-
well. As shown in Figure 1-15(b), the MOS varactor is formed by
placing an nMOS in an n-well. The n-well is added to form a chan-
nel between the source and drain. In addition, Vtune and Vgate volt-
ages are controlled to vary the gate capacitance used for the de-
coupling capacitances between Vdd and Vss.

1.5 ON-CHIP INDUCTANCE

The inductive drop or noise (L · di/dt) on the power lines becomes

significant for high-speed microprocessor chips [14, 15], especially
Figure 1-14. Sensitivity study of on-chip decoupling capacitances [83].
Each panel plots the decoupling capacitance (pF) against the
increasing rate (0% to 50%) of one parameter, with three curves
labeled Vcc = 0.0375 V, 0.075 V, and 0.15 V: (a) line resistance,
(b) line inductance, (c) load capacitance, (d) current source Is,
(e) Vcc, (f) ΔVcc.

Figure 1-15. Decoupling capacitor [43]. (a) PN Junction. (b) MOS varactor.

when the chip becomes faster and larger in size. The
characteristic impedance is $Z_0 = \sqrt{L/C}$. Adding decoupling
capacitors will increase the capacitance but does not affect the
inductance of the power planes. As a result, Z0 is reduced, and
current spikes generate smaller voltage drops because ΔV = Z0 · ΔI.
Low impedance of the power network helps the pulse response
and curbs the instantaneous fluctuations. The impedance Z0 can
be further reduced by lowering the inductance L of the power net-
work. This section presents a metal wire design method to reduce
the inductance by carefully selecting the sizes and spaces of pow-
er lines.
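
As a rough illustration of the relation ΔV = Z0 · ΔI, the sketch below evaluates Z0 = sqrt(L/C) before and after adding decoupling capacitance. The inductance, capacitance, and current step are hypothetical round numbers chosen for readability, not values from the experiments.

```python
import math


def characteristic_impedance(l_henry, c_farad):
    """Z0 = sqrt(L / C), in ohms."""
    return math.sqrt(l_henry / c_farad)


def voltage_step(l_henry, c_farad, delta_i_amp):
    """Voltage drop caused by a current spike: dV = Z0 * dI."""
    return characteristic_impedance(l_henry, c_farad) * delta_i_amp


if __name__ == "__main__":
    L = 1e-9        # hypothetical power network inductance (1 nH)
    C = 1e-9        # hypothetical capacitance before adding decap (1 nF)
    dI = 0.1        # hypothetical current spike (A)
    for c in (C, 4 * C):   # quadrupling C halves Z0 when L is unchanged
        print(f"C = {c * 1e9:.0f} nF: Z0 = {characteristic_impedance(L, c):.2f} ohm, "
              f"dV = {voltage_step(L, c, dI) * 1e3:.0f} mV")
```

Because Z0 depends on the square root of L/C, capacitance must grow fourfold to halve the voltage step, which is one reason lowering L is attractive as well.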
Figure 1-16(a) shows five different combinations of the widths
and spaces for two adjacent Vcc and Vss lines [21]. The inductance
and resistance of these five combinations are shown in Figure
1-16(b) and Figure 1-16(c) for 10,000 µm long power lines. The
inductance is calculated by using a two-dimensional model with the
current loops between adjacent Vss and Vcc lines. The first-order
estimation of the unit-length loop inductance for two adjacent Vcc
and Vss lines is as follows:

$$L = \mu \frac{s}{w} \qquad \text{(1-5)}$$

In Equation (1-5), μ is the permeability of the dielectric material
between adjacent Vcc and Vss lines, s the space between the Vcc
and Vss lines, and w the width of Vcc or Vss lines. The Vcc and Vdd
nets are interchangeable in this book. Usually, Vcc is used for the
analog signal and Vdd for digital design.
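
Equation (1-5) depends only on the ratio s/w, so its trend can be checked without knowing the units of the layout dimensions. The sketch below assumes the free-space permeability for μ and compares Case 1 and Case 5 of Figure 1-16(a), which share a width and differ in spacing. It is a first-order estimate only and should not be expected to reproduce the characterized values in Figure 1-16(b).

```python
import math

MU_0 = 4.0e-7 * math.pi   # permeability of free space (H/m), assumed for the dielectric


def unit_loop_inductance(space, width, mu=MU_0):
    """First-order loop inductance per unit length, L = mu * s / w (H/m).

    space and width must be in the same units; only their ratio matters.
    """
    return mu * space / width


if __name__ == "__main__":
    case1 = unit_loop_inductance(space=0.8, width=4.24)    # Case 1 dimensions
    case5 = unit_loop_inductance(space=22.0, width=4.24)   # Case 5 dimensions
    print(f"Case 5 / Case 1 first-order inductance ratio: {case5 / case1:.1f}")
```

The formula predicts that widening the gap between the pair raises the loop inductance in direct proportion, which matches the qualitative ordering of Case 5 above Case 1.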
The inductance becomes large when the line space is big, which
Figure 1-16. Characterization results of Vdd/Vss metal structures [21].
(a) Vcc and Vss cases (width and space as labeled in the figure):
    Case 1: medium-width pair of minimum-spaced M5 lines; width 4.24, space 0.8
    Case 2: half-width pair of minimum-spaced M5 lines; width 2.12, space 0.8
    Case 3: narrow-width pair of minimum-spaced M6 lines; width 1.64, space 0.84
    Case 4: wide pair of minimum-spaced M6 lines; width 37, space 0.84
    Case 5: spread-out medium-width M5 lines; width 4.24, space 22
(b) On-chip inductance characterizations: L (nH per 1000 µm) versus
    frequency (1 MHz to 10,000 MHz).
(c) Resistance characterizations: R (Ω per 1000 µm) versus frequency.
(d) Impedance calculation.
(e) L/R delay: τ (ps) versus frequency.

is opposite to the case of line-to-line capacitive coupling. Case 5
has far more inductance than any of the other cases, since it has
a large line-to-line space. Magnetic coupling also acts between
conductors that are far apart, which is one of the difficulties in
accurate inductance modeling.
The inductance is reduced at high frequencies because
time-varying currents tend to concentrate near the surface of the
conductors at high frequencies; this is known as the skin effect
[6]. As a consequence of this electromagnetic induction
phenomenon, the magnitude of the current density drops
exponentially with the distance away from the surface. The
distance at which the current density becomes a fraction 1/e of
its value at the surface is called the skin depth, which is
calculated by

$$\sigma_s = \sqrt{\frac{\rho}{\pi \mu f}} \qquad \text{(1-6)}$$

In Equation (1-6), f is the frequency, and μ and ρ are the
permeability and resistivity of the material. Making the thickness
of the conductor larger than approximately 2σs will not reduce the
effective resistance of the line.
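
To give a feel for the magnitudes in Equation (1-6), the sketch below evaluates the skin depth over the frequency range plotted in Figure 1-16. The text does not name the metal, so the resistivity of copper (about 1.68e-8 Ω·m) and the free-space permeability are assumed here purely for illustration.

```python
import math

MU_0 = 4.0e-7 * math.pi    # permeability of free space (H/m)
RHO_COPPER = 1.68e-8       # resistivity of copper (ohm*m), assumed metal


def skin_depth(freq_hz, rho=RHO_COPPER, mu=MU_0):
    """Skin depth in meters from Equation (1-6): sqrt(rho / (pi * mu * f))."""
    return math.sqrt(rho / (math.pi * mu * freq_hz))


if __name__ == "__main__":
    for f_mhz in (1, 10, 100, 1000, 10000):
        depth_um = skin_depth(f_mhz * 1e6) * 1e6
        # Thickness beyond about twice the skin depth no longer lowers resistance.
        print(f"{f_mhz:>6} MHz: skin depth = {depth_um:7.2f} um, "
              f"useful thickness up to about {2 * depth_um:7.2f} um")
```

Under these assumptions the skin depth falls to a few micrometers only in the gigahertz range, so the effect appears first on the widest and thickest lines.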
Figure 1-16(c) shows the resistance plotted against frequency for
the five line configurations shown in Figure 1-16(a). The skin
effect appears at the higher frequencies as increased resistance
for all configurations. Case 4, which has the largest width, shows
the skin effect at the lowest frequency.
The impedance of a power line is calculated as follows:

$$|Z(f)| = \sqrt{R^2 + (2 \pi f L)^2} \qquad \text{(1-7)}$$

In Equation (1-7), f is the clock frequency and R and L are the
unit-length line resistance and unit-length line inductance.
Figure 1-16(d) shows the impedance as a function of frequency for
the Vcc and Vss line configurations shown in Figure 1-16(a).
At high frequencies the impedance rises, especially for Case 5,
due to the inductance effect, as shown in Figure 1-16(d). Case 4,
shown in Figure 1-16(a) with the largest wire width and small line
space, has the smallest impedance.
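
Equation (1-7) can be evaluated directly to see where the inductive term starts to matter. In the sketch below R and L are treated as frequency-independent, which ignores the skin effect discussed above, and the numeric values are hypothetical placeholders rather than readings from Figure 1-16.

```python
import math


def impedance_magnitude(freq_hz, r_ohm, l_henry):
    """|Z(f)| = sqrt(R^2 + (2*pi*f*L)^2), from Equation (1-7)."""
    return math.hypot(r_ohm, 2.0 * math.pi * freq_hz * l_henry)


def corner_frequency(r_ohm, l_henry):
    """Frequency at which the inductive reactance equals the resistance."""
    return r_ohm / (2.0 * math.pi * l_henry)


if __name__ == "__main__":
    R = 20.0       # hypothetical line resistance (ohms)
    L = 1.0e-9     # hypothetical line inductance (1 nH)
    print(f"corner frequency = {corner_frequency(R, L) / 1e9:.2f} GHz")
    for f_mhz in (1, 10, 100, 1000, 10000):
        z = impedance_magnitude(f_mhz * 1e6, R, L)
        print(f"{f_mhz:>6} MHz: |Z| = {z:.2f} ohm")
```

Below the corner frequency the line looks resistive; above it the impedance grows with frequency, as seen for the high-inductance configuration.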
The inductance delay due to the line inductance and line
resistance is calculated as follows:

$$\tau = L/R \qquad \text{(1-8)}$$

The L/R delay characterizes the importance of the inductance in
power network modeling. Figure 1-16(e) shows the L/R delay
results; Case 2 and Case 3, with small line widths and small line
spaces, have the smallest L/R delay, as small as 15–19 ps for a
10,000 µm long power line.
If the L/R delay is much smaller than the RC delay per unit
length, the line inductance Lvcc or Lvss can be ignored in the on-
chip power network model. In this condition, the RC network is
accurate enough to model the on-chip power network.
Based on the experimental results shown in Figure 1-16(e), we
can conclude that narrow and dense lines are preferred in the
power network design for metal inductance reduction. However,
other effects, such as the IR drop, need to be considered as well.
Reducing the inductance through wire sizing alone is of limited
use, since the inductance is still dominated by the package in
modern chips. But we can use dense and narrow lines to reduce
both on-chip inductance and resistance. An example is shown in
Figure 1-17. Our experiments show a clear reduction in
inductance. The combined resistance of these narrow lines is
equal to, or less than, that of a single wide line. The example in
Figure 1-17 shows a practical guideline used in Intel
microprocessor power network design.

1.6 PROCESS SCALING IMPACTS

We have considered two scenarios for technology scaling in
microprocessor chips. Scenario A scales the existing chip to a new
process with a scaling factor S and little logic change. In
Scenario A, the die size is reduced by S². Scenario B scales the
existing chip to a new process with substantial new logic
implemented. In Scenario B, the die size is assumed to be
unchanged in the new process because more transistors are
employed in the new design. Table 1-3 shows the impact of these
two scaling scenarios on the microprocessor power distribution.
The detailed derivations are given below.

Scenario A
The line width and space are both reduced by S, assuming the
change in line thickness is negligible in process shrinking. The
unit-length resistance increases by 1/S. The unit-length
capacitance is taken to be reduced by S, assuming that the plate
capacitance is reduced by 1/S² but the coupling capacitance
increases by 1/S due to the smaller line space.

Figure 1-17. Design guidelines for on-chip power lines. A single
Vcc/Vss pair is shown next to the preferred arrangement of several
interleaved Vcc and Vss lines.

Table 1-3. Technology scaling model for microprocessor power distribution

Group              Design parameter                  Scenario A   Scenario B
Dimensions         Die size                          S² (down)    Unchanged
                   Transistor count                  Unchanged    1/S (up)
                   Metal width                       S (down)     S (down)
                   Metal space                       S (down)     S (down)
                   Metal thickness                   Unchanged    Unchanged
                   Global metal length               S (down)     Unchanged
                   Decoupling capacitance bound      S² (down)    Unchanged
                   Area % of decoupling capacitor    Unchanged    Unchanged
RLC parameters     Metal resistance                  Unchanged    1/S (up)
                   Metal capacitance                 S² (down)    S (down)
                   Loop inductance                   S (down)     Unchanged
                   Clock frequency                   1/S² (up)    1/S² (up)
                   Toggling transistors per cycle    Unchanged    1/S (up)
                   Average gate capacitance          S² (down)    S² (down)
                   Total gate capacitance            S² (down)    S (down)
                   Total signal connections          Unchanged    1/S² (up)
                   Total wire capacitance            S² (down)    1/S (up)
                   Total toggling capacitance        S² (down)    Unchanged
Power consumption  Power consumption (total)         S² (down)    Unchanged
                   Supply current (total)            S (down)     1/S (up)
                   Current density on power line     Unchanged    1/S² (up)
Voltage drop       Supply voltage                    S (down)     S (down)
                   IR drop                           S (down)     1/S² (up)
                   L · di/dt drop                    S² (down)    1/S (up)

The die size is reduced by S², and the length of the power lines
scales by S. The line resistance of the power network is
unchanged, and the line capacitance of the power network or of
long signal lines is reduced by S².
Based on Equation (1-5), the unit-length inductance between
two adjacent Vcc and Vss lines is unchanged, because the line space
(s) and line width (w) are both reduced by S. The total line
inductance is reduced by S, because the power line length scales
by S.
The chip clock frequency is assumed to increase by 1/S², which is
a simplification of the observation that the microprocessor
frequency roughly doubles every two years with each process
generation. In Scenario A, the logic of the chip changes very
little and the number of toggling transistors per clock cycle is
unchanged.
The channel length and width of each device are both scaled
down by S, so the average gate capacitance is down by S² and the
total gate capacitance is also down by S². Since the total wire
capacitance of signals is also down by S², with unchanged
transistor numbers and signal connections, the total toggling
capacitance (Ctoggle = Cgate + Cwire) of the chip is reduced by S².
The supply voltage is scaled by S at each process generation, as
shown in Figure 1-1(a).
The power consumption can be estimated as 0.5 · f · Vdd² ·
Ctoggle, where f = clock frequency, Vdd = supply voltage, and
Ctoggle = total toggling capacitance of the chip. The power
consumption is reduced by S² based on the above assumptions for
the frequency, supply voltage, and total toggling capacitance per
clock cycle.
The current of the power distribution network is calculated as
the power consumption divided by the supply voltage. Since the
power is down by S² and Vdd is down by S, the current is down by
S. Since the line width is down by S and the current is down by
S, the current density of the power line is unchanged.
The IR drop is down by S, since the line resistance is unchanged
but the current is reduced by S. The L · di/dt voltage drop is
reduced by S², because the line inductance L is scaled down by S
and di (current) is reduced by S for the same dt period.
Equation (1-3) bounds the total on-chip decoupling capacitance at
10 times the total toggling capacitance for a 10% Vdd noise bound.
Because the total toggling capacitance is reduced by S², the upper
bound of the total decoupling capacitance needed in the chip is
also reduced by S².
Since the die size is reduced by S² in Scenario A, the percentage
of die size used for the on-chip decoupling capacitance is
unchanged in this scenario.

Scenario B
The die size is assumed to be unchanged in this scenario, so the
global line length is unchanged. The line resistance of the power
network is increased by 1/S. The line capacitance of the power
network, or of long signals, is reduced by S, since the
unit-length capacitance is down by S, as derived in Scenario A.
Based on Equation (1-5), the unit-length inductance between
two adjacent Vcc and Vss lines is unchanged, because the line space
(s) and the line width (w) are both reduced by S. The total line
inductance is unchanged because the global line length is
unchanged.
The chip clock frequency is assumed to increase by 1/S² about
every two years for each process generation. In Scenario B, new
logic features are implemented, and the design is assumed to use
1/S times as many transistors. Therefore, the number of toggling
transistors per cycle increases by 1/S.
The gate channel length and channel width are both scaled
down by S, so each gate capacitance is down by S² and the total
gate capacitance is down by S. The total number of signals
increases by 1/S² for the 1/S increase in transistors used in the
design. This implies that the total wire capacitance of signals in
this chip increases by 1/S, because the unit line capacitance in
this scenario is reduced by S.
If we assume that the total wire capacitance is almost equal to
the total gate capacitance across a chip (and that is the case we
found in a microprocessor chip), the total toggling capacitance,
Ctoggle (Ctoggle = Cgate + Cwire), is unchanged. The supply
voltage is reduced by S at each process generation.
The average power consumption is calculated as 0.5 · f · Vdd² ·
Ctoggle, where f = clock frequency, Vdd = supply voltage, and
Ctoggle = toggling capacitance. The power consumption is unchanged
in this scenario. The current through the power distribution
network is calculated as the power consumption divided by the
supply voltage. Since the power is unchanged and Vdd is down by S,
the total current increases by 1/S.
Because the wire width is down by S and the current increases by
1/S, the current density of the power network increases by 1/S².
The IR drop increases by 1/S², because the line resistance
increases by 1/S and the supply current also increases by 1/S. The
L · di/dt noise increases by 1/S, since L is unchanged and di
(current) increases by 1/S for the same dt period.
Because the total toggling capacitance per cycle is unchanged,
the upper bound of the total on-chip decoupling capacitance is
also unchanged, based on Equation (1-3). Since the die size is
unchanged in Scenario B, the area percentage used for the on-chip
decoupling capacitance is also unchanged.
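
The derivations for both scenarios can be checked numerically by multiplying the individual scaling factors. The sketch below is a simple bookkeeping aid under the same assumptions as the text (frequency up by 1/S², supply voltage down by S, same dt period); the function name scaling_factors and the dictionary keys are ours.

```python
import math


def scaling_factors(s, scenario):
    """Multiplicative change of each quantity for scenario 'A' or 'B'.

    A value below 1 means the quantity goes down; above 1 means it goes up.
    """
    freq = 1.0 / s**2            # clock frequency
    vdd = s                      # supply voltage
    width = s                    # power line width
    if scenario == "A":
        c_toggle, resistance, inductance = s**2, 1.0, s
    elif scenario == "B":
        c_toggle, resistance, inductance = 1.0, 1.0 / s, 1.0
    else:
        raise ValueError("scenario must be 'A' or 'B'")
    power = freq * vdd**2 * c_toggle        # P ~ f * Vdd^2 * Ctoggle
    current = power / vdd                   # I = P / Vdd
    return {
        "power": power,
        "current": current,
        "current_density": current / width,
        "ir_drop": resistance * current,
        "l_didt": inductance * current,     # same dt period assumed
    }


if __name__ == "__main__":
    S = 0.7
    for name in ("A", "B"):
        row = scaling_factors(S, name)
        print(name, {k: round(v, 3) for k, v in row.items()})
```

Running it with S = 0.7 shows the contrast summarized in Table 1-3: both voltage-drop terms shrink in Scenario A and grow in Scenario B.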
Although the scaling models show unchanged power consumption in
Scenario B, most new microprocessors show a more aggressive
increase in transistor count, or use more parallelism for higher
performance. This results in more power consumption in new
microprocessors. For example, the Alpha 21264 (0.35 µm) has 1.63
times more transistors than the Alpha 21164 (0.50 µm) (greater
than the 1/0.7 = 1.43 scaling factor assumed in Scenario B), and
the power consumption is increased from 50 W to 72 W [1].
The process scaling factor S in Table 1-3 is the ratio of the
minimum feature sizes between two process generations. S is about
0.7 [10]. For example, scaling a 0.18 µm process to 0.13 µm gives
a scaling factor S of about 0.72 (0.13/0.18 = 0.72).

1.7 SUMMARY

This chapter discusses the modeling issues of on-chip power grids.
It provides the primary models and characterization results for
the resistance, capacitance, and inductance associated with the
metal lines and vias that route the power distribution network on
the chip. The power distribution network, in general, can be
characterized as a low-pass RLC filter for frequency domain
analysis.
In addition, the resonant frequency should be kept away from the
working frequency of the circuit; otherwise, this RLC network will
generate a lot of noise. We describe the inductance effects for
the on-chip power grid. Usually, very dense and narrow Vss and Vcc
lines are interleaved with each other to reduce the inductance.
In general, as a designer of a power grid, you want to increase
the capacitance while reducing the resistance and inductance.
The latter two parameters are associated with the IR drop and
L · di/dt noise.
The capacitance increase for a power grid is implemented by
adding intentional decoupling capacitors. In addition, decoupling
capacitors are inserted at the noisy nodes of the power distribu-
tion network. A CAD algorithm has been proposed to automate
this decoupling capacitor insertion process [83].
Finally, we predict future design directions by providing tech-
nology scaling models related to power distribution performance
and voltage drop based on two different chip improvement scenar-
ios.

2
DESIGN PERSPECTIVES

In this chapter, we describe guidelines for chip layout and
floorplanning in power grid design. Enough metal power lines
should be allocated for both the global power network and the
local power network in all metal layers in order to deliver
current efficiently through the power network. However, the power
grids or metal lines used for the Vdd and Vss networks use up a
lot of signal routing resources.
From the circuit design perspective, there is therefore a
temptation to ignore the power network metal density at the
planning stage in order to reduce the number of metal layers or
the chip size and so cut manufacturing cost, but this carries the
risk of increasing the IR drop and L · di/dt noise associated with
the power distribution network.
We believe that planning or design guidelines for the power
networks' metal lines are essential at the early design planning
stage in order to deliver a successful chip.
This chapter is organized into six sections as follows. Section
2.1 covers power grid planning for a communication chip [45].
Section 2.2 examines power grid planning for two microprocessor
chips [46, 47, 48]. Section 2.3 describes the power grid analysis
and decoupling capacitance optimization method for another mi-
croprocessor chip [49]. Section 2.4 discusses the general method-
ology for IR drop analysis and reduction. Section 2.5 discusses the
package-level power network planning [61]. Section 2.6 is a sum-
mary of the chapter.

Power Distribution Network Design for VLSI, by Qing K. Zhu 33

ISBN 0-471-65720-4 © 2004 John Wiley & Sons, Inc.

2.1 PLANNING FOR COMMUNICATION CHIPS

Deciding on the metal line layout in a chip to minimize the IR
drop and reduce L · di/dt noise is part of power network
planning. Based on Equation (1-1), the metal line resistance is
inversely proportional to the metal line width. Based on Equation
(1-5), the inductance is also inversely proportional to the metal
line width. Based on the guidelines shown in Figure 1-17,
interleaving Vdd and Vss lines of small width is preferred, to
reduce the area of the current loop paths and so reduce the
inductance. Finally, the resistance and inductance are both
reduced if we use short metal lines from the power supply pads to
the devices.
The methods to improve the layout or package for the power
distribution network are summarized as follows:

1. Adding multiple power lines (Vdd/Vss) over the chip, usually
   at some constant space over the chip surface.
2. Adding enough power lines in each layer (for example, M1,
   M2, M3, M4, M5, and M6, etc.).
3. Adding enough vias between power lines in adjacent metal
   layers.
4. Using advanced package technology, such as the C4 package,
   to place multiple C4 power bumps over the chip and to
   reduce the distance from the bumps to the on-chip power
   network.

The following design example is from a communication chip, as
shown in Figure 2-1 [45].

• The first step is to decide on the floorplanning and chip area.
  The floorplanning also includes the package options and I/O
  locations.
• A simplified RLC model is constructed that reflects the power
  line electrical models. In order to reduce the computational
  time, the R and C in the area are lumped in the RC model.
  To improve accuracy, the package model is also included for
  Vdd/Vss pads.
• The inductance may not be included in the above model if it
  is not significant in the power distribution and the L/R delay
  is much less than the RC delay, as discussed in Section 1.5.

Figure 2-1. Floor plan of a communication chip [45]. The labeled
blocks are the IO pads (0.28 mm), boundary scan banks (0.10 mm),
routing channels (0.30 mm to 0.50 mm), PLL, standard-cell region,
ARC memory, PIO SRAM block (0.20 mm), PIO control bank (0.12 mm),
Fabric, and ESRAM. The overall dimensions shown are 15.40 mm and
19.33 mm.

• A sensitivity study is executed by varying the metal density or
  metal widths in the chip floorplanning for the power
  distribution network. The R and C values in the simplified RC
  model will be varied based on the density of the power grid
  metals.
• The sensitivity study can be done by changing the number
  and locations of the Vdd/Vss pads supplied to the chip. We then
  decide the best IR drop and L · di/dt drop across the chip.
• Once we select the power grid structure, we need to determine
  the number of Vdd/Vss pads and locations and the metal
  line widths for each layer of power grid.
• Again, the design is optimized for the power grid with regard
  to the IR drop and L · di/dt drop targets, with as little as
  possible taken from the layout area.
• The IR drop analysis is performed as a DC analysis of this
  simplified RC model of the power grid. The package resistance
  or inductance for each Vdd pad is included in the model
  to analyze the voltage drop across the package.
• The above power grid modeling and analysis should be done
  for both Vdd and Vss networks.
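
The DC IR-drop step in the list above can be illustrated on the smallest useful case: a single power strap fed from pads at both ends, with a DC current sink at each internal node. The sketch below is an assumption-laden toy, not the model of [45]: the pads are ideal, all segments have the same resistance, and the function name solve_strap and the numeric inputs are hypothetical.

```python
def solve_strap(vdd, r_seg, sink_currents):
    """Node voltages on a strap fed from ideal pads at both ends.

    Model (an illustrative assumption): a chain of equal segment
    resistances r_seg (ohms); each internal node sinks a DC current (A).
    KCL at node i gives  -V[i-1] + 2*V[i] - V[i+1] = -r_seg * I[i],
    which is solved here with the Thomas (tridiagonal) algorithm.
    Requires at least one internal node.
    """
    n = len(sink_currents)
    rhs = [-r_seg * i for i in sink_currents]
    rhs[0] += vdd            # contribution of the left pad
    rhs[-1] += vdd           # contribution of the right pad
    c = [0.0] * n            # modified super-diagonal coefficients
    d = [0.0] * n            # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for k in range(1, n):    # forward elimination
        denom = 2.0 + c[k - 1]
        c[k] = -1.0 / denom
        d[k] = (rhs[k] + d[k - 1]) / denom
    v = [0.0] * n
    v[-1] = d[-1]
    for k in range(n - 2, -1, -1):   # back substitution
        v[k] = d[k] - c[k] * v[k + 1]
    return v


if __name__ == "__main__":
    VDD = 1.71                       # pad voltage (V), value used in this chapter
    voltages = solve_strap(VDD, r_seg=0.1, sink_currents=[0.02] * 9)
    worst = min(voltages)
    print(f"worst-case node voltage = {worst:.4f} V, "
          f"IR drop = {(VDD - worst) * 1e3:.1f} mV")
```

The worst-case drop appears at the middle of the strap, the point farthest from both pads, which is why a wire-bonded chip with boundary pads needs wide straps across its center.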

The IR drop or voltage drop is estimated for either Vdd or Vss
networks. Let us assume the Vdd worst-case drop is ΔVdd, and the
Vss worst-case drop is ΔVss. So the total worst-case IR drop
across the Vdd and Vss networks is (ΔVdd + ΔVss). Let us assume
the voltage (Vdd) at the inputs of the Vdd pads is Vmax, and the
Vss voltage at the inputs of the Vss pads is 0 V. Therefore, the
lowest voltage Vmin in the chip is estimated based on the
following equation:

$$V_{min} = V_{max} - (\Delta V_{dd} + \Delta V_{ss}) \qquad \text{(2-1)}$$

Figure 2-1 shows the floor plan of the communication chip. The
area is about 15.40 mm × 19.33 mm. This chip is in a wire bonding
package with Vdd and Vss pads on the chip's four boundaries.
The power lines cross the following main regions: Fabric,
ESRAM, standard cells, and routing channels.

Figure 2-2. RC modeling of full-chip power grid. (a) Simplified
full-chip RC model. (b) Unit-cell RC model.

Figure 2-2(a) shows the simplified RC model for the full-chip
power grid, and Figure 2-2(b) shows the unit-cell RC model for the
power grid in each unit region. The entire chip is partitioned into
many finer unit regions to cover the power grid. Each node in the
unit-cell RC model is tied to a current source, which is a DC
current that models the average current consumption of the devices
located in that region.
The most difficult job in the modeling is to estimate the current
consumption, since the current consumption depends on the
applications of the chip and it is very hard to determine with
accuracy in the model before the chip is manufactured.
There are CAD tools on the market to estimate the current
consumption based on test vectors or worst-case assumptions. For a
small unit region, we could apply circuit simulation to the
devices to extract the average current. Figure 2-3 shows the
current models used for each unit region in this example. In
addition, the currents will be different in different regions of
the chip due to different circuit density and switching activity.
The modeling of the current sources can be improved continually
during the chip design stages as more circuits are designed and
more accurate current estimations are obtained. In addition, the
power grid current modeling can be further refined based on
power measurements from a test chip or an earlier version of the
chip. The initial specifications of the power grid are derived
from the simulation model, as shown in Figure 2-2. Figure 2-4
shows the power routing specifications in the fabric tile region
of this communication chip [45].
The simulation result for the power grid model in this chip is
shown in Figure 2-5. The simulation is done for the IR drop
analysis. The worst-case IR drop, based on Figure 2-5, is about
99 mV (1.71 – 1.6112 V). The lowest (Vdd – Vss) voltage across the
chip is about 1.511 V (1.6112 – 0.0998 V).
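
The worst-case figures above can be recovered mechanically from a listing in the format of Figure 2-5 and then combined using Equation (2-1). The sketch below embeds only a subset of the node voltages from Figure 2-5; the parsing helper parse_node_voltages is a hypothetical convenience, and the Vss-side drop of 0.0998 V is taken from the text rather than computed.

```python
VDD_PAD = 1.71       # Vdd at the pad inputs (V), from Figure 2-5
VSS_DROP = 0.0998    # worst-case Vss drop (V), as quoted in the text

# Subset of the node voltages listed in Figure 2-5.
LISTING = """\
+ xi_2865.n_5 = 1.6910
+ xi_427.n_5 = 1.6386
+ xi_432.n_5 = 1.6281
+ xi_2870.n_5 = 1.7029
+ xi_428.n_5 = 1.6112
+ xi_429.n_5 = 1.6120
+ xi_433.n_5 = 1.6261
+ xi_220.n_5 = 1.7008
"""


def parse_node_voltages(listing):
    """Return {node_name: voltage} from lines of the form '+ name = value'."""
    voltages = {}
    for line in listing.splitlines():
        line = line.lstrip("+ ").strip()
        if "=" not in line:
            continue
        name, value = line.split("=")
        voltages[name.strip()] = float(value)
    return voltages


def worst_case(voltages, vdd_pad=VDD_PAD, vss_drop=VSS_DROP):
    """Worst node, its Vdd drop, and Vmin from Equation (2-1)."""
    node = min(voltages, key=voltages.get)
    vdd_drop = vdd_pad - voltages[node]
    v_min = vdd_pad - (vdd_drop + vss_drop)
    return node, vdd_drop, v_min


if __name__ == "__main__":
    node, drop, v_min = worst_case(parse_node_voltages(LISTING))
    print(f"worst node {node}: Vdd drop = {drop * 1e3:.1f} mV, Vmin = {v_min:.4f} V")
```

The same script can be pointed at the full listing to confirm that no other node falls below the one identified here.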
For the communication chip power grid design shown in Figure
2-1, due to the wire bonding package technology in which all the
Vdd and Vss pads are located on the chip boundaries, many power
straps are required across different regions and routing channels.
In our case, the IR drop target is about 100 mV for each Vdd or Vss
network across the chip.
The following specifications are given for the power routing on
the chip for the Vdd network; the Vss network has the same specifi-
cations and equal metal lines in the routing [45].

.SUBCKT tile_pwr
I_T1 T1 0 20.3mA
I_T2 T2 0 40.6mA
I_T3 T3 0 20.3mA
I_T4 T4 0 40.6mA
I_T5 T5 0 40.6mA
I_T6 T6 0 20.3mA
I_T7 T7 0 40.6mA
I_T8 T8 0 20.3mA
I_N_5 N_5 0 81.2mA
.ENDS $ tile_pwr $

.SUBCKT std_pwr
I_T1 T1 0 6.9mA
I_T2 T2 0 13.8mA
I_T3 T3 0 6.9mA
I_T4 T4 0 13.8mA
I_T5 T5 0 13.8mA
I_T6 T6 0 6.9mA
I_T7 T7 0 13.8mA
I_T8 T8 0 6.9mA
I_N_5 N_5 0 27.6mA
.ENDS $ std_pwr $

.SUBCKT esram_pwr
I_T1 T1 0 0.4mA
I_T2 T2 0 0.8mA
I_T3 T3 0 0.4mA
I_T4 T4 0 0.8mA
I_T5 T5 0 0.8mA
I_T6 T6 0 0.4mA
I_T7 T7 0 0.8mA
I_T8 T8 0 0.4mA
I_N_5 N_5 0 1.6mA
.ENDS $ esram_pwr $

Figure 2-3. Current consumption in unit regions.


Figure 2-4. Fabric tile power routing specifications [45].


+ Vdd = 1.7100
+ Vss = 0.
+ xi_2865.n_5 = 1.6910
+ xi_2866.n_5 = 1.6932
+ xi_2868.n_5 = 1.6980
+ xi_218.n_5 = 1.6926
+ xi_219.n_5 = 1.6951
+ xi_2867.n_5 = 1.6890
+ xi_636.n_5 = 1.6945
+ xi_427.n_5 = 1.6386
+ xi_638.n_5 = 1.6846
+ xi_637.n_5 = 1.6817
+ xi_4.n_5 = 1.6659
+ xi_432.n_5 = 1.6281
+ xi_431.n_5 = 1.6504
+ xi_2870.n_5 = 1.7029
+ xi_840.n_5 = 1.6959
+ xi_424.n_5 = 1.6503
+ xi_423.n_5 = 1.7007
+ xi_428.n_5 = 1.6112
+ xi_425.n_5 = 1.6508
+ xi_434.n_5 = 1.6529
+ xi_430.n_5 = 1.6439
+ xi_429.n_5 = 1.6120
+ xi_433.n_5 = 1.6261
+ xi_426.n_5 = 1.6692
+ xi_6339.n_5 = 1.6979
+ xi_220.n_5 = 1.7008

Figure 2-5. Simulation results of node voltages [45].
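
The worst-case IR drop quoted in the text can be recovered directly from the node voltages listed in Figure 2-5. The following is a minimal sketch of that bookkeeping; the dictionary is transcribed from the listing, while the function name and the 100 mV constant name are illustrative choices, not part of the original flow.

```python
# Node voltages (V) transcribed from the DC simulation listing in Figure 2-5.
NODE_VOLTAGES = {
    "xi_2865.n_5": 1.6910, "xi_2866.n_5": 1.6932, "xi_2868.n_5": 1.6980,
    "xi_218.n_5": 1.6926, "xi_219.n_5": 1.6951, "xi_2867.n_5": 1.6890,
    "xi_636.n_5": 1.6945, "xi_427.n_5": 1.6386, "xi_638.n_5": 1.6846,
    "xi_637.n_5": 1.6817, "xi_4.n_5": 1.6659, "xi_432.n_5": 1.6281,
    "xi_431.n_5": 1.6504, "xi_2870.n_5": 1.7029, "xi_840.n_5": 1.6959,
    "xi_424.n_5": 1.6503, "xi_423.n_5": 1.7007, "xi_428.n_5": 1.6112,
    "xi_425.n_5": 1.6508, "xi_434.n_5": 1.6529, "xi_430.n_5": 1.6439,
    "xi_429.n_5": 1.6120, "xi_433.n_5": 1.6261, "xi_426.n_5": 1.6692,
    "xi_6339.n_5": 1.6979, "xi_220.n_5": 1.7008,
}

VDD_INPUT_V = 1.71        # lowest input Vdd at the package ball (from the text)
IR_DROP_TARGET_V = 0.100  # per-network IR drop target (from the text)


def worst_case_drop(node_voltages, vdd):
    """Return the node with the lowest voltage and its drop below vdd."""
    node = min(node_voltages, key=node_voltages.get)
    return node, vdd - node_voltages[node]


worst_node, worst_drop = worst_case_drop(NODE_VOLTAGES, VDD_INPUT_V)
print(f"worst node: {worst_node}, IR drop: {worst_drop * 1e3:.1f} mV")
print("meets target" if worst_drop <= IR_DROP_TARGET_V else "exceeds target")
```

The same check would be repeated for the Vss network, whose rise above ground subtracts from the (Vdd – Vss) voltage seen by the gates.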

• Vertical and horizontal channels between standard cell, fabric,
  and ESRAM regions (metal width):
  M6: 125 µm (vertical channel)
  M5: 125 µm (horizontal channel)
  M4: 125 µm (vertical channel)
  M3: 125 µm (horizontal channel)
• I/O vertical and horizontal channels between core and pads
  (metal width):
  M6: 125 µm (vertical channel)
  M5: 125 µm (horizontal channel)
  M4: 125 µm (vertical channel)
  M3: 125 µm (horizontal channel)
• Vdd pad connection to core power ring (metal width and length):
  Length: 200 µm
  M6: 90 µm
  M2: 90 µm
  Package resistance for each Vdd pad: 40 mΩ (from ball to
  package substrate to pad)
  Input Vdd (lowest) to package Vdd ball: 1.71 V
• Fabric tile Vdd lines (metal width):
  M6: 200 µm total (vertical) inside the tile, 150 µm total
  (vertical) added between tiles
  M5: 550 µm total (horizontal) inside the tile, 150 µm total
  (horizontal) added at sides of tiles
  M4: 230.5 µm total (vertical) inside the tile, 150 µm total
  (vertical) added between tiles
  M3: 150 µm total (vertical) between tiles, 150 µm total
  (horizontal) added on two sides of the tile
  M2: 150 µm total (vertical) between tiles
  M1: 150 µm total (vertical) between tiles, 150 µm total
  (horizontal) added on two sides of the tile
• Standard cell region Vdd lines (metal width):
  M6 completely used for Vdd and Vss vertical straps (total
  M6: ~6.7 mm Vdd, ~6.7 mm Vss)
  M3: 20 µm width straps (horizontal) per 500 µm space
  M2: 20 µm width straps (vertical) per 500 µm space
  M1: inside standard cells (horizontal) about total 330 µm
  in the region
• ESRAM region Vdd lines (Vdd metal width to fill in white
  spaces):
  M5 completely over the 9 SRAM blocks (ESRAM/ARC) (total
  M5: ~7.1 mm Vdd, ~7.1 mm Vss). 0 µm in channels between
  ESRAM blocks
  M4: 30 µm ring (vertical) inside each SRAM block
  M3: 30 µm ring (horizontal) inside each SRAM block

Figure 2-6 shows the complete power grid (Vdd) simulation model.
The node voltages in the simulation by DC analysis are shown in
this figure and the lowest voltage is about 1.32 V at the center of

Figure 2-6. Power grid simulation model [45]. (Schematic of the
current and voltage distributions: Vdd pads at 1.8 V along the chip
boundary, branch currents and average region currents in mA, and
node voltages falling to about 1.32 V near the chip center.)

the chip. The simplified power distribution model allows us to
perform a sensitivity study, changing the metal widths and densities
in the power routing to see the impact on the node voltages.
The resistance and capacitance of the metal lines are varied
according to the given power routing widths of the Vdd network. For
example, Figure 2-7(a) shows the lowest voltage at the center of
the chip for various metal widths of each Vdd or Vss bus in the
routing (horizontal x coordinates in the figure) and various metal
widths extended directly from each Vdd or Vss pad (the separate
curves in the figure).
The buses are widened by using parallel metal buses overlapped in
the M6 and M4 (vertical) or M5 and M3 (horizontal) layers. Figure
2-7(b) further shows the improvement in the lowest voltage obtained
by adding more parallel buses in the M6, M4, and M2 (vertical) and
M5, M3, and M1 (horizontal) routing layers. Comparing Figure 2-7(a)
with Figure 2-7(b), adding more power buses in M2 slightly improves
the lowest node voltages across the chip in our simulation.
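
The trend behind this sensitivity study can be illustrated with a much smaller model than the full grid. The sketch below is not the simulation model of Figure 2-6; it assumes a single strap fed from one end, uniform current sinks, and hypothetical values for the sheet resistance, segment length, and sink currents. It only shows why adding parallel 34 µm lines raises the lowest node voltage.

```python
def far_end_voltage(vdd, n_lines, line_width_um, sheet_res_ohm_sq,
                    seg_len_um, sink_currents_a):
    """Voltage at the last node of a strap fed from one end.

    The strap is split into equal segments, one per current sink.
    Parallel lines add width, so each segment has
    R = Rs * length / (n_lines * line_width).
    """
    seg_res = sheet_res_ohm_sq * seg_len_um / (n_lines * line_width_um)
    voltage = vdd
    downstream = sum(sink_currents_a)  # current entering the first segment
    for sink in sink_currents_a:
        voltage -= downstream * seg_res  # IR drop across this segment
        downstream -= sink               # this node's current is tapped off
    return voltage


# Hypothetical inputs, chosen only to make the trend visible.
SINKS_A = [0.05] * 5
for n in range(1, 6):
    v = far_end_voltage(1.8, n, 34.0, 0.02, 500.0, SINKS_A)
    print(f"{n} x 34 um lines: far-end voltage = {v:.3f} V")
```

In this simple model the drop scales as 1/n with the number of parallel lines, so each added line helps less than the previous one, which is the kind of diminishing return a sensitivity study is meant to expose.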

Figure 2-7. Sensitivity study of power metal widths [45]. (a) Lowest
voltage (V) at the fabric center versus the number of 34 µm lines
per VDD bus (1 to 5 lines, 34 µm to 170 µm), plotted for 30 µm,
60 µm, 90 µm, and 120 µm pad metal widths. (b) Lowest voltage (V)
at the fabric center.

The sensitivity study based on the simplified RC model of the
entire chip power grid is a useful tool during power grid planning.
Further sensitivity studies can be iterated during the power grid
planning stage to answer the following questions: (1) How many Vdd
and Vss pads should there be? (2) Where should these Vdd and Vss
pads be located? (3) Should they be distributed evenly or unevenly?
(4) Should wire bonding be used, or a more advanced packaging
technology that reduces the IR drop?
In the example shown, the power grid obviously uses a huge amount
of layout area, and the chip area is impacted significantly. C4 or
flip-chip technology is therefore definitely a good alternative for
this design.

2.2 PLANNING FOR MICROPROCESSOR CHIPS

The following design example is from a high-performance
microprocessor [46]. Power distribution has always been one of the
critical issues in high-performance microprocessor designs. The
supply voltages and the voltage drop budgets scale down along with
the deep-submicron processes. In addition, the power density of the
die increases significantly in new processors. The C4 package is
used to reduce the voltage drop from the system to the inputs of
the chip.
The decoupling capacitors are used for two purposes in high-
performance microprocessor design. They provide the charge
sharing for nearby switching gates. The local decoupling needs a
very fast response time and this response time is scaled in every
generation of the microprocessors. The decoupling capacitors also
provide the charges for suppressing large full-chip current fluctu-
ations over the power delivery system.
Figure 2-8 shows the voltage drop across the power network
system versus the capacitance on the die. It is claimed that the
area of the on-chip decoupling capacitance is about 12% of the
total die size [46]. The power distribution network acts as a
low-pass filter that suppresses high-frequency noise, ideally
passing only the DC voltage across the system.

Figure 2-8. Power voltage drop versus decoupling capacitance in a
high-performance microprocessor [46].

Based on the series RLC model shown in Figure 1-3, the quality
factor Q is reduced with a large C and a small L (and, per
Equation (2-2), with a larger R). A low Q gives a wide bandwidth,
so the power delivery system does not exhibit a sharp AC resonance.
The quality factor of a series RLC network is determined as follows:

$Q = \frac{\sqrt{L/C}}{R}$ (2-2)
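
Equation (2-2) is easy to evaluate numerically. The sketch below computes Q for a series RLC network and, as a standard companion result that is not stated in the text, the resonant frequency 1/(2π√(LC)). The component values are hypothetical and serve only to show how Q responds when C is increased.

```python
import math


def series_rlc_q(r_ohm, l_henry, c_farad):
    """Quality factor of a series RLC network, Q = sqrt(L/C) / R."""
    return math.sqrt(l_henry / c_farad) / r_ohm


def resonant_frequency_hz(l_henry, c_farad):
    """Resonant frequency of the same network, 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


# Hypothetical values: 1 mOhm, 10 pH, and 100 nF versus 400 nF.
for c in (100e-9, 400e-9):
    q = series_rlc_q(1e-3, 10e-12, c)
    f0 = resonant_frequency_hz(10e-12, c)
    print(f"C = {c * 1e9:.0f} nF: Q = {q:.1f}, f0 = {f0 / 1e6:.0f} MHz")
```

Quadrupling C halves both Q and the resonant frequency in this model, which is one way to see why more on-die decoupling capacitance flattens the supply resonance.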

There are two methods to plan the power grid in high-performance
microprocessors [46]. The first method uses spreadsheet
calculations. It computes the voltage drop for a section of the
power grid, which includes the estimation of voltage drops from the
package to the transistors.
The second method is to build a complete RLC model of the
full-chip and package-level power distribution networks.
Full-system models (die, package, and power supply) are needed in
this accurate model to simulate the voltages across the power
network.
The model is usually simulated overnight, so its complexity is
limited by the simulation time. The results can be used to set the
specifications for the power distribution design on the chip and on
the package. Here are the detailed steps for the C4-package-based
power grid design in the high-performance microprocessor design
[46]:

• Start with basic calculations of the current needed for the
  chip. The current can be scaled from prior products. It can
  also be determined from spreadsheet and hand calculations
  based on the simulation data of individual modules likely to
  be used in the chip.
• Keep in mind that when the power grid is designed, the circuit
  and layout design of each module may not be clear or finalized.
  At this stage, a ballpark estimate is therefore used for the
  power design. Overallocation of the power grid lines is common
  practice because the switching current is usually
  overestimated.
• Build the first full-system model based on an understanding
  of the causes of large voltage drops.
• Propose the first-order solution for the die, package, and
  power supply.
• Develop the initial voltage drop budget and the simulation
  voltage for timing modeling.
• Move toward the detailed design. Determine the exact C4
  bump array. Fine-tune the metal grids over the chip based
  on simulation of the detailed RLC model.
• Improve the power grid model during the project as more
  modules are finalized with circuits and layouts.
• Determine the distance limits of the decoupling capacitors,
  based on simulation of the response time to neighboring
  switching gates, and eventually arrive at the decoupling
  capacitor placements and sizes needed in the design.

The current estimation usually uses the spreadsheet method,
based on power estimates, which have substantial uncertainty
[46]. The module power and area are entered into the spreadsheet,
which produces a map of the power per grid area. The grid area is
fit to the C4 bump service area. The spreadsheet then converts the
power into current and produces the distribution of the current
per bump.
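
The spreadsheet calculation described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the spreadsheet used in [46]: each module's power is assumed to spread evenly over the bump service areas (tiles) it covers, and the module names, powers, tile coordinates, and supply voltage below are hypothetical.

```python
def bump_currents(modules, vcc):
    """Map module power estimates onto a current per bump service area.

    modules: iterable of (name, power_w, tiles), where tiles is the list
             of bump service areas the module covers.
    Returns a dict {tile: current_a}, using I = P / Vcc per tile.
    """
    currents = {}
    for _name, power_w, tiles in modules:
        share_a = power_w / len(tiles) / vcc  # even spread over the tiles
        for tile in tiles:
            currents[tile] = currents.get(tile, 0.0) + share_a
    return currents


# Hypothetical modules on a tiny grid of bump service areas.
MODULES = [
    ("alu", 2.0, [(0, 0), (0, 1)]),
    ("cache", 1.0, [(0, 1)]),
]
for tile, amps in sorted(bump_currents(MODULES, vcc=2.0).items()):
    print(f"bump area {tile}: {amps:.2f} A")
```

Because the power estimates carry substantial uncertainty, a margin factor on each module's power would be a natural extension of this sketch.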
Figure 2-9 shows a detailed M6 grid alignment specification in
this high-performance microprocessor. The M6 grid is designed to
align with the global M4 grid, which gives a regular relationship
between the two layers, enables efficient routing of the top-level
nets, and allows for DRC cleaning in the full-chip assembly. Only
two M5 tracks are needed in this assignment to connect the M6 to
the M4 layers.
To accomplish this, the following M6 specifications are given
for the Vcc/Vss lines:

• The M6 grid pitch is a multiple of the M4 grid pitch and will
  be aligned to the M4 grid on the floor plan.
• The M6 major grid pitch = 538.56 µm, which is 11 times the
  M4 grid pitch of 48.96 µm. The M6 minor grid pitch = 48.96
  µm, which is equal to the M4 grid pitch.
• Each M6 minor grid will exactly overlay the M4 grid under
  it. The M6 grid is placed on the floor plan such that the Y
  offset of both major and minor M6 grid is a multiple of 48.96
  µm.
• 16 Vcc/Vss stripes between the C4 power rows enable the
  relaxation of the decoupling capacitor placement rule, which is
  from 200 µm to 500 µm.
• Each Vcc/Vss strip width is 2.64 µm and the space is 1.68
  µm.
• Unlike M4 and M5, there are no reserved tracks in M6 for
  the global clock distribution.
• The global clock will be routed in signal tracks and will be
  shielded from any adjacent nonclock-related routing by Vcc
  and Vss.
• The global clock routing width is 18.96 µm and space is 1.44
  µm. They should be designed to fit into the M6 grid.

Figure 2-9. Specifications of the power grid on M6 [46].
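
The alignment rules above reduce to integer arithmetic once the pitches are expressed in nanometers, which avoids floating-point rounding on values such as 48.96 µm. The following sketch uses the two pitches given in the text; the helper names and the sample offsets are hypothetical.

```python
M4_PITCH_NM = 48_960         # M4 grid pitch, 48.96 um
M6_MAJOR_PITCH_NM = 538_560  # M6 major grid pitch, 538.56 um


def is_on_m4_grid(y_offset_nm):
    """True if a Y offset is a whole multiple of the M4 grid pitch."""
    return y_offset_nm % M4_PITCH_NM == 0


def snap_to_m4_grid(y_offset_nm):
    """Nearest Y offset that is a whole multiple of the M4 grid pitch."""
    return (y_offset_nm + M4_PITCH_NM // 2) // M4_PITCH_NM * M4_PITCH_NM


# The M6 major pitch is 11 M4 pitches, as stated in the specification.
print(M6_MAJOR_PITCH_NM == 11 * M4_PITCH_NM)
# Hypothetical block offsets: one already aligned, one that needs snapping.
print(is_on_m4_grid(97_920), snap_to_m4_grid(100_000))
```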

The M5 power grid, as shown in Figure 2-10, has the following
specifications:

• The M5 grid pitch is 81.36 µm. The M5 Vcc/Vss width is 6.80
  µm and the space is 1.52 µm.
• The M5 signal pitch is 4.24 µm and there are 12 signal
  tracks between two Vss/Vcc pairs.

The M4 power grid, as shown in Figure 2-11, has the following
specifications:

• The M4 grid pitch is 48.96 µm. The M4 Vcc/Vss width is 2.68
  µm and the space is 1.04 µm.
• The M4 signal pitch is 2.32 µm and there are 16 signal
  tracks between two Vss/Vcc pairs.

In order to plan the metal grid design for the full-chip power net-
work, the package model and decoupling capacitor model have to
be included in the entire AC analysis. A reasonably good AC pow-
er network model must be built. We discussed power network
modeling and characterization in Chapter 1.
In this section, we will examine the power network AC analysis
model from two high-performance microprocessors [47–48]. At the
minimum, the analysis must account for the Vcc source, the moth-
erboard Vcc/Vss traces, the board decoupling capacitors, the CPU
socket, the package pin, the power planes, the on-package decou-
pling capacitances, the CPU I/O, core circuits, and the global clock
distribution network.

Figure 2-10. Specifications of the power grid on M5 [46].

With this AC model, the CPU I/O and core can be toggled to
mimic the execution of the CPU, and the power network perfor-
mance can be measured and analyzed. The AC model from a high-
performance microprocessor is made of three submodels: the
package model, the I/O model, and the CPU core model. These
models are shown in Figure 2-12.
The I/O and core cell models are represented by an array of
circuit models of the global power grid on the M4 and M3 layers
across the chip, with a switching current source tied to each core
cell to model the switching activity of the circuit, as shown in
Figure 2-13. The current model can be a triangular or other current
waveform taken from the circuit simulation of this design. The I/O
model will include the detailed I/O circuits.

Figure 2-11. Specifications of the power grid on M4 [46].

Figure 2-12. Package-level power network modeling [47].

Figure 2-13. I/O and CPU core power network modeling [47].

Since the global clock tree will consume a lot of power, in this
model the detailed model of the clock tree is included for the
whole power network simulation. In addition, the decoupling ca-
pacitors are included in this model, as shown in Figure 2-13.
As shown in Figure 2-13, the total chip is partitioned into 180
core cells in this AC model. Each cell represents about 1150 ×
1000 µm² of area in the chip. Each cell includes the modeling of
M4 Vcc/Vss, M3 Vcc/Vss and the back power plane network. The
on-chip decoupling capacitors are added in the model to simulate
the effectiveness of such capacitors.

The core cell current sources are turned on to consume a total
average current of 8 A. The I/O models can be turned on
simultaneously. The AC waveform of the core cell current is shown
in the cell model. A high current peak is introduced after the
rising edge of the clock and a smaller peak after the falling edge.
In order to understand the impact of on-chip decoupling
capacitors on the power network, it is necessary to break the
on-chip decoupling into two categories: global on-chip decoupling
and local on-chip decoupling. For the global on-chip decoupling
study, the on-chip decoupling capacitor value in each core cell
is set to 0 pF, 100 pF, 300 pF, and 500 pF, representing a total
decoupling of 0 nF, 18 nF, 54 nF, and 90 nF in the active core.
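
The totals quoted above follow directly from the 180-cell partition of the core. A minimal check, with an illustrative function name:

```python
CORE_CELLS = 180  # number of core cells in the AC model (from the text)


def total_decap_nf(per_cell_pf, cells=CORE_CELLS):
    """Total decoupling capacitance in nF for a given per-cell value in pF."""
    return cells * per_cell_pf / 1000.0


for per_cell in (0, 100, 300, 500):
    print(f"{per_cell} pF per cell -> {total_decap_nf(per_cell):.0f} nF total")
```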
Simulations were done in a typical corner with Vcc set to 2.5 V.
The results are shown in Table 2-1. It is obvious that the global
decoupling capacitors give a net improvement in the power network
and the clock distribution. The decoupling capacitor layout density
can be calculated by assuming that the channels take up 34% of the
active core area and that a large percentage of the channel area
can be used to implement the decoupling capacitors.
An investigation of the effect of local on-chip decoupling on the
power network was conducted [47]. A 5 nF decoupling capacitor
was placed in one of the core model cells. It had roughly the same
decoupling density as the 90 nF case in the global study with no
decoupling capacitors in other core model cells. Simulation results
indicate that the effect of the local decoupling is not limited to the
cell where the decoupling capacitors are placed. The surrounding
core cells, both in the M4 and M3 directions, all benefit from this
large decoupling capacitor. The simulation result of this local de-
coupling is shown in Table 2-2.

Table 2-1. Global decoupling capacitor results [47]

Total decoupling   Worst-cycle average   Worst-cycle minimum   Circuit speedup   Worst-case global
capacitance (nF)   Vcc/Vss (V)           Vcc/Vss (V)           (gates)           clock jitter (ps)
0                  2.071                 2.002                 Baseline          96
18                 2.089                 2.046                 1.00%             83
54                 2.122                 2.087                 2.40%             74
90                 2.136                 2.110                 2.75%             59

Table 2-2. Local decoupling capacitor results [47]

Total decoupling   Worst-cycle average   Worst-cycle minimum   Circuit speedup   Worst-case global
capacitance (nF)   Vcc/Vss (V)           Vcc/Vss (V)           (gates)           clock jitter (ps)
0                  2.051                 1.920                 Baseline          96
5                  2.071                 1.959                 0.39%             90

The local decoupling capacitors are extremely useful for
high-switching-current circuits. They prevent the power supply
voltage from dipping around these areas because of sudden large
current flows. For example, decoupling capacitors placed in the
left and right I/O areas, with ~8 nF of total decoupling
capacitance in the I/O regions, have been reported [47].
The center clock spine will also have decoupling capacitors
(~4–5 nF) [47]. It is strongly recommended to have enough decou-
pling capacitors close to each clock buffer in the chip. The global
decoupling is implemented to prevent the overall dip in the power
supply. Therefore, the die, the package, and the board design re-
quire additional decoupling capacitors for high-performance mi-
croprocessors. For example, a minimum of 25 nF decoupling ca-
pacitance is required on the die [48]. However, to improve the
performance of the power supply network, 60 nF or more is rec-
ommended for this processor.
There are usually dead spaces in the layout that are not occupied
by devices, which may comprise up to 10% of the total die area. In
addition, some percentage of the layout area is occupied by the
decoupling capacitors, based on the AC analysis of the power
network. In [48], >1% of the device area is reserved for
decoupling capacitors, and >20% of the total area is channel
area used for decoupling capacitors.
As described in Chapter 1, a decoupling capacitor is an nMOS
device with its gate tied to Vcc and its source and drain tied to
Vss. Each µm² of the gate area will provide ~5.5 fF of capacitance
[48].
There is a set of standard decoupling cells to assist the layout
design of the decoupling capacitors, as shown in Figure 2-14 [48].
These standard cells have the split geometries, with split poly
contacts and split diffusion contacts, as shown in Figure 2-15 [48].
These standard cells have sizes of 2 × 2 µm², 4 × 4 µm², and 6 × 6
µm².
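
The ~5.5 fF/µm² figure makes it straightforward to translate a capacitance requirement into gate area. The sketch below does this for the 25 nF minimum and the 60 nF recommendation mentioned earlier, and for the three standard cell sizes. It assumes, as a simplification, that the whole footprint of a standard cell is gate area; the function names are illustrative.

```python
CAP_DENSITY_FF_PER_UM2 = 5.5  # gate capacitance per unit area (from the text)


def gate_area_um2(cap_nf):
    """Gate area in um^2 needed to provide cap_nf of decoupling capacitance."""
    return cap_nf * 1e6 / CAP_DENSITY_FF_PER_UM2  # 1 nF = 1e6 fF


def cell_cap_ff(side_um):
    """Capacitance of a square decoupling cell, assuming it is all gate area."""
    return side_um * side_um * CAP_DENSITY_FF_PER_UM2


for cap_nf in (25, 60):
    print(f"{cap_nf} nF -> {gate_area_um2(cap_nf) / 1e6:.2f} mm^2 of gate area")
for side in (2, 4, 6):
    print(f"{side} x {side} um^2 cell -> {cell_cap_ff(side):.0f} fF")
```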
Any available space should be filled with decoupling capacitors.
The difficulty lies in routing the filled decoupling capacitors to
the Vcc and Vss lines in the layout. Once the decoupling capacitors
are inserted into the layout, the schematic should be updated with
them so that the layout versus schematic (LVS) check is clean in
the layout verification.
When the schematic is updated, the decoupling capacitors can be
added as a single nMOS device whose total gate area equals the sum
of the gate areas of all the individual decoupling capacitors.

Figure 2-14. Decoupling capacitor standard cells [48].

Figure 2-15. Decoupling capacitor layout [48].

2.3 IBM CAD METHODOLOGY

A model to analyze the on-chip power supply network of another
high-performance microprocessor is described in [49]. A complete
power distribution model is shown in Figure 2-16; it includes the
package-level power distribution network, the on-chip power bus
model, and the equivalent circuits to represent various on-chip
switching activities for each functional block.
Among the three major components in the model, the package-
level power bus model is dominated by the inductance. The on-
chip power bus model is dominated by wire resistances. The
switching circuit model determines the switching currents in the
chip. In addition, the Cdecap and Rdecap in Figure 2-16 show the
equivalent model of the decoupling capacitor.
A package-level power bus model for a single-chip site is
shown in Figure 2-17. The power and ground distribution networks
on the thin-film and ceramic mesh planes are represented with the
equivalent inductance model. In this model, the off-chip
decoupling capacitors, the multilayer ceramic vias, the C4
connections to the chip, and the I/O pins to the board interface
are all included.

Figure 2-16. Equivalent model for power network AC analysis [49].

Figure 2-17. Package model of power distribution [49].

To analyze the on-chip power supply voltage drop, we need to
model the resistance, capacitance, and inductance of each power
bus segment. The nominal resistance at 25°C, R25 = Rs/width, is
determined by each layer’s sheet resistance Rs and the width of
the power line.
At an operating temperature of 85°C, the resistance is increased
according to the following well-known linear model to reflect the
rise in temperature:

$R_{85} = R_{25}\,[1 + T_c(85 - 25)]$ (2-3)

where Tc is the temperature coefficient. An additional 10% is
added to account for the electromigration-induced resistance
increase over the lifetime of the device. The total capacitance of
the power bus consists of three components: the area capacitance,
the fringe capacitance, and the line-to-line capacitance.
The area capacitance is the parallel plate capacitance to the
wiring planes above and below. The fringe capacitance is the ca-
pacitance from the left and right edges of the wire to the wiring
planes above and below. The line-to-line capacitance is the coupling
capacitance between adjacent wires on the same wiring plane.
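
The resistance derating of Equation (2-3), together with the 10% electromigration allowance, can be sketched as below. Two points are assumptions rather than statements from the text: the 10% is applied multiplicatively on top of the temperature-adjusted value, and the temperature coefficient of 0.003 per °C is a hypothetical example value.

```python
def derated_resistance(r25_ohm, temp_coeff_per_c, temp_c=85.0,
                       em_margin=0.10):
    """Resistance at temp_c from Equation (2-3), plus an EM margin.

    Assumption: the electromigration margin multiplies the
    temperature-adjusted resistance.
    """
    r_hot = r25_ohm * (1.0 + temp_coeff_per_c * (temp_c - 25.0))
    return r_hot * (1.0 + em_margin)


# Hypothetical example: 1 ohm at 25 C, Tc = 0.003 per degree C.
print(f"{derated_resistance(1.0, 0.003):.3f} ohm")
```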

The inductance modeling is more complex, since the formula is
not well developed. Therefore, an impedance characteristics
program is usually used to calculate the inductance [50].
An equivalent RLC power bus network can then be generated. In
order to reduce the complexity of full-chip analysis, a
hierarchical approach is used to build the on-chip power bus
model. At the chip level, a global routing grid is generated to
subdivide the chip into global routing cells. All the switching
activities within one global routing cell are lumped together, and
adjacent cells are connected by global power buses.
At the macro level, where local hot spots are located, a finer
grid is generated to model the detailed power bus structure.
Since the power supply voltage in one region can be affected by
the switching activities in the neighboring regions, the finer
detailed power bus model should always be connected to the
adjacent global power bus model to ensure accurate analysis
results.
It has also been confirmed that the excessive power supply drop
ΔV in deep-submicron designs necessitates the use of on-chip
decoupling capacitors in addition to the off-chip decoupling
capacitors. Without any decoupling capacitors, the impedance is as
follows:

$Z = R + j\omega L$ (2-4)

where R is the resistance, L is the inductance of the power
distribution network, and ω is the angular frequency.
Obviously, the inductive term of the impedance increases linearly
with frequency in this case, and a larger ΔV across the power
distribution network will be observed in high-frequency
applications.
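
A short numerical sketch of Equation (2-4) shows how the impedance magnitude grows with frequency once the inductive term dominates. The resistance and inductance values are hypothetical.

```python
import math


def impedance_magnitude(r_ohm, l_henry, freq_hz):
    """|Z| for Z = R + j*omega*L, with omega = 2*pi*f."""
    return abs(complex(r_ohm, 2.0 * math.pi * freq_hz * l_henry))


# Hypothetical network: 10 mOhm and 50 pH.
for f in (1e6, 100e6, 400e6, 1e9):
    z = impedance_magnitude(10e-3, 50e-12, f)
    print(f"f = {f / 1e6:6.0f} MHz: |Z| = {z * 1e3:.1f} mOhm")
```

At low frequency |Z| stays close to R; above the corner frequency R/(2πL) it rises almost in proportion to f.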
To model the switching activities for each functional block, we
build an equivalent circuit, which consists of time-varying
resistors (R1, R2, R3), loading capacitors (C1, C2, C3) and
decoupling capacitors (Cd1, Cd2), as shown in Figure 2-18(a). The
loading capacitance for the equivalent circuit is calculated by
$C_L = P/(0.5\,V_{dd}^2 f)$, where P is the estimated power for the
corresponding area, Vdd is the power supply voltage, and f is the
clock frequency.
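
The loading capacitance formula can be evaluated as follows. The power, supply voltage, and frequency used in the example are hypothetical inputs chosen for round numbers.

```python
def loading_capacitance(power_w, vdd, freq_hz):
    """Equivalent loading capacitance, C_L = P / (0.5 * Vdd^2 * f)."""
    return power_w / (0.5 * vdd ** 2 * freq_hz)


# Hypothetical block: 1 W at Vdd = 2.5 V and f = 400 MHz.
c_load = loading_capacitance(1.0, 2.5, 400e6)
print(f"C_L = {c_load * 1e9:.2f} nF")
```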
When the circuit is switched off, the time-varying resistance
is set to Roff. Since not all circuits switch at the same time, the
circuit represented by the loading capacitance CL can be further
partitioned into subcircuits represented by C1, C2, C3, . . . ,
whose capacitances sum to CL, in order to simulate the distributed
switching activities. The timing and delay patterns of each
subcircuit can be controlled separately by switching R1, R2,
R3, . . . on and off at different times.
If simulation results for the functional blocks are available,
we can replace the nonlinear devices and capacitive loads in the
switching-circuit model with piecewise linear current sources,
which mimic the waveforms of the actual circuits.
A triangular or trapezoidal current waveform, which is simpler
than the piecewise linear current waveform, can be derived by
calculating the total average current Iave and peak current Ipeak
for each macro with the following procedure [49]. The triangular
and trapezoidal current waveforms are shown in Figure 2-18(b).

• Simulate the circuit without loading to obtain the internal
  Iave and Ipeak.
• Calculate the total output capacitance Cout from all output
  nets.
• Iave(total) = Iave(internal) + Cout · Vdd · f, where Vdd is the
  power supply voltage and f is the frequency.
• Ipeak(total) = Ipeak(internal) · n, where n is an empirical ratio
  between the peak current with loading and the peak current
  without loading.
• Calculate the total power using the following formula: P =
  0.5 · Vdd · [Iave(internal) + Cout · Vdd · f · SF], where SF is
  the switching factor of the circuit.

Figure 2-18. Switching model for power network simulation [49].
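
The procedure above can be sketched as a small calculation that ends in a piecewise linear triangular waveform. The first function follows the listed formulas for the total average and peak currents. The second is an assumption of this sketch: it places one triangle per clock period and sets its base width so that the waveform averages to Iave, which is one simple way to build such a waveform but is not spelled out in the text. All numeric inputs are hypothetical.

```python
def total_currents(iave_internal, ipeak_internal, cout, vdd, freq_hz, n):
    """Total average and peak currents for a macro, per the listed steps."""
    iave_total = iave_internal + cout * vdd * freq_hz
    ipeak_total = ipeak_internal * n
    return iave_total, ipeak_total


def triangular_waveform(iave, ipeak, period):
    """PWL points (time, current) for one period of a triangular pulse.

    Assumption: a single triangle per period whose area equals
    iave * period, so its base width is 2 * iave * period / ipeak.
    """
    width = 2.0 * iave * period / ipeak
    if width > period:
        raise ValueError("peak current too low for the requested average")
    return [(0.0, 0.0), (width / 2.0, ipeak), (width, 0.0), (period, 0.0)]


# Hypothetical macro: 10 mA / 50 mA internal, 10 pF output load,
# Vdd = 2.5 V, f = 400 MHz, empirical ratio n = 2.
iave, ipeak = total_currents(0.01, 0.05, 10e-12, 2.5, 400e6, 2.0)
for t, i in triangular_waveform(iave, ipeak, 1.0 / 400e6):
    print(f"t = {t * 1e9:.2f} ns, i = {i * 1e3:.1f} mA")
```

The resulting points can be used as a piecewise linear current source in the switching-circuit model; a trapezoidal variant would add a flat top between the rising and falling edges.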

After the equivalent circuit of each functional block is
generated, it will then be assigned to the global routing cells
where the functional block is located, and connected to the
corresponding points on the power bus. The model for the on-chip
decoupling capacitors consists of three major components: the
n-well capacitor Cnw, the circuit capacitor Cckt, and the
thin-oxide capacitors Cox. The n-well capacitor Cnw is the
reverse-biased PN junction capacitor between the n-well and
p-substrate, as shown in Figure 2-19(a).
Figure 2-19. Decoupling capacitors and RC modeling [49]. (a) n-well
junction capacitor. (b) Thin oxide capacitor.

Figure 2-20. Switching capacitor provided by a nonswitching circuit
[49].

The time constant for Cnw is process-dependent, but can usually
be characterized as between 250 ps and 500 ps. The circuit
capacitor Cckt is derived from the built-in capacitance between Vdd
and ground in nonswitching circuits, as shown in Figure 2-20. The
total capacitance C from nonswitching circuits, the sum of Cp and
Cn, is estimated as [49]:

$C = \frac{P}{V^2 f} \cdot \frac{1 - SF}{SF}$ (2-5)

where P is the power of the circuit, V is the power supply, f is
the frequency, and SF is the switching factor of the circuit.
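
Equation (2-5) can be evaluated with a few lines of code. The example inputs are hypothetical; the range check on SF is a safeguard added for this sketch.

```python
def nonswitching_capacitance(power_w, v_supply, freq_hz, switching_factor):
    """Decoupling capacitance from nonswitching circuits, Equation (2-5)."""
    if not 0.0 < switching_factor <= 1.0:
        raise ValueError("switching factor must be in (0, 1]")
    return (power_w / (v_supply ** 2 * freq_hz)
            * (1.0 - switching_factor) / switching_factor)


# Hypothetical block: 1 W, 2.5 V, 400 MHz, switching factor 0.2.
c_ckt = nonswitching_capacitance(1.0, 2.5, 400e6, 0.2)
print(f"C = {c_ckt * 1e9:.2f} nF")
```

A lower switching factor means more of the circuit is idle in a given cycle, so more of its built-in capacitance is available as decoupling; at SF = 1 the estimate falls to zero.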
The time constant for Cckt is determined by the switching speed
of the device, and it typically ranges from 50 ps to 250 ps. The
thin-oxide capacitor Cox uses the thin oxide layer between the n-
well and the polysilicon gate, as shown in Figure 2-19(b), to pro-
vide the additional decoupling capacitance needed to alleviate the
switching noise problem.
The thin-oxide capacitors are usually added near the drivers,
the high-power macro blocks, and empty spaces in the chip. The
RC time constant ranges from 100 ps to 300 ps [49]. According to
the switching patterns and placement of the functional units, the
equivalent circuits for these units are attached to the power bus
in the corresponding locations [49].
The on-chip power buses are then connected to the power
structures on the package to complete the simulation model.
Without the package model, the power simulation results are not
accurate enough, because constant power supply voltages cannot be
assumed at the I/Os.
To have an accurate chip-level noise analysis result, one must
include a package-level model to account for the voltage drops on
both the package level and chip level.
Signals can switch in certain patterns for a long time, with
different impacts on the power supply voltage. The difficulty
lies in the timing patterns, which must be extracted accurately in
order to simulate the dynamic supply voltage waveforms. In many
cases, the power network is overdesigned to accommodate the
worst-case switching patterns of the circuits or signals in each
functional block.
It is even more important to model the switching patterns
between functional blocks correctly. We are concerned not only with
the steady-state noise of the hot spots, but also with the
transient noise when circuits switch from one power level to
another. To examine the different noises between units in the chip,
the authors in [49] partitioned a chip into nine (3 × 3) regions.
With a power supply voltage of 2.5 V in a 0.25 µm CMOS
technology, the transient voltage and the steady-state voltage are
measured in each region when circuits are switched from 20% idle
power to 100% full power. If flip-chip or C4 technology is used to
provide the on-chip power supply, the minimum steady-state Vdd will
be about 2.37 V [49].
If wire-bonded peripheral I/Os are used instead of C4 technology,
the minimum steady-state Vdd in the center region will drop to
2.0 V. The following section describes a decoupling capacitor
optimization procedure to minimize the sizes and optimize the
locations of the on-chip decoupling capacitors under the
floor-planning constraints [49].
Most designs now require the voltage drop to be within 10% of
Vdd. To achieve this goal, decoupling capacitors are added to mini-
mize the switching noises. For high-performance circuits with a
frequency of 400 MHz or higher, 10% or more chip area is needed
for this purpose. Therefore, it is important to estimate and allo-
cate the area needed for on-chip decoupling capacitors during the
early floor-planning stage.
The floorplanning of decoupling capacitors is restricted by the
topological and ordering constraints of the preplaced functional
blocks. Two directed acyclic graphs are used to represent the ver-
tical and horizontal spaces between adjacent blocks. The edge
weights in the acyclic graphs represent the spaces allocated be-
tween adjacent functional blocks [49].
The optimization of on-chip decoupling capacitors involves an
iteration process between the circuit simulation and floor plan-
ning. Given the specifications and locations of each function block,
the circuit simulator will analyze the switching noise of the power
bus, identify the hot spots, and then determine the amount of
decoupling capacitance Cn needed for each global cell n in the power
grid.
The floorplanner then translates the amount of decoupling ca-
pacitance into physical area An, generates pseudoblocks in each
region, and determines their locations and dimensions. The added
decoupling capacitors will be modeled and simulated in the new
floor plan during the next iteration until the ΔV constraint is satisfied [49].
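The translation from capacitance to area in this loop can be illustrated to first order. The sketch below is not the procedure of [49]; it assumes a simple charge-sharing estimate for the required capacitance and a hypothetical capacitance density for the decoupling structure. The function names and all numerical values are illustrative assumptions.

```python
def required_decap(charge_c, dv_max_v, c_intrinsic_f=0.0):
    """Capacitance (F) needed so that supplying `charge_c` coulombs
    lowers the local supply by at most `dv_max_v` volts.
    Charge-sharing estimate: C_total >= Q / dV; intrinsic capacitance counts."""
    return max(charge_c / dv_max_v - c_intrinsic_f, 0.0)


def decap_area_um2(c_decap_f, cap_density_f_per_um2):
    """Physical area (um^2) for a given capacitance and capacitance density."""
    return c_decap_f / cap_density_f_per_um2


# Hypothetical inputs: 2.5 V supply, 10% noise budget, 1 nC drawn per
# switching event, 1 nF of intrinsic capacitance, 5 fF/um^2 density.
VDD = 2.5
dv = 0.10 * VDD
cn = required_decap(1e-9, dv, c_intrinsic_f=1e-9)
an = decap_area_um2(cn, 5e-15)
print(f"Cn = {cn * 1e9:.2f} nF, An = {an / 1e6:.2f} mm^2")
```

With these assumed inputs the cell needs 3 nF of added capacitance, or about 0.6 mm² of area. In the iterative flow, the simulator would supply the real charge demand and the floorplanner would decide whether that area fits.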

2.4 DESIGN FOR IR DROP

IR drop is a reduction in voltage that occurs on both power and
ground networks in integrated circuits. Narrower metal line
widths cause an increase in the metal resistance and therefore in
the amount of the voltage drop in the chip. The amount of the
voltage drop depends on the effective resistance from the power
pads to the logic gates. The metal-line resistance is formulated in
Equation (1-1).
The voltage drop is calculated by the following formula:

ΔV = Iavg · Reff (2-6)

where Iavg is the average current drawn by the logic gates from
the power lines originating from a Vdd pad. The term IR drop (ΔV)
comes from Equation (2-6): it is the product of the current I and
the effective resistance Reff through which it flows. Based
on Equation (2-6), the methods to reduce the IR drop are
summarized as follows:

• Reducing the current consumption (Iavg) of logic gates. Therefore,
any low-power design techniques on the circuit will
help. Process scaling or capacitance reduction will also help.
• Another alternative is to increase the number of Vdd and Vss
pads in the chip to reduce the current consumption for each
pair of Vdd and Vss pads.
• If the gates along the metal line switch together, the IR drop
can be larger due to the increased Iavg. Therefore, staggering
the switching order of large-current gates helps to reduce
the IR drop.
• Reducing the wire resistance. In this category, widening
the metal lines used for power, or adding more power lines
in the layout, is the obvious choice in the power grid floor
plan.
• In addition, multiple power layers with extremely dense
power lines in the layout are used for high-performance
microprocessors. The wire resistance is also proportional to the
metal line length from the power pads to the logic gates.
• The C4 package technology provides area I/O pads,
which allow short power lines. For this reason, most
high-performance chips currently use the C4 technology instead of
the wire-bonding technology.

Figure 2-21 shows a power supply connected to the chip pads. The
resistors in this figure are the effective resistances in the Vdd and
Vss power grid distribution. R11–R14 are for Vdd and R21–R24 are
for Vss. G1–G4 are for logic gates. When the designers are doing
the transistor-level simulation, the voltages (V1–V4) are assumed
to be equal.
In reality, the power grid resistances matter: the Vdd voltage
is reduced by the current flowing through resistors
R11–R14, whereas the Vss voltage is raised by the
same current flowing through resistors R21–R24. The worst-case
drop between Vdd and Vss at any logic gate G1–G4 should be
estimated as follows:

ΔVmax = ΔVdd + ΔVss = Iavg · RVdd + Iavg · RVss

or

ΔVmax = Iavg (RVdd + RVss) (2-7)

where ΔVmax is the worst-case voltage drop between Vdd and Vss,
ΔVdd is the IR drop of the Vdd distribution, and ΔVss is the IR drop
of the Vss distribution. Iavg is the average current consumption of
the region supplied by one pair of Vdd and Vss pads. The sum (RVdd +
RVss) is the total effective resistance of the Vdd and Vss distribution
lines from the pair of Vdd and Vss pads to their supplied logic
gates.

Figure 2-21. Power grid modeling [51].

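Equation (2-7) can be evaluated directly. The following minimal sketch assumes hypothetical values for the region current and the two effective resistances, and compares the result with the 10%-of-Vdd budget mentioned earlier in this chapter; the function name is illustrative.

```python
def ir_drop(i_avg_a, r_vdd_ohm, r_vss_ohm):
    """Worst-case Vdd-to-Vss drop per Equation (2-7): Iavg * (RVdd + RVss)."""
    return i_avg_a * (r_vdd_ohm + r_vss_ohm)


# Hypothetical region: 0.5 A average current, 0.20 ohm on the Vdd side,
# 0.15 ohm on the Vss side, 2.5 V supply.
VDD = 2.5
I_AVG = 0.5
R_VDD = 0.20
R_VSS = 0.15

dv_max = ir_drop(I_AVG, R_VDD, R_VSS)
budget = 0.10 * VDD
print(f"dVmax = {dv_max * 1e3:.0f} mV, budget = {budget * 1e3:.0f} mV, "
      f"within budget: {dv_max <= budget}")

# Doubling the number of pad pairs halves the current per pair
# (same effective resistances assumed), which halves the drop.
dv_two_pairs = ir_drop(I_AVG / 2, R_VDD, R_VSS)
```

With these assumed numbers the drop is 175 mV against a 250 mV budget. The second call illustrates the pad-count remedy from the list above.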
The IR drop can either have a local or global effect on the chip
performance [51]. The IR drop is a local phenomenon when a
number of gates in close proximity switch at once, causing the IR
drop in that area. A local IR drop can also be caused by a higher
resistance to a specific portion of the grid, such as R14 being
much larger than expected.
The IR drop can also be a global phenomenon when activity in
one region of a chip causes an IR drop in other regions. In a well-
meshed power grid with equally distributed currents, the power
grid typically has a set of equipotential IR drop surfaces that form
concentric circles centered in the middle of the chip. So the center
of the chip usually has the largest IR drop or the lowest supply
voltage, especially in the wire-bonding technology. The IR drop
formula illustrates that it is important to model the switching
patterns of the logic gates in a continuous timing period.
If all the gates switched at once, the local or global drop on a
chip would be extremely large, an example being when the clock
and synchronized elements are switched at the same time. The
peak IR drop is much larger than the average IR drop. The peak
IR drop occurs under the worst-case switching patterns of the logic
gates, which draw the maximum current from the power
network.
The primary cause of the simultaneous-switching IR drop is
gate switching driven by the clock, the bus, or signal pads. When
the global drop is high, but not high enough to cause logic failure,
it may still cause timing failures. The IR drop, which lowers
the supply voltage, slows down the gates.
A 5% IR drop in the supply voltage slows down the
circuit by 10–15% [51]. The circuit performance or speed
paths in the chip greatly depend on the supply voltages. Unfortunately,
the supply voltage across the chip, especially for
large dies such as system-on-chip applications, varies because of the
voltage drops.
Two kinds of well-known voltage drops are discussed in the
literature for the on-chip power supply network: IR drop and di/dt
noise [6, 52]. The IR drop is defined as the average of the peak
currents in the power network multiplied by the effective
resistance from the power supply pads to the center of the chip.
Therefore, in the wire-bonding environment, we can observe the
worst-case IR drop or the lowest supply voltage at the center of the chip.
Flip-chip technology, which provides area pads on the top of the
chip, can ease this problem, and this package technology has become
more popular for chips employing 0.13 µm process technology
because of the IR drop problem.
The following example shows the IR drop problem in a
wire-bonding package technology with five metal layers in a 0.25 µm
process technology. M5 is used entirely for power straps to
reduce the IR drop. Readers can see the severity of the IR drop
problem in the case of the wire-bonding package technology in the
communication chip.
A postlayout simulation methodology is described in [54]. The
methodology has been used in the standard cell design style with a
Vdd and ground mesh structure, as shown in Figure
2-22. The standard cell design style arranges the cells in
regular rows, and the power lines of the standard cells
are butted together in the same row. Circuit simulation of a
set of standard cells is used to understand the parameters that
impact the IR drop.

Figure 2-22. Power mesh on standard cell design [54].

Knowing when and under which conditions the currents to the
standard cells are large, we can devise the following method to
simulate the most severe IR drop.

• Simulate all standard cells and classify them into two classes:
negligible IR drop impact and severe IR drop impact. The
latter class contains the standard cells whose current from
Vdd at the switching points is greater than a
current threshold (e.g., 1 mA).
• Draw the schematic of the Vdd mesh, featuring a metal resistor
for each vertical or horizontal metal segment of the power
mesh. It is recommended that contact and via resistances be
inserted in order to improve accuracy. In the postlayout stage, an
RC extraction tool can be used to get the complete RC network
[59, 60].
• At each cell of the power grid, add a current source to model
the sum of the switching currents of the cells tied to this point.
• Partition the whole chip into smaller areas based on the
current source points in the above modeling. Inside each area,
we can calculate the average current from Vdd to all cells
belonging to the area.
• A worst-case assumption can be made that all the cells in
this area switch at the same time if we do not have the
switching activity patterns. A better way is to determine
the fraction of the cells that switch, based on the switching
activity patterns, and multiply the worst-case total switching
current by this ratio (20%, 30%, or 40%) to get
a more realistic current consumption.
• The estimated average currents are taken as the current
sources. In addition, the current sources can be modeled as
triangular or trapezoidal waveforms, as shown in Figure 2-18.
• Simulate the Vdd or Vss model with the interconnect RC and
current sources. For a large power grid, a fast
circuit simulator is preferred.
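The steps above can be sketched for the DC (resistive) case. The example below is a toy model, not the flow of [54]: it assumes a uniform n × n Vdd mesh with identical segment resistances, peripheral pads holding the perimeter nodes at Vdd (a wire-bonded arrangement), and one identical current source per interior node, scaled by an activity ratio. It solves the node voltages by Gauss-Seidel relaxation; all names and values are illustrative assumptions.

```python
def solve_vdd_mesh(n, r_seg, i_node, vdd, max_sweeps=10000, tol=1e-9):
    """Node voltages of an n x n resistive Vdd mesh (Gauss-Seidel relaxation).

    Perimeter nodes are held at vdd (peripheral pads); every interior node
    sinks i_node amperes; adjacent nodes are joined by r_seg ohms.
    """
    v = [[vdd] * n for _ in range(n)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for r in range(1, n - 1):
            for c in range(1, n - 1):
                # KCL at the node: sum((v_nb - v) / R) = I
                #   =>  v = (sum(v_nb) - I * R) / 4
                new = (v[r - 1][c] + v[r + 1][c] + v[r][c - 1] + v[r][c + 1]
                       - i_node * r_seg) / 4.0
                max_change = max(max_change, abs(new - v[r][c]))
                v[r][c] = new
        if max_change < tol:
            break
    return v


# Hypothetical numbers: 2.5 V supply, 0.5 ohm per mesh segment, 20 mA
# worst-case current per grid cell, 30% of the cells switching together.
VDD = 2.5
R_SEG = 0.5
I_CELL_WORST = 0.020
ACTIVITY = 0.3
N = 9

grid = solve_vdd_mesh(N, R_SEG, ACTIVITY * I_CELL_WORST, VDD)
center = grid[N // 2][N // 2]
worst = min(min(row) for row in grid)
print(f"center Vdd = {center:.4f} V, worst-case IR drop = {VDD - worst:.4f} V")
```

In this peripheral-pad model the lowest voltage appears at the center of the mesh. A production flow would replace the uniform mesh with the extracted RC network and the constant sources with triangular or trapezoidal waveforms.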

The standard cell simulation can be done using stimulus vectors
to model the transient current waveform from Vdd to the
gates. The simulation can be done in different corners of the
process, with different temperatures, supply voltages, and transition
times of the input signals to the standard cells. Figure 2-23
shows the schematic of a few standard cells in the design [54].

Figure 2-23. Schematic of standard cell [54].

2.5 PACKAGE-LEVEL METHODOLOGY

There is a general trend toward higher and higher on-chip di/dt
noise and less and less tolerance for the voltage noise caused by
the fast switching currents (L · di/dt). Many factors are making
the di/dt problem worse: faster transistors, higher current levels,
shorter clock cycles, lower noise tolerance due to lower Vcc levels,
and power-saving techniques. Low-power design techniques actually
degrade the stability of the on-die power supply levels because
large sections of the die get turned on and off at various
times [61].
There are three ways to handle the di/dt: (1) lower the induc-
tance so that V = L · di/dt becomes lower, (2) add decoupling ca-
pacitance in strategic locations, and (3) identify and reduce,
where possible, high sources of di/dt in the design.
In order to get a rough idea of the magnitude of the problem, as
seen from the package pins, let us look at the maximum allowable
package–die loop inductances for several Intel microprocessors, as
shown in Table 2-3 [61]. The L · di/dt noise generated on the chip
is estimated in Table 2-3 as follows:

Vnoise = L · Icc(average)/(0.5 · Tc) (2-8)

where Vnoise is the power supply noise, L is the loop inductance,
Icc is the total current from the
power supply to the circuits of the chip, and Tc is the clock cycle
time.
Table 2-3 calculates the inductance L, using Equation (2-8),
based on the power supply noise upper limit, about 5% of Vdd. If
we know the power supply noise upper limit, the Icc(average) of
the chip, and the clock cycle time or clock frequency, Equation (2-
8) can derive the maximum allowable loop inductance L. This
simple model shows dramatic reduction of the maximum allow-
able inductance in the design for the power network in high-per-
formance microprocessors with increasing frequencies.
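As a check on this model, Equation (2-8) can be solved for L and applied to the rows of Table 2-3. The sketch below takes the current step, the time step (0.5 · Tc), and the noise limit from the table; the function and variable names are illustrative.

```python
def max_loop_inductance(v_noise_limit, delta_i, delta_t):
    """Solve Equation (2-8) for L: L = Vnoise * (0.5 * Tc) / Icc."""
    return v_noise_limit * delta_t / delta_i


# Rows of Table 2-3: (frequency in MHz, current step in A,
# time step 0.5*Tc in s, noise limit in V)
TABLE_2_3 = [
    (100, 3.0, 5.0e-9, 0.165),
    (150, 7.0, 3.3e-9, 0.145),
    (200, 7.0, 2.5e-9, 0.125),
    (300, 7.0, 1.6e-9, 0.090),
    (500, 40.0, 1.0e-9, 0.090),
]

inductances_ph = [round(max_loop_inductance(v, di, dt) * 1e12)
                  for _, di, dt, v in TABLE_2_3]

for (freq, _, _, _), l_ph in zip(TABLE_2_3, inductances_ph):
    print(f"{freq} MHz: L_max = {l_ph} pH")
```

Rounded to the nearest picohenry, the results match the last column of Table 2-3 (275, 68, 45, 21, and 2 pH).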
Given an initial stimulus on the circuit, the power network Vcc
and Vss will try to oscillate 180 degrees out of phase at the ringing
frequency as follows:

$$\omega_0 = \frac{1}{\sqrt{LC}} \tag{2-9}$$

Table 2-3. Maximum allowable inductances to achieve power noise limits in
high-performance microprocessors [61]

Frequency (MHz)   di/dt         Power Noise Limit   Maximum Allowable Inductance, L
100               3 A/5.0 ns    165 mV              275 pH
150               7 A/3.3 ns    145 mV              68 pH
200               7 A/2.5 ns    125 mV              45 pH
300               7 A/1.6 ns    90 mV               21 pH
500               40 A/1.0 ns   90 mV               2 pH

where L is the total power supply loop inductance, and C is the
total Vcc/Vss capacitance, including the decoupling capacitance
inserted in the design. This LC circuit may be forced to oscillate at
the device’s clock frequency if the current levels are high enough.
The magnitude of the oscillation is referred to as the power supply
noise level Vnoise, as shown in Figure 2-24.
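The ringing frequency of Equation (2-9) is easy to evaluate, and it is useful to compare it with the clock frequency. In the sketch below, the loop inductance is taken from the 300 MHz row of Table 2-3, while the 100 nF total capacitance is a hypothetical value chosen only for illustration.

```python
import math


def ringing_omega(l_h, c_f):
    """Angular ringing frequency per Equation (2-9), in rad/s."""
    return 1.0 / math.sqrt(l_h * c_f)


def ringing_frequency_hz(l_h, c_f):
    """Ringing frequency in Hz (omega_0 / 2*pi)."""
    return ringing_omega(l_h, c_f) / (2.0 * math.pi)


L_LOOP = 21e-12    # H, from the 300 MHz row of Table 2-3
C_TOTAL = 100e-9   # F, hypothetical Vcc/Vss capacitance incl. decoupling
F_CLOCK = 300e6    # Hz

f0 = ringing_frequency_hz(L_LOOP, C_TOTAL)
print(f"f0 = {f0 / 1e6:.0f} MHz, clock = {F_CLOCK / 1e6:.0f} MHz")
```

With these assumed values the ringing frequency is roughly 110 MHz. Adding decoupling capacitance lowers it further, while reducing the loop inductance raises it.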
Vnoise is related to many factors in the design, and is mainly
based on the following: (1) the power supply inductances for Vdd
and Vss, (2) the Vdd/Vss on-die capacitance Cdie, (3) the power sup-
ply resistance, and (4) the di/dt numbers from the switching
gates [61].
The following are common techniques in microprocessor circuit
design to reduce the power noise levels.

• Supply the chip with as many Vdd and Vss pins as possible to
reduce the Vcc/Vss loop inductance L.
• Add decoupling capacitors on the die so that the highest
frequency components of di/dt do not need to be supplied by
highly inductive paths through the package and board.
• Try different architecture techniques to limit di/dt, especially
in the case of clock gating for power saving.

The minimum and maximum of Vdd and Vss have performance
and reliability implications, respectively. Timing slowdown may
occur when Vdd/Vss is at a minimum. Timing skews may arise
from some circuits speeding up at high Vdd/Vss, and others slowing
down at low Vdd/Vss. Hot electron operating limits or gate oxide
stress limits may be exceeded during the Vdd/Vss peaks, leading to
reliability failures.

Figure 2-24. LC Oscillation due to power distribution [61].

The timing failures are easy to catch during testing, but relia-
bility problems are not. Low-power design introduces its own set
of problems. An ideal low-power design would result in low values
of Iavg and di/dt. All units on the die would use small currents
when active and very little current when inactive.
Low-power designs for microprocessors can typically result in
reducing the maximum current peaks moderately, reducing the
time spent at peak levels greatly, and causing very low values of
current when the chip is carrying out easy tasks or is in standby
mode [61].
One concern is the use of lower voltage to achieve low power.
Although low power supply voltages help lower the power con-
sumed, higher transistor counts and higher frequency rates usu-
ally keep the Icc relatively high.
Lower Vcc usually means maintaining a lower absolute value of
the voltage noise. Considering the IR drop across the die, power
supply guard bands, and tester guard bands, very little margin is
left for the on-die power supply oscillations. Since the di/dt usu-
ally remains fairly high, large values of decoupling capacitance
are needed.
Decoupling capacitance reduces the power supply noise by
charging up during the steady state and supplying current during
the time at which the circuit switches. Also, decoupling capaci-
tance filters out the differential mode noise on the Vss line from
the power supply by keeping the Vdd and Vss constant.
Some amount of decoupling capacitance exists naturally on the
chip—capacitance of n-wells to the substrate, capacitance of the
circuits that are not switching, capacitance between the Vdd and
Vss traces, etc. A conservative estimate is that only 10–20% of the
circuits on the chip switch at any given time; the remaining cir-
cuits act as decoupling capacitors [61].
Additional decoupling capacitance is usually placed on the die
opportunistically if there exist unutilized areas on the die. One
example of this opportunistic capacitance placement is in the
routing channels with empty spaces. The greater difficulty lies in
routing the Vdd and Vss power grids to these decoupling
capacitors. The need for on-die decoupling capacitance is growing
with the increased operating frequency and increased die size.
A very common example of a large number of drivers switching
simultaneously occurs in wide signal buses. For example, in the
case of a microprocessor, the worst-case scenario is with the
write-back bus on four different ports, for a total of 292 bits
switching simultaneously. Each bit drives a 5 pF load with a CMOS
inverter size of pMOS = 120 µm and nMOS = 78 µm.
Figure 2-25 shows a plot of the maximum supply voltage drop
as a function of the total width of the p-transistors switching
simultaneously from low to high for this write-back bus. The write-back
bus drivers are laid out in a strip 1000 µm tall and 6000 µm long
[61].
The power supply noise is obtained by simulating bus drivers in
a power grid model for this microprocessor, with the resistance
and inductance of lines and decoupling capacitors properly modeled.
In Figure 2-25, the amount of decoupling capacitance (CD) relative to
the total load (CD/Cload) is varied to show the effects on the power
supply noise [61].
Identifying potential noisy areas on the die based on the loca-
tions of wide signal buses is fairly easy. However, it is not an easy
task to find clumps of simultaneously switching random logic
gates on the die. Such clumps are common and can be as bad as
the example given above in terms of injecting noise into the supply
rails. Hot spots can be identified by summing up the driver
sizes (pMOS only or nMOS only) that switch in the same timing
window from adjacent devices in the design.

Figure 2-25. Voltage drop versus driver size and decoupling capacitance [61].

From the above discussion, it is apparent that for low supply
noise, oversized drivers should be avoided. The driver should be
sized just big enough to meet the timing goal. In fact, a slightly
undersized driver may be faster than an oversized driver, because
of higher supply voltage available to the undersized driver during
the switching due to lower supply noise.
A more accurate model for the decoupling capacitor is shown in
Figure 2-26. It takes into account the lossy ESR (equivalent series
resistance) and inductive ESL (equivalent series inductance)
properties, as well as the actual capacitance value.
When used to decouple the Vcc/Vss voltage planes, this model
needs to be modified to add the effective inductance of intercon-
nects (vias) and the plane segment connecting the capacitor to the
load. Inductive effects are most significant in high-speed decoupling
applications. The lossy component, represented by the ESR
of the capacitor, is most significant in decoupling large current
transitions such as those around a high-power voltage regulator.
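The behavior of this capacitor model over frequency can be sketched by treating it as a series R-L-C branch. The component values below are hypothetical and the function names are illustrative; interconnect and plane inductance would simply add to the ESL term.

```python
import math


def decap_impedance(freq_hz, c_f, esr_ohm, esl_h):
    """Complex impedance of a series ESR-ESL-C capacitor model."""
    omega = 2.0 * math.pi * freq_hz
    return complex(esr_ohm, omega * esl_h - 1.0 / (omega * c_f))


def self_resonant_frequency(c_f, esl_h):
    """Frequency (Hz) at which the inductive and capacitive reactances cancel."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * c_f))


# Hypothetical capacitor: 100 nF, 20 mohm ESR, 0.5 nH ESL.
C, ESR, ESL = 100e-9, 0.020, 0.5e-9
f_srf = self_resonant_frequency(C, ESL)

for f in (1e6, f_srf, 1e9):
    z = decap_impedance(f, C, ESR, ESL)
    kind = "capacitive" if z.imag < 0 else "inductive"
    print(f"{f / 1e6:9.1f} MHz: |Z| = {abs(z):.4f} ohm ({kind})")
```

Below the self-resonant frequency the branch behaves as a capacitor. At resonance its impedance falls to the ESR, and above resonance the ESL dominates, which is why inductance matters most for high-speed decoupling.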
With a lower absolute voltage margin and increasing load dy-
namics, the ability of the system power supply to directly power
the CPU becomes quite limited. To avoid excessive IR and induc-
tance-generated voltage drops, a DC/DC converter is used to pow-
er the CPU.
Decoupling capacitance is added on the die, in the package, and
on the printed circuit board, and any solution should consider the
fact that all locations have an influence on the final solution, as
shown in Figure 2-27.
The cost of the decoupling capacitance should be managed carefully.
In addition, the distances between the decoupling capacitors
and the noisy circuits should be optimized on the die, on the
package, and on the board, the same as for the power network design.
If we do not use enough power lines and decoupling capacitance
in the layout, the on-die voltage supply levels will vary too much
and yield will suffer. If we design with excessive amounts,
the layout area or die size will increase significantly, raising
the die cost.

Figure 2-26. Model of decoupling capacitor [61].

Figure 2-27. Hierarchy of power distribution and decoupling capacitance [61].

2.6 SUMMARY

Power network planning is discussed in this chapter. The power
network planning step specifies the metal lines (widths, pitches, etc.)
and decoupling capacitor locations for the power distribution net-
work in the chip. The power network is implemented in each met-
al layer of the die, the package, and the system board.
The design guidelines should be optimized and specified for the
metal lines and decoupling capacitors on the die, the package, and
the system board. In order to achieve that, the complete RLC net-
work is usually constructed for the prelayout metal lines used for
the power network. In addition, the decoupling capacitors are in-
cluded in the modeling, as well as the package models.
High-performance microprocessor design usually employs this
kind of optimization study in order to provide accurate specifica-
tions of the metal lines for the power distribution. The difficulty
in power network modeling is the current waveform modeling to
simulate the transistor switching activity in the design.
Usually, simplified triangular or trapezoidal waveforms are
used to model the switching currents. The timing patterns of the
circuit switching are also important to capture the dynamic (not
the worst-case) current consumption in the design.

3
ELECTROMIGRATION

Electromigration in an IC is the movement of metal ions as the
result of the flow of electrical charges through the metal wires in
the chip, particularly the wires that distribute the power within
the IC. This unwanted ion movement can open up voids in
some parts of the wires and build up metal at other sites.
At the sites from which metal migrates, voids increase the resis-
tance of the affected wire and, in extreme cases, can cause it to
open completely. At the receiving end of the migration path, the
buildup of metal can form hillocks that, in extreme cases, can span
the gap between adjacent wires and cause shorts between them [5].
This chapter is organized into four sections as follows. Section
3.1 discusses the basic definitions and rules for IC electromigra-
tion reliability. Section 3.2 describes the CAD tool used to perform
the electromigration (EM) analysis [65]. Section 3.3 further dis-
cusses the design methodology for reducing IC electromigration
failures. Section 3.4 summarizes the chapter.

3.1 BASIC DEFINITIONS AND EM RULES

The increase in resistance caused by electromigration appears
only after a period of incubation. During this period, wire resistance
remains fairly constant. After that, it increases steadily,
eventually causing the IC to fail. How long incubation lasts is de-
termined by such factors as wire size and composition, as well as
the current density.

Power Distribution Network Design for VLSI, by Qing K. Zhu 75

ISBN 0-471-65720-4 © 2004 John Wiley & Sons, Inc.

In process technologies above 0.18 µm, IC metal lines are usually
formed of aluminum, some alloy of aluminum and silicon, or
aluminum and copper. Pure aluminum has low resistivity, but it
is also the most susceptible to electromigration. Copper, which
has much lower resistivity and better resistance to electromigration,
is usually used in 0.18 µm and below processes.
Information obtained during the accelerated testing of IC chips
is used for predicting the IC mean time to failure (MTF) under
normal operating conditions. The overall relationship of all fac-
tors under DC conditions contributing to MTF can be described
using Black’s equation, as follows [5]:

$$\mathrm{MTF} = A\,J^{-N}\,e^{E_a/(kT)} \tag{3-1}$$

where J = current density, N = the current density exponent, Ea =
activation energy, k = Boltzmann’s constant, T = absolute
temperature, and A = an experimentally determined scaling
factor.
For dynamic operation of a circuit, the equation can be modified
by replacing current density, J, with an effective current density,
Jeff. The factor A is adjusted, based on the experimental measurement
data, to fit the Black’s equation curve to the reliability data from
the measurements.
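One common use of Black's equation is to relate accelerated-test conditions to normal operating conditions. The sketch below computes the ratio of the two MTF values, in which the scaling factor A cancels. The exponent N, the activation energy, and the stress and use conditions are hypothetical values chosen for illustration, not data from [5].

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def black_mtf(a, j, n, ea_ev, temp_k):
    """Mean time to failure per Black's equation (3-1)."""
    return a * j ** (-n) * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))


def acceleration_factor(j_stress, t_stress_k, j_use, t_use_k, n, ea_ev):
    """MTF at use conditions divided by MTF at stress conditions (A cancels)."""
    return (black_mtf(1.0, j_use, n, ea_ev, t_use_k)
            / black_mtf(1.0, j_stress, n, ea_ev, t_stress_k))


# Hypothetical conditions: stress at 2e6 A/cm^2 and 250 C,
# use at 5e5 A/cm^2 and 105 C, with N = 2 and Ea = 0.7 eV.
af = acceleration_factor(j_stress=2e6, t_stress_k=523.15,
                         j_use=5e5, t_use_k=378.15,
                         n=2.0, ea_ev=0.7)
print(f"acceleration factor = {af:.3g}")
```

The same functions show the sensitivity to current density: with N = 2, doubling J at a fixed temperature cuts the MTF to one quarter.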
Although electromigration is a serious problem in submicron
designs, it usually affects only a small portion of the design. In most
cases, it is limited to the power distribution network. The problem
occurs when some power lines are too narrow, or an insufficient
number of contacts or vias have been placed for the large current
density carried.
Current density can be reduced by increasing the size of the
metal lines or adding more contacts between metal lines. Adding
more power lines on metal layers also reduces the current densi-
ties. In general, with more metal lines and vias used in the power
distribution network, the electromigration failures are decreased
while the IR drop is also reduced. In the early design planning
stage, enough power lines should be provided in order to overcome
the IR drop and electromigration problems.
The operating switching time (T0) is defined as the minimum
time between successive current switching operations, as shown
in Figure 3-1. The current operating frequency is defined as fsw =
1/T0. The switching factor (s) is defined as the fraction of operating
cycles over the life of the product during which a given circuit
switches. The average DC current (idc) is calculated based on the
following equation:

$$i_{dc} = \frac{s}{T_0} \int_0^{T_0} i(t)\,dt \tag{3-2}$$

Figure 3-1. Switching time period.

In addition, two more current measurements are used for the
EM analysis: RMS current and peak current. The RMS current is
calculated as follows:

$$i_{rms} = \sqrt{\frac{s}{T_0} \int_0^{T_0} i^2(t)\,dt} \tag{3-3}$$

where i(t) is the current waveform, as shown in Figure 3-1.

The peak current (ipeak) is represented as follows:

ipeak = max[|i(t)|] (3-4)
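The three currents of Equations (3-2) to (3-4) can be computed numerically from a sampled waveform. The sketch below assumes a single triangular current pulse per period, integrated with the trapezoidal rule; the pulse shape, the sample count, and all values are illustrative assumptions.

```python
import math


def triangular_pulse(i_peak, t_width, t0, n):
    """Sample one period [0, t0] of a triangular pulse of base width t_width."""
    dt = t0 / n
    half = t_width / 2.0
    samples = []
    for k in range(n + 1):
        t = k * dt
        if t <= half:
            samples.append(i_peak * t / half)
        elif t <= t_width:
            samples.append(i_peak * (t_width - t) / half)
        else:
            samples.append(0.0)
    return samples, dt


def em_currents(samples, dt, t0, s):
    """Return (idc, irms, ipeak) per Equations (3-2), (3-3), and (3-4)."""
    pairs = list(zip(samples, samples[1:]))
    integral_i = sum((a + b) / 2.0 for a, b in pairs) * dt
    integral_i2 = sum((a * a + b * b) / 2.0 for a, b in pairs) * dt
    i_dc = s / t0 * integral_i
    i_rms = math.sqrt(s / t0 * integral_i2)
    i_peak = max(abs(x) for x in samples)
    return i_dc, i_rms, i_peak


# Hypothetical waveform: 2 mA peak, 0.2 ns wide, 1 ns period, s = 0.3.
T0 = 1e-9
S = 0.3
samples, dt = triangular_pulse(2e-3, 0.2e-9, T0, 1000)
i_dc, i_rms, i_peak = em_currents(samples, dt, T0, S)
print(f"idc = {i_dc * 1e3:.3f} mA, irms = {i_rms * 1e3:.3f} mA, "
      f"ipeak = {i_peak * 1e3:.3f} mA")
```

For a triangular pulse the closed forms are idc = s · Ip · tw/(2 T0) and irms = Ip · sqrt(s · tw/(3 T0)), which give 0.06 mA and about 0.283 mA for these values. The numerical results agree to within the discretization error.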

In the process design manual, the EM rules are specified to protect
against two types of current-density-induced failures:
standard EM and local-heating-enhanced EM. For the standard EM check,
the rules define the maximum DC current Idc, which is a function
of the metal width, such that idc < Idc for every metal line in
the layout.
For the local-heating-enhanced EM, the rules define the maxi-
mum RMS current Irms, such that irms < Irms; and in addition, the
maximum peak current Ipeak is specified such that ipeak < Ipeak. In
the design, for any currents over the metal lines, the above EM
conditions have to be satisfied: idc < Idc, irms < Irms, and ipeak < Ipeak.
The DC average current limit Idc can be translated into the
maximum load capacitance allowed for the drivers so that the
resulting current satisfies idc < Idc. For the typical CMOS situation,
where circuits are used to charge and discharge capacitances, the
following formula may be used to translate Idc limits into the
capacitance limits [64]:

$$C_{max} = \frac{I_{dc}}{s \cdot f_{sw} \cdot V_{dd}} \tag{3-5}$$

where s·fsw is the switching activity and Vdd is the supply voltage.
For the case of pure AC current, the following formula can be
used to translate Irms limits into the capacitance limits [64]:

$$C_{max} = \frac{I_{rms}}{f_{sw} \cdot V_{dd}}\,\theta^{-1} \tag{3-6}$$

where θ is defined differently for square, triangular, and
sinusoidal waveforms based on the switching activity and clock cycle
time.
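Equations (3-5) and (3-6) translate directly into code. In the sketch below, the current limits, switching factor, frequency, and supply voltage are hypothetical. Because θ is not specified here, it is left as a caller-supplied parameter that must come from the waveform-specific definitions in [64].

```python
def c_max_from_idc(i_dc_limit_a, s, f_sw_hz, vdd_v):
    """Maximum load capacitance per Equation (3-5)."""
    return i_dc_limit_a / (s * f_sw_hz * vdd_v)


def c_max_from_irms(i_rms_limit_a, f_sw_hz, vdd_v, theta):
    """Maximum load capacitance per Equation (3-6).
    theta is waveform dependent and must be supplied by the caller."""
    return i_rms_limit_a / (f_sw_hz * vdd_v) / theta


# Hypothetical case: 1 mA DC limit, s = 0.2, 500 MHz, 1.8 V supply.
I_DC_LIMIT = 1e-3
S = 0.2
F_SW = 500e6
VDD = 1.8

c_max = c_max_from_idc(I_DC_LIMIT, S, F_SW, VDD)
print(f"Cmax = {c_max * 1e12:.2f} pF")
```

For these assumed values the driver may charge and discharge at most about 5.56 pF. A larger load would push the average current on the wire above the DC limit.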
Table 3-1 shows the EM limits (the maximum allowable current
rules) for an eight-metal-layer process, where W represents
the drawn metal width of the metal line, and 0.04 is the process
shift for the metal width correction after manufacturing; that
is, W – 0.04 is the actual or effective width of the metal
line after manufacturing [64].
In the case of narrow strips where a single via or contact is per-
mitted along the width, the general rules can be applied by using
two or more contacts or vias along the line length. The general
rules can be applied for the cases of wide lines, provided the max-
imum number of contacts or vias allowed along the width are
used.
For a wide line crossing a wide line, the general rule can be ap-
plied by using the maximum number of contacts or vias to create
an L-shaped array, as shown in Figure 3-2 [64]. Use of redundant
vias is recommended where possible.

Table 3-1. EM current limits (T = 105°C) [64].

Metal Level   Idc (mA)            Irms (mA)                                          Ipeak (mA)
M1            4.05 · (W – 0.04)   √{[235.8 · (W – 0.04)] · [(W – 0.04) + 0.704]}     100 · Idc
M2            3.30 · (W – 0.04)   √{[96.1 · (W – 0.04)] · [(W – 0.04) + 1.408]}      100 · Idc
M3            4.80 · (W – 0.04)   √{[95.1 · (W – 0.04)] · [(W – 0.04) + 2.068]}      100 · Idc
M4            4.80 · (W – 0.04)   √{[69.9 · (W – 0.04)] · [(W – 0.04) + 2.816]}      100 · Idc
M5            7.05 · (W – 0.04)   √{[81.1 · (W – 0.04)] · [(W – 0.04) + 3.564]}      100 · Idc
M6            7.05 · (W – 0.04)   √{[63.2 · (W – 0.04)] · [(W – 0.04) + 4.576]}      100 · Idc
M7            7.05 · (W – 0.04)   √{[51.7 · (W – 0.04)] · [(W – 0.04) + 5.588]}      100 · Idc
M8            7.05 · (W – 0.04)   √{[43.8 · (W – 0.04)] · [(W – 0.04) + 6.600]}      100 · Idc
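The limits of Table 3-1 and the three EM conditions can be combined into a simple check. The sketch below is illustrative only: the coefficients are transcribed from Table 3-1 and should be verified against the process design manual before use, and the function names and the example wire are assumptions.

```python
import math

# Per metal level: (Idc coefficient, Irms coefficient, Irms width offset),
# as read from Table 3-1 (assumed transcription; verify before use).
EM_TABLE = {
    "M1": (4.05, 235.8, 0.704), "M2": (3.30, 96.1, 1.408),
    "M3": (4.80, 95.1, 2.068),  "M4": (4.80, 69.9, 2.816),
    "M5": (7.05, 81.1, 3.564),  "M6": (7.05, 63.2, 4.576),
    "M7": (7.05, 51.7, 5.588),  "M8": (7.05, 43.8, 6.600),
}
W_SHIFT = 0.04  # process shift subtracted from the drawn width


def em_limits_ma(level, w_drawn):
    """Return (Idc, Irms, Ipeak) limits in mA for a drawn width W."""
    k_dc, k_rms, offset = EM_TABLE[level]
    w_eff = w_drawn - W_SHIFT
    i_dc = k_dc * w_eff
    i_rms = math.sqrt(k_rms * w_eff * (w_eff + offset))
    return i_dc, i_rms, 100.0 * i_dc


def em_check(level, w_drawn, i_dc_ma, i_rms_ma, i_peak_ma):
    """True only if idc < Idc, irms < Irms, and ipeak < Ipeak all hold."""
    lim_dc, lim_rms, lim_peak = em_limits_ma(level, w_drawn)
    return i_dc_ma < lim_dc and i_rms_ma < lim_rms and i_peak_ma < lim_peak


# Hypothetical M5 power strap with a drawn width of 1.04 (effective 1.00).
limits = em_limits_ma("M5", 1.04)
ok = em_check("M5", 1.04, i_dc_ma=5.0, i_rms_ma=12.0, i_peak_ma=60.0)
print(f"M5 limits (mA): Idc={limits[0]:.2f}, Irms={limits[1]:.2f}, "
      f"Ipeak={limits[2]:.0f}; passes: {ok}")
```

A wire that fails the DC check can be widened or paralleled with additional power lines, which is the same remedy that reduces the IR drop.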

Figure 3-2. Via array for wide metal lines [64].

The maximum current allowed through all contact and via in-
terfaces is described as follows. The number of contacts and vias
placed across a line, perpendicular to the direction of the current
flow, must be maximized or increased as soon as the line width
permits, per layout rule restrictions, as shown in Figure 3-3.
If multiple vias are used, the allowable current value equals
the allowable current per via times the number of vias. In all
cases, the total current must not exceed the interconnecting metal
line current limit, as shown in Table 3-1.
Multiple vias, or maximum coverage arrays of vias, added down
the metal strip in the direction of the current flow do not increase
the maximum current flow. Only the first via, or row of the via
array, contributes to the current flow due to the nature of the inlaid
copper process [64]. Multiple vias, or arrays of vias, must be used
to increase the reliability in case of blocked or resistive vias.

Figure 3-3. Reliability enhancement for placing multiple vias [64].

3.2 EM ANALYSIS TOOL

We will describe an EM analysis tool from Cadence Design Systems
[65]. High-speed signal nets can suffer from both DC and AC
electromigration problems. The tool uses two separate algorithms
to provide comprehensive electromigration verification for any
signal net. The tool can check nets in large designs without reduc-
ing the data, so it produces accurate results. It is typically used in
high-speed clocks and data nets. It can highlight the areas of con-
cern by using detailed simulation specifically designed to locate
the electromigration issues.
The tool can produce the graphical output that clearly identi-
fies the interconnect metals and vias of concern that violate EM
rules. The tool uses two programs: one program accesses the de-
sign device information, and another program checks for signal
electromigration. The tool requires the following inputs:

• An interconnect database
• Device capacitance data
• Driver-strength database
• Electromigration limits for all design layers

The tool loads the postlayout interconnect database. It checks the
consistency of each net and displays appropriate warnings and
errors. It plots the nets contained in the interconnect database, and
you can select the net for the EM checking. For example, the Vdd
net is selected for further analysis.
Because of the high volume of data in a signal net, the tool uses
filters to determine whether the value of the current density of in-
terest lies within the accepted levels. You can set filter ranges to
obtain a more detailed view of delay or current density distribu-
tion in a design. You can also easily flag critical nets, which are
the nets with high current densities.
You can create a filter for the following analysis types, as
shown in Table 3-2.

Table 3-2. Current density analysis types [65]

Analysis Type                                                      Symbol
RMS current density                                                Jrms
Average current density                                            Javg
Peak current density                                               Jpeak
Electromigration risk in each signal resistor for the signal net   Emrisk
Electromigration risk in a net                                     Emrisknet

Use the filter command to set the data range for the analysis
types. The syntax of this command is as follows:

Filter [jrms|javg|jpeak|jrec|emrisk|emrisknet] [auto|range]
[on|off] | range min_value max_value]

To set filter 4 for emrisk to be in the range of 10 to 50, enter the
following command:

>> filter emrisk 4 10 50
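To make the first three quantities in Table 3-2 concrete, the following Python sketch computes average, RMS, and peak current density from a sampled current waveform. It is not the tool's algorithm; the function name, the uniform sampling, and the wire dimensions are assumptions made for illustration.

```python
import math


def current_density_measures(current_samples_ma, width_um, thickness_um):
    """Return (Javg, Jrms, Jpeak) in mA/um^2 for uniformly sampled current."""
    area_um2 = width_um * thickness_um          # wire cross-section
    j = [i / area_um2 for i in current_samples_ma]
    n = len(j)
    j_avg = sum(j) / n                          # net (DC) component
    j_rms = math.sqrt(sum(x * x for x in j) / n)  # heating-related measure
    j_peak = max(abs(x) for x in j)             # largest magnitude
    return j_avg, j_rms, j_peak


# Hypothetical bidirectional signal current on a 1.0 um x 0.5 um wire.
samples_ma = [0.0, 4.0, 0.0, -4.0]
j_avg, j_rms, j_peak = current_density_measures(samples_ma, 1.0, 0.5)
```

For this symmetric waveform the average density is zero while the RMS and peak values are not, which is why a signal net may need separate checks for each measure.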

There are several methods used in the EM analysis by this tool
[65]:

• Method 1: calculating worst-case values without driver
  information.
• Method 2: calculating worst-case values with driver
  information.
• Method 3: calculating realistic values for Javg, Jpeak and
  Jrms.
• Method 4: calculating Javg, Jrms and Jpeak by using
  user-provided device current data.

Method 1 is the fastest but least accurate method. Using this
method causes the tool to overestimate the current in the net.
Start by calculating the worst-case values for peak, average, and
RMS current density, without the driver information. The tool
can apply the electromigration analysis to all signal nets in a
large design and filter out critical nets with potentially high cur-
rent densities. This analysis can drastically reduce the number of
signal nets requiring further investigation.
The tool assumes that all inputs and bi-directional ports on a
net drive the net in parallel. For each driver, the tool assumes the
maximum driving strength defined by the default driver strength
and default port driver strength environment variables, as well as
a step voltage function at the driver inputs. The default value for
both variables is 10 Ω.
You must set the voltage range and cycle time by using the
activity command. You can improve the quality of the estimate by
adjusting the activity ratio on a per-net basis over consecutive
analyses. The tool reports the nets that fail the current density
check.
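A rough feel for such a worst-case estimate can be had from a simple hand calculation. The Python sketch below is an assumption-laden illustration, not the tool's actual algorithm: it treats all drivers as equal resistances in parallel, takes the peak current of a step input as the voltage swing divided by that parallel resistance, and takes the average current as the net charge per cycle (C times V) divided by the cycle time, scaled by an activity ratio.

```python
def worst_case_estimate(v_swing, n_drivers, c_net_f, cycle_time_s,
                        driver_res_ohm=10.0, activity=1.0):
    """Return (peak, average) current in amperes for one net.

    driver_res_ohm defaults to 10 ohms, the default driver strength
    quoted in the text; every other value is supplied by the caller.
    """
    r_parallel = driver_res_ohm / n_drivers     # drivers act in parallel
    i_peak = v_swing / r_parallel               # step input, t = 0
    i_avg = activity * c_net_f * v_swing / cycle_time_s
    return i_peak, i_avg


# Hypothetical net: 1 V swing, two drivers, 1 pF load, 1 ns cycle.
i_peak, i_avg = worst_case_estimate(1.0, 2, 1e-12, 1e-9)
```

Lowering the activity argument for nets that switch rarely mirrors the per-net activity adjustment described above.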
Method 2 produces more accurate currents in the net than
Method 1 and is almost as fast as Method 1. It requires the driver
information—the direction and strength of the ports driving a
net—to make more realistic current estimates. The tool will calcu-
late the driver data and place it into a file. This type of analysis
uses the same algorithm as Method 1, which enables you to re-
peat the electromigration checks for nets that failed in Method 1,
calculating worst-case values without driver information. When
you specify Method 2, you must use the Load Driver command to
load driver strength information.
Method 3 is the slowest but most accurate method. Consider
using this method only for critical nets, that is, nets that fail
the electromigration analysis with Methods 1 or 2. Using the
driver information (the direction and strength of the ports
driving a net), the tool runs a detailed simulation to determine
Javg, Jpeak, and Jrms in every resistor. This analysis gives the
most accurate results for each resistor in the net but requires a
longer run time than Methods 1 and 2.
When you specify Method 3, you must use the Load Driver
command to load driver strength information. Method 3 only
analyzes the nets that failed in Method 2.
Method 4 performs the electromigration analysis by using pre-
calculated average, RMS, and peak device currents. It can also de-
fine groups of devices that either charge or discharge a net. This
methodology assumes that truly parallel devices, which are tran-
sistors with the drain, gate, source, and bulk connected to the
same node, act together as a unit. It derives a separate solution
for each driver charging or discharging the net.
For devices with no current specified, it assumes a zero current
and does not calculate a separate solution. If you do not specify a
current for any of the devices connected to the net, the tool issues
a warning and performs no analysis. For each transistor, you can
specify two average values as follows:

1. Iavg_ds: the average current flowing from drain to source
2. Iavg_sd: the average current flowing from source to drain

For each charging current, which is provided by a single device or
a group of parallel devices, the tool calculates the average current
by using the charging current and the capacitance of the net.
Another commonly used EM analysis tool, called RailMill, from
Synopsys, Inc., is described as follows [5]. It simulates the power
network of the IC design for EM violations. It will display a color-
coded picture of the circuit showing the current densities in vari-
ous areas. Red color indicates that the current density or electro-
migration limit has been violated. Brown and orange colors are
used for areas in which the values are quite close to the limits.
The yellow portion of the circuit is where the current density val-
ue is one-half of the limit. Finally, the blue, green, and grey colors
correspond to the much lower current densities.
The analysis tool separates the power network from the tran-
sistors by extracting a model of that network from the design lay-
out file [5]. It performs transistor-level simulation of the IC to de-
termine the current in each part of the circuit at each instant. An
input vector set that reflects the operational behavior of the chip
is used for the transistor-level simulation, so the power network
will be simulated using the realistic currents.
Once the transistor-level simulation is completed, the calculat-
ed transistor current and the power network model serve to deter-
mine where electromigration problems exist. A graphical environ-
ment is provided with which users may perform iterative what-if
analysis [5]. The user may make tentative changes as annotations
to the power network, simulate and analyze them, and then
display the resulting problems. Designers may change the width of
specific wires, add more power pad connections, add power lines,
and delete power lines. None of these tentative changes alters the
actual layout.

3.3 FULL-CHIP EM METHODOLOGY

Full-chip reliability has become more critical because advances in
technology are yielding narrower interconnect structures and
high-frequency designs [4]. This combination increases the risk of
electromigration and joule heating failures in designs.
Traditionally, designers are given simple layout design rules
based on the wire current density limits to which they must ad-
here. These limits, set to provide reliability over a broad range of
circuit configurations, can make high-speed designs excessively
large or impossible to design. This indicates that a methodology
for the reliability budgeting is needed to permit engineering
trade-offs between performance, design size, and lifetime.
This methodology must analyze the circuit to obtain realistic
estimates of actual currents flowing in the circuit; apply advanced
electromigration models to wire segments, usually based on
Black’s equation; and perform statistical analysis over the wires
in the design to estimate the probability of the chip operating
properly over its lifetime.
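As a reminder of how Black's equation enters such a flow, the Python sketch below evaluates its usual form, MTF = A * J^(-n) * exp(Ea / (k * T)), and compares two operating points. The prefactor, the exponent n = 2, and the activation energy of 0.7 eV are placeholder assumptions; real values are process-specific and come from the foundry.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mtf(j, temp_k, a=1.0, n=2.0, ea_ev=0.7):
    """Median time to failure from Black's equation (arbitrary units).

    a, n, and ea_ev are illustrative placeholders, not process data.
    """
    return a * j ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def lifetime_gain(j_before, j_after, temp_k, n=2.0):
    """Factor by which MTF changes when current density is changed."""
    return black_mtf(j_after, temp_k, n=n) / black_mtf(j_before, temp_k, n=n)


# Halving the current density (for example by doubling the wire width).
gain = lifetime_gain(j_before=2.0, j_after=1.0, temp_k=378.0)
```

With n = 2, halving the current density yields roughly a fourfold lifetime gain at the same temperature, which is the kind of trade-off that reliability budgeting tries to exploit.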
Due to the complex power grid and distributed blocks in a de-
sign, current flow from a chip pin to the gates cannot be deter-
mined without full-chip analysis. This is one of the reasons why
full-chip electromigration analysis finds design problems. The
current flowing in the chip may be taking a completely unexpect-
ed route through failure-prone portions of the power grid.
A design methodology includes extraction of chip interconnect
data. It uses a static or dynamic full-chip analysis to determine
current loading characteristics at the various device contacts to
the power grid, and modeling mechanisms to report either wire
segments likely to fail or overall chip lifetime characteristics.
Full-chip reliability analysis is part of the power distribution
verification process and can be carried out in parallel with the IR
drop analysis. The ability to apply reliability analysis at the full-
chip level makes it possible to bring product reliability and relia-
bility budgeting into the design cycles. The power grid electromi-
gration analyses require the creation of models for a chip.
Model data is provided for each metal and via layer in the chip.
Each metal-layer model provides the layer thickness and current
density limits for peak, average, and RMS currents through wire
segments. Different foundries provide different rules for thresh-
old checks. Different model parameters may be applied for narrow
wires and wide wires; an additional model parameter defines the
boundary between narrow and wide wires.
Each via and contact model provides the current limits for
peak, average, and RMS currents through each via/contact for
threshold checks. A more detailed analysis of the reliability is
made by calculating the theoretical time to failure and EM-risk
value for each segment, and using the proper failure statistics to
obtain a failure probability as a function of time for the entire
chip. The results are highly dependent on the choice of the
statistical model used.
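One possible way to combine per-segment results into a chip-level figure is sketched below in Python. It rests on assumptions that the text does not prescribe: each segment's time to failure is taken to be lognormally distributed around its median, segments are assumed to fail independently, and the chip is assumed to fail as soon as any one segment fails. A different statistical model would give different numbers, as noted above.

```python
import math


def lognormal_cdf(t, t50, sigma):
    """Probability that a segment with median life t50 has failed by t."""
    z = math.log(t / t50) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def chip_failure_probability(t, segment_t50s, sigma=0.5):
    """Failure probability of a chip that fails when any segment fails."""
    survive = 1.0
    for t50 in segment_t50s:
        survive *= 1.0 - lognormal_cdf(t, t50, sigma)
    return 1.0 - survive


# Hypothetical median lifetimes (years) for three wire segments.
p_chip = chip_failure_probability(10.0, [40.0, 60.0, 25.0])
```

Under these assumptions the chip-level probability is dominated by the segment with the shortest median life, so the weakest segments are the natural targets for fixes.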
When the wire segments with the highest EM risk are identified,
they can be provided to the designer for an engineering change
order (ECO) if the overall chip reliability is below specification.
Improving the reliability of the least reliable elements in the
design will drastically increase the overall MTF of the design.
To fix the electromigration problems, the metal lines are
widened while observing the possible warnings of electromigration
failures. In addition, more vias and contacts are needed between
these wider metal lines on different layers. Figure 3-2 illustrates
this design concept.

3.4 SUMMARY

The power grid of a chip is operated primarily in a pulsed DC
sense with respect to the electromigration analysis. Therefore, the
average current data through the circuit is used to perform elec-
tromigration analysis on the grid. The full-chip transistor analy-
sis tool will provide the average current drawn by each transistor
connected to the power grid.
Each power grid is modeled with the voltage sources at the Vdd
and Vss pins, and the transistor tap currents at the device connec-
tion points. The large linear system is then solved to determine
the precise current flowing through every wire segment and via in
the chip. Once each wire segment current density has been deter-
mined, simple checks are applied to identify those wires in the de-
sign that exceed the thresholds.
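The final threshold check can be sketched as follows in Python. The segment tuple layout, the names, and the limit value are hypothetical; a production flow would read per-layer limits from the foundry model data described in Section 3.3.

```python
def find_em_violations(segments, j_limit_ma_per_um2):
    """Return (name, density) for segments whose density exceeds the limit.

    Each segment is (name, current_ma, width_um, thickness_um).
    """
    violations = []
    for name, current_ma, width_um, thickness_um in segments:
        density = abs(current_ma) / (width_um * thickness_um)
        if density > j_limit_ma_per_um2:
            violations.append((name, density))
    return violations


# Hypothetical solved segment currents on a 0.5 um thick metal layer.
segments = [
    ("seg_a", 1.0, 1.0, 0.5),   # 2 mA/um^2
    ("seg_b", 6.0, 1.0, 0.5),   # 12 mA/um^2, too narrow for its current
    ("seg_c", 6.0, 4.0, 0.5),   # same current on a wider wire: 3 mA/um^2
]
violations = find_em_violations(segments, j_limit_ma_per_um2=10.0)
```

In this made-up data set only the narrow segment carrying 6 mA is flagged, while the same current on a four-times wider wire passes, which matches the wire-widening fix described in Section 3.3.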

4
IR VOLTAGE DROP

A combination of factors is increasing the number of IR drop
failures. In the past, designers of low-frequency circuits
implementing 0.35 μm three-layer metal processes rarely
encountered IR drop or electromigration issues. However, designs
with frequencies above 100 MHz, 0.25 μm processes, or four or
more layers of metal increase the risk of problems. The IR drop is
the voltage drop across the power grid caused by the currents
flowing through the resistance of the power metal lines.
Lower metal resistance or smaller current would reduce the IR
drop, but neither is easy to obtain, due to the scaled-down metal
pitch and increased power consumption. In addition, the tolerance
for IR drop decreases due to the lower supply voltage. Therefore,
we need to address the IR drop problem in the power grid design.
This chapter is partitioned into six sections. Section 4-1 de-
scribes the causes of the IR drop in the deep-submicron chip. Sec-
tion 4-2 gives an overview of the IR drop analysis. Section 4-3 de-
scribes a static IR drop analysis method [51]. Section 4-4
describes a dynamic IR drop analysis method [51]. Section 4-5 dis-
cusses circuit analysis with the IR drop impacts to improve the
accuracy. Section 4-6 summarizes this chapter.

4.1 CAUSES OF IR DROP

The first set of causes is related to the advances in process
technology. Chip feature sizes are decreasing in accordance with
Moore’s famous law. Transistor sizes are decreasing to permit
high-density designs. Transistors require a lower power supply
voltage to avoid device failures. A lower supply voltage means
that lower noise margins or smaller IR drops are permitted on the
power grid.
On the other hand, the ability to design increasingly complex
chips leads to increases in overall size and power dissipation. To
design larger chips, more metal layers are being used to imple-
ment longer signal and power routing. Narrower wires have high-
er resistance than those used in previous technologies.
These higher-resistance wires and higher overall power currents
lead to increases in IR drop or power switching noise. The conflict-
ing design trend toward lower noise margins means that you must
achieve a balance between the inherent power grid noise and the
power supply noise margin to achieve a successful design.
The natural response to balancing the technology trends is to be
more conservative in power grid design by adding more power lines
on the chip layout to reduce the IR drop. But a more conservative
power grid design means sacrificing the chip area, potentially a
high cost. Other trends in processing technology present
additional problems related to the IR drop. Via and contact
resistances are not scaling in accordance with the transistor
scaling. The trend is for these resistances to remain the same or
even increase.
The parallel nature of data, such as that in a 64-bit wide bus,
usually will place the drivers of each bit of the bus together or
near each other. Large drivers in a local area are a common cause
of the local IR drop problem. When the drivers of all those bus bits
switch in the same time window, the local IR drop can cause
logic errors in the circuit.
The clock net in the chip must operate synchronously. Simulta-
neous clock switching introduces a large, instantaneous IR drop
on the power grid. The clocks on some microprocessor chips con-
sume up to 40% of the chip’s total power.
In addition to the clocks, most circuit activity in a design occurs
just beyond the edge of the clock, due to the higher frequency, cre-
ating a high instantaneous power demand after the clock edge.
The overconservative design for timing will also cause IR drop
problems. For example, oversized buffers are usually used in the
critical speed paths, increasing power consumption. Conservative
design for the timing must be balanced with the power grid opti-
mization.
The location and design of I/O pads are a further source of the
IR drop. Simultaneously switching output pads, which always
have a large load, create a strong demand for the power current
and cause IR drop. The placement of I/O pads and power pins is
a difficult design challenge. I/O rings normally have independent
power rings and pads to prevent I/O ring IR drops from affecting
the internal chip power.
Another common source of IR drop problems is the isolation of
block power grids. It is common to isolate the power grids for sen-
sitive blocks in the design, such as phase-locked loops and memo-
ries. However, power grid problems can result from excessive iso-
lation or insufficient isolation.
Excessive isolation occurs when the block’s power grid is so well
isolated that the resistance from the power pad to the block is ex-
cessive, causing the IR drop. Insufficient isolation occurs when
neighboring blocks create an IR drop that will impact the sensi-
tive block. The IR drop in the sense amplifier is of particular con-
cern for the memory design.
Many low-power design methodologies apply techniques to re-
duce the average power dissipation of a block. Techniques such as
gated clocking isolate the power demands to the times of the block
activity. Low power consumption does not necessarily mean low
IR drop. If we design the block power grid on the basis of average
power consumption, undersized power buses will create IR drop
problems.
The last source of IR drop problems is errors in connecting
global power grids to block power grids. It is common to design the
global and local power grids separately. The global power grid is
designed to attach to the block power grid at a large number of
points after the block is finally placed.
Either manual or automatic techniques are used to insert the
vias in the design where the grids are to be connected. Attachment
points can be missed in this process, resulting in a large IR drop
to a portion of the chip.

4.2 OVERVIEW OF IR ANALYSIS

Power grid analysis helps to identify weak spots in the power
network. Weak spots are locations where excessive IR drop or
ground bounce lowers the effective supply voltage. A good power
grid analysis tool not only helps you find such weak spots, but
also helps you understand what you must change to improve
them. The IR drop analysis tool VoltageStorm™ Transistor-Level
PGS from Cadence Design Systems performs this task [51].
It includes static, activity-based, and dynamic analyses. Power
grid analysis involves the extraction of power grid and netlist
data from your chip layout, followed by the analysis of the power
grid and netlist. The interface between circuit netlist analysis and
power grid analysis is implemented by using the tap currents.
In most cases, each tap current is a transistor current, but it
could emanate from a variety of elements. Tap currents are cur-
rents arising from the connection of transistors to the power grid.
Figure 4-1 shows a typical netlist analysis view of transistors con-
nected to a power grid. Each transistor is modeled with a tap cur-
rent, as shown in Figure 4-1(b).
If the netlist has a million transistors connected to the Vdd
wire, data for a million transistors is passed to the power grid
analysis. The power grid analysis includes no information about
any transistors other than those connected to the specific power
grid being analyzed.

Figure 4-1. Tap current model of each transistor tied to Vdd [51].

In a typical digital circuit design, one-third of the total number
of transistors is connected to Vdd, one-third is connected to Vss,
and the rest are connected to internal nodes between logic gates.
Because the primary elements in common between netlist analy-
sis and power grid analysis are the transistors connected to the
power grid, the power grid analysis models the transistor tap cur-
rents as the current sources attached to the power grid.
The tap current data file provides the details for each current
source. Tap current files can be static—only a single current value
is provided for each transistor—or dynamic—a sequence of data
points is provided for each transistor. These currents are used to
perform either a simple steady-state analysis or a dynamic analy-
sis of the power grid.
Transistors have four terminals: drain (D), gate (G), source (S),
and bulk (B). A typical p-type transistor representation is shown
in Figure 4-2. The dominant current in a transistor is IDS, the cur-
rent flowing into the drain through the transistor and out the
source. In a p-type transistor, this current is typically negative. In
the power grid analysis, we are interested not only in IDS, but also
in the total currents flowing from and to the power grid: IS and IB.
The total power current is the sum of these currents over all
transistors. The total current sink to the Vdd line, in Figure 4-2, is
the sum of IS and IB. IS is the sum of several currents in the tran-
sistor, as follows:

IS = –IDS + ICSG + ISB                                    (4-1)

where ICSG is the current charging the transistor capacitance,
CSG, and ISB is the junction current including capacitive current
between the source and the bulk. IB is the sum of several currents
as follows:

IB = –ISB – IDB + ICBG                                    (4-2)

where ICBG is the current charging the transistor capacitance,
CBG, and ISB and IDB are the junction currents. IB contributes to
the total power dissipation for chips over a million transistors in
size, but it is not a primary cause of IR drop.

Figure 4-2. Tap current calculation [51].

In addition, the bulk current flows into either a well or the sub-
strate of the chip and, therefore, usually introduces its load to the
power grid at a location away from the transistor. For these rea-
sons, we will consider only IS in the power grid analysis, although
the analysis also computes IB during the circuit netlist simulation
[51].
The following sections will compare static and dynamic analy-
ses with the power grid analysis tool and show how static analysis
can find problems in the power grid [51]. When it is used effective-
ly and interpreted properly, static analysis with the tool can even
find data-dependent power grid problems. A power grid analysis is
static when it is based on steady-state current modeling of the tap
currents [51].
If we simulate the chip with thousands of test vectors and track
the average current through each transistor connected to Vdd, we
can obtain a long-term average behavior of the Vdd distribution
network in the power grid analysis [51].
The challenge in static power grid analysis is obtaining suffi-
ciently representative tap currents in a small computation time.
An important lesson learned through experience is that meaning-
ful results are obtained in static analysis, even if the currents ap-
plied are not precise.
The goal of static power grid analysis is to find weak spots in
the power distribution network, not necessarily to compute the
exact IR drop to the nearest millivolt. The most common
significant power grid problems stand out in static analysis, even
if the tap currents applied are rough guesses of actual average
currents.
Consider a chip in which one row of cells is only connected on one
end, when it should be connected on both ends to the power net-
work. The result is that the IR drop at one end of the row is much
larger than in all other rows in the chip. Even if the total power
distribution of the chip is unknown, a specific row standing out
above the others is a strong indication of a weak spot.
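The row example can be reproduced with a small static calculation. The Python sketch below models one cell row as a uniform resistive rail with equal segment resistances and ideal supplies at the connected ends; these simplifications, the function name, and the numbers are assumptions for illustration only.

```python
def rail_drops(tap_currents, seg_res, both_ends):
    """Static IR drop at each cell along a rail fed from one or both ends.

    The rail has len(tap_currents) + 1 equal segments; the rightmost
    segment reaches the right-hand supply, which is only connected
    when both_ends is True. Drops are in volts if currents are in
    amperes and seg_res is in ohms.
    """
    n = len(tap_currents)
    # prefix[k]: total tap current drawn to the left of segment k
    prefix = [0.0]
    for current in tap_currents:
        prefix.append(prefix[-1] + current)
    if both_ends:
        # The drops over all n + 1 segments must sum to zero.
        left_current = sum(prefix) / (n + 1)
    else:
        left_current = prefix[-1]   # the left end supplies everything
    drops, v = [], 0.0
    for k in range(n):
        v += seg_res * (left_current - prefix[k])
        drops.append(v)
    return drops


taps = [1.0, 1.0, 1.0, 1.0]          # hypothetical rough tap currents
one_end = rail_drops(taps, 0.1, both_ends=False)
two_ends = rail_drops(taps, 0.1, both_ends=True)
```

Even with rough, equal tap currents, the worst drop of the single-ended row in this sketch is more than three times that of the properly connected row, so the missing connection stands out without precise current data.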
As another example, consider a set of drivers of a long bus, all
powered from a specific location on the power grid. In this case,
an IR drop failure may be data-dependent. However, in the static
power grid analysis each driver is modeled by a larger current be-
cause of either the larger load on the driver or the larger transis-
tors in the driver.
These larger currents in the static analysis highlight the weak
spot without requiring you to simulate the specific vector to acti-
vate all drivers at once. You can still find problems without per-
forming a significant amount of simulation.
If the static analysis, which uses the worst-case switching
activity, overestimates the currents, you can supply current
scaling factors to the analysis. For example, memory cells have
substantially lower activity levels than the other circuitry, so
dedicated current scaling factors are applied to these
low-switching-activity regions.
The average currents assume equal amounts of rising and
falling transitions on nets, so you can ignore currents due to the
Miller capacitances in the transistors. The most significant ad-
vantage of static power grid analysis is that the requirements for
extraction and netlist analyses are much lower, so we can rapidly
perform the static power grid analysis.
If the schedule does not permit a more extensive dynamic analysis
before the tapeout, we can run it while waiting for the chip to
return. Always begin with static analysis before proceeding to
dynamic analysis.
Dynamic power grid analysis uses simulation vectors to simu-
late the chip to obtain a finer solution of the chip’s behavior. Al-
though static analysis is quite effective in finding weak spots in
the power grid, we may want to go to the next level of depth in an-
alyzing the power grid. The dynamic analysis helps to identify
false warnings caused by the temporal variation of currents.
Figures 4-3(a) and (b) show two current distributions along the
timing diagram. Static analysis treats the two cases identically
because the total current consumption is the same, but the real IR
drop is much smaller when the transistor currents do not
coincide in time, which dynamic analysis can reveal from the
current pulses in the timing diagram.
Each part in Figure 4-3 shows the current waveforms for
transistors M1–M6 over a clock cycle. Each transistor has the same
current pulse. The difference between Figures 4-3(a) and (b) is in
the timing of the pulses. In Figure 4-3(a), all pulses occur at once,
and in Figure 4-3(b) they are spread out over the clock cycle. Both
sets of current waveforms yield the same average currents for all
transistors. Depending on the specific characteristics of your chip
design, the case shown in Figure 4-3(a) has a worse IR drop than
the case shown in Figure 4-3(b).

Figure 4-3. Current pulses in different timing patterns [51].
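The contrast between the two timing patterns can be checked numerically. The Python sketch below sums six identical rectangular current pulses over a discretized clock cycle, once aligned and once staggered; the pulse shape, the 12-step cycle, and the unit pulse height are illustrative assumptions rather than data from Figure 4-3.

```python
def total_current(pulse_starts, pulse_width, pulse_height, n_steps):
    """Sum rectangular current pulses over one clock cycle of n_steps."""
    total = [0.0] * n_steps
    for start in pulse_starts:
        for t in range(start, start + pulse_width):
            total[t % n_steps] += pulse_height
    return total


N_STEPS = 12
# Six transistors, each drawing a pulse two time steps wide.
aligned = total_current([0] * 6, 2, 1.0, N_STEPS)
staggered = total_current([0, 2, 4, 6, 8, 10], 2, 1.0, N_STEPS)

avg_aligned = sum(aligned) / N_STEPS
avg_staggered = sum(staggered) / N_STEPS
peak_aligned = max(aligned)
peak_staggered = max(staggered)
```

Both patterns give the same average current, so a static analysis cannot tell them apart, but the aligned pattern has six times the peak current and therefore a much larger instantaneous drop across the same grid resistance.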
You are likely to use dynamic analysis for one of the following
four specific reasons [51]:

1. To simulate a specific test vector
2. To identify which specific test vectors activated an
   implementation weakness
3. To examine the time correlation of tap currents
4. To obtain a better estimate of the realistic magnitude of IR
   drop

The simulation of a specific test vector is common in memory
design to test power grid behavior under specific corner cases. It is
also used when worst-case IR drop test vectors are known before
analysis. Simulation to identify which specific test vector
activates a weakness is useful when you cannot change your power
grid, but you can change your power profile by changing the
vectors using microcode.
Examining the time correlation of tap currents is a valuable
check on the current averaging used in static analysis. A better
estimate of the magnitude of the IR drop is useful when the cost
of fixing a weak spot is high and you want a more precise analysis
before making the decision [51].
Dynamic power grid analysis is a type of transient analysis.
Transient analysis usually assumes automatic timestep control.
However, performing full-chip netlist and power grid analyses
requires many resources. Automatic timestep control tends to
create time steps that are too small for practical use in the power
grid analysis. It is used in netlist analysis, but not in the power
grid analysis. We can manually control the step size used in the
power grid analysis by setting the parameters.

4.3 STATIC ANALYSIS APPROACH

Static power grid analysis requires a minimum of three pieces of
information: a netlist of the circuits, the transistor modeling, and
a power grid of the chip. It is assumed that we know the name of
the power nets extracted from the chip. These nets are often la-
beled as Vdd and Vss, but if you use different names, substitute Vdd
and Vss with those names. To create a complete input circuit
netlist, combine the transistor netlist with the voltage source def-
initions for at least Vdd and Vss.
If we do not identify the power supplies, the tap current data
cannot be created. We must define additional voltage sources if
the chip has additional power inputs.
In addition to the voltage sources, we have to provide the tran-
sistor modeling data used in the circuit netlist. We also need to
create voltage sources for the signal inputs to the chip in the cir-
cuit netlist. The signal voltage sources are required to be piece-
wise linear input sources with a single initial voltage.
If we are going to apply vector-based simulation, the data in the
vector file overrides the piecewise linear data. For static power
grid analysis, the power grid extraction from the chip must only
contain the resistances. In the steady-state analysis, inductances
and capacitances are treated as shorts and opens, respectively, so
the extraction time is reduced by considering only the metal resis-
tances in the power grid.
If you intend to analyze both Vdd and Vss, extract them individ-
ually and do not extract both into a single power grid database.
We know where the power pins are located in the power grid.
Voltage sources are defined in the power grid analysis for the
power input pins at those locations.
A different voltage source is defined for each pad and can
include series resistances and inductances. Different sources
are used because each behaves differently, depending on the
characteristics of the power grid in operation.
The passing of tap currents from the circuit netlist analysis to
the power grid analysis uses the transistor names for
identification. It is therefore critical that the transistor names be
consistent between the circuit netlist and the power grid. The
power network and the circuit netlist are therefore extracted from
the layout with the same extractor. For the static power grid
analysis, neither net names nor transistor names need to match
any schematic that you may have.
Schematic net names are only required if you supply activity
data. One of the challenges in the static power grid analysis is to
obtain an accurate estimate of the distribution of the power
consumption in the chip. A rough power consumption estimate
method has been developed [51]. It uses various forms of data to
derive the distribution of the power consumption in the chip. It
requires a default chip frequency and accepts the following
optional information. The specific clock inputs and the clock
frequencies are used to trace the clock domains in the design.
Any portion of the design not assigned with a specific domain is
assigned the default chip frequency. The gates on the clock distri-
bution network are modeled as operating at the specific clock fre-
quency. We can derive an activity rate for the logic circuit that is
not on the clock tree, based on the clocked domains [51].
We can also specify the activity rates or frequencies of the spe-
cific nets in the design. This information is used to set the known
activity rates of specific nets in the design. It is propagated for-
ward and backward, considering the logic functionality to im-
prove the estimation of the activity rates of nearby logic circuits.
We can further specify the power consumption of specific blocks
in the chip. When the actual power consumption of a specific block
is known, this number is used to automatically scale the distribu-
tion of currents in the block, so that the total estimated power
consumption of the block will match the specified one.
We could specify the power consumption of the entire design.
Once we estimate the various portions of the chip to determine
their power distribution, the specified total power consumption of
the chip is used to scale the estimated currents in the design to
match the specified one.
The following sections explain the power estimation based on
the maximum saturation currents. No capacitance or vectors are
required, so the turnaround time is that of the connectivity and
resistance extraction.
To use this method, we should have an estimate of the total
power dissipation of the design in the form of average current. Ad-
ditional information of the block power consumption also im-
proves the quality of the analysis. The peak saturation current,
IDS, for each transistor connected to the power grid, is calculated
based on the device’s IV curve and the transistor sizes. This peak
saturation current is used as the tap current.
Parameters to scale the VGS and VDS voltages are applied to
compute the saturation current. In addition, certain transistor
configurations result in no IDS current, such as transistors with
a shorted source and drain or transistors whose gates are tied so
that they are turned off.
Although saturation currents may seem to be an inaccurate
method for deriving average currents, they have been quite suc-
cessful in finding weaknesses in power grids. We can scale the sat-
uration currents by the scaling factors, based on the specified pow-
er consumption of blocks. If we know the specific power dissipation
of the blocks or chips, we can scale the currents accordingly.
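As a minimal sketch of this scaling, assume the tap currents are available as a simple mapping from transistor names to peak saturation currents. The function name, the transistor names, and the numbers below are hypothetical and do not come from any tool. One scaling factor is computed so that the sum of the scaled currents matches the specified block power divided by Vdd:

```python
def scale_tap_currents(tap_currents, vdd, target_power):
    """Scale per-transistor currents so their sum equals target_power / vdd."""
    total = sum(tap_currents.values())
    if total == 0:
        raise ValueError("all tap currents are zero; nothing to scale")
    # One common factor for the whole block keeps the relative distribution.
    factor = (target_power / vdd) / total
    scaled = {name: current * factor for name, current in tap_currents.items()}
    return factor, scaled


# Hypothetical peak saturation currents (A) for three transistors in a block.
ipeak = {"M1": 0.004, "M2": 0.001, "M3": 0.003}

# Assumed block power of 3.3 mW at Vdd = 3.3 V, i.e. 1 mA of average current.
factor, scaled = scale_tap_currents(ipeak, vdd=3.3, target_power=0.0033)
print(factor, sum(scaled.values()))
```

The relative sizes of the currents are preserved; only their sum is forced to the known average current, which is why the result still points to the same weak areas of the grid.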
We can also use net activity data to estimate the power for the
power grid analysis. The clock is defined as having an activity ra-
tio of 1.0. The net activity is used in conjunction with the net ca-
pacitance, Vdd voltage, and chip frequency to derive the average
current of the transistors connected to the power grid.
Given these parameters, the average current consumed by a
gate is derived from the following equation:

IAVG = A · CGATE · Vdd · F (4-3)

where A is the activity ratio of the gate, CGATE is the total capaci-
tance of the nets in the gate including the load capacitance, Vdd is
the supply voltage, and F is the chip frequency.
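Equation (4-3) can be evaluated directly. The following sketch uses hypothetical values (a 20 fF gate capacitance, a 3.3 V supply, and a 100 MHz clock) only to show the arithmetic; the function name is an assumption made for illustration:

```python
def average_gate_current(activity, c_gate, vdd, freq):
    """Equation (4-3): I_AVG = A * C_GATE * Vdd * F, result in amperes."""
    return activity * c_gate * vdd * freq


# Hypothetical gate: 20 fF total capacitance, 3.3 V supply, 100 MHz clock.
i_clock = average_gate_current(1.0, 20e-15, 3.3, 100e6)  # clock net, A = 1.0
i_logic = average_gate_current(0.2, 20e-15, 3.3, 100e6)  # logic net, A = 0.2
print(i_clock, i_logic)
```

With these assumed values, the gate on the clock net draws about 6.6 µA on average, and a gate with an activity ratio of 0.2 draws one-fifth of that.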
Computing tap current on the basis of net activity introduces
two additional requirements for the layout extraction: (1) parasitic
capacitances must now be computed for signal nets, and (2)
net names must be back-annotated from the schematic.
We can also derive the average transistor currents by perform-
ing vector-based simulation in the netlist analysis. This can
achieve more accurate average power grid currents by using the
transistor-level simulation of several vectors. This approach is
most commonly used at the block level for electromigration analy-
sis.
The tool uses one test vector input file, performs the simula-
tion over the vectors provided, and tracks the tap currents [51].
It tracks the average, peak, and RMS currents at once and reports
them in three separate tap current files. Each type of tap current
data provides a different perspective on simulation behavior,
allowing us to select the one best suited to the need.
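To make the difference between the three quantities concrete, the following sketch computes them for a short, uniformly sampled current waveform. The sample values and the function name are hypothetical; the tool's own file formats are not modeled here:

```python
import math


def tap_current_stats(samples):
    """Average, peak, and RMS of uniformly sampled tap current values (A)."""
    avg = sum(samples) / len(samples)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return avg, peak, rms


# Hypothetical tap that conducts 2 mA during one quarter of the cycle.
avg, peak, rms = tap_current_stats([0.0, 2.0e-3, 0.0, 0.0])
print(avg, peak, rms)
```

For this assumed waveform, the average is 0.5 mA, the RMS is 1 mA, and the peak is 2 mA, so the three tap current files can differ considerably for the same transistor.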
Computing the average tap current on the basis of the vector
simulation adds one condition for the layout RC extraction: the
parasitic capacitances should now be computed for signal nets. If
the vector input signals are not labeled in the GDSII input, you
must back-annotate the signal names from the schematic to the
netlist extracted from the layout.

4.4 DYNAMIC ANALYSIS APPROACH

Dynamic power grid analysis is the next step to improve the tap
current estimation accuracy, based on the input vectors at I/O
pins. It also includes the time variation of the currents in the
analysis. Rather than averaging the currents as in the static pow-
er analysis, this dynamic power analysis enables us to see the fine
time variation of currents over a clock cycle.
The challenge in the dynamic power analysis is to find the
weaknesses in the power grid by using the minimal amount of
computational time. One technique used in the dynamic analysis is a
form of vector compression applied in the creation of the dynamic
tap current data [51].
The vector compression is intended to create an effective worst-
case IR drop test vector by merging the behavior of many vectors
into a single equivalent vector set. The dynamic power grid analy-
sis introduces two additional requirements for the extraction be-
yond those of static analysis as follows:

1. The parasitic capacitances must be computed for both signal
nets and power nets, which are merged into the power grid
for analysis.
2. In addition, if the vector input signals are not labeled in the
GDSII input, we need to back-annotate signal names to the
netlist extracted from the layout.

The capacitance on the power grid is due to two sources: parasitic
capacitances and transistor capacitances. Parasitic capacitances
are generated from the RC extraction from the layout; and the
transistor capacitances are embedded in the dynamic tap cur-
rents extraction. The decoupling capacitances are also included in
the transistor capacitances.
The dynamic power analysis processes the dynamic current
data as piecewise constant current sources. The recommended
step size is about a single gate delay. Another criterion is to use
one-tenth of the clock cycle as the step size, so if our clock cycle is
10 ns, it will use 1 ns as the step size in the dynamic simulation.
If we want to include the pin inductance, a smaller step size is
required, such as one-twentieth of the clock cycle. The power grid
solution is performed by constructing and solving the massive ma-
trix problem. The size of the matrix describing the resistive con-
nectivity of a full-chip Vdd network can be very large.
The number of resistors in the Vdd network can be the number
of metal layers times the number of transistors in the circuit. In
a process with five or six metal layers, the ratio will be five to
six, so 10 million transistors will have 50 to 60 million resistors in
the network. The matrix to solve in the power grid analysis is huge.
VoltageStorm™ from Cadence Design Systems uses vector
compression to reduce the overall computational time, because
solving a large matrix at each of a large number of time points
can take a very long time [51]. If we simulate the chip for 100
vectors, and select 10 steps per clock cycle in the dynamic
analysis, we may perform 1000 solutions of the power grid. This may
not be practical with existing computational resources. The vector
compression reduces the number of solutions to 10. It is useful
when our objective is to resolve the temporal issues of the static
analysis or to estimate the magnitude of the worst-case IR drop
more precisely.
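The arithmetic behind these numbers can be sketched as follows, assuming that each vector corresponds to one clock cycle, as the example above implies. The function names are hypothetical, and the step-size rule simply encodes the one-tenth and one-twentieth guidelines given earlier:

```python
def grid_solves(num_vectors, steps_per_cycle, compressed=False):
    """Number of power grid matrix solutions needed for a dynamic run."""
    # With vector compression, all vectors collapse into one worst-case cycle.
    cycles = 1 if compressed else num_vectors
    return cycles * steps_per_cycle


def step_size(clock_period, with_pin_inductance=False):
    """Guideline step: 1/10 of the clock cycle, or 1/20 with pin inductance."""
    return clock_period / (20 if with_pin_inductance else 10)


print(grid_solves(100, 10))                    # all vectors, cycle by cycle
print(grid_solves(100, 10, compressed=True))   # one compressed cycle
print(step_size(10e-9), step_size(10e-9, with_pin_inductance=True))
```

For 100 vectors and 10 steps per cycle, the count falls from 1000 solutions to 10, which is the reduction described above.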
The dynamic analysis will introduce the time correlation to the
analysis data. The chips are synchronous in their behavior, with
the clock being the synchronous signal. Introducing the temporal
correlation in the dynamic analysis splits the activity occurring at
different portions in the clock cycle, rather than modeling the
clock cycles as a single time-averaged value. The key is to improve
the resolution in a clock cycle, not across many clock cycles or vec-
tors [51].
For example, assume that we split a 10 ns clock cycle into 10
buckets, B1–B10, of 1 ns each. B1 corresponds to the
interval 0.0–1.0 ns into the clock cycle, B2 to the interval 1.0–2.0
ns, and so on. Figure 4-4 illustrates the current for gate G1 over a
clock cycle [51].
If gate G1 can only switch in the time interval corresponding to
bucket B2 in the dynamic analysis, the current value for gate G1
in buckets B1 and B3–B10 should be 0.0 A in all clock cycles. The
current value in bucket B2 may be 0.0 A in some clock cycles and
nonzero in others. Over the 100 vectors, gate G1 has 1000 buckets
in total: the 10 offsets B1–B10 in each of the 100 vectors.
The second concept in vector compression is that of peak analy-
sis, with a goal of finding the worst-case current. When simulat-
ing to determine the peak current of a transistor, we take the
maximum value found for the transistor currents at each time
point.
We would like to find the worst-case set of current buckets for
gate G1 to create a current waveform for a single worst-case clock
cycle. We want to find the peak over many vectors, but meanwhile
want to preserve the time offsets or buckets in clock cycles.
In summary, the vector compression technique assigns to each
bucket of a specific gate (e.g., G1) the largest current found in
that bucket over all the vectors, and does the same for all gates.
For example, Table 4-1 shows the currents of gate G1 in buckets
B1–B3 for different input vectors; 2.1 mA is the largest current in
bucket B2, so it is used as the B2 current for gate G1.

Figure 4-4. Current distribution to gate G1 [51].

Table 4-1. Current values in buckets for G1 [51]

Test Vector    B1     B2        B3
1              0      1.0 mA    0
2              0      2.1 mA    0
3              0      1.4 mA    0
4              0      0.2 mA    0
5              0      0.0 mA    0

After the assignment is done, we can assign the worst-case
currents to tap currents, so the dynamic simulation is only done for
the number of timing buckets in one clock cycle. Notice that,
because the peak in each bucket is taken over many test vectors,
the tap currents may be overestimated by this technique [51].
The computational time for the power grid analysis is only pro-
portional to the number of buckets instead of the number of buck-
ets multiplied by the number of input vectors. This processing in-
dependently takes place for each tap on the power grid.
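The per-bucket peak selection described above can be sketched in a few lines. The data are the G1 values of Table 4-1 in mA; the function name is hypothetical, and this is only an illustration of the idea, not the tool's implementation:

```python
def compress_vectors(per_vector_buckets):
    """Merge many vectors into one worst-case cycle: per-bucket maximum."""
    # zip(*rows) walks the table column by column, i.e. bucket by bucket.
    return [max(column) for column in zip(*per_vector_buckets)]


# Currents of gate G1 in mA for buckets B1-B3 and test vectors 1-5 (Table 4-1).
g1 = [
    [0.0, 1.0, 0.0],
    [0.0, 2.1, 0.0],
    [0.0, 1.4, 0.0],
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0],
]

worst_case_g1 = compress_vectors(g1)
print(worst_case_g1)  # [0.0, 2.1, 0.0]
```

Because the maximum is taken separately for each tap, two gates whose peaks occur in different vectors both receive their peak values in the compressed cycle, which is why the result may be pessimistic.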
If we have a Vdd power grid with 1 million transistors, we will
have 1 million sets of buckets. Each bucket set is the compression
over all the test vectors. Vector compression tries to synthesize a
worst-case IR drop test vector. How many vectors are required to
obtain a sufficient amount of data? The answer is a function of the
chip and the vectors that we apply.
We probably can obtain high-quality results from as few as one
vector, because the clock is a primary source of the power con-
sumption and IR drop. Simulating one vector may give you in-
sight into the performance of the power grid.
In some situations, we may want to perform power grid analy-
sis for each clock cycle, avoiding vector compression technology
[51]. We can perform the power grid analysis on a single vector in
isolation. This cycle-by-cycle flow is sometimes used in mutually
exclusive circuits, such as memories. This flow is useful when we
have multiple licenses of the VoltageStorm™ tool and we want to
split the power grid analysis into several pieces, so that we can
use a number of machines in parallel [51].
Based on the pieces of the power grid, each piece is analyzed
with the tap currents for transistors, and then we analyze the IR
drop for the entire full-chip power grid. This method is based on
the assumption that the currents of the power grid’s pieces do not
interact for different vectors [51].

4.5 CIRCUIT ANALYSIS WITH IR DROP IMPACTS

Theoretically, the tap currents of the circuit rely on the supply
voltage of the power distribution network. Hence, the power grids’
IR drop analysis and tap current analysis of the circuits interact
with each other. But if we simulate them one by one at each time
point, it may take a long computational time to improve the IR
drop analysis accuracy. The power grid analysis creates an IR
drop report that contains the voltages computed after the static
power grid analysis.
As with the tap current file, the identifier used to pass
data back is the name of the transistor tap. We can repeat the
netlist analysis by using the unique voltages for each transistor
connected to the power grid.
The impact of the feedback is that the tap currents computed
again will have a smaller magnitude than in the first pass, because
the lower power grid voltages reduce the voltage swing of
the gates. The gate delay will also be larger.
But the gate speed should not have much impact on average
currents unless the functionality is altered. In examining the re-
sults of two passes through the loop, we obtain both the worst-
case and best-case IR drop results.
In the first pass, we can observe the worst-case IR drop, because
the tap currents are computed using the ideal Vdd value, the
highest possible voltage, for all nodes in the power grid. In the
second pass, we observe the optimistic IR drop, because the IR
drop values fed back to the circuit netlist analysis reduce all the
tap currents.
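The two passes can be illustrated with a deliberately simplified model: a single tap behind one grid resistance, whose current is assumed to fall linearly with the voltage fed back to it. The linear current model, the function name, and all numbers are assumptions made only for this sketch:

```python
def two_pass_ir_drop(vdd, i_ideal, r_grid):
    """Return (pass-1 drop, pass-2 drop) for one tap behind r_grid ohms."""
    drop1 = i_ideal * r_grid          # pass 1: tap current taken at ideal Vdd
    v_tap = vdd - drop1               # voltage fed back to the netlist analysis
    i_second = i_ideal * v_tap / vdd  # assumed linear reduction of the current
    drop2 = i_second * r_grid         # pass 2: smaller current, smaller drop
    return drop1, drop2


# Hypothetical tap drawing 0.1 A at the ideal 3.3 V through 2 ohms of grid.
drop1, drop2 = two_pass_ir_drop(vdd=3.3, i_ideal=0.1, r_grid=2.0)
# Self-consistent drop for the same linear model, for comparison.
drop_exact = 3.3 - 3.3 / (1 + 0.1 * 2.0 / 3.3)
print(drop1, drop2, drop_exact)
```

Under this assumed model, the self-consistent drop lies between the second-pass and first-pass values, which matches the reading of the two passes as optimistic and worst-case bounds.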
The next step to improve accuracy is to feed the dynamic
analysis results into the circuit analysis. Time-varying power grid
voltages alter the speed of the transistors, which gives the most
accurate performance estimate of the design. In this methodology,
dynamic IR drop data is fed back to the netlist analysis. The
voltage waveforms applied to each transistor tap are now
dynamic rather than static.

4.6 SUMMARY

The analysis of the power grid can be done by either static or
dynamic methods. The static method uses the average current and a
current scaling factor to estimate the static IR
drop. This method is faster and makes it easier to identify the
weak spots in the power distribution grid.
The dynamic method improves the accuracy by simulating the
power grid and tap currents in multiple time points of the clock
cycle, similar to the transient analysis of circuit simulation for
both the circuit netlist and the power grid resistance network.
The dynamic method is not usually practical at full-chip scale
due to the long simulation time required for multiple input test
vectors, but it is worthwhile to try it out using one or two test
vectors or the vector compression technique [51].
The best solution to a given IR drop problem depends on the
type of the IR drop, the chip architecture, the chip layout, and the
functionality. Several approaches can be used in the circuit and
layout design to fix power drop problems as follows [51]:

• Widen metal lines.
• Add or remove straps to redirect the currents.
• Reduce the circuit sizes while meeting the performance targets.
• Add decoupling capacitors to the design.
• Use C4 or flip-chip technology.
• Add more Vdd pads to the design.
• Connect buffers to different power buses.

Using the power grid analysis tool VoltageStorm™ from Cadence
Design Systems, we can make ECOs (engineering change orders)
for the power network design [51]. In addition to the analysis
capability, the tool adds layout exploration capability, which
enables the designer to perform power grid ECOs within the tool.
The designer can remove all the power grid problems from a de-
sign in a single ECO pass. Once the power grid is clean, we could
create a single ECO list, called a change report, which guides the
implementation of the layout modifications necessary to create
the clean power grid design.
The exploration capability in the VoltageStorm™ tool enables
us to quickly experiment with power grid changes, and then
use the static power grid analysis to show the effects of these
modifications on the power grid performance [51].
Because all the ECO changes are implemented within the
framework of the tool, we do not need to reextract and reload the
power grid network with each ECO, thus saving turnaround time.

5
POWER GRID ANALYSIS

This chapter will explain how to use CAD tools to help you find
the weak spots in the power grid. We chose to use the
VoltageStorm™ tool from Cadence Design Systems, Inc., although
several other CAD tools perform similar tasks [51]. Weak spots are
implementation characteristics that result in excessive IR drop,
electromigration stress, or pin currents during the operation of
the chip.
There are three approaches to finding weak spots. The first is
finding the weaknesses in the power grid that are likely to impact
the proper functioning of the chip, regardless of the magnitude of
the impact. This approach is quite common and best addressed by
using static analysis. It is strongly recommended to use static
analysis before dynamic analysis, because static analysis can find
the problems quickly.
The second approach to finding weak spots is to predict a worst-
case IR drop vector on the basis of the limited coverage of the vec-
tors for analysis.
The third approach to finding weak spots is to address the pre-
cise voltage drop on the grid for a specific test vector. This ap-
proach is common in memory design or when the cost of changing
a design is high and we want to determine the exact magnitude of
the IR drop.
This chapter is organized in six sections. Section 5-1 describes
the data preparation and provides an overall introduction to a
CAD tool used for the IR drop analysis. Section 5-2 explains the
steps needed to execute the CAD tool. Section 5-3 discusses
advanced static analysis, such as the activity-based analysis
method. Section 5-4 discusses a dynamic analysis method that is
similar to the transient analysis of the power network. Section 5-5
discusses layout exploration—changing the layout and then re-
submitting the power grid analysis within a CAD framework. Sec-
tion 5-6 summarizes this chapter.

5.1 INTRODUCTION

VoltageStorm™ uses several tools to perform power grid
simulation for weak spot identification [51]. VoltageStorm™ uses
Thunder, which is a netlist analysis tool, and Lightning, which is a
power grid analysis tool [51]. Thunder performs a transistor-level
analysis of the chip. It analyzes the entire transistor netlist by
using the voltage sources, transistor model data, and vectors that
we provide. Notice that the power grid in Thunder is modeled as a
single node.
Lightning performs a detailed analysis of the power grid node
in which the node is represented by its resistor, inductor, and ca-
pacitor components. It only processes the devices connected di-
rectly to the power grid. The power grid is modeled as a linear cir-
cuit with voltage sources representing the power pins and current
sources that represent the transistor taps connected to the power
grid [51].
The power currents flow from the voltage sources through the
grid and out the current taps. Proper analysis requires all three
components: the voltage sources, the resistor–inductor–capacitor
grid, and the tap current sinks [51]. Thunder calculates the
current information for each device connected to the power grid
(Vdd or Vss) and passes these currents, plus device capacitances for
dynamic power grid analysis, to Lightning. The interface between
the tools is based on the names of the devices connected to the
power node.
Thunder passes the current and capacitance data to Lightning
for each transistor. Lightning passes the IR drop data to Thunder
for each transistor. We must prepare a circuit netlist file. The pow-
er sources of the chip are defined in the netlist, which are used by
the tool to identify the gates and to perform the simulation.
The primary inputs of the chip are also defined in the netlist to
which the input vector files can be applied. These inputs should
not be defined as DC sources because Thunder treats DC voltage
sources as power sources to which vectors cannot be applied.
If the input voltage is a constant value over a specific simula-
tion time, define the source as a piecewise linear (PWL) waveform
at a single voltage. The output pin load of the chip has to be de-
fined also. It is important to model the loading on the chip outputs
in both activity-based analysis and vector-based simulation. A
common error in the analysis and simulation is to forget the out-
put pin loading in the circuit file.
In addition, the bidirectional pin loadings of the chip have to be
defined, because they act like outputs during some time intervals.
In the circuit netlist, we need to specify the link path to the tran-
sistor modeling data, which tells the tool how to compute transis-
tor currents and capacitances as a function of the voltage.
We can refer to multiple sets of modeling data for different de-
vices or place data for multiple models into a single directory. We
also need to specify the link path to the transistor netlist and co-
ordinate file used by the power grid analysis tool [51].
The netlist is usually hierarchical, although the flat netlist will
be accepted in the circuit simulation tool. The coordinate data-
base provides the geometric data about the locations of the de-
vices in the layout, which is also used in the graphical output.
Finally, the circuit netlist can contain the path to the parasitic
capacitance database of the signal nets, which can be back-annotated
onto the circuit schematic node names for complete simulation
with parasitic RC effects.
The power grid database is also required by the analysis tool.
The power grid database contains the resistors and capacitors of
the power net in the chip. We also need the locations of the power
supplies that we want to model. When analyzing a block of the de-
sign, we select a number of locations on the periphery of the block
where power will be connected to the block.
To model the package characteristics, we can define a series re-
sistance as well as the inductance for the power pins. The accu-
rate modeling of inductance requires smaller time steps for dy-
namic analysis.
Another database is the estimation of the transistor peak satu-
ration currents, which is also called Ipeak analysis [51]. The
methodology for the current estimation used in the static analysis
is to estimate the average currents throughout the chip by computing
the peak saturation currents for all the transistors connected
to the power grid, followed by some simple scaling of the
currents.
The above method is very simple and assumes that the average
current of a transistor is somehow related to its size. Although
this assumption is not strictly true, the results for a large number
of transistors connected to the power grid highlight the problem
areas of the power grid, if not their exact voltages.
Effectively using the circuit design experience for this analysis,
and filtering and displaying data, will find most problems in the
power grid design. It is claimed that accurate dynamic analysis of
the power grids shows symptoms similar to those found by
static analysis using the above methodology [51].

5.2 EXECUTING THE TOOL

The following steps load the input databases, run the power grid
analysis, and show the IR drop results [51]. The next section will
show a more specific design example for the application of this
CAD flow.

1. Move to the working directory in the UNIX shell:
Shell>> cd $thunder_working_directory
2. Start the Thunder tool to load the netlist and compute the
peak saturation current:
Shell>> thunder
Thunder> load design.ckt
The above step is to load the circuit netlist file.
Thunder> pwrnet ipeak VDD
The above step will compute the peak saturation currents.
The command creates an output file named VDD.ipeak,
which contains the desired peak currents. The currents are
computed for each transistor connected to the Vdd voltage
source, assuming a VGS voltage magnitude of Vdd and a VDS
voltage magnitude of Vdd.
The current estimation is based on the IV curve and tran-
sistor size from the specific SPICE simulation deck in a spe-
cific process technology. We can also scale the resulting cur-
rent to match the realistic average current on one design
example, as shown in the next section.
Notice that the transistors tied to DC voltage sources,
which will turn these transistors off, are assigned 0.0 A cur-
rent. We can exit the Thunder window as follows:
Thunder > quit
An alternative way of using Thunder is to create a com-
mand file, for example: ipeak.cmd, which contains the three
Thunder commands introduced. Then we can use the com-
mand line version of Thunder to perform the analysis by en-
tering this command as follows:
Shell > thunder.tty ipeak.cmd
The above command creates the VDD.ipeak output file,
the same as the pwrnet command’s output. Next, we can run
Lightning by using the following steps, which will load the
power grid RC network modeling, specify the power source
pin locations, load the Ipeak current data file (VDD.ipeak)
generated in the above steps, and then solve the linear net-
work of the power grid modeling with RC and tap currents,
and show the lowest voltage across the full-chip power grid.
3. Move to the working directory containing the Vdd power grid
database:
Shell>> cd $lightning_working_directory
4. Run Lightning:
Shell>> lightning
Lightning > load design_VDD.mhdr
The above step loads the binary power grid database for
Vdd and displays the power grid in the plotter window. The
metal layers are shown in different colors—such as M3 in
purple, M2 in tan and M1 in blue—in the layout display:
Lightning > putvsrc M3 Vsrc1 3.3 24000 17000
Lightning > putvsrc M3 Vsrc2 3.3 284000 17000
Lightning > putvsrc M3 Vsrc3 3.3 24000 12000
Lightning > putvsrc M3 Vsrc4 3.3 284000 12000
The above step is used to define where the power source
pins are placed. Four Vdd pads are placed in the chip bound-
ary. M3 is the power line, which is started from the Vdd in-
put pin. {24000 17000}, etc. are the X–Y locations of the pads
in the layout with the drawn dimensions.
Be sure to name each source differently. After each com-
mand, a white dot corresponding to the placement of the
voltage source in the layout plotter window can be seen.
The voltage is actually placed at the power grid subnode on
M3 near the specified location. Units are in µm in general
for X and Y coordinates, which are in the drawn layout
sizes.
We can also place the voltage sources by using either a
command file or the graphical user interface. One com-
mand file, called vsrc.cmd, can be created to contain the four
putvsrc commands in the above step.
5. We now have the voltage sources placed on the power grid.
We need the tap current model, which was calculated earlier
and is stored in the file named VDD.ipeak. We load the cur-
rents in the Thunder working directory into the Lightning
tool as follows:
Lightning > iload $thunder_working_directory/VDD.ipeak
As a matter of practice, after loading the static current data,
use the following command:
Lightning > scan tc
The above command reports the statistics about the tap currents
loaded. For example, the result may show that the current ranges
from 0.0 A to 0.004 A, with an average current of 0.001 A and a
total current of 17.3 A.
6. We have not scaled the current yet, and the above statistics
are the sum of all transistor saturation peak currents con-
nected to Vdd. In reality, not all transistors will switch at the
same time, and the worst-case total current consumed by the
circuit or drawn from Vdd will be smaller than the summation
of all these transistor currents, depending on how many
transistors switch in the worst-case application.
However, to identify the switching patterns of transistors
in the circuit to Vdd will require a long computational time
using the dynamic simulation of the circuits in multiple tim-
ing steps.
The methodology used in this static analysis is to roughly
estimate the total current, either by measurement or with a
current simulation tool using multiple vectors, and then apply one
currentscalefactor to the estimated peak current sum in
VDD.ipeak by using the following command:
Lightning > setenv currentscalefactor 0.01
The above scaling factor is decided by designers and ap-
plied to the tap current scaling for later power grid analysis.
We can recheck the static current load using the scan com-
mand again as follows:
Lightning > scan tc
The statistics show that the total current is now 0.173 A,
scaled by 0.01 from the original 17.3 A. This 0.173 A current will
be used in the linear circuit analysis of the power grid
model.
7. We can perform the power grid solving to calculate the volt-
ages across the power grid based on the tap currents, voltage
sources, and resistive model of the power grid, which have
been loaded in the above steps:
Lightning > solve
Lightning > scan ir
The solve command prints a number of messages, ending
with the memory utilization of the solve command. The scan
ir command scans the node voltages in the power grid and
reports their range. For example, if the node voltage range is
3.13 V to 3.3 V, the minimum voltage in the power grid is 3.13 V
and the worst-case IR drop is about 0.17 V.
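The kind of linear problem solved in step 7 can be sketched on a very small example: a one-dimensional power rail of equal resistor segments, with ideal voltage sources pinning some nodes to Vdd and a tap current drawn at each node. The sketch below uses a plain Gauss-Seidel iteration chosen for brevity; it is not the algorithm of the Lightning tool, and the function name and all values are hypothetical:

```python
def solve_rail(n_nodes, r_seg, taps, sources, vdd, sweeps=2000):
    """Gauss-Seidel nodal solve of a 1-D resistive rail.

    taps[i] is the current (A) drawn at node i; sources is the set of
    node indices pinned to vdd by ideal voltage sources.
    """
    g = 1.0 / r_seg                   # conductance of one rail segment
    v = [vdd] * n_nodes               # start from the ideal voltage
    for _ in range(sweeps):
        for i in range(n_nodes):
            if i in sources:
                continue              # pinned node, voltage stays at vdd
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n_nodes]
            # Kirchhoff's current law at node i, solved for v[i].
            v[i] = (sum(g * v[j] for j in nbrs) - taps[i]) / (g * len(nbrs))
    return v


taps = [0.01] * 5                     # hypothetical 10 mA tap at each node
one_side = solve_rail(5, 1.0, taps, {0}, 3.3)       # fed from one end
both_sides = solve_rail(5, 1.0, taps, {0, 4}, 3.3)  # fed from both ends
worst_one = 3.3 - min(one_side)
worst_both = 3.3 - min(both_sides)
print(worst_one, worst_both)
```

With these assumed values, the rail fed from one end has a worst-case drop of about 0.10 V at its far end, whereas feeding it from both ends reduces the worst-case drop to about 0.02 V in the middle of the rail.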

The high volume of data processed by VoltageStorm™ makes
normal reporting of every item in the database excessive in size, as
well as nearly impossible to sort. Therefore, VoltageStorm™ uses
a concept called filtering. For example, when screening for exces-
sive IR drop, we may be interested in seeing where the IR drop
exceeds 10% of the Vdd voltage in the power grid. Effective use of
filtering on the various analysis types in VoltageStorm™ gives
significant insight into the behavior of the chip.
Table 5-1 shows the most common analysis types supported by
VoltageStorm™ [51].

Table 5-1. Analysis types in the VoltageStorm™ tool [51]

Analysis Type                                   Option            Abbreviation
Tap current (current drawn by transistors       Tap_Current       Tc
  connected to the power grid)
IR drop (voltage on each node of the            IR_drop           Ir
  power grid)
Resistor current (current through resistors)    Resistor_Current  Rc
Current density (current through a metal        Current_Density   Rj
  divided by the metal area)
Electromigration risk (probable time until      EM_risk           Er
  failure because of electromigration)
Resistor voltage (voltage drop across a         Resistor_Voltage  Rv
  resistor)

Each analysis type can have up to eight filters, and each filter
contains a range. We can establish a set of filter ranges by using
the following methods:

(a) Selecting the auto-filter setting on the command line
(b) Entering the range on the command line
(c) Using a command line
(d) Interactively selecting filters from the dialog box

Figure 5-1 shows a graphical representation of the ranges of
filters. For example, node voltage 3.21 V is located in Range 4. We
can set the IR drop filters both automatically and manually by
using the filter command [51].
For example, the filters are set to assign the plot colors in
such a way that the red color is assigned to the node voltages in
the lowest range from 0.00 V to 3.14 V, the orange color to the
voltages in the medium range from 3.14 V to 3.16 V, and the
green color to the voltages in the high range from 3.16 V to
3.3 V.
Then, the scan command prints the range of voltages from the
IR drop analysis with the number for each filter. The plot
command will create a color-coded plot similar to the thermal plot. It
shows where the circuit has the largest IR drops, as well as the
voltage trends from the power pins to each area of the chip.

Figure 5-1. Filtering ranges of voltage data [51].

A design example shows that the red areas of the plot, which have
the largest IR drop, are located in the central control units on
the left side of the chip [51]. The reason is that the power
routing for much of the control circuitry is provided from only one
side of the block, yielding high IR drops at the isolated end of the
power bus, whereas power is supplied to the top and bottom
blocks from both sides of the block.
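
A rough way to see why a one-sided feed hurts is to model a single power bus as a chain of equal resistors with equal tap currents. The sketch below is an idealized model with hypothetical names and values; it is not taken from the design in [51]. It computes the voltage drop at every tap when the bus is fed from one end and when it is fed from both ends.

```python
def drops_one_sided(n_taps, r_seg, i_tap):
    """Voltage drop at each tap of a bus fed from the left end only."""
    drops = []
    drop = 0.0
    for k in range(1, n_taps + 1):
        # Segment k carries the current of taps k..n_taps.
        drop += r_seg * i_tap * (n_taps - k + 1)
        drops.append(drop)
    return drops


def drops_two_sided(n_taps, r_seg, i_tap):
    """Voltage drop at each tap of a bus fed from both ends.

    Uses superposition: a tap at node k, on a chain of `total` segments
    between two equal sources, lowers node j by
    r * i * min(j, k) * (total - max(j, k)) / total.
    """
    total = n_taps + 1  # number of segments between the two pads
    drops = []
    for j in range(1, n_taps + 1):
        drop = 0.0
        for k in range(1, n_taps + 1):
            near, far = min(j, k), max(j, k)
            drop += r_seg * i_tap * near * (total - far) / total
        drops.append(drop)
    return drops


if __name__ == "__main__":
    # Hypothetical bus: 10 taps, 0.05 ohm per segment, 10 mA per tap.
    one = drops_one_sided(10, 0.05, 0.010)
    two = drops_two_sided(10, 0.05, 0.010)
    print(f"one-sided worst drop: {max(one) * 1000:.1f} mV")
    print(f"two-sided worst drop: {max(two) * 1000:.1f} mV")
```

In this model the worst drop of the one-sided bus sits at the isolated end, while the two-sided bus has its worst drop in the middle and at a much lower value.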
We can also use the VoltageStorm™ tool to view the geometric
distribution of tap currents. We can use the scan tc command to
view the total currents in the design. We can also create filters to
plot the tap currents. Before doing this, we set the analysis
type to Tap_Current, as shown in Table 5-1. One design example
shows that the larger currents are located in the data path units
of the chip and the smaller currents in the control units.
The distribution of resistor current values has different
characteristics than that of node voltages. The maximum current
should be near the power pins and the minimum near the
transistors. After observing the IR drop in the plots, the next step
is to examine the current flows in the circuit that create the IR
drop. The current flow trends may not be as expected, or currents
from several power pins may merge in the middle of the chip to
create a high current through the wires with high IR drops.
We can use the scan rc command to get the data for the resistor
current, and set the analysis type to Resistor_Current before the
filtering and plotting. Finally, we can use the plot rc command to
obtain color plots with the highest-current chip area in red, the
medium-current area in orange, and the low-current area
in green.
Once we understand how current flows through the chip, we
examine the current densities of the metal wires, which are a
first-order indication of electromigration failures within the chip.
Use the Current_Density analysis type to examine them.
We can change the current density limits for metal layers. The
current density is reported not as an absolute value but as the
ratio of the wire current density to the required limit. If the
ratio is more than 1.0, there is a potential electromigration failure
in that area.
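
The ratio-based reporting can be illustrated with a short Python sketch. The wire data, the per-layer limits, and the function names are hypothetical; the only elements taken from the text are the definition of current density (current divided by metal area, Table 5-1) and the rule that a ratio above 1.0 signals a potential electromigration problem.

```python
def density_ratio(current_a, width_um, thickness_um, limit_a_per_um2):
    """Ratio of a wire's current density to the limit of its layer."""
    area_um2 = width_um * thickness_um      # cross-sectional area of the wire
    density = current_a / area_um2          # current density in A/um^2
    return density / limit_a_per_um2


def flag_em_risks(wires, limits):
    """Return (name, ratio) for every wire whose ratio exceeds 1.0.

    wires:  list of (name, layer, current_a, width_um, thickness_um)
    limits: dict mapping layer name to its limit in A/um^2
    """
    flagged = []
    for name, layer, current_a, width_um, thickness_um in wires:
        ratio = density_ratio(current_a, width_um, thickness_um, limits[layer])
        if ratio > 1.0:
            flagged.append((name, ratio))
    return flagged


if __name__ == "__main__":
    # Hypothetical limits and wires, for illustration only.
    limits = {"M2": 0.002, "M5": 0.004}
    wires = [
        ("strap_a", "M2", 0.004, 2.0, 0.5),   # ratio 2.0, flagged
        ("strap_b", "M5", 0.004, 4.0, 1.0),   # ratio 0.25, passes
    ]
    for name, ratio in flag_em_risks(wires, limits):
        print(f"{name}: ratio {ratio:.2f}")
```

Reporting a ratio rather than an absolute density lets one filter set serve all metal layers, because each layer's own limit is already folded into the number.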
Summarizing the modeling, analysis, and viewing results using
the VoltageStorm™ (Thunder and Lightning) tool suite from Ca-
dence, the recommended design flow is described as follows [51]:

1. Simulate the transistor current data of the chip using the
Thunder tool.
2. Load the power grid into Lightning, and place the voltage
source pins.
3. Use the image command to create a GIF image of the grid
for reference.
4. Load the transistor current data from Thunder into Light-
ning.
5. For the blocks whose relative activity we know, use the
scalecurrents command to scale the currents for the blocks.
The scale factor for a memory is typically 1 divided by the
number of words. If the design has blocks whose activities
are exclusive, only some portions of the chip can be active at
any one time. Scale the currents in those blocks accordingly.
For example, if a block has four units, only one of which
can be active at a time, scale the currents for that block by
0.25, or set the scalecurrents to 0.25 in this block.
6. Generate the plot of tap currents to visually inspect the
areas of the scaled currents. If the currents for some blocks
were not scaled, go back to Step 5 until the tap currents are
reasonable for the power grid simulation. Once satisfied with
the tap current data, generate a GIF image of the tap
currents plot for reference.
7. Given the total current reported by the tap currents scan
and an estimate of the expected total supply current, we can
compute the appropriate value for the CurrentScaleFactor
environment variable. It is important to use the
CurrentScaleFactor environment variable to scale the total
current consumption estimated by the tool so that it matches
the value from a realistic estimate or measurement.
Notice that the realistic estimate or measurement of the
power consumption for the chip should be derived from the
total power consumption during active logic switching. We
may not want to average in the inactive time of the design,
because that may produce a value substantially lower than
the average power consumption during active operation.
8. Use the solve command to solve the power grid I-V equa-
tions.
9. Use the savestate command to save the solved results. Saving
the result enables reuse later without having to solve the
power grid again, which saves computation time for a large
chip.
10. Iterate between using the scan ir command and setting the
filter to derive a good set of filters to observe the IR drop.
These filters are generally equal-sized steps. We can gener-
ate the plot of the IR drop by using the plot ir command.
Use the image command to create a GIF image of the IR
drop plot.
11. Iterate between using the scan rc command and setting the
filter to derive a good set of filters to observe resistor cur-
rent flow in the chip. These filters are generally decreased
in magnitude logarithmically. We can generate the plot of
the resistor current by using the plot rc command. Use the
image command to create a GIF image of the resistor cur-
rents plot.
12. Create a plot for each layer with only the metal layer and
its error layer turned on. For example, turn off the grid and
all errors, then turn on M2 and M2 errors and save the im-
age. These plots help in understanding the behavior of each
metal layer.
13. Iterate between the scan rj command and filter setting to
derive a good set of filters to observe the resistor current
density in the design. These filters are generally decreased
in magnitude logarithmically. We can generate the plot of
the current density by using the plot rj command.
Look for the few wire segments that have the highest val-
ues. Create a GIF image of this plot using the image com-
mand. If we define the appropriate model parameters, we
can also generate a plot of electromigration risk.
14. Create a plot for each layer with only the metal layer and
its error layer turned on. For example, turn off the grid and
all errors, and turn on M2 and M2 errors and save the im-
age. These plots help us to identify the specific wires most
likely to fail because of electromigration.
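
The block-level scaling in Step 5 amounts to simple arithmetic, sketched below in Python. The function names are hypothetical and the code is independent of the tool's scalecurrents command; it only reproduces the two rules of thumb given in the step (1 divided by the number of words for a memory, and the active fraction for mutually exclusive units).

```python
def memory_scale(num_words):
    """Typical scale factor for a memory: 1 / number of words."""
    return 1.0 / num_words


def exclusive_scale(num_units, active_units=1):
    """Scale factor for a block whose units are mutually exclusive."""
    return active_units / num_units


def scale_block_currents(tap_currents, factor):
    """Apply one scale factor to all tap currents of a block."""
    return [current * factor for current in tap_currents]


if __name__ == "__main__":
    # A block with four units, only one of which is active at a time.
    factor = exclusive_scale(4)
    print(factor)                                     # 0.25
    # Hypothetical tap currents in amperes.
    print(scale_block_currents([4.0, 8.0], factor))   # [1.0, 2.0]
```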

Because of the large amount of data in large designs, it is
recommended to store the temporary Thunder and Lightning files in
a large tmp directory on the local machine. Lightning uses
the temporary files for commands. If they reside on a machine
elsewhere on the network and the network is overloaded,
Lightning’s performance might be affected. Moving as many files as
possible to the local machine can improve its performance
significantly. Use the scan command to iterate the filter settings
in order to avoid the processing involved in plotting large volumes
of data until it is really needed. Depending on the design size
and network performance, iterating over significant amounts of data
with the plot command can be slow.
It is a good practice to save the state of the analysis after find-
ing a solution to the power grid equation, to avoid having to
solve the power grid again to continue the analysis in another
session.
Be sure there is enough disk space when a lot of data is
processed with Lightning to avoid problems. Redrawing the grid
of a large design can be time-consuming. We can use Ctrl-C to in-
terrupt the redrawing of the power grid.
If there are different power supply voltages, which were de-
signed in different power grids, these power grids should be ana-
lyzed separately. For each power grid, we can use the described
method to do the modeling and analysis and generate the reports
and plots.
Figure 5-2 shows the application of a static power grid analysis
flow to a communication chip [53]. The “xtc64” command performs the
transistor-level netlisting for the entire chip. The “thunder.tty64”
command models the peak currents for devices, and multiple
thunder.tty64 runs (two in the flow) can be used on partitions of
the entire chip in order to reduce the simulation time. The
“design.ckt” file specifies the voltage levels at the power pads.
The “tablegen” command generates the table of current curves
for transistors according to the device sizes. The table of I-V
models for the transistors is useful in the peak current simulation.
The “runFEX_p_a,” “runFEX_p_b,” etc. commands extract the
resistance model of the power network. Multiple resistance
extraction commands (four in the flow) may be run on stripes of the
power grid in order to reduce the computation time.
The “mergenet” command stitches the resistance models of the
power network and the peak currents of devices into a complete
IR model of the full-chip power network. Each device is modeled
as a current source that draws its peak current, and each wire
segment or via in the power network is modeled as a resistor.
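
The IR model described above, with resistors for wires and vias and current sources for devices, leads to a linear system of node equations. The following Python sketch solves such a system for a very small network using plain Gaussian elimination. It is a teaching example under assumed names and data structures, not the solver used by the tool; a production solver would rely on sparse matrix techniques.

```python
def solve_grid(num_nodes, resistors, pads, taps):
    """Solve node voltages of a resistive power grid.

    resistors: list of (node_a, node_b, ohms)
    pads:      dict node -> fixed pad voltage (ideal voltage sources)
    taps:      dict node -> current drawn from the node, in amperes
    A node without any resistive path to a pad makes the system singular.
    """
    unknown = [n for n in range(num_nodes) if n not in pads]
    index = {n: i for i, n in enumerate(unknown)}
    size = len(unknown)
    a = [[0.0] * size for _ in range(size)]
    b = [-taps.get(n, 0.0) for n in unknown]

    # Kirchhoff's current law at every unknown node.
    for n1, n2, ohms in resistors:
        g = 1.0 / ohms
        for p, q in ((n1, n2), (n2, n1)):
            if p in index:
                a[index[p]][index[p]] += g
                if q in index:
                    a[index[p]][index[q]] -= g
                else:
                    b[index[p]] += g * pads[q]

    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    x = [0.0] * size
    for row in range(size - 1, -1, -1):
        s = sum(a[row][k] * x[k] for k in range(row + 1, size))
        x[row] = (b[row] - s) / a[row][row]

    voltages = dict(pads)
    for node, i in index.items():
        voltages[node] = x[i]
    return voltages


if __name__ == "__main__":
    # Hypothetical chain: a 2.5 V pad at node 0 and two 1-ohm segments,
    # with 0.1 A drawn at node 1 and at node 2.
    v = solve_grid(3, [(0, 1, 1.0), (1, 2, 1.0)], {0: 2.5}, {1: 0.1, 2: 0.1})
    print({node: round(volts, 3) for node, volts in sorted(v.items())})
```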
Note that the flow in Figure 5-2 is applied to both the Vdd and
Vss nets, so we can estimate the worst-case IR drop between Vss
and Vdd. The “lightning.tty64” command solves the IR model for
the voltage drops across the power grid.

#! /bin/csh -f
#This is the flow for core Vdd/Vss nets IR static analysis
#(ocelot, 64bit, 4GB Memory, 30GB disk)
#/chip/thunder/tablegen tablegen.cmd
cd /remote/chamfs3/jonathan/simplex/viper_d_2/xt
xtc64 chip_xt.cmd
net_profile chip_cmln.net
cd /remote/chamfs4/home/qing/simplex/viper_d/thunder/static_1
thunder.tty64 run.cmd
cd /remote/chamfs4/home/qing/simplex/viper_d/thunder/static_2
thunder.tty64 run.cmd
cd /remote/chamfs4/home/qing/simplex/viper_d/thunder
itaputil combine static_1/vdd.ipeak static_2/vdd.ipeak vdd.ipeak
itaputil combine static_1/vss.ipeak static_2/vss.ipeak vss.ipeak
cd /remote/chamfs4/home/qing/simplex/viper_d/firePOWER
runFEX_p_a&
runFEX_p_b&
runFEX_p_c&
runFEX_p_d
cd /remote/cougar3/simplex/viper_d/firePOWER
mergenet _s VDD_simplex -o add /remote/chamfs4/home/qing/simplex/viper_d/firePOWER/chip_cmln_*.hdr
mergenet _s VSS_simplex -o vss /remote/chamfs4/home/qing/simplex/viper_d/firePOWER/chip_cmln_*.hdr
cd /remote/cougar3/simplex/viper_d/lightning/static_vdd
lightning.tty64 run.cmd
cd /remote/cougar3/simplex/viper_d/lightning/static_vss
lightning.tty64 run.cmd

Figure 5-2. Static IR analysis flow [53].

The peak currents may be overestimated in the above static IR
drop analysis method. Static analysis runs fast because it assumes
that all the devices are on during chip operation, which is a
worst-case assumption for the chip power consumption.
Although dynamic analysis is possible by applying test vectors
iteratively at the circuit inputs, it takes much longer and may not
be preferred in design iterations aimed at improving the power
grid.
To increase the accuracy of peak current modeling in the static
analysis, we have to provide a current scale factor for the chip.
We choose this factor in the CAD flow so that the peak current
estimated by the tool matches the measured or realistic current
consumption [51]. The measured peak current is about 4.0 A for
each of the Vdd and Vss nets in the design.
So we can use this peak current as the baseline against which to
match the peak current value estimated by the tool, and thereby
determine the current scale factor. We found that the peak currents
estimated by the tool differ between the Vss and Vdd nets, so the
current scale factors will be different for Vss and Vdd, since the
measured current is the same for both nets.
We drew other conclusions from the experiments on the flow
in Figure 5-2. The tool does not tolerate too many floating metal
shapes, which cause floating nodes in the power network. The disk
space and memory must be very large for a large-chip power
network. The turnaround time for the communication chip is about
24 hours, and automating the flow, as shown in Figure 5-2, makes it
possible to submit the job overnight and work on power grid
improvements during the next day [53].
To reduce the extraction and analysis time, we can separate the
power networks for the I/O ring and the core power network by
using different Vdd and Vss labels, so the analysis is only for the
core power network in the experiments. In addition, multiple
CPUs can be used to do the resistance extraction in multiple
strips for the power network and the peak current estimation in
multiple partitions in parallel.
The “lightning.tty64” command should be run on a 64-bit machine
in order to analyze the full chip. The “tablegen” command
is needed only once in a design if the circuit and process
technologies are not changed. The I-V table can be reused in the
flow for the power grid improvement.
The first tape-out of the chip failed because of the significant IR
drop (~0.8 V between the Vss and Vdd nets) across the chip, which
arose because wire bonding technology is used in this chip [53]. We
added dedicated M5 straps for Vdd and Vss to reduce the IR drop.
Table 5-2 shows the voltage drops across the chip for the Vdd
and Vss networks with a separation of 40 µm and 75 µm between
two adjacent Vdd lines or two adjacent Vss lines, as well as for
the original design without the M5 power straps [53]. The
worst-case IR drop is the sum of the voltage drops in the Vdd
and Vss nets.
Table 5-3 shows the current scaling factors used in the simulation
for the Vdd and Vss nets in order to match the currents simulated
by the tool with the assumed peak current (4.0 A). This assumption
is based on the measurement of the original tape-out chip, which
uses the same process technology and circuits that have not
changed significantly.

Table 5-2. Simulation results

Design                                        Vdd IR Drop  Vss IR Drop  Worst-Case (Vdd + Vss) IR Drop
Original chip: no M5 power straps             0.356 V      0.434 V      0.79 V
Additional M5 power straps: 75 µm separation  0.117 V      0.146 V      0.26 V
Additional M5 power straps: 40 µm separation  0.130 V      0.124 V      0.25 V

Table 5-3. Current scaling factors in simulation [53]

Net   Simulated Current   Measured Current   Current Scaling Factor
Vdd   4.22e+03 A          4 A                0.00095
Vss   5.25e+03 A          4 A                0.00076
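
The scaling factors in Table 5-3 follow directly from dividing the measured current by the simulated current. The short Python check below reproduces them; the function name is hypothetical, and the numbers are the ones listed in the table.

```python
def current_scale_factor(measured_a, simulated_a):
    """Factor that scales the simulated current to the measured current."""
    return measured_a / simulated_a


# Values from Table 5-3: 4 A measured on each net.
vdd_factor = current_scale_factor(4.0, 4.22e3)
vss_factor = current_scale_factor(4.0, 5.25e3)

if __name__ == "__main__":
    print(f"Vdd: {vdd_factor:.5f}")   # Vdd: 0.00095
    print(f"Vss: {vss_factor:.5f}")   # Vss: 0.00076
```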

An IR drop reduction of about 67% is observed after adding the
power straps on the M5 layer for this chip. The resulting 0.25 V IR
drop, about 10% of the 2.5 V nominal voltage, is within the supply
voltage range required for correct device timing.
The old design has a combined IR drop (Vdd + Vss) of about 30% of
the nominal voltage, which is one reason that the chip failed.
Figure 5-3 shows the voltage plots of the power grids for the old
design without M5 power straps. We can observe significantly lower
voltage in the center area of the chip.
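
The percentages quoted above can be re-derived from the entries of Table 5-2 and the 2.5 V nominal voltage. The Python sketch below does the arithmetic; the dictionary keys and function names are hypothetical labels.

```python
NOMINAL_V = 2.5  # nominal supply voltage from the text

# (Vdd IR drop, Vss IR drop) in volts, copied from Table 5-2.
DROPS = {
    "no_m5": (0.356, 0.434),
    "m5_75um": (0.117, 0.146),
    "m5_40um": (0.130, 0.124),
}


def worst_case(vdd_drop, vss_drop):
    """Worst-case IR drop: sum of the Vdd and Vss drops."""
    return vdd_drop + vss_drop


def reduction_percent(before, after):
    """Relative reduction of the IR drop, in percent."""
    return 100.0 * (before - after) / before


def percent_of_nominal(drop, nominal=NOMINAL_V):
    """IR drop expressed as a percentage of the nominal voltage."""
    return 100.0 * drop / nominal


if __name__ == "__main__":
    base = worst_case(*DROPS["no_m5"])
    for name, pair in DROPS.items():
        total = worst_case(*pair)
        print(f"{name}: {total:.3f} V, "
              f"{percent_of_nominal(total):.1f}% of nominal, "
              f"{reduction_percent(base, total):.1f}% reduction")
```

Both strap spacings give a reduction close to two thirds and bring the combined drop to roughly one tenth of the nominal voltage, which is consistent with the rounded figures in the text.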

5.3 ADVANCED STATIC ANALYSIS

Activity-based analysis is another approach to static analysis that
better resolves the distribution of currents on the power grid [51].
The activity-based approach assumes that you have a mechanism,
such as a Verilog simulator, to compute and report the relative ac-
tivity of the nets in the design.
These relative activities can be used in conjunction with net ca-
pacitances to estimate the average current load of each gate in the
design. This form of analysis will provide a more realistic power
current than the Ipeak estimation approach based on the satura-
tion currents.

Figure 5-3. Voltage plots of power distribution networks [53]. (a) Vdd.

As input, the activity-based analysis uses a file containing the
activity levels of nets in the design. This file is optional but
recommended in VoltageStorm™ [51]. In addition, activity-based
analysis has three other important input parameters: (1) the clock
cycle time of the chip for which the activity values are defined,
(2) the value of Vdd, and (3) the default activity to use for gates
whose activity is not specified.
The average current for each gate is computed using the load-
ing of the gate, Vdd, the cycle time, and the activity of the gate.
The average current consumed by a gate is derived by the follow-
ing equation:

Iavg = A · Cgate · Vdd · F (5-1)

where A is the activity ratio of the gate, Cgate the total
capacitance of the nets driven by the gate, including the wires and
gates, F the clock frequency of the chip, and Vdd the supply
voltage. This equation for the average current is derived by
considering the charge, Q = Cgate · Vdd, required to charge the
outputs of the gate once per average switching interval,
1/(A · F).

Figure 5-3 (continued). (b) Vss.

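
Equation (5-1) can be checked numerically with a few lines of Python. The function name and the 50 fF load are hypothetical; the 3% activity, 5 ns cycle time, and 3.3 V supply are the example settings used later in this section. The second function evaluates the same quantity as charge per switching event divided by the average interval between events, which is the derivation described in the text.

```python
def average_gate_current(activity, c_gate_f, vdd_v, freq_hz):
    """Equation (5-1): Iavg = A * Cgate * Vdd * F, in amperes."""
    return activity * c_gate_f * vdd_v * freq_hz


def average_gate_current_from_charge(activity, c_gate_f, vdd_v, freq_hz):
    """Same quantity, written as charge divided by switching interval."""
    charge = c_gate_f * vdd_v                  # Q = Cgate * Vdd
    interval = 1.0 / (activity * freq_hz)      # average time between events
    return charge / interval


if __name__ == "__main__":
    cycle_time_s = 5e-9                        # 5 ns cycle, so F = 200 MHz
    i_avg = average_gate_current(0.03, 50e-15, 3.3, 1.0 / cycle_time_s)
    print(f"{i_avg * 1e6:.2f} uA")             # 0.99 uA
```

Because the result depends only on the load capacitance, the supply voltage, and the switching rate, transistor sizes do not appear anywhere in the calculation.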
This derivation of average current is not a function of transistor
sizes. If the design has multiple clocks, select one clock to be the
reference for the activity analysis, and scale the activity of gates
associated with other clocks accordingly.
For example, if CLK1 has a period of 10 ns and CLK2 has a period
of 15 ns, and CLK1 is to be the reference for activity-based
analysis, scale the activity of gates in the CLK2 domain by 10/15,
or about 0.667. On the other hand, if we have the actual toggle
counts of all nets, divide the net toggle counts by the number of
CLK1 reference cycles to derive the activity values.
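
Both ways of referring activity to a single clock are one-line calculations, sketched here in Python. The function names and the toggle-count example are hypothetical; the 10 ns and 15 ns periods are the ones from the text.

```python
def domain_activity_scale(reference_period_ns, domain_period_ns):
    """Scale factor for gates in a clock domain other than the reference."""
    return reference_period_ns / domain_period_ns


def activity_from_toggles(toggle_count, simulated_time_ns, reference_period_ns):
    """Activity of a net: toggles per cycle of the reference clock."""
    reference_cycles = simulated_time_ns / reference_period_ns
    return toggle_count / reference_cycles


if __name__ == "__main__":
    # CLK1 (10 ns) is the reference; CLK2 has a 15 ns period.
    print(round(domain_activity_scale(10, 15), 3))   # 0.667
    # Hypothetical net: 30 toggles in a 1000 ns simulation.
    print(activity_from_toggles(30, 1000, 10))       # 0.3
```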
Here are steps to run the activity-based static power grid
analysis using the VoltageStorm™ tool [51].

1. Obtain a file called activity.list and put it in the running
directory. This file contains a list of nets in the design and
their activity levels.
2. For the nets in the clock trees, all of them should have the
activity of 1.0. This file may only contain a subset of nets in
the design, and remaining nets will use the default activity
factor. This analysis uses the name back-annotated netlist,
capacitance database, and power grid database.
3. Start Thunder and load the design:
Shell >> thunder
Thunder > load design.ckt
4. Set the parameters for the activity command, and load the
file of activity values.
Thunder > activity default 0.03
Thunder > activity cycle_time 5ns
Thunder > activity vdd_range 3.3
Thunder > activity filen activity.list
In the above setting, the default activity is set to 3%. The cy-
cle time is set to 5 ns, and the Vdd range is set to 3.3. The ac-
tivity filen command reads the activity file and sets the ac-
tivity for each node as specified in the file.
5. Complete the analysis by generating the report and exit
Thunder. The activity report command computes the tap
currents on the basis of these activities, based on Equation
(5-1), and writes them in the VDD.iavg file:
Thunder > activity report VDD
Thunder > quit
6. We now proceed to the power grid analysis and change to
the Lightning directory. Copy the files from the static direc-
tory for defining the voltage sources (vsrc.cmd) and defining
filters (filters.cmd) into the Lightning directory. Start Light-
ning, load the grid, define the voltage sources, and load the
currents, which have just been computed:
Shell >> lightning
Lightning > load design_VDD.mhdr
Lightning > run vsrc.cmd
Lightning > run filters.cmd
Lightning > iload $thunder_dir/VDD.iavg
The above commands load the necessary information for a
power grid solution. We can view the tap currents computed
from the activity information in the chip by the following
command:
Lightning > plot tc
7. Now solve the grid, and then plot the IR drop (plot ir com-
mand) and the resistor current (plot rc command):
Lightning > solve
Lightning > plot ir
Lightning > plot rc
Lightning > quit

The third method for performing static power grid analysis is to
use dynamic vectors to exercise the design when computing the
average currents in the transistors connected to the power grid.
This method will compute the average currents for transistors on
the basis of the specific vector set. The vector set must be suffi-
ciently representative of the design usage to achieve accurate av-
erage currents.
VoltageStorm™ allows three ways to specify input vectors: (1)
define SPICE-like voltage sources in the netlist, (2) use a Thunder-
specific vector file to describe waveforms, and (3) use the VCD file.
The SPICE-like voltage sources that Thunder supports are DC,
pulse, and piecewise linear (PWL). The following steps use the
SPICE-like voltage sources to drive only the clock to illustrate
how to perform the vector-based netlist analysis.

1. Prepare a design_input_sources.inc file in the Thunder
working directory. This design_input_sources.inc file includes
the pulse waveforms for CLK and CLKN with a period of 5 ns.
Only the clocks are used in this example to illustrate the
vector-based static power grid analysis.
2. Start Thunder and load the circuit:
Shell >> thunder
Thunder > load design.ckt
Simulate for two clock cycles in the DC state to initialize the
system properly. After computing an initial state, we can
save it to reuse later:
Thunder > s 30
Thunder > save ic state.ic
The s command performs the DC solution and simulates the
circuit for 30 ns, which is in two clock cycles. The save ic
command saves the voltages of the circuit in the format of
.IC cards. This file is used in the dynamic analysis to avoid
the computation of a DC solution.
3. The following commands compute the currents for devices
connected to the VDD source:
Thunder > devi tally VDD
Thunder > devi tran VDD
The devi tally command instructs Thunder to begin tracking
the minimum, maximum, and average current for the Vdd
voltage source during the next simulation.
The devi tran command instructs Thunder to create a
Thunder.tran output file that provides the transient waveform
for the currents of the voltage source VDD.
Simulate for another 10 ns (two clock cycles), report the
tallied currents for Vdd, and exit Thunder:
Thunder > s 10
Thunder > pwrnet report VDD
Thunder > devi report
Thunder > quit
The pwrnet report command instructs Thunder to write
the currents reported so far into the VDD.avg, VDD.max,
and VDD.rms files. In this case, the three files are all in
ASCII format [51].
The devi report command instructs Thunder to report the
minimum, maximum, average, and RMS currents of the VDD
voltage source. According to the SPICE convention, current
entering a device at a terminal is positive, so the current of
the Vdd voltage source is negative while it supplies the chip.
Therefore, the reported minimum value is the peak absolute
current delivered by the Vdd source, and the average current
should be negative.
4. The power grid analysis portion of the flow is much like the
analysis performed in the activity-based static analysis, ex-
cept that the currents input file is different:
Shell >> cd $lightning_work_dir
Shell >> lightning
Load the design and voltage sources command files as fol-
lows:
Lightning > load design_VDD.mhdr
Lightning > run vsrc.cmd
5. Load the current input file from the vector-based simulation
result, and then perform the power grid analysis as follows:
Lightning > iload VDD.iavg
Lightning > solve
6. The above power grid analysis is a static analysis based on
currents averaged over clock cycles, so the IR drop result
reflects those average currents. Alternatively, we can use the
VDD.max file, which tracks the peak current of each transistor
on the power grid. We can analyze the power grid again using
the VDD.max file, after clearing the earlier current inputs
with the iclear command, as follows:
Lightning > iclear
Lightning > iload VDD.max
Lightning > solve
Lightning > quit

When we use the VDD.avg file, for example, the lowest node voltage
in this case is 3.265 V [51]. However, when we use the
VDD.max file, the lowest node voltage goes down to 2.772 V [51],
but this number may overestimate the peak IR drop in the power
grid, because it models all the transistors turned on to their
maximum currents at the same time.
The actual peak IR drop is somewhere between that reported
using the VDD.max file and that reported using the VDD.avg file.
Use the VDD.max file only on small blocks to perform an easy
pass/fail screening of the block power grid.
If we apply peak currents to large designs to which many vectors
have been applied, we will see an unrealistic measure of the IR
drop. We can use them on small blocks in which many
gates could potentially switch at the same time.
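
To put the two results side by side, the Python sketch below converts the reported voltages into IR drops. It assumes a 3.3 V supply, which is the Vdd value used in the activity example earlier in this section; the source does not state the supply voltage for this particular run, so treat that value as an assumption.

```python
ASSUMED_VDD = 3.3  # assumed supply voltage for this example, in volts


def ir_drop(lowest_node_voltage, vdd=ASSUMED_VDD):
    """IR drop corresponding to the lowest node voltage on the grid."""
    return vdd - lowest_node_voltage


def percent_of_supply(drop, vdd=ASSUMED_VDD):
    """IR drop as a percentage of the supply voltage."""
    return 100.0 * drop / vdd


if __name__ == "__main__":
    drop_avg = ir_drop(3.265)   # result with average currents
    drop_max = ir_drop(2.772)   # result with peak currents
    print(f"average currents: {drop_avg:.3f} V "
          f"({percent_of_supply(drop_avg):.1f}%)")
    print(f"peak currents:    {drop_max:.3f} V "
          f"({percent_of_supply(drop_max):.1f}%)")
```

Under that assumption the two runs bracket the actual peak IR drop between a few tens of millivolts and roughly half a volt, which shows how pessimistic the all-transistors-at-peak model can be.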

5.4 DYNAMIC ANALYSIS

The dynamic analysis method is claimed to provide more precise
insight into the behavior of the power grid [51]. The static
analysis averages the tap currents to look at the long-term average
behavior of the power grid. Dynamic analysis keeps the time
distribution of currents in place so you can see the voltage and
current waveforms in a more numerically precise way.
Therefore, the dynamic analysis will provide better insight into
the magnitude of the IR drop. The dynamic analysis capability
provided by the VoltageStorm™ tool is claimed to have the
following goals [51].
It helps the designer to find the weak spots in the power grid by
predicting a worst-case test vector for the IR drop from the test
vectors that we have. It is usually hard to find the worst-case IR
drop test vector because it is a function of the physical implemen-
tation of the design, not the logic implementation.
The dynamic analysis enables us to analyze the specific test
vectors on the design. This capability is most useful when we
know some specific test vectors that we must analyze in great
depth to obtain the exact magnitude of IR drop in the power grid.
It is critical to select the proper step size for the power grid
analysis. This step size in the power grid analysis is different
from the simulation time step in the netlist simulation. The
netlist simulation uses internal time step control to keep the
simulation accurate. The power grid analysis step size reflects how
often the tap currents are passed from the netlist analysis to the
power grid analysis.
As described before, the VoltageStorm™ tool uses vector com-
pression technology to speed up the dynamic analysis for a large
design. Obtaining usable results from the vector compression re-
quires us to set the parameters for the compression carefully.
We can set three parameters as follows:

1. Method: determining whether to compress using peak or
averaging across multiple vectors
2. Period: the time period over which we want compression to
be applied
3. Intervals: the number of time steps that we want in each pe-
riod

In general, we use the clock period as the period of compression
because we want to gain a better insight into the operation of the
chip over a clock cycle. Most circuit activities occur near the edges
of the clock, so we want to see if IR drop problems occur because
of the clock itself or as the result of logic switching after the clock.
The second question to address is the number of intervals to
apply. Once again, the starting point is based on the delay of a
typical gate. The number of intervals or timing buckets is the
period divided by the gate delay. It is always better to use more
intervals to solve the power grid if the computation time is
tolerable for a large power grid.
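
The rule of thumb for the number of intervals is easy to express in Python. Times are kept in integer picoseconds to avoid rounding surprises. The function names are hypothetical, and the 250 ps gate delay is an assumed value chosen so that the result matches the 20 intervals per 5 ns period used in the example later in this section.

```python
def num_intervals(period_ps, gate_delay_ps):
    """Number of timing buckets: the period divided by the gate delay.

    Rounded up so that a bucket is never longer than one gate delay.
    """
    return -(-period_ps // gate_delay_ps)   # ceiling division on integers


def step_size_ps(period_ps, intervals):
    """Width of one power grid analysis time step, in picoseconds."""
    return period_ps / intervals


if __name__ == "__main__":
    intervals = num_intervals(5000, 250)     # 5 ns period, 250 ps gate delay
    print(intervals)                         # 20
    print(step_size_ps(5000, intervals))     # 250.0
```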
The third question is the selection of the method to apply in the
vector compression: peak or averaging. If the design is small and
could possibly have more simultaneous activity than is represent-
ed by the vectors, we want to use peak compression. Increasing
the intervals is necessary for large designs because the finer time
stepping reduces the overestimation of IR drop resulting from
bucketing activity at the same time, when in reality it occurs at
different times.
If we use peak compression, we need to use the average of peak
currents in the power grid analysis. Do not use peak compression
if the design contains exclusive logic, in which at most one in n
components could ever operate at once.
The average-to-peak currents form a good data set if the time
steps are small enough that the peak current in a time step does
not highly overestimate the average current in the time step.
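
The idea behind vector compression can be sketched as follows. This Python example is a simplified interpretation, not the algorithm implemented in the tool: it folds a current waveform onto one period, averages the samples that fall into each interval of each vector, and then combines the vectors with either the peak or the average. All names and sample values are hypothetical.

```python
def compress(samples, period, intervals, method="peak"):
    """Fold current samples onto one period and compress across vectors.

    samples: list of (time, current); time in the same unit as period
    method:  "peak" keeps the largest per-interval average over all
             vectors, "average" keeps the mean over all vectors
    Returns one current value per interval.
    """
    step = period / intervals
    per_vector = {}
    for time, current in samples:
        vector = int(time // period)              # which vector (period)
        offset = time - vector * period           # position inside it
        bucket = min(int(offset / step), intervals - 1)
        buckets = per_vector.setdefault(
            vector, [[] for _ in range(intervals)])
        buckets[bucket].append(current)

    result = []
    for b in range(intervals):
        # Average current inside this interval, one value per vector.
        values = [sum(v[b]) / len(v[b]) for v in per_vector.values() if v[b]]
        if not values:
            result.append(0.0)
        elif method == "peak":
            result.append(max(values))
        else:
            result.append(sum(values) / len(values))
    return result


if __name__ == "__main__":
    # Two vectors of 10 time units each, compressed into 2 intervals.
    samples = [(0, 1.0), (2, 3.0), (6, 2.0), (11, 5.0), (17, 1.0)]
    print(compress(samples, 10, 2, "peak"))      # [5.0, 2.0]
    print(compress(samples, 10, 2, "average"))   # [3.5, 1.5]
```

With wide intervals, activity that really happens at different times lands in the same bucket, which is why the text recommends finer time steps when peak compression is applied to larger designs.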
The dynamic analysis is the next step in complexity beyond the
vector-driven static analysis. The dynamic analysis uses vectors as
input to the netlist analysis performed by Thunder. It generates
the dynamic current data to feed into the power grid analysis per-
formed by Lightning. Here are the steps for one example to perform
the dynamic analysis, based on the VoltageStorm™ tool [51]:

1. Prepare a VCD-format input file (inputs.vcd), and then
start Thunder in the Thunder directory as follows:
Shell >> thunder
Thunder > load design.ckt
2. Enter the following command to tell Thunder to use the ini-
tial conditions specified in the file when we begin the simu-
lation.
Thunder > use ic state.ic
3. Perform the vector compression using the peak function
across the vectors and compute 20 time steps in the 5 ns pe-
riod:
Thunder > pwrnet tallyint method=000 intervals=20
period=5ns VDD
In this case, because the simulation runs for 10 ns, two
vectors are compressed into one and each power grid analysis
time step is 250 ps wide. Tally the current from the Vdd
source, create a Thunder.tran output file, submit a VCD-format
file, and report tallied currents for Vdd:
Thunder > devi tally VDD
Thunder > devi tran VDD
Thunder > vcd inputs.vcd
Thunder > devi report
Thunder > quit
The above simulation generates several files. The pwrnet
tallyint command creates three files: VDD.ptimax, VDD.pti-
avg, and VDD.ptirms. They correspond to the peak, average,
and RMS currents, respectively, for each interval of analy-
sis. The three files correspond to the peak-to-peak, peak-to-
average, and peak-to-RMS currents. The devi tran command
creates the Thunder.tran file containing the transient wave-
form for the Vdd voltage source current.
4. After the completion of the netlist analysis, run the
following commands:
Shell >> itaputil summary VDD.ptiavg
Shell >> itaputil s VDD.ptimax
The itaputil summary VDD.ptiavg command generates a
summary of the current data in the VDD.ptiavg file. There
are 20 intervals of the data in this example, and the report
shows the minimum, maximum, and average currents over
all transistors connected to Vdd, as well as the total Vdd cur-
rent in the interval.
5. Given the above dynamic current files, we are ready to pro-
ceed to the power grid analysis. Dynamic power grid analy-
sis is similar to static analysis. Dynamic analysis performs a
series of power grid matrix solutions, one for each time step.
Currents are updated for each time step and capacitance
models are updated. The resulting states for each solve are
saved automatically.
Change to the Lightning working directory, and start the
Lightning command from there:
Shell >> lightning
Lightning > load design_VDD.mhdr
Lightning > run vsrc.cmd
Lightning > run filters.cmd
These are the same commands used in the static analysis to
load the design database of the power grid, the Vdd source
locations, and the filters.
6. The following command specifies the current file to apply
and initiates the dynamic analysis:
Lightning > tran VDD.ptiavg
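The per-interval tallies described in step 3 (peak, average, and RMS current for each analysis interval) can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function name tally_intervals, the sample values, and the grouping of samples into intervals are assumptions made only for illustration.

```python
import math

def tally_intervals(samples, samples_per_interval):
    """Return (peak, average, rms) current for each full analysis interval.

    samples: current samples in amperes, evenly spaced in time (made-up data).
    samples_per_interval: number of samples that make up one interval.
    """
    tallies = []
    last_start = len(samples) - samples_per_interval
    for start in range(0, last_start + 1, samples_per_interval):
        window = samples[start:start + samples_per_interval]
        peak = max(window)
        average = sum(window) / len(window)
        rms = math.sqrt(sum(x * x for x in window) / len(window))
        tallies.append((peak, average, rms))
    return tallies

# Two intervals of four samples each (hypothetical values, in amperes).
waveform = [0.0, 2.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0]
for peak, average, rms in tally_intervals(waveform, 4):
    print(f"peak={peak:.3f} A  avg={average:.3f} A  rms={rms:.3f} A")
```

For non-negative currents the average never exceeds the RMS value, which never exceeds the peak, so an analysis driven by the average file sees smaller currents than one driven by the peak file.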

The CurrentScaleFactor environment variable is not set in this
dynamic analysis. If we set a value for the CurrentScaleFactor
environment variable, the currents would be scaled accordingly. We
could set it to overestimate the currents, compensating for the
peaks that are smoothed out when currents are averaged either
within each time step or across vectors in the compression.
We can also set up the filters to generate the plots and reports
during the dynamic analysis. The VoltageStorm™ tool also pro-
vides a movie of the behavior of the power grid over the time in-
tervals of the analysis. We can examine the plots and reports of
the individual states as in the static analysis.
The tran command computes the state of the power grid after
each time step during the analysis. These states are saved in a
set of files sequentially numbered beginning with
Lightning.tran_int0. We can load each of these states individually
by using the loadstate command and generate plots and reports of
these individual states.
The most useful state created during the dynamic analysis is
the Lightning.worstcase state. It contains the worst-case voltages
over the dynamic analysis for each subnode in the power grid, so
you can examine a single file to determine the worst IR drop oc-
curring in the dynamic analysis.
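The idea of one grid solution per time step, with a worst-case state collected across all steps, can be sketched in Python as follows. This is a simplified illustration, not the Lightning algorithm: the grid is assumed to be a single resistor chain fed from one end, which lets each solve be done by accumulating IR drops instead of solving a matrix, and the names solve_chain, dynamic_analysis, and the scale parameter (standing in for the role of a current scale factor) are hypothetical.

```python
def solve_chain(vdd, r_segment, tap_currents):
    """DC node voltages of a resistor chain fed from one end by an ideal source.

    The current in the segment feeding node k is the sum of all tap currents
    at or beyond node k, so each voltage follows from the previous one.
    """
    voltages = []
    v = vdd
    downstream = sum(tap_currents)
    for tap in tap_currents:
        v -= r_segment * downstream   # IR drop across the segment feeding this node
        voltages.append(v)
        downstream -= tap             # this node's tap current flows no further
    return voltages

def dynamic_analysis(vdd, r_segment, currents_per_step, scale=1.0):
    """Solve once per time step and keep the lowest voltage seen at each node."""
    states = []
    worst = None
    for taps in currents_per_step:
        state = solve_chain(vdd, r_segment, [scale * i for i in taps])
        states.append(state)
        if worst is None:
            worst = list(state)
        else:
            worst = [min(a, b) for a, b in zip(worst, state)]
    return states, worst

# Three time steps of tap currents (amperes) at three nodes; values are made up.
steps = [
    [0.030, 0.000, 0.000],
    [0.000, 0.000, 0.020],
    [0.005, 0.005, 0.005],
]
states, worst = dynamic_analysis(vdd=1.8, r_segment=1.0, currents_per_step=steps)
print("worst-case voltages:", [round(v, 4) for v in worst])
```

In this toy data the lowest voltage at the first node comes from a different time step than the lowest voltage at the last node, which is why a single worst-case state is more convenient than inspecting each saved state in turn. A real grid is a mesh and needs a general linear solver for each step.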

5.5 LAYOUT EXPLORATION

VoltageStorm™’s power grid exploration capability enables the
designer to optimize the power grid or correct a problem inside
the database [51]. We can experiment with power grid changes,
such as adding or changing vias, voltage sources, or resistors, and
perform the power grid analysis to show the effects of these
changes in the power grid.
Power grid layout changes are easier to make before the signal
routing is completed. PGS exploration can be used once we
have placed all cells and transistors and have a complete physical
power network.
We do not have to wait until we have completed the signal routing.
Because we can easily explore the effects of changes to a power
grid network in the PGS exploration framework, we can determine
if we have overdesigned the power grid. Although some
overdesign is necessary, significant overdesign decreases the
available signal routing area and wastes die area.
PGS exploration lets us rapidly understand the consequences of
reducing or increasing power route widths, so we can adjust the
power grid design to the power grid requirements.
Once we load the power grid and tap currents into VoltageStorm™,
we can use PGS exploration [51] to modify the power
grid resistor network as desired. When we want to understand
the effects of the changes, we simply perform a power grid solu-
tion. We can continue to repeat the modification and solution
steps until the power grid is clean.
When we are satisfied that the power grid design is acceptable,
we write out the change report from VoltageStorm™ and use it as
the guide for implementing the changes in the layout. A change
report contains a summarized list of the changes made to the power
grid. The changes are made to the resistors within VoltageStorm™.
The change report is written in the layout format, which con-
tains the width, length, layer, and coordinate information that en-
ables a layout designer to easily implement the required layout
changes to the power grid. In order to avoid redundant layout
changes when a resistor is modified more than once, the change
report lists only the final modifications.
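The rule that a change report lists only the final modification of each resistor can be sketched in Python. The resistor names, the widths, and the function final_changes are hypothetical; the actual report format is not shown in the text and is not reproduced here.

```python
def final_changes(change_log):
    """Collapse a chronological change log to one entry per resistor.

    change_log: list of (resistor_id, new_width) tuples in the order applied.
    Returns a dict mapping each resistor to its last recorded width, ordered
    by each resistor's first appearance in the log.
    """
    final = {}
    for resistor_id, new_width in change_log:
        final[resistor_id] = new_width   # later entries overwrite earlier ones
    return final

# R17 is modified twice; only its last width should reach the report.
log = [("R17", 2000), ("R42", 1500), ("R17", 5000)]
for resistor_id, width in final_changes(log).items():
    print(resistor_id, width)
```

Keeping only the last entry per resistor means the layout designer applies each edit once, instead of replaying every intermediate experiment.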
PGS exploration includes the following commands:

1. addres: adds a resistor to the power grid
2. changeres: modifies an existing power grid resistor
3. addvia: adds a new via to the power grid
4. addvsrc: adds a voltage source to the power grid
5. show: displays the selected nodes or elements
6. unselect: deselects a single or multiple nodes or elements
7. write: writes out the change report

Most of the above commands allow us to modify the power grid
network.
Typically, we select the object to modify and then execute
the change command. We can select objects interactively by using
the middle mouse button and drawing a selection box over the
area to be selected. To be selected, an object must be completely
inside the selection box. We can also use the select command to
select objects, and we can select individual resistors interactively
by clicking on them with the middle mouse button [51]. The
following are the steps for one example using PGS exploration.

1. Move to the directory containing the Vdd power grid database,
and start Lightning from there:
Shell >> lightning
2. Load the power grid database into Lightning as follows:
Lightning > load design_VDD.mhdr
Add the voltage sources to the power grid using the speci-
fied command file as follows:
Lightning > run vsrc.cmd
Load the tap currents into Lightning as follows.
Lightning > iload VDD.ipeak
3. We can solve the power grid equation by using the following
command:
Lightning > solve
4. Use autofiltering to define the filter ranges for the IR drop,
and display the IR drop in the Lightning plotter window as
follows:
Lightning > filter ir auto
Lightning > plot ir
In this example, a large IR drop can be observed on the
left side of the central control section. It occurs because the
power routing to this block comes only from the right side. To
fix this large IR drop, a resistor can be added on the M1 layer
to connect the upper row of the control block.
5. First zoom in on the upper left corner of the central block by
using the left mouse button to draw a box from the lower
left to the upper right around the area. Select two nodes by
clicking on them, and then add a resistor between the selected
nodes using the following command:
Lightning > addres selected 1000
This command adds a resistor with a width of one micron
(1000 units) between the two selected nodes. Notice that the
resistance of this resistor is automatically calculated from the
process technology information.
Repeat the addres command to add additional resistors to
the adjacent nodes in the power grid.
6. Solve the circuit again and replot the IR drop. The IR drop
is reduced by the metal lines added to the power grid:
Lightning > solve
Lightning > plot ir
7. Now verify that the current density limits have not been ex-
ceeded after modifying the resistance in the power grid. The
autofiltering can be used to set the filters for the current
density and plot the current density errors:
Lightning > filter rj auto
Lightning > plot rj
The red color in the upper right corner indicates that the
current density has exceeded the limits. Change the width of
the resistor in the upper right corner by a factor of 5.0, deselect
all, solve the circuit, and plot the current density errors
as follows.
First select a red resistor by clicking on it with
the middle mouse button. Then perform the following
commands:
Lightning > changeres selected 5.0
Lightning > unselect all
Lightning > solve
Lightning > plot rj
We can see that changing the width of the resistor by a
factor of 5 made a significant improvement in the current
density. Then we replot the IR drop, since the resistance of the
power grid has been changed:
Lightning > plot ir
8. Next, we will generate the reports. The following command
will generate a report with a list of all resistors that have
been changed:
Lightning > changeres report
The following command will generate a report with a list
of all added resistors:
Lightning > addres report
The following command writes out the change report to
guide the layout changes:
Lightning > write gridchanges design_layout.eco
The above command creates a file named design_layout.eco
that contains all the commands that make changes to the
power grid. Then we quit Lightning to finish the ECO
changes in the power grid:
Lightning > quit
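The effect of the width changes made in steps 5 and 7 can be illustrated with the standard sheet-resistance relation R = Rsheet × L / W. The sketch below is only an illustration: the sheet resistance, wire length, and current are assumed values, and the tool's own calculation from the process technology information may include more detail.

```python
def wire_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a rectangular wire: sheet resistance times number of squares."""
    return sheet_res_ohm_per_sq * length_um / width_um

def current_density(current_ma, width_um):
    """Current per unit of wire width (mA/um), a common form of width-based limit."""
    return current_ma / width_um

SHEET_RES = 0.08    # ohm/square, assumed value for illustration only
LENGTH_UM = 50.0    # assumed wire length
CURRENT_MA = 4.0    # assumed current, held fixed for the comparison

for width_um in (1.0, 5.0):
    r = wire_resistance(SHEET_RES, LENGTH_UM, width_um)
    j = current_density(CURRENT_MA, width_um)
    print(f"width={width_um} um  R={r:.2f} ohm  J={j:.2f} mA/um")
```

Widening the wire by a factor of 5 divides both its resistance and its current density by 5 if the current stays the same. In a real grid the currents redistribute after the change, which is why the example solves the circuit again before replotting.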

5.6 SUMMARY

With the complexity of the power grid and reduced power supply
voltages in modern VLSI chips, CAD tools are necessary to assist
designers in finding failures or weak spots in power network designs.
This chapter discusses the most popular tool, VoltageStorm™
from Cadence, with its modeling and analysis capability,
and explains how to use this CAD tool to aid in IR drop analysis
and improvement.
The tool provides the following capabilities: (1) modeling the
power network as a resistance network, (2) modeling the transistor
switching current as tap currents, and (3) solving the
power network model as a linear circuit.
The tool also helps designers locate and fix errors in the power
grid layout. For example, PGS exploration uses the internal power
grid analysis database to fix the power grid and output the list of
changes needed to reach zero violations in the power grid. Layout
designers can use these changes, as necessary, to fix the power
grid design.

6 MICROPROCESSOR DESIGN EXAMPLES

Microprocessor chips usually consume a lot of power and therefore
place the most demanding requirements on power distribution
network performance. This chapter contains seven sections. Section
6.1 describes the Intel IA-32 Pentium-III chip [66]. Section 6.2
describes the Sun UltraSPARC chip [67]. Section 6.3 describes the
Hitachi SuperH microprocessor chip [68]. Section 6.4 describes
the IBM S/390 microprocessor [69]. Section 6.5 describes the Sun
SPARC 64b microprocessor [70]. Section 6.6 describes the Intel
IA-64 microprocessor [71]. Section 6.7 summarizes this chapter.

6.1 INTEL IA-32 PENTIUM-III

The Intel IA-32 microprocessor is implemented in a five-layer
metal 0.25 µm CMOS process technology [66]. Table 6-1 shows
the process technology parameters and operating voltage range
for this processor. The 10.1 × 12.1 mm² die contains 9.5 million
transistors. The functional unit-level local interconnects are routed
using lower metal layers with higher density, whereas the
global interconnects have been routed in the upper layers, M4 and
M5, which have lower metal resistance. The top metal layer (M5)
supports all of the C4 bumps.

Table 6-1. 0.25 µm CMOS process technology [66].

Gate oxide thickness    40 Å
Gate length             0.20 µm
M1 pitch                0.61 µm
M2 pitch                0.88 µm
M3 pitch                0.88 µm
M4 pitch                1.73 µm
M5 pitch                2.43 µm
Operating voltage       1.4–2.2 V

Alternating power and ground grids are implemented in M5
and M4 for global power distribution. Spacing and width of these
metals were selected such that inductive effects are minimized
and both AC and DC drops are reduced. For the local metal layers,
a tree-based distribution was chosen, with custom width selection
for the trunks and branches according to the area current
drain requirements. The global power grids and associated local
tree structures are shown in Figure 6-1 [66].
It is difficult to optimize the power distribution using a single
C4 bump pitch for both the I/O and the core due to their different
requirements. In the core, the optimization is primarily driven by
the potential for power collapse but constrained by the effective
routing channel space available for global signals. However, in
the I/O area, power collapse, minimization of the interconnect
length to a C4 bump, and package-level routability are some of
the additional constraints.
A 252 µm bump pitch for the core and 235 µm bump pitch for
the I/Os were chosen [66]. The overlap region between the core
and I/O area is strapped with custom power grids. In the I/O ring
design, special attention was paid to the placement of signals and
power/ground bumps and their ratio, such that loop inductance is
minimized while maintaining the continuous return paths for I/O
signals.
The processor is packaged in a six-layer organic land grid array
(OLGA) package. Dedicated power and ground planes are used to
minimize the noise due to package-level power distribution.
Power distribution was designed with two different Vcc supplies
to enable lower-power applications.
The core power supply voltage level can be dropped significant-
ly while maintaining the I/Os and other special analog circuits
with a different supply. All of the special circuits within the core
were verified at a 1.1 V supply voltage to enable this voltage scal-
ability.

Figure 6-1. (a) Global power grid (M4 and M5) and (b) local power trees for the
Intel IA-32 Pentium-III chip [66].

From a measured thermal profile of previous Intel microprocessors,
it was found that the voltage level due to power collapse
is not sufficiently uniform across the die to hit the projected
goal of the required clock frequency. A power distribution model
was developed such that we could study the power collapse in
different areas separately [66]. Knowing the worst-case switching
activity per area, the decoupling capacitor requirements on a
per-area basis are derived [66].
Various design profiles for the process technology are derived to
come up with the proximity roll-off characteristics. When design-
ing these optimizations, a broad range of frequency components
were considered in the modeling to capture several spectral com-
ponents created by the high-frequency edge rates associated with
transistor switching, as shown in Figure 6-2.
In Figure 6-2, the device junction voltage is plotted as a function
of decoupling capacitor distance for worst-case switching conditions.
It is observed that up to an 80 µm distance, the decoupling
capacitance behaves as if it is connected to the driver directly. Beyond
80 µm, the impact rolls off quickly, and beyond 200 µm its
contribution to the decoupling is negligible.
For the best case when neighbors are not switching, the roll-off
is extended to 100 µm but diminishes beyond 200 µm. With
better placement guidelines for the decoupling capacitors, as
shown in Figure 6-2, a more uniform power collapse is achieved
in spite of nonuniform current drain at various parts of the die.
Figure 6-3 shows the power fluctuations of the chip at various
points.

Figure 6-2. Decoupling capacitance effects on device voltage [66].

Figure 6-3. Device voltages across microprocessor chip [66].
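The distance dependence described above (full effect up to 80 µm in the worst case or 100 µm in the best case, negligible beyond 200 µm) can be turned into a rough placement weight. The sketch below is an assumption-laden illustration, not the model of [66]: the linear interpolation between the two break points and the function name decap_effectiveness are made up for this example.

```python
def decap_effectiveness(distance_um, full_until_um=80.0, negligible_beyond_um=200.0):
    """Weight between 0 and 1 for a decoupling capacitor at a given distance.

    Fully effective up to full_until_um, negligible beyond negligible_beyond_um,
    and linearly interpolated in between (the linear shape is an assumption).
    """
    if distance_um <= full_until_um:
        return 1.0
    if distance_um >= negligible_beyond_um:
        return 0.0
    span = negligible_beyond_um - full_until_um
    return (negligible_beyond_um - distance_um) / span

for d in (50, 80, 140, 200, 250):
    worst = decap_effectiveness(d)                       # neighbors switching
    best = decap_effectiveness(d, full_until_um=100.0)   # neighbors quiet
    print(f"{d:>3} um  worst-case={worst:.2f}  best-case={best:.2f}")
```

A weight of this kind could be used to discount capacitance that sits far from a noisy driver when estimating per-area decoupling requirements.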

6.2 SUN ULTRASPARC

A 1.1 GHz 64-bit UltraSPARC microprocessor has been described
in [67]. It is built on a 0.13 µm 7LM Cu CMOS process from Texas
Instruments Inc. The nominal channel length for the gate is 65
nm and interconnects use a low-k dielectric with dielectric
constant 3.6. The power consumption is 53 W at 1.1 GHz and 1.3 V
supply voltage [67].
The die size is 178.5 mm². The total transistor count of this
chip is 87.5 million, of which 63 million are in the SRAM cells.
The chip package is a 950 pin flip-chip micropin grid array
(µPGA). The signal-to-power pin ratio is 5:1 in the I/O
distributions.
Figure 6-4 is a die micrograph showing the floor plan of the
main functional blocks in this chip. The L2 cache or SRAM cells
are located at the bottom of the chip. The control and execution
units are located in the middle of the chip.

Figure 6-4. Sun UltraSPARC die micrograph [67].

The instruction cache and decoder blocks are located at the top
of the chip. The clock is distributed from the PLL output up to the
flip-flops through a balanced tree network. All the inputs of
flip-flops and clock buffers are connected through a clock grid
network to minimize clock skew.
The main power network uses a grid in three metal layers: M5, M6,
and M7. There are 2065 solder bumps, of which 1251 are
used for Vdd and Vss. These bumps are area-distributed over the
chip area by the flip-chip technology. The I/O contains 800 solder
bumps, 470 of which are signal bumps, whereas 330 are used for
power and ground. The bumps in the core area and in the channel
regions are placed away from the active circuitry to prevent soft
errors due to alpha particles released from the bumps.
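The figures quoted for this chip (53 W at 1.3 V, 1251 power and ground bumps) give a feel for the current each bump must carry. The Python sketch below is a back-of-the-envelope estimate only; the even split of the 1251 bumps between Vdd and Vss and the assumption that all of them feed the same supply are simplifications introduced here, not statements from [67].

```python
POWER_W = 53.0              # power consumption at 1.1 GHz, from the text
VDD_V = 1.3                 # supply voltage, from the text
POWER_GROUND_BUMPS = 1251   # bumps used for Vdd and Vss, from the text

def average_supply_current(power_w, vdd_v):
    """Average supply current implied by the power and supply voltage."""
    return power_w / vdd_v

def current_per_bump(total_current_a, bumps):
    """Average current per bump if the current is shared equally."""
    return total_current_a / bumps

total = average_supply_current(POWER_W, VDD_V)
vdd_bumps = POWER_GROUND_BUMPS // 2   # assumption: about half of the bumps carry Vdd
per_bump_ma = 1000.0 * current_per_bump(total, vdd_bumps)
print(f"average supply current: {total:.1f} A")
print(f"average current per Vdd bump ({vdd_bumps} assumed): {per_bump_ma:.1f} mA")
```

The result is an average; local peaks depend on the switching activity under each bump.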

6.3 HITACHI SuperH™ MICROPROCESSOR

A 200 MHz 0.2 µm Hitachi SuperH™ microprocessor has been described
in [68]. The microprocessor is fabricated with a 0.2 µm,
five-metal, dual-oxide-thickness, triple-well CMOS technology.
The top two of the five metal layers (M4 and M5) are thicker than
the lower three (M1, M2, and M3) and are usually used for the
global power distribution. The dual-tox structure enables the use
of MOS transistors with two distinct gate oxide thicknesses (tox)
and threshold voltages for both pMOS and nMOS devices.
Table 6-2 shows the process and device parameters used for
this processor. Thin-tox, low-threshold voltage devices are provid-
ed for the 1.8 V internal circuitry, and thick-tox, high-threshold
voltage devices are used for the 3.3 V circuitry, such as the I/O cir-
cuitry.
Figure 6-5 shows the pMOS and nMOS device layers and struc-
tures. Substrate biases, denoted as vbp and vbn in Figure 6-5, for
the thin-tox, low-threshold voltage devices are controlled through
the switched substrate impedance scheme. The substrate biases
for the thick-tox, high-threshold voltage devices are connected to
their local source terminals as in conventional CMOS devices.
In the standby mode, the substrates for the pMOS and nMOS
devices are biased to 3.3 and –1.5 V, respectively, to increase the
threshold voltages of the MOS transistors and lower the sub-
threshold leakage current. The substrates for the pMOS and
nMOS devices are biased to 1.8 V and 0 V, respectively, in the ac-
tive mode to maintain high-speed operation.
The high-speed switching of MOS transistors induces significant
power supply noise and local substrate noise. This noise makes it
difficult to bias the substrate of all MOS transistors uniformly. In
the active mode, the fluctuation in the substrate bias causes
significant threshold voltage variation and lowers the operating speed.

Table 6-2. Process and device parameters for the Hitachi SuperH™ CPU [68]

Technology                  0.2 µm, P-sub, triple-well CMOS
Gate channel length (Lg)    0.2 µm (1.8 V device) and 0.35 µm (3.3 V device)
Gate oxide thickness (tox)  4.5 nm (1.8 V device) and 8 nm (3.3 V device)
Threshold voltage (Vth)     0.15 V (1.8 V device) and 0.45 V (3.3 V device)
Metal layers                Metal 1–3 (0.88 µm pitch) and Metal 4–5 (1.76 µm pitch)
Area                        6.84 × 6.84 mm²
Transistor count            3.3 M

Figure 6-5. pMOS and nMOS device structures [68].

The peak overshoot of the substrate noise can be reduced by
lowering the supply voltage or increasing the source and substrate
diffusion capacitances. The decay time of the noise depends
on the substrate impedance. A long decay time that exceeds the
cycle time causes the substrate noise to accumulate.
To reduce the substrate impedance and achieve substrate bias-
ing, the switched substrate impedance scheme has been devel-
oped. This scheme switches the substrate impedance, as well as
the substrate bias, according to the operation mode. Figure 6-6
shows the switched impedance scheme for this microprocessor.
A standby controller and a vbb controller (VBC controller) con-
trol the voltage of the substrates, denoted as vbp for the pMOS
substrate and vbn for the nMOS substrate. In the standby mode,
these are driven with a high-voltage, high-output impedance dri-
ver in the VBC macro. In the active mode, the substrates are driv-
en with about 10000 switch cells over the chip [68].
Each switch cell consists of two thick-tox and high-threshold
voltage MOS transistors. One transistor with a gate signal cbp is
connected to vbp and vdd. Another with a gate signal cbn is connected
to vbn and vss. These transistors reduce the substrate impedance;
in other words, they keep the substrate biases of the MOS
transistors equal to their local power supplies.

Figure 6-6. Switched substrate impedance control scheme [68].

Therefore, even if the local power supply drops due to a power
line pump or simultaneous switching noise, the substrate bias is
quickly recovered. The VBC macro consists of four circuits
(VBCP, VBCN, VBCI, and VBCG) and is fed by supply voltages
vdd (normally 1.8 V) and vwell (3.3 V). VBCG generates the vsub
voltage, which is a negative voltage used as the third voltage source
in the VBC macro. The vsub voltage is equal to vdd – vwell = 1.8 V –
3.3 V = –1.5 V.
Figure 6-7 shows the waveforms of a complete transition from
active mode to standby mode. When the microprocessor goes from
the active to the standby mode, the standby controller stops all
1.8 V logic circuits. After that, it issues a vbbenb signal. Then the
VBC macro drives cbp up to vwell (3.3 V) and cbn down to vsub (–1.5
V). These signals turn off all switch cells. The VBC macro also drives
vbp to 3.3 V and vbn to –1.5 V. This mode transition takes
about 50 µs.

Figure 6-7. Control signal waveforms [68].

Figure 6-8 shows the layout of a standard cell and a switch cell
for random logic circuitry. Both cells have the same height. In a
conventional CMOS cell, the substrate biasing lines, vbp and vbn,
are connected to the power lines (vdd and vss) locally. In the new
scheme, these lines are interconnected separately to bias the
substrate.

Figure 6-8. Standard cell and switch cell layouts [68].

The substrate bias lines vbp and vbn are interconnected by M1
and are parallel to the power lines vdd and vss. The switch cell has
additional vertical power lines vdd and vss interconnected by M2.
Furthermore, between vdd and vss, there are four metal lines: two
are the substrate biasing lines vbp and vbn and the other two are
the gate lines cbp and cbn.
In order to reduce the chip area overhead, the design uses the
same cell height as the conventional CMOS
cell, as shown in Figure 6-8 [68]. The width of the power lines in
M1 is reduced to about 77% of that of the conventional CMOS cell.
This increases the impedance of the power lines.
To reduce the impedance, the power lines are routed in a fine
mesh structure. Figure 6-9 shows the metal routing of vbp, vbn, cbp,
cbn, and power lines. The switch cells are placed in rows, and the
distance between two switch cells is about 200 µm. The thicker
metal levels of M4 and M5 also form a coarse power line mesh that
reduces the impedance of the power lines. The chip area overhead
of the switch cells is less than 2% because the switch cells are
placed under the power lines in M2, as shown in Figure 6-9.
The data flow in the data path is designed so as to be parallel to
the power lines and p- or n-wells. This layout will reduce the
number of logic cells operating on the same well simultaneously.
It also reduces the injected noise.

Figure 6-9. Power grid structure for microprocessor [68].

The substrate biases of 3.3 V for
pMOS and –1.5 V for nMOS decrease the subthreshold leakage
current during the standby mode by about 1.5 orders of magnitude.
However, a larger body effect degrades the circuit performance
by elevating the threshold voltage in series-connected
MOS transistors or pass transistors.
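Two numbers in this section are easier to appreciate as plain ratios: the leakage reduction of about 1.5 orders of magnitude, and the M1 power line width reduced to about 77%. The short Python sketch below converts both. The resistance estimate assumes the line length and metal thickness are unchanged, which is an assumption made here rather than a statement from [68].

```python
def leakage_reduction_factor(orders_of_magnitude):
    """Convert a reduction given in orders of magnitude to a plain ratio."""
    return 10.0 ** orders_of_magnitude

def relative_resistance(width_fraction):
    """Resistance relative to the original wire when only the width changes."""
    return 1.0 / width_fraction

leak = leakage_reduction_factor(1.5)
res = relative_resistance(0.77)
print(f"standby leakage is reduced by a factor of about {leak:.1f}")
print(f"M1 power line resistance is about {res:.2f} times the original")
```

The roughly 30% increase in M1 line resistance is what the fine M1/M2 mesh and the coarse M4/M5 mesh described above are meant to offset.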

6.4 IBM S/390 MICROPROCESSOR

A microprocessor implementing IBM S/390 architecture operates
at frequencies up to 411 MHz (2.43 ns cycle time). The chip is
fabricated in a 0.2 µm Leff CMOS technology with five layers of
metal and tungsten local interconnects. The chip size is
17.35 mm × 17.3 mm with about 7.8 million transistors. The power
supply is 2.5 V and measured power dissipation at 300 MHz is 37 W.
Table 6-3 shows the typical technology parameters, including the
metal layer pitches.

Table 6-3. S/390 microprocessor technology parameters and chip
characteristics [69].

Leff                             0.2 µm
Gate oxide                       5.5 nm
M1 pitch                         1.2 µm
M2 pitch                         1.8 µm
M3 pitch                         1.8 µm
M4 pitch                         1.8 µm
M5 pitch                         4.8 µm
Power supply                     2.5 V
Transistor count                 Logic (3.8 million), array (4.0 million)
Die size                         17.35 mm × 17.3 mm
Power                            37 W at 2.5 V, 300 MHz
Maximum frequency                411 MHz
Area C4                          1600
Off-chip signal I/O              448
On-chip decoupling capacitance   102 nF

Figure 6-10 shows the die photo. There are 1600 area C4 bumps and
448 off-chip signal I/Os. Dedicated thin-oxide capacitors of 102 nF
are provided for on-chip decoupling [69]. Combined with the
“built-in,” nonswitching well-to-substrate and diffusion-to-well
capacitances, the total on-chip decoupling capacitance is about
200 nF [69].
The power distribution supports an average DC voltage drop of
23 mV. The Delta-I current transients were managed by including
additional on-chip decoupling capacitors around large noise
sources, such as the off-chip drivers, clock buffers, and on-chip
drivers with large loads. Since a large amount of switching capac-
itance occurs in the dataflow stacks, decoupling capacitors were
also placed under the wiring tracks.
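The quoted power, supply voltage, and average DC drop can be combined into a rough figure for the effective resistance of the power distribution. The sketch below treats the whole chip as one lumped load, which is a simplification introduced here for illustration and not a method described in [69].

```python
POWER_W = 37.0       # measured at 300 MHz, from the text
SUPPLY_V = 2.5       # power supply, from the text
DC_DROP_V = 0.023    # average DC voltage drop, from the text

# Average supply current drawn by the chip.
average_current_a = POWER_W / SUPPLY_V

# Lumped resistance that would produce the quoted drop at that current.
effective_resistance_ohm = DC_DROP_V / average_current_a

# The drop expressed as a percentage of the supply voltage.
drop_percent = 100.0 * DC_DROP_V / SUPPLY_V

print(f"average current: {average_current_a:.1f} A")
print(f"effective resistance: {1000.0 * effective_resistance_ohm:.2f} milliohm")
print(f"DC drop: {drop_percent:.2f}% of the supply")
```

With nearly 15 A of average current, a lumped resistance of only a few milliohms already accounts for the quoted 23 mV drop.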
The thin-oxide capacitor features a “built-in” fuse mechanism
whereby weak spots between M1 and contact are used to blow
connections to Vdd and ground in the presence of a large current
resulting from oxide defects. Each capacitor has a gated NFET
control device with an external decap_enable pin for leakage cur-
rent measurement during testing.

Figure 6-10. S/390 die photo [69].

Figure 6-11 shows the decoupling capacitor cell that fits under
the data flow wiring tracks. The cell is double bit-pitch wide (43.2
µm) and 14 tracks tall (25.2 µm). Two out of the 14 horizontal
wiring tracks are specially blocked for the decoupling capacitor
wiring so the capacitor can fit right under the wiring tracks. A
low-resistance layout of the capacitor cell provides a fast time con-
stant of about 85 ps.

Figure 6-11. Decoupling capacitor [69].

6.5 SUN SPARC 64B MICROPROCESSOR

This die with 750 I/O signals and 1735 power bumps is
flip-chip-attached to a multilayered ceramic land grid array
package [70]. Figure 6-12 shows the die micrograph of the chip [70].

Figure 6-12. Die micrograph of Sun SPARC 64-bit microprocessor [70].

The package lid is mated to an air-cooled heat sink containing a
heat pipe structure to control the die temperature. Power bumps
over the chip core minimize the IR and di/dt drops.
The on-chip Vdd peak-to-peak variation of about 260 mV is re-
duced to about 60 mV when the on-chip regulator is enabled, as
shown in Figure 6-13. Since the period of this resonance is much
longer than a CPU clock cycle, the CPU clock speed is limited by
the minimum voltages that are supplied during this resonance.
The maximum supply voltage must still be fixed at 1.6 V to assure
the long-term reliability.
The on-chip power distribution begins at the power and ground
solder bumps, placed primarily in channels to minimize soft er-
rors from the solder, and proceeds through the M7 distribution to
the M6 and M5 grids. The grid extends continuously over the
processor core, excluding the large RAM blocks so that any circuit
block can be connected vertically to a good power source.
This paired grid reduces the power supply and signal loop in-
ductance on the die. Gate oxide capacitors, which occupy all of the
unused silicon area under the wiring, connect to the power grid to
increase the on-chip bypass capacitance by 220 nF.
The power distribution system is verified for IR and EM compliance
using a Cadence tool [60]. This tool checks the power
distribution in both static and dynamic modes. Figure 6-14 shows
one static simulation result for the IR drop plot.

Figure 6-13. Supply voltage noise [70].

Figure 6-14. Full-chip IR drop plot [70].

This simulation was done after the core was attached to the
pad ring and the result shows a black region in the bottom right
of the die. The large IR drop highlighted there is where the
power supply connections between the core and the pad ring are
incomplete. A hook-up was added here later to fix this IR drop
problem.
Voltage regulation requirements of each generation of micro-
processors are more critical as the on-chip voltage decreases and
the AC current increases. Distributed thin-oxide capacitors are
used for supporting instantaneous current variations within the
die, but are insufficient to compensate for the tank circuit formed
by the parasitic LC in line with the supply distribution.
Simulation shows nearly an order of magnitude increase in
supply network AC impedance seen by an internal gate at reso-
nance. This resonant frequency is much lower than the system
clock frequency but can limit the speed performance. A special
voltage regulator circuit is placed 99 times to reduce the reso-
nance from the board to the package to the chip.
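The tank circuit mentioned above resonates at f = 1/(2π√(LC)). The Python sketch below evaluates that textbook formula. The inductance value is purely an assumption, and using the 220 nF of added bypass capacitance as the total on-chip capacitance is also a simplification; neither number should be read as a measured property of the chip in [70].

```python
import math

def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

L_SUPPLY_H = 50e-12    # assumed effective supply inductance (illustrative only)
C_ONCHIP_F = 220e-9    # bypass capacitance figure from the text, used as the total

f_res = resonant_frequency_hz(L_SUPPLY_H, C_ONCHIP_F)
print(f"resonant frequency: {f_res / 1e6:.1f} MHz")
```

With these assumed values the resonance falls in the tens of megahertz, consistent with the statement that it lies well below the system clock frequency.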
The voltage regulator circuit increases the charge stored or
delivered by a given amount of added decoupling capacitance by
actively increasing the voltage across the capacitors' terminals. The
operation is done by stacking fully charged equal-value capacitors
in series as a voltage multiplier to supply charge to the on-chip
power (Vdd) and ground (Vss) grid.
The depleted voltage in each capacitor is then (Vdd – Vss)/n,
where n is the stack height. Figure 6-15 shows a simplified block
diagram of the regulator for n = 2. Mutually exclusive CMOS
switches configure the capacitors to either be in the charging
phase when shunting across Vdd – Vss, or in the discharging phase
in series between Vdd and Vss.
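The charge available from the stacked capacitors can be estimated with an idealized model. In the Python sketch below, n equal capacitors are each charged to the supply voltage in shunt and then switched into series across a grid that is assumed to stay at the supply voltage; the switches are assumed ideal, and the capacitor value, supply voltage, and droop are illustrative numbers only, not values from [70].

```python
def stacked_charge(c_each_f, n, v_supply):
    """Charge delivered when n equal capacitors, each charged to v_supply in
    shunt, are switched into series across a grid held at v_supply.

    The series capacitance is C/n and the stack voltage falls from n*V to V.
    """
    return (c_each_f / n) * (n * v_supply - v_supply)

def depleted_voltage(n, v_supply):
    """Voltage left on each capacitor after the series discharge."""
    return v_supply / n

def passive_charge(c_each_f, n, droop_v):
    """Charge the same n capacitors deliver as plain decoupling for a given droop."""
    return n * c_each_f * droop_v

C_EACH = 1e-9    # 1 nF per capacitor (assumed)
V_SUPPLY = 1.5   # Vdd - Vss in volts (assumed)
DROOP = 0.075    # 5% supply droop seen by passive decoupling (assumed)

active = stacked_charge(C_EACH, 2, V_SUPPLY)
passive = passive_charge(C_EACH, 2, DROOP)
print(f"each capacitor ends at {depleted_voltage(2, V_SUPPLY):.2f} V")
print(f"stacked: {active * 1e9:.2f} nC, passive: {passive * 1e9:.2f} nC")
```

Under these assumptions the stacked pair gives up half of its stored voltage, so it delivers several times the charge that the same capacitors would supply passively during a small droop.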
The sizes of the capacitors are chosen to exhibit the proper
equivalent series resistance (ESR). The switches are driven by two
sets of complementary drivers, each of which provides two outputs
with enough voltage offsets to ensure the minimal crowbar
leakage through both charge and discharge switches during the
switching activity.

Figure 6-15. Block diagram of voltage regulator [70].

The operation of the voltage regulator shown in Figure 6-15 is
described as follows. The instantaneous difference Vinst between
Vdd and Vss begins at the same value as the average Vdd – Vss
(Vave). In this condition, N2 and P2, the shunt switches, are weakly
on with gate-to-source voltages of (Vdd – Vss)/2 each, whereas N1
and P1, the series switches, are completely off. Then Vinst drops,
causing node B to fall, cutting off N2. Slightly later, node A falls,
turning on P1. This changes C2 from being in shunt with C1 to being
in series. Similarly, the mirror devices, P2 and N1, are cut off
and turned on, respectively. This allows the series-connected C1
and C2 to discharge into the power grid, which forces Vinst up. In
the next time section, where Vinst > Vave, node A rises, cutting off
P1, and then node B rises, turning on N2. Similarly, N1 turns off
and then P2 turns on. This switches C1 and C2 into the shunt
mode, allowing them to be charged by Vinst and forcing Vinst to drop.
Once Vinst = Vave, node B returns to Vdd/2, which returns the
circuit to the weakly charging mode.
The switched capacitors are enhancement mode MOSFET de-
vices, laid out in a waffle-type structure to maximize capacity
[70]. The regulators are evenly distributed across the chip in 99
instances, which are directly hooked up to the main global power
grid.
Care has been taken in shielding sensitive signals and in man-
aging high-current-density paths. The regulators are placed un-
derneath the global routing channels to reduce the layout area
impacts.

6.6 INTEL IA-64 MICROPROCESSOR

This microprocessor implements a highly parallel execution core,
while maintaining binary compatibility with the IA-32 instruction
set [71]. The processor contains 25.4 million transistors, and is
fabricated in a 0.18 µm CMOS process with six metal layers using
C4 or flip-chip assembly technology in an organic land grid array.
Table 6-4 shows the process technology used in the
manufacturing of the processor. Figure 6-16 shows the die micrograph for
this processor and Figure 6-17 shows the architecture [71]. Four
1 MB L3 cache chips are connected to the processor die by a
core-speed backside bus (BSB).

Table 6-4. 0.18 µm process technology [71]

Poly      M1        M2        M3        M4        M5        M6
0.48 µm   0.60 µm   0.72 µm   0.72 µm   1.45 µm   1.80 µm   2.00 µm

Figure 6-16. Die photograph of Intel IA-64 microprocessor [71].

Figure 6-17. Architecture of Intel IA-64 microprocessor [71].

All these components are packaged in a cartridge optimized for
double-sided motherboard mounting, as shown in Figure 6-18. The
processor has fifteen execution units, including four integer and
two floating-point units. The processor includes three levels of
cache organized hierarchically. The L1 and L2 caches are
integrated on the die. The L3 cache contains up to 4 MB of
custom-designed on-cartridge memory and is connected to the
processor die by a dedicated 128-bit source-synchronous BSB
interface.
Power is delivered from the voltage converter to the processor
cartridge through a separate connector that provides significantly
lower impedance compared to traditional power delivery using
pins through the motherboard socket. The chip-level power distri-
bution consists of a uniform M6–M5 grid with C4 power and
ground bump arrays.
This grid has the power and ground lines finely interspersed
with signal traces to reduce inductive crosstalk; that is, a very
wide power or ground line is split into multiple thin, alternating
Vdd and Vss lines in order to reduce the inductance of the
switching-current return paths.
The on-die decoupling capacitors are placed in the proximity of
the high di/dt switching circuits, as well as in all the routing
channels. The total on-die decoupling capacitance is about 800 nF
in this microprocessor [71]. In addition, on-package decoupling ca-
pacitances have been added to reduce the synchronous switching
noise from the I/O buffers.

Figure 6-18. Package of Intel IA-64 microprocessor [71].

This microprocessor allows the use of clock gating to reduce the
average power without any loss in performance. Figure 6-18
shows the internal structure of the processor cartridge. The
processor is C4-attached to a multilayer organic land grid array
(OLGA) package, which is soldered to the base cartridge
substrate. Inductive signal return current loops are minimized by
proper placement of return vias for image currents propagating in
the reference planes inside the multilayer package.

6.7 SUMMARY

With microprocessor frequency continuing to rise and supply
voltage continuing to decrease, the power delivery system remains
a major challenge in microprocessor design. The C4 or flip-chip
package, with area solder bumps, is used in modern microprocessor
chips. A dense power grid in multiple metal layers is used to
achieve low-resistance delivery of power inside the die.
To prevent di/dt noise from disrupting circuit functions, a large
amount of decoupling capacitance is placed on the chips.
Package design with decoupling capacitors is essential to reduce
the voltage drop; multiple power and ground planes are
used for this purpose.
In addition, the voltage regulators of microprocessor chips have
been moved onto the die to stabilize the steadily decreasing
on-die power supply voltage, as in the Sun SPARC 64-bit
microprocessor design example [70].

7
PACKAGE AND I/O DESIGN
FOR POWER DELIVERY

The power delivery performance of a VLSI system depends not
only on the on-chip power network, but also on the system-level
power distribution, including the package options and board pow-
er planes. The voltage drop and power noise are influenced by the
chip, the package, and the entire board. Each of the components
in the system will contribute to the voltage noise as a whole.
Therefore, the package options and I/O design for power supplies
are important in the VLSI power network design.
This chapter is organized into six sections. Section 7.1 describes
the flip-chip package technology. Section 7.2 discusses the simul-
taneous switching noise for off-chip drivers. Section 7.3 provides a
case study of how to evaluate the package technology and metal
options in a high-performance microprocessor [76]. Section 7.4
discusses microprocessor power noise measurement techniques.
Section 7.5 describes the I/O pads for power and ground supplies
to the chip. Section 7.6 summarizes the chapter and also high-
lights some thoughts on the chip and package codesign concept
[81, 82].

7.1 FLIP-CHIP PACKAGE

The length of the electrical connections between the chip and the
substrate can be further reduced using flip-chip or C4 technology.
This technology is achieved by distributing the I/O solder bumps
over the die, flipping the chips over, aligning them with the con-
tact pads on the substrate, and connecting the solder bumps be-
tween the chip and package to make connections.
This saves silicon area and increases the maximum number of
I/O and power/ground terminals available with a given die size.
This package also provides more efficiently routed signal and
power/ground interconnections on the chips. Therefore, modern
high-speed chips and microprocessors use this flip-chip technolo-
gy to achieve high speed and lower power noise.
For example, the 450 MHz RISC microprocessor from Motorola
has a chip footprint with a total of 794 C4 or flip-chip pads [72].
Two hundred and sixty-six pads are used for 64-bit bus transfer,
64-bit L2 interface, and control. The remaining C4 pads are used
for power and ground and possible extension to 128 bit bus trans-
fer and L2 interface options.
The 1.8 V Vdd and ground C4 pads are distributed over the core
of the chip to reduce the voltage drop and feed the internal power
structure. The signal I/Os are distributed around the periphery to
reduce the wiring congestion in the package substrate and to iso-
late the ESD structures from the internal circuits.
L2 cache interface C4s are placed along the left side (bits
0–63) and bottom (bits 64–127) of the chip. This allows for an
optimal multichip module design of this processor, with two
SRAMs using the 360-pin solution. The data transfer signals are
on the right side of the chip, and the address/control signals are
at the top.
A total of 236 Vdd and ground C4 pads serve the internal
1.8 V core and supply 1.8 V power to the off-chip I/O drivers and
receivers; 55 Vdd and ground C4s serve the external L2 interface, and
73 Vdd and ground C4s serve the external bus transfer address and
control [72].
Flip-chip connection technology, as the first-level chip-to-package
connection option, is traditionally identified with the
controlled collapse chip connection (C4) process, which was
originated by IBM [73].
Figure 7-1 shows the schematic: a bare IC device is
flipped upside down, with its active area or I/O side attached to a
substrate via a connecting medium. The substrate may be any
medium that provides an interconnection network between the
flipped active device and other active, or even passive, devices,
such as decoupling capacitors.

Figure 7-1. Flip-chip package [73].

There is another feature unique to having the active side of the
chip face the top of the interconnecting substrate. Since the I/O
pads on the chip also are fabricated on the active side, the layout
of these pads easily can be expanded into an array covering the
entire inner area of the chip, rather than being confined to the
perimeter.
Area-array I/Os in the flip-chip package offer a way of increasing
I/O density. For a chip size of 5 mm and a constant I/O pad
spacing of 100 µm, a perimeter array could accommodate about
200 I/Os, whereas an area array could accommodate about 2000
I/Os, a tenfold increase.
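
The I/O counts above can be reproduced with a short idealized sketch. It assumes a square die fully populated at a constant pitch, with no edge clearance or keep-out regions; the function name is hypothetical. Under these assumptions the area array comes out at 2500 pads, so the "about 2000" quoted above presumably allows for unusable sites.

```python
def io_counts(die_um, pitch_um):
    """Idealized pad counts for a square die of side die_um at a constant pitch.

    Returns (perimeter_count, area_count). Edge clearance and keep-out regions
    are ignored, so both numbers are upper bounds.
    """
    per_side = die_um // pitch_um       # pads that fit along one edge
    perimeter = 4 * per_side            # one row along each of the four edges
    area = per_side * per_side          # full grid over the die
    return perimeter, area


perimeter, area = io_counts(die_um=5000, pitch_um=100)
print(perimeter, area, area / perimeter)
```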
Only the flip-chip configuration provides the ability to achieve
higher I/O density without decreasing I/O pitch. Flip-chip bonding
also offers the shortest possible leads with the lowest inductance,
maximizing the operating frequency.
Table 7-1 shows the typical values of the lead inductance and ca-
pacitance in various chip package choices. The solder bump pro-
vided by flip-chip technology has the lowest inductance and lowest
capacitance, compared to wire bonding and TAB technologies [74].

7.2 SIMULTANEOUS SWITCHING NOISE (SSN)

When a number of off-chip loads are switched simultaneously in a
digital system, a current change is produced in the power and

Table 7-1. Typical values of lead inductance and capacitance [74]

Package technology    Capacitance (pF)    Inductance (nH)
Wire bonding          0.5                 1–2
TAB                   0.6                 1–6
Solder bump           0.1                 0.01

ground supply network [73]. Consider a 32-bit driver bank with a
5 V voltage swing and a rise time of 2 ns driving a 320 pF load. This
will generate a transient current of I = CΔV/Δt = 0.8 A.
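
As a quick check of this estimate, the sketch below evaluates CΔV/Δt for the quoted values. Note that the expression has units of amperes, that is, it gives the charging current during the edge. The function name is hypothetical.

```python
def switching_current(c_load_f, delta_v, rise_time_s):
    """Average current needed to charge c_load_f through delta_v in rise_time_s."""
    return c_load_f * delta_v / rise_time_s


# Values quoted in the text: 320 pF load, 5 V swing, 2 ns rise time.
i_switch = switching_current(320e-12, 5.0, 2e-9)
print(f"I = {i_switch:.2f} A")
```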
When this transient current passes through the inductive pow-
er distribution network, a noise voltage is produced. This simulta-
neous switching noise is sometimes referred to as ground bounce.
The switching noise can result in a number of problems if not
handled correctly.
The noise appears at the output of what were intended to be
quiet off-chip drivers. This noise appears at the inputs of the con-
nected receivers. The changes in the internal chip supply voltage
make the circuits operate more slowly, and thus increase the de-
lay in switching drivers.
Overshoots and undershoots might also appear in these dri-
vers. For on-chip circuits acting as input gates, the simultaneous
switching noise acts to reduce the effective noise margin at the in-
puts. For on-chip memory devices, such as latches, large amounts
of the ground rail and power rail noise might cause false changes
in the logic state. In the first order, the noise generated by the si-
multaneous switching of N output drivers can be calculated as fol-
lows [74]:

$$ \Delta V = N \cdot L_{eff} \cdot \frac{di}{dt} \tag{7-1} $$

where Leff is the effective inductance of the power and ground con-
nections, di/dt is the peak rate of the change of the currents for
each driver, and N is the number of drivers used during the
switching. di is the current demand of each driver during the
switching event, and dt is the rise and fall time of the signal.
In reality, ΔV does not increase linearly with Leff or N,
because any increase in ΔV slows down the circuits and reduces
di/dt. The effective inductance Leff is primarily a function of
the package design. Reducing Leff requires minimizing the
inductances of the power and ground distribution networks and also
the use of decoupling capacitors.
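
A minimal sketch of the first-order estimate in Equation (7-1) follows. The driver count, effective inductance, and per-driver current slope are hypothetical illustration values, not data from [74], and the result should be read as an upper bound, since the text notes that the real noise grows sublinearly.

```python
def ssn_first_order(n_drivers, l_eff_h, di_dt_a_per_s):
    """First-order simultaneous switching noise: delta_V = N * Leff * di/dt."""
    return n_drivers * l_eff_h * di_dt_a_per_s


# Hypothetical example: 32 drivers, 0.1 nH effective inductance,
# each driver ramping 25 mA in 1 ns (2.5e7 A/s).
di_dt = 25e-3 / 1e-9
noise_v = ssn_first_order(32, 0.1e-9, di_dt)
print(f"first-order SSN = {noise_v * 1e3:.0f} mV")
```

Halving Leff or the number of simultaneously switching drivers halves this estimate, which is why both package inductance and output staggering appear as design levers.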
The decoupling capacitor placed between the power and ground
pins of each chip can act as a local source of the charges during
the switching events, so that not all of the switching current has
to be supplied from the system ground to minimize the local
change in voltage. Figure 7-2 shows the equivalent circuit model
for a CMOS output driving a capacitance [74].

Figure 7-2. Electrical modeling of a package power distribution network [74].

A couple of inductances are included in this model: the
inductance of the ground lead in the chip attachment and the
inductance of the ground plane or wiring between the chip attachment
and the decoupling capacitors. The parasitic inductance and
capacitance associated with the decoupling capacitor are also shown
in this figure.
To minimize the Lgnd and L0, the ground and power planes are
used in the package design. The decoupling capacitors on the
package should be placed close to the chips, since at high frequen-
cies it is important to minimize the parasitic R and L of the decou-
pling capacitors. More leads or I/Os assigned to the chip are pre-
ferred to reduce the inductance.
There are several other sources of noise that must be consid-
ered in the package design [74]. Solutions include the use of mul-
tiple power supply planes or using a ceramic substrate base with
thick-film ground and power planes within it. Table 7-2 shows the
relative noise budgets for each noise source, including reflection
noise, crosstalk noise, and simultaneous switching noise [74].
These noise budgets include two different types of reflection noise:
reflection from loads and reflection due to mismatches between
different transmission lines.
The simultaneous switching noise refers to the noise at the
outputs of the quiet drivers when they are grounded. The root sum of
squares of the different noise voltages is calculated as follows [74]:

Table 7-2. Noise budgets for package and system level at 6°C [74]

Noise source                           Noise budget (mV)
Load reflections                       100
Interconnect impedance mismatch        100
Crosstalk                              100
Simultaneous switching noise (SSN)     150
AC noise                               25
Signal IR drop                         25
Vcc IR drop                            14
Internal chip noise                    50

$$ V_{RSS} = \left( V_{load\_reflection}^2 + V_{mismatch\_reflection}^2 + V_{crosstalk}^2 + V_{SSN}^2 + V_{AC}^2 + V_{IR\text{-}sig}^2 + V_{IR\text{-}Vcc}^2 + V_{thermal}^2 \right)^{1/2} \tag{7-2} $$
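
As an illustration, the root sum of squares can be evaluated with the budgets listed in Table 7-2. Using the "internal chip noise" entry for the last (thermal) term of Equation (7-2) is an assumption made here, since the table and the equation name that term differently.

```python
import math

# Noise budgets in mV, taken from Table 7-2. Mapping "internal chip noise"
# onto the last term of Equation (7-2) is an assumption.
budgets_mv = {
    "load_reflection": 100,
    "mismatch_reflection": 100,
    "crosstalk": 100,
    "ssn": 150,
    "ac": 25,
    "ir_signal": 25,
    "ir_vcc": 14,
    "internal_chip": 50,
}


def root_sum_of_squares(values):
    """Combine independent noise contributions as a root sum of squares."""
    return math.sqrt(sum(v * v for v in values))


v_rss_mv = root_sum_of_squares(budgets_mv.values())
v_linear_mv = sum(budgets_mv.values())
print(f"RSS = {v_rss_mv:.1f} mV, linear sum = {v_linear_mv} mV")
```

The RSS total is well below the linear sum because the contributions are treated as uncorrelated; the linear sum would apply only if every source peaked at the same instant.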

The parameters in the package model, such as those of the
simultaneous switching noise model shown in Figure 7-2, are provided by
the package vendors. In addition, the transistor models of the I/O
drivers are also included in the simulation model. An electrical
study will provide the required amount of decoupling capacitance and
design guidelines for the package layers.
In addition, after the layout of the package layers has been
done, the RLC parasitics are extracted with
CAD tools, and then circuit simulation is done to check the
performance of the package design, especially the simultaneous
switching noise, against the required budgets.
The simulation conditions are set up correctly to model the cir-
cuit operation environment. Any deviations from the simulation
will be reported as a possible drawback in the design and improve-
ments will be adopted; for example, adding more decoupling capac-
itors or using additional power and ground planes on the package.
Figure 7-3 shows a simulation model, with the simulation con-
ditions switched within 2.5 ns, and with a ramp-up and ramp-
down peaking at 1.25 ns. Table 7-3 provides the assumptions
made on the package and chip parameters. The ground or Vss in-
ductance and resistance parameters reflect the Vdd path parame-
ters. Four different package types were investigated with the pa-
rameters, as shown in Table 7-3.
The simulation model of the package-level power network can
be simplified to a tank circuit, as shown in Figure 7-4. L and R are
the lumped parasitic inductance and parasitic resistance of the

Figure 7-3. Simulation model of package performance [75].

power distribution network from the voltage regulator or the
voltage source to the chip. Cd is the total capacitance at the inputs of
the chip, including the added decoupling capacitors on-package
and on-chip. The resonance frequency of the tank circuit is given
as follows:

$$ f = \frac{1}{2\pi\sqrt{L C_d}} \tag{7-3} $$

The resonance quality factor Q determines the impedance of the
network at the resonance frequency as follows:

$$ Q = \frac{\sqrt{L / C_d}}{R} \tag{7-4} $$

For the design improvement, we can increase Cd so that the
resonance frequency f is very small compared to the operational
frequency range, and Q is small. We can also decrease the package
inductance L to the extent that f is very large. We can achieve
Table 7-3. Simulation model parameters of four packages [75].

Package      Lp/g (package)   Rp/g (package)   Lp/g (bond)   Rp/g (bond)   Rc
Package A    180 pH           1.0 mΩ           180 pH        1.0 mΩ        2.5 mΩ
Package B    80 pH            1.0 mΩ           90 pH         1.0 mΩ        2.5 mΩ
Package C    67 pH            1.0 mΩ           74 pH         1.0 mΩ        2.5 mΩ
Package D    55 pH            1.0 mΩ           30 pH         0.5 mΩ        2.5 mΩ

Figure 7-4. Tank circuit model for power distribution.

very high frequency with the flip-chip package and use both the
package-level and on-chip decoupling capacitors.
The degree of ground bounce depends on multiple factors, such
as the total current inputs to the chip, the clock delay and skew,
and the switching activity factor. With reduction in clock delay
and clock skew, the higher harmonic components will become
stronger, and the most acceptable design technique would be
staged decoupling, with both on-package and on-chip decoupling
to reduce the package resonance to a small value.
With no additional on-package decoupling capacitance, it re-
quires a very large on-chip capacitance to decouple the package in-
ductance. As the power and ground inductance decrease to meet
the simultaneous switching noise reduction requirement, the re-
quired on-chip capacitance becomes larger. On-package decoupling
capacitors should be used to decouple the package inductance.
The value should be high enough to make the first resonant fre-
quency and the resonant impedance sufficiently low. The reso-
nant frequency, as specified in Equation (7-3), should be four to
five times smaller than the clock frequency.
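
The tank-circuit relations can be explored with a small sketch. It evaluates the resonance frequency of Equation (7-3) and assumes the standard series-RLC quality factor Q = sqrt(L/Cd)/R. The component values and the 100 MHz clock are hypothetical, chosen only to show how a larger Cd lowers both f and Q.

```python
import math


def tank_resonance_hz(l_h, c_d_f):
    """Resonance frequency of the lumped L-Cd tank, Equation (7-3)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_d_f))


def tank_q(l_h, c_d_f, r_ohm):
    """Series-RLC quality factor, assumed here to be sqrt(L/Cd)/R."""
    return math.sqrt(l_h / c_d_f) / r_ohm


# Hypothetical values: 100 pH loop inductance, 5 mOhm loop resistance.
L_LOOP, R_LOOP = 100e-12, 5e-3
for c_d in (100e-9, 400e-9):
    f_res = tank_resonance_hz(L_LOOP, c_d)
    q = tank_q(L_LOOP, c_d, R_LOOP)
    print(f"Cd = {c_d * 1e9:.0f} nF: f = {f_res / 1e6:.1f} MHz, Q = {q:.2f}")

# Guideline from the text: resonance four to five times below the clock.
F_CLK = 100e6  # hypothetical clock frequency
print("meets 4x guideline:", tank_resonance_hz(L_LOOP, 400e-9) <= F_CLK / 4)
```

Quadrupling Cd halves both the resonance frequency and Q in this model, which is the behavior the design guideline relies on.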
The following parameters should be considered when predict-
ing the high-frequency ground bounce:

1. The chip current demand and clock skew

2. Package RLC parasitic
3. Chip RC parasitic of the power and ground network
4. The number of gates, the activity factor, and the average
loading on each gate to estimate the on-chip capacitance

In the worst case, when there is no on-package capacitance or large

power and ground inductance, the high-frequency bounce on either
power rail is roughly determined by the above factors [75].
The ground bounce predominately observed in the time domain
analysis is referred to as low-frequency bounce; it occurs with a
frequency of fclk and 2fclk [75]. The magnitude of the low-frequency
bounce may be conservatively estimated by the following equation
[75]:

$$ \mathrm{Bounce}(f_{clk}) = \frac{P_{core}}{3.3 \cdot Z_{in}(f_{clk})} \tag{7-5} $$

Here Pcore is the power dissipation due to the core gates, including
the flip-flops. As the power dissipation increases, it becomes
necessary to decouple at very high frequency.
With less on-chip decoupling, it is important to reduce the chip-
to-package inductance with integrated decoupling, along with
large high-performance on-package decoupling. The bounce mag-
nitudes observed in the simulations are less than 70% of the val-
ues predicted by the above equations [75].
The delay derating factor for the ASIC standard cell library is
Kv = 1.03 for a 160 mV reduction in the supply voltage, that is, a 3%
increase in the delay for a 5% reduction in the voltage [75]. For a 320
mV peak bounce on both Vdd and Vss, the delay penalty is 6%,
approximating the dynamic effect of the bounce with an average
effect.
For a critical path in a 100 MHz system, if only 5 ns of the gate
and load delay is affected by this bounce, the delay penalty
is 300 ps [75]. Unless the low-frequency bounce is designed within
a controlled limit, the effect on chip power consumption may be
noticeable. This results from the fact that the ground bounce
affects the power consumption of all gates in the chip.
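
The derating arithmetic above can be written out as a short sketch. It assumes the penalty scales linearly with the bounce (3% per 160 mV), which matches the two data points quoted from [75] but is otherwise an assumption; the function name is hypothetical.

```python
def bounce_delay_penalty(bounce_mv, affected_delay_ns, pct_per_160_mv=3.0):
    """Delay penalty from supply bounce, assuming linear derating.

    Returns (penalty as a fraction of the delay, penalty in picoseconds).
    """
    fraction = (bounce_mv / 160.0) * pct_per_160_mv / 100.0
    return fraction, fraction * affected_delay_ns * 1000.0


# Values quoted in the text: 320 mV peak bounce, 5 ns of affected path delay.
fraction, penalty_ps = bounce_delay_penalty(320.0, 5.0)
print(f"penalty = {fraction * 100:.0f}% = {penalty_ps:.0f} ps")
```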
The simulations indicate a significant variation in the power
dissipation. For example, for a very high performance package
with no on-package decoupling, the power dissipation may vary
from 13.0 W to 18.9 W [75]. One effect of the power consumption
from low-frequency bounce is that it is dependent on the relative
position of the clock frequency and resonance frequency.
A study methodology for the ground bounce and decoupling ca-
pacitance has been proposed as follows [75]:

1. Obtain the current requirement of the chip based on real
analysis or based on some known parameters, such as gate
count, activity factor, clock delay, clock skew, current peak,
current width, etc.
2. Obtain the package RLC characteristics and the chip RLC
characteristics.
3. Design the preliminary decoupling network based on the
equations.
4. Obtain the frequency domain characteristics of the decou-
pling network through SPICE simulation and modify the
network using the measurement information.
5. Obtain the dominant frequency components of the current
waveform and extract the magnitude of current waveforms
in the desired frequencies. The desired frequencies are com-
monly fclk and 2fclk.
6. Check the decoupling condition at high frequencies (5fclk to
8fclk) to eliminate the high-frequency bounces.
7. Check the decoupling condition at low frequencies all the
way to DC. This is particularly important if specialized
megacells like memories are used, which can excite very
high harmonics.
8. Set a targeted bounce number and modify the decoupling
network.
9. Distribute the on-chip capacitance according to the local-
ized current demand within the chip.
10. Distribute a power and ground network on the chip to min-
imize the localized bounce.
11. Verify the on-chip power distribution for local hotspots af-
ter the layout. This will require modeling and extraction ca-
pabilities for on-chip power distribution parameters like re-
sistance, inductance, and capacitance.

To address high-performance and high-integration applications,
flip-chip technology with integral power and ground
distribution should be used, along with the on-chip decoupling provided
by the gate capacitance.
Lower-cost packaging solutions may be available for
low-integration, high-performance and high-integration, low-performance
applications, but the design should still go through the power
distribution methodology, integrating both the on-chip and on-package
decoupling. This also emphasizes the need for a chip design flow

Figure 7-5. Vss ground bounce in a flip-chip package [75].

that includes the package. Figure 7-5 shows the simulation result
of the Vss bounce noise for a flip-chip package [75].

7.3 CASE STUDY OF A MICROPROCESSOR-LIKE CHIP

The purpose of this case study is to analyze the power network on
a microprocessor-like die for several technology options. The
study is based on a distributed model of the chip, with current
sources representing the active circuitry. The model is tested for
normal, power saving, and power peak modes. The die size is
about 17 × 17 mm² and the power supply is about 2.5 V, with an
average current of 12.5 A for an average power of 31 W [76].
The power network was known to be a significant problem in
terms of both metal utilization and voltage drop in the center of
the die. There are several options considered in this study as fol-
lows [76]:

1. Thick M4 with wire bond.

2. Routing most of the Vss through the substrate to reduce
crowding on M4 and improve routability as well as average
voltage in the center.
3. Wire bond with M4/M5.
4. Using C4 with M4. In C4 technology, the power is routed
through the package. The M4 utilization is very low, while
the effective inductance and resistance of the power
network are also low.

Figure 7-6(a) shows the power routing configuration with wide
supply lines (120/120 µm wide). Simulations showed
that the inductance in this case is quite high. With a 30%/30%
utilization of Vcc/Vss in M4, we observed an inductance of about 0.2
nH/square.
In Figure 7-6(b), the case of interdigitated lines is shown. The
more the Vcc and Vss lines are interdigitated, the lower the
inductance, assuming that adjacent power lines carry opposite currents.
With about 10 pairs of 12/12 µm Vcc/Vss lines, the inductance was
reduced by an order of magnitude, compared with the power routing
line widths shown in Figure 7-6(a), to 0.02 nH/square.
The package for a wire-bond case is shown in Figure 7-7(a).
Assuming that discrete low-inductance capacitors are used in the
package and a total of 300 bond wires for Vcc and Vss, the total package
inductance is 114 pH per side, and the bond-wire inductance
causes 65% of the total inductance.
The power in the package is supplied only from two sides. From
the process point of view, it is easier to make the last metal
thicker than the others; and from a routing point of view, it is
preferable to have most of the power routed in the thickest, uppermost
metal layer.

Figure 7-6. Vcc and Vss line configurations [76]: (a) wide supply lines; (b) interdigitated lines.

Figure 7-7. Wire-bonding package with decoupling capacitors [76].

The bonding wire is 2.7 mm long with radius 12.5 µm and pitch
125 µm. Each bonding wire has an inductance of approximately
72 pH. The discrete capacitor lead has an inductance of 15 pH.
The Vss and Vdd planes have about 27 pH inductance, and the
total wire-bonding package inductance is about 114 pH (72 + 15 +
27).
The C4 package is shown in Figure 7-8. We now have 50 µm
long solder balls with radius 64 µm and pitch 250 µm. The total
number of C4 balls is about 3000. The C4 inductance has been
reduced to be negligible. The C4 has completely removed the
package-to-chip bottleneck. We also benefit from the four-sided supply
of the C4 due to the power planes in the package below the chip.
The effective package inductance has been reduced from 57 pH
in the wire-bond case (two 114 pH sides in parallel) to 10 pH in the
C4 case. This can be further
improved by placing the package decoupling capacitors closer to
the chip, and by using a large number of on-chip decoupling
capacitors.
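
The wire-bond inductance figures can be tallied with a few lines of arithmetic. Reading the 57 pH effective value as two equal 114 pH sides in parallel is an interpretation made here, not a statement from [76]; the variable names are hypothetical.

```python
# Per-side inductance contributions quoted in the text, in pH.
BOND_WIRE_PH = 72.0
CAPACITOR_LEAD_PH = 15.0
PLANES_PH = 27.0

per_side_ph = BOND_WIRE_PH + CAPACITOR_LEAD_PH + PLANES_PH
bond_share = BOND_WIRE_PH / per_side_ph

# Assumption: the package feeds the die from two identical sides in parallel.
two_sides_parallel_ph = per_side_ph / 2.0

print(f"per side = {per_side_ph:.0f} pH, bond-wire share = {bond_share:.0%}")
print(f"two sides in parallel = {two_sides_parallel_ph:.0f} pH")
```

With these inputs the bond-wire share comes out at about 63%, close to the roughly 65% quoted above.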
The power routing is assumed to be in Manhattan structures in
M3 and M4. The initial estimates are based on average currents
in a uniformly distributed load on the chip. These values will then
be tested and refined by using the distributed power supply mod-
el.
In the following, we will estimate the IR drop in the wire-bond-
ing technology in which the pads are located on the boundary of

Figure 7-8. C4 power routing configuration [76].

the chip to feed the middle of the chip. Both M4 and M5 are
assumed to be 1.8 µm thick and about 21 mΩ/square. This is an
optimistic assumption, since only the last layer can be made
significantly thicker than the other layers. M3 is assumed to be 0.8 µm
thick and about 47 mΩ/square.
The average current drawn by the chip is about 12.5 A and the
supply voltage is about 2.5 V. We assume in all cases 30% metal
utilization for Vcc and 30% for Vss. M3 is used for equalization of about 5%/5%
Vcc/Vss. With 30% utilization, the effective resistance of M4 for Vss or
Vcc is increased to 70 mΩ/square [76].
The average voltage drop can now be calculated by considering
uniform current injection from one side. Each of the two sides
supplies half of the total current, the resistance from the edge to
the middle is only 0.5 square, and the current falls off linearly
from the edge of the chip to the middle, which contributes a
further factor of 1/2:

$$ V_{drop} = \frac{I_{tot}}{2} \cdot \frac{R_s}{2} \cdot \frac{1}{2} \tag{7-6} $$

where Itot is the total current consumed by the chip and Rs is
the metal sheet resistance. For the case of the interdigitated
power supply in M4 with Vcc/Vss metal widths of 30 µm/30 µm, as
shown in Figure 7-6(b), the power is supplied only from two sides
and only M4 is used to carry the average current.
Based on Equation (7-6), for this case, the voltage drop is
calculated as Vcc_drop = 6.25 A · (52.5 mΩ/2) · 0.5 ≈ 82 mV. The average
drop in the Vss is dependent on the number of substrate taps. The
number of taps is determined by peak noise considerations, so the
average voltage drop will be small.
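
The edge-fed estimate of Equation (7-6) can be reproduced numerically. The sketch below derives the effective sheet resistance as the bare sheet resistance divided by the metal utilization; reading the 52.5 mΩ/square used above as 21 mΩ/square at 40% utilization is an inference from the numbers in the text, and the function names are hypothetical.

```python
def effective_sheet_resistance(rs_ohm_sq, utilization):
    """Sheet resistance seen by one supply when it uses only a fraction of the layer."""
    return rs_ohm_sq / utilization


def edge_fed_drop(i_total_a, rs_eff_ohm_sq):
    """Average IR drop to the die center for a two-sided feed, Equation (7-6)."""
    return (i_total_a / 2.0) * (rs_eff_ohm_sq / 2.0) * 0.5


RS_M4 = 21e-3          # ohm/square, from the text
I_TOTAL = 12.5         # A, from the text

rs_30 = effective_sheet_resistance(RS_M4, 0.30)   # about 70 mOhm/square
rs_40 = effective_sheet_resistance(RS_M4, 0.40)   # 52.5 mOhm/square (inferred case)
v_drop = edge_fed_drop(I_TOTAL, rs_40)
print(f"Rs_eff(30%) = {rs_30 * 1e3:.1f} mOhm/sq, Rs_eff(40%) = {rs_40 * 1e3:.1f} mOhm/sq")
print(f"Vcc drop = {v_drop * 1e3:.0f} mV")
```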
On average, we could get the Vcc ~ Vss = 130 mV, so we could
achieve a good routability with only 40% Vcc in M4 and a tolerable
average voltage drop. But this power routing configuration with
the wire-bonding package has high inductance and, therefore, it
has a high switching noise drop across the package and chip.
We consider the second option of the metal routing for Vcc and
Vss using the wire-bonding package. The M4 Vcc and Vss are 30/30
µm in width and the M5 Vcc and Vss are also 30/30 µm in width.
We assume that the M4 and M5 have the same metal thickness.
We also assume that the power supply is from all four sides of the
chip, so the inductance will be reduced.
We can roughly estimate that the voltage drop from the chip
side to the middle is reduced to 220 mV/2 = 110 mV, and the
routability is also improved significantly with the fifth metal lay-
er (M5) added for the power routing [76].
The C4 power distribution is quite different. The resistance in
the package plane is only 2.36 mΩ/square, so the voltage drop in
the package from the edge of the die to the center, assuming a
uniformly distributed current injection to the chip, is about (12.5
A/4) · Rs/4 ≅ 2 mV [76].
One suggestion is to place the power routing on the package
instead of on the chip. The maximum number of solder bumps on
the 17 × 17 mm² chip with a minimum pitch of 250 µm is 17²/0.25²
≅ 4600.
Since the landing pad of the solder ball is 70 × 70 µm, the total
area used, if we use a maximum number of solder balls, is 4600 ·
0.07² ≅ 23 mm², which is about 8% (23/(17 · 17)) of the chip area. By
using about half of the solder bumps for power and ground, we
need little local routing in M4/M3 from the solder bumps. In
addition to reducing the inductance, the C4 technology also reduces
the on-chip power routing significantly.
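
The bump-count and pad-area estimates can be checked with a short sketch. It assumes a full square grid of bumps over the die with no keep-out regions; the function name is hypothetical. The exact grid gives 4624 bumps, which the text rounds to 4600.

```python
def bump_budget(die_um, pitch_um, pad_um):
    """Maximum bump count and landing-pad area fraction for a square die."""
    per_side = die_um // pitch_um
    count = per_side * per_side
    pad_area_mm2 = count * (pad_um / 1000.0) ** 2
    die_area_mm2 = (die_um / 1000.0) ** 2
    return count, pad_area_mm2, pad_area_mm2 / die_area_mm2


# Values from the text: 17 mm die, 250 um pitch, 70 um landing pads.
count, pad_area, fraction = bump_budget(17000, 250, 70)
print(f"bumps = {count}, pad area = {pad_area:.1f} mm^2, share = {fraction:.1%}")
```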

The following option uses C4 with
M4 and M3 for local power distribution. Since the area between
the bumps in the horizontal direction is not available for signal
routing, we might as well use the minimum pitch solder balls in
M4 with alternating Vcc and Vss in order to minimize the
inductance.
If we assume 15 rows of M4 solder bumps for the whole chip,
and 30 μm/30 μm for Vcc/Vss in each row, the resistance is 2/mΩ ·
17,000/15 · 30 = 800 mΩ/square. By reducing the horizontal distance
by N, we reduce the current injected in each section by N
and the resistance to the center of each section by N.
We can neglect the voltage drop in the package, so the voltage
drop will thereby be reduced by N². With a solder bump every
250 μm, we have a Vcc every 500 μm => N = 17,000/500 = 34. Figure
7-8 shows the C4 package power routing configuration.
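The N² argument can be checked numerically. The sketch below assumes the simple model stated in the text: both the injected current and the resistance to the section center scale as 1/N, and the package drop is neglected. The names are illustrative.

```python
# Scaling of the on-chip IR drop when the die is split into N sections
# fed by C4 bumps: drop = (I / N) * (R / N) = I * R / N**2.
# Names are illustrative; the package drop is neglected, as in the text.

CHIP_SIDE_UM = 17_000          # die edge length (from the text)
VCC_BUMP_SPACING_UM = 500      # one Vcc bump every 500 um (from the text)


def relative_drop(n_sections: int) -> float:
    """IR drop relative to the unsectioned (N = 1) case."""
    return 1.0 / n_sections ** 2


n = CHIP_SIDE_UM // VCC_BUMP_SPACING_UM
reduction_factor = n ** 2

print(f"N = {n}, IR drop reduced by a factor of {reduction_factor}")
print(f"relative drop: {relative_drop(n):.2e}")
```

With N = 34 the on-chip drop shrinks by roughly three orders of magnitude under these assumptions, which is why little local M4/M3 routing is needed.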
A full-chip and package model of the power distribution net-
work is built, as shown in Figure 7-9 [76]. This model is used to
simulate the effect of different metal utilizations and packages
more accurately. The 25 or 5 × 5 elements in the center model the
chip core. Separate current waveforms can be injected into each of
these elements in order to model real chip blocks with different
activities.
Around the core, there are five package elements at each side to
model the wire bond or C4 package. Part of the C4 package model
is also included in each core element, since the solder ball bumps
can be placed anywhere on the die. The pins of the package are
assumed to have ideal Vcc/Vss potentials.
The core element consists of three main elements as follows:

1. The current source for Vcc and Vss. The current waveform
can be injected between the local Vcc and Vss power supplies.
2. The decoupling capacitances, with the modeling of parasitic
capacitance and an explicit decoupling capacitance.
3. Power network metal RL modeling.

The RL branches in the simulation model show the on-chip Vcc,
Vss, and substrate per unit resistance and inductance, as well as
the C4 Vcc and Vss package planes. Figure 7-10 shows the power
I/O package model. It uses separate inductors and resistors for
C4 and bond wire package models [76]. It includes both C4 and
wire-bonding models in one simulation model. In order to switch
from C4 to wire bonding, we can change all the C4 resistance
values to 10 kΩ so that there will be negligible current in the C4
network.

Figure 7-9. Full-chip and package model of power distribution network [76].

Both experimental results and RC extraction of all the different
parasitic components of the on-chip capacitance suggest that the
total effective on-chip decoupling capacitance on the previous mi-
croprocessor using the old process is about 40 nF. In the new
process, which has the 0.8 scaling factor from the previous micro-
processor, the main assumptions for the capacitance in the new
microprocessor in this experiment are as follows [76]:

1. The n-well and diffusion capacitance increases by 1.3 times due
to the smaller reverse bias, which is due to the higher doping of
~1.25 times in the new process.
2. The gate oxide capacitance increases by 2.2 times due to the
gate oxide thickness.
3. The metal capacitance increases by 1.4 times due to the extra
metal level and smaller pitch.

Figure 7-10. Package I/O model [76].

The total parasitic capacitance is estimated to be about 75 nF
with an uncertainty of about ±15 nF. The RC components of the
decoupling capacitances are also included in the simulation model.
It turns out that the n-well decoupling capacitance is not very
effective in absorbing short spikes, due to the high lateral resis-
tance in the well.
Figure 7-11 shows the average noise with and without the 100 nF
on-chip decoupling capacitors. Obviously, the noise performance is
better with the additional decoupling capacitors. However, the
noise without the additional decoupling capacitors is not that se-
vere, so the intrinsic parasitic capacitance does help significantly.
In order to find the exact requirement of the on-chip decoupling
capacitance, a better knowledge of the clocking strategy, the pow-
er saving requirements, and bus protocol is necessary. In the pow-
er network noise simulation, the switching currents are modeled
on the power grid model. The modeling of the switching currents
is the key to the power noise results.
Figure 7-12 shows the current waveform used in Figure 7-11’s
result. It uses a waveform peaking at the beginning of the cycle
with the rising edge of the inserted clock and falling off to 50%
and 20% of the peak at the middle and the end of the cycle, re-
spectively.
The average current is 500 mA and the peak is about 800 mA,
and with 25 cells in the full-chip modeling this results in current
consumption on the chip of about 25 × 500 mA = 12.5 A [76]. The
I/O cells may use different current waveforms from the core cells.

Figure 7-11. Simulation waveforms for Vcc noise [76].

Figure 7-12. Switching current waveforms injected into each cell [76].

By using the current waveforms shown in Figure 7-12, we can
test different package scenarios using different test cases as fol-
lows [76]:

1. Normal mode: all 25 cells in the full-chip model use the cur-
rent waveform shown in Figure 7-12, with an average cur-
rent of 0.5 A each and total current 12.5 A.
2. Power saving mode: 2 × 3 or a total of six units in the lower-
right corner are turned off in one cycle to simulate the effect
of the power saving in large units, about 24%.
3. Peak power mode: the current in one unit in the center of
the chip is five times larger for five cycles in order to simu-
late the effect of local peak and average activity.
4. I/O noise mode: we assume that in the worst case, 150 I/Os
switch with 75 I/Os at two sides. In the model with five ele-
ments per side, the current in each side element is ramped
to (75/5) · 70 mA ≈ 1 A in 1 ns, and back to 600 mA after 2
ns, and kept high for 8 ns by assuming the bus speed is half
the clock frequency.
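The currents implied by the four test cases can be tabulated with a short script. This is a sketch of the arithmetic only. It assumes that "five times larger" in the peak power mode scales the 0.5 A average cell current, which the text does not state explicitly, and all names are illustrative.

```python
# Average currents implied by the four test modes.
# Assumption (not stated in the source): the peak power mode scales the
# average cell current of 0.5 A by five. Names are illustrative.

N_CELLS = 25
I_AVG_CELL_A = 0.5

# 1. Normal mode: every cell draws the average current.
normal_total_a = N_CELLS * I_AVG_CELL_A

# 2. Power saving mode: a 2 x 3 block of cells is switched off.
cells_off = 2 * 3
power_saving_fraction = cells_off / N_CELLS
power_saving_total_a = (N_CELLS - cells_off) * I_AVG_CELL_A

# 3. Peak power mode: one center cell draws five times its average.
peak_cell_a = 5 * I_AVG_CELL_A
peak_total_a = normal_total_a + (peak_cell_a - I_AVG_CELL_A)

# 4. I/O noise mode: 75 I/Os per side spread over 5 side elements, 70 mA each.
io_side_element_a = (75 / 5) * 0.070

print(f"normal total:        {normal_total_a:.1f} A")
print(f"power saving:        {power_saving_total_a:.1f} A "
      f"({power_saving_fraction:.0%} switched off)")
print(f"peak power total:    {peak_total_a:.1f} A")
print(f"I/O element current: {io_side_element_a:.2f} A")
```

The I/O element current works out to 1.05 A, which the text rounds to 1 A.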

Table 7-4. Metal utilization and parasitic values [76]

                               Case I:         Case II:           Case III:       Case IV:
                               M4              M4/Substrate       M5/M4           C4/M4
                               interdigitated  noninterdigitated  interdigitated  interdigitated
M5 utilization                 -               -                  30%/30%         -
M4 utilization                 30%/30%         40%/40%            30%/30%         5%/5%
M3 utilization                 5%/5%           5%/5%              5%/5%           5%/5%
Number of bond wires           300             300                300             -
Number of bumps                -               -                  -               1100
Rtotal (chip and package, mΩ)  20              13                 13              2.4
Lpackage (pH)                  57              57                 50              10.5
Lchip (pH)                     5               150                2.5             4
Ltotal (pH)                    62              200                53              15

Table 7-4 lists the package and power routing configurations
that are simulated under the above test conditions. The
metal utilization and approximate resistance and inductance values
are summarized in the table. The inductance and metal utilization
of the C4 technology are much lower than in the cases using the
wire-bonding technology.
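As a cross-check on Table 7-4, the sketch below sums the package and chip inductances for each case and compares the sum with the tabulated total. The values are copied from the table; the dictionary layout and names are illustrative.

```python
# Cross-check of the inductance columns in Table 7-4 (values in pH).
# The tabulated totals are rounded, so small differences are expected.
# The data layout and names are illustrative, not from the source.

cases = {
    "I (M4)":             {"l_pkg": 57.0, "l_chip": 5.0,   "l_total": 62.0},
    "II (M4/Substrate)":  {"l_pkg": 57.0, "l_chip": 150.0, "l_total": 200.0},
    "III (M5/M4)":        {"l_pkg": 50.0, "l_chip": 2.5,   "l_total": 53.0},
    "IV (C4/M4)":         {"l_pkg": 10.5, "l_chip": 4.0,   "l_total": 15.0},
}

summed = {name: c["l_pkg"] + c["l_chip"] for name, c in cases.items()}

for name, c in cases.items():
    print(f"Case {name}: summed {summed[name]:.1f} pH, "
          f"tabulated {c['l_total']:.0f} pH")

# Total inductance of the best wire-bond case relative to the C4 case.
ratio_iii_to_iv = cases["III (M5/M4)"]["l_total"] / cases["IV (C4/M4)"]["l_total"]
print(f"Case III / Case IV total inductance: {ratio_iii_to_iv:.1f}x")
```

Even the best wire-bond configuration carries about 3.5 times the total inductance of the C4 case.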
Table 7-5 shows the result for the Vcc–Vss noise comparisons. It
is interesting to note that the power saving and peak power conditions
cause larger power peaks than the I/O noise. In the past,
I/Os have been known to cause most of the noise.

Table 7-5. Vcc–Vss performance comparisons [76]

                  Case I:         Case II:           Case III:       Case IV:
                  M4              M4/Substrate       M5/M4           C4/M4
                  interdigitated  noninterdigitated  interdigitated  interdigitated
Normal test:
  Average         2.25 V          2.34 V             2.34 V          2.48 V
  Minimum         2.24 V          2.32 V             2.33 V          2.46 V
Power saving:
  Minimum         2.23 V          2.29 V             2.30 V          2.41 V
Peak power:
  Average         1.93 V          2.06 V             2.21 V          2.39 V
  Minimum         1.84 V          1.83 V             2.08 V          2.29 V
  P-to-P noise    0.36 V          0.46 V             0.18 V          0.18 V

The difference now is in the low-voltage swing. The C4 package
is clearly the best choice. The C4 undershoots recover within a cy-
cle, so the speed paths, usually in one cycle, are not much affected.
Figure 7-13 shows the Vcc and Vss simulated waveforms in the
peak power test condition. Case IV in the waveforms is marked as
the C4/M4 interdigitated design with the lowest power noise, as
in Table 7-5. The peak power test case is described as follows.
When the current in the center of the chip suddenly increases by
five times, the Vcc–Vss is affected significantly in the bond wire
cases. Case I drops down and recovers in a couple of clock cycles
but the undershoot is not much compared with the final average
value. Case II has a significant undershoot.
The high inductance of the substrate tap case causes the center
element not to have the benefit of the whole chip’s decoupling ca-
pacitance. The delay to the edge of the chip also results in a slow
start of the currents in the bond wires, so the drop needs about
three cycles to settle. Case III is better than Case I, and Case IV
with C4 package technology is significantly better. In this case,
the settling time is within a clock cycle, and although the mini-
mum Vcc–Vss is somewhat affected in the following cycle, the aver-
age is about the same.
There is a local decoupling capacitance with time constant Rdec
· Cdec. With a typical Lchip of 50–200 pH, including the bond wire
and a maximum switching current of 3 A, we can get R = 2.5 V/3 A
= 0.83 Ω and L/R about 60–240 ps.
With a typical decoupling capacitance of 100 nF and the para-
sitic added, we can have RC delay = 83 ns. This will clearly domi-
nate over the L/R decoupling. Since the RC time constant is domi-
nant over the L/R time constant, the impedance from the package
looks like that of a capacitor. For the typical numbers we have a
time constant of 14–28 ns in the bond wire case and less than 10
ns in the C4 case.
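The time constants quoted above can be reproduced as follows. The L/R and RC values follow directly from the text. The source does not give a formula for the 14–28 ns and sub-10 ns figures; the sketch assumes they correspond to the LC period 2π√(LC), using the total inductances from Table 7-4 and 100 nF, because that assumption happens to reproduce them. All names are illustrative.

```python
import math

# Time constants for the local decoupling discussion.
# Assumption (not stated in the source): the 14-28 ns and < 10 ns figures
# are LC periods, 2*pi*sqrt(L*C), with L taken from Table 7-4.

V_SUPPLY_V = 2.5        # supply voltage (from the text)
I_SWITCH_MAX_A = 3.0    # maximum switching current (from the text)
C_DEC_F = 100e-9        # on-chip decoupling capacitance (from the text)

r_eff_ohm = V_SUPPLY_V / I_SWITCH_MAX_A   # effective load resistance
rc_s = r_eff_ohm * C_DEC_F                # RC time constant


def l_over_r(l_henry: float) -> float:
    """L/R time constant in seconds for the effective load resistance."""
    return l_henry / r_eff_ohm


def lc_period(l_henry: float, c_farad: float = C_DEC_F) -> float:
    """Period of the LC resonance in seconds."""
    return 2.0 * math.pi * math.sqrt(l_henry * c_farad)


print(f"R = {r_eff_ohm:.2f} ohm, RC = {rc_s * 1e9:.0f} ns")
print(f"L/R = {l_over_r(50e-12) * 1e12:.0f} to {l_over_r(200e-12) * 1e12:.0f} ps")
print(f"LC period, bond wire: {lc_period(53e-12) * 1e9:.1f} to "
      f"{lc_period(200e-12) * 1e9:.1f} ns")
print(f"LC period, C4: {lc_period(15e-12) * 1e9:.1f} ns")
```

The RC constant is three orders of magnitude larger than L/R, which is the basis for the statement that the package impedance looks capacitive.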
As a rule of thumb, one could design the power supply so that
Cdec takes care of the local drop until the LC can respond. We can
increase Cdec until this is satisfied. The time constant increases
only by the square root of C, whereas the time that the switched
power can be sustained by Cdec increases linearly with C. With,
say, a two times increase in Cdec, the improvement in the voltage
drop would be 1.41 times.
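The 1.41 figure follows from the two scalings named in the rule of thumb. The sketch below assumes exactly those scalings (response time proportional to √C, sustain time proportional to C); the function name is illustrative.

```python
import math

# Net benefit of scaling Cdec by a factor k, assuming:
#   - the LC response time grows as sqrt(k)
#   - the time Cdec can sustain the switched power grows as k
# The improvement is the ratio of the two. The name is illustrative.


def drop_improvement(c_scale: float) -> float:
    """Improvement factor in voltage drop for a Cdec scaled by c_scale."""
    sustain_time_scale = c_scale
    response_time_scale = math.sqrt(c_scale)
    return sustain_time_scale / response_time_scale


for k in (2.0, 4.0):
    print(f"Cdec x{k:.0f}: improvement x{drop_improvement(k):.2f}")
```

Because the improvement grows only as the square root of the added capacitance, quadrupling Cdec is needed to halve the drop under these assumptions.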

Figure 7-13. Switching power noises in (a) Vcc and (b) Vss for the peak power
test [76].

We have an on-chip decoupling capacitance of 100 nF to take
care of local noise. We could deduce the inductance and capacitance
requirements of the socket and board. Given the inductance
of the pins, say five times that of the package to chip, we need a
package capacitance of only five times the chip capacitance. The
noise in the package will be five times slower than that on the
chip, but due to the five times capacitance it will maintain the
same voltage drop as the on-chip capacitance.
Similarly, if the board inductance is 1 nH, which is ten times
that of the socket inductance, we need a ten times capacitance in-
crease from the package to the board in order to maintain the
voltage at the board for 10 clock cycles. It is therefore clear that in
order to find the requirements of on-chip decoupling capacitance,
we need a model of the package inductance.
Similarly, in order to find the requirements of the package de-
coupling capacitance, we need an accurate inductance model of
the pin, socket, and decoupling capacitors near the socket on the
board.
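The stage-by-stage argument (chip, package, board) can be illustrated with normalized values. The sketch assumes that the response time of each stage scales as √(LC), so scaling both L and C by the same factor k slows the stage by k. The normalization and names are illustrative assumptions.

```python
import math

# Normalized illustration of the decoupling hierarchy.
# Assumption: each stage responds on a time scale proportional to sqrt(L*C).
# Chip values are normalized to 1; only the ratios from the text are used.


def response_time(l_rel: float, c_rel: float) -> float:
    """Relative response time of a stage with relative inductance and capacitance."""
    return math.sqrt(l_rel * c_rel)


chip = (1.0, 1.0)                               # (L, C), normalized
package = (5.0 * chip[0], 5.0 * chip[1])        # 5x inductance, 5x capacitance
board = (10.0 * package[0], 10.0 * package[1])  # 10x again for the board

t_chip = response_time(*chip)
t_package = response_time(*package)
t_board = response_time(*board)

print(f"package is {t_package / t_chip:.0f}x slower than the chip")
print(f"board is {t_board / t_package:.0f}x slower than the package")
```

Each stage therefore needs enough capacitance to hold the voltage for the longer time the next stage takes to respond.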
The performance results of the four power routing options are
summarized in Table 7-5 [76]. The wire bond solution with M4
only (Case I) has too much resistance combined with the bad ef-
fects of the bond wire inductance.
The minimum voltage in the center for the 2.5 V supply goes
down to 1.85 V under peak power stress. The second option, with
the Vss power through the substrate, has better average voltage
and lower resistance. However, it suffers from unacceptably high
on-chip inductance values, even with an optimistic low-resistance
substrate of 9 mΩ-cm [76].
The estimate is that the total effective inductance between the
capacitors in the package and the die is over three times higher
than in Case I. This causes the voltage to fall to 1.87 V in the
peak power simulation. The M4/M5 solution shows a good average voltage of 2.34
V and the worst case of 2.1 V under the peak power load. This
would cause a speed degradation of approximately 6% compared
to Case IV. The routability of this solution is low on both M4 and
M5.
The C4-based solution is clearly superior from the point of view
of the power network, performance, and routability. The average
voltage is degraded only 20 mV from the external value. The
worst-case average for a cycle is 2.4 V under the peak power
stress.
This gives the best performance of the options considered. The
M4 utilization for power is extremely low (only 10%), with the
ability to put gaps in the M4 power buses as required for routing.
In addition, C4 provides reduction of routing, especially for the
I/O areas.

7.4 POWER SUPPLY MEASUREMENT AND VALIDATION

This section analyzes the effectiveness of the on-board decoupling
capacitance for microprocessor chips [77]. The model used in
the simulation is a PGA-type microprocessor model for the package
and the chip parasitics [77].
The main emphasis is to determine the effect that varying the
number of on-board decoupling capacitors has on the noise seen at
the pins and on the die. SPICE circuit simulations, using the
frequency domain analysis, were used to assist in the evaluation.
When the number of on-board 1 μF type 1206 decoupling capacitors
increases from zero to 35, the resonance frequency increases.
Table 7-6 summarizes the optimal measurement bandwidth for
each level of board decoupling [77].
The main test points used in these simulations were 0, 1, 5, 10,
20, and 35 decoupling capacitors of the 1 μF ceramic type. The
three areas of interest that were examined for all test cases were
noise levels at the pins and die and the ratio of correlation be-
tween these two parameters. Figure 7-14 shows the noise at the
pins in the frequency domain [77].
As the capacitance at the board increases, the resonance fre-
quency also increases. The magnitude of the resonance decreases
as the decoupling capacitors are added. Figure 7-15 shows the ratio
between the noise at the die and the noise at the pins for various
decoupling capacitors at the board.

Table 7-6. Measurement results for Vcc–Vss noises [77]

Number of decoupling  Optimal measured  Worst-case noise at the  Worst-case noise at the
capacitors on board   bandwidth (MHz)   die at resonance (V)     pins at resonance (V)
0                     45                1.114                    1.048
1                     45                0.416                    0.364
5                     45                0.261                    0.180
10                    40                0.213                    0.116
20                    40                0.175                    0.066
35                    40                0.154                    0.040

Figure 7-14. Measured noise at the pins of the chip package [77].

Figure 7-15. Ratio of pins per die versus frequency [77].

Figure 7-16 shows the on-board decoupling capacitor model for
a ceramic 1 μF capacitor with 15 mΩ parasitic resistance and 2.1
nH parasitic inductance [77].
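The capacitor model above is a series RLC branch, and its impedance can be sketched directly. The example assumes that N identical capacitors in parallel simply divide the single-capacitor impedance by N, ignoring board trace parasitics, which is a simplification and not the SPICE model of [77]. The names are illustrative.

```python
import math

# Series RLC model of one on-board decoupling capacitor.
# Assumption: N identical capacitors in parallel give Z / N; board trace
# resistance and inductance are ignored. Names are illustrative.

C_F = 1e-6          # capacitance (from the text)
ESR_OHM = 15e-3     # parasitic resistance (from the text)
ESL_H = 2.1e-9      # parasitic inductance (from the text)


def cap_impedance(freq_hz: float, n_parallel: int = 1) -> complex:
    """Complex impedance of n_parallel identical capacitors at freq_hz."""
    w = 2.0 * math.pi * freq_hz
    z_one = complex(ESR_OHM, w * ESL_H - 1.0 / (w * C_F))
    return z_one / n_parallel


def self_resonant_freq() -> float:
    """Frequency at which the capacitive and inductive reactances cancel."""
    return 1.0 / (2.0 * math.pi * math.sqrt(ESL_H * C_F))


f_res = self_resonant_freq()
print(f"self-resonant frequency: {f_res / 1e6:.2f} MHz")
for n in (1, 5, 35):
    z = abs(cap_impedance(45e6, n))
    print(f"|Z| at 45 MHz with {n:2d} capacitors: {z * 1e3:.1f} mohm")
```

Under this model a single capacitor is already inductive well below the 40 to 45 MHz measurement bandwidth, so adding capacitors mainly lowers the parallel inductance seen from the pins.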
Figure 7-15 shows that for a majority of the test range frequen-
cies, noise at the die becomes much greater relative to the noise at
the pins as the number of capacitors on the board increases. As the
frequency increases to greater than 60 MHz, the noise at the pins
quickly becomes much greater than the noise at the die, due to
the resonance effects. A measurement bandwidth is selected to
achieve a more consistent relationship between the pins and die
noise over this frequency range.
To understand the effects of varying the number of decoupling
capacitors at the board, a model is developed for the package and
die parasitics to be used in the SPICE simulation [77]. Using previously
taken test data, it is possible to plot how the simulation
model compares to the actual device.
Figure 7-17 shows the discrepancies between the simulation
data and empirical data over a range of 20 MHz to 100 MHz for
the zero capacitance case. The discrepancy seen between the em-
pirical data and the model is likely due to residual impedance
found on the board [77].
A Pentium-II chip scheme dedicated roughly 75% of the M4 lay-
er and 12% of the M3 layer to Vcc and Vss routing [78]. The Vss re-
sistance is significantly lower than could have been achieved by
using all of M3 and M4 for power routing.
Since M4 is the only thick (low-resistance) metal layer, the
main supply current was constrained to the lateral M4 routing dimension.
Hence, the bulk of the Vcc and Vss pins are located to
the left and right of the die, where Vcc bond wires tie the package
Vcc planes to the left and right edges of the die, and where a
regular array of parallel M4 Vcc lines terminate, as shown in
Figure 7-18.

Figure 7-16. Modeling of an on-board decoupling capacitor [77].

Figure 7-17. Measurement versus simulation data comparison [77].

The objectives of the measurement are as follows:

1. Feed back measured information into the power grid simulation.
2. Add to the general understanding of the microprocessor
power delivery for the preparation of the new process.
3. Determine a solid minimum operating voltage for use in setting
the next-step performance goal.
4. Assess the impact of adhesive die attachment on the Vss voltage
[78].

Two types of measurements were found to be most useful, as follows:

1. DC mapping. A plain wire probe with no transistors or resistors
and a voltmeter are used to create a voltage versus position
graph of the Vcc and Vss planes. This is done for the full
die for a single part, and for a single slice through the die
center for many others. The mapping is done with the part
running a high-power pattern.
2. AC snapshots. AC waveforms are taken of the Vcc, Vss, and
Vcc–Vss for 33 locations around the die and cavity. Some of
these were taken using both transistor (FET) picoprobes and
passive differential probes, and comparable results were obtained.
The differential probes for this type of study are used to
automatically subtract the Vcc and Vss with a low-noise result.
Since there is no need for time-consuming averaging, storing,
and subtracting of waveforms, the measurement is an
order of magnitude faster than with the FET probes, which
allows the engineer more time to search the patterns for the
worst-case voltage spikes. Maximum di/dt patterns were
used for these snapshots.

Figure 7-18. M4 power routing patterns [78]. (Labels in the figure:
M4 Vcc, 20.6 μm; M4 Vss, 10.3 μm; GCLK; 91.2 μm.)

The microprocessor performance validation methodology allowed
for a 385 mV drop sustained over multiple cycles, whereas
the worst measured drop was only 200 mV. This drop was only
sustained over the cycle immediately after a restart of the clocks,
and in all other instances lasted less than a phase. The DC voltage
drop at the worst-case point on the die is at most 100 mV [78].
The 100 mV DC drop at 133 MHz and 2.755 V is accumulated as
follows:

• Vcc package drop to the die edge = 32 mV
• Vcc bond wire and bond finger drop = 10 mV
• Vcc die drop = 40 mV
• Vss package drop to the die edge = 9 mV
• Vss die drop = 9 mV

In spite of the fact that the Vcc metal grid is about six times as
wide as the Vss grid, 80% of the IR drop is in the Vcc supply. Note
that in both the package and on the die, the Vcc drop is about four
times that of the Vss.
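The budget above can be summed and split by supply with a few lines of Python. The values are copied from the list; the dictionary keys are illustrative names.

```python
# DC drop budget at the worst-case point on the die (values in mV).
# Keys are illustrative names, not from the source.

drops_mv = {
    "vcc_package": 32,
    "vcc_bond_wire_and_finger": 10,
    "vcc_die": 40,
    "vss_package": 9,
    "vss_die": 9,
}

total_mv = sum(drops_mv.values())
vcc_mv = sum(v for k, v in drops_mv.items() if k.startswith("vcc"))
vss_mv = sum(v for k, v in drops_mv.items() if k.startswith("vss"))

vcc_share = vcc_mv / total_mv
package_ratio = drops_mv["vcc_package"] / drops_mv["vss_package"]
die_ratio = drops_mv["vcc_die"] / drops_mv["vss_die"]

print(f"total drop: {total_mv} mV (Vcc {vcc_mv} mV, Vss {vss_mv} mV)")
print(f"Vcc share of the IR drop: {vcc_share:.0%}")
print(f"Vcc/Vss ratio in the package: {package_ratio:.1f}, on the die: {die_ratio:.1f}")
```

The Vcc share comes out at 82%, and the Vcc-to-Vss ratios are roughly 3.6 in the package and 4.4 on the die, consistent with the "about 80%" and "about four times" statements.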
This is due to the fact that the Vcc has to go through an extra
set of vias and through bond wires, and then it must laterally tra-
verse the metal grid. The Vss current travels to the interior of the
die on a dedicated metal plane and then has a very short path ver-
tically up through the die.
Had the Vss current been distributed in the same manner as the
Vcc current, it would have increased the total supply drop by at
least 60% and it would have required essentially all of the M4 and
M3 planes to be used for the power and clock routing [78]. Thirty-
three points around the die and bond cavity were probed using a
variety of high-speed probes. All of the locations were probed us-
ing the stop-clock pattern, which halts the high power loop and
then allows it to resume [78].
Several of the more interesting locations were probed while
running patterns tailored to exercise the local power grid, but
none of these produced voltage spikes as bad as the stop-clock pat-
tern.
The AC measurements for the patterns and positions include
the following:

1. Halt instruction in the high power loop. Probed at the DC
hot spot at the bottom center of the die. A 200 mV AC
transient is observed, settling to 100 mV after 20 ns in the
stop-clock patterns.
2. I/O simultaneous switching pattern. Probed at the middle of
the left and right die edges and at the middle of the die. An 80
mV transient voltage drop is observed from the Vss grid to
the die attachment plane.
3. Simultaneous switching pattern through a large repeater
block. Probed at the repeater block. A 2 ns long 180 mV peak
glitch is seen in Vcc–Vss during switching.
4. Back-side-bus-induced noise from simultaneously switching
28 address lines. A 200 mV peak glitch is observed for less
than 2 ns.

The worst transient observed in all the probing
was observed at the DC hot spot, at the bottom middle of the
die, while running the stop-clock pattern. Figure 7-19 shows
the 200 mV transient noise in the stop-clock patterns.

Microprobing is very mature, even old fashioned, but we can use
it to produce a complete power map, which is extremely valuable.
The microprocessor power delivery scheme was quickly proven a
success, and the risk taken in running the Vss current through the
substrate was shown to be working extremely well for EDA at-
tached parts. The valuable supply voltage information was fed
into the design for the B-step, which allows the designers to reset
the speed targets [78].

Figure 7-19. Vcc transient noise in measurement [78].

7.5 I/O PADS FOR POWER/GROUND SUPPLIES

A set of recommendations can be used for how to place the power
and ground pads for the standard I/O library [79]. The following
set of I/O power and ground pads is used to supply power and
ground to the I/O bus structure and internal core in a 0.18
μm process [79].

• pnl_vc (VDD): power pads for core logic and I/O interface
with the nominal voltage 1.8 V. These I/O cells provide power
to the standard cell core area and interface between the
core rings and I/O pads. These pads are paired with the
ground pads pnl_gcs.
• pnl_gcs (GND/SGND): ground pads for core standard cell logic,
the ground, and substrate ground connections within the
I/O set.
• pnl_go (VSSO): ground pads for output drivers. Included in
these pads are ESD protection circuits. These pads must be
included in the pad ring at regular intervals to provide good
power distribution and ESD protection for I/Os.
• pnl_vo (VDDO): power pads for the output drivers only. Included
in these pads are ESD protection circuits. These pads
must be included in the pad ring at regular intervals to provide
good power distribution and ESD protection for the I/Os.
These pins operate at 3.3 V nominal.
• pnl_vop: power pads for the output drivers, power for
predrivers and for input buffers. These pins are nominally at
3.3 V.
• pnl_vp (VDDP): supplies the voltage for the predrivers and
the first stage of the input receiver and is nominally 3.3 V.

Sometimes it is necessary to insert cells that disconnect power
connections between I/Os that operate at different potentials or for
noise isolation purposes. For example, interfacing a bank of SSTL
I/Os operating at 2.5 V with a standard set of I/Os operating at 3.3
V requires a break cell.
Another example would be isolating power supplies between
slower, standard TTL I/Os and high-speed LVDS I/Os. Figure 7-
20(a) shows an example of using a break cell to disconnect the
power and ground between 2.5 V and 3.3 V supplies [79]. Notice
that only the power and ground supplies that affect the I/Os are
cut. The core power and ground voltages (Vdd and Vss) remain intact.
In this diagram, the VDD2.5 supplies power to I/Os that have a
2.5 V reference voltage, and the VDD3.3 supplies power to the
I/Os that use a 3.3 V reference voltage. The break cell is inserted
between them to separate power and ground for the VDDO,
VDDP, and VSSO buses. In Figure 7-20(b), break cells isolate the
noise that can occur on power bus connections between high-speed
LVDS I/Os and slower TTL I/Os.

Figure 7-20. Break cells for power supplies [79].

7.6 SUMMARY

With the continually increasing clock frequency and performance
requirements for chips, the power distribution in both chip and
package has to be carefully designed and analyzed. This chapter
describes design examples to model the package and decoupling
capacitors in order to analyze the power distribution performance
at the system level.
In addition, a microprocessor test case is given in detail for
package options, such as the C4 package, to reduce the IR drop
and simultaneous switching noise in the design. Microprocessor
measurement and I/O design are also discussed in this chapter. A
novel package and chip codesign concept has been
discussed in the literature [80, 81, 82].
One idea is to use the mesh planes in the package and local
power distribution inside the chip; this is called chip and package
co-synthesis, and is based on the flip-chip or C4 package, as
shown in Figure 7-21 [82].
Flip-chip technology, combined with this codesign scheme, pro-
vides significant advantages in noise reduction. The amount of
di/dt noise is proportional to the effective inductance of the power
distribution network. The effective inductance is further reduced
due to multiple Vdd and ground connections from the chip to pack-
age.
Note that flip-chip mounting can provide many more I/O con-
nections than other attachment techniques. We can assign inde-
pendent Vdd/ground nets for each local region on the chip, such
that the supply voltage drop within a part of the region can be
small, depending on the partition scales.

Figure 7-21. Co-design of chip and package power distribution by flip-chip
package [82].

GLOSSARY

AC Analysis: transient analysis with detailed voltage distributions
over time using stimulus vectors at primary inputs.
Activity-Based Analysis: method used to compute the currents
drawn from power supply lines based on the switching activi-
ties of circuits.
Back-Annotation: the task of stitching the extracted RC data back
to the prelayout circuit netlist to perform the circuit simulation
with the interconnect effects.
Block-Level Power Distribution: also called the local power distri-
bution. Connects the power supply from the global power net-
work at the full-chip level and then distributes the power inside
a local block region of the chip.
Break Cells: standard cells containing only the power and ground
connections. Used to bridge the gaps in cell rows based on stan-
dard cell design style.
By-pass Capacitors: the decoupling capacitors added between Vdd
and Vss networks.
C4 Bump: refers to the solder bump in the special flip-chip pack-
age technology. These solder bumps are attached to the top
metal of the chip to make the connections to the package.
C4 Package: refers to the flip-chip package technology. Flip-chip
connection technology, the first-level chip-to-package connec-
tion option, is traditionally regarded as being the controlled col-
lapse chip connection (C4) process. This technology is achieved
by distributing the I/O solder bumps over the die, flipping the
chips over, aligning them with the contact pads on the substrate,
and connecting the solder bumps between the chip and
package to make the connections.
Capacitance: the charge storage capability between two conduc-
tors.
Capacitance Model: the mathematical equations used to estimate
the interconnect capacitances. They contain the variables of geo-
metrical parameters associated with neighboring metal lines.
Chip: the packaged integrated circuits that can be used as a basic
building block in complex electrical system designs.
Characterization: the process used to reveal the dependence of
electrical performance on design parameters.
Circuit Simulation: the method of using computer programs to
model transistors and interconnects, and solve the IV current
and voltage equations. The final results are presented as text
files or graphical waveforms.
Circuit Timing Analysis: the task of analyzing the hold time and
set-up time requirements in sequential logic and other timing
constraints across the chip.
Contact: the connection from metal in one layer to diffusion or
polysilicon layers.
Core: the chip without the I/O region.
Crossbar Leakage Current: the current from Vdd to ground due to
a possible short between pMOS and nMOS transistors in the
circuit. This is a kind of wasted current for circuit operations.
Decoupling Capacitors: refers to the capacitors added between
Vdd and Vss lines, used to protect the power supply voltage
from sudden switching currents. These decoupling capacitors
are required inside the chip, on the package, and in the sys-
tem board.
Deep-Submicron Process: refers to the VLSI process technology
with about 0.18 μm minimum feature size or less.
Delay: the time needed from the 50% Vdd of the input signal to the
50% Vdd of the output signal through the circuits.
Delta-I Current: same as di/dt noise; the change of switching cur-
rents in a short period.
Design Guidelines: the set of guidelines provided by senior design-
ers for the IC design team to follow, to meet the performance,
area, and power requirements of chip design.
Design Methodology: the set of design guidelines, CAD tools, and
design data flows used to design an IC chip from the conceptual
idea to the final working silicon.
Design Rules: the minimum space, minimum width, and minimum
coverage, etc. for each physical layer in the VLSI process.
Device-Level Extraction: the modeling of transistors and intercon-
nect RC networks from the physical layout.
Die: refers to the bare chip without the pads and package.
Die Size: the size of the die without the I/O region.
Die Micrograph: photograph of real silicon showing functional
blocks and their physical placement in the chip.
DRC: acronym for design rule checking, which verifies any viola-
tions in the physical layout against design rules.
Dynamic Analysis: the circuit performance analysis that specifies
the input signals for I/Os. Includes dynamic IR drop analysis
with the input test vectors specified at primary inputs.
ECO (Engineering Change Order): the specifications used to make
circuit or logic changes after the initial physical layout has
been done. ECO has to be managed carefully since it is time-
consuming to change the circuit and layout on a tight schedule.
Electromigration: the phenomenon that causes metal lines to be
worn out if the average current density carried by these metal
lines exceeds the required upper limit.
ESR: the equivalent series resistance associated with the decoupling
capacitance transistor.
Extracted Parasitic: the RC data used to model interconnects in
the VLSI physical layout.
Flip-Chip: also called C4 package technology. This technology is
achieved by distributing the I/O solder bumps over the die, flip-
ping the chips over, aligning them with the contact pads on the
substrate, and connecting the solder bumps between the chip
and package to make the connections.
Floor Planning: the arrangement of physical partitions and loca-
tions of major functional blocks and I/O pads in the chip. The
goal in floor planning is to reduce the total layout area and
meet the timing requirements of critical paths.
Full-Chip Power Distribution: the power (Vdd, Vss, etc.) networks
over the entire chip.
Gate Capacitance: the capacitance from the device gate to the sub-
strate.
Gate Delay: the delay through the logic gate.
GDSII: a binary file used to represent the physical layout. It is in
a standard format accepted by most physical design and verifi-
cation tools.
Global Power Network: the full-chip-level power distribution
network on the top metal layers.
Ground Bounce: the variations in the supply voltages, especially
in the ground plane, due to the switching of logic gates and out-
put drivers.
Hot Spot: refers to the location in the chip where the local current
densities are extremely high due to the high power consump-
tion in this area.
Impedance: the opposition of the metal traces to the alternating
currents carried over them, combining resistance and reactance.
It is usually characterized in the frequency domain.
Interconnect RC: refers to the resistance and capacitance associat-
ed with the metal lines on the routing layers of the chip.
IR Drop: refers to the voltage drop caused by the power supply
current flowing through the resistance of the power distribution
network. The resistive voltage drop is calculated as I (average
current) · R (metal resistance).
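To illustrate the I · R calculation, here is a minimal sketch for a single power strap fed from one end, with an average tap current drawn at each node. The segment resistances, tap currents, and the function name are assumptions chosen for the example, not values from the text.

```python
def ir_drop_along_strap(segment_res_ohm, tap_currents_a):
    """Cumulative IR drop (V) at each tap of a strap fed from one end.

    segment_res_ohm[k] is the resistance of the segment feeding tap k;
    tap_currents_a[k] is the average current drawn at tap k.
    """
    drops = []
    total_drop = 0.0
    downstream = sum(tap_currents_a)  # current carried by the first segment
    for r, i_tap in zip(segment_res_ohm, tap_currents_a):
        total_drop += r * downstream  # V = I * R for this segment
        drops.append(total_drop)
        downstream -= i_tap           # this tap's current leaves the strap here
    return drops


# Three 0.1-ohm segments, each tap drawing 10 mA on average (illustrative).
drops = ir_drop_along_strap([0.1, 0.1, 0.1], [0.01, 0.01, 0.01])
print([round(v * 1000, 3) for v in drops], "mV")
```

The drop grows toward the far end of the strap because every segment adds its own I · R term.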
I/O Library: refers to standard cells, particularly those used for
I/O functions.
L · di/dt Noise: refers to the voltage drop due to the power supply
current change (di) in the Δt rise or fall time through the
inductance (L) associated with the power distribution loops. The
inductive drop of the power distribution network is calculated by
the L · di/dt formula.
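A short worked instance of the L · di/dt formula, as a sketch only: the 1 nH loop inductance, 0.5 A current step, and 1 ns transition time are assumed numbers, not figures from the text.

```python
def inductive_drop(inductance_h, delta_i_a, delta_t_s):
    """Inductive voltage drop V = L * di/dt, in volts."""
    return inductance_h * delta_i_a / delta_t_s


# Assumed example: 1 nH loop inductance, 0.5 A current change in 1 ns.
v_drop = inductive_drop(1e-9, 0.5, 1e-9)
print(f"L*di/dt drop = {v_drop:.2f} V")
```

Halving the inductance or doubling the transition time halves the drop, since the formula is linear in L and inversely proportional to the transition time.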
Linear Network: refers to the electrical network consisting of the
linear elements only, such as the resistor, capacitor, inductor,
linear current source, and linear voltage source.
Loop Inductance: the inductance associated with a current loop.
Low-Pass Filter: refers to the RLC network, which filters out the
high-frequency harmonics of the input signal and keeps most of
the low-frequency harmonics in the output.
LVS: the acronym for layout versus schematic checking, which
verifies that the netlist extracted from the physical layout is
the same as the prelayout circuit netlist.
Mean Time to Fail (MTF): the average time that a specific product
can last, based on thousands of product samples.
Metal Capacitance: the total capacitance from a metal line to ad-
jacent metal lines and to the substrate.
Metal Ions: the atoms and electrons comprising the material of
metal layers.
Metal Structures: different combinations of metal lines in the
adjacent layout with various metal widths and line-to-line spaces.
Metal Utilization: the ratio of metals used to carry signals com-
pared to the total metals used, including power and ground lines.
Modeling of Power Network: the electrical model of the metal lines
and switching currents for the power distribution network.
Noise: refers to the unwanted voltages excited by on-chip activi-
ties in the signal lines or power lines.
Noisy Nodes: the nodes in the power distribution network with
transient voltages below or above the required voltage thresh-
olds.
On-Chip Inductance: the inductance associated with the on-chip
metal lines, especially that of power distribution network.
Pads: the I/O circuits or landing metals from the chip to the out-
side package.
Parasitics: the interconnect resistance and interconnect capaci-
tance extracted from the physical layout.
Parasitic Capacitance: the capacitance of metal lines in the chip.
Peak Current: the maximum current value over clock cycles.
Physical Design: the task of implementing the logic circuit into
the physical layout based on design rules.
Piecewise Linear: a numerical technique that approximates a
general curve by a linear function within each step.
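The Piecewise Linear entry can be sketched as a list of breakpoints joined by straight segments. The triangular current pulse, its units, and the function name `pwl` below are illustrative assumptions.

```python
import bisect


def pwl(points, x):
    """Evaluate a piecewise-linear function given sorted (x, y) breakpoints.

    Outside the breakpoint range the nearest end value is held.
    """
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    k = bisect.bisect_right(xs, x)            # index of the first breakpoint beyond x
    (x0, y0), (x1, y1) = points[k - 1], points[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Assumed triangular switching-current pulse: (time in ns, current in mA).
pulse = [(0.0, 0.0), (0.2, 10.0), (0.5, 0.0)]
print(pwl(pulse, 0.1), pwl(pulse, 0.35))
```

More breakpoints give a closer approximation of the underlying curve, at the cost of more data to store and evaluate.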
Place and Route: to place the standard cells and blocks and then
route the cells or blocks based on the circuit netlist. The layout
is accomplished using the place and route tool in a standard
cell-based design style, especially for the ASIC (application-spe-
cific integrated circuit) chip.
Power Bus: the wide power lines used in the power distribution
for the chip.
Power Distribution: the task of delivering the power supply (Vdd,
Vss, etc.) from the power sources to individual transistors on the
chip.
Power Grid: the term for the on-chip power distribution network,
which is usually routed in the horizontal and vertical grid
structure on various metal layers.
Power Grid Analysis: the circuit analysis of voltage drops and
electromigration of the power distribution network, based on
the power network models for metal lines and device switching
currents.
Power Network: refers to the interconnected metal lines on the IC
chip used to deliver the Vdd and ground voltages.
Power Strap: the wide metal lines used for on-chip power distribu-
tion.
Power Supply Voltage: the nominal voltage used by the circuits
for correct functions and timing requirements. For example,
a 1.8 V supply voltage is usually used in a 0.18 µm process
chip.
Power Switching Noise: the voltage variations in the power distri-
bution network due to the switching currents. IR drop and
L · di/dt drop are the two main kinds of voltage noise that cause
the power grid to fail.
Prelayout Netlist: the circuit netlist used to specify the layout to
be drawn and to describe the connectivity of devices and tran-
sistor sizing parameters.
RC Back-Annotation: the process of stitching the RC data or RC
elements extracted from the VLSI layout back into the prelayout
circuit netlist, in order to simulate the circuit with the
interconnect RC parasitics.
RC Extraction: the task of modeling metal lines of the chip layout
into a distributed RC network together with the transistors.
RC Data: the RC network, extracted from the VLSI physical lay-
out, saved in a file such as the standard parasitic format (SPF)
file.
RC Netlist: the netlist used to represent the RC models extracted
from the physical layout.
Reflection Noise: the voltage increase or decrease due to the mis-
match of the characteristic impedance of the metal line to the
load.
Regulator: the circuitry that stabilizes the output power supply to
a specified voltage to compensate for supply voltage noise.
Resonance: the phenomenon that results in a cyclic waveform be-
ing generated from an RLC network.
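For the Resonance entry, the standard resonant frequency of a lumped LC pair, f0 = 1 / (2π·sqrt(L·C)), gives a feel for where the cyclic waveform appears. This is a sketch under assumed values: the 1 nH inductance and 100 nF capacitance are hypothetical, and a real power network is distributed and damped by resistance.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal lumped LC pair: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Assumed example: 1 nH of loop inductance with 100 nF of decoupling capacitance.
f0 = resonant_frequency_hz(1e-9, 100e-9)
print(f"f0 = {f0 / 1e6:.2f} MHz")
```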
RLC Segments: the distributed elements of the resistor, inductor,
and capacitor used to model the metal lines of the physical lay-
out, especially for the power distribution grid.
RMS Current: the root mean square current value over clock cy-
cles.
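The RMS Current and Peak Current entries can be compared on the same waveform. The sketch below assumes uniformly spaced current samples covering whole clock cycles; the sample values and function names are illustrative.

```python
import math


def peak_current(samples):
    """Maximum current value over the sampled clock cycles."""
    return max(samples)


def rms_current(samples):
    """Root mean square of uniformly spaced current samples."""
    return math.sqrt(sum(i * i for i in samples) / len(samples))


def average_current(samples):
    """Arithmetic mean of uniformly spaced current samples."""
    return sum(samples) / len(samples)


samples_ma = [0.0, 3.0, 4.0, 0.0]  # assumed current samples in mA
print(peak_current(samples_ma), rms_current(samples_ma), average_current(samples_ma))
```

For any waveform the ordering average ≤ RMS ≤ peak holds for non-negative currents, so the three values bracket how "spiky" the current is.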
Scaling: refers to the shrinking of the minimal gate length in the
IC process by a fixed factor from one generation process to the
next. This fixed factor is called the scaling factor. For example,
for a 0.18 µm process scaled to a 0.13 µm process, the scaling
factor is 0.13/0.18 = 0.72.
Simulation: the method of using computer programs to model
transistors and interconnects and solve current and voltage
equations.
Simultaneous Switching Noise (SSN): the noise produced when a
number of off-chip loads are switched simultaneously in a digital
system, causing a sudden current change in the power and ground
supply networks.
Standard Parasitic Format (SPF) File: the industry-standard file
format used to save the RC data extracted from the physical
layout of the VLSI chip. It is similar to SPICE netlist format.
Standby Mode: the quiet mode in which the chip performs no logic
operations.
Static Analysis: circuit performance analysis without the input
signals for I/Os; for example, static IR drop analysis with no in-
put vectors.
Switching Current: the surging current generated at the source or
drain of the transistors when the logic circuits change state
from logic 1 to logic 0 or from logic 0 to logic 1. It causes
an IR drop or di/dt noise in the power distribution network.
Switching Factor: the fraction of clock cycles during which the
circuit node switches.
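One way to estimate the Switching Factor from simulation data is to count the cycles in which a node changes state. This sketch assumes one sampled logic value per clock cycle and counts any change of value as a switch; both are simplifying assumptions, and the function name is hypothetical.

```python
def switching_factor(node_values):
    """Fraction of cycle boundaries at which the node changes logic state.

    node_values holds one logic value (0 or 1) per clock cycle.
    """
    if len(node_values) < 2:
        raise ValueError("need at least two cycles to observe a transition")
    transitions = sum(1 for a, b in zip(node_values, node_values[1:]) if a != b)
    return transitions / (len(node_values) - 1)


# Assumed trace of a node over five clock cycles.
print(switching_factor([0, 1, 1, 0, 0]))
```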
Tap Current: the current source model tied to the power grid used
to model the switching currents of transistors.
Technology Parameters: the numbers describing the process tech-
nology, such as the minimum gate length, the metal pitches in
metal layers, etc. These technology parameters provide the
foundation for design rules in IC circuit and layout designs.
Top Metal Layers: the top one or two metal layers of the chip. For
example, for a 0.18 µm process chip with six metal layers,
Metal 6 and Metal 5 are usually the top metal layers and the global
power grid is routed on these top metal layers.
Transistor: the basic device in IC technology used to implement
the switching of currents based on the controlling voltages at
the terminals. The transistor has four terminals: source, drain,
gate, and bulk.
Transistor-Level Simulator: the circuit simulator that uses the
transistor device models and interconnect RC or RLC models.
Transmission Line: the long metal trace in the package or board
that is modeled as an RLC line instead of an RC line.
Unit-Length Capacitance: the capacitance of the metal line per
unit length.
Unit-Length Inductance: the inductance of the metal line per unit
length.
Unit-Length Resistance: the resistance of the metal line per unit
length.
Via: the metal-filled hole connecting adjacent metal layers in an
IC chip.
Via Resistance: the resistance of each via between two adjacent
metal layers.
Vector-Based Analysis: the method used to analyze the switching
currents of the circuit, based on the input vectors at chip I/Os.
Voltage Distribution: the various voltage values at the nodes of
the power distribution network.
Voltage Fluctuation: the variation of the supply voltage over
different time periods and at different locations in the chip.
Voltage Threshold: the upper or lower voltage limits for the sup-
ply voltage (Vdd or ground) considered as functional for the cir-
cuits in the chip.
Voltage Regulation: the step used to adjust the supply voltage to
the required stable values. The regulated voltage can be lower or
higher than the nominal supply voltage, or even negative for
substrate-biasing purposes.
Weak Spot: the location in the power grid where the voltage val-
ues are below or above the required voltage thresholds.
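The Weak Spot entry (and the related Noisy Nodes entry) amounts to a threshold check on node voltages. A minimal sketch follows. The node names, the voltages, and the ±10% window around a 1.8 V nominal supply are assumptions made for the example.

```python
def find_weak_spots(node_voltages, v_min, v_max):
    """Return the names of nodes whose voltage is below v_min or above v_max."""
    return sorted(
        name for name, v in node_voltages.items() if v < v_min or v > v_max
    )


NOMINAL_VDD = 1.8                      # assumed nominal supply, in volts
V_MIN = NOMINAL_VDD * 0.9              # hypothetical lower voltage threshold
V_MAX = NOMINAL_VDD * 1.1              # hypothetical upper voltage threshold

# Hypothetical Vdd node voltages from a power grid analysis.
grid = {"n1": 1.79, "n2": 1.60, "n3": 1.70}
print(find_weak_spots(grid, V_MIN, V_MAX))
```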
Wire Bonding: the chip attachment technology using long lead
metals bonded from the package layer to I/O pads.
refs.qxd 12/16/2003 12:48 PM Page 199

REFERENCES

1. P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. All-

mon, “High-Performance Microprocessor Design,” IEEE Journal of Solid-
State Circuits, Vol. 33, No. 5, May 1998, pp. 676–686.
2. H. H. Chen and D. D. Ling, “Power Supply Noise Analysis Methodology for
Deep-Submicron VLSI Chip Design,” in Proceedings of 34th Design Automa-
tion Conference, 1997, pp. 638.
3. A. Dharchoudhury, R. Panda, D. Blaauw, R. Vaidyanathan, B. Tutuianu,
and D. Bearden, “Design and Analysis of Power Distribution Networks in
PowerPC Microprocessors,” in Proceedings of 35th Design Automation Con-
ference, 1998, p. 738.
4. G. Steele, D. Overhauser, S. Rochel, S. Z. Hussain, “Full-Chip Verification
Methods for DSM Power Distribution Systems,” in Proceedings of 35th De-
sign Automation Conference, 1998, p. 744.
5. P. C. Li and T. K. Young, “Electromigration: the Time Bomb in Deep-Submi-
cron ICs,” IEEE Spectrum, Sept. 1996, p. 75.
6. H. B. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison-
Wesley, 1990, Chapter 7.
7. Q. Zhu, “Power Grid Problems and On-Die Decoupling Capacitance Opti-
mization Method,” in Proceedings of IEEE 2nd International Workshop on
Chip and Package Co-design, CPD2000, 2000, p. 46.
8. A. Deutsch et al., “When are Transmission-Line Effects Important for On-
Chip Interconnections?,” IEEE Transactions on Microwave Theory and Tech-
niques, Vol. 45, No. 10, Oct. 1997, p. 1836.
9. N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design—A
System Perspective, Addison-Wesley, 1992, Chapter 4.
10. M. T. Bohr, “Interconnect Scaling—The Real Limiter to High Performance
ULSI,” Solid State Technology Journal, Sept. 1996, p. 105.
11. A. Odabasioglu, M. Celik, and L. T. Pileggi, “PRIMA: Passive Reduced-
Order Interconnect Macromodeling Algorithm,” IEEE Trans. on Computer-

Power Distribution Network Design for VLSI, by Qing K. Zhu 199

200 REFERENCES

Aided Design of Integrated Circuits and Systems, Vol. 17, No. 8, Aug. 1998,
p. 645.
12. L. T. Pillage and R. A. Rohrer, “Asymptotic Waveform Evaluation for Timing
Analysis,” IEEE Trans. on Computer-Aided Design, Vol. 9, No. 4, April 1990,
p. 352.
13. R. Kielkowski, Inside SPICE, McGraw-Hill, 1994.
14. K. L. Shepard, S. M. Carey, E. K. Cho, B. W. Curran, R. F. Hatch, D. E. Hoff-
man, S. A. McCabe, G. A. Northrop, and R. Seigler, “Design Methodology for
the S/390 Parallel Enterprise Server G4 Microprocessor,” IBM Journal of
Research and Development, Vol. 41, No. 4/5, May 1997, p. 515.
15. K. L. Shepard and T Zian, “Return-Limited Inductance: A Practical Ap-
proach to On-Chip Inductance Extraction,” IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, Vol. 19, No. 4, April 2000,
p. 425.
16. M. Basel, “Accurate and Efficient Extraction of Interconnect Circuits for
Full-Chip Timing Analysis,” in Proceedings of Design Automation Confer-
ence, 1995, p. 118.
17. A. K. Goel, “High-Speed Interconnections ,” Wiley, 1994, Chapter 2.
18. F. Najm, “Transition Density: a New Measure of Activity in Digital Cir-
cuits,” IEEE Trans. on Computer-Aided Design, Vol. 12, No. 2, Feb. 1993, p.
310.
19. M. Xakellis and F. Najm, “Statistical Estimation of the Switching Activity in
Digital Circuits,” in Proceedings of 31st ACM/IEEE Design Automation Con-
ference, 1994, p. 728.
20. E. Grim, Technical Presentations, Intel Corporation, 1999.
21. A. Waizman, Technical Presentations, Intel Corporation, 1998.
22. D. Ayers, Private Communications, Intel Corporation, 1998.
23. T. Burton, Technical Presentations, Intel Corporation, 1998.
24. Q. Zhu, “A New Technique: Decap (Decoupling Capacitance) Sizing and In-
sertion Based on Power Noise Violation Nodes,” USA Patent # 6446016, Sep.
2002.
25. Y. L. Le Coz and R. B. Iverson, “A Stochastic Algorithm for High-Speed Ca-
pacitance Extraction in Integrated Circuits,” Solid-State Electronics, Vol. 35,
No. 7, July 1992, p. 1005.
26. P. Larsson, “Resonance and Damping in CMOS Circuits with on-Chip De-
coupling Capacitance,” IEEE Transactions on Circuits and Systems—I: Fun-
damental Theory and Applications, Vol. 45, No. 8, Aug. 1998, p. 849.
27. The National Technology Roadmap for Semiconductors, Semiconductor Re-
search Corporation, 1997.
28. Q. Zhu and W. W.-M. Dai, “High Speed Clock Network Sizing Optimization
Based on Distributed RC and RLC Interconnect Models,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, Vol. 15, No.
9, Sept. 1996, p. 1106.
29. G. K. Rao, Multilevel Interconnect Technology, McGraw-Hill, 1993.
30. TSMC Technology Roadmap, www.tsmc.com, 2002.
refs.qxd 12/16/2003 12:48 PM Page 201

REFERENCES 201

31. TSMC 0.18 ␮m Logic 1P6M Salicide 1.5 V/3.3 V Design Rule, Taiwan Semi-
conductor Manufacturing Co., Nov. 2000.
32. P. E. Allen and D. R. Holberg, CMOS Analog Circuit Design, Oxford Univer-
sity Press, 1987.
33. Star-RCXT User Guide, Synopsys Corporation, 2002.
34. Star-SimXT User Guide, Synopsys Corporation, 2002.
35. R. Kumar, Noise Design and Analysis, Intel Corporation, 1997.
36. A. K. Goel, High-Speed VLSI Interconnections: Modeling, Analysis and Sim-
ulation, Wiley, 1994.
37. P. DeWilde and Z.-Q. Ning, Models for Large Integrated Circuits, Kluwer Ac-
ademic Publishers, 1990.
38. C. S. Walker, Capacitance, Inductance, and Crosstalk Analysis, Artech
House, 1990.
39. Virtuoso User Guide, Cadence Design Systems, Inc., 2002.
40. HSPICE User Guide, Synopsys, Inc., 2002.
41. N. D. Arora et al., “Modelling and Extraction of Interconnect Capacitances
for Multilayer VLSI Circuits,” IEEE Trans. on Computer-Aided Design of In-
tegrated Circuits and Systems, Vol. 15, No. 1, pp. 58–67, Jan. 1996.
42. Q. Zhu, “Star-RCXT Capacitance Accuracy Study,” T-RAM, Inc., Feb. 2002.
43. J. Savoj and B. Razavi, High-Speed CMOS Circuits for Optical Receivers,
Kluwer Academic Publishers, 2001.
44. W. S. Song and L. A. Glasser, “Power Distribution Techniques for VLSI
Circuits,” IEEE Journal of Solid-State Circuits, Vol. SC-21, No. 1, Feb.
1986.
45. Q. Zhu, Power Grid Design and Specifications, Chameleon Systems, Inc.,
2001.
46. D. Ayers, Microprocessor Power Network Design, Intel Corporation, 1998.
47. Y. Jiang, P6C AC Analysis, Intel Corporation, 1995.
48. Y. Jiang, P6C Decoupling Capacitor Methodology, Intel Corporation, 1995.
49. H. H. Chen and D. D. Ling, “Power Supply Analysis Methodology for Deep-
Submicron VLSI Chip Design,”in Proceedings of 34th Design Automation
Conference, 1997, p. 638.
50. B. J. Rubin, “An Electromagnetic Approach for Modeling High-Performance
Computer Package,” IBM Journal of Research and Development, Vol. 34, pp.
585–600, July 1990.
51. VoltageStorm Transistor-Level PGS User Guide, Cadence Design Systems,
Inc., 2002.
52. A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High-Performance
Microprocessor Circuits, IEEE Press, 2001, Chapter 24.
53. Q. Zhu and J. Pabustan, Post-Layout Static IR Analysis Flow Based on Sim-
plex Tool, Chameleon Systems, Inc., 2001.
54. V. L. Bars, IR Drop Evaluation in a Power/Ground Mesh, Project Report on
UC Santa Cruz Extension, 1997.
55. S. Chowdhury and J.S. Barkatullah, “Estimation of Maximum Currents in
refs.qxd 12/16/2003 12:48 PM Page 202

202 REFERENCES

MOS IC Logic Circuits,” IEEE Transactions on Computer-Aided Design, Vol.

9, No. 6, June 1990, pp. 642–654.
56. J. N. Kozhaya and F. N. Najm, “Power Estimation for Large Sequential Cir-
cuits,” IEEE Transactions on VLSI Systems, Vol. 9, No. 2, April 2001, pp.
400–407.
57. M. S. Hsiao, E. M. Rudnick, and J. H. Patel, “Peak Power Estimation of
VLSI Circuits: New Peak Power Measures,” IEEE Transactions on VLSI
Systems, Vol. 8, No. 4, August 2000, pp. 435–439.
58. Y.-M. Jiang, K.-T. Cheng, and A. Krstic, “Estimation of Maximum Power
and Instantaneous Current Using a Genetic Algorithm,” in Proceedings of
Custom Integrated Circuits Conference, 1997, pp. 135–138.
59. Star-RCXT User Guide, Synopsys Inc., 2001.
60. Fire & Ice User Guide, Cadence Design Systems, Inc., 2001.
61. T. Mozdzen, J. Barkatullah, S. Rajgopal, and D. Weiss, Management of Pow-
er Supply Noise Using Die, Package and Board Level Solutions, Intel Corpo-
ration, 1995.
62. A. Dharchoudhury, R. Panda, D. Blaauw, R. Vaidyanathan, B. Tutuianu,
and D. Bearden, “Design and Analysis of Power Distribution Networks in
PowerPC Microprocessors,” in Proceedings of 35th ACM/IEEE Design Au-
tomation Conference, 1998, pp. 738–743.
63. P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L. All-
mon, “High-Performance Microprocessor Design,” IEEE Journal of Solid-
State Circuits, Vol. 33, No. 5, May 1998, pp. 676–686.
64. HiP7 Design Manual, Motorola, Inc., Oct. 2002.
65. ElectronStorm Manual 3.1, Cadence Design Systems, Inc., 2001.
66. R. Senthinathan, S. Fischer, H. Rangchi and H. Yazdanmehr, “A 650-MHz,
IA-32 Microprocessor with Enhanced Data Streaming for Graphics and
Video,” IEEE Journal of Solid-State Circuits, Vol. 34, No. 11, Nov. 1999, pp.
1454–1465.
67. G. K. Konstadinidis, K. Normoyle, S. Wong, S. Bhutani, H. Stuimer, T.
Johnson, A. Smith, D. Y. Cheung, F. Romano, S. Yu, S.-H. Oh, V. Melamed,
S. Narayanan, D. Bunsey, C. Khieu, K. J. Wu, R. Schmitt, A. Dumlao, M.
Sutera, J. Chau, K. J. Lin and W. S. Coates, “Implementation of a Third-
Generation 1.1-GHz 64-bit Microprocessor,” IEEE Journal of Solid-State
Circuits, Vol. 37, No. 11, Nov. 2002, pp. 1461–1469.
68. H. Mizuno, K. Ishibashi, T. Shimura, T. Hattori, S. Narita, K. Shiozawa, S.
Ikeda, and K. Uchiyama, “An 18-␮A Standby-Current 1.8 V 200 MHz Micro-
processor with Self Substrate-Biased Data Retention Mode,” IEEE Journal
of Solid-State Circuits, Vol. 34, No. 11, Nov. 1999, pp. 1492–1500.
69. C. F. Webb et al., “A 400-MHz S/390 Microprocessor,” IEEE Journal of Sol-
id-State Circuits, Vol. 32, No. 11, Nov. 1997, pp. 1665–1675.
70. R. Heald et al., “A Third-Generation SPARC V9 64-b Microprocessor,” IEEE
Journal of Solid-State Circuits, Vol. 35, No. 11, Nov. 2000, pp. 1526–1538.
71. S. Rusu and G. Singer, “The First IA-64 Microprocessor,” IEEE Journal of
Solid-State Circuits, Vol. 35, No. 11, Nov. 2000, pp. 1539–1544.
72. C. Nicoletta et al., “A 450-MHz RISC Microprocessor with Enhanced In-
refs.qxd 12/16/2003 12:48 PM Page 203

REFERENCES 203

struction Set and Copper Interconnect,” IEEE Journal of Solid-State Cir-

cuits, Vol. 34, No. 11, Nov. 1999, pp. 1478–1491.
73. C. C. Wong, “Flip Chip Connection Technology,” in Multichip Module Tech-
nologies and Alternatives: The Basics, Edited by D. A. Doane and P. D. Fran-
zon, Van Nostrand Reinhold, 1993.
74. P. D. Franzon, “Electrical Design of Digital Multichip Module,” in Multichip
Module Technologies and Alternatives: The Basics, Edited by D. A. Doane
and P. D. Franzon, Van Nostrand Reinhold, 1993.
75. A. Chakrabarti, A Preliminary Analysis of Decoupling in Package and
Chips, LSI Logic Corp., 1994.
76. B. Kleveland and J. Prak, Chip and Package Power Supply Analysis, Intel
Corporation, 1993.
77. B. Jocobs, VCC/VSS Noise Measurement Bandwidth vs. Motherboard De-
coupling, Intel Corporation, 1995.
78. T. Burton, Power Grid Measurement Show Big Performance Win, Intel Cor-
poration, 1996.
79. Application Note: Recommended Placement for Power and Ground Pads for
the Standard I/O Pad Library, Nurlogic Design, Inc., 2001.
80. Q. Zhu and S. Tam, “Package Clock Distribution Design Optimization for
High-Speed and Low-Power VLSIs,” IEEE Transactions on CPMT/Ad-
vanced Packaging, Vol. 20, No. 1, pp. 56–63, Feb. 1997.
81. Q. Zhu and W. W.-M. Dai, “Planar Clock Routing for Chip and Package Co-
Design,” IEEE Transactions on VLSI Systems, Vol. 4, No. 2, pp. 210–226,
June 1996.
82. Q. Zhu, Chip and Package Co-design of Clock Networks, Ph.D. Thesis, Uni-
versity of California, Santa Cruz, June 1995.
83. Q. Zhu, “An On-Chip Decoupling Capacitance Allocation Method,” in Pro-
ceedings of Northeast Workshop on Circuits and Systems, Canada, May
2003, pp. 121–124.
No ratings yet
ASIC Design Guidelines: Hauw Suwito, Consultant
8 pages
Introduction To Cmos Vlsi Design
No ratings yet
Introduction To Cmos Vlsi Design
29 pages
Advanced Asic Chip Synthesis Using Synopsys 1999
No ratings yet
Advanced Asic Chip Synthesis Using Synopsys 1999
149 pages
Power Distribution Network Design For Vlsi
No ratings yet
Power Distribution Network Design For Vlsi
211 pages
ECE260B - CSE241A Winter 2005 Power Distribution: Website: Http://vlsicad - Ucsd.edu/courses/ece260b-W05
No ratings yet
ECE260B - CSE241A Winter 2005 Power Distribution: Website: Http://vlsicad - Ucsd.edu/courses/ece260b-W05
52 pages
Cmos Low Power
No ratings yet
Cmos Low Power
5 pages
Power Optimization For Low Power VLSI Circuits
No ratings yet
Power Optimization For Low Power VLSI Circuits
4 pages
Low Power Cmos Vlsi Circuit Design by Kaushik Roy 1 To 30 Page
No ratings yet
Low Power Cmos Vlsi Circuit Design by Kaushik Roy 1 To 30 Page
30 pages
Power Planning in ASIC Design
No ratings yet
Power Planning in ASIC Design
2 pages
3 Powerplan
No ratings yet
3 Powerplan
40 pages
2.5Gbps CMOS Laser Diode Driver With APC and
No ratings yet
2.5Gbps CMOS Laser Diode Driver With APC and
4 pages
Chapter 16 CMOS Logic Gates 3. CMOS Inverters - With Class Notes 11182024
No ratings yet
Chapter 16 CMOS Logic Gates 3. CMOS Inverters - With Class Notes 11182024
54 pages
Camera Link Repeaters - Splitters, Repeaters, Mux: What Is The Weightage of in GATE Exam? 9 1.22
No ratings yet
Camera Link Repeaters - Splitters, Repeaters, Mux: What Is The Weightage of in GATE Exam? 9 1.22
5 pages
T6963C
No ratings yet
T6963C
28 pages
Single-Chip 16C X 2L Dot-Matrix LCD Controller / Driver: Features
No ratings yet
Single-Chip 16C X 2L Dot-Matrix LCD Controller / Driver: Features
31 pages
Adg714 715
No ratings yet
Adg714 715
21 pages
Strained Silicon Technology
No ratings yet
Strained Silicon Technology
35 pages
Pic 14000
No ratings yet
Pic 14000
153 pages
Ir2117 Igbt Driver PDF
No ratings yet
Ir2117 Igbt Driver PDF
18 pages
Dynamic Logic Circuits
No ratings yet
Dynamic Logic Circuits
61 pages
2019S Lec Ch9 CascodeStage&CurrentMirrors Rev
No ratings yet
2019S Lec Ch9 CascodeStage&CurrentMirrors Rev
29 pages
Department Programme Course Code Course Name Semester Credit Values Contact Hours Pre-Requisite (S) Vission & Mission: Vission
No ratings yet
Department Programme Course Code Course Name Semester Credit Values Contact Hours Pre-Requisite (S) Vission & Mission: Vission
4 pages
A Balanced CMOS Compatible Ternary Memristor-NMOS Logic Family and Its Application
No ratings yet
A Balanced CMOS Compatible Ternary Memristor-NMOS Logic Family and Its Application
14 pages
Lab 9 X
No ratings yet
Lab 9 X
11 pages
R21 - VLSI - LP - RSK - 2023 With Date
No ratings yet
R21 - VLSI - LP - RSK - 2023 With Date
4 pages
Logic Families
No ratings yet
Logic Families
3 pages
PDF Datasheet 7432 or PDF - Compress
No ratings yet
PDF Datasheet 7432 or PDF - Compress
9 pages
Vlsi Unit 3
No ratings yet
Vlsi Unit 3
21 pages
ASIC and Physical Design Questions
100% (2)
ASIC and Physical Design Questions
46 pages
C60 - Fujitsu Digital To Analog Converter LEIA 55-65 GSa/s 8-Bit DAC
No ratings yet
C60 - Fujitsu Digital To Analog Converter LEIA 55-65 GSa/s 8-Bit DAC
2 pages
Tutorial-2 LNA PDF
100% (2)
Tutorial-2 LNA PDF
19 pages
Inv Delay PDF
No ratings yet
Inv Delay PDF
6 pages
Ultra Low Power IoT Applications
No ratings yet
Ultra Low Power IoT Applications
21 pages
Peripherals Viva
No ratings yet
Peripherals Viva
6 pages
An Accurate Photodiode Model For DC and High Frequency SPICE Circuit Simulation
No ratings yet
An Accurate Photodiode Model For DC and High Frequency SPICE Circuit Simulation
4 pages
CMOS Inverter - Dynamic Characteristics
100% (1)
CMOS Inverter - Dynamic Characteristics
54 pages
Design of A Wideband Variable-Gain Amplifier With Self-Compensated Transistor For Accurate dB-Linear Characteristic in 65 NM CMOS Technology
No ratings yet
Design of A Wideband Variable-Gain Amplifier With Self-Compensated Transistor For Accurate dB-Linear Characteristic in 65 NM CMOS Technology
12 pages
A Cmos: 14-Transistor Full Adder With Full Voltage-Swing Nodes
No ratings yet
A Cmos: 14-Transistor Full Adder With Full Voltage-Swing Nodes
10 pages


Library of Congress Cataloging-in-Publication is available.

Printed in the United States of America.

5 Power Grid Analysis

6 Microprocessor Design Examples

7 Package and I/O Design for Power Delivery

This book provides detailed information on power distribution

As power supply voltage continues to drop with the VLSI tech-


Section 1.4 discusses a special topic in power network design:

1.1 POWER SUPPLY NOISE

Noise problems in microprocessor power distribution networks


can be found in [2]. The power network modeling and analysis

1.2 POWER NETWORK MODELING

The layout and C4 package of a high-performance microprocessor


Figure 1-2. Power distribution for high-performance microprocessors.

Figure 1-3. On-chip power grid RLC modeling. [Diagram: power grid nodes A and B joined by series Rvcc/Lvcc segments on the supply rail and Rvss/Lvss segments on the ground rail.]

Rds-on and Cgate form a distributed RC network. Cgate is in series

Figure 1-4. Switching model of decoupling capacitor.


Figure 1-5. Decoupling capacitor modeling.

excluding the metal inductances for the on-chip power network.

Table 1-1. Well-known RC extraction CAD tools

Figure 1-6. Lumped and distributed RC models.

Each RC segment is modeled with a series resistor, together


Figure 1-7. Coupling capacitances between conductors in a VLSI layout [33].

equations or capacitance models are usually adopted in the capac-

R = sl/w (ohm) (1-1)
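As a quick numeric illustration of Equation (1-1), the sketch below computes the resistance of a rectangular metal segment from its sheet resistance, length, and width. The function name and the sample values are hypothetical and are not taken from Table 1-2; the sheet resistance is assumed to be given in ohms per square, the usual unit.

```python
def wire_resistance(sheet_res_ohm_per_sq: float, length_um: float, width_um: float) -> float:
    """Resistance of a rectangular wire per Equation (1-1): R = s * l / w.

    Length and width only need to share the same unit, because l / w is
    the dimensionless number of squares along the wire.
    """
    if width_um <= 0:
        raise ValueError("width must be positive")
    if length_um < 0:
        raise ValueError("length must be non-negative")
    return sheet_res_ohm_per_sq * length_um / width_um


# Hypothetical example: s = 0.08 ohm/square, l = 1000 um, w = 2 um.
r_segment = wire_resistance(0.08, 1000.0, 2.0)
print(f"R = {r_segment:.1f} ohm")
```

Doubling the width halves the resistance, which is one reason power rails are typically drawn much wider than signal wires.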

In Equation (1-1), s is the sheet resistance in the unit of

a. Overlap capacitance: the bottom/top surface of one line to

Table 1-2. Metal sheet resistances in 0.18 µm technology


Figure 1-8. Contacts and vias [9].

space (8.854 × 10⁻¹⁴ F/cm), εr is the relative permittivity be-

pacitance is modeled as Clt = Fl1l2 (d) · l, and Fl1l2 (d) = C0 +

1.3 MODELING OF SWITCHING CURRENTS

The high current consumption in some regions of the die produces


the total current consumption, two other current components (Isc,


Figure 1-9 (continued). (d) Bus lines layout structure.

Figure 1-10. Modeling of switching currents.

angle. The current waveforms are back-annotated into the power

1.4 ON-CHIP DECOUPLING CAPACITANCE

To prevent the supply level from collapsing when many gates

Cdecap · VCC = (VCC + ΔV)(Cdecap + Csw)
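The relation above can be read as charge conservation: the charge initially stored on Cdecap at VCC is shared with the switching capacitance Csw, so ΔV comes out negative (a droop). The sketch below works under that reading. It solves the relation for ΔV and for the smallest Cdecap that keeps the droop within a chosen bound. The function names and the numeric values are hypothetical.

```python
def supply_droop(vcc: float, c_decap: float, c_sw: float) -> float:
    """Solve Cdecap*VCC = (VCC + dV)*(Cdecap + Csw) for dV (negative = droop)."""
    if c_decap + c_sw <= 0:
        raise ValueError("total capacitance must be positive")
    return -vcc * c_sw / (c_decap + c_sw)


def decap_for_droop(vcc: float, c_sw: float, max_droop: float) -> float:
    """Smallest Cdecap for which |dV| <= max_droop, from the same relation."""
    if not 0 < max_droop < vcc:
        raise ValueError("max_droop must lie strictly between 0 and vcc")
    return c_sw * (vcc - max_droop) / max_droop


# Hypothetical example: 1.8 V supply, 100 pF switching at once, 10% droop allowed.
vcc, c_sw = 1.8, 100e-12
c_needed = decap_for_droop(vcc, c_sw, 0.10 * vcc)
print(f"Cdecap >= {c_needed * 1e12:.0f} pF")
print(f"dV = {supply_droop(vcc, c_needed, c_sw):.3f} V")
```

Under these assumptions a 10% droop budget calls for a decoupling capacitance of about nine times the switching capacitance, and tightening the budget raises that ratio quickly.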

Based on Equation (1-3), to ensure a small voltage fluctuation ΔV,


The objective of the decoupling capacitance optimization problem

Min Σ (Cd)i   subject to   V1 ≤ V(ni) ≤ V2    (1-4)

In Equation (1-4), (Cd)i is the decoupling capacitance and V(ni) the


Procedure I: Decoupling Capacitance Increment

Procedure II: Decoupling Capacitance Decrement

For (each node){

Figure 1-13. Decoupling capacitance optimization flow [83].

cycle time is 3 ns (a frequency of about 330 MHz) in the experiments. Two

1.5 ON-CHIP INDUCTANCE

The inductive drop or noise (L · di/dt) on the power lines becomes


Figure 1-14. Sensitivity study of on-chip decoupling capacitances [83]. [Plots: decoupling capacitance (pF) versus load capacitance increasing rate and versus current source (Is) increasing rate; curves labeled Vcc = 0.00375 V, 0.0375 V, and 0.075 V.]