Analog Layout Generation for Performance and Manufacturability
THE KLUWER INTERNATIONAL SERIES
IN ENGINEERING AND COMPUTER SCIENCE
by
Koen Lampaert
Katholieke Universiteit Leuven
Georges Gielen
Katholieke Universiteit Leuven
and
Willy Sansen
Katholieke Universiteit Leuven
Analog integrated circuits are very important as interfaces between the digital parts of integrated
electronic systems and the outside world. A large portion of the effort involved in designing these
circuits is spent in the layout phase. Whereas the physical design of digital circuits is automated
to a large extent, the layout of analog circuits is still a manual, time-consuming and error-prone
task. This is mainly due to the continuous nature of analog signals, which causes analog circuit
performance to be very sensitive to layout parasitics. The parasitic elements associated with
interconnect wires cause loading and coupling effects that degrade the frequency behavior and
the noise performance of analog circuits. Device mismatch and thermal effects put a funda-
mental limit on the achievable accuracy of circuits. For successful automation of analog layout,
advanced place and route tools that can handle these critical parasitics are required.
In the past, automatic analog layout tools tried to optimize the layout without quantifying the
performance degradation introduced by layout parasitics. Therefore, it was not guaranteed that
the resulting layout met the specifications and one or more layout iterations could be needed. In
this work, we propose a performance driven layout strategy to overcome this problem. In our
methodology, the layout tools are driven by performance constraints, such that the final layout,
with parasitic effects, still satisfies the specifications of the circuit. The performance degradation
associated with an intermediate layout solution is evaluated at runtime using predetermined
sensitivities. In contrast with other performance driven layout methodologies, our tools operate
directly on the performance constraints, without an intermediate parasitic constraint generation
step. This approach makes a complete and sensible trade-off between the different layout alternatives
possible at runtime and therefore eliminates the possible feedback route between constraint
derivation, placement and layout extraction.
Besides its influence on the performance, layout also has a profound impact on the yield
and testability of an analog circuit. In this work, we develop a new criterion to quantify the
detectability of a fault and combine this with a yield model to evaluate the testability of an in-
tegrated circuit layout. We then integrate this technique with our performance driven routing
algorithm to produce layouts that have optimal manufacturability while still meeting their per-
formance specifications.
Contents
Abstract
1 Introduction
1.1 Mixed-signal Design Methodology
1.2 A Hierarchical Performance-Driven Design Strategy
1.3 Physical Design Tools for Mixed-signal IC's
1.3.1 Circuit Level Layout Generation
1.3.2 System Level Layout Generation
1.3.3 Layout Extraction and Verification
1.3.4 Scope Of This Work
1.4 Layout Styles
1.4.1 Full-Custom
1.4.2 Semi-Custom
1.4.3 Scope Of This Work
1.5 Existing Tools for Analog Layout
1.5.1 The Analog Macro-Cell Layout Style
1.5.2 Implementations of the Macro-Cell Layout Style
1.5.3 Situation Of This Work
1.6 Overview of the Analog Layout Tool LAYLA
1.6.1 Circuit Analysis
1.6.2 Device Generation
1.6.3 Placement
1.6.4 Routing
1.7 Summary and Conclusions
3 Module Generation
3.1 Introduction
3.2 Problem Formulation
3.3 Module Generation Strategies
3.3.1 Fixed Library of Procedural Generators
3.3.2 Dynamic Merging
3.3.3 Simultaneous Placement and Module Optimization
3.3.4 Discussion
3.4 Transistor Stacking Algorithms
3.5 Procedural Module Generation
3.6 Technology Independence
3.7 Examples
3.7.1 MOS Transistor
3.7.2 Cascode MOS Transistor Pair
3.8 Summary and Conclusions
4 Placement
4.1 Introduction
4.2 Problem Formulation
4.3 Overview of the Placement Tool
4.4 Previous Work in Placement Algorithms
4.4.1 Constructive Placement (CP)
5 Routing
5.1 Introduction
5.2 Problem Formulation
5.3 Overview of the Routing Tool
5.4 Classification of Routing Algorithms
5.4.1 Routing Strategy
5.4.2 Routing Model
5.4.3 Search Strategies
5.5 Previous Work in Area Routing
5.5.1 Maze Routing
5.5.2 Line-Search Routing
5.5.3 Line-Expansion Routing
5.5.4 Discussion
5.6 A Grid-Less Maze Routing Algorithm
5.6.1 Routing Model
5.6.2 Source Region Expansion
5.6.3 Path Expansion
6 Implementation
6.1 Introduction
6.2 Implementation
6.2.1 Source Code
6.2.2 Interface to Electronic Design Frameworks
6.3 Use of LAYLA in an Industrial Environment
6.3.1 Link to Schematic Capture
6.3.2 Link to Simulation
6.3.3 Back-Annotation of Layout Parasitics
6.4 Results
Bibliography
List of Tables
Introduction
Our emphasis in this work is on automatic layout generation for analog integrated circuits. However,
to gain a global perspective, we first briefly outline the most important steps involved in
the design of mixed-signal integrated circuits. Section 1.1 discusses the design cycle of a mixed-signal
ASIC. A design methodology for the analog part of the ASIC is presented in section 1.2
and the physical design steps in this flow are discussed in detail in section 1.3. In section 1.4, we
discuss the different layout styles that can be used for analog layout and indicate the
scope of the research. An overview of existing tools for analog full-custom layout is given in section 1.5,
together with the positioning of our own work. Finally, in section 1.6, we give a brief overview of the
LAYLA tool set, which is the result of this work, and we draw some conclusions in section 1.7.
• High-Level Specification The first step of the design process is to lay down the speci-
fications of the system in a formal, implementation-independent way. To ensure correct
functionality of the chip, this high-level specification can be simulated within its system
context.
• High-Level Design Next, a high-level architecture of the system is designed. During this
step, the system is partitioned into different subsystems (digital, DSP and analog) and the
high-level specifications are mapped into formal specifications for each subsystem. These
specifications are expressed in the language which is most appropriate to describe the type
of signals processed by the block (e.g. VHDL for digital, VHDL-A for analog, ...). The
correct functionality of the system after high-level design can be verified using mixed-
mode simulators.
• Synthesis Starting from the formal specifications derived during high-level design, each
subsystem is designed separately, either by hand or by a dedicated tool set. After this step,
the complete floorplan can be generated and the wiring parasitics estimated. The complete
chip is then again verified on the transistor level, or if this is impossible, using a mixture
of transistor-level and behavioral representations.
• Layout After synthesis of the blocks and verification of their functionality, the layout can
be generated. The different subsystems are laid out using different layout styles and sup-
porting tool sets. The final chip layout is then assembled using block place and route tools.
The real parasitics can now be extracted and included in the netlist for a final verification.
If the simulated chip performance is still within the specifications, ERC and DRC can be
carried out and the mask production can be started.
This design methodology involves many iterations, both within a step and between different
steps. The entire design process may be viewed as transformations of representations in various
steps. Each transformation results in a more accurate description of the system which can be
analyzed and verified. If a verification shows that the system specifications are violated at a
certain point in the design cycle, a number of design steps have to be repeated to correct the
error.
The remainder of this work deals with the physical design aspects of the analog subsystem.
Examples of analog subsystems are data acquisition chains and sensor interface circuits.
These subsystems also include A/D and D/A converters and analog circuits of which the transfer
function is controlled by a digital signal. This means that our analog subsystem will also include
some low level digital circuitry such as clocks, control logic and digital registers. Therefore, in
the remainder of this work, the analog subsystem will be referred to as "mixed A/D subsystem"
or "mixed A/D system".
In the next section, a methodology for the design of such mixed A/D subsystems will be
described.
... environment is shown schematically in Fig. 1.3. In between any two levels i and i + 1 in the design
hierarchy, the following steps are performed:
1. top-down
2. bottom-up
Conceptually, these steps are the same at any level. First, based upon the required specifications
$\phi_i$ for a block at level $i$ and the technology data and considering criteria such as overall
feasibility, smallest area and power consumption, the most appropriate architecture $A_i$ from
among a set of alternatives is selected for the block under design. Next, the specifications $\phi_i$ of
the present block are translated into specifications $\phi_j^{i+1}$ for each of the sub-blocks $j$ within the
selected architecture $A_i$ (at the lowest level this is device sizing). The results of the specification
mapping process can then be verified by means of mixed-signal behavioral simulation of
the complete architecture. If the simulated architecture does not meet its specifications $\phi_i$ at this
point, iterations have to be carried out by redoing some of the previous steps.
This process of architecture selection and specification translation is repeated down the hierarchy
until a level is reached which allows a physical implementation $B_j^{i+1}$ (device level for custom
layouts, higher levels when using standard cells). The hierarchy is then traversed bottom-up, at
each level generating the block layout $B_i$ by assembling the sub-block layouts $B_j^{i+1}$ of the architecture.
After extraction and detailed verification (by simulation) the layout is passed to the next
level up. This results in the layout of the overall system at the top level. A final verification
can then be performed by extracting the design from the layout and simulating the whole system
with an appropriate mixed-level mixed-signal simulator. If the specifications are not met,
redesign iterations have to be carried out again.
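The top-down specification translation followed by bottom-up layout assembly can be pictured as a simple recursion. The following is only a minimal sketch under stated assumptions: the `Block` class, the equal split of a single numeric specification over the sub-blocks, and the use of strings as stand-ins for layouts are all hypothetical and are not taken from the methodology itself. Verification and redesign iterations are omitted.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Block:
    """A block in the design hierarchy (hypothetical structure)."""
    name: str
    children: list[Block] = field(default_factory=list)
    spec: float = 0.0   # a single numeric budget, assigned top-down
    layout: str = ""    # string placeholder for the generated layout

def design(block: Block, spec: float) -> str:
    """Translate specifications top-down, then assemble layouts bottom-up."""
    block.spec = spec
    if not block.children:
        # Lowest level reached: a physical implementation is possible.
        block.layout = block.name
        return block.layout
    # Top-down step. Assumption: the budget is split equally over sub-blocks.
    share = spec / len(block.children)
    sub_layouts = [design(child, share) for child in block.children]
    # Bottom-up step: the block layout is assembled from the sub-block layouts.
    block.layout = f"{block.name}({', '.join(sub_layouts)})"
    return block.layout

opamp = Block("opamp", [Block("M1"), Block("M2")])
adc = Block("adc", [opamp, Block("comparator")])
top_layout = design(adc, 8.0)
print(top_layout)
```

In a real flow each level would also select an architecture and verify the result by simulation before passing the layout upward; the recursion only shows the order in which specifications and layouts travel through the hierarchy.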
This hierarchical performance driven design methodology has been implemented in the
AMGIE environment developed at ESAT-MICAS [Gielen 95a]. AMGIE is a CAD system for
the automated design of integrated analog modules starting from specifications over structure
selection and sizing down to layout. The AMGIE design flow (see Fig. 1.4) is organized hierarchically
using the hierarchical levels defined above: a module (e.g. an analog-to-digital
converter) is composed of different circuits (e.g. opamps and comparators) which in turn consist
of devices. The design flow is based on the top-down synthesis/bottom-up layout generation
process described earlier.
Figure 1.4: The design flow of the analog module generator AMGIE.
AMGIE offers three different design styles to the designer, all handled in the same way by
the system: analog standard cells, parameterized cells and full-custom design. The user or the
system decides which design style to use. This allows design flexibility to be traded off against design
speed. The use of predefined standard cells has the advantage that the cells have a proven quality
and that the design time is very short. Parameterized cells have a limited set of changeable
parameters which offers the additional advantage that the performances can still be tuned over
some range. Often, however, the required specifications are different from those of the library
cells and optimal fully customized tuning of the circuit is needed. Therefore, full-custom design
is necessary and is also offered in AMGIE.
This choice of design styles is also reflected in the design flow of Fig. 1.4. Analog standard
cells have a predefined design and layout. Therefore, the specification translation and verification
steps can be omitted from the design flow. They also have a predefined layout which is stored in
the library. Therefore, the layout generation/assembly steps are not needed and the layout can be
taken from the library.
The physical design steps in the design flow (shaded in Fig. 1.4) are the subject of this book
and will be discussed in the next section.
1.3.2.1 Floorplanning
After architecture selection, specification mapping and verification, the block under design is
defined as an interconnection of blocks from the lower level. At this point in the design flow,
the exact area and shape of the sub-blocks are unknown. It is however possible to determine
rough estimates of the area and the power consumption of each sub-block since its specifications
have been determined. Based on these estimates, the floorplanning algorithm has to determine
the position, the aspect ratio and the terminal positions for each sub-block, subject to constraints
inherited from the previous floorplanning step and such that the area is minimized and the per-
formance degradation of the circuit is bounded. Parasitic effects to be taken into account include
interconnect parasitics, thermal effects and substrate coupling. The aspect ratios and the terminal
positions derived by the floorplanner are used as constraints for the floorplanning or the layout
generation of the sub-blocks on the next lower hierarchical level. As an additional result of the
floorplanning algorithm, a set of estimated values for routing parasitics becomes available. These
values can be used during synthesis of the lower level blocks.
• Extraction
An extraction tool regenerates a circuit netlist from the layout of a circuit. The netlist
produced by a circuit extractor includes all interconnect parasitics present in the layout
and can be used to verify the performance of the circuit after layout.
1.4.1 Full-Custom
In a full-custom layout style, the layout is done hierarchically in a bottom-up fashion. No restrictions
are imposed on the width, height, aspect ratio or terminal positions of the layout blocks
at each hierarchical level and they can be placed at any location on the chip surface.
Every component in the design is hand crafted for performance, area and power
tradeoffs, often resulting in highly irregular placement and routing. This technique has the largest
flexibility and the best performance for the highest density since the layout can be tuned and optimized
for each application. However, the total turnaround time is quite large if the layout is
done by hand and tools to automate this layout technique are very complex.
1.4.2 Semi-Custom
The semi-custom layout styles try to speed up the layout and/or fabrication process by imposing
restrictions and constraints on the physical design of the circuits.
• Standard Cell
Standard cell layout architecture considers the layout to consist of rectangular cells of fixed
height and variable width. The cells are placed in rows and the space between two rows is
called a channel. These channels and the space above and between cells are used to make
interconnections between cells. This layout style introduces several additional constraints
on the layout tools. During circuit level layout generation, the circuits have to be laid out
with a fixed height. Standard cells usually have their power-supply connections running
horizontally on the top and the bottom of the cell, which puts another constraint on the
layout process.
• Sea of Gates
The sea of gates is a semi-custom design style similar to gate array. The master of the sea
of gates consists of a predefined regular pattern of logic gates. The interconnect between
the predefined gates can be customized by the designer of the chip. Since there are no
routing channels on a sea of gates chip, interconnects have to be completed by routing
through gates, or by using a second or third layer of metal. The restrictions imposed by
the sea of gates styles are similar to those of an array, and analog designs implemented in
this style suffer from the same layout induced performance problems. Sea of gates chips
are rarely used for analog designs.
macro-cells, structural entities or modules. The complexity of these modules ranges from basic
devices (transistors, capacitors, resistors) to more complicated structures like current mirrors or
differential pairs. Some analog layout systems restrict the complexity of their modules to basic
devices, in which case the module recognition step is trivial and can be omitted. Other layout
systems allow more complex modules and rely on the designer or on dedicated algorithms to
make the division. An example of a BiCMOS opamp divided into modules is shown in Fig. 1.6.
Once the division into modules is completed, a placement can be generated. Each module
can be laid out in several electrically equivalent ways, called variants. The task of the placement
program is to select an optimal variant for each module, and to place these variants on the layout
surface in an optimal way. To accomplish this, the placement program uses a set of module gen-
erators to create a set of possible variants for each module. Typically, these module generators
are parameterized programs which procedurally generate layouts for modules based on the mod-
ule parameters and a number of options. For each module considered, there must be a module
generator that generates all the different variants. Maintenance of these module generators over
different processes is a hard job. This is also the reason why certain layout synthesis tools try
to reduce the number of generators, for instance by restricting the considered modules only to
individual devices. A placement of the BiCMOS opamp of Fig. 1.6 is shown in Fig. 1.7.
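To illustrate what a set of electrically equivalent variants can look like, the sketch below enumerates folded implementations of a single MOS transistor and picks the one whose bounding box best matches a target aspect ratio. This is a minimal illustration under stated assumptions: the geometry model (each finger occupying one gate length plus one contact pitch) and the names `mos_variants` and `pick_variant` are hypothetical, and a real placement tool would select variants as part of a global optimization rather than per module.

```python
def mos_variants(width, length, max_fingers=8, pitch=1.0):
    """Enumerate electrically equivalent folded layouts of one MOS transistor.

    Simplified, hypothetical geometry: each finger occupies (length + pitch)
    horizontally, plus one extra contact column; the finger height is the
    total transistor width divided by the number of fingers.
    """
    variants = []
    for n in range(1, max_fingers + 1):
        box_w = n * (length + pitch) + pitch
        box_h = width / n
        variants.append({"fingers": n, "w": box_w, "h": box_h})
    return variants

def pick_variant(variants, target_aspect):
    """Select the variant whose aspect ratio (w / h) is closest to the target."""
    return min(variants, key=lambda v: abs(v["w"] / v["h"] - target_aspect))

variants = mos_variants(width=40.0, length=1.0, max_fingers=8)
best = pick_variant(variants, target_aspect=1.0)
print(best)
```

All variants implement the same total gate width, so they are interchangeable electrically to first order while differing in shape, which is the freedom the placement program exploits.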
The next step in the design flow is routing. The task of the router is to connect the modules
according to the netlist of the circuit. The layout of the BiCMOS opamp after routing is shown
in Fig. 1.8. Sometimes, compaction is used as a final step to improve the density of the layout.
Although most automatic analog layout systems use a variant of the macro-cell layout style
as basic layout strategy, they vary greatly in the way they implement the analog specific features
of the various tools. This will be discussed next.
The automatic generation of layout through the use of procedural generators [Kuhn 87] is perhaps
the most mature layout automation technique used for analog circuits. In these systems, specific
software is written to support the creation of each unique circuit topology for which layout is
desired and the analog-specific knowledge about the layout is coded into the software itself.
Parameters passed to the generator at invocation are used to calculate the actual dimensions of
the variable devices. A variety of simple constraint mechanisms can be used to adjust the position
of the devices to account for their changing sizes. In this way the performance of the generated
circuit can be varied within some bounds by changing the invocation parameters to the generator.
Because the topology remains fixed, the base layout can be manually optimized, hence the
generated layouts are of reasonably high quality. Virtually all of the commonly used analog-
specific layout constraints, e.g. device merging, layout symmetries and matching considerations,
can be manually programmed into these generators. The major disadvantage of analog procedu-
ral layout generators is that they require a large coding effort for each new topology. Because
of the relatively high cost of developing new generators, procedural generation is clearly not a
good choice for an environment requiring a large variety of circuit topologies. These systems are
most useful in environments where many variations of a few general purpose circuit topologies
are used, e.g. switched capacitor filter synthesis systems.
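The idea of a procedural generator, a fixed topology whose dimensions follow the invocation parameters, can be sketched in a few lines. The example below is an assumption-laden illustration and does not reproduce any generator from the cited systems: the function names, the rectangle representation and the default spacing are hypothetical, and a real generator would emit mask geometry on several layers.

```python
def generate_matched_pair(w, l, spacing=2.0):
    """Procedural generator for one fixed topology: a matched device pair.

    The topology (two identical devices mirrored about the vertical axis
    x = 0) is hard-coded; only the dimensions follow the parameters.
    Returns rectangles as (name, x, y, dx, dy). Purely illustrative.
    """
    left = ("M1", -spacing / 2 - l, 0.0, l, w)
    right = ("M2", spacing / 2, 0.0, l, w)
    return [left, right]

def bounding_box(rects):
    """Return (xmin, ymin, xmax, ymax) of a list of rectangles."""
    xmin = min(x for _, x, _, _, _ in rects)
    ymin = min(y for _, _, y, _, _ in rects)
    xmax = max(x + dx for _, x, _, dx, _ in rects)
    ymax = max(y + dy for _, _, y, _, dy in rects)
    return (xmin, ymin, xmax, ymax)

small = generate_matched_pair(w=10.0, l=1.0)
large = generate_matched_pair(w=20.0, l=2.0)
print(bounding_box(small), bounding_box(large))
```

Changing the parameters moves the devices so that the spacing and the symmetry are preserved, which is the kind of simple constraint mechanism described above; supporting a different topology would require writing a new generator.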
This approach uses a template to capture graphically or textually an expert's knowledge of analog
layout for a given circuit type. The template is created once by an expert designer and captures
his knowledge of analog specific constraints like symmetry, device matching and parasitic minimization.
To generate a circuit layout, one supplies the required electrical parameters for the
circuit together with some other geometrical constraints (e.g. circuit shape and aspect ratio). The
layout is generated by transforming the template into an actual layout, based on the user-supplied
electrical and geometrical parameters.
A typical example of a template driven analog layout system is the "design by example"
approach presented by the Philips Research Laboratories [Conway 92]. For a given module type,
a sample module layout is created once by an expert designer. This template graphically captures
his knowledge of device placement and orientation, routing wire trajectories, material types and
widths and position of module terminals. To generate a module, the user supplies the required
electrical parameters for each device, the sets of devices which must be matched and geometrical
constraints on the module's shape. To reconstruct the placement, the template is analyzed to
determine all its possible slicing structures. These structures are stored in an exhaustive slicing
tree from which an area-optimal floorplan is derived, depending on specified matching and aspect
ratio constraints. The reader is referred to section 4.5.1 for an introduction to slicing trees and
their use in analog circuit placement. To obtain a real layout, the modules in the template are
enlarged and then replaced by the actual module instances, restoring connectivity using a river
router [Conway 92]. The final layout is then obtained by compaction.
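As a small illustration of how a slicing structure determines a floorplan, the sketch below evaluates the bounding box of a slicing tree with fixed module sizes. It is a plain bottom-up evaluation and not the exhaustive slicing tree of [Conway 92]; the tuple encoding of the tree, the cut labels 'V' (side by side) and 'H' (stacked) and the module sizes are assumptions made for the example.

```python
def slicing_size(node, sizes):
    """Return (width, height) of the floorplan described by a slicing tree.

    A node is either a module name (leaf) or a tuple (cut, left, right),
    where cut is 'V' (children side by side) or 'H' (children stacked).
    """
    if isinstance(node, str):
        return sizes[node]
    cut, left, right = node
    lw, lh = slicing_size(left, sizes)
    rw, rh = slicing_size(right, sizes)
    if cut == "V":
        return (lw + rw, max(lh, rh))
    if cut == "H":
        return (max(lw, rw), lh + rh)
    raise ValueError(f"unknown cut type: {cut!r}")

# Hypothetical module sizes (width, height).
sizes = {"A": (4, 3), "B": (2, 3), "C": (6, 2)}
tree_1 = ("H", ("V", "A", "B"), "C")
tree_2 = ("V", ("H", "A", "C"), "B")
for tree in (tree_1, tree_2):
    w, h = slicing_size(tree, sizes)
    print(tree, w, h, w * h)
```

Two slicing structures over the same modules can give different areas, which is why an area-optimal floorplan has to be searched for among the possible structures.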
This technique produces good quality layout in a reasonable amount of time but has some
major drawbacks. A template has to be created for each new type of circuit which limits the
generality of the method. The library of templates has to be updated for each new technology.
For these reasons this method was not considered for this research.
Rule-based layout systems offer a fast and flexible integrated set of tools and a rule-based control
that can be customized by the designer to meet his specific needs. In these systems, knowledge
about analog layout design is incorporated in the rule set. The quality of the final layout depends
on the quality of the rule set.
One of the most successful rule-based analog layout systems is ALSYN [Meyer 93]. In the
ALSYN design flow, the circuit is first analyzed by a compiled rule set. Using designer specified
rules, compound modules are recognized and constraints for placement and routing are deter-
mined. ALSYN's placement algorithm is based on the min-cut approach and uses Stockmeyer's
algorithm for area optimization. A number of constraints can be specified, e.g. partial slicing
specifications, symmetry and clustering groups. Routing can be done either with a grid-based
maze routing algorithm or with a grid-less line expansion approach.
At first glance, the rule-based approach seems to be very attractive because every designer
can adapt the tools to his specific needs by providing his own set of rules. Unfortunately, this
is also the major drawback. The quality of the resulting layout depends to a great extent on the
quality of the rule set. Rules are difficult to formulate in a general and context-independent way,
and as a consequence, rule-based systems produce acceptable layouts only for the limited range
of applications for which the rules have been designed.
Algorithmic approaches incorporate the analog layout knowledge in the software itself. The
main purpose of these algorithmic analog layout tools is to create a layout that is fully functional
and optimal in area and performance. They differ from the other described approaches by em-
ploying general algorithmic techniques to optimize the layout and to minimize layout induced
performance degradation, rather than relying on templates or rule sets.
Most of these tools use cost function driven algorithms for placement and routing. Incremen-
tal changes are made to an intermediate solution and the quality of the new solution is evaluated
using a cost function. The choice of the cost function is crucial for the operation of the tools. The
penalties that are built into this function for violations of analog constraints (matching, crosstalk,
loading capacitances, etc.) drive the tools to a solution which minimizes the layout induced per-
formance degradation. The algorithmic layout tools can be classified by the ways of deriving and
treating performance constraints.
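A cost function driven algorithm of this kind can be sketched with a toy one-dimensional placement problem. The example below is a minimal sketch under stated assumptions and is not taken from any of the tools discussed: the cost function (wire length plus a fixed penalty per matched pair that is not adjacent), the swap move, the cooling schedule and all names and numbers are hypothetical.

```python
import math
import random

def cost(order, nets, matched, penalty=10.0):
    """Wire length plus a penalty for each matched pair that is not adjacent."""
    pos = {m: i for i, m in enumerate(order)}
    wire = sum(abs(pos[a] - pos[b]) for a, b in nets)
    violations = sum(1 for a, b in matched if abs(pos[a] - pos[b]) != 1)
    return wire + penalty * violations

def anneal(order, nets, matched, t0=5.0, alpha=0.95, steps=500, seed=0):
    """Simulated annealing over module orderings using pairwise swaps."""
    rng = random.Random(seed)
    current = list(order)
    cur_cost = cost(current, nets, matched)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]      # incremental change
        new_cost = cost(current, nets, matched)
        delta = new_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur_cost = new_cost                               # accept the move
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the move
        t *= alpha                                            # cool down
    return best, best_cost

order = ["M1", "M3", "M2", "M4"]
nets = [("M1", "M2"), ("M3", "M4"), ("M2", "M3")]
matched = [("M1", "M2")]
best, best_cost = anneal(order, nets, matched)
print(cost(order, nets, matched), best, best_cost)
```

The penalty weight decides how strongly the analog constraint dominates the wire length term, which is why the choice of the cost function is crucial for the behavior of such tools.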
implement and maintain and a certain degree of technology dependence can not be avoided.
In [Cohn 91] the KOAN/ANAGRAM II tool set was presented as a new layout program that
permits more low-level layout optimizations. KOAN is a device-level analog placement tool
based on simulated annealing. It differs from other tools in its ability to selectively merge device
modules to reduce parasitic capacitance and cell area by appropriate sharing of geometry. KOAN
uses a small basic device generator library and creates the more complicated sub-circuit layout
structures that make up the bulk of typical generator libraries (e.g. cascode structures, matched
differential pairs) dynamically during placement. ANAGRAM II is a detailed general-area router
that handles arbitrary grid-less design rules in addition to over-the-device, crosstalk avoiding,
mirror-symmetric and self-symmetric wiring. The area-routing strategy incorporates models of
capacitive coupling, including simple shielding effects, in its basic evaluation mechanism for
paths, allowing path selections to be coerced by possible interactions with other wired nets.
Performance Driven Approaches Although the tools mentioned in the previous section try
to optimize performance in various heuristic ways, none of them quantifies performance degra-
dation during layout design. As a consequence, they can not guarantee that the performance
specifications will be met after the layout is completed and one or more time-consuming layout-
extraction-verification loops may be necessary. An additional problem is that an extracted list of
parasitics is usually very large and if performance specifications are not met, no clue is obtained
regarding what went wrong and it is not obvious which parasitics have to be changed to reduce
the performance degradation.
The solution to this problem is to apply performance driven layout techniques to the analog
layout problem. Performance driven layout tools strive to construct a layout such that the perfor-
mance specifications are guaranteed to be met by construction, by quantifying the layout-induced
performance degradation and keeping this below the allowed margins.
The first analog performance driven layout methodology was proposed in [Choudhury 90a,
Choudhury 90b]. In this approach, the effect of layout parasitics on circuit performance is mod-
eled using sensitivities. Using this linear approximation and a quadratic optimization technique,
the performance constraints for the circuit are mapped to a set of constraints on layout para-
sitics. These parasitic constraints are then used to drive the layout tools. This methodology
has been applied to channel routing [Choudhury 90c], area routing [Malavasi 90, Malavasi 93],
placement [Charbon 92, Charbon 94a] and compaction [Felt 93]. Another implementation of a
performance driven layout strategy was presented in [Bas 93]. The router presented in this paper
uses parasitic constraints to guide the search of a line-expansion area router.
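The linear sensitivity model underlying these methods can be made concrete with a small worked example. The sketch below only evaluates the forward model, in which the degradation of performance i is approximated as the sum over all parasitics j of S_ij times p_j; the mapping from performance constraints to parasitic constraints by quadratic optimization is not shown. The sensitivity values, parasitic values and margins are hypothetical numbers chosen for illustration.

```python
def degradation(sensitivities, parasitics):
    """First-order estimate: dP_i = sum_j S_ij * p_j."""
    return [sum(s * p for s, p in zip(row, parasitics)) for row in sensitivities]

def violated(deltas, margins):
    """Indices of performances whose estimated degradation exceeds its margin."""
    return [i for i, (d, m) in enumerate(zip(deltas, margins)) if abs(d) > m]

# Hypothetical numbers: two performances, three parasitic capacitances.
S = [[0.5, 0.0, 2.0],    # sensitivities of performance 0 to each parasitic
     [0.0, 1.0, 0.25]]   # sensitivities of performance 1 to each parasitic
p = [4.0, 2.0, 1.0]      # parasitic values of an intermediate layout
margins = [5.0, 2.0]     # allowed degradation per performance

deltas = degradation(S, p)
print(deltas, violated(deltas, margins))
```

Because the model is linear, the contribution of each parasitic to a violated performance can be read off directly from the products S_ij times p_j, which is exactly the information a raw extracted parasitic list does not provide.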
tools are driven directly by the performance constraints. We believe that this approach has
several advantages which will be explained in chapter 2.
Another limitation of the performance driven layout tools mentioned above is that they focus
on the control of performance degradation introduced by interconnect parasitics. The perfor-
mance driven layout tools presented in this work have been designed to simultaneously handle
performance degradation induced by interconnect parasitics, mismatch and thermal effects.
Experiments with our performance driven algorithms show that they find a layout respect-
ing the performance constraints if one exists. Moreover, in all but the most tightly constrained
cases, several valid solutions could be found by varying the control parameters of the various
algorithms. Although these solutions are equivalent in terms of performance degradation, their
quality differs substantially if yield and testability are taken into account. These observations
lead us to an extension of the performance-driven routing algorithm to also include yield and
testability effects.
1. netlist file The netlist describing the circuit for which the layout has to be designed. This
can be a device level netlist, or an architecture of higher level blocks.
1.6.3 Placement
The task of the placement tool is to select an optimal position, orientation and implementation
for each device in the circuit. The freedom in placing these devices is used to control the layout-
induced performance degradation within the margins imposed by the designer's specifications.
During each iteration of a simulated annealing optimization algorithm, the layout-induced per-
formance degradation is calculated from the geometrical properties of the intermediate solution.
The cost-function is designed to control performance degradation due to interconnect parasitics,
mismatch and thermal effects, as will be explained in detail in chapter 4.
1.6.4 Routing
The main task of the performance-driven router is to route the circuit such that the performance
degradation caused by the interconnect parasitics remains within the specification margins im-
posed by the designer. For a given set of circuit specifications, several valid routing solutions can
be found. Among these, the routing algorithm selects the solution that additionally maximizes
the yield and the testability of the resulting layout. Initially, the circuit is routed with a cost func-
tion designed to enforce all performance constraints. After all nets have been routed, the layout
parasitics are extracted and the performance of the circuit is verified. In a second phase, nets are
ripped up and rerouted to optimize the yield and the testability of the layout. During this process,
care is taken not to introduce performance constraint violations. The performance driven routing
algorithm together with the yield and testability aspects are the subject of chapter 5.
2.1 Introduction
Generating the layout of high-performance analog circuits is a difficult and time-consuming task
which has a considerable impact on circuit performance. The various parasitics which are intro-
duced during the layout phase of an integrated circuit design can introduce intolerable perfor-
mance degradation. Since these parasitics are unavoidable, the main concern in analog layout
synthesis is to control the effects of the parasitics on circuit performance and to make sure that
the circuit after layout still performs within its specifications.
Traditionally, this has been done as shown in Fig. 2.1(a). During layout design, the layout
is optimized without quantifying the performance degradation. Therefore, it is not guaranteed
that the resulting layout will also meet the specifications and a post-layout verification of the
circuit with extracted layout parasitics is needed. If it turns out that the circuit does not meet its
specifications, one or more time consuming layout iterations are needed. Another problem with
this approach is that the extracted list of parasitics is huge, and that the layout designer, or the
layout synthesis tool, has no clue which parasitics are responsible for the performance constraint
violations.
The goal of a performance driven layout strategy (see Fig. 2.1(b)) is to drive the layout tools
directly by performance constraints, such that the final layout, with parasitic effects, still satisfies
the specifications of the circuit.
[Figure 2.1: Layout generation flows from specs to layout. (a) Traditional approach with a post-layout "specs met?" check; (b) performance driven layout generation.]
1. Design Parameters
Design parameters are the values of device characteristics which can be directly manipulated
by the circuit designer, such as the width and the length of a MOSFET transistor, or
capacitance and resistance values of passive circuit components.
2. Process Parameters
Process parameters are used to characterize the technology process. Their values are specified
by the foundry and cannot be controlled by the designer. Examples of process parameters
are the $K_P$ and the $V_{T0}$ of a MOS transistor.
3. Layout Parasitics
In general, a layout parasitic can be defined as every cause of performance degradation
which is not intended by the circuit designer and whose value is determined by the layout
of the circuit. Examples are the parasitic capacitance of a circuit node, or the coupling
capacitance between two circuit wires.
Due to parametric fluctuations in the manufacturing process, the process and to some extent also
the design parameters are statistical in nature and have to be treated as random variables with
a distribution function. In practice, it is impossible to design a circuit in one step, taking all
parameters and their stochastic nature into account. Therefore, the design process is usually
divided in consecutive steps, during which a subset of the parameters is determined, while the
effect of the others is approximated or discarded. The following steps are usually executed:
[Figure 2.2: Intervals of the performance characteristic P in the successive design steps, panels (a) to (d).]
which contains all possible values of P if process variations are taken into account (see
Fig. 2.2(c)):
(2.2)
3. Layout Generation
The various parasitic layout effects that will be discussed in sections 2.5-2.8 result in an
additional performance variation $\Delta P_{lay}$. For the circuit still to be functional, the value of
P after layout and with process variations has to be in the interval $[P_{min}, P_{max}]$. It can be
derived from Fig. 2.2 that this results in the following two constraints on $\Delta P_{lay}$:
(2.3)
and
(2.7)
2.3 Previous Work in Performance Driven Layout Generation 2S
The goal of a performance driven layout system is thus to generate a layout such that the layout
induced performance degradation $\Delta P_{lay}$ lies within the interval specified by (2.5), with the limits
given by (2.6) and (2.7), for each performance characteristic of the circuit (see Fig. 2.1(b)).
Expressions (2.5), (2.6) and (2.7) are the input of a performance driven layout tool.
In the net-based approach [Dun 84, Ogaw 86, Hau 87, Gao 91], timing requirements are trans-
lated into constraints on the length of nets, which are then used to drive the layout algorithms.
The place and route tools try to generate a layout such that the constraints on the length of nets
are satisfied and hence, also the timing requirements. Several algorithms were proposed to gen-
erate the bounds on the net length: examples are the zero-slack algorithm in [Hau 87, Nair 89]
and a convex programming technique which tries to maximize the flexibility of the layout tools
in [Gao 91].
The conversion of timing requirements to net constraints yields a convenient way to perform
performance driven layout. However, it also over-constrains the layout tools: the solution
space of feasible net lengths satisfying timing constraints is very large and calculating net length
bounds before layout is equivalent to picking just one solution. From the viewpoint of the layout
algorithms, the chosen net bounds are random. It is not known how to iterate in order to converge
to net lengths which would lead to realizable layouts.
26 Performance Driven Layout of Analog Integrated Circuits
(2.8)
where
$$C_j = \frac{-1}{(x_{j,max} - x_{j,min})^2} \qquad (2.9)$$
(2.10)
(2.11)
If the layout tools fail to meet one of the derived parasitic constraints, one or more iterations
with another set of constraints are needed. This methodology has been applied to channel
routing [Choudhury 90c], area routing [Malavasi 90, Malavasi 93], placement [Charbon 92,
Charbon 94a] and compaction [Pelt 93].
2.3.2.3 GELSA
In [Prieto 97] a performance-driven placement algorithm for analog integrated circuits was presented.
The placement algorithm is based on simulated annealing and uses a slicing style placement
representation (see section 4.5.1). The slicing style placement representation makes it
possible to integrate a heuristic global routing algorithm in the placement optimization loop, which
results in accurate estimates for interconnect parasitics and routing area. The disadvantage of
this approach is that it only works with a slicing style placement representation, which is a poor
choice for analog layout, as will be explained in section 4.5.1.
2.3.3 Discussion
A common feature of the presented analog performance driven layout tools is that they are driven
by parasitic constraints. The performance constraints are seen as too abstract for the tools to
handle directly, and are therefore mapped into a set of bounds on parasitics, either by the designer
or by a constraint generation algorithm. This approach resembles the net-based approach in
digital timing-driven layout, where the high-level timing requirements are mapped into bounds
on net parasitics, which are then enforced by the layout tools. Unfortunately, it also suffers from
comparable drawbacks.
[Figure 2.4: (a) Indirect, constraint-based layout flow; (b) direct performance driven layout flow.]
First, converting performance specifications into only one set of parasitic bounds is overly
constraining. If the tools fail to meet the selected set of parasitic constraints, a new set has to
be generated and the layout process has to be repeated (see Fig. 2.4(a)). While this conversion
yields a convenient way to handle performance constraints during layout design, it may cause
unnecessary iterations. Valid solutions in terms of performance may be rejected because they
don't satisfy an a priori selected set of parasitic constraints.
Secondly, the criterion which is used to select a set of parasitics, the flexibility of the layout
tools, is something which is very hard, if not impossible to quantify. Although the model (2.8)
might be a good approximation in some cases, in general it is far from realistic since it only
takes the value of the parasitic constraint into account. The difficulty of embedding a net during
routing is also determined by the area of the terminals it has to connect and the distance between
them, the obstacles present in the layout area, previously routed nets, etc. Most of these factors
are impossible to determine a priori.
It can be concluded that this indirect constraint-based approach overly constrains the layout
tools by imposing only one set of parasitic bounds and that the criterion used to determine the
set of constraints is of no general use.
the performance constraints (see Fig. 2.4(b)). The cost functions used to quantify intermediate
place and route solutions are based on an evaluation of the performance degradation $\Delta P_{lay}$ that
would result from accepting the solution: when $\Delta P_{lay}$ exceeds the allowed margins, the solution
is penalized.
The evaluation of performance degradation is based on a three-step methodology. First, the
relevant geometrical information is extracted from an intermediate layout solution. Based on this
information, the values of the parameters which model the parasitic effects are calculated. Finally,
the influence of the parasitic layout effects on the performance characteristics of the circuit is
evaluated using a linear approximation based on performance sensitivities.
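The last step of this methodology can be illustrated with a short sketch. It is not the implementation used in the layout tools: the function names, the sensitivity values, the parasitic values and the margins are all hypothetical, and the degradation of each performance characteristic is assumed to be the sensitivity-weighted sum of the parasitic values.

```python
def performance_degradation(sensitivities, parasitics):
    """Linear estimate: delta_P[j] = sum_i S[j][i] * x[i]."""
    return [sum(s * x for s, x in zip(row, parasitics)) for row in sensitivities]


def violation_penalty(delta_p, lower, upper):
    """Total amount by which the delta_P[j] leave their intervals [lower[j], upper[j]]."""
    penalty = 0.0
    for dp, lo, hi in zip(delta_p, lower, upper):
        if dp < lo:
            penalty += lo - dp
        elif dp > hi:
            penalty += dp - hi
    return penalty


# Hypothetical data: 2 performance characteristics, 3 parasitic capacitances (fF).
S = [[-0.5, -0.1, 0.0],   # assumed sensitivities of characteristic 0
     [0.0, -0.2, -0.4]]   # assumed sensitivities of characteristic 1
x = [10.0, 5.0, 2.0]      # parasitic values extracted from an intermediate layout

delta_p = performance_degradation(S, x)
penalty = violation_penalty(delta_p, lower=[-5.0, -3.0], upper=[0.0, 0.0])
print(delta_p, penalty)
```

In a cost function, such a penalty term would be zero for every solution that respects the margins, so that valid solutions can still be ranked on other criteria.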
This direct method has several advantages. First, by directly taking into account the high-
level performance specifications, a complete and sensible trade-off between the different alter-
native solutions can be made. Secondly, since the performance degradation is calculated at run
time, it is not only possible to keep all performance characteristics within their limits, but also
to optimize the layout with respect to other constraints, such as yield and testability. Finally,
while the methodology described in [Choudhury 93, Charbon 93] can lead to a number of itera-
tions with different parasitic constraints, our tools will either yield a correct layout or will flag the
specifications as being impossible to meet, without iterations. This approach therefore eliminates
the feedback route between constraint derivation, placement/routing and layout extraction.
$$S_{x_i}^{P_j} = \frac{\partial P_j}{\partial x_i}, \qquad (2.13)$$
to DC, AC and transient sensitivity computation. Although the details are different, the general
principle of the method is the same in the three cases. We will illustrate the different methods for
the simplest case, DC sensitivity computation. For details on sensitivity computation, we refer
to [Vlach 83, Hoc 85, Dir 69].
The DC solution of a circuit can be computed by solving a system of nonlinear algebraic
equations [Vlach 83]:
$$f(x_0, h, w) = 0 \qquad (2.14)$$
where $x_0$ is the vector of the voltages and currents, h is the vector of parameters and w represents
DC sources. The purpose of DC sensitivity analysis is to determine the derivative of one or more
elements of $x_0$ with respect to one or more elements of h.
(2.15)
This brute force approach has several disadvantages. First, the finite difference approximation
$\Delta x_0 / \Delta h$ tends to the differential sensitivity only in the limit as $\Delta h \to 0$, and using a very small
value for $\Delta h$ is numerically unstable because of roundoff errors. Second, a complete sensitivity
analysis requires the formulation and solution of (2.14) for each component of h, which results
in prohibitively high computational cost. The advantage of the approach is its straightforward
implementation and general applicability.
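The trade-off in the choice of the perturbation size can be illustrated on a small example. The sketch below is an assumption-laden illustration, not our circuit analysis tool: it uses a hypothetical series resistor and diode circuit, solves its DC equation with Newton-Raphson, and compares the finite difference sensitivity of the diode voltage to the resistance R with the value obtained by implicit differentiation. All component values are invented.

```python
import math


def solve_dc(R, Vs=1.0, Is=1e-14, Vt=0.025, tol=1e-14, max_iter=200):
    """Newton-Raphson solution of f(v) = (v - Vs)/R + Is*(exp(v/Vt) - 1) = 0."""
    v = 0.6  # initial guess for the diode voltage
    for _ in range(max_iter):
        f = (v - Vs) / R + Is * (math.exp(v / Vt) - 1.0)
        dfdv = 1.0 / R + Is / Vt * math.exp(v / Vt)
        step = f / dfdv
        v -= step
        if abs(step) < tol:
            break
    return v


def perturbation_sensitivity(R, dR):
    """Finite difference estimate of dv/dR: two complete DC solutions."""
    return (solve_dc(R + dR) - solve_dc(R)) / dR


def analytic_sensitivity(R, Vs=1.0, Is=1e-14, Vt=0.025):
    """Reference value from implicit differentiation: dv/dR = -(df/dR)/(df/dv)."""
    v = solve_dc(R, Vs, Is, Vt)
    dfdR = -(v - Vs) / R ** 2
    dfdv = 1.0 / R + Is / Vt * math.exp(v / Vt)
    return -dfdR / dfdv


exact = analytic_sensitivity(1000.0)
for dR in (100.0, 1.0, 1e-3, 1e-9, 1e-12):
    approx = perturbation_sensitivity(1000.0, dR)
    print(f"dR = {dR:8.1e}  relative error = {abs(approx - exact) / abs(exact):.2e}")
```

Large perturbations suffer from truncation error, while very small ones are dominated by roundoff in the difference of two nearly equal solutions, so an intermediate value has to be chosen.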
(2.16)
$$M \frac{\partial x_0}{\partial h} = -\frac{\partial f}{\partial h} \qquad (2.17)$$
where $M = \partial f / \partial x_0$ is the Jacobian matrix about the operating point. To obtain $\partial x_0 / \partial h$, (2.17) can
be solved using the LU factors of M, which can be reused from the original Newton-Raphson
solution of (2.14). This method generates the sensitivity of the whole vector $x_0$ with respect to
one single parameter h. For each additional parameter h, (2.17) has to be solved again.
2.4 A Direct Performance Driven Layout Strategy 31
The adjoint method [Dir 69] can be used to compute the sensitivity of any scalar variable $\phi(x_0)$
to a parameter h. To illustrate the method, we restrict $\phi(x_0)$ to be a linear combination of the
components of $x_0$:
(2.18)
where d is a constant vector. Extension to any function $\phi(x_0, h)$ is possible [Vlach 83]. To
compute the sensitivity of $\phi$ with respect to h, we differentiate (2.18):
(2.19)
$$\frac{\partial x_0}{\partial h} = -M^{-1} \frac{\partial f}{\partial h}. \qquad (2.20)$$
Combining equations (2.19) and (2.20) gives:
$$\frac{\partial \phi}{\partial h} = -d^T M^{-1} \frac{\partial f}{\partial h} \qquad (2.21)$$
We now define an adjoint vector x' through the relation
(2.22)
(2.24)
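The difference between the direct and the adjoint method can be made concrete with a small numerical sketch. Everything in it is an assumption made for illustration: the circuit is a hypothetical two-node resistive network with nodal equations $f(x, h) = M(h) x - w = 0$, the parameters are the three conductances, $\phi$ is the voltage of the second node, and the adjoint vector `xa` is taken as the solution of $M^T x_a = -d$, which is one common convention and which turns (2.21) into $\partial \phi / \partial h = x_a^T \, \partial f / \partial h$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def transpose(A):
    return [list(row) for row in zip(*A)]


def jacobian(g1, g2, g3):
    """Nodal matrix M: g1 node 1 to ground, g2 between the nodes, g3 node 2 to ground."""
    return [[g1 + g2, -g2], [-g2, g2 + g3]]


def df_dh(x):
    """df/dh for h = (g1, g2, g3), one vector per parameter."""
    return [[x[0], 0.0], [x[0] - x[1], x[1] - x[0]], [0.0, x[1]]]


g = (1e-3, 2e-3, 4e-3)   # hypothetical conductances (S)
w = [1e-3, 0.0]          # 1 mA injected into node 1
d = [0.0, 1.0]           # phi = d^T x = voltage of node 2

M = jacobian(*g)
x0 = solve(M, w)

# Direct method: one solve of M dx/dh = -df/dh per parameter, as in (2.17).
direct = []
for col in df_dh(x0):
    dx_dh = solve(M, [-c for c in col])
    direct.append(sum(di * xi for di, xi in zip(d, dx_dh)))

# Adjoint method: a single extra solve serves all parameters.
xa = solve(transpose(M), [-di for di in d])
adjoint = [sum(a * c for a, c in zip(xa, col)) for col in df_dh(x0)]

print(direct)
print(adjoint)
```

Both lists agree to within rounding, but the direct method needs one linear solve per parameter whereas the adjoint method needs one per output $\phi$, which is why it scales well when the number of parasitics is large.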
2.4.2.4 Discussion
To compute performance degradation with the linear approximation (2.12) as proposed in section 2.4.1,
the sensitivities of all performance characteristics with respect to all layout parasitics
are needed. If there are $N_p$ performance specifications and $N_x$ parasitic layout effects, $N_p \times N_x$
sensitivities need to be computed. From the presentation of the different methods in the previous
sections, it can be concluded that the adjoint method of sensitivity computation is best suited for
such a calculation. Only two systems of equations have to be solved, irrespective of the number
of performance characteristics or parasitics involved.
The problem however is that the adjoint method of sensitivity is not supported in any circuit
simulator available for this work. The commercial circuit simulator HSPICE [MS 92] only sup-
ports DC sensitivity computation. Sensitivity analysis based on the direct computation method
was reported for SPICE3 [Choudhury 88], but our experiments revealed that the implementa-
tion was not robust enough for practical use. Therefore, we have implemented the perturbation
method in our circuit analysis tool. Although this method suffers from several drawbacks, as
discussed in section 2.4.2.2, it offers the advantage of general applicability and easy implementation.
An implementation of the direct or adjoint method of sensitivity computation requires
access to the source code of a circuit simulator and constitutes a considerable coding effort.
Since the focus of this research is on the design and implementation of performance driven layout
algorithms, we have used the simpler but less efficient perturbation method. This can simply
be substituted by any more efficient sensitivity calculation method when one becomes available.
First, they act as additional loads for the devices in the circuit. As such, they cause the poles
and the zeros of the circuit to shift, which degrades the small signal characteristics of the circuit.
A typical example of this is the parasitic capacitance of the output node of a load-compensated
operational transconductance amplifier (OTA), which influences the bandwidth and the phase
margin of the circuit. Large-signal characteristics, such as the slew rate, can also be affected.
Secondly, the parasitic elements which are present between wires implementing different nets
introduce unexpected signal coupling into a circuit. This effect degrades the noise performance
of the circuit or may even destroy its stability through unwanted feedback. Since there is no
concept of a noise margin for analog circuits, all noise is generally bad. Many analog signals
have very small amplitudes and are thus very sensitive to noise. An example of a performance
characteristic which is heavily influenced by parasitic coupling elements is the Power Supply
Rejection Ratio (PSRR). A coupling capacitance between one of the input nodes of an amplifier
and the power supply node can be disastrous for the PSRR.
model. The complete interconnection is then modeled as a cascade of lumped element sections.
For an interconnection piece of length D, at a given frequency $\omega$, exact T and $\Pi$ equivalent
circuits can be found. These circuits are shown in Fig. 2.6. In this figure, D is the length of
the interconnection segment, $Z_0$ the characteristic impedance and $\gamma = \alpha + j\beta$ the propagation
function. $Z_0$ and $\gamma$ are given by:
$$Z_0 = \sqrt{\frac{R_0 + j\omega L_0}{G_0 + j\omega C_0}} \qquad (2.25)$$
$$\gamma = \sqrt{(R_0 + j\omega L_0)(G_0 + j\omega C_0)} \qquad (2.26)$$
where $R_0$, $L_0$, $G_0$ and $C_0$ are the resistance, inductance, conductance and capacitance per unit
length, respectively.
If the length of the interconnection segment is sufficiently small, i.e. $|\gamma D| \ll 1$, the following
simplified expressions can be derived for the impedance values of Fig. 2.6:
$$Z_0 \tanh\left(\frac{\gamma D}{2}\right) \approx \frac{(R_0 + j\omega L_0) D}{2} \qquad (2.27)$$
$$\frac{Z_0}{\sinh(\gamma D)} \approx \frac{1}{(G_0 + j\omega C_0) D} \qquad (2.28)$$
The interconnect model presented above is far too complex for use in a performance driven
layout tool and some approximations must be made. For analog circuits operated at relatively
low frequency (long wavelength), each wire is much shorter than one wavelength, and can thus
be treated as an individual lumped element. A further simplification can be made by observing
that $G_0 \ll \omega C_0$ and $\omega L_0 \ll R_0$ generally hold for CMOS analog circuits (typical values per unit length are:
$\omega = 2\pi \times 100$ MHz, $L_0 = 10^{-15}$ H, $R_0 = 0.01$ $\Omega$, $G_0 = 10^{-12}$ $\Omega^{-1}$, $C_0 = 10^{-16}$ F).
This means that the inductance $L_0$ and the conductance $G_0$ can be neglected in the models of
Fig. 2.6.
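The two inequalities can be checked numerically with the typical values quoted above. The snippet below is only a worked arithmetic check; the variable names are ours, and the ratios are dimensionless as long as all four quantities refer to the same unit of length.

```python
import math

omega = 2 * math.pi * 100e6                   # angular frequency at 100 MHz (rad/s)
L0, R0, G0, C0 = 1e-15, 0.01, 1e-12, 1e-16    # typical per-unit-length values from the text

conductance_ratio = G0 / (omega * C0)   # G0 << omega*C0 requires this to be << 1
inductance_ratio = (omega * L0) / R0    # omega*L0 << R0 requires this to be << 1

print(f"G0 / (omega C0) = {conductance_ratio:.2e}")
print(f"omega L0 / R0   = {inductance_ratio:.2e}")
```

Both ratios come out several orders of magnitude below one, which supports dropping $L_0$ and $G_0$ at this frequency.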
Based on the two approximations explained above, we use the interconnect model shown in
Fig. 2.7 in our performance driven layout tools. An n-terminal net i is modeled by n parasitic
resistors $R_{ij}$, $j = 1, \ldots, n$, one parasitic capacitance to ground $C_i$ and a parasitic coupling capacitance
$C_{ij}$, $j = 1, \ldots, N - 1$, to each other node j in the circuit, where N is the number of nodes
in the circuit.
2.5 Interconnect Parasitics 35
[Figure 2.6: Exact T and $\Pi$ equivalent circuits of an interconnection segment of length D.]
Numerical Methods
Geometrical Methods
Geometrical methods extract interconnect capacitances directly from the layout geometry us-
ing parameterized models for commonly encountered interconnect configurations that are stored
in a database. Geometrical methods are fast and reasonably accurate, and therefore, they are
good candidates for use in a performance driven layout tool.
In our layout tool, we use geometrical methods to compute interconnect parasitics during
placement and routing. During placement, calculations must be made based on estimates of the
wire topology. During routing, the actual wire topology can be used. Although the details of the
method differ for the two problems, the general principle is the same in both cases. The method
is based on techniques reported in [Arora 96] and [Choudhury 95].
The layout geometry is first reduced into a set of boxes on different conducting layers. The
capacitance of each box is then estimated as the sum of three components (see Fig. 2.8):
• area capacitance due to overlap with conductors on different layers. The parasitic capacitance
component $C_{A_{XY}}$ can be computed using the following well known equation for
parallel plate capacitance:
$$C_{A_{XY}} = \frac{\epsilon}{d} \times A_{XY}, \qquad (2.29)$$
where $\epsilon$ is the permittivity of the material between the conductors, d is the vertical distance
between the conductor layers and $A_{XY}$ is the overlap area.
[Figure 2.8: Capacitance components of a box with respect to conductors on Layer 1, Layer 2 and Layer 3.]
• lateral capacitance with respect to conductors on the same layer. For two conductors on
the same layer, running in parallel over a length $l_{XY}$ and separated by a distance d, the
lateral capacitance component can be computed as:
where F(d) is the capacitance per unit edge length of two parallel conductors as a function
of separation distance d. The function F(d) can be fitted to the following form:
$$F(d) = C_0 + \frac{C_1}{d} + \frac{C_2}{d^2} + \frac{C_3}{d^3} + \frac{C_4}{d^4}, \qquad (2.31)$$
where $C_0$, $C_1$, $C_2$, $C_3$ and $C_4$ are technology dependent constants that can be determined by
fitting (2.31) to simulated or measured capacitance data.
(2.32)
where $l_{XY}$ is the parallel length between the conductors, $C_{F0}$ is the maximum value of the
fringe capacitance, and $X_2$ and $X_1$ are the distances from the edge of the first conductor
to the near and far surface edges of the second one. $X_0$ is a measure of how the fringe
capacitance varies for incremental length of the fringing surface. The constants in this
model, $C_{F0}$ and $X_0$, are technology dependent and have to be fitted to simulated or measured
data.
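As an illustration of how such geometrical models are evaluated, the sketch below computes the area and lateral components for one pair of boxes. It is a minimal example under stated assumptions: the function names and all numerical values are hypothetical, the lateral component is assumed to be F(d) multiplied by the parallel run length, and the fringe component is left out.

```python
EPS_SIO2 = 3.9 * 8.854e-18   # permittivity of silicon dioxide in F/um (assumed dielectric)


def area_capacitance(eps, d, overlap_area):
    """Parallel plate estimate, as in (2.29): C = eps / d * A."""
    return eps / d * overlap_area


def lateral_per_unit_length(d, coeffs):
    """F(d) = C0 + C1/d + C2/d^2 + C3/d^3 + C4/d^4, as in (2.31)."""
    return sum(c / d ** k for k, c in enumerate(coeffs))


def lateral_capacitance(d, parallel_length, coeffs):
    """Assumed form of the lateral component: F(d) times the parallel run length."""
    return lateral_per_unit_length(d, coeffs) * parallel_length


# Hypothetical fitting constants C0..C4 (F/um, F, F*um, ...), not taken from any process.
COEFFS = (1e-17, 4e-17, 0.0, 0.0, 0.0)

c_area = area_capacitance(EPS_SIO2, d=1.0, overlap_area=100.0)            # 100 um^2 overlap
c_lat = lateral_capacitance(d=2.0, parallel_length=50.0, coeffs=COEFFS)   # 50 um parallel run
print(f"area: {c_area * 1e15:.3f} fF, lateral: {c_lat * 1e15:.3f} fF")
```

Because every component is a closed-form expression of a few geometrical quantities, the evaluation is cheap enough to be repeated for each intermediate layout solution.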
An accurate prediction of the parasitic resistance of an interconnection again requires the nu-
merical solution of the Laplace equation to determine the relation between the terminal potentials
and the terminal currents. The same numerical techniques which are used to compute parasitic
capacitance can be used to compute parasitic resistance: the finite-difference method [Harb 86],
the finite-element method [Hall 87] and the boundary-element method. These methods yield
very accurate results and are very general at the expense of large CPU-times. Therefore they are
mostly used in circuit extractors. For the repeated evaluation of parasitic resistance in the inner
loop of layout optimization tools a fast and reasonably accurate geometrical method has to be
used.
Geometrical Methods
where $l_i$ and $w_i$ are the length and the width of the rectangle and $\rho_{\square,i}$ its sheet resistance. The
parasitic resistances of an interconnection can then be determined by series/parallel combination
of all the $R_i$.
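A small sketch of this geometrical resistance estimate is given below. It assumes the usual sheet resistance relation, resistance equals sheet resistance times length over width, for each rectangle; the helper names and the wire dimensions are hypothetical.

```python
def rect_resistance(length, width, sheet_resistance):
    """Resistance of a rectangle along its length: R = R_sheet * l / w."""
    return sheet_resistance * length / width


def series(*resistances):
    return sum(resistances)


def parallel(*resistances):
    return 1.0 / sum(1.0 / r for r in resistances)


R_SHEET = 0.05   # ohm per square, hypothetical metal layer

r1 = rect_resistance(100.0, 2.0, R_SHEET)   # 100 um x 2 um segment
r2 = rect_resistance(50.0, 1.0, R_SHEET)    # 50 um x 1 um segment

r_series = series(r1, r2)       # the two segments in one path
r_parallel = parallel(r1, r2)   # the two segments as parallel branches
print(r1, r2, r_series, r_parallel)
```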
$$C_{jSBt} = \frac{A_S C_j}{\left(1 - \frac{V_{BS}}{\phi_j}\right)^{m_j}} + \frac{P_S C_{jsw}}{\left(1 - \frac{V_{BS}}{\phi_j}\right)^{m_{jsw}}}, \qquad (2.34)$$
where $C_{jSBt}$ refers to the total source (or drain) bulk capacitance, $A_S$ and $P_S$ to the source (or
drain) area and perimeter, $C_j$ and $C_{jsw}$ to the bottom and sidewall junction capacitances in absence
of any junction voltage $V_{BS}$ and $\phi_j$ to the built-in junction potential. $m_j$ and $m_{jsw}$ depend on
the doping profile of the junction. Equation (2.34) reveals that this capacitance can be reduced
by minimizing the size of all diffusions. In particular, large FET devices can be folded to allow
a single source or drain diffusion to be shared by two gate regions. Device folding is illustrated
in Fig. 2.9(a). An additional large saving in diffusion capacitance can be made by device merging,
i.e. placing devices such that diffusion geometry is shared between electrically connected
devices as shown in Fig. 2.9(b). This type of geometry sharing has the additional benefit of
improving the layout density. If spacing rules permit, additional capacitance and resistance can
be saved by making the connection between some adjacent devices by abutment, rather than by
explicit wiring (see Fig. 2.9(c)).
[Figure 2.9: (a) Device folding; (b) device merging; (c) connection by abutment.]
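The effect of diffusion size on the junction capacitance can be illustrated numerically. The sketch below uses the standard two-term junction capacitance model with a bottom and a sidewall contribution; all parameter values, the diffusion dimensions and the simple area and perimeter bookkeeping for the folded device are hypothetical, and the junction voltage `v` is taken negative for reverse bias.

```python
def junction_capacitance(area, perimeter, v, cj, cjsw, phi=0.8, mj=0.5, mjsw=0.33):
    """Bottom plus sidewall junction capacitance at junction voltage v (negative = reverse bias)."""
    bottom = area * cj / (1.0 - v / phi) ** mj
    sidewall = perimeter * cjsw / (1.0 - v / phi) ** mjsw
    return bottom + sidewall


CJ = 0.5e-15     # F/um^2, assumed zero-bias bottom capacitance
CJSW = 0.3e-15   # F/um, assumed zero-bias sidewall capacitance

W, LD = 20.0, 2.0   # transistor width and drain diffusion length (um)

# Unfolded device: one drain diffusion of W x LD.
unfolded = junction_capacitance(W * LD, 2 * (W + LD), v=-1.0, cj=CJ, cjsw=CJSW)

# Folded in two fingers sharing one drain diffusion of (W/2) x LD.
folded = junction_capacitance(W / 2 * LD, 2 * (W / 2 + LD), v=-1.0, cj=CJ, cjsw=CJSW)

print(f"unfolded: {unfolded * 1e15:.2f} fF, folded: {folded * 1e15:.2f} fF")
```

With these assumed numbers the shared drain roughly halves the diffusion capacitance, which is the kind of saving the placement tool can exploit when it selects a device implementation.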
2.7 Mismatch
Mismatch is defined in [Pel 89] as the process that causes process-induced, time-independent
random variations in physical quantities of identically designed devices. Since the functionality
of analog circuits is often based on ratios of devices, mismatch puts a fundamental limit
• the total mismatch of parameter P is composed of many single events of the mismatch-
generating process;
• the effects on the parameter are so small that the contributions to the parameter can be
summed;
• the events have a correlation distance much smaller than the device dimensions.
Consequently, the values of the mismatch of parameter P are normally distributed with zero
mean. Examples of mismatch processes of this class are: distribution of ion-implanted, diffused,
or substrate ions, local mobility fluctuations, oxide charges, etc. On the other hand, the circular
parameter-value distributions which originate from wafer fabrication and the oxidation process
are explained by a second class of mismatch: deterministic processes (e.g. gradients) which can
be modeled as an additional stochastic process with a long correlation distance.
A mathematical treatment of these two classes of mismatch behavior yields the following
model for the mismatch of a parameter P [Pel 89]:
(2.35)
where $A_P$ and $S_P$ are technology dependent area and distance proportionality constants for
parameter P. Applied to the threshold voltage $V_{T0}$, the current factor $\beta$ and the substrate factor
K of a MOS transistor, this gives:
$\sigma^2(V_{T0}) =$ (2.36)
$\sigma^2(K) =$ (2.37)
$\frac{\sigma^2(\beta)}{\beta^2} =$ (2.38)
where $A_X$ and $S_X$ are process-related area and distance constants for parameter X. The
validity of this model has been verified by measurements.
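To give a feeling for the orders of magnitude involved, the sketch below evaluates a mismatch model with one area term and one distance term. It assumes the form commonly associated with [Pel 89], a variance equal to $A^2/(WL) + S^2 D^2$ for devices of width W and length L placed at a distance D; the function name and the constants are hypothetical and do not refer to a particular technology.

```python
import math


def mismatch_sigma(W, L, D, A, S):
    """Standard deviation of the parameter mismatch: sqrt(A^2 / (W*L) + S^2 * D^2)."""
    return math.sqrt(A ** 2 / (W * L) + S ** 2 * D ** 2)


A_VT0 = 10.0    # mV*um, assumed area constant for the threshold voltage
S_VT0 = 0.002   # mV/um, assumed distance constant

sigma_small = mismatch_sigma(10.0, 10.0, D=0.0, A=A_VT0, S=S_VT0)     # 10 um x 10 um devices
sigma_large = mismatch_sigma(20.0, 20.0, D=0.0, A=A_VT0, S=S_VT0)     # four times the area
sigma_far = mismatch_sigma(10.0, 10.0, D=1000.0, A=A_VT0, S=S_VT0)    # devices 1 mm apart

print(sigma_small, sigma_large, sigma_far)
```

Under this assumed model, quadrupling the gate area halves the random component, while a large device spacing lets the distance term dominate, which motivates placing matched devices close together.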
In [Bast 95] the mismatch of small size MOS transistors was characterized. The measurements
of the current factor mismatch were in good agreement with the model (2.38), but for the
area dependence of the threshold voltage mismatch significant deviations from model (2.36) were
observed. It was found that the linear dependency of the threshold voltage mismatch on the inverse
of the square root of the effective channel area no longer holds for small length transistors.
This observation was explained by the fact that for small geometry MOS devices, the channel
depletion thickness can no longer be considered uniform: it is a function of the channel length,
the channel width and the drain voltage. An extended threshold voltage mismatch model taking
these effects into account was given:
$$\sigma^2(V_{T0}) = \frac{A_{1V_T}^2}{WL} + \frac{A_{2V_T}^2}{WL^2} - \frac{A_{3V_T}^2}{W^2 L} + S_{V_T}^2 D^2 \qquad (2.39)$$
This new model is able to accurately predict the threshold voltage mismatch for small size
MOS transistors.
• Same Structure
Matching devices should have the same structure. For instance, a poly-poly capacitor can
not be matched with a metal-poly capacitor. Due to the large spreading on the absolute val-
ues of process parameters, they can never be used to design predictable device parameter
ratios.
• Same Temperature
Since device parameters are sensitive to temperature variations, matching devices should
have the same local temperature. Power dissipating devices cause a temperature distribu-
tion across the layout area. Matching devices should be placed on isotherms. The influence
of temperature effects on circuit performance will be discussed in section 2.8.
• Common-centroid geometries
Four different layout styles for a MOS transistor pair are shown in Fig. 2.11: a finger
structure (Fig. 2.11(a)), an interdigitated finger structure (Fig. 2.11(b)), a common-centroid
structure (Fig. 2.11(c)) and an interdigitated waffle structure (Fig. 2.11(d)). The mismatch
of these structures has been measured and compared in [Bas 96b]. It was found that the
interdigitated waffle and the common centroid (quad) transistor layout structures show no
systematic mismatch, and that the matching follows the model described in the previous
section. Under induced die stress due to packaging, the finger style transistor pair and the
interdigitated finger structure show a fluctuation in $\beta$ matching up to 5 times higher than
predicted by the model. This can be explained by piezoresistive effects due to residual
stresses induced into the silicon chip by packaging. It has been shown that this effect can
significantly degrade the matching performance of MOS transistors [Bast 96c]. Layout
structures with a 2-axial symmetry, like the common centroid structure, can be used to
cancel out stress, process and thermal gradients in every direction.
• Same Orientation
Anisotropic process steps cause asymmetries in process parameters and the silicon sub-
strate itself can be anisotropic. The mismatch caused by these effects can be avoided by
placing matching devices with equal orientations and such that the current flow is strictly
parallel and in the same direction (see Fig. 2.12).
• Same Surroundings
Devices with un-identical surroundings can show a considerable mismatch. There are
various possible reasons for this effect, which are not always clear. This problem can be
solved by implementing dummy devices to simulate the same surroundings (see Fig. 2.13).
2.8 Thermal Effects 43
[Figure 2.11: Layout styles for a MOS transistor pair: (a) finger structure; (b) interdigitated finger structure; (c) common-centroid structure; (d) interdigitated waffle structure.]
Figure 2.13: Matched resistor pair with dummy resistor strips added to improve matching.
bles for approximately every 10 °C increase in temperature, hot spots due to excessive local
power dissipation have become a major long-term reliability concern in many applications. In
addition, the temperature dependence of device parameters results in thermally induced performance
degradation on the circuit level. This performance degradation can be due to a shift in
absolute temperature of individual components, or can be caused by temperature gradients between
components. An example of the first category is reference voltage shift in regulators and
data converters [Fuka 76]. The second category includes input offset voltage, offset voltage drift
and unwanted dc-feedback in differential amplifiers [Sol 74]. To control this thermally-induced
performance degradation, it has become essential to take thermal effects into account during
layout design.
(2.43 )
where $L_z = \sum_{j=1}^{N} L_j$ is the total height of the structure. Power is generated uniformly in the
surface source with power density $Q_0''$:
$$k_N \left. \frac{\partial T_N}{\partial z} \right|_{z=0} = p(x, y), \qquad (2.44)$$
with:
$$p(x, y) = \begin{cases} Q_0'' & \text{for } |x| \le \frac{w}{2},\ |y| \le \frac{h}{2} \\ 0 & \text{elsewhere} \end{cases} \qquad (2.45)$$
With these assumptions, the temperature distribution on the top surface of the N -layer structure
can be written as a double Fourier series [Alb 95]:
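$$T(x, y) = \frac{4 Q_0''}{k_N} \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} \tau_N(n, m) \cdot \frac{\sin\frac{n \pi w}{L_x} \, \sin\frac{m \pi h}{L_y}}{(1 + \delta_{n0}) n \pi \, (1 + \delta_{m0}) m \pi} \cdot \cos\frac{n \pi x}{L_x} \cdot \cos\frac{m \pi y}{L_y} \qquad (2.46)$$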
The Fourier coefficients $\tau_N(n, m)$ depend upon the thermal conductivities $k_j$ and thicknesses
$L_j$ of the layers in the multi-layer structure and can be computed with the following recursive
relation:
$$\tau_1(n, m) = \tanh(\gamma(n, m) L_1) \qquad (2.47)$$
$$\tau_N(n, m) = \frac{k_{N-1} \tanh(\gamma(n, m) L_N) + k_N \tau_{N-1}(n, m)}{k_{N-1} + k_N \tau_{N-1}(n, m) \tanh(\gamma(n, m) L_N)} \qquad (2.48)$$
Using (2.48) and (2.49) to compute the Fourier coefficients, it is straightforward to compute
(2.46) for the two-layer and the four-layer case. These results are in exact formal agreement with
the expressions derived in [Van Pet 93] for the two-layer case and in [Kokkas 74] for the four-
layer case. The two-layer results derived in [Van Pet 93] have been experimentally verified using
a thermal measurement procedure based on a BiCMOS diode matrix [Van Pet 93, Geer 93]. This
and other studies [Lee 88] have shown that, when the heat source edges are at least one structure
thickness away from the boundaries of the rectangular structure, the thermal profiles are weakly
affected by the boundaries and thus the boundaries can be assumed to extend to infinity.
2.8.3 Discussion
The thermal solution (2.46) is expressed in terms of an infinite double Fourier series. In practice,
however, only a finite number of terms can be taken into account. It was shown in [Lee 89] that
the required number of terms in the series for a designated accuracy is directly proportional to
the ratio of the chip to source size. For the analysis of structures with large chip-to-source size
ratios, a considerable amount of CPU time is needed for (2.46) to converge. Consequently, it is
impossible for thermally constrained layout tools to directly evaluate circuit level thermal perfor-
mance degradation using (2.46). To overcome this problem, we have designed an efficient and
reasonably accurate incremental thermal computation scheme based on (2.46). This computation
scheme will be discussed in detail in chapter 4.
If the well potential is allowed to vary with respect to ground, the well itself acts as an injector of
current. Another significant current injection mechanism is impact ionization in MOS transistors.
The reception of substrate noise occurs mostly through capacitive sensing. The parasitic
substrate capacitances that act as injectors of substrate current can also act as receiver. In MOS
transistors, the body effect is also a severe form of substrate interaction. The body effect makes
the drain current dependent on the substrate potential and causes a MOS transistor to pick up
local substrate potential variations. While the capacitive coupling effects become significant
only at relatively high frequencies, the body effect can be an issue at low frequencies.
For a first order calculation of substrate coupling effects, it is adequate to consider the
substrate as a distributed resistance. This approximation is valid for frequencies up to
2 GHz [Ghar 95b]. At frequencies above 4-5 GHz, the error in this assumption may become
too large and a correct model of the substrate has to be obtained by solving Maxwell's equations
in the substrate. For today's designs, however, transmission of substrate noise is mainly due to the
voltage variations induced by the injected current over the distributed substrate resistance.
To study substrate coupling effects, the substrate can be modeled as an n-port resistive network,
where n is the number of contacts to the substrate. A contact is defined as a region where the
circuit interacts with the substrate (see section 2.9.1). Computing a substrate model requires
the computation of all resistance elements in the matrix model. This problem is similar to the
capacitance and resistance extraction problems for interconnects and the same techniques that
are used in interconnect modeling are also used in substrate modeling.
The first substrate modeling technique was introduced in [Johnson 84], where large 3-D resis-
tor networks were used to study substrate coupled switching noise. In [Su 93], the large resistor
network is replaced by a single node to model substrates which use an epitaxial layer grown on
a heavily-doped bulk wafer. In [Verghese 93], a finite difference modeling technique was em-
ployed to generate substrate macro-models. 3D boundary-element methods are also used and
offer the advantage of generating much fewer elements [Smedes 93a, Smedes 95, Ghar 95b].
Because of their computational requirements, the numerical methods mentioned
above are limited to circuits containing a few hundred substrate contacts. In practice,
however, substrate effects occur in large mixed-signal circuits. Therefore, several attempts have
been reported to speed up the computation of substrate resistances. In [Joarder 94], parameterized
lumped models are used to model several different isolation schemes using guard rings. In
[Verghese 95], precomputed point-to-point impedances and several other techniques are used to
speed up the computation of the substrate admittance matrix. Another simplified substrate modeling
technique was introduced in [Genderen 96]. The method involves the use of a common
substrate node to which all contacts are connected via a resistance. Direct coupling resistances
are only computed between neighboring terminals.
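The common-node idea can be illustrated with a few lines of arithmetic. The sketch below is not the method of [Genderen 96]; it only assumes a single common substrate node, a current injected into that node, and a set of grounded contacts (for example guard rings), each with its own resistance to the common node. Direct coupling resistances between neighbors are ignored, and the function name is hypothetical.

```python
def common_node_noise(i_injected, grounded_resistances):
    """Voltage (V) on the common substrate node.

    i_injected           -- current injected into the substrate, in A
    grounded_resistances -- resistances (ohm) between the common node and
                            every contact that is tied to ground
    A high-impedance receiver contact draws no current, so in this
    simplified picture it senses the common-node voltage directly.
    """
    if not grounded_resistances:
        raise ValueError("at least one grounded contact is needed")
    conductance = sum(1.0 / r for r in grounded_resistances)
    return i_injected / conductance
```

With 1 mA injected and two grounded contacts of 100 ohm each, the parallel resistance is 50 ohm and the common node moves by 50 mV; adding more or lower-resistance grounded contacts reduces the noise seen by every receiver.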
type of substrate the performance of the chip can be improved by separating noisy digital and
noise-sensitive analog circuitry.
Another commonly used layout measure is the use of guard rings around the injectors and the
receivers of substrate currents in the circuit. Guard rings are substrate contacts that completely
enclose a given region. They are connected to ground and provide isolation by absorbing the
substrate potential fluctuations generated by the other devices. This technique has been shown
to be effective only when the guard rings are placed very close to the sensitive analog circuits
and are biased using dedicated package pins. In that case, they have the effect of creating a zero
potential ring around the sensitive device, thereby electrically isolating it from the rest of the
circuit.
of the layout generation process. During module generation, the device level interconnect parasitics
are fixed. The placement algorithm determines the minimum values of the interconnect
parasitics while the router fixes their final values.
A second performance degrading layout effect is device mismatch. This mismatch can be
limited by keeping the layouts of matched devices as identical as possible: equal orientations,
shapes and equal surroundings. The distance between matched devices has to be traded off against
other parasitic effects during placement in order to achieve the best overall performance. Special
layout structures can be used if a very high matching degree is required.
Thermal effects are a third category of performance degradation in integrated circuits. Power
dissipated by devices in a circuit causes thermal gradients across the chip. These gradients
result in temperature shifts for individual devices and thermal mismatch in matched device pairs.
Thermal effects can be limited by careful placement of devices.
The goal of a performance driven layout system is to generate a layout such that the com-
bined effect of these three major layout parasitics on the performance of a circuit remains within
the specifications. In this chapter we have described a performance driven layout strategy that
achieves this by driving the layout tools directly by the performance constraints. The cost func-
tions used to quantify intermediate place and route solutions are based on an evaluation of the
performance degradation that would result from accepting the solution: when it exceeds the
allowed margins, the solution is penalized. In the remainder of this work, this methodology will
be applied to module generation, placement and routing.
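The penalty principle can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of layout area as the base cost, the relative-excess penalty and the weight of 10 are hypothetical choices and do not reproduce the actual cost function of the tools described in this work.

```python
def placement_cost(area, degradation, margin, weight=10.0):
    """Cost of an intermediate solution.

    area        -- layout area of the intermediate solution
    degradation -- dict: performance name -> estimated degradation
    margin      -- dict: performance name -> allowed degradation (> 0)
    weight      -- hypothetical weight of the penalty term
    """
    penalty = 0.0
    for name, allowed in margin.items():
        excess = abs(degradation.get(name, 0.0)) - allowed
        if excess > 0.0:
            # only solutions that exceed the allowed margin are penalized
            penalty += excess / allowed
    return area + weight * penalty
```

A solution whose degradation stays inside all margins is judged on area alone, while a solution that violates a margin becomes more expensive in proportion to the relative excess.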
Chapter 3
Module Generation
3.1 Introduction
According to the macro-cell layout strategy presented in the previous chapter, the circuit
schematic is first divided in modules, which are then optimally placed and routed. A module
itself is defined as a functional set of one or more devices. For each module, a set of layout alternatives,
called variants, has to be generated. The placement tool then selects an optimal variant
for each module, such that the overall placement is optimal in terms of area and performance.
In this chapter we discuss the problem of circuit partitioning, i.e. the division of the circuit
into modules, and module generation, i.e. the generation of a set of layout alternatives for each
module.
Section 3.2 gives a description of the problem and some important issues which have to be
taken into account during module generation. An overview and comparison of module genera-
tion strategies is given in section 3.3. Two recently proposed transistor stacking algorithms are
discussed in section 3.4. The design and implementation of the LAYLA module generator library
is the subject of section 3.5. In section 3.6 we describe the techniques that are used to make the
library technology independent. We end this chapter with some examples in section 3.7 and we
present conclusions in section 3.8.
performance constraints, certain variants and shapes are more appropriate than others to obtain
a functional and still dense final layout within the user-supplied performance and geometrical
constraints. Therefore, the actual variant and shape of each module is only decided during the
placement (if not imposed by the user). It is the task of the module generators to generate the
palette of alternative variants for each module and the task of the placer to select the optimal one.
In CMOS circuits, device capacitances can often be minimized by proper layout. The pn
junctions which form the MOS device source and drain regions each have a nonlinear voltage
dependent capacitance which consists of two terms, proportional to the junction area and perimeter,
respectively:
$$C_{jSBt} = \frac{A_S\,C_j}{\left(1 - \frac{V_{BS}}{\phi_j}\right)^{m_j}} + \frac{P_S\,C_{jsw}}{\left(1 - \frac{V_{BS}}{\phi_j}\right)^{m_{jsw}}} \tag{3.1}$$
In equation (3.1), $C_{jSBt}$ refers to the total source bulk capacitance, $A_S$ and $P_S$ to the source area
and perimeter, $C_j$ and $C_{jsw}$ to the bottom and sidewall junction capacitances in absence of any
junction voltage, $V_{BS}$ to the junction voltage and $\phi_j$ to the built-in junction potential. $m_j$ and $m_{jsw}$ depend on the doping
profile of the junction. Equation (3.1) reveals that this capacitance can be reduced by minimizing
the size of all diffusions. A large saving in diffusion capacitance can be made by device merging,
i.e. placing devices such that diffusion geometry is shared between electrically connected devices
as shown in Fig. 3.2. Two MOS transistors that have a common drain or source node (e.g. two
transistors in series have one common node, two transistors in parallel have two) can share a
diffusion region if they are of the same type and have the same bulk potential. This type of
geometry sharing not only improves the density of a layout, but also the performance of the
circuit by reducing the parasitic capacitance of the node which has been merged.
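The effect of diffusion sharing on the junction capacitance can be illustrated numerically. The sketch below assumes the usual area-plus-perimeter form of equation (3.1); all numerical values (diffusion dimensions, capacitance densities, built-in potential and grading coefficients) are hypothetical and only serve as an example.

```python
def junction_capacitance(area, perimeter, cj, cjsw,
                         v=0.0, phi=0.8, mj=0.5, mjsw=0.33):
    """Total junction capacitance in F.

    area, perimeter -- diffusion area (m^2) and perimeter (m)
    cj, cjsw        -- zero-bias bottom (F/m^2) and sidewall (F/m) capacitance
    v               -- junction voltage in V (negative for reverse bias, v < phi)
    phi, mj, mjsw   -- built-in potential and grading coefficients (assumed)
    """
    bias = 1.0 - v / phi
    return area * cj / bias ** mj + perimeter * cjsw / bias ** mjsw

# Hypothetical diffusion region of 10 um x 1 um.
W, LD = 10e-6, 1e-6
CJ, CJSW = 1e-3, 1e-10

one_region = junction_capacitance(W * LD, 2 * (W + LD), CJ, CJSW)
separate = 2 * one_region   # two devices, each with its own diffusion
merged = one_region         # two devices sharing a single diffusion
```

In this simplified picture, merging two devices on a common node halves the diffusion capacitance of that node, because one of the two diffusion regions disappears.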
In [Meyer 93], the partitioning process can be controlled by a set of user defined rules. Another
disadvantage of this approach is that the library of module generators has to be maintained whenever
the technology process is updated. Procedural module generation is discussed in section 3.5.
3.3.4 Discussion
Based on experiments with a large number of industrial analog circuits, we have found that
none of these strategies does the job in an optimal way. A number of substructures recur
over and over again in analog circuits and cannot be assembled by merging elementary devices.
Examples include interdigitated cascoded gain stages and common centroid differential input
pairs. For these components, specific procedural module generators have to be written.
These module generators have to be combined with merging during placement to achieve
close to optimal layouts in a fast, reliable and predictable way. In LAYLA, we have used a dy-
namic merging strategy, similar to the one presented in [Cohn 91]. In our placement algorithm,
potentially beneficial overlaps between device source/drain regions are explored at every iteration.
When such an overlap situation occurs, the total junction capacitance of the net to which
the terminals belong decreases by an amount proportional to the area and the perimeter of the
overlap region. This decreases the total performance degradation and lowers the cost function.
In this way, device geometry sharing is promoted specifically for sensitive nets.
To overcome the problem of technology dependent module generator libraries, we propose a
technique that isolates technology dependence in technology parameter files and device definitions
(see section 3.6). This technique allows us to reuse the module generator library for different
technologies without modifications to the source code.
1. The circuit graph is divided into a number of subgraphs, such that each subgraph contains
only transistors of the same type, with the same bulk bias net (see Fig. 3.3(c)).
2. Large transistors are split into smaller parallel transistors (device folding).
3. Each subgraph is split into smaller connected subgraphs, containing only edges corre-
sponding to transistors with the same channel width.
These four steps are common to the two algorithms that will be discussed. The difference be-
tween the two concerns step 4, stack generation for a subgraph.
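The three preparatory steps listed above can be sketched as follows. This is a simplified illustration, not the implementation used in the cited algorithms: the `Mos` record, the `max_width` folding rule and the way connectivity is derived from shared drain/source nets are assumptions made for the example.

```python
import math
from collections import defaultdict, namedtuple

# Hypothetical transistor record: kind is "nmos" or "pmos".
Mos = namedtuple("Mos", "name kind bulk width drain source")

def fold(devices, max_width):
    """Step 2: split wide transistors into equal parallel parts."""
    out = []
    for d in devices:
        n = math.ceil(d.width / max_width)
        if n <= 1:
            out.append(d)
        else:
            out.extend(d._replace(name=f"{d.name}_{i}", width=d.width / n)
                       for i in range(n))
    return out

def connected_components(devices):
    """Group devices that are linked through shared drain/source nets."""
    remaining, components = list(devices), []
    while remaining:
        first = remaining.pop()
        comp, nets = [first], {first.drain, first.source}
        grew = True
        while grew:
            grew = False
            for d in remaining[:]:
                if d.drain in nets or d.source in nets:
                    remaining.remove(d)
                    comp.append(d)
                    nets |= {d.drain, d.source}
                    grew = True
        components.append(comp)
    return components

def partition(devices, max_width):
    by_type = defaultdict(list)
    for d in devices:                          # step 1: same type and bulk net
        by_type[(d.kind, d.bulk)].append(d)
    result = []
    for group in by_type.values():
        by_width = defaultdict(list)
        for d in fold(group, max_width):       # step 2: device folding
            by_width[d.width].append(d)
        for same_width in by_width.values():   # step 3: equal channel width
            result.extend(connected_components(same_width))
    return result
```

Each list returned by `partition` is a candidate subgraph on which a stack generation algorithm can then be run.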
In [Wimer 87] and later in [Malavasi 95] for analog circuits, stack generation is done in two
phases. In the first phase, all the paths in the circuit graph are generated by a dynamic program-
ming procedure. In the first step of this iterative procedure, all paths of length 1 are generated.
In the n-th step, the paths of length n are generated by augmentation of the paths of length n - 1.
In the second phase, a path-graph $G_p$ is generated. The paths generated in the first phase become
the vertices of $G_p$, and an edge is inserted between two vertices if the corresponding paths are
compatible, i.e. if they can coexist in the same partition. A circuit partition corresponds to a
clique of $G_p$, i.e. a maximally connected subgraph of $G_p$. All circuit partitions are then generated
using a clique finding graph algorithm. The quality of each clique (partition) is evaluated
using an analog specific cost function. The cheapest clique found is the optimum solution to
the partitioning problem. This algorithm has exponential time complexity [Malavasi 95]. The
reason for this is that it enumerates all stacks, which makes it extremely sensitive to the size of
the problem.
In [Bas 96] a constraint-driven stacking algorithm with linear time complexity was proposed.
It employs an Eulerian trail finding algorithm that can satisfy analog-specific performance
constraints. A trail on a graph is an alternating sequence of vertices and edges
$(v_0, e_0, v_1, e_1, \ldots, v_{k-1}, e_{k-1}, v_k)$, where
$e_i = (v_i, v_{i+1})$ is an edge in the graph and $e_i \neq e_j$ for all $i \neq j$ [Thul 92]. The trail is a closed
trail if $v_0 = v_k$. A closed trail is an Eulerian trail if it touches all the edges in the graph. If
a closed Eulerian trail exists in a graph, the graph is called Eulerian. A graph is Eulerian if and
only if it is connected and all vertices in the graph have even degree [Thul 92]. An Euler trail in
a circuit graph defines the optimal stack of the circuit. The algorithm proposed in [Chak 90] and
modified for analog constraints in [Bas 96], proceeds in two steps. First, if the circuit graph is
not Eulerian, it is made Eulerian by adding an extra vertex (supervertex) and an edge between the
supervertex and every odd-degree vertex. The resulting graph is called modified circuit graph.
Every edge that is added to the original circuit graph will result in a gap in the diffusion area
of the stack. In the second step, an Eulerian trail finding algorithm is run on the modified cir-
cuit graph. The Eulerian trail finding algorithm in [Bas 96] is adapted to support symmetry and
matching constraints and works with a constraint driven cost-function. The complexity of the
algorithm is linear in the number of transistors in the circuit and guarantees to find the minimum
cost stack set for a certain class of circuits [Bas 96].
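The two steps can be illustrated with a plain Eulerian trail search. The sketch below uses Hierholzer's algorithm on a connected circuit graph and cuts the resulting closed trail at the added supervertex edges. It does not include the symmetry and matching constraints or the cost function of [Bas 96]; the function name and the edge representation are assumptions made for the example.

```python
from collections import defaultdict

SUPER = "_supervertex"

def stacks(transistors):
    """transistors: list of (name, node_a, node_b), one edge per transistor.

    Returns a list of stacks, each a list of transistor names in trail order.
    The circuit graph is assumed to be connected.
    """
    edges = list(transistors)
    degree = defaultdict(int)
    for _, a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Step 1: connect the supervertex to every odd-degree vertex.
    odd = sorted(v for v, d in degree.items() if d % 2)
    edges += [(None, SUPER, v) for v in odd]

    adj = defaultdict(list)
    for idx, (_, a, b) in enumerate(edges):
        adj[a].append((idx, b))
        adj[b].append((idx, a))

    # Step 2: Hierholzer's algorithm; each stack entry remembers the edge
    # that was used to reach the vertex.
    start = SUPER if odd else edges[0][1]
    used = [False] * len(edges)
    path, trail = [(start, None)], []
    while path:
        v, via = path[-1]
        while adj[v] and used[adj[v][-1][0]]:
            adj[v].pop()
        if adj[v]:
            idx, w = adj[v].pop()
            used[idx] = True
            path.append((w, idx))
        else:
            path.pop()
            if via is not None:
                trail.append(via)

    # Every added edge is a gap in the diffusion: cut the trail there.
    result, current = [], []
    for idx in trail:
        name = edges[idx][0]
        if name is None:
            if current:
                result.append(current)
            current = []
        else:
            current.append(name)
    if current:
        result.append(current)
    return result
```

For two transistors in series the graph has two odd-degree vertices, so the closed trail passes through the supervertex once and a single stack is produced. A star of three transistors around one node has four odd-degree vertices and therefore yields two stacks.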
1. interface mode: when used in interface mode, the module generators take as input the
actual parameters for the module and return a collection of geometrical abstractions for all
possible variants that can be used to lay out the module. The geometrical abstraction can
be as simple as a bounding box or can be more complicated, depending on the information
that is needed by the placement algorithm to evaluate intermediate solutions. The details of
the geometrical abstraction that is used in our placement model will be discussed in chapter 4.
2. layout mode: when working in layout mode, the module generator constructs a detailed
physical layout for the actual variant selected by the placement algorithm.
In the beginning of the placement process, the generators are used in interface mode to obtain
information about the different possible implementations of a module. When the placer is finished,
the actual variant is known, and the generators can be used in layout mode to obtain the actual
physical layout.
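The two modes can be pictured as two methods of one generator object. The sketch below is purely illustrative: the class `MosGenerator`, the `Variant` record, the finger pitch of 3 and the gate length of 1 are hypothetical and do not reflect the interface of the LAYLA library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    fingers: int
    width: float    # bounding-box width
    height: float   # bounding-box height

class MosGenerator:
    PITCH = 3.0         # hypothetical finger pitch
    GATE_LENGTH = 1.0   # hypothetical gate length

    def __init__(self, channel_width):
        self.channel_width = channel_width

    def interface(self, max_fingers=4):
        """Interface mode: bounding-box abstraction of every variant."""
        return [Variant(n, n * self.PITCH, self.channel_width / n)
                for n in range(1, max_fingers + 1)]

    def layout(self, variant):
        """Layout mode: rectangles (layer, x0, y0, x1, y1) of one variant."""
        return [("gate", i * self.PITCH, 0.0,
                 i * self.PITCH + self.GATE_LENGTH, variant.height)
                for i in range(variant.fingers)]

# The placer sees only the abstractions and picks one of them ...
generator = MosGenerator(channel_width=8.0)
variants = generator.interface()
chosen = variants[1]
# ... after which the detailed geometry is built for that variant only.
rectangles = generator.layout(chosen)
```

The benefit of this split is that the optimization loop never pays for detailed geometry; the expensive layout construction happens once per module, after the variant has been selected.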
layers) have to be parameterized in the module generation code. We have introduced process
parameters and module definitions to achieve this.
• Process Parameters
- Geometrical Process Parameters For every relevant design rule, a process parame-
ter is defined. The module generator code contains statements that specify polygon
dimensions as a function of module parameters and process parameters. During ini-
tialization of the module generator, the values for the process parameters are retrieved
from the technology file and together with the module parameters, they are used to
calculate the polygon dimensions. Examples of geometrical process parameters are:
the minimum width of a layer, minimum separation between layers, the minimum
enclosure of one layer with respect to another layer, etc.
- Electrical Process Parameters Electrical Process Parameters define the electrical
characteristics of the process layers. Typical examples are the sheet resistance and
parallel plate capacitance of a layer, the maximum current density of a layer, etc.
• Device Definitions
The device definitions are introduced to describe in a general way the mask layout structure
of different devices in a particular process. For each device used in a module generator,
a generic layer model has been constructed and the module generators have been written
in terms of these generic layers. The device definitions, which are part of the technology
file, are used to establish the correspondence between these generic layers and the layers
used in a particular process. A 'void' layer is used if there is no corresponding layer in
the process. The module generators can be adapted to different technology processes by
writing new device definitions. An additional advantage of using device definitions is that
it allows a user to define different types of transistors, capacitors and resistors. Device
definitions are currently supported for contact structures, MOS transistors, capacitors and
resistors. They are discussed in detail in [LAYLA man] (Ch. 9).
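The separation between generator code and technology data can be sketched as follows. The dictionary layout, the parameter names and the numerical values are hypothetical and are not the format of the actual technology files; the sketch only shows how polygon dimensions can be derived from process parameters and how generic layers can be mapped to process layers, with 'void' layers being dropped.

```python
# Hypothetical technology description for one process.
TECHNOLOGY = {
    "process": {
        "contact_size": 0.4,               # um
        "poly_enclosure_of_contact": 0.2,  # um
    },
    "devices": {
        "nmos": {
            "gateLayer": "poly",
            "gateMask": "void",            # no such layer in this process
            "sourceDrainLayer": "active_area",
        },
    },
}

def gate_contact_width(tech):
    """Polygon dimension expressed in terms of process parameters only."""
    p = tech["process"]
    return p["contact_size"] + 2 * p["poly_enclosure_of_contact"]

def resolve(tech, device, generic_rects):
    """Translate (generic_layer, rectangle) pairs to process layers."""
    mapping = tech["devices"][device]
    resolved = []
    for generic_layer, rect in generic_rects:
        layer = mapping[generic_layer]
        if layer != "void":                # 'void' layers produce no geometry
            resolved.append((layer, rect))
    return resolved
```

Porting such a generator to another process then amounts to supplying a new dictionary; the functions themselves stay untouched.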
3.7 Examples
The LAYLA module generator library currently consists of generators for MOS transistors, ca-
pacitors, resistors, inductors, transformers, pairs of MOS transistors, pairs of capacitors and pairs
of resistors. The library is currently in use in the ESAT/MICAS group and in several electronic
companies. Technology files have been written for more than 20 different processes from 5
different foundries. It takes about an hour to port the library to a new technology. Detailed infor-
mation on the library can be found in [LAYLA man]. In the remainder of this chapter, we will
present some examples to illustrate some features of the module generator library.
Example of a MOS device definition (mos_def):

    mos_def(low_vt_pmos,      // mosType
            poly,             // gateLayer
            void,             // gateMask (not used)
            m1poly,           // gateContactType
            active_area,      // sourceDrainLayer
            p_active_area,    // sourceDrainMask1
            p_diffusion,      // sourceDrainMask2
            low_vtp,          // sourceDrainMask3
            void,             // sourceDrainMask4 (not used)
            m1pdiff,          // sourceDrainContactType
            active_area,      // guardRingLayer
            n_active_area,    // guardRingMask1
            void,             // guardRingMask2 (not used)
            void,             // guardRingMask3 (not used)
            void,             // guardRingMask4 (not used)
            m1ndiff,          // guardRingContactType
            nwell,            // bulkLayer
            m1nwell,          // bulkContactType
            metal1,           // firstRoutingLayer
            metal2,           // secondRoutingLayer
            via);             // routingViaType

The following parameters can be specified as input for the MOS transistor generator:
• fingers: the number of fingers of the MOS transistor. Large transistors can be split into a
number of parallel parts, which are then merged together. The parts are called fingers, and
the process of splitting a transistor into parts is called transistor folding. Fig. 3.5 shows a
folded transistor with 12 fingers.
• slices: the number of slices of the transistor. An effective way to protect sensitive devices
against coupling noise is to make sure that no part of the transistor is more than a specified
minimum distance away from a bulk contact. This minimum distance can be a design rule
imposed by the foundry or can be specified by the designer. For extremely large transistors,
this rule results in very long transistor layout structures. To avoid these inconvenient aspect
ratios, very large transistors can be split into a number of parallel transistors, called slices
and rows of bulk contacts can be inserted between them. Fig. 3.6(a) shows a large transistor
which has been split into two slices. A row of bulk contacts has been added in between
them.
• current: the current flowing through the transistor. If this current exceeds a process-
specific threshold value, the widths of the source/drain wires have to be increased to avoid
excessive voltage drops and electromigration effects. Fig. 3.6(b) shows a transistor layout
with increased source/drain wire width. The number of vias used to connect metal 1 and
metal 2 wires has also been adapted to the high current.
• aspect ratio: the aspect ratio of the layout of a transistor with a given W/L ratio is
determined by the number of fingers and slices. Instead of specifying the number of fingers
and slices, it is also possible to specify a desired aspect ratio. In this case, the number of
fingers and slices will be determined, taking into account the bulk-contact and high-current
layout rules.
• type: the MOS transistor type. The specified type must correspond to one of the device
definitions that are defined in the technology file.
• technology file: the technology file that contains the design rules and device definitions
to be used during generation of the transistor.
• generator mode: the mode in which the transistor has to be generated (see section 3.5).
This can be interface or layout.
• output format: different output formats can be specified, depending on the layout envi-
ronment in which the generator is used.
Fig. 3.5 shows a folded nMOS transistor with 14 fingers. An example of a large nMOS transistor
that has been split into two parallel transistors is given in Fig. 3.6(a). Three parallel transistors
are placed on top of each other and bulk contacts are inserted in between them.
The transistor in Fig. 3.6(b) has a high drain current. Therefore, the width of the horizontal
and vertical source and drain wires has been increased as well as the number of vias used to
connect the vertical source/drain wires to the horizontal ones. The width and the number of vias
is calculated based on the maximum current per unit wire width and per via, which are specified
in the technology file.
The width of each wire in Fig. 3.6(b) depends on the current flowing through it. Let $I$ be the total drain
current of the transistor, then the width $W_2$ of the metal 2 wires connecting the slices together is given
by:
$$W_2 = \frac{I}{I_{max,2}} \tag{3.2}$$
where $I_{max,2}$ is the maximum current density of the metal 2 layer, expressed in ampere per meter.
If there are $n_s$ slices in the transistor structure, then the width $W_1$ of the metal 1 wires connecting
the fingers together is given by:
$$W_1 = \frac{I}{n_s\,I_{max,1}} \tag{3.3}$$
with $I_{max,1}$ the maximum current density of the metal 1 layer. The necessary number of vias $N_{via}$
can be computed as follows:
$$N_{via} = \frac{I}{n_s\,I_{max,via12}} \tag{3.4}$$
where $I_{max,via12}$ is the maximum current in a via connecting layers metal 1 and metal 2. If
there are $n_f$ fingers per transistor slice, then the width $W_{1f}$ of the metal 1 wires connecting the
source/drain diffusion region is given by
$$W_{1f} = \frac{I}{n_s\,\mathrm{floor}\!\left(\frac{n_f}{2}\right) I_{max,1}} \tag{3.5}$$
where $\mathrm{floor}(x)$ denotes the largest integer smaller than or equal to $x$.
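The sizing rules (3.2) to (3.5) translate directly into a few lines of arithmetic. The sketch below is an illustration only: the function name, the argument names and the numerical values are assumptions, and the via count is rounded up to the next integer, which the equations leave implicit. Any consistent unit system can be used; the example uses mA and um.

```python
import math

def wire_sizing(i_total, n_slices, n_fingers, imax_m1, imax_m2, imax_via):
    """Return (w2, w1, n_via, w1f) for a high-current transistor.

    i_total   -- total drain current
    n_slices  -- number of slices
    n_fingers -- number of fingers per slice (at least 2)
    imax_m1   -- maximum current per unit width of metal 1
    imax_m2   -- maximum current per unit width of metal 2
    imax_via  -- maximum current per metal 1 / metal 2 via
    """
    w2 = i_total / imax_m2                                    # (3.2)
    w1 = i_total / (n_slices * imax_m1)                       # (3.3)
    n_via = math.ceil(i_total / (n_slices * imax_via))        # (3.4), rounded up
    w1f = i_total / (n_slices * (n_fingers // 2) * imax_m1)   # (3.5)
    return w2, w1, n_via, w1f

# Hypothetical example: 20 mA, 2 slices of 8 fingers,
# 1 mA/um in metal 1, 2 mA/um in metal 2, 1 mA per via.
w2, w1, n_via, w1f = wire_sizing(20.0, 2, 8, 1.0, 2.0, 1.0)
```

In a real generator the computed widths would additionally be clamped to the minimum wire width of the layer, which is a design rule taken from the technology file.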
As a result of the wide source wires, the gate wires can become very long. This causes the
gate resistance to increase which is bad for the noise performance and can lead to RC effects
at high operating frequencies. The structure in Fig. 3.7 is used to solve this problem. In this
structure, metal 2 is used for the source wires, which allows the gate wires to be placed closer to the
active area of the transistor.
Figure 3.7: High current nMOS transistor, source wires in metal 2 to reduce RC effects
analog circuits.
In general, three different strategies can be applied to module generation for analog circuit
layout. The first strategy uses a large library of procedural module generators that implement
most of the analog layout knowledge. The second strategy limits the number of procedural
module generators as much as possible and relies on the placement tool to assemble the more
complicated substructures. The third approach is to construct a limited number of promising
alternative sets of stacks in advance and to select the best one during placement. Graph-based
algorithms can be used to generate the palette of alternative stacked realizations.
It is our opinion that none of these strategies does the job in an optimal way. A number
of substructures recur over and over again in analog circuits and cannot be assembled by
merging elementary devices. For these components, specific procedural module generators have
to be written. These module generators have to be combined with merging during placement to
achieve close to optimal layouts in a fast, reliable and predictable way.
To overcome the problem of technology dependent module generator libraries, we have pro-
posed a technique that isolates technology dependence in technology parameter files and device
definitions. The LAYLA module generator library has been used with more than 20 different
technologies from 5 different foundries without modifications to the source code. An example
of a procedural module generator was presented to illustrate some analog specific features.
Chapter 4
Placement
4.1 Introduction
This chapter addresses the placement problem for high-performance analog circuits. The place-
ment phase is crucial for the performance degradation of an analog circuit layout since it in-
fluences all the parasitic layout effects which have been discussed in chapter 2. The distance
between matching devices, and therefore also their matching degree is determined during place-
ment. The placement of a circuit also determines its thermal profile. In addition, it greatly influ-
ences the values of the interconnect parasitics. Although their final values are determined during
routing, their minimum values are fixed by the configuration of the device terminals, which is
determined during placement. A performance driven placement algorithm therefore has to take
into account all of these performance degrading effects simultaneously.
We begin this chapter by formally stating the problem and reviewing the constraints that have
to be taken into account during placement. We then give a brief overview of our placement tool in
section 4.3. In order to justify our choice of simulated annealing as basic placement optimization
algorithm, we give an overview and comparison of different placement techniques in section 4.4.
Based on this comparison, the simulated annealing algorithm is selected and its application to
analog performance driven placement is discussed in section 4.5. Some important details of our
placement implementation, the placement model, the handling of analog constraints, the move set
and cost function are discussed in sections 4.5.1, 4.7 and 4.8. In section 4.9 we describe in detail
how the layout induced performance degradation is computed for an intermediate placement
solution. In section 4.11 we discuss the annealing schedule of the algorithm. Finally, we give
some experimental results in section 4.12 and we draw conclusions in section 4.13.
correct way and such that the layout area is minimal and that the circuit can be routed afterwards.
The following additional constraints and objectives have to be added to this basic definition:
• Symmetry Constraints
In high-performance analog circuits, it is often required that groups of devices are placed
symmetrically with respect to one or more symmetry axes. Symmetric placement allows
for symmetric routing and results in matched parasitics. Symmetry constraints can be
formulated in terms of couples, self-symmetric devices and symmetry groups. Two devices
which are placed symmetrically with respect to an axis form a couple. A self-symmetric
device is a device which is placed on a symmetry axis. A symmetry group is a collection of
couples and self-symmetric devices which share the same symmetry axis. The symmetry
group represented in Fig. 4.1 consists of the couples (M1A, M1B) and (M2A, M2B) and the
self-symmetric device M5. More than one symmetry group can be specified for a circuit.
The presence of one or more symmetry groups has the following implications for analog
placement:
- Two devices which are specified as a couple must be placed symmetrically with re-
spect to an axis and must have identical variants and mirrored orientations.
- A device which is specified as self-symmetric must be placed on a symmetry axis.
- Couples and self-symmetric devices that belong to the same symmetry group must
share the same symmetry axis.
• Matching Constraints
Matching constraints can be specified by defining matching groups. A matching group is a
set of two or more devices for which an accurate ratio of device characteristics is required.
The simplest and most common case of a matching group is a pair of equal devices. A
more complicated case of a matching group is shown in Fig. 4.2. Any number of matching
groups can be defined in an analog circuit. The presence of one or more matching groups
has the following implications for analog placement:
- All devices which belong to the same matching group must have equal orientations.
- If the devices of a matching group are equal (1: 1 ratios), they must be implemented
with equal variants. If they have another ratio, they should be built of equal unit
devices, according to the ratio.
- The placement tool has to determine the positions and therefore also the distance
between the matched devices such that the circuit performance constraints are met.
Since it is not always possible in an analog circuit layout to, at the same time, meet
all symmetry requirements, put all matching devices directly next to each other and
obtain a fairly compact layout, the matching degree of a pair of devices has to be
selected in view of its influence on the performance of the circuit.
[Figure 4.1: symmetry group with couple 1 (M1A, M1B), couple 2 (M2A, M2B) and self-symmetric device M5]
• Performance Constraints
As discussed in chapter 2, the performance of a circuit is influenced by layout parasitics.
- interconnect parasitics
A performance driven placement algorithm has to create a placement that allows
a router to complete the interconnections within the performance constraints. Al-
though the actual values of interconnect capacitances and resistances are determined
during routing, their minimum achievable values are fixed during placement and it is
therefore crucial that (estimated) performance degradation induced by interconnect
parasitics is taken into account during placement.
- device mismatch
The distance between the matching devices has to be selected in view of its influence
on the performance of the circuit.
- thermal effects
The presence of power dissipating devices in a circuit causes a temperature distribu-
tion across the placement. Since device characteristics are influenced by local tem-
peratures, matching devices have to be placed such that the performance degradation
• Geometrical Constraints
The blocks generated by a circuit level layout generation tool are often part of a larger
system. To minimize system level performance degradation, a target aspect ratio, fixed
height and/or fixed terminal positions may be specified by a floorplanning tool or by a de-
signer. These additional geometrical constraints also have to be taken into account during
placement.
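The symmetry rules listed above lend themselves to a simple check on a candidate placement. The sketch below assumes a vertical symmetry axis at x = axis_x and a placement that stores, for each device, its center coordinates, its variant and a mirrored flag; this data layout and the function name are hypothetical and are not the placement model of chapter 4.

```python
def symmetry_violations(axis_x, couples, self_symmetric, placement, tol=1e-9):
    """Return the couples and self-symmetric devices that break the rules.

    placement maps a device name to (x, y, variant, mirrored).
    """
    bad = []
    for a, b in couples:
        xa, ya, va, ma = placement[a]
        xb, yb, vb, mb = placement[b]
        ok = (abs(xa + xb - 2 * axis_x) <= tol   # mirrored positions
              and abs(ya - yb) <= tol
              and va == vb                       # identical variants
              and ma != mb)                      # mirrored orientations
        if not ok:
            bad.append((a, b))
    for d in self_symmetric:
        if abs(placement[d][0] - axis_x) > tol:  # must sit on the axis
            bad.append((d,))
    return bad

placement = {
    "M1A": (-2.0, 0.0, "v1", False),
    "M1B": (2.0, 0.0, "v1", True),
    "M5": (0.0, 3.0, "v2", False),
}
violations = symmetry_violations(0.0, [("M1A", "M1B")], ["M5"], placement)
```

An annealing-based placer can either use such a check to reject moves or, more efficiently, restrict its move set so that the devices of a symmetry group are always moved together.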
circuit netlist, is then used as input for a set of module generators that construct a list of geomet-
rical variants for every device (see chapter 3). Only the information needed for the optimization
of the placement is generated (the generators are called in interface mode).
Next, a simulated-annealing algorithm is used to generate the actual placement, taking into
account all the constraints and objectives that have been identified in the previous section.
After the optimization, the module generators are called again (this time in layout mode) to
create the actual layout for the selected geometrical device variants and the final layout is con-
structed. The output of the program is then the final placement layout, together with information
about the performance degradation in this final layout and an identification of the most impor-
tant contributions to this degradation. In case the degradation exceeds the required performance
specifications, this information can be used by the designer to see the failing performance(s) and
to identify the critical effects. This allows him to improve his design when desired.
Because of this complexity, heuristic algorithms must be used to solve them. It would lead us too
far to discuss all placement algorithms that have been used in the past. We refer the reader to the
excellent overviews in [Shah 91, Sher 95] and the references therein for details on the various al-
gorithms. In this section we will give an overview of the main classes of algorithms, with a short
description of their working principle and main features. Based on this overview we will justify
our choice of simulated annealing as basic algorithm for performance driven analog placement.
(4.1)
characterized by the positions of its particles, is repeatedly changed by applying a small random
displacement to a randomly chosen particle. If the difference in energy $\Delta E$ between the old and the
new state is negative, the new state is accepted and the process is continued with the new
state. If $\Delta E$ is positive or equal to zero, the probability of acceptance of the new state is given
by $\exp(-\Delta E / (k_B T))$. This acceptance rule is referred to as the Metropolis criterion. Following this
criterion, after a large number of perturbations, the system evolves to a state of thermal equilib-
rium, characterized by energy distribution (4.2). The Metropolis algorithm can thus be used to
simulate the evolution of a solid to thermal equilibrium. By applying this algorithm at decreasing
values of temperature, the annealing process of a solid can be simulated.
The simulated annealing algorithm is based on the analogy between the simulation of the
annealing of solids and the problem of solving large combinatorial optimization problems. In the
latter case, the configurations of the optimization problem play the role of the states of the solids
and the cost function C associated with a particular configuration takes the role of the energy E
of a state. A control parameter T is introduced to play the role of the temperature. The algorithm
can now be described as follows. Initially, the control parameter T is given a high value and a
sequence of configurations is generated using the Metropolis algorithm. Starting from the current
configuration i, a new configuration j is chosen using a generation mechanism, i.e. a prescription
to generate a transition from a configuration to another one by a small perturbation. If $\Delta C_{ij} =
C(j) - C(i)$ is the difference in cost between the two configurations, the new configuration is
accepted with probability 1 if $\Delta C_{ij} \le 0$ and with probability $\exp(-\Delta C_{ij}/T)$ if $\Delta C_{ij} > 0$. This
process is continued until equilibrium is reached, i.e. until the probability distribution of the
configurations approaches the Boltzmann distribution, now given by
$$P\{\text{configuration} = i\} = \frac{1}{Q(T)} \exp\left(-\frac{C(i)}{T}\right) \qquad (4.3)$$
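The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration: the function names `acceptance_probability` and `metropolis_accept` are hypothetical, the cost difference is taken as the cost of the new configuration minus the cost of the current one, and the control parameter T is assumed to be strictly positive.

```python
import math
import random


def acceptance_probability(delta_c: float, temperature: float) -> float:
    """Probability of accepting a move with cost change delta_c = C(j) - C(i).

    Hypothetical helper: downhill (or neutral) moves are always accepted,
    uphill moves are accepted with probability exp(-delta_c / T).
    """
    if delta_c <= 0:
        return 1.0
    return math.exp(-delta_c / temperature)


def metropolis_accept(delta_c: float, temperature: float, rng=random) -> bool:
    """Draw a uniform random number in [0, 1) and apply the Metropolis criterion."""
    return rng.random() < acceptance_probability(delta_c, temperature)
```

At a high value of T almost every uphill move is accepted, while at a low value of T the rule degenerates into a greedy descent.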
4.4.7 Discussion
Based on the problem description given in section 4.2, the following desirable features for an
analog placement algorithm can be identified:
1. As pointed out in chapter 3, most of the devices in an analog integrated circuit can be laid
out in different ways. Therefore, the algorithm should be able to select position, orientation
and implementation (variant) simultaneously.
4.4 Previous Work in Placement Algorithms 79
2. Analog devices can have arbitrary rectilinear shapes. Each device terminal can also have
an arbitrary rectilinear shape. Device terminals in analog layout cannot be reduced to
points.
3. Most analog circuits are of moderate size. The average complexity of an analog circuit
level placement problem is 20 to 30 devices. Although important, the efficiency of the
algorithm is not as crucial as it is for large digital placement problems.
4. The sizes of devices encountered in analog circuits vary by orders of magnitude. Therefore,
the algorithm should perform well with blocks of widely varying sizes.
5. The various symmetry and matching constraints that are frequently encountered in analog
circuits require accurate control over the positions and orientations of the devices. In addition
to this, a number of geometrical constraints are often imposed on the overall layout (e.g.
fixed height, aspect ratio, terminal positions, ...). Hence, a flexible placement optimization
technique, which allows every aspect of the placement problem to be constrained arbitrarily, is
needed.
6. The most important objective in performance driven placement is to guarantee that the lay-
out induced performance degradation remains within the circuit specifications. Therefore,
placement decisions must be based on an accurate evaluation of the performance degra-
dation, which requires detailed knowledge of the positions and orientation of all devices
simultaneously at all times. Constructive and partitioning based algorithms take placement
decisions sequentially, based on incomplete information and are therefore not well suited
for analog placement problems.
If we take into account these criteria for the selection of an analog placement algorithm, the
stochastic optimization algorithms (SE and SA) appear to be the most promising candidates.
Both of these algorithms offer the possibility of putting arbitrary constraints on the generation
of new candidate placement solutions. In SA this can be done through careful design of the
move set, in SE through the use of restricted crossover, mutation and inversion operators. Both
algorithms are cost function based. The cost function, that evaluates the quality (or fitness)
of intermediate solutions can be used to implement the analog performance and geometrical
constraints. SA and SE operate on the entire solution simultaneously, which is essential for an
accurate parasitic evaluation. All these features combine to make SA and SE the best choice for
analog placement problems.
One of the features that can be used to distinguish SA and SE algorithms is their interactivity
control. SE operates on a pool of solutions simultaneously. This offers the possibility to the user
to insert candidate solutions (hints) in this pool. Even if the hint solution is not accepted as the
optimal one, it will continue to exist in the pool for some generations and the best aspects of the
solution will influence the final solution through the crossover mechanism. This makes SE an
ideal algorithm for placement problems where a great deal of user interaction is required (e.g.
floorplanning algorithms).
Our analog placement tool was designed for 'batch' operation in an automated analog synthe-
sis environment [Gielen 95a]. The main advantage of the SE algorithm is therefore not essential
for our problem. In addition to this, SA for layout problems is much more mature than SE. A lot
of research that has been done on the operation of the SA algorithm can be reused for our analog
placement problem. This allows us to concentrate on the analog specific aspects of the problem,
such as the implementation of the various constraints, and the evaluation of layout induced per-
formance degradation. Therefore, SA has been selected as the optimization algorithm for our
placement tool. The remainder of this chapter will be used to explain the various aspects of our
implementation of SA for analog circuit level placement.
Anneal Placement {
    Calculate initial temperature T0;
    Generate initial placement;
    Evaluate cost function C;
    until (placement frozen) {
        until (equilibrium reached at current T) {
            Generate random move;
            Evaluate change in cost function ΔC;
            if (ΔC < 0) {
                accept move;
            }
            else if (Metropolis criterion) {
                accept move;
            }
            else {
                reject move;
                restore previous state;
            }
        }
        Decrement temperature T;
    }
}
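The control flow of the annealing loop can be made concrete with a small runnable sketch. It is not the LAYLA implementation. The following choices are assumptions made only for illustration: a geometric cooling schedule (`alpha`), a fixed number of moves per temperature standing in for the equilibrium test, a minimum temperature `t_min` standing in for the frozen test, and a move generator that returns a new state instead of modifying and restoring the current one.

```python
import math
import random


def anneal(state, cost, random_move, t0=10.0, alpha=0.9,
           moves_per_temperature=50, t_min=0.01, rng=None):
    """Generic simulated annealing loop (illustrative sketch).

    state        -- initial configuration
    cost         -- function mapping a configuration to its cost
    random_move  -- function (state, rng) -> candidate configuration
    """
    rng = rng or random.Random(0)
    current_cost = cost(state)
    t = t0
    while t > t_min:                              # "until placement frozen"
        for _ in range(moves_per_temperature):    # "until equilibrium reached"
            candidate = random_move(state, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Downhill moves are always accepted, uphill moves only
            # according to the Metropolis criterion.
            if delta < 0 or rng.random() < math.exp(-delta / t):
                state, current_cost = candidate, candidate_cost
            # A rejected move simply leaves the current state untouched.
        t *= alpha                                # decrement temperature
    return state, current_cost
```

As a toy usage example, `anneal(50, lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)))` walks an integer towards the minimum of a parabola. A placement tool would replace the integer by a placement and the step by a move from its move set.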
In the flat placement style, also called Gelatt-Jepsen style [Jeps 84], a placement is represented
by specifying the absolute coordinates of all devices. An annealer manipulates the placement
by shifting the coordinates of the devices. Since there are no restrictions on the positions of the
devices with respect to each other, devices can overlap in intermediate placement solutions. This
illegal overlap must be driven to zero in the final result by inserting an overlap penalty term in
the cost function.
In the slicing style [Wong 86], a placement is specified by the relative positions of all devices
with respect to each other. This is done by using a slicing structure to represent a placement
solution. Such a slicing structure is obtained by repeatedly partitioning the layout area into
horizontal and/or vertical slices, as shown in Fig. 4.5(a). The layout area is partitioned into as
many partitions as there are devices in the circuit, and each device is assigned to one partition.
A slicing tree is a convenient way to represent such a slicing partition (Fig. 4.5(b)). In this tree,
* and + are two-operand operators, symbolizing a vertical and a horizontal cut, respectively. An
annealer can perform object moves by operating directly on the slicing tree, e.g. by interchanging
[Figure 4.5: (a) slicing structure of a placement; (b) the corresponding slicing tree.]
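To illustrate the slicing representation, the sketch below maps a slicing tree to absolute coordinates. The encoding is an assumption made for this example: a leaf is a device name, an internal node is a tuple `(operator, left, right)`, `*` places the right subtree to the right of the left one (vertical cut) and `+` places it on top (horizontal cut). The function name `place` and the `sizes` dictionary are hypothetical.

```python
def place(node, sizes, x=0.0, y=0.0, coords=None):
    """Return (width, height, coords) of the slicing (sub)tree rooted at node.

    coords maps each device name to the absolute position of its
    lower-left corner. Devices are packed against the lower-left corner
    of their slice.
    """
    if coords is None:
        coords = {}
    if isinstance(node, str):                     # leaf: a single device
        width, height = sizes[node]
        coords[node] = (x, y)
        return width, height, coords
    operator, left, right = node
    lw, lh, _ = place(left, sizes, x, y, coords)
    if operator == "*":                           # vertical cut: side by side
        rw, rh, _ = place(right, sizes, x + lw, y, coords)
        return lw + rw, max(lh, rh), coords
    if operator == "+":                           # horizontal cut: stacked
        rw, rh, _ = place(right, sizes, x, y + lh, coords)
        return max(lw, rw), lh + rh, coords
    raise ValueError(f"unknown slicing operator: {operator!r}")
```

Every evaluation of parasitics on a slicing placement needs a traversal of this kind, whereas a flat representation stores the coordinates directly.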
4.5.1.3 Discussion
Both types of placement representation have their advantages and their drawbacks.
As pointed out in section 4.2, symmetric placement is very important for high-performance
analog circuit layout. The use of a flat placement representation allows the annealer to operate
directly on the absolute coordinates of the devices. This makes it possible to implement the
important symmetry and self-symmetry constraints directly in the move set, as will be discussed
in section 4.7. Slicing style placement tools have to implement global symmetry constraints in
the cost-function through the use of virtual symmetry axes [Malavasi 91], which is a less efficient
solution.
Knowledge of absolute device coordinates is also necessary to accurately estimate the values
of layout parasitics and to calculate the resulting performance degradation. These absolute coordinates
are readily available when a flat placement style is used. If a slicing style representation
is used, the relative positions of the devices have to be mapped to absolute coordinates before the
performance degradation can be computed. This is again a waste of CPU time and a less elegant
solution.
The main advantage of a slicing style placement tool becomes clear when it is used in a
digital layout system that employs channel routing. In that case, the slicing tree structure defines
a channel routing order that is guaranteed to be conflict-free. In addition to this, there is never a
wire space problem since the wiring channels can easily be adjusted to accommodate the required
amount of routing space. However, these features offer little advantage when used in the context
of analog layout. It will be shown in chapter 5 that channel routing is a poor choice for analog
layout. Hence, an area router must be used and the main advantages of the slicing style over the
4.5 Simulated Annealing for Analog Performance Driven Placement 83
[Figure: a rectilinear shape covered by rectangles R1, R2 and R3; panels (a) and (b).]
the module.
The representation discussed above is a 'black box' approach: only the boundary of a device
is considered. This is sufficient for placement of general rectilinear cells. However, if layout
geometry sharing (see chapter 3) is considered, the geometry of each device terminal has to
be taken into account. In this case, the overlap computation between two devices has to be
performed by computing the overlap between every terminal of the first device and every terminal
of the second device, and a distinction between legal and illegal overlap has to be made.
Overlap between two module terminals is legal if and only if [Cohn 91]:
• the terminals are connected to the same net,
• the terminals belong to the same layer,
• the devices of which they are part have the same bulk potential.
In all other cases, the overlap between device terminals is illegal. Legal overlap results in a
reduction of diffusion capacitance and hence in a reduction of performance degradation. Illegal
overlap results in DRC errors and/or circuit topology changes and must be driven to zero. A
terminal can have an arbitrary rectilinear shape. Therefore, we need to cover it with rectangles (see
Fig. 4.7). Since the computation of the terminal capacitance is based on the area of the rectangles
in the cover, we need a set of disjoint rectangles. Use of a minimum overlapping cover would
result in an overestimation of the terminal capacitance.
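The legality test and the terminal-to-terminal overlap computation can be sketched as follows. The `Terminal` class, its field names and the tuple encoding `(x1, y1, x2, y2)` of a rectangle are assumptions made for this illustration. The sketch relies on the rectangles of each terminal being disjoint, so that summing pairwise overlaps does not count any area twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminal:
    net: str      # net the terminal is connected to
    layer: str    # layer the terminal is drawn on
    bulk: str     # bulk potential of the owning device
    rects: tuple  # disjoint rectangles (x1, y1, x2, y2) covering the terminal


def rect_overlap(a, b):
    """Overlap area of two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def is_legal_overlap(t1, t2):
    """Same net, same layer and same bulk potential."""
    return t1.net == t2.net and t1.layer == t2.layer and t1.bulk == t2.bulk


def terminal_overlap(t1, t2):
    """Return (legal_area, illegal_area) for a pair of terminals."""
    area = sum(rect_overlap(a, b) for a in t1.rects for b in t2.rects)
    return (area, 0.0) if is_legal_overlap(t1, t2) else (0.0, area)
```

In a cost function, the illegal area would feed the overlap penalty, while the legal area could be credited as a reduction of diffusion capacitance.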
The geometry of each terminal is also needed to estimate the topology of the net that is
connected to it. In digital placement tools, a device terminal is often represented by a point,
and sometimes it is assumed that all terminal points coincide with the center of the device. For
analog circuit level layout, this simplification is unacceptable. To determine accurate estimates
of the interconnect parasitics, the exact geometry of each terminal has to be considered.
For the evaluation of thermally induced performance degradation, we also need the thermal
profile of a device. The thermal profile of a device is a matrix representation of the temperature
distribution caused by the device in its neighborhood. This profile is computed during initializa-
tion of the device, based on its power dissipation, its geometry and the thermal profile of a unit
source. The thermal distribution of an intermediate placement can be evaluated by superposition
of the thermal profiles of the individual modules. This will be discussed in detail in section 4.9.3.
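The superposition step can be sketched as the accumulation of per-device matrices on a common grid. The grid resolution, the `(profile, row0, col0)` encoding of a placed device and the function name `superpose` are assumptions made for this example. How the individual profiles are derived from power dissipation and geometry is not modelled here.

```python
def superpose(rows, cols, devices):
    """Sum the thermal profiles of all devices on a rows x cols grid.

    devices -- iterable of (profile, row0, col0), where profile is a 2-D
               list of temperature rises and (row0, col0) is the grid cell
               on which its upper-left entry is placed.
    Entries falling outside the grid are ignored.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for profile, row0, col0 in devices:
        for r, line in enumerate(profile):
            for c, rise in enumerate(line):
                rr, cc = row0 + r, col0 + c
                if 0 <= rr < rows and 0 <= cc < cols:
                    grid[rr][cc] += rise
    return grid
```

Because the profiles are precomputed, moving a device during annealing only changes the offset at which its matrix is added.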
All these considerations lead to the following representation for each device:
• minimum overlapping cover : a set of rectangles covering the complete layout of the
device.
• cover bounding box : the bounding box of the minimum overlapping cover of the module
layout. This box is used to speed up overlap computation. Overlap between the rectangles
of the MOC's of two devices is only checked if their cover bounding boxes overlap.
• terminal bounding box: the bounding box of the terminal partition of a device terminal.
This box is used to speed up overlap computation in a similar way as the cover bounding
box of a device.
[Figure 4.7: Covering of the device terminals (gate, drain, source, bulk) with rectangles; panels (a) and (b).]
[Figure: estimated interconnect area around a device (labels include $\Delta \mathrm{int}_b$).]
where the subscript x denotes one of the four directions l, b, r or t. $\Delta \mathrm{int}_{x,st}$ is the static and
$\Delta \mathrm{int}_{x,d}$ is the dynamic component of the estimated interconnect area.
The static term $\Delta \mathrm{int}_{x,st}$ is used to allocate interconnect area for nets that are connected to
the device itself. This term only depends on the device itself. The static interconnect area for a
direction x is computed as the sum of the widths of the wires that connect to device terminals
4.6 Handling Analog Constraints in Simulated Annealing 87
$$\Delta \mathrm{int}_{x,st} = \sum_{i=1}^{T_x} W_i \qquad (4.5)$$
where $T_x$ is the number of terminals on the x side of the device and $W_i$ the width of the wire
connecting to terminal i. $W_i$ is computed based on the current flowing through terminal i.
The static term $\Delta \mathrm{int}_{x,st}$ is independent of the position of the device in the placement and re-
mains constant during placement optimization. The dynamic component $\Delta \mathrm{int}_{x,d}$ reserves routing
space for wires that are not connected to the device and that have to pass along its edges. Its value
depends on the configuration of the device terminals in its neighborhood and therefore is strongly
dependent on the placement. $\Delta \mathrm{int}_{x,d}$ has to be computed for each intermediate placement, based
on the estimated topologies of the nets. We will discuss this term in section 4.10.
Unfortunately, some hard constraints are difficult to maintain by construction. If this is the case,
they must be implemented as penalty terms in the cost function and special care must be taken
that they are actually driven to zero in the final result (for instance by giving them large weights).
The different constraints are implemented as follows in the LAYLA placement tool.
• Symmetry Constraints
Symmetry is considered to be an absolute constraint. If the user specifies a number of
devices as being symmetric and/or self-symmetric, the devices have to be symmetric in
the resulting placement. Consequently, symmetry constraints are handled as restrictions
in the move set. Groups of symmetric devices are moved simultaneously such that their
symmetry is preserved at all times during the optimization and also in the final result.
• Matching Constraints
If a user specifies a group of devices as a matching group, this will have two effects. First,
they will have equal orientations and variants in the final placement. Second, the distance
between them will be optimized in view of its influence on the performance of the circuit.
The first requirement is implemented as an equal orientation/equal variant constraint in the
move set. The second one is handled by including the distance between matching devices
in the set of parasitic effects for which the degradation of the performance characteristics
is calculated and included in the placement cost function.
In this way, the user can specify a pair of devices as being matched without specifying the
degree of matching. Matching devices are always generated identically and with identical
orientations but it is up to the placement tool to determine the positions and therefore also
the distance between the matched devices such that the circuit performance constraints are
met. Since it is not always possible in an analog circuit layout to, at the same time, meet
all symmetry requirements, put all matching devices directly next to each other and obtain
a fairly compact layout, the matching degree of a pair of devices is selected in view of its
influence on the performance of the circuit.
• Performance Constraints
The most important requirement of an analog performance driven placement tool is to
make sure that the performance degradation induced by the various parasitic layout effects
remains within the circuit specifications. However, performance constraints cannot simply
be translated to restrictions on the coordinates and/or the orientations of the devices. They
have to be evaluated based on an intermediate placement solution. It is therefore impossible
to maintain performance constraints by construction and they have to be implemented
by penalty terms in the cost function. Special care is taken to guarantee that performance
constraint violations are actually driven to zero in the final result. This will be explained
in section 4.8.
• Geometrical Constraints
Geometrical constraints can be considered as hard constraints or as soft constraints. It is
up to the user to make the distinction. For instance, if the height of a placement has to
4.7 Move Set 89
be smaller than a certain value to make it fit into a system level placement, the maximum
height constraint is a hard constraint and is implemented in the move set. Another situation
where hard geometrical constraints are imposed is when the resulting placement will be
used in a standard cell layout assembly system: in that case the height and the power
supply terminal positions are fixed. In cases where the geometrical constraints are specified
as optimization targets they are implemented in the cost function. An example of this type
is the target aspect ratio that can be specified for a placement.
• Overlap Constraint
The overlap constraint is a hard constraint: if the final placement contains illegal overlap,
it can not be used. However, it is not implemented as a restriction in the move set. Imple-
menting overlap constraints in the move set would imply that placements with overlapping
modules are never considered. A restriction like that would make it impossible to detect
those situations where overlap is beneficial, for density as well as for performance reasons.
Therefore, each overlap situation has to be considered individually and overlap is best dealt
with in the cost function.
An important consequence of the strategy outlined above is that the annealing cost function
contains penalty terms that have to be driven to zero in order to obtain a useful result. To make
sure that they are actually driven to zero, special techniques must be used, as will be explained
in section 4.8.
[Figure 4.9: Relocation moves on the independent group: (a) translation, (b) swap.]
to three groups: one to control its position, one for its orientation and one for its variant. The
moves can be divided into three classes:
1. relocation moves change the location of one or more devices. They are executed on the
independent group or on a symmetry group.
2. reorientation moves operate on an orientation group and change the orientation of one or
more devices.
3. reshaping moves operate on a shape group and change the variant of one or more devices.
We will now discuss the various groups and the moves that can be executed on them.
• Independent Group The independent group consists of all devices that are not involved
in a symmetry constraint. Consequently, their positions can be altered independently of
the positions of all other devices in the circuit. There is only one independent group. Two
types of relocation moves can be executed on the independent group (see Fig. 4.9):
- Independent Translation (see Fig. 4.9(a)): one device of the independent group is
selected at random and its center position is translated to a randomly chosen new
position.
- Independent Swap (see Fig. 4.9(b)): two devices of the independent group are cho-
sen at random and their center coordinates are interchanged.
• Symmetry Group A symmetry group consists of all devices which are symmetric with
respect to the same axis. The devices are stored as a collection of symmetric units. There
are two types of symmetric units: couples and self-symmetries. Two devices which have
[Figure 4.10: Relocation moves on a symmetry group: (a) translation, (b) swap, (c) flip, (d) shift.]
- Symmetric Translation (see Fig. 4.10(a)): a symmetric unit is randomly chosen and
relocated to a new, random position. If a couple is selected, the positions of the two
devices are changed, such that their symmetry is preserved. If a self-symmetric is
chosen, it is shifted on the axis.
- Symmetric Swap (see Fig. 4.10(b)): two symmetric units are selected and their co-
ordinates in the direction of the symmetry axis are interchanged. Their coordinates
perpendicular to the axis remain the same.
- Symmetric Flip (see Fig. 4.10(c)): a couple is randomly chosen and the center coor-
dinates of the two devices are interchanged.
[Figure 4.11: Orientation groups: (a) equal orientation group, (b) mirrored orientation group.]
[Figure 4.12: Reshaping move on a shape group.]
- Symmetric Shift (see Fig. 4.10(d)): the axis of the symmetry group is shifted by a
random amount in a direction perpendicular to its own direction. During this op-
eration, the position of the symmetric units relative to the axis remains the same,
but their absolute position is shifted by the same amount as the axis. This move is
necessary if there are multiple symmetry groups present in the circuit.
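The four symmetric moves listed above can be sketched with a data structure in which symmetry holds by construction. All names here are hypothetical, and a vertical symmetry axis at `axis_x` is assumed: a couple stores only its offset from the axis and its coordinate along the axis, so no move can break the mirror relation.

```python
from dataclasses import dataclass, field


@dataclass
class Couple:
    left: str       # device on the left of the axis
    right: str      # device on the right of the axis
    offset: float   # distance of each device center from the axis
    y: float        # coordinate along the axis direction


@dataclass
class SelfSymmetric:
    name: str
    y: float        # the center always lies on the axis


@dataclass
class SymmetryGroup:
    axis_x: float
    units: list = field(default_factory=list)

    def positions(self):
        """Absolute center coordinates of all devices in the group."""
        pos = {}
        for u in self.units:
            if isinstance(u, Couple):
                pos[u.left] = (self.axis_x - u.offset, u.y)
                pos[u.right] = (self.axis_x + u.offset, u.y)
            else:
                pos[u.name] = (self.axis_x, u.y)
        return pos


def symmetric_translation(unit, new_y, new_offset=None):
    """Relocate a unit; a self-symmetric unit can only slide along the axis."""
    unit.y = new_y
    if isinstance(unit, Couple) and new_offset is not None:
        unit.offset = new_offset


def symmetric_swap(a, b):
    """Interchange the coordinates along the axis direction."""
    a.y, b.y = b.y, a.y


def symmetric_flip(couple):
    """Interchange the two devices of a couple."""
    couple.left, couple.right = couple.right, couple.left


def symmetric_shift(group, dx):
    """Shift the axis, and with it every unit of the group."""
    group.axis_x += dx
```

In an annealer, the new coordinates passed to these functions would be drawn at random within a range that depends on the temperature.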
to an equal orientation group are always equal (see Fig. 4.11(a)). Those of devices belonging
to a mirrored orientation group are always mirror-symmetric with respect to an axis.
In Fig. 4.11(b), the orientations of the devices are mirror-symmetric with respect to the X-
axis. Equal orientation groups are used for matching devices. Mirrored orientation groups
are used to maintain the mirror-symmetry of symmetric couples. As discussed in sec-
tion 4.5.2, each device is represented by a set of rectangles that cover its shape and another
set of rectangles for each terminal. Changing the orientation of a device involves rotating
and/or mirroring each of these rectangles, which is a time-consuming operation. To speed
up this move, the layout of each device is generated for all its possible orientations during
initialization of the program. The reorientation moves are then actually implemented by a
variant change. This only involves a switch of a pointer.
• Shape Group A shape group is a collection of one or more identical devices which have
to be implemented with equal variants. Reshaping moves change the variant of all devices
in the group simultaneously (see Fig. 4.12). Shape groups are used for matching devices
and symmetric couples.
Now that we have defined the different groups and the moves that operate on them, we can
describe more formally the effect of the symmetry and matching constraints that are defined in
the netlist:
1. If there is no symmetry or matching constraint defined for a device:
• A symmetry group is created and the devices, grouped into couples and self-
symmetrics, are inserted.
• For each couple, a mirrored orientation group is created and the two devices are
inserted. For each self-symmetric, an orientation group is created and the device is
inserted.
• For each couple, a shape group is created and the devices are inserted. For each
self-symmetric, a shape group is created and the device is inserted.
where:
$$C_{overlap} = \sum_{i=1}^{n} \sum_{j=i+1}^{n} areaOverlap_{ij} \qquad (4.8)$$
where n is the number of devices in the circuit and $areaOverlap_{ij}$ the overlap area be-
tween devices i and j.
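The double sum in (4.8) visits every unordered pair of devices exactly once. A minimal sketch follows, under the simplifying assumption that each device is represented by a single bounding rectangle `(x1, y1, x2, y2)`. The distinction between legal and illegal overlap and the rectangle covers of rectilinear devices are left out.

```python
from itertools import combinations


def overlap_area(a, b):
    """Overlap area of two axis-aligned rectangles (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def overlap_cost(boxes):
    """Sum of the pairwise overlap areas over all pairs i < j."""
    return sum(overlap_area(a, b) for a, b in combinations(boxes, 2))
```

For n devices this evaluates n(n-1)/2 pairs, which is affordable for circuits of a few tens of devices.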
The weighting coefficients $\alpha$, $\beta$, $\gamma$ and $\delta$ are used to dynamically adjust the relative importance
of each term during the course of the optimization. In the earlier stages of the optimization,
4.9 Estimating Performance Degradation 95
when the general configuration of the placement is determined, the aspect ratio and performance
terms have to dominate the cost function. Towards the end of the optimization, when the final
positions of the devices are optimized without major configuration changes, the relative weight
of the overlap term has to be increased to make sure that no illegal overlap is present in the final
solution. To achieve this, the weighting coefficients are varied between a minimum and maximum
value. After each inner loop, the relative weight of the performance and aspect ratio terms
is linearly decreased from the maximum towards the minimum value, while the weight of the
overlap term is increased from the minimum to the maximum value.
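The weight schedule can be sketched as a linear interpolation over the inner loops. The function name `weights_at`, the use of the same minimum and maximum for both weights, and the assumption that the total number of inner loops is known in advance are all choices made for this illustration.

```python
def weights_at(k, n_loops, w_min, w_max):
    """Weights after inner loop k (0-based) out of n_loops inner loops.

    Returns (decreasing, increasing): the first value would be used for the
    performance and aspect ratio terms, the second for the overlap term.
    """
    frac = k / (n_loops - 1) if n_loops > 1 else 1.0
    decreasing = w_max - frac * (w_max - w_min)
    increasing = w_min + frac * (w_max - w_min)
    return decreasing, increasing
```

With 11 inner loops and weights between 1 and 5, the two weights cross at the sixth loop, where both equal 3.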
Using the technique described in section 2.2, these performance specifications can be trans-
formed into $N_p$ constraints on the layout induced performance degradation:
$i = 1, \ldots, N_p$, (4.10)
where $\Delta P_{lay,i}$ is the layout induced performance degradation for performance characteristic $P_i$.
The limits of this constraint can be calculated using equations (2.6) and (2.7) respectively.
A penalty term $C_{perfDegr,i}$ is associated with each performance characteristic $P_i$. If (4.10)
is satisfied for $P_i$, $C_{perfDegr,i}$ is set to zero. If not, its value is proportional to the amount of
violation:
The evaluation of (4.11) for an intermediate placement solution requires the knowledge of the
layout induced performance degradation $\Delta P_{lay,i}$ for each performance characteristic $P_i$. To
compute this performance degradation we follow the direct performance driven methodology which
was explained in section 2.2. Based on the geometrical information of the intermediate place-
ment, we estimate the value of all parasitic layout effects. To compute the influence of the
parasitic layout effects on the performance characteristics of the circuit, we use a linear approx-
imation based on the performance sensitivities, which are derived once from simulations before
placement starts. This methodology will now be applied to interconnect parasitics, device mis-
matches and thermal effects.
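The linear approximation and the penalty term can be sketched as follows. This is a hedged illustration, not the formulation of section 2.2: the degradation is assumed to be a plain signed sum of sensitivity times parasitic value, the constraint is assumed to be a two-sided bound `[lower, upper]`, and the proportionality constant of the penalty is a hypothetical `weight` parameter.

```python
def performance_degradation(sensitivities, parasitics):
    """Linear estimate of the degradation of one performance characteristic.

    sensitivities -- sensitivity of the performance to each parasitic,
                     derived once before placement starts
    parasitics    -- estimated value of each parasitic in the current placement
    """
    return sum(s * p for s, p in zip(sensitivities, parasitics))


def penalty(degradation, lower, upper, weight=1.0):
    """Zero inside [lower, upper], proportional to the violation outside."""
    violation = max(0.0, lower - degradation, degradation - upper)
    return weight * violation
```

Because the sensitivities are fixed, re-evaluating a performance after a move costs one multiply-add per parasitic, which is what makes evaluation inside the annealing loop practical.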
A complete and accurate extraction of the interconnect parasitics of a layout requires the knowl-
edge of the exact layout of each net. During placement, the layout of the nets is unknown and
hence the interconnect parasitics can only be estimated. To make a reasonably accurate esti-
mation of the different parasitics, a technique to estimate the geometry of a net based on the
locations of its connecting terminals is needed. To model the interconnect, we only consider the
parasitic resistance and capacitance (see section 2.5). To estimate the series resistance and the
capacitance to ground of a net, an estimation of the total length of a net is needed. To make rea-
sonable estimations of coupling capacitances between different nets, we also need the geometry
of the nets. Net estimation techniques which have been proposed in the past have focussed on net
length estimation. In this section, we give an overview of the most commonly used techniques
and we show how the minimum spanning tree technique is used to give reasonably accurate
estimates of net length and geometry in a computationally efficient way.
Given a net of n terminals, the following techniques can be used as net length estima-
tors [Shah 91, Cohn 94]:
• (a) semi-perimeter (Fig. 4.13(a)): this technique estimates the net-length as the length of
half the perimeter of the smallest rectangle that contains the centers of all terminals. The
semi-perimeter can be computed in $O(n)$.
• (b) minimum spanning tree (Fig. 4.13(b)): the net-length is estimated as the sum of
the lengths of the n - 1 paths of a minimum spanning tree connecting the centers of all
terminals. A minimum spanning tree is the minimum length acyclic connected graph that
connects the centers of all terminals [Thul 92]. Several algorithms of complexity $O(n^2)$
have been proposed [Krus 56, Prim 57].
• (c) center of mass (Fig. 4.13(c)): the net-length is estimated as the sum of the distances
of all terminal centers to the weighted mean center of the net. Computation complexity is
$O(n^2)$.
• (d) minimum Steiner tree (Fig. 4.13(d)): this technique uses the sum of the path-lengths
of a minimum Steiner tree as an approximation for the net-length. In a Steiner tree, a
path can branch at any point along its length. A minimum Steiner tree is the shortest
possible route for connecting a set of terminals. The computation of a Steiner tree is an NP-
complete problem. Heuristics can be used to find a non-optimal solution with complexity
ranging from $O(n \log n)$ to $O(n^2)$ [Oht 86].
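[Figure 4.13: Net-length estimation techniques: (a) semi-perimeter, (b) minimum spanning tree, (c) center of mass, (d) minimum Steiner tree, (e) source to sink, (f) complete graph.]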
Figure 4.14: Different ways to measure the distance between terminals A and B :
(a) Euclidean center to center
(b) Manhattan center to center
(c) Manhattan edge to edge
• (e) source to sink (Fig. 4.13(e)): in this technique, one source module is connected to
all other sink modules and the total length of these connections is used to approximate the
net-length. The complexity of this approach is $O(n)$.
• (f) complete graph (Fig. 4.13(f)): net-length is estimated using a complete graph connec-
tion of the centers of all terminals. The calculation complexity of this measure is $O(n^2)$.
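Two of the estimators listed above, the semi-perimeter and the minimum spanning tree, can be sketched on terminal center points. The choice of the Manhattan center-to-center distance and the function names are assumptions made for this example. The spanning tree is built with Prim's algorithm in its simple $O(n^2)$ form and also returns the tree edges, since these approximate the geometry of the net.

```python
import math


def semi_perimeter(points):
    """Half the perimeter of the bounding box of all terminal centers."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def minimum_spanning_tree(points):
    """Prim's algorithm; returns (total_length, list of (i, j) index pairs)."""
    n = len(points)
    if n < 2:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n     # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    total, edges = 0.0, []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to connect.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return total, edges
```

For a two-terminal net both estimators return the same length. They diverge as the number of terminals grows, because the semi-perimeter ignores terminals in the interior of the bounding box.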
Using a minimum spanning tree to approximate the geometry of each net and the interconnect
parasitic models described in chapter 2, the interconnect parasitics can be extracted from the
placement. This is illustrated in Fig. 4.15 with a fragment of a placement with five devices and
two nets.
Net $n_1$ connects terminals T1, T2, T3 and net $n_2$ connects terminals T4, T5. To compute
the parasitics, both nets are approximated by their minimum spanning trees: the minimum spanning
tree for $n_1$ consists of the paths T1 → T2 and T2 → T3, the one for $n_2$ consists of the path
T4 → T5. We define L(TX → TY) and W(TX → TY) as the length and the width of the
path TX → TY. L(TX → TY) can be extracted from the placement and W(TX → TY) can
be computed based on the current flowing into terminals TX and TY. The following parasitics
can now be computed:
$$C_x = C_{w,x} + C_{t,x} \tag{4.13}$$
where $C_{w,x}$ is the estimated capacitance of the wire to ground and $C_{t,x}$ is the sum of the
capacitances of the terminals which are connected to the net. $C_{w,x}$ can be calculated as the
total area of the net multiplied by $C_{av}$, the average capacitance to ground per unit area. $C_{av}$
is computed as a weighted average of the capacitance per unit area of the different routing
layers that are used in the process. Applied to nets $n_1$ and $n_2$, this gives:
$$C_{w,1} = \left[ L(T1 \to T2)\,W(T1 \to T2) + L(T2 \to T3)\,W(T2 \to T3) \right] C_{av} \tag{4.14}$$
$$C_{w,2} = \left[ L(T4 \to T5)\,W(T4 \to T5) \right] C_{av} \tag{4.15}$$
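The wire capacitance estimate of (4.13) to (4.15) can be sketched in a few lines. The function names, the path dimensions, the terminal capacitances and the value of the average capacitance per unit area below are all hypothetical illustration values, not data from the book.

```python
def wire_capacitance(paths, c_av):
    """C_w: total wire area (sum of L * W over the spanning tree paths) times C_av."""
    return sum(length * width for length, width in paths) * c_av


def net_capacitance(paths, terminal_caps, c_av):
    """C_x = C_w,x + C_t,x as in (4.13)."""
    return wire_capacitance(paths, c_av) + sum(terminal_caps)


# Hypothetical numbers: lengths and widths in um, C_av in fF/um^2, terminal caps in fF.
C_AV = 0.05
n1_paths = [(20.0, 1.0), (10.0, 2.0)]  # (L, W) of paths T1 -> T2 and T2 -> T3
n2_paths = [(15.0, 1.0)]               # (L, W) of path T4 -> T5

c_w1 = wire_capacitance(n1_paths, C_AV)                   # corresponds to (4.14)
c_w2 = wire_capacitance(n2_paths, C_AV)                   # corresponds to (4.15)
c_1 = net_capacitance(n1_paths, [1.5, 0.5, 1.0], C_AV)    # wire plus three terminals
print(c_w1, c_w2, c_1)
```

Because the path widths follow from the terminal currents, only the path lengths change from one intermediate placement to the next, so this estimate is cheap to re-evaluate.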
The total terminal capacitance $C_{t,x}$ can be computed as the sum of the capacitances of each
connecting terminal. For nets $n_1$ and $n_2$:
$$C_{xy} = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left[ L_{p,ij}\, C_c(D_{p,ij}) + C_{ov}(A_{ov,ij}) \right] \tag{4.18}$$
where $L_{p,ij}$ is the length that paths i and j run in parallel, $D_{p,ij}$ the distance between
their parallel segments, and $A_{ov,ij}$ the overlap area between them. $C_c(d)$ gives the average
coupling capacitance per unit length as a function of the distance d between two pieces and
$C_{ov}(a)$ gives the average coupling capacitance per unit area as a function of the overlap area
a. In Fig. 4.15, path T1 → T2 runs in parallel with path T4 → T5 over a distance $L_p$ and
a separation $D_p$, and path T2 → T3 overlaps path T4 → T5. The coupling capacitance
$C_{12}$ is thus given by:
where L is the estimated total net-length and $t_x$ the number of terminals connected to the
net. $W_{wire,i}$ is the width of the i-th wire segment, which can be calculated from the current
flowing through the i-th terminal. $\rho_{square,av}$ is a weighted average of the sheet resistances
of the routing layers which are used in the process.
Once the values of $C_x$ and $R_{x,i}$, $i = 1 \ldots n$, for every net and $C_{xy}$ between any two
nets are known, the performance degradation $\Delta P_{j,int}$ for the performance characteristic
$P_j$ due to interconnect parasitics can be determined using the precalculated sensitivity
information:
(4.22)
where m is the number of nodes minus the ground node and $t_k$ is the number of terminals of net
k. $S^{P_j}_{C_k} = \partial P_j / \partial C_k$, $S^{P_j}_{R_{k,i}} = \partial P_j / \partial R_{k,i}$ and
$S^{P_j}_{C_{ki}} = \partial P_j / \partial C_{ki}$ are the sensitivities of performance characteristic
$P_j$ to small changes in the parasitic capacitance $C_k$, the parasitic resistance $R_{k,i}$ and the
coupling capacitance $C_{ki}$, respectively. These sensitivities are determined in advance by simulation.
(4.23)
(4.24)
where $A_{VT0}$, $S_{VT0}$, $A_\beta$ and $S_\beta$ are constants depending on the process. The area of the
devices $WL$ and the distance $D$ are known for each intermediate placement. Based on this information, the
standard deviations of $V_{T0}$ and $\beta$ can be calculated with (4.23) and (4.24). Using predetermined
sensitivity information, the effect on the degradation of performance characteristic $P_j$ can be
estimated as follows:
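$$\Delta P_j = \sum_{k=1}^{m} \left( \left| S^{P_j}_{V_{T0},k} \right| \left( 3\sigma(V_{T0})_k \right) + \left| S^{P_j}_{\beta,k} \right| \left( 3\sigma(\beta)_k \right) \right) \tag{4.25}$$
where $S^{P_j}_{V_{T0},k} = \partial P_j / \partial \Delta V_{T0,k}$ and $S^{P_j}_{\beta,k} = \partial P_j / \partial \Delta\beta_k$
are the sensitivities of performance characteristic $P_j$ to small changes in $\Delta V_{T0}$ and $\Delta\beta$
of matching transistor pair k. The sum is taken over all m pairs of matching devices.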
Equation (4.25) can be rewritten as
(4.26)
$\Delta P_{j,area}$ represents the degradation of the j-th performance characteristic due to area effects. This
term can be computed after sizing and remains constant during placement.
$\Delta P_{j,distance}$ represents the degradation of the j-th performance characteristic due to distance
effects and therefore depends on the actual layout. This term can be computed as follows:
(4.27)
where $D_k$ represents the distance between the transistors of matching pair k and $S^{P_j}_{D_k}$ is the
sensitivity of performance characteristic $P_j$ to small variations in distance $D_k$. This term must be
recomputed for every new placement.
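$$T(x,y) = \frac{4Q}{k_N} \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} \Gamma_N(n,m) \cdot \frac{\sin\frac{n\pi w}{L_x}}{(1+\delta_{n0})\,n\pi} \cdot \frac{\sin\frac{m\pi h}{L_y}}{(1+\delta_{m0})\,m\pi} \cdot \cos\frac{n\pi x}{L_x} \cdot \cos\frac{m\pi y}{L_y} \tag{4.28}$$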
Expression (4.28) cannot be used for repeated thermal analysis of a placement for two reasons.
First, it is too expensive to evaluate. The required number of terms in the series for a designated
accuracy is directly proportional to the ratio of the chip to source size [Lee 89]. For the analysis
of structures with large chip-to-source size ratios, a considerable amount of CPU time is needed
Given the thermal model and the chip dimensions, the first step consists of computing a thermal
model for a unit source by evaluating series (4.28) on a grid of points $(x_i, y_j)$,
$i = 0, \ldots, P-1$, $j = 0, \ldots, Q-1$. As discussed above, the required number of terms for each evaluation point
is proportional to the chip to source ratio and can be fairly high. A direct computation of series
(4.28) requires a considerable amount of CPU time, typically 30 sec for a two layer thermal
model. Since this summation has to be repeated for each evaluation point, the computational
burden of this step is very high.
The computational complexity of evaluating (4.28) can be reduced using the Discrete Cosine
Transform (DCT), a derivative of the Fast Fourier Transform (FFT) [Ghar 95a, Ghar 95b]. The
DCT of a two-dimensional series $k_{nm}$ is defined as:
(4.29)
By comparing expressions (4.29) and (4.30), it is easy to see that the temperature of a grid of
points can be determined from the DCT of the series (4.31). To determine the temperature of a
point (x, y), $x/L_x$ and $y/L_y$ are expressed as integer ratios $p/P$ and $q/Q$. The temperature is then given
by the element $K_{pq}$ of the DCT of (4.31). P and Q are chosen as the number of discretizations
in the x and y coordinates respectively. Once the DCT of $k_{nm}$ is computed, it can be stored as a
matrix and reused during module thermal profile computation.
Figure 4.16: A resistor with resistive area fractured into unit areas.
(4.32)
Studies have shown that, when the heat source edges are at least one structure thickness away
from the boundaries of the rectangular structure, the thermal profiles are only weakly affected by the
boundaries and thus the boundaries can be assumed to extend to infinity [Lee 88].
$$\Delta int_{x,d} = \sum_{i=1}^{N} \left( k_{1,i} - k_{2,i} \sqrt{(x_e - x_{c,i})^2 + (y_e - y_{c,i})^2} \right) \tag{4.35}$$
In this equation, N is the number of nets in the circuit, $(x_e, y_e)$ is the center of the edge for
which the interconnect area is estimated, and $(x_{c,i}, y_{c,i})$ is the center of the bounding box of the
terminals that connect to net i. $k_{1,i}$ and $k_{2,i}$ are constants depending on the estimated wire width
for net i, which can be computed based on the current flowing through the net. Although the
computational cost of the second approach is somewhat higher than that of the first, the results
obtained with this interconnect area estimation technique are consistently and significantly better.
4.11 Annealing Schedule
where R is the number of possible placement configurations. Each transition probability is
defined as the product of a generation probability $G_{ij}(T)$ and an acceptance probability $A_{ij}(T)$.
The simulated annealing algorithm used in our placement tool is of the homogeneous
type [Laar 87]: it is a sequence of homogeneous Markov chains, each generated at a fixed
value of T, and T is decreased in between subsequent Markov chains. It is shown in [Laar 87]
that a homogeneous simulated annealing algorithm obtains a global optimum if
3. $\lim_{l \to \infty} T_l = 0$,
where $T_l$ is the temperature of the l-th Markov chain.
In a practical implementation of the algorithm, this asymptotic convergence can only be
approximated. The number of transitions for each temperature $T_l$ must be finite, and $\lim_{l \to \infty} T_l = 0$
can only be approximated in a finite number of temperatures $T_l$. Therefore, each implementation
of a simulated annealing algorithm involves a trade-off between speed of execution and quality
of the final solution. This speed/quality trade-off is influenced by the choice of a number of
parameters which are together referred to as the cooling schedule of the algorithm. The cooling
schedule used in our placement tool will be discussed next.
The first parameter that plays a role is the initial temperature $T_0$. The value of $T_0$ is crucial
for the efficiency of the algorithm. If $T_0$ is too high, too much CPU time is wasted exploring
the configuration space in the initial phase of the algorithm. If $T_0$ is too low, there is a risk of
getting stuck in a local minimum. In our algorithm, the initial temperature is chosen such that
the average increase in cost $\overline{\Delta C^{+}}$ is accepted with a certain probability $P_0$ at $T_0$ [Otten 84]. To
achieve this, a number of random moves are executed at the start of the algorithm and the value
of $\overline{\Delta C^{+}}$ is measured. $T_0$ is then solved from
$$P_0 = \exp\left( -\frac{\overline{\Delta C^{+}}}{T_0} \right). \tag{4.37}$$
This leads to the following value for $T_0$:
$$T_0 = -\frac{\overline{\Delta C^{+}}}{\ln(P_0)}. \tag{4.38}$$
The default value for $P_0$ in our algorithm is 0.6. This default value can be overridden by the user.
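A minimal sketch of this initialization is given below. It assumes that the cost increases observed during the random trial moves have already been collected in a list; the function name and the sample values are hypothetical.

```python
import math


def initial_temperature(cost_increases, p0=0.6):
    """Solve (4.37) for T0 given the measured cost increases of random moves.

    cost_increases: positive cost changes observed during the random trial moves.
    p0: desired acceptance probability of an average cost increase at T0.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if not cost_increases:
        raise ValueError("at least one cost-increasing move is required")
    mean_increase = sum(cost_increases) / len(cost_increases)
    # ln(p0) is negative for p0 < 1, so the resulting temperature is positive
    return -mean_increase / math.log(p0)


t0 = initial_temperature([1.0, 2.0, 3.0])   # hypothetical measured increases
print(t0, math.exp(-2.0 / t0))              # the second value reproduces p0
```

A larger $P_0$ gives a higher starting temperature and therefore a longer initial exploration phase.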
The second important parameter is the final value of the temperature, $T_f$. $T_f$ is determined by
the stopping criterion of the algorithm. In our implementation, $T_f$ is selected by terminating the
execution of the algorithm if the costs of the last configurations of consecutive Markov chains are
within a specified interval for a number of chains. The width of the interval and the number of
chains can be specified by the user.
The third part of the cooling schedule is the selection of the length $L_k$ of each Markov chain k
and the transformation rule for changing $T_k$ into $T_{k+1}$. These two decisions are related through the
concept of the stationary distribution of a Markov chain. The stationary distribution of a Markov
chain is the probability distribution of the configurations after an infinite number of transitions.
This distribution is characterized by a vector q, whose i-th component gives the probability of
the system to be in state i after an infinite number of transitions. For simulated annealing, q
depends on the temperature $T_k$ and is given by [Laar 87]:
(4.39)
where C(i) is the cost of the i-th configuration, Copt is the optimal cost and R is the number
of possible configurations. The Markov chain at Tk can be stopped if the Markov chain is in
quasi-equilibrium, i.e. if the probability distribution of the configurations is "close enough" to
the stationary distribution (4.39). Determining a rule for the length Lk of a Markov chain at tem-
perature Tk comes down to defining the exact meaning of "close enough" and hence to determine
when the chain is in quasi-equilibrium. In our cooling schedule we use the criterion proposed
in [Cath 88]. This quasi-equilibrium rule is based on the convergence of the denominator of
(4.39). While building the chain, an estimate of this denominator is gradually updated with the
contribution of the configurations which are accessed up to that moment. The chain is stopped
when the new contributions are not changing the average value anymore.
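The following sketch illustrates the idea under explicit assumptions. The first function uses the Boltzmann form that is commonly given for the stationary distribution of simulated annealing; it is assumed here to correspond to (4.39). The second function is only one possible reading of a convergence-based stopping rule and is not the exact criterion of [Cath 88]; its parameters `tol` and `window` are hypothetical.

```python
import math


def stationary_distribution(costs, temperature):
    """Boltzmann distribution over a (small, enumerable) set of configurations."""
    c_opt = min(costs)
    weights = [math.exp(-(c - c_opt) / temperature) for c in costs]
    z = sum(weights)  # the denominator whose convergence is monitored
    return [w / z for w in weights]


def chain_length_until_stable(cost_stream, temperature, c_ref, tol=1e-3, window=50):
    """Number of transitions after which the running average contribution settles.

    The chain is considered to be in quasi-equilibrium once the running average
    of exp(-(C - c_ref) / T) has changed by less than `tol` (relative) for
    `window` consecutive transitions. Returns None if that never happens.
    """
    total, prev_avg, stable = 0.0, None, 0
    for n, cost in enumerate(cost_stream, start=1):
        total += math.exp(-(cost - c_ref) / temperature)
        avg = total / n
        if prev_avg is not None and abs(avg - prev_avg) <= tol * abs(prev_avg):
            stable += 1
            if stable >= window:
                return n
        else:
            stable = 0
        prev_avg = avg
    return None


q = stationary_distribution([1.0, 2.0, 3.0], temperature=1.0)
print(q)  # the cheapest configuration has the highest probability
```

At high temperatures the distribution is nearly uniform, at low temperatures it concentrates on the cheapest configurations, which is why the chain length needed to approach it varies with $T_k$.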
The rule to update the temperature $T_k$ to the next temperature $T_{k+1}$ is related to the length of
the Markov chain at temperature $T_k$. The ratio between the old and the new temperature will be
denoted as α:
(4.40)
where α is varied between 0.95 and 0.8. A long Markov chain length at temperature $T_k$ can be
seen as an indication that the simulated annealing algorithm is in a critical region and hence that
a high value of α should be used. A short Markov chain length indicates a less critical region and
justifies the use of a smaller α value. The details of this algorithm can be found in [Cath 88].

4.12 Experimental Results

Table 4.1: Comparator performance.
Performance      Spec   Plac 1   Plac 2   Unit
offset voltage   < 5    3.7      6.9      mV
delay            < 5    2.8      5.4      nsec
4.12.1 Comparator
The first example is a high-speed CMOS comparator [Steyaert 93]. The circuit is used in a
CMOS A/D converter and its performance is a limiting factor for the performance of the overall
A/D converter. The specifications imposed upon the circuit are a propagation delay of less
than 5 nsec and an offset voltage of less than 5 mV. The circuit schematic is shown in Fig. 4.17.
To demonstrate the effectiveness of our direct performance-driven approach we have generated
two placements for this circuit. Placement 1 (see Fig. 4.18) was generated with the presented
performance-driven placement tool, while placement 2 (see Fig. 4.19) was generated in the tradi-
tional way, with the same placement tool but with the performance-driven mechanism disabled. It
can be seen from Table 4.1 that the simulated performance of placement 1 is significantly better
than that of placement 2. Placement 1 has both performance characteristics within the user-
specified ranges, while for placement 2 both specifications are violated. The optimized distances
between the matching transistor pairs together with the resulting offset voltage degradation due
to distance effects are shown in Table 4.2 for placement 1. The nominal values are the values
obtained after sizing of the circuit without parasitic layout effects (no parasitic node capacitances
and no mismatch). It can be seen that the performance-driven algorithm selectively minimizes
the distances for the most sensitive transistor pairs, which results in a lower offset voltage. CPU
times were 106 and 93 seconds for placement 1 and 2, respectively, on a SUN SPARC 10 work-
station, which means that the performance-driven mechanism significantly improves the circuit
performance at only a small increase in CPU time.
Figure 4.18: Comparator: performance driven placement.
4.12.2 Opamp1
The second example is a high-speed CMOS operational amplifier [Fisher 87]. The schematic of
the circuit is shown in Fig. 4.20. The placement that was generated for this circuit is shown in
Fig. 4.21. The symmetry axis of the circuit is clearly visible in the placement. This circuit con-
tains some very large transistors which have been split into parallel parts for reasons explained
in chapter 3. As shown in Table 4.3 all performance characteristics for this circuit are within the
specifications. The nominal values are the values obtained after sizing of the circuit without par-
asitic layout effects (no parasitic wire capacitances and no mismatches). All specifications were
met in one pass of the program (CPU time 83 seconds). Note that the remaining performance
margins after placement are needed for the subsequent routing phase (some parasitics may be
different than estimated during placement).
Table 4.3: Opamp performance.
Performance   Specification   Nominal Value   After Placement   Unit
GBW           > 225           228             226               MHz
Av            > 60            67              67                dB
PM            > 60            61              60.2              deg
slew-rate     > 150           163             163               V/µsec
Voffset       < 5             4.5             4.8               mV
Table 4.4: Opamp performance.
Performance    Specification   Nominal Value   After Placement   Unit
Av             > 100           107             104               dB
GBW            > 200           205             202               MHz
PM             > 70            77              74                deg
CMRR @ 10 Hz   > 70            ∞               78                dB
PSRR @ 10 Hz   > 80            ∞               86                dB
4.12.3 Opamp2
As a third example, a fully differential CMOS operational amplifier [Peeters 93] (see Fig. 4.22)
was used to test the efficiency of the algorithm for larger circuits. The placement of the opamp
is shown in Fig. 4.23. Note the clear symmetry axis in this fully differential circuit. The circuit
specifications together with the obtained performances after sizing (without parasitics) and after
placement are given in Table 4.4. The degradation of all performances clearly remains within the
specified margins. This placement required a CPU time of 163 seconds (less than 3 minutes) on
a SUN SPARC 10 workstation.
4.12.4 Opamp3
Finally, we will illustrate the thermal capabilities of the tool with the class AB operational
amplifier shown in Fig. 4.24. The power consumption of the circuit is 140 mW, of which most is
dissipated in the output stage M14, M15, M16, M17. An offset voltage of less than 5 mV is
specified.
During circuit analysis, a number of simulations with different device temperatures were
done to determine the sensitivity of the offset voltage of the opamp to temperature differences
between the matching transistor pairs of the circuit. The circuit was then automatically placed
with the algorithm described above and the resulting placement is shown in Fig. 4.25. The
thermal profile of the placement is given in Fig. 4.26. Table 4.5 shows the performance
characteristics of the operational amplifier. For each performance characteristic, the specification,
nominal value and simulated value after placement are given. The nominal value of a characteristic
is determined by simulating the circuit without parasitics. The value of a characteristic after
placement is determined by simulating the circuit with the parasitic effects. Since the real values
of interconnect parasitics are unknown, these values have to be estimated. It can be concluded
from Table 4.5 that all performance characteristics are within the specifications after placement.
Table 4.6 explains the result for the offset voltage in more detail. This table shows the sensitivi-
ties, the temperature and the resulting offset voltage degradation for the most sensitive transistor
pairs in the circuit. It can be seen from the layout that the sensitive transistor pairs have been
placed symmetrically with respect to the output transistors. Note that the thermal profile calcu-
lation can also be used interactively, for example when a manually generated layout is given and
the designer wants to explore the impact of thermal effects.
Routing
5.1 Introduction
In this chapter, we discuss the routing problem for high-performance analog circuits. The routing
phase is critical for the overall performance of the circuit, since it fixes the final values of the
interconnect parasitics. While the placement phase has taken into account the effect on the
performance of the minimum values for the interconnect parasitics, their real value is determined
during routing. Therefore, the main concern during performance driven routing is to connect all
wires while limiting the performance degradation introduced by the actual interconnect parasitics
within the specifications of the user.
• performance constraints
Each wire that is implemented by the router introduces a parasitic series resistance and a
parasitic capacitance to every other conductor in the circuit. We will use an RC model
for the interconnect and ignore inductive effects (see chapter 2). These elements introduce
parasitic loading and coupling effects into the circuit. Routing has to be done such that
the combined influence of these interconnect parasitics on the performance of the circuit
remains within the specifications imposed by the designer.
• symmetry constraints
For the differential signal paths in an analog circuit, the matching between parasitics on
symmetric nodes is often more important than their absolute values. To match the parasitics
of two nets, they have to be routed symmetrically, even if the placement is not completely
symmetrical.
• yield/testability effects
The layout of a circuit has a profound impact on the likelihood of certain faults and fault
types [Maly 90]. By careful routing, it is possible to reduce the total expected number of
faults and hence to improve the yield of a circuit. Moreover, it is also possible to improve
the testability of a circuit by decreasing the probability of occurrence of hard to detect
faults. A manufacturability driven routing algorithm minimizes the defect sensitivity and
increases the testability of an analog circuit, while bounding the performance degradation
within the allowed margins.
• n-layer routing
State-of-the-art technology processes have 6 or more metal layers. An analog routing
algorithm has to take advantage of all available routing layers to improve the performance
of the circuit.
avoid blocking situations) and to facilitate the computations. On the other hand, it also restricts
the number of solutions that can be reached and may lead to sub-optimal solutions.
Generally speaking, reserved layer and gridded routing models reduce the complexity of
the routing problem and will result in faster routers. However, to achieve their efficiency, they
model the routing problem in a way that is too restrictive for analog circuit level layout. In
addition, analog circuit level layout is usually of a much smaller complexity than general IC
layout. Therefore, we have chosen a grid-less, unreserved layer model for our router.
• Depth-First Search (DFS) In a DFS strategy, a new solution is generated from the most
recently generated solution. A partial solution is expanded as deeply as possible until the
target is reached or no further expansion is possible. In the latter case, the algorithm tracks
back to the previous partial solution and restarts the expansion process from there. The
order of exploring solutions in DFS is last-in-first-out.
• Breadth-First Search (BrFS) In a BrFS strategy, all solutions on the same level are ex-
plored before any other solution is generated. If the target is not reached for any solution
of expansion level n, the algorithm proceeds by generating solutions of level n + 1. The
order of exploring solutions in BrFS is first-in-first-out.
• Best-First Search (BeFS) The basic idea of the BeFS strategy is to expand new solutions
from the current solution with the best cost value. The advantage of this search strategy
is that if the target is reached, we can be sure that the path found is the minimal cost
path, since all other partial solutions visited have greater cost. BeFS can show a dramatic
improvement in time and space efficiency over blind searches such as DFS and BrFS.
• Heuristic Search (HS) BeFS relies on historical information to predict which partial
solutions are the most likely to be on a minimal cost path. The ideal algorithm would
operate on perfect information, thereby always choosing the correct solution to expand at
each stage of the search. This is impossible, since the solution is not known. It is however
possible to use heuristics to predict the solution which is most likely on a minimal cost
path. In HS, this solution is selected for further expansion. An example of a heuristic
search algorithm is the A* algorithm [Nils 71].
In our routing tool we have chosen a heuristic search strategy with a special heuristic that
targets the expansion towards the routing solution that is best for the circuit performance. This
will be explained in detail in section 5.7.
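To make the difference between blind and heuristic search concrete, the sketch below implements A* on a small routing grid. It is only an illustration: the grid, the unit step cost and the Manhattan distance heuristic are assumptions made here for brevity, whereas the routing tool itself is grid-less and uses a performance-related heuristic.

```python
import heapq


def a_star(grid, start, goal):
    """Cost of a cheapest 4-connected path from start to goal, or None.

    grid[r][c] == 1 marks an obstacle; every step costs 1. The priority of a
    partial solution is its cost so far plus a lower bound on the remaining cost.
    """
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route to this node was found
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))  # prints: 8
```

Setting the heuristic to zero turns this into a best-first search on the accumulated cost alone, which still finds a minimal cost path but typically expands more partial solutions.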
• Lee's algorithm [Lee 61] In this algorithm, the search is conducted symmetrically in every
direction, using the breadth-first search technique. This can be seen as a wave propagating
from the source until the target is reached. This algorithm guarantees finding the shortest
path between two terminals if one exists.
• Soukup's algorithm [Souk 78] Soukup's algorithm uses a depth-first search until an ob-
stacle is encountered. If an obstacle is encountered, a breadth-first search method is used
to get around it. The running time of Soukup's algorithm is usually better than Lee's, but
this algorithm does not guarantee to find the shortest path.
• Hadlock's algorithm [Had 75] Hadlock's algorithm uses the A* heuristic search method
to bias the direction of the search toward the target. Hadlock's algorithm is usually faster
than Lee's and Soukup's and guarantees finding a shortest path between the terminals if
one exists.
Several extensions of these algorithms have been proposed to deal with multi-terminal nets
and multi-layer routing models.
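The wave propagation of Lee's algorithm can be sketched as a breadth-first search followed by a back trace. The grid encoding below (0 for free cells, 1 for obstacles) and the function name are illustrative assumptions.

```python
from collections import deque


def lee_route(grid, source, target):
    """Shortest 4-connected path on a grid as a list of cells, or None."""
    def neighbors(cell):
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                yield (nr, nc)

    # Wave expansion: label every reachable cell with its distance to the source.
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            break
        for nxt in neighbors(cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                frontier.append(nxt)
    if target not in dist:
        return None

    # Back trace: walk from the target along strictly decreasing labels.
    path = [target]
    while path[-1] != source:
        here = path[-1]
        path.append(next(n for n in neighbors(here) if dist.get(n) == dist[here] - 1))
    path.reverse()
    return path


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(lee_route(grid, (0, 0), (2, 0)))
```

The labels grow by one per wave front, so the first time the target is labeled its distance is minimal, which is the source of the shortest path guarantee mentioned above.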
• Mikami-Tabuchi's algorithm [Mika 68] In this algorithm, new trial lines are generated
by drawing perpendicular line segments through every grid point of the current trial line.
This search process is similar to breadth-first search and is guaranteed to find a path if one
exists. However, the path may not be the shortest one.
• Hightower's algorithm [High 69] In Hightower's algorithm, only one new trial line is
generated from a previous one. The new trial line is drawn such that it avoids the obstacle
that blocked the current trial line. [High 69] describes three different procedures to avoid
different types of obstacles. Although this search strategy performs very well in most
practical cases, it cannot guarantee that a path will be found when it exists.
pushed onto a stack. In the next step, the generated active lines are expanded outside the zone
for further search. This procedure is initiated from the source terminal as well as from the target
terminal. A sequence of expansion zones is then generated from each of the terminals and the
execution of the algorithm is terminated if a zone originating from the source terminals meets a
zone originating from the target terminal. A back trace is then executed to find the connecting
path. This algorithm guarantees to find a path if one exists. This may not be the minimum cost
path.
5.5.4 Discussion
The area routing algorithms presented in this section are grid-based in their original formulations.
As discussed in the previous section, the grid-based routing model is a disadvantage in the con-
text of analog routing since it puts a severe and intolerable limitation on the possible solutions.
However, the basic concepts of the maze routing, line-search and line-expansion algorithms can
easily be generalized. The notion of expanding partial paths to neighbors is not restricted to rect-
angular grids but can be extended to any routing model. In a grid-less area router, partial paths
are represented by their actual geometry and a general expansion process is defined to expand a
partial path into its neighbors. If a grid-less routing model is used, the distinction between maze
routing and line-search (or line-expansion) becomes less important. If a unit length expansion
is used, the resulting routing algorithm can be called a grid-less maze router. If paths are ex-
panded over longer distances or until they hit an obstacle, the term grid-less line-search router
might be appropriate. The basic path finding mechanism used in our routing tool is a grid-less
area routing algorithm that combines features of the different area routing algorithms presented
above. The algorithm is similar to the ones presented in [Sato 87, Marg 87, Arno 88, Cohn 91].
Following [Marg 87] we will call our algorithm a grid-less maze routing algorithm, although it
combines elements from the maze routing and line-search routing techniques. The algorithm will
be discussed in detail in the next section.
1. Out of the collection of partially completed paths, one path is selected for expansion. This
path selection process is based on the A* search strategy. The path selection process and
the cost function will be discussed in section 5.7.
2. The partial path selected in step 1 is expanded into a collection of new partial paths using
the expansion mechanism discussed in section 5.6.3.
3. The partial paths generated in step 2 are checked to see if they are design rule correct, i.e.
if they do not overlap with the devices or with previously routed wires.
4. The design rule correct paths are checked again to see if they overlap with the target
terminal. If they do, the connection is complete and the iterative procedure can be terminated. If
the target has not been reached yet, the new partial paths are inserted into the partial path
collection and they become candidates for further expansion, together with all the partial
paths generated during previous iterations.
In the following sections, we will discuss the routing model and the path expansion steps in
detail.
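The four steps above can be summarized in a short skeleton. It is a sketch only: the callbacks `expand`, `is_drc_clean`, `reaches_target` and `cost` are hypothetical placeholders for the mechanisms described in the following sections, and the one-dimensional toy problem at the end merely exercises the loop.

```python
import heapq
from itertools import count


def route(source_paths, expand, is_drc_clean, reaches_target, cost):
    """Iterative path finding over a heap of partial paths; returns a path or None."""
    tie = count()  # tie-breaker so that the heap never has to compare path objects
    heap = [(cost(p), next(tie), p) for p in source_paths]
    heapq.heapify(heap)
    while heap:
        _, _, path = heapq.heappop(heap)       # step 1: select the cheapest partial path
        for new_path in expand(path):          # step 2: expand it into new partial paths
            if not is_drc_clean(new_path):     # step 3: drop design rule violations
                continue
            if reaches_target(new_path):       # step 4: stop when the target is reached
                return new_path
            heapq.heappush(heap, (cost(new_path), next(tie), new_path))
    return None


# Toy problem: positions on a line, position -1 is blocked, the target is at 3.
TARGET, BLOCKED = 3, {-1}
found = route(
    source_paths=[(0,)],
    expand=lambda p: [p + (p[-1] + 1,), p + (p[-1] - 1,)],
    is_drc_clean=lambda p: p[-1] not in BLOCKED,
    reaches_target=lambda p: p[-1] == TARGET,
    cost=lambda p: len(p) + abs(TARGET - p[-1]),  # path length plus remaining distance
)
print(found)  # prints: (0, 1, 2, 3)
```

Keeping the partial paths in a heap makes the selection in step 1 logarithmic in the number of candidates, which matters because the collection grows with every iteration.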
Figure 5.4: Layout representations: (a) bin based (b) corner stitched.
access operations are executed thousands of times, an efficient data-structure, which allows fast
area search, is needed. Several data-structures have been proposed to store a collection of
layout shapes :
• linked list The simplest data structure used to store a collection of shapes is a linked list,
where each list element represents a shape. The complexity of the area search operation
for a linked list data-structure is O(n) where n is the number of shapes.
• bin based (see Fig. 5.4(a)) A bin based data structure can be seen as an augmented version
of the linked list. A virtual grid is superimposed on the layout area. This grid divides the
area into a series of bins which can be represented using a two-dimensional array. For each
bin, a list of shapes intersecting it is stored. The worst case complexity for the area search
operation is O(b + n) where b is the number of bins. However, in practical situations, the
effective complexity is a lot better than O(b + n).
• corner stitching data-structures (tile planes, see Fig. 5.4(b)) [Oust 84] In corner
stitching, shapes and empty space are represented by non-overlapping, rectangular tiles. Tiles
are linked to their neighbors at their lower left and upper right corners by pointers, called
corner stitches. Corner stitching differs from linked lists and bin based data-structures in
that empty space is represented explicitly by space tiles. The worst case complexity of the
area search operation is O(n), but in practical situations, the area search operation is very
fast, especially if so-called hint tiles are used.
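As an illustration of the bin based representation, the sketch below stores rectangles in a two-dimensional array of bins and answers area search queries by visiting only the bins that the query area touches. The class name, the tuple layout (x0, y0, x1, y1) and the integer coordinates are assumptions made for this example; corner stitching itself is considerably more involved and is not shown.

```python
class BinGrid:
    """Bin based shape store with integer coordinates (hypothetical sketch)."""

    def __init__(self, width, height, bin_size):
        self.bin_size = bin_size
        self.nx = -(-width // bin_size)   # ceiling division
        self.ny = -(-height // bin_size)
        self.bins = [[[] for _ in range(self.ny)] for _ in range(self.nx)]

    def _bins_for(self, rect):
        """Indices of all bins touched by rect, clamped to the layout area."""
        x0, y0, x1, y1 = rect
        bx0 = max(0, x0 // self.bin_size)
        by0 = max(0, y0 // self.bin_size)
        bx1 = min(self.nx - 1, x1 // self.bin_size)
        by1 = min(self.ny - 1, y1 // self.bin_size)
        for i in range(bx0, bx1 + 1):
            for j in range(by0, by1 + 1):
                yield i, j

    def insert(self, rect):
        """Register rect in the list of every bin it intersects."""
        for i, j in self._bins_for(rect):
            self.bins[i][j].append(rect)

    def search(self, area):
        """Return the set of stored rectangles that overlap the query area."""
        ax0, ay0, ax1, ay1 = area
        found = set()
        for i, j in self._bins_for(area):
            for rect in self.bins[i][j]:
                x0, y0, x1, y1 = rect
                if x0 < ax1 and ax0 < x1 and y0 < ay1 and ay0 < y1:
                    found.add(rect)
        return found


layout = BinGrid(width=100, height=100, bin_size=10)
layout.insert((5, 5, 25, 15))
layout.insert((60, 60, 70, 70))
print(layout.search((0, 0, 30, 30)))  # prints: {(5, 5, 25, 15)}
```

A shape that spans several bins is stored once per bin, so the result is collected in a set to avoid reporting it more than once.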
Tile planes perform very well for area searches in practice, despite their theoretical worst-case
running time that is proportional to the number of rectangles in the database. This makes them
very attractive for use in grid-less routing algorithms and several area routers based on tile planes
have been reported [Marg 87, Cohn 91].
In our routing tool, we use a tile based area representation with two separate tile planes for
each routing layer. Prior to routing, all features on a certain layer are inserted in two separate
tile planes, one organized in maximal vertical strips and one organized in maximal horizontal
strips. For each routing layer, the user can define additional areas where routing is not allowed.
These areas are also stored in the tile planes for the routing layer. Vias are duplicated in the tile
planes of the two routing layers that they connect. The dual corner stitching representation for
each layer speeds up the path expansion and the DRC steps during routing.
Figure 5.6: Reconstructing a partial path from its routing cell representation.
connecting terminals T1 and T2 is represented by the four routing cells (c1, c2, c3, c4). Cells c1
and c4 are regular cells, c2 and c3 are via cells. In the remainder of this chapter, we will represent
regular routing cells by circular symbols and via cells by square symbols on their center point.
Fig. 5.6 shows how a partial path is reconstructed from its routing cell representation: starting
from the head cell, we follow back trace pointers until the path changes direction or layer. The
bounding box of all cells encountered during this process is added to the path layout. The last
cell encountered is now used as a head cell and the process is repeated until the source cell is
encountered. When a via cell is encountered, a physical via is constructed and added to the path
layout. Note that routing cells are allowed to overlap each other.
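The reconstruction procedure can be illustrated with a short Python sketch. It is not the LAYLA implementation (which is written in C++); the `RoutingCell` class, the box convention `(xmin, ymin, xmax, ymax)` and the convention that a via cell stores its lower layer are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class RoutingCell:
    box: Box
    layer: int                     # for a via cell: the lower of its two layers (assumption)
    is_via: bool = False
    parent: Optional["RoutingCell"] = None  # back-trace pointer

def center(cell):
    xmin, ymin, xmax, ymax = cell.box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)

def axis_of(a, b):
    """'h' if the step from a to b is horizontal, otherwise 'v'."""
    return "h" if center(a)[0] != center(b)[0] else "v"

def bounding_box(cells):
    return (min(c.box[0] for c in cells), min(c.box[1] for c in cells),
            max(c.box[2] for c in cells), max(c.box[3] for c in cells))

def reconstruct(head):
    """Follow back-trace pointers from the head cell and return layout shapes."""
    shapes = []
    cell = head
    while cell is not None:
        if cell.is_via:
            shapes.append(("via", cell.layer, cell.box))  # physical via
            cell = cell.parent
            continue
        run, axis, nxt, turned = [cell], None, cell.parent, False
        # Collect cells until the path changes direction or layer.
        while nxt is not None and not nxt.is_via and nxt.layer == cell.layer:
            step = axis_of(run[-1], nxt)
            if axis is None:
                axis = step
            elif step != axis:
                turned = True
                break
            run.append(nxt)
            nxt = nxt.parent
        shapes.append(("wire", cell.layer, bounding_box(run)))
        # After a turn, the last cell of the run becomes the new head cell.
        cell = run[-1] if turned else nxt
    return shapes
```

Each straight run of cells collapses into a single rectangle, so an L-shaped path of four cells yields two wire boxes that share the corner cell.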
terminal or a partially connected subnet resulting from previous routing steps. In the latter case,
the source region consists of a number of device terminals connected with a partially routed net.
Before the path finding algorithm can start, this region has to be transformed into a collection of
routing cells, suitable for further expansion by the algorithm. For each source box, source cells
are generated on the same routing layer of the box, and on the next higher and next lower layer
if the box dimensions allow the insertion of via cells. This process is depicted in Fig. 5.7.
Source cells on the same layer are created on the sides of the box that are adjacent to routable
area. To determine the location of the routing cells, the source box is shrunk by half the cell
width. Routing cell centers are placed on the perimeter of the shrunk box, at fixed intervals given
by the minimum resolution of the technology process (see Fig. 5.7(a)). Note that all source cells
overlap the box from which they originate.
If the dimensions of the box allow the insertion of a via, via cells connecting to the next higher
and lower layers are also created and inserted. The centers of the via cells are again determined
by shrinking the source box by half the via cell width and placing center points at fixed intervals
given by the process resolution (see Fig. 5.7(b) and (c)).
After source region expansion, the cost of each cell is computed using the cost function de-
scribed in section 5.7 and the cells are inserted into the heap as candidates for further expansion.
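The placement of source cell centers can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the box convention `(xmin, ymin, xmax, ymax)` are hypothetical, centers are generated on all four sides (the check for adjacent routable area is omitted), and the shrunk box dimensions are assumed to be multiples of the resolution.

```python
def source_cell_centers(box, cell_width, resolution):
    """Return cell center points on the perimeter of the shrunk source box."""
    xmin, ymin, xmax, ymax = box
    half = cell_width / 2
    # Shrink the source box by half the cell width on every side.
    x0, y0, x1, y1 = xmin + half, ymin + half, xmax - half, ymax - half
    if x0 > x1 or y0 > y1:
        return []  # box too small to hold a cell of this width
    nx = int((x1 - x0) // resolution)
    ny = int((y1 - y0) // resolution)
    xs = [x0 + i * resolution for i in range(nx + 1)]
    ys = [y0 + j * resolution for j in range(ny + 1)]
    points = set()
    for x in xs:              # bottom and top edges
        points.add((x, y0))
        points.add((x, y1))
    for y in ys:              # left and right edges
        points.add((x0, y))
        points.add((x1, y))
    return sorted(points)
```

The same routine could be reused for via cells by passing the via cell width instead of the regular cell width.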
The width of a routing cell depends on the current that is flowing in the wire that is to be
routed. Let $I$ be the current of the wire, then the width $W_i$ of a regular routing cell on layer $i$ is
given by:
$$W_i = \frac{I}{I_{max,i}}, \qquad (5.1)$$
where $I_{max,i}$ is the maximum current density of layer $i$, expressed in Ampere per meter. For a via
cell connecting layers $i$ and $j$, the computation has to be done based on the maximum current in
a via. The necessary number of vias $N_{via}$ can be computed as follows:
$$N_{via} = \frac{I}{I_{max,ij}}, \qquad (5.2)$$
where $I_{max,ij}$ is the maximum current in a via connecting layers $i$ and $j$. The width $W_{ij}$ of the
via cell is then given by:
where ceil(x) denotes the smallest integer greater than or equal to x and max(x, y) the maxi-
mum of x and y. viaSeparationij and viaOverlapi are technology constants specifying the
minimum separation between vias connecting i and j and the minimum layer i overlap of a via,
respectively. Note that the wire width is a property of a wire segment, not of the entire net.
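As a small worked example of the current-based sizing, the sketch below evaluates the wire width and the via count. The function names, the optional `min_width` floor and the rounding of the via count up to a whole number (at least one) are assumptions for illustration; the via cell width formula itself is not reproduced here.

```python
import math

def wire_width(current, max_density, min_width=0.0):
    """Width of a regular routing cell: current divided by the layer's
    maximum current density (A/m). min_width is a hypothetical floor."""
    return max(current / max_density, min_width)

def via_count(current, max_via_current):
    """Number of vias needed to carry the current, rounded up (assumption)."""
    return max(1, math.ceil(current / max_via_current))
```

For instance, a 2 mA wire on a layer that tolerates 1000 A/m needs a 2 µm wide cell, and a 2.5 mA connection through vias rated at 1 mA needs three vias.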
cell, this can be done by popping the cheapest routing cell from the heap and deriving a set of
new routing cells from it. Regular routing cells are expanded into cells on the same layer and
into one or more via cells. A via cell is expanded into regular cells on one of its layers and into
one or more vias.
The expansion process for regular routing cells is illustrated in Fig. 5.8. Without loss of
generality, we assume that the current routing cell was expanded from the bottom direction.
Original cells in Fig. 5.8 and 5.9 are shown in thin lines, new cells in thick lines. For each
expansion type, only one new cell is drawn, others are represented by their center points. New
cells on the same layer are created by translating the original cell in all three non-backward
directions as shown in Fig. 5.8(a). Expansion on the next higher/lower layer is done by creating
a number of via cells on top of the original cell. In general, the width of the via cells differs from
the width of the regular routing cells and therefore, several different alignments are possible. The
via cell is always created such that its top edge aligns with the top edge of the original cell. The
three different side edge alignments are illustrated in Fig. 5.8(b) (center alignment), (c) (left edge
alignment) and (d) (right edge alignment). In the most general case, the expansion of a regular
routing cell results in 9 new cells: three regular cells on the same layer, three via cells on the next
higher layer and three via cells on the next lower layer.
Via cells are expanded as shown in Fig. 5.9. We assume that the via cell from which the
expansion is done originated from a lower layer, i.e. new regular routing cells are created on
the top layer of the via cell. Since a via cell is in general larger than the corresponding top layer
routing cell, expansion on the same layer is similar to source region expansion (see section 5.6.2).
The layer box of the via is shrunk by half the routing cell width and top layer routing cell centers
are placed on the perimeter of the shrunk box, at fixed intervals given by the minimum resolution
of the technology process (see Fig. 5.9(a)). If the process allows stacked vias, via cells to the
next higher layer can be created. If the size of the new via cell is smaller than the original one,
the expansion is similar to the one used for regular routing cells (see Fig. 5.9(b)). In the opposite
case, different alignments must be considered and the approach is the same as for creating new
via cells from regular routing cells.
Each new routing cell results in a new partial path which is checked for design rule violations
with respect to itself and to other already routed nets (DRC operation). If the new partial path
is design rule correct, its cost is computed and the routing cell is inserted into the heap. The
expand/DRC/compute cost sequence is executed thousands of times during the routing procedure.
It is therefore crucial to perform these operations as fast as possible. The DRC operation
involves an area search in the corner stitching database of the terminal and previously routed net
geometry. The worst case complexity of this operation is O(n), where n is the number of tiles
in the database [Oust 84]. In practical situations however, the area search operation is very fast,
especially if so-called hint tiles are used. The use of a hint tile reduces the complexity in most
practical cases to O(1).
Each new path expansion is done from the cheapest routing cell present in the collection of
previously expanded cells. This is implemented by storing all routing cells in a priority queue
and popping the cheapest cell from the queue for each expansion cycle. The priority queue is
implemented by a Fibonacci heap [Fredman 87]. Inserting a cell in the heap takes time O(log n),
where n is the number of cells already present in the heap. The space requirement is O(n).
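The expand/DRC/compute cost loop driven by the priority queue can be sketched as follows. This is only an illustration: it uses Python's `heapq` module, which is a binary heap rather than the Fibonacci heap mentioned above, and the callback names (`is_target`, `expand`, `is_legal`, `cost`) are hypothetical.

```python
import heapq
import itertools

def best_first_search(sources, is_target, expand, is_legal, cost):
    """Repeatedly pop the cheapest cell and push its legal expansions."""
    counter = itertools.count()  # tie-breaker so cells themselves are never compared
    heap = [(cost(c), next(counter), c) for c in sources]
    heapq.heapify(heap)
    while heap:
        _, _, cell = heapq.heappop(heap)       # cheapest candidate
        if is_target(cell):
            return cell
        for new_cell in expand(cell):          # expansion step
            if is_legal(new_cell):             # DRC step
                heapq.heappush(heap, (cost(new_cell), next(counter), new_cell))
    return None  # heap exhausted: no legal path found
```

The search returns the head cell that reached the target; the full path is then recovered by following back-trace pointers.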
134 Routing
(5.4)
$c_{n-1}$ to $c_0$ can be derived from $c_n$ by following back-trace pointers. $c_0$ is a source cell derived
from the source region. The cost of $P_n$, $PathCost(P_n)$, is the sum of two terms:
$$PathCost(P_n) = PathCost_{act}(P_n) + PathCost_{pred}(P_n) \qquad (5.5)$$
$PathCost_{act}(P_n)$ is the actual cost of the path from the source cell $c_0$ to the head cell $c_n$ and
$PathCost_{pred}(P_n)$, the predictor term, is an estimation of the cost required to complete the route
from cell $c_n$ to the target terminal. The predictor term is used to bias the expansion preferentially
towards the target as described in section 5.7.2.
$PathCost_{act}(P_n)$ is computed recursively by adding the cost of the segment connecting $c_n$
and $c_{n-1}$ to $PathCost_{act}(P_{n-1})$:
$$PathCost_{act}(P_n) = PathCost_{act}(P_{n-1}) + Cost_{act}(c_n, c_{n-1}) \qquad (5.6)$$
The actual cost of $P_0$, i.e. the path formed by the source cell $c_0$, is equal to zero:
$$PathCost_{act}(P_0) = 0 \qquad (5.7)$$
The second term of the cost function, $PathCost_{pred}(P_n)$, can be computed from the location of
the head cell of the path:
(5.8)
Note that $PathCost_{pred}(P_n)$ is only a function of the head cell of the path, whereas
$PathCost_{act}(P_n)$ is a function of all the routing cells in the path.
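The recursive bookkeeping of the path cost can be sketched in a few lines. The `Cell` class and function names are hypothetical, and because the predictor formula is not reproduced here, a weighted Manhattan distance to the target is used purely as a stand-in assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Cell:
    center: Tuple[float, float]
    parent: Optional["Cell"] = None
    act_cost: float = 0.0   # actual path cost; zero for a source cell

def extend(parent, center, segment_cost):
    """Create a new head cell; its actual cost is the parent's actual cost
    plus the cost of the connecting segment."""
    return Cell(center, parent, parent.act_cost + segment_cost(parent, center))

def predictor(cell, target, weight=1.0):
    # Stand-in predictor (assumption): weighted Manhattan distance to the target.
    return weight * (abs(target[0] - cell.center[0]) + abs(target[1] - cell.center[1]))

def path_cost(cell, target, weight=1.0):
    """Total cost: actual term plus predictor term."""
    return cell.act_cost + predictor(cell, target, weight)
```

Storing the accumulated actual cost in each cell means the total cost of a new partial path is obtained in constant time, without walking the back-trace pointers.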
Figure 5.10: Parasitics introduced by the segment $c_{n-1} \rightarrow c_n$: parasitic resistance $R$, overlap
capacitance $C_{ov}$, lateral capacitance $C_l$ and fringing capacitance $C_f$.
$$R_i = \rho_{square,L} \frac{l_n}{w_n} \qquad (5.9)$$
tion:
$$\Delta P_j = S^{P_j}_{C_i} C_i + S^{P_j}_{R_i} R_i + \sum_{k=1, k \neq i}^{m} S^{P_j}_{C_{ki}} C_{ki} \qquad (5.10)$$
where $m$ is the number of nodes in the circuit. $S^{P_j}_{C_i} = \frac{\partial P_j}{\partial C_i}$, $S^{P_j}_{R_i} = \frac{\partial P_j}{\partial R_i}$ and
$S^{P_j}_{C_{ki}} = \frac{\partial P_j}{\partial C_{ki}}$ are the
sensitivities of performance characteristic $P_j$ to small changes in the parasitic capacitance $C_i$,
the parasitic resistance $R_i$ and the coupling capacitance $C_{ki}$, respectively. These sensitivities are
determined in advance by simulation. The total cost of the segment is computed by summing the
performance degradations $\Delta P_j$ for each performance characteristic $P_j$:
$$Cost_{act}(c_n, c_{n-1}) = \sum_{j=1}^{N_s} \Delta P_j \qquad (5.11)$$
where $N_s$ is the number of performance specifications for the circuit. This cost value can be
combined with the contributions of all the previous segments to determine the total actual cost
of the partial path $P_n$ using equation (5.6).
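A minimal sketch of this sensitivity-based segment cost is given below. The function names and the data layout (one tuple of sensitivities per performance characteristic, coupling terms stored in dictionaries keyed by node) are assumptions for illustration only.

```python
def delta_p(s_c, s_r, s_ck, c_i, r_i, c_ki):
    """Degradation of one performance characteristic caused by a segment:
    capacitance, resistance and coupling contributions, each weighted by
    its sensitivity. s_ck and c_ki are dicts keyed by the coupled node k."""
    coupling = sum(s_ck.get(k, 0.0) * c for k, c in c_ki.items())
    return s_c * c_i + s_r * r_i + coupling

def segment_cost(specs, c_i, r_i, c_ki):
    """Sum the degradations over all performance characteristics.
    specs is a list of (s_c, s_r, s_ck) tuples, one per characteristic."""
    return sum(delta_p(s_c, s_r, s_ck, c_i, r_i, c_ki) for s_c, s_r, s_ck in specs)
```

Because the sensitivities are fixed before routing starts, evaluating a candidate segment only costs a handful of multiplications.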
A pair of symmetric nets is routed in one step. During this routing step, only one of the nets
is actually routed, the symmetric counterpart of the net is generated afterwards by mirroring the
net with respect to the symmetry axis. To make sure that the layout of the symmetric net pair is
DRC correct on both sides of the axis, the DRC step which is executed after each expansion step
has to be done on both sides of the axis. A new cell is first checked for DRC violations on the
side that is actually routed. If it passes this check, it is mirrored to the other side of the symmetry
axis and checked again. The cell is accepted as a candidate for further expansion only if it is
legal on both sides of the symmetry axis. This double check procedure guarantees symmetric
routes even in the case of partially symmetric placement.
The procedure is illustrated in Fig. 5.11. During routing of this symmetric net pair, only
the net on the right side is actually routed. In the absence of symmetry constraints, this would
result in an almost straight route between the two terminals. Due to the symmetric expansion
process, the route has to make a detour to avoid the obstacle that is present on the left side of the
placement.
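The double check can be sketched as follows, assuming a vertical symmetry axis at `x = axis_x` and boxes given as `(xmin, ymin, xmax, ymax)`. The design rule check is reduced here to a plain overlap test, which is a deliberate simplification: a real DRC step would also enforce spacing rules.

```python
def mirror(box, axis_x):
    """Mirror a box with respect to the vertical axis x = axis_x."""
    xmin, ymin, xmax, ymax = box
    return (2 * axis_x - xmax, ymin, 2 * axis_x - xmin, ymax)

def overlaps(a, b):
    """True if the interiors of two boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def legal_on_both_sides(cell, obstacles, axis_x):
    """Accept a cell only if it and its mirror image are both free of conflicts."""
    for box in (cell, mirror(cell, axis_x)):
        if any(overlaps(box, obstacle) for obstacle in obstacles):
            return False
    return True
```

A cell on the right side is therefore rejected when its mirror image would hit an obstacle that only exists on the left side, which is what forces the detour in the example.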
1. Randomly select a terminal from the set of terminals to be interconnected and use it as the
source.
2. Select the terminal which is closest to the source in Manhattan distance and use it as the
target. Use the basic path connection algorithm to connect source and target terminal.
3. Create a new source by combining the source and target terminals and the routed connection
and go to step 2.
The second algorithm is similar to Kruskal's spanning tree construction process [Krus 56]
and consists of the following steps:
1. Find the two terminals which are closest to each other and connect them using the basic
routing algorithm.
2. Unify the two terminals and their interconnection into one terminal and go to step 1.
We have implemented both of these algorithms in our routing tool. The quality of the nets
produced by both algorithms was found to be similar in all our experiments. The routing se-
quence for a three terminal net is illustrated in Fig. 5.12. Note that in this case, both algorithms
would use the same routing sequence.
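The two connection orders can be compared with a short sketch. It treats terminals as points, measures distances between terminal points only (the routed wires are ignored, which is a simplification), and uses a fixed start terminal instead of a random one so that the result is reproducible. All function names are hypothetical.

```python
from itertools import combinations

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def grow_from_source(terminals, start=0):
    """First algorithm: repeatedly connect the terminal closest to the source."""
    connected = [terminals[start]]
    remaining = [t for i, t in enumerate(terminals) if i != start]
    order = []
    while remaining:
        src, tgt = min(((s, t) for s in connected for t in remaining),
                       key=lambda pair: manhattan(*pair))
        order.append((src, tgt))
        connected.append(tgt)     # the target joins the source
        remaining.remove(tgt)
    return order

def merge_closest(terminals):
    """Second algorithm: repeatedly connect and unify the two closest groups."""
    groups = [[t] for t in terminals]
    order = []
    while len(groups) > 1:
        (i, j), (a, b) = min(
            (((i, j), (a, b))
             for i, j in combinations(range(len(groups)), 2)
             for a in groups[i] for b in groups[j]),
            key=lambda item: manhattan(*item[1]))
        order.append((a, b))
        groups[i].extend(groups.pop(j))   # j > i, so index i stays valid
    return order
```

For a three terminal net such as `[(0, 0), (10, 1), (1, 1)]`, both functions return the same sequence of connections.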
this algorithm can be called in an iterative fashion until all nets have been routed. However, the
sequential nature of this approach makes it very hard to predict the consequences that routing
a net will have on the global performance and routability of the circuit. As a consequence, the
quality of the final result depends heavily on the order in which the nets are routed. To reduce
the dependence of the router on net ordering, and to improve the quality of the final result, we
have introduced a three phase routing schedule consisting of a pre-routing step, a performance
driven routing phase and a manufacturability improvement phase.
$$\Delta P_j = \sum_{i=1}^{N_p} S^{P_j}_{p_i} p_i \qquad (5.12)$$
$$\Delta P_j^k = \sum_{i=1}^{N_k} S^{P_j}_{p_i} p_i \qquad (5.13)$$
where $N_k$ is the number of parasitics associated with net $k$. We now define $\Gamma_k$, the performance
impact factor for net $k$, as:
$$\Gamma_k = \frac{1}{N_s} \sum_{j=1}^{N_s} \frac{\Delta P_j^k}{\Delta P_j} \qquad (5.14)$$
where $N_s$ is the total number of specifications imposed on the circuit. $\Gamma_k$ is a number between
0 and 1 and measures the impact of parasitics associated with net $k$ on the overall performance
of the circuit. We use this factor to determine the net routing schedule. Nets are ripped up and
rerouted in increasing order of performance impact factor, i.e. the nets with the highest impact
on the performance are left in their pre-routed, optimal state, and nets with lower impact on
performance are forced to take on less optimal paths to remove DRC violations and to improve
performance.
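The resulting schedule can be sketched as follows. The impact factor is computed here as the average, over all specifications, of the share of the total degradation that is caused by one net; this is one reading of the definition above, and the function names and data layout are assumptions.

```python
def impact_factor(net_degradation, total_degradation):
    """Average share of the total degradation caused by one net.
    Both arguments are lists with one entry per specification."""
    ratios = [dk / dt for dk, dt in zip(net_degradation, total_degradation)]
    return sum(ratios) / len(ratios)

def reroute_schedule(per_net, total):
    """Return net names in increasing order of performance impact factor."""
    return sorted(per_net, key=lambda net: impact_factor(per_net[net], total))
```

Nets at the front of the returned list are ripped up and rerouted first, so the nets with the highest impact keep their pre-routed paths.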
• The cost function is changed to favor paths that optimize the manufacturability of the
circuit. A term that measures the impact on the manufacturability of the circuit is added to
the cost function:
$$T_{manuf} = \sum_{i=1, i \neq k}^{n} \lambda_{ki} \psi_{ki} \qquad (5.15)$$
with $n$ the number of nets and $k$ the net which is rerouted. In this equation, $\lambda_{ki}$ denotes the
expected number of bridging faults between circuit node $k$ and $i$, and $\psi_{ki}$ is a measure of
the difficulty of detecting the bridging fault. The derivation of $\lambda_{ki}$ and $\psi_{ki}$ is discussed in
the next section.
Figure 5.13: Layout fragment illustrating the influence of layout on defect-to-fault mapping.
determined by the layout of a circuit. In Fig. 5.13 a fragment of a layout with three local defects
is shown. One of the three defects causes a short between net i and net k, the other two have
no influence on the connectivity of the circuit. For this layout, three defects result in one fault.
By increasing the separation between the wires implementing net i and net k, this fault can be
avoided. In [Stapper 83a, Stapper 84] the concept of critical area was defined as the area of a
layout in which the center of a defect must fall to cause a fault. In general, the critical area of
a layout is a function of the defect size. The larger the radius of a defect, the larger the area
in which the defect can lie to cause a fault. The critical area can be defined separately for each
defect mechanism and for each fault.
$$\lambda_{ij} = \int_0^{\infty} A_{ij}(x) D_j(x)\,dx \qquad (5.16)$$
where $A_{ij}$ denotes the critical area for fault $i$ due to defect mechanism $j$ and $D_j(x)$ the defect
size distribution for defect mechanism $j$. The average number of faults for fault $i$ due to the
combined effect of all defect mechanisms is then given by:
$$\lambda_i = \sum_{j=1}^{m} \lambda_{ij} \qquad (5.17)$$
Under the assumption of Poisson distribution of defects, the elementary sub-yield $Y_i$ referred to
fault $i$ is [Stapper 83b]:
$$Y_i = e^{-\lambda_i} \qquad (5.18)$$
$Y_i$ is the probability of non-occurrence of fault $i$. The yield $Y$ of the circuit is the probability that
no fault is present in the circuit and can be computed as:
$$Y = \prod_{i=1}^{n} Y_i = \prod_{i=1}^{n} e^{-\lambda_i} = e^{-\lambda} \qquad (5.19)$$
where $\lambda = \sum_{i=1}^{n} \lambda_i$. The probability $P_i$ that a fault $i$ occurs can be derived from (5.18):
$$P_i = 1 - Y_i = 1 - e^{-\lambda_i} \qquad (5.20)$$
Using equations (5.17) - (5.20), the problem of determining the probability of occurrence for
each individual fault $P_i$ and the overall yield $Y$ for a given layout can be reduced to determining
the defect size distributions $D_j(x)$, $j = 1 \ldots m$ for each defect mechanism and the critical area
$A_{ij}$ for each fault $i$ and defect mechanism $j$.
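Under the Poisson assumption, the fault probabilities and the overall yield follow directly from the average number of faults per fault type. The sketch below uses hypothetical function names and takes these averages as given inputs.

```python
import math

def fault_probability(lam):
    """Probability that a fault with average count lam occurs at least once."""
    return 1.0 - math.exp(-lam)

def circuit_yield(lams):
    """Probability that none of the faults occurs: the product of the
    sub-yields, i.e. the exponential of minus the summed averages."""
    return math.exp(-sum(lams))
```

For example, two fault types with averages 0.1 and 0.2 give a yield of exp(-0.3), roughly 74%.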
We will consider two types of defect mechanisms in this work [Stapper 84]. First are dielec-
tric pinholes, very small defects which often occur in insulators, like silicon dioxide or silicon
nitride which are used between the conductive layers of integrated circuits. Their occurrence can
result in a short between wires at different routing levels. The critical area associated with these
defects is the overlap region between two wires (Fig. 5.14). If fault $i$ is a short between two nets
$j$ and $k$, the critical area for fault $i$ caused by dielectric pinholes $A_{i,ph}$ can thus be determined
as the total overlap area $A_{ov,jk}$ between wires at different photolithographic levels implementing
net $j$ and net $k$:
$$A_{i,ph} = A_{ov,jk} \qquad (5.21)$$
The size of dielectric pinhole defects is very small compared to layout dimensions, and can be
considered constant for yield calculations. The defect size distribution is then simply given by:
(5.22)
where $D_{ph}$ is the average density of dielectric pinholes, in units of defects per unit area. The average
number of faults for fault $i$ due to the dielectric pinhole defect mechanism can be calculated
using (5.16) with the critical area given by (5.21) and the defect size distribution by (5.22):
$$\lambda_{i,ph} = A_{ov,jk} D_{ph} \qquad (5.23)$$
5.9 Estimating Yield and Testability 143
Figure 5.15: Critical area for a short between two wires caused by photolithographic defects.
The second class of defects are photolithographic defects, for which the defect size becomes
of importance. Sub-micron pattern dimensions are typical for integrated circuits manufactured
today. Dust and dirt particles with similar dimensions can interfere with the photolithographic
processes used to define the patterns. These defects can break wires (open faults) and can create
bridges between wires on the same routing level (short faults). The critical area calculation for a
bridging fault $i$ between two parallel conductors of length $L$, separated by a narrow slit of width
$s$ is illustrated in Fig. 5.15. It can be seen that the critical area is a function of the defect size
$x$. If $x$ is smaller than the separation $s$ between the wires, the defect can not cause a bridging
fault and the critical area is zero. If $x$ is larger than $s$, the critical area increases linearly with $x$. The
critical area $A_{i,pd}$ for bridging fault $i$ caused by photolithographic defects can thus be modeled
by the following equation:
$$A_{i,pd}(x) = \begin{cases} 0 & \text{for } 0 \le x \le s \\ L(x - s) & \text{for } s \le x < \infty \end{cases} \qquad (5.24)$$
where $L$ is the parallel length of the two wires (see Fig. 5.15). The critical area calculation for an
open fault in a wire (see Fig. 5.16) is analogous to the calculation for a short between two wires.
If $w$ is the width of the wire, the critical area is given by:
$$A_{i,pd}(x) = \begin{cases} 0 & \text{for } 0 \le x \le w \\ L(x - w) & \text{for } w \le x < \infty \end{cases} \qquad (5.25)$$
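The critical area as a function of defect size has to be combined with a defect density
distribution to compute the average number of faults. An acceptable distribution was given
in [Stapper 83a]:
$$D(x) = \begin{cases} D_{pd}\, x / X_0^2 & \text{for } 0 \le x \le X_0 \\ D_{pd}\, X_0^2 / x^3 & \text{for } X_0 \le x < \infty \end{cases} \qquad (5.26)$$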
Figure 5.16: Critical area for an open in a wire caused by photolithographic defects.
Figure 5.17: Critical area and defect density as a function of defect size
with $D_{pd}$ the average photolithographic defect density in units of defects per unit area. This
defect size distribution and the critical area are plotted in Fig. 5.17. The density of very small
defects is assumed to increase linearly with defect size to a point where the straight line crosses the
$1/x^3$ curve. The peak of this distribution occurs at defect size $X_0$, which depends on the technology.
Defects smaller than $X_0$ can not be resolved by the optics used in the photolithographic process.
The minimum dimensions of the patterns must therefore always be larger than $X_0$. The average
number of bridging faults $\lambda_{i,pd}$ due to photolithographic defects can be calculated by inserting
the critical area (5.24) and the defect size distribution (5.26) in (5.16):
$$\lambda_{i,pd} = \int_s^{\infty} L(x - s) \frac{X_0^2 D_{pd}}{x^3}\,dx \qquad (5.27)$$
The second part of the defect density distribution has been used since $X_0$ is smaller than $s$.
Evaluating this integral gives:
$$\lambda_{i,pd} = \frac{L X_0^2 D_{pd}}{2s} \qquad (5.28)$$
The analogous calculation for an open fault, using the critical area (5.25), gives:
$$\lambda_{i,pd} = \frac{L X_0^2 D_{pd}}{2w} \qquad (5.29)$$
where $w$ is the width of a wire. The total failure rate $\lambda_s$ for a fault $s$ causing a short between two
nets $j$ and $k$ can be computed by summing the failure rates caused by dielectric pinholes (5.23)
and by photolithographic defects (5.28):
(5.30)
where $A_{ov,jk}$ is the total overlap area between the two wires and $L_{jk}$ the total parallel length of
the two wires. The total failure rate $\lambda_o$ for a fault $o$ causing an open in a net $j$ is given by:
(5.31)
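The photolithographic bridging model can be checked numerically with a short sketch. It integrates the product of the critical area for a short and the two-branch defect size distribution described above (linear up to the peak, then falling off with the third power of the defect size), and compares the result with the closed form obtained by evaluating the integral analytically for a separation not smaller than the peak defect size. Function names and the substitution x = s/u used for the numerical integration are choices made for this example.

```python
def critical_area_short(x, length, s):
    """Critical area for a short between two parallel wires of given length."""
    return length * (x - s) if x > s else 0.0

def defect_density(x, d_pd, x0):
    """Defect size distribution: linear up to x0, then proportional to 1/x^3."""
    return d_pd * x / x0**2 if x <= x0 else d_pd * x0**2 / x**3

def expected_shorts_closed_form(length, s, d_pd, x0):
    """Analytic value of the integral, valid for s >= x0."""
    return length * x0**2 * d_pd / (2 * s)

def expected_shorts_numeric(length, s, d_pd, x0, n=1000):
    """Midpoint-rule integration from s to infinity using x = s/u, 0 < u <= 1."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        x = s / u
        # dx = s / u^2 du
        total += critical_area_short(x, length, s) * defect_density(x, d_pd, x0) * s / u**2
    return total / n
```

Both functions agree, and halving the separation doubles the expected number of bridging faults, which is why increasing wire spacing is an effective yield measure.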
5.9.2 Testability
Based on the expected number of faults, it is impossible to distinguish between the two layout
fragments depicted in Fig. 5.18. In both cases, the expected number of faults is equal to
$L X_0^2 D_{pd} \left( \frac{1}{2 s_1} + \frac{1}{2 s_2} \right)$. However, from a testability point of view, it is possible that faults between
nodes 1 and 2 are easier to detect than faults between nodes 2 and 3 and in that case (b) is to be
preferred over (a). To enable the router to make intelligent decisions in cases like this, we need a
measure of detectability to weigh the faults. Such a measure is developed next.
Analog test strategies can be classified in two categories [Milor 89]:
• (a) specification based testing: this technique distinguishes between a good and a faulty
circuit by testing all of a circuit's specifications. Specifications of analog circuits are typically
based on their dynamic and transient behavior which makes testing for all specifications
a time-consuming and expensive technique.
• (b) fault based testing: this technique detects faulty circuits by measuring a set of electrical
responses to an input stimulus and comparing this set to the simulated responses of the
fault-free circuit. This technique is much cheaper but has the disadvantage that circuits
can be mis-classified. Due to the statistical nature of this test technique, faulty circuits can
be classified as good ones and vice versa.
In the remainder of this section we will discuss only parametric testing techniques [Gielen 94].
We consider an electrical circuit whose response to an input stimulus is measured. The response
measurements may be nodal voltages, currents, linear matrix parameters, etc. If m electrical
responses are measured, the m-dimensional vector $\phi = (\phi_0, \phi_1, \ldots, \phi_{m-1})^T$ is called the response
vector of the circuit. A typical parametric test strategy consists of the following steps:
1. During design, N Monte-Carlo simulations are carried out to measure the mean vector $\mu_0$
and the covariance matrix $\Sigma_0$ of $\phi$ for the fault-free circuit. N can be chosen such that the
standard deviations of the response estimates fall within a prescribed tolerance. During the
Monte-Carlo simulations, normal distributions are assigned to all parameters affecting the
performance of the circuit (technology parameters, device parameters, etc.)
2. During testing, the response vector $\phi$ is measured for the circuit under test and compared
to the simulated response. Since a direct comparison is impossible, a statistical decision
criterion must be used. Assuming that $\phi$ for the nominal circuit takes on a multivariate
normal distribution, it can be shown that the solid ellipsoid of $\phi$ vectors satisfying the
relation:
$$(\phi - \mu_0)^T \Sigma_0^{-1} (\phi - \mu_0) \le \chi_m^2(\alpha) \qquad (5.32)$$
has probability $\alpha$ [And 58]. $\chi_m^2(\alpha)$ is the $100\alpha$th percentile of a chi-square distribution
with $m$ degrees of freedom. To determine if the circuit under test is fault-free, the following
null-hypothesis has to be tested:
$$H_0 : E\{\phi\} = \mu_0 \qquad (5.33)$$
This test is performed by evaluating the left-hand side of (5.32) for the measured response
vector $\phi$. If the result is greater than $\chi_m^2(\alpha)$, the circuit is rejected and is faulty with
probability $1 - \alpha$. If the result is less than $\chi_m^2(\alpha)$ the circuit is accepted and is fault-free
with probability $1 - \alpha$. The probability of falsely rejecting a fault-free circuit is $\alpha$. The
value of $\alpha$ can be chosen depending on the application and the cost of rejecting good
circuits [Gielen 94].
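The decision step can be sketched in plain Python. The sketch computes the quadratic form built from the measured response vector, the fault-free mean vector and the fault-free covariance matrix, and compares it with a chi-square threshold that is passed in as a parameter (for example 5.991 for two measured responses at the 95th percentile). The function names and the small Gaussian elimination helper are assumptions made to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                     # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def mahalanobis_sq(phi, mu0, sigma0):
    """Quadratic form d^T sigma0^-1 d with d = phi - mu0."""
    d = [p - m for p, m in zip(phi, mu0)]
    return sum(di * zi for di, zi in zip(d, solve(sigma0, d)))

def accept(phi, mu0, sigma0, threshold):
    """Accept the circuit if the statistic does not exceed the threshold."""
    return mahalanobis_sq(phi, mu0, sigma0) <= threshold
```

A measured response close to the fault-free mean, relative to the spread captured by the covariance matrix, is accepted; a response far outside the ellipsoid is rejected.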
5.10 Experimental Results 147
The effectiveness of the statistical decision criterion (5.32) depends strongly on the separability
of the response vector distributions of the fault-free and the faulty circuits. The method yields
accurate results when the response vectors of the faulty circuits lie in a region of response space
which is separable from the circuit's fault-free response.
In a circuit with $n$ nodes, there are in general $\frac{1}{2} n(n - 1)$ types of shorts possible between the
nodes. Denote by $\mu^{(k)}$ the mean of the response vector $\phi$ for the $k$th type of faulty circuit. The
statistical distance between the response vector of the good circuit and that of the circuit with
fault $k$ can be defined as [And 58]:
(5.34)
The smaller the value of $d_{0,k}$, the higher the chance that fault $k$ will remain undetected during
testing. We can thus put $\psi_{ki}$ in equation (5.15) equal to the inverse of the statistical distance
between the response vector of the good circuit and that of the circuit with nodes $k$ and $i$ shorted.
The cost function term (5.15) can thus be interpreted as the sum of the expected number
of faults between net $k$ and all other nets $i$, multiplied by the probability that they will remain
undetected during testing. Minimizing this term for each net means optimizing the yield and the
testability of the overall circuit.
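The manufacturability term can be evaluated with a few lines of code once the expected numbers of bridging faults and the statistical distances are known. The sketch below takes both as given dictionaries keyed by the other net; the function names are hypothetical.

```python
def detection_difficulty(distance):
    """Weight of a fault: the inverse of its statistical distance."""
    return 1.0 / distance

def manufacturability_term(k, expected_faults, distances):
    """Sum over all other nets i of (expected bridging faults between k and i)
    times (difficulty of detecting that fault)."""
    return sum(lam * detection_difficulty(distances[i])
               for i, lam in expected_faults.items() if i != k)
```

A pair of nets with many expected bridging faults only dominates the term if the resulting fault is also hard to detect, which is the trade-off the router exploits.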
The testability criterion developed above can be used with any analog parametric test method.
To test the routing algorithm, we have used the test technique described in [Gielen 94]. In this
technique, the results of time-domain simulations of the power-supply current are used to construct
the signature of a circuit. The circuit response vector is equal to $\phi = (\phi_0, \phi_1, \ldots, \phi_{m-1})^T$,
where $\phi_0$ is the RMS value and $\phi_1, \ldots, \phi_{m-1}$ are the first $m - 1$ harmonics of the power-supply
current. The experimental results presented in the next section were generated with this testing
algorithm.
5.10.1 Opamp1
Opamp1 (see Fig. 5.19) is a moderate performance circuit and is therefore a good candidate for
yield/testability optimization. During circuit analysis, Monte-Carlo analysis was used to determine
the statistical distance between the Ips spectrum of the good circuit and that of all the possible
faulty circuits. The circuit was then automatically placed with the placement tool described
in chapter 4 and then routed using the algorithm described in this chapter. The resulting layout
is shown in Fig. 5.20. The performance and yield/testability characteristics for the circuit after
layout are given in Table 5.1. The third column gives the results after the performance-driven
stage of the algorithm and the fourth column gives the results after yield/testability optimization.
The performance characteristics given are the unity-gain bandwidth (UGBW) and the phase mar-
gin (PM). To evaluate the testability of the layout, a large number of circuits was tested using the
Ips monitoring technique. Every type of fault was represented in this test set with a probability
calculated from the layout using the technique described in section 5.9. The Test Error Rate
(TER) is defined as the percentage of faulty circuits that are classified as good circuits by the test
algorithm. For this moderate performance circuit, the TER was driven to almost zero during the
yield/testability stage, while the performance characteristics remained within the specifications.
Table 5.1: Performance and Test Error Rate for Opamp1 after performance-driven routing (Stage
1) and after yield and testability optimization (Stage 2).
Figure 5.20: Opamp1: layout.
5.10.2 Opamp2
To test the efficiency of the algorithm, a second test was carried out with a larger and higher
performance circuit: Opamp2 (see Fig. 5.21). The layout was generated in the same way as for
Opamp1 and the result is shown in Fig. 5.22. The performance characteristics and the Test Error
Rate for Opamp2 are shown in Table 5.2. For this circuit, the Test Error Rate was significantly
reduced with only a moderate degradation in performance. The performance of the final layout
is still within specifications.
Table 5.2: Performance and Test Error Rate for Opamp2 after performance-driven routing (Stage
1) and after yield and testability optimization (Stage 2).
Program step                     Opamp1                 Opamp2
                                 seconds   % of total   seconds   % of total
performance-driven placement     184       33           263       33
performance-driven routing       210       38           321       40
yield/testability optimization   163       29           213       27
total CPU time                   557       100          797       100
Table 5.3: Execution times for the different layout generation steps for test circuits Opamp1 and
Opamp2
Implementation
6.1 Introduction
This chapter briefly describes some implementation details of the LAYLA system, and some of
our experiences with its introduction in an industrial environment.
6.2 Implementation
6.2.1 Source Code
The algorithms described in this book have been implemented in the C++ language in the UNIX
environment. The total system comprises about 115000 lines of source code. The total system is
organized in the following 8 subsystems:
• basic data structures (6000 LOC) : implementation of data structures that are used throughout
the system: linked lists, dictionaries, heaps, etc.
• parser (11000 LOC) : code that parses the netlist file, the technology file and the per-
formance specification files. This code was developed using the LEX and YACC parser
generator programs.
• technology (6000 LOC) : data structures to store technology information and code to access
and manipulate technology data.
• geometry (5000 LOC) : data structures and routines to store and manipulate geometric
data. This includes the minimum spanning tree computation routine and the tile plane data
structure used in the router.
• circuit analysis (4000 LOC) : code that interfaces to a commercial circuit simulator and
processes the output to determine performance sensitivities and operating point informa-
tion.
• module generators (36000 LOC) : procedural generators for basic device layout structures
• placement (26000 LOC) : the placement algorithm as described in chapter 4. This includes
the code to execute the various moves, the cost function and the simulated annealing en-
gine.
The software architecture of the LAYLA tool is shown in Fig. 6.1. The LAYLA program is
designed in an object oriented way. The use of the object oriented design style results in a system
that is easy to maintain and to extend. All module generators for instance are derived from one
base class. Additional module generators can be added to the system by deriving another class
from the base class and implementing the function that generates the actual layout. None of the
software that manipulates the modules has to be changed.
languages allow users to write extensions to their environments, using the framework's interface
mechanisms. In the case of ROSE, the interface routines were written directly into the system.
LAYLA uses its own internal data structures to represent layouts. To import these data struc-
tures into commercial frameworks, we write the layout to a file as a set of extension language
commands (AMPLE or SKILL). By executing the command file, we recreate the layout in the
framework's layout environment.
Figure 6.2: Layout models for (a) a resistor and (b) a capacitor.
6.4 Results
Manual circuit level layout can be subdivided into module generation, placement, routing and
verification. For a typical design, the layout engineer spends about 50% of his time in module
generation, 12.5% in placement, 25% in routing and another 12.5% in verification. Through the
use of a schematic driven layout methodology, combined with a library of device generators,
the time spent in module generation can be reduced to virtually nothing. The use of module
generators also results in layout that is correct by construction, which reduces the time spent
in verification of the design. Interactive use of the place and route algorithms proposed in this
book significantly reduces the time spent in placement and routing. The overall result of the
introduction of LAYLA in the industrial environment was a 5 to 7x productivity gain compared
with manual layout. This has been consistently observed during the layout of over 20 mixed-
signal and RF chips.
Chapter 7
General Conclusions
This work has addressed the problem of automatic layout generation for analog integrated cir-
cuits. In chapter 2 we have described the major layout parasitic effects that influence the per-
formance of analog circuits: interconnect parasitics, device mismatch and thermal effects. All
of these effects have to be taken into account simultaneously during layout in order to keep the
performance degradation within specified limits. We have proposed a direct performance driven
layout strategy that, when applied to the different layout steps, guarantees a fully functional lay-
out that respects all performance constraints. The major novelty of the method is that it drives the
layout tools directly by the performance constraints, without an intermediate parasitic constraint
generation step. During placement and routing, the performance characteristics of the circuit are
evaluated using a linear approximation based on performance sensitivities. Using this approach,
a complete and sensible trade-off between different layout alternatives can be made on the fly,
and the resulting circuit layout can be guaranteed to be correct.
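As a rough illustration of the linear approximation mentioned above, the sketch below estimates the degradation of each performance as a sensitivity-weighted sum of parasitic values and compares it with a limit. The function names, the dictionary layout and the numbers are assumptions made for this example, and the plain signed sum is a simplification that ignores any worst-case treatment of statistical effects such as mismatch.

```python
def performance_degradation(sensitivities, parasitics):
    """Linear estimate: dP[j] = sum over i of S[j][i] * p[i].

    sensitivities: {performance: {parasitic_name: sensitivity}}
    parasitics:    {parasitic_name: value}; missing entries count as zero.
    """
    return {
        perf: sum(s * parasitics.get(name, 0.0) for name, s in row.items())
        for perf, row in sensitivities.items()
    }


def violations(degradation, limits):
    """Return the performances whose |dP| exceeds the allowed limit."""
    return {perf: d for perf, d in degradation.items() if abs(d) > limits[perf]}


if __name__ == "__main__":
    # Hypothetical sensitivities (per fF of parasitic capacitance).
    S = {
        "offset_mV": {"c_n1_fF": 0.02, "c_n2_fF": -0.02},
        "gbw_MHz": {"c_n1_fF": -0.1, "c_n2_fF": -0.05},
    }
    p = {"c_n1_fF": 10.0, "c_n2_fF": 4.0}   # from an intermediate layout
    limits = {"offset_mV": 0.5, "gbw_MHz": 1.0}
    dP = performance_degradation(S, p)
    print(dP, violations(dP, limits))
```

Because the evaluation is a handful of multiply-adds, it is cheap enough to repeat for every candidate layout during placement and routing, which is what makes an on-the-fly trade-off practical.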
Apart from this general result, some more concrete results have been obtained in each of the
analog layout generation subproblems: module generation, placement and routing.
In general, three different strategies can be applied to module generation for analog circuit
layout. The first strategy uses a large library of procedural module generators that implement
most of the analog layout knowledge. The second strategy limits the number of procedural mod-
ule generators as much as possible and relies on the placement tool to assemble the more compli-
cated substructures. A third approach is to construct a limited number of promising alternative
sets of stacks in advance and to select the best one during placement.
We have demonstrated that none of these strategies does the job in an optimal way. A
number of substructures recur again and again in analog circuits and cannot be assembled
by merging elementary devices. For these components, specific procedural module generators
have to be written. These module generators have to be combined with merging during place-
ment to achieve close to optimal layouts in a fast, reliable and predictable way. As described
in chapter 3, to overcome the problem of technology dependent module generator libraries, we
have proposed a technique that isolates technology dependence in technology parameter files
and device definitions. The LAYLA module generator library has been used with more than 20
different technologies from 5 different foundries without modifications to the source code.
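The idea of isolating technology dependence in parameter files can be sketched as below. The parameter names and values are hypothetical, and the resistor model is deliberately simplified (it ignores contacts and end effects); the point is only that the generator code contains no technology-specific numbers.

```python
def resistor_body(tech, resistance_ohm):
    """Return (layer, x0, y0, x1, y1) for a straight resistor body.

    All technology data comes from `tech`, which stands in for a
    technology parameter file; key names are assumptions of this sketch.
    """
    width = tech["poly_min_width_um"]
    squares = resistance_ohm / tech["poly_sheet_ohm_per_sq"]
    length = squares * width
    return (tech["poly_layer"], 0.0, 0.0, length, width)


# Two hypothetical technologies; the generator code above is unchanged.
TECH_A = {"poly_layer": "POLY1", "poly_min_width_um": 1.0,
          "poly_sheet_ohm_per_sq": 25.0}
TECH_B = {"poly_layer": "PO", "poly_min_width_um": 0.5,
          "poly_sheet_ohm_per_sq": 50.0}

if __name__ == "__main__":
    for tech in (TECH_A, TECH_B):
        print(resistor_body(tech, 1000.0))
```

Porting to a new process then amounts to supplying another parameter set rather than editing generator source.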
The second important component of an automatic layout system is the placement tool. The
placement phase is crucial since all of the layout parasitics that determine the performance degra-
dation of an analog circuit layout (interconnect parasitics, device mismatch and thermal effects)
are either determined or greatly influenced by the placement of the circuit. Using our direct
performance driven layout strategy, we have created the first analog placement tool that handles
interconnect parasitics, device mismatch and thermal effects simultaneously and directly, with-
out an intermediate constraint generation step, as described in chapter 2. In addition to that,
our placement tool also supports other constraints that are crucial for high quality analog lay-
out: symmetric placement, aspect-ratio and terminal position constraints, arbitrary rectangular
component shapes, dynamic performance driven merging of diffusions and simultaneous loca-
tion and shape optimization. The functionality of the tool has been demonstrated with several
industrial example circuits.
After the placement phase, it is up to the router to add the actual wires to the layout and hence
to fix the final values of the interconnect parasitics. The main requirement for the router is thus to
limit the actual performance degradation within the specifications. It was shown in chapter 5 that
routing also has a profound impact on the yield and testability of the circuit. The main result of
our work on routing is a tool that controls the routing parasitics such that the overall performance
degradation stays within the specifications, and at the same time uses any remaining performance
margin to optimize the manufacturability of the layout. Manufacturability optimization is done
by routing nets such that the probability of occurrence of hard to detect faults is minimized. A
new criterion to quantify the detectability of a fault was developed. Two analog circuits were
routed using this technique, and it was shown that the yield and the testability of the circuit
layout improved significantly, while the performance remained within the specifications.
The tools and algorithms described in this thesis have been integrated in the automatic analog
layout system LAYLA, and have been used in an industrial environment for more than a year.
During this period, the tools have been used for the layout of more than 20 mixed-signal and RF
chips. A 5 to 7x productivity gain over manual layout has been observed.
comparing the results of our automatic layout tool with manual layout. Upon close examina-
tion of the layout, the main reason for this area penalty appears to be the artificial separation of
the place and route phases. Experienced analog layout engineers do both tasks simultaneously:
they place components with routing in mind and insert wires such that subsequent placement of
components becomes easy. An important consequence of separating the placement and routing
steps in the macro-cell layout style is that the placement algorithm is responsible for allocating
the routing area. Failure to allocate sufficient interconnect area results in unroutable placements
and therefore, the routing area is usually slightly overestimated, resulting in loss of density. Al-
though the dynamic routing area estimation technique we presented in chapter 4 is a significant
improvement over static approaches, we still believe that a lot can be gained in this area. One so-
lution to the problem would be to perform placement and routing simultaneously. Early attempts
in this direction [Cohn 94] revealed that the main drawback of this approach is the complexity
of the problem, which limits its application domain to small circuits. Other approaches which
can be investigated for larger circuits include combining a global routing algorithm within the
placement optimization loop or the use of compaction after routing. The latter solution requires
a sophisticated compaction algorithm that is capable of handling performance constraints and
various other analog constraints such as symmetry and matching.
Integration of electrical and physical design is another grey area in analog synthesis. During
electrical design, the effect of layout parasitics is usually approximated or discarded totally. For
high-performance analog and RF design, designers usually include estimates of layout parasitics
during electrical design. The estimated values of the layout parasitics are based on experience
and on the estimated size of the circuit. Overestimation of layout parasitics results in wasted
power and area, while underestimation of parasitics leads to physically unrealizable circuits.
The automatic layout tools presented in this thesis could be used to derive more meaningful
estimates of layout parasitics. The speed of the layout generation process is currently too slow to
call in the inner loop of an automatic synthesis process. A template based or procedural approach
could be explored for this purpose.
Bibliography
[Alb 95] J. Albers, "An Exact Recursion Relation Solution for the Steady-State Surface
Temperature of a General Multilayer Structure," IEEE Trans. on Comp., Pack., and Manuf.
Tech. - Part A, vol. 18, no. 1, pp. 31-38, March 1995.
[And 58] T. Anderson, "An Introduction to Multivariate Statistical Analysis," John Wiley and
Sons, 1958.
[Arno 88] M. Arnold, W. Scott, "An Interactive Maze Router with Hints," Proc. 25th ACM/IEEE
Design Automation Conf., pp. 672-676, 1988.
[Bak 90] H. Bakoglu, Circuits, Interconnections and Packaging for VLSI, Addison Wesley,
1990.
[Bark 88] E. Barke, "Line-to-Ground Capacitance Calculation for VLSI: a Comparison," IEEE
Trans. on Computer-Aided Design, Vol. CAD-7, No. 2, pp. 295-298, Feb. 1988.
[Bas 96] B. Basaran, R. Rutenbar, "An O(n) Algorithm for Transistor Stacking with
Performance Constraints," Proc. 33rd ACM/IEEE Design Automation Conf., pp. 221-226,
June 1996.
[Bast 96a] J. Bastos, M. Steyaert and W. Sansen, "A High Yield 12-bit 250-MS/s CMOS D/A
Converter," Proc. IEEE Custom Integrated Circuits Conf., pp. 20.6.1-20.6.4, May
1996.
[Bast 96c] J. Bastos, M. Steyaert, B. Graindourze, W. Sansen, "Influence of Die Bonding on MOS
Transistor Matching," Proc. IEEE Int. Conference on Microelectronic Test Structures,
pp. 27-31, March 1996.
[Cath 88] F. Catthoor, H. de Man, J. Vandewalle, "SAMURAI: a general and efficient
simulated-annealing schedule with fully adaptive annealing parameters," Integration, the VLSI
journal, vol. 6, pp. 147-178, 1988.
[Chak 90] S. Chakravarty, X. He, S. Ravi, "On Optimizing nMOS and Dynamic CMOS
Functional Cells," Proc. IEEE Intl. Symp. on Circuits and Systems, pp. 1701-1704, May
1990.
[Chang 95] J. C. Chang, M. A. Styblinski, Yield and Variability Optimization of Integrated
Circuits, Kluwer Academic Publishers, Boston, Dordrecht, London, February 1995.
[Chaw 70] B. Chawla, H. Gummel, "A Boundary Technique for Calculation of Distributed Re-
sistance," IEEE Trans. on Electron Devices, vol. ED-17, pp. 915-925, Oct. 1970.
[Cohn 91] J. M. Cohn, R. A. Rutenbar, and L. R. Carley, "KOAN/ANAGRAM II: New tools
for device-level analog placement and routing," IEEE J. Solid-State Circuits, pp. 330-
342, no. 3, Mar. 1991.
[Cohn 94] J. Cohn, J. Garrod, R. Rutenbar, R. Carley, Analog Device-Level Layout Automation,
Kluwer Academic Publishers, Boston, Dordrecht, London, 1994.
[Conway 92] J. D. Conway, G. G. Schrooten, "An automatic layout generator for analog
circuits," Proc. European Design Automation Conf., pp. 513-519, 1992.
[Dir 69] S. Director, R. Rohrer, "The Generalized Adjoint Network and Network Sensitivities,"
IEEE Trans. on Circuit Theory, vol. CT-16, pp. 318-322.
[Dona 90] W. Donath, et al., "Timing Driven Placement Using Complete Path Delays," Proc.
ACM/IEEE Design Automation Conference, pp. 84-89, 1990.
[Donnay 94b] S. Donnay, K. Swings, G. Gielen, W. Sansen, "A Methodology for Analog
High-Level Synthesis," Proc. IEEE Custom Integrated Circuits Conf., pp. 373-376, May
1994.
[Donz 91] L.-O. Donzelle et al., "A Constraint Based Approach to Automatic Design of Analog
Cells," Proc. 28th ACM/IEEE Design Automation Conf., June 1991.
[Dun 84] A. Dunlop, V. Agrawal, D. Deutsch, M. Jukl, P. Kozak, M. Wiesel, "Chip Layout
Optimization Using Critical Path Weighting," Proc. ACM/IEEE Design Automation
Conference, pp. 32-39, 1984.
[Fidu 82] C. Fiduccia, R. Mattheyses, "A Linear-Time Heuristic for Improving Network
Partitions," Proc. IEEE/ACM Design Automation Conference, pp. 175-181, June 1982.
[Fisher 87] J. Fisher, R. Koch, "A Highly Linear CMOS Buffer Amplifier," IEEE J. of Solid
State Circuits, vol. SC-22, no. 3, pp. 330-334, June 1987.
[Fredman 87] M. L. Fredman, R. E. Tarjan, "Fibonacci Heaps and Their Uses in Improved Net-
work Optimization Algorithms," Journal of the ACM, vol. 34, pp. 596-615, 1987.
[Fuka 76] K. Fukahori, P. R. Gray, "Computer Simulation of Integrated Circuits in the Presence
of Electrothermal Interaction," IEEE Journal of Solid State Circuits, vol. SC-11, no.
6, pp. 834-846, Dec. 1976.
[Gao 91] T. Gao, P. Vaidya, C. Liu, "A New Performance Driven Placement Algorithm," Proc.
IEEE Intl. Conf. on Computer-Aided Design, pp. 44-47, 1991.
[Geer 93] B. Geeraerts, W. Van Peteghem, W. Sansen, "A BiMOS diode matrix for the charac-
terization of static and transient thermal phenomena on silicon," Proc. SEMITHERM-
IX, pp. 108-111, February 1993.
[Genderen 96] A. J. van Genderen, N. P. van der Meijs, T. Smedes, "Fast Computation of
Substrate Resistances in Large Circuits," Proc. European Design and Test Conf., pp. 560-
565, March 1996.
[Ghar 95a] R. Gharpurey, R. Meyer, "Modeling and Analysis of Substrate Coupling in Inte-
grated Circuits," Proc. IEEE Custom Integrated Circuit Conf., pp. 7.3.1-7.3.4, May
1995.
[Ghar 95b] R. Gharpurey, R. Meyer, "Analysis and Simulation of Substrate Coupling in
Integrated Circuits," Intl. Journ. of Circuit Theory and Applications, vol. 23, pp. 381-394,
July-August 1995.
[Gielen 92] G. Gielen, K. Swings, W. Sansen, "Open Analog Synthesis System Based on
Declarative Models," chapter 18 in Analog Circuit Design, Operational Amplifiers, Analog
to Digital Convertors, Analog Computer Aided Design, Kluwer Academic Publishers,
Boston, Dordrecht, London, 1992.
[Gielen 94] G. Gielen, Z. Wang, W. Sansen, "Fault Detection and Input Stimulus Determina-
tion for the Testing of Analog Integrated Circuits Based on Power-Supply Current
Monitoring," Proc. IEEE ICCAD, pp. 495-498, November 1994.
[Gielen 95a] G. Gielen, G. Debyser, K. Lampaert, F. Leyn, K. Swings, G. Van der Plas,
W. Sansen, D. Leenaerts, P. Veselinovic, W. van Bokhoven, "An Analog Module
Generator for Mixed Analog/Digital ASIC design," Intl. Journal of Circuit Theory
and Applications, pp. 269-283, July-August 1995.
[Gielen 95b] G. Gielen, G. Debyser, S. Donnay, K. Lampaert, F. Leyn, K. Swings, G. Van Der
Plas, P. Wambacq, W. Sansen, "Comparison of Analog Synthesis using Symbolic
Equations and Simulation," Proc. IEEE European Conf. on Circuit Theory and De-
sign, 1995.
[Gielen 96] G. Gielen, F. Franca, "CAD Tools for Data Converter Design: An Overview," IEEE
Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 43, no.
2, pp. 77-89, Feb. 1996.
[Had 75] F. Hadlock, "Finding a maximum cut of a planar graph in polynomial time," SIAM
Journ. of Computing, vol. 4, no. 3, pp. 221-225, September 1975.
[Hall 87] J. Hall, D. Hocevar, P. Yang, M. McGraw, "SPIDER - A CAD System for Modeling
VLSI Metallization Patterns," IEEE Trans. on Computer Aided Design, vol. CAD-6,
no. 6, pp. 1023-1031, Nov. 1987.
[Han 72] M. Hanan, J. Kurtzberg, Design Automation of Digital Systems, M. Breuer, Ed.,
Prentice Hall, Englewood Cliffs, N. J., Chap. 5, pp. 213-282, 1972.
[Hau 87] P. Hauge, R. Nair, E. Yoffa, "Circuit Placement for Predictable Performance," Proc.
IEEE Intl. Conf. on Computer-Aided Design, pp. 88-91, 1987.
[Heyns 80] W. Heyns, W. Sansen, H. Beke, "A line-expansion algorithm for the general
routing problem with a guaranteed solution," Proc. 17th ACM/IEEE Design Automation
Conf., pp. 143-249, 1980.
[High 69] D.W. Hightower, "A solution to line-routing problems on the continuous plane," Proc.
6th Design Automation Workshop, pp. 1-24, 1969.
[Hoc 85] D. Hocevar, P. Yang, T. Trick, B. Epler, "Transient Sensitivity Computation for MOS-
FET Circuits," IEEE Trans. on Electron Devices, vol. ED-32, no. 10, Oct. 1985.
[Hong 90] S. Hong, P. Allen, "Performance driven analog layout compiler," Proc. IEEE Intl.
Symp. on Circuits and Systems, pp. 835-838, May 1990.
[Hor 83] M. Horowitz, R. Dutton, "Resistance Extraction for Mask Layout Data," IEEE Trans.
on Computer-Aided Design, vol. CAD-2, no. 3, pp. 145-150, July 1983.
[Host 85] B. J. Hosticka, K.-G. Dalsab, D. Krey, G. Zimmer, "Behavior of Analog MOS In-
tegrated Circuits at High Temperatures," IEEE Journal of Solid State Circuits, vol.
SC-20, no. 4, pp. 871-874, Aug. 1985.
[Jack 89] M. Jackson, E. Kuh, "Performance-Driven Placement of Cell Based IC's," Proc.
ACM/IEEE Design Automation Conference, pp. 370-375, 1989.
[Jeps 84] D. W. Jepsen and C. D. Gelatt Jr, "Macro placement by Monte Carlo Annealing,"
Proc. IEEE Int. Conf. on Computer Design, pp. 495-498, Nov. 1984.
[Joarder 94] "A Simple Approach to Modeling Cross-Talk in Integrated Circuits," IEEE Journ.
of Solid-State Circuits, vol. SC-29, no. 10, pp. 1212-1219, Oct. 1994.
[Johnson 84] "Chip Substrate Resistance Modeling Technique for Integrated Circuit Design,"
IEEE Trans. on Computer-Aided Design, vol. CAD-3, pp. 126-134, April 1984.
[Kayal 88] M. Kayal, S. Piguet, M. Declercq, B. Hochet, "SALIM: a layout generation tool for
analog ICs," Proc. CICC, pp. 7.5.1-4, 1988.
[Ker 70] B. Kernighan, S. Lin, "An Efficient Heuristic Procedure for Partitioning Graphs," Bell
Systems Technical Journal, Vol. 49, No. 2, pp. 291-308, 1970.
[Kokkas 74] A. Kokkas, "Thermal Analysis of Multiple-Layer Structures," IEEE Trans. on Elec-
tron Devices, vol. ED-21, no. 11, November 1974.
[Krus 56] J. Kruskal, "On the Shortest Spanning Subtree of a Graph and the Travelling Salesman
Problem," Proc. Am. Math. Soc., vol. 7, pp. 48-50, 1956.
[Kuhn 87] J. Kuhn, "Analog module generators for silicon compilation," VLSI Systems design,
May 1987.
[Kuo 93] S.-Y. Kuo, "YOR: A Yield-Optimizing Routing Algorithm by Minimizing Critical
Areas and Vias," IEEE Trans. Computer-Aided Design, vol. 12, no. 9, September
1993.
[Laar 87] P. van Laarhoven, E. Aarts, Simulated Annealing: Theory and Applications, Kluwer
Academic Publishers, Boston, Dordrecht, London, 1987.
[Lampaert 95c] K. Lampaert, G. Gielen, W. Sansen, "A Performance-Driven Placement Tool for
Analog Integrated Circuits," IEEE Journ. of Solid-State Circuits, vol. SC-30, no. 7,
pp. 773-780, July 1995.
[Lampaert 97b] K. Lampaert, G. Gielen, W. Sansen, "Analog Routing for Performance and
Manufacturability," submitted to IEEE Journ. of Solid-State Circuits.
[Lee 61] C. Y. Lee, "An algorithm for path connections and its application," IRE Trans. on
Electronic Computers, vol. EC-10, pp. 346-365, 1961.
[Lee 88] C. Lee, A. Palisoc, "Real-Time Thermal Design of Integrated Circuit Devices," IEEE
Trans. on Components, Hybrids, and Manufacturing Techn., vol. 11, no. 4, December
1988.
[Lee 89] C. Lee, A. Palisoc, J. Min, "Thermal Analysis of Integrated Circuit Devices and
Packages," IEEE Trans. on Components, Hybrids and Manufacturing Techn., vol. 12, no.
4, December 1989.
[Malavasi 95] E. Malavasi, D. Pandini, "Optimum CMOS Stack Generation with Analog
Constraints," IEEE Trans. on Computer-Aided Design, vol. 14, no. 1, January 1995.
[Maly 86] W. Maly, A. J. Strojwas, S. W. Director, "VLSI Yield Prediction and Estimation: A
Unified Framework," IEEE Trans. Computer-Aided Design, vol. CAD-5, no. 1, pp.
114-130, Jan. 1986.
[Maly 90] W. Maly, "Computer-Aided Design for VLSI Circuit Manufacturability," Proc. of the
IEEE, vol. 78, no. 2, pp. 356-391, Feb. 1990.
[Mar 89] M. Marek-Sadowska, S. Lin, "Timing Driven Placement," Proc. IEEE Intl. Conf. on
Computer-Aided Design, pp. 94-97, 1989.
[Mar 90] D. Marple, M. Smulders, H. Hegen, "Tailor: A Layout System Based on Trapezoidal
Corner Stitching," IEEE Trans. on Computer-Aided Design, vol. 8, no. 1, pp. 66-90,
January 1990.
[Mich 92] C. Michael, M. Ismail, "Statistical modeling of device mismatch for analog MOS
integrated circuits," IEEE J. of Solid State Circuits, vol. SC-27, no. 2, pp. 154-165,
February 1992.
[Mika 68] K. Mikami, K. Tabuchi, "A computer program for optimal routing of printed circuit
connectors," Proc. IFIPS, vol. H47, pp. 1475-1478, 1968.
[Milor 89] L. Milor, V. Visvanathan, "Detection of Catastrophic Faults in Analog Integrated
Circuits," IEEE Trans. on Computer-Aided Design, vol. CAD-8, no. 2, Feb. 1989.
[Nab 91] K. Nabors, J. White, "FastCap: A Multipole Accelerated 3-D Capacitance Extraction
Program," IEEE Trans. on Computer-Aided Design, vol. CAD-10, no. 11, Nov. 1991.
[Nab 92] K. Nabors, S. Kim, J. White, "Fast Capacitance Extraction of General
Three-Dimensional Structures," IEEE Trans. on Microwave Theory and Techniques, vol.
MTT-40, no. 7, July 1992.
[Nair 89] R. Nair, L. Berman, P. Hauge, E. Yoffa, "Generation of Performance Constraints for
Layout," IEEE Trans. on Computer-Aided Design, vol. 8, no. 8, Aug. 1989.
[Nils 71] N. J. Nilsson, Problem-solving Methods in Artificial Intelligence, McGraw-Hill, Ch. 3,
pp. 43-78, 1971.
[Ning 87] Q. Ning, P. M. Dewilde, F. L. Neerhoff, "Capacitance Coefficients for VLSI Multi-
level Metallization Lines," IEEE Trans. on Electron Devices, vol. ED-34, pp. 644-649,
1987.
[Ogaw 86] Y. Ogawa, T. Ishii, Y. Shiraishi, H. Terai, T. Kozawa, K. Yuyama, K. Chiba, "Efficient
Placement Algorithms Optimizing Delay For High-Speed ECL Masterslice LSI's,"
Proc. ACM/IEEE Design Automation Conference, pp. 404-410, 1986.
[Oht 86] T. Ohtsuki (editor), Layout Design and Verification, North Holland, 1986.
[Otten 84] R. Otten, L. van Ginneken, "Floorplan Design using Simulated Annealing," Proc.
IEEE Intl. Conf. on Computer-Aided Design, pp. 96-98, November 1984.
[Oust 84] J. Ousterhout, "Corner stitching: a data-structuring technique for VLSI layout tools,"
IEEE Trans. Computer-Aided Design, vol. CAD-3, no. 1, pp. 87-100, January 1984.
[Pap 91] A. Papoulis, Probability, Random Variables and Stochastic Processes (third edition),
McGraw-Hill International Editions, 1991.
[Peeters 93] E. Peeters, K. Ghafoor, "Design of a Fully Differential High-Speed CMOS
Amplifier," KU Leuven Masters Thesis, June 1993.
[Prim 57] R. Prim, "Shortest Connection Networks and Some Generalizations," Bell Systems
Technical Journ., vol. 36, pp. 1389-1401, 1957.
[ROSE 97] Rockwell Object Symbolic Environment, Reference Manual, Rockwell
Semiconductor Systems, June 1997.
[Rao 86] V. Rao, A. Djordjevic, T. Sarkar, Naiheng, "Analysis of Arbitrarily Shaped Dielectric
Media over a Finite Ground Plane," IEEE Trans. on Microwave Theory and
Techniques, vol. MTT-33, pp. 472-475, 1986.
[Rijm 88] J. Rijmenants, T. Schwartz, J. Litsios, R. Zinszner, "ILAC: An automated layout tool
for analog CMOS circuits," Proc. Custom Integrated Circuits Conf., pp. 7.6.1-7.6.4,
May 1988.
[Rijm 89] J. Rijmenants et al., "ILAC: An automated layout tool for analog CMOS circuits,"
IEEE J. Solid-State Circuits, pp. 417-425, no. 2, Apr. 1989.
[Rueh 73] A.E. Ruehli and P.A. Brennan, "Efficient Capacitance Calculations for Three-
Dimensional Multiconductor Systems," IEEE Trans. Microwave Theory Tech. MTT,
Feb. 1973.
[SKILL 97] SKILL Language and Development, Cadence Design Systems, Inc., 1997.
[Sahn 80] S. Sahni, A. Bhatt, "The Complexity of Design Automation Problems," Proc.
IEEE/ACM Design Automation Conf., pp. 402-411, June 1980.
[Sato 87] M. Sato, "A Fast Line-Search Method Based on a Tile-Plane," Proc. IEEE Intl. Symp.
on Circuits and Systems, pp. 588-591, 1987.
[Seq 93] C. Sequin, H. da Silva Facanha, "Corner-Stitched Tiles With Curved Boundaries,"
IEEE Trans. on Computer-Aided Design, vol. 9, no. 1, pp. 47-58, January 1993.
[Shah 91] K. Shahookar and P. Mazumder, "VLSI Cell Placement Techniques," ACM
Computing Surveys, Vol. 23, No. 2, June 1991.
[Sher 95] N. Sherwani, Algorithms for VLSI Physical Design Automation, second edition, chap-
ter 3, Kluwer Academic Publishers, Boston, Dordrecht, London, 1995.
[Shyu 84] J. B. Shyu, G. Temes, F. Krummenacher, "Random error effects in matched MOS
capacitors and current sources," IEEE J. of Solid State Circuits, vol. SC-19, no. 6, pp.
948-955, December 1984.
[Smedes 93a] "Substrate Resistance Extraction for Physics-Based Layout Verification," Proc.
IEEE/PRORISC Workshop on CSSP, pp. 101-106, March 1993.
[Smedes 93b] T. Smedes, N. P. van der Meijs, A. J. van Genderen, "Boundary Element Methods
for Capacitance and Substrate Resistance Calculations in a VLSI Layout Verification
Package," Software Applications in Electrical Engineering, Ed.: P.P. Silvester, pp.
337-344, 1993.
[Smedes 95] T. Smedes, N. P. van der Meijs, A. J. van Genderen, "Extraction of Circuit Models
for Substrate Cross-talk," Proc. IEEE Intl. Conf. on Computer-Aided Design, pp. 199-
206, Nov. 1995.
[Sol 74] J. E. Solomon, "The Monolithic Op Amp: A Tutorial Study," IEEE Journal of Solid
State Circuits, vol. SC-9, no. 6, pp. 314-332, Dec. 1974.
[Souk 78] J. Soukup, "Fast Maze Router," Proc. IEEE/ACM Design Automation Conference,
1984
[Stapper 83a] C. H. Stapper, "Modeling of Integrated Circuit Defect Sensitivities," IBM J. Res.
Develop., vol. 27, no. 6, pp. 549-557, Nov. 1983.
[Steyaert 93] M. Steyaert, R. Roovers and J. Craninckx, "A 110 MHz 8 bit CMOS Interpolating
A/D converter," Proc. IEEE Custom Integrated Circ. Conf., pp. 28.1.1-28.1.4, May
1993.
[Su 93] "Experimental Results and Modeling Techniques for Substrate Noise in Mixed-Signal
Integrated Circuits," IEEE Journ. of Solid State Circuits, vol. SC-28, no. 4, pp. 420-
430, April 1993.
[Uehara 81] T. Uehara, W. van Cleemput, "Optimal Layout of CMOS Functional Arrays," IEEE
Trans. on Computers, vol. C-30, no. 5, May 1981.
[Van Pet 93] W. Van Petegem, "Iterative Solutions to Electronic Problems, Based on Finite-
Element Methods: Electro-Thermal Simulation and Electrical Impedance Tomogra-
phy," PhD Thesis, Katholieke Universiteit Leuven, September 1993.
[Verghese 95] "Fast Parasitic Extraction for Substrate Coupling in Mixed-Signal ICs," Proc.
IEEE Custom Integrated Circuit Conf., pp. 121-124, May 1995.
[Vitt 85] E. Vittoz, "The Design of High-performance Analog Circuits on Digital CMOS
Chips," IEEE Journal of Solid State Circuits, vol. SC-20, pp. 657-665, June 1985.
[Vlach 83] J. Vlach, K. Singhal, Computer methods for circuit analysis and design. Van Nos-
trand Reinhold, 1983.
[Wimer 87] S. Wimer, R. Pinter, J. Feldman, "Optimal Chaining of CMOS Transistors in a
Functional Cell," IEEE Trans. on Computer-Aided Design, vol. CAD-6, no. 5, September
1987.
[Wong 86] D. Wong, C. Liu, "A new algorithm for floorplan design," Proc. IEEE/ACM Design
Automation Conf., 1986.
[Wu 90] San-Yuan Wu, Sartaj Sahni, "Covering Rectilinear Polygons by Rectangles," IEEE
Trans. on Computer-Aided Design, vol. CAD-9, no. 4, pp. 377-388, April 1990.