
ASIC - Application Specific Integrated Circuit
Introduction
• The input to the floorplanning step is the output of system partitioning and design entry: a netlist.
• The netlist describes the circuit blocks, the logic cells within the blocks, and their connections.
Floorplanning Goals and Objectives
• The input to a floorplanning tool is a hierarchical netlist that describes
– the interconnection of the blocks (RAM, ROM, ALU, cache controller, and so on)
– the logic cells (NAND, NOR, D flip-flop, and so on) within the blocks
– the logic cell connectors (terminals, pins, or ports)

• The netlist is a logical description of the ASIC.
• The floorplan is a physical description of the ASIC.
• Floorplanning is a mapping between the logical description (the netlist) and the physical description (the floorplan).

The Goals of Floorplanning are to:


• Arrange the blocks on a chip,
• Decide the location of the I/O pads,
• Decide the location and number of the power pads,
• Decide the type of power distribution, and
• Decide the location and type of clock distribution.

Objectives of Floorplanning:
• To minimize the chip area.
• To minimize delay.
Measuring area is straightforward, but measuring delay is more difficult.
Floorplanning - Optimization

Optimize Performance

• Chip area.
• Total wire length.
• Critical path delay.
• Routability.
• Others, e.g. noise, heat dissipation.

Cost = αA + βL,
where
A = total area,
L = total wire length,
α and β are constants.
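A minimal sketch of this weighted cost (the numbers and default weights below are illustrative, not from the text):

```python
# Sketch of the weighted floorplanning cost: Cost = alpha*A + beta*L.
def floorplan_cost(total_area, total_wire_length, alpha=1.0, beta=0.5):
    """alpha and beta trade off chip area against total estimated wire length."""
    return alpha * total_area + beta * total_wire_length

print(floorplan_cost(total_area=440.0, total_wire_length=1250.0))  # made-up values
```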
Floorplanning Area

• Deadspace
– Minimizing area = minimizing deadspace.

• Wire length estimation
– The exact wire length is not known until after routing.
– Pin positions are not known.
– How to estimate? Center-to-center estimation (between block centers).
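A minimal sketch of center-to-center estimation (block names, coordinates, and nets below are made up); the total length L it produces could feed the cost function above:

```python
# Center-to-center wire-length estimation: before routing (and before pin
# positions are known), approximate each net by distances between block centers.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def estimate_wire_length(block_centers, nets):
    """nets: list of block-name lists; chain each net through its block centers."""
    total = 0.0
    for net in nets:
        pts = [block_centers[b] for b in net]
        total += sum(manhattan(pts[k], pts[k + 1]) for k in range(len(pts) - 1))
    return total

centers = {"A": (10, 40), "B": (30, 35), "C": (25, 10)}   # made-up block centers
nets = [["A", "B"], ["B", "C"], ["A", "C"]]
print(estimate_wire_length(centers, nets))
```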
Floorplanning Tools
• Flexible blocks (or variable blocks ) :
– Their total area is fixed,
– Their shape (aspect ratio) and connector locations may be adjusted during the placement.

• Fixed blocks:
– The dimensions and connector locations of fixed blocks (perhaps RAM, ROM, compiled cells, or megacells) can only be modified when the blocks are created.

• Seeding:
– Force logic cells to be in selected flexible blocks by seeding. We choose seed cells by name.
– Seeding may be hard or soft.
• A hard seed is fixed and not allowed to move during the remaining floorplanning and placement steps.
• A soft seed is an initial suggestion only and can be altered if necessary by the floorplanner.

• Seed connectors within flexible blocks—forcing certain nets to appear in a specified order, or location, at the boundary of a flexible block.

• Rat’s nest: displays the connections between the blocks. Connections are shown as bundles between the centers of blocks or as lines between connectors.
Aspect Ratio Bounds

No Bounds
• Figure: blocks 1–4 with unconstrained aspect ratios—NOT GOOD!
With Bounds
lower bound ≤ height/width ≤ upper bound

• Soft Blocks
• Flexible shape
• I/O positions not yet determined

• Hard Blocks
• Fixed shape
• Fixed I/O pin positions
Sizing example*
Floorplanning Tools

Floorplanning a cell-based ASIC.


(a) Initial floorplan generated by the floorplanning tool. Two of the blocks are flexible (A and
C) and contain rows of standard cells (unplaced). A pop-up window shows the status of block A.
(b) An estimated placement for flexible blocks A and C. The connector positions are known and a
rat’s nest display shows the heavy congestion below block B.
(c) Moving blocks to improve the floorplan.
(d) The updated display shows the reduced congestion after the changes.
Aspect Ratio and Congestion Analysis

(a) The initial floorplan with a 2:1.5 die aspect ratio.


(b) Altering the floorplan to give a 1:1 chip aspect ratio.
Congestion analysis: one measure of congestion is the difference between the number of interconnects that we actually need (called the channel density) and the channel capacity.
(c) A trial floorplan with a congestion map. Blocks A and C have been placed so that we know the terminal
positions in the channels. Shading indicates the ratio of channel density to the channel capacity. Dark areas
show regions that cannot be routed because the channel congestion exceeds the estimated
capacity.
(d) Resizing flexible blocks A and C alleviates congestion.
Channel Definition

• Channel definition or channel allocation


• During the floorplanning step, assign the areas between blocks that are to be used for interconnect.
• Routing a T-junction between two channels in two-level metal.
• The dots represent logic cell pins.
• (a) Routing channel A (the stem of the T) first allows us to adjust the width of channel B.
• (b) If we route channel B first (the top of the T), this fixes the width of channel A.
• Route the stem of a T-junction before the top.
Channel Routing

• Defining the channel routing order for a slicing floorplan using a slicing tree.
• (a) Make a cut all the way across the chip between circuit blocks. Continue slicing until each
piece contains just one circuit block. Each cut divides a piece into two without cutting
through a circuit block.
• (b) A sequence of cuts: 1, 2, 3, and 4 that successively slices the chip until only circuit blocks
are left.
• (c) The slicing tree corresponding to the sequence of cuts gives the order in which to route
the channels: 4, 3, 2, and finally 1.
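A minimal sketch of deriving that routing order from a slicing tree (the Node class, labels, and example tree below are illustrative, not the figure's exact tree): channels inside each piece are routed before the channel created by the cut that separates the two pieces, which is a post-order traversal of the tree's internal nodes.

```python
# Sketch: derive a channel routing order from a slicing tree by post-order traversal.

class Node:
    def __init__(self, cut=None, left=None, right=None, block=None):
        self.cut = cut          # cut label for internal nodes (e.g. 1, 2, 3, 4)
        self.left, self.right = left, right
        self.block = block      # circuit block name for leaf nodes

def routing_order(node, order=None):
    """Return cut labels in the order their channels should be routed."""
    if order is None:
        order = []
    if node.block is not None:        # leaf: a circuit block, nothing to route
        return order
    routing_order(node.left, order)   # route channels inside the two pieces first
    routing_order(node.right, order)
    order.append(node.cut)            # then the channel made by this cut
    return order

# Example tree for cuts 1..4 (shape is illustrative only):
tree = Node(cut=1,
            left=Node(cut=2, left=Node(block="A"),
                      right=Node(cut=4, left=Node(block="B"), right=Node(block="C"))),
            right=Node(cut=3, left=Node(block="D"), right=Node(block="E")))
print(routing_order(tree))  # [4, 2, 3, 1] -- deepest cuts route first, cut 1 last
```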
Slicing Floorplan and General Floorplan
• Figure: a slicing floorplan of blocks 1–7, its slicing tree (alternating vertical (v) and horizontal (h) cuts), and a non-slicing floorplan for comparison.
Area Utilization
• Area utilization
– Depends on how nicely the rigid modules' shapes are matched.
– Soft modules can take different shapes to “fill in” empty slots.
– Floorplan sizing

• Figure: two floorplans of the same modules m1–m7; the floorplan whose module shapes fit together better has less deadspace (one floorplan has chip area 20×22 = 440, the other is smaller).
Slicing Floorplan Sizing
• Bottom-up process
– Has to be done per floorplan perturbation.
– Requires O(n) time (n is the total number of shapes of all modules).

• Combining two child shapes at a node of the slicing tree (child i has dimensions ai × bi, child j has xj × yj):
– Vertical cut (children left L and right R): width = ai + xj, height = max(bi, yj).
– Horizontal cut (children top T and bottom B): width = max(ai, xj), height = bi + yj.
Slicing Floorplan Sizing
• Simple case: all modules are hard macros
– No rotation allowed; each module has one shape only.
• Figure: sizing example for modules m1–m7. Module shapes (e.g., 9×7, 8×8, 8×11, 7×5, 4×7, 5×4, 4×8, 4×11, 3×6, 4×5) are combined bottom-up through the slicing-tree nodes (sub-floorplans such as 34, 234, 2345, 67, 167) until the root 1234567 has size 17×16.
Slicing Floorplan Sizing
 General case: all modules are soft macros
 Stockmeyer’s work (1983) gives optimal module orientation
 Non-slicing: NP-complete
 Slicing: polynomial-time solvable with dynamic programming
 Phase 1: bottom-up
 Input: floorplan tree, module shapes
 Start with sorted shape lists of the modules
 Perform Vertical_Node_Sizing & Horizontal_Node_Sizing
 When we get to the root node, we have a list of shapes. Select the one that is best in terms of area.
 Phase 2: top-down
 Traverse the floorplan tree and set module locations
Sizing Example
Module A shapes: a1 = 4×6, a2 = 5×5, a3 = 6×4
Module B shapes: b1 = 2×7, b2 = 3×4, b3 = 4×2
Combining A and B side by side (widths add, heights take the max):

           a1 (4×6)   a2 (5×5)   a3 (6×4)
  b1 (2×7)   6×7        7×7        8×7
  b2 (3×4)   7×6        8×5        9×4
  b3 (4×2)   8×6        9×5       10×4
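A minimal sketch of the node-sizing step that produces the table above (function names are illustrative; the full Stockmeyer algorithm also prunes dominated shapes from sorted lists):

```python
# Slicing-floorplan node sizing: combine the shape lists of two child blocks.
# Side by side (a vertical cut): widths add, heights take the max.

def vertical_node_sizing(shapes_a, shapes_b):
    """Return all (width, height) combinations of the two shape lists."""
    return [(wa + wb, max(ha, hb)) for (wa, ha) in shapes_a for (wb, hb) in shapes_b]

def horizontal_node_sizing(shapes_a, shapes_b):
    """Stacked (a horizontal cut): heights add, widths take the max."""
    return [(max(wa, wb), ha + hb) for (wa, ha) in shapes_a for (wb, hb) in shapes_b]

A = [(4, 6), (5, 5), (6, 4)]   # a1, a2, a3
B = [(2, 7), (3, 4), (4, 2)]   # b1, b2, b3
print(vertical_node_sizing(A, B))
# [(6, 7), (7, 6), (8, 6), (7, 7), (8, 5), (9, 5), (8, 7), (9, 4), (10, 4)]
```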
Cyclic Constraints

• Cyclic constraints.
• (a) A nonslicing floorplan with a cyclic constraint that prevents channel routing.
• (b) In this case it is difficult to find a slicing floorplan without increasing the chip area.
• (c) This floorplan may be sliced (with initial cuts 1 or 2) and has no cyclic
constraints, but it is inefficient in area use and will be very difficult to route.
Cyclic Constraints


• (a) We can eliminate the cyclic constraint by merging the blocks A and C.
• (b) A slicing structure.
I/O and Power Planning
• Every chip communicates with the outside world.

• Signals flow onto and off the chip, and we need to supply power.

• We need to consider the I/O and power constraints early in the floorplanning process.

• A silicon chip or die (plural die, dies, or dice) is mounted on a chip carrier inside a chip package. Connections are made by bonding the chip pads to fingers on a metal lead frame that is part of the package.

• The metal lead-frame fingers connect to the package pins. A die consists of a logic core inside a pad ring.

• On a pad-limited die we use tall, thin pad-limited pads, which maximize the number of pads we can fit around the outside of the chip.
• On a core-limited die we use short, wide core-limited pads.
I/O and Power Planning

• FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of
pads determines the die size. (b) A core-limited die: The core logic determines the die
size. (c) Using both pad-limited pads and core-limited pads for a square die.
I/O and Power Planning (contd.,)
• Special power pads are used for:
1. the positive supply, or VDD, power buses (or power rails), and
2. the ground or negative supply, VSS or GND.

– one set of VDD/VSS pads supplies power to the I/O pads only.

– Another set of VDD/VSS pads connects to a second power ring that supplies the logic core.

• I/O power is dirty power
– It has to supply large transient currents to the output transistors.
– Keep dirty power separate to avoid injecting noise into the internal-logic power (the clean power).

• I/O pads also contain special circuits to protect against electrostatic discharge (ESD).
– These circuits can withstand very short high-voltage (several kilovolt) pulses that can be generated during human or machine handling.
I/O and Power Planning (contd.,)
• If we make an electrical connection between the substrate and a chip pad, or to a
package pin, it must be to VDD ( n -type substrate) or VSS ( p -type substrate).
This substrate connection (for the whole chip) employs a down bond (or drop
bond) to the carrier. We have several options:

 We can dedicate one (or more) chip pad(s) to down bond to the chip carrier.

 We can make a connection from a chip pad to the lead frame and down bond
from the chip pad to the chip carrier.

 We can make a connection from a chip pad to the lead frame and down bond
from the lead frame.

 We can down bond from the lead frame without using a chip pad.

 We can leave the substrate and/or chip carrier unconnected.

• Depending on the package design, the type and positioning of down bonds may be fixed.
This means we need to fix the position of the chip pad for down bonding using a pad seed.
I/O and Power Planning (contd.,)
• A double bond connects two pads to one chip-carrier finger and one
package pin. We can do this to save package pins or reduce the series
inductance of bond wires (typically a few nanohenries) by parallel connection
of the pads.

• To reduce the series resistive and inductive impedance of power supply networks, it is normal to use multiple VDD and VSS pads.

• This is particularly important with the simultaneously switching outputs (SSOs) that occur when driving buses.

– The output pads can easily consume most of the power on a CMOS ASIC, because the load on
a pad (usually tens of picofarads) is much larger than typical on-chip capacitive loads.

– Depending on the technology it may be necessary to provide dedicated VDD and VSS pads for every few SSOs. Design rules set how many SSOs can be used per VDD/VSS pad pair. These dedicated VDD/VSS pads must “follow” groups of output pads as they are seeded or planned on the floorplan.
I/O and Power Planning (contd.,)
• Using a pad mapping, we translate the logical pad in a netlist to a physical
pad from a pad library. We might control pad seeding and mapping in the
floorplanner.

• There are several nonobvious factors that must be considered when generating a pad ring:

• Design library pad cells for one orientation.
– For example, an edge pad for the south side of the chip, and a corner pad for the southeast corner.
– Generate other orientations by rotation and flipping (mirroring).
– Some ASIC vendors will not allow rotation or mirroring of logic cells in the mask file. To avoid these problems we may need to have separate horizontal, vertical, left-handed, and right-handed pad cells in the library with appropriate logical-to-physical pad mappings.

• Mixing of pad-limited and core-limited edge pads in the same pad ring complicates the design of corner pads.
– In this case a corner pad also becomes a pad-format changer, or hybrid corner pad.

• In single-supply chips we have one VDD net and one VSS net, both
global power nets . It is also possible to use mixed power supplies
(for example, 3.3 V and 5 V) or multiple power supplies ( digital VDD,
analog VDD).
I/O and Power Planning (contd.,)

• FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited pads. (b) A hybrid
corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump bonded chip (or flip-chip). The chip is
turned upside down and solder bumps connect the pads to the lead frame.
I/O and Power Planning (contd.,)
• A stagger-bond arrangement uses two rows of I/O pads.
– In this case the design rules for bond wires (the spacing and the angle at which the
bond wires leave the pads) become very important.

• Area-bump bonding arrangement (also known as flip-chip, solder-bump) is used, for example, with ball-grid array (BGA) packages.

– Even though the bonding pads are located in the center of the chip, the I/O circuits
are still often located at the edges of the chip because of difficulties in power
supply distribution and integrating I/O circuits together with logic in the center of
the die.

• In an MGA, the pad spacing and I/O-cell spacing is fixed—each pad occupies a fixed pad slot (or pad site). This means that the properties of the pad I/O are also fixed but, if we need to, we can parallel adjacent output cells to increase the drive. To increase flexibility further, the I/O cells can use a separation, the I/O-cell pitch, that is smaller than the pad pitch.
I/O and Power Planning (contd.,)

• FIGURE 16.14 Gate-array I/O pads. (a) Cell-based ASICs may contain pad cells of different sizes and widths. (b) A corner of a gate-array base. (c) A gate-array base with different I/O cell and pad pitches.
I/O and Power Planning (contd.,)

• The long direction of a rectangular channel is the channel spine .

• Some automatic routers may require that metal lines parallel to a channel
spine use a preferred layer (either m1, m2, or m3). Alternatively we say that
a particular metal layer runs in a preferred direction .
I/O and Power Planning (contd.,)

• FIGURE 16.15 Power distribution. (a) Power distributed using m1 for VSS and m2 for VDD. This helps
minimize the number of vias and layer crossings needed but causes problems in the routing channels.
(b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine. This can
make automatic routing easier but may increase the number of vias and layer crossings. (c) An
expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers
along the spine of a channel, this forces signals to change layers. (d) A closeup of VDD and VSS buses as
they cross. Changing layers requires a large number of via contacts to reduce resistance.
Power distribution.
• (a) Power distributed using m1 for VSS and m2 for VDD.
– This helps minimize the number of vias and layer crossings needed
– but causes problems in the routing channels.

• (b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine.
– This can make automatic routing easier
– but may increase the number of vias and layer crossings.

• (c) An expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers along the spine of a channel, this forces signals to change layers.

• (d) A closeup of VDD and VSS buses as they cross. Changing layers
requires a large number of via contacts to reduce resistance.
Clock Planning
• A clock spine routing scheme drives all clock pins directly from the clock driver. MGAs and FPGAs often use this fishbone type of clock distribution scheme.
• Clock skew and clock latency (we want to minimize both).
• FIGURE 16.16 Clock distribution.
• (a) A clock spine for a gate array.

(b) A clock spine for a cell-based ASIC
(typical chips have thousands of clock
nets).

(c) A clock spine is usually driven from
one or more clock-driver cells. Delay in
the driver cell is a function of the
number of stages and the ratio of output
to input capacitance for each stage
(taper).

(d) Clock latency and clock skew. We
would like to minimize both latency and
skew.
Clock Planning (cont.,)
• FIGURE 16.17 A clock tree. (a) Minimum delay is achieved when the
taper of successive stages is about 3. (b) Using a fanout of three at
successive nodes.
(c) A clock tree for the cell-based ASIC of Figure 16.16 b. We have to balance
the clock arrival times at all of the leaf nodes to minimize clock skew.
Content
⚫ Placement Definitions
⚫ Placement Goals and Objectives
⚫ Measurement of Placement Goals and Objectives
⚫ Placement Algorithms
⚫ Simple Placement Example
⚫ Physical Design Flow
Placement
⚫ The process of arranging circuit components on a layout surface under certain constraints.
⚫ Inputs: set of fixed modules, netlist.
⚫ Output: best position for each module based on various cost functions.
⚫ Cost functions include wirelength, wire routability, performance, hotspots, and I/O pads.
⚫ Placement is much more suited to automation than floorplanning.
⚫ After we complete floorplanning and placement, we can predict both intrablock and interblock capacitances.
Good placement vs Bad placement*

⚫ Good placement: no congestion, shorter wires, fewer metal levels, smaller delay, lower power dissipation.
⚫ Bad placement: congestion, longer wire lengths, more metal levels, longer delay, higher power dissipation.
Placement Terms and Definitions
⚫ CBIC, MGA, and FPGA architectures all have rows of logic cells separated by the interconnect—these are row-based ASICs.

• FIGURE 16.18 INTERCONNECT STRUCTURE. (a) The two-level metal CBIC floorplan shown in Figure 16.11 b. (b) A channel from the flexible block A. This channel has a channel height equal to the maximum channel density of 7 (there is room for seven interconnects to run horizontally in m1). (c) A channel that uses OTC (over-the-cell) routing in m2.
• FIGURE 16.19 GATE-ARRAY INTERCONNECT. (a) A small two-level metal gate array (about 4.6 k-gate). (b) Routing in a block. (c) Channel routing showing channel density and channel capacity. The channel height on a gate array may only be increased in increments of a row. If the interconnect does not use up all of the channel, the rest of the space is wasted. The interconnect in the channel runs in m1 in the horizontal direction with m2 in the vertical direction.
Vertical interconnect uses feedthroughs to cross the logic cells. Here are some
commonly used terms with explanations (there are no generally accepted
definitions):

⚫ An unused vertical track (or just track) in a logic cell is called an uncommitted feedthrough (also built-in feedthrough, implicit feedthrough, or jumper).
⚫ A vertical strip of metal that runs from the top to bottom of a cell (for double-entry cells), but has no connections inside the cell, is also called a feedthrough or jumper.
⚫ Two connectors for the same physical net are electrically equivalent connectors (or equipotential connectors). For double-entry cells these are usually at the top and bottom of the logic cell.
⚫ A dedicated feedthrough cell (or crosser cell ) is an empty cell (with no logic) that
can hold one or more vertical interconnects. These are used if there are no other
feedthroughs available.

⚫ A feedthrough pin or feedthrough terminal is an input or output that has connections at both the top and bottom of the standard cell.
⚫ A spacer cell (usually the same as a feedthrough cell) is used to fill space in rows so that the ends of all rows in a flexible block may be aligned to connect to power buses, for example.

⚫ There are also LOGICALLY EQUIVALENT CONNECTORS (or FUNCTIONALLY EQUIVALENT CONNECTORS, sometimes also just called EQUIVALENT CONNECTORS—which is very confusing).
⚫ Example: the two inputs of a two-input NAND gate may be logically equivalent connectors. The placement tool can swap these without altering the logic (but the two inputs may have different delay properties, so it is not always a good idea to swap them).

⚫ There can also be LOGICALLY EQUIVALENT CONNECTOR GROUPS. For example, in an OAI22 (OR-AND-INVERT) gate there are four inputs: A1, A2 are inputs to one OR gate (gate A), and B1, B2 are inputs to the second OR gate (gate B). Then group A = (A1, A2) is logically equivalent to group B = (B1, B2)—if we swap one input (A1 or A2) from gate A to gate B, we must swap the other input in the group (A2 or A1).


Interconnect Area for CBIC,MGA and FPGA
HORIZONTAL INTERCONNECT

⚫ In the case of channeled gate arrays and FPGAs, the horizontal interconnect
areas—the channels, usually on m1—have a fixed capacity.

⚫ The channel capacity of CBICs and channelless MGAs can be expanded to


hold as many interconnects as are needed. Normally we choose, as an objective,
to minimize the number of interconnects that use each channel.

VERTICAL INTERCONNECT

⚫ In the vertical interconnect direction, usually m2, FPGAs still have fixed
resources.

⚫ In contrast the placement tool can always add vertical feedthroughs to a


channeled MGA, channelless MGA, or CBIC. These problems become less
important as we move to three and more levels of interconnect.


Placement Goals and Objectives
The goal of a placement tool is to arrange all the logic cells within the flexible
blocks on a chip.
Ideally, the objectives of the placement step are to
⚫ Guarantee the router can complete the routing step
⚫ Minimize all the critical net delays
⚫ Make the chip as dense as possible

We may also have the following additional objectives:


⚫ Minimize power dissipation
⚫ Minimize cross talk between signals

Current placement tools use more specific and achievable criteria. The most
commonly used placement objectives are one or more of the following:
⚫ Minimize the total estimated interconnect length
⚫ Meet the timing requirements for critical nets
⚫ Minimize the interconnect congestion
Measurement of Placement Goals and Objectives

⚫ The graph structures that correspond to making all the connections for a net
are known as trees on graphs (or just trees ).

⚫ Special classes of trees— Steiner trees —minimize the total length of


interconnect and they are central to ASIC routing algorithms.

⚫ Minimum Steiner tree: this type of tree may use diagonal connections—we want to solve a restricted version of this problem, using interconnects on a rectangular grid. This is called rectilinear routing or Manhattan routing.

⚫ Euclidean distance between two points is the straight-line distance.

⚫ The Manhattan distance (or rectangular distance) between two points is the
distance we would have to walk in New York.


• FIGURE 16.20 Placement using trees on graphs. (a) The floorplan from Figure 16.11 b. (b) An expanded view of the flexible block A showing four rows of standard cells for placement (typical blocks may contain thousands or tens of thousands of logic cells). We want to find the length of the net shown with four terminals, W through Z, given the placement of four logic cells (labeled: A.211, A.19, A.43, A.25). (c) The problem for net (W, X, Y, Z) drawn as a graph. The shortest connection is the minimum Steiner tree. (d) The minimum Steiner tree.
Measurement of Placement (contd.,)
⚫ The minimum rectilinear Steiner tree ( MRST ) is the shortest interconnect
using a rectangular grid. The determination of the MRST is in general an NP-
complete problem—which means it is hard to solve.

⚫ The complete graph has connections from each terminal to every other terminal.

⚫ The complete-graph measure adds all the interconnect lengths of the complete-graph connection together and then divides by n/2, where n is the number of terminals.
A complete graph on n terminals has n(n - 1)/2 connections.

⚫ The bounding box is the smallest rectangle that encloses all the terminals.

⚫ The half-perimeter measure (or bounding-box measure) is one-half the perimeter of the bounding box. The total estimate over a placement is

  f = Σ (i = 1 to m) h_i

where m is the number of nets and h_i is the half-perimeter measure for net i.
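A minimal sketch of the two measures for a single net (the terminal coordinates below are made up):

```python
# Two interconnect-length estimates for one net, given its terminal (x, y) positions.

from itertools import combinations

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def complete_graph_measure(terminals):
    """Sum of Manhattan lengths over all terminal pairs, divided by n/2."""
    n = len(terminals)
    total = sum(manhattan(p, q) for p, q in combinations(terminals, 2))
    return total / (n / 2)

def half_perimeter_measure(terminals):
    """Half the perimeter of the bounding box enclosing all terminals (HPWL)."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

net = [(1, 1), (4, 2), (3, 5), (0, 4)]   # terminals W, X, Y, Z (made-up coordinates)
print(complete_graph_measure(net), half_perimeter_measure(net))
```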
FIGURE 16.21 Interconnect-length measures. (a) Complete-
graph measure. (b) Half-perimeter measure.


Correlation between total length of chip interconnect and the half-
perimeter and complete-graph measures.

⚫ The meander factor specifies, on average, the ratio of the interconnect created by the routing tool to the interconnect-length estimate used by the placement tool.
⚫ Another problem is that we have concentrated on finding estimates to the MRST, but the MRST that minimizes total net length may not minimize net delay.
Interconnect congestion
⚫ There is no point in minimizing the interconnect length if we create a placement
that is too congested to route.

⚫ If we use minimum interconnect congestion as an additional placement objective, we need some way of measuring it.

⚫ What we are trying to measure is interconnect density

⚫ One measure of interconnect congestion uses the maximum cut line .

⚫ Maximum cut line: imagine a horizontal or vertical line drawn anywhere across a chip or block.

⚫ The number of interconnects that must cross this line is the cut size (the number of interconnects we cut). The maximum cut line has the highest cut size.
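A minimal sketch of this measure for vertical cut lines (the net extents and candidate cut positions below are made up):

```python
# Estimate the cut size of a vertical cut line: how many nets must cross it,
# given each net's horizontal extent (x_min, x_max).

def cut_size(nets_x_spans, cut_x):
    """A net crosses the cut if its x-extent spans cut_x."""
    return sum(1 for x_min, x_max in nets_x_spans if x_min < cut_x < x_max)

def maximum_cut(nets_x_spans, candidate_cuts):
    """The maximum cut line is the candidate with the highest cut size."""
    return max(candidate_cuts, key=lambda c: cut_size(nets_x_spans, c))

spans = [(0, 5), (2, 8), (4, 6), (7, 9)]           # made-up net extents in x
print([cut_size(spans, c) for c in (1, 3, 5, 8)])  # [1, 2, 2, 1]
print(maximum_cut(spans, (1, 3, 5, 8)))            # 3 (one of the busiest cuts)
```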
• FIGURE 16.23 Interconnect congestion for the cell-
based ASIC from Figure 16.11 (b). (a) Measurement of
congestion. (b) An expanded view of flexible block A
shows a maximum cut line.

Interconnect Delay

⚫ Many placement tools minimize estimated interconnect length or interconnect congestion

as objectives.

⚫ The problem with this approach is that a logic cell may be placed a long way from
another logic cell to which it has just one connection. This logic cell with one connection is
less important as far as the total wire length is concerned than other logic cells, to which
there are many connections. However, the one long connection may be critical as far as
timing delay is concerned.

⚫ As technology is scaled, interconnection delays become larger relative to circuit delays and

this problem gets worse.


Interconnect Delay
⚫ In timing-driven placement we must estimate delay for every net for every trial placement, possibly for hundreds of thousands of gates.

⚫ Unfortunately, the minimum-length Steiner tree does not necessarily correspond to


the interconnect path that minimizes delay. To construct a minimum-delay path we may
have to route with non-Steiner trees.

⚫ In the placement phase typically we take a simple interconnect-length approximation to this minimum-delay path (typically the half-perimeter measure).

⚫ Even when we can estimate the length of the interconnect, we do not yet have
information on which layers and how many vias the interconnect will use or how wide it
will be. Some tools allow us to include estimates for these parameters.
⚫ Often we can specify metal usage , the percentage of routing on the different layers to
expect from the router. This allows the placement tool to estimate RC values and delays—
and thus minimize delay.

Placement Algorithms
There are two classes of placement algorithms commonly used in commercial CAD tools:
 Constructive placement - uses a set of rules to arrive at a constructed placement. Examples: the min-cut algorithm and the eigenvalue method.
 Iterative placement improvement.

As in system partitioning, placement usually starts with a constructed solution and then improves it using an iterative algorithm.

The min-cut placement method uses successive application of partitioning (a minimal code sketch follows the figure below). The steps are:
⚫ Cut the placement area into two pieces.
⚫ Swap the logic cells to minimize the cut cost.
⚫ Repeat the process from step 1, cutting smaller pieces until all the logic cells are placed.
Usually we divide the placement area into bins. The size of a bin can vary, from a bin size equal to the base cell (for a gate array) to a bin size that would hold several logic cells. We can start with a large bin size, to get a rough placement, and then reduce the bin size to get a final placement.
• FIGURE 16.24 Min-cut placement. (a) Divide the chip into bins using a grid. (b) Merge all connections to
the center of each bin. (c) Make a cut and swap logic cells between bins to minimize the cost of the cut.
(d) Take the cut pieces and throw out all the edges that are not inside the piece. (e) Repeat the process with
a new cut and continue until we reach the individual bins.
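The toy sketch below illustrates min-cut placement by recursive bisection. All names are illustrative, and the greedy random-swap partitioner is only a stand-in for the proper partitioning algorithms (e.g., Kernighan-Lin or Fiduccia-Mattheyses) that real tools use at each cut.

```python
# Toy min-cut placement: recursively cut the region and assign cells to bins.

import random

def cut_cost(nets, left, right):
    """Number of nets with cells on both sides of the cut."""
    return sum(1 for net in nets if net & left and net & right)

def bipartition(cells, nets, passes=50):
    """Greedy stand-in partitioner: even split, then single-cell swaps that reduce the cut."""
    cells = list(cells)
    random.shuffle(cells)
    half = len(cells) // 2
    left, right = set(cells[:half]), set(cells[half:])
    for _ in range(passes):
        a, b = random.choice(list(left)), random.choice(list(right))
        trial_l, trial_r = (left - {a}) | {b}, (right - {b}) | {a}
        if cut_cost(nets, trial_l, trial_r) < cut_cost(nets, left, right):
            left, right = trial_l, trial_r
    return left, right

def min_cut_place(cells, nets, region, min_cells=1):
    """Return {cell: bin rectangle} by cutting the region until bins hold few cells."""
    x0, y0, x1, y1 = region
    if len(cells) <= min_cells:
        return {c: region for c in cells}
    left, right = bipartition(cells, nets)
    if x1 - x0 >= y1 - y0:                       # cut the longer side of the region
        xm = (x0 + x1) / 2
        r_left, r_right = (x0, y0, xm, y1), (xm, y0, x1, y1)
    else:
        ym = (y0 + y1) / 2
        r_left, r_right = (x0, y0, x1, ym), (x0, ym, x1, y1)
    placement = min_cut_place(left, nets, r_left, min_cells)
    placement.update(min_cut_place(right, nets, r_right, min_cells))
    return placement

cells = {"A", "B", "C", "D"}
nets = [{"A", "B"}, {"B", "C"}, {"C", "D"}]      # each net is a set of cells
print(min_cut_place(cells, nets, (0, 0, 8, 8)))
```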


Eigenvalue Placement Algorithm
The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix (eigenvalue methods are also known as spectral methods) [Hall, 1970]. The measure we use is a cost function f that we shall minimize, given by

  f = (1/2) Σ (i,j = 1 to n) c_ij d_ij²     (1)

where C = [c_ij] is the (possibly weighted) connectivity matrix, and d_ij is the Euclidean distance between the centers of logic cell i and logic cell j. Since we are going to minimize a cost function that is the square of the distance between logic cells, these methods are also known as quadratic placement methods. This type of cost function leads to a simple mathematical solution. We can rewrite the cost function f in matrix form:

  f = (1/2) Σ (i,j = 1 to n) c_ij [(x_i − x_j)² + (y_i − y_j)²] = x^T B x + y^T B y

B is a symmetric matrix, the disconnection matrix (also called the Laplacian):

  B = D − C

where C is the connectivity matrix and D is the diagonal (degree) matrix defined by

  d_ii = Σ (j = 1 to n) c_ij ,   d_ij = 0 for i ≠ j
We can simplify the problem by noticing that it is symmetric in the x- and y-coordinates.

Let us solve the simpler problem of minimizing the cost function for the placement of logic cells along just the x-axis first. We can then apply this solution to the more general two-dimensional placement problem.

Before we solve this simpler problem, we introduce a constraint that the logic cells must correspond to valid positions (the cells do not overlap and they are placed on-grid). We also make another simplifying assumption that all logic cells are the same size and we must place them in fixed positions. We can define a vector p consisting of the valid positions:

  p = (p_1, p_2, ..., p_n)     (4)
For a valid placement the x-coordinates of the logic cells,

  x = (x_1, x_2, ..., x_n)     (5)

must be a permutation of the fixed positions, p. We can show that requiring the logic cells to be in fixed positions in this way leads to a series of n equations restricting the values of the logic cell coordinates. If we impose all of these constraint equations the problem becomes very complex. Instead we choose just one of the equations:

  Σ (i = 1 to n) x_i² = Σ (i = 1 to n) p_i²     (6)


Simplifying the problem in this way will lead to an approximate solution to the placement problem. We can write this single constraint on the x-coordinates in matrix form:

  x^T x = P,   P = Σ (i = 1 to n) p_i²

where P is a constant.
We can now summarize the formulation of the problem, with the simplifications that we have made, for a one-dimensional solution. We must minimize a cost function, g, where

  g = x^T B x     (8)

subject to the constraint:

  x^T x = P     (9)
This is a standard problem that we can solve using a Lagrangian multiplier:

  L = x^T B x − λ (x^T x − P)     (10)

To find the value of x that minimizes g we differentiate L partially with respect to x and set the result equal to zero. We get the following equation:

  (B − λI) x = 0     (11)
This last equation is called the characteristic equation for the disconnection matrix B and occurs frequently in matrix algebra (this λ has nothing to do with scaling). The solutions to this equation are the eigenvectors and eigenvalues of B. Multiplying Eq. (11) by x^T we get:

  x^T B x = λ x^T x

However, since we imposed the constraint x^T x = P and x^T B x = g, then

  g = λP

The eigenvectors of the disconnection matrix B are the solutions to our placement problem.
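A minimal numpy sketch of this one-dimensional spectral placement (the connectivity matrix below is made up for illustration):

```python
# One-dimensional eigenvalue (spectral) placement sketch.

import numpy as np

C = np.array([[0, 1, 1, 0],      # c_ij = number of connections between cells i and j
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

D = np.diag(C.sum(axis=1))        # degree matrix: d_ii = sum_j c_ij
B = D - C                         # disconnection matrix (Laplacian)

evals, evecs = np.linalg.eigh(B)  # eigenvalues in ascending order for symmetric B
# The smallest eigenvalue is 0 with a constant eigenvector (all cells at one point),
# so the second-smallest eigenvector gives the useful 1-D placement (cost g = lambda*P).
x = evecs[:, 1]
order = np.argsort(x)             # relative left-to-right ordering of the logic cells
print(x, order)
```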


Iterative Placement Improvement

An iterative placement improvement algorithm takes an existing


placement and tries to improve it by moving the logic cells. There are
two parts to the algorithm:
⚫ The selection criteria that decides which logic cells to try moving.
⚫ The measurement criteria that decides whether to move the selected
cells.

There are several interchange or iterative exchange methods that differ in their selection and measurement criteria:
⚫ Pair wise interchange,
⚫ force-directed interchange,
⚫ force-directed relaxation, and
⚫ force-directed pair wise relaxation.

All of these methods usually consider only pairs of logic cells to be exchanged. A source logic cell is picked for trial exchange with a destination logic cell.
Iterative Placement Improvement (contd.,)
The pairwise-interchange algorithm is similar to the interchange algorithm used for iterative improvement in the system partitioning step:
⚫ Select the source logic cell at random.
⚫ Try all the other logic cells in turn as the destination logic cell.
⚫ Use any of the measurement methods we have discussed to decide on
whether to
accept the interchange.
⚫ The process repeats from step 1, selecting each logic cell in turn as a
source logic cell.

The neighborhood exchange algorithm is a modification to pairwise interchange that considers only destination logic cells in a neighborhood—cells within a certain distance, e, of the source logic cell. Limiting the search area for the destination logic cell to the e-neighborhood reduces the search time.

• FIGURE 16.26 Interchange.
• (a) Swapping the source logic cell with a destination logic cell in pairwise interchange.
• (b) Sometimes we have to swap more than two logic cells at a time to reach an optimum
placement, but this is expensive in computation time. Limiting the search to
neighborhoods reduces the search time. Logic cells within a distance e of a logic cell
form an e-neighborhood.
• (c) A one-neighborhood.
• (d) A two-neighborhood.
Iterative Placement Improvement (contd.,)
Force-directed placement methods:

Imagine identical springs connecting all the logic cells we wish to place.
The number of springs is equal to the number of connections between logic
cells. The effect of the springs is to pull connected logic cells together. The more
highly connected the logic cells, the stronger the pull of the springs. The force on
a logic cell i due to logic cell j is given by Hooke’s law , which says the force
of a spring is proportional to its extension:
  F_ij = −c_ij x_ij

⚫ The vector component x_ij is directed from the center of logic cell i to the center of logic cell j.
⚫ The vector magnitude is calculated as either the Euclidean or Manhattan distance between the logic cell centers.
⚫ The c_ij form the connectivity or cost matrix (the matrix element c_ij is the number of connections between logic cell i and logic cell j).


• FIGURE 16.27 Force-directed placement.
• (a) A network with nine logic cells.
• (b) We make a grid (one logic cell per bin).
• (c) Forces are calculated as if springs were attached to the centers of each logic cell for each connection. The two nets connecting logic cells A and I correspond to two springs.

• (d) The forces are proportional to the spring extensions.
Iterative Placement Improvement (contd.,)
Force-directed placement algorithms:

 The force-directed interchange algorithm uses the force vector to


select a pair of logic cells to swap.
 In force-directed relaxation, a chain of logic cells is moved.
 The force-directed pairwise relaxation algorithm swaps one pair of
logic cells at a time.

We reach a force-directed solution when we minimize the energy of the system, corresponding to minimizing the sum of the squares of the distances separating logic cells. Force-directed placement algorithms thus also use a quadratic cost function.
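A minimal sketch of the spring-force calculation these methods rely on (cell positions and the connectivity matrix below are made up; the sign convention is chosen so connected cells attract each other, as the text describes):

```python
# Spring forces for force-directed placement: each connection pulls a cell
# toward its neighbor with strength proportional to c_ij and to the separation.

import numpy as np

def forces(positions, C):
    """positions: (n, 2) array of cell centers; C: (n, n) connectivity matrix.
    Returns the (n, 2) net spring force on each logic cell."""
    n = len(positions)
    F = np.zeros((n, 2))
    for i in range(n):
        for j in range(n):
            if i != j:
                x_ij = positions[j] - positions[i]   # vector from cell i toward cell j
                F[i] += C[i, j] * x_ij               # stronger pull for more connections
    return F

pos = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])  # made-up cell centers
C = np.array([[0, 2, 1],
              [2, 0, 1],
              [1, 1, 0]], dtype=float)
print(forces(pos, C))  # a zero-force spot for a cell is the weighted mean of its neighbors
```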


• FIGURE 16.28 Force-directed iterative
placement
improvement.
• (a) Force-directed interchange.
• (b) Force-directed relaxation.
• (c) Force-directed pairwise relaxation.
Placement Using Simulated Annealing
Applying simulated annealing to placement, the algorithm is as follows:

⚫ Select logic cells for a trial interchange, usually at random.
⚫ Evaluate the objective function E for the new placement.
⚫ If ΔE is negative or zero, then exchange the logic cells. If ΔE is positive, then exchange the logic cells with a probability of exp(–ΔE/T).
⚫ Go back to step 1 for a fixed number of times, and then lower the temperature T according to a cooling schedule: T_{n+1} = 0.9 T_n, for example.
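A toy sketch of these steps (the cost function, move generator, and the tiny one-dimensional example are placeholders for illustration only):

```python
# Toy simulated-annealing placement loop following the steps above.

import math
import random

def anneal(placement, cost, random_swap, T=10.0, alpha=0.9, moves_per_T=100, T_min=0.01):
    """placement: mutable state; cost(p) -> float; random_swap(p) -> undo function."""
    E = cost(placement)
    while T > T_min:
        for _ in range(moves_per_T):
            undo = random_swap(placement)          # trial interchange of two logic cells
            dE = cost(placement) - E
            if dE <= 0 or random.random() < math.exp(-dE / T):
                E += dE                            # accept the exchange
            else:
                undo()                             # reject: restore the previous placement
        T *= alpha                                 # cooling schedule: T_{n+1} = 0.9 T_n
    return placement, E

# Tiny usage: place 4 cells in 4 slots on a line to minimize total net length.
cells = [0, 1, 2, 3]                               # placement[i] = slot of cell i
nets = [(0, 1), (1, 2), (2, 3), (0, 3)]
wirelen = lambda p: sum(abs(p[a] - p[b]) for a, b in nets)

def swap(p):
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    def undo(): p[i], p[j] = p[j], p[i]
    return undo

print(anneal(cells, wirelen, swap))
```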

Experiments show that simple min-cut based constructive placement is


faster than simulated annealing but that simulated annealing is capable of giving
better results at the expense of long computer run times. The iterative
improvement methods that we described earlier are capable of giving results as
good as simulated annealing, but they use more complex algorithms.

Timing-Driven Placement Methods
⚫ Minimizing delay is becoming more and more important as a
placement objective.
⚫ There are two main approaches:
– net based

– path based

⚫ We can use net weights in our algorithms.


⚫ The problem is to calculate the weights.
⚫ One method finds the n most critical paths (using a timing-analysis engine,
possibly in the synthesis tool).
⚫ The net weights might then be the number of times each net appears in this list. Another method to find the net weights uses the zero-slack algorithm.
Timing-Driven Placement Methods

⚫ Figure 16.29 (a) shows a circuit with primary inputs at which we know the
arrival times (actual times) of each signal.
⚫ We also know the required times for the primary outputs, the points in time at which we want the signals to be valid.
⚫ We can work forward from the primary inputs and backward from the
primary outputs to determine arrival and required times at each input pin
for each net.
⚫ The difference between the required and arrival times at each input pin is
the slack time (the time we have to spare).
⚫ The zero-slack algorithm adds delay to each net until the slacks are zero, as shown in Figure 16.29 (b).
⚫ The net delays can then be converted to weights or constraints in the
placement.
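A minimal sketch of the slack bookkeeping the zero-slack algorithm relies on (pin names and times below are made up):

```python
# slack = required time - arrival time at each input pin.

def slacks(arrival, required):
    """arrival/required: dicts mapping pin name -> time; returns pin -> slack."""
    return {pin: required[pin] - arrival[pin] for pin in arrival}

# Made-up numbers: the algorithm keeps adding net delay (raising arrival times)
# on nets with positive slack until every slack reaches zero.
arrival = {"U1.A": 2.0, "U1.B": 3.5, "U2.A": 4.0}
required = {"U1.A": 5.0, "U1.B": 5.0, "U2.A": 4.0}
print(slacks(arrival, required))   # {'U1.A': 3.0, 'U1.B': 1.5, 'U2.A': 0.0}
```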

• FIGURE 16.29 The zero-slack algorithm. (a) The circuit with no net delays. (b) The zero-slack algorithm adds net delays.
Physical design flow
Thank you
