Asic Design Cadence
Circuit
Introduction
• The input to the floorplanning step is the output of system partitioning and design entry: a netlist.
• The netlist describes the circuit blocks, the logic cells within the blocks, and their connections.
Floorplanning Goals and Objectives
• The input to a floorplanning tool is a hierarchical netlist that describes
– the interconnection of the blocks (RAM, ROM, ALU, cache controller, and so on)
– the logic cells (NAND, NOR, D flip-flop, and so on) within the blocks
– the logic cell connectors (terminals , pins , or ports)
Objectives of Floorplanning
• To minimize the chip area
• To minimize delay
Measuring area is straightforward, but measuring delay is more difficult.
Floorplanning - Optimization
Optimize Performance
• Chip area.
• Total wire length.
• Critical path delay.
• Routability.
• Others, e.g. noise, heat dissipation.
Cost = αA + βL,
Where
A = total area,
L = total wire length,
α and β constants.
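The weighted cost above can be evaluated directly when comparing candidate floorplans. The sketch below is illustrative only: the function name `floorplan_cost`, the two candidate floorplans, and the weight values are assumptions, not values from these slides.

```python
def floorplan_cost(area, wire_length, alpha=1.0, beta=1.0):
    """Weighted floorplan cost: Cost = alpha * A + beta * L."""
    return alpha * area + beta * wire_length


def cheapest(candidates, alpha, beta):
    """Return the name of the candidate with the lowest weighted cost."""
    return min(
        candidates,
        key=lambda name: floorplan_cost(*candidates[name], alpha, beta),
    )


# Hypothetical candidates: name -> (total area A, total wire length L).
candidates = {"compact": (380.0, 120.0), "spread": (440.0, 90.0)}

# With equal weights the smaller floorplan wins; weighting wire length
# more heavily favors the floorplan with shorter wires.
area_driven = cheapest(candidates, alpha=1.0, beta=1.0)
wire_driven = cheapest(candidates, alpha=1.0, beta=3.0)
```

The choice of α and β decides which floorplan is preferred, which is why they are tuned per design.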
Floorplanning
Area
• Deadspace
• Minimizing area = minimizing deadspace
• Wire length estimation
• Exact wire length not known until
after routing.
• Pin position not known.
• How to estimate?
• Center to center estimation.
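A minimal sketch of center-to-center estimation follows. It assumes two-terminal nets, blocks given as (x, y, width, height), and Manhattan distance between block centers; the block names and coordinates are hypothetical.

```python
def center(block):
    """Center point of a block given as (x, y, width, height)."""
    x, y, w, h = block
    return (x + w / 2.0, y + h / 2.0)


def center_to_center_length(blocks, nets):
    """Estimate total wire length as the sum of Manhattan distances
    between the centers of the two blocks on each net."""
    total = 0.0
    for a, b in nets:
        xa, ya = center(blocks[a])
        xb, yb = center(blocks[b])
        total += abs(xa - xb) + abs(ya - yb)
    return total


# Hypothetical floorplan with three blocks and two two-terminal nets.
blocks = {"A": (0, 0, 4, 2), "B": (4, 0, 2, 2), "C": (0, 2, 6, 4)}
nets = [("A", "B"), ("A", "C")]
estimate = center_to_center_length(blocks, nets)
```

Because pin positions are not yet known, this is only an estimate; the exact length is available after routing.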
Floorplanning Tools
• Flexible blocks (or variable blocks ) :
– Their total area is fixed,
– Their shape (aspect ratio) and connector locations may be adjusted during the placement.
• Fixed blocks:
– The dimensions and connector locations of the other fixed blocks (perhaps RAM, ROM, compiled
cells, or megacells) can only be modified when they are created.
• Seeding:
– Force logic cells to be in selected flexible blocks by seeding . We choose seed cells by name.
– Seeding may be hard or soft.
• Hard seed - fixed and not allowed to move during the remaining floorplanning and placement steps.
• Soft seed - an initial suggestion only and can be altered if necessary by the floorplanner.
No Bounds
• [Figure: Block 1 to Block 4 shaped with no aspect-ratio bounds. NOT GOOD!]
With Bounds
lower bound ≤ height/width ≤ upper bound
• Soft Blocks
• Flexible shape
• I/O positions not yet determined
• Hard Blocks
• Fixed shape
• Fixed I/O pin positions
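For a soft block of fixed area, the aspect-ratio bound above translates into a range of legal widths. The sketch below assumes the ratio is height/width, as written on the slide; the function names and the example numbers are illustrative.

```python
import math


def width_range(area, lower, upper):
    """Legal (min_width, max_width) for a soft block of fixed area
    subject to lower <= height/width <= upper."""
    if area <= 0 or lower <= 0 or upper < lower:
        raise ValueError("need area > 0 and 0 < lower <= upper")
    # height = area / width, so height/width = area / width**2.
    return (math.sqrt(area / upper), math.sqrt(area / lower))


def shape_is_legal(width, height, lower, upper):
    """Check a candidate shape against the aspect-ratio bounds."""
    return lower <= height / width <= upper


# Hypothetical soft block: area 16, aspect ratio between 1/4 and 4.
w_min, w_max = width_range(16.0, 0.25, 4.0)
```

A hard block has a single fixed shape, so no such range applies to it.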
Sizing
example*
Floorplanning Tools
• Defining the channel routing order for a slicing floorplan using a slicing tree.
• (a) Make a cut all the way across the chip between circuit blocks. Continue slicing until each
piece contains just one circuit block. Each cut divides a piece into two without cutting
through a circuit block.
• (b) A sequence of cuts: 1, 2, 3, and 4 that successively slices the chip until only circuit blocks
are left.
• (c) The slicing tree corresponding to the sequence of cuts gives the order in which to route
the channels: 4, 3, 2, and finally 1.
Slicing Floorplan and General Floorplan
• [Figure: a slicing floorplan of modules 1 to 7 with its slicing tree (internal nodes labeled v and h), and a non-slicing floorplan.]
Area Utilization
• Area utilization
– Depends on how nicely the rigid modules’ shapes are
matched
– Soft modules can take different shapes to “fill in”
empty slots
– Floorplan sizing
• [Figure: two floorplans of modules m1 to m7, one with Area = 20x22 = 440 and one with Area = 20x19 = 380.]
Slicing Floorplan Sizing
• Bottom-up process
– Has to be done per floorplan perturbation
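– Requires O(n) time (n is the number of shapes of all modules)
• Vertical cut (V): left child L of shape ai x bi and right child R of shape xj x yj combine to (ai + xj) x max(bi, yj)
• Horizontal cut (H): top child T of shape ai x bi and bottom child B of shape xj x yj combine to max(ai, xj) x (bi + yj)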
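The bottom-up sizing rules for V and H cuts can be sketched as a short recursive function. This is a minimal sketch for the single-shape case (each module has exactly one width x height); the tree encoding as nested tuples and the module dimensions are assumptions made for illustration.

```python
def size(node, shapes):
    """Return (width, height) of the sub-floorplan rooted at node.

    A node is either a module name (leaf) or a tuple
    (cut, first, second) where cut is "V" or "H".
    """
    if isinstance(node, str):
        return shapes[node]
    cut, first, second = node
    a, b = size(first, shapes)
    x, y = size(second, shapes)
    if cut == "V":
        # Side by side: widths add, height is the taller child.
        return (a + x, max(b, y))
    if cut == "H":
        # Stacked: heights add, width is the wider child.
        return (max(a, x), b + y)
    raise ValueError("cut must be 'V' or 'H'")


# Hypothetical modules and slicing tree: m1 next to m2, stacked with m3.
shapes = {"m1": (4, 6), "m2": (3, 5), "m3": (7, 2)}
tree = ("H", ("V", "m1", "m2"), "m3")
width, height = size(tree, shapes)
```

Each tree node is visited once, which matches the linear-time bottom-up process described above for the one-shape case.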
Slicing Floorplan Sizing
• Simple case: all modules are hard macros
– No rotation allowed, one shape only
• [Figure: slicing tree over modules 1 to 7; annotated sub-floorplan shapes include 8x16 and 17x16.]
• [Figure: table of candidate shapes when two modules m1 and m2 are combined. Shapes 2x7, 3x4, and 4x2 combined with shapes (a1, b1), (a2, b2), and (a3, b3) give 6x7, 7x7, 8x7; 7x6, 8x5, 9x4; and 8x6, 9x5, 10x4.]
Cyclic Constraints
• Cyclic constraints.
• (a) A nonslicing floorplan with a cyclic constraint that prevents channel routing.
• (b) In this case it is difficult to find a slicing floorplan without increasing the chip area.
• (c) This floorplan may be sliced (with initial cuts 1 or 2) and has no cyclic constraints, but it is inefficient in area use and will be very difficult to route.
Cyclic Constraints
• (a) We can eliminate the cyclic constraint by merging the blocks A and C.
• (b) A slicing structure.
I/O and Power Planning (contd.,)
• Every chip communicates with the outside world.
• FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of
pads determines the die size. (b) A core-limited die: The core logic determines the die
size. (c) Using both pad-limited pads and core-limited pads for a square die.
I/O and Power Planning (contd.,)
• Special power pads are used for:
1. the positive supply, or VDD, power buses (or power rails), and
2. the ground or negative supply, VSS or GND.
– one set of VDD/VSS pads supplies power to the I/O pads only.
– Another set of VDD/VSS pads connects to a second power ring that supplies the logic core.
• We can dedicate one (or more) chip pad(s) to down bond to the chip carrier.
• We can make a connection from a chip pad to the lead frame and down bond from the chip pad to the chip carrier.
• We can make a connection from a chip pad to the lead frame and down bond from the lead frame.
• We can down bond from the lead frame without using a chip pad.
• Depending on the package design, the type and positioning of down bonds may be fixed. This means we need to fix the position of the chip pad for down bonding using a pad seed.
I/O and Power Planning (contd.,)
• A double bond connects two pads to one chip-carrier finger and one
package pin. We can do this to save package pins or reduce the series
inductance of bond wires (typically a few nanohenries) by parallel connection
of the pads.
– The output pads can easily consume most of the power on a CMOS ASIC, because the load on
a pad (usually tens of picofarads) is much larger than typical on-chip capacitive loads.
• In single-supply chips we have one VDD net and one VSS net, both
global power nets . It is also possible to use mixed power supplies
(for example, 3.3 V and 5 V) or multiple power supplies ( digital VDD,
analog VDD).
I/O and Power Planning (contd.,)
• FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited pads. (b) A hybrid
corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump bonded chip (or flip-chip). The chip is
turned upside down and solder bumps connect the pads to the lead frame.
I/O and Power Planning (contd.,)
• Stagger-bond arrangement using two rows of I/O pads.
– In this case the design rules for bond wires (the spacing and the angle at which the bond wires leave the pads) become very important.
• Area-bump bonding (flip-chip).
– Even though the bonding pads are located in the center of the chip, the I/O circuits are still often located at the edges of the chip because of difficulties in power supply distribution and integrating I/O circuits together with logic in the center of the die.
• Some automatic routers may require that metal lines parallel to a channel
spine use a preferred layer (either m1, m2, or m3). Alternatively we say that
a particular metal layer runs in a preferred direction .
I/O and Power Planning (contd.,)
• FIGURE 16.15 Power distribution. (a) Power distributed using m1 for VSS and m2 for VDD. This helps
minimize the number of vias and layer crossings needed but causes problems in the routing channels.
(b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine. This can
make automatic routing easier but may increase the number of vias and layer crossings. (c) An
expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers
along the spine of a channel, this forces signals to change layers. (d) A closeup of VDD and VSS buses as
they cross. Changing layers requires a large number of via contacts to reduce resistance.
Power distribution.
• (a) Power distributed using m1 for VSS and m2 for VDD.
– This helps minimize the number of vias and layer crossings needed
– but causes problems in the routing channels.
• (d) A closeup of VDD and VSS buses as they cross. Changing layers
requires a large number of via contacts to reduce resistance.
Clock Planning
• A clock spine routing scheme drives all clock pins directly from the clock driver. MGAs and FPGAs often use this fish-bone type of clock distribution scheme.
• clock skew and clock latency
• FIGURE 16.16 Clock distribution.
• (a) A clock spine for a gate array.
• (b) A clock spine for a cell-based ASIC (typical chips have thousands of clock nets).
• (c) A clock spine is usually driven from one or more clock-driver cells. Delay in the driver cell is a function of the number of stages and the ratio of output to input capacitance for each stage (taper).
• (d) Clock latency and clock skew. We would like to minimize both latency and skew.
Clock Planning (cont.,)
• FIGURE 16.17 A clock tree. (a) Minimum delay is achieved when the
taper of successive stages is about 3. (b) Using a fanout of three at
successive nodes.
(c) A clock tree for the cell-based ASIC of Figure 16.16 b. We have to balance
the clock arrival times at all of the leaf nodes to minimize clock skew.
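The claim that a taper of about 3 minimizes driver delay can be checked with a very simple model. The sketch below assumes each stage's delay is proportional to its taper and ignores intrinsic stage delay; this model, the function names, and the load ratio are assumptions for illustration, not the model used to draw Figure 16.17.

```python
import math


def chain_delay(taper, load_ratio):
    """Relative delay of a tapered buffer chain.

    The chain must multiply capacitance by load_ratio in total, so it
    needs ln(load_ratio) / ln(taper) stages, each costing `taper`.
    """
    stages = math.log(load_ratio) / math.log(taper)
    return stages * taper


def best_taper(load_ratio, candidates):
    """Pick the candidate taper with the smallest chain delay."""
    return min(candidates, key=lambda t: chain_delay(t, load_ratio))


# Hypothetical load: output capacitance 1000 times the input capacitance.
candidates = [1.5 + 0.01 * i for i in range(351)]  # tapers from 1.5 to 5.0
optimum = best_taper(1000.0, candidates)
```

Under this model the optimum is e (about 2.7) regardless of the load ratio, which is consistent with a taper of about 3; the curve is shallow near the minimum, so a fanout of three costs very little extra delay.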
Contents
⚫ Placement Definitions
⚫ Placement Goals and Objectives
⚫ Measurement of Placement Goals and Objectives
⚫ Placement Algorithms
⚫ Simple Placement Example
⚫ Physical Design Flow
Placement
⚫ The process of arranging circuit components on a layout surface under certain constraints.
⚫ Inputs: set of fixed modules, netlist
⚫ Output: best position for each module based on various cost functions
⚫ Cost functions include wirelength, wire routability, hotspots, performance, and I/O pads.
⚫ Placement is much more suited to automation than floorplanning.
⚫ After we complete floorplanning and placement, we
can predict both intrablock and interblock capacitances
Good placement vs Bad placement*
• FIGURE 16.18 INTERCONNECT STRUCTURE. (a) The two-level metal CBIC floorplan shown in Figure 16.11 b. (b) A channel from the flexible block A. This channel has a channel height equal to the maximum channel density (there is room for seven interconnects to run horizontally in m1). (c) A channel that uses OTC (over-the-cell) routing in m2.
• FIGURE 16.19 GATE-ARRAY INTERCONNECT. (a) A small two-level metal gate array (about 4.6 k-gate).
(b) Routing in a block. (c) Channel routing showing channel density and channel capacity. The
channel height on a gate array may only be increased in increments of a row. If the interconnect does not use
up all of the channel, the rest of the space is wasted. The interconnect in the channel runs in m1 in the
horizontal direction with m2 in the vertical direction.
Vertical interconnect uses feedthroughs to cross the logic cells. Here are some
commonly used terms with explanations (there are no generally accepted
definitions):
⚫ An unused vertical track (or just track) in a logic cell is called an uncommitted feedthrough (also built-in feedthrough, implicit feedthrough, or jumper).
⚫ A vertical strip of metal that runs from the top to bottom of a cell (for double-entry cells), but has no connections inside the cell, is also called a feedthrough or jumper.
⚫ Two connectors for the same physical net are electrically equivalent connectors (or equipotential connectors). For double-entry cells these are usually at the top and bottom of the logic cell.
⚫ A dedicated feedthrough cell (or crosser cell) is an empty cell (with no logic) that can hold one or more vertical interconnects. These are used if there are no other feedthroughs available.
⚫ Logically equivalent connectors, such as the two inputs of a two-input NAND gate: the placement tool can swap these without altering the logic (but the two inputs may have different delay properties, so it is not always a good idea to swap them).
Interconnect Area for CBIC,MGA and FPGA
HORIZONTAL INTERCONNECT
⚫ In the case of channeled gate arrays and FPGAs, the horizontal interconnect
areas—the channels, usually on m1—have a fixed capacity.
VERTICAL INTERCONNECT
⚫ In the vertical interconnect direction, usually m2, FPGAs still have fixed
resources.
Placement Goals and Objectives
The goal of a placement tool is to arrange all the logic cells within the flexible
blocks on a chip.
Ideally, the objectives of the placement step are to
⚫ Guarantee the router can complete the routing step
⚫ Minimize all the critical net delays
⚫ Make the chip as dense as possible
Current placement tools use more specific and achievable criteria. The most
commonly used placement objectives are one or more of the following:
⚫ Minimize the total estimated interconnect length
⚫ Meet the timing requirements for critical nets
⚫ Minimize the interconnect congestion
Measurement of Placement Goals and Objectives
⚫ The graph structures that correspond to making all the connections for a net
are known as trees on graphs (or just trees ).
⚫ The Manhattan distance (or rectangular distance) between two points is the
distance we would have to walk in New York.
• FIGURE 16.20 Placement using trees on graphs. (a) The floorplan from Figure 16.11 b. (b) An expanded view of the flexible block A showing four rows of standard cells for placement (typical blocks may contain thousands or tens of thousands of logic cells). We want to find the length of the net shown with four terminals, W through Z, given the placement of four logic cells (labeled: A.211, A.19, A.43, A.25). (c) The problem for net (W, X, Y, Z) drawn as a graph. The shortest connection is the minimum Steiner tree. (d) The minimum
Measurement of Placement (contd.,)
⚫ The minimum rectilinear Steiner tree (MRST) is the shortest interconnect using a rectangular grid. The determination of the MRST is in general an NP-complete problem, which means it is hard to solve.
⚫ The complete graph has connections from each terminal to every other terminal, giving n(n - 1)/2 connections for n terminals.
⚫ The complete-graph measure adds all the interconnect lengths of the complete-graph connection together and then divides by n/2, where n is the number of terminals.
⚫ The bounding box is the smallest rectangle that encloses all the terminals. The half-perimeter measure is one-half the perimeter of the bounding box.
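The two cheap interconnect-length measures for a net, the half-perimeter of the bounding box and the complete-graph measure, can be sketched as follows. Terminal coordinates are hypothetical, and distances are assumed to be Manhattan distances.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan (rectangular) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def half_perimeter(terminals):
    """Half the perimeter of the bounding box of the terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def complete_graph_measure(terminals):
    """Sum of all n(n-1)/2 pairwise distances, divided by n/2."""
    n = len(terminals)
    total = sum(manhattan(p, q) for p, q in combinations(terminals, 2))
    return total / (n / 2.0)


# Hypothetical four-terminal net at the corners of a 4 x 3 rectangle.
net = [(0, 0), (4, 0), (0, 3), (4, 3)]
hp = half_perimeter(net)
cg = complete_graph_measure(net)
```

For a two-terminal net both measures reduce to the Manhattan distance between the terminals. Neither requires solving the MRST problem, which is why placement tools use them as estimates.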
Correlation between total length of chip interconnect and the half-
perimeter and complete-graph measures.
⚫ Maximum cut line: Imagine a horizontal or vertical line drawn anywhere across a chip or block.
⚫ The number of interconnects that must cross this line is the cut size (the number of interconnects we cut). The maximum cut line has the highest cut size.
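A minimal sketch of measuring congestion with cut lines: it counts the nets that have terminals on both sides of a vertical line and picks the candidate line with the highest cut size. The net coordinates and the candidate line positions are hypothetical.

```python
def cut_size(nets, x_cut):
    """Number of nets with terminals on both sides of the line x = x_cut."""
    count = 0
    for terminals in nets:
        xs = [x for x, _ in terminals]
        if min(xs) < x_cut < max(xs):
            count += 1
    return count


def maximum_cut_line(nets, candidates):
    """Candidate vertical line with the highest cut size."""
    return max(candidates, key=lambda x: cut_size(nets, x))


# Hypothetical nets, each a list of (x, y) terminal positions.
nets = [
    [(0, 0), (3, 1)],
    [(1, 2), (2, 2)],
    [(2.5, 0), (4, 3)],
]
worst = maximum_cut_line(nets, [0.5, 1.5, 3.5])
```

The same idea applies to horizontal lines by comparing y-coordinates instead.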
• FIGURE 16.23 Interconnect congestion for the cell-
based ASIC from Figure 16.11 (b). (a) Measurement of
congestion. (b) An expanded view of flexible block A
shows a maximum cut line.
•
Interconnect Delay
⚫ Placement tools often use total interconnect length and congestion as objectives.
⚫ The problem with this approach is that a logic cell may be placed a long way from
another logic cell to which it has just one connection. This logic cell with one connection is
less important as far as the total wire length is concerned than other logic cells, to which
there are many connections. However, the one long connection may be critical as far as
timing delay is concerned.
⚫ As technology is scaled, interconnection delays become larger relative to circuit delays.
Interconnect Delay
⚫ In timing-driven placement we must estimate delay for every net for every trial placement, possibly for hundreds of thousands of gates.
⚫ Even when we can estimate the length of the interconnect, we do not yet have
information on which layers and how many vias the interconnect will use or how wide it
will be. Some tools allow us to include estimates for these parameters.
⚫ Often we can specify metal usage , the percentage of routing on the different layers to
expect from the router. This allows the placement tool to estimate RC values and delays—
and thus minimize delay.
Placement Algorithms
There are two classes of placement algorithms commonly used in commercial CAD tools:
⚫ Constructive placement: uses a set of rules to arrive at a constructed placement. Examples: the min-cut algorithm and the eigenvalue method.
⚫ Iterative placement improvement.
Eigenvalue Placement Algorithm
The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix (eigenvalue methods are also known as spectral methods) [Hall, 1970]. The measure we use is a cost function f that we shall minimize, given by
$$ f = \frac{1}{2} \sum_{i,j=1}^{n} c_{ij}\, d_{ij}^{2} \qquad (1) $$
where C = [c_ij] is the (possibly weighted) connectivity matrix, and d_ij is the Euclidean distance between the centers of logic cell i and logic cell j. Since we are going to minimize a cost function that is the square of the distance between logic cells, these methods are also known as quadratic placement methods. This type of cost function leads to a simple mathematical solution. We can rewrite the cost function f in matrix form:
$$ f = \frac{1}{2} \sum_{i,j=1}^{n} c_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right] = x^T B x + y^T B y $$
where B = D - C is the disconnection matrix and D is a diagonal matrix with
$$ d_{ii} = \sum_{j=1}^{n} c_{ij}, \qquad d_{ij} = 0,\ i \neq j $$
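The matrix form of the cost can be checked numerically in one dimension. The sketch below builds B = D - C from a small connectivity matrix and confirms that x^T B x equals the pairwise sum of c_ij (x_i - x_j)^2 / 2; the connectivity matrix and coordinates are hypothetical, and plain Python lists are used so that no linear-algebra library is needed.

```python
def disconnection_matrix(c):
    """B = D - C, where D is diagonal with d_ii = sum_j c_ij."""
    n = len(c)
    return [
        [(sum(c[i]) if i == j else 0) - c[i][j] for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(b, x):
    """Evaluate x^T B x."""
    n = len(x)
    return sum(x[i] * b[i][j] * x[j] for i in range(n) for j in range(n))


def pairwise_cost(c, x):
    """Evaluate (1/2) * sum over i, j of c_ij * (x_i - x_j)^2."""
    n = len(x)
    return 0.5 * sum(
        c[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n)
    )


# Hypothetical three-cell example: one net between cells 0 and 1,
# two nets between cells 1 and 2.
C = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]
x = [0, 1, 3]
B = disconnection_matrix(C)
```

Both expressions give the same value for this example, which is the identity used to move from the sum to the matrix form.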
We can simplify the problem by noticing that it is symmetric in the x- and y-coordinates. Let us solve the simpler problem of minimizing the cost function for the placement of logic cells along just the x-axis first. We can then apply this solution to the more general two-dimensional placement problem.
Before we solve this simpler problem, we introduce a constraint that the coordinates of the logic cells must correspond to valid positions (the cells do not overlap and they are placed on-grid). We make another simplifying assumption that all logic cells are the same size and we must place them in fixed positions. We can define a vector p consisting of the valid positions:
$$ p = [p_1, p_2, \ldots, p_n] \qquad (4) $$
For a valid placement the x-coordinates of the logic cells,
$$ x = [x_1, x_2, \ldots, x_n] \qquad (5) $$
must be a permutation of the fixed positions, p. We can show that requiring the logic cells to be in fixed positions in this way leads to a series of n equations restricting the values of the logic cell coordinates. If we impose all of these constraint equations the problem becomes very complex. Instead we choose just one of the equations:
$$ \sum_{i=1}^{n} x_i^{2} = \sum_{i=1}^{n} p_i^{2} \qquad (6) $$
Simplifying the problem in this way will lead to an approximate solution to the placement problem. We can write this single constraint on the x-coordinates in matrix form:
$$ x^T x = P, \qquad P = \sum_{i=1}^{n} p_i^{2} $$
where P is a constant.
We can now summarize the formulation of the problem, with the simplifications that we have made, for a one-dimensional solution. We must minimize a cost function, g, where
$$ g = x^T B x \qquad (8) $$
subject to the constraint:
$$ x^T x = P \qquad (9) $$
This is a standard problem that we can solve using a Lagrangian multiplier:
$$ L = x^T B x - \lambda \left( x^T x - P \right) \qquad (10) $$
To find the value of x that minimizes g we differentiate L partially with respect to x and set the result equal to zero. We get the following equation:
$$ (B - \lambda I)\, x = 0 \qquad (11) $$
This last equation is called the characteristic equation for the disconnection matrix B and occurs frequently in matrix algebra (this λ has nothing to do with scaling). The solutions to this equation are the eigenvectors and eigenvalues of B. Multiplying Eq. (11) by x^T we get:
$$ \lambda\, x^T x = x^T B x, \qquad \text{so} \qquad \lambda = \frac{g}{P} $$
The eigenvectors of the disconnection matrix B are the solutions to our placement problem.
Iterative Placement Improvement
Imagine identical springs connecting all the logic cells we wish to place.
The number of springs is equal to the number of connections between logic
cells. The effect of the springs is to pull connected logic cells together. The more
highly connected the logic cells, the stronger the pull of the springs. The force on
a logic cell i due to logic cell j is given by Hooke’s law , which says the force
of a spring is proportional to its extension:
F ij = – c ij x ij .
⚫ The vector component x ij is directed from the center of logic cell i to the center of logic cell j.
⚫ The vector magnitude is calculated as either the Euclidean or Manhattan distance between the logic cell centers.
⚫ The c ij form the connectivity or cost matrix (the matrix element c ij is the number of connections between logic cell i and logic cell j).
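The spring model can be sketched by summing the pull on one logic cell from every cell it connects to, and by finding the position where that net force is zero. The sketch takes the force on a cell as a pull toward each connected cell with magnitude c_ij times the separation; the cell names, positions, and connection counts are hypothetical.

```python
def net_force(cell, positions, c):
    """Total spring force (fx, fy) pulling `cell` toward its neighbors."""
    xi, yi = positions[cell]
    fx = fy = 0.0
    for other, weight in c[cell].items():
        xj, yj = positions[other]
        fx += weight * (xj - xi)
        fy += weight * (yj - yi)
    return (fx, fy)


def zero_force_target(cell, positions, c):
    """Position where the net force on `cell` vanishes: the
    connection-weighted average of its neighbors' positions."""
    total = sum(c[cell].values())
    x = sum(w * positions[o][0] for o, w in c[cell].items()) / total
    y = sum(w * positions[o][1] for o, w in c[cell].items()) / total
    return (x, y)


# Hypothetical cells: A has three connections to B and one to C.
positions = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 2.0)}
c = {"A": {"B": 3, "C": 1}}
force = net_force("A", positions, c)
target = zero_force_target("A", positions, c)
```

Force-directed improvement methods move or swap cells toward such zero-force positions, subject to the grid of legal locations.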
• FIGURE 16.27 Force-directed placement.
• (a) A network with nine logic cells.
• (b) We make a grid (one logic cell per bin).
• (c) Forces are calculated as if springs were attached to the centers of each logic cell for each connection. The two nets connecting logic cells A and I correspond to two springs.
• (d) The forces are proportional to the spring extensions.
Iterative Placement Improvement (contd.)
Force-directed placement algorithms:
• FIGURE 16.28 Force-directed iterative placement improvement.
• (a) Force-directed interchange.
• (b) Force-directed relaxation.
• (c) Force-directed pairwise relaxation.
Placement Using Simulated Annealing
Applying simulated annealing to placement, the algorithm is as follows:
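A minimal sketch of simulated annealing applied to placement is given below. It is one common formulation, not necessarily the exact steps on the original slide: cells are placed in a single row, the objective is total wire length over two-terminal nets, a trial move swaps two randomly chosen cells, and the cooling schedule multiplies the temperature by 0.9. All names, parameter values, and the example netlist are assumptions.

```python
import math
import random


def wire_length(order, nets):
    """Total distance between connected cells placed in a single row."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def anneal(order, nets, t_start=10.0, t_end=0.01, cooling=0.9,
           moves_per_t=50, seed=0):
    """Return (best_order, best_cost) found by simulated annealing."""
    rng = random.Random(seed)
    current = list(order)
    cost = wire_length(current, nets)
    best, best_cost = list(current), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            # Trial interchange of two randomly selected cells.
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            delta = wire_length(current, nets) - cost
            # Accept improvements always, and worse placements with
            # probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = list(current), cost
            else:
                current[i], current[j] = current[j], current[i]  # undo
        t *= cooling  # lower the temperature
    return best, best_cost


# Hypothetical netlist whose cells start in a poor order.
cells = ["A", "B", "C", "D", "E", "F"]
nets = [("A", "F"), ("F", "B"), ("B", "E"), ("E", "C"), ("C", "D")]
best, best_cost = anneal(cells, nets)
```

Accepting some uphill moves at high temperature lets the placement escape local minima; as the temperature falls the search becomes increasingly greedy.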
Timing-Driven Placement Methods
⚫ Minimizing delay is becoming more and more important as a
placement objective.
⚫ There are two main approaches:
– net based
– path based
⚫ Figure 16.29 (a) shows a circuit with primary inputs at which we know the
arrival times (actual times) of each signal.
⚫ We also know the required times for the primary outputs: the points in
time at which we want the signals to be valid.
⚫ We can work forward from the primary inputs and backward from the
primary outputs to determine arrival and required times at each input pin
for each net.
⚫ The difference between the required and arrival times at each input pin is
the slack time (the time we have to spare).
⚫ The zero-slack algorithm adds delay to each net until the slacks are zero, as shown in Figure 16.29 (b).
⚫ The net delays can then be converted to weights or constraints in the
placement.
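The forward and backward passes that produce slack can be sketched on a small circuit. This is a simplified illustration, not the zero-slack algorithm itself: net delays are taken as zero (as in the starting circuit of the zero-slack algorithm), times are recorded at gate outputs rather than at each input pin, and the gate names, delays, and required time are hypothetical.

```python
def slacks(order, fanout, delay, arrival_in, required_out):
    """Return slack = required time - arrival time for each gate.

    order        gates in topological order (inputs first)
    fanout       gate -> list of gates it drives
    delay        gate -> gate delay
    arrival_in   gate -> arrival time at a gate fed by primary inputs
    required_out gate -> required time at a gate driving a primary output
    """
    fanin = {g: [] for g in order}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin[dst].append(src)

    # Forward pass: latest arrival at each gate output.
    arrival = {}
    for g in order:
        start = max((arrival[p] for p in fanin[g]),
                    default=arrival_in.get(g, 0.0))
        arrival[g] = start + delay[g]

    # Backward pass: earliest required time at each gate output.
    required = {}
    for g in reversed(order):
        succs = fanout.get(g, [])
        if succs:
            required[g] = min(required[s] - delay[s] for s in succs)
        else:
            required[g] = required_out[g]

    return {g: required[g] - arrival[g] for g in order}


# Hypothetical circuit: g1 and g2 both drive g3, which drives an output.
order = ["g1", "g2", "g3"]
fanout = {"g1": ["g3"], "g2": ["g3"]}
delay = {"g1": 2.0, "g2": 1.0, "g3": 3.0}
result = slacks(order, fanout, delay,
                arrival_in={"g1": 0.0, "g2": 0.0},
                required_out={"g3": 7.0})
```

The positive slacks are the time available to distribute as net delay; the zero-slack algorithm assigns that budget to nets until every slack reaches zero.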
• FIGURE 16.29 The zero-slack algorithm. (a) The circuit with no net delays.
Physical design flow
Thank you