
Module 2

FLOOR PLANNING,
PLACEMENT & ROUTING:
Introduction:
• The input to the floorplanning step is the output of system partitioning and design entry: a netlist.
• A netlist is a hierarchical description of circuit blocks, the logic cells within the blocks, and their connections.
• Interconnect and gate delays decrease as feature sizes decrease.
• Both average interconnect delay and average gate delay decrease, but at different rates, because interconnect capacitance tends to a limit that is independent of scaling.
• Interconnect delay now dominates gate delay.
• Floorplanning is used to predict interconnect delay by estimating interconnect length.
The starting point of the floorplanning and placement steps for the Viterbi decoder:
• The small boxes that look like bricks are the outlines of the standard cells.
• The largest standard cells, at the bottom of the display (label ddfctnb), are the 188 D flip-flops.
• The '+' symbols are the drawing origins of the standard cells; for the D flip-flops they are shifted to the left of and below the logic cell's bottom left-hand corner.
• The large box surrounding all the logic cells is the estimated chip size. (This is a screenshot from Cadence Cell Ensemble.)
The Viterbi decoder after floorplanning and placement:
• 8 rows of standard cells separated by 17 horizontal channels (labeled 2–18).
• The channels are routed as numbered.
• In this example, the I/O pads are omitted to show the cell placement more clearly.
Floorplanning:
• The input to a floorplanning tool is a hierarchical netlist that describes:
 – the interconnection of the blocks (RAM, ROM, ALU, cache controller, and so on);
 – the logic cells (NAND, NOR, D flip-flop, and so on) within the blocks;
 – the logic cell connectors (terminals, pins, or ports).
• The netlist is a logical description of the ASIC; the floorplan is a physical description of the ASIC.
• Floorplanning is therefore a mapping between the logical description and the physical description.
Floorplanning Goals and Objectives:
• The goals of floorplanning are to:
 – arrange the blocks on a chip,
 – decide the location of the I/O pads,
 – decide the location and number of the power pads,
 – decide the type of power distribution, and
 – decide the location and type of clock distribution.
• The objectives of floorplanning are:
 – to minimize the chip area, and
 – to minimize delay.
• Measuring area is straightforward, but measuring delay is more difficult.
Measurement of Delay in Floorplanning:
A floorplanning tool can use predicted-capacitance tables (also known as interconnect-load tables or wire-load tables).
• Typically between 60 and 70 percent of nets have a fanout (FO) of 1.
• The distribution for FO = 1 has a very long tail, stretching to interconnects that run from corner to corner of the chip.
• The distribution for FO = 1 has two peaks, corresponding to a distribution for close neighbors in subgroups within a block, superimposed on a distribution corresponding to routing between subgroups. We often see a twin-peaked distribution at the chip level also, corresponding to separate distributions for intrablock routing (inside blocks) and interblock routing (between blocks).
• The distributions for FO > 1 are more symmetrical and flatter than for FO = 1.

FIGURE: Predicted capacitance. (a) Interconnect lengths as a function of fanout (FO) and circuit-block size. (b) Wire-load table. There is only one capacitance value for each fanout (typically the average value). (c) The wire-load table predicts the capacitance and delay of a net (with considerable error). Net A and net B both have a fanout of 1 and the same predicted net delay, but net B in fact has a much greater delay than net A in the actual layout (of course, we shall not know what the actual layout is until much later in the design process).
• The wire-load tables can only contain one number, for example the average net capacitance, for any one distribution.
• Many tools take a worst-case approach and use the 80- or 90-percentile point instead of the average. (Thus a tool may use a predicted capacitance for which we know 90 percent of the nets will have less than the estimated capacitance.)
• The statistical analysis is repeated for blocks of different sizes.
• For example, a net with FO = 1 in a 25k-gate block will have a different (larger) average length than if the net were in a 5k-gate block.
• The statistics depend on the shape (aspect ratio) of the block (usually the statistics are calculated only for square blocks).
• The statistics will also depend on the type of netlist.
• For example, the distributions will be different for a netlist generated by setting a constraint for minimum logic delay during synthesis (which tends to generate large numbers of two-input NAND gates) than for netlists generated using minimum-area constraints.
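The wire-load table lookup described above can be sketched in a few lines. This is a minimal illustration: the block sizes and capacitance values below are invented for the example, not taken from any real library.

```python
# Hypothetical wire-load table: predicted net capacitance (pF) indexed by
# block size (gates) and fanout. All numbers are illustrative only.
WIRE_LOAD_TABLE = {
    5_000:  {1: 0.02, 2: 0.04, 3: 0.06, 4: 0.08},
    25_000: {1: 0.05, 2: 0.09, 3: 0.13, 4: 0.17},
}

def predicted_capacitance(block_gates: int, fanout: int) -> float:
    """Look up the predicted capacitance (pF) for a net.

    Uses the smallest tabulated block size that is >= block_gates
    (falling back to the largest), and clamps the fanout to the
    largest tabulated value.
    """
    sizes = sorted(WIRE_LOAD_TABLE)
    block = next((s for s in sizes if block_gates <= s), sizes[-1])
    table = WIRE_LOAD_TABLE[block]
    fo = min(fanout, max(table))
    return table[fo]

# As the text notes, a FO = 1 net in a 25k-gate block is predicted to be
# longer (larger C) than the same net in a 5k-gate block:
assert predicted_capacitance(25_000, 1) > predicted_capacitance(5_000, 1)
```

A real tool would hold one such table per block size and aspect ratio, with the values typically set at the average or a worst-case percentile of the measured distribution.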
Floorplanning Tools:

• Floorplanning in a CBIC:
• Flexible blocks (or variable blocks):
 – Their total area is fixed;
 – their shape (aspect ratio) and connector locations may be adjusted during the placement step.
• Fixed blocks:
 – The dimensions and connector locations of fixed blocks (perhaps RAM, ROM, compiled cells, or megacells) can only be modified when they are created.
• Seeding:
 – We force logic cells into selected flexible blocks by seeding.
 – We choose seed cells by name.
 – Seeding may be hard or soft.
• A hard seed is fixed and not allowed to move during the remaining floorplanning and placement steps.
• A soft seed is an initial suggestion only and can be altered if necessary by the floorplanner.
• Seed connectors work within flexible blocks, forcing certain nets to appear in a specified order or location at the boundary of a flexible block.
• A rat's nest display shows the connections between the blocks. Connections are shown as bundles between the centers of blocks or as flight lines between connectors.
Floorplanning a cell-based ASIC:
(a) Initial floorplan generated by the floorplanning tool. Two of the blocks are flexible (A and C) and contain rows of standard cells (unplaced). A pop-up window shows the status of block A.
(b) An estimated placement for flexible blocks A and C. The connector positions are known, and a rat's nest display shows the heavy congestion below block B.
(c) Moving blocks to improve the floorplan.
(d) The updated display shows the reduced congestion after the changes.
(a) The initial floorplan with a 2:1.5 die aspect ratio.
(b) Altering the floorplan to give a 1:1 chip aspect ratio.
(c) A trial floorplan with a congestion map. Blocks A and C have been placed so that we know the terminal positions in the channels. Shading indicates the ratio of channel density to channel capacity. Dark areas show regions that cannot be routed because the channel congestion exceeds the estimated capacity.
(d) Resizing flexible blocks A and C alleviates congestion.
Congestion analysis: one measure of congestion is the difference between the number of interconnects that we actually need, called the channel density, and the channel capacity.
During the floorplanning step, we assign the areas between blocks that are to be used for interconnect.

• The stem of a T-junction channel must be routed before its top.
FIGURE: Pad-limited and core-limited die.
(a) A pad-limited die uses tall, thin pad-limited pads, which maximize the number of pads we can fit around the outside of the chip. The number of pads determines the die size.
(b) A core-limited die uses short, wide core-limited pads. The core logic determines the die size.
(c) Using both pad-limited pads and core-limited pads for a square die.
Power Pads and I/O Pads:
• Special power pads are used for:
 1. the positive supply, or VDD, power buses (or power rails), and
 2. the ground or negative supply, VSS or GND.
• One set of VDD/VSS pads supplies power to the I/O pads only.
• Another set of VDD/VSS pads connects to a second power ring that supplies the logic core.
• The I/O power is dirty power, since it has to supply large transient currents to the output transistors. We keep dirty power separate to avoid injecting noise into the internal-logic power (the clean power).
• I/O pads also contain special circuits to protect against electrostatic discharge (ESD). These circuits can withstand very short high-voltage (several kilovolt) pulses that can be generated during human or machine handling.
• A double bond connects two pads to one chip-carrier finger and one package pin. We do this to save package pins or to reduce the series inductance of bond wires (typically a few nanohenries) by parallel connection of the pads.
• Multiple VDD and VSS pads:
 – To reduce the series resistive and inductive impedance of power supply networks, it is normal to use multiple VDD and VSS pads.
 – This is particularly important with the simultaneously switching outputs (SSOs) that occur when driving buses.
 – The output pads can easily consume most of the power on a CMOS ASIC, because the load on a pad (usually tens of picofarads) is much larger than typical on-chip capacitive loads.
 – Depending on the technology, it may be necessary to provide dedicated VDD and VSS pads for every few SSOs. Design rules set how many SSOs can be used per VDD/VSS pad pair. These dedicated VDD/VSS pads must "follow" groups of output pads as they are seeded or planned on the floorplan.
• Using a pad mapping, we translate the logical pads in a netlist to physical pads from a pad library. We might control pad seeding and mapping in the floorplanner.
• Handling of I/O pads can become quite complex; there are several nonobvious factors that must be considered when generating a pad ring:
 – Design library pad cells for one orientation: for example, an edge pad for the south side of the chip, and a corner pad for the southeast corner.
 – We could then generate other orientations by rotation and flipping (mirroring). However, some ASIC vendors will not allow rotation or mirroring of logic cells in the mask file.
 – To avoid these problems, we may need separate horizontal, vertical, left-handed, and right-handed pad cells in the library, with the appropriate logical-to-physical pad mappings.
 – If we mix pad-limited and core-limited edge pads in the same pad ring, this complicates the design of the corner pads. Usually the two types of edge pad cannot abut; in this case a corner pad also becomes a pad-format changer, or hybrid corner pad.
 – In single-supply chips we have one VDD net and one VSS net, both global power nets. It is also possible to use mixed power supplies (for example, 3.3 V and 5 V) or multiple power supplies (digital VDD, analog VDD).
Other I/O Bonding Arrangements:
• A stagger-bond arrangement uses two rows of I/O pads. In this case the design rules for bond wires (the spacing and the angle at which the bond wires leave the pads) become very important.
• An area-bump bonding arrangement (also known as flip-chip or solder-bump) is used, for example, with ball-grid array (BGA) packages. Even though the bonding pads are located in the center of the chip, the I/O circuits are still often located at the edges of the chip because of difficulties in power supply distribution and in integrating I/O circuits together with logic in the center of the die.
• I/O pads in an MGA and a CBIC: In an MGA the pad spacing and I/O-cell spacing are fixed; each pad occupies a fixed pad slot (or pad site). This means that the properties of the pad I/O are also fixed but, if we need to, we can parallel adjacent output cells to increase the drive. To increase flexibility further, the I/O cells can use a separation, the I/O-cell pitch, that is smaller than the pad pitch.
FIGURE: Bonding pads. (a) This chip uses both pad-limited and core-limited pads. (b) A hybrid corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump bonded chip (or flip-chip). The chip is turned upside down and solder bumps connect the pads to the lead frame.
FIGURE: Gate-array I/O pads. (a) Cell-based ASICs may contain pad cells of different sizes and widths. (b) A corner of a gate-array base. (c) A gate-array base with different I/O-cell and pad pitches.
FIGURE: Power distribution. (a) Power distributed using m1 for VSS and m2 for VDD. This helps minimize the number of vias and layer crossings needed but causes problems in the routing channels. (b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine. This can make automatic routing easier but may increase the number of vias and layer crossings. (c) An expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers along the spine of a channel, this forces signals to change layers. (d) A close-up of VDD and VSS buses as they cross. Changing layers requires a large number of via contacts to reduce resistance.
The long direction of a rectangular channel is the channel spine. Some automatic routers may require that metal lines parallel to a channel spine use a preferred layer (either m1, m2, or m3). Alternatively, we say that a particular metal layer runs in a preferred direction.
Clock Planning:
• Clock spine: a routing scheme in which all clock pins are driven directly from the clock driver. MGAs and FPGAs often use this fishbone type of clock distribution scheme.

Suppose we have an ASIC with the following specifications:
• 40,000 flip-flops.
• The input capacitance of the clock input to each flip-flop is 0.025 pF.
• The clock frequency is 200 MHz.
• VDD = 3.3 V.
• The chip size is 20 mm on a side.
• The clock spine consists of 200 lines across the chip.
• The interconnect capacitance is 2 pF/cm.

FIGURE: Clock distribution. (a) A clock spine for a gate array. (b) A clock spine for a cell-based ASIC (typical chips have thousands of clock nets). (c) A clock spine is usually driven from one or more clock-driver cells. Delay in the driver cell is a function of the number of stages and the ratio of output to input capacitance for each stage (the taper). (d) Clock latency and clock skew. We would like to minimize both latency and skew.
• The clock-spine capacitance is CL = 200 × 2 cm × 2 pF/cm = 800 pF.
• If we drive the clock spine with a chain of buffers with taper equal to e ≈ 2.7 and with a first-stage input capacitance of 0.025 pF, we will need ln(800 pF / 0.025 pF) = ln(32,000) ≈ 10.4, or 11 stages.
• The power dissipated charging the input capacitance of the flip-flop clocks is P1 = fCV² = (4 × 10⁴)(200 MHz)(0.025 pF)(3.3 V)² = 2.178 W ≈ 2 W.
• This is only slightly larger than the power dissipated driving the 800 pF clock-spine interconnect, which we can calculate as follows:
• P2 = (200 × 200 MHz × 20 mm × 2 pF/cm)(3.3 V)² = 1.7424 W.
• All of this power is dissipated in the clock-driver cell.
• The worst problem, however, is the dissipation of the enormous peak current through the final-stage inverter.
• If we assume that the required rise time is 0.1 ns, the peak current must reach
• I = (800 pF × 3.3 V) / 0.1 ns ≈ 26.4 A, which is far too large.
• So either an expensive design has to be in place to handle such a large current, or we have to find a way to spread the power dissipation more evenly across the chip.
• We can design a tree of clock buffers so that the taper of each stage is e ≈ 2.7 by using a fanout of three at each node, as shown in the figure below.
FIGURE: Clock tree. (a) Minimum delay is achieved when the taper of successive stages is about 3. (b) Using a fanout of three at successive nodes. (c) A clock tree for a cell-based ASIC. We have to balance the clock arrival times at all of the leaf nodes to minimize clock skew.
Designing a clock tree that balances the rise and fall times at the leaf nodes has the beneficial side effect of minimizing hot-electron wearout.
• This problem occurs when an electron gains enough energy to become "hot" and jump out of the channel into the gate oxide (the problem is worse for electrons in n-channel devices, because electrons are more mobile than holes).
• As these electrons accumulate in the gate oxide, they start influencing the threshold voltages of the transistors, which in turn alters the delay of the buffer.
• As the buffer delays change with time, this introduces unpredictable skew.
• The problem is worst when an n-channel device carries maximum current with a high voltage across the channel, that is, during rise- and fall-time transitions.
• Balancing the rise and fall times in each buffer means that they all wear out at the same rate, minimizing any additional skew.
Phase-Locked Loop:
• A PLL is an electronic flywheel that locks in frequency to an input clock signal.
• The input and output signals may, however, differ in phase.
• We can drive a clock network with a PLL in such a way that the output of the clock network is locked in phase to the incoming clock, thus eliminating the latency of the clock network.
• A PLL also reduces jitter, the unpredictable random variation of the input clock frequency.
• Actel was one of the first companies to introduce a PLL on an FPGA.
Placement:
The goal of placement is to arrange all the logic cells within the flexible blocks on a chip.
Ideally, the objectives of the placement step are to:
• guarantee the router can complete the routing step,
• minimize all the critical net delays, and
• make the chip as dense as possible.
Additional objectives:
• minimize power dissipation, and
• minimize crosstalk between signals.
The placement objectives most commonly used by current placement tools are:
• minimize the total estimated interconnect length,
• minimize the interconnect congestion, and
• meet the timing requirements for critical nets.
Placement Algorithms:
There are two classes of placement algorithms used in CAD tools:
• Constructive placement uses a set of rules to arrive at a constructed placement. Examples: the min-cut algorithm and the eigenvalue method.
• Iterative placement improvement: as in system partitioning, placement usually starts with a constructed solution and then improves it using an iterative algorithm.
• In most tools we can specify the location and relative placement of certain critical logic cells as seed placements.
The min-cut placement method uses successive application of partitioning. The steps are:
1. Cut the placement area into two pieces.
2. Swap the logic cells to minimize the cut cost.
3. Repeat the process from step 1, cutting smaller pieces, until all the logic cells are placed.
• Usually we divide the placement area into bins.
• The size of a bin can vary, from a bin size equal to the base cell (for a gate array) to a bin size that would hold several logic cells.
• We can start with a large bin size to get a rough placement, and then reduce the bin size to get a final placement.
Fig: (a) Divide the chip into bins using a grid. (b) Merge all connections to the center of each bin. (c) Make a cut and swap logic cells between bins to minimize the cost of the cut. (d) Take the cut pieces and throw out all the edges that are not inside the piece. (e) Repeat the process with a new cut and continue until we reach the individual bins.
• The pairwise-interchange algorithm:
 1. Select the source logic cell at random.
 2. Try all the other logic cells in turn as the destination logic cell.
 3. Use any of the measurement methods we have discussed to decide whether to accept the interchange.
 4. The process repeats from step 1, selecting each logic cell in turn as a source logic cell.
• The neighborhood-exchange algorithm is a modification of pairwise interchange that considers as destinations only logic cells in a neighborhood: cells within a certain distance, ε, of the source logic cell.
• Limiting the search area for the destination logic cell to the ε-neighborhood reduces the search time.
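The neighborhood-exchange idea can be sketched as follows. The grid positions, the two-point nets, and the wirelength cost are all illustrative assumptions; only swaps that improve the cost are accepted, and candidate destinations are restricted to the ε-neighborhood of the source cell:

```python
import random

def wirelength(pos, nets):
    """Estimated total wirelength over two-point nets (Manhattan distance)."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

def neighborhood_exchange(pos, nets, eps=1, passes=10, seed=0):
    """Repeatedly pick a random source cell and try swapping it with each
    cell within Manhattan distance eps, keeping swaps that reduce cost."""
    rng = random.Random(seed)
    cells = list(pos)
    for _ in range(passes):
        src = rng.choice(cells)                 # step 1: random source cell
        for dst in cells:                       # step 2: candidate destinations
            dx = abs(pos[src][0] - pos[dst][0])
            dy = abs(pos[src][1] - pos[dst][1])
            if dst == src or dx + dy > eps:     # restrict to eps-neighborhood
                continue
            before = wirelength(pos, nets)
            pos[src], pos[dst] = pos[dst], pos[src]
            if wirelength(pos, nets) >= before:
                pos[src], pos[dst] = pos[dst], pos[src]  # reject the swap
    return pos
```

Setting ε to the chip dimensions recovers plain pairwise interchange; a small ε trades solution quality for a much smaller search per move.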
FIGURE: Force-directed iterative placement improvement.
•(a)Force-directed interchange.
•(b)Force-directed relaxation.
•(c)Force-directed pairwise relaxation.
• Timing-driven placement methods: minimizing delay is becoming more and more important as a placement objective.
• There are two main approaches:
 – net based, and
 – path based.
• We know that we can use net weights in our algorithms; the net weights might, for example, be the number of times each net appears in a list of critical paths.
• The problem is how to calculate the weights.
• One method finds the n most critical paths.
• Another method finds the net weights using the zero-slack algorithm [Hauge et al., 1987]. The figure shows how this works (all times are in nanoseconds).
• We have primary inputs at which the arrival times (this is the original definition; some people use the term actual times) of each signal are known.
• We have required times for the primary outputs: the points in time at which the signals must be valid.
• We work forward from the primary inputs and backward from the primary outputs to determine the arrival and required times at each input pin for each net.
• The difference between the required and arrival times at each input pin is the slack time (the time we have to spare).
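The forward/backward slack computation just described can be sketched on a small timing graph. The graph, gate delays, and times below are invented for illustration; real tools operate on pin-level timing arcs:

```python
def topo_order(edges):
    """Topological order of a DAG given as {node: [(successor, delay), ...]}."""
    seen, order = set(), []
    def visit(u):
        if u in seen:
            return
        seen.add(u)
        for v, _ in edges.get(u, []):
            visit(v)
        order.append(u)
    for u in edges:
        visit(u)
    return list(reversed(order))

def slacks(edges, arrival, required):
    """Per-node slack = required time - arrival time.

    arrival: known times at primary inputs;
    required: times at which primary outputs must be valid."""
    order = topo_order(edges)
    # Forward pass: arrival = max over predecessors of (arrival + delay).
    arr = dict(arrival)
    for u in order:
        for v, d in edges.get(u, []):
            arr[v] = max(arr.get(v, float('-inf')), arr[u] + d)
    # Backward pass: required = min over successors of (required - delay).
    req = dict(required)
    for u in reversed(order):
        for v, d in edges.get(u, []):
            req[u] = min(req.get(u, float('inf')), req[v] - d)
    return {n: req[n] - arr[n] for n in arr}
```

On a chain `in -(2 ns)-> g1 -(3 ns)-> out` with arrival 0 at the input and a required time of 10 at the output, every node has 5 ns of slack; the zero-slack algorithm would then add net delays until these slacks reach zero.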
Timing-Driven Placement Methods:
• The zero-slack algorithm adds delay to each net until the slacks are zero, as shown in the figure.
• The net delays can then be converted to weights or constraints in the placement.
• We have assumed that all the gates on a net switch at the same time, so that the net delay can be placed at the output of the gate driving the net: a rather poor timing model, but the best we can use without any routing information.
Timing-Driven Placement Methods
• Advantage:
 – An important point to remember is that adjusting the net weight, even for every net on a chip, does not theoretically make the placement algorithms any more complex; we have to deal with the numbers anyway. It does not matter whether a net weight is 1 or 6.6, for example.
• Disadvantage:
 – The practical problem, however, is getting the weight information for each net (usually in the form of timing constraints) from a synthesis tool or timing verifier. These files can easily be hundreds of megabytes in size.
• With the zero-slack algorithm we simplify but overconstrain the problem.
Solution
• Deal with paths, such as the critical path shown in Figure (a), and not just nets.
• Disadvantages of path-based methods:
 – They are complex, and not all commercial tools have this capability.
 – Path delays between gates cannot be predicted with only placement information, because we use simple approximations to the total net length (such as the half-perimeter measure) and then use this to estimate a net delay (the same for each pin on a net).
 – It is not until the routing step that we can make accurate estimates of the actual interconnect delays.
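The half-perimeter measure mentioned above is simple to state in code: the net length is approximated by half the perimeter of the smallest rectangle enclosing all the net's pins. The pin coordinates below are illustrative.

```python
# Half-perimeter wirelength (HPWL): width + height of the bounding box
# of a net's pin coordinates.
def half_perimeter(pins):
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Three pins at the edges of a 3 x 2 bounding box -> estimate of 5 units.
assert half_perimeter([(0, 0), (3, 1), (1, 2)]) == 5
```

The estimate is exact for two-pin nets routed with Manhattan wiring, but it underestimates the length of high-fanout nets, which is one reason the same predicted delay gets applied to every pin on a net.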
Physical Design Flow:

Routing:
• FIGURE: The core of the Viterbi decoder chip after placement (a screenshot from Cadence Cell Ensemble). You can see the rows of standard cells; the widest cells are the D flip-flops.
• FIGURE: The core of the Viterbi decoder chip after the completion of global and detailed routing (a screenshot from Cadence Cell Ensemble). This chip uses two-level metal. Although you cannot see the difference, m1 runs in the horizontal direction and m2 in the vertical direction.
Global Routing:
• The details of global routing differ slightly between cell-based ASICs, gate arrays, and FPGAs, but the principles are the same.
• A global router does not make any connections; it just plans them.
• A global router routes the whole chip (or large pieces, if it is a large chip) before the whole chip (or the pieces) is detail routed.

Goals and Objectives:
• The input to routing is:
 1. a floorplan that includes the locations of all the fixed and flexible blocks;
 2. placement information for the flexible blocks; and
 3. the locations of all the logic cells.
• The goal of global routing is to provide complete instructions to the detailed router on where to route every net.
• The objectives of global routing are to:
 – minimize the total interconnect length,
 – maximize the probability that the detailed router can complete the routing, and
 – minimize the critical path delay.
Global Routing Methods:
• Many of the methods used in global routing are based on solutions to the tree-on-a-graph problem.
• Sequential routing: one approach to global routing takes each net in turn and calculates the shortest path using tree-on-graph algorithms, with the added restriction of using the available channels.
• Disadvantage:
 – As a sequential routing algorithm proceeds, some channels will become more congested, since they hold more interconnects than others.
 – In the case of FPGAs and channeled gate arrays, the channels have a fixed channel capacity and can only hold a certain number of interconnects.
• There are two types of areas to global route:
 – between blocks, and
 – inside the flexible blocks.
• There are two different ways that a global router normally handles the congestion problem:
 1. order-independent routing, and
 2. order-dependent routing.
• In order-independent routing, a global router proceeds by routing each net, ignoring how crowded the channels are. Whether a particular net is processed first or last does not matter; the channel assignment will be the same. After all the interconnects are assigned to channels, the global router returns to the channels that are the most crowded and reassigns some interconnects to other, less crowded, channels.
• In order-dependent routing, a global router considers the number of interconnects already placed in the various channels as it proceeds. In this case the global routing is order dependent: the routing is still sequential, but now the order of processing the nets affects the results.
• Iterative improvement or simulated annealing may be applied to the solutions found by both order-dependent and order-independent algorithms, improving the routing one net at a time.
• Hierarchical routing handles all nets at a particular level at once.
• Rather than handling all of the nets on the chip at the same time, the global-routing problem is made more tractable by dividing the chip area into levels of hierarchy.
• By considering only one level of hierarchy at a time, the size of the problem is reduced at each level.
• There are two ways to traverse the levels of hierarchy:
 – The top-down approach starts at the whole chip, or highest level, and proceeds down to the logic cells.
 – The bottom-up approach starts at the lowest level of hierarchy and globally routes the smallest areas first.
