
Module 2: Floor planning and placement:

Goals and objectives, Measurement of delay in Floor planning, Floor planning tools, Channel definition,
I/O and Power planning and Clock planning. Placement: Goals and Objectives, Min-cut Placement
algorithm, Iterative Placement Improvement, Time driven placement methods, Physical Design Flow.
Routing: Global Routing: Goals and objectives, Global Routing Methods, Global routing between blocks,
Back annotation.

Text Book 1

Floor planning

 In VLSI, floor planning is the first step in the physical design process. It involves determining the
size, shape, and location of modules on a chip to optimize a cost metric such as wire length, chip
area, or wire congestion. Floor planning allows us to predict interconnect delay by estimating
interconnect length.

 Floor planning is the physical description of an ASIC.

 It is the mapping between the logical and physical description.

 As feature sizes decrease, both average interconnect delay and average gate delay decrease, but
at different rates. This is because interconnect capacitance tends to a limit of about 2 pF/cm for
a minimum-width wire, while gate delay continues to decrease. Floor planning allows us to
predict this interconnect delay by estimating interconnect length.

Floor planning Goals and objectives

 The input to a floor-planning tool is a hierarchical netlist that describes the interconnection of
blocks such as ROM, RAM, ALU, cache, controller, and so on.

 The goals of the Floor planning are to

 Arrange the blocks on a chip

 Decide the location of the I/O pads

 Decide the location and number of the power pads

 Decide the type of power distribution

 Decide the location and type of clock distribution

 The objectives of Floor planning are to

 minimize the chip area

 minimize delay

Measurement of delay in floor-planning

 In floor-planning we need to predict the interconnect delay before we complete any routing.
 To predict delay we need to know the parasitics associated with interconnect: the interconnect
capacitance as well as the interconnect resistance.

 We cannot predict the resistance of the various pieces of the interconnect path, since we do not
yet know the shape of the interconnect for a net. However, we can estimate the total length of
the interconnect and thus estimate the total capacitance. We estimate interconnect length by
collecting statistics from previously routed chips and analyzing the results. From these statistics
we create wire-load tables that predict interconnect capacitance as a function of net fan-out and
block size, and a floor-planning tool can then use these predicted-capacitance tables to estimate
net delay (a small lookup sketch is given at the end of this section).

 Interconnect length is a function of fan-out (FO) and circuit-block size, as shown in figure (a). A
wire-load table can also be used to predict the net capacitance.

 From the figures we can observe the following facts:

 Typically, between 60 and 70 percent of nets have an FO = 1.

 Figure (c) shows that the wire-load table predicts the capacitance of a net with considerable
error. Nets A and B both have a fan-out of 1 and the same predicted net delay, but net B in fact
has a much greater delay than net A in the actual layout.

 The distributions for FO > 1 are more symmetrical and flatter than for FO = 1.

 We need to repeat the statistical analysis for blocks with different sizes. For example, a net with
an FO = 1 in a 25K-gate block will have a different average length than if the net were in a 5K-gate
block.

 The statistics depend on the shape of the block

 The wire-load table shows the estimated metal interconnect lengths for different fanouts and
different die sizes.

 Since we usually do not decrease chip size as we scale down feature size, the worst-case
interconnect delay increases. The worst-case delay of a 0.25 µm process may therefore be worse
than that of a 0.35 µm process.
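As a concrete illustration of how a floorplanner might use such a table, here is a minimal Python sketch of a predicted-capacitance lookup keyed by block size and net fan-out. The table values, block sizes, and function name are illustrative assumptions, not data from any real library.

```python
# Minimal sketch of a wire-load table lookup (values are illustrative, not from a real library).
# The table maps (block size in gates, net fan-out) to an estimated net capacitance in pF.

WIRE_LOAD_TABLE = {
    # block_size_gates: {fan_out: estimated_capacitance_pF}
    5_000:  {1: 0.02, 2: 0.04, 3: 0.06, 4: 0.08},
    25_000: {1: 0.05, 2: 0.09, 3: 0.13, 4: 0.17},
}

def estimate_net_capacitance(block_size: int, fan_out: int) -> float:
    """Return a predicted net capacitance (pF) for a net in a block of a given size."""
    # Pick the closest block size we have statistics for.
    sizes = sorted(WIRE_LOAD_TABLE)
    best_size = min(sizes, key=lambda s: abs(s - block_size))
    table = WIRE_LOAD_TABLE[best_size]
    # Clamp the fan-out to the largest entry in the table.
    fo = min(max(fan_out, 1), max(table))
    return table[fo]

if __name__ == "__main__":
    # A fan-out-1 net in a 25K-gate block is predicted to be more capacitive (longer)
    # than the same net in a 5K-gate block, as the notes point out.
    print(estimate_net_capacitance(25_000, 1))  # 0.05 pF
    print(estimate_net_capacitance(5_000, 1))   # 0.02 pF
```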
Floor planning Tool:

 Figure (a) shows the initial floorplan generated by the floor-planning tool. Blocks A and C are
standard-cell areas: flexible blocks that contain rows of standard cells. Although their total area
is fixed, their shape and connector locations may be adjusted during the placement step. In
contrast, the dimensions and connector locations of the fixed blocks can only be modified when
they are created. A pop-up window shows the status of block A.

 Seeding: Forcing logic cells to be placed in selected flexible blocks is called seeding. Seed cells
are selected by name; for example, ram_control* would select all logic cells whose names start
with ram_control to be placed in one flexible block.

 Hard seed: A hard seed is fixed and not allowed to move during the remaining floor-planning
and placement steps.

 Soft seed: A soft seed is an initial suggestion only and can be altered, if necessary, by the floorplanner.

 Seed connectors: These are used within flexible blocks, forcing certain nets to appear in a
specified order at the boundary of a flexible block.

 The floorplanner can complete an estimated placement to determine the positions of connectors
at the boundaries of the flexible blocks.

 Figure (b) illustrates a rat's-nest display of the connections between blocks. Connections are
shown as bundles between the centres of blocks.
 Figures (c) and (d) show how we can move the blocks in a floor-planning tool to minimize routing
congestion.

Congestion Analysis:

 To fit the IC into the die cavity inside the package, we need to control the aspect ratio of the
floorplan.

 Figure (a) to (c) show how we can rearrange the chip to achieve a square aspect ratio.
 Channel capacity: the number of interconnects that a channel can handle.

 Channel density: the number of interconnects that we actually need to route through a channel.

 A measure of congestion is the difference between the channel capacity and the channel density (a small sketch follows after this list).

 Figure (a) shows the initial floorplan with a 2:1.5 die aspect ratio. In figure (b) the floorplan is
altered to give a 1:1 chip aspect ratio. In figure (c), blocks A and C have been placed so that we
know the terminal positions in the channels. Shading indicates the ratio of channel density to
channel capacity; dark areas show regions that cannot be routed because the channel congestion
exceeds the estimated capacity.

 In figure (d), flexible blocks A and C are resized to alleviate congestion.
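The congestion measure described above (channel capacity minus channel density) can be sketched in a few lines of Python. The channel names and track counts below are made-up illustrative values, not data from any figure.

```python
# Minimal congestion check for a set of routing channels (illustrative numbers only).
# Congestion margin = channel capacity - channel density; a negative margin means the
# channel cannot be routed and blocks must be moved or resized.

channels = {
    # name: (capacity = interconnects the channel can handle, density = interconnects required)
    "ch_A_B": (20, 14),
    "ch_B_C": (16, 19),   # over-congested channel
    "ch_C_D": (24, 24),   # exactly full
}

for name, (capacity, density) in channels.items():
    margin = capacity - density
    status = "OK" if margin >= 0 else "OVERFLOW"
    print(f"{name}: capacity={capacity} density={density} margin={margin} {status}")
```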

Channel Definition
 During the Floor planning step we assign the areas between blocks that are to be used for
interconnect. This process is known as Channel definition or channel allocation.

Slicing Floorplan

 The figure below shows a floorplan of a chip containing several blocks. Suppose we cut along a
block boundary so that the chip is sliced into two pieces; cut line 1 divides the chip into two
pieces.

 Then suppose we can slice each of these pieces into two. If we can continue in this fashion until
all the blocks are separated, we have a slicing floorplan, as shown in figure (b): a sequence of
cuts 1, 2, 3, and 4 successively slices the chip until only circuit blocks are left. Figure (c) shows
how the sequence we use to slice the chip defines a hierarchy of the blocks. Reversing the slicing
order ensures that we route the stems of all the channel T-junctions first.

 Figure (c) shows that the slicing tree corresponding to the sequence of cuts gives the order in
which to route the channels: 4, 3, 2, and finally 1 (a small slicing-tree sketch follows after this list).

 Cyclic constraint: Some floorplans are not sliceable structures. In such a floorplan we cannot cut
the chip all the way across with a knife without chopping a circuit block in two. This means we
cannot route any of the channels without routing all of the other channels first; we then say
there is a cyclic constraint in the floorplan.
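As referenced above, here is a minimal sketch of the slicing-tree idea: the tree records the order in which the cuts were made, and the channels are routed in the reverse of that order so that the stem of every channel T-junction is routed before its crossbar. The nested-tuple encoding and the block/cut labels are illustrative assumptions, not taken from the figure.

```python
# Minimal sketch of a slicing tree and the channel routing order it implies.
# Each internal node is (cut_number, left_subtree, right_subtree); leaves are block names.

slicing_tree = (1,
                (2, "A", (4, "B", "C")),
                (3, "D", "E"))

def channel_routing_order(tree):
    """Collect the cut numbers, then route in the reverse of the slicing order."""
    cuts = []
    def visit(node):
        if isinstance(node, tuple):           # internal node: a cut with two subtrees
            cut, left, right = node
            cuts.append(cut)
            visit(left)
            visit(right)
    visit(tree)
    return sorted(cuts, reverse=True)         # route the channel of the last cut first

print(channel_routing_order(slicing_tree))    # [4, 3, 2, 1]
```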

Solutions for Cyclic Constraints:

1. Move the blocks until we obtain a slicing floorplan.

2. Allow the use of L-shaped, rather than rectangular channels. We need an area-based
router rather than a channel router to route L-shaped regions or switch boxes.

Channel definition and ordering

 Moving the blocks is not a good solution to the cyclic-constraint problem because it increases
the chip size. Figure (a) shows an example ASIC floorplan. An alternative solution, merging the
flexible standard-cell areas A and C, is shown in figure (b). We can do this by selective flattening
of the netlist.

I/O and Power Planning


 Every chip communicates with the outside world. Signals flow onto and off the chip, and we need
to supply power. We need to consider the I/O and power constraints early in the floor-planning
process.
 A silicon chip is mounted on a chip carrier inside the chip package. Connections are made by
bonding the chip pads to lead frame fingers. These metal lead frame fingers are connected to
the package pins.
 A die consists of a logic core inside a pad ring
 Types of die:
1. Pad-limited die (figure (a)):
 Tall, thin pad-limited pads are used, maximizing the number of pads that can fit
around the outside of the chip.
2. Core-limited die (figure (b)):
 Short, wide core-limited pads are used.
3. A die in which both pad-limited and core-limited pads are used (figure (c)).

 Special power pads are used for VDD, GND, and the power buses. One set of power pads supplies
power to the I/O pads and another set supplies power to the logic core.
 I/O pads also contain special circuits to protect against ESD.
 Multiple signal pads can serve special purposes, such as connecting to an external crystal,
clocks, and so on.

Bonding Pad Details of the chip:

 In the figure below, figure (a) shows a chip that uses both pad-limited and core-limited pads.
Figure (b) shows the south-east corner of the chip; the core VDD/VSS and pad VDD/VSS rings are also shown.
I/O cells and pads in MGA

 In an MGA the pad spacing and I/O-cell spacing are fixed: each pad occupies a fixed pad slot,
which means that the properties of the I/O pads are also fixed. If required, we can parallel
adjacent output cells to increase the drive strength. For example, three 4 mA driver cells can
occupy two pad slots; we can then use two 4 mA output cells in parallel to drive the pad,
forming an 8 mA output pad as shown in the figure below.

Power Distribution schemes:

 Two power distribution schemes are shown in the figure below.


 In one scheme (figure (a)), power is distributed using m1 for VSS and m2 for VDD. This helps
minimize the number of vias and layer crossings needed, but causes problems in the
routing channels.
 In the second scheme (figure (b)), m1 is run parallel to the longest side of all channels; the
long direction of a rectangular channel is called the channel spine. This can make automatic
routing easier, but may increase the number of vias and layer crossings.
 Figure (c) shows an expanded view of part of a channel. Figure (d) shows a close-up of the
VDD and VSS buses as they cross.
Clock Planning:
Figure (a) shows a clock spine of an IC. A clock spine is a straight metal line that
distributes the clock signal to sinks in an integrated circuit. Both MGAs and FPGAs often use this
type of clock distribution scheme. Figure (b) shows a clock spine for a cell-based ASIC, figure (c)
shows the clock driver cell, and figure (d) shows clock skew and latency. Since all clocked
elements are driven from one net with a clock spine, skew is caused by differing interconnect
lengths and loads. Clock skew represents a fraction of the clock period that we cannot use for
computation; a clock skew of 500 ps with a 200 MHz clock means that we waste 500 ps of every
5 ns clock cycle.
Power dissipation and peak current in clock driver cell
Let us consider an ASIC with the following specification:
 40,000 flip-flops
 Input capacitance of the clock input to each flip-flop is 0.025 pF
 Clock frequency is 200 MHz
 VDD = 3.3 V
 Chip size is 20 mm on a side
 Clock spine consists of 200 lines across the chip
 Interconnect capacitance is 2 pF/cm
In this case CL = 200 × 2 cm × 2 pF/cm = 800 pF. If we drive the clock spine with a chain of
buffers with taper equal to e ≈ 2.7, and with a first-stage input capacitance of 0.025 pF, we
will need

ln(800 × 10⁻¹² / 0.025 × 10⁻¹²) ≈ 10.4, that is, 11 stages.

The power dissipated while charging the capacitance of the flip-flop clock inputs is f · C · V²:

P1 = (4 × 10⁴)(200 MHz)(0.025 pF)(3.3 V)² = 2.178 W

All of this power is dissipated in the clock driver cell, and it also produces an enormous peak
current in the final inverter stage. If we assume that the required rise time is 0.1 ns, the peak
current is

I = (800 pF)(3.3 V) / (0.1 ns) ≈ 26 A

Clearly such a current in the last-stage inverter is not desirable.
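The numbers above can be re-derived directly from the example specification. The short Python sketch below reproduces the spine capacitance, the number of buffer stages, the power dissipated charging the flip-flop clock inputs, and the peak current; all input values are taken from the specification listed above, and the 0.1 ns rise time is the stated assumption.

```python
# Re-deriving the clock-spine numbers from the example specification in the notes.
import math

n_flops        = 40_000        # flip-flops
c_in_ff        = 0.025e-12     # F, clock input capacitance per flip-flop
f_clk          = 200e6         # Hz, clock frequency
vdd            = 3.3           # V
n_spine_lines  = 200           # clock spine lines across the chip
line_length_cm = 2.0           # chip is 20 mm = 2 cm on a side
c_wire_per_cm  = 2e-12         # F/cm, interconnect capacitance
t_rise         = 0.1e-9        # s, assumed rise time

# Spine (interconnect) capacitance driven by the clock buffer chain.
c_spine = n_spine_lines * line_length_cm * c_wire_per_cm        # 800 pF

# Number of buffer stages with a taper of e and a 0.025 pF first-stage input capacitance.
n_stages = math.log(c_spine / 0.025e-12)                        # ~10.4 -> 11 stages

# Power dissipated charging the flip-flop clock inputs: f * C * V^2.
p_flops = f_clk * (n_flops * c_in_ff) * vdd**2                  # ~2.18 W

# Peak current in the final inverter stage for the assumed rise time.
i_peak = c_spine * vdd / t_rise                                 # ~26 A

print(f"C_spine = {c_spine * 1e12:.0f} pF")
print(f"stages  = {n_stages:.1f} (round up to {math.ceil(n_stages)})")
print(f"P_flops = {p_flops:.2f} W")
print(f"I_peak  = {i_peak:.1f} A")
```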
A clock tree design for minimum power dissipation and delay
Usually clock spines are used to drive loads of 100-200 pF, but, as is apparent from the previous
calculations, it would be better to spread the power dissipation more evenly across the chip and
so reduce the peak current. Minimum delay is achieved when the taper of successive stages is
about 3, so we can design a tree of clock buffers in which the taper of each stage is e ≈ 2.7 by
using a fan-out of three at each node, as shown in figures (a) and (b). The clock tree shown in
figure (c) uses the same number of stages as a clock spine, but with a lower peak current in the
inverter buffers. Here we need to balance the delay through the tree carefully to minimize clock
skew.
Placement
 Placement is the next step after floor planning.
 Placement is much more suited to automation than floor planning
 After placement we can predict both intra-block and inter-block capacitances. This
allows us to return to logic synthesis with more accurate estimates of the capacitive
loads that each logic cell must drive.
Goals and objectives:
 Goal
o Arrange all the logic cells within the flexible blocks on a chip
 Objectives
o Guarantee the router can complete the routing step
o Minimize all the critical net delays
o Make the chip as dense as possible
o Minimize the total estimated interconnect length
o Minimize the interconnect congestion
o Meet the timing requirements for critical nets
o Additional objectives are to minimize power dissipation and cross talk between
signals
Placement algorithm:
Min-cut algorithm
 Uses successive application of partitioning. The following steps are used:
o Cut the placement area into two pieces.
o Swap the logic cells between the pieces to minimize the cut cost.
o Repeat the process from step 1, cutting smaller pieces until all the logic cells are
placed.
 Usually we divide the placement area into bins. The size of a bin can vary, from a bin size
equal to the base cell to a bin size that would hold several logic cells. We can start with a
large bin size to get a rough placement, and then reduce the bin size to get a final
placement. In figure (a) the chip is divided into bins using a grid. In figure (b) all
connections are merged to the center of each bin. In figure (c) we make a cut and swap logic
cells between bins to minimize the cost of the cut. Next, we take the cut pieces and throw out
all the edges that are not inside the piece, as shown in figure (d). We then repeat the process
with a new cut and continue until we reach the individual bins, as shown in figure (e).
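A minimal Python sketch of min-cut (recursive bisection) placement follows. The netlist format, the single greedy swap pass used to reduce the cut cost, and the region coordinates are simplifications for illustration; a real tool would use a proper partitioning algorithm such as Kernighan-Lin or Fiduccia-Mattheyses at each level.

```python
# Minimal sketch of min-cut (recursive bisection) placement.

def cut_cost(nets, left, right):
    """Number of nets that have cells on both sides of the cut."""
    left, right = set(left), set(right)
    return sum(1 for net in nets if set(net) & left and set(net) & right)

def bisect(cells, nets):
    """Split the cells into two halves, then greedily swap pairs while the cut cost drops."""
    half = len(cells) // 2
    left, right = cells[:half], cells[half:]
    improved = True
    while improved:
        improved = False
        base = cut_cost(nets, left, right)
        for i in range(len(left)):
            for j in range(len(right)):
                trial_l = left[:i] + [right[j]] + left[i + 1:]
                trial_r = right[:j] + [left[i]] + right[j + 1:]
                if cut_cost(nets, trial_l, trial_r) < base:
                    left, right, improved = trial_l, trial_r, True
                    break
            if improved:
                break
    return left, right

def min_cut_place(cells, nets, region=(0.0, 0.0, 8.0, 8.0), vertical=True, placement=None):
    """Recursively bisect the cells and the region (alternating cut direction) down to bins."""
    if placement is None:
        placement = {}
    x0, y0, x1, y1 = region
    if len(cells) <= 1:
        if cells:                                  # one cell left: place it in this bin
            placement[cells[0]] = ((x0 + x1) / 2, (y0 + y1) / 2)
        return placement
    left, right = bisect(cells, nets)
    if vertical:                                   # cut the region in half vertically
        xm = (x0 + x1) / 2
        min_cut_place(left, nets, (x0, y0, xm, y1), False, placement)
        min_cut_place(right, nets, (xm, y0, x1, y1), False, placement)
    else:                                          # cut the region in half horizontally
        ym = (y0 + y1) / 2
        min_cut_place(left, nets, (x0, y0, x1, ym), True, placement)
        min_cut_place(right, nets, (x0, ym, x1, y1), True, placement)
    return placement

cells = ["A", "B", "C", "D", "E", "F", "G", "H"]
nets = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]
print(min_cut_place(cells, nets))
```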

Iterative Placement Improvement:


 An iterative placement improvement algorithm takes an existing placement and tries to
improve it by moving the logic cells. The algorithm has two parts:
1. The selection criterion that decides which logic cells to try moving.
2. The measurement criterion that decides whether to move the selected cells.
 There are several interchange or iterative exchange methods
1. Pairwise interchange
2. Force directed interchange
3. Force-directed relaxation
4. Force-directed pairwise relaxation
 All of these methods usually consider only pairs of logic cells to be exchanged. A source
logic cell is picked for trial exchange with a destination logic cell.
Pairwise Interchange:
 Steps involved in the pairwise-interchange algorithm:
1. Select a source logic cell at random.
2. Try all the other logic cells in turn as the destination logic cell.
3. Use any of the measurement methods to decide whether to accept the
interchange.
4. Repeat the process from step 1, selecting each logic cell in turn as a source logic
cell.
 Figures (a) and (b) show how we can extend pairwise interchange to swap more than two
logic cells at a time. The neighborhood exchange algorithm is a modification of pairwise
interchange that considers as destinations only logic cells within a neighborhood, that is, cells
within a certain distance of the source. Figures (c) and (d) show the one- and two-neighborhoods for a logic cell.
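A minimal sketch of pairwise interchange is given below, using total half-perimeter wirelength (HPWL) as the measurement criterion. The cell coordinates, nets, number of trials, and the choice of HPWL as the cost function are illustrative assumptions.

```python
# Minimal sketch of pairwise interchange with half-perimeter wirelength (HPWL) as the cost.
import random

def hpwl(placement, nets):
    """Total half-perimeter bounding-box wirelength over all nets."""
    total = 0.0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def pairwise_interchange(placement, nets, trials=20):
    cells = list(placement)
    for _ in range(trials):
        src = random.choice(cells)                      # step 1: pick a source cell at random
        for dst in cells:                               # step 2: try each other cell as destination
            if dst == src:
                continue
            before = hpwl(placement, nets)
            placement[src], placement[dst] = placement[dst], placement[src]
            if hpwl(placement, nets) >= before:         # step 3: accept only improving swaps
                placement[src], placement[dst] = placement[dst], placement[src]  # undo
    return placement

placement = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
nets = [("A", "C"), ("B", "D")]
print(hpwl(placement, nets))                              # 4.0 before improvement
print(hpwl(pairwise_interchange(placement, nets), nets))  # typically 2.0 after swapping B and C
```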

Force-Directed placement:
 Imagine identical springs connecting all the logic cells we wish to place; the number of
springs is equal to the number of connections between logic cells. The more highly
connected the logic cells, the stronger the pull of the springs. A network with nine logic
blocks and its grid are shown in figures (a) and (b) respectively. Forces are calculated as if
springs were attached to the centers of each logic cell for each connection. The two nets
connecting logic cells A and I correspond to two springs (shown in figure (c)). The forces
are proportional to the spring extensions.
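The spring model can be sketched directly: with identical unit springs, the force on a cell is the sum of the vectors to the cells it connects to, and the zero-force position is the centroid of its neighbours. The netlist and coordinates below are illustrative assumptions.

```python
# Minimal sketch of the spring model used in force-directed placement.

def force_on(cell, placement, nets):
    """Sum of spring forces (proportional to extension) acting on one cell."""
    fx = fy = 0.0
    x, y = placement[cell]
    for net in nets:
        if cell in net:
            for other in net:
                if other != cell:
                    ox, oy = placement[other]
                    fx += ox - x        # unit spring constant: force = extension
                    fy += oy - y
    return fx, fy

def zero_force_target(cell, placement, nets):
    """Position where the net force on the cell vanishes (centroid of connected cells)."""
    neighbours = [other for net in nets if cell in net for other in net if other != cell]
    xs = [placement[n][0] for n in neighbours]
    ys = [placement[n][1] for n in neighbours]
    return sum(xs) / len(xs), sum(ys) / len(ys)

placement = {"A": (0, 0), "E": (1, 1), "I": (2, 2)}
nets = [("A", "I"), ("A", "I"), ("A", "E")]     # two parallel nets A-I pull twice as hard
print(force_on("A", placement, nets))           # (5.0, 5.0)
print(zero_force_target("A", placement, nets))  # (1.67, 1.67): A is pulled toward I
```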

Force Directed Iterative placement improvement


 The figure below shows the different kinds of force-directed placement algorithms. The
force-directed interchange algorithm uses the force vector to select a pair of logic cells
to swap. In force-directed relaxation a chain of logic cells is moved. The force-directed
pairwise relaxation algorithm swaps one pair of logic cells at a time.
Timing Driven Placement methods

 One of the objectives of placement is minimizing delay. There are two approaches: net-based
and path-based.
 Net-based algorithms: a net weight can be set to the number of times the net appears in the
critical-net list. The problem with this approach is that as soon as we fix the first 100 critical
nets, another 200 suddenly become critical.
 Another method of finding net weights uses the zero-slack algorithm; the figure below
shows how it works. Figure (a) shows a circuit with primary inputs at which we know
the arrival times of each signal, and primary outputs at which we know the required
times. We can work forward from the primary inputs and backward from the primary
outputs to determine the arrival and required times at each input pin for each net. The
difference between the required and arrival times at each input pin is the slack time.
The zero-slack algorithm adds delay to each net until all the slacks are zero, as shown in
figure (b). The net delays can then be converted to weights for the placement (a simplified
sketch follows after this list).
 Path-based algorithms: with the zero-slack algorithm we simplify but over-constrain the
problem. What we would really like to do is deal with paths, such as the critical path shown in
figure (a) above, and not just nets. Path-based algorithms have been proposed to do
this, but they are complex.
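As referenced above, here is a heavily simplified sketch of the zero-slack idea applied to a single path rather than a full timing graph: the arrival time is propagated forward, compared with the required time, and the resulting slack is spread over the nets on the path so that each net receives a delay budget; small budgets indicate critical nets that deserve large placement weights. All delay values are illustrative assumptions.

```python
# Minimal, single-path sketch of the zero-slack idea (the real algorithm works on a
# full timing graph, pin by pin). Delay values are illustrative.

gate_delays  = [1.0, 2.0, 1.0]    # delays of the gates along the path (ns)
net_delays   = [0.5, 0.5, 0.5]    # current estimated delays of the nets between them (ns)
arrival_in   = 0.0                # arrival time at the primary input (ns)
required_out = 8.0                # required time at the primary output (ns)

# Forward pass: arrival time at the path output.
arrival_out = arrival_in + sum(gate_delays) + sum(net_delays)

# Slack = required time - arrival time for this path.
slack = required_out - arrival_out                 # 8.0 - 5.5 = 2.5 ns

# Spread the slack evenly over the nets so the path slack becomes zero.
budget_per_net = slack / len(net_delays)
net_budgets = [d + budget_per_net for d in net_delays]

print(f"slack before      = {slack} ns")
print(f"net delay budgets = {net_budgets}")        # each net may now use up to ~1.33 ns
# A net with a small budget is critical and gets a large weight in timing-driven placement.
```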
Physical Design Flow
 The physical design flow in VLSI is a process that converts a design into a layout that can
be used to create a chip.
 The figure below shows a design flow using a synthesis tool and a floorplanning tool that
includes placement. The flow consists of the following steps:
1. Design entry: The input is a logical description with no physical information.
2. Initial synthesis. The initial synthesis contains little or no information on any
interconnect loading. The output of the synthesis tool (typically an EDIF
netlist) is the input to the floorplanner
3. Initial floorplan. From the initial floorplan interblock capacitances are input
to the synthesis tool as load constraints and intrablock capacitances are input
as wire-load tables.
4. Synthesis with load constraints. At this point the synthesis tool is able to
resynthesize the logic based on estimates of the interconnect capacitance
each gate is driving. The synthesis tool produces a forward annotation file to
constrain path delays in the placement step.
5. Timing-driven placement. After placement using constraints from the
synthesis tool, the location of every logic cell on the chip is fixed and accurate
estimates of interconnect delay can be passed back to the synthesis tool.
6. Synthesis with in-place optimization (IPO). The synthesis tool changes the drive
strength of gates based on the accurate interconnect delay estimates from the
floorplanner without altering the netlist structure.
7. Detailed placement. The placement information is ready to be input to the routing
step.
