
21EC71 Advanced VLSI Module 1 - Module

Uploaded by

gowdabharth488

21EC71 Advanced VLSI

Advanced VLSI
(21EC71)
SEMESTER – VII

Education in itself is an asset, Not an investment.

Module 1
Introduction to ASICs: Full custom, Semi-custom and Programmable ASICs, ASIC
Design flow, ASIC cell libraries. CMOS Logic: Data path Logic Cells: Data Path
Elements, Adders: Carry skip, Carry bypass, Carry save, Carry select, Conditional
sum, Multiplier (Booth encoding), Data path Operators, I/O cells, Cell Compilers.

Textbook 1


CONTENTS
Introduction to ASICs
Types of ASICs
Full-Custom ASICs
Semi-Custom ASICs
Standard-Cell Based ASICs
Gate-Array Based ASICs
Programmable ASICs
Programmable Logic Devices
Field-Programmable Gate Arrays
Design Flow
ASIC Cell Libraries
CMOS Logic Cells
Datapath Logic Cells
Datapath Elements
Adders
Multipliers
Other Datapath Operators
IO Cells
Cell Compilers


INTRODUCTION TO ASICS
An ASIC is an application-specific integrated circuit. Figure 1.0(a) shows an IC
package (this is a pin-grid array, or PGA, shown upside down; the pins will go through holes
in a printed-circuit board). A PGA package is usually made from a ceramic material, but
plastic packages are also common.

FIGURE 1.0 An integrated circuit (IC). (a) A pin-grid array (PGA) package. (b)
The silicon die or chip is under the package lid.

● The earliest ICs used bipolar technology, and the majority of logic ICs used either
transistor-transistor logic (TTL) or emitter-coupled logic (ECL).
● Although invented before the bipolar transistor, the metal-oxide-silicon (MOS) transistor
was initially difficult to manufacture because of problems with the oxide interface.
● By the early 1980s the aluminum gates of the transistors were replaced by polysilicon
gates.
● The introduction of polysilicon as a gate material was a major improvement in CMOS
technology.
● The principal advantage of CMOS over NMOS technology is lower power consumption.
● Another advantage of a polysilicon gate was a simplification of the fabrication process,
allowing devices to be scaled down in size.
● As different types of custom ICs began to evolve for different types of applications, these
new ICs gave rise to a new term: application-specific IC, or ASIC.

Examples of ICs that are ASICs include:
1. a chip for a toy bear that talks;
2. a chip for a satellite;
3. a chip designed to handle the interface between memory and a microprocessor for a
workstation CPU;
4. a chip containing a microprocessor as a cell together with other logic.

Examples of ICs that are not ASICs include standard parts such as:
1. memory chips sold as a commodity item (ROMs, DRAM, and SRAM);
2. microprocessors;
3. TTL or TTL-equivalent ICs at SSI, MSI, and LSI levels.


TYPES OF ASICS
ICs are made on a thin (a few hundred microns thick), circular silicon wafer, with
each wafer holding hundreds of die (sometimes people use dies or dice for the plural of die).

The transistors and wiring are made from many layers (usually between 10 and 15
distinct layers) built on top of one another. Each successive mask layer has a pattern that is
defined using a mask similar to a glass photographic slide. The first half-dozen or so layers
define the transistors. The last half-dozen or so layers define the metal wires between the
transistors (the interconnect).

ASICs
    Full-custom ASICs
    Semi-custom ASICs
        Standard-cell based ASICs
        Gate-array based ASICs
            Channeled gate arrays
            Channelless gate arrays
            Structured gate arrays
    Programmable ASICs
        Programmable logic devices (PLDs)
            Programmable array logic (PALs)
            Programmable logic arrays (PLAs)
        Field-programmable gate arrays (FPGAs)
Figure 1.1: Types of ASICs

FULL-CUSTOM ASICS
● In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or
layout specifically for one ASIC.
● This means the designer abandons the approach of using pretested and precharacterized
cells for all or part of that design.
● It makes sense to take this approach only if there are no suitable existing cell libraries
available that can be used for the entire design. This might be because existing cell
libraries are not fast enough, or the logic cells are not small enough or consume too
much power.
● There is one growing member of this family, the mixed analog/digital ASIC.


SEMI CUSTOM ASICS


STANDARD-CELL BASED ASICS
● A cell-based ASIC (cell-based IC, or CBIC, a common term in Japan) uses predesigned
logic cells (AND gates, OR gates, multiplexers, and flip-flops, for example) known as
standard cells.
● The standard-cell areas (also called flexible blocks) in a CBIC are built of rows of
standard cells, like a wall built of bricks.
● The standard-cell areas may be used in combination with larger predesigned cells,
perhaps microcontrollers or even microprocessors, known as megacells.
● Megacells are also called megafunctions, full-custom blocks, system-level macros
(SLMs), fixed blocks, cores, or Functional Standard Blocks (FSBs).
● The ASIC designer defines only the placement of the standard cells and the interconnect
in a CBIC.
● However, the standard cells can be placed anywhere on the silicon; this means that all
the mask layers of a CBIC are customized and are unique to a particular customer.
● The advantage of CBICs is that designers save time and money and reduce risk by using a
predesigned, pretested, and precharacterized standard-cell library. In addition, each
standard cell can be optimized individually. During the design of the cell library each
and every transistor in every standard cell can be chosen to maximize speed or minimize
area, for example.
● The disadvantages are the time or expense of designing or buying the standard-cell
library and the time needed to fabricate all layers of the ASIC for each new design.

FIGURE 1.2: A cell-based ASIC (CBIC) die with a single standard-cell area (a flexible
block) together with four fixed blocks. The flexible block contains rows of standard
cells.


FIGURE 1.3 Standard cells are stacked like bricks in a wall; the abutment box (AB)
defines the edges of the brick. The difference between the bounding box (BB) and the
AB is the area of overlap between the bricks. Power supplies (labeled VDD and GND)
run horizontally inside a standard cell on a metal layer that lies above the transistor
layers. This standard cell has center connectors (the three squares, labeled A1, B1, and
Z) that allow the cell to connect to others.

FIGURE 1.4 Routing the CBIC (cell-based IC). The use of regularly shaped standard
cells from a library allows ASICs like this to be designed automatically. This ASIC
uses two separate layers of metal interconnect (metal1 and metal2) running at right
angles to each other.


GATE ARRAY BASED ASICS


● Both cell-based and gate-array ASICs use predefined cells, but there is a difference: we
can change the transistor sizes in a standard cell to optimize speed and performance, but
the device sizes in a gate array are fixed.
● This results in a trade-off in performance and area in a gate array at the silicon level.
● In a gate array, only the top few layers of metal, which define the interconnect between
transistors, are defined by the designer using custom masks.
● To distinguish this type of gate array from other types of gate array, it is often called a
masked gate array (MGA).
● The smallest element that is replicated to make the base array is the base cell (sometimes
called a primitive cell).
● The logic cells in a gate-array library are often called macros.

There are the following different types of MGA or gate-array based ASICs:

● Channeled gate arrays.


● Channelless gate arrays.
● Structured gate arrays.

FIGURE 1.5 A channeled gate-array die. The spaces between rows of the base cells are
set aside for interconnect.

FIGURE 1.6 A channelless gate-array or sea-of-gates (SOG) array die. The core area of
the die is completely filled with an array of base cells (the base array).

FIGURE 1.7 A structured or embedded gate-array die showing an embedded block
in the upper left corner (a static random-access memory, for example). The rest of
the die is filled with an array of base cells.


PROGRAMMABLE ASICS
PROGRAMMABLE LOGIC DEVICES
● Programmable logic devices (PLDs) are standard ICs that are available in standard
configurations from a catalog of parts and are sold in very high volume to many
different customers.
● However, PLDs may be configured or programmed to create a part customized to a
specific application, and so they also belong to the family of ASICs.
● PLDs use no customized mask layers or logic cells.
● PLDs offer fast design turnaround.

FIGURE 1.8 A programmable logic device (PLD) die. The macrocells typically
consist of programmable array logic followed by a flip-flop or latch. The macrocells
are connected using a large programmable interconnect block.

● The simplest type of programmable IC is a read-only memory (ROM). The most common
types of ROM use a metal fuse that can be blown permanently (a programmable ROM or
PROM).
● An electrically programmable ROM, or EPROM, uses programmable MOS transistors
whose characteristics are altered by applying a high voltage. You can erase an EPROM
either by using another high voltage (an electrically erasable PROM, or EEPROM) or
by exposing the device to ultraviolet light (UV-erasable PROM, or UVPROM).
● There is another type of ROM that can be placed on any ASIC: a mask-programmable
ROM (mask-programmed ROM or masked ROM). A masked ROM is a regular array of
transistors permanently programmed using custom mask patterns. An embedded
masked ROM is thus a large, specialized logic cell.

● A PLA has a programmable AND logic array, or AND plane, followed by a
programmable OR logic array, or OR plane.
● A PAL has a programmable AND plane and a fixed OR plane.
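
The AND-plane/OR-plane structure can be illustrated with a small behavioral sketch in Python. The function name `evaluate_pla`, the dictionary encoding of the product terms, and the half-adder example are assumptions made for illustration; they are not taken from the notes or from any device data sheet.

```python
def evaluate_pla(and_plane, or_plane, inputs):
    """Evaluate a PLA-style two-level logic array (illustrative model only).

    and_plane: list of product terms; each term maps an input name to the
               value (0 or 1) that input must have. Inputs not listed are
               "don't care" for that term.
    or_plane:  maps each output name to the indices of the product terms
               that are ORed together to form it.
    inputs:    maps each input name to 0 or 1.
    """
    # AND plane: a product term is 1 only if every listed input matches.
    products = [
        int(all(inputs[name] == want for name, want in term.items()))
        for term in and_plane
    ]
    # OR plane: an output is 1 if any of its product terms is 1.
    return {
        out: int(any(products[i] for i in terms))
        for out, terms in or_plane.items()
    }


# Hypothetical example: a half adder (sum = a XOR b, carry = a AND b).
HALF_ADDER_AND = [
    {"a": 1, "b": 0},  # product 0: a AND (NOT b)
    {"a": 0, "b": 1},  # product 1: (NOT a) AND b
    {"a": 1, "b": 1},  # product 2: a AND b
]
HALF_ADDER_OR = {"sum": [0, 1], "carry": [2]}

print(evaluate_pla(HALF_ADDER_AND, HALF_ADDER_OR, {"a": 1, "b": 0}))
```

In this model both tables play the role of programmable planes, as in a PLA. A PAL would correspond to keeping `or_plane` fixed by the device and programming only `and_plane`.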


FIELD-PROGRAMMABLE GATE ARRAYS


● An FPGA is usually just larger and more complex than a PLD.
● None of the mask layers are customized.
● There is a method for programming the basic logic cells and the interconnect.
● The core is a regular array of programmable basic logic cells that can implement
combinational as well as sequential logic (flip-flops).
● A matrix of programmable interconnect surrounds the basic logic cells.
● Programmable I/O cells surround the core.
● Design turnaround is a few hours.

FIGURE 1.9 A field-programmable gate array (FPGA) die. The exact type, size, and number of the
programmable basic logic cells varies tremendously.


DESIGN FLOW
Figure 1.10 shows the sequence of steps to design an ASIC; we call this a design flow.

FIGURE 1.10 ASIC design flow

1. Design entry. Enter the design into an ASIC design system, either using a hardware
description language (HDL) or schematic entry.
2. Logic synthesis. Use an HDL (VHDL or Verilog) and a logic synthesis tool to
produce a netlist, a description of the logic cells and their connections.
3. System partitioning. Divide a large system into ASIC-sized pieces.
4. Prelayout simulation. Check to see if the design functions correctly.
5. Floorplanning. Arrange the blocks of the netlist on the chip.
6. Placement. Decide the locations of cells in a block.
7. Routing. Make the connections between cells and blocks.
8. Extraction. Determine the resistance and capacitance of the interconnect.
9. Postlayout simulation. Check to see the design still works with the added loads of the
interconnect.


ASIC CELL LIBRARIES


The cell library is the key part of ASIC design. You have three choices: the ASIC vendor (the
company that will build your ASIC) will supply a cell library, or you can buy a cell library
from a third-party library vendor, or you can build your own cell library.

The first choice is to use an ASIC-vendor library.
● Using an ASIC-vendor library requires you to use a set of design tools approved by the
ASIC vendor to enter and simulate your design.
● You have to buy the tools, and the cost of the cell library is folded into the NRE.

The second and third choices require you to make a buy-or-build decision.
● If you complete an ASIC design using a cell library that you bought, you also own the
masks (the tooling) that are used to manufacture your ASIC. This is called customer-owned
tooling (COT).
● A library vendor normally develops a cell library using information about a process
supplied by an ASIC foundry.

The third choice is to develop a cell library in-house.
● Many large computer and electronics companies make this choice.
● Most of the cell libraries designed today are still developed in-house despite the fact
that the process of library development is complex and very expensive.

However created, each cell in an ASIC cell library must contain the following:

● A physical layout
● A behavioral model
● A Verilog/VHDL model
● A detailed timing model
● A test strategy
● A circuit schematic
● A cell icon
● A wire-load model
● A routing model

1. The ASIC designer may not actually see the layout if it is hidden inside a phantom, but the
layout will be needed eventually.
2. The ASIC designer needs a high-level, behavioral model for each cell because simulation
at the detailed timing level takes too long for a complete ASIC design.
3. The designer may require Verilog and VHDL models in addition to the models for a
particular logic simulator.


4. ASIC designers also need a detailed timing model for each cell to determine the
performance of the critical pieces of an ASIC. Library engineers simulate the delay of
each cell, a process known as characterization.
5. All ASICs need to be production tested (programmable ASICs may be tested by the
manufacturer before they are customized, but they still need to be tested). Simple cells in
small or medium-size blocks can be tested using automated techniques, but large blocks
such as RAM or multipliers need a planned strategy.
6. The cell schematic (a netlist description) describes each cell so that the cell designer can
perform simulation for complex cells.
7. A cell icon helps in identifying the library cell.
8. In order to estimate the parasitic capacitance of wires before we actually complete any
routing, we need a statistical estimate of the capacitance for a net in a given size circuit
block. This usually takes the form of a look-up table known as a wire-load model.
9. We also need a routing model for each cell. Large cells are too complex for the physical
design or layout tools to handle directly and we need a simpler representation. The
phantom may include information that tells the automated routing tool where it can and
cannot place wires over the cell, as well as the location and types of the connections to
the cell.
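
The look-up-table form of a wire-load model (item 8) can be sketched as follows. This is a minimal illustration under stated assumptions: the table is indexed by fanout, the capacitance values are invented placeholders rather than data from any real library, and the linear interpolation and extrapolation rule is one plausible choice, not something the notes prescribe.

```python
# Hypothetical table for one block size: (fanout, estimated capacitance in pF).
# The numbers are placeholders chosen only to make the example concrete.
HYPOTHETICAL_WIRE_LOAD = [(1, 0.010), (2, 0.018), (4, 0.035)]


def estimate_wire_capacitance(fanout, table):
    """Estimate net capacitance (pF) from a fanout-indexed look-up table.

    table: list of (fanout, capacitance_pF) pairs sorted by fanout, with at
           least two entries. Values between entries are linearly
           interpolated; values outside the table are linearly extrapolated
           from the nearest two entries.
    """
    if len(table) < 2:
        raise ValueError("need at least two table entries")
    # Pick the neighbouring pair that brackets the fanout, falling back to
    # the first or last pair when the fanout lies outside the table.
    lo, hi = table[0], table[1]
    for left, right in zip(table, table[1:]):
        lo, hi = left, right
        if fanout <= right[0]:
            break
    slope = (hi[1] - lo[1]) / (hi[0] - lo[0])
    return lo[1] + slope * (fanout - lo[0])


print(estimate_wire_capacitance(3, HYPOTHETICAL_WIRE_LOAD))
```

A prelayout timing estimate would use such a value as the extra load on a cell output before any routing exists.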
Typical files delivered with a standard-cell library
(source: https://teamvlsi.com/2020/08/standard-cell-library-in-asic-design.html):
● LIB files (.lib)
● LEF files (.lef)
● Netlist file (.v)
● GDS file (.gds)
● SPICE netlist (.sp)
● Model file (.m)


CMOS LOGIC CELLS


Types of cells

Combinational logic cells
    Description: Perform operations where the output depends only on current inputs,
    without memory.
    Examples: AND gate, multiplexer (MUX), ripple-carry adder (RCA).

Sequential logic cells
    Description: Store data and control timing, with outputs depending on both current
    and previous states.
    Examples: D flip-flop, shift register, counter.

Datapath logic cells
    Description: Optimized for arithmetic and logical operations, used in high-speed
    datapaths to reduce delay.
    Examples: Carry look-ahead adder (CLA), Booth multiplier, barrel shifter.

I/O cells
    Description: Manage communication between the internal logic of the chip and the
    external environment.
    Examples: Input buffer cell, output driver cell, bidirectional I/O cell, ESD
    protection cell.

Special-purpose cells
    Description: Address specific needs related to power management, signal integrity,
    design flexibility, and manufacturing.
    Examples: Tap cells, filler cells, antenna cells, spare cells, endcap cells.

DATAPATH LOGIC CELLS


Suppose we wish to build an n -bit adder (that adds two n -bit numbers) and to exploit
the regularity of this function in the layout. We can do so using a datapath structure.
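
The regularity that a datapath exploits can be seen in a behavioral sketch of an n-bit adder built from identical full-adder cells, one per bit. The Python below is only an illustration; the function names `full_adder` and `ripple_carry_add` are assumptions and the code models logic values, not layout.

```python
def full_adder(a, b, cin):
    """One full-adder (FA) cell: returns (sum, carry_out) for 1-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, n_bits, cin=0):
    """Add two n-bit unsigned integers by chaining n identical FA cells.

    Returns (sum modulo 2**n_bits, carry_out).
    """
    total = 0
    carry = cin
    for i in range(n_bits):
        # Bit slice i: the same cell, fed by the carry from slice i - 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(11, 6, 4))  # 4-bit example: 11 + 6 = 17 = 16 + 1
```

Every iteration of the loop corresponds to one bit slice, which is why the same cell can be stacked n times in the layout.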

FIGURE 2.20 A datapath adder.

(a) A full-adder (FA) cell with inputs (b) A 4-bit adder. (c) The layout, using two-level
metal, with data in m1 and control in m2. (d) The datapath layout.

What is the difference between using a datapath, standard cells, or gate arrays? Cells
are placed together in rows on a CBIC or an MGA, but there is generally no regularity to
the arrangement of the cells within the rows; we let software arrange the cells and complete
the interconnect.

Datapath layout automatically takes care of most of the interconnect between the cells
with the following advantages:

● Regular layout produces predictable and equal delay for each bit.

● Interconnect between cells can be built into each cell.

There are some disadvantages of using a datapath:

● The overhead (buffering and routing the control signals, for example) can make a
narrow (small number of bits) datapath larger and slower than a standard-cell (or even gate-
array) implementation.

● Datapath cells have to be predesigned (otherwise we are using full-custom design) for
use in a wide range of datapath sizes. Datapath cell design can be harder than designing
gate-array macros or standard cells.

● Software to assemble a datapath is more complex and not as widely used as software
for assembling standard cells or gate arrays.

DATAPATH ELEMENTS
Figure 2.21 shows some typical datapath symbols for an adder.

FIGURE 2.21 Symbols for a datapath adder.


TABLE 2.11 Binary arithmetic & representation


ADDERS


FIGURE 2.22 The carry-save adder (CSA). (a) A CSA cell. (b) A 4-bit CSA. (c) Symbol for a
CSA. (d) A four-input CSA. (e) The datapath for a four-input, 4-bit adder using CSAs with a
ripple-carry adder (RCA) as the final stage. (f) A pipelined adder. (g) The datapath for the
pipelined version showing the pipeline registers as well as the clock control lines that use m2.
FIGURE 2.24 The conditional-sum adder. (a) A 1-bit conditional adder that calculates the sum
and carry out assuming the carry in is either '1' or '0'. (b) The multiplexer that selects between
sums and carries. (c) A 4-bit conditional-sum adder with carry input, C[0].
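
A behavioral sketch of the conditional-sum idea is given below. Each block computes its sum and carry out twice, once assuming a carry in of 0 and once assuming 1, and a multiplexer later selects the correct pair when the real carry is known. The recursive halving used here is one common way to organize the blocks and is an assumption; it is not claimed to match the cell arrangement in Figure 2.24, and the function names are illustrative.

```python
def conditional_block(a_bits, b_bits):
    """Return (sum0, cout0, sum1, cout1) for a block of LSB-first bits.

    sum0/cout0 assume a carry in of 0; sum1/cout1 assume a carry in of 1.
    """
    if len(a_bits) == 1:
        x = a_bits[0] ^ b_bits[0]
        return [x], a_bits[0] & b_bits[0], [x ^ 1], a_bits[0] | b_bits[0]
    half = len(a_bits) // 2
    lo_s0, lo_c0, lo_s1, lo_c1 = conditional_block(a_bits[:half], b_bits[:half])
    hi_s0, hi_c0, hi_s1, hi_c1 = conditional_block(a_bits[half:], b_bits[half:])

    def select(carry):
        # Multiplexer: the lower block's carry picks the upper block's result.
        return (hi_s1, hi_c1) if carry else (hi_s0, hi_c0)

    up0_s, up0_c = select(lo_c0)
    up1_s, up1_c = select(lo_c1)
    return lo_s0 + up0_s, up0_c, lo_s1 + up1_s, up1_c


def conditional_sum_add(a, b, n_bits, cin=0):
    """Add two n-bit unsigned integers (n_bits >= 1); returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> i) & 1 for i in range(n_bits)]
    s0, c0, s1, c1 = conditional_block(a_bits, b_bits)
    bits, cout = (s1, c1) if cin else (s0, c0)  # final select on the real C[0]
    return sum(bit << i for i, bit in enumerate(bits)), cout


print(conditional_sum_add(9, 7, 4))  # 9 + 7 = 16 -> sum 0, carry out 1
```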


FIGURE 2.23 The Brent-Kung carry-lookahead adder (CLA). (a) Carry generation in a
4-bit CLA. (b) A cell to generate the lookahead terms, C[0] to C[3]. (c) Cells L1, L2, and
L3 are rearranged into a tree that has less delay. Cell L4 is added to calculate C[2] that
is lost in the translation. (d) and (e) Simplified representations of parts a and c. (f) The
lookahead logic for an 8-bit adder. The inputs, 0:7, are the propagate and carry terms
formed from the inputs to the adder. (g) An 8-bit Brent-Kung CLA. The outputs
of the lookahead logic are the carry bits that (together with the inputs) form the sum.
One advantage of this adder is that delays from the inputs to the outputs are more
nearly equal than in other adders. This tends to reduce the number of unwanted and
unnecessary switching events and thus reduces power dissipation.
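
The tree-structured lookahead can be sketched as a prefix computation over (generate, propagate) pairs. The Python below is an illustration under stated assumptions: it uses the usual Brent-Kung index pattern (an up-sweep that builds group terms followed by a down-sweep that fills in the remaining prefixes), it requires the width to be a power of two, and it does not try to reproduce the exact cell names L1 to L4 from the figure.

```python
def combine(hi, lo):
    """Merge (generate, propagate) of an upper bit group with a lower one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def brent_kung_add(a, b, n_bits, cin=0):
    """Add two n-bit unsigned integers; n_bits must be a power of two.

    Returns (sum modulo 2**n_bits, carry_out).
    """
    if n_bits < 1 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two")
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> i) & 1 for i in range(n_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]          # per-bit propagate
    x = [(ai & bi, pi) for ai, bi, pi in zip(a_bits, b_bits, p)]

    # Up-sweep: build group terms for spans of 2, 4, 8, ... bits.
    d = 1
    while d < n_bits:
        for i in range(2 * d - 1, n_bits, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d *= 2
    # Down-sweep: fill in the prefixes that the up-sweep skipped.
    d = n_bits // 4
    while d >= 1:
        for i in range(3 * d - 1, n_bits, 2 * d):
            x[i] = combine(x[i], x[i - d])
        d //= 2

    # x[i] now covers bits 0..i, so the carry into bit i + 1 follows directly.
    carries = [cin] + [g | (pp & cin) for g, pp in x]
    total = 0
    for i in range(n_bits):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n_bits]


print(brent_kung_add(200, 100, 8))  # 300 = 256 + 44
```

Each pass of the inner loops touches independent positions, which is what lets hardware evaluate a whole level of the tree in parallel.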


MULTIPLIERS
There are two items we can attack to improve the performance of a multiplier: the
number of partial products and the addition of the partial products. Booth encoding reduces
the number of partial products by a factor of two and thus considerably reduces the area as
well as increasing the speed of our multiplier.

Booth Encoding

The above multiplier architecture can be divided into two stages. In the first stage the
partial products are formed by the Booth encoder and partial product generator (PPG). In the
second stage the partial products obtained in the first stage are merged to form the result.

● Andrew Donald Booth proposed Booth's multiplication algorithm, which multiplies two
signed binary numbers in two's complement form.
RADIX-2 BOOTH RECODING

https://linus5.blogspot.com/p/design-and-implementation-of-efficient.html

Table 2: Example of Booth’s Radix-2 multiplication

https://digitalsystemdesign.in/booths-multiplication-algorithm/
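
Radix-2 Booth recoding can be sketched in a few lines of Python. The sketch recodes each multiplier bit y[i] into a signed digit y[i-1] - y[i] (with y[-1] = 0), so every partial product is +M, -M, or 0 shifted by i. The function names are assumptions, and the code models arithmetic only, not the encoder or PPG hardware. Note that radix-2 recoding by itself does not halve the number of partial products; that reduction comes from the radix-4 (modified Booth) form.

```python
def booth_radix2_digits(multiplier, n_bits):
    """Recode an n-bit two's complement multiplier into digits in {-1, 0, 1}.

    Digit i is y[i-1] - y[i], with y[-1] taken as 0. Digits are returned
    least-significant first.
    """
    digits = []
    previous = 0
    for i in range(n_bits):
        # Python's >> is arithmetic, so this reads two's complement bits
        # correctly for negative multipliers as well.
        bit = (multiplier >> i) & 1
        digits.append(previous - bit)
        previous = bit
    return digits


def booth_multiply(multiplicand, multiplier, n_bits):
    """Multiply two signed integers; the multiplier must fit in n_bits."""
    product = 0
    for i, digit in enumerate(booth_radix2_digits(multiplier, n_bits)):
        if digit:
            # Partial product: +multiplicand or -multiplicand, weighted 2**i.
            product += digit * (multiplicand << i)
    return product


print(booth_radix2_digits(-3, 4))  # [-1, 1, -1, 0]
print(booth_multiply(7, -3, 4))    # -21
```

Runs of identical multiplier bits recode to zeros, which is where the saving in additions comes from.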


OTHER DATAPATH OPERATORS


The combinational datapath cells, NAND, NOR, and so on, and sequential datapath
cells (flip-flops and latches) have standard-cell equivalents and function identically.

FIGURE 2.31 Symbols for datapath elements. (a) An array or vector of flip-flops (a
register). (b) A two-input NAND cell with databus inputs. (c) A two-input NAND cell with a
control input. (d) A buswide MUX. (e) An incrementer/decrementer. (f) An all-zeros
detector. (g) An all-ones detector. (h) An adder/subtracter.

● We can build a ripple-borrow subtracter (a type of borrow-propagate subtracter), a
borrow-save subtracter, and a borrow-select subtracter in the same way we built
these adder architectures.
● A barrel shifter rotates or shifts an input bus by a specified amount. For example, if
we have an eight-input barrel shifter with input '1111 0000' and we specify a shift of
'0001 0000' (3, coded by bit position), the right-shifted 8-bit output is '0001 1110'. A
barrel shifter may rotate left or right (or switch between the two under a separate
control).
● A leading-one detector is used with a normalizing (left-shift) barrel shifter to align
mantissas in floating-point numbers.
● The output of a priority encoder is the binary-encoded position of the leading one in
an input.
● An accumulator is an adder/subtracter and a register. Sometimes these are combined
with a multiplier to form a multiplier accumulator (MAC). An incrementer adds 1 to
the input bus, Z = A + 1, so we can use this function, together with a register, to
negate a two's complement number, for example.
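
Three of the operators above can be sketched behaviorally in Python. The function names are assumptions, the shift amount is passed as an ordinary integer in the range 0 to width - 1 (the notes code it by bit position instead), and the barrel shifter is modeled as a cascade of stages that shift by 1, 2, 4, and so on, which is one common organization rather than the only one.

```python
def barrel_shift_right(value, amount, width=8, rotate=False):
    """Shift or rotate right using log2(width) stages (amount < width)."""
    mask = (1 << width) - 1
    value &= mask
    stage = 1
    while stage < width:
        if amount & stage:
            moved = value >> stage
            if rotate:
                # Bits that fall off the right re-enter at the left.
                moved |= (value << (width - stage)) & mask
            value = moved
        stage <<= 1
    return value


def leading_one_position(value, width=8):
    """Priority-encoder view: bit index of the leading one, or None if zero."""
    for pos in range(width - 1, -1, -1):
        if (value >> pos) & 1:
            return pos
    return None


def negate_twos_complement(value, width=8):
    """Negate by inverting every bit and then incrementing (Z = A + 1)."""
    mask = (1 << width) - 1
    inverted = ~value & mask
    return (inverted + 1) & mask


print(format(barrel_shift_right(0b11110000, 3), "08b"))  # 00011110
```

The printed value matches the right-shift example in the notes. A normalizing shifter would use `width - 1 - leading_one_position(x)` as its left-shift amount.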


IO CELLS

FIGURE 2.32 A three-state bidirectional output buffer. When the output enable,
OE, is '1' the output section is enabled and drives the I/O pad. When OE is '0' the output
buffer is placed in a high-impedance state.

● Figure 2.32 shows a three-state bidirectional output buffer (Tri-State® is a
registered trademark of National Semiconductor). When the output enable (OE) signal
is high, the circuit functions as a noninverting buffer driving the value of DATAin
onto the I/O pad. When OE is low, the output transistors or drivers, M1 and M2, are
disconnected. This allows multiple drivers to be connected on a bus. It is up to the
designer to make sure that a bus never has two drivers enabled at the same time, a
problem known as contention. To prevent the opposite problem, a bus floating to an
intermediate voltage when there are no bus drivers, we can use a bus keeper or
bus-hold cell (TI calls this Bus-Friendly logic).
● Large currents and voltages at I/O pads because of inductances and capacitances
cause power-supply bounce. To avoid this, we can limit the number of simultaneously
switching outputs (SSOs), we can limit the number of I/O drivers that can be attached
to any one VDD and GND pad, and we can design the output buffer to limit the slew
rate of the output (we call these slew-rate limited I/O pads). Quiet-I/O cells also use
two separate power supplies and two sets of I/O drivers: an AC supply (clean or quiet
supply) with small AC drivers for the I/O circuits that start and stop the output
slewing at the beginning and end of an output transition, and a DC supply (noisy or
dirty supply) for the transistors that handle large currents as they slew the output.
● To protect the I/O cells from electrostatic discharge (ESD), the input pads are
normally tied to device structures that clamp the input voltage to below the gate
breakdown voltage. Some I/O cells use transistors with a special ESD implant that
increases breakdown voltage and provides protection.
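
The three-state bus behavior described in the first bullet can be modeled with a short Python sketch. The function name `resolve_bus`, the `"Z"` and `"X"` markers, and the `kept_value` parameter standing in for a bus keeper are all assumptions made for illustration; the model is purely logical and ignores voltages and currents.

```python
def resolve_bus(drivers, kept_value=None):
    """Resolve the logic value on a shared bus of three-state drivers.

    drivers:    list of (output_enable, data) pairs, one per driver.
    kept_value: last value held by a bus keeper, or None if there is none.

    Returns the driven value (0 or 1), "X" if more than one driver is
    enabled (contention), the kept value if no driver is enabled and a
    keeper is present, or "Z" (floating) otherwise.
    """
    active = [data for oe, data in drivers if oe]
    if len(active) > 1:
        return "X"  # contention: the designer must prevent this case
    if not active:
        return kept_value if kept_value is not None else "Z"
    return active[0]


print(resolve_bus([(1, 1), (0, 0)]))          # one driver enabled -> 1
print(resolve_bus([(0, 1), (0, 0)]))          # nobody drives -> Z
print(resolve_bus([(0, 1)], kept_value=0))    # keeper holds the old value -> 0
```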


CELL COMPILERS
The process of hand crafting circuits and layout for a full-custom IC is a tedious,
time-consuming, and error-prone task.

There are two types of automated layout assembly tools, often known as silicon
compilers.

1. The first type produces a specific kind of circuit, a RAM compiler or multiplier
compiler, for example.
2. The second type of compiler is more flexible, usually providing a
programming language that assembles or tiles layout from an input command
file, but this is full-custom IC design.

In addition to producing layout we also need a model compiler so that we can verify the
circuit at the behavioral level, and we need a netlist from a netlist compiler so that we can
simulate the circuit and verify that it works correctly at the structural level. Silicon compilers
are thus complex pieces of software.

End of Module 1 Notes


Advanced VLSI
(21EC71)
SEMESTER – VII

"Education is not solely about earning a great living. It means
living a great life." - Brad Henry

Module 2
Floor planning and placement: Goals and objectives, Measurement of delay in Floor
planning, Floor planning tools, Channel definition, I/O and Power planning and Clock
planning. Placement: Goals and Objectives, Min-cut Placement algorithm, Iterative
Placement Improvement, Time driven placement methods, Physical Design Flow.

Routing: Global Routing: Goals and objectives, Global Routing Methods, Global routing
between blocks, Back annotation.

Textbook 1


CONTENTS
Floorplanning
Floorplanning Goals and Objectives
Measurement of Delay in Floorplanning
Floorplanning Tools
Channel Definition
I/O and Power Planning
Clock Planning
Placement
Placement Terms and Definitions
Placement Goals and Objectives
Placement Algorithms
Min-Cut Placement Method
Iterative Placement Improvement
Timing-Driven Placement Methods
Physical Design Flow
Routing
Global Routing
Goals and Objectives
Global Routing Methods
Global Routing Between Blocks
Back-annotation


FLOORPLANNING
Floorplanning is a mapping between the logical description (the netlist) and the
physical description (the floorplan).

The input to a floorplanning tool is a hierarchical netlist that describes the
interconnection of the blocks (RAM, ROM, ALU, cache controller, and so on); the logic cells
(NAND, NOR, D flip-flop, and so on) within the blocks; and the logic cell connectors (the
terms terminals, pins, and ports mean the same thing as connectors).

The netlist is a logical description of the ASIC; the floorplan is a physical
description of an ASIC.

FIGURE 16.3 Interconnect and gate delays. As feature sizes decrease, both average
interconnect delay and average gate delay decrease but at different rates. This is because
interconnect capacitance tends to a limit that is independent of scaling. Interconnect delay
now dominates gate delay.

FLOORPLANNING GOALS AND OBJECTIVES


The goals of floorplanning are to:

● arrange the blocks on a chip,

● decide the location of the I/O pads,

● decide the location and number of the power pads,

● decide the type of power distribution, and

● decide the location and type of clock distribution.

The objectives of floorplanning are to minimize the chip area and minimize delay.


MEASUREMENT OF DELAY IN FLOORPLANNING


● In floorplanning we wish to predict the interconnect delay before we complete any
routing.
● To predict delay we need to know the parasitics associated with interconnect: the
interconnect capacitance (wiring capacitance or routing capacitance) as well as the
interconnect resistance.
● We cannot predict the resistance of the various pieces of the interconnect path since we
do not yet know the shape of the interconnect for a net.
● We estimate interconnect length by collecting statistics from previously routed chips and
analyzing the results. From these statistics we create tables that predict the interconnect
capacitance as a function of net fanout and block size.

FIGURE 16.4 Predicted capacitance. (a) Interconnect lengths as a function of fanout (FO)
and circuit-block size. (b) Wire-load table. There is only one capacitance value for each
fanout (typically the average value). (c) The wire-load table predicts the capacitance and
delay of a net (with a considerable error).
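
A wire-load table lookup can be sketched as follows. This is only an illustration: the table values, the block sizes, and the function name `predicted_capacitance` are assumptions, not data from a real cell library.

```python
from bisect import bisect_left

# Hypothetical wire-load table: block size (gates) -> {fanout: capacitance in pF}.
# Every number here is invented for illustration.
WIRE_LOAD_TABLE = {
    1000: {1: 0.02, 2: 0.04, 3: 0.06, 4: 0.09},
    10000: {1: 0.03, 2: 0.07, 3: 0.11, 4: 0.16},
}


def predicted_capacitance(block_gates, fanout):
    """Return the single tabulated capacitance (pF) for a net."""
    sizes = sorted(WIRE_LOAD_TABLE)
    # Use the smallest tabulated block size that covers the block,
    # falling back to the largest table for very big blocks.
    index = min(bisect_left(sizes, block_gates), len(sizes) - 1)
    row = WIRE_LOAD_TABLE[sizes[index]]
    # Clamp the fanout to the range the table covers.
    clamped_fanout = min(max(fanout, 1), max(row))
    return row[clamped_fanout]


print(predicted_capacitance(800, 2))
print(predicted_capacitance(5000, 3))
```

Because the table holds one value per fanout, every net with the same fanout in the same block gets the same estimate, which is the source of the considerable error mentioned in the caption.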


FLOORPLANNING TOOLS
● Figure 16.6(a) shows an initial random floorplan generated by a floorplanning tool. Two
of the blocks, A and C in this example, are standard-cell areas (the chip shown in Figure
16.1 is one large standard-cell area). These are flexible blocks (or variable blocks)
because, although their total area is fixed, their shape (aspect ratio) and connector
locations may be adjusted during the placement step.
● We may force logic cells to be in selected flexible blocks by seeding. Seeding may be
hard or soft. A hard seed is fixed and not allowed to move during the remaining
floorplanning and placement steps. A soft seed is an initial suggestion only and can be
altered if necessary by the floorplanner.

FIGURE 16.6 Floorplanning a cell-based ASIC. (a) Initial floorplan generated by the
floorplanning tool. Two of the blocks are flexible (A and C) and contain rows of
standard cells (unplaced). A pop-up window shows the status of block A. (b) An
estimated placement for flexible blocks A and C. The connector positions are known
and a rat's nest display shows the heavy congestion below block B. (c) Moving blocks
to improve the floorplan. (d) The updated display shows the reduced congestion after
the changes.

● We need to control the aspect ratio of our floorplan because we have to fit our chip into
the die cavity (a fixed-size hole, usually square) inside a package.
● With practice, we can create a good initial placement by floorplanning and a pictorial
display.


FIGURE 16.7 Congestion analysis. (a) The initial floorplan with a 2:1.5 die aspect ratio.
(b) Altering the floorplan to give a 1:1 chip aspect ratio. (c) A trial floorplan with a
congestion map. Blocks A and C have been placed so that we know the terminal positions
in the channels. Shading indicates the ratio of channel density to the channel capacity.
Dark areas show regions that cannot be routed because the channel congestion exceeds the
estimated capacity. (d) Resizing flexible blocks A and C alleviates congestion.
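
The shading in a congestion map can be sketched as the ratio of channel density to channel capacity for each channel. The channel names and numbers below are hypothetical.

```python
def congestion_map(channels):
    """Map each channel name to density / capacity.

    `channels` maps a channel name to a (density, capacity) pair.
    """
    return {
        name: density / capacity
        for name, (density, capacity) in channels.items()
    }


def unroutable_channels(channels):
    """Channels whose density exceeds the estimated capacity (ratio > 1)."""
    ratios = congestion_map(channels)
    return sorted(name for name, ratio in ratios.items() if ratio > 1.0)


# Hypothetical channels: (density, capacity) in tracks.
example = {"ch1": (7, 10), "ch2": (12, 10), "ch3": (10, 10)}
print(congestion_map(example))
print(unroutable_channels(example))
```

A channel that is exactly full (ratio 1) is treated as routable here; a real tool may leave some margin.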

CHANNEL DEFINITION
During the floorplanning step we assign the areas between blocks that are to be used for
interconnect. This process is known as channel definition or channel allocation.
Figure 16.8 shows a T-shaped junction between two rectangular channels and illustrates
why we must route the stem (vertical) of the T before the bar. The general problem of
choosing the order of rectangular channels to route is channel ordering.

FIGURE 16.8 Routing a T-junction between two channels in two-level metal. The dots
represent logic cell pins. (a) Routing channel A (the stem of the T) first allows us to adjust
the width of channel B. (b) If we route channel B first (the top of the T), this fixes the
width of channel A. We have to route the stem of a T-junction before we route the top.


Figure 16.9 shows a floorplan of a chip containing several blocks. Suppose we cut along
the block boundaries slicing the chip into two pieces (Figure 16.9a). Then suppose we can
slice each of these pieces into two. If we can continue in this fashion until all the blocks are
separated, then we have a slicing floorplan (Figure 16.9b). Figure 16.9(c) shows how the
sequence we use to slice the chip defines a hierarchy of the blocks. Reversing the slicing
order ensures that we route the stems of all the channel T-junctions first.
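
A slicing tree and the routing order it implies can be sketched as below. The tuple encoding and the block names are assumptions made for this example; the tree is not the one drawn in Figure 16.9. Each internal node is a cut, and a post-order walk lists every cut after the cuts made inside its two pieces, so the stems are routed before the bars.

```python
def route_order(node):
    """Return cut numbers in routing order (sub-cuts before the parent cut).

    A node is either a block name (a leaf) or a tuple
    (cut_number, first_piece, second_piece).
    """
    if isinstance(node, str):  # a leaf holds just one circuit block
        return []
    cut, first_piece, second_piece = node
    return route_order(first_piece) + route_order(second_piece) + [cut]


# Hypothetical slicing tree: cut 1 splits off block A, cut 2 splits off B,
# cut 3 splits off C, and cut 4 separates D from E.
tree = (1, "A", (2, "B", (3, "C", (4, "D", "E"))))
print(route_order(tree))
```

For this tree the walk returns the cuts in exactly the reverse of the slicing sequence.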

FIGURE 16.9 Defining the channel routing order for a slicing floorplan using a
slicing tree. (a) Make a cut all the way across the chip between circuit blocks. Continue
slicing until each piece contains just one circuit block. Each cut divides a piece into two
without cutting through a circuit block. (b) A sequence of cuts: 1, 2, 3, and 4 that
successively slices the chip until only circuit blocks are left. (c) The slicing tree
corresponding to the sequence of cuts gives the order in which to route the channels: 4, 3,
2, and finally 1.

Figure 16.10 shows a floorplan that is not a slicing structure. We cannot cut the chip all
the way across with a knife without chopping a circuit block in two. This means we cannot
route any of the channels in this floorplan without routing all of the other channels first. We
say there is a cyclic constraint in this floorplan. There are two solutions to this problem.
One solution is to move the blocks until we obtain a slicing floorplan. The other solution is to
allow the use of L-shaped, rather than rectangular, channels (or areas with fixed
connectors on all sides, known as switch boxes).

FIGURE 16.10 Cyclic constraints. (a) A nonslicing floorplan with a cyclic constraint
that prevents channel routing. (b) In this case it is difficult to find a slicing floorplan
without increasing the chip area. (c) This floorplan may be sliced (with initial cuts 1 or 2)
and has no cyclic constraints, but it is inefficient in area use and will be very difficult to
route.
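
A cyclic constraint can be detected by treating "this channel must be routed before that one" as a dependency graph and attempting to order it. The sketch below uses a plain topological sort (Kahn's algorithm); the channel names and the dictionary format are assumptions for illustration.

```python
from collections import deque


def channel_order(must_precede):
    """Return a valid routing order, or None if there is a cyclic constraint.

    `must_precede` maps a channel to the set of channels that have to be
    routed before it.
    """
    channels = set(must_precede)
    for earlier in must_precede.values():
        channels |= set(earlier)
    waiting_on = {c: set(must_precede.get(c, ())) for c in channels}

    ready = deque(sorted(c for c, deps in waiting_on.items() if not deps))
    order = []
    while ready:
        channel = ready.popleft()
        order.append(channel)
        for other in sorted(waiting_on):
            deps = waiting_on[other]
            if channel in deps:
                deps.remove(channel)
                if not deps:
                    ready.append(other)
    # If some channels were never released, they wait on each other in a cycle.
    return order if len(order) == len(channels) else None


slicing = {"B": {"A"}, "C": {"B"}}
pinwheel = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": {"A"}}
print(channel_order(slicing))
print(channel_order(pinwheel))
```

The second example models four channels that each wait on the next, so no channel can be routed first and the function returns None.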


I/O AND POWER PLANNING


Every chip communicates with the outside world. Signals flow onto and off the chip and
we need to supply power. We need to consider the I/O and power constraints early in the
floorplanning process.

FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of
pads determines the die size. (b) A core-limited die: The core logic determines the die size.
(c) Using both pad-limited pads and core-limited pads for a square die.

● Special power pads are used for the positive supply, or VDD, power buses (or power
rails) and the ground or negative supply, VSS or GND.
● Usually one set of VDD/VSS pads supplies one power ring that runs around the pad
ring and supplies power to the I/O pads only.
● Another set of VDD/VSS pads connects to a second power ring that supplies the logic
core.
● We sometimes call the I/O power dirty power since it has to supply large transient
currents to the output transistors. We keep dirty power separate to avoid injecting
noise into the internal-logic power (the clean power).
● I/O pads also contain special circuits to protect against electrostatic discharge
(ESD). These circuits can withstand very short high-voltage (several kilovolt) pulses
that can be generated during human or machine handling.

Figure 16.13(a) and (b) are magnified views of the southeast corner of our example chip
and show the different types of I/O cells. Figure 16.13(c) shows a stagger-bond
arrangement using two rows of I/O pads. In this case the design rules for bond wires (the
spacing and the angle at which the bond wires leave the pads) become very important.
Figure 16.13(d) shows an area-bump bonding arrangement (also known as flip-chip,
solder-bump, or C4, terms coined by IBM, which developed this technology [Masleid, 1991])
used, for example, with ball-grid array (BGA) packages.


FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited
pads. (b) A hybrid corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump
bonded chip (or flip-chip). The chip is turned upside down and solder bumps connect the
pads to the lead frame.


CLOCK PLANNING
Figure 16.16(a) shows a clock spine (not to be confused with a channel spine) routing
scheme with all clock pins driven directly from the clock driver. MGAs and FPGAs often use
this fishbone type of clock distribution scheme.

FIGURE 16.16 Clock distribution. (a) A clock spine for a gate array. (b) A clock spine
for a cell-based ASIC (typical chips have thousands of clock nets). (c) A clock spine is
usually driven from one or more clock-driver cells. Delay in the driver cell is a function
of the number of stages and the ratio of output to input capacitance for each stage
(taper). (d) Clock latency and clock skew. We would like to minimize both latency and
skew.
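
Latency and skew can be computed from the clock arrival time at each clock pin. In this sketch we assume latency means the delay from the clock source to the latest clock pin, and skew means the spread between the earliest and latest arrivals; the flip-flop names and times are invented.

```python
def clock_latency_and_skew(arrival_times):
    """Return (latency, skew) for a dict of clock pin -> arrival time in ns.

    Latency is taken as the worst-case (latest) arrival from the clock
    source; skew is the difference between latest and earliest arrival.
    """
    latest = max(arrival_times.values())
    earliest = min(arrival_times.values())
    return latest, latest - earliest


# Hypothetical arrival times measured from the clock driver input.
arrivals = {"ff1": 1.5, "ff2": 2.0, "ff3": 1.75}
print(clock_latency_and_skew(arrivals))
```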


PLACEMENT
After completing a floorplan we can begin placement of the logic cells within the
flexible blocks. Placement is much more suited to automation than floorplanning. Thus we
shall need measurement techniques and algorithms.

PLACEMENT TERMS AND DEFINITIONS


CBIC, MGA, and FPGA architectures all have rows of logic cells separated by the
interconnect; these are row-based ASICs. Figure 16.18 shows an example of the
interconnect structure for a CBIC. Interconnect runs in horizontal and vertical directions in
the channels and in the vertical direction by crossing through the logic cells. Figure 16.18(c)
illustrates the fact that it is possible to use over-the-cell routing (OTC routing) in areas
that are not blocked. However, OTC routing is complicated by the fact that the logic cells
themselves may contain metal on the routing layers.

FIGURE 16.18 Interconnect structure. (a) The two-level metal CBIC floorplan
shown in Figure 16.11 b. (b) A channel from the flexible block A. This channel has a
channel height equal to the maximum channel density of 7 (there is room for seven
interconnects to run horizontally in m1). (c) A channel that uses OTC (over-the-cell)
routing in m2.

With two layers of metal, we route within the rectangular channels using the first
metal layer for horizontal routing, parallel to the channel spine, and the second metal layer
for the vertical direction (if there is a third metal layer it will normally run in the horizontal
direction again). The maximum number of horizontal interconnects that can be placed side
by side, parallel to the channel spine, is the channel capacity .
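
Channel density, the number of horizontal interconnects that must pass side by side at the most crowded point of a channel, can be sketched with a sweep over the horizontal spans. The span format (inclusive integer column ranges) is an assumption for this example.

```python
def channel_density(spans):
    """Maximum number of horizontal spans that overlap at any column.

    Each span is an inclusive (left_column, right_column) pair.
    """
    events = []
    for left, right in spans:
        events.append((left, 1))        # span starts occupying a track
        events.append((right + 1, -1))  # track is free after the last column
    events.sort()  # at equal columns, releases (-1) sort before starts (+1)

    density = current = 0
    for _, delta in events:
        current += delta
        density = max(density, current)
    return density


def fits(spans, channel_capacity):
    """True if the channel has enough tracks for these spans."""
    return channel_density(spans) <= channel_capacity


spans = [(0, 4), (1, 5), (2, 6), (7, 9)]
print(channel_density(spans))
print(fits(spans, 2))
```

For channeled gate arrays and FPGAs the capacity is fixed, so a `False` result means the placement or global route has to change; for a CBIC the channel height can grow to match the density.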


PLACEMENT GOALS AND OBJECTIVES


The goal of a placement tool is to arrange all the logic cells within the flexible blocks on
a chip. Ideally, the objectives of the placement step are to
● Guarantee the router can complete the routing step
● Minimize all the critical net delays
● Make the chip as dense as possible

We may also have the following additional objectives:


● Minimize power dissipation
● Minimize cross talk between signals

The most commonly used placement objectives are one or more of the following:
● Minimize the total estimated interconnect length
● Meet the timing requirements for critical nets
● Minimize the interconnect congestion
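
One common way to estimate total interconnect length before routing is the half-perimeter of the bounding box around each net's pins. These notes do not name a particular estimate, so the choice of half-perimeter here is our assumption, as are the cell names and coordinates.

```python
def half_perimeter(points):
    """Half the perimeter of the bounding box that encloses the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_estimated_length(placement, nets):
    """Sum the half-perimeter estimate over all nets.

    `placement` maps a cell name to its (x, y) position; each net is a
    list of the cell names it connects.
    """
    return sum(half_perimeter([placement[cell] for cell in net]) for net in nets)


placement = {"A": (0, 0), "B": (3, 4), "C": (1, 1)}
nets = [["A", "B"], ["A", "B", "C"]]
print(total_estimated_length(placement, nets))
```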

PLACEMENT ALGORITHMS
There are two classes of placement algorithms commonly used in commercial CAD
tools:

1. constructive placement
a. variations on the min-cut algorithm
b. eigenvalue method
2. iterative placement improvement.

Placement usually starts with a constructed solution and then improves it using an
iterative algorithm.


MIN-CUT PLACEMENT METHOD

The min-cut placement method uses successive application of partitioning [Breuer, 1977].


The following steps are shown in Figure 16.24 :
1. Cut the placement area into two pieces.
2. Swap the logic cells to minimize the cut cost.
3. Repeat the process from step 1, cutting smaller pieces until all the logic
cells are placed.

FIGURE 16.24 Min-cut placement. (a) Divide the chip into bins using a grid.
(b) Merge all connections to the center of each bin. (c) Make a cut and swap
logic cells between bins to minimize the cost of the cut. (d) Take the cut pieces and
throw out all the edges that are not inside the piece. (e) Repeat the process with a
new cut and continue until we reach the individual bins.
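
Steps 1 and 2 for a single cut can be sketched as follows. The cut cost is taken to be the number of nets with cells on both sides, and cells are swapped greedily one pair at a time; this is a simplified stand-in, not the partitioning algorithm of a production tool, and the cell and net names are invented.

```python
def cut_cost(nets, left):
    """Number of nets that have cells on both sides of the cut."""
    return sum(
        1
        for net in nets
        if any(cell in left for cell in net)
        and any(cell not in left for cell in net)
    )


def improve_by_swapping(nets, left, right):
    """Greedily swap one cell pair at a time while the cut cost drops."""
    left, right = set(left), set(right)
    best = cut_cost(nets, left)
    improved = True
    while improved:
        improved = False
        for a in sorted(left):
            for b in sorted(right):
                trial_left = (left - {a}) | {b}
                cost = cut_cost(nets, trial_left)
                if cost < best:
                    # Accept the swap and restart the search.
                    left, right = trial_left, (right - {b}) | {a}
                    best, improved = cost, True
                    break
            if improved:
                break
    return left, right, best


nets = [("A", "B"), ("C", "D")]
print(improve_by_swapping(nets, {"A", "C"}, {"B", "D"}))
```

Swapping pairs keeps the two sides the same size, which matches the idea of moving logic cells between equal bins. Step 3 would apply the same procedure again to each half.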


ITERATIVE PLACEMENT IMPROVEMENT


An iterative placement improvement algorithm takes an existing placement and tries
to improve it by moving the logic cells. There are two parts to the algorithm:
● The selection criteria that decide which logic cells to try moving.

● The measurement criteria that decide whether to move the selected cells.
There are several interchange or iterative exchange methods that differ in their
selection and measurement criteria:

● pairwise interchange,
● force-directed interchange,
● force-directed relaxation, and
● force-directed pairwise relaxation.

FIGURE 16.26 Interchange. (a) Swapping the source logic cell with a destination logic
cell in pairwise interchange. (b) Sometimes we have to swap more than two logic cells
at a time to reach an optimum placement, but this is expensive in computation time.
Limiting the search to neighborhoods reduces the search time. Logic cells within a
distance e of a logic cell form an e-neighborhood. (c) A one-neighborhood.
(d) A two-neighborhood.

FIGURE 16.27 Force-directed placement. (a) A network with nine logic cells.
(b) We make a grid (one logic cell per bin). (c) Forces are calculated as if springs were
attached to the centers of each logic cell for each connection. The two nets connecting
logic cells A and I correspond to two springs. (d) The forces are proportional to the
spring extensions.


Without external forces to counteract the pull of the springs between logic cells, the
network will collapse to a single point as it settles. An important part of force-directed
placement is fixing some of the logic cells in position. Normally ASIC designers use the I/O
pads or other external connections to act as anchor points or fixed seeds.
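
The spring model and the role of fixed seeds can be sketched as below. Each movable cell is repeatedly moved to its zero-force position, which for ideal springs is the weighted average of the positions of the cells it connects to. The data layout and names are assumptions, and the sketch ignores the grid of bins, so it illustrates the forces rather than a complete placement step.

```python
def relax(positions, fixed, connections, iterations=50):
    """Move every non-fixed cell to its zero-force position, repeatedly.

    `positions` maps cell -> (x, y); `fixed` is the set of anchored cells;
    `connections` maps a (cell, cell) pair to a spring weight (for example
    the number of nets joining the two cells).
    """
    pos = dict(positions)
    for _ in range(iterations):
        for cell in pos:
            if cell in fixed:
                continue
            sum_x = sum_y = total = 0.0
            for (a, b), weight in connections.items():
                if cell == a:
                    other = b
                elif cell == b:
                    other = a
                else:
                    continue
                other_x, other_y = pos[other]
                sum_x += weight * other_x
                sum_y += weight * other_y
                total += weight
            if total:
                pos[cell] = (sum_x / total, sum_y / total)
    return pos


# Two I/O pads act as anchors; cells X and Y sit on a chain between them.
start = {"P1": (0.0, 0.0), "P2": (10.0, 0.0), "X": (7.0, 3.0), "Y": (1.0, -2.0)}
springs = {("P1", "X"): 1, ("X", "Y"): 1, ("Y", "P2"): 1}
print(relax(start, {"P1", "P2"}, springs))
```

With the two pads anchored, X and Y settle evenly spaced along the line between them. Passing an empty `fixed` set instead lets all four points drift together, which is the collapse described above.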

FIGURE 16.28 Force-directed iterative placement improvement. (a) Force-directed
interchange. (b) Force-directed relaxation. (c) Force-directed pairwise relaxation.


TIMING-DRIVEN PLACEMENT METHODS


Minimizing delay is becoming more and more important as a placement objective.
There are two main approaches:

1. net based
2. path based

We know that we can use net weights in our algorithms. The problem is to calculate
the weights. One method finds the n most critical paths (using a timing-analysis engine,
possibly in the synthesis tool). The net weights might then be the number of times each net
appears in this list. The problem with this approach is that as soon as we fix (for example)
the first 100 critical nets, suddenly another 200 become critical.
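
The weighting scheme described above, where a net's weight is the number of critical paths it appears on, can be sketched with a counter. The paths and net names are hypothetical; in practice the list of paths would come from a timing-analysis engine.

```python
from collections import Counter


def net_weights(critical_paths):
    """Weight each net by the number of critical paths that contain it."""
    weights = Counter()
    for path in critical_paths:
        weights.update(set(path))  # count a net once per path
    return weights


# Hypothetical list of the n = 3 most critical paths, each a list of nets.
paths = [["n1", "n2", "n3"], ["n2", "n4"], ["n2", "n3"]]
print(net_weights(paths).most_common())
```

Nets that appear on no critical path get a weight of zero from the counter, so the placer treats them as unconstrained.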

FIGURE 16.29 The zero-slack algorithm. (a) The circuit with no net delays. (b) The
zero-slack algorithm adds net delays (at the outputs of each gate, equivalent to increasing
the gate delay) to reduce the slack times to zero.

With the zero-slack algorithm we simplify but overconstrain the problem. For
example, we might be able to do a better job by making some nets a little longer than the
slack indicates if we can tighten up other nets. What we would really like to do is deal with
paths such as the critical path shown in Figure 16.29(a) and not just nets. Path-based
algorithms have been proposed to do this, but they are complex and not all commercial
tools have this capability.
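
The slack that the zero-slack algorithm starts from can be computed as required time minus arrival time at each gate output. The sketch below covers only that calculation with zero net delays, as in part (a) of the figure; it does not implement the step that distributes slack as net delays. The circuit, delays, and function name are invented for illustration.

```python
def slacks(gate_delay, fanin, required_at_outputs):
    """Return slack (required time minus arrival time) at each gate output.

    `gate_delay` maps gate -> delay; `fanin` maps gate -> list of driving
    gates; `required_at_outputs` maps each primary-output gate to its
    required time. Every gate without fanout must appear in
    `required_at_outputs`. Net delays are assumed to be zero.
    """
    arrival = {}

    def arrive(gate):
        if gate not in arrival:
            inputs = (arrive(driver) for driver in fanin.get(gate, ()))
            arrival[gate] = gate_delay[gate] + max(inputs, default=0.0)
        return arrival[gate]

    fanout = {gate: [] for gate in gate_delay}
    for gate, drivers in fanin.items():
        for driver in drivers:
            fanout[driver].append(gate)

    required = {}

    def require(gate):
        if gate not in required:
            times = [require(sink) - gate_delay[sink] for sink in fanout[gate]]
            if gate in required_at_outputs:
                times.append(required_at_outputs[gate])
            required[gate] = min(times)
        return required[gate]

    return {gate: require(gate) - arrive(gate) for gate in gate_delay}


# Hypothetical circuit: gates A and B both drive gate C, the only output.
delays = {"A": 1, "B": 2, "C": 1}
print(slacks(delays, {"C": ["A", "B"]}, {"C": 5}))
```

In this example the path through B is the critical one, so B and C share the smallest slack and A has spare time.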


PHYSICAL DESIGN FLOW


Historically placement was included with routing as a single tool (the term P&R is
often used for place and route). Because interconnect delay now dominates gate delay, the
trend is to include placement within a floorplanning tool and use a separate router.
1. Design entry. The input is a logical description with no physical information.
2. Synthesis. The initial synthesis contains little or no information on any interconnect
loading. The output of the synthesis tool (typically an EDIF netlist) is the input to the
floorplanner.
3. Initial floorplan. From the initial floorplan interblock capacitances are input to the
synthesis tool as load constraints and intrablock capacitances are input as wire-load tables.
4. Synthesis with load constraints. At this point the synthesis tool is able to resynthesize
the logic based on estimates of the interconnect capacitance each gate is driving. The
synthesis tool produces a forward annotation file to constrain path delays in the placement
step.

FIGURE 16.31 Timing-driven floorplanning and placement design flow.


5. Timing-driven placement. After placement using constraints from the synthesis tool,
the location of every logic cell on the chip is fixed and accurate estimates of interconnect
delay can be passed back to the synthesis tool.
6. Synthesis with in-place optimization (IPO). The synthesis tool changes the drive
strength of gates based on the accurate interconnect delay estimates from the floorplanner
without altering the netlist structure.
7. Detailed placement. The placement information is ready to be input to the routing
step.


ROUTING
Once the designer has floorplanned a chip and the logic cells within the flexible
blocks have been placed, it is time to make the connections by routing the chip.

Routing is usually split into two steps:

1. global routing
2. detailed routing

GLOBAL ROUTING
The details of global routing differ slightly between cell-based ASICs, gate arrays,
and FPGAs, but the principles are the same in each case. A global router does not make any
connections; it just plans them. We typically global route the whole chip (or large pieces if it
is a large chip) before detail routing the whole chip (or the pieces).

There are two types of areas to global route:

1. inside the flexible blocks
2. between blocks

GOALS AND OBJECTIVES


The input to the global router is a floorplan that includes the locations of all the
fixed and flexible blocks; the placement information for flexible blocks; and the locations of
all the logic cells. The goal of global routing is to provide complete instructions to the
detailed router on where to route every net. The objectives of global routing are one or more
of the following:

● Minimize the total interconnect length.

● Maximize the probability that the detailed router can complete the routing.

● Minimize the critical path delay.


GLOBAL ROUTING METHODS

global routing
    sequential routing
        order-independent routing
        order-dependent routing
    hierarchical routing
        top-down approach (whole chip, or highest level, first)
        bottom-up approach

One approach to global routing takes each net in turn and calculates the shortest path
using tree-on-graph algorithms with the added restriction of using the available channels.
This process is known as sequential routing.
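
For a net with only two terminals, finding the shortest path through the available channels reduces to a shortest-path search on a weighted graph. The sketch below uses Dijkstra's algorithm, which is our choice for illustration; multi-terminal nets need a tree rather than a single path, and that case is not covered here. The graph and node names are hypothetical.

```python
import heapq


def shortest_route(graph, source, target):
    """Return (length, path) of the shortest route, or None if unreachable.

    `graph` maps a channel intersection to {neighbor: channel_length}.
    """
    distance = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            break
        if dist > distance.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph[node].items():
            candidate = dist + length
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in distance:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return distance[target], path[::-1]


# Hypothetical channel graph: edge weights are channel lengths.
channels = {
    "a": {"b": 2, "c": 5},
    "b": {"a": 2, "c": 1, "d": 4},
    "c": {"a": 5, "b": 1, "d": 1},
    "d": {"b": 4, "c": 1},
}
print(shortest_route(channels, "a", "d"))
```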

As a sequential routing algorithm proceeds, some channels will become more
congested since they hold more interconnects than others. In the case of FPGAs and
channeled gate arrays, the channels have a fixed channel capacity and can only hold a
certain number of interconnects.

There are two different ways that a global router normally handles this problem.
Using order-independent routing, a global router proceeds by routing each net, ignoring
how crowded the channels are. Whether a particular net is processed first or last does not
matter; the channel assignment will be the same. In order-independent routing, after all the
interconnects are assigned to channels, the global router returns to those channels that are
the most crowded and reassigns some interconnects to other, less crowded, channels.

Alternatively, a global router can consider the number of interconnects already
placed in various channels as it proceeds. In this case the global routing is order dependent:
the routing is still sequential, but now the order of processing the nets will affect the results.

In contrast to sequential global-routing methods, which handle nets one at a time,
hierarchical routing handles all nets at a particular level at once. Rather than handling all
of the nets on the chip at the same time, the global-routing problem is made more tractable
by dividing the chip area into levels of hierarchy. By considering only one level of hierarchy
at a time the size of the problem is reduced at each level. There are two ways to traverse the
levels of hierarchy.

Starting at the whole chip, or highest level, and proceeding down to the logic cells is
the top-down approach. The bottom-up approach starts at the lowest level of hierarchy and
globally routes the smallest areas first.


GLOBAL ROUTING BETWEEN BLOCKS

FIGURE 17.4 Global routing for a cell-based ASIC formulated as a graph problem. (a) A
cell-based ASIC with numbered channels. (b) The channels form the edges of a graph. (c)
The channel-intersection graph. Each channel corresponds to an edge on a graph whose
weight corresponds to the channel length.

Figure 17.5 shows an example of global routing for a net with five terminals, labeled A1
through F1, for the cell-based ASIC shown in Figure 17.4. If a designer wishes to use
minimum total interconnect path length as an objective, the global router finds the
minimum-length tree shown in Figure 17.5(b). This tree determines the channels the
interconnects will use.

FIGURE 17.5 Finding paths in global routing. (a) A cell-based ASIC (from Figure 17.4)
showing a single net with a fanout of four (five terminals). We have to order the numbered
channels to complete the interconnect path for terminals A1 through F1. (b) The terminals
are projected to the center of the nearest channel, forming a graph. A minimum-length
tree for the net that uses the channels and takes into account the channel capacities. (c)
The minimum-length tree does not necessarily correspond to minimum delay. If we wish to
minimize the delay from terminal A1 to D1, a different tree might be better.


BACK-ANNOTATION
After global routing is complete it is possible to accurately predict what the length
of each interconnect in every net will be after detailed routing, probably to within 5 percent.
The global router can give us not just an estimate of the total net length (which was all we
knew at the placement stage), but the resistance and capacitance of each path in each net.
This RC information is used to calculate net delays. We can back-annotate this net delay
information to the synthesis tool for in-place optimization or to a timing verifier to make
sure there are no timing surprises. Differences in timing predictions at this point arise due
to the different ways in which the placement algorithms estimate the paths and the way the
global router actually builds the paths.
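
One simple way to turn per-segment RC values into a net delay is the Elmore delay of an RC ladder. These notes do not say which delay model the tools use, so treat the model, the function name, and the numbers below as assumptions for illustration.

```python
def elmore_delay(segments):
    """Elmore delay of a path made of series RC segments.

    `segments` is a list of (resistance_ohms, capacitance_fF) pairs ordered
    from the driver to the sink. Each capacitance is charged through all the
    resistance between it and the driver. The result is in femtoseconds.
    """
    delay = 0
    upstream_resistance = 0
    for resistance, capacitance in segments:
        upstream_resistance += resistance
        delay += upstream_resistance * capacitance
    return delay


# Hypothetical two-segment path extracted after global routing.
path = [(10, 2), (20, 3)]
print(elmore_delay(path))
```

Here the result is 10 x 2 + (10 + 20) x 3 = 110 fs; the second segment's capacitance counts more because it sees the resistance of both segments.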

End of Module 2 Notes
