PD Flow


International Journal of Innovative Technology and Exploring Engineering (IJITEE)

ISSN: 2278-3075, Volume-1, Issue-3, August 2012

Block Level Physical Design of Interfacing Module in RISC Core
Siddalinga Aland, V. Venkateswarlu, Rohith B.R

observed that the number of transistors integrated on the same
unit area had more than doubled. From this observation
Gordon E. Moore stated that transistor integration would
double every 18 months [1].
Block level physical design involves designing the layout
without considering the pin pads. A block has peripheral
interconnections in the form of ports. A complex chip design
usually starts with block level physical design. Each block
consists of a different number of standard cells and macros.
While designing a chip, the half cell concept is used at block
level for abutting with the neighboring blocks.
Chip level physical design combines the blocks; it is the top
level integration of blocks. At the periphery, IO pads and IO
pins are included for connection to the external world. For IO
pads, ESD (electrostatic discharge) protection circuits also
need to be implemented. ESD is a large research topic in its
own right, so this dissertation work does not cover it.
Complex digital systems are designed with hardware modules
which interact by transferring information and synchronizing
their inputs and outputs. In this project the interfacing
modules are PCI (Peripheral Component Interconnect) and
SDRAM (Synchronous Dynamic Random Access Memory). The
ORCA (Optimized Reconfigurable Cell Array) is considered as a
reference for implementing the interfacing module in a RISC
core processor. The details of RISC (Reduced Instruction Set
Computer) are out of the scope of this project work. RISC is a
computer architecture that reduces chip complexity by using
simpler instructions.
VLSI physical design includes tasks like floor-planning,
partitioning of logical groups and planning of the required
power. Power planning involves power network synthesis and
analysis. Clock tree synthesis distributes the clock signal in
the block by inserting the required number and strength of
cells. Routing lays the interconnect metals using default and
non default rules, and includes power, clock and signal
routing. The final stage is to generate a GDSII from a design
that is free of DRC violations and timing violations.

Abstract: Physical design plays a major role in implementing
the circuit and logic cells physically, because physical devices
and interconnecting materials have their own parasitic
resistances and capacitances. The placement and routing (PNR)
flow involves proper placement and routing of the interfacing
module, which mainly comprises PCI and SDRAM. In this project
work digital cells, called standard cells, and macros are placed
with a minimum congestion of 3% in a block. Routing is done with
manufacturability in mind by utilizing non default rule (NDR)
design rules. The clock tree network is built using the H-tree
network topology. The power network is synthesized with the
higher metal layers available in the technology node. This
project is implemented in TSMC 120nm technology, which has 7
metal layers, but as this project is block level, 6 metal layers
are used for routing. The main clock of the block runs at
250MHz, alongside peripheral clocks and a generated clock of
133MHz. The GDSII format of the layout is generated with no
violations.

I. INTRODUCTION
Chip design is a complex process which has to be handled
hierarchically at block level. The entire design is
partitioned into sub-blocks depending on the logical
connectivity and level of hierarchy. Typically standard cells
have a constant height that allows them to be lined up in rows
on the integrated circuit. So the physical design is directly
related to the technology node, irrespective of coding
technique. This makes ASIC physical design an active domain in
the VLSI industry.
This project is carried out with the Synopsys IC Compiler
tool by preparing a Milkyway database.
The chip will consist of a very large number of rows (with
power and ground running next to each row) with each row
filled with the various cells making up the actual design.
Placers obey certain rules: each cell is assigned a unique
(exclusive) location on the die map. A given gate is placed
once, and may not occupy or overlap the location of any other
gate. Using the placement netlist and the layout view of the
standard cells, the router adds both signal connect lines and
power supply lines. The fully routed (physical) netlist
contains the listing of cells (from synthesis), the placement of
each cell (from placement), and the drawn interconnects
(from routing).
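The placement rules described above (one exclusive location per
cell, no overlap with any other cell) can be checked
mechanically. The following is a minimal Python sketch and not
part of the original flow; the cell names, coordinates and the
helper `find_overlaps` are hypothetical and chosen only for
illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Cell(NamedTuple):
    name: str
    x: float       # lower-left x in um
    y: float       # lower-left y (row origin) in um
    width: float   # um
    height: float  # um


def overlaps(a: Cell, b: Cell) -> bool:
    """True if the two rectangles share interior area (abutting edges are legal)."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def find_overlaps(cells):
    """Return the name pairs of all cells that violate the exclusive-location rule."""
    return [(a.name, b.name) for a, b in combinations(cells, 2) if overlaps(a, b)]


# Hypothetical cells in one standard-cell row.
placement = [
    Cell("U1", 0.0, 0.0, 4.5, 2.5),
    Cell("U2", 4.5, 0.0, 4.5, 2.5),   # abuts U1: legal
    Cell("U3", 8.0, 0.0, 4.5, 2.5),   # overlaps U2: illegal
]
print(find_overlaps(placement))
```

A legalizer would move the offending cells to free sites; this
sketch only reports the violations.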
The IC was invented in 1958, and at that time only a few
transistors were placed per unit area. In 1965 it was

II. OVERVIEW OF THE ASIC PHYSICAL DESIGN


Physical design implements the digital chip from the
synthesized gate level netlist. ASIC implementation is
divided mainly into 3 parts: Front end (RTL design and
Verification), DFT (Scan insertion) and Back end (Physical
design). Digital circuits are interconnected sets of logic gates
that perform operations varying from very simple, such as
AND, OR, NOR, XOR, to very complex, such as
programmable general purpose processors, image processing and
graphics processing. In semiconductor design, standard cell
methodology is a way of designing Application Specific
Integrated Circuits (ASICs). Standard cell methodology is an
example of design abstraction, whereby a layout view of a
logic function is encapsulated into an abstract logic

Manuscript received on August, 2012.
Mr. Siddalinga Aland, VLSI Design and Embedded Systems,
Visvesvaraya Technological University / UTL Tech. Ltd., VTU Extn. Center /
UTL Technologies, Bangalore, INDIA.
Dr. V. Venkateswarlu, VLSI Design and Embedded Systems,
Visvesvaraya Technological University / UTL Tech. Ltd., VTU Extn. Center /
UTL Technologies, Bangalore, INDIA.
Mr. Rohith B. R, VLSI Physical Design, SMARTPLAY Technologies
Ltd, Bangalore, INDIA.

representation (such as a NAND gate). Cell-based
methodology (the general class that standard-cell belongs to)
makes it possible for one designer to focus on the high-level
(logical function) aspect of digital design, while another
designer focuses on the implementation (physical) aspect.
The library, along with a logic level design netlist (list of
connections), is the basis for exchanging design information
between different phases of the SPR (Standard Cell Place and
Route) process [3].
A few important algorithms help to do PNR (placement and
routing) of the cells. Placement algorithms include simulated
annealing, simulated evolution, force directed placement,
Breuer's algorithm and the cluster growth algorithm [2].
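To give a feel for the first algorithm in this list, the
following is a minimal Python sketch of simulated annealing
placement. It is an illustration only and not the algorithm used
inside IC Compiler: the cost function (half-perimeter
wirelength), the swap move, the cooling schedule and all names
and parameter values are assumptions made for this example.

```python
import math
import random


def wirelength(pos, nets):
    """Total half-perimeter wirelength (HPWL) over all nets."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(cells, nets, slots, seed=0, t_start=10.0, t_end=0.01,
           alpha=0.95, moves=50):
    """Swap-based simulated annealing; returns (best_pos, best_cost, initial_cost)."""
    rng = random.Random(seed)
    slots = list(slots)
    rng.shuffle(slots)
    pos = dict(zip(cells, slots))      # random start, one slot per cell
    cost = wirelength(pos, nets)
    initial_cost = cost
    best_pos, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]          # trial move: swap two cells
            new_cost = wirelength(pos, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t) to escape local minima.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]      # reject: undo the swap
        t *= alpha                                   # cool down
    return best_pos, best_cost, initial_cost


# Hypothetical 8-cell chain placed on a 3 x 3 grid of slots.
cells = [f"c{i}" for i in range(8)]
nets = [(cells[i], cells[i + 1]) for i in range(7)]
slots = [(x, y) for x in range(3) for y in range(3)]
best_pos, best_cost, initial_cost = anneal(cells, nets, slots)
print(initial_cost, "->", best_cost)
```

Because the best placement seen so far is tracked separately,
the returned cost is never worse than that of the random
starting placement.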

are developed by the chip manufacturers through rigorous
development and testing. Extraction is done in the SPEF file
format. This extracted data is used for the final timing
analysis. If the timing meets the specification then the
GDSII is extracted and sent to the fabrication unit for
manufacturing the chip. If the timing is not met, some blocks
or the whole block should be rerouted and then checked for
timing again. If rerouting still does not satisfy the timing
window, the flow goes back to the placement stage, where it is
analysed whether timing and sizing are sufficient or more than
sufficient, and timing analysis is repeated. Finally, in the
worst case, if even placement does not solve the issues then
partitioning is done once more. The above discussion shows
that timing and cross verification at each step keeps
the project flow clear and successful.
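The timing check that drives this iteration reduces to a slack
calculation per path. The sketch below is a simplified Python
illustration assuming an ideal clock with zero skew; the clock
frequencies are those of this design, while the path names,
arrival times and setup time are hypothetical.

```python
def clock_period_ns(freq_mhz):
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def setup_slack_ns(period_ns, arrival_ns, setup_ns):
    """Setup slack = required time - arrival time (ideal clock, zero skew)."""
    return (period_ns - setup_ns) - arrival_ns


main_period = clock_period_ns(250.0)   # main system clock
gen_period = clock_period_ns(133.0)    # generated clock

# Hypothetical data-path arrival times (ns) and setup time (ns).
paths = {"pci_path": 3.2, "sdram_path": 4.1}
setup = 0.2

violations = [name for name, arrival in paths.items()
              if setup_slack_ns(main_period, arrival, setup) < 0]
print(main_period, round(gen_period, 2), violations)
```

A path with negative slack is the trigger for rerouting, and if
that fails, for going back to placement or partitioning.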
The flow diagram shown in figure 3.1 is one type of flow.
The basic steps like floor-planning, partitioning, power
planning, clock tree synthesis and routing are always
maintained. The way to perform them might vary from designer
to designer. The timing is checked at every step of
the physical design flow.

III. PHYSICAL DESIGN FLOW


Physical design involves partitioning,
floor-planning, placement, Clock Tree Synthesis, Routing,
compaction, extraction and verification. In figure 4.1 circuit
design and fabrication are out of the scope of this project. The
following flow diagram shows a brief overview of the digital
back end design flow.
A. Partitioning:
Partitioning is the process where the entire design block is
sub-divided into many sub-blocks depending on either
hierarchy or criticality of the blocks. A block is a logical
circuit which is a sub-part of the main logical circuit.
B. Floor-planning and Placement:
This step involves deciding the area which will be sufficient
to place and route the design without any congestion or
manufacturability issues. Placement starts with the critical
blocks, which are sensitive to noise or any third order
effects. Automated cell placement for VLSI
circuits has always been a key factor for achieving designs
with optimized area usage and timing behavior [4]. There are
essentially two types of placement that the IC Compiler
allows the designer to do, namely Timing-Driven placement
and Congestion-Driven placement. Besides these, other types
of placement include Block level placement and Hierarchical
placement. After placement, timing analysis is done by
considering the virtual routing lines.
C. CTS and Routing:
Once the timing and size of the design are frozen, the
clock tree is synthesized: clock tree buffers or
inverters are inserted to balance the load and skew between
the blocks. Routing is a critical stage in PNR, which is done
in 2 stages: global routing and detailed routing. The
routed metals add parasitic resistance and
capacitance to the design. If the routing proves too complex
then floor-planning and placement are redone, else
compaction is done.

Figure 3.1: Physical design flow diagram [5].

D. Compaction:
Compaction is the process where the redundant metals are
removed and the parasitics are extracted for actual
timing analysis.

IV. IMPLEMENTATION
A. Data Setup
The library or data set-up preparation is done before
proceeding to the physical implementation of the
block. It involves sourcing the required data and files into the
home directory. The following are the important files needed,
along with the parameters to be set as per the design.

E. Timing Verification:
In this step the parasitic resistance and capacitance are
extracted with the help of wire load models. Wire load models
1. Design Input data: synthesised gate level netlist. Gives
   the logical connectivity of the cells.
2. Constraints File: Synopsys Design Constraint.
   In this file timing specifications are specified for the
   design.
3. Libraries:
   i) model file
   ii) Physical libraries
   iii) technology libraries
   iv) Logical Library
4. Technology File:
   In this file fabrication related information like metal
   conductivity, width, spacing and layer name is specified.
5. TLU Plus:
   In this file the R and C values are specified.

TABLE OF FLOORPLAN

Table 4.2.1: Floor-plan information before pre-CTS


The floor-plan is specified by the aspect ratio, core
utilization, core to IO distance at top, bottom, left and right,
and boundary area. Table 4.2.1 shows the floor-planning
constraints.
Specifications:
Aspect Ratio: 1 (width=1, length=1)
Core Size: width 710um, height 710um
Block Size: width 750um, height 750um
Clock frequency: 250MHz (main system clock) and
133MHz (generated clock)
Power consumption acceptable: 50mW
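The geometric specifications above can be cross-checked with a
few lines of arithmetic. The following Python sketch uses the
core and block sizes listed above; it assumes that the core is
centered inside the block, which is not stated explicitly, and
the function name is hypothetical.

```python
def floorplan_summary(core_w, core_h, block_w, block_h):
    """Derive aspect ratio, core area and core-to-boundary spacing (all in um)."""
    return {
        "aspect_ratio": core_h / core_w,
        "core_area_um2": core_w * core_h,
        "block_area_um2": block_w * block_h,
        # Assumes the core is centered in the block.
        "margin_x_um": (block_w - core_w) / 2.0,
        "margin_y_um": (block_h - core_h) / 2.0,
    }


summary = floorplan_summary(710.0, 710.0, 750.0, 750.0)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under the centering assumption, the core to boundary distance is
the same on all four sides.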

The process flow shown in figure 4.1 is used for
implementing this project. Each process step comprises one
or more sub-steps. The inputs to the data set-up flow are the
Verilog netlist, technology files, and physical and logical
libraries. The output is GDSII in the form of a layout.

Figure 4.2: placement view of standard cells and macros in ICC tool.
The partial placement view of the design is shown in figure
4.2. The pink colored portion is standard cells and the green
layer is macro cells. A macro is hard IP, whose functionality
the physical designer cannot change.
C. Power Planning
Power planning is also called PG (power/ground)
planning. This step involves connecting the power and ground
nets to the VDD and VSS pins. Next, a power mesh is created
for proper distribution of power to each cell in the core. The
mesh is a square grid of straps, each specified by its start and
end location, its direction (horizontal or vertical), the
metal layer used for the power and ground route, and its
width.
Power planning involves power network synthesis, in which
power routes are laid with the required metal width.
The power routes use the highest metal layers
available.
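The strap parameters listed above (start, end, direction, layer
and width) can be captured in a small data structure. The
following Python sketch generates a regular mesh over the 710um
core, with horizontal straps on metal 5 and vertical straps on
metal 6 as reported for figure 4.3. The strap pitch, strap width
and the alternating VDD/VSS order are assumptions made only for
illustration.

```python
def mesh_straps(core_size_um, pitch_um, width_um, layer, direction):
    """Return evenly spaced power straps spanning the core in one direction."""
    straps = []
    center = pitch_um / 2.0
    index = 0
    while center + width_um / 2.0 <= core_size_um:
        straps.append({
            "net": "VDD" if index % 2 == 0 else "VSS",  # alternate power and ground
            "layer": layer,
            "direction": direction,
            "center_um": center,       # position across the strap direction
            "width_um": width_um,
            "start_um": 0.0,           # strap runs across the full core
            "end_um": core_size_um,
        })
        center += pitch_um
        index += 1
    return straps


CORE_UM = 710.0     # core size from the floor-plan specification
PITCH_UM = 100.0    # hypothetical strap pitch
WIDTH_UM = 4.0      # hypothetical strap width

horizontal = mesh_straps(CORE_UM, PITCH_UM, WIDTH_UM, "metal5", "horizontal")
vertical = mesh_straps(CORE_UM, PITCH_UM, WIDTH_UM, "metal6", "vertical")
print(len(horizontal), len(vertical))
```

In a real flow the pitch and width would follow from the power
budget and the IR drop target rather than being fixed by hand.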

Figure 4.1: Flow diagram of physical design of block


The diagram in figure 4.1 shows the complete flow of
the physical design. Some steps can be altered as and when
applicable. An analysis step follows every stage for accuracy.
B. Floorplanning
Floor-planning involves the area decision for the block and
chip; this is a very critical stage in chip design. It is done by
applying floor-plan constraints like aspect ratio, core
utilization and IO utilization. The floor-plan can be changed if
the design does not fit into the given area. Sometimes the
floorplan aspect ratio might change to accommodate all the
standard cells and macros. The core, macro and IO utilization
should be retained the same to maintain the logical
functionality of the design.
In this design the aspect ratio is 1, which means the block
should be square including the core, IO clearance and IO
ports. As this is a block level physical design, it does not
consider the IO pins/pads (where ESD issues need special
care). Each cell inside the block has its own pins for
interconnecting with other cells in the block. The block
boundary has ports which connect to ports of another block or
sometimes to IO pads too. The report related to the floor-plan
is given in the following table.

Figure 4.3: power metal routes for power supply in core area.

Figure 4.3 shows that the horizontal power lines are on metal 5
and the vertical power lines are on metal 6. These layers are
used because the power supply lines should not dissipate much
power.
Power network analysis involves calculating the
power consumed by the metal layers of the power routes.

The congestion analysis is done after placement of the cells.
Congestion is defined as the ratio of the routing tracks
required to the routing tracks available. Figure 4.5 shows the
most congested places in red.

D. Placement
Placement of the standard cells and macros is carried out
according to the specified rules and the floor-planned design.
The following flow diagram shows the placement steps.
Placement is an iterative process: initially all cells
are placed randomly, which is called coarse placement, and then
the cells are placed according to timing driven, congestion
driven or routability driven requirements. After one level of
placement is done, the placement is legalized and saved in the
database in db format for further steps of physical design
analysis and implementation.
i) Timing-driven placement tries to place cells along
timing-critical paths close together to reduce net RCs and
meet setup timing.
ii) Congestion driven placement considers the number of metal
routes that can be routed in the specified channel, and
spreads apart the cells that contribute to high congestion.
iii) Blockages are added or modified around cells, usually
macros. A blockage may be hard or soft. A blockage is a
keep-out region for cells between one macro and another or
between a macro and the boundary.
Congestion is calculated by,
Congestion = (number of nets crossing the global routing cell) /
(number of available routing tracks)
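The ratio above can be evaluated per global routing cell to
build a congestion map. The following Python sketch is a
minimal illustration; the grid coordinates, net counts, track
counts and the threshold of 1.0 are hypothetical values and not
data from this design.

```python
def congestion(nets_crossing, tracks_available):
    """Congestion ratio of one global routing cell (demand / supply)."""
    return nets_crossing / tracks_available


# Hypothetical map: (row, column) -> (nets crossing, available tracks).
grc_demand = {
    (0, 0): (8, 10),
    (0, 1): (12, 10),
    (1, 0): (5, 10),
    (1, 1): (10, 10),
}

THRESHOLD = 1.0  # above this ratio the cell needs more tracks than it has

congested = sorted(cell for cell, (nets, tracks) in grc_demand.items()
                   if congestion(nets, tracks) > THRESHOLD)
print(congested)
```

Cells above the threshold correspond to the hot spots of the
congestion map, which congestion driven placement tries to
relieve by spreading cells apart.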

Figure 4.6: Second iteration of congestion map


The main intention of the physical design is to reduce the
congestion by proper placement of the blocks. The green and
blue spots shown in figure 4.6 are areas of low congestion.

Figure 4.7: Final stage of congestion map.


Figure 4.7 shows the congestion map with the least
congested spots. The dark blue lines show the lowest
congestion value. When doing congestion analysis, a
threshold value is set: up to the threshold there is no
congestion, and above the threshold there is congestion.
E. Clock Tree Synthesis (CTS)
Clock tree synthesis is the process of generating the clock
network in the design for clock distribution. In this project
the clock network is built using the H-tree topology.
Figure 4.8 shows the clock signal connection from
the clock source to the target sequential cells.
Pre clock tree synthesis is just the connection of the clock
paths, without the clock tree buffers and inverters that
balance and optimize the clock tree. It
assumes zero skew, latency and transition. Figure 4.8 shows
the pre-CTS circuit. Post CTS includes the
clock tree buffers in the path of the clock routes.
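The appeal of the H-tree topology is that every sink sees the
same wire length from the clock root. The following Python
sketch builds an idealized H-tree and records that length per
sink. It is a geometric illustration only: the root position
(center of the 710um core), the first arm length and the number
of levels are assumptions, and a real clock tree also contains
buffers and follows the actual positions of the sequential
cells.

```python
def h_tree_sinks(cx, cy, half, levels, length=0.0):
    """Return (x, y, wire_length_from_root) for every sink of an ideal H-tree.

    Each level routes from the current center horizontally by `half`
    and then vertically by `half` to four corner points, which become
    the centers of the next level with half the arm length.
    """
    if levels == 0:
        return [(cx, cy, length)]
    sinks = []
    for dx in (-half, half):
        for dy in (-half, half):
            sinks += h_tree_sinks(cx + dx, cy + dy, half / 2.0,
                                  levels - 1, length + 2 * half)
    return sinks


# Hypothetical tree: root at the center of the 710um core, two levels.
sinks = h_tree_sinks(355.0, 355.0, 177.5, 2)
lengths = {length for _, _, length in sinks}
print(len(sinks), lengths)
```

With equal wire lengths and matched loads the skew between the
sinks is ideally zero, which is the starting point that buffer
insertion then has to preserve.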

Figure 4.4: placement view of standard cell and filler cells


Figure 4.4 shows the partial placement of the cells. In
this project the functionality is achieved mostly with basic
standard cells.

Figure 4.8: Clock generation and connection to the sink ports

Figure 4.5: Initial congestion map in design

The clock tree buffer insertion diagram is not shown; instead
the generated report is given in a table.
CTS inserts clock tree buffers and/or inverters to
balance the loads and to meet the skew, latency and transition
goals of the design. The insertion and optimization of the
clock tree buffers is done using specially designed
buffers with different driving strengths.
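The skew goal mentioned above can be stated as a simple check on
the clock latencies of the sinks. In the Python sketch below the
sink names, latency values and skew target are hypothetical; in
practice these figures come from the CTS report of the tool.

```python
def clock_skew(latencies_ns):
    """Global skew: difference between the largest and smallest sink latency."""
    values = list(latencies_ns.values())
    return max(values) - min(values)


# Hypothetical clock latencies (ns) from the clock source to three sinks.
latencies = {"ff_a": 0.82, "ff_b": 0.95, "ff_c": 0.88}
SKEW_TARGET_NS = 0.10   # hypothetical skew goal

skew = clock_skew(latencies)
meets_target = skew <= SKEW_TARGET_NS
print(round(skew, 2), meets_target)
```

When the target is missed, buffers of a different drive strength
are tried on the fastest or slowest branches until the latencies
are balanced.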

functioning of the chip. Table 4.3 shows the final CTS
network, which has no DRC violations, and the additional area
occupied by the clock tree cells.
F. Routing
The routing goal is to physically interconnect all the clock
and signal pins through metal interconnect and to meet the
set-up and hold timing and clock skew requirements. The metal
traces must also meet the physical DRC requirements.
The routing flow is as follows.

Figure 4.9: Congestion map after clock tree buffers insertion


Figure 4.9 shows the congestion map after clock tree
synthesis, where the clock tree cells are inserted. The
congestion values in figures 4.7 and 4.9 differ because of the
different cell counts.

Figure 4.10: Routing flow diagram


Figure 4.10 shows the routing flow of the block,
which involves global routing, track assignment, local
or detailed routing, and search and repair routing for an
accurate check.
Global Routing: Global routing is done during placement
of cells (both standard cells and macros). Usually this is done
with higher metal layers. Global routing is used for power and
clock routing, for less IR drop and flexible long routes over
cells.
Track Assignment: It assigns each net to a specific track
and lays down the actual metal traces. It tries to make long,
straight traces and also tries to reduce the number of vias. In
this stage some DRC violations will occur, such as notch
violations, which are resolved at a later stage.
Detailed Routing: Detailed routing is routing inside the
block with lower metals than the global routing metals. There
are many more nets inside the blocks. Detailed
routing attempts to clear DRC violations using a fixed size
PrBoundary.

Clock Tree Summary

Table 4.1: Clock tree network before cell insertion with DRC

Table 4.2: Clock tree network summary after cell insertion with DRC

Table 4.3: CTS summary showing the number of CT cells inserted with DRC clean
Table 4.1 shows the virtual clock tree network, where
no clock tree cells are inserted. It assumes that the clock path
is ideal with no DRC violations; the target points are listed in
the sinks column. Table 4.2 shows the summary of the clock tree
network, where the clock path is laid with the higher level
metals. It has some DRC errors, which need to be resolved for
proper
Figure 4.11: Global and Local Routing view.


The global and detailed routes use all the metal layers.
The power and clock routes should be done with higher metals;
in this design they use M4 and M5. Clock routing is very
critical, so shielding is used wherever the clock
is critical.

the placement stage and 3% after clock tree insertion.
Around 27000 standard cells, i.e. approximately 60,000 MOS
devices, have been placed in the assigned area of
751.76 x 748.48 um2. A power consumption of about 51mW is
achieved, including dynamic and static power. The entire design
flow is carried out with the Synopsys IC Compiler tool from the
Galaxy family.
The physical analysis of DRC is also carried out successfully
without any violations. The DRC rules are provided by the
TSMC 120nm Design Rule Manual (DRM). For routing critical
clocks and reducing IR drop at critical places, Non
Default Rules (NDR) are used to achieve a low power
and congestion free design.

Search and Repair: Search and repair divides the chip into
SBoxes (standard boxes), works through each SBox
sequentially and fixes DRC violations by rerouting within
the confined box. This stage also involves insertion of
redundant vias for better manufacturability. Figure 4.11 shows
the final routing view.
V. RESULTS AND DISCUSSIONS
A. Area Report Before CTS
Table 5.1 shows the summary area report before
clock tree synthesis, when clock tree cells were not yet
inserted. The core utilization is taken as 77% to place only
standard cells and macros. Pins are associated with each cell
and ports are the interface to another block in the chip. The
core area is partitioned into 192 rows with a height of 2.5um
and a standard tile width of 4.5um.
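Core utilization is the fraction of the core area occupied by
standard cells and macros. The Python sketch below applies this
definition to the 710um x 710um core from the floor-plan
specification and the 77% utilization quoted above; the split
between standard cell area and macro area is hypothetical, since
the individual areas are only given in table 5.1.

```python
def core_utilization(std_cell_area_um2, macro_area_um2, core_area_um2):
    """Fraction of the core area occupied by standard cells and macros."""
    return (std_cell_area_um2 + macro_area_um2) / core_area_um2


core_area = 710.0 * 710.0            # core size from the floor-plan, um^2
placeable_area = 0.77 * core_area    # area budget at 77% utilization

# Hypothetical split between standard cells and macros (um^2).
std_cells, macros = 300000.0, 88157.0
utilization = core_utilization(std_cells, macros, core_area)
print(round(placeable_area, 1), round(utilization, 2))
```

The remaining area is what is left for clock tree cells, filler
cells and routing resources in the later stages.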

VII. ACKNOWLEDGMENT
I acknowledge Dr. Siva Yallampalli, Professor, UTL
Technologies Ltd., Mr. Mahesh C, Design lead, Smartplay
Technologies Ltd and Dr. Ramesh K, Engineering Director,
Smartplay Technologies Ltd., for their continuous suggestions
and encouragement in carrying out this project. I also thank Ms.
Vijayasree for helping me to submit the paper to the journal.
REFERENCES

Table 5.1: Area report of design before CTS


B. Area Report After CTS
Table 5.2 shows the area summary of the block after the
clock tree synthesis step, where standard cells like
buffers and inverters are inserted in the path of the clock
route for balancing the load and skew. A comparison of table 5.1
and table 5.2 shows that the utilization, standard cell count
and pin count increased. This increase is because the clock
tree cells have been inserted.


Table 5.2: Area report after CTS


VI. CONCLUSION
With the ever increasing complexity of semiconductor devices
(ICs), driven by consumer demand for portable electronic
gadgets, multimillion gates need to be placed and
interconnected according to the functionality. This project
is an initial step towards multimillion gate integration.
The physical design of the interfacing module of PCI and
SDRAM in the RISC core is critical and requires accurate
timing, clocking and a DRC free module. In this project
accurate timing at placement and routing is achieved. This
project achieved a lowest congestion of approximately 2% at

[1] Stephan Rusu, Trends and challenges in VLSI Technology scaling
    towards 100nm, Intel Corporation, Sept 2011.
[2] Subrat Kumar Panda, Advanced VLSI Design Lab, Dept. of
    Computer Science, IIT Kharagpur.
[3] Ioannis Fudos, Xrysovalantis Kavousianos, Dimitrios Markouzis and
    Yiorgos Tsiatouhas, Placement and Routing in Computer Aided
    Design of Standard Cell Arrays by Exploiting the Structure of the
    Interconnection Graph, Computer-Aided Design & Applications,
    5(1-4), 2008, 325-337.
[4] Eisenmann H, Generic global placement and floor-planning, Design
    Automation Conference, IEEE, 1998, pages 269-274.
[5] Naveed Sherwani, Algorithm for physical design Automation, third
    edition, Kluwer Academic Publishers, page 16.
[6] Sylvester, D., Keutzer, K., Impact of small process geometries on
    microarchitectures in systems on a chip, Michigan Univ., Ann Arbor,
    MI, Proceedings of the IEEE, Apr 2001, pages 467-489.
[7] Yao-Wen Chang, Physical Design for Nanometer ICs, Department
    of Electrical Engineering, National Taiwan University, Spring 2012.
