Comprehensive Optimization Stage of DC


27/06/2023, 18:20 Tcl and Design Compiler (eight) - DC logic synthesis and optimization - IC_learner - 博客园


Tcl and Design Compiler (8) - DC logic synthesis and optimization

If there is an error in this article, please leave a message to correct it. In addition, please indicate the source when reposting: http://www.cnblogs.com/IClearner/ , author: IC_learner

Once the timing paths, working environment, design rules and so on have been constrained, DC can synthesize the design and optimize its timing. The optimization steps of DC are explained below. When the normal flow cannot meet the timing requirements, we need to write scripts that push DC's optimization further. The theoretical part focuses on logic synthesis and does not involve physical library information; in the hands-on part we will run DC in topographical mode. (This article is mainly a summary of Yu Xiqing's "A Practical Course on ASIC Design", extended with experiments.)
The main contents are:
·DC logic synthesis and optimization process
·Timing optimization and methods
·Hands-on practice

1. Synthesis optimization stages of DC

We can use the compile command to have DC synthesize and optimize our design. Here we use the normal mode; in topographical mode the compile command is not supported and compile_ultra is used instead. Circuit synthesis optimization consists of three stages, and the design is optimized in each of them, as shown in the following figure:

The three stages are Architectural-Level Optimization (first), Logic-Level Optimization (second) and Gate-Level Optimization (final).

(1) Architectural-Level Optimization

Architectural-level optimization includes the following:

① Implementation Selection:

https://www.cnblogs.com/iclearner/p/6636176.html 1/35

Select the most suitable structure or algorithm from DesignWare to implement the function of the circuit.

② Data-path Optimization:

Choose algorithms such as carry-save addition (CSA) to optimize the data path.

③ Sharing Common Subexpressions:

When several expressions/equations contain a common subexpression, that subexpression can be computed once and shared. For example, consider the equations:

SUM1<=A+B+C;

SUM2<=A+B+D;

SUM3<=A+B+E;

All three equations contain the common subexpression A+B, so A+B can be shared and the original equations can be rewritten as:

Temp=A+B;

SUM1<=Temp+C;

SUM2<=Temp+D;

SUM3<=Temp+E;

Sharing the common subexpression reduces the number of adders from six to four.
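To make the saving concrete, the sketch below counts two-input adders before and after the rewrite. It is plain Tcl (tclsh 8.5 or later), not a DC command. It assumes that an n-operand sum is built from n-1 two-input adders, and `adder_count` is a hypothetical helper name.

```tcl
# Simplified cost model (assumption): an n-operand sum uses n-1 two-input adders.
proc adder_count {sums} {
    set n 0
    foreach operands $sums {
        incr n [expr {[llength $operands] - 1}]
    }
    return $n
}

# SUM1..SUM3 written out directly: each one recomputes A+B.
set unshared {{A B C} {A B D} {A B E}}
# Temp=A+B computed once, then reused by SUM1..SUM3.
set shared {{A B} {Temp C} {Temp D} {Temp E}}

puts "without sharing: [adder_count $unshared] adders"
puts "with sharing:    [adder_count $shared] adders"
```

The model only counts operators. It says nothing about delay, because the shared version still has two adder levels on each SUM output.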
More
④ Resource Sharing:

For the code below:
After resource sharing, DC synthesizes a design that uses only one adder and two multiplexers, as shown in the figure below, thereby saving resources:
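The RTL and schematic for this example are not reproduced here. As an illustration, assume the code has the common form z = sel ? (a+b) : (c+d); this form is an assumption. The plain-Tcl sketch below models both structures:

- the unshared one: two adders followed by one multiplexer;
- the shared one: two multiplexers followed by one adder.

It then checks exhaustively over 3-bit operands that they give the same result. The proc names `add_then_mux` and `mux_then_add` are hypothetical.

```tcl
# Unshared structure: two adders, then a 2:1 multiplexer picks one sum.
proc add_then_mux {sel a b c d} {
    set s1 [expr {$a + $b}]
    set s2 [expr {$c + $d}]
    return [expr {$sel ? $s1 : $s2}]
}

# Shared structure: two 2:1 multiplexers pick the operands, then one adder.
proc mux_then_add {sel a b c d} {
    set x [expr {$sel ? $a : $c}]
    set y [expr {$sel ? $b : $d}]
    return [expr {$x + $y}]
}

# Exhaustive check over every select value and every set of 3-bit operands.
set mismatches 0
foreach sel {0 1} {
    for {set v 0} {$v < 4096} {incr v} {
        # Decode four 3-bit operands from one 12-bit counter.
        set a [expr {$v & 7}]
        set b [expr {($v >> 3) & 7}]
        set c [expr {($v >> 6) & 7}]
        set d [expr {($v >> 9) & 7}]
        if {[add_then_mux $sel $a $b $c $d] != [mux_then_add $sel $a $b $c $d]} {
            incr mismatches
        }
    }
}
puts "mismatches: $mismatches"
```

Sharing is legal here because the two additions are never needed in the same cycle; the select signal decides which one is used.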
The default strategy for arithmetic resource sharing is constraint-driven. We can also instruct DC to use an area-oriented strategy by setting the variable hlo_resource_allocation to area:

set hlo_resource_allocation area

If you do not want resource sharing, set hlo_resource_allocation to none. In that case, if some arithmetic operations should still share resources, you must write the sharing explicitly in the RTL code, as shown below:
⑤ Reordering Operators:

The RTL code describes the topology of the circuit. The HDL compiler parses expressions from left to right, and parentheses have higher precedence. DesignWare in DC uses this order as the starting point for reordering.

For example, for the expression SUM<= A*B+C*D+E+F+G, the structure synthesized by DC is shown in the figure below:

The total delay of the circuit equals the delay of one multiplier plus the delay of four adders. To reduce the delay, we can change the order of the terms or use parentheses to force a different topology, for example:

The resulting synthesized structure is:

The total delay of the circuit now equals the delay of one multiplier plus the delay of two adders, which is two adder delays less than the original circuit.
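The delay arithmetic above can be checked with a small plain-Tcl model that propagates arrival times through an expression tree. The following are all assumptions chosen for illustration:

- the tree encoding;
- the reordered form (A*B + C*D) + (E + F + G);
- the unit delays (multiplier 5, adder 1).

The reordered form only reaches "one multiplier plus two adders" when a multiplier is at least as slow as an adder. With these values the script prints 9 for the left-to-right chain (5 + 4×1) and 7 for the reordered tree (5 + 2×1).

```tcl
# Arrival time of an expression tree; leaves (primary inputs) arrive at time 0.
# Nodes are {op lhs rhs} with op = mul or add; tm and ta are the unit delays.
proc arrival {node tm ta} {
    if {[llength $node] == 1} {
        return 0
    }
    lassign $node op lhs rhs
    set t [expr {max([arrival $lhs $tm $ta], [arrival $rhs $tm $ta])}]
    switch -- $op {
        mul {return [expr {$t + $tm}]}
        add {return [expr {$t + $ta}]}
        default {error "unknown operator: $op"}
    }
}

# Left-to-right parse of A*B+C*D+E+F+G: a chain of four adders.
set chain {add {add {add {add {mul A B} {mul C D}} E} F} G}
# One possible reordering: (A*B + C*D) + (E + F + G).
set reordered {add {add {mul A B} {mul C D}} {add {add E F} G}}

# Hypothetical delays: multiplier = 5 units, adder = 1 unit.
puts "chain:     [arrival $chain 5 1]"
puts "reordered: [arrival $reordered 5 1]"
```

In the reordered tree, E+F+G is computed while the multipliers are still busy, so only two adders remain after the multiplier output.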

(2) Logic-Level Optimization

The content of logic optimization is as follows:


After architectural optimization, the function of the circuit is represented by GTECH cells. During logic-level optimization, structuring and flattening can be performed.

① Structuring optimization:

Structuring uses shared subexpressions to reduce logic and can be used for both speed and area optimization. It is DC's default logic-level optimization strategy. During structuring, DC searches the design for common subexpressions and adds intermediate variables and logic structure to the circuit. For example, consider the circuit below before optimization:

After structuring, the circuit and its logic expressions become:

Note that the shared subexpressions of logic-level structuring differ from those of the architectural-level optimization described earlier. Logic-level structuring shares subexpressions of gate-level (Boolean) logic, whereas architectural-level sharing works on common subexpressions of arithmetic circuits. Logic-level structuring does not change the design hierarchy. Use the following command to enable structuring:

set_structure true
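The figure for this example is not reproduced here, so the sketch below uses a hypothetical pair of functions, f = ac + ad and g = bc + bd. After structuring they share the factor t = c + d, giving f = a·t and g = b·t. The plain-Tcl proc `equivalent` (a hypothetical helper) tries every input combination to confirm that the structured form computes the same functions. The gate counts in the comments assume two-input gates.

```tcl
# Exhaustively compare two Boolean expressions over the named inputs.
# The expressions refer to the inputs as $a, $b, ... and are evaluated
# inside this proc, so input names must not clash with its local variables.
proc equivalent {vars e1 e2} {
    set n [llength $vars]
    for {set row 0} {$row < (1 << $n)} {incr row} {
        for {set i 0} {$i < $n} {incr i} {
            set [lindex $vars $i] [expr {($row >> ($n - 1 - $i)) & 1}]
        }
        if {[expr $e1] != [expr $e2]} {
            return 0
        }
    }
    return 1
}

# Before structuring: f = ac + ad, g = bc + bd (4 AND + 2 OR gates).
set f_before {($a && $c) || ($a && $d)}
set g_before {($b && $c) || ($b && $d)}
# After structuring with the shared factor t = c + d: f = a*t, g = b*t
# (1 OR + 2 AND gates); t is written out inline here.
set f_after {$a && ($c || $d)}
set g_after {$b && ($c || $d)}

puts "f equivalent: [equivalent {a b c d} $f_before $f_after]"
puts "g equivalent: [equivalent {a b c d} $g_before $g_after]"
```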

② Flattening optimization:

Flattening reduces combinational logic paths to two levels and produces a sum-of-products (SOP) circuit, that is, AND gates followed by an OR gate, as shown in the following figure:

This optimization is mainly used for speed, and the resulting circuit area may be large. Flattening is enabled with the following command (choose one of low, medium or high for -effort):

set_flatten true -effort low | medium | high

The default value of -effort is low, which gives good results for most designs; if the circuit does not flatten easily, the optimization stops. With -effort medium, DC spends more CPU time trying to flatten the design. With -effort high, flattening continues until it completes, which may take a long time.
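As a rough illustration of what "two levels" means, the plain-Tcl sketch below flattens a Boolean function into a canonical sum of products by reading its truth table. It emits one AND term for each input combination where the function is 1. The proc name `flatten_sop` is hypothetical.

DC would also minimize the result, which this sketch does not attempt. For example, the minimized SOP of (a+b)c is ac + bc.

```tcl
# Flatten a Boolean function of the named inputs into a canonical
# sum-of-products: one AND term (minterm) per true row of the truth table.
# A primed literal such as b' denotes an inverted input.
proc flatten_sop {vars body} {
    set n [llength $vars]
    set terms {}
    for {set row 0} {$row < (1 << $n)} {incr row} {
        set lits {}
        for {set i 0} {$i < $n} {incr i} {
            set name [lindex $vars $i]
            # The first variable is the most significant bit of the row.
            set bit [expr {($row >> ($n - 1 - $i)) & 1}]
            set $name $bit
            if {$bit} {lappend lits $name} else {lappend lits "${name}'"}
        }
        # Evaluate the function for this row using the inputs set above.
        if {[expr $body]} {
            lappend terms [join $lits ""]
        }
    }
    return [join $terms " + "]
}

puts [flatten_sop {a b c} {($a || $b) && $c}]
```

This prints a'bc + ab'c + abc: every term is a single AND of literals, and the terms are ORed, so any path passes through at most two logic levels (ignoring inverters).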

Comparison of Structuring Optimization and Flattening Optimization:

(3) Gate-Level Optimization

During gate-level optimization, Design Compiler starts mapping and completes the implementation of the gate-level circuit. The main contents are as follows:

The mapping optimization process consists of four phases:

Phase 1: delay optimization; Phase 2: design rule fixing; Phase 3: design rule fixing at the expense of timing; Phase 4: area optimization.

If we put an area constraint on the design, Design Compiler tries to reduce the area of the design in the final phase (Phase 4). Gate-level optimization has to map both combinational and sequential functions:

Mapping of combinational functions: DC selects combinational cells from the target library to build a design that meets the timing and area requirements, as shown in the figure below:


Mapping of sequential functions: DC selects sequential cells from the target library to build a design that meets the timing and area requirements. To improve speed and reduce area, DC selects more complex sequential cells, as shown below:


Design rule fixing works as follows. The technology library contains the design rules specified by the vendor for each cell: max_capacitance, max_transition and max_fanout. During mapping, DC checks whether the circuit meets these design rule constraints. If there is any violation, DC fixes it by inserting buffers and by resizing cells (changing their drive strength). The design rule fixing steps are as follows:
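The buffer insertion part of design rule fixing can be illustrated with a much-simplified model. Each load counts as one unit of fanout, and a net that exceeds max_fanout is split across one level of buffers. This is only a sketch under those assumptions; real fixing also uses fanout_load values, max_capacitance, max_transition and cell resizing. `buffer_plan` is a hypothetical name.

```tcl
# Simplified model (assumption): every load adds 1 to the fanout, and an
# overloaded net is split across a single level of identical buffers.
proc buffer_plan {loads max_fanout} {
    if {$loads <= $max_fanout} {
        return 0 ;# no violation, no buffers needed
    }
    # Each buffer may drive at most max_fanout loads (ceiling division).
    set nbuf [expr {($loads + $max_fanout - 1) / $max_fanout}]
    # The driver now sees nbuf loads, which must also respect max_fanout.
    if {$nbuf > $max_fanout} {
        error "needs more than one level of buffering"
    }
    return $nbuf
}

puts "20 loads, max_fanout 8: [buffer_plan 20 8] buffers"
puts " 6 loads, max_fanout 8: [buffer_plan 6 8] buffers"
```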

DC stops optimizing and ends synthesis when any of the following conditions is met:

① all constraints are satisfied; ② the user interrupts; ③ Design Compiler has reached the point of diminishing returns, that is, continuing synthesis would not improve the results much.

(4) Other optimizations (these require specific compile options)

For example, when one register drives many registers, design rules may be violated. DC can then duplicate the driving register and divide the driven registers among the copies, as shown in the following figure:

(In DC topographical mode, adding the -timing option makes DC apply this register replication automatically.)

Consider the case where a module is instantiated multiple times in the design:

In this case, DC copies the module for each instance during compilation. Each copy gets a unique name, so DC can optimize and map each copy according to its own environment, as shown in the following figure (the module names are unique):

In DC, the uniquify command generates a uniquely named copy of each multiply-instantiated module in the design. DC also generates these uniquely named copies automatically when it compiles the design. The variable uniquify_naming_style controls how each copy of a multiply-instantiated submodule is named; for detailed usage, run "man uniquify_naming_style" in DC.
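The effect of a naming style can be pictured with a small plain-Tcl sketch. It assumes a style string of the form %s_%d (module name followed by an index) and assumes the index starts at 0. Both are assumptions here, so check "man uniquify_naming_style" for the behavior of your DC version. `unique_names` is a hypothetical helper.

```tcl
# Sketch of multi-instance naming; the "%s_%d" style and the start index 0
# are assumptions made for this illustration.
proc unique_names {module count style} {
    set names {}
    for {set i 0} {$i < $count} {incr i} {
        lappend names [format $style $module $i]
    }
    return $names
}

puts [unique_names adder 3 "%s_%d"]
```

With these assumptions, three instances of a module called adder would become adder_0, adder_1 and adder_2, and each copy can then be mapped for its own environment.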

2. Timing optimization and methods

After DC synthesis, we check the detailed reports. If there are no violations (the design meets its timing and area requirements and violates no design rules), synthesis is complete. The gate-level netlist and design constraints can then be handed over to the back-end tools for placement, clock tree synthesis and routing to produce the GDSII file. If the design does not meet its timing or area requirements, or violates design rules, we need to analyze the problem, judge how serious it is, and then take appropriate measures. The problem is most often a timing problem, and the measures that can be taken when a timing violation occurs are shown in the following figure:

(1) When the violation is serious, that is, when the timing violation exceeds 25% of the clock period, the RTL code must be modified.

(2) When the timing violation is below 25% of the clock period, the following timing optimization methods are available:

① Use the compile_ultra command (here run in topographical mode)

Like compile, compile_ultra is a compile command. It is suited to high-performance designs with tight timing requirements, gives better delay QoR, and is especially suited to optimizing high-performance arithmetic circuits. The command is very easy to use because it automatically sets all the options and variables it needs. Some details of the command follow.

compile_ultra contains timing-centric optimization algorithms. The algorithms used during compilation are: A) timing-driven high-level optimization; B) selection of suitable macro-cell structures for arithmetic operations; C) selection of the best data-path implementation from the DesignWare library; D) mapping of wide-fanin gates to reduce the number of logic levels; E) aggressive use of logic replication for load isolation; F) automatic ungrouping of hierarchies on the critical path (auto-ungrouping of hierarchies).

compile_ultra supports the DFT flow and is simple to use. Its options are:


Part of the explanation is as follows:

-scan: performs a test-ready (DFT) compile;

-no_autoungroup: turns off automatic ungrouping;

-no_boundary_optimization: disables boundary optimization;

-no_uniquify: speeds up runtime for designs with multiply-instantiated modules;

-area_high_effort_script: high-effort area optimization;

-timing_high_effort_script: high-effort timing optimization.

The options above are summarized as follows:

When compile_ultra is used, all DesignWare hierarchies are automatically ungrouped if the following variable is set:

set compile_ultra_ungroup_dw true (default is true)

In other words, an adder or multiplier that you instantiate would otherwise be synthesized as an IP core or a separate module. With this variable set, the module boundaries no longer exist after synthesis, and you cannot tell which gates belong to the adder and which belong to the multiplier.

When compile_ultra is used with the following variable setting, any module in the design whose size (number of cells) is less than or equal to the value of the variable is automatically ungrouped:

set compile_auto_ungroup_delay_num_cells 100 (default=500)

For example, suppose module A is a small multiplier and module B is a small adder. Without this setting, after synthesis we can still see which gates implement the multiplier in module A and which implement the adder in module B, and there is a hierarchy boundary between the two modules. With the setting, the hierarchy of modules A and B is removed. We can no longer see which gates make up the multiplier, and even when we look at a particular AND gate, we cannot tell whether it belongs to the multiplier or the adder.
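The threshold rule described above can be mimicked in a few lines of plain Tcl. The module names and cell counts are hypothetical. DC's real auto-ungroup decision takes more into account than cell count; this sketch only mirrors the rule stated in this article, that a module is ungrouped when its cell count is less than or equal to the variable. `auto_ungroup` is a hypothetical helper.

```tcl
# Hypothetical cell counts per hierarchical module.
set modules {mult_small 80 add_small 40 fifo 300 alu 1200}

# Return the modules whose cell count is at or below the threshold.
proc auto_ungroup {modules threshold} {
    set ungrouped {}
    dict for {name cells} $modules {
        if {$cells <= $threshold} {
            lappend ungrouped $name
        }
    }
    return $ungrouped
}

puts "threshold 100: [auto_ungroup $modules 100]"
puts "threshold 500: [auto_ungroup $modules 500]"
```

Lowering the threshold from the default 500 to 100 keeps the medium-sized fifo hierarchy visible while still dissolving the two small arithmetic blocks.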

For optimal design results, we recommend using the compile_ultra command with the
DesignWare library .

Boundary optimization means that during compilation (also called synthesis), Design Compiler propagates constants, unconnected pins and complement (inversion) information across hierarchical boundaries and optimizes them, as shown in the following figure:

That is, boundary optimization optimizes away constant logic levels and fixed logic on boundary pins.

In addition, in DC Ultra (or DC topographical mode), we can use Behavioral ReTiming (BRT) to optimize the timing of a gate-level netlist as well as the register area. BRT increases design throughput by pipelining the gate-level netlist. BRT has two commands:

optimize_registers: for gate-level netlists that contain registers (a stand-alone command, not a compile_ultra option).

pipeline_design: for gate-level netlists of purely combinational circuits.

For register optimization, consider the following circuit, which contains both combinational logic and registers:

The register-to-register path delay in the second stage is 10.2 ns while the clock period is 10 ns, so this path violates timing. The register-to-register path delay in the first stage, however, is 7.5 ns, so it has slack. The optimize_registers command can move part of the combinational logic of the second stage into the first stage, so that every register-to-register path delay is less than the clock period and the setup-time requirements are met. optimize_registers first optimizes timing and then area. After optimization, the functionality of the circuit is unchanged at the input/output boundaries of the block. This command only works on gate-level netlists.
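Below is a back-of-the-envelope check of the retiming example, in plain Tcl. It makes two idealizing assumptions: logic can be moved between the two stages in arbitrarily small pieces, and register overhead is ignored. Under those assumptions, the 7.5 ns and 10.2 ns stages would balance at about 8.85 ns each, below the 10 ns clock. `retime_balance` is a hypothetical helper, not the optimize_registers algorithm.

```tcl
# Idealized model (assumptions): logic can be moved between adjacent stages
# in arbitrarily small pieces, and register clk->q/setup overhead is ignored.
proc retime_balance {stage_delays period} {
    set total 0.0
    foreach d $stage_delays {
        set total [expr {$total + $d}]
    }
    # Best case after retiming: every stage carries the same delay.
    set per_stage [expr {$total / [llength $stage_delays]}]
    return [list [expr {$per_stage <= $period}] $per_stage]
}

lassign [retime_balance {7.5 10.2} 10.0] ok per_stage
puts [format "meets timing: %d, delay per stage: %.2f ns" $ok $per_stage]
```

In practice only a little more than 0.2 ns of logic has to move, and DC can move only whole gates. The balanced figure is therefore an upper bound on what retiming can achieve, not a prediction.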

Besides using this command on its own, you can add the -retime option when compiling (apparently only compile_ultra has this option). With -retime, when one path fails timing and an adjacent path meets it, DC moves logic between the two paths so that both meet their requirements. This is also called adaptive retiming, as shown below:

An example of pipelining purely combinational logic follows:

The circuit on the left is purely combinational, with a path delay of 23.0 ns. Pipelining it yields the circuit on the right, which clearly has higher throughput. Note that when using this command, the pipeline registers must already be placed in the RTL design; otherwise DC does not know where these registers come from.

② Use the compile -scan -inc command

-inc requests incremental compilation (it is an abbreviation of -incremental_mapping), so this command performs an incremental compile that supports design for testability. During an incremental compile, DC performs only gate-level optimization, and the design is not converted back to GTECH, as shown in the following figure:

③ Combining custom path groups with critical ranges

Before introducing this optimization method, let's look at path grouping and path delay.

· Path grouping:

To make timing analysis easier, timing paths are further divided into groups. Paths are grouped by the clock that controls their endpoints; paths whose endpoints are not clocked are placed in the default path group. We can use the report_path_group command to report the path groups in the current design. For example, let's look at the paths and path groups in the following circuit:

The figure shows 5 endpoints (four registers and one output port):

- Clock CLK1 controls 3 endpoints, with 8 paths under its control.
- Clock CLK2 controls one endpoint, with 3 paths.
- The output port is an endpoint that is not controlled by any clock. Its only path starts at the clock pin of the second-level register, and that path is placed in the default path group.

The design therefore has 12 paths and 3 path groups in total: CLK1, CLK2 and the default path group.
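The grouping rule is easy to model in plain Tcl: each path goes to the group of the clock at its endpoint, and unclocked endpoints go to the default group. The endpoint names below are hypothetical and were chosen only so that the counts match the example above: 8 paths in CLK1, 3 in CLK2 and 1 in the default group, 12 in total. `group_paths` is a hypothetical helper.

```tcl
# Hypothetical endpoint -> capturing clock map ("" = not clocked).
set endpoint_clock {ff1/D CLK1 ff2/D CLK1 ff3/D CLK1 ff4/D CLK2 out1 ""}
# One entry per timing path, identified only by its endpoint.
set paths {ff1/D ff1/D ff1/D ff2/D ff2/D ff2/D ff3/D ff3/D
           ff4/D ff4/D ff4/D out1}

# Count paths per group: the group is the endpoint's clock, or "default".
proc group_paths {paths endpoint_clock} {
    set groups [dict create]
    foreach ep $paths {
        set clk [dict get $endpoint_clock $ep]
        set group [expr {$clk eq "" ? "default" : $clk}]
        dict incr groups $group
    }
    return $groups
}

puts [group_paths $paths $endpoint_clock]
```

It prints the count per group in first-seen order: CLK1 8 CLK2 3 default 1.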

· Path delay:

When calculating path delay, Design Compiler breaks each path into timing arcs, as shown in the following figure:

DC calculates path delay from these timing arcs, because timing arcs describe the timing characteristics of cells and/or nets. Cell timing arcs are defined by the technology library and include the cell's delays and timing checks, such as a register's setup/hold checks and its clk->q delay. Net timing arcs are defined by the netlist. In the circuit above, the timing arcs consist of net delays, cell delays and the register clk->q delay.

Cell delay is usually calculated with a nonlinear delay model. Before layout, net delay is calculated with a wire load model, and the distribution of the RC parasitics is determined by the tree_type attribute of the operating conditions. The operating conditions set the process, voltage and temperature, which affect both net and cell delays.

In addition, the path delay depends on the edge at the startpoint, as shown in the following figure:

Assuming zero net delay, the delay of this path is 1.5 ns when the startpoint has a rising edge and 2.0 ns when it has a falling edge, which shows that cell timing arcs are edge-sensitive. Design Compiler accounts for this edge sensitivity in every path delay.

It should also be emphasized that, by default, Design Compiler assumes the maximum delay constraint between registers is T_CLK - FF2_libSetup, the clock period minus the library setup time of the capturing flip-flop FF2. That is, the data delay from the launch edge to the capture edge must be less than one clock period minus that setup time, as shown in the following figure:

In Design Compiler, the report_timing command is commonly used to report whether the timing of the design meets its targets. When executing report_timing, DC performs 4 steps:

· Break the design into separate path groups;

· Calculate the delay of each path twice, once with a rising edge at the startpoint and once with a falling edge;

· Find the critical path in each path group, that is, the path with the largest delay;

· Display a timing report for each path group.

We will introduce how to read the timing report later.
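The second and third steps can be sketched in plain Tcl. Each path is timed for a rising and a falling startpoint edge, the larger delay is kept, and the worst path of each group is reported. The path names and delays are hypothetical; path p1 reuses the 1.5 ns / 2.0 ns rising/falling values from the example above. `critical_paths` is a hypothetical helper.

```tcl
# Hypothetical paths: {group name rise_delay fall_delay} in ns.
set paths {
    {CLK1 p1 1.5 2.0}
    {CLK1 p2 1.8 1.6}
    {CLK2 p3 0.9 1.1}
}

# Keep, for each path group, the path with the largest edge-sensitive delay.
proc critical_paths {paths} {
    set worst [dict create]
    foreach p $paths {
        lassign $p group name rise fall
        # Each path is timed twice: rising and falling startpoint edge.
        set delay [expr {max($rise, $fall)}]
        if {![dict exists $worst $group] || $delay > [lindex [dict get $worst $group] 1]} {
            dict set worst $group [list $name $delay]
        }
    }
    return $worst
}

puts [critical_paths $paths]
```

Here p1 becomes the CLK1 critical path only because of its falling-edge delay (2.0 ns). If only the rising edge were timed, p2 (1.8 ns) would appear worse.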

By default, DC optimizes the critical path. Synthesis stops when DC cannot find a better solution for the critical path, and DC does not further optimize the sub-critical paths. Therefore, if the critical path cannot meet its timing requirement, the sub-critical paths are not optimized; they are only mapped to the technology library, as shown in the following figure:

For the following circuit, assume that after the design constraints are applied, all paths belong to the same clock, that is, there is only one path group:

Suppose optimization of the combinational part cannot meet the timing requirements and the critical path lies in that combinational logic. By DC's default behavior, optimizing that critical path then blocks the optimization of the register-to-register paths in the same clock group. There are two ways to prevent this: custom path groups and critical ranges.

User-Defined Path Groups:

During synthesis, the tool optimizes only the worst (longest-delay) path of each path group, independently for each group, so work on one path group does not hinder path optimization in another custom path group. Custom path groups also let the synthesizer take a divide-and-conquer approach to timing analysis, because report_timing reports the timing paths of each path group separately. This helps us isolate a region of the design, gain more control over the optimization and analyze problems, as shown in the following figure:

The command to generate a custom path group is as follows:

#Avoid getting stuck on one path in the reg-reg group

group_path -name INPUTS -from [all_inputs]

group_path -name OUTPUTS -to [all_outputs]

group_path -name COMBO -from [all_inputs] -to [all_outputs]


The commands above create three custom path groups. Together with the original path group (the register-to-register paths, which are controlled by CLK and therefore belong to the CLK group by default), there are now 4 path groups. The paths of the combinational circuit, from inputs directly to outputs, end up in the COMBO group:

- Because their startpoints are input ports, after "group_path -name INPUTS -from [all_inputs]" is executed they first belong to the INPUTS group.
- Executing "group_path -name OUTPUTS -to [all_outputs]" does not move them into the OUTPUTS group, because the -from option has higher priority than the -to option, so they stay in the INPUTS group.
- "group_path -name COMBO -from [all_inputs] -to [all_outputs]" uses both -from and -to, and the startpoints and endpoints of the combinational paths satisfy both, so these paths finally belong to the COMBO group.

DC works this way so that the result does not depend on the order of the commands. We can use the report_path_group command to list the timing path groups in the design.
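The order-independence described above can be sketched in plain Tcl. Each matching group_path specification is ranked by how specific it is: -from and -to first, then -from only, then -to only. The best match is kept. The specification format, the port names and the `assign_group` proc are hypothetical; the sketch mirrors the behavior described in this article, not DC's internal code.

```tcl
# Each spec is {name from_list to_list}; "*" means the option was not given.
# Rank of a matching spec: -from and -to (3) > -from only (2) > -to only (1).
proc assign_group {specs start end clock_group} {
    set best $clock_group
    set best_rank 0
    foreach spec $specs {
        lassign $spec name from to
        set from_ok [expr {$from eq "*" || $start in $from}]
        set to_ok [expr {$to eq "*" || $end in $to}]
        if {!($from_ok && $to_ok)} {
            continue
        }
        set rank [expr {($from ne "*" ? 2 : 0) + ($to ne "*" ? 1 : 0)}]
        if {$rank > $best_rank} {
            set best $name
            set best_rank $rank
        }
    }
    return $best
}

# Hypothetical ports: inputs in1/in2, output out1; ff1/CP and ff2/D are registers.
set specs {{INPUTS {in1 in2} *} {OUTPUTS * {out1}} {COMBO {in1 in2} {out1}}}
# The same result is printed for both command orders.
foreach order [list $specs [lreverse $specs]] {
    puts [list [assign_group $order in1 out1 CLK] \
               [assign_group $order in1 ff2/D CLK] \
               [assign_group $order ff1/CP out1 CLK] \
               [assign_group $order ff1/CP ff2/D CLK]]
}
```

Both lines print COMBO INPUTS OUTPUTS CLK. Because the decision depends only on specificity, reversing the order of the specifications does not change any assignment.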

After the custom path groups are created, path optimization proceeds as shown in the figure below, and the register-to-register paths can now be optimized:

DC can also assign a weight to each path group. When some paths have relatively poor timing, giving their group a higher weight makes DC concentrate on them. In this scheme the highest weight is 5, followed by 2, and the default is 1, so the worst group should be given 5. As shown in the figure below, the following command makes DC concentrate on optimizing the CLK path group:
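The exact command from the figure is not reproduced in this copy. To see why a weight shifts DC's attention, the plain-Tcl sketch below uses a simplified max-delay cost in which each violating path group contributes its weight times its worst violation. Treat this formula and the slack values as assumptions of the sketch. Raising the CLK weight from 1 to 5:

- raises the cost from 0.60 to 2.20;
- raises CLK's share of that cost from about two thirds to about 91%.

`max_delay_cost` is a hypothetical helper.

```tcl
# Simplified max-delay cost (assumption): sum over path groups of
# weight x worst violation. groups: dict name -> {weight worst_slack_ns}
proc max_delay_cost {groups} {
    set cost 0.0
    dict for {name info} $groups {
        lassign $info weight wns
        # Only violating groups (negative worst slack) add to the cost.
        if {$wns < 0} {
            set cost [expr {$cost + $weight * -$wns}]
        }
    }
    return $cost
}

set equal    {CLK {1 -0.4} INPUTS {1 -0.2}}
set weighted {CLK {5 -0.4} INPUTS {1 -0.2}}
puts [format "equal weights: %.2f" [max_delay_cost $equal]]
puts [format "CLK weight 5:  %.2f" [max_delay_cost $weighted]]
```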

· Critical Range:

By default, DC optimizes only the timing of the critical path in a path group, but we can also have DC optimize the paths whose delay lies within a given amount of the critical path delay. The critical range is set with the following command:

set_critical_range 2 [current_design]

After this command, DC optimizes all paths within 2 ns of the critical path. Fixing the timing of related sub-critical paths may also help the optimization of the critical path. The timing optimization is illustrated as follows:
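The following plain-Tcl sketch shows which paths fall inside the critical range. The path delays are hypothetical. The 2 ns range matches the set_critical_range 2 example and is about 9% of the 22 ns critical path, within the 10% guideline this article recommends. `paths_in_range` is a hypothetical helper.

```tcl
# Return the paths whose delay is within critical_range of the worst path.
proc paths_in_range {delays critical_range} {
    set worst [tcl::mathfunc::max {*}[dict values $delays]]
    set picked {}
    dict for {name d} $delays {
        if {$d >= $worst - $critical_range} {
            lappend picked $name
        }
    }
    return $picked
}

# Hypothetical path delays (ns) in one path group.
set delays {p1 22.0 p2 20.5 p3 19.5}
puts [paths_in_range $delays 2.0]
puts [paths_in_range $delays 0.0]
```

With a 2 ns range, DC would work on p1 and p2. With the default range of 0, only the critical path p1 is optimized.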


DC does not improve a sub-critical path if doing so would make the critical path worse. We recommend keeping the critical range below 10% of the critical path delay.
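The path selection implied by a critical range can be sketched as follows. This is a toy model. It assumes DC considers only violating paths whose slack lies within the critical range of the group's worst slack. The slack values are hypothetical, and the 10% check simply encodes the rule of thumb above.

```python
def in_critical_range(slacks, critical_range):
    """Violating paths whose slack is within `critical_range` ns of the worst slack."""
    worst = min(slacks)
    return sorted(s for s in slacks if s < 0 and s <= worst + critical_range)


def range_within_guideline(critical_range, critical_path_delay):
    """Rule of thumb from the text: critical range <= 10% of the critical path delay."""
    return critical_range <= 0.1 * critical_path_delay


slacks = [-2.5, -1.0, -0.4, 0.3]          # hypothetical endpoint slacks (ns)
print(in_critical_range(slacks, 2.0))     # [-2.5, -1.0]
print(in_critical_range(slacks, 0.0))     # [-2.5]: only the critical path
print(range_within_guideline(2.0, 10.0))  # False: 2 ns exceeds 10% of 10 ns
```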

· Custom Path Groups + Critical Range

This approach combines custom path groups with critical ranges. Instead of one critical range for the whole design, each path group gets its own critical range. The commands are as follows:

group_path -name CLK1 -critical_range 0.3

group_path -name CLK2 -critical_range 0.1

group_path -name INPUTS -from [all_inputs] -critical_range 0

Using custom timing path groups and critical ranges together makes DC run longer and use more memory. The approach is still worth trying, because by default DC optimizes only the critical path in each path group. If DC cannot make that path meet timing, it does not try to optimize the other paths in the group. If DC optimizes more paths, it may do a better job on other parts of the design. In data-path designs many timing paths are interrelated, so optimizing the sub-critical paths may also improve the timing of the critical path. With a critical range set, DC may not reduce the design's Worst Negative Slack (WNS, the most negative slack of any path). Even so, it can reduce the Total Negative Slack (TNS, the sum of the negative slacks of all violating paths).
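WNS and TNS are easy to compute from a list of endpoint slacks. The two hypothetical lists below show the situation described above: the sub-critical paths improve, so TNS shrinks, while WNS stays the same. This is an illustration, not output from DC.

```python
def wns(slacks):
    """Worst negative slack: the most negative slack, or 0 if nothing violates."""
    return min(0.0, min(slacks))


def tns(slacks):
    """Total negative slack: the sum of all negative slacks."""
    return sum(s for s in slacks if s < 0)


before = [-0.5, -0.4, -0.3, 0.2]  # hypothetical slacks before critical-range optimization
after = [-0.5, -0.1, 0.05, 0.2]   # sub-critical paths fixed, critical path unchanged
print(wns(before), wns(after))    # -0.5 -0.5
print(tns(before), tns(after))    # TNS improves from about -1.2 to about -0.6
```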

The key differences between custom path groups and critical ranges are as follows:

Custom path group: after the user defines path groups, DC may sacrifice the timing of one path group (let it get worse) to improve the timing of another, as long as the overall design improves. Adding a path group to a design can therefore make the timing of the worst path worse.

Critical range: DC does not let the critical path of a path group get worse in order to improve that group's sub-critical paths. Suppose the design has several path groups and we set a critical range for only one of them, rather than for every path group in the design. In that case DC optimizes only a few additional paths in parallel, and the run time does not increase much.

④ Repartitioning Blocks

Modules are normally partitioned at the start of a design. Since this series focuses on using the DC tool, we cover partitioning here.

· Hierarchical structure and module division:

Hierarchical structures are widely used in IC design, and almost no modern IC design is without hierarchy. Some large designs have as many as ten levels of hierarchy. SoC designs generally include reused designs and intellectual property (IP) cores, and consist of multiple levels of circuitry. The trends in hierarchical IC design are as follows:

An SoC design consists of several blocks, as shown in the figure below:

Similarly, a synthesized logic circuit such as RISC_CORE in the figure is generally made up of sub-modules. A large, complex circuit is partitioned, and the resulting smaller, simpler circuits are then processed (for example, synthesized) separately. Because each piece is small, processing and analysis are simpler and the requirements can be met quickly. The processed pieces are then integrated back into the original large circuit, as shown in the figure below:

· Ideally, all partitions should be planned before the HDL code is written.

· The initial partitioning is defined by the HDL.

· The original partitioning can be modified with Design Compiler.

There are many reasons for partitioning; here are a few of them:

· Different functional blocks (such as memory, uP, ADC, codec, controller, etc.);

· Design size and complexity. Module processing time should be moderate: a block is typically sized for one night of run time, so that manual processing and debugging happen during the day, the machine runs at night, and the results are checked the next morning;

· Easier project management for the design team (each design engineer is responsible for one or several modules);

· Design reuse (using IP in the design);

· Satisfying physical constraints (for example, building an engineering sample on an FPGA first; a large design may need to be implemented on several FPGA chips);

· etc.

Module partitioning affects timing: if timing is poor, the modules can be repartitioned. The design should therefore be partitioned with care from the start.

You define the hierarchy and blocks of a design through instantiation. A VHDL entity or a Verilog module statement defines a new hierarchical module; that is, instantiating an entity or module creates a new level of hierarchy. Arithmetic operators (+, -, *, /, ...) in the design may also create a new level of hierarchy. A VHDL process or a Verilog always block does not create a new level of hierarchy. To obtain the optimal circuit, we need to plan the hierarchy of the whole circuit. The design must be partitioned so that the synthesis results of each module, and of the whole circuit, meet our goals.

For example, consider the following design:

It has three modules, A, B and C, each with its own input and output ports. DC must preserve the ports of each module when synthesizing the whole circuit. As a result, logic optimization cannot cross block boundaries, and combinational logic in adjacent blocks cannot be merged. The path from register A to register C therefore has a long delay, and this part of the circuit has a large area.

Now suppose we change the partitioning and put the related combinational logic into one module. The combinational logic originally in modules A, B and C is then no longer separated by hierarchy, and the synthesis tool's combinational optimization techniques can be fully applied. The circuit area becomes smaller, and the delay of the path from register A to register C becomes shorter. The modified partitioning is shown in the following figure:

If we modify the partitioning once more, as follows, we obtain the best partitioning:

This modification puts the related combinational logic into one module. The combinational logic of the original modules A, B and C is no longer separated by hierarchy, so the synthesis tool's combinational optimization techniques can be fully applied.

Moreover, the combinational logic now drives the data input of the register. When optimizing the sequential logic, the synthesis tool can therefore choose more complex flip-flops (JK, T, muxed, clock-enabled, etc.) and absorb part of the combinational logic into the flip-flop. The circuit area becomes smaller, and the delay of the path from register A to register C becomes shorter.

For a typical design, a good module partitioning is shown in the following figure:

With this partitioning, the output boundary of each module is a register output. No combinational logic is split across a boundary, and the combinational logic drives the data input of a register. We can therefore make full use of the synthesis tool's combinational and sequential optimization techniques to get the best results, and the design constraints become simpler. In the figure, every input port of each module except the clock port has the same input delay: the delay from the register's clock pin CLK to its output pin Q. This makes the timing constraints easier to write, as mentioned in the earlier article on timing path constraints.
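Every block output is registered. The input delay of a receiving block is therefore just the driving register's clock-to-Q delay, which is the value a set_input_delay constraint would carry. The budget left for the input-side logic then follows from ordinary setup arithmetic. This is a minimal sketch that ignores the wire delay between blocks; the numbers are hypothetical.

```python
def input_budget(period, clk_to_q, setup, uncertainty=0.0):
    """Time left for a block's input logic when the driving block registers its outputs.

    The external input delay equals the driving register's clk-to-Q delay, so
    every data input port sees the same input delay. Wire delay between the
    blocks is ignored in this sketch.
    """
    input_delay = clk_to_q
    return period - input_delay - setup - uncertainty


# Hypothetical numbers in ns: 2 ns clock period, 0.15 ns clk-to-Q, 0.1 ns setup
print(input_budget(2.0, 0.15, 0.1))  # about 1.75
```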

The above is the recommended partitioning style. The following describes partitioning styles to avoid.

When partitioning, avoid glue logic wherever possible. Glue logic is shown in the following figure:

Glue logic is combinational logic that connects modules. In the figure, the top-level NAND gate is just an instantiated cell. Because glue logic cannot be absorbed into other modules, its optimization is limited. With a bottom-up strategy, an additional compile is also needed at the top level. A partitioning that avoids glue logic looks like this:


Now the glue logic can be optimized together with the other logic, and the top-level design is just a structural netlist that needs no further compilation.

· Modifying the module partitioning

The first partitioning may produce timing violations, so the modules may need to be repartitioned. Here we introduce how to modify the partitioning. The larger the design, the more computer resources synthesis needs and the longer it runs. Design Compiler itself does not limit the size of a design. When compiling, however, we need to make sure that each partitioned module fits the available CPU and memory resources. Avoid the following improper partitions:

The module is too small: the artificial module boundaries limit optimization, so the synthesis result may not be optimal.

The module is too large: the compile run time may be too long, and the short design cycle does not allow us to wait that long.

Generally speaking, modules are sized at about 400K to 800K gates. This figure reflects the available computer resources, the speed of the synthesis software and the expected turnaround time. A reasonable synthesis run time is one night. During the day we design and modify the circuit and write the compile scripts. Before leaving work, we run a script that reads the design into DC and synthesizes it, and we check the results the next morning.

When partitioning, separate the following into different modules:

· the core logic;

· the I/O pads;

· the clock generation circuitry;

· the asynchronous circuitry;

· the JTAG (Joint Test Action Group) circuitry.

The top-level design is divided into at least three levels of hierarchy (top level, mid level and functional core), as shown in the figure below:

This partitioning is used for several reasons:

· the I/O pad cells are process-dependent;

· the clock divider (clock generation) circuit is untestable;

· the JTAG circuit is process-dependent;

· asynchronous circuits are designed, constrained and synthesized differently from synchronous circuits.

These circuits are therefore placed in modules separate from the core functionality.

This article mainly covers the design and synthesis of synchronous circuits. To get good synthesis results with a moderate run time, the design must be partitioned properly. If the existing partitioning does not meet the requirements, we need to change it, either by modifying the original RTL code or by using DC commands. The following describes how to modify partitions with DC commands.

DC can modify partitions in two ways: automatically and manually.

Automatic repartitioning:

DC can modify the partitions transparently during synthesis. Suppose you use the following command in DC:

compile -auto_ungroup area | delay (choose one of area and delay)

DC then automatically removes (ungroups) small module hierarchies during synthesis.

Which hierarchies are ungrouped is controlled by the following variables (also mentioned earlier):

compile_auto_ungroup_delay_num_cells

compile_auto_ungroup_area_num_cells

Their default values are 500 and 30, respectively, and we can set them to any value we want with the set command. The report_auto_ungroup command reports the hierarchies that were ungrouped during compilation. Alternatively, we can use the following command in DC:

compile -ungroup_all

DC then ungroups every module hierarchy during synthesis, leaving only the top level of the design. This command cannot ungroup hierarchies that have the dont_touch attribute.
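The size-based part of automatic ungrouping can be sketched as below. This is a simplified model with the following assumptions:

* It only compares each hierarchy's leaf-cell count with the default thresholds quoted above.
* It ignores the timing analysis DC performs in delay mode.
* The hierarchy names and sizes are hypothetical.
* `keep` stands in for hierarchies protected by dont_touch or by an ungroup attribute of false (set_ungroup).

```python
# Defaults quoted in the text above
AUTO_UNGROUP_THRESHOLDS = {
    "area": 30,    # compile_auto_ungroup_area_num_cells
    "delay": 500,  # compile_auto_ungroup_delay_num_cells
}


def auto_ungroup_candidates(hier_cell_counts, mode, keep=()):
    """Hierarchies smaller than the mode's threshold, minus protected ones.

    hier_cell_counts maps hierarchy name -> number of leaf cells.
    keep lists hierarchies protected from ungrouping.
    Size-only model: in delay mode DC also weighs timing, which is ignored here.
    """
    limit = AUTO_UNGROUP_THRESHOLDS[mode]
    return sorted(name for name, cells in hier_cell_counts.items()
                  if cells < limit and name not in keep)


hiers = {"U_SMALL": 12, "U_MID": 120, "U_BIG": 2000}  # hypothetical hierarchy sizes
print(auto_ungroup_candidates(hiers, "area"))                   # ['U_SMALL']
print(auto_ungroup_candidates(hiers, "delay"))                  # ['U_MID', 'U_SMALL']
print(auto_ungroup_candidates(hiers, "delay", keep={"U_MID"}))  # ['U_SMALL']
```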

Manual repartitioning:

With manual repartitioning, the user specifies every change with commands. The group and ungroup commands modify the partitions in the design, as shown in the following figure:

The group command creates a new level of hierarchy, as shown in the following figure:

The ungroup command removes one or all levels of hierarchy; the effect is shown in the following figure:

To remove all levels of hierarchy in the current design, use the following command:

ungroup -all -flatten

With the -simple_names option, the ungroup command gives the ungrouped cells their original, non-hierarchical names, U2 and U3:

ungroup U23 -simple_names

The resulting effect is shown in the figure below:
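The effect of -simple_names on cell names can be modeled by flattening a nested dictionary. This is a toy sketch with these assumptions:

* Default names use "/" as the hierarchy separator.
* It flattens every level, like ungroup -all -flatten.
* It raises an error when simple names would collide. This is a choice of the sketch only; how DC resolves such clashes is not covered here.

The hierarchy is hypothetical.

```python
def flatten(design, simple_names=False, sep="/", prefix=""):
    """List the leaf cells left after removing all hierarchy.

    design maps instance name -> None (leaf cell) or a nested dict (sub-block).
    By default a leaf keeps its parent path in its name (e.g. "U23/U2"); with
    simple_names=True it keeps only its own name (e.g. "U2").
    """
    leaves = []
    for name, body in design.items():
        full = name if simple_names else prefix + name
        if body is None:
            leaves.append(full)
        else:
            leaves.extend(flatten(body, simple_names, sep, full + sep))
    if len(set(leaves)) != len(leaves):
        raise ValueError("duplicate leaf names after flattening")
    return leaves


top = {"U1": None, "U23": {"U2": None, "U3": None}}  # hypothetical hierarchy
print(flatten(top))                     # ['U1', 'U23/U2', 'U23/U3']
print(flatten(top, simple_names=True))  # ['U1', 'U2', 'U3']
```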

Finally, so that the design does not have to be repartitioned again, here is a summary of the partitioning strategy:

· Do not separate combinational logic with hierarchical boundaries.

· Use register outputs as partition boundaries.

· Keep module sizes moderate so that run times are reasonable.

· Separate the core logic, pads, clock generation circuit, asynchronous circuits and JTAG circuit into different modules.

This partitioning has three advantages:

· better results (a smaller, faster design);

· a simpler synthesis process (simpler constraints and scripts);

· faster compilation (faster turnaround time).

The following is the hands-on part.

3. Actual combat
In this hands-on session, we practice DC's synthesis optimization techniques according to the given schematics and synthesis specification. We work in topographical mode, so some physical design content is involved. Let's proceed step by step.

Design schematics:

(Schematic of the top-level module:)

(Schematic of sub-module 1:)

(Schematic of sub-module 2:)

(Schematic of sub-module 3:)

Synthesis specification:

(Description of available resources:)

(Design and constraint file description:)

(Floorplan description:)

(Design specification:)

First, let's briefly analyze this synthesis specification:

Available resources: a script checks how many cores your computer has available for synthesis. We skip this step here.

Design and constraint files: this tells us the names of the RTL files and the location and names of the constraint files. We do not need to change the RTL files or the timing and environment constraints.

Floorplan: since we synthesize in topographical mode, the floorplan provides the physical constraint information.

Design specification: this is the actual synthesis specification. It tells us which modules need special handling during synthesis to meet certain requirements. We will go through its 10 requirements later in the flow.

· Set up the .synopsys_dc.setup startup file to configure the DC environment

(Same as before; not described here.)

· Write the design constraint files. The design specification has no timing or environment requirements, and the constraint files are provided, so we don't need to write them. Let's take a look:

Timing and environment attribute constraints:

From top to bottom, they are: clear previous constraints, clock constraints, input port delay
constraints, input port environmental attribute constraints, output port delay constraints, and
output port environmental attribute constraints.

The physical constraints derived from the floorplan are as follows:

The constraints are ready, so we can start DC.

· Start DC and run the checks before reading in the design

(Same as in the previous chapter; not repeated here.)

· Create an SVF file for Formality, so that the changes made by transformations such as retiming are recorded for formal verification. The command is as follows:

set_svf STOTO.svf

· Read in design and check design

(Covered in previous chapters; not repeated here.)

· Apply the timing constraints, check them for problems with check_timing (for example, unconstrained paths), and apply the non-default physical constraints:

source STOTO.con

check_timing

source STOTO.pcon

report_clock

Different optimization commands are applied according to the design specification:

--> Requirements 1 and 2: the I/O constraints are conservative values that may change, while the final design must meet timing on the register-to-register paths. We therefore group the paths and give more weight to the clock group, which holds the register-to-register paths. The commands are as follows:

group_path -name clk -critical 0.21 -weight 5

group_path -name INPUTS -from [all_inputs]

group_path -name OUTPUTS -to [all_outputs]

group_path -name COMBO -from [all_inputs] -to [all_outputs]

Then we can check the settings with report_path_group; the results are as follows:

--> Requirement 3: the hierarchy of the INPUT module must be preserved. Requirement 4: the PIPELINE module must be register-retimed as a pure pipeline, so its hierarchy must not be removed either. We therefore set:

set_ungroup [get_designs "PIPELINE INPUT"] false

After setting it, we check that the setting is correct (if it is, the command returns false):

get_attribute [get_designs "PIPELINE INPUT"] ungroup

As shown below:

ungroup is the attribute that controls whether a hierarchy may be removed. A value of true allows the hierarchy to be ungrouped, so we set it to false.

--> Requirement 6: the registers of the I_DONT_PIPELINE module must not be moved by retiming. As explained earlier, we can constrain it like this:

set_dont_retime [get_cells I_MIDDLE/I_DONT_PIPELINE] true

Then check that the constraint was applied correctly:

get_attribute [get_cells I_MIDDLE/I_DONT_PIPELINE] dont_retime

As shown in the image below, the return should be true:

--> Requirement 4 calls for pipelining, so we enable register retiming with the following constraint:

set_optimize_registers true -design PIPELINE
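What register retiming does to a pipeline can be illustrated with a brute-force toy. It makes three simplifications:

* The input and output registers stay fixed (compare requirement 5 below, which pins z_reg).
* The number of internal registers on the path stays the same.
* Those registers move across gates to minimize the worst stage delay.

This is not DC's retiming algorithm, and the gate delays are hypothetical.

```python
from itertools import combinations


def stage_delays(gate_delays, cuts):
    """Combinational delay of each pipeline stage.

    gate_delays lists the gates along the path between the fixed input and
    output registers; a cut at index i places a register just before gate i.
    """
    bounds = [0, *cuts, len(gate_delays)]
    return [sum(gate_delays[a:b]) for a, b in zip(bounds, bounds[1:])]


def retime(gate_delays, n_regs):
    """Place n_regs internal registers to minimize the worst stage delay.

    The number of registers on the path (its latency) is unchanged, as in
    retiming; brute force is fine for this toy size.
    """
    best = None
    for cuts in combinations(range(1, len(gate_delays)), n_regs):
        worst = max(stage_delays(gate_delays, cuts))
        if best is None or worst < best[0]:
            best = (worst, list(cuts))
    return best


gates = [3, 1, 1, 1, 2]               # hypothetical gate delays (ns)
print(max(stage_delays(gates, [1])))  # 5: register placed after the first gate
print(retime(gates, 1))               # (4, [2]): one gate later balances the stages
```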


--> Requirement 5: PIPELINE is pipelined (its registers are retimed), but its output registers must not be moved; they must stay where they are. This requires the constraint:

set_dont_retime [get_cells I_MIDDLE/I_PIPELINE/z_reg*] true

Then check if it is correct:

--> Save the design before synthesis:

write -f ddc -hier -out unmapped/STOTO.ddc

· Synthesize:

Requirement 8: the design is timing-critical, so we add the -timing option (DC accepts it as an abbreviation of -timing_high_effort_script) when synthesizing.

Requirement 10: scan insertion will be performed, so we add the -scan option. DC then synthesizes with scan flip-flops in place, and we can see whether any violations remain.

Requirements 7 and 9, together with the earlier requirements: we add the -retime option so that DC can move registers across combinational logic to optimize timing.

The synthesis command is as follows:

compile_ultra -scan -timing -retime

· Post-synthesis checks and processing:

--> After synthesis, we can check which licensed features were used (this step is optional):

(These features are burning a lot of money)

--> Check which modules were ungrouped, to verify that the result is consistent with the constraints:

The modules MIDDLE, OUTPUT, DONT_PIPELINE, GLUE, ARITH and RANDOM have been ungrouped. Only three designs keep their module structure: STOTO, PIPELINE and INPUT.

We can also see from the GUI:

Only the top-level design STOTO and the sub-modules PIPELINE and INPUT are preserved. The others have been ungrouped, so their module boundaries can no longer be found.

-->Check for constraint violations:

Here we save the constraint report to a file with redirect:

redirect -tee -file rc_compile_ultra.rpt {report_constraint -all}

(In this experiment, there are timing violations.)

-->View timing report:

redirect -tee -file rt_compile_ultra.rpt {report_timing}

Let's take a look at part of this timing report:


The report shows that although some modules have been ungrouped, the cell names still carry the original instance names. From an instance name we can find the module the cell originally belonged to. This helps us locate the source when a delay looks unreasonable, and it is one of the benefits of unique instance names.

-->Save the design:

write -f ddc -hier -out mapped/STOTO.ddc

Synthesis is complete, so we stop recording data for Formality (in short, the data that formal verification needs):

set_svf -off

· Check whether registers were moved, that is, look at the details of the optimization results (if you are interested, take a closer look to learn more)

We performed various retiming and pipelining optimizations earlier, so some registers have been moved and some combinational logic has been split up. Let's look at what was moved. The first goal is to see whether anything does not match what the constraints intended.

--> View the registers moved by the register retiming technology in the PIPELINE design:

get_cells -hier *r_REG*_S*

The returned register names show that the pipeline registers in PIPELINE have been moved. (Pipeline registers moved by retiming are named clkname_r_REG*_S*, where * is a wildcard.) Combined with the schematic, we can infer that z1_reg has been moved, judging from the "z1" and "S1" in the names:


--> We can also look up the original module of an instance name. For example, check the original I_IN module:

report_cell -nosplit I_IN

--> In the first check, we concluded that z1_reg was moved based only on the "1" in the names, which is clearly not enough. We can verify whether z_reg has been moved as follows:

get_cells -hier *z_reg*

A value is returned, so this register still exists and has not been moved (a moved register gets a new instance name):

Then we check z1_reg. The object cannot be found, which shows that it has been moved:

--> View the other flip-flops moved by retiming (registers moved by retiming that are not in a pipeline are named R_*):

Those are the registers in the INPUT module that were moved by retiming. We can also check whether the module contains any registers that were not moved:

get_cells I_IN/*_reg*

A value is returned, so the module does contain registers that were not moved.
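The name checks above rely on wildcard matching. The sketch below uses Python's fnmatchcase and assumes that DC's * behaves like a shell-style glob. The cell names are hypothetical but follow the conventions described above:

* clkname_r_REG*_S* for moved pipeline registers;
* R_* for other retimed registers.

The sketch shows that a pattern needs a trailing * to match names such as clk_r_REG3_S1. It also shows that a search for z1_reg comes back empty once the register has been renamed.

```python
from fnmatch import fnmatchcase


def match(pattern, names):
    """Names matching a glob pattern ('*' = any run of characters), case-sensitive."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Hypothetical post-retiming leaf cell names
names = ["clk_r_REG3_S1", "clk_r_REG12_S2", "R_7", "R_15", "z_reg_0_", "data_reg_3_"]

print(match("*r_REG*_S", names))   # []: this pattern demands a name ending in "_S"
print(match("*r_REG*_S*", names))  # ['clk_r_REG3_S1', 'clk_r_REG12_S2']
print(match("R_*", names))         # ['R_7', 'R_15']
print(match("*z_reg*", names))     # ['z_reg_0_']
print(match("*z1_reg*", names))    # []: z1_reg no longer exists under its old name
```

Square brackets are avoided in the sample names because fnmatch treats them as character classes.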

--> With the following command:

report_timing -from I_MIDDLE/I_PIPELINE/z_reg*/*

we can confirm that the PIPELINE module has registered outputs (because a timing report is returned).

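This concludes the hands-on part on optimization. DC has many optimization commands; look up any you don't know with the man command. One last remark: this post runs to more than twelve thousand characters plus a pile of figures, so it is probably the longest post in this series.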

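Stay true to the original purpose: I started this blog to record things I easily forget, not to write for others the way a book is written. Therefore, apart from posts marked as not reprintable, other posts may be reprinted. I'll do my best to keep improving!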

Category: Tcl and Design Compiler

Tags: DC, Design Compiler, tcl, synthesis


posted on 2017-03-28 18:12 by IC_learner


Copyright © 2023 IC_learner


Pattern-Based Power Planning in
IC Compiler II
November 2015
Copyright Notice and Proprietary Information
© 2015 Synopsys, Inc. All rights reserved. This software and documentation contain confidential and proprietary
information that is the property of Synopsys, Inc. The software and documentation are furnished under a license
agreement and may be used or copied only in accordance with the terms of the license agreement. No part of the
software and documentation may be reproduced, transmitted, or translated, in any form or by any means, electronic,
mechanical, manual, optical, or otherwise, without prior written permission of Synopsys, Inc., or as expressly provided
by the license agreement.

Destination Control Statement


All technical data contained in this publication is subject to the export control laws of the United States of America.
Disclosure to nationals of other countries contrary to United States law is prohibited. It is the reader's responsibility to
determine the applicable regulations and to comply with them.

Disclaimer
SYNOPSYS, INC., AND ITS LICENSORS MAKE NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH
REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Trademarks
Synopsys company and certain product names are trademarks of Synopsys, as set forth at
http://www.synopsys.com/Company/Pages/Trademarks.aspx.
All other product or company names may be trademarks of their respective owners.

Third-Party Links
Any links to third-party websites included in this document are for your convenience only. Synopsys does not endorse
and is not responsible for such websites and their practices, including privacy practices, availability, and content.

Synopsys, Inc.
700 E. Middlefield Road
Mountain View, CA 94043
www.synopsys.com

Contents
Introduction ................................................................................................................................ 5
Creating Power and Ground Rings
    Creating PG Ring Patterns
    Creating the PG Ring Strategy
    Compiling the PG Ring
Creating the Power and Ground Mesh
    Creating the PG Mesh Pattern
    Creating the PG Mesh Strategy
    Compiling the PG Mesh
Creating Power and Ground Standard Cell Rails
Creating Power and Ground Macro and Pad Connections
Creating Via Rules
    Creating PG Via Master Rules
    Creating Via Rules Between Different Strategies
    Defining New Via Structures
Creating a Power Plan Region
Manually Creating Power Plan Structures
Creating Special Patterns
    Inserting Channel Straps
    Inserting Terminal Alignment Straps
    Inserting Extra Straps to Honor Maximum Standard Cell Rail Tail Distances
    Creating Power Switch Alignment Straps
Using the Task Assistant for PG Prototyping
    Performing PG Prototyping
        Entering Values for PG Prototyping
        Running or Saving the Script
Advanced Use Cases
    Creating Composite Patterns
    Creating Stapling Vias on PG Rails
    Distributed PG Creation Flow
Frequently Asked Questions

Introduction
IC Compiler II introduces the pattern-based power planning methodology for power
planning, which replaces the template-based methodology used in IC Compiler. The
pattern-based power planning flow separates the physical implementation details (layer,
width, spacing) of the power ring and mesh from the regions of the design where the
structures are inserted. In a typical design, you run the pattern-based power planning
flow multiple times. The first pass defines and creates the power rings, the second pass
defines and creates the power mesh, and so on.
This application note describes the basic steps needed to create a power plan with IC
Compiler II, and provides examples that compare the commands and files used
in IC Compiler with the new approach used in IC Compiler II.

Creating Power and Ground Rings


Use the following steps to create power and ground (PG) rings in IC Compiler II. These
steps are similar to those used to create the power and ground mesh.
1. Define the ring pattern with the create_pg_ring_pattern command
2. Create a power ring strategy for each set of rings with the set_pg_strategy
command
3. Compile the power and ground rings with the compile_pg command
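Because the three steps always run in the same order, they are easy to wrap in a script. The following is a minimal sketch, not a Synopsys-provided utility: the helper name core_ring_flow, its argument order, and the derived pattern name (<name>_pattern) are hypothetical, and it uses only the command options shown in this section. By default it only returns the three commands as Tcl lists so you can review them; passing run=1 also executes them, which requires an IC Compiler II session.

```tcl
# Hypothetical helper (not part of IC Compiler II): builds the ring-pattern,
# ring-strategy, and compile commands for a core ring.
# With run=0 (default) it only returns the commands; with run=1 it also
# evaluates them at global level, which requires an IC Compiler II session.
proc core_ring_flow {name h_layer v_layer width spacing nets offset {run 0}} {
    set pattern ${name}_pattern
    set cmds [list \
        [list create_pg_ring_pattern $pattern \
             -horizontal_layer $h_layer -horizontal_width $width \
             -horizontal_spacing $spacing \
             -vertical_layer $v_layer -vertical_width $width \
             -vertical_spacing $spacing -corner_bridge false] \
        [list set_pg_strategy $name -core \
             -pattern [list [list pattern: $pattern] [list nets: $nets] \
                           [list offset: [list $offset $offset]]] \
             -extension {{stop: innermost_ring}}] \
        [list compile_pg -strategies $name]]
    if {$run} {
        foreach cmd $cmds {
            uplevel #0 $cmd
        }
    }
    return $cmds
}

# Review the commands for the core_ring example without running them.
foreach cmd [core_ring_flow core_ring M7 M8 10 2 {VDD VDD_LOW VSS} 3] {
    puts $cmd
}
```

Building each command with list rather than string concatenation keeps the nested -pattern and -extension values correctly braced, which is where hand-written strategy commands most often break.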

Creating PG Ring Patterns


The ring pattern specifies the horizontal and vertical layer names, ring width values,
spacing values, vias, and corner bridging settings used to create the power and ground
rings. You can create ring patterns around the core, design blocks, macros, power and
ground regions, or by specifying a rectilinear polygon. To define the ring pattern, use the
create_pg_ring_pattern command.

The following example creates a ring pattern that uses layer M7 for the horizontal rings,
with a width of 10 and a spacing of 2, and layer M8 for the vertical rings, also with a
width of 10 and a spacing of 2.
create_pg_ring_pattern ring_pattern -horizontal_layer M7 \
-horizontal_width {10} -horizontal_spacing {2} \
-vertical_layer M8 -vertical_width {10} \
-vertical_spacing {2} -corner_bridge false
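To estimate how much margin around the core this ring pattern consumes, you can add up the ring widths and the gaps between them. The sketch below is plain Tcl arithmetic, not an IC Compiler II report: it assumes one ring per net, nested with the given edge-to-edge spacing, and it does not include the offset that the strategy applies later. The helper name ring_band_width is hypothetical.

```tcl
# Estimated width of one side of a nested ring band (hypothetical helper).
# Assumption: every net gets one ring of the given width, and adjacent
# rings are separated by the given edge-to-edge spacing.
proc ring_band_width {num_nets width spacing} {
    return [expr {$num_nets * $width + ($num_nets - 1) * $spacing}]
}

# Three nets (VDD VDD_LOW VSS), width 10, spacing 2, as in ring_pattern
set band [ring_band_width [llength {VDD VDD_LOW VSS}] 10 2]
puts "estimated ring band width per side: $band"
```

Under these assumptions the three rings occupy 3 x 10 + 2 x 2 = 34 units on each side of the core.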

Creating the PG Ring Strategy


The PG strategy associates a PG pattern with a power plan region in the design, and
specifies the power nets, offset value, extension target, and other values related to the
power structure. Use the set_pg_strategy command to create the strategy and specify
the strategy name.

The following example uses the set_pg_strategy command to associate the
ring_pattern pattern with the core power plan region for nets VDD, VDD_LOW, and VSS.
The pattern will be created with an offset of 3 microns in both horizontal and vertical
directions, and the power straps will extend to the innermost ring.
set_pg_strategy core_ring -core -pattern \
{{pattern: ring_pattern} {nets: {VDD VDD_LOW VSS}} \
{offset: {3 3}}} -extension {{stop: innermost_ring}}

Compiling the PG Ring


The compile_pg command uses the specified strategy and instantiates the power
structure into the design. The following example uses the core_ring strategy to create
the power ring.
compile_pg -strategies core_ring

The figure above shows a design with a power ring inserted.


The following table shows the template file and power planning commands used to
create a power and ground ring in IC Compiler, and provides the comparable command
set for IC Compiler II.

IC Compiler:

In the template file:
template : ringm78(w1,w2,w3,o1,o2,o3) {
    side : horizontal {
        layer: M7
        width: 10
        spacing: 2
        offset : 3
    }
    side : vertical {
        layer: M8
        width: 10
        spacing: 2
        offset: 3
    }
}

set_power_ring_strategy core_ring \
-nets {VDD VDD_LOW VSS} \
-core \
-template ring.tpl:ringm78()

compile_power_plan -strategy core_ring

IC Compiler II:

create_pg_ring_pattern ring_pattern \
-horizontal_layer M7 -horizontal_width {10} -horizontal_spacing {2} \
-vertical_layer M8 -vertical_width {10} -vertical_spacing {2} \
-corner_bridge false

set_pg_strategy core_ring -core \
-pattern {{pattern: ring_pattern} {nets: {VDD VDD_LOW VSS}} \
{offset: {3 3}}}

compile_pg -strategies core_ring

Creating the Power and Ground Mesh


Use the following steps to create a power mesh in IC Compiler II:
1. Define the PG mesh structure with the create_pg_mesh_pattern command
2. Create the PG mesh strategy with the set_pg_strategy command
3. Compile and create the PG mesh with the compile_pg command

Creating the PG Mesh Pattern


The PG mesh pattern specifies the horizontal and vertical layer names, metal width
values, metal spacing values, metal pitch, vias, and wire trim method to use to create the
power and ground mesh.
The following example creates vertical power meshes on the M6 and M8 layers, and a
horizontal power mesh on the M7 layer, with the specified wire widths and pitches.
create_pg_mesh_pattern pg_mesh_pattern \
-layers \
{{{vertical_layer: M8} {width: 5} \
{spacing: interleaving} {pitch: 32}} \
{{vertical_layer: M6} {width: 2} \
{spacing: interleaving} {pitch: 32}} \
{{horizontal_layer: M7} {width: 5} \
{spacing: interleaving} {pitch: 28.8}}}

You can replace the values in the command with parameters and assign the parameter
values with the set_pg_strategy command. Parameters let you reuse the pattern
creation command and apply a different width, pitch, offset, and other values when
creating the strategy. The following example creates a vertical mesh on layer M8 and a
horizontal mesh on layer M9. The actual values for the width, offset, pitch, and so on are
assigned later with the set_pg_strategy command.

create_pg_mesh_pattern pg_mesh1 -parameters {w1 p1 w2 p2 f t} \
-layers \
{{{vertical_layer: M8} {width: @w1} \
{spacing: interleaving} {pitch: @p1} {offset: @f} {trim: @t}} \
{{horizontal_layer: M9} {width: @w2} {spacing: interleaving} \
{pitch: @p2} {offset: @f} {trim: @t}}}

Creating the PG Mesh Strategy


The PG mesh strategy defines the high-level structure for the power and ground mesh.
The strategy includes the following components:
• A list of power and ground nets. You can define a single strategy for all power and
ground nets or create individual strategies for each net.
• The routing areas for the power and ground nets. The area can be the core area, a
voltage area, an input polygon, a single macro cell, a set of macro cells, or a power
plan region.
• An extension specification for power and ground nets. By default, the power and ground
straps are bounded by the routing area; you can extend the straps beyond the
boundary or shorten the straps to create a gap between the strap ends and the
boundary.
• A blockage specification for power and ground nets. You can use the blockage
specification to prevent straps from crossing over macro cells, voltage areas, or
rectilinear areas of your design.
• A mesh pattern name. The mesh pattern contains settings and rules for the metal
layers used to build the power straps.
A strategy defines a single routing area and a group of power and ground nets. You can
define multiple strategies for a single design, depending on the complexity of the power
plan for your design. For multivoltage designs, you might require a different strategy for
each voltage area.
The following example uses the set_pg_strategy command to associate the
pg_mesh1 pattern with the core area as the power plan region.
set_pg_strategy s_mesh1 -core \
-pattern {{pattern: pg_mesh1} {nets: {VDD VSS VSS VDD}} \
{offset_start: 400 400} {parameters: 4 80 6 120 3.344 false}} \
-blockage {{{nets: VDD} {block: u0_2 u0_3}}} \
-extension {{stop: outermost_ring}}
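The pg_mesh1 pattern declares its parameters as {w1 p1 w2 p2 f t}, and the s_mesh1 strategy supplies {4 80 6 120 3.344 false}. Before compiling, it can help to check how the values line up. The plain-Tcl sketch below assumes the values bind to the parameter names in the order they are declared, which is how the two lists in this example correspond; the helper bind_pattern_parameters is hypothetical and only mimics that mapping, it does not call IC Compiler II.

```tcl
# Pair each declared pattern parameter with the value given in the strategy.
# Hypothetical helper; assumes values bind to parameter names by position.
proc bind_pattern_parameters {names values} {
    if {[llength $names] != [llength $values]} {
        error "expected [llength $names] values, got [llength $values]"
    }
    set binding [dict create]
    foreach name $names value $values {
        dict set binding $name $value
    }
    return $binding
}

set binding [bind_pattern_parameters {w1 p1 w2 p2 f t} {4 80 6 120 3.344 false}]
dict for {name value} $binding {
    puts "@$name = $value"
}
```

With this mapping, the M8 straps use width 4 and pitch 80, the M9 straps use width 6 and pitch 120, both layers use offset 3.344, and trimming is disabled. A count mismatch between the two lists is reported as an error rather than silently leaving a parameter unset.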

Compiling the PG Mesh


After defining the pattern and strategy for the power mesh, use the compile_pg
command to create the PG mesh. The following example creates the power mesh for the
s_mesh1 strategy.
compile_pg -strategies s_mesh1

The figure above shows a design with a power mesh inserted.
The following table shows the power planning commands used to create a power and
ground mesh in IC Compiler, and provides the comparable command set for IC Compiler
II. Note that for the offset_type setting, IC Compiler II only supports the centerline offset
type.

IC Compiler:

In the template file:
template_name: s_mesh1 (w1 p1 w2 p2 f t) {
    layer: M8 {
        direction: vertical
        width: @w1
        spacing: interleaving
        number:
        pitch: @p1
        offset_type: centerline
        offset_start: 400
        offset: @f
        trim_strap: @t
    }
    layer: M9 {
        direction: horizontal
        width: @w2
        spacing: interleaving
        number:
        pitch: @p2
        offset_type:
        offset_start: 400
        offset: @f
        trim_strap: @t
    }
}

set_power_plan_strategy s_mesh1 \
-core \
-template template.tpl:s_mesh1 \
(4 80 6 120 3.344 false) \
-nets {VDD VSS VSS VDD} \
-blockage {{{nets: VDD} {block: u0_2 u0_3}}} \
-extension {{stop: outermost_ring}}

compile_power_plan -strategy s_mesh1

IC Compiler II:

create_pg_mesh_pattern pg_mesh1 \
-parameters {w1 p1 w2 p2 f t} \
-layers {{{vertical_layer: M8} {width: @w1} \
{spacing: interleaving} {pitch: @p1} {offset: @f} {trim: @t}} \
{{horizontal_layer: M9} {width: @w2} \
{spacing: interleaving} {pitch: @p2} {offset: @f} {trim: @t}}}

set_pg_strategy s_mesh1 -core \
-pattern {{pattern: pg_mesh1} {nets: {VDD VSS VSS VDD}} \
{offset_start: 400 400} {parameters: 4 80 6 120 3.344 false}} \
-blockage {{{nets: VDD} {block: u0_2 u0_3}}} \
-extension {{stop: outermost_ring}}

compile_pg -strategies s_mesh1

Creating Power and Ground Standard Cell Rails


You can create power and ground rails in the design to connect the power and ground
pins in the standard cells. The power and ground standard cell rails connect to the straps
and rings in the design to provide connectivity to the standard cells. Use the
create_pg_std_cell_conn_pattern command to specify the metal layers, rail width,
and rail offset for the standard cell power and ground rails.
The following example creates a standard cell connection pattern on layer M1 by using
the default width. If you do not specify a rail width, the command uses the width of the
standard cell power and ground pins as the rail width.
create_pg_std_cell_conn_pattern std_pattern -layers {M1}

You can also create a standard cell connection pattern with specific layers, rail widths, and
offsets. The following example creates a standard cell connection pattern whose layer and
rail widths are specified with parameters. Parameter values are assigned later with the
set_pg_strategy command.
create_pg_std_cell_conn_pattern std_pattern2 \
-layers {@metal_layer} -rail_width {@w_top @w_bottom} \
-parameters {metal_layer w_top w_bottom}

After defining the standard cell connection pattern, associate the pattern with the power
plan region in the design with the set_pg_strategy command.
set_pg_strategy rail_strat -core \
-pattern {{name: std_pattern2} {nets: VDD VSS} \
{parameters: {M1 0.2 0.2}}}

After defining the pattern and strategy for the standard cell rails, use the compile_pg
command to create the rails. The following example creates the standard cell power rails
defined by the rail_strat strategy.
compile_pg -strategies rail_strat

The following table shows the power planning commands used to create power and
ground standard cell rails in IC Compiler, and provides the comparable command set for
IC Compiler II.

IC Compiler:

preroute_standard_cells \
-nets {VDD VSS} \
-route_pins_on_layer M1

IC Compiler II:

create_pg_std_cell_conn_pattern \
std_pattern -layers {M1}
set_pg_strategy rail_strat -core \
-pattern {{name: std_pattern} \
{nets: VDD VSS}}
compile_pg -strategies rail_strat

Creating Power and Ground Macro and Pad Connections


Use the create_pg_macro_conn_pattern command to create power and ground pin
connections for macros and pad cells. There are three types of macro connections:
long_pin, ring_pin, and scattered_pin.
The following example creates a pad cell pin connection pattern for pins with shapes on
the M7 or M8 layer.
create_pg_macro_conn_pattern pad_pattern \
-pin_conn_type scattered_pin -layers {M7 M8}

After defining the macro connection pattern, associate the pattern with the specified cells
in the design by using the set_pg_strategy command.
set_pg_strategy s_pad -macros $all_pg_pads \
-pattern {{name: pad_pattern} {nets: {VDD VDD_LOW VSS}}}

After you define the pattern and strategy for the macro connections, use the compile_pg
command to create the connections to the macro pins.
compile_pg -strategies s_pad

The figure above shows a design with power straps connected to pad cells
The following example creates a hard macro connection pattern for macro pins on layers
M5 and M6.
create_pg_macro_conn_pattern hm_pattern \
-pin_conn_type scattered_pin -layers {M5 M6}

set_pg_strategy macro_conn -macros $toplevel_macros \
-pattern {{name: hm_pattern} {nets: {VDD VDD_LOW VSS}}}

compile_pg -strategies macro_conn

The figure above shows a design with power and ground connections to a macro
The following table shows the power planning commands used to create power and
ground connections to macro cells in IC Compiler, and provides the comparable command
set for IC Compiler II.

IC Compiler:

preroute_instances \
-nets {VDD VDD_LOW VSS} \
-route_pins_on_layer {M5 M6} \
-connect_instances specified \
-cells $toplevel_hms

IC Compiler II:

create_pg_macro_conn_pattern hm_pattern \
-pin_conn_type scattered_pin \
-layers {M5 M6}
set_pg_strategy macro_conn \
-macros $toplevel_hms \
-pattern {{name: hm_pattern} \
{nets: {VDD VDD_LOW VSS}}}
compile_pg -strategies macro_conn

Creating Via Rules


Use the following steps to create via rules in IC Compiler II.
1. Create PG via master rules
2. Create via rules between different strategies if multiple strategies exist
3. Create via definitions for via structures not defined in the technology file

Creating PG Via Master Rules


Use the set_pg_via_master_rule command to define the PG via rules and via
structures, including contact codes, via array dimensions, via locations within the
intersection, and so on.

set_pg_via_master_rule M67_mesh_via_rule \
-via_array_dimension {6 12} -contact_code {VIA67BAR}

set_pg_via_master_rule M78_mesh_via_rule \
-via_array_dimension {6 10} -contact_code {VIA78BAR_C}

To use the PG via rule created with the set_pg_via_master_rule command, specify
the via rule name after the via_master keyword for the create_pg_composite_pattern,
create_pg_macro_conn_pattern, create_pg_mesh_pattern, and
create_pg_ring_pattern commands. For example, the following command uses the
M67_mesh_via_rule and M78_mesh_via_rule via rules defined previously.
create_pg_mesh_pattern pg_mesh1 -parameters {w1 p1 w2 p2 f t} \
-layers \
{{{vertical_layer: M8} {width: @w1} {spacing: interleaving} \
{pitch: @p1} {offset: @f} {trim: @t}} \
{{horizontal_layer: M7} {width: @w2} {spacing: interleaving} \
{pitch: @p2} {offset: @f} {trim: @t}} \
{{vertical_layer: M6} {width: @w1} {spacing: interleaving} \
{pitch: @p2} {offset: @f} {trim: @t}}} \
-via_rule \
{{{layers: M8}{layers: M7} {via_master: M78_mesh_via_rule}} \
{{layers: M7}{layers: M6}{via_master: M67_mesh_via_rule}}}

Next, define the power plan strategy and compile the power plan to create the PG
structure with the specified via master.
set_pg_strategy s_mesh1 \
-pattern {{pattern: pg_mesh1} {nets: {VDD VSS VSS VDD}} \
{offset_start: 10 10} {parameters: 1 60 3 80 3.344 false}} \
-blockage {{{nets: VDD} {block: u0_2 u0_3}}} -core \
-extension {{stop: outermost_ring}}

compile_pg -strategies s_mesh1

The figure above shows a design with 6x10 VIA78BAR_C inserted between M7 and M8,
and 6x12 VIA67BAR inserted between M6 and M7.

The following table shows the power planning commands used to create via rules and
insert vias in IC Compiler, and provides the comparable command set for IC Compiler II.

IC Compiler:

set_preroute_advanced_via_rule \
-contact_codes VIA67BAR \
-size_by_array_dimensions {20 18}

In the template file:
template_name: s_mesh1 (w1 p1 w2 p2 f t) {
    layer: M8 {
        direction: vertical
        width: @w1
        spacing: interleaving
        number:
        pitch: @p1
        offset_type:
        offset_start: 400
        offset: @f
        trim_strap: @t
    }
    layer: M7 {
        direction: horizontal
        width: @w2
        spacing: interleaving
        number:
        pitch: @p2
        offset_type:
        offset_start: 400
        offset: @f
        trim_strap: @t
    }
    advanced_rule: on {
        stack_vias: M8 M7
        honor_advanced_via_rules: on
    }
}

set_power_plan_strategy s_mesh1 \
-core \
-template template.tpl:s_mesh1 \
(4 80 6 120 3.344 false) \
-nets {VDD VSS VSS VDD} \
-blockage {{{nets: VDD} {block: u0_2 u0_3}}} \
-extension {{stop: outermost_ring}}

compile_power_plan -strategy s_mesh1

IC Compiler II:

set_pg_via_master_rule M67_mesh_via_rule \
-contact_code {VIA67BAR} \
-via_array_dimension {20 18}

create_pg_mesh_pattern pg_mesh1 \
-parameters {w1 p1 w2 p2 f t} \
-layers {{{vertical_layer: M8} \
{width: @w1}{spacing: interleaving} \
{pitch: @p1}{offset: @f}{trim: @t}} \
{{horizontal_layer: M7}{width: @w2} \
{spacing: interleaving}{pitch: @p2} \
{offset: @f}{trim: @t}}} \
-via_rule {{{layers: M8} {layers: M7} \
{via_master: M78_mesh_via_rule}}}

set_pg_strategy s_mesh1 -core \
-pattern {{pattern: pg_mesh1} \
{nets: {VDD VSS VSS VDD}} \
{offset_start: 400 400} \
{parameters: 4 80 6 120 3.344 false}} \
-blockage {{{nets: VDD} {block: u0_2 u0_3}}} \
-extension {{stop: outermost_ring}}

compile_pg -strategies s_mesh1

Creating Via Rules Between Different Strategies
A complex power plan might require that you form via connections to create a complete
power network with several different power rings, meshes, power rails, and other
structures. Use the set_pg_strategy_via_rule command to set the via insertion rules
when inserting vias between different power plan strategies.
The following example creates the via_rule1 rule that inserts a VIA23LG via between
shapes on layer M2 as defined by strategy strat1, and shapes on layer M3 as defined by
strategy strat2. New vias are omitted between other metal layer intersections.
set_pg_strategy_via_rule via_rule1 \
-via_rule { {{{strategies: strat1} {layers: M2}} \
{{strategies: strat2} {layers: M3}} {via_master: VIA23LG}} \
{{intersection: undefined} {via_master: nil}} }

After defining the via rule and strategy, use the compile_pg command and specify the
via rule with the -via_rule option.
compile_pg -strategies {strat1 strat2} -via_rule via_rule1
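The nested braces in the -via_rule value are easy to get wrong when you write rules for many strategy pairs. As a sketch, the hypothetical helper strategy_via_rule below (not an IC Compiler II command) assembles the same value as via_rule1 with Tcl list commands: one entry pairs a layer of one strategy with a layer of another, and a fallback entry omits vias at every other intersection.

```tcl
# Hypothetical helper: builds a -via_rule value shaped like the one in via_rule1.
# The first entry inserts via_master between from_layer of from_strategy and
# to_layer of to_strategy; the second entry omits vias at all other
# intersections ({intersection: undefined} {via_master: nil}).
proc strategy_via_rule {from_strategy from_layer to_strategy to_layer via_master} {
    set from [list [list strategies: $from_strategy] [list layers: $from_layer]]
    set to   [list [list strategies: $to_strategy] [list layers: $to_layer]]
    set rule [list $from $to [list via_master: $via_master]]
    set fallback [list {intersection: undefined} {via_master: nil}]
    return [list $rule $fallback]
}

set spec [strategy_via_rule strat1 M2 strat2 M3 VIA23LG]
puts $spec
# In IC Compiler II (not run here):
#   set_pg_strategy_via_rule via_rule1 -via_rule $spec
```

The printed value has the same nesting as the literal -via_rule argument of via_rule1, without the outer padding spaces.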

The figures above show the default via that is used when no via rule is defined, and
via master VIA23LG inserted according to via_rule1.

The following example defines a strategy via rule, rail_via_rule, that inserts via_4x1
vias between the shapes created by the rail_strategy strategy and existing straps on
layer M5; vias are omitted at all other intersections. The compile_pg command inserts
the vias:
set_pg_strategy_via_rule rail_via_rule \
-via_rule { {{{strategies: rail_strategy}}{{existing: strap} \
{layers: M5}} {via_master: via_4x1}} {{intersection: undefined} \
{via_master: NIL}} }

compile_pg -strategies rail_strategy -via_rule rail_via_rule

Defining New Via Structures
If your design requires that you create a new via definition that is not defined in the
technology file, use the create_via_def command. The following example creates a
via_def named VIA12_PG and uses the set_pg_via_master_rule command to create
a via rule.
create_via_def VIA12_PG -cut_layer VIA1 -cut_size {0.05 0.05} \
-upper_enclosure {0.06 0} -lower_enclosure {0.06 0}

set_pg_via_master_rule V12PG_via_rule -via_array_dimension {2 1} \
-contact_code {VIA12_PG}

The following table shows the power planning commands used to create a via definition
in IC Compiler, and provides the comparable command set for IC Compiler II.

IC Compiler:

create_via_master -name VIA12_PG \
-cut_layer_name VIA1 \
-lower_layer_name M1 \
-upper_layer_name M2 \
-cut_width 0.05 -cut_height 0.05 \
-lower_layer_enc_width 0.06 \
-lower_layer_enc_height 0 \
-upper_layer_enc_width 0.06 \
-upper_layer_enc_height 0

IC Compiler II:

create_via_def VIA12_PG \
-cut_layer VIA1 \
-cut_size {0.05 0.05} \
-upper_enclosure {0.06 0} \
-lower_enclosure {0.06 0}

Creating a Power Plan Region


Use the create_pg_region command to define a power plan region that bounds the
power network configuration and strategy. Power plan regions can be created on design
objects such as core areas, blocks, voltage areas, macros and groups of macros, groups
of regions, and polygons. You can use power plan regions as a working area to create
strategies or as blockage areas for other routing areas. The following example creates a
power plan region named r0 that uses the core area and excludes macros:
create_pg_region r0 -core \
-exclude_macros [get_cells -filter "design_type==macro"] \
-macro_offset {8 8} -expand -2

The figure above shows a design after creating a PG region which excludes macros

The following table shows the power planning command used to create PG regions in IC
Compiler, and provides the comparable command set for IC Compiler II.

IC Compiler:

create_power_plan_regions r0 -core \
-exclude_macros [get_flat_cells \
-filter "mask_layout_type == macro"] \
-macro_offset 8 -expand -2

IC Compiler II:

create_pg_region r0 -core \
-exclude_macros [get_cells \
-filter "design_type==macro"] \
-macro_offset "8 8" -expand -2

Manually Creating Power Plan Structures


In special cases, you might need to manually create PG structures in your design. The
create_pg_strap and create_pg_vias commands can be used to create PG straps
and vias.
The following example creates a horizontal PG strap for net VDD on layer M3 with its
center at 100um, a width of 3um, a low end at 50um, and a high end at 300um. DRC is
checked, but DRC errors are ignored and no automatic repairs are performed. All new
shapes are marked as macro connection. The low end is extended to the innermost ring,
and the high end is extended to the design boundary, where a pin is generated. Vias are
created between the PG strap and rings using a specified rule, but not for macro pins.
Default vias are inserted at any remaining intersections.
create_pg_strap -layer M3 -direction horizontal -width 3 \
-net VDD -start 100 -low_end 50 -high_end 300 \
-extend_low innermost_ring \
-extend_high design_boundary_and_generate_pin \
-drc check_but_no_fix \
-via_rule { \
{{existing: ring}{via_master: VIA34}} \
{{macro_pins: all}{via_master: nil}} \
{{intersection: undefined}{via_master: default}} }

Starting from version K-2015.06-SP2, you can create multiple straps by using the -pitch
option. The following example creates vertical PG straps for net VDD on layer M6 with a
width of 1um. Straps are created from (10, 0) with a pitch of 20um, up to (200, 0).

create_pg_strap -layer M6 -direction vertical -net VDD -width 1 \
-start 10 -stop 200 -pitch 20
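To see where those straps land, the following plain-Tcl sketch enumerates the strap centerline x-coordinates implied by -start 10, -stop 200, and -pitch 20. It assumes a strap is placed at start + k * pitch for every k whose position does not exceed the stop value; this is an illustration, not the tool's placement algorithm, so confirm the result against the straps created in IC Compiler II. The helper name strap_positions is hypothetical.

```tcl
# Enumerate strap centerline coordinates from start to stop at the given pitch.
# Hypothetical helper. Assumption: a strap is placed at start + k*pitch for
# every k >= 0 whose position does not exceed stop.
proc strap_positions {start stop pitch} {
    set positions {}
    # Compute each position from k instead of accumulating, so fractional
    # pitches do not drift through repeated floating-point additions.
    for {set k 0} {$start + $k * $pitch <= $stop} {incr k} {
        lappend positions [expr {$start + $k * $pitch}]
    }
    return $positions
}

set xs [strap_positions 10 200 20]
puts "[llength $xs] straps at x = $xs"
```

Under this assumption the command produces ten straps, at x = 10, 30, ..., 190; no strap falls exactly at 200 because 200 is not on the 20um grid that starts at 10.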

The following example creates PG vias for nets VDD and VSS within bounding box
{{10 10} {100 100}}. Vias are created between rings on layer METAL5 and standard cell
connections on layer METAL1. Additional vias are allowed and DRC is skipped. All vias
are marked as ring type.
create_pg_vias -within_bbox {{10 10} {100 100}} -nets {VDD VSS} \
-from_types ring -from_layers METAL5 -to_types std_conn \
-to_layers METAL1 -drc no_check -insert_additional_vias \
-mark_as ring

Creating Special Patterns


Special patterns are used to insert power straps in channels between design objects
such as blocks, macros, placement blockages, voltage areas, and region boundaries.

Inserting Channel Straps


Use the create_pg_special_pattern -insert_channel_straps command to add
extra straps in the channels between objects such as macros, placement blockages,
voltage areas, and blocks. The tool identifies the channels between the specified objects,
skips channels that already contain synthesized straps or that are too narrow for extra
vertical straps, and inserts extra straps in the remaining channels.
The channel_threshold specification defines the minimum threshold needed to insert
straps. You can select the layer and control the width and spacing of the straps.
The following example creates a special pattern for channel strap insertion, defines a
strategy, and compiles the power plan. Vertical channel straps are inserted on layer M8
with a width of 4um and a spacing of 5um. Macro cells are considered for channels, and
the channel threshold is 15um.
create_pg_special_pattern channel_strap_pat \
-insert_channel_straps {{layer: M8} {direction: vertical} \
{width: 4} {spacing: 5} {channel_threshold: 15} \
{channel_between_objects: macro}}

set_pg_strategy channel_strat -core \
-pattern {{name: channel_strap_pat}{nets: VDD VSS}}

compile_pg -strategies channel_strat

The figure above shows a design before inserting channel straps

The figure above shows the same design with channel straps inserted

Inserting Terminal Alignment Straps


Use the create_pg_special_pattern -insert_terminal_alignment_straps on
command to create special patterns where straps align with preexisting terminals. The
following example creates straps which align with existing VDD and VSS terminals on
the boundary.
create_pg_special_pattern terminal_strap_pat \
-insert_terminal_alignment_straps on

set_pg_strategy terminal_st -design_boundary \
-pattern {{name: terminal_strap_pat} {nets: VDD VSS}}

compile_pg -strategies terminal_st

The figure above shows a design with PG terminals

The figure above shows the same design after inserting straps that align with the
terminals

Inserting Extra Straps to Honor Maximum Standard Cell Rail Tail Distances


You can create a special pattern that inserts extra straps to honor a maximum standard cell
rail tail distance. Use the create_pg_special_pattern command with the
-honor_max_stdcell_strap_distance option, and specify the maximum distance to honor,
the layer for strap insertion, and the offset value. The following example inserts extra
straps with a width of 4um on layer M8. The maximum rail tail length is 40um, and the
strap locations are determined by the offset value list {20 10}.
create_pg_special_pattern max_stdcell_pat \
-honor_max_stdcell_strap_distance { \
{layer: M8} {max_distance: 40} \
{width: 4}{offset: {20 10}} }

set_pg_strategy max_std_st -core \
-pattern {{name: max_stdcell_pat} {nets: VDD VSS}}

compile_pg -strategies max_std_st

The figure above shows the original design

The figure above shows the same design after inserting extra straps

Creating Power Switch Alignment Straps


You can create a special pattern that inserts straps aligned with the pins of power switch
cells. The following example inserts alignment straps on layer ME6 for power switches
with the HEAD16DM library cell name.
create_pg_special_pattern psw_pt \
-insert_power_switch_alignment_straps \
{{lib_cells: HEAD16DM} {layer: ME6}{width: 3}}

set_pg_strategy pws_st -core \
-pattern {{name: psw_pt} {nets: InstDecode/VDD108}} \
-extension {stop: design_boundary}

compile_pg -strategies pws_st

The figure above shows the original design

The figure above shows the same design after inserting straps to connect the power
switch cells

Using the Task Assistant for PG Prototyping
The Task Assistant provides a form for performing PG prototyping on your design. Start
the Task Assistant by choosing Task > Task Assistant from the menu. Choose Design
Planning > PG Planning > PG Prototyping in the Task Assistant to open the PG
Prototyping form.

Performing PG Prototyping
You can use PG Prototyping to quickly create and remove PG meshes in your design.
After entering values in the form, use the Preview feature to display the Tcl commands
that will create the PG meshes and rails. After refining the commands, you can run the
commands or save them to a script file.

Entering Values for PG Prototyping


You can enter the following information in the PG Prototyping form:
– PG area
o Choose Core, Polygon, Block, or Voltage Area (VA) to define the power plan
region
– PG nets
o Enter the PG net names, or click Browse and select the nets
– PG layers
o Specify up to four layers for the PG mesh
– PG tracks (%)
o Specify the percentage of routing tracks to use
The figure below shows how the PG tracks (%) value affects the pitch value
for the create_pg_mesh_pattern command.

Running or Saving the Script
After completing the form, use the buttons in the form to perform different actions.
– Apply:
o Click the Apply button to create the PG mesh and rails based on your settings.
The equivalent Tcl commands are also displayed in the Tcl Command
window.

– Preview:
o Click the Preview button to display the commands in the Tcl Command
window.
o Review the commands, then click Add to Script to append the commands to
the Script Editor window.
o Alternatively, run the commands in the Tcl Command window by clicking Run
Tcl Command.

– Undo:
o Click Undo to run compile_pg -undo on the current design.
– Remove PG Routes:
o Click Remove PG Routes to run remove_routes on the current design.

Advanced Use Cases
This section describes advanced use cases in pattern-based power planning.

Creating Composite Patterns


You can create a composite pattern and use the pattern to create a power and ground mesh.
The composite pattern can be defined hierarchically based on low-level patterns such as
basic wire patterns or other composite patterns. After the pattern is created, associate
the pattern with a power plan region with the set_pg_strategy command, and
instantiate the pattern with the compile_pg command.
The following example creates two basic wire patterns with the
create_pg_wire_pattern command, and uses the create_pg_composite_pattern
command to create a composite pattern based on the two basic wire patterns:
create_pg_wire_pattern m8_strap -layer M8 -direction vertical \
-width 2 -spacing 1 -pitch 10

create_pg_wire_pattern m7_seg -layer M7 -direction horizontal \
-width 1 -low_end_reference_point 0 -high_end_reference_point 5 \
-pitch {10 3.344}

create_pg_composite_pattern m78_mesh -nets {VDD VSS} \
-add_patterns {{{pattern: m8_strap} {nets: VDD VSS} {offset: 2}} \
{{pattern: m7_seg} {nets: VDD} {offset: 1 1.672}} }

set_pg_strategy smesh -core \
-pattern {{name: m78_mesh} {nets: VDD VSS}}

compile_pg -strategies smesh

The two figures show the same design after inserting a composite pattern

The following example defines a composite pattern which contains two wire patterns.
The pattern is used to create bridging straps in the design.
## Composite pattern creation
create_pg_wire_pattern m8_strap -layer M8 -direction vertical \
-width 2 -spacing 1 -pitch 10
create_pg_wire_pattern m7_seg -layer M7 -direction horizontal \
-width 1 -low_end_reference_point 0 -high_end_reference_point 5 \
-pitch {10 3.344}

create_pg_composite_pattern m78_mesh -nets {VDD VSS} \
-add_patterns {{{pattern: m8_strap} {nets: VDD VSS} {offset: 2}} \
{{pattern: m7_seg} {nets: VDD} {offset: 1 1.783}} }

set_pg_strategy smesh -core \
-pattern {{name: m78_mesh} {nets: VDD VSS}}

compile_pg -strategies smesh

## Rail creation with stacked vias


create_pg_std_cell_conn_pattern rail -layers M1 -rail_width 0.2
set_pg_strategy srail -pattern {{name: rail} {nets: VDD VSS}} -core
set_pg_via_master_rule via_shift_right -offset {1.5 0}

set_pg_strategy_via_rule vrule \
-via_rule { {{{strategies: srail} {nets: VDD}} {{existing: all} \
{layers: M7}} {via_master: via_shift_right} \
{between_parallel: true}}
{{{strategies: srail} {nets: VSS}} \
{{existing: all} {layers: M8}} {via_master: via_shift_right}} }

compile_pg -strategies srail -via_rule vrule

The figure above shows the design after creating the power rail with stacked vias

Creating Stapling Vias on PG Rails


In advanced technology nodes, M2 rails can run in parallel with M1 rails and represent
the same net. You can create stapling vias between the M1 and M2 rails using the
create_pg_vias or compile_pg commands. The following examples show how to
perform via stapling between M1 and M2 rails.

The following example inserts via1 vias between M1 and M2 rails when the rails already
exist in the design.
set_pg_via_master_rule V1_rule -contact_code VIA12SQ_C \
-via_array_dimension {20 1} -allow_multiple {2.4 0}

create_pg_vias -nets {VDD VSS} -within_bbox {{0 0} {1000 1000}} \
-from_layers M2 -to_layers M1 \
-allow_parallel_objects -via_masters V1_rule

The following example creates M1 rails and via1 vias when the design contains existing
M2 rails.
## Create M1 standard cell rail pattern and specify M1 color
create_pg_std_cell_conn_pattern m1_rail \
-layers {M1} -rail_width 0.16

set_pg_strategy m1_rail_strategy \
-pattern {{name: m1_rail} {nets: VDD VSS}} -core

## Set contact code, array size and pitch between via arrays
set_pg_via_master_rule V1_rule -contact_code VIA12SQ_C \
-via_array_dimension {20 1} -allow_multiple {2.4 0}

## Create via rules between M1 rail strategy and existing M2 straps to
## honor the via settings
set_pg_strategy_via_rule rail_rule \
-via_rule {{{strategies: m1_rail_strategy} \
{{existing: strap} {layers: M2}} \
{via_master: V1_rule} {between_parallel: true}} \
{{intersection: undefined} {via_master: NIL}}}

## Run compile_pg to create the rails and vias


compile_pg -strategies m1_rail_strategy -via_rule rail_rule

The following example creates M1 rails, M2 rails, and via1 vias simultaneously.
## Create M1 and M2 standard cell rail pattern
create_pg_std_cell_conn_pattern m1_rail -layers {M1} -rail_width 0.16
create_pg_std_cell_conn_pattern m2_rail -layers {M2} -rail_width 0.14

set_pg_strategy m1_rail_strategy \
-pattern {{name: m1_rail} {nets: VDD VSS}} -core
set_pg_strategy m2_rail_strategy \
-pattern {{name: m2_rail} {nets: VDD VSS}} -core

## Specify via1 contact code and array size


set_pg_via_master_rule V1_rule -contact_code VIA12SQ_C \
-via_array_dimension {20 1} -allow_multiple {2.4 0}

## Create via rules between m1 rail strategy and m2 rail strategy


set_pg_strategy_via_rule rail_rule \
-via_rule {{{strategies: m1_rail_strategy} \
{strategies: m2_rail_strategy} \
{via_master: V1_rule} {between_parallel: true}} \
{{intersection: undefined} {via_master: NIL}}}

## Run compile_pg to create the rails and vias


compile_pg -strategies {m1_rail_strategy m2_rail_strategy} \
-via_rule rail_rule

Distributed PG Creation Flow


Use the characterize_block_pg command to generate block-level PG creation
constraints, including via master rules, patterns, strategies, and strategy via rules for
each block based on the top-level constraints. The command generates one PG creation
script for each reference design of block instances. You can then open the reference
design and source the script to create the same PG inside the block, the same as if the
PG were pushed down from the top-level design.
For distributed power and ground creation, perform distributed PG creation for top-level
and block-level designs with the run_block_compile_pg command. First, run the
characterize_block_pg command with the -compile_pg_script option to write out a
PG mapping file. By default, the mapping file is written to ./pgroute_output/pg_mapfile.
This mapping file specifies the PG constraint and compile_pg script for each block. The
generated mapping file must be specified with the set_constraint_mapping_file
command before running distributed PG creation. To instantiate the power plan, run the
run_block_compile_pg command, which uses distributed computing to create the power
plan.
The following example uses the characterize_block_pg,
set_constraint_mapping_file, and run_block_compile_pg commands to instantiate
the power plan using distributed computing.
characterize_block_pg -compile_pg_script compile_pg.tcl
# By default, tool generates PG mapping file "pg_mapfile"
# in the "pgroute_output" directory
set_constraint_mapping_file ./pgroute_output/pg_mapfile
run_block_compile_pg -host_options block_script

An example script read by the characterize_block_pg -compile_pg_script
command is shown below:
compile_pg -via_rule rail_m1_m8

An example PG mapping file is shown below:


leon3s PG_CONSTRAINT ./leon3s_pg.tcl
leon3s_2 PG_CONSTRAINT ./leon3s_2_pg.tcl
leon3s_3 PG_CONSTRAINT ./leon3s_3_pg.tcl
leon3mp PG_CONSTRAINT ./top_pg.tcl
leon3s COMPILE_PG ./compile_pg_tmp.tcl
leon3s_2 COMPILE_PG ./compile_pg_tmp.tcl
leon3s_3 COMPILE_PG ./compile_pg_tmp.tcl
leon3mp COMPILE_PG ./compile_pg_tmp
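Before running run_block_compile_pg, it can help to confirm that every block has both a PG_CONSTRAINT and a COMPILE_PG entry. The Python sketch below does that. Its file format is inferred only from the example above: three whitespace-separated fields per line, namely design, constraint type, and script path. The helper names parse_pg_mapfile and missing_entries are hypothetical.

```python
"""Parse a PG mapping file like the example above.

Assumed format (inferred from the example only): one entry per line with three
whitespace-separated fields: <design> <constraint type> <script path>.
"""

SAMPLE = """\
leon3s PG_CONSTRAINT ./leon3s_pg.tcl
leon3mp PG_CONSTRAINT ./top_pg.tcl
leon3s COMPILE_PG ./compile_pg_tmp.tcl
"""


def parse_pg_mapfile(text):
    """Return {design: {constraint_type: path}}."""
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        design, kind, path = fields
        mapping.setdefault(design, {})[kind] = path
    return mapping


def missing_entries(mapping, required=("PG_CONSTRAINT", "COMPILE_PG")):
    """List (design, constraint_type) pairs that have no script assigned."""
    return [(d, k) for d, kinds in sorted(mapping.items()) for k in required if k not in kinds]


if __name__ == "__main__":
    m = parse_pg_mapfile(SAMPLE)
    print(m)
    print(missing_entries(m))  # leon3mp has no COMPILE_PG entry in SAMPLE
```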

Frequently Asked Questions:
Q: How can I improve runtime when creating the PG mesh during early design
exploration?
A: During early design exploration, you can turn off via DRC checking when running
compile_pg by specifying the -ignore_via_drc option. This decreases runtime and
allows for more design iterations.
Q: How do I improve runtime when implementing the top-level PG?
A: If the PG in the blocks is fully implemented, treat the blocks as blockages
when creating the top-level PG. Otherwise, the tool must do DRC checking on these
blocks and the runtime increases. You can specify the blocks as blockages with the
-blockage {blocks: $blocks} option to set_pg_strategy.
Q: What is the next step if IC Validator (ICV) detects a PG DRC violation?
A: Write out the IC Compiler II technology file and verify that the corresponding ICV
runset rule is actually defined in the technology file. The DRC violation might be caused
by an inconsistency between the ICV runset and IC Compiler II technology file.
Q: When should I create stapling vias between parallel M1/M2 rails?
A: Create stapling vias between M1/M2 rails as the last step of PG creation, after
creating the PG mesh. Otherwise, the M1/M2 vias might affect PG mesh creation
runtime. In certain advanced technology nodes, stapling vias should be created after
signal route.
Q: What is the next step if certain wires or vias are missing during PG creation?
A: Use the -ignore_drc option with compile_pg for PG creation. You can also use
check_pg_drc or ICV to check DRCs in the local area to determine if the missing wires
or vias are due to certain DRCs.
Q: When should I use the -low_end_reference_point and
-high_end_reference_point options with the create_pg_wire_pattern command?

A: Use these options when segment wires are required, for example, when creating
bridging straps. Otherwise, avoid using these two options.
Q: How do I avoid creating incomplete or missing PG meshes when creating a single-
layer mesh?
A: Include {trim: false} with the -layers option to create_pg_mesh_pattern when
creating mesh pattern for single layer PG.

SCENARIO

SCENARIO = MODE + CORNER.

MODE: A MODE IS DEFINED AS A SET OF CLOCKS, SUPPLY VOLTAGES, TIMING
CONSTRAINTS AND LIBRARIES.

MODE TYPES:
1. FUNCTIONAL MODE.
2. TEST MODE.
EACH MODE IS DESCRIBED BY ITS OWN SDC CONSTRAINTS.
A DESIGN CAN HAVE SEVERAL FUNCTIONAL MODES, AND EACH MODE USES A DIFFERENT SDC.

CONSTRAINTS IN TEST MODE, WHILE THE CHIP IS A DEVICE UNDER TEST:

● TESTER CLOCK PERIOD AND CLOCK SOURCES.
● MODEL TESTER SKEW ON THE INPUT PORTS.
● DIFFERENT TIMING EXCEPTIONS.
● DIFFERENT SETUP/HOLD ON THE OUTPUT PORTS.
● THE SCAN CHAIN IS EXERCISED IN TEST MODE (NOT IN FUNCTIONAL MODE).
CORNERS:
CORNERS CONTAIN PVT'S (PROCESS, VOLTAGE, TEMPERATURE).

PVT ---+---> BEST CASE CORNER
       |
       +---> WORST CASE CORNER

FOR SETUP:

Arrival Path ---+
                +---> Max Delays
Data Path ------+
Required Path --------> Min Delays

FOR HOLD:
Arrival Path ---+
                +---> Min Delays
Data Path ------+

Required Path --------> Max Delays


BEST CASE: ---> FASTEST <---> MIN DELAYS <---> EARLY <---> FOR HOLD

● MIN DELAYS IN THE ARRIVAL PATH AND DATA PATH.
● MAX DELAYS IN THE CAPTURE CLOCK PATH.
PVT: PROCESS -------> FAST
     VOLTAGE -------> HIGH
     TEMPERATURE ---> LOW

WORST CASE: ---> SLOWEST <---> MAX DELAYS <---> LATE <---> FOR SETUP

● MAX DELAYS IN THE ARRIVAL PATH AND DATA PATH.
● MIN DELAYS IN THE CAPTURE CLOCK PATH.
PVT: PROCESS -------> SLOW
     VOLTAGE -------> LOW
     TEMPERATURE ---> HIGH
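The SCENARIO = MODE + CORNER idea and the min/max delay choices above can be summarized in a short Python sketch. The mode and corner names are hypothetical. The table reflects the notes: setup uses the worst-case corner with max arrival/data delays and min required-path delays, and hold uses the reverse at the best-case corner.

```python
"""Enumerate MCMM scenarios as MODE x CORNER (names are hypothetical)."""
from itertools import product

MODES = ["func", "test"]

# corner -> (process, voltage, temperature, check it is mainly used for)
CORNERS = {
    "best": ("FAST", "HIGH", "LOW", "hold"),
    "worst": ("SLOW", "LOW", "HIGH", "setup"),
}

# Which delay each part of the path uses for each check (from the notes above).
DELAY_SELECTION = {
    "setup": {"arrival_and_data": "max", "required": "min"},
    "hold": {"arrival_and_data": "min", "required": "max"},
}


def build_scenarios(modes, corners):
    """Return {scenario_name: (mode, corner, check)} for every mode/corner pair."""
    return {
        f"{mode}_{corner}": (mode, corner, corners[corner][3])
        for mode, corner in product(modes, corners)
    }


if __name__ == "__main__":
    for name, (mode, corner, check) in build_scenarios(MODES, CORNERS).items():
        print(name, mode, corner, check, DELAY_SELECTION[check])
```

Two modes and two corners give four scenarios. Real flows usually add more corners, such as typical or temperature-specific ones.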

PHYSICAL VERIFICATION:
PHYSICAL VERIFICATION CHECKS:
1. LVS (LAYOUT VERSUS SCHEMATIC)
2. DRC (DESIGN RULE CHECK)
3. ERC (ELECTRICAL RULE CHECK)
LAYOUT VERSUS SCHEMATIC (LVS):
INPUTS ARE THE NETLIST (.LVS.V), THE LAYOUT (.GDSII) AND THE RULE DECK FILES.
LVS CHECKS THAT THE TWO ELECTRICAL CIRCUITS ARE EQUIVALENT WITH RESPECT TO THEIR
"CONNECTIVITY" AND "TOTAL TRANSISTOR COUNT".

THE COMPARISON IS BETWEEN THE (.GDSII) FILE AND THE EXTRACTED NETLIST (.LVS.V) FILE.

BOTH ARE FINALLY CONVERTED TO SPICE LEVEL FOR THE COMPARISON.


LVS CHECKS ARE:
EXTRACT ERRORS :
● SHORTS
● OPENS
● FLOATING NETS.
COMPARE ERRORS:
● PIN ERRORS
● PARAMETRIC ERRORS
● DEVICE MISMATCH
● NET MISMATCH
● MALFORMED DEVICES
● PORTS MISMATCH
DESIGN RULE CHECK (DRC):
INPUTS ARE THE .GDSII FILE AND THE RULE DECK FILE.
CHECKS:
● ACTIVE TO ACTIVE SPACING.
● WELL TO WELL SPACING.
● MINIMUM CHANNEL LENGTH OF THE TRANSISTOR.
● MINIMUM METAL WIDTH.
● METAL TO METAL SPACINGS.
● ESD(ELECTRO STATIC DISCHARGE).
● I/O RULES.
● METAL FILL DENSITY.
ELECTRICAL RULE CHECK (ERC):

INPUT IS THE (.GDSII) FILE.

ERC CHECKS A DESIGN FOR ALL ELECTRICAL CONNECTIONS.

CHECKS ARE:
● WELL AND SUBSTRATE AREAS FOR PROPER CONTACTS AND SPACINGS, THEREBY
ENSURING CORRECT POWER AND GROUND CONNECTIONS.
● LOCATE FLOATING DEVICES AND FLOATING WELLS.
● LOCATE DEVICES WHICH ARE SHORTED.
● LOCATE DEVICES WITH MISSING CONNECTIONS.
● GATES CONNECTED DIRECTLY TO SUPPLIES.
● FLOATING INPUTS.

FORMAL VERIFICATION:

FORMAL VERIFICATION IS DONE WITH LEC (LOGICAL EQUIVALENCE CHECK).

LEC COMPARES THE FINAL EXTRACTED NETLIST (.V) WITH THE SYNTHESIZED NETLIST (.V).

INPUTS ARE THE EXTRACTED NETLIST (.V) AND THE SYNTHESIZED NETLIST (.V).

IT CHECKS FOR FUNCTIONAL CORRECTNESS.

FIXING SETUP AND FIXING HOLD TIME


SETUP TIME: THE MINIMUM AMOUNT OF TIME THE DATA SHOULD BE STABLE BEFORE THE
ARRIVAL OF THE SENSITIVE CLOCK EDGE.

SETUP CHECK: THE DATA LAUNCHED AT THE SENSITIVE EDGE OF THE LAUNCH FLOP
SHOULD BE CAPTURED AT THE NEXT SENSITIVE EDGE OF THE CAPTURE FLOP.
HOLD TIME: THE MINIMUM AMOUNT OF TIME THE DATA SHOULD BE STABLE AFTER THE
ARRIVAL OF THE SENSITIVE CLOCK EDGE.

HOLD CHECK: THE DATA LAUNCHED AT THE SENSITIVE EDGE OF THE LAUNCH FLOP
SHOULD NOT BE CAPTURED AT THE SAME SENSITIVE EDGE OF THE CAPTURE FLOP.

SETUP FIXES:

1. BUFFER INSERTION
2. UPSIZING THE DRIVER CELL
3. REDUCE NET LENGTH
4. CELL UP SIZING.
5. DRIVE STRENGTH OF LAUNCH FLOP INCREASE.
6. LOGICAL OPTIMIZATION ON DATA PATH.
7. USEFUL SKEW.
8. PIPELINING.
9. USE SYNC CELLS.
10. NET WIDTH INCREASE.
11. USE LVT CELLS.
12. SPLITTING THE COMBINATIONAL LOGIC.
13. INCREASE CLOCK PERIOD.
14. USING DOUBLE SYNCHRONIZER USING FLIP FLOPS.
15. REDUNDANT VIA.
16. REDUCE HIGH-FANOUT NETS WITHIN THE LOGIC.
17. DOUBLE VIA
18. LAYER JUMPING

HOLD FIXES:

1. DELAY BUFFER INSERTION.
2. CELL DOWN SIZING.
3. INCREASE NET LENGTH.
4. USE HVT CELLS.
5. SCAN CHAIN REORDERING.
6. ADJUST TIMING PATHS.
7. ADD DELAYS ON INPUT PORTS.
8. CLOCK SIZING.
9. ADD LOCKUP LATCHES.
10. REDUCE CLOCK SKEW.
11. INCREASE NET LENGTH (JOG).

FIXING ELECTRO MIGRATION


ELECTRO MIGRATION:

WHEN A HIGH CURRENT DENSITY FLOWS THROUGH A LONG WIRE FOR A LONG TIME, THE
ELECTRONS TRANSFER THEIR MOMENTUM TO THE METAL ATOMS. THE ATOMS CAN THEN
MIGRATE AWAY FROM THEIR ORIGINAL POSITIONS IN THE METAL.

THIS CAN CAUSE SHORTS AND OPENS.

FIXES

TO AVOID THIS PROBLEM, DOUBLE THE WIDTH OF THE NETS (THIS LOWERS THE CURRENT DENSITY).

AVOID BIG DRIVERS AND LARGE BUFFERS.

FIXING CROSSTALK

CROSS TALK:

CROSSTALK IS THE TRANSFER OF VOLTAGE FROM A HIGHLY SWITCHING NET (AGGRESSOR NET)
TO ANOTHER NET (THE VICTIM NET, WHICH MAY BE LOW SWITCHING, HIGH SWITCHING, OR A
CONSTANT NET) THROUGH COUPLING CAPACITANCE.

REDUCING TECHNIQUES:

● INCREASE THE VICTIM NET WIDTH SO THAT ITS RESISTANCE DECREASES; THIS IS ALSO
APPLIED DURING ROUTING.
● INCREASE THE SPACING BETWEEN THE AGGRESSOR NET AND THE VICTIM NET.
● BUFFERING ON CONSTANT NETS (OR) VICTIM NETS.
● PLACE GROUND NETS BETWEEN THE AGGRESSOR NET AND THE VICTIM NET, SO THE COUPLED
VOLTAGE DISCHARGES INTO THE GROUND NET AND THERE IS NO SIGNAL INTEGRITY
PROBLEM. THIS IS CALLED SHIELDING.
● MAINTAIN A STABLE SUPPLY.
● FAST SLEW RATE.
● JOGGING (SHIFT THE ROUTE OFF-TRACK BY HALF A PITCH).
● LAYER JUMPING (JUMP TO THE LAYER ABOVE AND COME BACK TO THE SAME LAYER).
● INCREASE THE DRIVE STRENGTH OF THE CELL.
● CELL SIZING (UP SIZING).
● DEEP N-WELL.
● GUARD RING.
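A common first-order way to see why more spacing and shielding help is a charge-sharing estimate of the glitch on a victim net: Vglitch = Vdd * Cc / (Cc + Cg). Here Cc is the coupling capacitance to the aggressor and Cg is the victim's capacitance to ground. The Python sketch below assumes the victim is not actively held by its driver, so the result is a pessimistic bound. All capacitance values are hypothetical, and treating double spacing as halving Cc is also an assumption.

```python
"""Charge-sharing estimate of a crosstalk glitch (a simplified model).

Assumes the victim net is not actively held by its driver, so the result
is a pessimistic upper bound; a driven victim sees a smaller glitch.
"""


def glitch_peak(vdd, c_couple, c_ground):
    """Peak coupled noise: Vdd * Cc / (Cc + Cg)."""
    return vdd * c_couple / (c_couple + c_ground)


if __name__ == "__main__":
    vdd = 0.9          # volts (hypothetical)
    c_ground = 8.0     # fF, victim capacitance to ground (hypothetical)
    # Assumed coupling values: doubling the spacing roughly halves Cc, and a
    # grounded shield leaves only a small residual coupling to the aggressor.
    for label, cc in [("min spacing", 2.0), ("double spacing", 1.0), ("shielded", 0.1)]:
        print(f"{label:15s} glitch = {glitch_peak(vdd, cc, c_ground):.3f} V")
```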

FIXING DRC'S

DRC'S FIXING
DRC'S ARE OF TWO TYPES:

1. LOGICAL DRC'S.
2. PHYSICAL DRC'S.

LOGICAL DRC'S:

1. MAX TRANSITION
2. MAX CAPACITANCE
3. MAX FANOUT

MAX TRANSITION:

FIXING TECHNIQUES:

● ADD A BUFFER IN THE MIDDLE OF THE LONG WIRE.
● REDUCE THE WIRE LENGTH.
● ADD A CHAIN OF BUFFERS.
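The benefit of a buffer in the middle of a long wire follows from the distributed RC (Elmore) delay of a wire, 0.5 * r * c * L^2. Because delay grows with the square of length, and the transition grows with the same RC product, splitting the wire into shorter segments reduces it. The Python sketch below uses a simplified model that ignores driver resistance and buffer input capacitance. The per-micron r and c values and the 20 ps buffer delay are hypothetical.

```python
"""Why a buffer in the middle of a long wire helps (simplified Elmore model).

The delay (and, roughly, the transition) of an unbuffered distributed RC wire
grows with the square of its length, so splitting it into shorter segments
reduces it. Driver resistance and buffer input capacitance are ignored here.
"""


def wire_delay_ps(r_ohm_per_um, c_ff_per_um, length_um):
    """Elmore delay of a distributed RC line: 0.5 * r * c * L^2 (ohm * fF = fs)."""
    return 0.5 * r_ohm_per_um * c_ff_per_um * length_um ** 2 / 1000.0


def buffered_delay_ps(r_ohm_per_um, c_ff_per_um, length_um, segments, buffer_delay_ps):
    """Split the wire into equal segments with one buffer between neighbours."""
    seg_len = length_um / segments
    wires = segments * wire_delay_ps(r_ohm_per_um, c_ff_per_um, seg_len)
    return wires + (segments - 1) * buffer_delay_ps


if __name__ == "__main__":
    r, c, length = 2.0, 0.2, 1000.0   # hypothetical per-um values and a 1 mm wire
    print("unbuffered:", wire_delay_ps(r, c, length), "ps")
    for n in (2, 3, 4):
        print(f"{n} segments:", round(buffered_delay_ps(r, c, length, n, 20.0), 2), "ps")
```

Adding more buffers does not keep helping: beyond a certain count, the buffer delays outweigh the wire savings. This is why the chain length is chosen to meet the transition limit rather than maximized.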

MAX CAPACITANCE:

FIXING TECHNIQUES:

● DECREASE WIRE LENGTH AT OUTPUT SIDE.

MAX FANOUT:

FIXING TECHNIQUES:

● CLONING: ADD A COPY OF THE SAME CELL SO THAT THE LOAD IS DIVIDED.
● SHARING THE LOAD.

PHYSICAL DRC'S:

1. WIRE TO WIRE SPACING (MIN SPACING)
2. MIN WIDTH OF WIRES
3. VIA TO VIA SPACING
4. NOTCH AVOIDING

FIXING TECHNIQUES:

● SEARCH AND REPAIR

DIFFERENT TYPE OF CELLS

DIFFERENT TYPE OF CELLS:

● STDCELLS:
○ Nothing but the base cells (gates, flops).
● TAP CELLS:
○ Avoid the latch-up problem (these cells are placed at a particular distance from each other).
○ Tap cells are physical-only cells that have power and ground pins and don't have
signal pins.
○ Tap cells are well-tie cells that bias the silicon infrastructure of n-wells or
p-wells.
○ They are traditionally used so that Vdd is connected to the n-well and Gnd is
connected to the substrate.
○ This ties the wells to Vdd and Gnd, which results in less drift and prevents
latch-up.
○ Required by some technology libraries to limit the resistance between the power or
ground connections and the well or substrate.
● TIE CELLS :
○ They are used to prevent damage to cells. With a tie-high cell, a gate input
that must stay at logic 1 is tied to Vdd through the cell while the other inputs
connect to signal nets; with a tie-low cell, the input is tied to Vss instead.
○ Tie-high and tie-low cells are used to connect the gate of the transistor to
either Power or Ground.
○ In lower technology nodes, if the gate is connected directly to Power or Ground, the
transistor might be turned "ON/OFF" due to Power or Ground Bounce.
○ These cells are part of the std cell library.
○ The cells which require Vdd (typically constant signals tied to 1) connect to tie-high
cells.
○ The cells which require Vss (typically constant signals tied to 0) connect
to tie-low cells.
● END CAP CELLS:
○ They mark the end of the row. Endcap cells are placed at the row edges to protect
the cells at the end of the row from damage caused by the wrong laser wavelength,
for correct manufacturing.
○ You can add endcap cells at both ends of a cell row.
○ Endcap cells surround the core area; their features serve as a second poly to the
cells placed at the edge of the row.
○ Endcap library cells have no signal connectivity; they are only connected to the
power and ground rails.
○ Endcap cells ensure that gaps do not occur between the "WELL" and "IMPLANT LAYER",
and they prevent DRC violations by satisfying the "WELL TIE-OFF"
requirements for the core rows.
○ They usually add a "Well Extension" for DRC-correct designs.
○ End caps provide a "POLY EXTENSION" to avoid a drain-source SHORT.
● DECAP CELLS:
○ Charge sharing: to avoid dynamic IR drop, charge is stored in these cells and
released to the nets when needed.
○ Decoupling capacitor cells, or decap cells, are cells that have a capacitor placed
between the power rail and the ground rail to overcome dynamic voltage drop.
○ Dynamic IR drop happens at the active edge of the clock, when a high
current is drawn from the power grid for a short duration.
○ If the power source is far from a flop, the flop may go into a
metastable state.
○ To overcome this, decaps are added: when the current requirement is high, the
decaps discharge and provide a boost to the power grid.
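A back-of-the-envelope decap size follows from Q = C * dV. If the decap has to supply a current spike I for a time t while the local supply droops by at most dV, then C = I * t / dV. The Python sketch below assumes that the decap alone supplies the whole spike, and it ignores grid resistance and inductance. The 50 mA / 100 ps / 50 mV values and the 20 fF per-cell capacitance are hypothetical.

```python
"""Rough decap sizing from Q = C * dV (a back-of-the-envelope sketch).

Assumes the decap alone supplies the current spike for its whole duration
and ignores the grid's own resistance and inductance.
"""
import math


def required_decap_f(peak_current_a, duration_s, max_droop_v):
    """Capacitance needed so the charge I * t causes at most max_droop volts."""
    return peak_current_a * duration_s / max_droop_v


def decap_cells_needed(total_cap_f, cap_per_cell_f):
    """Round up to a whole number of decap cells."""
    return math.ceil(total_cap_f / cap_per_cell_f)


if __name__ == "__main__":
    c = required_decap_f(0.05, 100e-12, 0.05)   # 50 mA for 100 ps, 50 mV droop
    print(f"required decap: {c * 1e12:.1f} pF")
    print("cells:", decap_cells_needed(c, 20e-15))  # hypothetical 20 fF per cell
```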
● FILLER CELLS:
○ Filler cells are used to fill the gaps between the cells after placement.
○ Filler cells are used to establish the continuity of the N-wells and the
IMPLANT LAYERS on the standard cell rows. Some of the cells also don't
have a bulk connection (substrate connection) because of their small size
(thin cells).
○ In those cases, the abutment of cells through inserted filler cells can connect
the substrates of small cells to the power/ground nets.
○ That is, those thin cells can use the bulk connection of the other cells (this is one of
the reasons why a standalone LVS check fails on some cells).
● ICG CELLS:
○ Clock gating cells, used to reduce dynamic power dissipation.
○ Register banks are disabled during some clock cycles.
○ During idle modes, the clocks can be gated off to save dynamic power
dissipation on flip-flops.
○ A proper circuit is essential to achieve a gated clock state that prevents false glitches
on the clock paths.
● POWER GATING CELLS:
○ Power gating is used to reduce static power dissipation.
○ Power Gating Cells:
■ Power switches
■ Level Shifters
■ Retention registers
■ Isolation cells
■ Power controller
● PAD CELLS:
○ To interface with outside devices; the power, clock, and signal pins are connected
to pad cells, which connect to the outside.
● CORNER CELLS:
○ Corner pads are used for well continuity.
○ To lift the chip.
● MACRO CELLS:
○ Memories.
○ The memory cells are called macros.
○ Storing information using sequential elements takes up a lot of area.
○ A single flip-flop can take 15 to 20 transistors to store one bit; macros store the
data efficiently and occupy comparatively less space on the chip.
● SPARE CELLS:
○ Used at the ECO.
○ Spare cells are standard cells in a design that are not used by the netlist.
○ Placing spare cells in your design provides a margin for correcting logical
errors that might be detected later in the design flow, or for adjusting the speed
of your design.
○ Spare cells are used by the fix ECO command during the ECO process.
● PAD FILLER CELLS:
○ Used for well continuity; placed in between pads.
● JTAG CELLS:
○ These are used to check the IO connectivity.

DIFFERENT FILES IN PHYSICAL DESIGN

FILES:

1. LOGICAL LIBRARIES ---------------------------> .lib, .db
2. PHYSICAL LIBRARIES --------------------------> .lef, .milkyway (OR) .volcano (OR) .plib (OR) .enc
3. TECHNOLOGY FILE -----------------------------> .tf
4. TLU+ ----------------------------------------> .tlup
5. INTERCONNECT TECHNOLOGY FILE ----------------> .itf
6. MAPPING FILE --------------------------------> .map
7. NETLIST -------------------------------------> .v (OR) .ddc (OR) .db (OR) .EDIF
8. SDC -----------------------------------------> .sdc
9. PHYSICAL ONLY PAD CELLS PLACEMENT FILE ------> .tdf
10. SCAN CHAIN FILE ----------------------------> .scandef
11. TOGGLE RATE FILE ---------------------------> .saif (OR) .vcd
12. ECO FILE -----------------------------------> .eco
13. GDS FILE -----------------------------------> .gds
14. LOG FILE -----------------------------------> .log
15. REPORT FILE --------------------------------> .rep
16. DESIGN EXCHANGE FORMAT ---------------------> .def
17. STANDARD DELAY FORMAT ----------------------> .sdf
18. STANDARD PARASITIC EXCHANGE FORMAT ---------> .spef

CALCULATIONS:

POWER CALCULATIONS:
-----> NUMBER OF CORE POWER PADS REQUIRED FOR EACH SIDE OF THE CHIP =
(TOTAL CORE POWER) / [(NUMBER OF SIDES) * (CORE VOLTAGE) * (MAXIMUM ALLOWABLE CURRENT
FOR AN I/O PAD)].

-----> CORE RING WIDTH:

CORE CURRENT (mA) = (CORE POWER) / (CORE VOLTAGE)

CORE P/G RING WIDTH = (TOTAL CORE CURRENT) / [(NO. OF SIDES) * (MAXIMUM CURRENT
DENSITY OF THE METAL LAYER USED FOR THE PG RING)]

-------> MAXIMUM CURRENT DENSITY: Rj (mA/um).

--------> SHEET RESISTANCE: Rs (OHMS/SQUARE).

--------> TOTAL CURRENT = TOTAL POWER CONSUMPTION OF CHIP (P) / VOLTAGE (V).

--------> NO. OF POWER PADS: Npads = Itotal / Ip

-------> Itotal = TOTAL CURRENT

-------> Ip IS OBTAINED FROM THE IO LIBRARY SPECIFICATION.

--------> NO. OF POWER PINS = Itotal / Ip
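The pad-count and ring-width formulas above can be checked with the Python sketch below. It follows the units of the notes: power in mW, voltage in V, current in mA, and Rj in mA/um. Rounding pad counts up to whole pads is an added assumption. The 1 W / 1.0 V core, the 50 mA pad rating, and Rj = 2 mA/um are hypothetical example values.

```python
"""Power pad count and core ring width from the formulas above.

Units follow the notes: power in mW, voltage in V, current in mA, current
density Rj in mA/um. Rounding pad counts up to whole pads is an added assumption.
"""
import math


def core_current_ma(core_power_mw, core_voltage_v):
    """CORE CURRENT = CORE POWER / CORE VOLTAGE (mW / V = mA)."""
    return core_power_mw / core_voltage_v


def core_pads_per_side(core_power_mw, core_voltage_v, pad_max_current_ma, sides=4):
    """Pads per side = P / (sides * V * Imax_pad), rounded up."""
    return math.ceil(core_power_mw / (sides * core_voltage_v * pad_max_current_ma))


def core_ring_width_um(core_power_mw, core_voltage_v, rj_ma_per_um, sides=4):
    """Ring width = I_core / (sides * Rj)."""
    return core_current_ma(core_power_mw, core_voltage_v) / (sides * rj_ma_per_um)


def total_power_pads(i_total_ma, ip_ma):
    """Npads = Itotal / Ip, rounded up."""
    return math.ceil(i_total_ma / ip_ma)


if __name__ == "__main__":
    p_mw, v = 1000.0, 1.0   # hypothetical 1 W core at 1.0 V
    print("core current (mA):", core_current_ma(p_mw, v))
    print("pads per side:", core_pads_per_side(p_mw, v, 50.0))
    print("total pads:", total_power_pads(core_current_ma(p_mw, v), 50.0))
    print("ring width (um):", core_ring_width_um(p_mw, v, 2.0))
```

The per-side and total pad counts agree here (5 per side on 4 sides, 20 in total), which is a quick sanity check that the two formulas are consistent.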

--------> MAXIMUM CURRENT SPECIFICATION OF EACH METAL LAYER FROM THE LIBRARY (Rj).

----------> TOTAL METAL WIDTH REQUIRED ON LAYER 1 (= LAYER 2):

Wtotalstrap = Itotal / (2 * Rj)

-----------> ASSUMING THE SPACING BETWEEN STRAPS = Lspace (L):

L < Vmax / (Rj * Rs)

Vmax = MAX ALLOWABLE IR DROP

Rj = MAX CURRENT DENSITY

Rs = SHEET RESISTANCE

----------> TOTAL CORE AREA = Wcore * Hcore

H = HEIGHT

W = WIDTH

-----------> NUMBER OF VERTICAL STRAPS = Nv = Wcore / L

-----------> NUMBER OF HORIZONTAL STRAPS = Nh = Hcore / (2 * L)

------------> MIN STRAP WIDTH REQUIRED = Wring / (Nv * Nh)

IR DROP:

------> AVG CURRENT THROUGH EACH STRAP = IstrapAvg = Itotal / (2 * Nstraps) mA

--------> APPROXIMATE IR DROP AT THE CENTER OF THE STRAP = Vdrop (IR DROP)

= IstrapAvg * Rs * (W / 2) * (1 / Wstrap)

---------> NUMBER OF STRAPS BETWEEN TWO POWER PADS:

Nstrappinspace = Dpadspacing / Lspace

----------> MIN RING WIDTH = Wring = Ip / Rj (um)
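The strap-planning formulas above are coded directly in the Python sketch below. Currents are in mA, Rj in mA/um, Rs in ohm/square, lengths in um, and the IR drop comes out in mV (mA * ohm = mV). Two choices are assumptions: reading W in the IR-drop formula as the strap's run length, and rounding strap counts up. The 50 mV budget, Rs = 0.25 ohm/sq, and the 1000 um x 1000 um core are hypothetical.

```python
"""Strap spacing, strap counts and strap-center IR drop, per the formulas above.

Units: current in mA, Rj in mA/um, Rs in ohm/square, lengths in um, voltage in mV
(mA * ohm = mV). Interpreting W in the IR-drop formula as the strap's run length
is an assumption.
"""
import math


def max_strap_spacing_um(vmax_mv, rj_ma_per_um, rs_ohm_per_sq):
    """L < Vmax / (Rj * Rs)."""
    return vmax_mv / (rj_ma_per_um * rs_ohm_per_sq)


def strap_counts(w_core_um, h_core_um, spacing_um):
    """Nv = Wcore / L and Nh = Hcore / (2 * L), rounded up to whole straps."""
    return math.ceil(w_core_um / spacing_um), math.ceil(h_core_um / (2 * spacing_um))


def total_strap_width_um(i_total_ma, rj_ma_per_um):
    """Wtotalstrap = Itotal / (2 * Rj) on each layer."""
    return i_total_ma / (2 * rj_ma_per_um)


def strap_center_ir_drop_mv(i_total_ma, n_straps, rs_ohm_per_sq, run_length_um, strap_width_um):
    """Vdrop = IstrapAvg * Rs * (W / 2) / Wstrap, with IstrapAvg = Itotal / (2 * Nstraps)."""
    i_strap_avg = i_total_ma / (2 * n_straps)
    return i_strap_avg * rs_ohm_per_sq * (run_length_um / 2) / strap_width_um


if __name__ == "__main__":
    spacing = max_strap_spacing_um(50.0, 1.0, 0.25)     # hypothetical 50 mV budget
    print("max strap spacing (um):", spacing)
    print("Nv, Nh:", strap_counts(1000.0, 1000.0, spacing))
    print("total strap width per layer (um):", total_strap_width_um(1000.0, 1.0))
    print("IR drop at strap center (mV):", strap_center_ir_drop_mv(1000.0, 50, 0.25, 1000.0, 20.0))
```

A result such as 62.5 mV against a 50 mV budget would suggest more or wider straps. The hand formulas are only a starting point before a proper IR-drop analysis.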

POWER

--------> TOTAL POWER = STATIC POWER + DYNAMIC POWER

= LEAKAGE POWER + [INTERNAL POWER + EXT SWITCHING POWER]

= LEAKAGE POWER + [{SHORT CIRCUIT POWER + INTERNAL SWITCHING POWER} + EXT SWITCHING POWER]

= LEAKAGE POWER + [(Vdd * Isc) + (C * V * V * F) + (1/2 * C * V * V * F)]

Isc = SHORT CIRCUIT CURRENT

C = LOAD CAP

V = SUPPLY VOLTAGE (Vdd)

F = FREQUENCY

S = SWITCHING ACTIVITY FACTOR.
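The total-power expression above maps term by term onto the Python sketch below: leakage, short-circuit (Vdd * Isc), internal switching (C * V^2 * F), and external switching (1/2 * C * V^2 * F). Two things are added assumptions: splitting C into an internal and a load capacitance, and applying the activity factor S as an optional multiplier on both switching terms. With activity=1.0, the result matches the written formula. All numeric values are hypothetical.

```python
"""Total power = leakage + short-circuit + internal switching + external switching.

Mirrors the expression above. Splitting C into an internal and a load
capacitance and the optional activity factor S are added assumptions; with
activity=1.0 the terms match the notes exactly.
"""


def total_power_w(leakage_w, vdd_v, isc_a, c_internal_f, c_load_f, freq_hz, activity=1.0):
    short_circuit = vdd_v * isc_a
    internal_switching = activity * c_internal_f * vdd_v ** 2 * freq_hz
    external_switching = 0.5 * activity * c_load_f * vdd_v ** 2 * freq_hz
    return leakage_w + short_circuit + internal_switching + external_switching


if __name__ == "__main__":
    # Hypothetical values: 1 mW leakage, 1.0 V, 2 mA short-circuit current, 1 GHz.
    p = total_power_w(1e-3, 1.0, 2e-3, 1e-12, 2e-12, 1e9)
    print(f"total power: {p * 1e3:.3f} mW")
    print(f"with S = 0.2: {total_power_w(1e-3, 1.0, 2e-3, 1e-12, 2e-12, 1e9, 0.2) * 1e3:.3f} mW")
```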


FINAL VERIFICATION GDS FILE EXPORT

FINAL VERIFICATION:
1. PARASITIC EXTRACTION: EXTRACTS THE R AND C VALUES TO GET THE ACTUAL
DELAYS. TOOL: STAR-RCXT.
2. TIMING VERIFICATION: DONE WITH THE PRIMETIME TOOL.
3. LVS AND ERC CHECKS: DONE WITH THE CALIBRE OR HERCULES TOOLS.
4. DRC CHECKS: DONE WITH THE CALIBRE OR HERCULES TOOLS.

AFTER VERIFICATION:

1. AFTER THIS, WE RELEASE THE GDS FILE.
2. THE GDS FILE CONTAINS ALL THE POLYGON INFORMATION.

AFTER GDS

AFTER THIS, WE FINALLY DO THE BASE TAPE OUT (BTO).

AFTER BASE TAPE OUT, WE DO THE METAL TAPE OUT (MTO).

CHIP FINISHING:

IN THE CHIP FINISHING:

WE NEED TO DO:

1. ANTENNA FIXING: AS THE TOTAL AREA OF A WIRE INCREASES DURING PROCESSING, THE
VOLTAGE STRESSING THE GATE OXIDE INCREASES, WHICH MAY DAMAGE THE OXIDE LAYER.
2. RANDOM PARTICLE DEFECTS: (i) WIRES AT MINIMUM SPACING ARE MOST SUSCEPTIBLE TO
SHORTS. (ii) WIRES AT MINIMUM WIDTH ARE MOST SUSCEPTIBLE TO OPENS.
3. REDUNDANT VIA INSERTION: VOIDS IN VIAS ARE A SERIOUS ISSUE IN
MANUFACTURING.
4. FILLER CELL INSERTION: SOME PLACEMENT SITES REMAIN EMPTY ON SOME
ROWS.
5. METAL FILL INSERTION: PREVENTS METAL OVER-ETCHING.
6. METAL SLOTTING: PREVENTS METAL EROSION AND METAL LIFT-OFF.

ANTENNA FIXING TECHNIQUES

(THESE ARE FIXED DURING SEARCH AND REPAIR)

● SPLITTING THE METAL (LAYER JUMPING).
● ADDING A DIODE IN REVERSE-BIAS MANNER.
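The antenna check compares the metal area connected to a gate during processing with the gate area. The Python sketch below shows how layer jumping and a protection diode clear a violation. It is a simplified model under stated assumptions. The ratio is computed as metal area divided by gate area. The limit of 400 and the areas are hypothetical rule-deck and layout numbers. Real antenna rules are layer-specific and may be cumulative, and a diode is modeled here as simply clearing the violation.

```python
"""Antenna-ratio check and the effect of layer jumping (simplified sketch).

Assumptions: the ratio is metal area connected to the gate divided by gate
area, and the limit value is a hypothetical rule-deck number. Real rules are
defined per layer and may be cumulative; a protection diode is modelled here as
simply clearing the violation.
"""


def antenna_ratio(metal_area_um2, gate_area_um2):
    return metal_area_um2 / gate_area_um2


def has_antenna_violation(metal_area_um2, gate_area_um2, max_ratio, has_diode=False):
    if has_diode:
        return False  # the reverse-biased diode discharges the collected charge
    return antenna_ratio(metal_area_um2, gate_area_um2) > max_ratio


if __name__ == "__main__":
    gate, limit = 0.5, 400.0              # hypothetical gate area and rule limit
    print("long wire:", has_antenna_violation(400.0, gate, limit))
    # Layer jumping: only the segment below the jump is connected to the gate
    # while that layer is processed, so less metal area charges the gate.
    print("after layer jump:", has_antenna_violation(150.0, gate, limit))
    print("with diode:", has_antenna_violation(400.0, gate, limit, has_diode=True))
```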
RANDOM PARTICLE DEFECT FIXING TECHNIQUES

● SPREADING (OR) JOG (PUSH ROUTES OFF-TRACK BY 1/2 PITCH) --------> REDUCES
SHORTS.
● WIDENING (INCREASE THE WIRE WIDTH) ------------> REDUCES OPENS.

REDUNDANT VIA INSERTION

● REDUNDANT VIA INSERTION IS THE TECHNIQUE FOR REDUCING THE EFFECT OF VIA VOIDS.

FILLER CELL INSERTION

● FILLER CELL INSERTION IS ONE OF THE TECHNIQUES FOR USING THE
TOTAL AREA WITHOUT GAPS.
● IT IS A GOOD TECHNIQUE BECAUSE IN THE FUTURE WE CAN REPLACE FILLER
CELLS WITH SPARE CELLS THAT CONTAIN LOGIC.

METAL FILL INSERTION

● CHEMICALS ARE USED DURING ETCHING, AND THESE CHEMICALS CAUSE MORE METAL
LOSS; TO PREVENT THIS, WE INSERT METAL FILLS.

METAL SLOTTING

● METAL SLOTTING IS A TECHNIQUE FOR AVOIDING PROBLEMS LIKE METAL LIFT-OFF
AND METAL EROSION.

ECO:

ECO'S ARE OF TWO TYPES: 1) TIMING ECO'S (TO IMPROVE TIMING)

2) FUNCTIONAL ECO'S (TO ADD FUNCTIONALITY)

       TIMING ECO                  FUNCTIONAL ECO
           |                             |
    +------+------+               +------+------+
    |             |               |             |
  FREEZE      NON-FREEZE        FREEZE      NON-FREEZE

---------> AN ECO IS A LATE CHANGE IN THE FLOW.

----------> AFTER ROUTING, ANY CHANGES OR NEW CELLS ARE ALL HANDLED AT THE ECO STAGE.

THERE ARE TWO TYPES OF ECO HERE:

(i) FREEZE SILICON ECO

(ii) NON-FREEZE SILICON ECO

---------> IN A FREEZE SILICON ECO, WE CANNOT ADD CELLS; SPARE CELLS ARE USED INSTEAD.

-----------> IN A NON-FREEZE SILICON ECO, WE CAN ADD CELLS AFTER ROUTING.

ROUTING (SEARCH AND REPAIR)

SEARCH AND REPAIR :

----> SEARCH AND REPAIR FIXES THE REMAINING DRC VIOLATIONS THROUGH MULTIPLE
LOOPS USING PROGRESSIVELY LARGER SBOXES.

ROUTING (DETAIL ROUTING)

DETAIL ROUTING:

----> DETAIL ROUTING DOES THE ACTUAL ROUTING.

-----> THAT IS, IT LAYS DOWN THE ACTUAL METAL CONNECTIONS.

-----> IT ALSO CHECKS PHYSICAL DRC'S.

-----> DETAIL ROUTING DOES NOT WORK ON THE ENTIRE CHIP AT THE SAME TIME, AS TRACK
ASSIGNMENT DOES.

------> INSTEAD, IT WORKS BY REROUTING WITHIN THE CONFINES OF A SMALL AREA
CALLED AN "SBOX".

SBOX: THE BLOCK IS DIVIDED INTO MINI BOXES, WHICH ARE USED FOR DETAIL ROUTING.

ROUTING (TRACK ASSIGNMENT)

TRACK ASSIGNMENTS :
----> ASSIGNS EACH NET TO SPECIFIC TRACKS.

----> LAYS DOWN THE NETS AS METAL TRACES.

-----> TRACES = METAL CONNECTIVITY.

ROUTING (GLOBAL ROUTING)

GLOBAL ROUTING:

---> FIRST, THE DESIGN IS DIVIDED INTO SMALL BOXES; EACH BOX IS CALLED A GLOBAL
ROUTING CELL (GCELL OR BUCKET).

-----> EVERY GCELL HAS A NUMBER OF HORIZONTAL ROUTING RESOURCES AND
VERTICAL ROUTING RESOURCES.

-----> GLOBAL ROUTING ASSIGNS NETS (LOGICAL CONNECTIVITY, NOT METAL
CONNECTIVITY) TO SPECIFIC METAL LAYERS AND GLOBAL ROUTING CELLS.

------> GLOBAL ROUTING LETS US ANALYZE CONGESTION.

-------> CONGESTION = (REQUIRED ROUTING RESOURCES > AVAILABLE ROUTING
RESOURCES)

-------> IF A GCELL IS CONGESTED, THE ROUTER DETOURS (THE NET AVOIDS THAT GCELL
AND IS ROUTED THROUGH ANOTHER GCELL).
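The congestion definition above (required routing resources greater than available routing resources) can be computed per GCell as an overflow count. The Python sketch below sums the horizontal and vertical overflow and flags congested GCells as detour candidates. The 2 x 2 grid and the track numbers are hypothetical, and real global routers use more detailed per-layer capacity models.

```python
"""GCell congestion check: flag GCells whose required routing resources exceed
the available ones (overflow). Track counts below are hypothetical."""


def gcell_overflow(demand, capacity):
    """Return {gcell: overflow} for congested GCells.

    demand and capacity map (x, y) -> (horizontal_tracks, vertical_tracks).
    """
    report = {}
    for gcell, (h_req, v_req) in demand.items():
        h_cap, v_cap = capacity[gcell]
        overflow = max(0, h_req - h_cap) + max(0, v_req - v_cap)
        if overflow > 0:
            report[gcell] = overflow
    return report


if __name__ == "__main__":
    capacity = {(x, y): (10, 10) for x in range(2) for y in range(2)}
    demand = {(0, 0): (8, 9), (1, 0): (12, 10), (0, 1): (11, 13), (1, 1): (4, 5)}
    print(gcell_overflow(demand, capacity))  # congested GCells are detour candidates
```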

ROUTING

ROUTING:

----> CREATES PHYSICAL CONNECTIONS TO ALL DATA SIGNAL PINS AND CLOCK PINS
THROUGH METAL INTERCONNECTIONS.

----> PATHS MUST MEET TIMING.

ROUTING HAS THREE MAIN STAGES:

(i) GLOBAL ROUTING

(ii) TRACK ASSIGNMENT

(iii) DETAIL ROUTING

PLUS ONE EXTRA STAGE:
(iv) SEARCH AND REPAIR

CTS OPTIMIZATION

OPTIMIZATION TECHNIQUES:

BUFFERING -----------------> IMPROVES SETUP TIME.

GATE SIZING ---------------> INCREASING THE GATE SIZE (UPSIZING) MAY DECREASE DELAY.

DELAY INSERTION -----------> IMPROVES HOLD TIME.

BUFFER RELOCATION ---------> REDUCES SKEW AND INSERTION DELAY.

FIX MAX TRANSITION --------> ADD BUFFERS.

FIX MAX CAPACITANCE -------> DECREASE NET LENGTH, CLONING.

OPTIMIZATION PROCESS:

● REDUCE DISTURBANCES TO OTHER CELLS AS MUCH AS POSSIBLE.
● PERFORM LOGICAL AND PLACEMENT OPTIMIZATIONS TO FIX ALL POSSIBLE
TIMING VIOLATIONS.
● FIX MAX TRANS/CAP VIOLATIONS AND SKEW, BASED ON PROPAGATED CLOCK
ARRIVALS.

CTS (APPLYING NDRS ON CLOCK NETS)

NDR'S:

NDR'S ARE NON-DEFAULT ROUTING RULES.

THESE ARE APPLIED ON THE CLOCK NETS.

CLOCK NETS ARE HIGH SWITCHING ACTIVITY NETS, SO THEY ARE SENSITIVE TO CROSSTALK
AND ELECTROMIGRATION; NDRS MAKE THEM LESS SENSITIVE.

NDR RULES ARE (i) DOUBLE WIDTH,

(ii) DOUBLE SPACING,

(iii) SHIELDING.

BY APPLYING DOUBLE WIDTH, WE CAN AVOID THE ELECTROMIGRATION EFFECT.

BY APPLYING DOUBLE SPACING, WE CAN AVOID THE CROSSTALK EFFECT.

BY DEFAULT, THE NON-DEFAULT ROUTING RULE APPLIES ON ALL LEVELS OF THE CLOCK TREE, BUT
IT IS BETTER TO AVOID USING NDR RULES AT THE CLOCK SINK PINS. AVOIDING NDR AT THE SINKS:

HELPS TO AVOID CONGESTION AT LOWER METAL LAYERS

IMPROVES PIN ACCESSIBILITY OF STD CELLS

-----------> ALWAYS ROUTE THE CLOCK ON METAL 3 AND ABOVE.

-----------> AVOID NDR ON CLOCK SINKS.

-----------> AVOID NDR ON METAL 1.

-----> THE ROUTER MAY HAVE TROUBLE ACCESSING METAL 1 PINS ON BUFFERS AND GATES.

-----> CONSIDER DOUBLE WIDTH TO REDUCE RESISTANCE.

CTS (START POINTS AND END POINTS)


IN DESIGN WE HAVING

STATIC TIMING ANALYSIS


STATIC TIMING ANALYSIS DOES NOT DEPEND ON THE INPUTS, I.E. ON WHETHER A SIGNAL IS IN THE
ON STATE, THE OFF STATE, OR SWITCHING.

STATIC TIMING ANALYSIS DEPENDS PURELY ON THE DELAYS.

SETUP TIME: THE MINIMUM AMOUNT OF TIME THE DATA SHOULD BE STABLE BEFORE THE
ARRIVAL OF THE SENSITIVE CLOCK EDGE.

HOLD TIME: THE MINIMUM AMOUNT OF TIME THE DATA SHOULD BE STABLE AFTER THE
ARRIVAL OF THE SENSITIVE CLOCK EDGE.

SETUP CHECK: THE DATA LAUNCHED AT THE SENSITIVE EDGE OF THE LAUNCH FLOP SHOULD
BE CAPTURED AT THE NEXT SENSITIVE EDGE OF THE CAPTURE FLOP.

BELOW IS THE SETUP CHECK EQUATION (ASSUMING ZERO CLOCK SKEW):

Tcq + Tcomb < Tclk - Tsu

HOLD CHECK: THE DATA LAUNCHED AT THE SENSITIVE EDGE OF THE LAUNCH FLOP
SHOULD NOT BE CAPTURED AT THE SAME SENSITIVE EDGE OF THE CAPTURE FLOP.

BELOW IS THE HOLD CHECK EQUATION (ASSUMING ZERO CLOCK SKEW):

Tcq + Tcomb > Thold
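Turning the two check equations into slack values shows directly how much margin a path has: positive slack passes and negative slack fails. The Python sketch below keeps the same zero-skew assumption as the equations. The picosecond values are hypothetical.

```python
"""Setup and hold slack from the two check equations above (ideal clock,
zero skew assumed, as in the equations). All values in ps and hypothetical."""


def setup_slack(t_clk, t_su, t_cq, t_comb):
    """Positive when Tcq + Tcomb < Tclk - Tsu."""
    return (t_clk - t_su) - (t_cq + t_comb)


def hold_slack(t_cq, t_comb, t_hold):
    """Positive when Tcq + Tcomb > Thold."""
    return (t_cq + t_comb) - t_hold


if __name__ == "__main__":
    print("setup slack:", setup_slack(t_clk=1000, t_su=50, t_cq=100, t_comb=700))
    print("hold slack:", hold_slack(t_cq=50, t_comb=10, t_hold=80))
```

The second path has negative hold slack (its data path is too fast), which is the situation the hold fixes listed earlier, such as delay buffer insertion, are meant to address.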

IN THE BASIC TIMING DIAGRAM:

START POINTS ARE (i) INPUT PORT

(ii) CLOCK PIN OF THE LAUNCH FLOP.

END POINTS ARE (i) DATA INPUT PIN OF THE CAPTURE FLOP

(ii) OUTPUT PORT.

COMBINING THESE START AND END POINTS GIVES THE FOLLOWING PATHS:

1. INPUT PORT TO DATA INPUT PIN OF THE CAPTURE FLOP
2. CLOCK PIN OF THE LAUNCH FLOP TO DATA INPUT PIN OF THE CAPTURE FLOP
3. INPUT PORT TO OUTPUT PORT
4. CLOCK PIN OF THE LAUNCH FLOP TO OUTPUT PORT

BASED ON THE START POINTS AND END POINTS, FOUR TIMING PATH GROUPS ARE
PRESENT:

1. INPUT GROUP
2. REGISTER GROUP
3. FEED THROUGH GROUP
4. OUTPUT GROUP
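The four path types map one-to-one onto the four groups, in the same order as the two lists. The Python sketch below encodes that mapping and counts paths per group. The start/end type labels are hypothetical names chosen for this sketch, not tool keywords.

```python
"""Map a timing path's start/end point types to the four groups listed above.
The type labels are hypothetical names chosen for this sketch."""
from collections import Counter

PATH_GROUPS = {
    ("input_port", "capture_d_pin"): "INPUT GROUP",
    ("launch_clock_pin", "capture_d_pin"): "REGISTER GROUP",
    ("input_port", "output_port"): "FEED THROUGH GROUP",
    ("launch_clock_pin", "output_port"): "OUTPUT GROUP",
}


def path_group(start_type, end_type):
    try:
        return PATH_GROUPS[(start_type, end_type)]
    except KeyError:
        raise ValueError(f"not a valid start/end combination: {start_type} -> {end_type}") from None


if __name__ == "__main__":
    paths = [("input_port", "capture_d_pin"), ("launch_clock_pin", "capture_d_pin"),
             ("launch_clock_pin", "capture_d_pin"), ("launch_clock_pin", "output_port")]
    print(Counter(path_group(s, e) for s, e in paths))
```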

CTS (CLOCK TREE SYNTHESIS)

CLOCK TREE SYNTHESIS :

-----> CTS CONNECTS THE CLOCK TO ALL CLOCK PINS OF THE SEQUENTIAL
CIRCUITS.

-------> ALL CLOCK PINS ARE DRIVEN BY A SINGLE CLOCK SOURCE.

-------> CTS TARGETS: (i) SKEW,

(ii) INSERTION DELAY.

--------> CTS GOALS: (i) MAX TRANSITION,

(ii) MAX CAPACITANCE,

(iii) MAX FANOUT,

(iv) MAX BUFFER LEVELS.

-------> A BUFFER TREE IS BUILT TO BALANCE THE LOADS AND MINIMIZE THE SKEW.

--------> THE CLOCK TREE HAS BUFFER LEVELS BETWEEN THE CLOCK SOURCE AND THE CLOCK
SINKS (END POINTS).
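The two CTS targets can be read off the clock latencies at the sinks. Global skew is the difference between the latest and earliest sink arrivals. The Python sketch below also reports the latest arrival as the insertion delay, which is an assumption, since some flows quote an average instead. The sink names and latencies are hypothetical.

```python
"""Global skew and insertion delay from clock-sink latencies (hypothetical values, ps).

Skew = latest sink arrival - earliest sink arrival. Reporting the latest
arrival as the insertion delay is an assumption; some flows quote an average.
"""


def clock_tree_summary(sink_latency_ps):
    latencies = sink_latency_ps.values()
    latest, earliest = max(latencies), min(latencies)
    return {"skew": latest - earliest, "insertion_delay": latest}


if __name__ == "__main__":
    sinks = {"u_ff1/CK": 410, "u_ff2/CK": 395, "u_ff3/CK": 430}
    print(clock_tree_summary(sinks))
```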

--------> THE CTS STARTING POINT IS THE CLOCK SOURCE (DEFINED BY CREATE_CLOCK IN THE SDC).

--------> THE CTS END POINTS ARE THE CLOCK PINS OF SEQUENTIAL CELLS.

--------> CLOCK PINS ARE ALSO CALLED CLOCK SINKS.

--------> AT BLOCK LEVEL, THE CLOCK ROOT IS A PRIMARY PORT OF THE BLOCK.

--------> AT CHIP LEVEL, THE PRIMARY PORTS ARE PADS.

--------> CLOCK PINS ARE OF DIFFERENT TYPES: (i) STOP PINS,

(ii) FLOAT PINS,

(iii) EXCLUDE PINS,

(iv) NON-STOP PINS.

--------> STOP PINS: CTS OPTIMIZES THEM FOR THE CLOCK TREE TARGETS AND CLOCK TREE GOALS.

--------> FLOAT PIN: LIKE A STOP PIN, BUT WITH A DELAY ON THE CLOCK PIN (E.G. A MACRO'S
INTERNAL DELAY).

---------> EXCLUDE PIN: CTS IGNORES THE TARGETS BUT STILL FIXES CLOCK TREE DRC'S.

---------> NON-STOP PINS: NON-STOP PINS ARE PINS THROUGH WHICH CLOCK TREE
TRACING CONTINUES, AGAINST THE DEFAULT BEHAVIOR.

THE CLOCK PINS OF SEQUENTIAL ELEMENTS THAT ACT AS CLOCK DIVIDERS ARE
CONSIDERED NON-STOP PINS.

PLACEMENT OPTIMIIZATION
PLACEMENT OPTIMIZATION:

PLACEMENT OPTIMIZATION WITH WE HAVE OPTIONS (i)CONGESTION,(ii)AREA RECOVERY


,(iii)POWER,(iv)DFT,(v)TIMING.

BY USING THE CONGESTION OPTION WE CAN REDUCE THE CONGESTION.

BY USING THE POWER OPTION WE CAN REDUCE THE STATIC POWER


DISSIPATION,DYNAMIC POWER DISSIPATION.

BY USING THE AREA RECOVERY OPTION WE CAN REDUCE THE CELLS , POWER, TIMING.

BY USING THE DFT OPTION WE CAN REDUCE THE ROUTING RESOURCES USED, BY REORDERING THE SCAN
CHAINS.

IF TIMING IS CRITICAL, USE TIMING-DRIVEN PLACEMENT.

IF CONGESTION IS CRITICAL, USE CONGESTION-DRIVEN PLACEMENT.

PLACEMENT (POWER SET UP)

POWER SETUP:
THERE ARE TWO TYPES OF POWER DISSIPATION:

1. STATIC POWER DISSIPATION
2. DYNAMIC POWER DISSIPATION

STATIC POWER DISSIPATION:

STATIC POWER IS DISSIPATED DUE TO THE LEAKAGE CURRENT OF THE CELLS. IT OCCURS EVEN WHEN THE CELLS
ARE NOT SWITCHING (E.G., WHEN THEY ARE IN THE "OFF" STATE).

THE LEAKAGE IS DUE TO JUNCTION LEAKAGE, TUNNELING, AND SUBTHRESHOLD LEAKAGE.

TO REDUCE STATIC POWER DISSIPATION, REPLACE LVT CELLS WITH HVT CELLS WHEREVER TIMING ALLOWS.

HVT CELLS: HIGH Vt, SLOWER, LOW LEAKAGE.

LVT CELLS: LOW Vt, FASTER, HIGH LEAKAGE.

LVT CELLS ARE USED ON CRITICAL PATHS.

IN MOST ARCHITECTURES, POWER GATING IS USED TO REDUCE STATIC POWER DISSIPATION.
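The LVT-to-HVT replacement above can be sketched as a simple leakage-recovery pass: a cell is swapped only if its path slack can absorb the extra HVT delay. This is a minimal Python sketch with hypothetical cell records, delay penalties and leakage savings; real tools re-time the whole design after each swap, because one cell usually sits on many paths.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    vt: str                   # "LVT" or "HVT"
    slack_ns: float           # worst slack of any path through this cell
    hvt_penalty_ns: float     # extra delay if swapped to HVT (assumed value)
    leakage_saving_nw: float  # leakage saved by the swap (assumed value)


def swap_lvt_to_hvt(cells, margin_ns=0.0):
    """Swap LVT -> HVT where slack stays >= margin; return total leakage saved (nW).

    Simplification: only the swapped cell's own slack is updated; shared
    paths through other cells are not re-timed here.
    """
    saved = 0.0
    for cell in cells:
        if cell.vt == "LVT" and cell.slack_ns - cell.hvt_penalty_ns >= margin_ns:
            cell.vt = "HVT"
            cell.slack_ns -= cell.hvt_penalty_ns
            saved += cell.leakage_saving_nw
    return saved


if __name__ == "__main__":
    cells = [Cell("U1", "LVT", 0.5, 0.25, 10.0),    # non-critical: can be swapped
             Cell("U2", "LVT", 0.125, 0.25, 8.0)]   # critical path: stays LVT
    print(swap_lvt_to_hvt(cells), [c.vt for c in cells])
```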

DYNAMIC POWER DISSIPATION:

DYNAMIC POWER DISSIPATION IS DUE TO SHORT-CIRCUIT CURRENT, THE CHARGING OF INTERNAL AND LOAD
CAPACITANCES, AND HIGH SWITCHING ACTIVITY.

THERE ARE MANY TECHNIQUES FOR REDUCING DYNAMIC POWER DISSIPATION:

REDUCE THE LENGTHS OF NETS WITH HIGH TOGGLE RATES. THE TOGGLE RATES COME FROM THE SWITCHING
ACTIVITY FILE (.SAIF), WHICH IS PROVIDED BY THE SIMULATION TEAM.

TO DO THIS, CELLS CONNECTED BY HIGH-TOGGLE-RATE NETS ARE PLACED CLOSER TO EACH OTHER.

ANOTHER TECHNIQUE IS ADDING BUFFERS ON LONG NETS TO REDUCE THE HIGH COUPLING CAPACITANCE
(I.E., TO REDUCE THE LOAD CAPACITANCE SEEN BY THE DRIVER).

ANOTHER TECHNIQUE IS CONNECTING A NET WITH HIGH COUPLING CAPACITANCE TO THE LOW-CAPACITANCE PIN
OF THE CELL (PIN SWAPPING).

ANOTHER TECHNIQUE IS CLONING: CREATING A COPY OF THE SAME CELL AND MOVING SOME OF THE OUTPUT
LOADS TO THE COPY (SHARING THE LOAD).

ANOTHER TECHNIQUE IS CELL SIZING.

ANOTHER TECHNIQUE IS GATE-LEVEL LOGIC OPTIMIZATION.

IN MOST DESIGNS, CLOCK GATING IS USED TO REDUCE DYNAMIC POWER DISSIPATION.

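Placing cells on high-toggle-rate nets closer together amounts to minimizing an activity-weighted wirelength. A minimal Python sketch, assuming half-perimeter wirelength (HPWL) as the length estimate and toggle rates already read from the .SAIF file; the pin coordinates and rates are hypothetical.

```python
def hpwl(pins):
    """Half-perimeter wirelength (um) of a net from its pin (x, y) coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def weighted_wirelength(nets):
    """nets: list of (toggle_rate, [(x, y), ...]); returns sum(toggle_rate * HPWL)."""
    return sum(rate * hpwl(pins) for rate, pins in nets)


if __name__ == "__main__":
    # Same two nets, two candidate placements. In B the cells on the
    # high-toggle net (rate 0.8) are closer; the low-toggle net got longer.
    placement_a = [(0.8, [(0, 0), (100, 40)]), (0.05, [(10, 10), (20, 10)])]
    placement_b = [(0.8, [(0, 0), (20, 10)]), (0.05, [(10, 10), (100, 40)])]
    print(weighted_wirelength(placement_a), weighted_wirelength(placement_b))
```

Placement B has more total wire than A's low-activity net but a much lower weighted cost, which is what a power-aware placer prefers.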
PLACEMENT (DFT SETUP)

DFT SETUP:

SCAN CHAINS: A SCAN CHAIN IS A GROUP OF SCAN REGISTERS CONNECTED SERIALLY.

THE REGISTERS ARE CONNECTED IN ALPHANUMERIC ORDER.

THERE ARE TWO MODES: (i) FUNCTIONAL MODE, (ii) TEST MODE.

THE MODE IS SELECTED BY A MUX AT THE DATA INPUT OF EACH SCAN REGISTER.

TEST MODE IS USED WHEN THE CHIP IS TESTED.

DFT (DESIGN FOR TESTABILITY) IS ONE OF THE STEPS IN THE ASIC FLOW.

THE SCAN INPUT IS SI AND THE SCAN OUTPUT IS SO.

THERE IS A PROBLEM WITH PREEXISTING SCAN CHAINS:

CONSECUTIVE REGISTERS IN A PREEXISTING CHAIN CAN BE PLACED FAR AWAY FROM EACH OTHER, BECAUSE THE
CHAINS WERE CONNECTED BASED ON THE FUNCTIONALITY, NOT ON THE PLACEMENT.

CONNECTING THEM THEREFORE USES MORE ROUTING RESOURCES, WHICH CAUSES CONGESTION.

LOAD THE SCAN CHAIN FILE. IF THERE IS A PROBLEM WITH THE PREEXISTING SCAN CHAINS, REORDER THE
SCAN REGISTERS.

IT ALSO REDUCES THE HOLD TIME.

SCAN CHAIN INFORMATION IS PRESENT IN THE .scandef FILE.

IF THE NETLIST IS GIVEN IN .ddc FORMAT, THERE IS NO NEED TO LOAD THE .scandef FILE (THE .ddc
ALREADY CONTAINS THE SCAN CHAIN INFORMATION).

IF THE NETLIST IS GIVEN IN .v FORMAT, WE HAVE TO LOAD THE .scandef FILE.
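Placement-aware reordering can be illustrated with a greedy nearest-neighbour heuristic: starting from the scan-in (SI) location, the next register in the chain is always the closest unvisited one. This is a minimal Python sketch with hypothetical register names and coordinates; real reordering also respects scan partitions, clock domains and lock-up latches.

```python
def chain_length(order, pos, scan_in):
    """Manhattan wirelength SI -> first flop -> ... -> last flop, in um."""
    length, (cx, cy) = 0.0, scan_in
    for name in order:
        x, y = pos[name]
        length += abs(x - cx) + abs(y - cy)
        cx, cy = x, y
    return length


def greedy_reorder(pos, scan_in):
    """Nearest-neighbour chain: always hop to the closest unvisited flop."""
    remaining = set(pos)
    order, (cx, cy) = [], scan_in
    while remaining:
        # Ties are broken by name so the result is deterministic.
        nxt = min(remaining,
                  key=lambda n: (abs(pos[n][0] - cx) + abs(pos[n][1] - cy), n))
        order.append(nxt)
        remaining.remove(nxt)
        cx, cy = pos[nxt]
    return order


if __name__ == "__main__":
    flops = {"reg_a": (100, 0), "reg_b": (0, 10), "reg_c": (50, 0), "reg_d": (5, 5)}
    si = (0, 0)
    alphanumeric = sorted(flops)
    reordered = greedy_reorder(flops, si)
    print(alphanumeric, chain_length(alphanumeric, flops, si))
    print(reordered, chain_length(reordered, flops, si))
```

For this made-up placement the alphanumeric chain needs 320 um of scan wiring and the reordered chain 120 um.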

PLACEMENT

THE PLACEMENT STEPS ARE:

1. PLACEMENT CHECKS
2. AHFNS
3. DFT SETUP
4. POWER SETUP
5. PLACEMENT OPTIMIZATION

PLACEMENT:

ON ENTERING THE PLACEMENT STAGE, WE HAVE TO CHECK AND FIX:

1. FIX THE MACRO PLACEMENT (AGAIN).
2. VERIFY THE P-NET OPTIONS AND THE IGNORED ROUTING LAYERS.
3. VERIFY THE KEEPOUT VARIABLE SETTINGS.
4. SPECIFY THE NON-DEFAULT ROUTING RULES.
5. CHECK PLACEMENT READINESS.

-->FIX MACRO PLACEMENT AGAIN: AFTER THE DESIGN IS LOADED, CHECK WHETHER ANY MACROS HAVE MOVED.

-->VERIFY THE P-NET OPTIONS AND THE IGNORED ROUTING LAYERS AS WELL.

-->MAINTAIN THE KEEPOUT VARIABLE SETTINGS IN THE FURTHER STEPS AS WELL.

-->NON-DEFAULT RULES ARE SPECIAL ROUTING RULES, SUCH AS DOUBLE SPACING AND DOUBLE WIDTH. THEY ARE
APPLIED TO CLOCK WIRES, BECAUSE CLOCK NETS ARE HIGH-ACTIVITY NETS.

-->AT THIS STAGE WE ONLY SPECIFY THE NON-DEFAULT ROUTING RULES (NDRs).

--->THE NDRs ARE SPECIFIED NOW TO AVOID CONGESTION AND TIMING PROBLEMS AT THE CLOCK TREE SYNTHESIS
STAGE.

-->IN CHECK PLACEMENT READINESS, WE CHECK:

1. FLOOR PLAN
2. NETLIST
3. NARROW PLACEMENT REGIONS
4. R, C FOR ROUTING LAYERS
5. DESIGN CONSTRAINTS

AHFNS (AUTOMATIC HIGH FANOUT NET SYNTHESIS):


● HFNS IS DONE FOR RESET, SCAN ENABLE, AND OTHER HIGH-FANOUT NETS.
● HIGH-FANOUT NETS ARE ALSO SYNTHESIZED IN THE FRONT END, BUT AT THAT POINT NO PLACEMENT
INFORMATION FOR THE STANDARD CELLS IS AVAILABLE.
● HENCE THE BACKEND TOOL COLLAPSES (REMOVES) THE BUFFER TREES OF THE SYNTHESIZED HFNs.
● IT RESYNTHESIZES THE HFNs BASED ON THE PLACEMENT INFORMATION AND INSERTS BUFFERS
APPROPRIATELY.
● THE TARGET OF THIS SYNTHESIS IS TO MEET THE DELAY REQUIREMENTS, I.E. SETUP AND HOLD.
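How many buffers such a tree needs can be sketched by splitting the sinks level by level under a max-fanout limit. A minimal Python sketch, assuming a pure fanout constraint (the value 16 and the sink count are hypothetical); real HFNS also uses the cell locations, max transition/capacitance and the setup/hold requirements listed above.

```python
def buffer_tree(sinks: int, max_fanout: int):
    """Buffers per level, from the level driving the sinks up toward the root."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be >= 2")
    levels = []
    loads = sinks
    while loads > max_fanout:            # current driver cannot drive this many loads
        buffers = -(-loads // max_fanout)  # ceiling division
        levels.append(buffers)
        loads = buffers                  # the next level up must drive these buffers
    return levels


if __name__ == "__main__":
    # e.g. a reset net with 5000 flop sinks and an assumed max fanout of 16
    print(buffer_tree(5000, 16))
```

The original driver then drives the top level of the tree, whose size is always within `max_fanout`.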

FLOORPLAN(TIMING)

FLOOR PLAN [TIMING]:

IN THE FLOOR PLAN, TIMING IS ALSO IMPORTANT.

1. BEFORE ANALYZING TIMING, PERFORM GLOBAL ROUTING AND ANALYZE CONGESTION.
2. FROM THE GLOBAL ROUTE, EXTRACT APPROPRIATE R, C VALUES.
3. IF THERE IS CONGESTION IN THE DESIGN, GO BACK TO THE CONGESTION STEP AND MODIFY THE P-NET
OPTIONS FROM FULL TO PARTIAL.
4. PERFORM GLOBAL ROUTING AND ANALYZE CONGESTION AGAIN.
5. IF THERE IS STILL CONGESTION IN THE DESIGN, MODIFY THE P-NET OPTIONS FROM PARTIAL TO COMPLETE.
6. EXTRACT THE R, C VALUES AND ANALYZE THE TIMING.

EXTRACT THE PARASITIC NET R, C VALUES AND GENERATE A TIMING REPORT.

OPTIMIZE TIMING (DEFAULT EFFORT). IF THE TIMING IS NOT ACCEPTABLE, REPEAT GLOBAL ROUTING AND
ANALYZE CONGESTION AND TIMING AGAIN.

IF IT IS STILL NOT ACCEPTABLE, PERFORM TIMING OPTIMIZATION WITH HIGH EFFORT.

IF THE TIMING IS STILL NOT ACCEPTABLE, MODIFY THE FLOOR PLAN OR RESYNTHESIZE THE DESIGN.

ONCE CONGESTION AND TIMING ARE ACCEPTABLE, WRITE OUT THE .def FILE AND SAVE THE DESIGN. THIS .def
FILE IS GIVEN AS INPUT TO PLACEMENT.
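The escalation order above (default effort, then high effort, then floor plan changes or resynthesis) can be sketched as a small control loop. `optimize` is a hypothetical callback standing in for the tool's timing optimization run, including the repeated global route and R, C extraction; it returns the worst negative slack (WNS) in ns.

```python
from typing import Callable


def floorplan_timing_closure(optimize: Callable[[str], float],
                             wns_target_ns: float = 0.0) -> str:
    """Try timing optimization at increasing effort and report the next action."""
    for effort in ("default", "high"):
        wns = optimize(effort)  # hypothetical: re-route, extract R/C, optimize, return WNS
        if wns >= wns_target_ns:
            return f"timing met at {effort} effort: write out .def and save the design"
    return "timing not met: modify the floor plan or resynthesize"


if __name__ == "__main__":
    # Stub results: default effort leaves -0.1 ns, high effort reaches +0.02 ns.
    fake_results = {"default": -0.1, "high": 0.02}
    print(floorplan_timing_closure(fake_results.__getitem__))
```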

FLOOR PLAN(VIRTUAL FLAT PLACEMENT)


VIRTUAL FLAT PLACEMENT:

1. APPLY THE PLACEMENT STRATEGY PARAMETERS.
2. PERFORM VIRTUAL FLAT PLACEMENT.

THE PLACEMENT STRATEGY PARAMETERS ARE: (i) VIPO (VIRTUAL IN-PLACE OPTIMIZATION), (ii) CONGESTION
EFFORT, (iii) SLIVER SIZE, (iv) MACRO PLACEMENT, (v) OPTIMIZATION ALGORITHMS AND EFFORT.

VIRTUAL FLAT PLACEMENT MEANS A QUICK, VIRTUAL PLACEMENT OF THE STD CELLS SO THAT THE CONGESTION
AND TIMING CAN BE ANALYZED.

FLOORPLAN (CONGESTION)

CONGESTION: THE REQUIRED NUMBER OF ROUTING RESOURCES IS GREATER THAN THE NUMBER OF AVAILABLE
ROUTING RESOURCES.
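The definition above is usually evaluated per global-routing cell (GCell): overflow = max(0, demand - capacity), in routing tracks. A minimal Python sketch with a made-up 2 x 2 grid; real global routers track capacity separately per routing layer and direction.

```python
def congestion_report(demand, capacity):
    """Return (total overflow, [((row, col), overflow), ...]) over a GCell grid."""
    total, hot_spots = 0, []
    for r, (d_row, c_row) in enumerate(zip(demand, capacity)):
        for c, (d, cap) in enumerate(zip(d_row, c_row)):
            overflow = max(0, d - cap)   # tracks needed beyond the tracks available
            if overflow:
                hot_spots.append(((r, c), overflow))
            total += overflow
    return total, hot_spots


if __name__ == "__main__":
    demand = [[8, 12], [10, 4]]      # tracks needed by the nets in each GCell
    capacity = [[10, 10], [10, 10]]  # tracks available in each GCell
    print(congestion_report(demand, capacity))
```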

1. FOR CONGESTION ANALYSIS, FIRST PERFORM GLOBAL ROUTING.
2. CALCULATE THE CONGESTION FROM THE GLOBAL ROUTE.
3. ANALYZE THE CONGESTION (CONGESTION MEANS ROUTING PROBLEMS, WHICH MAY CAUSE SHORTS).
4. IF CONGESTION IS PRESENT, MODIFY THE PLACEMENT STRATEGY PARAMETERS SUCH AS BLOCKAGES, OFFSETS,
KEEPOUT MARGINS, SLIVER SIZE, AND MACRO AND STD CELL CONSTRAINTS.
5. PERFORM CONGESTION-DRIVEN VIRTUAL FLAT PLACEMENT. CONGESTION-DRIVEN MEANS SPREADING THE STD
CELLS FURTHER APART IN CONGESTED AREAS.
6. REANALYZE THE CONGESTION. IF THE CONGESTION IS STILL NOT ACCEPTABLE:
7. PERFORM HIGH-EFFORT CONGESTION-DRIVEN VIRTUAL FLAT PLACEMENT.
8. REANALYZE THE CONGESTION.
9. IF THE CONGESTION IS STILL NOT ACCEPTABLE:
10. MODIFY THE FLOOR PLAN.
11. ONCE THE CONGESTION IS ACCEPTABLE:
12. FIX THE MACRO PLACEMENT.

CONGESTION CAUSES:

1. MISSING PLACEMENT BLOCKAGES
2. IMPROPER MACRO PLACEMENT AND MACRO CHANNELS
3. HIGH CELL DENSITY (HIGH LOCAL UTILIZATION)
4. A VERY ROBUST (DENSE) POWER NETWORK
5. EXCESS STACKED POWER VIAS
6. HIGH PIN DENSITY OF CELLS AND MACROS
7. PORTS

MORE FIXES:

1. HIGH CELL DENSITY PROBLEM ----> REDUCE THE UTILIZATION IN THE SPECIFIED CO-ORDINATE REGION
2. PARTIAL PLACEMENT BLOCKAGES
3. MAX UTILIZATION %
4. INCREASE THE SPACING BETWEEN MACROS
5. FLIP MACROS (DO NOT ROTATE THEM BY 90 DEGREES)
6. MODIFY THE KEEPOUT CONSTRAINTS
7. SPECIFY A HARD KEEPOUT CHANNEL WIDTH
POWER PLANNING

IN POWER PLANNING:

IR DROP: WHEN CURRENT FLOWS THROUGH A METAL WIRE, A VOLTAGE DROP OCCURS DUE TO THE RESISTANCE OF
THE METAL. THIS IS KNOWN AS IR DROP.

THERE ARE TWO TYPES OF IR DROP: (i) STATIC IR DROP, (ii) DYNAMIC IR DROP.

STATIC IR DROP: THE DROP IS CALCULATED FROM THE WIRE RESISTANCE (USING AVERAGE CURRENTS),
INDEPENDENT OF CELL SWITCHING.

IMPROVE STATIC IR DROP: (i) INCREASE THE WIRE WIDTH, OR (ii) INCREASE THE NUMBER OF WIRES.

DYNAMIC IR DROP: THE IR DROP IS CALCULATED FROM THE SWITCHING OF THE CELLS.

IMPROVE DYNAMIC IR DROP: (i) PLACE DECAP CELLS IN BETWEEN THE SWITCHING CELLS, (ii) INCREASE THE
NUMBER OF STRAPS.

ELECTROMIGRATION: WHEN A HIGH CURRENT DENSITY FLOWS CONTINUOUSLY THROUGH A METAL WIRE, THE
ELECTRONS TRANSFER MOMENTUM TO THE METAL ATOMS AND GRADUALLY DISPLACE THEM. OVER TIME THIS DAMAGES
THE METAL (VOIDS CAN CAUSE OPENS, HILLOCKS CAN CAUSE SHORTS).

IMPROVE: INCREASE THE METAL WIDTH (THIS REDUCES THE CURRENT DENSITY).
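The static IR drop fixes above follow from R = Rs * L / W. A minimal Python sketch, assuming a lumped worst case in which the whole current flows through the whole wire length; the sheet resistance, length, width and current values are hypothetical.

```python
def wire_resistance_ohm(rs_ohm_per_sq, length_um, width_um):
    """R = Rs * (L / W): sheet resistance times the number of squares."""
    return rs_ohm_per_sq * length_um / width_um


def static_ir_drop_mv(current_ma, rs_ohm_per_sq, length_um, width_um, n_parallel=1):
    """Lumped worst-case drop; the current is shared by n_parallel identical wires.

    Units: mA * ohm = mV.
    """
    r_ohm = wire_resistance_ohm(rs_ohm_per_sq, length_um, width_um) / n_parallel
    return current_ma * r_ohm


if __name__ == "__main__":
    base = static_ir_drop_mv(20, 0.04, 1000, 10)            # 10 um wide wire
    wider = static_ir_drop_mv(20, 0.04, 1000, 20)           # fix (i): double the width
    more = static_ir_drop_mv(20, 0.04, 1000, 10, n_parallel=2)  # fix (ii): two wires
    print(base, wider, more)
```

Doubling the width and doubling the number of wires both halve the resistance, so both halve the drop in this simple model.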

1. SAVE THE DESIGN BEFORE STARTING THE POWER PLAN.
2. DEFINE THE LOGICAL P/G CONNECTIONS.
3. APPLY THE POWER NETWORK CONSTRAINTS.
4. THE POWER NETWORK CONSTRAINTS ARE: (i) NUMBER OF POWER STRAPS, (ii) POWER STRAP WIDTH,
(iii) NUMBER OF POWER PADS, (iv) POWER RING WIDTH.
5. SYNTHESIZE THE POWER NETWORK AND ANALYZE IT.
6. ANALYZE THE POWER NETWORK: (i) P/G NET PAIRS, (ii) POWER BUDGET OF THE SYNTHESIZED NETS,
(iii) REQUIRED NUMBER OF STRAPS (PNS, POWER NETWORK SYNTHESIS, CALCULATES IT FROM THE PROVIDED
CONSTRAINTS), (iv) IR DROP, (v) ELECTROMIGRATION.
7. ANALYZE THE IR DROP.
8. IF THE IR DROP IS TOO HIGH, MODIFY THE POWER NETWORK CONSTRAINTS AND RESYNTHESIZE THE POWER
NETWORK.
9. IF THE IR DROP IS STILL NOT ACCEPTABLE, ADD P/G PADS.
10. COMMIT THE POWER NETWORK. THE STRAPS AND RINGS ARE NOW ROUTED, SO THE POWER NETWORK CAN NO
LONGER BE CHANGED BY RESYNTHESIS.
11. CONNECT THE MACRO P/G PINS AND THE PAD P/G PINS TO THE CORE RINGS.
12. CREATE POWER RAILS ALONG THE STD CELL ROWS.
13. REANALYZE THE IR DROP. IF THE STRAPS ARE NOT SUFFICIENT, ADD MORE.
14. APPLY P-NET OPTIONS. WHEN THE POWER STRAPS ARE IN METAL 7, THEY ARE CONNECTED TO THE POWER
RAILS THROUGH VIAS. IF ANY CELLS ARE PLACED IN THAT AREA, SHORTS CAN OCCUR. P-NET OPTIONS ARE
ADDED TO AVOID THIS PROBLEM.
15. AFTER THIS, PERFORM INCREMENTAL PLACEMENT, WHICH MOVES THE AFFECTED CELLS.

POWER PLANNING IS ALSO CALLED PREROUTE, BECAUSE THE POWER NETS ARE ROUTED FIRST IN THE CHIP.


POWER CALCULATIONS:

----->NUMBER OF CORE POWER PADS REQUIRED FOR EACH SIDE OF THE CHIP = (TOTAL CORE POWER) /
{(NUMBER OF SIDES) * (CORE VOLTAGE) * (MAXIMUM ALLOWABLE CURRENT FOR AN I/O PAD)}

----->CORE RING WIDTH:

CORE CURRENT (mA) = (CORE POWER) / (CORE VOLTAGE)

CORE P/G RING WIDTH = (TOTAL CORE CURRENT) / {(NO. OF SIDES) * (MAXIMUM CURRENT DENSITY OF THE
METAL LAYER USED FOR THE P/G RING)}
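The two formulas above, written as a minimal Python sketch. Units are assumptions (power in mW, voltage in V, current in mA, current density in mA/um, so mW / V = mA), and rounding the pad count up to a whole pad is added here; all input numbers are hypothetical.

```python
import math


def core_power_pads_per_side(core_power_mw, sides, core_voltage_v, max_pad_current_ma):
    """(total core power) / (sides * core voltage * max current per pad), rounded up."""
    return math.ceil(core_power_mw / (sides * core_voltage_v * max_pad_current_ma))


def core_ring_width_um(core_power_mw, core_voltage_v, sides, max_current_density_ma_per_um):
    """Core P/G ring width = core current / (sides * max current density)."""
    core_current_ma = core_power_mw / core_voltage_v   # CORE CURRENT = P / V
    return core_current_ma / (sides * max_current_density_ma_per_um)


if __name__ == "__main__":
    # Hypothetical: 2 W core at 1.0 V, 4 sides, 50 mA per pad, 2 mA/um ring metal
    print(core_power_pads_per_side(2000, 4, 1.0, 50))
    print(core_ring_width_um(2000, 1.0, 4, 2.0))
```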

------->MAXIMUM CURRENT DENSITY: Rj (mA/µm).

-------->SHEET RESISTANCE: Rs (OHMS/SQUARE).

-------->TOTAL CURRENT = TOTAL POWER CONSUMPTION OF THE CHIP (P) / VOLTAGE (V).

-------->NO. OF POWER PADS (Npads) = Itotal / Ip

------->Itotal = TOTAL CURRENT

------->Ip = MAXIMUM CURRENT PER PAD, OBTAINED FROM THE I/O LIBRARY SPECIFICATION.

-------->NO. OF POWER PINS = Itotal / Ip

-------->MAXIMUM CURRENT SPECIFICATION OF EACH METAL LAYER, FROM THE LIBRARY (Rj).

---------->TOTAL METAL WIDTH REQUIRED ON LAYER1 = LAYER2:

Wtotalstrap = Itotal / (2 * Rj)

----------->ASSUMING A SPACING Lspace (= L) BETWEEN STRAPS:

L < Vmax / (Rj * Rs)

Vmax = MAX ALLOWABLE IR DROP

Rj = MAX CURRENT DENSITY

Rs = SHEET RESISTANCE

---------->TOTAL CORE AREA = Wcore * Hcore

Hcore = CORE HEIGHT

Wcore = CORE WIDTH

----------->NUMBER OF VERTICAL STRAPS = Nv = Wcore / L

----------->NUMBER OF HORIZONTAL STRAPS = Nh = Hcore / (2 * L)

------------>MIN STRAP WIDTH REQUIRED = Wring / (Nv * Nh)

IR DROP:

------>AVG CURRENT THROUGH EACH STRAP = IstrapAvg = Itotal / (2 * Nstraps) mA

-------->APPROXIMATE IR DROP AT THE CENTER OF THE STRAP = Vdrop (IRdrop)

= IstrapAvg * Rs * (W / 2) * (1 / Wstrap)

(W = LENGTH OF THE STRAP, SO W/2 IS THE DISTANCE FROM THE STRAP END TO ITS CENTER; Wstrap = STRAP WIDTH)

--------->NUMBER OF STRAPS BETWEEN TWO POWER PADS:

Nstrappinspace = Dpadspacing / Lspace

---------->MIN RING WIDTH = Wring = Ip / Rj µm
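The strap-planning formulas above as a minimal Python sketch. Units are assumptions (um, mA, mV, ohm/sq, with Rj in mA/um so that Rj * Rs is in mV/um); taking Nstraps = Nv + Nh and rounding strap counts up are also assumptions, and the input numbers are hypothetical.

```python
import math


def max_strap_spacing_um(vmax_mv, rj_ma_per_um, rs_ohm_per_sq):
    """L < Vmax / (Rj * Rs)."""
    return vmax_mv / (rj_ma_per_um * rs_ohm_per_sq)


def strap_counts(wcore_um, hcore_um, spacing_um):
    """Nv = Wcore / L and Nh = Hcore / (2 * L), rounded up to whole straps."""
    nv = math.ceil(wcore_um / spacing_um)
    nh = math.ceil(hcore_um / (2 * spacing_um))
    return nv, nh


def ir_drop_at_strap_center_mv(itotal_ma, nstraps, rs_ohm_per_sq,
                               strap_length_um, wstrap_um):
    """IstrapAvg = Itotal / (2 * Nstraps); Vdrop = IstrapAvg * Rs * (W / 2) / Wstrap."""
    istrap_avg_ma = itotal_ma / (2 * nstraps)
    return istrap_avg_ma * rs_ohm_per_sq * (strap_length_um / 2) / wstrap_um


def min_ring_width_um(ip_ma, rj_ma_per_um):
    """Wring = Ip / Rj."""
    return ip_ma / rj_ma_per_um


if __name__ == "__main__":
    l_max = max_strap_spacing_um(vmax_mv=40, rj_ma_per_um=1.0, rs_ohm_per_sq=0.02)
    nv, nh = strap_counts(4000, 4000, spacing_um=1000)  # chosen spacing, below l_max
    drop = ir_drop_at_strap_center_mv(600, nv + nh, 0.02, 4000, 40)
    print(l_max, nv, nh, drop, min_ring_width_um(100, 2.0))
```

Choosing a spacing comfortably below `l_max` leaves margin for the dynamic IR drop, which these static formulas do not cover.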

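POWER

-------->TOTAL POWER = STATIC POWER + DYNAMIC POWER

= LEAKAGE POWER + [INTERNAL POWER + EXTERNAL SWITCHING POWER]

= LEAKAGE POWER + [(SHORT-CIRCUIT POWER + INTERNAL CAPACITANCE POWER) + EXTERNAL SWITCHING POWER]

= LEAKAGE POWER + [(Vdd * Isc) + (Cint * V * V * F) + (1/2 * S * C * V * V * F)]

Isc = SHORT-CIRCUIT CURRENT

Cint = INTERNAL CAPACITANCE OF THE CELL

C = LOAD CAPACITANCE

V = SUPPLY VOLTAGE (Vdd), F = CLOCK FREQUENCY

S = SWITCHING ACTIVITY FACTOR.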

FLOOR PLAN (PAD CELLS)

IN THE FLOOR PLAN:
1. CREATE THE PHYSICAL-ONLY PAD CELLS. PHYSICAL-ONLY CELLS HAVE ONLY PHYSICAL INFORMATION: NO
LOGICAL INFORMATION AND NO TIMING INFORMATION.
2. THE PHYSICAL-ONLY PAD CELLS ARE (i) VDD/VSS PAD CELLS, (ii) CORNER PAD CELLS.
3. PAD CELLS ACT AS PORTS AT THE CHIP LEVEL.
4. THE PINS OUTSIDE THE CHIP ARE CONNECTED TO THE CHIP PADS.
5. PAD TYPES: (i) POWER PADS, (ii) DATA PADS.
6. A PAD POWER RING IS CREATED TO SUPPLY POWER TO ALL THE PADS.
7. THE VDD/VSS PADS ARE CONNECTED TO THE CORE VDD/VSS POWER RINGS.
8. THE GAPS BETWEEN THE PADS ARE FILLED WITH PAD FILLER CELLS.
9. THE PAD FILLER CELLS PROVIDE WELL CONTINUITY.

PHYSICAL ONLY CELLS ARE:

1. PAD CELLS.
2. END CAP CELLS.
3. TAP CELLS.
4. DECAP CELLS.

MACRO PLACEMENT (GUIDE LINES)

Macro Placement Depends On:

1. FLY LINES
2. PORT COMMUNICATION
3. MACROS ARE PLACED AT THE BOUNDARIES --> UNIFORM AREA FOR STD CELLS
4. MACRO GROUPING [LOGICAL HIERARCHY]
5. SPACING BETWEEN MACROS
6. MACRO ALIGNMENT
7. AVOIDING NOTCHES
8. ORIENTATION
9. BLOCKAGES
10. AVOIDING CRISS-CROSS PLACEMENT OF MACROS

● MACROS ARE ROTATED AS REQUIRED TO OPTIMIZE WIRE LENGTH DURING AUTOMATIC MACRO PLACEMENT.
● TYPICALLY, MACROS ARE PLACED AROUND THE EDGES OF BLOCKS, KEEPING A LARGE MAIN AREA FOR STD
CELLS.
● LEAVE A HALO SPACE AROUND MACROS ON ALL SIDES.
● FOR THE NON-PIN SIDES OF MACROS, A MINIMAL SEPARATION IS ADEQUATE.
● FOR THE PIN SIDES OF MACROS, A LARGER SEPARATION IS APPROPRIATE.
● ALLOW CHANNELS FOR ROUTING, PIN ACCESS AND POSSIBLE BUFFER INSERTION.
● LEAVE SPACE BETWEEN THE MACROS AND THE EDGE OF THE CHIP/BLOCK, TO ALLOW BUFFER INSERTION AND
POWER STRIPES THAT FEED THE STD CELL ROWS BETWEEN THE MACRO AND THE BLOCK EDGE.

CALCULATION FOR THE DISTANCE BETWEEN MACROS:

DISTANCE BETWEEN MACROS = (NO. OF PINS × PITCH) / (AVAILABLE LAYERS / TOTAL LAYERS)
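The macro-spacing formula above, transcribed literally as a minimal Python sketch. The pin count, routing pitch and layer counts are hypothetical; the sketch only evaluates the formula as written and does not model pin distribution across macro sides.

```python
def macro_channel_width_um(num_pins, pitch_um, available_layers, total_layers):
    """distance = (no. of pins * pitch) / (available layers / total layers)."""
    if not 0 < available_layers <= total_layers:
        raise ValueError("need 0 < available_layers <= total_layers")
    return (num_pins * pitch_um) / (available_layers / total_layers)


if __name__ == "__main__":
    # Hypothetical: 100 pins facing the channel, 0.2 um pitch, 4 of 8 layers usable
    print(macro_channel_width_um(100, 0.2, 4, 8))
```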

FLOOR PLAN:
AT CHIP LEVEL:

FLOOR PLANNING IS THE STEP WHERE WE CREATE THE PAD CELLS,

SPECIFY THEIR POSITIONS, AND PLACE THEM.

WE ALSO INSERT PAD FILLER CELLS FOR WELL CONTINUITY.

WELL CONTINUITY: IF THE WELL IS NOT CONTINUOUS, SPECIAL MASKS HAVE TO BE CREATED.

IF THE WELL IS CONTINUOUS, THERE IS NO NEED TO CREATE SPECIAL MASKS.

THE MOST IMPORTANT PART OF THE FLOOR PLAN IS MACRO PLACEMENT.

MACROS ARE IPs AND MEMORIES.

A LARGE CIRCUIT DOES NOT NEED TO BE DESIGNED FROM SCRATCH EVERY TIME:

IT IS AVAILABLE IN THE MARKET IN THE FORM OF A MACRO OR IP.

THERE ARE TWO TYPES OF MACROS: (i) HARD MACROS, (ii) SOFT MACROS.

HARD MACRO: THE CIRCUIT IS FIXED, AND WE DON'T KNOW WHICH GATES ARE USED INSIDE. WE KNOW ONLY
THE TIMING INFORMATION; WE DON'T KNOW THE FUNCTIONALITY.

SOFT MACRO: THE CIRCUIT IS NOT FIXED. WE KNOW WHICH GATES ARE USED INSIDE, AND WE KNOW BOTH THE
TIMING AND THE FUNCTIONALITY.

IN THE FLOOR PLAN WE ALSO CREATE BLOCKAGES.

BLOCKAGES: IF WE WANT AN AREA WHERE NO STD CELL IS PLACED, WE USE A BLOCKAGE.

THERE ARE TWO TYPES OF BLOCKAGES: (i) SOFT BLOCKAGES, (ii) HARD BLOCKAGES.

SOFT BLOCKAGES: NO STD CELLS ARE PLACED DURING INITIAL PLACEMENT, BUT BUFFERS CAN BE PLACED
DURING OPTIMIZATION. THEY ARE USED (i) BETWEEN TWO MACROS AND (ii) BETWEEN A MACRO AND THE BORDER.

HARD BLOCKAGES: NO STD CELLS ARE PLACED AT ALL. THEY ARE USED AROUND THE MACROS, TO KEEP THE PIN
ACCESS CLEAR.

THE MAIN OBJECTIVES OF THE FLOOR PLAN ARE: MACRO PLACEMENT,

DEFINING THE ASPECT RATIO (HEIGHT/WIDTH),

I/O PLACEMENT,

AND CORE AREA INITIALIZATION.

CORE AREA: THE AREA DEFINED FOR PLACING THE STD CELLS AND MACROS.

THE CORE AREA DEPENDS ON (i) THE ASPECT RATIO AND (ii) THE UTILIZATION.

UTILIZATION = (STD CELL AREA + MACRO AREA + BLOCKAGE AREA) / TOTAL AREA.

STD CELL UTILIZATION = (STD CELL AREA) / (TOTAL CORE AREA - (MACRO AREA + BLOCKAGE AREA)).

STD CELLS ARE PLACED IN ROWS.
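The utilization formulas above, plus a core sizing helper, as a minimal Python sketch. `core_size` is a hypothetical helper: it picks a width and height whose product equals (area to place) / (target utilization) and whose ratio height / width equals the aspect ratio. Areas are in um^2 and all numbers are hypothetical.

```python
import math


def utilization(std_cell_area, macro_area, blockage_area, total_area):
    return (std_cell_area + macro_area + blockage_area) / total_area


def std_cell_utilization(std_cell_area, macro_area, blockage_area, total_core_area):
    return std_cell_area / (total_core_area - (macro_area + blockage_area))


def core_size(placeable_area, target_utilization, aspect_ratio):
    """Return (width, height): width * height = area / utilization, height / width = aspect ratio."""
    core_area = placeable_area / target_utilization
    width = math.sqrt(core_area / aspect_ratio)
    return width, aspect_ratio * width


if __name__ == "__main__":
    print(utilization(7000, 2000, 1000, 20000))
    print(std_cell_utilization(7000, 2000, 1000, 20000))
    print(core_size(7000, 0.7, aspect_ratio=4.0))
```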

----->I/O PLACEMENT.

I/O PLACEMENT PLACES THE PADS.

PADS ARE USED FOR INTERFACING: THEY PROVIDE THE POWER SUPPLY, DATA SIGNALS, AND CLOCK SIGNALS.

THEY ACT AS THE PORTS OF THE CHIP.

THERE ARE DIFFERENT TYPES OF PADS: (i) POWER PADS, (ii) SIGNAL PADS, (iii) CORNER PADS,
(iv) I/O PADS.

OPTIMIZATION CONTROLS

Design Optimization Controls :

1. Enable multiple clocks per register
2. Enable constant propagation
3. Enable multiple-port net buffering
4. Enable constant net buffering
5. Apply timing derating for on-chip variation
6. Define don't-use or preferred cells
7. Keep spare cells and unloaded cells
8. Apply area constraints and area recovery
9. Apply area and power critical ranges
10. Organize paths into groups
11. Prevent clock-as-data networks
12. Modify optimization priorities if needed
13. Enable recovery and removal checks

SDC

SDC: format is .SDC

These constraints are timing constraints.

They are used to meet the timing requirements.

The constraints are:

1. Clock definitions: create_clock (clock period)
2. Generated clock definitions
3. Input delay
4. Output delay
5. I/O delay
6. Max delay
7. Min delay
8. Multicycle path
9. False path
10. Half-cycle path
11. Disable timing arcs
12. Case analysis

Multicycle paths and false paths are timing exceptions.

The SDC also contains:

--------------->Clock latency

--------------->Clock Uncertainty

--------------->Clock Transition

--------------->Clock Gating setup

--------------->Clock Gating Hold

--------------->Clock Driving cell
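To show how these SDC values combine, here is a minimal Python sketch of an ideal-clock setup check on an input-group path (input port to register). It ignores clock latency, transition and OCV derates; `cycles` mimics a setup multicycle path, and all numbers (ns) are hypothetical.

```python
def input_path_setup_slack(period, input_delay, path_delay, setup,
                           uncertainty, cycles=1):
    """Setup slack (ns) of an input-port -> register path, ideal clocks."""
    arrival = input_delay + path_delay                 # external delay + internal logic
    required = cycles * period - uncertainty - setup   # capture edge minus margins
    return required - arrival


if __name__ == "__main__":
    print(input_path_setup_slack(2.0, 0.6, 1.0, 0.1, 0.15))            # meets timing
    print(input_path_setup_slack(2.0, 0.6, 1.3, 0.1, 0.15))            # violates
    print(input_path_setup_slack(2.0, 0.6, 1.3, 0.1, 0.15, cycles=2))  # multicycle 2
```

A larger input delay or clock uncertainty directly shrinks the budget left for the logic inside the block.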


Netlist: format is .v

It contains the logical connectivity of all cells (std cells, macros).

It contains the list of nets.

In the layout, the connectivity can be seen using fly lines.

.v ---------->Logical connectivity

.ddc-------->Logical connectivity, scan chain info (.scandef file info), gate-level description

TLU+ files: format is .TLUP:

1. R, C parasitics of the metal layers per unit length.
2. These R, C parasitics are used for calculating net delays.
3. If TLU+ files are not given, these values are obtained from the .ITF file.
4. To load TLU+, three files have to be loaded:
5. Max TLU+, Min TLU+, and the MAP file.
6. The MAP file maps the layer and via names of the .ITF file to those of the .tf file.
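As an illustration of how per-unit-length R and C turn into a net delay, here is a minimal Python sketch using the Elmore approximation for a driver resistance Rd driving a distributed RC wire with a load capacitance CL at its far end. This is a first-order estimate, not how a sign-off delay calculator works; units are ohm and pF (1 ohm * 1 pF = 1 ps) and all values are hypothetical.

```python
def elmore_wire_delay_ps(r_ohm_per_um, c_pf_per_um, length_um, rd_ohm, cl_pf):
    """Elmore delay of driver + distributed RC wire + lumped load, in ps."""
    rw = r_ohm_per_um * length_um   # total wire resistance (ohm)
    cw = c_pf_per_um * length_um    # total wire capacitance (pF)
    # The driver charges all capacitance; the distributed wire adds Rw*Cw/2 plus Rw*CL.
    return rd_ohm * (cw + cl_pf) + rw * (cw / 2 + cl_pf)


if __name__ == "__main__":
    # 1 mm wire: 0.5 ohm/um, 0.2 fF/um; 1 kohm driver; 10 fF load
    print(elmore_wire_delay_ps(0.5, 0.0002, 1000, 1000, 0.01))
```

Because both Rw and Cw grow with length, the wire term grows roughly with the square of the wire length.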

TECHNOLOGY FILE

Technology file: format is .tf:

1. It contains the names and numbering conventions of layers and vias.
2. It contains the physical and electrical characteristics of layers and vias.
3. The physical characteristics include min width, min spacing, and min height.
4. The electrical characteristics include max current density.
5. Units and precisions of layers and vias.
6. Colors and patterns of layers and vias.
7. Physical design rules of layers and vias.
8. The physical design rules include wire-to-wire spacing and min width of layers and vias.

Layer Info:

1. Mask Name
2. Visible
3. Selectable
4. Line Style (Solid)
5. Pattern
6. Pitch
7. Cut Layer

PHYSICAL LIBRARIES

Physical libraries: format is .lef (Library Exchange Format):

1. Physical information of std cells, macros, pads.
2. Pin information.
3. Definition of the unit tile (site) for placement.
4. Minimum width of resolution.
5. Height of the placement rows.
6. Preferred routing directions.
7. Pitch of the routing tracks.
8. Antenna rules.
9. Routing blockages, macro blockages.

Macro/Std Cells :-------------->Cell name

-------------->Size (Dimensions, Area)

------------->Pin

------------->Port

------------->Layer

------------->Direction

Pins information : --------------->Direction (Input, Output, INOUT)

--------------->Use (Signal, Power, Ground)

--------------->Antenna Gate Area

--------------->Layer

LEFs are of 3 Types: .Macro lef (Macro Info)

.StdCell lef (Standard Cell Info)

.Tech lef (Layer, Via Info)

The physical info includes the height, width, and area.

It also contains two views:

1) Cell view:

It contains all the layout information; it is used at tapeout.

2) FRAM view:

The FRAM view is an abstract view; it is used during Place & Route.

LOGIC LIBRARIES

Logical libraries: format is .lib (Liberty)

1. Timing information of standard cells, soft macros, hard macros.
2. Functionality information of standard cells, soft macros.
3. Design rules such as max transition, max capacitance, max fanout.
4. The timing information includes cell delays and setup, hold, recovery, and removal times.
5. Cell delay is a function of input transition and output load.
6. Cell delay is calculated based on lookup tables.
7. Cell delays are calculated using linear delay models, non-linear delay models, or CCS models.
8. The functionality is used for optimization purposes.
9. It also contains power information.
10. It also contains the default cell leakage power, the cell leakage power density, and the
default input and output voltages.
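Items 5 and 6 above (delay as a function of input transition and output load, read from lookup tables) can be illustrated with bilinear interpolation in a 2-D table, as is commonly done for NLDM tables. A minimal Python sketch, assuming index_1 is the input transition and index_2 the output load (the actual order is set by each Liberty table template); the table values are made up.

```python
import bisect


def _segment(axis, x):
    """Index i of the table segment [axis[i], axis[i+1]] used for x.

    Points outside the table are extrapolated from the edge segment.
    """
    i = bisect.bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def nldm_lookup(index_1, index_2, values, slew, load):
    """Bilinear interpolation; values[i][j] is the delay at (index_1[i], index_2[j])."""
    i, j = _segment(index_1, slew), _segment(index_2, load)
    x0, x1 = index_1[i], index_1[i + 1]
    y0, y1 = index_2[j], index_2[j + 1]
    tx = (slew - x0) / (x1 - x0)
    ty = (load - y0) / (y1 - y0)
    return (values[i][j] * (1 - tx) * (1 - ty)
            + values[i + 1][j] * tx * (1 - ty)
            + values[i][j + 1] * (1 - tx) * ty
            + values[i + 1][j + 1] * tx * ty)


if __name__ == "__main__":
    transitions = [0.1, 0.5]   # ns (assumed index_1)
    loads = [0.01, 0.05]       # pF (assumed index_2)
    delays = [[0.10, 0.20],    # ns, rows follow transitions, columns follow loads
              [0.20, 0.30]]
    print(nldm_lookup(transitions, loads, delays, slew=0.3, load=0.03))
```

Each axis needs at least two points; real tables typically have more, and the same lookup applies to transition and power tables.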

PVT ------------>On-chip variations (BC, WC)

It also contains the following information:

------------>Cell leakage power

------------>Internal power

------------>Rise transition

------------>Fall transition

------------>Setup rise

------------>Setup fall

------------>Hold rise

------------>Hold fall

------------>Minimum pulse width high

------------>Minimum pulse width low

------------>Recovery rise

------------>Removal fall

------------>Cell rise

------------>Cell fall

------------>Pin capacitance

Cell level information:

1. Cell name
2. Area (expressed as NAND-equivalent area)
3. Power (function of input transition and total output net capacitance)
4. Functionality
5. Delay
6. Max capacitance
7. Max transition
8. Footprint

It also contains K-factors.

It also contains wire load models.

It is also stored as a view (subdirectory), the LM (Logical Model) view, which contains the
logical libraries.

ASIC DESIGN TYPES


ASIC design is mainly divided into two divisions:
1) Logical Design (LD)

2) Physical Design (PD)

Physical design is the physical implementation of the design.

Physical design has six main inputs:

1. Logical libraries --> format is .lib ---> given by vendors
2. Physical libraries --> format is .lef ---> given by vendors
3. Technology file --> format is .tf ---> given by the fabrication people
4. TLU+ file --> format is .TLUP --> given by the fabrication people
5. Netlist ---> format is .v --> given by the synthesis people
6. Synopsys Design Constraints --> format is .SDC --> given by the synthesis people

PHYSICAL DESIGN PROCESS:

1. DATA PREPARATION.
2. FLOOR PLAN.
3. POWER PLAN --> POWER ROUTING [PRE ROUTE]
4. PLACEMENT.
5. CLOCK TREE SYNTHESIS --> CLOCK ROUTING.
6. ROUTING --> DATA ROUTING --> [POST ROUTE]
7. CHIP FINISHING.
8. VERIFICATION.
9. GDSII FILE.

ASIC Design Flow Tutorial
Using Synopsys Tools

By
Hima Bindu Kommuru
Hamid Mahmoodi

Nano-Electronics & Computing Research Lab


School of Engineering
San Francisco State University
San Francisco, CA
Spring 2009

San Francisco State University Nano-Electronics & Computing Research Lab 1


TABLE OF CONTENTS
WHAT IS AN ASIC? ........................................................................................................ 5
1.0 INTRODUCTION ..................................................................................................................................... 5
1.1 CMOS TECHNOLOGY ........................................................................................................................... 6
1.2 MOS TRANSISTOR ................................................................................................................................ 6
Figure 1.2a MOS Transistor ................................................................................................................. 6
Figure 1.2b Graph of Drain Current vs Drain to Source Voltage ........................................................ 7
1.3 POWER DISSIPATION IN CMOS IC’S ..................................................................................................... 8
1.4 CMOS TRANSMISSION GATE ............................................................................................................... 8
Figure 1.4a Latch.................................................................................................................................. 9
Figure 1.4b Flip-Flop ........................................................................................................................... 9
OVERVIEW OF ASIC FLOW ..................................................................................... 10
2.0 INTRODUCTION ....................................................................................................................................10
Figure 2.a : Simple ASIC Design Flow................................................................................................11
SYNOPSYS VERILOG COMPILER SIMULATOR (VCS) TUTORIAL ............... 13
3.0 INTRODUCTION ....................................................................................................................................13
3.1 TUTORIAL EXAMPLE ............................................................................................................................14
3.1.1 Compiling and Simulating ..........................................................................................................14
Figure 3.a: vcs compile........................................................................................................................15
Figure 3.b Simulation Result ...............................................................................................................16
3.2 DVE TUTORIAL................................................................................................................................17
APPENDIX 3A: OVERVIEW OF RTL ........................................................................................................28
3.A.1 Register Transfer Logic ..............................................................................................................28
3.A.2 Digital Design ...........................................................................................................................30
APPENDIX 3B: TEST BENCH / VERIFICATION ................................................................................30
3.B.1 Test Bench Example: ..................................................................................................................33
DESIGN COMPILER TUTORIAL [RTL-GATE LEVEL SYNTHESIS] ............... 37
4.0 INTRODUCTION ....................................................................................................................................37
4.1 BASIC SYNTHESIS GUIDELINES ..................................................................................................39
4.1.1 Startup File .................................................................................................................................39
4.1.2 Design Objects ............................................................................................................................40
4.1.3 Technology Library.....................................................................................................................41
4.1.4 Register Transfer-Level Description ...........................................................................................42
4.1.5 General Guidelines .....................................................................................................................43
4.1.6 Design Attributes and Constraints ..............................................................................................44
4.2 TUTORIAL EXAMPLE ............................................................................................................................46
4.2.1 Synthesizing the Code .................................................................................................................48
Figure 4.a : Fragment of analyze command ........................................................................................50
Figure 4.b Fragment of elaborate command .....................................................................................51
Figure 4.c: Fragment of Compile command ........................................................................................53
4.2.2 Interpreting the Synthesized Gate-Level Netlist and Text Reports ..............................................54
Figure 4.d : Fragment of area report .................................................................................................55
Figure 4.e: Fragment of cell area report ............................................................................................55
Figure 4.f : Fragment of qor report ...................................................................................................56
Figure 4.g: Fragment of Timing report ...............................................................................................57
Figure 4.h : Synthesized gate-level netlist ...........................................................................................58
4.2.3 SYNTHESIS SCRIPT ............................................................................................................................58
Note : There is another synthesis example of a FIFO in the below location for further reference. This
synthesized FIFO example is used in the physical design IC Compiler Tutorial .................................60
APPENDIX 4A: SYNTHESIS OPTIMIZATION TECHNIQUES ..........................................................60
4. A.0 INTRODUCTION ...............................................................................................................................60

4. A.1 MODEL OPTIMIZATION....................................................................................................................60
4.A.1.1 Resource Allocation .................................................................................................................60
Figure 4A.b. With resource allocation. ................................................................................................61
4.A.1.2 Flip-flop and Latch optimizations ...........................................................................................64
4.A.1.3 Using Parentheses ...................................................................................................................64
4.A.1.4 Partitioning and structuring the design. ..................................................................................65
4.A.2 OPTIMIZATION USING DESIGN COMPILER ........................................................................................65
4.A.2.1 Top-down hierarchical Compile ..............................................................................................66
4.A.2.2 Optimization Techniques .........................................................................................................67
4. A.3 TIMING ISSUES ................................................................................................................................70
Figure 4A.b Timing diagram for setup and hold On DATA.................................................................70
4.A.3.1 HOW TO FIX TIMING VIOLATIONS .....................................................................................71
Figure 4A.c : Logic with Q2 critical path ............................................................................................73
Figure 4A.d: Logic duplication allowing Q2 to be an independent path. ...........................................73
Figure 4A.e: Multiplexer with late arriving sel signal .........................................................................74
Figure 4A.f: Logic Duplication for balancing the timing between signals .........................................74
Figure 4.A.g : Logic with pipeline stages ............................................................................................74
4A.4 VERILOG SYNTHESIZABLE CONSTRUCTS ..........................................................................................75
5.0 DESIGN VISION ...................................................................................................... 78
5.1 ANALYSIS OF GATE-LEVEL SYNTHESIZED NETLIST USING DESIGN VISION ..................78
Figure 5.a: Design Vision GUI ...........................................................................................................78
Figure 5.b: Schematic View of Synthesized Gray Counter ..................................................................79
Figure 5.c Display Timing Path ...........................................................................................................81
Figure 5.d Histogram of Timing Paths ................................................................................................81
STATIC TIMING ANALYSIS ...................................................................................... 82
6.0 INTRODUCTION ....................................................................................................................................82
6.1 TIMING PATHS .....................................................................................................................................82
6.1.1 Delay Calculation of each timing path: ......................................................................................83
6.2 TIMING EXCEPTIONS ............................................................................................................................83
6.3 SETTING UP CONSTRAINTS TO CALCULATE TIMING:.............................................................................83
6.4 BASIC TIMING DEFINITIONS: ...............................................................................................................84
6.5 CLOCK TREE SYNTHESIS (CTS): .........................................................................................................85
6.6 PRIMETIME TUTORIAL EXAMPLE ..............................................................................................86
6.6.1 Introduction.................................................................................................................................86
6.6.2 PRE-LAYOUT ....................................................................................................................................86
6.6.2.1 PRE-LAYOUT CLOCK SPECIFICATION ..............................................................................87
6.6.3 STEPS FOR PRE-LAYOUT TIMING VALIDATION ...................................................................87
IC COMPILER TUTORIAL ......................................................................................... 92
8.0 BASICS OF PHYSICAL IMPLEMENTATION..............................................................................................92
8.1 Introduction ...................................................................................................................................92
Figure 8.1.a : ASIC FLOW DIAGRAM ...............................................................................................92
8.2 FLOORPLANNING ................................................................................................................................93
Figure 8.2.a : Floorplan example ........................................................................................................94
8.3 CONCEPT OF FLATTENED VERILOG NETLIST .......................................................................................97
8.3.a Hierarchical Model: ...................................................................................................................97
8.3.b Flattened Model: .........................................................................................................................98
Figure 8.c Floorplanning Flow Chart .................................................................................................98
8.4 PLACEMENT .........................................................................................................................................99
8.5 Routing ........................................................................................................................................100
Figure 8.5.a : Routing grid ................................................................................................................101
8.6 PACKAGING .......................................................................................................................................102
Figure 8.6.a : Wire Bond Example ....................................................................................................102

San Francisco State University Nano-Electronics & Computing Research Lab 3


Figure 8.6.b : Flip Chip Example ......................................................................................................103
8.7 IC TUTORIAL EXAMPLE ..............................................................................................................103
8.7.1 INTRODUCTION ...............................................................................................................................103
CREATING DESIGN LIBRARY.........................................................................................................106
FLOORPLANNING ...........................................................................................................................109
PLACEMENT.....................................................................................................................................112
CLOCK TREE SYNTHESIS ...............................................................................................................115
CTS POST OPTIMIZATION STEPS ..................................................................................................116
ROUTING ..........................................................................................................................................117
EXTRACTION ............................................................................................................. 121
9.0 INTRODUCTION ..................................................................................................................................121
APPENDIX A: DESIGN FOR TEST.......................................................................... 126
A.0 INTRODUCTION .................................................................................................................................126
A.1 TEST TECHNIQUES ............................................................................................................................126
A.1.1 Issues faced during testing .......................................................................................................126
A.2 SCAN-BASED METHODOLOGY ..........................................................................................................126
A.3 FORMAL VERIFICATION ....................................................................................................................128
APPENDIX B: EDA LIBRARY FORMATS ............................................................. 128
B.1 INTRODUCTION .................................................................................................................................128

What is an ASIC?
1.0 Introduction
Integrated circuits are made from silicon wafers, with each wafer holding hundreds of dies.
An ASIC is an Application Specific Integrated Circuit: an integrated circuit designed for
one specific application. Examples of ASICs include a chip designed for a satellite, a chip
designed for a car, and a chip designed as an interface between memory and a CPU.
Examples of ICs that are not called ASICs include memories and microprocessors. The
following paragraphs describe the types of ASICs.
1. Full-Custom ASIC: For this type of ASIC, the designer designs all or some of
the logic cells and the layout for that one chip. The designer does not use predefined
gates in the design; every part of the design is done from scratch.
2. Standard Cell ASIC: The designer uses predesigned logic cells such as AND
gates, NOR gates, etc. These gates are called Standard Cells. The advantage of
Standard Cell ASICs is that designers save time and money and reduce risk
by using a predesigned and pre-tested Standard Cell Library. Also, each Standard
Cell can be optimized individually. Standard Cell Libraries are designed using
the Full Custom Methodology, but you can use these already designed libraries in
your design. This design style gives a designer much of the flexibility of Full
Custom design while reducing the risk.
3. Gate Array ASIC: In this type of ASIC, the transistors are predefined in the
silicon wafer. The predefined pattern of transistors on the gate array is called a
base array, and the smallest element in the base array is called a base cell. The
base cell layout is the same for each logic cell; only the interconnect between the cells
and inside the cells is customized. The following are the types of gate arrays:
a. Channeled Gate Array
b. Channelless Gate Array
c. Structured Gate Array

When designing a chip, the following objectives are taken into consideration:
1. Speed
2. Area
3. Power
4. Time to Market

To design an ASIC, one needs a good understanding of CMOS technology.
The next few sections give a basic overview of CMOS technology.

1.1 CMOS Technology

Most chips designed today are made using CMOS technology. CMOS stands for
Complementary Metal Oxide Semiconductor; it uses both NMOS and PMOS
transistors. To understand CMOS better, we first need to know about the MOS (FET)
transistor.

1.2 MOS Transistor


MOSFET stands for Metal Oxide Semiconductor Field Effect Transistor. The transistor is
the basic element in the design of a large scale integrated circuit, and the MOSFET is a
voltage-controlled device. These transistors are formed as a "sandwich" consisting of a
semiconductor layer, usually a slice, or wafer, from a single crystal of silicon; a layer of
silicon dioxide (the oxide); and a layer of metal. These layers are patterned in a manner
which permits transistors to be formed in the semiconductor material (the "substrate").
The MOS transistor has three main terminals: Source, Drain and Gate. The source and
drain regions are quite similar, and are labeled depending on what they are connected to.
The source is the terminal, or node, which acts as the source of charge carriers; charge
carriers leave the source and travel to the drain. In the case of an N channel MOSFET
(NMOS), the source is the more negative of the terminals; in the case of a P channel
device (PMOS), it is the more positive of the terminals. The area under the gate oxide is
called the "channel". Below is a figure of a MOS Transistor.

Figure 1.2a MOS Transistor

The gate-to-source voltage must exceed a certain value before a channel forms.
When there is no channel, the transistor is said to be in the 'cut off region'. The
gate-to-source voltage at which the transistor starts conducting (a channel begins to form
between the source and the drain) is called the threshold voltage. Above threshold, with a
small drain-to-source voltage, the transistor is in the 'linear region', where the drain
current grows with the drain-to-source voltage. The transistor goes into the 'saturation
region' when the drain-to-source voltage exceeds the gate-to-source voltage minus the
threshold voltage: the channel pinches off at the drain end, and the drain current stops
increasing significantly with the drain-to-source voltage.
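The first-order long-channel ("square-law") model makes the three regions concrete. The sketch below applies it to an NMOS device. It is an illustration only: the function names, the parameter `k` (mobility times oxide capacitance times W/L) and all numeric values are assumptions, not data from any library. Short-channel devices deviate noticeably from this model.

```python
def nmos_region(vgs: float, vds: float, vt: float) -> str:
    """Classify the operating region of an NMOS transistor (square-law model)."""
    if vgs <= vt:
        return "cutoff"          # no channel has formed
    if vds < vgs - vt:
        return "linear"          # channel extends all the way to the drain
    return "saturation"          # channel pinched off at the drain end


def nmos_drain_current(vgs: float, vds: float, vt: float, k: float) -> float:
    """First-order drain current in amperes; k = mu_n * Cox * (W/L) in A/V^2 (assumed)."""
    region = nmos_region(vgs, vds, vt)
    if region == "cutoff":
        return 0.0
    vov = vgs - vt  # overdrive voltage
    if region == "linear":
        return k * (vov * vds - vds * vds / 2.0)
    # Saturation: Id no longer depends on Vds in this first-order model
    return 0.5 * k * vov * vov


if __name__ == "__main__":
    VT, K = 0.5, 2e-4  # hypothetical threshold voltage (V) and k (A/V^2)
    for vds in (0.1, 0.5, 1.0, 1.5):
        i_d = nmos_drain_current(vgs=1.5, vds=vds, vt=VT, k=K)
        print(f"Vds={vds:.1f} V  region={nmos_region(1.5, vds, VT):10s}  Id={i_d * 1e6:.1f} uA")
```

The two current expressions give the same value at the boundary (Vds = Vgs - Vt). This is why the curve in Figure 1.2b flattens smoothly instead of jumping.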

Figure 1.2b Graph of Drain Current vs Drain to Source Voltage

CMOS technology is made up of both NMOS and PMOS transistors. Complementary
Metal-Oxide Semiconductor (CMOS) logic devices are the most common devices used
today in the high density, large transistor count circuits found in everything from
complex microprocessor integrated circuits to signal processing and communication
circuits. The CMOS structure is popular because of its inherently lower power
requirements, high operating clock speed, and ease of implementation at the transistor
level. The complementary p-channel and n-channel transistor networks are used to
connect the output of the logic device to either the VDD or the VSS power supply rail for a
given input logic state. The MOSFET transistors can be treated as simple switches. A
switch must be on (conducting) to allow current to flow between the source and drain
terminals.

Example: Creating a CMOS inverter requires only one PMOS and one NMOS transistor.
The NMOS transistor provides the switch connection (ON) to ground when the input is
logic high. The output load capacitor is discharged and the output is driven to
logic '0'. The PMOS transistor (ON) provides the connection to the VDD power supply
rail when the input to the inverter circuit is logic low. The output load capacitor is
charged to VDD, and the output is driven to logic '1'.
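The transistors can be treated as switches, so the inverter can be modeled at switch level. A PMOS conducts when its gate is 0, an NMOS conducts when its gate is 1, and the output takes the value of whichever rail is connected. The minimal sketch below uses hypothetical helper names. It adds a 2-input NAND (parallel PMOS pull-up, series NMOS pull-down) to show the same complementary idea for a gate with two inputs. `None` marks a floating output and `"X"` a VDD-to-ground short; a correct static CMOS gate never reaches either case.

```python
def pmos_on(gate: int) -> bool:
    """A PMOS switch conducts when its gate is at logic 0."""
    return gate == 0


def nmos_on(gate: int) -> bool:
    """An NMOS switch conducts when its gate is at logic 1."""
    return gate == 1


def resolve(pull_up: bool, pull_down: bool):
    """Output of a static CMOS stage, given which network conducts."""
    if pull_up and not pull_down:
        return 1        # output connected to VDD
    if pull_down and not pull_up:
        return 0        # output connected to VSS (ground)
    if pull_up and pull_down:
        return "X"      # VDD-to-ground short
    return None         # floating output


def inverter(a: int):
    # One PMOS from VDD to the output, one NMOS from the output to ground
    return resolve(pmos_on(a), nmos_on(a))


def nand2(a: int, b: int):
    # Two PMOS in parallel (pull-up network), two NMOS in series (pull-down network)
    return resolve(pmos_on(a) or pmos_on(b), nmos_on(a) and nmos_on(b))


if __name__ == "__main__":
    for a in (0, 1):
        print(f"INV  a={a} -> {inverter(a)}")
    for a in (0, 1):
        for b in (0, 1):
            print(f"NAND a={a} b={b} -> {nand2(a, b)}")
```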
The output load capacitance of a logic gate is made up of:
a. Intrinsic Capacitance: gate-drain capacitance (of both NMOS and PMOS
transistors)
b. Extrinsic Capacitance: capacitance of the connecting wires and the input
capacitance of the fan-out gates.

In CMOS, each net has only one driver, but one gate output can drive the inputs of several
gates (its fan-out). In CMOS technology, the output always drives another CMOS gate input.

The charge carriers for PMOS transistors are holes, and the charge carriers for NMOS
transistors are electrons. The mobility of electrons is roughly two times that of holes. Due to
this, the output rise and fall times are different. To make them equal, the W/L ratio of the PMOS
transistor is made about twice that of the NMOS transistor. This way, the PMOS and
NMOS transistors will have the same 'drive strength'. In a standard cell library, the
length 'L' of a transistor is always constant. The width 'W' values are changed to provide
different drive strengths for each gate. The resistance is proportional to (L/W). Therefore,
increasing the width decreases the resistance.
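The sizing rule follows from the proportionality R ∝ L/W. Folding in carrier mobility gives an effective on-resistance proportional to L/(μ·W). The sketch below uses normalized, hypothetical units and assumes μn = 2·μp, as stated above. It shows that doubling the PMOS width matches the NMOS resistance, and that doubling every width (a higher drive-strength cell) halves it.

```python
def relative_on_resistance(w: float, l: float, mobility: float) -> float:
    """Effective on-resistance in arbitrary units: R is proportional to L / (mobility * W)."""
    return l / (mobility * w)


MU_N = 2.0   # electron mobility, normalized (assumed 2x hole mobility, as in the text)
MU_P = 1.0   # hole mobility, normalized
L_MIN = 1.0  # channel length is fixed across a standard cell library

if __name__ == "__main__":
    r_n = relative_on_resistance(w=1.0, l=L_MIN, mobility=MU_N)
    r_p_same_w = relative_on_resistance(w=1.0, l=L_MIN, mobility=MU_P)
    r_p_sized = relative_on_resistance(w=2.0, l=L_MIN, mobility=MU_P)
    print(r_n, r_p_same_w, r_p_sized)  # 0.5 1.0 0.5
    # Doubling the NMOS width (as in a "2x" drive-strength cell) halves its resistance
    print(relative_on_resistance(w=2.0, l=L_MIN, mobility=MU_N))  # 0.25
```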

1.3 Power Dissipation in CMOS IC’s


A large percentage of the power dissipation in CMOS ICs is due to the charging and
discharging of capacitors. Reducing power dissipation is the main concern in low-power
CMOS IC design. The main sources of power dissipation are:
1. Dynamic Switching Power: due to the charging and discharging of circuit
capacitances.
• A low to high output transition draws energy from the power supply.
• A high to low transition dissipates the energy stored in the load capacitance
through the NMOS transistor.
• Given the frequency 'f' of the low-high transitions, the total power drawn
would be: load capacitance * VDD * VDD * f
2. Short Circuit Current: During an input transition, both the NMOS and PMOS
transistors are briefly on at the same time, creating a direct path from VDD to ground.
This current becomes significant when the rise/fall time at the input of the gate is
larger than the output rise/fall time.
3. Leakage Current Power: It has two main causes:
a. Reverse-Bias Diode Leakage on Transistor Drains: This happens in
CMOS design when one transistor is off, and the active transistor charges
up/down the drain with respect to the bulk potential of the other transistor.
Example: Consider an inverter with a high input voltage. The output is low,
which means the NMOS is on and the PMOS is off. The bulk of the PMOS is
connected to VDD. Therefore there is a drain-to-bulk voltage of -VDD,
causing the diode leakage current.
b. Sub-Threshold Leakage through the channel of an 'OFF' transistor/device.
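Here is a worked example of the dynamic-power expression. The sketch first adds up the load capacitance from the intrinsic, wire, and fan-out input capacitances described earlier. It then applies P = C_load × VDD² × f, where f is the rate of low-to-high output transitions. All component values, and the function names, are hypothetical and serve only to show the arithmetic.

```python
from __future__ import annotations


def load_capacitance(c_intrinsic: float, c_wire: float, fanout_inputs: list[float]) -> float:
    """Total output load in farads: intrinsic + wire + input capacitance of each fan-out gate."""
    return c_intrinsic + c_wire + sum(fanout_inputs)


def dynamic_power(c_load: float, vdd: float, f_rise: float) -> float:
    """Switching power in watts: each 0->1 output transition draws C*VDD^2 from the supply."""
    return c_load * vdd * vdd * f_rise


if __name__ == "__main__":
    # Hypothetical values: 2 fF intrinsic, 5 fF wire, three fan-out gates of 1 fF each
    c = load_capacitance(c_intrinsic=2e-15, c_wire=5e-15, fanout_inputs=[1e-15, 1e-15, 1e-15])
    p = dynamic_power(c, vdd=1.0, f_rise=100e6)
    print(f"C_load = {c * 1e15:.1f} fF, P_dyn = {p * 1e6:.2f} uW")
```

Because VDD enters squared, lowering the supply voltage is usually the most effective single lever on switching power. Halving VDD cuts this term by a factor of four.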

1.4 CMOS Transmission Gate


A PMOS transistor connected in parallel with an NMOS transistor forms a transmission
gate. When enabled, the transmission gate simply passes the value at its input to its output.
It uses both an NMOS and a PMOS because the PMOS transistor passes a strong '1' and the
NMOS transistor passes a strong '0'. The advantages of using a Transmission Gate are:
1. It passes both logic levels at full strength, which a single NMOS or PMOS pass
transistor cannot do.
2. The resistance of the circuit is reduced, since the transistors are connected in parallel.

Sequential Element
In CMOS, an element which stores a logic value (by having a feedback loop) is called a
sequential element. The simplest example of a sequential element is two inverters
connected back to back. There are two types of basic sequential elements:
1. Latch: Two inverters connected back to back, combined with a
transmission gate that has a control input, form a latch. When the control input is
high (logic '1'), the transmission gate is switched on and whatever value is
at the input 'D' passes to the output. When the control input is low, the
transmission gate is off and the inverters that are connected back to back hold the
value. A latch is called a transparent latch because, while the control input is high,
the output follows any change at the 'D' input.


Figure 1.4a Latch

2. Flip-Flop: A flip flop is constructed from two latches in series. The first latch is
called the master latch and the second latch is called the slave latch. The control
input to the transmission gates in this case is called a clock. The inverted version of
the clock is fed to the slave latch transmission gate.
a. When the clock input is high, the transmission gate of the master latch is
switched on and the input 'D' passes into the master latch (the master latch is
transparent). Because the slave latch receives the inverted clock, its
transmission gate is off and it holds the previous value.
b. When the clock goes low, the master latch holds the value of 'D' present at that
moment, and the slave latch switches on and updates the output with this value.
The slave latch holds this new value at the output, irrespective of changes at the
input of the master latch, while the clock is low. When the clock goes high again,
the slave latch stores the value at its output and step (a) is repeated.
A flip flop that captures data at the rising clock edge is called a positive-edge triggered
flip flop; one that captures data at the falling edge is called a negative-edge triggered
flip flop. With the arrangement described in (a) and (b), the value of 'D' is captured at
the falling clock edge, so the flip flop is negative-edge triggered. If the clock phases are
swapped (master transparent while the clock is low, slave transparent while it is high),
the capture happens at the rising edge and the flip flop is positive-edge triggered.

Figure 1.4b Flip-Flop
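The latch and master-slave flip-flop can be modeled at the logic level to see exactly when the output changes. In the minimal sketch below, each latch is a transmission gate (passes D while enabled) plus a hold loop (keeps its value otherwise). The flip-flop is two such latches on opposite clock phases. The class names, the `master_on_high` switch and the sample sequence are all hypothetical. With the master transparent while the clock is high, Q takes the value D had when the clock fell. Swapping the phases makes Q take the value D had at the rising edge.

```python
from typing import List, Optional, Tuple


class DLatch:
    """Level-sensitive D latch: transparent while enable == 1, holds otherwise.

    The 'enable' branch stands in for the transmission gate passing D; keeping
    self.q unchanged stands in for the back-to-back inverters holding the value.
    """

    def __init__(self) -> None:
        self.q: Optional[int] = None  # unknown until the latch is first transparent

    def step(self, d: Optional[int], enable: int) -> Optional[int]:
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Two latches in series, clocked on opposite phases.

    master_on_high=True: master transparent while clk == 1, slave while clk == 0,
    so Q takes the value D had when the clock fell (negative-edge behavior).
    master_on_high=False swaps the phases, giving rising-edge behavior.
    """

    def __init__(self, master_on_high: bool = True) -> None:
        self.master = DLatch()
        self.slave = DLatch()
        self.master_on_high = master_on_high

    def step(self, d: int, clk: int) -> Optional[int]:
        master_en = clk if self.master_on_high else 1 - clk
        m = self.master.step(d, master_en)
        return self.slave.step(m, 1 - master_en)  # slave always sees the inverted phase


def run(ff: MasterSlaveFF, samples: List[Tuple[int, int]]) -> List[Optional[int]]:
    """Apply (d, clk) samples in order and collect Q after each one."""
    return [ff.step(d, clk) for d, clk in samples]


if __name__ == "__main__":
    # D is 1 at the rising edge but 0 by the time the clock falls
    samples = [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]
    print(run(MasterSlaveFF(master_on_high=True), samples))   # [None, None, None, 0, 0]
    print(run(MasterSlaveFF(master_on_high=False), samples))  # [None, 1, 1, 1, 0]
```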

Overview of ASIC Flow

2.0 Introduction

To design a chip, one needs to have an idea of exactly what one wants to design. At
every step in the ASIC flow, the idea keeps changing form. The first step in turning
the idea into a chip is to come up with the Specifications.

Specifications are nothing but:
• Goals and constraints of the design
• Functionality (what the chip will do)
• Performance figures like speed and power
• Technology constraints like size and space (physical dimensions)
• Fabrication technology and design techniques
The next step in the flow is to come up with the Structural and Functional
Description. At this point one has to decide what kind of architecture
(structure) to use for the design, e.g. RISC/CISC, ALU, pipelining, etc.
To make a complex system easier to design, it is normally broken down into several
subsystems. The functionality of these subsystems should match the specifications. At
this point, the relationships between the different subsystems and with the top level
system are also defined.
Once the subsystems and the top level system are defined, they need to be implemented.
They are implemented using logic representations (Boolean expressions), finite state machines,
combinational and sequential logic, schematics, etc. This step is called Logic Design /
Register Transfer Level (RTL). Basically, the RTL describes the several subsystems, and it
should match the functional description. RTL is usually expressed in Verilog or VHDL.
Verilog and VHDL are Hardware Description Languages. A hardware description
language (HDL) is a language used to describe a digital system, for example, a network
switch, a microprocessor, a memory or a simple flip-flop. In other words, an HDL can
describe any digital hardware at many levels of abstraction. Functional/Logical
Verification is performed at this stage to ensure the RTL design matches the idea.
Once Functional Verification is completed, the RTL is converted into an optimized
Gate Level Netlist. This step is called Logic/RTL synthesis. It is done by synthesis
tools such as Design Compiler (Synopsys), Blast Create (Magma), RTL Compiler
(Cadence), etc. A synthesis tool takes an RTL hardware description and a standard cell
library as input and produces a gate-level netlist as output. The standard cell library is the
basic building block for today's IC design. Constraints such as timing, area, testability,
and power are considered. Synthesis tools try to meet the constraints by calculating the cost
of various implementations, and then try to generate the best gate level implementation
for the given set of constraints and target process. The resulting gate-level netlist is a
completely structural description with only standard cells at the leaves of the design. At
this stage, simulation is also used to verify that the conversion to gate level has been
performed correctly.
The next step in the ASIC flow is the Physical Implementation of the Gate Level
Netlist. The Gate Level Netlist is converted into a geometric representation, which is
nothing but the layout of the design. The layout is designed according to
the design rules specified in the library. The design rules are guidelines based
on the limitations of the fabrication process. The Physical Implementation step consists
of three sub-steps: Floorplanning->Placement->Routing. The file produced at the output
of the Physical Implementation is the GDSII file, which the foundry uses to
fabricate the ASIC. This step is performed by tools such as Blast Fusion (Magma), IC
Compiler (Synopsys), and Encounter (Cadence), etc. Physical Verification is performed
to verify that the layout is designed according to the rules.

Figure 2.a : Simple ASIC Design Flow
Idea -> Specifications -> RTL -> Gate Level Netlist -> Physical Implementation -> GDSII -> CHIP

For any design to work at a specific speed, timing analysis has to be performed.
We need to check whether the design meets the speed requirement given in the
specification. This is done by a Static Timing Analysis tool, for example PrimeTime
(Synopsys). It validates the timing performance of a design by checking the design for all
possible timing violations, for example setup and hold violations.
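For a single register-to-register path, the setup and hold checks reduce to simple arithmetic. Positive slack means the check is met. The sketch below is a simplification: it ignores clock skew, clock uncertainty, and on-chip variation, all of which real STA tools such as PrimeTime account for. The function names and all delay values (in ns) are hypothetical.

```python
def setup_slack(t_clk: float, t_clk2q: float, t_comb_max: float, t_setup: float) -> float:
    """Required time minus arrival time for the latest data at the capture flop."""
    arrival = t_clk2q + t_comb_max   # launch flop clock-to-Q plus slowest logic path
    required = t_clk - t_setup       # data must settle t_setup before the next edge
    return required - arrival


def hold_slack(t_clk2q: float, t_comb_min: float, t_hold: float) -> float:
    """Earliest data arrival minus the hold requirement at the same clock edge."""
    return (t_clk2q + t_comb_min) - t_hold


if __name__ == "__main__":
    s = setup_slack(t_clk=10.0, t_clk2q=0.5, t_comb_max=8.0, t_setup=0.3)
    h = hold_slack(t_clk2q=0.5, t_comb_min=0.1, t_hold=0.2)
    print(f"setup slack = {s:.2f} ns ({'MET' if s >= 0 else 'VIOLATED'})")
    print(f"hold slack  = {h:.2f} ns ({'MET' if h >= 0 else 'VIOLATED'})")
```

The clock period appears only in the setup check. A hold violation cannot be fixed by slowing the clock.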
After layout, verification and timing analysis, the layout is ready for fabrication. The
layout data is converted into photolithographic masks. After fabrication, the wafer is
diced into individual chips. Each chip is packaged and tested.

Synopsys Verilog Compiler Simulator (VCS) Tutorial
3.0 Introduction
Synopsys Verilog Compiler Simulator (VCS) is a tool from Synopsys specifically designed to
simulate and debug designs. This tutorial describes how to use VCS to simulate a
Verilog description of a design and how to debug the design. VCS also uses VirSim,
a graphical user interface to VCS used for debugging and viewing
waveforms.

There are three main steps in debugging the design, which are as follows:

1. Compiling the Verilog/VHDL source code.
2. Running the simulation.
3. Viewing and debugging the generated waveforms.

You can do the above steps interactively using the VCS tool. VCS first translates the
Verilog source code into intermediate C code and object files; it can compile the source
code into object files without generating assembly language files.
VCS then invokes a C compiler and linker to create an executable file. We use this
executable file to simulate the design. You can run the executable from the command line,
which creates the waveform file, or you can use VirSim.

Below is a brief overview of the VCS tool that shows you how to compile and simulate a
counter. For basic concepts on verification and test benches, please refer to APPENDIX 3A
at the end of this chapter.

SETUP

Before going to the tutorial example, let's first set up the directory.

You need to do the below 3 steps before you actually run the tool:

1. As soon as you log into your engr account, type "csh" at the command prompt, as shown
below. This changes the shell from bash to the C shell. All the commands
work ONLY in the C shell.

[hkommuru@hafez ]$csh

2. Please copy the whole directory from the location below (cp -rf source destination):

[hkommuru@hafez ]$cd
[hkommuru@hafez ]$ cp -rf /packages/synopsys/setup/asic_flow_setup ./

This creates the directory structure shown below: a directory called
"asic_flow_setup", containing the following subdirectories:

asic_flow_setup
src/ : for verilog code/source code
vcs/ : for vcs simulation ,
synth_graycounter/ : for synthesis
synth_fifo/ : for synthesis
pnr/ : for Physical design
extraction/: for extraction
pt/: for primetime
verification/: final signoff check

The "asic_flow_setup" directory will contain all generated content, including VCS
simulation output, synthesized gate-level Verilog, and the final layout. In this course we will always
try to keep generated content from the tools separate from our source RTL. This keeps
our project directories well organized, and helps prevent us from unintentionally
modifying the source RTL. There are subdirectories in the project directory for each
major step in the ASIC Flow tutorial. These subdirectories contain scripts and
configuration files for running the tools required for that step in the tool flow. For this
tutorial we will work exclusively in the vcs directory.

3. Please source "synopsys_setup.tcl", which sets all the environment variables necessary
to run the VCS tool.
Source it at the Unix prompt as shown below:

[hkommuru@hafez ]$ source /packages/synopsys/setup/synopsys_setup.tcl

Please Note: You have to do steps 1 and 3 above every time you log in.

3.1 Tutorial Example

In this tutorial, we use a simple counter example. The Verilog code and
testbench are listed at the end of the tutorial.

Source code file name : counter.v
Test bench file name : counter_tb.v

Setup
3.1.1 Compiling and Simulating

NOTE: AT PRESENT THERE SEEMS TO BE A BUG IN THE TOOL, SO
COMPILE AND SIMULATION IN TWO DIFFERENT STEPS IS NOT
WORKING. THIS WILL BE FIXED SHORTLY. PLEASE DO STEP 3 TO SEE
THE OUTPUT OF YOUR CODE. STEP 3 COMMAND PERFORMS
COMPILATION AND SIMULATION IN ONE STEP.

1. In the "vcs" directory, compile the Verilog source code by typing the following at the
machine prompt.

[hkommuru@hafez ]$ cd asic_flow_setup/vcs
[hkommuru@hafez vcs]$ cp ../src/counter/* .
[hkommuru@hafez vcs]$ vcs -f main_counter.f +v2k

Please note that the -f option means the file specified (main_counter.f) contains a list of
command line options for vcs. In this case, the command line options are just a list of the
Verilog file names. Also note that the testbench is listed first. Listing the source files
directly on the command line, without -f, has the same effect:

[hkommuru@hafez vcs]$ vcs counter_tb.v counter.v +v2k

The +v2k option is needed if you are using Verilog-2001 (IEEE 1364-2001) syntax; otherwise
there is no need for the option. Please look at Figure 3.a for the output of the compile command.

Figure 3.a: vcs compile

Chronologic VCS (TM)
Version B-2008.12 -- Wed Jan 28 20:08:26 2009
Copyright (c) 1991-2008 by Synopsys Inc.
ALL RIGHTS RESERVED

This program is proprietary and confidential information of Synopsys Inc.
and may be used and disclosed only as authorized in a license agreement
controlling such use and disclosure.

Warning-[ACC_CLI_ON] ACC/CLI capabilities enabled
ACC/CLI capabilities have been enabled for the entire design. For faster
performance enable module specific capability in pli.tab file

Parsing design file 'counter_tb.v'
Parsing design file 'counter.v'
Top Level Modules:
timeunit
counter_testbench
TimeScale is 1 ns / 10 ps
Starting vcs inline pass...
2 modules and 0 UDP read.
recompiling module timeunit
recompiling module counter_testbench
Both modules done.
gcc -pipe -m32 -O -I/packages/synopsys/vcs_mx/B-2008.12/include -c -o rmapats.o rmapats.c
if [ -x ../simv ]; then chmod -x ../simv; fi
g++ -o ../simv -melf_i386 -m32 5NrI_d.o 5NrIB_d.o IV5q_1_d.o blOS_1_d.o rmapats_mop.o rmapats.o SIM_l.o /packages/synopsys/vcs_mx/B-2008.12/linux/lib/libvirsim.a /packages/synopsys/vcs_mx/B-2008.12/linux/lib/librterrorinf.so /packages/synopsys/vcs_mx/B-2008.12/linux/lib/libsnpsmalloc.so /packages/synopsys/vcs_mx/B-2008.12/linux/lib/libvcsnew.so /packages/synopsys/vcs_mx/B-2008.12/linux/lib/ctype-stubs_32.a -ldl -lz -lm -lc -ldl
../simv up to date

VirSim B-2008.12-B Virtual Simulator Environment
Copyright (C) 1993-2005 by Synopsys, Inc.
Licensed Software. All Rights Reserved.

By default, compilation produces an executable binary file named simv.
You can specify a different name with the -o compile-time option.

For example :
vcs -f main_counter.f +v2k -o counter.simv

VCS compiles the source code on a module by module basis. You can incrementally
compile your design with VCS, since VCS compiles only the modules which have
changed since the last compilation.

2. Now, run the executable with no arguments (./simv by default, or ./counter.simv if you
compiled with -o counter.simv as above). You should see the output from both vcs and the
simulation, and a waveform file called counter.dump should be produced in your working
directory.

[hkommuru@hafez vcs]$ ./counter.simv

Please see Figure 3.b for output of simv command

Figure 3.b Simulation Result

Chronologic VCS simulator copyright 1991-2008
Contains Synopsys proprietary information.
Compiler version B-2008.12; Runtime version B-2008.12; Jan 28 19:59 2009

time= 0 ns, clk= 0, reset= 0, out= xxxx
time= 10 ns, clk= 1, reset= 0, out= xxxx
time= 11 ns, clk= 1, reset= 1, out= xxxx
time= 20 ns, clk= 0, reset= 1, out= xxxx
time= 30 ns, clk= 1, reset= 1, out= xxxx
time= 31 ns, clk= 1, reset= 0, out= 0000
time= 40 ns, clk= 0, reset= 0, out= 0000
time= 50 ns, clk= 1, reset= 0, out= 0000
time= 51 ns, clk= 1, reset= 0, out= 0001
time= 60 ns, clk= 0, reset= 0, out= 0001
time= 70 ns, clk= 1, reset= 0, out= 0001
time= 71 ns, clk= 1, reset= 0, out= 0010
time= 80 ns, clk= 0, reset= 0, out= 0010
time= 90 ns, clk= 1, reset= 0, out= 0010
time= 91 ns, clk= 1, reset= 0, out= 0011
time= 100 ns, clk= 0, reset= 0, out= 0011
time= 110 ns, clk= 1, reset= 0, out= 0011
time= 111 ns, clk= 1, reset= 0, out= 0100
time= 120 ns, clk= 0, reset= 0, out= 0100
time= 130 ns, clk= 1, reset= 0, out= 0100
time= 131 ns, clk= 1, reset= 0, out= 0101
time= 140 ns, clk= 0, reset= 0, out= 0101
time= 150 ns, clk= 1, reset= 0, out= 0101
time= 151 ns, clk= 1, reset= 0, out= 0110
time= 160 ns, clk= 0, reset= 0, out= 0110
time= 170 ns, clk= 1, reset= 0, out= 0110
All tests completed sucessfully

$finish called from file "counter_tb.v", line 75.
$finish at simulation time 171.0 ns
V C S   S i m u l a t i o n   R e p o r t
Time: 171000 ps
CPU Time: 0.020 seconds; Data structure size: 0.0Mb
Wed Jan 28 19:59:54 2009

The testbench code is listed on the last page of the tutorial; reading it will help you
understand the above result.
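You can check the printed trace against a small reference ("golden") model instead of reading it line by line. counter.v is not reproduced in this section, so the sketch below infers the behavior from Figure 3.b only. It assumes a 4-bit counter with a synchronous, active-high reset that updates on the rising clock edge and wraps from 1111 to 0000. The function name is hypothetical.

```python
from typing import List, Optional


def counter_reference(resets_at_edges: List[int], width: int = 4) -> List[Optional[str]]:
    """Expected counter value after each rising clock edge, as a binary string.

    None models the unknown 'xxxx' value before the first reset.
    """
    value: Optional[int] = None
    history: List[Optional[str]] = []
    for reset in resets_at_edges:
        if reset:
            value = 0                              # synchronous reset
        elif value is not None:
            value = (value + 1) % (1 << width)     # count up and wrap around
        history.append(None if value is None else format(value, f"0{width}b"))
    return history


if __name__ == "__main__":
    # reset as sampled at the rising edges at 10, 30, 50, ..., 150 ns in Figure 3.b
    resets = [0, 1, 0, 0, 0, 0, 0, 0]
    print(counter_reference(resets))
```

The resulting list (None, then 0000 through 0110) matches the out values printed 1 ns after each of those edges.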

3. You can do STEP 1 and STEP 2 in a single step. The command below compiles and
simulates in one step:

[hkommuru@hafez vcs]$ vcs -V -R -f main_counter.f -o simv

In the above command,

-V : verbose output
-R : tells the tool to run the simulation immediately/automatically after
compilation
-o : output file name; can be anything: simv, counter.simv, etc.
-f : specifies a file containing a list of source files and command line options
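If you run these steps often, you may want to script the command line. The sketch below is a hypothetical helper, not part of VCS. It only assembles the argument list from the options described in this tutorial (-V, -R, -f, +v2k, -o, plus the -gui and -debug_pp options used in the DVE section). Nothing is executed unless you pass the list to `subprocess.run` yourself, which requires VCS to be installed and the setup script sourced.

```python
from typing import List


def vcs_command(filelist: str, output: str = "simv", verbose: bool = False,
                run_after_compile: bool = False, v2k: bool = False,
                gui: bool = False) -> List[str]:
    """Build a vcs argument list from the options described in this tutorial."""
    cmd = ["vcs"]
    if verbose:
        cmd.append("-V")                  # verbose output
    if run_after_compile:
        cmd.append("-R")                  # simulate immediately after compiling
    cmd += ["-f", filelist]               # file listing sources / options
    if v2k:
        cmd.append("+v2k")                # enable Verilog-2001 constructs
    cmd += ["-o", output]                 # name of the simulation executable
    if gui:
        cmd += ["-gui", "-debug_pp"]      # open DVE with debug support compiled in
    return cmd


if __name__ == "__main__":
    print(" ".join(vcs_command("main_counter.f", verbose=True, run_after_compile=True)))
    # To actually run it:
    #   import subprocess; subprocess.run(vcs_command("main_counter.f"), check=True)
```

The first call reproduces the one-step command shown above: `vcs -V -R -f main_counter.f -o simv`.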

To compile and simulate your own design, write your Verilog code and copy it to the
vcs directory. Then follow the tutorial steps to compile and simulate.

3.2 DVE TUTORIAL


DVE provides you a graphical user interface to debug your design. Using DVE you can
debug the design in interactive mode or in postprocessing mode. In the interactive mode,
apart from running the simulation, DVE allows you to do the following:
• View waveforms
• Trace Drivers and loads
• Schematic, and Path Schematic view
• Compare waveforms
• Execute UCLI/Tcl commands
• Set line, time, or event break points
• Line stepping



However, in post-processing mode, a VPD/VCD/EVCD file is created during simulation,
and you use DVE to:
• View waveforms
• Trace Drivers and loads
• Schematic, and Path Schematic view
• Compare waveforms
Use the command line below to invoke the simulation in interactive mode using DVE:

[hkommuru@hafez vcs]$ ./simv -gui

A TopLevel window is a frame that displays panes and views.

• A pane can be displayed only once in each TopLevel window and serves a specific debug
purpose. Examples of panes are the Hierarchy, Data, and Console panes.
• A view can have multiple instances per TopLevel window. Examples of views are
Source, Wave, List, Memory, and Schematic. Panes can be docked on any side of a
TopLevel window or left floating in the area of the frame not occupied by docked panes
(called the workspace).
Instead of the above command, you can also compile, simulate, and open the GUI in a
single step.

1. Invoke DVE to view the waveform. At the Unix prompt, type:

[hkommuru@hafez vcs]$ vcs -V -R -f main_counter.f -o simv -gui -debug_pp

The -debug_pp option enables the debug capabilities that DVE needs, including dumping
of the VPD waveform file that DVE uses to display the simulation. The DVE window will
open up.



2. In the above window, open up the counter module to view the whole module like
below. Click on dut highlighted in blue and drag it to the data pane as shown below. All
the signals in the design will show up in the data pane.



3. In this window, click on "Setup" under the "Simulator" option. A new small window
will open up. inter.vpd is the file the simulator will use to record the waveforms.



The -debug_pp option in step 1 creates this file. Click OK; the setup for running the
simulation is now complete.

4. Now in the data pane, select the signals with the left mouse button while holding the
Shift key, so that you can select as many signals as you want. Click the right mouse
button to open a menu, and click on "Add to group => New group". A new window will
open up showing a new group of the selected signals.

You can create any number of signal groups, so that you can organize the signals the
way you want to see their outputs.



5. Now in the data pane, select the signals with the left mouse button while holding the
Shift key. Click the right mouse button to open a menu, and click on "Add to waves =>
New wave view". A new waveform window will open with the simulator at 0 ns.



6. In the waveform window, go to the "Simulator" menu option and click on "Start". The
tool now runs the simulation, and you can verify the functionality of the design.

In the waveform window, the menu option View => Set Time Scale can be used to
change the display unit and the display precision.

7. You can save your current session and reload the same session next time, or start a
new session. The menu option File => Save Session opens the save-session window.



8. For additional debugging steps, you can go to the following menu options:
1. Scope => Show Source Code: You can view and analyze your source code here.
2. Scope => Show Schematic: You can view a schematic of the design.



9. Adding Breakpoints in Simulation. To be able to add breakpoints, you have to use the
additional compile option -debug_all when you compile the code, as shown below.

[hkommuru@hafez vcs]$ vcs -V -R -f main_counter.f -o simv -gui -debug_pp -debug_all

The menu option Simulation => Breakpoints opens a new window. You need to do this
before Step 6, i.e. before actually running the simulation.

Browse to the file and line number you want, and click on the "Create" button to
create a breakpoint.

Now when you start the simulation with Simulator => Start, it will stop at your defined
breakpoint; click on Next to continue.

You can save your session again and exit when you are done with debugging, or in the
middle of debugging your design.

Verilog Code
File : Counter.v

module counter (out, clk, reset);

  input clk, reset;
  output [3:0] out;

  reg [3:0] out;
  wire [3:0] next;

  // This statement implements reset and increment
  assign next = reset ? 4'b0 : (out + 4'b1);

  // This implements the flip-flops
  always @ (posedge clk) begin
    out <= #1 next;
  end

endmodule // counter

File : Counter_tb.v [ Test Bench ]


// This stuff just sets up the proper time scale and format for the
// simulation, for now do not modify.
`timescale 1ns/10ps
module timeunit;
initial $timeformat(-9,1," ns",9);
endmodule

// Here is the testbench proper:


module counter_testbench ( ) ;

// Test bench gets wires for all device under test (DUT) outputs:

wire [3:0] out;

// Regs for all DUT inputs:


reg clk;
reg reset;

counter dut (// (dut means device under test)


// Outputs
.out (out[3:0]),
// Inputs
.reset (reset),
.clk (clk));

// Setup clk to automatically strobe with a period of 20.


always #10 clk = ~clk;

initial
begin

// First setup up to monitor all inputs and outputs


$monitor ("time=%5d ns, clk=%b, reset=%b, out=%b", $time, clk,
reset, out[3:0]);

// First initialize all registers
clk = 1'b0; // what happens to clk if we don't
// set this?;
reset = 1'b0;

@(posedge clk);#1; // this says wait for rising edge
// of clk and one more tic (to prevent
// shoot through)

reset = 1'b1;

@(posedge clk);#1;

reset = 1'b0;

// Let's watch what happens after 7 cycles
@(posedge clk);#1;
@(posedge clk);#1;
@(posedge clk);#1;
@(posedge clk);#1;
@(posedge clk);#1;
@(posedge clk);#1;
@(posedge clk);#1;

// At this point we should have a 4'b0110 coming out on out because
// the counter should have counted for 7 cycles from 0
if (out != 4'b0110) begin
$display("ERROR 1: Out is not equal to 4'b0110");
$finish;
end

// We got this far so all tests passed.


$display("All tests completed sucessfully\n\n");

$finish;
end

// This is to create a dump file for offline viewing.


initial
begin
$dumpfile ("counter.dump");
$dumpvars (0, counter_testbench);
end // initial begin

endmodule // counter_testbench



APPENDIX 3A: Overview of RTL
3.A.1 Register Transfer Logic

RTL is expressed in Verilog or VHDL. This document will cover the basics of Verilog.
Verilog is a Hardware Description Language (HDL). A hardware description language is
a language used to describe a digital system, for example latches, flip-flops, and
combinational and sequential elements. Basically, you can use Verilog to describe any kind
of digital system. One can design a digital system in Verilog at any level of abstraction.
The most important levels are:

• Behavior Level: This level describes a system by concurrent algorithms
(behavioral). Each algorithm itself is sequential, which means it consists of a set of
instructions that are executed one after the other. There is no regard to the
structural realization of the design. Example: use of the 'always' statement in Verilog.
• Register Transfer Level (RTL): Designs at the register-transfer level
specify the characteristics of a circuit by the transfer of data between registers,
along with the functionality; for example, finite state machines. An explicit clock is
used. An RTL design has exact timing: data transfers are scheduled to occur at specific
clock edges.
• Gate Level: The system is described in terms of gates (AND, OR, NOT, NAND,
etc.). The signals can have only these four logic states ('0', '1', 'X', 'Z'). Gate-level
design is normally not done by hand, because the output of logic synthesis is the
gate-level netlist.
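To make the behavioral and gate levels concrete, the following sketch describes the same 2:1 multiplexer both ways. The module names mux2_behav and mux2_gate are hypothetical, chosen only for this illustration; the gate-level version uses Verilog's built-in primitives (not, and, or), whose first terminal is always the output.

```verilog
// Behavioral level: describe WHAT the circuit does with an always block.
module mux2_behav (input a, input b, input sel, output reg y);
  always @(a or b or sel)
    if (sel)
      y = b;
    else
      y = a;
endmodule

// Gate level: describe HOW it is built, using Verilog gate primitives.
module mux2_gate (input a, input b, input sel, output y);
  wire sel_n, a_path, b_path;
  not g0 (sel_n, sel);          // sel_n = ~sel
  and g1 (a_path, a, sel_n);    // a passes when sel = 0
  and g2 (b_path, b, sel);      // b passes when sel = 1
  or  g3 (y, a_path, b_path);   // combine the two paths
endmodule
```

A synthesis tool turns a behavioral description like the first one into a structural netlist similar in spirit to the second, except that the gates are cells from the target library rather than Verilog primitives.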

Verilog allows hardware designers to express their designs at the behavioral level and
defer the details of implementation to a later stage in the design of the chip. The
design is normally written in a top-down approach. The system has a hierarchy, which
makes it easier to design and debug. The basic skeleton of a Verilog module looks like
this:
module example (<ports>);

  input <ports>;
  output <ports>;
  inout <ports>;

  // Data-type declarations
  // reg data-type stores values
  reg <names>;

  // wire data-type connects two different pins/ports
  wire <names>;

  <Instantiation>

endmodule

Modules can reference other modules to form a hierarchy. A module that contains
references to lower-level modules also describes the interconnections between them; each
such reference to a lower-level module is called a module instance. Each instance is an
independent, concurrently active copy of a module. Each module instance consists of the
name of the module being instanced (e.g. NAND or INV), an instance name (unique to
that instance within the current module), and a port connection list.

NAND N1 (in1, in2, out);

INV V1 (a, abar);

The instance names in the above example are 'N1' and 'V1', and they have to be unique.
The port connection list consists of the terms inside the parentheses ( ). The module port
connections can be given in order (positional mapping), or the ports can be explicitly
named as they are connected (named mapping). Named mapping is usually preferred for
long connection lists, as it makes errors less likely.

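There are two ways to connect the ports of an instance:

1. Port mapping by name: the connections do not have to follow the port order.
Example:
INV V2 (.in (a), .out (abar));

2. Port mapping by order: the port names (.in) and (.out) do not have to be specified,
but the signals must follow the port order.
Example:
AND A1 (a, b, aandb);

If 'a' and 'b' are the inputs and 'aandb' is the output, and the AND module declares its
ports in the order (input, input, output), then the signals must be listed in the same order
as shown above. One cannot write it this way:
AND A1 (aandb, a, b);

This would connect 'aandb' to the first input port and let the output port drive 'b',
producing an incorrect circuit. (Note that Verilog's built-in lowercase 'and' primitive is
different: its output comes first.)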
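The following sketch shows both connection styles on one hypothetical cell, and2, whose ports are declared in the order (a, b, y). The module names and the Verilog-2001 ANSI-style port declarations are assumptions made for this illustration.

```verilog
// A simple 2-input AND cell; its port order is (a, b, y).
module and2 (input a, input b, output y);
  assign y = a & b;
endmodule

module port_map_demo (input p, input q, output r_named, output r_ordered);
  // Named mapping: each port is connected explicitly, so the order does not matter.
  and2 u_named (.y(r_named), .b(q), .a(p));

  // Positional mapping: the signals must follow the declared port order (a, b, y).
  and2 u_ordered (p, q, r_ordered);
endmodule
```

Because u_named lists its ports out of order and is still connected correctly, named mapping is the safer choice for modules with many ports.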

Example Verilog Code: D Flip Flop

module dff (d, clk, q, qbar);
  input d, clk;
  output q, qbar;
  reg q, qbar;

  always @(posedge clk)
  begin
    q <= d;
    qbar <= ~d;   // non-blocking, like q, so both outputs update together
  end

endmodule

3.A.2 Digital Design

Digital design can be broken into combinational logic and sequential logic. As
mentioned earlier, hardware description languages are used to model RTL, and RTL
is nothing but combinational and sequential logic. The most popular language used to
model RTL is Verilog. The following are a few guidelines for coding digital logic in
Verilog:
1. Not everything written in Verilog is synthesizable. The synthesis tool does not
synthesize everything that is written, so we need to make sure that the logic implied
by the code is synthesized into what we intend and not anything else.
a. Mostly, time-dependent constructs are not synthesizable in Verilog. Some of
the Verilog constructs that are non-synthesizable are tasks containing timing
controls, wait, initial statements, delays, test benches, etc.
b. Some of the Verilog constructs that are synthesizable are assign statements,
always blocks, functions, etc. Please refer to the next section for more detailed
information.
2. One can model both level-sensitive and edge-sensitive behavior in Verilog, using
an always block.
a. An always block becomes a combinational circuit only if every output is
completely specified, i.e. assigned a value for every combination of the
signals in its sensitivity list. If an output is not completely specified,
then the logic will be synthesized to a latch. The following are a few
examples to clarify this:
b. Code which results in level sensitive behavior
c. Code which results in edge sensitive behavior
d. Case Statement Example
i. casex
ii. casez
3. Blocking and Non Blocking statements
a. Example: Blocking assignment
b. Example: Non Blocking assignment
4. Modeling Synchronous and Asynchronous Reset in Verilog
a. Example: With Synchronous reset
b. Example: With Asynchronous reset
5. Modeling State Machines in Verilog
a. Using One Hot Encoding
b. Using Binary Encoding
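Items 2 to 4 above name several coding situations without showing code. As a minimal sketch (the module names are hypothetical), the following illustrates two of them: an incompletely specified always block that infers a latch versus a fully specified one that stays combinational, and flip-flops with synchronous versus asynchronous reset. It follows the common guideline of blocking (=) assignments for combinational logic and non-blocking (<=) assignments for flip-flops.

```verilog
// (2) Incomplete vs. complete assignment in a level-sensitive always block.
module latch_vs_comb (input en, input d, output reg q_latch, output reg q_comb);
  // q_latch is not assigned when en == 0, so it must hold its old value:
  // synthesis infers a level-sensitive latch here.
  always @(en or d)
    if (en)
      q_latch = d;

  // q_comb is assigned on every path (a default comes first), so this
  // block is purely combinational logic.
  always @(en or d) begin
    q_comb = 1'b0;
    if (en)
      q_comb = d;
  end
endmodule

// (4) Edge-sensitive flip-flops with synchronous and asynchronous reset.
module reset_styles (input clk, input rst, input d,
                     output reg q_sync, output reg q_async);
  // Synchronous reset: rst is only sampled on the rising clock edge.
  always @(posedge clk)
    if (rst)
      q_sync <= 1'b0;
    else
      q_sync <= d;

  // Asynchronous reset: rst is in the sensitivity list, so it clears the
  // flop as soon as it rises, without waiting for a clock edge.
  always @(posedge clk or posedge rst)
    if (rst)
      q_async <= 1'b0;
    else
      q_async <= d;
endmodule
```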

APPENDIX 3B: TEST BENCH / VERIFICATION

After designing the system, it is vital to verify the logic designed. At the front end,
this is done through simulation. In Verilog, test benches are written to verify the code.

This topic deals with the whole verification process. Some basic guidelines for writing
test benches:

• A test bench instantiates the top-level design and provides the stimulus to the design.
• Inputs of the design are declared as 'reg' type. The reg data type holds a value until a
new value is driven onto it in an initial or always block. The reg type can only be
assigned a value in an always or initial block, and is used to apply stimulus to the inputs
of the Device Under Test.
• Outputs of the design are declared as 'wire' type. The wire type is a passive data type
that holds a value driven on it by a port, assign statement, or reg type. Wires cannot be
assigned values inside always and initial blocks.
• Always and initial blocks are two sequential control blocks that operate on reg types
in a Verilog simulation. Each initial and always block starts executing concurrently in
every module at the start of simulation. An example of an initial block is shown below:

reg clk_50, rst_l;

initial
begin
  $display($time, " << Starting the Simulation >>");
  clk_50 = 1'b0;       // at time 0
  rst_l = 1'b0;        // reset is active
  #20 rst_l = 1'b1;    // at time 20 release reset
end

Initial blocks start executing at simulation time 0, and the statements inside them run
sequentially. Starting with the first line between the "begin ... end" pair, each line
executes from top to bottom until a delay is reached. When a delay is reached, the
execution of this block waits until the delay time has passed and then picks up execution
again. Each initial and always block executes concurrently. The initial block in the
example starts by printing << Starting the Simulation >> to the screen, and initializes the
reg types clk_50 and rst_l to 0 at time 0. The simulation time wheel then advances to
time index 20, and the value on rst_l changes to 1. This simple block of code initializes
the clk_50 and rst_l reg types at the beginning of simulation and holds the active-low
reset rst_l low for the first 20 ns before releasing it.

Test benches also call system tasks. These system tasks are ignored by the synthesis
tool, so it is OK to use them. System task names begin with a '$' sign. Some of the
system tasks are as follows:
a. $display: Displays text on the screen during simulation.
b. $monitor: Displays the listed values on the screen whenever any of them
changes.
c. $strobe: Same as $display, but prints the text only at the end of the current
time step.
d. $stop: Halts the simulation at a certain point in the code. The user can then
enter the next set of commands to the simulator. After $stop, you get back to
the CLI prompt.
e. $finish: Exits the simulator.
f. $dumpfile, $dumpvars: $dumpfile names the dump file, and $dumpvars dumps the
variables in a design to it. You can dump the values at different points in the simulation.
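The difference between $display and $strobe matters when a value is updated with a non-blocking assignment in the same time step. The small demo module below (its name is hypothetical) shows this; the comments describe what each task is expected to print.

```verilog
module systask_demo;
  reg [3:0] count;

  initial begin
    // $monitor prints a line at the end of every time step in which count changes.
    $monitor("monitor: t=%0t count=%0d", $time, count);
    count = 4'd0;
    #10;
    count <= 4'd5;                          // non-blocking: applied later in this time step
    $display("display: count=%0d", count);  // runs immediately: still shows 0
    $strobe ("strobe : count=%0d", count);  // runs at the end of the time step: shows 5
    #10 $finish;
  end
endmodule
```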



Tasks are used to group a set of repetitive or related commands that would normally
be contained in an initial or always block. A task can have inputs, outputs, and inouts,
and can contain timing or delay elements. An example of a task is below:

task load_count;
  input [3:0] load_value;
  begin
    @(negedge clk_50);
    $display($time, " << Loading the counter with %h >>", load_value);
    load_l = 1'b0;
    count_in = load_value;
    @(negedge clk_50);
    load_l = 1'b1;
  end
endtask //of load_count

This task takes one 4-bit input vector and starts executing at the next negative edge of
clk_50. It first prints to the screen, drives load_l low, and drives the count_in of the
counter with the load_value passed to the task. At the following negative edge of clk_50,
the load_l signal is released. The task must be called from an initial or always block. If
the simulation was extended and multiple loads were done to the counter, this task could
be called multiple times with different load values.

• The compiler directive `timescale:

`timescale 1 ns / 100 ps

This line is important in a Verilog simulation, because it sets up the time scale and
operating precision for a module. It makes the unit delays nanoseconds (ns) and makes
the simulator round delay values to a precision of 100 ps. This causes a #5 or #1 in a
Verilog assignment to be a 5 ns or 1 ns delay, respectively. Events are rounded to
0.1 ns, or 100 picoseconds.
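To see the precision part of `timescale in action, here is a sketch (the module name is hypothetical) with a fractional delay that is finer than the 100 ps precision and therefore gets rounded.

```verilog
`timescale 1ns / 100ps

module timescale_demo;
  reg sig;
  initial begin
    $timeformat(-9, 1, " ns", 8);   // print times in ns with one decimal place
    sig = 1'b0;
    #5    sig = 1'b1;               // 5 time units = 5 ns
    #1.26 sig = 1'b0;               // 1.26 ns is rounded to the 100 ps precision: 1.3 ns
    $display("sig fell at %t", $realtime);  // reports the fall time (6.3 ns)
  end
endmodule
```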

• Verilog test benches use a standard, which contains a description of the C language
procedural interface, better known as the programming language interface (PLI). We can
treat PLI as a standardized simulator API (Application Program Interface) for routines
written in C or C++. The most recent extensions to PLI are known as the Verilog
procedural interface (VPI).

• Before writing the test bench, it is important to understand the design specifications
and create a list of all possible test cases.

• You can view all the signals in the waveform viewer and check whether the signal
values are correct.

• When designing the test bench, you can set breakpoints at certain times or run the
simulation in single steps; one can also have time-related breakpoints (example:
execute the simulation for 10 ns and then stop).

• To test the design further, it is good to have randomized simulation. Random
simulation is nothing but supplying random combinations of valid inputs to the
simulation tool and running it for a long time. The longer such a random simulation runs,
the more likely it is to hit corner cases that directed tests miss, and we can hope that it
will emulate real system behavior. You can create random stimulus in the test bench by
using the $random system function.
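A minimal sketch of random simulation, assuming a hypothetical 4-bit adder (add4) as the device under test: $random with a fixed seed generates the inputs, and each result is checked against the expected value a + b, which stands in for a reference model. The seed, the number of vectors, and the module names are arbitrary choices for illustration.

```verilog
// Device under test for this sketch: a 4-bit adder with carry out.
module add4 (input [3:0] a, input [3:0] b, output [4:0] sum);
  assign sum = a + b;
endmodule

module random_tb;
  reg  [3:0] a, b;
  wire [4:0] sum;
  integer seed, n, errors;

  add4 dut (.a(a), .b(b), .sum(sum));

  initial begin
    seed = 1;                 // a fixed seed makes the random sequence repeatable
    errors = 0;
    for (n = 0; n < 1000; n = n + 1) begin
      a = $random(seed);      // $random returns 32 bits; the assignment keeps the low 4
      b = $random(seed);
      #1;                     // let the combinational output settle
      if (sum !== a + b) begin
        errors = errors + 1;
        $display("Mismatch: %0d + %0d gave %0d", a, b, sum);
      end
    end
    $display("%0d random vectors, %0d errors", n, errors);
  end
endmodule
```

Keeping the seed fixed means a failing run can be reproduced exactly; changing the seed explores a different set of input combinations.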



Coverage Metric: A way of seeing how many possibilities exist and how many of
them are exercised by the simulation test bench. It is always good to have maximum
coverage.
a. Line Coverage: The percentage of lines in the code that are executed during
simulation.
b. Condition Coverage: Checks all the conditions in the code and verifies
whether all the possible outcomes of each condition have been
covered.
c. State Machine Coverage: The percentage of the states and state transitions
of a state machine that have been covered.
d. Regression Test Suite: Regression testing is done when a new portion is
added to already verified code. The code is tested again to verify that the
new functionality works and that the old functionality has not been
changed by the addition of the new code.

Goals of Simulation are:


1. Functional Correctness: To verify the functionality of the design by verifying the
main test cases, corner cases (special conditions), etc.
2. Error Handling
3. Performance

Basic Steps in Simulation:

1. Compilation: During compilation, the Verilog code is converted to object code. This is
done on a module basis.
2. Linking: This is the step where module interconnectivity is resolved. The object files
are linked together, and any port mismatches are reported.
3. Execution: An executable file is created and executed.

3.B.1 Test Bench Example:

The following is an example of a simple read/write state machine design and a test
bench to test the state machine.

State Machine:

module state_machine(sm_in, sm_clock, reset, sm_out);

  parameter idle    = 2'b00;
  parameter read    = 2'b01;
  parameter write   = 2'b11;
  parameter waiting = 2'b10;   // "wait" is a reserved word in Verilog

  input sm_clock;
  input reset;
  input sm_in;
  output sm_out;

  reg sm_out;
  reg [1:0] current_state, next_state;

  always @ (posedge sm_clock)
  begin
    if (reset == 1'b1)
      current_state <= idle;
    else
      current_state <= next_state;
  end

  always @ (current_state or sm_in)
  begin
    // default values
    sm_out = 1'b1;
    next_state = current_state;
    case (current_state)
      idle: begin
        sm_out = 1'b0;
        if (sm_in)
          next_state = write;
      end
      write: begin
        sm_out = 1'b0;
        if (sm_in == 1'b0)
          next_state = waiting;
      end
      read: begin
        if (sm_in == 1'b1)
          next_state = read;
      end
      waiting: begin
        if (sm_in == 1'b1)
          next_state = idle;
      end
    endcase
  end

endmodule

Test Bench for State Machine


module testbench;

// parameter declaration section


// ...
parameter idle_state = 2'b00;
parameter read_state = 2'b01;
parameter write_state = 2'b11;
parameter wait_state = 2'b10;

// testbench declaration section


reg [500:1] message;
reg [500:1] state_message;
reg in1;
reg clk;
reg reset;
wire data_mux;

// instantiations
state_machine #(idle_state,
read_state,
write_state,
wait_state) st_mac (
.sm_in (in1),
.sm_clock (clk),
.reset (reset),
.sm_out (data_mux)
);

// monitor section
always @ (st_mac.current_state)
case (st_mac.current_state)
idle_state : state_message = "idle";
read_state : state_message = "read";
write_state: state_message = "write";
wait_state : state_message = "wait";
endcase

// clock declaration
initial clk = 1'b0;
always #50 clk = ~clk;

// tasks
task reset_cct;
begin
@(posedge clk);
message = " reset";
@(posedge clk);
reset = 1'b1;
@(posedge clk);
reset = 1'b0;
@(posedge clk);
@(posedge clk);
end
endtask

task change_in1_to;
input a;
begin
message = "change in1 task";
@ (posedge clk);
in1 = a;
end
endtask

// main task calling section


initial
begin
message = "start";
in1 = 1'b0;     // drive known values before the first clock edge
reset = 1'b0;
reset_cct;
change_in1_to (1'b1);
change_in1_to (1'b0);
change_in1_to (1'b1);
change_in1_to (1'b0);
change_in1_to (1'b1);
@ (posedge clk);
@ (posedge clk);
@ (posedge clk);
$stop;
end

endmodule

How do you simulate your design to get the real system behavior?

The following are two methods with which it is possible to achieve real system behavior
and verify it.

1. FPGA Implementation: Speeds up verification and makes it more comprehensive.

2. Hardware Accelerator: It is nothing but a bunch of FPGAs implemented inside
a box. During compilation, it takes the part of the code that is synthesizable and
maps it onto the FPGAs. All the other, non-synthesizable parts of the code, such as test
benches, are run by the simulation tools.

a. Basically, the RTL is mapped onto the FPGAs. The FPGA internally contains
optimized files.
b. It translates signal transitions in the software part into signals on the FPGA,
mapping them onto the real signals on the FPGA board.
c. This method of verification is good for a big design with a lot of RTL. It also
depends on the percentage of synthesizable versus non-synthesizable code:
the less interaction there is between the two, the better.

Design Compiler Tutorial [RTL-Gate Level Synthesis]

4.0 Introduction

The Design Compiler is a synthesis tool from Synopsys Inc. In this tutorial you will
learn how to perform hardware synthesis using Synopsys Design Compiler. In simple
terms, the synthesis tool takes an RTL [Register Transfer Level] hardware description
[a design written in either Verilog or VHDL] and a standard cell library as input, and the
resulting output is a technology-dependent gate-level netlist. The gate-level netlist is
nothing but a structural representation of the design using only cells from the standard
cell library. The synthesis tool internally performs many steps, which are listed below.
Also below is the flowchart of the synthesis process.

1. Design Compiler reads in technology libraries, DesignWare libraries, and symbol
libraries to implement synthesis.
During the synthesis process, Design Compiler [DC] translates the RTL description
to components extracted from the technology library and DesignWare library. The
technology library consists of basic logic gates and flip-flops. The DesignWare library
contains more complex cells, for example adders and comparators, which can be used as
arithmetic building blocks. DC can automatically determine when to use DesignWare
components, and it can then efficiently synthesize these components into gate-level
implementations.
2. DC reads the RTL hardware description written in either Verilog or VHDL.
3. The synthesis tool then performs many steps, including high-level RTL optimization,
translation of the RTL to unoptimized Boolean logic, technology-independent
optimizations, and finally technology mapping to the available standard cells in the
technology library, known as the target library. The resulting gate-level netlist also
depends on the constraints given. Constraints are the designer's specification of timing
and environmental restrictions [area, power, process, etc.] under which synthesis is to
be performed.
As an RTL designer, it is good to understand the target standard cell library, so
that one can better understand how the RTL code will be synthesized into gates. In this
tutorial we will use Synopsys Design Compiler to read/elaborate RTL, set timing
constraints, synthesize to gates, and report various QOR reports [timing/area reports,
etc.]. Please refer to APPENDIX 3A: Overview of RTL for more information.


4. After the design is optimized, it is ready for DFT [design for test/ test synthesis]. DFT
is test logic; designers can integrate DFT into design during synthesis. This helps the
designer to test for issues early in the design cycle and also can be used for debugging
process after the chip comes back from fabrication.
In this tutorial, we will not be covering the DFT process. The synthesized design in
the tutorial example is without the DFT logic. Please refer to tutorial on Design for Test
for more information.
5. After test synthesis, the design is ready for the place and route tools. The Place and
route tools place and physically interconnect cells in the design. Based on the physical
routing, the designer can back-annotate the design with actual interconnect delays; we
can use Design Compiler again to resynthesize the design for more accurate timing
analysis.
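To make the DesignWare point from step 1 concrete, here is a small RTL sketch (the module name alu_slice is hypothetical) whose + and > operators are the kind of arithmetic that DC may implement with DesignWare adders and comparators. Which implementation DC actually chooses depends on the libraries and constraints, so the comments only mark candidates.

```verilog
// RTL using arithmetic and relational operators. During synthesis, DC can
// implement operators such as + and > with DesignWare components instead of
// building them up from basic gates.
module alu_slice (input  [7:0] a,
                  input  [7:0] b,
                  output [8:0] sum,      // one extra bit keeps the carry
                  output       a_gt_b);
  assign sum    = a + b;   // candidate for a DesignWare adder
  assign a_gt_b = a > b;   // candidate for a DesignWare comparator
endmodule
```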

Figure 4.a Synthesis Flow (Read Libraries; Read Netlist; Map to Link Library, if gate-level; Apply SDC Constraints; Map to Target Library and Optimize; Write-out Optimized Netlist)

While running DC, it is important to monitor/check the log files, reports, scripts, etc. to
identify issues which might affect the area, power, and performance of the design. In this
tutorial, we will learn how to read the various DC reports and also use the graphical
Design Vision tool from Synopsys to analyze the synthesized design.

For additional documentation, please refer to the location below, where you can get more
information on the 90nm Standard Cell Library, Design Compiler, Design Vision,
DesignWare Libraries, etc.

4.1 BASIC SYNTHESIS GUIDELINES


4.1.1 Startup File
When the Synopsys synthesis tool is invoked through the Design Compiler command, it
reads a startup file named .synopsys_dc.setup. There should be two startup files present,
one in the current working directory and the other in the root directory in which Synopsys
is installed. The startup file in the installation root directory does not contain design
dependent data; its function is to load the Synopsys technology independent libraries and
other parameters. The local startup file in the current working directory is where the user
specifies the individual, design dependent settings. The settings provided in the current
working directory override the ones specified in the root directory.

There are four important parameters that should be set up before one can start
using the tool. They are:
• search_path
This parameter specifies all the paths the synthesis tool should search when looking
for a synthesis technology library for reference during synthesis.
• target_library
This parameter specifies the file that contains all the logic cells that should be used for
mapping during synthesis. In other words, during synthesis the tool maps the design to
the logic cells present in this library.
• symbol_library
This parameter points to the library that contains the "visual" information on the logic
cells in the synthesis technology library. All logic cells have a symbolic representation,
and information about the symbols is stored in this library.
• link_library
This parameter points to the library that contains information on the logic gates in the
synthesis technology library. The tool uses this library solely for reference but does not
use the cells present in it for mapping, as it does with target_library.

An example of the use of these four variables in a .synopsys_dc.setup file is given below.
search_path = ". /synopsys/libraries/syn/cell_library/libraries/syn"
target_library = class.db
link_library = class.db
symbol_library = class.db
Once these variables are set up properly, one can invoke the synthesis tool at the
command prompt using any of the commands given for the two interfaces.



4.1.2 Design Objects

There are eight different types of objects recognized by Design Compiler.

Design: It corresponds to the circuit description that performs some logical function. The
design may be stand-alone or may include other sub-designs. Although a sub-design may
be part of the design, it is treated as another design by Synopsys.
Cell: It is the instance name of the sub-design in the design. In Synopsys terminology,
there is no differentiation between a cell and an instance; both are treated as cells.
Reference: This is the definition of the original design to which the cell or instance refers.
For example, a leaf cell in the netlist must be referenced from the link library, which
contains the functional description of the cell. Similarly, an instantiated sub-design must
be referenced to the design that contains the functional description of the instantiated
sub-design.
Ports: These are the primary inputs, outputs, or IOs of the design.
Pin: It corresponds to the inputs, outputs, or IOs of the cells in the design. (Note the
difference between port and pin.)
Net: These are the signal names, i.e., the wires that hook up the design together by
connecting ports to pins and/or pins to each other.
Clock: The port or pin that is identified as a clock source. The identification may be
internal to the library or it may be done using dc_shell commands.
Library: Corresponds to the collection of technology specific cells that the design is
targeting for synthesis, or linking for reference.
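The sketch below (all module and signal names are hypothetical) labels these object types on a tiny two-level design. In a mapped netlist, leaf cells such as the inverter would instead reference cells from the technology library, which is where the Library object comes in.

```verilog
// Design "inv_cell"; a and y are its ports.
module inv_cell (input a, output y);
  assign y = ~a;
endmodule

// Design "dff_cell"; d, clk and q are its ports.
module dff_cell (input d, input clk, output reg q);
  always @(posedge clk)
    q <= d;
endmodule

// Design "top_design"; din, clk and dout are its ports.
module top_design (input din, input clk, output dout);
  wire n1;                       // net "n1" connects pin u_inv/y to pin u_ff/d

  // Cell "u_inv" references design "inv_cell"; u_inv/a and u_inv/y are pins.
  inv_cell u_inv (.a(din), .y(n1));

  // Cell "u_ff" references design "dff_cell". Port clk becomes a clock object
  // once it is identified as a clock (for example with a create_clock constraint).
  dff_cell u_ff (.d(n1), .clk(clk), .q(dout));
endmodule
```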

Design Entry
Before synthesis, the design must be entered into the Design Compiler (referred to as DC
from now on) in the RTL format. DC provides the following two methods of design
entry:

read command
analyze & elaborate commands

The analyze & elaborate commands are two different commands, allowing designers to
initially analyze the design for syntax errors and RTL translation before building the
generic logic for the design. The generic logic or GTECH components are part of the
Synopsys generic, technology-independent library. They are unmapped representations of
Boolean functions and serve as placeholders for cells of the technology-dependent library.

The analyze command also stores the result of the translation in the specified design
library, where it may be used later. So a design analyzed once need not be analyzed again and
can simply be elaborated, thus saving time. Conversely, the read command performs the
functions of both the analyze and elaborate commands but does not store the analyzed results,
making the process slower by comparison.



Parameterized designs (such as designs using generic statements in VHDL) must use the analyze
and elaborate commands in order to pass the required parameters while elaborating the
design. The read command should be used for entering pre-compiled designs or netlists
into DC.

One other major difference between the two methods is that, with analyze and elaborate
entry of a VHDL design, one can specify different architectures during elaboration for the
same analyzed design. This option is not available with the read command.

The commands used for both methods in DC are given below.

Read command:

dc_shell> read -format <format> <list of file names>

The "-format" option specifies the format of the input file, e.g. VHDL.
A sample command for reading the "adder.vhd" file in VHDL format is given below:

dc_shell> read -format vhdl adder.vhd

Or, for a Verilog file:

dc_shell> read -format verilog adder.v

Analyze and Elaborate commands:

dc_shell> analyze -format <format> <list of file names>

dc_shell> elaborate <design_name> -arch "<architecture>" -param "<parameter>"

The analyze command stores the analyzed information for the design (as .syn files) in the
design library; elaborate then builds the named design from that stored information.
For example, the adder entity in adder.vhd has a generic parameter "width", which can be
specified during elaboration. The architecture used is "beh", defined in the adder.vhd file.
The commands for analyze and elaborate are given below:

dc_shell> analyze -format vhdl adder.vhd

dc_shell> elaborate adder -arch "beh" -param "width = 32"

4.1.3 Technology Library

Technology libraries contain the information that the synthesis tool needs to generate a
netlist for a design based on the desired logical behavior and constraints on the design.
Using the information provided in a particular library, the tool makes appropriate choices
to build the design. The libraries contain not only the logical function of an ASIC cell, but
also the area of the cell, the input-to-output timing of the cell, any constraints on the fanout
of the cell, and the timing checks that are required for the cell.



Other information stored in the technology library may be the graphical symbol of the
cell for use in creating the netlist schematic.

The target_library, link_library, and symbol_library variables in the startup file are
used to set the technology libraries for the synthesis tool.

The Synopsys® .lib technology library contains the following information:


• Wire-load models for estimating net length, and hence net parasitics, from fanout. Wire-load
models in the technology library are statistical, so their estimates can be inaccurate.
• Operating Conditions along with scaling k-factors for different delay components to
model the effects of temperature, process, and voltage on the delay numbers.
• Specific delay models like piece-wise linear, non-linear, cmos2 etc. for calculation of
delay values.
For each technology primitive cell, the following information is modeled:
• Interface pin names, direction and other information.
• Functional descriptions for both combinational and sequential cells which can be
modeled in Synopsys®
• Pin capacitance and drive capabilities
• Pin to pin timing
• Area
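The wire-load estimation mentioned in the list above can be sketched in Python. This is a simplified model that assumes the common Liberty-style wire-load structure (a fanout-to-length table, a slope for extrapolating beyond the largest table entry, and per-unit-length capacitance and resistance) with linear interpolation between table entries; the table values are hypothetical.

```python
def estimate_net_length(fanout, fanout_length, slope):
    """Estimate a net's length from its fanout using a wire-load table.

    fanout_length: (fanout, length) pairs, as in a wire-load model table.
    slope: extra length per fanout beyond the largest table entry.
    """
    points = sorted(fanout_length)
    lengths = dict(points)
    if fanout in lengths:
        return lengths[fanout]
    max_fanout, max_length = points[-1]
    if fanout > max_fanout:
        # Extrapolate beyond the table using the slope.
        return max_length + (fanout - max_fanout) * slope
    # Otherwise interpolate linearly between the surrounding table entries.
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        if f0 < fanout < f1:
            return l0 + (l1 - l0) * (fanout - f0) / (f1 - f0)
    raise ValueError("fanout is below the smallest table entry")


def estimate_net_rc(fanout, wlm):
    """Return (capacitance, resistance) of a net, scaled from its estimated length."""
    length = estimate_net_length(fanout, wlm["fanout_length"], wlm["slope"])
    return length * wlm["capacitance"], length * wlm["resistance"]


# Hypothetical wire-load model values (arbitrary units).
example_wlm = {
    "fanout_length": [(1, 10.0), (2, 20.0), (4, 36.0)],
    "slope": 12.0,
    "capacitance": 0.001,
    "resistance": 0.002,
}
```

Because the estimate depends only on fanout, two nets with the same fanout get the same parasitics regardless of their real routed length, which is why such models are described as statistical.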

4.1.4 Register Transfer-Level Description

A Register Transfer-Level description is a style that specifies a particular design in terms
of registers and the combinational logic in between. This is shown by the "register and
cloud" diagram in Figure 4.b.

Figure 4.b. Register and cloud diagram

The registers can be described explicitly, through component instantiation, or implicitly,
through inference. The combinational logic is described by logical equations,
sequential control statements (CASE, IF-THEN-ELSE, etc.), subprograms, or
concurrent statements, and is represented by the cloud objects between the registers in
the register and cloud diagram.



RTL is the most popular form of high-level design specification. A good coding style
helps the synthesis tool generate a design with minimal area and maximum
performance.

4.1.5 General Guidelines

Following are some guidelines which, if followed, might improve the performance
of the synthesized logic and produce a cleaner design that is suited for automating the
synthesis process.

• Clock logic, including clock gating and reset generation, should be kept in one block,
to be synthesized once and not touched again. This helps in a clean specification of
the clock constraints. Another advantage is that the modules being driven by
the clock logic can be constrained using ideal clock specifications.
• No glue logic at the top: The top block is to be used only for connecting modules
together. It should not contain any combinational glue logic. This removes the
time-consuming top-level compile, since the top level can simply be stitched together
without undergoing additional synthesis.
• The module name should be the same as the file name, and one should avoid describing
more than one module or entity in a single file. This avoids confusion while compiling
the files and during synthesis.
• While coding finite state machines, the state names should be described using
enumerated types. The combinational logic for computing the next state should be in
its own process, separate from the state registers. Implement the next-state
combinational logic with a case statement. This helps the tool optimize the logic much
better and results in a cleaner design.
• Incomplete sensitivity lists must be avoided, as they might result in simulation
mismatches between the source RTL and the synthesized logic.
• Memory elements, latches and flip-flops: A latch is inferred when an incomplete if
statement with a missing else part is specified. A flip-flop, or register, is inferred
when an edge-sensitive statement is specified in the always statement for Verilog or the
process statement for VHDL. A latch is more troublesome than a flip-flop, as it
complicates static timing analysis of the design. So designers try to avoid latches
and prefer flip-flops.
• Multiplexer Inference: A case statement is used for implementing multiplexers. To
prevent latch inference in case statements, the default part of the case statement
should always be specified. On the other hand, an if statement is used for writing
priority encoders. Multiple if statements with multiple branches result in the creation
of a priority encoder structure.
Ex: always @ (A, B, C)
    begin
      if (A == 1'b0) D = B;
      if (A == 1'b1) D = C;
    end



The above example infers a priority structure: the if statements are evaluated in sequence,
so a later assignment to D overrides an earlier one. The same code can be written using a
case statement to implement a multiplexer as follows.
always @ (A, B, C)
begin
  case (A)
    1'b0:    D = B;
    default: D = C;
  endcase
end

The same code can also be written as a single if statement with else-if branches that
cover all possible cases.
• Three-state buffers: A tri-state buffer is inferred whenever a high impedance (Z) is
assigned to an output. Tri-state logic is generally not recommended because it
reduces testability and is difficult to optimize, since it cannot be buffered.
• Signals versus Variables in VHDL: Signal assignments are order independent, i.e. the
order in which they are placed within the process statement does not affect the values
read within the process, because all signal updates take effect at the end of the
process. Variable assignments, on the other hand, take effect immediately and are order
dependent. Signal assignments are generally used within sequential processes and
variable assignments within combinational processes.
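The ordering difference between signals and variables can be illustrated with a small Python model of one execution of a VHDL process that tries to swap two values. This is only a sketch of the update semantics (immediate update versus update deferred to the end of the process), not a VHDL simulator; the process body and names are hypothetical.

```python
def run_process_with_variables(a, b):
    # Variable assignments take effect immediately, in statement order.
    a = b          # a := b;
    b = a          # b := a;  reads the NEW value of a
    return a, b


def run_process_with_signals(state):
    # Signal assignments are only scheduled; every read sees the old values.
    pending = {}
    pending["a"] = state["b"]   # a <= b;
    pending["b"] = state["a"]   # b <= a;  reads the OLD value of a
    # All scheduled updates are applied together when the process suspends.
    new_state = dict(state)
    new_state.update(pending)
    return new_state
```

With variables the "swap" leaves both values equal to b, while with signals the two values are actually exchanged.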

4.1.6 Design Attributes and Constraints

A designer, in order to achieve optimum results, has to methodically constrain the design,
by describing the design environment, target objectives and design rules. The constraints
contain timing and/or area information, usually derived from the design specifications.
The synthesis tool uses these constraints to perform synthesis and tries to optimize the
design with the aim of meeting target objectives.

4.1.6.1 Design Attributes

Design attributes set the environment in which a design is synthesized. The attributes
specify the process parameters, I/O port attributes, and statistical wire-load models. The
most common design attributes and the commands for their setting are given below:
Load: Each output can specify the drive capability that determines how many loads can
be driven within a particular time. Each input can have a load value specified that
determines how much it will slow a particular driver. Signals that arrive later than
the clock can have an attribute that specifies this fact. The load attribute specifies how
much capacitive load exists on a particular output signal. The load value is specified in
the units of the technology library, e.g. picofarads or standard loads. The
command for setting this attribute is given below:
set_load <value> <object_list>
e.g. dc_shell> set_load 1.5 x_bus



Drive: The drive specifies the drive strength at the input port. It is specified as a
resistance value, which controls how much current a particular driver can source.
The larger a driver is (i.e. the closer its resistance is to 0), the faster a particular path
will be, but a larger driver takes more area, so the designer needs to trade off speed and
area for the best performance. The command for setting the drive for a particular object
is given below:

set_drive <value> <object_list>


e.g. dc_shell> set_drive 2.7 ybus
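The speed trade-off described above can be illustrated with a first-order linear delay model, in which the load-dependent part of a delay is the driver resistance times the capacitive load. This is an assumed simplification for illustration only; real libraries usually use table-based (non-linear) delay models, and the numbers (reusing the 2.7 drive and 1.5 load values from the examples) are in arbitrary library units.

```python
def load_dependent_delay(drive_resistance, load_capacitance, intrinsic_delay=0.0):
    """First-order linear delay estimate: intrinsic delay + R_drive * C_load."""
    return intrinsic_delay + drive_resistance * load_capacitance


# Compare an ideal (zero-resistance) driver with a weaker one driving the same load.
ideal = load_dependent_delay(0.0, 1.5)
weak = load_dependent_delay(2.7, 1.5)
```

In this model a zero-resistance driver adds no load-dependent delay, which is why a larger (stronger) driver makes the path faster at the cost of area.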

4.1.6.2 Design Constraints

Design constraints specify the goals for the design. They consist of area and timing
constraints. Depending on how the design is constrained, DC/DA tries to meet the set
objectives. Realistic specification is important, because unrealistic constraints might
result in excess area, increased power and/or degraded timing. The basic commands to
constrain the design are:

set_max_area: This constraint specifies the maximum area a particular design should
have. The value is specified in the units used to describe the area of the gate-level macro
cells in the technology library.
e.g. dc_shell> set_max_area 0
Specifying an area of 0 makes the tool try its best to make the design as small as
possible.

create_clock: This command is used to define a clock object with a particular period and
waveform. The -period option defines the clock period, while the -waveform option
controls the duty cycle and the starting edge of the clock. This command is applied to
pin or port objects.

The following example specifies that a port named CLK is of type "clock" and has a period
of 40 ns with a 50% duty cycle. The positive edge of the clock starts at time 0 ns, with the
falling edge occurring at 20 ns. By changing the falling edge value, the duty cycle of the
clock may be altered.
e.g. dc_shell> create_clock -period 40 -waveform {0 20} CLK
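The relationship between the period, the {rise fall} waveform, and the duty cycle can be checked with a few lines of Python. This helper is hypothetical (not a DC command) and assumes times in ns with the falling edge later than the rising edge within one period.

```python
def clock_summary(period_ns, waveform):
    """Summarize a create_clock-style waveform {rise fall} given in ns."""
    rise, fall = waveform
    high_time = fall - rise
    return {
        "frequency_mhz": 1000.0 / period_ns,
        "duty_cycle": high_time / period_ns,
    }


summary = clock_summary(40, (0, 20))   # the CLK example above: 25 MHz, 50% duty cycle
```

Moving the falling edge from 20 ns to 10 ns, for instance, keeps the 25 MHz frequency but lowers the duty cycle to 25%.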

set_dont_touch_network: This is a very important command, usually used for clock
networks and resets. This command is used to set a dont_touch property on a port or on
a net. Note that setting this property also prevents DC from buffering the net. In addition,
any gate coming in contact with the "dont_touch" net also inherits the attribute.
e.g. dc_shell> set_dont_touch_network {CLK RST}

set_dont_touch: This is used to set a dont_touch property on the current_design, cells,
references, or nets. This command is frequently used during hierarchical compilation of
blocks to prevent DC from optimizing the dont_touch object.
e.g. dc_shell> set_dont_touch current_design



current_design is the variable referencing the current working design. It can be set using
the current_design command as follows
dc_shell>current_design <design_name>

set_input_delay: It specifies the input arrival time of a signal in relation to the clock. It is
used at the input ports, to specify the time it takes for the data to be stable after the clock
edge. The timing specification of the design usually contains this information, as the
setup/hold time requirements for the input signals. From the top-level timing
specifications the sub-level timing specifications may also be extracted.
e.g. dc_shell> set_input_delay -max 23.0 -clock CLK {datain}
dc_shell> set_input_delay -min 0.0 -clock CLK {datain}

Assume CLK has a period of 30 ns with a 50% duty cycle. For the max and min input delays
given above for datain with respect to CLK, the setup-time requirement for the input
signal datain is 7 ns (30 ns - 23 ns), while the hold-time requirement is 0 ns.

set_output_delay: This command is used at the output port to define the time it takes for
the data to be available before the clock edge. This information is usually provided in
the timing specification.
e.g. dc_shell> set_output_delay -max 19.0 -clock CLK {dataout}

Again assume CLK has a period of 30 ns with a 50% duty cycle. For the max output delay
given above for dataout with respect to CLK, the data must be valid at dataout within
11 ns (30 ns - 19 ns) of the launching clock edge.
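The arithmetic behind the input and output delay examples can be written down as a tiny Python helper. It is a simplified sketch that ignores clock latency and uncertainty and does not subtract the capturing flip-flop's setup time; the function name is hypothetical.

```python
def internal_budget(clock_period, external_delay):
    """Time left inside the design once the external (off-block) delay is used up."""
    return clock_period - external_delay


PERIOD = 30.0  # ns, as assumed in the two examples above
input_budget = internal_budget(PERIOD, 23.0)    # set_input_delay -max 23.0
output_budget = internal_budget(PERIOD, 19.0)   # set_output_delay -max 19.0
```

Both constraints describe the same idea: the clock period is shared between logic outside the block (given by the delay value) and logic inside it (what DC has to meet).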

set_max_delay: It defines the maximum delay, in time units, required for a
particular path. In general, it is used for blocks that contain combinational logic only.
However, it may also be used to constrain a block that is driven by multiple clocks, each
with a different frequency. This command has precedence over DC-derived timing
requirements.
e.g. dc_shell> set_max_delay 5 -from [all_inputs] -to [all_outputs]

set_min_delay: It defines the minimum delay, in time units, required for a
particular path. It is the opposite of the set_max_delay command. This command has
precedence over DC-derived timing requirements.
e.g. dc_shell> set_min_delay 3 -from [all_inputs] -to [all_outputs]

4.2 Tutorial Example

Setup
1. Write the Verilog Code. For the purpose of this tutorial, please consider the simple
verilog code for gray counter below.



Gray Code Counter
// MODULE: Sequential Circuit Example: gray_counter.v
// MODULE DECLARATION
module graycount (gcc_out, reset_n, clk, en_count);

  output [2-1:0] gcc_out;  // current value of counter
  input reset_n;           // active-low RESET signal
  input clk;               // clock signal
  input en_count;          // counting is enabled when en_count = 1

  // SIGNAL DECLARATIONS
  reg [2-1:0] gcc_out;

  // Compute new gcc_out value based on current gcc_out value
  always @(negedge reset_n or posedge clk) begin
    if (~reset_n)
      gcc_out <= 2'b00;
    else begin  // MUST be a (posedge clk) - don't need "else if (posedge clk)"
      if (en_count) begin  // check the count enable
        case (gcc_out)
          2'b00:   begin gcc_out <= 2'b01; end
          2'b01:   begin gcc_out <= 2'b11; end
          2'b11:   begin gcc_out <= 2'b10; end
          default: begin gcc_out <= 2'b00; end
        endcase  // of case
      end  // of if (en_count)
    end  // of else
  end  // of always loop for computing next gcc_out value

endmodule
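Before synthesizing, it can help to confirm the intended count sequence. The Python sketch below mirrors the next-state logic of the case statement above. It is a behavioral model only; as a simplifying assumption, the asynchronous reset is modeled as if it were sampled at a clock edge.

```python
# Next-state table taken from the case statement; any other value takes the default branch.
NEXT_STATE = {0b00: 0b01, 0b01: 0b11, 0b11: 0b10}


def next_gcc_out(gcc_out, reset_n, en_count):
    """Value of gcc_out after one clock edge (reset modeled as sampled at the edge)."""
    if not reset_n:
        return 0b00
    if not en_count:
        return gcc_out                       # counter holds its value
    return NEXT_STATE.get(gcc_out, 0b00)     # default branch -> 2'b00


def simulate(cycles, en_count=1):
    state = next_gcc_out(0b00, reset_n=0, en_count=en_count)  # apply reset first
    trace = [state]
    for _ in range(cycles):
        state = next_gcc_out(state, reset_n=1, en_count=en_count)
        trace.append(state)
    return trace


trace = simulate(4)
print([format(v, "02b") for v in trace])  # ['00', '01', '11', '10', '00']
```

Consecutive values differ in exactly one bit, which is the defining property of a Gray code.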

2. As soon as you log into your engr account, type "csh" at the command prompt
as shown below. This changes the shell from bash to C shell. All the commands
work ONLY in C shell.

[hkommuru@hafez ]$csh

2. Please copy the whole directory from the location below:

[hkommuru@hafez ]$cd
[hkommuru@hafez ]$ cp -rf /packages/synopsys/setup/asic_flow_setup .

This creates a directory called "asic_flow_setup", under which the following
directories are created:

asic_flow_setup
src/ : for verilog code/source code
vcs/ : for vcs simulation
synth_graycounter/ : for synthesis
synth_fifo/ : for synthesis
pnr/ : for physical design
extraction/ : for extraction
pt/ : for primetime
verification/ : final signoff check

The "asic_flow_setup" directory will contain all generated content, including VCS
simulation, synthesized gate-level Verilog, and final layout. In this course we will always
try to keep generated content from the tools separate from our source RTL. This keeps
our project directories well organized and helps prevent us from unintentionally
modifying the source RTL. There are subdirectories in the project directory for each
major step in the ASIC Flow tutorial. These subdirectories contain scripts and
configuration files for running the tools required for that step in the tool flow. For this
tutorial we will work exclusively in the synth_graycounter directory.

3. Please source "synopsys_setup.tcl", which sets all the environment variables necessary
to run the Synopsys tools. Source it at the unix prompt as shown below:

[hkommuru@hafez ]$ source /packages/synopsys/setup/synopsys_setup.tcl

Please Note: You have to run the csh command and source synopsys_setup.tcl (step 3) every time you log in.

4. Please open "dc_synth.tcl" at the location below:

[hkommuru@hafez ]$cd
[hkommuru@hafez ]$cd asic_flow_setup/synth_graycounter
[hkommuru@hafez ]$cd scripts
[hkommuru@hafez ]$emacs dc_synth.tcl &
[hkommuru@hafez ]$cd ..

4.2.1 Synthesizing the Code

5. First we will learn how to run dc_shell manually, before we automate the scripts. Use
the command below to invoke dc_shell:

[hkommuru@hafez ]$ dc_shell-xg-t
Initializing...
dc_shell-xg-t>

Once you get the prompt above, you can run various commands to load verilog files,
libraries, etc. To get more information on any command, you can type "man
<command_name>" at the prompt.



6. Type/Copy the commands below at the command prompt, from the "dc_synth.tcl" file
which you already opened in STEP 4.

dc_shell-xg-t> lappend search_path ../src/gray_counter
dc_shell-xg-t> define_design_lib WORK -path "work"
dc_shell-xg-t> set link_library [ list \
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db \
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_min.db \
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_typ.db ]
dc_shell-xg-t> set target_library [ list \
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db ]

The "lappend search_path" command adds the Verilog source code directory to the tool's
search path, so that the tool looks for the Verilog code in that directory.

The next command, "define_design_lib", creates a Synopsys work directory, and the
last two commands, "set link_library" and "set target_library", point to the standard
technology libraries we will be using. The DB files contain wire-load models [wire-load
modeling allows the tool to estimate the effect of wire length and fanout on the resistance,
capacitance, and area of nets, and to calculate wire delays and circuit speeds], area and
timing information for each standard cell. DC uses this information to optimize the
synthesis process. For more detailed information on optimization, please refer to the DC manual.

7. The next step is to load your Verilog/VHDL design into Design Compiler. The
commands to load verilog are “analyze” and “elaborate”. Executing these commands
results in a great deal of log output as the tool elaborates some Verilog constructs and
starts to infer some high-level components. Try executing the commands as follows.

dc_shell-xg-t> analyze -library WORK -format verilog gray_counter.v

dc_shell-xg-t> elaborate -architecture verilog -library WORK graycount

Notice that graycount is the name of the top module to be synthesized and not the
name of the verilog file (gray_counter.v). You can see part of the analyze command output
in Figure 4.a below.



Figure 4.a : Fragment of analyze command

Figure 4.b below shows part of the elaboration output for the above gray code counter;
the tool has inferred a 2-bit-wide flip-flop. Please make sure that you check the log at
this stage to see whether any latches were inferred. We typically do not want
latches inferred in the design.

Before DC optimizes the design, it uses the Presto Verilog Compiler [for verilog code] to
read in the design; it also checks the code for correct syntax and builds a generic
technology (GTECH) netlist. DC uses this GTECH netlist to optimize the design. You
could also use the "read_verilog" command, which basically combines the analyze and
elaborate commands into one. You can use "read_verilog" as long as you do not need to
override the parameters of your design. For example, look at the parameterized register below.

module dflipflop( inp, clk, outp );

  parameter SIZE = 8;
  input  [SIZE-1:0] inp;
  input  clk;
  output [SIZE-1:0] outp;
  reg    [SIZE-1:0] outp;

  // Register the input on the rising edge of clk (edge-triggered, so no latches are inferred)
  always @(posedge clk)
    outp <= inp;

endmodule

If you want an instance of the above register to have a bit-width of 32, use the elaborate
command to specify this as follows:

elaborate dflipflop -param SIZE=32



For more information on the “elaborate” command, and how the synthesis tool infers
combinational and sequential elements, please refer to Presto HDL Compiler Reference
Manual found in the documentation area.

Figure 4.b Fragment of elaborate command

8. Next, we check to see if the design is in a good state or consistent state; meaning that
there are no errors such as unconnected ports, logical constant-valued ports, cells with no
input or output pins, mismatches between a cell and its reference, multiple driver nets etc.

dc_shell-xg-t> check_design

Please go through the check_design errors and warnings. DC cannot compile the design
if there are any errors. Many of the warnings may not be an issue, but it is still useful to
skim through this output.

9. After check_design is clean, we need to tell the tool the constraints before it
actually synthesizes the design. The tool needs to know the target frequency you want
to synthesize for. Take a look at the "create_clock" command below.

dc_shell-xg-t> create_clock clk -name ideal_clock1 -period 5



The above command tells the tool that the port named clk is the clock and that your
desired clock period is 5 nanoseconds. We need to set the clock period constraint
carefully. If the period is unrealistically small, then the tools will spend forever trying to
meet timing and ultimately fail. If the period is too large, then the tools will have no
trouble, but you will get a very conservative implementation.

You could also add additional constraints, such as constraining the arrival of certain input
signals, the drive strength of the input signals, the capacitive load on the output signals, etc.
Below are some examples. These constraints are defined by you, the user; hence we can
call them user-specified constraints.
Set input constraints by defining how much of the clock period is spent by incoming
signals outside your design, before they arrive at your design's inputs:

dc_shell-xg-t> set_input_delay 2.0 [remove_from_collection [all_inputs] clk] -clock ideal_clock1

Similarly, you can define output constraints, which define how much time is spent
by signals leaving the design, outside the design, before being captured by the same clock:

dc_shell-xg-t> set_output_delay 2.0 [all_outputs] -clock ideal_clock1

Set area constraints: setting the maximum allowed area to 0 simply instructs Design
Compiler to use as little area as possible.

dc_shell-xg-t> set_max_area 0

Please refer to the tutorial on "Basics of Static Timing Analysis" for a better understanding
of STA concepts. For more information on the commands used in STA, please refer to
the PrimeTime Manual and the DC Manual at /packages/synopsys/

10. Now we are ready to use the compile command to actually synthesize our design into
a gate-level netlist. Two of the most important options for the compile command are the
map effort and the area effort. Both of these can be set to one of none, low, medium, or
high. They specify how much time to spend on technology mapping and area reduction.

dc_shell-xg-t> compile -map_effort medium -area_effort medium

DC will attempt to synthesize your design while still meeting the constraints. DC
considers two types of constraints: user specified constraints and design rule constraints.
We looked at the user specified constraints in the previous step. Design rule constraints
are fixed constraints which are specified by the standard cell library. For example, there
are restrictions on the loads specific gates can drive and on the transition times of certain
pins. To get a better understanding of the standard cell library, please refer to Generic
90nm library documents in the below location which we are using in the tutorial.

/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/doc/databook/

Also note that, by default, the compile command preserves the design hierarchy and does
not optimize across module boundaries. To enable inter-module optimization, you have to
remove the hierarchy, for example with the "ungroup" command or with "compile -ungroup_all".
For more information on the compile command, consult the Design Compiler User Guide
(dc-user-guide.pdf) or use man compile at the DC shell prompt.

The compile command will report how the design is being optimized. You should see DC
performing technology mapping, delay optimization, and area reduction. Figure 4.c
shows a fragment from the compile output. Each line is an optimization pass. The area
column is in units specific to the standard cell library, but for now you should just use the
area numbers as a relative metric. The worst negative slack column shows how much
room there is between the critical path in your design and the clock constraint. Larger
negative slack values are worse since this means that your design is missing the desired
clock frequency by a greater amount. Total negative slack is the sum of all negative slack
across all endpoints in the design - if this is a large negative number it indicates that not
only is the design not making timing, but it is possible that many paths are too slow. If
the total negative slack is a small negative number, then this indicates that only a few
paths are too slow. The design rule cost is an indication of how many cells violate one of
the standard cell library design rules constraints.
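The worst negative slack and total negative slack columns described above can be computed from per-endpoint slacks as in the Python sketch below. The slack values are hypothetical, and reporting WNS as 0.0 when every endpoint meets timing is an assumption that mirrors how a met design is usually shown.

```python
def slack_summary(endpoint_slacks):
    """Return (worst negative slack, total negative slack, number of violating endpoints).

    endpoint_slacks: one slack value (ns) per timing endpoint.
    WNS is reported as 0.0 when every endpoint meets timing.
    """
    negative = [s for s in endpoint_slacks if s < 0]
    wns = min(negative) if negative else 0.0
    tns = sum(negative)
    return wns, tns, len(negative)


# Hypothetical endpoint slacks: one endpoint meets timing, two violate it.
wns, tns, violating = slack_summary([0.5, -0.2, -1.3])
```

A TNS close to the WNS means only a few paths fail, whereas a TNS much larger in magnitude than the WNS means many endpoints are failing.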

You can use the compile command more than once, for as many iterations as you want. For
example, the first iteration might optimize only timing, possibly at a high area cost, and
the second iteration might optimize area but could cause the design to no longer meet
timing. There is no limit on the number of iterations; however, each design is different,
and you need to experiment to decide how many iterations it needs.

We can now use various commands to examine timing paths, display reports, and further
optimize the design. Using the shell directly is useful for finding out more information
about a specific command or playing with various options.

Figure 4.c: Fragment of Compile command



4.2.2 Interpreting the Synthesized Gate-Level Netlist and Text Reports

In addition to the actual synthesized gate-level netlist, dc_synth.tcl also generates
several text reports. Reports usually have the .rpt filename suffix. The following is a list
of the synthesis reports.
The synth_area.rpt report contains area information for each module in the design. Figure 4.d
shows a fragment from synth_area.rpt. We can use the synth_area.rpt report to gain
insight into how various modules are being implemented. We can also use the area report
to measure the relative area of the various modules.

You can find all these reports at the location below for your reference.
/packages/synopsys/setup/project_dc/synth/reports/

You can also look at command.log in the synth directory, which lists all the
commands used in the current session.

synth_area.rpt - Contains area information for each module instance



Figure 4.d : Fragment of area report

Library(s) Used:

saed90nm_typ (File:
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm
/Digital_Standard_Cell_Library/synopsys/models/saed90nm_typ.db)

Number of ports: 5
Number of nets: 9
Number of cells: 5
Number of references: 3

Combinational area: 29.492001


Noncombinational area: 64.512001
Net Interconnect area: undefined (No wire load specified)

Total cell area: 94.003998


Total area: undefined
1

synth_cells.rpt - Contains the list of cells in the design, as you can see in Figure
4.e. From this report, you can see the breakup of each cell area in the design.

Figure 4.e: Fragment of cell area report


Attributes:
    b - black box (unknown)
    h - hierarchical
    n - noncombinational
    r - removable
    u - contains unmapped logic

Cell              Reference   Library        Area        Attributes
--------------------------------------------------------------------
U2                AO22X1      saed90nm_typ   11.981000
U3                AO22X1      saed90nm_typ   11.981000
U4                INVX0       saed90nm_typ   5.530000
gcc_out_reg[0]    DFFARX1     saed90nm_typ   32.256001   n
gcc_out_reg[1]    DFFARX1     saed90nm_typ   32.256001   n
--------------------------------------------------------------------
Total 5 cells                                94.003998
1
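As a sanity check, the per-cell areas in Figure 4.e can be summed to reproduce the combinational, noncombinational, and total areas of the area report in Figure 4.d. The sketch below treats cells with the "n" attribute as noncombinational; small differences in the last digits come from rounding in the printed report.

```python
# (cell, reference, area, is_noncombinational) taken from the cell report above.
CELLS = [
    ("U2", "AO22X1", 11.981000, False),
    ("U3", "AO22X1", 11.981000, False),
    ("U4", "INVX0", 5.530000, False),
    ("gcc_out_reg[0]", "DFFARX1", 32.256001, True),
    ("gcc_out_reg[1]", "DFFARX1", 32.256001, True),
]


def area_breakdown(cells):
    """Return (combinational area, noncombinational area, total cell area)."""
    comb = sum(area for _, _, area, seq in cells if not seq)
    noncomb = sum(area for _, _, area, seq in cells if seq)
    return comb, noncomb, comb + noncomb


comb, noncomb, total = area_breakdown(CELLS)
```

The two DFFARX1 flip-flops account for roughly two thirds of the total area, which is typical for a small counter whose next-state logic is only a few gates.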

synth_qor.rpt - Contains summary information on the area, timing, critical paths, and
violations in the design. You can take a look at this report to understand the overall
quality of your design. Figure 4.f shows an example. As you can see in Figure 4.f, there
is no negative slack in the design, which means the design is meeting timing.



Figure 4.f : Fragment of qor report

Timing Path Group 'ideal_clock1'


-----------------------------------
Levels of Logic: 2.00
Critical Path Length: 0.12
Critical Path Slack: 2.77
Critical Path Clk Period: 5.00
Total Negative Slack: 0.00
No. of Violating Paths: 0.00
-----------------------------------

Cell Count
-----------------------------------
Hierarchial Cell Count: 0
Hierarchial Port Count: 0
Leaf Cell Count: 5
-----------------------------------

Area
-----------------------------------
Combinational Area: 29.492001
Noncombinational Area: 64.512001
Net Area: 0.000000
-----------------------------------
Cell Area: 94.003998
Design Area: 94.003998

Design Rules
-----------------------------------
Total Number of Nets: 9
Nets With Violations: 0
-----------------------------------

Hostname: hafez.sfsu.edu

Compile CPU Statistics


-----------------------------------
Resource Sharing: 0.00
Logic Optimization: 0.31
Mapping Optimization: 0.33
-----------------------------------
Overall Compile Time: 3.64

1
synth_timing.rpt contains the critical timing paths.

Below is an example of a timing report dumped out from synthesis. As the last line of Figure 4.g shows, this path meets timing. The report lists the critical path of the design. The critical path is the slowest logic path between any two registers, and is therefore the limiting factor preventing you from decreasing the clock period constraint (and thus increasing performance). The report is generated from a purely static worst-case timing analysis (i.e. independent of the actual signals that are active when the circuit is running). In the example below, since the design is a simple gray counter, the critical path is from an input port to a register.
Please note that the last column lists the cumulative delay to that node, while the middle column shows the incremental delay. You can see that the datapath goes through an inverter (U4) and a complex gate (U2) before it reaches the register, a combined delay of 0.12 ns. In our SDC constraints, we set a 2 ns delay on the input port, so the total delay so far is 2.12 ns. Notice, however, that the capturing flip-flop has a setup time of 0.11 ns and the clock period is 5 ns. Therefore 5 ns - 0.11 ns = 4.89 ns is the time by which the data must arrive for the register to latch it. The critical path delay is only 2.12 ns, so there is more than enough time for the path to meet timing.

Figure 4.g: Fragment of Timing report

Operating Conditions: TYPICAL Library: saed90nm_typ


Wire Load Model Mode: top

Startpoint: en_count (input port)


Endpoint: gcc_out_reg[1]
(rising edge-triggered flip-flop clocked by ideal_clock1)
Path Group: ideal_clock1
Path Type: max

Point Incr Path


-----------------------------------------------------------
clock (input port clock) (rise edge) 0.00 0.00
input external delay 2.00 2.00 f
en_count (in) 0.00 2.00 f
U4/QN (INVX0) 0.02 2.02 r
U2/Q (AO22X1) 0.10 2.12 r
gcc_out_reg[1]/D (DFFARX1) 0.00 2.12 r
data arrival time 2.12

clock ideal_clock1 (rise edge) 5.00 5.00


clock network delay (ideal) 0.00 5.00
gcc_out_reg[1]/CLK (DFFARX1) 0.00 5.00 r
library setup time -0.11 4.89
data required time 4.89
-----------------------------------------------------------
data required time 4.89
data arrival time -2.12
-----------------------------------------------------------
slack (MET) 2.77
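The slack arithmetic described above can be replayed in a few lines. This is a minimal Python sketch with the values copied from Figure 4.g; the variable names are illustrative, and it assumes an ideal clock (zero clock network delay), as the report states.

```python
# Values copied from the timing report above (all in ns).
input_external_delay = 2.00          # set_input_delay 2.0 on en_count
cell_delays = {"U4/QN (INVX0)": 0.02, "U2/Q (AO22X1)": 0.10}
clock_period = 5.00                  # create_clock ... -period 5
clock_network_delay = 0.00           # ideal clock
library_setup_time = 0.11            # setup requirement of DFFARX1

arrival = input_external_delay + sum(cell_delays.values())
required = clock_period + clock_network_delay - library_setup_time
slack = required - arrival

print(f"data arrival time  {arrival:.2f}")
print(f"data required time {required:.2f}")
print(f"slack ({'MET' if slack >= 0 else 'VIOLATED'}) {slack:.2f}")
```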

synth_resources.rpt contains information on DesignWare components.

In the above example, the report is essentially empty, since the gray counter did not need any DesignWare components.

****************************************
Report : resources
Design : graycount
Version: X-2005.09-SP3
Date : Mon Mar 9 21:30:37 2009
****************************************

No resource sharing information to report.

No implementations to report

No multiplexors to report

synth_check_design.rpt contains the output of the check_design command; it is clean in the above example.

Below is the gate-level netlist output of the gray counter RTL code after synthesis.

Figure 4.h : Synthesized gate-level netlist

module graycount ( gcc_out, reset_n, clk, en_count );


output [1:0] gcc_out;
input reset_n, clk, en_count;
wire N8, n1, n4, n5, n6;
assign gcc_out[0] = N8;

AO22X1 U2
( .IN1(gcc_out[1]), .IN2(n1), .IN3(en_count), .IN4(N8), .Q(n4) );
AO22X1 U3 ( .IN1(en_count), .IN2(n6), .IN3(N8), .IN4(n1), .Q(n5) );
INVX0 U4 ( .IN(en_count), .QN(n1) );
DFFARX1 \gcc_out_reg[0]
( .D(n5), .CLK(clk), .RSTB(reset_n), .Q(N8) );
DFFARX1 \gcc_out_reg[1]
( .D(n4), .CLK(clk), .RSTB(reset_n), .Q(gcc_out[1]),
.QN(n6) );
endmodule
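To see that this netlist really implements a 2-bit Gray counter, its gates can be simulated cycle by cycle. The Python sketch below is a hand translation of Figure 4.h and rests on assumptions about the SAED 90nm cells that this tutorial does not spell out: AO22X1 is taken to compute Q = (IN1 AND IN2) OR (IN3 AND IN4), INVX0 is an inverter, and DFFARX1 is a rising-edge D flip-flop with Q/QN outputs and an active-low asynchronous reset (RSTB). The helper names are hypothetical.

```python
def ao22(in1, in2, in3, in4):
    # Assumed AO22X1 function: Q = (IN1 AND IN2) OR (IN3 AND IN4)
    return (in1 & in2) | (in3 & in4)

def inv(a):
    # Assumed INVX0 function on a 1-bit value: QN = NOT IN
    return a ^ 1

def next_state(q1, q0, en_count):
    """Combinational part of the netlist; returns the D inputs (n4, n5)."""
    n1 = inv(en_count)                  # U4
    n6 = inv(q1)                        # QN output of gcc_out_reg[1]
    n4 = ao22(q1, n1, en_count, q0)     # U2 -> D of gcc_out_reg[1]
    n5 = ao22(en_count, n6, q0, n1)     # U3 -> D of gcc_out_reg[0]
    return n4, n5

def simulate(cycles, en_count=1):
    """Start from the reset state (reset_n low forces both flops to 0), then clock."""
    q1, q0 = 0, 0
    states = [(q1, q0)]
    for _ in range(cycles):
        q1, q0 = next_state(q1, q0, en_count)   # one rising edge of clk
        states.append((q1, q0))
    return states

# gcc_out[1]gcc_out[0] after reset and each of four clock edges
print(["%d%d" % s for s in simulate(4)])   # ['00', '01', '11', '10', '00']
```

With en_count held at 1, exactly one output bit changes per clock edge; with en_count at 0, the AO22 gates simply feed each flip-flop's own output back to its D input, so the count holds.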

4.2.3 Synthesis Script

###### Synthesis Script #######

## Give the path to the verilog files and define the WORK directory

lappend search_path ../src/gray_counter

define_design_lib WORK -path "work"

## Define the library location
set link_library [list \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_typ.db \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_min.db]

set target_library [list \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db]

## read the verilog files


analyze -library WORK -format verilog gray_counter.v

elaborate -architecture verilog -library WORK graycount

## Check if design is consistent


check_design > reports/synth_check_design.rpt

## Create Constraints
create_clock clk -name ideal_clock1 -period 5
set_input_delay 2.0 [remove_from_collection [all_inputs] clk] -clock ideal_clock1
set_output_delay 2.0 [all_outputs] -clock ideal_clock1
set_max_area 0

## Compilation
## you can change medium to either low or high
compile -area_effort medium -map_effort medium

## The commands below report the area, cell, qor, resources, and timing
## information needed to analyze the design.

report_area > reports/synth_area.rpt


report_cell > reports/synth_cells.rpt
report_qor > reports/synth_qor.rpt
report_resources > reports/synth_resources.rpt
report_timing -max_paths 10 > reports/synth_timing.rpt

## Dump out the constraints in an SDC file

write_sdc const/gray_counter.sdc

## Dump out the synthesized database and gate-level netlist


write -f ddc -hierarchy -output output/gray_counter.ddc
write -hierarchy -format verilog -output output/gray_counter.v

## You can play with the commands or exit

exit

Note: Another synthesis example, a FIFO, is available at the location below for further reference. This synthesized FIFO example is used in the IC Compiler physical design tutorial.
Location:
/packages/synopsys/setup/asic_flow_setup/synth_fifo

APPENDIX 4A: SYNTHESIS OPTIMIZATION TECHNIQUES

4.A.0 Introduction

A fully optimized design is one that meets the timing requirements and occupies the smallest area. Optimization can be done in two stages: at the code level and during synthesis. Optimization at the code level involves modifying RTL code that has already been simulated and tested for functionality. This level of modification to the RTL code is generally avoided, as it sometimes leads to inconsistencies between simulation results before and after the modifications. However, there are certain standard model optimization techniques that might lead to a better synthesized design.

4.A.1 Model Optimization

Model optimizations are important to a certain level, because the logic generated by the synthesis tool is sensitive to the RTL code provided as input. Different RTL codes generate different logic. Minor changes in the model might increase or decrease the number of synthesized gates and also change the timing characteristics. A logic optimizer reaches different endpoints for best area and best speed depending on the starting point provided by a netlist synthesized from the RTL code. Different starting points are obtained by rewriting the same HDL model using different constructs. Some optimizations that can be used to modify the model to obtain a better-quality design are listed below.

4.A.1.1 Resource Allocation


This method refers to the process of sharing a hardware resource under mutually
exclusive conditions. Consider the following if statement.

if A = '1' then
  E <= B + C;
else
  E <= B + D;
end if;

Written this way, the code would generate two ALUs, one for the addition B + C and the other for the addition B + D. Because these additions are executed under mutually exclusive conditions, a single ALU can be shared for both additions. The hardware synthesized for the above code is shown in Figure 4A.a.

Figure 4A.a Without resource allocation.


The above code is rewritten below with only one addition operator. The synthesized hardware is shown in Figure 4A.b.

if A = '1' then
  temp := C;  -- A temporary variable is introduced.
else
  temp := D;
end if;
E <= B + temp;

Figure 4A.b. With resource allocation.

It is clear from the figure that one ALU has been removed, with the remaining ALU shared by both addition operations. However, a multiplexer is introduced at the inputs of the ALU, which contributes to the path delay. Earlier, the timing path of the select signal went through the multiplexer alone, but after resource sharing it goes through the multiplexer and the ALU datapath, increasing its path delay. On the other hand, resource sharing has decreased the area of the design. This is therefore a trade-off that the designer may have to make: if the design is timing-critical, it is better not to perform resource sharing.
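The two versions differ only in hardware cost and path structure, not in function. A minimal Python sketch (Python stands in for the HDL; the function names are hypothetical) that models where the multiplexer sits in each form and checks exhaustively, over small operand values, that both produce the same E:

```python
import itertools

def unshared(a, b, c, d):
    """Two adders computed in parallel; A only selects which sum drives E."""
    sum_bc = b + c              # adder 1
    sum_bd = b + d              # adder 2
    return sum_bc if a == 1 else sum_bd

def shared(a, b, c, d):
    """A multiplexer picks the operand first; one adder is then shared."""
    temp = c if a == 1 else d   # multiplexer in front of the adder
    return b + temp             # the single shared adder

# Exhaustively compare both forms on 3-bit operand values.
for a, b, c, d in itertools.product((0, 1), range(8), range(8), range(8)):
    assert unshared(a, b, c, d) == shared(a, b, c, d)
print("both forms compute the same E for every tested input")
```

In the shared form the select signal A reaches E only through the adder, which is the longer select path described above.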

Common sub-expressions and Common factoring


It is often useful to identify common subexpressions and to reuse the computed values
wherever possible. A simple example is given below.

B := R1 + R2;
...
C <= R3 - (R1 + R2);

Here the subexpression R1 + R2 in the signal assignment for C can be replaced by B, as given below. This might generate only one adder for the computation instead of two.
C <= R3 - B;

Common factoring is the extraction of common subexpressions in mutually exclusive branches of an if or case statement.

if (test) then
  A <= B & (C + D);
else
  J <= (C + D) | T;
end if;

In the above code, the common factor C + D can be placed outside the if statement, which might result in the tool generating only one adder instead of two.

temp := C + D;  -- A temporary variable is introduced.
if (test) then
  A <= B & temp;
else
  J <= temp | T;
end if;

Such minor changes, if made by the designer, can help the tool synthesize better logic and let it concentrate on optimizing more critical areas.

Moving Code
In certain cases, an expression whose value does not change from one iteration to the next is placed inside a for/while loop. A synthesis tool typically handles a for/while loop by unrolling it the specified number of times, so redundant logic may be generated for that expression on every iteration. This can be avoided by moving the expression outside the loop. Such optimizations, performed at a higher level (that is, within the model), help the optimizer concentrate on more critical pieces of the code. An example is given below.

C := A + B;
...
for i in 0 to 5 loop
  ...
  T := C - 6;
  -- Assumption: C is not assigned a new value within the loop, so the
  -- expression above remains constant on every iteration of the loop.
  ...
end loop;

The above code would generate six subtracters for the expression when only one is
necessary. Thus by modifying the code as given below we could avoid the generation of
unnecessary logic.

C := A + B;
...
temp := C - 6;  -- A temporary variable is introduced.
for i in 0 to 5 loop
  ...
  T := temp;
  -- Assumption: C is not assigned a new value within the loop, so the
  -- expression above remains constant on every iteration of the loop.
  ...
end loop;
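The effect of hoisting the loop-invariant expression can be mimicked by counting how many copies of the subtraction an unrolled loop would contain. This is a minimal Python sketch; the subtracter counter is a rough stand-in for synthesized hardware, and the function names are hypothetical.

```python
def loop_with_invariant(a, b, iterations=6):
    """T := C - 6 evaluated inside the loop; after unrolling, one subtracter per iteration."""
    c = a + b
    subtracters = 0
    t = None
    for _ in range(iterations):      # like "for i in 0 to 5 loop"
        t = c - 6
        subtracters += 1             # each unrolled copy needs its own subtracter
    return t, subtracters

def loop_hoisted(a, b, iterations=6):
    """The loop-invariant C - 6 is computed once, before the loop."""
    c = a + b
    temp = c - 6                     # the only subtracter
    t = None
    for _ in range(iterations):
        t = temp
    return t, 1

print(loop_with_invariant(10, 5))    # (9, 6)
print(loop_hoisted(10, 5))           # (9, 1)
```

Both versions give the same T; only the number of subtracters differs.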

Constant folding and Dead code elimination


There are cases where the designer leaves expressions whose value is constant. Rather than implementing logic for them and relying on the logic optimizer to eliminate it, the designer can compute such expressions in advance.

Ex:
C := 4;
...
Y := 2 * C;

Computing the value of Y as 8 and assigning it directly in the code avoids the unnecessary logic above. This method is called constant folding. The other optimization, dead code elimination, refers to removing sections of code that are never executed.

Ex.
A := 2;
B := 4;
if (A > B) then
  ...
end if;

The body of the if statement above would never be executed, so it should be eliminated from the code. The logic optimizer performs these optimizations by itself; nevertheless, if the designer optimizes the code accordingly, the tool's optimization time is reduced, resulting in faster run times.

4.A.1.2 Flip-flop and Latch optimizations


Earlier, in the RTL code section, it was described how the synthesis tool infers flip-flops and latches from the code. However, these elements are needed only in certain cases, so the designer should try to eliminate all unnecessary flip-flops and latches in the design. Unnecessary flip-flops can be eliminated by placing only the signals that must be clocked under the edge-sensitive statement. Similarly, unwanted latches can be avoided by specifying values for the signals under all conditions of an if/case statement.

4.A.1.3 Using Parentheses

The use of parentheses is important to the design, as correct usage might result in better timing paths.

Ex.
Result <= R1 + R2 - P + M;
The hardware generated for the above code is shown in Figure 4 (a).

If the expression is written using parentheses, as given below, the synthesized hardware is as shown in Figure 4 (b).
Result <= (R1 + R2) - (P - M);

With the parentheses, R1 + R2 and P - M can be computed in parallel, so the timing path of the datapath is reduced: it no longer has to go through one more ALU, as in the earlier case.
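The depth argument can be made concrete by writing both expressions as trees and counting adder/subtracter stages on the longest path. A minimal Python sketch (the tuple encoding and helper names are illustrative) that also checks the two forms are algebraically equal:

```python
import itertools

OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y}

def evaluate(expr, env):
    """Evaluate a (op, lhs, rhs) tuple tree; leaves are variable names."""
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    return OPS[op](evaluate(lhs, env), evaluate(rhs, env))

def depth(expr):
    """Number of adder/subtracter stages on the longest input-to-output path."""
    if isinstance(expr, str):
        return 0
    _, lhs, rhs = expr
    return 1 + max(depth(lhs), depth(rhs))

# Result <= R1 + R2 - P + M;  parsed left to right: ((R1 + R2) - P) + M
chain = ("+", ("-", ("+", "R1", "R2"), "P"), "M")
# Result <= (R1 + R2) - (P - M);
tree = ("-", ("+", "R1", "R2"), ("-", "P", "M"))

print("chain depth:", depth(chain))   # 3
print("tree depth: ", depth(tree))    # 2

# Check that both forms compute the same value on a small grid of inputs.
names = ["R1", "R2", "P", "M"]
for values in itertools.product(range(-2, 3), repeat=4):
    env = dict(zip(names, values))
    assert evaluate(chain, env) == evaluate(tree, env)
```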

4.A.1.4 Partitioning and structuring the design.


A design should always be structured and partitioned, as this reduces design complexity and also improves synthesis run times, since smaller sub-blocks synthesize faster. Good partitioning results in the synthesis of a good-quality design. General recommendations for partitioning are given below.



• Keep related combinational logic in the same module.
• Partition for design reuse.
• Separate modules according to their functionality.
• Separate structural logic from random logic.
• Keep block size reasonable (perhaps a maximum of 10K gates per block).
• Partition the top level.
• Do not add glue logic at the top level.
• Isolate state machines from other logic.
• Avoid multiple clocks within a block.
• Isolate the block that is used for synchronizing the multiple clocks.

4.A.2 Optimization using Design Compiler

Optimizing a design for minimum area and maximum speed requires a lot of experimentation and iterative synthesis. The process of analyzing the design for speed and area to achieve the fastest logic with minimum area is termed design space exploration.

Changing the HDL code for the sake of optimization may affect other blocks in the design or the test benches. For this reason, changing the HDL code to help synthesis is less desirable and is generally avoided; instead, the designer is responsible for minimizing area and meeting timing requirements through synthesis and optimization. Starting with DC98, the DC compile flow differs from that of previous versions. In DC98 and later versions, timing is prioritized over area. Another difference is that DC98 compiles to reduce "total negative slack" instead of "worst negative slack". This produces better timing results but has some impact on area. DC98 also requires designers to specify area constraints explicitly, whereas previous versions handled area minimization automatically. Some area cleanup is generally performed, but better results are obtained when area constraints are specified.

DC has three different compilation strategies. It is up to the user to choose the most suitable one for a design.
a) Top-down hierarchical compile method.
b) Time-budget compile method.
c) Compile-characterize-write-script-recompile (CCWSR) method.

4.A.2.1 Top-down hierarchical Compile


Prior to the release of DC98, this method was used only for small designs, because it was extremely memory-intensive and took a long time on large designs. In this method, the entire design is read in and compiled, with constraints and attributes applied only at the top level. DC98 gave Synopsys the capability to synthesize million-gate designs by tackling much larger blocks (>100K) at a time. This approach is feasible for some designs, depending on the design style (single clock, etc.) and other factors. One may use this technique to synthesize larger blocks at a time by grouping sub-blocks together and flattening them to improve timing.

Advantages
• Only top-level constraints are needed.
• Better results due to optimization across the entire design.

Disadvantages
• Long compile time.
• Incremental changes to the sub-blocks require complete re-synthesis.
• Does not perform well if the design contains multiple clocks or generated clocks.

Time-budgeting compile.
This process is best for properly partitioned designs with timing specifications defined for each sub-block. Because timing requirements are specified for each block, multiple synthesis scripts, one per block, are produced. Synthesis is usually performed bottom-up, i.e., starting at the lowest level and going up to the topmost level. This method is useful for medium to very large designs and does not require large amounts of memory.

Advantages
• The design is easier to manage due to individual scripts.
• Incremental changes to sub-blocks do not require complete re-synthesis.
• Can be used for any style of design, e.g. multiple and generated clocks.

Disadvantages
• Difficult to keep track of multiple scripts.
• Critical paths seen at the top level may not be critical at a lower level.
• Incremental compilations may be needed for fixing DRCs.

Compile-Characterize-Write-Script-Recompile
This is an advanced synthesis approach, useful for medium to very large designs that do not have good inter-block specifications defined. It requires constraints to be applied at the top level of the design, with each sub-block compiled beforehand. The sub-blocks are then characterized using the top-level constraints, which in effect propagates the required timing information from the top level to the sub-blocks. Performing a write_script on the characterized sub-blocks generates a constraint file for each sub-block.

The constraint files are then used to re-compile each block of the design.
Advantages
• Less memory-intensive.
• Good quality of results because of optimization between sub-blocks of the design.
• Produces individual scripts, which may be modified by the user.
Disadvantages
• The generated scripts are not easily readable.
• It is difficult to achieve convergence between blocks.
• Lower-block changes might need complete re-synthesis of the entire design.

Resolving Multiple instances


Before proceeding with optimization, one needs to resolve multiple instances of the sub-blocks of the design. This is a necessary step, as DC does not permit compilation until multiple instances are resolved.

Ex: Let's say moduleA has been synthesized. Now moduleB, which instantiates moduleA twice as U1 and U2, is being compiled. The compilation stops with an error message stating that moduleA is instantiated 2 times in moduleB. There are two methods of resolving this problem.
You can set a dont_touch attribute on moduleA before synthesizing moduleB, or uniquify moduleB. uniquify is a dc_shell command that creates unique definitions of multiple instances. For the above case, it generates moduleA_u1 and moduleA_u2 (in VHDL), corresponding to instances U1 and U2 respectively.

4.A.2.2 Optimization Techniques


Various optimization techniques that help in achieving better area and speed for
your design are given below.

Compile the design


The compilation process maps the HDL code to actual gates from the target library. This is done with the compile command, whose syntax is given below:

compile -map_effort <low | medium | high>
-incremental_mapping
-in_place
-no_design_rule | -only_design_rule
-scan
The compile command uses the -map_effort medium option by default, which usually produces the best results for most designs. It also uses the default settings for the structuring and flattening attributes. The -map_effort high option should only be used if the target objectives are not met by the default compile. The -incremental_mapping option is used only after an initial compile, as it works only at the gate level; it is used to improve the timing of the logic.

Flattening and structuring


Flattening reduces the logic of a design to a two-level AND/OR representation. This approach optimizes the design by removing all intermediate variables and parentheses. This option is set to "false" by default. The optimization is performed in two stages: the first stage involves flattening and structuring, and the second stage maps the resulting design to actual gates using mapping optimization techniques.

Flattening
Flattening reduces the design logic into a two-level, sum-of-products form, with few logic levels between the input and output. This results in faster logic. It is recommended for unstructured designs with random logic. The flattened design can then be structured before final mapping optimization to reduce area; this is important because flattening has a significant impact on the area of the design. In general, one should compile the design using the default settings (flattening off, structuring on). If timing objectives are not met, flattening and structuring should be employed. If the design still fails its goals, then flatten the design without structuring it. The command for flattening is given below:

set_flatten <true | false>
-design <list of designs>
-effort <low | medium | high>
-phase <true | false>
The -phase option, if set to true, enables DC to compare the logic produced by inverting the equation against the non-inverted form of the equation.

Structuring
The default setting for structuring is "true". This method adds intermediate variables that can be factored out, which enables sharing of logic and in turn reduces area.
For ex.

Before structuring        After structuring
P = ax + ay + c           P = aI + c
Q = x + y + z             Q = I + z
                          I = x + y
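Structuring must not change the logic function, only its form. A minimal Python sketch that checks the before/after equations of this example against each other over all 32 input combinations (the function names are illustrative):

```python
from itertools import product

def before(a, c, x, y, z):
    p = (a & x) | (a & y) | c
    q = x | y | z
    return p, q

def after(a, c, x, y, z):
    i = x | y            # shared intermediate variable I
    p = (a & i) | c
    q = i | z
    return p, q

mismatches = [bits for bits in product((0, 1), repeat=5)
              if before(*bits) != after(*bits)]
print("mismatches:", mismatches)   # mismatches: []
```

The check passes because a(x + y) = ax + ay. The saving comes from computing I = x + y once and feeding it to both P and Q.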

The shared logic might affect the total delay of the logic, so one should be careful to specify realistic timing constraints in addition to using the default settings. Structuring can be set for timing (the default) or for Boolean optimization. The latter helps reduce area but has a greater impact on timing, so timing-sensitive circuits should not be structured for Boolean optimization. Good candidates for Boolean optimization are random logic structures and finite state machines. The command for structuring is given below.

set_structure <true | false>


-design <list of designs>
-boolean <low | medium | high>
-timing <true | false>

If the design is not timing-critical and you want to minimize area only, set the area constraint (set_max_area 0) and perform Boolean optimization. In all other cases, structure with respect to timing only.

Removing hierarchy
DC by default maintains the original hierarchy given in the RTL code. Each hierarchical boundary is a logic boundary that prevents DC from optimizing across it. Unnecessary hierarchy leads to cumbersome designs and synthesis scripts, and it confines DC's optimization to within each boundary. To allow DC to optimize across hierarchy, one can use the following commands.

dc_shell> current_design <design name>
dc_shell> ungroup -flatten -all

This allows DC to optimize logic that was separated by hierarchical boundaries as a single unit, which can result in better timing.

Optimizing for Area

DC by default tries to optimize for timing. Designs that are not timing-critical but are area-intensive can be optimized for area. This can be done by initially compiling the design with area requirements specified but no timing constraints. In addition, the large, high-drive-strength gates that DC uses by default to improve timing can be excluded from mapping (for example, by marking those library cells with set_dont_use), which can reduce the area considerably. Once the design is mapped to gates, the timing and area constraints should be specified again (normal synthesis) and the design recompiled incrementally. The incremental compile ensures that DC maintains the previous structure and does not bloat the logic unnecessarily. The following points can be kept in mind for further area optimization:

• Bind all combinational logic together as much as possible. If combinational logic is spread over different blocks of the design, the optimization of that logic is limited, resulting in larger area. Better partitioning, with combinational logic not spread out among different blocks, results in better area.
• At the top level, avoid any kind of glue logic. It is better to incorporate glue logic into one of the sub-components, letting the tool optimize the logic better.

4.A.3 Timing issues

There are two kinds of timing issues that are important in a design: setup and hold timing violations.
Setup Time: the time before the clock edge during which the data must be valid, i.e. stable and unchanging. Any change during this period triggers a setup timing violation. Figure 4A.b illustrates an example with a setup time of 2 ns. This means that signal DATA must be valid 2 ns before the clock edge, i.e. it must not change during the 2 ns period before the clock edge.
Hold Time: the time after the clock edge during which the data must be held valid, i.e. it must not change but remain stable. Any change during this period triggers a hold timing violation. Figure 4A.b illustrates an example with a hold time of 1 ns. This means that signal DATA must be held valid 1 ns after the clock edge, i.e. it must not change during the 1 ns period after the clock edge.

Figure 4A.b Timing diagram for setup and hold On DATA

The synthesis tool automatically runs its internal static timing analysis engine to check for setup and hold time violations on the paths that have timing constraints set on them. It mostly uses the following two equations to check for violations:

Tprop + Tdelay < Tclock - Tsetup    (1)

Tdelay + Tprop > Thold    (2)

Here Tprop is the propagation delay from the input clock to the output of the device in question (mostly a flip-flop); Tdelay is the propagation delay across the combinational logic through which the input arrives; Tsetup is the setup time requirement of the device; Tclock is the clock period; and Thold is the hold time requirement of the device.

So if the propagation delay across the combinational logic, Tdelay, is such that equation (1) fails, i.e. Tprop + Tdelay is more than Tclock - Tsetup, a setup timing violation is reported. Similarly, if Tdelay + Tprop is less than Thold, so that equation (2) fails, a hold timing violation is reported. In the case of a setup violation, the input data arrives late because of the large Tdelay across the combinational logic, and thus is not valid (stable) during the setup time period, triggering a violation. The flip-flop needs a certain time to read the input; during this period the data must remain stable, and any change results in improper operation of the device and thus a violation. In the case of a hold violation, the new data arrives too early because Tdelay + Tprop is too small to delay it sufficiently. The flip-flop needs some time after the clock edge to store the data, during which the data must remain stable. If the data changes within this window, the flip-flop does not have sufficient time to capture it, triggering a violation.
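Equations (1) and (2) translate directly into two slack functions: positive slack means the inequality holds. The Python sketch below uses hypothetical delay values in ns. Like the equations above, it ignores clock skew and uses strict inequalities.

```python
def setup_slack(t_prop, t_delay, t_clock, t_setup):
    """Equation (1) as a slack: positive when Tprop + Tdelay < Tclock - Tsetup."""
    return (t_clock - t_setup) - (t_prop + t_delay)

def hold_slack(t_prop, t_delay, t_hold):
    """Equation (2) as a slack: positive when Tdelay + Tprop > Thold."""
    return (t_delay + t_prop) - t_hold

def check(t_prop, t_delay, t_clock, t_setup, t_hold):
    s = setup_slack(t_prop, t_delay, t_clock, t_setup)
    h = hold_slack(t_prop, t_delay, t_hold)
    return {"setup": "MET" if s > 0 else "VIOLATED",
            "hold": "MET" if h > 0 else "VIOLATED"}

# Hypothetical values in ns: 5 ns clock, 0.11 ns setup, 0.05 ns hold.
print(check(t_prop=0.30, t_delay=2.00, t_clock=5.0, t_setup=0.11, t_hold=0.05))
print(check(t_prop=0.30, t_delay=4.80, t_clock=5.0, t_setup=0.11, t_hold=0.05))
print(check(t_prop=0.02, t_delay=0.01, t_clock=5.0, t_setup=0.11, t_hold=0.05))
```

The three calls show, in order, a path that meets both checks, a path that is too slow (setup violation), and a path that is too fast (hold violation).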

4.A.3.1 HOW TO FIX TIMING VIOLATIONS

When the synthesis tool reports timing violations, the designer needs to fix them. There are three options for fixing these violations.

1) Optimization using the synthesis tool: this is the easiest option. A few of the techniques were discussed in the section Optimization Techniques above.

A few other techniques are covered later in this section.


2) Microarchitectural tweaks: this is a more manual approach than the previous one. Here the designer modifies the code to make microarchitectural changes that affect the timing of the design. Some of these techniques were discussed in the section Optimization Techniques, and a few new ones are covered in this section.
3) Architectural changes: this is the last resort, as the designer needs to change the whole architecture of the design under consideration, which takes a long time.

Optimization using synthesis tool


The tool can be used to tweak the design to improve performance. For performance optimization, a designer can employ the following methods:
a) Compilation with a map_effort high option;
b) Group critical paths together and give them a weight factor;
c) Register balancing;
d) Choose a specific implementation for a module;
e) Balancing heavy loading.

Compilation with a map_effort high


The initial compilation of a design is done with map_effort set to medium, using the design constraints. This usually gives the best results, together with the flattening and structuring options. If the desired results are not met, i.e. the design still has timing violations, map_effort can be set to high. This usually takes a long time to run and is therefore not used as the first option. This compilation could improve design performance by about 10%.

Group critical paths and assign a weight factor


We can use the group_path command to group critical timing paths and set a weight factor on them. The weight factor indicates how much effort the tool should spend optimizing these paths: the larger the weight factor, the more effort. This command allows the designer to prioritize critical paths for optimization using the weight factor.

group_path -name <group_name> -from <starting_point> -to <ending_point> -weight <value>

Register balancing
This command is particularly useful for pipelined designs. It reshuffles logic from one pipeline stage to another, which allows extra logic to be moved away from overly constrained pipeline stages to less constrained ones that have timing to spare. The command is simply balance_registers.
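The idea behind register balancing can be shown on the simplest possible case: a single chain of gates split into two pipeline stages by one register. Moving that register changes the slowest stage, and the slowest stage sets the minimum clock period. The Python sketch below is a conceptual illustration with hypothetical delays. It is not how balance_registers works internally, since the real command handles general netlists.

```python
def stage_delays(gate_delays, boundary):
    """Split a linear chain of gates into two pipeline stages at `boundary`."""
    return sum(gate_delays[:boundary]), sum(gate_delays[boundary:])

def balance(gate_delays):
    """Pick the register position that minimizes the slower stage (min clock period)."""
    best = min(range(len(gate_delays) + 1),
               key=lambda k: max(stage_delays(gate_delays, k)))
    return best, max(stage_delays(gate_delays, best))

gates = [1.0, 1.0, 1.0, 1.0]   # hypothetical gate delays in ns
original_boundary = 3          # hypothetical: register placed after the third gate
print("before:", stage_delays(gates, original_boundary))  # (3.0, 1.0) -> period 3.0 ns
print("after: ", balance(gates))                          # (2, 2.0)   -> period 2.0 ns
```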

Choose a specific implementation for a module


A synthesis tool infers high-level functional modules for operators such as ‘+’, ‘-’, and ‘*’. Depending on the map_effort option, Design Compiler chooses the implementation for each functional module. For example, an adder has the following kinds of implementation:

a) Ripple carry: rpl
b) Carry look-ahead: cla
c) Fast carry look-ahead: clf
d) Simulation model: sim

The implementation type sim is only for simulation. Implementation types rpl, cla, and clf are for synthesis: clf is the fastest, followed by cla, and rpl is the slowest. If compilation is done with map_effort low, the designer can set the implementation manually using the set_implementation command; otherwise the selection does not change from the current choice. If map_effort is set to medium, Design Compiler automatically chooses an appropriate implementation based on its optimization algorithm. A medium map_effort is suitable for better optimization, or a manual setting can be used for better performance results.
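The speed ranking follows from carry-chain depth. The behavioral Python sketch below (a generic illustration, not the DesignWare rpl/cla/clf architectures themselves) computes the same sum two ways: a ripple-carry loop whose carry passes through one stage per bit, and a parallel-prefix (Kogge-Stone style) carry look-ahead whose carry network is only about log2(width) levels deep.

```python
def ripple_carry_add(a, b, width):
    """Return (sum, carry_levels). Each bit waits for the carry of the
    previous bit, so the carry chain is `width` stages deep."""
    carry, s = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return s | (carry << width), width


def prefix_lookahead_add(a, b, width):
    """Return (sum, carry_levels) using a Kogge-Stone style prefix network.
    Carry-in is assumed to be 0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    G, P = g[:], p[:]
    levels, dist = 0, 1
    while dist < width:
        G_next, P_next = G[:], P[:]
        for i in range(dist, width):
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
        G, P = G_next, P_next
        dist *= 2
        levels += 1
    # G[i] now equals the carry out of bit i.
    s = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s | (G[width - 1] << width), levels


print(ripple_carry_add(0xBEEF, 0x1234, 16))
print(prefix_lookahead_add(0xBEEF, 0x1234, 16))
```

Both functions return the same sum; for 16 bits the ripple carry chain is 16 stages deep while the prefix network needs 4 levels, which is the intuition behind cla and clf being faster (at the cost of more gates).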
Balancing heavy loading

Designs generally have certain nets with heavy fanout, which place a heavy load on a single driving point. A large load is difficult for a single driver to drive, which leads to unnecessary delays and thus timing violations. The balance_buffers command comes in handy for such problems: it makes Design Compiler build buffer trees to drive the large fanout and thus balance the heavy load.
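As a rough sketch of what a buffer tree does, the Python function below counts the levels and buffers in a balanced tree in which no driver (the original driver or any buffer) drives more than max_fanout loads. The max_fanout limit is a hypothetical parameter; the actual tree that balance_buffers builds depends on the library and the constraints.

```python
import math


def buffer_tree(fanout, max_fanout):
    """Return (levels, buffers) for a balanced buffer tree in which no driver
    (the original driver or any inserted buffer) drives more than max_fanout loads."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels, buffers, loads = 0, 0, fanout
    while loads > max_fanout:
        loads = math.ceil(loads / max_fanout)  # buffers needed at this level
        buffers += loads
        levels += 1
    return levels, buffers


print(buffer_tree(200, 8))
```

For 200 loads and a limit of 8, this gives 25 buffers driving the loads and 4 buffers driving those, so the original driver sees only 4 loads instead of 200.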

Microarchitectural Tweaks

The design can be modified to fix both setup timing violations and hold timing violations. Let's first deal with setup timing violations. When setup violations cannot be fixed by tool optimizations, changes to the code or to the microarchitectural implementation should be made. The following methods can be used for this purpose:

a) Logic duplication to generate independent paths
b) Balancing of logic between flip-flops
c) Priority decoding versus multiplex decoding
Logic duplication to generate independent paths

Consider Figure 4A.c. Assuming a critical path exists from A to Q2, optimizing the combinational logic X, Y, and Z would be difficult because X is shared by Y and Z. We can duplicate logic X as shown in Figure 4A.d. Q1 and Q2 then have independent paths, and the tool can optimize the path to Q2 more effectively.

Figure 4A.c : Logic with Q2 critical path

Figure 4A.d: Logic duplication allowing Q2 to be an independent path.

Logic duplication can also be used when one signal of a module arrives late compared to the others. The logic can be duplicated in front of the early-arriving signals so that the timing of all the signals is balanced. Figures 4A.e and 4A.f illustrate this. The signal Q might have a setup violation because it is delayed by the late-arriving select signal of the multiplexer. The combinational logic at the output can instead be placed in front of the (early-arriving) data inputs. Its delay is then absorbed while the select signal is still settling, which balances the timing at the multiplexer inputs and avoids the setup violation on Q.
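The arrival-time arithmetic behind Figures 4A.e and 4A.f can be sketched in Python. The model assumes the output logic F is a single-input function of the multiplexer output (so F(mux(d0, d1, sel)) equals mux(F(d0), F(d1), sel)) and uses hypothetical delays in ns.

```python
def q_arrival_original(d0, d1, sel, t_mux, t_logic):
    """F placed after the multiplexer: F has to wait for the late select."""
    return max(d0, d1, sel) + t_mux + t_logic


def q_arrival_duplicated(d0, d1, sel, t_mux, t_logic):
    """A copy of F on each data input: F's delay overlaps the late select."""
    return max(d0 + t_logic, d1 + t_logic, sel) + t_mux


def mux(a, b, sel):
    return b if sel else a


# Hypothetical arrivals: data inputs at 1.0 ns, select at 4.0 ns.
print(q_arrival_original(1.0, 1.0, 4.0, t_mux=0.5, t_logic=2.0))
print(q_arrival_duplicated(1.0, 1.0, 4.0, t_mux=0.5, t_logic=2.0))

# Functional check for a sample single-input F (4-bit inversion).
F = lambda x: (~x) & 0xF
assert all(F(mux(a, b, s)) == mux(F(a), F(b), s)
           for a in range(16) for b in range(16) for s in (0, 1))
```

With these numbers Q settles at 6.5 ns in the original structure and at 4.5 ns after duplication, at the cost of one extra copy of F.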

Figure 4A.e: Multiplexer with late arriving sel signal

Figure 4A.f: Logic Duplication for balancing the timing between signals

Balancing of logic between flip-flops


This concept is similar to the balance_registers command described in the tool optimization section; the difference is that the designer does it at the code level. To fix setup violations in designs with pipeline stages, the logic between the stages should be balanced. Consider a pipeline of three flip-flops with a combinational logic module between each pair of flip-flops. Suppose the delay of the first logic module violates the setup time of the second flip-flop by a large margin, while the delay of the second logic module is so small that the data comfortably meets the setup requirement at the third flip-flop. We can move part of the first logic module into the second so that the setup requirements of both flip-flops are met. This improves performance without any violations. Figure 4.A.g illustrates the example.
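The three-flip-flop example can be checked with simple slack arithmetic. In the Python sketch below the period, clock-to-Q, setup time, and logic delays are hypothetical values in ns, and the logic is assumed to be divisible at any point, which real logic only approximates.

```python
def stage_slacks(logic_delays, period, t_clk_q, t_setup):
    """Setup slack at the capturing flip-flop of each pipeline stage."""
    return [period - (t_clk_q + d + t_setup) for d in logic_delays]


def balance_two_stages(d1, d2):
    """Move logic from the slower stage into the faster one until both stages
    have equal delay (the total amount of logic is unchanged)."""
    half = (d1 + d2) / 2
    return half, half


PERIOD, T_CLK_Q, T_SETUP = 10.0, 0.5, 0.5
before = (12.0, 4.0)              # first stage too slow, second stage fast
after = balance_two_stages(*before)

print("before:", stage_slacks(before, PERIOD, T_CLK_Q, T_SETUP))
print("after: ", stage_slacks(after, PERIOD, T_CLK_Q, T_SETUP))
```

Moving 4 ns of logic from the first stage to the second turns slacks of -3 ns and +5 ns into +1 ns at both flip-flops.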

Figure 4.A.g : Logic with pipeline stages

Priority encoding versus multiplex encoding

When a designer knows for certain that a particular input signal arrives late, priority encoding is a good choice. The signals that arrive earlier are given higher priority and are therefore encoded before the late-arriving signals.

Consider the boolean equation:


Q = A.B.C.D.E.F

It can be built from five two-input AND gates: A and B go to the first gate, the output of the first gate is ANDed with C, the output of the second gate with D, and so on. This gives good performance if F is the latest-arriving signal and A the earliest. If the propagation delay of each AND gate is 1 ns, the output Q is valid 5 ns after A is valid, but only 1 ns after F is valid. Multiplex decoding is useful if all the input signals arrive at the same time, because the output then becomes valid sooner; in that case multiplex decoding is faster than priority decoding. For the Boolean equation above, pairs of inputs are ANDed in parallel as A.B, C.D, and E.F, and these three outputs are then ANDed together. Assuming the final three-input AND gate also has a 1 ns delay, Q is valid about 2 ns after A is valid.
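The two structures can be compared with a small Python arrival-time model. Each gate is written as a tuple of its inputs and, as in the text, every gate (including the three-input gate at the second level of the tree) is assumed to have a 1 ns delay; the late arrival time for F is a hypothetical value.

```python
def arrival(node, arrivals, gate_delay=1.0):
    """Arrival time at the output of `node`.

    A node is either an input name (str) or a tuple of child nodes that feed
    one AND gate; every gate is assumed to have the same delay."""
    if isinstance(node, str):
        return arrivals[node]
    return max(arrival(child, arrivals, gate_delay) for child in node) + gate_delay


chain = ((((("A", "B"), "C"), "D"), "E"), "F")   # priority: five 2-input ANDs
tree = (("A", "B"), ("C", "D"), ("E", "F"))      # three 2-input ANDs, then one 3-input AND

same_time = {name: 0.0 for name in "ABCDEF"}
late_f = dict(same_time, F=4.0)                  # hypothetical: F arrives at 4 ns

for label, arr in [("all inputs at 0 ns", same_time), ("F late", late_f)]:
    print(f"{label}: chain {arrival(chain, arr)} ns, tree {arrival(tree, arr)} ns")
```

With all inputs at 0 ns the chain settles at 5 ns and the tree at 2 ns; if F arrives at 4 ns, the chain still settles at 5 ns while the tree needs 6 ns, so the better structure depends on the arrival profile.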

Fixing Hold time violations


Hold time violations occur when signals arrive too fast, so that they change before the capturing device has read in the previous value. The best method to fix paths with hold time violations is to add buffers in those paths; the buffers add delay and slow the path down. One has to be careful while fixing hold time violations: too many buffers slow the signal down so much that they may cause setup violations, which are again a problem.
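The trade-off can be sketched in Python: add buffers one at a time until the hold slack is non-negative, and give up if the next buffer would break setup. The model assumes each buffer adds the same delay t_buf to both the shortest and the longest path into the capturing flip-flop; skew is positive when the capture clock arrives later than the launch clock, and all numbers are hypothetical (ns).

```python
def setup_slack(period, t_clk_q, d_max, t_setup, skew):
    """skew = capture clock arrival minus launch clock arrival (ns)."""
    return period + skew - (t_clk_q + d_max + t_setup)


def hold_slack(t_clk_q, d_min, t_hold, skew):
    return (t_clk_q + d_min) - (t_hold + skew)


def buffers_to_fix_hold(period, t_clk_q, d_min, d_max, t_setup, t_hold, skew, t_buf):
    """Smallest number of buffers that fixes hold without breaking setup,
    or None if no such number exists."""
    n = 0
    while hold_slack(t_clk_q, d_min + n * t_buf, t_hold, skew) < 0:
        n += 1
        if setup_slack(period, t_clk_q, d_max + n * t_buf, t_setup, skew) < 0:
            return None  # one more buffer would create a setup violation
    return n


common = dict(period=5.0, t_clk_q=0.2, d_min=0.1, t_setup=0.3, t_hold=0.2, skew=0.4, t_buf=0.2)
print(buffers_to_fix_hold(d_max=3.0, **common))   # plenty of setup margin
print(buffers_to_fix_hold(d_max=4.6, **common))   # too little setup margin
```

With d_max = 3.0 ns two buffers fix hold; with d_max = 4.6 ns the second buffer would push the long path into a setup violation, so the function returns None and the path needs a different fix.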

4A.4 Verilog Synthesizable Constructs


Since it is very difficult for the synthesis tool to find hardware with exact delays, all absolute and relative timing declarations are ignored by the tools. Also, all signals are assumed to be of maximum strength (strength 7), and Boolean operations on x and z are not permitted. The constructs are classified as follows:

• Fully supported constructs: constructs that are supported as defined in the Verilog Language Reference Manual.
• Partially supported constructs: constructs supported with restrictions on them.
• Ignored constructs: constructs that are ignored by the synthesis tool.
• Unsupported constructs: constructs which, if used, may cause the synthesis tool to not accept the Verilog input or may cause different results between synthesis and simulation.

Fully supported constructs:


<module instantiation, with named and positional notations>
<integer data types, with all bases>
<identifiers>
<subranges and slices on right hand side of assignment>
<continuous assignment>
>>,<<,?:,{}
assign (procedural and declarative), begin, end, case, casex, casez, endcase
default
disable
function, endfunction
if, else, else if
input, output, inout
wire, wand, wor, tri
integer, reg
macromodule, module
parameter
supply0, supply1
task, endtask

Partially Supported Constructs

Construct / Constraints

*, /, %
    Only when both operands are constants, or when the second operand is a power of 2.
always
    Only edge-triggered events.
for
    Bounded by static variables; only use + or - to index.
posedge, negedge
    Only with always @.
primitive, endprimitive, table, endtable
    Combinational and edge-sensitive user-defined primitives are often supported.
<=
    Limitations on usage with blocking statements.
Gate types: and, nand, or, nor, xor, xnor, buf, not, bufif0, bufif1, notif0, notif1
    Supported without X or Z constructs.
Operators: !, &&, ||, ~, &, |, ^, ^~, ~^, ~&, ~|, +, -, <, >, <=, >=, ==, !=
    Supported without X or Z constructs.

Ignored Constructs

<intra assignment timing controls>
<delay specifications>
scalared, vectored
small, medium, large
specify
time (some tools treat these as integers)
weak1, weak0, highz0, highz1, pull0, pull1
$keyword (some tools use these to set synthesis constraints)
wait (some tools support wait with a bounded condition)

Unsupported constructs
<assignment with variable used as bit select on LHS of assignment>
<global variables>
===, !==
cmos,nmos,rcmos,rnmos,pmos,rpmos
deassign
defparam
event
force
fork,join
forever,while
initial
pullup,pulldown
release
repeat
rtran,tran,tranif0,tranif1
rtranif0,rtranif1
table,endtable,primitive,endprimitive

5.0 DESIGN VISION


5.1 ANALYSIS OF GATE-LEVEL SYNTHESIZED NETLIST USING DESIGN VISION

Synopsys provides a GUI front-end to Design Compiler called Design Vision, which we will use to analyze the synthesis results. Avoid using the GUI to actually perform synthesis, since we want to use scripts for this. To launch Design Vision and read in the synthesized design, move into the /project_dc/synth/ working directory and use the following commands. The command “design_vision-xg” opens the GUI.

% design_vision-xg
design_vision-xg> read_file -format ddc output/gray_counter.ddc

You can browse your design in the hierarchical view. Right-click on the gray_counter module and choose the Schematic View option [Figure 5.a]; the tool displays a schematic of the synthesized logic for that module. Figure 5.b shows the schematic view for the gray counter module, in which you can see the synthesized flip-flops.

Figure 5.a: Design Vision GUI

Figure 5.b: Schematic View of Synthesized Gray Counter

You can use Design Vision to examine various timing data. The Schematic > Add Paths From/To menu option brings up a dialog box that you can use to examine a specific path; the default options produce a schematic of the critical path. The Timing > Paths Slack menu option creates a histogram of the worst-case timing paths in your design. You can use this histogram to gain some intuition on how to approach a design that does not meet timing. If a large number of paths have a very large negative timing slack, a global solution is probably necessary, while if only one or two paths are not making timing, a more local approach may be sufficient. Figures 5.c and 5.d show examples of these two features.

In the current gray_counter design, there are no submodules. If there are submodules in the design, it is sometimes useful to examine the critical path through a single submodule. To do this, right-click on the module in the hierarchy view and use the Characterize option. Check the timing, constraints, and connections boxes and click OK. Now choose the module from the drop-down list box on the toolbar (called the Design List). Choosing Timing > Report Timing provides information on the critical path through that submodule, given the constraints of the submodule within the overall design’s context. For more information on Design Vision, consult the Design Vision User Guide.

Figure 5.c Display Timing Path

Figure 5.d Histogram of Timing Paths

STATIC TIMING ANALYSIS

6.0 Introduction

Why is timing analysis important when designing a chip?


Timing is important because designing the chip is not enough; we need to know how fast the chip will run, how fast it will interact with other chips, how fast an input reaches the output, and so on.
Timing analysis is a method of verifying the timing performance of a design by checking all possible paths for timing violations.

Why do we normally do Static Timing Analysis and not Dynamic Timing Analysis?
What is the difference between them?
Timing analysis can be done in two ways: static and dynamic. Dynamic timing analysis requires a comprehensive set of input vectors to check the timing characteristics of the paths in the design; it determines the full behavior of the circuit for a given set of input vectors. Dynamic simulation can verify the functionality of the design as well as its timing. However, exhaustively simulating a design with 100 inputs would require 2 to the power of 100 simulations, an astronomical amount of work compared to static analysis.
Static timing analysis checks every path in the design for timing violations without checking the functionality of the design. This way, timing and functional analysis can be done at the same time but separately. STA is faster than dynamic timing simulation because it does not need any test vectors. That is why STA is the most popular way of doing timing analysis.

6.1 Timing Paths

The different kinds of paths when checking the timing of a design are as follows:
1. Input pin/port to sequential element
2. Sequential element to sequential element
3. Sequential element to output pin/port
4. Input pin/port to output pin/port

The static timing analysis tool performs the timing analysis in the following way:
1. The STA tool breaks the design down into a set of timing paths.
2. It calculates the propagation delay along each path.
3. It checks for timing violations (based on the constraints, e.g. the clock) on the different paths and at the input/output interface.
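The three steps can be sketched in Python on a tiny timing graph: nodes are pins, each arc carries a gate or net delay, arrival times are propagated in topological order taking the maximum over incoming arcs, and the endpoint is checked against its required time. This is a simplified model (a single delay per arc, no rise/fall or transition propagation); the pin names, delays, and constraints are hypothetical, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def arrival_times(arcs, start_arrivals):
    """Latest arrival time at every pin.

    arcs: {(from_pin, to_pin): delay_ns}
    start_arrivals: arrival time at startpoints (clock pins, input ports)
    """
    preds = {}
    for u, v in arcs:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    at = {}
    for pin in TopologicalSorter(preds).static_order():
        incoming = [at[u] + arcs[(u, pin)] for u in preds[pin]]
        at[pin] = max(incoming) if incoming else start_arrivals.get(pin, 0.0)
    return at


arcs = {
    ("FF1/CK", "FF1/Q"): 0.3,   # clock-to-Q
    ("FF1/Q", "U1/Y"): 0.6,     # net + gate delay
    ("IN", "U1/Y"): 0.2,        # input port into the same gate
    ("U1/Y", "U2/Y"): 0.5,
    ("U2/Y", "FF2/D"): 0.1,
}
at = arrival_times(arcs, {"FF1/CK": 0.0, "IN": 1.5})  # IN has a 1.5 ns input delay

period, t_setup = 2.5, 0.1
slack = (period - t_setup) - at["FF2/D"]
print(f"arrival at FF2/D = {at['FF2/D']:.2f} ns, setup slack = {slack:.2f} ns")
```

Here the path from IN dominates at U1/Y (1.7 ns versus 0.9 ns through FF1), so the worst arrival at FF2/D is 2.3 ns and the check passes with 0.1 ns of slack.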

6.1.1 Delay Calculation of each timing path:
STA calculates the delay along each timing path by determining the Gate delay and Net
delay.
Gate Delay: Amount of delay from the input to the output of a logic gate. It is calculated
based on 2 parameters
a. Input Transition Time
b. Output Load Capacitance

Net Delay: Amount of delay from the output of a gate to the input of the next gate in a
timing path. It depends on the following parameters
a. Parasitic Capacitance
b. Resistance of net

During STA, the tool calculates the timing of the path by calculating:
1. Delay from input to output of the gate (Gate Delay).
2. Output Transition Time (which in turn depends on Input Transition Time and Output Load Capacitance).
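In Liberty-style nonlinear delay models (NLDM), this dependence is typically stored as a 2-D table indexed by input transition and output load, and the tool interpolates between table points. The Python sketch below does bilinear interpolation on a hypothetical 3x3 delay table; the values and units (ns, pF) are made up for illustration.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices of the two axis points around x; clamped to the end segments,
    so values outside the table are extrapolated linearly."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def lookup(table, transitions, loads, transition, load):
    """Bilinear interpolation in a 2-D table where table[i][j] is the delay
    for transitions[i] and loads[j]."""
    i0, i1 = _bracket(transitions, transition)
    j0, j1 = _bracket(loads, load)
    tx = (transition - transitions[i0]) / (transitions[i1] - transitions[i0])
    ty = (load - loads[j0]) / (loads[j1] - loads[j0])
    row0 = table[i0][j0] * (1 - ty) + table[i0][j1] * ty
    row1 = table[i1][j0] * (1 - ty) + table[i1][j1] * ty
    return row0 * (1 - tx) + row1 * tx


# Hypothetical cell delay table: rows = input transition (ns), columns = load (pF).
TRANSITIONS = [0.05, 0.2, 0.5]
LOADS = [0.001, 0.01, 0.05]
DELAY = [
    [0.10, 0.15, 0.35],
    [0.13, 0.19, 0.40],
    [0.20, 0.27, 0.50],
]

print(f"{lookup(DELAY, TRANSITIONS, LOADS, 0.2, 0.01):.4f} ns")     # on a table point
print(f"{lookup(DELAY, TRANSITIONS, LOADS, 0.125, 0.0055):.4f} ns")  # between points
```

A slower input transition or a larger load moves the lookup toward the bottom-right of the table, which is why both parameters increase the gate delay.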

6.2 Timing Exceptions


Timing exceptions are constraints that do not follow the default behavior during timing analysis. The different kinds of timing exceptions are:
1. False path: If a path does not affect the output and does not contribute to the delay of the circuit, that path is called a false path.
2. Multicycle path: Multicycle paths in a design are paths that require more than one clock cycle. Therefore they require special multicycle setup and hold-time calculations.
3. Min/Max path: This path must meet a delay constraint with a specific value, which, unlike a multicycle path, need not be an integer number of cycles. For example:
Delay from one point to another max: 1.87 ns; min: 1.67 ns

4. Disabled timing arcs: The input-to-output arc in a gate is disabled. For example, consider a 3-input AND gate with inputs (a, b, c) and output (out). If you want, you can disable the path from input ‘a’ to output ‘out’ using a disable timing arc constraint.
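For the multicycle case, the setup arithmetic can be sketched in Python under simple assumptions: ideal clocks (no latency, skew, or uncertainty) and a capture edge N periods after the launch edge. Hold handling for multicycle paths is tool-specific and is not modeled here; the numbers are hypothetical.

```python
def setup_slack(data_arrival, period, t_setup, cycles=1):
    """Setup slack (ns) when data is captured `cycles` periods after launch."""
    return cycles * period - t_setup - data_arrival


# Hypothetical path: 14 ns of delay on a 10 ns clock, 0.5 ns setup time.
print(setup_slack(14.0, 10.0, 0.5))            # default single-cycle check
print(setup_slack(14.0, 10.0, 0.5, cycles=2))  # declared as a 2-cycle path
```

The same 14 ns path fails the default single-cycle check by 4.5 ns but passes with 5.5 ns of slack once it is treated as a two-cycle path.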

6.3 Setting up Constraints to calculate timing:


To perform timing analysis we need to specify constraints. A few of the basic constraints one needs to specify are:
1. Clock constraint: Defines the clock frequency you want your circuit to run at. This clock input controls all the timing in the chip/design.
2. Setting input delay: This delay is defined as the time taken by the signal to reach the input with respect to the clock.
3. Setting output delay: This delay is the delay incurred outside the particular block/pin/port with respect to the clock.
Example: Assume the clock period is 10 ns.
If the output delay is 5.4 ns, then the input delay would be 10 ns - 5.4 ns = 4.6 ns.

4. Interface Timing: It is the timing between different components/chips of a design.

6.4 Basic Timing Definitions:


• Clock Latency: Clock latency is the delay between the clock source and the clock pin. This is called the source latency of the clock. Normally it specifies the skew between the clock generation point and the clock pin.
• Rise Time: The time it takes for a waveform to rise from 10% to 90% of its steady-state value.
• Fall Time: The time it takes for a waveform to fall from 90% to 10% of its steady-state value.
• Clock-Q Delay: The delay from the rising edge of the clock to when Q (the output) becomes available. It depends on:
  o Input clock transition
  o Output load capacitance
• Clock Skew: The time difference between the clock path reference and the data path reference. The clock path reference is the delay from the main clock to the clock pin, and the data path reference is the delay from the main clock to the data pin of the same block. (Another way of putting it is the difference between the longest insertion delay and the smallest insertion delay.)
• Metastability: A condition in which the logic level of a signal is in an indeterminate state.
• Critical Path: The clock speed is normally determined by the slowest path in the design, which is often called the critical path.
• Clock Jitter: The variation in clock edge timing between clock cycles. It is usually caused by noise.
• Setup Time: The time the data/signal has to be stable before the clock edge.
• Hold Time: The time the data/signal has to be stable after the clock edge.
• Interconnect Delay: The delay caused by wires. Interconnect introduces three types of parasitic effects (capacitive, resistive, and inductive), all of which influence signal integrity and degrade the performance of the circuit.
• Negative Setup Time: In certain cases, due to excessive delay on the clock signal (for example, caused by many inverters in the clock path), the clock edge at which you want your data to be latched actually arrives later than the data signal. This is called negative setup time.
• Negative Hold Time: It basically allows data that was supposed to change in the next cycle to change before the present clock edge.
• Negative Delay: The time from the 50% crossing of the output to the 50% crossing of the input, i.e., the output crosses 50% before the input does.
• Transition Time: The time taken for the signal to go from one logic level to another.
• Delay Time: The time from the 50% crossing of the input to the 50% crossing of the output (50% of the input transition level to 50% of the output transition level).
• Insertion Delay: The delay from the clock source to the clock pin of the sequential element.
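Rise time, fall time, and delay are all threshold-crossing measurements. The Python sketch below finds crossings in sampled waveforms by linear interpolation and applies the 10%/90% and 50% thresholds defined above. The piecewise-linear waveforms are hypothetical; the output example crosses 50% before the slow input does, which gives a negative delay.

```python
def crossing_time(times, values, level):
    """First time a piecewise-linear sampled waveform crosses `level`."""
    for k in range(1, len(times)):
        v0, v1 = values[k - 1], values[k]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            t0, t1 = times[k - 1], times[k]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the level")


def rise_time(times, values, vdd):
    """10% to 90% of the steady-state value."""
    return crossing_time(times, values, 0.9 * vdd) - crossing_time(times, values, 0.1 * vdd)


def fall_time(times, values, vdd):
    """90% to 10% of the steady-state value."""
    return crossing_time(times, values, 0.1 * vdd) - crossing_time(times, values, 0.9 * vdd)


def delay_50(t_in, v_in, t_out, v_out, vdd):
    """50% input crossing to 50% output crossing (negative if the output crosses first)."""
    return crossing_time(t_out, v_out, 0.5 * vdd) - crossing_time(t_in, v_in, 0.5 * vdd)


VDD = 1.0
t_in, v_in = [0.0, 1.0], [0.0, VDD]                        # slow rising input ramp
t_out, v_out = [0.0, 0.1, 0.5, 2.0], [0.0, 0.0, VDD, VDD]  # fast rising output
t_fall, v_fall = [0.0, 1.0], [VDD, 0.0]                    # falling ramp

print(f"input rise time: {rise_time(t_in, v_in, VDD):.2f} ns")
print(f"fall time:       {fall_time(t_fall, v_fall, VDD):.2f} ns")
print(f"delay:           {delay_50(t_in, v_in, t_out, v_out, VDD):.2f} ns")
```

The input reaches 50% at 0.5 ns but the output already reaches 50% at 0.3 ns, so the measured delay is -0.2 ns.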

6.5 Clock Tree Synthesis (CTS):


Clock Tree Synthesis is a process which makes sure that the clock gets distributed evenly to all sequential elements in a design. Also, if a net is a high fanout net, that is, a net which drives a large number of inputs, we need to do load balancing; for the clock net, this load balancing is what clock tree synthesis does. During this process, buffers are added to the different clock paths in the design depending on the clock skew.
For example, consider two flip-flops, a launch FF and a capture FF. The launch FF is where the data is launched, and the capture FF is where the data has to be captured. To improve setup time or hold time, the following need to be done:
a. More delays (buffers) are added on the launching side to have a better hold time.
b. More delays (buffers) are added on the latching (capture) side to have a better setup time.

• The most efficient time to do CTS is after placement.

You can learn about CTS in more detail in the Physical Design part of this tutorial.

Clock Network Delay: A set of buffers is added between the source of the clock and the actual clock pin of the sequential element. The delay due to all these buffers is defined as the clock network delay. [Clock Network Delay is added to clock period in Primetime]
Path Delay: When calculating path delay, the following has to be considered:
Clock Network Delay + Clock-Q + (Sum of all the Gate delays and Net delays)
Global Clock Skew: The difference between the smallest and the longest clock network delay.
Zero Skew: When the clock tree is designed such that the skew is zero, it is called zero skew.
Local Skew: The skew between a launch flop and its capture flop. The worst such skew is taken as the local skew.
Useful Skew: When delays are added only to specific clock paths so that setup time or hold time improves, this is called useful skew.
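The definitions above translate directly into arithmetic. The Python sketch below computes a path delay and the global and local skew from a set of clock network delays; the flop names, delays, and launch/capture pairs are hypothetical.

```python
def path_delay(clock_network_delay, t_clk_q, gate_and_net_delays):
    """Clock Network Delay + Clock-Q + sum of gate and net delays."""
    return clock_network_delay + t_clk_q + sum(gate_and_net_delays)


def global_skew(clock_network_delay):
    """Longest minus smallest clock network delay over all flops."""
    return max(clock_network_delay.values()) - min(clock_network_delay.values())


def local_skew(clock_network_delay, launch_capture_pairs):
    """Worst skew over launch/capture flop pairs that share a timing path."""
    return max(abs(clock_network_delay[capture] - clock_network_delay[launch])
               for launch, capture in launch_capture_pairs)


# Hypothetical clock network delays (ns) after clock tree synthesis.
cnd = {"FF1": 1.20, "FF2": 1.35, "FF3": 1.05, "FF4": 1.30}
pairs = [("FF1", "FF2"), ("FF2", "FF4")]   # hypothetical launch -> capture pairs

print(f"path delay from FF1: {path_delay(cnd['FF1'], 0.25, [0.40, 0.10, 0.60, 0.05]):.2f} ns")
print(f"global skew: {global_skew(cnd):.2f} ns")
print(f"local skew:  {local_skew(cnd, pairs):.2f} ns")
```

FF3 sets the global skew but is not in any listed launch/capture pair, so the local skew (0.15 ns) is smaller than the global skew (0.30 ns).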

What kind of model does the tool use to calculate the delay?
The tool uses a wire load model, which is a statistical model. It consists of a table that gives the capacitance and resistance of a net as a function of its fanout.
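A minimal Python sketch of a wire load lookup, loosely following the structure of a Liberty wire_load group: per-unit-length capacitance and resistance, a fanout-to-length table, and a slope used to extrapolate beyond the table. All values are hypothetical, and linear interpolation between table entries is an assumption of this sketch.

```python
# Hypothetical wire load model, shaped like a Liberty wire_load group.
WLM = {
    "capacitance": 0.0002,   # capacitance per unit length (pF)
    "resistance": 0.0005,    # resistance per unit length (kohm)
    "slope": 40.0,           # extra length per fanout beyond the table
    "fanout_length": {1: 20.0, 2: 45.0, 4: 110.0},
}


def net_length(wlm, fanout):
    """Estimated net length for a given fanout."""
    table = wlm["fanout_length"]
    fanouts = sorted(table)
    if fanout < fanouts[0]:
        raise ValueError("fanout below the table range")
    if fanout in table:
        return table[fanout]
    if fanout > fanouts[-1]:
        # Beyond the table: extrapolate from the last entry using the slope.
        return table[fanouts[-1]] + (fanout - fanouts[-1]) * wlm["slope"]
    # Between entries: linear interpolation (an assumption of this sketch).
    lo = max(f for f in fanouts if f < fanout)
    hi = min(f for f in fanouts if f > fanout)
    return table[lo] + (fanout - lo) * (table[hi] - table[lo]) / (hi - lo)


def net_parasitics(wlm, fanout):
    """Return (capacitance, resistance) estimated for a net with this fanout."""
    length = net_length(wlm, fanout)
    return length * wlm["capacitance"], length * wlm["resistance"]


for fo in (1, 3, 6):
    c, r = net_parasitics(WLM, fo)
    print(f"fanout {fo}: length {net_length(WLM, fo):.1f}, C = {c:.4f} pF, R = {r:.4f} kohm")
```

Because the estimate depends only on fanout, two nets with the same fanout get the same parasitics regardless of where they end up on the die, which is why wire load models are only a pre-layout approximation.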

For more information please refer to the Primetime User Manual in the
packages/synopsys/ directory.

6.6 PRIMETIME TUTORIAL EXAMPLE
6.6.1 Introduction

PrimeTime (PT) is a sign-off quality static timing analysis tool from Synopsys. Static timing analysis, or STA, is without a doubt the most important step in the design flow. It determines whether the design works at the required speed. PT analyzes the timing delays in the design and flags violations that must be corrected.

PT, similar to DC, provides a GUI interface along with the command-line interface. The
GUI interface contains various windows that help analyze the design graphically.
Although the GUI interface is a good starting point, most users quickly migrate to using
the command-line interface. Therefore, I will focus solely on the command-line interface
of PT.

PT is a stand-alone tool that is not integrated under the DC suite of tools. It is a separate
tool, which works alongside DC. Both PT and DC have consistent commands, generate
similar reports, and support common file formats. In addition PT can also generate timing
assertions that DC can use for synthesis and optimization. PT’s command-line interface is
based on the industry standard language called Tcl. In contrast to DC’s internal STA
engine, PT is faster, takes up less memory, and has additional features.

6.6.2 Pre-Layout

After successful synthesis, the netlist obtained must be statically analyzed to check for
timing violations. The timing violations may consist of either setup and/or hold-time
violations. The design was synthesized with an emphasis on meeting setup time, so you may encounter very few setup-time violations, if any. However, hold-time violations will generally occur at this stage, because data arrives too fast at the inputs of sequential cells with respect to the clock.

If the design is failing setup-time requirements, then you have no other option but to re-
synthesize the design, targeting the violating path for further optimization. This may
involve grouping the violating paths or over constraining the entire sub-block, which had
violations. However, if the design is failing hold-time requirements, you may either fix
these violations at the pre-layout level, or may postpone this step until after layout. Many
designers prefer the latter approach for minor hold-time violations (also used here), since
the pre-layout synthesis and timing analysis uses the statistical wire-load models and
fixing the hold-time violations at the pre-layout level may result in setup-time violations
for the same path, after layout. However, if the wire-load models truly reflect the post-
routed delays, then it is prudent to fix the hold-time violations at this stage. In any case, it
must be noted that gross hold-time violations should be fixed at the pre-layout level, in
order to minimize the number of hold-time fixes, which may result after the layout.

6.6.2.1 PRE-LAYOUT CLOCK SPECIFICATION

In the pre-layout phase, the clock tree information is absent from the netlist. Therefore, it
is necessary to estimate the post-route clock-tree delays upfront, during the pre-layout
phase in order to perform adequate STA. In addition, the estimated clock transition
should also be defined in order to prevent PT from calculating false delays (usually large)
for the driven gates. The cause of large delays is usually attributed to the high fanout
normally associated with the clock networks. The large fanout leads to slow input
transition times computed for the clock driving the endpoint gates, which in turn results
in PT computing unusually large delay values for the endpoint gates. To prevent this
situation, it is recommended that a fixed clock transition value be specified at the source.
The following commands may be used to define the clock, during the prelayout phase of
the design.

pt_shell> create_clock -period 20 -waveform [list 0 10] [list CLK]
pt_shell> set_clock_latency 2.5 [get_clocks CLK]
pt_shell> set_clock_transition 0.2 [get_clocks CLK]
pt_shell> set_clock_uncertainty 1.2 -setup [get_clocks CLK]
pt_shell> set_clock_uncertainty 0.5 -hold [get_clocks CLK]

The above commands specify the port CLK as type clock having a period of 20ns, the
clock latency as 2.5ns, and a fixed clock transition value of 0.2ns. The clock latency
value of 2.5ns signifies that the clock delay from the input port CLK to all the endpoints
is fixed at 2.5ns. In addition, the 0.2ns value of the clock transition forces PT to use the
0.2ns value, instead of calculating its own. The clock skew is approximated with 1.2ns
specified for the setup-time, and 0.5ns for the hold-time. Using this approach during pre-
layout yields a realistic approximation to the post-layout clock network results.
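How these numbers enter the checks can be sketched in Python. With ideal clocks the 2.5 ns latency is applied to both the launch and the capture clock, so it cancels; the uncertainty tightens both the setup check and the hold check. Data delay is measured from the clock pin (clock-to-Q plus logic), the path delays and library setup/hold values are hypothetical, and the 0.2 ns transition is not modeled because it only affects delay calculation.

```python
PERIOD = 20.0      # create_clock -period 20
LATENCY = 2.5      # set_clock_latency 2.5
UNC_SETUP = 1.2    # set_clock_uncertainty 1.2 -setup
UNC_HOLD = 0.5     # set_clock_uncertainty 0.5 -hold


def setup_slack(data_delay, t_setup):
    """data_delay: clock-to-Q plus logic delay from the launching clock pin."""
    arrival = LATENCY + data_delay
    required = PERIOD + LATENCY - UNC_SETUP - t_setup
    return required - arrival


def hold_slack(data_delay, t_hold):
    arrival = LATENCY + data_delay
    required = LATENCY + UNC_HOLD + t_hold
    return arrival - required


# Hypothetical paths and library values (ns).
print(f"setup slack: {setup_slack(17.5, t_setup=0.3):.2f}")
print(f"hold slack:  {hold_slack(0.4, t_hold=0.1):.2f}")
```

In this example the short path fails hold by 0.2 ns even though the long path meets setup with 1 ns to spare, consistent with hold violations being the typical finding at the pre-layout stage.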

6.6.3 STEPS FOR PRE-LAYOUT TIMING VALIDATION

0. The design example for the rest of this tutorial is a FIFO whose Verilog code is available in asic_flow_setup/src/fifo/fifo.v. First run DC synthesis on this file:

$ source /packages/synopsys/setup/synopsys_setup.tcl
$ cd
$ cd asic_flow_setup/synth_fifo
$ cd scripts
$ emacs dc_synth.tcl &
$ cd ..
$ dc_shell-xg-t

and run the script /asic_flow_setup/synth_fifo/scripts/dc_synth.tcl.

1. Source “synopsys_setup.tcl”, which sets all the environment variables necessary to run PrimeTime. Type csh at the Unix prompt before you source the script, and source it from the location below:

$ csh
$ cd
$ cd /asic_flow_setup/pt
$ source /packages/synopsys/setup/synopsys_setup.tcl

2. PT may be invoked in the command-line mode using the command pt_shell or in the
GUI mode through the command primetime as shown below.
Command-line mode:
> pt_shell
GUI-mode:
> primetime

Before doing the next step, open the pre_layout_pt.tcl script, which is at location /, and keep it ready:

$ vi scripts/pre_layout_pt.tcl

3. Just like DC setup, you need to set the path to link_library and search_path

pt_shell> set link_library [list \
  /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_typ.db \
  /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db \
  /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_min.db ]
pt_shell> set target_library [list \
  /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db ]

4. Read the synthesized netlist

pt_shell> read_verilog ../synth_fifo/output/fifo.v

5. You need to set the top module name.

pt_shell > current_design FIFO

6. Read in the SDC from the synthesis

pt_shell> source ../synth_fifo/const/fifo.sdc

7. Now we can analyze the design. As discussed at the beginning of this chapter, in general four types of analysis are performed on the design:

• From primary inputs to all flops in the design.
• From flop to flop.
• From flop to primary outputs of the design.
• From primary inputs to primary outputs of the design.

All four types of analysis can be done using the following commands:

pt_shell> report_timing -from [all_inputs] -max_paths 20 -to [all_registers -data_pins] > reports/timing.rpt
pt_shell> report_timing -from [all_registers -clock_pins] -max_paths 20 -to [all_registers -data_pins] >> reports/timing.rpt
pt_shell> report_timing -from [all_registers -clock_pins] -max_paths 20 -to [all_outputs] >> reports/timing.rpt
pt_shell> report_timing -from [all_inputs] -to [all_outputs] -max_paths 20 >> reports/timing.rpt

Note that the above commands write their reports to the file timing.rpt in the reports subdirectory. Open this file to read the reports and check for any errors. The -max_paths 20 option above reports the worst 20 paths in the design.

8. Reporting setup time and hold time. PrimeTime reports setup time by default. You can report setup or hold time by specifying the -delay_type option, as shown below.

pt_shell> report_timing -from [all_registers -clock_pins] -to [all_registers -data_pins] -delay_type max >> reports/timing.rpt

If you open and read the timing.rpt file, you will notice that the design meets the setup
time.

9. Reporting hold time

pt_shell> report_timing -from [all_registers -clock_pins] -to [all_registers -data_pins] -delay_type min >> reports/timing.rpt

If you open and read the timing.rpt file, you will notice that the design meets the hold time.

10. Reporting timing with capacitance and transition time at each level in the path

pt_shell> report_timing -transition_time -capacitance -nets -input_pins -from [all_registers -clock_pins] -to [all_registers -data_pins] > reports/timing.tran.cap.rpt

11. You can save your session and come back to it later if you choose to.

pt_shell > save_session output/fifo.session

Note: If the timing is not met, you need to go back to synthesis and redo to make sure the
timing is clean before you proceed to the next step of the flow that is Physical
Implementation.

IC COMPILER TUTORIAL
8.0 Basics of Physical Implementation

8.1 Introduction

As you have seen at the beginning of the ASIC tutorial, after getting an optimized gate-level netlist, the next step is physical implementation. Before we go into the details of IC Compiler, the physical implementation tool from Synopsys, this chapter covers the basic concepts needed for physical implementation. Figure 8.1.a shows a more detailed flowchart of the ASIC flow.

Figure 8.1.a : ASIC FLOW DIAGRAM

The Physical Implementation step in the ASIC flow consists of:
1. Floorplanning
2. Placement
3. Routing
This chapter covers each of these topics in turn.

8.2 Floorplanning

At the floorplanning stage, we have a netlist that describes the design, its various
blocks, and the interconnections between those blocks. The netlist is the logical
description of the ASIC and the floorplan is its physical description. By floorplanning,
therefore, we map the logical description of the design onto a physical one. The main
objectives of floorplanning are to minimize:
a. Area
b. Timing (delay)

During floorplanning, the following are done:
• The size of the chip is estimated.
• The various blocks in the design are arranged on the chip.
• Pin assignment is done.
• I/O and power planning are done.
• The type of clock distribution is decided.

Figure 8.2.a : Floorplan example (labels: I/O pins, row utilization, row spacing, chip dimensions x and y)

Floorplanning is a major step in the Physical Implementation process: the final timing and
quality of the chip depend on the floorplan design. The three basic elements of a chip are:
1. Standard Cells: The design is made up of standard cells.
2. I/O cells: These cells carry signals into and out of the chip.
3. Macros (Memories): Storing information in sequential elements takes up a lot of
area; a single flip-flop could take 15 to 20 transistors to store one bit. Therefore,
special memory elements are used, which store data efficiently and occupy
comparatively little space on the chip. These memory cells are called macros.
Examples of memory cells include 6T SRAM (Static Random Access Memory),
DRAM (Dynamic Random Access Memory), etc.
The above figure shows a basic floorplan. The basic floorplanning steps (and
terminology) are as follows:

1. Aspect ratio (AR): It is defined as the ratio of the width to the length of the chip.
From the figure, the aspect ratio is x/y. In essence, it is the shape of the rectangle
used for placing the cells and blocks. The aspect ratio should take into account the
routing resources available: if there are more horizontal layers, the rectangle should
be long and its width small, and vice versa if there are more vertical layers.
a. Normally, METAL1 is used up by the standard cells. Usually, odd-numbered
layers are horizontal and even-numbered layers are vertical. So for a 5-layer
design (M3 and M5 horizontal; M2 and M4 vertical), AR = 2/2 = 1.
b. Example 2: For a 6-layer design (M3 and M5 horizontal; M2, M4 and M6
vertical), AR = 2/3 ≈ 0.67.
2. Concept of Rows: The standard cells in the design are placed in rows. All the
rows have equal height and spacing between them. The width of the rows can
vary. The standard cells in the rows get the power and ground connection from
VDD and VSS rails which are placed on either side of the cell rows. Sometimes,
the technology allows the rows to be flipped or abutted, so that they can share the
power and ground rails.
3. Core: The core is the inner block, which contains the standard cells and macros.
Another, outer block surrounds the inner block; the I/O pins are placed on the
outer block.
4. Power Planning: Signals flow into and out of the chip, and for the chip to work,
we need to supply power. A power ring, containing both the VDD and VSS rings,
is designed around the core. Once the ring is placed, a power mesh is designed so
that power reaches all the cells easily. The power mesh is simply a set of
horizontal and vertical lines on the chip. You need to assign the metal layers
through which you want the power to be routed. During power planning, the VDD
and VSS rails also have to be defined.
5. I/O Placement: There are two types of I/Os.
a. Chip I/O: The chip contains I/O pins. The chip consists of the core, which
contains all the standard cells and blocks. Chip I/O placement consists of
placing the I/O pins and the I/O pads. The placement of these I/O pads also
depends on the type of packaging (see Section 8.6, Packaging).
b. Block I/O: The core contains several blocks. Each block has block I/O pins
that communicate with other blocks and cells in the chip. The placement of
these pins can be optimized.
6. Pin Placement: Pin placement is an important step in floorplanning. You may not
know where to place the pins initially, but later, when you have a better idea, the
pins can be placed based on the timing, congestion and utilization of the chip.
a. Pin Placement in Macros: Macro pins use the M3 layer most of the time, so
the macro needs to be placed logically. The logical way is to put the macros
near the boundary. If there is no connectivity between the macro pins and
the boundary, then move the macro to another location.
7. Concept of Utilization: Utilization is defined as the percentage of the chip area that
has been utilized. In the initial stages of floorplan design, if the size of the chip is
unknown, utilization is the starting point of the floorplan design. There are three
different kinds of utilization.
a. Chip Level utilization: It is the ratio of the area of the standard cells, macros
and pad cells to the area of the chip.

$$\text{Chip utilization} = \frac{\mathrm{Area}(\text{Standard Cells}) + \mathrm{Area}(\text{Macros}) + \mathrm{Area}(\text{Pad Cells})}{\mathrm{Area}(\text{Chip})}$$

b. Floorplan Utilization: It is defined as the ratio of the area of the standard
cells, macros and pad cells to the area of the chip minus the area of the sub
floorplan.

$$\text{Floorplan utilization} = \frac{\mathrm{Area}(\text{Standard Cells}) + \mathrm{Area}(\text{Macros}) + \mathrm{Area}(\text{Pad Cells})}{\mathrm{Area}(\text{Chip}) - \mathrm{Area}(\text{Sub Floorplan})}$$

c. Cell Row Utilization: It is defined as the ratio of the area of the standard
cells to the area of the chip minus the area of the macros and the area of
region blockages.

$$\text{Cell row utilization} = \frac{\mathrm{Area}(\text{Standard Cells})}{\mathrm{Area}(\text{Chip}) - \mathrm{Area}(\text{Macros}) - \mathrm{Area}(\text{Region Blockages})}$$

8. Macro Placement: As part of floorplanning, the initial placement of the macros in
the core is performed. The tool places the standard cells in the core depending on
how the macros are placed. If two macros are close together, it is advisable to put
placement blockages in that area. This prevents the tool from putting standard
cells in the small spaces between the macros, which avoids congestion.
A few of the different kinds of placement blockages are:
a. Standard Cell Blockage: The tool does not put any standard cells in the
area specified by the standard cell blockage.
b. Non Buffer Blockage: The tool can place only buffers in the area
specified by the Non Buffer Blockage.
c. Blockages below power lines: It is advisable to create blockages under
power lines, so that they do not cause congestion problems later. After
routing, if you see an area in the design with a lot of DRC violations, place
small chunks of placement blockages to ease congestion.
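The aspect-ratio rule of item 1 and the three utilization ratios of item 7 are plain arithmetic, so they can be checked with a few lines of code. The Python sketch below is illustrative only: it assumes, as item 1a does, that METAL1 is consumed by the standard cells and that odd layers are horizontal and even layers vertical, and the areas in the usage lines are made-up numbers, not data from the FIFO design.

```python
# Illustrative sketch of the floorplan arithmetic in items 1 and 7.
# Areas may be in any consistent unit; the example values are made up.

def aspect_ratio(num_metal_layers):
    """AR = horizontal / vertical routing layers.

    Assumes METAL1 is used up by the standard cells, odd-numbered layers
    run horizontally and even-numbered layers run vertically.
    """
    routing_layers = range(2, num_metal_layers + 1)  # M2 .. Mn
    horizontal = sum(1 for m in routing_layers if m % 2 == 1)
    vertical = sum(1 for m in routing_layers if m % 2 == 0)
    return horizontal / vertical


def chip_utilization(std_cells, macros, pads, chip):
    """(standard cells + macros + pad cells) / chip area."""
    return (std_cells + macros + pads) / chip


def floorplan_utilization(std_cells, macros, pads, chip, sub_floorplan):
    """Same numerator, divided by chip area minus sub-floorplan area."""
    return (std_cells + macros + pads) / (chip - sub_floorplan)


def cell_row_utilization(std_cells, chip, macros, blockages):
    """Standard-cell area / (chip area - macro area - region blockage area)."""
    return std_cells / (chip - macros - blockages)


if __name__ == "__main__":
    print(aspect_ratio(5))             # 1.0  (2 horizontal / 2 vertical)
    print(round(aspect_ratio(6), 2))   # 0.67 (2 horizontal / 3 vertical)
    print(chip_utilization(6000, 2000, 1000, 15000))            # 0.6
    print(floorplan_utilization(6000, 2000, 1000, 15000, 3000))  # 0.75
    print(cell_row_utilization(6000, 15000, 2000, 1000))         # 0.5
```

Note that cell row utilization is usually the number to watch during placement, since it ignores pads and macros and measures how full the rows themselves are.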

After Floorplanning is complete, check for DRC (Design Rule check) violations. Most
of the pre-route violations are not removed by the tool. They have to be fixed manually.
I/O Cells in the Floorplan: I/O cells are the cells that interface between blocks outside
the chip and the internal blocks of the chip. In a floorplan, these I/O cells are placed
between the inner ring (core) and the outer ring (chip boundary). These I/O cells are
responsible for providing voltage to the cells in the core. For example, the voltage inside
the chip for 90nm technology is about 1.2 V, while the regulator supplies the chip with a
higher voltage (normally around 5.5 V, 3.3 V, etc.).
The next question that comes to mind is: why is the external voltage higher than the
voltage inside the chip?
The regulator is usually placed on the board, and it supplies voltage to several other chips
on the board. There is a lot of resistance and capacitance on the board, so the voltage
needs to be higher. If the voltage outside were exactly what the chip needs inside, the
standard cells inside the chip would get less voltage than they actually need, and the chip
might not run at all.
So the next question is: how can chips communicate across different voltages? The
answer lies in the I/O cells, which act as level shifters. A level shifter converts a voltage
from one level to another: the input I/O cells reduce the voltage coming from outside to
the level needed inside the chip, and the output I/O cells raise the voltage to the level
needed outside the chip. The I/O cells therefore act as buffers as well as level shifters.

8.3 Concept of Flattened Verilog Netlist

Most of the time, the Verilog netlist is in hierarchical form, meaning that the design is
modeled as a hierarchy: it is broken down into sub-modules, which may be divided
further. This makes it easier for the logic designer to design the system. A hierarchical
netlist is useful only until Physical Implementation; during placement and routing, it is
better to have a flattened netlist. Flattening the netlist means that the sub-blocks of the
model are opened up, so that there are no more sub-blocks, just one top block. After you
flatten the netlist you can no longer differentiate between the various sub-blocks, but the
logical hierarchy of the whole design is maintained. The reasons to do this are:
• In a flat design flow, the placement and routing resources are always visible and
available.
• Physical Design Engineers can perform routing optimization and avoid congestion
to achieve a good-quality design optimization.
If the conventional hierarchical flow is used, it can lead to sub-optimal timing for critical
paths traveling through the blocks and for critical nets routed around the blocks.
The following gives an example of a netlist in hierarchical mode as well as in flattened
mode:

8.3.a Hierarchical Model:

module top (a, out1);
  input  a;
  output out1;

  wire   n1;   // connects the output of U1 to the input of U2

  // SUB1's ports are b and outb, so the instances connect by those names
  SUB1 U1 (.b(a),  .outb(n1));
  SUB1 U2 (.b(n1), .outb(out1));

endmodule

module SUB1 (b, outb);
  input  b;
  output outb;

  wire   n1, n2;

  // Three inverters in series (INVX1 pins named in/out, as in the original)
  INVX1 V1 (.in(b),  .out(n1));
  INVX1 V2 (.in(n1), .out(n2));
  INVX1 V3 (.in(n2), .out(outb));

endmodule

In Verilog, each instance name within a module is unique. In the flattened netlist, each
instance name is built from its hierarchy path, i.e. top-level instance name/lower-level
instance name, and so on. The input and output ports of the sub-modules are also lost:
in the above example, ports b and outb of SUB1 disappear, while the top-level ports a
and out1 remain.
The above hierarchical model, when converted to a flattened netlist, looks like this:

8.3.b Flattened Model:

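module top (a, out1);
  input  a;
  output out1;

  wire   topn1;   // the former top-level net n1 between U1 and U2
  // Former internal nets of U1 and U2, renamed with their instance path.
  // '/' is not legal in a plain Verilog identifier, so these names are
  // written as escaped identifiers (leading backslash, trailing space).
  wire   \U1/n1 , \U1/n2 , \U2/n1 , \U2/n2 ;

  INVX1 \U1/V1  (.in(a),       .out(\U1/n1 ));
  INVX1 \U1/V2  (.in(\U1/n1 ), .out(\U1/n2 ));
  INVX1 \U1/V3  (.in(\U1/n2 ), .out(topn1));
  INVX1 \U2/V1  (.in(topn1),   .out(\U2/n1 ));
  INVX1 \U2/V2  (.in(\U2/n1 ), .out(\U2/n2 ));
  INVX1 \U2/V3  (.in(\U2/n2 ), .out(out1));

endmodule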
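The renaming rule described above, where each leaf instance takes the path of instance names that leads to it, can be illustrated with a short recursive walk. The sketch below is a hypothetical in-memory model, not a netlist parser: HIERARCHY and flatten are names introduced only for this example, and in a real flow the synthesis or place-and-route tool does the flattening.

```python
# Hypothetical in-memory model of the hierarchical netlist in 8.3.a:
# each module maps to its (instance name, master) pairs, in netlist order.
HIERARCHY = {
    "top": [("U1", "SUB1"), ("U2", "SUB1")],
    "SUB1": [("V1", "INVX1"), ("V2", "INVX1"), ("V3", "INVX1")],
}


def flatten(module, hierarchy, prefix=""):
    """Return (flat instance name, library cell) pairs for every leaf instance.

    A master that is itself a module in `hierarchy` is expanded recursively;
    any other master is treated as a library cell. Flat names join the
    instance path with '/', e.g. U1/V2.
    """
    flat = []
    for inst, master in hierarchy[module]:
        path = prefix + inst
        if master in hierarchy:
            flat.extend(flatten(master, hierarchy, path + "/"))
        else:
            flat.append((path, master))
    return flat


if __name__ == "__main__":
    for name, cell in flatten("top", HIERARCHY):
        print(cell, name)  # INVX1 U1/V1, INVX1 U1/V2, ... INVX1 U2/V3
```

The two SUB1 instances expand into six INVX1 leaf instances under a single top block, which is exactly the structure of the flattened model above.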

The following figure summarizes floorplanning:

Figure 8.c Floorplanning Flow Chart (Floorplanning Summary):
1. Bind the physical library to the netlist.
2. Create the initial core.
3. Create the I/O pin placement and pad ring.
4. Place the macros and standard cells.
5. Create placement blockages; readjust the macro placement.
6. Specify the power and ground nets.
7. Create the power and macro rings.
8. Create the power and ground rails and meshes.
9. Route the power and ground nets.
10. Check the floorplan visually and for any other violations.
Is the floorplan OK? If no, iterate on the floorplan; if yes, proceed to Placement.

8.4 Placement

Placement is a step in the Physical Implementation process where each standard cell is
assigned a particular position in a row. Space is set aside for the interconnect to each
logic/standard cell. After placement, we can make accurate estimates of the capacitive
load each standard cell must drive. The tool places the cells based on the algorithms it
uses internally; placement is the process of placing the design cells in the floorplan in
the most optimal way.
What does the placement algorithm want to optimize?
The main goals of the placement algorithm are:
1. Making the chip as dense as possible (area constraint).
2. Minimizing the total wire length (reducing the length of critical nets).
3. Minimizing the number of horizontal/vertical wire segments crossing a cut line.

Constraints for doing the above are:
• The placement should be routable (no cell overlaps; no density overflow).
• Timing constraints are met.
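One common way to quantify objective 2 is the half-perimeter wirelength (HPWL) of each net, i.e. half the perimeter of the bounding box around its pins. This tutorial does not say which wirelength estimate IC Compiler uses internally, so treat the Python sketch below as a generic illustration; the cell names and coordinates are made up.

```python
# Half-perimeter wirelength (HPWL): a common estimate of the wire length a
# placement will need. Positions are (x, y) cell locations in any unit.

def hpwl(pins):
    """Half the perimeter of the bounding box around a net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, positions):
    """Sum of HPWL over all nets; each net is a list of cell names."""
    return sum(hpwl([positions[cell] for cell in net]) for net in nets)


if __name__ == "__main__":
    nets = [["a", "b"], ["a", "b", "c"]]
    spread = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
    compact = {"a": (0, 0), "b": (1, 0), "c": (0, 1)}
    print(total_wirelength(nets, spread))   # 14
    print(total_wirelength(nets, compact))  # 3
```

A placer that moves cells to lower this total, without overlapping cells or overflowing density, is pursuing objectives 1 and 2 at once.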


There are different algorithms to do placement. The most popular ones are as follows:
1. Constructive algorithms: This type of algorithm uses a set of rules to arrive at the
optimized placement. Example: Cluster growth, min cut, etc.
2. Iterative algorithms: Intermediate placements are modified in an attempt to improve
the cost function. It uses an already constructed placement initially and iterates on that to
get a better placement.
Example: Force-directed method, etc
3. Nondeterministic approaches: simulated annealing, genetic algorithm, etc.

Min-Cut Algorithm
This is the most popular placement algorithm. It works by successively partitioning the
placement area, using the following steps:
1. Cut the placement area into two pieces; each piece is called a bin. Count the
number of nets crossing the cut line. This count is the cost function to be
optimized: the lower the cost, the better the solution.
2. Swap logic cells between the bins to minimize the cost function.
3. Repeat from step 1, cutting smaller and smaller pieces until all the logic cells are
placed and the best placement option is found.
The cost function depends not only on the number of crossings but also on various other
factors, such as the length of each net, congestion issues and signal integrity issues. The
bin size can vary from the size of a single base cell to a size that holds several logic
cells. We can start with a large bin size to get a rough placement, and then reduce the bin
size to get the final placement.
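The min-cut steps above can be sketched in a few dozen lines of Python. This is a simplified, hypothetical illustration, not the algorithm inside any particular tool: the cost function is only the net-crossing count (no net length, congestion or signal-integrity terms), swaps are greedy pairwise exchanges, and at deeper cut levels nets that leave the current region are simply ignored.

```python
from itertools import product


def cut_size(nets, side):
    """Cost function: number of nets with cells on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def bisect(cells, nets):
    """Cut `cells` into two equal bins, then swap cell pairs across the cut
    while a swap lowers the cost. Returns (left bin, right bin, cost).
    Each net is a set of cell names."""
    half = len(cells) // 2
    side = {c: (0 if i < half else 1) for i, c in enumerate(cells)}
    region = set(cells)
    # Simplification: only nets lying entirely inside this region are counted.
    local = [net for net in nets if net <= region]
    best = cut_size(local, side)
    improved = True
    while improved:
        improved = False
        left = [c for c in cells if side[c] == 0]
        right = [c for c in cells if side[c] == 1]
        for a, b in product(left, right):
            side[a], side[b] = 1, 0          # try swapping a and b
            cost = cut_size(local, side)
            if cost < best:
                best, improved = cost, True  # keep the swap, rescan the bins
                break
            side[a], side[b] = 0, 1          # undo the swap
    return left, right, best


def min_cut_order(cells, nets):
    """Recursively bisect until every bin holds one cell; return the
    left-to-right order of the resulting bins."""
    if len(cells) <= 1:
        return list(cells)
    left, right, _ = bisect(cells, nets)
    return min_cut_order(left, nets) + min_cut_order(right, nets)


if __name__ == "__main__":
    nets = [{"a", "c"}, {"b", "d"}]
    print(min_cut_order(["a", "b", "c", "d"], nets))  # ['b', 'd', 'a', 'c']
```

In the example, the initial cut {a, b} | {c, d} crosses both nets (cost 2); swapping a and d gives {b, d} | {a, c}, which crosses none, so connected cells end up in the same bin.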

There are two steps in the placement process:
1. Global Placement
2. Detailed Placement

8.5 Routing

After the floorplanning and placement steps, routing needs to be done. Routing connects
the various blocks in the chip with one another; until now, the blocks have only been
placed on the chip. Routing is also split into two steps:
1. Global routing: It plans the overall connections between all the blocks and the
nets. Its main aim is to minimize the total interconnect length and the critical path
delay. It determines the track assignments for each interconnect.
The chip is divided into small blocks called routing bins. The size of a routing bin
depends on the algorithm the tool uses. Each routing bin is also called a gcell.
Each gcell has a finite number of horizontal and vertical tracks. Global routing
assigns nets to specific gcells, but it does not define the specific tracks for each
of them. The global router connects two different gcells through the center point
of each gcell.
• Track Assignment: The global router keeps track of how many
interconnections are going in each direction; this is the routing demand. The
number of routing layers available depends on the design, and the larger the
die, the more routing tracks there are. Each routing layer has a minimum
width and spacing rule, and its own routing capacity.
For example, in a 5-metal-layer design, if Metal 1, 4 and 5 are partly taken up by
inter-cell connections and pin, VDD and VSS connections, the only layers that are
100% routable are Metal2 and Metal3. If the routing demand exceeds the
routing supply, it causes congestion. Congestion leads to DRC errors and
slow runtime.
2. Detailed Routing: In this step, the actual connections between all the nets are
made; the actual vias and metal connections are created. The main objective of
detailed routing is to minimize the total area, the wire length and the delay in the
critical paths. It specifies the specific tracks for each interconnection; each layer
has its own routing grid and rules. During final routing, the width, layer, and
exact location of each interconnection are decided.
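The demand-versus-supply bookkeeping described under Track Assignment can be illustrated with a toy global router. The Python sketch below assumes a uniform number of horizontal and vertical tracks per gcell and routes each two-pin connection as an L shape (horizontal leg first); real global routers use much richer cost models, so this only shows how an overflow, and hence congestion, is detected.

```python
from collections import Counter


def span(a, b):
    """Inclusive range of gcell indices from a to b, in either direction."""
    step = 1 if b >= a else -1
    return range(a, b + step, step)


def l_route(src, dst):
    """gcells used by an L-shaped route, horizontal leg first.

    Returns ((x, y), direction) pairs, where direction is 'H' or 'V'.
    """
    (x0, y0), (x1, y1) = src, dst
    used = []
    if x0 != x1:
        used += [((x, y0), "H") for x in span(x0, x1)]
    if y0 != y1:
        used += [((x1, y), "V") for y in span(y0, y1)]
    return used


def congestion(connections, h_tracks, v_tracks):
    """Return (demand, overflow) per (gcell, direction).

    Overflow is demand minus track supply wherever demand exceeds supply.
    """
    demand = Counter()
    for src, dst in connections:
        for key in l_route(src, dst):
            demand[key] += 1
    supply = {"H": h_tracks, "V": v_tracks}
    overflow = {
        key: count - supply[key[1]]
        for key, count in demand.items()
        if count > supply[key[1]]
    }
    return demand, overflow


if __name__ == "__main__":
    # Four connections squeezed through one row of gcells that offer
    # only 2 horizontal tracks each.
    conns = [((0, 0), (2, 0))] * 3 + [((0, 0), (1, 1))]
    demand, overflow = congestion(conns, h_tracks=2, v_tracks=2)
    print(overflow)  # {((0, 0), 'H'): 2, ((1, 0), 'H'): 2, ((2, 0), 'H'): 1}
```

Every gcell with a nonzero overflow is a congestion hot spot; a real router would try to detour some of those connections through neighbouring gcells or other layers.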

Figure 8.5.a : Routing grid (labels: routing grid, gcell)

After detailed routing is complete, the exact length and position of each interconnect for
every net in the design are known. The parasitic capacitance and resistance can now be
extracted to determine the actual delays in the design. Parasitic extraction is done by
extraction tools. This information is back-annotated, and the timing of the design is then
calculated by the Static Timing Analysis tool using the actual delays.
After timing is met and all other verification, such as LVS, has been performed, the
design is sent to the foundry to manufacture the chip.

8.6 Packaging
Depending on the type of packaging of the chip, the I/O cells and pad cells are designed
differently during Physical Implementation. There are two packaging styles:
a. Wire-bond: The connections in this technique are real wires. The underside of the
die is first fixed in the package cavity. A mixture of epoxy and a metal (aluminum,
silver or gold) is used to ensure a low electrical and thermal resistance between the
die and the package. The wires are then bonded one at a time to the die and the
package. Below is an illustration of wire-bond packaging.

Figure 8.6.a : Wire Bond Example

b. Flip-Chip: Flip chip describes a method of electrically connecting the die to the
package carrier. This is a direct chip-attach technology, which accommodates dies that
have bond pads placed anywhere on the top surface. Solder balls are deposited on the die
bond pads, usually while the dies are still on the wafer, and at corresponding locations
on the board substrate. The upside-down (flipped) die is then aligned to the substrate.
The advantages of this type of packaging are very short connections (low inductance)
and high package density. The picture below is an illustration of flip-chip packaging.

Figure 8.6.b : Flip Chip Example

8.7 IC TUTORIAL EXAMPLE

8.7.1 Introduction

The physical design stage of the ASIC design flow is also known as the “place and route”
stage. This is based upon the idea of physically placing the circuits, which form logic
gates and represent a particular design, in such a way that the circuits can be fabricated.

This is a generic, high level description of the physical design (place/route) stage. Within
the physical design stage, a complete flow is implemented as well. This flow will be
described more specifically, and as stated before, several EDA companies provide
software or CAD tools for this flow. Synopsys® software for the physical design process
is called IC Compiler. The overall goal of this tool is to combine a gate-level netlist and a
standard cell library with timing constraints to create a placed and routed layout. This
layout can then be fabricated, tested, and integrated into the overall system that the chip
was designed for.

The main inputs into ICC are:

1. The gate-level netlist, which can be in the form of Verilog or VHDL. This netlist is
produced during logic synthesis, which takes place prior to the physical design
stage (discussed in Chapter 3).
2. The second of the main inputs into ICC is a standard cell library. This is a collection
of logic functions such as OR, AND, XOR, etc. The representation in the library is that of
the physical shapes that will be fabricated.

This layout view or depiction of the logical function contains the drawn mask layers
required to fabricate the design properly. However, the place and route tool does not
require such a level of detail during physical design. Only key information such as the
location of metal and input/output pins for a particular logic function is needed. This
representation used by ICC is considered to be the abstract version of the layout. Every
desired logic function in the standard cell library will have both a layout and abstract
view. Most standard cell libraries will also contain timing information about the function
such as cell delay and input pin capacitance, which are used to calculate output loads. This
timing information comes from detailed parasitic analysis of the physical layout of each
function at different process, voltage, and temperature points (PVT). This data is
contained within the standard cell library and is in a format that is usable by ICC. This
allows ICC to be able to perform static timing analysis during portions of the physical
design process. It should be noted that the physical design engineer may or may not be
involved in the creation of the standard cell library, including the layout, abstract, and
timing information. However, the physical design engineer is required to understand what
common information is contained within the libraries and how that information is used
during physical design. Other common information about standard cell libraries is the
fact that the height of each cell is constant among the different functions. This common
height will aid in the placement process since they can now be linked together in rows
across the design. This concept will be explained in detail during the placement stage of
physical design.

3. The third of the main inputs into ICC are the design constraints. These constraints are
identical to those which were used during the front-end logic synthesis stage prior to
physical design. These constraints are derived from the system specifications and
implementation of the design being created. Common constraints among most designs
include clock speeds for each clock in the design as well as any input or output delays
associated with the input/output signals of the chip. The same constraints used during
logic synthesis are used by ICC so that timing will be considered during each stage of
place and route. The constraints are specific for the given system specification of the
design being implemented.

In the IC Compiler tutorial example below, we will place and route the synthesized FIFO
design.

STEPS

1. As soon as you log into your engr account, type “csh” at the command prompt as shown
below. This changes the shell from bash to C shell. All the commands work ONLY in
C shell.

[hkommuru@hafez ]$csh

2. Please copy the whole directory from the below location

[hkommuru@hafez ]$ cp -rf /packages/synopsys/setup/asic_flow_setup ./

This creates the directory structure shown below: a directory called “asic_flow_setup”,
under which the following directories are created:

asic_flow_setup
src/ : for verilog code/source code
vcs/ : for vcs simulation for counter example
synth_graycounter/ : for synthesis of graycounter example
synth_fifo/ : for fifo synthesis
pnr_fifo/ : for Physical design of fifo design example
extraction/: for extraction
pt/: for primetime
verification/: final signoff check

The “asic_flow_setup” directory will contain all generated content including, VCS
simulation, synthesized gate-level Verilog, and final layout. In this course we will always
try to keep generated content from the tools separate from our source RTL. This keeps
our project directories well organized, and helps prevent us from unintentionally
modifying the source RTL. There are subdirectories in the project directory for each
major step in the ASIC Flow tutorial. These subdirectories contain scripts and
configuration files for running the tools required for that step in the tool flow. For this
tutorial we will work exclusively in the pnr_fifo directory.

3. Please source “synopsys_setup.tcl”, which sets all the environment variables necessary
to run the Synopsys tools.
Source it at the UNIX prompt as shown below:

[hkommuru@hafez ]$ source /packages/synopsys/setup/synopsys_setup.tcl

Please Note: You have to do steps 1 and 3 above every time you log in.

4. Go to the pnr_fifo directory and open the script init_design_icc.tcl:

[hkommuru@hafez ]$ cd asic_flow_setup/pnr_fifo
[hkommuru@hafez ]$ cd scripts
[hkommuru@hafez ]$ emacs init_design_icc.tcl &
[hkommuru@hafez ]$ cd ..

At the UNIX prompt, type “icc_shell”; this starts the ICC shell.

[hkommuru@hafez ]$ icc_shell

Next, to open the GUI, type “gui_start”. This opens the GUI window shown on the next page.

icc_shell > gui_start

Before a design can be placed and routed within ICC, the environment for the design
needs to be created. The goal of the design setup stage in the physical design flow is to
prepare the design for floorplanning. The first step is to create a design library. Without a
design library, the physical design process using ICC will not work. This library contains all
of the logical and physical data that ICC will need. Therefore the design library is also
referenced as the design container during physical design. One of the inputs to the design
library which will make the library technology specific is the technology file.

CREATING DESIGN LIBRARY

4.a Setting up the logical libraries. The below commands will set the logical libraries and
define VDD and VSS

icc_shell > lappend search_path "/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models"

icc_shell > set link_library "* saed90nm_max.db saed90nm_min.db saed90nm_typ.db"

icc_shell > set target_library "saed90nm_max.db"

icc_shell > set mw_logic0_net VSS

icc_shell > set mw_logic1_net VDD

icc_shell> set_tlu_plus_files \
    -max_tluplus /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/star_rcxt/tluplus/saed90nm_1p9m_1t_Cmax.tluplus \
    -min_tluplus /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/star_rcxt/tluplus/saed90nm_1p9m_1t_Cmin.tluplus \
    -tech2itf_map /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/astro/tech/tech2itf.map

4.b Creating the Milkyway database

icc_shell> create_mw_lib \
    -technology /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/astro/tech/astroTechFile.tf \
    -mw_reference_library /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/astro/fram/saed90nm_fr/ \
    FIFO_design.mw

Notice that FIFO_design.mw is a separate argument, separated by a space from the
reference library path saed90nm_fr/; it is the name of the design library. You can
choose your own design name.

4.c Open the newly created Milkyway database

icc_shell > open_mw_lib FIFO_design.mw

4.d Read in the synthesized gate-level Verilog netlist. This opens a layout window
containing the layout information, as shown below. All the cells in the design appear at
the bottom, since we have not yet initialized the floorplan or done any placement.

icc_shell > read_verilog ../synth_fifo/output/fifo.v

4.e Uniquify the design by using the uniquify_fp_mw_cel command. The Milkyway
format does not support multiply instantiated designs. Before saving the design in
Milkyway format, you must uniquify the design to remove multiple instances.

icc_shell> uniquify_fp_mw_cel

4.f Link the design by using the link command (or by choosing File > Import > Link
Design in the GUI).

icc_shell> link

4.g Read the timing constraints for the design by using the read_sdc command (or by
choosing File > Import > Read SDC in the GUI).

icc_shell > read_sdc ../synth_fifo/const/fifo.sdc

4.h Save the design.

icc_shell > save_mw_cel -as fifo_inital

FLOORPLANNING
Open file /scripts/floorplan_icc.tcl

5. Initialize the floorplan with the command below.

icc_shell > initialize_floorplan -core_utilization 0.6 -start_first_row \
    -left_io2core 5.0 -bottom_io2core 5.0 -right_io2core 5.0 -top_io2core 5.0 \
    -pin_snap

You can see the floorplan size and shape in the layout window. Since we are still in the
floorplan stage, all the cells in the design are outside the floorplan. You can change the
above options to experiment with the floorplan size.
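As a rough, hypothetical check on what -core_utilization 0.6 and the -*_io2core 5.0 margins mean: the core area is approximately the total cell area divided by the utilization, and the margins are added around the core on each side. The Python sketch below assumes a square core by default (width_over_height follows the x/y aspect-ratio definition in Section 8.2) and ignores the row and site snapping that the real initialize_floorplan command performs; the 5400 µm² cell area is a made-up number.

```python
import math


def core_size(cell_area, utilization, width_over_height=1.0):
    """Width and height of a rectangular core of area cell_area / utilization.

    Approximation only: real tools also snap the core to rows and sites.
    """
    core_area = cell_area / utilization
    height = math.sqrt(core_area / width_over_height)
    width = core_area / height
    return width, height


def die_size(core_w, core_h, left, right, bottom, top):
    """Add the core-to-I/O margins on each side (like the -*_io2core options)."""
    return core_w + left + right, core_h + bottom + top


if __name__ == "__main__":
    # Hypothetical total standard-cell area of 5400 um^2 at 60% utilization
    w, h = core_size(5400.0, 0.6)
    print(round(w, 2), round(h, 2))  # 94.87 94.87
    print([round(v, 2) for v in die_size(w, h, 5.0, 5.0, 5.0, 5.0)])  # [104.87, 104.87]
```

Lowering the utilization grows the core (more whitespace for routing and buffering), which is the main knob to turn if placement later reports congestion.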

6. Connect Power and Ground pins with below command

icc_shell > derive_pg_connection -power_net VDD -ground_net VSS

icc_shell > derive_pg_connection -power_net VDD -ground_net VSS -tie

7. If no pin constraints are given, ICC automatically places the pins evenly around the
boundary of the floorplan. You can constrain the pins around the boundary [the blue
pins in the above figure] using a TDF file. You can look at the file /const/fifo.tdf.

You need to create the TDF file in the following format

pin PINNAME layer width height pinside [ pin Order ]


pin PINNAME layer width height pinside [ pin Offset ]

Example with pin Order


pin Clk M3 0.36 0.4 right 1
In this tutorial example, the tool is placing the pins automatically. Hence, you do not have
to run this step.

8. Power Planning: First, we need to create a rectangular power ring around the floorplan.

## Create VSS ring

icc_shell>create_rectangular_rings -nets {VSS} \
    -left_offset 0.5 -left_segment_layer M6 -left_segment_width 1.0 -extend_ll -extend_lh \
    -right_offset 0.5 -right_segment_layer M6 -right_segment_width 1.0 -extend_rl -extend_rh \
    -bottom_offset 0.5 -bottom_segment_layer M7 -bottom_segment_width 1.0 -extend_bl -extend_bh \
    -top_offset 0.5 -top_segment_layer M7 -top_segment_width 1.0 -extend_tl -extend_th

## Create VDD Ring

icc_shell>create_rectangular_rings -nets {VDD} \
    -left_offset 1.8 -left_segment_layer M6 -left_segment_width 1.0 -extend_ll -extend_lh \
    -right_offset 1.8 -right_segment_layer M6 -right_segment_width 1.0 -extend_rl -extend_rh \
    -bottom_offset 1.8 -bottom_segment_layer M7 -bottom_segment_width 1.0 -extend_bl -extend_bh \
    -top_offset 1.8 -top_segment_layer M7 -top_segment_width 1.0 -extend_tl -extend_th

## Create power straps

icc_shell>create_power_strap -nets { VDD } -layer M6 -direction vertical -width 3

icc_shell>create_power_strap -nets { VSS } -layer M6 -direction vertical -width 3

See the figure below.

10. Save the design.

icc_shell> save_mw_cel -as fifo_fp

PLACEMENT

open /scripts/place_icc.tcl

During the optimization step, the place_opt command introduces buffers and inverters
to fix timing and DRC violations. However, this buffering strategy is local to some critical
paths. The buffers and inverters that are inserted can become excess later, because critical
paths change during the course of optimization. You can reduce the excess buffer and
inverter counts after place_opt by using the set_buffer_opt_strategy command, as shown:

icc_shell> set_buffer_opt_strategy -effort low

This buffering strategy will not degrade the quality of results (QoR).

Also, by default, IC Compiler performs automatic high-fanout synthesis on nets with a
fanout greater than or equal to 100, and it does not remove existing buffer or inverter trees.

You can control the high- and medium-fanout thresholds by using the -hf_thresh and
-mf_thresh options, respectively. You can control the effort used to remove existing buffer
and inverter trees by using the -remove_effort option.
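As a rough illustration of how two fanout thresholds partition nets, the Python sketch below classifies nets by fanout. It only mimics the threshold comparison and is not how IC Compiler implements high-fanout synthesis. The high-fanout threshold of 100 is the default quoted above; the medium-fanout threshold of 40 and the net names are hypothetical values chosen for illustration.

```python
# Illustrative only: classify nets by fanout against two thresholds.
# HF_THRESH = 100 matches the default quoted in the text; MF_THRESH = 40 is a
# hypothetical value, not a documented IC Compiler default.
HF_THRESH = 100
MF_THRESH = 40


def classify_net(fanout: int, hf_thresh: int = HF_THRESH, mf_thresh: int = MF_THRESH) -> str:
    """Return 'high', 'medium', or 'low' depending on which threshold the fanout reaches."""
    if fanout >= hf_thresh:
        return "high"
    if fanout >= mf_thresh:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical nets: name -> number of loads driven.
    nets = {"rst_n": 512, "wr_en": 64, "data_in[0]": 3}
    for name, fanout in nets.items():
        print(f"{name:12s} fanout={fanout:4d} -> {classify_net(fanout)}-fanout net")
```

Because the comparison is "greater than or equal to", a net with exactly 100 loads already counts as high fanout.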

Setting TLUplus files:


TLUPlus models are a set of models containing advanced process effects that can be used
by the parasitic extractors in Synopsys place-and-route tools for modeling. These files
need to be set using the set_tlu_plus_files command as shown below:

icc_shell> set_tlu_plus_files \
    -max_tluplus /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/star_rcxt/tluplus/saed90nm_1p9m_1t_Cmax.tluplus \
    -min_tluplus /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/star_rcxt/tluplus/saed90nm_1p9m_1t_Cmin.tluplus \
    -tech2itf_map /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/process/astro/tech/tech2itf.map

11. Go to the Layout window and choose Placement > Core Placement and Optimization. A new
window opens, as shown below. Select the options you want and click OK; the tool then
performs the placement. Alternatively, you can run the equivalent command at the icc_shell
prompt. Below is an example with the congestion option:

icc_shell> place_opt -congestion

To add area recovery, run:

# place_opt -area_recovery -effort low

# When the design has congestion issues, you have the following choices:
# place_opt -congestion -area_recovery -effort low   # for medium-effort congestion removal
# place_opt -effort high -congestion -area_recovery  # for high-effort congestion removal

## What commands do you need to optimize scan chains?
# read_def < def file >
# check_scan_chain > $REPORTS_DIR/scan_chain_pre_ordering.rpt
# report_scan_chain >> $REPORTS_DIR/scan_chain_pre_ordering.rpt
# place_opt -effort low -optimize_dft

## What commands do you need to reduce leakage power?
# set_power_options -leakage true
# place_opt -effort low -area_recovery -power

## What commands do you need to reduce dynamic power?
# set_power_options -dynamic true -low_power_placement true
# read_saif -input < saif file >
# place_opt -effort low -area_recovery -power
# Note: the -low_power_placement option enables the register clumping algorithm in
# place_opt, whereas the -dynamic option enables
# Gate Level Power Optimization (GLPO).

## When you want to do scan, leakage, and dynamic power optimization and you also have
## congestion issues, use all options together:
# read_def < scan def file >
# set_power_options -leakage true -dynamic true -low_power_placement true
# place_opt -effort low -congestion -area_recovery -optimize_dft -power -num_cpus

12. After placement is done, all the cells are placed in the design, and the layout looks
like the window below.

13. You can report the following information after the placement stage.

icc_shell> save_mw_cel -as fifo_place

### Reports

icc_shell>report_placement_utilization > output/fifo_place_util.rpt


icc_shell>report_qor_snapshot > output/fifo_place_qor_snapshot.rpt
icc_shell>report_qor > output/fifo_place_qor.rpt

### Timing Report

icc_shell>report_timing -delay max -max_paths 20 > output/fifo_place.setup.rpt


icc_shell>report_timing -delay min -max_paths 20 > output/fifo_place.hold.rpt

After placement, if you look at fifo_place.setup.rpt and fifo_place.hold.rpt in the output
directory, you will see that the design meets timing.

CLOCK TREE SYNTHESIS

open scripts/cts_icc.tcl

Before running the actual CTS, you can set various optimization options. In the Layout window,
click the “Clock” menu to see the available options; you can set any of them before running CTS.
To run CTS, choose Clock > Core CTS and Optimization.

14. Save the Cell and report timing

icc_shell>save_mw_cel -as fifo_cts


icc_shell> report_placement_utilization > reports/fifo_cts_util.rpt
icc_shell> report_qor_snapshot > reports/fifo_cts_qor_snapshot.rpt
icc_shell> report_qor > reports/fifo_cts_qor.rpt
icc_shell> report_timing -max_paths 20 -delay max > reports/fifo_cts.setup.rpt
icc_shell> report_timing -max_paths 20 -delay min > reports/fifo_cts.hold.rpt

CTS POST OPTIMIZATION STEPS


## Check hold and setup timing, and run incremental optimization if there are any violations
## setup time fix

## clock_opt -only_psyn

## clock_opt -sizing

## hold_time fix
## clock_opt -only_hold_time

## Save cel again and report paths


save_mw_cel -as fifo_cts_opt

### Timing Report


report_timing -delay max -max_paths 20 > reports/fifo_cts_opt.setup.rpt
report_timing -delay min -max_paths 20 > reports/fifo_cts_opt.hold.rpt

ROUTING
open scripts/route_icc.tcl

14. In the Layout window, choose Route > Core Routing and Optimization. A new
window opens, as shown below.

You can select various options: run all the routing steps in one go, or run global
routing first, then detailed routing, and then the optimization steps. It is up to you.

icc_shell> route_opt

The above command does not perform optimizations. However, after route_opt completes,
you can run an incremental optimization by selecting incremental mode in the window above.
View the shell window after routing is complete. You can see that no DRC violations are
reported, indicating that the routing is clean:

15. Save the cel and report timing

icc_shell> save_mw_cel -as fifo_route


icc_shell> report_placement_utilization > reports/fifo_route_util.rpt
icc_shell> report_qor_snapshot > reports/fifo_route_qor_snapshot.rpt
icc_shell> report_qor > reports/fifo_route_qor.rpt
icc_shell> report_timing -max_paths 20 -delay max > reports/fifo_route.setup.rpt
icc_shell> report_timing -max_paths 20 -delay min > reports/fifo_route.hold.rpt

POST ROUTE OPTIMIZATION STEPS

16. In the Layout window, choose Route > Verify Route. A new window opens, as shown
below; click OK.

The results are clean, as you can see in the window below:

If the results are not clean, you may have to run post-route optimization steps, such as
incremental routing, verification, and cleanup.

EXTRACTION
9.0 Introduction
In general, almost all layout tools are capable of extracting the layout database using
various algorithms. These algorithms define the granularity and the accuracy of the
extracted values. Depending upon the chosen algorithm and the desired accuracy, the
following types of information may be extracted:
1. Detailed parasitics in DSPF or SPEF format.
2. Reduced parasitics in RSPF or SPEF format.
3. Net and cell delays in SDF format.
4. Net delays in SDF format plus lumped parasitic capacitances.
The DSPF (Detailed Standard Parasitic Format) contains RC information of each
segment (multiple R’s and C’s) of the routed netlist. This is the most accurate form of
extraction. However, due to long extraction times on a full design, this method is not
practical. This type of extraction is usually limited to critical nets and clock trees of the
design.
The RSPF (Reduced Standard Parasitic Format) represents RC delays in terms of a pi
model (2 C’s and 1 R). The accuracy of this model is less than that of DSPF, since it does
not account for multiple R’s and C’s associated with each segment of the net. Again, the
extraction time may be significant, thus limiting the usage of this type of information.
Target applications are critical nets and small blocks of the design. Both detailed and
reduced parasitics can be represented in OVI’s (Open Verilog International) Standard
Parasitic Exchange Format (SPEF). The last two (numbers 3 and 4) are the most common
types of extraction used by designers. Both use the SDF format; however, there is a
major difference between them. Number 3 uses the SDF to represent both the cell and
net delays, whereas number 4 uses the SDF to represent only the net delays. The lumped
parasitic capacitances are generated separately. Some layout tools generate the lumped
parasitic capacitances in the Synopsys set_load format, thus facilitating direct
back-annotation to DC or PT.
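To make the difference between a detailed RC network (multiple R’s and C’s, as in DSPF) and a reduced pi model (2 C’s and 1 R, as in RSPF) concrete, the Python sketch below compares their Elmore delays. Elmore delay is a first-order metric swapped in here for illustration; the text does not say which delay model the tools use. The wire values are hypothetical, and the pi model uses the simplest split (half of the wire capacitance at each end), which is an assumption; real extractors typically derive pi-model values by moment matching.

```python
def elmore_ladder(r_drv, r_segs, c_nodes, c_load=0.0):
    """Elmore delay to the far end of an RC ladder (a detailed, DSPF-style network).

    r_segs[i] is the resistance of wire segment i; c_nodes[i] is the capacitance at
    the node at the far end of segment i. The load capacitance sits at the last node.
    """
    caps = list(c_nodes)
    caps[-1] += c_load
    downstream = sum(caps)
    delay = r_drv * downstream          # the driver charges every capacitor on the net
    for r, c in zip(r_segs, caps):
        delay += r * downstream         # each resistor charges everything downstream of it
        downstream -= c
    return delay


def elmore_pi(r_drv, r_wire, c_near, c_far, c_load=0.0):
    """Elmore delay through a pi model: c_near, then r_wire, then c_far (RSPF-style)."""
    return r_drv * (c_near + c_far + c_load) + r_wire * (c_far + c_load)


if __name__ == "__main__":
    R_TOTAL, C_TOTAL, N = 100.0, 200e-15, 10    # hypothetical wire: 100 ohm, 200 fF, 10 segments
    detailed = elmore_ladder(0.0, [R_TOTAL / N] * N, [C_TOTAL / N] * N)
    reduced = elmore_pi(0.0, R_TOTAL, C_TOTAL / 2, C_TOTAL / 2)
    print(f"detailed ladder: {detailed * 1e12:.2f} ps")   # 11.00 ps
    print(f"pi model:        {reduced * 1e12:.2f} ps")    # 10.00 ps
```

Elmore delay only captures the first moment of the response, so this comparison understates what a reduced model loses; the extra detail in DSPF matters more for waveform shape and coupling effects.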

Extraction steps for the tutorial example:


open scripts/extract_icc.tcl

17. In the Layout window, choose Route > Extract RC. A new window opens, as shown
below; click OK.

Alternatively, you can run this command at the icc_shell prompt:

icc_shell > extract_rc -coupling_cap -routed_nets_only -incremental

## Write parasitics to a file for delay calculation tools (e.g., PrimeTime).

icc_shell > write_parasitics -output ./output/fifo_extracted.spef -format SPEF

The above command produces the min and max files (fifo_extracted.spef.min and
fifo_extracted.spef.max), which can be used for delay estimation with PrimeTime.

##Write Standard Delay Format (SDF) back-annotation file

icc_shell > write_sdf ./output/fifo_extracted.sdf

##Write out a script in Synopsys Design Constraints format

icc_shell > write_sdc ./output/fifo_extracted.sdc

##Write out a hierarchical Verilog file for the current design, extracted from layout

icc_shell > write_verilog ./output/fifo_extracted.v

The extracted Verilog netlist can be used to double-check the design by running a
simulation on it with VCS.

18. Report timing

icc_shell> report_timing -max_paths 20 -delay max > reports/fifo_extracted.setup.rpt
icc_shell> report_timing -max_paths 20 -delay min > reports/fifo_extracted.hold.rpt

19. Report power

icc_shell> report_power > reports/fifo_power.rpt

If you open the generated fifo_power.rpt, you will notice both total dynamic power
(active mode) and cell leakage power (standby mode) being reported.

20. Save the design

icc_shell> save_mw_cel -as fifo_extracted

Post-Layout Timing Verification Using PrimeTime


After extracting the RC parasitics (.spef file) and the netlist (.v) from the layout, we can run
PrimeTime to perform timing verification on the extracted netlist. The steps are similar
to those we used to run PrimeTime on the Verilog netlist obtained after DC synthesis,
except that we use the post-layout netlist together with the parasitic information.

1. Source “synopsys_setup.tcl”, which sets all the environment variables necessary to run
PrimeTime. Type csh at the Unix prompt before you source the script, which is located as
shown below:

$ csh
$ cd
$ cd /asic_flow_setup/pt_post
$ source /packages/synopsys/setup/synopsys_setup.tcl

2. PT may be invoked in the command-line mode using the command pt_shell or in the
GUI mode through the command primetime as shown below.
Command-line mode:
> pt_shell
GUI-mode:
> primetime

Before the next step, open the post_layout_pt.tcl script, located at /scripts/post_layout_pt.tcl,
and keep it ready:

$ vi scripts/post_layout_pt.tcl

3. Just like in the DC setup, you need to set link_library and search_path. The commands below
set link_library and target_library:

pt_shell> set link_library [list \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_typ.db \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_min.db]
pt_shell> set target_library [list \
    /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db]

4. Read the extracted netlist

pt_shell> read_verilog ../pnr_fifo/output/fifo_extracted.v

5. You need to set the top module name.

pt_shell > current_design FIFO

6. Read extracted parasitics

pt_shell> read_parasitics -format SPEF ../pnr_fifo/output/fifo_extracted.spef.max

7. Read in the SDC from the ICC

pt_shell> read_sdc ../pnr_fifo/output/fifo_extracted.sdc

8. Now we can analyze the design. Generally, four types of analysis are performed on
the design, as follows:

• From primary inputs to all flops in the design.
• From flop to flop.
• From flop to primary outputs of the design.
• From primary inputs to primary outputs of the design.

All four types of analysis can be accomplished by using the following commands:

pt_shell> report_timing -from [all_inputs] -max_paths 20 -to [all_registers -data_pins] > reports/timing.rpt

pt_shell> report_timing -from [all_registers -clock_pins] -max_paths 20 -to [all_registers -data_pins] >> reports/timing.rpt

pt_shell> report_timing -from [all_registers -clock_pins] -max_paths 20 -to [all_outputs] >> reports/timing.rpt

pt_shell> report_timing -from [all_inputs] -to [all_outputs] -max_paths 20 >> reports/timing.rpt

The results are reported to the timing.rpt file under the reports subdirectory. Please open
the file to see if the design meets the timing or if there are any errors.
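Rather than scrolling through timing.rpt by hand, a small script can pull out the slack of every reported path. The Python sketch below assumes each path in the report ends with a line of the form `slack (MET)  0.42` or `slack (VIOLATED)  -0.15`, which is how PrimeTime timing reports typically look; check the exact layout against your own report before relying on it. The sample text in the example is hypothetical.

```python
import re

# Assumed PrimeTime-style line: "  slack (MET)      0.42" or "  slack (VIOLATED) -0.15"
SLACK_RE = re.compile(r"^\s*slack\s+\((MET|VIOLATED)[^)]*\)\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)


def summarize_slacks(report_text):
    """Return path count, worst slack and number of violating paths, or None if no slack lines."""
    slacks = [float(value) for _status, value in SLACK_RE.findall(report_text)]
    if not slacks:
        return None
    return {
        "paths": len(slacks),
        "worst_slack": min(slacks),
        "violations": sum(1 for s in slacks if s < 0),
    }


if __name__ == "__main__":
    sample = """
  data arrival time                     1.65
  slack (MET)                           0.60
  data arrival time                     2.40
  slack (VIOLATED)                     -0.15
"""
    print(summarize_slacks(sample))
    # For the real report: print(summarize_slacks(open("reports/timing.rpt").read()))
```

A negative worst slack means at least one of the four path groups fails timing and needs a closer look in the full report.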

9. Reporting setup time and hold time. PrimeTime reports setup (max-delay) timing by default. You
can report setup or hold timing by specifying the -delay_type option, as shown below:

pt_shell> report_timing -from [all_registers -clock_pins] -to [all_registers -data_pins] -delay_type max >> reports/timing.rpt

The report is added to the timing.rpt file. Please read the file.

10. Reporting hold time


pt_shell> read_parasitics -format SPEF ../pnr_fifo/output/fifo_extracted.spef.min

pt_shell> report_timing -from [all_registers -clock_pins] -to [all_registers -data_pins] -delay_type min >> reports/timing.rpt

The report is added to the timing.rpt file. Please read the file.
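The max (setup) and min (hold) checks above come down to two slack equations for a flop-to-flop path. The Python sketch below evaluates them with hypothetical delay values in ns. It deliberately ignores clock uncertainty, on-chip variation, and clock reconvergence pessimism removal, which a real PrimeTime run may include, so treat it as a back-of-the-envelope check rather than a replica of the tool's analysis.

```python
from dataclasses import dataclass


@dataclass
class FlopToFlopPath:
    period: float           # clock period
    launch_latency: float   # clock arrival at the launching flop
    capture_latency: float  # clock arrival at the capturing flop
    clk_to_q: float         # launching flop clock-to-Q delay
    comb_max: float         # slowest combinational delay (checked with -delay_type max)
    comb_min: float         # fastest combinational delay (checked with -delay_type min)
    t_setup: float          # capturing flop setup requirement
    t_hold: float           # capturing flop hold requirement

    def setup_slack(self) -> float:
        # Data must arrive before the NEXT capture edge minus the setup time.
        required = self.period + self.capture_latency - self.t_setup
        arrival = self.launch_latency + self.clk_to_q + self.comb_max
        return required - arrival

    def hold_slack(self) -> float:
        # New data must not arrive before the SAME capture edge plus the hold time.
        required = self.capture_latency + self.t_hold
        arrival = self.launch_latency + self.clk_to_q + self.comb_min
        return arrival - required


if __name__ == "__main__":
    p = FlopToFlopPath(period=2.0, launch_latency=0.30, capture_latency=0.35, clk_to_q=0.15,
                       comb_max=1.20, comb_min=0.05, t_setup=0.10, t_hold=0.05)
    print(f"setup slack = {p.setup_slack():.2f} ns")   # 0.60
    print(f"hold slack  = {p.hold_slack():.2f} ns")    # 0.10
```

This is also why the min-corner parasitics are read before the hold report: hold slack depends on the fastest delays, setup slack on the slowest.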

11. Reporting timing with capacitance and transition time at each level in the path

pt_shell> report_timing -transition_time -capacitance -nets -input_pins -from [all_registers -clock_pins] -to [all_registers -data_pins] > reports/timing.tran.cap.rpt

The report is added to the timing.tran.cap.rpt file. Please read the file for results.

12. You can save your session and come back to it later if you choose to.

pt_shell > save_session output/fifo.session

APPENDIX A: Design for Test
A.0 Introduction

What do we do to test the chip after it is manufactured? Why do we need DFT?


Consider a scenario where a million chips are produced. Testing each of these million chips
is a time-consuming and extremely costly process. A correct design does not guarantee that
the manufactured chip is operational: it could have many manufacturing defects. For example,
for 0.13 µm technology, the percentage of good chips is about 60%. It is also very difficult
to ascertain that the manufactured chip will function correctly for all possible inputs and
various other conditions. Moreover, even when we do test the chip, we can only see the input
and output pins; a designer cannot see what is happening at the intermediate steps before the
signal reaches the output pins. DFT was introduced to overcome this problem. It is the
second-best thing we can do, and it increases the probability of the chip working in a real
system. Design for Test requires only very small changes to the design.
In CMOS technology, each logic gate consists of both NMOS and PMOS transistors. Taking an
inverter as an example, the PMOS transistor is connected to VDD and the NMOS transistor is
connected to VSS, and the inverter inverts the given input. The PMOS transistor is on when the
gate input is zero, charging the output capacitance up to VDD; the opposite happens for the
NMOS transistor, which discharges it to VSS. So the NMOS acts as a pull-down device and the
PMOS acts as a pull-up device. Sometimes these pull-up and pull-down devices do not work
properly in some transistors, and a fault occurs. This fault is called a ‘stuck-at fault’: the
logic is stuck permanently at 0 or at 1, regardless of the input values. The next topic covers
test techniques.
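The stuck-at idea can be tried out on a tiny gate-level model. The Python sketch below uses a hypothetical circuit, y = (a AND b) OR c with internal node n1 = a AND b, forces one node at a time to a constant 0 or 1, and lists the input vectors whose output differs from the fault-free circuit; those are exactly the vectors that detect that fault.

```python
from itertools import product

NODES = ("a", "b", "c", "n1", "y")   # primary inputs, internal node, output


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c. fault = (node_name, stuck_value) or None for the good circuit."""
    def node(name, value):
        # A stuck-at fault overrides whatever value the node would normally carry.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = node("a", a), node("b", b), node("c", c)
    n1 = node("n1", a & b)
    return node("y", n1 | c)


def detecting_vectors(fault):
    """All input vectors (a, b, c) for which the faulty output differs from the good output."""
    return [v for v in product((0, 1), repeat=3)
            if simulate(*v) != simulate(*v, fault=fault)]


if __name__ == "__main__":
    for name in NODES:
        for stuck in (0, 1):
            print(f"{name:2s} stuck-at-{stuck}: detected by {detecting_vectors((name, stuck))}")
```

For example, a stuck-at-0 on input a is only visible with a=1, b=1, c=0: any other vector either masks it through the OR gate or never exercises the AND gate.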

A.1 Test Techniques


A.1.1 Issues faced during testing
1. Consider a combinational circuit with N inputs. To validate the circuit, we need to
exhaustively apply all possible input test vectors and observe the responses. Therefore,
for N inputs we need to generate 2^N (2 to the power of N) input vectors. Even if the number
of inputs is high, this is still manageable, but it takes a long time.
Test vector: a set of inputs applied to test the system.
2. Now consider a sequential circuit. The output of a sequential circuit depends on the inputs
as well as on the state values. Testing such a circuit exhaustively would take an extremely
long time and is practically impossible.
To overcome the above issues, a scan-based methodology was introduced.
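The growth of exhaustive testing is easy to quantify. The Python sketch below counts 2^N vectors for N inputs, or 2^(N+M) if M state bits also have to be covered, and converts that into test time at an assumed tester rate of one vector per nanosecond (1 GHz, a hypothetical figure). Treating every state as directly settable is optimistic: in a real sequential circuit, reaching a given state can itself take many clock cycles.

```python
def exhaustive_vectors(num_inputs: int, num_state_bits: int = 0) -> int:
    """Number of input/state combinations: 2**N for combinational logic, 2**(N+M) with M state bits."""
    return 2 ** (num_inputs + num_state_bits)


def exhaustive_test_years(num_vectors: int, vectors_per_second: float = 1e9) -> float:
    """Years needed to apply every vector once at the given rate (1 GHz is a hypothetical tester speed)."""
    seconds = num_vectors / vectors_per_second
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    for n in (16, 32, 64):
        v = exhaustive_vectors(n)
        print(f"N={n:3d}: {v:.3e} vectors, {exhaustive_test_years(v):.3e} years at 1 GHz")
    # A block with 64 inputs and only 32 flip-flops of state:
    v = exhaustive_vectors(64, 32)
    print(f"N=64, M=32: {exhaustive_test_years(v):.3e} years")
```

Even a purely combinational block with 64 inputs needs roughly 585 years at this rate, which is why testing has to target modeled faults instead of all input combinations.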

A.2 Scan-Based Methodology


To begin with, there are two important concepts we need to understand:
1. Controllability: the ability to control the nodes in a circuit through a set of inputs.
For a given set of input and output pins, when we apply a set of input test vectors, we
should be able to control each node we want to test. The higher the degree of
controllability, the better.
2. Observability: the ease with which we can observe the changes in the nodes (gates). As
in the previous case, the higher the observability, the better. Higher observability means
that we can see the desired state of the gates at the outputs in fewer cycles.

It is easy to test combinational circuits using the above set of rules. For sequential circuits,
to be able to do the above, we need to replace the flip-flops (FFs) with scan flip-flops (Scan-FFs).
These are a special type of flip-flop that can control and also check the logic of the
circuit. There are two methodologies:
1. Partial Scan: only some flip-flops are changed to Scan-FFs.
2. Full Scan: all flip-flops in the circuit are changed to these special Scan-FFs.
This does mean that we can test the circuit 100%.

What does the Scan-FF consist of and how does it work?


The Scan-FF is essentially a flip-flop with a multiplexer added at its input. We can control the
input to the flip-flop by using the select pin of the multiplexer. Therefore, using the select
pin, the circuit can run in two modes: functional mode and scan mode. In scan mode, the select
input is high and the input test vectors are shifted into the system. Now we have controlled
the circuit, and we can see how it behaves for the specific set of input test vectors that were
given. After the input vectors are in the exact place we want them in the circuit, the select
pin is changed to low and the circuit is in functional mode. We can then turn the select pin
back to scan mode; the output vectors can be seen at the output pins while a new set of input
test vectors is flushed into the system. We can check the output vectors to see whether they
are the expected results or not.

Some flip-flops cannot be made scannable, for example flip-flops with reset pins and
clock-gated flip-flops.
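The shift, capture, and shift-out sequence described above can be modeled in a few lines. The Python sketch below treats each scan flip-flop as a register whose D input passes through a 2:1 mux controlled by scan_enable. The combinational logic between the flops (`comb_logic`) and the test pattern are hypothetical; the point is that the unloaded response equals what the logic computed from the loaded pattern.

```python
class ScanChain:
    """A chain of scan flip-flops: each flop's D input goes through a 2:1 mux.

    scan_enable = 1 selects the scan path (previous flop's Q, or scan_in for flop 0);
    scan_enable = 0 selects the normal functional D input.
    """

    def __init__(self, length):
        self.q = [0] * length

    def clock(self, scan_enable, scan_in=0, functional_d=None):
        """Apply one clock edge; in scan mode, return the bit leaving the last flop (scan_out)."""
        if scan_enable:
            scan_out = self.q[-1]
            self.q = [scan_in] + self.q[:-1]   # shift: every flop takes its neighbour's Q
            return scan_out
        self.q = list(functional_d)             # capture: every flop takes its functional D
        return None


def comb_logic(q):
    """Hypothetical combinational logic feeding the flops' functional D inputs."""
    return [q[0] ^ q[1], q[1] & q[2], 1 - q[0]]


def run_scan_test(pattern):
    """Shift a pattern in, capture one functional cycle, shift the response out."""
    chain = ScanChain(len(pattern))
    for bit in reversed(pattern):               # scan mode: last bit enters first
        chain.clock(scan_enable=1, scan_in=bit)
    chain.clock(scan_enable=0, functional_d=comb_logic(chain.q))  # functional mode: one capture
    unloaded = [chain.clock(scan_enable=1) for _ in pattern]      # scan mode: unload response
    return list(reversed(unloaded))             # reorder so index 0 is flop 0


if __name__ == "__main__":
    pattern = [1, 0, 1]
    response = run_scan_test(pattern)
    expected = comb_logic(pattern)
    print("response:", response, "expected:", expected, "pass" if response == expected else "FAIL")
```

In hardware, the unload of one pattern overlaps with the load of the next, which is the "new set of input test vectors is flushed into the system" step in the text.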

Typical designs might have hundreds of scan chains.

The number of scan chains depends on the clock domains. (Normally, it is not preferable to
mix different clocks within one scan chain.)
Example: if there are 10 clock domains, the minimum number of scan chains is 10.

Scan chains test the logic of combinational gates.

Scan chains test sequential flip-flops through the vectors that pass through them.
Any mismatch in the value of the vectors can be found.
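The rule that chains follow clock domains can be sketched as a simple grouping step. The Python code below groups flops by clock, never mixes clocks in one chain, and splits each domain into chains of bounded length, so the chain count is at least the number of clock domains. The maximum chain length and the instance names are hypothetical; real DFT tools also balance chain lengths and handle many more constraints.

```python
from collections import defaultdict


def build_scan_chains(flops, max_chain_length):
    """Group flops by clock domain, then cut each domain into chains of bounded length.

    flops: iterable of (instance_name, clock_name) pairs.
    Returns a list of (clock_name, [instance names]) with one clock per chain.
    """
    if max_chain_length < 1:
        raise ValueError("max_chain_length must be at least 1")
    by_clock = defaultdict(list)
    for name, clock in flops:
        by_clock[clock].append(name)
    chains = []
    for clock in sorted(by_clock):              # never mix clocks inside one chain
        members = by_clock[clock]
        for start in range(0, len(members), max_chain_length):
            chains.append((clock, members[start:start + max_chain_length]))
    return chains


if __name__ == "__main__":
    # Hypothetical design: 10 flops on clk_a, 3 on clk_b, chains limited to 4 flops.
    flops = ([(f"u_a/reg_{i}", "clk_a") for i in range(10)]
             + [(f"u_b/reg_{i}", "clk_b") for i in range(3)])
    for clock, members in build_scan_chains(flops, max_chain_length=4):
        print(clock, len(members), members)
```

With 10 clock domains and no length limit, this produces exactly 10 chains, matching the minimum given in the example above.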

Manufacturing tests: These include functional tests and performance tests. A functional
test checks whether the manufactured chip works according to what was actually designed,
and a performance test checks the performance (speed) of the design.

• After the chip comes back from the foundry, it sits on a load board to be tested. The
equipment that tests these chips is called Automatic Test Equipment (ATE). The socket is
connected to the board, and a mechanical arm is used to place the chip in the socket.

• A test program tells the ATE which vectors need to be loaded. Once the vectors are loaded,
the logic is computed and the output vectors can be observed on the screen. The output
pattern should match the expected output.

• When an error is found, you can figure out exactly which flip-flop output is not right:
through the Automatic Test Pattern Generator (ATPG), we can point out the erroneous FF.

• Failure Analysis analyzes why the chip failed. This is an entirely separate field.

• Fault Coverage: for a given set of vectors, how many faults are covered.
• Test Coverage: the coverage of the testable logic, meaning how much of the circuit can be
tested.

• The latest technology tests the chips at the wafer level (called wafer sort, with probes).
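Both coverage metrics are simple ratios. The Python sketch below uses a simplified definition, assumed here for illustration: fault coverage divides detected faults by all modeled faults, while test coverage leaves out faults that can never be tested (for example, on logic that cannot be made scannable), so it is always at least as high. ATPG tools classify faults in more detail, so check your tool's documentation for its exact formula. The fault counts are hypothetical.

```python
def coverage_metrics(total_faults: int, detected: int, untestable: int) -> dict:
    """Simplified fault coverage and test coverage as fractions in [0, 1]."""
    if not 0 <= untestable < total_faults:
        raise ValueError("untestable must be in [0, total_faults)")
    if not 0 <= detected <= total_faults - untestable:
        raise ValueError("detected cannot exceed the number of testable faults")
    return {
        "fault_coverage": detected / total_faults,                # all modeled faults
        "test_coverage": detected / (total_faults - untestable),  # testable faults only
    }


if __name__ == "__main__":
    m = coverage_metrics(total_faults=10_000, detected=9_500, untestable=300)
    print(f"fault coverage: {m['fault_coverage']:.2%}")   # 95.00%
    print(f"test coverage:  {m['test_coverage']:.2%}")    # 97.94%
```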

A.3 Formal Verification


Formal verification verifies the circuit without changing the logic of the circuit. It is the
de facto standard used in the industry today. Tools that use formal verification include
Conformal (from Cadence) and Formality (from Synopsys). Formal verification is an
algorithm-based approach to logic verification that exhaustively proves functional
properties of a design. The algorithms formal verification uses are:

• Binary Decision Diagram (BDD): a compact data structure for Boolean logic. It can
represent the logic state encoded as a Boolean function.
• Symbolic FSM Traversal

Typically, the following are the types of formal verification:


1. Equivalence Checking: verifies the functional equivalence of two designs that are at
the same or different abstraction levels (e.g., RTL-to-RTL, RTL-to-gate, or gate-to-gate).
It checks combinational and sequential elements (basically, it checks whether two circuits
are equivalent). For sequential elements, it checks whether the specific instance name
occurs in both circuits.
2. Model Checking: Verifies that the implementation satisfies the properties of the
design. Model checking is used early in the design creation phase to uncover functional
bugs.
Compare Point: [Cadence Nomenclature]; it is defined as any input of a sequential
element and any primary output ports.
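To make the BDD idea and its use in equivalence checking concrete, here is a minimal Python sketch of a reduced ordered BDD (ROBDD). Because an ROBDD with a fixed variable order is canonical, two functions built in the same manager are equivalent exactly when they end up as the same node. The three example functions (an "RTL" expression, an equivalent "netlist", and a buggy one) are hypothetical, and commercial tools such as Formality and Conformal use far more elaborate algorithms than this.

```python
class BDD:
    """Minimal reduced ordered BDD (ROBDD) manager.

    Node ids 0 and 1 are the FALSE and TRUE terminals. Variables are tested in a
    fixed order (index 0 first). The unique table guarantees that identical
    sub-graphs are shared, which is what makes the representation canonical.
    """

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = [None, None]   # node id -> (var, low, high); ids 0/1 are terminals
        self.unique = {}            # (var, low, high) -> node id

    def mk(self, var, low, high):
        """Return the node for 'if var then high else low', applying both reduction rules."""
        if low == high:             # the test on var is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:  # reuse an existing identical node if there is one
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, func, var=0, assignment=()):
        """Build the ROBDD of func (a callable over a tuple of bools) by Shannon expansion.

        This enumerates all 2**num_vars assignments, so it only suits tiny examples.
        """
        if var == self.num_vars:
            return 1 if func(assignment) else 0
        low = self.build(func, var + 1, assignment + (False,))
        high = self.build(func, var + 1, assignment + (True,))
        return self.mk(var, low, high)


if __name__ == "__main__":
    bdd = BDD(3)  # variables: a = v[0], b = v[1], c = v[2]
    rtl = bdd.build(lambda v: v[0] and (v[1] or v[2]))              # "RTL": a & (b | c)
    gates = bdd.build(lambda v: (v[0] and v[1]) or (v[0] and v[2]))  # "netlist": (a & b) | (a & c)
    buggy = bdd.build(lambda v: (v[0] and v[1]) or v[2])             # faulty netlist: (a & b) | c
    print("rtl == gates:", rtl == gates)   # True: same node, so functionally equivalent
    print("rtl == buggy:", rtl == buggy)   # False: the functions differ
```

The equivalence check is a single integer comparison once both BDDs exist; the hard part in practice is keeping the BDDs small, which depends heavily on the variable order.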

APPENDIX B: EDA Library Formats


B.1 Introduction

In the EDA industry, a library is defined as a collection of cells (gates). These cells are
called standard cells. Different kinds of libraries are used at different steps in the ASIC
flow. All of these libraries contain standard cells, together with a description of each cell,
such as its number of inputs, logic functionality, and propagation delay. The representation
of each standard cell differs from library to library.

No single library describes all the forms of a standard cell. For example, the library used
for timing analysis contains the timing information of each standard cell, whereas the library
used during physical implementation contains the geometry rules.
Standard cell libraries determine the overall performance of the synthesized logic. A library
usually contains multiple implementations of the same logic function, differing in area and
speed. For example, some of the basic standard cells could be a NOR gate, a NAND gate, an AND
gate, etc.; a standard cell could also be a combination of basic gates. Each library can be
designed to meet specific design constraints. The following are some library formats used in
the ASIC flow:

LEF: Library Exchange Format:


LEF is used to define an IC process and a logic cell library. For example, you would use
LEF to describe a gate array: the base cells, the logic macros with their size and
connectivity information, the interconnect layers and other information to set up the
database that the physical design tools need.

DEF: Design Exchange format:


DEF is used to describe all the physical aspects of a particular chip design including the
netlist and physical location of cells on the chip. For example, if you had a complete
placement from a floorplanning tool and wanted to exchange this information with
another tool, you would use DEF.

EDIF: (Electronic Design Interchange Format)


It is used to describe both schematics and layout.

GDSII: Graphic Data System II:


This file is used by the foundry for fabricating the ASIC.

SPICE: (Simulation Program with Integrated Circuits Emphasis)


SPICE is a circuit simulator used to analyze the behavior of circuits. It is mostly used in
analog and mixed-signal IC design. The input to SPICE is basically a netlist; the tool
analyzes the netlist and performs the requested analyses.

PDEF: Physical Design Exchange Format:


It is a proprietary file format used by Synopsys to describe placement information and the
clustering of logic cells.

SPF: Standard Parasitic Format


This file is generated by the Parasitic Extraction Tool. It contains all the parasitic
information of the design such as capacitance, resistance etc. This file is generated after
the layout. It is back annotated and the timing analysis of the design is performed again to
get the exact timing information.

.lib: Liberty Format


This library format is from Synopsys.

.plib: Physical Liberty Format
This library format is from Synopsys. It is mainly used for Very Deep Sub Micron
Technology. This library is an extension of the Liberty format; it contains extensive,
advanced information needed for Physical Synthesis, Floorplanning, Routing and RC
Extraction

SPEF: Standard Parasitic Exchange Format


Standard Parasitic Exchange Format (SPEF) is a standard for representing parasitic
data of wires in a chip in ASCII format. Resistance, capacitance and inductance of wires
in a chip are known as parasitic data. SPEF is used for delay calculation and ensuring
signal integrity of a chip which eventually determines its speed of operation.

SDC: Synopsys Design Constraints File


SDC is a Tcl-based format for design constraints. It provides the timing information and
other constraints of the design to the tool. The tool uses the SDC file when performing
timing analysis.

SDF: Standard Delay Format


PrimeTime writes out SDF for gate-level simulation purposes. SDF describes the gate
delays and also the interconnect delays. SDF can also be used with floorplanning and
synthesis tools to back-annotate the interconnect delay. A synthesis tool can use this
information to improve the logic structure.
Fragment of SDF file:

(TIMESCALE 100ps)
(INSTANCE B)
(DELAY (ABSOLUTE
  (NETDELAY net1 (0.6)))
(INSTANCE B)
(DELAY (ABSOLUTE
  (INTERCONNECT A.INV8.OUT B.DFF1.Q (:0.6:) (:0.6:))))

In this example, the rising and falling delays are 60 ps (0.6 units multiplied by the time
scale of 100 ps per unit specified in the TIMESCALE construct). The delay is specified
between the output port of an inverter with instance name A.INV8 in block A and the Q
input port of a D flip-flop (instance name B.DFF1) in block B.
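The unit arithmetic in this example can be automated. The Python sketch below reads the TIMESCALE and every INTERCONNECT entry from an SDF fragment and converts the typical value of each rise/fall triple into picoseconds. It is a regex-based sketch for simple, well-formed fragments like the one above, not a full SDF parser; it assumes the typical field of each triple is present and falls back to a 1 ns timescale when TIMESCALE is absent, which is the SDF specification's default.

```python
import re

# Multipliers to convert an SDF time unit into picoseconds.
UNIT_TO_PS = {"fs": 1e-3, "ps": 1.0, "ns": 1e3, "us": 1e6}


def timescale_ps(sdf_text):
    """Return the TIMESCALE in ps (e.g. '(TIMESCALE 100ps)' -> 100.0); default 1 ns if absent."""
    m = re.search(r"\(TIMESCALE\s+([\d.]+)\s*(fs|ps|ns|us)\s*\)", sdf_text)
    if m is None:
        return 1000.0
    return float(m.group(1)) * UNIT_TO_PS[m.group(2)]


def parse_triple(token):
    """Parse '(min:typ:max)', '(:typ:)' or '(value)' into (min, typ, max); empty fields -> None."""
    body = token.strip().strip("()")
    if ":" not in body:
        value = float(body)
        return (value, value, value)
    return tuple(float(field) if field.strip() else None for field in body.split(":"))


def interconnect_delays_ps(sdf_text):
    """Return (source, sink, rise_typ_ps, fall_typ_ps) for each INTERCONNECT entry."""
    scale = timescale_ps(sdf_text)
    pattern = r"\(INTERCONNECT\s+(\S+)\s+(\S+)\s+(\([^)]*\))\s+(\([^)]*\))"
    return [(src, dst, parse_triple(rise)[1] * scale, parse_triple(fall)[1] * scale)
            for src, dst, rise, fall in re.findall(pattern, sdf_text)]


if __name__ == "__main__":
    fragment = """(TIMESCALE 100ps) (INSTANCE B) (DELAY (ABSOLUTE
    (INTERCONNECT A.INV8.OUT B.DFF1.Q (:0.6:) (:0.6:))))"""
    for src, dst, rise, fall in interconnect_delays_ps(fragment):
        print(f"{src} -> {dst}: rise {rise:.1f} ps, fall {fall:.1f} ps")  # 60.0 ps each
```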

CUSTOMER EDUCATION SERVICES

IC Compiler II:
Block-level Implementation
Workshop
Lab Guide
20-I-078-SLG-010 2019.03-SP4

Synopsys Customer Education Services


690 E. Middlefield Road
Mountain View, California 94043

Workshop Registration: https://training.synopsys.com


Copyright Notice and Proprietary Information
© 2019 Synopsys, Inc. All rights reserved. This software and documentation contain confidential and proprietary
information that is the property of Synopsys, Inc. The software and documentation are furnished under a license
agreement and may be used or copied only in accordance with the terms of the license agreement. No part of the
software and documentation may be reproduced, transmitted, or translated, in any form or by any means, electronic,
mechanical, manual, optical, or otherwise, without prior written permission of Synopsys, Inc., or as expressly provided
by the license agreement.

Destination Control Statement


All technical data contained in this publication is subject to the export control laws of the United States of America.
Disclosure to nationals of other countries contrary to United States law is prohibited. It is the reader's responsibility to
determine the applicable regulations and to comply with them.

Disclaimer
SYNOPSYS, INC. AND ITS LICENSORS MAKE NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
WITH REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Trademarks
Synopsys and certain Synopsys product names are trademarks of Synopsys, as set forth at
https://www.synopsys.com/company/legal/trademarks-brands.html
All other product or company names may be trademarks of their respective owners.

Third-Party Links
Any links to third-party websites included in this document are for your convenience only. Synopsys does not endorse
and is not responsible for such websites and their practices, including privacy practices, availability, and content.

Synopsys, Inc.
690 E. Middlefield Road
Mountain View, CA 94043
www.synopsys.com

Document Order Number: 20-I-078-SLG-010


IC Compiler II: Block-level Implementation Lab Guide
Instructions

0 IC Compiler II GUI

Learning Objectives

This lab’s purpose is to familiarize you with the IC Compiler II GUI.

After completing this lab, you should be able to:


• Invoke and exit the IC Compiler II GUI
• Navigate the layout view
• Control object and layer visibility
• Select and query layout objects
• Control the view level
• Rearrange panels in the GUI
• Use the Recent menu and Favorites
• Use Command Search, as well as help and man to get
help and additional information about commands and
options

Lab Duration:
45 minutes

IC Compiler II GUI Lab 0-1


Synopsys 20-I-078-SLG-010
Lab 0

Task 1. Launch IC Compiler II

1. Log in to the Linux environment with the assigned user id and password.
2. From the lab’s installation directory, change to the following working
directory and invoke IC Compiler II:

$ cd lab0_gui
$ icc2_shell

The Linux terminal prompt becomes icc2_shell>, the IC Compiler II shell command prompt.
3. Some of the Linux commands also exist at the IC Compiler II prompt. Have a
look at your current directory:

icc2_shell> ls

You will see that command and output log files were created
(icc2_shell.cmd.* and .log.*, with date/time). The .cmd file
records all commands, including initialization commands invoked during
start-up. The .log file records commands and command output after tool
start-up. In addition, there is an icc2_output.txt file that also contains all
output. Do not spend too much time looking at the log file contents.
Note: Log/cmd file naming is defined through variables in the
initialization file, .synopsys_icc2.setup.

4. Start the GUI:

icc2_shell> start_gui

The IC Compiler II BlockWindow opens.


5. Click on the symbol to open an existing block:

6. In the dialog that opens, click on the yellow symbol in the top-right
corner to choose a design library called “ORCA_TOP.dlib”.
Design libraries are marked with this symbol:
7. Now that a library has been chosen, select “ORCA_TOP/placed” from the
list at the bottom of the Open Block dialog. Click OK. You will be presented
with the layout view of the design.

Synopsys IC Compiler II: Block-level Implementation Workshop

8. Enlarge or maximize the GUI window. You will see that the layout view
adjusts to fit the larger window.

You are looking at the layout of a placed design: Macros are placed at pre-
defined locations, the power mesh (vertical and horizontal VDD/VSS straps)
has been completed, and standard cells have been placed.
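If you prefer the command line, you can also open the block from the icc2_shell> prompt instead of using the Open Block dialog. The following is a minimal sketch. It assumes you are still in lab0_gui and that ORCA_TOP.dlib is present there. It uses the same library:block/label form that Lab 2 passes to open_block.

```tcl
# Open the placed block without using the Open Block dialog.
# Name format: <design_library>:<block>/<label>
open_block ORCA_TOP.dlib:ORCA_TOP/placed

# Print the name of the block that is now current.
echo "Current block: [get_object_name [current_block]]"

# Open the GUI (needed only if it is not already running).
start_gui
```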

Task 2. Navigating the Layout View

1. Spend a few minutes to become familiar with the zoom and pan buttons.
Hint: A short, descriptive ToolTip will pop up when a mouse pointer is held
motionless over a button.

To exit the zoom and pan mode press the [Esc] key or pick the Selection Tool
(the arrow icon). The cursor returns to an arrow or pointer shape.

2. Use hotkeys: Lower-case [F] or [Ctrl F] both correspond to zoom fit all (or
full view), for example. [+] or [=] is zoom-in 2x, [-] is zoom-out 2x.
3. Find out about other hot key definitions by selecting the pull down menu
Help → Report Hotkey Bindings. A new view appears, listing the hot key
definitions. When done, close the Hotkeys view by clicking on the tab’s “X”.

Note: Hotkeys can be defined by using gui_set_hotkey. In the
interest of time, do not attempt this presently.

4. You can also use mouse strokes to pan and zoom, instead of using GUI
buttons or keyboard hotkeys. Try using strokes as follows:
Zoom in on an area of interest: Lower-case [Z], draw an area and then [Esc].
Now click and hold the middle mouse button while moving the pointer
straight up or down and holding it there. The stroke menu appears near the
pointer:

Release the middle button and the design should zoom to fit the display
window (Zoom Fit All). To zoom in on an area, stroke (move the mouse with the
middle button held down) at a 45° angle upward, to the left or right; the
view should zoom in to a rectangular area defined by the stroke line. Stroking
45° downward zooms out. Stroking in the east/west direction pans the display
such that the start point of the stroke is moved to the center of the window.
Note: You can query or define your own strokes by using the
commands get_gui_stroke_bindings and
set_gui_stroke_binding.

5. The keyboard arrow keys can also be used to pan the display
up/down/left/right. Try it.
6. If your mouse has a scroll wheel, it can be used to zoom in/out (2X or ½X)
around the area of the mouse’s pointer.

7. In the bottom-right of the BlockWindow, click on the Display overview of
view icon.

The Overview panel is a miniature version of the layout view. Zoom in to an
area in the layout, and you will see a yellow box outlining that area in the
overview panel. You can move and resize that box within the overview panel
and the layout view will adjust accordingly. Note that the overview window is
on-demand – it closes once you click anything in the full layout window.

Task 3. Controlling Object and Layer Visibility

You can control what types of objects are visible and/or selectable through the View
Settings panel. In the following steps you will turn on visibility to some key objects
one at a time, to clearly see what they represent.

1. Make sure that under the Options button, Auto apply is checked: This way
changes are applied immediately without having to confirm with the Apply
button each time.
2. In the visibility column uncheck Route.
3. Check Pin. The input, output and power
pins of the cells are displayed.
4. Uncheck then check Labels. This controls
cell name visibility.
5. Zoom in to one of the SRAM macro pins.
Expand Labels by clicking on the + icon on
the left. Check Pin. Pin names are now
visible as well.
6. In selection mode (press [ESC]), draw a
box to fully enclose a few macros. This
selects all selectable objects inside the box,
which are highlighted in white.
7. Make pins un-selectable by unchecking the
box in the selection column. Draw
the same box again to see that only the
macros are selected.
8. Check Route. All routes are displayed.
Expand Route and you will find that you
can control visibility of routes by Net Type.
(e.g. Clock, Ground, etc.).
9. Save the view settings. Click this button
and select Save preset as…
Type MyPreset into the preset name field
and click OK. ICC II creates a file
MyPreset.tcl in the
~/.synopsys_icc2_gui/presets/Layout
directory. All saved presets will be loaded
automatically whenever ICC II is started
again.

10. Select Default from the pull-down list to
restore the default settings. Fit the view to
the window [F].

11. Select the Layers tab, which can be used
to fine-tune the routing visibility further
on a layer-by-layer basis.

12. To list only the actual routing layers in the
Layers tab, click on the filter button and
uncheck “All Layers” first to clear
everything, then check “Routing”. Click
the green “check” mark to apply the
filters.
13. Just as with the Objects, you can control
visibility as well as selectability of the
routing layers by using the checkmarks.
14. In addition, you can click on the
individual items for finer control. For
example, click on the up/down arrow next
to M4 to toggle track visibility.
In addition, you can control the visibility
of Shape, Via, Terminal, etc.
(terminals are the physical pins of the top-
level block ports)

15. Zoom into the layout and turn the
visibility of various layers off and on, to
get a better understanding of the PG mesh
(made up of wide horizontal M7, and
vertical M8 straps, as well as narrow
vertical M2 straps).

16. Notice that there are more columns which
you can control. Use the horizontal scroll
bar to see them all.

17. Re-apply the Default settings by clicking the “reload” circular arrow.

Task 4. Querying and Selecting Objects

1. To be able to query or select objects, the mouse cursor must be an arrow,
which denotes select mode. If your cursor is not in select mode, either click the
arrow (Select Tool) button or press the [Esc] key.
2. Back in the Objects tab, uncheck Route Visibility.
3. Hover the pointer over an object without clicking the mouse. The object is
highlighted with dashed white lines, and an InfoTip box appears in the
bottom-left, displaying some key attributes of the object.
4. Now select a single object with a left mouse click. The selected object is
highlighted with solid white lines, and remains highlighted until de-selected,
or a different object is selected. Keep the object selected.
5. While one object is selected, hover the mouse cursor over a different object.
Notice that the InfoTip box displays information about the dashed white line
object, not the selected (solid white line) object.
6. To obtain a full query of a selected object press lower-case [Q]
or use the menu: Select → Query Selection. A query panel lists
all the attribute values of the selected object.

7. You can return to the Objects tab of the View Settings panel by selecting the
View Settings tab.
8. Deselect all objects by either clicking on an empty area in the layout, by using
the menu Select → Clear, or by typing [Ctrl D].
9. Select multiple objects in the same area with a left button drag-and-draw. All
selectable objects within the drawn rectangle are selected.
10. Keep what is selected and select additional objects by holding down the [Ctrl]
key while selecting with the left mouse click.
11. Enable Route visibility, zoom into a small area, then click on an area with
multiple objects stacked on top of each other (for example, a via connecting a
horizontal and a vertical metal route). Notice that one object will be selected
(solid white), while a different object will be queried (dashed white). The
InfoTip box goes with the dashed object.
12. Cycle through the stacked objects by repeatedly clicking the left mouse
button, without moving the cursor: Notice that both the solid and dashed line
objects cycle. Alternatively, press F1 to cycle through just the object queries.
13. If it is difficult to notice the highlighted objects among other bright objects, it
is possible to reduce the brightness of the unselected objects, thereby
increasing the contrast. A Brightness control is located at the top of the
View Settings panel.
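You can also make and inspect selections from the shell, which helps when objects are hard to click. The following is a sketch that assumes the ORCA_TOP block from this lab is open. It reuses the is_hard_macro filter that appears later in this guide with set_fixed_objects. Picking the first two macros is arbitrary.

```tcl
# All hard macros in the block.
set macros [get_flat_cells -filter "is_hard_macro"]

# Select one macro, then add a second one, like a click followed by [Ctrl]+click.
change_selection [index_collection $macros 0]
change_selection -add [index_collection $macros 1]

# Report what is currently selected.
set sel [get_selection]
echo "Selected objects: [sizeof_collection $sel]"
foreach_in_collection obj $sel {
    echo "  [get_object_name $obj]"
}

# Deselect everything, like [Ctrl D].
change_selection -remove [get_selection]
```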

Task 5. Controlling the View Level

By default, the GUI displays all shapes at the current block level, e.g. metal shapes,
blockages, pins etc. To see shapes inside standard cells or hard macros, you have to
increase the view level, and enable them to be “expanded”.

1. Disable the visibility of all Route objects.

2. To better see the objects inside the
macros, remove the fill pattern of the
macro shapes:
In the Objects tab of the View Settings
panel, expand Cell, then click on the
aquamarine outline pattern of Hard
Macro.

In the Select Style dialog, under the Fill pattern,
select the black (no fill) pattern, then click OK.

3. Increase the viewing Level from 0 to 1, then click on the Hierarchy Settings
button, and check Hard Macro.

If you zoom into one of the RAMs you should see the structures inside the
macro. You can turn layer visibilities off/on to see individual layers. Since
these are frame views, what you are seeing are mostly routing blockages.
4. You can also interact with structures within cells: Click on the Multiple Levels
Active button. Now you can hover over
objects within cells, and analyze them.
This functionality is used extensively
in hierarchical design planning, where
you can perform physical manipulation at multiple physical design levels at
the same time.

Task 6. Rearranging Panels

As you may have noticed, you can switch back and forth between the panels (or
tabs) on the right. Sometimes it can be useful to display more than one panel at the
same time. This is particularly useful if you have a lot of screen real estate.

1. The query panel should still be open. If not, select a macro then press [Q] to
bring it up again.
2. Right-click on the query tab –
a context menu should be
displayed.
3. From that context menu click on
the “Split ‘Query’ Below” button.
The panel area on the right will
now split into a top and a bottom
area, with the query panel being
in the bottom area.
4. Now that you have two panel
areas, you can drag and drop tabs
from one area to the other. For
example, try dragging the
Favorites tab (explained in the
next Task) from the top to the bottom area. Of course, you can also rearrange
the tabs within the same panel area using drag and drop.
5. When you right-click on the empty area under the tabs, or on a tab (as you’ve
done above), you can also open new panels, for example the Property Editor,
which can be used to display or change attributes of selected objects.
6. Try the other menu banner icons, like Detach, Dock and Collapse. Also note
that you are not limited to just two panel areas – you can split the panels
again.

Task 7. Timing Analysis

Have a look at the timing of this design. Use the menu Window → Timing Analysis
Window. In the new window that appears, press the Update button.

Once timing is up to date, you will see a histogram for the timing distribution.
Violating paths show up as red bars on the left, and positive slack paths are green.

Click on the left-most red bar and you should see timing end-points displayed on the
bottom. Select the first entry, then right click. From the context menu, choose
“Select Worst Paths”. You will see the timing path in the layout view.
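You can run the same check from the shell. This is a sketch, not the lab's procedure:

- update_timing plays roughly the role of the Update button.
- report_qor -summary gives a compact text overview to set beside the histogram.
- report_timing prints the worst setup path.

```tcl
# Bring timing up to date (roughly what the Update button does).
update_timing

# Compact QoR summary of the design's timing.
report_qor -summary

# Worst setup (max-delay) path in the design.
report_timing -delay_type max -max_paths 1
```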

Task 8. Using the Recent Menu and Favorites

Whenever you perform a GUI function, like viewing a
congestion map or analyzing timing, this function can
be repeated by using the Recent pull-down menu
(top-left in the GUI, under Select Tool).

For certain functions that you use over and over, it might
make sense to add them to your Favorites.

Pull down the Recent menu, then right-click on a function,
and select Add to Favorites.

You will see the added function listed in the Favorites pane.

Task 9. Getting Help

1. The quickest way to get help is to use Command Search. Access Command
Search by using the keyboard shortcut lower-case [H] or by clicking on the
magnifying glass at the top of the block window, on the right end of the
menus.
2. In the “Search for commands” field, begin typing “place”. Notice that as you
type, ICC II lists all the menus, commands and application options that relate
to that search string.
3. Select one of the Application Options, for example,
place.coarse.congestion_analysis_effort.
A second window will open, titled “Application Options”.
In this window you can change the settings of application options, or you can
search for them. Explore this window a little to become familiar with it.
Application options are used to configure various aspects of how
IC Compiler II works. You will learn more about them in the next lecture.
4. IC Compiler II supports command, variable, file name and command option
completion through the [Tab] key. Try the following in the console at the
icc2_shell> prompt:

h[Tab]e[Tab] he[Tab] -v[Tab] [Enter]

5. To view the man page on a command or application option you need to enter
the exact command or variable name. Alternatively, you can enter the starting
characters of a command and use command completion to find the rest (auto-
completion is not available for application options). If you are not sure what
the exact name is, use help for commands, use get_app_options or
report_app_options for application options, and use printvar for
variables, along with the * wildcard. Here are some examples:
Let’s say you are looking for more information about the clock tree synthesis
command, but you do not remember the exact command name. You know it
contains the string “syn” (for synthesis). To list all commands that contain this
string enter:

help *syn*

From the displayed list of commands, you pick out the one you are interested
in, namely, synthesize_clock_trees.
Of course, you could have entered syn in Command Search as well.

6. From the icc2_shell> prompt, to list the available options for
synthesize_clock_trees (note that you can also use tab completion on
the synth* command after help):

help synthesize_clock_trees -verbose

or
help synthesize_clock_trees -v
or
synthesize_clock_trees -help

7. To get a full help manual page – a detailed description of the command and
all of its options, type:

man synt[Tab]c[Tab]
or
man synthesize_clock_trees

8. Now let’s say you need help on a specific application option that pertains
to CTS, but again, you don’t remember its exact name. To list all application
options that start with “cts”, enter:

report_app_options cts*

From the list you can identify the option of interest.

Notice that the report_app_options command also lists the current value
of each option. This report output will be explained in an upcoming lecture.
9. To get a full help manual page of the application option, type the following
([Tab] completion does not work on application options):

man cts.compile.enable_cell_relocation

10. You can also get additional help for an error or warning message, using the
unique message code, for example:

man ZRT-536

11. Finally, specifically for this workshop, we are providing a few custom
functions for the shell that can simplify your work. They are defined in the file
../ref/tools/procs.tcl. Try the following:

aa syn

The aa function searches through commands, application options and
variables that match the given string. In addition, it shows the current value.
If you want to redirect content to a separate window, try these:

v man ZRT-536
v aa cts

v is a user-defined alias that calls the “view” custom function, which is useful
for viewing long reports and supports regular expression searches.
12. To list the workshop-provided helper functions and aliases, type “ces_help”.

13. Quit IC Compiler II by using the menu File → Exit, or by typing exit at the
command prompt, and selecting Discard All.
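For later reference, the help and application-option queries from this task can be run together in an open icc2_shell session, as sketched below. The report file name cts_app_options.rpt is only an example.

```tcl
# Current value of one application option, by exact name (step 3).
get_app_option_value -name place.coarse.congestion_analysis_effort

# All application options starting with "cts", with current values (step 8).
report_app_options cts*

# Save the same report to a file so it can be read outside the shell.
redirect -file cts_app_options.rpt {report_app_options cts*}

# Wildcard searches for commands (step 5) and variables.
help *syn*
printvar *search_path*
```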

You have completed the IC Compiler II GUI lab.

2 Floorplanning

Learning Objectives

The purpose of this lab is to familiarize you with the basic
block-level floorplanning operations in IC Compiler II.

After completing this lab, you should be able to:


• Define a rectilinear block shape
• Place voltage areas inside the block
• Perform placement to determine macro locations
• Define block pin locations
• Analyze congestion and netlist connectivity
• Run PG prototyping
• Source a script that builds a power network using
Pattern-based Power Network Synthesis

Lab Duration:
35 minutes

Introduction

ORCA Design
The example design used in all labs is called ORCA which was specifically created
to address the needs of training. The design was created by the Synopsys Customer
Education Services department.

SAED32nm Library
The technology library used in this workshop is a 32 nm library which can be freely
distributed amongst Synopsys customers. It was created by Synopsys Inc. to serve as
a means of demonstrating the various needs of modern high-frequency, multi-
voltage, multi-scenario designs. For more information about this library please visit
http://www.synopsys.com.

This Lab
You will be creating a basic floorplan for the ORCA_TOP design. This will include
creating the initial outline of the block, shaping and placing the voltage areas,
placing the macros, performing analysis on macro placement, and finally
implementing the power network.

Instructions

Answers / Solutions
You are encouraged to refer to the end of this lab for answers and help.

Task 1. Invoke ICC II and Load the ORCA_TOP Block

1. Change your current directory to lab2_floorplan, then invoke
IC Compiler II:

UNIX% cd lab2_floorplan
UNIX% icc2_shell -gui

2. Select the Script Editor tab at the bottom-left of your ICC II window.

3. Click on the “folder” button (see the arrow above) and open the run.tcl file.
This file contains the commands that you will be executing in this lab. Instead
of opening the file in a separate editor and using copy/paste to transfer
commands, you can use the built-in script editor to simply highlight/select
lines you want to execute, and then click on the Run Selection button.
Try this now by selecting the entire line echo "hello world", then
clicking Run Selection. Look at the results in the shell window.

4. Open the block which has been initialized with Verilog, UPF and timing
constraints, ready to be floorplanned. Use the GUI (Open an existing block),
or select/run the following command:

open_block ORCA_TOP.dlib:ORCA_TOP/floorplan

Task 2. Initial Floorplanning using the Task Assistant

For this task you will use the Task Assistant. You do not need to run commands
from run.tcl for the next few tasks.

1. Open the Tasks palette (if not already open) by right-clicking your mouse
anywhere in the tab area on the right (where the View Settings palette is), then
selecting Tasks. In the pull-down menu at the top of the Tasks palette, select
Design Planning.
Note: You could also use the full task assistant by pressing F4, or
using the menu Task → Task Assistant. Then, select the
“>” (Show task navigation tree) icon in the upper-left
corner of the Task Assistant
window, and in the pull-down
menu choose Design Planning.

2. Select Floorplan Preparation → Floorplan Initialization.
This opens the Floorplan Initialization dialog.

3. Use the boundary Type and Orientation pull-down
fields to create the block L-shape shown here (click
on Preview under the black Display Window in the
dialog to double-check your setup):

4. Verify that the Side size control is set to Ratio.

5. Enter the correct Sides ratio numbers so
that side “c” is half the size of sides “a”,
“b” and “d” (which are equal).
Hint: Use whole numbers only.

Core shape should be set to “L-shape”, Orientation should be set to “West”.
Set the 4 parameters as follows: a=2, b=2, c=1, d=2

6. Preview the floorplan, and once it looks correct, specify a Uniform spacing
value of 20 between the core and the die.

7. Click on Apply to generate the initial floorplan and then Close the dialog box.

8. You should now see the initial L-shaped floorplan in the layout view.
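The dialog settings from steps 3 to 6 can probably also be expressed as a single command. This is a sketch under assumptions: the option names and the W orientation value below are believed to match initialize_floorplan, but check man initialize_floorplan before relying on them. If the GUI echoed the command it ran into the shell window or the icc2_shell.cmd log, compare against that.

```tcl
# L-shaped core, "West" orientation, sides a:b:c:d = 2:2:1:2
# (side c is half of a, b and d), and a uniform spacing of 20
# between the core and the die. Option names are assumptions.
initialize_floorplan -shape L -orientation W \
    -side_ratio {2 2 1 2} -core_offset 20
```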

Task 3. Block Shaping

1. From the Tasks palette, select Block Shaping → Block Shaping: This is
used to automatically create (shape) voltage areas.

2. Select Shape Blocks and click
Apply.
After shape_blocks completes,
bring the layout view to the
foreground.

3. Enable visibility of voltage areas by selecting Voltage Area from the View
Settings → Objects panel and expand Voltage Area to enable Guardband
visibility as well.
You should find that there are two voltage areas displayed: PD_RISC_CORE
in the top-right corner, and DEFAULT_VA for the remaining area.
You should also see that the macros in DEFAULT_VA have been placed
(although not very carefully), but not the macros in PD_RISC_CORE: the
four macros inside that voltage area are stacked on top of one another.
This is normal. Only the top-level macros are placed at this time; macros
belonging to non-default voltage areas will be placed in Task 4.
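From the shell, this task amounts to running shape_blocks and then listing the voltage areas. The sketch below assumes shape_blocks with default options matches what the Tasks palette runs. Whether get_voltage_areas also returns DEFAULT_VA is another assumption to verify.

```tcl
# Shape the voltage areas (the Shape Blocks + Apply step above).
shape_blocks

# List the voltage areas; the GUI shows PD_RISC_CORE and DEFAULT_VA here.
foreach_in_collection va [get_voltage_areas] {
    echo [get_object_name $va]
}
```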

Task 4. Macro and Standard Cell Placement

1. From the Tasks palette, select Cell Placement → Cell Placement.

Note: There is a Cell Placement → Hard Macro
Constraints entry that can be used to apply macro keepout
margins. We are skipping this, for simplicity.

2. In the Cell Placement dialog, check the box next to Use floorplanning
placement.

3. Run placement by clicking on Apply.

4. Enable pin visibility and examine the placement of the macros in the GUI.

You should see that the placer has automatically created channels between the
macros and has added soft placement blockages in the narrow channels, to
reduce congestion and improve routability. The more pins along a macro’s
edge, the larger the channel width.

You should also see that the macros have been flipped so that the sides with
common pins face each other. This is done to minimize the overall number of
channels that require routing and possibly buffering between macros.
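Here is a command-line version of this task, as a sketch with two assumptions:

- create_placement -floorplan is what the Use floorplanning placement checkbox runs.
- origin and orientation are the cell attributes that hold a macro's location and orientation.

```tcl
# Floorplan-style placement of macros and standard cells (assumed equivalent).
create_placement -floorplan

# Print where each hard macro landed and how it is oriented.
foreach_in_collection m [get_flat_cells -filter "is_hard_macro"] {
    echo "[get_object_name $m] [get_attribute $m origin] [get_attribute $m orientation]"
}
```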

Task 5. Place the Block Ports

1. You could also use the Task Assistant to place the pins of the block, but this
time just select/run the corresponding commands from run.tcl:

set_block_pin_constraints -self -allowed_layers {M3 M4 M5 M6}
place_pins -self

2. Have a look at the layout; you should see that all the block’s ports, the logical
representations of the physical pins, have been placed.

3. Zoom in to individual ports to verify that the pins have, indeed, only been
placed on layers M3-M6.
Note: To be able to select the physical block “pins”, turn on
Terminal selectability (terminals are the physical pins of
the current block).

4. Search for the *clk ports/terminals and zoom into their location. See
run.tcl for hints.

5. Apply the commands from run.tcl to create a pin guide. This will constrain
all the specified clock pins to be placed in a certain area. Afterwards, turn on
pin guide visibility by turning on Guide → Pin Guide from the View Settings
→ Objects panel, or apply the command from the script. To see the pin guide,
zoom in to the highlighted area as shown below.

6. Change the width of the *clk ports (the terminal shapes) to 0.1 and the length
to 0.4, then rerun pin placement. You should find that all clock pins are now
located inside the pin guide, and that they have been resized as specified.
This was just one example of applying pin constraints. Review the man page
to see what other adjustments are possible.
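You can also script steps 3, 4 and 6. This sketch is hypothetical and is not the content of run.tcl, which holds the lab's reference commands. It makes three assumptions:

- set_individual_pin_constraints with -width and -length is the per-port counterpart of set_block_pin_constraints.
- get_terminals accepts ports through -of_objects.
- layer_name is a terminal attribute.

```tcl
# The clock ports matched by *clk (step 4).
set clk_ports [get_ports *clk]
echo "Clock ports: [get_object_name $clk_ports]"

# Resize their terminal shapes (width 0.1, length 0.4), then re-place pins (step 6).
set_individual_pin_constraints -ports $clk_ports -width 0.1 -length 0.4
place_pins -self

# Print the layer of each clock terminal; all should be in M3-M6 (step 3).
foreach_in_collection t [get_terminals -of_objects $clk_ports] {
    echo "[get_object_name $t] [get_attribute $t layer_name]"
}
```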

Task 6. Congestion Map

Now that macros and standard cells have been placed, as well as the block ports, it
is a good idea to check if there are any congestion issues.

1. Bring up the congestion map by selecting the
“Global Route Congestion” entry from the
GUI maps pull-down menu.

2. Select “Reload”, then confirm the dialog
that appears by pressing OK.

3. After global routing completes, you should see that
the layout has been updated to display the heat map.
You will not see any congestion to speak of: from
the colored histogram (overflow versus number of GRC edges),
most “overflows” are 1, and very few overflows are greater than 2.

4. To make the display a little more interesting, try the following steps.
Change the Bins/From/To settings as shown in the following screenshot:
Bins has been reduced from 9, and From/To has been
changed from 0/7. Now, overflows from -2 to 2 have
their own individual bins; all overflows greater than 2
are grouped into a combined bin, and the same applies to
overflows less than -2.

This should drastically change what you see, by making the low overflows
brighter/hotter: You will see that there are a lot of areas with 0 overflow.
Display changes do not alter the results: The design has no serious congestion
issues. If you want to see a detailed calculation of the overflow for an edge,
move the mouse over that edge and you will see a popup with all the details,
as shown here:

5. Close the congestion map, either by clicking on the Draw Global Route
Congestion … icon in the top banner, or by closing the palette.
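You can probably also produce the congestion data without the GUI. The sketch below rests on two assumptions: that route_global -congestion_map_only true corresponds to the Reload action, and that report_congestion summarizes the resulting GRC overflow. Check both man pages.

```tcl
# Global route used only to build the congestion map.
route_global -congestion_map_only true

# Text summary of global-route (GRC) overflow.
report_congestion
```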

Task 7. Analyze Macro Placement using DFF

Data Flow Flylines (DFF) is a feature that allows you to analyze not just simple pin-
to-pin connections, but connections that cross through combinational gates as well
as registers. This enables a very insightful view of your overall design and makes it
easier to make informed decisions about macro placement which has a large impact
on standard cell placement.

1. Bring up Data Flow Flylines by selecting the
appropriate entry from the connectivity
pull-down, located to the left of the maps pull-down.

2. Select Reload. In the dialog that appears you
can configure the tracing behavior of DFF. For
example, you can specify how many register
levels to trace through. The higher this number,
the longer the calculations will take. For our
purposes, accept the defaults and click on OK.

3. Once this completes, select Macros and Ports for items to
Include in the Data Flow Flylines analysis, and click on
Apply (highlighted in blue once you make changes) to accept
the changes.
You can also change the configuration of this panel and
choose “Auto apply”, so the changes take effect immediately.

4. Now select one macro to see its connections to other macros and ports. Limit
the tracing by checking “Number of registers” or “Number of gates” and
changing the Min/Max numbers. Using this method, you can quickly figure
out whether objects are connected directly, or through several gate levels, or
through several register levels. You can select several macros at the same time
using the Control key. If you click on a flyline, you will get detailed
information about the connection(s).

5. Note that DFF will not show you any flylines if they terminate at a register.
You will only see flylines between macros or between macros and ports,
depending on the selection you have made under “Include”. In the next Task
we will show you how to perform register tracing.

6. Close the DFF palette.

Task 8. Register Tracing

1. Select Register Tracing from the connectivity pull-down, located right under
the Data Flow Flylines entry.

2. Select any macro. The registers connecting to that macro will be highlighted
immediately, skipping any logic that may be in between. Select Show flylines
to see the flylines between the macro and registers.

3. To see the next level of registers, increase Max levels to 2 under the Limit
Tracing heading, and under Show Levels, check Level 2. If there is a second
level of registers they will be highlighted in a different color.

4. Under Highlight, you have the option to also display End points: These are the
endpoints from the last level of registers displayed. You can also display
Direct end points, which are endpoints that connect directly (level 0) to the
selected source, i.e. our macro.

5. Close the Register Tracing palette.

6. We will assume for this lab that the macros are placed to our satisfaction, so
the next step is to fix their location. You can do this by selecting the macros
and clicking the corresponding toolbar icon, or by entering:

set_fixed_objects [get_flat_cells -filter "is_hard_macro"]
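To confirm that the macros really are fixed, you can query their placement status. The following is a sketch that assumes the physical_status attribute reports fixed after set_fixed_objects. Verify the attribute name in your release if the output looks empty.

```tcl
set macros [get_flat_cells -filter "is_hard_macro"]

# Per-macro status; each line should end in "fixed".
foreach_in_collection m $macros {
    echo "[get_object_name $m]: [get_attribute $m physical_status]"
}

# Count the macros that are not fixed; 0 means the step worked.
set not_fixed [filter_collection $macros "physical_status != fixed"]
echo "Macros not fixed: [sizeof_collection $not_fixed]"
```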

Task 9. PG Prototyping

PG prototyping can help you create a basic PG mesh very quickly. All you need to
specify are the PG net names, the layers and the percentage of the layers you want
to use for PG routing. This is most commonly used as a place-holder for the final
power mesh, in order to check congestion issues that a PG mesh may cause.

1. From the Tasks palette, select PG Planning → PG Prototyping.

2. Click the Default button in the PG Prototyping dialog to show the default
settings. Notice that, by default, the upper-two metal PG layers from the
technology file, MRDL and M9, are listed to be used as the vertical and
horizontal PG layers, respectively.

3. Specify M8 and M7 instead of the default vertical and horizontal layers,
respectively, using the pull-down menus.

4. Click on Apply. In a couple of seconds, you should see a basic PG mesh
inserted, which does not route over hard macros.

5. You can remove the PG mesh by clicking on Remove PG Routes.

6. If you like, play around with the layer and percentage parameters to test
different PG mesh configurations. Remove PG routes before re-applying new
parameters.

In the next task you will build the final mesh using a pre-written script.


Task 10. Power Network Synthesis

Planning an entire power network for a design can be a complex task, especially in a
multi-voltage environment. The purpose of this task is simply to give you an idea of
what is possible in IC Compiler II using pattern-based power network synthesis
(PPNS), and to demonstrate how few commands it takes to create a full
multi-voltage PG mesh.

1. Quickly review the script that inserts the entire power structure. Open the file
scripts/pns.tcl in an editor (or in the script editor!).

After first deleting any pre-existing PG mesh structures, you will see how the
patterns are created (create_pg_*_pattern), followed by the strategies
(set_pg_strategy). Once both are defined, the strategies are implemented
using compile_pg.

2. Source the script:

source scripts/pns.tcl

3. Once the script has completed, review the power mesh, the macro PG
connections, and the standard cell rails in the layout view.
Note: The power mesh will have a few issues here and there,
which will have to be taken care of for final implementation.

4. Save the library and exit IC Compiler II:

save_lib
exit

You have completed the floorplanning lab!

3 Placement and Optimization

Learning Objectives

The purpose of this lab is to familiarize you with the
placement capabilities in IC Compiler II.
You will perform placement on the ORCA_TOP design, after
applying a number of settings. Reports will be generated to
monitor and track the progress of the design.

After completing this lab, you should be able to:


• Check the readiness of the design for placement
• Apply settings for place_opt
• Perform placement and optimization
• Analyze timing and design quality of results (QoR)

Lab Duration:
60 minutes

Synopsys 20-I-078-SLG-010
Lab 3

Instructions

Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers, or to
obtain help with the execution of some steps.

Task 1. Load and Check the Initial Design

1. Change to the work directory for the placement lab, then start ICC II in GUI
mode and execute the load script:

UNIX% cd lab3_place
UNIX% icc2_shell -gui -f load.tcl

The script copies the block that completed design setup, called
ORCA_TOP/init_design, to a block called ORCA_TOP/place_opt, and
opens the copied block.
2. Generate a timing QoR summary.

report_qor -summary

As expected, you should see a high WNS/TNS, since no placement or
optimization has been performed yet.

Task 2. Place and Optimize the Design

To support a more immersive and interesting lab experience, there are no step-by-
step instructions for this lab.
Instead, you are asked to open the file run.tcl in the ICC2 script editor and
execute the commands line by line. Please do not simply source the entire file,
as this would defeat the purpose, which is to understand how all the options and
commands work together. If an option does not make sense, look at its man page.
The following sections provide additional information and comments. The sections
are ordered in such a way that you can refer to them as you go through the script.
If you like, you can diverge from the commands in run.tcl, or you could try
different efforts, different settings etc. Note though that the runtimes might vary.
Make sure you only run the place_opt command once, as the runtime is relatively
high.


The following section is designed to be a guide through the lab. It contains
information on the items that must be configured and run, and some additional
questions.
If you need help, talk to your instructor.

Pre-placement Checks
Before performing placement and optimization, it is best to ensure that there are no
issues that will prevent place_opt from doing its job.

Question 1. Are there any remaining ideal nets?

...................................................................................................

Question 2. What is the maximum routing layer set for the block?

...................................................................................................

The design contains scan chains that are specified using SCANDEF as part of the
data setup step.
Question 3. How many scan chains exist in the design?

...................................................................................................

It can be important to establish whether the design has any high-fanout nets, or nets
with a certain fanout. In ICC II, this is analyzed using report_net_fanout. Use
the commands shown in run.tcl to answer the following question:
Question 4. How many non-clock high fanout nets exist with a fanout
larger than 60?

...................................................................................................

Pre-placement Settings
For technologies of 12 nm and below, you should use the set_technology
command to configure ICCII. This command changes several application options to
support the given technology.
To insert tie cells during place_opt, library tie cells must not have the
dont_touch attribute applied, and they must be included for ‘optimization
purpose’.
Question 5. What command/option is used to include cells for
optimization?

...................................................................................................


Leakage + Dynamic Power Optimization Settings


Leakage and dynamic power optimization are not performed by default during
place_opt. You need to enable the appropriate power optimization (leakage,
dynamic or total) and make sure that at least one scenario is enabled for leakage
and/or dynamic power, if applicable.

Logic Restructuring
To use advanced logic restructuring, set the application option
opt.common.advanced_logic_restructuring_mode. For example, to add
power-oriented restructuring, set it to power.
The other choices are area, area_timing, timing and timing_power.

Layer Optimization / Route Driven Extraction (RDE)


Layer optimization identifies long and timing critical nets, then promotes them to
higher, low-resistance metal layers. The assigned min/max layer constraints from
place_opt are carried all the way through to post-route optimization.
In 2019.03, this optimization is on-by-default and cannot be disabled.
Earlier releases used a different, less optimal algorithm, which was controlled using
the application option place_opt.flow.optimize_layers.

Using route-driven extraction, global routing is run on the initially placed design to
construct an RDE extraction table. This extraction is used subsequently for all
virtual net extraction which is then used for all pre-route optimizations (place_opt
and clock_opt). This improves the pre- to post-route timing correlation. RDE is
on by default for 16nm and smaller technologies (setting: auto). You can explicitly
turn it on by setting the option to “true”.
Application option: opt.common.enable_rde

ICG Optimization
ICG optimization is useful only if a design is known to have problems meeting ICG
enable setup timing, which is not the case in our design.
In the interest of reasonable run times, it is recommended not to enable this
optimization for this lab.
Application option: place_opt.flow.optimize_icgs


Coarse Placement Density


In release 2019.03, an additional control was added to produce better out-of-the-box
results for the placer. To dynamically adjust the density throughout placement, set
the application option place.coarse.enhanced_auto_density_control to
true.

Placement and Analysis


Make sure the scan chains are optimized during the place_opt run. Verify that the
DFT optimization option is set appropriately.
Question 6. What is the app option for optimizing the scan chains and
what is its default setting?

...................................................................................................

...................................................................................................

The advanced legalizer is generally recommended for 12nm technologies and
below. The lab is based on 32nm technology, but if you want to enable the advanced
legalizer anyway, set place.legalize.enable_advanced_legalizer to
true.
Run place_opt (you may choose to run each stage individually – see the run.tcl
file). If you execute all five stages in a single run, this may take around 10 minutes
to complete, so take a break.
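If you choose to run the stages individually, a small Tcl wrapper can step through them and print a QoR summary after each one. This is a sketch, not an excerpt from run.tcl: the helper name run_place_opt_stages is hypothetical, and the five stage names (initial_place, initial_drc, initial_opto, final_place, final_opto) are an assumption to confirm with man place_opt before use.

```tcl
# Hypothetical helper: run place_opt one stage at a time and report QoR
# after each stage. The stage names are assumed; check "man place_opt".
proc run_place_opt_stages {{stages {initial_place initial_drc initial_opto final_place final_opto}}} {
    foreach stage $stages {
        puts "=== place_opt stage: $stage ==="
        # Using the same stage for -from and -to runs only that stage
        place_opt -from $stage -to $stage
        report_qor -summary
    }
}
```

Calling run_place_opt_stages with no arguments runs all five stages in order. This still counts as your single place_opt run, so do not repeat it afterwards.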
After place_opt completes, check the design for congestion and timing issues.
You should find that there is no congestion to speak of. Turn on visibility of keepout
margins (View Settings → Cell → Keepout Margin) and you will see that all the
channels between RAMs are blocked. This is an important contributor to the good
congestion situation, as well as to the good runtime and timing.
Timing should be met as well, especially if you have enabled the advanced
legalizer.

You have completed the placement lab.


Answers / Solutions

Question 1. Are there any remaining ideal nets?


report_ideal_network reports no ideal networks for all
scenarios. If there are remaining ideal nets, you would have
to remove them using the remove_ideal_net command.

Question 2. What is the maximum routing layer set for the block?
The report_ignored_layers command indicates that
the maximum layer is set to M8.
Although the technology being used has 9 metal layers, we
limit signal routing to metal 1 through 8 (M1 – M8). Metal
7 and 8 are also used for the power mesh, which limits the
available resources. Metal 9 is not used in this design.

Question 3. How many scan chains exist in the design?


8. You can see this information near the end of the
command report_design -summary, or you can use
get_scan_chain_count.

Question 4. How many non-clock high fanout nets exist with a fanout
larger than 60?
There are 10 total nets with a fanout >= 60. By eliminating
any net name containing “clk”, or fanout driver pin
containing “CLK” you should find that there are 2 non-clock
nets with a fanout >= 60.
You can also use the more precise method, shown in
run.tcl, using “-filter net_type==signal”.
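The difference between the name-based filtering and the net_type filter can be shown with a plain Tcl sketch over hypothetical records of the form {net_name fanout net_type}. In ICC II the data would come from report_net_fanout or from net attributes; the record layout, the proc names and the sample nets below are illustrative assumptions, and the sketch only checks net names, not driver pin names.

```tcl
# Each record is {net_name fanout net_type}; the layout is illustrative only.
proc count_by_name {records threshold} {
    # Heuristic: skip nets whose name contains "clk" (case-insensitive)
    set n 0
    foreach rec $records {
        lassign $rec name fanout type
        if {$fanout >= $threshold && ![string match -nocase *clk* $name]} {
            incr n
        }
    }
    return $n
}

proc count_by_type {records threshold} {
    # More precise: keep only nets whose type is "signal"
    set n 0
    foreach rec $records {
        lassign $rec name fanout type
        if {$fanout >= $threshold && $type eq "signal"} {
            incr n
        }
    }
    return $n
}

# Hypothetical records, not taken from ORCA_TOP
set nets {{n_sys_clk 200 clock} {n_rst 80 signal} {gclk_buf 90 clock}
          {n_en 75 signal} {n_data 12 signal} {n_ref 95 clock}}
puts "by name: [count_by_name $nets 60]"
puts "by type: [count_by_type $nets 60]"
```

With the sample list, the name heuristic still counts n_ref, a clock net whose name does not mention clk, which is why filtering on net_type is the more precise method.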

Question 5. What command/option is used to include cells for
optimization?
set_lib_cell_purpose -include optimization

Question 6. What is the app option for optimizing the scan chains and
what is its default setting?
It is opt.dft.optimize_scan_chain and the default is
true


5 Design Setup

Learning Objectives

The purpose of this lab is to familiarize you with design
setup in IC Compiler II.

After completing this lab, you should be able to:


• Create an NDM design library
• Load the netlist, and apply the UPF
• Load floorplan- and scan-DEF files
• Confirm placement site and routing layer settings
• Debug some common design setup mistakes

Lab Duration:
40 minutes

Lab 5

Introduction

The design setup task is the most important step to perform correctly. Faulty setup
can lead to many problems downstream, which can have a large negative impact on
the design schedule. Once design setup is completed, you rarely need to revisit this
task (unless the design or constraints change) and can focus on the more productive
tasks like placement, CTS and routing.

The design setup steps that you will be performing include:

• Creating an NDM design library

• Loading the netlist, floorplan- and scan-DEF files

• Loading multi-voltage design data: UPF and voltage area definition

• Applying/confirming placement site and routing layer settings


Relevant Files and Directories


All files for this lab are located in the lab56_setup directory under your home
directory.

lab56_setup/
  run5.tcl                  Script with all the commands executed in this lab.
  setup.tcl                 Setup variables and general settings.
  .synopsys_icc2.setup      Read by IC Compiler II upon startup.
  ORCA_TOP_design_data/
    ORCA_TOP.v              Verilog gate-level netlist.
    ORCA_TOP.upf            UPF multi-voltage design intent.
    ORCA_TOP.fp/            Floorplan directory written by ICC II write_floorplan.
    ORCA_TOP.scandef        SCAN-DEF scan-chain definition.
../ref/CLIBs                Directory containing the SAED32nm cell libraries.


Instructions

Answers / Solutions
You are encouraged to refer to the end of this lab to verify your answers.

Task 1. Directory Structure and Invoking ICC II

1. Change your current directory to lab56_setup and look at the contents of
the directory:

UNIX% cd lab56_setup
UNIX% ls -al

This directory is shared with the next lab, in which you will complete the
setup process (timing setup).
The rm_setup/ and ORCA_TOP_constraints/ directories, as well as the
run6.tcl file, are not listed on the previous page, because they are not used
in this lab.
2. Invoke IC Compiler II:

UNIX% icc2_shell -gui

3. Open the run5.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
4. Open another terminal window and change your current directory to
lab56_setup. You will use this window to examine the files that will be
used in this lab.


Task 2. Setup Variables and Settings

1. In the terminal window that you just opened, view the setup.tcl file in
the lab56_setup directory.
Question 1. What is the default value of the search_path application
variable?
(HINT:
printvar sear[TAB] or
echo $search_path or
get_app_var search_path or
report_app_var search_path )

...................................................................................................

Question 2. Which commands in run5.tcl will use the search_path
application variable to locate their specified file(s)?
(list just the commands, without their options and arguments)

...................................................................................................

...................................................................................................

Question 3. Which command in run5.tcl uses the user-defined
TECH_LIB and REFERENCE_LIBRARY variables?

...................................................................................................

Question 4. From setup.tcl: How many reference libraries (labeled
“saed32”) are being used?

...................................................................................................

Question 5. From setup.tcl: Up to how many cores are enabled for
multi-threading?

...................................................................................................


2. By looking at the SEE ALSO section at the bottom of the man page of the
set_host_options command, use the appropriate command to answer the
next question (this is a useful technique to find related commands):
Question 6. How many cores are enabled, by default?

...................................................................................................

The suppress_message commands at the bottom of setup.tcl are useful
for suppressing information and warning messages that are expected, and
either not useful or known to be benign. This helps to de-clutter the run log,
making it easier to locate warnings and errors of real concern, when running
scripts in batch mode.
3. Close the setup.tcl file – do not save it.
4. From the run5.tcl file in the ICC II GUI script editor, select and run the
command to source the setup.tcl file:

source -echo setup.tcl

5. Confirm the applied settings:

printvar search_path
printvar REFERENCE_LIBRARY
report_host_options
print_suppressed_messages

You might notice that there are additional suppressed messages beyond the
ones listed in the setup file (CTS-725, POW-001, …). These are suppressed by the
tool by default, to reduce output verbosity. To see these messages, use
unsuppress_message.


Task 3. Design Library, Netlist and UPF

1. Create the design library:

create_lib \
-use_technology_lib $TECH_LIB \
-ref_libs $REFERENCE_LIBRARY \
ORCA_TOP.dlib

You will get the following error message: Error: technology library
' ../ref/CLIBs/saed32_1p9m_tech.ndm' does not match any
library on the reference library list. (LIB-059)

2. Correct the problem in the setup.tcl file, then repeat the necessary
commands until a design library is successfully created.
Question 7. What is the name of the newly-created design library?

...................................................................................................

Question 8. Did the design library show up in the lab56_setup
directory?

...................................................................................................

3. Read the Verilog netlist:

read_verilog -top ORCA_TOP ORCA_TOP.v

You will see another error message:


Error: File 'ORCA_TOP.v' cannot be found using
search_path of: '. scripts ORCA_TOP_constraints'.
(FILE-002)

4. Using the file and directory structure shown on page 3, correct the problem,
then repeat the necessary commands until the netlist is read without errors.
5. If, as in our case, the GUI is running, explicitly linking the block is not
needed, since it is done automatically. You can, therefore, skip the following
command. Linking ensures that all instantiated references can be found.

link_block


You will see the following warning messages:


Warning: Unable to resolve reference to 'SRAMLP2RW32x4'
first referenced from module 'PCI_TOP'. (LNK-005)
Warning: Unable to resolve reference to 'SRAMLP2RW64x8'
first referenced from module 'CONTEXT_MEM'. (LNK-005)
...

A linking problem occurs when the list of reference library pointers attached
to the design library (specified by create_lib -ref_libs ...) is
missing the library that contains one or more reference cells that are
instantiated in the netlist.
6. The ORCA_TOP design uses four different SRAMs, which are all unresolved.
If you look inside the directory that contains the reference cell libraries, you
should be able to determine the missing reference library name.
Question 9. What is the name and location of the reference library
containing the unresolved references?

...................................................................................................

7. The problem can be corrected in one of two ways:


- Method #1: Use set_ref_libs to add the missing reference library to
the design library.
- Method #2: Correct the setup.tcl file and repeat the design setup steps.
The first method requires fewer steps, but it does not correct the problem in
the key design setup file, setup.tcl. If design setup needs to be repeated in
the future (the netlist or the constraints are updated, for example), you will run
into the same error.
The second method, which requires a few more steps, ensures that if design
setup needs to be repeated, you will not encounter the same error.
You will execute both methods to fix the problem.
8. Method #1:
Add the missing reference library to the existing list using set_ref_libs,
then re-link:

set_ref_libs -add ../ref/CLIBs/saed32_sram_lp.ndm
link_block -force


The design should link without any warnings. While you could continue with
the remaining design setup steps, you will first implement the second method
to fix the problem at its source, the setup.tcl file.
9. Method #2:
First edit setup.tcl and add the missing reference library to the
REFERENCE_LIBRARY list, then close the design library, then repeat the
main design setup steps:

close_lib
source -echo setup.tcl
create_lib -use_technology_lib $TECH_LIB \
-ref_libs $REFERENCE_LIBRARY ORCA_TOP.dlib
read_verilog -top ORCA_TOP ORCA_TOP.v
link_block

10. For final confirmation, run the command report_ref_libs.


11. You should see a BlockWindow in IC Compiler II’s GUI, which contains the
physical or layout representation of the ORCA_TOP block. The layout contains
all the standard cell and macro instances of the netlist, stacked on top of each
other, in the lower-left corner. The blue-green rectangles are the hard macros,
and the small purple rectangles in the lower-left corner are the standard cells.
The block’s I/O ports are also stacked on top of each other, and show up as a
small light-blue square with a Greek-like Phi Φ symbol (actually an O and an
I superimposed), just outside of the lower-left corner of the stacked cells.
Question 10. What is the name (block handle) of the newly-created current
block in the design library?

...................................................................................................

Note: The default block handle format is
libName:blockName.viewName.
The default viewName is design. A block can be saved
with an optional label, in which case the block handle is:
libName:blockName/labelName.viewName. You will
save the block with a label name at the end of the design
setup steps.


12. If a design library contains multiple blocks with the same blockName (but
with a different labelName and/or viewName), the block must be referred to
by its unique block handle. If the design library contains only one block with a
certain blockName, then that block can be referred to simply by its
blockName. You can optionally include the libName, and/or labelName,
and/or viewName, as demonstrated by executing these commands:

get_blocks ORCA_TOP
get_blocks ORCA_TOP.design
get_blocks ORCA_TOP.dlib:ORCA_TOP
get_blocks ORCA_TOP.dlib:ORCA_TOP.design

Note: Since this library contains only one block called ORCA_TOP,
all of the above commands are accepted by IC Compiler II,
and they all return the same full block handle.

13. Take a quick look at the UPF file located at
ORCA_TOP_design_data/ORCA_TOP.upf. This UPF file defines:

- Three power supply nets/ports: VSS, VDD and VDDH


- Two power domains: PD_ORCA_TOP (the top level of the block) and
PD_RISC_CORE (contains the RISC_CORE sub-design)

- Level shifters for inputs and outputs of PD_RISC_CORE


- The power states of the power nets. Here, the power nets only
define an ON state
14. Close the UPF file – do not save it, then load and commit UPF:

load_upf ORCA_TOP.upf
commit_upf


Task 4. Load Floorplan and Scan-DEF

1. Load the floorplan generated by ICC II floorplanning:

source ORCA_TOP.fp/floorplan.tcl

Note: The ORCA_TOP.fp directory is located under the
ORCA_TOP_design_data directory.

2. Look at the GUI BlockWindow, and you will see the complete mirrored-L
shape floorplan of the ORCA_TOP block.
3. Zoom in and notice the complex P/G mesh structure, then zoom back out to a
full-zoom. Since the P/G mesh makes it difficult to see the underlying
structures clearly, we will turn off their visibility next.
4. In the View Settings panel, make the following changes to improve the
visibility of the floorplan:
- Port → Uncheck visibility
- Terminal → Check selectability
- Voltage Area → Check visibility
- Route → Net Type → Power and Ground → Uncheck visibility
5. Under the Settings → View tab, enable Label settings → Scale fonts. This
improves the readability of the labels.
6. You can now more clearly see that:
- All the macros are placed
- Terminals (metal connection shapes) for the I/O and P/G ports are placed
around the block boundary
- A voltage area, called DEFAULT_VA, is defined for the entire core area
(dashed purple outline)
- A second voltage area called PD_RISC_CORE is defined in the lower-
right
- Standard cells are still stacked on top of each other in the lower-left corner


7. Read the SCAN-DEF file:

read_def ORCA_TOP.scandef

Question 11. How many scan chains does the design have?

...................................................................................................

8. Connect the P/G pins to the supply nets, and verify that there are no P/G
connection errors:

connect_pg_net
check_mv_design


Task 5. Placement Site and Routing Layer Settings

1. Confirm that the placement site called unit is set as the default site definition
(its is_default attribute should be true):

get_attribute [get_site_defs unit] is_default

2. Confirm that Y-symmetry is applied to the standard cells:

get_attribute [get_site_defs unit] symmetry

In our case, this was defined while preparing the technology-only library,
which you learned about in the previous unit. If this was not done, or you did
not use a technology-only NDM, you would have to apply this command:
set_attribute [get_site_defs unit] symmetry {Y}

Note: You do not need to run the above command!

Question 12. Does Y-symmetry mean that a standard cell can be flipped in
the Y-direction (along the X-axis), or flipped in the X-direction
(along the Y-axis)?

...................................................................................................

3. Confirm that the metal layer preferred routing directions are defined:

get_attribute [get_layers M?] routing_direction

As noted above, this information was specified when preparing the
technology-only NDM library.
4. Confirm that all metal layers are available for signal routing (none are
ignored), by default, then limit routing to M6 or lower:

report_ignored_layers
set_ignored_layers -max_routing_layer M6
report_ignored_layers


Task 6. Save the Block

You have completed all the design setup steps; timing setup will be performed in
the next unit. This is a good time to save the block.

1. List the files and directories in the current working directory (CWD),
lab56_setup:

ls -l

Question 13. Does the ORCA_TOP.dlib design library exist in the current
working directory?

...................................................................................................

2. Execute the following commands to rename the block with a label,
init_design, which helps to identify it (as the version that has been
initialized), and then to save the block and library:

rename_block -to_block ORCA_TOP/init_design
save_block    ;# or: save_lib

Question 14. Does the ORCA_TOP.dlib design library show up now?

...................................................................................................

3. Exit out of IC Compiler II. The GUI will ask for confirmation – click Exit:

exit

You have completed the Design Setup lab!


Answers / Solutions

Question 1. What is the default value of the search_path application
variable?
(HINT:
printvar sear[TAB] or
echo $search_path or
get_app_var search_path or
report_app_var search_path )

The current directory: .

Question 2. Which commands in run5.tcl will use the search_path
application variable to locate their specified file(s)?
(list just the commands, without their options and
arguments)?
source, read_verilog, load_upf, read_def.
(Since the $TECH_LIB and $REFERENCE_LIBRARY
variables include the path to each library, the create_lib
command will not use the search_path variable)

Question 3. Which command in run5.tcl uses the user-defined
TECH_LIB and REFERENCE_LIBRARY variables?
create_lib

Question 4. From setup.tcl: How many cell reference libraries are
being used?
3: HVth, LVth and RVth cell libraries (containing standard
cells and level shifters).

Question 5. From setup.tcl: Up to how many cores are enabled for
multi-threading?
The code lines in the script will reserve as many threads as
available on the machine, but no more than 8.
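The actual lines in setup.tcl are not reproduced here. As a stand-in, the rule "as many threads as available, but no more than 8" can be sketched in plain Tcl; count_cpus and capped_cores are hypothetical helpers, and count_cpus assumes a Linux host with /proc/cpuinfo, falling back to 1 elsewhere.

```tcl
# Hypothetical helper: count logical CPUs (Linux /proc/cpuinfo assumed).
proc count_cpus {} {
    if {[catch {open /proc/cpuinfo r} fh]} {
        return 1   ;# file unavailable: fall back to a single core
    }
    set text [read $fh]
    close $fh
    # One "processor : N" line per logical CPU
    set n [regexp -all -line {^processor\s*:} $text]
    return [expr {$n > 0 ? $n : 1}]
}

# Use what is available, but never more than the cap (8 by default)
proc capped_cores {available {cap 8}} {
    return [expr {min($available, $cap)}]
}

set max_cores [capped_cores [count_cpus]]
puts "Would request $max_cores cores"
# In ICC II the value would then be applied with:
#   set_host_options -max_cores $max_cores
```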

Question 6. How many cores are enabled, by default?


From report_host_options: max_cores: 1


Task 3. Design Library, Netlist and UPF

2. Correct the problem …


Add the technology library to the top of the REFERENCE_LIBRARY list in
the setup.tcl file, then source the file:

set REFERENCE_LIBRARY [join "
$TECH_LIB
../ref/CLIBs/saed32_hvt.ndm
../ref/CLIBs/saed32_lvt.ndm
../ref/CLIBs/saed32_rvt.ndm
"]

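To catch the LIB-059 problem before create_lib runs, setup.tcl could check that TECH_LIB is on the list. The sketch below assumes the TECH_LIB path quoted in the LIB-059 message; tech_lib_listed is a hypothetical helper. Note that join simply turns the newline-separated string into a space-separated list, so llength reports four entries.

```tcl
# TECH_LIB path taken from the LIB-059 message; the list mirrors the fix above.
set TECH_LIB ../ref/CLIBs/saed32_1p9m_tech.ndm
set REFERENCE_LIBRARY [join "
    $TECH_LIB
    ../ref/CLIBs/saed32_hvt.ndm
    ../ref/CLIBs/saed32_lvt.ndm
    ../ref/CLIBs/saed32_rvt.ndm
"]

# Hypothetical helper: 1 if tech_lib is an element of ref_libs, else 0
proc tech_lib_listed {tech_lib ref_libs} {
    # lsearch -exact returns -1 when the element is absent
    return [expr {[lsearch -exact $ref_libs $tech_lib] >= 0}]
}

if {[tech_lib_listed $TECH_LIB $REFERENCE_LIBRARY]} {
    puts "OK: [llength $REFERENCE_LIBRARY] libraries, TECH_LIB included"
} else {
    puts "TECH_LIB missing from REFERENCE_LIBRARY: expect LIB-059 from create_lib"
}
```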
Question 7. What is the name of the newly-created design library?


ORCA_TOP.dlib: As echoed in the run log, or, from the
create_lib argument without an option, or, by typing
current_lib

Question 8. Did the design library show up in the lab56_setup
directory?
No. The design library will not appear (as a UNIX
directory) until the block is saved, which will be done near
the end of the design setup steps.
4. Using the file and directory structure shown on page 3, correct the problem

Add the ORCA_TOP_design_data directory, which contains the Verilog netlist
(and other) design files, to the search_path in setup.tcl, then source
the file:

set_app_var search_path ". scripts ORCA_TOP_constraints ORCA_TOP_design_data"
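The FILE-002 error follows from how a relative file name is resolved: each directory on search_path is tried in order, and the first match wins. The following plain-Tcl emulation illustrates that behavior under this assumption; it is not ICC II's implementation, and resolve_on_search_path is a hypothetical name.

```tcl
# Plain-Tcl emulation of search_path lookup (illustrative, not ICC II code).
proc resolve_on_search_path {fname dirs} {
    # Absolute names are not searched; they either exist or they do not
    if {[file pathtype $fname] eq "absolute"} {
        if {[file exists $fname]} { return $fname }
        return ""
    }
    # Try each directory in order; the first existing match wins
    foreach dir $dirs {
        set candidate [file join $dir $fname]
        if {[file exists $candidate]} {
            return $candidate
        }
    }
    return ""   ;# not found anywhere: the situation FILE-002 reports
}

set sp ". scripts ORCA_TOP_constraints ORCA_TOP_design_data"
set hit [resolve_on_search_path ORCA_TOP.v $sp]
if {$hit eq ""} {
    puts "ORCA_TOP.v not found on: $sp"
} else {
    puts "found: $hit"
}
```

Run from lab56_setup, the lookup succeeds only once ORCA_TOP_design_data is on the list, which mirrors the fix above.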

Question 9. What is the name and location of the reference library
containing the unresolved references?
../ref/CLIBs/saed32_sram_lp.ndm


9. Method #2:
First close the design library, then edit setup.tcl and add the missing
reference library to the REFERENCE_LIBRARY list …

set REFERENCE_LIBRARY [join "
$TECH_LIB
../ref/CLIBs/saed32_hvt.ndm
../ref/CLIBs/saed32_lvt.ndm
../ref/CLIBs/saed32_rvt.ndm
../ref/CLIBs/saed32_sram_lp.ndm
"]

Question 10. What is the name (block handle) of the newly-created
current block in the design library?
From current_block:
ORCA_TOP.dlib:ORCA_TOP.design
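The block handle format from the note in Task 3 (libName:blockName[/labelName].viewName) can be illustrated with two small Tcl helpers. They are hypothetical, are not ICC II commands, and assume that block, label and view names contain no dots, which holds for the handles in this lab.

```tcl
# Hypothetical helpers (not ICC II commands) for the handle format
#   libName:blockName[/labelName].viewName
proc make_block_handle {lib block view {label ""}} {
    set handle "${lib}:${block}"
    if {$label ne ""} {
        append handle "/${label}"
    }
    return "${handle}.${view}"
}

proc parse_block_handle {handle} {
    # Split on ':' first so a library name containing dots (ORCA_TOP.dlib)
    # stays intact; the view is the part after the final '.'.
    if {![regexp {^([^:]+):([^/.]+)(?:/([^.]+))?\.([^.]+)$} $handle -> lib block label view]} {
        error "not a full block handle: $handle"
    }
    # label is empty when the handle has no /labelName part
    return [dict create lib $lib block $block label $label view $view]
}

puts [make_block_handle ORCA_TOP.dlib ORCA_TOP design]
puts [dict get [parse_block_handle ORCA_TOP.dlib:ORCA_TOP/init_design.design] label]
```

The first call rebuilds the handle returned by current_block; the second extracts init_design, the label given to the block in Task 6.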

Question 11. How many scan chains does the design have?
8
Number of Processed/Read DEF Constructs
---------------------------------------
VERSION : 1/1
DIVIDERCHAR : 1/1
BUSBITCHARS : 1/1
DESIGN : 1/1
SCANCHAINS : 8/8

You can also get the scan chain count by running the
command get_scan_chain_count.
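If you want to cross-check the count outside the tool, the SCANCHAINS section of a SCANDEF file declares the number of chains and then lists each chain on a line starting with "- chainName". The sketch below reads both numbers from the file text; scandef_chain_counts is a hypothetical helper, and its simple regular expressions assume a single SCANCHAINS section.

```tcl
# Hypothetical helper: return {declared found} chain counts from SCANDEF text.
proc scandef_chain_counts {text} {
    # Capture the declared count and everything up to END SCANCHAINS
    if {![regexp {SCANCHAINS\s+(\d+)\s*;(.*)END\s+SCANCHAINS} $text -> declared body]} {
        error "no SCANCHAINS section found"
    }
    # Each chain definition starts a line with "- <chainName>"
    set found [regexp -all -line {^[ \t]*-[ \t]+\S+} $body]
    return [list $declared $found]
}

# Usage on the lab's file (path from the directory listing):
#   set fh [open ORCA_TOP_design_data/ORCA_TOP.scandef r]
#   lassign [scandef_chain_counts [read $fh]] declared found
#   close $fh
```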

Question 12. Does Y-symmetry mean that a standard cell can be flipped in
the Y-direction (along the X-axis), or flipped in the X-direction
(along the Y-axis)?
Standard cells can be flipped in the X-direction, along the
Y-axis.
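To make the geometry concrete: flipping a cell in the X-direction (mirroring it about its vertical centerline, that is, along the Y-axis) changes only x coordinates. The hypothetical proc below works in cell-local integer units with the origin at the cell's lower-left corner; it illustrates the mirror transform, not how ICC II stores cell orientation.

```tcl
# Mirror a rectangle {x1 y1 x2 y2} (cell-local coordinates, origin at the
# cell's lower-left corner) about the cell's vertical centerline.
proc mirror_about_y_axis {rect cell_width} {
    lassign $rect x1 y1 x2 y2
    # x coordinates are reflected and swapped so that x1 <= x2 still holds;
    # y coordinates are untouched (no flip in the Y-direction).
    return [list [expr {$cell_width - $x2}] $y1 [expr {$cell_width - $x1}] $y2]
}

# A pin shape near the left edge of a 1000-unit-wide cell moves to the right edge
puts [mirror_about_y_axis {100 200 300 500} 1000]
```

The y extent of the pin is unchanged, and applying the mirror twice restores the original shape.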


Question 13. Does the ORCA_TOP.dlib design library exist in the
current working directory?
No. It is still not there.

Question 14. Does the ORCA_TOP.dlib design library show up now?
Yes!


6 Timing Setup

Learning Objectives

The purpose of this lab is to familiarize you with timing
setup in IC Compiler II.

After completing this lab, you should be able to:


• Perform MCMM setup:
- Define corners, modes and scenarios required for
analysis and optimization
- Load MCMM timing constraints
• Confirm implementation phase readiness with various
reports and checks, as well as a zero-interconnect timing
sanity check

Lab Duration:
40 minutes

Synopsys 20-I-078-SLG-010

Introduction

The timing setup task completes the design setup, which is the most important step
to perform correctly. As explained earlier, once setup is completed, you rarely need
to revisit this task (unless the design or constraints change) and can focus on the
more productive tasks like placement, CTS and routing.

The timing setup steps that you will be performing include:

• Performing MCMM setup, which includes defining the corners, modes and
scenarios required for analysis and optimization, and loading the MCMM
constraints
• Confirming implementation phase readiness with various reports and checks
• Performing a zero-interconnect timing sanity check

Relevant Files and Directories


All files for this lab are located in the lab56_setup directory under your home
directory.

lab56_setup/

run6.tcl Script with all the commands executed
in this lab.

setup.tcl Setup variables and general settings.

ORCA_TOP_constraints/
ORCA_TOP_c_*.tcl Corner constraints.
ORCA_TOP_m_*.tcl Mode constraints.
ORCA_TOP_s_*.tcl Scenario constraints.
ORCA_TOP_port_lists.tcl Port definitions used by constraint
commands.
scripts/
mcmm_ORCA_TOP.tcl Script to define MCMM corners, modes,
and scenarios, and load their respective
timing constraints.


Instructions

Answers / Solutions
You are encouraged to refer to the end of this lab to verify your answers.

Task 1. Open the Block from Lab 5

This lab is a continuation of Lab 5. It expects that the block ORCA_TOP/init_design,
which is created at the end of Lab 5, exists. If you have not completed Lab 5 yet,
do that first.
Alternatively, to catch up, run: icc2_shell -f .solution/complete5.tcl

1. Invoke IC Compiler II from the lab56_setup directory:

UNIX% cd lab56_setup
UNIX% icc2_shell -gui

2. Open the run6.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
3. Source the setup.tcl file:

source -echo setup.tcl

4. Open the block: This can be accomplished either by first opening the library,
and then the block, or, by using the full block handle, which includes the
library, in which case you do not need to first open the design library:

open_block ORCA_TOP.dlib:ORCA_TOP/init_design
# OR
open_lib ORCA_TOP.dlib
open_block ORCA_TOP/init_design


Task 2. Multi-Corner Multi-Mode Setup

1. In a different terminal window, open the file
scripts/mcmm_ORCA_TOP.tcl.

This script first creates the modes, corners and scenarios needed for multi-
corner multi-mode (MCMM) optimization of this design.
Note: The script takes advantage of Tcl arrays
(set arrayName(varName) varValue) to create
efficiently-coded foreach loops.

Question 1. What are the names of the modes, corners and scenarios that
will be created?

Modes: ......................................................................................

Corners: ....................................................................................

...................................................................................................

Scenarios: .................................................................................

...................................................................................................

...................................................................................................

2. The next section of the mcmm_ORCA_TOP.tcl script sources the mode-,
corner- and scenario-specific constraints files into their respective newly
created modes, corners and scenarios.
The constraints files are located in ORCA_TOP_constraints/. If you have
time, take a quick look at one each of the mode (_m_), corner (_c_)
and scenario (_s_) files.
3. The last section of the mcmm_ORCA_TOP.tcl script configures the analysis
types that are activated for the scenarios.
Use the man page for set_scenario_status, if needed, to help you answer
the following questions:
Question 2. Which scenarios will be active?

...................................................................................................


Question 3. Which analysis types will be enabled for the test.ss_125c
scenario?

...................................................................................................

4. Close the mcmm_ORCA_TOP.tcl file – do not save it, then source it:

source -echo mcmm_ORCA_TOP.tcl

5. Look at the log messages that were generated in the icc2_shell window:
After scenario creation, the messages confirm that all scenarios are active for
all analysis types:
Created scenario test.ff_125c for mode test and corner ff_125c
All analysis types are activated.

The warning messages about the virtual and generated clocks, which occur
when sourcing the mode constraints, are just informational, and can be
ignored (virtual clocks, by definition, have no sources).
6. Verify that the active analysis types for the test.ss_125c scenario match
your answer to the previous question.
7. Ensure that there are no propagated clocks prior to clock tree synthesis:

set cur_mode [current_mode]
foreach_in_collection mode [all_modes] {
    current_mode $mode
    remove_propagated_clocks [all_clocks]
    remove_propagated_clocks [get_ports]
    remove_propagated_clocks [get_pins -hierarchical]
}
current_mode $cur_mode

8. Verify that the current_mode and current_corner are consistent with
the current_scenario.
9. Change the current_mode or current_corner and notice that the
current_scenario changes accordingly. Conversely, if you change the
current_scenario this changes the current_mode and/or
current_corner.

Note: There are no scenarios for test mode in *_m40c corners.


10. Generate a mode report, which creates a convenient table-format summary of
the active and analysis-type status of all scenarios, grouped by mode:

report_mode

Right after the name of the mode you will see this line:
Current: false Default: false Empty: false
Current refers to whether this mode is the current mode or not.
Default is true only if you did not create any modes yourself; in that case,
ICC II would have a single mode named default.
Empty is true if you have not applied any constraints to this mode.
11. Generate a pvt report, to find out if there are any mismatches between the
PVT values defined in each corner, versus the available library PVTs:

view report_pvt

12. Look at the summary section for each of the four corners (between the
horizontal dashed lines), and answer the following questions:
Question 4. Which corner(s) have PVT mismatches?

...................................................................................................

Question 5. What is mismatching – process, voltage and/or temperature?

...................................................................................................

13. The information below the warning summary section lists the details of each
library that has a mismatch (based on the cells instantiated in the netlist). A
quick way to find out what the problems are is to look at the lines with an
asterisk or star (*).
Question 6. What is causing all of the mismatches?

...................................................................................................

...................................................................................................

...................................................................................................


Based on the name of the corner with the mismatches (ss_m40c), as well as
the name of the other corner with that same temperature (ff_m40c), it is
reasonable to conclude that the problem is with the user-specified temperature
constraint of -55, not with the characterized corner of the libraries.
14. Exit the PVT report.
15. Make the necessary correction in the appropriate constraints file located in
the ORCA_TOP_constraints directory.
16. Re-execute the commands in steps 4, 7 and 11 on the previous pages, until a
clean PVT report is obtained.
17. Save the block/library

save_lib

18. Run the following command to list the blocks:

list_blocks

Question 7. Has the block been saved?

...................................................................................................

Task 3. Zero-Interconnect (ZIC) Timing Sanity Check

It is a good idea to perform a zero-interconnect (ZIC) timing sanity check, to ensure
that the netlist has a chance of meeting timing after the design is placed and routed.

1. Generate a QoR summary report, then set the design in ZIC timing mode and
generate another QoR summary report:

report_qor -summary -include setup

set_app_options -list \
    {time.delay_calculation_style zero_interconnect}
report_qor -summary -include setup

You will notice a big difference between non-ZIC and ZIC timing. This is due to the
long, estimated routes between the unplaced standard cells in the lower-left
corner and the hard macro cells in the block, as well as the I/O terminals
around the block. In ZIC timing, the RC parasitics of these long, estimated
routes are set to zero, which drastically improves setup timing.
This QoR report is very useful to get a high-level summary of the worst
negative slack (WNS) timing, as well as the total negative slack (TNS), and
the number of violating endpoints (NVE) for each scenario.
At first glance, when looking at the second, ZIC QoR report, it appears that
the design has a serious problem! Two of the three setup timing scenarios
have WNS violations of ~2.6 ns!
2. First let us make sure that these large violations are not due to unbuffered high
fanout nets (HFNs) - assume a fanout of 50 or more is considered a HFN:

set_app_options -list \
{time.high_fanout_net_pin_capacitance 0pF
time.high_fanout_net_threshold 50}
update_timing -full
report_qor -summary -include setup

Since the results are the same, the violations are not caused by HFNs.
3. Notice that the WNS for the scenario in the -40 °C corner, func.ss_m40c,
is not violating (has a positive WNS).

Question 8. Can you think of a reason why one scenario meets setup ZIC
timing, while the others have a huge WNS violation?

...................................................................................................

...................................................................................................

...................................................................................................

Next, you will investigate the large setup timing violations further, to confirm
that optimization was, indeed, not done for these scenarios.
4. Generate a timing report for the five worst violating paths:

view report_timing -max_paths 5


5. In the view window, look at the incremental delay numbers in the column
labeled Incr, for the first reported path. You should notice a couple of large
(>> 1ns) delays. If you look to the left of those large delays, at the name of the
standard cell reference shown in parenthesis, you will find that they are small-
sized (1x or 2x), high Vth gates (ending with X1_HVT or X2_HVT) which are
the slowest cells available! Scroll down and look at the other four paths: You
will find the same thing there. In fact, you should notice that these slowest
cells are being used all along the timing-critical paths.
This is a clear indication that these paths were not optimized, which confirms
that these scenarios were not considered during synthesis. Ideally, it would be
best to re-synthesize the design under all key setup timing scenarios, which
would provide a better starting netlist to IC Compiler II. This would result in a
better initially-placed design, requiring less optimization, and thus better
placement run-time. If re-synthesizing the design is not an option, there is still
a good likelihood that optimization during the placement, CTS and routing
phases will be able to eliminate, or drastically reduce, the violations in the
other scenarios.
6. Exit the timing report view window.
You have completed all recommended timing setup steps!
7. Exit out of IC Compiler II:

exit

You have completed the Timing Setup lab!
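The report_qor summary used in Task 3 condenses each scenario's endpoint slacks into WNS, TNS and NVE. As a minimal plain-Tcl sketch (it runs in tclsh and is not an ICC II command), the proc below derives the three numbers from a list of setup slacks. The proc name qor_summary, the scenario labels and the slack values are all hypothetical. The sketch assumes WNS is simply the worst slack, so it stays positive when nothing violates (as reported for func.ss_m40c above), while TNS and NVE only count negative slacks.

```tcl
# Hypothetical sketch (plain Tcl 8.5+); names and slacks are invented,
# not ICC II output.
# WNS: worst (smallest) slack, positive when nothing violates.
# TNS: sum of the negative slacks only.
# NVE: number of endpoints with negative slack.
# Expects a non-empty list of slacks in ns.
proc qor_summary {slacks} {
    set wns [lindex $slacks 0]
    set tns 0.0
    set nve 0
    foreach s $slacks {
        if {$s < $wns} { set wns $s }
        if {$s < 0} {
            incr nve
            set tns [expr {$tns + $s}]
        }
    }
    return [list $wns $tns $nve]
}

# One violating and one clean (hypothetical) scenario.
foreach {scenario slacks} {
    scen_violating {0.12 -2.6 -0.4 0.03 -1.1}
    scen_clean     {0.35 0.08 0.51}
} {
    lassign [qor_summary $slacks] wns tns nve
    puts [format "%-15s WNS %6.2f  TNS %6.2f  NVE %d" $scenario $wns $tns $nve]
}
```

Keeping WNS as the plain minimum slack means a clean scenario still reports a meaningful margin, while TNS and NVE stay at zero.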


Answers / Solutions

Question 1. What are the names of the modes, corners and scenarios
that will be created?
Modes: func, test
Corners: ss_125c, ss_m40c, ff_125c, ff_m40c
Scenarios: func.ss_125c, func.ss_m40c,
func.ff_125c, func.ff_m40c, test.ss_125c,
test.ff_125c
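The six scenarios follow from pairing each mode with the corners it is analyzed in. The plain-Tcl sketch below illustrates the array-driven foreach style the lab mentions. It is not the actual contents of mcmm_ORCA_TOP.tcl: the array corners_of_mode and the proc scenario_names are hypothetical, and the loop only builds names. In icc2_shell, the inner loop is where the scenario objects would be created.

```tcl
# Hypothetical sketch (plain Tcl): a Tcl array maps each mode to the
# corners it is analyzed in; nested foreach loops expand that table
# into "mode.corner" scenario names.
array set corners_of_mode {
    func {ss_125c ss_m40c ff_125c ff_m40c}
    test {ss_125c ff_125c}
}

# Iterate modes in an explicit order, because [array names] is unordered.
proc scenario_names {arrName modeOrder} {
    upvar 1 $arrName table
    set names {}
    foreach mode $modeOrder {
        foreach corner $table($mode) {
            lappend names "$mode.$corner"
        }
    }
    return $names
}

set scenarios [scenario_names corners_of_mode {func test}]
puts $scenarios
# prints: func.ss_125c func.ss_m40c func.ff_125c func.ff_m40c test.ss_125c test.ff_125c
```

With the mode-to-corner table kept in one array, adding or dropping a corner only changes data, not the loops.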

Question 2. Which scenarios will be active?
All scenarios. By default, all scenarios are active when
created. The -active false option is used to deactivate
scenarios, after which -active true can be used to
reactivate them.

Question 3. Which analysis types will be enabled for the
test.ss_125c scenario?
Setup timing and logical DRCs (max transition, max
capacitance, min capacitance). Leakage/dynamic power,
hold timing as well as cell and signal EM analyses are
disabled.

Question 4. Which corner(s) have PVT mismatches?


ss_m40c

Question 5. What is mismatching – process, voltage and/or temperature?


Temperature

Question 6. What is causing all of the mismatches?


The effective (library characterization) temperature of
-40 degrees Celsius does not match the user-specified
value of -55.
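The temperature part of this check can be sketched in plain Tcl by comparing each corner's constrained temperature with the temperatures the libraries were characterized at. This is a simplification of report_pvt, which also checks process and voltage against the libraries used by the netlist cells. The variables lib_temps and corner_temp and the proc temp_mismatches are hypothetical; the values are typed in from this lab, not read from the tool.

```tcl
# Hypothetical sketch (plain Tcl): flag corners whose constrained
# temperature is not one of the library characterization temperatures.
set lib_temps {125 -40}
array set corner_temp {
    ss_125c 125
    ss_m40c -55
    ff_125c 125
    ff_m40c -40
}

# Simplification: exact list membership; process and voltage are ignored.
proc temp_mismatches {arrName libTemps} {
    upvar 1 $arrName temps
    set bad {}
    foreach corner [lsort [array names temps]] {
        if {$temps($corner) ni $libTemps} {
            lappend bad $corner
        }
    }
    return $bad
}

puts "Temperature mismatches: [temp_mismatches corner_temp $lib_temps]"
# prints: Temperature mismatches: ss_m40c
```

Setting corner_temp(ss_m40c) to -40, the equivalent of the set_temperature -40 correction in the ss_m40c corner file, empties the list.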


Task 2. Multi-Corner Multi-Mode Setup

15. Make the necessary correction in the appropriate constraints file located in
the ORCA_TOP_constraints directory.
In the ORCA_TOP_c_ss_m40c.tcl file make the following correction:

set_temperature -40

Question 7. Has the block been saved?


The date and time stamp show when the block was last
saved to disk, which confirms that the block has been saved.

Question 8. Can you think of a reason why one scenario meets setup
ZIC timing, while the others have a huge WNS violation?
One common reason is that synthesis was not performed in
an MCMM environment. In our case, it was performed for a
single mode and corner: the functional mode (func), and
the slow-slow process at -40 °C corner (ss_m40c).

7 Running CTS

Learning Objectives

The purpose of this lab is for you to become familiar with
the clock tree synthesis and Post-CTS optimization
capabilities in IC Compiler II.
You will perform clock tree synthesis on the ORCA_TOP
design using the recommended flow, after applying
multiple settings. Reports will be generated to monitor and
track the progress of the design.

After completing this lab, you should be able to:


• Apply basic settings for clock tree synthesis and data-
path optimization
• Run either the classic CTS or the CCD flow
• Analyze CTS quality-of-results (QoR)

Lab Duration:
70 minutes


Instructions

Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers, and to
look for hints.

Task 1. Load and Analyze the Placed Design

In this task, you will load the resulting design after placement and optimization and
perform pre-CTS checks.

1. Change to the work directory for the CTS lab, then load the placed design:

UNIX% cd lab7_cts
UNIX% icc2_shell -gui -f load.tcl

The script will make a copy of the place_opt block and open it.
2. Open the run.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
3. Generate a timing QoR summary.

report_qor -summary

Question 1. From a timing standpoint, is the design ready for CTS?
What about the hold violations?

...................................................................................................

...................................................................................................


4. Set the current mode to “func”.


5. Generate a clock report:

report_clocks

Question 2. How many master clocks are defined?

...................................................................................................

Question 3. How many generated clocks are defined?

...................................................................................................

Question 4. What type of clock are the remaining clocks?

...................................................................................................

Question 5. What is the source of the SD_DDR_CLK clock?

...................................................................................................

6. Generate a clock skew report:

report_clocks -skew

Question 6. What is the smallest/largest setup uncertainty?
What is the smallest/largest hold uncertainty?

...................................................................................................

...................................................................................................

7. Generate a clock groups report:

report_clocks -groups

Question 7. Which clock groups are mutually exclusive or asynchronous?

...................................................................................................

...................................................................................................

...................................................................................................


8. Generate a clock tree summary report for both modes (default report):

v report_clock_qor -modes func
v report_clock_qor -modes test

Question 8. What is the big difference between the two modes?
(Hint: Look at the clock names)

...................................................................................................

Question 9. How many sinks does SD_DDR_CLK have?

...................................................................................................

9. Generate a port report on the start point or source of SD_DDR_CLK (see
Question 5):

report_ports [get_ports sd_CK]

Question 10. Why does SD_DDR_CLK have zero sinks?
(Hint: Look at the port direction)

...................................................................................................

...................................................................................................


Task 2. CTS Setup and Configuration

1. Since CTS setup (clock tree balancing constraints, NDR rules, etc.) will be
covered in the next unit, perform these setup steps by sourcing a file:

source scripts/cts_ex_ndr.tcl

2. Ensure that the correct scenarios are enabled for hold fixing. To find all active
scenarios that are enabled for hold:

get_scenarios -filter active&&hold
report_scenarios

Question 11. Are the scenarios configured properly for hold fixing?

...................................................................................................

Complete the scenario setup for hold (look at run.tcl). Also, double check
that all scenarios are active.
3. You can control which buffers or delay cells should be used for hold fixing
using set_lib_cell_purpose. Execute the three corresponding
commands from the run.tcl file.
4. If you like, you can increase the effort for maximum hold timing
optimization, although it is not required for this design:

set_app_options -name clock_opt.hold.effort \
    -value high

5. In order to reduce scan chain hold timing violations, it is recommended to
enable scan-chain reordering to minimize crossings between clock buffers:

set_app_options \
-name opt.dft.clock_aware_scan_reorder \
-value true

6. Enable clock reconvergence pessimism removal by executing the
corresponding line from run.tcl.


Task 3. Post-CTS I/O Latency

For our design, we do not want the I/O latencies to be updated, except for
v_PCI_CLK. This clock is a virtual clock, so latency adjustment needs to be
configured to update the clock:

foreach_in_collection mode [all_modes] {
    current_mode $mode
    set_latency_adjustment_options -exclude_clocks "*"
    set_latency_adjustment_options \
        -reference_clock PCI_CLK \
        -clocks_to_update v_PCI_CLK
}

Note: Generally, if the clock for the input/output constraints is the
same as the internal clock, no configuration needs to be
done – their I/O latencies will automatically be adjusted.

Task 4. Run Comprehensive Clock Tree Checking

1. Now that you have analyzed the clocks using the manual methods discussed
earlier, generate a clock tree check report to see what other potential problems
might appear during CTS:

v check_clock_trees

You should see a long report with a “Summary” and a “Details” section.
The summary section will show you how many problems were found of each
problem category, and if there is a suggested solution, for example:
CTS-019 2 None Clocks propagate to output ports
CTS-905 4 None There are clocks with no sinks
For more details, review the Details section: the detailed section for
CTS-905 (at the end of the report) complains about clocks without sinks,
which you have already analyzed in earlier steps. Four clocks are listed,
related to ports SD_DDR_CLK and SD_DDR_CLKn (repeated for each mode,
func and test).


CTS-903 46 None Cells instantiated in the clock network are not in the clock reference list
CTS-904 12 None Some clock reference cells have no LEQ cell specified for resizing
These two warnings list cells that are used in the clock tree, and are either not
enabled for the CTS lib cell purpose, or, they are enabled for the CTS lib cell
purpose, but their “logically equivalent cells” (LEQs) are not. In either case
this means that CTS will not be able to resize these cells (discussed in the next
Unit).

CTS-009 12 None Cell instances in the clock tree have multiple conditional delay arcs between the same pins
CTS-013 35 None Cells in the clock network have a dont_touch constraint
This lists the cells on which we have applied a dont_touch.

For the purposes of this lab, all of these warnings are acceptable and can be
ignored.

Classic CTS or CCD


The allotted lab time is such that you will most likely only have time for one of the
CTS flows – choose either the classic CTS flow (Option A – Tasks 5 and 6, starting
on the next page), or the CCD flow (Option B – Task 7, starting at page 12).

Note: For this design, the runtime for CCD is higher than for classic CTS.

If you are done early with one flow, try the other flow.


Task 5. Option A: Perform Classic CTS

Note: If you are performing this step after you have already performed
option B (CCD), then you will need to re-load the design and re-
apply all settings. To simplify this, just restart ICCII and use the
script scripts/load_all.tcl – this script will re-load the
design and set everything up to this point.

1. Perform clock tree synthesis and route the clock trees:

set_app_options -name clock_opt.flow.enable_ccd \
    -value false
clock_opt -to route_clock

The run should only take a few minutes. The above command will execute the
first two stages: build_clock and route_clock.
2. Once the run has completed, review the CTS skew results. After looking at all
results in all modes/corners, record results for the worst corner for the
functional mode, ss_125c:

report_clock_qor
v report_clock_qor -type local_skew
report_clock_qor -type area
report_clock_qor -mode func -corner ss_125c \
-significant_digits 3

3. Record the various clock QoR statistics:

Number of clock buffers (clock repeaters):

Clock tree area (repeaters + other network cells):

Record the global/local skew/max latency numbers for the indicated clocks, in
the slowest corner ss_125c:
ss_125c Corner Global Skew Local Skew Max Latency
SYS_2x_CLK
SDRAM_CLK


4. Another very useful report is the robustness report:

v report_clock_qor -type robustness -mode func \
    -corner ff_m40c -robustness_corner ss_125c

Multi-corner robustness is a measure of how the latency to each clock sink
scales between the measured corner and a reference corner (specified with the
-robustness_corner option). Every corner pair has a scaling factor, which
is simply the ratio of the average latency in the measured corner over the
average latency in the reference corner. The robustness value for an individual
sink is the ratio of its latency in the measured corner over the latency in a
reference corner, normalized against the scaling factor for those corners.
If a clock sink has a robustness value of 1, that means it exhibits typical
scaling between the measured and reference corner. A robustness value
greater than one indicates a sink has higher than average latency in the
measured corner compared to the reference corner. Likewise, a robustness
value less than one indicates a sink with less than average latency in the
measured corner compared to the reference corner. The largest and smallest
robustness value sinks have the worst multi-corner robustness, and could
cause timing problems in the measured corner. If all clock sinks for a clock or
skew group have a similar robustness value, then the clock tree is said to be
multi-corner robust, and the skew can be maintained across those corners.

5. Generate a different skew report using the clock timing report: Concentrate on
the func mode and the worst corner:

report_clock_timing -type skew -modes func \
    -corners ss_125c -significant_digits 3

The reported Skew is the difference between the max and min Latency
numbers, plus or minus the clock reconvergence pessimism (CRP).

Record Skew and maximum Latency for the indicated clocks:


Skew Latency (max)
SYS_2x_CLK
SDRAM_CLK


Note: Definition of the symbols on the right end of the report:


w Worst-case operating condition
b Best-case operating condition
r Rising transition
f Falling transition
p Propagated clock to this pin
i Clock inversion to this pin
- Launching transition
+ Capturing transition

Question 12. Why are the skews reported by report_clock_qor and
report_clock_timing different?

...................................................................................................

...................................................................................................
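Two of the Task 5 metrics are easy to reproduce by hand. The plain-Tcl sketch below computes the global skew of a clock (longest minus shortest sink latency in one corner) and the per-sink robustness value as defined in step 4. The procs global_skew and robustness, the sink names and the latencies are invented for illustration. Local skew, which only considers sink pairs that launch and capture each other, and CRP adjustments are not modeled.

```tcl
# Hypothetical sketch (plain Tcl 8.5+); not an ICC II command.
# Latencies are in ns per clock sink; sink names and values are invented.

# Global skew: longest minus shortest sink latency of one clock in one corner.
proc global_skew {latencies} {
    set values [dict values $latencies]
    return [expr {[tcl::mathfunc::max {*}$values] - [tcl::mathfunc::min {*}$values]}]
}

# Robustness per sink: (measured latency / reference latency) divided by the
# corner-pair scaling factor (average measured / average reference latency).
# The sink count cancels in the ratio of averages, so sums are enough.
proc robustness {measured reference} {
    set sum_m 0.0
    set sum_r 0.0
    dict for {sink lat} $measured {
        set sum_m [expr {$sum_m + $lat}]
        set sum_r [expr {$sum_r + [dict get $reference $sink]}]
    }
    set scale [expr {$sum_m / $sum_r}]
    set result [dict create]
    dict for {sink lat} $measured {
        set ratio [expr {double($lat) / [dict get $reference $sink]}]
        dict set result $sink [expr {$ratio / $scale}]
    }
    return $result
}

set ff_m40c {reg_a 0.20 reg_b 0.22 reg_c 0.30}   ;# measured corner
set ss_125c {reg_a 0.40 reg_b 0.44 reg_c 0.40}   ;# reference corner

puts [format "global skew in ss_125c: %.3f ns" [global_skew $ss_125c]]
dict for {sink value} [robustness $ff_m40c $ss_125c] {
    puts [format "%s robustness: %.3f" $sink $value]
}
```

With these numbers, reg_c scales noticeably worse than reg_a and reg_b between the two corners (robustness well above 1), which is the kind of outlier the robustness report is meant to expose.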

Task 6. Perform Post-CTS Optimization

In this task you will optimize the non-clock network logic to address any timing
violations, and you will perform hold-fixing for the first time.

1. Generate a timing summary before optimization:

report_qor -summary

Note down the worst-case (Design) WNS/TNS/NVE numbers for setup and
hold:
WNS TNS NVE
Setup
Hold

2. Execute post-CTS optimization:

clock_opt -from final_opto


3. Once post-CTS optimization has completed, generate another timing
summary.

Question 13. Are there any setup or hold violations left?

...................................................................................................

4. Confirm that the design has no congestion issues.

5. Save the block and continue to Task 8.


Task 7. Option B: Concurrent Clock & Data Flow

In this task, you will use the CCD flow to build the clock trees and optimize the
logic. Note that CCD will take much longer to run compared to classic CTS. If
you have chosen option B right away, then continue with the first step.

Note: If you are performing this step after you have already performed
option A, then you will need to re-load the design and re-apply all
settings. To simplify this, just restart ICCII and use the script
scripts/load_all.tcl – this script will re-load the design
and set up everything to this point, ready for CCD.

1. Source the following script – this will make things a little more interesting for
CCD, by introducing a few setup timing violations, which will need to be
fixed by the CCD algorithms. Have a look at a timing summary afterwards:

source scripts/margins_for_ccd.tcl
report_qor -summary

2. Note down the worst-case (Design) WNS/TNS/NVE numbers for setup and
hold:
WNS TNS NVE
Setup
Hold

3. Enable the CCD flow.


4. Run default clock_opt. This is the recommended way to run the CCD flow:
all three stages (build_clock, route_clock and final_opto) will be
executed. This will run for at least 15 minutes. Take a break.
You may, of course, choose to run each stage by itself, and then perform
intermediate analysis.
5. After completing the run, record the design QoR to see whether CCD was
able to fix all the artificially introduced violations. Note: Do not compare
results with the classic CTS! Timing was purposely made worse for CCD.
6. Save the block and continue to Task 8.


Task 8. Analysis

1. Have a look at the synthesized clock tree using the clock abstract graph GUI.
In the GUI select WindowClock Tree Analysis Window.
2. In the new CTSWindow, check the little box in the top right corner next to
‘Filter clock by corner’. This allows us to analyze clock latencies, which
vary by corner. Make sure the ss_125c corner is selected.
3. In the main section of the window, where all the clocks are listed, go to the
func scenario section, right click on the SYS_2x_CLK, then select Clock
Tree Latency Graph of selected Corner.

This screenshot shows the latency graph for SYS_2x_CLK for the classic CTS
flow (the CCD screenshot is shown on the next page):


This screenshot shows the latency graph for SYS_2x_CLK for the CCD flow:

As expected, the latency distribution to the register endpoints will be larger
for CCD than for classic CTS.
4. Close the CTSWindow.


5. Examine the clock tree routing topology by selecting the Clock Tree visual
mode. In the panel that opens, click on the little gear symbol, then make sure
that “Auto Apply” is selected.

From the list of clocks, select an individual clock to visualize its routing
topology in the layout window.

You can also select how many, and which, levels of the routing topology to
show.
Note: Level 0 is the net that connects to the clock root or source.
Level 1 is the net that is driven by the first driver, etc.

6. Close the Clock Tree visual mode panel.

7. Examine the timing of paths launched by the virtual clock v_PCI_CLK (that
is, from the inputs constrained relative to it):

report_timing -from [get_clocks v_PCI_CLK]

Question 14. Is the network latency on the input v_PCI_CLK propagated?

...................................................................................................

You have successfully completed the Running CTS lab.


Answers / Solutions

Question 1. From a timing standpoint, is the design ready for CTS?
What about the hold violations?
There should be no setup violations. There are hold
violations, which will be addressed during CTS.

Question 2. How many master clocks are defined?
There are 3 master clocks defined: PCI_CLK, SDRAM_CLK and
SYS_2x_CLK. Master clocks have a clock source and are not
generated (“G”).

Question 3. How many generated clocks are defined?


3 clocks are generated clocks:
SD_DDR_CLK, SD_DDR_CLKn, and SYS_CLK.
Look at the top section of the report:
The Attrs column shows the letter G (generated).
The section below the list of all clocks contains details of
the generated clocks.

Question 4. What type of clock are the remaining clocks?


The remaining clocks are virtual clocks. They are:
v_SDRAM_CLK and v_PCI_CLK
Virtual clocks have an empty field in the Sources column.

Question 5. What is the Source of the SD_DDR_CLK clock?
From the Sources column in the top section of the report:
the port sd_CK. SD_DDR_CLK is a clock generated from
SDRAM_CLK.

Question 6. What is the smallest/largest Setup Uncertainty?
What is the smallest/largest Hold Uncertainty?
From report_clocks -skew:
Setup Uncertainty:
0.10 ns (*ff* scenarios), 0.30 ns (*ss* scenarios)
Hold Uncertainty:
0.05 ns (*ff* scenarios), 0.10 ns (*ss* scenarios)


Question 7. Which clock groups are mutually exclusive or asynchronous?
The report lists three asynchronous clock groups:
SYS_2x_CLK and SYS_CLK
PCI_CLK and v_PCI_CLK
SDRAM_CLK, v_SDRAM_CLK and SD_DDR_CLK*
All clocks within a group are synchronous to one another,
but clocks across groups are asynchronous to one another.
Clock groups can be logically_exclusive,
physically_exclusive, or asynchronous: Paths between
exclusive or asynchronous clocks are not considered during
timing analysis (similar to set_false_path). In addition,
logically_exclusive, physically_exclusive, and asynchronous
relationships affect how crosstalk analysis is performed
between the clocks.

Question 8. What is the big difference between the two modes?
(Hint: Look at the clock names)
The test mode has an additional clock: ate_clk.

Question 9. How many sinks does SD_DDR_CLK have?


Zero sinks.

Question 10. Why does SD_DDR_CLK have zero sinks?
(Hint: Look at the port direction Dir)
SD_DDR_CLK is a generated clock defined on the port
sd_CK (as shown by report_clocks). From
report_ports [get_ports sd_CK] you will see that
this is an output port, and therefore has no sinks.

Question 11. Are the scenarios configured properly for hold fixing?
You will find that there are three *ff* scenarios, of which
one is not configured for hold fixing: test.ff_125c.
Since this is a fast scenario, make sure it is enabled for hold
fixing.
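A sketch of enabling hold analysis (and therefore hold fixing) for that scenario, assuming the scenario name test.ff_125c from the answer above:

```tcl
# Enable hold timing for the fast test scenario
set_scenario_status test.ff_125c -hold true
# Check the setup/hold status of all scenarios
report_scenarios
```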


Question 12. Why are the skews reported by report_clock_qor and
report_clock_timing different?
The report_clock_qor command reports global skew,
by default, which is the maximum skew across all sinks in
the entire clock domain (the difference between the longest
and the shortest insertion delays), even if these extreme
clock paths are not related.
Furthermore, report_clock_qor does not use timing
derates, even though they might be defined in your corners.
You can also generate a local skew report using
report_clock_qor -type local_skew.
The report_clock_timing report only calculates local
clock skew: The reported maximum skew is between two
flip flops that share a timing path (one clock branch
launches the data while the other branch captures the data).
The report_clock_timing command also considers
timing derates, but for setup timing it applies late derates
to the launch clock path and early derates to the capture
clock path. The latter report is more accurate in determining
the real worst-case clock skew.
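To compare the two views of skew side by side, something like the following could be used; the -corners value is only an example taken from earlier in this lab:

```tcl
# Global skew (the default) and local skew from the clock QoR report
report_clock_qor -corners ss_125c
report_clock_qor -type local_skew -corners ss_125c
# Derate-aware local skew between related launch/capture clock pins
report_clock_timing -type skew
```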

Question 13. Are there any setup or hold violations left?


There should only be a few small setup and hold violations.

Question 14. Is the network latency on the input v_PCI_CLK propagated?
No. A latency number is listed under “clock network
delay (ideal)”. This is the auto-updated ideal latency
value that ICC II has computed for PCI_CLK, and applied to
v_PCI_CLK (since we configured this in Task 3).

Lab 8: Setting up CTS

Learning Objectives

The purpose of this lab is for you to become familiar with the
setup steps for clock tree balancing, non-default routing
rules (NDRs), and clock timing/DRC constraints.
After CTS setup, you will run the build_clock and
route_clock clock tree synthesis stages to confirm the
results.

After completing this lab, you should be able to:


• Set up clock tree balancing
• Create and apply Non-Default routing rules
• Apply clock-related timing and DRC constraints

Lab Duration:
45 minutes



Synopsys 20-I-078-SLG-010
Lab 8

Instructions

Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers, and to
look for hints.

Task 1. Load the Design and Analyze the Clocks

1. Change to the work directory for the CTS lab, then load the starting design:

UNIX% cd lab8_cts
UNIX% icc2_shell -gui -f load.tcl

The script will open a block that is ready for the upcoming tasks.
2. Open the run.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
3. Select the menu WindowClock Tree Analysis Window and answer Yes
when asked “… Do you want to continue?”.
4. Expand the SDRAM_CLK entry (click on the “+” in front of it) and you will see
the two SD_DDR_CLK* clocks. Notice that the is_generated column is set
to true for these clocks, and their sources are sd_CK*. Also, you will see
“M” and “G” symbols in front of the clocks, which identifies them as Master
or Generated clocks, respectively. In summary: The SD_DDR_CLK and
SD_DDR_CLKn clocks are generated from the master clock SDRAM_CLK. The
SDRAM_CLK clock is applied to its source port sdram_clk. The generated
clocks are applied to their source ports sd_CK and sd_CKn, respectively.


5. Perform closer analysis of SDRAM_CLK in func mode by right-clicking on it,
then selecting “Clock Tree Object List”. In the “Find CTS Object”
dialog that appears, select OK.

You should see a new window as shown above.


In this new window, you will see all valid sink pins of this clock. You will not
see the output ports sd_CK and sd_CKn in this list. You’ll see why later.
If you scroll to the right, you will see many more attributes associated with the
pins/ports. To better display the information that is important to you, you can
move the position of the columns to the left/right, and you can sort the order
by the values in any column.
Close the CTS window.
6. To report what type of balance points exist on the clock tree endpoints
(implicit or explicit ignore or sink pins), generate a clock structure report:

v report_clock_qor -type structure

In the view window, type Ctrl-F (or click on Search…) then enter the search
string “sd_CK” and press enter. You should see the line beginning with
sd_CK [out Port], and at the end of the line you will see a balance point
exception (something other than an implicit SINK PIN).
Question 1. What balance point exception is set on sd_CK, and why?

...................................................................................................

...................................................................................................

Do NOT close the view window yet.


Task 2. Clock Tree Balancing

Many designs have special or non-default requirements for their clock trees, in
which case executing a default clock tree synthesis is not sufficient.

CTS will only balance the delays (minimize skew) to sink pins, which, by default,
are clock pins of sequential cells or macros. If there are additional pins that need to
be balanced along with these clock pins, ICC II needs to be explicitly told about
them prior to CTS.

Figure 1. SDRAM interface

Figure 1 shows the SDRAM interface. The clock SDRAM_CLK is connected directly to
the select pins of muxes, which in turn will drive the output ports of the ORCA_TOP
block. The dummy mux driving sd_CK is required because of the tight timing
requirement of the DDR SDRAM interface, which produces data at its output data
ports on both the rising and the falling clock edges. This design requires that the
clock skew from SDRAM_CLK to sd_DQ_out and sd_CK be optimized. By default,
select (S0) pins are marked as implicit ignore pins. To have CTS balance the skew
you need to redefine these select pins as sink pins.

1. In the view window that should still be open (see the last step of the previous
Task), notice that just above and below the sd_CK line, the MUX select pins
described in Figure 1, I_SDRAM_TOP/I_SDRAM_IF/sd_mux_CK*/S0, are,
in fact, also implicit ignore pins.
2. Click on Dismiss Search and then Exit the view window.


As a reminder, select and run the commands from the run.tcl script using
the built-in script editor.

3. Apply balancing constraints for the S0 pins in all modes:

set_clock_balance_points \
-modes [all_modes] \
-balance_points [get_pins "I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*/S0"]

4. Generate the following report to verify the user-defined balance points:

v report_clock_balance_points

You should see the pins listed below the following lines:
Clock Independent:
Balance Points:
The balance points are listed as “Clock Independent” because you did not
specify a clock with the exception. If the exception is intended to apply to
a specific clock, and multiple clocks reach this point, then a clock should
be specified. This is not the case here.
5. Have another look at a clock structure report, and search for sd_mux:

v report_clock_qor -type structure

6. In the view window, search for sd_CK.


Question 2. How are the S0 (select) pins of the MUXes labeled now?

...................................................................................................

Question 3. How is the sd_CK port labeled now? What does this mean?

...................................................................................................

...................................................................................................

7. Close the view window.


8. Instruct CTS to not change the SDRAM muxes: They have been chosen to
achieve the best timing, and should be left as they are.

set_dont_touch \
[get_cells "I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*"]


9. Verify your settings with the report_dont_touch command:

report_dont_touch I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*

10. Return to WindowClock Tree Analysis Window. Expand the


SYS_2x_CLK (click on the “+” in front of it). You should see the SYS_CLK
clock.
Question 4. What is the source of this generated clock?

...................................................................................................

11. Instruct CTS to not change the register that is used as the clock divider:

set_dont_touch \
[get_cells "I_CLOCKING/sys_clk_in_reg"]

12. Verify with the reporting command used in a previous step.
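Using report_dont_touch as in step 9, the check would be:

```tcl
# Confirm that the clock-divider register is now dont_touch
report_dont_touch I_CLOCKING/sys_clk_in_reg
```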


13. Set a skew target of 0.05ns for all the slow (ss) corners, and 0.02ns for the
fast (ff) corners.
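A minimal sketch of step 13, assuming the corner names contain ss and ff (for example ss_125c); adjust the get_corners patterns to the corners actually defined in your design:

```tcl
# Looser skew target for the slow corners
set_clock_tree_options -target_skew 0.05 -corners [get_corners *ss*]
# Tighter skew target for the fast corners
set_clock_tree_options -target_skew 0.02 -corners [get_corners *ff*]
```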
14. Generate the following report and confirm:

report_clock_tree_options

15. When performing CTS, it is generally desirable to use specific cells for
synthesis, instead of letting ICC II choose any cell from the library; for
example, cells that help to reduce skew (identical rise/fall ramp times), or
cells that offer a better balance between power consumption and
speed/drive strength, size, etc.
CTS-specific cells are defined using:
set_lib_cell_purpose -include cts
First, automatically identify the gates and ICGs that are already on the clock
network, and their logical equivalents (leq’s):

derive_clock_cell_references -output cts_leq_set.tcl

Note that the above will only work if all library cells already have the cts
purpose.
16. Have a look at the file that was created - cts_leq_set.tcl.
As you can see, all cells that are on the clock network currently have been
identified, along with their LEQ’s. You could copy and paste the commands to
a new file, uncomment the appropriate lines, and source it later, however, you
do NOT have to do this - we have already done this for you.


17. Next, choose the buffers and/or inverters you want to use for the clock tree.
This is done using the following lines:

# Buffers, inverters and flip-flops to be used by CTS
set CTS_CELLS [get_lib_cells \
    "*/NBUFF*LVT */NBUFF*RVT */INVX*_LVT */INVX*_RVT */*DFF*"]
# Allow CTS to use these cells
set_dont_touch $CTS_CELLS false
# Give the cts purpose to the selected cells only
set_lib_cell_purpose -exclude cts [get_lib_cells]
set_lib_cell_purpose -include cts $CTS_CELLS

Select/run these lines from run.tcl.


18. Instead of sourcing an edited copy of the cts_leq_set.tcl file that you
created earlier, you can source our version:

source scripts/cts_include_refs.tcl

19. Generate a report to ensure that the correct lib-cell purpose was indeed set,
and dont_touch was removed, on the key CTS cells:

v report_lib_cells -objects [get_lib_cells] \
    -columns {name:20 valid_purposes dont_touch}

In the view window, search for the string “cts”.


You could also use the workshop-provided alias _full_lib_report.


Task 3. Define CTS Non-Default Routing Rules

In this task you will specify CTS non-default routing rules, as well as clock cell
spacing rules.

1. Open the file scripts/ndr.tcl in an editor.


2. Review the file and answer the following questions:
Question 5. Which net segment(s) of the clock tree do the two clock
routing rules apply to?

...................................................................................................

...................................................................................................

...................................................................................................

Question 6. What are some key differences between the rules?

...................................................................................................

...................................................................................................

...................................................................................................

...................................................................................................

...................................................................................................

3. Apply the clock NDRs:

source -echo scripts/ndr.tcl

4. First, verify the routing rules that were created:

v report_routing_rules -verbose

You should see the metal layer details for each of the two rules that were
created. In addition, you should see a section for the vias.
5. Now verify where the rules were applied:

report_clock_routing_rules

The report shows which net segments (net type) the rules apply to (sink
overrides all), and the min/max layer constraints for each clock segment.


Task 4. Timing and DRC Constraints

1. Generate a port report for the master clock sources:

report_ports -verbose [get_ports *clk]

Verify that the master clock sources are input ports, and that they are all
constrained by either a Driving Cell or input Transition.
Question 7. Are all clock ports constrained by either a Driving Cell or
input Transition?

...................................................................................................

...................................................................................................

Question 8. Why is it important for clock input ports to be constrained by
set_driving_cell or set_input_transition?

...................................................................................................

...................................................................................................

2. Fix the problem found above by adding a driving cell to the ate_clk port.
Driving cells need to be added in all scenarios, because this constraint is
scenario-specific.
Specify NBUFFX16_RVT as the driving cell for the port ate_clk, then report
the clock ports again:

set_driving_cell -scenarios [all_scenarios] \
    -lib_cell NBUFFX16_RVT [get_ports ate_clk]

report_ports -verbose [get_ports *clk]


3. Take another look at the clock uncertainty numbers using
report_clocks -skew. Next, we will change the clock uncertainty
numbers to account for post-CTS propagated clock timing. This is
accomplished by reducing the uncertainty by the value that was intended to
model the skew. The uncertainty should still model the effects of clock jitter
or additional timing margin.
Apply the following commands:

foreach_in_collection scen [all_scenarios] {
    # Make each scenario current, then apply the reduced setup/hold uncertainty
    current_scenario $scen
    set_clock_uncertainty 0.1 -setup [all_clocks]
    set_clock_uncertainty 0.05 -hold [all_clocks]
}

4. Apply a max transition constraint of 0.15ns on all clocks, in all corners of
the func mode.
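A possible implementation of step 4, assuming set_max_transition's -clock_path and -corners options (the note after step 6 confirms that -clock_path [get_clocks] is applied in the current func mode):

```tcl
# Clock constraints are mode-specific, so make func the current mode first
current_mode func
# 0.15 ns max transition on all clock paths, in every corner
set_max_transition 0.15 -clock_path [get_clocks] -corners [all_corners]
```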
5. Enable the removal of clock reconvergence pessimism to eliminate the
timing pessimism of OCV timing derating on shared launch/capture clock tree
branches.
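A sketch of step 5, assuming the timer application option time.remove_clock_reconvergence_pessimism:

```tcl
# Remove OCV derating pessimism on shared launch/capture clock branches (CRPR)
set_app_options -name time.remove_clock_reconvergence_pessimism -value true
```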
6. Generate the following report to confirm the max transition setting:

v report_clock_settings

Note: To see the correct max transition information, you have to scroll down
past the second ##Global section, and search for the “Mode = func”
section which lists all the individual clocks. The report first lists Global
settings for all modes/corners, which were not set in our case. Instead, we
applied clock-specific settings (to all clocks) by using “-clock_path
[get_clocks]”, in the current func mode.


Task 5. Perform CTS and Analyze the Results

1. Build and route the clock trees. Remember to disable CCD; it is not needed
for now:

set_app_options -name clock_opt.flow.enable_ccd \
    -value false
clock_opt -to route_clock

2. In the GUI, turn off the visibility of power and ground nets, and zoom in to
have a closer look at the clock routes. If you hover the mouse cursor over a
net, the query window that appears shows the NDR routing rule that has been
applied.
3. Report the skew between all the sd_mux* pins, which were defined as sink
pins in an earlier Task. An easy way to do this is:

report_clock_qor \
-to I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*/S0 \
-corners ss_125c

You should find that the global skew is pretty small.


4. Have a look at the clock tree latency graph for SDRAM_CLK:
a. In the GUI: Window → Clock Tree Analysis Window
b. Check the little box in the top right corner next to ‘Filter clock by
corner’. This allows us to analyze the latency, since latency
calculation is done on a corner-by-corner basis (without selecting a
corner, the x-axis of the latency graph displays “levels of logic”
instead).
c. Right click on SDRAM_CLK under the func scenario, then select Clock
Tree Latency Graph of selected Corner.
5. Time permitting, perform any additional analysis which is of interest to you.

You have successfully completed the Setting up CTS lab.


Answers / Solutions

Question 1. What balance point exception is set on sd_CK, and why?


It is an [IMPLICIT IGNORE PIN]. Clock tree sink pin
endpoints are register or macro clock pins, by default. This
is an output port, and therefore ignored for balancing, by
default.

Question 2. How are the S0 (select) pins of the MUXes labeled now?
[BALANCE PIN], which represents explicit, or user-defined,
sink pin exceptions.

Question 3. How is the sd_CK port labeled now? What does this mean?
The sd_CK port is now described as [BEYOND
EXCEPTION], since the port is in the fanout of (beyond) the
BALANCE PIN exception on the mux. CTS non-default
design rules, as well as general max transition and
capacitance design rules will be applied on the “beyond
exception” sections of the clock network.

Question 4. What is the source of this generated clock?


The source is I_CLOCKING/sys_clk_in_reg/Q. This is
a divide-by-2 register, used to divide SYS_2x_CLK and
generate SYS_CLK.

Question 5. Which net segment(s) of the clock tree do the two clock
routing rules apply to?
The bottom routing rule $CTS_LEAF_NDR_RULE_NAME
(cts_w1_s2), is applied to the sink segments of the clock
net (-net_type sink).
The top rule $CTS_NDR_RULE_NAME (cts_w2_s2_vlg)
is applied to the remaining root and internal segments
of the clock net, since -net_type was not specified.


Question 6. What are some key differences between the rules?
The $CTS_LEAF_NDR_RULE_NAME rule for the sink nets
specifies only non-default spacings (it uses default
widths), while the $CTS_NDR_RULE_NAME rule has both
non-default spacings and widths, as well as via (cut)
rules.
The sink nets will be routed on M1-M5, while the root and
internal clock tree nets will be routed primarily on M4-M5.
For the sink rule, M1 non-default spacing is not specified:
This is to avoid potential DRC violations when connecting
to the standard cell pins.
For the root and internal clock tree nets, NDRs were
included for the lower non-primary CTS routing layers
M1-M3: This is generally recommended, since these nets will
eventually have to be routed on these lower layers to
connect to the clock pins.

Question 7. Are all clock ports constrained by either a Driving Cell
or input Transition?
No. You will find that the clock port ate_clk appears in
most report sections, but not in the Driving Cell or
Transition sections (Note: The Transition section
appears only if at least one input has an input transition
applied, which is not the case here). This means that there is
no driving cell or input transition constraint for this clock
port. The reason you did not see this port before is because
it is only defined as a clock in test mode.

Question 8. Why is it important for clock input ports to be constrained
by set_driving_cell or set_input_transition?
During CTS, clock insertion delays are calculated to
optimize for skew and latency. Either of the above two
constraints allows CTS to calculate the delay of the first
gate connected to the clock input port more accurately, and
therefore the clock insertion delay (a cell’s delay is a
function of its input transition time). The input port
transition time is 0ns by default, which can cause less
accurate results.

Lab 11: Routing and Post-Route Optimization, Signoff

Learning Objectives

After completing this lab, you should be able to:


• Perform routeability checks on a placed design
with clock trees
• Apply routing options
• Route secondary PG nets
• Control via optimization
• Perform route and post-route optimization
• Analyze the design for timing with SI enabled, and
perform incremental power and crosstalk
optimizations
• Optional: Perform sign-off DRC checking and
fixing, standard cell filler insertion, and sign-off
metal fill insertion.

Lab Duration:
70 minutes

Lab 11

Instructions

Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers.

Task 1. Load and Check the Post-CTS Design

1. Change to the work directory for the Routing lab, then load the post-CTS
design:

UNIX% cd lab9_11_route_signoff
UNIX% icc2_shell -gui -f load.tcl

The script will make a copy of the clock_opt block and open it.
2. Generate a timing QoR summary.

report_qor -summary

Question 1. Is timing acceptable for routing?

...................................................................................................

...................................................................................................

...................................................................................................


Task 2. Route the Design

To support a more immersive and interesting lab experience, there are no step-by-
step instructions for this lab.
Instead, you are asked to open the file run.tcl in an editor (or using the ICC II
script editor), and exercise the commands line by line. We are specifically asking
you not to just source the entire file, as this would defeat the purpose, which is to
understand how all the options and commands play together. If there is an option
that does not make sense, have a look at its man page.
The following sections provide additional information and comments. The sections
are ordered in such a way that you can refer to them as you go through the script.
If you like, you can diverge from the commands in run.tcl, or you could try
different efforts, different settings etc. Note, though, that the runtimes might vary.
All in all, the runtimes are very quick, so you are encouraged to experiment.
The following section is designed to be a guide through the lab. It contains
information on the items that have to be configured and run, and some additional
questions.
If you need help, talk to your instructor.

Pre-routing Checks
Before you route the design it is best to ensure that there are no issues that will
prevent the router from doing its job.
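For example, the following checks can be run before routing; treat the -checks value as an assumption to confirm in the check_design man page:

```tcl
# Look for routability problems, such as pins the router cannot access
check_routability
# Pre-route design checks
check_design -checks pre_route_stage
```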
Question 2. Is the design ready for routing?

...................................................................................................

...................................................................................................

...................................................................................................

Antenna
Antenna definitions are commonly supplied in a separate TCL file, specific to the
technology, using the following commands (these are the same commands used in
IC Compiler):
• define_antenna_rule
• define_antenna_layer_rule
• define_antenna_area_rule
• define_antenna_accumulation_mode
• define_antenna_layer_ratio_scale

In addition, there are application options that control how antenna violations are
handled. Use the report_app_options command shown in run.tcl.
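To see which antenna-related application options are currently set, a name pattern can be passed to report_app_options, for example:

```tcl
# List all application options whose names contain "antenna"
report_app_options *antenna*
```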


Crosstalk Prevention
Crosstalk prevention tries to ensure that timing-critical nets are not routed in parallel
over long distances. Prevention can occur in the global routing and the track assign
stages. The current recommendation is to enable prevention during the track assign
stage only. In order for crosstalk prevention to occur, crosstalk (or signal integrity)
analysis must also be enabled. In order to make post-route analysis and optimization
more interesting (showing SI violations that need to be fixed during route_opt),
you can choose to do that “artificially” by not enabling SI analysis during routing.
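A sketch of the recommended setting (prevention during track assignment only), assuming the route.global and route.track crosstalk_driven application options; as noted above, SI analysis must also be enabled for prevention to take effect:

```tcl
# No crosstalk prevention during global routing
set_app_options -name route.global.crosstalk_driven -value false
# Crosstalk prevention during track assignment
set_app_options -name route.track.crosstalk_driven -value true
```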

Secondary PG Routing
Secondary PG pins are power pins on special cells like level shifters or isolation
cells. In addition to the regular power supply provided through the standard cell
rails, these secondary power pins must be connected using the standard router.
There are a few routing parameters used to configure the routing behavior. In
addition, it is common to define non-default rules for these power connections.
After performing the secondary PG routing as shown in the script, identify the low-
to-high level shifters (in the PD_RISC_CORE voltage area) and examine their power
routing, to answer the following question:
Question 3. What is the name of the secondary PG pins, and where do
they connect to?

...................................................................................................

...................................................................................................

...................................................................................................

Routing, DRC Analysis


After completing secondary PG routing and routing any “critical” signals, you
perform auto-routing. At this stage, this routes all signal nets that have not been
routed previously. Any nets that have already been routed (clocks, secondary PG)
will not be touched again if they are DRC clean. Auto-routing runs global routing,
track assignment and detail routing.
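In its simplest form this step is a single command, optionally followed by a routing DRC check:

```tcl
# Global routing, track assignment and detail routing of all remaining nets
route_auto
# Report remaining routing DRC violations and opens
check_routes
```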
Question 4. How many detail route iterations are run by default?
(Hint: Review the man page for route_auto.)

...................................................................................................


Question 5. How do you change the default, and how do you force the
router to run through all iterations even though the router
might not see any improvements?
(Hint: report_app_options *iterat*)

...................................................................................................

...................................................................................................

Examining Routing Design Rule Violations


Use the error browser to visualize, in the GUI, any routing violations that may be
left over. On the top toolbar, select the button for the error browser.

In the popup, select zroute.err and click on Open Selected. In the new window,
you can select the errors from the list, which causes the layout view to zoom to that
violation.
Close the Error Browser.

Signal Integrity
Signal integrity analysis should be turned on before routing. This will instruct the
extraction engine to extract cross-coupling capacitances, and instruct the router to
perform timing analysis with delta delays using these coupling capacitances. This is
important when performing timing-driven routing.
For this lab, if you left SI analysis off before auto-route, turn it on afterwards to see
the SI effects, and to allow SI-related violations to be fixed during route_opt.
For better correlation with PrimeTime, you should also enable timing window
analysis, as shown in the script.

Post-Route Optimization
Post-route optimization is performed using route_opt, which performs timing,
logical DRC, area and (optionally) CCD and power optimization.
There are application options you have to set in order to enable CCD and power
optimization.
For best correlation with PrimeTime, you should enable PT delay calculation. In this
lab, don’t perform StarRC InDesign extraction in route_opt. You will be using
StarRC later when performing ECO Fusion.


ECO Fusion
After completing route_opt, it’s important to analyze signoff timing in
PrimeTime using StarRC parasitic extraction. Any violations uncovered by PT can
be fixed using PT’s physical ECO.
Using Fusion, the entire ECO process can be controlled from within ICC II.
Review the run.tcl file for the necessary commands, and perform one round of
ECO fixing. For Fusion, you will require ICC II, StarRC and PrimeTime SI.
Note that after the ECOs have been implemented, you have to analyze timing using
the command check_pt_qor. You should not run ICC II’s native timing reporting
commands (report_qor, report_timing, …).

Std Cell Fillers, ICV DRC, Metal Fill


After all post-route optimizations are complete, and any necessary ECOs have been
performed, perform standard cell filler insertion as shown in the run.tcl script.
If you want to perform signoff DRC checking, have a look at the next steps. You
will find commands for performing DRC checking and also ICV DRC auto-repair.
Please use the select_rules filter as shown, otherwise you will see many violations
which are due to a mismatch between our techfile and the ICV DRC runset.
You can review the errors generated by ICV using the error browser. If the error
browser is already open, use File > Open… and select signoff_check_drc.err.
Otherwise select the error file in the error-browser popup (the Checker column will
say “IC Validator”).
As the final step, perform metal filling using IC Validator. Have a look at the very
end of the run.tcl script.
Examine the metal fill using the GUI, as described in the lecture.

You have successfully completed the Routing lab.


Answers / Solutions

Question 1. Is timing acceptable for routing?


You should see that there are only a few remaining small
timing violations. There are also a few small max transition
and max capacitance violations (you will see them with
report_constraints -all).

Question 2. Is the design ready for routing?


Using the check_design command shows a few issues.
Examine the EMS issues in the message window by using
WindowMessage Browser Window, then use
FileOpen Message Database… to open the file
check_design.ems.
For this lab, you can ignore the EMS messages.
For the remaining non-EMS messages, you have to review
the file check_design*.log.
ZRT-022 warns about a missing default contact for layer
CO, which is entirely fine for this technology. ZRT-044
lists a few cells that do not have valid via regions, which is
something that should be addressed in the corresponding
physical library during library preparation. ZRT-585
complains about cell-internal pins, which are not relevant
to the router. Finally, please ignore ZRT-511.

Question 3. What is the name of the secondary PG pins, and where do
they connect to?
The PG pins for the low voltage are called VDDL. The router
connects them to the nearest VDD vertical straps, or
horizontal rails, which are located just outside the voltage
area boundary. You can verify the voltages using
report_power_domains.

Question 4. How many detail route iterations are run by default?


The route_auto man page states the default as 40.


Question 5. How do you change the default, and how do you force the
router to run through all iterations even though the router
might not see any improvements?
You change the default using
route_auto -max_detail_route_iterations <#>
To force route_auto to actually run through all iterations
you need to change an application option from its default
(false):
route.detail.force_max_number_iterations  true

PHYSICAL DESIGN is the process of transforming a circuit description into
the physical representation used for manufacturing.

SYNTHESIS:
Synthesis is the process of converting RTL into a technology-specific gate-level netlist.
Input files required
• .lib - timing info of the standard cells & macros.
• .v - RTL code.
• SDC - timing constraints.
• UPF - power intent of the design.
• Scan config - scan-related info such as scan chain length, scan IO, and which flops
are to be included in the scan chains.
• RC coefficient file (tluplus).
• LEF/FRAM - abstract view of the cells.
• Floorplan DEF - locations of IO ports and macros.

Steps involved in Synthesis


Translation: The RTL code, including its arithmetic operators, is converted into GTECH and DW
(DesignWare) components. These are technology-independent libraries.
• GTECH - contains basic logic gates & flops.
• DesignWare - contains complex components like FIFOs and counters.

Optimization: The Boolean equations are optimized using SoP or PoS optimization methods.


Technology mapping: The technology-independent Boolean logic equations are mapped to
technology-dependent library logic gates, based on the design constraints and the library of
available technology gates. This produces an optimized gate-level representation, which is
generally written out in Verilog.

These three steps are performed internally by the logic synthesis tool (e.g. Cadence Genus or
Synopsys Design Compiler) and are not visible to the designer.
3. Import constraints and UPF
SDC – for timing constraints
If the design consists of multiple power domains, a UPF file is needed.
4. Clock gating
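Because the clock has high switching activity, it consumes a lot of dynamic power. The clock
gating technique is used to lower this dynamic power.
A basic clock gating circuit consists of an AND gate in the clock path, with the enable as its other
input. In practice, an integrated clock gating (ICG) cell containing a latch is used so that the enable
cannot cause glitches on the gated clock.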
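To put rough numbers on this, the usual dynamic power estimate P = α·C·V²·f can be applied to a gated clock branch. This is a minimal Python sketch. The capacitance, supply, frequency and enable duty are made-up example values. The model also ignores the gating cell's own power and any clock buffers upstream of it.

```python
def dynamic_power(alpha, cap_farads, vdd, freq_hz):
    """Switching power P = alpha * C * V^2 * f, in watts."""
    return alpha * cap_farads * vdd ** 2 * freq_hz

# Hypothetical clock branch behind one clock-gating cell.
C_BRANCH = 20e-12   # 20 pF of clock load
VDD = 0.9           # supply voltage (V)
F_CLK = 1e9         # 1 GHz clock
ENABLE_DUTY = 0.25  # assumed fraction of cycles in which the enable is high

# In the alpha*C*V^2*f convention a free-running clock has alpha = 1.
ungated = dynamic_power(1.0, C_BRANCH, VDD, F_CLK)
# When gated, the branch only toggles during enabled cycles.
gated = dynamic_power(ENABLE_DUTY, C_BRANCH, VDD, F_CLK)

print(f"ungated: {ungated * 1e3:.2f} mW, gated: {gated * 1e3:.2f} mW")
```

The saving scales directly with how often the enable is low, which is why gating is most effective on registers that are idle for long periods.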
5. DFT (Design for Testing) insertion
DFT circuits make the nodes of the design controllable and observable, so that each node can be tested for manufacturing defects.
9. Outputs of Synthesis
• Netlist
• SDC
• UPF
• ScanDEF - information about the scan flops and their connectivity in the scan chains
Checks to be done after synthesis (sanity checks before floorplan)
• The RTL and netlist are logically equivalent (LEC/FM)
• Floating pins
• Multi-driven inputs
• Undriven inputs
• Undriven outputs
• Normal (non-clock) cells in the clock path
• Pin direction mismatch
• Don’t-use cells
• Setup timing
• CLP check: whether always-on buffers are placed
• Cell profiling
• Buffer count
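Some of these connectivity checks can be sketched on a toy netlist. The dictionary format (a net name mapped to its driver and load pins) and the cell/pin names are hypothetical. On a real netlist, the tool's own checks (e.g. check_design in Design Compiler) do this work.

```python
def check_nets(nets):
    """Return issue name -> sorted list of offending nets.

    `nets` maps a net name to {"drivers": [...], "loads": [...]}, where each
    entry is a "cell/pin" string (a made-up format for this sketch).
    """
    issues = {"multi_driven": [], "undriven": [], "floating": []}
    for name, conn in nets.items():
        drivers = conn.get("drivers", [])
        loads = conn.get("loads", [])
        if len(drivers) > 1:
            issues["multi_driven"].append(name)   # more than one driver
        if not drivers and loads:
            issues["undriven"].append(name)       # inputs with no driver
        if drivers and not loads:
            issues["floating"].append(name)       # output pin driving nothing
    return {kind: sorted(found) for kind, found in issues.items()}

netlist = {
    "n1": {"drivers": ["U1/Y"], "loads": ["U2/A"]},
    "n2": {"drivers": ["U3/Y", "U4/Y"], "loads": ["U5/A"]},
    "n3": {"drivers": [], "loads": ["U6/B"]},
    "n4": {"drivers": ["U7/Y"], "loads": []},
}
print(check_nets(netlist))
```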

Floor Planning:
Floorplanning is the process of placing blocks/macros in the chip/core area.
The floorplan determines the die size and the I/O pin/pad placement, and creates the power/ground
(PG) connections.

Inputs required
1. Gate level netlist
2. LEF,LIB
3. Timing constraints (SDC)
4. Power Intent (UPF / CPF)
5. FP DEF & Scan DEF

Steps involved in floor plan:

— Initialize with Chip & Core Aspect Ratio (AR)


— Initialize with Core Utilization
— Initialize Row Configuration & Cell Orientation
— Provide the Core to Pad/ IO spacing (Core to IO clearance)
— Pins/ Pads Placement
— Macro Placement by Fly-line Analysis
— Macro placement requirements also need to be considered
— Blockage Management (Placement/ Routing)
1. Initialize Core Aspect Ratio (AR)
AR = Horizontal routing resources / Vertical routing resources (AR = 1 gives a square core)

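2. Initialize with Core Utilization:
Core utilization is the fraction of the core area used for cell placement. Typically 60-70% is used, and the remaining area is left for routing.
$$\text{Core Utilization} = \frac{\text{Total Standard Cell Area} + \text{Macro Area}}{\text{Total Core Area}} \times 100\%$$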
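The utilization formula can also be turned around to size the core for a target utilization. In this Python sketch the areas are made-up values in square microns. The sizing helper assumes the tool convention AR = core height / core width, which is not the same as the routing-resource definition of AR given above.

```python
import math

def core_utilization(std_cell_area, macro_area, core_area):
    """Core utilization in percent: (std cell area + macro area) / core area * 100."""
    return (std_cell_area + macro_area) * 100.0 / core_area

def core_size_for_target(std_cell_area, macro_area, target_util_pct, aspect_ratio):
    """Core width and height that give the target utilization.

    Assumes aspect_ratio = core height / core width (a common tool
    convention, assumed here for illustration).
    """
    core_area = (std_cell_area + macro_area) * 100.0 / target_util_pct
    width = math.sqrt(core_area / aspect_ratio)
    return width, aspect_ratio * width

# Made-up areas in square microns.
print(core_utilization(420_000, 180_000, 1_000_000))   # 60.0
w, h = core_size_for_target(420_000, 180_000, 60.0, aspect_ratio=1.0)
print(f"{w:.1f} x {h:.1f} um")                          # 1000.0 x 1000.0 um
```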
3. Initialize Row Configuration & Cell Orientation
The core is divided into individual rows, and the row area is used for placing the standard cells.

Cell orientation
Fly/flight lines are virtual connections between macros, and between macros and I/O pads.
Flight lines are of three types:
1. Macro to macro fly lines
2. pin to pin fly lines
3. macro to I/O fly lines

4. Provide the core-to-pad/IO spacing

5. IO placement

6. Add
— End caps to prevent DRC violations
— Well taps to prevent latch-up

6. Power Planning
A power grid is created to distribute power to all the cells.
The metal widths are available in the LEF.
• Rings (vertical and horizontal)
— VDD and VSS rings are formed around the core and the macros
• Stripes
— Carry VDD and VSS across the chip
• Rails (special route)
— Connect VDD and VSS to the standard cells

7. Macro placement
Guidelines will be provided for macro placement:
— Reserve enough room around macros for IO routing
— Provide the necessary blockages around the macros
8. Blockages
— Placement Blockage & Routing Blockage
— Both of the Blockages can again be classified as-
• Hard, Soft and Partial Blockages
— Hard Blockage
• Complete Standard Cell Blockage
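— Soft Blockage
• Buffer-only blockage: standard cells are kept out during placement, but buffers can still be added there during optimization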
— Partial Blockage
• Partial standard cell blockage, used to avoid congestion
• Standard cells can be blocked to the required percentage value
— Keep-out/ Halo
• A halo is similar to a soft blockage (terminology in Cadence EDI)
• It is basically a keep-out margin around a macro
• A halo is tied to the macro, while other blockages are tied to a location;
i.e., if the macro is moved, the halo moves along with it

9.Create Power Domain

Checks to be done:
How to qualify Floorplan?

1. Max density
2. Check PG connections (For macros & pre-placed cells only)
3. Check the power connections to all macros
4. All the macros should be placed at the core boundary
5. Remove all unnecessary placement and routing blockages (which might
have been added during floorplanning and pre-placement)
6. Check the power connections to the power switches
7. Check the pin placement
8. Check for power-related shorts/opens in the design, and IR drop

Placement
All the standard cells are placed in the design.
• Placement Stages
— Global Placement
— Detail Placement
— Placement Legalization
— In-Place Optimizations
• Global/ Coarse Placement
— Cells are placed approximately
— Cells are not yet legally placed, and there can be overlaps
• Detail/ Legal Placement
— Cells are moved to legal locations to avoid overlaps
• Placement Legalization
— Placed Macros are legally oriented with Standard Cell Rows
• In-Place Optimizations
— Scan Chain Reordering
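This is a deliberately simplified single-row legalizer that shows the difference between global and legal placement. Cells are snapped to the site grid and pushed right until they no longer overlap. Real legalizers work across many rows and minimize total displacement. The cell names, widths and site size here are hypothetical.

```python
def legalize_row(cells, site_width, row_start, row_end):
    """Greedy single-row legalization (a simplified Tetris-style sketch).

    cells: list of (name, desired_x, width), widths a multiple of site_width.
    Cells are handled left to right, snapped to the nearest site, and pushed
    right just enough to clear the previously placed cell.
    """
    placed = []
    cursor = row_start  # first free x position in the row
    for name, x, width in sorted(cells, key=lambda c: c[1]):
        snapped = row_start + round((x - row_start) / site_width) * site_width
        legal_x = max(snapped, cursor)
        if legal_x + width > row_end:
            raise ValueError(f"row is full, cannot place {name}")
        placed.append((name, legal_x))
        cursor = legal_x + width
    return placed

# Global placement left U2 overlapping U1, and U3 off the site grid.
global_pos = [("U1", 0.0, 4), ("U2", 2.0, 4), ("U3", 9.3, 2)]
print(legalize_row(global_pos, site_width=1, row_start=0, row_end=20))
```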
Checks:
1. Placement congestion
2. Timing checks
3. Don’t-use and don’t-touch cells
4. Max transition and max capacitance
5. Cell profiling
6. CLP check
7. High fanout nets
8. Secondary PG connections

CTS:
So far we have used an ideal clock. In CTS, the physical clock tree structure is built from the clock
source to the sinks.
The clock should be distributed evenly to all sequential elements in the design.
Goal:
• Meet the clock tree DRC.
• Max. Transition.
• Max. Capacitance.
• Max. Fanout.
• Minimal skew.
• Minimum insertion delay.
The DRC limits (max transition, capacitance and fanout) are present in the lib file.
Checks to be done before CTS
1. Check legality.
2. verify PG connections.
3. Timing QoR (setup should be under control).
4. Timing DRVs: max tran, max cap, max fanout.
5. Congestion hotspots.
6. Check & qualify don’t_touch, don’t size attributes on clock components.
Clock buffers and clock inverters are used to maintain a 50% duty cycle.
Several structures are used for clock trees:
• H-Tree
• X-Tree
• Multi-level clock tree
• Fishbone

Before CTS, all clock pins are driven by a single clock source.
After CTS, a buffer tree has been built to balance the loads and minimize the skew.

To meet the insertion delay (ID) target, the number of tap points can be increased or decreased.
We can also use transport buffers/inverters or higher metal layers.
An H-tree minimizes the skew. NDRs such as DWDS (double width, double spacing) can also be used.
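An H-tree minimizes skew because every sink sits at the same wirelength from the root, by symmetry. The recursive Python sketch below generates the sinks of an ideal H-tree and checks this. It assumes unbuffered wires whose delay depends only on length, and the dimensions are arbitrary.

```python
def h_tree_sinks(x, y, half_span, levels, wire=0.0):
    """Return (x, y, wirelength_from_root) for every sink of an H-tree.

    Each level draws an "H": a horizontal bar of +/- half_span around the
    centre, then vertical bars of +/- half_span at both ends. The four tips
    become the centres of the next, half-size level.
    """
    if levels == 0:
        return [(x, y, wire)]
    sinks = []
    for dx in (-half_span, half_span):
        for dy in (-half_span, half_span):
            sinks += h_tree_sinks(x + dx, y + dy, half_span / 2, levels - 1,
                                  wire + abs(dx) + abs(dy))
    return sinks

sinks = h_tree_sinks(0.0, 0.0, half_span=400.0, levels=3)
lengths = {round(w, 6) for _, _, w in sinks}
print(len(sinks), lengths)  # 64 sinks sharing one root-to-sink wirelength
```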

Because many clock buffers are added, congestion may increase, which can cause setup and hold
violations.

Setup Fixing:

i. Upsizing the cells (increase the drive strength) in the data path.
ii. Reducing the number of buffers in the data path.
iii. Replacing a buffer with two inverters placed some distance apart, which adjusts the
delay.
iv. Using LVT cells.
v. Clock pushing (in the capture path), by adding buffers in the clock path.
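The effect of these fixes can be seen in the basic setup slack equation (required time minus arrival time). This Python sketch uses made-up path numbers in ps. It assumes a single-cycle register-to-register path and models clock pushing simply as extra capture-clock latency.

```python
def setup_slack(period, launch_latency, clk_to_q, data_delay,
                capture_latency, t_setup, uncertainty=0):
    """Setup slack = data required time - data arrival time (all in ps)."""
    arrival = launch_latency + clk_to_q + data_delay
    required = period + capture_latency - t_setup - uncertainty
    return required - arrival

# Hypothetical single-cycle reg-to-reg path on a 1000 ps clock.
base = dict(period=1000, launch_latency=200, clk_to_q=80, data_delay=900,
            capture_latency=200, t_setup=50, uncertainty=30)
print(setup_slack(**base))                               # -60 (violation)
# Fixes i/iv: upsized or LVT cells cut the data delay by 80 ps.
print(setup_slack(**{**base, "data_delay": 820}))        # 20
# Fix v: clock pushing adds 70 ps of latency to the capture clock.
print(setup_slack(**{**base, "capture_latency": 270}))   # 10
```

Clock pushing is not free. The same extra capture latency reduces the hold slack of paths ending at that flop, and the setup slack of paths launched from it.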
Hold Fixing:
Hold slack improves when the data path has more delay, so hold violations are fixed by adding
delay to the data path:
i. Downsizing the cells (decrease the drive strength) in the data path.
ii. Adding buffers/inverter pairs/delay cells to the data path.
iii. Increasing the wire load model, which can also fix hold violations.
The WLM contains the details of wire resistance and capacitance.
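The hold check compares the same arrival time against the capture edge of the same cycle. That is why adding data-path delay helps. Here is a Python sketch with hypothetical numbers in ps:

```python
def hold_slack(launch_latency, clk_to_q, data_delay,
               capture_latency, t_hold, uncertainty=0):
    """Hold slack = data arrival time - data required time (all in ps)."""
    arrival = launch_latency + clk_to_q + data_delay
    required = capture_latency + t_hold + uncertainty
    return arrival - required

# Hypothetical short path between two flops.
path = dict(launch_latency=200, clk_to_q=60, data_delay=20,
            capture_latency=260, t_hold=40, uncertainty=20)
print(hold_slack(**path))                          # -40 (violation)
# Fix ii: a hypothetical 50 ps delay cell inserted in the data path.
print(hold_slack(**{**path, "data_delay": 70}))    # 10
```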
Transition violation (it occurs only at input pins)
When a signal takes too long to transition from one logic level to the other, a transition violation is
caused. Transition violations are caused by the resistance and capacitance of the node. Fixes:
i. Upsizing the driver cell.
ii. Reducing long routed nets.
iii. Adding buffers.
Cap violation
The capacitance on a node is the sum of the input pin capacitances of the fan-out cells and the
capacitance of the net. This check ensures that the device does not drive more
capacitance than it is characterized for.
i. The violation can be removed by increasing the drive strength of the cell.
ii. By buffering some of the fan-out paths to reduce the capacitance seen by the output
pin.
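The capacitance check and the buffering fix can be illustrated with simple arithmetic. The limit and all capacitances below are made-up values in fF. The sketch assumes the inserted buffer has the same max capacitance limit as the original driver.

```python
def load_cap(net_cap, pin_caps):
    """Total load on a driver: net (wire) capacitance plus fan-out pin caps."""
    return net_cap + sum(pin_caps)

MAX_CAP = 60.0  # fF, hypothetical max_capacitance of the driver (and of the buffer)

# Before: one driver with 20 fF of wire and twelve 4 fF sink pins.
total = load_cap(net_cap=20.0, pin_caps=[4.0] * 12)
print(total, total > MAX_CAP)                  # 68.0 True

# After: eight sinks moved behind a buffer with an assumed 3 fF input pin,
# which also splits the wire between the two nets.
driver_load = load_cap(net_cap=12.0, pin_caps=[4.0] * 4 + [3.0])
buffer_load = load_cap(net_cap=10.0, pin_caps=[4.0] * 8)
print(driver_load, buffer_load)                # 31.0 42.0
```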
Max fanout (it occurs only at output pins)

Routing

Checklist Before Routing


• Placement completed
• CTS completed
• Power and ground nets routed
• Estimated congestion - acceptable
• Estimated timing – acceptable (~0 ns slack)
• Estimated max cap/trans – no violations
Different Types of Delays in ASIC or VLSI design
• Source Delay/Latency
• Network Delay/Latency
• Insertion Delay
• Transition Delay/Slew: Rise time, fall time
• Path Delay
• Net delay, wire delay, interconnect delay
• Propagation Delay
• Phase Delay
• Cell Delay
• Intrinsic Delay
• Extrinsic Delay
• Input Delay
• Output Delay
• Exit Delay
• Latency (Pre/post CTS)
• Uncertainty (Pre/Post CTS)
• Unateness: Positive unateness, negative unateness
• Jitter: PLL jitter, clock jitter

Gate delay

• Transistors within a gate take a finite time to switch. This means that
a change on the input of a gate takes a finite time to cause a change on the
output.[Magma]
• Gate delay = function of (input transition time, Cnet + Cpin).
• Cell delay is the same as gate delay.
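In Liberty (.lib) non-linear delay models, this function is stored as a 2-D table indexed by input transition and output load. The tool interpolates between table points. The Python sketch below uses plain bilinear interpolation on a hypothetical cell_rise table. Real timers may interpolate or extrapolate slightly differently.

```python
from bisect import bisect_right

def _bracket(axis, value):
    """Index i such that axis[i] <= value <= axis[i+1], clamped to the table."""
    i = bisect_right(axis, value) - 1
    return min(max(i, 0), len(axis) - 2)

def nldm_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by [slew][load].

    Points outside the table are linearly extrapolated from the edge cells.
    """
    i, j = _bracket(slews, slew), _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tc = (load - loads[j]) / (loads[j + 1] - loads[j])
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return ((1 - ts) * (1 - tc) * d00 + (1 - ts) * tc * d01
            + ts * (1 - tc) * d10 + ts * tc * d11)

# Hypothetical cell_rise table: rows = input transition (ps), cols = load (fF).
SLEWS = [10.0, 50.0, 200.0]
LOADS = [1.0, 5.0, 20.0]
CELL_RISE = [[15.0, 25.0, 60.0],
             [20.0, 31.0, 68.0],
             [35.0, 48.0, 90.0]]

print(nldm_delay(SLEWS, LOADS, CELL_RISE, 50.0, 5.0))   # grid point: 31.0
print(nldm_delay(SLEWS, LOADS, CELL_RISE, 30.0, 3.0))   # interpolated: 22.75
```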

Source Delay (or Source Latency)

• It is also known as source latency. It is defined as "the delay from the
clock origin point to the clock definition point in the design".
• Delay from the clock source to the beginning of the clock tree (i.e. the clock
definition point).
• The time a clock signal takes to propagate from its ideal waveform
origin point to the clock definition point in the design.

Network Delay(latency)

• It is also known as insertion delay or network latency. It is defined as
"the delay from the clock definition point to the clock pin of the register".
• The time the clock signal (rise or fall) takes to propagate from the clock
definition point to a register clock pin.

Insertion delay

• The delay from the clock definition point to the clock pin of the
register.

Transition delay

• It is also known as "slew". It is defined as the time taken to change
the state of the signal: the time taken for the transition from logic 0 to logic 1
and vice versa, or the time taken by the input signal to rise from 10% (20%) to
90% (80%) and vice versa.
• Transition is the time it takes for the pin to change state.

Slew

• Rate of change of the logic level. See Transition delay.
• Slew rate is the speed of transition, measured in V/ns.

Rise Time

• Rise time is the difference between the time when the signal crosses
a low threshold and the time when the signal crosses the high threshold. It
can be absolute or percent.
• Low and high thresholds are fixed voltage levels around the mid
voltage level, or they can be either 10% and 90% respectively or 20% and 80%
respectively. The percent levels are converted to absolute voltage levels at
the time of measurement by calculating percentages from the difference
between the starting voltage level and the final settled voltage level.
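The threshold-based definition can be applied directly to sampled waveform data. This Python sketch finds the threshold crossings by linear interpolation between samples. It uses the starting and final settled voltages as the 0% and 100% levels, and the sample values are made up.

```python
def rising_crossing(times, volts, level):
    """First time the waveform rises through `level`, interpolating linearly."""
    for k in range(1, len(times)):
        v0, v1 = volts[k - 1], volts[k]
        if v0 < level <= v1:
            t0, t1 = times[k - 1], times[k]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never rises through the requested level")

def rise_time(times, volts, low_frac=0.1, high_frac=0.9):
    """Rise time between two percentage thresholds of the full swing.

    The swing is measured from the starting level to the final settled level.
    """
    v_start, v_final = volts[0], volts[-1]
    swing = v_final - v_start
    t_low = rising_crossing(times, volts, v_start + low_frac * swing)
    t_high = rising_crossing(times, volts, v_start + high_frac * swing)
    return t_high - t_low

# Made-up samples: a linear 0 -> 1.0 V ramp over 100 ps, then settled.
t_ps = [0, 25, 50, 75, 100, 125]
v = [0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
tr_10_90 = rise_time(t_ps, v)
tr_20_80 = rise_time(t_ps, v, 0.2, 0.8)
# Slew rate over the 10-90% window in V/ns: (0.9 - 0.1) of the swing / rise time.
slew_v_per_ns = 0.8 * (v[-1] - v[0]) / (tr_10_90 * 1e-3)
print(f"{tr_10_90:.1f} ps, {tr_20_80:.1f} ps, {slew_v_per_ns:.1f} V/ns")
```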

Fall Time
• Fall time is the difference between the time when the signal crosses
a high threshold and the time when the signal crosses the low threshold.
• The low and high thresholds are fixed voltage levels around the mid
voltage level, or they can be either 10% and 90% respectively or 20% and 80%
respectively. The percent levels are converted to absolute voltage levels at
the time of measurement by calculating percentages from the difference
between the starting voltage level and the final settled voltage level.
• For an ideal square wave with 50% duty cycle, the rise time will be
0. For a symmetric triangular wave, this is reduced to just 50%.


• The rise/fall definition is set on the meter to 10% and 90% based on
the linear power in watts. These points translate into the -10 dB and approximately -0.5
dB points in log mode (10 log 0.1 and 10 log 0.9). The rise/fall time values
of 10% and 90% are calculated based on an algorithm that looks at the
mean power above and below the 50% points of the rise/fall times.

Path delay

• Path delay is also known as pin-to-pin delay. It is the delay from the
input pin of the cell to the output pin of the cell.

Net Delay (or wire delay)

• The difference between the time a signal is first applied to the net
and the time it reaches other devices connected to that net.
• It is due to the finite resistance and capacitance of the net. It is also
known as wire delay.
• Wire delay = fn(Rnet, Cnet + Cpin)
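One common way to evaluate fn(Rnet, Cnet + Cpin) is the Elmore delay of an RC ladder. This is a standard first-order approximation used here for illustration, not necessarily what a given tool uses. With resistance in kΩ and capacitance in fF, the result comes out in ps.

```python
def elmore_wire_delay(r_net, c_net, c_pin, segments=100):
    """Elmore delay of a wire modelled as a uniform RC ladder driving c_pin.

    The wire is cut into `segments` pieces. Each piece's resistance is
    multiplied by all the capacitance downstream of it (remaining wire
    plus the receiving pin). Units: kOhm * fF = ps.
    """
    r_seg = r_net / segments
    c_seg = c_net / segments
    delay = 0.0
    for k in range(segments):
        downstream = (segments - k) * c_seg + c_pin
        delay += r_seg * downstream
    return delay

# Hypothetical net: 1 kOhm and 100 fF of wire, 10 fF receiver pin.
print(elmore_wire_delay(1.0, 100.0, 0.0, segments=1000))   # about 50 ps (R*C/2)
print(elmore_wire_delay(1.0, 100.0, 10.0, segments=1000))  # about 60 ps (adds R*Cpin)
```

As the number of segments grows, the result approaches Rnet·Cnet/2 + Rnet·Cpin. Because both R and C scale with wire length, wire delay grows roughly with the square of the length.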
Propagation delay

• For any gate, it is measured from 50% of the input transition to the
corresponding 50% of the output transition.
• This is the time required for a signal to propagate through a gate or
net. For gates, it is the time it takes for an event at the gate input to affect the
gate output.
• For nets, it is the delay between the time a signal is first applied to the
net and the time it reaches other devices connected to that net.
• It is taken as the average of the high-to-low and low-to-high propagation delays, i.e.
Tpd = (Tphl + Tplh)/2.
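For an inverting cell, the two delays and their average follow directly from the 50% crossing times. The crossing times in this Python sketch are made-up values in ps:

```python
def propagation_delays(in_rise_50, out_fall_50, in_fall_50, out_rise_50):
    """tphl, tplh and tpd for an inverting cell, from 50% crossing times (ps)."""
    tphl = out_fall_50 - in_rise_50   # input rises -> output falls
    tplh = out_rise_50 - in_fall_50   # input falls -> output rises
    return tphl, tplh, (tphl + tplh) / 2

# Made-up 50% crossing times for an inverter.
print(propagation_delays(in_rise_50=1000, out_fall_50=1030,
                         in_fall_50=3000, out_rise_50=3050))  # (30, 50, 40.0)
```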

Phase delay

• Same as insertion delay

Cell delay

• For any gate, it is measured from 50% of the input transition to the
corresponding 50% of the output transition.

Intrinsic delay

• Intrinsic delay is the delay internal to the gate, from the input pin of the cell to the
output pin of the cell.
• It is defined as the delay between an input and output pair of a cell
when a near-zero slew is applied to the input pin and the output does not
see any load. It is predominantly caused by the internal
capacitance associated with its transistors.
• This delay is largely independent of the size of the transistors forming
the gate: increasing the transistor size lowers their resistance but increases the internal capacitances by a similar factor.

Extrinsic delay

• Same as wire delay, net delay, interconnect delay, flight time.
• Extrinsic delay is the delay associated with the interconnect, from the
output pin of a cell to the input pin of the next cell.

Input delay

• Input delay is the time at which the data arrives at the input pin of the
block from the external circuit, with respect to the reference clock.

Output delay
• Output delay is the time required by the external circuit: the data has to arrive
at the output pin of the block that long before the capturing edge of the reference
clock.
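Input and output delays reduce the time left for logic inside the block. The Python sketch below assumes ideal (zero-latency) clocks and a single-cycle path, and uses hypothetical values in ps:

```python
def input_path_budget(period, input_delay, t_setup, uncertainty=0):
    """Time left inside the block for input-port-to-register logic (ps)."""
    return period - input_delay - t_setup - uncertainty

def output_path_budget(period, output_delay, clk_to_q, uncertainty=0):
    """Time left inside the block for register-to-output-port logic (ps)."""
    return period - output_delay - clk_to_q - uncertainty

# Hypothetical 1000 ps clock with 50 ps of uncertainty.
print(input_path_budget(1000, input_delay=400, t_setup=50, uncertainty=50))     # 500
print(output_path_budget(1000, output_delay=300, clk_to_q=80, uncertainty=50))  # 570
```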

Exit delay

• It is defined as the delay in the longest path (critical path) between the
clock pad input and an output. It determines the maximum operating
frequency of the design.

Latency (pre/post cts)

• Latency is the sum of the source latency and the network
latency. Pre-CTS, the estimated latency is considered during synthesis;
after CTS, the propagated latency is considered.

Uncertainty (pre/post cts)

• Uncertainty is the amount of skew and variation in the arrival of the
clock edge. Pre-CTS, the uncertainty covers clock skew and clock jitter. After CTS,
the actual skew is seen through the propagated clocks, so the uncertainty is usually
reduced to jitter plus some margin.

Unateness

• A function is said to be positive unate in an input if a rising transition on that
input causes the output to rise or not change, and a falling transition causes the output to fall or not change.
• Negative unateness means the cell output logic is an inverted version of the
input logic, e.g. in an inverter with input A and output Y, Y is negative unate with respect to
A. Positive unate means the cell output logic is the same as that of the input.
• Positive and negative unateness are defined in the library file (the timing_sense attribute)
for an output pin with respect to some input pin.
• A clock signal is positive unate if a rising edge at the clock source
can only cause a rising edge at the register clock pin, and a falling edge at
the clock source can only cause a falling edge at the register clock pin.
• A clock signal is negative unate if a rising edge at the clock source
can only cause a falling edge at the register clock pin, and a falling edge at
the clock source can only cause a rising edge at the register clock pin. In
other words, the clock signal is inverted.
• A clock signal is not unate if the clock sense is ambiguous as a result
of non-unate timing arcs in the clock path. For example, a clock that passes
through an XOR gate is not unate because there are non-unate arcs in the
gate. The clock sense could be either positive or negative, depending on
the state of the other input to the XOR gate.
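The unateness of a combinational arc can be derived from the cell's truth table. Toggle one input from 0 to 1 for every combination of the other inputs, and see which way the output moves. This is a Python sketch; the small gate functions are illustrative stand-ins for library cells.

```python
from itertools import product

def arc_unateness(func, n_inputs, pin):
    """Classify the arc from input number `pin` to the output of `func`.

    For every combination of the other inputs, the pin is toggled 0 -> 1
    and the direction of the output change is recorded.
    """
    rises = falls = False
    for others in product((0, 1), repeat=n_inputs - 1):
        low, high = list(others), list(others)
        low.insert(pin, 0)
        high.insert(pin, 1)
        delta = func(*high) - func(*low)
        rises = rises or delta > 0
        falls = falls or delta < 0
    if rises and falls:
        return "non-unate"
    if falls:
        return "negative unate"
    return "positive unate"  # also returned if the pin never affects the output

def inv(a):
    return 1 - a

def and2(a, b):
    return a & b

def xor2(a, b):
    return a ^ b

print(arc_unateness(inv, 1, 0))    # negative unate
print(arc_unateness(and2, 2, 0))   # positive unate
print(arc_unateness(xor2, 2, 1))   # non-unate
```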

Jitter

• The short-term variations of a signal with respect to its ideal position
in time.
• Jitter is the variation of the clock period from edge to edge. It can
vary by +/- the jitter value.
• From cycle to cycle, the period and duty cycle can change slightly due
to the clock generation circuitry. This can be modeled by adding uncertainty
regions around the rising and falling edges of the clock waveform.
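Given a list of measured edge timestamps, period jitter and cycle-to-cycle jitter can be computed as below. The edge times are made-up values in ps. These are only two of several jitter metrics in use.

```python
def jitter_stats(edge_times, ideal_period):
    """Period and cycle-to-cycle jitter from rising-edge timestamps."""
    periods = [b - a for a, b in zip(edge_times, edge_times[1:])]
    period_errors = [p - ideal_period for p in periods]
    cycle_to_cycle = [b - a for a, b in zip(periods, periods[1:])]
    return {
        "periods": periods,
        "peak_to_peak_period_jitter": max(periods) - min(periods),
        "max_period_error": max(abs(e) for e in period_errors),
        "max_cycle_to_cycle": max(abs(d) for d in cycle_to_cycle),
    }

# Rising edges of a nominal 1000 ps clock (made-up measurements, in ps).
stats = jitter_stats([0, 1000, 2010, 2990, 4000], ideal_period=1000)
print(stats)
```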

Sources of Jitter

Common sources of jitter include:

• Internal circuitry of the phase-locked loop (PLL)
• Random thermal noise from a crystal
• Other resonating devices
• Random mechanical noise from crystal vibration
• Signal transmitters
• Traces and cables
• Connectors
• Receivers

Skew

• The difference in the arrival time of the clock signal at the clock pins of different
flops.
• Two types of skew are defined: local skew and global skew.

Local skew
• The difference in the arrival time of the clock signal at the clock pins of related
flops.
Global skew

• The difference in the arrival time of the clock signal at the clock pins of non-related
flops.

• Skew can be positive or negative.

• When data and clock are routed in the same direction, it is positive
skew.

• When data and clock are routed in opposite directions, it is negative skew.
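Local and global skew can be computed from the clock arrival times at the sinks. In this Python sketch the arrival times, flop names and related pairs are hypothetical. Local skew is taken as capture arrival minus launch arrival, so a positive value means the capture clock arrives later. Global skew is computed as the latest minus the earliest arrival over all flops, which is one common way to quantify it.

```python
def clock_skews(arrival, related_pairs):
    """Local skew per related flop pair, the worst of them, and global skew.

    arrival: flop name -> clock arrival time at its clock pin (ps).
    related_pairs: (launch_flop, capture_flop) pairs that share a data path.
    """
    local = {(a, b): arrival[b] - arrival[a] for a, b in related_pairs}
    worst_local = max(local.values(), key=abs)
    global_skew = max(arrival.values()) - min(arrival.values())
    return local, worst_local, global_skew

arrival = {"FF1": 500, "FF2": 530, "FF3": 480, "FF4": 560}
pairs = [("FF1", "FF2"), ("FF2", "FF3")]
print(clock_skews(arrival, pairs))
```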

Recovery Time

• Recovery specifies the minimum time that an asynchronous control
input pin must be held stable after being de-asserted and before the next
clock (active-edge) transition.
• Recovery time specifies the time the inactive edge of the
asynchronous signal has to arrive before the closing edge of the clock.
• Recovery time is the minimum length of time an asynchronous
control signal (e.g. preset) must be stable before the next active clock edge.
The recovery slack time calculation is similar to the clock setup slack time
calculation, but it applies to asynchronous control signals.
If the asynchronous control is registered, the equations shown in Equation 1 are used to
calculate the recovery slack time.

Equation 1:

• Recovery Slack Time = Data Required Time – Data Arrival Time
• Data Arrival Time = Launch Edge + Clock Network Delay to Source
Register + Tclkq + Register to Register Delay
• Data Required Time = Latch Edge + Clock Network Delay to
Destination Register – Tsetup
If the asynchronous control is not registered, the equations shown in Equation
2 are used to calculate the recovery slack time.

Equation 2:

• Recovery Slack Time = Data Required Time – Data Arrival Time
• Data Arrival Time = Launch Edge + Maximum Input Delay + Port to
Register Delay
• Data Required Time = Latch Edge + Clock Network Delay to
Destination Register – Tsetup
• If the asynchronous reset signal is from a port (device I/O), you must
make an Input Maximum Delay assignment to the asynchronous reset pin
to perform recovery analysis on that path.
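Equations 1 and 2 can be evaluated with a few lines of Python. The values are hypothetical and in ps. The recovery requirement (Tsetup in the equations) is subtracted from the required time, as in a normal setup check.

```python
def recovery_slack_registered(launch_edge, src_clk_delay, t_clk_q, reg_to_reg,
                              latch_edge, dst_clk_delay, t_setup):
    """Equation 1: the asynchronous control is driven by a register."""
    arrival = launch_edge + src_clk_delay + t_clk_q + reg_to_reg
    required = latch_edge + dst_clk_delay - t_setup
    return required - arrival

def recovery_slack_from_port(launch_edge, max_input_delay, port_to_reg,
                             latch_edge, dst_clk_delay, t_setup):
    """Equation 2: the asynchronous control comes directly from an input port."""
    arrival = launch_edge + max_input_delay + port_to_reg
    required = latch_edge + dst_clk_delay - t_setup
    return required - arrival

# Hypothetical reset de-assertion checked against the next edge of a 1000 ps clock.
print(recovery_slack_registered(0, 300, 100, 400, 1000, 320, 60))  # 460
print(recovery_slack_from_port(0, 500, 450, 1000, 320, 60))        # 310
```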
Removal Time

• Removal specifies the minimum time that an asynchronous control
input pin must be held stable before being de-asserted and after the
previous clock (active-edge) transition.
• Removal time specifies the length of time the active phase of the
asynchronous signal has to be held after the closing edge of the clock.
• Removal time is the minimum length of time an asynchronous control
signal must be stable after the active clock edge. The calculation is similar to
the clock hold slack calculation, but it applies to asynchronous control signals.
If the asynchronous control is registered, the equations shown in Equation 3 are
used to calculate the removal slack time.
• If the recovery or removal minimum time requirement is violated, the
output of the sequential cell becomes uncertain. The uncertainty can be
caused by the value set by the resetbar signal or the value clocked into the
sequential cell from the data input.

Equation 3

• Removal Slack Time = Data Arrival Time – Data Required Time
• Data Arrival Time = Launch Edge + Clock Network Delay to Source
Register + Tclkq of Source Register + Register to Register Delay
• Data Required Time = Latch Edge + Clock Network Delay to
Destination Register + Thold
• If the asynchronous control is not registered, the equations shown in
Equation 4 are used to calculate the removal slack time.

Equation 4

• Removal Slack Time = Data Arrival Time – Data Required Time
• Data Arrival Time = Launch Edge + Input Minimum Delay of Pin +
Minimum Pin to Register Delay
• Data Required Time = Latch Edge + Clock Network Delay to
Destination Register + Thold
• If the asynchronous reset signal is from a device pin, you must
specify the Input Minimum Delay constraint to the asynchronous reset pin
to perform a removal analysis on this path.
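Equations 3 and 4 can be written in the same style, with the removal requirement (Thold) added to the required time as in a normal hold check. All numbers are hypothetical and in ps. The latch edge equals the launch edge, as it usually does for a hold-type check.

```python
def removal_slack_registered(launch_edge, src_clk_delay, t_clk_q, reg_to_reg,
                             latch_edge, dst_clk_delay, t_hold):
    """Equation 3: the asynchronous control is driven by a register."""
    arrival = launch_edge + src_clk_delay + t_clk_q + reg_to_reg
    required = latch_edge + dst_clk_delay + t_hold
    return arrival - required

def removal_slack_from_port(launch_edge, min_input_delay, min_port_to_reg,
                            latch_edge, dst_clk_delay, t_hold):
    """Equation 4: the asynchronous control comes directly from an input pin."""
    arrival = launch_edge + min_input_delay + min_port_to_reg
    required = latch_edge + dst_clk_delay + t_hold
    return arrival - required

print(removal_slack_registered(0, 300, 100, 50, 0, 380, 40))  # 30
print(removal_slack_from_port(0, 100, 200, 0, 380, 40))       # -120 (violation)
```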

Clock Tree Synthesis - part 1
by signoff-scribe | Oct 16, 2017 | Weekly-Training-Sessions
Author: Nishant Lamani, Physical Design Engineer, SignOff Semiconductors
Clock Tree Synthesis (CTS) is one of the most important stages in PnR. CTS QoR decides timing
convergence & power. In most ICs, the clock consumes 30-40% of total power, so an efficient clock
architecture, clock gating and clock tree implementation help to reduce power.
Sanity checks need to be done before CTS
• Check legality.
• Check power stripes, standard cell rails & also verify PG connections.
• Timing QoR (setup should be under control).
• Timing DRVs.
• High fanout nets (like scan enable / any static signal).
• Congestion (running CTS on a congested design / a design with congestion hotspots can create more
congestion & other issues (noise / IR)).
• Remove the don’t_use attribute on clock buffers & inverters.
• Check whether all pre-existing cells in the clock path are balanced cells (CK* cells).
• Check & qualify don’t_touch, don’t size attributes on clock components.
Preparations
• Understand the clock structure of the design & the balancing requirements of the design. This helps in
coming up with the proper exceptions to build an optimum clock tree.
• Create non-default rules (check whether shielding is required).
• Set the clock transition, capacitance & fan-out.
• Decide which cells are to be used for CTS (clock buffer / clock inverter).
• Handle clock dividers & other clock elements properly.
• Come up with exceptions.
• Understand latency (from the full-chip point of view) & skew targets.
• Take care of special balancing requirements.
• Understand inter-clock balancing requirements.
Difference between High Fan-out Net Synthesis (HFNS) & Clock Tree Synthesis:
• Clock buffers and clock inverters with equal rise and fall times are used, whereas HFNS uses buffers and
inverters with relaxed rise and fall times.
• HFNS is used mostly for reset, scan enable and other static signals having high fan-outs. There is no
stringent requirement for balancing & power reduction.
• Clock tree power is given special attention, as the clock is a constantly switching signal. HFNS is mostly
performed for static signals, and hence not much attention to power is needed.
• NDR rules are used for clock tree routing.
Why buffers/inverters are inserted?
• Balance the loads.
• Meet the DRCs (Max Tran/Cap etc.).
• Minimize the skew.
What is the difference between clock buffer and normal buffer?
• Clock buffers have equal rise and fall times, therefore pulse width violations are avoided.
• In clock buffers, the beta ratio is adjusted such that rise & fall times are matched. This may increase the size of
a clock buffer compared to a normal buffer.
• Normal buffers may not have equal rise and fall times.
• Clock buffers are usually designed such that an input signal with 50% duty cycle produces an output with
50% duty cycle.
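The benefit of matched rise and fall delays shows up when a clock passes through a chain of buffers. In this Python sketch, each non-inverting stage delays the rising edge by t_plh and the falling edge by t_phl, so any mismatch accumulates in the high-pulse width. Slew effects are ignored and the delays are made up.

```python
def high_pulse_after_buffers(period, high_time, t_plh, t_phl, stages):
    """High-pulse width and duty cycle (%) after identical non-inverting buffers.

    Each stage shifts the rising edge by t_plh and the falling edge by t_phl,
    so every stage stretches the high pulse by (t_phl - t_plh).
    """
    high = high_time + stages * (t_phl - t_plh)
    return high, 100.0 * high / period

# 1000 ps clock, 50% duty cycle at the source, ten buffer stages.
print(high_pulse_after_buffers(1000, 500, t_plh=30, t_phl=20, stages=10))  # (400, 40.0)
print(high_pulse_after_buffers(1000, 500, t_plh=25, t_phl=25, stages=10))  # (500, 50.0)
```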
CTS Goals
• Meet the clock tree DRC.
• Max. Transition.
• Max. Capacitance.
• Max. Fanout.
• Meet the clock tree targets.
• Minimal skew.
• Minimum insertion delay.
Clock Tree Reference
By default, each clock tree reference list contains all the clock buffers and clock inverters in the logic
library. The clock tree reference list is used for:
• Clock tree synthesis.
• Boundary cell insertion.
• Sizing.
• Delay insertion.
Boundary cell insertions
• When you are working on a block-level design, you might want to preserve the boundary conditions of the
block’s clock ports (the boundary clock pins).
• A boundary cell is a fixed buffer that is inserted immediately after the boundary clock pins to preserve the
boundary conditions of the clock pin.
• When boundary cell insertion is enabled, a buffer from the clock tree reference list is inserted immediately
after the boundary clock pins. For multi-voltage designs, buffers are inserted at the boundary in the default
voltage area.
• The boundary cells are fixed for clock tree synthesis after insertion; they can’t be moved or sized. In addition,
no cells are inserted between a clock pin and its boundary cell.
Fig1: Boundary cell
Delay Insertion
• If a large delay is needed, instead of adding many buffers we can just add a delay cell of a particular delay value.
The advantage is smaller area and lower power. However, delay cells have high variation, so using delay cells in the clock
tree is not recommended.
Clock Tree Design Rule Constraints
• Max. Transition.
• The transition of the clock should not be too tight or too relaxed.
• If it is too tight, then we need more buffers.
• If it is too relaxed, then dynamic power is higher.
• Max. Capacitance.
• Max. Fanout.
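Once the per-net values are known, checking clock nets against these three limits is straightforward. The net names, values and limits in this Python sketch are hypothetical. In practice they come from the library, the constraints and the timer.

```python
from dataclasses import dataclass

@dataclass
class ClockNet:
    name: str
    transition_ps: float
    load_ff: float
    fanout: int

def clock_drc_violations(nets, max_tran, max_cap, max_fanout):
    """Return (net, rule, value, limit) for every clock-tree DRC violation."""
    violations = []
    for net in nets:
        checks = (("max_transition", net.transition_ps, max_tran),
                  ("max_capacitance", net.load_ff, max_cap),
                  ("max_fanout", net.fanout, max_fanout))
        for rule, value, limit in checks:
            if value > limit:
                violations.append((net.name, rule, value, limit))
    return violations

nets = [ClockNet("clk_L1", 45.0, 30.0, 8),
        ClockNet("clk_L2_a", 95.0, 52.0, 24),
        ClockNet("clk_L2_b", 60.0, 40.0, 40)]
for v in clock_drc_violations(nets, max_tran=80.0, max_cap=50.0, max_fanout=32):
    print(v)
```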
Clock Tree Exceptions
• Non-Stop Pin
• Exclude Pin
• Float Pin
• Stop Pin
• Don’t Touch Subtree
• Don’t Buffer Nets
• Don’t Size Cells
Non- Stop Pin:
• Non-stop pins let the tool trace through pins that are normally considered endpoints of the clock tree.
• Example:
• The clock pins of sequential cells driving generated clocks are implicit non-stop pins.
• Clock pins of ICG cells.
Fig2: Non Stop pin
Exclude pin:
• Exclude pins are clock tree endpoints that are excluded from clock tree timing calculation and optimization.
• The tool considers exclude pins only in calculations and optimizations for design rule constraints.
• During CTS, the tool isolates exclude pins from the clock tree by inserting a guide buffer before the pin.
• Examples:
• Implicit exclude pins:
• Non-clock input pins of sequential cells.
• Multiplexer select pins.
• Three-state enable pins.
• Output ports.
• Incorrectly defined clock pins [if the pin doesn’t have trigger-edge info].
• Cascaded clocks.

Fig3: Exclude pin


• In the above figure, beyond the exclude pin the tool never performs skew or insertion delay optimization,
but does perform design rule fixing.
Float Pin:
 Float pins are clock pins that have special insertion delay requirements; balancing is done according to the specified delay (macro modelling).
Fig4: Float pin
Stop Pin:
 Stop pins are the endpoints of the clock tree that are used for delay balancing.
 During CTS, the tool uses stop pins in calculations and optimization for both DRC and clock tree timing.
 Example:
 Clock sinks are implicit stop pins.

Fig5: Stop pin


As shown in the above figure, optimization is done only up to the stop pin.
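The pin types above can be read as rules for walking the clock tree. The following Python sketch is a simplified model, not the tool's actual algorithm: the tree, the pin names (`ff1/CK`, `icg/CK`, `mux/S`, `ram/CLK`), and all delay values are hypothetical. Stop and float pins end a branch and take part in balancing (a float pin adds an assumed internal macro delay), non-stop pins and buffers are traced through, and exclude pins end a branch but are left out of the skew calculation.

```python
from dataclasses import dataclass, field

@dataclass
class Pin:
    name: str
    kind: str                   # "buffer", "nonstop", "stop", "float" or "exclude"
    delay: float = 0.0          # delay from the parent pin, in ps (hypothetical)
    float_delay: float = 0.0    # assumed internal macro delay for a float pin
    children: list = field(default_factory=list)

def collect_sinks(root):
    """Return ({balanced sink: latency}, [excluded pins]) for a clock tree."""
    balanced, excluded = {}, []

    def walk(pin, arrival):
        t = arrival + pin.delay
        if pin.kind == "stop":
            balanced[pin.name] = t                    # endpoint used for balancing
        elif pin.kind == "float":
            balanced[pin.name] = t + pin.float_delay  # balance using the macro delay
        elif pin.kind == "exclude":
            excluded.append(pin.name)                 # DRC fixing only, no skew
        else:
            for child in pin.children:                # buffers and non-stop pins:
                walk(child, t)                        # trace through to the sinks

    walk(root, 0.0)
    return balanced, excluded

# Hypothetical tree: one flop, one flop behind an ICG, a mux select, a macro.
root = Pin("clk_root", "buffer", children=[
    Pin("ff1/CK", "stop", 100),
    Pin("icg/CK", "nonstop", 50, children=[Pin("ff2/CK", "stop", 80)]),
    Pin("mux/S", "exclude", 40),
    Pin("ram/CLK", "float", 60, float_delay=70),
])

balanced, excluded = collect_sinks(root)
skew = max(balanced.values()) - min(balanced.values())
print(balanced, excluded, skew)
```

In this model the skew (30 ps) is computed only over the balanced sinks; the excluded `mux/S` pin would still receive DRC fixing but no skew or latency optimization.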
Don’t Touch Sub-tree:
 To preserve a portion of an existing clock tree, set a don’t touch exception on that sub-tree.

Fig6: Don’t touch subtree


 CLK1 is the pre-existing clock, and path1 is optimized with respect to CLK1.
 CLK2 is the newly generated clock. The don’t touch sub-tree attribute is set with respect to C1.
 Example:
 If path1 is 300 ps and path2 is 200 ps, delay is added to path2 during balancing.
 If path1 is 200 ps and path2 is 300 ps, delay cannot be added to path1 during balancing, because path1 has the don’t touch attribute; the result is a skew violation.
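The two cases in the example can be checked with a small Python model. It assumes, as the example does, that balancing only adds delay to the faster path and that a don’t touch sub-tree cannot receive added delay; a real CTS engine has more options. The function name `balance` is hypothetical, and the path names and latencies are taken from the example.

```python
def balance(latency, dont_touch):
    """Pad faster paths up to the slowest one, skipping don't touch sub-trees.

    latency: dict path -> insertion delay in ps
    dont_touch: set of paths whose sub-tree must not be modified
    Returns (delay added per path, remaining skew).
    """
    target = max(latency.values())       # slowest path sets the target latency
    added = {}
    for path, lat in latency.items():
        gap = target - lat
        if gap > 0 and path not in dont_touch:
            added[path] = gap            # insert delay on the faster path
    final = {p: lat + added.get(p, 0) for p, lat in latency.items()}
    skew = max(final.values()) - min(final.values())
    return added, skew

print(balance({"path1": 300, "path2": 200}, dont_touch={"path1"}))  # ({'path2': 100}, 0)
print(balance({"path1": 200, "path2": 300}, dont_touch={"path1"}))  # ({}, 100)
```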
Don’t Buffer Net:
 Don’t buffer nets improve results by preventing the tool from buffering certain nets.
Note: Don’t buffer nets have higher priority than DRC; CTS does not add buffers on such nets.
 Example:
 If a path is a false path, there is no need to balance it, so set the don’t buffer net attribute on it.
Don’t Size Cell:
 To prevent sizing of cells on the clock path during CTS and optimization, identify those cells as don’t size cells.
Specifying Size-Only Cells:
 During CTS and optimization, size-only cells can only be sized; they are not moved or split.
 If a size-only cell overlaps an adjacent cell after sizing, it might be moved during the legalization step.
Implementing Clock Tree:
To implement the clock tree, use the clock_opt command, which performs CTS and incremental physical optimization.
 Synthesize the clock tree:
 Before implementing the clock tree, the tool upsizes and possibly moves the existing clock gates, which improves the quality of results (QoR) and reduces the number of clock tree levels.
 Optimize the clock tree using the following steps:
 Buffer relocation.
 Buffer sizing.
 Gate relocation.
 Gate sizing.
 Improve skew.
 Delay insertion.
 Perform inter-clock delay balancing.
 Balancing is done between flops driven by different clocks.
 The clock groups to be balanced against each other must be specified.
 Perform detail routing of the clock nets (using NDR rules).
 Apply non-default routing (NDR) rules to the clock nets:
 Double width.
 Double spacing.
 Shielding.
 By default, the tool applies these routing rules all the way to the sink pins. It is better to use normal routing rules at the sink pins, because this reduces congestion and makes it easier to tap the clock at the sinks.
Fig7: Non-Default Routing
 Perform RC extraction of the clock nets and compute accurate clock arrival times.
 Adjust the I/O timings.
 After implementing the clock tree, the tool can update the input and output delays to reflect the actual clock arrival times.
 Perform power optimization.
 Use a large maximum clock-gating fanout when inserting ICG cells.
 Merge ICG cells that have the same enable signal.
 Perform power-aware placement of ICG cells and registers.
 Check and fix any congestion hotspots.
 Optimize the scan chain.
 Fix the placement of the clock tree buffers and inverters.
 Perform placement and timing optimization.
 Check for major hold time violations.
Synthesis
by signoff-scribe | Oct 17, 2017 | Random-Blogs
Author: Batchu Sri Sai Chaitanya, Physical Design Engineer, Signoff Semiconductors
Synthesis is the process of converting RTL (synthesizable Verilog code) into a technology-specific gate-level netlist (nets, sequential and combinational cells, and their connectivity).
Goals of Synthesis
1. To get a gate level netlist
2. Inserting clock gates
3. Logic optimization
4. Inserting DFT logic
5. Maintaining logic equivalence between the RTL and the netlist
Input files required
1. Tech related:
 .tf- technology-related information.
 .lib- timing information of standard cells and macros.
2. Design related:
 .v- RTL code.
 SDC- timing constraints.
 UPF- power intent of the design.
 Scan config- scan-related information such as scan chain length, scan I/Os, and which flops are to be included in the scan chains.
3. For physical-aware synthesis:
 RC coefficient file (TLUPlus).
 LEF/FRAM- abstract views of the cells.
 Floorplan DEF- locations of I/O ports and macros.
Synthesis steps
Fig1: Synthesis Flow
1. Analyze
 Checks the RTL syntax and generates intermediate files.
2. Elaborate
 Brings all lower-level blocks into the synthesis tool.
 All the code, including arithmetic operators, is converted into GTECH and DW (DesignWare) components, which are technology-independent libraries.
 GTECH- contains basic logic gates and flops.
 DesignWare- contains complex components such as FIFOs and counters.
 Elaborate performs the following tasks:
 Analyzes the design hierarchy.
 Removes empty switches and dead branches.
 Executes initial commands.
 Detects asynchronous resets.
 Converts decision trees to multiplexers.
 Converts synchronous logic to D-latches/DFFs.
 FSM pass
 Detects FSM logic and extracts the number of input, output, and state bits.
 Converts FSM logic to basic logic.
 Memory pass
 Merges DFFs into memory write (memwr) and memory read (memrd) cells.
 Consolidates memwr/memrd cells.
 Generates memory (mem) cells.
 Maps mem cells to basic logic.
3. Import constraints and UPF
Once the design has been converted into technology-independent cells, timing constraints are imported from the SDC file.
If the design has multiple power domains, the power domains, isolation cells, level shifters, power switches, and retention flops are inserted according to the UPF.
4. Clock gating
Because the clock has high switching activity, it consumes a lot of dynamic power. Clock gating is one technique to lower dynamic power. In load-enabled flops, the flop outputs switch only when the enable is on, but the clock switches continuously, increasing dynamic power consumption.
Converting load-enable circuits into clock-gating circuits reduces dynamic power. A simple clock-gating circuit consists of an AND gate in the clock path with the enable as one input. However, if the enable changes while the clock is high, a glitch appears on the gated clock.

Fig2: Load enabled register bank


Fig3: Clock gated register bank

Fig4: Waveform for clock gate


To remove the glitches caused by the AND gate, an integrated clock gate (ICG) is used. It contains a negative-level-sensitive latch (transparent while the clock is low) followed by an AND gate.
Fig5: Integrated clock gated register bank

Fig6: Waveform for ICG
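The glitch and the ICG fix can be reproduced with a sample-level Python model. Assumptions not in the source: ideal zero-delay gates, a clock sampled 8 times per period with a 50% duty cycle, and an enable that rises in the middle of the first high phase. The helper names are hypothetical. The plain AND gate passes a clipped 2-sample pulse, while the ICG latch holds the enable during the high phase, so only full-width pulses reach the gated clock.

```python
def and_gate(clk, en):
    """Plain AND-gate clock gating: gclk follows en directly."""
    return [c & e for c, e in zip(clk, en)]

def icg(clk, en):
    """Latch-based ICG: latch is transparent while clk is low, then AND."""
    q, out = 0, []
    for c, e in zip(clk, en):
        if c == 0:
            q = e          # latch passes en only during the low phase
        out.append(c & q)  # during the high phase the latched value is held
    return out

def pulse_widths(sig):
    """Widths (in samples) of each run of 1s in a sampled signal."""
    widths, run = [], 0
    for s in sig:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths

clk = [0, 0, 0, 0, 1, 1, 1, 1] * 2  # two clock periods, high for 4 samples
en = [0] * 6 + [1] * 10             # enable rises while the clock is high

print(pulse_widths(and_gate(clk, en)))  # [2, 4]  -> clipped first pulse
print(pulse_widths(icg(clk, en)))       # [4]     -> full-width pulse only
```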


Clock gating makes the design more complex: both timing closure and clock-gating (CG) timing closure become harder. Clock gating also adds gates to the design. Hence the minimum bit width (the minimum register bit width to be clock gated) should be chosen wisely; otherwise the overall dynamic power consumption may increase.
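The minimum-bit-width trade-off can be written as a break-even calculation. This is a first-order sketch with hypothetical relative power numbers: each flop's clock pin costs p_ff_clk per cycle when it is clocked, the ICG costs p_icg every cycle because it always sees the clock, and duty is the fraction of cycles in which the enable is on. Enable logic and wiring are ignored. Gating an N-bit bank saves power when p_icg + duty·N·p_ff_clk < N·p_ff_clk, that is, when N > p_icg / ((1 - duty)·p_ff_clk).

```python
import math

def gated_power(n_bits, duty, p_ff_clk, p_icg):
    """Relative clock power of an n-bit bank behind one ICG (first-order model)."""
    return p_icg + duty * n_bits * p_ff_clk

def ungated_power(n_bits, p_ff_clk):
    """Relative clock power when every flop is clocked every cycle."""
    return n_bits * p_ff_clk

def min_bit_width(duty, p_ff_clk, p_icg):
    """Smallest bank width for which gating strictly saves clock power."""
    if duty >= 1.0:
        return None                     # enable always on: gating never pays off
    threshold = p_icg / ((1.0 - duty) * p_ff_clk)
    return math.floor(threshold) + 1    # first integer strictly above threshold

# Hypothetical numbers: ICG costs 3x a flop clock pin, enable on 25% of cycles.
print(min_bit_width(0.25, 1.0, 3.0))  # 5
```

With these assumed numbers, a 4-bit bank breaks even (4.0 vs 4.0) and gains nothing, while a 5-bit bank saves power (4.25 vs 5.0).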
5. Compile
 Performs Boolean optimization.
 Maps all the cells to technology libraries.
 Performs logic and design optimization.
6. Optimization
 Logic optimization
 Constant folding
 Detect identical cells
 Optimize muxes (remove dead branches)
 Consolidate muxes and reduce their inputs (many to single)
 Remove DFFs with constant values
 Reduce the word size of cells
 Remove unused cells and wires
 Design optimization
 Reduce TNS and WNS
 Power optimization
 Area optimization
 Fix timing DRVs
 Incremental clock gating
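Two of the logic optimizations listed above, constant folding and removal of unused cells, can be sketched on a toy netlist. The netlist format (gate name mapped to an operator and its input nets, listed in topological order), the gate names, and the tied-low input are hypothetical. Real tools also simplify partially constant gates (for example, an AND with a constant 1 input becomes a wire); this sketch does not.

```python
def fold_constants(gates):
    """Return {gate: 0 or 1} for gates whose output is fixed by constant inputs."""
    const = {}
    for name, (op, ins) in gates.items():       # gates listed in topological order
        vals = [const.get(i, i) for i in ins]   # substitute already-folded nets
        if op == "AND":
            if 0 in vals:
                const[name] = 0
            elif all(v == 1 for v in vals):
                const[name] = 1
        elif op == "OR":
            if 1 in vals:
                const[name] = 1
            elif all(v == 0 for v in vals):
                const[name] = 0
        elif op == "NOT" and vals[0] in (0, 1):
            const[name] = 1 - vals[0]
    return const

def live_gates(gates, outputs, const):
    """Walk back from the outputs; gates never reached are unused logic."""
    live, stack = set(), list(outputs)
    while stack:
        net = stack.pop()
        if net in gates and net not in const and net not in live:
            live.add(net)
            stack.extend(i for i in gates[net][1] if isinstance(i, str))
    return live

# Hypothetical netlist: primary inputs a, b; primary output y; one input tied low.
gates = {
    "n1": ("AND", ["a", 0]),      # tied low, so n1 is constant 0
    "n2": ("NOT", ["n1"]),        # therefore constant 1
    "n3": ("AND", ["b", "n2"]),
    "n4": ("OR", ["a", "b"]),     # drives nothing
    "y":  ("OR", ["n3", "n1"]),
}
const = fold_constants(gates)
live = live_gates(gates, ["y"], const)
print(const, sorted(live), sorted(set(gates) - live - set(const)))
```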
7. DFT (Design for Testing) insertion
 DFT circuits are used to test every node in the design.
 The more nodes that can be tested with the targeted patterns, the higher the coverage.
 To get more coverage, the design needs to be more controllable and observable.
 For the design to be more controllable, we need more control points (muxes that provide an alternate path to propagate the pattern).
 For the design to be more observable, we need more observe points (scannable flops that capture the value at that node).
 Scan mode is used to test manufactured devices for stuck-at faults and delay faults.
 Scan mode is implemented using scan chains.
 Scan chains are used in scan-based designs to propagate the test data.
 Scan chains make the design more controllable and observable.
 Each scan chain takes the pattern in through its scan input and shifts it out through its scan output.
 A scan chain consists of scan flops in which the output of each scan flop is connected directly to the scan input of the next flop.
 Stages of scan mode:
 The pattern is applied through the scan input port.
 Scan shift- Scan enable is set to 1. The pattern is applied at the scan input and shifted through the scan flops until all the flops are loaded with the test pattern.
 Scan capture- Scan enable is set to 0. In one clock cycle, the values loaded in the flops propagate through the combinational logic and are captured at the D pins of the next flops.
 Scan enable is set to 1 again, and the captured response is shifted out through the scan output port.
 The scan chain length and the number of scan chains have to be chosen carefully: longer scan chains increase the pattern shift time, while more scan chains increase the number of scan I/O ports.
Fig7: Scan chain
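The shift, capture, and unload stages can be traced on a three-flop chain in Python. The capture logic below is an arbitrary hypothetical function standing in for the combinational circuit between the flops, and the flop and chain counts in the last line are illustrative only. `shift_cycles` (a hypothetical helper) shows the trade-off described above: each pattern needs as many shift cycles as the longest chain has flops, while each extra chain adds a scan-in and a scan-out port.

```python
def shift(chain, scan_in_bits):
    """Shift mode (SE=1): SI -> ff0 -> ff1 -> ... -> last flop -> SO."""
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(chain[-1])   # last flop drives the scan output
        chain = [bit] + chain[:-1]   # every flop loads its neighbour's Q
    return chain, scan_out

def capture(q):
    """Capture mode (SE=0): one clock through hypothetical combinational logic."""
    return [q[2], q[0] & q[1], q[1] | q[2]]

def shift_cycles(num_flops, num_chains):
    """Shift cycles per pattern with flops spread evenly over the chains."""
    return (num_flops + num_chains - 1) // num_chains

pattern = [1, 1, 0]                                    # values wanted in ff0..ff2
loaded, _ = shift([0, 0, 0], list(reversed(pattern)))  # first bit in ends up in ff2
captured = capture(loaded)
_, response = shift(captured, [0, 0, 0])               # next pattern would shift in here

print(loaded, captured, response)  # response comes out last flop first
print(shift_cycles(10_000, 8), shift_cycles(10_000, 100))
```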
8. Compile incremental
 Technology mapping of DFT circuits
 Optimization of the design
9. Outputs of Synthesis
 netlist
 SDC
 UPF
 ScanDEF- information of scan flops and their connectivity in a scan chain
Checklist
 Check if the RTL and netlist are logically equivalent (LEC/FM).
 Check that the SDC and UPF are generated after synthesis, and check their completeness.
 Check if there are any assign statements.
 Checks related to timing
 Combinational loops
 Unclocked registers
 Unconstrained I/Os
 Missing I/O delays
 Unexpandable clocks
 Master-slave separation
 Multiple clocks
 Checks related to design
 Floating pins
 Multi-driven inputs
 Undriven inputs
 Undriven outputs
 Normal cells in the clock path
 Pin direction mismatch
 Don’t use cells
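The first timing check, combinational loops, amounts to finding a cycle in the netlist graph that does not pass through a register. The following is a minimal Python sketch using depth-first search on a hypothetical cell-level fanout graph in which flops and latches end a path; the cell names and the function `find_comb_loop` are illustrative, and real tools work on pin-level timing arcs.

```python
def find_comb_loop(fanout, sequential):
    """Return one combinational loop as a list of cells, or None.

    fanout: dict cell -> list of cells it drives
    sequential: set of flops/latches, which break combinational paths
    """
    GREY, BLACK = 1, 2
    state = {}   # cell -> GREY (on the current DFS path) or BLACK (fully explored)
    path = []

    def dfs(cell):
        state[cell] = GREY
        path.append(cell)
        for nxt in fanout.get(cell, []):
            if nxt in sequential:
                continue                          # a register ends the path
            if state.get(nxt) == GREY:
                return path[path.index(nxt):]     # back edge: loop found
            if nxt not in state:
                loop = dfs(nxt)
                if loop:
                    return loop
        path.pop()
        state[cell] = BLACK
        return None

    for cell in fanout:
        if cell not in sequential and cell not in state:
            loop = dfs(cell)
            if loop:
                return loop
    return None

# Hypothetical netlists: u1 -> u2 -> u3 -> u1 is a true combinational loop;
# in the second graph the loop passes through ff1, so it is not reported.
looped = {"u1": ["u2"], "u2": ["u3"], "u3": ["u1", "ff1"], "ff1": ["u4"], "u4": []}
broken = {"u1": ["ff1"], "ff1": ["u2"], "u2": ["u1"]}
print(find_comb_loop(looped, {"ff1"}), find_comb_loop(broken, {"ff1"}))
```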
Placement Commands
Sanity checks in placement (ICC2 tool):
 check_legality --------------->> Checks the legality of the current placement

 check_design -------------->> Runs pre-defined or user-defined checks on the current design

 check_scan_chain ------->> Checks scan chain structural consistency, based on the scan chain information stored in the current design

 report_design -------------->> Reports netlist, floorplan, routing, and library information for the current block
 report_threshold_voltage_groups -->> Reports statistics on cell count and area by threshold voltage group name

 report_timing ------------------->> Displays timing information about a design

 report_power -------------------->> Calculates and reports dynamic and static power for the design or instance

 report_qor ------------------------->> Displays QoR information and statistics for the current design

Sanity checks in placement (Innovus tool):

 checkPlace ----------------------->> Checks for placement violations, such as those caused by preplaced cells or blocks

 checkDesign -all ---------------------->> Runs pre-defined or user-defined checks on the current design

 report_power ----------------------->> Generates a power report

 timeDesign -prePlace ---------------->> Gives an idea of the zero-wire-load timing of the design

 report_timing ------------------->> Displays timing information about a design

ICC-II PLACEMENT RELATED COMMANDS:

magnet_placement -logical_level 1 { [get_ports* ] * [all_macro_cells*] } ---------->> Uses macros and I/O ports as magnets that pull their connected cells closer, to improve congestion and timing

set_app_var physopt_hard_keepout_distance 10 ------------------>> Sets a keepout margin, a region around the boundary of fixed cells in a block in which no other cells are placed

printvar physopt_hard_keepout_distance ----------------------------->> Prints the current value of the keepout margin variable (the outer keepout margin for hard macros)

create_bound -name bound_1 -effort medium (cell name *) --------------------->> Defines a move bound, which constrains where the listed cells can be placed

set_app_options -name place.coarse.congestion_driven_max_util -value <util> ------>> Specifies a maximum utilization that controls how densely the tool can place cells

set_app_options -name place.coarse.max_density -value <density> ------>> Specifies a maximum density that controls how densely the tool can place cells

create_placement -timing_driven -continue_on_missing_scandef ------->> Global placement

place_opt -effort medium -area -power -skip_initial_placement -continue_on_missing_scandef -------------->> Detail placement (legal placement)

INNOVUS PLACEMENT RELATED COMMANDS:

specifyScanChain scan1 -start { } -stop { } --------------------->> Specifies the scan chain in the design

placeDesign --------------------------------->> Places the standard cells in the core area

placeDesign -incremental ------------------->> Incremental placement to spread out cells

optDesign -preCTS ---------------------->> Optimizes setup and congestion violations in the placement stage

setPlaceMode -congEffort high ---------->> Sets the congestion effort to high before running placeDesign

congRepair ------------------>> Incremental placement based on the trialRoute congestion results

setPlaceMode -modulePadding module factor --------------------->> Specifies a module that needs padding (placement clearance)

createDensityArea { x1 y1 x2 y2 } factor ------------------>> Creates density screens, also known as partial placement blockages

generate_fence ---------------------->> Creates bounds (fences) to overcome congestion

timeDesign -preCTS -prefix prects -drvReports -expandReg2Reg -slackReports ---------->> Reports the number of violating paths in setup mode

reportCongestion -overflow => Reports the congestion overflow

describeCongestion => Reports horizontal and vertical congestion

report_timing -path_type end_slack_only -late -max_paths { }

report_timing -path_type end_slack_only -early -max_paths { }
