Synthesis and Optimization Stages of DC
IC_learner
Tcl and Design Compiler (8) - DC logic synthesis and optimization
If there is an error in this article, please leave a message to correct it. Except for the blog posts that are prohibited from being reprinted, other blog posts can be reposted.
After the timing paths, operating environment, design rules, and so on have been constrained, DC can synthesize the design and optimize its timing. The optimization steps of DC are explained below. When the normal flow cannot meet timing, we need to write scripts that help DC optimize further. The theoretical part is based on logic synthesis only and does not involve physical library information. In the hands-on part, we will work in DC's topographical mode. (This article mainly draws on Yu Xiqing's "A Practical Course on ASIC Design" for the summary and the extended experiments.)
The main contents are:
· DC logic synthesis and optimization process
· Timing optimization methods
· Hands-on practice
It mainly includes: Architectural-Level Optimization in the first stage, Logic-Level Optimization in the second stage, and Gate-Level Optimization in the final stage.
(1) Architectural-Level Optimization
Architectural-level optimization includes the following:
https://www.cnblogs.com/iclearner/p/6636176.html 1/35
27/06/2023, 18:20 Tcl and Design Compiler (eight) - DC logic synthesis and optimization - IC_learner - 博客园
① Select the most suitable structure or algorithm in DesignWare to implement the function of the circuit.
② Data-path Optimization:
Choose algorithms such as CSA (carry-save adder) to optimize the design of the data path.
③ Sharing Common Subexpressions:
That is, when several expressions contain a common subexpression, that subexpression is shared. For example, given the equations:
[...] none. At this time, if we still need to share resources for arithmetic operations, we must write the sharing explicitly in the RTL code, as shown below:
The RTL code contains the topology of the circuit. HDL compilers parse expressions from left to right, and parentheses have higher precedence. DesignWare in DC takes this order as its starting point.
The total delay of the circuit is equal to the delay of one multiplier plus the delay of 4 adders. To reduce the delay, we can change the order of the terms or use parentheses to force a different topology, for example:
The total delay of the circuit is now equal to the delay of one multiplier plus the delay of 2 adders, which is 2 adder delays less than the original circuit.
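The effect of parentheses on the critical path can be checked with a small Python model. This is only a sketch: the expression `A*B + C*D + E + F + G` and the unit delays `T_MUL` and `T_ADD` are assumptions made for illustration, not the equation from the original figure.

```python
# Hypothetical unit delays, chosen only for illustration.
T_MUL = 4.0
T_ADD = 1.0

def delay(expr):
    """Arrival time at the output of a nested-tuple expression.

    Leaves are input names (arrival time 0); inner nodes are
    (operator, left, right) tuples.
    """
    if isinstance(expr, str):
        return 0.0
    op, left, right = expr
    cost = T_MUL if op == "*" else T_ADD
    # A node can only start once its slower operand has arrived.
    return max(delay(left), delay(right)) + cost

# Left-to-right parse of  A*B + C*D + E + F + G  (a chain of 4 adders).
chain = ("+", ("+", ("+", ("+", ("*", "A", "B"), ("*", "C", "D")), "E"), "F"), "G")

# Same function with parentheses:  (A*B + C*D) + ((E + F) + G).
balanced = ("+", ("+", ("*", "A", "B"), ("*", "C", "D")), ("+", ("+", "E", "F"), "G"))

chain_delay = delay(chain)        # one multiplier + 4 adders
balanced_delay = delay(balanced)  # one multiplier + 2 adders
print(chain_delay, balanced_delay)
```

With these assumed delays the regrouped expression saves two adder delays, because `(E + F) + G` is computed in parallel with the multipliers.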
(2) Logic-Level Optimization
After architectural optimization, the function of the circuit is represented by GTECH cells. During logic-level optimization, structuring and flattening can be performed.
① Structuring:
set_structure true
② Flattening:
Flattening reduces the combinational logic to two levels, turning it into a sum-of-products (SOP) circuit, that is, an AND stage followed by an OR stage.
This optimization is mainly used to improve speed, and the resulting circuit area may be large. Flattening is enabled with the following command:
set_flatten true -effort low | medium | high (choose one of low, medium, or high)
The default value of the "-effort" option is low, which gives good results for most designs; if the circuit does not flatten easily, the optimization stops. With "-effort medium", DC spends more CPU time trying to flatten the design. With "-effort high", the flattening process continues until it completes, which may take a lot of time.
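To make the area cost of flattening concrete, the following Python sketch expands a factored expression into sum-of-products form by distributing AND over OR. It is a toy model, not DC's algorithm: it handles only AND/OR over positive literals, and the example expression is an assumption.

```python
from itertools import product

def to_sop(expr):
    """Expand a nested ("and" | "or", ...) expression into a set of product terms.

    Each product term is a frozenset of literal names; the set of terms is
    the OR of those products. Negation is not modeled in this sketch.
    """
    if isinstance(expr, str):
        return {frozenset([expr])}
    op, *args = expr
    parts = [to_sop(arg) for arg in args]
    if op == "or":
        # OR: simply collect the product terms of every operand.
        return set().union(*parts)
    # AND: take one product term from each operand and merge them.
    return {frozenset().union(*combo) for combo in product(*parts)}

# (a + b)(c + d): two levels of factored logic with 4 literals.
factored = ("and", ("or", "a", "b"), ("or", "c", "d"))
sop = to_sop(factored)
terms = sorted("".join(sorted(term)) for term in sop)
literal_count = sum(len(term) for term in sop)
print(terms, literal_count)
```

The flattened form has four product terms and eight literals instead of four, which illustrates why a two-level circuit can be faster but larger.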
(3) Gate-Level Optimization
During gate-level optimization, Design Compiler starts mapping and completes the gate-level implementation of the circuit. It consists of four phases:
Phase 1: delay optimization; Phase 2: design rule fixing; Phase 3: design rule fixing at the expense of timing; Phase 4: area optimization.
If we add area constraints to the design, Design Compiler will try to reduce the area of the design in the final phase (Phase 4). Gate-level optimization has to map both combinational functions and sequential functions:
Combinational mapping: DC selects combinational cells from the target library to build a design that meets the timing and area requirements, as shown in the figure below:
Sequential mapping: DC selects sequential cells from the target library to build a design that meets the timing and area requirements. To improve speed and reduce area, DC may select more complex sequential cells, as shown below:
(4) Other optimizations (these require the corresponding synthesis option to be enabled)
For example, when one register drives many registers, it may violate the design rules. DC will duplicate the driving register and divide the driven registers among the copies, as shown in the following figure:
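The idea of splitting a high-fanout load can be sketched in Python. This is a simplified model that assumes the only design rule is a maximum fanout count; the register names and the limit of 4 are hypothetical, and a real tool also considers capacitance and transition time.

```python
import math

def split_fanout(loads, max_fanout):
    """Distribute the driven registers over the fewest driver copies
    such that no copy drives more than max_fanout loads."""
    copies = math.ceil(len(loads) / max_fanout)
    # Round-robin assignment keeps the groups balanced.
    return [loads[i::copies] for i in range(copies)]

loads = [f"reg{i}" for i in range(10)]   # hypothetical driven registers
groups = split_fanout(loads, max_fanout=4)
for index, group in enumerate(groups):
    print(f"driver copy {index}: {group}")
```

Ten loads with a limit of four need three copies of the driving register, each driving three or four loads.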
When a module is instantiated more than once, DC copies the module for each instance when it compiles. Each instance gets its own copy with a unique name, so DC can optimize and map each copy according to its own environment, as shown in the following figure (the module names are unique):
In DC, we can use the uniquify command to generate a uniquely named copy of each module in the design. When DC compiles the design, it also does this automatically. The variable uniquify_naming_style can be used to control how each copy is named.
(1) When the violation is severe, that is, when the timing violation is more than 25% of the clock period, the RTL code has to be modified.
(2) When the timing violation is below 25%, the following timing optimization methods are available:
The compile_ultra command supports the DFT flow and is simple to use. Its options are:
When using the compile_ultra command, all DesignWare hierarchy is automatically ungrouped if the following variables are set:
That is to say, an adder or a multiplier that you use would normally be synthesized as an IP core, that is, as a separate module. After setting the above variables, the module boundary no longer exists after synthesis, and you cannot tell which gates belong to the adder and which belong to the multiplier.
When using the compile_ultra command with the following variable setting, any module in the design whose size is less than or equal to the value of the variable has its hierarchy automatically ungrouped:
For example, suppose module A is a small multiplier that instantiates module B, and module B is a small adder. If we synthesize without this setting, we can see which gates implement the multiplier in module A and which gates implement the adder in module B; there is a hierarchy and a boundary between module A and module B. After applying the setting, we can no longer see the hierarchy of module A or module B. We may see a particular AND gate, but we cannot tell whether it is part of the multiplier or part of the adder.
For the best results, we recommend using the compile_ultra command together with the DesignWare library.
Boundary optimization means that during compilation (synthesis), Design Compiler propagates constants, unconnected pins, and complement (inversion) information across module boundaries, as shown in the following figure:
That is to say, boundary optimization simplifies the logic attached to boundary pins that carry a fixed level or fixed logic.
Registers can also be optimized. For example, the following circuit contains both combinational logic and registers:
The register-to-register path delay of the second stage is 10.2 ns while the clock period is 10 ns, so this path violates timing. The register-to-register path delay of the first stage is 7.5 ns, so it has timing margin. With the optimize_registers command, part of the combinational logic of the second stage can be moved to the first stage, so that every register-to-register path delay is shorter than the clock period and the register setup requirement is met. The optimize_registers command optimizes timing first and then area. After optimization, the function of the circuit is unchanged at the input/output boundaries of the block. This command only optimizes gate-level netlists.
Besides using this command on its own, you can also add the -retime option when compiling (it seems that only compile_ultra has this option). The -retime option works as follows: when one path fails timing and an adjacent path has margin, DC moves logic between the two paths so that both meet their requirements. This is also called adaptive retiming, as shown below:
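The retiming arithmetic for the example above can be worked through in Python. This is a sketch under two assumptions: register setup time is ignored, and the amount of logic moved (1.0 ns) is a hypothetical value chosen for illustration.

```python
CLOCK_PERIOD = 10.0  # ns, from the example in the text

def slacks(stage_delays, period=CLOCK_PERIOD):
    """Setup slack of each register-to-register stage (setup time ignored)."""
    return [period - d for d in stage_delays]

def retime(front, back, moved):
    """Move `moved` ns of combinational logic from the back stage to the front."""
    return front + moved, back - moved

before = slacks([7.5, 10.2])                 # second stage violates timing
front, back = retime(7.5, 10.2, moved=1.0)   # hypothetical amount of logic moved
after = slacks([front, back])

print("before:", before)
print("after: ", after)
```

Any amount of moved logic between 0.2 ns and 2.5 ns would make both stages meet the 10 ns period in this simplified model.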
Pipeline optimization of purely combinational logic is illustrated by the following example:
The circuit on the left is purely combinational with a path delay of 23.0 ns. Pipelining it gives the circuit on the right, and the throughput of the circuit clearly increases. Note that when using this command, the pipeline registers must already be present in the RTL design; otherwise DC does not know where these registers come from.
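The throughput gain from pipelining can be estimated with a few lines of Python. The stage split `[8.0, 7.5, 7.5]` and the register overhead of 0.5 ns are assumptions for illustration only; the original figure's stage delays are not known.

```python
def pipeline_metrics(stage_delays, reg_overhead):
    """Return (minimum clock period, latency) for a pipelined data path.

    The period is set by the slowest stage plus the register overhead
    (clk->q delay and setup time lumped together).
    """
    period = max(stage_delays) + reg_overhead
    latency = len(stage_delays) * period
    return period, latency

REG_OVERHEAD = 0.5                   # ns, hypothetical
single_stage = [23.0]                # the 23.0 ns combinational path
three_stages = [8.0, 7.5, 7.5]       # hypothetical split of the same logic

period_1, latency_1 = pipeline_metrics(single_stage, REG_OVERHEAD)
period_3, latency_3 = pipeline_metrics(three_stages, REG_OVERHEAD)
print(f"unpipelined: period {period_1} ns, latency {latency_1} ns")
print(f"pipelined:   period {period_3} ns, latency {latency_3} ns")
```

Under these assumptions the clock period drops from 23.5 ns to 8.5 ns, so a new result is produced almost three times as often, at the price of a slightly longer total latency.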
Before introducing this optimization method, let's look at path grouping and path delay.
· Path grouping:
To make timing analysis easier, timing paths are divided into groups. Paths are grouped by the clock that controls their endpoints. Paths whose endpoints are not controlled by any clock go into the default path group. We can use the report_path_group command to report the path groups in the current design. For example, let's look at the paths and path groups of the following circuit:
From the figure above, there are 5 endpoints (four registers and one output port). Clock CLK1 controls 3 endpoints, and there are 8 paths under CLK1. Clock CLK2 controls one endpoint, and there are 3 paths under CLK2. The output port is an endpoint that is not controlled by any clock; its startpoint is the clock pin of the second-stage register, and there is only one path to it, which goes into the default path group. Therefore the design has 12 paths in 3 path groups: CLK1, CLK2, and the default path group.
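The grouping rule (group by the clock that captures the endpoint, otherwise use the default group) can be sketched in Python. The four paths listed below are hypothetical and are not the twelve paths of the figure; only the per-group counts are taken from the text.

```python
from collections import defaultdict

def group_paths(paths):
    """Group (startpoint, endpoint, capture_clock) paths by capture clock.

    Paths whose endpoint is not clocked (capture_clock is None) go into
    the "default" group.
    """
    groups = defaultdict(list)
    for start, end, capture_clock in paths:
        groups[capture_clock or "default"].append((start, end))
    return dict(groups)

# Hypothetical paths, for illustration only.
paths = [
    ("IN1",    "FF1/D", "CLK1"),
    ("FF1/CK", "FF2/D", "CLK1"),
    ("FF2/CK", "FF3/D", "CLK2"),
    ("FF2/CK", "OUT1",  None),    # unclocked output port
]
groups = group_paths(paths)

# Path counts per group as stated in the text for the figure.
counts_from_text = {"CLK1": 8, "CLK2": 3, "default": 1}
total_paths = sum(counts_from_text.values())
print(sorted(groups), total_paths)
```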
· Path delay:
When calculating the delay of a path, Design Compiler breaks each path into timing arcs, as shown in the following figure:
DC calculates the path delay from these timing arcs, because timing arcs describe the timing characteristics of cells and nets. Cell timing arcs are defined by the technology library and include cell delays and timing checks (such as the setup/hold checks of a register and its clk->q delay); net timing arcs are defined by the netlist. In the circuit above, the timing arcs are the net delays, the cell delays, and the clk->q delay of the register. Cell delay is usually calculated with a nonlinear delay model; net delay before layout is calculated with the wire load model; the distribution of the RC parasitics is determined by the tree_type attribute of the operating conditions; and the operating conditions determine how process, voltage, and temperature affect the net and cell delays.
In addition, the delay of a path depends on the edge at its startpoint, as shown in the following figure:
Assuming that the net delay is 0, the path delay is 1.5 ns if the startpoint edge is rising and 2.0 ns if it is falling. So cell timing arcs are edge-sensitive, and Design Compiler takes the edge sensitivity of every path delay into account. It should also be emphasized that by default Design Compiler assumes the maximum delay constraint between registers is T_CLK - T_setup(FF2), where the setup time of FF2 comes from the library. That is, data launched at one clock edge must arrive within one clock period, before the capturing edge, as shown in the following figure:
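Edge-sensitive path delay and the default register-to-register constraint can be sketched in Python. The two inverting cells and their rise/fall delays are hypothetical values picked so that the totals match the 1.5 ns and 2.0 ns quoted above; the clock period and library setup time are also assumptions.

```python
# Hypothetical cells: "rise"/"fall" is the delay of the arc that produces
# a rising/falling edge at the cell output. Net delay is taken as 0.
CELLS = [
    {"name": "U1", "inverting": True, "rise": 0.9, "fall": 0.5},
    {"name": "U2", "inverting": True, "rise": 1.0, "fall": 1.1},
]

def path_delay(cells, start_edge):
    """Sum the timing arcs along the path for a given startpoint edge."""
    edge = start_edge
    total = 0.0
    for cell in cells:
        if cell["inverting"]:
            edge = "fall" if edge == "rise" else "rise"
        total += cell[edge]
    return total

rise_delay = path_delay(CELLS, "rise")   # 0.5 + 1.0
fall_delay = path_delay(CELLS, "fall")   # 0.9 + 1.1

T_CLK = 10.0       # ns, hypothetical clock period
LIB_SETUP = 0.3    # ns, hypothetical setup time of the capturing register
max_allowed = T_CLK - LIB_SETUP          # default max delay constraint
worst = max(rise_delay, fall_delay)
print(rise_delay, fall_delay, worst <= max_allowed)
```

The path is checked against the constraint using the slower of the two edge cases, which is why each path delay is computed twice.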
Each path delay is calculated twice, once with a rising edge at the startpoint and once with a falling edge;
The critical path, that is, the path with the largest delay, is found in each path group;
By default DC optimizes the critical path. Synthesis stops when DC cannot find a better solution for the critical path, and DC does not go on to optimize the sub-critical paths. Therefore, if the critical path cannot meet timing and violates its constraint, the sub-critical paths are not optimized; they are only mapped to the technology library, as shown in the following figure:
For the following circuit, assume that after the design constraints are applied all paths belong to the same clock group, that is, there is only one path group:
If the combinational part cannot be optimized to meet timing and the critical path lies in it, then by DC's default behavior the critical path in the combinational part blocks the optimization of the register-to-register paths that belong to the same path group. There are two ways to prevent this: custom path groups and a critical range.
During synthesis, the tool optimizes the worst (longest-delay) path of each path group independently, so the critical path of one group does not block the optimization of paths in another, custom path group. Custom path groups also support a divide-and-conquer approach to timing analysis, because the report_timing command reports timing paths for each path group separately. This helps us isolate a particular region of the design, gain more control over the optimization, and analyze problems, as shown in the following figure:
The above commands generate three custom path groups. Together with the original register-to-register path group (these paths are controlled by CLK, so by default they form the CLK path group), there are now 4 path groups. The purely combinational paths end up in the "COMBO" group. Because their startpoints are input ports, executing "group_path -name INPUTS -from [all_inputs]", which uses the option "-from [all_inputs]", first places them in the "INPUTS" group. Executing "group_path -name OUTPUTS -to [all_outputs]" does not move them to the "OUTPUTS" group, because the option "-from" has higher priority than the option "-to", so they stay in the "INPUTS" group. However, "group_path -name COMBO -from [all_inputs] -to [all_outputs]" uses both "-from" and "-to", and the startpoints and endpoints of the combinational paths satisfy both conditions, so these paths finally belong to the "COMBO" group. DC works this way so that the result does not depend on the order of the commands. We can use the report_path_group command to list the timing path groups in the design.
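The precedence rule described above (a group with both -from and -to beats one with only -from, which beats one with only -to) can be modeled in a few lines of Python. The port and pin names are hypothetical, and ties between groups of equal rank are not modeled.

```python
def assign_group(path, groups, default="CLK"):
    """Pick the path group for one path.

    groups: list of (name, from_set, to_set); None means the option is unused.
    Rank: -from and -to together (3) > -from only (2) > -to only (1).
    """
    best_name, best_rank = default, 0
    for name, from_set, to_set in groups:
        if from_set is not None and path["start"] not in from_set:
            continue
        if to_set is not None and path["end"] not in to_set:
            continue
        rank = 2 * (from_set is not None) + (to_set is not None)
        if rank > best_rank:
            best_name, best_rank = name, rank
    return best_name

INPUTS = {"IN1", "IN2"}    # hypothetical input ports
OUTPUTS = {"OUT1"}         # hypothetical output port
GROUPS = [
    ("INPUTS", INPUTS, None),
    ("OUTPUTS", None, OUTPUTS),
    ("COMBO", INPUTS, OUTPUTS),
]
PATHS = {
    "in_to_out": {"start": "IN1", "end": "OUT1"},
    "in_to_reg": {"start": "IN2", "end": "FF1/D"},
    "reg_to_out": {"start": "FF2/CK", "end": "OUT1"},
    "reg_to_reg": {"start": "FF1/CK", "end": "FF2/D"},
}

def classify(groups):
    return {name: assign_group(path, groups) for name, path in PATHS.items()}

result = classify(GROUPS)
result_reversed = classify(list(reversed(GROUPS)))
print(result)
```

Classifying with the group definitions in reverse order gives the same answer, which mirrors the point that the outcome does not depend on command order.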
After the custom path group is generated, the path optimization is shown in the figure
below. At this time, the path between registers and registers can be optimized:
DC can be given weights for optimization. When the timing of some paths is particularly poor, assigning a weight makes DC focus on them. The highest weight used here is 5, followed by 2, and the default is 1, so the path group with the worst timing should be given a weight of 5. As shown in the figure below, the following command focuses the optimization on the CLK path group:
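Why a weight steers the optimizer can be seen with a simplified cost model in Python. The assumption here is that the maximum-delay cost is the sum over path groups of weight times worst violation; the group names and slack values are hypothetical.

```python
def max_delay_cost(groups):
    """Weighted sum of each group's worst violation (simplified model)."""
    return sum(g["weight"] * max(0.0, -g["worst_slack"]) for g in groups.values())

def cost_after_fix(groups, name, gain):
    """Cost if the worst slack of one group improves by `gain` ns."""
    trial = {k: dict(v) for k, v in groups.items()}
    trial[name]["worst_slack"] += gain
    return max_delay_cost(trial)

# Hypothetical groups with the same violation but different weights.
groups = {
    "CLK":    {"weight": 5.0, "worst_slack": -0.4},
    "INPUTS": {"weight": 1.0, "worst_slack": -0.4},
}

base = max_delay_cost(groups)
saving_clk = base - cost_after_fix(groups, "CLK", 0.1)
saving_inputs = base - cost_after_fix(groups, "INPUTS", 0.1)
print(base, saving_clk, saving_inputs)
```

In this model the same 0.1 ns improvement is worth five times more in the CLK group, so an optimizer that minimizes the cost works on that group first.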
· Critical range:
By default, DC optimizes only the critical path in a path group, but we can tell DC to also optimize every path whose delay is within a given range of the critical path delay. The critical range is set with the following command:
set_critical_range 2 [current_design]
With the above command, DC optimizes all paths within 2 ns of the critical path. Fixing the timing of the related sub-critical paths may also help the optimization of the critical path. A schematic of the timing optimization is shown below:
If, after set_critical_range has been executed, optimizing a sub-critical path would make the timing of the critical path worse, DC will not improve the timing of that sub-critical path. We recommend that the critical range not exceed 10% of the critical path delay.
Custom path groups can be combined with critical ranges, that is, a critical range can be specified for each path group in the design. The command is as follows:
Using custom path groups and critical ranges together makes DC run longer and use much more memory. But the approach is worth trying, because by default DC only optimizes the critical path in each path group; if that critical path cannot meet timing, DC does not try to optimize the other paths in the group. If DC can be made to optimize more paths, it may do a better job on other parts of the design. In data-path designs many timing paths are interrelated, and optimizing the sub-critical paths may improve the timing of the critical path. After the critical range is set, even if DC cannot reduce the Worst Negative Slack (WNS) of the design, it will reduce the Total Negative Slack (TNS).
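WNS, TNS, and the set of paths covered by a critical range can be computed with a short Python sketch. The slack values are hypothetical, one slack per endpoint is assumed, and the range of 2 ns matches the set_critical_range example.

```python
def wns(slacks):
    """Worst negative slack: the most negative slack, or 0 if timing is met."""
    return min(min(slacks), 0.0)

def tns(slacks):
    """Total negative slack: the sum of all negative slacks."""
    return sum(s for s in slacks if s < 0)

def paths_in_critical_range(slacks, critical_range):
    """Slacks of the paths within `critical_range` of the worst path."""
    worst = min(slacks)
    return [s for s in slacks if s <= worst + critical_range]

# Hypothetical endpoint slacks in ns.
slacks = [-3.0, -2.0, -1.5, -0.5, 0.4]

default_targets = paths_in_critical_range(slacks, 0.0)   # critical path only
ranged_targets = paths_in_critical_range(slacks, 2.0)    # within 2 ns of it
print(wns(slacks), tns(slacks), default_targets, ranged_targets)
```

With the range set, three paths are optimized instead of one, so TNS can improve even when the worst path (and therefore WNS) cannot be fixed.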
The key differences between custom path groups and the critical range are:
Custom path groups: once the user has defined path groups, DC is allowed to sacrifice the timing of one path group in order to improve another, provided the overall performance of the design improves. Adding a path group to the design can therefore make the timing of the worst path worse.
Critical range: the critical range does not allow the critical path of a path group to be made worse in order to improve a sub-critical path of the same group. If the design has several path groups and we set a critical range for only one of them, rather than for all path groups in the design, DC optimizes only a few paths in parallel and the run time does not increase much.
The division of modules is carried out at the beginning of the design, but because we focus on
the use of the DC tool, we will explain it here.
layers of logic. SoC design generally includes design reuse and intellectual property IP cores.
SoC designs include multiple levels of circuitry. Hierarchical IC design trends are as follows:
Similarly, a synthesized logic block (such as RISC_CORE in the figure) is generally made up of sub-modules. A complex, large circuit needs to be partitioned, and the smaller, simpler circuits that result are then processed (for example, synthesized). Because each circuit is small, it is easier to process and analyze and meets its requirements more quickly. The processed small circuits are then integrated back into the original large circuit, as shown in the figure below:
There are many reasons for partitioning; here are a few of them:
· Different functional blocks (such as memory, uP, ADC, codec, controller, etc.);
· Design size and complexity (module processing time should be moderate; the design size is generally chosen to fit one night's run time: manual work and debugging happen during the day, the machine runs at night, and the results are checked the next morning);
· Easier project management for the design team (each design engineer is responsible for one or several modules);
· Physical constraints (for example, engineering samples are first built with FPGAs, and a large design may need several FPGA chips);
· etc.
Module partitioning affects timing. If timing is poor, the modules can be re-partitioned, so the design should be partitioned carefully from the start.
The hierarchy and blocks of a design are defined by instantiation. VHDL's entity and Verilog's module statements define a new hierarchical module, so instantiating an entity or module creates a new level of hierarchy. If we use operators (+, -, *, /, ...) for arithmetic circuits in the design, a new level of hierarchy may also be created. A process in VHDL and an always block in Verilog do not create a new level of hierarchy. To obtain the best circuit, we need to plan the hierarchy of the whole circuit and partition the design so that the synthesis results of each module and of the whole circuit meet our goals.
There are 3 modules: A, B and C, each with input and output ports. DC must preserve the ports of every module when synthesizing the whole circuit, so logic synthesis cannot cross module boundaries and combinational logic from adjacent modules cannot be merged. As a result, the path from register A to register C has a long delay and this part of the circuit has a large area. If we change the partition and put the related combinational logic into one module, the combinational logic that used to be in modules A, B and C is no longer separated by hierarchy, and the combinational optimization techniques of the synthesis tool can be used in full. The circuit area is then smaller than before, and the delay of the path from register A to register C is shorter. The modification is shown in the following figure:
If we modify the partition once more, as follows, we get the best partition:
This modification combines the related combinational logic into one module, so the combinational logic originally in modules A, B and C is no longer separated by hierarchy.
For a general design, a good partition is shown in the following figure:
In this partition, the output boundary of each module is the output of a register. Since there are no boundaries inside the combinational logic and its output drives the data input of a register, we can make full use of the synthesis tool's optimization techniques for both combinational and sequential circuits, get the best results, and simplify the design constraints. Every input port of each module in the figure, except the clock port, has the same delay, equal to the delay from the clock pin CLK of a register to its output pin Q. This makes the timing constraints easier to write, as mentioned earlier in the discussion of timing path constraints.
The above is the recommended partitioning style. The following describes partitioning styles to avoid.
When partitioning, try to avoid glue logic, which is shown in the following figure:
Glue logic is combinational logic that connects modules. In the figure, the top-level NAND gate is just an instantiated cell, and optimization is limited because the glue logic cannot be absorbed into the other modules. With a bottom-up strategy, an additional compile is needed at the top level. A partition that avoids glue logic looks like this:
The glue logic can now be optimized together with the other logic, and the top-level design is just a structural netlist that needs no further compilation.
The first partition may lead to timing violations, and the modules may need to be re-partitioned, so here we describe how to modify a partition. The larger the design, the more computer resources synthesis needs and the longer it runs. Design Compiler itself does not limit the size of the design, but when compiling we need to make sure that the size of each module matches the available CPU and memory resources. Try to avoid the following:
Modules that are too small: the manually drawn module boundaries limit optimization, and the synthesis result may not be optimal.
Modules that are too large: compilation may take too long, which a short design schedule cannot afford.
Generally speaking, given the available computer resources, the speed of the synthesis software, and the turnaround time we expect, the module size is set at about 400K to 800K gates. A reasonable run time for synthesizing a design is one night: during the day we design and modify the circuit and write the compile scripts; before leaving work we run a script that feeds the design to DC for synthesis and optimization; the next morning we check the results.
When partitioning, the core logic, I/O pads, clock generation circuits, asynchronous circuits, and JTAG (Joint Test Action Group) circuits should be separated into different modules. The top-level design is divided into at least 3 levels of hierarchy: Top-level, Mid-level, and Functional Core, as shown in the figure below:
This partitioning is used because the I/O pad cells are process-dependent, the clock-divider (clock generation) circuit is untestable, the JTAG circuit is process-dependent, and asynchronous circuits are designed, constrained, and synthesized differently from synchronous circuits. These are therefore placed in modules separate from the core functionality.
This article mainly covers the design and synthesis of synchronous circuits. To get good synthesis results with a moderate run time, the design must be partitioned properly. If the existing partition does not meet the requirements, it has to be changed, either by editing the original RTL code or by using DC commands. The following describes how to modify partitions with commands in DC.
DC can also modify the partitioning transparently during synthesis. When automatic ungrouping is used in DC, the cell-count thresholds are controlled by two variables:
compile_auto_ungroup_delay_num_cells
compile_auto_ungroup_area_num_cells
The default values of the two variables are 500 and 30 respectively. We can also use the set command to set them to any value we want. The report_auto_ungroup command reports the partitions that were ungrouped during compilation. To remove all partitions during synthesis, use the following command in DC:
compile -ungroup_all
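As a rough illustration of how the two variables and the report fit together, here is a minimal dc_shell sketch. The threshold values are arbitrary assumptions chosen for illustration, and the `-auto_ungroup delay` option of `compile` should be checked against the man page of the DC release in use.

```tcl
# dc_shell sketch; the threshold values below are illustrative assumptions.
set compile_auto_ungroup_delay_num_cells 300
set compile_auto_ungroup_area_num_cells  50

# Let DC ungroup small hierarchies automatically while optimizing for delay.
compile -auto_ungroup delay

# List the hierarchies that were ungrouped during the compile.
report_auto_ungroup
```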
Modifying the partitioning manually means that the user directs every change with commands. Use the group and ungroup commands to modify the partitions in the design, as shown in the following figure:
The group command creates a new level of hierarchy, as shown in the following figure:
The ungroup command removes one or all levels of hierarchy, with the effect shown in the following figure:
To remove all hierarchy in the current design, use the following command:
With the -simple_names option, the ungroup command restores the original non-hierarchical cell names U2 and U3.
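A minimal dc_shell sketch of the group and ungroup commands described above. U2 and U3 are the cell names mentioned in the text; NEW_BLOCK and U_NEW are hypothetical names introduced only for this example.

```tcl
# dc_shell sketch; NEW_BLOCK and U_NEW are hypothetical names.

# Wrap cells U2 and U3 in a new level of hierarchy.
group {U2 U3} -design_name NEW_BLOCK -cell_name U_NEW

# Dissolve that level again and keep the short cell names U2 and U3
# instead of hierarchical names such as U_NEW/U2.
ungroup U_NEW -simple_names

# Remove every level of hierarchy below the current design.
ungroup -all -flatten
```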
Finally, to avoid having to repartition the design later, here is a summary of the module partitioning strategy:
· Keep the module size moderate so that the run time is reasonable.
· Separate the core logic, pads, clock generation circuit, asynchronous circuits, and JTAG circuit into different modules.
The advantages of this partitioning are better results (a smaller and faster design), a simpler synthesis process (simpler constraints and scripts), and faster compilation (a shorter turnaround time).
3. Actual combat
In this hands-on exercise, we practice DC's synthesis optimization techniques on a given schematic and synthesis specification. The work is done in topology mode, so some physical design content is involved. Let's proceed step by step.
Synthesis specification:
(design specification:)
Available resource specification: a script is run to check how many cores the computer has available for synthesis. We skip this here.
Design and constraint file description: gives the name and location of the RTL file and of the constraint file. We do not need to change the RTL file, its name, or the timing environment constraints.
Floor plan description: since we synthesize in topology mode, the floorplan supplies the physical constraint information.
· Write the design constraint file: the specification gives no timing or environment attributes, and the design constraint file is already provided, so we do not need to write it. Let's take a look at it:
From top to bottom, they are: clear previous constraints, clock constraints, input port delay
constraints, input port environmental attribute constraints, output port delay constraints, and
output port environmental attribute constraints.
The physical information contained in the corresponding physical constraints of the floorplan is as follows:
Create an SVF file for Formality so that the transformations made by retiming are recorded. In short, it is used for formal verification. The command is as follows:
set_svf STOTO.svf
(The preceding steps were covered in earlier chapters, so I won't repeat them here.)
Apply the timing constraints, check them with check_timing, and apply the non-default physical constraints:
source STOTO.con
check_timing
source STOTO.pcon
report_clock
--> According to 1 and 2, the I/O constraints are conservative values that can be changed, while the final design must meet timing on the register-to-register paths. We can therefore group the paths and give more weight to the clock group, which is the register-to-register group. The optimization commands are as follows:
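A path-grouping setup along the following lines would match this description. It is only a sketch: the group names, the weight, and the critical range are assumptions, and the clock group is assumed to be named clk.

```tcl
# dc_shell sketch; group names, weight and critical range are assumptions.

# Move the I/O paths into their own groups so that their conservative
# constraints do not dominate optimization of the clock group.
group_path -name INPUTS  -from [all_inputs]
group_path -name OUTPUTS -to   [all_outputs]
group_path -name COMBO   -from [all_inputs] -to [all_outputs]

# Give the register-to-register (clock) group more optimization effort.
group_path -name clk -critical_range 0.2 -weight 5

# Confirm which path groups now exist.
report_path_group
```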
After setting this, we need to check that the setting is correct (if it is, the query returns false), as shown below:
ungroup is the attribute that removes hierarchy: setting it to true removes the hierarchy, so we need to set it to false.
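A sketch of how the attribute could be set and queried in dc_shell. The instance name I_IN appears later in this article; I_PIPELINE is an assumed instance name for the PIPELINE module.

```tcl
# dc_shell sketch; I_PIPELINE is an assumed instance name.

# Keep these two sub-modules from being ungrouped during compile.
set_ungroup [get_cells {I_IN I_PIPELINE}] false

# Query the attribute; a value of false means the hierarchy is preserved.
get_attribute [get_cells I_IN] ungroup
get_attribute [get_cells I_PIPELINE] ungroup
```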
· Synthesize:
--> After synthesis completes, we can check which features were used (this step is optional):
--> Check which modules were ungrouped, that is, verify that the result is consistent with the constraints:
Only the top-level design STOTO and the sub-modules PIPELINE and INPUT have been preserved. The others have been ungrouped, so their module boundaries can no longer be found.
From the above report we can see that although some modules have been ungrouped, their instance names are still present. The instance name lets us find the module a cell originally came from, which helps locate the source when a delay is unreasonable. This is one of the benefits of unique module instance names.
Synthesis is complete, so we stop recording data for Formality (in short, this is what formal verification needs):
set_svf -off
Check whether registers were moved, that is, inspect the detailed results of the optimization techniques (if you are interested, take a closer look to learn more).
We applied several retiming and pipeline optimizations earlier. Some registers were moved and some combinational logic was split. Let's look at what was moved, first of all to see whether anything deviates from what the constraints lead us to expect.
--> View the registers moved by register retiming in the PIPELINE design:
From the returned paths (that is, the returned register names) we can see that the pipeline registers in PIPELINE have been moved:
(The names of pipeline registers moved by retiming end with clkname_r_REG*_S*, where * is a wildcard.) Combined with our schematic, we can tell that z1_reg has been moved (the name suffix is z1 and s1):
--> We can also look up the original module of an instance name, as follows:
--> In the first point we concluded that z1_reg was moved only from the 1 in the name, which is clearly not sufficient. We can verify whether z_reg has been moved as follows:
A value is returned, which shows that this register exists and has not been moved (the instance name changes after a move):
Next, check z1_reg. The object cannot be found, which shows that it has been moved:
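The checks above could be written roughly as follows. This is a sketch under assumptions: I_PIPELINE is an assumed instance name for the PIPELINE module, and the name patterns are taken from the naming convention quoted above.

```tcl
# dc_shell sketch; I_PIPELINE is an assumed instance name.

# Pipeline registers moved by retiming: names end with <clk>_r_REG*_S*.
get_cells -hierarchical *_r_REG*_S*

# A register that was not moved keeps its name, so this returns a collection.
get_cells I_PIPELINE/z_reg*

# A register that was moved no longer exists under its old name,
# so this query is expected to find nothing.
get_cells I_PIPELINE/z1_reg*
```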
--> View the other flip-flops moved by retiming (registers moved by retiming that are not pipeline registers are named R_*):
The above are the registers moved by retiming in the INPUT module. We can check whether any registers in the module were not moved:
get_cells I_IN/*_reg*
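A value is returned, which shows that some registers were not moved.
--> With the following command:
we can see that the PIPELINE module has registered outputs (because the report returns values).
That concludes the hands-on optimization part. DC has many optimization commands; if you are unsure about one, look it up with the man command. One last remark: I typed more than twelve thousand characters in total, plus a pile of figures, so this is probably the longest post in the series.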
Pattern-Based Power Planning in
IC Compiler II
November 2015
Copyright Notice and Proprietary Information
2015 Synopsys, Inc. All rights reserved. This software and documentation contain confidential and proprietary
information that is the property of Synopsys, Inc. The software and documentation are furnished under a license
agreement and may be used or copied only in accordance with the terms of the license agreement. No part of the
software and documentation may be reproduced, transmitted, or translated, in any form or by any means, electronic,
mechanical, manual, optical, or otherwise, without prior written permission of Synopsys, Inc., or as expressly provided
by the license agreement.
Disclaimer
SYNOPSYS, INC., AND ITS LICENSORS MAKE NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, WITH
REGARD TO THIS MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Trademarks
Synopsys company and certain product names are trademarks of Synopsys, as set forth at
http://www.synopsys.com/Company/Pages/Trademarks.aspx.
All other product or company names may be trademarks of their respective owners.
Third-Party Links
Any links to third-party websites included in this document are for your convenience only. Synopsys does not endorse
and is not responsible for such websites and their practices, including privacy practices, availability, and content.
Synopsys, Inc.
700 E. Middlefield Road
Mountain View, CA 94043
www.synopsys.com
Contents
Introduction
Creating Power and Ground Rings
  Creating PG Ring Patterns
  Creating the PG Ring Strategy
  Compiling the PG Ring
Creating the Power and Ground Mesh
  Creating the PG Mesh Pattern
  Creating the PG Mesh Strategy
  Compiling the PG Mesh
Creating Power and Ground Standard Cell Rails
Creating Power and Ground Macro and Pad Connections
Creating Via Rules
  Creating PG Via Master Rules
  Creating Via Rules Between Different Strategies
  Defining New Via Structures
Creating a Power Plan Region
Manually Creating Power Plan Structures
Creating Special Patterns
  Inserting Channel Straps
  Inserting Terminal Alignment Straps
  Inserting Extra Straps to Honor Maximum Standard Cell Rail Tail Distances
  Creating Power Switch Alignment Straps
Using the Task Assistant for PG Prototyping
  Performing PG Prototyping
    Entering Values for PG Prototyping
    Running or Saving the Script
Advanced Use Cases
  Creating Composite Patterns
  Creating Stapling Vias on PG Rails
  Distributed PG Creation Flow
Frequently Asked Questions
Introduction
IC Compiler II introduces the pattern-based power planning methodology for power
planning, which replaces the template-based methodology used in IC Compiler. The
pattern-based power planning flow separates the physical implementation details (layer,
width, spacing) of the power ring and mesh from the regions of the design where the
structures are inserted. In a typical design, you run the pattern-based power planning
flow multiple times. The first pass defines and creates the power rings, the second pass
defines and creates the power mesh, and so on.
This application note describes the basic steps needed to create a power plan with IC
Compiler II, and provides some examples which compare the commands and files used
in IC Compiler to the new approach used in IC Compiler II.
The following example creates a ring pattern that uses layer M7 for the horizontal layer with
a width of 10 and a spacing of 2, and layer M8 for the vertical layer with a width of 10
and a spacing of 2.
create_pg_ring_pattern ring_pattern -horizontal_layer M7 \
-horizontal_width {10} -horizontal_spacing {2} \
-vertical_layer M8 -vertical_width {10} \
-vertical_spacing {2} -corner_bridge false
The following example uses the set_pg_strategy command to associate the
ring_pattern pattern with the core power plan region for nets VDD, VDD_LOW, and VSS.
The pattern will be created with an offset of 3 microns in both horizontal and vertical
directions, and the power straps will extend to the innermost ring.
set_pg_strategy core_ring -core -pattern \
{{pattern: ring_pattern} {nets: {VDD VDD_LOW VSS}} \
{offset: {3 3}}} -extension {{stop: innermost_ring}}
IC Compiler:

In the template file:
template :
ringm78(w1,w2,w3,o1,o2,o3) {
  side : horizontal {
    layer: M7
    width: 10
    spacing: 2
    offset : 3
  }
  side : vertical {
    layer: M8
    width: 10
    spacing: 2
    offset: 3
  }
}

set_power_ring_strategy \
  core_ring \
  -nets {VDD VDD_LOW VSS} \
  -core \
  -template ring.tpl:ringm78()

compile_power_plan \
  -strategy core_ring

IC Compiler II:

create_pg_ring_pattern \
  ring_pattern \
  -horizontal_layer M7 \
  -horizontal_width {10} \
  -horizontal_spacing {2} \
  -vertical_layer M8 \
  -vertical_width {10} \
  -vertical_spacing {2} \
  -corner_bridge false

set_pg_strategy core_ring \
  -core \
  -pattern \
  {{pattern: ring_pattern} \
  {nets: {VDD VDD_LOW VSS}} \
  {offset: {3 3}}}

compile_pg \
  -strategies core_ring
You can replace the values in the command with parameters and assign the parameter
values with the set_pg_strategy command. Parameters let you reuse the pattern
creation command and apply a different width, pitch, offset, and other values when
creating the strategy. The following example creates a vertical mesh on layer M8 and a
horizontal mesh on layer M9. The actual values for the width, offset, pitch, and so on are
assigned later with the set_pg_strategy command.
create_pg_mesh_pattern pg_mesh1 -parameters {w1 p1 w2 p2 f t} \
-layers \
{{{vertical_layer: M8} {width: @w1} \
{spacing: interleaving} {pitch: @p1} {offset: @f} {trim: @t}} \
{{horizontal_layer: M9} {width: @w2} {spacing: interleaving} \
{pitch: @p2} {offset: @f} {trim: @t}}}
The figure above shows a design with a power mesh inserted.
The following table shows the power planning commands used to create a power and
ground mesh in IC Compiler, and provides the comparable command set for IC Compiler
II. Note that for the offset_type setting, IC Compiler II only supports the centerline offset
type.
IC Compiler:

template_name: s_mesh1
(w1 p1 w2 p2 f t) {
  layer: M8 {
    direction: vertical
    width: @w1
    spacing: interleaving
    number:
    pitch: @p1
    offset_type: centerline
    offset_start: 400
    offset: @f
    trim_strap: @t
  }
  layer: M9 {
    direction: horizontal
    width: @w2
    spacing: interleaving
    number:
    pitch: @p2
    offset_type:
    offset_start: 400
    offset: @f
    trim_strap: @t
  }
}

set_power_plan_strategy s_mesh1 \
  -core \
  -template template.tpl:s_mesh1 \
  (4 80 6 120 3.344 false) \
  -nets {VDD VSS VSS VDD} \
  -blockage {{{nets: VDD} \
  {block: u0_2 u0_3}}} \
  -extension \
  {{stop: outermost_ring}}

compile_power_plan \
  -strategy s_mesh1

IC Compiler II:

create_pg_mesh_pattern pg_mesh1 \
  -parameters {w1 p1 w2 p2 f t} \
  -layers {{{vertical_layer: M8} \
  {width: @w1} \
  {spacing: interleaving} \
  {pitch: @p1} \
  {offset: @f} \
  {trim: @t}} \
  {{horizontal_layer: M9} \
  {width: @w2} \
  {spacing: interleaving} \
  {pitch: @p2} \
  {offset: @f} \
  {trim: @t}}}

set_pg_strategy s_mesh1 -core \
  -pattern {{pattern: pg_mesh1} \
  {nets: {VDD VSS VSS VDD}} \
  {offset_start: 400 400} \
  {parameters: 4 80 6 120 3.344 false}} \
  -blockage {{{nets: VDD} \
  {block: u0_2 u0_3}}} \
  -extension {{stop: outermost_ring}}

compile_pg \
  -strategies s_mesh1
You can also create a standard cell connection pattern with specific layers, rail widths, and
offsets. The following example creates a standard cell connection pattern in which the layer
and rail widths are specified using parameters. Parameter values are assigned later with the
set_pg_strategy command.
create_pg_std_cell_conn_pattern std_pattern2 \
-layers {@metal_layer} -rail_width {@w_top @w_bottom} \
-parameters {metal_layer w_top w_bottom}
After defining the standard cell connection pattern, associate the pattern with the power
plan region in the design with the set_pg_strategy command.
set_pg_strategy rail_strat -core \
-pattern {{name: std_pattern2} {nets: VDD VSS} \
{parameters: {M1 0.2 0.2}}}
After defining the pattern and strategy for the standard cell rails, use the compile_pg
command to create the rails. The following example creates the standard cell power rails
defined by the rail_strat strategy.
compile_pg -strategies rail_strat
The following table shows the power planning commands used to create power and
ground standard cell rails in IC Compiler, and provides the comparable command set for
IC Compiler II.
IC Compiler:

preroute_standard_cells \
  -nets {VDD VSS} \
  -route_pins_on_layer M1

IC Compiler II:

create_pg_std_cell_conn_pattern \
  std_pattern -layers {M1}
set_pg_strategy rail_strat -core \
  -pattern {{name: std_pattern} \
  {nets: VDD VSS}}
compile_pg -strategies rail_strat
After defining the macro connection pattern, associate the pattern with the specified cells in
the design using the set_pg_strategy command.
set_pg_strategy s_pad -macros $all_pg_pads \
-pattern {{name: pad_pattern} {nets: {VDD VDD_LOW VSS}}}
Use the compile_pg command to create the rails to connect the macro pins, after you
define the pattern and strategy for the macro connections.
compile_pg -strategies s_pad
The figure above shows a design with power straps connected to pad cells
The following example creates a hard macro connection pattern for macro pins on layers
M5 and M6.
create_pg_macro_conn_pattern hm_pattern \
-pin_conn_type scattered_pin -layers {M5 M6}
The figure above shows a design with power and ground connections to a macro
The following table shows the power planning commands used to create power and
ground macro cell rails in IC Compiler, and provides the comparable command set for IC
Compiler II.
IC Compiler:

preroute_instances \
  -nets {VDD VDD_LOW VSS} \
  -route_pins_on_layer {M5 M6} \
  -connect_instances specified \
  -cells $toplevel_hms

IC Compiler II:

create_pg_macro_conn_pattern \
  hm_pattern \
  -pin_conn_type scattered_pin \
  -layers {M5 M6}
set_pg_strategy macro_conn \
  -macros $toplevel_hms \
  -pattern {{name: hm_pattern} \
  {nets: {VDD VDD_LOW VSS}}}
compile_pg -strategies macro_conn
set_pg_via_master_rule M67_mesh_via_rule \
-via_array_dimension {6 12} -contact_code {VIA67BAR}
set_pg_via_master_rule M78_mesh_via_rule \
-via_array_dimension {6 10} -contact_code {VIA78BAR_C}
To use the PG via rule created with the set_pg_via_master_rule command, specify
the via rule name after the via_master keyword for the create_pg_composite_pattern,
create_pg_macro_conn_pattern, create_pg_mesh_pattern, and
create_pg_ring_pattern commands. For example, the following command uses the
M67_mesh_via_rule and M78_mesh_via_rule via rules defined previously.
create_pg_mesh_pattern pg_mesh1 -parameters {w1 p1 w2 p2 f t} \
-layers \
{{{vertical_layer: M8} {width: @w1} {spacing: interleaving} \
{pitch: @p1} {offset: @f} {trim: @t}} \
{{horizontal_layer: M7} {width: @w2} {spacing: interleaving} \
{pitch: @p2} {offset: @f} {trim: @t}} \
{{vertical_layer: M6} {width: @w1} {spacing: interleaving} \
{pitch: @p2} {offset: @f} {trim: @t}}} \
-via_rule \
{{{layers: M8}{layers: M7} {via_master: M78_mesh_via_rule}} \
{{layers: M7}{layers: M6}{via_master: M67_mesh_via_rule}}}
Next, define the power plan strategy and compile the power plan to create the PG
structure with the specified via master.
set_pg_strategy s_mesh1 \
-pattern {{pattern: pg_mesh1} {nets: {VDD VSS VSS VDD}} \
{offset_start: 10 10} {parameters: 1 60 3 80 3.344 false}} \
-blockage {{{nets: VDD} {block: u0_2 u0_3}}} -core \
-extension {{stop: outermost_ring}}
The figure above shows a design with 6x10 VIA78BAR_C inserted between M7 and M8,
and 6x12 VIA67BAR inserted between M6 and M7.
The following table shows the power planning commands used to create via rules and
insert vias in IC Compiler, and provides the comparable command set for IC Compiler II.
IC Compiler:

set_preroute_advanced_via_rule \
  -contact_codes VIA67BAR \
  -size_by_array_dimensions \
  {20 18}

In the template file:
template_name: s_mesh1
(w1 p1 w2 p2 f t) {
  layer: M8 {
    direction: vertical
    width: @w1
    spacing: interleaving
    number:
    pitch: @p1
    offset_type:
    offset_start: 400
    offset: @f
    trim_strap: @t
  }
  layer: M7 {
    direction: horizontal
    width: @w2
    spacing: interleaving
    number:
    pitch: @p2
    offset_type:
    offset_start: 400
    offset: @f
    trim_strap: @t
  }
  advanced_rule: on {
    stack_vias: M8 M7
    honor_advanced_via_rules: on
  }
}

set_power_plan_strategy \
  s_mesh1 \
  -core \
  -template template.tpl:s_mesh1 \
  (4 80 6 120 3.344 false) \
  -nets {VDD VSS VSS VDD} \
  -blockage {{{nets: VDD} \
  {block: u0_2 u0_3}}} \
  -extension \
  {{stop: outermost_ring}}

compile_power_plan \
  -strategy s_mesh1

IC Compiler II:

set_pg_via_master_rule \
  M67_mesh_via_rule \
  -contact_code {VIA67BAR} \
  -via_array_dimension {20 18}

create_pg_mesh_pattern pg_mesh1 \
  -parameters {w1 p1 w2 p2 f t} \
  -layers {{{vertical_layer: M8} \
  {width: @w1}{spacing: interleaving} \
  {pitch: @p1}{offset: @f}{trim: @t}} \
  {{horizontal_layer: M7}{width: @w2} \
  {spacing: interleaving}{pitch: @p2} \
  {offset: @f}{trim: @t}}} \
  -via_rule {{{layers: M8} \
  {layers: M7} \
  {via_master: M78_mesh_via_rule}}}

set_pg_strategy s_mesh1 -core \
  -pattern {{pattern: pg_mesh1} \
  {nets: {VDD VSS VSS VDD}} \
  {offset_start: 400 400} \
  {parameters: 4 80 6 120 3.344 false}} \
  -blockage {{{nets: VDD} \
  {block: u0_2 u0_3}}} \
  -extension {{stop: outermost_ring}}

compile_pg -strategies s_mesh1
Creating Via Rules Between Different Strategies
A complex power plan might require that you form via connections to create a complete
power network with several different power rings, meshes, power rails, and other
structures. Use the set_pg_strategy_via_rule command to set the via insertion rules
when inserting vias between different power plan strategies.
The following example creates the via_rule1 rule that inserts a VIA23LG via between
shapes on layer M2 as defined by strategy strat1, and shapes on layer M3 as defined by
strategy strat2. New vias are omitted between other metal layer intersections.
set_pg_strategy_via_rule via_rule1 \
-via_rule { {{{strategies: strat1} {layers: M2}} \
{{strategies: strat2} {layers: M3}} {via_master: VIA23LG}} \
{{intersection: undefined} {via_master: nil}} }
After defining the via rule and strategy, use the compile_pg command and specify the
via rule with the -via_rule option.
compile_pg -strategies {strat1 strat2} -via_rule via_rule1
The figure above shows a default via used when no via_rule is defined.
Via master VIA23LG is used according to via_rule1.
The following example defines a new via rule that uses the via_4x1 via master to insert
vias between the shapes defined by the rail_strategy strategy and existing M5 straps.
The compile_pg command inserts the vias:
set_pg_strategy_via_rule rail_via_rule \
-via_rule { {{{strategies: rail_strategy}}{{existing: strap} \
{layers: M5}} {via_master: via_4x1}} {{intersection: undefined} \
{via_master: NIL}} }
Defining New Via Structures
If your design requires that you create a new via definition that is not defined in the
technology file, use the create_via_def command. The following example creates a
via_def named VIA12_PG and uses the set_pg_via_master_rule command to create
a via rule.
create_via_def VIA12_PG -cut_layer VIA1 -cut_size {0.05 0.05} \
-upper_enclosure {0.06 0} -lower_enclosure {0.06 0}
The following table shows the power planning commands used to create a via definition
in IC Compiler, and provides the comparable command set for IC Compiler II.
IC Compiler:

create_via_master -name VIA12_PG \
  -cut_layer_name VIA1 \
  -lower_layer_name M1 \
  -upper_layer_name M2 \
  -cut_width 0.05 -cut_height 0.05 \
  -lower_layer_enc_width 0.06 \
  -lower_layer_enc_height 0 \
  -upper_layer_enc_width 0.06 \
  -upper_layer_enc_height 0

IC Compiler II:

create_via_def VIA12_PG \
  -cut_layer VIA1 \
  -cut_size {0.05 0.05} \
  -upper_enclosure {0.06 0} \
  -lower_enclosure {0.06 0}
The figure above shows a design after creating a PG region which excludes macros
The following table shows the power planning command used to create PG regions in IC
Compiler, and provides the comparable command set for IC Compiler II.
IC Compiler:

create_power_plan_regions r0 -core \
  -exclude_macros [get_flat_cells \
  -filter \
  "mask_layout_type == macro"] \
  -macro_offset 8 -expand -2

IC Compiler II:

create_pg_region r0 -core \
  -exclude_macros [get_cells \
  -filter "design_type==macro"] \
  -macro_offset "8 8" -expand -2
{{existing: ring}{via_master: VIA34}} \
{{macro_pins: all}{via_master: nil}} \
{{intersection: undefined}{via_master: default}} }
Starting from version K-2015.06-SP2, you can create multiple straps by using the -pitch
option. The following example creates vertical PG straps for net VDD on layer M6 with a
width of 1um. Straps are created from (10, 0) with a pitch of 20um, up to (200, 0).
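The strap positions implied by a start point, pitch, and end point can be worked out with a few lines of plain Tcl. This sketch only does the arithmetic and calls no IC Compiler II command. The strap_positions helper is hypothetical, and treating the end coordinate as inclusive is an assumption (it makes no difference for these numbers, because 200 does not fall on the pitch grid).

```tcl
# Plain Tcl sketch (runs in tclsh); strap_positions is a hypothetical helper.
# Returns the x-coordinates of straps placed every $pitch from $start
# up to and including $stop.
proc strap_positions {start pitch stop} {
    set xs {}
    for {set x $start} {$x <= $stop} {set x [expr {$x + $pitch}]} {
        lappend xs $x
    }
    return $xs
}

# Straps starting at x = 10 with a pitch of 20, up to x = 200.
puts [strap_positions 10 20 200]
```

With the values from the example, this gives ten straps at x = 10, 30, 50, and so on up to 190.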
The following example creates PG vias for nets VDD and VSS within bounding box
{{10 10} {100 100}}. Vias are created between rings on layer METAL5 and standard cell
connections on layer METAL1. Additional vias are allowed and DRC is skipped. All vias
are marked as ring type.
create_pg_vias -within_bbox {{10 10} {100 100}} -nets {VDD VSS} \
-from_types ring -from_layers METAL5 -to_types std_conn \
-to_layers METAL1 -drc no_check -insert_additional_vias \
-mark_as ring
The figure above shows a design before inserting channel straps
The figure above shows the same design with channel straps inserted
compile_pg -strategies terminal_st
The figure above shows the same design after inserting straps that align with the
terminals
compile_pg -strategies max_std_st
The figure above shows the same design after inserting extra straps
set_pg_strategy pws_st -core \
-pattern {{name: psw_pt} {nets: InstDecode/VDD108}} \
-extension {stop: design_boundary}
The figure above shows the same design after inserting straps to connect the power
switch cells
Using the Task Assistant for PG Prototyping
The Task Assistant provides a form for performing PG prototyping on your design. Start
the Task Assistant by choosing Task > Task Assistant from the menu. Choose Design
Planning > PG Planning > PG Prototyping in the Task Assistant to open the PG
Prototyping form.
Performing PG Prototyping
You can use PG Prototyping to quickly create and remove PG meshes in your design.
After entering values in the form, use the Preview feature to display the Tcl commands
that will create the PG meshes and rails. After refining the commands, you can run the
commands or save them to a script file.
– PG layers
o Specify up to four layers for the PG mesh
– PG tracks (%)
o Specify the routing track percentage used.
The figure below shows that the PG tracks(%) value affects the pitch value
for the create_pg_mesh_pattern command
Running or Saving the Script
After completing the form, use buttons in the form to perform different actions.
– Apply:
o Click the Apply button to create the PG mesh and rail based on your settings.
The equivalent Tcl commands are also displayed in the Tcl Command
window
– Preview:
o Click the Preview button to display the command in the Tcl Command
window
o Review the commands, then click Add to Script to append the commands to
the Script Editor window
o Alternatively, run the commands in the Tcl Command window by clicking Run
Tcl Command
– Undo
o Click Undo to run compile_pg -undo on the current design
– Remove PG Routes
o Click Remove PG Routes to run remove_routes on the current design
Advanced Use Cases
This section describes advanced use cases in pattern-based power planning.
The two figures show the same design after inserting a composite pattern
The following example defines a composite pattern which contains two wire patterns.
The pattern is used to create bridging straps in the design.
## Composite pattern creation
create_pg_wire_pattern m8_strap -layer M8 -direction vertical \
-width 2 -spacing 1 -pitch 10
create_pg_wire_pattern m7_seg -layer M7 -direction horizontal \
-width 1 -low_end_reference_point 0 -high_end_reference_point 5 \
-pitch {10 3.344}
create_pg_composite_pattern m78_mesh -nets {VDD VSS} \
-add_patterns {{{pattern: m8_strap} {nets: VDD VSS} {offset: 2}} \
{{pattern: m7_seg} {nets: VDD} {offset: 1 1.783}} }
set_pg_strategy_via_rule vrule \
-via_rule { {{{strategies: srail} {nets: VDD}} {{existing: all} \
{layers: M7}} {via_master: via_shift_right} \
{between_parallel: true}}
{{{strategies: srail} {nets: VSS}} \
{{existing: all} {layers: M8}} {via_master: via_shift_right}} }
The figure above shows the design after creating the power rail with stacked vias
The following example inserts via1 vias between M1 and M2 rails when the rails already
exist in the design.
set_pg_via_master_rule V1_rule -contact_code VIA12SQ_C \
-via_array_dimension {20 1} -allow_multiple {2.4 0}
The following example creates M1 rails and via1 vias when the design contains existing
M2 rails.
## Create M1 standard cell rail pattern and specify M1 color
create_pg_std_cell_conn_pattern m1_rail \
-layers {M1} -rail_width 0.16
set_pg_strategy m1_rail_strategy \
-pattern {{name: m1_rail} {nets: VDD VSS}} -core
## Set contact code, array size and pitch between via arrays
set_pg_via_master_rule V1_rule -contact_code VIA12SQ_C \
-via_array_dimension {20 1} -allow_multiple {2.4 0}
## Create via rules between M1 rail strategy and existing M2 straps to
## honor the via settings
set_pg_strategy_via_rule rail_rule \
-via_rule {{{strategies: m1_rail_strategy} \
{{existing: strap} {layers: M2}} \
{via_master: V1_rule} {between_parallel: true}} \
{{intersection: undefined} {via_master: NIL}}}
The following example creates M1 rails, M2 rails, and via1 vias simultaneously.
## Create M1 and M2 standard cell rail pattern
create_pg_std_cell_conn_pattern m1_rail -layers {M1} -rail_width 0.16
create_pg_std_cell_conn_pattern m2_rail -layers {M2} -rail_width 0.14
set_pg_strategy m1_rail_strategy \
-pattern {{name: m1_rail} {nets: VDD VSS}} -core
set_pg_strategy m2_rail_strategy \
-pattern {{name: m2_rail} {nets: VDD VSS}} -core
run_block_compile_pg command to use distributed computing to create the power
plan.
The following example uses the characterize_block_pg,
set_constraint_mapping_file, and run_block_compile_pg commands to instantiate
the power plan using distributed computing.
characterize_block_pg -compile_pg_script compile_pg.tcl
# By default, tool generates PG mapping file "pg_mapfile"
# in the "pgroute_output" directory
set_constraint_mapping_file ./pgroute_output/pg_mapfile
run_block_compile_pg -host_options block_script
Frequently Asked Questions:
Q: How can I improve runtime when creating the PG mesh during early design
exploration?
A: During early design exploration, you can turn off via DRC checking when running
compile_pg by specifying the -ignore_via_drc option. This decreases runtime and
allows for more design iterations.
Q: How do I improve runtime when implementing the top-level PG?
A: If the PG in the blocks is fully implemented, treat the blocks as blockages
when creating the top-level PG. Otherwise, the tool must perform DRC checking on these
blocks and the runtime increases. You can specify the blocks as blockages with the
-blockage {blocks: $blocks} option of set_pg_strategy.
Q: What is the next step if IC Validator (ICV) detects a PG DRC violation?
A: Write out the IC Compiler II technology file and verify that the corresponding ICV
runset rule is actually defined in the technology file. The DRC violation might be caused
by an inconsistency between the ICV runset and IC Compiler II technology file.
Q: When should I create stapling vias between parallel M1/M2 rails?
A: Create stapling vias between M1/M2 rails as the last step of PG creation, after
creating the PG mesh. Otherwise, the M1/M2 vias might affect PG mesh creation
runtime. In certain advanced technology nodes, stapling vias should be created after
signal routing.
Q: What is the next step if certain wires or vias are missing during PG creation?
A: Use the -ignore_drc option with compile_pg for PG creation. You can also use
check_pg_drc or ICV to check DRCs in the local area to determine if the missing wires
or vias are due to certain DRCs.
Q: When should I use the -low_end_reference_point and
-high_end_reference_point options with the create_pg_wire_pattern command?
A: Use these options when segment wires are required, for example, when creating
bridging straps. Otherwise, avoid using these two options.
Q: How do I avoid incomplete or missing PG meshes when creating a single-layer
mesh?
A: Include {trim: false} with the -layers option of create_pg_mesh_pattern when
creating a mesh pattern for a single-layer PG.
SCENARIO
MODES TYPE:
1. FUNCTIONAL MODE.
2. TEST MODE.
EACH MODE CONTAINS ITS OWN SDC CONSTRAINTS.
A DESIGN HAS DIFFERENT FUNCTIONAL MODES, AND EACH MODE HAS A DIFFERENT SDC.
WORST CASE CORNER.
FOR SETUP :
Arrival Path-------- |
|--------------------->Max Delays
Data path------------|
Required Path------------------------------->Min Delays
FOR HOLD :
Arrival Path-------- |
|--------------------->Min Delays
Data path------------|
PHYSICAL VERIFICATION:
IN PHYSICAL VERIFICATION IT CHECKS:
1. LVS(LAYOUT VERSUS SCHEMATIC)
2. DRC(DESIGN RULE CHECK)
3. ERC(ELECTRICAL RULE CHECK)
LAYOUT VERSUS SCHEMATIC(LVS):
INPUTS ARE THE NETLIST (.LVS.V), THE LAYOUT (.GDSII), AND THE RULE DECK FILES.
LVS COMPARES THE TWO ELECTRICAL CIRCUITS FOR EQUIVALENCE WITH RESPECT TO THEIR
"CONNECTIVITY" AND "TOTAL TRANSISTOR COUNT".
CHECKS ARE:
● WELL AND SUBSTRATE AREAS FOR PROPER CONTACTS AND SPACINGS, THEREBY
ENSURING CORRECT POWER AND GROUND CONNECTIONS.
● TO LOCATE FLOATING DEVICES AND FLOATING WELLS.
● TO LOCATE DEVICES WHICH ARE SHORTED.
● TO LOCATE DEVICES WITH MISSING CONNECTIONS.
● GATES CONNECTED DIRECTLY TO SUPPLIES.
● FLOATING INPUTS.
FORMAL VERIFICATIONS:
SETUP CHECK: THE DATA LAUNCHED AT THE SENSITIVE EDGE OF THE LAUNCH FLOP
SHOULD BE CAPTURED AT THE NEXT SENSITIVE EDGE OF THE CAPTURE FLOP.
HOLD TIME: THE MINIMUM AMOUNT OF TIME THE DATA SHOULD BE STABLE AFTER
THE ARRIVAL OF THE SENSITIVE CLOCK EDGE.
HOLD CHECK: THE DATA LAUNCHED AT THE SENSITIVE EDGE OF THE LAUNCH FLOP
SHOULD NOT BE CAPTURED AT THE SAME SENSITIVE EDGE OF THE CAPTURE FLOP.
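The setup and hold checks above can be turned into slack numbers. The sketch below is a minimal illustration in Python, not the method of any particular timing tool. It assumes the usual textbook formulation (arrival time compared against required time), and every parameter name is hypothetical. All values are in the same time unit, for example picoseconds.

```python
def setup_slack(clock_period, launch_latency, capture_latency,
                clk_to_q_max, data_delay_max, setup_time):
    """Setup check: data must arrive before the NEXT capture edge (max delays)."""
    arrival = launch_latency + clk_to_q_max + data_delay_max
    required = clock_period + capture_latency - setup_time
    return required - arrival  # negative value means a setup violation


def hold_slack(launch_latency, capture_latency,
               clk_to_q_min, data_delay_min, hold_time):
    """Hold check: data must not be captured at the SAME edge (min delays)."""
    arrival = launch_latency + clk_to_q_min + data_delay_min
    required = capture_latency + hold_time
    return arrival - required  # negative value means a hold violation


if __name__ == "__main__":
    print(setup_slack(1000, 100, 120, 80, 700, 50))
    print(hold_slack(100, 120, 60, 30, 40))
```

A positive slack means the check passes. The setup fixes listed in these notes work by reducing the data arrival time or by relaxing the required time.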
SETUP FIXES:
1. BUFFER INSERTION
2. UPSIZING THE DRIVER CELL
3. REDUCE NET LENGTH
4. CELL UP SIZING.
5. DRIVE STRENGTH OF LAUNCH FLOP INCREASE.
6. LOGICAL OPTIMIZATION ON DATA PATH.
7. USEFUL SKEW.
8. PIPELINING.
9. USE SYNC CELLS.
10. NET WIDTH INCREASE.
11. USE LVT CELLS.
12. SPLITTING THE COMBINATIONAL LOGIC.
13. INCREASE CLOCK PERIOD.
14. USING DOUBLE SYNCHRONIZER USING FLIP FLOPS.
15. REDUNDANT VIA.
16. REDUCE HIGH-FANOUT NETS WITHIN THE LOGIC
17. DOUBLE VIA
18. LAYER JUMPING
HOLD FIXES:
WHEN A HIGH CURRENT DENSITY FLOWS THROUGH A LONG WIRE FOR A LONG TIME, THE
ELECTRONS MOVE WITH HIGH MOMENTUM AND TRANSFER IT TO THE METAL ATOMS. AS A
RESULT, THE METAL ATOMS CAN MIGRATE AND MOVE AWAY FROM THEIR ORIGINAL
POSITIONS IN THE WIRE (ELECTROMIGRATION).
FIXES
FIXING CROSSTALK
CROSS TALK:
REDUCING TECHNIQUES:
FIXING DRC'S
DRC'S FIXING
DRC'S ARE DIFFERENT TYPES :
1. LOGICAL DRC'S.
2. PHYSICAL DRC'S.
LOGICAL DRC'S:
1. MAX TRANSITION
2. MAX CAPACITANCE
3. MAX FANOUT
MAX TRANSITION:
FIXING TECHNIQUES:
MAX CAPACITANCE:
FIXING TECHNIQUES:
MAX FANOUT:
FIXING TECHNIQUES:
PHYSICAL DRC'S:
FIXING TECHNIQUES:
● STDCELLS:
○ Basic logic cells (gates, flops).
● TAP CELLS:
○ Avoid the latch-up problem (these cells are placed at a regular distance from each other).
○ Tap cells are physical-only cells that have power and ground pins and do not have
signal pins.
○ Tap cells are well-tie cells that bias the n-wells and the p-substrate (or
p-wells).
○ They are traditionally used so that Vdd is connected to the n-well and Gnd to the
substrate.
○ Tying the wells to Vdd and Gnd results in less drift and prevents
latch-up.
○ Required by some technology libraries to limit the resistance between the Power or
Ground connections and the wells of the substrate.
● TIE CELLS :
○ Used to prevent damage to cells: a tie-high cell connects Vdd on one side to the
signal net on the other, and a tie-low cell connects Vss on one side to the signal
net on the other.
○ Tie-high and tie-low cells are used to connect the gate of a transistor to
either Power or Ground.
○ In lower technology nodes, if the gate is connected directly to Power or Ground, the
transistor might be turned "ON/OFF" due to Power or Ground bounce.
○ These cells are part of the std cell library.
○ Cell inputs that require Vdd (typically constant signals tied to 1) connect to tie-high
cells.
○ Cell inputs that require Vss (typically constant signals tied to 0) connect
to tie-low cells.
● END CAP CELLS:
○ Endcap cells mark the end of a row. They are placed at the row edges to protect
the cells at the end of the row from damage during manufacturing.
○ You can add endcap cells at both ends of a cell row.
○ Endcap cells surround the core area and have features that serve as a second poly for
the cells placed at the edge of the row.
○ These library cells have no signal connectivity; they are connected only to the
Power and Ground rails.
○ They ensure that gaps do not occur between the "WELL" and "IMPLANT LAYER",
and they prevent DRC violations by satisfying the "WELL TIE - OFF"
requirements for core rows.
○ They usually add the "Well Extension" needed for DRC-correct designs.
○ End caps are a "POLY EXTENSION" to avoid drain-source SHORTS.
● DECAP CELLS:
○ Charge sharing: to reduce dynamic IR drop, charge is stored in these cells and
released to the power nets.
○ Decoupling capacitor cells, or decap cells, are cells that have a capacitor
placed between the Power rail and the Ground rail to overcome dynamic voltage drop.
○ Dynamic IR drop happens at the active edge of the clock, when a high
current is drawn from the power grid for a short duration.
○ If the power source is far from a flop, there is a chance that the flop goes into a
metastable state.
○ To overcome this, decaps are added. When the current requirement is high, the
decaps discharge and provide a boost to the power grid.
● FILLER CELLS:
○ Filler cells are used to fill the gaps between the cells after placement.
○ Filler cells are used to establish the continuity of the N-wells and the
IMPLANT LAYERS on the standard cell rows. Some cells also do not
have a bulk connection (substrate connection) because of their small size
(thin cells).
○ In those cases, abutting cells by inserting filler cells can connect
the substrates of the small cells to the Power/Ground nets.
○ i.e. those thin cells can use the bulk connection of the other cells (this is one of
the reasons why a standalone LVS check fails on some cells).
● ICG CELLS:
○ Clock gating cells, used to reduce dynamic power dissipation.
○ Register banks are disabled during some clock cycles.
○ During idle modes, the clocks can be gated off to save dynamic power
dissipation in the flip-flops.
○ A proper circuit is essential to achieve a gated clock state and to prevent false glitches
on the clock paths.
● POWER GATING CELLS:
■ Power gating is used to reduce static power dissipation.
○ Power Gating Cells:
■ Power switches
■ Level Shifters
■ Retention registers
■ Isolation cells
■ Power controller
● PAD CELLS:
○ To interface with outside devices: the power, clock, and signal pins are connected
to pad cells, which connect to the outside.
● CORNER CELLS:
○ Corner pads are used for well continuity.
○ To lift the chip.
● MACRO CELLS:
○ Memories.
○ The memory cells are called macros.
○ Storing information using sequential elements takes up a lot of area.
○ A single flip-flop can take 15 to 20 transistors to store one bit. Macros store the
data efficiently and occupy comparatively little space on the chip.
● SPARE CELLS:
○ Used during ECO.
○ Spare cells are standard cells in a design that are not used by the netlist.
○ Placing spare cells in your design provides a margin for correcting logical
errors that might be detected later in the design flow, or for adjusting the speed
of your design.
○ Spare cells are used by the fix ECO command during the ECO process.
● PAD FILLER CELLS:
○ Used for well continuity; placed in between pads.
● JTAG CELLS:
○ These are used to check the IO connectivity.
FILES:
CALCULATIONS:
POWER CALCULATIONS:
----->NUMBER OF CORE POWER PADS REQUIRED FOR EACH SIDE OF CHIP=(TOTAL
CORE POWER)/{(NUMBER OF SIDES)*(CORE VOLTAGE)*(MAXIMUM ALLOWABLE CURRENT
FOR AN I/O PAD)} .
Wtotalstrap = Itotal/(2*Rj)
L<(Vmax)/(Rj*Rs)
Rs=SHEET RESISTANCE
H=HEIGHT
W=WIDTH
IR DROP:
=IstrapAvg*Rs*(W/2)*(1/Wstrap)
Nstrappinspace = Dpadspacing/Lspace.
POWER
=LEAKAGE POWER+[(Vdd*Isc)+(C*V*V*F)+(1/2*C*V*V*F)]
C=LOAD CAP
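As a worked example of the pad-count and IR-drop formulas above, here is a minimal Python sketch. The function and parameter names are hypothetical, the units are assumed to be consistent (watts, volts, amperes, and one length unit for both widths), and rounding the pad count up to a whole pad is an assumption added here, not something stated in the notes.

```python
import math


def core_power_pads_per_side(total_core_power, core_voltage,
                             max_pad_current, sides=4):
    """Pads per side = total core power / (sides * core voltage * max pad current)."""
    pads = total_core_power / (sides * core_voltage * max_pad_current)
    return math.ceil(pads)  # assumption: round up, a fraction of a pad cannot be placed


def strap_ir_drop(i_strap_avg, sheet_resistance, core_width, strap_width):
    """IR drop = IstrapAvg * Rs * (W / 2) * (1 / Wstrap), as written in the notes."""
    return i_strap_avg * sheet_resistance * (core_width / 2) * (1 / strap_width)


if __name__ == "__main__":
    # 2.6 W core, 1.0 V supply, 62.5 mA allowed per pad, 4 sides
    print(core_power_pads_per_side(2.6, 1.0, 0.0625))
    # 20 mA average strap current, 0.05 ohm/sq, 1000 um core width, 10 um strap width
    print(strap_ir_drop(0.02, 0.05, 1000.0, 10.0))
```

The strap-width and strap-length formulas are left out because the notes do not define Rj.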
FINAL VERIFICATION:
1. PARASITIC EXTRACTION: EXTRACTS THE R AND C VALUES NEEDED TO COMPUTE ACCURATE
DELAYS. TOOL: STAR-RCXT.
2. TIMING VERIFICATION: DONE USING THE PRIMETIME TOOL.
3. LVS, ERC CHECKS: DONE USING THE CALIBRE OR HERCULES TOOLS.
4. DRC CHECKS: DONE USING THE CALIBRE OR HERCULES TOOLS.
AFTER VERIFICATION:
AFTER GDS
CHIP FINISHING:
WE NEED TO DO:
● REDUNDANT VIA IS THE TECHNIQUE FOR REDUCING VOIDS IN THE METAL LAYER.
● FILLER CELL INSERTION IS ONE OF THE TECHNIQUES FOR UTILIZING THE
TOTAL AREA WITHOUT GAPS.
● IT IS A GOOD TECHNIQUE BECAUSE IN THE FUTURE WE CAN REPLACE FILLER
CELLS WITH SPARE CELLS THAT CONTAIN LOGIC.
● THE CHEMICALS USED AT THE TIME OF ETCHING CAUSE EXCESS METAL LOSS;
TO COMPENSATE FOR THIS WE INSERT METAL FILLS.
METAL SLOTTING
● METAL SLOTTING IS A TECHNIQUE FOR AVOIDING PROBLEMS LIKE METAL LIFT-OFF
AND METAL EROSION.
ECO:
--------->IN FREEZE SILICON ECO WE HAVE NO CHANCE OF ADDING CELLS; SPARE
CELLS ARE USED INSTEAD.
----------->IN NON FREEZE SILICON ECO WE CAN ADD CELLS AFTER ROUTING.
DETAIL ROUTING:
----->DETAIL ROUTING DOES NOT WORK ON THE ENTIRE CHIP AT THE SAME TIME, UNLIKE
TRACK ASSIGNMENT.
SBOX : THE BLOCK IS DIVIDED INTO MINI BOXES THAT ARE USED FOR DETAIL ROUTING.
TRACK ASSIGNMENTS :
---->ASSIGNS EACH NET TO SPECIFIC TRACKS.
----->TRACES=METAL CONNECTIVITY.
GLOBAL ROUTING:
--->FIRST THE DESIGN IS DIVIDED INTO SMALL BOXES; EVERY BOX IS CALLED A GLOBAL
ROUTING CELL (GCELL OR BUCKET).
------->IF ANY GCELL HAS CONGESTION, THE ROUTE DETOURS (AVOIDS THAT GCELL BY ROUTING
THROUGH ANOTHER GCELL).
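The GCell idea can be illustrated with a small Python sketch. It only shows how a coordinate maps to a GCell and how congested GCells might be flagged. The demand and capacity model (a GCell is congested when its routing demand exceeds its track capacity) is an assumption for illustration, and the function names are hypothetical. It is not the algorithm of any router.

```python
def gcell_index(x, y, gcell_size):
    """Map a point to the (column, row) of the GCell that contains it."""
    return (int(x // gcell_size), int(y // gcell_size))


def congested_gcells(demand, capacity):
    """Return the GCells whose routing demand exceeds the available capacity.

    demand: dict mapping (column, row) to the number of tracks requested.
    capacity: number of tracks available in each GCell (assumed uniform).
    """
    return sorted(cell for cell, used in demand.items() if used > capacity)


if __name__ == "__main__":
    print(gcell_index(25, 7, 10))
    print(congested_gcells({(0, 0): 5, (1, 0): 12, (2, 0): 10}, capacity=10))
```

A global router would detour nets away from the GCells returned by `congested_gcells`.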
ROUTING
ROUTING:
(i)GLOBAL ROUTING
(ii)TRACK ASSIGNMENT
(iii)DETAIL ROUTING
EXTRA ONE
(iv)SEARCH AND REPAIR
CTS OPTIMIZATION
OPTIMIZATIONS TECHNIQUES:
OPTIMIZATION PROCESS:
NDR'S:
(ii)DOUBLE SPACING.
(iii)SHIELDING
BY DEFAULT, THE NON DEFAULT ROUTING RULE APPLIES TO ALL LEVELS OF THE CLOCK TREE. BUT
IT IS BETTER TO AVOID USING NDR RULES AT THE CLOCK SINK PINS.
SETUP TIME :THE MINIMUM AMOUNT OF TIME THE DATA SHOULD BE STABLE BEFORE
ARRIVAL OF SENSITIVE CLOCK.
HOLD TIME :THE MINIMUM AMOUNT OF TIME THE DATA SHOULD BE STABLE AFTER
ARRIVAL OF SENSITIVE CLOCK.
SETUP CHECK:THE DATA LAUNCHED AT THE SENSITIVE EDGE OF THE LAUNCH FLOP SHOULD
BE CAPTURED AT THE NEXT SENSITIVE EDGE OF THE CAPTURE FLOP.
(ii)OUTPUT PORT.
BY COMBINING THESE START AND END POINTS WE GET THE TIMING PATHS.
DEPENDING ON THE START POINTS AND END POINTS WE HAVE FOUR TIMING GROUPS
PRESENT.
1. INPUT GROUP
2. REGISTER GROUP
3. FEED THROUGH GROUP.
4. OUTPUT GROUP.
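The four groups can be derived from the kind of start point and end point of a path. The Python sketch below assumes the conventional mapping (input port to register, register to register, input port to output port, register to output port). The function name and its boolean parameters are hypothetical.

```python
def timing_group(starts_at_input_port, ends_at_output_port):
    """Classify a timing path by its start point and end point."""
    if starts_at_input_port and ends_at_output_port:
        return "FEED THROUGH GROUP"   # input port to output port
    if starts_at_input_port:
        return "INPUT GROUP"          # input port to register
    if ends_at_output_port:
        return "OUTPUT GROUP"         # register to output port
    return "REGISTER GROUP"           # register to register


if __name__ == "__main__":
    for start_is_port, end_is_port in [(True, False), (False, False),
                                       (True, True), (False, True)]:
        print(start_is_port, end_is_port, timing_group(start_is_port, end_is_port))
```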
----->CTS CONNECTS THE CLOCKS TO ALL THE CLOCK PINS OF THE SEQUENTIAL
CIRCUITS.
(ii)max capacitance,
(iii)max fanout,
------->A BUFFER TREE IS BUILT TO BALANCE THE LOADS AND MINIMIZE THE SKEW.
-------->A CLOCK TREE HAS BUFFER LEVELS BETWEEN THE CLOCK SOURCE AND THE CLOCK
SINKS (END POINTS).
-------->CLOCK PINS ARE OF DIFFERENT TYPES: (i) STOP PINS,
(ii)FLOAT PINS,
(iii)EXCLUDE PINS.
--------->NON-STOP PINS: NONSTOP PINS ARE PINS THROUGH WHICH CLOCK TREE
TRACING CONTINUES, AGAINST THE DEFAULT BEHAVIOUR.
PLACEMENT OPTIMIZATION
PLACEMENT OPTIMIZATION:
BY USING THE AREA RECOVERY OPTION WE CAN REDUCE THE CELLS , POWER, TIMING.
BY USING THE DFT OPTION WE CAN REDUCE THE ROUTING RESOURCES BY REORDERING
THE SCAN CHAINS.
POWER SETUP:
WE HAVE TWO TYPE OF THE POWER DISSIPATIONS:
STATIC POWER DISSIPATION: WHEN CELLS ARE IN THE "OFF" STATE, STATIC POWER IS
DISSIPATED DUE TO CELL LEAKAGE.
IN MOST ARCHITECTURES WE USE POWER GATING TO
REDUCE THE STATIC POWER DISSIPATION.
REDUCING THE LENGTH OF NETS WITH A HIGH TOGGLE RATE: THE TOGGLE RATE IS
OBTAINED FROM THE SWITCHING ACTIVITY FILE (.SAIF), WHICH COMES FROM THE SIMULATION TEAM.
TO REDUCE THE LENGTH OF THESE NETS, CELLS CONNECTED BY HIGH TOGGLE RATE NETS
ARE PLACED NEAR EACH OTHER.
ANOTHER TECHNIQUE IS ADDING A BUFFER IN THE MIDDLE OF LONG
NETS TO REDUCE THE HIGH COUPLING CAPACITANCE (REDUCING THE LOAD
CAPACITANCE).
ANOTHER TECHNIQUE IS CLONING: THE SAME CELL IS DUPLICATED AND
SOME OF THE OUTPUT NETS ARE CONNECTED TO THE COPY (SHARING THE LOAD).
MOSTLY IN DESIGN WE USE CLOCK GATING TO REDUCE THE DYNAMIC
POWER DISSIPATION.
PLACEMENT (DFT SETUP)
DFT SETUP:
SCAN CHAINS: SCAN CHAINS ARE NOTHING BUT A GROUP OF REGISTERS CONNECTED
SERIALLY.
THE ISSUE IS THAT PREEXISTING SCAN CHAINS MAY CONNECT REGISTERS THAT ARE FAR APART, BECAUSE THEY
ARE CONNECTED BASED ON FUNCTIONALITY.
LOAD THE SCAN CHAIN FILE. IF THERE IS A PROBLEM WITH THE PREEXISTING SCAN CHAINS, THEN
REORDER THE SCAN REGISTERS.
IF THE GIVEN NETLIST IS IN .ddc FORMAT THEN THERE IS NO NEED TO LOAD THE .scandef FILE.
IF THE GIVEN NETLIST IS IN .v FORMAT THEN WE HAVE TO LOAD THE .scandef FILE.
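To see why reordering helps, the Python sketch below reorders a scan chain with a simple greedy nearest-neighbour pass over the placed register locations and compares the chain wire length before and after. This is only an illustration. The greedy heuristic, the Manhattan distance metric, and the register names are assumptions, and real tools use their own reordering algorithms driven by the .scandef data.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order, locations):
    """Total Manhattan length of the scan connections for a given order."""
    return sum(manhattan(locations[a], locations[b])
               for a, b in zip(order, order[1:]))


def reorder_scan_chain(order, locations):
    """Greedy reorder: keep the first register, then always hop to the nearest one."""
    if not order:
        return []
    new_order = [order[0]]
    remaining = set(order[1:])
    while remaining:
        here = locations[new_order[-1]]
        # sorted() makes tie-breaking deterministic
        nearest = min(sorted(remaining),
                      key=lambda name: manhattan(here, locations[name]))
        new_order.append(nearest)
        remaining.remove(nearest)
    return new_order


if __name__ == "__main__":
    locations = {"r0": (0, 0), "r1": (100, 0), "r2": (10, 0), "r3": (90, 0)}
    functional_order = ["r0", "r1", "r2", "r3"]
    placed_order = reorder_scan_chain(functional_order, locations)
    print(placed_order, chain_length(functional_order, locations),
          chain_length(placed_order, locations))
```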
PLACEMENT
1. PLACEMENT CHECKS,
2. AHFNS
3. DFT SETUP.
4. POWER SETUP.
5. PLACEMENT OPTIMIZATION.
PLACEMENT :
-->FIX THE MACRO PLACEMENT AGAIN: AFTER INSERTING THE DESIGN, CHECK WHETHER THE MACROS
HAVE MOVED.
-->NON DEFAULT RULES ARE SPECIAL RULES, LIKE DOUBLE SPACING AND DOUBLE
WIDTH. THESE ARE APPLIED TO CLOCK WIRES BECAUSE THOSE ARE HIGH ACTIVITY
NETS.
1. FLOOR PLAN ,
2. NETLIST,
3. NARROW PLACEMENT REGIONS,
4. R,C FOR ROUTING LAYERS,
5. DESIGN CONSTRAINTS.
FLOORPLAN(TIMING)
AFTER ACCEPTING THE CONGESTION AND TIMING, WRITE OUT THE .def FILE
AND SAVE THE DESIGN. THIS .def FILE IS GIVEN AS INPUT TO PLACEMENT.
FLOORPLAN (CONGESTION)
MORE FIXES :
POWER PLANNING
IN POWER PLANNING
----->NUMBER OF CORE POWER PADS REQUIRED FOR EACH SIDE OF CHIP=(TOTAL
CORE POWER)/{(NUMBER OF SIDES)*(CORE VOLTAGE)*(MAXIMUM ALLOWABLE CURRENT
FOR AN I/O PAD)} .
Wtotalstrap = Itotal/(2*Rj)
L<(Vmax)/(Rj*Rs)
Rs=SHEET RESISTANCE
W=WIDTH
IR DROP:
=IstrapAvg*Rs*(W/2)*(1/Wstrap)
Nstrappinspace = Dpadspacing/Lspace.
POWER
=LEAKAGE POWER+[(Vdd*Isc)+(C*V*V*F)+(1/2*C*V*V*F)]
C=LOAD CAP
IN FLOOR PLAN
1. CREATE PHYSICAL ONLY PAD CELLS. PHYSICAL ONLY CELLS ARE CELLS
THAT HAVE PHYSICAL INFORMATION ONLY. NO LOGICAL INFORMATION
IS PRESENT, AND THEY DO NOT HAVE TIMING INFORMATION EITHER.
2. PHYSICAL ONLY PAD CELLS ARE (i)VDD,VSS PAD CELLS,(ii)CORNER PAD CELLS.
3. PAD CELLS ACT AS PORTS AT THE CHIP LEVEL.
4. CHIP OUTSIDE PINS ARE CONNECTED TO THE INNER CHIP PADS.
5. PADS TYPES:(i)POWER PADS, (ii)DATA PADS .
6. A PAD POWER RING IS CREATED TO SUPPLY POWER TO ALL THE PADS.
7. VDD,VSS PADS ARE CONNECTED TO THE CORE VDD,VSS POWER RINGS.
8. THE GAPS BETWEEN THE PADS ARE FILLED WITH PAD FILLER CELLS.
9. THESE PAD FILLER CELLS ARE FOR WELL CONTINUITY.
1. PAD CELLS.
2. END CAP CELLS.
3. TAP CELLS.
4. DECAP CELLS.
FLOOR PLAN:
AT CHIP LEVEL:
WELL CONTINUITY: IF THE WELL IS NOT CONTINUOUS THEN WE
HAVE TO CREATE SPECIAL MASKS.
HARD MACRO:THE CIRCUIT IS FIXED. WE DON'T KNOW WHICH TYPES OF GATES ARE USED
INSIDE. WE KNOW ONLY THE TIMING INFORMATION. WE DON'T KNOW THE FUNCTIONALITY
INFORMATION.
SOFT MACRO:THE CIRCUIT IS NOT FIXED. WE KNOW WHICH TYPES OF GATES ARE USED
INSIDE. WE KNOW THE TIMING INFORMATION. WE KNOW THE FUNCTIONALITY
INFORMATION.
(ii)HARD BLOCKAGES.
SOFT BLOCKAGES MEAN NO STD CELLS ARE PLACED INITIALLY, BUT AT THE TIME OF
OPTIMIZATION ONLY BUFFERS ARE PLACED. THESE ARE USED (i)BETWEEN TWO
MACROS,
HARD BLOCKAGES MEAN NO STD CELLS ARE PLACED. THESE ARE USED
AROUND THE MACRO, FOR PIN ACCESS.
I/O PLACEMENTS.
CORE AREA :CORE AREA IS DEFINED FOR THE PLACEMENT OF STD CELLS,AND
MACROS.
(ii)UTILIZATION.
UTILIZATION=(STD CELL AREA+MACRO AREA+BLOCKAGE AREA)/TOTAL AREA.
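The utilization formula above is easy to check numerically. The Python sketch below is a direct transcription. The function name and the sample areas are hypothetical, and all areas are assumed to use the same unit.

```python
def utilization(std_cell_area, macro_area, blockage_area, total_area):
    """UTILIZATION = (STD CELL AREA + MACRO AREA + BLOCKAGE AREA) / TOTAL AREA."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return (std_cell_area + macro_area + blockage_area) / total_area


if __name__ == "__main__":
    # 300 + 200 + 100 units of occupied area in a 1000-unit core
    print(utilization(300.0, 200.0, 100.0, 1000.0))
```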
----->I/O PLACEMENT.
PADS ARE USED FOR INTERFACING PURPOSE,AND THESE ARE USED FOR
PROVIDING POWER SUPPLY, DATA SIGNAL,CLOCK SIGNAL.
(ii)SIGNAL PADS.
(iii)CORNER PADS.
(iv)I/O PADS.
OPTIMIZATION CONTROLS
SDC
Constraints are
--------------->Clock latency
--------------->Clock Uncertainty
--------------->Clock Transition
.V ---------->Logical Connectivity
TECHNOLOGY FILE
Layer Info :
1. Mask Name
2. Visible
3. Selectable
4. Line Style(Solid)
5. Pattern
6. Pitch
7. Cut Layer
8.
PHYSICAL LIBRARIES
-------------->Size(Dimensions,Area)
------------->Pin
------------->Port
------------->Layer
------------->Direction
--------------->Use(Signal,Power,Ground)
--------------->layer
LEFs are 3 Types : .Macro lef (Macro Info)
1)Cell View:
2)FRAM view:
LOGIC LIBRARIES
---------->Internal Power
---------->Rise Transition
----------->Fall transition
---------->>Setup rise
----------->Setup fall
----------->Hold rise
------------>Hold fall
------------->Recovery rise
-------------->Removal fall
--------------->Cell rise
-------------->Cell fall
-------------->Pin Capacitance
1. Cell name
2. Area(represent with Nand Equ Area)
3. Power (Function of input transition, Total output net Cap )
4. Functionality
5. Delay
6. Max Cap
7. Max Trans
8. Foot Print
2)Physical Design(PD)
1. DATA PREPARATION.
2. FLOOR PLAN.
3. POWER PLAN-->POWER ROUTING [PRE ROUTE]
4. PLACEMENT.
5. CLOCK TREE SYNTHESIS.-->CLOCK ROUTING.
6. ROUTING.-->DATA ROUTING.-->[POST ROUTE]
7. CHIP FINISHING.
8. VERIFICATION.
9. GDSII FILE.
ASIC Design Flow Tutorial
Using Synopsys Tools
By
Hima Bindu Kommuru
Hamid Mahmoodi
When designing a chip, the following objectives are taken into consideration:
1. Speed
2. Area
3. Power
4. Time to Market
To design an ASIC, one needs to have a good understanding of the CMOS Technology.
The next few sections give a basic overview of CMOS Technology.
In the present decade the chips being designed are made from CMOS technology. CMOS
is Complementary Metal Oxide Semiconductor. It consists of both NMOS and PMOS
transistors. To understand CMOS better, we first need to know about the MOS (FET)
transistor.
The transistor normally needs some kind of voltage initially for the channel to form.
When there is no channel formed, the transistor is said to be in the ‘cut off region’. The
voltage at which the transistor starts conducting (a channel begins to form between the
source and the drain) is called threshold Voltage. The transistor at this point is said to be
in the ‘linear region’. The transistor is said to go into the ‘saturation region’ when the
channel pinches off near the drain, so the current no longer increases with the drain voltage.
Example: Creating a CMOS inverter requires only one PMOS and one NMOS transistor.
The NMOS transistor provides the switch connection (ON) to ground when the input is
logic high. The output load capacitor gets discharged and the output is driven to a
logic’0’. The PMOS transistor (ON) provides the connection to the VDD power supply
rail when the input to the inverter circuit is logic low. The output load capacitor gets
charged to VDD . The output is driven to logic ’1’.
The output load capacitance of a logic gate is comprised of
a. Intrinsic Capacitance: Gate drain capacitance ( of both NMOS and PMOS
transistors)
b. Extrinsic Capacitance: Capacitance of connecting wires and also input
capacitance of the Fan out Gates.
In CMOS, there is only one driver, but the gate can drive as many gates as possible. In
CMOS technology, the output always drives another CMOS gate input.
The charge carriers for PMOS transistors are ‘holes’ and the charge carriers for NMOS
are electrons. The mobility of electrons is about twice that of ‘holes’. Due to this
the output rise and fall times are different. To make them the same, the W/L ratio of the PMOS
transistor is made about twice that of the NMOS transistor. This way, the PMOS and
A low to high output transition draws energy from the power supply
capacitances
Sequential Element
In CMOS, an element which stores a logic value (by having a feedback loop) is called a
sequential element. A simplest example of a sequential element would be two inverters
connected back to back. There are two types of basic sequential elements, they are:
1. Latch: The two inverters connected back to back, when connected to a
transmission gate, with a control input, forms a latch. When the control input is
high (logic ‘1’), the transmission gate is switched on and whatever value which
was at the input ‘D’ passes to the output. When the control input is low, the
transmission gate is off and the inverters that are connected back to back hold the
value.
[Figure: latch with data input D and output Q]
2. Flip-Flop: A flip flop is constructed from two latches in series. The first latch is
called a Master latch and the second latch is called the slave latch. The control
input to the transmission gate in this case is called a clock. The inverted version of
the clock is fed to the input of the slave latch transmission gate.
a. When the clock input is high, the transmission gate of the master latch is
switched on and the input ‘D’ is latched by the 2 inverters connected back
to back (basically master latch is transparent). Also, due to the inverted
clock input to the transmission gate of the slave latch, the transmission
gate of the slave latch is not ‘on’ and it holds the previous value.
b. When the clock goes low, the slave part of the flip flop is switched on and
will update the value at the output with what the master latch stored when
the clock input was high. The slave latch will hold this new value at the
output irrespective of the changes at the input of Master latch when the
clock is low. When the clock goes high again, the value at the output of
the slave latch is stored and step ‘a’ is repeated again.
If the Master latch in the flip flop latches the data at the rising clock
edge, the flip flop is called a positive-edge triggered flip flop. If the latching
happens at the negative edge of the clock, the flip flop is called a negative-edge triggered flip
flop.
[Figure: master-slave flip flop with input D, output Q, and clock CLK]
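The master-slave behaviour in steps a and b can be modelled with a few lines of Python. This is a behavioural sketch only, with hypothetical class names. It follows the steps as written (master transparent while the clock is high, slave transparent while the clock is low), so in this model the output Q changes when the clock goes low.

```python
class Latch:
    """Level-sensitive latch: passes D while enabled, otherwise holds."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlipFlop:
    """Two latches in series; the slave sees the inverted clock."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def update(self, d, clk):
        # Step a: clock high, master is transparent and the slave holds.
        master_q = self.master.update(d, enable=(clk == 1))
        # Step b: clock low, slave passes on what the master stored.
        return self.slave.update(master_q, enable=(clk == 0))


if __name__ == "__main__":
    ff = MasterSlaveFlipFlop()
    for d, clk in [(1, 1), (0, 0), (0, 1), (1, 0)]:
        print(f"D={d} CLK={clk} Q={ff.update(d, clk)}")
```

Note that changing D while the clock is low has no effect on Q, because the master latch is closed during that phase.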
2.0 Introduction
To design a chip, one needs to have an Idea about what exactly one wants to design. At
every step in the ASIC flow the idea conceived keeps changing forms. The first step to
make the idea into a chip is to come up with the Specifications.
[Figure: ASIC flow from Idea to Specifications, RTL, Physical Implementation, GDSII, and Chip]
There are three main steps in debugging the design, which are as follows
You can interactively do the above steps using the VCS tool. VCS first compiles the
verilog source code into object files, which are nothing but C source files. VCS can
compile the source code into the object files without generating assembly language files.
VCS then invokes a C compiler to create an executable file. We use this executable file to
simulate the design. You can use the command line to execute the binary file which
creates the waveform file, or you can use VirSim.
Below is a brief overview of the VCS tool, shows you how to compile and simulate a
counter. For basic concepts on verification and test bench, please refer to APPENDIX 3A
at the end of this chapter.
SETUP
Before going to the tutorial Example, let’s first setup up the directory.
You need to do the below 3 steps before you actually run the tool:
1. As soon as you log into your engr account, at the command prompt, please type “csh”
as shown below. This changes the type of shell from bash to c-shell. All the commands
work ONLY in c-shell.
[hkommuru@hafez ]$csh
2. Please copy the whole directory from the below location (cp -rf source destination)
[hkommuru@hafez ]$cd
[hkommuru@hafez ]$ cp -rf /packages/synopsys/setup/asic_flow_setup ./
This creates directory structure as shown below. It will create a directory called
“asic_flow_setup ”, under which it creates the following directories namely
The “asic_flow_setup” directory will contain all generated content including, VCS
simulation, synthesized gate-level Verilog, and final layout. In this course we will always
try to keep generated content from the tools separate from our source RTL. This keeps
our project directories well organized, and helps prevent us from unintentionally
modifying the source RTL. There are subdirectories in the project directory for each
major step in the ASIC Flow tutorial. These subdirectories contain scripts and
configuration files for running the tools required for that step in the tool flow. For this
tutorial we will work exclusively in the vcs directory.
3. Please source “synopsys_setup.tcl” which sets all the environment variables necessary
to run the VCS tool.
Please source them at unix prompt as shown below
Please Note : You have to do steps 1 and 3 above every time you log in.
In this tutorial, we would be using a simple counter example . Find the verilog code and
testbench at the end of the tutorial.
Setup
3.1.1 Compiling and Simulating
Please note that the -f option means the file specified (main_counter.f ) contains a list of
command line options for vcs. In this case, the command line options are just a list of the
verilog file names. Also note that the testbench is listed first. The below command also
will have same effect .
The +v2k option is used if you are using Verilog IEEE 1364-2001 syntax; otherwise there
is no need for the option. Please look at Figure 3.a for output of compile command.
By default the output of compilation is an executable binary file named simv.
You can specify a different name with the -o compile-time option.
For example :
vcs -f main_counter.f +v2k -o counter.simv
VCS compiles the source code on a module by module basis. You can incrementally
compile your design with VCS, since VCS compiles only the modules which have
changed since the last compilation.
2. Now, execute the simv command line with no arguments. You should see the output
from both vcs and the simulation, and the run should produce a waveform file called counter.dump in
your working directory.
[hkommuru@hafez vcs]$./counter.simv
If you look at the last page of the tutorial, you can see the testbench code, to understand
the above result better.
3. You can do STEP 1 and STEP 2 in one single step below. It will compile and simulate
in one single step. Please take a look at the command below:
To compile and simulate your design, please write your verilog code, and copy it to the
vcs directory. After copying your verilog code to the vcs directory, follow the tutorial
steps to simulate and compile.
Where debug_pp option is used to run the dve in simulation mode. Debug_pp creates a
vpd file which is necessary to do simulation. The below window will open up.
4. Now, in the data pane, select the signals with the left mouse button while holding the shift
button, so that you can select as many signals as you want. Click the right mouse button to
open a new window, and click on “Add to group => New group”. A new window will
open up showing a new group of the selected signals.
You can create any number of signal groups, so that you can organize the way
you want to see the output of the signals.
shift button so that you select as many signals as you want. Click on the right mouse button
to open a new window, and click on “Add to waves => New wave view”. A new
In the waveform window, the menu option View => Set Time Scale can be used to
change the display unit and the display precision.
7. You can save your current session and reload the same session next time or start a new
session again. In the menu option File => Save Session, the window below opens.
2. Scope => Show Schematic: You can view a schematic view of the design.
Go to the menu option Simulation => Breakpoints; this will open up a new window as shown
below. You need to do this before Step 6, i.e. before actually running the simulation.
You can browse to the file, choose the line number, and click on the “Create” button to
create breakpoints.
Now when you simulate (click on Simulate => Start), the simulation will stop at your defined
breakpoint; click on Next to continue.
You can save your session again and exit after you are done with debugging, or in the middle
of debugging your design.
Verilog Code
File : Counter.v
endmodule // counter
// Test bench gets wires for all device under test (DUT) outputs:
initial
begin
reset = 1'b1;
@(posedge clk);#1;
reset = 1'b0;
$finish;
end
endmodule // counter_testbench
RTL is expressed in Verilog or VHDL. This document will cover the basics of Verilog.
Verilog is a Hardware Description Language (HDL). A hardware description language is
a language used to describe a digital system, for example latches, flip-flops, and other
combinational and sequential elements. Basically, you can use Verilog to describe any kind
of digital system. One can design a digital system in Verilog using any level of abstraction.
The most important levels are:
Verilog).
Register Transfer Level (RTL): Designs using the Register-Transfer Level
specify the characteristics of a circuit by the transfer of data between registers,
and also its functionality; for example, finite state machines. An explicit clock is
used. An RTL design contains exact timing bounds; data transfers are scheduled
to occur at certain times.
Gate level: The system is described in terms of gates (AND, OR, NOT, NAND
etc.). The signals can have only these four logic states ('0', '1', 'X', 'Z').
Design at the gate level is normally not done by hand, because the output of logic
synthesis is the gate-level netlist.
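To make the difference between abstraction levels concrete, here is a minimal sketch (not taken from the tutorial) of one function, a 2:1 multiplexer, written once with primitive gates and once with a continuous assignment. All module and signal names are hypothetical, and the ANSI-style port lists assume Verilog-2001 syntax (the +v2k option in VCS).

```verilog
// Gate level: the multiplexer is built from primitive gates.
module mux2_gate (input a, input b, input sel, output y);
  wire sel_n, a_path, b_path;
  not g0 (sel_n, sel);
  and g1 (a_path, a, sel_n);
  and g2 (b_path, b, sel);
  or  g3 (y, a_path, b_path);
endmodule

// Higher level: the same function described by what it does.
module mux2_rtl (input a, input b, input sel, output y);
  assign y = sel ? b : a;
endmodule

// Small self-checking test bench comparing the two descriptions.
module mux2_compare_tb;
  reg a, b, sel;
  wire y_gate, y_rtl;
  reg [3:0] i;  // 4 bits so that the loop can count up to 8 and stop

  mux2_gate u_gate (.a(a), .b(b), .sel(sel), .y(y_gate));
  mux2_rtl  u_rtl  (.a(a), .b(b), .sel(sel), .y(y_rtl));

  initial begin
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, sel} = i[2:0];   // apply every input combination
      #1;                     // let the outputs settle
      if (y_gate !== y_rtl)
        $display("MISMATCH a=%b b=%b sel=%b", a, b, sel);
    end
    $display("comparison finished");
    $finish;
  end
endmodule
```

A synthesis tool would typically start from a description like mux2_rtl and produce a netlist that resembles mux2_gate, expressed in cells of the target library.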
Verilog allows hardware designers to express their designs at the behavioral level and to
defer the details of implementation to a later stage in the design of the chip. The
design is normally written in a top-down approach. The system has a hierarchy, which
makes it easier to debug and design. The basic skeleton of a Verilog module looks like
this:
module example (<ports>);
input <ports>;
output <ports>;
inout <ports>;
// Data-type instantiation
// reg data-type stores values
reg <names>;
<Instantiation>
endmodule
Modules can reference other modules to form a hierarchy. A module can contain
references to lower-level modules and describe the interconnections between
them. A reference to a lower-level module is called a module instance. Each instance is an
independent, concurrently active copy of a module. Each module instance consists of the
name of the module being instanced (e.g. NAND or INV), an instance name (unique to
that instance within the current module) and a port connection list.
The instance names in the above example are ‘N1’ and ‘V1’, and each has to be unique. The
port connection list consists of the terms in parentheses ( ). The module port
connections can be given in order (positional mapping), or the ports can be explicitly
named as they are connected (named mapping). Named mapping is usually preferred for
long connection lists, as it makes errors less likely.
2. Port mapping by order: Don’t have to specify (.in) & (.out). The
Example:
AND A1 (a, b, aandb);
If ‘a’ and ‘b’ are the inputs and ‘aandb’ is the output, then the ports must be
mentioned in the same order as shown above for the AND gate. One cannot write
it in this way:
AND A1 (aandb, a, b);
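The two port-mapping styles can be sketched side by side. This is an illustrative example, not the tutorial's own code: the NAND and INV modules below are hypothetical stand-ins for library gates, and their port names (in1, in2, in, out) are assumptions.

```verilog
// Hypothetical leaf cells standing in for library gates.
module NAND (input in1, input in2, output out);
  assign out = ~(in1 & in2);
endmodule

module INV (input in, output out);
  assign out = ~in;
endmodule

// AND gate built from a NAND followed by an inverter.
// Port order in the declaration is (a, b, aandb).
module AND (input a, input b, output aandb);
  wire nand_out;
  // Named mapping: connections may be listed in any order.
  NAND N1 (.out(nand_out), .in1(a), .in2(b));
  // Positional mapping: must follow INV's declaration order (in, out).
  INV  V1 (nand_out, aandb);
endmodule

module and_top (input a, input b, output y_pos, output y_named);
  // Positional mapping: inputs first, output last, as declared in AND.
  AND A1 (a, b, y_pos);
  // Named mapping: the output can be listed first without any problem.
  AND A2 (.aandb(y_named), .a(a), .b(b));
endmodule
```

With positional mapping, writing `AND A1 (y_pos, a, b);` would still compile, but it would connect the signals to the wrong ports, which is why named mapping is the safer choice for long port lists.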
end
Digital Design can be broken into either Combinatorial Logic or Sequential Logic. As
mentioned earlier, Hardware Description Languages are used to model RTL. RTL again
is nothing but combinational and sequential logic. The most popular language used to
model RTL is Verilog. The following are a few guidelines to code digital logic in
Verilog:
1. Not everything written in Verilog is synthesizable. The synthesis tool does not
synthesize everything that is written. We need to make sure that the logic implied
is synthesized into what we want and not into anything else.
a. Mostly, time-dependent constructs are not synthesizable in Verilog. Some of
the Verilog constructs that are not synthesizable are wait and initial
statements, delays, tasks that contain timing controls, test benches, etc.
b. Some of the Verilog constructs that are synthesizable are assign statements,
always blocks, functions, etc. Please refer to the next section for more detailed
information.
2. One can model level-sensitive and also edge-sensitive behavior in Verilog. This
can be modeled using an always block in Verilog.
a. When every output of an ‘always’ block depends only on the signals in the
sensitivity list and changes whenever they change, the block becomes a
combinational circuit; the outputs have to be completely specified, i.e. assigned
on every path through the block. If the outputs are not completely specified,
then the logic will get synthesized to a latch. The following are a few
examples to clarify this:
b. Code which results in level sensitive behavior
c. Code which results in edge sensitive behavior
d. Case Statement Example
i. casex
ii. casez
3. Blocking and Non Blocking statements
a. Example: Blocking assignment
b. Example: Non Blocking assignment
4. Modeling Synchronous and Asynchronous Reset in Verilog
a. Example: With Synchronous reset
b. Example: With Asynchronous reset
5. Modeling State Machines in Verilog
a. Using One Hot Encoding
b. Using Binary Encoding
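The guidelines above on complete assignment, blocking versus non-blocking assignments, and reset styles can be sketched as follows. These are small illustrative modules with hypothetical names, not the tutorial's original examples; ANSI-style ports assume Verilog-2001.

```verilog
// Combinational logic: y is assigned on every path, so no latch is inferred.
module comb_complete (input a, input b, input sel, output reg y);
  always @(a or b or sel) begin
    if (sel) y = a;
    else     y = b;
  end
endmodule

// Level-sensitive latch: q is not assigned when en is 0, so it must hold
// its value and a latch is inferred.
module latch_inferred (input d, input en, output reg q);
  always @(d or en) begin
    if (en) q = d;
  end
endmodule

// Edge-sensitive flip-flop with synchronous reset:
// reset is only sampled on the clock edge.
module dff_sync_reset (input clk, input reset, input d, output reg q);
  always @(posedge clk) begin
    if (reset) q <= 1'b0;
    else       q <= d;
  end
endmodule

// Flip-flop with asynchronous, active-low reset:
// reset_n appears in the sensitivity list.
module dff_async_reset (input clk, input reset_n, input d, output reg q);
  always @(posedge clk or negedge reset_n) begin
    if (!reset_n) q <= 1'b0;
    else          q <= d;
  end
endmodule

// Non-blocking assignments: q2 receives the OLD value of q1,
// giving two flip-flops in series (a shift register).
module shift_nonblocking (input clk, input d, output reg q1, output reg q2);
  always @(posedge clk) begin
    q1 <= d;
    q2 <= q1;
  end
endmodule

// Blocking assignments: q2 receives the NEW value of q1,
// so q1 and q2 both capture d on the same edge.
module shift_blocking (input clk, input d, output reg q1, output reg q2);
  always @(posedge clk) begin
    q1 = d;
    q2 = q1;
  end
endmodule
```

A common convention that follows from the last two modules is to use non-blocking assignments in clocked blocks and blocking assignments in combinational blocks.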
After designing the system, it is vital to verify the logic designed. At the front end,
this is done through simulation. In Verilog, test benches are written to verify the code.
A test bench instantiates the top-level design and provides the stimulus to the design.
test benches:
Inputs of the design are declared as ‘reg’ type. The reg data type holds a value until a
new value is driven onto it in an initial or always block. The reg type can only be
assigned a value in an always or initial block, and is used to apply stimulus to the inputs
of the Device Under Test.
Outputs of the design are declared as ‘wire’ type. The wire type is a passive data type that
holds a value driven on it by a port, assign statement or reg type. Wires cannot be
assigned values inside always and initial blocks.
Always and initial blocks are two sequential control blocks that operate on reg types
in a Verilog simulation. Each initial and always block executes concurrently in every
module at the start of simulation. An example of an initial block is shown below
Initial blocks start executing sequentially at simulation time 0. Starting with the first line
between the “begin end pair” each line executes from top to bottom until a delay is
reached. When a delay is reached, the execution of this block waits until the delay time
has passed and then picks up execution again. Each initial and always block executes
concurrently. The initial block in the example starts by printing << Starting the
Simulation >> to the screen, and initializes the reg types clk_50 and rst_l to 0 at time 0.
The simulation time wheel then advances to time index 20, and the value on rst_l changes
to a 1. This simple block of code initializes the clk_50 and rst_l reg types at the beginning
of simulation and holds rst_l low for the first 20 ns before driving it high.
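A minimal sketch of an initial block matching the description above follows. The signal names clk_50 and rst_l and the printed message come from the text; the module name, the clock generator (assuming clk_50 is a 50 MHz clock, i.e. a 20 ns period) and the end-of-simulation time are assumptions added to make the example self-contained.

```verilog
`timescale 1 ns / 100 ps

module initial_block_demo;
  reg clk_50;
  reg rst_l;

  // Executes once, top to bottom, starting at simulation time 0.
  initial begin
    $display($time, " << Starting the Simulation >>");
    clk_50 = 1'b0;       // time 0: clock starts low
    rst_l  = 1'b0;       // time 0: reset is held low
    #20 rst_l = 1'b1;    // time 20: reset goes high
  end

  // Assumed clock generator: toggles every 10 ns (20 ns period).
  always #10 clk_50 = ~clk_50;

  // Assumed stop time so that the simulation terminates.
  initial #100 $finish;
endmodule
```

The three procedural blocks run concurrently: the first initial block waits at the #20 delay while the always block keeps toggling the clock.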
Test benches also call system tasks. These system tasks are ignored by the synthesis tool, so
it is OK to use them. System task names begin with a ‘$’ sign. Some of the system
tasks are as follows:
a. $display: Displays text on the screen during simulation.
b. $monitor: Displays the results on the screen whenever one of its arguments
changes.
c. $strobe: Same as $display, but prints the text only at the end of the time
step.
d. $stop: Halts the simulation at a certain point in the code. The user can then give
the next set of instructions to the simulator. After $stop, you get back to
the CLI prompt.
e. $finish: Exits the simulator.
f. $dumpvars, $dumpfile: These dump the variables in a design to a file.
You can dump the values at different points in the simulation.
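The system tasks listed above can be tried out in a few lines. This is an illustrative sketch with hypothetical module, signal and file names; $stop is left out because it drops the simulator into interactive mode.

```verilog
`timescale 1 ns / 100 ps

module system_tasks_demo;
  reg [3:0] count;

  initial begin
    // Record all variables of this module in a VCD waveform file.
    $dumpfile("system_tasks_demo.vcd");
    $dumpvars(0, system_tasks_demo);

    // Prints a line every time count changes.
    $monitor($time, " monitor: count = %d", count);

    count = 4'd0;
    repeat (3) begin
      // Non-blocking update: count changes at the end of the time step.
      #10 count <= count + 4'd1;
      // $display runs immediately and still sees the old value.
      $display($time, " display: count = %d", count);
      // $strobe runs at the end of the time step and sees the new value.
      $strobe ($time, " strobe : count = %d", count);
    end

    #10 $finish;  // exit the simulator
  end
endmodule
```

Comparing the display and strobe lines at the same time stamp shows why $strobe is the safer choice for printing values updated by non-blocking assignments.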
task load_count;
input [3:0] load_value;
begin
@(negedge clk_50);
$display($time, " << Loading the counter with %h >>", load_value);
load_l = 1'b0;
count_in = load_value;
@(negedge clk_50);
load_l = 1'b1;
end
endtask //of load_count
This task takes one 4-bit input vector, and at the negative edge of the next clk_50, it starts
executing. It first prints to the screen, drives load_l low, and drives the count_in of the
counter with the load_value passed to the task. At the negative edge of clk_50, the load_l
signal is released. The task must be called from an initial or always block. If the
simulation was extended and multiple loads were done to the counter, this task could be
`timescale 1 ns / 100 ps
This line is important in a Verilog simulation, because it sets up the time unit and
the precision for a module. It causes unit delays to be in nanoseconds (ns) and
sets the precision to which the simulator rounds delays to 100 ps. This causes
a #5 or #1 in a Verilog assignment to be a 5 ns or 1 ns delay respectively. Delays
are rounded to 0.1 ns, i.e. 100 picoseconds.
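The effect of the time unit and the precision can be observed with a short experiment. This is a minimal sketch with a hypothetical module name; the delay value 1.26 is chosen only to show the rounding.

```verilog
`timescale 1 ns / 100 ps

module timescale_demo;
  initial begin
    #5;      // 5 time units = 5 ns
    $display("after #5    : %0.1f ns", $realtime);

    #1.26;   // 1.26 ns is rounded to the 100 ps precision, i.e. 1.3 ns
    $display("after #1.26 : %0.1f ns", $realtime);

    $finish;
  end
endmodule
```

$realtime is used instead of $time because $time returns an integer number of time units and would hide the fractional part.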
Verilog test benches use a standard that contains a description of the C language
procedural interface, better known as the programming language interface (PLI). We can
treat PLI as a standardized simulator application program interface (API) for routines
written in C or C++. The most recent extensions to PLI are known as the Verilog procedural
interface (VPI);
Before writing the test bench, it is important to understand the design specifications of
the design, and create a list of all possible test cases.
You can view all the signals and check to see if the signal values are correct, in the
waveform viewer.
When designing the test bench, you can set breakpoints at certain times, or run the
simulation in single-step mode; one can also have time-related breakpoints (Example:
execute the simulation for 10ns and then stop)
To test the design further, it is good to have randomized simulation. Random
The following is an example of a simple read, write, state machine design and a test
bench to test the state machine.
State Machine:
module state_machine(sm_in,sm_clock,reset,sm_out);
endmodule
// instantiations
state_machine #(idle_state,
read_state,
write_state,
wait_state) st_mac (
.sm_in (in1),
.sm_clock (clk),
.reset (reset),
.sm_out (data_mux)
);
// monitor section
always @ (st_mac.current_state)
case (st_mac.current_state)
idle_state : state_message = "idle";
read_state : state_message = "read";
write_state: state_message = "write";
wait_state : state_message = "wait";
endcase
// clock declaration
initial clk = 1'b0;
always #50 clk = ~clk;
// tasks
task reset_cct;
begin
@(posedge clk);
message = " reset";
task change_in1_to;
input a;
begin
message = "change in1 task";
@ (posedge clk);
in1 = a;
end
endtask
endmodule
How do you simulate your design to get the real system behavior?
The following are two methods with which it is possible to achieve real system behavior
and verify it.
4.0 Introduction
Design Compiler is a synthesis tool from Synopsys Inc. In this tutorial you will
learn how to perform hardware synthesis using Synopsys Design Compiler. In simple
terms, the synthesis tool takes an RTL [Register Transfer Level] hardware
description [a design written in either Verilog or VHDL] and a standard cell library as input,
and the resulting output is a technology-dependent gate-level netlist. The gate-level
netlist is nothing but a structural representation of the design built only from the
cells in the standard cell library. The synthesis tool internally performs many steps, which
are listed below. Also below is the flowchart of the synthesis process.
[Figure: flowchart of the synthesis process. Boxes: Read Libraries; Read Netlist; Map to
Link Library (if gate-level); Apply SDC Constraints; Map to Target Library and Optimize;
Write-out Optimized Netlist.]
While running DC, it is important to monitor/check the log files, reports, scripts etc. to
identify issues which might affect the area, power and performance of the design. In this
For additional documentation, please refer to the location below, where you can get more
information on the 90nm Standard Cell Library, Design Compiler, Design Vision, Design
Ware Libraries etc.
There are four important parameters that should be set up before one can start
using the tool. They are:
• search_path
This parameter tells the synthesis tool all the paths that it should search
when looking for a synthesis technology library for reference during synthesis.
• target_library
This parameter specifies the file that contains all the logic cells that should be used for
mapping during synthesis. In other words, during synthesis the tool maps a design to the
logic cells present in this library.
• symbol_library
This parameter points to the library that contains the “visual” information on the logic
cells in the synthesis technology library. All logic cells have a symbolic representation,
and information about the symbols is stored in this library.
• link_library
This parameter points to the library that contains information on the logic gates in the
synthesis technology library. The tool uses this library solely for reference and does not
use the cells present in it for mapping, as it does with the target_library.
An example on use of these four variables from a .synopsys_dc.setup file is given below.
search_path = ". /synopsys/libraries/syn/cell_library/libraries/syn"
target_library = class.db
link_library = class.db
symbol_library = class.db
Once these variables are set up properly, one can invoke the synthesis tool at the
command prompt using any of the commands given for the two interfaces.
Design: It corresponds to the circuit description that performs some logical function. The
design may be stand-alone or may include other sub-designs. Although a sub-design may
be part of the design, it is treated as another design by Synopsys.
Cell: It is the instantiated name of the sub-design in the design. In Synopsys terminology,
there is no differentiation between the cell and instance; both are treated as cell.
Reference: This is the definition of the original design to which the cell or instance refers.
For e.g., a leaf cell in the netlist must be referenced from the link library, which contains
the functional description of the cell. Similarly an instantiated sub-design must be
referenced in the design, which contains functional description of the instantiated
subdesign.
Ports: These are the primary inputs, outputs or IO’s of the design.
Pin: It corresponds to the inputs, outputs or IO’s of the cells in the design. (Note the
difference between port and pin)
Net: These are the signal names, i.e., the wires that hook up the design together by
connecting ports to pins and/or pins to each other.
Clock: The port or pin that is identified as a clock source. The identification may be
internal to the library or it may be done using dc_shell commands.
Library: Corresponds to the collection of technology specific cells that the design is
targeting for synthesis; or linking for reference.
Design Entry
Before synthesis, the design must be entered into the Design Compiler (referred to as DC
from now on) in the RTL format. DC provides the following two methods of design
entry:
read command
analyze & elaborate commands
The analyze & elaborate commands are two different commands, allowing designers to
initially analyze the design for syntax errors and RTL translation before building the
generic logic for the design. The generic logic or GTECH components are part of
Synopsys generic technology independent library. They are unmapped representation of
boolean functions and serve as placeholders for the technology dependent library.
The analyze command also stores the result of the translation in the specified design
library, which may be used later. So a design analyzed once need not be analyzed again and
can be merely elaborated, thus saving time. Conversely, the read command performs the
function of the analyze and elaborate commands but does not store the analyzed results,
therefore making the process slow by comparison.
One other major difference between the two methods is that, in analyze and elaborate
design entry of a design in VHDL format, one can specify different architectures during
elaboration for the same analyzed design. This option is not available in the read
command.
The commands used for both the methods in DC are as given below:
Read command:
dc_shell> read -format <format> <list of file names>
The “-format” option specifies the format of the input file, e.g. VHDL.
A sample command for reading the “adder.vhd” file in VHDL format is given below
Or
Technology libraries contain the information that the synthesis tool needs to generate a
netlist for a design based on the desired logical behavior and constraints on the design.
The tool referring to the information provided in a particular library would make
appropriate choices to build a design. The libraries contain not only the logical function
of an ASIC cell, but the area of the cell, the input-to-output timing of the cell, any
constraints on fanout of the cell, and the timing checks that are required for the cell.
The target_library, link_library, and symbol_library parameters in the startup file are
used to set the technology library for the synthesis tool.
The following guidelines, if followed, might improve the performance
of the synthesized logic and produce a cleaner design that is suited for automating the
synthesis process.
Clock logic, including clock gating and reset generation, should be kept in one block,
to be synthesized once and not touched again. This helps in a clean specification of
the clock constraints. Another advantage is that the modules that are being driven by
the clock logic can be constrained using the ideal clock specifications.
No glue logic at the top: The top block is to be used only for connecting modules
together. It should not contain any combinational glue logic. This removes the
time-consuming top-level compile; the top level can now be simply stitched together without
undergoing additional synthesis.
The module name should be the same as the file name, and one should avoid describing more
than one module or entity in a single file. This avoids any confusion while compiling
the files and during synthesis.
While coding finite state machines, the state names should be described using
enumerated types. The combinational logic for computing the next state should be in
its own process, separate from the state registers. Implement the next-state
combinational logic with a case statement. This helps in optimizing the logic much
prevent latch inferences in case statements the default part of the case statement
should always be specified. On the other hand an if statement is used for writing
priority encoders. Multiple if statements with multiple branches result in the creation
of a priority encoder structure.
Ex: always @ (A, B, C)
begin
if (A == 0) D = B;
if (A == 1) D = C;
end
The same code can be written using an if statement along with else if branches to cover all
possible branches.
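The coding guidelines above (a default branch in every case statement, next-state logic kept separate and written with a case statement, and if / else if chains for priority logic) can be sketched in Verilog as follows. Module names, state names and encodings are hypothetical and chosen only for illustration.

```verilog
// The example above rewritten with a single if / else, so that D is
// assigned on every path and no latch can be inferred.
module if_else_mux (input A, input B, input C, output reg D);
  always @(A or B or C) begin
    if (A == 1'b0) D = B;
    else           D = C;
  end
endmodule

// An if / else if chain implies priority: req[3] wins over req[2], etc.
module priority_encoder4 (
  input      [3:0] req,
  output reg [1:0] grant,
  output reg       valid
);
  always @(req) begin
    valid = 1'b1;
    if      (req[3]) grant = 2'd3;
    else if (req[2]) grant = 2'd2;
    else if (req[1]) grant = 2'd1;
    else if (req[0]) grant = 2'd0;
    else begin
      grant = 2'd0;   // final else: every output still gets a value
      valid = 1'b0;
    end
  end
endmodule

// Next-state logic in its own combinational block, written as a case
// statement with a default branch to prevent latch inference.
module next_state_logic (
  input      [1:0] state,
  input            go,
  output reg [1:0] next_state
);
  localparam IDLE = 2'b00, READ = 2'b01, WRITE = 2'b10;

  always @(state or go) begin
    case (state)
      IDLE:    next_state = go ? READ : IDLE;
      READ:    next_state = WRITE;
      WRITE:   next_state = IDLE;
      default: next_state = IDLE;  // covers the unused encoding 2'b11
    endcase
  end
endmodule
```

The state register itself would live in a separate clocked always block that simply loads next_state on each clock edge.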
Three-state buffers: A tri-state buffer is inferred whenever a high impedance (Z) is
assigned to an output. Tri-state logic is generally not recommended because it
reduces testability and is difficult to optimize, since it cannot be buffered.
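As a minimal illustration of the inference rule just described, the following sketch (with hypothetical names) assigns a high-impedance value to its output whenever the enable is low, which is the pattern a synthesis tool recognizes as a tri-state buffer.

```verilog
// Assigning 1'bz to the output when enable is low infers a tri-state buffer.
module tristate_buf (input data_in, input enable, output data_out);
  assign data_out = enable ? data_in : 1'bz;
endmodule
```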
Signals versus Variables in VHDL: Signal assignments are order independent, i.e. the
order in which they are placed within the process statement does not have any effect on
the order in which they are executed as all the signal assignments are done at the end of
the process. The variable assignments on the other hand are order dependent. The signal
assignments are generally used within the sequential processes and variable assignments
are used within the combinational processes.
A designer, in order to achieve optimum results, has to methodically constrain the design,
by describing the design environment, target objectives and design rules. The constraints
contain timing and/or area information, usually derived from the design specifications.
The synthesis tool uses these constraints to perform synthesis and tries to optimize the
design with the aim of meeting target objectives.
Design attributes set the environment in which a design is synthesized. The attributes
specify the process parameters, I/O port attributes, and statistical wire-load models. The
most common design attributes and the commands for their setting are given below:
Load: Each output can specify the drive capability that determines how many loads can
be driven within a particular time. Each input can have a load value specified that
determines how much it will slow a particular driver. Signals that are arriving later than
the clock can have an attribute that specifies this fact. The load attribute specifies how
much capacitive load exists on a particular output signal. The load value is specified in
the units of the technology library in terms of picofarads or standard loads, etc... The
command for setting this attribute is given below:
set_load <value> <object_list>
e.g. dc_shell> set_load 1.5 x_bus
Design constraints specify the goals for the design. They consist of area and timing
constraints. Depending on how the design is constrained, the DC/DA tries to meet the set
objectives. Realistic specification is important, because unrealistic constraints might
result in excess area, increased power and/or degraded timing. The basic commands to
constrain the design are
set_max_area: This constraint specifies the maximum area a particular design should
have. The value is specified in the units used to describe the gate-level macro cells in the
technology library.
e.g. dc_shell> set_max_area 0
Specifying an area of 0 makes the tool try its best to get the design as small as
possible.
create_clock: This command is used to define a clock object with a particular period and
waveform. The -period option defines the clock period, while the -waveform option
controls the duty cycle and the starting edge of the clock. This command is applied to
pin or port object types.
The following example specifies that a port named CLK is of type “clock” and has a period
of 40 ns with a 50% duty cycle. The positive edge of the clock starts at time 0 ns, with the
falling edge occurring at 20 ns. By changing the falling edge value, the duty cycle of the
clock may be altered.
e.g. dc_shell> create_clock -period 40 -waveform {0 20} CLK
set_input_delay: It specifies the input arrival time of a signal in relation to the clock. It is
used at the input ports, to specify the time it takes for the data to be stable after the clock
edge. The timing specification of the design usually contains this information, as the
setup/hold time requirements for the input signals. From the top-level timing
specifications the sub-level timing specifications may also be extracted.
e.g. dc_shell> set_input_delay -max 23.0 -clock CLK {datain}
dc_shell> set_input_delay -min 0.0 -clock CLK {datain}
Here CLK is assumed to have a period of 30 ns with a 50% duty cycle. For the above
specification of max and min input delays for datain with respect to CLK, the setup-time
requirement for the input signal datain is 7 ns (30 ns - 23 ns), while the hold-time
requirement is 0 ns.
set_output_delay: This command is used at the output port, to define the time it takes for
the data to be available before the clock edge. This information is usually provided in
the timing specification.
e.g. dc_shell> set_output_delay -max 19.0 -clock CLK {dataout}
Here CLK is again assumed to have a period of 30 ns with a 50% duty cycle. For the above
specification of max output delay for dataout with respect to CLK, the data is valid for
11 ns (30 ns - 19 ns) after the clock edge.
set_max_delay: It defines the maximum delay required, in time units, for a
particular path. In general it is used for blocks that contain combinational logic only.
However, it may also be used to constrain a block that is driven by multiple clocks, each
with a different frequency. This command has precedence over DC-derived timing
requirements.
e.g. dc_shell> set_max_delay 5 -from all_inputs() -to all_outputs()
set_min_delay: It defines the minimum delay required, in time units, for a
particular path. It is the opposite of the set_max_delay command. This command has
precedence over DC-derived timing requirements.
e.g. dc_shell> set_min_delay 3 -from all_inputs() -to all_outputs()
Setup
1. Write the Verilog Code. For the purpose of this tutorial, please consider the simple
verilog code for gray counter below.
// SIGNAL DECLARATIONS
reg [2-1:0] gcc_out;
// Compute new gcc_out value based on current gcc_out value
always @(negedge reset_n or posedge clk) begin
if (~reset_n)
gcc_out <= 2'b00;
else begin // MUST be a (posedge clk) - don't need "else if (posedge clk)"
if (en_count) begin // check the count enable
case (gcc_out)
2'b00: begin gcc_out <= 2'b01; end
2'b01: begin gcc_out <= 2'b11; end
2'b11: begin gcc_out <= 2'b10; end
default: begin gcc_out <= 2'b00; end
endcase // of case
end // of if (en_count)
end // of else
end // of always loop for computing next gcc_out value
endmodule
2. As soon as you log into your engr account, at the command prompt, please type “csh”
as shown below. This changes the type of shell from bash to c-shell. All the commands
work ONLY in c-shell.
[hkommuru@hafez ]$csh
[hkommuru@hafez ]$cd
[hkommuru@hafez ]$ cp -rf /packages/synopsys/setup/asic_flow_setup .
This creates the directory structure shown below. It will create a directory called
“asic_flow_setup”, under which it creates the following directories:
asic_flow_setup
The “asic_flow_setup” directory will contain all generated content including, VCS
simulation, synthesized gate-level Verilog, and final layout. In this course we will always
try to keep generated content from the tools separate from our source RTL. This keeps
our project directories well organized, and helps prevent us from unintentionally
modifying the source RTL. There are subdirectories in the project directory for each
major step in the ASIC Flow tutorial. These subdirectories contain scripts and
configuration files for running the tools required for that step in the tool flow. For this
tutorial we will work exclusively in the vcs directory.
3. Please source “synopsys_setup.tcl” which sets all the environment variables necessary
to run the VCS tool.
Please source them at the unix prompt as shown below.
Please note: you have to do steps 1 and 3 above every time you log in.
[hkommuru@hafez ]$cd
[hkommuru@hafez ]$cd asic_flow_setup/synth_graycounter
[hkommuru@hafez ]$cd scripts
[hkommuru@hafez ]$emacs dc_synth.tcl &
[hkommuru@hafez ]$cd ..
5. First we will learn how to run dc_shell manually, before we automate the scripts. Use
the command below to invoke dc_shell:
[hkommuru@hafez ]$ dc_shell-xg-t
Initializing...
dc_shell-xg-t>
Once you get the prompt above, you can run various commands to load Verilog files,
libraries etc. To get more information on any command, you can type “man
<command_name>” at the prompt.
The command “lappend search_path” adds the Verilog source code directory to the list of
directories in which the tool searches for the Verilog code.
The next command, “define_design_lib”, creates a Synopsys work directory, and the
last two commands, “set link_library” and “set target_library”, point to the standard
technology libraries we will be using. The DB files contain wireload models [wire load
modeling allows the tool to estimate the effect of wire length and fanout on the resistance,
capacitance, and area of nets, and to calculate wire delays and circuit speeds], area and timing
information for each standard cell. DC uses this information to optimize the synthesis
process. For more detailed information on optimization, please refer to the DC manual.
7. The next step is to load your Verilog/VHDL design into Design Compiler. The
commands to load verilog are “analyze” and “elaborate”. Executing these commands
results in a great deal of log output as the tool elaborates some Verilog constructs and
starts to infer some high-level components. Try executing the commands as follows.
Notice, that the graycount is the name of the top module to be synthesized and not the
name of the verilog file (gray_counter.v). You can see part of the analyze command in
Figure 7.a below
Figure 7.b shows a part of the elaboration for the above gray
code; the tool has inferred a flip-flop with a 2-bit width. Please make sure that you check the
log at this stage to see if any latches were inferred. We typically do not want
latches inferred in the design.
Before DC optimizes the design, it uses the Presto Verilog Compiler [for Verilog code] to
read in the designs; it also checks the code for correct syntax and builds a generic
technology (GTECH) netlist. DC uses this GTECH netlist to optimize the design. You
could also use the “read_verilog” command, which basically combines the analyze and
elaborate commands into one. You can use “read_verilog” as long as your design is not
parameterized; for instance, look at the below example of a register.
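A parameterized register of the kind referred to here might look like the following sketch. The module name, the parameter name WIDTH, its default of 8 and the port names are assumptions made for illustration; they are not taken from the tutorial.

```verilog
// Hypothetical parameterized register: WIDTH sets the number of bits.
module register #(parameter WIDTH = 8) (
  input                  clk,
  input                  reset_n,
  input      [WIDTH-1:0] d,
  output reg [WIDTH-1:0] q
);
  always @(posedge clk or negedge reset_n) begin
    if (!reset_n) q <= {WIDTH{1'b0}};  // clear all WIDTH bits
    else          q <= d;
  end
endmodule

// Inside RTL, the width can be overridden at instantiation.
module register_top (
  input         clk,
  input         reset_n,
  input  [31:0] d,
  output [31:0] q
);
  register #(.WIDTH(32)) r32 (.clk(clk), .reset_n(reset_n), .d(d), .q(q));
endmodule
```

When such a module is itself the top-level design being synthesized, there is no instantiation to carry the parameter value, which is why the width has to be supplied to the tool at elaboration time.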
If you want an instance of the above register to have a bit-width of 32, use the elaborate
command to specify this as follows:
8. Next, we check to see if the design is in a good or consistent state, meaning that
there are no errors such as unconnected ports, logical constant-valued ports, cells with no
input or output pins, mismatches between a cell and its reference, multiple driver nets etc.
dc_shell-xg-t> check_design
Please go through the check_design errors and warnings. DC cannot compile the design
if there are any errors. Many of the warnings may not be an issue, but it is still useful to
skim through this output.
9. After the design compile is clean, we need to tell the tool the constraints, before it
actually synthesizes. The tool needs to know the target frequency you want to synthesize.
Take a look at the “create_clock” command below.
You could also add additional constraints such as constrain the arrival of certain input
signals, the drive strength of the input signals, capacitive load on the output signals etc.
Below are some examples. These constraints are defined by you, the user; hence we can
call them user specified constraints.
Set input constraints by defining how much time would be spent by signals arriving into
your design, outside your design with respect to clock.
Similarly you can define output constraints, which define how much time would be spent
by signals leaving the design, outside the design, before being captured by the same clk.
Set area constraints: setting the maximum allowed area to 0 simply instructs Design
Compiler to use as little area as possible.
Please refer to tutorial on “Basics of Static Timing Analysis” for more understanding of
concepts of STA and for more information on the commands used in STA, please refer to
the Primetime Manual and DC Compiler Manual at location /packages/synopsys/
10. Now we are ready to use the compile command to actually synthesize our design into
a gate-level netlist. Two of the most important options for the compile command are the
map effort and the area effort. Both of these can be set to one of none, low, medium, or
high. They specify how much time to spend on technology mapping and area reduction.
DC will attempt to synthesize your design while still meeting the constraints. DC
considers two types of constraints: user specified constraints and design rule constraints.
We looked at the user specified constraints in the previous step. Design rule constraints
are fixed constraints which are specified by the standard cell library. For example, there
are restrictions on the loads specific gates can drive and on the transition times of certain
pins. To get a better understanding of the standard cell library, please refer to Generic
90nm library documents in the below location which we are using in the tutorial.
Also, note that the compile command does not optimize across module boundaries by
default. To enable inter-module optimization you have to remove the hierarchy, for
example with the ungroup command (see "Removing hierarchy" below). For more
information on the compile command consult the Design Compiler User Guide
(dc-user-guide.pdf) or use man compile at the DC shell prompt.
The compile command will report how the design is being optimized. You should see DC
performing technology mapping, delay optimization, and area reduction. Figure 7.c
shows a fragment from the compile output. Each line is an optimization pass. The area
column is in units specific to the standard cell library, but for now you should just use the
area numbers as a relative metric. The worst negative slack column shows how much
room there is between the critical path in your design and the clock constraint. Larger
negative slack values are worse since this means that your design is missing the desired
clock frequency by a greater amount. Total negative slack is the sum of all negative slack
across all endpoints in the design - if this is a large negative number it indicates that not
only is the design not making timing, but it is possible that many paths are too slow. If
the total negative slack is a small negative number, then this indicates that only a few
paths are too slow. The design rule cost is an indication of how many cells violate one of
the standard cell library design rule constraints.
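To make the two slack metrics concrete, here is a small plain-Tcl sketch that computes the worst negative slack and the total negative slack from a list of endpoint slacks. The slack values are made up for illustration, and the sign convention (violations are negative numbers) is an assumption that follows the wording of the paragraph above.

```tcl
# Summarize a list of endpoint slacks (hypothetical values, in ns).
proc slack_summary {slacks} {
    set wns 0.0   ;# worst (most negative) slack seen so far
    set tns 0.0   ;# running sum of all negative slacks
    foreach s $slacks {
        if {$s < 0} {
            set tns [expr {$tns + $s}]
            if {$s < $wns} { set wns $s }
        }
    }
    return [list $wns $tns]
}

# Two of the four endpoints below violate timing.
lassign [slack_summary {0.35 -0.10 -0.42 1.20}] wns tns
puts "WNS = $wns ns, TNS = $tns ns"
```

A TNS close to the WNS means only one or two endpoints fail; a TNS many times larger than the WNS means many endpoints fail.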
You can run the compile command more than once, for as many iterations as you want. For
example, a first iteration can optimize only for timing, which might come at a high area
cost; a second iteration can then optimize for area, but could cause the design to no longer
meet timing. There is no limit on the number of iterations; however, each design is different,
and you need to do a number of runs to decide how many iterations it needs.
We can now use various commands to examine timing paths, display reports, and further
optimize the design. Using the shell directly is useful for finding out more information
about a specific command or playing with various options.
In addition to the actual synthesized gate-level netlist, the dc_synth.tcl also generates
several text reports. Reports usually have the rpt filename suffix. The following is a list
of the synthesis reports.
The synth_area.rpt report contains area information for each module in the design. Figure 7.d
shows a fragment from synth_area.rpt. We can use the synth_area.rpt report to gain
insight into how various modules are being implemented. We can also use the area report
to measure the relative area of the various modules.
You can find all these reports in the below location for your reference.
/packages/synopsys/setup/project_dc/synth/reports/
You can also look at command.log , in the synth directory, which will list all the
commands used in the current session.
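Reports like the ones named in this section are typically produced by redirecting report commands to files. A minimal sketch, assuming a reports/ directory exists under the synth directory and that dc_synth.tcl uses the standard report commands (its actual contents are not shown here, so treat this as an approximation):

```tcl
# File names follow the report names used in this tutorial.
report_area                 > reports/synth_area.rpt
report_cell                 > reports/synth_cells.rpt
report_timing -max_paths 10 > reports/synth_timing.rpt
```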
Library(s) Used:
saed90nm_typ (File:
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm
/Digital_Standard_Cell_Library/synopsys/models/saed90nm_typ.db)
Number of ports: 5
Number of nets: 9
Number of cells: 5
Number of references: 3
The synth_cells.rpt report contains the list of cells in the design, as you can see in Figure
4.e. From this report, you can see the breakdown of the area of each cell in the design.
Cell Count
-----------------------------------
Hierarchial Cell Count: 0
Hierarchial Port Count: 0
Leaf Cell Count: 5
-----------------------------------
Area
-----------------------------------
Combinational Area: 29.492001
Noncombinational Area: 64.512001
Net Area: 0.000000
-----------------------------------
Cell Area: 94.003998
Design Area: 94.003998
Design Rules
-----------------------------------
Total Number of Nets: 9
Nets With Violations: 0
-----------------------------------
Hostname: hafez.sfsu.edu
1
The synth_timing.rpt report contains the critical timing paths.
You can see below an example of a timing report dumped out from synthesis. As you can
see from the last line of Figure 7.f, this path meets timing. The report lists the critical
path of the design. The critical path is the slowest logic path between any two registers
=> In the above example, the file will be empty since the graycounter
did not need any of the complex cells.
No implementations to report
No multiplexors to report
Below is the gate-level netlist output of the gray counter RTL code after synthesis.
  AO22X1 U2 ( .IN1(gcc_out[1]), .IN2(n1), .IN3(en_count), .IN4(N8), .Q(n4) );
  AO22X1 U3 ( .IN1(en_count), .IN2(n6), .IN3(N8), .IN4(n1), .Q(n5) );
  INVX0 U4 ( .IN(en_count), .QN(n1) );
  DFFARX1 \gcc_out_reg[0] ( .D(n5), .CLK(clk), .RSTB(reset_n), .Q(N8) );
  DFFARX1 \gcc_out_reg[1] ( .D(n4), .CLK(clk), .RSTB(reset_n), .Q(gcc_out[1]),
        .QN(n6) );
endmodule
## Give the path to the verilog files and define the WORK directory
## Create Constraints
create_clock clk -name ideal_clock1 -period 5
set_input_delay 2.0 [remove_from_collection [all_inputs] clk] -clock ideal_clock1
set_output_delay 2.0 [all_outputs] -clock ideal_clock1
set_max_area 0
## Compilation
## you can change medium to either low or high
compile -area_effort medium -map_effort medium
write_sdc const/gray_counter.sdc
exit
4. A.0 Introduction
A fully optimized design is one that has met the timing requirements and occupies the
smallest area. The optimization can be done in two stages: one at the code level, the other
during synthesis. The optimization at the code level involves modifications to RTL code
that has already been simulated and tested for its functionality. This level of modification
to the RTL code is generally avoided, as it sometimes leads to inconsistencies between
simulation results before and after the modifications. However, there are certain standard
model optimization techniques that might lead to a better synthesized design.
Model optimizations are important to a certain level, as the logic that is generated by the
synthesis tool is sensitive to the RTL code that is provided as input. Different RTL codes
generate different logic. Minor changes in the model might result in an increase or
decrease in the number of synthesized gates and also change its timing characteristics. A
logic optimizer reaches different endpoints for best area and best speed depending on the
starting point provided by a netlist synthesized from the RTL code. The different starting
points are obtained by rewriting the same HDL model using different constructs. Some of
the optimizations, which can be used to modify the model for obtaining a better quality
design, are listed below.
if A = '1' then
    E <= B + C;
else
    E <= B + D;
end if;
The same logic, rewritten so that a single adder is shared:
if A = '1' then
    temp := C;  -- A temporary variable introduced.
else
    temp := D;
end if;
E <= B + temp;
It is clear from the figure that one ALU has been removed, with one ALU being shared
for both addition operations. However, a multiplexer is introduced at the inputs of the
ALU, which contributes to the path delay. Earlier the timing path of the select signal went
through the multiplexer alone, but after resource sharing it goes through the multiplexer
B := R1 + R2;
…..
C <= R3 - (R1 + R2);
if test then
A <= B & (C + D);
else
J <= (C + D) | T;
end if;
In the above code the common factor C + D can be placed outside the if statement, which
might result in the tool generating only one adder instead of two as in the above case.
Such minor changes, if made by the designer, can cause the tool to synthesize better logic
and also enable it to concentrate on optimizing more critical areas.
Moving Code
In certain cases an expression might be placed within a for/while loop statement even
though its value does not change from one iteration of the loop to the next. Typically a
synthesis tool handles a for/while loop statement by unrolling it the specified number of
times. In such cases redundant code might be generated for that particular expression, causing
additional logic to be synthesized. This can be avoided if the expression is moved
outside the loop, thus optimizing the design. Such optimizations performed at a higher
C := A + B;
…………
for i in 0 to 5 loop
    ……………
    T := C - 6;
    -- Assumption: C is not assigned a new value within the loop, thus the above
    -- expression would remain constant on every iteration of the loop.
    ……………
end loop;
The above code would generate six subtracters for the expression when only one is
necessary. Thus by modifying the code as given below we could avoid the generation of
unnecessary logic.
C := A + B;
…………
temp := C - 6;  -- A temporary variable is introduced
for i in 0 to 5 loop
    ……………
    T := temp;
    -- Assumption: C is not assigned a new value within the loop, thus the above
    -- expression would remain constant on every iteration of the loop.
    ……………
end loop;
Ex:
C := 4;
….
Y = 2 * C;
Computing the value of Y as 8 and assigning it directly within your code avoids the
above unnecessary code. This method is called constant folding. The other optimization,
dead code elimination, removes those sections of code that are never executed.
Ex.
A := 2;
The above if statement would never be executed and thus should be eliminated from the
code. The logic optimizer performs these optimizations by itself, but if the designer
optimizes the code accordingly, the tool has less work to do, resulting in faster
tool run times.
The usage of parentheses is critical to the design as the correct usage might result in
better timing paths.
Ex.
Result <= R1 + R2 - P + M;
The hardware generated for the above code is as given below in Figure 4 (a).
If the expression has been written using parentheses as given below, the hardware
synthesized would be as given in Figure 4 (b).
Result <= (R1 + R2) - (P - M);
Keep related combinational logic in the same module
Partition for design reuse.
Separate modules according to their functionality.
Separate structural logic from random logic.
Limit blocks to a reasonable size (perhaps a maximum of 10K gates per block).
Partition the top level.
Do not add glue-logic at the top level.
Isolate state-machine from other logic.
Avoid multiple clocks within a block.
Isolate the block that is used for synchronizing the multiple clocks.
To optimize a design for minimum area and maximum speed, a lot of
experimentation and iterative synthesis is needed. The process of analyzing the design for
speed and area to achieve the fastest logic with minimum area is termed design space
exploration.
For the sake of optimization, changing of HDL code may impact other blocks in the
design or test benches. For this reason, changing the HDL code to help synthesis is less
desirable and generally is avoided. It is now the designer’s responsibility to minimize the
area and meet the timing requirements through synthesis and optimization. The later
The DC has three different compilation strategies. It is up to user discretion to choose the
most suitable compilation strategy for a design.
a) Top-down hierarchical compile method.
b) Time-budget compile method.
c) Compile-characterize-write-script-recompile (CCWSR) method.
Advantages
Only top level constraints are needed.
Better results due to optimization across entire design.
Disadvantages
Long compile time.
Incremental changes to the sub-blocks require complete re-synthesis.
Does not perform well, if design contains multiple clocks or generated clocks.
Time-budgeting compile.
This process is best for properly partitioned designs with timing specifications
defined for each sub-block. Because timing requirements are specified for each block,
multiple synthesis scripts for individual blocks are produced. The synthesis is usually
performed bottom-up, i.e., starting at the lowest level and going up to the topmost level.
This method is useful for medium to very large designs and does not require large
amounts of memory.
Advantages
Design easier to manage due to individual scripts.
Incremental changes to sub-blocks do not require complete re-synthesis.
Compile-Characterize-Write-Script-Recompile
This is an advanced synthesis approach, useful for medium to very large designs that do
not have good inter-block specifications defined. It requires constraints to be applied at
the top level of the design, with each sub-block compiled beforehand. The subblocks are
then characterized using the top-level constraints. This in effect propagates the required
timing information from the top-level to the sub-blocks. Performing a write_script on
the characterized sub-blocks generates the constraint file for each subblock.
The constraint files are then used to re-compile each block of the design.
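The flow described above can be sketched as the following dc_shell sequence. All names are hypothetical: a top-level design top, a sub-block design moduleA instantiated as U1, a constraint file top_constraints.tcl, and a scripts/ directory for the generated file.

```tcl
# Step 1: first-pass compile of the sub-block.
current_design moduleA
compile

# Step 2: apply the top-level constraints and characterize the instance,
# which propagates the top-level timing context down to the sub-block.
current_design top
source top_constraints.tcl
characterize U1

# Step 3: save the characterized constraints and re-compile the sub-block.
current_design moduleA
write_script > scripts/moduleA_char.tcl
compile
```

In a later session the saved file can be sourced before compiling moduleA, so the block can be re-synthesized without loading the whole design.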
Advantages
Less memory intensive.
Good quality of results because of optimization between sub-blocks of the design.
Produces individual scripts, which may be modified by the user.
Disadvantages
The generated scripts are not easily readable.
It is difficult to achieve convergence between blocks
Lower block changes might need complete re-synthesis of entire design.
Ex: Let's say moduleA has been synthesized. Now moduleB, which has two instantiations of
moduleA as U1 and U2, is being compiled. The compilation will be stopped with an error
message stating that moduleA is instantiated 2 times in moduleB. There are two methods
of resolving this problem.
You can set a dont_touch attribute on moduleA before synthesizing moduleB, or
uniquify moduleB. uniquify is a dc_shell command that creates unique definitions of
multiple instances. For the above case it generates moduleA_u1 and moduleA_u2 (in VHDL),
corresponding to instances U1 and U2 respectively.
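The two alternatives can be sketched as follows, using the moduleA and moduleB names from the example; only one of the two options would be used in a given script.

```tcl
# Option 1: keep the already-synthesized moduleA untouched.
set_dont_touch [get_designs moduleA]
current_design moduleB
compile

# Option 2: give each instance its own copy of moduleA, then compile.
current_design moduleB
uniquify
compile
```

With option 1 both instances share one netlist, while option 2 lets DC optimize each instance for its own surroundings.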
Flattening
Flattening reduces the design logic to a two-level, sum-of-products form, with few
logic levels between the input and output. This results in faster logic. It is recommended
for unstructured designs with random logic. The flattened design can then be structured
before final mapping optimization to reduce area. This is important, as flattening has a
significant impact on the area of the design. In general one should compile the design using
default settings (flatten and structure are set to false). If timing objectives are not met,
flattening and structuring should be employed. If the design is still failing its goals, then just
flatten the design without structuring it. The command for flattening is given below.
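A minimal sketch of the two attempts described above, using the set_flatten and set_structure commands with no further options (effort and minimization options are left at their defaults here):

```tcl
# Attempt 1: flatten and then structure, when the default compile misses timing.
set_flatten true
set_structure true
compile

# Attempt 2: if goals are still not met, flatten without structuring.
set_flatten true
set_structure false
compile
```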
If the design is not timing critical and you want to minimize for area only, then set the
area constraints (set_max_area 0) and perform Boolean optimization. For all other cases,
structure with respect to timing only.
Removing hierarchy
DC by default maintains the original hierarchy that is given in the RTL code. The
hierarchy is a logic boundary that prevents DC from optimizing across this boundary.
Unnecessary hierarchy leads to cumbersome designs and synthesis scripts and also limits
the DC optimization within that boundary, without optimizing across hierarchy. To allow
DC to optimize across hierarchy one can use the following commands.
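One common way to do this is the ungroup command, sketched below for a hypothetical top-level design named top:

```tcl
# Remove all levels of hierarchy below the current design, then compile.
current_design top      ;# hypothetical top-level design name
ungroup -all -flatten
compile
```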
This allows the DC to optimize the logic separated by boundaries as one logic resulting in
better timing and an optimal solution.
DC by default tries to optimize for timing. Designs that are not timing critical but area
intensive can be optimized for area. This can be done by initially compiling the design
with area requirements specified, but no timing constraints. In addition, by setting
the dont_use attribute on the larger, high-drive-strength gates that DC would otherwise
use to improve timing, one can eliminate them, thus reducing the area considerably.
Once the design is mapped to gates, the timing and area constraints should again be
specified (normal synthesis) and the design re-compiled incrementally. The incremental
compile ensures that DC maintains the previous structure and does not bloat the logic
unnecessarily. The following points can be kept in mind for further area optimization:
area.
At the top level avoid any kind of glue logic. It is better to incorporate glue logic
in one of the sub-components, thus letting the tool optimize the logic better.
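The area-first flow described in this section can be sketched as two compile passes. The clock values are borrowed from the gray counter script purely for illustration.

```tcl
# Pass 1: area requirement only, no timing constraints applied yet.
set_max_area 0
compile

# Pass 2: add the timing constraints, then re-compile incrementally so that
# the structure produced by pass 1 is preserved.
create_clock clk -name ideal_clock1 -period 5
compile -incremental_mapping
```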
There are two kinds of timing issues that are important in a design: setup and hold timing
violations.
Setup Time: It indicates the time before the clock edge during which the data should be
valid i.e. it should be stable during this period and should not change. Any change during
this period would trigger a setup timing violation. Figure 4A.b illustrates an example with
setup time equal to 2 ns. This means that signal DATA must be valid 2 ns before the
clock edge; i.e. it should not change during this 2ns period before the clock edge.
Hold Time: It indicates the time after the clock edge during which the data should be
held valid i.e. it should not change but remain stable. Any change during this period
would trigger a hold timing violation. Figure 4A.b illustrates an example with hold time
equal to 1 ns. This means that signal DATA must be held valid 1 ns after the clock edge;
i.e. it should not change during the 1 ns period after the clock edge.
The synthesis tool automatically runs its internal static timing analysis engine to check
for setup and hold time violations for the paths, that have timing constraints set on them.
It mostly uses the following two equations to check for the violations.
Here Tprop is the propagation delay from input clock to output of the device in question
(mostly a flip-flop); Tdelay is the propagation delay across the combinational logic
through which the input arrives; Tsetup is the setup time requirement of the device;
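In a commonly used form, and assuming $T_{clock}$ denotes the clock period and $T_{hold}$ the hold time requirement of the device (two symbols that are assumptions here, since they are not defined in the surrounding text), the setup and hold checks can be written as:

```latex
\begin{align}
T_{prop} + T_{delay} &< T_{clock} - T_{setup} \\
T_{prop} + T_{delay} &> T_{hold}
\end{align}
```

The first inequality is the setup check: data launched at one clock edge must settle at least $T_{setup}$ before the next edge. The second is the hold check: the new data must not arrive before the hold window of the current edge has passed. Clock skew is ignored in this sketch.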
When the synthesis tool reports timing violations the designer needs to fix them. There
are three options for the designer to fix these violations.
1) Optimization using the synthesis tool: this is the easiest of the options. A few of
the techniques have been discussed in the section Optimization Techniques above.
Register balancing
This command is particularly useful with designs that are pipelined. The command
reshuffles the logic from one pipeline stage to another. This allows extra logic to be
moved away from overly constrained pipeline stages to less constrained ones with
additional timing. The command is simply balance_registers.
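A minimal sketch of where the command fits in a script, assuming a pipelined design with the hypothetical name pipe_top that already has its clock constraints applied:

```tcl
current_design pipe_top   ;# hypothetical pipelined design
compile                   ;# map the design first
balance_registers         ;# redistribute logic between pipeline stages
report_timing             ;# check whether the critical stage improved
```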
The implementation type sim is only for simulation. Implementation types rpl, cla, and
clf are for synthesis; clf is the fastest implementation, followed by cla, with rpl being
the slowest. If the design is compiled with map_effort low, the designer can manually set the
implementation using the set_implementation command; otherwise the selection will not
change from the current choice. If map_effort is set to medium, Design Compiler
automatically chooses the appropriate implementation depending upon the
optimization algorithm. A map_effort of medium is suitable for better
optimization, or a manual setting can be used for better performance results.
Balancing heavy loading
Designs generally have certain nets with heavy fanout, which places a heavy load on the
driving point. A large load is difficult for a single driver to drive. This leads to
unnecessary delays and thus timing violations. The balance_buffer command comes in
handy to solve such problems. This command makes Design Compiler create buffer trees
to drive the large fanout and thus balance the heavy load.
Microarchitectural Tweaks
Consider Figure 4A.c. Assuming a critical path exists from A to Q2, logic optimization
on combinational logic X, Y, and Z would be difficult because X is shared with Y and Z.
We can duplicate the logic X as shown in Figure 4A.d. In this case Q1 and Q2 have
independent paths, and the path for Q2 can be optimized better by the tool to
ensure better performance.
Logic duplication can also be used in cases where a module has one signal arriving late
compared to other signals. The logic can be duplicated in front of the fast-arriving
signals such that the timing of all the signals is balanced. Figures 4A.e and 4A.f illustrate
this quite well. The signal Q might generate a setup violation as it might be delayed due
Figure 4A.f: Logic Duplication for balancing the timing between signals
When a designer knows for sure that a particular input signal is arriving late then priority
encoding would be a good bet. The signals arriving earlier could be given more priority
and thus can be encoded before the late arriving signals.
It can be designed using five AND gates, with A and B at the first gate. The output of the
first gate is ANDed with C, the output of the second gate with D, and so on. This ensures
proper performance if signal F is the latest to arrive and A is the earliest. If the
propagation delay of each AND gate is 1 ns, the output signal Q is
valid 5 ns after A is valid, but only 1 ns after signal F is valid. Multiplex decoding
is useful if all the input signals arrive at the same time. It ensures that the output
becomes valid sooner. Thus multiplex decoding is faster than priority decoding if
all input signals arrive at the same time. In this case, for the Boolean equation above, the
inputs are ANDed in pairs in parallel in the form of A.B, C.D and E.F, and
these outputs are then ANDed again to get the final output. This ensures that Q is
valid about 2 ns after A is valid.
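The comparison can be checked with a small plain-Tcl calculation. This is only a sketch: it assumes six inputs A to F, a 1 ns delay per gate, a chain of 2-input AND gates for priority decoding, and three 2-input AND gates feeding one 3-input AND gate for multiplex decoding. The arrival times are made-up values.

```tcl
# Priority (chain) structure: each input joins one gate later than the previous.
proc chain_arrival {arrivals gate_delay} {
    set t [lindex $arrivals 0]
    foreach a [lrange $arrivals 1 end] {
        set t [expr {max($t, $a) + $gate_delay}]
    }
    return $t
}

# Multiplex (tree) structure: inputs are paired, then one final gate
# combines all first-level outputs.
proc tree_arrival {arrivals gate_delay} {
    set level1 {}
    foreach {x y} $arrivals {
        lappend level1 [expr {max($x, $y) + $gate_delay}]
    }
    set latest [::tcl::mathfunc::max {*}$level1]
    return [expr {$latest + $gate_delay}]
}

# All inputs arrive at t = 0: the chain needs 5 ns, the tree 2 ns.
puts "same arrival: chain [chain_arrival {0 0 0 0 0 0} 1], tree [tree_arrival {0 0 0 0 0 0} 1]"
# F arrives at t = 4: the chain output is ready at 5 ns, the tree at 6 ns.
puts "late F:       chain [chain_arrival {0 0 0 0 0 4} 1], tree [tree_arrival {0 0 0 0 0 4} 1]"
```

The second case shows why the chain is preferred when one input is known to be late: the late signal passes through only one gate.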
<identifiers>
<continuous assignment>
>>,<<,?:,{}
assign (procedural and declarative), begin, end, case, casex, casez, endcase
default
disable
function, endfunction
if, else, else if
input, output, inout
wire, wand, wor, tri
integer, reg
macromodule, module
parameter
supply0, supply1
task, endtask
Construct     Constraints
*, /, %       when both operands are constants or the second operand is a power of 2
always        only edge-triggered events
for           bounded by static variables: only use + or - to index
Ignored Constructs
Unsupported constructs
<assignment with variable used as bit select on LHS of assignment>
<global variables>
===, !==
cmos,nmos,rcmos,rnmos,pmos,rpmos
deassign
defparam
event
force
fork,join
forever,while
initial
pullup,pulldown
release
repeat
rtran,tran,tranif0
tranif1
Synopsys provides a GUI front-end to Design Compiler called Design Vision which we
will use to analyze the synthesis results. You should avoid using the GUI to actually
perform synthesis since we want to use scripts for this. To launch Design Vision and read
in the synthesized design, move into the /project_dc/synth/ working directory and use the
following commands. The command “design_vision-xg” will open up a GUI.
% design_vision-xg
design_vision-xg> read_file -format ddc output/gray_counter.ddc
You can browse your design with the hierarchical view. Right click on the gray_counter
module and choose the Schematic View option [Figure 8.a], the tool will display a
schematic of the synthesized logic corresponding to that module. Figure 8.b shows the
schematic view for the gray counter module. You can see synthesized flip-flops in the
schematic view.
In the current gray_count design, there are no submodules. If there are submodules in the
design, it is sometimes useful to examine the critical path through a single submodule. To
do this, right click on the module in the hierarchy view and use the Characterize option.
Check the timing, constraints, and connections boxes and click OK. Now choose the
module from the drop down list box on the toolbar (called the Design List). Choosing
Timing -> Report Timing will provide information on the critical path through that
submodule given the constraints of the submodule within the overall design’s context.
For more information on Design Vision consult the Design Vision User Guide
6.0 Introduction
Why do we normally do Static Timing Analysis and not Dynamic Timing Analysis?
What is the difference between them?
Timing Analysis can be done in both ways; static as well as dynamic. Dynamic
Timing analysis requires a comprehensive set of input vectors to check the timing
characteristics of the paths in the design. Basically it determines the full behavior of the
circuit for a given set of input vectors. Dynamic simulation can verify the functionality of
the design as well as timing requirements. For example if we have 100 inputs then we
need to do 2 to the power of 100 simulations to complete the analysis. The amount of
analysis is astronomical compared to static analysis.
Static Timing analysis checks every path in the design for timing violations without
checking the functionality of the design. This way, one can do timing and functional
analysis at the same time but separately. This is faster than dynamic timing simulation
because there is no need to generate any kind of test vectors. That's why STA is the most
popular way of doing timing analysis.
The static timing analysis tool performs the timing analysis in the following way:
1. STA Tool breaks the design down into a set of timing paths.
2. Calculates the propagation delay along each path.
3. Checks for timing violations (depending on the constraints e.g. clock) on the
different paths and also at the input/output interface.
Net Delay: Amount of delay from the output of a gate to the input of the next gate in a
timing path. It depends on the following parameters
a. Parasitic Capacitance
b. Resistance of net
2. Output Transition Time (which in turn depends on Input Transition Time and
1. Delay from input to output of the gate (Gate Delay).
4. Disabled Timing Arcs: The input-to-output arc in a gate is disabled. For example, consider
a 3-input AND gate (a, b, c) with output (out). If you want, you can disable the
path from input 'a' to output 'out' using the disable timing arc constraint.
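In the Synopsys tools this constraint is applied with set_disable_timing. A minimal sketch, assuming the gate is instantiated as U1 (a hypothetical instance name) and its pins are named a and out as in the example above:

```tcl
# Disable only the a -> out arc of instance U1; the b and c arcs stay active.
set_disable_timing [get_cells U1] -from a -to out
```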
Rise time: It is defined as the time it takes for a waveform to rise from 10% to
90% of its steady state value.
Fall time: It is defined as the time it takes for a waveform to fall from 90% to
10% of its steady state value.
Clock-Q Delay: It is the delay from rising edge of the clock to when Q (output)
becomes available. It depends on
o Input Clock transition
o Output Load Capacitance
Clock Skew: It is defined as the time difference between the clock path reference
and the data path reference. The clock path reference is the delay from the main
clock to the clock pin and data path reference is the delay from the main clock to
the data pin of the same block. (Another way of putting it is the delay between the
longest insertion delay and the smallest insertion delay.)
Metastability: It is a condition caused when the logic level of a signal is in an
indeterminate state.
Critical Path: The clock speed is normally determined by the slowest path in the
design. This is often called the 'Critical Path'.
Clock jitter: It is the variation in clock edge timing between clock cycles. It is
usually caused by noise.
Set-up Time: It is defined as the time the data/signal has to be stable before the
clock edge.
Hold Time: It is defined as the time the data/signal has to be stable after the clock
edge.
Interconnect Delay: This is delay caused by wires. Interconnect introduces three
types of parasitic effects – capacitive, resistive, and inductive – all of which
influence signal integrity and degrade the performance of the circuit.
Negative Setup time: In certain cases, due to the excessive delay (example:
caused by lot of inverters in the clock path) on the clock signal, the clock signal
actually arrives later than the data signal. The actual clock edge you want your
data to latch arrives later than the data signal. This is called negative set up time.
Negative Hold time: It basically allows the data that was supposed to change in
the next cycle, change before the present clock edge.
Negative Delay: It is defined as the time taken by the 50% of output crossing to
50% of the input crossing.
Transition Time: It is the time taken for the signal to go from one logic level to
another logic level.
Insertion Delay: Delay from the clock source to that of the sequential pin.
You can learn about CTS in more detail in the Physical Design part of this tutorial.
Clock Network Delay: A set of buffers are added in between the source of the clock to
the actual clock pin of the sequential element. This delay due to the addition of all these
buffers is defined as the Clock Network Delay. [Clock Network Delay is added to clock
period in Primetime]
Path Delay: When calculating path delay, the following has to be considered:
Clock Network Delay+ Clock-Q + (Sum of all the Gate delays and Net delays)
Global Clock Skew: It is defined as the difference between the smallest and the
longest Clock Network Delay.
Zero Skew: When the clock tree is designed such that the skew is zero, it is defined as
zero skew.
Local Skew: It is defined as the skew between the launch and Capture flop. The worst
skew is taken as Local Skew.
Useful Skew: When delays are added only to specific clock paths such that setup time
or hold time is improved, it is called useful skew.
What kind of model does the tool use to calculate the delay?
The tool uses a wire load model, which is a statistical model. It consists of a table
that gives the capacitance and resistance of a net as a function of its fan-out.
For more information please refer to the Primetime User Manual in the
packages/synopsys/ directory.
PrimeTime (PT) is a sign-off quality static timing analysis tool from Synopsys. Static
timing analysis or STA is without a doubt the most important step in the design flow. It
determines whether the design works at the required speed. PT analyzes the timing delays
in the design and flags violations that must be corrected.
PT, similar to DC, provides a GUI interface along with the command-line interface. The
GUI interface contains various windows that help analyze the design graphically.
Although the GUI interface is a good starting point, most users quickly migrate to using
the command-line interface. Therefore, I will focus solely on the command-line interface
of PT.
PT is a stand-alone tool that is not integrated under the DC suite of tools. It is a separate
tool, which works alongside DC. Both PT and DC have consistent commands, generate
similar reports, and support common file formats. In addition PT can also generate timing
assertions that DC can use for synthesis and optimization. PT’s command-line interface is
based on the industry standard language called Tcl. In contrast to DC’s internal STA
engine, PT is faster, takes up less memory, and has additional features.
6.6.2 Pre-Layout
After successful synthesis, the netlist obtained must be statically analyzed to check for
timing violations. The timing violations may consist of either setup and/or hold-time
violations. The design was synthesized with emphasis on maximizing the setup-time,
therefore you may encounter very few setup-time violations, if any. However, the hold-
time violations will generally occur at this stage. This is due to the data arriving too fast
at the input of sequential cells with respect to the clock.
If the design is failing setup-time requirements, then you have no other option but to re-
synthesize the design, targeting the violating path for further optimization. This may
involve grouping the violating paths or over constraining the entire sub-block, which had
violations. However, if the design is failing hold-time requirements, you may either fix
these violations at the pre-layout level, or may postpone this step until after layout. Many
designers prefer the latter approach for minor hold-time violations (also used here), since
the pre-layout synthesis and timing analysis uses the statistical wire-load models and
fixing the hold-time violations at the pre-layout level may result in setup-time violations
for the same path, after layout. However, if the wire-load models truly reflect the post-
routed delays, then it is prudent to fix the hold-time violations at this stage. In any case, it
must be noted that gross hold-time violations should be fixed at the pre-layout level, in
order to minimize the number of hold-time fixes, which may result after the layout.
In the pre-layout phase, the clock tree information is absent from the netlist. Therefore, it
is necessary to estimate the post-route clock-tree delays upfront, during the pre-layout
phase in order to perform adequate STA. In addition, the estimated clock transition
should also be defined in order to prevent PT from calculating false delays (usually large)
for the driven gates. The cause of large delays is usually attributed to the high fanout
normally associated with the clock networks. The large fanout leads to slow input
transition times computed for the clock driving the endpoint gates, which in turn results
in PT computing unusually large delay values for the endpoint gates. To prevent this
situation, it is recommended that a fixed clock transition value be specified at the source.
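The following commands may be used to define the clock during the pre-layout phase of
the design:
pt_shell> create_clock -period 20 [get_ports CLK]
pt_shell> set_clock_latency 2.5 [get_clocks CLK]
pt_shell> set_clock_transition 0.2 [get_clocks CLK]
pt_shell> set_clock_uncertainty 1.2 -setup [get_clocks CLK]
pt_shell> set_clock_uncertainty 0.5 -hold [get_clocks CLK]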
The above commands specify the port CLK as type clock having a period of 20ns, the
clock latency as 2.5ns, and a fixed clock transition value of 0.2ns. The clock latency
value of 2.5ns signifies that the clock delay from the input port CLK to all the endpoints
is fixed at 2.5ns. In addition, the 0.2ns value of the clock transition forces PT to use the
0.2ns value, instead of calculating its own. The clock skew is approximated with 1.2ns
specified for the setup-time, and 0.5ns for the hold-time. Using this approach during pre-
layout yields a realistic approximation to the post-layout clock network results.
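The effect of these clock settings on a single register-to-register path can be sketched with a little arithmetic. The following Python snippet is only an illustration, not PT's actual algorithm: the period, latency, and uncertainty values come from the text above, while the clock-to-Q delay, data-path delays, and setup/hold times are hypothetical numbers chosen for the example.

```python
# Pre-layout clock assumptions taken from the text (all times in ns).
PERIOD = 20.0
LATENCY = 2.5            # ideal clock latency applied to every endpoint
SETUP_UNCERTAINTY = 1.2
HOLD_UNCERTAINTY = 0.5


def setup_slack(data_path_max, clk_to_q, t_setup):
    """Setup slack for a register-to-register path on a single clock."""
    arrival = LATENCY + clk_to_q + data_path_max
    required = PERIOD + LATENCY - SETUP_UNCERTAINTY - t_setup
    return required - arrival


def hold_slack(data_path_min, clk_to_q, t_hold):
    """Hold slack for the same kind of path, checked at the launching edge."""
    arrival = LATENCY + clk_to_q + data_path_min
    required = LATENCY + HOLD_UNCERTAINTY + t_hold
    return arrival - required


# Hypothetical cell and path delays, chosen only for illustration.
slow_path = setup_slack(data_path_max=17.0, clk_to_q=0.4, t_setup=0.3)
fast_path = hold_slack(data_path_min=0.1, clk_to_q=0.2, t_hold=0.15)

print(f"setup slack: {slow_path:+.2f} ns")
print(f"hold slack:  {fast_path:+.2f} ns")
```

Because the same fixed latency is applied to the launching and capturing flops, it cancels out of both checks; only the uncertainty values tighten the margins. A negative hold slack on a very short path is the kind of pre-layout hold violation discussed earlier.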
0. The design example for the rest of this tutorial is a FIFO whose Verilog code is
available in asic_flow_setup/src/fifo/fifo.v. Please first run the DC synthesis on this file:
[hkommuru@hafez]$ csh
[hkommuru@hafez]$ cd
[hkommuru@hafez]$ cd /asic_flow_setup/pt
[hkommuru@hafez]$ source /packages/synopsys/setup/synopsys_setup.tcl
2. PT may be invoked in the command-line mode using the command pt_shell or in the
GUI mode through the command primetime as shown below.
Command-line mode:
> pt_shell
GUI-mode:
> primetime
Before doing the next step, open the pre_layout_pt.tcl script and keep it ready; it is at
location /
[hkommuru@hafez]$ vi scripts/pre_layout_pt.tcl
3. Just like the DC setup, you need to set the path to link_library and search_path.
7. Now we can do the analysis of the design, as discussed at the beginning of this chapter.
All four types of analysis can be accomplished by using the following commands:
8. Reporting setup time and hold time. PrimeTime by default reports the setup time. You
can report the setup or hold time by specifying the -delay_type option, as shown in the
figure below.
If you open and read the timing.rpt file, you will notice that the design meets the setup
time.
10. Reporting timing with capacitance and transition time at each level in the path
11. You can save your session and come back later if you choose to.
Note: If the timing is not met, you need to go back to synthesis and redo it to make sure the
timing is clean before you proceed to the next step of the flow, which is Physical
Implementation.
8.1 Introduction
As you have seen in the beginning of the ASIC tutorial, after getting an optimized gate-
level netlist, the next step is Physical Implementation. Before we go into the details
of IC Compiler, which is the physical implementation tool from Synopsys, this chapter
covers the basic concepts needed to do physical implementation. Also, below
you can see a more detailed flowchart of the ASIC flow.
8.2 Floorplanning
At the floorplanning stage, we have a netlist which describes the design and the various
blocks of the design and the interconnection between the different blocks. The netlist is
the logical description of the ASIC and the floorplan is the physical description of the
ASIC. Therefore, by doing floorplanning, we are mapping the logical description of the
design to the physical description. The main objectives of floorplanning are to minimize
a. Area
Figure: Basic floorplan (labels: row utilization, row spacing, height y)
Floorplanning is a major step in the Physical Implementation process. The final timing
and quality of the chip depend on the floorplan design. The three basic elements of a chip are:
1. Standard Cells: The design is made up of standard cells.
2. I/O cells: These cells carry signals to and from the chip.
3. Macros (Memories): Storing information using sequential elements takes up a lot
of area. A single flip-flop could take up 15 to 20 transistors to store one bit.
Therefore special memory elements are used, which store the data efficiently and
occupy comparatively little space on the chip. These memory cells
are called macros. Examples of memory cells include 6T SRAM (Static Random
Access Memory), DRAM (Dynamic Random Access Memory), etc.
The above figure shows a basic floorplan. The following are the basic floorplanning steps
(and terminology):
1. Aspect ratio (AR): It is defined as the ratio of the width to the length of the chip.
From the figure, we can say that the aspect ratio is x/y. In essence, it is the shape of
the rectangle used for placement of the cells and blocks. The aspect ratio should take
into account the number of routing resources available. If there are more
Area (chip)
blockages.
Area (Standard Cells) / [Area (Chip) - Area (Macro) - Area (Region Blockages)]
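As a rough numerical illustration of the aspect ratio and of the area ratio above, the Python sketch below evaluates both for a made-up floorplan. The function names and all dimensions are hypothetical, and the name "standard-cell utilization" is an assumption about what the ratio is called.

```python
def aspect_ratio(width, height):
    """Aspect ratio as defined in the text: width divided by height (x / y)."""
    return width / height


def std_cell_utilization(std_cell_area, chip_area, macro_area, blockage_area):
    """Standard-cell area over the area left after macros and region blockages."""
    available = chip_area - macro_area - blockage_area
    if available <= 0:
        raise ValueError("no area left for standard cells")
    return std_cell_area / available


# Hypothetical floorplan, dimensions in micrometres.
width, height = 400.0, 500.0
chip_area = width * height
util = std_cell_utilization(
    std_cell_area=105000.0,
    chip_area=chip_area,
    macro_area=40000.0,
    blockage_area=10000.0,
)
print(f"aspect ratio = {aspect_ratio(width, height):.2f}")
print(f"utilization  = {util:.0%}")
```

A ratio well below 1 leaves room for routing and later buffer insertion, while a ratio close to 1 usually means the placer has little freedom.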
After Floorplanning is complete, check for DRC (Design Rule check) violations. Most
of the pre-route violations are not removed by the tool. They have to be fixed manually.
I/O Cells in the Floorplan: The I/O cells are the cells that interface
between the blocks outside the chip and the internal blocks of the chip. In a
floorplan these I/O cells are placed between the inner ring (core) and the outer ring
(chip boundary). These I/O cells are responsible for providing voltage to the cells in the
core. For example, the voltage inside the chip for 90nm technology is about 1.2 Volts.
The regulator supplies the voltage to the chip (normally around 5.5V, 3.3V, etc.).
The next question that comes to mind is: why is this voltage higher than the voltage
inside the chip?
The regulator is placed on the board and supplies voltage to the other chips
on the board. There are a lot of resistances and capacitances present on the board. Due to this,
the voltage needs to be higher. If the voltage outside were exactly what the chip needs inside,
then the standard cells inside the chip would get less voltage than they actually need and the
chip might not run at all.
So the next question is: how can chips communicate across different voltages?
The answer lies in the I/O cells. These I/O cells are Level Shifters, which
convert the voltage from one level to another. The input
I/O cells reduce the voltage coming from the outside to the voltage needed inside
the chip.
Most of the time, the Verilog netlist is in hierarchical form. By hierarchical I mean
that the design is modeled on the basis of hierarchy. The design is broken down into different
sub modules. The sub modules could be divided further. This makes it easier for the logic
designer to design the system. It is good to have a hierarchical netlist only until Physical
Implementation. During placement and routing, it is better to have a flattened netlist.
Flattening of the netlist implies that the various sub blocks of the model have basically
opened up and there are no more sub blocks. There is just one top block. After you flatten
the netlist you cannot differentiate between the various sub blocks, but the logical
hierarchy of the whole design is maintained. The reason to do this is:
In a flat design flow, the placement and routing resources are always visible and
available.
Physical Design Engineers can perform routing optimization and can avoid congestion
wire n1;
endmodule
endmodule
In Verilog, the instance name of each module is unique. In the flattened netlist, the
instance name would be the top level instance name/lower level instance name, etc.
The input and output ports of the sub modules also get lost. In the above example, the
input and output ports a, out1, b and outb get lost.
The above hierarchical model, when converted to the flattened netlist, will look like this:
input in1;
output out1;
wire topn1;
endmodule
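The naming rule for flattened instances can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a synthesis or place-and-route tool stores a netlist: the `DESIGN` dictionary, the module names, and the cell types are all hypothetical.

```python
# Hypothetical two-level design: module name -> list of (instance name, type).
# A type that is not itself a key of DESIGN is treated as a leaf library cell.
DESIGN = {
    "top": [("u_a", "sub_a"), ("u_b", "sub_b")],
    "sub_a": [("g1", "AND2"), ("g2", "INV")],
    "sub_b": [("g1", "OR2")],
}


def flatten(module, prefix=""):
    """Return (flat instance name, cell type) pairs for every leaf cell."""
    flat = []
    for inst, kind in DESIGN[module]:
        path = f"{prefix}/{inst}" if prefix else inst
        if kind in DESIGN:
            flat.extend(flatten(kind, path))   # descend into the sub-module
        else:
            flat.append((path, kind))          # leaf cell keeps its full path
    return flat


flat_cells = flatten("top")
for name, kind in flat_cells:
    print(f"{kind:5s} {name}")
```

Both sub-modules contain an instance called `g1`, yet the flattened names stay unique because each one carries the path of its parent instance.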
Flowchart: Is the floorplan OK? If no, revisit the floorplan; if yes, proceed to placement.
8.4 Placement
Placement is the step in the Physical Implementation process where each standard cell
is assigned to a particular position in a row. Space is set aside for the interconnect to
each logic/standard cell. After placement, we can obtain accurate estimates of the
capacitive load that each standard cell must drive. The tool places the cells based on the
algorithms it uses internally, with the aim of placing the design cells in the
floorplan in the most optimal way.
What does the Placement Algorithm want to optimize?
The main goals of the placement algorithm are:
1. Making the chip as dense as possible (area constraint)
2. Minimizing the total wire length (reducing the length of critical nets)
Min-Cut Algorithm
This is one of the most popular algorithms for placement. This method partitions the
block successively. It does the following steps:
1. Cuts the placement area into two pieces. Each piece is called a bin. It counts the
number of nets crossing the cut line and optimizes the cost function. The cost function
here would be the number of net crossings. The lower the cost function, the more
optimal the solution.
2. Swaps the logic cells between these bins to minimize the cost function.
3. Repeats the process from step 1, cutting smaller pieces until all the logic cells are
placed and it finds the best placement option.
The cost function depends not only on the number of crossings but also on
various other factors such as the distance of each net, congestion issues, signal integrity
issues, etc. The size of the bin can vary from a bin size equal to the base cell to a bin size
that would hold several logic cells. We can start with a large bin size to get a rough
placement, and then reduce the bin size to get a final placement.
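One level of the cut-and-swap procedure can be sketched as follows. This is a simplified greedy illustration, not the algorithm used by any particular placement tool: the netlist is hypothetical, the cost function counts only net crossings, and a real placer would recurse into each bin and use a stronger heuristic.

```python
from itertools import product

# Hypothetical netlist: each net is the set of cells it connects.
NETS = [
    {"a", "b"}, {"a", "c"}, {"b", "c"},
    {"d", "e"}, {"d", "f"}, {"e", "f"},
    {"c", "d"},
]


def cut_cost(left, nets):
    """Number of nets with at least one cell on each side of the cut."""
    return sum(1 for net in nets if net & left and net - left)


def min_cut_by_swapping(left, right, nets):
    """Greedy pairwise swapping: accept any swap that lowers the cut cost."""
    left, right = set(left), set(right)
    improved = True
    while improved:
        improved = False
        for x, y in product(sorted(left), sorted(right)):
            trial = (left - {x}) | {y}
            if cut_cost(trial, nets) < cut_cost(left, nets):
                left, right = trial, (right - {y}) | {x}
                improved = True
                break              # restart the scan with the new bins
    return left, right


start_left, start_right = {"a", "d", "e"}, {"b", "c", "f"}
left, right = min_cut_by_swapping(start_left, start_right, NETS)
print("initial cost:", cut_cost(start_left, NETS))
print("final bins:  ", sorted(left), sorted(right))
print("final cost:  ", cut_cost(left, NETS))
```

Swapping keeps the two bins the same size, which corresponds to step 2 in the list above. Repeating the same procedure inside each bin would correspond to step 3.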
8.5 Routing
After the floorplanning and placement steps in the design, routing needs to be done.
Routing connects the various blocks in the chip with one another. Until
now, the blocks were only placed on the chip. Routing is split into two steps:
1. Global routing: It plans the overall connections between all the blocks
and the nets. Its main aim is to minimize the total interconnect length and minimize
the critical path delay. It determines the routing regions, not the exact tracks, for
each interconnect. The chip is divided into small blocks. These small blocks are called
routing bins. The size of the routing bin depends on the algorithm the tool uses. Each
routing bin is also called a gcell. The size of this gcell depends on the tool.
San Francisco State University Nano-Electronics & Computing Research Lab 100
Each gcell has a finite number of horizontal and vertical tracks. Global routing
assigns nets to specific gcells but it does not define the specific tracks for each
of them. The global router connects two different gcells from the centre point
of each gcell. The number of interconnections going in each direction is the routing
demand. The number of routing layers that are available depends on the design;
also, the larger the die size, the greater the number of routing tracks. Each routing
layer has minimum width and spacing rules, and its own routing capacity.
For example: for a 5 metal layer design, if Metal 1, 4, 5 are partially used up for
inter-cell connections, pin, VDD, VSS connections, the only layers which are
100% routable are Metal2 and Metal3. If the routing demand goes over the
routing supply, it causes congestion. Congestion leads to DRC errors and
slow runtime.
2. Detailed Routing: In this step, the actual connection between all the nets takes
place. It creates the actual via and metal connections. The main objective of
detailed routing is to minimize the total area, wire length, and delay in the critical
paths. It specifies the specific tracks for the interconnection; each layer has its
own routing grid and rules. During the final routing, the width, layer, and exact
location of the interconnection are decided.
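The idea of routing demand exceeding routing supply in a gcell can be sketched with a simple comparison. The grid, the demand numbers, and the single per-gcell capacity below are all hypothetical; a real global router tracks capacity per layer and per direction.

```python
def congested_gcells(demand, capacity):
    """Return {gcell: overflow} for gcells whose demand exceeds capacity."""
    return {
        cell: used - capacity
        for cell, used in demand.items()
        if used > capacity
    }


# Hypothetical 2x2 grid of gcells; values are nets routed through each one.
demand = {(0, 0): 7, (0, 1): 12, (1, 0): 10, (1, 1): 15}
tracks_per_gcell = 10     # assumed routing supply, identical for every gcell

overflow = congested_gcells(demand, tracks_per_gcell)
for cell, extra in sorted(overflow.items()):
    print(f"gcell {cell}: over capacity by {extra} track(s)")
```

A gcell whose demand exactly equals its capacity is not reported here, since congestion in the sense of the text only arises when demand goes over the supply.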
After detailed routing is complete, the exact length and the position of each interconnect
for every net in the design is known. The parasitic capacitance and resistance can now be
extracted to determine the actual delays in the design. The parasitic extraction is done by
extraction tools. This information is back-annotated, and the timing of the design is now
calculated using the actual delays by the Static Timing Analysis tool.
After timing is met and all other verification, such as LVS, is performed, the design is
sent to the foundry to manufacture the chip.
8.6 Packaging
Depending on the type of packaging of the chip, the I/O cells and pad cells are designed
differently during the Physical Implementation. There are two types of packaging style:
a. Wire-bond: The connections in this technique are real wires. The underside of the
die is first fixed in the package cavity. A mixture of epoxy and a metal (aluminum,
silver or gold) is used to ensure a low electrical and thermal resistance between the
die and the package. The wires are then bonded one at a time to the die and the
package. Below is an illustration of wire-bond packaging.
b. Flip-Chip: Flip Chip describes the method of electrically connecting the die to the
package carrier. This is a direct chip-attach technology, which accommodates dies that
have several bond pads placed anywhere on the surfaces at the top. Solder balls are
deposited on the die bond pads usually when they are still on the wafer, and at
corresponding locations on the board substrate. The upside-down die (Flip-chip) is then
aligned to the substrate. The advantage of this type of packaging is very short
connections (low inductance) and high package density. The picture below is an
illustration of flip-chip type of packaging.
Figure 8.6.b : Flip Chip Example
8.7.1 Introduction
The physical design stage of the ASIC design flow is also known as the “place and route”
stage. This is based upon the idea of physically placing the circuits, which form logic
gates and represent a particular design, in such a way that the circuits can be fabricated.
This is a generic, high level description of the physical design (place/route) stage. Within
the physical design stage, a complete flow is implemented as well. This flow will be
described more specifically, and as stated before, several EDA companies provide
software or CAD tools for this flow. Synopsys® software for the physical design process
is called IC Compiler. The overall goal of this tool/software is to combine the inputs of a
gate-level netlist and standard cell library, along with timing constraints, to create a placed
and routed layout. This layout can then be fabricated, tested, and implemented into the
overall system that the chip was designed for.
This layout view or depiction of the logical function contains the drawn mask layers
required to fabricate the design properly. However, the place and route tool does not
require such level of detail during physical design. Only key information such as the
location of metal and input/output pins for a particular logic function is needed. This
representation used by ICC is considered to be the abstract version of the layout. Every
desired logic function in the standard cell library will have both a layout and abstract
view. Most standard cell libraries will also contain timing information about the function
such as cell delay and input pin capacitance, which is used to calculate output loads. This
timing information comes from detailed parasitic analysis of the physical layout of each
function at different process, voltage, and temperature points (PVT). This data is
contained within the standard cell library and is in a format that is usable by ICC. This
allows ICC to be able to perform static timing analysis during portions of the physical
design process. It should be noted that the physical design engineer may or may not be
involved in creating the standard cell library, including the layout, abstract, and
timing information. However, the physical design engineer is required to understand what
common information is contained within the libraries and how that information is used
during physical design. Other common information about standard cell libraries is the
fact that the height of each cell is constant among the different functions. This common
height will aid in the placement process since they can now be linked together in rows
across the design. This concept will be explained in detail during the placement stage of
physical design.
3. The third of the main inputs into ICC are the design constraints. These constraints are
identical to those which were used during the front-end logic synthesis stage prior to
physical design. These constraints are derived from the system specifications and
implementation of the design being created. Common constraints among most designs
include clock speeds for each clock in the design as well as any input or output delays
associated with the input/output signals of the chip. The same constraints used during
logic synthesis are used by ICC so that timing will be considered during each stage of
place and route. The constraints are specific for the given system specification of the
design being implemented.
In the IC Compiler tutorial example below, we will place and route the synthesized fifo
design.
STEPS
1. As soon as you log into your engr account, at the command prompt, please type “csh”
as shown below. This changes the type of shell from bash to c-shell. All the commands
work ONLY in c-shell.
[hkommuru@hafez]$ csh
This creates the directory structure shown below. It will create a directory called
“asic_flow_setup ”, under which it creates the following directories namely
asic_flow_setup
src/ : for verilog code/source code
vcs/ : for vcs simulation for counter example
synth_graycounter/ : for synthesis of graycounter example
synth_fifo/ : for fifo synthesis
pnr_fifo/ : for Physical design of fifo design example
extraction/: for extraction
pt/: for primetime
verification/: final signoff check
The “asic_flow_setup” directory will contain all generated content including, VCS
simulation, synthesized gate-level Verilog, and final layout. In this course we will always
try to keep generated content from the tools separate from our source RTL. This keeps
our project directories well organized, and helps prevent us from unintentionally
modifying the source RTL. There are subdirectories in the project directory for each
major step in the ASIC Flow tutorial. These subdirectories contain scripts and
configuration files for running the tools required for that step in the tool flow. For this
tutorial we will work exclusively in the vcs directory.
3. Please source “synopsys_setup.tcl”, which sets all the environment variables necessary
to run the VCS tool.
Please source them at the unix prompt as shown below.
Please Note: You have to do steps 1 and 3 above every time you log in.
At the unix prompt, type “icc_shell”; it will open up the icc window.
[hkommuru@hafez]$ icc_shell
Next, to open the GUI, type “gui_start”; it opens up the GUI window as shown on the next page.
Before a design can be placed and routed within ICC, the environment for the design
needs to be created. The goal of the design setup stage in the physical design flow is to
prepare the design for floorplanning. The first step is to create a design library. Without a
design library, the physical design process using ICC will not work. This library contains all
of the logical and physical data that ICC will need. Therefore the design library is also
referred to as the design container during physical design. One of the inputs to the design
library, which makes the library technology specific, is the technology file.
4.a Setting up the logical libraries. The below commands will set the logical libraries and
define VDD and VSS
icc_shell> set target_library "saed90nm_max.db"
Notice the space between saed90nm_fr/ and FIFO_design.mw, which is the design
name. You can choose your own design name.
4.d Read in the gate-level synthesized Verilog netlist. It opens up a layout window, which
contains the layout information as shown below. You can see all the cells in the design at
the bottom, since we have not initialized the floorplan yet or done any placement.
4.e Uniquify the design by using the uniquify_fp_mw_cel command. The Milkyway
format does not support multiply instantiated designs. Before saving the design in
Milkyway format, you must uniquify the design to remove multiple instances.
icc_shell> uniquify_fp_mw_cel
4.f Link the design by using the link command (or by choosing File > Import > Link
Design in the GUI).
icc_shell> link
4.g Read the timing constraints for the design by using the read_sdc command (or by
choosing File > Import > Read SDC in the GUI).
4.h Save the design.
FLOORPLANNING
Open file /scripts/floorplan_icc.tcl
You can see the floorplan size and shape in the layout window. Since we are still in the
floorplan stage, all the cells in the design are outside of the floorplan. You can change the
above options to play around with the floorplan size.
6. Connect the Power and Ground pins with the command below.
7. ICC automatically places the pins evenly around the boundary of the floorplan if there
are no pin constraints given. You can constrain the pins around the boundary (the blue
pins in the above figure) using a TDF file. You can look at the file /const/fifo.tdf
8. Power Planning: First we need to create a rectangular power ring around the floorplan.
##Create VSS ring
10. Save the design.
PLACEMENT
open /scripts/place_icc.tcl
During the optimization step, the place_opt command introduces buffers and inverters
to fix timing and DRC violations. However, this buffering strategy is local to some critical
paths. The buffers and inverters that are inserted become excess later because critical
paths change during the course of optimization. You can reduce the excess buffer and
inverter counts after place_opt by using the set_buffer_opt_strategy command, as shown
This buffering strategy will not degrade the quality of results (QoR).
You can control the medium- and high-fanout thresholds by using the -mf_thresh and
-hf_thresh options, respectively. You can control the effort used to remove existing buffer
and inverter trees by using the -remove_effort option.
11. Go to the Layout Window and choose Placement > Core Placement and Optimization. A new
window opens up as shown below. There are various options; you can select whichever
options you want and click OK. The tool will do the placement. Alternatively, you can
run the command at icc_shell. Below is an example with the congestion option.
# When the design has congestion issues, you have following choices :
# place_opt -congestion -area_recovery -effort low # for medium effort congestion removal
# place_opt -effort high -congestion -area_recovery # for high eff cong removal
## What commands do you need when you want to reduce leakage power ?
# set_power_options -leakage true
# place_opt -effort low -area_recovery -power
## What commands do you need when you want to reduce dynamic power ?
# set_power_options -dynamic true -low_power_placement true
# read_saif -input < saif file >
# place_opt -effort low -area_recovery -power
# Note : option -low_power_placement enables the register clumping algorithm in
# place_opt, whereas the option -dynamic enables the
# Gate Level Power Optimization (GLPO)
## When you want to do scan opto, leakage opto, dynamic opto, and you have congestion issues,
## use all options together :
# read_def < scan def file >
# set_power_options -leakage true -dynamic true -low_power_placement true
# place_opt -effort low -congestion -area_recovery -optimize_dft -power -num_cpus
12. After the placement is done, all the cells will be placed in the design, and the layout
will look like the window below.
13. You can report the following information after the placement stage.
### Reports
After placement, if you look at the fifo_cts.setup.rpt and fifo_cts.hold.rpt, in the reports
directory, they meet timing.
open scripts/cts_icc.tcl
Before doing the actual CTS, you can set various optimization steps. In the Layout window,
click on “Clock”; you will see various options, and you can set any of the options to run CTS.
If you click on Clock > Core CTS and Optimization .
## clock_opt -only_psyn
## clock_opt -sizing
## hold_time fix
## clock_opt -only_hold_time
ROUTING
open scripts/route_icc.tcl
14. In the layout window, click on Route > Core Routing and Optimization; a new
window will open up as shown below.
You can select various options: run all the routing steps in one go, or do global
routing first, then detail routing, and then the optimization steps. It is up to you.
icc_shell> route_opt
The above command does not have optimizations. You can, however, do an incremental
optimization by clicking on the incremental mode in the above window after route_opt is
completed. View the shell window after routing is complete. You can see that there are
no DRC violations reported, indicating that the routing is clean:
15. Save the cel and report timing.
16. Go to the Layout Window and choose Route > Verify Route; it opens up a new window
as shown below. Click OK.
The results are clean, as you can see in the window below:
If the results are not clean, you might have to do post-route optimization steps, such as
an incremental route, then verify again, clean up, etc.
EXTRACTION
9.0 Introduction
In general, almost all layout tools are capable of extracting the layout database using
various algorithms. These algorithms define the granularity and the accuracy of the
extracted values. Depending upon the chosen algorithm and the desired accuracy, the
following types of information may be extracted:
1. Detailed parasitics in DSPF or SPEF format.
2. Reduced parasitics in RSPF or SPEF format.
3. Net and cell delays in SDF format.
4. Net delay in SDF format + lumped parasitic capacitances.
The DSPF (Detailed Standard Parasitic Format) contains RC information of each
segment (multiple R’s and C’s) of the routed netlist. This is the most accurate form of
extraction. However, due to long extraction times on a full design, this method is not
practical. This type of extraction is usually limited to critical nets and clock trees of the
design.
The RSPF (Reduced Standard Parasitic Format) represents RC delays in terms of a pi
model (2 C’s and 1 R). The accuracy of this model is less than that of DSPF, since it does
not account for multiple R’s and C’s associated with each segment of the net. Again, the
extraction time may be significant, thus limiting the usage of this type of information.
Target applications are critical nets and small blocks of the design. Both detailed and
reduced parasitics can be represented by OVI’s (Open Verilog International) Standard
Parasitic Exchange Format (SPEF). The last two (number 3 and 4) are the most common
types of extraction used by the designers. Both utilize the SDF format. However, there is
a major difference between the two. Number 3 uses the SDF to represent both the cell and
net delays, whereas number 4 uses the SDF to represent only the net delays. The lumped
parasitic capacitances are generated separately. Some layout tools generate the lumped
parasitic capacitances in the Synopsys set_load format, thus facilitating direct back
annotation to DC or PT.
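To give a feel for why a detailed RC description and a reduced one can yield different delays, the Python sketch below compares a first-order (Elmore) delay estimate for a net modelled as four RC segments against the same net collapsed into a pi shape (2 C's and 1 R). This is a simplified illustration under stated assumptions: the resistance and capacitance values are hypothetical, the equal split of capacitance in the pi shape is an assumption made for the example, and extraction tools derive their reduced models with their own methods.

```python
def elmore_delay(segments, load_cap=0.0):
    """First-order (Elmore) delay at the far end of an RC ladder.

    segments: list of (resistance, capacitance) pairs, driver side first;
    each capacitance sits at the downstream end of its resistance.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in segments:
        upstream_r += r
        delay += upstream_r * c
    return delay + upstream_r * load_cap


# Hypothetical net: 400 ohm and 80 fF in total, driving a 10 fF pin.
R_TOTAL, C_TOTAL, C_LOAD = 400.0, 80e-15, 10e-15

# Detailed view: four equal segments, each with its own R and C.
detailed = elmore_delay([(R_TOTAL / 4, C_TOTAL / 4)] * 4, C_LOAD)

# Reduced view: one R with half of the capacitance on each side (pi shape).
# The near-end capacitor sits at the driver, so it is modelled as a
# zero-resistance segment.
reduced = elmore_delay([(0.0, C_TOTAL / 2), (R_TOTAL, C_TOTAL / 2)], C_LOAD)

print(f"detailed: {detailed * 1e12:.1f} ps")
print(f"reduced:  {reduced * 1e12:.1f} ps")
```

The two estimates differ because the reduced shape no longer records how resistance and capacitance are distributed along the wire, which is the loss of accuracy the text attributes to reduced parasitics.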
17. Go to the Layout Window and choose Route > Extract RC; it opens up a new window
as shown below. Click OK.
Alternatively, you can run this script on the ICC shell:
The above script will produce the min and max files, which can be used for delay
estimation using PrimeTime tool.
##Write out a hierarchical Verilog file for the current design, extracted from layout
The extracted verilog netlist can be used to double-check the netlist by running a
simulation on it using VCS
19. Report power
If you open the generated fifo_power.rpt, you will notice both total dynamic power
(active mode) and cell leakage power (standby mode) being reported.
[hkommuru@hafez]$ csh
[hkommuru@hafez]$ cd
[hkommuru@hafez]$ cd /asic_flow_setup/pt_post
[hkommuru@hafez]$ source /packages/synopsys/setup/synopsys_setup.tcl
2. PT may be invoked in the command-line mode using the command pt_shell or in the
GUI mode through the command primetime as shown below.
Command-line mode:
> pt_shell
GUI-mode:
> primetime
Before doing the next step, open the post_layout_pt.tcl script and keep it ready; it is
at location /scripts/post_layout_pt.tcl
[hkommuru@hafez]$ vi scripts/post_layout_pt.tcl
3. Just like the DC setup, you need to set the path to link_library and search_path.
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db
/packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_min.db ]
pt_shell> set target_library [ list /packages/process_kit/generic/generic_90nm/updated_Oct2008/SAED_EDK90nm/Digital_Standard_Cell_Library/synopsys/models/saed90nm_max.db ]
8. Now we can do the analysis of the design. Generally, four types of analysis is
All four types of analysis can be accomplished by using the following commands:
The results are reported to the timing.rpt file under the reports subdirectory. Please open
the file to see if the design meets the timing or if there are any errors.
9. Reporting setup time and hold time. PrimeTime reports the setup time by default. You
can report the setup or hold time by specifying the -delay_type option, as shown in the
figure below.
The report is added to the timing.rpt file. Please read the file.
11. Reporting timing with capacitance and transition time at each level in the path
The report is added to the timing.tran.cap.rpt file. Please read the file for results.
12. You can save your session and come back later if you chose to.
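The command figures for steps 9 to 12 are not reproduced here. As a rough sketch, those steps might be entered in pt_shell as follows. The report file names come from the steps above; the session directory name `fifo_session` is a placeholder chosen for illustration.

```tcl
# Step 9: setup (max delay) is the default; hold analysis uses min delay
report_timing -delay_type max >> reports/timing.rpt
report_timing -delay_type min >> reports/timing.rpt

# Step 11: add transition time and capacitance for each stage of the path
report_timing -transition_time -capacitance -nets -input_pins > reports/timing.tran.cap.rpt

# Step 12: save the session so it can be reloaded later (placeholder directory name)
save_session fifo_session
# In a later pt_shell run:
# restore_session fifo_session
```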
APPENDIX A: Design for Test
A.0 Introduction
test vectors, we should be able to control each node we want to test. The higher
the degree of controllability, the better.
3. Observability: It is the ease with which we can observe the changes in the nodes
(gates). As in the previous case, the higher the observability, the better. Higher
here means that we can see the desired state of the gates at the output in
fewer cycles.
It is easy to test combinational circuits using the above set of rules. For sequential circuits,
to be able to do the above, we need to replace the flip-flops (FF) with ‘Scan-FF’. These Scan
Flip-Flops are a special type of flip-flop; they can control and also check the logic of the
circuit. There are two methodologies:
1. Partial Scan: Only some flip-flops are changed to Scan-FF.
2. Full Scan: All flip-flops in the circuit are changed to these special Scan-FF.
This does mean that we can test the circuit 100%.
Some flip-flops cannot be made scannable, for example reset pins and clock-gated flip-flops.
are expected results or not.
Scan chains test sequential Flip-Flops through the vectors which pass through them.
Any mismatch in the value of the vectors can be found.
Manufacturing tests: These include functional test and performance test. Functional
test checks whether the manufactured chip is working according to what is actually
designed, and performance test checks the performance of the design (speed).
After the chip comes back from the foundry, it sits on a load board to be tested. The
equipment which tests these chips is called Automatic Test Equipment (ATE).
The chip is placed on a socket. The socket is connected to the board. A mechanical arm
places the chip on the socket.
A test program tells the ATE what kind of vectors needs to be loaded. Once the
vectors are loaded, the logic is computed and the output vectors can be observed on the
When an error is found, you can exactly figure out which flip-flop output is not right.
Through the Automatic Test Pattern Generator, we can point out the erroneous FF.
Failure Analysis analyzes why the chip failed. This is a whole different field.
Fault Coverage: It tells us, for a given set of vectors, how many faults are covered.
Test Coverage: It is the coverage of the testable logic, meaning how much of the
circuit can be tested.
The latest upcoming technology tests the chips at the wafer level (called wafer sort
(with probes)).
In the EDA industry, library is defined as a collection of cells (gates). These cells are
called standard cells. There are different kinds of libraries which are used at different
steps in the whole ASIC flow. All the libraries used contain standard cells. The libraries
contain the description of these standard cells; like number of inputs, logic functionality,
propagation delay etc. The representation of each standard cell in each library is different.
There is no such library which describes all the forms of standard cells. For example: the
library used for timing analysis contains the timing information of each of the standard
cells; the library used during Physical Implementation contains the geometry rules.
Standard cell libraries determine the overall performance of the synthesized logic. The
library usually contains multiple implementations of the same logic function, differing in
area and speed. For example, a few of the basic standard cells could be a NOR gate, a NAND
gate, an AND gate, etc. A standard cell could also be a combination of the basic gates. Each
library can be designed to meet specific design constraints. The following are some
library formats used in the ASIC flow:
.plib: Physical Liberty Format
This library format is from Synopsys. It is mainly used for Very Deep Sub Micron
Technology. This library is an extension of the Liberty format; it contains extensive,
advanced information needed for Physical Synthesis, Floorplanning, Routing and RC
Extraction.
In this example the rising and falling delay is 60 ps (equal to 0.6 units multiplied by the
time scale of 100 ps per unit specified in a TIMESCALE construct). The delay is specified
between the output port of an inverter with instance name A.INV8 in block A and the Q
input port of a D flip-flop (instance name B.DFF1) in block B.
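The conversion described above is a single multiplication. A small plain-Tcl sketch (variable names are illustrative, not taken from any tool) makes the arithmetic explicit:

```tcl
# Convert an SDF delay value to picoseconds using the TIMESCALE multiplier.
set sdf_value    0.6   ;# delay value as written in the SDF entry
set timescale_ps 100   ;# TIMESCALE of 100 ps per unit
set delay_ps [expr {$sdf_value * $timescale_ps}]
puts [format "delay = %.0f ps" $delay_ps]   ;# prints "delay = 60 ps"
```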
CUSTOMER EDUCATION SERVICES
IC Compiler II:
Block-level Implementation
Workshop
Lab Guide
20-I-078-SLG-010 2019.03-SP4
Synopsys, Inc.
690 E. Middlefield Road
Mountain View, CA 94043
www.synopsys.com
0 IC Compiler II GUI
Learning Objectives
Lab Duration:
45 minutes
1. Log in to the Linux environment with the assigned user id and password.
2. From the lab’s installation directory, change to the following working
directory and invoke IC Compiler II:
$ cd lab0_gui
$ icc2_shell
icc2_shell> ls
You will see that command and output log files were created
(icc2_shell.cmd.* and .log.*, with date/time). The .cmd file
records all commands, including initialization commands invoked during
start-up. The .log file records commands and command output after tool
start-up. In addition, there is an icc2_output.txt file that also contains all
output. Do not spend too much time looking at the log file contents.
Note: Log/cmd file naming is defined through variables in the
initialization file, .synopsys_icc2.setup.
icc2_shell> start_gui
6. In the dialog that opens, click on the yellow symbol in the top-right
corner to Choose a design library called “ORCA_TOP.dlib”.
Design libraries are marked with this symbol:
7. Now that a library has been chosen, select “ORCA_TOP/placed” from the
list at the bottom of the Open Block dialog. Click OK. You will be presented
with the layout view of the design.
8. Enlarge or maximize the GUI window. You will see that the layout view
adjusts to fit the larger window.
You are looking at the layout of a placed design: Macros are placed at pre-
defined locations, the power mesh (vertical and horizontal VDD/VSS straps)
has been completed, and standard cells have been placed.
1. Spend a few minutes to become familiar with the zoom and pan buttons.
Hint: A short, descriptive ToolTip will pop up when a mouse pointer is held
motionless over a button.
To exit the zoom and pan mode press the [Esc] key or pick the Selection Tool
(the arrow icon). The cursor returns to an arrow or pointer shape.
2. Use hotkeys: Lower-case [F] or [Ctrl F] both correspond to zoom fit all (or
full view), for example. [+] or [=] is zoom-in 2x, [-] is zoom-out 2x.
3. Find out about other hot key definitions by selecting the pull down menu
Help → Report Hotkey Bindings. A new view appears, listing the hot key
definitions. When done, close the Hotkeys view by clicking on the tab’s “X”.
4. You can also use mouse strokes to pan and zoom, instead of using GUI
buttons or keyboard hotkeys. Try using strokes as follows:
Zoom in on an area of interest: Lower-case [Z], draw an area and then [Esc].
Now click and hold the middle mouse button while moving the pointer
straight up or down and holding it there. The stroke menu appears near the
pointer:
Release the middle button and the design should zoom to fit the display
window (Zoom Fit All). To zoom in on an area, stroke (move the mouse with the
middle button depressed) in a 45° direction upward (to the left or right) – the
view should zoom-in to a rectangular area defined by the stroke line. Stroking
45° downward zooms out. Stroking in the east/west direction pans the display
such that the start point of the stroke is moved to the center of the window.
Note: You can query or define your own strokes by using the
commands get_gui_stroke_bindings and
set_gui_stroke_binding.
5. The keyboard arrow keys can also be used to pan the display
up/down/left/right. Try it.
6. If your mouse has a scroll wheel, it can be used to zoom in/out (2X or ½X)
around the area of the mouse’s pointer.
You can control what types of objects are visible and/or selectable through the View
Settings panel. In the following steps you will turn on visibility to some key objects
one at a time, to clearly see what they represent.
1. Make sure that under the Options button, Auto apply is checked: This way
changes are applied immediately without having to confirm with the Apply
button each time.
2. In the visibility column uncheck Route.
3. Check Pin. The input, output and power
pins of the cells are displayed.
4. Uncheck then check Labels. This controls
cell name visibility.
5. Zoom in to one of the SRAM macro pins.
Expand Labels by clicking on the + icon on
the left. Check Pin. Pin names are now
visible as well.
6. In selection mode (press [ESC]), draw a
box to fully enclose a few macros. This
selects all selectable objects inside the box,
which are highlighted in white.
7. Make pins un-selectable by unchecking the
box in the selection column. Draw
the same box again to see that only the
macros are selected.
8. Check Route. All routes are displayed.
Expand Route and you will find that you
can control visibility of routes by Net Type.
(e.g. Clock, Ground, etc.).
9. Save the view settings. Click this button
and select Save preset as…
Type MyPreset into the preset name field
and click OK. ICC II creates a file
MyPreset.tcl in the
~/.synopsys_icc2_gui/presets/Layout
directory. All saved presets will be loaded
automatically whenever ICC II is started
again.
17. Re-apply the Default settings by clicking the “reload” circular arrow.
7. You can return to the Objects tab of the View Settings panel by selecting the
View Settings tab.
8. Deselect all objects by either clicking on an empty area in the layout, by using
the menu Select → Clear, or by typing [Ctrl D].
9. Select multiple objects in the same area with a left button drag-and-draw. All
selectable objects within the drawn rectangle are selected.
10. Keep what is selected and select additional objects by holding down the [Ctrl]
key while selecting with the left mouse click.
11. Enable Route visibility, zoom into a small area, then click on an area with
multiple objects stacked on top of each other (for example, a via connecting a
horizontal and a vertical metal route). Notice that one object will be selected
(solid white), while a different object will be queried (dashed white). The
InfoTip box goes with the dashed object.
12. Cycle through the stacked objects by repeatedly clicking the left mouse
button, without moving the cursor: Notice that both the solid and dashed line
objects cycle. Alternatively, press F1 to cycle through just the object queries.
13. If it is difficult to notice the highlighted objects among other bright objects, it
is possible to reduce the brightness of the unselected objects, thereby
increasing the contrast. A Brightness control is located at the top of the
View Settings panel.
By default, the GUI displays all shapes at the current block level, e.g. metal shapes,
blockages, pins etc. To see shapes inside standard cells or hard macros, you have to
increase the view level, and enable them to be “expanded”.
3. Increase the viewing Level from 0 to 1, then click on the Hierarchy Settings
button, and check Hard Macro.
If you zoom into one of the RAMs you should see the structures inside the
macro. You can turn layer visibilities off/on to see individual layers. Since
these are frame views, what you are seeing are mostly routing blockages.
4. You can also interact with structures within cells: Click on the Multiple Levels
Active button. Now you can hover over
objects within cells, and analyze them.
This functionality is used extensively
in hierarchical design planning, where
you can perform physical manipulation at multiple physical design levels at
the same time.
As you may have noticed, you can switch back and forth between the panels (or
tabs) on the right. Sometimes it can be useful to display more than one panel at the
same time. This is particularly useful if you have a lot of screen real estate.
1. The query panel should still be open. If not, select a macro then press [Q] to
bring it up again.
2. Right-click on the query tab –
a context menu should be
displayed.
3. From that context menu click on
the “Split ‘Query’ Below” button.
The panel area on the right will
now split into a top and a bottom
area, with the query panel being
in the bottom area.
4. Now that you have two panel
areas, you can drag and drop tabs
from one area to the other. For
example, try dragging the
Favorites tab (explained in the
next Task) from the top to the bottom area. Of course, you can also rearrange
the tabs within the same panel area using drag and drop.
5. When you right-click on the empty area under the tabs, or on a tab (as you’ve
done above), you can also open new panels, for example the Property Editor,
which can be used to display or change attributes of selected objects.
6. Try the other menu banner icons, like Detach, Dock and Collapse. Also note
that you are not limited to just two panel areas – you can split the panels
again.
Have a look at the timing of this design. Use the menu Window → Timing Analysis
Window. In the new window that appears, press the Update button.
Once timing is up to date, you will see a histogram for the timing distribution.
Violating paths show up as red bars on the left, and positive slack paths are green.
Click on the left-most red bar and you should see timing end-points displayed on the
bottom. Select the first entry, then right click. From the context menu, choose
“Select Worst Paths”. You will see the timing path in the layout view.
For certain functions that you use over and over, it might
make sense to add them to your Favorites.
1. The quickest way to get help is to use Command Search. Access Command
Search by using the keyboard shortcut lower-case [H] or by clicking on the
magnifying glass at the top of the block window, on the right end of the
menus.
2. In the “Search for commands” field, begin typing “place”. Notice that as you
type, ICC II lists all the menus, commands and application options that relate
to that search string.
3. Select one of the Application Options, for example,
place.coarse.congestion_analysis_effort.
A second window will open, titled “Application Options”.
In this window you can change the settings of application options, or you can
search for them. Explore this window a little to become familiar with it.
Application options are used to configure various aspects of how
IC Compiler II works. You will learn more about them in the next lecture.
4. IC Compiler II supports command, variable, file name and command option
completion through the [Tab] key. Try the following in the console at the
icc2_shell> prompt:
5. To view the man page on a command or application option you need to enter
the exact command or variable name. Alternatively, you can enter the starting
characters of a command and use command completion to find the rest (auto-
completion is not available for application options). If you are not sure what
the exact name is, use help for commands, use get_app_options or
report_app_options for application options, and use printvar for
variables, along with the * wildcard. Here are some examples:
Let’s say you are looking for more information about the clock tree synthesis
command, but you do not remember the exact command name. You know it
contains the string “syn” (for synthesis). To list all commands that contain this
string enter:
help *syn*
From the displayed list of commands, you pick out the one you are interested
in, namely, synthesize_clock_trees.
Of course, you could have entered syn in Command Search as well.
7. To get a full help manual page – a detailed description of the command and
all of its options, type:
man synt[Tab]c[Tab]
or
man synthesize_clock_trees
8. Now let’s say you need help on a specific application option, but again, you
don’t remember its exact name, but it pertains to CTS. To list all application
options that start with “cts”, enter:
report_app_options cts*
man cts.compile.enable_cell_relocation
10. You can also get additional help for an error or warning message, using the
unique message code, for example:
man ZRT-536
11. Finally, specifically for this workshop, we are providing a few custom
functions for the shell that can simplify your work. They are defined in the file
../ref/tools/procs.tcl. Try the following:
aa syn
v man ZRT-536
v aa cts
v is a user-defined alias that calls the “view” custom function, useful for
viewing long reports, and allows for regular expression searches.
12. To list the workshop-provided helper functions and aliases, type “ces_help”.
13. Quit IC Compiler II by using the menu File → Exit, or by typing exit at the
command prompt, and selecting Discard All.
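As a recap of the help lookups in this task, the following icc2_shell sketch collects them in one place. It uses only the commands named above plus `get_app_option_value`; the search patterns are arbitrary examples.

```tcl
# List commands whose names contain "place"
help *place*
# Report application options under place.coarse, with their current values
report_app_options place.coarse.*
# Return the matching option names for use in a script
get_app_options place.coarse.*
# Show variables matching a pattern
printvar search*
# Read one option's current value
get_app_option_value -name place.coarse.congestion_analysis_effort
# Open the man page once the exact name is known
man place.coarse.congestion_analysis_effort
```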
2 Floorplanning
Learning Objectives
Lab Duration:
35 minutes
Introduction
ORCA Design
The example design used in all labs is called ORCA which was specifically created
to address the needs of training. The design was created by the Synopsys Customer
Education Services department.
SAED32nm Library
The technology library used in this workshop is a 32 nm library which can be freely
distributed amongst Synopsys customers. It was created by Synopsys Inc. to serve as
a means of demonstrating the various needs of modern high-frequency, multi-
voltage, multi-scenario designs. For more information about this library please visit
http://www.synopsys.com.
This Lab
You will be creating a basic floorplan for the ORCA_TOP design. This will include
creating the initial outline of the block, shaping and placing the voltage areas,
placing the macros, performing analysis on macro placement, and finally
implementing the power network.
Instructions
Answers / Solutions
You are encouraged to refer to the end of this lab for answers and help.
UNIX% cd lab2_floorplan
UNIX% icc2_shell -gui
2. Select the Script Editor tab at the bottom-left of your ICC II window.
3. Click on the “folder” button (see the arrow above) and open the run.tcl file.
This file contains the commands that you will be executing in this lab. Instead
of opening the file in a separate editor and using copy/paste to transfer
commands, you can use the built-in script editor to simply highlight/select
lines you want to execute, and then click on the Run Selection button.
Try this now by selecting the entire line echo "hello world", then
clicking Run Selection. Look at the results in the shell window.
4. Open the block which has been initialized with Verilog, UPF and timing
constraints, ready to be floorplanned. Use the GUI (Open an existing block),
or select/run the following command:
open_block ORCA_TOP.dlib:ORCA_TOP/floorplan
For this task you will use the Task Assistant. You do not need to run commands
from run.tcl for the next few tasks.
1. Open the Tasks palette (if not already open) by right-clicking your mouse
anywhere in the tab area on the right (where the View Settings palette is), then
selecting Tasks. In the pull-down menu at the top of the Tasks palette, select
Design Planning.
Note: You could also use the full task assistant by pressing F4, or
using the menu Task → Task Assistant. Then, select the
“>” (Show task navigation tree) icon in the upper-left
corner of the Task Assistant
window, and in the pull-down
menu choose Design Planning.
6. Preview the floorplan, and once it looks correct, specify a Uniform spacing
value of 20 between the core and the die.
7. Click on Apply to generate the initial floorplan and then Close the dialog box.
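The same floorplan can also be created from the shell. The sketch below assumes the `initialize_floorplan` command with its `-core_offset` option; only the spacing value of 20 comes from the steps above. The shape and utilization settings chosen in the dialog are not shown in this section, so they are left at the tool defaults here.

```tcl
# Only the 20-unit core-to-die spacing is taken from the lab step;
# all other floorplan settings are left at their defaults (an assumption).
initialize_floorplan -core_offset {20}
```

Check `man initialize_floorplan` for the options that correspond to the remaining dialog fields.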
1. From the Tasks palette, select Block Shaping → Block Shaping: This is
used to automatically create (shape) voltage areas.
3. Enable visibility of voltage areas by selecting Voltage Area from the View
Settings → Objects panel and expand Voltage Area to enable Guardband
visibility as well.
You should find that two voltage areas are displayed: PD_RISC_CORE
in the top-right corner, and DEFAULT_VA for the remaining area.
You should also see that the macros in DEFAULT_VA have been placed
(although not very carefully), but not the macros in PD_RISC_CORE: The
four macros inside that voltage area are stacked on top of one another.
This is normal. Only the top-level macros are placed at this time; macros
belonging to non-default voltage areas will be placed during the next task, Task 4.
2. In the Cell Placement dialog, check the box next to Use floorplanning
placement.
4. Enable pin visibility and examine the placement of the macros in the GUI.
You should see that the placer has automatically created channels between the
macros and has added soft placement blockages in the narrow channels, to
reduce congestion and improve routability. The more pins along a macro’s
edge, the larger the channel width.
You should also see that the macros have been flipped so that the sides with
common pins face each other. This is done to minimize the overall number of
channels that require routing and possibly buffering between macros.
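For reference, the floorplanning placement selected in the Cell Placement dialog is assumed to correspond to the shell command below. Other dialog settings are not shown in this section and are left at their defaults.

```tcl
# Floorplan-mode placement of macros and standard cells
create_placement -floorplan
```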
1. You could also use the Task Assistant to place the pins of the block, but this
time just select/run the corresponding commands from run.tcl:
2. Have a look at the layout: you should see that all the block’s ports, the logical
representations of the physical pins, have been placed.
3. Zoom in to individual ports to verify that the pins have, indeed, only been
placed on layers M3-M6.
Note: To be able to select the physical block “pins”, turn on
Terminal selectability (terminals are the physical pins of
the current block).
4. Search for the *clk ports/terminals and zoom into their location. See
run.tcl for hints.
5. Apply the commands from run.tcl to create a pin guide. This will constrain
all the specified clock pins to be placed in a certain area. Afterwards, turn on
pin guide visibility by turning on Guide → Pin Guide from the View Settings →
Objects panel, or apply the command from the script. To see the pin guide,
zoom in to the highlighted area as shown below.
6. Change the width of the *clk ports (the terminal shapes) to 0.1 and the length
to 0.4, then rerun pin placement. You should find that all clock pins are now
located inside the pin guide, and that they have been resized as specified.
This was just one example of applying pin constraints. Review the man page
to see what other adjustments are possible.
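The run.tcl commands for this task are not reproduced above. The sketch below shows what the pin constraint sequence might look like, assuming the standard ICC II pin placement commands. The layer list and the 0.1/0.4 dimensions come from the steps above; the pin guide coordinates are placeholders, and the exact options should be checked against the man pages.

```tcl
# Restrict block pins to M3-M6, then place them
set_block_pin_constraints -self -allowed_layers {M3 M4 M5 M6}
place_pins -self

# Hypothetical pin guide region for the clock ports (placeholder coordinates)
create_pin_guide -boundary {{0 100} {10 150}} [get_ports *clk]

# Resize the clock terminals as in step 6, then rerun pin placement
set_individual_pin_constraints -ports [get_ports *clk] -width 0.1 -length 0.4
place_pins -self
```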
Now that macros and standard cells have been placed, as well as the block ports, it
is a good idea to check if there are any congestion issues.
4. To make the display a little more interesting, try the following steps.
Change the Bins/From/To settings as shown in the following screenshot:
Bins has been reduced from 9, and from/to has been
changed from 0/7: Now, overflows from -2 to 2 have
their own individual bin; All overflows greater than 2
are grouped into a combined bin, and the same for
overflows less than -2.
This should drastically change what you see, by making the low overflows
brighter/hotter: You will see that there are a lot of areas with 0 overflow.
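The binning described above can be sketched in plain Tcl. This is only an illustration of the grouping rule, not how the GUI computes its colors; the proc name and sample values are made up.

```tcl
# Map an overflow value to its display bin:
# -2..2 each get their own bin, anything beyond is grouped.
proc overflow_bin {overflow} {
    if {$overflow < -2} { return "< -2" }
    if {$overflow > 2}  { return "> 2" }
    return $overflow
}

foreach ov {-5 -2 0 1 3 7} {
    puts "overflow $ov -> bin [overflow_bin $ov]"
}
```

With this rule there are seven bins in total: five individual ones and two combined ones.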
Display changes do not alter the results: The design has no serious congestion
issues. If you want to see a detailed calculation of the overflow for an edge,
move the mouse over that edge and you will see a popup with all the details,
as shown here:
5. Close the congestion map, either by clicking on the Draw Global Route
Data Flow Flylines (DFF) is a feature that allows you to analyze not just simple pin-
to-pin connections, but connections that cross through combinational gates as well
as registers. This enables a very insightful view of your overall design and makes it
easier to make informed decisions about macro placement which has a large impact
on standard cell placement.
4. Now select one macro to see its connections to other macros and ports. Limit
the tracing by checking “Number of registers” or “Number of gates” and
changing the Min/Max numbers. Using this method, you can quickly figure
out whether objects are connected directly, or through several gate levels, or
through several register levels. You can select several macros at the same time
using the Control key. If you click on a flyline, you will get detailed
information about the connection(s).
5. Note that DFF will not show you any flylines if they terminate at a register.
You will only see flylines between macros or between macros and ports,
depending on the selection you have made under “Include”. In the next Task
we will show you how to perform register tracing.
1. Select Register Tracing from the connectivity pull-down, located right under
the Data Flow Flylines entry.
2. Select any macro. The registers connecting to that macro will be highlighted
immediately, skipping any logic that may be in between. Select Show flylines
to see the flylines between the macro and registers.
3. To see the next level of registers, increase Max levels to 2 under the Limit
Tracing heading, and under Show Levels, check Level 2. If there is a second
level of registers they will be highlighted in a different color.
4. Under Highlight, you have the option to also display End points: These are the
endpoints from the last level of registers displayed. You can also display
Direct end points, which are endpoints that connect directly (level 0) to the
selected source, i.e. our macro.
6. We will assume for this lab that the macros are placed to our satisfaction, so
the next step is to fix their location. You can do this by selecting the macros
then clicking on , or by entering:
Task 9. PG Prototyping
PG prototyping can help you create a basic PG mesh very quickly. All you need to
specify are the PG net names, the layers and the percentage of the layers you want
to use for PG routing. This is most commonly used as a place-holder for the final
power mesh, in order to check congestion issues that a PG mesh may cause.
6. If you like, play around with the layer and percentage parameters to test
different PG mesh configurations. Remove PG routes before re-applying new
parameters.
In the next task you will build the final mesh using a pre-written script.
Planning an entire power network for a design can be a complex task, especially in a
multi-voltage environment. The purpose of this task is to just give you an idea of
what is possible in IC Compiler II using pattern-based power network synthesis
(PPNS), and to demonstrate how few commands it takes to create a full multi-
voltage PG mesh.
1. Quickly review the script that inserts the entire power structure. Open the file
scripts/pns.tcl in an editor (or in the script editor!).
After first deleting any pre-existing PG mesh structures, you will see how the
patterns are created (create_pg_*_pattern), followed by the strategies
(set_pg_strategy). Once both are defined, the strategies are implemented
using compile_pg.
source scripts/pns.tcl
3. Once the script has completed, review the power mesh, the macro PG
connections, and the standard cell rails in the layout view.
Note: The power mesh will have a few issues here and there,
which will have to be taken care of for final implementation.
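To give a feel for the pattern, strategy, and compile sequence described in step 1, here is a reduced single-domain sketch. It is not the content of scripts/pns.tcl. The pattern and strategy names, widths, pitches, offsets, and layer orientations are placeholders; only the VDD/VSS net names and the general command sequence come from this guide.

```tcl
# Hypothetical mesh pattern on two upper metal layers
create_pg_mesh_pattern mesh_pat -layers {
    {{vertical_layer: M8}   {width: 1.0} {spacing: interleaving} {pitch: 20} {offset: 5}}
    {{horizontal_layer: M7} {width: 1.0} {spacing: interleaving} {pitch: 20} {offset: 5}}
}

# Apply the pattern to the core area for the VDD/VSS nets
set_pg_strategy mesh_strat -core \
    -pattern {{name: mesh_pat} {nets: {VDD VSS}}}

# Build the mesh from the strategy
compile_pg -strategies mesh_strat
```

A multi-voltage design such as ORCA_TOP needs additional strategies per voltage area, which is what the provided script takes care of.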
save_lib
exit
Learning Objectives
Lab Duration:
60 minutes
Instructions
Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers, or to
obtain help with the execution of some steps.
1. Change to the work directory for the placement lab, then start ICC II in GUI
mode and execute the load script:
UNIX% cd lab3_place
UNIX% icc2_shell -gui -f load.tcl
The script copies the block that completed design setup, called
ORCA_TOP/init_design, to a block called ORCA_TOP/place_opt, and
opens the copied block.
2. Generate a timing QoR summary.
report_qor -summary
To support a more immersive and interesting lab experience, there are no step-by-
step instructions for this lab.
Instead, you are asked to open the file run.tcl in the ICC2 script editor and
exercise the commands line by line. We are specifically asking you not to just
source the entire file, as this would defeat the purpose which is to understand how
all the options and commands play together. If there is an option that does not make
sense, have a look at its man page.
The following sections provide additional information and comments. The sections
are ordered in such a way that you can refer to them as you go through the script.
If you like, you can diverge from the commands in run.tcl, or you could try
different efforts, different settings etc. Note though that the runtimes might vary.
Make sure you only run the place_opt command once, as the runtime is relatively
high.
Pre-placement Checks
Before performing placement and optimization, it is best to ensure that there are no
issues that will prevent place_opt from doing its job.
Question 2. What is the maximum routing layer set for the block?
...................................................................................................
The design contains scan chains that are specified using SCANDEF as part of the
data setup step.
Question 3. How many scan chains exist in the design?
...................................................................................................
It can be important to establish whether the design has any high-fanout nets, or nets
with a certain fanout. In ICC II, this is analyzed using report_net_fanout. Use
the commands shown in run.tcl to answer the following question:
Question 4. How many non-clock high fanout nets exist with a fanout
larger than 60?
...................................................................................................
Pre-placement Settings
For technologies of 12 nm and below, you should use the set_technology
command to configure ICC II. This command changes several application options to
support the given technology.
To insert tie cells during place_opt, library tie cells must not have the
dont_touch attribute applied, and they must be included for ‘optimization
purpose’.
Question 5. What command/option is used to include cells for
optimization?
...................................................................................................
Logic Restructuring
You can enable advanced logic restructuring through an application option. For
example, to add power-driven restructuring, set the application option
opt.common.advanced_logic_restructuring_mode to power.
The other choices are area, area_timing, timing, and timing_power.
With route-driven extraction (RDE), global routing is run on the initially placed
design to construct an RDE extraction table. This table is then used for all
virtual net extraction, and therefore for all pre-route optimizations (place_opt
and clock_opt), which improves the pre-route to post-route timing correlation.
RDE is on by default for technologies of 16 nm and below (setting: auto). You can
turn it on explicitly by setting the option to true.
Application option: opt.common.enable_rde
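If RDE is wanted on a technology where the auto setting would leave it off, a sketch of the explicit setting follows. It only reuses the option name given above and the set_app_options form used elsewhere in this lab.

```tcl
# Force route-driven extraction on, instead of relying on the "auto" default.
set_app_options -name opt.common.enable_rde -value true
```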
ICG Optimization
ICG optimization is useful only if a design is known to have problems meeting ICG
enable setup timing, which is not the case in our design.
In the interest of reasonable run times, it is recommended not to enable this
optimization for this lab.
Application option: place_opt.flow.optimize_icgs
...................................................................................................
...................................................................................................
Answers / Solutions
Question 2. What is the maximum routing layer set for the block?
The report_ignored_layers command indicates that
the maximum layer is set to M8.
Although the technology being used has 9 metal layers, we
limit signal routing to metal 1 through 8 (M1 – M8). Metal
7 and 8 are used also for the power mesh, which limits the
available resources. Metal 9 is not used in this design.
Question 4. How many non-clock high fanout nets exist with a fanout
larger than 60?
There are 10 total nets with a fanout >= 60. By eliminating
any net whose name contains “clk”, or whose fanout driver pin
contains “CLK”, you should find that there are 2 non-clock
nets with a fanout >= 60.
You can also use the more precise method, shown in
run.tcl, using “-filter net_type==signal”.
Question 6. What is the app option for optimizing the scan chains and
what is its default setting?
It is opt.dft.optimize_scan_chain, and the default is
true.
5 Design Setup
Learning Objectives
Lab Duration:
40 minutes
Introduction
The design setup task is the most important step to perform correctly. Faulty setup
can lead to many problems downstream, which can have a large negative impact on
the design schedule. Once design setup is completed, you rarely need to revisit this
task (unless the design or constraints change) and can focus on the more productive
tasks like placement, CTS and routing.
lab56_setup/
ORCA_TOP_design_data/
Instructions
Answers / Solutions
You are encouraged to refer to the end of this lab to verify your answers.
UNIX% cd lab56_setup
UNIX% ls -al
This directory is shared with the next lab, in which you will complete the
setup process (timing setup).
The rm_setup/ and ORCA_constraints/ directories, as well as the
run6.tcl file are not listed on the previous page, because they are not used
in this lab.
2. Invoke IC Compiler II:
3. Open the run5.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
4. Open another terminal window and change your current directory to
lab56_setup. You will use this window to examine the files that will be
used in this lab.
1. In the last terminal window that you just opened, view the setup.tcl file in
the lab56_setup directory.
Question 1. What is the default value of the search_path application
variable?
(HINT:
printvar sear[TAB] or
echo $search_path or
get_app_var search_path or
report_app_var search_path )
...................................................................................................
2. By looking at the SEE ALSO section at the bottom of the man page of the
set_host_options command, use the appropriate command to answer the
next question (this is a useful technique to find related commands):
Question 6. How many cores are enabled, by default?
...................................................................................................
printvar search_path
printvar REFERENCE_LIBRARY
report_host_options
print_suppressed_messages
You might notice that there are additional suppressed messages beyond the
ones listed in the setup file (CTS-725, POW-001, …). These are tool-defaults,
and this is usually done in order to reduce output verbosity. To see these
messages, use unsuppress_message.
create_lib \
-use_technology_lib $TECH_LIB \
-ref_libs $REFERENCE_LIBRARY \
ORCA_TOP.dlib
You will get the following error message:
Error: technology library '../ref/CLIBs/saed32_1p9m_tech.ndm' does not
match any library on the reference library list. (LIB-059)
2. Correct the problem in the setup.tcl file, then repeat the necessary
commands until a design library is successfully created.
Question 7. What is the name of the newly-created design library?
...................................................................................................
...................................................................................................
4. Using the file and directory structure shown on page 3, correct the problem,
then repeat the necessary commands until the netlist is read without errors.
5. If, as in our case, the GUI is running, explicitly linking the block is not
needed, since it is done automatically. You can, therefore, skip the following
command. Linking ensures that all instantiated references can be found.
link_block
A linking problem occurs when the list of reference library pointers attached
to the design library (specified by create_lib -ref_libs ...) is
missing the library that contains one or more reference cells that are
instantiated in the netlist.
6. The ORCA_TOP design uses four different SRAMs, which are all unresolved.
If you look inside the directory that contains the reference cell libraries, you
should be able to determine the missing reference library name.
Question 9. What is the name and location of the reference library
containing the unresolved references?
...................................................................................................
The design should link without any warnings. While you could continue with
the remaining design setup steps, you will first implement the second method
to fix the problem at its source, the setup.tcl file.
9. Method #2:
First edit setup.tcl and add the missing reference library to the
REFERENCE_LIBRARY list, then close the design library, then repeat the
main design setup steps:
close_lib
source -echo setup.tcl
create_lib -use_technology_lib $TECH_LIB \
-ref_libs $REFERENCE_LIBRARY ORCA_TOP.dlib
read_verilog -top ORCA_TOP ORCA_TOP.v
link_block
...................................................................................................
12. If a design library contains multiple blocks with the same blockName (but
with a different labelName and/or viewName), the block must be referred to
by its unique block handle. If the design library contains only one block with a
certain blockName, then that block can be referred to simply by its
blockName. You can optionally include the libName, and/or labelName,
and/or viewName, as demonstrated by executing these commands:
get_blocks ORCA_TOP
get_blocks ORCA_TOP.design
get_blocks ORCA_TOP.dlib:ORCA_TOP
get_blocks ORCA_TOP.dlib:ORCA_TOP.design
Note: Since this library contains only one block called ORCA_TOP,
all of the above commands are accepted by IC Compiler II,
and they all return the same full block handle.
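To make the handle structure concrete, here is a small plain-Tcl sketch that splits a handle of the form [libName:]blockName[/labelName][.viewName] into its parts. The proc split_block_handle is a hypothetical helper written for illustration; it is not an IC Compiler II command, and the tool resolves handles itself.

```tcl
# Hypothetical helper (not an ICC II command): split a block handle into
# libName, blockName, labelName and viewName. Missing parts come back empty.
proc split_block_handle {handle} {
    set re {^(?:([^:]+):)?([^/.]+)(?:/([^.]+))?(?:\.(.+))?$}
    if {![regexp $re $handle -> lib block label view]} {
        error "not a block handle: $handle"
    }
    return [dict create lib $lib block $block label $label view $view]
}

# The four forms used in the commands above.
foreach h {ORCA_TOP ORCA_TOP.design ORCA_TOP.dlib:ORCA_TOP ORCA_TOP.dlib:ORCA_TOP.design} {
    puts "$h -> [split_block_handle $h]"
}
```

The library name is matched up to the colon first, because a name such as ORCA_TOP.dlib itself contains a dot.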
load_upf ORCA_TOP.upf
commit_upf
source ORCA_TOP.fp/floorplan.tcl
2. Look at the GUI BlockWindow, and you will see the complete mirrored-L
shape floorplan of the ORCA_TOP block.
3. Zoom in and notice the complex P/G mesh structure, then zoom back out to a
full-zoom. Since the P/G mesh makes it difficult to see the underlying
structures clearly, we will turn off their visibility next.
4. In the View Settings panel, make the following changes to improve the
visibility of the floorplan:
- Port: uncheck visibility
- Terminal: check selectability
- Voltage Area: check visibility
- Route → Net Type → Power and Ground: uncheck visibility
5. Under the Settings → View tab, enable Label settings → Scale fonts. This
improves the readability of the labels.
6. You can now more clearly see that:
- All the macros are placed
- Terminals (metal connection shapes) for the I/O and P/G ports are placed
around the block boundary
- A voltage area, called DEFAULT_VA, is defined for the entire core area
(dashed purple outline)
- A second voltage area, called PD_RISC_CORE, is defined in the
lower-right
- Standard cells are still stacked on top of each other in the lower-left corner
read_def ORCA_TOP.scandef
Question 11. How many scan chains does the design have?
...................................................................................................
8. Connect the P/G pins to the supply nets, and verify that there are no P/G
connection errors:
connect_pg_net
check_mv_design
1. Confirm that the placement site called unit is set as the default site definition
(its is_default attribute should be true):
In our case, this was defined while preparing the technology-only library,
which you learned about in the previous unit. If this was not done, or you did
not use a technology-only NDM, you would have to apply this command:
set_attribute [get_site_defs unit] symmetry {Y}
Question 12. Does Y-symmetry mean that a standard cell can be flipped in
the Y-direction (along the X-axis), or flipped in the
X-direction (along the Y-axis)?
...................................................................................................
3. Confirm that the metal layer preferred routing directions are defined:
report_ignored_layers
set_ignored_layers -max_routing_layer M6
report_ignored_layers
You have completed all the design setup steps; timing setup will be performed in
the next unit. This is a good time to save the block.
1. List the files and directories in the current working directory (CWD),
lab56_setup:
ls -l
Question 13. Does the ORCA_TOP.dlib design library exist in the current
working directory?
...................................................................................................
...................................................................................................
3. Exit out of IC Compiler II. The GUI will ask for confirmation – click Exit:
exit
Answers / Solutions
9. Method #2:
First close the design library, then edit setup.tcl and add the missing
reference library to the REFERENCE_LIBRARY list …
Question 11. How many scan chains does the design have?
8
Number of Processed/Read DEF Constructs
---------------------------------------
VERSION : 1/1
DIVIDERCHAR : 1/1
BUSBITCHARS : 1/1
DESIGN : 1/1
SCANCHAINS : 8/8
You can also get the scan chain count by running the
command get_scan_chain_count.
Question 12. Does Y-symmetry mean that a standard cell can be flipped in
the Y-direction (along the X-axis), or flipped in the
X-direction (along the Y-axis)?
A standard cell can be flipped in the X-direction, along the
Y-axis (that is, mirrored left to right).
6 Timing Setup
Learning Objectives
Lab Duration:
40 minutes
Introduction
The timing setup task completes the design setup, which is the most important step
to perform correctly. As explained earlier, once setup is completed, you rarely need
to revisit this task (unless the design or constraints change) and can focus on the
more productive tasks like placement, CTS and routing.
• Performing MCMM setup, which includes defining the corners, modes and
scenarios required for analysis and optimization, and loading the MCMM
constraints
• Confirming implementation phase readiness with various reports and checks
• Performing a zero-interconnect timing sanity check
lab56_setup/
ORCA_TOP_constraints/
ORCA_TOP_c_*.tcl Corner constraints.
ORCA_TOP_m_*.tcl Mode constraints.
ORCA_TOP_s_*.tcl Scenario constraints.
ORCA_TOP_port_lists.tcl Port definitions used by constraint
commands.
scripts/
mcmm_ORCA_TOP.tcl Script to define MCMM corners, modes,
and scenarios, and load their respective
timing constraints.
Instructions
Answers / Solutions
You are encouraged to refer to the end of this lab to verify your answers.
UNIX% cd lab56_setup
UNIX% icc2_shell -gui
2. Open the run6.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
3. Source the setup.tcl file:
4. Open the block: This can be accomplished either by first opening the library,
and then the block, or, by using the full block handle, which includes the
library, in which case you do not need to first open the design library:
open_block ORCA_TOP.dlib:ORCA_TOP/init_design
# OR
open_lib ORCA_TOP.dlib
open_block ORCA_TOP/init_design
This script first creates the modes, corners and scenarios needed for multi-
corner multi-mode (MCMM) optimization of this design.
Note: The script takes advantage of Tcl arrays
(set arrayName(varName) varValue) to create
efficiently-coded foreach loops.
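The array-plus-foreach pattern mentioned in the note can be sketched in plain Tcl as follows. This is an illustration only and is not the content of mcmm_ORCA_TOP.tcl; the array name corners_of is hypothetical, and the sketch only builds and prints scenario names rather than calling any tool command.

```tcl
# Illustrative only: map each mode (array key) to the corners it is paired with.
set corners_of(func) {ss_125c ss_m40c ff_125c ff_m40c}
set corners_of(test) {ss_125c ff_125c}

# One scenario per (mode, corner) pair, named <mode>.<corner>.
set scenarios {}
foreach mode [lsort [array names corners_of]] {
    foreach corner $corners_of($mode) {
        # A real MCMM script would create the scenario here instead of
        # just recording its name.
        lappend scenarios "$mode.$corner"
    }
}
puts [join $scenarios "\n"]
```

Keeping the mode-to-corner pairing in one array means that adding a corner is a one-line change, and the loops do not need to be touched.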
Question 1. What are the names of the modes, corners and scenarios that
will be created?
Modes: ......................................................................................
Corners: ....................................................................................
...................................................................................................
Scenarios: .................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
4. Close the mcmm_ORCA_TOP.tcl file – do not save it, then source it:
5. Look at the log messages that were generated in the icc2_shell window:
After scenario creation, the messages confirm that all scenarios are active for
all analysis types:
Created scenario test.ff_125c for mode test and corner ff_125c
All analysis types are activated.
The warning messages about the virtual and generated clocks, which occur
when sourcing the mode constraints, are just informational, and can be
ignored (virtual clocks, by definition, have no sources).
6. Verify that the active analysis types for the test.ss_125c scenario match
your answer to the previous question.
7. Ensure that there are no propagated clocks prior to clock tree synthesis:
report_mode
Right after the name of the mode you will see this line:
Current: false Default: false Empty: false
Current refers to whether this mode is the current mode or not.
Default is true only if you did not create any modes of your own, in which case
ICC II would have a single mode named default.
Empty is true if you have not applied any constraints to this mode.
11. Generate a PVT report, to find out if there are any mismatches between the
PVT values defined in each corner and the available library PVTs:
view report_pvt
12. Look at the summary section for each of the four corners (between the
horizontal dashed lines), and answer the following questions:
Question 4. Which corner(s) have PVT mismatches?
...................................................................................................
...................................................................................................
13. The information below the warning summary section lists the details of each
library that has a mismatch (based on the cells instantiated in the netlist). A
quick way to find out what the problems are is to look at the lines with an
asterisk or star (*).
Question 6. What is causing all of the mismatches?
...................................................................................................
...................................................................................................
...................................................................................................
Based on the name of the corner with the mismatches (ss_m40c), as well as
the name of the other corner with that same temperature (ff_m40c), it is
reasonable to conclude that the problem is with the user-specified temperature
constraint of -55, not with the characterized corner of the libraries.
14. Exit the PVT report.
15. Make the necessary correction in the appropriate constraints file located in
the ORCA_TOP_constraints directory.
16. Re-execute the commands in steps 4, 7 and 11 on the previous pages, until a
clean PVT report is obtained.
17. Save the block/library
save_lib
list_blocks
...................................................................................................
1. Generate a QoR summary report, then set the design in zero-interconnect
(ZIC) timing mode and generate another QoR summary report:
You will notice a big difference in non-ZIC and ZIC timing. This is due to the
long, estimated routes between the unplaced standard cells in the lower-left
corner, and the hard macro cells in the block, as well as the I/O terminals
around the block. In ZIC timing, the RC parasitics of these long, estimated
routes are set to zero, which drastically improves setup timing.
This QoR report is very useful to get a high-level summary of the worst
negative slack (WNS) timing, as well as the total negative slack (TNS), and
the number of violating endpoints (NVE) for each scenario.
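To make the three summary metrics concrete, the plain-Tcl sketch below computes them from a list of endpoint slacks. The slack values and the proc name qor_summary are hypothetical and are not taken from the lab design; the tool derives these numbers itself when it reports QoR.

```tcl
# Hypothetical endpoint slacks in ns (negative = violation); not lab data.
set slacks {-2.5 0.25 -0.5 1.0}

# Return {WNS TNS NVE} for a list of endpoint slacks.
proc qor_summary {slacks} {
    set wns [lindex $slacks 0]
    set tns 0.0
    set nve 0
    foreach s $slacks {
        if {$s < $wns} { set wns $s }      ;# WNS: the single worst slack
        if {$s < 0} {
            set tns [expr {$tns + $s}]     ;# TNS: sum of the negative slacks only
            incr nve                       ;# NVE: number of violating endpoints
        }
    }
    return [list $wns $tns $nve]
}

lassign [qor_summary $slacks] wns tns nve
puts "WNS=$wns TNS=$tns NVE=$nve"
```

A large WNS with a small NVE points to a few bad paths, while a modest WNS with a large TNS and NVE points to a broad problem.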
At first glance, when looking at the second, ZIC QoR report, it appears that
the design has a serious problem! Two of the three setup timing scenarios
have WNS violations of ~2.6 ns!
2. First let us make sure that these large violations are not due to unbuffered high
fanout nets (HFNs) - assume a fanout of 50 or more is considered an HFN:
set_app_options -list \
{time.high_fanout_net_pin_capacitance 0pF
time.high_fanout_net_threshold 50}
update_timing -full
report_qor -summary -include setup
Since the results are the same, the violations are not caused by HFNs.
3. Notice that the WNS for the scenario in the -40 °C corner, func.ss_m40c,
is not violating (has a positive WNS).
Question 8. Can you think of a reason why one scenario meets setup ZIC
timing, while the others have a huge WNS violation?
...................................................................................................
...................................................................................................
...................................................................................................
Next, you will investigate the large setup timing violations further, to confirm
that optimization was, indeed, not done for these scenarios.
4. Generate a timing report for the five worst violating paths:
5. In the view window, look at the incremental delay numbers in the column
labeled Incr, for the first reported path. You should notice a couple of large
(>> 1 ns) delays. If you look to the left of those large delays, at the name of the
standard cell reference shown in parentheses, you will find that they are
small-sized (1x or 2x), high Vth gates (ending with X1_HVT or X2_HVT), which are
the slowest cells available! Scroll down and look at the other four paths: You
will find the same thing there. In fact, you should notice that these slowest
cells are being used all along the timing-critical paths.
This is a clear indication that these paths were not optimized, which confirms
that these scenarios were not considered during synthesis. Ideally, it would be
best to re-synthesize the design under all key setup timing scenarios, which
would provide a better starting netlist to IC Compiler II. This would result in a
better initially-placed design, requiring less optimization, and thus better
placement run-time. If re-synthesizing the design is not an option, there is still
a good likelihood that optimization during the placement, CTS and routing
phases will be able to eliminate, or drastically improve the timing in the other
scenarios.
6. Exit the timing report view window.
You have completed all recommended timing setup steps!
7. Exit out of IC Compiler II:
exit
Answers / Solutions
Question 1. What are the names of the modes, corners and scenarios
that will be created?
Modes: func, test
Corners: ss_125c, ss_m40c, ff_125c, ff_m40c
Scenarios: func.ss_125c, func.ss_m40c,
func.ff_125c, func.ff_m40c, test.ss_125c,
test.ff_125c
15. Make the necessary correction in the appropriate constraints file located in
the ORCA_TOP_constraints directory.
In the ORCA_TOP_c_ss_m40c.tcl file make the following correction:
set_temperature -40
Question 8. Can you think of a reason why one scenario meets setup
ZIC timing, while the others have a huge WNS violation?
One common reason is that synthesis was not performed in
an MCMM environment. In our case, it was performed for a
single mode and corner: the functional mode (func), and
the slow-slow process at -40 °C corner (ss_m40c).
Learning Objectives
Lab Duration:
70 minutes
Instructions
Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers, and to
look for hints.
In this task, you will load the resulting design after placement and optimization and
perform pre-CTS checks.
1. Change to the work directory for the CTS lab, then load the placed design:
UNIX% cd lab7_cts
UNIX% icc2_shell -gui -f load.tcl
The script will make a copy of the place_opt block and open it.
2. Open the run.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
3. Generate a timing QoR summary.
report_qor -summary
...................................................................................................
...................................................................................................
report_clocks
...................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
report_clocks -skew
...................................................................................................
...................................................................................................
report_clocks -groups
...................................................................................................
...................................................................................................
...................................................................................................
8. Generate a clock tree summary report for both modes (default report):
...................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
1. Since CTS setup (clock tree balancing constraints, NDR rules, etc.) will be
covered in the next unit, perform these setup steps by sourcing a file:
source scripts/cts_ex_ndr.tcl
2. Ensure that the correct scenarios are enabled for hold fixing. To find all active
scenarios that are enabled for hold:
Question 11. Are the scenarios configured properly for hold fixing?
...................................................................................................
Complete the scenario setup for hold (look at run.tcl). Also, double check
that all scenarios are active.
3. You can control which buffers or delay cells should be used for hold fixing
using set_lib_cell_purpose. Execute the three corresponding
commands from the run.tcl file.
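The general shape of such a restriction is sketched below; run.tcl contains the lab's actual three commands, which may differ. The pattern */DELAY_CELL_PATTERN* is a hypothetical placeholder, not a cell name from this lab's libraries.

```tcl
# Assumed approach: first exclude every library cell from hold fixing ...
set_lib_cell_purpose -exclude hold [get_lib_cells */*]

# ... then re-include only the buffers or delay cells you want used.
# "*/DELAY_CELL_PATTERN*" is a placeholder; substitute real library cell names.
set_lib_cell_purpose -include hold [get_lib_cells */DELAY_CELL_PATTERN*]
```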
4. If you like, you can increase the effort for maximum hold timing
optimization, although it is not required for this design:
set_app_options \
-name opt.dft.clock_aware_scan_reorder \
-value true
For our design, we do not want the I/O latencies to be updated, except for
v_PCI_CLK. This clock is a virtual clock, so latency adjustment needs to be
configured to update the clock:
1. Now that you have analyzed the clocks using the manual methods discussed
earlier, generate a clock tree check report to see what other potential problems
might appear during CTS:
v check_clock_trees
You should see a long report with a “Summary” and a “Details” section.
The summary section will show you how many problems were found of each
problem category, and if there is a suggested solution, for example:
CTS-019 2 None Clocks propagate to output ports
CTS-905 4 None There are clocks with no sinks
For more details, review the Details section: The detailed section for
CTS-905 (at the end of the report) complains about clocks without sinks,
which you have already analyzed in earlier steps. Four clocks are listed,
related to ports SD_DDR_CLK and SD_DDR_CLKn (repeated for each mode,
func and test).
For the purposes of this lab, all of these Warnings are acceptable, and can be
ignored.
Note: For this design, the runtime for CCD is higher than for classic CTS.
If you are done early with one flow, try the other flow.
Note: If you are performing this step after you have already performed
option B (CCD), then you will need to re-load the design and
re-apply all settings. To simplify this, just restart ICC II and use the
script scripts/load_all.tcl – this script will re-load the
design and set up everything up to this point.
The run should only take a few minutes. The above command will execute the
first two stages: build_clock and route_clock.
2. Once the run has completed, review the CTS skew results. After looking at all
results in all modes/corners, record results for the worst corner for the
functional mode, ss_125c:
report_clock_qor
v report_clock_qor -type local_skew
report_clock_qor -type area
report_clock_qor -mode func -corner ss_125c \
-significant_digits 3
Record the global/local skew/max latency numbers for the indicated clocks, in
the slowest corner ss_125c:
ss_125c Corner Global Skew Local Skew Max Latency
SYS_2x_CLK
SDRAM_CLK
5. Generate a different skew report using the clock timing report: Concentrate on
the func mode and the worst corner:
The reported Skew is the difference between the max and min Latency
numbers, plus or minus the clock reconvergence pessimism (CRP).
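The relationship can be illustrated with a short plain-Tcl calculation. The latency values and the signed CRP adjustment below are hypothetical, chosen only to show the arithmetic; the sign and size of the CRP term depend on the path pair being analyzed.

```tcl
# Hypothetical sink latencies (ns) for one clock in one corner; not lab data.
set latencies {0.50 0.75 0.625}
# Hypothetical signed CRP adjustment (ns).
set crp_adjust -0.125

set max_lat  [tcl::mathfunc::max {*}$latencies]
set min_lat  [tcl::mathfunc::min {*}$latencies]
set raw_skew [expr {$max_lat - $min_lat}]      ;# max latency minus min latency
set skew     [expr {$raw_skew + $crp_adjust}]  ;# then apply the CRP adjustment
puts "max=$max_lat min=$min_lat raw=$raw_skew reported=$skew"
```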
...................................................................................................
...................................................................................................
In this task you will optimize the non-clock network logic to address any timing
violations, and you will perform hold-fixing for the first time.
report_qor -summary
Note down the worst-case (Design) WNS/TNS/NVE numbers for setup and
hold:
WNS TNS NVE
Setup
Hold
...................................................................................................
In this task, you will use the CCD flow to build the clock trees and optimize the
logic. Note that CCD will take much longer to run compared to classic CTS. If
you have chosen option B right away, then continue with the first step.
Note: If you are performing this step after you have already performed
option A, then you will need to re-load the design and re-apply all
settings. To simplify this, just restart ICC II and use the script
scripts/load_all.tcl – this script will re-load the design
and set up everything to this point, ready for CCD.
1. Source the following script – this will make things a little more interesting for
CCD, by introducing a few setup timing violations, which will need to be
fixed by the CCD algorithms. Have a look at a timing summary afterwards:
source scripts/margins_for_ccd.tcl
report_qor -summary
2. Note down the worst-case (Design) WNS/TNS/NVE numbers for setup and
hold:
WNS TNS NVE
Setup
Hold
Task 8. Analysis
1. Have a look at the synthesized clock tree using the clock abstract graph GUI.
In the GUI select Window → Clock Tree Analysis Window.
2. In the new CTSWindow, check the little box in the top right corner next to
‘Filter clock by corner’. This allows us to analyze clock latencies, which
vary by corner. Make sure the ss_125c corner is selected.
3. In the main section of the window, where all the clocks are listed, go to the
func scenario section, right click on the SYS_2x_CLK, then select Clock
Tree Latency Graph of selected Corner.
[Screenshot: latency graph for SYS_2x_CLK, classic CTS flow]
[Screenshot: latency graph for SYS_2x_CLK, CCD flow]
...................................................................................................
Answers / Solutions
Question 11. Are the scenarios configured properly for hold fixing?
You will find that there are three *ff* scenarios, of which
one is not configured for hold fixing: test.ff_125c.
Since this is a fast scenario, make sure it is enabled for hold
fixing.
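One possible way to apply this fix is sketched below, using the scenario name from the answer above. The lab's own commands are in run.tcl and may differ.

```tcl
# Enable hold analysis for the fast test scenario and keep it active.
set_scenario_status [get_scenarios test.ff_125c] -active true -hold true

# Review the resulting setup/hold/active flags for all scenarios.
report_scenarios
```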
Learning Objectives
The purpose of this lab is for you to become familiar with the
setup steps for clock tree balancing and NDRs, as well as
timing/DRCs.
After CTS setup, you will run the build_clock /
route_clock clock tree synthesis stages to confirm the
results.
Lab Duration:
45 minutes
Instructions
Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers, and to
look for hints.
1. Change to the work directory for the CTS lab, then load the starting design:
UNIX% cd lab8_cts
UNIX% icc2_shell -gui -f load.tcl
The script will open a block that is ready for the upcoming tasks.
2. Open the run.tcl file in the ICC II script editor, as in previous labs.
This file contains all the commands that you will be executing in this lab.
Select the commands from this file and use Run Selection, instead of typing
them yourself, to save time and avoid typing errors.
3. Select the menu Window → Clock Tree Analysis Window and answer Yes
when asked “… Do you want to continue?”.
4. Expand the SDRAM_CLK entry (click on the “+” in front of it) and you will see
the two SD_DDR_CLK* clocks. Notice that the is_generated column is set
to true for these clocks, and their sources are sd_CK*. Also, you will see
“M” and “G” symbols in front of the clocks, which identify them as Master
or Generated clocks, respectively. In summary: The SD_DDR_CLK and
SD_DDR_CLKn clocks are generated from the master clock SDRAM_CLK. The
SDRAM_CLK clock is applied to its source port sdram_clk. The generated
clocks are applied to their source ports sd_CK and sd_CKn, respectively.
In the view window, type Ctrl-F (or click on Search…) then enter the search
string “sd_CK” and press enter. You should see the line beginning with
sd_CK [out Port], and at the end of the line you will see a balance point
exception (something other than an implicit SINK PIN).
Question 1. What balance point exception is set on sd_CK, and why?
...................................................................................................
...................................................................................................
Many designs have special or non-default requirements for their clock trees, in
which case executing a default clock tree synthesis is not sufficient.
CTS will only balance the delays (minimize skew) to sink pins, which, by default,
are clock pins of sequential cells or macros. If there are additional pins that need to
be balanced along with these clock pins, ICC II needs to be explicitly told about
them prior to CTS.
Figure 1 shows the SDRAM interface. The clock SDRAM_CLK is connected directly to
the select pins of muxes, which in turn will drive the output ports of the ORCA_TOP
block. The dummy mux driving sd_CK is required because of the tight timing
requirement of the DDR SDRAM interface, which produces data at its output data
ports on both the rising and the falling clock edges. This design requires that the
clock skew from SDRAM_CLK to sd_DQ_out and sd_CK be optimized. By default,
select (S0) pins are marked as implicit ignore pins. To have CTS balance the skew
you need to redefine these select pins as sink pins.
1. In the view window that should still be open (see the last step of the previous
Task), notice that just above and below the sd_CK line, the MUX select pins
described in Figure 1, I_SDRAM_TOP/I_SDRAM_IF/sd_mux_CK*/S0, are,
in fact, also implicit ignore pins.
2. Click on Dismiss Search and then Exit the view window.
As a reminder, select and run the commands from the run.tcl script using
the built-in script editor.
set_clock_balance_points \
-modes [all_modes] \
-balance_points [get_pins "I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*/S0"]
report_clock_balance_points
You should see the pins listed below the following lines:
Clock Independent:
Balance Points:
The balance points are “Clock Independent” because you did not
specify a clock to go with the exception. If the exception is intended to be
balanced with regard to a specific clock, and there are multiple clocks
reaching this point, then a clock should be specified. This is not the case here.
5. Have another look at a clock structure report, and search for sd_mux:
...................................................................................................
Question 3. How is the sd_CK port labeled now? What does this mean?
...................................................................................................
...................................................................................................
set_dont_touch \
[get_cells "I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*"]
report_dont_touch I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*
...................................................................................................
11. Instruct CTS to not change the register that is used as the clock divider:
set_dont_touch \
[get_cells "I_CLOCKING/sys_clk_in_reg"]
report_clock_tree_options
15. When performing CTS, it is generally desirable to use specific cells for
synthesis, instead of letting ICC II choose any cell from the library, for
example: Cells which help to reduce skew (identical rise/fall ramp times);
Cells which help to better balance between power consumption and
speed/drive-strength, size, etc.
CTS-specific cells are defined using:
set_lib_cell_purpose -include cts
First, automatically identify the gates and ICGs that are already on the clock
network, and their logical equivalents (leq’s):
Note that the above will only work if all library cells already have the cts
purpose.
16. Have a look at the file that was created - cts_leq_set.tcl.
As you can see, all cells that are on the clock network currently have been
identified, along with their LEQ’s. You could copy and paste the commands to
a new file, uncomment the appropriate lines, and source it later; however, you
do NOT have to do this, because we have already done this for you.
17. Next, choose the buffers and/or inverters you want to use for the clock tree –
this is done using the following lines:
source scripts/cts_include_refs.tcl
19. Generate a report to ensure that the correct lib-cell purpose was indeed set,
and dont_touch was removed, on the key CTS cells:
In this task you will specify CTS non-default routing rules, as well as clock cell
spacing rules.
report_routing_rules -verbose
You should see the metal layer details for each of the two rules that were
created. In addition, you should see a section for the vias.
5. Now verify where the rules were applied:
report_clock_routing_rules
The report shows which net segments (net type) the rules apply to (sink
overrides all), and the min/max layer constraints for each clock segment.
Verify that the master clock sources are input ports, and that they are all
constrained by either a Driving Cell or input Transition.
Question 7. Are all clock ports constrained by either a Driving Cell or
input Transition?
...................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
2. Fix the problem found above by adding a driving cell to the ate_clk port.
Driving cells need to be added in all scenarios, because this constraint is
scenario-specific.
Specify NBUFFX16_RVT as the driving cell for the port ate_clk, then report
the clock ports again:
report_clock_settings
Note: To see the correct max transition information, you have to scroll down
past the second ##Global section, and search for the “Mode = func”
section which lists all the individual clocks. The report first lists Global
settings for all modes/corners, which were not set in our case. Instead, we
applied clock-specific settings (to all clocks) by using “-clock_path
[get_clocks]”, in the current func mode.
1. Build and route the clock trees. Remember to disable CCD; it is not needed
for now:
2. In the GUI, turn off the visibility of power and ground nets, and zoom in to
have a closer look at the clock routes. If you hover the mouse cursor over a
net, in the query window that appears, you will be able to see the NDR routing
rule that has been applied.
3. Report the skew between all the sd_mux* pins, which were defined as sink
pins in an earlier Task. An easy way to do this is:
report_clock_qor \
-to I_SDRAM_TOP/I_SDRAM_IF/sd_mux_*/S0 \
-corners ss_125c
Answers / Solutions
Question 2. How are the S0 (select) pins of the MUXes labeled now?
Question 3. How is the sd_CK port labeled now? What does this mean?
The sd_CK port is now described as [BEYOND
EXCEPTION], since the port is in the fanout of (beyond) the
BALANCE PIN exception on the mux. CTS non-default
design rules, as well as general max transition and
capacitance design rules will be applied on the “beyond
exception” sections of the clock network.
Question 5. Which net segment(s) of the clock tree do the two clock
routing rules apply to?
The bottom routing rule $CTS_LEAF_NDR_RULE_NAME
(cts_w1_s2), is applied to the sink segments of the clock
net (-net_type sink).
The top rule $CTS_NDR_RULE_NAME (cts_w2_s2_vlg),
is applied to the remaining root and internal segments
of the clock net, since -net_type was not specified.
Learning Objectives
Lab Duration:
70 minutes
Instructions
Answers / Solutions
You are encouraged to refer to the back of the lab to verify your answers.
1. Change to the work directory for the Routing lab, then load the post-CTS
design:
UNIX% cd lab9_11_route_signoff
UNIX% icc2_shell -gui -f load.tcl
The script will make a copy of the clock_opt block and open it.
2. Generate a timing QoR summary.
report_qor -summary
...................................................................................................
...................................................................................................
...................................................................................................
To support a more immersive and interesting lab experience, there are no step-by-
step instructions for this lab.
Instead, you are asked to open the file run.tcl in an editor (or using the ICC II
script editor), and exercise the commands line by line. We are specifically asking
you not to just source the entire file, as this would defeat the purpose, which is to
understand how all the options and commands play together. If there is an option
that does not make sense, have a look at its man page.
The following sections provide additional information and comments. The sections
are ordered in such a way that you can refer to them as you go through the script.
If you like, you can diverge from the commands in run.tcl, or you could try
different efforts, different settings etc. Note, though, that the runtimes might vary.
All in all, the runtimes are very quick, so you are encouraged to experiment.
The following section is designed to be a guide through the lab. It contains
information on the items that have to be configured and run, and some additional
questions.
If you need help, talk to your instructor.
Pre-routing Checks
Before you route the design it is best to ensure that there are no issues that will
prevent the router from doing its job.
Question 2. Is the design ready for routing?
...................................................................................................
...................................................................................................
...................................................................................................
Antenna
Antenna definitions are commonly supplied in a separate Tcl file, specific to the
technology, using the following commands (these are the same commands used in
IC Compiler):
• define_antenna_rule
• define_antenna_layer_rule
• define_antenna_area_rule
• define_antenna_accumulation_mode
• define_antenna_layer_ratio_scale
In addition, there are application options that control how antenna violations are
handled. Use the report_app_options command shown in run.tcl.
Crosstalk Prevention
Crosstalk prevention tries to ensure that timing-critical nets are not routed in parallel
over long distances. Prevention can occur in the global routing and the track assign
stages. The current recommendation is to enable prevention during the track assign
stage only. In order for crosstalk prevention to occur, crosstalk (or signal integrity)
analysis must also be enabled. In order to make post-route analysis and optimization
more interesting (showing SI violations that need to be fixed during route_opt),
you can choose to do that “artificially” by not enabling SI analysis during routing.
Secondary PG Routing
Secondary PG pins are power pins on special cells like level shifters or isolation
cells. In addition to the regular power supply provided through the standard cell
rails, these secondary power pins need to be routed using the standard router.
There are a few routing parameters used to configure the routing behavior. In
addition, it is common to define non-default rules for these power connections.
After performing the secondary PG routing as shown in the script, identify the low-
to-high level shifters (in the PD_RISC_CORE voltage area) and examine their power
routing, to answer the following question:
Question 3. What is the name of the secondary PG pins, and where do
they connect to?
...................................................................................................
...................................................................................................
...................................................................................................
...................................................................................................
Question 5. How do you change the default, and how do you force the
router to run through all iterations even though the router
might not see any improvements
(Hint: report_app_options *iterat*)?
...................................................................................................
...................................................................................................
In the popup, select zroute.err and click on Open Selected. In the new window,
you can select the errors from the list, which causes the layout view to zoom to that
violation.
Close the Error Browser.
Signal Integrity
Signal integrity analysis should be turned on before routing. This will instruct the
extraction engine to extract cross-coupling capacitances, and instruct the router to
perform timing analysis with delta delays using these coupling capacitances. This is
important when performing timing-driven routing.
For this lab, if you left SI analysis off before auto-route, turn it on afterwards to see
the SI effects, and to allow SI-related violations to be fixed during route_opt.
For better correlation with PrimeTime, you should also enable timing window
analysis, as shown in the script.
Post-Route Optimization
Post-route optimization is performed using route_opt, which performs timing,
logical DRC, area and (optionally) CCD and power optimization.
There are application options you have to set in order to enable CCD and power
optimization.
For best correlation with PrimeTime, you should enable PT delay calculation. In this
lab, don’t perform StarRC InDesign extraction in route_opt. You will be using
StarRC later when performing ECO Fusion.
ECO Fusion
After completing route_opt, it’s important to analyze signoff timing in
PrimeTime using StarRC parasitic extraction. Any violations uncovered by PT can
be fixed using PT’s physical ECO.
Using Fusion, the entire ECO process can be controlled from within ICC II.
Review the run.tcl file for the necessary commands, and perform one round of
ECO fixing. For Fusion, you will require ICC II, StarRC and PrimeTime SI.
Note that after the ECOs have been implemented, you have to analyze timing using
the command check_pt_qor. You should not run ICC II’s native timing reporting
commands (report_qor, report_timing, …).
Answers / Solutions
Question 5. How do you change the default, and how do you force the
router to run through all iterations even though the router
might not see any improvements?
You change the default using
route_auto -max_detail_route_iterations <#>
To force route_auto to actually run through all iterations
you need to change an application option from its default
(false):
route.detail.force_max_number_iterations true
SYNTHESIS:
Synthesis is the process of converting RTL into a technology-specific gate-level netlist.
Input files required
.lib - timing info of standard cells & macros.
.v - RTL code.
SDC - timing constraints.
UPF - power intent of the design.
Scan config - scan-related info like scan chain length, scan IO, and which flops
are to be considered in the scan chains.
RC coefficient file (TLUplus).
LEF/FRAM - abstract view of the cell.
Floorplan DEF - locations of IO ports and macros.
These three methods are done internally in the logic synthesis tool (1. Genus,
2. Design Compiler) and are not visible to the designer.
3. Import constraints and UPF
SDC – for timing constraints
If the design consists of multiple power domains, a UPF file is needed
4. Clock gating
Because of the high switching activity of the clock, a lot of dynamic power is consumed. To lower the
dynamic power, the clock gating technique is used.
A clock gating circuit consists of an AND gate in the clock path with one input as enable.
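The effect of the AND gate described above can be illustrated with a small sketch. It is only a sampled-waveform model, not a description of any library cell: the function names and the enable pattern are made up for illustration, and the enable is assumed to change only while the clock is low (real integrated clock gating cells add a latch to guarantee this).

```python
def gated_clock(clk, enable):
    """AND-gate model: the output is high only when clock and enable are both high."""
    return [c & e for c, e in zip(clk, enable)]


def count_toggles(wave):
    """Number of 0->1 or 1->0 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a != b)


# Eight clock cycles, two samples per cycle (low phase, then high phase).
clk = [0, 1] * 8
# Hypothetical enable: active for the first two and the last two cycles only.
enable = [1] * 4 + [0] * 8 + [1] * 4

gclk = gated_clock(clk, enable)
free_running_toggles = count_toggles(clk)
gated_toggles = count_toggles(gclk)
print(free_running_toggles, gated_toggles)
```

Fewer toggles on the gated clock net means less switching activity on everything it drives, which is where the dynamic power saving comes from.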
5. DFT (Design for Testing) insertion
DFT circuits are used for testing each and every node in the design.
9. Outputs of Synthesis
netlist
SDC
UPF
ScanDEF- information of scan flops and their connectivity in a scan chain
Checks to be done after synthesis (sanity checks before floorplan)
The RTL and netlist are logically equivalent (LEC/FM)
Floating pins
Multi-driven inputs
Undriven inputs
Undriven outputs
Normal cells in clock path
Pin direction mismatch
Don’t-use cells
Setup timing
CLP check --- whether always-on buffers are placed or not
Cell profiling
Buffer count
Floor Planning:
Floorplanning is the process of placing blocks/macros in the chip/core area.
The floorplan determines the size of the die and the I/O pin/pad placement, and creates the power
ground (PG) connections.
Inputs required
1. Gate level netlist
2. LEF,LIB
3. Timing constraints (SDC)
4. Power Intent (UPF / CPF)
5. FP DEF & Scan DEF
Cell orientation
Fly/flight lines are virtual connections between macros, and also from macros to I/O pads.
Flight lines are of three types:
1. Macro to macro fly lines
2. pin to pin fly lines
3. macro to I/O fly lines
5.IO placements
6.Add
— End Caps to prevent DRC violations
— Well Taps prevent Latch-up
6.Power Planning
Grid is created to distribute power to all the cells
Width of the metal is available in LEF
• Rings (Vertical and horizontal)
— VDD and VSS Rings are formed around the Core and Macro
• Stripes
— Carries VDD and VSS around the chip
• Rails (Special Route)
— Connect VDD and VSS to the standard cell
7.Macro placement
Guidelines will be provided for macro placement
— Reserve enough room around Macros for IO Routing
— Provide necessary Blockages around the Macro
8. Blockages
— Placement Blockage & Routing Blockage
— Both of the Blockages can again be classified as-
• Hard, Soft and Partial Blockages
— Hard Blockage
• Complete Standard Cell Blockage
—Soft Blockage
• Non-Buffering Blockage
— Partial Blockage
• Partial Standard Cell Blockage and is used to avoid congestion
• We can Block Standard Cells as per the required percentage value
— Keep-out/ Halo
• Halo is similar to Soft Blockage (Terminology in Cadence EDI)
• It is basically a keep-out margin around a Macro
• Halo respects Macro while other Blockages respect location,
i.e., even if the Macro is moved, the Halo moves along with it
Checks to be done:
How to qualify Floorplan?
1. Max density
2. Check PG connections (For macros & pre-placed cells only)
3. Check the power connections to all Macros,
4. All the macros should be placed at the boundary
5. Remove all unnecessary placement blockages & routing blockages (which might
be put during floor-plan & pre-placing)
6. Check power connection to power switches
7. Check pin placements
8. Power-related shorts and opens in the design, IR drop
Placement
All the standard cells are placed in the design.
• Placement Stages
— Global Placement
— Detail Placement
— Placement Legalization
— In-Place Optimizations
• Global/ Coarse Placement
— approximately place the cells
— Cells are not legally placed and there can be overlaps
• Detail/ Legal Placement
— Cells have legalized locations to avoid cell overlapping
• Placement Legalization
— Placed Macros are legally oriented with Standard Cell Rows
• In-Place Optimizations
— Scan Chain Reordering
Checks:
1.placement congestion
2.timing checks
3.don’t use and don’t touch attributes
4.max trans and max cap
5.cell profiling
6.CLP check
7.High fan out
8.Secondary PG connection
CTS:
So far we have used an ideal clock. In CTS, a physical clock tree structure is built from the clock
source to the sinks.
The clock should be distributed evenly to all sequential elements in the design.
Goal:
Meet the clock tree DRC.
Max. Transition.
Max. Capacitance.
Max. Fanout.
Minimal skew.
Minimum insertion delay.
These details are present in the .lib file.
Checks to be done before CTS
1. Check legality.
2. verify PG connections.
3. Timing QoR (setup should be under control).
4. Timing DRVs - max tran, max cap, max fanout.
5. Congestion hotspots.
6. Check & qualify don’t_touch, don’t size attributes on clock components.
Clock buffers and clock inverters are used to maintain a 50% duty cycle.
Several structures are used for clock trees:
H-Tree
X-Tree
Fish bone
Before CTS, all clock pins are driven by a single clock source.
After CTS, the buffer tree is built to balance the loads and minimize the skew.
Because many clock buffers are added, congestion may increase, which can cause setup and hold
violations.
Setup Fixing:
Routing
Gate delay
Transistors within a gate take a finite time to switch. This means that
a change on the input of a gate takes a finite time to cause a change on the
output.[Magma]
Network Delay(latency)
Insertion delay
The delay from the clock definition point to the clock pin of the
register.
Transition delay
Slew
Rise Time
Rise time is the difference between the time when the signal crosses
a low threshold to the time when the signal crosses the high threshold. It
can be absolute or percent.
Low and high thresholds are fixed voltage levels around the mid
voltage level or it can be either 10% and 90% respectively or 20% and 80%
respectively. The percent levels are converted to absolute voltage levels at
the time of measurement by calculating percentages from the difference
between the starting voltage level and the final settled voltage level.
Fall Time
Fall time is the difference between the time when the signal crosses
a high threshold to the time when the signal crosses the low threshold.
The low and high thresholds are fixed voltage levels around the mid
voltage level or it can be either 10% and 90% respectively or 20% and 80%
respectively. The percent levels are converted to absolute voltage levels at
the time of measurement by calculating percentages from the difference
between the starting voltage level and the final settled voltage level.
For an ideal square wave with 50% duty cycle, the rise time will be
0. For a symmetric triangular wave, this is reduced to just 50%.
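The threshold-based definitions of rise and fall time above can be checked numerically. The following is a minimal sketch, assuming a waveform given as sampled (time, voltage) points and linear interpolation between samples; the function names and the sample waveform are illustrative and not taken from any tool.

```python
def crossing_time(times, volts, level, rising):
    """Return the first time the waveform crosses `level`, by linear interpolation."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def transition_time(times, volts, low_pct=10.0, high_pct=90.0):
    """Rise or fall time between the low and high percentage thresholds.

    The percentages are converted to absolute voltages from the difference
    between the starting level and the final settled level.
    """
    v_start, v_end = volts[0], volts[-1]
    v_min, v_max = min(v_start, v_end), max(v_start, v_end)
    swing = v_max - v_min
    v_low = v_min + swing * low_pct / 100.0
    v_high = v_min + swing * high_pct / 100.0
    if v_end > v_start:  # rising edge: low threshold is crossed first
        return (crossing_time(times, volts, v_high, True)
                - crossing_time(times, volts, v_low, True))
    # falling edge: high threshold is crossed first
    return (crossing_time(times, volts, v_low, False)
            - crossing_time(times, volts, v_high, False))


# Hypothetical 0 V -> 1 V ramp between t = 1 ns and t = 3 ns.
t_ns = [0.0, 1.0, 2.0, 3.0, 4.0]
rise_10_90 = transition_time(t_ns, [0.0, 0.0, 0.5, 1.0, 1.0])
rise_20_80 = transition_time(t_ns, [0.0, 0.0, 0.5, 1.0, 1.0], 20.0, 80.0)
fall_10_90 = transition_time(t_ns, [1.0, 1.0, 0.5, 0.0, 0.0])
print(rise_10_90, rise_20_80, fall_10_90)
```

For the same edge, the 20%/80% thresholds always give a shorter transition time than the 10%/90% thresholds, so the threshold pair must be stated whenever a slew value is quoted.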
The rise/fall definition is set on the meter to 10% and 90% based on
the linear power in Watts. These points translate into the -10 dB and -0.5
dB points in log mode (10 log 0.1) and (10 log 0.9). The rise/fall time values
of 10% and 90% are calculated based on an algorithm, which looks at the
mean power above and below the 50% points of the rise/fall times.
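The conversion from linear power fractions to log-mode values mentioned above is just 10·log10 of the fraction. A short sketch (the helper name is illustrative) shows that the 90% point is about -0.46 dB, which the text rounds to -0.5 dB:

```python
import math


def fraction_to_db(power_fraction):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(power_fraction)


db_10_percent = fraction_to_db(0.1)
db_90_percent = fraction_to_db(0.9)
print(round(db_10_percent, 2), round(db_90_percent, 2))
```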
Path delay
Path delay is also known as pin to pin delay. It is the delay from the
input pin of the cell to the output pin of the cell.
Net delay
The difference between the time a signal is first applied to the net
and the time it reaches other devices connected to that net.
It is due to the finite resistance and capacitance of the net. It is also
known as wire delay.
Wire delay = f(Rnet, Cnet + Cpin)
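The text only states that wire delay is some function of Rnet, Cnet and Cpin. One common first-order choice for that function is the Elmore (RC product) estimate, sketched below as an assumption rather than as the model any particular tool uses. In the distributed case only half of the wire capacitance is seen through the full wire resistance; all values are hypothetical.

```python
def wire_delay(r_net, c_net, c_pin, distributed=True):
    """First-order RC estimate of wire delay in seconds.

    distributed=True  -> R * (Cnet / 2 + Cpin)  (Elmore delay of a uniform wire)
    distributed=False -> R * (Cnet + Cpin)      (single lumped RC, pessimistic)
    """
    c_wire = c_net / 2.0 if distributed else c_net
    return r_net * (c_wire + c_pin)


# Hypothetical net: 200 ohm, 50 fF of wire capacitance, 5 fF load pin.
delay_distributed = wire_delay(200.0, 50e-15, 5e-15)
delay_lumped = wire_delay(200.0, 50e-15, 5e-15, distributed=False)
print(delay_distributed, delay_lumped)
```

Both forms grow with the net resistance and with the total capacitance, which is the dependence the formula in the text expresses.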
Propagation delay
It is taken as the average of the high-to-low and low-to-high propagation delays, i.e. Tpd =
(Tphl + Tplh)/2.
Phase delay
Cell delay
Intrinsic delay
Intrinsic delay is the delay internal to the gate, from the input pin of the cell to
the output pin of the cell.
It is defined as the delay between an input and output pair of a cell,
when a near-zero slew is applied to the input pin and the output does not
see any load. It is predominantly caused by the internal
capacitance associated with its transistors.
This delay is largely independent of the size of the transistors forming
the gate, because increasing the transistor size also increases the internal capacitance.
Extrinsic delay
Input delay
Input delay is the time at which the data arrives at the input pin of the
block from the external circuit, with respect to the reference clock.
Output delay
Output delay is the time, required by the external circuit, before the
reference clock edge by which the data has to arrive at the output pin of
the block.
Exit delay
Unateness
Jitter
From cycle to cycle the period and duty cycle can change slightly due
to the clock generation circuitry. This can be modeled by adding uncertainty
regions around the rising and falling edges of the clock waveform.
Sources of Jitter
Skew
The difference in the arrival of clock signal at the clock pin of different
flops.
Two types of skews are defined: Local skew and Global skew.
Local skew
The difference in the arrival of clock signal at the clock pin of related
flops.
Global skew
The difference in the arrival of clock signal at the clock pin of non
related flops.
When data and clock are routed in the same direction, it is positive
skew.
When data and clock are routed in opposite directions, it is negative skew.
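The skew definitions above can be made concrete with a few clock arrival times. This is a sketch under stated assumptions: the flop names and latencies are invented, "related" flops are given as explicit launch/capture pairs, global skew is taken as the largest difference over all sinks (a common convention), and the sign convention used is capture latency minus launch latency.

```python
def signed_skew(latency, launch, capture):
    """Skew of one launch/capture pair: positive when the capture clock arrives later."""
    return latency[capture] - latency[launch]


def local_skew(latency, related_pairs):
    """Largest absolute skew among flop pairs that exchange data."""
    return max(abs(signed_skew(latency, a, b)) for a, b in related_pairs)


def global_skew(latency):
    """Difference between the latest and earliest clock arrival over all sinks."""
    return max(latency.values()) - min(latency.values())


# Hypothetical clock arrival times (ns) at four flop clock pins.
latency = {"ff1": 1.00, "ff2": 1.05, "ff3": 1.20, "ff4": 0.95}
# Hypothetical timing paths: ff1 -> ff2 and ff2 -> ff3.
pairs = [("ff1", "ff2"), ("ff2", "ff3")]

skew_local = local_skew(latency, pairs)
skew_global = global_skew(latency)
skew_ff2_ff3 = signed_skew(latency, "ff2", "ff3")
print(skew_local, skew_global, skew_ff2_ff3)
```

Local skew can never exceed global skew, because every related pair is also a pair of sinks in the full set.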
Recovery Time
Equation 1:
Recovery Slack Time = Data Required Time – Data Arrival Time
Data Arrival Time = Launch Edge + Clock Network Delay to Source
Register + Tclkq+ Register to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to
Destination Register – Tsetup
If the asynchronous control is not registered, the equations shown in Equation
2 are used to calculate the recovery slack time.
Equation 2:
Recovery Slack Time = Data Required Time – Data Arrival Time
Data Arrival Time = Launch Edge + Maximum Input Delay + Port to
Register Delay
Data Required Time = Latch Edge + Clock Network Delay to
Destination Register – Tsetup
If the asynchronous reset signal is from a port (device I/O), you must
make an Input Maximum Delay assignment to the asynchronous reset pin
to perform recovery analysis on that path.
Removal Time
Equation 3
Removal Slack Time = Data Arrival Time – Data Required Time
Data Arrival Time = Launch Edge + Clock Network Delay to Source
Register + Tclkq of Source Register + Register to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to
Destination Register + Thold
If the asynchronous control is not registered, the equations shown in
Equation 4 are used to calculate the removal slack time.
Equation 4
Removal Slack Time = Data Arrival Time – Data Required Time
Data Arrival Time = Launch Edge + Input Minimum Delay of Pin +
Minimum Pin to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to
Destination Register +Thold
If the asynchronous reset signal is from a device pin, you must
specify the Input Minimum Delay constraint to the asynchronous reset pin
to perform a removal analysis on this path.
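As a worked example of the registered-control case (Equations 1 and 3), the sketch below plugs numbers into the slack expressions. It assumes that the recovery check behaves like a setup check, so Tsetup is subtracted in the required time, and that the removal check behaves like a hold check, so Thold is added. All delay values and parameter names are hypothetical, in nanoseconds.

```python
def recovery_slack(launch_edge, latch_edge, clk_to_src, tclkq,
                   reg_to_reg, clk_to_dst, tsetup):
    """Recovery slack = data required time - data arrival time (setup-like check)."""
    arrival = launch_edge + clk_to_src + tclkq + reg_to_reg
    required = latch_edge + clk_to_dst - tsetup
    return required - arrival


def removal_slack(launch_edge, latch_edge, clk_to_src, tclkq,
                  reg_to_reg, clk_to_dst, thold):
    """Removal slack = data arrival time - data required time (hold-like check)."""
    arrival = launch_edge + clk_to_src + tclkq + reg_to_reg
    required = latch_edge + clk_to_dst + thold
    return arrival - required


# Recovery: reset released at edge 0, checked at the next edge of a 10 ns clock.
rec = recovery_slack(launch_edge=0.0, latch_edge=10.0, clk_to_src=1.0,
                     tclkq=0.2, reg_to_reg=3.0, clk_to_dst=1.1, tsetup=0.3)
# Removal: launch and latch edges coincide, using the shortest data path.
rem = removal_slack(launch_edge=0.0, latch_edge=0.0, clk_to_src=1.0,
                    tclkq=0.2, reg_to_reg=0.5, clk_to_dst=1.1, thold=0.1)
print(round(rec, 3), round(rem, 3))
```

A positive value in either case means the asynchronous control is released safely away from the active clock edge.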
Clock Tree Synthesis- part 1
by signoff-scribe | Oct 16, 2017 | Weekly-Training-Sessions
Author : Nishant Lamani, Physical Design Engineer, SignOff Semiconductors
Clock Tree Synthesis (CTS) is one of the most important stages in PnR. CTS QoR decides timing
convergence & power. In most ICs, the clock consumes 30-40 % of total power. So efficient clock
architecture, clock gating & clock tree implementation help to reduce power.
Sanity checks need to be done before CTS
Check legality.
Check power stripes, standard cell rails & also verify PG connections.
Timing QoR (setup should be under control).
Timing DRVs.
High Fanout nets (like scan enable / any static signal).
Congestion (running CTS on congested design / design with congestion hotspots can create more
congestion & other issues (noise / IR)).
Remove don’t_use attribute on clock buffers & inverters.
Check whether all pre-existing cells in clock path are balanced cells (CK* cells).
Check & qualify don’t_touch, don’t size attributes on clock components.
Preparations
Understand clock structure of the design & balancing requirements of the design. This will help in
coming up with proper exceptions to build an optimum clock tree.
Creating non-default rules (check whether shielding is required).
Setting clock transition, capacitance & fan-out.
Decide on which cells to be used for CTS (clock buffer / clock inverter).
Handle clock dividers & other clock elements properly.
Come up with exceptions.
Understand latency (from Full chip point of view) & skew targets.
Take care of special balancing requirements.
Understand inter-clock balancing requirements.
Difference between High Fan-out Net Synthesis (HFNS) & Clock Tree Synthesis:
Clock buffers and clock inverters with equal rise and fall times are used in CTS, whereas HFNS uses buffers and
inverters with relaxed rise and fall times.
HFNS is used mostly for reset, scan enable and other static signals having high fan-outs. There is no
stringent requirement of balancing & power reduction.
Clock tree power is given special attention as it is a constantly switching signal. HFNS are mostly
performed for static signals and hence not much attention to power is needed.
NDR rules are used for clock tree routing.
Why are buffers/inverters inserted?
Balance the loads.
Meet the DRC’s (Max Tran/Cap etc.).
Minimize the skew.
What is the difference between clock buffer and normal buffer?
Clock buffers have equal rise time and fall time, therefore pulse width violation is avoided.
In clock buffers Beta ratio is adjusted such that rise & fall time are matched. This may increase size of
clock buffer compared to normal buffer.
Normal buffers may not have equal rise and fall time.
Clock buffers are usually designed such that an input signal with 50% duty cycle produces an output with
50% duty cycle.
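Why matched rise and fall behaviour matters can be seen by accumulating the edge delays through a chain of buffers. This is a simplified sketch, assuming non-inverting buffers where every stage delays rising edges by one fixed amount and falling edges by another; the function name and the delay numbers are hypothetical.

```python
def output_duty_cycle(period, input_duty, n_buffers, rise_delay, fall_delay):
    """Duty cycle after a chain of identical non-inverting buffers.

    The rising edge arrives n * rise_delay late and the falling edge
    n * fall_delay late, so the high pulse changes by their difference.
    """
    high_width = period * input_duty + n_buffers * (fall_delay - rise_delay)
    return high_width / period


# 1000 ps clock with 50% duty cycle through 10 buffer levels.
balanced = output_duty_cycle(1000.0, 0.5, 10, rise_delay=50.0, fall_delay=50.0)
unbalanced = output_duty_cycle(1000.0, 0.5, 10, rise_delay=50.0, fall_delay=56.0)
print(balanced, unbalanced)
```

With a 6 ps mismatch per stage, the high pulse is stretched by 60 ps after ten levels, which is the kind of distortion that can lead to minimum pulse width violations deep in a clock tree.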
CTS Goals
Meet the clock tree DRC.
Max. Transition.
Max. Capacitance.
Max. Fanout.
Meet the clock tree targets.
Minimal skew.
Minimum insertion delay.
Clock Tree Reference
By default, each clock tree reference list contains all the clock buffers and clock inverters in the logic
library. The clock tree reference list is used for:
Clock tree synthesis.
Boundary cell insertions.
Sizing.
Delay insertion.
Boundary cell insertions
When you are working on a block-level design, you might want to preserve the boundary conditions of the
block’s clock ports (the boundary clock pins).
A boundary cell is a fixed buffer that is inserted immediately after the boundary clock pins to preserve the
boundary conditions of the clock pin.
When boundary cell insertion is enabled, a buffer is inserted from the clock tree reference list immediately
after the boundary clock pins. For multi-voltage designs, buffers are inserted at the boundary in the default
voltage area.
The boundary cells are fixed for clock tree synthesis after insertion; they can’t be moved or sized. In addition,
no cells are inserted between a clock pin and its boundary cell.
Fig1: Boundary cell
Delay Insertion
If a large delay is needed, instead of adding many buffers we can add a delay cell with a particular delay value.
The advantage is reduced area and power. But delay cells have high variation, so their use in the clock
tree is not recommended.
Clock Tree Design Rule Constraints
Max. Transition.
The transition of the clock should not be too tight or too relaxed.
If it is too tight, more buffers are needed.
If it is too relaxed, dynamic power increases.
Max. Capacitance.
Max. Fanout.
Clock Tree Exceptions
Non- Stop Pin
Exclude Pin
Float Pin
Stop Pin
Don’t Touch Subtree
Don’t Buffer Nets
Don’t Size Cells
Non- Stop Pin:
Nonstop pins trace through the endpoints that are normally considered as endpoints of the clock tree.
Example :
The clock pins of sequential cells driving generated clocks are implicit non-stop pins.
Clock pin of ICG cells.
Fig2: Non Stop pin
Exclude pin:
Exclude pins are clock tree endpoints that are excluded from clock tree timing calculation and optimization.
The tool considers exclude pins only in calculation and optimizations for design rule constraints.
During CTS, the tool isolates exclude pins from the clock tree by inserting a guide buffer before the pin.
Examples:
Implicit exclude pin-
Non clock input pin of sequential cell.
Multiplexer select pin.
Three-state enable pin.
Output port.
Incorrectly defined clock pin (if the pin doesn’t have trigger edge info).
Cascaded clock.
check_scan_chain -->> Allows scan chain structural consistency checking based on the scan
chain information stored in the current design
report_design -->> Reports netlist, floorplan, routing, and library information for the
current block
report_threshold_voltage_groups -->> Reports statistics on cell count and area by threshold
voltage group names.
report_power -->> Calculates and reports dynamic and static power for the design
or instance
report_qor -->> Displays QoR information and statistics for the current design.
timeDesign -prePlace -->> To get an idea of Zero Wire Load timing of the design
specifyScanChain scan1 -start { } -stop { } -->> To specify the scan chain in the design
optDesign -preCTS -->> Optimize the setup and congestion violations in the placement
stage
setPlaceMode -congEffort high -->> Set congestion effort to high prior to running placeDesign
placement blockages.