VDF Project Part2 (2021)
ECE 513
Project Part 2
Group Number 5
Diptanu Som (MT21191)
Priyanka Bhagat (MT21158)
Mallika Singhal (2019173)
TOOLS and LIBRARY for Physical Design
Cadence - Slow.lib
Interpretation of results obtained in various steps of physical design: -
● Timing slack varies at every step of physical design. During placement, the tool places
each standard cell so that the total wire length is reduced according to its connectivity with
the input-output pins. In this optimization the critical path timing can degrade, and hence
the slack reduces.
● After CTS pre-optimization, the clock network is added to account for the actual
clock delay, so the slack can vary. It mostly reduces because the arrival time increases.
After post-CTS optimization, buffers are added. These buffers introduce extra delay in
the paths, which further reduces the slack.
● After detailed routing, the actual interconnects of the layout are in place. These actual
interconnect delays further reduce the slack.
● The clock tree is built during CTS, so exact clock delay information is available.
Hold violations are fixed during post-CTS optimization by adding buffers in the
hold-violating paths to increase their delay and hence their arrival time. So the overall
hold slack improves.
● After routing, hold slack further reduces.
● The number of cells remains the same before and after placement, because placement
only places the standard cells already present in the netlist. Hence the area of the
standard cells remains the same.
● Some extra clock inverter cells are added after CTS pre-optimization, so the
number of cells increases and hence the area of standard cells increases.
● Some clock buffers are added after CTS post-optimization, so the number of cells
further increases and hence the area of standard cells increases.
● After routing, no standard cells are added, so the number of cells and the area remain
the same.
POWER CONSUMED:
● After CTS, extra inverters and clock buffers are added. These consume power, and hence
the overall power consumption increases.
● After routing, power consumption further increases due to the voltage drop in the interconnect.
● At 0.5 core utilization, 50 percent of the area is reserved for routing, so routability is
better than at 0.8. At 0.8 core utilization, only 20 percent of the area is
reserved for routing.
● Less routing area results in more congestion in the design, and signal integrity
issues arise due to crosstalk. This happens because the reduced
spacing between wires makes the coupling capacitance significant.
● Our design has few cells, so at 0.8 utilization the cells are placed very
close together. Fewer routing resources are then required to connect the cells, and the delay
introduced by the interconnects is smaller. That is why the timing slack at 0.8 core utilization
is larger than at 0.5 core utilization.
The RTL-to-GDS flow comprises several steps: RTL design, synthesis, equivalence checking,
design for test (DFT) and then physical design. The physical design step converts the netlist to
GDS (layout). To start the physical design, we use the structural netlist with tight timing
constraints that was generated after DFT. The information reported below corresponds to the
netlist used for the physical design.
Constraint file:
I. Timing Information:
The report_timing command in Tempus reports the paths in the design, i.e., the delay
through each entire path. The start node and the end node of each path are
identified.
Static timing analysis works by building a timing graph for the circuit and then computing the
arrival time (AT) by forward traversal starting from the timing begin point. It then computes the
required time (RT) by backward traversal starting from the timing end point. Finally, the slack = RT
– AT is computed.
AT is calculated using forward traversal in the timing graph. At a given vertex Q, the arrival
time is computed as $A_Q = \max_P (A_P + D_{PQ})$, where P ranges over the vertices at the inputs
of Q and $D_{PQ}$ is the delay of the arc from P to Q. All the input
vertices of the given vertex have their AT already computed, which is given in the timing
report below as the other end arrival time (1.330 ns), which is usually the network delay (0.65 ns)
plus the source delay (0.68 ns).
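The forward and backward traversals described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the small four-vertex graph with its arc delays are hypothetical, and only the begin-point arrival time (1.330 ns) and end-point required time (2.717 ns) are taken from this report.

```python
from collections import defaultdict, deque

def topo_order(edges):
    """Return the vertices of a timing graph in topological order (Kahn's algorithm)."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for p, q, _ in edges:
        succ[p].append(q)
        indeg[q] += 1
        nodes.update((p, q))
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order

def compute_slack(edges, begin_at, end_rt):
    """edges: (P, Q, delay) arcs; begin_at / end_rt: known AT and RT at the boundaries."""
    order = topo_order(edges)
    at = dict(begin_at)
    for q in order:  # forward traversal: A_Q = max over inputs P of (A_P + D_PQ)
        ins = [(p, d) for p, qq, d in edges if qq == q]
        if ins:
            at[q] = max(at[p] + d for p, d in ins)
    rt = dict(end_rt)
    for p in reversed(order):  # backward traversal: R_P = min over outputs Q of (R_Q - D_PQ)
        outs = [(q, d) for pp, q, d in edges if pp == p]
        if outs:
            rt[p] = min(rt[q] - d for q, d in outs)
    slack = {n: rt[n] - at[n] for n in order}
    return slack, at, rt

# Hypothetical diamond-shaped graph; delays are made up for illustration.
edges = [("A", "B", 0.2), ("A", "C", 0.5), ("B", "D", 0.3), ("C", "D", 0.1)]
slack, at, rt = compute_slack(edges, {"A": 1.330}, {"D": 2.717})
for v in ("A", "B", "C", "D"):
    print(f"{v}: AT={at[v]:.3f} RT={rt[v]:.3f} slack={slack[v]:.3f}")
```

In this toy graph the vertices A, C and D lie on the critical path and share the same slack, while B, which is on the faster branch, has a larger slack.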
In the timing report below, GBA is performed to find the most critical path in the
network: the path from the beginpoint in2/Q_reg/D (v), i.e., starting at the in2 instance with net
name Q_reg, which is usually the D pin of a FF, to the endpoint pty/par_reg_reg/D (^), i.e.,
ending at the pty instance with net name par_reg_reg, which is usually the D pin of a FF.
Timing Report
In the above report, the arrival time at the begin point vertex (other end arrival time) is 1.330
ns. Traversing to the end point, the delays of the cells encountered along the way
are added, e.g., the delay from the rising edge of the clock at the in2 instance to the Q pin of the
in2 instance, 0.491 ns, is added to 1.330 ns to get the AT of the Q vertex as 1.821 ns. The arrival
times at subsequent vertices are calculated in a similar fashion. Finally, we find that the AT at the
end point is 2.281 ns.
Required Time Computation: -
From the above timing report, we observe that RT is calculated as other end arrival time
(1.330 ns) + phase shift (2.500 ns, the clock period) – setup (0.313 ns, the time for which data
must be stable before the capturing clock edge) – uncertainty (0.8 ns, the clock jitter value).
The required time (RT) at the endpoint evaluates to 2.717 ns. The RTs of the vertices prior to the
endpoint are calculated in a similar fashion by traversing backwards, e.g., the required time at
the pty/g242 instance is calculated as 2.717 ns – 0.225 ns (delay of the cell) ≈ 2.491 ns (report
value; the displayed operands are rounded).
Slack Computation: -
Slack = RT – AT
= 2.717 ns – 2.281 ns
≈ 0.435 ns (value reported by the tool; the displayed operands are rounded)
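The setup-check arithmetic above can be reproduced with a small helper. This is a minimal sketch with hypothetical function names; the numbers are the ones quoted from the timing report.

```python
def setup_required_time(other_end_arrival, phase_shift, setup, uncertainty):
    """RT for a setup check: capture clock arrival + period - setup - uncertainty (all in ns)."""
    return other_end_arrival + phase_shift - setup - uncertainty

def setup_slack(required, arrival):
    """Setup slack is required time minus arrival time."""
    return required - arrival

rt = setup_required_time(1.330, 2.500, 0.313, 0.8)
print(f"RT    = {rt:.3f} ns")
print(f"slack = {setup_slack(rt, 2.281):.3f} ns")
```

Because the inputs are the three-decimal values printed in the report, the slack computed this way can differ from the tool-reported 0.435 ns in the last digit.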
Slew is the rate of transition of a signal. It is specified to model slow rising and falling
signals at particular nodes to prevent timing violations. Delay increases as the slew rate
decreases, i.e., as the transition becomes slower.
The delay of a path is a function of the output load. For correct static timing analysis, we use
set_load to model the load that will be driven by the output port. Whenever the load increases, the
delay increases.
Unateness: For a given timing arc, how an input transition changes the output transition
defines the timing sense of the arc, i.e., its unateness. There are three types of unateness.
● Positive unate: a rise transition at the input results in a rise transition at the output, and
a fall transition at the input results in a fall at the output. Ex. AND gate, buffer.
● Negative unate: a rise transition at the input results in a fall transition at the output, and
a fall transition at the input results in a rise at the output. Ex. inverter, NAND gate.
● Non-unate: the output transition cannot be determined from the input transition alone; it
also depends on the state of the other inputs. Ex. XOR gate.
Different cells in the library have different unateness. The delays are calculated based on the
unateness of the particular cell. The unateness concept reduces the problem
space, so the complexity of the timing analysis tool decreases.
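How unateness narrows the set of transitions a timing tool has to consider can be illustrated as follows. This is a hypothetical sketch, not the tool's implementation; the labels "positive", "negative" and "non" (for the non-unate case, e.g. an XOR gate) are names chosen here for illustration.

```python
def output_transitions(unateness, input_transition):
    """Return the set of output transitions that an input transition can cause on an arc."""
    flip = {"rise": "fall", "fall": "rise"}
    if input_transition not in flip:
        raise ValueError(f"unknown transition: {input_transition}")
    if unateness == "positive":   # e.g. AND gate, buffer
        return {input_transition}
    if unateness == "negative":   # e.g. inverter, NAND gate
        return {flip[input_transition]}
    if unateness == "non":        # e.g. XOR gate: both outcomes must be analyzed
        return {"rise", "fall"}
    raise ValueError(f"unknown unateness: {unateness}")

print(output_transitions("positive", "rise"))
print(output_transitions("negative", "rise"))
print(sorted(output_transitions("non", "rise")))
```

For unate arcs only one output transition has to be timed per input transition, which is the reduction in problem space mentioned above.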
During GBA, we choose the maximum arrival time and the worst slew (coming from the different input
timing arcs) at a vertex to calculate the delay at each stage. GBA therefore gives
a safe, i.e., pessimistic, bound for timing analysis.
During PBA, we compute the delay of a given timing arc by considering the actual arrival time
and slew between the input and output of a cell along the specific path.
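The difference between the two modes can be sketched with a toy delay model. Everything here is an assumption made for illustration: the linear model `cell_delay`, its coefficients, and the two input (arrival, slew) pairs are hypothetical and are not taken from the library or the reports.

```python
def cell_delay(input_slew, intrinsic=0.10, slew_factor=0.5):
    """Hypothetical linear delay model: delay grows with the input transition time (ns)."""
    return intrinsic + slew_factor * input_slew

def gba_arrival(inputs):
    """GBA: combine the worst arrival time with the worst slew seen on any input."""
    worst_at = max(at for at, _ in inputs)
    worst_slew = max(slew for _, slew in inputs)
    return worst_at + cell_delay(worst_slew)

def pba_arrival(at, slew):
    """PBA: use the arrival time and slew of the one path actually being analyzed."""
    return at + cell_delay(slew)

# Two inputs of one gate as (arrival time, slew): the late input has the sharp edge.
inputs = [(1.0, 0.30), (1.2, 0.10)]
print(f"GBA arrival         : {gba_arrival(inputs):.3f} ns")
for at, slew in inputs:
    print(f"PBA arrival via path: {pba_arrival(at, slew):.3f} ns")
```

The GBA value is never smaller than any PBA value, which is why the slack reported by PBA can only stay the same or improve relative to GBA.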
Using this command, path-based analysis can be done for a maximum of 50 paths.
retime: this option is used to reanalyze a specific set of paths using the specified method.
nworst: specifies the maximum number of paths to report per endpoint. The default is 1, which
reports only the single worst path ending at a given endpoint. Here we have specified the limit
as 50.
PBA is done for the 50 worst paths, and the PBA report of the first path is shown below.
● PBA is calculated using the exact delay and exact slew associated with that path, so it
provides an accurate slack for the path. The slack obtained by PBA for the same path on
which GBA was performed is 0.439 ns, given by Slack Time. Slack Time(original) = 0.435 ns
gives the slack obtained from GBA. The slack improves when PBA is done.
The report shows this improvement in setup slack with PBA.
PBA REPORT
The total area of the 269 standard cells is 2915.579 µm^2, of which 100 are sequential instances
with an area of 2110.994 µm^2, 9 are inverter instances with an area of 20.436 µm^2 and 160 are
logic instances with an area of 784.148 µm^2.
2. AFTER PLACEMENT
Through the physical design steps, we convert the netlist to the physical layout using
processes such as floorplanning, placement, clock tree synthesis, routing and writing GDS.
To invoke the Innovus tool for physical design, type innovus in the terminal; the tool opens in GUI mode.
The inputs to the physical design tool are:
Input-output assignment file: - defines the direction in which the different pins of the circuit
are placed. The direction of the pins is specified as North (N), South (S), East (E), West (W), and
so on.
LEF file: - provides the technology information such as the placement and design
rules, process information for layers and vias, and the macro and standard cell information of
the design.
View file: - contains the delay corners, constraints, etc. to be used.
1. Floorplanning: - We have done floorplanning in two ways: with a core utilization
of 0.5 (larger die area) and with a core utilization of 0.8 (smaller die area). For the two
floorplans, area, power and timing are reported and analyzed.
The command below is used for floorplanning; the aspect ratio is chosen as 1, the core
utilization is 0.5 and the spacing from the core to the IO boundary is 4.06 on all sides.
Similarly, floorplanning is done for 0.8 utilization by changing only the core utilization in the Tcl
script.
Floorplanning thus creates the IO pad design and also creates the rows for the standard cells.
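The effect of the utilization setting on the core size can be estimated with a short calculation. This is a rough sketch under stated assumptions: the helper name is hypothetical, the aspect ratio is taken as height/width, the standard cell area (2915.579 um^2) is the value from the area report, the 4.06 margin is assumed to be in um, and the tool's own result may differ because it snaps the core to row and site boundaries.

```python
import math

def estimate_floorplan(cell_area, utilization, aspect_ratio=1.0, margin=4.06):
    """Estimate core and die dimensions from cell area and target core utilization."""
    core_area = cell_area / utilization          # utilization = cell area / core area
    core_h = math.sqrt(core_area * aspect_ratio)  # aspect_ratio assumed to be height / width
    core_w = core_area / core_h
    die_w = core_w + 2 * margin                   # core-to-IO spacing on both sides
    die_h = core_h + 2 * margin
    return core_w, core_h, die_w, die_h

for util in (0.5, 0.8):
    core_w, core_h, die_w, die_h = estimate_floorplan(2915.579, util)
    print(f"utilization {util}: core {core_w:.2f} x {core_h:.2f}, die {die_w:.2f} x {die_h:.2f}")
```

The estimate shows why 0.8 utilization gives the smaller die: the same cell area has to fit into a core that is only 1/0.8 times the cell area instead of 1/0.5 times.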
2. Power planning: - Power planning creates the power rails for VDD and GND that
supply the different standard cells.
Power planning is done with the commands below. Rings are added and the nets are selected, with
Metal8 used for the top and bottom sides and Metal9 for the left and right sides because these
higher metal layers have lower resistance. The width of the rails is 1.25, the spacing
between the rails is 0.4, and the offset is centered in the channel.
In the command below, stripes are added on the Metal8 layer with ten sets to
allow better connection of the different standard cells to the power rails.
Similarly, power planning is done for 0.8 utilization using the same commands.
3. Placement: - With the command below, the IO pins are placed automatically by the tool
and full placement of the standard cells is performed. The tool automatically spreads all the
standard cells over the standard cell rows in the core area. Placement is done similarly for the 0.8
utilization.
2.1. Utilization 0.5 (larger die area)
Timing Information :
The above timing report shows the most critical of all the paths in the circuit: the path
from the beginpoint in2/Q_reg[4]/Q (^), i.e., starting at the in2 instance with net name Q_reg[4],
which is usually the Q pin of a FF, to the endpoint dmx/out_reg_reg[23]/D (^), i.e., ending
at the dmx instance with net name out_reg_reg[23], which is usually the D pin of a FF.
AT Computation: - Starting from the already computed arrival time (other end arrival time) of
1.330 ns at the beginpoint vertex, the delays of the cells encountered on the way to the end point
are added up. The arrival time of each subsequent vertex is calculated by
adding its stage delay to the already computed arrival time. From the above timing report, the
AT at the end point is 2.477 ns.
RT Computation: - In the above timing report, RT is computed as other end arrival time
(1.330 ns) + phase shift (2.500 ns, the clock period) – setup (0.182 ns, the time for which data
must be stable before the capturing clock edge) – uncertainty (0.8 ns, the clock jitter value).
The required time (RT) at the endpoint evaluates to 2.848 ns.
Setup slack = RT – AT = 2.848ns – 2.477ns = 0.371 ns
Comparison of the timing results after placement with those before placement:
● Required time and arrival time have increased.
● Worst slack has decreased.
Before placement, static timing analysis has no real cell locations and assumes an accurate
placement of the standard cells when computing the slack. The placement engine divides the
standard cells into bins and then moves them to legal positions while
estimating the minimum wire length, timing and congestion. With these placement-based
estimates, the worst slack decreases.
The above timing report shows the most critical hold path in the circuit: the path
from the beginpoint scan_en (v), i.e., starting at the scan_en pin, to the
endpoint dmx/out_reg_reg[27]/SE (v), i.e., the SE pin of out_reg_reg[27] in the dmx instance.
AT Computation: - In the above timing report, only the delay from the falling edge at the scan_en
pin to the falling edge at the SE pin of the dmx register is added, i.e., 0.006 ns, as the input
scan_en itself does not contribute any delay. From the above timing report, the AT at the end
point is 0.006 ns.
RT Computation: - In the above timing report, RT is computed as other end arrival time
(1.330 ns) + hold (0.194 ns, the time for which the input must remain stable after the clock
edge). The required time (RT) at the endpoint evaluates to 1.524 ns.
Hold slack = AT – RT
= 0.006 ns – 1.524 ns ≈ -1.517 ns (value reported by the tool; the displayed operands are rounded)
Area Information :
Area Report :
Area of standard cells : 2915.579 um^2
Area of Buffers : 0.000 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2110.994 um^2
Area of other combinational cells : 784.148 um^2
Power Information :
Power Report :
Internal power: - Internal power is the power dissipated within the boundary of a cell. During
switching, it is dissipated by the charging and discharging of the
capacitances internal to the cell. The total internal power is 0.7813 mW.
Switching power: - The switching power of a driving cell is the power dissipated by the charging
and discharging of the load capacitance at the output of the cell. The total switching power is
0.02794 mW.
Leakage power: - Leakage power is due to the current that flows from VDD to GND when there is
no switching. The total leakage power is 0.01859 mW.
Total power: - Total power is the sum of all the above powers, which is 0.8278 mW.
Snap – shot of the layout design with flight – lines: -
2.2. Utilization 0.8 (smaller die area)
Timing Information :
The above timing report shows the critical path: the path from the beginpoint
in0/Q_reg[3]/Q (^), i.e., starting at the in0 instance with net name Q_reg[3], which is usually the
Q pin of a FF, to the endpoint pty/par_reg_reg/D (^), i.e., ending at the pty instance with net name
par_reg_reg, which is usually the D pin of a FF.
AT Computation: - Starting from the already computed arrival time (other end arrival time) of 1.330
ns at the begin point, the delays of the cells encountered on the way to the end point are
added up, e.g., the delay from the rising edge of the clock to the Q pin, 0.472 ns, is added to
1.330 ns to get the AT of Q as 1.802 ns. Likewise, the arrival time of each subsequent vertex is
calculated by adding its stage delay to the already computed arrival time. From the above timing
report, the AT at the end point is 2.349 ns.
RT Computation: - In the above timing report, RT is computed as other end arrival time
(1.330 ns) + phase shift (2.500 ns, the clock period) – setup (0.298 ns, the time for which data
must be stable before the capturing clock edge) – uncertainty (0.8 ns, the clock jitter value).
The required time (RT) at the endpoint evaluates to 2.732 ns.
Setup slack = RT – AT = 2.732ns – 2.349ns = 0.383 ns
Comparison of the timing results after placement with those before placement:
● RT and AT have increased.
● Worst slack has decreased.
Before placement, static timing analysis has no real cell locations and assumes an accurate
placement of the standard cells when computing the slack. The placement engine divides the
standard cells into bins and then moves them to legal positions while
estimating the minimum wire length, timing and congestion. With these placement-based
estimates, the worst slack decreases.
The above timing report shows the critical hold path: the path from the beginpoint scan_en (v),
i.e., starting at the scan_en pin, to the endpoint dmx/out_reg_reg[7]/SE (v), i.e.,
the SE pin of out_reg_reg[7] in the dmx instance.
AT Computation: - Only the delay from the falling edge at the scan_en pin to the falling edge at
the SE pin of the dmx register is added, i.e., 0.003 ns, as the input scan_en itself does not
contribute any delay. From the above timing report, the AT at the end point is 0.003 ns.
RT Computation: - In the above timing report, RT is computed as other end arrival time
(1.330 ns) + hold (0.186 ns, the time for which the input must remain stable after the clock
edge). The required time (RT) at the endpoint evaluates to 1.516 ns.
Hold slack = AT – RT
= 0.003ns – 1.516ns = -1.513 ns
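The hold-check arithmetic for this path can be reproduced with a small helper. This is a minimal sketch with hypothetical function names; the numbers are the ones quoted from the hold timing report for 0.8 utilization.

```python
def hold_required_time(other_end_arrival, hold, phase_shift=0.0):
    """RT for a hold check: capture clock arrival + phase shift + hold time (all in ns)."""
    return other_end_arrival + phase_shift + hold

def hold_slack(arrival, required):
    """Hold slack is arrival time minus required time; negative means a violation."""
    return arrival - required

rt = hold_required_time(1.330, 0.186)
slack = hold_slack(0.003, rt)
print(f"RT         = {rt:.3f} ns")
print(f"hold slack = {slack:.3f} ns")
print("violation" if slack < 0 else "met")
```

Note the sign convention: for hold the data must arrive after the required time, so the slack is AT – RT, the opposite of the setup check.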
Area Information :
Area of standard cells : 2915.579 um^2
Area of Buffers : 0.000 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2110.994 um^2
Area of other combinational cells : 784.148 um^2
Power Information :
Power Report :
Internal power: - Internal power is the power dissipated within the boundary of a cell. During
switching, it is dissipated by the charging and discharging of the
capacitances internal to the cell. The total internal power is 0.7821 mW.
Switching power: - The switching power of a driving cell is the power dissipated by the charging
and discharging of the load capacitance at the output of the cell. The total switching power is
0.02856 mW.
Leakage power: - Leakage power is due to the current that flows from VDD to GND when there is
no switching. The total leakage power is 0.01859 mW.
Total power: - Total power is the sum of all the above powers, which is 0.8292 mW.
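As a quick consistency check, the reported totals for both utilizations can be recomputed from their components. This is a small sketch with a hypothetical helper name; the numbers are the post-placement power values quoted in this report.

```python
def total_power(internal, switching, leakage):
    """Total power is the sum of internal, switching and leakage power (mW)."""
    return internal + switching + leakage

# Post-placement values from the power reports, in mW.
reports = {
    "0.5": {"internal": 0.7813, "switching": 0.02794, "leakage": 0.01859, "total": 0.8278},
    "0.8": {"internal": 0.7821, "switching": 0.02856, "leakage": 0.01859, "total": 0.8292},
}

for util, r in reports.items():
    computed = total_power(r["internal"], r["switching"], r["leakage"])
    diff = abs(computed - r["total"])
    print(f"utilization {util}: computed {computed:.5f} mW, reported {r['total']} mW, diff {diff:.5f}")
```

The components add up to the reported totals within the rounding of the printed values, and the leakage term is identical in both floorplans because the same cells are used.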
Snap – shot of the layout design with flight – lines: -
In clock tree synthesis, the complete routing of the clock is done. CTS inserts clock buffers and
interconnect in the layout from the clock source up to the register clock pins.
Clock signals are very critical, so the clock is routed before the actual routing to avoid detours.
The commands below are used to perform CTS.
In the above command, different clock buffers are used to perform CTS.
A. TIMING INFORMATION
Timing Setup
The above timing report shows the worst path, i.e., the path whose slack is the worst of all the
paths: from the beginpoint in2/Q_reg[4]/Q (^), i.e., starting at the in2 instance, to the
endpoint dmx/out_reg_reg[23]/D (^), i.e., ending at the dmx instance with net name
out_reg_reg[23].
Setup slack = Required time – Arrival time = 2.836 – 2.670 = 0.166 ns. The slack at every vertex
along the path is the same, though the arrival and required times differ.
Timing Hold
Hold checks are done at the same edge of the clock. In the above timing report, the beginpoint is
the input port scan_en and the endpoint is the SE pin of dmx/out_reg_reg[30].
For the required time calculation, the latency of the capture clock is given by Other End Arrival
Time, i.e., 1.299 ns, and the path delay associated with it is given in the other end path. As both
clock edges occur at the same time, the phase shift is 0 ns. The hold time of the flop is 0.169 ns.
Therefore the required time at the end point = 1.299 + 0.169 + 0 = 1.468 ns.
Hold violations are fixed after CTS because the exact clock network is known only
then.
POST CTS OPTIMIZATION:
Before CTS, we assume the clock to be ideal for timing analysis and add
uncertainty and latency to model its non-ideal behavior, but this is just an estimate. After CTS,
we have the exact interconnect routing of the clock, and hence timing is calculated accurately. So
timing optimization can be done after CTS.
Timing (setup):
Slack = 0.107ns
After optimization, some buffers are added, so the delay in the paths increases and we get
a reduced slack.
Effect of timing of clock path on overall slack: We provide a latency margin of 1.33 ns in the
constraints for both the launch and the capture clock. Before CTS, the tool used that value
for timing analysis. After CTS the clock network is built, so the actual clock delays
are used, i.e., 1.329 ns (latency of both the capture and the launch flop). Although the arrival
time and required time decrease, the slack is not affected by the latency because the overall
skew remains zero. CTS tools try to keep the skew at zero, as unplanned skew can degrade
performance. The slack decreases due to the increase in setup time to 0.265 ns (before CTS it was
0.194 ns), which reduces the required time and hence the slack compared to pre-CTS.
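The argument that equal launch and capture latency cancels out of the setup slack can be checked numerically. This is an illustrative sketch: the function name and the data path delay of 1.0 ns are hypothetical, while the latency, period, uncertainty and setup values are the ones quoted in this report.

```python
def setup_slack(launch_latency, capture_latency, data_delay, period, setup, uncertainty):
    """Setup slack with explicit launch and capture clock latencies (all in ns)."""
    arrival = launch_latency + data_delay
    required = capture_latency + period - setup - uncertainty
    return required - arrival

DATA_DELAY = 1.0  # hypothetical data path delay, for illustration only

# Same setup time, constraint latency (1.33) vs. built clock tree latency (1.329).
ideal = setup_slack(1.330, 1.330, DATA_DELAY, 2.5, 0.194, 0.8)
built = setup_slack(1.329, 1.329, DATA_DELAY, 2.5, 0.194, 0.8)
# Built clock tree with the larger setup time seen after CTS.
built_new_setup = setup_slack(1.329, 1.329, DATA_DELAY, 2.5, 0.265, 0.8)

print(f"slack change from latency alone: {built - ideal:+.3f} ns")
print(f"slack change from setup 0.194 -> 0.265: {built_new_setup - built:+.3f} ns")
```

With zero skew the latency appears in both the arrival and the required time and drops out of the difference, whereas the larger setup time lowers the slack by exactly the setup increase.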
After optimization, the hold violations are fixed. The worst hold path is different before and
after optimization.
Required time at the endpoint = 1.298 (Other End Arrival Time) + 0.00 (phase shift) +
0.132 (hold) = 1.430 ns
Effect of timing of clock path on overall slack: After CTS the clock network is built, so the actual
clock delays are used, i.e., 1.298 ns (latency of both the capture and the launch flop). Extra delay
is added in the paths, which increases the arrival time. Hence the hold slack improves and the
timing violations are removed.
B. AREA INFORMATION
Area report after CTS pre-optimization:
Area of standard cells : 2876.977 um^2
Area of Buffers : 15.138 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2110.994 um^2
Area of other combinational cells : 730.409 um^2
Area report after CTS post-optimization:
Area of standard cells : 2995.053 um^2
Area of Buffers : 153.651 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2090.558 um^2
Area of other combinational cells : 730.409 um^2
C. POWER INFORMATION
Timing Setup
In the timing report shown below, the tool finds the worst path, i.e., the path whose slack is the
worst of all the paths. The path runs from the beginpoint, in2 with net
name Q_reg[4], to the endpoint, the D pin of out_reg_reg[28].
Other End Arrival Time is the latency of the capture clock, i.e., the time required for the clock
signal to reach the clock pin of out_reg_reg[28], which is 1.330 ns. The clock network is built
in CTS, so the actual delays associated with the clock path are taken into account.
Setup is the time for which data must be stable before the capturing clock edge, i.e., 0.210 ns.
An uncertainty of 0.8 ns is provided in the constraints to account for jitter in the clock period.
So the required time at the end point = 2.819 ns.
Timing Hold
Hold checks are done at the same edge of the clock. Here the beginpoint is DFT_sdi_1 and
the endpoint is in2/Q_reg[2]/SI.
For the required time calculation, the latency of the capture clock is given by Other End Arrival
Time, i.e., 1.299 ns, and the path delay associated with it is given in the other end path. As both
clock edges occur at the same time, the phase shift is 0 ns. The hold time is 0.163 ns. Therefore
the required time at the end point = 1.299 + 0.163 + 0 = 1.462 ns.
Hold violations are fixed after CTS because the exact clock network is known only
then.
Before CTS, we assume the clock to be ideal for timing analysis and add
uncertainty and latency to model its non-ideal behavior, but this is just an estimate. After CTS,
we have the exact interconnect routing of the clock, and hence timing is calculated accurately. So
timing optimization can be done after CTS.
Timing (setup):
Slack = 0.128ns
After optimization, some buffers are added, so the delay in the paths increases and we get
a reduced slack.
Effect of timing of clock path on overall slack: We provide a latency margin of 1.33 ns in the
constraints for both the launch and the capture clock. Before CTS, the tool used that value
for timing analysis. After CTS the clock network is built, so the actual clock delays
are used, i.e., 1.329 ns (latency of both the capture and the launch flop). Although the arrival
time and required time decrease, the slack is not affected by the latency because the overall
skew remains zero. CTS tools try to keep the skew at zero, as unplanned skew can degrade
performance. The slack decreases due to the increase in setup time to 0.281 ns (before CTS it was
0.210 ns), which reduces the required time and hence the slack compared to pre-CTS.
Timing Report Hold:
After optimization, the hold violations are fixed. The worst hold path is different before and
after optimization.
Required time at the endpoint = 1.299 (Other End Arrival Time) + 0.00 (phase shift) +
0.098 (hold) = 1.397 ns
Effect of timing of clock path on overall slack: After CTS the clock network is built, so the actual
clock delays are used, i.e., 1.299 ns (latency of both the capture and the launch flop). Extra delay
is added in the paths, which increases the arrival time. Hence the hold slack improves and the
timing violations are removed.
B. AREA INFORMATION
The routing step establishes the interconnections among the components that have been
placed in the design.
For a given net, connections are made between all of its pins while satisfying the given constraints.
The objective of this step is to minimize the wire length, the routing area and the number of vias.
4.1.Utilization 0.5 :
A. TIMING INFORMATION
Timing report (setup):
Set up Slack=0.095 ns
After the detailed routing step, we know the exact interconnect layout, in which each
wire has its own resistance and capacitance, and hence different and more realistic
delays can now be seen.
During timing analysis, the actual interconnect delays are considered, so the arrival time
increases and the overall slack reduces.
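Why wire resistance and capacitance translate into delay can be illustrated with the Elmore delay, a common first-order estimate for an RC wire. This is only an illustration: the report does not state which delay model the tool uses, and the function name, segment values and units below are hypothetical.

```python
def elmore_delay(segments, load_cap=0.0):
    """First-order (Elmore) delay of an RC ladder.

    segments: list of (resistance, capacitance) per wire segment, ordered from
    driver to sink, with each capacitance lumped at the far end of its segment.
    """
    delay = 0.0
    r_upstream = 0.0
    for r, c in segments:
        r_upstream += r            # total resistance between the driver and this node
        delay += r_upstream * c    # each capacitor is charged through all upstream resistance
    return delay + r_upstream * load_cap

# Hypothetical wires built from identical unit segments (arbitrary units).
short_wire = [(1.0, 1.0)] * 2
long_wire = [(1.0, 1.0)] * 4
print("short wire delay:", elmore_delay(short_wire))
print("long wire delay :", elmore_delay(long_wire))
```

In this model the delay grows faster than linearly with wire length, which is consistent with longer routed nets adding noticeably to the arrival time.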
Timing report (hold):
Hold slack=0.062 ns
B. AREA INFORMATION
Area report:
Area of standard cells : 2995.053 um^2
Area of Buffers : 153.651 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2090.558 um^2
Area of other combinational cells : 730.409 um^2
C. POWER INFORMATION
Power report:
Total power consumed = 0.8278 mW
A. TIMING INFORMATION
Setup slack=0.117 ns
Timing report (hold):
B. AREA INFORMATION
Area report:
Area of standard cells : 3145.676 um^2
Area of Buffers : 305.788 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2089.044 um^2
Area of other combinational cells : 730.409 um^2
C. POWER INFORMATION
Power report:
Power consumed = 0.8292 mW
In our design we have done the routing on layers M1-M9. If we instead route on M2-M9,
or in general do not use all the layers for routing, then for a
pin that lies outside the allowed layers the tool uses vias (which increase resistance) to reach it,
which leads to an increase in delay and hence in the arrival time.