VDF Project Part2 (2021)

Uploaded by

ankit raj

VLSI Design Flow

ECE 513

Project Part 2

Group Number 5
Diptanu Som (MT21191)
Priyanka Bhagat (MT21158)
Mallika Singhal (2019173)
TOOLS and LIBRARY for Physical Design

1. Physical Design Tool: Innovus(Cadence)

2. Static Timing Analysis Tool: Tempus (Cadence)

● Library Used: 90nm Cadence library (Slow.lib)
Interpretation of results obtained in various steps of physical design: -

For 0.8 utilization:

Design Step                     Setup slack (ns)   Hold slack (ns)   No. of cells   Area of standard cells (μm^2)   Power (mW)
Before physical design          0.435              -                 269            2915.579                        0.8157
After placement                 0.383              -1.513            269            2915.579                        0.8292
After CTS (pre-optimization)    0.176              -1.462            272            2876.977                        0.8292
After CTS (post-optimization)   0.128              0.005             297            3145.676                        0.8292
After detailed routing          0.117              0.003             297            3145.676                        0.8292

For 0.5 utilization:

Design Step                     Setup slack (ns)   Hold slack (ns)   No. of cells   Area of standard cells (μm^2)   Power (mW)
Before physical design          0.435              -                 269            2915.579                        0.8157
After placement                 0.371              -1.517            269            2915.579                        0.8278
After CTS (pre-optimization)    0.166              -1.463            272            2876.977                        0.8278
After CTS (post-optimization)   0.107              0.051             282            2995.053                        0.8278
After detailed routing          0.095              0.062             282            2995.053                        0.8278

TIMING SLACK (SETUP):

● The timing slack changes at every step of physical design. During placement, the tool places each standard cell so that the total wire length, determined by the cell's connectivity and the input-output pins, is reduced. This optimization can lengthen the critical path, so the setup slack reduces.
● After CTS (pre-optimization) the clock network has been added and the actual clock delay is taken into account, so the slack can vary. It mostly reduces because the arrival time increases. During post-CTS optimization buffers are added. They introduce extra delay in the paths, which reduces the slack further.
● After detailed routing the actual interconnects of the layout are in place. These actual interconnect delays reduce the slack further.

TIMING SLACK (HOLD):

● After CTS the clock tree network is built and exact clock delay information is available. During post-CTS optimization, hold violations are fixed by adding clock buffers in the violating paths, which increases the delay and hence the arrival time. So the overall hold slack improves.
● After routing, the hold slack changes only slightly: it drops from 0.005 ns to 0.003 ns for 0.8 utilization and rises from 0.051 ns to 0.062 ns for 0.5 utilization.

NUMBER OF CELLS AND AREA OF STANDARD CELLS

● Before and after placement the number of cells remains the same. Placement only places the standard cells already present in the netlist, so the standard cell area also remains the same.
● Some extra clock buffers are added during CTS (pre-optimization), so the number of cells increases from 269 to 272. The total standard cell area nevertheless drops slightly, from 2915.579 to 2876.977 μm^2, because the area of the other combinational cells falls from 784.148 to 730.409 μm^2 while the new buffers add only 15.138 μm^2.

● More clock buffers are added during post-CTS optimization, so the number of cells increases further and the standard cell area increases.

● After routing, no standard cells are added, so the number of cells and the area remain the same.

POWER CONSUMED:

● Power consumption increases after placement (for example from 0.8157 mW to 0.8292 mW at 0.8 utilization). Dynamic power dissipation increases as the rate of switching of the devices increases. Power is also consumed through device leakage and the voltage drop across the input-output pins.

● After CTS, extra clock buffers are added which consume power, so the overall power consumption would be expected to increase.
● After routing, the voltage drop in the interconnect would be expected to increase the power consumption further. In our reports, however, the total power stays at its post-placement value (0.8292 mW for 0.8 utilization and 0.8278 mW for 0.5 utilization) through CTS and detailed routing.

ROUTABILITY ON CHANGE OF FLOORPLAN:

● With 0.5 core utilization, 50 percent of the area is left for routing, so routability is better than with 0.8 core utilization, where only 20 percent of the area is left for routing.

● Less routing area results in more congestion in the design, and signal integrity issues come into the picture due to crosstalk. This happens because the reduced spacing between wires makes the coupling capacitance significant.

● Our design has a small number of cells, so at 0.8 utilization the cells are placed very close together. Fewer routing resources are then required to connect the cells, and the delay introduced by the interconnects is smaller. That is why the setup slack at 0.8 core utilization is larger than at 0.5 core utilization.

1. BEFORE STARTING PHYSICAL DESIGN

The RTL-to-GDS flow comprises several steps: RTL design, synthesis, equivalence checking, design for test (DFT) and then physical design. The physical design step converts the netlist to GDS (layout). To start physical design, we use the structural netlist with tight timing constraints that was generated after DFT. The information reported below corresponds to this netlist.
Constraint file:
I. Timing Information:

The report_timing command in Tempus provides information about the various paths in the design, i.e. the delay through each entire path. The start node and the end node of each path are identified.
Static timing analysis works by building a timing graph for the circuit. It computes the arrival time (AT) by forward traversal starting from the timing begin point, then computes the required time (RT) by backward traversal starting from the timing end point. Finally the slack = RT – AT is computed.
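
The two traversals can be sketched on a toy timing graph. This is only a minimal Python illustration of the idea, not how Tempus is implemented. The function name sta, the vertex names and the integer delays are all made up for the example.

```python
from collections import defaultdict

def sta(edges, begin_at, end_rt):
    """Toy STA on a DAG. edges: (u, v, delay). Returns (AT, RT, slack) dicts."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v, d in edges:
        succ[u].append((v, d))
        pred[v].append((u, d))
        nodes.update((u, v))

    # Topological order (Kahn's algorithm) so every input is visited first.
    indeg = {n: len(pred[n]) for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for v, _ in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Forward traversal: AT(Q) = max over inputs P of AT(P) + D(P, Q).
    at = {}
    for n in order:
        if pred[n]:
            at[n] = max(at[u] + d for u, d in pred[n])
        else:
            at[n] = begin_at[n]          # begin point: AT given by constraints

    # Backward traversal: RT(P) = min over outputs Q of RT(Q) - D(P, Q).
    rt = {}
    for n in reversed(order):
        if succ[n]:
            rt[n] = min(rt[v] - d for v, d in succ[n])
        else:
            rt[n] = end_rt[n]            # end point: RT given by constraints

    slack = {n: rt[n] - at[n] for n in nodes}
    return at, rt, slack

# Hypothetical graph: a and b feed c, c feeds d.
edges = [("a", "c", 2), ("b", "c", 1), ("c", "d", 3)]
at, rt, slack = sta(edges, begin_at={"a": 0, "b": 0}, end_rt={"d": 6})
print(at, rt, slack)
```

In this toy graph the path a, c, d is the critical one, and every vertex on it carries the same slack, while the faster input b has more slack.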

Arrival Time Computation: -

AT is calculated using forward traversal in the timing graph. At a given vertex Q the arrival time is computed as $A_Q = \max_P (A_P + D_{PQ})$, where P ranges over the vertices at the inputs of Q and $D_{PQ}$ is the delay of the arc from P to Q. All the input vertices of the given vertex already have their AT computed. At the begin point this is given in the timing report below as the other end arrival time (1.330 ns), which is the network delay (0.65 ns) plus the source delay (0.68 ns).
In the timing report below, GBA is performed to find the most critical path in the network. The path runs from the beginpoint in2/Q_reg/D (v), i.e. the in2 instance with net name Q_reg, which is usually the D pin of a FF, to the endpoint pty/par_reg_reg/D (^), i.e. the pty instance with net name par_reg_reg, which is usually the D pin of a FF.
Timing Report

In the above report the arrival time at the begin point vertex (other end arrival time) is 1.330 ns. Traversing towards the end point, the delays of the cells encountered are added. For example, the delay from the rising edge of the clock of the in2 instance to the Q pin of the in2 instance, 0.491 ns, is added to 1.330 ns to get the AT of the Q vertex as 1.821 ns. The arrival times at subsequent vertices are calculated in the same fashion. Finally, the AT at the end point is 2.281 ns.
Required Time Computation: -

RT is calculated by backward traversal in the timing graph. At a given vertex P the RT is calculated as $R_P = \min_Q (R_Q - D_{PQ})$, where Q ranges over all the vertices at the output of P. The RT at the endpoint is generally obtained from the constraints.

From the above timing report we observe that RT is calculated as other end arrival time (1.330 ns) + phase shift (2.50 ns, the clock period) – setup (0.313 ns, the setup time of the capturing flip-flop) – uncertainty (0.8 ns, the clock jitter value). The required time (RT) at the endpoint evaluates to 2.717 ns. The RTs of the vertices before the endpoint are calculated in a similar fashion by traversing backwards, e.g. the required time at the pty/g242 instance is reported as 2.717 ns – 0.225 ns (delay of the cell) = 2.491 ns.

Slack Computation: -

Slack = RT – AT
= 2.717 ns – 2.281 ns
= 0.435 ns (as reported; the printed RT and AT values are rounded to three decimals)
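
As a cross-check, the setup arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that uses only the numbers quoted from the report; the function name setup_required_time is a hypothetical helper, not a tool command.

```python
def setup_required_time(other_end_arrival, period, setup, uncertainty):
    """RT at a flip-flop D pin for a single-cycle setup check (all values in ns)."""
    return other_end_arrival + period - setup - uncertainty

rt = setup_required_time(other_end_arrival=1.330, period=2.500,
                         setup=0.313, uncertainty=0.8)
at_q = 1.330 + 0.491   # clock arrival + clock-to-Q delay of the in2 flop
at_end = 2.281         # AT at the endpoint, taken from the report
slack = rt - at_end

print(f"RT = {rt:.3f} ns, AT(Q) = {at_q:.3f} ns, slack = {slack:.3f} ns")
```

Working from the rounded values gives a slack of 0.436 ns, one unit in the last digit away from the 0.435 ns that the tool reports from its unrounded internal values.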

Effect of slew on timing: -

Slew is the rate of transition of a signal. It is specified to model slowly rising and falling signals at particular nodes in order to prevent timing violations. A lower slew rate means a slower transition, so delay increases when the slew rate decreases.

Effect of load on timing: -

The delay of a path is a function of the output load. For correct static timing analysis, we use set_load to model the load that will be driven by the output port. Whenever the load increases, the delay increases.

Effect of positive / negative unateness: -

Unateness: For a given timing arc, the way an input transition changes the output transition defines the timing sense of the arc, i.e. its unateness. There are three types of unateness.
● Positive unate: a rise transition at the input results in a rise transition at the output, and a fall transition at the input results in a fall at the output. Ex. AND gate, buffer.

● Negative unate: a rise transition at the input results in a fall transition at the output, and a fall transition at the input results in a rise at the output. Ex. inverter, NAND gate.
● Non-unate: the direction of the output transition cannot be determined from the input transition alone because it depends on the other inputs. Ex. XOR gate.
Different cells in the library have different unateness. The delays are calculated based on the unateness of the particular cell. The unateness concept reduces the problem space, so the complexity of the timing analysis tool decreases.
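
The timing sense of an input pin can be worked out from the truth table of the gate. The sketch below is a small Python illustration under that assumption; the function name unateness and the lambda gates are hypothetical and are not taken from any library file.

```python
from itertools import product

def unateness(fn, n_inputs, pin):
    """Classify input `pin` of boolean function `fn` by brute-force enumeration."""
    follows = opposes = False
    for bits in product((0, 1), repeat=n_inputs - 1):
        lo = list(bits)
        hi = list(bits)
        lo.insert(pin, 0)   # the pin under test held low
        hi.insert(pin, 1)   # the pin under test held high
        out_lo, out_hi = fn(*lo), fn(*hi)
        if out_hi > out_lo:
            follows = True  # rising input gives rising output
        if out_hi < out_lo:
            opposes = True  # rising input gives falling output
    if follows and opposes:
        return "non-unate"
    if follows:
        return "positive"
    if opposes:
        return "negative"
    return "independent"

and_gate = lambda a, b: a & b
nand_gate = lambda a, b: 1 - (a & b)
xor_gate = lambda a, b: a ^ b
inverter = lambda a: 1 - a

for name, fn, n in [("AND", and_gate, 2), ("NAND", nand_gate, 2),
                    ("XOR", xor_gate, 2), ("INV", inverter, 1)]:
    print(name, unateness(fn, n, pin=0))
```

Knowing the sense lets the tool pair each input transition with a single output transition for unate arcs, instead of evaluating both rise and fall as it must for a non-unate arc.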

Effect of GBA and PBA on timing: -

During GBA we choose the maximum arrival time and the worst slew (among those coming from the different input timing arcs) at a vertex to calculate the delay at each stage. GBA therefore gives a safe bound for timing analysis, i.e. it is pessimistic.

During PBA we compute the delay of a given timing arc by considering the actual arrival time and slew between the input and output of a cell along the specific path.
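
The pessimism of GBA can be illustrated with a toy two-input gate. The sketch below assumes a simple linear delay model (delay = intrinsic + k × input slew) and made-up arrival times and slews; real libraries use table lookups, so this shows only the principle.

```python
def gate_delay(input_slew, intrinsic=0.10, k=0.5):
    """Hypothetical linear delay model in ns: a slower input edge gives more delay."""
    return intrinsic + k * input_slew

# Made-up arrival time and slew (ns) at the two input pins of one gate.
inputs = {"A": {"at": 1.00, "slew": 0.05},
          "B": {"at": 0.90, "slew": 0.20}}

def gba_output_arrival(pins):
    """GBA: every arc is evaluated with the worst slew seen at the gate inputs."""
    worst_slew = max(p["slew"] for p in pins.values())
    return max(p["at"] + gate_delay(worst_slew) for p in pins.values())

def pba_output_arrival(pins, pin):
    """PBA: the path through `pin` uses that pin's own arrival time and slew."""
    p = pins[pin]
    return p["at"] + gate_delay(p["slew"])

gba = gba_output_arrival(inputs)
pba_a = pba_output_arrival(inputs, "A")
pba_b = pba_output_arrival(inputs, "B")
print(gba, pba_a, pba_b)
```

Here GBA combines the late arrival of pin A with the slow slew of pin B, a combination that no real path sees, so its output arrival is later than that of either individual path.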

Command: - report_timing -retime path_slew_propagation -max_path 50 -nworst 50 -path_type full_clock >

Using this command, path based analysis can be done for a maximum of 50 paths.

retime: - This is used to reanalyze a specific set of paths using the specified method.

path_slew_propagation: - Re-evaluates the given set of GBA paths by recalculating the delay values based on the actual slew propagated along the path.

nworst specifies the maximum number of paths to report per endpoint. The default is 1, which reports only the single worst path ending at a given endpoint. Here we have specified the limit as 50.

PBA is done for the 50 worst paths, and the PBA report of the first path is shown below.

● PBA is calculated taking the exact delay and exact slew associated with the path, so it provides an accurate slack for the path. The slack obtained by PBA for the same path on which GBA was performed is 0.439 ns, given by Slack Time. Slack Time(original) = 0.435 ns gives the slack obtained from GBA. The report thus shows an improvement of 0.004 ns in setup slack when PBA is done.
PBA REPORT

Slack using path based analysis is 0.439

II. Area Information: -


AREA REPORT

The total area of the 269 standard cells is 2915.579 µm^2, of which 100 are sequential instances with an area of 2110.994 µm^2, 9 are inverter instances with an area of 20.436 µm^2 and 160 are logic instances with an area of 784.148 µm^2.
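
The breakdown can be checked by summing the categories. The short Python sketch below uses only the figures quoted above; the dictionary names are arbitrary.

```python
# Figures quoted from the area report (areas in um^2).
area_um2 = {"sequential": 2110.994, "inverter": 20.436, "logic": 784.148}
count = {"sequential": 100, "inverter": 9, "logic": 160}

total_area = sum(area_um2.values())
total_cells = sum(count.values())

for kind in area_um2:
    share = 100 * area_um2[kind] / total_area
    print(f"{kind:<10} {count[kind]:>4} cells  {area_um2[kind]:>9.3f} um^2  {share:5.1f} %")
print(f"total      {total_cells:>4} cells  {total_area:>9.3f} um^2")
```

The categories add up to 269 cells and about 2915.578 µm^2, which matches the reported total to within rounding. The flip-flops account for roughly 72 % of the standard cell area.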

III. Power Information: -


POWER REPORT
Total power of 310 standard cells is 0.8157 mW, of which the total internal power is 0.7229 mW, the total switching power is 0.0762187 mW and the total leakage power is 0.01655 mW.

2. AFTER PLACEMENT

Through the physical design steps we convert the netlist to the physical layout: floorplanning, placement, clock tree synthesis, routing and writing the GDS.

To invoke the Innovus tool for physical design, type innovus in the terminal, which opens the GUI. The inputs to the physical design tool are:

Netlist: - A structural netlist which contains the interconnection of the different standard cells.

Input-output assignment file: - Defines on which side each pin of the circuit is placed. The direction of the pins is specified as North (N), South (S), East (E), West (W), and so on.
LEF file: - This file provides the technology information such as the placement and design rules, process information for layers and vias, and the macro and standard cell information of the design.

View file: - This file contains the delay corners, constraints etc. to be used.

1. Floorplanning: - We have done floorplanning in two different ways, i.e. with a core utilization of 0.5 (large die area) and with a core utilization of 0.8 (small die area). For the two floorplans, area, power and timing are reported and analyzed.

The command below is used for floorplanning, where the aspect ratio is chosen as 1, the core utilization is 0.5 and the spacing from the core to the IO boundary is chosen to be 4.06 on all sides.

Similarly, floorplanning is done for 0.8 utilization by changing only the core utilization in the tcl script.
Floorplanning thus creates the IO pad design and also creates the rows for the standard cells.
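
To get a feel for what the two utilization values mean for the die size, the core dimensions can be estimated from the standard cell area. This is a rough Python sketch under stated assumptions: core area = cell area / utilization, aspect ratio = height / width, and the 4.06 margin is in the same length unit as the square root of the area. The helper name floorplan_size is hypothetical, and the real tool snaps the core to the row and site grid, so its numbers will differ slightly.

```python
import math

def floorplan_size(cell_area, utilization, aspect_ratio=1.0, margin=4.06):
    """Estimate core and die dimensions from cell area and core utilization."""
    core_area = cell_area / utilization
    core_width = math.sqrt(core_area / aspect_ratio)  # aspect_ratio = height / width
    core_height = core_width * aspect_ratio
    die_width = core_width + 2 * margin               # margin on the left and right
    die_height = core_height + 2 * margin             # margin on the top and bottom
    return core_area, core_width, core_height, die_width, die_height

cell_area = 2915.579  # standard cell area before physical design, um^2
for util in (0.5, 0.8):
    core_area, w, h, dw, dh = floorplan_size(cell_area, util)
    print(f"util {util}: core {core_area:.1f} um^2, "
          f"core {w:.2f} x {h:.2f}, die {dw:.2f} x {dh:.2f}")
```

Under these assumptions the 0.5 floorplan needs a core of about 5831 µm^2 and the 0.8 floorplan one of about 3644 µm^2, so the free area available for routing shrinks considerably at the higher utilization.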

2. Power planning: - Power planning creates the power rails for VDD and GND that supply the standard cells.
Power planning is done with the commands below. Rings are added for the selected nets, from top to bottom and from left to right on Metal8 and Metal9 respectively, because these higher metal layers have lower resistance. The rails have a width of 1.25 and a spacing of 0.4, and are offset to the center of the channel.

In the command below, stripes are added on the Metal8 layer in ten sets to allow better connection of the standard cells to the power rails.

Similarly, power planning is done for 0.8 utilization using the same commands.

3. Placement: - With the command below, the IO pins are placed automatically by the tool and full placement of the standard cells is performed. The tool automatically spreads the standard cells over the standard cell rows in the core area. Placement is done in the same way for 0.8 utilization.
2.1. Utilization 0.5 (larger die area)

Timing Information :

Timing Report Setup:

in2/Q_reg[4]/Q (^)

The above timing report shows the most critical of all the paths in the circuit. It runs from the beginpoint in2/Q_reg[4]/Q (^), i.e. the in2 instance with net name Q_reg[4], which is usually the Q pin of a FF, to the endpoint dmx/out_reg_reg[23]/D (^), i.e. the dmx instance with net name out_reg_reg[23], which is usually the D pin of a FF.

AT Computation: - Starting from the already computed arrival time (other end arrival time) of 1.330 ns at the beginpoint vertex and traversing to the end point, the delays of the cells along the path are added up. The arrival time at each subsequent vertex is obtained by adding the cell delay to the arrival time already computed. From the above timing report, the AT at the end point is 2.477 ns.

RT Computation: - In the above timing report RT is computed as other end arrival time (1.330 ns) + phase shift (2.500 ns, the clock period) – setup (0.182 ns, the setup time of the capturing flip-flop) – uncertainty (0.8 ns, the clock jitter value). The required time (RT) at the endpoint evaluates to 2.848 ns.
Setup slack = RT – AT = 2.848 ns – 2.477 ns = 0.371 ns

Comparison of the timing results after placement with those before placement:
● Required time and arrival time have increased.
● Worst slack has decreased.

Before placement, static timing analysis has no cell locations to work from and computes the slack without them. The placement engine divides the standard cells into bins and then moves them to legal positions while trading off estimated wire length, timing and congestion, so the timing computed after placement reflects wire lengths estimated from the actual cell positions. As a result the worst slack decreases.

Timing Report Hold:

The above timing report shows the most critical hold path in the circuit. It runs from the beginpoint scan_en (v), i.e. the scan_en pin, to the endpoint dmx/out_reg_reg[27]/SE (v), i.e. the SE pin of out_reg_reg[27] in dmx.

AT Computation: - In the above timing report only the delay from the falling edge of the scan_en pin to the falling edge at the SE pin of dmx, i.e. 0.006 ns, is added, because the input scan_en itself does not contribute any delay. The AT at the end point is therefore 0.006 ns.

RT Computation: - In the above timing report RT is computed as other end arrival time (1.330 ns) + hold (0.194 ns, the time for which the input must not change after the clock edge). The required time (RT) at the endpoint evaluates to 1.524 ns.
Hold slack = AT – RT
= 0.006 ns – 1.524 ns ≈ -1.517 ns (as reported)
Area Information :

Area Report :
Area of standard cells : 2915.579 um^2
Area of Buffers : 0.000 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2110.994 um^2
Area of other combinational cells : 784.148 um^2

Power Information :

Power Report :
Internal power: - Internal power is the power dissipated within the boundary of a cell. During switching it is dissipated by the charging or discharging of the capacitances internal to the cell. The total internal power is 0.7813 mW.

Switching power: - The switching power of a driving cell is the power dissipated by the charging and discharging of the load capacitance at the output of the cell. The total switching power is 0.02794 mW.
Leakage power: - Leakage power is due to the leakage current that flows from VDD to GND when there is no switching. The total leakage power is 0.01859 mW.

Total power: - Total power is the sum of all the above powers, which is 0.8278 mW.
Snapshot of the layout design with flight lines: -
2.2. Utilization 0.8 (smaller die area)

Timing Information :

Timing Report Setup:

The above timing report shows the critical path. It runs from the beginpoint in0/Q_reg[3]/Q (^), i.e. the in0 instance with net name Q_reg[3], which is usually the Q pin of a FF, to the endpoint pty/par_reg_reg/D (^), i.e. the pty instance with net name par_reg_reg, which is usually the D pin of a FF.
AT Computation: - Starting from the already computed arrival time (other end arrival time) of 1.330 ns at the begin point and traversing to the end point, the delays of the cells along the path are added up. For example, the delay from the rising edge of the clock to the Q pin, 0.472 ns, is added to 1.330 ns to get the AT of Q as 1.802 ns. Likewise, the arrival time at each subsequent vertex is obtained by adding the cell delay to the arrival time already computed. From the above timing report, the AT at the end point is 2.349 ns.

RT Computation: - In the above timing report RT is computed as other end arrival time (1.330 ns) + phase shift (2.500 ns, the clock period) – setup (0.298 ns, the setup time of the capturing flip-flop) – uncertainty (0.8 ns, the clock jitter value). The required time (RT) at the endpoint evaluates to 2.732 ns.
Setup slack = RT – AT = 2.732 ns – 2.349 ns = 0.383 ns
Comparison of the timing results after placement with those before placement:
● RT and AT have increased.
● Worst slack has decreased.

Before placement, static timing analysis has no cell locations to work from and computes the slack without them. The placement engine divides the standard cells into bins and then moves them to legal positions while trading off estimated wire length, timing and congestion, so the timing computed after placement reflects wire lengths estimated from the actual cell positions. As a result the worst slack decreases.

Timing Report Hold:

The above timing report shows the critical hold path. It runs from the beginpoint scan_en (v), i.e. the scan_en pin, to the endpoint dmx/out_reg_reg[7]/SE (v), i.e. the SE pin of out_reg_reg[7] in dmx.

AT Computation: - Only the delay from the falling edge of the scan_en pin to the falling edge at the SE pin of dmx, i.e. 0.003 ns, is added, because the input scan_en itself does not contribute any delay. The AT at the end point is therefore 0.003 ns.

RT Computation: - In the above timing report RT is computed as other end arrival time (1.330 ns) + hold (0.186 ns, the time for which the input must not change after the clock edge). The required time (RT) at the endpoint evaluates to 1.516 ns.
Hold slack = AT – RT
= 0.003 ns – 1.516 ns = -1.513 ns
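
The hold check follows the same pattern as the setup check, with the roles of AT and RT swapped. Below is a minimal Python sketch using the figures from this report; the function name hold_slack is a hypothetical helper.

```python
def hold_slack(data_arrival, other_end_arrival, hold, phase_shift=0.0):
    """Hold check in ns: data must arrive later than capture clock arrival + hold."""
    required = other_end_arrival + phase_shift + hold
    return data_arrival - required, required

# scan_en -> dmx/out_reg_reg[7]/SE after placement at 0.8 utilization.
slack, required = hold_slack(data_arrival=0.003, other_end_arrival=1.330, hold=0.186)
print(f"RT = {required:.3f} ns, hold slack = {slack:.3f} ns")
```

The slack is negative because the data from the input port arrives almost immediately, while the capture clock is modelled with 1.330 ns of latency. The violation is left to be fixed once the real clock tree exists.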
Area Information :
Area of standard cells : 2915.579 um^2
Area of Buffers : 0.000 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2110.994 um^2
Area of other combinational cells : 784.148 um^2

Power Information :

Power Report :
Internal power: - Internal power is the power dissipated within the boundary of a cell. During switching it is dissipated by the charging or discharging of the capacitances internal to the cell. The total internal power is 0.7821 mW.

Switching power: - The switching power of a driving cell is the power dissipated by the charging and discharging of the load capacitance at the output of the cell. The total switching power is 0.02856 mW.
Leakage power: - Leakage power is due to the leakage current that flows from VDD to GND when there is no switching. The total leakage power is 0.01859 mW.

Total power: - Total power is the sum of all the above powers, which is 0.8292 mW.
Snapshot of the layout design with flight lines: -

EFFECT OF PLACEMENT ON TIMING :


When the utilization is 0.5, 50% of the area is used for the placement of cells and the remaining 50% is left for routing. Similarly, when the utilization is 0.8, 80% of the area is used for the placement of cells and the remaining 20% is left for routing. Timing optimizations are done during routing, whereas during placement the timing is estimated from wire length estimates. So when the area for routing is smaller and the area for placement is larger, we tend to have comparatively looser timing constraints in the latter case.
3. After Clock Tree Synthesis

In clock tree synthesis the complete routing of the clock is done. CTS inserts clock buffers and interconnect in the layout from the clock source up to the register clock pins.

Clock signals are very critical, so to avoid detours the clock is routed before the signal routing.
The commands below are used to perform CTS.

In the above command different clock buffers are used to perform CTS.

3.1. Utilization 0.5

Post CTS Pre Optimization:

A. TIMING INFORMATION

Timing Setup
The above timing report shows the worst path, i.e. the path whose slack is the worst among all paths. It runs from the beginpoint in2/Q_reg[4]/Q (^) in the in2 instance to the endpoint dmx/out_reg_reg[23]/D (^) in dmx, with net name out_reg_reg[23].

Setup slack = Required time – Arrival time = 2.836 – 2.670 = 0.166 ns. The slack at every vertex along this path is the same, although the arrival and required times differ from vertex to vertex.

Timing Hold

Hold checks are done at the same edge of the clock. In the above timing report the beginpoint is the input port scan_en and the endpoint is the SE pin of dmx/out_reg_reg[30].

For the required time calculation, the latency of the capture clock is given by Other End Arrival Time, i.e. 1.299 ns, and the path delay associated with it is given in the other end path. As both clock edges appear at the same time, the phase shift is 0 ns. The hold time of the flop is 0.169 ns. Therefore the required time at the end point = 1.299 + 0.169 + 0 = 1.468 ns.

The arrival time at the end point is 0.005 ns.

Therefore hold slack = Arrival time – Required time = 0.005 – 1.468 = -1.463 ns.

Hold violations are fixed after CTS because only then is the exact clock network known.
POST CTS OPTIMIZATION:

Before CTS we assume the clock to be ideal for timing analysis and add uncertainty and latency to approximate the behavior of a real clock, but this is only an estimate. After CTS we have the exact interconnect routing of the clock, so timing is calculated accurately. Timing optimization can therefore be done after CTS.

Timing (setup):

Slack = 0.107 ns

During post-CTS optimization some buffers are added, so the delay in the paths increases and we get a reduced slack.

Effect of timing of clock path on overall slack: We provide a latency of 1.33 ns in the constraints for both the launch and the capture clock, and before CTS the tool used this value for timing analysis. After CTS the clock network is built and the actual clock delays are used, i.e. 1.329 ns (latency of both the capture and the launch flop). Although the arrival time and the required time both decrease, the slack is not affected by the latency because the overall skew remains zero. CTS tools try to keep the skew at zero, as unplanned skew can degrade performance. The slack decreases because the setup time increases to 0.265 ns (before CTS it was 0.194 ns), which reduces the required time and hence the slack compared to pre-CTS.
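
The reason equal launch and capture latencies do not change the setup slack can be seen by writing the check out explicitly. In the Python sketch below the latency, setup and uncertainty values are taken from the text, while the data path delay of 1.2 ns is a made-up placeholder and setup_slack is a hypothetical helper.

```python
def setup_slack(launch_latency, capture_latency, data_delay,
                period, setup, uncertainty):
    """Setup slack in ns for a single-cycle path between two flip-flops."""
    arrival = launch_latency + data_delay
    required = capture_latency + period - setup - uncertainty
    return required - arrival

common = dict(data_delay=1.2, period=2.5, uncertainty=0.8)  # data_delay is hypothetical

ideal = setup_slack(1.330, 1.330, setup=0.194, **common)    # constraint latency
built = setup_slack(1.329, 1.329, setup=0.194, **common)    # latency after CTS, zero skew
tighter = setup_slack(1.329, 1.329, setup=0.265, **common)  # larger setup time

print(f"{ideal:.3f} {built:.3f} {tighter:.3f}")
```

With zero skew the latency appears in both the arrival and the required time and cancels, so only the change in setup time (0.265 – 0.194 = 0.071 ns) shows up as lost slack in this model.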

Timing Report Hold:

After optimization, the hold violations are fixed. The worst hold path after optimization differs from the one before optimization.

Worst hold slack after optimization is 0.051 ns.

The path is from the input port scan_en to the SE pin of dmx/out_reg_reg[3].

Required time at the endpoint = 1.298 (Other End Arrival Time) + 0.00 (phase shift) + 0.132 (hold) = 1.430 ns

Arrival time = 1.481 ns

Hence slack = arrival time – required time = 1.481 – 1.430 = 0.051 ns

Effect of timing of clock path on overall slack: After CTS the clock network is built, so the actual clock delays are used, i.e. 1.298 ns (latency of both the capture and the launch flop). Extra delay is inserted in the paths, which increases the arrival time. Hence the hold slack improves and the timing violations are removed.

B. AREA INFORMATION
Area report after CTS pre-optimization:
Area of standard cells : 2876.977 um^2
Area of Buffers : 15.138 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2110.994 um^2
Area of other combinational cells : 730.409 um^2
Area report after CTS post-optimization:
Area of standard cells : 2995.053 um^2
Area of Buffers : 153.651 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2090.558 um^2
Area of other combinational cells : 730.409 um^2
C. POWER INFORMATION

Power report post CTS pre optimization:

Total Power consumed= 0.8278 mW.


Power report post CTS post optimization:

Total power consumed= 0.8278 mW

D. SNAP - SHOT OF LAYOUT SHOWING CLOCK TREE:


3.2. Utilization 0.8

Post CTS Pre Optimization:


A. TIMING INFORMATION

Timing Setup

In the timing report shown below, the tool finds the worst path, i.e. the path whose slack is the worst among all paths. The path runs from the beginpoint in in2 with net name Q_reg[4] to the endpoint, the D pin of out_reg_reg[28].

Other End Arrival Time is the latency of the capture clock, i.e. the time required for the clock signal to reach the clock pin of out_reg_reg[28], which is 1.330 ns. The clock network is built during CTS, so the actual delays associated with the clock path are taken into account.

Setup is the setup time of the capturing flip-flop, i.e. 0.210 ns. An uncertainty of 0.8 ns is provided in the constraints to account for jitter in the clock period. So the required time at the end point = 2.819 ns.

The overall arrival time at the endpoint is 2.644 ns.

Setup slack = Required time – Arrival time = 2.819 – 2.644 = 0.176 ns (as reported). The slack at every vertex along this path is the same, although the arrival and required times differ from vertex to vertex.

Timing Hold

Hold checks are done at the same edge of the clock. Here the beginpoint is DFT_sdi_1 and the endpoint is in2/Q_reg[2]/SI.

For the required time calculation, the latency of the capture clock is given by Other End Arrival Time, i.e. 1.299 ns, and the path delay associated with it is given in the other end path. As both clock edges appear at the same time, the phase shift is 0 ns. The hold time is 0.163 ns. Therefore the required time at the end point = 1.299 + 0.163 + 0 = 1.462 ns.

Arrival time = 0.000 ns.

Therefore hold slack = Arrival time – Required time = 0.000 – 1.462 = -1.462 ns.

Hold violations are fixed after CTS because only then is the exact clock network known.

POST CTS OPTIMIZATION:

Before CTS we assume the clock to be ideal for timing analysis and add uncertainty and latency to approximate the behavior of a real clock, but this is only an estimate. After CTS we have the exact interconnect routing of the clock, so timing is calculated accurately. Timing optimization can therefore be done after CTS.
Timing (setup):
Slack = 0.128 ns

During post-CTS optimization some buffers are added, so the delay in the paths increases and we get a reduced slack.

Effect of timing of clock path on overall slack: We provide a latency of 1.33 ns in the constraints for both the launch and the capture clock, and before CTS the tool used this value for timing analysis. After CTS the clock network is built and the actual clock delays are used, i.e. 1.329 ns (latency of both the capture and the launch flop). Although the arrival time and the required time both decrease, the slack is not affected by the latency because the overall skew remains zero. CTS tools try to keep the skew at zero, as unplanned skew can degrade performance. The slack decreases because the setup time increases to 0.281 ns (before CTS it was 0.210 ns), which reduces the required time and hence the slack compared to pre-CTS.
Timing Report Hold:

After optimization, the hold violations are fixed. The worst hold path can differ before and after optimization; here it is again the path from DFT_sdi_1 to in2/Q_reg[2]/SI.

Worst hold slack after optimization is 0.005 ns.

The path is from DFT_sdi_1 to in2/Q_reg[2]/SI.

Required time at the endpoint = 1.299 (Other End Arrival Time) + 0.00 (phase shift) + 0.098 (hold) = 1.397 ns

Arrival time = 1.402 ns

Hence slack = arrival time – required time = 1.402 – 1.397 = 0.005 ns

Effect of timing of clock path on overall slack: After CTS the clock network is built, so the actual clock delays are used, i.e. 1.299 ns (latency of both the capture and the launch flop). Extra delay is inserted in the paths, which increases the arrival time. Hence the hold slack improves and the timing violations are removed.
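
How much extra delay a hold fix needs follows directly from the negative slack. The Python sketch below is a rough estimate only: the per-buffer delay of 0.1 ns is a made-up value, the helper names are hypothetical, and a real optimizer also resizes cells and re-times the path, which changes the hold requirement itself (here from 0.163 ns to 0.098 ns).

```python
import math

def hold_fix_delay(hold_slack, target_slack=0.0):
    """Extra data-path delay (ns) needed to raise the hold slack to the target."""
    return max(0.0, target_slack - hold_slack)

def buffers_needed(hold_slack, buffer_delay, target_slack=0.0):
    """Number of identical delay buffers that provide at least that extra delay."""
    return math.ceil(hold_fix_delay(hold_slack, target_slack) / buffer_delay)

# Pre-optimization hold slack on DFT_sdi_1 -> in2/Q_reg[2]/SI is -1.462 ns.
extra = hold_fix_delay(-1.462)
count = buffers_needed(-1.462, buffer_delay=0.1)  # 0.1 ns per buffer is hypothetical
print(f"extra delay {extra:.3f} ns, about {count} buffers")
```

This is consistent in spirit with the area reports, where the buffer area grows sharply during post-CTS optimization while the other categories stay almost unchanged.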
B. AREA INFORMATION

Area report after CTS pre-optimization:


Area of standard cells : 2876.977 um^2
Area of Buffers : 15.138 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2110.994 um^2
Area of other combinational cells : 730.409 um^2
Area report after CTS post-optimization:
Area of standard cells : 3145.676 um^2
Area of Buffers : 305.788 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2089.044 um^2
Area of other combinational cells : 730.409 um^2
C. POWER INFORMATION

Power report post CTS pre optimization:

Total power consumed= 0.8292 mW


Power report post CTS post optimization:

Total power consumed= 0.8292 mW


D. SNAP - SHOT OF LAYOUT SHOWING CLOCK TREE:
4. After Detailed routing:

The routing step establishes the interconnections among the components that have been placed in the design.
For a given net, connections are made between all its pins while satisfying certain constraints.
The objective of this step is to minimize the wire length, routing area and number of vias.

4.1.Utilization 0.5 :

A. TIMING INFORMATION
Timing report (setup):

Setup slack = 0.095 ns

After the detailed routing step we know the exact interconnect layout, in which each wire has its own resistance and capacitance, so different and more realistic delays can now be seen.

During timing analysis the actual interconnect delays are considered, so the arrival time increases and the overall slack reduces.
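
A common first-order way to see how wire resistance and capacitance turn into delay is the Elmore model. The Python sketch below applies it to a uniform wire split into RC segments. It is an illustration only: the function elmore_delay and all values are hypothetical, and the tool works from extracted parasitics with its own delay calculator.

```python
def elmore_delay(r_total, c_total, c_load=0.0, segments=10):
    """Elmore delay of a uniform RC wire modelled as a ladder of equal segments."""
    r_seg = r_total / segments
    c_seg = c_total / segments
    delay = 0.0
    for i in range(1, segments + 1):
        upstream_r = i * r_seg            # resistance between the driver and node i
        delay += upstream_r * c_seg       # each node capacitance charges through it
    return delay + r_total * c_load       # the load sees the full wire resistance

# Arbitrary units: doubling the wire length doubles both R and C.
short_wire = elmore_delay(r_total=2.0, c_total=4.0, c_load=1.0, segments=2)
long_wire = elmore_delay(r_total=4.0, c_total=8.0, c_load=1.0, segments=2)
print(short_wire, long_wire)
```

Because both R and C scale with length, the wire part of the delay grows roughly with the square of the length, which is why detours and extra resistance in a route show up directly in the arrival time.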
Timing report (hold):

Hold slack=0.062 ns

B. AREA INFORMATION
Area report:
Area of standard cells : 2995.053 um^2
Area of Buffers : 153.651 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2090.558 um^2
Area of other combinational cells : 730.409 um^2
C. POWER INFORMATION

Power report:
Total power consumed = 0.8278 mW

D. SNAPSHOT OF LAYOUT OF DESIGN AND CONNECTIVITY SHOWING DIFFERENT METAL LAYERS
4.2. Utilization 0.8 :

A. TIMING INFORMATION

Timing report (setup):

Setup slack=0.117 ns
Timing report (hold):

Hold slack =0.003 ns

B. AREA INFORMATION

Area report:
Area of standard cells : 3145.676 um^2
Area of Buffers : 305.788 um^2
Area of Inverters : 20.436 um^2
Area of flip-flops : 2089.044 um^2
Area of other combinational cells : 730.409 um^2

C. POWER INFORMATION

Power report:
Power consumed = 0.8292 mW

D. SNAPSHOT OF LAYOUT OF DESIGN AND CONNECTIVITY SHOWING DIFFERENT METAL LAYERS

Effects of changing metal layers for routing:

In our design the routing is done on layers M1-M9. If we instead route on M2-M9, or otherwise do not use all the layers for routing, then a pin that lies outside the allowed layers has to be reached through vias, which adds resistance. This leads to an increase in delay and hence in the arrival time.
