VDF Project Part 2 Group 20
ECE-513
PROJECT PART-2
Group Number - 20
5. Constraint File: (the same constraints that were used for the tighter timing)
At the floorplanning stage, the layout of the interconnects has not yet been decided,
so it is difficult to estimate how much die area must be reserved for them. The
utilization factor (UF), defined as the total standard-cell area divided by the core
area, is used to account for this.
For UF = 0.8, the cells occupy 80% of the core area and only the remaining 20% is
left for routing.
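As a worked illustration of UF = cell area / core area, the short Python calculation below sizes the core for the pre-placement cell area reported later in this document (3431.028 µm²) at both utilization factors. It assumes, for simplicity, that the floorplan is sized from that cell area alone; the helper name `core_area_for` is hypothetical.

```python
def core_area_for(cell_area_um2: float, utilization: float) -> float:
    """Core area (um^2) needed so that the cells fill the given fraction of it."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area_um2 / utilization


CELL_AREA = 3431.028  # um^2, pre-placement cell area from the area report

for uf in (0.5, 0.8):
    core = core_area_for(CELL_AREA, uf)
    free_for_routing = core - CELL_AREA  # core area not covered by cells
    print(f"UF={uf}: core={core:.3f} um^2, left for routing={free_for_routing:.3f} um^2")
```

For UF = 0.8 the leftover area is exactly 20% of the core, matching the description above.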
All physical design steps were carried out separately for the two utilization factors,
UF = 0.5 and UF = 0.8.
Sanity checks are crucial for verifying the accuracy and consistency of inputs during
the physical design process. Prior to floor planning, conducting sanity checks is
essential to validate the correctness of the netlist, standard cell library, and
constraints. These checks help pre-empt potential issues during subsequent
placement and routing stages. Therefore, incorporating sanity checks at various
points during placement and routing is vital to mitigate risks and ensure smoother
progress in the design process.
9. Power Planning:
# Adding Rings
addRing -skip_via_on_wire_shape Noshape -skip_via_on_pin Standardcell \
    -center 1 -stacked_via_top_layer Metal9 -type core_rings \
    -jog_distance 0.435 -threshold 0.435 -nets {VSS VDD} -follow core \
    -stacked_via_bottom_layer Metal1 \
    -layer {bottom Metal8 top Metal8 right Metal9 left Metal9} \
    -width 1.25 -spacing 0.4 -offset 0.435
# Adding Stripes
addStripe -skip_via_on_wire_shape Noshape -block_ring_top_layer_limit Metal9 \
    -max_same_layer_jog_length 0.88 -padcore_ring_bottom_layer_limit Metal7 \
    -number_of_sets 10 -skip_via_on_pin Standardcell \
    -stacked_via_top_layer Metal9 -padcore_ring_top_layer_limit Metal9 \
    -spacing 0.4 -merge_stripes_value 0.435 -layer Metal8 \
    -block_ring_bottom_layer_limit Metal7 -width 0.44 -nets {VDD VSS} \
    -stacked_via_bottom_layer Metal1
12. CTS:
#/*Clock Tree Synthesis*/
set_ccopt_mode -cts_buffer_cells {CLKBUFX12 CLKBUFX16 CLKBUFX2
    CLKBUFX20 CLKBUFX3 CLKBUFX4 CLKBUFX6 CLKBUFX8 CLKINVX1 CLKINVX12
    CLKINVX16 CLKINVX2 CLKINVX20 CLKINVX3 CLKINVX4 CLKINVX6 CLKINVX8} \
    -cts_opt_priority all
create_ccopt_clock_tree_spec -file $report_dir/cts_spec/ccopt_new.spec \
    -keep_all_sdc_clocks -views {view_1 view_1}
source $report_dir/cts_spec/ccopt_new.spec
13. Routing:
#/*Global & Detail Routing*/
setNanoRouteMode -quiet -timingEngine {}
setNanoRouteMode -quiet -routeWithSiPostRouteFix 0
setNanoRouteMode -quiet -drouteStartIteration default
setNanoRouteMode -quiet -routeTopRoutingLayer default
setNanoRouteMode -quiet -routeBottomRoutingLayer default
setNanoRouteMode -quiet -drouteEndIteration default
setNanoRouteMode -quiet -routeWithTimingDriven false
setNanoRouteMode -quiet -routeWithSiDriven false
routeDesign -globalDetail
14. GDS Writing:
#/*Generating GDS*/
streamOut rtl_module.gds -mapFile streamOut.map -libName DesignLib \
    -units 2000 -mode ALL
1. For UF = 0.5
1.3. After adding input slew and output load to the constraint file
set_input_transition 0.4 [get_ports "rst"]
set_input_transition 0.4 [get_ports "cruse_button"]
set_input_transition 0.4 [get_ports "add"]
set_input_transition 0.4 [get_ports "sub"]
set_input_transition 0.4 [get_ports "speed"]
set_input_transition 0.4 [get_ports "brake"]
set_load 2 [all_outputs]
1.4. Setup Slack for worst path (effect of slew):
As the input transition time (slew) increases, the cell delay increases and the arrival
time grows accordingly. This reduces the setup slack (required time minus arrival
time) and increases the hold slack (arrival time minus required time).
1.6. Setup Slack for worst path (effect of load):
The delay of a specific path and consequently the overall slack are primarily
influenced by two critical factors: Input Slew and Output Load. When the load on
the output increases, the cell is required to charge or discharge a larger capacitance.
This process naturally takes more time, leading to an increase in delay along the
path. As a result, the available slack decreases since the signal needs more time to
propagate through the path, thereby reducing the timing margin. In summary, higher
output load leads to increased delay and decreased slack along the path.
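Both effects can be illustrated with a deliberately simplified, hypothetical linear delay model. Real Liberty libraries characterize cell delay with lookup tables (NLDM) indexed by input transition and output load, so the coefficients `d0`, `k_slew` and `r_drive_kohm` below are made-up placeholders rather than values from the library used here, and the load is assumed to be in pF.

```python
def cell_delay(input_slew_ns, load_pf, d0=0.05, k_slew=0.2, r_drive_kohm=0.5):
    """Hypothetical linear stand-in for an NLDM delay table (result in ns).

    kOhm * pF = ns, so r_drive_kohm * load_pf is an RC delay term.
    """
    return d0 + k_slew * input_slew_ns + r_drive_kohm * load_pf


def slacks(input_slew_ns, load_pf, launch_ns=0.3, setup_rt_ns=2.0, hold_rt_ns=0.1):
    """Return (setup slack, hold slack) for a one-stage path, all in ns."""
    arrival = launch_ns + cell_delay(input_slew_ns, load_pf)
    setup_slack = setup_rt_ns - arrival  # required time - arrival time
    hold_slack = arrival - hold_rt_ns    # arrival time - required time
    return setup_slack, hold_slack


for slew, load in [(0.1, 0.5), (0.4, 0.5), (0.4, 2.0)]:
    s, h = slacks(slew, load)
    print(f"slew={slew} ns, load={load} pF -> setup slack={s:.3f} ns, hold slack={h:.3f} ns")
```

Raising either the input slew or the output load lengthens the arrival time, which lowers setup slack and raises hold slack, the same trends described in sections 1.4 and 1.6.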
1.8. Positive Unate-ness:
A positive unate timing arc is characterized by the output signal rising or remaining
unchanged when there's a rising transition on the input, and falling or remaining
unchanged when there's a falling transition on the input. Examples of components
exhibiting positive unate behaviour include buffer, AND gate, and OR gate.
A negative unate timing arc is observed when the output signal falls or remains
unchanged upon a rising transition on the input, and rises or remains unchanged
upon a falling transition on the input. Components such as inverters, NAND gates,
and NOR gates exemplify negative unate behaviour.
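The definitions above can be checked mechanically from a cell's Boolean function: the output is positive unate in an input if raising that input (all other inputs fixed) never makes the output fall, and negative unate if it never makes the output rise. A minimal Python sketch of that test, using a hypothetical helper named `unateness`:

```python
from itertools import product


def unateness(f, n_inputs, i):
    """Classify Boolean function f (inputs 0/1) with respect to input i."""
    can_rise = can_fall = False
    for bits in product((0, 1), repeat=n_inputs):
        if bits[i] == 1:
            continue  # only consider input i going 0 -> 1
        low = f(*bits)
        high = f(*(bits[:i] + (1,) + bits[i + 1:]))
        if high > low:
            can_rise = True
        elif high < low:
            can_fall = True
    if can_rise and can_fall:
        return "non-unate"
    if can_fall:
        return "negative unate"
    if can_rise:
        return "positive unate"
    return "independent"  # the output never changes with this input


cells = {
    "BUF": (lambda a: a, 1),
    "INV": (lambda a: 1 - a, 1),
    "AND2": (lambda a, b: a & b, 2),
    "NOR2": (lambda a, b: 1 - (a | b), 2),
    "XOR2": (lambda a, b: a ^ b, 2),
}
for name, (fn, n) in cells.items():
    print(name, unateness(fn, n, 0))
```

XOR is the usual example of a non-unate arc: whether a rising input produces a rising or a falling output depends on the other input.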
The unate-ness of a timing arc plays a crucial role in VLSI design because it
determines which transitions static timing analysis must propagate. For a unate arc,
each input transition maps to a single output transition direction (rise to rise for a
positive unate arc, rise to fall for a negative unate arc), so only one delay and slew
per input edge has to be computed. A non-unate arc, such as the input-to-output arc
of an XOR gate, can produce either output transition depending on the other inputs,
so both must be analysed, which makes timing analysis and optimization more
expensive. Knowing the unate-ness of an element therefore reduces the number of
input/output transition combinations to consider, saving computation time and
aiding efficient design. Overall, considering the unate-ness of logic functions is vital
for optimizing timing behaviour and performance in VLSI design.
GBA stores the maximum (Arrival Time + Delay) and the maximum slew observed
from all possible timing arcs at each timing vertex. Consequently, the slack
computed using GBA tends to be pessimistic, as it considers the worst delay and
slew, even if they're not on the same critical path. This approach ensures safety but
may result in less accuracy.
On the other hand, PBA is more realistic and less pessimistic, but computationally
intensive. PBA first identifies the critical paths and then recomputes the delays along
each path using the slew actually propagated along that path. Slack calculated with
PBA therefore reflects the delay of the critical path together with the slew at the input
of that path, giving a more accurate representation of timing behavior.
The slack calculated on the worst Hold path is the same for both PBA and GBA in
this case.
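The difference can be shown on a single timing vertex reached by two paths. In the hypothetical numbers below, path A arrives late with a sharp slew and path B arrives earlier with a slow slew. GBA combines A's arrival with B's slew, a pairing that no real path has, while PBA evaluates each path with its own slew. The linear stage-delay model and all values are assumptions for illustration only.

```python
# Each incoming path at vertex V: (arrival time at V in ns, slew at V in ns).
paths = {"A": (1.00, 0.05), "B": (0.80, 0.30)}


def stage_delay(slew_ns, base=0.10, k=0.5):
    """Hypothetical linear delay of the next cell as a function of input slew."""
    return base + k * slew_ns


def gba_arrival(paths):
    # GBA: keep only the worst arrival and the worst slew at the vertex.
    worst_at = max(at for at, _ in paths.values())
    worst_slew = max(slew for _, slew in paths.values())
    return worst_at + stage_delay(worst_slew)


def pba_arrival(paths):
    # PBA: propagate each path with its own slew, then take the worst.
    return max(at + stage_delay(slew) for at, slew in paths.values())


required = 1.20  # ns, hypothetical setup required time at the next pin
gba_slack = required - gba_arrival(paths)
pba_slack = required - pba_arrival(paths)
print(f"GBA slack = {gba_slack:.3f} ns, PBA slack = {pba_slack:.3f} ns")
```

Here GBA reports a setup violation that PBA shows to be pessimism, which is why GBA slack is generally lower than PBA slack.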
2.1. Worst path Setup Slack (with input slew and output load):
Since we are using the same netlist after scan insertion, in which the D flip-flops
(DFFs) were replaced by scan flip-flops, the area increased significantly. When we
assess the area before placement, it is the same, 3431.028 µm², for both core
utilization factors, 0.5 and 0.8.
During the placement step, the standard cells are assigned locations in the core, and
the design is then optimized to fix timing violations.
1. UF = 0.5
2. UF = 0.8
GBA takes the worst slew and delay at each node while traversing the timing graph,
whereas PBA recomputes timing along each specified path individually. GBA
therefore gives the most pessimistic setup/hold results: for setup, the arrival time
calculated with GBA is higher than with PBA, so GBA slacks are generally lower
than PBA slacks.
Before placement, the setup slack was -2.171 ns; after placement, it degraded to
-2.843 ns for UF = 0.5. This is caused by the increased wire load: after placement,
the wire load is estimated from the actual cell locations, which increases the total
load on each driver, and as the load increases, the delay increases. The hold slack,
however, remains unchanged in this case.
The area is unchanged before and after placement. Placement partitions the circuit
and assigns locations to the cells using algorithmic and mathematical techniques;
here it did not add or remove cells, so the number of cells and instances stayed
constant, as the cell and area reports confirm. Consequently, the overall design area
also remains constant throughout the placement process.
For both UF values, the power consumption decreased after placement. This
happens because placement takes into account the actual resistance and capacitance
of the wires, which differ slightly from the assumed wire load model. However, the
reduction is more pronounced for UF = 0.5. With a higher core utilization factor, less
area is left for wires and paths, so the interconnect is more congested, which adds
resistance and capacitance and eventually raises power consumption.
Analysis of Reports and Layouts generated after CTS:
Clock Tree Synthesis (CTS) is a critical stage in the design process of integrated
circuits. It involves distributing a real clock signal from a single source to multiple
registers throughout the layout. CTS achieves this by constructing a tree-like
structure of clock buffers and interconnects, ensuring effective propagation of the
clock signal. The primary objective of CTS is to create a balanced clock network,
minimizing global clock skew to synchronize the operation of all components within
the design.
CTS then optimizes the clock tree to meet setup and hold requirements for all paths,
adding buffers and inverters as needed. The process iteratively adjusts the clock
network with the goal of positive slack on all paths, which indicates successful
optimization. Finally, the updated netlist is passed on to routing.
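Global skew, which CTS tries to minimize, is the spread between the latest and earliest clock arrival over all register clock pins; local skew is the difference for one launch/capture pair. A minimal sketch with hypothetical flip-flop names and insertion delays (not taken from this design's reports), using the capture-minus-launch sign convention for local skew:

```python
def global_skew(clock_arrivals_ns):
    """Global skew = latest minus earliest clock arrival over all sinks."""
    arrivals = list(clock_arrivals_ns.values())
    return max(arrivals) - min(arrivals)


def local_skew(clock_arrivals_ns, launch, capture):
    """Skew seen by one launch/capture register pair (capture - launch)."""
    return clock_arrivals_ns[capture] - clock_arrivals_ns[launch]


# Hypothetical post-CTS clock insertion delays (ns) at three flip-flops.
arrivals = {"ff_a": 0.52, "ff_b": 0.55, "ff_c": 0.49}
print(f"global skew = {global_skew(arrivals):.3f} ns")
print(f"local skew ff_a -> ff_b = {local_skew(arrivals, 'ff_a', 'ff_b'):.3f} ns")
```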
The clock latency specified in the constraint file has been removed: after CTS, the
clock delay from its source is no longer assumed but comes from the synthesized
clock tree, so removing it lets us observe the actual effect of CTS.
1. UF = 0.5
After completing clock tree synthesis (CTS) optimization and running timing
analysis with the same clock period as during placement, we found that the hold
violations were resolved but setup violations persisted. To address this, we changed
the clock period to 3 ns and re-ran the CTS optimization script and the timing
analysis. As a result, all setup and hold paths were met with positive slack values.
When the clock period is increased from 2.457 ns to 3 ns, the required time (clock
period minus setup time) for the path starting at w1_reg[3]/Q increases from
2.091 ns to 2.549 ns. Consequently, the setup slack (required time minus arrival
time) also increases, which resolves the setup violations.
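The setup relationship can be written as a small calculation: required time = clock period + capture clock latency - setup time, and setup slack = required time - arrival time. The sketch below uses hypothetical numbers, not the values from this report, and holds every term except the period fixed. In the real post-CTS report the clock-tree terms can also change, so the required time need not shift by exactly the period difference.

```python
def setup_slack(period_ns, arrival_ns, t_setup_ns,
                capture_latency_ns=0.0, uncertainty_ns=0.0):
    """Setup slack = required time - arrival time (all values in ns)."""
    required = period_ns + capture_latency_ns - t_setup_ns - uncertainty_ns
    return required - arrival_ns


# Hypothetical worst path: data arrives at 2.5 ns, library setup time 0.4 ns.
for period in (2.5, 3.0):
    print(f"T={period} ns -> setup slack={setup_slack(period, 2.5, 0.4):.3f} ns")
```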
As the timing report indicates, a clock buffer, CLKBUFX2, has been inserted into the
path during clock tree synthesis (CTS); it was absent before CTS. The delay of this
buffer is now part of the path, increasing the arrival time (AT) of the path from the
begin point scan_en from 0.010 ns to 0.540 ns. Consequently, the hold slack
(AT - RT) has increased compared with the post-placement step.
2. UF = 0.8
Following clock tree synthesis (CTS) optimization with a clock period of 3ns, which
was utilized for UF = 0.5, and conducting timing analysis, we observed that setup
violations in path-based analysis (PBA) were resolved. However, setup violations
persisted in graph-based analysis (GBA), and hold violations were still present.
To resolve this, we changed the clock period to 3.5 ns and re-ran the CTS
optimization script and the timing analysis. All setup and hold paths were then met
with positive slack values, so a clock period of 3.5 ns was selected for UF = 0.8.
When transitioning from a 2.457 ns clock period to a 3 ns clock period, the required
time (RT) for the path starting at scan_en increases from 0.350 ns to 0.614 ns. For
a hold check, the required time is the sum of the capture clock arrival time
(Tcapture) and the hold time (Thold); the clock period itself does not enter this sum,
so the increase reflects changes in Tcapture and Thold.
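The hold check can be sketched the same way: required time = capture clock arrival + hold time, and hold slack = arrival time - required time. The clock period does not appear in this check; hold slack is improved by adding delay to the data path, as with the CLKBUFX2 and DLYX1 cells discussed in this report. All numbers below are hypothetical.

```python
def hold_slack(arrival_ns, capture_clock_ns, t_hold_ns, uncertainty_ns=0.0):
    """Hold slack = arrival time - required time (all values in ns)."""
    required = capture_clock_ns + t_hold_ns + uncertainty_ns
    return arrival_ns - required


# Hypothetical short path: data arrives at 0.05 ns, the capture clock
# arrives at 0.30 ns and the flop needs 0.05 ns of hold time.
before = hold_slack(0.05, 0.30, 0.05)
# The same path after inserting a hypothetical 0.40 ns delay cell.
after = hold_slack(0.05 + 0.40, 0.30, 0.05)
print(f"hold slack before={before:.3f} ns, after delay insertion={after:.3f} ns")
```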
2.6. Worst Hold Slack (for Clock period = 3.5 ns):
Examining the path from the begin point "sub", we observe an additional delay
element, DLYX1, in the path. It increases the arrival time (AT) of the signal to
0.293 ns, which raises the hold slack so that all hold paths are met at this clock
period.
3. Comparison Table:
After clock tree synthesis (CTS), the total area increased from 3431.028 µm² to
4948.612 µm² for UF = 0.5. This increase is attributed to the additional buffers and
delay instances inserted during CTS and its optimization; the buffers build the clock
tree so as to minimize overall clock skew. The extra instances expand the overall
area: the reported total area of 4948.612 µm² corresponds to 586 instances in total,
and the added instances include DLY1X1, CLKBUFX2, and others.
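The size of this increase follows directly from the two area figures above for UF = 0.5; a short Python calculation:

```python
def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


area_post_place = 3431.028  # um^2, post-placement area report
area_post_cts = 4948.612    # um^2, post-CTS area report (UF = 0.5)
print(f"added area = {area_post_cts - area_post_place:.3f} um^2")
print(f"increase = {percent_increase(area_post_place, area_post_cts):.1f} %")
```

This shows that CTS and its optimization grew the cell area by roughly 44%.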
The area increased in both cases compared to the post-placement step, but the
increase is slightly larger for UF = 0.5 than for UF = 0.8. This difference can likely
be attributed to the higher utilization factor, which places cells closer to each other.
As a result, for UF = 0.8, unnecessary cells are removed more efficiently, giving a
smaller overall area than for UF = 0.5.
For UF = 0.5, the power consumption increased after clock tree synthesis (CTS)
because of the instances added after placement.
However, for UF = 0.8, the total power after CTS decreased to 1.417 mW from
1.429 mW in the post-placement step. This decrease is attributed to optimization
measures taken after CTS, resulting in a reduction in internal power consumption to
1.099 mW. The optimization process involves more efficient usage and connection
of instances with the clock, leading to a decrease in internal power.
Despite the decrease in internal power, switching power and leakage power have
slightly increased due to the greater number of instances present in the design after
CTS. Additionally, the capacitance has increased due to the addition of extra
instances, contributing to the overall increase in total load.
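The link between added capacitance and switching power follows the standard dynamic-power relation P_switch = alpha * C * V^2 * f, where alpha is the switching activity. The sketch below uses hypothetical activity, capacitance and supply values (only the 3 ns period comes from this report) to show that a 10% rise in switched capacitance raises switching power by the same 10%; it is not a reproduction of the tool's power report.

```python
def switching_power_w(activity, cap_f, vdd_v, freq_hz):
    """Dynamic switching power P = alpha * C * V^2 * f, in watts."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical values: 10 % activity, 1.0 V supply, 3 ns clock period.
f_clk = 1.0 / 3e-9
p_before = switching_power_w(0.1, 2.0e-12, 1.0, f_clk)
p_after = switching_power_w(0.1, 2.2e-12, 1.0, f_clk)  # 10 % more capacitance
print(f"before: {p_before * 1e3:.4f} mW, after: {p_after * 1e3:.4f} mW")
```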
The routing stage of the design process involves the establishment of interconnect
paths for data path signals, divided into global and detailed routing phases.
Global Routing: In this stage, the die area is partitioned into a grid of Gcells
(Global Routing Cells) using various algorithms, and the interconnection path
between two sites is chosen at this coarse Gcell level.
Detailed Routing: Following global routing, detailed routing is conducted. This
stage finalizes exact tracks and via locations, incorporating metal layers and
necessary via connections.
Metal Layer Assignment: Different signals are routed through specific metal
layers, and corresponding vias are determined to complete the net.
Optimization: Upon completion of routing, optimization techniques are applied
to ensure that all paths meet hold/setup requirements, optimizing the layout for
timing and performance.
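The router's internal algorithms are not described in this report, but the idea of finding a path across a grid of routing cells can be illustrated with Lee's maze-routing algorithm, a classic breadth-first search that returns a shortest path around blockages. A minimal Python sketch on a hypothetical grid, where 1 marks a blocked cell; this is an illustration of the concept, not the tool's implementation.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Shortest list of (row, col) cells from src to dst, or None if blocked."""
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}  # visited cells and the cell they were reached from
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:  # backtrace from target to source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],  # blockage with a single opening at column 2
    [0, 0, 0, 0],
]
path = lee_route(grid, (0, 0), (2, 0))
print(path)
```

Because breadth-first search expands cells in order of distance, the first time it reaches the target it has found a shortest route through the only gap in the blockage.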
1. UF = 0.5
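As can be seen from the snippets above, the total wire length increases from global
routing to detailed routing, because detailed routing commits each net to exact
tracks and vias and must detour around blockages, whereas global routing only
estimates the paths at the Gcell level.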
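One way to see why routed wirelength exceeds an early estimate: bounding-box measures such as the half-perimeter wirelength (HPWL) are common wirelength estimates and are a lower bound on the rectilinear length of any route connecting the pins, while an actual route may have to detour. The pin coordinates and detour below are hypothetical, and this is not a claim about the exact estimator the tool uses.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box of a net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def routed_length(points):
    """Length of a rectilinear route given as a list of corner points."""
    return sum(abs(x2 - x1) + abs(y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


pins = [(0, 0), (4, 3)]                    # hypothetical 2-pin net (um)
detour = [(0, 0), (0, 5), (4, 5), (4, 3)]  # route forced around a blockage
print(hpwl(pins), routed_length(detour))
```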
Signal routing uses Metal1 through Metal5, while the power rings and stripes are laid
on the higher metal layers, Metal8 and Metal9.
1.6. Layout Showing Metal-1 and Metal-2 Layers:
2. UF = 0.8
3. Comparison Table:
During routing, a plan is established for each net to be routed, outlining the path it
will take. In detailed routing, the exact layout of each net is determined, and
additional metal tracks may be assigned to alleviate wire congestion, leading to
improved routing and better slack in both cases.
As the utilization factor (UF) increases, delays grow because of greater
standard-cell congestion and higher RC delays caused by the smaller area left for
routing, so the arrival time (AT) increases. Consequently, setup slack decreases from
UF = 0.5 to UF = 0.8, while hold slack increases, the opposite trend.
After detailed routing, the area and cell count for both cases remain exactly the same
as in the post-clock tree synthesis stage.
In the case of UF = 0.5, the total power has slightly increased. This increase can be
attributed to the routing engine performing several re-routings to resolve trial route
violations. Additionally, during the detailed path routing step (in post-route), the RC
delays of the paths may have increased. Consequently, the change in resistance and
capacitance levels likely led to an overall increase in total power consumption.