PnR-II-CTS Routing Chip Finishing
Clock Tree Synthesis
Clock Parameters
• Skew
  • Difference in clock arrival time at two different registers.
• Jitter
  • Difference in clock period between different cycles.
• Slew
  • Transition (trise/tfall) of clock signal.
• Insertion Delay
  • Delay from clock source until registers.
[Figure: Clock Skew, the difference in clock arrival time at two spatially distinct points; Clock Jitter, the difference in clock period between different cycles]
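A minimal Python sketch of the skew and jitter definitions, using hypothetical arrival times and edge timestamps in ns: skew compares one clock edge across several registers (spatial), while jitter compares the period across cycles at one point (temporal). Jitter is computed here as peak-to-peak period jitter, which is only one of several common jitter metrics.

```python
def clock_skew(arrivals):
    """Skew of one clock edge: latest minus earliest arrival across registers (ns)."""
    times = arrivals.values()
    return max(times) - min(times)

def period_jitter(edges):
    """Peak-to-peak period jitter at a single point (ns).

    edges: timestamps of consecutive rising edges observed at one register.
    """
    periods = [b - a for a, b in zip(edges, edges[1:])]
    return max(periods) - min(periods)

# Hypothetical arrival times of one clock edge at three registers (ns)
arrivals = {"FF1": 0.50, "FF2": 0.58, "FF3": 0.53}
# Hypothetical rising-edge timestamps at one register over four cycles (ns)
edges = [0.0, 1.00, 2.02, 2.99, 4.00]

print(f"skew   = {clock_skew(arrivals):.2f} ns")
print(f"jitter = {period_jitter(edges):.2f} ns")
```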
How do clock skew and jitter arise?
• Clock Generation
• Clock Distribution Network
• Coupling
• Local Clock Buffers
• Load
• Environment Variation
  • Temperature
  • Power Supply
[Figure: Intel 1998, 0.25um: variations in local clock load, local power supply, local gate length and threshold, local temperature]
Capture Clock Edge
The clock edge at which data is captured by the receiving register is known as the capture edge.
Local skew
Local skew is the difference in the arrival of the clock signal at the clock pins of related flops (flops connected by a timing path).
Global skew
Global skew is the difference in the arrival of the clock signal at the clock pins of non-related flops. It is also defined as the difference between the shortest and the longest clock path delays reaching any two sequential elements.
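To make the local/global distinction concrete, here is a small Python sketch with hypothetical post-CTS clock arrival times. "Related" flops are modeled as launch/capture pairs that share a timing path; flop D has no path to the others, so it only affects global skew.

```python
# Hypothetical clock arrival times (ns) at each flop's clock pin after CTS
latency = {"A": 0.40, "B": 0.47, "C": 0.52, "D": 0.35}

# Launch -> capture pairs that share a timing path ("related" flops)
paths = [("A", "B"), ("B", "C")]

def local_skew(latency, paths):
    """Worst |capture - launch| arrival difference over related flop pairs."""
    return max(abs(latency[cap] - latency[lau]) for lau, cap in paths)

def global_skew(latency):
    """Longest minus shortest clock path delay over all sinks, related or not."""
    return max(latency.values()) - min(latency.values())

print(f"local skew  = {local_skew(latency, paths):.2f} ns")
print(f"global skew = {global_skew(latency):.2f} ns")
```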
Positive Skew
If the capture clock arrives later than the launch clock, the skew is called +ve skew.
+ve skew can lead to hold violations.
+ve skew improves setup time.
Negative Skew
If the capture clock arrives earlier than the launch clock, the skew is called -ve skew.
-ve skew can lead to setup violations.
-ve skew improves hold time.
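The skew effects can be checked with the usual setup and hold inequalities. In this Python sketch (hypothetical delays in ns), skew is defined as capture clock arrival minus launch clock arrival; a long combinational path is used for the setup check and a short one for the hold check.

```python
def setup_slack(T, skew, t_cq, t_comb, t_setup):
    """Setup slack (ns); skew = capture clock arrival - launch clock arrival."""
    return (T + skew) - (t_cq + t_comb + t_setup)

def hold_slack(skew, t_cq, t_comb, t_hold):
    """Hold slack (ns): same-edge check, so the clock period does not appear."""
    return (t_cq + t_comb) - (t_hold + skew)

T, t_cq, t_su, t_h = 1.0, 0.10, 0.10, 0.05   # hypothetical values (ns)
long_path, short_path = 0.75, 0.02            # hypothetical combinational delays (ns)

for skew in (-0.10, 0.0, +0.10):
    s = setup_slack(T, skew, t_cq, long_path, t_su)
    h = hold_slack(skew, t_cq, short_path, t_h)
    print(f"skew={skew:+.2f}  setup slack={s:+.2f}  hold slack={h:+.2f}")
```

With these numbers, the -ve skew case fails setup on the long path and the +ve skew case fails hold on the short path, matching the rules above.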
Source Delay or Source Latency
Source latency is the delay from the clock origin point to the clock definition point in the design, i.e., from the clock source to the beginning of the clock tree.
In other words, it is the time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.
Uncertainty
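Clock uncertainty is the time difference between the arrivals of clock signals at registers in one clock domain or between domains. In STA it is specified as a margin that covers skew, jitter and any additional pessimism.
Clock latency
Clock latency is the sum of the source latency and the clock network latency.
Pre-CTS (ideal clock)
Latency = source latency + estimated network latency
Uncertainty = estimated skew + jitter + margin
Post-CTS (propagated clock)
Latency = source latency (network latency is now calculated from the built tree)
Uncertainty = jitter + margin (skew is now calculated from the built tree)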
Introduction
CTS builds a buffer/inverter network in order to balance the relative clock arrival times at the FFs belonging to a clock domain (i.e., triggered by the same clock).
→ [Global skew of each clock domain ≈ 0]
• Question:
• Answer…
  • Timing
  • Power
  • Area
  • Signal Integrity
[Figure: a clock net driving an array of FFs]
CTS vs HFNS
CTS uses clock buffers and clock inverters with equal rise and fall times (symmetric buffers/inverters), whereas high-fanout net synthesis (HFNS) uses buffers and inverters with relaxed rise and fall times.
HFNS is used mostly for reset, scan enable and other static signals with high fanout. For these nets there is no stringent requirement for skew balancing or power reduction.
Clock tree power is given special attention because the clock is a constantly switching signal. HFNS is mostly performed on static signals, so power needs much less attention.
Non-default routing (NDR) rules are used for clock tree routing.
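Where does the Clock Tree Begin and End?
• Begins at the clock source, defined with create_clock …
• Ends at the clock sinks (stop, float or exclude pins), e.g., the CLK pins of the flip-flops, which are STOP pins.
[Figure: clock source CLOCK driving flip-flops directly and through a clock gate (GATED CLK); each FF CLK pin is a STOP pin]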
Clock Trees
• Naïve approach (RC tree):
  • Route an individual clock net to each sink and balance the RC delay.
  • However, this would burn excessive power, and the large RC of each net would cause signal integrity issues.
• Instead, use a buffered tree:
  • Short nets mean lower RC values.
  • Buffers restore the signal for better slew rates.
  • Lower total insertion delay.
  • Less total switching capacitance.
[Figure: unbuffered RC tree vs. buffered clock tree driving an array of FFs]
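A back-of-the-envelope Python sketch of why buffering helps: the Elmore delay of an unbuffered distributed RC wire grows with the square of its length, while splitting it into k buffered segments divides the wire term by k at the cost of k buffer delays. The per-mm r and c values and the fixed buffer delay are hypothetical, and buffer output resistance and input capacitance are ignored.

```python
def unbuffered_delay(r, c, L):
    """Elmore delay of a distributed RC wire: 0.5 * r * c * L^2 (driver/load ignored)."""
    return 0.5 * r * c * L ** 2

def buffered_delay(r, c, L, k, t_buf):
    """Wire split into k equal segments, each driven by a buffer with fixed delay t_buf."""
    seg = L / k
    return k * (unbuffered_delay(r, c, seg) + t_buf)

# Hypothetical: r in kOhm/mm, c in pF/mm (kOhm * pF = ns), L in mm, t_buf in ns
r, c, L, t_buf = 1.0, 0.2, 5.0, 0.05
print(f"unbuffered : {unbuffered_delay(r, c, L):.2f} ns")
for k in (2, 5, 7, 10):
    print(f"{k:2d} segments: {buffered_delay(r, c, L, k, t_buf):.2f} ns")
```

For these values the 5 mm unbuffered wire takes about 2.5 ns, while 5 to 10 buffered segments bring it down to roughly 0.7 to 0.75 ns; past the optimum, extra buffers cost more delay than they save.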
The requirements of Setup and Hold on timing paths
Design Status, Start of CTS Phase
Placement - completed
Power and ground nets – prerouted
Estimated congestion – acceptable
Estimated timing – acceptable (~0ns slack)
Estimated max cap/transition – no violations
High fanout nets:
Reset, Scan Enable synthesized with buffers
Clocks are still not buffered
Starting Point before CTS
What are the pros and cons of setting a tight constraint for target_skew?
Clock buffer vs normal buffer
Clock buffers have equal rise and fall times, therefore pulse width violations are avoided.
In clock buffers, the beta ratio (PMOS-to-NMOS width ratio) is adjusted such that rise and fall times are matched. This may increase the size of a clock buffer compared to a normal buffer.
Normal buffers may not have equal rise and fall times.
Clock buffers are usually designed such that an input signal with a 50% duty cycle produces an output with a 50% duty cycle.
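The duty-cycle argument can be sketched in Python under a simplifying assumption: unequal rise/fall behavior shows up as unequal low-to-high and high-to-low propagation delays (t_plh, t_phl), so every non-inverting stage stretches or shrinks the high pulse by their difference. The 12-stage depth and the delays are hypothetical.

```python
def high_pulse_after_chain(width_in, n_stages, t_plh, t_phl):
    """High-pulse width (ns) after n non-inverting buffers.

    Each stage delays the rising edge by t_plh and the falling edge by t_phl,
    so the high pulse changes by (t_phl - t_plh) per stage.
    """
    return width_in + n_stages * (t_phl - t_plh)

period = 1.0                 # ns, hypothetical clock with 50% duty cycle at the root
width = 0.5 * period
for name, t_plh, t_phl in [("balanced", 0.040, 0.040), ("unbalanced", 0.040, 0.046)]:
    w = high_pulse_after_chain(width, n_stages=12, t_plh=t_plh, t_phl=t_phl)
    print(f"{name:10s}: duty cycle = {100 * w / period:.1f}%")
```

In the unbalanced case the 6 ps mismatch per stage accumulates to 72 ps over 12 stages, turning a 50% duty cycle into about 57% and shrinking the low pulse available to the sinks.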
Clock Tree constraints
Parameter: Description
• Clock tree buffers/inverters: Have to be symmetric
• Clock gating cells, logic cells [ex. MUXs]
• Routing layers and NDR: Usually used to avoid crosstalk
• Target skew: Target global skew
• Max transition: Target max transition of clock signal
• Max capacitance: Target max capacitance of clock signal
• CTS cell spacing: Extra cell spacing for clock cells to avoid congestion and IR hotspots
Clock Tree Synthesis (CTS) (1/2)
A buffer tree is built to balance the loads and minimize the skew.
Is the Design Ready for CTS?
check_physical_design -stage pre_clock_opt checks that:
• The design is placed
• Clocks have been defined
• Clock roots are not hierarchical pins
check_clock_tree checks and warns if:
• A clock source pin is a hierarchical pin (see below for support)
• A generated clock has an improperly specified master clock
• A clock tree has no synchronous pins
• There are multiple clocks per register
Clock Tree Synthesis
Inputs to clock_opt:
• Control
• DRC (max tran/cap/fanout)
• Exceptions
• Physical Constraints
Analysis: report_clock_tree, report_clock_timing, report_timing
Define Clock Root Attributes
• Driving Cell: external driving cell specified for clock port CLK
• STOP Pins: clock sinks where CTS balances skew and insertion delay (e.g., FF CLK pins, including those behind a gated clock)
• FLOAT Pins: like stop pins, but with delays on the clock pin (e.g., pin IP_CLK of macro IP)
• EXCLUDE Pins: CTS ignores the skew and insertion delay targets for them, but still fixes clock tree DRCs
Defining an Explicit Stop Pin
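[Figure: CLOCK drives an FF (clock arrival 0.42) and the IP_CLK pin of macro IP, whose internal FF clock pin CLKn has an additional internal delay (0.17)]
With an explicit stop pin defined on IP/IP_CLK, CTS does not know the internal clock delay of IP: it can only "see" up to the stop pin!
Defining IP/IP_CLK as a float pin instead passes the internal delay (here 0.15) to CTS:
set_clock_tree_exceptions \
  -float_pins IP/IP_CLK \
  -float_pin_max_delay_rise 0.15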
Generated and Gated Clocks
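[Figure: master clock CLOCK (create_clock CLK) driving FF1 through a clock gate (GATED) and FF3 directly, with arrival times of about 0.64 and 0.63; divider flip-flop FFD (QN output) generates a divided clock (create_generated_clock) that drives FF4 and FF5]
Skew will be balanced "globally" within each clock domain, across all clock pins of both the master and the generated clock.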
Skew Balancing not Required?
If the divided clock domain is independent of the master domain (no paths), skew balancing may not be important.
[Figure: CLOCK drives flip-flops (0.42) and the divider flip-flop FFD (0.67), whose output clocks a separate group of flip-flops; annotation: "Define an explicit exclude pin here" (Exceptions)]
Invoke CTS: Core Command
✓ Control
✓ DRC
✓ Exceptions
✓ Physical Constraints
→ clock_opt: single command for CTS, optimization and CT routing
clock_opt use recommendation
Effects of Clock Tree Synthesis
(Embedded) Clock Tree Optimization
Clock Tree Optimizations
• Buffer relocation
• Buffer sizing
• Gate relocation
• Gate sizing
• Delay insertion
• Useful Skew
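Useful skew can be illustrated with a two-stage pipeline FF1 → FF2 → FF3 in Python: delaying only FF2's clock by delta gives stage 1 extra time and takes the same amount from stage 2. The stage delays are hypothetical and lump clock-to-Q, logic and setup together; FF1 and FF3 clocks are assumed to arrive at the same time.

```python
def stage_slacks(T, d1, d2, delta):
    """Setup slacks of FF1->FF2 and FF2->FF3 when only FF2's clock is delayed by delta.

    d1, d2: clock-to-Q + logic + setup of each stage (ns).
    """
    slack1 = (T + delta) - d1   # FF2 captures later: stage 1 gains delta
    slack2 = (T - delta) - d2   # FF2 launches later: stage 2 loses delta
    return slack1, slack2

T, d1, d2 = 1.0, 1.1, 0.7       # hypothetical: stage 1 too slow, stage 2 has margin
delta = (d1 - d2) / 2           # skew that equalizes the two setup slacks

for d in (0.0, delta):
    s1, s2 = stage_slacks(T, d1, d2, d)
    print(f"FF2 clock delay {d:.2f} ns: stage 1 slack {s1:+.2f}, stage 2 slack {s2:+.2f}")
```

Choosing delta = (d1 - d2)/2 = 0.2 ns turns a -0.1 ns / +0.3 ns split into +0.1 ns on both stages. Hold on the FF1 → FF2 path must then be rechecked, since the skew that helps its setup hurts its hold.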
Routing
Design Status, Start of Routing Phase
Placement - completed
CTS – completed
Power and ground nets - routed
Estimated congestion - acceptable
Estimated timing - acceptable (~0ns slack)
Estimated max cap/transition – no violations
Routing Fundamentals: Goal
Routing creates physical connections to all clock and signal pins through metal
interconnects
Routed paths must meet setup and hold timing, max cap/trans, and clock skew
requirements
Metal traces must meet physical DRC requirements
Grid-Based Routing System
[Figure: routing grid with tracks, grid points and a routed trace]
Routing Operations
◼ IC Compiler performs:
⚫ Global Routing
⚫ Track Assignment
Route Operations: Global Route
Route Operations: Track Assignment
Track Assignment
Track assignment is the stage in which routing tracks are assigned to each global route. The tasks performed during this stage are:
Assigning tracks in horizontal and vertical partitions.
Rerouting all overlapped wires.
Track assignment replaces all global routes with actual metal layers. Although all nets are routed (not very carefully), there will be many DRC, SI and timing related violations, especially in regions where the routing connects to the pins. These violations are fixed in the succeeding stages.
Detail Routing
The detailed router uses the routing plan laid out during global routing and track assignment, and lays actual metal to connect pins to their nets and to other pins in the design.
The violations created during the track assignment stage are fixed through multiple iterations in this stage.
The main goal of detailed routing is to complete all of the required interconnect without leaving shorts or spacing violations.
Detailed routing starts with the router dividing the block into specific areas called switch boxes (Sboxes), which are generally expressed in terms of gcells and aligned with the gcell boundaries. For example, a 3x3 Sbox is a box that encompasses 9 gcells.
Route Operations: Detail Routing
◼ Detail route attempts to clear DRC violations using a fixed-size Sbox
◼ Due to the fixed Sbox size, detail route may not be able to clear all DRC violations
[Figure: example DRC violations: notch spacing, thin & fat spacing, min spacing]
Route Operations: Search&Repair
Search&Repair fixes remaining DRC violations through multiple
loops using progressively larger SBox sizes
[Figure: Search&Repair loops 1 to 4, each using a larger SBox]
Note: Even if the design is DRC clean after S&R, you must still run a sign-off DRC checker (Hercules/ICV/Calibre or open-source alternatives), because:
Routing DRC rules are a subset of the complete technology DRC rules
IC Compiler works on the FRAM view, not the detailed transistor-level (CEL) view
Pre-Route Checks
Check design for routing stage readiness
There should not be:
Ideal nets
High fanout nets greater than 500
Use check_routeability to check a design's prerequisites for detail routing and report a list of violations.
[Figure: net 1 (aggressor) and net 2 (victim) coupled through coupling capacitance Cc, and through mutual capacitance Cm and mutual inductance Lm]
Mutual Capacitance, Cm: $I_{noise,C_m} = C_m \frac{dV_{driver}}{dt}$
Mutual Inductance, Lm: $V_{noise,L_m} = L_m \frac{dI_{driver}}{dt}$
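The two noise equations can be evaluated with a linear-ramp approximation (dV/dt ≈ ΔV / t_slew, dI/dt ≈ ΔI / t_slew). The Cm, Lm, swing and slew values in this Python sketch are hypothetical, chosen only to show orders of magnitude.

```python
def cap_noise_current(c_m, dv, t_slew):
    """I_noise,Cm = Cm * dV_driver/dt, with dV/dt approximated as dv / t_slew."""
    return c_m * dv / t_slew

def ind_noise_voltage(l_m, di, t_slew):
    """V_noise,Lm = Lm * dI_driver/dt, with dI/dt approximated as di / t_slew."""
    return l_m * di / t_slew

# Hypothetical aggressor: 1.0 V swing in 50 ps, 10 mA current change in 100 ps
i_noise = cap_noise_current(c_m=10e-15, dv=1.0, t_slew=50e-12)
v_noise = ind_noise_voltage(l_m=1e-9, di=10e-3, t_slew=100e-12)
print(f"capacitive noise current: {i_noise * 1e3:.2f} mA")
print(f"inductive noise voltage:  {v_noise * 1e3:.1f} mV")
```

Because both expressions divide by the slew time, halving the aggressor slew doubles the injected noise, which is why faster edges make crosstalk worse.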
Crosstalk Definition
Crosstalk is the electrical interaction between two long nets.
The causes of crosstalk are long parallel nets, coupling capacitance or inductance, and high-frequency switching.
Coupling not only changes the delay on a wire but can also induce coupling noise. The concept of crosstalk requires defining a "victim" net (the net being disturbed) and an "aggressor" net (the switching net that disturbs it).
Crosstalk Delay
When moving into submicron technologies, timing problems become stricter because each new generation brings shrinking feature sizes, wire widths and wire spacings. The reduction in wire width means a decrease in total wire capacitance. However, it also means a dramatic increase in the fraction of wire capacitance resulting from lateral coupling. Improved performance translates to higher clock frequencies, with much faster switching signal slew rates. As signal slew rates increase, more noise couples onto neighboring nets.
[Figure: aggressor and victim switching between 0V and 1.5V with load Cload; coupling shifts the victim transition (crosstalk delay), causing a timing error]
Crosstalk Glitch
A steady signal net can have a glitch due to charge transferred by the switching aggressors through the coupling capacitances.
[Figure: switching aggressor net coupled through Cc to a steady victim net at 0, producing a glitch on the victim]
Factors Affecting Glitch
Factors for large magnitude of glitch:
Large coupling capacitance
Fast slew time on the aggressor
Smaller victim net grounded capacitance
Smaller victim net driving strength
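The four factors above can be seen in a first-order model (an assumption for illustration, not a sign-off method): the victim node has capacitance Cc + Cg and is held by a driver of resistance R, while the aggressor ramps linearly from 0 to Vdd in t_r. Solving the single-pole RC equation gives the peak glitch at the end of the ramp; as R grows or t_r shrinks, it approaches the charge-sharing limit Vdd·Cc/(Cc + Cg). All values in this Python sketch are hypothetical.

```python
import math

def glitch_peak(vdd, c_c, c_g, r_drv, t_r):
    """Peak victim glitch (V) for a linear aggressor ramp of duration t_r (s).

    Victim node: total capacitance Cc + Cg, held at 0 V by a driver of
    resistance r_drv. (Cc + Cg) dv/dt + v/R = Cc * Vdd / t_r during the ramp,
    so v(t) = R * Cc * Vdd / t_r * (1 - exp(-t / tau)), tau = R * (Cc + Cg),
    which peaks at t = t_r.
    """
    tau = r_drv * (c_c + c_g)
    return r_drv * c_c * vdd / t_r * (1.0 - math.exp(-t_r / tau))

base = dict(vdd=1.2, c_c=5e-15, c_g=20e-15, r_drv=2e3, t_r=50e-12)  # hypothetical
cases = {
    "base": base,
    "larger Cc": {**base, "c_c": 10e-15},
    "faster aggressor slew": {**base, "t_r": 20e-12},
    "smaller victim Cg": {**base, "c_g": 10e-15},
    "weaker victim driver": {**base, "r_drv": 4e3},
}
for name, p in cases.items():
    print(f"{name:22s}: {glitch_peak(**p) * 1e3:6.1f} mV")

limit = base["vdd"] * base["c_c"] / (base["c_c"] + base["c_g"])
print(f"charge-sharing limit  : {limit * 1e3:6.1f} mV")
```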
Crosstalk Delay Analysis
Capacitance extraction for a net consists of different capacitances:
Capacitance Cg to ground
Coupling capacitance Cc to a neighboring net
[Figure: aggressor switching from 0 to 1.2V, coupled through Cc to a victim modeled as a distributed RC line with ground capacitance Cg]
Crosstalk Scenarios
Aggressor net stable
The victim net provides the charge for Cg and Cc to be charged to Vdd
Total charge = (Cg + Cc) * Vdd
Aggressor switching in the same direction
If the aggressor slew is similar → total charge supplied by the driving cell is (Cg * Vdd)
If the aggressor slew is faster → charge is smaller than (Cg * Vdd)
Aggressor switching in the opposite direction
Cc is charged from -Vdd to Vdd
Charge on Cc changes by (2 * Cc * Vdd) between before and after the transitions
[Figure: aggressor (0 to 1.2V) coupled through Cc to the victim with ground capacitance Cg]
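The three scenarios are often summarized with a Miller factor k that scales Cc: k = 1 for a quiet aggressor, k ≈ 0 for same-direction switching with similar slew, and k = 2 for opposite-direction switching. This Python sketch (hypothetical capacitances, driver resistance and Vdd; k = 0 and k = 2 are the usual simplifying bounds) computes the charge the victim driver must supply and a first-order RC delay.

```python
import math

# Miller factor applied to Cc for each aggressor scenario (simplified bounds)
MILLER = {"quiet": 1.0, "same direction": 0.0, "opposite direction": 2.0}

def effective_cap(c_g, c_c, aggressor):
    """Load seen by the victim driver: Cg + k * Cc."""
    return c_g + MILLER[aggressor] * c_c

def victim_charge(c_g, c_c, vdd, aggressor):
    """Charge the victim driver supplies for a full 0 -> Vdd transition."""
    return effective_cap(c_g, c_c, aggressor) * vdd

def rc_delay(r_drv, c_g, c_c, aggressor):
    """First-order 50% delay, ln(2) * R * C_eff, of the victim transition."""
    return math.log(2) * r_drv * effective_cap(c_g, c_c, aggressor)

c_g, c_c, vdd, r_drv = 20e-15, 10e-15, 1.2, 1e3   # hypothetical values
for agg in MILLER:
    q = victim_charge(c_g, c_c, vdd, agg)
    d = rc_delay(r_drv, c_g, c_c, agg)
    print(f"{agg:18s}: C_eff = {effective_cap(c_g, c_c, agg) * 1e15:4.0f} fF, "
          f"Q = {q * 1e15:4.1f} fC, delay = {d * 1e12:4.1f} ps")
```

Going from same-direction to opposite-direction switching doubles the effective load here (20 fF to 40 fF), which is the mechanism behind negative and positive crosstalk delay.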
Positive Crosstalk
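[Figure: aggressor and victim switching in opposite directions between 0 and 1.2V, coupled through Cc; the victim transition is slowed (crosstalk delay), causing a timing error]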
Negative Crosstalk
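Aggressor and victim are switching in the same direction
Charge on Cc remains the same before and after the transitions
Victim delay is reduced
[Figure: aggressor and victim both switching from 0 to 1.2V, coupled through Cc; the victim transition speeds up (crosstalk delay), which can also cause a timing error]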
Crosstalk Timing Verification
Positive rise delay: rise edge moves forward in time (arrives later)
Negative rise delay: rise edge moves backward in time (arrives earlier)
Positive fall delay: fall edge moves forward in time (arrives later)
Negative fall delay: fall edge moves backward in time (arrives earlier)
Setup Analysis
Consider the logic shown below where crosstalk can occur at various nets along the data path and along the clock paths.
[Figure: data path from launch flip-flop UFF0 to capture flip-flop UFF1; their CK pins are driven by a common clock path that splits at a common point]
Hold Analysis
The worst condition for a hold check occurs when both the launch clock path and the data path have negative crosstalk and the capture clock path has positive crosstalk. There is one important difference between the hold and setup analyses related to crosstalk on the common portion of the clock path: a hold check launches and captures on the same clock edge, so the common path cannot be both fast and slow at once, and crosstalk on it is not applied.
The worst-case hold (or min path) analysis for STA with crosstalk assumes:
Launch clock (not including the common path) sees negative crosstalk delay so that the data is launched early.
Data path sees negative crosstalk delay so that it reaches the destination early.
Capture clock (not including the common path) sees positive crosstalk delay so that the data is captured by the capture flip-flop late.
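A Python sketch of the worst-case crosstalk assignment for both checks, with hypothetical delays in ns. Each segment carries a nominal delay and a crosstalk delta magnitude. For setup (different edges), the launch side is made slow and the capture side fast, including the common clock path; for hold (same edge), the common path stays at its nominal delay. Clock-to-Q is lumped into the data path.

```python
from dataclasses import dataclass

@dataclass
class Seg:
    delay: float   # nominal delay (ns)
    xtalk: float   # magnitude of the crosstalk delta on this segment (ns)

def setup_slack(T, common, launch, data, capture, t_setup):
    """Launch side slowed (+xtalk), capture side sped up (-xtalk).
    Launch and capture are different edges, so the common path gets both."""
    arrival = (common.delay + common.xtalk) + (launch.delay + launch.xtalk) \
        + (data.delay + data.xtalk)
    required = T + (common.delay - common.xtalk) \
        + (capture.delay - capture.xtalk) - t_setup
    return required - arrival

def hold_slack(common, launch, data, capture, t_hold):
    """Launch clock and data sped up (-xtalk), capture clock slowed (+xtalk).
    Same edge: the common path cannot be both fast and slow, so it stays nominal."""
    arrival = common.delay + (launch.delay - launch.xtalk) + (data.delay - data.xtalk)
    required = common.delay + (capture.delay + capture.xtalk) + t_hold
    return arrival - required

def quiet(seg):
    """Same segment with crosstalk switched off, for comparison."""
    return Seg(seg.delay, 0.0)

# Hypothetical clock and data segments (ns)
common, launch, capture = Seg(0.30, 0.02), Seg(0.10, 0.01), Seg(0.12, 0.01)
long_data, short_data = Seg(0.70, 0.03), Seg(0.20, 0.03)
T, t_su, t_h = 1.0, 0.05, 0.04

for label, f in (("no crosstalk", quiet), ("with crosstalk", lambda s: s)):
    su = setup_slack(T, f(common), f(launch), f(long_data), f(capture), t_su)
    ho = hold_slack(f(common), f(launch), f(short_data), f(capture), t_h)
    print(f"{label:15s}: setup slack = {su:+.2f} ns, hold slack = {ho:+.2f} ns")
```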
Crosstalk Correction in IC Compiler
Design Finishing
Antenna Violations
Metal wires (antennae) placed in an EM field generate voltage gradients.
During the metal etch stage, strong EM fields are used to ionize the plasma etchant.
The resulting voltage gradients at MOSFET gates can damage the thin gate oxide.
[Figure: cross-section showing protective coating, Metal 1, oxide and poly]
Antenna Rules
[Figure: antenna rule illustration with gate, poly and diffusion]
Solution 1: Splitting Metal or Layer Jumping
[Figure, before layer jumping: driver diffusion connected to the poly gate through a long metal 1 route: unacceptable antenna area]
[Figure, after layer jumping: to meet antenna rules, M1 is split by jumping to M3 and back: acceptable antenna area]
Solution 2: Inserting Diodes
[Figure: net before inserting diodes]
Insert Redundant Vias
Single via: connection fails if the contact is defective.
Redundant (2X1) via: connection is okay even if one contact is defective.
Why Filler Cell Insertion?
How?
Some placement sites remain empty on some rows.
ICC can fill such empty sites with filler standard cells.
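A minimal Python sketch of filler insertion on one row, assuming a hypothetical filler library with widths of 1, 2, 4 and 8 sites (the FILL cell names are also hypothetical): it finds each run of empty sites and fills it greedily with the widest fillers first. A width-1 filler guarantees every gap can be closed.

```python
def empty_runs(row):
    """Yield (start_site, length) for each run of unoccupied sites in a row.

    row: list of booleans, True where a standard cell occupies the site.
    """
    start = None
    for i, used in enumerate(row + [True]):   # sentinel closes a trailing gap
        if not used and start is None:
            start = i
        elif used and start is not None:
            yield start, i - start
            start = None

def fill_gap(gap_sites, filler_widths):
    """Cover a gap exactly with filler cells, widest first (widths in sites)."""
    cells = []
    for w in sorted(filler_widths, reverse=True):
        n, gap_sites = divmod(gap_sites, w)
        cells += [w] * n
    if gap_sites:
        raise ValueError("gap cannot be filled with the given filler widths")
    return cells

FILLER_WIDTHS = [1, 2, 4, 8]   # hypothetical filler library (sites)
row = [True, True] + [False] * 7 + [True] + [False] * 2 + [True]

for start, length in empty_runs(row):
    x = start
    for w in fill_gap(length, FILLER_WIDTHS):
        print(f"place FILL{w} at site {x}")
        x += w
```

For the example row, the 7-site gap is filled with 4 + 2 + 1 site fillers and the 2-site gap with a single 2-site filler.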
Problem: Metal Over-Etching
ECO
The Two Types of ECO Flows
[Flowchart: ECO netlist → "Spare cells are fixed?" (Yes/No); on one branch, ECO placement derives the placement location for the newly required added cell instances; both branches continue with ECO routing]
Functional ECO Flows
1. Non-freeze silicon ECO
Pre-tapeout: no restriction on placement or routing
Minimal disturbance to the existing layout
ECO cells are placed close to their optimal locations
2. Freeze silicon ECO
Post-tapeout: only metal masks change, using previously inserted spare cells
Cell placement remains unchanged
ECO cells are mapped to the spare cells closest to their optimal locations
Deleted cells become spare cells
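The freeze-silicon mapping rule (ECO cells go to the spare cells closest to their optimal location) can be sketched greedily in Python. Spare cell names, types and coordinates are hypothetical; a production tool may instead solve a global assignment, so this only illustrates the nearest-spare idea.

```python
# Hypothetical spare cells: name -> (cell type, x, y) in um
SPARES = {
    "spare_nand_0": ("NAND2", 10.0, 20.0),
    "spare_nand_1": ("NAND2", 80.0, 15.0),
    "spare_inv_0": ("INV", 40.0, 60.0),
    "spare_inv_1": ("INV", 75.0, 70.0),
}

def map_eco_cells(eco_cells, spares):
    """Greedy freeze-silicon mapping: each ECO cell, in order, takes the closest
    unused spare of the same type (Manhattan distance to its optimal location)."""
    free = dict(spares)
    mapping = {}
    for name, (ctype, x, y) in eco_cells.items():
        candidates = [(abs(sx - x) + abs(sy - y), s)
                      for s, (stype, sx, sy) in free.items() if stype == ctype]
        if not candidates:
            raise LookupError(f"no free spare {ctype} for {name}")
        _, best = min(candidates)
        mapping[name] = best
        del free[best]   # a spare cell can be used only once
    return mapping

# Hypothetical ECO cells with their optimal locations
eco = {
    "eco_u1": ("NAND2", 70.0, 10.0),
    "eco_u2": ("INV", 45.0, 55.0),
    "eco_u3": ("NAND2", 72.0, 12.0),
}
print(map_eco_cells(eco, SPARES))
```

Because the mapping is greedy, eco_u3 ends up on the far NAND2 spare once eco_u1 has taken the near one; this is the kind of case where a global assignment could do better.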
Spare Cells
• Spare cells generally consist of a group of standard cells, mainly inverter, buffer, NAND, NOR, AND, OR, XOR and MUX.
• The inputs of spare cells are tied to either VDD or VSS through tie cells, and the outputs are left floating.
• Spare cells enable us to modify/improve the functionality of a chip with minimal mask changes: we can use already placed spare cells from a nearby location and only need to modify the metal interconnect.
• There is no need to change the base layers. Using a metal ECO, we modify the interconnect metal connections to make use of spare cells, so only some metal masks change, not the base layer masks.
Physical Only Cells
DCAP cells
• Decap cells are essentially charge-storage devices made of capacitors.
• They are used to fill empty spaces at the chip-finishing stage, with the benefit of stabilizing the supply voltage when a sudden current draw causes a voltage drop.
• Decap cells act as charge reservoirs that support the power delivery network and make it robust.
• How to build one:
  • The source and drain of a pMOS transistor are shorted together and connected to VDD, and its gate is connected to VSS.
  • Similarly, the source and drain of an nMOS transistor are connected to VSS, and its gate is connected to VDD.
• However, you have to be careful of leakage!
EndCap cells
• The end cap cell (or boundary cell) is placed at both ends of each placement row to terminate the row.
• End cap cells are also placed along the top and bottom rows at the block level to ease integration with other blocks.
Well Tap cells
• Well tap cells (or tap cells) are used to prevent latch-up in CMOS designs. They connect the n-well to VDD and the p-substrate to VSS.
Well Tap placement