5.ClockTreeSynthesis JD

Clock Tree Synthesis (CTS) is a crucial process in VLSI design that ensures clocks are delivered to all sequential elements simultaneously while minimizing skew and insertion delay. The document outlines the need for CTS, its terminology, flow, optimization techniques, and challenges faced during implementation. It also discusses various clock tree topologies and the importance of clock gating to reduce power consumption.
Clock Tree Synthesis (CTS)

Jaidev Kaushik
Azenda

1. Need of CTS

2. Clock Terminology

3. Clock targets and DRV

4. Types of clock tree

5. Clock tree optimization

6. CTS flow

7. Challenges faced in Project

ASIC PD Flow

What is CTS?
• Deliver clocks to all sequential elements at almost the same time, with proper buffering, meeting the given constraints (skew, insertion delay) without DRC violations (max transition, max capacitance).
• In the VLSI flow, CTS is performed after placement and before the routing of signal nets.

[Figure: a single clock source CLK driving the CK pins of several D flip-flops]
Why CTS?
• Within most VLSI circuits, data transfer between sequential elements is synchronized by the clock.
• Before CTS, all clock pins are driven by a single clock source with high fan-out and high load.
• CTS is the process of inserting buffers/inverters along the clock paths of the ASIC design to balance the clock delay to all clock inputs.
• CTS is performed in order to balance skew and minimize insertion delay.
CTS Terminology
• Clock Skew (Global Skew, Local Skew, Positive/Negative Skew, Useful Skew)
• Clock Latency or Clock Insertion Delay (Max/Min Insertion Delay)
• Source Latency, Network Latency, Common Path
• Target Latency, Target Skew
• Clock Tree Report, Skew Report
• Clock Tree Algorithm
• Auto-CTS (Conventional Clock Tree), Custom Clock Tree (H-Tree), Clock Mesh
• No. of Levels
• Clock Gating, Integrated Clock Gating Cell (ICG)


Clock Skew
• Clock skew is the difference in the arrival time of the clock signal at the clock pins of two different sequential elements.

[Figure: local skew and global skew]


Local Skew, Global Skew
• Global skew: the difference between the shortest and the longest clock path delay, considering all sequential elements in a clock domain.
• Local skew: the difference between the arrival times of the clock signal at the clock pins of two related flops in the same clock domain.
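As a rough illustration of the two definitions above, the sketch below computes both numbers from a table of clock arrival times. The flop names, arrival values and the list of related (launch, capture) pairs are made up for the example; a real flow would take them from the tool's skew report.

```python
# Hypothetical clock arrival times (ns) at flop clock pins in one clock domain.
arrival_ns = {"FF1": 1.00, "FF2": 1.08, "FF3": 0.95, "FF4": 1.20}

# Hypothetical (launch, capture) pairs that share a data path ("related" flops).
related_pairs = [("FF1", "FF2"), ("FF3", "FF1")]


def global_skew(arrivals):
    """Longest minus shortest clock path delay over all sinks."""
    return max(arrivals.values()) - min(arrivals.values())


def local_skew(arrivals, pairs):
    """Largest arrival-time difference among related flop pairs only."""
    return max(abs(arrivals[capture] - arrivals[launch]) for launch, capture in pairs)


print(f"global skew = {global_skew(arrival_ns):.2f} ns")
print(f"local skew  = {local_skew(arrival_ns, related_pairs):.2f} ns")
```

In this made-up data set FF4 is the slowest sink but has no timing path to the other flops, so it affects the global skew without affecting the local skew.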
Positive Skew, Negative Skew
• Positive skew: the capture clock arrives later than the launch clock. It helps setup timing but can lead to hold violations.
• Negative skew: the capture clock arrives earlier than the launch clock. It helps hold timing but can lead to setup violations.
• Beneficial/useful skew: the clock is skewed intentionally to fix timing violations.
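The effect of skew on setup and hold can be sketched with the usual single-cycle slack equations. This is a simplified model with made-up delay values; skew here means capture clock arrival minus launch clock arrival, and clock uncertainty and derates are ignored.

```python
def setup_slack(period, skew, clk_to_q, max_logic, t_setup):
    """Setup slack: positive skew gives the data more time to arrive."""
    return period + skew - (clk_to_q + max_logic) - t_setup


def hold_slack(skew, clk_to_q, min_logic, t_hold):
    """Hold slack: positive skew makes the capture flop hold check harder."""
    return (clk_to_q + min_logic) - t_hold - skew


# Hypothetical path, all values in ns.
path = dict(clk_to_q=0.10, max_logic=0.85, min_logic=0.05, t_setup=0.05, t_hold=0.05)

for skew in (-0.15, 0.0, 0.15):
    s = setup_slack(1.0, skew, path["clk_to_q"], path["max_logic"], path["t_setup"])
    h = hold_slack(skew, path["clk_to_q"], path["min_logic"], path["t_hold"])
    print(f"skew {skew:+.2f} ns: setup slack {s:+.3f} ns, hold slack {h:+.3f} ns")
```

With these numbers, the positive-skew case gains setup slack but fails hold, and the negative-skew case does the opposite. Useful skew exploits the same equations deliberately on paths that have margin to spare on one side.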
Clock Latency
There are two terms associated with latency:

• Source latency: the time taken by the clock signal to propagate from its ideal waveform origin point to the clock definition point in the design.
• Network latency: the time taken by the clock signal to propagate from the clock definition point in the design to the clock pin of the sequential device.

[Figure: source latency and network latency]
Clock Latency in SDC

set_clock_latency 0.500 [get_clocks core_clk]

• This is the delay assumed to exist between the clock source and the flip-flop clock pins during the pre-CTS stage.
• It is used before the clock tree is built and routed, while the clock is still ideal.
• It is not the actual delay but a user-specified estimate of the clock delay that will exist once the clock tree is implemented.
• The timing analyzer uses this value to determine clock arrival times in the absence of propagated clocks, i.e. pre-CTS.
Clock Latency in SDC

Here are some example commands that specify source and network latencies.
# Specify a network latency (no -source option) of 0.8ns for
# rise, fall, max and min:
set_clock_latency 0.8 [get_clocks CLK_CONFIG]
# Specify a source latency:
set_clock_latency 1.9 -source [get_clocks SYS_CLK]
Actual: Clock Insertion Delay
• Once CTS is complete (post-CTS design), the actual delay from the clock source to each clock sink can be calculated. These are typically called the (actual) clock insertion delays.
• Until that point, set_clock_latency in the SDC supplies an assumed value to account for the clock insertion delay.
• Conceptually, the SDC clock latency and the actual clock insertion delay describe the same quantity. The difference is that the estimated latency is one hard-coded value for all sequential elements, whereas the actual insertion delays are likely to differ slightly from one sink to another.
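A small sketch of the difference between the ideal (pre-CTS) and propagated (post-CTS) view of clock arrival. The latency values are borrowed from the set_clock_latency examples purely as sample numbers and are applied here to one imaginary clock; the per-sink post-CTS delays and flop names are hypothetical.

```python
SOURCE_LATENCY_NS = 1.9          # sample value, as in `set_clock_latency 1.9 -source`
IDEAL_NETWORK_LATENCY_NS = 0.8   # sample value, as in `set_clock_latency 0.8`

# Hypothetical post-CTS network delays (ns) from the clock definition point to each sink.
propagated_network_ns = {"FF1": 0.78, "FF2": 0.83, "FF3": 0.80}


def clock_arrival(sink, propagated):
    """Clock arrival at a sink: source latency plus ideal or propagated network latency."""
    if propagated:
        network = propagated_network_ns[sink]
    else:
        network = IDEAL_NETWORK_LATENCY_NS
    return SOURCE_LATENCY_NS + network


for mode in (False, True):
    arrivals = [clock_arrival(s, mode) for s in propagated_network_ns]
    label = "propagated" if mode else "ideal"
    print(f"{label:10s} arrivals: {[round(a, 2) for a in arrivals]}, "
          f"skew: {max(arrivals) - min(arrivals):.2f} ns")
```

In ideal mode every sink sees the same arrival time, so skew is zero by construction. Only the propagated numbers expose the sink-to-sink differences that CTS actually produced.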
Timing : Ideal Clock to Propagated Clock
Common Path

[Figure: clock paths with and without a common path]


Design Status before doing CTS
• Placement: completed (implies floorplan and power plan are also completed)
• Power/ground nets: pre-routed
• Estimated congestion: acceptable
• Estimated timing: acceptable (setup slack should be ~0 ps)
• High-fanout signal nets: buffered (reset, scan enable, etc.)
• Clock nets: not buffered


CTS Goals
• Deliver the clock to all sequential elements
• Meet the clock tree targets:
  Maximum skew; min/max insertion delay
  Minimize power dissipation, total wirelength, noise and coupling effects
• Meet the clock tree Design Rule Constraints (DRC):
  Maximum transition time; maximum load capacitance; max fanout; max buffer levels
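The design rule side of these goals amounts to comparing each clock net against a set of limits. The sketch below shows that check on made-up data; the net names, measured values and the transition/capacitance limits are hypothetical, and only the fanout limit of 50 echoes a value used elsewhere in these slides.

```python
# Hypothetical DRC limits for clock nets.
LIMITS = {"max_tran_ns": 0.15, "max_cap_pf": 0.20, "max_fanout": 50}

# Hypothetical per-net values, as one might extract from a clock tree report.
clock_nets = [
    {"name": "clk_root", "tran_ns": 0.09, "cap_pf": 0.12, "fanout": 4},
    {"name": "clk_leaf_7", "tran_ns": 0.18, "cap_pf": 0.22, "fanout": 64},
]


def drv_violations(net, limits):
    """Return the names of the design rules this net violates."""
    checks = [
        ("max_tran", net["tran_ns"], limits["max_tran_ns"]),
        ("max_cap", net["cap_pf"], limits["max_cap_pf"]),
        ("max_fanout", net["fanout"], limits["max_fanout"]),
    ]
    return [rule for rule, value, limit in checks if value > limit]


for net in clock_nets:
    failed = drv_violations(net, LIMITS)
    print(net["name"], "OK" if not failed else "violates " + ", ".join(failed))
```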

Inputs Required for CTS

• Detailed placement database (should pass the checklist specified earlier)
• Target latency and target clock skew (desired numbers)
• Clock buffers/inverters (different flavors/drive strengths) with equal rise/fall times
• DRC targets (max transition, max capacitance, number of buffer levels)
• Non-default rules (for leaf and non-leaf nets), metal layers for routing
• Clock tree exceptions (float/exclude/stop pins) and inter-clock balancing requirements
CTS Flow
• Load the place-optimized design
• Read the CTS SDC: the clock tree begins at the SDC-defined clock pin and ends at the stop pins (the clock pins of the flops)
• Compile/build the clock tree using the CTS spec file
• Place/add clock tree cells (to meet CTS targets)
• Route the clock tree (optional; can also be done during signal net routing)
• Save the design with the clock tree built
• Generate all clock tree reports


CTS FLOW
######################################
# CTS
######################################
# Remove the dont_use status from clock buffers and inverters
setDontUse CLKBUF* false
setDontUse CLKINV* false
setDontUse PBUFX2 false
setDontUse PINVX1 false
reportAlwaysOnBuffer -all

# Define clock buffers and inverters
set_ccopt_property buffer_cells {CLKBUFX4 CLKBUFX8 CLKBUFX16 PBUFX2}
set_ccopt_property inverter_cells {CLKINVX4 CLKINVX8 CLKINVX16 PINVX1}

# Define preferred metal layer
setNanoRouteMode -quiet -routeTopRoutingLayer 9
ccopt_check_prerequisites
set_ccopt_property max_fanout 50

# Create clock tree
create_ccopt_clock_tree_spec -immediate
#setDelayCalMode -engine aae
ccopt_design

saveDesign DBS/cts.enc -compress

verifyPowerDomain -bind -gconn -isoNetPD RPT/cts.isonets.rpt -xNetPD RPT/cts.xnets.rpt

############################################
# postCTS optimization (hold)
############################################
setDontUse DLY* false
Why Clock Buffers (Inverter Pairs)?

• Clock buffers have equal rise and fall times, which maintains the 50% duty cycle.


Clock Tree Tracing: Begin/End Points, Exceptions

The clock tree begins at the clock source and ends at the clock pins of flip-flops/macros or at the input pin of a combinational cell.
Clock Tree Exceptions:
• Sync (stop) pins: considered for clock tree balancing.
• Ignore (exclude) pins: not considered for clock tree balancing.
• Float pins (macro model pins): the internal clock latency of the macro is taken into consideration.
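One way to picture how the three exception types feed into balancing: stop pins are balanced to the target latency, float pins are balanced to the target minus their internal latency, and exclude pins are left out. The sketch below is a simplified model, not a tool algorithm; the pin names, the target latency and the 0.3 ns macro latency are hypothetical.

```python
TARGET_LATENCY_NS = 1.0  # hypothetical target insertion delay

# (pin, exception type, internal clock latency in ns); all entries are hypothetical.
sinks = [
    ("FF1/CK", "stop", 0.0),
    ("FF2/CK", "stop", 0.0),
    ("RAM0/CLK", "float", 0.3),   # macro with 0.3 ns of internal clock delay
    ("DIV/CK", "exclude", 0.0),   # not balanced with the other sinks
]


def external_delay_targets(sinks, target_ns):
    """Delay the tree should deliver up to each balanced pin."""
    targets = {}
    for pin, kind, internal_ns in sinks:
        if kind == "exclude":
            continue  # exclude pins are ignored for balancing
        # The clock must reach a float pin early by its internal latency.
        targets[pin] = target_ns - internal_ns
    return targets


for pin, delay in external_delay_targets(sinks, TARGET_LATENCY_NS).items():
    print(f"{pin}: balance to {delay:.2f} ns")
```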
Phases of Clock Tree Building
• Clock Tree Synthesis (CTS)
-- Builds an initial load-balanced clock tree
-- Must meet DRCs
• Clock Tree Optimization (CTO)
-- Performs cell sizing, relocation, buffer/delay insertion
-- Tries to achieve the clock tree targets
• Clock Tree Routing (optional, but generally used)


CTO
• Clock Tree Optimization (CTO)
1. Buffer and gate sizing
2. Buffer and gate relocation
3. Level adjustment
4. Reconfiguration
5. Delay insertion
6. Dummy load insertion
Clock Tree Optimization
Effect/Results of CTS

Clock buffers (or inverter pairs) are added; flops are likely to be moved

Congestion may increase

Non-clock cells may be moved to non-ideal locations (to be optimized later)

Can introduce timing and max cap/tran violations


What to check after CTS ?

Skew Reports

Clock tree Reports

Timing reports for Setup and Hold

Power and Area Report


CTS Topologies/Algorithms

RC Tree Based CTS

H Tree based Algorithm

X Tree based Algorithm

Pi Configuration
H-Tree Clock Placement/Routing

Because of its balanced construction, the H-tree clock structure makes it easy to reduce clock skew.
A disadvantage of this approach is that the fixed clock plan makes it difficult to adjust register placement, and the structure is rigid when fine-tuning the clock tree.
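The balanced construction can be checked with a few lines of code. The sketch below builds an idealized H-tree recursively and records the wire length from the root to every leaf; the coordinates and the starting span are arbitrary, and buffers, wire RC and real sink locations are ignored.

```python
def h_tree_leaves(x, y, half_span, levels, path_len=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an ideal H-tree."""
    if levels == 0:
        return [(x, y, path_len)]
    leaves = []
    # One "H": go left or right along the crossbar, then up or down a leg.
    for dx in (-half_span, half_span):
        for dy in (-half_span, half_span):
            leaves += h_tree_leaves(
                x + dx, y + dy, half_span / 2, levels - 1,
                path_len + abs(dx) + abs(dy),
            )
    return leaves


leaves = h_tree_leaves(0.0, 0.0, half_span=4.0, levels=2)
lengths = {length for _, _, length in leaves}
print(f"{len(leaves)} leaves, distinct root-to-leaf wire lengths: {sorted(lengths)}")
```

Every leaf ends up at the same wire length from the root, which is why skew is low by construction. The same property is the source of the rigidity noted above: the leaf positions are dictated by the geometry, not by where the registers are placed.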
Conventional CTS Distribution
• It is the most widely used approach for dealing with design complexity.
• The tree is very deep in both buffer and clock-gating levels, and most sinks share very little common path back to the clock root.
• The impact of on-chip variation is very high.

[Figure: a conventional clock tree fanning out from Clock to a row of flip-flops (FF)]
Clock Mesh Distribution
• It has extremely shallow logic depth below the mesh, usually just a single buffer or clock gate directly driving the sinks.
• It has a large shared path from the clock root to the mesh.
• The impact of on-chip variation is minimal.
• It uses a very dense mesh fabric.
• Ultra-low skew values can be achieved.
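A back-of-the-envelope sketch of why the amount of shared path matters for on-chip variation. It assumes a flat early/late derate applied only to the non-shared part of two otherwise balanced clock paths (variation on the shared part is treated as cancelling); the 5% derate, the 1.0 ns insertion delay and the 80%/10% non-shared fractions are made-up numbers.

```python
def ocv_skew_ns(non_shared_delay_ns, derate):
    """Worst-case skew between two balanced sinks caused purely by derating."""
    late = non_shared_delay_ns * (1 + derate)
    early = non_shared_delay_ns * (1 - derate)
    return late - early


INSERTION_DELAY_NS = 1.0  # hypothetical, same for both styles
DERATE = 0.05             # hypothetical +/-5% timing derate

# Conventional tree: assume most of the path is NOT shared between two sinks.
tree_skew = ocv_skew_ns(0.8 * INSERTION_DELAY_NS, DERATE)
# Mesh: assume only the short stage below the mesh is not shared.
mesh_skew = ocv_skew_ns(0.1 * INSERTION_DELAY_NS, DERATE)

print(f"conventional tree: {tree_skew * 1000:.0f} ps of OCV-induced skew")
print(f"clock mesh:        {mesh_skew * 1000:.0f} ps of OCV-induced skew")
```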


Clock Mesh Distribution
Pros
• Multipoint CTS is the best option for high-speed designs with frequencies of 1 GHz and above.
• The lowest skew can be achieved with this method.
• Uneven distribution of sinks does not affect the skew.
• Tolerant to process variation.
Cons
• Large wire area and very large drivers, which result in high power consumption.
• The grid is generally overdesigned, and achieving an optimal solution is very difficult.
Spine Tree

• The spine tree (fishbone) arrangement makes it easy to reduce skew, but it is heavily influenced by process parameters and may have problems with phase delay.
Clock Gating
• A common technique for reducing clock power by shutting off the clock to modules using a clock enable signal.
• Functionally, clock gating requires only an AND or OR gate.
• Consider an AND gate on the clock. The rising edge of EN may arrive at any time and need not coincide with a clock edge. If EN rises while the clock is high, the AND gate output stays high for less than the clock's full high phase, which produces a glitch on the gated clock.
• To avoid this, special clock gating cells are used that synchronize EN with a clock edge. These are called integrated clock gating (ICG) cells.
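The glitch described above can be reproduced with a tiny discrete-time simulation. This is a behavioral sketch, not a cell model: time is in quarter-cycle steps, the ICG is modeled as a latch that is transparent while the clock is low followed by an AND gate, and the point at which EN rises is chosen arbitrarily to fall inside a clock-high phase.

```python
def pulse_widths(wave):
    """Lengths (in time steps) of each run of 1s in a waveform."""
    widths, run = [], 0
    for bit in wave:
        if bit:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


clk = [1, 1, 0, 0] * 4        # four clock cycles, high phase = 2 steps
en = [0] * 5 + [1] * 11       # EN rises in the middle of a clock-high phase

# Latch-free gating: plain AND of clock and enable.
and_gated = [c & e for c, e in zip(clk, en)]

# Latch-based gating: enable is sampled only while the clock is low.
icg_gated, latched_en = [], 0
for c, e in zip(clk, en):
    if c == 0:
        latched_en = e        # latch is transparent during clock low
    icg_gated.append(c & latched_en)

print("AND gate pulse widths:", pulse_widths(and_gated))
print("ICG pulse widths:     ", pulse_widths(icg_gated))
```

The AND-gated clock starts with a pulse shorter than a normal high phase, while the latch-based version only ever produces full-width pulses, starting one cycle later.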
Clock Gating
• The clock tree can consume more than 50% of dynamic power.
• So the clock is turned off when it is not needed, using clock-gating cells.
• There are two clock gating styles:
-- Latch-free clock gating (simple AND, OR cells)
-- Latch-based clock gating (Integrated Clock Gating cell, ICG)
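To get a feel for the power argument, the standard dynamic power relation P = a * C * V^2 * f can be applied to a gated clock branch. The capacitance, voltage, frequency and the 40% enable duty below are made-up numbers, and the model ignores the power of the gating cell itself and of the ungated part of the tree.

```python
def dynamic_power_w(activity, cap_f, vdd_v, freq_hz):
    """Switching power: activity factor * capacitance * Vdd^2 * frequency."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


def gated_power_w(ungated_power_w, enabled_fraction):
    """Power of a clock branch that only toggles while its enable is high."""
    return ungated_power_w * enabled_fraction


# Hypothetical clock branch: 100 pF of switched capacitance, 1.0 V, 1 GHz.
# A clock net charges its load once per cycle, so the activity factor is 1.
ungated = dynamic_power_w(activity=1.0, cap_f=100e-12, vdd_v=1.0, freq_hz=1e9)
gated = gated_power_w(ungated, enabled_fraction=0.4)

print(f"ungated: {ungated * 1e3:.0f} mW, gated (40% enabled): {gated * 1e3:.0f} mW")
```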


Latch-Free Clock Gating
• It uses a simple AND or OR gate.
• The gated output clock can terminate prematurely or generate multiple clock pulses if the enable changes during the clock pulse.
• This restriction makes it inappropriate for single-clock flip-flop-based designs.
Integrated Clock Gating Cell (ICG)
• This style adds a level-sensitive latch to the design to hold the enable signal from the active edge of the clock until the inactive edge of the clock.
• Since the latch captures the state of the enable signal and holds it until the complete clock pulse has been generated, the enable signal need only be stable around the rising edge of the clock.

[Figures: gating using an AND gate with high EN; gating using an OR gate with high EN]
Project Challenges
● While creating the clock spec, we found nets with a fanout of more than 500, which meant ideal networks were still present after placement.
● We updated the SDC file and made those nets non-ideal.
● In the initial netlist, the clock input transition was too slow: 0.800 against a target of 0.889. So we set it to 0.890 in all SDC files.
● In the updated netlist, it was fixed automatically.
● dont_touch cells and dont_touch networks were present in the design. We found a Tcl script and ran it to remove all dont_touch cells and dont_touch networks:
● check_dont_touch_clock_tree_nets
● remove_dont_touch_clock_tree_nets
● check_cts_clock_cells
● remove_dont_touch_clock_cells
● To check all clock spec requirements, we used the command ccopt_check_prerequisites
Clock Exceptions

1. Nonstop pins: Nonstop pins are pins that would normally be considered endpoints of the clock tree, but ICC traces through them to find the clock tree endpoints. The clock pins of sequential cells driving generated clocks are implicit nonstop pins. In addition, ICC supports user-defined (explicit) nonstop pins.


2. Exclude pins: Exclude pins are clock tree endpoints that are excluded from clock tree timing calculations and optimizations. ICC uses exclude pins only in calculations and optimizations for design rule constraints.


3. Float pins: Float pins are clock pins that have special insertion delay requirements. ICC adds the float pin delay (positive or negative) to the calculated insertion delay up to this pin.


4. Stop pins: A stop pin is an explicitly specified end pin of a clock tree. Unlike default clock sinks, a stop pin can be the input pin of a non-sequential cell. Clock tree synthesis treats a stop pin as a clock sink.
Few more Terms
• On-Chip Variation (OCV)
• Timing Derates
• Crosstalk
Thank You
