
ClockGating Cts

The document discusses clock gating techniques to reduce power consumption. Clock trees can consume over 50% of dynamic power. There are two main types of clock gating styles - latch-based and latch-free. Latch-based clock gating uses a latch to ensure a clean clock signal is delivered to flip-flops, making it better suited for single-clock flip-flop designs. Latch-free clock gating directly gates the clock with combinational logic and may prematurely truncate the clock or generate extra pulses, so it is less suitable. RTL clock gating identifies groups of flip-flops that can be gated together using a common enable signal to turn off their clock power when inactive.

Clock Gating

The clock network can account for more than 50% of a design's dynamic power. The components of this power are:

1) Power consumed by combinational logic whose values change on each clock edge,

2) Power consumed by the flip-flops, and

3) Power consumed by the clock buffer tree in the design.

It is good design practice to turn off the clock when it is not needed. Modern EDA tools support
automatic clock gating: they identify the circuits where clock gating can be inserted.

RTL clock gating works by identifying groups of flip-flops that share a common enable control signal.
Traditional methodologies use this enable term to control the select of a multiplexer connected to the D
port of the flip-flop, or to control the clock enable pin on a flip-flop with clock enable capability. RTL
clock gating instead uses the enable signal to control a clock gating circuit connected to the clock ports
of all the flip-flops that share the enable term. As a result, a bank of flip-flops with RTL clock gating
implemented sees no clock activity, and so consumes essentially no dynamic power, for as long as the
enable signal is false.
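
As a rough illustration of the difference between the two styles, the sketch below counts the clock edges that reach the flip-flop clock pins over a short enable trace. Clock-pin edges are used here only as a stand-in for clock-related dynamic power; the function name, the 32-flop bank and the enable trace are assumptions made for the example, not values from any tool.

```python
def clock_pin_edges(enable_per_cycle, num_flops, gated):
    """Count clock edges seen at the flip-flop clock pins over the trace."""
    edges = 0
    for enable in enable_per_cycle:
        if gated and not enable:
            continue  # gated clock stays quiet for the whole cycle
        edges += 2 * num_flops  # one rising and one falling edge per flop
    return edges


# Hypothetical enable trace: the bank is updated in 3 cycles out of 8.
enable_trace = [1, 0, 0, 0, 1, 0, 0, 1]

mux_style_edges = clock_pin_edges(enable_trace, num_flops=32, gated=False)
gated_style_edges = clock_pin_edges(enable_trace, num_flops=32, gated=True)

print("mux/enable-pin style:", mux_style_edges)
print("clock-gated style   :", gated_style_edges)
```

With the mux style the clock pins toggle every cycle regardless of the enable; with gating they toggle only in the enabled cycles.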

There are two types of clock gating styles available. They are:

1) Latch-based clock gating

2) Latch-free clock gating.

Target Skew
Target skew is the skew value that the CTS engine tries to achieve while building a balanced
clock tree. This post discusses the factors that determine the target skew for a design and
how those factors affect the design's QoR.
Designers generally tend to aim for zero skew and a perfectly balanced clock tree, but zero
skew is not good for the design overall. Why? Think in terms of latency, buffer count,
dynamic power and congestion.
With zero skew, the overall latency of the design increases, because more clock
buffers/inverters are needed to balance the flops. This can lengthen the uncommon clock
path (which is more prone to OCV) and raise dynamic power dissipation, since all the flops
and buffers toggle at the same time. Each clock net also takes double the routing resources
because of the NDR settings applied to clock nets, so congestion increases as smaller skew
values are targeted.
As technology shrinks, it becomes more critical to close timing across corners. Skew has a
direct impact on setup and hold. The main motivation for zero skew is hold timing across
all corners. By choosing an optimal skew number (target skew) instead, we can reduce clock
power consumption, clock buffer/inverter count and congestion.

How do we analyze the target skew value?


Choosing the target skew takes multiple experiments: build the clock tree with a given target
skew while keeping the constraints (SDC) constant, then compare the different skew numbers
on latency, power and congestion.
The hold timing relation for a flop-to-flop path is:

Figure 1: Timing Path

Tck->q + Tcomb > TSkew + THold

We can re-write this as,


TSkew < Tck->q + Tcomb - THold
For the worst case in hold, suppose Tcomb = 0 (flops sitting very close together, with no logic
in the path). The relation above becomes:
TSkew < Tck->q - THold
Assume the flop delay in the worst case is 100 ps and the hold time is 30 ps:
TSkew < 70 ps
This means there is room for about 70 ps of skew without degrading hold timing in the worst case.
So we run multiple iterations with the target skew set in the range of ±30 ps, analyse each one
for the factors above, and check timing (NFE) in all corners.

NFE -> Number of failing endpoints
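
The arithmetic above can be written as a small helper. This is only a sketch of the hold relation TSkew < Tck->q + Tcomb - THold with the example numbers from the text; the function name and the list of candidate skews are assumptions for illustration, and a real flow would check every corner with the timing tool.

```python
def max_hold_safe_skew_ps(t_clk_to_q_ps, t_comb_ps, t_hold_ps):
    """Upper bound on skew from TSkew < Tck->q + Tcomb - THold."""
    return t_clk_to_q_ps + t_comb_ps - t_hold_ps


# Example values from the text: 100 ps flop delay, no logic, 30 ps hold time.
budget_ps = max_hold_safe_skew_ps(t_clk_to_q_ps=100, t_comb_ps=0, t_hold_ps=30)

# Hypothetical target-skew candidates to compare against the budget.
candidate_skews_ps = [10, 20, 30, 50, 70, 90]
hold_margin_ps = {skew: budget_ps - skew for skew in candidate_skews_ps}

for skew, margin in hold_margin_ps.items():
    status = "ok" if margin > 0 else "hold risk"  # the bound is strict
    print(f"target skew {skew:>3} ps -> hold margin {margin:>4} ps ({status})")
```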


Temperature Inversion
Temperature inversion is a phenomenon seen at lower (smaller) technology nodes in which the
delay of a cell decreases as temperature rises, the opposite of what happens at higher (larger)
nodes.
Let's unfold this.
Look at the MOSFET drive current equation (saturation region, square-law model):

ID = (1/2) * u * Cox * (W/L) * (VGS - Vt)^2

So ID varies linearly with u (mobility) and with (VGS - Vt)^2, the square of the overdrive
voltage. We can conclude that the delay of a cell depends on two factors: the mobility and the
threshold voltage (Vt) of the transistor.

How mobility and Vt depend on temperature

As temperature rises, the atoms of the silicon lattice vibrate more and scatter the charge
carriers more often, so carrier mobility decreases and the delay of a cell tends to increase.

The threshold voltage also decreases as temperature rises: the number of minority carriers in
the substrate increases, so a lower Vt than usual is enough to form the channel.

To summarise, an increase in temperature makes the delay of a cell:

• decrease, due to the decrease in threshold voltage,
• increase, due to the decrease in mobility.

So the delay of a cell may increase or decrease, depending on which factor (mobility or
threshold voltage) dominates the final current.

When the overdrive voltage (VGS - Vt) is large (higher node, high VGS), the decrease in
threshold voltage with temperature is negligible because it changes the overall overdrive
voltage very little. The mobility factor dominates, so the delay of a cell increases as
temperature rises.
When the overdrive voltage (VGS - Vt) is small (smaller node, low VGS), the decrease in
threshold voltage with temperature dominates the overall overdrive voltage, so the delay of
a cell decreases as temperature rises.

Note that temperature inversion comes into the picture at lower nodes (lower voltage), with
more prominent effects on HVT cells.
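
The two competing effects can be seen in a toy numerical model. The sketch below uses the square-law current dependence discussed above and takes delay as proportional to VDD / ID. Every coefficient in it (the T^-1.5 mobility law, a 1 mV/K threshold drop, Vt = 0.45 V at 300 K, the two supply voltages) is an assumed illustrative value, not data for any real process.

```python
def drive_current(vgs, temp_k, vt_300k=0.45, vt_slope_v_per_k=1e-3, mobility_exp=1.5):
    """Relative square-law drive current at temperature temp_k (kelvin)."""
    mobility = (temp_k / 300.0) ** (-mobility_exp)      # mobility falls as T rises
    vt = vt_300k - vt_slope_v_per_k * (temp_k - 300.0)  # Vt falls as T rises
    overdrive = vgs - vt
    if overdrive <= 0:
        raise ValueError("transistor is off in this simple model")
    return mobility * overdrive ** 2


def relative_delay(vdd, temp_k):
    """Delay taken as proportional to VDD / ID (load capacitance held constant)."""
    return vdd / drive_current(vdd, temp_k)


COLD_K, HOT_K = 233.0, 398.0  # roughly -40 C and 125 C

for vdd in (1.8, 0.6):
    cold, hot = relative_delay(vdd, COLD_K), relative_delay(vdd, HOT_K)
    trend = "slower when hot" if hot > cold else "faster when hot (inversion)"
    print(f"VDD={vdd} V: delay cold={cold:.2f}, hot={hot:.2f} -> {trend}")
```

With the high supply the mobility loss wins and the cell slows down when hot; with the low supply the Vt drop changes the small overdrive far more, and the trend reverses.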

ICG Optimization
The previous post covered the ICG enable timing problem. To overcome it, we use the ICG
optimization technique at the pre-CTS stage.
ICG optimization is executed during the place stage and performs:
• Dummy CTS
• ICG splitting
• Clock-aware placement
Dummy CTS
In dummy CTS, the tool builds a dummy clock tree to identify the critical ICGs. It is called a dummy
clock tree because at the CTS stage the tool discards it and builds the actual clock tree.
The benefits of dummy CTS are:
• Accurate identification of the critical ICG enable timing paths.
• Accurate data path optimization of timing-critical ICG enable paths at the place stage.
• Effective ICG splitting and clock-aware placement (discussed below).
Note that all CTS-related settings (clock tree exceptions, NDR rules, layers, etc.) must be applied
before running the place stage, so that the dummy clock tree correlates with the actual clock tree
as closely as possible for optimum ICG optimization.

ICG Splitting
After dummy CTS, ICG optimization performs ICG splitting. If an ICG cell drives many registers,
its downstream latency increases, which leads to more timing-critical enable paths.
The tool identifies the critical ICGs found in dummy CTS and splits them, i.e. it clones one ICG into
several ICGs and places them near the registers they drive, reducing ICG downstream latency and
improving ICG enable timing.
ICG splitting is timing driven, which means only ICGs with enable timing violations are split.
Figure 1: ICG Splitting

Clock aware Placement


In the last stage of ICG optimization the tool does clock-aware placement: after the timing-driven
splitting of ICGs, it places the ICGs with critical enable timing near their register clusters (as shown
in Figure 1).
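
To make the idea of splitting and clock-aware placement concrete, here is a deliberately simplified sketch. It is not how any particular place-and-route tool implements ICG optimization: the grouping rule (sort by position, cut into groups of at most `max_fanout`) and the centroid placement are assumptions chosen only to show one ICG being cloned and each clone sitting near the registers it drives.

```python
def split_icg(registers, max_fanout):
    """Split one ICG's fanout into clones.

    registers: list of (name, x, y) tuples driven by the critical ICG.
    Returns a list of (clone_location, register_names) pairs.
    """
    ordered = sorted(registers, key=lambda reg: (reg[1], reg[2]))
    clones = []
    for start in range(0, len(ordered), max_fanout):
        group = ordered[start:start + max_fanout]
        # Place the clone at the centroid of the registers it drives.
        centroid_x = sum(reg[1] for reg in group) / len(group)
        centroid_y = sum(reg[2] for reg in group) / len(group)
        clones.append(((centroid_x, centroid_y), [reg[0] for reg in group]))
    return clones


# Hypothetical register locations: two clusters far apart.
regs = [("r0", 0, 0), ("r1", 2, 0), ("r2", 100, 0), ("r3", 102, 0)]
for location, members in split_icg(regs, max_fanout=2):
    print(f"ICG clone at {location} drives {members}")
```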

Note that ICG optimization may increase dynamic power dissipation in the design.

ICG Enable Timing Problem


Integrated Clock Gating (ICG) cells are used to reduce dynamic power dissipation in the
design and are enabled by control logic. To get a glitch-free output from an ICG cell, the
timing requirements (setup/hold) at its enable pin must be met.

Figure1: ICG cell


In the figure above, the ICG cell drives multiple flops and is enabled by the control-logic
flop R1. L2 and L3 are the latencies from the clock port to the ICG and to the flops,
respectively. So the ICG cell latency (the latency from the ICG output clock to the flops) is:

ICG latency = L3-L2

In principle one ICG cell can drive any number of flops, but as the number of flops driven by
the ICG cell increases, the tool adds more buffers in the clock path to balance the clock
tree, which increases the ICG latency.

As L3 increases, the ICG latency increases. Since the clock period is fixed, less of the
clock period than before is available to meet setup timing at the EN pin.

So we can conclude: the larger the ICG latency, the more critical the ICG enable timing.

It is always advisable to address ICG timing at the place/pre-CTS stage, because after CTS it
can be too late to fix ICG timing violations.

Pre-CTS timing analysis uses ideal clock latency for all clock pins, which means L2 = L3 and
the ICG latency is 0.

An ICG latency of 0 makes ICG enable timing analysis too optimistic, because the ICG cell now
gets the full clock period to meet setup at the enable pin (with a real clock tree it gets only
the clock period minus the ICG latency). So the actual ICG violations are not seen pre-CTS and
therefore are not fixed in the design.
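
A small numerical sketch shows how the same enable path can pass pre-CTS and fail once real latencies exist. It assumes the launching flop R1 is balanced with the gated flops (clock arrival L3) while the ICG sits earlier in the tree (arrival L2), so the time available at EN is the clock period minus the ICG latency. All numbers and names are hypothetical and in picoseconds.

```python
def icg_enable_setup_slack(period, l2, l3, clk_to_q, enable_logic_delay, icg_setup):
    """Setup slack at the ICG EN pin, all values in ps.

    Assumes R1 is clocked at latency l3 and the ICG at latency l2.
    """
    icg_latency = l3 - l2
    available = period - icg_latency
    required = clk_to_q + enable_logic_delay + icg_setup
    return available - required


# Hypothetical enable path: 1000 ps period, 100 ps clk-to-q, 600 ps logic, 50 ps setup.
path = dict(period=1000, clk_to_q=100, enable_logic_delay=600, icg_setup=50)

pre_cts_slack = icg_enable_setup_slack(l2=0, l3=0, **path)       # ideal clocks: L2 = L3
post_cts_slack = icg_enable_setup_slack(l2=200, l3=600, **path)  # ICG latency = 400 ps

print("pre-CTS slack :", pre_cts_slack, "ps")
print("post-CTS slack:", post_cts_slack, "ps")
```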

To overcome this problem, ICG optimization is the technique recommended for designs with
critical ICG enable timing. The next post discusses the ICG optimization technique and how
it is executed.

------------------------------------
Latch free clock gating
The latch-free clock gating style uses a simple AND or OR gate (depending on the edge on which the
flip-flops are triggered). If the enable signal goes inactive in the middle of a clock pulse, or if it
toggles multiple times, the gated clock output can either terminate prematurely or generate multiple
clock pulses. This restriction makes the latch-free clock gating style inappropriate for our
single-clock flip-flop based design.

Latch based clock gating


The latch-based clock gating style adds a level-sensitive latch to the design to hold the enable signal from
the active edge of the clock until the inactive edge of the clock. Since the latch captures the state of the
enable signal and holds it until the complete clock pulse has been generated, the enable signal need only
be stable around the rising edge of the clock, just as in the traditional ungated design style.
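
The difference between the two styles can be seen in a small sampled-waveform sketch. It assumes rising-edge flip-flops, AND-type gating, and a latch that is transparent while the clock is low; the waveforms use four samples per clock cycle and are made up for the example. It is a behavioural illustration only, not a model of any library cell.

```python
def and_gated_clock(clk, enable):
    """Latch-free style: the clock is ANDed directly with the enable."""
    return [c & e for c, e in zip(clk, enable)]


def latch_gated_clock(clk, enable):
    """Latch-based style: the enable is latched while the clock is low."""
    gated, latched_enable = [], 0
    for c, e in zip(clk, enable):
        if c == 0:
            latched_enable = e  # latch transparent during the low phase
        gated.append(c & latched_enable)  # value is held while the clock is high
    return gated


def high_pulse_widths(wave):
    """Widths (in samples) of each run of 1s in the waveform."""
    widths, run = [], 0
    for value in wave:
        if value:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


clk = [0, 0, 1, 1] * 3                            # three cycles, high phase = 2 samples
enable = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1]     # changes in the middle of high phases

latch_free_widths = high_pulse_widths(and_gated_clock(clk, enable))
latch_based_widths = high_pulse_widths(latch_gated_clock(clk, enable))
print("latch-free pulse widths :", latch_free_widths)
print("latch-based pulse widths:", latch_based_widths)
```

In this trace the AND-gated clock contains pulses narrower than the full high phase, while the latch-based version produces only full-width pulses or none at all.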

Specific clock gating cells are required in the library for the synthesis tools to use. The availability
of clock gating cells and their automatic insertion by EDA tools make this a simple low-power technique
to apply. An advantage of this method is that clock gating does not require modifications to the RTL
description.
