Clock Gating
The clock tree consumes more than 50% of dynamic power. The components of this power are:
1) Power consumed by combinational logic whose values change on each clock edge
It is good design practice to turn off the clock when it is not needed. Automatic clock gating is supported by
modern EDA tools, which identify the circuits where clock gating can be inserted.
RTL clock gating works by identifying groups of flip-flops which share a common enable control signal.
Traditional methodologies use this enable term to control the select on a multiplexer connected to the D
port of the flip-flop or to control the clock enable pin on a flip-flop with clock enable capabilities. RTL
clock gating uses this enable signal to control a clock gating circuit which is connected to the clock ports
of all of the flip-flops with the common enable term. Therefore, if a bank of flip-flops which share a
common enable term have RTL clock gating implemented, the flip-flops will consume zero dynamic
power as long as this enable signal is false.
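To make the difference concrete, here is a minimal sketch that counts the clock edges reaching the clock pins of a flip-flop bank in the enable-mux style versus the gated-clock style. The function name, the enable trace and the bank size of 32 are illustrative assumptions, and toggle count is used only as a rough proxy for clock-pin dynamic power.

```python
def clock_pin_toggles(enable_trace, num_flops, gated):
    """Count clock edges seen at the flop clock pins over a trace of cycles.

    enable_trace: one boolean per clock cycle (the shared enable term).
    gated: True models RTL clock gating, False models the enable-mux style,
           where the clock still reaches every flop on every cycle.
    """
    toggles = 0
    for enable in enable_trace:
        if gated and not enable:
            continue  # gated clock is held: no edges reach the flops
        toggles += 2 * num_flops  # one rising and one falling edge per flop
    return toggles


# Hypothetical enable pattern: the bank is loaded on 2 cycles out of 8.
enable_trace = [True, False, False, False, True, False, False, False]
mux_style = clock_pin_toggles(enable_trace, num_flops=32, gated=False)
gated_style = clock_pin_toggles(enable_trace, num_flops=32, gated=True)
print(mux_style, gated_style)
```

In this toy trace the gated bank sees a quarter of the clock edges, because the enable is true on only a quarter of the cycles.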
There are two types of clock gating styles available. They are:
Specific clock gating cells must be available in the library for the synthesis tools to use. The availability of clock
gating cells and their automatic insertion by EDA tools make this a simple low-power technique.
An advantage of this method is that clock gating does not require modifications to the RTL description.
Target Skew
Target skew is the skew value that the CTS engine tries to achieve when building a balanced
clock tree. In this post we discuss the factors that determine the choice of target skew for a
design and how those factors affect design QoR.
As designers, we tend to aim for zero skew and a perfectly balanced clock tree, but zero skew
is not good for the design overall. Why? Think about it in terms of latency, buffer count,
dynamic power and congestion.
For zero skew, the overall latency of the design increases, because more clock buffers and
inverters are needed to balance the flops. This can lengthen the uncommon clock path
(which is more prone to OCV variation) and raises dynamic power dissipation, since all the
flops and buffers toggle at the same time. Each clock net also takes about double the routing
resources because of the NDR settings applied to clock nets, so congestion increases as a
tighter skew is targeted.
As technology shrinks, closing timing across corners becomes more critical.
Skew has a direct impact on setup and hold. The main motive for attaining zero skew is hold
timing across all corners. By choosing an optimal skew number (target skew), we can
reduce clock power consumption and clock buffer/inverter count, and significantly reduce
congestion.
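The direct impact of skew on setup and hold can be sketched with the usual single-cycle slack equations. This is a simplified illustration, not any tool's timing engine: the function names and all delay values (in ns) are assumptions, skew is taken as capture clock latency minus launch clock latency, and OCV derates and uncertainty are ignored.

```python
def setup_slack(period, skew, clk_to_q, data_max, t_setup):
    """Setup slack; skew = capture clock latency - launch clock latency."""
    return period + skew - (clk_to_q + data_max + t_setup)


def hold_slack(skew, clk_to_q, data_min, t_hold):
    """Hold slack for the same launch/capture pair."""
    return clk_to_q + data_min - (t_hold + skew)


# Hypothetical path: 1.0 ns clock, values in ns.
for skew in (-0.1, 0.0, 0.1):
    s = setup_slack(period=1.0, skew=skew, clk_to_q=0.1, data_max=0.8, t_setup=0.05)
    h = hold_slack(skew=skew, clk_to_q=0.1, data_min=0.05, t_hold=0.03)
    print(f"skew={skew:+.2f}  setup slack={s:+.3f}  hold slack={h:+.3f}")
```

Positive skew on this path helps setup and eats into hold margin, and negative skew does the opposite, which is why a non-zero target skew has to be chosen with both checks in mind.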
So, ID is proportional to u (mobility) and to (VGS-Vt)^2, the square of the overdrive voltage.
We can conclude that the delay of a cell depends on two factors: the mobility and the
threshold voltage (Vt) of a transistor.
As temperature rises, lattice atoms vibrate more, so the mobility of charge carriers
decreases and the delay of a cell increases.
When the overdrive voltage (VGS-Vt) is large (older, larger nodes with high VGS), the decrease
in threshold voltage with rising temperature is negligible, because it changes the overall
overdrive voltage very little. The mobility factor dominates here, so the delay of a cell
increases as temperature rises.
When the overdrive voltage (VGS-Vt) is small (smaller nodes with lower VGS), the decrease in
threshold voltage with rising temperature dominates the overall overdrive voltage, so the
delay of a cell decreases as temperature rises.
Note that temperature inversion comes into the picture at lower nodes (lower voltage), and
its effect is more prominent on HVT cells.
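The two regimes above can be reproduced with a small numerical sketch of the square-law relation ID ~ u * (VGS-Vt)^2. The temperature models are assumptions chosen only for illustration: mobility falling as (T/T0)^-1.5, Vt falling by 1 mV per kelvin, and cell delay taken as inversely proportional to ID at a fixed supply. The voltages and Vt values are hypothetical, not taken from any library.

```python
T0 = 300.0  # reference temperature in kelvin


def drain_current(vgs, temp, vt0=0.4, mu_exp=-1.5, vt_tc=-1.0e-3):
    """Relative drain current, normalised so that mobility is 1 at T0."""
    mobility = (temp / T0) ** mu_exp        # assumed mobility model
    vt = vt0 + vt_tc * (temp - T0)          # assumed linear Vt drop with temperature
    return mobility * (vgs - vt) ** 2


def relative_delay(vgs, temp, **kw):
    """Delay at temp divided by delay at T0, assuming delay ~ 1 / ID."""
    return drain_current(vgs, T0, **kw) / drain_current(vgs, temp, **kw)


high_v = relative_delay(vgs=1.2, temp=400.0)             # large overdrive
low_v = relative_delay(vgs=0.6, temp=400.0)              # small overdrive
low_v_hvt = relative_delay(vgs=0.6, temp=400.0, vt0=0.5) # small overdrive, higher Vt
print(high_v, low_v, low_v_hvt)
```

With these assumed numbers the high-voltage case gets slower when hot (ratio above 1), the low-voltage case gets faster (ratio below 1), and the higher-Vt device shows the strongest inversion.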
ICG Optimization
In the previous post we looked at the ICG enable timing problem. To overcome it, we use the
ICG optimization technique in the pre-CTS stage.
ICG optimization is executed during the place stage and performs:
Dummy CTS
ICG splitting
Clock-aware placement
Dummy CTS
In dummy CTS, the tool builds a dummy clock tree to identify the critical ICGs. It is called a dummy clock
tree because in the CTS stage the tool discards it and builds the actual clock tree.
Benefits of dummy CTS are:
Accurate identification of the timing-critical ICG enable paths
Accurate data path optimization of timing-critical ICG enable paths in the place stage
Effective ICG splitting and clock-aware placement (discussed below)
Take care to apply all CTS-related settings, such as clock tree exceptions, NDR rules and layers,
before running the place stage, so that the dummy CTS clock tree correlates with the actual CTS clock tree
as closely as possible for optimum ICG optimization.
ICG Splitting
After dummy CTS, ICG optimization performs ICG splitting. We know that if an ICG cell drives many registers,
its downstream latency increases, which leads to more timing-critical enable paths.
The tool identifies the critical ICGs in dummy CTS and splits them, i.e. it replaces one ICG with several ICGs and
places them near the registers they drive, to reduce ICG downstream latency and improve ICG enable timing.
ICG splitting is timing driven, which means only ICGs with enable timing violations are split.
Figure 1: ICG Splitting
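As a rough illustration of the idea (not the algorithm any P&R tool actually uses), the sketch below splits the sinks of one ICG into groups of at most a given fanout and puts each cloned ICG at the centroid of its group. The function name, the sort-by-coordinate grouping and the fanout limit are all hypothetical simplifications; a real tool would also be timing driven.

```python
def split_icg(flop_xy, max_fanout):
    """Split one ICG's sinks into groups of at most max_fanout flops.

    flop_xy: list of (x, y) flop locations driven by the original ICG.
    Returns a list of (clone_location, flops_in_group) tuples, where the
    clone location is the centroid of the flops it drives.
    """
    ordered = sorted(flop_xy)  # naive spatial grouping: sort by x, then y
    clones = []
    for i in range(0, len(ordered), max_fanout):
        group = ordered[i:i + max_fanout]
        cx = sum(x for x, _ in group) / len(group)
        cy = sum(y for _, y in group) / len(group)
        clones.append(((cx, cy), group))
    return clones


# Hypothetical sinks: two clusters of registers far apart on the die.
flops = [(0, 0), (2, 0), (100, 0), (102, 0)]
for location, group in split_icg(flops, max_fanout=2):
    print(location, group)
```

Each clone ends up next to the registers it drives, so the clock path from the clone to its sinks is short.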
Note that ICG optimization may increase dynamic power dissipation in the design.
Ideally, one ICG cell could drive any number of flops. But as the number of flops driven by
an ICG cell increases, the tool adds more buffers in the clock path to balance the clock tree,
which increases the ICG latency.
As the L3 latency increases, the ICG latency increases. Since the clock period is fixed,
less of the clock period is now available to meet setup timing at the EN pin.
So we can conclude that the larger the ICG latency, the more critical the ICG enable timing.
It is always advisable to address ICG timing in the place/pre-CTS stage, because after CTS it
can be too late to fix ICG timing violations.
We know that pre-CTS timing analysis uses ideal clock latency for all clock pins, which means
L2=L3 and the ICG latency is 0.
An ICG latency of 0 makes ICG enable timing analysis too optimistic, because the ICG cell
now gets the full clock period to meet setup at the enable pin (previously it got only
L3 - ICG latency).
So in pre-CTS the actual ICG violations are not seen, and therefore not fixed in the design.
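A minimal sketch of why the pre-CTS view is optimistic: assume the launch flop sits at the balanced sink latency and the ICG clock pin sees the clock earlier than the flops by the ICG latency. Under that assumption the ICG latency comes straight out of the time available at the enable pin. The function name and all delay values (in ns) are hypothetical.

```python
def icg_enable_setup_slack(period, icg_latency, clk_to_q, enable_path_delay, icg_setup):
    """Setup slack at the ICG enable pin.

    Assumes the ICG clock pin receives the clock icg_latency earlier than the
    launch flop, so that much time is lost from the clock period.
    """
    return period - icg_latency - (clk_to_q + enable_path_delay + icg_setup)


# Pre-CTS: ideal clocks, so the ICG latency is 0.
pre_cts = icg_enable_setup_slack(1.0, 0.0, clk_to_q=0.1, enable_path_delay=0.6, icg_setup=0.05)
# Post-CTS: the same path once the ICG has 0.3 ns of downstream clock latency.
post_cts = icg_enable_setup_slack(1.0, 0.3, clk_to_q=0.1, enable_path_delay=0.6, icg_setup=0.05)
print(pre_cts, post_cts)
```

With these numbers the path passes comfortably under ideal clocks and fails once the ICG latency is accounted for, which is the situation dummy CTS is meant to expose early.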