Analyzing Clock Trees: Jeff Shabel
Jeff Shabel
QUALCOMM, Inc.
ABSTRACT
Clock tree power continues to be a major contributor to dynamic and, to a lesser degree, static
chip power. It is imperative for low-power designs to reduce clock tree power as much as
possible. The introduction of Power Compiler makes it possible to drastically cut dynamic clock
tree power. This paper shows how PrimeTime 2004.12 was used to obtain fairly accurate clock
tree power estimates on a 130nm chip. On this chip, Power Compiler dynamic clock gating was
applied to several blocks. This paper describes how much power was consumed by each
component of the clock tree, including wire, pin, clock buffer and register components. By
analyzing how these different components contribute to the overall clock tree power, it was
possible to find ways to improve the library to achieve lower-power designs. In addition, this
analysis shows how much of the clock tree power was consumed at the leaf of the tree. In
situations where the leaf of the tree consumes more power, Power Compiler can be used to
achieve significant savings in dynamic clock tree power. This paper explains the potential
savings achieved on a real chip by using Power Compiler and compares this information to real
silicon data. The paper also compares two different chips, one that used Power Compiler and one
that did not, and describes how power savings for two blocks were achieved with Power
Compiler.
In addition to analyzing clock tree power, this paper explains how to take advantage of the new
PrimeTime 2004.12 feature, read_parasitics_load_locations, to assist with visualizing clock
trees. The paper describes how Tcl scripts can be used to plot clock trees, critical paths, and
clock tree power relative to a given floorplan. These scripts can provide the designer with a
quick and intuitive way to analyze clock trees and to find any potential issues with them.
Table of Contents
1.0 Introduction
2.0 How is clock tree power measured now?
2.1 Static measurements
2.1.1 PrimePower
2.1.2 With real silicon
2.2 Dynamic measurements
2.2.1 PrimePower
2.2.2 With real silicon
3.0 Using PrimeTime to calculate clock tree power
3.1 Overview
3.2 Details of the PrimeTime Tcl script
4.0 Static analysis results
4.1 Ungated clock tree power
4.2 Gated clock tree power
5.0 Correlation to silicon
5.1 Setup and method
5.2 Ungated clocks
5.3 Gated clocks
5.4 Dynamic clock savings
6.0 Power plotting
6.1 Goals and setup
6.2 Sample results
7.0 Clock tree plotting
7.1 Goals and setup
7.2 Sample results
8.0 Conclusion
9.0 Acknowledgements
10.0 References
11.0 Appendix
11.1 Perl script to preprocess .lib file
11.2 Clock Analyzer Tcl script
The clock tree analysis described in this paper was done entirely within PrimeTime. While
PrimePower can do some of this analysis, PrimePower does not have the flexibility to analyze
different aspects of clock tree power, as can be done with simple Tcl scripts inside PrimeTime.
First, PrimePower requires a separate license from PrimeTime. While many companies have
PrimeTime licenses, not all companies have a PrimePower license nor do they have people
sufficiently experienced with the tool to use it effectively. Second, PrimePower does not
consider the internal register power that is consumed when only the clock pin is toggling. Third,
PrimePower does not have the flexibility to quickly analyze clock trees starting and ending at
specific points as required by the user. Fourth, PrimePower does not have the ability to extract
additional useful clock tree power statistics, which are shown later in this paper.
Because it is difficult to create an effective measurement setup with silicon, it is suggested that a
sanity check that correlates clock tree power results to some static predictions be performed.
Clock tree power is calculated by taking into account the components shown in
Figure 3-1. The resulting general formula for clock tree power consumption is:
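A form consistent with the components named above, and with the Tcl script in the Appendix (which accumulates capacitance times Vdd times frequency for each net), can be written as follows. The symbols are assumptions introduced here for illustration: C_wire and C_pin are the total wire and pin capacitances on the clock nets, E_int is the per-cycle internal energy (rise plus fall) of a cell taken from the .lib file, and f_clk is the clock frequency.

```latex
P_{\text{clock tree}} = f_{\text{clk}} \left[ \left( C_{\text{wire}} + C_{\text{pin}} \right) V_{dd}^{2}
  + \sum_{\text{buffers}} E_{\text{int}} + \sum_{\text{leaf cells}} E_{\text{int}} \right]
```

Dividing by V_dd gives the corresponding current, which is the quantity reported in the tables later in this paper.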
PrimeTime can provide the wire and pin capacitances required to calculate total power.
PrimeTime does not provide the internal switching power of the buffers and leaf cells that comes
from the clock lines toggling. This information needs to be extracted from the .lib file for the
standard cell library.
The internal switching power of a clock tree buffer is a function of two variables: input transition
time and output load. It is assumed that the input transition times and output loads of clock tree
buffers are fairly tight (and consistent) across the clock tree. If this is the case, the
internal switching power of a clock tree buffer does not change much within the range of
acceptable transition times and output loads for most clock tree synthesis (CTS) settings.
Because of this phenomenon, it is reasonable to choose an average power value from the
internal_power table inside the .lib file of the standard cell library. An example is shown in
Figure 3-2. Note that the values in red were chosen by the preprocessing Perl script to feed into
PrimeTime. These values were chosen because they are near the center of the power table.
power_lut_template (clock_buffer1_energy_template_0) {
variable_1 : input_transition_time ;
variable_2 : total_output_net_capacitance ;
index_1 ( "0.1,0.25,0.3,0.45,0.5,2.0" );
index_2 ( "0,5,25,50,90,340,2000" );
}
cell (clock_buffer1) {
[snip]
pin (z) {
[snip]
internal_power () {
related_pin : "a" ;
fall_power (clock_buffer1_energy_template_0) {
values ( "77,77,78,79,79,80,80",\
"76,76,77,78,78,83,80",\
"75,75,76,77,77,79,79",\
"76,76,76,76,77,80,80",\
"78,77,77,78,78,79,81",\
"85,85,84,83,84,85,86" );
}
rise_power (clock_buffer1_energy_template_0) {
values ( "80,80,81,81,82,81,82",\
"79,79,80,80,81,78,79",\
"77,77,78,78,78,78,79",\
"79,79,78,77,76,74,75",\
"81,82,84,86,88,71,71",\
"87,87,86,86,86,88,90" );
}
}
}
}
The same principle applies to registers. Internal register power depends only on input transition
time. Because transition times are fairly sharp after CTS, it is reasonable to choose an average
(or best guess) value from the .lib file of the standard cell library. An example is shown in
Figure 3-3. Note that the values in red were chosen by the preprocessing Perl script to feed into
PrimeTime. These values were chosen because they are near the center of the power table.
At QUALCOMM, we use our own standard cell library. However, this same principle can be
applied to the TSMC standard cell library as well.
A Perl script can be used to preprocess the .lib files and write a Tcl script to read into
PrimeTime. This Tcl script defines a new user attribute for each clock buffer and register in the
standard cell library to store these values. Note that the Perl script must add both the rise and
fall power and supply the summed value to PrimeTime. The summed value represents the
consumed power during one clock cycle. A sample portion of the resulting Tcl script is shown
in Figure 3-4.
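The preprocessing step can be sketched in a few lines. The example below is written in standalone Python rather than Perl so that it can be run without a .lib file on disk; it is an illustration under stated assumptions, not the script used on the chip. It takes the third row and fourth entry of each table (the same position the Perl script in the Appendix uses), adds the rise and fall values, and formats the resulting PrimeTime command. The library name mylib and the function names are hypothetical.

```python
import re

# Tables copied from the clock_buffer1 example earlier in this section.
FALL_POWER = r'''values ( "77,77,78,79,79,80,80",\
 "76,76,77,78,78,83,80",\
 "75,75,76,77,77,79,79",\
 "76,76,76,76,77,80,80",\
 "78,77,77,78,78,79,81",\
 "85,85,84,83,84,85,86" );'''
RISE_POWER = r'''values ( "80,80,81,81,82,81,82",\
 "79,79,80,80,81,78,79",\
 "77,77,78,78,78,78,79",\
 "79,79,78,77,76,74,75",\
 "81,82,84,86,88,71,71",\
 "87,87,86,86,86,88,90" );'''


def center_value(values_block, row=2, col=3):
    """Return one entry of a .lib values() table (zero-based row and column)."""
    rows = re.findall(r'"([^"]+)"', values_block)
    return float(rows[row].split(",")[col])


def attribute_command(library, cell, fall_block, rise_block):
    """Build the Tcl line that stores the rise + fall value on a library cell."""
    total = center_value(fall_block) + center_value(rise_block)
    return (f"set_user_attribute [get_lib_cells {library}/{cell}] "
            f"total_power {total:g}")


if __name__ == "__main__":
    # "mylib" is a hypothetical library name.
    print(attribute_command("mylib", "clock_buffer1", FALL_POWER, RISE_POWER))
```

For these tables the selected entries are 77 (fall) and 78 (rise), so the stored total is 155 in the library's energy units.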
The Tcl script provides PrimeTime with the additional information necessary to calculate clock
tree current consumption. A Tcl script can be written to traverse the clock tree, computing clock
tree power as it traverses, until it reaches a leaf cell. A leaf cell is typically a register but can
also be a memory element, custom block, or a random logic gate.
While PrimeTime is traversing the tree, the Tcl script can save various statistics that can be used
after PrimeTime completes the traversal. First, the script can optionally stop at Power Compiler
clock gating cells (CGCs). If the Tcl script is run twice on the same clock, stopping once at
CGCs, and another time traversing through them, it is possible to see the maximum effect of
Power Compiler on clock tree power. In one case, the script calculates the clock tree
power assuming that the CGCs are in a gating state. In the other case, it calculates the clock
tree power assuming that the CGCs are in a non-gating state. The difference between these two
results is the maximum clock tree power that Power Compiler clock gating can save.
Second, the script can track how much current is consumed at the leaf of the tree. In this paper,
power consumed at the leaf of the tree is computed by summing the currents due to the final wire
and pin caps after the last buffer (or CGC), as well as the final leaf cell. If a majority of the
clock tree power comes from the leaf of the tree, Power Compiler will be extremely useful in
saving clock tree power. However, if a majority of the clock tree power comes from higher up in
the tree, Power Compiler would not be very effective in gating off clock tree power.
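Third, the script can keep track of how much current is consumed by various components of the
clock tree: wire capacitance, pin capacitance, and the internal power of clock buffers, CGCs,
registers, memories, and miscellaneous cells.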
With this information, it should be easy to see which areas of the clock tree should be evaluated
to save the most power.
1. Prerequisite: Source the Tcl script generated from preprocessing the standard cell .lib file.
2. The user provides the start point of the clock tree.
3. Traverse the tree recursively, continuing only if the script finds a legal clock tree library cell.
4. Optionally stop at Power Compiler gating cells.
5. Record all components and their power contributions along the way, and also the power
   contributions of components at each level of the tree.
6. For each leaf traversed, record the power consumed:
   a. Include the last wire and pin caps.
   b. Include the final internal switching power of the leaf cell.
7. When traversal is complete, report final power statistics.
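The steps above can be sketched on a toy tree. The following standalone Python example is only an illustration of the recursion, not the PrimeTime Tcl script from the Appendix. The Cell class, the component names, and all numeric values are hypothetical. Units are chosen so that capacitance in pF times volts times MHz, and energy in pJ times MHz divided by volts, both give current in uA.

```python
from dataclasses import dataclass, field

LEAF_KINDS = {"reg", "mem"}


@dataclass
class Cell:
    kind: str                  # "cbuf", "cgc", "reg", or "mem"
    energy_pj: float           # internal energy per clock cycle (rise + fall)
    net_cap_pf: float = 0.0    # wire + pin capacitance of the net this cell drives
    fanout: list = field(default_factory=list)


def trace(cell, vdd, freq_mhz, stop_at_cgc=False, totals=None):
    """Walk the tree and accumulate current (uA) per component."""
    if totals is None:
        totals = {"net": 0.0, "cbuf": 0.0, "cgc": 0.0,
                  "reg": 0.0, "mem": 0.0, "last_stage": 0.0}
    internal = cell.energy_pj * freq_mhz / vdd
    totals[cell.kind] += internal
    if cell.kind in LEAF_KINDS:
        totals["last_stage"] += internal       # step 6b: leaf internal power
        return totals
    if cell.kind == "cgc" and stop_at_cgc:
        return totals                          # step 4: nothing toggles downstream
    net = cell.net_cap_pf * vdd * freq_mhz
    totals["net"] += net
    if any(child.kind in LEAF_KINDS for child in cell.fanout):
        totals["last_stage"] += net            # step 6a: last wire and pin caps
    for child in cell.fanout:
        trace(child, vdd, freq_mhz, stop_at_cgc, totals)
    return totals


def total(totals):
    """Total current; last_stage is a subset of the total, so it is excluded."""
    return sum(v for k, v in totals.items() if k != "last_stage")


# Hypothetical tree: a root buffer driving one CGC (two registers behind it)
# and one ordinary buffer (one register behind it).
ROOT = Cell("cbuf", 0.2, 0.5, [
    Cell("cgc", 0.1, 0.3, [Cell("reg", 0.05), Cell("reg", 0.05)]),
    Cell("cbuf", 0.2, 0.2, [Cell("reg", 0.05)]),
])

if __name__ == "__main__":
    ungated = trace(ROOT, vdd=1.0, freq_mhz=100.0)
    gated = trace(ROOT, vdd=1.0, freq_mhz=100.0, stop_at_cgc=True)
    saving = 1.0 - total(gated) / total(ungated)
    print(f"ungated {total(ungated):.1f} uA, gated {total(gated):.1f} uA, "
          f"maximum saving {saving:.0%}")
```

Running the traversal twice, once with stop_at_cgc set and once without, gives the two bounds described earlier. The sketch keeps counting the internal power of a closed CGC, which is a simplifying assumption.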
Current Source        Chip 1   Chip 2   Chip 3   Chip 4   Chip 5   Chip 6   Chip 7
(% of Total)          130nm    130nm    130nm    130nm    130nm    90nm     90nm
Misc                  0%       0%       0%       0%       0%       2%       2%
Memory                0%       0%       1%       1%       1%       0%       0%
CGC (Int)             4%       3%       5%       4%       4%       1%       1%
Pin                   10%      10%      9%       9%       9%       10%      10%
Clock Buffer (Int)    15%      15%      14%      16%      16%      14%      14%
Wire                  18%      24%      25%      25%      24%      24%      23%
Register (Int)        52%      48%      47%      45%      46%      49%      50%
Last Stage            68%      69%      70%      69%      70%      70%      70%
Table 4-1 Ungated clock tree power for five 130nm chips and two 90nm chips
Table 4-1 shows two very significant trends. First, note that regardless of the chip or the
technology, roughly 70% of the clock tree power comes from the last stage, that is, the last net
and leaf cell. This indicates that using Power Compiler will be very beneficial and that it should
be run on these chips.
Second, note that roughly 45 to 50% of the clock tree power is due to register power and only
about 15% is due to clock tree buffer power. Therefore, while it always helps to improve clock
tree buffer cell designs, improving the register design offers a larger potential reduction in
overall dynamic power consumption.
                  Maximum current savings per
Current source    component using Power Compiler
Misc              0%
Memory            0%
CGC               11%
Pin               26%
CBUF              13%
Wire              31%
Register          33%
Total Savings     28%
Table 4-2 Maximum current savings using Power Compiler
Also note that Power Compiler was not run on a large portion of the chip. It was run on the
blocks with the fastest clocks but not on many others, mostly due to tool issues on (now) older
versions of Design Compiler. Only 27% of the registers in the chip were synthesized using
Power Compiler. Not all of the registers in that 27% were successfully gated using CGCs. Clock
gating was done on 66% of those registers, which equates to 18% of the total registers in the
chip.
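The 18% figure follows directly from the two percentages quoted above:

```latex
0.27 \times 0.66 \approx 0.178 \approx 18\%
```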
So, even with only 18% of the registers successfully gated off, Power Compiler can save us up to
28% on our clock tree power. Again, note that the main reason for the high power savings is that
the highest-speed blocks were using this feature. The medium-to-slower speed blocks were not
able to use Power Compiler. This is an important point. Even if Power Compiler cannot be run
on all the blocks in a chip, it is imperative that Power Compiler be used on the highest-speed
blocks to maximize the potential savings.
Table 4-3 shows various gating statistics for 15 different clock trees that were synthesized using
Power Compiler.
              Maximum power   Gated           Average CGC   Median CGC
Clock name    savings (%)     registers (%)   fanout        fanout
clock1        27%             40%             32            28
clock2        33%             54%             26            18
clock3        35%             59%             31            32
clock4        42%             58%             25            32
clock5        44%             70%             27            29
clock6        45%             56%             43            34
clock7        47%             55%             35            30
clock8        48%             77%             17            14
clock9        55%             86%             20            14
clock10       57%             77%             32            30
clock11       57%             72%             53            32
clock12       62%             72%             31            32
clock13       66%             84%             43            16
clock14       67%             90%             21            26
clock15       75%             88%             31            32
Table 4-3 Gating statistics for 15 clock trees synthesized using Power Compiler
It is useful to note the wide range in power savings from clock regime to clock regime. Some
savings were as high as 75%, while others were as low as 27%. In general, the clocks on this
chip averaged 45 to 50% maximum power savings on their clock trees.
The clock trees that were not synthesized using Power Compiler can be easily correlated to static
measurements. The clock trees that were synthesized using Power Compiler are more difficult to
correlate. These clock trees have clock gating cells and it is not known which gating cells are in
a non-gating state and which cells are in a gating state while the block is being held in reset. It is
completely design dependent. Therefore, static clock power predictions can only bound the
power when Power Compiler is used. Clock tree power as measured on silicon should be within
the bounds predicted by PrimeTime.
The current measurements have a margin of error due to the precision of the measuring device in
the lab. For smaller clock regimes, the margin of error can be very close to the actual measured
current value. Therefore, it is necessary to consider the measurement error when correlating
clock tree power to static measurements, as noted in the following sections.
Note that almost all clocks are within 10% of the expected value predicted by PrimeTime. For the
clocks that are outside the 10% range, the difference could be due to random logic that is not
held in reset and is toggling with the clock.
           Silicon       PT      PT        Silicon measurement
Clock      measurement   gated   ungated   margin of error
clock1     38            0       100       1
clock2     -2            0       100       1
clock3     -3            0       100       7
clock4     31            0       100       0
clock5     27            0       100       3
clock6     81            0       100       2
clock7     43            0       100       36
clock8     80            0       100       7
clock9     86            0       100       17
clock10    8             0       100       174
clock11    90            0       100       9
clock12    56            0       100       6
clock13    -9            0       100       10
clock14    26            0       100       0
Table 5-2 Gated clock correlation to silicon
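The values in Table 5-2 appear to be expressed on a scale where the PrimeTime gated prediction is 0 and the ungated prediction is 100. Assuming that interpretation, and assuming the margin of error is expressed on the same scale, the bounding check can be sketched in standalone Python as follows. The function names and the raw current values in the usage example are hypothetical.

```python
def normalize(measured, pt_gated, pt_ungated):
    """Map a measurement onto a scale where PT gated = 0 and PT ungated = 100."""
    return 100.0 * (measured - pt_gated) / (pt_ungated - pt_gated)


def within_bounds(score, margin):
    """True if the score lies inside 0..100 once the margin of error is allowed for."""
    return -margin <= score <= 100.0 + margin


if __name__ == "__main__":
    # Hypothetical raw currents in mA: measured, PT gated bound, PT ungated bound.
    print(normalize(3.5, 2.0, 5.0))
    # Rows taken from Table 5-2 as (clock, normalized measurement, margin of error).
    for name, score, margin in [("clock13", -9, 10), ("clock3", -3, 7), ("clock2", -2, 1)]:
        print(name, "within bounds" if within_bounds(score, margin) else "outside bounds")
```

Under these assumptions, clock13 and clock3 fall within the predicted bounds once the margin of error is considered, while clock2 lies just outside them.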
                  PT maximum
                  predicted CT   Silicon   Cell count
Block             savings        savings   difference   Comments
block1            38%            40%       -8.5%        Savings higher probably also due to cell count decrease
block2 / mode1    44%            39%       +4%
block2 / mode2    44%            31%       +4%
Table 5-3 Dynamic clock savings from Power Compiler on silicon
For both of these blocks, note the significant savings due to Power Compiler clock gating. The
power measurements were taken once after each block was set up, just before the blocks were
run. The power measurements were taken again while each block was running, doing real work.
The difference between these two numbers is what is being compared between these two chips.
This data provides proof that real dynamic clock tree power savings can be accomplished by
using Power Compiler.
One application of this option is to create power plots of the chip that show clock tree power
consumption. With this capability, a user can quickly tell where the most power in the chip is
being consumed due to a clock tree. Furthermore, since this is done using a Tcl script, the user
has the flexibility to control things such as stopping (optionally) at clock gating cells. This can
be beneficial when analyzing clock tree power. Because clock tree power is roughly 40 to 50%
The actual plotting is done using Gnuplot 4.0+. A Tcl script can be written to write Gnuplot
commands to generate the plots of interest. A sample Tcl script is provided in the Appendix.
The Tcl script creates bins for a small area of a chip. The total clock tree power consumed
within that area is summed up and normalized with the rest of the bins. Bins with higher clock
tree power are represented by orange and red colors. Bins with lower clock tree power are
indicated by green and blue colors.
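The binning step can be sketched as follows in standalone Python. This is an illustration only. The die size (2000 by 2000) and bin counts (20 by 20) match the defaults in the Appendix script, the cell coordinates and currents are hypothetical, and normalizing to the largest bin is an assumption about what "normalized" means here.

```python
def bin_currents(cells, die_x, die_y, nx, ny):
    """Sum per-cell current into an nx-by-ny grid, then scale so the peak bin is 1.0."""
    bins = [[0.0] * ny for _ in range(nx)]
    for x, y, current in cells:
        i = min(int(x / die_x * nx), nx - 1)   # clamp cells sitting on the die edge
        j = min(int(y / die_y * ny), ny - 1)
        bins[i][j] += current
    peak = max(max(column) for column in bins)
    if peak > 0:
        bins = [[value / peak for value in column] for column in bins]
    return bins


# Hypothetical cells as (x, y, current); any consistent current unit works.
CELLS = [(50, 50, 2.0), (60, 70, 1.0), (1950, 1999, 1.5)]

if __name__ == "__main__":
    grid = bin_currents(CELLS, die_x=2000, die_y=2000, nx=20, ny=20)
    print(grid[0][0], grid[19][19])
```

The normalized grid can then be written out for Gnuplot, with the value in each bin selecting its color.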
It is possible to annotate the top-level floorplan to the plot as well. This can be done by using
PrimeTime or Physical Compiler, depending on the particular design flow. If PrimeTime is used
for flat timing analysis (that is, no hardmacros, ETMs, or ILMs), Physical Compiler should be
used to extract the hardmacro boundaries.
It is helpful to have this flexibility in PrimeTime since many users already use this tool for final
chip analysis.
Note that the real routes of the clocks are not shown. A straight connection is made using
Gnuplot to connect the buffers in the tree.
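Straight connections of this kind are easy to express in a Gnuplot data file: two points per connection, followed by a blank line so that Gnuplot does not join one connection to the next when plotting with lines. The standalone Python sketch below illustrates the file format only; the coordinates are hypothetical and the function name is not from the original script.

```python
def tree_segments(edges):
    """Return Gnuplot data text: two points per edge, blank line between edges."""
    lines = []
    for (x1, y1), (x2, y2) in edges:
        lines.append(f"{x1} {y1}")
        lines.append(f"{x2} {y2}")
        lines.append("")           # a blank line ends the current line segment
    return "\n".join(lines)


# Hypothetical buffer locations: root to a second-level buffer, then to a leaf.
EDGES = [((100, 100), (400, 300)), ((400, 300), (450, 900))]

if __name__ == "__main__":
    print(tree_segments(EDGES), end="")
```

A file written this way can be drawn with a command of the form used in the Appendix, for example plot [0:2000][0:2000] 'test.gnuplot' with lines ls 1.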
8.0 Conclusion
Clock trees typically consume 40 to 50% of the dynamic power of a chip. Analysis shows that
most of that clock tree power, upwards of 70%, is at the leaf. In fact, 45 to 50% of the clock tree
power comes from the internal switching power of the registers.
The data shows that Power Compiler should, and does, help reduce clock tree power
significantly.
The data also identifies which areas of the clock tree should be considered for improvement to
get the most “bang for the buck.” First, it is imperative that the placement of the last buffer (or
gating cell) be optimized with respect to the leaf cells of the tree. The closer together that the
last buffer and leaf are, the greater are the power savings that can be achieved. Second,
considerable effort should be placed on improving the internal switching power of the registers.
If this improvement comes at the cost of performance, then it might be feasible to have two types
of registers, one with less dynamic power consumption but poorer performance, and one with
better performance but more dynamic power consumption. With these two register types,
synthesis tools should be able to choose the appropriate register to meet the design constraints.
If the synthesis tools cannot handle this trade-off, external scripts can be written to perform the
necessary register swaps where needed. If scripts are used, it is beneficial if the two register
types have identical footprints so that a cell swap can be performed easily without affecting
placement.
9.0 Acknowledgements
I would also like to thank both Elisabeth Moseley and Geoffrey Suzuki of Synopsys for their
help in researching and writing this paper.
10.0 References
[1] Chun, K. and Ling, A. Placement approach cuts SoC power needs. EE Times, 11/21/03
http://www.eetimes.com/story/OEG20031121S0035
[2] Synopsys PrimeTime User Guide, Version 2004.12, 2004.
11.0 Appendix
11.1 Perl script to preprocess .lib file
#!/bin/perl
# Preprocess a .lib file. For each register cell, pick one value near the
# center of the clock pin's fall_power and rise_power tables, add them, and
# write a set_user_attribute command for PrimeTime.
# This will take some tweaking to get right for every .lib file.
use strict;
use warnings;

# The original listing does not show how the input file and library name
# are supplied; taking them from the command line is an assumption.
my ($library_name, $libfile) = @ARGV;
die "usage: $0 <library_name> <file.lib>\n" unless defined $libfile;
my $outfile = "set_attribute_library.tcl";
open(INFILE,  '<', $libfile) or die "Cannot read $libfile: $!\n";
open(OUTFILE, '>', $outfile) or die "Cannot write $outfile: $!\n";

my ($cellname, $in_clock_pin, $table, $row) = ('', 0, '', 0);
my %power;    # keys "fall" and "rise" for the current cell

# Write the attribute for the cell just finished, if both values were found.
sub flush_cell {
    if (defined $power{fall} && defined $power{rise}) {
        my $total_power = $power{rise} + $power{fall};
        print OUTFILE "set_user_attribute [get_lib_cells ",
                      "$library_name/$cellname] total_power $total_power\n";
    }
    %power = ();
}

while (<INFILE>) {
    if (/^\s*cell\s*\(([^\)]+)\)/) {           # start of a new cell
        flush_cell();
        ($cellname, $in_clock_pin, $table) = ($1, 0, '');
        next;
    }
    next unless $cellname =~ /DFF/;            # registers only
    if (/^\s*pin\s*\(\s*(\w+)/) {              # entering a pin group
        $in_clock_pin = ($1 =~ /^clock/) ? 1 : 0;   # clock pin of register
        $table = '';
        next;
    }
    next unless $in_clock_pin;
    # Use the internal_power group without a when: clause. It is assumed to
    # be the last one on the pin, so later tables overwrite earlier ones.
    if (/^\s*(fall|rise)_power\s*\(/) {
        ($table, $row) = ($1, 0);
        next;
    }
    # Pick the 3rd values line and the 4th entry on that line.
    if ($table && /"([^"]+)"/ && ++$row == 3) {
        $power{$table} = (split /\s*,\s*/, $1)[3];
        $table = '';
    }
}
flush_cell();
close(INFILE);
close(OUTFILE);
11.2 Clock Analyzer Tcl script
###################################################################
# Set up modes below.
# Mode 1 = _rpt_summary
# Mode 2 = _rpt_summary_level
# Mode 3 = _plot_power
# Mode 4 = _plot_tree
#
# For any of the above modes, you can have the script stop
# at power-compiler-inserted CGC cells, assuming they are gating
# the clock. Set the variable "_stop_at_pc_cgc" to 1 for this.
###################################################################
set _rpt_summary 0
set _rpt_summary_level 0
set _plot_power 0
set _plot_tree 0
set _stop_at_pc_cgc 0
###################################################################
# For each mode, give the report directory or file names here where
# to write files.
###################################################################
# For mode 1
set _summary_dir "summary_results"
if {$_rpt_summary == 1 && ![file exists $_summary_dir]} {
    file mkdir $_summary_dir
}
# For mode 2
set _summary_level_dir "summary_level_results"
if {$_rpt_summary_level == 1 && ![file exists $_summary_level_dir]} {
    file mkdir $_summary_level_dir
}
# For mode 3
set _power_dir "power_results"
if {$_plot_power == 1 && ![file exists $_power_dir]} {
    file mkdir $_power_dir
}
# For mode 4
set _plot_tree_file "test.gnuplot"
set _plot_tree_cmd_file "test.cmd.gnuplot"
if {$_plot_tree == 1} {
set PLOT_TREE_FILE [open $_plot_tree_file w+]
set PLOT_TREE_CMD_FILE [open $_plot_tree_cmd_file w+]
}
###################################################################
# This filter sets up all the valid cells on the clock tree
# or sitting at the leaf of a tree. This should include
# any clock tree buffers, inverters that are allowed, delay cells,
# gating cells, memories, regs, etc.
###################################################################
set _filter_valid_ct_cells "@ref_name =~ *BUF* ||
ref_name =~ *DFF* ||
ref_name =~ *CGC* ||
ref_name =~ *RAM*"
###################################################################
# Need to know valid leaf cell names so know when to stop
# tracing through cells.
###################################################################
set _filter_valid_leaf_cells "@ref_name =~ *DFF* ||
ref_name =~ *RAM*"
###################################################################
# Need to know CGC library cell name that Power Compiler will
# use to insert clock gating cells.
###################################################################
set _filter_cgc_name "*CGC*"
###################################################################
# Need to know valid register library cell names.
###################################################################
set _filter_reg_name "*DFF*"
###################################################################
# Need to know valid clock buffer library cell names.
###################################################################
set _filter_cbuf_name "*CBUF*"
###################################################################
# Need to know valid memory library cell names.
###################################################################
set _filter_mem_name "*RAM*"
###################################################################
# Need to know valid misc library cell names that could show up.
###################################################################
set _filter_misc_name "*INV*"
###################################################################
# Write out header lines to .csv files if required to do so.
###################################################################
if {$_rpt_summary == 1} {
    set CURRENT_SUM [open $_summary_dir/current_sum.csv w+]
    puts $CURRENT_SUM "Clock,Simple Clock Name,Pin,Wire,CGC,CBUF,Misc,Reg,Mem,Total,Last Stage Power,Freq,mA/MHz,Num Regs"
}
if {$_rpt_summary_level == 1} {
    set CURRENT_LEVEL [open $_summary_level_dir/current_level.csv w+]
    puts $CURRENT_LEVEL "Clock,Simple Clock Name,Level,Pin,Wire,CGC,CBUF,Misc,Reg,Mem,Total,Running Total,Freq,mA/MHz,Num Cells,Num Leaf Cells"
}
#####################################################
# This is only needed if you want to do gnuplot
# plotting (power and/or clock).
#####################################################
set _die_size_x 2000
set _die_size_y 2000
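#####################################################
# This is only needed if you want to do gnuplot
# clock plotting.
# Then do:
# (execute gnuplot 4.0)
# load "$_plot_tree_cmd_file"
#####################################################
# Hard-macro outline data drawn under the tree. The file name is
# assumed here from the power-plotting comments; adjust as needed.
set _hm_ref_graph_file "pc_hm_boxes.graph"
if {$_plot_tree == 1} {
    puts $PLOT_TREE_FILE ""
    puts $PLOT_TREE_CMD_FILE "set multiplot"
    puts $PLOT_TREE_CMD_FILE "set key off"
    puts $PLOT_TREE_CMD_FILE "set style line 1"
    puts $PLOT_TREE_CMD_FILE "set style line 6"
    # Brackets are escaped so that Tcl does not treat them as command substitution.
    puts $PLOT_TREE_CMD_FILE "plot \[0:$_die_size_x\]\[0:$_die_size_y\] '$_hm_ref_graph_file' with lines ls 6"
    puts $PLOT_TREE_CMD_FILE "plot \[0:$_die_size_x\]\[0:$_die_size_y\] '$_plot_tree_file' with lines ls 1"
}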
#####################################################
# This is only needed if you want to do gnuplot
# power plotting
#
# Then do:
# (execute gnuplot 4.0)
# load "$_power_dir/gnuplot.script"
# plot [0:$_die_size_x][0:$_die_size_y] 'pc_hm_boxes.graph' with lines ls 6
#####################################################
#
# Specify how many bins in X and Y direction for power plotting
# Should be nice even number from chip dimensions.
#
set _num_x_bins 20
set _num_y_bins 20
#
# Set up power bins initialized to 0
#
if {$_plot_power == 1} {
for {set i 0} {$i<$_num_x_bins} {incr i} {
for {set j 0} {$j<$_num_y_bins} {incr j} {
set _current_bin($i,$j) 0
}
}
}
#####################################################
# Initialize variables that should span across
# all calls to trace_clock_tree procedure. These
#####################################################
# Finally, the real procedure call gets defined.
#####################################################
#########################################################
# Set up global variables to access here.
#########################################################
global _vdd
global _plot_tree
global PLOT_TREE_FILE
global PLOT_TREE_CMD_FILE
global _last_stage_power
global _plot_power
global _stop_at_pc_cgc
global _filter_valid_ct_cells
global _filter_valid_leaf_cells
global _max_level
global _num_cells_at_level
global _num_regs
global _total_current_at_level
global _total_wire_current_at_level
global _total_mem_current_at_level
global _total_pin_current_at_level
global _total_cgc_current_at_level
global _total_cbuf_current_at_level
global _total_misc_current_at_level
global _total_reg_current_at_level
global _total_leafs_at_level
global _top_total_current_due_to_pins
global _top_total_current_due_to_wires
global _top_total_current_due_to_cgcs
global _top_total_current_due_to_cbufs
global _top_total_current_due_to_misc
global _top_total_current_due_to_regs
global _top_total_current_due_to_mems
global _top_total_current_all
global CURRENT_SUM
global CURRENT_LEVEL
global _summary_level_dir
global _rpt_summary
global _rpt_summary_level
global _last_stage_power_for_clock
global _filter_cgc_name
global _filter_reg_name
global _filter_cbuf_name
global _filter_mem_name
global _filter_misc_name
#########################################################
# Store the original level we were at coming into
# this routine. Increment it for use inside this
# routine.
#########################################################
set orig_mylevel $_mylevel
incr _mylevel;
#########################################################
# If this is the first time we are being called
# for a clock, let's initialize a bunch of variables.
#########################################################
# Keep track of max clock tree depth.
#########################################################
if {$_mylevel > $_max_level} {
set _max_level $_mylevel
}
#########################################################
# Let's initialize/set some arrayed variables
# if they don't already exist.
#########################################################
foreach _arr {
    _num_cells_at_level
    _total_current_at_level
    _total_wire_current_at_level
    _total_pin_current_at_level
    _total_cgc_current_at_level
    _total_reg_current_at_level
    _total_mem_current_at_level
    _total_cbuf_current_at_level
    _total_misc_current_at_level
    _total_leafs_at_level
} {
    # Start each per-level counter at 0 the first time this level is seen.
    if {![info exists ${_arr}($_mylevel)]} {
        set ${_arr}($_mylevel) 0
    }
}
#########################################################
# Ok, let's begin.
#
# Get the cell and its X,Y coordinates for plotting,
# if needed later.
#########################################################
set orig_buf_cell [get_cells -of_objects $_myclock]
set orig_buf_loc_x [get_attribute $orig_buf_cell x_coordinate_max]
set orig_buf_loc_y [get_attribute $orig_buf_cell y_coordinate_max]
#########################################################
# Get immediate fanouts of this cell, filtered for only
# valid clock tree/leaf cell types. Remember to
# remove the original cell we started with.
#########################################################
set myclock_bufs [remove_from_collection \
[filter [all_fanout -flat -only_cells -levels 1 -from $_myclock] \
$_filter_valid_ct_cells] \
[get_cells -of_objects $_myclock] ]
#########################################################
# Get the output net driven by this pin and gather its capacitance info
#
#########################################################
# Store the current value due to this net into
# variables for later use. Use divide ratio as needed
# to get from cap units to desired current units to report.
#########################################################
set _total_current_at_level($_mylevel) [expr \
$_total_current_at_level($_mylevel) + \
($tot_cap * $_vdd * $_freq / 1000000)]
#########################################################
# For debugging purposes, it may be useful to know
# what cells were rejected due to our filters above.
# Uncomment this below if you want to know that.
#########################################################
# list rejected ones..
#set rejected [remove_from_collection [remove_from_collection \
# [all_fanout -flat -only_cells -levels 1 -from $_myclock] $myclock_bufs] \
# [get_cells -of_objects $_myclock]]
#foreach_in_collection rejected_inst $rejected {
# set rejected_name [get_attribute $rejected_inst full_name]
# echo "rejected: $rejected_name"
#}
#########################################################
# Ok, let's go through all the fanouts from the startpoint
# at this level. We need to recursively traverse each
# one (if not a leaf). We also need to start gathering
# all sorts of stats.
#########################################################
#########################################################
# We need this found_leaf variable to help us calculate
# the power at the last stage of the clock tree. We don't
# want to count the last buffer and net more than once!
#########################################################
set found_leaf 0
foreach_in_collection mybuf $myclock_bufs {
incr _num_cells_at_level($_mylevel)
set cell_name [get_attribute $mybuf full_name]
set ref_name [get_attribute $mybuf ref_name]
set lib_cell_name [get_attribute [get_lib_cells -of_objects $mybuf] full_name]
if {$_plot_tree == 1} {
puts $PLOT_TREE_FILE "$orig_buf_loc_x $orig_buf_loc_y"
puts $PLOT_TREE_FILE "$new_buf_loc_x $new_buf_loc_y"
puts $PLOT_TREE_FILE ""
}
###############################################################################
# If we stop at PC-CGCs and we're at a PC-CGC, we need to include the
# average power of the clk pin (not the power through the cell, which is
# what's in "total_power").
###############################################################################
# Just in case we didn't set the user attribute for this cell type here
# let's use 0 so our computations below don't die.
if {$pwr == ""} {
echo "Warning: $cell_name ($ref_name) has no total_power attribute set on it"
set pwr 0
}
set _total_current_at_level($_mylevel) \
[expr $_total_current_at_level($_mylevel) + ($pwr * $_freq / $_vdd / 1000000)]
if {$_plot_power == 1} {
store_current_in_bin $new_buf_loc_x $new_buf_loc_y \
[expr ($pwr * $_freq / $_vdd / 1000000)]
}
###################################################################################
# If we are at a PC-CGC and we need to stop here, record it (else part of this
# if statement) and don't trace beyond it.
#
# If we are at a leaf cell, record it (else part of this if statement) and
# don't trace beyond it.
#
# Otherwise, continue down tree.
###################################################################################
if {([sizeof [filter [get_cells $cell_name] $_filter_valid_leaf_cells]] == 0) &&
!([string match $_filter_cgc_name $ref_name] && ($_stop_at_pc_cgc == 1))} {
###################################################################
# Trace down tree more!
###################################################################
trace_clock_tree $myoutput_pin $_myclockname $_mylevel $_freq
} else {
###################################################################
# We are at a leaf.
###################################################################
incr _total_leafs_at_level($_mylevel)
set _last_stage_power [expr $_last_stage_power + ($pwr * $_freq / $_vdd / 1000000)]
set _last_stage_power_for_clock [expr $_last_stage_power_for_clock + \
($pwr * $_freq / $_vdd / 1000000)]
###################################################################
# Store cap due to wire+pin and last leaf cell
if {$found_leaf == 0} {
set found_leaf 1
set current [expr $tot_cap * $_vdd * $_freq / 1000000]
# Add in net between buffer and this reg only once
set _last_stage_power [expr $_last_stage_power + \
($tot_cap * $_vdd * $_freq / 1000000)]
set _last_stage_power_for_clock [expr $_last_stage_power_for_clock + \
($tot_cap * $_vdd * $_freq / 1000000)]
}
}
}
###############################################################
# If finishing up an entire clock (done with recursion)
# Let's write out stats for this clock
###############################################################
if {$orig_mylevel == 0} {
set _total_current_due_to_pins 0
set _total_current_due_to_wires 0
set _total_current_due_to_cgcs 0
set _total_current_due_to_cbufs 0
set _total_current_due_to_misc 0
set _total_current_due_to_regs 0
set _total_current_due_to_mems 0
set _total_leafs 0
set _total_current_all 0
###############################################################
# Add up power parts for each level to totals for clock.
###############################################################
set _total_current_due_to_pins \
[expr $_total_current_due_to_pins + $_total_pin_current_at_level($i)]
set _total_current_due_to_wires \
[expr $_total_current_due_to_wires + $_total_wire_current_at_level($i)]
set _total_current_due_to_cgcs \
[expr $_total_current_due_to_cgcs + $_total_cgc_current_at_level($i)]
set _total_current_due_to_cbufs \
[expr $_total_current_due_to_cbufs + $_total_cbuf_current_at_level($i)]
set _total_current_due_to_regs \
[expr $_total_current_due_to_regs + $_total_reg_current_at_level($i)]
set _total_current_due_to_misc \
[expr $_total_current_due_to_misc + $_total_misc_current_at_level($i)]
set _total_current_due_to_mems \
[expr $_total_current_due_to_mems + $_total_mem_current_at_level($i)]
set _total_current_all [expr $_total_current_all + $_total_current_at_level($i)]
###############################################################
# Also, add power parts to total for entire chip
###############################################################
set _top_total_current_due_to_pins \
[expr $_top_total_current_due_to_pins + $_total_pin_current_at_level($i)]
set _top_total_current_due_to_wires \
[expr $_top_total_current_due_to_wires + $_total_wire_current_at_level($i)]
set _top_total_current_due_to_cgcs \
[expr $_top_total_current_due_to_cgcs + $_total_cgc_current_at_level($i)]
set _top_total_current_due_to_cbufs \
[expr $_top_total_current_due_to_cbufs + $_total_cbuf_current_at_level($i)]
set _top_total_current_due_to_regs \
[expr $_top_total_current_due_to_regs + $_total_reg_current_at_level($i)]
set _top_total_current_due_to_misc \
[expr $_top_total_current_due_to_misc + $_total_misc_current_at_level($i)]
set _top_total_current_due_to_mems \
[expr $_top_total_current_due_to_mems + $_total_mem_current_at_level($i)]
set _top_total_current_all [expr $_top_total_current_all + \
$_total_current_at_level($i)]
if {$_rpt_summary_level == 1} {
set ma_mhz [expr $_total_current_at_level($i) / $_freq]
}
set ma_mhz [expr $_total_current_all / $_freq]
if {$_rpt_summary == 1} {
puts $CURRENT_SUM [join [list $_myclock $_myclockname \
$_total_current_due_to_pins $_total_current_due_to_wires \
$_total_current_due_to_cgcs $_total_current_due_to_cbufs \
$_total_current_due_to_misc $_total_current_due_to_regs \
$_total_current_due_to_mems $_total_current_all \
$_last_stage_power_for_clock $_freq $ma_mhz $_num_regs] ,]
}
###############################################################
# Let's reset these back to 0 so next run doesn't
# have to set them to 0 again.
###############################################################
for {set i 0} {$i<=$_max_level} {incr i} {
set _num_cells_at_level($i) 0
set _total_current_at_level($i) 0
set _total_pin_current_at_level($i) 0
set _total_wire_current_at_level($i) 0
set _total_cgc_current_at_level($i) 0
set _total_cbuf_current_at_level($i) 0
set _total_misc_current_at_level($i) 0
set _total_reg_current_at_level($i) 0
set _total_leafs_at_level($i) 0
set _total_mem_current_at_level($i) 0
}
###############################################################
# Close any open files
###############################################################
if {$_rpt_summary == 1} {
close $CURRENT_SUM
}
if {$_rpt_summary_level == 1} {
close $CURRENT_LEVEL
}
if {$_plot_tree == 1} {
close $PLOT_TREE_FILE
close $PLOT_TREE_CMD_FILE
}
}
}
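###############################################################
# Illustrative sketch only (not part of the original script):
# the level bookkeeping above, reduced to a toy tree so it can
# be tried in a plain Tcl shell. The proc name toy_count_levels
# and both array names are hypothetical. "children" maps a node
# name to the list of nodes it drives; "counts" receives the
# number of cells found at each level, in the same spirit as
# _num_cells_at_level.
###############################################################
proc toy_count_levels {childrenVar node level countsVar} {
    upvar 1 $childrenVar children $countsVar counts
    # Like trace_clock_tree, bump the level on entry.
    incr level
    # A node with no entry drives nothing, so treat it as a leaf.
    if {![info exists children($node)]} {
        return
    }
    foreach child $children($node) {
        if {![info exists counts($level)]} {
            set counts($level) 0
        }
        incr counts($level)
        toy_count_levels children $child $level counts
    }
}
# Example use:
# array set kids {root {b1 b2} b1 {r1 r2 r3} b2 {r4}}
# toy_count_levels kids root 0 n
# After the call, n(1) is 2 (the buffers) and n(2) is 4 (the registers).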
###############################################################
# Put calls to clock routine here.
# trace_clock_tree <startpoint> <a clock name> 0 <frequency (MHz)>
#
# The <a clock name> can be any text you want. This shows
# up in a column in the .csv file and is supposed to be shorter
# than the startpoint for easier identification in Excel.
# The "0" is the level. We always start the recursive calls
# to trace_clock_tree with 0 indicating the start of the tree.
###############################################################
trace_clock_tree clock_block/myclock myclockname 0 50
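###############################################################
# Illustrative sketch only (not part of the original script):
# the cap-to-current conversion used inside trace_clock_tree,
# written as a stand-alone helper. The proc name
# cap_to_current_ma is hypothetical, and the units are an
# assumption: with capacitance in fF, voltage in V and
# frequency in MHz, C * V * f comes out in nA, so dividing by
# 1000000 gives mA. Check the capacitance units of your own
# library before relying on this.
###############################################################
proc cap_to_current_ma {cap_ff vdd freq_mhz} {
    return [expr {$cap_ff * $vdd * $freq_mhz / 1000000.0}]
}
# Example use, assuming a 2000 fF net at 1.2 V on a 50 MHz clock:
# cap_to_current_ma 2000 1.2 50
# gives about 0.12 (mA).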
###############################################################
# Now, let's collect the entire chip stats and
# write them out
###############################################################
set percentage [expr $_last_stage_power / $_top_total_current_all * 100]
if {$_rpt_summary == 1} {
set SUMMARY_FILE [open $_summary_dir/summary w+]
puts $SUMMARY_FILE "Total pin current: $_top_total_current_due_to_pins"
puts $SUMMARY_FILE "Total wire current: $_top_total_current_due_to_wires"
###############################################################
# For Plotting power, there's lots of stuff to write out..
###############################################################
if {$_plot_power == 1} {
set _max_current_in_bin 0
set _min_current_in_bin 9999999
for {set i 0} {$i<$_num_x_bins} {incr i} {
for {set j 0} {$j<$_num_y_bins} {incr j} {
if {$_current_bin($i,$j) > $_max_current_in_bin} {
set _max_current_in_bin $_current_bin($i,$j)
}
if {$_current_bin($i,$j) < $_min_current_in_bin} {
set _min_current_in_bin $_current_bin($i,$j)
}
}
}
#
# We have only a certain # of colors to choose from for bins
# So figure out levels..
#
set num_bin_colors 6
set bin_current_width [expr ($_max_current_in_bin - $_min_current_in_bin) / $num_bin_colors]
close $GNUPLOT_POWER_SCRIPT
close $SUMMARY_COLORS
}
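###############################################################
# Illustrative sketch only (not part of the original script):
# one way to turn a bin's current into a color index from the
# num_bin_colors and bin_current_width values computed above.
# The proc name bin_color_index is hypothetical. Index 0 is the
# lowest-current color; the bin holding the maximum current is
# clamped into the last color.
###############################################################
proc bin_color_index {current min_current width num_colors} {
    # All bins carry the same current: use a single color.
    if {$width == 0} {
        return 0
    }
    set idx [expr {int(($current - $min_current) / double($width))}]
    if {$idx >= $num_colors} {
        set idx [expr {$num_colors - 1}]
    }
    if {$idx < 0} {
        set idx 0
    }
    return $idx
}
# Example use, assuming min 0, max 6 and 6 colors (so a width of 1):
# bin_color_index 2.5 0 1 6
# bin_color_index 6 0 1 6
# The first call lands in color 2; the second is clamped to color 5.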