Mohan Ty 2004
6, JUNE 2004
Abstract—In battery driven portable applications, the minimization of energy, average power, peak power, and peak power differential are equally important to improve reliability and efficiency. The peak power and the peak power differential drive the transient characteristics of a CMOS circuit. In this paper, we propose a framework for the simultaneous reduction of energy and transient power during behavioral synthesis. A new metric called “cycle power function” (CPF) is defined which captures the transient power characteristics as an equally weighted sum of the normalized mean cycle power and the normalized mean cycle differential power. Minimizing CPF using multiple supply voltages and dynamic frequency clocking under resource constraints results in the reduction of both energy and transient power. Based on the above, we develop a new datapath scheduling algorithm called CPF-scheduler which attempts power and energy minimization by minimizing the CPF parameter during the scheduling process. The type and number of functional units available become the set of resource constraints for the scheduler. Experimental results indicate that the proposed scheduler achieves significant reductions in terms of power and energy.

Index Terms—Average power, dynamic frequency clocking, low-power datapath scheduling, multiple supply voltages, peak power, peak power differential, power fluctuation.

I. INTRODUCTION

The three sources of power dissipation in a CMOS digital circuit are dynamic power, short-circuit power, and static power, as summarized in (1) below [1], [6]

$$P_{total} = P_{dynamic} + P_{short\text{-}circuit} + P_{static} \tag{1}$$

Here the dynamic power depends on the switching activity, the total capacitance seen at the gate output, the supply voltage, and the operating frequency; the short-circuit power depends on the time during which the short circuit occurs and on the short-circuit current; and the static power depends on the leakage current. In [3], it is pointed out that there is an increase in both dynamic and static power in deep submicron and nanometer domains. It is well known that: (i) by reducing the supply voltage, both power and energy can be saved at the cost of delay; (ii) slowing down the circuit by reducing the clock frequency saves power but not energy; and (iii) varying frequency as well as voltage in a coordinated manner can save both energy and power while maintaining performance at acceptable levels [6]–[8].

In this paper, we use the concepts of multiple voltages and dynamic clocking [9]–[11] to achieve simultaneous minimization of energy and transient power during behavioral synthesis. A new framework is developed for simultaneous minimization of
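Observations (i) to (iii) in the introduction can be checked numerically with the dynamic term of the power model alone. The sketch below assumes the standard dynamic power expression (switching activity times capacitance times supply voltage squared times frequency) and a workload of a fixed number of clock cycles. The function names and the values of `ALPHA`, `CAP`, and `CYCLES` are illustrative assumptions, not figures from the paper, and the delay increase caused by a lower supply voltage is not modeled.

```python
import math

def dynamic_power(alpha, cap, vdd, freq):
    """Dynamic power alpha * C * Vdd^2 * f, in watts."""
    return alpha * cap * vdd ** 2 * freq

def dynamic_energy(alpha, cap, vdd, freq, cycles):
    """Energy for a fixed number of clock cycles: power times run time."""
    return dynamic_power(alpha, cap, vdd, freq) * (cycles / freq)

# Illustrative values only (assumed, not taken from the paper).
ALPHA, CAP, CYCLES = 0.5, 1e-12, 1000

base = dict(vdd=3.3, freq=18e6)         # reference operating point
slow_clock = dict(vdd=3.3, freq=9e6)    # halve the clock frequency only
low_voltage = dict(vdd=2.4, freq=18e6)  # lower the supply voltage only

for name, cfg in [("base", base), ("slow clock", slow_clock),
                  ("low voltage", low_voltage)]:
    p = dynamic_power(ALPHA, CAP, **cfg)
    e = dynamic_energy(ALPHA, CAP, cycles=CYCLES, **cfg)
    print(f"{name:12s} power={p:.3e} W  energy={e:.3e} J")
```

Halving the frequency halves the power but leaves the energy for the same number of cycles unchanged, while lowering the voltage reduces both. This is why the two techniques are combined in the framework.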
with multiple supply voltages is given in [18], which helps in reducing power using multiple supply voltages. The authors propose in [19] resource-constrained and time-constrained instruction scheduling algorithms for low-power pipelined functional units. In [20], resource and latency constrained list-based scheduling algorithms with multiple supply voltages are discussed. Scheduling algorithms with resource and time constraints based on the Lagrange multiplier technique are investigated in [21]. The above scheduling techniques consider various concepts such as single-clock frequency, multiple supply voltages, voltage scaling, capacitance reduction, and switching activity reduction for minimizing either total energy or average power, but not both at the same time. Further, these works have not considered dynamic frequency clocking or transient power reduction.

Both the peak power and the peak power differential drive the transient power characteristics of a system. The earliest work on peak power reduction during simultaneous scheduling and assignment is reported in [22], in which power minimization achieved in one level (with SPICE) is used to optimize at behavioral level using genetic algorithms. In [23], ILP and force directed scheduling methods are explored for minimizing peak power under latency constraints. The formulations consider multicycling, pipelining and single supply voltage. ILP based models to minimize peak power and peak area have been proposed in [24] for latency-constrained scheduling. The authors also introduce resource binding to minimize the amount of switching at the input of functional units. In [25], the authors describe a time constrained ILP scheduling algorithm for real time systems that minimizes both peak power and number of resources. In [26], the use of data monitor operations for simultaneous peak power and peak power differential reduction is addressed. The above works address only peak power issues and do not include energy minimization and only attempt to minimize any one of the four parameters. In [27], the authors introduce the use of “telescopic” units to improve the throughput. The telescopic units allow variation in the number of clock cycles for execution depending on the input data.

A SIMD linear array image processor design is discussed in [11] in which the modules of a circuit can be operated in different frequencies to improve the system performance. The concept of dynamic frequency clocking is introduced in [11]. A low-power design using multiple clocking scheme is presented in [28]. In that scheme, the circuit is partitioned into disjoint modules, with each module operating at a frequency derived from the overall effective frequency, indicating power savings of up to 50% compared to using a single frequency. The use of frequency scaling in an MPEG2 decoder design is described in [10]. In this system, the clock speed is increased if the load is high and the clock frequency is decreased if the load is small. A time constrained heuristic scheduling algorithm is discussed in [29] that uses both frequency and voltage scaling. Energy savings in the range of 33%–75% are reported, but power savings are not mentioned. Several system-level approaches [7], [8] have been investigated toward reducing power consumption in both general purpose and special purpose processors with the help of simultaneous voltage and frequency scaling.

Most works we have discussed here address either average power or energy or peak power, but do not address all of the power parameters (average power, energy, peak power, and peak power differential) together. In this paper, we describe a framework for simultaneous minimization of total energy, average power, peak power, and peak power differential. A new parameter called cycle power function (CPF) is defined which is an equally weighted sum of normalized mean cycle power and normalized mean cycle differential power. Minimizing this parameter using multiple supply voltages (MV) and dynamic frequency clocking (DFC) results in the reduction of both energy and transient power. We investigate two different models for defining CPF. The cycle differential power is defined as the absolute deviation of the cycle power from the average power for any given cycle in the first model, whereas it is defined as the cycle-to-cycle power gradient in the second. Further, the CPF models take into consideration the switching activity of the different functional units. A datapath scheduling algorithm (called CPF-Scheduler) is proposed which attempts to minimize the CPF while keeping the time penalty at a minimum and using the concepts of dynamic frequency clocking and multiple supply voltages. The algorithm assumes different types and numbers of resources (such as multipliers and ALUs) operating at different voltages and frequencies as resource constraints. The CPF-scheduling algorithm generates a parameter called the cycle frequency index for each control step, which serves as the clock dividing factor for the dynamic clocking unit (DCU) that generates the different clock frequencies on the fly.

III. CPF

In this section, we introduce the different notations and terminology required for defining the CPF. The notations and terminology needed for the proposed models are given in Table I. The datapath is represented as a sequencing data flow graph (DFG).

The CPF is defined to consist of two main components: the normalized mean cycle power and the normalized mean cycle difference power. The normalized mean cycle power $P_{norm}$ is the mean cycle power $P$ normalized with respect to the peak power consumption $P_{peak}$ of the DFG. The normalized mean cycle difference power $DP_{norm}$ is the mean cycle difference power $DP$ normalized with respect to the peak power differential $DP_{peak}$ of the DFG. The second component varies between the two models. The mean difference power is the mean of the cycle difference power $DP_c$ over the $N$ control steps. In model 1, the cycle difference power $DP_c$ is defined as the absolute deviation of the cycle power $P_c$ from the mean cycle power $P$. Then, the mean cycle difference power $DP$ is the mean deviation of the cycle power from the mean cycle power. On the other hand, in model 2, the cycle difference power $DP_c$ of a current cycle is modeled as the cycle-to-cycle power gradient. In other words, the cycle difference power of a current control step $c$ is the difference (or gradient) of the current cycle power and the previous cycle power. This can be expressed mathematically as $DP_c = |P_c - P_{c-1}|$. In this case, the mean cycle difference power $DP$ is the mean difference (or the gradient).
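The two definitions of the cycle difference power can be compared on a small power profile. The sketch below is a minimal illustration, not the authors' implementation: the function name `cycle_difference_power` and the sample profile are hypothetical, and for model 2 the first control step is assumed to have zero difference power because it has no previous cycle (the paper's treatment of that boundary is not visible in this section).

```python
def cycle_difference_power(cycle_powers, model=1):
    """Per-cycle difference power for a list of cycle powers.

    model=1: absolute deviation of each cycle power from the mean cycle power.
    model=2: absolute cycle-to-cycle gradient; the first step is assumed 0.0.
    """
    if model == 1:
        mean_power = sum(cycle_powers) / len(cycle_powers)
        return [abs(mean_power - p) for p in cycle_powers]
    if model == 2:
        gradients = [abs(cur - prev)
                     for prev, cur in zip(cycle_powers, cycle_powers[1:])]
        return [0.0] + gradients
    raise ValueError("model must be 1 or 2")

# Hypothetical cycle powers (arbitrary units) over four control steps.
profile = [4.0, 8.0, 6.0, 2.0]
print(cycle_difference_power(profile, model=1))  # deviation from the mean (5.0)
print(cycle_difference_power(profile, model=2))  # step-to-step change
```

Model 1 measures how far each cycle is from the average level, whereas model 2 measures how abruptly the power changes between consecutive cycles.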
564 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 6, JUNE 2004
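$$P_{norm} = \frac{P}{P_{peak}} = \frac{\frac{1}{N}\sum_{c=1}^{N} P_c}{\max_{c} P_c} \tag{5}$$

Thus, the normalized mean cycle power $P_{norm}$ is a unitless quantity in the range $[0, 1]$.

The cycle difference power $DP_c$ for any control step $c$ can be defined as follows. This is the absolute deviation of the cycle power from the mean cycle power consumption of the DFG. This is a measure of the cycle power fluctuation of the DFG

$$DP_c = |P - P_c| \tag{8}$$

$$DP_{norm} = \frac{DP}{DP_{peak}} = \frac{\frac{1}{N}\sum_{c=1}^{N} DP_c}{\max_{c} DP_c} \tag{9}$$

The cycle power function CPF, which is modeled as the equally weighted sum of the normalized mean cycle power $P_{norm}$ and the normalized mean cycle difference power $DP_{norm}$, is given below

$$CPF = P_{norm} + DP_{norm} \tag{10}$$

Thus, the CPF will have a value in the range $[0, 2]$. The CPF can be impacted by various constraints, including the resource constraints. In terms of peak cycle power $P_{peak}$ and peak cycle difference power $DP_{peak}$, the CPF can be expressed as

$$CPF = \frac{P}{P_{peak}} + \frac{DP}{DP_{peak}} \tag{11}$$

Using (5) and (9), the CPF can be written as in (12).

$$CPF = \frac{\frac{1}{N}\sum_{c=1}^{N} P_c}{\max_{c} P_c} + \frac{\frac{1}{N}\sum_{c=1}^{N} |P - P_c|}{\max_{c} |P - P_c|} \tag{12}$$

The peak differential power is characterized by

$$DP_{peak} = \max_{c} DP_c \tag{14}$$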
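The CPF described above, as the equally weighted sum of the normalized mean cycle power and the normalized mean cycle difference power, can be computed directly from a per-cycle power profile. The sketch below uses the model 1 definition (absolute deviation from the mean) and assumes unit weights on both terms. The function name `cpf_model1`, the sample profiles, and the convention of returning a zero difference term when the profile is perfectly flat are assumptions made for illustration.

```python
def cpf_model1(cycle_powers):
    """CPF as normalized mean cycle power plus normalized mean difference power."""
    n = len(cycle_powers)
    mean_power = sum(cycle_powers) / n
    peak_power = max(cycle_powers)

    # Model 1: cycle difference power is the deviation from the mean cycle power.
    diffs = [abs(mean_power - p) for p in cycle_powers]
    mean_diff = sum(diffs) / n
    peak_diff = max(diffs)

    norm_power = mean_power / peak_power
    # Assumed convention: a flat profile has no fluctuation term.
    norm_diff = mean_diff / peak_diff if peak_diff > 0 else 0.0
    return norm_power + norm_diff

flat = [5.0, 5.0, 5.0, 5.0]     # hypothetical profile with no fluctuation
varying = [4.0, 8.0, 6.0, 2.0]  # hypothetical fluctuating profile
print(cpf_model1(flat))
print(cpf_model1(varying))
```

Because both terms are ratios, lowering a peak can raise the CPF even when the mean also drops. This is one reason the minimization has to be carried out under constraints rather than term by term.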
and datapath scheduler. Using the dynamic energy model proposed in [15], we can express the effective switching capacitance of our proposed model as

(17)

Here, the parameters correspond to a given functional unit. The expression is a measure of the effective switching capacitance of a resource (functional unit), which is a function of the average switching activity values on the first and second input operands of that resource. It should be noted that the above switching model [in (17)] handles input pattern dependencies. The CPF model can be easily modified for different modes of operation of the datapath circuit: (i) single supply voltage and single frequency; (ii) multiple supply voltages and single frequency; (iii) multiple supply voltages and dynamic frequency; and (iv) multiple supply voltages and multicycling. For example, for the single supply voltage and single frequency scheme, the voltage and the frequency are the same throughout the schedule; for multiple supply voltages and multicycling, the frequency is the same throughout. Using (17) we rewrite (12) as shown in (18). Using (17), we can derive a similar expression for (16). A quantity indexed by $i$ and $c$ refers to the functional unit $i$ active in control step $c$.

(18)

A look-up table is constructed to store the effective switching capacitance values for different combinations of the two input switching activities, for different types of functional units, such as multipliers and ALUs. We use an interpolation technique to determine the values for combinations that are not available in the look-up table. The size of the look-up table impacts the accuracy of the results: the larger the table, the better the accuracy.

Minimization of CPF: CPF is used as the objective function for low-power datapath scheduling. From the above equations, we make the following observations about the CPF. The CPF is a nonlinear function. It is a function of four parameters: average power ($P$), peak power ($P_{peak}$), average difference power ($DP$), and peak difference power ($DP_{peak}$). Each of the above power parameters is dependent on switching activity, capacitance, operating voltage, and operating frequency. The absolute function in the numerator [of (12) or (16)] contributes to the nonlinearity. The complex behavior of the function is also contributed by the denominator parameters, the peak power and the peak difference power. A fractional function can be minimized by decreasing the numerator or by increasing the denominator. But we are aiming at minimizing both the numerator and the denominator of the above fractional form objective function. So, we need to use a specific approach to minimize CPF, such that both the numerator and denominator are minimized. We have to put constraints on the denominators of the function and minimize the overall function. The constraints on the denominators can be imposed through the use of resource constraints. Thus, we conclude that the minimization of CPF using multiple supply voltages, dynamic frequency clocking, and multicycling under resource constraints will lead to the reduction of energy and power parameters.

IV. CPF-SCHEDULER ALGORITHM

In this section, we develop a scheduling algorithm that minimizes the objective functions using multiple voltages and dynamic clocking to reduce energy and power. We assume the availability of different functional units operating at different supply voltages. In dynamic frequency clocking or frequency scaling, all the units are clocked by a single clock line which can switch frequencies at runtime [9]–[11]. In such systems, a dynamic clocking unit (DCU) generates different clocks using a clock dividing strategy. It should be noted that frequency scaling helps in reducing power, but not energy. Moreover, the frequency reduction facilitates the operation of the different functional units at different voltages, which in turn helps in energy reduction.

The target architecture model assumed for the scheduling is from [13]. Each functional unit is associated with a register and a multiplexor. The register and the multiplexor will operate at the same voltage level as that of the functional units. Level converters are used when a low-voltage functional unit is driving a high-voltage functional unit [13], [20]. A controller decides which of the functional units are active in each control step, and those that are not active are disabled using the multiplexors. The controller will have a storage unit to store the cycle frequency index values obtained from the scheduling, used as the clock dividing factor for the dynamic clocking unit. The cycle frequency is generated dynamically and a corresponding functional unit is activated.

The delay for a control step is dependent on the delays of the functional units, multiplexor, register, and level converters as expressed in the following:

$$d_c = d_{FU} + d_{Mux} + d_{Reg} + d_{Conv} \tag{19}$$

where $d_c$ is the delay of control step $c$, $d_{FU}$ is the delay of the slowest FU in the control step, and the register delays include the setup and propagation delays. Using the above delay model, the worst case delays of the library components are estimated. For a given base frequency, maximum frequencies of
MOHANTY AND RANGANATHAN: A FRAMEWORK FOR ENERGY AND TRANSIENT POWER REDUCTION DURING BEHAVIORAL SYNTHESIS 567
each FU are scaled down to operating frequencies. These parameters are determined as follows:

(20)

where the minimum of the control step delays and the number of allowable frequencies are the parameters used. The value of the cycle frequency index is chosen in such a way that the resulting clock period is the closest value greater than or equal to the delay of the control step.

The inputs to the algorithm are an unscheduled data flow graph (UDFG), the resource constraints, the number of allowable voltage levels, the number of allowable frequencies, and the delay of each resource, multiplexor, and register at different voltage levels. The delays of level converters are represented in the form of a matrix that shows the delay for converting one voltage level to another voltage level (both taken from the allowable voltage levels). The resource constraint includes the number of ALUs and multipliers at different voltage levels. The scheduling algorithm determines the proper time stamp and voltage level for each operation such that the CPF as well as the time penalty is minimum. To reduce the time penalty, the less energy consuming resources are used at as high a frequency as possible.

The CPF-Scheduler: The flow of the proposed algorithm is outlined in Fig. 1. In Step 1, the switching activities at the inputs of each node of the DFG are determined. For this purpose, different sets of application specific input vectors (having different correlations) are given at the primary inputs of the DFG and the average switching activity at each node is calculated. In Step 2, the scheduler constructs a look-up table with effective switching capacitance and the average switching activity pair as described in (17). If the look-up table is large enough to contain the switching capacitance for all estimated average switching activities in Step 1, then the power model accuracy is the highest. The algorithm determines the as-soon-as-possible (ASAP) and the as-late-as-possible (ALAP) schedules for the UDFG in Step 3. The ASAP schedule is unconstrained and the ALAP schedule uses the number of clock steps found in the ASAP schedule as the latency constraint. In Step 4, the number of resources of each type and voltage level is determined. For example, if the resource constraint is 1 multiplier at 2.4 V, 2 multipliers at 3.3 V, 2 ALUs at 2.4 V, and 3 ALUs at 3.3 V, then the relaxed voltage initial resource constraint is found to be 3 multipliers and 5 ALUs. In Step 5, the scheduler uses the above relaxed voltage resource constraints and modifies the ASAP and ALAP schedules to take into account the resource constraints. This helps in restricting the mobility of vertices to a great extent and reducing the solution search space for the heuristic. Due to the resource constraints, the number of control steps of the modified ASAP and modified ALAP schedules may be different from that of the ASAP and ALAP schedules in Step 3. In Step 6, the scheduler fixes the total number of control steps of the schedule, which is the maximum of the control steps of the modified ASAP or modified ALAP in Step 5. In Step 7, the vertices are marked as having zero mobility or nonzero mobility. The zero mobility vertices are those having the same modified ASAP time stamp and modified ALAP time stamp, and nonzero mobility vertices are those having different modified ASAP and modified ALAP time stamps. On determining the vertices having zero mobility and vertices having nonzero mobility, the proper time stamp and operating voltage for mobile vertices, and operating voltages for nonmobile vertices, are found. Further, operating clock frequencies are established such that the CPF as well as the time penalty is minimum. The CPF-Scheduler uses a heuristic algorithm for the same. In Step 9, the scheduler determines the base frequency and the cycle frequency index using (20). In Step 10, the scheduler calculates the peak power, average power, peak power differential, and energy estimates of the scheduled DFG, and also the critical path delay.

The CPF-Scheduler Heuristic: Fig. 2 shows the heuristic algorithm used by the CPF-Scheduler. The inputs to the CPF-Scheduler heuristic are the modified ASAP time stamp of each vertex, the modified ALAP time stamp of each vertex, the resource constraints, the number of allowable voltage levels, and the number of allowable frequencies. The delays of each functional unit, multiplexor, and register at different voltage levels are also given as inputs. The delays for the level converters are represented in the form of a matrix. The heuristic has to find a time stamp (within the range bounded by the modified ASAP and modified ALAP time stamps) and an operating voltage for each vertex.
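The ASAP and ALAP schedules of Step 3 and the mobility marking of Step 7 can be illustrated on a toy DFG. The sketch below is a simplification under stated assumptions: every operation takes one control step, no resource constraints are applied (so it corresponds to the unconstrained schedules of Step 3, not the modified ones of Step 5), and the graph, vertex names, and function names are hypothetical.

```python
def asap_schedule(preds):
    """Earliest control step per vertex, assuming unit-delay operations."""
    steps = {}

    def visit(v):
        if v not in steps:
            steps[v] = 1 + max((visit(p) for p in preds[v]), default=0)
        return steps[v]

    for v in preds:
        visit(v)
    return steps

def alap_schedule(preds, latency):
    """Latest control step per vertex that still meets the latency bound."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    steps = {}

    def visit(v):
        if v not in steps:
            steps[v] = min((visit(s) for s in succs[v]), default=latency + 1) - 1
        return steps[v]

    for v in preds:
        visit(v)
    return steps

# Hypothetical DFG: each vertex maps to the list of its predecessors.
dfg = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": [], "a2": ["a1", "m3"]}

asap = asap_schedule(dfg)
latency = max(asap.values())  # ALAP uses the ASAP length as its latency bound
alap = alap_schedule(dfg, latency)
mobility = {v: alap[v] - asap[v] for v in dfg}
mobile = [v for v in dfg if mobility[v] > 0]
print(asap, alap, mobility, mobile)
```

Only the vertices in `mobile` need to be explored by a placement heuristic. Vertices with zero mobility have their time stamp fixed and only need a voltage assignment.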
The aim of the heuristic is to minimize CPF while keeping the time penalty at a minimum. The heuristic minimizes the time ratio along with the CPF to minimize the time penalty. The time ratio is defined as the ratio between the critical path delay when the vertices of the DFG are operating at multiple voltages and the critical path delay when each of the vertices of the DFG is operated at the highest voltage. These two objectives, minimization of CPF (minimization of energy and power) and minimization of time penalty, are mutually conflicting. This is due to the fact that if the operating voltage is reduced to minimize energy or power consumption, this results in an increase of the critical path delay and hence an increase of the time penalty. The heuristic is to operate the energy hungry functional units at higher voltage and frequency values, and the less energy consuming units at lower voltage and frequency ranges in order to achieve the simultaneous reduction of both parameters. The heuristic fixes the operating voltages of the nonmobile vertices as per this order depending on the types of resources they need. The heuristic attempts to find a suitable time stamp and operating voltage for the mobile vertices using exhaustive search. The mobile vertices are tentatively placed in each of the time stamps within their mobile range; when each placement and voltage assignment is done, the CPF and the time ratio are calculated. The predecessor and successor time stamps are adjusted accordingly to maintain the precedence. For this purpose the heuristic maintains a matrix having the number of resources of different types as entries row-wise over all control steps; its dimension depends on the number of types of resources available. For example, if only multipliers and ALUs are the available resources, then the number of types is two. If a voltage is assigned for a vertex, then the matrix entry of the corresponding type and operating voltage is decremented. A particular vertex is placed in a cycle for which the sum of the CPF and the time ratio is minimum. The heuristic initially assumes the modified ASAP schedule (with relaxed voltage resource constraints) as the current schedule (line 01). In case a vertex is a multiplication operation, the initial voltage assignment is the minimum available operating voltage depending on the number of multipliers, whereas for an ALU operation vertex, it is the maximum available operating voltage (line 04–08). Then the CPF and the time ratio for the current schedule are calculated (line 09 and line 10). The heuristic finds the CPF (and time ratio) values for each allowable control step of each mobile vertex and for each available operating voltage, denoted as Temp CPF (and the corresponding temporary time ratio) (line 17–20). The statement in line 17 adjusts the current schedule by adjusting the time stamps of successor vertices while maintaining the resource constraint (using the matrix) and guaranteeing that the precedence is satisfied. In line 12, the vertices are visited in ASAP manner. Another possible way of visiting the mobile vertices is to prioritize them in some manner, say, the vertex with lower mobility is visited first. The heuristic fixes the time step and operating voltage for a vertex, and hence the cycle frequency, for which this sum is minimum (line 22–26). For CPF computation the heuristic uses as a temporary measure for . The above steps are repeated until all mobile vertices are time stamped.

Time complexity of CPF-Scheduler Heuristic: Let there be number of vertices in the DFG, out of which number of vertices have mobility and the maximum mobility of any mobile vertex is . It should be noted that the total number of vertices in the DFG is the total number of operations in the DFG plus the total number of NO-OPs. The running time of finding an operating voltage from the matrix for a particular type of operation is . The statements from line 04–08 have running time of . The worst case running time of the statement in line 17 (or line 29) that adjusts the current schedule is . The running time of the code segment between line 17–26 is , which is , since it is always true that , . So, the running time of the code segment from line 15–27 is . Thus, the running time of the code segment line 12–28 is . The other statements of the pseudocode have constant running time. So, the running time or time complexity of the code segment in line 03–29 is . This can be simplified to a weak upper bound on the worst case running time of the code segment (line 03–29) under the assumption that , but in practice . Under the above assumption we conclude that the worst case upper bound on the running time of the code segment in line 03–29 is . Considering the while loop in line 02, the overall running time of the algorithm can be written as . Again under the assumption that , we conclude that the worst case upper bound on the running time of the algorithm is . In other words, the heuristic runs in time cubic in the number of vertices in the DFG. It can be noted that the time complexity of the algorithm is independent of the number of operating voltage levels.

V. EXPERIMENTAL RESULTS

The CPF-Scheduler algorithm was implemented in C and tested with selected benchmark circuits. The benchmarks used are as follows:
1) Auto-regressive filter (ARF) (total 28 nodes, 16 *, 12 +, 40 edges);
2) Band-pass filter (BPF) (total 29 nodes, 10 *, 10 +, 9 −, 40 edges);
3) DCT filter (total 42 nodes, 13 *, 29 +, 68 edges);
4) Elliptic-wave filter (EWF) (total 34 nodes, 8 *, 26 +, 53 edges);
5) FIR filter (total 23 nodes, 8 *, 15 +, 32 edges);
6) HAL differential equation solver (total 11 nodes, 6 *, 2 +, 2 −, 1 <, 16 edges).

Our algorithm can handle large DFGs and find solutions in reasonable time. The parameters used to express our experimental results are shown in Table II. The look-up table construction consists of two phases: input pattern generation and cell characterization. We generate the primary input signals of different correlations and perform the characterization of the physical implementations of the library modules available in [29].

TABLE II
NOTATIONS USED TO EXPRESS THE RESULTS

Our first set of experiments was carried out for the CPF model 1 (18) in which the cycle difference power is based on the absolute deviation. We tested the scheduling algorithm using the following sets of resource constraints (RC1, RC2, RC3, and RC4):
1) number of multipliers: 1 at 2.4 V; number of ALUs: 1 at 3.3 V;
2) number of multipliers: 2 at 2.4 V; number of ALUs: 1 at 3.3 V;
3) number of multipliers: 2 at 2.4 V; number of ALUs: 1 at 2.4 V and 1 at 3.3 V;
4) number of multipliers: 1 at 2.4 V and 1 at 3.3 V; number of ALUs: 1 at 2.4 V and 1 at 3.3 V.

The sets of resource constraints were chosen so as to cover resources at different operating voltages. The number of allowable voltage levels was assumed to be two (2.4 V, 3.3 V) and the maximum number of allowable frequencies is three. The CPF-scheduler determines the frequencies; in this case they are 4.5, 9.0, and 18.0 MHz. The experimental results for different benchmarks are shown in Table III for different resource constraints. The results take into account the power or energy consumption in overheads, such as level converters and the dynamic clocking unit. This indicates that the scheduling scheme could achieve significant reductions in peak power, peak power differential, average power, and total energy with reasonable time penalties. The time penalty for the ARF and HAL benchmark circuits was relatively high. For many cases, the CPF-Scheduler could reduce energy and power even without any time penalty or even with a gain in time. This happens when the performance degradation due to multiplications in the critical path is adequately compensated by the number of ALU operations in the critical path. For this to happen, the number of ALU operations should be larger than or equal to the number of multiplications in the critical path. This is the case for most of the schedules obtained for the EWF and FIR benchmarks, as indicated by a time ratio of less than or equal to one.

For the above experimental setup, we plotted the power consumption per cycle over all the control steps (clock steps) for different benchmarks in Fig. 3(a) and (b) for resource constraints RC1 and RC3, respectively. The curves labeled as “S” correspond to the profile when the schedule is operated at a single frequency (which is the maximum frequency of the slowest operator, the multiplier) and single voltage. The profiles labeled as “D” correspond to the case when dynamic clocking and the multiple voltage scheme are used. The effectiveness of the
TABLE III
POWER ESTIMATES FOR DIFFERENT BENCHMARKS (USING MODEL 1)
Fig. 3. Cycle power consumption of different benchmarks for various resource constraints.
proposed scheduling scheme is obvious from the figures. Since the CPF is a complex function consisting of several parameters, it is difficult to quantify the impact of a specific parameter accurately.

We also performed experiments with three voltage levels (1.5, 2.4, and 3.3 V) and four frequency levels. The results could improve within the range of 5%–10% in terms of power or energy reductions. However, the time penalty increased by 15%. It is to be noted that the number of allowable frequency levels should be as close as possible to the number of allowable voltages in order to keep the time penalty within a reasonable limit. We performed the same set of experiments for the CPF model 2 in which the cycle difference power is modeled as the cycle-to-cycle power gradient. The experimental results indicate that the energy and
power reductions were similar with small differences, but there were no changes in terms of time penalty. We conclude that the minor difference is due to the fact that the quantitative difference between the values of the cycle difference power in the two models is not significant.

VI. CONCLUSIONS

For deep submicron and nanometer technology designs used in low-power battery driven systems, simultaneous minimization of total energy and transient power is beneficial. The CPF parameter defined and used in this work essentially facilitates such simultaneous optimization. The datapath scheduling algorithm described in this paper is particularly useful for synthesizing data intensive application specific integrated circuits. The algorithm attempts to optimize energy and power while keeping the time penalty at a minimum. The CPF-Scheduler algorithm assumes the number of different types of resources at each voltage level and the number of allowable frequencies as resource constraints. The main contribution of this work is a unified framework for simultaneous multicost space metric optimization of different energy and power components in CMOS circuit design. Future work could address leakage reduction and interconnect issues. The effectiveness of the CPF in the context of a pipelined datapath and in control intensive applications needs to be investigated.

REFERENCES

[1] A. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, pp. 473–483, Apr. 1992.
[13] M. Johnson and K. Roy, “Datapath scheduling with multiple supply voltages and level converters,” ACM Trans. Design Automation of Electron. Syst., vol. 2, no. 3, pp. 227–248, July 1997.
[14] M. Johnson and K. Roy, “Optimal selection of supply voltages and level conversions during datapath scheduling under resource constraints,” in Proc. Int. Conf. Computer Design, Oct. 1996, pp. 72–77.
[15] J. M. Chang and M. Pedram, “Energy minimization using multiple supply voltages,” IEEE Trans. VLSI Syst., vol. 5, pp. 436–443, Dec. 1997.
[16] A. Raghunathan and N. K. Jha, “SCALP: An iterative-improvement based low-power datapath synthesis system,” IEEE Trans. Computer-Aided Design, vol. 16, pp. 1260–1277, Nov. 1997.
[17] M. Sarrafzadeh and S. Raje, “Scheduling with multiple voltages under resource constraints,” in Proc. IEEE Symp. Circuits Systems, vol. 1, 1999, pp. 350–353.
[18] A. Kumar and M. Bayoumi, “Multiple voltage-based scheduling methodology for low-power in the high level synthesis,” in Proc. Int. Symp. Circuits Systems, vol. 1, July 1999, pp. 371–379.
[19] M. M. Mansour, M. M. Mansour, I. Hajj, and N. Shanbhag, “Instruction scheduling for low-power on dynamically variable voltage processors,” in Proc. 7th IEEE Int. Conf. Electronics, Circuits Systems, 2000, pp. 613–618.
[20] W. T. Shiue and C. Chakrabarti, “Low-power scheduling with resources operating at multiple voltages,” IEEE Trans. Circuits Syst. II, vol. 47, pp. 536–543, June 2000.
[21] A. Manzak and C. Chakrabarti, “A low-power scheduling scheme with resources operating at multiple voltages,” IEEE Trans. VLSI Syst., vol. 10, pp. 6–14, Feb. 2002.
[22] R. S. Martin and J. P. Knight, “Optimizing power in ASIC behavioral synthesis,” IEEE Des. Test Comput., vol. 13, no. 2, pp. 58–70, Apr. 1996.
[23] W. T. Shiue, “High-level synthesis for peak power minimization using ILP,” in Proc. IEEE Int. Conf. Application Specific Systems, Architectures Processors, 2000, pp. 103–112.
[24] W. T. Shiue and C. Chakrabarti, “ILP based scheme for low-power scheduling and resource binding,” in Proc. IEEE Int. Symp. Circuits Systems, vol. 3, 2000, pp. 279–282.
[25] W. T. Shiue, J. Denison, and A. Horak, “A novel scheduler for low-power real time systems,” in Proc. 43rd Midwest Symp. Circuits Systems, Aug. 2000, pp. 312–315.
[26] V. Raghunathan, S. Ravi, A. Raghunathan, and G. Lakshminarayana, “Transient power management through high level synthesis,” in Proc. Int. Conf. Computer-Aided Design, 2001, pp. 545–552.
[27] L. Benini, E. Macii, M. Poncino, and G. De Micheli, “Telescopic units: A new paradigm for performance optimization of VLSI design,” IEEE
[2] D. Singh, J. M. Rabaey, M. Pedram, F. Catthoor, S. Rajgopal, N. Sehgal, Trans. Computer-Aided Design, vol. 17, pp. 220–232, Mar. 1998.
and T. J. Mozdzen, “Power conscious CAD tools and methodologies: A [28] C. Papachristou, M. Spining, and M. Nourani, “A multiple clocking
perspective,” Proc. IEEE, vol. 83, pp. 570–594, Apr. 1995. scheme for low-power RTL design,” IEEE Trans. VLSI Syst., vol. 7, pp.
[3] D. Sylvester and H. Kaul, “Power-driven challenges in nanometer 266–276, June 1999.
design,” IEEE Design Test Computers, vol. 13, pp. 12–21, Nov.–Dec. [29] S. P. Mohanty and N. Ranganathan, “Energy efficient scheduling for
2001. datapath synthesis,” in Proc. Int. Conf. VLSI Design, Jan. 2003, pp.
[4] L. Benini, G. Casterlli, A. Macii, and R. Scarsi, “Battery-driven dynamic 446–451.
power management,” IEEE Design Test Computers, vol. 13, pp. 53–60,
Mar.–Apr. 2001.
[5] T. L. Martin and D. P. Siewiorek, “Non-ideal battery properties and
low-power operation in wearable computing,” in Proc. 3rd Int. Symp.
Wearable Computers, 1999, pp. 101–106.
[6] T. N. Mudge, “Power: A first class design constraint for future architec-
ture and automation,” in Proc. Int. Conf. High-Performance Computing,
2000, pp. 215–224. Saraju P. Mohanty (S’00) received the B. Tech.
[7] T. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, “A dy- degree (Hons.) in electrical engineering from
namic voltage scaled microprocessor system,” IEEE J. Solid-State Cir- the College of Engineering and Technology,
cuits, vol. 35, pp. 1571–1580, Nov. 2000. Orissa University of Agriculture and Technology,
[8] J. Pouwelse, K. Langendoen, and H. Sips, “Energy priority scheduling Bhubansewar, India, in 1995, the M.E. degree in
for variable voltage processor,” in Proc. Int. Symp. Low-Power Elec- systems science and automation from the Indian
tronics Design, Aug. 2001, pp. 28–33. Institute of Science, Bangalore, India, in 1999,
[9] I. Brynjolfson and Z. Zilic, “Dynamic clock management for low-power and the Ph.D. degree in computer science and
applications in FPGAs,” in Proc. IEEE Custom Integrated Circuits engineering from the University of South Florida,
Conf., 2000, pp. 139–142. Tampa, in 2003.
[10] J. M. Kim and S. I. Chae, “New MPEG2 decoder architecture using He has published several research papers in
frequency scaling,” in Proc. IEEE Int. Symp. Circuits Systems, 1996, the areas of VLSI design automation, very large scale integration (VLSI)
pp. 253–256. design, and digital watermarking. His research interests include high-level
[11] N. Ranganathan, N. Vijaykrishnan, and N. Bhavanishankar, “A VLSI synthesis, low-power synthesis, VLSI CAD for DSM regime, dynamic power
array architecture with dynamic frequency clocking,” in Proc. Int. Conf. management, low-power ASIC design.
Computer Design, 1996, pp. 137–140. Dr. Mohanty is a Member of ACM-SIGDA. He was nominated for the Best
[12] Y. R. Lin, C. T. Hwang, and A. C. H. Wu, “Scheduling techniques for Paper Award at the International Conference on VLSI Design in 2003. He
variable voltage low-power design,” ACM Trans. Design Automation received a Certificate of Recognition from the Provost, University of South
Electron. Syst., vol. 2, no. 2, pp. 81–97, Apr. 1997. Florida, Tampa, for outstanding teaching in 2002 and 2003.
572 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 6, JUNE 2004