Proceedings of Seminar on
Energy-Aware Software
TEKNILLINEN KORKEAKOULU
TEKNISKA HÖGSKOLAN
HELSINKI UNIVERSITY OF TECHNOLOGY
TECHNISCHE UNIVERSITÄT HELSINKI
UNIVERSITE DE TECHNOLOGIE D’HELSINKI
Helsinki University of Technology
Department of Computer Science and Engineering
Software Technology Laboratory
P.O. Box 5400
FIN-02015 HUT
Espoo
Finland
ISBN: 978-951-22-8871-7
ISSN: 1239-6907
Preface
The Software Technology Laboratory at Helsinki University of Technology organized a
seminar on energy-aware software during spring 2007. The goal of the seminar was to
take a look at the current status of the field. These proceedings include some of the work
presented in the seminar.
Vesa Hirvisalo
Contents

Hardware Accelerators in Embedded Systems
Lari Ahti

Harnessing the Power of Software – A Survey on Energy-Aware Interfaces
Gerard Bosch i Creus

Compiler memory energy optimizations
Peter Majorin

Measuring the CPU energy consumption of a modern mobile device
Antti P Miettinen

Energy-Aware Scheduling
Juhani Peltonen

Energy Accounting
Timo Töyry
Hardware Accelerators in Embedded Systems
Lari Ahti
Helsinki University of Technology
Software Technology Laboratory
[email protected]
Hardware acceleration is the use of additional hardware to perform some function more effectively than a more general hardware implementation. Especially in wireless embedded systems, complicated communication algorithms require high performance from the system on chip. In the case of a battery-powered device, algorithms are often implemented with hardware accelerators in order to meet both energy consumption and performance requirements.

One implementation that takes advantage of hardware accelerators is software defined radio, a solution targeted at the problems caused by fixed hardware in mobile communication devices. Software defined radio means a layer that is capable of performing highly demanding signal processing and yet is reprogrammable after manufacturing. Reprogrammability has several advantages: support for multiple protocols, faster time to market, higher chip volumes and support for late implementation changes. [7]

The motivation for hardware acceleration is usually a performance increase of the system, but energy consumption may also be a motivator, especially in embedded systems. Hardware acceleration changes the behaviour of the system at least at the hardware level, but may also affect the software implementation. [7]

Usually a hardware acceleration implementation is a trade-off between three main factors: performance, energy consumption and flexibility. Other factors that may affect the selection of a specific technique are, for example, manufacturing price, design costs and time to market. Especially in battery-powered devices, energy consumption limits the other factors. Another example of hardware acceleration is graphics accelerators, which are designed to render real-time graphics. In this case the performance of the accelerator is maximized and the other factors are limited.

2.2. Hardware techniques
There are also other types of accelerators in use, and in many cases a combination of several accelerators is used. For example, a graphics accelerator is usually a combination of several hardware accelerators that are located on a single chip.
Wireless communication is a field where hardware accelerators are commonly used to perform signal processing tasks. Battery-powered devices require energy consumption to be as small as possible, but complicated signal processing algorithms require high computational power. To overcome the problem caused by this scenario, hardware accelerators are used to perform computationally demanding tasks with smaller energy consumption than with more general-purpose hardware. Design and verification of fixed hardware systems is difficult, because each implementation is suited [...]

[Figure 2. Modern wireless communication protocols. [4]]

The problem with these protocols is that their computational requirements are much higher than the capabilities of modern DSP processors. Current DSP processors are able to perform about 10 Mops/mW, while modern wireless protocols require about 100 Mops/mW. [4] To increase the performance of the system, hardware accelerators are needed.

[Figure 3. Computational power relative to energy consumption in some hardware implementations. [4]]
3.2. Architecture suggestions

Several different software defined radio architectures have been suggested. It is interesting to note that the solutions are very different from each other. A list of a few interesting software defined radio architecture types:

• Hybrid-SIMD based architecture. This architecture consists of separate scalar and vector processors. The scalar processor is used to control signal processing and the vector processors are used to perform the actual processing. Examples of such architectures are SODA [4] and SandBlaster [6].

• FPGA based architecture. Many solutions take advantage of FPGAs as they provide a decent performance increase and are reprogrammable. Solutions usually include a processor that controls the processing. Difficulties arise from the requirement to meet real-time constraints. An example of such an architecture is picoArray [1].

• Heterogeneous architecture. Some solutions use several heterogeneous processing elements to satisfy the computational requirements. Each processing element is designed for a specific communication signal processing algorithm. This approach limits the flexibility of the system, but is very efficient for its special purpose. Also, workload distribution among processing elements is difficult due to the heterogeneous architecture, and each processing element type must be capable of handling the worst-case workload.

• VLIW DSP architecture. Very long instruction word (VLIW) DSPs can achieve high performance by executing several instructions in parallel. The idea in VLIW is that several instructions can be combined into a single instruction word at compile time and then executed in parallel in hardware. The instruction execution energy consumption is higher than with other architectures, which also limits the overall performance of the architecture. Current DSP implementations do not satisfy the high computational requirements of modern wireless algorithms, and thus the architecture requires some type of hardware accelerator to give additional performance. One example of a VLIW DSP processor is the Texas Instruments TMS320C64x DSP. [8]
4 Summary

Many embedded systems require high computational performance and extremely low power consumption. Hardware accelerators are used to provide additional performance for a system-on-chip. Hardware accelerators perform some function more energy efficiently than general-purpose processors. Usually accelerators limit the flexibility of the system by adding more task-specific hardware to the system.

Many modern wireless battery-powered devices use hardware accelerators to meet computational requirements with a limited energy capacity. Historically, hardware accelerator implementations were created with task-specific hardware that is not very flexible. Current mobile devices implement a wide range of wireless protocols, which results in difficulties with complicated hardware implementations. A more dynamic approach is required to create more generic processing elements that can be reprogrammed after manufacturing.

Software defined radio is a solution to the problems caused by fixed hardware implementations in mobile devices. The idea in SDR is to create a layer that is capable of handling several complicated communication algorithms with limited energy resources. Hardware accelerators are used to gain performance without significantly increasing the power consumption of the device. Many SDR architectures have been proposed, but consumer products with SDR implementations are not yet available.
References

[1] R. Baines and D. Pulley. Software defined baseband processing for 3G basestations. In 4th International Conference on 3G Mobile Communication Technologies, pages 123–127, 2003.

[2] William J. Dally and Brian Towles. Route packets, not wires: on-chip interconnection networks. In DAC '01: Proceedings of the 38th conference on Design automation, pages 684–689, New York, NY, USA, 2001. ACM Press.
[3] Kanishka Lahiri and Anand Raghunathan. Power anal-
ysis of system-level on-chip communication architec-
tures. In CODES+ISSS ’04: Proceedings of the 2nd
IEEE/ACM/IFIP international conference on Hard-
ware/software codesign and system synthesis, pages
236–241, New York, NY, USA, 2004. ACM Press.
[4] Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel,
Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, and
Krisztian Flautner. Soda: A low-power architecture for
software radio. In ISCA ’06: Proceedings of the 33rd
annual international symposium on Computer Archi-
tecture, pages 89–101, Washington, DC, USA, 2006.
IEEE Computer Society.
[5] Chittarsu Raghunandan, K. S. Sainarayanan, and M. B.
Srinivas. Bus-encoding technique to reduce delay,
power and simultaneous switching noise (ssn) in rlc
interconnects. In GLSVLSI ’07: Proceedings of the
17th Great Lakes symposium on VLSI, pages 371–376,
New York, NY, USA, 2007.
ACM Press.
[6] Michael Schulte, John Glossner, Sanjay Jinturkar,
Mayan Moudgill, Suman Mamidi, and Stamatis Vas-
siliadis. A low-power multithreaded processor for soft-
ware defined radio. J. VLSI Signal Process. Syst., 43(2-
3):143–159, 2006.
[7] Eric Tell. Design of Programmable Baseband Processors. PhD thesis, Linköping Studies in Science and Technology, 2005.
[8] Texas Instruments. TMS320C64x DSP Generation,
2003. http://www.softier.com/pdf/sprt236a.pdf.
[9] Feng Wang, Yuan Xie, N. Vijaykrishnan, and M. J. Ir-
win. On-chip bus thermal analysis and optimization.
In DATE ’06: Proceedings of the conference on De-
sign, automation and test in Europe, pages 850–855,
3001 Leuven, Belgium, Belgium, 2006. European De-
sign and Automation Association.
Harnessing the Power of Software –
A Survey on Energy-Aware Interfaces
Gerard Bosch i Creus
Nokia Research Center
P.O. Box 407
FIN-00045 Nokia Group (Finland)
[email protected]
TABLE III
DEVICE POWER STATES IN WINDOWS MOBILE V5

State      Description
On         Full power
Idle       Low power mode and inactive. The device can still respond to interrupts or external events
Standby    Inactive with internal state maintained

[Figure 5. Windows Mobile power management architecture: applications and device drivers interact with the Power Manager through the PM APIs and notification message queues; a GPS power handler and a GPS logical device driver are shown as an example.]
and their requirements, and adjusts the system power states in response to power events. Power events may be caused by powering on and off the system and by changes to the state of shared power resources. The system needs to behave gracefully also in the event of a critical power failure.

Power resources are powered up and down as needed, when requested by their power handlers. Power resources use their power handlers to specify power requirements to the power model. The power model changes the system power state in response to the cumulative power requirements for all power handlers.

B. Windows Mobile

The power management infrastructure in Windows Mobile v5 is built around the Power Manager (PM) component [16]. The PM provides interfaces at the application, system and driver level for developers, with the goal of extending battery life. The PM is based on the notion of power states, making a clear separation between system and device (driver) power states. Both application and driver developers are actively encouraged to make use of the PM interfaces to control devices. The PM uses a publish-subscribe pattern to update software of impending changes in power states. Figure 5 presents the Windows Mobile power management architecture.

The PM is designed around the concept of power states and a clear division between system and device states. Devices are expected to implement a set of well-defined power states, independent of states defined by the ACPI standard [14]. Manufacturers define system power states that provide an upper limit state, ceiling, for all devices. Applications can dictate the bottom limit state, floor, for a particular device through the PM interface. Devices are allowed to manage their own power states between the set ceiling and floor levels.

System power states are named collections of device power states, such as Battery, Docked or UserIdle. There is no limit for the number of system power states, and state transitions do not need to be linear. Device power states follow a clearly specified hierarchy. Table III presents the device power state hierarchy. States D0 and D1 should be fully functional from a user perspective, and higher numbered states typically consume less power. Power management can increase device driver complexity considerably. For example, drivers may need to behave differently upon receiving a power down event while in a state other than D4. Dependencies to other devices (and their power states) will influence the course of action for device drivers.
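As a minimal illustration of the ceiling/floor idea, the sketch below resolves the bounds a driver may operate in from the system ceiling and the floors requested by applications. This is not the actual Windows Mobile Power Manager API; all names are invented, and the D0–D4 numbering simply follows the convention that higher numbers consume less power.

# Illustrative sketch of ceiling/floor resolution for device power states.
# Not the real Windows Mobile PM API; names and semantics are simplified.
D0, D1, D2, D3, D4 = range(5)   # D0 = full power, D4 = off

def effective_state_bounds(system_ceiling, app_floors):
    """Return the (ceiling, floor) states the driver may manage itself between.

    The system power state sets the ceiling (most power-hungry state allowed);
    applications set floors (the least functional state they tolerate).
    """
    ceiling = system_ceiling
    floor = min(app_floors) if app_floors else D4   # most demanding application wins
    if floor < ceiling:                             # requests above the ceiling are clamped
        floor = ceiling
    return ceiling, floor

# Example: a "UserIdle"-like system state caps devices at D2, while one
# application asks the device to stay at least at D1.
print(effective_state_bounds(system_ceiling=D2, app_floors=[D1, D3]))   # -> (2, 2)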
C. Darwin

Darwin is the open-source UNIX-based foundation of Apple's Mac OS X. The basic concepts underlying Darwin power management are hierarchies of devices (referred to as power domains in Darwin) and their supervising entities, policy makers and power controllers [17], [18], [19].

The fundamental entity in Darwin power management is the device, a hardware component the power of which can be adjusted independently of system power. A device may have different power states associated with it, at least two: on and off. Darwin associates several attributes to the power state of each device:

• Power used by the device in that state
• Capabilities of the device in that state
• Power required to move the device to the next higher state (currently unused)
• Time required to move the device to that state (currently unused)

Another entity in Darwin power management is a power domain, a switchable source of power in the system providing power to one or more devices that are considered part of the power domain. Power domains dictate the highest power state for the devices they contain, and like devices, they have a range of power states, with at least on and off states supported. Power domains are hierarchical, with the top-level domain being the root power domain, which represents the main power of the system. Figure 6(a) presents an example of the power domain hierarchy.

[Fig. 6. Darwin power management: (a) power domain hierarchy — a root power domain with child domains (e.g. card reader and hard disk domains) containing devices; (b) Darwin power management architecture — each device or domain is supervised by a policy maker and a power controller.]

Darwin defines two supervising entities for devices and power domains: policy makers and power controllers. Policy makers are the objects responsible for deciding when to change the power state of devices and power domains, based on different factors. The major factor in this decision is device idleness. When the system detects that a device is idle, it will try to reduce its power state, and policy makers are the entities responsible for deciding when to change the power state of a device or domain. However, the entities responsible for implementing these changes are power controllers. Other factors that the policy makers take into account include the aggressiveness, which is defined by the user in different contexts (such as AC-plugged or running from batteries) or by the system itself (such as when the battery reaches a critically low state).

A power controller knows about the power states of a device and can steer the device between them. Additionally, it reports power-related information of a device to the policy maker to assist in decision-making. Figure 6 presents an overview of the interoperation between the different Darwin power management entities.

There are a few fundamental differences between policy makers for devices and power domains. First, policy makers for power domains do not alter the power consumption of devices, but the character of the power supplied to members of the domain. This may imply adjusting clock rates or voltages, for example. Secondly, policy makers for power domains do not base their decisions on idleness like device policy makers. Instead, they base their decisions on requests from their members (which include policy makers and power controllers). Therefore, policy makers are responsible for the power supplied to the domain they supervise, and they request the power state for the domain they belong to.

The power management architecture also provides support for notification of power state changes to interested entities. Power controllers are automatically notified of power state changes. Other interested objects may include driver objects, which will need to subscribe to the notifications by implementing a couple of function callbacks. In addition, user processes may also request notifications of system and device power events.
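The sketch below is a toy, Python-level rendering of the policy maker / power controller split described above; it is not IOKit code, and every class and method name is invented purely for illustration.

# Toy model of a Darwin-style policy maker and power controller.
# Not IOKit code: all names here are invented for illustration only.
import time

class PowerController:
    """Knows the device's power states and performs the actual switch."""
    def __init__(self, states=("off", "on")):
        self.states = states
        self.current = "on"

    def set_state(self, state):
        assert state in self.states
        self.current = state
        print(f"device -> {state}")

class PolicyMaker:
    """Decides *when* to change state, based mainly on idleness."""
    def __init__(self, controller, idle_timeout_s=30.0):
        self.controller = controller
        self.idle_timeout_s = idle_timeout_s   # scaled by the user-set 'aggressiveness'
        self.last_activity = time.monotonic()

    def note_activity(self):
        self.last_activity = time.monotonic()
        if self.controller.current != "on":
            self.controller.set_state("on")

    def tick(self):
        idle_for = time.monotonic() - self.last_activity
        if idle_for > self.idle_timeout_s and self.controller.current == "on":
            self.controller.set_state("off")

# A periodic tick() plus note_activity() calls from drivers reproduce the
# "wait for an idleness timer to expire" behaviour described above.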
V. DISCUSSION

There seems to be a clear gap between the state of the art and the state of the practice. While several studies show that application input is essential for effective power management, operating systems are slow in adopting the results of power management research.

Symbian OS offers a comprehensive framework for power management, yet fails to reflect application requirements, assuming that a hardware view offers a complete enough picture of the power requirements. This effectively turns Symbian power management into a reactive system rather than a proactive one, therefore missing power saving opportunities. It is a clear example of autonomous power management. In addition, while the framework is designed to provide an accurate picture of the device power consumption, this is rarely achieved in practice. Hardware may exhibit complex power consumption patterns and manufacturers tend to neglect power consumption notifications. Symbian OS power management is typically reduced to a device management framework for system startup and shutdown events.

Windows Mobile power management takes instead a requester-controlled approach, with the implied drawbacks. The framework relies on device drivers to take the most intelligent decisions regarding device power states, which may not always be possible due to their narrower scope with respect to the system. In addition, the abstraction level offered to applications is not adequate [1]. The system should be ultimately responsible for resource management, albeit through collaboration with applications.

Darwin provides a very simple approach to power management. Idleness and user-defined aggressiveness are the major factors governing policy making. However, I argue that waiting for an idleness timer to expire is a potential energy waste and should be avoided. Application inputs are not considered at all, with the implied loss of power-saving opportunities.

It seems that leading mobile operating systems do not exploit the advantages offered by application adaptation. This could provide significant power savings, as exposed in Section III. Additional techniques such as idle period notification and
ghost hints could provide additional savings. Given that power efficiency is taking a front seat in mobile OS design, the techniques presented provide promising methods to reduce the burden on the battery.

VI. CONCLUSIONS AND FURTHER WORK

This paper has presented a survey of the published research on software adaptations for energy efficiency. In addition, I reviewed three mainstream mobile OSs and the power management functionality they offer. There seems to exist a big gap between the state of the art and the state of the practice, as evidenced by the lack of application input to power management interfaces. Operating systems could greatly benefit from the inclusion of the reviewed techniques in their design, since power management seems to play an increasingly important role in driving OS design. Battery technology cannot provide the necessary improvements to increase or even maintain the battery life of mobile devices, which face an increasing amount of power-hungry features and hardware components. Software needs to become more energy-efficient, and collaboration with the OS is a promising area for achieving this goal.

REFERENCES
[1] Y.-H. Lu, L. Benini, and G. D. Micheli, "Requester-aware power reduction," in International Symposium on System Synthesis, Stanford University, September 2000, pp. 18–23. [Online]. Available: http://doi.acm.org/10.1145/501790.501796

[2] T. Simunic, L. Benini, and G. De Micheli, "Energy-efficient design of battery-powered embedded systems," in Proceedings of the International Symposium on Low-Power Electronics and Design (ISLPED'98), June 1998. [Online]. Available: http://www.acm.org/pubs/articles/proceedings/dac/313817/p212-simunic/p212-simunic.pdf

[3] T. K. Tan, A. Raghunathan, and N. Jha, "Software architectural transformations: A new approach to low energy embedded software," in Proceedings of the Conference on Design Automation and Test in Europe (DATE'03), 2003. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=1253742

[4] B. Noble, "System support for mobile, adaptive applications," IEEE Personal Communications, vol. 7, no. 1, pp. 44–49, February 2000. [Online]. Available: http://www.cs.cmu.edu/~coda/docdir/ieeepcs00.pdf

[5] J. Flinn and M. Satyanarayanan, "Energy-aware adaptation for mobile applications," in Proceedings of the Seventeenth Symposium on Operating System Principles (SOSP'99), December 1999. [Online]. Available: http://portal.acm.org/citation.cfm?doid=319151.319155

[6] C. Ellis, "The case for higher level power management," in Proceedings of the Seventh Workshop on Hot Topics in Operating Systems (HotOS'99), March 1999. [Online]. Available: http://www.cs.duke.edu/~carla/ellis.pdf

[7] B. D. Noble, M. Satyanarayanan, D. Narayanan, J. E. Tilton, J. Flinn, and K. R. Walker, "Agile application-aware adaptation for mobility," in Proceedings of the Sixteenth Symposium on Operating System Principles (SOSP'97), Saint Malo, France, 1997, pp. 276–287. [Online]. Available: http://portal.acm.org/citation.cfm?id=269005.266708

[8] C. Ellis, A. Lebeck, and A. Vahdat, "System support for energy management in mobile and embedded workloads: A white paper," Duke University, Department of Computer Science, Tech. Rep., October 1999. [Online]. Available: http://www.cs.duke.edu/~carla/research/whitepaper.pdf

[9] M. Anand, E. B. Nightingale, and J. Flinn, "Ghosts in the machine: Interfaces for better power management," in Proceedings of the Second International Conference on Mobile Systems, Applications, and Services (MOBISYS'04), June 2004. [Online]. Available: http://www.eecs.umich.edu/~anandm/mobisys.pdf

[10] A. Weissel, M. Faerber, and F. Bellosa, "Application characterization for wireless network power management," in Proceedings of the International Conference on Architecture of Computing Systems (ARCS'04), January 2004. [Online]. Available: http://citeseer.ist.psu.edu/649995.html

[11] T. Heath, E. Pinheiro, J. Hom, U. Kremer, and R. Bianchini, "Application transformations for energy and performance-aware device management," in Proceedings of the Eleventh Conference on Parallel Architectures and Compilation Techniques (PACT'02), September 2002. [Online]. Available: http://dx.doi.org/10.1109/PACT.2002.1106011

[12] ——, "Code transformations for energy-efficient device management," IEEE Transactions on Computers, vol. 53, no. 8, August 2004. [Online]. Available: http://www.cs.rutgers.edu/~ricardob/papers/tc04.pdf

[13] Y.-H. Lu, L. Benini, and G. D. Micheli, "Power-aware operating systems for interactive systems," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 2, April 2002. [Online]. Available: http://dx.doi.org/10.1109/92.994989

[14] Hewlett-Packard, Intel, Microsoft, Phoenix, and Toshiba, Advanced Configuration and Power Interface Specification 3.0a, Dec. 2005. [Online]. Available: http://www.acpi.info/DOWNLOADS/ACPIspec30a.pdf

[15] "Power management framework," Symbian Developer Library. [Online]. Available: http://www.symbian.com/developer/techlib/v70docs/sdl v7.0/doc source/baseporting/KernelProgramming/PowerManagementFramework/index.html

[16] J. Looney, "New Power Manager States in Windows Mobile V5 and How to Use Them," CLI327, Mobile & Embedded DevCon 2005. Presentation.

[17] "Power management," Apple Developer Connection. [Online]. Available: http://developer.apple.com/documentation/DeviceDrivers/Conceptual/IOKitFundamentals/PowerMgmt/chapter 10 section 3.html

[18] "Power management for Macintosh; getting started," Apple Developer Connection. [Online]. Available: http://developer.apple.com/technotes/tn2002/tn2075.html

[19] "Technical Note TN2075. Power Saving Features for the PowerBook G4 computer," Apple Developer Connection. [Online]. Available: http://developer.apple.com/documentation/Hardware/Developer Notes/Macintosh CPUs-G4/PowerBook G4Apr02/1Introduction/Power Saving Features.html#TPXREF115
Compiler memory energy optimizations
Peter Majorin
Software Technology Laboratory/TKK
2.2. Loop cache

A loop cache [6] is another low power on-chip memory, which is more limited than a cache and therefore consumes less energy. In contrast to scratchpads, loop caches can be hardware controlled, but they can only contain code. As the name implies, it is used to store loop code; the loop code must fit entirely into the loop cache to be effective. A loop cache can also be software controlled, and it is then equivalent to a scratchpad that can only store instructions.

3. Compiler analyses for memory optimizations

3.1. Static and dynamic analyses

For memory optimizations, static and dynamic analyses can be used; both have their advantages and disadvantages, and at best they can be used to complement each other [5, 16].

Dynamic analyses cannot usually capture all possible program executions, but are easy to perform; the program to be profiled is just run with various inputs a number of times to obtain the program run-time behavior. In a particular run, everything about the program can be found out, including the memory accesses done and where program execution time is spent, but this information is obviously only valid for that particular run of the program.

Static analyses on the other hand attempt to find out program run-time behavior by considering all program executions at the same time. This makes static analyses sound (they can capture all program executions), but in practice program semantics must be approximated to make the analyses feasible, also resulting in inaccuracies in the results. Moreover, over a decade of research in automatic flow analysis shows that statically analyzing program behavior to obtain accurate information is a very difficult problem for larger programs.

But for memory optimizations, an accurate knowledge about where program execution time is spent is essential, so program profiling should be used instead of inaccurate static analysis.

3.2. Loop analysis

Loop analysis [1] is a static analysis which compilers use to locate important code to be optimized. In the context of energy optimizations, loop headers can be used as a basis for growing traces (Section 3.3). If the on-chip memory is large enough, or the loop is small enough, entire loops can be placed on the on-chip memory at once.

3.3. Trace analysis

Traces [14] (frequently executed straight-line sequences of basic blocks) can be used in the context of memory optimizations. Traces can be generated from the profiling data and static loop analysis; the loop header provides a starting point to grow the trace from. A trace is terminated when its tail execution frequency decreases below a certain fixed threshold value as compared to the header execution frequency. An advantage with a trace is that it can cross procedure boundaries so that opportunities for saving energy at the interprocedural level are not missed. Furthermore, the trace building must be tailored to a certain memory hierarchy; the size of the trace must not exceed the SPM, and caches must be taken into account if they are present [20].

3.4. Statistical measures

Instead of performing a structural analysis on code to identify loops and to build traces out of these, the authors in [9] suggest a novel heuristic they call concomitance. This is a statistical measure of the temporal correlation between blocks of instructions. The advantage with this method is that it can capture hot spots in the program without needing to identify the structure of the program. Traces are still needed as profiling data as with the other methods.
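To make the trace-growing rule of Section 3.3 concrete, the sketch below grows a trace from a loop header along the hottest successor edges and stops when the candidate tail's execution frequency drops below a fixed fraction of the header's. The CFG encoding and the 20% threshold are assumptions for this example, not values from the cited papers.

# Sketch of profile-guided trace growing (Section 3.3).
# The CFG encoding and the 20% threshold are illustrative assumptions.

def grow_trace(header, successors, exec_freq, threshold=0.2):
    """Grow a trace starting at a loop header.

    successors: dict mapping a basic block to its successor blocks.
    exec_freq:  dict mapping a basic block to its profiled execution count.
    The trace is terminated when the candidate tail block's frequency falls
    below `threshold` times the header's frequency, or when a block repeats.
    """
    trace = [header]
    seen = {header}
    block = header
    while True:
        succs = [s for s in successors.get(block, []) if s not in seen]
        if not succs:
            break
        nxt = max(succs, key=lambda s: exec_freq.get(s, 0))   # follow the hottest edge
        if exec_freq.get(nxt, 0) < threshold * exec_freq[header]:
            break                                             # tail too cold: stop here
        trace.append(nxt)
        seen.add(nxt)
        block = nxt
    return trace

# Example: B1 is a hot loop header; B4 is rarely executed, so the trace stops.
cfg = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B1"], "B4": ["B5"]}
freq = {"B1": 1000, "B2": 990, "B3": 900, "B4": 10}
print(grow_trace("B1", cfg, freq))   # -> ['B1', 'B2', 'B3']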
4. Scratchpad allocation algorithms

The allocation algorithms presented here optimize energy consumption with respect to average case energy consumption (ACEC); other optimization criteria also exist, such as worst case energy consumption (WCEC) [10]. Optimization with respect to energy consumption may result in performance improvements as well; this is usually the case when doing memory optimizations, because low-power memories are usually faster as well.

The inputs for all the allocation algorithms are a power model for the instructions and the hot spots of the program to be optimized, in the form of basic blocks, procedures, loops, traces and global variables. The energy savings obtained by scratchpad allocation are often compared against a cache of a similar size, or against a static allocation (Section 4.1), if appropriate.

Dynamic data structures such as the stack and heap memory remain problematic, because their sizes are not usually known at compile time. However, these issues have received some research so far [3].

Table 1. Energy consumed by an SPM vs. an I-cache (0.18 µm) of equal size, from the CACTI model [15].

Size (bytes)   Fetch, SPM (nJ)   Fetch, I-cache (nJ)
64             0.1803            0.2961
128            0.1888            0.3059
256            0.1980            0.4732
512            0.2188            0.4966
1024           0.2404            0.5233
2048           0.2748            0.5655
4096           0.3277            0.6351
4.1. Static Allocation

Static allocation has been studied extensively, e.g. [17]: the contents (what variables and code) of the scratchpad are loaded at the start of the program, and this allocation remains unchanged during program execution. The problem to solve is the integer knapsack problem: ILP (integer linear programming) or dynamic programming can be used to solve this problem optimally: the selected procedures, traces, loops and variables, based on profiling data and the energy model, are placed on the scratchpad so as to optimize its filling.

Static allocation still finds use in studies of more complex environments such as multi-banked and cache-aware scratchpad allocations. It also serves as a benchmark to compare the effectiveness of dynamic allocations against.
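Since the static allocation problem is a 0/1 knapsack, a small instance can be solved with dynamic programming, as in the sketch below, which maximises the profiled energy saving of the objects placed on the scratchpad. The object list and all numbers are invented for the example; in a real compiler the sizes come from the binary and the savings from profiling and the energy model.

# 0/1 knapsack formulation of static scratchpad allocation (Section 4.1).
# Objects and numbers are invented for the example.

def static_allocation(objects, spm_size):
    """objects: list of (name, size_bytes, energy_saving); returns (saving, names)."""
    best = [(0.0, []) for _ in range(spm_size + 1)]    # best saving per capacity
    for name, size, saving in objects:
        for cap in range(spm_size, size - 1, -1):      # iterate capacity downwards (0/1)
            cand = best[cap - size][0] + saving
            if cand > best[cap][0]:
                best[cap] = (cand, best[cap - size][1] + [name])
    return best[spm_size]

objects = [("loop_A", 512, 9.0), ("func_B", 1024, 12.0),
           ("trace_C", 768, 8.5), ("array_D", 896, 7.0)]
print(static_allocation(objects, spm_size=2048))
# -> picks the object subset with the highest total saving that fits in 2 kB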
4.2. Dynamic Allocation

Dynamic allocation is harder than static allocation, but it has been demonstrated to save more energy than static approaches. In dynamic allocation, the allocation of the scratchpad can change during runtime. In contrast to static allocation, program points have to be identified in the program at which to load and evict the scratchpad. Several different approaches have been proposed for dynamic allocation [19, 18, 9], most of them being heuristic methods.

The motivation for dynamic allocation can be seen in Table 1. Smaller SPM sizes save more energy, because each fetch costs less energy, but on the other hand a smaller SPM can hold less code or data. Dynamic allocation can therefore save more energy than static allocation, because it is able to utilize the limited storage space better. This is the case for larger programs that have several hot spots and alternate between them. The conclusion is that very small SPM sizes may save the most energy, but this is also program-dependent. We next present some state-of-the-art dynamic allocation methods.

The approach by Verma et al. [19] is based on ILP, and is solved in the following phases:

1. Determine candidate SP (scratchpad) objects as in static allocation (code and data)
2. Perform liveness analysis on the SP objects
3. Assignment of SP memory objects and their spill locations in code
4. Computation of memory addresses of the SP objects

Some of the above steps were approximated with heuristic methods, because they take a very long time to compute for larger programs if done with ILP. The authors report 26% average energy savings as compared to a static allocation method.

Here we see the typical structure of a dynamic scratchpad allocation algorithm: since the objects can be evicted from the scratchpad at any point in the program, we have to determine the live ranges of the objects to be able to reason about how long we must keep the objects on the scratchpad before we can evict them. We also have to determine the points in the program where to evict and load the scratchpad. Finally we have to decide where in the scratchpad we put the loaded scratchpad object.

The approach by Janapsatya et al. [9] is based on a statistical method and considers only instructions. Rather than using a structural approach to identify loops and traces, they identify temporally correlated blocks of code directly from traces. The authors report 41.9% savings in energy when compared to a similar sized cache. Part of this large saving comes from their SPM controller and DMA support for scratchpad transfers.

The approach by Udayakumaran et al. [18] annotates the program CFG (control flow graph) with timestamps to reason about the eviction/placement strategy. Both code and
data objects are considered, as well as the stack. Because data is considered, the authors use a run-time disambiguator to correct memory references, which causes some overhead. Energy savings of 31.3% on average are reported as compared to a static method.

The approach by Ravindran et al. [15] considers only instructions, and uses an iterative liveness analysis of traces to hoist the allocation of traces upwards in the program CFG. This is because naively loading a trace in all of its basic blocks at its entry will result in a sub-optimal allocation strategy that can be improved upon. The advantage with this method is that it is a heuristic method and requires much less computation than solving an ILP problem.

The drawback with dynamic allocation is that it causes aliasing problems; memory references to the SP become invalid when its allocation is changed during program execution. Furthermore, dynamic allocation may also be a problem in strict real-time applications, because of the extra copy code inserted to handle the scratchpad at various program locations. In addition, if aliasing problems are addressed, a run-time disambiguator costs additional processor time. To avoid aliasing, but still get some benefits from dynamic allocation, data could be allocated statically and code dynamically, largely avoiding the aliasing problems (this is still a problem with code called via function pointers).
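The sketch below is a highly simplified rendering of the "typical structure" described above, not any of the cited algorithms: given live ranges of candidate objects, it picks load and evict points greedily. Program points, object names, sizes and live ranges are all invented for the example.

# Simplified sketch of the generic dynamic-allocation structure of Section 4.2:
# given live ranges of candidate objects, emit load/evict points greedily.
# Not one of the cited algorithms; all inputs are invented for illustration.

def plan_dynamic_allocation(live_ranges, sizes, spm_size):
    """live_ranges: {obj: (first_use, last_use)} over numbered program points."""
    events = []                       # (program_point, "load"/"evict", obj)
    resident, used = set(), 0
    points = sorted({p for r in live_ranges.values() for p in r})
    for point in points:
        # Evict objects whose live range ended at an earlier point.
        for obj in [o for o in resident if live_ranges[o][1] < point]:
            resident.remove(obj)
            used -= sizes[obj]
            events.append((point, "evict", obj))
        # Load objects that become live here, if they still fit.
        for obj, (start, _end) in live_ranges.items():
            if start == point and obj not in resident and used + sizes[obj] <= spm_size:
                resident.add(obj)
                used += sizes[obj]
                events.append((point, "load", obj))
    return events

ranges = {"hot_loop": (1, 40), "init_code": (1, 5), "filter_tbl": (10, 30)}
sizes = {"hot_loop": 600, "init_code": 300, "filter_tbl": 400}
print(plan_dynamic_allocation(ranges, sizes, spm_size=1024))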
4.3. Multi-banked scratchpad allocation

Dividing the memory hierarchy into several smaller memory banks, instead of using a single monolithic memory bank, has many advantages. Smaller banks consume less energy per access and unused banks can be turned off to save static leakage energy. Furthermore, memory accesses can be made in parallel, giving additional performance [8]. Multi-banked scratchpad allocation has been studied in e.g. [22, 12].

The results of Wehmeyer et al. [22] show that using many smaller scratchpads instead of a large one becomes beneficial when a program is large enough to be able to utilize a bigger scratchpad size. Energy savings of up to 22% were reported as compared to a single scratchpad system, for a total of 32 kB SPM size in both cases.

Kandemir et al. [12] on the other hand focus on optimizing array accesses in loop nests in a multi-banked scratchpad system in order to minimize leakage current loss. The motivation for this is that the speed and density of CMOS transistors are expected to rise in the future, so that static leakage management is expected to become very important. Their method is based on optimizing bank locality, which means that successive SPM accesses should go to the same bank as much as possible, which makes it possible to put the other banks in a low-power idle state for as long a time as possible. Turning the memory banks on and off results in small performance penalties, but an average leakage energy of over 40% is saved over all the benchmarked programs.

4.4. Scratchpad allocation in a multitasking environment

Scratchpad allocation in multitasking systems has also been considered in [13, 4].

The problem considered in the first paper is how to choose an appropriate static allocation for code and data for a set of statically scheduled processes on a single scratchpad. Furthermore, the execution time of the processes and their energy consumption are assumed to be known a priori. The goal is to minimize the energy consumption over the entire set of processes.

The allocation strategies considered are saving, non-saving and hybrid (which is a mixture of the two previous ones). The saving approach allocates the single scratchpad completely to the active process, while the non-saving approach divides the scratchpad evenly among all processes. The hybrid approach uses a common memory area for all processes, but also areas that remain dedicated to certain processes. This reduces the overhead of context switches by having a common region which can be used for shared data that all processes use frequently. Therefore, context switches matter only for the saving and hybrid approaches; they cause some overhead, but allow the scratchpad to be better utilized if it is small.

As a result, it was found that the non-saving approach worked best for large SPM sizes (1-4 kB), while the saving approach worked best for small SPM sizes (up to 512 B). The hybrid approach, on the other hand, worked well for all scratchpad sizes, but required the most computational time. Energy savings of 9-20% were reported as compared to a non-saving allocation that does not attempt to minimize the energy consumption over all processes.

The second paper [4] presents a novel way of using a scratchpad in a virtual memory system with an MMU (memory management unit) to store swapped-in pages in an SPM. Their page allocation algorithm considers only a single-process system and code allocation, but they state that their method is easily extended to a multi-process environment and data allocation. A 33% reduction in energy consumption is reported, as compared to a fully-cached configuration.

4.5. Complementary scratchpad optimizations

In this section we cover some complementary memory optimizations that can be used together with a scratchpad to save even more energy than with a scratchpad alone.

A hardware-controlled loop cache has been studied together with an Instruction Register File [7]. It was found
that allocating frequently used instructions into a register file along with a hardware-controlled loop cache can save more energy than using these in isolation. Here, we see that instead of a hardware-controlled loop cache, a scratchpad could be used, saving even more energy.

Processor caches have also been studied together with an SPM in [20], where the authors studied a memory hierarchy consisting of a scratchpad together with an I-cache, and a static allocation algorithm for code objects was used. It was found that using a scratchpad along with a cache can result in poor energy savings if the cache behavior is not taken into account, resulting in needless cache thrashing. Therefore, cache misses and hits need to be taken into account when deciding what code objects to place on the scratchpad. Their formulation of the problem is a nonlinear optimization problem, which consists of a cache model represented as a conflict graph. This problem is then linearized and solved optimally and near-optimally as an ILP problem.

5. Compiler frameworks for memory optimizations

On the other hand, the work of [11] presents an energy-aware compilation framework (EAC) and focuses only on high-level energy optimizations, including memory optimizations. Array-dominated programs are common in DSP and multimedia applications, and large energy savings can be obtained by source-level optimizations. The authors take the view that simulation and profiling take too long a time, so the code is analyzed statically instead, taking as input technology parameters (memory model, buses etc.). Validation of the methods is still performed with a simulator.

The previously mentioned frameworks have a drawback in that they are meant for research use, and are not yet mature for common use. In the author's opinion, the problem with these frameworks is also that they rely on previously developed components which do not necessarily fit well together, creating an unnecessarily complex tool chain with many intermediate formats. Also, integrating energy-awareness in a compiler architecture may not be a good idea in the long run, because it would tie the users to a specific compiler. Otherwise, energy-awareness would need to be integrated separately in each compiler.
[10] Ramkumar Jayaseelan, Tulika Mitra, and Xianfeng Li. Estimating the Worst-Case Energy Consumption of Embedded Software. In Proceedings of the 12th [...].

[19] Manish Verma and Peter Marwedel. Overlay Techniques for Scratchpad Memories in Low Power Embedded Processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(8):802–815, August 2006.
[20] Manish Verma, Lars Wehmeyer, and Peter Marwedel.
Cache-Aware Scratchpad-Allocation Algorithms for
Energy-Constrained Embedded Systems. IEEE Trans-
actions on Computer-Aided Design of Integrated Cir-
cuits and Systems, 25(10):2035–2051, October 2006.
[21] Manish Verma, Lars Wehmeyer, Robert Pyka, Peter
Marwedel, and Luca Benini. Compilation and Sim-
ulation Tool Chain for Memory Aware Energy Op-
timizations. In Proceedings of the 6th International
Workshop on Embedded Computer Systems: Architec-
tures, Modeling, and Simulation (SAMOS’06), pages
279–288, Samos, Greece, July 2006.
[22] Lars Wehmeyer, Urs Helmig, and Peter Marwedel.
Compiler-optimized Usage of Partitioned Memories.
In Proceedings of the 3rd Workshop on Memory Per-
formance Issues (WMPI’04), pages 114–120, Munich,
Germany, June 2004.
[23] Joseph Zambreno, Mahmut T. Kandemir, and Alok N.
Choudhary. Enhancing Compiler Techniques for
Memory Energy Optimizations. In Proceedings of
the Second International Conference on Embedded
Software (EMSOFT’02), pages 364–381, Grenoble,
France, October 2002.
Measuring the CPU energy consumption of a modern mobile device
Antti P Miettinen
[email protected]
Abstract
Modern mobile devices employ complex system-on-
chip (SoC) processors as their main computing en-
gine. Inside a SoC several functional blocks can
share the same external supply voltage line. This
makes measurement based study of a single block,
e.g. the main CPU, challenging as the current
drawn by an individual block cannot be measured
directly.
The goal of this work was to characterize the
feasibility of using simple board level current mea-
surement instrumentation for studying the energy
consumed by program code run on the ARM926
core inside the OMAP1710 SoC. The results indi-
cate that by using carefully planned experimental
setups energy consumption related to the following
factors can be studied at least qualitatively:
• instruction path bit switching

[Figure 1: ARM926 core inside OMAP1710, picture from [1]]
1 Introduction
The environment for developing software for an em-
bedded hardware target often consists of
– additional trace and debug instrumenta-
tion
As the ETM trace provides a cycle accurate in-
struction execution trace the energy per instruction
can be calculated from cycles per instruction (CPI)
and CPU clock speed:
t_insn = CPI / f  ⇒  E_insn = P · CPI / f        (4)
The ETM trace provides also a convenient
method for choosing the current samples corre-
sponding to execution of only the test code as the
Trace32 is able to display the A/D samples in sync
with the ETM trace.
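Following Equation (4), the sketch below turns a measured average power level, the CPI obtained from the ETM trace and the CPU clock speed into an energy-per-instruction figure; the numeric values are placeholders, not measurements from this work.

# Energy per instruction from Equation (4): E_insn = P * CPI / f.
# The numeric values below are placeholders, not measured results.

def energy_per_instruction(power_mw, cpi, clock_hz):
    """Return energy per instruction in nanojoules."""
    t_insn = cpi / clock_hz                  # seconds per instruction
    e_insn_j = (power_mw * 1e-3) * t_insn    # joules = watts * seconds
    return e_insn_j * 1e9                    # nJ

# Example: 45 mW attributed to the test loop, CPI of 1.2, 150 MHz clock.
print(f"{energy_per_instruction(45.0, 1.2, 150e6):.3f} nJ/instruction")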
Although the methodology seems trivial, the
challenge lies in obtaining the power level that is
representative for the power level of the factor of
interest. As the OMAP1710 contains much more
than just the ARM926, absolute power levels are
not of interest. Another important issue is to assess
the repeatability of the measurements because of
unknown uncontrollable factors in the setup. A
rough measure for the contribution of secondary
factors can be obtained by measuring the power
level when the ARM926 is in the wait-for-interrupt
mode. This still includes power contribution from
the ARM core but for the purpose of this work, the
ARM idle power is also a secondary factor.
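A minimal sketch of the kind of post-processing implied here: average the A/D current samples taken during the test loop, subtract the wait-for-interrupt baseline, and report a sample standard deviation as a rough error estimate. The sample values and the supply voltage are invented for the example.

# Rough post-processing of A/D current samples: subtract the wait-for-interrupt
# baseline and report mean +/- sample standard deviation. All numbers invented.
from statistics import mean, stdev

def loop_power_mw(samples_ma, baseline_ma, supply_v=1.5):
    """Average power attributed to the test loop, in mW, with a crude error bar."""
    net_ma = [s - mean(baseline_ma) for s in samples_ma]
    p_mw = [i * supply_v for i in net_ma]          # P = U * I (mA * V -> mW)
    return mean(p_mw), stdev(p_mw)

test_samples = [31.8, 32.4, 32.1, 31.9, 32.6]      # mA during the test loop
wfi_samples = [2.9, 3.1, 3.0]                      # mA in wait-for-interrupt mode
print("%.1f mW +/- %.1f mW" % loop_power_mw(test_samples, wfi_samples))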
As the sampling frequency of the A/D converter is orders of magnitude smaller than the clock speed of the circuit being measured, the only feasible way to study instruction level effects seems to be constructing test programs that exercise the phenomenon of interest for a sufficiently long time and to use averaging to obtain a representative power value.

For estimating the error in the value obtained by averaging, the straightforward approach is to use e.g. the sample standard deviation. However, both the sample mean and standard deviation can give biased estimates. The current drawn by a digital circuit is by nature composed of current peaks. Sampling might get synchronized and therefore the average value could be biased.

For energy measurements it would make sense to perform the A/D conversion by integrating the current over the sampling period. This could be achieved by analog integrating circuitry, but implementing this was not feasible because of the time constraints of this work.

2.3 Test programs

2.3.1 General

As the goal of this work was simply to establish whether a given effect is measurable, the approach to the problem was more or less ad hoc. However, a brief look into the ARM9 microarchitecture [8] is useful for constructing feasible tests.

[Figure 4: ARM926 overview from [9]]

The ARM926EJ-S inside the OMAP1710 is a member of the ARM9 family of general-purpose microprocessors and is targeted at multitasking operating systems with full memory protection and virtual memory support. The caches and MMU can be significant power consumers and can be studied as primary factors, but must also be addressed as secondary factors when studying e.g. core internal factors. The JTAG and ETM instrumentation also consume power, so the test setup should strive to keep the contribution of those blocks as constant as possible.

The actual ARM9EJ-S [10] processor core implements the ARM v5TE architecture. This includes support for the ARM, Thumb and Jazelle instruction sets and DSP instructions. The pipeline consists of five stages:

• Instruction fetch
• Decode
• Instruction fetch
• Data read
• Data write
• Functional unit activity
• Register access

[Figure 5: ARM9 pipeline from [8]]
To limit the scope of this work, the test pro-
grams were constructed so that instruction and
data fetches occur always from cache, i.e. code se-
quences of test loops were kept smaller than instruc-
tion cache (32k) and data accesses were localized
to areas smaller than data cache (16k). To max-
imize the proportion of instructions that exercise
the factor under study, test loops were constructed as
instruction sequences of 4096 instructions repeating
the effect being measured. The loop overhead was
below ten instructions so the loop overhead effect
on power should be quite small.
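As an illustration of how such test loops could be generated (the paper does not show its generator), the sketch below emits an ARM assembly loop with 4096 repetitions of the instruction under test and only a few overhead instructions; the register choices, label names and iteration count are arbitrary assumptions.

# Sketch of a generator for the kind of test loop described above: 4096 copies
# of the instruction under test inside a loop with only a few overhead
# instructions. Register choices and label names are arbitrary.

def make_test_loop(test_insn, repeats=4096, iterations=100000):
    lines = [
        "        .text",
        "        .global test_loop",
        "test_loop:",
        f"        ldr     r4, ={iterations}",
        "1:",
    ]
    lines += [f"        {test_insn}"] * repeats        # body: the measured effect
    lines += [
        "        subs    r4, r4, #1",                  # loop overhead: 3 instructions
        "        bne     1b",
        "        bx      lr",
    ]
    return "\n".join(lines)

# 4096 ARM instructions of 4 bytes each is 16 kB of code, well under the
# 32 kB instruction cache mentioned in the text.
print(make_test_loop("mov r0, r0")[:200])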
Two test cases were constructed for measuring the effect of bit toggling in instruction words. One test varies the operand register of a mov rx,rx instruction, allowing the number of alternating bits to be varied [...]. The other test uses a move immediate instruction, allowing bit variation between zero and [...].

[Figure: energy as a function of the number of alternating bits for the nop and move immediate test loops (instruction path bit switching).]
3.4 Functional unit activity

The relative energies for the tested data processing instructions are shown below:

instruction      minimum          maximum
mul              2 × (48 ± 1)     2 × (62 ± 1)
add              38 ± 1           39 ± 1
mov with shift   39 ± 1           39 ± 1

The instructions were run with different data values as operands and, as can be seen, multiplication shows a significant data dependency. The high cost of multiplication is clearly measurable already before taking CPI into account. On the other hand it seems that the adder and shifter consume negligible energy (the power level difference to e.g. nop is below the error margin).

3.5 Register access

Figure 7 shows the relative energy of a mov r0,r1, mov r0,r2 code sequence as a function of the number of alternating bits in the values in registers r1 and r2. As for the instruction path, the trend is very clear.

[Figure 8: Register bank bit switching effect — energy as a function of the number of alternating bits (0–30).]
4 Conclusions and future work

It seems that the only significant functional unit energy-wise is the multiplier. The effects of e.g. bit switching in the instruction path and in the register bank, as well as the data cache reads and writes, are clearly measurable, but with the performed tests it was not possible to measure the effects of e.g. shift and addition.

The measurements performed in this work were quite limited. For example, comparing ARM and Thumb execution was completely omitted, as was all testing related to the branch instructions. Also, only addition was tested as an arithmetic-logic operation. Possible future work could include more complete tests and comparisons to e.g. ARM11 and Cortex architectures.

References

[1] Texas Instruments OMAP1710 overview. http://focus.ti.com/...

[2] Joint Test Action Group, standard test access port and boundary-scan architecture, IEEE 1149.1. http://en.wikipedia.org/wiki/JTAG.

[3] Embedded trace macrocell architecture specification. http://www.arm.com/pdfs/...

[4] Lauterbach Datentechnik GmbH debug and trace products. http://www.lauterbach.com/.

[5] ARM926EJ-S technical reference manual. http://arm.com/pdfs/DDI0198D 926 TRM.pdf.

[6] Ubuntu, community developed Linux-based operating system. http://www.ubuntu.com/.

[7] Dan Kegel. Building and testing gcc/glibc cross toolchains. http://kegel.com/crosstool/.

[8] The ARM9 family - high performance microprocessors for embedded applications. In ICCD '98: Proceedings of the International Conference on Computer Design, page 230, Washington, DC, USA, 1998. IEEE Computer Society.
Energy-Aware Scheduling
Juhani Peltonen
Software Technology Laboratory/TKK
[email protected]
Kristian Söderblom
Software Technology Laboratory/TKK
[email protected]
A technique to create an energy consumption model for a RISC processor using measurements and a regression model was presented in [6]. Experiments showed an average energy estimation error of 2.5% for random instruction sequences.
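As an illustration of the measurement-plus-regression idea of [6] (not the authors' actual model), the sketch below fits per-instruction-class energy coefficients to the measured energies of test sequences with ordinary least squares; the instruction classes, counts and energies are all invented.

# Illustrative least-squares fit of per-instruction-class energy costs, in the
# spirit of the regression-based model of [6]; data and classes are invented.
import numpy as np

# Each row counts how many instructions of each class a test sequence executes.
#                  [alu, load, store, branch]
counts = np.array([[800, 100,  50,  50],
                   [400, 400, 100, 100],
                   [200, 100, 600, 100],
                   [500,  50,  50, 400],
                   [300, 300, 300, 300]], dtype=float)

measured_energy_nj = np.array([313.0, 444.0, 566.0, 331.0, 569.0])  # invented

coeffs, *_ = np.linalg.lstsq(counts, measured_energy_nj, rcond=None)
for name, c in zip(["alu", "load", "store", "branch"], coeffs):
    print(f"{name:6s} ~ {c:.3f} nJ per instruction")

# The fitted coefficients can then predict the energy of new instruction mixes.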
[3] I. Kadayif, M. Kandemir, G. Chen, N. Vijaykrishnan, M. J. Irwin, and A. Sivasubramaniam. Compiler-Directed High-Level Energy Estimation and Optimization. ACM Transactions on Embedded Computing Systems (TECS), 4(4):819–850, November 2005.

[4] Paul Landman. High-Level Power Estimation. In Proceedings of the 1996 International Symposium on Low Power Electronics and Design, pages 29–35, Monterey, California, United States, August 1996.

[5] Mike Tien-Chien Lee, Vivek Tiwari, Sharad Malik, and Masahiro Fujita. Power Analysis and Minimization Techniques for Embedded DSP Software. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 5(1):123–135, March 1997.

[6] Sheayun Lee, Andreas Ermedahl, and Sang Lyul Min. An Accurate Instruction-Level Energy Consumption Model for Embedded RISC Processors. In LCTES '01: Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems, 2001.

[7] E. Macii, M. Pedram, and F. Somenzi. High-level power modeling, estimation, and optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(11):1061–1079, November 1998.

[13] V. Tiwari, S. Malik, A. Wolfe, and T. C. Lee. Instruction Level Power Analysis and Optimization of Software. Journal of VLSI Signal Processing, 13(2):1–18, August 1996.

[14] Vivek Tiwari, Sharad Malik, and Andrew Wolfe. Power Analysis of Embedded Software: A First Step Towards Software Power Minimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2(4):437–445, December 1994.

[15] Lars Wehmeyer, Urs Helmig, and Peter Marwedel. Compiler-optimized Usage of Partitioned Memories. In Proceedings of the 3rd Workshop on Memory Performance Issues (WMPI'04), pages 114–120, Munich, Germany, June 2004.

[16] Yrjö Neuvo. Cellular phones as embedded systems. In 2004 IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, February 2004.
Energy Accounting

Timo Töyry
Software Technology Laboratory/TKK
[email protected]
Abstract

In this paper I will shortly present the two methods most commonly used in energy accounting: model based energy accounting and measurement based energy accounting. I will also do a short survey of today's computer power management in the form of ACPI.

1 Introduction

Energy consumption is now one of the biggest challenges of today's mobile devices. As a traditional solution, almost every chip and other device has many different energy saving modes, but they are not used efficiently in today's operating systems, since devices are put to a lower energy mode after some timeout period and then put back to a higher energy mode when some process needs them.

There is a new, different approach to this issue: what if the user is given the power to choose how long the system should stay running with a certain set of programs? The user then prioritizes his or her running programs in relation to each other to reflect their importance, and the operating system takes care that the user's requirements are met, if they are realistic. One method to achieve this is so-called energy accounting.

The term energy accounting could be defined for example as follows: a system which allows the operating system to control the amount of energy consumed by some program. This requires that the operating system has knowledge about that program's energy usage. The knowledge about a program's power consumption can be obtained in many different ways. Probably the two most common methods are model based energy accounting, which uses a model of the system to calculate estimates for the used energy, and measurement based energy accounting, which employs real-time measurements of the power usage of the system. In the following chapters I will introduce today's standard in computer power management, ACPI, and these two methods in a little more detail, along with prototype examples of both.

2 ACPI

The ACPI (Advanced Configuration and Power Interface) specification was originally introduced by Intel, Toshiba and Phoenix in December 1996. Compaq (now Hewlett-Packard) and Microsoft joined the development group later. The ACPI specification has been updated a couple of times after its initial introduction; the current version is 3.0b, which was published in October 2006. The ACPI specification defines common interfaces for energy management for both software and hardware. ACPI allows the operating system to control the energy management of the whole system as well as individual devices. The core of ACPI is the ACPI System Description Tables, which expose the power saving modes of the hardware to the operating system. Usually ACPI hardware support means only that the hardware provides these tables to the software layer above. These tables list, for example, the energy modes supported by the device. ACPI is in wide use nowadays; actually it is the most commonly used energy saving interface in all kinds of computers today. But unfortunately ACPI does not provide any information about energy consumption to software (the operating system). For this reason ACPI is not usable for energy accounting. [1], [2], [3]

3 Model based energy accounting

Model based energy accounting employs some model of the used hardware to estimate its energy consumption. These systems do not usually have any feedback from the hardware about real energy consumption, but when the model is developed, the energy consumption of the system in certain states is measured and the model is then fitted to the measurement data. There are a couple of different approaches to building an energy model for a system [4], [6].

Event counter based models employ the hardware performance counters available in the system as input to the model. Performance counters are configured to measure events relevant for energy consumption, such as CPU cache misses. This kind of model can be used only for CPUs, since other devices do not usually have any performance counters. The only exception is the main memory, whose usage can be measured, although indirectly, via CPU cache misses and memory write-backs. The accuracy of the model is relative to the amount of input data from the performance counters. The model itself is usually very simple and does not require much computing, which makes its results suitable to be used as input for an energy accounting algorithm in a running system. The simple model is also the weak point of this approach, because simplifications can cause inaccuracies in the estimation [4], [6].
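A minimal sketch of an event counter based model as described above: the energy charged to a process is a weighted sum of performance-counter deltas, with the weights fitted beforehand against measurements. The counter names and weight values here are invented examples.

# Minimal event-counter energy model: estimated energy is a weighted sum of
# performance-counter deltas. Counter names and weights are invented examples;
# in a real system the weights are fitted against measured energy.

ENERGY_WEIGHTS_NJ = {            # nJ charged per counted event (assumed values)
    "cycles":          0.25,
    "instructions":    0.10,
    "l1_cache_misses": 4.0,
    "mem_writebacks":  6.0,
}

def estimate_energy_nj(counter_deltas):
    return sum(ENERGY_WEIGHTS_NJ[name] * count
               for name, count in counter_deltas.items())

# Counter deltas read at the end of a scheduling quantum for one process:
deltas = {"cycles": 2_000_000, "instructions": 1_400_000,
          "l1_cache_misses": 12_000, "mem_writebacks": 3_000}
print(f"{estimate_energy_nj(deltas) / 1e6:.3f} mJ charged to the process")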
State based models represent the system as a state machine which changes between different energy states. The states are changed corresponding to the actions of the running program. When the model is built, its energy states are calibrated to match the real energy consumption of each state. This is done by measuring the energy consumption of the components related to the inspected energy state and adjusting the model when required to match the measured values. These models usually give more accurate estimates of the system's energy consumption than event counter models. The good point of this approach is that it is an all-software solution, but the model cannot represent any variation in energy consumption within states, so the accuracy of the model is essentially defined by the number of states used. In energy accounting, the cost of using a device is usually charged to the process which causes that device to change its state to a higher energy state [4], [6].

A third approach to obtaining estimates for the energy consumption of a system is statistical modeling. The energy estimates are calculated statistically from measurement data which is collected while test programs are run on the system. The measurement data usually contains the process id and the value of the program counter of the currently running process, and the measured total energy consumption of the system. The sample rate of the measurement data is quite low compared to the clock rate of the CPU, but if data are collected for long enough, there will be a statistically significant amount of samples for every instruction of the inspected program. Estimates obtained by this method are more detailed and accurate than estimates from both event counter based and state based models [4], [6].

3.1 ECOSystem

[ECOSystem is built around the currentcy] model, because it forms a common unit for energy allocation and accounting over different hardware devices and software, in which different processes are competing for limited hardware resources [6].

One unit of currentcy is defined as the right to consume a certain amount of energy in a certain period of time. In the prototype system one currentcy is specified to correspond to 0.01 mJ of battery energy [6].

In the currentcy model, power costs consist of two parts: the first part is the so-called base cost and the second part is the cost from managed devices. The base cost is defined to include the lowest power states of the managed devices and the default power state of the unmanaged devices. The second part of the cost comes from use of the managed devices (in the prototype the CPU, hard drive and WLAN) such that they go to a higher energy state. All managed devices may have their own charging policies for energy states higher than the lowest one. Base costs are not charged from the processes' currentcy containers, but they are taken into account in the total energy consumption when targeting a certain battery life [6].

There are two main aspects in currentcy allocation. The first and most important is the target battery life selected by the user. The target battery life specifies the amount of currentcy available in each epoch; an epoch is the time interval in which the allocated currentcy should be consumed. The second is the currentcy allocation to competing processes in each epoch, which is done by user-given priorities. If a process consumes all its currentcy, it will be halted until the next epoch and a new currentcy allocation, even if it is otherwise ready to run. Processes can also accumulate some unused currentcy for the next epochs to pay for some more expensive task. However, currentcy accumulation is quite strictly limited to avoid situations where many processes have lots of currentcy to spend in a single epoch. If these wealthy processes used all their currentcy at once, there would be a heavy peak in the battery discharge rate, which is an unwanted effect that can reduce battery life [6], [5].

The processes are required to pay for the usage of managed devices when they are going to use them; for example, a process pays for execution time on the CPU before it is executed. The process can be executed as long as it has currentcy left to pay for the execution [6].
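The sketch below is a toy version of the epoch-based currentcy scheme described above: each epoch the target battery lifetime fixes the total currentcy, which is split by priority, and a task may run only while its container can cover the cost. The 0.01 mJ per currentcy follows the prototype; the other numbers and the cap on saved currentcy are illustrative, not values from [6].

# Toy epoch-based currentcy allocation in the spirit of ECOSystem [6].
# 1 currentcy = 0.01 mJ (as in the prototype); other numbers are invented.

CURRENTCY_MJ = 0.01

def allocate_epoch(battery_mj, remaining_epochs, priorities, containers, carry_cap=1000):
    """Split one epoch's currentcy among processes according to priority."""
    epoch_currentcy = (battery_mj / CURRENTCY_MJ) / remaining_epochs
    total_prio = sum(priorities.values())
    for pid, prio in priorities.items():
        carried = min(containers.get(pid, 0), carry_cap)   # saved currentcy is strictly capped
        containers[pid] = carried + epoch_currentcy * prio / total_prio
    return containers

def charge(containers, pid, device_cost_currentcy):
    """A process pays before using a managed device; returns True if it may run."""
    if containers[pid] >= device_cost_currentcy:
        containers[pid] -= device_cost_currentcy
        return True
    return False            # out of currentcy: halted until the next epoch

# A ~5 Wh battery (18,000,000 mJ) spread over 10 hours of one-second epochs.
containers = allocate_epoch(battery_mj=18_000_000, remaining_epochs=36_000,
                            priorities={"browser": 3, "sync": 1}, containers={})
print(containers)
print(charge(containers, "sync", device_cost_currentcy=15_000))   # -> False, halted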
References