
An Integrated Optimization Framework for Reducing the Energy Consumption of Embedded Real-Time Applications

Hideki Takase, Gang Zeng, Lovic Gauthier, Hirotaka Kawashima, Noritoshi Atsumi, Tomohiro Tatematsu, Yoshitake Kobayashi, Shunitsu Kohara, Takenori Koshiro, Tohru Ishihara, Hiroyuki Tomiyama and Hiroaki Takada
Nagoya University, C3-1 (631), Furo-cho, Chikusa-ku, Nagoya, 464-8603 Japan. Email: {takase, sogo, hkawashi, atsumi, tatematsu, hiro}@ertl.jp
Kyushu University, 3-8-33 Momochihama, Sawara-ku, Fukuoka, 814-0001 Japan. Email: {lovic, ishihara}@slrc.kyushu-u.ac.jp
Toshiba Corporation, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, 212-8582 Japan. Email: {yoshitake.kobayashi, shunitsu.kohara, takenori.koshiro}@toshiba.co.jp
Ritsumeikan University, 1-1-1 Noji-Higashi, Kusatsu, Shiga, 525-8577 Japan. Email: [email protected]
The Japan Society for the Promotion of Science
Abstract: This paper presents a framework for the energy optimization of embedded real-time systems. We implemented the presented framework as an optimization toolchain and an energy-aware real-time operating system. Our framework is synthetic, that is, multiple techniques optimize the target application together. The main idea of our approach is to utilize a trade-off between energy and performance of the processor configuration. The optimal processor configuration is selected at each appropriate point in the task. Additionally, an optimization technique for the memory allocation is employed in our framework. Our framework is also gradual, that is, the target application is optimized in a step-by-step manner. The characteristics and the behavior of target applications are analyzed and optimized at both the intra-task and inter-task levels by our toolchain at the static time. Based on the results of the static-time optimization, the runtime energy optimization is performed by a real-time operating system according to the behavior of the application. A case study shows that energy reduction is achieved for the average case while the real-time performance is kept.

Index Terms: energy optimization, embedded application, development environment.

I. INTRODUCTION

Our life is surrounded by a large number of devices into which computers are embedded. If even a small energy reduction per device can be achieved, it makes a meaningful contribution to society as a whole. Energy savings have thus become one of the primary goals in embedded systems; the amount of energy consumption may even decide the commodity value of a product. A number of techniques have been proposed for the energy optimization of embedded applications so far. However, sufficient man-hours for energy optimization have not been invested in the development process of embedded applications, due to their growing scale and complexity. For the purpose of energy minimization, it is advisable to apply as many techniques as possible. Energy optimization can be performed at several layers, i.e., the compiler layer, the hardware layer, and the Operating System (OS) layer. It is desirable to save more energy by coordinating all three layers. However, a simplistic combination of individual techniques may negate the effect of another optimization. A carefully designed coordination of the techniques is required to achieve a synergistic effect on the energy consumption. Additionally, an automatic optimization toolkit for embedded real-time applications should be provided to promote energy optimization in the development process.

This paper presents a synthesizing and gradual framework for the energy optimization of embedded real-time applications in single-processor environments. We implement an optimization toolchain for the static time and an energy-aware real-time operating system (RTOS) for the runtime optimization. The term synthesizing means that our framework includes multiple optimization techniques. The framework coordinates a reconfigurable capability of the processor, an RTOS support, and a compiler function to achieve maximal energy reduction. The main idea of our approach is to utilize a trade-off between energy and performance of an embedded processor configuration. An optimization technique for the memory allocation is also incorporated in our framework. The term gradual means that the whole optimization process applies several phases; more precisely, there are three phases, i.e., the intra-task, the inter-task, and the runtime optimization phases. The intra-task and inter-task phases optimize the characteristics and the behavior of the target application at the static time. Then, the runtime optimization is performed with the results of the static optimization. It aims at energy minimization for the average case while guaranteeing the task deadline constraints. In the embedded domain, there is a commonly held view that the characteristics and runtime behavior of the application are known at the design phase.
Thus, it is possible to obtain them with our analysis and optimization toolchain at the static phases. The analysis and optimization results are saved as management tables.

978-1-61284-660-6/11/$26.00 © 2011 IEEE

Fig. 1. The overview of the energy optimization framework: the intra-task optimization (simulation, checkpoint extraction, intra-task memory allocation, DEPS profiling) takes the task code and test data and produces linker information and a DEPS profile; the inter-task optimization (budget distribution, inter-task SPM partitioning, compilation) takes the task set and hardware configuration information and produces the DEPS table, the SPM table, and a loadable module; the runtime optimization performs SPM switching, slack estimation, and DEPS.

Then, our RTOS utilizes the results stored in the management tables for the runtime optimization. It should be noted that although our approach employs application-dependent optimizations, it is widely applicable to many embedded applications. The implementation overhead of our toolchain and RTOS is reasonable because the modification of the source code is controllable, and most energy/time-related information is explored at the static time. Additionally, our toolchain contributes to reducing the man-hours of embedded application developers, since the energy optimization process is designed to be performed automatically. The effectiveness of our framework is demonstrated by a case study of a video-conference system.

There have been a number of publications on energy optimization for embedded applications. AbouGhazaleh et al. proposed a collaborative approach between the compiler and the OS [1]: the compiler extracts temporal information, and the OS periodically invokes an interrupt service routine to perform dynamic voltage and frequency scaling (DVFS). Yuan et al. proposed a cross-layer adaptation framework for soft real-time applications [2]. Seo et al. proposed an algorithm to solve the combined optimization problem of both intra-task and inter-task DVFS [3]. Azevedo et al. proposed the use of program checkpoints for intra-task DVFS, in which the checkpoints are decided by profiled information [4]. Suhendra et al. proposed an integer programming formulation that explores task mapping, scheduling, SPM partitioning, and data allocation simultaneously [5]. Although some of the previous work has employed a compiler to extract the DVFS-related information, it did not consider the management of the memory allocation at the same time. Additionally, all these works aimed at DVFS applications, and they cannot be extended directly to processors with a reconfigurable capability.
Moreover, no existing toolchain and RTOS support the energy optimization of all three layers. In summary, this paper makes the following contributions:
- We present an automatic framework to optimize the energy consumption of embedded real-time applications. Our framework is designed to support the coordination of all three layers for energy savings.
- We implement the framework as a static optimization toolchain and an energy-aware RTOS. To the best of our knowledge, this is the first work to present the whole optimization framework as a practical toolchain.
- We make a case study to validate the effectiveness of the framework and demonstrate its implementation overhead.

The rest of this paper is organized as follows. Section II presents an overview of our framework. Sections III to V describe the respective phases of our framework in the order of procedure. Section VI presents the evaluation of our framework by a case study. Finally, Section VII concludes this paper.

II. OVERVIEW

A. Objective and Target Systems

Our objective is to minimize the energy consumption in the average case for an embedded real-time application. Our framework targets hard real-time embedded systems in which high responsiveness is indispensable to execute the processing exactly. In embedded real-time systems, one of the most important requirements is that all tasks must be completed within their deadlines. Therefore, our optimization framework takes the policy that the task deadline constraints must be guaranteed. As for the target systems, a set of independent periodic tasks or sporadic tasks (aperiodic tasks with a minimal inter-arrival separation) is assumed to constitute the application. Also, we assume that all tasks are scheduled according to priority-based preemptive scheduling with static priority assignment to ensure the real-time performance.

B. Workflow

Fig. 1 shows the workflow of our framework. It consists of three phases. The first is an intra-task optimization phase: the characteristics of each task are analyzed from execution traces obtained by an instruction-set simulation. The second is an inter-task optimization phase: based on the results obtained in the former phase, the application-level (multi-task level) characteristics and behavior are analyzed and optimized. The last is a runtime optimization phase: the runtime optimization achieves maximal energy savings with the two management tables generated at the previous phase. The framework is designed to apply both static and runtime optimization, both compiler-layer and OS-layer optimization, and both intra-task and inter-task optimization in a continuous flow. In the design of our framework, each phase is further divided into several optimization steps. A rough combination of the techniques may negate the optimization effects of each other. Therefore, we decided the order of application of the steps so that the optimization effect of each step is not sacrificed. Moreover, the workflow employs a one-pass optimization manner: the optimization flow is applied continuously and is not turned back to a prior step.

There are three reasons why our framework is designed to perform in a step-by-step manner. First, the runtime overhead of the RTOS becomes low by optimizing the target application as much as possible at the static time. The processing required of the RTOS can be very small, exploiting the attribute of the embedded domain that the characteristics of the application are sufficiently known at the static time. Second, it is easy to append other optimization techniques to the framework later. Since applying many techniques can achieve a better effect on the energy, our framework is expandable to synthesize other energy optimization techniques. Third, a near-optimal result can be achieved even by deriving a locally optimal result one-by-one at each step. Although it would be desirable to integrate the sub-optimization problems into a single large one, it is unrealistic to obtain the optimal solution within a reasonable time due to its large scale. Therefore, our step-by-step optimization procedure balances the computational complexity and the energy savings.

As shown in Fig. 1, the framework takes the task code and its test data sets as the input information. These inputs are provided to a cycle-level simulator to obtain execution traces, which are the main analytic target of the framework. The reason we analyze execution traces is that they include information about the characteristics and the realistic behavior of the target application [6]. Since the optimization is performed based on the execution traces of the target application, our approach is widely applicable to many embedded real-time applications.

C. The Optimization Techniques in Our Framework

This subsection introduces the optimization techniques synthesized into our presented framework.
We employ the dynamic energy and performance scaling (DEPS) technique proposed in [7] as a key technique of our framework. DEPS is a generalization of the commonly used DVFS [8]. The rationale behind DEPS is to exploit a trade-off between performance and energy by selecting different processor configurations. In addition to the voltage and frequency of the processor, any reconfigurable hardware mechanism that can trade performance for energy savings can be considered in DEPS. However, a challenge of DEPS is that the execution time and energy consumption under a specified configuration can no longer be estimated by a simple calculation, as is done for DVFS. We thus proposed a profiling-based solution that measures the execution time and energy by using a large number of traces with different configurations [9], [10]. To evaluate and validate DEPS, we have also developed a multi-performance processor (MPP) [11]. The MPP has two processing elements (PEs) with different voltages and frequencies, and its shared instruction cache is resizable by changing its associativity. A distinct feature of the MPP is that the overhead for changing its configuration is very small in comparison with conventional DVFS processors.
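The selection among configurations reduces to keeping only the Pareto-optimal (WCET, energy) points, as the DEPS profiling step later does. The following is a minimal illustrative sketch of that pruning; the configuration names and numbers are invented, not data from our toolchain:

```python
def pareto_prune(combos):
    """Keep only Pareto-optimal (WCET, AEC) configuration combinations.

    A combination is dominated if another one is at least as fast and
    consumes no more energy. `combos` is a list of
    (name, wcet_cycles, aec_mJ) tuples.
    """
    # Sort by WCET ascending, breaking ties by energy ascending.
    ordered = sorted(combos, key=lambda c: (c[1], c[2]))
    frontier = []
    best_energy = float("inf")
    for name, wcet, aec in ordered:
        # After sorting, a combination survives only if it beats the
        # lowest energy seen among faster-or-equal combinations.
        if aec < best_energy:
            frontier.append((name, wcet, aec))
            best_energy = aec
    return frontier

# Hypothetical combinations: (name, WCET in cycles, AEC in mJ).
configs = [
    ("PE-H/4way", 100, 9.0),
    ("PE-H/1way", 110, 7.5),
    ("PE-L/4way", 180, 5.0),
    ("PE-L/1way", 200, 6.0),  # dominated by PE-L/4way: slower and costlier
]
frontier = pareto_prune(configs)  # keeps 3 of the 4 combinations
```

Only the surviving frontier needs to be stored in the DEPS profile, which keeps the table small even when the raw combination space is large.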

Fig. 2. The workflow of the intra-task phase: the simulation step takes the test data and the task code (with checkpoints) and produces execution traces; the checkpoint extraction, intra-task memory allocation, and DEPS profiling steps then produce the modified task code, the intra-task memory linker information, and the DEPS profile, using the hardware configuration information.

Another technique synthesized within the framework is the optimization of the memory allocation. We have proposed memory allocation techniques for the scratch-pad memory (SPM) in [12], [13]. An SPM is an on-chip SRAM that is faster and more energy-efficient than a cache memory [14]. The basic idea of the techniques using an SPM is that concentrating the allocation of frequently accessed code and data in the SPM brings an energy reduction for embedded systems. The technique of [12] arranges the code and static data placement at the single-task level. Moreover, efficient SPM allocation techniques for multi-task environments were proposed in [13]: the space of the SPM is spatially or temporally partitioned among the tasks for energy reduction.

D. Implementation

In general, the behaviors and performance requirements of an application differ from phase to phase. Our approach tries to set the processor configuration and to allocate code and data to the SPM at runtime for the lowest energy consumption while being subject to the performance constraint. A main issue for our framework is to determine when and which processor configuration should be used. To address this issue, we developed several software tools for fully exploiting the performance/energy characteristics of a given application at the static time. They are implemented to be applicable to the MPP [11], which has the Toshiba MeP [15] as its processor core. We implemented the static analysis and optimization toolchain as a patch to the GNU mep-elf environment [16]. GCC 3.4.6 as the C compiler, Binutils 2.19 as the assembler and linker, and Newlib 1.17.0 as the standard library were used as the basis of our toolchain. Our toolchain modifies the source code of the target application automatically at compile time; no modification by the application developer is needed for the energy optimization. Additionally, we developed an energy-aware RTOS as an extended version of the TOPPERS/ASP kernel [17], an open-source RTOS for single-processor embedded systems.

III. INTRA-TASK OPTIMIZATION PHASE

Fig. 2 shows the workflow of the intra-task optimization phase. The pieces of output information of this phase are a modified task code, intra-task level linker information, and a DEPS profile. This section describes the details of the intra-task phase in the order in which the steps are applied.


A. Simulation

In the first step of the intra-task phase, a number of execution traces are obtained by simulating the target application with various kinds of input data sets. In this work, we used the instruction-set simulator of the MeP processor. It is advisable to obtain a large number of execution traces for the analysis in our framework. Additionally, each input data set is weighted by its occurrence frequency. Different execution paths are generated by using the different data sets. These improve the effectiveness of the optimization for the average-case energy consumption, since the importance of an execution path is related to the occurrence frequency of its data set.

B. Checkpoints Extraction

In our framework, the checkpoints extraction is applied first, before the other steps, since it is less affected by the other optimizations and only a rough behavior analysis of each task is required. The purpose of this step is to insert effective checkpoints into the source code of the tasks. Similar to [4], we define a checkpoint as a location in a program where the appropriate processor configuration may be changed. To achieve this purpose, we have proposed execution trace mining [9], a technique for automatically deriving the characteristics of a task from a set of execution traces. It is important to decide where to insert checkpoints to enhance the effectiveness of DEPS. Checkpoints should be inserted where the remaining worst-case execution time or the behavior of the task (e.g., the cache access rate) changes greatly. These locations are extracted as checkpoints where the energy-efficient processor configuration may change at runtime. Note that only the most effective checkpoints are selected by the technique in [9], since too many checkpoints raise a significant overhead in both execution time and energy consumption. We developed a software tool to extract and insert effective checkpoints automatically.
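As a rough illustration of the selection criterion (not the trace-mining algorithm of [9] itself), a checkpoint candidate can be flagged wherever the remaining WCET drops sharply between consecutive program locations, and only the largest drops kept to bound the runtime overhead. The profile layout, threshold, and limit below are invented for the sketch:

```python
def extract_checkpoints(rwcet_profile, threshold=0.25, max_points=4):
    """Pick candidate checkpoint locations from a remaining-WCET profile.

    `rwcet_profile` maps a program location (e.g. a basic-block id, in
    program order) to the remaining worst-case execution time in cycles
    at that location. Locations where the RWCET drops sharply between
    consecutive blocks are candidates, since the energy-efficient
    processor configuration is likely to change there.
    """
    locs = list(rwcet_profile)
    drops = []
    for prev, cur in zip(locs, locs[1:]):
        before, after = rwcet_profile[prev], rwcet_profile[cur]
        if before > 0 and (before - after) / before >= threshold:
            drops.append((cur, before - after))
    # Keep only the largest drops, bounding the checkpoint overhead.
    drops.sort(key=lambda d: d[1], reverse=True)
    return sorted(loc for loc, _ in drops[:max_points])

# Hypothetical RWCET profile over five basic blocks (cycles remaining).
profile = {0: 1000, 1: 950, 2: 500, 3: 480, 4: 100}
points = extract_checkpoints(profile)  # blocks 2 and 4 show sharp drops
```

A real extractor would also weigh behavioral changes such as the cache access rate, as the text describes; the sketch covers only the RWCET criterion.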
The function of a checkpoint is implemented as an RTOS API. Note that our tool inserts checkpoints into the assembly code instead of the C code, because it is easy to locate a checkpoint in the assembly code after the analysis of assembly-level execution traces.

C. Intra-task Memory Allocation

The next optimization is the memory allocation for each task. The program code and the data memory objects are allocated to the SPM or the main memory depending on the memory access history obtained from the execution traces. It should be noted that this optimization step decides only the amount of SPM used by a task. We implemented a tool to decide the allocation of code and data automatically. The source code of each task is further modified to allocate them to the SPM. It should be noted that no modification to the source code of the target application is made in the following optimization steps. The size of a memory object cannot be changed after the decision of the memory allocation, due to the limited capacity of the SPM.
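The core trade-off in this step is a knapsack-style one: frequently accessed objects compete for the limited SPM capacity. A minimal greedy sketch of the general idea follows; it approximates the energy benefit of an object by its trace access count per byte, which is a simplification of the placement techniques of [12], and all names and numbers are invented:

```python
def allocate_to_spm(objects, spm_capacity):
    """Greedy allocation of code/data memory objects to the SPM.

    `objects` is a list of (name, size_bytes, accesses) tuples, where
    `accesses` is the access count observed in the execution traces.
    Objects with the highest benefit density (accesses per byte) are
    placed in the SPM first until it is full; the rest stay in main
    memory.
    """
    by_density = sorted(objects, key=lambda o: o[2] / o[1], reverse=True)
    spm, main_mem, used = [], [], 0
    for name, size, accesses in by_density:
        if used + size <= spm_capacity:
            spm.append(name)
            used += size
        else:
            main_mem.append(name)
    return spm, main_mem

# Hypothetical memory objects of one task: (name, bytes, trace accesses).
objects = [
    ("fir_coeffs", 2048, 90_000),
    ("log_buf", 8192, 100),
    ("inner_loop", 1024, 400_000),
]
spm, main_mem = allocate_to_spm(objects, spm_capacity=4096)
```

The hot inner-loop code and coefficient table fit in the SPM, while the rarely touched buffer stays in main memory, which matches the "concentrate frequently accessed objects in the SPM" idea stated above.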

Fig. 3. An example of the DEPS profile

Fig. 4. The workflow of the inter-task phase: the budget distribution and inter-task SPM partitioning steps take the task set information, the DEPS profile, the intra-task memory linker information, and the hardware configuration information; together with the modified task code and the kernel/library code, the compile step produces the DEPS management table, the SPM management table, and a loadable module.

The completely modified source code is generated in this optimization.

D. DEPS Profiling

The last step of the intra-task optimization phase is the profiling proposed in [10]. By running a cycle-level simulation with the modified task code and the determined memory allocation, the worst-case execution time (WCET) and the average energy consumption (AEC) of each configuration are obtained at each checkpoint. The DEPS profiling uses these results to calculate the WCET and AEC of the combinations of processor configurations. A challenge is that there are too many combinations of processor configurations to check them exhaustively. Our approach is to keep the Pareto-optimal configuration combinations, which have a higher performance or a lower energy consumption than any other, the remaining combinations being pruned in this analysis. We call the result a DEPS profile (Fig. 3). It consists of a combination of configurations at each checkpoint with their WCET and AEC. Since the DEPS profiling step is applied last in the intra-task phase, highly accurate values of the WCET and AEC can be derived in our framework. We have developed an energy estimation tool based on a technique proposed in [18]. Our tool uses a linear model for energy estimation and finds the coefficients of the model using multiple linear regression analysis. The amount of energy consumption is calculated by using the execution traces and the post-layout gate-level simulation results of the target processor.

IV. INTER-TASK OPTIMIZATION PHASE

Fig. 4 shows the workflow of the second phase of our framework. The application-level behavior is optimized with the information of the task set, i.e., the deadline, the priority, and the activation interval of each task within the application. Additionally, the inter-task optimization uses the results of the


TABLE I
THE HARDWARE CONFIGURATION OF THE MPP

CPU:             PE-H: 60 MHz / 1.8 V; PE-L: 30 MHz / 1.0 V
On-chip memory:  I-cache: 8 KB / 4-way (resizable by changing its associativity)
                 I-SPM: 8 KB
                 D-SPM: 16 KB
Off-chip:        SDRAM: 256 MB
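Section VI states that the toolchain and RTOS treat eight selectable configurations of the MPP. One plausible enumeration, assumed here rather than stated explicitly in the paper, is the cross product of the two processing elements and four instruction-cache associativity settings:

```python
from itertools import product

# Two processing elements of the MPP: (name, MHz, volts).
PES = [("PE-H", 60, 1.8), ("PE-L", 30, 1.0)]
# Resizable 4-way I-cache: assumed associativity settings 1..4 ways.
WAYS = [1, 2, 3, 4]

# 2 PEs x 4 cache settings = 8 selectable configurations.
configs = [(pe, mhz, volt, ways)
           for (pe, mhz, volt), ways in product(PES, WAYS)]
```

Whatever the actual axes are, the point is that the configuration space stays small enough to profile each combination exhaustively before Pareto pruning.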

prior phase, as shown in Fig. 4. Two management tables are the output information of this phase. This phase also generates a loadable module as the conclusive result of the static optimization of the target application.

A. Budget Distribution

In this step, execution time budgets are distributed to the tasks in such a way that the total system energy is minimized and all deadlines are met. To achieve this objective, an integer linear programming problem is constructed from the task set information and the DEPS profile of each task [7]. The output of this step is a DEPS management table for each checkpoint, as shown in Fig. 5. It consists of all the selectable processor configurations for each checkpoint and the corresponding remaining worst-case execution time (RWCET). We developed a software tool to generate the DEPS management table from the DEPS profiles and the application-level information. Note that the configurations are sorted by increasing RWCET and decreasing energy consumption to simplify their use at the runtime phase.

Fig. 5. An example of the DEPS management table

B. Inter-task SPM Partitioning

In a multi-task environment, the SPM is shared among the tasks. The space of the SPM is spatially and temporally partitioned among the tasks in order to maximize the amount of SPM available to each of them, using our tool developed with the technique proposed in [13]. Note that the allocation of each task's memory objects has been decided during the intra-task optimization phase; the inter-task SPM partitioning step determines the address space used by each task. Like the prior step, this step outputs a management table for the runtime SPM optimization, which we call an SPM management table, as shown in Fig. 6. The SPM management table falls into two types, the task-level and the system-level tables; both are utilized at the runtime phase.

Fig. 6. An example of the SPM management table (task-level and system-level tables, covering the ROM and RAM areas)

V. RUNTIME OPTIMIZATION PHASE
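At each checkpoint, the DEPS management table generated above is consulted to pick a configuration. A minimal sketch of that lookup is shown below; the tuple layout is an assumption for illustration, not the RTOS data structure itself:

```python
def select_configuration(deps_table, now, deadline):
    """Select the processor configuration at a checkpoint.

    `deps_table` is the DEPS management table entry for this checkpoint:
    a list of (config, rwcet, energy) tuples sorted by increasing RWCET
    and decreasing energy, as generated at the static phase. `now` and
    `deadline` are in the same time unit as the RWCET values. The
    slowest (lowest-energy) configuration whose RWCET still fits within
    the slack is chosen.
    """
    slack = deadline - now
    chosen = deps_table[0]  # fastest configuration as the fallback
    for entry in deps_table:
        config, rwcet, energy = entry
        if rwcet <= slack:
            chosen = entry  # feasible, and lower energy than the previous
        else:
            break  # all later entries need even more time
    return chosen[0]

# Hypothetical table entry for one checkpoint.
table = [
    ("PE-H/4way", 100, 9.0),
    ("PE-H/1way", 110, 7.5),
    ("PE-L/4way", 180, 5.0),
]
cfg = select_configuration(table, now=0, deadline=150)  # picks "PE-H/1way"
```

Because the table is pre-sorted at the static phase, the runtime cost is a single bounded scan, which is consistent with the small runtime overhead reported in Section VI.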

The processing for the runtime optimization is as follows:
1) Update the contents of the SPM at each task switch.
2) Calculate the amount of slack time at each checkpoint.
3) Switch the processor configuration at each checkpoint.
The first is performed with the SPM management table described in Section IV-B. When a task is switched to the running state, its code and data are temporarily allocated to the space of the SPM. The value of the alloc column is updated for the runtime SPM management. The space of the SPM used by several tasks is reallocated on a task context switch so that the running task can temporarily use it. The second is performed at each checkpoint to estimate the approximate runtime slack at a low cost. The third is performed based on the slack time calculated at runtime and the DEPS management table: the processor configuration that can meet the deadline constraint with the minimum energy consumption is selected at each checkpoint. These functions have been implemented in our energy-aware RTOS, where the first one is integrated with the task dispatcher and the other two are realized as checkpoint APIs. In our framework, the runtime overhead is limited because most of the analysis and optimization are performed at the static phase; their results are saved as the management tables and referred to at runtime.

VI. CASE STUDY

To evaluate and validate the effectiveness of this work, we conducted a case study with the implemented toolchain and RTOS. Table I summarizes the configuration of the MPP used in this work. It has two processing elements with different voltages and frequencies and a resizable 4-way instruction cache. Our toolchain and RTOS treat eight combinations of configurations of the MPP as the selectable processor configurations. As the target application, we deal with a brief version of a video-conference system. We used the Xvid MPEG-4 video codec [19] and the FFmpeg library [20], and ported them to the cycle-level simulator of the MPP. The target application consists of video encoding/decoding tasks and an I/O control task; the source code of the video codec is further divided into multiple tasks. Our toolchain is applied to the source code of the video application for the static optimization. Fig. 7 shows a capture of the demonstration of the video-conference system: two systems send and receive video data to and from each other via interactive communication. Fig. 8 shows the evaluation results of the case study. The bars show the energy consumption in mJ.


Fig. 7. The demonstration of the case study (the video-conference system)

Fig. 8. Evaluation results: the energy consumption without and with our framework, broken down into off-chip, on-chip, and CPU energy, for data1 to data4 and their average (nonopt vs. opt)

The amount of energy consumption is analyzed into three factors: the energy of the CPU, that of the on-chip memory (cache and SPM), and that of the off-chip memory. The label dataX on the x-axis denotes the input data set; we used four input data sets of the video-conference system for the evaluation. Also, nonopt and opt on the x-axis denote the energy consumption without and with our framework, respectively. From Fig. 8, the effectiveness of the presented framework is confirmed: a significant energy reduction is achieved. On average, 44.9% of the total energy consumption was reduced. It can be said that our framework achieved a synergistic effect by combining several techniques, since energy savings were achieved on each component. The energy optimization was performed stably for all the input data sets. Moreover, each optimization step in the framework can be completed within a reasonable time. Furthermore, we confirmed that the implementation overhead of the runtime DEPS and SPM management is very small: only up to 0.1% of the energy was paid for the runtime optimization. These results imply that the toolchain and the energy-aware RTOS developed in this work are suitable for practical use.

VII. CONCLUSION

In this work, we proposed an integrated optimization framework for minimizing the energy consumption of embedded real-time applications. Our approach mainly utilizes a trade-off between energy and performance of the processor configurations. The framework performs the optimal allocation of the SPM and the optimal adaptation of the processor configurations together. The characteristics and the runtime behavior of the target application are analyzed at the static phase. Then, the energy optimization is performed at runtime based on the information of the two management tables generated at the static phase. We implemented the optimization framework as the toolchain and the RTOS. In the future, we plan to extend the framework to multi-processor environments.

ACKNOWLEDGMENT

This work is supported by Core Research for Evolutional Science and Technology (CREST) of JST.

REFERENCES

[1] N. AbouGhazaleh, et al., "Collaborative Operating System and Compiler Power Management for Real-Time Applications," ACM Transactions on Embedded Computing Systems, vol. 5, no. 1, pp. 82–115, 2006.
[2] W. Yuan, et al., "GRACE-1: Cross-layer Adaptation for Multimedia Quality and Battery Energy," IEEE Transactions on Mobile Computing, vol. 5, no. 7, pp. 799–815, 2006.
[3] J. Seo, et al., "Optimal Integration of Intra and Inter Task Dynamic Voltage Scaling for Hard Real-Time Applications," in Proceedings of the International Conference on Computer-Aided Design, pp. 450–455, 2005.
[4] A. Azevedo, et al., "Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints," in Proceedings of Design, Automation and Test in Europe (DATE), pp. 168–175, 2002.
[5] V. Suhendra, et al., "Integrated Scratchpad Memory Optimization and Task Scheduling for MPSoC Architectures," in Proceedings of the International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), pp. 401–410, 2006.
[6] P. Bjureus, et al., "Performance Analysis with Confidence Intervals for Embedded Software Processes," in Proceedings of the International Symposium on Systems Synthesis (ISSS), pp. 45–50, 2001.
[7] G. Zeng, et al., "A Generalized Framework for Energy Savings in Hard Real-Time Embedded Systems," IPSJ Transactions on System LSI Design Methodology, vol. 2, pp. 180–188, 2009.
[8] W. Kim, et al., "Performance Evaluation of Dynamic Voltage Scaling Algorithms for Hard Real-Time Systems," Journal of Low Power Electronics, vol. 1, no. 3, pp. 207–216, 2005.
[9] T. Tatematsu, et al., "Checkpoints Extraction Using Execution Traces for Intra-Task DVFS in Embedded Systems," in Proceedings of the International Symposium on Electronic Design, Test and Applications (DELTA), Queenstown, New Zealand, pp. 19–24, Jan. 2011.
[10] H. Kawashima, et al., "Intra-task Analysis of Worst Case Execution Time and Average Energy Consumption on DEPS Framework," IEICE Technical Report (VLD2010-119), vol. 110, no. 432, pp. 19–24, Mar. 2011 (in Japanese).
[11] T. Ishihara, "A Multi-Performance Processor for Reducing the Energy Consumption of Real-Time Embedded Systems," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E93-A, no. 12, pp. 2533–2541, 2010.
[12] Y. Ishitobi, et al., "Code and Data Placement for Embedded Processors with Scratchpad and Cache Memories," Journal of Signal Processing Systems, vol. 60, no. 2, pp. 211–224, 2008.
[13] H. Takase, et al., "Partitioning and Allocation of Scratch-Pad Memory for Priority-Based Preemptive Multi-Task Systems," in Proceedings of Design, Automation and Test in Europe (DATE), Dresden, Germany, pp. 1124–1129, Mar. 2010.
[14] R. Banakar, et al., "Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems," in Proceedings of the International Symposium on Hardware/Software Codesign (CODES), Estes Park, Colorado, pp. 73–78, 2002.
[15] A. Mizuno, et al., "Design Methodology and System for a Configurable Media Embedded Processor Extensible to VLIW Architecture," in Proceedings of the IEEE International Conference on Computer Design (ICCD), pp. 2–7, Dec. 2002.
[16] The GNU Compiler Collection. [Online]. Available: http://gcc.gnu.org/
[17] TOPPERS Project. [Online]. Available: http://www.toppers.jp/en/
[18] L. Gauthier, et al., "Compiler Assisted Energy Reduction Techniques for Embedded Multimedia Processors," in Proceedings of the 2nd APSIPA Annual Summit and Conference, Biopolis, Singapore, pp. 27–36, Dec. 2010.
[19] Xvid, MPEG-4 Compliant Video Codec. [Online]. Available: http://www.xvid.org
[20] FFmpeg Multimedia System. [Online]. Available: http://www.ffmpeg.org
