2008 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing

Application Specific Low Power ALU Design

Yu Zhou and Hui Guo

School of Computer Science & Engineering, University of New South Wales
Sydney, Australia
Email: {zhouyu, huig}@cse.unsw.edu.au

Abstract

Power consumption is a critical design issue in embedded processor design. One common component in the processor is the Arithmetic and Logic Unit (ALU). Usually, ALUs are designed as a combinational logic circuit containing a number of functional components for different arithmetic and logic operations. An ALU can be constructed with a tree or a chain structure.

Existing approaches often achieve power reduction at the cost of increased design complexity, thus resulting in delay and area overheads. In this paper, we present a customization approach for the chain-structure based ALU design by repositioning functional components in the chain. The approach can be easily integrated into a processor design environment to effectively reduce ALU power consumption for a given application.

We have applied our approach to a set of benchmarks. Our experimental results show that the power savings range from 43.5% to 49.6%; on average, 46.9% of ALU power reduction can be achieved. Most importantly, this achievement comes at the cost of neither hardware complexity nor processor performance, and the implementation is extremely straightforward.

Keywords: Low Power ALU, Application-Specific Design

1 Introduction

The stringent requirement for low power consumption has been a big issue in most embedded processor designs. Power reduction on the processor can be fulfilled at different design levels, such as transistor sizing [4][14] and threshold voltage scaling [5][8] at the semiconductor chip design level, clock gating [17][18] and power gating [2][9] at the logic and register transfer level, and Dynamic Voltage Scaling (DVS) [13][15][19] at the system level. Power reduction can also be performed on individual functional components of the processor.

One typical component in the processor is the Arithmetic and Logic Unit (ALU). An ALU provides common functions for arithmetic and logic operations. It contains a set of functional components. Each functional component is responsible for one type of operation. For example, an adder performs additions.

Figure 1. Typical Designs (a) Tree Structure (b) Chain Structure

There are two typical ALU design structures: tree structure and chain structure. An example is shown in Figure 1, where the ALU contains four functional components for addition, logic and, logic xor, and logic or. With the tree structure (see Figure 1(a)) all functional components are connected in parallel to the multiplexor (denoted as MUX in the figure); the multiplexor selects one result as the ALU output among results from all functional components. In the chain structure (Figure 1(b)), functional components are concatenated through a series of small multiplexors; each multiplexor takes a subset of functional components and passes the result to the output. Therefore, some operations in the chain structure take more levels of multiplexor transmission to the output than others.
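The difference between the two structures can be illustrated with a small behavioural sketch in Python. It is not taken from the paper: the 32-bit mask, the function names, and the convention of listing chain components from the far end to the output are assumptions made for illustration. It models the Figure 1 example and counts how many multiplexor levels a result crosses before reaching the output.

```python
from typing import Callable, Dict, List, Tuple

# Assumed 32-bit datapath; the width is illustrative only.
MASK = 0xFFFFFFFF

# The four functional components of the Figure 1 example.
COMPONENTS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
    "or": lambda a, b: a | b,
}


def mux_levels(chain: List[str]) -> Dict[str, int]:
    """Multiplexor levels each component's result crosses in a chain.

    `chain` lists components from the far end to the output. With n
    components there are n - 1 two-input multiplexors; the two farthest
    components share the first multiplexor.
    """
    n = len(chain)
    return {name: n - max(pos, 1) for pos, name in enumerate(chain)}


def tree_alu(op: str, a: int, b: int) -> Tuple[int, int]:
    """Tree structure: every result crosses one wide multiplexor."""
    results = {name: fn(a, b) for name, fn in COMPONENTS.items()}
    return results[op], 1


def chain_alu(op: str, a: int, b: int, chain: List[str]) -> Tuple[int, int]:
    """Chain structure: same result, position-dependent multiplexor depth."""
    results = {name: COMPONENTS[name](a, b) for name in chain}
    return results[op], mux_levels(chain)[op]


if __name__ == "__main__":
    chain = ["add", "and", "xor", "or"]   # order drawn in Figure 1(b)
    print(mux_levels(chain))              # {'add': 3, 'and': 3, 'xor': 2, 'or': 1}
    print(tree_alu("add", 2, 3))          # (5, 1)
    print(chain_alu("add", 2, 3, chain))  # (5, 3)
```

Both models return the same value for a given operation; only the number of multiplexor levels differs, which is the property the placement approach exploits.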

978-0-7695-3492-3/08 $25.00 © 2008 IEEE

DOI 10.1109/EUC.2008.81
The tree structure often demands more area than the chain structure but its operation is usually faster, as will be shown in our simulation results in Section 3.

With a modern processor design, where the processor is pipelined into a number of stages, the speed of the processor is determined by the longest delay that is associated with the critical path. Usually, the ALU is not on the critical path (as will be demonstrated in Section 3). Therefore, some processor design tools, such as ASIPMeister [11][1], use the chain structure for the ALU to save area.

For the chain structure, there are a variety of options to place functional components. So far, the functional component placement in the chain is arbitrarily chosen in the processor design. To the best of our knowledge, no work related to this design issue has been reported. In fact, placing a functional component differently in the chain structure may cause different power consumption. For example, swapping the add component and the or component in Figure 1(b) may favor some applications and can save a considerable amount of ALU power.

In this paper, we investigate the effect on power consumption of functional component placement in the chain structure; we propose a functional component placement approach to reduce power consumption for a given application; we implement the approach in the hardware description modeling and integrate it into the ASIPMeister processor design tool [11]. Our experiment on a set of benchmarks shows that on average 46.9% of ALU power can be saved.

Compared with existing power reduction approaches, which often incur high area and/or delay overheads and require considerable implementation effort, our approach is almost cost free and is extremely simple to implement.

1.1 Related Work

Power reduction on digital system design has been studied for many decades. Power consumption in a digital system (with CMOS technology) consists of two types: dynamic power from logic signal switching activities and static power from transistor leakage currents.

Approaches for dynamic power reduction can be classified into three categories with each category having a different focus: to reduce switching capacitance, to reduce switching frequency, or to reduce supply voltage.

Similarly, existing approaches for leakage power reduction can be classified according to their power reduction strategies: reducing supply voltage, reducing circuit size, reducing operating temperature, or increasing transistor threshold voltage.

Approaches in each category can be even further grouped, based on the design level at which they are applied.

Transistor sizing [4][14] is one approach to reduce switching capacitance at the low design level. Also applied at the low design level is the approach of altering transistor threshold voltage [5][8]. An increase in threshold voltage reduces leakage power consumption.

At the logic gate level, technology mapping is often targeted for power reduction. Technology mapping automatically constructs a gate-level design representation based on a given logic gate library. To find a mapping that will minimize the total power consumption under the delay and cost constraints is an NP-hard problem. Some heuristic algorithms [6][10][12] have been proposed to find the optimal mapping for a design.

Moving to the Register Transfer Level (RTL) and system level, clock gating and power gating are used. Clock gating [17][18] prevents the clock signal from reaching idle functional units so that unnecessary switching activity in the idle functional units is avoided, hence saving dynamic power. A clear description of clock gating can be found in [17] and a good example of using logic gating is given by Tiwari et al. [18]. Power gating [2][9], on the other hand, disconnects the power supply to unused components to eliminate both their dynamic and leakage power consumption. The benefits and costs of power gating are investigated in [9]. An application approach at the system level design can be found in [2].

There have been many other approaches that apply multiple techniques to reduce power consumption. One popular approach is Dynamic Voltage Scaling (DVS) [13][15][19]. DVS scales the supply voltage and clock frequency during circuit operation. The theoretical and practical limits of the DVS approach are discussed in [19].

1.2 Paper Organization

The rest of the paper is organized as follows. Section 2 discusses the functional component placement in the chain structure design and develops a customization approach for an application specific low power ALU. Section 3 presents the experiment setup, followed by the simulation results and related discussions. Section 4 concludes the paper.

2 Chain Structure Design and Power Reduction

Without loss of generality, we refer to the chain structure as in Figure 2, where there are n functional components and they are concatenated by 2-to-1 multiplexors.

All functional components in the structure will operate for any calculation request, but at most one of them executes the required calculation; the other components operate for nothing except exercising and consuming power. We use execution to indicate a component operation that responds

to a calculation request and whose result will ripple through a chain of multiplexors to the output.

Figure 2. General Chain Structure (functional components fun1 to funn, multiplexors MUX 1 to MUX n-1, output R)

The dynamic power consumption of the design is determined by the operational activity of the circuit. At the functional component level, the operational activity can be represented by

OP = \sum_{i=1}^{n} O_i + \sum_{j=1}^{n-1} O_{mux_j},    (1)

where O_i is the operational activity of component i, and O_{mux_j} the operational activity of multiplexor j.

Given an input to the design, the operational activity of each of the functional components is fixed; a change of component positions in the chain will not affect the value of O_i in Formula 1. However, the execution of a component will lead to a different chain of subsequent multiplexor operations, depending on the component's position in the chain. A component positioned far away from the output causes more operations than when it is close to the output; the more frequently the component executes, the higher the operational activity it generates.

It is very important to note that when the design structure is implemented with a synthesis design tool, such as Synopsys Design Compiler, the chain of multiplexors may be realized with an additional disable function, as illustrated in Figure 3, where Figure 3(a) shows the passing paths of the chain structure and Figure 3(b) is the control logic. The low level implementation of the multiplexor not only includes the function of selecting one of the inputs as the output, but also integrates a gating logic that disables the propagation of the inputs to the output. Given an ALU operation, some passing paths can be blocked, which effectively reduces unnecessary signal switching.

Figure 3. Implementation Example of Chain Structure: (a) Passing Paths (b) Control Logic

2.1 ALU Customization

We can customize the ALU design by identifying frequent functional components and placing them close to the output.

The execution frequency of a functional component is obtained from instruction frequencies. The frequency of an instruction is how often the instruction is executed; it is measured as a percentage of the total number of instructions executed for a given application. Some instructions correspond to ALU operations, such as add (addition), xor (logic xor), beq (branch on equal). They will use the ALU for calculation. Some instructions do not involve the ALU, such as jmp (jump to an instruction memory location). Therefore, we partition the instruction set into ALU instructions and non-ALU instructions. The ALU instructions are further grouped according to what functional component they actually use. For example, instructions add and sub belong to the same group since they use the adder component.

The execution frequency of a component is the sum of the frequencies of the instructions associated with the functional component.

Algorithm 1 summarizes the functional component placement approach. In the algorithm, different weights of power consumption are assigned to different functional components. The more complicated the component, the higher the weight. Given two functional components of the same operational frequency but with different design complexities, we place the component with a higher weight closer to the output. The algorithm is self explanatory. Elaboration is omitted.

Algorithm 1 Functional Component Placement in Chain Structure
  /* Given the execution trace and the set of functional components, FC */
  step 1: obtain instruction execution frequency, IF;
  step 2: obtain the functional component execution frequency based on IF;
  step 3:
    /* Find the location for each component in the chain */
    /* Start from the closest level to the output */
    current_level = 1;
    /* If there are more than 2 components in FC */
    while |FC| > 2 do
      S <= most frequent components in FC;
      FC <= FC − S;
      /* Repeat if there are multiple such components */
      while S ≠ ∅ do
        get the component of highest weight, fc, in S;
        /* Assign the component to the current level */
        level(fc) = current_level;
        /* Go to one level further from the output */
        current_level++;
        S <= S − fc;
      end while
    end while
    /* Assign the last two components to the farthest level */
    level(FC) = current_level;

2.2 Design Environment Integration

The customization technique can be integrated into a processor design environment. A general design platform is given in Figure 4.

Figure 4. Processor Design Flow with ALU Customization (stages: C program, Compilation, Profiling, Processor Model Generation, ALU Model Customization, Processor Model Update, Design Simulation, Simulation Results)

The design flow starts from a given application written in a high level programming language. A target machine architecture is selected and the program is compiled for the target machine. Based on the instruction set of the machine architecture, a hardware description model of the processor is developed by either commercial or in-house development tools. Algorithm 1 is applied to customize the ALU in the processor. The processor model is then updated with the customized ALU. Since the update does not functionally affect the processor and instruction set architecture, no modification is required to the instruction code. Next, the new processor model for the application code is simulated. The design is synthesized using a synthesis tool. Based on the synthesis, the design is evaluated.

3 Experimental Setup and Simulation Results

To verify our placement approach, we first developed a small stand-alone ALU for full design space exploration. The ALU contains only four functional components, one each for addition, logic xor, logic and, and pass (passing one input to the output). The experiment setup is given in Figure 5(a), where the loop exhausts all possible placements in the chain design.

Figure 5. Experimental Setup (a) ALU Design Exploration (b) Processor Design with ALU Customization

For each iteration, a new ALU model with a different component placement is generated. Its function is verified by ModelSim [7]. Next, the design is synthesized with Synopsys [16] Design Compiler based on the tsl18fs120 library, and the related power consumption is estimated by Synopsys PrimePower. The three tools (ModelSim, Design Compiler and PrimePower) together form the testbench for each design evaluation.

Later, we integrated the customization technique into ASIPMeister [11], a processor design tool. The related experimental setup is given in Figure 5(b). In this experimental environment, the processor design for a given application is automatically generated and the functionality of the processor model is systematically verified.

We choose the Portable Instruction Set Architecture (PISA) [3] as the target processor instruction set architecture. ASIPMeister is used to automatically generate the processor VHDL model. We use Simplescalar [3] to compile the application program and to profile the program execution. Based on Algorithm 1, the ALU model in the processor is customized. Finally, the processor with the customized ALU is evaluated using the same testbench described in the setup for design space exploration.
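The customization step that Algorithm 1 contributes to this flow can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frequency and weight values are hypothetical, the name-based tie-break is added only to make the output deterministic, and `expected_mux_levels` is a simplified proxy for position-dependent multiplexor activity rather than the paper's power model.

```python
from typing import Dict, List


def place_components(freq: Dict[str, float],
                     weight: Dict[str, float]) -> Dict[str, int]:
    """Assign a chain level to each component; level 1 is closest to the output."""
    remaining = set(freq)
    level: Dict[str, int] = {}
    current_level = 1
    while len(remaining) > 2:
        top = max(freq[c] for c in remaining)
        most_frequent = {c for c in remaining if freq[c] == top}
        remaining -= most_frequent
        # Among equally frequent components, the highest weight goes first.
        # Sorting by name afterwards is an assumption for determinism.
        for c in sorted(most_frequent, key=lambda c: (-weight[c], c)):
            level[c] = current_level
            current_level += 1
    # The last (at most two) components share the farthest level.
    for c in remaining:
        level[c] = current_level
    return level


def chain_order(level: Dict[str, int]) -> List[str]:
    """Components listed from the far end of the chain to the output."""
    return sorted(level, key=lambda c: (-level[c], c))


def expected_mux_levels(freq: Dict[str, float], level: Dict[str, int]) -> float:
    """Frequency-weighted multiplexor levels crossed per ALU operation (proxy)."""
    total = sum(freq.values())
    return sum(freq[c] * level[c] for c in freq) / total


# Hypothetical operation frequencies and complexity weights (not from the paper).
FREQ = {"add": 0.60, "and": 0.15, "or": 0.12, "xor": 0.08, "nor": 0.05}
WEIGHT = {"add": 5.0, "and": 1.0, "or": 1.0, "xor": 1.0, "nor": 1.0}

if __name__ == "__main__":
    customized = place_components(FREQ, WEIGHT)
    reversed_levels = {"nor": 1, "xor": 2, "or": 3, "and": 4, "add": 4}
    print(chain_order(customized))
    print(expected_mux_levels(FREQ, customized))
    print(expected_mux_levels(FREQ, reversed_levels))
```

With these made-up frequencies the adder lands next to the output, and the proxy cost of the customized placement is roughly half that of the reversed placement. The sketch only shows the direction of the effect; actual savings depend on the synthesized circuit and the input data.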
Figure 6. Design Area of Different Component Placements (X-axis: Design, chain structure and tree structure; Y-axis: Area (gates))

Figure 7. Design Delay of Different Component Placements (X-axis: Design, chain structure and tree structure; Y-axis: Delay (ns))

Figure 8. Design Exploration (X-axis: Component Placement Structure in ALU, Y-axis: ALU Power Consumption). Panels: For "add" Operation, For "and" Operation, For "xor" Operation, For "pass" Operation

3.1 Simulation Results

3.1.1 ALU Design Space Exploration

As mentioned above, a reduced ALU of four functions was used for exploring designs of all possible placements in order to verify the effectiveness of our component placement approach. The chain structure of the ALU is modeled using the VHDL if-then-else statement. A different order of the ALU calculations in the if-then-else statement corresponds to a different functional component placement in the chain structure.

Given four functional components, there are 4! = 24 different placements. We use the notation C1-C2-C3-C4 to denote a placement arrangement. For example, add-pass-xor-and represents a placement where the component for and is placed in the closest position to the output, and add and pass are positioned at the far end of the chain from the output. This notation appears in Figures 6, 7 and 8 for the chain designs with different component placements.

The area and delay for designs with different placements are plotted in Figures 6 and 7, respectively. The tree structures (described with the case statement in VHDL) were also designed for comparison.

As can be seen from Figure 6, chain designs have lower area cost than the tree structure.

From Figure 7, we can see that most chain designs have longer delay than the tree structure. But for some designs, the delay is smaller than that of the tree structure. This can be explained as follows: the critical path of the tree structure is the longest functional component plus the 4-to-1 multiplexer, while the critical path in the chain structure is the maximal value of the delays of all functional components to the output. The delay of the chain varies with the component placement. Placing the longest component closest

to the output reduces the maximal delay. The adder is the longest component; therefore, when it is positioned next to the output in the chain, the overall delay is reduced.

It is worth noting that the power consumption is closely related to the input. Different inputs will result in different power consumption; inputs with high switching frequencies are very likely to bring about high power consumption. Since we are not interested in the effect of inputs on power consumption, we used the same set of random input data for the different design structures to eliminate its effect on our component placement approach.

To check whether the placement affects the power consumption, we carried out a series of tests with the same sequence of input data. Each test runs the ALU with a single type of operation. The power consumption for a given type of operation with different placements is shown in each plot of Figure 8, where power is measured in Watts.

Based on the simulation results, we can see that for a given type of operation, the power consumption varies with the component placement in the chain. It reaches the minimal level when the related functional component is placed closest to the output. There are 3! = 6 such cases for each type of operation. For instance, when only additions are performed (see the top plot in Figure 8), six designs (as highlighted in the plot), all with the adder closest to the output, have the lowest power consumption. Similar observations (refer to the rest of the plots in Figure 8) can be made for the other operations. The results therefore verify our placement customization approach.

It must be noted that the power savings here are not significant. This is because of the low bit-switching of the inputs used in the simulation. Nevertheless, it does not affect our investigation of the effect of different placements on power consumption.

3.1.2 Application Specific ALU

To see the effectiveness of our approach in real applications, we applied the ALU customization technique to the processor design for a set of benchmarks in the design environment shown in Figure 5(b). With ASIPMeister, the multiplier and divider are separated from the ALU. Therefore, none of the ALU designs in the experiment include those functional components.

ASIPMeister models the ALUs with two chains: a chain of functional components as has been discussed and a chain of different inputs to the adder for different types of additions, such as a + b and a − b. Therefore, we applied Algorithm 1 jointly to both chains.

For each design, the functional component execution frequencies were obtained from the Simplescalar profiler.

Based on the set of benchmarks we used in our experiments, there are five most used ALU operations: addition, and, or, xor and nor of each application. Their relative frequencies are displayed in Figure 9.

Figure 9. Operation Frequency (panel title: ALU Operation Frequency; X-axis: Benchmark; Y-axis: Operation Frequency, 0% to 100%; series: Add(%), And(%), Or(%), Xor(%), Nor(%))

As can be seen from Figure 9, for all applications, addition has the highest frequency, dominating other ALU operations. Therefore, for all designs, the adder is placed closest to the output.

We measured the ALU power consumption for each design. To verify whether the ALU customization affects the processor speed, we also evaluated the processor clock speed (determined by the critical path delay) for each design. The results are given in Table 1.

For each application (row 1, columns 3-11 in the table), we have two designs: the normal non-custom design, directly from ASIPMeister, and the design with the ALU customized (see the label custom in the table). The processor clock time is given in rows 2 & 3. Power consumption for the ALU in the processor is given in rows 4 & 5, with the reduction rate presented in the last row.

From the table, we can see that the CPU clock time remains unchanged throughout all designs, which demonstrates that the ALU is not on the critical path and that differing functional component placements in the ALU do not affect the processor clock speed. However, with our customization approach, the ALU power can be reduced in a range from 43.5% to 49.6%; on average, 46.9% of ALU power can be saved as compared to non-custom designs.

It is worth noting that the design results are partially affected by the low level implementation (such as logic mapping, layout) based on the synthesis tool and design library. However, the effect of different design models at the high level described in HDL can still be observed from the final synthesized results, and the proposed design approach can be verified.

4 Conclusions

In this paper, we discussed the effect of component placement on the chain structure design. We found that the

Table 1. Customized ALU in Application Specific Processors

Application    design      qsort   aes     crc     des     RC4     rsa     dijkstra  stringsearch  sha     average
CPU clk time   non-custom  7.77    7.77    7.77    7.77    7.77    7.77    7.77      7.77          7.77
(ns)           custom      7.77    7.77    7.77    7.77    7.77    7.77    7.77      7.77          7.77
ALU power      non-custom  1.0170  0.7826  1.2880  1.6590  1.0110  1.1160  1.1170    1.4010        1.3980
(mW)           custom      0.5557  0.4418  0.6534  0.8367  0.5214  0.6132  0.6073    0.7403        0.7302
pow.red(%)                 45.4    43.5    49.3    49.6    48.4    45.1    45.6      47.2          47.8    46.9
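As a quick arithmetic check, the reduction row of Table 1 can be recomputed from the two ALU power rows. The Python sketch below uses only the values printed in the table; the variable and function names are illustrative.

```python
# ALU power in mW, copied from Table 1.
NON_CUSTOM_MW = {
    "qsort": 1.0170, "aes": 0.7826, "crc": 1.2880, "des": 1.6590,
    "RC4": 1.0110, "rsa": 1.1160, "dijkstra": 1.1170,
    "stringsearch": 1.4010, "sha": 1.3980,
}
CUSTOM_MW = {
    "qsort": 0.5557, "aes": 0.4418, "crc": 0.6534, "des": 0.8367,
    "RC4": 0.5214, "rsa": 0.6132, "dijkstra": 0.6073,
    "stringsearch": 0.7403, "sha": 0.7302,
}


def reduction_percent(before_mw: float, after_mw: float) -> float:
    """Relative power reduction, in percent of the non-custom value."""
    return 100.0 * (before_mw - after_mw) / before_mw


REDUCTIONS = {
    app: reduction_percent(NON_CUSTOM_MW[app], CUSTOM_MW[app])
    for app in NON_CUSTOM_MW
}
AVERAGE = sum(REDUCTIONS.values()) / len(REDUCTIONS)

for app, value in REDUCTIONS.items():
    print(f"{app:>12}: {value:.1f}%")
print(f"{'average':>12}: {AVERAGE:.1f}%")
```

Rounded to one decimal place, the computed values match the pow.red(%) row, including the 46.9% average.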

order of functional components in the chain affects power consumption. To reduce power consumed by the chain, the frequently operating component should be positioned close to the output of the chain.

We developed a VHDL customization approach for the low power ALU design. The customization is extremely simple. It entails neither additional control logic nor modification to the interface of the ALU model in the processor design. It only repositions the functional components in the ALU chain structure by swapping the order of ALU operations in the related if-then-else statement in the hardware description model. The approach can be readily integrated into an existing tool that uses a hardware description language. We implemented the approach in an Application Specific Processor design tool, ASIPMeister [11], which is available to the public.

Our experiments on a set of benchmarks have shown that, on average, a 46.9% ALU power saving can be achieved with our design approach and that the power saving comes at no cost to processor performance.

The component placement approach may be applicable to other designs with a similar chain structure (such as floating-point ALUs), which will be studied in the future.

References

[1] Asip-meister. (http://www.eda-meister.org/asip-meister/).
[2] K. Agarwal, H. Deogun, D. Sylvester, and K. Nowka. Power gating with multiple sleep modes. In Proceedings of the 7th ACM/IEEE International Symposium on Quality Electronic Design, January 2006.
[3] T. Austin, E. Larson, and D. Ernst. Simplescalar: An infrastructure for computer system modeling. Computer, 35(2):59–67, 2002.
[4] M. Borah, R. M. Owens, and M. J. Irwin. Transistor sizing for low power cmos circuits. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 15:665–671, 1996.
[5] B. H. Calhoun, F. A. Honore, and A. Chandrakasan. Design methodology for fine-grained leakage control in mtcmos. In Proceedings of the International Symposium on Low Power Electronics and Design, pages 104–109, 2003.
[6] D. Chen, J. Cong, F. Li, and L. He. Low power technology mapping for fpga architectures with dual supply voltages. In Proceedings of the ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, pages 109–117, 2004.
[7] M. Graphics. Modelsim. (http://www.model.com/).
[8] Y.-T. Ho and T.-T. Hwang. Low power design using dual threshold voltage. In Proceedings of the Asia and South Pacific Design Automation Conference, pages 205–208, 2004.
[9] H. Jiang, M. Marek-Sadowska, and S. Nassif. Benefits and costs of power-gating technique. In Proceedings of the 2005 IEEE International Conference on Computer Design: VLSI in Computers and Processors, pages 559–566, 2005.
[10] H. Li, S. Katkoori, and W.-K. Mak. Power minimization algorithms for lut-based fpga technology mapping. ACM Trans. Design Autom. Electr. Syst., 9:33–51, 2004.
[11] M. I. M, S. Higaki, Y. Takeuchi, A. Kitajima, M. Imai, J. Sato, and A. Shiomi. Peas-iii: An asip design environment. In Proceedings of the 2000 IEEE International Conference on Computer Design, pages 430–436, 2000.
[12] R. A. Rutenbar, L. R. Carley, R. Zafalon, and N. Dragone. Low-power technology mapping for mixed-swing logic. In Proceedings of the International Symposium on Low Power Electronics and Design, pages 291–294, 2001.
[13] L. Shang, L. Peh, and N. Jha. Dynamic voltage scaling with links for power optimization of interconnection networks. In Proceedings of International Symposium on High-Performance Computer Architecture, pages 91–102, 2003.
[14] J. M. Shyu, A. Sangiovanni-Vincentelli, J. Fishburn, and A. Dunlop. Optimization-based transistor sizing. IEEE Journal of Solid-State Circuits, 23:400–409, 1988.
[15] T. Simunic, L. Benini, A. Acquaviva, P. Glynn, and G. D. Micheli. Dynamic voltage scaling and power management for portable systems. In Proceedings of the 38th Conference on Design Automation, pages 524–529, 2001.
[16] Synopsys. Synopsys design compiler. (http://www.synopsys.com/).
[17] C. Thimmannagari. CPU Design: Answers to Frequently Asked Questions. Springer, 2005.
[18] V. Tiwari, S. Malik, and P. Ashar. Guarded evaluation: Pushing power management to logic synthesis/design. In Proceedings of the 9th International Symposium on Low Power Design, pages 221–226, 1995.
[19] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner. Theoretical and practical limits of dynamic voltage scaling. In Proceedings of the 41st Conference on Design Automation, pages 868–873, 2004.
220



