Power Modeling and Optimization for DDR3 SDRAM

This document is a thesis submitted by Praxal Shah to the Indian Institute of Technology Delhi for the degree of Master of Technology. The thesis proposes a power model and optimization techniques for DDR3 SDRAM memory. It presents a high-level power model based on memory access counts that can be used for dynamic power management. It also proposes exploiting the power down and self-refresh low power modes of DDR3 SDRAM to reduce power consumption. An adaptive threshold technique is introduced to increase time spent in self-refresh mode while reducing performance penalties of frequent mode transitions. The thesis is supervised by Preeti Ranjan Panda from the Department of Computer Science and Engineering at IIT Delhi.


Power Modeling and Optimization for

DDR3 SDRAM

A thesis submitted in partial fulfillment of the requirements of the degree

of

Master of Technology

in

Integrated Electronics and Circuits

by
Praxal Shah
Entry No. 2012EEN2335

Under the supervision of

Preeti Ranjan Panda (CSE)

DEPARTMENT OF ELECTRICAL ENGINEERING


INDIAN INSTITUTE OF TECHNOLOGY DELHI
JUNE 2014
Certificate

This is to certify that the thesis entitled Power Modeling and Optimization for DDR3 SDRAM,
submitted by Praxal Shah to the Indian Institute of Technology Delhi, for the partial fulfillment
of the award of the degree of Master of Technology, is a record of bona-fide work carried out
by him under my supervision and guidance. The work presented in this thesis has not been
submitted elsewhere either in part or full, for the award of any other Degree or Diploma.

Preeti Ranjan Panda


Professor
Department of Computer Science and Engineering
Indian Institute of Technology Delhi
New Delhi - 110 016
Acknowledgments

I take this opportunity to thank my mentor, Prof. Preeti Ranjan Panda, for his continuous
support and valuable guidance during the course of thesis. It is my fortune to have worked
closely with Sir, who provided innumerable insights and encouraged me for the project work.
I would also like to thank Namita Sharma (Pursuing PhD from Department of Computer
Science & Engineering, IIT Delhi) for her continuous support and suggestions.
I would like to extend my thanks to all faculty members and the Institute for providing a
wonderful learning environment during the entire duration of the course. My heartfelt thanks to
all my friends who made my stay at IIT Delhi a memorable one. Last but not the least, I thank
my parents for their constant encouragement and support.

Praxal Shah
IIT Delhi, India
Abstract

Power optimization has become critical in modern systems with high component densities.
The memory subsystem consumes a considerable fraction of the overall power, making
memory power optimization an important area of exploration. Dynamic power management
techniques rely on power estimates to decide which policies to adopt. This creates
a need for highly accurate power models.
In this thesis, we propose a high-level power model for DDR3 SDRAM that can be used
for implementing dynamic power management policies. We also propose power optimization
techniques exploiting the two low power operating modes: Power Down and Self-Refresh (SR).
The power savings are proportional to the time spent in the low power modes. Since the SR
mode is the lowest power operating mode but has a high performance overhead, we propose
an adaptive threshold technique to increase the overall gain. The proposed technique improves
the power savings by increasing the time spent in Self-Refresh mode, and improves performance
by avoiding transitions to SR mode for short IDLE periods, which reduces the
re-synchronization penalties.
Contents

Certificate
Acknowledgments
Abstract
List of Figures

1 Introduction
1.1 SDRAM Architecture
1.2 Memory Controller
1.3 Contributions
1.4 Organization of the thesis

2 Related Work
2.1 Power Modeling Techniques
2.2 DRAM Power Optimization Strategies

3 Proposed Power Model
3.1 Memory Access Count based Power Model
3.2 Methods for Self-Refresh Fraction Estimation
3.3 Experimental Validation
3.3.1 Setup
3.3.2 Accuracy of Self-Refresh Fraction Estimation Methods
3.3.3 Power Model Validation

4 Power Optimization Strategies
4.1 Exploration for SR Threshold Selection
4.2 Power Optimization through Mode Switching
4.3 SRth Selection for Adaptive SR Implementation
4.4 Power Optimization using Adaptive SR Threshold

5 Conclusion

Bibliography
List of Figures

1.1 Simplified DDR3 SDRAM Architecture and FSM
1.2 Memory Controller

3.1 Curve fitting for Power Model based on self-refresh fraction
3.2 Curve fitting for Power Model based on last level cache misses per cycle
3.3 Curve fitting for Power Model based on both the parameters
3.4 Simulation Setup
3.5 TSR estimation error using first approach
3.6 Comparison of the different TSR estimation strategies at SRth = 1000
3.7 Estimation Errors for different models

4.1 TSR estimation error using proposed approaches
4.2 Power and Delay variations for different operating modes of DDR3 SDRAM
4.3 Energy variation with SRth for some benchmarks
4.4 Delay and Power variations for some benchmarks
4.5 Illustration for time gap prediction
4.6 Power and Delay Comparison for different power optimization strategies
Chapter 1

Introduction

Power optimization has become one of the major challenges in modern mobile, embedded, and
wireless devices. The drive to increase processor performance and memory density is leading
to increased power consumption. With an increasing number of cores on a single chip for
performance enhancement, the data access, storage, and computation requirements are growing
exponentially. To meet these requirements, main memory with high bandwidth and capacity
is needed. Studies on power consumption distribution in smartphones and datacenters [1]
have shown that the memory sub-system consumes around 40% of the total power.
In servers, the memory consumes around 1.5 times the core power consumption [1]. Thus, with
memory contributing a major fraction of the system’s power dissipation, power management in
this sub-system becomes an important problem to be addressed. Many dynamic power
management techniques require an estimation model for accurate decision making. In this thesis,
we build and validate a power estimation model. We also explore the effect of the threshold
factor, that is, the number of cycles for which the banks remain idle before switching to
power saving states, on the system’s power consumption. This exploration gives a direction
for power optimization of the DDR3 SDRAM system under consideration.
In the next sections, we briefly discuss the DDR3 SDRAM architecture, its state
diagram, the operation of the memory controller, and the different power saving
modes, as background for the rest of the thesis.

1.1 SDRAM Architecture


The basic architecture of Synchronous DRAM is shown in Figure 1.1(a). It is organized into
columns, rows, banks, and ranks. A Bank is a two-dimensional array of DRAM cells, each of
which is capable of holding one information bit. A Rank has multiple Banks, each of which
can be individually powered on or off. A cell is addressed by first selecting the Bank; the
row is then selected by the row decoder and brought into a row buffer, after which the column
decoder selects the required column.

(a) Architecture (b) Simplified State Diagram

Figure 1.1: Simplified DDR3 SDRAM Architecture and FSM

Figure 1.1(b) shows a simplified state diagram of the DDR3 SDRAM. We discuss below
the operations in these states:

• ACTIVE: In this state, a particular row in a bank is selected and the entire row is moved to
the row buffer, where it stays until the next PRECHARGE.

• READ/WRITE: In this state, a burst READ/WRITE is initiated from/to the row buffer. Once
the burst READ/WRITE completes, the SDRAM moves to the ACTIVE state and waits for
the next command.

• PRECHARGE: Before opening another row for a read operation, it is necessary to write
back data from the row buffer to the last accessed row, since reading is destructive in
SDRAM. This operation is performed in the PRECHARGE state.

• REFRESH: Each bank in SDRAM is periodically refreshed to restore the charge lost
from the memory cells. All the banks must be in the IDLE state before a REFRESH command
is issued.

• SELF-REFRESH: SELF-REFRESH (SR) is a low power state in SDRAM. All the banks
should be in the IDLE state before entering self-refresh mode.

• POWER DOWN: POWER DOWN is another power saving state. Compared to the
SELF-REFRESH state, the power consumption in this state is higher. Further, the SDRAM is
not refreshed in this state. Thus, remaining in the POWER DOWN state for a long duration
may result in data loss.

1.2 Memory Controller


Memory controller is the sub-system where the policies for memory operation are implemented.
It serves as an interface between CPU and DRAM. Figure 1.2 shows the memory controller’s
interface with the overall computer system organization.

Figure 1.2: Memory Controller

The two ends of the memory controller work at different speeds, so a queue is required
to store the requests. Last level cache misses arrive at the memory controller and are
stored in this queue, where the memory controller may re-arrange memory requests in
order to save power and improve performance. After the requests are reordered, the
corresponding commands (ACTIVE, PRECHARGE, READ, WRITE, etc.) are issued to the SDRAM,
observing the appropriate timing protocol.

1.3 Contributions
We make the following contributions.

• We propose a power model of the DDR3 SDRAM based on memory access counts and
time spent in self-refresh state, and validate the model on PARSEC benchmarks.

• We present an exploration for the SR threshold selection, the number of cycles after which
the memory bank should be moved to SR mode, to minimize the overheads involved with
state switching.

• We propose a power optimization strategy that uses the above exploration results to
switch across the different operating modes of DDR3 SDRAM.

1.4 Organization of the thesis


The rest of the thesis is organized as follows. A survey of the related research is presented in
Chapter 2. We propose a power model for the DDR3 SDRAM in Chapter 3. In Chapter 4, we
discuss the exploration for SR threshold and propose and evaluate a power optimization strategy
using the available operating modes for the memory. We conclude with Chapter 5 summarizing
the contributions of the thesis.
Chapter 2

Related Work

In this chapter, we survey the existing works related to power modeling methodologies and the
power optimization strategies for DRAM.

2.1 Power Modeling Techniques


Power models are required for evaluating dynamic power management policies. An energy
estimation technique for embedded DRAM (eDRAM) using parameters such as eDRAM access
count and row and column activation count, has been proposed by Park et al. [2]. Ji et al.
estimated power for DRAM as a function of the percentage of current values in different FSM
states, different operations and the supply voltage [3].
The power estimation approach proposed by Cho et al. is based on the cycle count for each
SDRAM state and the number of state transitions [4]. Chandrasekar et al. [5] improved over the
SDRAM power model by Micron [6] by including the power consumption when transitioning
to low power modes and using exact timings between the commands.
In this thesis, we propose a power model based on the memory access count per cycle and
fractional time spent in the Self-Refresh state.

2.2 DRAM Power Optimization Strategies


As DRAM technologies have evolved, from RDRAM in the early 90s to the currently used
DDR3 SDRAM, their power operating modes have changed in several ways. Accounting for these
changes, researchers have explored several DRAM power optimization techniques. Lebeck
et al. [7] proposed two hardware based techniques for determining the power state of a chip
with multiple banks: using the time between two accesses to the chip, and intelligent code and
data placement so that page allocation is continuous rather than random. The chips may be
moved to the same operating mode or controlled separately based on the time between the
accesses to the chip. De la Luz et al. [8] investigated software and hardware based techniques
for controlling DRAM states. Software based techniques include transitioning to different states
based on idle period detection through evaluation of all the choices, and grouping the array
variables with similar lifetime access patterns and mapping them to the same bank. Different
threshold prediction techniques, such as adaptive, constant, and history based techniques, are
used for transitioning between the different operating modes.
De la Luz et al. [9] proposed a dynamic data migration strategy that groups the arrays with
similar access counts across the sampling points. Mapping a group to the same bank helps in
reducing the number of banks to be kept active. Zhou et al. [10] analyzed the cache misses vs.
memory size curves to determine the maximum memory size required. Once the ratio becomes
constant, the maximum memory size required can be determined and accordingly other banks
can be powered down. Mapping the applications to banks in a rank according to their memory
access intensity and row-buffer locality helps in keeping some ranks to low power mode for
longer periods of time, thereby saving power [11].
Lyuh et al. [12] explored the scheduling and binding techniques to reduce the active time
for a bank. Data recomputation [13] was another technique to keep the banks in low power
mode for longer durations. Code re-structuring helps in grouping the cache hits and misses
during execution [14]. This leads to clustering of memory accesses and memory idle cycles
giving an opportunity to keep banks in low power states for longer times. Amin et al. [15]
proposed a cache replacement policy to enable longer low power states for DRAM ranks. When
conflict misses occur, the cache lines corresponding to high priority ranks are not written back
to memory and instead other lines are moved to memory to have high priority ranks in low
power mode for longer time.
Read requests affect the system performance more than the write requests. This fact was
explored by Lai et al., who proposed a read-write aware throttling mechanism [16]. If any read
request exists in the request queue, then the targeted rank is activated and commands for it are
forwarded to command queues. The request queues are checked at defined intervals and ranks
are kept powered off for intervals until any read request for them is received. This helps in
keeping ranks in low power for longer time. Hur et al. also proposed a queue aware power-
down mechanism where the analysis is done rank wise [17]. Each time a request to a bank in
the rank is received, the counter’s value is incremented by the latency of the new command. If
there is no request pending for the corresponding rank in the central queue, the rank is powered
down.
Different power saving modes have different characteristics. For example, the power
consumption in Power-Down mode is 18 mW, while in SR mode it is 9 mW. On the other hand,
the power-up latency for the former is 10 clock cycles, while it is 512 clock cycles for the
latter [6]. Thomas et al. [18] proposed a power saving policy that exploited the best of both
modes. A history table is used to predict the duration of the forthcoming idle period and to
switch to a power mode accordingly. In this work, we explore the effect of the Self-Refresh
threshold on power and performance. From the exploration results, we conclude that, instead
of using a fixed threshold throughout, an adaptive threshold technique results in increased
power savings and improved performance.
Chapter 3

Proposed Power Model

This work is an extension of earlier research [19] in which the power model for DDR3
SDRAM was developed using trace driven execution on the Alpha architecture. We re-define the
DDR3 SDRAM power model for a system with the X86 architecture and hierarchical memory using
the same methodology and parameters. For developing the model, we use a training set
comprising the following benchmarks: cjpeg, crc, typeset, fir, patricia, rawaudio, and linux boot.
Every 1M simulation cycles, we extract the power values, state counter values, number of
last level cache misses, and other simulation statistics. The Self-Refresh threshold is the
cycle count in the idle state after which DRAM banks switch to self-refresh mode. The value of
the self-refresh threshold (SRth) should be chosen while taking into account its effect on the
performance as well as the power consumption of the memory. We have:

SRth = (PowerSR × Cycles) / PowerIdle = (9.5 × 512) / 102.7 ≈ 50 (3.1)

where Cycles represents the re-synchronisation penalty for being in SR state.
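
As a quick arithmetic check of Equation 3.1, the sketch below assumes that the threshold is read as the self-refresh power multiplied by the re-synchronization penalty and divided by the idle power, which is the reading that reproduces the quoted value of about 50. The function name is hypothetical; only the three constants come from the text.

```python
# Values quoted in the text: powers in mW, penalty in clock cycles.
POWER_IDLE_MW = 102.7
POWER_SR_MW = 9.5
RESYNC_CYCLES = 512


def break_even_threshold(power_idle_mw, power_sr_mw, resync_cycles):
    """Return (power_sr * resync_cycles) / power_idle, in cycles.

    This is an assumed reading of Equation 3.1, chosen because it
    matches the stated result of roughly 50 cycles.
    """
    return power_sr_mw * resync_cycles / power_idle_mw


sr_th = break_even_threshold(POWER_IDLE_MW, POWER_SR_MW, RESYNC_CYCLES)
print(f"SRth = {sr_th:.1f} cycles")  # a little under 50
```

The exact ratio is slightly below 50, so SRth = 50 is a rounded value.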

3.1 Memory Access Count based Power Model


We modify the Memory Access Count based Power Model [19], which depends on the number of
last level cache (LLC) misses per cycle and the self-refresh fraction, defined as
(number of cycles in self-refresh state) / (total cycles).
To measure LLC misses per cycle, a simple counter is used, which is incremented on the arrival
of each new request. For measuring the self-refresh fraction, we propose two hardware mechanisms
in Section 3.2. Depending upon the number of parameters taken into account, the models
are categorized as:

• Model 1: which is a function of only the self-refresh fraction – Using curve fitting on the
simulation results for the training set (Figure 3.1), the synthesized equation is

P1 = −144.9 × X + 153.4 (3.2)

where X is the self-refresh fraction.

Figure 3.1: Curve fitting for Power Model based on self-refresh fraction

• Model 2: which is a function of last level cache misses per cycle – Using curve fitting on
the simulation results for the training set (Figure 3.2), the synthesized equation is

P2 = 23870 × X + 9.995 (3.3)

where X is the cache misses per cycle.

Figure 3.2: Curve fitting for Power Model based on last level cache misses per cycle

• Model 3: which uses both the last level cache misses per cycle and self-refresh fraction
as parameters, is defined as follows (using Figure 3.3)

P3 = 4207 × X − 123.8 × Y + 132.3 (3.4)

where X is the cache misses per cycle and Y is self-refresh fraction.

Figure 3.3: Curve fitting for Power Model based on both the parameters
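
The three fitted models can be written directly as functions. The following is a minimal sketch that only transcribes Equations 3.2 to 3.4; the function names and the sample inputs are hypothetical, and the output unit is assumed to be mW, as for the other power values quoted in this thesis.

```python
def model1_power(sr_fraction):
    """Equation 3.2: power as a function of the self-refresh fraction."""
    return -144.9 * sr_fraction + 153.4


def model2_power(llc_misses_per_cycle):
    """Equation 3.3: power as a function of LLC misses per cycle."""
    return 23870 * llc_misses_per_cycle + 9.995


def model3_power(llc_misses_per_cycle, sr_fraction):
    """Equation 3.4: power as a function of both parameters."""
    return 4207 * llc_misses_per_cycle - 123.8 * sr_fraction + 132.3


# Hypothetical sample point: half the epoch in self-refresh,
# one LLC miss every 1000 cycles.
sr_fraction = 0.5
misses_per_cycle = 0.001

print(model1_power(sr_fraction))
print(model2_power(misses_per_cycle))
print(model3_power(misses_per_cycle, sr_fraction))
```

As a sanity check on the coefficients, Model 1 and Model 3 (with no misses) both evaluate to 8.5 when the self-refresh fraction is 1, which is close to the SR mode power of about 9 mW quoted elsewhere in the thesis.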

3.2 Methods for Self-Refresh Fraction Estimation


We propose two mechanisms to measure the self-refresh fraction (TSR) that differ in their
hardware complexity. They are as follows.

• Observing each bank’s queue – In this approach, the queue for each bank is tracked and,
if all queues are empty for more than the defined SRth cycles, the self-refresh tracking
counter is set. This approach requires two counters: a 20-bit counter to count up to the
maximum epoch length of 1M cycles, and a 6-bit counter for checking whether all the banks
have been idle for SRth cycles.

• Observing the global request queue – This approach tracks the global queue rather than
each bank’s local queue. It stores the Exit Cycle, the cycle at which the last request
in the global queue is sent to the respective bank queue, and the Entry Cycle, the cycle
in which a new request arrives in the empty global queue. If the time gap between the
last request exit and the new request entry is greater than SRth, then the number of cycles
spent after moving to SR mode is added to the existing number of cycles in self-refresh state,
SRcycles. Thus, the cycle count for the self-refresh state is updated by:

SRcycles = SRcycles + Entry Cycle − Exit Cycle − SRth (3.5)

The additional hardware required for implementing this approach is: 2 registers for storing
the Entry Cycle and Exit Cycle, and an ALU to measure the gap between the requests.

The self-refresh fraction (TSR) is computed as SRcycles / T, where T is the time interval over
which it is measured.
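
The global queue approach can be sketched in software as follows. This is only an illustration of the update rule in Equation 3.5 under stated assumptions: the function name, the representation of idle periods as (Exit Cycle, Entry Cycle) pairs, and the sample numbers are hypothetical and are not taken from the simulator.

```python
def estimate_sr_fraction(idle_periods, sr_th, interval_cycles):
    """Estimate the self-refresh fraction from global-queue idle periods.

    idle_periods: iterable of (exit_cycle, entry_cycle) pairs, one per
        period during which the global request queue was empty.
    sr_th: self-refresh threshold in cycles.
    interval_cycles: length T of the measurement interval in cycles.
    """
    sr_cycles = 0
    for exit_cycle, entry_cycle in idle_periods:
        gap = entry_cycle - exit_cycle
        if gap > sr_th:
            # Equation 3.5: only the cycles after the threshold count as SR.
            sr_cycles += gap - sr_th
    return sr_cycles / interval_cycles


# Hypothetical trace: one short gap (ignored) and two long gaps.
periods = [(100, 130), (500, 2000), (2500, 4000)]
t_sr = estimate_sr_fraction(periods, sr_th=50, interval_cycles=10_000)
print(t_sr)
```

In this example the 30-cycle gap is below the threshold and contributes nothing, while each 1500-cycle gap contributes 1450 self-refresh cycles.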

3.3 Experimental Validation

3.3.1 Setup
The Gem5 architecture simulator [20] integrated with RUBY [21] is used for our simulations.
Gem5 provides support for many processor architectures including X86, Alpha, and ARM.
We use the X86 architecture for all the experiments. RUBY supports detailed simulation
models for the memory subsystem, which includes caches, memory controller, DMA, and coherence
protocols. Each of the models is highly configurable and flexible. RUBY facilitates cache
configuration by selecting cache levels, number of caches at each level, cache size, cache line
replacement policy, and cache coherence protocol. On an LLC miss, the request arrives at the
memory controller, which calculates its access time and waiting time in the memory controller
queue. It also takes care of the refresh interval and other DRAM timing requirements. DRAM
power is estimated from the cycle counts retrieved for each state using the FSM model with
counters for each memory state. An overview of the complete infrastructure is shown in
Figure 3.4.

Figure 3.4: Simulation Setup



3.3.2 Accuracy of Self-Refresh Fraction Estimation Methods


The proposed self-refresh fraction estimation strategies are evaluated on a wide range of
benchmarks. For the first approach, we check across different SRth values and compare the
estimated values against the accurate Self-Refresh cycle count obtained using the FSM
model [22]. Estimation errors over the complete range are summarized in Figure 3.5.

Figure 3.5: TSR estimation error using first approach

We observe that the error is higher for smaller SRth values. This is because of the
re-synchronization penalty of 512 cycles involved with every transition of the DRAM state
from self-refresh mode to idle mode. If the next request arrives after the DRAM has been idle
for SRth cycles and has moved to the Self-Refresh state, then the request will have to wait for
512 cycles before being served. The higher the SRth, the lower the chance that a new request
has to wait before being served. For lower SRth, the estimation errors are higher as most of
the requests incur the re-synchronization penalty.
Figure 3.6 shows a comparison of the proposed TSR estimation strategies assuming SRth =
1000. The strategy that observes each bank’s local queue outperforms the global queue based
strategy. Thus, we conclude that using the first approach, i.e., observing each bank’s queue
for TSR estimation, power can be estimated using our proposed power model with reasonable
accuracy.

Figure 3.6: Comparison of the different TSR estimation strategies at SRth = 1000

3.3.3 Power Model Validation


The applications used for validating the proposed models are from the PARSEC Benchmark
suite [23]. Figure 3.7 summarizes the power estimation error percentages for different proposed
models.

Figure 3.7: Estimation Errors for different models

We observe that Model 3, which uses two parameters, is the most accurate of the three models
for power estimation. Model 3 needs more hardware than Model 1 and Model 2, as it has to keep
track of two parameters using two counters, one for each parameter. Among the single
parameter models, Model 1, which is based on the self-refresh fraction, is better.
Chapter 4

Power Optimization Strategies

Power Down and Self-Refresh are the low power modes of DDR3 SDRAM. In Power Down
mode, the power consumption is around 18 mW, and in Self-Refresh mode, the power
consumption is approximately 9 mW. In Idle mode, 102 mW is consumed. Transitioning
back from Self-Refresh mode to active mode takes 512 cycles, whereas the power-up latency in
Power Down mode is only 10 clock cycles. To ensure data retention, the SDRAM cannot stay in
Power Down mode for more than 60 ms. In this chapter, we discuss some power optimization
strategies using these low power modes.

4.1 Exploration for SR Threshold Selection


We present the power and delay variations with the choice of Self-Refresh threshold (SRth) for
a set of benchmarks. Figure 4.1(a) illustrates the average power variation with SRth. The
system’s power consumption keeps increasing with increasing SRth. With a higher SRth,
the system remains in IDLE mode for more cycles than it would at a lower
SRth. Since the IDLE mode power consumption is higher than the SR mode power, delaying
the switch to SR mode increases the system’s power consumption. On the other hand,
the execution time decreases with increasing SRth (Figure 4.1(b)). This decrease is due to the
reduced number of memory requests facing the re-synchronization penalty, since the frequency
of mode switching reduces with increasing SRth.
Based on the designer’s priorities, the SRth value can be selected using the following criteria:

• Lower SRth for minimizing power consumption

• Higher SRth for improving performance

(a) Power Variation (b) Delay Variation

Figure 4.1: TSR estimation error using proposed approaches
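
The two opposing trends can be illustrated with a toy model. The sketch below is not the simulation setup used in this thesis: it assumes a hypothetical list of idle gaps, uses only the Idle and SR power values and the 512-cycle penalty quoted in the text, averages power over the idle periods alone, and ignores the power drawn during re-synchronization.

```python
P_IDLE_MW = 102.0      # Idle mode power quoted in the text
P_SR_MW = 9.0          # Self-Refresh mode power quoted in the text
RESYNC_CYCLES = 512    # re-synchronization penalty quoted in the text


def sweep_point(idle_gaps, sr_th):
    """Return (average idle-period power in mW, total penalty cycles)."""
    idle_cycles = sum(min(gap, sr_th) for gap in idle_gaps)
    sr_cycles = sum(max(gap - sr_th, 0) for gap in idle_gaps)
    wakeups = sum(1 for gap in idle_gaps if gap > sr_th)
    total_cycles = idle_cycles + sr_cycles
    avg_power = (idle_cycles * P_IDLE_MW + sr_cycles * P_SR_MW) / total_cycles
    return avg_power, wakeups * RESYNC_CYCLES


# Hypothetical idle gaps between requests, in cycles.
gaps = [20, 80, 300, 700, 5000]

results = []
for sr_th in (0, 50, 600, 1000):
    power, penalty = sweep_point(gaps, sr_th)
    results.append((sr_th, power, penalty))
    print(f"SRth={sr_th:5d}  avg power={power:6.2f} mW  penalty={penalty} cycles")
```

For this trace the average power rises and the total penalty falls as SRth increases, which is the qualitative behaviour described above.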

4.2 Power Optimization through Mode Switching


The Power Down (PD) mode of the SDRAM is widely used for reducing the main memory power
consumption. DDR3 SDRAM has an additional power saving mode called Self-Refresh (SR)
mode. Here we present a comparison of the different power saving modes. Switching to SR
mode results, on average, in a 37% power reduction (Figure 4.2(a)) over the PD mode, though it
has an additional performance overhead of 11% (Figure 4.2(b)). The Power-Delay
product is minimum for SR mode. Thus, switching to SR mode rather than PD mode during
the idle cycles is better for power savings.

(a) Power Variation (b) Delay Variation

Figure 4.2: Power and Delay variations for different operating modes of DDR3 SDRAM
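
The power-delay comparison can be checked with a short calculation. The sketch below assumes that the two averages quoted above (37% lower power and 11% higher delay for SR relative to PD) can simply be multiplied; the variable names are hypothetical.

```python
# SR mode relative to PD mode, using the averages quoted in the text.
power_ratio = 1 - 0.37   # SR power is 37% lower than PD power
delay_ratio = 1 + 0.11   # SR delay is 11% higher than PD delay

# Power-delay product of SR mode relative to PD mode.
pdp_ratio = power_ratio * delay_ratio
print(f"SR power-delay product = {pdp_ratio:.4f} x PD")
```

Under this assumption the power-delay product of SR mode is about 0.70 times that of PD mode, consistent with SR mode having the lower product.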

In the next sections, we propose a strategy to increase the power savings with reduced
performance overhead by using SR mode with an adaptive threshold technique.

4.3 SRth Selection for Adaptive SR Implementation


We present exploration results for selecting the SRth factors for the adaptive SR implementation.
Figure 4.3 shows the energy (power-delay product) variation with the threshold factors for a set
of benchmarks. We observe that, for all the benchmarks, the minimum energy points are SRth = 0
and SRth = 600. Since our aim is to optimize power without any performance overhead,
selecting the energy optimal points is reasonable, as these are the threshold factors that
balance power savings against performance degradation.

Figure 4.3: Energy variation with SRth for some benchmarks

The choice of SRth = 600 is also supported by the delay values shown in Figure 4.4(a). We
observe that beyond SRth = 600, the delay saturates, leaving no further scope for performance
improvement. Similarly, when considering power, SRth = 0 is the minimum power consuming
threshold factor, and the power consumption increases with increasing SRth (Figure 4.4(b)). The
above analysis is independent of benchmarks and can be theoretically explained as follows.
Since the re-synchronization penalty is 512 cycles, if the time between consecutive requests is
greater than 512 cycles, then it is better to increase the duration in the SR state by switching
to SR mode right after the current request is served. However, if the time gap is less than 512
cycles, then keeping a higher threshold value of 600 is better to avoid the re-synchronization
penalty.

(a) Delay variation (b) Power variation

Figure 4.4: Delay and Power variations for some benchmarks

4.4 Power Optimization using Adaptive SR Threshold


We use the above derived threshold factors for dynamically changing the SRth values to reduce
power but with negligible performance overhead. The choice of SRth is based on the time gap
prediction between the current request and the next request. Let us represent the time gaps by
gapi and the predicted time gap by pred. Figure 4.5 illustrates these.

Figure 4.5: Illustration for time gap prediction

We evaluate two prediction techniques:

• Hist 1 – This technique assumes that the next request will arrive after the same time gap as
the one observed for the current request, i.e.,

pred = gap1

• Hist 2 – This technique uses the last 2 time gaps for predicting the next request arrival
time. The average of the 2 gap values is assumed to be the time gap after the current
request is served.

pred = (gap1 + gap2) / 2
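
The two predictors can be sketched as follows. The function names and the list representation of the gap history are hypothetical, and the sketch assumes that gap1 denotes the most recent gap and gap2 the one before it.

```python
def predict_hist1(gap_history):
    """Hist 1: predict the next gap as the most recent gap (gap1)."""
    return gap_history[-1]


def predict_hist2(gap_history):
    """Hist 2: predict the next gap as the mean of the last two gaps."""
    gap1, gap2 = gap_history[-1], gap_history[-2]
    return (gap1 + gap2) / 2


# Hypothetical history in cycles, oldest first: a long gap, then a short one.
history = [400, 30]
print(predict_hist1(history))  # 30
print(predict_hist2(history))  # 215.0
```

With this history, Hist 1 follows the single short gap while Hist 2 is pulled towards the earlier long gap, so the two techniques can disagree about whether the next gap will be short.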

If the predicted gap (pred) between the requests is less than 50, we set SRth = 600, so that
requests arriving within short intervals do not incur any re-synchronization penalty. However,
if the predicted gap is greater than 50, the system is moved to SR mode as early as possible,
i.e., immediately after serving the current request, to increase the time for which it remains
in the low power SR state. The proposed strategy reduces the time spent in IDLE mode (with
102 mW power consumption) and increases the time spent in SR mode (with 9 mW power
consumption), thereby achieving significant power savings. Figure 4.6(a) compares the power
consumption of the adaptive SRth strategy with that of a fixed SRth on some benchmark
applications. Since the number of mode switches that occur with lower threshold values is
reduced, the re-synchronization penalties are minimized, so this implementation also performs
better than the fixed threshold implementation. On average, the Hist 1 approach achieves
negligible power savings (≤ 2%) (Figure 4.6(a)) but is about 42% faster than the SRth = 50
implementation. The Hist 2 approach, on the other hand, achieves up to 21% reduction in memory
power consumption with a performance improvement of around 40% over the fixed threshold
(SRth = 50) implementation. Thus, the Hist 2 approach predicts the next time gap better.
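
The threshold selection rule and its effect on idle-period power can be illustrated with a minimal Python sketch. The 50 and 600 values and the 102 mW and 9 mW figures come from the text. Everything else is an assumption made for illustration: the function names, the use of SRth = 0 to represent "switch to SR immediately", the treatment of a predicted gap of exactly 50 as a long gap, and the simplified accounting that ignores the re-synchronization penalty and the Power Down mode.

```python
IDLE_POWER_MW = 102.0  # IDLE mode power quoted in the text
SR_POWER_MW = 9.0      # SR mode power quoted in the text
GAP_CUTOFF = 50        # predicted-gap cutoff from the text
HIGH_SRTH = 600        # SRth used when the next request is expected soon


def choose_srth(pred: float) -> int:
    """Return SRth for a predicted gap: 600 if short, 0 (immediate SR) otherwise."""
    return HIGH_SRTH if pred < GAP_CUTOFF else 0


def avg_gap_power_mw(gap: float, srth: float) -> float:
    """Average power over one idle gap (gap > 0).

    Simplified model (an assumption): the memory stays in IDLE until srth
    time units have elapsed, then sits in SR for the rest of the gap.
    """
    idle_time = min(gap, srth)
    sr_time = gap - idle_time
    return (IDLE_POWER_MW * idle_time + SR_POWER_MW * sr_time) / gap


print(choose_srth(30))              # 600
print(choose_srth(365.0))           # 0
print(avg_gap_power_mw(1000, 600))  # 64.8
print(avg_gap_power_mw(1000, 0))    # 9.0
print(avg_gap_power_mw(30, 600))    # 102.0
```

Under these assumptions, a correctly predicted long gap spends its whole duration at the SR power level, whereas a short gap stays in IDLE and never pays for a mode transition.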

(a) Power Variation (b) Delay Variation

Figure 4.6: Power and Delay Comparison for different power optimization strategies
Chapter 5

Conclusion

In this thesis, we presented a power model for DDR3 SDRAM based on memory access counts
and the fractional time spent in the Self-Refresh (SR) state. These parametric values can be
extracted from the statistics generated by the processor itself, without any modification to the
memory controller. Evaluation of this model on PARSEC benchmarks shows the estimation errors
to be within 1%. Evaluating the model as a function of each parameter alone shows that it is
dominated by the fractional time spent in the SR state. Techniques for estimating the
fractional time spent in the SR state are also proposed.
We also explored power optimization strategies that utilize the low power operation modes
available in DDR3 SDRAM. The banks are moved to Power Down mode or SR mode
based on the requests in the queues. Experimental results over a range of benchmarks show that
switching to SR mode results in greater power savings than moving to Power Down mode.
Additional power savings and performance improvements are achieved by using adaptive
threshold techniques with SR mode.

Bibliography

[1] C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, and T. Keller. Energy
management for commercial servers. In Computer, volume 36, pages 39–48, 2003.

[2] Yong-Ha Park, Jeonghoon Kook, and Hoi-Jun Yoo. Embedded DRAM (eDRAM) power-
energy estimation for system-on-a-chip (SoC) applications. In Proceedings of VLSI De-
sign, pages 625–630, 2002.

[3] Jinsong Ji, Chao Wang, and Xuehai Zhou. System-Level Early Power Estimation for
Memory Subsystem in Embedded Systems. In Fifth IEEE International Symposium on
Embedded Computing, pages 370–375, 2008.

[4] Youngjin Cho, Younghyun Kim, Sangyoung Park, and Naehyuck Chang. System-level
power estimation using an on-chip bus performance monitoring unit. In IEEE/ACM Inter-
national Conference on Computer-Aided Design, pages 149–154, 2008.

[5] K. Chandrasekar, B. Akesson, and K. Goossens. Improved Power Modeling of DDR
SDRAMs. In Euromicro Conference on Digital System Design (DSD), pages 99–108,
2011.

[6] Micron Technology Inc. DDR3 SDRAM 1GB Data Sheet. Technical report, 2006.

[7] Alvin R. Lebeck, Xiaobo Fan, Heng Zeng, and Carla Ellis. Power Aware Page Alloca-
tion. In Proceedings of the Ninth International Conference on Architectural Support for
Programming Languages and Operating Systems, pages 105–116, 2000.

[8] V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, and M.J. Irwin. DRAM
energy management using software and hardware directed power mode control. In
International Symposium on High-Performance Computer Architecture, pages 159–169, 2001.

[9] V. De La Luz, M. Kandemir, and I. Kolcu. Automatic Data Migration for Reducing Energy
Consumption in Multi-bank Memory Systems. In Proceedings of the Design Automation
Conference, pages 213–218, 2002.


[10] Pin Zhou, Vivek Pandey, Jagadeesan Sundaresan, Anand Raghuraman, Yuanyuan Zhou,
and Sanjeev Kumar. Dynamic Tracking of Page Miss Ratio Curve for Memory Man-
agement. In Proceedings of the International Conference on Architectural Support for
Programming Languages and Operating Systems, pages 177–188, 2004.

[11] Mingli Xie, Dong Tong, Yi Feng, Kan Huang, and Xu Cheng. Page policy control with
memory partitioning for DRAM performance and power efficiency. In IEEE International
Symposium on Low Power Electronics and Design (ISLPED), pages 298–303, Sept 2013.

[12] Chun-Gi Lyuh and Taewhan Kim. Memory Access Scheduling and Binding Considering
Energy Minimization in Multi-bank Memory Systems. In Proceedings of the Design
Automation Conference, pages 81–86, 2004.

[13] H. Koc, O. Ozturk, M. Kandemir, S. H. K. Narayanan, and E. Ercanli. Minimizing Energy
Consumption of Banked Memories Using Data Recomputation. In Proceedings of the
International Symposium on Low Power Electronics and Design, pages 358–362, 2006.

[14] O. Ozturk, G. Chen, M. Kandemir, and M. Karakoy. Cache Miss Clustering for Banked
Memory Systems. In Proceedings of the International Conference on Computer-aided
Design, pages 244–250, 2006.

[15] Ahmed M. Amin and Zeshan A. Chishti. Rank-aware Cache Replacement and Write
Buffering to Improve DRAM Energy Efficiency. In Proceedings of the International
Symposium on Low Power Electronics and Design, pages 383–388, 2010.

[16] Chih-Yen Lai, Gung-Yu Pan, Hsien-Kai Kuo, and Jing-Yang Jou. A read-write aware
DRAM scheduling for power reduction in multi-core systems. In Asia and South Pacific
Design Automation Conference (ASP-DAC), pages 604–609, Jan 2014.

[17] I. Hur and C. Lin. A comprehensive approach to DRAM power management. In Inter-
national Symposium on High Performance Computer Architecture, pages 305–316, Feb
2008.

[18] Gervin Thomas, Karthik Chandrasekar, Benny Akesson, Ben Juurlink, and Kees
Goossens. A Predictor-Based Power-Saving Policy for DRAM Memories. In Proceedings
of the Euromicro Conference on Digital System Design, pages 882–889, 2012.

[19] S. Aravind. High Level Power Modeling for DDR3 SDRAM. Technical report, IIT Delhi,
2012.

[20] Gem5 Simulator. http://www.gem5.org/.

[21] RUBY Memory System Simulator. http://www.m5sim.org/Ruby/.

[22] Vishal Patel. Power Optimization for the DDR3 SDRAM System. Technical report, IIT
Delhi, 2013.

[23] PARSEC Benchmarks. http://www.m5sim.org/PARSEC_benchmarks/.
