Power Modeling and Optimization for DDR3 SDRAM
of
Master of Technology
in
by
Praxal Shah
Entry No. 2012EEN2335
This is to certify that the thesis entitled Power Modeling and Optimization for DDR3 SDRAM,
submitted by Praxal Shah to the Indian Institute of Technology Delhi, for the partial fulfillment
of the award of the degree of Master of Technology, is a record of bona-fide work carried out
by him under my supervision and guidance. The work presented in this thesis has not been
submitted elsewhere either in part or full, for the award of any other Degree or Diploma.
I take this opportunity to thank my mentor, Prof. Preeti Ranjan Panda, for his continuous
support and valuable guidance during the course of this thesis. It is my good fortune to have worked
closely with Sir, who provided innumerable insights and encouraged me throughout the project work.
I would also like to thank Namita Sharma (pursuing a PhD in the Department of Computer
Science & Engineering, IIT Delhi) for her continuous support and suggestions.
I would like to extend my thanks to all faculty members and the Institute for providing a
wonderful learning environment during the entire duration of the course. My heartfelt thanks to
all my friends who made my stay at IIT Delhi a memorable one. Last but not least, I thank
my parents for their constant encouragement and support.
Praxal Shah
IIT Delhi, India
Abstract
Power optimization has become critical in modern systems with high component densities.
The memory subsystem consumes a considerable fraction of the overall power, thereby making
memory power optimization an important area of exploration. Dynamic power management
techniques require power estimates for the decision on the policies to be adopted. This creates
a need for highly accurate power models.
In this thesis, we propose a high level power model for DDR3 SDRAM that can be used
for implementing dynamic power management policies. We also propose power optimization
techniques exploiting the two low power operating modes – Power Down and Self Refresh.
The power savings are proportional to the time spent in the low power modes. With the SR
mode being the lowest power operating mode but with high performance overhead, we propose
an adaptive threshold technique to increase the overall gain. The proposed technique improves
the power savings by increasing the time spent in Self-Refresh mode and the performance by
reducing the re-synchronization penalties by avoiding transitions to SR mode for short IDLE
periods.
Contents
Certificate
Acknowledgments
Abstract
List of Figures
1 Introduction
1.1 SDRAM Architecture
1.2 Memory Controller
1.3 Contributions
1.4 Organization of the thesis
2 Related Work
2.1 Power Modeling Techniques
2.2 DRAM Power Optimization Strategies
5 Conclusion
Bibliography
Chapter 1
Introduction
Power optimization has become one of the major challenges in modern mobile, embedded, and
wireless devices. The drive to increase processor performance and memory density is leading
to increased power consumption. With an increasing number of cores on a single chip for
performance enhancement, the data access, storage, and computation requirements are growing
exponentially. To meet these requirements, main memory with high bandwidth and capacity
is needed. Studies on the power consumption distribution in smartphones and datacenters [1]
have shown that the memory sub-system consumes around 40% of the total power.
In servers, the memory consumes around 1.5 times the core power [1]. Thus, with
memory contributing a major fraction of the system’s power dissipation, power management in
this sub-system becomes an important problem to be addressed. Many dynamic power
management techniques require an estimation model for accurate decision making. In this thesis,
we build and validate a power estimation model. We also explore the effect of the threshold
factor, i.e., the number of cycles for which the banks remain idle before switching to a power
saving state, on the system’s power consumption. This exploration gives a direction for power
optimization of the DDR3 SDRAM system under consideration.
In the next sections, we briefly discuss the DDR3 SDRAM architectural details, the state
diagram and the memory controller operational details along with the different power saving
modes to refresh the basics in the context of this thesis.
the row is selected by the row decoder and brought into a row buffer, followed by the column
decoder selecting the required column.
Figure 1.1(b) shows a simplified state diagram of the DDR3 SDRAM. We discuss below
the operations in these states:
• ACTIVE: In this state, a particular row in a bank is selected and the entire row is moved to
the row buffer, where it stays until the next PRECHARGE.
• READ/WRITE: In this state, a burst READ/WRITE is initiated from/to the row buffer. Once
the burst READ/WRITE completes, the SDRAM moves to ACTIVE state and waits for
the next command.
• PRECHARGE: Before opening another row for a read operation, it is necessary to write
back data from the row buffer to the last accessed row, since reading is destructive in
SDRAM. This operation is performed in PRECHARGE state.
• REFRESH: Each bank in SDRAM is periodically refreshed to restore the charge lost
from the memory cells. All the banks must be in IDLE state before a REFRESH command
is issued.
• SELF-REFRESH: SELF-REFRESH (SR) is a low power state in SDRAM. All the banks
should be in IDLE state before entering self-refresh mode.
• POWER DOWN: POWER DOWN is another power saving state. Compared to the SELF-REFRESH
state, the power consumption in this state is higher. Further, the SDRAM is
not refreshed in this state. Thus, staying in POWER DOWN state for a long duration may
result in data loss.
The two ends of the memory controller operate at different speeds, so a queue is needed
to store the requests. Last level cache misses arrive at the memory controller and are
stored in the queue. In this queue, the memory controller may re-arrange memory requests in
order to save power and improve performance. After the requests are reordered, the corresponding
commands (ACTIVE, PRECHARGE, READ, WRITE, etc.) are issued to the SDRAM,
observing the appropriate timing protocol.
1.3 Contributions
We make the following contributions.
• We propose a power model of the DDR3 SDRAM based on memory access counts and
time spent in self-refresh state, and validate the model on PARSEC benchmarks.
• We present an exploration for the SR threshold selection, the number of cycles after which
the memory bank should be moved to SR mode, to minimize the overheads involved with
state switching.
• We propose a power optimization technique that uses the above exploration results for
switching across the different operating modes of DDR3 SDRAM.
Chapter 2
Related Work
In this chapter, we survey the existing works related to power modeling methodologies and the
power optimization strategies for DRAM.
moved to the same operating mode or controlled separately based on the time between the
accesses to the chip. De la Luz et al. [8] investigated software- and hardware-based techniques
for controlling DRAM states. Software-based techniques include transitioning to different states
based on idle period detection through evaluation of all the choices, and grouping the array
variables with similar lifetime access patterns and mapping them to the same bank. Different
threshold prediction techniques, such as adaptive, constant, and history-based techniques, are
used for transitioning between the different operating modes.
De la Luz et al. [9] proposed a dynamic data migration strategy that groups the arrays with
similar access counts across the sampling points. Mapping a group to the same bank helps in
reducing the number of banks to be kept active. Zhou et al. [10] analyzed the cache misses vs.
memory size curves to determine the maximum memory size required. Once the ratio becomes
constant, the maximum memory size required can be determined and accordingly other banks
can be powered down. Mapping the applications to banks in a rank according to their memory
access intensity and row-buffer locality helps in keeping some ranks to low power mode for
longer periods of time, thereby saving power [11].
Lyuh et al. [12] explored the scheduling and binding techniques to reduce the active time
for a bank. Data recomputation [13] was another technique to keep the banks in low power
mode for longer durations. Code re-structuring helps in grouping the cache hits and misses
during execution [14]. This leads to clustering of memory accesses and memory idle cycles
giving an opportunity to keep banks in low power states for longer times. Amin et al. [15]
proposed a cache replacement policy to enable longer low power states for DRAM ranks. When
conflict misses occur, the cache lines corresponding to high priority ranks are not written back
to memory and instead other lines are moved to memory to have high priority ranks in low
power mode for longer time.
Read requests affect the system performance more than the write requests. This fact was
explored by Lai et al., who proposed a read-write aware throttling mechanism [16]. If any read
request exists in the request queue, then the targeted rank is activated and commands for it are
forwarded to command queues. The request queues are checked at defined intervals and ranks
are kept powered off for intervals until any read request for them is received. This helps in
keeping ranks in low power for longer time. Hur et al. also proposed a queue aware power-
down mechanism where the analysis is done rank wise [17]. Each time a request to a bank in
the rank is received, the counter’s value is incremented by the latency of the new command. If
there is no request pending for the corresponding rank in the central queue, the rank is powered
down.
Different power saving modes have different characteristics. For example, the power
consumption in Power-Down mode is 18 mW, while in SR mode it is 9 mW. On the other hand,
the power-up latency for the former is 10 clock cycles, while it is 512 clock cycles for the
latter [6]. Thomas et al. [18] proposed a power saving policy that exploits the best of both
modes. A history table is used to predict the forthcoming idle period duration and accordingly
switch to a power mode. In this work, we explore the effect of the Self-Refresh threshold on
power and performance. From the exploration results, we conclude that instead of using a fixed
threshold throughout, an adaptive threshold technique results in increased power savings and
performance improvement.
Chapter 3
Proposed Power Model
This work is an extension of the earlier research [19], in which the power model for DDR3
SDRAM was developed using trace driven execution on the Alpha architecture. We re-define the
DDR3 SDRAM power model for a system with an X86 architecture and hierarchical memory using
the same methodology and parameters. For developing the model, we use a training set
comprising the following benchmarks: cjpeg, crc, typeset, fir, patricia, rawaudio, and linux boot.
Every 1M simulation cycles, we extract the power values, state counter values, number of
last level cache misses, and other simulation statistics. The Self-Refresh threshold is the cycle count
in idle state after which the DRAM banks switch to self-refresh mode. The value of the self-refresh
threshold (SRth) should be chosen while taking into account its effect on the performance as well
as the power consumption of the memory. We have:
\[
SR_{th} = \frac{Power_{SR} \times Cycles}{Power_{Idle}} = \frac{9.5 \times 512}{102.7} \approx 50 \tag{3.1}
\]
• Model 1: which is a function of only the self-refresh fraction – Using curve fitting on the
simulation results for the training set (Figure 3.1), the synthesized equation is
Figure 3.1: Curve fitting for Power Model based on self-refresh fraction
• Model 2: which is a function of last level cache misses per cycle – Using curve fitting on
the simulation results for the training set (Figure 3.2), the synthesized equation is
Figure 3.2: Curve fitting for Power Model based on last level cache misses per cycle
• Model 3: which uses both the last level cache misses per cycle and self-refresh fraction
Figure 3.3: Curve fitting for Power Model based on both the parameters
• Observing each bank’s queue – In this approach, the queue for each bank is tracked and
if all are empty for more than the defined SRth cycles, self-refresh tracking counter is set.
This approach requires two counters – a 20-bit counter to count up to an epoch
length of 1M cycles, and a 6-bit counter for checking whether all the banks have been
idle for SRth cycles.
• Observing the global request queue – This approach tracks the global queue rather than
each bank’s local queue. It stores the Exit Cycle value, the cycle at which the last request
in the global queue is sent to the respective bank queue, and the Entry Cycle, the cycle
in which a new request arrives in the empty global queue. If the time gap between the
last request exit and new request entry is greater than SRth , then the number of cycles
after moving to SR mode is added to the existing number of cycles in self-refresh state
SRcycles . Thus, the cycle count for self-refresh state is updated by:
The additional hardware required for implementing this approach is: 2 registers for storing
the Entry Cycle and Exit Cycle and an ALU to measure the gap between the requests.
The self-refresh fraction (TSR) is computed as SRcycles / T, where T is the time interval over
which it is measured.
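The global-queue bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class and method names are hypothetical, and the update rule (adding the gap minus SRth to SRcycles when the gap exceeds SRth) is an assumption inferred from the description of "the number of cycles after moving to SR mode".

```python
class GlobalQueueSRTracker:
    """Accumulates estimated self-refresh cycles from global-queue idle gaps."""

    def __init__(self, sr_threshold):
        self.sr_threshold = sr_threshold  # SRth, in cycles
        self.exit_cycle = None            # Exit Cycle of the last drained request
        self.sr_cycles = 0                # running SRcycles total

    def on_queue_drained(self, cycle):
        """Record the Exit Cycle: the last request left the global queue."""
        self.exit_cycle = cycle

    def on_request_arrival(self, cycle):
        """Record the Entry Cycle and credit the cycles spent beyond SRth."""
        if self.exit_cycle is None:
            return  # queue was not empty, so there is no idle gap to measure
        gap = cycle - self.exit_cycle
        if gap > self.sr_threshold:
            # Assumed update rule: only cycles after the switch to SR count.
            self.sr_cycles += gap - self.sr_threshold
        self.exit_cycle = None

    def fraction(self, interval_cycles):
        """TSR = SRcycles / T for a measurement interval of T cycles."""
        return self.sr_cycles / interval_cycles


tracker = GlobalQueueSRTracker(sr_threshold=50)
tracker.on_queue_drained(100)
tracker.on_request_arrival(400)   # gap of 300 cycles exceeds SRth, credits 250
tracker.on_queue_drained(450)
tracker.on_request_arrival(480)   # gap of 30 cycles is below SRth, credits nothing
t_sr = tracker.fraction(1000)     # self-refresh fraction over a 1000-cycle interval
```

The two stored cycle values and the subtraction correspond to the two registers and the ALU mentioned above; the per-bank approach would instead need the counters described for it.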
3.3.1 Setup
The Gem5 architecture simulator [20] integrated with RUBY [21] is used for our simulations.
Gem5 provides support for many processor architectures including X86, Alpha, and ARM.
We use the X86 architecture for all the experiments. RUBY supports detailed simulation models
for the memory subsystem, which includes caches, the memory controller, DMA, and coherence
protocols. Each of the models is highly configurable and flexible. RUBY facilitates cache
configuration by selecting the cache levels, number of caches at each level, cache size, cache line
replacement policy, and cache coherence protocol. On an LLC miss, the request arrives at the
memory controller, which calculates its access time and waiting time in the memory controller
queue. The memory controller also takes care of the refresh interval and other DRAM timing
requirements. DRAM power is estimated from the cycle counts retrieved for each state using the
FSM model with counters for each memory state. The overview of the complete infrastructure is
shown in Figure 3.4.
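As a rough illustration of estimating power from per-state cycle counters, the following Python sketch computes a cycle-weighted average power. It is a simplification under stated assumptions: the function name and state labels are hypothetical, only the three per-state power figures quoted in this thesis are used (102.7 mW idle, 18 mW Power Down, 9.5 mW Self-Refresh), and the actual simulator infrastructure may compute power differently.

```python
# Per-state power in mW; values taken from the figures quoted in the text.
STATE_POWER_MW = {
    "IDLE": 102.7,
    "POWER_DOWN": 18.0,
    "SELF_REFRESH": 9.5,
}


def average_power_mw(state_cycles, state_power_mw=STATE_POWER_MW):
    """Return the cycle-weighted average power over the counted states."""
    total_cycles = sum(state_cycles.values())
    if total_cycles == 0:
        raise ValueError("no cycles counted")
    weighted = sum(cycles * state_power_mw[state]
                   for state, cycles in state_cycles.items())
    return weighted / total_cycles


# One epoch of 1M cycles, split evenly between IDLE and SELF_REFRESH.
epoch_counters = {"IDLE": 500_000, "SELF_REFRESH": 500_000}
avg = average_power_mw(epoch_counters)  # roughly 56.1 mW
```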
We observe that the error is higher for smaller SRth values. This is because of the re-
synchronization penalty of 512 cycles involved with every transition of DRAM state from self-
refresh mode to idle mode. If the next request arrives after the DRAM has been idle for SRth
cycles and moved to Self-Refresh state, then the request will have to wait for 512 cycles before
being served. The higher the SRth , the lower will be the chances for the new request to wait
before being served. For lower SRth , the estimation errors are higher as most of the requests
incur the re-synchronization penalty.
Figure 3.6 shows a comparison of the proposed TSR estimation strategies assuming SRth =
1000. The strategy based on each bank’s local queue outperforms the global queue based strategy.
Thus, we conclude that using the first approach, i.e., observing each bank’s queue for TSR
estimation, power can be estimated using our proposed power model with reasonable accuracy.
Figure 3.6: Comparison of the different TSR estimation strategies at SRth = 1000
We observe that Model 3, which uses two parameters, is the best among the three models for
power estimation. Model 3 needs more hardware than Model 1 and Model 2, as it has to keep
track of two parameters using two counters, one for each parameter. Among the single
parameter models, Model 1, based on the self-refresh fraction, is better.
Chapter 4
Power Optimization Strategies
Power Down and Self-Refresh are the low power modes of DDR3 SDRAM. In Power Down
mode, the power consumption is around 18 mW, and in Self-Refresh mode, it is
approximately 9 mW. In Idle mode, 102 mW is consumed. Transitioning
back from Self-Refresh mode to active mode takes 512 cycles, whereas the power-up latency in
Power Down mode is only 10 clock cycles. To ensure data retention, the SDRAM cannot stay in
Power Down mode for more than 60 ms. In this chapter, we discuss some power optimization
strategies using these low power modes.
Figure 4.2: Power and Delay variations for different operating modes of DDR3 SDRAM
In the next sections, we propose a strategy to increase the power savings with reduced
performance overhead by using SR mode with an adaptive threshold technique.
The choice of SRth = 600 is also supported by the delay values shown in Figure 4.4(a). We
observe that beyond SRth = 600, the delay saturates, leaving no further scope for performance
improvement. Similarly, while considering power, SRth = 0 is the minimum power consuming
threshold and the power consumption increases with increasing SRth (Figure 4.4(b)). The
above analysis is independent of benchmarks and can be theoretically explained as follows.
Since the re-synchronization penalty is 512 cycles, if the time between consecutive requests is
greater than 512 cycles, then it is better to increase the duration in SR state by switching to SR mode
right after the current request is served. However, if the time gap is less than 512 cycles, then keeping
a higher threshold value of 600 is better to avoid the re-synchronization penalty.
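The trade-off in this argument can be illustrated with a small Python sketch that evaluates a single idle gap under a given threshold. It is a simplified model under stated assumptions: the function name is hypothetical, the device is assumed to draw idle power (102 mW) until the threshold expires and SR power (9 mW) afterwards, the energy spent during re-synchronization itself is ignored, and energy is expressed in mW·cycles rather than joules.

```python
IDLE_MW = 102.0        # idle power quoted in the text
SR_MW = 9.0            # self-refresh power quoted in the text
RESYNC_CYCLES = 512    # re-synchronization penalty when leaving SR mode


def gap_cost(gap, sr_threshold):
    """Return (energy in mW*cycles, delay penalty in cycles) for one idle gap."""
    if gap <= sr_threshold:
        # The next request arrives before the switch: no SR entry, no penalty.
        return gap * IDLE_MW, 0
    # Idle until the threshold, then self-refresh for the rest of the gap.
    energy = sr_threshold * IDLE_MW + (gap - sr_threshold) * SR_MW
    return energy, RESYNC_CYCLES


short_gap_eager = gap_cost(300, 0)     # (2700.0, 512): low energy, pays the penalty
short_gap_lazy = gap_cost(300, 600)    # (30600.0, 0): no penalty, stays idle
long_gap_eager = gap_cost(2000, 0)     # (18000.0, 512)
long_gap_lazy = gap_cost(2000, 600)    # (73800.0, 512): same penalty, more energy
```

For the long gap, both thresholds incur the same penalty, so the lower threshold is strictly better in this model; for the short gap, the higher threshold trades energy for the avoided penalty.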
• Hist 1 – This technique assumes that the next request will arrive after the same time gap as
the one observed for the current request, i.e.,
\[ pred = gap_1 \]
• Hist 2 – This technique uses the last 2 time gaps for predicting the next request arrival
time. The average of the 2 gap values is assumed to be the time gap after the current
request is served.
\[ pred = \frac{gap_1 + gap_2}{2} \]
If the predicted gap (pred) between the requests is less than 50 cycles, we set SRth = 600, so that
requests arriving within short intervals do not face any re-synchronization penalty. However,
if the predicted gap is greater than 50 cycles, the system transits to SR mode immediately after
serving the current request, which maximizes the time for which it remains in the low power SR
state. The proposed strategy reduces the time spent in IDLE mode (with 102 mW power consumption)
and increases the duration in SR mode (with 9 mW power consumption), thereby achieving
significant power savings. Figure 4.6(a) shows a comparison of the power consumption with the adaptive
SRth strategy and with a fixed SRth on some benchmark applications. Since the number of mode
switches that would occur with lower threshold values is reduced, thereby minimizing the
re-synchronization penalties, this implementation is also more performance efficient than the
fixed threshold implementation. On average, the Hist 1 approach achieves negligible power
savings (≤ 2%) (Figure 4.6(a)) but is about 42% faster than the SRth = 50 implementation. On
the other hand, with the Hist 2 approach, we achieve up to 21% reduction in memory power
consumption with a performance improvement of around 40% over the fixed threshold (SRth = 50)
implementation. Thus, the prediction of the next time gap is better using the Hist 2 approach.
Figure 4.6: Power and Delay Comparison for different power optimization strategies
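The adaptive threshold selection with the Hist 1 and Hist 2 predictors can be sketched as follows. This is a minimal Python sketch, not the implementation evaluated above: the class and constant names are hypothetical, a predicted gap of exactly 50 cycles is assumed to trigger the immediate switch, and the high threshold is assumed as the default when no history is available yet.

```python
from collections import deque

SHORT_GAP_CYCLES = 50   # predicted gaps below this keep the memory out of SR
HIGH_THRESHOLD = 600    # SRth used when the next request is expected soon
LOW_THRESHOLD = 0       # enter SR mode right after the current request


class AdaptiveThreshold:
    """Chooses SRth from the average of the last `history_len` request gaps."""

    def __init__(self, history_len):
        # history_len = 1 corresponds to Hist 1, history_len = 2 to Hist 2.
        self.gaps = deque(maxlen=history_len)

    def record_gap(self, gap):
        """Store the most recent inter-request gap, in cycles."""
        self.gaps.append(gap)

    def predict(self):
        """Return pred, the mean of the stored gaps, or None without history."""
        if not self.gaps:
            return None
        return sum(self.gaps) / len(self.gaps)

    def threshold(self):
        """Return the SRth to apply after the current request is served."""
        pred = self.predict()
        if pred is None or pred < SHORT_GAP_CYCLES:
            return HIGH_THRESHOLD
        return LOW_THRESHOLD


hist2 = AdaptiveThreshold(history_len=2)
hist2.record_gap(20)
hist2.record_gap(40)
short_choice = hist2.threshold()   # pred = 30, so SRth = 600
hist2.record_gap(1000)
long_choice = hist2.threshold()    # pred = (40 + 1000) / 2 = 520, so SRth = 0
```

Using a bounded deque keeps only the most recent gaps, which mirrors the small amount of history each predictor needs.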
Chapter 5
Conclusion
In this thesis, we presented a power model for DDR3 SDRAM based on memory access counts
and the fractional time spent in Self-Refresh (SR) state. These parametric values can be extracted
from the statistics generated by the processor itself without any modification in the memory
controller. Evaluation of this model on PARSEC benchmarks shows the estimation errors to be
within 1%. Evaluating the model as a function of each parameter individually
shows that the model is dominated by the fractional time spent in SR state.
Techniques for estimating the fractional time spent in SR state are also proposed.
We also explored power optimization strategies that utilize the low power operation modes
available for the DDR3 SDRAM. The banks are moved to Power Down mode or SR mode
based on the requests in the queues. Experimental results over a range of benchmarks show that
switching to SR mode results in greater power savings than moving to Power Down mode.
Additional power savings and performance improvements are achieved by using adaptive threshold
techniques with SR mode.
Bibliography
[1] C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, and T. Keller. Energy
management for commercial servers. In Computer, volume 36, pages 39–48, 2003.
[2] Yong-Ha Park, Jeonghoon Kook, and Hoi-Jun Yoo. Embedded DRAM (eDRAM) power-energy
estimation for system-on-a-chip (SoC) applications. In Proceedings of VLSI Design,
pages 625–630, 2002.
[3] Jinsong Ji, Chao Wang, and Xuehai Zhou. System-Level Early Power Estimation for
Memory Subsystem in Embedded Systems. In Fifth IEEE International Symposium on
Embedded Computing, pages 370–375, 2008.
[4] Youngjin Cho, Younghyun Kim, Sangyoung Park, and Naehyuck Chang. System-level
power estimation using an on-chip bus performance monitoring unit. In IEEE/ACM
International Conference on Computer-Aided Design, pages 149–154, 2008.
[6] Micron Technology Inc. DDR3 SDRAM 1GB Data Sheet. Technical report, 2006.
[7] Alvin R. Lebeck, Xiaobo Fan, Heng Zeng, and Carla Ellis. Power Aware Page Allocation.
In Proceedings of the Ninth International Conference on Architectural Support for
Programming Languages and Operating Systems, pages 105–116, 2000.
[9] V. De La Luz, M. Kandemir, and I. Kolcu. Automatic Data Migration for Reducing Energy
Consumption in Multi-bank Memory Systems. In Proceedings of the Design Automation
Conference, pages 213–218, 2002.
[10] Pin Zhou, Vivek Pandey, Jagadeesan Sundaresan, Anand Raghuraman, Yuanyuan Zhou,
and Sanjeev Kumar. Dynamic Tracking of Page Miss Ratio Curve for Memory Management.
In Proceedings of the International Conference on Architectural Support for
Programming Languages and Operating Systems, pages 177–188, 2004.
[11] Mingli Xie, Dong Tong, Yi Feng, Kan Huang, and Xu Cheng. Page policy control with
memory partitioning for DRAM performance and power efficiency. In IEEE International
Symposium on Low Power Electronics and Design (ISLPED), pages 298–303, Sept 2013.
[12] Chun-Gi Lyuh and Taewhan Kim. Memory Access Scheduling and Binding Considering
Energy Minimization in Multi-bank Memory Systems. In Proceedings of the Design
Automation Conference, pages 81–86, 2004.
[14] O. Ozturk, G. Chen, M. Kandemir, and M. Karakoy. Cache Miss Clustering for Banked
Memory Systems. In Proceedings of the International Conference on Computer-aided
Design, pages 244–250, 2006.
[15] Ahmed M. Amin and Zeshan A. Chishti. Rank-aware Cache Replacement and Write
Buffering to Improve DRAM Energy Efficiency. In Proceedings of the International
Symposium on Low Power Electronics and Design, pages 383–388, 2010.
[16] Chih-Yen Lai, Gung-Yu Pan, Hsien-Kai Kuo, and Jing-Yang Jou. A read-write aware
DRAM scheduling for power reduction in multi-core systems. In Asia and South Pacific
Design Automation Conference (ASP-DAC), pages 604–609, Jan 2014.
[17] I. Hur and C. Lin. A comprehensive approach to DRAM power management. In
International Symposium on High Performance Computer Architecture, pages 305–316, Feb
2008.
[18] Gervin Thomas, Karthik Chandrasekar, Benny Akesson, Ben Juurlink, and Kees
Goossens. A Predictor-Based Power-Saving Policy for DRAM Memories. In Proceedings
of the Euromicro Conference on Digital System Design, pages 882–889, 2012.
[19] S. Aravind. High Level Power Modeling for DDR3 SDRAM. Technical report, IIT Delhi,
2012.
[22] Vishal Patel. Power Optimization for the DDR3 SDRAM System. Technical report, IIT
Delhi, 2013.