
GPU Architecture Optimization For Mobile Computing

Abdulsami Aldahlawi¹, Kyung Ki Kim², Yong-Bin Kim¹
¹Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
²Department of Electronic Engineering, Daegu University, Gyeongsan, Gyeongbuk, South Korea
aldahlawi, [email protected], [email protected]

Abstract— Graphical Processing Units (GPUs) are often criticized for high power consumption, the price of the massive performance they deliver. As GPUs move into the mobile market, stricter power constraints are imposed. In this work, we evaluate power gating techniques for GPU cache arrays. The leakage power in active mode is measured at 2.28 μW, whereas in sleep mode the leakage power is 0.61 μW (26.7% of active-mode leakage) and in off mode 0.034 μW (1.5% of active-mode leakage), at a 1.0 V power supply using a standard 45 nm CMOS process.

Keywords— Graphical Processing Units (GPU); Power Gating; Leakage Power.

I. INTRODUCTION
The demand for higher performance and computing capability is steadily increasing. Scientific simulations, sophisticated graphics rendering, and big data processing are all examples of modern applications that require massive parallel processing. Because the Central Processing Unit (CPU) execution model is based on sequential processing, these applications would take a significant amount of time if executed on a CPU. With the easier programmability of current GPUs, however, the era of General-Purpose GPUs (GPGPUs) has blossomed. Furthermore, the need for GPUs is expanding to the mobile market, as smartphones nowadays are capable of running multimedia processing applications, 3D graphics, online video games, and more. With the limited power supply in mobile devices, however, strict power constraints must be met to ensure an adequate power-performance relationship when running these applications.
II. BACKGROUND

A. General GPU Architecture
GPUs consist of multiple Streaming Multiprocessors (SMs). Each SM consists of multiple functional units that perform different tasks. These functional units are as follows:

• Scalar Processors: each consists of an Arithmetic and Logic Unit (ALU) and a single-precision Floating-Point Unit (FPU).
• Double-Precision Unit: used for double-precision operations.
• Special-Function Unit: used for fast evaluation of functions such as sine, cosine, square root, complex numbers, etc.
• Load-Store Unit: used to calculate the source and destination addresses for memory requests.

Each SM has its own private Level 1 (L1) cache, whereas all SMs in a GPU share a common Level 2 (L2) cache. Fig. 1 shows a general architecture of modern GPUs [1].

Figure 1. GPU Architecture
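To make the execution hierarchy concrete, the following minimal CUDA sketch (ours, not part of the original paper; the kernel name and sizes are illustrative) shows how a grid of thread blocks is launched, with each block scheduled onto one SM and each thread executing on that SM's scalar processors:

// Minimal CUDA sketch (illustrative only; not from the paper).
// Each thread block is dispatched to a single SM; the threads of the block
// execute on that SM's scalar processors.
#include <cuda_runtime.h>

__global__ void scaleArray(float *data, float alpha, int n)
{
    // Global thread index: grid -> block -> thread.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= alpha;   // runs on the SM's single-precision FPU
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    dim3 block(256);                              // 256 threads per block
    dim3 grid((n + block.x - 1) / block.x);       // enough blocks to cover n elements
    scaleArray<<<grid, block>>>(d_data, 2.0f, n);

    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}

How many blocks can be resident on an SM at once is limited by the SM's register file and shared-memory capacity.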

B. GPU Memories
The memory hierarchy of a GPU differs from that of a CPU in a few respects. In a CPU, the register file is loaded from the L1 cache, which is loaded from the L2 cache and, in turn, from main memory. The GPU memory hierarchy, however, has more components, so that execution can proceed efficiently at the thread, block, and grid levels. Fig. 2 shows the hierarchy of GPU memory [1].

Figure 2. GPU Memory Hierarchy
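As a hedged illustration of where these components are visible to software (this fragment is ours, not the paper's), the CUDA qualifiers below map declarations onto the levels of Fig. 2:

// Illustrative CUDA sketch of GPU memory spaces (not part of the original paper).
__constant__ float coeff[16];              // constant memory: read-only, visible to the whole grid

__global__ void hierarchyDemo(const float *in, float *out, int n)
{
    __shared__ float tile[256];            // shared memory: private to one thread block (launch assumed to use 256 threads per block)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (i < n) ? in[i] : 0.0f;      // 'x' is a register, private to one thread; in[] is global memory
    tile[threadIdx.x] = x;                 // stage the global value in shared memory
    __syncthreads();                       // block-level barrier before the tile is reused
    if (i < n)
        out[i] = tile[threadIdx.x] * coeff[0];   // write back to global memory
}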

The use of shared memory allows efficient communication at the thread-block level. If shared memory is used properly, accesses to the L1 cache become less frequent, and the L1 cache therefore sits idle for more cycles. Fig. 3 shows the percentage of idle L1 cache time out of the total execution time when running a set of benchmarks. Power gating of the L1 cache arrays will be used to reduce leakage power consumption during these idle periods.

Figure 3. Percentage of idle L1 cache time of total execution time of 20 common benchmark applications
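A hedged sketch of the reuse pattern behind this observation (the kernel below is ours, not the paper's): a thread block stages a tile of data in shared memory once, and neighboring threads then reuse it instead of issuing repeated loads through the L1 cache.

// Illustrative CUDA sketch (not from the paper): a 3-point stencil in which each
// global element is fetched once per block into shared memory and then reused by
// neighboring threads, reducing traffic through L1 and global memory.
#define BLOCK 256   // threads per block; the launch is assumed to use this size

__global__ void stencil3(const float *in, float *out, int n)
{
    __shared__ float tile[BLOCK + 2];               // block tile plus one halo cell on each side
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + 1;                        // local index inside the tile

    if (g < n)
        tile[l] = in[g];                            // one global read per thread
    if (threadIdx.x == 0 && g > 0)
        tile[0] = in[g - 1];                        // left halo
    if (threadIdx.x == blockDim.x - 1 && g + 1 < n)
        tile[BLOCK + 1] = in[g + 1];                // right halo
    __syncthreads();

    if (g > 0 && g + 1 < n)
        out[g] = (tile[l - 1] + tile[l] + tile[l + 1]) / 3.0f;  // reuse values from shared memory
}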

III. LEAKAGE POWER REDUCTION FOR GPU CACHE

A. Cache Array Model
A common method of reducing leakage power in CMOS circuits is the use of a higher-threshold-voltage transistor, often called a sleep transistor. Rusu et al. [2] explained the use of a coupled sleep transistor to achieve data retention as well as leakage power saving. Fig. 4 shows the model used in the analysis of this method for a GPU L1 cache array.

Figure 4. SRAM Cache Array

The control logic generating the S1 and S2 signals takes its input from the warp scheduler. If the SM has completed its execution, both signals are turned off and the data is lost. During idle cycles, however, S1 is on and S2 is off, which retains the data while still saving some leakage power.
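Relating the control signals to the three operating modes reported later in Table II (the active-mode setting is our assumption; the text specifies only the sleep and off cases):

\[
(S_1, S_2) =
\begin{cases}
(\text{on}, \text{on}) & \text{active mode (assumed)} \\
(\text{on}, \text{off}) & \text{sleep mode: data retained, leakage reduced} \\
(\text{off}, \text{off}) & \text{off mode: data lost, minimum leakage}
\end{cases}
\]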
B. 6T SRAM Cell Sizing and Sleep Transistor Sizing
The 6T SRAM cell sizing (M1-M6) has been determined to ensure proper read and write operations in a standard 45 nm CMOS technology and to satisfy the 1 GHz clock timing constraints. The sleep transistors, however, are attached to a 32-byte array. The model used to size these transistors sums the contributions of all low-Vt transistors in the array to estimate the rise of the virtual ground voltage caused by the current rush in the worst-case scenario, where both sleep transistors turn on at the same time [3]. The equivalent 6T SRAM sizing is shown in Table I. Fig. 5 shows the voltage bump at the virtual ground node.

TABLE I. EQUIVALENT CACHE ARRAY SIZING

Transistor   M1   M2   M3   M4   M5   M6
Size (μm)    56   56   28   28   42   42

Figure 5. Voltage Bump Measurements at Different Sleep Transistor Sizes
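The text does not state the sizing relation explicitly; as a first-order sketch (our assumption, in the spirit of the MTCMOS sizing approach of [3]), a sleep transistor that is fully on operates in its linear region, so the worst-case virtual ground bump can be estimated as

\[
V_{\mathrm{bump}} \approx \frac{I_{\mathrm{rush}}}{\mu_n C_{ox} \left(\frac{W}{L}\right)_{\mathrm{sleep}} (V_{DD} - V_{tn})},
\]

where I_rush is the summed worst-case current of the low-Vt transistors in the 32-byte array. Under this assumption, the W/L chosen in the next subsection is the smallest ratio that keeps V_bump below the 0.1 V target while meeting the delay constraint.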
C. Simulation Results
The W/L value selected for the sleep transistors is W/L = 6. This ensures that the voltage bump does not exceed 0.1 V and that the delay constraint of < 2 ns is satisfied. Using these values, the leakage power consumption in each mode, and the amount saved, is summarized in Table II.

TABLE II. LEAKAGE POWER CONSUMPTION AT DIFFERENT MODES OF OPERATION

Mode     Leakage Power in SRAM Array
Active   2.28 μW (100%)
Sleep    0.61 μW (26.7%)
Off      0.034 μW (1.5%)
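The percentages in Table II, and the saving quoted in the conclusion, follow directly from the measured values:

\[
\frac{0.61\ \mu W}{2.28\ \mu W} \approx 0.267 \;\Rightarrow\; 1 - 0.267 = 0.733\ (73.3\%\ \text{saving in sleep mode}), \qquad
\frac{0.034\ \mu W}{2.28\ \mu W} \approx 0.015\ (1.5\%).
\]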
IV. CONCLUSION
The powerful computing capability brought by modern GPUs comes at the price of increased power consumption. Because mobile devices have limited power supplies, and because leakage accounts for a sizable share of the total power consumed in modern CMOS technologies, a leakage power saving technique was applied to the GPU cache memory. Using this technique, the estimated power saving in sleep mode is about 73.3% compared to normal active mode.

REFERENCES
[1] J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU computing,” Proceedings of the IEEE, vol. 96, no. 5, 2008.
[2] S. Rusu, S. Tam, H. Muljono, J. Stinson, D. Ayres, J. Chang, R. Varada, M. Ratta, S. Kottapalli, and S. Vora, “Power reduction techniques for an 8-core Xeon processor,” in Proc. IEEE ESSCIRC, 2009, pp. 340-343.
[3] J. Kao, A. Chandrakasan, and D. Antoniadis, “Transistor sizing issues and tool for multi-threshold CMOS technology,” in Proc. Design Automation Conference, 2014.
