GPU Architecture Optimization For Mobile Computing

This document discusses the optimization of GPU architecture for mobile computing, focusing on power gating techniques to reduce leakage power in GPU cache arrays. It presents measurements of leakage power in different operational modes and emphasizes the importance of power efficiency in mobile devices due to their limited power supply. The findings indicate a 73.3% reduction in leakage power in sleep mode compared to active mode, highlighting the need for effective power management in modern GPUs.

Abdulsami Aldahlawi¹, Kyung Ki Kim², Yong-Bin Kim¹
¹ Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
² Department of Electronic Engineering, Daegu University, Gyeongsan, Gyeongbuk, South Korea
aldahlawi, [email protected], [email protected]

Abstract— Graphical Processing Units (GPUs) are often criticized for the high power consumption that accompanies the massive performance they deliver. As GPUs enter the mobile market, tighter power constraints apply. In this work, we evaluate power gating techniques for GPU cache arrays. The leakage power is measured at 2.28 μW in active mode, 0.61 μW in sleep mode (26.7% of active-mode leakage), and 0.034 μW in off mode (1.5% of active-mode leakage), at a 1.0 V power supply using a 45 nm standard CMOS process.

Keywords: Graphical Processing Units (GPU); Power Gating; Leakage Power.

I. INTRODUCTION
The demand for higher performance and computing capability is steadily increasing. Scientific simulations, sophisticated graphic rendering, and big data processing are all examples of modern applications requiring massively parallel processing. Because the execution model of Central Processing Units (CPUs) is based on sequential processing, these applications would require a significant amount of time if executed on a CPU. With the easier programmability of current GPUs, however, the era of General-Purpose GPUs (GPGPUs) has blossomed. Furthermore, the need for GPUs is expanding to the mobile market, as smartphones are now capable of running multimedia processing applications, 3D graphics, online video games, and more. With the limited power supply in mobile devices, strict power constraints must be met to ensure an adequate power-performance relationship when running these applications.

II. BACKGROUND

A. General GPU Architecture
GPUs consist of multiple streaming multiprocessors (SMs). Each SM consists of multiple functional units that perform different tasks. These functional units are as follows:
• Scalar Processors: each consists of an Arithmetic and Logic Unit (ALU) and a single-precision Floating-Point Unit (FPU).
• Double-Precision Unit: used for double-precision operations.
• Special-Function Unit: used for fast evaluation of functions such as sine, cosine, square root, complex numbers, etc.
• Load-Store Unit: used to calculate the source and destination addresses for memory requests.
Each SM has its own private Level 1 (L1) cache, whereas all SMs in a GPU share a common Level 2 (L2) cache. Fig. 1 shows a general architecture of modern GPUs [1].

Figure 1. GPU Architecture

B. GPU Memories
The memory hierarchy in a GPU differs from that of a CPU in a few aspects. In CPUs, the register file is loaded from the L1 cache, which is loaded from the L2 cache, which in turn is loaded from main memory. The GPU memory hierarchy, however, has more components to allow efficient execution at the thread, block, and grid levels. Fig. 2 shows the hierarchy of GPU memory [1].

Figure 2. GPU Memory Hierarchy

978-1-7281-2478-0/19/$31.00 ©2019 IEEE 247

ISOCC 2019
The use of shared memory allows efficient communication at the thread-block level. If it is used properly, the L1 cache is accessed less often and therefore spends more cycles idle. Fig. 3 shows the percentage of total execution time during which the L1 cache is idle for several benchmarks. Power gating of the L1 cache arrays will be used to reduce leakage power consumption.

Figure 3. Percentage of idle L1 cache time of total execution time of 20 common benchmarks applications

III. LEAKAGE POWER REDUCTION FOR GPU CACHE

A. Cache Array Model
A common method of reducing leakage power in CMOS circuits is the use of a higher-threshold-voltage transistor, often called a sleep transistor. Rusu et al. [2] explained the use of coupled sleep transistors to achieve data retention as well as leakage power saving. Fig. 4 shows the model used to analyze this method for a GPU L1 cache array.

Figure 4. SRAM Cache Array

The control logic generating the S1 and S2 signals takes its input from the warp scheduler. If the SM has completed its execution, both signals are off and the data is lost. During idle cycles, however, S1 is on and S2 is off, which retains the data and saves some leakage power.

B. 6T SRAM Cell Sizing and Sleep Transistor Sizing
The 6T SRAM cell sizing (M1-M6) has been determined to ensure proper read and write operations using 45 nm standard CMOS technology and to satisfy the 1 GHz clock timing constraints. The sleep transistors, however, are attached to a 32-byte array. The model used to size those transistors is the sum of all low-Vt transistors in an array, which is used to estimate the voltage rise caused by the current rush in the worst-case scenario where both sleep transistors turn on at the same time [3]. The equivalent 6T SRAM sizing is shown in Table I. Fig. 5 shows the voltage bump at the virtual ground node.

TABLE I. EQUIVALENT CACHE ARRAY SIZING

Transistor   M1   M2   M3   M4   M5   M6
Size (μm)    56   56   28   28   42   42

Figure 5. Voltage Bump Measurements at different Sleep Transistors Sizes

C. Simulation Results
The sleep transistors are sized at W/L = 6. This ensures that the voltage bump does not exceed 0.1 V and that the delay stays below 2 ns. Using these values, the leakage power consumption and the amount saved are summarized in Table II.

TABLE II. LEAKAGE POWER CONSUMPTION AT DIFFERENT MODES OF OPERATION

Mode     Leakage Power in SRAM Array
Active   2.28 μW (100%)
Sleep    0.61 μW (26.7%)
Off      0.034 μW (1.5%)

IV. CONCLUSION
The powerful computing capabilities of modern GPUs come at the price of increased power consumption. Because power supplies in mobile devices are limited and leakage accounts for a significant share of the total power consumed by devices in modern CMOS technologies, a leakage power saving technique was applied to GPU cache memory. Using this technique, the estimated leakage power saving in sleep mode is about 73.3% compared to normal active mode.

REFERENCES
[1] J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips. “GPU computing”. Proceedings of the IEEE, 96(5), 2008.
[2] S. Rusu, S. Tam, H. Muljono, J. Stinson, D. Ayres, J. Chang, R. Varada, M. Ratta, S. Kottapalli, and S. Vora. “Power Reduction Techniques for an 8-core Xeon Processor”. In Proc. IEEE ESSCIRC, 2009, 340–343.
[3] J. Kao, A. Chandrakasan, and D. Antoniadis. “Transistor Sizing Issues and Tool For Multi-Threshold CMOS Technology”. In Proc. Design Automation Conference, 2014.
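
As a supplementary illustration of the S1/S2 mode control and the leakage figures reported for the SRAM array, the following minimal Python sketch may help. It is not the authors' simulation setup. The names `Mode`, `sleep_signals`, `saving_vs_active`, and `average_leakage_uw` are hypothetical, and the sketch assumes that both sleep signals are on in active mode (the text only states the idle and finished cases), that mode transitions are instantaneous, and that wake-up energy is ignored. The off-mode value of 0.034 μW follows the abstract, which matches the stated 1.5% of active-mode leakage.

```python
from enum import Enum

# Leakage power per mode in microwatts, as reported for the SRAM array.
# The off-mode value follows the abstract (0.034 uW, i.e. 1.5% of active).
LEAKAGE_UW = {"active": 2.28, "sleep": 0.61, "off": 0.034}


class Mode(Enum):
    ACTIVE = "active"  # cache array is being accessed
    SLEEP = "sleep"    # idle cycles: data is retained
    OFF = "off"        # SM finished execution: data is lost


def sleep_signals(mode):
    """Return the (S1, S2) control signals for a mode.

    The text fixes two cases: idle -> S1 on, S2 off; finished -> both off.
    Driving both signals on in active mode is an assumption of this sketch.
    """
    return {
        Mode.ACTIVE: (True, True),
        Mode.SLEEP: (True, False),
        Mode.OFF: (False, False),
    }[mode]


def saving_vs_active(mode):
    """Fractional leakage saving of `mode` relative to active mode."""
    active = LEAKAGE_UW["active"]
    return (active - LEAKAGE_UW[mode.value]) / active


def average_leakage_uw(idle_fraction):
    """Time-averaged leakage when the array sleeps for `idle_fraction` of the run.

    Simplified model: no transition cost, only active and sleep modes.
    """
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be in [0, 1]")
    return ((1.0 - idle_fraction) * LEAKAGE_UW["active"]
            + idle_fraction * LEAKAGE_UW["sleep"])


if __name__ == "__main__":
    for mode in Mode:
        s1, s2 = sleep_signals(mode)
        print(f"{mode.value:6s} S1={s1!s:5s} S2={s2!s:5s} "
              f"saving={saving_vs_active(mode):.1%}")
    # 0.5 is a hypothetical idle fraction, not a value read from Fig. 3.
    print(f"average leakage at 50% idle: {average_leakage_uw(0.5):.3f} uW")
```

With the rounded values above, the sleep-mode saving evaluates to roughly 73%, in line with the reported 73.3% up to rounding of the published figures. The weighted average shows how the idle fraction of the L1 cache would scale the benefit under these simplifying assumptions.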
