
GPU Architecture

• Sequential Execution Model


int a[N];                  // N is large
for (i = 0; i < N; i++)
    a[i] = a[i] * fade;

[Figure: a single flow of control (one thread) advancing through time, one instruction at a time]


Optimizations possible at the machine level


• Data Parallel Execution Model / SIMD


int a[N];                  // N is large
for all elements do in parallel
    a[index] = a[index] * f;

• Single Program Multiple Data / SPMD


int a[N];                  // N is large
for all elements do in parallel
    if (a[i] > threshold) a[i] *= f;
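In CUDA, each of these parallel loops becomes a kernel in which every thread handles one element. A minimal sketch, assuming device-resident arrays; the names fadeKernel and thresholdKernel are illustrative, not from the slides:

__global__ void fadeKernel(int *a, int f, int N) {
    // SIMD style: every thread performs the same operation on its element
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global element index
    if (i < N) a[i] = a[i] * f;
}

__global__ void thresholdKernel(int *a, int threshold, int f, int N) {
    // SPMD style: same program, but threads may take different branches
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N && a[i] > threshold) a[i] *= f;
}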


• Programmer’s view – Typical System

[Figure: CPU with registers and caches, connected to main memory at 12.8–31.92 GB/s, 8 B per transfer]

• What is a GPU
  – Specialized processor for graphics
  – Massively parallel: lots of "read data, calculate, write" operations
  – Used to be fixed function
  – Becoming more programmable

• What is CUDA
  – A C extension for programming NVIDIA GPUs
  – Straightforward to learn
  – The challenge is in getting performance
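The main syntactic addition over plain C is the triple-chevron kernel launch. A hedged sketch, reusing the fadeKernel example above (d_a is assumed to point to GPU memory):

int threadsPerBlock = 256;                                 // a common choice
int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;  // round up to cover all N elements
fadeKernel<<<blocks, threadsPerBlock>>>(d_a, f, N);        // launch blocks x threadsPerBlock threads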


• Programmer’s view with GPU

[Figure: CPU connected to its memory at 12.8–31.92 GB/s (8 B per transfer); CPU–GPU link at ~3 GB/s; GPU connected to GPU memory at 141 GB/s; GPU memory is 1 GB on our systems]

• Programmer’s view with GPU


[Figure: timeline of CPU and GPU activity]
1. Copy data to GPU memory
2. Launch GPU threads
3. Synchronize with GPU
4. Copy results from GPU memory
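These four steps map directly onto the CUDA runtime API. A sketch, assuming the fadeKernel from earlier, an existing host array h_a of N ints, and the launch configuration computed above:

int *d_a;
size_t bytes = N * sizeof(int);
cudaMalloc(&d_a, bytes);                                // allocate GPU memory
cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);    // 1. copy to GPU mem
fadeKernel<<<blocks, threadsPerBlock>>>(d_a, f, N);     // 2. launch GPU threads
cudaDeviceSynchronize();                                // 3. synchronize with GPU
cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);    // 4. copy from GPU mem
cudaFree(d_a);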


• Structure: CPU vs. GPU

• What does the programmer need to know?



• Programmer’s view: GPU Architecture


• Threads / Blocks / Grid


Block size = 12
#blocks = 5

Block 0: a[0]…a[11]

Block 4: a[48] .. a[59]

a[48]

a[59]
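Each thread derives which element it owns from its block and thread indices. A sketch for the layout above (the kernel name and per-element work are illustrative):

__global__ void kernel(int *a) {
    // with blockDim.x = 12: thread 3 of block 4 gets i = 4*12 + 3 = 51
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] = a[i] * 2;                     // illustrative per-element work
}

// launched as kernel<<<5, 12>>>(d_a); the 60 threads cover exactly a[0] … a[59]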



• Memory Hierarchy
  – Anything declared inside the kernel is per-thread (registers / local memory)
  – __shared__ int … declares per-block shared memory
  – __device__ int … declares a variable in global memory
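A sketch of where declarations land; the variable names are illustrative, and the kernel assumes it is launched with exactly 12 threads per block:

__device__ int g_flag;                   // global memory: visible to all threads

__global__ void kernel(int *a) {
    __shared__ int tile[12];             // shared memory: one copy per block
    int i = threadIdx.x;                 // plain local variable: per-thread register
    tile[i] = a[blockIdx.x * blockDim.x + i];
    __syncthreads();                     // whole block finishes loading first
    a[blockIdx.x * blockDim.x + i] = tile[blockDim.x - 1 - i];  // reverse within block
}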


• Performance: Programmer's view



GPU Computing
• GPU computing is the use of a GPU (graphics processing unit) as a coprocessor to accelerate CPUs for general-purpose scientific and engineering computing.
• The GPU accelerates applications running on the CPU by offloading some of the compute-intensive and time-consuming portions of the code.
• The rest of the application still runs on the CPU. From a user's perspective, the application runs faster because it is using the massively parallel processing power of the GPU to boost performance. This is known as "heterogeneous" or "hybrid" computing.


• A CPU typically has four to eight large cores, while a GPU has hundreds of smaller cores.
• Together, they crunch through the data in the application. This massively parallel architecture is what gives the GPU its high compute performance.
• A number of GPU-accelerated applications provide an easy way to access high-performance computing (HPC).



• A GPU uses many lightweight processing cores, leverages data parallelism, and has high memory throughput.
• While the specific components vary by model, fundamentally most modern GPUs use a single instruction, multiple data (SIMD) stream architecture.


Data Parallelism
• Modern applications process large amounts of data, which incurs significant execution time on sequential computers. Pixel processing is one example: consider an application that converts sRGB pixels to grayscale. To process a 1920x1080 image, it must process 2,073,600 pixels.
• Processing all those pixels on a traditional uniprocessor CPU takes a very long time, since the work is done sequentially. (The time taken is proportional to the number of pixels in the image.)
• It is also wasteful: the operation performed on each pixel is the same; only the data differs (SPMD).
• Since processing one pixel is independent of processing any other pixel, all the pixels can be processed in parallel, as in the sketch below.
• If we use 2,073,600 threads ("workers") and each thread processes one pixel, the task can in principle be reduced to near-constant time.
• Millions of such threads can be launched on modern GPUs.
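A hedged sketch of the grayscale conversion as a CUDA kernel, one thread per pixel. The name toGray is illustrative, and the Rec. 709 luma weights are one common approximation (a strict sRGB conversion would linearize the channels first):

__global__ void toGray(const unsigned char *rgb, unsigned char *gray, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per pixel
    if (i < n) {
        float r = rgb[3*i], g = rgb[3*i + 1], b = rgb[3*i + 2];
        gray[i] = (unsigned char)(0.2126f*r + 0.7152f*g + 0.0722f*b);
    }
}

// 1920x1080 image: toGray<<<(2073600 + 255)/256, 256>>>(d_rgb, d_gray, 2073600);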



• Understanding GPU architecture leads us to Nvidia's popular Compute Unified Device Architecture (CUDA) parallel computing platform. CUDA provides an API that lets developers control how GPU resources are used, without the need for specialized graphics-programming knowledge.

Program Structure of CUDA
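A minimal but complete sketch of the typical structure (kernel definition, host allocation, copy in, launch, copy out); the scale kernel and the values are illustrative, reusing the 5-blocks-of-12 layout from earlier:

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(int *a, int f, int N) {            // device code (kernel)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) a[i] *= f;
}

int main(void) {                                          // host code
    const int N = 60;
    int h_a[N];
    for (int i = 0; i < N; i++) h_a[i] = i;

    int *d_a;
    cudaMalloc(&d_a, N * sizeof(int));
    cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);
    scale<<<5, 12>>>(d_a, 2, N);                          // 5 blocks of 12 threads
    cudaMemcpy(h_a, d_a, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);

    printf("a[59] = %d\n", h_a[59]);                      // expect 118
    return 0;
}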



CUDA compute hierarchy

1. Threads
• A thread is the smallest unit of execution in CUDA; it runs on a CUDA core, a small parallel processor that performs floating-point and integer arithmetic in an Nvidia GPU.
• All the data processed by a GPU is processed via CUDA cores. Modern GPUs have hundreds or even thousands of CUDA cores.
• Each thread has its own registers, which are not visible to other threads.
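Core and multiprocessor counts vary by device; the CUDA runtime can report them. A sketch using cudaGetDeviceProperties:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);                        // query device 0
printf("%s: %d SMs, warp size %d, up to %d threads per block\n",
       prop.name, prop.multiProcessorCount, prop.warpSize,
       prop.maxThreadsPerBlock);
// the number of CUDA cores per SM depends on the architecture generation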


2. Thread blocks
• As the name implies, a thread block (or CUDA block) is a grouping of threads that can be executed together, in series or in parallel.
• This logical grouping of threads enables more efficient data mapping. Thread blocks share memory on a per-block basis, as the sketch below shows.
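Per-block shared memory is what lets the threads of one block cooperate. A hedged sketch of a block-level sum, assuming blockDim.x is a power of two no larger than 256:

__global__ void blockSum(const int *a, int *partial) {
    __shared__ int buf[256];                    // visible to this block only
    int tid = threadIdx.x;
    buf[tid] = a[blockIdx.x * blockDim.x + tid];
    __syncthreads();                            // all loads finish before any reads

    for (int s = blockDim.x / 2; s > 0; s /= 2) {   // tree reduction
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = buf[0]; // one result per block
}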



3. Kernel grids
• The next layer of abstraction up from thread blocks is the kernel grid: the set of thread blocks launched for a single kernel invocation. Grids can be used to perform larger computations in parallel.
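Grids and blocks can be one-, two-, or three-dimensional via dim3, which is convenient for image-sized work. A sketch for the earlier 1920x1080 example (the block shape is one common choice):

dim3 block(16, 16);                              // 256 threads per block
dim3 grid((1920 + block.x - 1) / block.x,        // 120 blocks across
          (1080 + block.y - 1) / block.y);       // 68 blocks down
// kernel<<<grid, block>>>(...); each thread computes its pixel from
// blockIdx, blockDim, and threadIdx in both x and y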

