
GPU Architecture

• Sequential Execution Model


int a[N];                  // N is large
for (i = 0; i < N; i++)
    a[i] = a[i] * fade;

[Figure: a single flow of control (one thread) advancing through time, one instruction at a time]


Optimizations possible at the machine level


• Data Parallel Execution Model / SIMD


int a[N];                  // N is large
for all elements do in parallel
    a[index] = a[index] * f;

• Single Program Multiple Data / SPMD


int a[N];                  // N is large
for all elements do in parallel
    if (a[i] > threshold) a[i] *= f;
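In CUDA, each of these parallel loops becomes a kernel in which every thread handles one element. A minimal sketch, assuming device-resident arrays; the names fadeKernel and thresholdKernel are illustrative, not from the slides:

__global__ void fadeKernel(int *a, int f, int N) {
    // SIMD style: every thread performs the same operation on its element
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global element index
    if (i < N) a[i] = a[i] * f;
}

__global__ void thresholdKernel(int *a, int threshold, int f, int N) {
    // SPMD style: same program, but threads may take different branches
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N && a[i] > threshold) a[i] *= f;
}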


• Programmer’s view – Typical System

[Figure: CPU with registers and caches, connected to main memory at 12.8–31.92 GB/s, 8 B per transfer]

• What is a GPU
  – Specialized processor for graphics
  – Massively parallel: lots of "read data, calculate, write" operations
  – Used to be fixed function
  – Becoming more programmable

• What is CUDA
  – A C extension for programming NVIDIA GPUs
  – Straightforward to learn
  – The challenge is in getting performance
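The main syntactic addition over plain C is the triple-chevron kernel launch. A hedged sketch, reusing the fadeKernel example above (d_a is assumed to point to GPU memory):

int threadsPerBlock = 256;                                 // a common choice
int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;  // round up to cover all N elements
fadeKernel<<<blocks, threadsPerBlock>>>(d_a, f, N);        // launch blocks x threadsPerBlock threads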


• Programmer’s view with GPU

[Figure: CPU connected to its memory at 12.8–31.92 GB/s (8 B per transfer); CPU–GPU link at ~3 GB/s; GPU connected to GPU memory at 141 GB/s; GPU memory is 1 GB on our systems]

• Programmer’s view with GPU


[Figure: timeline of CPU and GPU activity]
1. Copy data to GPU memory
2. Launch GPU threads
3. Synchronize with GPU
4. Copy results from GPU memory
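These four steps map directly onto the CUDA runtime API. A sketch, assuming the fadeKernel from earlier, an existing host array h_a of N ints, and the launch configuration computed above:

int *d_a;
size_t bytes = N * sizeof(int);
cudaMalloc(&d_a, bytes);                                // allocate GPU memory
cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);    // 1. copy to GPU mem
fadeKernel<<<blocks, threadsPerBlock>>>(d_a, f, N);     // 2. launch GPU threads
cudaDeviceSynchronize();                                // 3. synchronize with GPU
cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);    // 4. copy from GPU mem
cudaFree(d_a);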


• Structure: CPU vs. GPU

• What does the programmer need to know?



• Programmer’s view: GPU Architecture


• Threads / Blocks / Grid


Block size = 12
#blocks = 5

Block 0: a[0]…a[11]

Block 4: a[48] .. a[59]

a[48]

a[59]
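Each thread derives which element it owns from its block and thread indices. A sketch for the layout above (the kernel name and per-element work are illustrative):

__global__ void kernel(int *a) {
    // with blockDim.x = 12: thread 3 of block 4 gets i = 4*12 + 3 = 51
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] = a[i] * 2;                     // illustrative per-element work
}

// launched as kernel<<<5, 12>>>(d_a); the 60 threads cover exactly a[0] … a[59]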



• Memory Hierarchy
  – Anything declared inside the kernel is per-thread (registers / local memory)
  – __shared__ int … declares per-block shared memory
  – __device__ int … declares a variable in global memory
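A sketch of where declarations land; the variable names are illustrative, and the kernel assumes it is launched with exactly 12 threads per block:

__device__ int g_flag;                   // global memory: visible to all threads

__global__ void kernel(int *a) {
    __shared__ int tile[12];             // shared memory: one copy per block
    int i = threadIdx.x;                 // plain local variable: per-thread register
    tile[i] = a[blockIdx.x * blockDim.x + i];
    __syncthreads();                     // whole block finishes loading first
    a[blockIdx.x * blockDim.x + i] = tile[blockDim.x - 1 - i];  // reverse within block
}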


• Performance: Programmer's view



GPU Computing
• GPU computing is the use of a GPU (graphics processing unit) as a coprocessor to accelerate CPUs for general-purpose scientific and engineering computing.
• The GPU accelerates applications running on the CPU by offloading some of the compute-intensive and time-consuming portions of the code.
• The rest of the application still runs on the CPU. From a user's perspective, the application runs faster because it is using the massively parallel processing power of the GPU to boost performance. This is known as "heterogeneous" or "hybrid" computing.


• A CPU typically has four to eight large cores, while a GPU has hundreds of smaller cores.
• Together, they crunch through the data in the application. This massively parallel architecture is what gives the GPU its high compute performance.
• A number of GPU-accelerated applications provide an easy way to access high-performance computing (HPC).



• A GPU uses many lightweight processing cores, leverages data parallelism, and has high memory throughput.
• While the specific components vary by model, fundamentally most modern GPUs use a single instruction, multiple data (SIMD) stream architecture.


Data Parallelism
• Modern applications process large amounts of data, which incurs significant execution time on sequential computers. Pixel processing is one example: consider an application that converts sRGB pixels to grayscale. To process a 1920x1080 image, it must process 2,073,600 pixels.
• Processing all those pixels on a traditional uniprocessor CPU takes a very long time, since the work is done sequentially. (The time taken is proportional to the number of pixels in the image.)
• It is also wasteful: the operation performed on each pixel is the same; only the data differs (SPMD).
• Since processing one pixel is independent of processing any other pixel, all the pixels can be processed in parallel, as in the sketch below.
• If we use 2,073,600 threads ("workers") and each thread processes one pixel, the task can in principle be reduced to near-constant time.
• Millions of such threads can be launched on modern GPUs.
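A hedged sketch of the grayscale conversion as a CUDA kernel, one thread per pixel. The name toGray is illustrative, and the Rec. 709 luma weights are one common approximation (a strict sRGB conversion would linearize the channels first):

__global__ void toGray(const unsigned char *rgb, unsigned char *gray, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per pixel
    if (i < n) {
        float r = rgb[3*i], g = rgb[3*i + 1], b = rgb[3*i + 2];
        gray[i] = (unsigned char)(0.2126f*r + 0.7152f*g + 0.0722f*b);
    }
}

// 1920x1080 image: toGray<<<(2073600 + 255)/256, 256>>>(d_rgb, d_gray, 2073600);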



• Understanding GPU architecture leads us to Nvidia's popular Compute Unified Device Architecture (CUDA) parallel computing platform. CUDA provides an API that lets developers control how GPU resources are used, without the need for specialized graphics-programming knowledge.

Program Structure of CUDA
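A minimal but complete sketch of the typical structure (kernel definition, host allocation, copy in, launch, copy out); the scale kernel and the values are illustrative, reusing the 5-blocks-of-12 layout from earlier:

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(int *a, int f, int N) {            // device code (kernel)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) a[i] *= f;
}

int main(void) {                                          // host code
    const int N = 60;
    int h_a[N];
    for (int i = 0; i < N; i++) h_a[i] = i;

    int *d_a;
    cudaMalloc(&d_a, N * sizeof(int));
    cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);
    scale<<<5, 12>>>(d_a, 2, N);                          // 5 blocks of 12 threads
    cudaMemcpy(h_a, d_a, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);

    printf("a[59] = %d\n", h_a[59]);                      // expect 118
    return 0;
}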



CUDA compute hierarchy

1. Threads
• A thread is the smallest unit of execution in CUDA; it runs on a CUDA core, a small parallel processor that performs floating-point and integer arithmetic in an Nvidia GPU.
• All the data processed by a GPU is processed via CUDA cores. Modern GPUs have hundreds or even thousands of CUDA cores.
• Each thread has its own registers, which are not visible to other threads.
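Core and multiprocessor counts vary by device; the CUDA runtime can report them. A sketch using cudaGetDeviceProperties:

cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);                        // query device 0
printf("%s: %d SMs, warp size %d, up to %d threads per block\n",
       prop.name, prop.multiProcessorCount, prop.warpSize,
       prop.maxThreadsPerBlock);
// the number of CUDA cores per SM depends on the architecture generation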


2. Thread blocks
• As the name implies, a thread block (or CUDA block) is a grouping of threads that can be executed together, in series or in parallel.
• This logical grouping of threads enables more efficient data mapping. Thread blocks share memory on a per-block basis, as the sketch below shows.
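Per-block shared memory is what lets the threads of one block cooperate. A hedged sketch of a block-level sum, assuming blockDim.x is a power of two no larger than 256:

__global__ void blockSum(const int *a, int *partial) {
    __shared__ int buf[256];                    // visible to this block only
    int tid = threadIdx.x;
    buf[tid] = a[blockIdx.x * blockDim.x + tid];
    __syncthreads();                            // all loads finish before any reads

    for (int s = blockDim.x / 2; s > 0; s /= 2) {   // tree reduction
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = buf[0]; // one result per block
}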



3. Kernel grids
• The next layer of abstraction up from thread blocks is the kernel grid: the set of thread blocks launched for a single kernel invocation. Grids can be used to perform larger computations in parallel.
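Grids and blocks can be one-, two-, or three-dimensional via dim3, which is convenient for image-sized work. A sketch for the earlier 1920x1080 example (the block shape is one common choice):

dim3 block(16, 16);                              // 256 threads per block
dim3 grid((1920 + block.x - 1) / block.x,        // 120 blocks across
          (1080 + block.y - 1) / block.y);       // 68 blocks down
// kernel<<<grid, block>>>(...); each thread computes its pixel from
// blockIdx, blockDim, and threadIdx in both x and y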

