Course 7

Programming Languages

Pr. Yahia Benmoussa ([email protected])


GPU Programming

NVIDIA GPU hardware architecture

CUDA Programming model
GPU vs CPU

• The GPU and the CPU exist because they are designed with different goals:
– The CPU is designed to execute a sequence of operations, called a thread, as fast as possible; most of its transistors are devoted to instruction control.
– The GPU is designed to execute thousands of threads in parallel; most of its transistors are devoted to data processing.
GPU architecture
What is CUDA?

• The CUDA parallel programming model is designed to overcome the challenge of transparently scaling parallelism across GPUs with varying numbers of cores, while maintaining a low learning curve for programmers familiar with standard programming languages such as C.
CUDA Programming model

• The CUDA programming model offers three key abstractions:
– a hierarchy of thread groups,
– shared memories,
– barrier synchronization.
• These abstractions are simply exposed to the programmer as a minimal set of language extensions; all three appear in the sketch below.
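As a rough illustration (a minimal sketch, not from the course; the kernel name and the fixed block size of 256 are assumptions), the toy kernel below touches all three abstractions: threads are grouped into a block, cooperate through __shared__ memory, and synchronize at the __syncthreads() barrier.

    // Sketch: per-block sum of 256 floats, exercising the three abstractions.
    __global__ void blockSum(const float* in, float* out)
    {
        __shared__ float buf[256];            // shared memory, visible to the whole block
        int t = threadIdx.x;                  // this thread's place in its group
        buf[t] = in[blockIdx.x * 256 + t];
        __syncthreads();                      // barrier: wait until every thread has written

        for (int stride = 128; stride > 0; stride /= 2) {
            if (t < stride)
                buf[t] += buf[t + stride];    // tree reduction within the block
            __syncthreads();                  // barrier between reduction steps
        }
        if (t == 0)
            out[blockIdx.x] = buf[0];         // one partial sum per block
    }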
Thread Hierarchy

• The programmer can partition the problem into:
– coarse sub-problems that can be solved independently, in parallel, by blocks of threads,
– finer pieces within each sub-problem that can be solved cooperatively, in parallel, by all threads within a block.
• A grid is composed of multiple blocks, as the sketch below illustrates.
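A hedged sketch of this two-level decomposition (the kernel name, tile size, and image layout are assumptions): the grid provides the coarse partition into independent blocks, and the threads of each block cooperate on the finer pieces of one tile.

    __global__ void fill(float* img, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;   // column handled by this thread
        int y = blockIdx.y * blockDim.y + threadIdx.y;   // row handled by this thread
        if (x < width && y < height)                     // guard the partial edge blocks
            img[y * width + x] = 1.0f;
    }

    void launchFill(float* d_img, int width, int height)
    {
        dim3 threadsPerBlock(16, 16);                    // fine level: 256 cooperating threads
        dim3 numBlocks((width + 15) / 16,                // coarse level: one block per tile,
                       (height + 15) / 16);              // rounded up to cover the image
        fill<<<numBlocks, threadsPerBlock>>>(d_img, width, height);
    }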
Thread Hierarchy

• This decomposition preserves language expressivity by allowing threads to cooperate when solving each sub-problem.
• At the same time, it enables automatic scalability: each block of threads can be scheduled on any of the available multiprocessors within a GPU, in any order, concurrently or sequentially, so that a compiled CUDA program can execute on any number of multiprocessors.
What is CUDA?

• CUDA C++ extends C++ by allowing the programmer to define C++ functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C++ functions.
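The sketch below follows the classic vector-addition example from the CUDA C++ Programming Guide (device memory allocation and error handling are omitted for brevity):

    // Kernel definition: the body runs N times, once per thread.
    __global__ void VecAdd(const float* A, const float* B, float* C)
    {
        int i = threadIdx.x;   // each of the N threads gets a distinct index
        C[i] = A[i] + B[i];    // so the body runs once per element
    }

    void launchVecAdd(const float* d_A, const float* d_B, float* d_C, int N)
    {
        // N threads in a single block: N parallel additions instead of a loop.
        VecAdd<<<1, N>>>(d_A, d_B, d_C);
    }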
CUDA Programming model

• There is a limit to the number of threads per block, since all threads of a block are expected to reside on the same streaming multiprocessor core and must share the limited memory resources of that core.
– On current GPUs, a thread block may contain up to 1024 threads.
• The size of the grid, by contrast, depends on the size of the data, as in the sizing sketch below.
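A common sizing pattern follows from these two constraints (a sketch; the process kernel is hypothetical): cap the block at or below 1024 threads and derive the grid from the data size, rounding up so every element is covered.

    __global__ void process(float* data, int n);                      // hypothetical kernel

    void launchProcess(float* d_data, int N)
    {
        int threadsPerBlock = 256;                                    // well under the 1024 limit
        int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;  // ceil(N / threadsPerBlock)
        process<<<numBlocks, threadsPerBlock>>>(d_data, N);           // grid grows with the data
    }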
CUDA Programming model

• A kernel is defined using the __global__ declaration specifier, and the number of CUDA threads that execute that kernel for a given kernel call is specified using a new <<<...>>> execution configuration syntax.
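Putting the two pieces of syntax together (a sketch; the names are illustrative), the execution configuration sits between the kernel name and its argument list:

    __global__ void scale(float* v, float s)             // __global__ marks a kernel
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        v[i] *= s;                                       // assumes v holds grid*block elements
    }

    void launchScale(float* d_v, int numBlocks, int threadsPerBlock)
    {
        // The configuration may use ints (as here) or dim3 values.
        scale<<<numBlocks, threadsPerBlock>>>(d_v, 2.0f);
    }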
CUDA built-in variables

• threadIdx → contains the index of the thread within its block
• blockDim → contains the number of threads per block
• blockIdx → contains the index of the block within the grid
• Each of these is a 3-component vector, so the per-dimension values are read as threadIdx.x, blockDim.x, blockIdx.x, and so on.
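A typical use (a sketch, not from the slides) combines the three variables into a global index so that each thread addresses a distinct element:

    __global__ void addOne(int* data, int n)
    {
        // block offset (blockIdx.x * blockDim.x) plus position within the block
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                 // the last block may run past the end of the data
            data[i] += 1;
    }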
Dimensions of the block/grid
CUDA Programming model
Where to run your CUDA code?

• On your PC, if it has an NVIDIA GPU!
– See the CUDA Installation Guide for Linux.
• On the cloud:
– Google Colab: 3 types of NVIDIA GPU: T4, A100, L4
– Kaggle
– Amazon SageMaker Studio Lab
T4 GPU

• Number of SMs: 40
• Number of cores per SM: 64
• Total number of cores: 40 × 64 = 2560
References

• CUDA C++ Programming Guide, NVIDIA: https://docs.nvidia.com/cuda/cuda-c-programming-guide/
