
Department of Computer Science and Engineering

UNIT 5 – HPC with CUDA

Subject Name : MODERN COMPUTER ARCHITECTURE
Course Code : 10211CS129

School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Unit-5:: Syllabus
UNIT-V
Unit 5: HPC with CUDA (9 Hours)
CUDA programming model
Basic principles of CUDA programming
Concepts of threads and blocks
GPU and CPU data exchange
Unit-5:: CUDA programming model
HPC Architecture
CUDA (Compute Unified Device Architecture)

The CUDA programming model provides an abstraction of GPU architecture that acts as a bridge between an application and its possible implementation on GPU hardware.

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on Graphics Processing Units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

Source: https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
Unit-5:: CUDA programming model
HPC Architecture
CUDA processing flow

1. Copy data from main memory to GPU memory
2. CPU initiates the GPU compute kernel
3. GPU's CUDA cores execute the kernel in parallel
4. Copy the resulting data from GPU memory to main memory

Source: https://en.wikipedia.org/wiki/CUDA
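As a minimal sketch of this four-step flow (the element-wise add kernel addKernel, the array size N, and the launch configuration are illustrative assumptions, not taken from the slides):

#include <cuda_runtime.h>
#include <stdio.h>

// Kernel executed on the GPU: each thread adds one pair of elements.
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 1 << 20;
    size_t bytes = N * sizeof(float);

    // Host (CPU) arrays in main memory
    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < N; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device (GPU) arrays in device memory
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);

    // Step 1: copy data from main memory to GPU memory (host-to-device)
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Steps 2-3: CPU initiates the kernel; the GPU's CUDA cores execute it in parallel
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    addKernel<<<blocks, threads>>>(d_a, d_b, d_c, N);

    // Step 4: copy the result from GPU memory back to main memory (device-to-host)
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);   // expected 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}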
Unit-5:: CUDA programming model
HPC Architecture
Host and device

The host is the CPU available in the system.
The system memory associated with the CPU is called host memory.
The GPU is called a device, and GPU memory is likewise called device memory.

Source: https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
Unit-5:: CUDA programming model
HPC Architecture
To execute any CUDA program, there are three main steps:
• Copy the input data from host memory to device memory, also known as host-to-device transfer.
• Load the GPU program and execute, caching data on-chip for performance.
• Copy the results from device memory to host memory, also called device-to-host transfer.

Source: https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
Unit-5:: CUDA programming model
HPC Architecture
CUDA kernel and thread hierarchy

The CUDA kernel is a function that gets executed on the GPU.

The parallel portion of your application is executed K times in parallel by K different CUDA threads, as opposed to only one time like regular C/C++ functions.

Figure 1. The kernel is a function executed on the GPU.

Source: https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
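For illustration, the contrast between a regular C/C++ function and a CUDA kernel executed by K parallel threads might look like the following sketch; the kernel name scaleKernel and K = 1024 are assumptions made for this example.

// Regular C/C++ function: the body runs K times, one iteration after another.
void scale_serial(float *data, float factor, int K) {
    for (int i = 0; i < K; i++)
        data[i] *= factor;
}

// CUDA kernel: the same body is executed once by each of K threads in parallel.
__global__ void scaleKernel(float *data, float factor) {
    int i = threadIdx.x;      // each thread handles one element of the array
    data[i] *= factor;
}

// Host-side launch: <<<1, K>>> starts one block of K threads, so the kernel
// body executes K times in parallel on the GPU, e.g.
// scaleKernel<<<1, 1024>>>(d_data, 2.0f);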
Unit-5:: CUDA programming model
HPC Architecture
Every CUDA kernel starts with a __global__ declaration specifier. Programmers provide a unique global ID to each thread by using built-in variables (see the sketch after Figure 2).

Figure 2. CUDA kernels are subdivided into blocks.
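A minimal sketch of how the built-in variables combine into a unique global thread ID (the kernel name fillKernel and the launch sizes are illustrative assumptions):

// __global__ marks a function as a CUDA kernel callable from host code.
__global__ void fillKernel(int *out, int n) {
    // Built-in variables: blockIdx (block index within the grid),
    // blockDim (threads per block), threadIdx (thread index within its block).
    int globalId = blockIdx.x * blockDim.x + threadIdx.x;
    if (globalId < n)
        out[globalId] = globalId;   // each thread writes its own unique global ID
}

// Host-side launch: a grid of blocks, each with 256 threads.
// int threadsPerBlock = 256;
// int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
// fillKernel<<<blocksPerGrid, threadsPerBlock>>>(d_out, n);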


Unit-5:: CUDA programming model
HPC Architecture
Each CUDA block is executed by one streaming multiprocessor (SM) and cannot be migrated to other SMs in the GPU (except during preemption, debugging, or CUDA dynamic parallelism).
One SM can run several concurrent CUDA blocks, depending on the resources needed by the blocks.
Each kernel is executed on one device, and CUDA supports running multiple kernels on a device at one time.
Figure 3 shows kernel execution and its mapping onto the hardware resources available in the GPU.

Figure 3. Kernel execution on GPU.
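As an illustrative sketch of running multiple kernels on one device at the same time, kernels can be issued into separate CUDA streams; the kernel and stream setup below are assumptions for the example, and actual overlap depends on the SM resources that are free.

#include <cuda_runtime.h>

__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

void launchConcurrent(float *d_a, float *d_b, int n) {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Kernels issued into different streams may run concurrently
    // if enough SM resources are available.
    busyKernel<<<blocks, threads, 0, s1>>>(d_a, n);
    busyKernel<<<blocks, threads, 0, s2>>>(d_b, n);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}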
Unit-5:: CUDA programming model
HPC Architecture
• The CUDA program for adding two matrices below shows multidimensional blockIdx and threadIdx and other variables like blockDim.
• In the example below, a 2D block is chosen for ease of indexing; each block has 256 threads, 16 in the x-direction and 16 in the y-direction.
• The total number of blocks is computed as the data size divided by the size of each block.
Unit-5:: CUDA programming model
HPC Architecture

Example of CUDA Program for Matrix Addition
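The code image on the original slide is not reproduced here; the following is a sketch consistent with the description on the previous slide (16 x 16 threads per block, grid size derived from a matrix dimension N chosen for illustration). The exact code on the slide may differ.

#include <cuda_runtime.h>

#define N 1024   // matrix dimension (illustrative)

// Each thread adds one element of the two matrices.
__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N]) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    if (i < N && j < N)
        C[j][i] = A[j][i] + B[j][i];
}

int main() {
    // Device matrices (initialization and host-to-device transfer omitted for
    // brevity; they follow the transfer steps shown earlier).
    float (*A)[N], (*B)[N], (*C)[N];
    cudaMalloc((void **)&A, N * N * sizeof(float));
    cudaMalloc((void **)&B, N * N * sizeof(float));
    cudaMalloc((void **)&C, N * N * sizeof(float));

    // 2D block: 16 x 16 = 256 threads per block
    dim3 threadsPerBlock(16, 16);
    // Total number of blocks = data size divided by block size in each dimension
    dim3 numBlocks(N / threadsPerBlock.x, N / threadsPerBlock.y);
    MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C);

    cudaDeviceSynchronize();
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}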


Unit-5:: CUDA programming model
HPC Architecture
Memory hierarchy: CUDA-capable GPUs have a memory hierarchy as depicted in Figure 4.

Figure 4. Memory hierarchy in GPUs.

Source: https://developer.nvidia.com/blog/cuda-refresher-cuda-programming-model/
Unit-5:: CUDA programming model
HPC Architecture
Registers: These are private to each thread, which means that registers assigned to a thread are not visible to other threads. The compiler makes decisions about register utilization.
L1/Shared memory (SMEM): Every SM has a fast, on-chip scratchpad memory that can be used as L1 cache and shared memory. All threads in a CUDA block can share shared memory, and all CUDA blocks running on a given SM can share the physical memory resource provided by the SM.
Read-only memory: Each SM has an instruction cache, constant memory, texture memory and RO cache, which is read-only to kernel code.
L2 cache: The L2 cache is shared across all SMs, so every thread in every CUDA block can access this memory. The NVIDIA A100 GPU increased the L2 cache size to 40 MB, compared to 6 MB in V100 GPUs.
Global memory: This is the DRAM sitting in the GPU, i.e., the GPU's framebuffer memory.
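To make the register, shared-memory, and global-memory levels concrete, here is a small illustrative kernel; the block size and the block-level sum-reduction pattern are assumptions for this sketch, not taken from the slides.

#define BLOCK_SIZE 256

// Block-level sum reduction: each block sums BLOCK_SIZE elements
// using the fast on-chip shared memory (SMEM) of its SM.
__global__ void blockSum(const float *in, float *blockSums, int n) {
    __shared__ float tile[BLOCK_SIZE];          // shared by all threads in this block

    int tid = threadIdx.x;                      // per-thread values live in registers
    int i = blockIdx.x * blockDim.x + tid;      // global index into global memory

    tile[tid] = (i < n) ? in[i] : 0.0f;         // stage data from global memory into SMEM
    __syncthreads();

    // Tree reduction within the block, entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        blockSums[blockIdx.x] = tile[0];        // one result per block back to global memory
}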
Department of Computer Science and Engineering

Thank You

School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
