Program Structure of CUDA

The CPU and the GPU are separate entities, and each has its own memory space. The CPU cannot directly access GPU memory, and vice versa. In CUDA terminology, CPU memory is called host memory and GPU memory is called device memory. Pointers to CPU and GPU memory are called host pointers and device pointers, respectively.

For data to be accessible by the GPU, it must reside in device memory. CUDA provides APIs for allocating device memory and for transferring data between host and device memory. The following is the common workflow of a CUDA program.

1. Allocate host memory and initialize host data
2. Allocate device memory
3. Transfer input data from host to device memory
4. Execute kernels
5. Transfer output from device memory to host

So far, we have done steps 1 and 4. We will now add steps 2, 3, and 5 to our vector addition program and finish this exercise; the complete workflow is sketched below.
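The following is a minimal sketch of the finished program under these assumptions: the kernel is named vector_add, the vectors hold N floats, and all names are illustrative rather than taken from the original exercise. Error checking of the CUDA calls is omitted for brevity.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 1048576  /* illustrative vector length */

/* Kernel (device code): each thread adds one pair of elements */
__global__ void vector_add(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

int main(void) {
    size_t bytes = N * sizeof(float);

    /* Step 1: allocate host memory and initialize host data */
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_out = (float *)malloc(bytes);
    for (int i = 0; i < N; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    /* Step 2: allocate device memory */
    float *d_a, *d_b, *d_out;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_out, bytes);

    /* Step 3: transfer input data from host to device memory */
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    /* Step 4: execute the kernel */
    int threadsPerBlock = 256;
    int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    vector_add<<<blocks, threadsPerBlock>>>(d_a, d_b, d_out, N);

    /* Step 5: transfer output from device memory to host */
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);  /* expected: 3.0 */

    /* Free device and host memory */
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
    free(h_a); free(h_b); free(h_out);
    return 0;
}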

Program Structure of CUDA


A typical CUDA program contains code intended for both the GPU and the CPU. By default, a traditional C program is a CUDA program with only host code. The CPU is referred to as the host, and the GPU is referred to as the device. Whereas the host code can be compiled by a traditional C compiler such as GCC, the device code needs a special compiler that understands the API functions being used. For Nvidia GPUs, this compiler is NVCC (the NVIDIA CUDA Compiler).
The device code runs on the GPU, and the host code runs on the CPU. NVCC processes a CUDA program and separates the host code from the device code. To accomplish this, it looks for special CUDA keywords that mark the code intended to run on the GPU (device code), labelling data-parallel functions called 'kernels'. The device code is further compiled by NVCC and executed on the GPU.
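For example, the __global__ qualifier is the CUDA keyword that labels a data-parallel function as a kernel; the kernel below is an illustrative sketch, not a function from the original text.

/* Device code: the __global__ keyword marks this function as a kernel,
   i.e. a data-parallel function that NVCC compiles for the GPU. */
__global__ void scale(float *data, float factor) {
    data[threadIdx.x] *= factor;   /* each thread handles one element */
}

/* Host code elsewhere in the same .cu file launches it, e.g.:
       scale<<<1, 256>>>(d_data, 2.0f);                        */

Such source files are conventionally given a .cu extension and compiled with NVCC, for example: nvcc program.cu -o program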
Execution of a CUDA C Program
How does a CUDA program work? While writing a CUDA program, the programmer has explicit control over the number of threads to launch (this number is chosen carefully). These threads collectively form a three-dimensional grid: threads are packed into blocks, and blocks are packed into a grid. Each thread is given a unique identifier, which can be used to determine which data it should act upon.
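Inside a kernel, the built-in variables threadIdx, blockIdx, and blockDim expose this hierarchy. A common pattern for computing a unique identifier in a one-dimensional grid is sketched below; the kernel name and the work it performs are illustrative only.

__global__ void process(float *data, int n) {
    /* Built-in variables: blockIdx = block position in the grid,
       blockDim = threads per block, threadIdx = thread position in the block */
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* unique identifier */
    if (i < n) {
        data[i] = 2.0f * data[i];   /* this thread acts only on its own element */
    }
}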

Device Global Memory and Data Transfer


As has been explained in the previous chapter, a typical GPU comes with its own global memory (DRAM, Dynamic Random Access Memory). For example, the Nvidia GeForce GTX 480 comes with 1.5 GB of DRAM. From now on, we will call this memory the device memory.
To execute a kernel on the GPU, the programmer needs to allocate separate memory on the GPU by writing code. The CUDA API provides specific functions for accomplishing this. Here is the flow sequence:
• After allocating memory on the device, data has to be transferred from the host memory to the device memory.
• After the kernel is executed on the device, the result has to be transferred back from the device memory to the host memory.
• Finally, the allocated memory on the device has to be freed up.
The host can access the device memory and transfer data to and from it, but not the other way round. CUDA provides API functions to accomplish all these steps; a short illustration follows.
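The sketch below exercises only these memory APIs, without launching a kernel, by copying a small array from the host to the device and back; the variable names and sizes are illustrative.

#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    const int n = 8;
    size_t bytes = n * sizeof(int);
    int h_in[8]  = {0, 1, 2, 3, 4, 5, 6, 7};  /* host data */
    int h_out[8] = {0};

    int *d_buf;
    cudaMalloc((void **)&d_buf, bytes);                        /* allocate device memory */
    cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice);    /* host -> device */
    /* ... a kernel operating on d_buf would normally be launched here ... */
    cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost);   /* device -> host */
    cudaFree(d_buf);                                           /* free device memory */

    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);       /* prints the round-tripped data */
    printf("\n");
    return 0;
}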
