Tutorial No 3

This document contains a tutorial on parallel processing and CUDA. It includes:
1. A definition of GPGPU and what it means for a GPU to have general-purpose capabilities.
2. An explanation of why CUDA is considered heterogeneous computing, with the CPU running low-latency code serially and the GPU running high-latency code in parallel.
3. Definitions of key CUDA terms such as device, kernel, grid of thread blocks, and warp.
4. An overview of CUDA's parallel programming model, with one kernel executing at a time across a grid of thread blocks and the threads of a warp executing simultaneously.
5. A listing and brief explanation of CUDA's different memory types, including registers, local memory, shared memory

Uploaded by

mmed68003

King Saud University

College of Computer and Information Sciences

Department of Computer Science
CSC453 – Parallel Processing – Tutorial No 3 – Fall 2021

Questions
1. What does GPGPU stand for, and what does it mean?
General-Purpose computing on Graphics Processing Units (often shortened to General-Purpose GPU): a GPU that can perform general computations that are usually handled by the CPU, not only graphics rendering.

2. Why is CUDA said to be heterogeneous computing?
Processing is handled by two different kinds of processor:
the low-latency code is executed serially by the CPU;
the high-latency code is executed in parallel by the GPU.
3. Give the definition of the following terms:
a. Device: the GPU and its memory.
b. Kernel: a function that runs on the device. One kernel is executed at a time, and many threads execute each kernel.
c. Grid of thread blocks: a kernel is executed by a grid of thread blocks. Each grid is a collection of blocks, and each block is a collection of threads.
d. Warp: a group of 32 threads of the same block.
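
The hierarchy in (c) and (d) can be made concrete with a little index arithmetic. The sketch below is plain Python, not CUDA: it only models, for a one-dimensional launch, how a thread's position in the whole grid follows from its block index and its index inside the block, and how a block splits into warps of 32 threads. The function names and the 256-thread block size are assumptions chosen for illustration; they do not come from the tutorial.

```python
import math

WARP_SIZE = 32  # from the definition above: a warp is 32 threads of one block


def global_thread_index(block_idx, block_dim, thread_idx):
    """Index of a thread within the whole 1-D grid (illustrative helper)."""
    return block_idx * block_dim + thread_idx


def warp_of_thread(thread_idx):
    """Warp number of a thread inside its own block (illustrative helper)."""
    return thread_idx // WARP_SIZE


def warps_per_block(block_dim):
    """Warps needed to cover a block; the last warp may be only partly filled."""
    return math.ceil(block_dim / WARP_SIZE)


# Assumed launch configuration: blocks of 256 threads.
print(global_thread_index(block_idx=2, block_dim=256, thread_idx=5))  # 517
print(warp_of_thread(37))    # 1
print(warps_per_block(256))  # 8
```

With these assumed numbers, thread 5 of block 2 is thread 517 of the grid, thread 37 of a block belongs to that block's second warp (warp 1), and a 256-thread block is covered by 8 warps.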

4. Explain the parallel programming model of CUDA.

One kernel is executed at a time. The kernel is executed by a grid of thread blocks, and the threads of the same warp execute the same instruction at the same time.
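
As a rough illustration of this model, the sketch below emulates a one-dimensional kernel launch in plain Python. It is not CUDA code and it runs serially: the two loops stand in for the grid of blocks and the threads of each block, which a real GPU would run in parallel, with the threads of a warp in lockstep. The names `vector_add_kernel` and `launch`, and the sizes used, are hypothetical and chosen only for this example.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Work done by one thread: each thread handles at most one element."""
    i = block_idx * block_dim + thread_idx
    if i < len(out):  # guard: the grid may hold more threads than elements
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Serially emulate a 1-D launch: every thread of every block runs the kernel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
out = [0] * len(a)
launch(vector_add_kernel, 2, 4, a, b, out)  # 2 blocks x 4 threads = 8 threads
print(out)  # [11, 22, 33, 44, 55]
```

Every thread runs the same kernel function and differs only in its indices, which is why the bounds check is needed when the grid holds more threads (8 here) than there are elements (5).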
5. Enumerate and explain the different types of memory adopted by CUDA.

1- Registers: per thread, 32-bit, on chip
2- Local memory: per thread, relatively large, in DRAM
3- Shared memory: per block, 16 KB, on chip
4- Global memory: per grid, not cached, in DRAM
5- Constant memory: per grid, cached, in DRAM
6- Texture memory: per grid, cached, in DRAM

Registers and shared memory are on chip; the rest are in DRAM.

6. Explain why the constant memory is cached, while the global memory is not.

Constant memory is read-only, so caching it adds no overhead because there is no cache coherency problem.
Global memory is read/write. Caching it would create a cache coherency problem, and the overhead of maintaining coherency would be very high with thousands of threads running.

In short: cached == read-only.

Note that this describes the original CUDA memory model; later GPU generations do cache global memory in hardware-managed L1/L2 caches.


