Intro To CUDA
Data Parallelism in Modern Applications
•Concept of Data Parallelism
•Definition:
•Break large problems down into smaller, independent computations.
•Execution in Parallel:
•Organize computation around data.
•Execute independent tasks simultaneously.
Illustrating Data Parallelism with Color-to-Grayscale Conversion
•Color Image Representation
•Pixels: Each pixel contains red (r), green (g), and blue (b) values.
•Range: Values vary from 0 (black) to 1 (full intensity).
•Computation Structure
•Input: Array I of RGB values.
•Output: Array O of luminance values.
•Independence: Each pixel's computation is independent.
•For example, O[0] is the result of applying the formula to I[0]
•O[1] is the result for I[1]
•O[2] for I[2], and so on.
Task Parallelism vs. Data Parallelism
1.Data Parallelism:
1. Definition: Parallelism based on performing the same operation on multiple data elements independently.
2. Example: Color-to-grayscale conversion.
3. Scalability: Utilizes large datasets to exploit parallel processors effectively.
4. Focus: Independent computations on data elements.
2.Task Parallelism:
1. Definition: Parallelism based on executing different tasks or functions concurrently.
2. Example: Vector addition and matrix-vector multiplication.
• In a parallel computing environment, you might set up the following tasks:
• Task 1: Perform vector addition on one set of vectors.
• Task 2: Perform matrix-vector multiplication on another set of data.
3. Application: Common in complex applications with multiple independent tasks.
4. Examples in Applications:
1. Molecular dynamics: vibrational forces, rotational forces, neighbor identification, etc.
5. Focus: Independent tasks that can be performed simultaneously.
Task Parallelism vs. Data Parallelism
1.Comparison:
1. Data Parallelism: Primarily scales with dataset size, leveraging parallel processors.
2. Task Parallelism: Focuses on decomposing applications into independent tasks.
2.Importance in Parallel Programming:
1. Data Parallelism: Main source of scalability for large datasets and hardware resources.
2. Task Parallelism: Important for performance goals, especially in applications with diverse tasks.
Exploiting Data Parallelism with CUDA C
•What is CUDA C?
•Extension of ANSI C: Minimal new syntax and library functions.
•Purpose: Target heterogeneous systems with CPUs and GPUs.
•Platform: Built on NVIDIA’s CUDA platform.
•CUDA C Overview:
•Host and Device Code:
•Host Code: Runs on CPU.
•Device Code: Runs on GPU, marked with CUDA C keywords.
•Kernel Functions: Execute in a data-parallel manner on the GPU.
•Execution Flow:
•Start with Host Code: Traditional CPU code.
•Kernel Call: Launches threads on the GPU.
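The execution flow above can be sketched as a minimal, complete CUDA C program; the kernel name, data, and launch sizes are illustrative. Compiling and running it requires `nvcc` and an NVIDIA GPU.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Device code: marked with the CUDA C keyword __global__.
// Each GPU thread doubles one element of the array.
__global__ void doubleKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 8;
    float h[8];                       // host data (CPU memory)
    for (int i = 0; i < n; i++) h[i] = (float)i;

    float *d;                         // device data (GPU memory)
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Kernel call: launches a grid of threads on the GPU.
    doubleKernel<<<1, n>>>(d, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    // Host code resumes: print the results computed on the device.
    for (int i = 0; i < n; i++) printf("%g ", h[i]);
    printf("\n");
    return 0;
}
```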
Exploiting Data Parallelism with CUDA C
•Threads and Grids:
•Threads: Primary units of parallel execution.
•Grid: Collection of threads launched by a kernel.
•CUDA C Tools:
•Maturity: Widely used in high-performance computing.
•Tools: Compilers, debuggers, and profilers available.
Thread Generation in CUDA
•Threads in CUDA:
•Purpose: Exploit data parallelism.
•Example Use Case: Color-to-grayscale conversion.
•Thread Generation:
•Grid Launch: Generates many threads.
•Threads for Pixels: One thread per pixel in the image.
•Efficiency:
•CUDA Threads: Few clock cycles for generation and scheduling.
•Contrast with CPU Threads: Typically thousands of clock cycles.
•Upcoming Examples:
•Color-to-Grayscale Conversion: Implementing the kernel.
•Image Blur Kernels: Advanced operations in CUDA.
•Vector Addition: Used as a simple running example.
Understanding Threads in Computing and CUDA
•Definition of a Thread:
•Code of the program
•Current execution point
•Values of variables and data structures
•Sequential Execution:
•User-level view of thread execution
•Debugging: Step-by-step execution, variable inspection
•Threads in CUDA:
•Execution is sequential within each thread
•Parallel execution initiated via kernel functions
•Threads process different parts of data in parallel
Vector Addition in CUDA C
•Vector Addition Overview:
•Simplest data parallel computation
•Analogous to "Hello World" in sequential programming
•Code Example:
•Traditional C program structure
•Host code with variables suffixed with "_h"
•CUDA C Transition:
•Introduction to CUDA C kernel code
•Host vs. Device variables
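A sketch of the full vector-addition program, using the "_h" suffix for host (CPU) variables and "_d" for device (GPU) variables; the vector length and launch configuration are illustrative.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of the output vector.
__global__ void vecAddKernel(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

int main(void) {
    const int n = 4;
    size_t size = n * sizeof(float);

    // Host variables (suffix _h) live in CPU memory.
    float A_h[4] = {1, 2, 3, 4};
    float B_h[4] = {10, 20, 30, 40};
    float C_h[4];

    // Device variables (suffix _d) live in GPU memory.
    float *A_d, *B_d, *C_d;
    cudaMalloc(&A_d, size);
    cudaMalloc(&B_d, size);
    cudaMalloc(&C_d, size);

    cudaMemcpy(A_d, A_h, size, cudaMemcpyHostToDevice);
    cudaMemcpy(B_d, B_h, size, cudaMemcpyHostToDevice);

    // Launch one thread per element.
    vecAddKernel<<<1, n>>>(A_d, B_d, C_d, n);

    cudaMemcpy(C_h, C_d, size, cudaMemcpyDeviceToHost);
    cudaFree(A_d); cudaFree(B_d); cudaFree(C_d);

    for (int i = 0; i < n; i++) printf("%g ", C_h[i]);
    printf("\n");
    return 0;
}
```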
Remark on pointers
int value = 5;
int *ptr = &value; // ptr now holds the address of value
*ptr = 10; // Modifies the value at the address ptr is pointing to
Pointers in the C Language
1.Definition and Declaration:
•Pointers allow access to variables and data structures.
•Example:
•float V; // Float variable
•float *P; // Pointer to a float
2.Pointer Operations:
•Assigning address:
•float V = 3.14; // Float variable
•float *P = &V; // P is a pointer to a float, holding the address of V
•Dereferencing pointer: *P is equivalent to V
•Example operations:
•U = *P; // Assigns the value of V to U
•*P = 3; // Changes the value of V to 3
3.Array Access via Pointers:
•int A[5] = {1, 2, 3, 4, 5}; // An array of integers
•int *P; // Pointer to an integer
•P = &A[0]; // P points to the address of the first element of array A
•You can also write P = A; it does the same thing as P = &A[0].
Pointers in the C Language
#include <stdio.h>
int main() {
    int A[5] = {1, 2, 3, 4, 5}; // Declare an array
    int *P = A;                 // Declare a pointer; P points to A[0]
    for (int i = 0; i < 5; i++) {
        printf("%d ", *(P + i)); // Dereference: prints 1 2 3 4 5
    }
    return 0;
}