
Intro To CUDA

DR. RACHAD ATAT


Data Parallelism in Modern
Applications
•Data Challenges in Modern Software
•Image Processing: Millions to trillions of pixels
•Scientific Applications: Billions of grid points
•Molecular Dynamics: Thousands to billions of atoms
•Airline Scheduling: Thousands of flights, crews, gates

•Independent Data Processing


•Image Processing:
•Converting a color pixel to grayscale (data of one pixel)
•Blurring an image (data of a pixel's neighborhood)
•Scientific Applications:
•Fluid dynamics (independent grid points)
•Molecular Dynamics:
•Simulating interactions (independent atoms)
•Airline Scheduling:
•Scheduling (independent flights, crews, gates)

2
Data Parallelism in Modern
Applications
•Concept of Data Parallelism
•Definition:
•Break down large problems into independent smaller computations.
•Execution in Parallel:
•Organize computation around data.
•Execute independent tasks simultaneously.

•Benefits of Data Parallelism


•Performance:
•Faster completion of tasks.
•Scalability:
•Efficient handling of large datasets.

3
Illustrating Data Parallelism with
Color-to-Grayscale Conversion
•Color Image Representation
•Pixels: Each pixel contains red (r), green (g), and blue (b) values.
•Range: Values vary from 0 (black) to 1 (full intensity).

•Grayscale Conversion Formula


•Formula: L=r×0.21+g×0.72+b×0.07
•Purpose: Compute luminance (L) for each pixel.

•Computation Structure
•Input: Array I of RGB values.
•Output: Array O of luminance values.
•Independence: Each pixel's computation is independent.
•For example, O[0] is the result of applying the formula to I[0]
•O[1] is the result for I[1]
•O[2] for I[2], and so on.

4
5
Task Parallelism vs. Data
Parallelism
1.Data Parallelism:
1. Definition: Parallelism based on performing the same operation on multiple data elements
independently.
2. Example: Color-to-grayscale conversion.
3. Scalability: Utilizes large datasets to exploit parallel processors effectively.
4. Focus: Independent computations on data elements.

2.Task Parallelism:
1. Definition: Parallelism based on executing different tasks or functions concurrently.
2. Example: Vector addition and matrix-vector multiplication.
• In a parallel computing environment, you might set up the following tasks:
• Task 1: Perform vector addition on one set of vectors.
• Task 2: Perform matrix-vector multiplication on another set of data.
3. Application: Common in complex applications with multiple independent tasks.
4. Examples in Applications:
1. Molecular dynamics: vibrational forces, rotational forces, neighbor identification, etc.
5. Focus: Independent tasks that can be performed simultaneously.
6
Task Parallelism vs. Data
Parallelism
1.Comparison:
1. Data Parallelism: Primarily scales with dataset size, leveraging parallel processors.
2. Task Parallelism: Focuses on decomposing applications into independent tasks.
2.Importance in Parallel Programming:
1. Data Parallelism: Main source of scalability for large datasets and hardware resources.
2. Task Parallelism: Important for performance goals, especially in applications with diverse
tasks.

7
Exploiting Data Parallelism with
CUDA C
•What is CUDA C?
•Extension of ANSI C: Minimal new syntax and library functions.
•Purpose: Target heterogeneous systems with CPUs and GPUs.
•Platform: Built on NVIDIA’s CUDA platform.

•CUDA C Overview:
•Host and Device Code:
•Host Code: Runs on CPU.
•Device Code: Runs on GPU, marked with CUDA C keywords.
•Kernel Functions: Execute in a data-parallel manner on the GPU.

•Execution Flow:
•Start with Host Code: Traditional CPU code.
•Kernel Call: Launches threads on the GPU.

8
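The execution flow above can be sketched as a minimal CUDA C program. This is a sketch, not a complete example: the kernel name, block size, and elided host-side steps are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Device code: each thread adds one pair of elements.
__global__ void vecAddKernel(float *C, const float *A, const float *B, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

int main(void) {
    int n = 1 << 20;
    size_t size = n * sizeof(float);
    float *A_d, *B_d, *C_d;            // device (GPU) pointers
    cudaMalloc(&A_d, size);
    cudaMalloc(&B_d, size);
    cudaMalloc(&C_d, size);
    // ... host code: fill host arrays, cudaMemcpy host -> device ...

    // Kernel call: launches a grid of ceil(n/256) blocks of 256 threads each.
    vecAddKernel<<<(n + 255) / 256, 256>>>(C_d, A_d, B_d, n);
    cudaDeviceSynchronize();           // host waits until the grid completes

    // ... cudaMemcpy device -> host, then execution continues on the host ...
    cudaFree(A_d); cudaFree(B_d); cudaFree(C_d);
    return 0;
}
```

The `<<<...>>>` brackets are the CUDA C extension that distinguishes a kernel call from an ordinary C function call: everything before it runs on the CPU, and the launch hands the data-parallel work to the GPU.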
Exploiting Data Parallelism with
CUDA C
•Threads and Grids:
•Threads: Primary units of parallel execution.
•Grid: Collection of threads launched by a kernel.

•Execution Cycle: Threads complete, then execution continues on the host.

•CUDA C Tools:
•Maturity: Widely used in high-performance computing.
•Tools: Compilers, debuggers, and profilers available.

9
Thread Generation in CUDA
•Threads in CUDA:
•Purpose: Exploit data parallelism.
•Example Use Case: Color-to-grayscale conversion.

•Thread Generation:
•Grid Launch: Generates many threads.
•Threads for Pixels: One thread per pixel in the image.

•Efficiency:
•CUDA Threads: Few clock cycles for generation and scheduling.
•Contrast with CPU Threads: Typically thousands of clock cycles.

•Upcoming Examples:
•Color-to-Grayscale Conversion: Implementing the kernel.
•Image Blur Kernels: Advanced operations in CUDA.
•Vector Addition: Used as a simple running example.

11
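The one-thread-per-pixel scheme above can be sketched as a CUDA C kernel. The kernel name and the interleaved RGB layout are illustrative assumptions; the full kernel is developed in a later lecture:

```cuda
// Each thread computes the luminance of one pixel.
// I holds interleaved RGB floats (assumed layout); O holds one float per pixel.
__global__ void colorToGrayscaleKernel(float *O, const float *I,
                                       int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {   // guard: the grid may have extra threads
        int i = row * width + col;       // linear index of this thread's pixel
        float r = I[3 * i];
        float g = I[3 * i + 1];
        float b = I[3 * i + 2];
        O[i] = r * 0.21f + g * 0.72f + b * 0.07f;
    }
}
```

Launching the grid with roughly one thread per pixel means an image with millions of pixels generates millions of threads, which the GPU schedules in a few clock cycles each.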
Understanding Threads in
Computing and CUDA
•Definition of a Thread:
•The code of the program
•The particular point in the code currently being executed
•The current values of its variables and data structures

•Sequential Execution:
•User-level view of thread execution
•Debugging: Step-by-step execution, variable inspection

•Traditional Thread Management:


•Use of thread libraries or special languages for parallel execution

•Threads in CUDA:
•Execution is sequential within each thread
•Parallel execution initiated via kernel functions
•Threads process different parts of data in parallel

12
Vector Addition in CUDA C
•Vector Addition Overview:
•Simplest data parallel computation
•Analogous to "Hello World" in sequential programming

•Conventional Vector Addition:


•Main function and vector addition function
•Use of suffixes for host and device variables

•Code Example:
•Traditional C program structure
•Host code with variables suffixed with "_h"

•CUDA C Transition:
•Introduction to CUDA C kernel code
•Host vs. Device variables

13
Remark on pointers
int value = 5;
int *ptr = &value;  // ptr now holds the address of value
*ptr = 10;          // modifies the value at the address ptr points to

int arr[3] = {1, 2, 3};
int *ptr2 = arr;    // points to the first element of the array
*(ptr2 + 1) = 5;    // modifies the second element of the array to 5

14
Pointers in the C Language
1.Definition and Declaration:
•Pointers allow access to variables and data structures.
•Example:
•float V; // Float variable
•float *P; // Pointer to a float
2.Pointer Operations:
•Assigning address:
•float V = 3.14; // Float variable
•float *P = &V; // P is a pointer to a float, holding the address of V
•Dereferencing pointer: *P is equivalent to V
•Example operations:
•U = *P // Assigns the value of V to U
•*P = 3 // Changes the value of V to 3
3.Array Access via Pointers:
•int A[5] = {1, 2, 3, 4, 5}; // An array of integers
•int *P;                     // Pointer to an integer
•P = &A[0];                  // P points to the address of the first element of array A
•Note: you can also write P = A; and it will do the same thing as P = &A[0];
15
Pointers in the C Language
#include <stdio.h>

int main() {
    int A[5] = {1, 2, 3, 4, 5}; // Declare an array
    int *P;                     // Declare a pointer to an integer

    // Assign the address of the first element of A to P
    P = &A[0];
    // Or simply P = A; (which does the same thing)

    printf("First element: %d\n", *P);        // Prints: 1 (value of A[0])

    // Access second element via pointer arithmetic
    printf("Second element: %d\n", *(P + 1)); // Prints: 2 (value of A[1])

    return 0;
}

16