
AIR UNIVERSITY

DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING

EXPERIMENT NO. 4

Lab Title: Introduction to Parallel Programming with CUDA C: Exploring 2D Operations in CUDA C

Student Name: M. Bilal Ijaz, Agha Ammar Khan        Reg. No: 210316, 210300

Objective: Implement and analyze various 2D array/matrix operations in CUDA C

LAB ASSESSMENT:

Attributes                                                         Excellent (5)   Good (4)   Average (3)   Satisfactory (2)   Unsatisfactory (1)

Ability to Conduct Experiment

Ability to assimilate the results

Effective use of lab equipment and follows the lab safety rules

Total Marks: Obtained Marks:

LAB REPORT ASSESSMENT:

Attributes                     Excellent (5)   Good (4)   Average (3)   Satisfactory (2)   Unsatisfactory (1)

Data Presentation

Experiment Results

Conclusion

Total Marks: Obtained Marks:

Date: 17/10/2024 Signature:


LAB#04
TITLE: Exploring CUDA C Programming: 2D Operations.

Objective:
Implement and analyze various 2D array/matrix operations in CUDA C

Introduction:
The aim of this lab was to explore parallel computing techniques by implementing basic 2D
matrix operations using CUDA C. These operations include matrix addition, matrix
multiplication, matrix transposition, and scalar multiplication. CUDA (Compute Unified
Device Architecture) provides a platform for parallel computing on NVIDIA GPUs, allowing
developers to write code that exploits data-level parallelism for large datasets, such as 2D
matrices.

The primary objectives of this lab are to:

• Understand how to allocate and transfer memory between host (CPU) and device
(GPU).
• Use CUDA kernels to implement matrix operations using thread blocks.
• Optimize the design for thread allocation and workload distribution for 2D matrices.

This lab demonstrates the concepts of:


• Grid and block structure for managing threads.
• Memory handling between host and device.
• Synchronization of threads on the GPU to ensure correct computation.

Experiment Setup:
• Software: CUDA toolkit, NVIDIA CUDA Compiler (NVCC), C/C++ for code
implementation.
• Hardware: A machine with an NVIDIA GPU compatible with CUDA.
Each operation was implemented on a 16x16 matrix, using a block size of 16x16 threads.
This configuration allowed one thread to compute one element of the matrix.
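
For reference, a minimal sketch of this launch configuration (the macro and variable names are illustrative, not taken from the lab code):

#include <cuda_runtime.h>

#define N 16   // matrix dimension used in this lab: one thread per element

int main(void) {
    dim3 block(16, 16);                          // 16x16 = 256 threads per block
    dim3 grid((N + block.x - 1) / block.x,
              (N + block.y - 1) / block.y);      // a single 1x1 grid when N = 16
    // someKernel<<<grid, block>>>(...);         // each operation below is launched this way
    return 0;
}
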
Matrix Addition:
The task is to add two 2D matrices element-wise; each thread computes the sum for one element of
the resulting matrix. A minimal kernel-and-host-flow sketch follows the key steps below.

Key steps:
1. Matrices A and B are initialized on the host.
2. Memory is allocated on the device, and data is transferred from the host to the
device.
3. A CUDA kernel is launched, where each thread adds corresponding elements of
matrices A and B.
4. The result is copied back from the device to the host.
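
A minimal sketch of this flow, assuming N x N float matrices and the 16x16 block size used in this lab (all names are illustrative, not the actual lab code):

#include <stdio.h>
#include <cuda_runtime.h>

#define N 16

__global__ void matAdd(const float *A, const float *B, float *C, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;    // column handled by this thread
    int row = blockIdx.y * blockDim.y + threadIdx.y;    // row handled by this thread
    if (row < n && col < n)
        C[row * n + col] = A[row * n + col] + B[row * n + col];   // one element per thread
}

int main(void) {
    size_t bytes = N * N * sizeof(float);
    float h_A[N * N], h_B[N * N], h_C[N * N];
    for (int i = 0; i < N * N; i++) { h_A[i] = 1.0f; h_B[i] = 2.0f; }   // step 1: host init

    float *d_A, *d_B, *d_C;
    cudaMalloc((void **)&d_A, bytes);                                   // step 2: device memory
    cudaMalloc((void **)&d_B, bytes);
    cudaMalloc((void **)&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);                //         host -> device
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matAdd<<<grid, block>>>(d_A, d_B, d_C, N);                          // step 3: launch kernel

    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);                // step 4: device -> host
    printf("C[0][0] = %.1f\n", h_C[0]);

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}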

Matrix Multiplication:
Matrix multiplication involves computing the dot product of each row of the first matrix with
each column of the second matrix; a kernel sketch follows the key steps below.

Key steps:
1. Each thread computes the value of one element in the resulting matrix.
2. For each thread, the dot product of one row of matrix A and one column of matrix B
is computed and assigned to the result matrix C.
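
A minimal sketch of a naive multiplication kernel matching this description (square n x n matrices; names are illustrative). The host-side allocation, copies, and launch follow the same pattern as the addition example above:

__global__ void matMul(const float *A, const float *B, float *C, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
            sum += A[row * n + k] * B[k * n + col];   // dot product of row of A and column of B
        C[row * n + col] = sum;                       // one output element per thread
    }
}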

Matrix Transposition:
Matrix transposition involves switching the rows and columns of a matrix. In this case, each
thread transposes one element of the matrix; a kernel sketch follows the key steps below.

Key steps:
1. A CUDA kernel is launched where each thread switches the row and column indices
to transpose the matrix.
2. Every element A[i][j] is assigned to B[j][i].
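
A minimal kernel sketch for this operation (n x n input; names are illustrative), using the same host-side pattern as before:

__global__ void matTranspose(const float *A, float *B, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;   // j
    int row = blockIdx.y * blockDim.y + threadIdx.y;   // i
    if (row < n && col < n)
        B[col * n + row] = A[row * n + col];           // B[j][i] = A[i][j]
}
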
Scalar Multiplication:
Scalar multiplication involves multiplying each element of a matrix by a constant scalar
value; a kernel sketch follows the key steps below.

Key steps:
1. Each thread multiplies one element of matrix A by a scalar k.
2. The result is stored in matrix C.
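
A minimal kernel sketch for this operation (n x n matrix, scalar k; names are illustrative):

__global__ void scalarMul(const float *A, float *C, float k, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        C[row * n + col] = k * A[row * n + col];       // each thread scales one element
}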

Performance Considerations:
For all of the operations:
1. Thread Management: The grid and block dimensions were chosen to optimize the
number of threads per block, ensuring efficient parallelism.
2. Memory Transfer: Efficient transfer of data between host and device is crucial. Using
pinned (page-locked) host memory or memory pools may further optimize this; a brief sketch
follows this list.
3. Thread Synchronization: No explicit synchronization is required in these operations
since each thread works independently on separate elements of the matrix.
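
As a brief illustration of the pinned-memory point above (a sketch under stated assumptions, not part of the lab code):

#include <cuda_runtime.h>

int main(void) {
    float *h_A = NULL;
    size_t bytes = 16 * 16 * sizeof(float);
    cudaMallocHost((void **)&h_A, bytes);   // pinned (page-locked) host buffer: faster transfers,
                                            // and required for asynchronous cudaMemcpyAsync
    // ... fill h_A and copy it to the device with cudaMemcpy / cudaMemcpyAsync ...
    cudaFreeHost(h_A);                      // release the pinned buffer
    return 0;
}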

Lab Tasks:
Code and Output:
Task2:

Code and Output:


Task3:

Code and Output:


Task4:

Code and Output:


Conclusion:
This lab offered practical experience in performing basic 2D matrix operations using CUDA,
highlighting the power of data parallelism through the use of thread blocks. The programs
showcased significant performance improvements compared to serial CPU execution, as
tasks such as matrix addition, multiplication, transposition, and scalar multiplication were
efficiently distributed across multiple threads in a grid, enabling parallel processing.
The concepts learned in this lab lay a strong foundation for more advanced matrix
operations and optimization techniques, including the use of shared memory, tiling, and
stream-based computations, which will be explored in upcoming labs.
