Example: 201201014-GPU-AS2: Assignments For GPU Programming Course/ Lab

The document provides instructions for 3 assignments for a GPU programming course. Students must submit optimized CUDA code for each assignment, along with a report following a specific format. Assignment 1 involves adding a constant to array elements on the GPU and profiling for different problem sizes. Assignment 2 is matrix multiplication on the GPU using naive and shared memory implementations. Assignment 3 implements numerical integration using the trapezoidal rule in serial and parallel. Reports must include analysis, observations, and performance curves. Code and presentations will be evaluated for each assignment.

Uploaded by

kiran

Assignments for GPU Programming Course/ Lab

This document will be updated regularly. All assignments have to be submitted by the given deadline (some issues in accessing the servers will be resolved shortly).
Submission of assignments: Create a separate folder for each assignment in your Google Drive and share the folder with the id:
[email protected] ([email protected])

Naming the folder (strictly follow this): ID + GPU + Assignment No.

Example: 201201014-GPU-AS2
Also do the above for your previous assignment folder (asgn-1) in your drive shared with csdaiict; it will automatically be reflected in the csdaiict drive.
All assignments have to be supplemented with a brief write-up or ppt with the following details (wherever necessary):
1. Context:
   Brief description of the problem.
   Complexity of the algorithm (serial).
   Possible speedup (theoretical).
   Optimization strategy.
   Problems faced in parallelization and possible solutions.
2. Hardware details: GPU model, number of cores, device properties, compute capability.
3. Input parameters and output. Make sure the results from the serial and parallel codes are the same.
4. Description of the naive implementation. Possible improvements over the naive implementation.
5. Problem size vs. time (serial, parallel) curve. Speedup curve. Observations and comments about the results.
6. If there is more than one implementation, curves for all algorithms in the same plot.
7. Wherever necessary, use log scale and auxiliary units.
8. Effect of block dimensions and grid launch on speedup.
9. Proper labeling of graphs.
10. List of observations and conclusion under each table and figure.
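For item 2 of the list above, most of the hardware details can be read from the CUDA runtime. The following is a minimal sketch, not a prescribed method; the function name `print_device_details` is hypothetical. The number of cores per multiprocessor is not a field of `cudaDeviceProp`; it depends on the compute capability and has to be looked up separately. Error handling is reduced to the bare minimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints, for every CUDA device, the properties the report asks for.
// Returns the number of devices found (0 if none or on error).
// The function name is illustrative, not part of the handout.
int print_device_details() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        if (cudaGetDeviceProperties(&p, d) != cudaSuccess) continue;
        printf("Device %d: %s\n", d, p.name);
        printf("  Compute capability : %d.%d\n", p.major, p.minor);
        printf("  Multiprocessors    : %d\n", p.multiProcessorCount);
        printf("  Global memory (MB) : %zu\n", p.totalGlobalMem >> 20);
        printf("  Shared mem/block   : %zu bytes\n", p.sharedMemPerBlock);
        printf("  Max threads/block  : %d\n", p.maxThreadsPerBlock);
        printf("  Max grid size      : %d x %d x %d\n",
               p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
        printf("  Warp size          : %d\n", p.warpSize);
    }
    return count;
}
```

Calling `print_device_details()` once at program start and pasting its output into the report keeps the hardware section consistent with the machine the timings were taken on.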

Assignments include submission of codes [optimized serial and parallel codes (multiple versions if applicable) with necessary comments inside the code].
Also register at http://courses.daiict.ac.in.
Evaluation will be based on the submitted materials (reports + code) and the presentation in class. Date-wise presentation details will be updated in the list-of-presentation file at least one day in advance.

Assignment 1 (17th Aug)

Deadline: 26th August

1. Write a CUDA program that adds a number X to all elements of a one-dimensional array A.
2. The elements of A and X should be single-precision floating-point numbers.
3. Using the necessary timer calls, have your program report the time needed to copy data from the CPU to the GPU, the time needed to add X to all elements of A on the GPU, and the time needed to copy the data back from the GPU to the CPU.
4. The elements of A should be initialized with known (not random) values, so that comparison with the serial code is possible.
5. Vary the number of elements in A from a minimum of 1 million to the maximum number that can be supported by a single invocation of a GPU kernel, in power-of-two steps, i.e., 1M, 2M, 4M, 8M, 16M, etc.
6. For every array size, have your program print three time measurements: the time required to copy A from the CPU to the GPU, the time taken by the kernel, and the time required to copy the data from the GPU to the CPU.
7. The output should be reported in tabular form, like:
   Elements(M) ; CPUtoGPU(ms) ; Kernel(ms) ; GPUtoCPU(ms)
   Explore the following possibilities for profiling:
   cutStartTimer(myTimer)
   CUDA events
8. Comment on the CGMA ratio of the above program.
9. Extend your kernel with a parameter specifying how many times X should be added to each element. Do not use multiplication for these additions; use a loop.
10. For the maximum number of elements that can be supported by a single kernel invocation, have your program print the three time measurements above as a function of the number of times X is added. Do so for a range of 1 through 256 in power-of-two steps. Your program's output should look as follows:
    XaddedTimes ; Elements(M) ; CPUtoGPU(ms) ; Kernel(ms) ; GPUtoCPU(ms)
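The steps above can be sketched as follows, using CUDA events for the three timings. This is one possible layout under stated assumptions, not the reference solution: the names `add_constant`, `Timings`, `run_add_constant` and `print_size_sweep` are hypothetical, "1M" is taken to mean 2^20 elements, 256 threads per block is an arbitrary default, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Adds x to each element `times` times with a loop (no multiplication).
__global__ void add_constant(float *a, float x, size_t n, int times) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = a[i];                        // one global read
        for (int t = 0; t < times; ++t) v += x;
        a[i] = v;                              // one global write
    }
}

struct Timings { float h2d_ms, kernel_ms, d2h_ms; };

// Copies `a` to the GPU, runs the kernel, copies the result back into `a`,
// and returns the three elapsed times measured with CUDA events.
Timings run_add_constant(std::vector<float> &a, float x, int times,
                         int threads_per_block = 256) {
    size_t n = a.size(), bytes = n * sizeof(float);
    float *d_a = nullptr;
    cudaMalloc(&d_a, bytes);

    cudaEvent_t e[4];
    for (int k = 0; k < 4; ++k) cudaEventCreate(&e[k]);

    cudaEventRecord(e[0]);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(e[1]);
    unsigned blocks = (unsigned)((n + threads_per_block - 1) / threads_per_block);
    add_constant<<<blocks, threads_per_block>>>(d_a, x, n, times);
    cudaEventRecord(e[2]);
    cudaMemcpy(a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(e[3]);
    cudaEventSynchronize(e[3]);

    Timings t;
    cudaEventElapsedTime(&t.h2d_ms, e[0], e[1]);
    cudaEventElapsedTime(&t.kernel_ms, e[1], e[2]);
    cudaEventElapsedTime(&t.d2h_ms, e[2], e[3]);

    for (int k = 0; k < 4; ++k) cudaEventDestroy(e[k]);
    cudaFree(d_a);
    return t;
}

// Prints the table of item 7 for sizes min_n, 2*min_n, ... up to max_n.
void print_size_sweep(size_t min_n, size_t max_n, float x) {
    printf("Elements(M) ; CPUtoGPU(ms) ; Kernel(ms) ; GPUtoCPU(ms)\n");
    for (size_t n = min_n; n <= max_n; n *= 2) {
        std::vector<float> a(n, 1.0f);         // known, non-random initial values
        Timings t = run_add_constant(a, x, 1);
        printf("%zu ; %.3f ; %.3f ; %.3f\n",
               n >> 20, t.h2d_ms, t.kernel_ms, t.d2h_ms);
    }
}
```

A call such as `print_size_sweep(1u << 20, 1u << 24, 2.0f)` would cover 1M through 16M elements. The upper bound for a real run depends on the device's maximum grid size and available memory, which this sketch does not query. The sweep for item 10 follows the same pattern with `times` running over 1, 2, 4, ..., 256 at a fixed size.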

What to submit/ report:

Submit the version that prints both sets of measurements (i.e., time as a function of element count and time as a function of the number of additions).

Make sure it compiles and runs correctly.

A presentation of around 10 slides summarizing the above 10 points and other observations, supported by the necessary curves.
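For item 8 of Assignment 1, a rough estimate of the CGMA (compute to global memory access) ratio can be written down directly. The counting convention here is an assumption: both the load and the store of each element are counted as global memory accesses, and the loop of item 9 is assumed to accumulate in a register, so that adding X a total of k times still costs one read and one write per element.

```latex
\[
\text{CGMA}
  = \frac{\text{floating-point operations per element}}
         {\text{global memory accesses per element}}
  = \frac{k}{2},
\qquad
k = 1 \;\Rightarrow\; \text{CGMA} = 0.5,
\qquad
k = 256 \;\Rightarrow\; \text{CGMA} = 128 .
\]
```

Under this convention the single-addition kernel moves 8 bytes for every floating-point addition, so its run time is expected to be dominated by memory traffic rather than arithmetic, while larger k shifts the balance toward computation.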

Assignment 2 (26th Aug)

Deadline: September 2

Naive / shared CUDA Matrix Multiply

Write a serial CPU code for matrix multiplication (MM).
Parallelize the serial code (naive implementation of MM) using CUDA.
Modify the naive implementation to use shared memory as discussed in class (not the tiled implementation, which is yet to be fully discussed in class).
Investigate how each of the following factors influences the performance of matrix multiplication in CUDA (for both the naive and the shared-memory implementation):
(1) The size of the matrices to be multiplied

(2) The size of the block computed by each thread block

In addition to the 8 points discussed on page 1, also comment on your observations and on the most optimized implementation.
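As a starting point for the first two tasks, a serial reference and a naive CUDA kernel might look like the sketch below. It assumes square n x n matrices stored in row-major order, one thread per output element, and a square thread block whose edge length `block_dim` is the parameter to vary in experiment (2). The names `matmul_serial`, `matmul_naive` and `matmul_gpu` are hypothetical, error checking is omitted, and the shared-memory variant discussed in class is deliberately not shown.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Serial reference: C = A * B for n x n row-major matrices.
void matmul_serial(const float *A, const float *B, float *C, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Naive kernel: each thread computes one element of C, reading every
// operand straight from global memory.
__global__ void matmul_naive(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Host wrapper: copies inputs to the device, launches the kernel with
// block_dim x block_dim threads per block, and returns C on the host.
std::vector<float> matmul_gpu(const std::vector<float> &A,
                              const std::vector<float> &B,
                              int n, int block_dim = 16) {
    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(block_dim, block_dim);
    dim3 grid((n + block_dim - 1) / block_dim, (n + block_dim - 1) / block_dim);
    matmul_naive<<<grid, block>>>(dA, dB, dC, n);

    std::vector<float> C((size_t)n * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Comparing `matmul_gpu` against `matmul_serial` on small integer-valued inputs gives an exact check; for general inputs the two may differ in the last bits because the GPU can fuse the multiply and add, so a tolerance-based comparison is the safer choice.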

Assignment 3 (Sep 2)

Deadline: September 16

Numerical integration using the trapezoidal rule as discussed in class.

1. Serial implementation.
2. Parallel implementation using CUDA.
In addition to the 10 points discussed on page 1, also comment in the report on your observations and on the most optimized implementation.
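One way to structure the two implementations is sketched below. The composite trapezoidal rule with n equal subintervals of width h = (b - a)/n approximates the integral as h * [ (f(a) + f(b))/2 + f(a + h) + ... + f(a + (n-1)h) ]. The integrand f(x) = x^2, the function names, and the launch configuration of 64 blocks of 256 threads are all assumptions made for illustration; the version discussed in class may differ. Each thread sums a strided subset of the interior points and the host adds up the per-thread partial sums, which avoids atomics at the cost of a small extra copy. Error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Illustrative integrand (an assumption; the handout does not name one).
__host__ __device__ inline double f(double x) { return x * x; }

// Serial composite trapezoidal rule on [a, b] with n subintervals.
double trapezoid_serial(double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return sum * h;
}

// Each thread accumulates interior points i = 1 + tid, 1 + tid + stride, ...
// and writes its partial sum to partial[tid].
__global__ void trapezoid_partial(double a, double h, int n, double *partial) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    double sum = 0.0;
    for (int i = 1 + tid; i < n; i += stride) sum += f(a + i * h);
    partial[tid] = sum;
}

// Host wrapper: launches the kernel, then finishes the reduction on the CPU.
double trapezoid_gpu(double a, double b, int n,
                     int blocks = 64, int threads = 256) {
    double h = (b - a) / n;
    int total = blocks * threads;
    double *d_partial = nullptr;
    cudaMalloc(&d_partial, total * sizeof(double));

    trapezoid_partial<<<blocks, threads>>>(a, h, n, d_partial);

    std::vector<double> partial(total);
    cudaMemcpy(partial.data(), d_partial, total * sizeof(double),
               cudaMemcpyDeviceToHost);
    cudaFree(d_partial);

    double sum = 0.5 * (f(a) + f(b));          // endpoint terms
    for (double p : partial) sum += p;         // interior terms
    return sum * h;
}
```

Because the parallel version adds the terms in a different order than the serial loop, the two results can differ by rounding error, so the serial-versus-parallel check in the report is best done with a small tolerance rather than exact equality.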
