Example: 201201014-GPU-AS2: Assignments For GPU Programming Course/ Lab
Example: 201201014-GPU-AS2
Also do the above for your previous assignment folder
(asgn-1) in your drive shared with csdaiict; the change will
automatically be reflected in the csdaiict drive.
All the assignments have to be supplemented with a brief write-up or
ppt with the following details (wherever necessary):
1. Context:
Brief description of the problem.
Complexity of the algorithm (serial).
Possible speedup (theoretical).
Optimization strategy.
Problems faced in parallelization and possible solutions.
2. Hardware details: GPU model, number of cores, device properties,
compute capability.
3. Input parameters. Output. Make sure the results from the serial and
parallel versions are the same.
4. Naive implementation description. Possible improvements over the naive
implementation.
5. Problem Size vs Time (Serial, parallel) curve. Speedup curve.
Observations and comments about the results.
6. If more than one implementation, curves for all algorithms in the same
plot.
7. Wherever necessary use log scale and auxiliary units.
8. Effect of block dimensions and grid launch on speedup.
9. Proper labeling of graphs.
10. List of observations and conclusion under each table and
figure.
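The quantities asked for in points 1, 3, and 5 (theoretical speedup, agreement between serial and parallel results, and the speedup curve) can be sketched with a few host-side helpers. This is only an illustrative sketch: the function names, the use of Amdahl's law as the theoretical bound, and the absolute tolerance of 1e-5 are assumptions, not part of the assignment.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Measured speedup: serial time divided by parallel time (same units).
double speedup(double serial_ms, double parallel_ms) {
    return serial_ms / parallel_ms;
}

// Amdahl's law (assumed here as the theoretical bound): the speedup when a
// fraction `parallel_fraction` of the work runs on `n_processors` processors.
double amdahl_speedup(double parallel_fraction, double n_processors) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_processors);
}

// Element-wise comparison of serial and parallel outputs.
// The tolerance is a hypothetical default; choose one suited to the problem.
bool results_match(const std::vector<float>& serial,
                   const std::vector<float>& parallel,
                   float tol = 1e-5f) {
    if (serial.size() != parallel.size()) return false;
    for (std::size_t i = 0; i < serial.size(); ++i) {
        if (std::fabs(serial[i] - parallel[i]) > tol) return false;
    }
    return true;
}
```

Calling speedup() once per problem size gives the points of the speedup curve, and results_match() can be run on the array copied back from the GPU before any timing is reported.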
Deadline: 26th August
1. Write a CUDA program that adds a number X to all elements of a one-dimensional array
A.
2. The elements of A and X should be single precision floating-point numbers.
3. Using the necessary timer calls, have your program report the time needed to copy data
from the CPU to the GPU, the time needed to add X to all elements of A in the GPU, and
the time needed to copy the data back from the GPU to the CPU.
4. The elements of A should be initialized with some known value (not random), so that
comparison with the serial code is possible.
5. Vary the number of elements in A from a minimum of 1 million to the maximum number that
can be supported by a single invocation of a GPU kernel, in power-of-two steps, i.e., 1M, 2M,
4M, 8M, 16M, etc.
6. For every different array size, have your program print three time measurements: the time
required to copy A from the CPU to the GPU, the time taken by the kernel, and the time
required to copy the data from the GPU to the CPU.
7. The output should be reported in tabular form, like:
Elements(M) ; CPUtoGPU(ms) ; Kernel(ms) ; GPUtoCPU(ms)
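The serial side of the steps above (deterministic initialization, the power-of-two size sweep, a CPU baseline timing, and one row of the table) might look like the following host-only sketch. All names are hypothetical, the fill pattern i % 100 is arbitrary, 1M is assumed to mean 2^20 elements, and the upper limit is left as a parameter because it depends on the device. The GPU copy and kernel timings are not measured here; print_row() only formats values that the CUDA part of the program would supply.

```cpp
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <vector>

// Deterministic (non-random) input so serial and GPU results can be compared.
std::vector<float> make_input(std::size_t n) {
    std::vector<float> a(n);
    for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<float>(i % 100);
    return a;
}

// Serial reference: add x to every element of a, in place.
void add_scalar_serial(std::vector<float>& a, float x) {
    for (float& v : a) v += x;
}

// Sizes 1M, 2M, 4M, ... up to max_millions (assumes 1M = 2^20 elements).
std::vector<std::size_t> sweep_sizes(std::size_t max_millions) {
    std::vector<std::size_t> sizes;
    for (std::size_t m = 1; m <= max_millions; m *= 2) sizes.push_back(m << 20);
    return sizes;
}

// Wall-clock time of the serial loop, in milliseconds.
double time_serial_ms(std::vector<float>& a, float x) {
    auto t0 = std::chrono::steady_clock::now();
    add_scalar_serial(a, x);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// One table row: Elements(M) ; CPUtoGPU(ms) ; Kernel(ms) ; GPUtoCPU(ms)
void print_row(std::ostream& out, std::size_t n,
               double h2d_ms, double kernel_ms, double d2h_ms) {
    out << (n >> 20) << " ; " << std::fixed << std::setprecision(3)
        << h2d_ms << " ; " << kernel_ms << " ; " << d2h_ms << '\n';
}
```

A driver would loop over sweep_sizes(), build the input with make_input(), run and time the GPU version, and call print_row() once per size after printing the header line.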
Explore the following possibilities for profiling:
cutStartTimer(myTimer)
CUDA events
Submit the version that prints both measurements (i.e., time as a function of element
count and time as a function of the number of additions).
Prepare a presentation of around 10 slides summarizing the above 10 points and other
observations, supported by the necessary curves.
Deadline: September 2
In addition to the 8 points discussed on page 1, also comment on your observations and
the most optimized implementation.
Assignment 3 (Sep 2)
Deadline: September 16