GPU Optimisation

This document outlines an assignment to profile and optimize GPU performance by developing a "Template D" program. Students are asked to profile the GPU computation phases, measure occupancy, memory bandwidth, and speedup, and compare GPU and CPU performance. They must submit their template and a report, due on a Monday, analyzing when GPU processing outperforms the CPU.

Uploaded by

raleigh_rayl
Copyright
© Attribution Non-Commercial (BY-NC)

Advanced Operating Systems – Programming Assignment 2

GPU Profiling & Optimisation

Your assignment is to complete our investigation into GPU speedup, occupancy, and memory
bandwidth, and to answer the question:

“Since there is an overhead in GPU processing, when does it make sense to use the GPU instead of the
CPU?”
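
One way to frame this question is as a break-even point: the GPU pays a fixed overhead plus a per-element transfer cost over PCIe, and it only wins once its faster computation has paid that back. The following is a minimal host-side sketch of such a cost model, not part of the lab templates. The struct, the function names, and every number in `main` are hypothetical placeholders; the real values have to come from your own measurements.

```cuda
#include <cstdio>

// Hypothetical cost model. All fields are assumptions to be replaced
// with values measured on the actual machine.
struct CostModel {
    double gpu_fixed_s;       // fixed GPU overhead per run (seconds)
    double pcie_bytes_per_s;  // host <-> device transfer rate
    double bytes_per_elem;    // bytes moved over PCIe per element (in + out)
    double gpu_elems_per_s;   // GPU kernel throughput
    double cpu_elems_per_s;   // CPU throughput
};

// Modelled CPU time for n elements.
double cpu_time(const CostModel& m, double n) {
    return n / m.cpu_elems_per_s;
}

// Modelled GPU time for n elements: fixed overhead + transfer + compute.
double gpu_time(const CostModel& m, double n) {
    return m.gpu_fixed_s
         + n * m.bytes_per_elem / m.pcie_bytes_per_s
         + n / m.gpu_elems_per_s;
}

// Problem size at which the two modelled times are equal,
// or -1 if the GPU never catches up (per-element cost too high).
double break_even_elems(const CostModel& m) {
    double cpu_per = 1.0 / m.cpu_elems_per_s;
    double gpu_per = m.bytes_per_elem / m.pcie_bytes_per_s
                   + 1.0 / m.gpu_elems_per_s;
    if (gpu_per >= cpu_per) return -1.0;
    return m.gpu_fixed_s / (cpu_per - gpu_per);
}

int main() {
    // Illustrative numbers only, not measurements.
    CostModel m = {1e-4, 4e9, 8.0, 1e10, 1e8};
    double n = break_even_elems(m);
    if (n < 0.0) {
        printf("GPU never faster under this model\n");
    } else {
        printf("break-even at about %.0f elements\n", n);
        printf("at 10x that size: cpu %.6f s, gpu %.6f s\n",
               cpu_time(m, 10.0 * n), gpu_time(m, 10.0 * n));
    }
    return 0;
}
```

The model is deliberately linear. If the transfer term alone costs more per element than the CPU computation, no problem size makes the GPU worthwhile, which is the case `break_even_elems` reports as -1.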

Refer to the templates we have developed in the lab (A, B, and C), the NVIDIA webinars on CUDA
optimisation, and the other example programs in the NVIDIA GPU Computing SDK (e.g.
simpleZeroCopy and bandwidthTest).
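
As a rough illustration of the kind of measurement the SDK's bandwidthTest makes, the sketch below times host-to-device copies from pageable memory (`malloc`) and from page-locked memory (`cudaHostAlloc`) using CUDA events. It is an assumption about how you might structure such a test, not the SDK's code: the `CHECK` macro, the helper names, the 64 MiB buffer size, and the repetition count are all hypothetical choices.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical error-checking helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Convert bytes and elapsed milliseconds to GB/s (1 GB = 1e9 bytes).
double gb_per_s(size_t bytes, float ms) {
    return (bytes / 1e9) / (ms / 1e3);
}

// Average time in ms of `reps` host-to-device copies of `bytes` bytes.
float time_h2d(void* d_dst, const void* h_src, size_t bytes, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // wait until `stop` has happened
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB, arbitrary test size
    const int reps = 10;             // arbitrary repetition count

    void* h_pageable = malloc(bytes);
    if (!h_pageable) return 1;
    memset(h_pageable, 0, bytes);

    void* h_pinned = NULL;
    CHECK(cudaHostAlloc(&h_pinned, bytes, cudaHostAllocDefault));
    memset(h_pinned, 0, bytes);

    void* d_buf = NULL;
    CHECK(cudaMalloc(&d_buf, bytes));

    float ms_pageable = time_h2d(d_buf, h_pageable, bytes, reps);
    float ms_pinned   = time_h2d(d_buf, h_pinned, bytes, reps);
    printf("pageable H2D: %.2f GB/s\n", gb_per_s(bytes, ms_pageable));
    printf("pinned   H2D: %.2f GB/s\n", gb_per_s(bytes, ms_pinned));

    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_pinned));
    free(h_pageable);
    return 0;
}
```

The same helper can be pointed at device-to-host copies by swapping the arguments and the copy direction, which gives the second half of the transfer picture for the report.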

Develop a Template D that incorporates your understanding of the CUDA C Runtime API and the
material covered in the lectures. Write a short report on your findings, and provide comparative results
(speed, occupancy, bandwidth), using Template C results as your baseline.
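
To make the occupancy figures in such a comparison concrete, theoretical occupancy can be estimated by hand as resident warps divided by the maximum resident warps per multiprocessor. The sketch below does this on the host using the published per-multiprocessor limits for compute capability 2.0 (48 warps, 8 blocks, 32768 registers, 48 KB of shared memory). It is a simplified estimate that ignores register and shared-memory allocation granularity, so treat it as an upper bound; the struct and function names are hypothetical, and the register count per thread in `main` is an assumed value that in practice comes from the compiler output.

```cuda
#include <algorithm>
#include <cstdio>

// Per-multiprocessor resource limits.
struct SmLimits {
    int max_warps;   // maximum resident warps
    int max_blocks;  // maximum resident blocks
    int regs;        // 32-bit registers
    int smem_bytes;  // shared memory in bytes
};

// Limits for compute capability 2.0 (Fermi), 48 KB shared memory setting.
const SmLimits kFermi = {48, 8, 32768, 49152};

// Theoretical occupancy in [0, 1]: resident warps / maximum warps.
// Simplification: allocation granularity is ignored.
double occupancy(const SmLimits& sm, int threads_per_block,
                 int regs_per_thread, int smem_per_block) {
    int warps_per_block = (threads_per_block + 31) / 32;
    int by_warps = sm.max_warps / warps_per_block;
    int by_regs = regs_per_thread > 0
        ? sm.regs / (regs_per_thread * warps_per_block * 32)
        : sm.max_blocks;
    int by_smem = smem_per_block > 0
        ? sm.smem_bytes / smem_per_block
        : sm.max_blocks;
    // The scarcest resource decides how many blocks fit on one SM.
    int blocks = std::min(std::min(sm.max_blocks, by_warps),
                          std::min(by_regs, by_smem));
    return double(blocks * warps_per_block) / sm.max_warps;
}

int main() {
    const int regs_per_thread = 20;  // assumed; read it from the compiler
    const int sizes[] = {64, 128, 256, 512, 1024};
    for (int i = 0; i < 5; ++i) {
        printf("block size %4d: occupancy %.2f\n", sizes[i],
               occupancy(kFermi, sizes[i], regs_per_thread, 0));
    }
    return 0;
}
```

Small blocks are capped by the 8-block limit before they can fill 48 warps, while register-heavy kernels are capped by the register file. Sweeping the block size this way shows which limit applies to a given kernel.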

Your program should implement the following requirements:

1. Target the Fermi architecture (compute capability 2.0, as in your GTX 470), and state in your
report which features used in your Template D are not available at CUDA compute capability 1.0.

2. Wall-clock timings of CPU and GPU computation.

3. Direct profiling of GPU computation phases, as per the events in Template C.

4. Achieve maximum GPU speedup.

5. Achieve maximum GPU occupancy.

6. Achieve maximum GPU memory bandwidth.

7. Compare 4, 5, and 6 with the CPU.
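
The timing requirements above can be sketched as follows: a CPU run timed with a wall clock, and a GPU run split into three phases (host-to-device copy, kernel, device-to-host copy) by CUDA events. This is an illustrative outline only and does not reproduce Template C. The SAXPY workload, the `CHECK` macro, the helper names, the problem size, and the block size of 256 are all assumptions, and `std::chrono` assumes a C++11-capable host compiler.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-checking helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical workload: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// CPU reference for the same workload.
void saxpy_cpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// Milliseconds between two recorded events (waits for `stop`).
float elapsed_ms(cudaEvent_t start, cudaEvent_t stop) {
    float ms = 0.0f;
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    return ms;
}

int main() {
    const int n = 1 << 22;  // arbitrary problem size
    const size_t bytes = n * sizeof(float);
    std::vector<float> x(n, 1.0f), y_cpu(n, 2.0f), y_gpu(n, 2.0f);

    // Wall-clock timing of the CPU computation.
    auto t0 = std::chrono::steady_clock::now();
    saxpy_cpu(n, 3.0f, x.data(), y_cpu.data());
    auto t1 = std::chrono::steady_clock::now();
    double cpu_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    float *d_x = NULL, *d_y = NULL;
    CHECK(cudaMalloc(&d_x, bytes));
    CHECK(cudaMalloc(&d_y, bytes));
    cudaEvent_t ev[4];
    for (int i = 0; i < 4; ++i) CHECK(cudaEventCreate(&ev[i]));

    // Events mark the boundaries between the three GPU phases.
    CHECK(cudaEventRecord(ev[0], 0));
    CHECK(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_y, y_gpu.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(ev[1], 0));
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, d_x, d_y);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(ev[2], 0));
    CHECK(cudaMemcpy(y_gpu.data(), d_y, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaEventRecord(ev[3], 0));

    float h2d = elapsed_ms(ev[0], ev[1]);
    float kern = elapsed_ms(ev[1], ev[2]);
    float d2h = elapsed_ms(ev[2], ev[3]);
    float gpu_ms = h2d + kern + d2h;

    // Verify the GPU result against the CPU reference.
    for (int i = 0; i < n; ++i)
        if (y_gpu[i] != y_cpu[i]) { fprintf(stderr, "mismatch at %d\n", i); return 1; }

    printf("CPU %.3f ms | GPU H2D %.3f, kernel %.3f, D2H %.3f ms\n",
           cpu_ms, h2d, kern, d2h);
    printf("speedup, kernel only: %.2fx; including transfers: %.2fx\n",
           cpu_ms / kern, cpu_ms / gpu_ms);
    // Kernel traffic: read x, read y, write y = 3 * bytes.
    printf("effective kernel bandwidth: %.2f GB/s\n",
           (3.0 * bytes / 1e9) / (kern / 1e3));

    for (int i = 0; i < 4; ++i) CHECK(cudaEventDestroy(ev[i]));
    CHECK(cudaFree(d_x));
    CHECK(cudaFree(d_y));
    return 0;
}
```

Reporting the speedup both with and without the transfer phases matters here, because the gap between the two figures is the overhead the assignment question asks about.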

Submit your template and report to the course website no later than first thing on Monday 11th October.
Please note that there are NO extensions to this deadline.
