Assignment 3
1 Introduction
Traditionally, computers had only one processor, and algorithms were designed to be
sequential. Today's computers, however, have more than one processing element (multiple
CPU cores, a GPU, or both). This is the setting of “heterogeneous computing”, which we are
currently discussing in CUDA programming. A sequential program cannot take advantage of
the extra processing elements because it processes its instructions one by one. The
performance of some sequential programs (not all) can be improved by exploiting
parallel architectures. Consider vector addition as an example. Two vectors of length n
can be added in parallel using at most n processors, as shown in Figure 1. Here, p_i
denotes the i-th processor. With n processors, the asymptotic run time of a vector
addition program can be reduced from O(n) to O(1) by parallelizing it.
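The vector addition example can be sketched in C++ as follows. This is only an illustration, not a required solution: the function names add_sequential and add_parallel are hypothetical, std::thread stands in for the "processors" of Figure 1, and the work is split into p contiguous chunks because a real machine rarely has n processors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sequential version: a single processor visits every index, O(n) time.
std::vector<double> add_sequential(const std::vector<double>& a,
                                   const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Parallel version (hypothetical helper): the index range is split into
// at most p contiguous chunks and each chunk is added by its own thread.
// Threads write to disjoint parts of c, so no locking is needed.
std::vector<double> add_parallel(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 unsigned p) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<double> c(n);
    if (p == 0) p = 1;                          // guard against zero workers
    const std::size_t chunk = (n + p - 1) / p;  // ceil(n / p) elements each

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < p; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;                // no elements left to assign
        workers.emplace_back([&a, &b, &c, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                c[i] = a[i] + b[i];
            }
        });
    }
    for (auto& w : workers) w.join();           // wait for every chunk
    return c;
}
```

Calling add_parallel(a, b, 4) should return the same vector as add_sequential(a, b). For short vectors the cost of starting threads usually outweighs the gain, so any speed-up should only be expected for large n.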
In general, if you have p processors and the sequential version of an application
runs in t_s seconds, then parallelizing the application should (theoretically) reduce
the execution time to O(t_s/p), that is, a speed-up of up to a factor of p. Please
note that not every sequential algorithm can be converted into a parallel algorithm,
so be cautious when making these design decisions.
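To make this estimate concrete, here is a small worked example. The symbols t_p (parallel execution time) and S (speed-up) and the numbers used are assumptions introduced only for illustration. They describe the ideal case, which real programs rarely reach.

```latex
\[
t_p \approx \frac{t_s}{p}, \qquad S = \frac{t_s}{t_p} \approx p
\]
\[
t_s = 8\,\text{s},\quad p = 4
\;\Longrightarrow\;
t_p \approx \frac{8\,\text{s}}{4} = 2\,\text{s},\quad
S \approx \frac{8\,\text{s}}{2\,\text{s}} = 4
\]
```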
Figure 1: Vector addition: Sequential to Parallel example
2 Assignment
In this assignment, you are asked to select an algorithm of your choice, for example
one from the “Algorithm Analysis” course, and parallelize it. You may use any
programming language (C/C++, Fortran, etc.) with which you are familiar. You are
required to submit a report of about 1-2 pages along with the code (both the
sequential version and the parallel version) in a zip file. In the report, discuss the
performance of the application in terms of the parameters that you feel are relevant
for analysis. Speed-up in execution time could be one of these parameters, that is,
how much faster the parallel program runs than the sequential program. Name your zip
file: Nazia_Rollnumber_a3.
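For the report, one possible way to obtain a speed-up figure is to time both versions with a monotonic clock and divide the results. The sketch below is a suggestion only: the helper names time_seconds, speed_up, and efficiency are hypothetical, and efficiency (speed-up divided by the number of processors) is an extra metric not required by the assignment.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
double time_seconds(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speed-up = sequential time / parallel time. A value above 1 means
// the parallel version was faster.
double speed_up(double sequential_seconds, double parallel_seconds) {
    assert(parallel_seconds > 0.0);
    return sequential_seconds / parallel_seconds;
}

// Efficiency = speed-up / number of processors used. A value of 1
// corresponds to the ideal (linear) speed-up.
double efficiency(double speedup, unsigned p) {
    assert(p > 0);
    return speedup / static_cast<double>(p);
}
```

In practice you would pass each version of your algorithm to time_seconds as a lambda, repeat the measurement several times, and report an average, since a single run can vary with system load.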