
Lecture 9 - Parallel Algorithms

The document discusses parallel algorithms, highlighting their potential to significantly reduce solution times by utilizing multiple processors. It explains the concept of speedup, efficiency, and various computational models, including shared memory models like PRAM. Additionally, it covers specific parallel computation techniques such as prefix computation and merging algorithms, emphasizing the differences between work-optimal and non-work-optimal approaches.


CSE 2201: Algorithm Analysis and Design

Parallel Algorithms
Md Mehrab Hossain Opi
Introduction 2
• The algorithms we have seen so far use a single-processor computer.
• Today we will study algorithms for parallel machines.
• Computers with more than one processor.

• Parallel machines offer the potential of decreasing solution times enormously.
Idea 3
• Say there are 100 numbers to be added.
• And there are two persons A and B.
• Person A can add the first 50 numbers.
• Person B can add the last 50 numbers.
• When they are done, one of them can add the two individual sums.
• Two people can add the 100 numbers in almost half the time required
by one.
• What if we use 10 people? (A sketch of the two-person version follows.)
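
A minimal sketch of the two-person version in Python (the numbers 1 through 100 and the use of a thread pool are illustrative, not from the slides):

    from concurrent.futures import ThreadPoolExecutor

    numbers = list(range(1, 101))  # stand-in for the 100 numbers to be added

    # Two "persons": each sums one half; one final addition combines the partial sums.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_half = pool.submit(sum, numbers[:50])
        second_half = pool.submit(sum, numbers[50:])
    total = first_half.result() + second_half.result()

    print(total)  # 5050
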
Idea 4
• The idea of parallel computing is very similar.
• Given a problem to solve, we partition the problem into many
subproblems.
• Let each processor work on a subproblem.
• When all the processors are done, the partial solutions are combined to
arrive at the final answer.
• Any algorithm designed for a single-processor machine as a sequential
algorithm.
• Any designed for a multiprocessor machine as a parallel algorithm.
Speedup 5
• Suppose a sequential algorithm has a run time of T_S(n), where n is the problem size.
• If a parallel algorithm for the same problem runs on a p-processor machine in time T_P(n), then the speedup of the algorithm is defined to be
  S = T_S(n) / T_P(n)

• What is the speedup of adding 100 numbers with two persons?


Speedup 6
• The total work done by a p-processor machine is defined as W = p · T_P(n).
• The efficiency of the algorithm is defined to be E = T_S(n) / (p · T_P(n)) = S / p.
• The parallel algorithm is said to be work-optimal if p · T_P(n) = O(T_S(n)), i.e., the total work matches the best sequential run time up to a constant factor.
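
A small numeric check of these definitions, using the adding-100-numbers example and one common way of counting additions (the function names are illustrative):

    def speedup(t_seq, t_par):
        # S = T_S(n) / T_P(n)
        return t_seq / t_par

    def efficiency(t_seq, t_par, p):
        # E = S / p = T_S(n) / (p * T_P(n))
        return speedup(t_seq, t_par) / p

    # Adding 100 numbers: 99 additions sequentially; in parallel,
    # 49 additions per person plus one combining addition, so T_P = 50.
    print(speedup(99, 50))        # 1.98
    print(efficiency(99, 50, 2))  # 0.99
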
Computational Model 7
• The sequential model we have employed is the RAM.
• We assume we can perform – addition, subtraction, multiplication,
division, comparison, memory access, assignment, etc. in one unit of
time.
• An important feature of parallel computing that is absent in sequential
computing is the need for inter-processor communication.
• For example, deciding which subproblem will be solved by each processor.
• Or checking whether every processor has finished its task.
Shared Memory Model 8
• A number of processors work synchronously.
• They communicate with each other using a common block of global
memory that is accessible by all.
• Called common or shared memory.

• Communication is performed by writing to and/or reading from the common memory.
• The model is also called PRAM (Parallel RAM) model.
Shared Memory Model 9
• Each processor in a PRAM is a RAM with some local memory.

[Figure: processors 1, 2, 3, …, p, each a RAM with local memory, all connected to a global memory of cells 1, 2, 3, 4, …, m]
Shared Memory Model 10
• We assume the input is given in the global memory.
• There is space for storing intermediate results.
• There is a chance of access conflicts.
• What happens if more than one processor tries to access the same global memory cell?
• Several variants of the PRAM have been proposed.
Shared Memory Model 11
• EREW PRAM
• Exclusive Read and Exclusive Write PRAM is the model in which no
concurrent read or write is allowed.
• Processors can access different cells concurrently.

• CREW PRAM
• Concurrent Read and Exclusive Write

• CRCW PRAM
• Concurrent Read and Concurrent Write.

• The time complexity of an algorithm depends on which model is used.


Prefix Computation 12
• Suppose we want to compute the prefix sums (or prefix products) of an array.
• The operation can also be minimum or maximum.

Array:           3  -2   4   3  -4   8  -7   9
Prefix sum:      3   1   5   8   4  12   5  14
Prefix minimum:  3  -2  -2  -2  -4  -4  -7  -7

• How can we compute this?


• What will be the time complexity?
Prefix Computation 13
• The prefix computation problem can be solved in O(n) time sequentially.
• Let's see how we can solve it using multiple processors.
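
A minimal sequential sketch of the O(n) computation (prefix sums here; swapping addition for min or max gives the other variants):

    def prefix_sums(x):
        # Sequential O(n) prefix computation: out[i] = x[0] + ... + x[i].
        out, running = [], 0
        for v in x:
            running += v
            out.append(running)
        return out

    print(prefix_sums([3, -2, 4, 3, -4, 8, -7, 9]))  # [3, 1, 5, 8, 4, 12, 5, 14]
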
Prefix Computation 14
• We will use the divide-and-conquer strategy.
• Let the input be x_1, x_2, …, x_n.
• Assume that n is an integral power of 2.
Prefix Computation 15
Step 0: If n = 1, one processor outputs x_1.
Step 1: Let the first n/2 processors recursively compute the prefixes of x_1, x_2, …, x_{n/2}. Let y_1, y_2, …, y_{n/2} be the result. At the same time, let the rest of the processors recursively compute the prefixes of x_{n/2+1}, …, x_n, and let y_{n/2+1}, …, y_n be the output.
Step 2: The first half of the final answer is the same as y_1, …, y_{n/2}. The second half of the final answer is y_{n/2} + y_{n/2+1}, …, y_{n/2} + y_n. The processors responsible for the second half can concurrently read y_{n/2} from the global memory and update their answers. This step takes O(1) time. (CREW model)
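
A minimal sequential simulation of this divide-and-conquer scheme (the recursion stands in for the two processor groups working simultaneously; the function name is illustrative):

    def parallel_prefix_sum(x):
        # Simulates the CREW PRAM divide-and-conquer prefix computation.
        # With n processors the two recursive calls run at the same time and
        # the final update is one concurrent-read step, so T(n) = T(n/2) + O(1).
        n = len(x)
        if n == 1:
            return list(x)
        left = parallel_prefix_sum(x[:n // 2])    # first n/2 processors
        right = parallel_prefix_sum(x[n // 2:])   # remaining n/2 processors, in parallel
        # Step 2: every second-half processor reads left[-1] (= y_{n/2}) concurrently.
        return left + [left[-1] + v for v in right]

    print(parallel_prefix_sum([3, -2, 4, 3, -4, 8, -7, 9]))  # [3, 1, 5, 8, 4, 12, 5, 14]
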
Prefix Computation 16
• What will be the time complexity?
• Assume we have n processors.
• Step 1 takes T(n/2) time and step 2 takes O(1) time.
• We get the following recurrence:
  T(n) = T(n/2) + O(1)
• This solves to T(n) = O(log n).
• Is it work-optimal?
• No.
Prefix Computation 17
• The total work done by this algorithm is n · O(log n) = O(n log n).
• Using the sequential algorithm this can be done with O(n) work.
• A work-optimal algorithm can be obtained by decreasing the number of processors.
• We will use n / log n processors.
Work-Optimal Prefix Computation 18
Step 1: Processor i (for 1 ≤ i ≤ n/log n), in parallel, computes the prefixes of its log n assigned elements x_{(i-1) log n + 1}, …, x_{i log n}. This takes O(log n) time. Let the results be z_{(i-1) log n + 1}, …, z_{i log n}.
Step 2: A total of n/log n processors employ the previous algorithm to compute the prefixes of the block sums z_{log n}, z_{2 log n}, …, z_n. Let w_1, w_2, …, w_{n/log n} be the result.
Step 3: Each processor updates the prefixes it computed in step 1 as follows. Processor i computes and outputs w_{i-1} + z_{(i-1) log n + 1}, …, w_{i-1} + z_{i log n}, for 2 ≤ i ≤ n/log n. Processor 1 outputs z_1, …, z_{log n} without any modification.
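
A minimal sequential simulation of these three steps, assuming a block size of about log2 n (the function name is illustrative); applied to the input on the next slide it reproduces the values shown there:

    import math

    def work_optimal_prefix_sum(x):
        # Simulates the three-step algorithm with block size log2(n),
        # i.e. about n / log n "processors".
        n = len(x)
        block = max(1, round(math.log2(n))) if n > 1 else 1
        chunks = [x[i:i + block] for i in range(0, n, block)]

        # Step 1: each processor computes prefix sums of its own block (O(log n) time).
        local = []
        for chunk in chunks:
            running, pref = 0, []
            for v in chunk:
                running += v
                pref.append(running)
            local.append(pref)

        # Step 2: prefix sums of the block totals (on the PRAM this uses the
        # previous O(log n)-time algorithm on the n / log n block sums).
        offset, offsets = 0, []
        for pref in local:
            offsets.append(offset)
            offset += pref[-1]

        # Step 3: processor i adds the total of all preceding blocks to its prefixes.
        return [v + off for pref, off in zip(local, offsets) for v in pref]

    print(work_optimal_prefix_sum(
        [5, 12, 8, 6, 3, 9, 11, 12, 1, 5, 6, 7, 10, 4, 3, 5]))
    # [5, 17, 25, 31, 34, 43, 54, 66, 67, 72, 78, 85, 95, 99, 102, 107]
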
Work-Optimal Prefix Computation 19
Input (n = 16, 4 processors, 4 elements each):
  Processor 1: 5 12 8 6 | Processor 2: 3 9 11 12 | Processor 3: 1 5 6 7 | Processor 4: 10 4 3 5

Step 1 (local to processors, each computes its own prefixes):
  5 17 25 31 | 3 12 23 35 | 1 6 12 19 | 10 14 17 22
  Local sums: 31 35 19 22

Step 2 (global computation, prefixes of the local sums):
  31 66 85 107

Step 3 (update, each processor adds the sum of the preceding blocks):
  5 17 25 31 | 34 43 54 66 | 67 72 78 85 | 95 99 102 107
Work-Optimal Prefix Computation 20
• Step 1 takes O(log n) time.
• Step 2 takes O(log n) time.
• Finally, step 3 also takes O(log n) time.
• So the algorithm runs in O(log n) time with n / log n processors; the total work is O(n), which makes it work-optimal.
Merging 21
• The problem of merging is to take two sorted sequences as input and produce a single sorted sequence of all the elements.
• We already know an O(m + n)-time sequential algorithm for merging sequences of lengths m and n.
• Let’s see how we can use parallel computing.
Parallel Merging 22
• Suppose we have two sorted sequences X = x_1, x_2, …, x_m and Y = y_1, y_2, …, y_m.
• Assume m is a power of 2, and all values are distinct.
• We can think of the problem as assigning a rank to each key: the rank of a key is its position in the merged output.
• If we know the rank of each key, then the keys can be merged by writing the key whose rank is i into global memory cell i.
• Writing will take O(1) time if we have 2m processors.
Parallel Merging 23
• How do we find the rank of a key?
• If the key is x_i, then we already know there are i - 1 elements of X less than it.
• How do we find the number of elements of Y less than x_i?
• Binary search.
• If 2m processors work in parallel, all the ranks can be computed in O(log m) time.
• And writing will take O(1) time.
• Hence, the merging can be done in O(log m) time.
• But the algorithm is not work-optimal: the total work is 2m · O(log m) = O(m log m), while a sequential merge needs only O(m).
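
A minimal sequential sketch of this rank-based merge (each loop iteration stands in for one processor's independent binary search; the function name is illustrative, and all keys are assumed distinct):

    from bisect import bisect_left

    def parallel_merge_by_rank(x, y):
        # Rank-based merge (CREW PRAM idea), simulated sequentially.
        # Each iteration corresponds to one processor doing an O(log m) binary search.
        out = [None] * (len(x) + len(y))
        for i, key in enumerate(x):               # one processor per key of X
            out[i + bisect_left(y, key)] = key    # i keys of X precede it; count smaller keys of Y
        for j, key in enumerate(y):               # one processor per key of Y
            out[j + bisect_left(x, key)] = key
        return out

    print(parallel_merge_by_rank([1, 4, 7, 9], [2, 3, 5, 6]))
    # [1, 2, 3, 4, 5, 6, 7, 9]
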
Odd-Even Merge 24
0. If m = 1, merge the two one-element sequences with one comparison.
1. Partition X = x_1, x_2, …, x_m and Y = y_1, y_2, …, y_m into their odd and even parts. That is, partition X into X_odd = x_1, x_3, …, x_{m-1} and X_even = x_2, x_4, …, x_m. Similarly, partition Y into Y_odd and Y_even.
2. Recursively merge X_odd with Y_odd using m processors. Let L_1 = l_1, l_2, …, l_m be the result. At the same time, merge X_even with Y_even using the other m processors to get L_2 = l_{m+1}, l_{m+2}, …, l_{2m}.
3. Shuffle L_1 and L_2; that is, form the sequence l_1, l_{m+1}, l_2, l_{m+2}, …, l_m, l_{2m}. Compare every pair (l_{m+i}, l_{i+1}) and interchange them if they are out of order. Output the resultant sequence.
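
A minimal sequential simulation of odd-even merge (the function name is illustrative; the two recursive calls stand in for the two processor groups working at the same time):

    def odd_even_merge(x, y):
        # x and y are sorted, of equal length m (a power of 2), with distinct values.
        m = len(x)
        if m == 1:
            return [min(x[0], y[0]), max(x[0], y[0])]
        # Step 1: odd parts (positions 1, 3, ...) and even parts (positions 2, 4, ...).
        x_odd, x_even = x[0::2], x[1::2]
        y_odd, y_even = y[0::2], y[1::2]
        # Step 2: the two recursive merges would run on disjoint processor groups.
        l1 = odd_even_merge(x_odd, y_odd)
        l2 = odd_even_merge(x_even, y_even)
        # Step 3: shuffle l1 and l2, then compare-exchange the interior adjacent pairs.
        shuffled = [v for pair in zip(l1, l2) for v in pair]
        for i in range(1, len(shuffled) - 1, 2):
            if shuffled[i] > shuffled[i + 1]:
                shuffled[i], shuffled[i + 1] = shuffled[i + 1], shuffled[i]
        return shuffled

    print(odd_even_merge([1, 4, 7, 9], [2, 3, 5, 6]))
    # [1, 2, 3, 4, 5, 6, 7, 9]
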
Odd-Even Merge 25
Odd-Even Merge Sort 26
• Self Study.
• Algorithm 13.11 and 13.12.
References 27
• Computer Algorithms
– Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran.
28

Thank You.
