Parallel Distributed Computing Unit-4

The document summarizes sorting algorithms that can be used for parallel computing. It discusses bubble sort and its variants like odd-even transposition sort. It also covers parallel implementations of sorting algorithms like shellsort and quicksort. For quicksort, it describes a shared address space formulation where the list is divided among processors, a pivot is selected, and the lists are merged and sorted recursively.

Chapter 9 (Secs. 9.1, 9.3, 9.4)
Sorting Algorithms
A. Grama, A. Gupta, G. Karypis, and V. Kumar

To accompany the text “Introduction to Parallel Computing”, Addison Wesley, 2003.
Topic Overview

• Issues in Sorting on Parallel Computers

• Bubble Sort and its Variants

• Quicksort
Sorting: Basics

• One of the most commonly used and well-studied kernels.

• The fundamental operation of comparison-based sorting is compare-exchange.

• The lower bound on any comparison-based sort of n numbers is Θ(n log n) on a serial computer.

• In the case of parallel sorting, the sorted list is partitioned among a number of processors, such that (1) each sublist is sorted, and (2) for i < j, each element in processor Pi’s sublist is less than those in Pj’s sublist.
Sorting: Parallel Compare Exchange Operation
[Figure: a parallel compare-exchange operation in three steps.]

A parallel compare-exchange operation (each process is responsible for one element). Processes Pi and Pj send their elements to each other. Process Pi keeps min{ai, aj}, and Pj keeps max{ai, aj}.
Sorting: Parallel Compare Split Operation
[Figure: a parallel compare-split operation in four steps. Pi starts with the sorted block 2 7 9 10 12 and Pj with 1 6 8 11 13; after the exchange and merge, Pi holds 1 2 6 7 8 and Pj holds 9 10 11 12 13.]

A compare-split operation (each process is responsible for a block of elements). Each process sends all its elements to another process. Each process merges the received block with its own block and retains only the appropriate half of the merged block. In this example, process Pi retains the smaller elements and process Pj retains the larger elements.
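Both primitives are straightforward to express in code. Below is a minimal sequential sketch in Python (the message exchange is simulated; the function names are mine, not from the text):

def compare_exchange(ai, aj):
    # Compare-exchange, simulated: Pi keeps the min, Pj keeps the max.
    return min(ai, aj), max(ai, aj)

def compare_split(block_i, block_j):
    # Compare-split, simulated: each process holds a sorted block of size k.
    # Pi keeps the k smallest elements of the merged sequence, Pj the k largest.
    # (A real implementation merges two sorted blocks in Θ(k) instead of sorting.)
    k = len(block_i)
    merged = sorted(block_i + block_j)
    return merged[:k], merged[k:]

# The example from the figure above:
lo, hi = compare_split([2, 7, 9, 10, 12], [1, 6, 8, 11, 13])
print(lo, hi)  # [1, 2, 6, 7, 8] [9, 10, 11, 12, 13]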
Bubble Sort and its Variants

The sequential bubble sort algorithm compares and exchanges adjacent elements in the sequence to be sorted:

1. procedure BUBBLE SORT(n)
2. begin
3. for i := n − 1 downto 1 do
4. for j := 1 to i do
5. compare-exchange(aj, aj+1);
6. end BUBBLE SORT

Sequential bubble sort algorithm.


Bubble Sort and its Variants

• The complexity of bubble sort is Θ(n²).

• Bubble sort is difficult to parallelize since the algorithm has no concurrency.

• A simple variant, though, uncovers the concurrency.


Odd-Even Transposition

1. procedure ODD-EVEN(n)
2. begin
3. for i := 1 to n do
4. begin
5. if i is odd then
6. for j := 0 to n/2 − 1 do
7. compare-exchange(a2j+1, a2j+2);
8. if i is even then
9. for j := 1 to n/2 − 1 do
10. compare-exchange(a2j , a2j+1);
11. end for
12. end ODD-EVEN

Sequential odd-even transposition sort algorithm.
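A direct translation of this pseudocode into runnable Python (a minimal sketch, 0-indexed rather than 1-indexed):

def odd_even_sort(a):
    # n phases alternating odd and even pair exchanges. Odd phases compare
    # pairs (a1, a2), (a3, a4), ...; even phases compare (a2, a3), (a4, a5), ...
    # (1-indexed, as in the pseudocode above).
    n = len(a)
    for i in range(1, n + 1):
        start = 0 if i % 2 == 1 else 1
        for j in range(start, n - 1, 2):
            if a[j] > a[j + 1]:          # compare-exchange(a[j], a[j+1])
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(odd_even_sort([3, 2, 3, 8, 5, 6, 4, 1]))  # [1, 2, 3, 3, 4, 5, 6, 8]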


Odd-Even Transposition
Unsorted

3 2 3 8 5 6 4 1

Phase 1 (odd)

2 3 3 8 5 6 1 4

Phase 2 (even)

2 3 3 5 8 1 6 4

Phase 3 (odd)

2 3 3 5 1 8 4 6

Phase 4 (even)

2 3 3 1 5 4 8 6

Phase 5 (odd)

2 3 1 3 4 5 6 8

Phase 6 (even)

2 1 3 3 4 5 6 8

Phase 7 (odd)

1 2 3 3 4 5 6 8

Phase 8 (even)

1 2 3 3 4 5 6 8

Sorted

Sorting n = 8 elements, using the odd-even transposition sort algorithm. During each phase, n = 8 elements are compared.
Odd-Even Transposition

• After n phases of odd-even exchanges, the sequence is sorted.

• Each phase of the algorithm (either odd or even) requires Θ(n) comparisons.

• Serial complexity is Θ(n²).


Parallel Odd-Even Transposition

• Consider the one item per processor case.

• There are n iterations; in each iteration, each processor does one compare-exchange.

• The parallel run time of this formulation is Θ(n).

• This is cost-optimal with respect to the base serial algorithm (bubble sort), but not with respect to the optimal Θ(n log n) serial sort.
Parallel Odd-Even Transposition

1. procedure ODD-EVEN PAR(n)
2. begin
3. id := process’s label
4. for i := 1 to n do
5. begin
6. if i is odd then
7. if id is odd then
8. compare-exchange min(id + 1);
9. else
10. compare-exchange max(id − 1);
11. if i is even then
12. if id is even then
13. compare-exchange min(id + 1);
14. else
15. compare-exchange max(id − 1);
16. end for
17. end ODD-EVEN PAR

Parallel formulation of odd-even transposition.


Parallel Odd-Even Transposition

• Consider a block of n/p elements per processor.

• The first step is a local sort.

• In each subsequent step, the compare-exchange operation is replaced by the compare-split operation.

• The parallel run time of the formulation is

TP = Θ((n/p) log (n/p)) + Θ(n) + Θ(n),

where the first term is the local sort, the second the comparisons, and the third the communication.
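A sequential simulation of this block-based formulation, building on the compare_split sketch above (assumes equal block sizes):

def parallel_odd_even_blocks(blocks):
    # p processes, n/p elements each. Local sort first, then p phases in
    # which neighboring processes compare-split their blocks.
    p = len(blocks)
    blocks = [sorted(b) for b in blocks]      # local sort: Θ((n/p) log (n/p))
    for i in range(1, p + 1):
        start = 0 if i % 2 == 1 else 1        # odd phase: pairs (0,1), (2,3), ...
        for j in range(start, p - 1, 2):
            blocks[j], blocks[j + 1] = compare_split(blocks[j], blocks[j + 1])
    return blocks

print(parallel_odd_even_blocks([[8, 3], [5, 1], [7, 2], [6, 4]]))
# [[1, 2], [3, 4], [5, 6], [7, 8]]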
Shellsort

• Let n be the number of elements to be sorted and p be the number of processes.

• During the first phase, processes that are far away from each other in the array compare-split their elements.

• During the second phase, the algorithm switches to an odd-even transposition sort.
Parallel Shellsort

[Figure: the first phase of parallel shellsort on the eight-process array 0 3 4 5 6 7 2 1; in successive steps, each process is paired with its mirror image in the whole array, then within each half, then within each quarter.]

An example of the first phase of parallel shellsort on an eight-process array.
Parallel Shellsort

• Each process performs d = log p compare-split operations.

• With O(p) bisection width, each communication can be performed in time Θ(n/p), for a total time of Θ((n log p)/p).

• In the second phase, l odd and even phases are performed, each requiring time Θ(n/p).

• The parallel run time of the algorithm is:

TP = Θ((n/p) log (n/p)) + Θ((n/p) log p) + Θ(l (n/p)),   (1)

where the first term is the local sort, the second the first phase, and the third the second phase.
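The first-phase pairing pattern sketched below is one common reading of the description above (a simulated sketch; assumes p is a power of two and blocks are already locally sorted):

def shellsort_phase1(blocks):
    # In step k the p processes form 2**k contiguous groups; each process
    # compare-splits with its mirror image within its group, so the most
    # distant blocks are exchanged first. log p steps in total.
    p = len(blocks)
    group = p
    while group >= 2:
        for base in range(0, p, group):
            for i in range(group // 2):
                lo, hi = base + i, base + group - 1 - i
                blocks[lo], blocks[hi] = compare_split(blocks[lo], blocks[hi])
        group //= 2
    return blocks

After this phase the blocks are close to their final positions, and a few odd-even phases (as above) complete the sort.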
Quicksort

• Quicksort is one of the most common sorting algorithms for sequential computers because of its simplicity, low overhead, and optimal average complexity.

• Quicksort selects one of the entries in the sequence to be the pivot and divides the sequence into two parts – one with all elements less than the pivot and the other with all elements greater.

• The process is recursively applied to each of the sublists.


Quicksort
(a) 3 2 1 5 8 4 3 7   (pivot: 3)
(b) 1 2 3 5 8 4 3 7   (pivot now in its final position)
(c) 1 2 3 3 4 5 8 7
(d) 1 2 3 3 4 5 7 8
(e) 1 2 3 3 4 5 7 8   (sorted)

Example of the quicksort algorithm sorting a sequence of size n = 8.
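For reference, a minimal runnable Python version of the sequential algorithm (list-based rather than in-place, for clarity):

def quicksort(a):
    # Pick a pivot, partition into elements <= pivot and > pivot, recurse.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    less = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)

print(quicksort([3, 2, 1, 5, 8, 4, 3, 7]))  # [1, 2, 3, 3, 4, 5, 7, 8]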
Quicksort

• The performance of quicksort depends critically on the quality of the pivot.

• In the best case, the pivot divides the list in such a way that the larger of the two lists does not have more than αn elements (for some constant α).

• In this case, the complexity of quicksort is O(n log n).


Parallelizing Quicksort: Shared Address Space
Formulation

• Consider a list of size n equally divided across p processors.

• A pivot is selected by one of the processors and made known to all processors.

• Each processor partitions its list into two, say Si and Li, based on the selected pivot.

• All of the Si lists are merged and all of the Li lists are merged separately.

• The set of processors is partitioned into two (in proportion to the sizes of lists S and L). The process is recursively applied to each of the lists.
Shared Address Space Formulation
[Figure: shared-address-space quicksort on p = 5 processes, four elements each, sorting 20 elements.
First step: pivot = 7; after local rearrangement within each process and global rearrangement, the array becomes 7 2 1 6 3 4 5 | 18 13 17 14 20 10 15 9 19 16 12 11 8, and the processes are partitioned between the two parts.
Second step: pivots 5 and 17 are selected in the two parts; after local and global rearrangement the array becomes 1 2 3 4 5 | 7 6 | 14 13 17 10 15 9 16 12 11 8 | 18 20 19.
Third step: pivot = 11 is selected for the part held by P2 and P3, and that part is rearranged.
Fourth step: a final local rearrangement of each process’s sublist yields the solution 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20.]
Parallelizing Quicksort: Shared Address Space
Formulation

• How to globally merge the local lists (S0, L0, S1, L1, . . .) to form S and L?

• Each processor needs to determine the right location for its elements in the merged list.

• Each processor first counts the number of elements locally less than and greater than the pivot.

• It then computes two sum-scans to determine the starting location for its elements in the merged S and L lists.

• Once it knows the starting locations, it can write its elements safely.
Parallelizing Quicksort: Shared Address Space
Formulation
[Figure: efficient global rearrangement of the array for the first step of the example (pivot = 7). After local rearrangement the array is 7 2 18 13 | 1 17 14 20 | 6 10 15 9 | 3 4 19 16 | 5 12 11 8, with |Si| = 2, 1, 1, 2, 1 and |Li| = 2, 3, 3, 2, 3. Prefix sums of these counts (0 2 3 4 6 and 0 2 5 8 10) give each process the starting index at which to write its Si and Li elements, producing the globally rearranged array 7 2 1 6 3 4 5 18 13 17 14 20 10 15 9 19 16 12 11 8.]

Efficient global rearrangement of the array.
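A sequential simulation of this scan-based rearrangement (a sketch; the function name is mine), reproducing the step shown above:

def global_rearrange(blocks, pivot):
    # Each process counts its elements <= pivot (Si) and > pivot (Li);
    # exclusive prefix sums of the counts give each process its write
    # offsets into the S and L regions of the shared array.
    p = len(blocks)
    S = [[x for x in b if x <= pivot] for b in blocks]
    L = [[x for x in b if x > pivot] for b in blocks]
    s_off, l_off, s, l = [], [], 0, 0
    for i in range(p):                       # exclusive sum-scans
        s_off.append(s); l_off.append(l)
        s += len(S[i]); l += len(L[i])
    out = [None] * (s + l)                   # s = |S| = start of the L region
    for i in range(p):                       # every process writes independently
        out[s_off[i]:s_off[i] + len(S[i])] = S[i]
        out[s + l_off[i]:s + l_off[i] + len(L[i])] = L[i]
    return out

arr = [[7, 2, 18, 13], [1, 17, 14, 20], [6, 10, 15, 9], [3, 4, 19, 16], [5, 12, 11, 8]]
print(global_rearrange(arr, 7))
# [7, 2, 1, 6, 3, 4, 5, 18, 13, 17, 14, 20, 10, 15, 9, 19, 16, 12, 11, 8]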


Parallelizing Quicksort: Shared Address Space
Formulation

• The parallel time depends on the split and merge time, and the
quality of the pivot.

• The latter is an issue independent of parallelism, so we focus on the first aspect, assuming ideal pivot selection.

• The algorithm executes in four steps: (i) determine and broadcast the pivot; (ii) locally rearrange the array assigned to each process; (iii) determine the locations in the globally rearranged array that the local elements will go to; and (iv) perform the global rearrangement.

• The first step takes time Θ(log p), the second, Θ(n/p), the third, Θ(log p), and the fourth, Θ(n/p).

• The overall complexity of splitting an n-element array is Θ(n/p) + Θ(log p).
Parallelizing Quicksort: Shared Address Space
Formulation

• The process recurses until there are p lists, at which point, the
lists are sorted locally.

• Therefore, the total parallel time is:

TP = Θ((n/p) log (n/p)) + Θ((n/p) log p) + Θ(log² p),   (2)

where the first term is the local sort and the second the array splits.

• The corresponding isoefficiency is Θ(p log² p) due to broadcast and scan operations.
Parallelizing Quicksort: Message Passing Formulation

• A simple message passing formulation is based on the recursive halving of the machine.

• Assume that each processor in the lower half of a p-processor ensemble is paired with a corresponding processor in the upper half.

• A designated processor selects and broadcasts the pivot.

• Each processor splits its local list into two lists, one with elements less than the pivot (Si) and the other with elements greater (Li).

• A processor in the low half of the machine sends its list Li to the
paired processor in the other half. The paired processor sends
its list Si.

• It is easy to see that after this step, all elements less than the
pivot are in the low half of the machine and all elements
greater than the pivot are in the high half.
Parallelizing Quicksort: Message Passing Formulation

• The above process is recursed until each processor has its own
local list, which is sorted locally.

• The time for a single reorganization is Θ(log p) for broadcasting the pivot element, Θ(n/p) for splitting the locally assigned portion of the array, and Θ(n/p) for the exchange and local reorganization.

• We note that this time is identical to that of the corresponding shared address space formulation.

• It is important to remember that the reorganization of elements is a bandwidth-sensitive operation.
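A sequential simulation of the recursive-halving scheme described above (a sketch; list indices stand in for process labels, and the number of processes is assumed to be a power of two):

def mp_quicksort(blocks):
    # All processes in the group use one pivot. A process in the low half
    # keeps its Si and receives its partner's Sj; the partner in the high
    # half keeps Lj and receives Li. Then recurse on each half.
    p = len(blocks)
    if p == 1:
        return [sorted(blocks[0])]               # final local sort
    pivot = blocks[0][0] if blocks[0] else 0     # designated process broadcasts the pivot
    half = p // 2
    new = [None] * p
    for i in range(half):
        j = i + half                             # partner in the upper half
        Si = [x for x in blocks[i] if x <= pivot]
        Li = [x for x in blocks[i] if x > pivot]
        Sj = [x for x in blocks[j] if x <= pivot]
        Lj = [x for x in blocks[j] if x > pivot]
        new[i] = Si + Sj                         # low half collects the small elements
        new[j] = Li + Lj                         # high half collects the large elements
    return mp_quicksort(new[:half]) + mp_quicksort(new[half:])

print(mp_quicksort([[7, 13, 2, 17], [1, 14, 20, 6], [10, 15, 9, 3], [16, 4, 11, 5]]))
# [[1, 2, 3, 4, 5, 6, 7], [], [9, 10, 11, 13], [14, 15, 17, 20]]

Note how the naive pivot choice leaves one process with an empty list: exactly the load-imbalance issue that makes pivot quality matter.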
