Introduction to Parallel Programming
Section 10. Parallel Methods for Sorting
Problem Statement
Parallelizing Techniques
Bubble Sort
Shell Sort
Parallel Quick Sort
HyperQuick Sort
Sorting by Regular Sampling
Summary
Nizhny Novgorod, 2005 Introduction to Parallel Programming: Parallel Methods for Sorting
© Gergel V.P.
Problem Statement
Sorting is the reordering of the elements of a given set
$S = \{a_1, a_2, \ldots, a_n\}$
in monotonically increasing or decreasing order.
Parallelizing Techniques
The basic operation is "compare-exchange":
  // the basic sorting operation
  if ( A[i] > A[j] ) {
    temp = A[i];
    A[i] = A[j];
    A[j] = temp;
  }
Parallel generalization of the basic operation when p < n
(each processor holds a data block of size n/p):
− Sort the block on each processor at the beginning of sorting,
− Exchange the blocks between the neighboring processors $P_i$ and $P_{i+1}$,
− Merge the blocks $A_i$ and $A_{i+1}$ on each processor into a single sorted block,
− Split the resulting double block into two equal parts and keep one part (for instance, the one with the smaller values) on processor $P_i$ and the other part (with the greater values) on processor $P_{i+1}$:
$[A_i \cup A_{i+1}]_{\text{sorted}} = A_i' \cup A_{i+1}', \quad \forall a_i' \in A_i',\ \forall a_j' \in A_{i+1}' \Rightarrow a_i' \le a_j'$
This procedure is usually referred to as the "compare-split" operation.
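The compare-split steps above can be sketched in C++ as follows. This is a minimal illustration, not the course's implementation: the name `compareSplit` is hypothetical, and the block exchange between two processors is modeled by passing both blocks to one function, whereas a real program would exchange them by message passing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Compare-split of two blocks that are already sorted locally.
// After the call, `lower` keeps its original size and holds the smaller
// values, `upper` holds the greater values, and both are sorted.
// (Hypothetical helper: both blocks live in one address space here.)
void compareSplit(std::vector<double>& lower, std::vector<double>& upper) {
    assert(std::is_sorted(lower.begin(), lower.end()));
    assert(std::is_sorted(upper.begin(), upper.end()));

    // Merge the two sorted blocks into one sorted "double block".
    std::vector<double> merged;
    merged.reserve(lower.size() + upper.size());
    std::merge(lower.begin(), lower.end(), upper.begin(), upper.end(),
               std::back_inserter(merged));

    // Split the double block at the size of the lower block.
    const auto cut = merged.begin() + static_cast<std::ptrdiff_t>(lower.size());
    lower.assign(merged.begin(), cut);
    upper.assign(cut, merged.end());
}
```

For example, applying it to the blocks (11 50 53 95) and (1 16 35 81) leaves (1 11 16 35) in the first block and (50 53 81 95) in the second.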
Bubble Sort: Sequential Algorithm
Bubble sort: Odd-even transposition algorithm
Suitability for parallelization
// Parallel odd-even transposition algorithm (executed by every process).
// The local block is assumed to be sorted before the first iteration.
void ParallelOddEvenSort ( double A[], int n ) {
  int id = GetProcId();  // process number
  int np = GetProcNum(); // number of processes
  for ( int i=0; i<np; i++ ) {
    if ( id%2 == i%2 ) {   // process number has the parity of the iteration
      if ( id+1 < np )     // a right neighbor exists
        compare_split_min(id+1); // compare-split with the right neighbor, keep the smaller half
    }
    else {
      if ( id-1 >= 0 )     // a left neighbor exists
        compare_split_max(id-1); // compare-split with the left neighbor, keep the greater half
    }
  }
}
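To see the pairing pattern in isolation, the following sketch simulates odd-even transposition sequentially for the special case of one element per process (p = n), where compare-split reduces to compare-exchange. The function name is hypothetical, and the inner loop stands in for the comparisons that the processes would perform simultaneously.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sequential simulation of odd-even transposition with one element per
// "process". Even iterations compare the pairs (0,1), (2,3), ...;
// odd iterations compare the pairs (1,2), (3,4), ...
// All pairs of one iteration are independent of each other, which is what
// makes the iteration suitable for parallel execution.
void oddEvenTranspositionSort(std::vector<double>& a) {
    const std::size_t n = a.size();
    for (std::size_t iter = 0; iter < n; ++iter) {
        for (std::size_t left = iter % 2; left + 1 < n; left += 2) {
            if (a[left] > a[left + 1]) {
                std::swap(a[left], a[left + 1]);  // compare-exchange
            }
        }
    }
}
```

The outer loop runs n times, matching the `np` iterations of the parallel version when each process holds a single element.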
Efficiency analysis
– The general estimates of speedup and efficiency:
$$S_p = \frac{n \log_2 n}{(n/p)\log_2(n/p) + 2n}, \qquad E_p = \frac{n \log_2 n}{p\,\bigl((n/p)\log_2(n/p) + 2n\bigr)}$$
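As a quick way to explore these estimates numerically, here is a small sketch that evaluates both formulas. The struct and function names are hypothetical; the code only restates the two expressions above and uses the fact that the efficiency equals the speedup divided by p.

```cpp
#include <cmath>

// Speedup and efficiency estimates of the parallel odd-even algorithm.
struct Estimate {
    double speedup;     // S_p
    double efficiency;  // E_p = S_p / p
};

// n: number of elements, p: number of processors (both taken as doubles
// so that n/p and the logarithms are computed in floating point).
Estimate oddEvenEstimate(double n, double p) {
    const double sequential = n * std::log2(n);                    // n log2 n
    const double parallel = (n / p) * std::log2(n / p) + 2.0 * n;  // denominator of S_p
    const double speedup = sequential / parallel;
    return {speedup, speedup / p};
}
```

For n = 1024 and p = 4 the numerator is 1024 · 10 = 10240 and the denominator is 256 · 8 + 2048 = 4096, so the estimate gives a speedup of 2.5 and an efficiency of 0.625.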
[Figure: odd-even transposition sort, execution time versus number of elements (10,000 to 50,000), experiment compared with the model]
[Figure: odd-even transposition sort, speedup versus number of processes (2 and 4) for 10,000 to 50,000 elements]
Shell sort: Sequential algorithm
The general idea of Shell sort is to compare, at the initial stages of sorting, pairs of values located rather far from each other in the set being ordered (sorting such pairs usually requires a large number of exchanges if only neighboring elements are compared):
– At the first step of the algorithm, the elements in the n/2 pairs $(a_i, a_{n/2+i})$, $1 \le i \le n/2$, are sorted,
– At the second step, the elements in the n/4 groups of four elements $(a_i, a_{n/4+i}, a_{n/2+i}, a_{3n/4+i})$, $1 \le i \le n/4$, are sorted, etc.,
– At the last step, the elements of the whole array $(a_1, a_2, \ldots, a_n)$ are sorted.
The total number of iterations of the Shell algorithm is $\log_2 n$.
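The steps above can be sketched as follows. This is a minimal version under two assumptions that the text does not state: the distance between grouped elements is halved at each step (n/2, n/4, ..., 1), and each group is ordered by insertion sort. The function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Shell sort with the gap sequence n/2, n/4, ..., 1.
// For a given gap, the elements a[i], a[i+gap], a[i+2*gap], ... form one
// group; every group is ordered by a gapped insertion sort.
void shellSort(std::vector<double>& a) {
    const std::size_t n = a.size();
    for (std::size_t gap = n / 2; gap > 0; gap /= 2) {
        for (std::size_t i = gap; i < n; ++i) {
            const double value = a[i];
            std::size_t j = i;
            // Shift larger members of the same group to the right.
            while (j >= gap && a[j - gap] > value) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = value;
        }
    }
}
```

With gap = 1 the last pass is an ordinary insertion sort over the whole array, which by then is already nearly ordered.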
Shell sort: Parallel algorithm
Example (4 processors numbered 00, 01, 10, 11; iteration 1 pairs 00 with 10 and 01 with 11, iteration 2 pairs 00 with 01 and 10 with 11):

Processor   Before iteration 1   Before iteration 2   On completion of iteration 2
00          11 50 53 95          1 11 16 35           1 5 11 15
01          36 44 67 86          5 15 23 36           16 23 35 36
10          1 16 35 81           50 53 81 95          44 44 50 53
11          5 15 23 44           44 44 67 86          67 81 86 95
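The example can be reproduced with a short sequential simulation. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the processors are modeled as entries of one vector, and only the hypercube compare-split iterations shown in the example are covered. In general these iterations alone do not guarantee a fully sorted result, so a complete algorithm needs a further stage that is not part of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using Block = std::vector<int>;

// Simulates the hypercube iterations for p = 2^dim processors.
// At each iteration, processors whose numbers differ only in the current
// bit (from the highest bit down to bit 0) perform a compare-split.
void hypercubeCompareSplitStage(std::vector<Block>& blocks, int dim) {
    assert(blocks.size() == (std::size_t{1} << dim));
    for (Block& b : blocks) std::sort(b.begin(), b.end());  // local sort first

    for (int bit = dim - 1; bit >= 0; --bit) {
        const std::size_t mask = std::size_t{1} << bit;
        for (std::size_t id = 0; id < blocks.size(); ++id) {
            if (id & mask) continue;  // visit each pair once, from its lower member
            Block& lower = blocks[id];
            Block& upper = blocks[id | mask];
            Block merged;
            std::merge(lower.begin(), lower.end(), upper.begin(), upper.end(),
                       std::back_inserter(merged));
            const auto cut = merged.begin() + static_cast<std::ptrdiff_t>(lower.size());
            lower.assign(merged.begin(), cut);   // smaller half stays on the lower number
            upper.assign(cut, merged.end());     // greater half goes to the higher number
        }
    }
}
```

Called with the four initial blocks of the example and `dim = 2`, it ends with (1 5 11 15), (16 23 35 36), (44 44 50 53), (67 81 86 95).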
Efficiency analysis
– The general estimates of speedup and efficiency:
$$S_p = \frac{n \log_2 n}{(n/p)\log_2(n/p) + 2n}, \qquad E_p = \frac{n \log_2 n}{p\,\bigl((n/p)\log_2(n/p) + 2n\bigr)}$$
[Figure: parallel Shell sort, execution time versus number of elements (10,000 to 50,000), experiment compared with the model]
Results of computational experiments: speedup

Number of   Sequential   2 processors          4 processors
elements    algorithm    Time       Speedup    Time       Speedup
10,000      0.001422     0.002959   0.480568   0.007509   0.189373
20,000      0.002991     0.004557   0.656353   0.009826   0.304396
30,000      0.004612     0.006118   0.753841   0.012431   0.371008
40,000      0.006297     0.008461   0.744238   0.017009   0.370216
50,000      0.008014     0.009920   0.807863   0.019419   0.412689
[Figure: parallel Shell sort, speedup versus number of processors (2 and 4) for 10,000 to 50,000 elements]
Quick sort: Sequential algorithm
The quick sort algorithm proposed by C.A.R. Hoare is based on successively partitioning the data set being sorted into blocks of smaller size so that the values of different blocks are ordered with respect to each other (for any pair of blocks, no value of one block exceeds any value of the other):
– At the first iteration, the initial data set is split into two parts; a pivot element is selected for this purpose, all the values that are less than the pivot element are moved to the first block, and all the other values form the second block,
– At the second iteration, the same rules are applied recursively to both blocks, etc.
If the pivot elements are chosen optimally, the initial data array is sorted after $\log_2 n$ iterations.
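// Sequential algorithm of quick sort.
// Sorts the half-open range A[i1], ..., A[i2-1].
#include <utility> // std::swap
void QuickSort(double A[], int i1, int i2) {
  if ( i1 < i2 ) {
    double pivot = A[i1];      // the first element serves as the pivot
    int is = i1;               // last position holding a value <= pivot
    for ( int i = i1+1; i < i2; i++ )
      if ( A[i] <= pivot ) {
        is = is + 1;
        std::swap(A[is], A[i]);
      }
    std::swap(A[i1], A[is]);   // place the pivot between the two blocks
    QuickSort(A, i1, is);      // values not exceeding the pivot
    QuickSort(A, is+1, i2);    // values greater than the pivot
  }
}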
Quick sort: Parallel algorithm…
− Form pairs of processors whose numbers differ only in bit position N of their binary representations, and exchange data within each pair; as a result of this communication, the parts of the blocks with values less than the pivot element must end up on the processors whose numbers have bit N equal to 0, while the processors whose numbers have bit N equal to 1 must gather all the values that exceed the pivot element,
− Pass on to the subhypercubes of smaller dimension and repeat the procedure described above.
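The pairing and exchange steps can be illustrated with a sequential simulation. The sketch rests on several assumptions that go beyond the text: the pivot of a subhypercube is taken as the first element of its lowest-numbered non-empty block, values equal to the pivot are kept on the bit-0 side, and each block is sorted locally after the last exchange. All names are hypothetical, and the processors are modeled as entries of one vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Block = std::vector<double>;

// Simulates quick sort on a hypercube of p = 2^dim processors.
void hypercubeQuickSort(std::vector<Block>& blocks, int dim) {
    const std::size_t p = std::size_t{1} << dim;
    assert(blocks.size() == p);

    for (int bit = dim - 1; bit >= 0; --bit) {
        const std::size_t subSize = std::size_t{1} << (bit + 1);  // processors per subhypercube
        for (std::size_t base = 0; base < p; base += subSize) {
            // Pivot selection (assumed rule, see the lead-in).
            const double* found = nullptr;
            for (std::size_t id = base; id < base + subSize && !found; ++id)
                if (!blocks[id].empty()) found = &blocks[id].front();
            if (!found) continue;  // nothing to split in this subhypercube
            const double pivot = *found;

            // Pair processors that differ only in the current bit and exchange parts.
            for (std::size_t lo = base; lo < base + subSize / 2; ++lo) {
                const std::size_t hi = lo | (std::size_t{1} << bit);
                Block small, large;
                for (Block* b : {&blocks[lo], &blocks[hi]})
                    for (double v : *b) (v <= pivot ? small : large).push_back(v);
                blocks[lo] = std::move(small);  // bit = 0: values not exceeding the pivot
                blocks[hi] = std::move(large);  // bit = 1: values exceeding the pivot
            }
        }
    }
    // Assumed final step: every processor sorts the block it ended up with.
    for (Block& b : blocks) std::sort(b.begin(), b.end());
}
```

After the call, reading the blocks in the order of processor numbers yields the sorted sequence. The block sizes may become unequal, since they depend on how well the pivots divide the data.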
Efficiency analysis
– The general estimates of speedup and efficiency:
$$S_p = \frac{n \log_2 n}{(n/p)\log_2(n/p) + \log_2 p \cdot (2n/p)},$$
$$E_p = \frac{n \log_2 n}{p\,\bigl((n/p)\log_2(n/p) + \log_2 p \cdot (2n/p)\bigr)}$$
Results of computational experiments: comparison of the theoretical estimates ($T_p$, model) with the experimental data ($T_p^*$)

Data size   2 processors           4 processors
            $T_2$      $T_2^*$     $T_4$      $T_4^*$
10,000      0.001280   0.001521    0.001735   0.003434
20,000      0.002265   0.002234    0.002321   0.004094
30,000      0.003289   0.003080    0.002928   0.005088
40,000      0.004338   0.004363    0.003547   0.005906
50,000      0.005407   0.005486    0.004175   0.006635
[Figure: parallel quick sort, execution time versus number of elements (10,000 to 50,000), experiment compared with the model]
Results of computational experiments: speedup

Number of   Sequential   2 processors          4 processors
elements    algorithm    Time       Speedup    Time       Speedup
10,000      0.001422     0.001521   0.934911   0.003434   0.414094
20,000      0.002991     0.002234   1.338854   0.004094   0.730581
30,000      0.004612     0.003080   1.497403   0.005088   0.906447
40,000      0.006297     0.004363   1.443273   0.005906   1.066204
50,000      0.008014     0.005486   1.460809   0.006635   1.207837
[Figure: parallel quick sort, speedup versus number of processes (2 and 4) for 10,000 to 50,000 elements]
HyperQuick sort: Parallel algorithm…
Efficiency analysis
– The general estimates of speedup and efficiency:
$$S_p = \frac{n \log_2 n}{(n/p)\log_2(n/p) + \log_2 p \cdot (2n/p)},$$
$$E_p = \frac{n \log_2 n}{p\,\bigl((n/p)\log_2(n/p) + \log_2 p \cdot (2n/p)\bigr)}$$
Results of computational experiments: comparison of the theoretical estimates ($T_p$, model) with the experimental data ($T_p^*$)

Data size   2 processors           4 processors
            $T_2$      $T_2^*$     $T_4$      $T_4^*$
10,000      0.001281   0.001485    0.001735   0.002898
20,000      0.002265   0.002180    0.002322   0.003770
30,000      0.003289   0.003077    0.002928   0.004451
40,000      0.004338   0.003859    0.003547   0.004721
50,000      0.005407   0.005041    0.004176   0.005242
[Figure: HyperQuick sort, execution time versus number of elements (10,000 to 50,000), experiment compared with the model]
[Figure: HyperQuick sort, speedup versus number of processors (2 and 4) for 10,000 to 50,000 elements]
Sorting by Regular Sampling: Parallel Algorithm
The first stage: the blocks are sorted on each processor independently by means of the conventional quick sort algorithm; each processor then forms a set of the elements of its block with the indices 0, m, 2m, …, (p−1)m, where $m = n/p^2$.
The second stage: all the sets formed on the processors are gathered on one of the processors and combined into one sorted set by sequential merging; the elements of the resulting set with the indices
$p + \lfloor p/2 \rfloor - 1,\ 2p + \lfloor p/2 \rfloor - 1,\ \ldots,\ (p-1)p + \lfloor p/2 \rfloor - 1$
form the set of pivot elements, which is transmitted to all the processors; at the end of this stage, each processor splits its block into p parts using the obtained set of pivot values.
Example (p = 3):
Stage 1 (initial blocks -> blocks after local sorting)
1: 15 46 48 93 39 6 72 91 14   ->  6 14 15 39 46 48 72 91 93
2: 36 69 40 89 61 97 12 21 54  ->  12 21 36 40 54 61 69 89 97
3: 53 97 84 58 32 27 33 72 20  ->  20 27 32 33 53 58 72 84 97
Stage 2 (samples of the three blocks -> merged set of samples)
6 39 72 | 12 40 69 | 20 33 72  ->  6 12 20 33 39 40 69 72 72
Stage 3 (pivots: 33 69; sorted blocks -> blocks after the exchange of parts)
1: 6 14 15 39 46 48 72 91 93   ->  6 14 15 12 21 20 27 32 33
2: 12 21 36 40 54 61 69 89 97  ->  39 46 48 36 40 54 61 69 53 58
3: 20 27 32 33 53 58 72 84 97  ->  72 91 93 89 97 72 84 97
Stage 4 (blocks after merging the received parts)
1: 6 12 14 15 20 21 27 32 33
2: 36 39 40 46 48 53 54 58 61 69
3: 72 72 84 89 91 93 97 97
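The four stages of the example can be traced with a short sequential simulation. This is a sketch under assumptions: the function name is hypothetical, stages 3 and 4 are inferred from the worked example above (part j of every block is sent to processor j, which then combines the received parts), values equal to a pivot stay in the lower part, and `std::sort` replaces the merge operations for brevity. All blocks are assumed to have the same size n/p with n divisible by p².

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Block = std::vector<int>;

// Simulates sorting by regular sampling; blocks[i] is the block of processor i.
std::vector<Block> regularSamplingSort(std::vector<Block> blocks) {
    const std::size_t p = blocks.size();
    assert(p > 0);

    // Stage 1: local sort, then samples at the indices 0, m, 2m, ..., (p-1)m.
    std::vector<int> samples;
    for (Block& b : blocks) {
        std::sort(b.begin(), b.end());
        const std::size_t m = b.size() / p;  // equals n / p^2 for equal blocks
        assert(m > 0);
        for (std::size_t k = 0; k < p; ++k) samples.push_back(b[k * m]);
    }

    // Stage 2: gather and order the samples, pick the pivots at the
    // indices k*p + floor(p/2) - 1 for k = 1, ..., p-1.
    std::sort(samples.begin(), samples.end());
    std::vector<int> pivots;
    for (std::size_t k = 1; k < p; ++k) pivots.push_back(samples[k * p + p / 2 - 1]);

    // Stage 3: every block is cut into p parts by the pivots;
    // part j is delivered to processor j.
    std::vector<Block> result(p);
    for (const Block& b : blocks) {
        auto first = b.begin();
        for (std::size_t j = 0; j < p; ++j) {
            auto last = (j + 1 < p) ? std::upper_bound(first, b.end(), pivots[j]) : b.end();
            result[j].insert(result[j].end(), first, last);
            first = last;
        }
    }

    // Stage 4: each processor orders the parts it received.
    for (Block& r : result) std::sort(r.begin(), r.end());
    return result;
}
```

Run on the three initial blocks of the example, the function returns the three blocks listed under Stage 4.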
Efficiency analysis (detailed estimates):
- The execution time of the first stage of the parallel algorithm:
$$T_p^1 = (n/p)\log_2(n/p)\,\tau,$$
- The execution time of the second stage:
$$T_p^2 = [\alpha \log_2 p + w p (p-1)/\beta] + [p^2 \log_2 p\,\tau] + [p\,\tau] + [\log_2 p\,(\alpha + w p/\beta)],$$
- The execution time of the third stage:
$$T_p^3 = (n/p)\,\tau + \log_2 p\,(\alpha + w (n/2p)/\beta),$$
- The execution time of the fourth stage:
$$T_p^4 = (n/p)\log_2 p\,\tau.$$
Total execution time of the parallel algorithm:
$$T_p = (n/p)\log_2(n/p)\,\tau + (\alpha \log_2 p + w p (p-1)/\beta) + p^2 \log_2 p\,\tau + (n/p)\,\tau + \log_2 p\,(\alpha + w p/\beta) + p\,\tau + \log_2 p\,(\alpha + w (n/2p)/\beta) + (n/p)\log_2 p\,\tau$$
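The total time is simply the sum of the four stage times, which the following sketch makes explicit. The names are hypothetical, and the meanings attached to the parameters in the comments are assumptions, since this section does not define τ, α, β and w.

```cpp
#include <cmath>

// Parameters of the time model (meanings assumed, not defined in this section).
struct ModelParams {
    double tau;    // time of one basic sorting operation
    double alpha;  // communication latency
    double beta;   // network bandwidth
    double w;      // size of one data element
};

// Estimated execution time T_p of sorting by regular sampling.
double regularSamplingTime(double n, double p, const ModelParams& m) {
    const double lg = std::log2(p);
    const double t1 = (n / p) * std::log2(n / p) * m.tau;                  // stage 1
    const double t2 = (m.alpha * lg + m.w * p * (p - 1.0) / m.beta)        // stage 2
                    + p * p * lg * m.tau
                    + p * m.tau
                    + lg * (m.alpha + m.w * p / m.beta);
    const double t3 = (n / p) * m.tau
                    + lg * (m.alpha + m.w * (n / (2.0 * p)) / m.beta);     // stage 3
    const double t4 = (n / p) * lg * m.tau;                                // stage 4
    return t1 + t2 + t3 + t4;
}
```

As a sanity check of the computational part alone, setting α = 0 and w = 0 with τ = 1, n = 1024 and p = 4 gives 2048 + 36 + 256 + 512 = 2852.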
Results of computational experiments: comparison of the theoretical estimates with the experimental data
[Figure: sorting by regular sampling, execution time versus number of elements (10,000 to 50,000), experiment compared with the model]
Results of computational experiments: speedup

Number of   Sequential   2 processors          4 processors
elements    algorithm    Time       Speedup    Time       Speedup
10,000      0.001422     0.001513   0.939855   0.001166   1.219554
20,000      0.002991     0.002307   1.296489   0.002081   1.437290
30,000      0.004612     0.003168   1.455808   0.003099   1.488222
40,000      0.006297     0.004542   1.386394   0.003819   1.648861
50,000      0.008014     0.005503   1.456297   0.004370   1.833867
[Figure: sorting by regular sampling, speedup]
Summary
The following methods of parallel data sorting are described:
– Bubble sort,
– Shell sort,
– Quick sort.
Two additional variants of the quick sort algorithm are described:
– HyperQuick sort,
– Sorting by regular sampling.
A software implementation of HyperQuick sort is presented.
The order in which the parallel sorting methods are presented can be regarded as an example of step-by-step modification of parallel methods aimed at improving their speedup and efficiency characteristics.
About the project
The purpose of the project is to develop a set of educational materials for the teaching course "Multiprocessor computational systems and parallel programming". The course covers the parallel computation topics recommended by the IEEE-CS and ACM Computing Curricula 2001. The educational materials can be used for teaching and training specialists in the fields of informatics, computer engineering and information technologies. The curriculum consists of the training course "Introduction to the methods of parallel programming" and the computer laboratory training "The methods and technologies of parallel program development". These educational materials make it possible to combine fundamental education in computer science with practical training in developing software for solving complicated, time-consuming computational problems on high-performance computational systems.
The project was carried out at Nizhny Novgorod State University, in the Software Department of the Computing Mathematics and Cybernetics Faculty (http://www.software.unn.ac.ru). The project was implemented with the support of Microsoft Corporation.