8-Analysis of Multithreaded Algorithms-CS4402
8-Analysis of Multithreaded Algorithms-CS4402
CS4402-9535
Plan
Divide-and-Conquer Recurrences
Matrix Multiplication
Merge Sort
Tableau Construction
Plan
Divide-and-Conquer Recurrences
Matrix Multiplication
Merge Sort
Tableau Construction
Orders of magnitude
Let f , g et h be functions from N to R.
I We say that g (n) is in the order of magnitude of f (n) and
we write f (n) ∈ Θ(g (n)) if there exist two strictly positive
constants c1 and c2 such that for n big enough we have
Indeed we have
1
(f (n) + g (n)) ≤ max(f (n), g (n)) ≤ (f (n) + g (n)). (6)
2
I Assume a and b are positive real constants. Then we have
Divide-and-Conquer Recurrences
Matrix Multiplication
Merge Sort
Tableau Construction
Divide-and-Conquer Algorithms
Divide-and-conquer algorithms proceed as follows.
Divide the input problem into sub-problems.
Conquer on the sub-problems by solving them directly if they
are small enough or proceed recursively.
Combine the solutions of the sub-problems to obtain the
solution of the input problem.
Equation satisfied by T (n). Assume that the size of the input
problem increases with an integer n. Let T (n) be the time
complexity of a divide-and-conquer algorithm to solve this problem.
Then T (n) satisfies an equation of the form:
T(n)
f(n)
a
T(n) T(n/b)
T( /b) T(
T(n/b)
/b) … T( /b)
T(n/b)
f(n)
a
f( /b) f(
f(n/b) f(n/b)
/b) … f(n/b)
f( /b)
a
f(n) /b22)) … T(n/b
a T( /b22)) T(n/b
T(n/b
f( /b
f(n/b T(
f( /b
f(n/b T(
f( /b
f(n/b
/b22))
T( /b) T(
T(n/b)
f( /b)
f(n/b) T(n/b)
f( /b)
f(n/b)
/b) … T(n/b)
T(
f( /b)
f(n/b)
/b)
a
T(n/b T( /b2) … T(n/b
T( /b2) T(n/b T( /b2) T(1)
Solving divide-and-conquer recurrences (2/2)
f(n) f(n)
a
/b) f(
f(n/b)
f( f(n/b)
/b) … f(n/b)
f( /b) a f(n/b)
f( /b)
h = logbn a
f(n/b f( /b2) … f(n/b
f( /b2) f(n/b f( /b2) a2 f(n/b
f( /b2)
…
T(1) alogbn T(1)
= Θ(nlogba)
IDEA: Compare
C nlogba with
ith f(n)
f( ) .
Master Theorem: case nlogb a f (n)
f(n) f(n)
a
f(n/b) f(n/b) … f(n/b) a f(n/b)
h = logbn nlogaba ≫ f(n)
2) … f(n/b2)
f(n/b2) f(n/bGEOMETRICALLY a2 f(n/b2)
INCREASING
…
Specifically, f(n) = O(nlogba – ε)
Specifically
for some constant ε > 0 .
T(1) alogbn T(1)
= Θ(nlogba)
T(n) = Θ(nloggba)
Master Theorem: case f (n) ∈ Θ(nlogb a logk n)
f(n) f(n)
a
f(n/b) nf(n/b)
b ≈ f(n)
…
log a
ff(n/b)
(n/b) a f(n/b)
h = logbn a
ARITHMETICALLY
… f(n/b2)
f(n/b2) f(n/b2) INCREASING a2 f(n/b2)
Specifically,
p y, f(n)
( ) = Θ(n
( logbalg
gkn))
…
for some constant k ≥ 0.
T(1) alogbn T(1)
= Θ(nlogba)
T(n) = Θ(nlogbalgk+1n))
Master Theorem: case where f (n) nlogb a
f(n) f(n)
nlogba ≪ f(n)
a
f(n/b) f(n/b) … f(n/b)
GEOMETRICALLY a f(n/b)
h = logbn a
DECREASING
2) … f(n/b2)
f(n/b2) f(n/bSpecifically,
S ifi ll f(n)
f( ) = a2 f(n/b2)
Ω(nlogba + ε)
for some
constant ε > 0 .*
…
T(1) alogbn T(1)
= Θ(nlogba)
T(n) = Θ(f(n))
*and f(n) satisfies the regularity condition that
a f(n/b) ≤ c f(n) for some constant c < 1.
More examples
I Consider the relation:
We obtain:
n2 n2 n2 n2
T (n) = n2 + + + + · · · + p + n T (1). (15)
2 4 8 2
Hence we have:
T (n) ∈ Θ(n2 ). (16)
I Consider the relation:
We obtain:
T (n) ∈ Θ(log3 (n)n). (18)
Master Theorem when b = 2
Let a > 0 be an integer and let f , T : N −→ R+ be functions
such that
(i) f (2 n) ≥ 2 f (n) and f (n) ≥ n.
(ii) If n = 2p then T (n) ≤ a T (n/2) + f (n).
Then for n = 2p we have
(1) if a = 1 then
(2) if a = 2 then
(3) if a ≥ 3 then
2 log2 (a)−1
T (n) ≤ n − 1 f (n)+T (1) nlog2 (a) ∈ O(f (n) nlog2 (a)−
a−2
(21)
Master Theorem when b = 2
Indeed
T (2p ) ≤ aT p−1 ) + f (2p )
(2 p−2
a a T (2 ) + f (2p−1 ) + f (2p )
≤
= a2 T (2p−2 ) + a f (2p−1 ) + f (2p )
a2 a T (2p−3 ) + f (2p−2 ) + a f (2p−1 ) + f (2p )
≤
= a3 T (2p−3 ) + a2 f (2p−2 ) + a f (2p−1 ) + f (2p )
j=p−1 j
≤ ap T (s1) + σj=0 a f (2p−j )
(22)
Master Theorem when b = 2
Moreover
f (2p ) ≥ 2 f (2p−1 )
f (2p ) ≥ 22 f (2p−2 )
.. .. .. (23)
. . .
f (2p ) ≥ 2j f (2p−j )
Thus a j
j=p−1 j
Σj=0 a f (2p−j ) ≤ f (2p ) Σj=p−1
j=0 . (24)
2
Master Theorem when b = 2
Hence a j
p p
T (2 ) ≤ a T (1) + f (2 p
) Σj=p−1
j=0 . (25)
2
For a = 1 we obtain
1 j
T (2p ) ≤ T (1) + f (2p ) Σj=p−1
j=0 2
1
−1
= T (1) + f (2p ) 2p
1
−1
(26)
2
= T (1) + f (n) (2 − 2/n).
For a = 2 we obtain
T (2p ) ≤ 2p T (1) + f (2p ) p
(27)
= n T (1) + f (n) log2 (n).
Master Theorem cheat sheet
For a ≥ 1 and b > 1, consider again the equation
I We have:
I We have:
I T (n) = 4T (n/2) + n
I T (n) = 4T (n/2) + n2
I T (n) = 4T (n/2) + n3
Divide-and-Conquer Recurrences
Matrix Multiplication
Merge Sort
Tableau Construction
Matrix multiplication
= ·
c21 c22 ⋯ c2n a21 a22 ⋯ a2n b21 b22 ⋯ b2n
⋮ ⋮ ⋱ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮ ⋱ ⋮
cn11 cn22 ⋯ cnn an11 an22 ⋯ ann bn11 bn22 ⋯ bnn
C A B
We will study three approaches:
I a naive and iterative one
I a divide-and-conquer one
I a divide-and-conquer one with memory management
consideration
Naive iterative matrix multiplication
I Work: ?
I Span: ?
I Parallelism: ?
Naive iterative matrix multiplication
I Work: Θ(n3 )
I Span: Θ(n)
I Parallelism: Θ(n2 )
Matrix multiplication based on block decomposition
// C <- C + A * B
void MMult(T *C, T *A, T *B, int n, int size) {
T *D = new T[n*n];
//base case & partition matrices
cilk_spawn MMult(C11, A11, B11, n/2, size);
cilk_spawn MMult(C12, A11, B12, n/2, size);
cilk_spawn MMult(C22, A21, B12, n/2, size);
cilk_spawn MMult(C21, A21, B11, n/2, size);
cilk_spawn MMult(D11, A12, B21, n/2, size);
cilk_spawn MMult(D12, A12, B22, n/2, size);
cilk_spawn MMult(D22, A22, B22, n/2, size);
MMult(D21, A22, B21, n/2, size);
cilk_sync;
MAdd(C, D, n, size); // C += D;
delete[] D;
}
Divide-and-Conquer Recurrences
Matrix Multiplication
Merge Sort
Tableau Construction
Merging two sorted arrays
void Merge(T *C, T *A, T *B, int na, int nb) {
while (na>0 && nb>0) {
if (*A <= *B) {
*C++ = *A++; na--;
} else {
*C++ = *B++; nb--;
}
}
while (na>0) {
*C++ = *A++; na--;
}
while (nb>0) {
*C++ = *B++; nb--;
}
}
4 14 21 23
Merge sort
3 4 12 14 19 21 33 46
merge
3 12 19 46 4 14 21 33
merge
3 19 12 46 4 33 14 21
merge
g
19 3 12 46 33 4 21 14
Parallel merge sort with serial merge
I Work?
I Span?
Parallel merge sort with serial merge
0 ma = na/2 na
A ≤ A[ma] ≥ A[ma]
B ≤ A[ma] ≥ A[ma] na ≥ nb
0 mb-1 mb nb
I Recall PM1 (n) ≤ PM1 (αn) + PM1 ((1 − α)n) + Θ(lg n) for
some 1/4 ≤ α ≤ 3/4.
I To solve this hairy equation we use the substitution method.
I We assume there exist some constants a, b > 0 such that
PM1 (n) ≤ an − b lg n holds for all 1/4 ≤ α ≤ 3/4.
I After substitution, this hypothesis implies:
PM1 (n) ≤ an − b lg n − b lg n + Θ(lg n).
I We can pick b large enough such that we have
PM1 (n) ≤ an − b lg n for all 1/4 ≤ α ≤ 3/4 and all n > 1/
I Then pick a large enough to satisfy the base conditions.
I Finally we have PM1 (n) = Θ(n).
Parallel merge sort with parallel merge
I Work?
I Span?
Parallel merge sort with parallel merge
template <typename T>
void P_MergeSort(T *B, T *A, int n) {
if (n==1) {
B[0] = A[0];
} else {
T C[n];
cilk_spawn P_MergeSort(C, A, n/2);
P_MergeSort(C+n/2, A+n/2, n-n/2);
cilk_sync;
P_Merge(B, C, C+n/2, n/2, n-n/2);
}
}
I The work satisfies T1 (n) = 2T1 (n/2) + Θ(n) (as usual) and
we have T1 (n) = Θ(nlog(n)).
I The worst case critical-path length of the Merge-Sort now
satisfies
T∞ (n) = T∞ (n/2) + Θ(lg2 n) = Θ(lg3 n)
.
I The parallelism is now Θ(n lg n)/Θ(lg3 n) = Θ(n/ lg2 n).
Plan
Divide-and-Conquer Recurrences
Matrix Multiplication
Merge Sort
Tableau Construction
Tableau construction
00 01 02 03 04 05 06 07
10 11 12 13 14 15 16 17
20 21 22 23 24 25 26 27
30 31 32 33 34 35 36 37
40 41 42 43 44 45 46 47
50 51 52 53 54 55 56 57
60 61 62 63 64 65 66 67
70 71 72 73 74 75 76 77
n
Parallel code
I;
cilk_spawn II;
I II III;
;
cilk_sync;
n IV;
III IV