8-Analysis of Multithreaded Algorithms-CS4402

The document analyzes multithreaded algorithms and complexity notions. It discusses divide-and-conquer recurrences, matrix multiplication, merge sort, and tableau construction. It also covers the master theorem for solving divide-and-conquer recurrences and orders of magnitude for asymptotic analysis.


Analysis of Multithreaded Algorithms

Marc Moreno Maza

University of Western Ontario, London, Ontario (Canada)

CS4402-9535
Plan

Review of Complexity Notions

Divide-and-Conquer Recurrences

Matrix Multiplication

Merge Sort

Tableau Construction
Orders of magnitude
Let $f$, $g$ and $h$ be functions from $\mathbb{N}$ to $\mathbb{R}$.
• We say that $g(n)$ is in the order of magnitude of $f(n)$, and we write $f(n) \in \Theta(g(n))$, if there exist two strictly positive constants $c_1$ and $c_2$ such that for $n$ big enough we have

$$0 \le c_1\, g(n) \le f(n) \le c_2\, g(n). \tag{1}$$

• We say that $g(n)$ is an asymptotic upper bound of $f(n)$, and we write $f(n) \in O(g(n))$, if there exists a strictly positive constant $c_2$ such that for $n$ big enough we have

$$0 \le f(n) \le c_2\, g(n). \tag{2}$$

• We say that $g(n)$ is an asymptotic lower bound of $f(n)$, and we write $f(n) \in \Omega(g(n))$, if there exists a strictly positive constant $c_1$ such that for $n$ big enough we have

$$0 \le c_1\, g(n) \le f(n). \tag{3}$$

Examples
• With $f(n) = \frac{1}{2} n^2 - 3n$ and $g(n) = n^2$ we have $f(n) \in \Theta(g(n))$. Indeed we have

$$c_1 n^2 \le \frac{1}{2} n^2 - 3n \le c_2 n^2 \tag{4}$$

for $n \ge 12$ with $c_1 = \frac{1}{4}$ and $c_2 = \frac{1}{2}$.
• Assume that there exists a positive integer $n_0$ such that $f(n) > 0$ and $g(n) > 0$ for every $n \ge n_0$. Then we have

$$\max(f(n), g(n)) \in \Theta(f(n) + g(n)). \tag{5}$$

Indeed we have

$$\frac{1}{2}\,(f(n) + g(n)) \le \max(f(n), g(n)) \le f(n) + g(n). \tag{6}$$

• Assume $a$ and $b$ are positive real constants. Then we have

$$(n + a)^b \in \Theta(n^b). \tag{7}$$

Indeed for $n \ge a$ we have

$$0 \le n^b \le (n + a)^b \le (2n)^b. \tag{8}$$

Hence we can choose $c_1 = 1$ and $c_2 = 2^b$.

Properties
• $f(n) \in \Theta(g(n))$ holds iff $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$ hold together.
• Each of the predicates $f(n) \in \Theta(g(n))$, $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$ defines a reflexive and transitive binary relation among the $\mathbb{N}$-to-$\mathbb{R}$ functions. Moreover $f(n) \in \Theta(g(n))$ is symmetric.
• We have the following transposition formula:

$$f(n) \in O(g(n)) \iff g(n) \in \Omega(f(n)). \tag{9}$$

In practice $\in$ is replaced by $=$ in each of the expressions $f(n) \in \Theta(g(n))$, $f(n) \in O(g(n))$ and $f(n) \in \Omega(g(n))$. Hence, the following

$$f(n) = h(n) + \Theta(g(n)) \tag{10}$$

means

$$f(n) - h(n) \in \Theta(g(n)). \tag{11}$$
Another example

Let us give another fundamental example. Let $p(n)$ be a (univariate) polynomial with degree $d > 0$. Let $a_d$ be its leading coefficient and assume $a_d > 0$. Let $k$ be an integer. Then we have
(1) if $k \ge d$ then $p(n) \in O(n^k)$,
(2) if $k \le d$ then $p(n) \in \Omega(n^k)$,
(3) if $k = d$ then $p(n) \in \Theta(n^k)$.
Exercise: Prove the following

$$\sum_{k=1}^{n} k \in \Theta(n^2). \tag{12}$$
Divide-and-Conquer Algorithms
Divide-and-conquer algorithms proceed as follows.
• Divide the input problem into sub-problems.
• Conquer the sub-problems by solving them directly if they are small enough, or proceed recursively otherwise.
• Combine the solutions of the sub-problems to obtain the solution of the input problem.
Equation satisfied by $T(n)$. Assume that the size of the input problem increases with an integer $n$. Let $T(n)$ be the time complexity of a divide-and-conquer algorithm to solve this problem. Then $T(n)$ satisfies an equation of the form:

$$T(n) = a\,T(n/b) + f(n), \tag{13}$$

where $f(n)$ is the cost of the combine-part, $a \ge 1$ is the number of recursive calls and $n/b$ with $b > 1$ is the size of a sub-problem.
Tree associated with a divide-and-conquer recurrence

Labeled tree associated with the equation. Assume $n$ is a power of $b$, say $n = b^p$. To solve the equation

$$T(n) = a\,T(n/b) + f(n)$$

we can associate a labeled tree $A(n)$ to it as follows.

(1) If $n = 1$, then $A(n)$ is reduced to a single leaf labeled $T(1)$.
(2) If $n > 1$, then the root of $A(n)$ is labeled by $f(n)$ and $A(n)$ possesses $a$ labeled sub-trees all equal to $A(n/b)$.
The labeled tree $A(n)$ associated with $T(n) = a\,T(n/b) + f(n)$ has height $p + 1$. Moreover the sum of its labels is $T(n)$.
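As a sketch of how the tree is used (still assuming $n = b^p$), summing the labels level by level unrolls the recurrence: level $j$ contains $a^j$ nodes, each labeled $f(n/b^j)$, and the last level contains $a^p$ leaves labeled $T(1)$.

```latex
\begin{align*}
T(n) &= f(n) + a\,T(n/b) \\
     &= f(n) + a\,f(n/b) + a^2\,T(n/b^2) \\
     &\;\;\vdots \\
     &= \sum_{j=0}^{p-1} a^j f\!\left(n/b^j\right) + a^p\,T(1),
\qquad \text{where } a^p = a^{\log_b n} = n^{\log_b a}.
\end{align*}
```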
Solving divide-and-conquer recurrences

[Figure: recursion tree for $T(n) = a\,T(n/b) + f(n)$. The root is labeled $f(n)$ and has $a$ children labeled $f(n/b)$; each of them has $a$ children labeled $f(n/b^2)$, and so on down to the leaves labeled $T(1)$. The height is $h = \log_b n$. The level sums are $f(n)$, $a\,f(n/b)$, $a^2 f(n/b^2)$, …, and the leaves contribute $a^{\log_b n}\,T(1) = \Theta(n^{\log_b a})$.]

IDEA: Compare $n^{\log_b a}$ with $f(n)$.
Master Theorem: case $n^{\log_b a} \gg f(n)$

[Figure: the same recursion tree of height $h = \log_b n$. The level sums $f(n)$, $a\,f(n/b)$, $a^2 f(n/b^2)$, … are GEOMETRICALLY INCREASING from the root down to the leaves, which contribute $a^{\log_b n}\,T(1) = \Theta(n^{\log_b a})$.]

Specifically, $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant $\varepsilon > 0$.

$T(n) = \Theta(n^{\log_b a})$
Master Theorem: case $f(n) \in \Theta(n^{\log_b a} \log^k n)$

[Figure: the same recursion tree of height $h = \log_b n$, with $n^{\log_b a} \approx f(n)$. The level sums $f(n)$, $a\,f(n/b)$, $a^2 f(n/b^2)$, … are ARITHMETICALLY INCREASING from the root down to the leaves, which contribute $a^{\log_b n}\,T(1) = \Theta(n^{\log_b a})$.]

Specifically, $f(n) = \Theta(n^{\log_b a} \lg^k n)$ for some constant $k \ge 0$.

$T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$
Master Theorem: case where $f(n) \gg n^{\log_b a}$

[Figure: the same recursion tree of height $h = \log_b n$, with $n^{\log_b a} \ll f(n)$. The level sums $f(n)$, $a\,f(n/b)$, $a^2 f(n/b^2)$, … are GEOMETRICALLY DECREASING from the root down to the leaves, which contribute $a^{\log_b n}\,T(1) = \Theta(n^{\log_b a})$.]

Specifically, $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some constant $\varepsilon > 0$.*

$T(n) = \Theta(f(n))$

*and $f(n)$ satisfies the regularity condition that $a\,f(n/b) \le c\,f(n)$ for some constant $c < 1$.
More examples
• Consider the relation:

$$T(n) = 2\,T(n/2) + n^2. \tag{14}$$

For $n = 2^p$ we obtain:

$$T(n) = n^2 + \frac{n^2}{2} + \frac{n^2}{4} + \frac{n^2}{8} + \cdots + \frac{n^2}{2^{p-1}} + n\,T(1). \tag{15}$$

Hence we have:

$$T(n) \in \Theta(n^2). \tag{16}$$

• Consider the relation:

$$T(n) = 3\,T(n/3) + n. \tag{17}$$

We obtain:

$$T(n) \in \Theta(n \log_3 n). \tag{18}$$
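Both results can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes the base value $T(1) = 1$ and evaluates each recurrence directly on powers of $b$. The names `T2`, `T3` and `Log3` are hypothetical. Under that assumption the recurrences unroll to $T(n) = 2n^2 - n$ and $T(n) = n \log_3 n + n$ respectively, in agreement with (16) and (18).

```cpp
#include <cassert>
#include <cstdint>

// Assumption (not in the slides): T(1) = 1 for both recurrences.

// T(n) = 2 T(n/2) + n^2, for n a power of 2.
std::int64_t T2(std::int64_t n) {
    return n == 1 ? 1 : 2 * T2(n / 2) + n * n;
}

// T(n) = 3 T(n/3) + n, for n a power of 3.
std::int64_t T3(std::int64_t n) {
    return n == 1 ? 1 : 3 * T3(n / 3) + n;
}

// Exact base-3 logarithm of a power of 3.
int Log3(std::int64_t n) {
    int p = 0;
    while (n > 1) { n /= 3; ++p; }
    return p;
}
```

Comparing `T2(n)` with `2 * n * n - n` on powers of 2, and `T3(n)` with `n * Log3(n) + n` on powers of 3, confirms the two closed forms for this choice of $T(1)$.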
Master Theorem when b = 2
Let $a > 0$ be an integer and let $f, T : \mathbb{N} \to \mathbb{R}_+$ be functions such that
(i) $f(2n) \ge 2 f(n)$ and $f(n) \ge n$.
(ii) If $n = 2^p$ then $T(n) \le a\,T(n/2) + f(n)$.
Then for $n = 2^p$ we have
(1) if $a = 1$ then

$$T(n) \le (2 - 2/n)\, f(n) + T(1) \in O(f(n)), \tag{19}$$

(2) if $a = 2$ then

$$T(n) \le f(n) \log_2(n) + T(1)\, n \in O(\log_2(n)\, f(n)), \tag{20}$$

(3) if $a \ge 3$ then

$$T(n) \le \frac{2}{a-2} \left( n^{\log_2(a) - 1} - 1 \right) f(n) + T(1)\, n^{\log_2(a)} \in O\!\left(f(n)\, n^{\log_2(a) - 1}\right). \tag{21}$$
Master Theorem when b = 2

Indeed

$$\begin{aligned}
T(2^p) &\le a\,T(2^{p-1}) + f(2^p) \\
&\le a\left[a\,T(2^{p-2}) + f(2^{p-1})\right] + f(2^p) \\
&= a^2\,T(2^{p-2}) + a\,f(2^{p-1}) + f(2^p) \\
&\le a^2\left[a\,T(2^{p-3}) + f(2^{p-2})\right] + a\,f(2^{p-1}) + f(2^p) \\
&= a^3\,T(2^{p-3}) + a^2 f(2^{p-2}) + a\,f(2^{p-1}) + f(2^p) \\
&\;\;\vdots \\
&\le a^p\,T(1) + \sum_{j=0}^{p-1} a^j f(2^{p-j}).
\end{aligned} \tag{22}$$

Moreover

$$\begin{aligned}
f(2^p) &\ge 2\,f(2^{p-1}) \\
f(2^p) &\ge 2^2 f(2^{p-2}) \\
&\;\;\vdots \\
f(2^p) &\ge 2^j f(2^{p-j})
\end{aligned} \tag{23}$$

Thus

$$\sum_{j=0}^{p-1} a^j f(2^{p-j}) \le f(2^p) \sum_{j=0}^{p-1} \left(\frac{a}{2}\right)^j. \tag{24}$$

Hence

$$T(2^p) \le a^p\,T(1) + f(2^p) \sum_{j=0}^{p-1} \left(\frac{a}{2}\right)^j. \tag{25}$$

For $a = 1$ we obtain

$$\begin{aligned}
T(2^p) &\le T(1) + f(2^p) \sum_{j=0}^{p-1} \left(\frac{1}{2}\right)^j \\
&= T(1) + f(2^p)\, \frac{\frac{1}{2^p} - 1}{\frac{1}{2} - 1} \\
&= T(1) + f(n)\,(2 - 2/n).
\end{aligned} \tag{26}$$

For $a = 2$ we obtain

$$\begin{aligned}
T(2^p) &\le 2^p\,T(1) + f(2^p)\, p \\
&= n\,T(1) + f(n) \log_2(n).
\end{aligned} \tag{27}$$
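Master Theorem cheat sheet
For $a \ge 1$ and $b > 1$, consider again the equation

$$T(n) = a\,T(n/b) + f(n). \tag{28}$$

• We have:

$$(\exists \varepsilon > 0)\ f(n) \in O(n^{\log_b a - \varepsilon}) \implies T(n) \in \Theta(n^{\log_b a}) \tag{29}$$

• We have:

$$(\exists k \ge 0)\ f(n) \in \Theta(n^{\log_b a} \log^k n) \implies T(n) \in \Theta(n^{\log_b a} \log^{k+1} n) \tag{30}$$

• We have:

$$(\exists \varepsilon > 0)\ f(n) \in \Omega(n^{\log_b a + \varepsilon}) \implies T(n) \in \Theta(f(n)) \tag{31}$$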
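As an illustration, here is how the cheat sheet would apply to the two recurrences (14) and (17) seen earlier. This is a sketch; for the third rule it also checks the regularity condition $a\,f(n/b) \le c\,f(n)$ with $c < 1$ stated with the corresponding case of the Master Theorem.

```latex
\begin{align*}
T(n) = 2\,T(n/2) + n^2 :\quad & n^{\log_2 2} = n,\quad f(n) = n^2 \in \Omega(n^{1+\varepsilon}) \text{ with } \varepsilon = 1, \\
 & 2\,f(n/2) = \tfrac{1}{2}\,n^2 \le c\,f(n) \text{ with } c = \tfrac{1}{2},
   \quad\text{so } T(n) \in \Theta(n^2). \\[4pt]
T(n) = 3\,T(n/3) + n :\quad & n^{\log_3 3} = n,\quad f(n) = n \in \Theta(n \log^0 n), \\
 & \text{so with } k = 0,\quad T(n) \in \Theta(n \log n).
\end{align*}
```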

Master Theorem quiz!

• $T(n) = 4T(n/2) + n$

• $T(n) = 4T(n/2) + n^2$

• $T(n) = 4T(n/2) + n^3$

• $T(n) = 4T(n/2) + n^2 / \log n$

Acknowledgements

• Charles E. Leiserson (MIT) for providing me with the sources of his lecture notes.
Matrix multiplication

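$$\underbrace{\begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}}_{C} = \underbrace{\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}}_{A} \cdot \underbrace{\begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{pmatrix}}_{B}$$

We will study three approaches:
• a naive and iterative one
• a divide-and-conquer one
• a divide-and-conquer one with memory management considerations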
Naive iterative matrix multiplication

cilk_for (int i=0; i<n; ++i) {
  cilk_for (int j=0; j<n; ++j) {
    for (int k=0; k<n; ++k) {
      C[i][j] += A[i][k] * B[k][j];
    }
  }
}

• Work: ?
• Span: ?
• Parallelism: ?
• Work: $\Theta(n^3)$
• Span: $\Theta(n)$
• Parallelism: $\Theta(n^2)$
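These bounds can be derived as follows, under the usual assumption (not stated on this slide) that a `cilk_for` loop over $n$ iterations is executed by recursive halving of the iteration range, which adds $\Theta(\lg n)$ to the span on top of the longest iteration.

```latex
\begin{align*}
T_1(n) &= n \cdot n \cdot \Theta(n) = \Theta(n^3), \\
T_\infty(n) &= \underbrace{\Theta(\lg n)}_{\text{outer cilk\_for}}
             + \underbrace{\Theta(\lg n)}_{\text{inner cilk\_for}}
             + \underbrace{\Theta(n)}_{\text{serial loop on } k} = \Theta(n), \\
T_1(n) / T_\infty(n) &= \Theta(n^3) / \Theta(n) = \Theta(n^2).
\end{align*}
```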
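Matrix multiplication based on block decomposition

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \cdot \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

$$= \begin{pmatrix} A_{11}B_{11} & A_{11}B_{12} \\ A_{21}B_{11} & A_{21}B_{12} \end{pmatrix} + \begin{pmatrix} A_{12}B_{21} & A_{12}B_{22} \\ A_{22}B_{21} & A_{22}B_{22} \end{pmatrix}$$

The divide-and-conquer approach is simply the one based on blocking, presented in the first lecture.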
Divide-and-conquer matrix multiplication

// C <- C + A * B
template <typename T>
void MMult(T *C, T *A, T *B, int n, int size) {
  T *D = new T[n*n];
  // base case & partition matrices
  cilk_spawn MMult(C11, A11, B11, n/2, size);
  cilk_spawn MMult(C12, A11, B12, n/2, size);
  cilk_spawn MMult(C22, A21, B12, n/2, size);
  cilk_spawn MMult(C21, A21, B11, n/2, size);
  cilk_spawn MMult(D11, A12, B21, n/2, size);
  cilk_spawn MMult(D12, A12, B22, n/2, size);
  cilk_spawn MMult(D22, A22, B22, n/2, size);
  MMult(D21, A22, B21, n/2, size);
  cilk_sync;
  MAdd(C, D, n, size); // C += D;
  delete[] D;
}

Work? Span? Parallelism?


• $A_p(n)$ and $M_p(n)$: times on $p$ processors for $n \times n$ Add and Mult.
• $A_1(n) = 4A_1(n/2) + \Theta(1) = \Theta(n^2)$
• $A_\infty(n) = A_\infty(n/2) + \Theta(1) = \Theta(\lg n)$
• $M_1(n) = 8M_1(n/2) + A_1(n) = 8M_1(n/2) + \Theta(n^2) = \Theta(n^3)$
• $M_\infty(n) = M_\infty(n/2) + \Theta(\lg n) = \Theta(\lg^2 n)$
• $M_1(n)/M_\infty(n) = \Theta(n^3 / \lg^2 n)$
Divide-and-conquer matrix multiplication: No temporaries!

template <typename T>
void MMult2(T *C, T *A, T *B, int n, int size) {
  // base case & partition matrices
  cilk_spawn MMult2(C11, A11, B11, n/2, size);
  cilk_spawn MMult2(C12, A11, B12, n/2, size);
  cilk_spawn MMult2(C22, A21, B12, n/2, size);
  MMult2(C21, A21, B11, n/2, size);
  cilk_sync;
  cilk_spawn MMult2(C11, A12, B21, n/2, size);
  cilk_spawn MMult2(C12, A12, B22, n/2, size);
  cilk_spawn MMult2(C22, A22, B22, n/2, size);
  MMult2(C21, A22, B21, n/2, size);
  cilk_sync;
}

Work? Span? Parallelism?


• $MA_p(n)$: time on $p$ processors for $n \times n$ Mult-Add.
• $MA_1(n) = \Theta(n^3)$
• $MA_\infty(n) = 2MA_\infty(n/2) + \Theta(1) = \Theta(n)$
• $MA_1(n)/MA_\infty(n) = \Theta(n^2)$
• Besides, saving space often saves time due to hierarchical memory.
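The slides leave the base case and the partitioning of the matrices as a comment. The following serial sketch shows one way they could be filled in. It is an illustration under assumptions that do not come from the slides: row-major storage, `size` interpreted as the row stride of the enclosing matrices, `n` a power of 2, and a base case at `n == 1`. The names `MMultAddSerial` and `NaiveProduct` are hypothetical, and the Cilk keywords are dropped so that the code compiles with a standard C++ compiler.

```cpp
#include <cassert>
#include <vector>

// Serial sketch of C <- C + A * B on n x n blocks (n a power of 2).
// Assumptions (not from the slides): row-major storage, `size` is the
// row stride of the enclosing matrices, and the base case is n == 1.
template <typename T>
void MMultAddSerial(T* C, const T* A, const T* B, int n, int size) {
    if (n == 1) {  // assumed base case: scalar multiply-add
        *C += *A * *B;
        return;
    }
    const int h = n / 2;
    // Quadrant (i, j) of a block starts i*h rows down and j*h columns right.
    T* C11 = C;            T* C12 = C + h;
    T* C21 = C + h * size; T* C22 = C + h * size + h;
    const T* A11 = A;            const T* A12 = A + h;
    const T* A21 = A + h * size; const T* A22 = A + h * size + h;
    const T* B11 = B;            const T* B12 = B + h;
    const T* B21 = B + h * size; const T* B22 = B + h * size + h;

    // First phase: the four calls write disjoint quadrants of C
    // (these are the calls that run in parallel in the Cilk version).
    MMultAddSerial(C11, A11, B11, h, size);
    MMultAddSerial(C12, A11, B12, h, size);
    MMultAddSerial(C22, A21, B12, h, size);
    MMultAddSerial(C21, A21, B11, h, size);
    // Second phase (after the first cilk_sync): accumulate the other products.
    MMultAddSerial(C11, A12, B21, h, size);
    MMultAddSerial(C12, A12, B22, h, size);
    MMultAddSerial(C22, A22, B22, h, size);
    MMultAddSerial(C21, A22, B21, h, size);
}

// Reference triple loop, used only to check the sketch.
template <typename T>
std::vector<T> NaiveProduct(const std::vector<T>& A, const std::vector<T>& B, int n) {
    std::vector<T> C(n * n, T(0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}
```

The two phases mirror the two groups of calls separated by `cilk_sync` in `MMult2`: within a phase each call updates a different quadrant of C, which is why no temporary matrix is needed.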
Merging two sorted arrays

template <typename T>
void Merge(T *C, T *A, T *B, int na, int nb) {
  while (na>0 && nb>0) {
    if (*A <= *B) {
      *C++ = *A++; na--;
    } else {
      *C++ = *B++; nb--;
    }
  }
  while (na>0) {
    *C++ = *A++; na--;
  }
  while (nb>0) {
    *C++ = *B++; nb--;
  }
}

Time for merging $n$ elements is $\Theta(n)$.

[Figure: two sorted input arrays, 3 12 19 46 and 4 14 21 23.]
Merge sort

3 4 12 14 19 21 33 46
merge
3 12 19 46 4 14 21 33
merge
3 19 12 46 4 33 14 21
merge
19 3 12 46 33 4 21 14
Parallel merge sort with serial merge

template <typename T>
void MergeSort(T *B, T *A, int n) {
  if (n==1) {
    B[0] = A[0];
  } else {
    T C[n];
    cilk_spawn MergeSort(C, A, n/2);
    MergeSort(C+n/2, A+n/2, n-n/2);
    cilk_sync;
    Merge(B, C, C+n/2, n/2, n-n/2);
  }
}

• Work?
• Span?

• $T_1(n) = 2T_1(n/2) + \Theta(n)$, thus $T_1(n) = \Theta(n \lg n)$.
• $T_\infty(n) = T_\infty(n/2) + \Theta(n)$, thus $T_\infty(n) = \Theta(n)$.
• $T_1(n)/T_\infty(n) = \Theta(\lg n)$. Puny parallelism!
• We need to parallelize the merge!
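Parallel merge

[Figure: array A (indices 0 to na) is split at ma = na/2 into the elements ≤ A[ma] and the elements ≥ A[ma]. A binary search locates the index mb that splits array B (indices 0 to nb) into the elements ≤ A[ma] (indices 0 to mb-1) and the elements ≥ A[ma] (indices mb to nb). The two lower parts and the two upper parts are then merged by two recursive calls to P_Merge. We assume na ≥ nb.]

Idea: if the total number of elements to be merged is n = na + nb, then the number of elements in either of the two recursive merges is at most 3n/4.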
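The bound can be justified as follows, writing $n_a$, $n_b$ for `na`, `nb` and $m_a$, $m_b$ for the split indices `ma`, `mb`. Since $n_a \ge n_b$ we have $n_a \ge n/2$. Each recursive merge receives at most half of $A$ (the element `A[ma]` itself is placed directly) and at most all of $B$:

```latex
\begin{align*}
m_a + m_b &\le \frac{n_a}{2} + n_b = n - \frac{n_a}{2} \le n - \frac{n}{4} = \frac{3n}{4}, \\
(n_a - m_a - 1) + (n_b - m_b) &\le \frac{n_a}{2} + n_b \le \frac{3n}{4}.
\end{align*}
```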
Parallel merge

template <typename T>
void P_Merge(T *C, T *A, T *B, int na, int nb) {
  if (na < nb) {
    P_Merge(C, B, A, nb, na);
  } else if (na==0) {
    return;
  } else {
    int ma = na/2;
    int mb = BinarySearch(A[ma], B, nb);
    C[ma+mb] = A[ma];
    cilk_spawn P_Merge(C, A, B, ma, mb);
    P_Merge(C+ma+mb+1, A+ma+1, B+mb, na-ma-1, nb-mb);
    cilk_sync;
  }
}

• One should coarsen the base case for efficiency.
• Work? Span?

• Let $PM_p(n)$ be the $p$-processor running time of P_Merge.
• In the worst case, the span of P_Merge is
$PM_\infty(n) \le PM_\infty(3n/4) + \Theta(\lg n) = \Theta(\lg^2 n)$.
• The worst-case work of P_Merge satisfies the recurrence
$PM_1(n) \le PM_1(\alpha n) + PM_1((1 - \alpha) n) + \Theta(\lg n)$,
where $\alpha$ is a constant in the range $1/4 \le \alpha \le 3/4$.
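To experiment with the recursion, here is a serial sketch of `P_Merge` that compiles with a standard C++ compiler. The slides do not define `BinarySearch`; the version below is an assumption that returns the index of the first element of `B` that is not smaller than the key, implemented with `std::lower_bound`. The names `P_MergeSerial` and `MergeVectors` are hypothetical, and the `cilk_spawn` / `cilk_sync` keywords are dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed helper (not defined in the slides): index of the first
// element of B[0..nb) that is >= x, so that B[0..mb) < x <= B[mb..nb).
template <typename T>
int BinarySearch(const T& x, const T* B, int nb) {
    return static_cast<int>(std::lower_bound(B, B + nb, x) - B);
}

// Serial version of P_Merge: merges sorted A[0..na) and B[0..nb) into C.
template <typename T>
void P_MergeSerial(T* C, const T* A, const T* B, int na, int nb) {
    if (na < nb) {                 // make A the larger array
        P_MergeSerial(C, B, A, nb, na);
    } else if (na == 0) {          // both arrays are empty
        return;
    } else {
        const int ma = na / 2;                         // median position in A
        const int mb = BinarySearch(A[ma], B, nb);     // split point in B
        C[ma + mb] = A[ma];                            // final place of A[ma]
        P_MergeSerial(C, A, B, ma, mb);                // spawned in the slides
        P_MergeSerial(C + ma + mb + 1, A + ma + 1, B + mb,
                      na - ma - 1, nb - mb);
    }
}

// Convenience wrapper for trying the sketch on vectors of int.
std::vector<int> MergeVectors(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> c(a.size() + b.size());
    P_MergeSerial(c.data(), a.data(), b.data(),
                  static_cast<int>(a.size()), static_cast<int>(b.size()));
    return c;
}
```

For instance, `MergeVectors({3, 12, 19, 46}, {4, 14, 21, 23})` merges the two sorted arrays shown on the serial merge slide.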
Analyzing parallel merge

• Recall $PM_1(n) \le PM_1(\alpha n) + PM_1((1 - \alpha) n) + \Theta(\lg n)$ for some $1/4 \le \alpha \le 3/4$.
• To solve this hairy equation we use the substitution method.
• We assume there exist some constants $a, b > 0$ such that $PM_1(n) \le a n - b \lg n$ holds for all $1/4 \le \alpha \le 3/4$.
• After substitution, this hypothesis implies: $PM_1(n) \le a n - b \lg n - b \lg n + \Theta(\lg n)$.
• We can pick $b$ large enough such that we have $PM_1(n) \le a n - b \lg n$ for all $1/4 \le \alpha \le 3/4$ and all $n > 1$.
• Then pick $a$ large enough to satisfy the base conditions.
• Finally we have $PM_1(n) = \Theta(n)$.
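The substitution step can be spelled out in more detail. This is a sketch: it writes the $\Theta(\lg n)$ term as at most $c \lg n$ for some constant $c > 0$ (a name introduced here for illustration) and applies the hypothesis to the two sub-problems of sizes $\alpha n$ and $(1 - \alpha) n$.

```latex
\begin{align*}
PM_1(n) &\le \bigl(a\,\alpha n - b \lg(\alpha n)\bigr)
           + \bigl(a\,(1-\alpha) n - b \lg((1-\alpha) n)\bigr) + c \lg n \\
        &= a n - b \lg n - \bigl( b \lg n + b \lg(\alpha(1-\alpha)) - c \lg n \bigr) \\
        &\le a n - b \lg n
        \quad \text{as soon as } b \lg n + b \lg(\alpha(1-\alpha)) \ge c \lg n .
\end{align*}
```

Since $1/4 \le \alpha \le 3/4$ gives $\alpha(1-\alpha) \ge 3/16$, the term $\lg(\alpha(1-\alpha))$ is bounded below by the constant $\lg(3/16)$. The condition therefore holds for $b$ large enough once $n$ is past a fixed threshold, and the remaining small values of $n$ are covered by the choice of $a$.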
Parallel merge sort with parallel merge

template <typename T>
void P_MergeSort(T *B, T *A, int n) {
  if (n==1) {
    B[0] = A[0];
  } else {
    T C[n];
    cilk_spawn P_MergeSort(C, A, n/2);
    P_MergeSort(C+n/2, A+n/2, n-n/2);
    cilk_sync;
    P_Merge(B, C, C+n/2, n/2, n-n/2);
  }
}

• Work?
• Span?

• The work satisfies $T_1(n) = 2T_1(n/2) + \Theta(n)$ (as usual) and we have $T_1(n) = \Theta(n \lg n)$.
• The worst-case critical-path length of the merge sort now satisfies
$T_\infty(n) = T_\infty(n/2) + \Theta(\lg^2 n) = \Theta(\lg^3 n)$.
• The parallelism is now $\Theta(n \lg n)/\Theta(\lg^3 n) = \Theta(n / \lg^2 n)$.
Tableau construction

00 01 02 03 04 05 06 07
10 11 12 13 14 15 16 17
20 21 22 23 24 25 26 27
30 31 32 33 34 35 36 37
40 41 42 43 44 45 46 47
50 51 52 53 54 55 56 57
60 61 62 63 64 65 66 67
70 71 72 73 74 75 76 77

Constructing a tableau $A$ satisfying a relation of the form:

$$A[i, j] = R(A[i-1, j],\ A[i-1, j-1],\ A[i, j-1]). \tag{32}$$

The work is $\Theta(n^2)$.

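Recursive construction

[Figure: the $n \times n$ tableau is divided into four quadrants of size $n/2 \times n/2$: I (top left), II (top right), III (bottom left) and IV (bottom right).]

Parallel code
I;
cilk_spawn II;
III;
cilk_sync;
IV;

• $T_1(n) = 4T_1(n/2) + \Theta(1)$, thus $T_1(n) = \Theta(n^2)$.
• $T_\infty(n) = 3T_\infty(n/2) + \Theta(1)$, thus $T_\infty(n) = \Theta(n^{\log_2 3})$.
• Parallelism: $\Theta(n^{2 - \log_2 3}) = \Omega(n^{0.41})$.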
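The four-way recursion can be sketched serially as follows. This is an illustration only: the slides leave the relation $R$ abstract, so the sum used below, the boundary value 1 stored in row 0 and column 0, and the names `FillRecursive`, `BuildRecursive` and `BuildIterative` are all assumptions. The block size `n` is assumed to be a power of 2, and the Cilk keywords appear only as comments.

```cpp
#include <cassert>
#include <vector>

using Tableau = std::vector<std::vector<long>>;

// Example relation R, assumed for illustration (the slides leave R abstract).
long R(long up, long diag, long left) { return up + diag + left; }

// Fill the n x n block whose top-left cell is (r0, c0), n a power of 2.
// The cells above and to the left of the block must already be filled.
void FillRecursive(Tableau& A, int r0, int c0, int n) {
    if (n == 1) {
        A[r0][c0] = R(A[r0 - 1][c0], A[r0 - 1][c0 - 1], A[r0][c0 - 1]);
        return;
    }
    const int h = n / 2;
    FillRecursive(A, r0, c0, h);          // I
    FillRecursive(A, r0, c0 + h, h);      // II  (cilk_spawn in the slides)
    FillRecursive(A, r0 + h, c0, h);      // III (can run in parallel with II)
    FillRecursive(A, r0 + h, c0 + h, h);  // IV  (after cilk_sync)
}

// Build an (n+1) x (n+1) tableau; row 0 and column 0 hold the
// assumed boundary value 1, and the n x n interior is filled recursively.
Tableau BuildRecursive(int n) {
    Tableau A(n + 1, std::vector<long>(n + 1, 1));
    FillRecursive(A, 1, 1, n);
    return A;
}

// Row-by-row reference fill, used only to check the recursive order.
Tableau BuildIterative(int n) {
    Tableau A(n + 1, std::vector<long>(n + 1, 1));
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= n; ++j)
            A[i][j] = R(A[i - 1][j], A[i - 1][j - 1], A[i][j - 1]);
    return A;
}
```

Quadrants II and III depend only on quadrant I and on the boundary, which is why they can run in parallel, while quadrant IV needs all three. This gives the factor 3 in the span recurrence.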
A more parallel construction

[Figure: the $n \times n$ tableau is divided into nine blocks of size $n/3 \times n/3$, numbered along anti-diagonals: first row I, II, IV; second row III, V, VII; third row VI, VIII, IX.]

I;
cilk_spawn II;
III;
cilk_sync;
cilk_spawn IV;
cilk_spawn V;
VI;
cilk_sync;
cilk_spawn VII;
VIII;
cilk_sync;
IX;

• $T_1(n) = 9T_1(n/3) + \Theta(1)$, thus $T_1(n) = \Theta(n^2)$.
• $T_\infty(n) = 5T_\infty(n/3) + \Theta(1)$, thus $T_\infty(n) = \Theta(n^{\log_3 5})$.
• Parallelism: $\Theta(n^{2 - \log_3 5}) = \Omega(n^{0.53})$.
• This nine-way divide-and-conquer has more parallelism than the four-way one but exhibits more cache complexity (more on this later).
Acknowledgements

• Charles E. Leiserson (MIT) for providing me with the sources of his lecture notes.
• Matteo Frigo (Intel) for supporting the work of my team with Cilk++ and offering us the next lecture.
• Yuzhen Xie (UWO) for helping me with the images used in these slides.
• Liyun Li (UWO) for generating the experimental data.
