18-Assignment 1 - Solution
Assignment 1
CS4402B / CS9635B University of Western Ontario
Submission instructions.
Format: The answers to the problem questions should be typed:
• source programs must be accompanied with input test files,
• in the case of CilkPlus code, a Makefile (for compiling and running) is required, and
• for algorithms or complexity analyses, LaTeX is highly recommended.
A PDF file (no other format allowed) should gather all the answers to non-programming
questions. All the files (the PDF, the source programs, the input test files and Make-
files) should be archived using the UNIX command tar.
Submission: The assignment should be submitted through the OWL website of the class.
Collaboration. You are expected to do this assignment on your own without assistance
from anyone else in the class. However, you can use the literature and, if you do so, briefly
list your references in the assignment. Be careful! You might find on the web solutions
to our problems which are not appropriate, for instance because the parallelism model
is different. So please avoid those traps and work out the solutions by yourself. You
should not hesitate to contact me if you have any questions regarding this assignment.
I will be more than happy to help.
Marking. This assignment will be marked out of 100. A 10 % bonus will be given if your
paper is clearly organized, the answers are precise and concise, the typography and the
language are in good order. Messy assignments (unclear statements, lack of correctness
in the reasoning, many typographical and language mistakes) may incur a 10 % penalty.
PROBLEM 1. [20 points] Consider the following multithreaded algorithm for performing pairwise addition on n-element arrays A[1..n] and B[1..n], storing the sums in D[1..n], shown in Algorithm 5.
1.1 Suppose that we set grain size = 1. What are the work, span and parallelism of this
implementation?
Solution.
• With grain size = 1, the for-loop of the procedure Sum-Array performs n iterations, and at each iteration the call to Add-Subarray performs constant work. Therefore, the work is Θ(n).
• The span is also Θ(n): spawning the function calls does not shorten the critical path, because the n spawns themselves are issued serially by the for-loop.
• Therefore, the parallelism is Θ(1).
1.2 For an arbitrary grain size, what are the work, span and parallelism of this implementation?
Solution.
• Denote the grain size by g; each call to Add-Subarray then has cost Θ(g).
• With grain size = g, the for-loop of the procedure Sum-Array performs n/g iterations, and at each iteration the call to Add-Subarray performs Θ(g) work. Therefore, the work remains Θ(n).
• Here again, spawning the function calls does not shorten the critical path. Each of the n/g calls has span Θ(g), and in the worst case these n/g calls execute one after another. Hence, the span is O(n).
• Therefore, the parallelism is Ω(1), which is not an attractive result. In practice, some benefit can come from spawning a function call at each iteration of a for-loop, but this is hard to capture theoretically. Moreover, using cilk_for is generally the better way to go.
1.3 Determine the best value for grain size that maximizes parallelism. Explain the reasons.
Solution.
• To give a precise answer, we would need to know whether some of the function
calls to Add-Subarray are performed concurrently. Let us consider the best
and the worst cases.
• In the worst case, these function calls execute serially, one after another, whatever g is. In that case, the parallelism is Θ(1) and the value of g has no effect.
• In the best case, all the function calls execute in parallel, in which case the span drops to Θ(n/g + g). The function g ↦ n/g + g reaches its minimum (for g > 0) at g = √n, which suggests using this value for maximizing parallelism.
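The choice g = √n can be justified by a one-line AM–GM (or calculus) argument:

```latex
\frac{n}{g} + g \;\ge\; 2\sqrt{\frac{n}{g}\cdot g} \;=\; 2\sqrt{n},
\qquad \text{with equality iff } \frac{n}{g} = g, \text{ i.e. } g = \sqrt{n}.
```

With g = √n the best-case span is Θ(√n), so the parallelism improves to Θ(n/√n) = Θ(√n).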
1.4 Implement this algorithm in C/C++ with the best value of grain size (which can be
determined from either theory or practice), and then use Cilkview to collect the following
information for the whole program with n = 4096 or larger:
• Work (instructions)
• Span (instructions)
• Burdened span (instructions)
• Parallelism
• Burdened parallelism
as well as the speedup estimated on 2, 4, 8, 16, 32, 64 and 128 processors, respectively.
This question receives 10 points distributed as follows:
• the code compiles: 3 points,
• the code runs: 4 points,
• the code runs correctly against verification: 3 points.
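A minimal serial sketch of the implementation, assuming grain size g = √n; the chunk loop is the one a CilkPlus version would turn into a cilk_for (the C++ identifiers sum_array and add_subarray are my own transcription of the pseudocode names Sum-Array and Add-Subarray):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Add-Subarray: pairwise addition on one chunk [lo, hi).
static void add_subarray(const std::vector<double>& A, const std::vector<double>& B,
                         std::vector<double>& D, std::size_t lo, std::size_t hi) {
    for (std::size_t k = lo; k < hi; ++k)
        D[k] = A[k] + B[k];
}

// Sum-Array with grain size g = sqrt(n): roughly n/g chunks of g elements each.
// In CilkPlus, the chunk loop below would be a cilk_for (or each call a cilk_spawn).
void sum_array(const std::vector<double>& A, const std::vector<double>& B,
               std::vector<double>& D) {
    const std::size_t n = A.size();
    const std::size_t g =
        std::max<std::size_t>(1, static_cast<std::size_t>(std::sqrt(static_cast<double>(n))));
    for (std::size_t lo = 0; lo < n; lo += g)      // candidate cilk_for
        add_subarray(A, B, D, lo, std::min(lo + g, n));
}
```

For n = 4096 this gives 64 chunks of 64 elements each.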
PROBLEM 2. [20 points] The objective of this problem is to prove that, with respect
to the Theorem of Graham & Brent, a greedy scheduler achieves the stronger bound:
TP ≤ (T1 − T∞ )/p + T∞ .
Let G = (V, E) be the DAG representing the instruction stream for a multithreaded
program in the fork-join parallelism model. The sets V and E denote the vertices and edges
of G respectively. Let T1 and T∞ be the work and span of the corresponding multithreaded
program. We assume that G is connected. We also assume that G admits a single source
(vertex with no predecessors) denoted by s and a single target (vertex with no successors)
denoted by t. Recall that T1 is the total number of elements of V and T∞ is the maximum
number of nodes on a path from s to t (counting s and t).
Let S0 = {s}. For i ≥ 0, we denote by Si+1 the set of the vertices w satisfying the
following two properties:
(i) all immediate predecessors of w belong to Si ∪ Si−1 ∪ · · · ∪ S0 , and
(ii) w does not belong to Si ∪ Si−1 ∪ · · · ∪ S0 .
Therefore, the set Si represents all the units of work which can be done during the i-th
parallel step (and not before that point) on infinitely many processors.
Let p > 1 be an integer. For all i ≥ 0, we denote by wi the number of elements in Si .
Let ℓ be the largest integer i such that wi ≠ 0. Observe that S0 , S1 , . . . , Sℓ form a partition
of V . Finally, we define the following sequence of integers:
ci = 0 if wi ≤ p,   and   ci = ⌈wi /p⌉ − 1 if wi > p.
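The sets Si can be computed mechanically by a longest-path layering of the DAG. A small sketch (the graph representation, with vertices indexed 0..n−1 in topological order, is my own and not part of the assignment):

```cpp
#include <cstddef>
#include <vector>

// Partition the vertices of a DAG into levels S0, S1, ...:
// level(v) = 0 if v has no predecessors, else 1 + max(level over predecessors).
// pred[v] lists the immediate predecessors of vertex v; vertices are assumed
// topologically ordered (every predecessor has a smaller index than v).
std::vector<std::vector<int>> level_sets(const std::vector<std::vector<int>>& pred) {
    const int n = static_cast<int>(pred.size());
    std::vector<int> level(n, 0);
    int max_level = 0;
    for (int v = 0; v < n; ++v) {
        for (int u : pred[v])
            if (level[u] + 1 > level[v]) level[v] = level[u] + 1;
        if (level[v] > max_level) max_level = level[v];
    }
    std::vector<std::vector<int>> S(max_level + 1);
    for (int v = 0; v < n; ++v) S[level[v]].push_back(v);
    return S;  // S[i] is the set S_i of strands executable at parallel step i
}
```

On a diamond DAG s → {a, b} → t this yields S0 = {s}, S1 = {a, b}, S2 = {t}.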
2.1 For the computation of the 5-th Fibonacci number (as studied in class) what are
S0 , S1 , S2 , . . .?
Solution.
For each i = 0, . . . , ℓ − 1, the set Si+1 consists of strands which cannot be executed
before those in Si ∪ Si−1 ∪ · · · ∪ S0 are executed. Therefore the span T∞ is at least
ℓ + 1. On the other hand, all strands in Si+1 can be executed (concurrently) once those
in Si ∪ Si−1 ∪ · · · ∪ S0 are executed. Therefore T∞ is at most ℓ + 1. These two
observations imply ℓ + 1 = T∞.
Since S0 , S1 , . . . , Sℓ form a partition of V , we clearly have w0 + · · · + wℓ = T1 .
Solution. We have
c0 + · · · + cℓ ≤ Σ_{i=0}^{ℓ} (⌈wi /p⌉ − 1)
             ≤ Σ_{i=0}^{ℓ} (wi /p − 1/p)        (1)
             ≤ (1/p) Σ_{i=0}^{ℓ} (wi − 1)
             ≤ (1/p) (T1 − T∞) .
Indeed, for all positive integers a and b, one can easily verify the inequality
⌈a/b⌉ − 1 ≤ (a − 1)/b .        (2)
TP ≤ (T1 − T∞ )/p + T∞ .
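A sketch of how this bound follows from the construction above: executing the DAG level by level, the set Si takes ⌈wi /p⌉ = ci + 1 rounds on p processors, and the standard complete/incomplete-steps argument shows that a greedy scheduler needs no more rounds than this schedule. Hence

```latex
T_P \;\le\; \sum_{i=0}^{\ell} \Big\lceil \frac{w_i}{p} \Big\rceil
    \;=\; \sum_{i=0}^{\ell} (c_i + 1)
    \;=\; (c_0 + \cdots + c_\ell) + (\ell + 1)
    \;\le\; \frac{T_1 - T_\infty}{p} + T_\infty ,
```

using inequality (1) and ℓ + 1 = T∞.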
2.5 Application: Professor Brown takes some measurements of his (deterministic) multithreaded program, which is scheduled by a greedy scheduler, and finds that T8 = 80 seconds and T64 = 20 seconds. Give a lower bound and an upper bound for Professor Brown's running time on p processors, for 1 ≤ p ≤ 100. Using a plot is recommended.
Solution.
The above solution is elegant and addresses the question in the best possible way.
Nevertheless we accept coarser solutions where Equation (3) is used as an equality in order
to numerically determine T1 and T∞. After that, one observes
(T1 + (p − 1) T∞) / p ≥ TP ≥ max(T1 /p, T∞)
and plots the above upper and lower bounds of TP .
PROBLEM 3. [20 points] Given a weighted directed graph G = (V , E), where each edge
(v, w) ∈ E (vertices v, w ∈ V ) has a non-negative weight, the Floyd-Warshall algorithm,
shown in Algorithm 2, can find the shortest paths between all pairs of vertices in G. Let |V |
be the number of vertices in G.
3.1 Determine which loops among the k-loop, i-loop and j-loop can be parallelized and
explain the reasons.
Solution. From the proposed pseudo-code, it is unclear that any of the three for-loops
could become a parallel loop. Thus, it is an acceptable solution to say: none! The
challenge is the dynamic programming formulation. In fact, one needs to rework the
algorithm a bit so as to obtain a blocking-strategy formulation. See for instance:
Algorithm 2: The Floyd-Warshall algorithm
/* Let D be a |V | × |V | array of minimum distances initialized by the
weighted directed graph G. */
for k = 0; k < |V |; ++k do
for i = 0; i < |V |; ++i do
for j = 0; j < |V |; ++j do
if D[i][j] > D[i][k] + D[k][j] then
D[i][j] = D[i][k] + D[k][j];
https://gkaracha.github.io/papers/floyd-warshall.pdf
and
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1333649
From there, one deduces that the two inner for-loops can become parallel for-loops.
Indeed, the "i" and "j" iterations are independent of each other. This yields a
fork-join algorithm with work Θ(n³) and span Θ(n).
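For reference, a serial C++ transcription of Algorithm 2; the two inner loops are the ones that could become cilk_for loops, while the k-loop must stay serial since iteration k reads the results of iteration k − 1 (the INF sentinel and function name are my own choices):

```cpp
#include <vector>

const double INF = 1e18;  // sentinel for "no edge"

// Floyd-Warshall on an adjacency matrix D (D[i][j] = weight of edge (i, j),
// INF if absent, 0 on the diagonal). After the call, D[i][j] is the length
// of a shortest path from i to j.
void floyd_warshall(std::vector<std::vector<double>>& D) {
    const int n = static_cast<int>(D.size());
    for (int k = 0; k < n; ++k)            // serial: carries a dependence on step k-1
        for (int i = 0; i < n; ++i)        // candidate cilk_for
            for (int j = 0; j < n; ++j)    // candidate cilk_for
                if (D[i][k] + D[k][j] < D[i][j])
                    D[i][j] = D[i][k] + D[k][j];
}
```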
Solution. The section on the parallelization of the Floyd-Warshall algorithm, in the Wikipedia
page, provides an interesting point of view. We can see the Floyd-Warshall algorithm
as a stencil computation, see Algorithm 3. Note that the parallel for-loops in
Algorithm 3 can be expressed in the Cilk language using cilk_for with the appropriate
grain size.
3.3 Analyze the work, span and parallelism of your multithreaded pseudo-code.
Solution.
• Removing the two in parallel clauses yields a serial algorithm with work Θ(N³).
• The outermost loop and the two innermost loops are serial. This yields a span of
Θ(N (2 log((N − 1)/b) + b²)). If we view b as a small constant, we can simply answer
Θ(N log N).
Algorithm 3: Parallel Floyd-Warshall algorithm using blocking
/* Let D be a |V | × |V | array of minimum distances initialized by the
weighted directed graph G. */
Define D(0) = D and let N = |V | ;
Let b be an integer dividing N − 1 ;
for k = 0; k < N ; ++k do
Initialize an N × N matrix D(k+1) to zero ;
for i = 0; i ≤ (N − 1)/b; ++i ; in parallel do
for j = 0; j ≤ (N − 1)/b; ++j ; in parallel do
for h = 0; h < b; ++h do
for ` = 0; ` < b; ++` do
D(k+1)[ib+h, jb+ℓ] = min( D(k)[ib+h, jb+ℓ] , D(k)[ib+h, k] + D(k)[k, jb+ℓ] )
D(k) = D(k+1) ;
We can divide the n × n array A into four n/2 × n/2 subarrays,
A = [ A11  A12 ]
    [ A21  A22 ] ,
and then recursively update each subarray in parallel.
Solution.
4.2 Draw the computation dag of your pseudo-code, and show how to schedule the dag on
4 processors using greedy scheduling.
4.3 Give and solve recurrences for the work and span for this algorithm in terms of n. What
is the parallelism?
Solution.
Copy part:
Work: O(n²)
Span: C∞(n) = C∞(n/2) + O(1) ∈ O(log n)
The whole algorithm:
Work: O(n²)
Span: S∞(n) = S∞(n/2) + O(log n) = Θ(log² n)
Parallelism: O(n² / log² n)
Choose an integer b ≥ 2. Divide the n × n array into b2 subarrays, each of size n/b × n/b,
recursing with as much parallelism as possible.
4.4 In terms of n and b, what are the work, span and parallelism of your algorithm?
Copy part:
Work: O(n²)
Span: C∞(n) = C∞(n/b) + O(1) ∈ O(log_b n)
The whole algorithm:
Work: O(n²)
Span: S∞(n) = S∞(n/b) + O(log_b n) = Θ(log_b² n)
Parallelism: O(n² / log_b² n)
Algorithm 5: Parallel Stencil
Update(A, D, b, N )
Update-blocks (A, D, b, 0, 0, N − 1, N − 1);
Copy-blocks (A, D, b, 0, 0, N − 1, N − 1);
Update-blocks(A, D, b, i0 , j0 , di , dj )
if di > b then
d = di /2;
spawn Update-blocks (A, D, b, i0 , j0 , d, dj ) ;
Update-blocks (A, D, b, i0 + d, j0 , d, dj ) ;
return ;
if dj > b then
d = dj /2;
spawn Update-blocks (A, D, b, i0 , j0 , di , d) ;
Update-blocks (A, D, b, i0 , j0 + d, di , d) ;
return ;
Update-block(A, D, i0 , j0 , di , dj )
Copy-blocks(A, D, b, i0 , j0 , di , dj )
if di > b then
d = di /2;
spawn Copy-blocks (A, D, b, i0 , j0 , d, dj ) ;
Copy-blocks (A, D, b, i0 + d, j0 , d, dj ) ;
return ;
if dj > b then
d = dj /2;
spawn Copy-blocks (A, D, b, i0 , j0 , di , d) ;
Copy-blocks (A, D, b, i0 , j0 + d, di , d) ;
return ;
Copy-block(A, D, i0 , j0 , di , dj )
Update-block(A, D, i0 , j0 , di , dj )
for i = i0 ; i < i0 + di ; ++i do
for j = j0 ; j < j0 + dj ; ++j do
D[i, j] = 0.25 * (A[i − 1, j] + A[i + 1, j] + A[i, j − 1] + A[i, j + 1]);
Copy-block(A, D, i0 , j0 , di , dj )
for i = i0 ; i < i0 + di ; ++i do
for j = j0 ; j < j0 + dj ; ++j do
A[i, j] = D[i, j] ;
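The two base-case kernels of Algorithm 5 can be written out in C++ as follows (a sketch; the Grid typedef and array layout are my own). Update-block averages the four neighbours of each cell of the block into D, and Copy-block writes the block back into A:

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Update-block: D[i][j] = average of the four neighbours of cell (i, j) in A,
// over the block [i0, i0+di) x [j0, j0+dj). Assumes the block lies in the
// interior of the grid, so that indices i-1, i+1, j-1, j+1 are all valid.
void update_block(const Grid& A, Grid& D,
                  std::size_t i0, std::size_t j0, std::size_t di, std::size_t dj) {
    for (std::size_t i = i0; i < i0 + di; ++i)
        for (std::size_t j = j0; j < j0 + dj; ++j)
            D[i][j] = 0.25 * (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]);
}

// Copy-block: write the freshly computed block back into A.
void copy_block(Grid& A, const Grid& D,
                std::size_t i0, std::size_t j0, std::size_t di, std::size_t dj) {
    for (std::size_t i = i0; i < i0 + di; ++i)
        for (std::size_t j = j0; j < j0 + dj; ++j)
            A[i][j] = D[i][j];
}
```

The recursive Update-blocks and Copy-blocks procedures simply split the index range until di ≤ b and dj ≤ b, then call these kernels.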
4.5 For any choice of b ≥ 2, analyze the trends of the parallelism and the burdened parallelism.
The code can be found in problem4/stencilDnC.cpp. For simplicity, the order of the
matrix is set to n + 2 and we ignore the edge cells.
5.1 Describe, in plain words, how to construct a tableau in a k-way fashion, for an arbitrary
integer k ≥ 2, using the same stencil (the one of the Pascal triangle construction) as
in the lectures.
One can use either a divide-and-conquer or a blocking strategy, as seen in class for
Pascal’s triangle.
5.2 Determine the work and the span for an input square array of order n.
For an input n × n array, the work is clearly in Θ(n2 ) Let Sk (n) be the non-burdened
span for the k-way divide and conquer approach. We have:
5.3 Determine the burdened span, similarly to what we did for the Pascal triangle construction at the end of the chapter Multithreaded Parallelism and Performance Measures.
Sk (n) ∈ Θ((n/k) log(n/k))