Midterm
Note: No plagiarism. This is a take-home exam, so you should not share your answers with others.
Note: These are research papers, so recorded conference presentations are also available in video format on YouTube. You may refer to those presentations as well.
Q1. Based on the reference paper, "COZ: Finding Code that Counts with Causal Profiling", answer the following three parts:
Part 1: What is virtual speed-up and how is it achieved?
Answer: A virtual speed-up of a code segment is achieved by artificially slowing down all the other code running concurrently (by inserting brief pauses) so that, relative to the rest of the program, the selected segment appears to use resources better and run faster. Slowing down everything else has the same relative effect as making the selected segment faster; this emulated improvement is known as a virtual speed-up.
Part 2: What is the difference between an actual speed-up and a virtual speed-up?
Answer: In an actual speed-up, the runtime of a function is genuinely reduced, so the overall program finishes faster (smaller wall-clock time), and the runtime of the rest of the code (the other functions) is not affected.
In a virtual speed-up, on the other hand, the effect of making a function faster is only emulated by slowing down the rest of the code (the other functions) whenever that function runs. Consequently, the overall runtime of the program actually increases (larger wall-clock time), even though the measured relative effect is the same as that of an actual speed-up.
Part 3: Assume you have concurrent software running on a multi-core CPU and you have been assigned the task of finding inefficiencies in code blocks that can potentially be improved. How can causal profiling help you perform this task?
Answer: Causal profiling helps identify the lines of code whose optimization would yield the greatest overall performance gain. A causal profile reports, for each selected line, a graph of overall program performance versus the virtual speed-up applied to that line. This makes it easy to see which lines matter most and offer the best prospect for optimization: lines with a steep upward slope have the most optimization potential, while lines with a zero slope will not improve overall performance no matter how much they are optimized.
Causal profiling also identifies lines with a negative speed-up effect, where optimizing the line actually hurts overall performance; this is typically a sign of contention. By resolving these contention issues, we can achieve further optimization and increase the efficiency of the overall program.
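In practice, using COZ only requires annotating the program with progress points so that virtual speed-ups of individual lines can be related to end-to-end throughput. The sketch below is my own illustration rather than an example from the paper: the worker and process functions and the work vector are hypothetical, and it assumes the coz.h header and the coz run launcher that ship with the COZ tool.

#include <coz.h>        // provides the COZ_PROGRESS macro
#include <functional>   // std::cref
#include <thread>
#include <vector>

void process(int item) { /* hypothetical unit of work */ (void)item; }

// Each completed item is one unit of progress; COZ correlates virtual
// speed-ups of individual source lines with the visit rate of this point.
void worker(const std::vector<int>& items) {
    for (int item : items) {
        process(item);
        COZ_PROGRESS;
    }
}

int main() {
    std::vector<int> work(1000000, 1);
    std::thread t1(worker, std::cref(work));
    std::thread t2(worker, std::cref(work));
    t1.join();
    t2.join();
    return 0;
}

The profiled run would then be launched with something like coz run --- ./program, which produces the causal profile graphs described above.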
Q2. The instructor first describes Amdahl's law formula.
The reference papers are "Amdahl's Law in the Multicore Era" and "Retrospective on Amdahl's Law in the Multicore Era". Based on your reading of these two papers, answer the following two questions:
Part 1: Inspired by Amdahl's law, the authors come up with improved speedup models for multicore chips. Based on the speedup models provided, draw two speedup graphs for 1) symmetric multicore chips and 2) asymmetric multicore chips. Show speedup calculations and plots for n = 256 and r = {3, 9, 27, 81} BCE for symmetric and asymmetric multicore chips. The graphs are similar to the ones shown in Figures 2(b) and 2(d) of the paper.
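For reference, Amdahl's law gives the speedup of a program in which a fraction f of the work is enhanced to run s times faster:

\[ \mathrm{Speedup}(f, s) = \frac{1}{(1 - f) + \dfrac{f}{s}} \]

The papers specialize this to a chip budget of n base-core equivalents (BCEs), where a core built from r BCEs is assumed to deliver sequential performance perf(r) = \sqrt{r}. With that assumption, the symmetric and asymmetric speedup models used below are: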
\[ \mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\sqrt{r}} + \dfrac{f \cdot r}{n \cdot \sqrt{r}}} \]

\[ \mathrm{Speedup}_{\mathrm{asymmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\sqrt{r}} + \dfrac{f}{\sqrt{r} + n - r}} \]
The required calculations are shown below for parallel fraction, f = 0.9:
Symmetric multicore chips (n = 256, f = 0.9):
r     perf(r) = √r     Speedup
3     √3 ≈ 1.732       15.668
9     3                22.789
27    3√3 ≈ 5.196      26.658
81    9                23.391

Asymmetric multicore chips (n = 256, f = 0.9):
r     perf(r) = √r     Speedup
3     √3 ≈ 1.732       16.322
9     3                27.076
27    3√3 ≈ 5.196      43.313
81    9                62.491
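These values follow directly from the two formulas; a minimal C++ sketch of the calculation (my own, assuming perf(r) = sqrt(r) as above) is given below, and running it reproduces the tabulated speedups.

#include <cmath>
#include <cstdio>

// Symmetric-chip speedup model with perf(r) = sqrt(r)
double speedupSymmetric(double f, double n, double r) {
    double perf = std::sqrt(r);
    return 1.0 / ((1.0 - f) / perf + (f * r) / (n * perf));
}

// Asymmetric-chip speedup model with perf(r) = sqrt(r)
double speedupAsymmetric(double f, double n, double r) {
    double perf = std::sqrt(r);
    return 1.0 / ((1.0 - f) / perf + f / (perf + n - r));
}

int main() {
    const double f = 0.9, n = 256;
    const double rs[] = {3, 9, 27, 81};
    for (double r : rs)
        std::printf("r = %2.0f  symmetric = %6.3f  asymmetric = %6.3f\n",
                    r, speedupSymmetric(f, n, r), speedupAsymmetric(f, n, r));
    return 0;
}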
The speedup plots for n = 256 are generated using the gnuplot utility and shown below:
[Plots: speedup for symmetric multicore chips; speedup for asymmetric multicore chips]
Part 2: Almost ten years later, the same authors reflect on their earlier paper. What actual hardware chip examples do the authors provide in their retrospective article for processors with the different designs, namely symmetric multicore chips, asymmetric multicore chips, and dynamic multicore chips?
Answer: The actual hardware chip examples provided by the authors are given below:
Symmetric multicore chips: Intel Xeon Skylake chips, 28 cores (56 threads)
Q3. The 1st thread evaluates the first k terms, then forwards its answer to the 2nd thread. The 2nd thread then evaluates the next k terms and adds the value received from the 1st thread to its own partial result. This pattern continues, with the two threads taking turns until the entire polynomial is evaluated for a given value of x. In this way, each thread builds on the partial answer produced by the other thread to obtain the final answer.
This strategy looks like a round-robin or cyclic strategy for picking work, but it is different: a thread may process its next k terms only after the other thread has finished evaluating its k terms. The problem can also be stated as a message-passing problem by replacing each thread with an MPI process running in a shared-nothing environment.
If your Marquette student ID is an even number, write pseudo-code, an algorithm, or code for evaluating the polynomial using threads. You can choose either C++ or Java syntax.
However, if your Marquette student ID is an odd number, write pseudo-code, an algorithm, or code for evaluating the polynomial using MPI.
Answer: The required program is written using C++ threads (std::thread) and given below; a sketch of the MPI variant mentioned in the question follows after it.
#include <iostream>
#include <thread>
using namespace std;

#define SIZE 7          // number of terms (k) evaluated per thread turn
#define N_Threads 2
#define MAX 10000       // highest polynomial degree
#define COEFFICIENT 1   // constant coefficient used for every term

double globalSum = 0;       // running sum shared by the two threads
double x = 0.99;            // point at which the polynomial is evaluated
double coeffArr[MAX + 1];   // polynomial coefficients

// base raised to the given non-negative degree
double power(double base, int degree) {
    if (degree == 1) return base;
    double result = 1;
    for (int d = 0; d < degree; d++) result *= base;
    return result;
}

// fill the coefficient array with the constant COEFFICIENT
void initialize(double arr[]) {
    for (int i = 0; i <= MAX; i++) arr[i] = COEFFICIENT;
}

// evaluate the k = SIZE terms of chunk 'count' and add them to globalSum;
// main() joins each thread before launching the next, so no lock is needed
void sum(int count) {
    double partial = 0;
    int start = count * SIZE;
    for (int i = start; i < start + SIZE && i <= MAX; i++)
        partial += coeffArr[i] * power(x, i);
    globalSum += partial;
}

int main()
{
    cout << "Threads Starting" << endl;
    int count = 0;
    initialize(coeffArr);
    for (int i = 0; i <= MAX / SIZE; i++) {
        if (i % N_Threads == 0) {        // even chunks go to "thread 1"
            thread th1(sum, count);
            th1.join();                  // finish before handing off the next chunk
            count++;
        }
        else {                           // odd chunks go to "thread 2"
            thread th2(sum, count);
            th2.join();
            count++;
        }
    }
    cout << "Summation \t:" << globalSum << endl;
    return 0;
}
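As noted in the question, the same k-terms-at-a-time hand-off can also be expressed with message passing. The sketch below is my own illustration of that variant rather than part of the graded answer: it assumes exactly two MPI processes (e.g., mpirun -np 2), uses std::pow instead of the hand-written power function, and the message tags and variable names are my own choices.

#include <mpi.h>
#include <cmath>
#include <cstdio>

#define SIZE 7         // k terms evaluated per turn
#define MAX 10000      // highest polynomial degree
#define COEFFICIENT 1  // constant coefficient for every term

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // expects exactly 2 ranks

    const double x = 0.99;
    double running = 0.0;                    // partial sum passed between ranks
    const int nChunks = MAX / SIZE + 1;

    for (int chunk = 0; chunk < nChunks; chunk++) {
        int owner = chunk % 2;               // ranks take turns owning chunks
        if (rank != owner) continue;
        if (chunk > 0)                       // wait for the partial sum so far
            MPI_Recv(&running, 1, MPI_DOUBLE, 1 - rank, chunk,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        int start = chunk * SIZE;
        for (int i = start; i < start + SIZE && i <= MAX; i++)
            running += COEFFICIENT * std::pow(x, i);
        if (chunk < nChunks - 1)             // hand the updated sum to the peer
            MPI_Send(&running, 1, MPI_DOUBLE, 1 - rank, chunk + 1,
                     MPI_COMM_WORLD);
    }

    if (rank == (nChunks - 1) % 2)           // owner of the last chunk reports
        std::printf("Summation: %f\n", running);

    MPI_Finalize();
    return 0;
}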