Midterm
Note: No plagiarism. This is a take-home exam, so you should not share your answers with others.
Note: These are research papers, so recorded conference presentations are also available in video format on YouTube. You may refer to those presentations as well.
Q1. Based on the reference paper, "COZ: Finding Code that Counts with Causal Profiling", answer the following three parts:
Part 1: What is virtual speed-up and how is it achieved?
Answer: A virtual speed-up of a code segment is achieved by artificially slowing down all the other code running concurrently (by inserting brief pauses) so that, relative to the rest of the program, the selected segment appears to use resources better and run faster. Slowing down everything else has the same relative effect as making the selected segment faster; this emulated improvement is known as a virtual speed-up.
Part 2: What is the difference between an actual speed-up and a virtual speed-up?
Answer: In an actual speed-up, the runtime of a function is genuinely reduced, so the overall program finishes faster (smaller wall-clock time), and the runtime of the rest of the code (the other functions) is not affected.
In a virtual speed-up, on the other hand, the effect of making a function faster is only emulated by slowing down the rest of the code (the other functions) whenever that function runs. Consequently, the overall runtime of the program actually increases (larger wall-clock time), even though the measured relative effect is the same as that of an actual speed-up.
Part 3: Assume you have concurrent software running on a multi-core CPU and you have been assigned the task of finding inefficiencies in code blocks that can potentially be improved. How can causal profiling help you perform this task?
Answer: Causal profiling helps identify the lines of code whose optimization would yield the greatest overall performance gain. A causal profile reports, for each selected line, a graph of overall program performance versus the virtual speed-up applied to that line. This makes it easy to see which lines matter most and offer the best prospect for optimization: lines with a steep upward slope have the most optimization potential, while lines with a zero slope will not improve overall performance no matter how much they are optimized.
Causal profiling also identifies lines with a negative speed-up effect, where optimizing the line actually hurts overall performance; this is typically a sign of contention. By resolving these contention issues, we can achieve further optimization and increase the efficiency of the overall program.
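In practice, using COZ only requires annotating the program with progress points so that virtual speed-ups of individual lines can be related to end-to-end throughput. The sketch below is my own illustration rather than an example from the paper: the worker and process functions and the work vector are hypothetical, and it assumes the coz.h header and the coz run launcher that ship with the COZ tool.

#include <coz.h>        // provides the COZ_PROGRESS macro
#include <functional>   // std::cref
#include <thread>
#include <vector>

void process(int item) { /* hypothetical unit of work */ (void)item; }

// Each completed item is one unit of progress; COZ correlates virtual
// speed-ups of individual source lines with the visit rate of this point.
void worker(const std::vector<int>& items) {
    for (int item : items) {
        process(item);
        COZ_PROGRESS;
    }
}

int main() {
    std::vector<int> work(1000000, 1);
    std::thread t1(worker, std::cref(work));
    std::thread t2(worker, std::cref(work));
    t1.join();
    t2.join();
    return 0;
}

The profiled run would then be launched with something like coz run --- ./program, which produces the causal profile graphs described above.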
Q2. The instructor first describes Amdahl's law formula.
The reference papers are "Amdahl's Law in the Multicore Era" and "Retrospective on Amdahl's Law in the Multicore Era". Based on your reading of these two papers, answer the following two questions:
Part 1: Inspired by Amdahl's law, the authors come up with improved speedup models for multicore chips. Based on the speedup models provided, draw two speedup graphs for 1) symmetric multicore chips and 2) asymmetric multicore chips. Show speedup calculations and plots for n = 256 and r = {3, 9, 27, 81} BCE for symmetric and asymmetric multicore chips. The graphs are similar to the ones shown in Figures 2(b) and 2(d) of the paper.
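For reference, Amdahl's law gives the speedup of a program in which a fraction f of the work is enhanced to run s times faster:

\[ \mathrm{Speedup}(f, s) = \frac{1}{(1 - f) + \dfrac{f}{s}} \]

The papers specialize this to a chip budget of n base-core equivalents (BCEs), where a core built from r BCEs is assumed to deliver sequential performance perf(r) = \sqrt{r}. With that assumption, the symmetric and asymmetric speedup models used below are: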
\[ \mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\sqrt{r}} + \dfrac{f \cdot r}{n \cdot \sqrt{r}}} \]

\[ \mathrm{Speedup}_{\mathrm{asymmetric}}(f, n, r) = \frac{1}{\dfrac{1 - f}{\sqrt{r}} + \dfrac{f}{\sqrt{r} + n - r}} \]
The required calculations are shown below for parallel fraction, f = 0.9:
Symmetric multicore chips (n = 256, f = 0.9):
r     perf(r) = √r     Speedup
3     √3 ≈ 1.732       15.668
9     3                22.789
27    3√3 ≈ 5.196      26.658
81    9                23.391

Asymmetric multicore chips (n = 256, f = 0.9):
r     perf(r) = √r     Speedup
3     √3 ≈ 1.732       16.322
9     3                27.076
27    3√3 ≈ 5.196      43.313
81    9                62.491
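These values follow directly from the two formulas; a minimal C++ sketch of the calculation (my own, assuming perf(r) = sqrt(r) as above) is given below, and running it reproduces the tabulated speedups.

#include <cmath>
#include <cstdio>

// Symmetric-chip speedup model with perf(r) = sqrt(r)
double speedupSymmetric(double f, double n, double r) {
    double perf = std::sqrt(r);
    return 1.0 / ((1.0 - f) / perf + (f * r) / (n * perf));
}

// Asymmetric-chip speedup model with perf(r) = sqrt(r)
double speedupAsymmetric(double f, double n, double r) {
    double perf = std::sqrt(r);
    return 1.0 / ((1.0 - f) / perf + f / (perf + n - r));
}

int main() {
    const double f = 0.9, n = 256;
    const double rs[] = {3, 9, 27, 81};
    for (double r : rs)
        std::printf("r = %2.0f  symmetric = %6.3f  asymmetric = %6.3f\n",
                    r, speedupSymmetric(f, n, r), speedupAsymmetric(f, n, r));
    return 0;
}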
The speedup plots for n = 256 are generated using the gnuplot utility and shown below:
[Plots: speedup for symmetric multicore chips; speedup for asymmetric multicore chips]
Part 2: Almost ten years later, the same authors reflect on their earlier paper. What actual hardware chip examples do the authors provide in their retrospective article for processors with the different designs, namely symmetric multicore chips, asymmetric multicore chips, and dynamic multicore chips?
Answer: The actual hardware chip examples provided by the authors are given below:
Symmetric multicore chips: Intel Xeon Skylake chips, 28 cores (56 threads)
Q3. The 1st thread evaluates the first k terms, then forwards its answer to the 2nd thread. The 2nd thread then evaluates the next k terms and adds the value received from the 1st thread to its own partial result. This pattern continues, with the two threads taking turns until the entire polynomial is evaluated for a given value of x. In this way, each thread builds on the partial answer produced by the other thread to obtain the final answer.
This strategy looks like a round-robin or cyclic strategy for picking work, but it is different: a thread may process its next k terms only after the other thread has finished evaluating its k terms. The problem can also be stated as a message-passing problem by replacing each thread with an MPI process running in a shared-nothing environment.
If your Marquette student ID is an even number, write pseudo-code, an algorithm, or code for evaluating the polynomial using threads. You can choose either C++ or Java syntax.
However, if your Marquette student ID is an odd number, write pseudo-code, an algorithm, or code for evaluating the polynomial using MPI.
Answer: The required program is written using C++ threads (std::thread) and given below; a sketch of the MPI variant mentioned in the question follows after it.
#include <iostream>
#include <thread>
using namespace std;

#define SIZE 7          // number of terms (k) evaluated per thread turn
#define N_Threads 2
#define MAX 10000       // highest polynomial degree
#define COEFFICIENT 1   // constant coefficient used for every term

double globalSum = 0;       // running sum shared by the two threads
double x = 0.99;            // point at which the polynomial is evaluated
double coeffArr[MAX + 1];   // polynomial coefficients

// base raised to the given non-negative degree
double power(double base, int degree) {
    if (degree == 1) return base;
    double result = 1;
    for (int d = 0; d < degree; d++) result *= base;
    return result;
}

// fill the coefficient array with the constant COEFFICIENT
void initialize(double arr[]) {
    for (int i = 0; i <= MAX; i++) arr[i] = COEFFICIENT;
}

// evaluate the k = SIZE terms of chunk 'count' and add them to globalSum;
// main() joins each thread before launching the next, so no lock is needed
void sum(int count) {
    double partial = 0;
    int start = count * SIZE;
    for (int i = start; i < start + SIZE && i <= MAX; i++)
        partial += coeffArr[i] * power(x, i);
    globalSum += partial;
}

int main()
{
    cout << "Threads Starting" << endl;
    int count = 0;
    initialize(coeffArr);
    for (int i = 0; i <= MAX / SIZE; i++) {
        if (i % N_Threads == 0) {        // even chunks go to "thread 1"
            thread th1(sum, count);
            th1.join();                  // finish before handing off the next chunk
            count++;
        }
        else {                           // odd chunks go to "thread 2"
            thread th2(sum, count);
            th2.join();
            count++;
        }
    }
    cout << "Summation \t:" << globalSum << endl;
    return 0;
}
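As noted in the question, the same k-terms-at-a-time hand-off can also be expressed with message passing. The sketch below is my own illustration of that variant rather than part of the graded answer: it assumes exactly two MPI processes (e.g., mpirun -np 2), uses std::pow instead of the hand-written power function, and the message tags and variable names are my own choices.

#include <mpi.h>
#include <cmath>
#include <cstdio>

#define SIZE 7         // k terms evaluated per turn
#define MAX 10000      // highest polynomial degree
#define COEFFICIENT 1  // constant coefficient for every term

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // expects exactly 2 ranks

    const double x = 0.99;
    double running = 0.0;                    // partial sum passed between ranks
    const int nChunks = MAX / SIZE + 1;

    for (int chunk = 0; chunk < nChunks; chunk++) {
        int owner = chunk % 2;               // ranks take turns owning chunks
        if (rank != owner) continue;
        if (chunk > 0)                       // wait for the partial sum so far
            MPI_Recv(&running, 1, MPI_DOUBLE, 1 - rank, chunk,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        int start = chunk * SIZE;
        for (int i = start; i < start + SIZE && i <= MAX; i++)
            running += COEFFICIENT * std::pow(x, i);
        if (chunk < nChunks - 1)             // hand the updated sum to the peer
            MPI_Send(&running, 1, MPI_DOUBLE, 1 - rank, chunk + 1,
                     MPI_COMM_WORLD);
    }

    if (rank == (nChunks - 1) % 2)           // owner of the last chunk reports
        std::printf("Summation: %f\n", running);

    MPI_Finalize();
    return 0;
}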