Lecture 02: Amdahl's Law, Modern Hardware (ECE 459: Programming for Performance)

This document summarizes a lecture on Amdahl's Law and the limits of parallelization speedups. Amdahl's Law states that the maximum speedup from parallelization is bounded by the fraction of the program that must run serially: even with an infinite number of processors, the serial fraction limits the overall speedup. The document gives formulations of Amdahl's Law and an example graph showing how speedup levels off as processors are added, for various serial fractions. It also notes the assumptions behind Amdahl's Law, such as a fixed problem size and negligible parallelization overhead.


Lecture 02: Amdahl's Law, Modern Hardware

ECE 459: Programming for Performance

Patrick Lam
University of Waterloo

January 7, 2015

About Prediction and Speedups

Cliff Click said: 5% miss rates dominate performance.

Why is that?
Recall: a miss costs a 100-1000 slot penalty.
See L02.pdf for a calculation.
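A back-of-the-envelope version of that calculation (my numbers, assuming the low end of the penalty range, 100 slots): the average cost per operation is 0.95 × 1 + 0.05 × 100 = 5.95 slots, so even at a 95% hit rate the misses account for roughly 5/5.95, about 84%, of the total time. That is how a 5% miss rate comes to dominate performance.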


Forcing Branch Mispredicts


blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html
/* EXPECT_RESULT is supplied at compile time, e.g. -DEXPECT_RESULT=0
 * (correct hint: p[k] is always 0) or -DEXPECT_RESULT=1 (bogus hint). */
#include <stdlib.h>
#include <stdio.h>

/* noinline keeps the compiler from optimizing the branch bodies away. */
static __attribute__ ((noinline)) int f(int a) { return a; }

#define BSIZE 1000000

int main(int argc, char *argv[])
{
    /* calloc zero-fills, so the condition below is always false. */
    int *p = calloc(BSIZE, sizeof(int));
    int j, k, m1 = 0, m2 = 0;

    for (j = 0; j < 1000; j++) {
        for (k = 0; k < BSIZE; k++) {
            /* Hint to the compiler that p[k] usually equals EXPECT_RESULT. */
            if (__builtin_expect(p[k], EXPECT_RESULT)) {
                m1 = f(++m1);
            } else {
                m2 = f(++m2);
            }
        }
    }
    printf("%d, %d\n", m1, m2);
    return 0;
}

Running times: 3.1 s with a good (or no) hint, 4.9 s with a bogus hint.
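One way to reproduce the two cases (a sketch: the slide does not give the build command, so the file name expect.c and the -O2 flag are assumptions):

gcc -O2 -DEXPECT_RESULT=0 expect.c -o expect && time ./expect   # good hint
gcc -O2 -DEXPECT_RESULT=1 expect.c -o expect && time ./expect   # bogus hint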

Limitations of Speedups

Our main focus is parallelization.

Most programs have a sequential part and a parallel part.

Amdahl's Law answers the question: what are the limits to parallelization?


Formulation (1)

S: fraction of serial runtime in a serial execution.
P: fraction of parallel runtime in a serial execution.
Therefore, S + P = 1.

With 4 processors, best case, what can happen to the following runtime?

[Figure: total runtime drawn as a bar, a serial segment S followed by a parallel segment P.]

We want to split up the parallel part over 4 processors:

[Figure: the same bar, with the parallel part divided into 4 segments of length P/4 that run concurrently, shortening the total to S + P/4.]

Formulation (2)

T_s: time for the program to run in serial.
N: number of processors/parallel executions.
T_p: time for the program to run in parallel.

Under perfect conditions, the parallel part gets an N-fold speedup:

T_p = T_s × (S + P/N)
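For instance (arbitrary numbers, just to make the formula concrete): with T_s = 10 s, S = 0.25, and N = 4, we get T_p = 10 × (0.25 + 0.75/4) = 4.375 s.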


Formulation (3)
How much faster can we make the program?

speedup = T_s / T_p
        = T_s / (T_s × (S + P/N))
        = 1 / (S + P/N)

(assuming no overhead for parallelizing, or costs near zero)
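Continuing the arbitrary numbers from the previous slide: with S = 0.25 and N = 4, speedup = 1 / (0.25 + 0.75/4) = 1 / 0.4375 ≈ 2.29, well short of the ideal 4.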


Fixed-Size Problem Scaling, Varying Fraction of Parallel Code

[Figure: speedup (y-axis, 0 to 32) versus number of processors (x-axis, up to 32), one curve per parallel fraction: 50%, 70%, 90%, 95%, 99%, and 100% Parallel. Every curve except 100% levels off as processors are added.]
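The data behind a plot like this is easy to regenerate. A minimal sketch (mine, not the lecture's) that tabulates Amdahl speedups for the parallel fractions in the legend:

#include <stdio.h>

int main(void)
{
    /* Parallel fractions from the plot's legend. */
    double fracs[] = { 0.50, 0.70, 0.90, 0.95, 0.99, 1.00 };
    int nfracs = sizeof(fracs) / sizeof(fracs[0]);
    int i, n;

    printf("%4s", "N");
    for (i = 0; i < nfracs; i++)
        printf(" %5.0f%%", fracs[i] * 100);
    printf("\n");

    for (n = 4; n <= 32; n += 4) {
        printf("%4d", n);
        for (i = 0; i < nfracs; i++) {
            /* Amdahl's Law: speedup = 1 / ((1 - P) + P/N). */
            printf(" %6.2f", 1.0 / ((1.0 - fracs[i]) + fracs[i] / n));
        }
        printf("\n");
    }
    return 0;
}

The 100% column grows linearly with N; every other column flattens toward its asymptote 1 / (1 - P).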

Amdahl's Law

Replace S with (1 - P):

speedup = 1 / ((1 - P) + P/N)

maximum speedup = 1 / (1 - P), since P/N approaches 0 as N grows.

As you might imagine, the asymptotes in the previous graph are bounded by the maximum speedup.
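As a sanity check against the graph (my arithmetic): with P = 0.95 the maximum speedup is 1 / (1 - 0.95) = 20, which is where the 95% Parallel curve levels off no matter how many processors are added.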


Assumptions behind Amdahl's Law

How can we invalidate Amdahl's Law? We assume:

- problem size is fixed (we'll see this again soon);
- the program/algorithm behaves the same on 1 processor and on N processors; and
- we can accurately measure runtimes, i.e. that overheads don't matter (see the sketch below).
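A sketch of how that last assumption can fail (my formulation, not the lecture's): if coordinating N processors adds an overhead c(N), expressed as a fraction of the serial runtime, then

speedup = 1 / (S + P/N + c(N))

and when c(N) grows with N (locking, communication), adding processors can eventually make the program slower.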

