ACA 2024W 01 Introduction

2024-10-23

Multicore & GPU Programming : An Integrated Approach

Ch. 1: Introduction
By G. Barlas

<C> G. Barlas, 2016  Modifications by H. Weber

Objectives

! Understand the current trends in computing machine design, and how this influences software development.
! Learn how to categorize computing machines based on Flynn's taxonomy.
! Learn the essential tools used to evaluate multicore/parallel performance, i.e. speedup and efficiency.
! Learn the proper experimental procedure for measuring and reporting performance.
! Learn Amdahl's and Gustafson-Barsis' laws and apply them in order to predict the performance of parallel programs.

<C> G. Barlas, 2016 2


The Era of Multicore Machines

Source: https://commons.wikimedia.org/wiki/File:Transistor_Count_and_Moore's_Law_-_2011.svg

The Era of Multicore Machines

<C> G. Barlas, 2016 4

Source: https://en.wikipedia.org/wiki/File:Moore%27s_Law_Transistor_Count_1970-2020.png
Flynn's Taxonomy (1966)
! Single Instruction, Single Data (SISD): a simple sequential machine that executes one instruction at a time, operating on a single data item. Surprisingly, the vast majority of contemporary CPUs do not belong to this category.
! Single Instruction, Multiple Data (SIMD): a machine where each instruction is applied to a collection of items. Vector processors were the very first machines that followed this paradigm. GPUs also follow this design at the level of the Streaming Multiprocessor (SM for NVidia) or the SIMD unit (for AMD).
! Multiple Instructions, Single Data (MISD): this configuration seems like an oddity.
! Multiple Instructions, Multiple Data (MIMD): the most versatile machine category. Multicore machines follow this paradigm, including GPUs.

Symmetric Multiprocessing (figure)

<C> G. Barlas, 2016 5
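To make the SIMD idea above concrete, here is a minimal, illustrative C++ sketch (not from the original slides): an element-wise array addition applies the same operation to many data items, which is exactly the pattern a compiler can map to SIMD/vector instructions when optimizations are enabled.

#include <cstddef>
#include <iostream>
#include <vector>

// The same instruction (an addition) is applied to every element.
// With optimizations enabled (e.g. g++ -O3) the compiler may emit
// SIMD/vector instructions that process several elements at once.
void add(const std::vector<float>& a, const std::vector<float>& b,
         std::vector<float>& c)
{
    for (std::size_t i = 0; i < c.size(); ++i)
        c[i] = a[i] + b[i];
}

int main()
{
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    add(a, b, c);
    std::cout << c[0] << "\n";   // prints 3
}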

Industry Trends
! Increase the on-chip core count, combined with
augmented specialized SIMD instruction sets and
larger caches. This is best exemplified by Intel's x86
line of CPUs and the Intel Xeon Phi co-processor.
! Combine heterogeneous cores in the same
package, typically CPU and GPU ones, each
optimized for a different type of task. This is best
exemplified by AMD's line of APU (Accelerated
Processing Unit) chips. Intel is also offering
OpenCL-based computing on its line of CPUs with
integrated graphics chips.
<C> G. Barlas, 2016  6
CPUs VS GPUs
! CPUs:
− Large caches, repetitive use of data
− Instruction decoding and prediction hardware
− Pipelined execution

! GPUs:
− Small caches, single-time use of data
− Big data volume
− Simple program logic, simple ALUs

<C> G. Barlas, 2016 7

Examples of recent multicore chips

! Name of chip: ...


! Company: ...
! Architecture description (architecture diagram):

! Architecture description (keywords):


– …
– ...

<C> G. Barlas, 2016 8


A Glimpse at the Top 500

June 2024

<C> G. Barlas, 2016 9

https://www.top500.org/lists/top500/2024/06/

A Glimpse at the Top 500

June 2023

<C> G. Barlas, 2016 10


How do we measure performance?
! It's all about time.
! Counting steps or calculating the asymptotic complexity has little to no benefit.
! At the very least, a parallel program should be able to beat its sequential counterpart in terms of execution time (not always certain).
! The improvement in execution time is typically expressed as the speedup:

  speedup = t_seq / t_par

  where t_seq is the execution time of the sequential program, and t_par is the execution time of the parallel program for solving the same instance of a problem.
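A minimal C++ sketch of how the two times could be measured (illustrative only; solve_sequential and solve_parallel are hypothetical stand-ins, not functions from the course material):

#include <chrono>
#include <iostream>

// Hypothetical stand-ins for the two implementations being compared;
// replace the bodies with the real sequential and parallel solvers.
void solve_sequential() { /* ... sequential work ... */ }
void solve_parallel()   { /* ... parallel work ... */ }

// Elapsed wall-clock time of a single call, in seconds.
template <typename F>
double elapsed_seconds(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    double t_seq = elapsed_seconds(solve_sequential);
    double t_par = elapsed_seconds(solve_parallel);
    std::cout << "speedup = t_seq / t_par = " << t_seq / t_par << "\n";
}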

<C> G. Barlas, 2016 11

Speedup – How objective is it?

! Both t_seq and t_par are total response times (elapsed times), and as such they are not objective. They can be influenced by:
− The skill of the programmer who wrote the implementations
− The choice of compiler (e.g. GNU C++ versus Intel C++)
− The compiler switches (e.g. turning optimization on/off)
− The operating system
− The type of filesystem holding the input data (e.g. EXT4, NTFS, etc.)
− The time of day... (different workloads, network traffic, etc.)

<C> G. Barlas, 2016  12


Speedup - Conditions

! One should abide by the following rules:
− Both the sequential and the parallel programs should be tested on identical software and hardware platforms, and under similar conditions.
− The sequential program should be the fastest known solution to the problem at hand.
! Why is the second condition there?
! Shouldn't we compare against the sequential version of the parallel program instead?

<C> G. Barlas, 2016 13

Efficiency
! Speedup tells only part of the story: it can tell us if it is feasible to accelerate the solution of a problem, e.g. if speedup > 1.
! It cannot tell us if this can be done with a modest amount of resources.
! The second metric employed for this purpose is efficiency, defined as:

  efficiency = speedup / N = t_seq / (N · t_par)

  where N is the number of CPUs/cores employed for the execution of the parallel program.
! One can interpret the efficiency as the average percentage of time that a node is utilized during the parallel execution.
! When speedup = N, the corresponding parallel program exhibits what is called linear speedup.
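A small worked example with made-up numbers (not from the slides): if t_seq = 10 s and a run on N = 8 cores takes t_par = 2 s, then speedup = 10 / 2 = 5 and efficiency = 5 / 8 = 62.5%, i.e. on average each core does useful work for about 62.5% of the parallel execution.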

<C> G. Barlas, 2016 14


Efficiency Example
! Speedup and efficiency curves for a sample program that
calculates the definite integral of a function by applying the
trapezoidal rule algorithm.

Hint concerning the shown curves (Efficiency Example)

There is a discrepancy here for the cautious student:
If we only have a quad-core CPU like the Intel i7, how can we test and report speedup for 8 threads?
Hyperthreading!

What is a Thread?

A sequence of instructions within a process that is managed by the operating system scheduler as a separate unit.
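A minimal illustration in standard C++ (not part of the original slide): a process starts a second thread, and the operating system scheduler manages the two instruction streams independently. Build with -pthread on Linux.

#include <iostream>
#include <thread>

int main()
{
    // The worker thread and the main thread are scheduled independently
    // by the operating system.
    std::thread worker([] { std::cout << "hello from the worker thread\n"; });
    std::cout << "hello from the main thread\n";
    worker.join();   // wait for the worker to finish
}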

<C> G. Barlas, 2016 16



Efficiency Milestones

! 2005 : first single-die dual-core CPU (AMD Athlon)
! 2007 : first heterogeneous CPU : Cell BE
! Mid 2000s : introduction of the GPGPU paradigm.
! GPUs offer distinct advantages:
− Bulk computation power
− High FLOP/Watt ratio

<C> G. Barlas, 2016 17

Speedup-Efficiency Considerations

! How could we calculate efficiency, if the sequential and parallel programs run on different platforms? For example, a CPU and a GPU respectively.
! Caution for testing: make sure you are using real hardware resources.

<C> G. Barlas, 2016 18


Speedup-Efficiency Considerations cont.

! Is it possible to get:
  speedup > N
  efficiency > 100% ?
! This is the so-called super-linear speedup scenario.
! It can be caused by using a different algorithm, one which cannot be used on a single processor, e.g. the acquisition of an item in a search space.

<C> G. Barlas, 2016 19

Super-linear Speedup Example


! Let us consider the problem of breaking DES. In the DES encryption standard, a secret number in the range [0, 2^56 − 1] is used as the key to encrypt a message.
! A brute-force attack on a ciphertext would involve trying out all the keys until the decoded message could be identified as readable text. If we assume that each attempt to decipher the message costs time T on a single CPU, and the key was the number 2^55, then a sequential program would take t_seq = (2^55 + 1) T time to solve the problem.
! If we were to employ 2 CPUs to solve the same problem, and we partitioned the search space of 2^56 keys equally among the 2 CPUs, i.e. range [0, 2^55 − 1] to the first one and range [2^55, 2^56 − 1] to the second one, then the key would be found after only one attempt by the second CPU! We would then have:

  speedup = t_seq / t_par = (2^55 + 1) T / T = 2^55 + 1

! What would happen if we increased the number of CPUs? Would we get a further reduction in run time?
! What would happen if the secret key was 2?
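A sketch of the partitioning idea above in C++ (illustrative only: a trivial comparison stands in for the actual DES decryption test, and the key space is scaled down to 2^24 so the toy program finishes quickly):

#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    const std::uint64_t keySpace  = 1ull << 24;   // toy key space (2^24, not 2^56)
    const std::uint64_t secretKey = keySpace / 2; // worst case for a sequential search
    const unsigned      nWorkers  = 2;

    std::atomic<bool>          found(false);
    std::atomic<std::uint64_t> answer(0);
    std::vector<std::thread>   workers;

    // Each worker scans an equal share of the key space and stops as soon
    // as any worker reports success.
    for (unsigned w = 0; w < nWorkers; ++w) {
        std::uint64_t lo = w * (keySpace / nWorkers);
        std::uint64_t hi = (w + 1) * (keySpace / nWorkers);
        workers.emplace_back([&, lo, hi] {
            for (std::uint64_t k = lo; k < hi && !found.load(); ++k)
                if (k == secretKey) {            // stand-in for "decrypts to readable text"
                    answer.store(k);
                    found.store(true);
                }
        });
    }
    for (auto& t : workers) t.join();
    std::cout << "key found: " << answer.load() << "\n";
}

With two workers, the second worker's sub-range starts exactly at the secret key, so it succeeds on its first attempt, mirroring the super-linear case described above.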

<C> G. Barlas, 2016 20


Scalability

! Speedup covers the efficacy of a parallel solution: is it beneficial or not?
! Efficiency is a measure of resource utilization: how much of the potential afforded by the computing resources we commit is actually used?
! Finally, we would like to know how a parallel algorithm behaves with increased computational resources and/or problem sizes: does it scale?
! Scalability is the ability of a (software or hardware) system to efficiently handle a growing amount of work.

<C> G. Barlas, 2016 21

Scalability, cont.

! In the context of a parallel algorithm and/or platform, scalability translates to being able to
  (a) solve bigger problems and/or
  (b) incorporate more computing resources.
! To measure (a) we use the weak scaling efficiency:

  weak scaling efficiency = t_seq / t'_par

  where t'_par is the time to solve an N-times bigger problem than the one the single-processor machine solves in t_seq.
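A worked example with made-up numbers: if one processor solves a problem of size n in t_seq = 10 s, and N = 4 processors solve a problem of size 4n in t'_par = 12.5 s, the weak scaling efficiency is 10 / 12.5 = 80%.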

<C> G. Barlas, 2016 22


Scalability, cont.

! To measure (b) we use the strong scaling efficiency:

  strong scaling efficiency = t_seq / (N · t_par)

  which is the same as the efficiency discussed earlier.
! The one that is most challenging to improve is the strong scaling efficiency.

<C> G. Barlas, 2016 23

Predicting and Measuring Parallel Program Performance

! The development of a parallel solution to a problem starts with the development of its sequential variant!
! Questions that need to be answered:
− Which parts of the sequential program are the greatest consumers of computational power?
− What is the potential speedup?
− Is the parallel program correct?
! The development of the sequential algorithm and associated program can also provide essential insights about the design that should be pursued for parallelization.

<C> G. Barlas, 2016 24


Predicting and Measuring Parallel Program
Performance (cont.)

! Once the sequential version is implemented, we can use a profiler to guide the design process.
! Profilers can use:
− Instrumentation: modifies the code of the program that is being profiled, so that information can be collected (usually requires re-compilation).
− Sampling: the execution of the target program is interrupted periodically, in order to query which function is being executed.
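As an illustration of the two approaches on Linux (commonly used tools, assumed here; the bucketsort program from the next slide is used as the example workload): gprof relies on compile-time instrumentation, while perf is a sampling profiler.

$ g++ -pg -O2 -o bucketsort bucketsort.cpp   # instrument at compile time
$ ./bucketsort 10000                         # the run writes gmon.out
$ gprof ./bucketsort gmon.out                # per-function time report

$ perf record ./bucketsort 10000             # periodic sampling, no re-compilation needed
$ perf report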

<C> G. Barlas, 2016 25

Profiler Example
$ valgrind --tool=callgrind ./bucketsort 10000

<C> G. Barlas, 2016 26

See: https://www.valgrind.org/
Experimentation Guidelines
! The duration of the whole execution should be measured, unless specifically stated otherwise.
! Results should be reported in the form of averages over multiple runs, possibly including standard deviations.
! Outliers, i.e. results that are too big or too small, should be excluded from the calculation of the averages, as they are typically the expression of an anomaly. However, care should be taken so that unfavorable results are explained rather than simply brushed away.
! Scalability is paramount, so results should be reported for a variety of input sizes (ideally covering the size and/or quality of real-life data), and a variety of parallel platform sizes.
! Test inputs should vary from very small to very big, but they should always include problem sizes that would be typical of a production environment, if these can be identified.
! When multicore machines are employed, the number of threads and/or processes should not exceed the number of available hardware cores. (Disable simultaneous multithreading, i.e. hyper-threading!)
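A minimal C++ sketch of the "averages over multiple runs" guideline (illustrative; run_once is a hypothetical stand-in for the program being benchmarked):

#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

void run_once() { /* ... the computation being benchmarked ... */ }

int main()
{
    const int runs = 10;
    std::vector<double> t(runs);

    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        run_once();
        auto stop = std::chrono::steady_clock::now();
        t[i] = std::chrono::duration<double>(stop - start).count();
    }

    // Mean and standard deviation over all runs.
    double mean = 0.0;
    for (double x : t) mean += x;
    mean /= runs;

    double var = 0.0;
    for (double x : t) var += (x - mean) * (x - mean);
    double stddev = std::sqrt(var / runs);

    std::cout << "mean = " << mean << " s, stddev = " << stddev << " s\n";
}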

<C> G. Barlas, 2016 27

Amdahl's Law
! In 1967 Gene Amdahl formulated a simple thought experiment:
− A sequential application requires T time to execute on a single CPU.
− The application consists of a part α (0 ≤ α ≤ 1) that can be parallelized. The remaining 1 − α has to be done sequentially.
− Parallel execution incurs no communication overhead, and the parallelizable part can be divided evenly among any chosen number of CPUs. This assumption suits multicore architectures particularly well, where cores have access to the same shared memory.
! Given the above assumptions, the speedup obtained by N nodes should be upper-bounded by:

  speedup ≤ T / ((1 − α) T + α T / N) = 1 / ((1 − α) + α / N)
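A quick worked example with illustrative numbers: for α = 0.9 (90% of the work parallelizable) and N = 8 cores, the bound is 1 / (0.1 + 0.9 / 8) = 1 / 0.2125 ≈ 4.7, i.e. well below 8.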

<C> G. Barlas, 2016 28


Amdahl's Predictions
! How much faster can we go?
! What if we had infinite resources?

  lim (N → ∞) speedup = 1 / (1 − α)
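For example (illustrative values): with α = 0.9 the speedup can never exceed 1 / 0.1 = 10, and even with α = 0.95 it is capped at 20, no matter how many cores are added.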

<C> G. Barlas, 2016 29

Amdahl's Predictions (2)

<C> G. Barlas, 2016 30
„An army of ants versus a herd of elephants“

! What is the best investment (in mainframe/minicomputer time)?
! Assume that we have a program that can execute in time T_A on a single powerful CPU and in time T_B on a less powerful, inexpensive CPU.
! We can declare, based on the execution time, that CPU A is

  r = T_B / T_A

  times faster than B.
! If we can afford to buy N_B CPUs of the inexpensive type, the best speedup we can get relative to the execution on a single CPU of type A is:

  speedup = T_A / (T_B ((1 − α) + α / N_B)) = 1 / (r ((1 − α) + α / N_B))
<C> G. Barlas, 2016 31

„An army of ants versus a herd of elephants“

! For infinite N_B, we can get the absolute upper bound on the speedup:

  lim (N_B → ∞) speedup = 1 / (r (1 − α))

  which means that the speedup will never be above 1, no matter how many „ants“ you use, if r (1 − α) ≥ 1.
! So if α = 90% and r = 10, then 1 / (r (1 − α)) = 1 / (10 · 0.1) = 1, and we are better off using a single expensive CPU than going the parallel route with inexpensive components.
! BUT, math does not tell a true story all the time!

<C> G. Barlas, 2016 32


Gustafson-Barsis' rebuttal (1988)

! Empirical data shows: parallel programs routinely exceed the speedup limits predicted by Amdahl's law.
! The key to understanding the fundamental error in Amdahl's law is problem size.
! Do we solve the same size of problems with parallel machines as with sequential ones?
! Assume that we have:
− A parallel application that requires T time to execute on N CPUs.
− The application spends a fraction α of the total time running in parallel on all N machines. The remaining fraction (1 − α) has to be done sequentially.

<C> G. Barlas, 2016 33

Gustafson-Barsis' rebuttal (1988) cont.


! A sequential machine would require a total time:

  t_seq = (1 − α) T + N α T

! The speedup would then be:

  speedup = t_seq / T = (1 − α) + α N

! And the corresponding efficiency:

  efficiency = speedup / N = (1 − α) / N + α

! The efficiency has a lower bound of α, as N goes to infinity.
! Gustafson-Barsis' speedup curves are worlds apart from Amdahl's curves! (see next slides)
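A worked contrast with illustrative numbers: for α = 0.9 and N = 8, Gustafson-Barsis predicts a scaled speedup of 0.1 + 0.9 · 8 = 7.3, whereas Amdahl's bound for the same α and N is only about 4.7 (see the earlier example).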

<C> G. Barlas, 2016 34


Gustafson-Barsis' Predictions

<C> G. Barlas, 2016 35

Gustafson-Barsis' Predictions

<C> G. Barlas, 2016 36


Which one is right?

! Actually..... none!
! The assumption of zero communication overhead is too optimistic, even with shared-memory platforms.
! Scenarios where coordination is completely absent even for cores sharing the same memory space are not typical.
! There are two things we can keep from the discussion:
− There is room for optimism in parallel computing.
− Simple models can help us predict performance before actually building the software.

<C> G. Barlas, 2016 37
