Parallel Programming
Sathish S. Vadhiyar
Course Web Page:
http://www.serc.iisc.ernet.in/~vss/courses/PPP2007
Outline
Motivation for parallel programming
Challenges in parallel programming
Evaluating a parallel program/algorithm – speedup, efficiency, scalability analysis
Parallel Algorithm – Design, Types and Models
Parallel Architectures
Motivation for Parallel Programming
• Faster execution time by exploiting independence between regions of code
• Presents a level of modularity
• Resource constraints – e.g., large databases that exceed a single machine's resources
• Certain classes of algorithms lend themselves naturally to parallelism
• Aggregate bandwidth to memory/disk increases data throughput
• Clock rate improvement in the past decade – 40%
• Memory access time improvement in the past decade – 10%
• Grand challenge problems (more later)
Challenges / Problems in Parallel Algorithms
Building efficient algorithms means avoiding:
Communication delay
Idling
Synchronization overhead
Challenges
[Figure: execution timeline for processes P0 and P1 showing computation, communication, synchronization and idle time]
How do we evaluate a parallel program?
Execution time, Tp
Speedup, S
S(p, n) = T(1, n) / T(p, n)
Usually, S(p, n) < p
Sometimes S(p, n) > p (superlinear speedup)
Efficiency, E
E(p, n) = S(p, n)/p
Usually, E(p, n) < 1
Sometimes, greater than 1
Scalability – limitations in parallel computing, relation to n and p.
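To make the metrics concrete, here is a minimal sketch (not part of the original slides) that measures T(1, n) and T(p, n) for a simple loop with OpenMP and applies the definitions above; the loop body and problem size are arbitrary placeholders.

/* speedup.c – minimal sketch: measure T(1,n) and T(p,n), then apply
   S = T(1,n)/T(p,n) and E = S/p.
   Compile: gcc -O2 -fopenmp speedup.c -lm */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(void) {
    const long n = 50000000;
    double *a = malloc(n * sizeof(double));
    double t, t1, tp;

    t = omp_get_wtime();               /* serial run: T(1, n)   */
    for (long i = 0; i < n; i++)
        a[i] = sin((double)i);
    t1 = omp_get_wtime() - t;

    t = omp_get_wtime();               /* parallel run: T(p, n) */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = sin((double)i);
    tp = omp_get_wtime() - t;

    int p = omp_get_max_threads();
    double S = t1 / tp;                /* speedup    S(p, n) */
    double E = S / p;                  /* efficiency E(p, n) */
    printf("p = %d  T1 = %.3fs  Tp = %.3fs  S = %.2f  E = %.2f\n",
           p, t1, tp, S, E);
    free(a);
    return 0;
}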
Speedups and efficiency
[Figure: speedup S and efficiency E versus p – ideal curves (S = p) against practical curves that fall below the ideal as p grows]
Limitations on speedup – Amdahl's law
Amdahl's law states that the performance
improvement to be gained from using some
faster mode of execution is limited by the
fraction of the time the faster mode can be
used.
Overall speedup expressed in terms of the fractions of computation time with and without the enhancement, and the speedup of the enhanced portion.
Places a limit on the speedup due to parallelism:
Speedup = 1 / (fs + fp/P)
where fs is the serial fraction, fp = 1 - fs the parallel fraction, and P the number of processors.
Amdahl’s law Illustration
S = 1 / (s + (1-s)/p)
[Figure: efficiency (0 to 1) versus number of processors (0 to 15) for a fixed serial fraction s – efficiency decays as p grows]
Courtesy:
http://www.metz.supelec.fr/~dedu/docs/kohPaper/node2.html
http://nereida.deioc.ull.es/html/openmp/pdp2002/sld008.htm
Amdahl’s law analysis

f      P=1   P=4   P=8   P=16   P=32
1.00   1.0   4.00  8.00  16.00  32.00
0.99   1.0   3.88  7.48  13.91  24.43
0.98   1.0   3.77  7.02  12.31  19.75
0.96   1.0   3.57  6.25  10.00  14.29

• For the same parallel fraction f, the achieved speedup falls further behind the processor count as P grows.
• Thus Amdahl’s law is a bit depressing for parallel programming.
• In practice, the number of parallel portions of work has to be large enough to match a given number of processors.
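As a sanity check (a sketch, not from the slides), the table can be reproduced directly from the formula, taking f as the parallel fraction so that S = 1/((1-f) + f/P):

/* amdahl.c – sketch reproducing the table above */
#include <stdio.h>

int main(void) {
    double fracs[] = {1.00, 0.99, 0.98, 0.96};   /* parallel fraction f */
    int procs[] = {1, 4, 8, 16, 32};
    printf("f      P=1    P=4    P=8    P=16   P=32\n");
    for (int i = 0; i < 4; i++) {
        double f = fracs[i];
        printf("%.2f", f);
        for (int j = 0; j < 5; j++) {
            int P = procs[j];
            double S = 1.0 / ((1.0 - f) + f / P);   /* Amdahl speedup */
            printf("  %5.2f", S);
        }
        printf("\n");
    }
    return 0;
}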
Gustafson’s Law
Amdahl’s law – keep the total parallel work fixed
Gustafson’s law – keep the computation time on the parallel processors fixed, and change the fraction of parallel work to match that computation time
Serial component of the code is independent of problem size
Parallel component scales with problem size, which scales with the number of processors
Scaled Speedup: S = (Seq + Par(P)*P) / (Seq + Par(P))
where Seq is the serial time and Par(P) the parallel time per processor for the problem size matched to P processors.
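A small sketch of the scaled-speedup formula; the 5%/95% serial/parallel split is an illustrative assumption, not a figure from the lecture. Unlike Amdahl's fixed-size speedup, the scaled speedup keeps growing almost linearly with P:

/* gustafson.c – sketch of scaled speedup */
#include <stdio.h>

int main(void) {
    double seq = 0.05;   /* serial time, fixed regardless of problem size */
    double par = 0.95;   /* parallel time on each of the P processors     */
    for (int P = 1; P <= 1024; P *= 4) {
        double S = (seq + par * P) / (seq + par);   /* scaled speedup */
        printf("P = %4d  scaled speedup = %7.2f\n", P, S);
    }
    return 0;
}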
Metrics (Contd..)
[Figure omitted]
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Parallel Algorithm Types and Models
Master-Worker / parameter sweep / task farming
[Figure: master P0 farming tasks out to workers P1–P4]
Pipeline / systolic / wavefront
[Figure: data streaming through the chain P0 → P1 → P2 → P3 → P4]
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Parallel Algorithm Types and Models
Data parallel model
Processes perform identical tasks on different data
Task parallel model
Different processes perform different tasks on same
or different data – based on task dependency graph
Work pool model
Any task can be performed by any process; tasks are
added to a work pool dynamically (see the MPI sketch after this list)
Pipeline model
A stream of data passes through a chain of processes
– stream parallelism
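The following is a minimal MPI sketch of the master-worker / work pool idea above (an illustration, not code from the course): rank 0 keeps a pool of task indices and hands the next one to whichever worker returns a result first; the "work" here is just squaring an integer.

/* taskfarm.c – sketch of master-worker / work pool in MPI.
   Run e.g.: mpicc taskfarm.c -o taskfarm && mpirun -np 5 ./taskfarm */
#include <stdio.h>
#include <mpi.h>

#define NTASKS   20
#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                         /* master */
        int next = 0, recvd = 0, result;
        MPI_Status st;
        /* prime every worker with one task from the pool */
        for (int w = 1; w < size && next < NTASKS; w++, next++)
            MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
        /* collect results; hand the now-idle worker the next task */
        while (recvd < next) {
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_WORK,
                     MPI_COMM_WORLD, &st);
            recvd++;
            printf("result %d from worker %d\n", result, st.MPI_SOURCE);
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next++;
            }
        }
        int stop = 0;                        /* tell workers to quit */
        for (int w = 1; w < size; w++)
            MPI_Send(&stop, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
    } else {                                 /* worker */
        int task;
        MPI_Status st;
        while (1) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            int result = task * task;        /* the "task": square it */
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}

Because any worker can execute any task, load imbalance is absorbed automatically – the defining property of the work pool model.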
Parallel Architectures
- Classification
- Cache coherence in shared memory platforms
- Interconnection networks
Classification of Architectures – Flynn’s classification
Single Instruction Single Data (SISD): serial computers
Single Instruction Multiple Data (SIMD) – vector processors and processor arrays
Examples: CM-2, Cray C90, Cray YMP, Hitachi 3600
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification of Architectures – Flynn’s classification
Multiple Instruction Single Data (MISD): not popular
Multiple Instruction Multiple Data (MIMD) – most popular
Examples: IBM SP and most other supercomputers, clusters, computational Grids, etc.
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification of Architectures – Based on Memory
Shared memory – 2 types: UMA and NUMA
NUMA examples: HP Exemplar, SGI Origin, Sequent NUMA-Q
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification of Architectures – Based on Memory
Distributed memory
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
More recently – multi-cores
Yet another classification – MPPs, NOW (Berkeley), COW, Computational Grids
Programming Paradigms, Algorithm Types, Techniques
Shared memory model – threads, OpenMP
Message passing model – MPI
Data parallel model – HPF
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Cache Coherence in SMPs
• All processors read variable ‘x’ residing in cache line ‘a’
• Each processor updates ‘x’ at different points of time
[Figure: CPU0–CPU3, each with a private cache (cache0–cache3) holding a copy of line ‘a’, all backed by line ‘a’ in main memory]
Challenge: to maintain a consistent view of the data
Protocols:
• Write update
• Write invalidate
Cache Coherence Protocols and Implementations
Write update – propagate the updated cache line to the other processors on every write by a processor
Write invalidate – invalidate the other copies on a write; each processor gets the updated cache line whenever it next reads the stale data
Which is better??
Caches – False sharing
• Different processors update different parts of the same cache line
• Leads to ping-pong of cache lines between processors
• Situation better in update protocols than invalidate protocols. Why?
• Modify the algorithm to change the stride
[Figure: CPU0 updates A0, A2, A4, … while CPU1 updates A1, A3, A5, …; main memory holds the array as lines A0–A8 and A9–A15, so the two caches keep stealing the same lines from each other]
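A small OpenMP sketch of the effect (illustrative, not from the slides): each thread increments its own counter, but when the counters share a cache line the line ping-pongs between caches; padding each counter to a 64-byte line (an assumed line size) restores performance.

/* false_sharing.c – sketch: per-thread counters with and without padding.
   Compile: gcc -O2 -fopenmp false_sharing.c */
#include <stdio.h>
#include <omp.h>

#define ITERS 100000000L
#define MAXT  64

struct padded { volatile long v; char pad[64 - sizeof(long)]; };

int main(void) {
    static volatile long packed[MAXT];      /* adjacent counters: shared lines */
    static struct padded separate[MAXT];    /* one (assumed 64B) line each     */
    double t;

    t = omp_get_wtime();
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            packed[id]++;                   /* false sharing: neighbours' writes
                                               keep invalidating this line     */
    }
    printf("packed:  %.2f s\n", omp_get_wtime() - t);

    t = omp_get_wtime();
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        for (long i = 0; i < ITERS; i++)
            separate[id].v++;               /* no false sharing */
    }
    printf("padded:  %.2f s\n", omp_get_wtime() - t);
    return 0;
}

On most machines the packed version runs several times slower, even though no data is logically shared.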
Cache Coherence using Invalidate Protocols
3 states associated with data items
Shared – a variable shared by 2 caches
Invalid – another processor (say P0) has updated the data item
Dirty – state of the data item in P0
Implementations
Snoopy
for bus-based architectures
Memory operations are propagated over the bus and snooped by all caches
Directory-based
Instead of broadcasting memory operations to all processors, coherence operations are propagated only to the relevant processors
A central directory maintains the states of cache blocks and their associated processors
Implemented with presence bits
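The three states above suggest a simple per-line state machine. The sketch below is an illustration in the style of an MSI protocol, with details that vary across real implementations; it shows the transitions on local and snooped (remote) reads and writes.

/* msi.c – sketch of per-cache-line state transitions for an
   invalidate-based (MSI-style) protocol */
#include <stdio.h>

typedef enum { INVALID, SHARED, DIRTY } State;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } Event;

State next_state(State s, Event e) {
    switch (e) {
    case LOCAL_READ:   return (s == INVALID) ? SHARED : s; /* fetch line     */
    case LOCAL_WRITE:  return DIRTY;          /* invalidate the other copies */
    case REMOTE_READ:  return (s == DIRTY) ? SHARED : s;   /* write back     */
    case REMOTE_WRITE: return INVALID;        /* our copy is now stale       */
    }
    return s;
}

int main(void) {
    const char *name[] = {"INVALID", "SHARED", "DIRTY"};
    State s = INVALID;
    Event trace[] = {LOCAL_READ, REMOTE_WRITE, LOCAL_READ,
                     LOCAL_WRITE, REMOTE_READ};
    for (int i = 0; i < 5; i++) {
        s = next_state(s, trace[i]);
        printf("-> %s\n", name[s]);
    }
    return 0;
}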
Interconnection Networks
An interconnection network is defined by switches, links and interfaces
Switches – provide mapping between input and output ports, buffering, routing etc.
Interfaces – connect nodes with the network
Network topologies
Static – point-to-point communication links among processing nodes
Dynamic – communication links are formed dynamically by switches
Interconnection Networks
Static
Bus – SGI Challenge
Completely connected
Star
Linear array, Ring (1-D torus)
Mesh – Intel ASCI Red (2-D), Cray T3E (3-D), 2-D torus
k-d mesh: d dimensions with k nodes in each dimension
Hypercubes – a (log p)-dimensional mesh with 2 nodes in each dimension – e.g. many MIMD machines
Trees – our campus network
Dynamic – communication links are formed dynamically by switches
Crossbar – Cray X series – non-blocking network
Multistage – IBM SP2 – blocking network
Evaluating Interconnection Topologies
Diameter – maximum distance between any two processing nodes
Fully connected – 1
Star – 2
Ring – p/2
Hypercube – log p
Connectivity – multiplicity of paths between 2 nodes; the minimum number of arcs that must be removed from the network to break it into two disconnected networks
Linear array – 1
Ring – 2
2-D mesh – 2
2-D mesh with wraparound – 4
d-dimensional hypercube – d
Evaluating Interconnection Topologies
Bisection width – minimum number of links to be removed from the network to partition it into 2 equal halves
Ring – 2
p-node 2-D mesh – sqrt(p)
Tree – 1
Star – 1
Completely connected – p^2/4
Hypercube – p/2
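A tiny sketch (not from the slides) collecting the values listed above for a given p; p is assumed to be a power of 2, and a perfect square for the mesh:

/* metrics.c – sketch computing the listed diameter / bisection width
   values for p nodes.  Compile: gcc metrics.c -lm */
#include <stdio.h>
#include <math.h>

int main(void) {
    int p = 64;
    int d = (int)round(log2((double)p));        /* hypercube dimension, log p */
    printf("ring:                 diameter %2d, bisection width 2\n", p / 2);
    printf("hypercube:            diameter %2d, bisection width %d\n", d, p / 2);
    printf("2-D mesh:             bisection width %d\n", (int)round(sqrt(p)));
    printf("completely connected: bisection width %d\n", p * p / 4);
    return 0;
}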
Evaluating Interconnection Topologies
Channel width – number of bits that can be communicated simultaneously over a link, i.e. the number of physical wires between 2 nodes
Channel rate – performance of a single physical wire
Channel bandwidth – channel rate times channel width
Bisection bandwidth – maximum volume of communication between two halves of the network, i.e. bisection width times channel bandwidth
Cost – typically measured by the number of links in the network
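Putting the definitions together in a short sketch; the link parameters (32-bit-wide channels clocked at 1 GHz) and the hypercube topology are illustrative assumptions, not figures from the lecture:

/* bandwidth.c – sketch applying the definitions above */
#include <stdio.h>

int main(void) {
    int p = 64;                          /* p-node hypercube (assumed)      */
    double channel_width = 32;           /* bits per link (assumed)         */
    double channel_rate = 1e9;           /* cycles/s per wire (assumed)     */
    double channel_bw = channel_rate * channel_width;    /* bits/s per link */
    int bisection_width = p / 2;         /* hypercube: p/2 links            */
    double bisection_bw = bisection_width * channel_bw;  /* bits/s          */
    printf("channel bandwidth   = %.0f Gbit/s\n", channel_bw / 1e9);
    printf("bisection bandwidth = %.0f Gbit/s\n", bisection_bw / 1e9);
    return 0;
}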