Chapter 01

The document discusses parallel processing, emphasizing its necessity due to computational demands and the evolution of practical parallel processing from early machines. It contrasts pipeline and data parallelism, explores scalability, and details the Sieve of Eratosthenes algorithm in both control and data parallel approaches. Additionally, it addresses the speedup achievable through parallel execution and the impact of the number of processors on performance.


Chapter 1: Introduction

 Why Parallel Processing?
 Advent of Practical Parallel Processing.
 Parallel Processing Terminology.
 Contrasting Pipeline and Data Parallelism.
 Control Parallelism.
 Scalability.
 The Sieve of Eratosthenes (Sieve Technique).
 Control Parallel Approach.
 Data Parallel Approach.
Why Parallel Processing?
 Because of the computational demand.
Advent of Practical Parallel Processing
Early machines in the 70's & 80's in the laboratories:
 ILLIAC IV at Burroughs Corporation (70's).
 Cm* & C.mmp at Carnegie-Mellon University (70's).
 Cosmic Cube at Caltech (80's), later commercialized by nCUBE.

 The performance of a single processor can be improved by either architectural or technological advances.
 Architecture: by increasing the amount of work performed per instruction cycle.
 Technological advances: by reducing the time needed per instruction cycle.
Parallel Processing Terminology
 Most high-performance modern computers exhibit concurrency, but it is not desirable to call them all parallel computers.
 Parallel Processing: information (data) processing that emphasizes the concurrent manipulation of data elements belonging to one or more processes solving a single problem.
 Parallel Computer: a multiple-processor computer capable of parallel processing.
 Super Computer: a general-purpose computer capable of solving individual problems at extremely high speed. A super computer is a parallel computer.
 Throughput: the number of results per unit of time.
 Pipeline: a computation divided into steps/segments/stages.
 Data Parallelism: the use of multiple functional units to apply the same operation simultaneously to elements of a data set. Ex: parallel addition.
 Speedup: the ratio between the time needed to perform a computation using the sequential algorithm and the time needed to perform the same computation on a parallel machine.
Contrasting Pipeline and Data Parallelism
Ex: a copy machine whose job consists of three one-second stages A, B, C.

1. Sequential: a single machine copies one paper at a time, passing it through stages A, B, C.

2. Pipeline (control parallelism): three units, one per stage; papers move through A, B, C in assembly-line fashion, so a new paper finishes every second once the pipeline is full.

3. Data parallelism: three complete machines, each copying a different paper at the same time.

 To copy 4 papers sequentially takes 12 seconds.
 Using the pipeline, it takes 3+1+1+1 = 6 seconds.
 Using data parallelism with 3 functional units, it takes 3+3 = 6 seconds.
Ex:

Papers   Pipeline   Parallel
  1         3          3
  …         …          …
  4         6          6
  …         …          …
  7         9          9
  …         …          …
 10        12         12

A three-way data parallel machine produces 3 papers every 3 units of time.
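The timings in the table can be reproduced with simple formulas. This is a sketch; the closed forms below are inferred from the example numbers (3-second copies, 3 stages, 3 units), not stated explicitly in the slides:

```python
import math

def sequential_time(papers, per_copy=3):
    """One machine, 3 time units per paper."""
    return papers * per_copy

def pipeline_time(papers, stages=3):
    """3 one-unit stages: fill the pipe, then one paper finishes per unit."""
    return stages + (papers - 1)

def data_parallel_time(papers, units=3, per_copy=3):
    """3 complete copiers: each batch of 3 papers takes one full copy time."""
    return per_copy * math.ceil(papers / units)

for w in (1, 4, 7, 10):
    print(w, sequential_time(w), pipeline_time(w), data_parallel_time(w))
```

For 4 papers this gives 12, 6, and 6 time units, matching the example above.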


 Speedup of the pipeline when papers = 4 is 12/6 = 2.
 Speedup of data parallelism when papers = 4 is 12/6 = 2.

 When the number of papers is very large, the speedup of the pipeline machine is the same as that of the data-parallel machine, given that the number of functional units equals the number of stages.
 Pipelining is achieved by applying different operations to different data elements simultaneously, which is called control parallelism.
Scalability
 An architecture is scalable if it continues to yield the same performance per processor as the number of processors increases.
 An algorithm is scalable if the level of parallelism increases at least linearly with the problem size.

 Control Parallelism (pipelining): applying different operations to different data simultaneously.
 Data Parallelism: applying the same operation, using multiple functional units, to different data simultaneously.
The Sieve of Eratosthenes (Sieve Technique)
(The classic prime-finding algorithm)
 We want to find the number of primes less than or equal to some positive integer n.
a) Begin with the list of natural numbers 2, 3, 4, …, n.
b) Strike out the multiples of 2, 3, 5, and successive primes.
c) Terminate after the multiples of the largest prime ≤ √n have been struck.
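The three steps above can be sketched as a minimal sequential implementation in Python:

```python
def sieve(n):
    """Classic Sieve of Eratosthenes: return the primes <= n."""
    marked = [False] * (n + 1)                   # marked[i] True => i is composite
    k = 2
    while k * k <= n:                            # stop once k exceeds sqrt(n)
        for multiple in range(k * k, n + 1, k):  # strike multiples, starting at k^2
            marked[multiple] = True
        k += 1
        while marked[k]:                         # advance to the next unmarked value
            k += 1
    return [i for i in range(2, n + 1) if not marked[i]]

print(sieve(30))   # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Striking begins at k², since all smaller multiples of k were already struck by smaller primes.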
Sequential Implementation

(Figure: seven snapshots (a)–(g) of the list 2–30; each pass strikes the multiples of the next remaining prime: first 2, then 3, then 5.)

√30 ≈ 5.5
5 is the largest prime ≤ 5.5  STOP

(Figure: the array 2, 3, 4, …, n-1, n, together with a variable holding the current prime and a variable holding the loop index.)

The sequential algorithm maintains an array of natural numbers, a variable storing the current prime, and a variable storing the index of the loop iterating through the array of natural numbers.
Control Parallel Approach
 Algorithm:
Every processor repeatedly goes through the two-step process of finding the next prime number and striking from the list the multiples of that prime, beginning with its square.

(Figure: processors P1, P2, P3, each with its own index, share a memory holding the current prime and the array 2, 3, 4, …, n-1, n.)

 Algorithm (example):
One processor will be responsible for marking the multiples of 2, beginning with 4. While this processor marks multiples of 2, another may be marking multiples of 3 beginning with 9, and so on.

Two problems might occur:
a) Two or more processors may end up sieving multiples of the same prime.
b) A processor may end up sieving multiples of a composite number.
Analysis
(Striking starts from each prime's square.)
a) Consider the time taken by the sequential algorithm:
⌈(n-3)/2⌉ = number of steps needed to sieve the multiples of 2.
⌈(n-8)/3⌉ = number of steps needed to sieve the multiples of 3.

The total number of steps needed to sieve all the primes p_1 = 2, p_2 = 3, …, p_k (the largest prime ≤ √n) is

⌈(n+1-p_1²)/p_1⌉ + ⌈(n+1-p_2²)/p_2⌉ + … + ⌈(n+1-p_k²)/p_k⌉
= ⌈(n-3)/2⌉ + ⌈(n-8)/3⌉ + ⌈(n-24)/5⌉ + … + ⌈(n+1-p_k²)/p_k⌉
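The step-count formula can be checked against a direct count of strike operations. A short sketch (helper names are illustrative):

```python
import math

def primes_upto(m):
    """Small helper: the primes <= m (trial division is fine at this scale)."""
    return [k for k in range(2, m + 1)
            if all(k % d for d in range(2, math.isqrt(k) + 1))]

def steps_formula(n):
    """Closed form: sum of ceil((n+1-k^2)/k) over primes k <= sqrt(n)."""
    return sum(math.ceil((n + 1 - k * k) / k) for k in primes_upto(math.isqrt(n)))

def steps_counted(n):
    """Direct count: the multiples of each prime k from k^2 up to n."""
    return sum(len(range(k * k, n + 1, k)) for k in primes_upto(math.isqrt(n)))

print(steps_formula(30), steps_counted(30))   # → 24 24
```

For n = 30 both give 14 + 8 + 2 = 24 steps (primes 2, 3, 5), agreeing with the snapshots shown earlier.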

b) Consider the time taken by the parallel algorithm:
 Note that the parallel speedup will not increase if more than 3 processors are used.
 An upper bound on the speedup of the parallel algorithm for n = 1000 is 2.83.

(Figure: time lines from 0 to 1500 showing, for n = 1000, the striking work on (a) one processor, (b) two processors, and (c) three processors, distributed over the sieving primes 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31.)

 Time for one processor is 1411.
 Time for two processors is 706; therefore the speedup is 1411/706 = 2.
 Time for three processors is 499; therefore the speedup is 1411/499 = 2.83.
Data Parallel Approach
a) Algorithm:
The processors work together to strike the multiples of each newly found prime. Every processor is responsible for a segment of the array representing the natural numbers.

H.W.: Compute the time and speedup for p = 1, 2, …, 10 and find the relation between the speedup and the number of processors.

b) Consider a different model of parallel computation:
There is no shared memory; interaction occurs through message passing.

(Figure: processors P1, P2, …, Pp, each with its own copy of the current prime and its own index; P1 holds the numbers 2 … n/p, P2 holds n/p+1 … 2n/p, and Pp holds (p-1)n/p+1 … n.)

 Assume we solve the problem with p processors.
 Every processor is assigned no more than ⌈n/p⌉ natural numbers.
 Assume p is much less than √n, so that all the sieving primes (≤ √n) lie in the segment assigned to the first processor.

Algorithm:
 Processor 1 finds the next prime and broadcasts it to the other processors.
 Then all processors strike from their lists all multiples of the newly found prime, and so on.
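The find–broadcast–strike cycle above can be simulated sequentially. This is a toy sketch of the message-passing scheme; function and variable names are illustrative, not from the slides:

```python
import math

def message_passing_sieve(n, p):
    """Simulate the segmented sieve: returns (primes found, messages sent)."""
    seg = math.ceil(n / p)                # each processor owns <= ceil(n/p) numbers
    assert seg >= math.isqrt(n), "all sieving primes must lie in P1's segment"
    marked = [False] * (n + 1)
    messages = 0
    k = 2
    while k * k <= n:
        messages += p - 1                 # P1 broadcasts k to the other p-1 processors
        for i in range(p):                # every processor strikes inside its segment
            lo, hi = i * seg + 1, min((i + 1) * seg, n)
            first = max(k * k, ((lo + k - 1) // k) * k)  # first multiple of k >= lo
            for m in range(first, hi + 1, k):
                marked[m] = True
        k += 1
        while marked[k]:                  # P1 scans its segment for the next prime
            k += 1
    return [i for i in range(2, n + 1) if not marked[i]], messages

primes, msgs = message_passing_sieve(1000, 4)
print(len(primes), msgs)   # → 168 33  (K(p-1) = 11*3 broadcast messages)
```

The message count matches the K(p-1) communication term derived below: for n = 1000 there are K = 11 sieving primes ≤ 31, each sent to p-1 = 3 other processors.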
Analysis
 We focus on the time spent on:
  Marking composite numbers.
  Communicating the current prime from P1 to the rest.
 Assume it takes χ time units for a processor to mark one multiple of a prime.

 The total amount of time a processor spends striking out composite numbers is no greater than

χ(⌈⌈n/p⌉/2⌉ + ⌈⌈n/p⌉/3⌉ + … + ⌈⌈n/p⌉/p_k⌉)

where p_k is the largest prime ≤ √n.

 Communication:
Assume a processor spends λ time units each time it passes a number to another processor. The total communication time for all K primes is

K(p-1)λ

K: the number of sieving primes
(p-1): the number of receiving processors
λ: the per-message communication time
Example
 n = 1,000,000.
 There are 168 primes < 1,000 = √1,000,000.
 The largest such prime is 997.
 The maximum possible execution time spent striking out composites is

χ(⌈⌈1,000,000/p⌉/2⌉ + ⌈⌈1,000,000/p⌉/3⌉ + … + ⌈⌈1,000,000/p⌉/997⌉)

 The total communication time is 168(p-1)λ.
 Assume the relation between χ and λ is λ = 100χ.
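The striking and communication terms pull in opposite directions as p grows, so the total time has a minimum. A sketch evaluating the total (in units of χ, with λ = 100χ as assumed above):

```python
import math

def primes_upto(m):
    """Primes <= m via a quick sieve."""
    marked = [False] * (m + 1)
    for k in range(2, math.isqrt(m) + 1):
        if not marked[k]:
            for j in range(k * k, m + 1, k):
                marked[j] = True
    return [i for i in range(2, m + 1) if not marked[i]]

SIEVING = primes_upto(1000)              # the 168 primes below sqrt(1,000,000)

def total_time(p, n=1_000_000, lam=100):
    """Total time in units of chi: striking plus 168*(p-1)*lambda communication."""
    striking = sum(math.ceil(math.ceil(n / p) / q) for q in SIEVING)
    communication = len(SIEVING) * (p - 1) * lam
    return striking + communication

times = {p: total_time(p) for p in range(1, 21)}
best = min(times, key=times.get)
print(best, times[best])
```

The minimum falls near p = 11, consistent with the note below that adding processors beyond that point only adds communication cost.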

Note that the speedup declines, and the total time begins to increase, after 11 processors.
Amdahl's Law
 S ≤ 1/(f + (1-f)/p)
S: speedup.
f: the fraction of operations in a computation that must be performed sequentially, 0 ≤ f ≤ 1.
p: number of processors.
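The bound can be evaluated directly; note that as p grows, S is capped at 1/f no matter how many processors are added:

```python
def amdahl_bound(f, p):
    """Upper bound on speedup: S <= 1 / (f + (1 - f) / p)."""
    return 1.0 / (f + (1.0 - f) / p)

# If 10% of a computation is inherently sequential, 10 processors give at
# most about 5.26x speedup, and no processor count can exceed 1/f = 10x.
print(round(amdahl_bound(0.1, 10), 2))   # → 5.26
```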
Q.1-6)

Widgets   Sequential   Pipelined   Speedup
   1           3            3         1
   2           6            3         2
   3           9            3         3
   4          12            4         3
   5          15            4         3.75
   6          18            4         4.5
   7          21            5         4.2
   8          24            5         4.8
   9          27            5         5.4
  10          30            6         5

(Chart: speedup vs. number of widgets, 1–10, plotted from the table above.)
Question
 Analyze the speedup achievable by the data parallel algorithm on the shared-memory model.
 Assume it takes a unit of time for a processor to mark one multiple of a prime as a composite number.
 The total amount of time a processor spends striking out composite numbers is no greater than

⌈⌈n/p⌉/2⌉ + ⌈⌈n/p⌉/3⌉ + … + ⌈⌈n/p⌉/p_k⌉
Ex:
 n = 1000.
 The sieving primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31.
 When p = 1:

⌈⌈1000/1⌉/2⌉ + ⌈⌈1000/1⌉/3⌉ + ⌈⌈1000/1⌉/5⌉ + … + ⌈⌈1000/1⌉/31⌉
= 500+334+200+143+91+77+59+53+44+35+33
= 1569
Speedup = 1569/1569 = 1

 When p = 2:

⌈⌈1000/2⌉/2⌉ + ⌈⌈1000/2⌉/3⌉ + ⌈⌈1000/2⌉/5⌉ + … + ⌈⌈1000/2⌉/31⌉
= 250+167+100+72+46+39+30+27+22+18+17
= 788
Speedup = 1569/788 = 1.99

 When p = 3:

⌈⌈1000/3⌉/2⌉ + ⌈⌈1000/3⌉/3⌉ + ⌈⌈1000/3⌉/5⌉ + … + ⌈⌈1000/3⌉/31⌉
= 167+112+67+48+31+26+20+18+15+12+11
= 527
Speedup = 1569/527 = 2.97

 When p = 10:

⌈⌈1000/10⌉/2⌉ + ⌈⌈1000/10⌉/3⌉ + ⌈⌈1000/10⌉/5⌉ + … + ⌈⌈1000/10⌉/31⌉
= 50+34+20+15+10+8+6+6+5+4+4
= 162
Speedup = 1569/162 = 9.68
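A short sketch reproducing the p = 1, 2, 3, and 10 computations above (one ceiling term per sieving prime):

```python
import math

SIEVING = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]   # sieving primes for n = 1000

def striking_time(n, p):
    """Worst-case marking time on the shared model, one time unit per mark."""
    seg = math.ceil(n / p)                            # each processor's segment size
    return sum(math.ceil(seg / q) for q in SIEVING)

t1 = striking_time(1000, 1)
for p in range(1, 11):
    t = striking_time(1000, p)
    print(p, t, round(t1 / t, 2))                     # p, time, speedup
```

This reproduces 1569, 788, 527, and 162 time units for p = 1, 2, 3, and 10, and fills in the intermediate values asked for in the homework below.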

(Chart: speedup vs. number of processors, 1–10; the speedup grows nearly linearly.)
H.W.
 Continue for p = 4, 5, …, 9 and draw the relation between the speedup and the number of processors.
