
Chapter 7: Introduction to Multi-processing

Ngo Lam Trung

[with materials from Computer Organization and Design, Patterson & Hennessy, and Computer Organization and Architecture, William Stallings]



Introduction
❑ Overall goal: increasing performance
For a single large program
For a large number of small, independent programs
With energy efficiency
❑ Approaches in previous chapters?
Pipelining
Superscalar execution
Multithreading
→ all of these increase the performance of a single CPU core



Difficulty in parallelism
❑ The difficulty with parallelism is not the hardware!
❑ It is difficult to write software that uses multiple processors to complete one task faster, and the problem gets worse as the number of processors increases.
Conventional algorithms have been designed to be sequential
Dividing the job among multiple processors may incur large communication overhead



Multiprocessing
❑ Pollack's Rule:
“Performance is roughly proportional to square root of increase in complexity”
→ doubling the logic in a processor core delivers only about 40% more performance, since √2 ≈ 1.4

❑ The use of multiple cores has the potential to provide near-linear performance improvement with the increase in the number of cores



Multiprocessing
❑ Replacing large, inefficient processors with many smaller, efficient processors
➔ better performance per joule
➔ improved energy efficiency joins scalable performance

❑ Task-level/process-level parallelism: multiple independent tasks running on multiple processors
❑ Parallel processing program: a single program running on multiple processors simultaneously
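As a concrete illustration of process-level parallelism, here is a minimal Python sketch (ours, not from the slides); the worker function task and the pool size are assumptions for the example:

from multiprocessing import Pool

def task(n):
    # an independent unit of work: no data shared with other tasks
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # one worker process per core; 4 is an assumption about the machine
    with Pool(processes=4) as pool:
        print(pool.map(task, [10**5] * 8))  # 8 independent tasks on 4 cores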



Encountering Amdahl’s Law
❑ Speedup due to enhancement E is
Exec time w/o E
Speedup w/ E = ----------------------
Exec time w/ E
❑ Suppose that enhancement E accelerates a fraction F
(F <1) of the task by a factor S (S>1) and the remainder
of the task is unaffected

ExTime w/ E = ExTime w/o E × ((1-F) + F/S)

Speedup w/ E = 1 / ((1-F) + F/S)
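These two formulas reduce to a one-line helper; the following Python sketch (ours, not from the slides) is reused in the worked examples on the next slides:

def speedup(F, S):
    # Amdahl's Law: F = fraction of the task that is enhanced,
    # S = speedup factor of that fraction
    return 1.0 / ((1.0 - F) + F / S)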



Example 1: Amdahl’s Law
Speedup w/ E = 1 / ((1-F) + F/S)
❑ Consider an enhancement which runs 20 times faster
but which is only usable 25% of the time.
Speedup w/ E = 1/(.75 + .25/20) = 1.31

❑ What if it's usable only 15% of the time?
Speedup w/ E = 1/(.85 + .15/20) = 1.17

❑ To achieve linear speedup with 100 processors, none of the original computation can be sequential!
❑ To get a speedup of 90 from 100 processors, the fraction of the original program that remains sequential must be 0.1% or less
Speedup w/ E = 1/(.001 + .999/100) = 90.99
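Using the speedup helper sketched after the formula slide, all three results above can be reproduced:

print(round(speedup(0.25, 20), 2))    # 1.31: enhancement usable 25% of the time
print(round(speedup(0.15, 20), 2))    # 1.17: usable only 15% of the time
print(round(speedup(0.999, 100), 2))  # 90.99: 99.9% parallel, 100 processors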



Example 2: Amdahl’s Law
Speedup w/ E = 1 / ((1-F) + F/S)
❑ Consider summing 10 scalar variables and two 10x10
matrices (matrix sum) on 10 processors
Speedup w/ E = 1/(.091 + .909/10) = 1/0.1819 = 5.5

❑ What if there are 100 processors?
Speedup w/ E = 1/(.091 + .909/100) = 1/0.10009 = 10.0

❑ What if the matrices are 100x100 (or 10,010 adds in total) on 10 processors?
Speedup w/ E = 1/(.001 + .999/10) = 1/0.1009 = 9.9

❑ What if the matrices are 100x100 and there are 100 processors?
Speedup w/ E = 1/(.001 + .999/100) = 1/0.01099 = 91
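The fractions above come from operation counts: the 10 scalar adds stay sequential while the matrix adds parallelize. A minimal check, reusing the speedup helper from earlier:

# (parallelizable matrix adds, processor count) for the four cases above
for adds, procs in [(100, 10), (100, 100), (10000, 10), (10000, 100)]:
    F = adds / (adds + 10)  # parallelizable fraction of all adds
    print(adds, procs, round(speedup(F, procs), 1))
# prints 5.5, 10.0, 9.9 and 91.0, matching the slide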
Scaling
❑ Getting good speedup on a multiprocessor while keeping the problem size fixed is harder than getting good speedup by increasing the size of the problem.
Strong scaling – speedup achieved on a multiprocessor without increasing the size of the problem
Weak scaling – speedup achieved on a multiprocessor by increasing the size of the problem proportionally to the increase in the number of processors (both regimes are contrasted in the sketch below)

❑ Load balancing is another important factor.
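Example 2 is really a scaling experiment in disguise; this sketch (our framing, not from the slides) separates the two regimes using the same speedup helper:

# Strong scaling: the 10x10 problem stays fixed as processors grow
for p in (10, 100):
    print("strong", p, round(speedup(100 / 110, p), 1))  # 5.5, then only 10.0

# Weak scaling: the problem grows with the processors (10x10 -> 100x100)
for adds, p in [(100, 10), (10000, 100)]:
    print("weak", p, round(speedup(adds / (adds + 10), p), 1))  # 5.5, then 91.0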



Multiprocessor Key Questions

❑ Q1 – How do they share data?

❑ Q2 – How do they coordinate?

❑ Q3 – How scalable is the architecture? How many processors can be supported?



Types of Parallel Processor Systems

[Figure: taxonomy of parallel processor architectures]


Types of Parallel Processor Systems
❑ Single instruction, single data (SISD) stream: A single
processor executes a single instruction stream to operate
on data stored in a single memory.
❑ Single instruction, multiple data (SIMD) stream: A single
machine instruction controls the simultaneous execution
of a number of processing elements
❑ Multiple instruction, single data (MISD) stream: not commercially implemented
❑ Multiple instruction, multiple data (MIMD) stream: A set
of processors simultaneously execute different
instruction sequences on different data sets.



SISD

❑ CU: Control Unit
❑ PU: Processing Unit
❑ MU: Memory Unit
❑ Sequential execution
❑ Data stored in a single main memory
❑ ➔ Uniprocessor computer



SIMD

❑ One instruction stream
❑ Multiple processing units
❑ Each PU processes data from a separate memory
❑ All PUs execute the same instruction stream from the CU
❑ Example: GPU
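As a rough software-level analogy (ours, not from the slides): NumPy's vectorized operations express one operation over many data elements, and on most CPUs they execute through SIMD instructions such as SSE or AVX:

import numpy as np

a = np.arange(1_000_000, dtype=np.float32)
b = np.ones(1_000_000, dtype=np.float32)
c = a + b  # one logical instruction applied to a million data elements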



MIMD
❑ Multiple instruction, multiple data
❑ Requires multiple CUs and PUs
❑ Shared or distributed memory

[Figures: MIMD with shared memory; MIMD with distributed memory]
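A tiny shared-memory MIMD sketch in Python (ours, not from the slides); note that CPython's GIL serializes bytecode execution, so this illustrates the programming model rather than true hardware parallelism:

import threading

shared = [0, 0]  # memory visible to both threads

def stream_a():  # one instruction stream working on its own data
    shared[0] = sum(range(1000))

def stream_b():  # a different instruction stream on different data
    shared[1] = max(range(1000, 2000))

t1 = threading.Thread(target=stream_a)
t2 = threading.Thread(target=stream_b)
t1.start(); t2.start()
t1.join(); t2.join()
print(shared)  # both results land in the same shared memory: [499500, 1999]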



Symmetric Multiprocessor (SMP)
❑ Two or more similar processors of comparable capability
❑ These processors share the same main memory and I/O
facilities
❑ All processors share access to I/O devices
❑ All processors can perform the same functions
(symmetric)
❑ The system is controlled by an integrated operating
system
Provides interaction between processors and system resources



Symmetric Multiprocessor Organization



Cluster
❑ Group of interconnected, whole computers working together as a unified computing resource. Each computer in a cluster is typically referred to as a node.

❑ Absolute scalability
❑ Incremental scalability
❑ High availability
❑ Superior price/performance



Cluster Computer Architecture



SMP vs cluster
❑ Both are high-performance computer architectures
❑ SMP
Easy to use and maintain
Closer to a uniprocessor system
Small size and low power consumption

❑ Cluster
High computing capability
Scalability
High dependability and availability



SMP vs cluster
❑ SMP
Limited scalability

❑ Cluster
Separate memory on each node
Complicated software

➔ NUMA combines the advantages of SMP and cluster



Cache-Coherent NUMA (CC-NUMA)



The end

