
Lecture 7: Parallel Processing

 Introduction and motivation
 Architecture classification
 Performance of Parallel Architectures
 Interconnection Network

Performance Improvement
 Reduction of instruction execution time:
 Increased clock frequency by fast circuit technology.
 Simplified instructions (RISC).
 Parallelism within the processor:
 Pipelining.
 Parallel execution of instructions (ILP):
• Superscalar.
• VLIW architectures.
 Parallel processing.

Why Parallel Processing?
 Traditional computers are often not able to meet the performance
needs of many applications:
 Simulation of large complex systems in physics, economics, biology, etc.
 Distributed databases with search functions.
 Computer-aided design.
 Visualization and multimedia.

 Such applications are characterized by a very large amount of
numerical computation and/or a high quantity of input data.
 In order to deliver sufficient performance for such applications, we
can have many processors in a single computer.
 Parallel processing also has the potential of being more reliable: if one
processor fails, the system continues to work, at a slightly lower performance.

Parallel Computer
 Parallel computers refer to architectures in which
many CPUs are running in parallel to implement a
certain application or a set of applications.
 Such computers can be organized in very different
ways, depending on several key parameters:
 number and complexity of individual CPUs;
 availability of common (shared) memory;
 interconnection technology and topology;
 performance of interconnection network;
 I/O devices;
 etc.

Parallel Program
 In order to solve a problem using a parallel computer, one must
decompose the problem into sub-problems, which can be solved
in parallel.
 The results of the sub-problems may have to be combined to get the
final result of the main problem.
 Due to data dependency among the sub-problems, it is not easy
to decompose some problems to get a large degree of parallelism.
 Due to data dependency, the processors may also have to
communicate among each other.
 The time taken for communication is usually very high when
compared with the processing time.
 The communication mechanism must therefore be very well
designed in order to get a good performance.

Parallel Program Example (1)


 Matrix addition:

            | A11+B11   A12+B12   A13+B13   ...   A1M+B1M |
            | A21+B21   A22+B22   A23+B23   ...   A2M+B2M |
C = A + B = | A31+B31   A32+B32   A33+B33   ...   A3M+B3M |
            |   ...       ...       ...     ...     ...   |
            | AN1+BN1   AN2+BN2   AN3+BN3   ...   ANM+BNM |

 Vector computation with vectors of m elements:


for i:=1 to n do
C[i,1:m]:=A[i,1:m] + B[i,1:m];
end for;
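The row-wise loop above maps almost directly onto a shared-memory parallel computer. Below is a minimal sketch in C, assuming OpenMP is available; the pragma, the sizes N and M, and the element type double are illustrative choices, not part of the lecture.

/* Minimal sketch: every row C[i][*] = A[i][*] + B[i][*] is an independent
   sub-problem, so the N row computations can run in parallel.
   Assumes OpenMP; N, M and double are illustrative. */
#define N 1000
#define M 1000

void matrix_add(const double A[N][M], const double B[N][M], double C[N][M])
{
    #pragma omp parallel for        /* distribute the rows over the processors */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            C[i][j] = A[i][j] + B[i][j];
}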

Parallel Program Example (2)
 Parallel sorting:
[Figure: an UNSORTED array is split into four parts (Unsorted-1 ... Unsorted-4); the four parts are sorted in parallel (parallel part) into Sorted-1 ... Sorted-4, which are then merged sequentially (sequential part) into the final SORTED array.]
cobegin
sort(1,250)|
sort(251,500)|
sort(501,750)| Sorting of 1000 integers
sort(751,1000)
coend;
merge;
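The cobegin/coend structure above can be expressed with threads. Below is a minimal C sketch assuming POSIX threads and the standard library qsort; the four chunks of 250 integers mirror the pseudocode, while everything else (names, the 4-way merge) is illustrative.

/* Minimal sketch of the cobegin/coend structure: four workers sort one
   quarter each in parallel, then the quarters are merged sequentially.
   Assumes POSIX threads and qsort; filling data with input is omitted. */
#include <pthread.h>
#include <stdlib.h>

#define N      1000
#define CHUNKS 4

static int data[N];

static int cmp_int(const void *a, const void *b)
{
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

/* Parallel part: each worker sorts one quarter of the array. */
static void *sort_chunk(void *arg)
{
    int c = *(int *)arg;
    qsort(data + c * (N / CHUNKS), N / CHUNKS, sizeof(int), cmp_int);
    return NULL;
}

/* Sequential part: 4-way merge of the sorted quarters. */
static void merge_chunks(int *out)
{
    int pos[CHUNKS] = {0};
    for (int k = 0; k < N; k++) {
        int best = -1;
        for (int c = 0; c < CHUNKS; c++) {
            int i = pos[c];
            if (i < N / CHUNKS &&
                (best < 0 || data[c * (N / CHUNKS) + i] <
                             data[best * (N / CHUNKS) + pos[best]]))
                best = c;
        }
        out[k] = data[best * (N / CHUNKS) + pos[best]++];
    }
}

int main(void)
{
    pthread_t t[CHUNKS];
    int id[CHUNKS];
    int sorted[N];

    for (int c = 0; c < CHUNKS; c++) {       /* cobegin ... coend */
        id[c] = c;
        pthread_create(&t[c], NULL, sort_chunk, &id[c]);
    }
    for (int c = 0; c < CHUNKS; c++)
        pthread_join(t[c], NULL);

    merge_chunks(sorted);                    /* merge */
    return 0;
}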

Flynn’s Classification of Architectures


 Flynn’s classification (1966) is based on the nature of
the instruction flow executed by the computer and that
of the data flow on which the instructions operate.
 The multiplicity of instruction streams and data streams
gives us four different classes:
 Single instruction, single data stream - SISD
 Single instruction, multiple data stream - SIMD
 Multiple instruction, single data stream - MISD
 Multiple instruction, multiple data stream - MIMD

Single Instruction, Single Data - SISD
 A single processor
 A single instruction stream
 Data stored in a single memory

[Figure: SISD system — a CPU consisting of a control unit and a processing unit; the control unit issues a single instruction stream, and the processing unit exchanges a single data stream with memory.]

Single Instruction, Multiple Data - SIMD


 A single machine instruction stream
 Simultaneous execution on different sets of data
 A large number of processing elements
 Lockstep synchronization among the processing
elements.
 The processing elements can:
 have their respective private data memory; or
 share a common memory via an interconnection
network.

 Array and vector processors are the most common
SIMD machines.

SIMD with Shared Memory

[Figure: SIMD with shared memory — one control unit broadcasts the instruction stream (IS) to Processing Unit_1 ... Processing Unit_n; each processing unit exchanges its data stream (DS1 ... DSn) with a shared memory through an interconnection network.]

Multiple Instruction, Single Data - MISD


 A single sequence of data
 Transmitted to a set of processors
 Each processor executes a different instruction
sequence.
 Never been commercially implemented!

[Figure: MISD — the same data stream passes through the processing elements PE1, PE2, ..., PEn.]

Multiple Instruction, Multiple Data - MIMD

 A set of processors
 Simultaneously execute different instruction
sequences
 Different sets of data

 The MIMD class can be further divided:


 Shared memory (tightly coupled):
• Symmetric multiprocessor (SMP)
• Non-uniform memory access (NUMA)
 Distributed memory (loosely coupled) = Clusters

MIMD with Shared Memory


[Figure: MIMD with shared memory — CPU_1 ... CPU_n each contain their own control unit, processing unit, and local memory (LM1 ... LMn); each CPU follows its own instruction stream (IS1 ... ISn) and data stream (DS1 ... DSn), and all CPUs access a shared memory through an interconnection network.]

Cautions
 Very fast development in parallel processing and
related areas has blurred concept boundaries, causing a
lot of terminological confusion:
 concurrent computing,
 multiprocessing,
 distributed computing,
 etc.
 There is no strict delimiter for contributors to the area
of parallel processing; it includes computer architecture (CA),
operating systems (OS), high-level languages (HLLs),
compilation, databases, and computer networks.

Lecture 7: Parallel Processing

 Introduction and motivation
 Architecture classification
 Performance of Parallel Architectures
 Interconnection Network

Performance of Parallel Architectures
Important questions:

 How fast does a parallel computer run at its maximal
potential?
 How fast an execution can we expect from a parallel
computer for a given application?
 Note the increase of multi-tasking and multi-threaded
computing.
 How do we correctly measure the performance of a
parallel computer and the performance improvement
we get by using one?

Performance Metrics
 Peak rate: the maximal computation rate that can be
theoretically achieved when all processors are fully utilized.
 The peak rate is of no practical significance for the user.
 It is mostly used by vendor companies for marketing their
computers.

 Speedup: measures the gain we get by using a certain parallel
computer to run a given parallel program in order to solve a
specific problem.

S = Ts / Tp

Ts: execution time needed with the sequential algorithm;
Tp: execution time needed with the parallel algorithm.

Performance Metrics (Cont’d)
 Efficiency: this metric relates the speedup to the number of
processors used; by this it provides a measure of the efficiency
with which the processors are used.
E = S / P

S: speedup;
P: number of processors.

 For the ideal situation, in theory:
S = P, which means E = 1.
 Practically, the ideal efficiency of 1 cannot be achieved!
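The two metrics can be illustrated with a tiny C sketch; the measured times (Ts = 100 s sequential, Tp = 14 s on P = 8 processors) are invented numbers used only to show how the formulas are applied.

/* Minimal sketch of the speedup and efficiency metrics above;
   the times and processor count are illustrative, not measured. */
#include <stdio.h>

int main(void)
{
    double Ts = 100.0;   /* sequential execution time */
    double Tp = 14.0;    /* parallel execution time   */
    int    P  = 8;       /* number of processors      */

    double S = Ts / Tp;  /* speedup                   */
    double E = S / P;    /* efficiency (ideally 1)    */

    printf("S = %.2f, E = %.2f\n", S, E);
    return 0;
}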

Amdahl’s Law
 Let f be the ratio of computations that, according to the
algorithm, have to be executed sequentially (0 ≤ f ≤ 1); and P the
number of processors.

Tp = f × Ts + (1 – f) × Ts / P

S = Ts / Tp = 1 / (f + (1 – f) / P)

[Plot: speedup S as a function of the sequential fraction f (0.2 to 1.0), for a parallel computer with 10 processing elements; S drops steeply from 10 as f grows.]
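The curve in the plot can be reproduced by evaluating the formula directly. A minimal C sketch, using nothing beyond Amdahl's formula for P = 10 processing elements; the step size for f is an illustrative choice.

/* Minimal sketch: evaluate Amdahl's law S = 1 / (f + (1 - f)/P)
   for P = 10 processing elements and several sequential fractions f,
   as in the plot above. */
#include <stdio.h>

int main(void)
{
    int P = 10;
    for (double f = 0.0; f <= 1.0001; f += 0.2) {
        double S = 1.0 / (f + (1.0 - f) / P);
        double E = S / P;                     /* E = 1 / (f*(P-1) + 1) */
        printf("f = %.1f  S = %5.2f  E = %.2f\n", f, S, E);
    }
    return 0;
}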

Amdahl’s Law (Cont’d)
 Amdahl’s law says that even a small ratio of sequential
computation imposes a limit on the speedup.
 A higher speedup than 1/f cannot be achieved, regardless of the
number of processors, since

S = 1 / (f + (1 – f) / P)

If there is 20% sequential computation, the speedup will be at
most 5, even if you have 1 million processors.

 To efficiently exploit a high number of processors, f must be
small (the algorithm has to be highly parallel), since

E = S / P = 1 / (f × (P – 1) + 1)

Other Aspects that Limit the Speedup


 Besides the intrinsic sequentiality of parts of an algorithm, there
are also other factors that limit the achievable speedup:
 communication cost;
 load balancing of processors;
 costs of creating and scheduling processes; and
 I/O operations (mostly sequential in nature).

 There are many algorithms with a high degree of parallelism.
For such algorithms:
 the value of f is very small and can be ignored;
 they are suited for massively parallel systems; and
 the other limiting factors, like the cost of communication,
become critical.

Efficiency and Communication Cost
 Consider a highly parallel computation, where f is small and can be neglected.
 Let fc be the fractional communication overhead of a processor:
 Tcalc: the time that a processor executes computations;
 Tcomm: the time that a processor is idle because of communication;

fc = Tcomm / Tcalc

Tp = (Ts / P) × (1 + fc)

S = Ts / Tp = P / (1 + fc)

E = 1 / (1 + fc) ≈ 1 – fc (if fc is small)

 With algorithms having a high degree of parallelism, massively
parallel computers, consisting of a large number of processors, can
be efficiently used if fc is small.
 The time spent by a processor for communication has to be small
compared to its time for computation.
 In order to keep fc reasonably small, the size of processes can not go
below a certain limit.
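The communication-overhead formulas above can also be illustrated with a small C sketch; the values of Tcalc, Tcomm, and P are invented measurements used only to show how fc, S, and E relate.

/* Minimal sketch of the communication-overhead metric above:
   fc = Tcomm/Tcalc, S = P/(1+fc), E = 1/(1+fc) ≈ 1 - fc.
   The times and processor count are illustrative. */
#include <stdio.h>

int main(void)
{
    int    P     = 128;    /* number of processors                    */
    double Tcalc = 40.0;   /* time a processor spends computing       */
    double Tcomm = 2.0;    /* time a processor is idle, communicating */

    double fc = Tcomm / Tcalc;      /* fractional communication overhead */
    double S  = P / (1.0 + fc);     /* speedup (f assumed negligible)    */
    double E  = 1.0 / (1.0 + fc);   /* efficiency, roughly 1 - fc        */

    printf("fc = %.3f  S = %.1f  E = %.3f\n", fc, S, E);
    return 0;
}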

Lecture 7: Parallel Processing

 Introduction and motivation
 Architecture classification
 Performance of Parallel Architectures
 Interconnection Network

Interconnection Network
 The interconnection network (IN) is a key component
of the architecture. It has a decisive influence on:
 the overall performance; and
 total cost of the architecture.

 The traffic in the IN consists of data transfers and
transfers of commands and requests (control
information).

 The key parameters of the IN are:
 total bandwidth: transferred bits/second; and
 implementation cost.

Single Bus
Node1 Node2 ... Noden

 Single bus networks are simple and cheap.
 Only one communication is allowed at a time; the
bandwidth is shared by all nodes.
 Performance is relatively poor.
 In order to keep a certain performance, the number of
nodes is limited (around 16 - 20).
 Multiple buses can be used instead, if needed.

Completely Connected Network

N x (N-1)/2 wires

 Each node is connected to every other one.
 Communications can be performed in parallel between any pair
of nodes.
 Both performance and cost are high.
 Cost increases rapidly with number of nodes.

Crossbar Network
[Figure: crossbar network — Node1, Node2, ..., Noden connected through a grid of switches.]

 A dynamic network: the interconnection topology can be modified by
configuring the switches.
 It is completely connected: any node can be directly connected to any
other.
 Fewer interconnections are needed than for the static completely
connected network; however, a large number of switches is needed.
 A large number of communications can be performed in parallel (even
though one node can receive or send only one data item at a time).

Mesh Network
[Figure: a two-dimensional mesh network; with wrap-around connections it becomes a torus.]

 Cheaper than completely connected networks, while giving
relatively good performance.
 In order to transmit data between two nodes, routing through
intermediate nodes is needed (maximum 2×(n-1) intermediates
for an n×n mesh).
 It is possible to provide wrap-around connections:
 Torus.
 Three dimensional meshes have been also implemented.

Hypercube Network

[Figure: hypercubes of dimension 2-D, 3-D, 4-D, and 5-D.]

 2^n nodes are arranged in an n-dimensional cube. Each node is
connected to n neighbors.
 In order to transmit data between two nodes, routing through
intermediate nodes is needed (maximum n intermediates).
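The counting formulas from the slides above can be compared side by side. A minimal C sketch for an illustrative machine of N = 64 nodes; only the link counts and worst-case numbers of intermediate nodes stated in the lecture are used.

/* Minimal sketch: compare the interconnection topologies above for
   N = 64 nodes, using the counting formulas from the slides. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    int N = 64;                        /* number of nodes           */
    int n = (int)round(sqrt(N));       /* side of an n x n mesh     */
    int d = (int)round(log2(N));       /* dimension of the hypercube */

    printf("single bus:       1 shared link for all %d nodes\n", N);
    printf("fully connected:  %d links, no intermediate nodes\n", N * (N - 1) / 2);
    printf("%dx%d mesh:          %d links, max %d intermediates\n",
           n, n, 2 * n * (n - 1), 2 * (n - 1));
    printf("%d-D hypercube:    %d links, max %d intermediates\n",
           d, N * d / 2, d);
    return 0;
}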

Summary
 The growing need for high performance cannot always be
satisfied by computers with a single CPU.
 With parallel computers, several CPUs are running concurrently
in order to solve a given application.
 Parallel programs have to be available in order to make use of
the parallel computers.
 Computers can be classified based on the nature of the
instruction flow and that of the data flow on which the
instructions operate.
 The interconnection network is also a key component of a
parallel architecture.
 The performance we can get with a parallel computer depends
not only on the number of available processors but is limited by
characteristics of the executed programs.
