CS 3006
Parallel and Distributed Computing
Lecture 4
Danyal Farhat
FAST School of Computing
NUCES Lahore
Flynn’s Classical Taxonomy &
Processor to Memory
Connection Strategies
Outline
• Flynn’s Classical Taxonomy
SISD
SIMD
MISD
MIMD
• Physical Organization of Parallel Platforms
PRAM
• Routing Techniques and Costs
• Summary
• Additional Resources
Flynn’s Classical Taxonomy
• Widely used architectural classification scheme
• Classifies architectures into four types
• The classification is based on how instruction and data streams flow through the processing units
Instruction stream: sequence of instructions fetched from memory into the control unit
Data stream: sequence of data exchanged between memory and the processing unit
Processor Organizations
Flynn’s Classical Taxonomy (Cont.)
SISD:
• Refers to the traditional computer: a serial architecture
• This category includes single-core computers
• Only a single instruction stream is in execution at a given time
• Similarly, only one data stream is active at any time
Example of SISD
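As a minimal illustrative sketch in C (not taken from the original slide), SISD corresponds to a single core stepping through one instruction stream over one data stream:

/* SISD sketch: one instruction stream operating on one data stream.
   A single core processes the data one element at a time. */
#include <stdio.h>

int main(void) {
    int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int sum = 0;
    for (int i = 0; i < 8; i++)   /* single instruction stream */
        sum += data[i];           /* single data stream, one element at a time */
    printf("sum = %d\n", sum);
    return 0;
}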
Flynn’s Classical Taxonomy (Cont.)
SIMD:
• Refers to a parallel architecture with multiple cores
• All cores execute the same instruction stream at any given time, but each core operates on a different data stream
• Well-suited for scientific computations involving large matrix and vector operations
• Vector computers (e.g., the Cray vector processing machines) and Intel's MMX multimedia extensions fall under this category
• Used for array operations, image processing, and graphics
Example of SIMD
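As a minimal sketch in C (assuming a compiler that honours OpenMP 4.0's simd pragma; otherwise the pragma is simply ignored), the same add operation is applied element-wise across the data, which is the SIMD pattern used for array and matrix work:

/* SIMD sketch: one instruction stream, many data elements.
   The 'omp simd' pragma asks the compiler to vectorize the loop so that
   several additions are performed per (vector) instruction. */
#include <stdio.h>
#define N 1024

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];       /* same operation, different data per lane */

    printf("c[10] = %.1f\n", c[10]);
    return 0;
}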
Flynn’s Classical Taxonomy (Cont.)
MISD:
• Multiple instruction streams and a single data stream
A pipeline of multiple independently executing functional units, each operating on the single data stream and forwarding its results to the next
• Rarely used in practice
• E.g., systolic arrays: networks of primitive processing elements that pump data through them
• Example: multiple cryptography algorithms attempting to crack a single coded message
Example of MISD
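MISD hardware is rare, so the following is only a sequential C sketch of the idea: a pipeline of "functional units" (here ordinary functions) in which each unit applies a different operation to the single data stream and forwards its result to the next unit:

/* MISD-style sketch: a pipeline of "functional units", each applying a
   different operation to the data stream and forwarding its result. */
#include <stdio.h>

static int stage_square(int x)  { return x * x; }    /* unit 1 */
static int stage_add_ten(int x) { return x + 10; }   /* unit 2 */
static int stage_negate(int x)  { return -x; }       /* unit 3 */

int main(void) {
    int stream[5] = {1, 2, 3, 4, 5};
    for (int i = 0; i < 5; i++) {
        int v = stream[i];
        v = stage_square(v);     /* result forwarded to the next unit */
        v = stage_add_ten(v);
        v = stage_negate(v);
        printf("out[%d] = %d\n", i, v);
    }
    return 0;
}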
Flynn’s Classical Taxonomy (Cont.)
MIMD:
• Multiple instruction streams and
multiple data streams
• Different CPUs can simultaneously execute different instruction streams, manipulating different data
• Most modern parallel architectures fall under this category, e.g., multiprocessor and multicomputer architectures
• Many MIMD architectures also include SIMD execution by default
• Supercomputers also fall into this category
Example of MIMD
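A minimal MIMD-style sketch using POSIX threads (assuming a pthreads-capable C environment; compile with -lpthread): two threads execute different instruction streams on different data at the same time:

/* MIMD sketch: different threads run different code on different data. */
#include <pthread.h>
#include <stdio.h>

void *sum_worker(void *arg) {        /* instruction stream 1 */
    int *a = (int *)arg, s = 0;
    for (int i = 0; i < 4; i++) s += a[i];
    printf("sum = %d\n", s);
    return NULL;
}

void *max_worker(void *arg) {        /* instruction stream 2 */
    int *b = (int *)arg, m = b[0];
    for (int i = 1; i < 4; i++) if (b[i] > m) m = b[i];
    printf("max = %d\n", m);
    return NULL;
}

int main(void) {
    int a[4] = {1, 2, 3, 4}, b[4] = {7, 3, 9, 5};   /* different data streams */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_worker, a);
    pthread_create(&t2, NULL, max_worker, b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}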
Flynn’s Classical Taxonomy (Cont.)
SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers (a single control unit)
• However, since SIMD processors are specially designed, they tend to be expensive and to have long design cycles
• Not all applications are naturally suited to SIMD processors
• In contrast, platforms supporting the SPMD (Single Program, Multiple Data) paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time
The term SPMD denotes a close variant of MIMD (a minimal code sketch follows)
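A minimal SPMD sketch in C (assuming an MPI implementation such as MPICH or Open MPI is available): every process runs the same program, and rank-based branching lets processes behave differently, which is how SPMD approximates MIMD on off-the-shelf clusters:

/* SPMD sketch: one program, many processes; behaviour differs by rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        printf("rank 0 of %d: coordinating\n", size);     /* one path */
    else
        printf("rank %d of %d: computing\n", rank, size); /* another path */

    MPI_Finalize();
    return 0;
}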
Uniform Memory Access (UMA)
• The data access time from every processing unit to the shared memory is the same (constant)
• Mostly represented by Symmetric Multiprocessor (SMP) machines
Non-Uniform Memory Access (NUMA)
• The data access time from different processing units to the shared memory is not the same
Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer
• Parallel Random Access Machine (PRAM)
An extension of the ideal sequential model, the Random Access Machine (RAM)
A PRAM consists of p processors
A global memory
Of unbounded size
Uniformly accessible to all processors, with a shared address space
Processors share a common clock but may execute different instructions in each cycle
Based on how simultaneous memory accesses are handled, PRAMs can be further classified
Graphical Representation of PRAM
Parallel Random Access Machine (PRAM)
• A PRAM has a set of processors of a similar type
• Processors communicate with each other through the shared memory
• N processors can perform independent operations on N data items at a given time; this may lead to simultaneous accesses of the same memory location by different processors
PRAM classes define how such simultaneous accesses to the same memory location are resolved
PRAM Classes
• PRAMs can be divided into four classes
Exclusive-Read, Exclusive-Write (EREW) PRAM
No two processors can read or write the same memory location concurrently
Weakest PRAM model; provides minimum memory-access concurrency
Concurrent-Read, Exclusive-Write (CREW) PRAM
All processors can read a memory location concurrently, but cannot write to it at the same time
Multiple write accesses to a memory location are serialized
Exclusive-Read, Concurrent-Write (ERCW) PRAM
No two processors can read the same memory location concurrently, but concurrent writes are allowed
Concurrent-Read, Concurrent-Write (CRCW) PRAM
Allows both concurrent reads and concurrent writes to a memory location; most powerful PRAM model
PRAM Arbitration Protocols
• Concurrent reads do not create any semantic inconsistencies
• But what about concurrent writes?
• An arbitration (mediation) mechanism is needed to resolve concurrent write accesses
PRAM Arbitration Protocols (Cont.)
• Common
Write only if all the values that the processors are attempting to write are identical
• Arbitrary
Write the data from a randomly selected processor and ignore the rest
• Priority
Follow a predetermined priority order: the processor with the highest priority succeeds and the rest fail
• Sum
Write the sum of the data items in all the write requests
The sum-based write-conflict resolution model can be extended to any associative operator defined on the data being written (a small simulation sketch follows)
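The following C sketch is purely illustrative (a simulation of a single PRAM write step, not real PRAM hardware): given the values a set of processors try to write to the same cell, it shows what each arbitration protocol would store; the 'common' protocol is assumed here to leave the cell unchanged when the values differ.

/* Illustrative simulation of CRCW write-arbitration protocols.
   vals[i] is the value processor i tries to write in one PRAM step. */
#include <stdio.h>
#include <stdlib.h>

int arbitrate_common(const int *vals, int p, int old) {
    for (int i = 1; i < p; i++)
        if (vals[i] != vals[0]) return old;  /* values differ: write fails */
    return vals[0];
}
int arbitrate_arbitrary(const int *vals, int p) { return vals[rand() % p]; }
int arbitrate_priority(const int *vals, int p)  { (void)p; return vals[0]; } /* lowest index = highest priority */
int arbitrate_sum(const int *vals, int p) {
    int s = 0;
    for (int i = 0; i < p; i++) s += vals[i];
    return s;
}

int main(void) {
    int vals[4] = {5, 7, 5, 9};      /* concurrent write requests */
    int cell = 0;                    /* the contended memory cell */
    printf("common:    %d\n", arbitrate_common(vals, 4, cell));
    printf("arbitrary: %d\n", arbitrate_arbitrary(vals, 4));
    printf("priority:  %d\n", arbitrate_priority(vals, 4));
    printf("sum:       %d\n", arbitrate_sum(vals, 4));
    return 0;
}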
Physical Complexity of an Ideal Parallel Computer
• Processors and memories are connected via switches
• Since these switches must operate in O(1) time at the level of words, for a system of p processors and m words the switch complexity is O(mp)
Switches determine the memory word being accessed by each processor
A switch is a device that opens or closes access to a particular memory bank or word
• Clearly, for meaningful values of p and m, a true PRAM is not realizable (for example, p = 1024 processors and m = 2^30 memory words would already require on the order of 2^40 switches)
Communication Costs in Parallel Machines
• Along with idling (processors doing nothing) and contention (conflicts over shared resources, e.g., during resource allocation), communication is a major overhead in parallel programs
• The communication cost usually depends on a number of factors, including the following:
Programming model for communication
Required communication pattern of the program
Network topology
Data handling and routing
Associated network protocols
• Distributed systems usually suffer from major communication overheads
Message Passing Costs in Parallel Computers
• The total time to transfer a message over a network comprises the following:
• Startup time (ts): Time spent at the sending and receiving nodes (preparing the message [adding headers, trailers, and parity information], executing the routing algorithm, establishing the interface between the node and the router, etc.)
Message Passing Costs in Parallel Computers (Cont)
• Per-hop time (th): A function of the number of hops (steps); includes factors such as switch latencies, network delays, etc.
Also known as node latency
Also accounts for the latency of deciding which channel the message should be forwarded to next
• Per-word transfer time (tw): Includes all overheads that are determined by the length of the message, such as link bandwidth and buffering overheads
If the channel bandwidth is r words/s, then each word takes tw = 1/r to traverse the link
Message Passing Costs in Parallel Computers (Cont)
Store-and-Forward Routing
• A message traversing multiple hops is completely received at an intermediate hop before being forwarded to the next hop
• The total communication cost for a message of size m words to traverse l communication links is
tcomm = ts + (th + m*tw) * l
where ts is the startup time, th is the cost of header transfer at each hop (step), and m*tw is the cost of transferring m words over a link
• In most platforms, th is small and the above expression can be approximated by
tcomm = ts + m*tw*l
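As a worked example with hypothetical parameter values (ts = 100 us, th = 1 us, tw = 0.1 us per word are assumptions chosen only for illustration), the sketch below evaluates the store-and-forward cost for a 1000-word message crossing 5 links:

/* Store-and-forward cost model: tcomm = ts + (th + m*tw) * l
   Parameter values below are hypothetical, chosen only for illustration. */
#include <stdio.h>

double sf_cost(double ts, double th, double tw, int m, int l) {
    return ts + (th + (double)m * tw) * l;
}

int main(void) {
    double ts = 100.0, th = 1.0, tw = 0.1;   /* microseconds (assumed) */
    int m = 1000, l = 5;                     /* 1000-word message, 5 links */
    printf("store-and-forward: %.1f us\n", sf_cost(ts, th, tw, m, l));
    /* = 100 + (1 + 100) * 5 = 605 us; dominated by the m*tw*l term */
    return 0;
}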
Message Passing Costs in Parallel Computers (Cont)
Packet Routing
• Store-and-forward routing makes poor use of communication resources
• Packet routing breaks messages into packets and pipelines them through the network
• Since packets may take different paths, each packet must carry routing information, error checking, sequencing, and other related header information
Error checking (parity information), sequencing (order numbers)
Related headers: layer headers, addressing headers
• The total communication time for packet routing is approximated by
tcomm = ts + l*th + tw*m
• Here the factor tw also accounts for the overheads of the packet headers
Message Passing Costs in Parallel Computers (Cont)
Cut-Through Routing
• Takes the concept of packet routing to an extreme by further dividing messages into basic units called flits, or flow-control digits
• Since flits are typically small, the header information must be minimized
• This is done by forcing all flits to take the same path, in sequence
• A tracer message first programs all intermediate routers; all flits then take the same route
Message Passing Costs in Parallel Computers (Cont)
Cut-Through Routing (Cont.)
• Error checks are performed on the entire message, as opposed to individual flits
• No sequence numbers are needed
Sequencing information is not needed because all flits follow the same path, which ensures in-order delivery
• The total communication time for cut-through routing is approximated by
tcomm = ts + l*th + tw*m
• This is identical in form to the packet-routing expression; however, tw is typically much smaller
The header of the message takes l*th to reach the destination, and the entire message arrives in time m*tw after the header (a worked comparison follows)
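Using the same hypothetical parameters as the store-and-forward sketch, the comparison below shows how pipelining flits reduces the cost from ts + (th + m*tw)*l to ts + l*th + m*tw, and how dropping the small l*th term gives the simplified model used later:

/* Cut-through vs store-and-forward, same hypothetical parameters as above. */
#include <stdio.h>

int main(void) {
    double ts = 100.0, th = 1.0, tw = 0.1;  /* microseconds (assumed) */
    int m = 1000, l = 5;

    double sf = ts + (th + m * tw) * l;     /* store-and-forward */
    double ct = ts + l * th + m * tw;       /* cut-through / packet routing */

    printf("store-and-forward: %.1f us\n", sf);   /* 605.0 us */
    printf("cut-through:       %.1f us\n", ct);   /* 205.0 us */
    /* Dropping the small l*th term gives ts + m*tw = 200 us, the
       simplified cost model used on the next slide. */
    return 0;
}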
Message Passing Costs in Parallel Computers (Cont.)
Figure: (a) passing a message through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time the message is in transit; the startup time associated with this message transfer is assumed to be zero.
Message Passing Costs in Parallel Computers (Cont)
Simplified Cost Model for Communicating Messages
• The cost of communicating a message between two nodes l hops away using cut-through routing is given by
tcomm = ts + l*th + tw*m
• In this expression, th is typically much smaller than ts and tw. For this reason, the second term on the right-hand side can be ignored, particularly when m is large
• For these reasons, we can approximate the cost of message transfer by
tcomm = ts + tw*m
For communication using flits, the startup time dominates the node latencies
Message Passing Costs in Parallel Computers (Cont)
Simplified Cost Model for Communicating Messages (Cont.)
• It is important to note that the original expression for communication time is valid only for uncongested networks
• Different communication patterns congest different networks to varying extents
• It is important to understand this and account for it in the communication time accordingly
Summary
• Flynn’s Classical Taxonomy
Differentiates multiprocessor computers on the basis of dimensions of
instruction and data
• Processor Organizations
SISD - Used in uniprocessor computers
SIMD - Used in vector or array processor computers
MISD - Not commercially implemented
MIMD - Used in supercomputers, grid computers etc.
Summary (Cont.)
• SISD
Easy and deterministic but with limited performance
SISD processor performance - MIPS Rate = f x IPC
How to increase performance of uniprocessor?
• SIMD
Homogeneous processing units (vector or array processors)
Execution of single instruction on multiple data sets using single control unit
Associated data memory for each processing element
• SIMD – Examples
Pixel processing, online gaming servers, matrix-based calculations, etc.
Summary (Cont.)
• MISD
Single data stream transmitted to a set of processors, each of which
executes a different instruction sequence
Commercially not implemented
Example: Multiple cryptography algorithms attempting to crack a coded
message
• MIMD
Supercomputers, Grid computers, Networked parallel computers etc.
Summary (Cont.)
• MIMD - Shared Memory - CPUs share same address space
Uniform Memory Access (UMA) - Constant data access time
Non-Uniform Memory Access (NUMA) - Non-Constant data access time
• MIMD - Distributed Memory
CPUs connected via network and have their own associated memory
• Architecture of an Ideal Parallel Computer – PRAM
Extension to ideal sequential model: Random Access Machine (RAM)
Consist of p processors
Global memory (Unbounded size, Uniformly accessible to all processors
with same address space)
Summary (Cont.)
• PRAM Classes
Exclusive-Read, Exclusive-Write (EREW) PRAM
Concurrent-Read, Exclusive-Write (CREW) PRAM
Exclusive-Read, Concurrent-Write (ERCW) PRAM
Concurrent-Read, Concurrent-Write (CRCW) PRAM
• PRAM Arbitration Protocols
Concurrent writes can create semantic inconsistencies
We need an arbitration mechanism to resolve concurrent write accesses
Common, Arbitrary, Priority, Sum protocols can be used
Summary (Cont.)
• Physical Complexity of an Ideal Parallel Computer
Processors and memories are connected via switches
Switches must operate in O(1) time at the level of words; for a system of p processors and m words, the switch complexity is O(mp)
For meaningful values of p and m, a true PRAM is not realizable
• Communication Costs in Parallel Machines
Summary (Cont.)
• Message Passing Costs in Parallel Computers
Startup time, Per-hop time, Per-word transfer time
Store-and-Forward Routing
Packet Routing
Cut-Through Routing
Simplified Cost Model for Communicating Messages
Additional Resources
• Introduction to Parallel Computing by Ananth Grama and
Anshul Gupta
Chapter 2: Parallel Programming Platforms
Section 2.3: Dichotomy of Parallel Computing Platforms
Section 2.4: Physical Organization of Parallel Platforms
Section 2.4.1: Architecture of an Ideal Parallel Computer
Section 2.5: Communication Costs in Parallel Machines
Section 2.5.1: Message Passing Costs in Parallel Computers
Questions?