
Parallel and Distributed Computing

Overview
(CS 3006)

Muhammad Aadil Ur Rehman

Department of Computer Science,


National University of Computer & Emerging Sciences,
Islamabad Campus
Credits: Dr. Muhammad Aleem
Flynn’s Taxonomy
• A specific classification of parallel architectures

• By Michael Flynn (from Stanford, in 1966)


– Made a classification of computer systems
known as Flynn’s Taxonomy

• The classification is based on the two streams a computer handles:
– Instructions
– Data
Flynn’s Taxonomy
Taxonomy of Processor Architectures
1. Single Instruction, Single Data Stream – SISD
• Single processor
• Single instruction stream
• Data stored in single memory
• Deterministic execution

[Diagram: one instruction stream (SI) and one data stream (SD) flowing through a single SISD processing unit]
2. Single Instruction, Multiple Data Stream - SIMD
• A parallel processor
• Single instruction: all processing units execute same
instruction at any given clock cycle
• Multiple data: Each processing unit can operate on a
different data element
• Large number of processing elements (with local
memory)
• Examples: GPUs, etc.
[Diagram: one instruction stream (SI) broadcast to multiple processing units, each operating on its own data stream (SD)]
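
To make the SIMD idea concrete, here is a minimal C sketch (an assumption of this example: a compiler that understands OpenMP's simd pragma, e.g. built with -fopenmp-simd; the pragma is only a hint, and many compilers auto-vectorize the plain loop anyway). One add instruction is applied to many data elements at a time by the processor's vector units.

#include <stddef.h>

/* Element-wise vector addition: a single instruction stream (the add),
   applied to many data elements per iteration by the SIMD/vector unit. */
void vec_add(const float *a, const float *b, float *c, size_t n)
{
    #pragma omp simd          /* hint: vectorize this loop */
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
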
2. Single Instruction, Multiple Data Stream - SIMD

Slide Credit: Alex Klimovitski & Dean Macri, Intel Corporation


3. Multiple Instruction, Single Data Stream - MISD
• A single sequence of data
• Transmitted to a set of processors
• Each processor executes a different instruction sequence on the same data
• Examples: pipelined vector processors, etc.
4. Multiple Instruction, Multiple Data Stream - MIMD

• Most common parallel processor architecture


• Simultaneously execute different instructions
• Using different sets of data
• Examples: Multi-cores, SMPs, Clusters, Grid, Cloud
Taxonomy of Processor Architectures
MIMD - Overview
• General purpose processors
• Each can process all instructions necessary
• Further classified by method of processor
communication:
1. Shared Memory
2. Distributed Memory
Symmetric Multiprocessor (SMP)
• Processors share memory (tightly coupled)
• Communicate via shared memory (single bus)
• Same memory access time (any memory region, from
any processor)
• Processors share I/O address space too
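
A minimal shared-memory sketch in C with OpenMP (an assumption of this example; compile with e.g. gcc -fopenmp): the threads run on the processors of one SMP and communicate simply by reading and writing the same array in the single shared address space.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    double data[1000], sum = 0.0;
    for (int i = 0; i < 1000; i++)
        data[i] = 0.5 * i;                        /* 'data' lives in the shared memory */

    #pragma omp parallel for reduction(+:sum)     /* threads share 'data', each adds a part */
    for (int i = 0; i < 1000; i++)
        sum += data[i];

    printf("sum = %.1f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
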
SMP Advantages
• Performance
– work can be done in parallel

• Availability
– Failure of a single processor does not halt system

• Incremental growth
– Adding additional processors enhances performance

• Scaling
– Range of products based on number of processors
Symmetric Multiprocessor Organization
Multithreading and Chip Multiprocessors
• Instruction stream divided into smaller streams
called “threads”

• Executed in parallel
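
A minimal POSIX-threads sketch (assuming a POSIX system; compile with -pthread): the single program is split into two threads whose instruction streams execute in parallel on the cores of a chip multiprocessor.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld running\n", id);    /* each thread is a separate instruction stream */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);                /* wait for both streams to finish */
    pthread_join(t2, NULL);
    return 0;
}
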
Taxonomy of Processor Architectures
Tightly Coupled - NUMA
• Non-Uniform Memory Access (NUMA)
– Access times to different regions of memory differs
SunFire X4600M2 NUMA machine
Non-uniform Memory Access (NUMA)
• Non-uniform memory access
– All processors have access to all parts of memory
– Access time of processor differs depending on
memory region
– Different processors access different regions of
memory at different speeds

• Cache-coherent NUMA (cc-NUMA)


– Cache coherence is maintained among the caches
of the various processors
Motivation (Why NUMA)
• SMP has a practical limit on the number of processors
– Bus traffic limits this to between 16 and 64 processors

• In clusters each node has own memory:


– Apps do not see large global memory
– Coherence maintained by software not hardware

• NUMA retains the SMP flavour while enabling large-scale multiprocessing
CC-NUMA Organization
CC-NUMA Operation
• Each processor has own L1 and L2 cache
• Each node has own main memory
• Nodes connected by some networking facility
• Each processor sees single addressable memory
• Hardware support for read/write to non-local
memories, cache coherency

• Memory request order:


1. L1 cache 🡪 L2 cache (local to processor)
2. Main memory (local to node)
3. Remote memory
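
A small NUMA-aware sketch using Linux's libnuma (an assumption of this example: a Linux machine with libnuma installed; link with -lnuma). Keeping a thread and its data on the same node means most requests are satisfied from local memory instead of the slower remote memory in step 3.

#include <numa.h>      /* libnuma: numa_available, numa_alloc_onnode, numa_run_on_node */
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {                    /* no NUMA support on this system */
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }
    size_t bytes = 64UL * 1024 * 1024;
    double *buf = numa_alloc_onnode(bytes, 0);     /* place the buffer on node 0 */
    if (buf == NULL)
        return 1;
    numa_run_on_node(0);                           /* run this thread on node 0 too */
    for (size_t i = 0; i < bytes / sizeof(double); i++)
        buf[i] = 0.0;                              /* accesses stay node-local */
    numa_free(buf, bytes);
    return 0;
}
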
NUMA Pros & Cons
• Effective performance at higher levels of parallelism
than SMP

• No major software changes

• Performance can break down if there is too much access to remote memory
Distributed Memory / Message Passing
• Each processor has access to its own memory only

• Data transfer between processors is explicit (via message-passing functions), e.g., the MPI library

• User has complete control of (and responsibility for) data placement and management

[Diagram: an interconnection network linking several nodes, each with its own CPU and memory]
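
A minimal message-passing sketch with the MPI library mentioned above (assuming an MPI implementation such as MPICH or Open MPI; compile with mpicc and run with, e.g., mpirun -np 2): each process has only its own memory, so the value moves between processes through an explicit send/receive pair.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                          /* exists only in rank 0's memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* explicit data transfer */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
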
Hybrid Systems
• Distributed memory system with multiprocessor
shared memory nodes

• Most common parallel architecture

[Diagram: an interconnection network linking several shared-memory nodes; each node has multiple CPUs, a common memory, and a network interface]
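
A minimal hybrid sketch combining MPI across nodes with OpenMP threads inside each node (assumptions of this example; compile with e.g. mpicc -fopenmp and launch one MPI process per node): message passing handles the distributed memory between nodes, while threads share the memory within a node.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);  /* MPI plus threads */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel                       /* threads share this node's memory */
    printf("node (rank) %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
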
Taxonomy of Processor Architectures
Distributed Computing
• Using distributed systems to solve large
problems

• Paradigms:
– Cluster computing
– Grid computing
– Cloud computing
Cluster Computing
Clusters - Loosely Coupled
• Collection of independent uni-processor systems or
SMPs
• Interconnected to form a cluster

• Communication via fixed path or network connections

• No single shared memory


Introduction to Clusters
• Alternative to SMP
• High performance
• High availability
• A group of interconnected whole computers
• Working together as unified resource
• Illusion of being one big machine
• Each computer called a node
Cluster Benefits
• Scalability
• Superior price/performance ratio
Cluster System Architecture
Cluster Middleware
• Unified image to user
– Single system image
• Single point of entry
• Single file hierarchy
• Single job management system
• Single user interface
• Single I/O space
Cluster vs. SMP
• Both provide multiprocessor support

• SMPs:
– Easier to manage and control
– Closer to single processor systems:
• Scheduling is main difference
• Less physical space required
• Lower power consumption
Cluster vs. SMP
• Clustering:
– Superior incremental scalability
– Superior availability
• Redundancy
Grid Computing
Grid Computing
• Heterogeneous computers over the whole world
providing CPU power and data storage capacity

• Applications can be executed at several locations

• Geographically distributed services

• Coordinated access to resources, in contrast to centralized control

• Uses standard, open, general-purpose protocols and interfaces
Credits: Grid Computing by Camiel
Grid Architecture

Autonomous, globally distributed computers/clusters

A typical view of a Grid environment:
1. A user submits a computation- or data-intensive application to the Grid.
2. The Grid Information Service collects the details of the available Grid resources and passes them to the Resource Broker.
3. The Resource Broker distributes the jobs in the application to the Grid resources, based on the user’s QoS requirements and the available Grid resources.
4. The Grid resources (clusters, PCs, supercomputers, databases, instruments, etc.) process the jobs and return the computation results.
Cloud Computing
What is Cloud Computing?
• Cloud computing is network-based computing that takes place over the Internet:
– a collection/group of integrated and networked
hardware, software, and Internet infrastructure
(called a platform).

• Hides the complexity and details of the underlying infrastructure
What is Cloud Computing?
• On-demand services that are always on: anywhere, anytime, any place

• Pay for use and as needed

• Elastic: scale up and down (capacity and functionalities)

• Shared pool of configurable computing resources


Service Models

Cloud Service Models

Adopted from: Effectively and Securely Using the Cloud Computing Paradigm by Peter Mell, Tim Grance
Figure source: https://dachou.github.io/assets/20110326-cloudmodels.png
Cloud Providers
SuperComputers
What is a SuperComputer?
• Typical definition*: A computer that leads the world in
terms of processing capacity, speed of calculation, at the
time of its introduction
– Computer speed is measured in FLoating-point Operations Per Second (FLOPS)

– Currently the LINPACK benchmark is officially used to determine a computer’s speed:
http://www.netlib.org/benchmark/hpl

– Top 500 SuperComputers


– A ranked list of general purpose systems that are in
common use for high-end applications
*https://home.chpc.utah.edu/~thorne/computing/L13_Supercomputing_Part1.pdf
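
As a rough illustration of the FLOPS metric (the numbers below are hypothetical, not taken from the Top 500 list), theoretical peak performance is commonly estimated as

\[ R_{\text{peak}} = N_{\text{cores}} \times f_{\text{clock}} \times \text{FLOPs per core per cycle} \]

\[ 64 \times 2.5\,\text{GHz} \times 16\ \text{FLOPs/cycle} = 2.56 \times 10^{12}\ \text{FLOPS} \approx 2.56\ \text{TFLOPS} \]

The LINPACK benchmark then measures the sustained rate actually achieved on a large linear-algebra problem, which is what the Top 500 ranking reports.
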
Top 5 of the list (Nov. 2020)
El Capitan – Today’s SuperComputer

Cores: 11,039,616 CPU and GPU cores


Processors: 44,544 AMD MI300A processors
Memory: 5.4375 petabytes
Peak performance: 2.79 exaflops
Peak power: ~35 MW
Interconnect: HPE Slingshot 64-port switch
Storage: Rabbit NVM-Express fast storage arrays
Floor space: 7,500 square feet (700 m2)
El Capitan – Shasta Architecture
Top 500 SuperComputers - Nov. 2020
Any Questions?
