Lecture 4

This document discusses parallel computing platforms and their logical and physical organization. It covers shared memory and message passing platforms from a logical perspective, and static interconnection networks from a physical perspective, evaluating networks based on properties like diameter and bisection width.

High Performance Computing
LECTURE 4
Agenda
❖ Parallel Computing Platform
❖ Logical Organization
   1- Control
   2- Communication
❖ Physical Organization: Interconnection Networks
   1- Static Network
      ▪ Topology
      ▪ Evaluation of networks
Parallel Computing Platform
Logical Organization

❖ Platforms that provide a shared data space are called shared-address-space machines or multiprocessors.
❖ Platforms that support messaging are called message passing platforms or multi-computers.
1- Accessing Shared Data

❖ In a shared memory system, all processors share a global memory.
❖ Processors exchange information (communication between tasks running on different processors) by writing to and reading from the global memory.
❖ Changes made to a memory location by one processor are visible to all other processors (global address space).
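As an illustration (not from the slides): threads of a single process are the everyday analogue of this model, since they see one address space and a write by one worker is visible to all others. A minimal sketch using Python's standard `threading` module; the names `shared` and `worker` are illustrative:

```python
import threading

# A single "global memory" visible to every worker.
shared = {"value": 0}
lock = threading.Lock()  # serialize updates to avoid a race on the shared location

def worker(amount):
    # Communication happens implicitly: each thread just writes to the
    # shared structure, and the change is visible to all other threads.
    with lock:
        shared["value"] += amount

threads = [threading.Thread(target=worker, args=(1,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["value"])  # all four increments are visible: 4
```

The lock hints at the coherence/consistency issues the following slides raise: shared data is convenient, but concurrent updates must be coordinated.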
1- Accessing Shared Data (cont.)

❖ Shared memory machines can be divided into two main classes based upon memory access times:
   ➢ Uniform Memory Access (UMA)
   ➢ Non-Uniform Memory Access (NUMA)
[Figure] (a) Uniform-memory-access shared-address-space computer: contention for the single shared memory degrades performance, so the solution is (b) a uniform-memory-access shared-address-space computer with caches and memories (which raises the cache coherence problem!); (c) a non-uniform-memory-access shared-address-space computer with local memory only.
2- Exchanging Messages

❖ Message passing systems are a class of multi-computers (e.g., clustered workstations) in which each processor has access to its own local memory.
   ▪ Each processor operates independently.
   ▪ Changes it makes to its local memory have no effect on the memory of other processors.
   ▪ Hence, the concept of cache coherency does not apply.

[Figure: processors, each with its own local memory]
2- Exchanging Messages (cont.)

❖ These platforms are programmed using (variants of) send and receive primitives.
❖ The principal functions are send() and receive(); each processor has a unique ID, queried with helpers such as GetID and NumProcs.
❖ Libraries such as MPI and PVM provide such primitives.
❖ When a processor needs access to data in another processor's memory ("distributed memory"), it is usually the task of the programmer to explicitly define how and when data is communicated.
❖ Synchronization between tasks is the programmer's responsibility.
2- Exchanging Messages (cont.)

❖ Each node comprises at least one network interface (NI) that mediates the connection to a communication network.
❖ On each CPU runs a serial process that can communicate with processes on other CPUs by means of the network.
MPI: A Distributed-Memory Parallel Programming Model

❖ Fits well with data parallelism.
❖ The same program runs on each processor/machine (SPMD, a very useful subset of MIMD).
❖ Each process is distinguished by its rank.
❖ The program is written in a sequential language (Fortran, C, or C++).
❖ All variables are local! There is no concept of shared memory.
❖ Data exchange between processes happens through send/receive messages via the appropriate library.
MPI (cont.)

❖ The MPI system requires information about:
   ✓ Which processor is sending the message (sender).
   ✓ Where the data is on the sending processor (send buffer).
   ✓ What kind of data is being sent (datatype).
   ✓ How much data there is (size/count).
   ✓ Which processor(s) are receiving the message (receiver).
   ✓ Where the data should be left on the receiving processor (receive buffer).
   ✓ How much data the receiving processor is prepared to accept (size/count).
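These seven pieces of information map directly onto the arguments of MPI's point-to-point calls, `MPI_Send(buf, count, datatype, dest, tag, comm)` and `MPI_Recv(buf, count, datatype, source, tag, comm, status)` (the sender's identity is implicit in which rank makes the call). As a runtime-free illustration, a toy message envelope in Python; the `Envelope` class and `recv` helper are invented for this sketch, not part of MPI:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Envelope:
    # Illustrative only: the metadata an MPI message carries.
    sender: int      # which rank is sending
    receiver: int    # which rank should receive
    dtype: str       # what kind of data (an MPI datatype in real MPI)
    count: int       # how much data is being sent
    payload: Any     # the data from the sender's buffer

def recv(env: Envelope, my_rank: int, max_count: int) -> Any:
    # The receiver states how much data it is prepared to accept,
    # mirroring the count argument of MPI_Recv.
    assert env.receiver == my_rank, "message not addressed to this rank"
    assert env.count <= max_count, "receive buffer too small"
    return env.payload

msg = Envelope(sender=0, receiver=1, dtype="int", count=3, payload=[7, 8, 9])
print(recv(msg, my_rank=1, max_count=8))  # [7, 8, 9]
```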
Your Turn
Compare shared-address-space and message-passing platforms.
Your Turn: Shared Memory vs. Distributed Memory

Shared Memory
Advantages:
▪ Global address space provides a user-friendly programming perspective to memory.
▪ Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs.
Disadvantages:
▪ Lack of scalability between memory and CPUs: adding more CPUs increases traffic on the shared memory-CPU path.
▪ Expensive.

Distributed Memory
Advantages:
▪ Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionally.
▪ Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency.
▪ Cost effective.
Disadvantages:
▪ The programmer is responsible for many details associated with data communication between processors.
▪ It is difficult to map existing data structures, based on global memory, to this memory organization.
Interconnection Networks

❖ Interconnection networks provide the mechanism for data transfer between processors and memory modules.
❖ Interconnection networks can be classified as static or dynamic.

❖ Static networks:
   ✓ Consist of point-to-point communication links among processing nodes.
   ✓ Are also referred to as direct networks.

❖ Dynamic networks:
   ✓ Are built using switches and communication links.
   ✓ Communication links are connected to one another dynamically by the switches to establish paths among processing nodes and memory banks.
   ✓ Are also referred to as indirect networks.
Network Topologies

❖ A variety of network topologies have been proposed and implemented.
❖ These topologies trade off performance for cost.
❖ Commercial machines often implement hybrids of multiple topologies for reasons of packaging, cost, and available components.
❖ A topology specifies who is connected to whom.
A- Static Interconnection Networks
Evaluating Static Interconnection Networks

❖ Diameter: the maximum distance between any two processing nodes in the network (the number of hops through which a message is transferred on its way from one point to another).
❖ Bisection Width: the minimum number of wires you must cut to divide the network into two equal parts.
❖ Connectivity: the multiplicity of paths between any two processing nodes.
❖ Cost: the number of links or switches, besides the length of wires, etc., are factors in the cost.
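To make these criteria concrete, the standard closed-form values for a few of the topologies covered below can be computed directly. A sketch using the textbook formulas for p nodes (the `metrics` function name and link count as the cost proxy are choices made for this example):

```python
import math

def metrics(topology: str, p: int) -> dict:
    """Textbook diameter, bisection width, and link count for p nodes."""
    if topology == "complete":
        # Every pair of nodes is directly connected.
        return {"diameter": 1, "bisection": p * p // 4, "links": p * (p - 1) // 2}
    if topology == "star":
        # Any two leaves communicate through the center.
        return {"diameter": 2, "bisection": 1, "links": p - 1}
    if topology == "ring":
        # Worst case is half-way around; cutting the ring severs 2 wires.
        return {"diameter": p // 2, "bisection": 2, "links": p}
    if topology == "hypercube":
        d = int(math.log2(p))  # requires p to be a power of two
        return {"diameter": d, "bisection": p // 2, "links": p * d // 2}
    raise ValueError(topology)

print(metrics("hypercube", 16))  # {'diameter': 4, 'bisection': 8, 'links': 32}
```

The comparison is instructive: the complete network minimizes diameter but its cost grows as p², while the ring is cheap but its diameter grows linearly; the hypercube sits in between on both axes.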
A- Static Interconnection Networks

1. Complete network (clique)
2. Star network
3. Linear array
4. Ring
5. 2D & 3D mesh/torus
6. Hypercube
7. Tree
8. Fat tree
1- Completely Connected

❖ Each processor is connected to every other processor.
❖ While the performance scales very well, the hardware complexity is not realizable for large values of p.
❖ Completely connected networks are the static counterparts of crossbars.
2- Star

❖ Every node is connected only to a common node at the center.
❖ The central node becomes a bottleneck.
❖ In this sense, star-connected networks are the static counterparts of buses.
3- Linear Array

❖ Each node has two neighbors, one to its left and one to its right.

4- Ring (1D)

❖ It is a linear array in which the nodes at either end are connected.
5- 2D & 3D Mesh

❖ In a 2D mesh, each interior node has 4 neighbors: to the north, south, east, and west.
❖ A good match for discrete simulation and matrix operations.
❖ Easy to manufacture and extend.
❖ Examples: Cray T3D (3D torus), Intel Paragon (2D mesh).
6- Hypercube

❖ A special case of a d-dimensional mesh is the hypercube. Here, d = log2 p, where p is the total number of nodes.
❖ Each node has log2 p neighbors.
❖ The distance between two nodes is given by the number of bit positions at which the two node labels differ.
❖ Costly/difficult to manufacture for high d, so not so popular nowadays.
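The bit-position rule above is exactly the Hamming distance between node labels, which makes hypercube distances easy to compute. A small sketch (the function name is illustrative):

```python
def hypercube_distance(a: int, b: int) -> int:
    # Hypercube nodes are labelled with d-bit numbers; two nodes are
    # neighbors iff their labels differ in exactly one bit, so the
    # shortest path length is the number of differing bit positions.
    return bin(a ^ b).count("1")

# In a d=3 hypercube (8 nodes): 0b000 -> 0b111 crosses all 3 dimensions.
print(hypercube_distance(0b000, 0b111))  # 3
print(hypercube_distance(0b101, 0b100))  # 1 (neighbors)
```

This also makes the diameter obvious: the farthest node differs in all d bits, giving diameter d = log2 p.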
7- Tree

❖ The distance between any two nodes is no more than 2 log p.
❖ Links higher up the tree potentially carry more traffic than those at the lower levels, so a tree suffers from a communication bottleneck at the higher levels (especially if the right part of the tree tries sending to the left part).
❖ Trees can be laid out in 2D with no wire crossings, which is an attractive property.
❖ For this reason, a variant called a fat tree fattens the links as we go up the tree: the number of communication links and switching nodes is increased closer to the root.
8- Fat Tree Network

❖ In the previous tree networks there was only one path between any two pairs of nodes.
❖ To send a message, the source node sends the message up the tree until it reaches the common ancestor of both source and destination; the message is then routed down the tree.
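The up-then-down route described above can be sketched for a complete binary tree whose nodes are labelled heap-style (root = 1, children of node i are 2i and 2i+1); the labelling scheme and the function name are assumptions made for this illustration:

```python
def tree_route(src: int, dst: int) -> list[int]:
    # Climb from each endpoint toward the root until the two paths meet
    # at the lowest common ancestor, then descend to the destination.
    up, down = [], []
    a, b = src, dst
    while a != b:
        if a > b:
            up.append(a)
            a //= 2          # parent of a heap-labelled node
        else:
            down.append(b)
            b //= 2
    return up + [a] + list(reversed(down))

print(tree_route(4, 5))  # [4, 2, 5]: up to the common ancestor 2, then down
print(tree_route(4, 7))  # [4, 2, 1, 3, 7]: worst case passes through the root
```

The second example shows why the root is the bottleneck: any traffic between the left and right halves of the tree must pass through it, which is exactly what the fat tree's extra links near the root relieve.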
Your Turn
Calculate the diameter, bisection width, connectivity, and cost for each of the static networks above.