Lecture 4

This document discusses parallel computing platforms and their logical and physical organization. It covers shared memory and message passing platforms from a logical perspective, and static interconnection networks from a physical perspective, evaluating networks based on properties like diameter and bisection width.

High Performance Computing
LECTURE 4
Agenda
❖ Parallel Computing Platform
❖ Logical Organization
   1- Control
   2- Communication
❖ Physical Organization: Interconnection Networks
   1- Static Network
      ▪ Topology
      ▪ Evaluation of networks
Parallel Computing Platform
Logical Organization

❖ Platforms that provide a shared data space are called shared-address-space machines or multiprocessors.
❖ Platforms that support messaging are called message passing platforms or multi-computers.
1- Accessing Shared Data

❖ In a shared memory system, all processors share a global memory.
❖ Processors exchange information (communication between tasks running on different processors) by writing to and reading from the global memory.
❖ Changes made to a memory location by one processor are visible to all other processors (global address space).
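As an illustration (not from the slides): threads of a single process are the everyday analogue of this model, since they see one address space and a write by one worker is visible to all others. A minimal sketch using Python's standard `threading` module; the names `shared` and `worker` are illustrative:

```python
import threading

# A single "global memory" visible to every worker.
shared = {"value": 0}
lock = threading.Lock()  # serialize updates to avoid a race on the shared location

def worker(amount):
    # Communication happens implicitly: each thread just writes to the
    # shared structure, and the change is visible to all other threads.
    with lock:
        shared["value"] += amount

threads = [threading.Thread(target=worker, args=(1,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["value"])  # all four increments are visible: 4
```

The lock hints at the coherence/consistency issues the following slides raise: shared data is convenient, but concurrent updates must be coordinated.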
1- Accessing Shared Data (cont.)

❖ Shared memory machines can be divided into two main classes based upon memory access times:
   ➢ Uniform Memory Access (UMA)
   ➢ Non-Uniform Memory Access (NUMA)
[Figure] (a) Uniform-memory-access shared-address-space computer: contention for the single shared memory degrades performance, so the solution is (b) a uniform-memory-access shared-address-space computer with caches and memories (which raises the cache coherence problem!); (c) a non-uniform-memory-access shared-address-space computer with local memory only.
2- Exchanging Messages

❖ Message passing systems are a class of multi-computers (e.g., clustered workstations) in which each processor has access to its own local memory.
   ▪ Each processor operates independently.
   ▪ Changes it makes to its local memory have no effect on the memory of other processors.
   ▪ Hence, the concept of cache coherency does not apply.

[Figure: processors, each with its own local memory]
2- Exchanging Messages (cont.)

❖ These platforms are programmed using (variants of) send and receive primitives.
❖ The principal functions are send() and receive(); each processor has a unique ID, queried with helpers such as GetID and NumProcs.
❖ Libraries such as MPI and PVM provide such primitives.
❖ When a processor needs access to data in another processor's memory ("distributed memory"), it is usually the task of the programmer to explicitly define how and when data is communicated.
❖ Synchronization between tasks is the programmer's responsibility.
2- Exchanging Messages (cont.)

❖ Each node comprises at least one network interface (NI) that mediates the connection to a communication network.
❖ On each CPU runs a serial process that can communicate with processes on other CPUs by means of the network.
MPI: A Distributed-Memory Parallel Programming Model

❖ Fits well with data parallelism.
❖ The same program runs on each processor/machine (SPMD, a very useful subset of MIMD).
❖ Each process is distinguished by its rank.
❖ The program is written in a sequential language (Fortran, C, or C++).
❖ All variables are local! There is no concept of shared memory.
❖ Data exchange between processes happens through send/receive messages via the appropriate library.
MPI (cont.)

❖ The MPI system requires information about:
   ✓ Which processor is sending the message (sender).
   ✓ Where the data is on the sending processor (send buffer).
   ✓ What kind of data is being sent (datatype).
   ✓ How much data there is (size/count).
   ✓ Which processor(s) are receiving the message (receiver).
   ✓ Where the data should be left on the receiving processor (receive buffer).
   ✓ How much data the receiving processor is prepared to accept (size/count).
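These seven pieces of information map directly onto the arguments of MPI's point-to-point calls, `MPI_Send(buf, count, datatype, dest, tag, comm)` and `MPI_Recv(buf, count, datatype, source, tag, comm, status)` (the sender's identity is implicit in which rank makes the call). As a runtime-free illustration, a toy message envelope in Python; the `Envelope` class and `recv` helper are invented for this sketch, not part of MPI:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Envelope:
    # Illustrative only: the metadata an MPI message carries.
    sender: int      # which rank is sending
    receiver: int    # which rank should receive
    dtype: str       # what kind of data (an MPI datatype in real MPI)
    count: int       # how much data is being sent
    payload: Any     # the data from the sender's buffer

def recv(env: Envelope, my_rank: int, max_count: int) -> Any:
    # The receiver states how much data it is prepared to accept,
    # mirroring the count argument of MPI_Recv.
    assert env.receiver == my_rank, "message not addressed to this rank"
    assert env.count <= max_count, "receive buffer too small"
    return env.payload

msg = Envelope(sender=0, receiver=1, dtype="int", count=3, payload=[7, 8, 9])
print(recv(msg, my_rank=1, max_count=8))  # [7, 8, 9]
```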
Your Turn
Compare shared-address-space and message-passing platforms.
Your Turn: Shared Memory vs. Distributed Memory

Shared Memory
Advantages:
▪ Global address space provides a user-friendly programming perspective to memory.
▪ Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs.
Disadvantages:
▪ Lack of scalability between memory and CPUs: adding more CPUs increases traffic on the shared memory-CPU path.
▪ Expensive.

Distributed Memory
Advantages:
▪ Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionally.
▪ Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency.
▪ Cost effective.
Disadvantages:
▪ The programmer is responsible for many details associated with data communication between processors.
▪ It is difficult to map existing data structures, based on global memory, to this memory organization.
Interconnection Networks

❖ Interconnection networks provide the mechanism for data transfer between processors and memory modules.
❖ Interconnection networks can be classified as static or dynamic.

❖ Static networks:
   ✓ Consist of point-to-point communication links among processing nodes.
   ✓ Are also referred to as direct networks.

❖ Dynamic networks:
   ✓ Are built using switches and communication links.
   ✓ Communication links are connected to one another dynamically by the switches to establish paths among processing nodes and memory banks.
   ✓ Are also referred to as indirect networks.
Network Topologies

❖ A variety of network topologies have been proposed and implemented.
❖ These topologies trade off performance for cost.
❖ Commercial machines often implement hybrids of multiple topologies for reasons of packaging, cost, and available components.
❖ A topology specifies who is connected to whom.
A- Static Interconnection Networks
Evaluating Static Interconnection Networks

❖ Diameter: the maximum distance between any two processing nodes in the network (the number of hops through which a message is transferred on its way from one point to another).
❖ Bisection Width: the minimum number of wires you must cut to divide the network into two equal parts.
❖ Connectivity: the multiplicity of paths between any two processing nodes.
❖ Cost: the number of links or switches, besides the length of wires, etc., are factors in the cost.
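To make these criteria concrete, the standard closed-form values for a few of the topologies covered below can be computed directly. A sketch using the textbook formulas for p nodes (the `metrics` function name and link count as the cost proxy are choices made for this example):

```python
import math

def metrics(topology: str, p: int) -> dict:
    """Textbook diameter, bisection width, and link count for p nodes."""
    if topology == "complete":
        # Every pair of nodes is directly connected.
        return {"diameter": 1, "bisection": p * p // 4, "links": p * (p - 1) // 2}
    if topology == "star":
        # Any two leaves communicate through the center.
        return {"diameter": 2, "bisection": 1, "links": p - 1}
    if topology == "ring":
        # Worst case is half-way around; cutting the ring severs 2 wires.
        return {"diameter": p // 2, "bisection": 2, "links": p}
    if topology == "hypercube":
        d = int(math.log2(p))  # requires p to be a power of two
        return {"diameter": d, "bisection": p // 2, "links": p * d // 2}
    raise ValueError(topology)

print(metrics("hypercube", 16))  # {'diameter': 4, 'bisection': 8, 'links': 32}
```

The comparison is instructive: the complete network minimizes diameter but its cost grows as p², while the ring is cheap but its diameter grows linearly; the hypercube sits in between on both axes.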
A- Static Interconnection Networks

1. Complete network (clique)
2. Star network
3. Linear array
4. Ring
5. 2D & 3D mesh/torus
6. Hypercube
7. Tree
8. Fat tree
1- Completely Connected

❖ Each processor is connected to every other processor.
❖ While the performance scales very well, the hardware complexity is not realizable for large values of p.
❖ Completely connected networks are the static counterparts of crossbars.
2- Star

❖ Every node is connected only to a common node at the center.
❖ The central node becomes a bottleneck.
❖ In this sense, star-connected networks are the static counterparts of buses.
3- Linear Array

❖ Each node has two neighbors, one to its left and one to its right.

4- Ring (1D)

❖ It is a linear array in which the nodes at either end are connected.
5- 2D & 3D Mesh

❖ In a 2D mesh, each interior node has 4 neighbors: to the north, south, east, and west.
❖ A good match for discrete simulation and matrix operations.
❖ Easy to manufacture and extend.
❖ Examples: Cray T3D (3D torus), Intel Paragon (2D mesh).
6- Hypercube

❖ A special case of a d-dimensional mesh is the hypercube. Here, d = log2 p, where p is the total number of nodes.
❖ Each node has log2 p neighbors.
❖ The distance between two nodes is given by the number of bit positions at which the two node labels differ.
❖ Costly/difficult to manufacture for high d, so not so popular nowadays.
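The bit-position rule above is exactly the Hamming distance between node labels, which makes hypercube distances easy to compute. A small sketch (the function name is illustrative):

```python
def hypercube_distance(a: int, b: int) -> int:
    # Hypercube nodes are labelled with d-bit numbers; two nodes are
    # neighbors iff their labels differ in exactly one bit, so the
    # shortest path length is the number of differing bit positions.
    return bin(a ^ b).count("1")

# In a d=3 hypercube (8 nodes): 0b000 -> 0b111 crosses all 3 dimensions.
print(hypercube_distance(0b000, 0b111))  # 3
print(hypercube_distance(0b101, 0b100))  # 1 (neighbors)
```

This also makes the diameter obvious: the farthest node differs in all d bits, giving diameter d = log2 p.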
7- Tree

❖ The distance between any two nodes is no more than 2 log p.
❖ Links higher up the tree potentially carry more traffic than those at the lower levels, so a tree suffers from a communication bottleneck at the higher levels (especially if the right part of the tree tries sending to the left part).
❖ Trees can be laid out in 2D with no wire crossings, which is an attractive property.
❖ For this reason, a variant called a fat tree fattens the links as we go up the tree: the number of communication links and switching nodes is increased closer to the root.
8- Fat Tree Network

❖ In the previous tree networks there was only one path between any two pairs of nodes.
❖ To send a message, the source node sends the message up the tree until it reaches the common ancestor of both source and destination; the message is then routed down the tree.
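The up-then-down route described above can be sketched for a complete binary tree whose nodes are labelled heap-style (root = 1, children of node i are 2i and 2i+1); the labelling scheme and the function name are assumptions made for this illustration:

```python
def tree_route(src: int, dst: int) -> list[int]:
    # Climb from each endpoint toward the root until the two paths meet
    # at the lowest common ancestor, then descend to the destination.
    up, down = [], []
    a, b = src, dst
    while a != b:
        if a > b:
            up.append(a)
            a //= 2          # parent of a heap-labelled node
        else:
            down.append(b)
            b //= 2
    return up + [a] + list(reversed(down))

print(tree_route(4, 5))  # [4, 2, 5]: up to the common ancestor 2, then down
print(tree_route(4, 7))  # [4, 2, 1, 3, 7]: worst case passes through the root
```

The second example shows why the root is the bottleneck: any traffic between the left and right halves of the tree must pass through it, which is exactly what the fat tree's extra links near the root relieve.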
Your Turn
Calculate the diameter, bisection width, connectivity, and cost for each of the static networks above.