Parallel Architecture

Sathish Vadhiyar
Motivations of Parallel Computing
 Faster execution times
 From days or months to hours or seconds
 E.g., climate modelling, bioinformatics
 Large amounts of data dictate parallelism
 Parallelism more natural for certain kinds of
problems, e.g., climate modelling
 Due to computer architecture trends
 CPU speeds have saturated
 Slow memory bandwidths

2
Classification of Architectures – Flynn’s
classification
In terms of parallelism in
instruction and data stream
 Single Instruction Single Data (SISD): Serial Computers
 Single Instruction Multiple Data (SIMD)
- Vector processors and processor arrays
- Examples: CM-2, Cray-90, Cray YMP, Hitachi 3600

Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/

3
Classification of Architectures – Flynn’s
classification
 Multiple Instruction Single Data (MISD): Not popular
 Multiple Instruction Multiple Data (MIMD)
- Most popular
- IBM SP and most other supercomputers, clusters, computational Grids etc.

Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/

4
Classification of Architectures – Based on
Memory
 Shared memory
 2 types – UMA and NUMA
 Examples: HP-Exemplar, SGI Origin, Sequent NUMA-Q

[Figure: UMA and NUMA shared memory organizations]

Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/

5
Classification 2:
Shared Memory vs Message Passing
 Shared memory machine: The n processors share physical address space
 Communication can be done through this shared memory

[Figure: shared memory organization – processors connected through an interconnect to a common main memory; distributed memory organization – processor–memory pairs connected by an interconnect]

 The alternative is sometimes referred to as a message passing machine or a distributed memory machine
6
Shared Memory Machines
The shared memory could itself be
distributed among the processor nodes
 Each processor might have some portion of the
shared physical address space that is physically
close to it and therefore accessible in less time
 Terms: NUMA vs UMA architecture
 Non-Uniform Memory Access
 Uniform Memory Access

7
SHARED MEMORY AND
CACHES

8
Shared Memory Architecture: Caches

[Figure: X is 0 in memory and in both caches; P1 writes X = 1 in its own cache; P2 then reads X and gets a cache hit on its stale copy X = 0 – wrong data!]
9
Cache Coherence Problem
 If each processor in a shared memory multiprocessor machine has a data cache
 Potential data consistency problem: the cache coherence problem
 Shared variable modification, private cache
 Objective: processes shouldn’t read `stale’ data
 Solutions
 Hardware: cache coherence mechanisms

10
Cache Coherence Protocols
 Write update – propagate the updated cache line to the other processors on every write by a processor
 Write invalidate – a write invalidates copies in other caches; a processor fetches the updated cache line whenever it next reads the stale data
 Which is better?

11
Invalidation Based Cache Coherence

[Figure: X is 0 in memory and in both caches; P1 writes X = 1, and the protocol invalidates P2’s cached copy, so P2’s next read of X fetches the updated value 1 instead of the stale 0]
12
Cache Coherence using invalidate protocols

 3 states associated with data items
 Shared – a variable shared (cached, unmodified) by 2 or more caches
 Invalid – another processor (say P0) has updated the data item, so this cached copy is stale
 Dirty – state of the updated data item in P0’s cache (the only valid copy)

13
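A minimal sketch of how these three states can drive an invalidate protocol, assuming a single cache line tracked across all caches (the enum and function names are illustrative, not any particular processor’s implementation):

```cpp
#include <cstddef>
#include <vector>

// State of one cache's copy of a cache line under a simple invalidate protocol.
enum class LineState { Invalid, Shared, Dirty };

// One entry per cache (i.e. per processor) for a single cache line.
using LineStates = std::vector<LineState>;

// Processor p reads the line: a Dirty owner elsewhere supplies the data and is
// downgraded to Shared; p's own copy becomes Shared.
void on_read(LineStates& s, std::size_t p) {
    for (auto& st : s)
        if (st == LineState::Dirty) st = LineState::Shared;
    s[p] = LineState::Shared;
}

// Processor p writes the line: its copy becomes Dirty and every other cached
// copy is invalidated (the "write invalidate" action).
void on_write(LineStates& s, std::size_t p) {
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = (i == p) ? LineState::Dirty : LineState::Invalid;
}

int main() {
    LineStates line(2, LineState::Invalid);  // one line, tracked in 2 caches
    on_read(line, 1);    // P1 reads: its copy becomes Shared
    on_write(line, 0);   // P0 writes: P0 is Dirty, P1's copy is Invalid
    on_read(line, 1);    // P1 reads again: P0 downgraded, both Shared
}
```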
Implementations of cache coherence protocols

 Snoopy
 for bus-based architectures
 shared bus interconnect where all cache
controllers monitor all bus activity
 There is only one operation through bus at a
time; cache controllers can be built to take
corrective action and enforce coherence in
caches
 Memory operations are propagated over the bus
and snooped

14
Implementations of cache coherence protocols

 Directory-based
 Instead of broadcasting memory operations to
all processors, propagate coherence operations
to relevant processors
 A central directory maintains the state of each cache block and the set of processors caching it

15
Implementation of Directory Based
Protocols
 Using presence bits for the owner processors
 Two schemes:
 Full bit vector scheme – O(MxP) storage for P processors and M cache lines
 But this much storage is usually not necessary
 Modern day processors use a sparse or tagged directory scheme
 Limited cache lines and limited presence bits
16
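A rough sketch of the full bit vector scheme described above, assuming one directory entry per memory block with a presence bit per processor (the names and the 64-processor size are illustrative):

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t P = 64;   // number of processors (assumed)

// One directory entry per memory block: which caches hold a copy, and whether
// some cache holds the only (modified) copy.
struct DirectoryEntry {
    std::bitset<P> presence;    // one presence bit per processor
    bool dirty = false;         // true if exactly one cache holds a modified copy
};

// Full bit vector directory: M blocks x P presence bits = O(M x P) storage.
using Directory = std::vector<DirectoryEntry>;

// On a write by processor p to block b, invalidations go only to the
// processors whose presence bits are set (no broadcast needed).
void on_write(Directory& dir, std::size_t b, std::size_t p) {
    DirectoryEntry& e = dir[b];
    for (std::size_t i = 0; i < P; ++i)
        if (i != p && e.presence.test(i)) { /* send invalidate to cache i */ }
    e.presence.reset();
    e.presence.set(p);
    e.dirty = true;
}

int main() {
    Directory dir(1024);        // directory for 1024 memory blocks (assumed)
    dir[7].presence.set(3);     // cache 3 currently holds block 7
    on_write(dir, 7, 5);        // processor 5 writes block 7: cache 3 invalidated
}
```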
False Sharing
 Cache coherence occurs at the granularity of cache lines – an entire cache line is invalidated
 Modern day cache lines are 64 bytes in size
 Consider a Fortran program dealing with a matrix (stored column-major)
 Assume each thread or process accesses one row of the matrix
 Elements accessed by different threads then fall in the same cache lines even though no element is actually shared
 Leads to false sharing

17
False sharing: Solutions
 Reorganize the code so that each processor accesses a set of rows
 Can still lead to overlapping of cache lines if the matrix size is not divisible by the number of processors
 In such cases, employ padding
 Padding: dummy elements added to make the matrix size divisible (see the sketch below)

18
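The slides use a Fortran matrix; the same effect is easy to see in a small C++ sketch (illustrative only, with the 64-byte line size assumed): two threads update adjacent counters that share one cache line, and the padded version places each counter on its own line.

```cpp
#include <cstdint>
#include <thread>

constexpr int kIters = 10'000'000;

// Falsely shared: both counters live in the same 64-byte cache line, so every
// increment by one thread invalidates the other thread's copy of the line.
struct SharedCounters {
    std::uint64_t a = 0;
    std::uint64_t b = 0;
};

// Padded: alignas(64) places each counter on its own cache line, so the two
// threads no longer ping-pong the line between their caches.
struct PaddedCounters {
    alignas(64) std::uint64_t a = 0;
    alignas(64) std::uint64_t b = 0;
};

template <class Counters>
void run() {
    Counters c;
    std::thread t1([&] { for (int i = 0; i < kIters; ++i) ++c.a; });
    std::thread t2([&] { for (int i = 0; i < kIters; ++i) ++c.b; });
    t1.join();
    t2.join();
}

int main() {
    run<SharedCounters>();   // typically much slower: false sharing
    run<PaddedCounters>();   // same work, separate cache lines
}
```

Both versions do exactly the same work; on most machines the padded one is noticeably faster because no cache line bounces between the two cores.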
INTERCONNECTION
NETWORKS

19
Interconnects
 Used in both shared memory and
distributed memory architectures
 In shared memory: Used to connect
processors to memory
 In distributed memory: Used to connect
different processors
 Components
 Interface (PCI or PCI-e): for connecting
processor to network link
 Network link connected to a communication
network (network of connections)

20
Communication network
 Consists of switching elements to which processors are connected through ports
 Switch: network of switching elements
 Switching elements connected with each other using a pattern of connections
 Pattern defines the network topology

 In shared memory systems, memory units are also connected to the communication network
21
Parallel Architecture: Interconnections
 Routing techniques: how the route taken by the message
from source to destination is decided
 Network topologies
 Static – point-to-point communication links among processing
nodes
 Dynamic – Communication links are formed dynamically by
switches

22
Network Topologies

 Static
 Bus
 Completely connected
 Star
 Linear array, Ring (1-D torus)
 Mesh
 k-d mesh: d dimensions with k nodes in each dimension (see the sketch after this list)
 Hypercubes – a (log p)-dimensional mesh with 2 nodes in each dimension
 Trees – our campus network
 Dynamic – Communication links are formed dynamically by
switches
 Crossbar
 Multistage
 For more details, and evaluation of topologies, refer to book by
Grama et al.
23
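For concreteness, a small sketch (with an assumed node numbering, not from the slides) of the neighbor computation in a k-ary, d-dimensional mesh with wraparound links, i.e. a torus; d = 1 reduces to a ring, and dropping the wraparound gives a plain mesh:

```cpp
#include <cstdio>
#include <vector>

// Neighbors of a node in a d-dimensional, k-ary torus (k nodes per dimension,
// wraparound links in every dimension).
std::vector<unsigned> torus_neighbors(unsigned id, unsigned k, unsigned d) {
    // decode the node id into d coordinates, each in [0, k)
    std::vector<unsigned> coord(d);
    for (unsigned i = 0, t = id; i < d; ++i, t /= k) coord[i] = t % k;

    std::vector<unsigned> nbrs;
    for (unsigned dim = 0; dim < d; ++dim) {
        for (int step : {-1, +1}) {
            std::vector<unsigned> c = coord;
            c[dim] = (c[dim] + k + step) % k;          // wraparound in this dimension
            unsigned nid = 0;
            for (unsigned i = d; i-- > 0; ) nid = nid * k + c[i];  // re-encode id
            nbrs.push_back(nid);
        }
    }
    return nbrs;
}

int main() {
    // node 0 of a 4 x 4 torus has neighbors 3, 1, 12 and 4
    for (unsigned n : torus_neighbors(0, 4, 2)) std::printf("%u ", n);
    std::printf("\n");
}
```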
Network Topologies
 Bus, ring – used in small-scale shared memory systems

 Crossbar switch – used in some small-scale shared memory machines, small or medium-scale distributed memory machines

24
Crossbar Switch
 Consists of 2D grid of switching elements
 Each switching element consists of 2 input
ports and 2 output ports
 An input port is dynamically connected to an output port through switching logic

25
Multistage network – Omega network
 To reduce switching complexity
 Omega network – consists of log P stages, each consisting of P/2 switching elements

 Contention
 In crossbar – nonblocking
 In Omega – contention can occur even for multiple communications between disjoint pairs of nodes (the network is blocking)
26
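A small sketch of destination-tag routing through such a network, assuming p = 2^n inputs, a perfect-shuffle connection between stages, and 2x2 switches that either go straight or cross (the function is illustrative):

```cpp
#include <cstdio>

// Trace a message through an Omega network with p = 2^n inputs: each of the
// n = log2(p) stages first applies a perfect shuffle (rotate the n-bit address
// left by one), then the 2x2 switch sets the low bit to the next
// most-significant bit of the destination (straight for 0, cross for 1).
unsigned omega_route(unsigned src, unsigned dst, unsigned n) {
    unsigned mask  = (1u << n) - 1;
    unsigned label = src & mask;
    for (unsigned stage = 0; stage < n; ++stage) {
        label = ((label << 1) | (label >> (n - 1))) & mask;   // perfect shuffle
        unsigned dbit = (dst >> (n - 1 - stage)) & 1u;        // destination tag bit
        label = (label & ~1u) | dbit;                         // straight or cross
        std::printf("after stage %u: at position %u\n", stage, label);
    }
    return label;  // equals dst after all log2(p) stages
}

int main() {
    omega_route(3, 6, 3);   // route 3 -> 6 in an 8-input Omega network
}
```

Because the path is completely determined by the destination bits, two messages between disjoint pairs can still need the same switch output at some stage, which is exactly the contention noted above.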
Mesh, Torus, Hypercubes, Fat-tree
 Commonly used network topologies in distributed memory architectures
 Hypercubes are d-dimensional networks with 2 nodes along each dimension (p = 2^d nodes)
27
Mesh, Torus, Hypercubes

[Figure: a 2-D mesh, a 2-D torus (mesh with wraparound links), and hypercubes (binary n-cubes) for n = 2 and n = 3]
28
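To illustrate how regular the hypercube is (node numbering assumed, not from the slides): a node’s neighbors differ from it in exactly one address bit, and the distance between two nodes is the Hamming distance of their labels, so the diameter is log2 p.

```cpp
#include <bitset>
#include <cstdio>
#include <vector>

// Neighbors of node `id` in a d-dimensional hypercube (p = 2^d nodes):
// flip each of the d address bits in turn.
std::vector<unsigned> hypercube_neighbors(unsigned id, unsigned d) {
    std::vector<unsigned> nbrs;
    for (unsigned k = 0; k < d; ++k) nbrs.push_back(id ^ (1u << k));
    return nbrs;
}

// Distance between two hypercube nodes = Hamming distance of their labels,
// so the diameter of a d-dimensional hypercube is d = log2(p).
unsigned hypercube_distance(unsigned a, unsigned b) {
    return static_cast<unsigned>(std::bitset<32>(a ^ b).count());
}

int main() {
    for (unsigned n : hypercube_neighbors(5, 3))   // node 101 of a 3-cube
        std::printf("%u ", n);                      // prints 4 7 1
    std::printf("\ndistance(0, 7) = %u\n", hypercube_distance(0, 7));  // 3
}
```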
Fat Tree Networks
 Binary tree
 Processors arranged at the leaves
 Other nodes correspond to switches
 Fundamental property: the number of links from a node to its children equals the number of links from the node to its parent
 Edges become fatter as we traverse up the tree

29
Fat Tree Networks
 Any pair of processors can communicate without contention: non-blocking network
 Constant Bisection Bandwidth (CBB) networks
 A two-level fat tree has a diameter of four

30
Evaluating Interconnection topologies
 Diameter – maximum distance between any two processing nodes
 Fully connected – 1
 Star – 2
 Ring – p/2
 Hypercube – log p

 Connectivity – multiplicity of paths between 2 nodes; minimum number of arcs to be removed from the network to break it into two disconnected networks
 Linear array – 1
 Ring – 2
 2-D mesh – 2
 2-D mesh with wraparound – 4
 d-dimensional hypercube – d
31
Evaluating Interconnection topologies
 Bisection width – minimum number of links to be removed from the network to partition it into 2 equal halves
 Ring – 2
 P-node 2-D mesh – √P
 Tree – 1
 Star – 1
 Completely connected – P²/4
 Hypercube – P/2

32
Evaluating Interconnection topologies

 channel width – number of bits that can be simultaneously communicated over a link, i.e. number of physical wires between 2 nodes
 channel rate – performance of a single physical wire
 channel bandwidth – channel rate times channel width
 bisection bandwidth – maximum volume of communication between two halves of the network, i.e. bisection width times channel bandwidth

33
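A tiny worked example of these products, using assumed numbers (16 wires per link, 1 Gbit/s per wire, a 32-node hypercube) purely for illustration:

```cpp
#include <cstdio>

int main() {
    // Assumed link parameters (illustrative values, not from the slides).
    double channel_rate_gbps = 1.0;   // performance of a single physical wire
    int    channel_width     = 16;    // number of physical wires per link

    // channel bandwidth = channel rate x channel width
    double channel_bw_gbps = channel_rate_gbps * channel_width;      // 16 Gbit/s

    // 32-node hypercube: bisection width = P/2 = 16 links
    int bisection_width = 32 / 2;

    // bisection bandwidth = bisection width x channel bandwidth
    double bisection_bw_gbps = bisection_width * channel_bw_gbps;    // 256 Gbit/s

    std::printf("channel bandwidth:   %.0f Gbit/s\n", channel_bw_gbps);
    std::printf("bisection bandwidth: %.0f Gbit/s\n", bisection_bw_gbps);
}
```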
