
Parallel Computing: Overview

John Urbanic
urbanic@psc.edu
Introduction to Parallel Computing
• Why we need parallel computing
• How such machines are built
• How we actually use these machines
New Applications / Clock Speeds / CPU Clock (chart slides; figures not reproduced in this extract)
Clock Speeds
When the PSC went from a 2.7 GFlop Y-MP to a 16
GFlop C90, the clock only got 50% faster. The
rest of the speed increase was due to increased use
of parallel techniques:
• More processors (8 → 16)
• Longer vector pipes (64 → 128)
• Parallel functional units (2)
• Cray X1 (13 GFlops/CPU) is only 800 MHz!
Clock Speeds
So, we want as many processors working
together as possible. How do we do this?
There are two distinct elements:
Hardware
• vendor does this
Software
• you, at least today
Amdahl’s Law
How many processors can we really use?
Let’s say we have a legacy code such that it is only feasible to convert half of the heavily used routines to parallel:
Amdahl’s Law
If we run this on a parallel machine with five processors, our code now takes about 60s. We have sped it up by about 40%.
Let’s say we use a thousand processors: we have now sped our code up by about a factor of two.
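These figures follow from Amdahl’s law (the formula itself is not shown on the slide; a 100-second baseline run is assumed here, consistent with the 60 s figure above). With parallel fraction f and p processors:

S(p) = 1 / ((1 - f) + f/p)

For f = 0.5: S(5) = 1 / (0.5 + 0.1) ≈ 1.67, i.e. roughly 60 s, and S(1000) = 1 / (0.5 + 0.0005) ≈ 2.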
Amdahl’s Law
This seems pretty depressing, and it does point out one limitation of converting old
codes one subroutine at a time. However, most new codes, and almost all parallel
algorithms, can be written almost entirely in parallel (usually, the “start up” or
initial input I/O code is the exception), resulting in significant practical speedups.
This can be quantified by how well a code scales, which is often measured as efficiency.
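Efficiency is not defined on the slide itself; the standard definition is speedup per processor:

E(p) = S(p) / p

For the half-parallel legacy code above, E(5) ≈ 0.33 and E(1000) ≈ 0.002, whereas a code that scales well keeps E close to 1 as p grows.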
Shared Memory
Easiest to program. There are no
real data distribution or
communication issues. Why
doesn’t everyone use this
scheme?
• Limited numbers of processors (tens) – Only so many C90 (60) / Marvel (64) / Altix (12) processors can share the same bus before conflicts dominate.
• Limited memory size – Memory shares the bus as well, so accessing one part of memory will interfere with access to other parts.
Distributed Memory
• Number of processors only limited by physical
size (tens of meters).
• Memory only limited by the number of processors
times the maximum memory per processor (very
large). However, physical packaging usually
dictates no local disk per node and hence no
virtual memory.
• Since local and remote data have much different
access times, data distribution is very important.
We must minimize communication.
Common Distributed Memory
Machines
• CM-2
• CM-5
• T3E
• Workstation Cluster
• SP4
• TCS
• ASCI (Red, Blue, Purple, White, Red Storm…)
• Earth Simulator
Common Distributed Memory
Machines
While the CM-2 is SIMD (one instruction unit for multiple processors),
all the new machines are MIMD (multiple instructions for multiple
processors) and based on commodity processors.
SP-4 – POWER4
CM-5 – SPARC
T3E – Alpha
Workstations – Mostly Intel and AMD
TCS – Alpha

Therefore, the single most defining characteristic of any of these machines is probably the network.
Latency and Bandwidth
Even with the "perfect" network we have here, performance is determined by two more quantities that,
together with the topologies we'll look at, pretty much define the network: latency and bandwidth.
Latency can nicely be defined as the time required to send a message with 0 bytes of data. This number
often reflects either the overhead of packing your data into packets, or the delays in making
intervening hops across the network between two nodes that aren't next to each other.
Bandwidth is the rate at which very large packets of information can be sent. If there were no latency, this would be the rate at which all data is transferred. It often reflects the physical capability of the wires and
electronics connecting nodes.
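A common first-order model (not stated on the slide, but implied by these definitions) for the time to send an n-byte message is:

T(n) ≈ latency + n / bandwidth

With illustrative numbers of 10 µs latency and 1 GB/s bandwidth, an 8-byte message costs essentially the full 10 µs (latency-bound), while an 8 MB message costs about 8 ms (bandwidth-bound).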
Token-Ring/Ethernet with Workstations
Complete Connectivity
Super Cluster / SP4
CM-2
Binary Tree
CM-5 Fat Tree
INTEL Paragon (2-D Mesh)
3-D Torus
T3E has Global Addressing hardware, and this helps to simulate shared memory.
Torus means that “ends” are connected. This means A is really connected to B and the cube has no real boundary.
TCS Fat Tree
Data Parallel
Only one executable.
Do computation on arrays of data using array operators.
Do communications using array shift or rearrangement operators.
Good for problems with static load balancing that are array-oriented SIMD machines.
Variants:
FORTRAN 90
CM FORTRAN
HPF
C*
CRAFT
Strengths:
1. Scales transparently to different size machines
2. Easy debugging, as there is only one copy of code executing in highly synchronized fashion
Weaknesses:
1. Much wasted synchronization
2. Difficult to balance load
Data Parallel – Cont’d
Computation in FORTRAN 90
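The code from the original slide is not reproduced in this extract; the following is a minimal Fortran 90 sketch of whole-array computation (array names and sizes are illustrative):

program array_compute
  implicit none
  real, dimension(1000) :: a, b, c
  b = 1.0                   ! whole-array assignment
  c = 2.0
  a = b + c                 ! one array statement; elements can be computed in parallel
  where (a > 2.5) a = 0.0   ! masked (conditional) array assignment
  print *, sum(a)           ! intrinsic reduction over the whole array
end program array_compute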
Data Parallel – Cont’d
Communication in FORTRAN 90
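Again, the original slide's code is not in this extract; below is a minimal sketch of communication via array shift operators. CSHIFT implies nearest-neighbor data movement when the array is distributed across processors (the values and the averaging step are illustrative):

program array_shift
  implicit none
  integer :: i
  real, dimension(8) :: t, left, right
  t = (/ (real(i), i = 1, 8) /)
  left  = cshift(t, -1)     ! each element receives its left neighbor's value (wrapping around)
  right = cshift(t,  1)     ! each element receives its right neighbor's value (wrapping around)
  t = 0.5 * (left + right)  ! e.g. a simple nearest-neighbor averaging step
  print *, t
end program array_shift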
Data Parallel – Cont’d
When to use Data Parallel
– Very array-oriented programs
• FEA
• Fluid Dynamics
• Neural Nets
• Weather Modeling
– Very synchronized operations
• Image processing
• Math analysis
Work Sharing
Splits up tasks (as opposed to arrays in data parallel) such as loops amongst separate processors.
Do computation on loops that are automatically distributed.
Do communication as a side effect of data loop distribution. Not important on shared memory machines.
If you have used CRAYs before, think of this as “advanced multitasking.”
Good for shared memory implementations.
Variants:
* CRAFT
* Multitasking
* OpenMP
Strengths:
1. Directive based, so it can be added to existing serial codes
Weaknesses:
1. Limited flexibility
2. Efficiency dependent upon structure of existing serial code
3. May be very poor with distributed memory.
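A minimal work-sharing sketch using OpenMP directives (one of the variants named above; the loop body is illustrative). The point is that the serial loop is unchanged and the directive asks the compiler to distribute its iterations:

program loop_share
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real, dimension(n) :: x, y
  x = 1.0
  ! the directive below splits the loop iterations among the available processors
  !$omp parallel do
  do i = 1, n
     y(i) = 2.0 * x(i) + 1.0
  end do
  !$omp end parallel do
  print *, y(1), y(n)
end program loop_share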
Work Sharing – Cont’d
When to use Work Sharing
• Very large / complex / old existing codes:
Gaussian 90
• Already multitasked codes: Charmm
• Portability (Directive Based)
• (Not Recommended)
Load Balancing
An important consideration which can be controlled by communication is
load balancing:
Consider the case where a dataset is distributed evenly over 4 sites.
Each site will run a piece of code which uses the data as input and
attempts to find a convergence. It is possible that the data contained at
sites 0, 2, and 3 may converge much faster than the data at site 1. If
this is the case, the three sites which finished first will remain idle
while site 1 finishes. When attempting to balance the amount of work
being done at each site, one must take into account the speed of the
processing site, the communication "expense" of starting and
coordinating separate pieces of work, and the amount of work required
by various pieces of data.
There are two forms of load balancing: static and dynamic.
Load Balancing – Cont’d
Static Load Balancing
In static load balancing, the programmer must
make a decision and assign a fixed amount of
work to each processing site a priori.
Static load balancing can be used in either the
Master-Slave (Host-Node) programming model
or the "Hostless" programming model.
Load Balancing – Cont’d
Static Load Balancing yields good performance
when:
• homogeneous cluster
• each processing site has an equal amount of work
Poor performance when:
• heterogeneous cluster where some processors are
much faster (unless this is taken into account in
the program design)
• work distribution is uneven
Load Balancing – Cont’d
Dynamic Load Balancing
Dynamic load balancing can be further divided into the categories:
• task-oriented – when one processing site finishes its task, it is assigned another task (this is the most commonly used form).
• data-oriented – when one processing site finishes its task before other sites, the site with the most work gives the idle site some of its data to process (this is much more complicated because it requires an extensive amount of bookkeeping).
Dynamic load balancing can be used only in the Master-Slave programming model.
• ideal for:
• codes where tasks are large enough to keep each processing site busy
• codes where work is uneven
• heterogeneous clusters
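The slides do not include code for this; below is a minimal sketch of the task-oriented, Master-Slave scheme written with MPI (the use of MPI and the trivial task arithmetic are assumptions, not from the original). It assumes at least two processes and at least as many tasks as workers:

program task_farm
  use mpi
  implicit none
  integer, parameter :: ntasks = 100
  integer :: ierr, rank, nprocs, worker, sent, done
  integer :: task, result, stop_signal
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  stop_signal = 0

  if (rank == 0) then
     ! Master (Host): prime every worker with one task, then hand out the
     ! rest as results come back, so faster workers simply receive more tasks.
     sent = 0
     do worker = 1, nprocs - 1
        sent = sent + 1
        call MPI_Send(sent, 1, MPI_INTEGER, worker, 1, MPI_COMM_WORLD, ierr)
     end do
     do done = 1, ntasks
        call MPI_Recv(result, 1, MPI_INTEGER, MPI_ANY_SOURCE, 2, &
                      MPI_COMM_WORLD, status, ierr)
        worker = status(MPI_SOURCE)
        if (sent < ntasks) then
           sent = sent + 1
           call MPI_Send(sent, 1, MPI_INTEGER, worker, 1, MPI_COMM_WORLD, ierr)
        else
           ! no work left: tell this worker to stop (tag 0)
           call MPI_Send(stop_signal, 1, MPI_INTEGER, worker, 0, MPI_COMM_WORLD, ierr)
        end if
     end do
  else
     ! Worker (Node): process tasks until the stop tag arrives.
     do
        call MPI_Recv(task, 1, MPI_INTEGER, 0, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
        if (status(MPI_TAG) == 0) exit
        result = task * task   ! stand-in for the real, possibly uneven, work
        call MPI_Send(result, 1, MPI_INTEGER, 0, 2, MPI_COMM_WORLD, ierr)
     end do
  end if

  call MPI_Finalize(ierr)
end program task_farm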
