
CS3210

Introduction

Lecture 1

Outline
- Parallel Computing
  - Motivation
  - Serial vs Parallel Computing
  - Serial vs Parallel Processing Units
- Basics of Parallel Computing
  - Decompose, Schedule and Map
  - Performance
  - Challenges
- Landscape of Parallel Computing
  - Applications
  - Systems
Cheetah vs Hyena

Cheetah:
- Weighs 60 kg
- Fastest land animal: top speed of 120 km/h, 0-100 km/h in 3 seconds
- Hunts alone; a hunt is usually over in 1 minute
- 50% chance of losing its catch to other predators

Hyena:
- Weighs 50 kg
- Top speed of 65 km/h
- Hunts in groups of 4-40
- Can take down large animals like buffalo, rhino, lion
Observations

Cheetah: optimized for speed only.
Hyena: optimized to cooperate and work in parallel.

Single Processor:
- Single-processor performance has failed to keep pace with Moore's Law since the early 2000s

Parallel Computer:
- A collection of processing elements that cooperate to solve problems quickly
- Huge amount of computation power
Parallel Computing - Motivation
- In the first issue of CACM (1958), Saul Gorn notes that:

“We know that the so-called parallel computers are somewhat faster
than the serial ones, and that no matter how fast we make access and
arithmetic serially, we can do correspondingly better in parallel.
However access and arithmetic speeds seem to be approaching a
definite limit ... Does this mean that digital speeds are reaching a limit,
that digital computation of multi-variate partial differential systems must
accept it, and that predicting tomorrow’s weather must require
more than one day? Not if we have truly-parallel machines, with say,
a thousand instruction registers.”

Hitting the Wall
- '70s: "Supercomputers" for Scientific Computing
- Early '90s: Databases
- Inflection point in 2004: Intel hits the Power Density Wall

"obtaining more computing power by stamping multiple processors on a single
chip rather than straining to increase the speed of a single processor."

John Markoff, New York Times, May 17, 2004

* Cooling-aware computing

How do you make your program run faster?
- Answer before 2004: just wait 6 months and buy a new machine!
- Answer after 2004: write parallel software

Parallel Computing - Challenges
- Gorn also recognized the challenges that would be posed to the system
  designer by the new parallel computers:

"But visualize what it would be like to program for such a machine! If a
thousand subroutines were in progress simultaneously, one must program
the recognition that they have all finished, if one is to use their results
together. The programmer must not only synchronize his subroutines, but
schedule all his machine units, unless he is willing to have most of them
sitting idle most of the time. Not only would programming techniques
be vastly different from the serial ones in the serial languages we now
use, but they would be humanly impossible without machine intervention."

Serial Computing
[Diagram: a problem is divided into a stream of instructions (i1, i2, ..., in)
fed to a single processing unit]

- Traditionally, a problem is divided into a discrete series of instructions
- Instructions are executed one after another
- Only one instruction is executed at any moment in time
Example: Evaluate a = (b+1) * (b-c)
Program (single control flow):
  i1: + b 1 t1    (add 1 to b, store result in t1)
  i2: - b c t2    (subtract c from b, store result in t2)
  i3: * t1 t2 a   (multiply t1 and t2, store result in a)

Memory locations after execution: b (4), t1 (5), c (2), t2 (2), a (10)

Sequential Computation Model
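As an illustration only (plain C, my own mapping of the three-address
instructions above, not taken from the slides), the same sequential
computation is:

#include <stdio.h>

int main(void) {
    int b = 4, c = 2;        /* initial memory contents from the slide */
    int t1 = b + 1;          /* i1: + b 1 t1 */
    int t2 = b - c;          /* i2: - b c t2 */
    int a  = t1 * t2;        /* i3: * t1 t2 a */
    printf("a = %d\n", a);   /* prints a = 10 */
    return 0;
}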

Processing Unit: von Neumann Model
[Diagram: Memory (holds instructions and data) connected to a Processor, which
consists of a Control Unit (instruction counter) and an Arithmetic Logic Unit
(registers); input and output devices attach to the processor]

von Neumann Computation Model
- Processor: performs instructions
- Memory: stores both the instructions and data in cells with unique addresses
- Control scheme:
  - Fetches instructions one after another from memory
  - Shuttles data between memory and processor
- Memory wall: disparity between processor speed (< 1 nanosecond, i.e. 10^-9 s,
  per operation) and memory access time (100-1,000 nanoseconds)

Ways to improve performance
1. Work hard: higher clock frequency

2. Work smart: pipelining, superscalar processor

3. Get help: replication - multicore, cluster, ...

Parallel Computing
- Simultaneous use of multiple processing units to solve a problem fast or to
  solve a larger problem

- Processing units could be:
  i.   A single processor with multiple cores
  ii.  A single computer with multiple processors
  iii. A number of computers connected by a network
  iv.  Combinations of the above

- Ideally, a problem (application) is partitioned into a sufficient number of
  independent parts for execution on parallel processing elements
Parallel Computing
[Diagram: a problem is divided into tasks (task0 ... taskm); each task is a
series of instructions (i1, i2, ..., in) executed on its own processing unit
(unit0 ... unitp-1)]

1. A problem is divided into m discrete parts (tasks) that can be solved concurrently
2. Each part is further broken down into a series of instructions (i)
3. Instructions from each part execute in parallel on different processing units (p)

Example: Evaluate a = (b+1) * (b-c)

        fork i2                        (fork creates a parallel executing "thread")
  i1: + b 1 t1   ||   i2: - b c t2     (i1 and i2 execute in parallel)
        join 2                         (join blocks executing threads until n
  i3: * t1 t2 a                         threads reach the same point)

multiple control flows

Parallel Computing (Shared-Memory)
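To make the fork/join structure concrete, here is a minimal sketch in C with
POSIX threads (my choice of API for illustration; the slide itself is
language-neutral). The main thread forks a worker to compute i2 while it
computes i1, joins, then performs i3:

#include <pthread.h>
#include <stdio.h>

static int b = 4, c = 2;   /* inputs, using the values from the earlier slide */
static int t1, t2, a;

/* the forked "thread" computes i2: t2 = b - c */
static void *compute_i2(void *arg) {
    (void)arg;
    t2 = b - c;
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, compute_i2, NULL);  /* fork: i2 starts in parallel */
    t1 = b + 1;                                    /* i1 runs in the main thread  */
    pthread_join(tid, NULL);                       /* join: wait for i2 to finish */
    a = t1 * t2;                                   /* i3 uses both results        */
    printf("a = %d\n", a);                         /* prints a = 10               */
    return 0;
}

(Compile with -pthread. For two trivial instructions the thread-creation cost
far exceeds the useful work, a first taste of the parallelization overheads
discussed later in this lecture.)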

Why Parallel Computing
- Primary reasons:
  1. Overcome limits of serial computing
  2. Save (wall-clock) time
  3. Solve larger problems

- Other reasons:
  1. Take advantage of non-local resources
  2. Cost saving - use multiple "cheaper" (commoditized) computing resources
  3. Overcome memory constraints

- Performance: exploit a large collection of processing units that can
  communicate and cooperate to solve large problems fast
BASICS OF
PARALLEL COMPUTING

Computational Model Attributes
[Operation Mechanism]
• The primitive units of computation or basic actions of the computer (data types and
operations defined by the instruction set)

[Data Mechanism]
• The definition of address spaces available to the computation (how data are
accessed and stored)

[Control Mechanism]
• The schedulable units of computation (rules for partitioning and scheduling the
problem for computation using the primitive units)

[Communication Mechanism]
• The modes and patterns of communication among processing elements working
in parallel, so that they can exchange needed information

[Synchronization Mechanism]
• Mechanism to ensure needed information arrives at the right time
Basics of Parallel Computing
Application Problem
     |  decompose
     v
   Tasks
     |  schedule & map
     v
Physical Cores & Processors

Decomposition
- One problem -> many possible decompositions

- Complicated and laborious process
  - The potential parallelism of an application dictates how it should be split into tasks
  - The size of the tasks is called granularity - different task sizes can be chosen
  - Defining the tasks is challenging and difficult to automate (why?)

Tasks
- Tasks are coded in a parallel programming language

- Scheduling: assignment of tasks to processes or threads
  - Dictates the task execution order
  - Manually defined? Static? Dynamic?

- Mapping: assignment of processes/threads to physical cores/processors for execution

- Tasks may depend on each other, resulting in data or control dependencies
  - These impose an execution order on the parallel tasks
Dependences and Coordination
- Dependences among tasks impose constraints on scheduling

- To execute correctly, processes and threads need synchronization and coordination
  - This depends on information exchange between processes and threads, which in
    turn depends on the hardware memory organization

- Memory organizations: shared-memory (threads) and distributed-memory (processes)

An illusion?

Concurrency vs. Parallelism

Concurrency:
- Two or more tasks can start, run, and complete in overlapping time periods
- They might not be running (executing on a CPU) at the same instant
- Two or more execution flows make progress at the same time, either by
  interleaving their executions or by executing instructions (on CPUs) at
  exactly the same time

Parallelism:
- Two or more tasks can run (execute) simultaneously, at the exact same time
- Tasks do not only make progress; they actually execute simultaneously

Example – Serial Solution
- Find the sum of n numbers:

sum = 0;
for (i = 0; i < n; i++) {
    x = Compute_next_value(. . .);   /* produce the next value */
    sum += x;                        /* accumulate it into the running sum */
}

Serial Code

Example – Parallel Solution (1/3)
- Suppose we have p cores (p < n)
- Each core can perform a partial sum of n/p values

my_sum   = 0;
my_start = ……;                       /* first index assigned to this core */
my_end   = ……;                       /* one past the last index for this core */
for (my_i = my_start; my_i < my_end; my_i++) {
    my_x = Compute_next_value(. . .);
    my_sum += my_x;
}

- Each core uses its own private variables and executes this block of code independently
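The slide leaves my_start and my_end blank on purpose. As a hedged sketch (not
from the slides; my_rank is a hypothetical per-core id in 0..p-1), one common
block partitioning that also copes with n not divisible by p is:

int q = n / p;                       /* minimum block size */
int r = n % p;                       /* the first r cores take one extra value */
int my_start = my_rank * q + (my_rank < r ? my_rank : r);
int my_end   = my_start + q + (my_rank < r ? 1 : 0);

With n = 24 and p = 8 this gives every core exactly 3 consecutive values.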
Example – Parallel Solution (2/3)
- Upon completion, each core holds its partial sum in my_sum

- E.g. if n = 24 and p = 8, and the values generated are:
  1 4 3 | 9 2 8 | 5 1 1 | 5 3 7 | 2 5 0 | 4 1 8 | 6 5 1 | 2 3 9

- Partial sum calculated in each core (each core sums n/p = 3 consecutive values):

  Core    0   1   2   3   4   5   6   7
  my_sum  8  19   7  15   7  13  12  14

Example – Parallel Solution (3/3)
- Next, a core designated as the master core adds up all values of my_sum to
  form the global sum (8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95)

if (I'm the master core) {
    sum = my_sum;                      /* start from the master's own partial sum */
    for each core other than myself {
        receive value from other core
        sum += value;
    }
} else {                               /* not the master core */
    send my_sum to the master;
}

Parallel Code

  Core    0          1   2   3   4   5   6   7
  my_sum  8 -> 95   19   7  15   7  13  12  14   (core 0 now holds the global sum)
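For concreteness only: the pseudocode above maps naturally onto message
passing. A minimal sketch in C with MPI (my choice of library, not something
the slide prescribes; it assumes MPI_Init has already been called and that
my_rank and p come from MPI_Comm_rank and MPI_Comm_size):

#include <mpi.h>

/* Returns the global sum on rank 0; the return value on other ranks is unused. */
int gather_sum(int my_sum, int my_rank, int p) {
    int sum = 0;
    if (my_rank == 0) {                      /* master core */
        sum = my_sum;
        for (int src = 1; src < p; src++) {  /* collect every other partial sum */
            int value;
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += value;
        }
    } else {                                 /* every non-master core */
        MPI_Send(&my_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    return sum;
}

(In practice a single collective call, MPI_Reduce, does the same job and is
typically implemented with the tree pattern shown on the next slide.)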

Better Parallel Algorithm
- Share the work of the global summing too (time flows downward):

  Cores:            0    1    2    3    4    5    6    7
  my_sum:           8   19    7   15    7   13   12   14
  step 1 (pairs):     27        22        20        26
  step 2:                  49                  46
  step 3:                            95
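A hedged sketch of this tree reduction in C-style code (the shared partial[]
array, the per-core id, and the barrier() primitive are assumptions made for
illustration; OpenMP's reduction clause or MPI_Reduce give the same pattern
ready-made):

/* p partial sums are combined in ceil(log2(p)) steps;  */
/* partial[id] initially holds core id's my_sum.        */
for (int stride = 1; stride < p; stride *= 2) {
    if (id % (2 * stride) == 0 && id + stride < p)
        partial[id] += partial[id + stride];   /* combine with the partner core */
    barrier();                                 /* all cores finish each step    */
}
/* afterwards partial[0] holds the global sum (95 in the example) */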

Comparison: Two Parallel Solutions
- With 8 cores:
  - 1st version (master adds all) = 7 steps
  - 2nd version (tree) = 3 steps
  -> roughly a factor of 2 improvement

- With 1,000 cores:
  - 1st version = 999 steps
  - 2nd version = 10 steps (the tree needs only about log2(p) steps)
  -> roughly a factor of 100 improvement

Parallel Performance
- Two perspectives: execution time versus throughput

- Parallel execution time = computation time + parallelization overheads

- Parallelization overheads:
  - distribution of work (tasks) to processors
  - information exchange or synchronization
  - idle time
  - etc.
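Stated compactly (the notation is mine, not from the slides): with T_{comp}(p)
the pure computation time on p processing units and T_{over}(p) the
parallelization overheads,

    T_{par}(p) = T_{comp}(p) + T_{over}(p), \qquad S(p) = \frac{T_{serial}}{T_{par}(p)}

Overheads that grow with p (communication, synchronization, idle time) are
what keep the measured speedup S(p) below the ideal value of p.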
Parallel Computing is HARD

“…when we start talking about parallelism and ease of use of truly parallel
computers, we're talking about a problem that's as hard as any that computer
science has faced. …
I would be panicked if I were in industry.”
John Hennessy, President, Stanford University, 2007

Compiler Challenges
- Heterogeneous processors
  - Increase in the design space for code optimization

- Auto-tuners: optimizing code at runtime

- Software-controlled memory management

Parallel Programming Challenges
- Finding enough parallelism (Amdahl's Law! See the formula below this list)

- Granularity

- Locality

- Load balance

- Coordination and synchronization

- Debugging

- Performance modeling / monitoring
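For reference, the standard statement behind the first item (not spelled out on
the slide): if a fraction f of a program's work parallelizes perfectly across p
processors and the rest stays serial, Amdahl's Law bounds the speedup at

    S(p) = \frac{1}{(1 - f) + f/p} \le \frac{1}{1 - f}

e.g. with f = 0.9, even an unlimited number of processors yields at most a
10x speedup.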


LANDSCAPE OF
PARALLEL COMPUTING

Uses of Parallel Computing
- Data mining, web search engines, web-based business services, pharmaceutical
  design, financial and economic modeling, AI, etc.

  Source: http://computing.llnl.gov/tutorials/parallel_comp
Types of Computer Systems

Computer Systems
- Single System: PC/Workstation, SMP/NUMA, Vector, Mainframe
- Distributed Systems (multiple systems): Clusters, Client-Server, Grids,
  Peer-to-Peer, Cloud

Control and Management: Centralised vs Decentralized
Parallel Computing and Parallel Machines
In your pocket:
- ARM big.LITTLE 64-bit (Cortex-A53 + Cortex-A73)
- 4-8 cores, >2 GHz

Intel Skylake architecture:
- Quad-core CPU + multi-core GPU integrated on one chip

Fastest Supercomputer

2022-24 Fastest Supercomputer
Supercomputer Frontier
- 1,206 petaflops
- 8,699,904 cores
- AMD Optimized 3rd Gen EPYC 64C, 2 GHz
- 22 MW (megawatts)
- No. 7* in GREEN500

Video

2020-21 Fastest Supercomputer
Supercomputer Fugaku
- 415 petaflops
- ~7,300,000 ARM A64FX cores
- 29 MW (megawatts)
- No. 33 in GREEN500

Video
Virtual tour

Usage of Frontier
- Energy
  - Fusion energy studies
  - Combustion energy improvements
  - Innovative clean energy systems
  - Materials study
- Social and scientific issues
  - Innovative drug discovery
  - Personalized and preventive medicine
  - Disaster prevention and global climate problems
  - Meteorological and global environmental predictions
  - Evolution of the universe
Deconstructing Frontier
- Node:
  - 1 HPC- and AI-optimized AMD EPYC CPU
  - 4 purpose-built AMD Radeon Instinct GPUs
- 74 cabinets:
  - Over 9,000 nodes
- Interconnect:
  - Multiple Slingshot NICs
  - Slingshot dragonfly network

Deconstructing Fugaku
- Processor (CPU): Fujitsu's 48-core A64FX SoC (optional assistant cores)
- 158,976 nodes (CPUs):
  - 396 racks of 384 nodes
  - 36 racks of 192 nodes
- Fujitsu Tofu (Torus Fusion) interconnect

TOP500 Performance over Time

[Chart: TOP500 performance over time; source: top500.org]

name   prefix   multiplier
exa    E        10^18
peta   P        10^15
tera   T        10^12
giga   G        10^9
mega   M        10^6
kilo   K        10^3
Summary
- Serial versus parallel computing

- Parallel computing: what, why and how

- Performance and challenges

- Landscape of parallel computing

Evolution of Prominent Computers
year location name first
1834 Cambridge Difference engine Programmable computer
1943 Bletchley Colossus Electronic computer
1948 Manchester SSEM (Baby) Stored-program computer
1951 MIT Whirlwind 1 Real-time I/O computer
1953 Manchester Transistor computer Transistorized computer
1971 California Intel 4004 Mass-market CPU & IC
1979 Cambridge Sinclair ZX-79 Mass-market home computer
1981 New York IBM PC Personal computer
1987 Cambridge Acorn A400 series High-street RISC PC sales
1990 New York IBM RS6000 Superscalar RISC processor
1998 California Sun picoJava Computer based on a language

Progression of Computation Speed
year Floating point operations per second (FLOPS)
1941 1
1945 100
1949 1,000 (1 KiloFLOPS, kFLOPS)
1951 10,000
1961 100,000
1964 1,000,000 (1 MegaFLOPS, MFLOPS)
1968 10,000,000
1975 100,000,000
1987 1,000,000,000 (1 GigaFLOPS, GFLOPS)
1992 10,000,000,000
1993 100,000,000,000
1997 1,000,000,000,000 (1 TeraFLOPS, TFLOPS)
2000 10,000,000,000,000
2007 478,000,000,000,000 (478 TFLOPS)
2009 1,100,000,000,000,000 (1.1 PetaFLOPS)
Computing Units
International System of Units (SI):
  name    prefix   multiplier
  exa     E        10^18
  peta    P        10^15
  tera    T        10^12
  giga    G        10^9
  mega    M        10^6
  kilo    K        10^3
  milli   m        10^-3
  micro   µ        10^-6
  nano    n        10^-9
  pico    p        10^-12

International Electrotechnical Commission (IEC):
  name    prefix   multiplier
  exbi    Ei       2^60
  pebi    Pi       2^50
  tebi    Ti       2^40
  gibi    Gi       2^30
  mebi    Mi       2^20
  kibi    Ki       2^10

Example of disk capacity:
  20 Mibytes = 20 MiB = 20 mebibytes = 20 x 2^20 bytes = 20,971,520 bytes
Reference & Readings
- Fugaku technical details:
  - https://www.r-ccs.riken.jp/en/fugaku/

- Reading:
  - Robert Robey, Yuliana Zamora, "Parallel and High-Performance Computing",
    Manning Publications - Chapter 1.
  - A View of the Parallel Computing Landscape, CACM Vol. 52 No. 10, pp. 56-67,
    October 2009. [Technical Report: The Landscape of Parallel Computing
    Research: A View from Berkeley, Dec 2006]

