High Performance
Computing
LECTURE #2
1
Agenda
o What is parallel computing?
o Why Parallel Computers?
o Motivation
o Inevitability of parallel computing
o Application demands
o Technology and architecture trends
o Terminologies
o How is parallelism expressed in a program
o Challenges
2
What is parallel computing?
Multiple processors cooperating concurrently to solve one problem.
3
What is parallel computing?
“A parallel computer is a collection of processing elements that can
communicate and cooperate to solve large problems fast”
Almasi/Gottlieb
“communicate and cooperate”
• Nodes and interconnect architecture
• Problem partitioning (Co-ordination of events in a process)
“large problems fast”
• Programming model
• Match of model and architecture
4
What is parallel computing?
Some broad issues:
• Resource Allocation:
– How large a collection?
– How powerful are the elements?
• Data access, Communication and Synchronization
– How are data transmitted between processors?
– How do the elements cooperate and communicate?
– What are the abstractions and primitives for cooperation?
• Performance and Scalability
– How does it all translate into performance?
– How does it scale? (A service is said to be scalable when it continues to perform well as load and resources grow.)
5
Why Parallel Computers?
• Tremendous advances in microprocessor technology,
e.g. clock rates of processors increased from 40 MHz (MIPS R3000, 1988)
to 2.0 GHz (Pentium 4, 2002)
to 8.429 GHz nowadays (AMD's Bulldozer-based FX chips, 2012)
• Processors are now capable of executing multiple instructions in the same cycle
◦ The fundamental sequence of steps that a CPU performs. Also known as the "fetch-execute
cycle," it is the time in which a single instruction is fetched from memory, decoded and
executed.
◦ The first half of the cycle transfers the instruction from memory to the instruction register
and decodes it. The second half executes the instruction
6
Why parallel computing?
• The ability of the memory system to feed data to the processor at the required rate
has also increased
• In addition, significant innovations in architecture and software have
mitigated the bottlenecks posed by the datapath and memory
◦ Hence the multiplicity of data paths to increase access to storage elements (memory & disk)
7
Motivation
• Sequential architectures are reaching their physical limitations.
• Uniprocessor architectures will not be able to sustain the rate of performance
increments in the future.
• Computation requirements are ever increasing
-- visualization, distributed databases,
-- simulations, scientific prediction (earthquake), etc.
• Accelerating applications.
8
Inevitability of parallel computing
Application demand for performance
• Scientific: weather forecasting, pharmaceutical design, genomics
• Commercial: OLTP, search engine, decision support, data mining
• Scalable web servers
Technology and architecture trends
• limits to sequential CPU, memory, storage performance
• parallelism is an effective way of utilizing growing number of transistors.
• low incremental cost of supporting parallelism
9
Application Demand: Inevitability of parallel Computing
Engineering
• Earthquake and structural modeling
• Design and simulation of micro- and nano-scale systems
• Optimizing performance of modern automobiles

Computational Sciences
• Bioinformatics: functional and structural characterization of genes and proteins
• Astrophysics: exploring the evolution of galaxies
• Weather modeling, flood/tornado prediction

Commercial
• Data mining and analysis for optimizing business and marketing decisions
• Database and Web servers for online transaction processing

Computers
• Embedded systems increasingly rely on distributed control algorithms
• Network intrusion detection, cryptography, etc.
• Networks, mail servers, search engines
• Visualization architectures & entertainment
• Simulation

Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.
Limitations:
– Too difficult -- build large wind tunnels.
– Too expensive -- build a throw-away passenger jet.
– Too slow -- wait for climate or galactic evolution.
– Too dangerous -- weapons, drug design, climate experimentation.
10
Technology and architecture
1- Processor Capacity
11
2- Transistor Count
40% more functions can be performed by a CPU per year
Fundamentally, the use of more transistors improves performance in two ways:
◦ Parallelism: multiple operations done at once (less processing time)
◦ Locality: data references performed close to the processor (less memory latency)
12
3- Clock Rate
30% per year ---> today’s PC is yesterday’s Supercomputer
13
4- Similar Story for Memory and Disk
❖ Divergence between memory capacity and speed
o Capacity increased by 1000X from 1980-95, speed only 2X
o Larger memories are slower, while processors get faster “memory wall”
- Need to transfer more data in parallel
- Need deeper cache hierarchies
- Parallelism helps hide memory latency
❖Parallelism within memory systems too
o New designs fetch many bits within memory chip, follow with fast pipelined
transfer across narrower interface
14
5- Role of Architecture
Greatest trend in VLSI is an increase in the exploited parallelism
• Up to 1985: bit level parallelism:
– 4-bit -> 8 bit -> 16-bit – slows after 32 bit
• Mid 80s to mid 90s: Instruction Level Parallelism (ILP)
– pipelining and simple instruction sets (RISC)
– on-chip caches and functional units => superscalar execution
– Greater sophistication: out of order execution, speculation
• Nowadays:
– Hyper-threading
– Multi-core
15
❖ Definition
High-performance computing (HPC) is the use of parallel processing for
running advanced application programs efficiently, reliably and quickly.
17
Conclusions
• The hardware evolution, driven by Moore’s law, was geared toward two
things:
– exploiting parallelism
– Dealing with memory (latency, capacity)
18
Terminologies
❑ Core: a single computing unit with its own independent control
❑ Multicore is a processor having several cores that can access the same memory
concurrently
❑ A computation is decomposed into several parts called Tasks that can be computed
in parallel
❑Finding enough parallelism is (one of the) critical steps for high performance
(Amdahl’s law).
19
Performance Metrics
❑ Execution time:
The time elapsed between the beginning and the end of its execution.
❑Speedup:
The ratio between serial and parallel execution time.
Speedup= Ts/Tp
❑ Efficiency:
Ratio of speedup to the number of processors.
Efficiency= Speedup/P
20
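As a small Python sketch of these two metrics (the timing numbers Ts, Tp and the processor count P are made up purely for illustration):

```python
# Hypothetical timings: serial run Ts and parallel run Tp on P processors.
Ts = 12.0   # serial execution time in seconds (assumed)
Tp = 2.0    # parallel execution time on P processors (assumed)
P = 8

speedup = Ts / Tp          # Speedup = Ts / Tp
efficiency = speedup / P   # Efficiency = Speedup / P

print(speedup)     # 6.0
print(efficiency)  # 0.75
```

An efficiency below 1.0 reflects the overheads (communication, idling) discussed later in this lecture.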
❑ Amdahl’s Law
Used to predict maximum speedup using multiple processors.
• Let f = fraction of work performed sequentially.
• (1 - f) = fraction of work that is parallelizable.
• P = number of processors
On 1 cpu: T1 = f + (1 – f ) = 1.
On P processors: Tp = f + (1 − f)/p
• Speedup = T1/Tp = 1/(f + (1 − f)/p) < 1/f
Speedup limited by sequential part
21
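The bound above can be checked numerically with a short Python sketch (the value f = 0.1 is an arbitrary example of a 10% sequential fraction):

```python
def amdahl_speedup(f, p):
    """Speedup on p processors when fraction f of the work is sequential."""
    return 1.0 / (f + (1.0 - f) / p)

# With f = 0.1, even arbitrarily many processors cannot exceed 1/f = 10x.
print(amdahl_speedup(0.1, 8))      # ~4.71 on 8 processors
print(amdahl_speedup(0.1, 10**9))  # approaches, but never reaches, 10.0
```

Note how quickly the returns diminish: going from 8 processors to a billion buys barely a factor of two, because the sequential part dominates.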
How is parallelism expressed in a
program
IMPLICITLY
❑ Define tasks only, rest implied; or define tasks and work decomposition, rest implied.
❑ OpenMP is a high-level parallel programming model, which is mostly an implicit model.

EXPLICITLY
❑ Define tasks, work decomposition, data decomposition, communication, synchronization.
❑ MPI is a library for fully explicit parallelization.
23
1- IMPLICITLY
❑ It is a characteristic of a programming language that allows a compiler or interpreter to
automatically exploit the parallelism inherent to the computations expressed by some of the
language's constructs.
❑ A pure implicitly parallel language does not need special directives, operators or functions
to enable parallel execution.
❑ Programming languages with implicit parallelism include Axum, HPF, Id, LabVIEW, and
MATLAB M-code.
❑ Example: when taking the sine or logarithm of a group of numbers, a language that provides
implicit parallelism might allow the programmer to write the instruction over the whole group at once.
24
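As a sketch of this style in Python, NumPy's whole-array expressions are a commonly cited example: one instruction covers the whole group of numbers, and any parallel evaluation is left to the implementation (the array contents here are arbitrary):

```python
import numpy as np

# One whole-array expression: no directives, no task division in the
# source code -- the runtime is free to evaluate element-wise in parallel.
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.sin(a)   # a single high-level instruction over the whole group
print(b)
```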
Advantages
❑ A programmer does not need to worry about task division or process communication,
❑ focusing instead on the problem that his or her program is intended to solve.
❑ It generally facilitates the design of parallel programs.

Disadvantages
❑ It reduces the control that the programmer has over the parallel execution of the program,
❑ sometimes resulting in less-than-optimal parallel efficiency.
❑ Sometimes debugging is difficult.
25
2- EXPLICITLY
How is parallelism expressed in a program
❑ It is the representation of concurrent computations by means of primitives, in the form
of special-purpose directives or function calls.
❑Most parallel primitives are related to process synchronization, communication or task
partitioning.
Advantages
❑ The programmer has absolute control over the parallel execution.
❑ A skilled parallel programmer takes advantage of explicit parallelism to produce very
efficient code.

Disadvantages
❑ Programming with explicit parallelism is often difficult, especially for non-computing
specialists,
❑ because of the extra work involved in planning the task division and synchronization of
concurrent tasks.
26
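A minimal Python sketch of explicit parallelism (the 4-way decomposition and the shared total are illustrative choices, not a prescribed pattern): the programmer spells out the work distribution and the synchronization by hand.

```python
import threading

# Explicit parallelism: the programmer decides the work decomposition
# (one slice of the data per thread) and the synchronization (a lock
# guarding the shared total) -- nothing is inferred automatically.
data = list(range(1, 101))
total = 0
lock = threading.Lock()

def partial_sum(chunk):
    global total
    s = sum(chunk)          # each thread's private computation
    with lock:              # explicit synchronization on shared state
        total += s

threads = [threading.Thread(target=partial_sum, args=(data[i::4],))
           for i in range(4)]        # explicit 4-way work distribution
for t in threads:
    t.start()
for t in threads:
    t.join()                # explicit wait for all workers

print(total)  # 5050
```

In a library such as MPI the same decisions (who owns which data, when to communicate, when to wait) are made just as explicitly, only across processes rather than threads.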
Think Different
How many people doing the work → (Degree of Parallelism)
What is needed to begin the work → (Initialization)
Who does what → (Work distribution)
Access to work part → (Data/IO access)
Whether they need info from each other to finish their own job → (Communication)
When are they all done → (Synchronization)
What needs to be done to collate the result
27
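The checklist above can be mapped onto a tiny Python map-reduce sketch (the worker count and data are arbitrary illustrations):

```python
from concurrent.futures import ThreadPoolExecutor

# Degree of parallelism: 4 workers.  Initialization: building the chunks.
# Work distribution: executor.map.  Communication: none between workers.
# Synchronization + collation: gathering and summing the partial results.
def work(chunk):
    return sum(x * x for x in chunk)   # each worker's independent share

data = list(range(10))
chunks = [data[i::4] for i in range(4)]        # work distribution

with ThreadPoolExecutor(max_workers=4) as ex:  # degree of parallelism
    partials = list(ex.map(work, chunks))      # implicit barrier on exit

result = sum(partials)                         # collate the result
print(result)  # 285
```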
Challenges
All parallel programs contain:
❑ Parallel sections
❑ Serial sections
❑ Serial sections are where work is being duplicated or no useful work is being done (e.g.
waiting for others)
Building efficient algorithms means avoiding:
❑ Communication delay
❑ Idling
❑ Synchronization
28
Sources of overhead in parallel programs
❑ Inter process interaction:
The time spent communicating data between processing elements is
usually the most significant source of parallel processing overhead.
❑ Idling:
Processes may become idle due to many reasons such as load
imbalance, synchronization, and presence of serial components in a
program.
❑ Excess Computation:
The fastest known sequential algorithm for a problem may be difficult
or impossible to parallelize, forcing us to use a parallel algorithm
based on a poorer but easily parallelizable sequential algorithm.
29