
High Performance Computing (HPC)
Lecture 2

By: Dr. Maha Dessokey


Agenda

• Parallel Computer Memory Architectures
• Multithreading vs. Multiprocessing
• Designing Parallel Programs
• HPC Cluster Architecture

Parallel Computer Memory Architectures

• Shared Memory
  All processors access all memory as a single global address space.
  Data sharing is fast.
  There is a lack of scalability between memory and CPUs.
Parallel Computer Memory Architectures (Contd.)

• Shared Memory
  Advantages:
  - Global address space provides a user-friendly programming perspective to memory.
  - Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs.
  Disadvantages:
  - Lack of scalability between memory and CPUs.
  - The programmer is responsible for the synchronization constructs that ensure "correct" access of global memory.
  - Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever increasing numbers of processors.
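A rough illustration of the shared-memory model is sketched below using OpenMP (assuming a compiler with OpenMP support, e.g. gcc -fopenmp): every thread sees the same array, and the reduction clause supplies the synchronization the programmer is responsible for.

    /* Minimal shared-memory sketch with OpenMP: all threads share a[],
       and reduction(+:sum) synchronizes the concurrent updates to sum. */
    #include <stdio.h>

    int main(void)
    {
        double a[1000], sum = 0.0;

        for (int i = 0; i < 1000; i++)          /* initialize shared data */
            a[i] = 0.5 * i;

        #pragma omp parallel for reduction(+:sum)   /* threads share a[] */
        for (int i = 0; i < 1000; i++)
            sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }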
Parallel Computer Memory Architectures (Contd.)

• Distributed Memory
  Each processor has its own memory.
  It is scalable, with no overhead for cache coherency.
  The programmer is responsible for many details of communication between processors.
Parallel Computer Memory Architectures (Contd.)

• Distributed Memory
  Advantages:
  - Memory is scalable with the number of processors.
  - Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency.
  - Cost effectiveness: can use commodity, off-the-shelf processors and networking.
  Disadvantages:
  - The programmer is responsible for many of the details associated with data communication between processors.
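A minimal sketch of the distributed-memory model, assuming an MPI installation and at least two ranks (e.g. mpirun -np 2): each rank owns its own memory, so data must be moved with explicit messages. The value 42 and the ranks used are illustrative.

    /* Each MPI rank has private memory; rank 0 sends one value to rank 1. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }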
Agenda

• Parallel Computer Memory Architectures
• Multithreading vs. Multiprocessing
• Designing Parallel Programs
• HPC Cluster Architecture

Multithreading vs. Multiprocessing

• Threads share the same process memory space and global variables between routines.
• A process is "heavyweight": a completely separate program with its own variables, stack, and memory allocation.
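A small POSIX sketch (assuming a Unix-like system, compiled with -pthread; error handling omitted) contrasting the two: a thread's write to a global variable is visible to the whole process, while a fork()ed child only changes its own copy.

    /* Thread vs. process: shared memory vs. separate copies of a global. */
    #include <stdio.h>
    #include <pthread.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int shared = 0;                               /* global variable */

    static void *bump(void *arg) { shared = 1; return NULL; }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, bump, NULL);
        pthread_join(t, NULL);
        printf("after thread: shared = %d\n", shared);  /* prints 1 */

        shared = 0;
        pid_t pid = fork();
        if (pid == 0) { shared = 1; _exit(0); }         /* child changes its own copy */
        waitpid(pid, NULL, 0);
        printf("after fork:   shared = %d\n", shared);  /* still 0 in the parent */
        return 0;
    }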
Agenda

• Parallel Computer Memory Architectures
• Multithreading vs. Multiprocessing
• Designing Parallel Programs
• HPC Cluster Architecture

Designing Parallel Programs

1. Understand the Problem and the Program
2. Partitioning
3. Communication and Data Dependencies
4. Mapping
1- Understand the Problem and the Program

• Understand the problem you want to solve in parallel, including any existing serial code, if applicable.
• Before developing a parallel solution, confirm that the problem can actually be parallelized.
Examples of non-parallelizable problems

• Sequential Dependency Problems
  Calculation of the Fibonacci series (1,1,2,3,5,8,13,21,...) by use of the formula:
  F(k + 2) = F(k + 1) + F(k)
  (see the sketch after this list)
• Input/Output Bound Tasks
  File Compression/Decompression: if the process requires sequentially reading or writing data, it cannot be effectively parallelized.
• Dynamic Programming Problems with Dependencies
  Knapsack Problem: the optimal solution for one subproblem may depend on the solutions to other subproblems in a specific order.
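A minimal sketch of the sequential dependency in the Fibonacci example: each iteration needs the two previous results, so the iterations cannot run independently in parallel. (The array size and printout are illustrative.)

    /* Loop-carried dependence: iteration k+2 needs the results of k and k+1. */
    #include <stdio.h>

    int main(void)
    {
        long f[30] = {1, 1};
        for (int k = 0; k + 2 < 30; k++)
            f[k + 2] = f[k + 1] + f[k];    /* depends on earlier iterations */
        printf("F(30) = %ld\n", f[29]);
        return 0;
    }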
Embarrassingly Parallel Computations

• A computation that can obviously be divided into a number of completely independent parts, each of which can be executed by a separate process(or).
• No communication, or very little communication, between processes: each process can do its tasks without any interaction with other processes.
Embarrassingly Parallel Computations (Contd.)

• Practical embarrassingly parallel computation with static process creation and a master-slave approach (the MPI approach).
Embarrassingly Parallel Computation Examples

• Low-level image processing
  x and y are the original coordinates; x' and y' are the new coordinates.
  Shifting: object shifted by Dx in the x-dimension and Dy in the y-dimension:
    x' = x + Dx,  y' = y + Dy
  Scaling: object scaled by a factor Sx in the x-direction and Sy in the y-direction:
    x' = x * Sx,  y' = y * Sy
  Rotation: object rotated through an angle θ about the origin of the coordinate system:
    x' = x cos θ + y sin θ,  y' = -x sin θ + y cos θ
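A sketch of why such low-level image operations are embarrassingly parallel, here using the shifting formula with illustrative image dimensions and an OpenMP parallel loop: each pixel's new coordinates depend only on its own old coordinates, so no communication is needed.

    /* Every (x, y) is transformed independently; the loops can be split
       across any number of threads or processes with no interaction. */
    #include <stdio.h>

    #define WIDTH  640
    #define HEIGHT 480

    int main(void)
    {
        int dx = 10, dy = 5;                    /* illustrative shift */

        #pragma omp parallel for collapse(2)
        for (int y = 0; y < HEIGHT; y++)
            for (int x = 0; x < WIDTH; x++) {
                int xp = x + dx;                /* x' = x + Dx */
                int yp = y + dy;                /* y' = y + Dy */
                (void)xp; (void)yp;             /* a real code would write the output image */
            }

        printf("done\n");
        return 0;
    }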
Identify the program's hotspots

• Know where most of the real work is being done. The majority of scientific and technical programs usually accomplish most of their work in a few places (functions).
• Profilers and performance analysis tools can help here.
• Focus on parallelizing the hotspots and ignore those sections of the program that account for little CPU usage.
Identify bottlenecks in the program

• Are there areas that are disproportionately slow, or that cause parallelizable work to halt or be deferred? For example, I/O is usually something that slows a program down.
• It may be possible to restructure the program or use a different algorithm to reduce or eliminate unnecessarily slow areas.
Other considerations

• Identify inhibitors to parallelism. One common class of inhibitor is data dependence, as demonstrated by the Fibonacci sequence above.
• Investigate other algorithms if possible. This may be the single most important consideration when designing a parallel application.
Designing Parallel Programs

1. Understand the Problem and the Program
2. Partitioning
3. Communication and Data Dependencies
4. Mapping
2- Designing Parallel Programs - Partitioning

• Breaking the problem into discrete "chunks" of work that can be distributed to multiple tasks. This is known as decomposition or partitioning.
• There are two basic ways to partition computational work among parallel tasks:
  - Domain Decomposition
  - Functional Decomposition
Partitioning - Domain Decomposition

• Domain Decomposition: in this type of partitioning, the data associated with a problem is decomposed. Each parallel task then works on a portion of the data.
Partitioning - Domain Decomposition (contd.)

• There are different ways to partition the data.
Partitioning - Functional Decomposition

• The focus is on the computation that is to be performed rather than on the data manipulated by the computation. The problem is decomposed according to the work that must be done. Each task then performs a portion of the overall work.
Partitioning Examples

• Operations on sequences of numbers, such as simply adding them together (n = number of elements, p = number of processors); a sketch follows below.
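A sketch of the domain-decomposed sum, assuming MPI: each of the p ranks sums its own block of roughly n/p elements, and the partial results are combined with MPI_Reduce. The data values are placeholders.

    /* Block decomposition of a sum of N numbers over p MPI ranks. */
    #include <stdio.h>
    #include <mpi.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank, p;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        /* Rank r owns indices [lo, hi). */
        long lo = (long)N * rank / p;
        long hi = (long)N * (rank + 1) / p;

        double local = 0.0, total = 0.0;
        for (long i = lo; i < hi; i++)
            local += (double)i;                  /* stand-in for the real data */

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %.0f\n", total);

        MPI_Finalize();
        return 0;
    }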
Partitioning Examples (contd.)

• Ecosystem Modeling
  Each program calculates the population of a given group, where each group's growth depends on that of its neighbors. As time progresses, each process calculates its current state, then exchanges information with the neighbor populations. All tasks then progress to calculate the state at the next time step.
Designing Parallel Programs

1. Understand the Problem and the Program
2. Partitioning
3. Communication and Data Dependencies
4. Mapping
3- Designing Parallel Programs - Communications

• No Communication Needed
  Some problems can be executed in parallel with minimal data sharing. These are known as embarrassingly parallel problems due to their simplicity and minimal inter-task communication.
• Communication Required
  Most parallel applications are not quite so simple, and do require tasks to share data with each other. For instance, in a 3-D heat diffusion problem, each task needs temperature information from neighboring tasks, as changes in neighboring data directly impact its own results.
Communications - Factors to Consider

1- Cost of communications
• Overhead: inter-task communication consumes machine cycles and resources that could be used for computation.
• Synchronization: communication often requires synchronization, causing tasks to spend time waiting instead of working.
• Bandwidth Saturation: competing communication traffic can saturate network bandwidth, worsening performance issues.
Communications - Factors to Consider (contd.)

2- Key Communication Metrics
• Latency: the time to send a minimal message (0 bytes) from point A to point B, typically measured in microseconds.
• Bandwidth: the amount of data transmitted per unit of time, commonly expressed in megabytes per second.
Sending many small messages makes latency dominate, so it is more efficient to combine them into larger messages to increase the effective communication bandwidth.
Communications - Factors to Consider (contd.)

3- Communication Types
• Synchronous Communication: requires "handshaking" between tasks, either explicitly coded or handled at a lower level. It is called blocking communication because other work must wait until the communication has completed.
• Asynchronous Communication: allows tasks to transfer data independently. For instance, task 1 can send a message to task 2 and continue working without waiting for the data to be received. This is known as non-blocking communication, since other work can proceed in the meantime.
The main advantage of asynchronous communication is the ability to interleave computation with communication, maximizing efficiency; a sketch contrasting the two styles follows below.
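A sketch of the two styles in MPI (ranks, tag, and payload are illustrative): rank 1 uses a blocking MPI_Recv, while rank 0 posts a non-blocking MPI_Isend so it can keep computing before completing the transfer with MPI_Wait.

    /* Blocking vs. non-blocking point-to-point communication in MPI. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, data = 0;
        MPI_Request req;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            data = 7;
            MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* ... rank 0 can keep computing while the message is in flight ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);          /* complete the send */
        } else if (rank == 1) {
            MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);  /* blocking */
            printf("rank 1 got %d\n", data);
        }

        MPI_Finalize();
        return 0;
    }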
Communications - Factors to Consider (contd.)

4- Scope of communications
Identifying which tasks need to communicate is crucial during the design of parallel code. Both scopes below can be implemented either synchronously or asynchronously:
• Point-to-Point: involves two tasks, with one acting as the sender (producer) and the other as the receiver (consumer).
• Collective: involves data sharing among multiple tasks, typically organized into a common group or collective.

Collective Communications Example
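A minimal MPI sketch of collective communication (the broadcast value and the per-rank work are illustrative): rank 0 broadcasts a value to every task, and the results are then combined with MPI_Reduce.

    /* Collective operations: MPI_Bcast to all ranks, MPI_Reduce back to rank 0. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, n = 0, sum = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) n = 10;
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);        /* every rank gets n */

        int mine = rank * n;                                 /* stand-in work */
        MPI_Reduce(&mine, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("reduced sum = %d\n", sum);

        MPI_Finalize();
        return 0;
    }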
Designing Parallel Programs - Data Dependencies

• A dependence exists between program statements when the order of statement execution affects the results of the program.
• A data dependence results from multiple uses of the same location(s) in storage by different tasks.
• Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.
Designing Parallel Programs - Data Dependencies

Example: computing [B] = [A][B]

  [B new] = [A][B old]:   B_new(i,j) = Σ_{k=1..n} A(i,k) * B_old(k,j)

Every new element of [B] depends on the old values of [B], so the old matrix must be preserved while the new one is computed, for example by accumulating into a temporary:

  Temp(i,j) = Σ_{k=1..n} A(i,k) * B(k,j)
  [B new] = Temp
Designing Parallel Programs - Data Dependencies (contd.)

[Figure: dependence on a 2-D grid — the value at element (i,j) depends on neighboring elements such as (i-1,j), (i,j-1), and (i,j+1).]
Designing Parallel Programs - Data Dependencies (contd.)

How to Handle Data Dependencies?
• Distributed memory architectures: communicate required data at synchronization points.
• Shared memory architectures: synchronize read/write operations between tasks (see the sketch below).
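A minimal shared-memory sketch using OpenMP (assuming gcc -fopenmp): without the atomic directive, the concurrent updates to the shared counter would race, so the read/write is synchronized.

    /* Synchronizing conflicting updates to shared data on a shared-memory machine. */
    #include <stdio.h>

    int main(void)
    {
        long hits = 0;

        #pragma omp parallel for
        for (long i = 0; i < 1000000; i++) {
            #pragma omp atomic          /* serialize the conflicting update */
            hits++;
        }

        printf("hits = %ld\n", hits);
        return 0;
    }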
Designing Parallel Programs

1. Understand the Problem and the Program
2. Partitioning
3. Communication and Data Dependencies
4. Mapping
4- Designing Parallel Programs - Mapping

1- Load balancing
• Used to distribute computations fairly across processors in order to obtain the highest possible execution speed.
• Means distributing work among tasks so that all tasks are kept busy all of the time.
Mapping - Load balancing

• Imperfect load balancing leads to increased execution time.
• Perfect load balancing keeps all tasks busy for the full execution.
Mapping - Load balancing

How to Achieve Load Balance?
(1) Equally partition the work each task receives.
(2) Use dynamic work assignment (see the sketch below).
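A sketch of dynamic work assignment in the master/worker style with MPI (the task count, tags, and the -1 "stop" sentinel are illustrative choices): rank 0 hands out one task index at a time, so faster workers automatically receive more work.

    /* Master/worker dynamic load balancing over MPI. */
    #include <stdio.h>
    #include <mpi.h>

    #define NTASKS 100

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                                    /* master */
            int next = 0, done = 0;
            MPI_Status st;
            while (done < size - 1) {
                int dummy;
                MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
                int task = (next < NTASKS) ? next++ : -1;   /* -1 = no more work */
                if (task < 0) done++;
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
            }
        } else {                                            /* worker */
            int task, request = 0;
            for (;;) {
                MPI_Send(&request, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);   /* ask for work */
                MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (task < 0) break;
                /* ... process task here ... */
            }
        }

        MPI_Finalize();
        return 0;
    }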
Agenda

• Parallel Computer Memory Architectures
• Multithreading vs. Multiprocessing
• Designing Parallel Programs
• HPC Cluster Architecture

HPC Platforms

• Vertical Scaling (scale up): installing more processors, more memory, and faster hardware in a single machine (e.g., supercomputers, FPGA-accelerated systems).
• Horizontal Scaling (scale out): multiple independent machines are added together (e.g., MPI-based clusters).
Vertical vs. Horizontal HPC platforms

Vertical HPC Platforms
• Integration: components (CPUs, memory, storage) are tightly integrated within a single system, which can lead to lower latency and higher bandwidth for data transfers.
• Scalability: performance is achieved by adding more powerful components (e.g., more CPUs or GPUs) within the same system, allowing for significant performance gains without the complexities of inter-node communication.
• Efficiency: often optimized for specific tasks, which can lead to better overall performance for those tasks due to reduced overhead in communication and resource management.
Vertical vs. Horizontal HPC platforms

Horizontal HPC Platforms
• Distributed Architecture: comprises many individual nodes (often commodity hardware) connected via a network. Each node operates independently, which can introduce latency in communication.
• Scalability: can scale out by adding more nodes, allowing for potentially limitless growth in computational power, but performance gains may be limited by network bandwidth and latency.
• Flexibility: more adaptable to different workloads and able to use a wider range of hardware, but may require more complex resource management and optimization.
How to measure computer performance?

• What do we mean by "performance"?
  For scientific and technical programming, use FLOPS: FLoating point OPerations per Second.
  Examples of floating-point operations: 1.324398404 + 3.6287414 = ?   2.365873534 * 2443.3147 = ?
• Modern supercomputers are measured in PFLOPS (PetaFLOPS).
  Kilo, Mega, Giga, Tera, Peta, Exa = 10^3, 10^6, 10^9, 10^12, 10^15, 10^18
How to measure computer performance?

• Floating-point operations per second:
  FLOPS = nodes × cores/node × cycles/second × FLOPs/cycle
• The 3rd term, clock cycles per second, is the clock frequency, typically 2–3 GHz.
• The 4th term, FLOPs per cycle, is how many floating-point operations are done in one clock cycle.
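For example, a hypothetical cluster with 100 nodes, 32 cores per node, a 2.5 GHz clock, and 16 FLOPs per cycle (illustrative numbers, not a specific machine) has a theoretical peak of 100 × 32 × 2.5×10^9 × 16 = 1.28×10^14 FLOPS, i.e. 128 TFLOPS.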
HPC - Benchmarking

• The LINPACK Benchmarks are a measure of a system's floating-point computing power.
• The aim is to approximate how fast a computer will perform when solving real problems.
• The peak performance is the maximal theoretical performance a computer can achieve. The actual performance will always be lower than the peak performance.
HPC - Benchmarking

• HPL is a portable implementation of LINPACK written in C. Originally intended as a guideline, it is now widely used to provide data for the TOP500 list, though other technologies and packages can be used. HPL generates a linear system of equations of order n and solves it using LU decomposition with partial row pivoting. It requires installed implementations of MPI and either BLAS or VSIPL to run.
• Rmax - maximal LINPACK performance achieved (actual)
• Rpeak - theoretical peak performance


Top 500 Supercomputers

June 2024 | TOP500

Rank: 1
System: Frontier - HPE Cray EX235a (HPE)
Site: DOE/SC/Oak Ridge National Laboratory, United States
Cores: 8,699,904
Rmax: 1,206.00 PFlop/s
Rpeak: 1,714.81 PFlop/s
Power: 22,786 kW
HPC Cluster Architecture

HPC cluster components
• Nodes: individual computers in the cluster.
• Cores (threads): individual processing units available within each CPU of each node.
  e.g., a node with eight "quad"-core CPUs = 32 cores for that node.
• Shared disk: storage that can be shared (and accessed) by all nodes.
Questions?
