Lecture 1: INTRODUCTION TO HPC

“Foundations of HPC” course


DATA SCIENCE &
SCIENTIFIC COMPUTING
2022-2023 Stefano Cozzini
First Week
• 10.10 Introduction to HPC
• 11.10 HPC Hardware and parallel computing
• 12.10 HPC Software stack and tools

• 13.10 Tutorial 1/2: Using an HPC system
• 14.10 Tutorial 1/2
Some more information
• Slides and materials of the course are available here:
  https://github.com/Foundations-of-HPC/Foundations_of_HPC_2022
• For each section of the course a directory has been created; information and materials will be uploaded there, e.g. for the introductory part:
  https://github.com/Foundations-of-HPC/Foundations_of_HPC_2022/Basic/intro
Before starting: HPC prefix..
• How much data is produced daily?
• How large is the HD on your laptop?
• How large is your RAM?
• How powerful is the CPU in your laptop?
• How large is the L1 cache of your CPU?
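
A quick way to answer the laptop questions yourself (a minimal Python sketch; the third-party psutil package is an assumption and must be installed, and the L1 cache size is not exposed portably, so it is left out):

    import os
    import psutil  # third-party: pip install psutil

    # Total RAM and size of the root filesystem, in GiB
    print(f"RAM:  {psutil.virtual_memory().total / 2**30:.1f} GiB")
    print(f"Disk: {psutil.disk_usage('/').total / 2**30:.1f} GiB")

    # Logical core count and (where available) the maximum clock rate
    print(f"CPU:  {os.cpu_count()} logical cores")
    freq = psutil.cpu_freq()
    if freq:
        print(f"Clock: {freq.max:.0f} MHz (max)")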
Agenda
• Prologue: why and where HPC?
• What is HPC?
• Performance and metrics
• Supercomputers and TOP500
• Parallel computers
What do they have in common ?
Where and Why HPC ?
• Traditionally, HPC systems (a.k.a. supercomputers) were confined to research and academic labs…
• Today they are everywhere: HPC is now an enabler not just for science but also for business
• Today HPC does not necessarily mean supercomputers
HPC not easy to define..
• High performance computing (HPC), also known as supercomputing, refers to computing systems with extremely high computational power that are able to solve hugely complex and demanding problems.
[Taken from https://ec.europa.eu/digital-single-market/en/high-performance-computing]
Complex problem 1:
Weather forecast..
Recipe:
- Define a mathematical model to describe the problem
- Solve it computationally:
  - Discretize over a 3D grid
  - Integrate the equations
  - Check the results..
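
As a toy illustration of the last three steps (a minimal sketch integrating a 1D heat equation with finite differences; a real weather model is of course vastly more complex):

    import numpy as np

    # Discretize a 1D domain and integrate du/dt = alpha * d2u/dx2
    # with an explicit Euler scheme.
    n, alpha, dx, dt = 100, 1.0, 1.0, 0.4   # dt <= dx**2/(2*alpha) for stability
    u = np.zeros(n)
    u[n // 2] = 100.0                        # initial condition: a hot spot

    for step in range(1000):                 # integrate the equation in time
        u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])

    print(f"Check results: max temperature = {u.max():.2f}")  # has diffused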
Complex problem: climate change over the Mediterranean sea
• What are the requirements in terms of RAM to have decent results?

[Maps of the Mediterranean at 200 km and 25 km resolution]
Complex problem: climate change over the Mediterranean sea
• Resolution:
  – 200 km -> ~1 GB;  2 km -> ? GB

[Maps of the Mediterranean at 200 km and 25 km resolution]
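
A back-of-the-envelope answer (a sketch assuming RAM grows with the number of grid cells, i.e. quadratically in the horizontal resolution for a fixed number of vertical levels; real models also shrink the time step, so compute grows even faster):

    # If a 200 km grid needs ~1 GB, refining the resolution by a factor r
    # multiplies the number of horizontal cells (and hence the RAM) by r**2.
    base_res_km, base_ram_gb = 200, 1.0

    for res_km in (25, 2):
        r = base_res_km / res_km
        print(f"{res_km:>3} km -> ~{base_ram_gb * r**2:,.0f} GB")
    # 25 km -> ~64 GB;  2 km -> ~10,000 GB (roughly 10 TB)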
Complex problems solved by simulations
● Simulation has become the way to research and develop new scientific and engineering solutions.
● It is used nowadays in leading science domains like the aerospace industry, astrophysics, etc.
● Challenges arise related to the complexity, scalability and data production of the simulators.
● These have an impact on the underlying IT infrastructure.
Interested in more examples?
• See chapter one section
1.2 of reference 4
• Look around on the
internet..
Research is changing..
● Inference Spiral of System Science:
As models become more complex and new data bring in more information, we require ever-increasing computational power
Data are flooding us..
• In today's world, larger and larger amounts of data are constantly being generated, from 33 zettabytes globally in 2018 to an expected 175 zettabytes in 2025. As a result, the nature of computing is changing, with an increasing number of data-intensive critical applications. [HPC] is key to processing and analysing this growing volume of data, and to making the most of it for the benefit of citizens, businesses, researchers and public administrations.
[Taken again from https://ec.europa.eu/digital-single-market/en/high-performance-computing]
Data intensive science
Big data challenge: from HPC to HPDA
through AI
● Organizations are expanding their definitions of high-performance computing (HPC) to include workloads such as artificial intelligence (AI) and high-performance data analytics (HPDA) in addition to traditional HPC simulation and modeling workloads.
[From https://insidebigdata.com/2019/07/22/converged-hpc-clusters/]
Complex problem 2:
Protein folding..
Determine the structure of a protein from its amino acid sequence
- Solve it computationally..
  A. Run a very long MD simulation on each sequence and wait..
  B. Predict new structures by training an AI algorithm..
Approach A: Folding@home
• A more than 20-year project
• It allows everybody to run an MD simulation, contributing to the problem..
• Impressive distributed computational power..
• From the statistics page:

  OS        GPUs    CPUs    TFLOPS   GPU TFLOPS   x86 TFLOPS
  Windows   2440+   12864   31417    27504        55243
  Linux     98+     19257   34794    36561        72456
  Total     2538+   32121   73018    64429        128063

Approach B: AlphaFold

• AlphaFold is an AI system
developed by DeepMind that
predicts a protein’s 3D
structure from its amino acid
sequence.
• Presented at CASP in 2018 and 2020, it outperformed all other approaches
• The scores obtained are roughly equivalent to experimentally determined structures
AlphaFold DB
AlphaFold DB, released in July 2022, opens access to over 200 million protein structure predictions to accelerate scientific research.

Full dataset download for AlphaFold Database - UniProt (214M):

• The full dataset of all predictions is available at no cost and under a CC-BY-4.0 licence from Google Cloud Public Datasets. We've grouped this by single species for ease of downloading subsets or all of the data. We suggest that you only download the full dataset if you need to process all the data with local computing resources (the size of the dataset is 23 TiB, ~1M tar files).
How much power does AlphaFold need?
“We train the model on Tensor Processing Unit (TPU)* v3 with a batch size of 1 per
TPU core, hence the model uses 128 TPU v3 cores…
The initial training stage takes approximately 1 week, and the fine-tuning stage
takes approximately 4 additional days.”

* A Tensor Processing Unit (TPU) is an application-specific integrated circuit (ASIC) developed by Google to accelerate machine learning. Google offers TPUs on demand, as a cloud deep learning service called Cloud TPU.
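
Putting the quoted figures together (a rough estimate, counting the week of initial training plus the 4 days of fine-tuning as ~11 days):

    # Rough compute budget implied by the quote above.
    cores = 128          # TPU v3 cores
    days = 7 + 4         # ~1 week initial training + ~4 days fine-tuning

    print(f"~{cores * days * 24:,} TPU-core-hours")   # ~33,792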
FastFold vs. AlphaFold
FastFold: Reducing AlphaFold Training Time
from 11 Days to 67 Hours

We successfully scaled AlphaFold model training to 512 NVIDIA A100 GPUs and obtained an aggregate 6.02 PetaFLOPS at the training stage. The overall training time is reduced to 67 hours from 11 days, with significant economic cost savings.
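
The quoted figures also imply a per-GPU rate and an overall speedup (a quick sanity check, nothing more):

    # Implied per-GPU rate and speedup from the FastFold numbers above.
    aggregate_pflops = 6.02
    gpus = 512

    print(f"~{aggregate_pflops * 1e3 / gpus:.1f} TFLOPS per A100")  # ~11.8
    print(f"speedup: {11 * 24 / 67:.1f}x")                          # 264 h / 67 h ~ 3.9x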
Agenda
• Prologue: why and where HPC?
• What is HPC?
• Performance and metrics
• Supercomputers and TOP500
• Parallel computers
HPC: a second definition
High Performance Computing (HPC) is the use of servers, clusters, and supercomputers – plus associated software, tools, components, storage, and services – for scientific, engineering, or analytical tasks that are particularly intensive in computation, memory usage, or data management.

HPC is used by scientists and engineers, both in research and in production, across industry, government and academia.

[to be continued]
Elements of the HPC ecosystem..
• use of servers, clusters, and supercomputers
→ HARDWARE
• associated software, tools, components,
storage, and services
→ SOFTWARE
• scientific, engineering, or analytical tasks
→ PROBLEMS TO BE SOLVED..
A list of HPC items
• Computational servers
• Accelerators
• High-speed networks
• High-end parallel storage

IS ALL THIS ENOUGH?

• Middleware
• Scientific/technical data analysis software
• Research/technical data
• Problems to be solved
Last but not least: people
• Human capital is by far the most important aspect
• Two important roles:
  – HPC providers: plan/install/manage HPC resources
  – HPC users: make the best use of HPC resources

MIXING/INTERPLAYING ROLES INCREASES COMPETENCE LEVELS
Yet another definition
• HPC incorporates all facets of three disciplines:
  – Technology
  – Methodology
  – Application
• The main defining property and value provided by HPC is delivering performance for end-user applications
Agenda
• Prologue: why and where HPC?
• What is HPC?
• Performance and metrics
• Supercomputers and TOP500
• Parallel computers
It is all about Performance
• It is difficult to define performance properly: “speed” / “how fast” are vague terms
• Performance as a measure is again ambiguous, not clearly defined, and open to interpretation
• In any case, performance is at the core of HPC as a discipline
• Let's discuss it in some detail
Does P stand just for Performance ?
• Performance is not always what matters..
  To reflect a greater focus on the productivity, rather than just the performance, of large-scale computing systems, many believe that HPC should now stand for High Productivity Computing. [from Wikipedia]
• P should also stand for PROFITABILITY
Performance vs Productivity
• A possible definition:
  – Productivity = (application performance) / (application programming effort)
• Example:
  – speeding up a code by a factor of two takes 6 months of work
  – is this worth doing? (see the sketch below)
• People in the HPC arena have different goals in mind, thus different expectations and different definitions of productivity.
• Suggestion: understand which kind of productivity you are interested in
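
Plugging the example into the definition (a sketch with the slide's illustrative numbers):

    # Productivity = (application performance) / (application programming effort)
    speedup = 2.0          # performance gain factor
    effort_months = 6.0    # programming effort

    print(f"{speedup / effort_months:.2f} speedup per person-month")
    # Whether ~0.33x per month is worth it depends on how long and how often
    # the code will run afterwards -- i.e. on your definition of productivity.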
How do we measure (basic) performance of HPC systems?
• How fast can I crunch numbers on my CPUs?
• How fast can I move data around? (see the sketch after this list)
  – from CPUs to memory
  – from CPUs to disk
  – from CPUs on different machines
• How much data can I store?
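
A rough way to probe the second question from Python (a minimal sketch that times a large array copy with NumPy; serious measurements use dedicated benchmarks such as STREAM):

    import time
    import numpy as np

    # Estimate CPU-to-memory bandwidth: the arrays are far larger than any
    # cache, so the copy is dominated by main-memory traffic.
    n = 100_000_000                 # 100M doubles = 0.8 GB per array
    a = np.ones(n)
    b = np.empty_like(a)

    t0 = time.perf_counter()
    np.copyto(b, a)                 # reads 0.8 GB and writes 0.8 GB
    t1 = time.perf_counter()

    print(f"~{2 * a.nbytes / 2**30 / (t1 - t0):.1f} GiB/s effective bandwidth")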
Number crunching on CPU: what do
we count ?
• Rate of [millions/billions of] floating-point operations per second ([M|G]FLOPS)
• Theoretical peak performance:
  – determined by counting the number of floating-point additions and multiplications that can be completed during a period of time, usually the cycle time of the machine

  peak FLOPS = clock_rate × FP_operations_per_cycle_per_core × number_of_cores
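
For example (a sketch with purely illustrative numbers; the FLOPs-per-cycle figure depends on the vector/FMA units of the actual CPU):

    # Peak FLOPS = clock rate * FP operations per cycle per core * cores.
    clock_hz = 2.5e9               # 2.5 GHz
    flops_per_cycle_per_core = 16  # e.g. a core with AVX-512-style FMA units
    cores = 8

    peak = clock_hz * flops_per_cycle_per_core * cores
    print(f"theoretical peak: {peak / 1e9:.0f} GFLOPS")   # 320 GFLOPS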
Sustained (peak) performance
• Real (sustained) performance: a measured quantity

  FLOPS = (total number of floating-point operations executed by a program) / (time the program takes to run, in seconds)

• The number of floating-point operations is not easy to determine for a real application
• Benchmarks are available for that..
• The TOP500 list uses HPL (Linpack):
  – Sustained performance is what matters in the TOP500
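
A quick way to see the gap between peak and sustained performance (a minimal sketch timing a dense matrix multiply, one of the few kernels whose FLOP count, ~2n³, is known exactly):

    import time
    import numpy as np

    n = 2000
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.perf_counter()
    c = a @ b                       # ~2 * n**3 floating-point operations
    t1 = time.perf_counter()

    print(f"sustained: {2 * n**3 / (t1 - t0) / 1e9:.1f} GFLOPS")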
Agenda
• Prologue: why and where HPC?
• What is HPC?
• Performance and metrics
• Supercomputers and TOP500
• Parallel computers
TOP 500 List
• The TOP500 list: www.top500.org
• Published twice a year since 1993:
  – ISC conference in Europe (June)
  – Supercomputing conference in the USA (November)
• Lists the most powerful computers in the world
• Yardstick: the Linpack benchmark (HPL)
HPL: some details
• From http://icl.cs.utk.edu/hpl/index.html:
  – The code solves a uniformly random system of linear equations and reports the time and floating-point execution rate, using a standard formula for the operation count:

  Number_of_floating_point_operations = (2/3)·n³ + 2·n²   (n = size of the system)
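
Applying the formula (a sketch that turns a problem size and a hypothetical run time into an HPL-style rate):

    # HPL operation count for an n x n system: 2/3*n**3 + 2*n**2.
    def hpl_flops(n: int) -> float:
        return (2 / 3) * n**3 + 2 * n**2

    n = 100_000            # problem size (illustrative)
    runtime_s = 3600.0     # hypothetical measured run time: one hour

    print(f"{hpl_flops(n) / runtime_s / 1e9:.0f} GFLOPS")   # ~185 GFLOPS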
HPL & TOP500 List
• For each machine the following numbers are reported using HPL:
  – Rmax: the performance in GFLOPS for the largest problem run on the machine
  – Rpeak: the theoretical peak performance in GFLOPS for the machine
  – the power required to run the benchmark
And the winner is..
Highlights (from www.top500.org)
• the 59th edition of the TOP500 revealed the Frontier
system to be the first true exascale machine with an HPL
score of 1.102 Exaflop/s.
• Frontier brings the pole position back to the USA after it
was held for 2 years by the Fugaku system at RIKEN
Center for Computational Science (R-CCS) in Kobe, Japan.
• Frontier has a peak performance of 1.6 ExaFlop/s and has so far achieved an HPL benchmark score of 1.102 EFlop/s.
• On the HPL-AI benchmark, which measures performance for mixed-precision calculations, Frontier already demonstrated 6.86 Exaflops!
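
From these numbers one can read off Frontier's HPL efficiency (a quick check using the figures above):

    # HPL efficiency = Rmax / Rpeak
    rmax_eflops = 1.102    # measured HPL score
    rpeak_eflops = 1.6     # theoretical peak

    print(f"efficiency: {rmax_eflops / rpeak_eflops:.0%}")   # ~69%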
Other highlights
• A total of 169 systems on the list use accelerator/co-processor technology, up from 151 six months ago. 84 of these use NVIDIA Volta chips, 54 use NVIDIA Ampere, and 8 use NVIDIA Pascal.
• Intel continues to provide the processors for the largest share (77.60 percent) of TOP500 systems, down from 81.60 percent six months ago. 93 (18.60 percent) of the systems in the current list use AMD processors, up from 14.60 percent six months ago.
• The average concurrency level in the TOP500 is 182,864 cores per system, up from 162,520 six months ago.
By country…
By operating system
Performance development
Agenda
• Prologue: why and where HPC?
• What is HPC?
• Performance and metrics
• Supercomputers and TOP500
• Parallel computers
• To be continued
