Module 1-Topic 1

hpc

Uploaded by

krishna teja

HIGH PERFORMANCE COMPUTING
Module 1-Contents
Topics :
1. High-performance Computing Disciplines
2. Impact of Supercomputing in Science and Security
3. Anatomy of a Supercomputer
4. Computer Performance
5. A Brief History of Supercomputing
Topic 1: High Performance Computing Disciplines

1.1 Definition
1.2 Application Programs
1.3 Performance and Metrics
1.4 High Performance Computing Systems
1.5 Supercomputing Problems
1.6 Application Programming

1.1 High Performance Computing (HPC) or Supercomputers -
Definition
• To solve large, complex problems by:
• Breaking down the computational portions of the algorithm into concurrent
instructions and deploying a number of processors to work in parallel.

• Parallel computers (multicore) -> cluster -> a group of clusters -> HPC
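As a minimal sketch of this decomposition (using Python's standard library; the chunk size and worker count are illustrative choices, not properties of any particular system), a large sum can be broken into concurrent partial sums:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Worker task: sum one slice of the data independently.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Break the computation into concurrent pieces: split the data into
    # roughly equal chunks, sum each chunk in its own worker, then
    # combine the partial results.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks)
    return sum(partials)
```

The final combining step mirrors the "reduction" stage found in many parallel algorithms: independent workers compute in parallel, then a cheap serial step merges their results.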

• Parallel computing runs multiple tasks simultaneously on
numerous computer servers or processors.
• HPC uses massively parallel computing, which uses tens of
thousands to millions of processors or processor cores.
• An HPC cluster comprises multiple high-speed computer
servers networked with a centralized scheduler that manages
the parallel computing workload.
• The computers, called nodes, use either high-performance
multi-core CPUs or—more likely today—GPUs, which are well
suited for rigorous mathematical calculations, machine
learning (ML) models and graphics-intensive tasks.
• A single HPC cluster can include 100,000 or more nodes.

• All the other computing resources in an HPC cluster—such
as networking, memory, storage and file systems—are high speed and
high throughput. They are also low-latency components that can keep
pace with the nodes and optimize the computing power and
performance of the cluster.
• HPC workloads rely on a message passing interface (MPI), a standard
library and protocol for parallel programming that allows
processes to communicate between nodes in a cluster or across a
network.
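Production MPI codes link against an MPI implementation such as MPICH or Open MPI. As a dependency-free stand-in, this sketch mimics MPI's blocking send/receive pattern between two "ranks" with Python threads and queues; the rank functions, message values, and queue transport are all illustrative:

```python
import queue
import threading

def rank0(to_peer, from_peer, results):
    # Loosely analogous to MPI_Send followed by MPI_Recv on rank 0.
    to_peer.put(21)                      # send a value to rank 1
    results["reply"] = from_peer.get()   # block until rank 1 replies

def rank1(to_peer, from_peer):
    # Rank 1 receives, transforms, and sends the result back.
    msg = from_peer.get()
    to_peer.put(msg * 2)

def run_ping_pong():
    # One queue per direction stands in for the cluster interconnect.
    q01, q10 = queue.Queue(), queue.Queue()
    results = {}
    t0 = threading.Thread(target=rank0, args=(q01, q10, results))
    t1 = threading.Thread(target=rank1, args=(q10, q01))
    t0.start(); t1.start()
    t0.join(); t1.join()
    return results["reply"]
```

The blocking `get()` calls play the role of MPI's synchronous receives: each rank waits until its peer's message arrives before proceeding.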

1.2 Application Programs - SAMPLE APPLICATION AREAS OF HPC

(Figure slides showing sample application areas of HPC)
Addressing the Big Questions
• What grand challenge applications demand these capabilities?
• How do users program such systems?
– What languages and in what environments?
– What are the semantics and strategies?
• How to manage supercomputer resources to deliver useful
computing capabilities?
– What are the hardware mechanisms?
– What are the software policies?
• What are the computational models and algorithms that can map the
innate application properties to the physical medium of the
machine?
• How to integrate enabling technologies into computing engines?
• How to push the performance to extremes?
– What are the enabling conditions?
– What are the inhibiting factors?
1.3 Performance Metrics

• The most widely used metric in HPC is FLOPS (floating-point operations per second).
• Gigaflops (GFLOPS) = 1E9 flops (10^9): one billion operations per second.
• Teraflops (TFLOPS) = 1E12 flops (10^12): one trillion (1,000,000,000,000) operations
per second.
• Petaflops (PFLOPS) = 1E15 flops (10^15): one quadrillion operations per second.
1 PFLOPS is 1,000 TFLOPS.
• Exaflops (EFLOPS) = 1E18 flops (10^18): one quintillion (1,000,000,000,000,000,000)
operations per second. 1 EFLOPS is 1,000 PFLOPS.
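These prefixes are plain powers of ten, which a small helper makes concrete (the table and function names here are only for illustration):

```python
# Powers of ten for the common FLOPS prefixes.
FLOPS_UNITS = {
    "GFLOPS": 1e9,   # gigaflops
    "TFLOPS": 1e12,  # teraflops
    "PFLOPS": 1e15,  # petaflops
    "EFLOPS": 1e18,  # exaflops
}

def to_flops(value, unit):
    # Convert a prefixed rate such as 17.6 PFLOPS into plain FLOPS.
    return value * FLOPS_UNITS[unit]
```

Each step down the table is a factor of 1,000, matching the 1 PFLOPS = 1,000 TFLOPS and 1 EFLOPS = 1,000 PFLOPS relationships above.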

1.3 Performance Metrics

• A laptop with a 3 GHz processor can perform around 3 billion calculations per
second.
• High-performance computers can perform quadrillions of calculations per second.
A quadrillion is a number with 15 zeros (1,000,000,000,000,000), so this rate is
written as PFLOPS (petaflops) for floating-point operations, or PetaOPS more
generally.
• An HPC system is a supercomputer containing thousands of nodes (servers) working
together to complete one or more tasks. This is called "parallel processing."
• High-performance computing architecture comprises several servers networked
together, also known as an HPC cluster. Each server in this cluster is called a node,
and the nodes work together to increase processing speed.
HPC systems consume more power and space, and generate noise

• Space: HPC machines appear as rows upon rows of racks, taking up thousands of
square feet.
• Power consumption: potentially multiple megawatts of electrical power.
• Noise and heat: the machines generate a lot of noise and rapidly shifting
temperature gradients.
Biggest Supercomputer in the US

The TITAN petaflops machine was fully deployed at Oak Ridge National Laboratory in
2013. It takes up more than 4000 square feet and consumes approximately 8
megawatts of electrical power. It has a theoretical peak performance of over 27
petaflops and delivers 17.6 petaflops Rmax sustained performance on the HPL
(Linpack) benchmark. The architecture includes NVIDIA GPU accelerators.
1.4 High Performance Computing Systems :
Conventional Heterogeneous Multicore System Architecture

(Diagram: nodes 1 through N, each containing multicore sockets with a core array,
scratchpad memory, memory banks, an accelerator, and a NIC, all connected by a
global interconnection network.)
1.4 High Performance Computing Systems : Conventional
Heterogeneous Multicore System Architecture
1. Core Array: A structured arrangement of multiple processor cores
designed for parallel processing.
2. Multicore Socket: A single CPU package containing multiple cores,
allowing for simultaneous processing of multiple tasks.
3. NIC (Network Interface Card): A hardware component enabling
high-speed network communication between nodes in an HPC cluster.
4. Scratchpad Memory: Fast, local memory used to store frequently
accessed temporary data, enhancing computational efficiency.

1.4 High Performance Computing Systems : Conventional Heterogeneous
Multicore System Architecture

1. Core Array
• A core array in HPC refers to a configuration of multiple processor cores
arranged in a structured, interconnected grid or array.
• This setup is designed to maximize computational efficiency and
parallelism.
• Each core in the array can execute instructions and perform calculations
independently, and they often work together to handle large-scale
computations.
• The core array consists of either homogeneous cores or heterogeneous
cores.
• The interconnect network between these cores is optimized to ensure high
data transfer bandwidth and low latency, facilitating efficient parallel
processing.
Homogeneous Cores: Identical cores with uniform capabilities, e.g., Intel Xeon
processors.

• Homogeneous Cores
• Homogeneous cores are identical processing units within a system.
Each core has the same architecture, capabilities, and performance
characteristics. This uniformity simplifies the design of parallel
applications, as all tasks can be distributed equally among the cores.
• Example:
• Intel Xeon Processors: These processors feature multiple identical cores, all
capable of executing the same instructions at the same speed. They are often
used in data centers and HPC environments where uniform performance is
desired.

Heterogeneous Cores: Different types of cores optimized for
specific tasks, e.g., NVIDIA Tegra X1 and Apple M1.

• Heterogeneous Cores
• Heterogeneous cores refer to systems that contain different types of processing units,
each optimized for specific tasks. This configuration can provide better overall
performance and energy efficiency by using the most appropriate core for each task.
• Example:
• NVIDIA Tegra X1: This SoC (System on Chip) combines ARM Cortex-A57 and Cortex-A53
CPU cores (for general-purpose processing) with a 256-core Maxwell GPU (for graphics
and parallel computations). The different cores handle different types of workloads,
providing a balance between performance and power efficiency.
• Apple M1: This processor includes high-performance cores (for demanding tasks) and
high-efficiency cores (for less intensive tasks) along with an integrated GPU and a Neural
Engine for machine learning tasks. This design allows the system to optimize
performance and power consumption dynamically.
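The scheduling idea behind such designs can be shown as a toy dispatcher. The cost threshold, task names, and core labels below are invented for illustration; the real schedulers in these chips are far more sophisticated:

```python
def schedule(tasks, heavy_threshold=100):
    # Toy dispatcher: route demanding tasks to performance cores and
    # light tasks to efficiency cores, in the spirit of heterogeneous
    # designs like the Apple M1. Each task is a (name, cost) pair, and
    # the cost units and threshold are arbitrary illustrative numbers.
    placement = {"performance": [], "efficiency": []}
    for name, cost in tasks:
        core = "performance" if cost >= heavy_threshold else "efficiency"
        placement[core].append(name)
    return placement
```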

Conventional Heterogeneous Multicore System Architecture

2. Multicore Socket
• A multicore socket is a single physical CPU package that contains
multiple processor cores. Modern processors, especially those used in
HPC systems, integrate several cores within a single socket to increase
computational power and enable parallel processing.
• Multicore CPU: A CPU with multiple cores within one physical
package. Each core can independently execute its own thread or
process, allowing multiple tasks to be processed simultaneously.
• Socket: The physical interface on the motherboard that houses the
CPU. An HPC system can have multiple sockets, each with a multicore
CPU, increasing the total number of cores available for computation.
Conventional Heterogeneous Multicore System Architecture

3. Network Interface Card (NIC)


• A Network Interface Card (NIC) is a hardware component that
connects a computer to a network. In HPC, NICs are crucial for
enabling high-speed data transfer between nodes in a cluster.
• Function: NICs handle the input and output of data to and from the
network, facilitating communication between different nodes
(computers) in an HPC cluster.
• High-Speed Interconnects: In HPC, NICs often support high-speed
interconnect technologies such as InfiniBand, Ethernet, or Omni-Path,
which provide low-latency and high-bandwidth communication
essential for distributed computing tasks.
Conventional Heterogeneous Multicore System Architecture

4. Scratchpad Memory
• Scratchpad memory is a type of fast, local memory used in HPC systems to
store temporary data that is frequently accessed by the processor cores. It
is designed to be faster and more efficient than regular main memory
(DRAM).
• Purpose: Scratchpad memory is used to reduce latency and increase the
speed of data access for certain computations. It is often employed in
specialized processors like GPUs or TPUs to store intermediate results and
data that require quick read/write access.
• Characteristics: It is usually smaller in size compared to main memory but
much faster, providing a dedicated space for critical data during
computation-intensive tasks.
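The access pattern that scratchpads support can be sketched in plain Python: stage one small tile of data into a local buffer, do all the arithmetic on the buffer, then move to the next tile. Here ordinary lists merely model "main memory" and the "scratchpad", and the tile size is an illustrative choice:

```python
def blocked_sum_of_squares(data, tile=4):
    # Process the data one tile at a time: copy a small block into a
    # scratchpad-like local buffer, compute on that buffer, then move
    # on -- the same blocking pattern real kernels use to keep hot data
    # in fast local memory instead of repeatedly touching DRAM.
    total = 0
    for start in range(0, len(data), tile):
        scratch = data[start:start + tile]    # fill the local buffer
        total += sum(x * x for x in scratch)  # compute on the buffer
    return total
```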
Conventional Heterogeneous Multicore System Architecture
5. Accelerator:
A co-processor: a special hardware component used to speed up computations that
are compute-intensive or time-consuming. Example: Google's TPU.
• A Tensor Processing Unit (TPU) is an AI accelerator application-specific
integrated circuit (ASIC) developed by Google for neural network machine
learning, using Google's own TensorFlow software.

What distinguishes an HPC system from a conventional computer?

• Purpose & Usage: Designed for complex computations and large-scale simulations.
• Architecture: Composed of a cluster of many nodes (each node being a powerful
computer) connected through high-speed networks.
• Performance: Delivers significantly higher performance in terms of processing
speed, memory capacity, and data throughput.
• Scalability: Highly scalable; more nodes can be added to increase computational
power.
• Software and Applications: Runs specialized software optimized for parallel
processing and large-scale computations.
• Cost and Maintenance: Significantly more expensive to purchase, operate, and
maintain due to the complexity and scale of the hardware.
1.5 Supercomputing Problems

Supercomputing Problems

• The main benchmark currently used to measure a supercomputer's peak
performance is a dense linear algebra problem.
• Dense linear algebra (DLA) problems are problems that can be solved using a
relatively small set of standard mathematical operations, such as matrix
multiplication, LU factorization, or the symmetric eigenvalue problem. DLA
problems include: solving dense systems of linear equations, least-squares
problems, eigenvalue and singular value problems, and other related
computational tasks.
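At toy scale, the heart of this problem class (solving a dense system Ax = b) can be written as pure-Python Gaussian elimination with partial pivoting. HPL itself uses a highly tuned LU factorization, so this is a sketch of the problem, not of benchmark-grade code:

```python
def solve_dense(a, b):
    # Solve the dense linear system a x = b by Gaussian elimination
    # with partial pivoting (row swaps for numerical stability).
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Pivot: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x
```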

(Figure: A Particle-In-Cell simulation from the Gyrokinetic Toroidal Code (Princeton
Plasma Physics Laboratory) simulating a plasma within a Tokamak fusion device. A
sampling of particles within the toroid is shown, colored according to velocity, with
supercomputing processor boundaries delineated by the toroidal subdivisions.)
1.6 APPLICATION PROGRAMMING
• What are the requirements and characteristics of application programming in the
context of HPC?
• The principal view the user has of an HPC system is through one or more
programming interfaces, which take the form of programming languages,
libraries, or other services. Key concerns include:
1. Correctness
2. Reliability
3. Performance is the driving requirement that differentiates HPC programming
from other domains.
• Performance is most significantly represented by the need for representation and
exploitation of computational parallelism: the ability to perform multiple tasks
simultaneously.
4. Parallel processing involves the definition of parallel tasks; establishing the
criteria that determine when a task is performed; synchronization among tasks,
in part to coordinate sharing; and allocation to computing resources.
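The synchronization requirement in point 4 can be sketched with Python threads: several tasks update shared state, and a lock coordinates the sharing so concurrent read-modify-write updates cannot interleave. The counter and the worker/iteration counts are illustrative:

```python
import threading

def tally(counts, key, lock, times):
    # Each parallel task repeatedly updates shared state; the lock
    # synchronizes the tasks so their read-modify-write updates on the
    # shared dictionary cannot interleave and lose increments.
    for _ in range(times):
        with lock:
            counts[key] = counts.get(key, 0) + 1

def run_tasks(workers=4, times=1000):
    counts, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=tally, args=(counts, "hits", lock, times))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["hits"]
```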

APPLICATION PROGRAMMING

5. Control of the relationship of allocations of data and tasks to the physical
resources of the parallel and distributed systems.
• The nature of the parallelism may vary significantly depending on the form of
computer system architecture targeted by the application program.
6. Also of concern are issues of determinism, correctness, performance debugging,
and performance portability.
Parallel Programming Models used in HPC determine the nature of parallelism

Depending on the class of parallel system architecture, different programming
models are employed. One dimension of differentiation is the granularity of the
parallel workflow.
• Coarse-grained parallelism
• Very coarse-grained workloads with no interactivity, sometimes referred to as
"embarrassingly parallel" or "job-stream" workflow
• Fine-grained parallelism
• Multiple-thread shared-memory system programming interfaces such as
OpenMP and Cilk++
• Medium-grained parallelism
• Highly scaled massively parallel processors (MPPs) and clusters, primarily
represented by communicating sequential processes such as the
message-passing interface (MPI)
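The coarse-grained, "embarrassingly parallel" case above can be sketched with the standard concurrent.futures module: each job is fully independent, so the executor simply farms jobs out and collects results, much as a batch scheduler dispatches a job stream. The job function and its parameters are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_job(params):
    # Stand-in for one independent batch job: it never communicates
    # with other jobs, which is what makes the workload
    # "embarrassingly parallel".
    return params ** 2

def run_job_stream(all_params, workers=4):
    # Farm the independent jobs out to a pool of workers and gather
    # the results in submission order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_job, all_params))
```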

Backup slides - Only for reference

Intel Xeon processor - An example of homogeneous cores

NVIDIA and Apple integrated on a chip - Examples of heterogeneous cores

NVIDIA Tegra X1    Apple M1