
An introduction to High

Performance Computing
and its Applications

Ashish P. Kuvelkar
Senior Director (HPC- Tech)
C-DAC, Pune

Centre for Development of Advanced Computing


Outline

• Introduction to HPC
• Architecting an HPC system
• Approach to Parallelization
• Parallelization Paradigms
• Applications in the areas of Science and Engineering

© Centre for Development of Advanced Computing


What is HPC?

High Performance Computing

• A set of computing technologies for very fast numeric simulation, modeling and data processing
• Employed for specialised applications that require a lot of mathematical calculations
• Using computer power to execute a few applications extremely fast



What is HPC? (continued)
Definition 1
• High Performance Computing (HPC) is the use of
parallel processing for running advanced application
programs efficiently, reliably and quickly.
• A supercomputer is a system that performs at or near
the currently highest operational rate for computers.

Definition 2 (Wikipedia)
• High Performance Computing (HPC) uses
Supercomputers and Computer Clusters to solve
advanced computation problems.

Evolution of Supercomputers
• Supercomputers in the 1980s and 90s
  • Custom-built computer systems
  • Very expensive

• Supercomputers after the 1990s
  • Built using commodity off-the-shelf components
  • Use cluster computing techniques



Supercomputers

Cray Supercomputer PARAM Yuva II



Components of a Cluster

• Login Nodes
• Boot Servers / Management Nodes (1GbE network for administration)
• Primary Interconnect (switch fabric)
• Compute Nodes and Accelerated Compute Nodes
• Storage with Parallel File System
• HSM/Backup Server with Tape Library/Backup storage
• Networking Gateway
• Local Network



HPC Software Stack



Single CPU Systems

• Can run a single stream of code
• Performance can be improved through
  • Increasing ALU width
  • Increasing clock frequency
  • Making use of pipelining
  • Improved compilers
• But still, there is a limit to each of these techniques
• Parallel computing provides relief



Why use Parallel Computing?

• Overcome limitations of single CPU systems
• Sequential systems are slow
  • Calculations may take days, weeks, years
  • More CPUs can get the job done faster
• Sequential systems are small
  • Data set may not fit in memory
  • More CPUs can give access to more memory
• So, the advantages are
  • Save time
  • Solve bigger problems
Single Processor Parallelism

• Instruction level parallelism is achieved through
  • Pipelining
  • Superscalar implementation
  • Multicore architecture
  • Using advanced extensions



Pipelined Processors

• A new instruction enters the pipeline every clock cycle
• Instruction parallelism = number of pipeline stages

Diagram Source: Quora


Superscalar

[Diagram: Cache/Memory feeds a Fetch Unit and a Decode/Issue Unit, which issue multiple instructions to several execution units (EUs) sharing a Register File]

• Multiple execution units
• Sequential instruction stream, multiple issue
Multicore Processor
• A single computing component with two or more independent processing units
• Each unit, called a core, reads and executes program instructions

Source: Wikipedia.



Advanced Vector eXtensions

• Useful for algorithms that can take advantage of SIMD
• AVX was introduced by Intel and AMD in the x86 architecture
• Using AVX-512, a single 512-bit register can pack
  • eight double precision or sixteen single precision floating point values, or
  • eight 64-bit or sixteen 32-bit integers
• Accelerates performance for workloads such as
  • Scientific simulations, artificial intelligence (AI)/deep learning, image and audio/video processing
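The packing figures above follow directly from the 512-bit register width divided by the element size. The slides contain no code; this small Python sketch is purely an illustrative sanity check:

```python
# A 512-bit AVX-512 register packs (register width) / (element width) lanes.
REG_BITS = 512

lanes = {
    "float64": REG_BITS // 64,  # 8 double precision values
    "float32": REG_BITS // 32,  # 16 single precision values
    "int64":   REG_BITS // 64,  # 8 64-bit integers
    "int32":   REG_BITS // 32,  # 16 32-bit integers
}
```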
Parallelization Approach



Means of achieving parallelism

• Implicit Parallelism
• Done by the compiler and runtime system
• Explicit Parallelism
• Done by the programmer



Implicit Parallelism
• Parallelism is exploited implicitly by the compiler and runtime system, which
  • Automatically detect potential parallelism in the program
  • Assign the tasks for parallel execution
  • Control and synchronize execution
(+) Frees the programmer from the details of parallel execution
(+) It is a more general and flexible solution
(-) Very hard to achieve an efficient solution for many applications



Explicit Parallelism
• It is the programmer who has to
• Annotate the tasks for parallel execution
• Assign tasks to processors
• Control the execution and the synchronization points
(+) Experienced programmers achieve very efficient
solutions for specific problems
(-) Programmers are responsible for all details
(-) Programmers must have deep knowledge of the computer architecture to achieve maximum performance
Explicit Parallel Programming Models

Two dominant parallel programming models


• Shared-variable model
• Message-passing model



Explicit Parallel Programming Models (Contd…)
Shared Memory Model
• Uses the concept of single address space
• Typically SMP architecture is used
• Scalability is not good



Shared Memory Model

• Multiple threads operate independently but share the same memory resources
• Data is not explicitly allocated to threads
• Changes made in a memory location by one thread are visible to all other threads
• Communication is implicit
• Synchronization is explicit
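A minimal sketch of the shared memory model using Python threads (an illustration, not part of the original slides): `counter` lives in a single address space visible to every thread, communication happens implicitly through it, and the `Lock` makes the synchronization explicit.

```python
import threading

counter = 0                      # shared: lives in one address space
lock = threading.Lock()          # synchronization must be explicit

def work(n):
    global counter
    for _ in range(n):
        with lock:               # protect the read-modify-write
            counter += 1

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# communication was implicit: no sends or receives, just the shared counter
```

Without the lock, concurrent read-modify-write sequences could interleave and lose updates, which is exactly the hazard the slide's "synchronization is explicit" point refers to.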



Advantages & Disadvantages of Shared
Memory Model

Advantages:
• Data sharing between threads is fast and uniform
• Global address space provides user-friendly programming
Disadvantages:
• Lack of scalability between memory and CPUs
• Programmer is responsible for specifying synchronization, e.g. locks
• Expensive
Message Passing Model



Characteristics of Message Passing
Model

• Asynchronous parallelism

• Separate address spaces

• Explicit interaction

• Explicit allocation by user



How Message Passing Model Works

• A parallel computation consists of a number of processes
• Each process has purely local variables
• There is no mechanism for any process to directly access the memory of another
• Sharing of data among processes is done by explicit message passing
• Data transfer requires cooperative operations by each process
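The points above can be sketched with Python's `multiprocessing` module (an illustration, not from the slides): each process gets its own address space, so the only way data moves is through an explicit, cooperative send matched by a receive.

```python
import multiprocessing as mp

def worker(conn):
    data = conn.recv()           # explicit receive: no shared variables
    conn.send(sum(data))         # explicit send back to the parent
    conn.close()

def run():
    parent, child = mp.Pipe()    # the only channel between the two processes
    p = mp.Process(target=worker, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])    # cooperative transfer: a send ...
    result = parent.recv()       # ... matched by a receive
    p.join()
    return result
```

Calling `run()` returns 10: the list is copied into the worker's address space, summed there, and the result is copied back, with no memory shared at any point.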



Usefulness of Message Passing Model

• An extremely general model
• Essentially, any type of parallel computation can be cast in the message passing form
• Can be implemented on a wide variety of platforms, from networks of workstations to even single processor machines
• Generally allows more control over data location and flow within a parallel application than, for example, the shared memory model
• Good scalability
Parallelization Paradigms



Ideal Situation!

• Each processor has unique work to do
• Communication among processes is largely unnecessary
• All processes do equal work



Writing parallel codes

• Distribute the data to memories
• Distribute the code to processors
• Organize and synchronize the workflow
• Optimize the resource requirements by means of efficient algorithms and coding techniques



Parallel Algorithm Paradigms

• Phase parallel
• Divide and conquer
• Pipeline
• Process farm
• Domain Decomposition



Phase Parallel Model

o The parallel program consists of a number of super steps, each of which has two phases.

o In the computation phase, multiple processes each perform an independent computation.

o In the interaction phase, the processes perform one or more synchronous interaction operations, such as a barrier or a blocking communication.
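A super step can be sketched with Python threads and a barrier (an illustration, not from the slides): each thread computes independently, then the `Barrier` forms the interaction phase, after which every thread can safely read everyone's results.

```python
import threading

N = 3
barrier = threading.Barrier(N)   # the synchronous interaction operation
results = [0] * N
sums = [0] * N

def superstep(i):
    results[i] = i * i           # computation phase: independent work
    barrier.wait()               # interaction phase: all threads meet here
    sums[i] = sum(results)       # safe: every results[j] is now written

threads = [threading.Thread(target=superstep, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread computes the same `sums[i]` value, because the barrier guarantees all of `results` is written before any thread reads it.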
Divide and Conquer model
o A parent process divides its workload into several smaller pieces and assigns them to a number of child processes.

o The child processes then compute their workload in parallel and the results are merged by the parent.

o This paradigm is very natural for computations such as quicksort.
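A minimal divide-and-conquer sketch with Python's `multiprocessing` (illustrative only; the function and chunking scheme are this example's own): the parent divides the data, a pool of children sorts the pieces in parallel, and the parent merges the sorted parts.

```python
import heapq
import multiprocessing as mp

def child(chunk):
    return sorted(chunk)                 # each child handles one piece

def parallel_sort(data, nchildren=4):
    # parent divides the workload into roughly equal pieces
    size = (len(data) + nchildren - 1) // nchildren
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with mp.Pool(len(chunks)) as pool:
        parts = pool.map(child, chunks)  # children compute in parallel
    return list(heapq.merge(*parts))     # parent merges the sorted results
```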
Pipeline Model
Data Stream

o In the pipeline paradigm, a number of processes form a virtual pipeline.

o A continuous data stream is fed into the pipeline, and the processes execute at different pipeline stages simultaneously.
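The pipeline can be sketched with threads connected by queues (an illustration, not from the slides): each stage repeatedly takes an item from its input queue, transforms it, and passes it downstream; a `None` sentinel shuts the pipeline down.

```python
import queue
import threading

def stage(fn, q_in, q_out):
    while True:
        item = q_in.get()
        if item is None:          # sentinel: forward it and shut down
            q_out.put(None)
            break
        q_out.put(fn(item))       # transform and pass downstream

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)).start()
threading.Thread(target=stage, args=(lambda x: x * 2, q2, q3)).start()

for x in [1, 2, 3]:
    q1.put(x)                     # continuous data stream fed in
q1.put(None)

out = []
while (item := q3.get()) is not None:
    out.append(item)              # out == [4, 6, 8]
```

Both stages run simultaneously: while the second stage doubles one item, the first is already incrementing the next.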



Process Farm Model
[Diagram: a Master process coordinating three Worker processes]

o Also known as the master-worker paradigm.

o A master process executes the essentially sequential part of the parallel program.

o It spawns a number of worker processes to execute the parallel workload.

o When a worker finishes its workload, it informs the master, which assigns a new workload to the worker.

o The coordination is done by the master.
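A minimal master-worker sketch using a Python process pool (illustrative only): the master hands out tasks, and `imap_unordered` gives each worker a new task as soon as it finishes its previous one, regardless of completion order.

```python
import multiprocessing as mp

def worker(task):
    return task, task ** 2        # a worker computes one unit of work

def master(tasks):
    results = {}
    with mp.Pool(3) as pool:      # master spawns a fixed pool of workers
        # a free worker is immediately assigned the next pending task
        for task, value in pool.imap_unordered(worker, tasks):
            results[task] = value # master collects results as they arrive
    return results
```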
Domain Decomposition

o This method solves a boundary value problem by splitting it into smaller boundary value problems on subdomains and iterating to coordinate the solution between adjacent subdomains.

o One domain is divided into n sub-domains, each handled by one of n threads or processes.
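A 1-D sketch of the idea, written sequentially for clarity (the functions and splitting scheme are this example's own): the domain is split at `cut`, and each sub-sweep reads one "halo" value owned by the neighbouring subdomain; in a real parallel code those halo values are exchanged between processes before every iteration.

```python
def sweep_whole(u):
    # reference Jacobi sweep over the full 1-D domain (fixed boundaries)
    interior = [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]

def sweep_decomposed(u, cut=None):
    # split the interior at `cut` into two subdomains
    cut = cut or len(u) // 2
    # the left sub-sweep needs u[cut] (a halo value owned by the right side);
    # the right sub-sweep needs u[cut - 1] (a halo value owned by the left)
    left = [(u[i - 1] + u[i + 1]) / 2 for i in range(1, cut)]
    right = [(u[i - 1] + u[i + 1]) / 2 for i in range(cut, len(u) - 1)]
    return [u[0]] + left + right + [u[-1]]
```

Both functions compute identical sweeps; the decomposed version just makes visible which values must cross the subdomain boundary each iteration.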



Desirable Attributes for Parallel Algorithms

• Concurrency
  • Ability to perform many actions simultaneously
• Scalability
  • Performance holds up as processor counts increase
• Data Locality
  • High ratio of local memory accesses to remote memory accesses (through communication)
• Modularity
  • Decomposition of complex entities into simpler components
Massive processing power introduces an I/O challenge

• Getting data to and from the processing units can take as long as the processing itself
• Requires careful software design and a deep understanding of algorithms and the architecture of
   Processors (cache effects, memory bandwidth)
   GPU accelerators
   Interconnects (Ethernet, InfiniBand, 10 Gigabit Ethernet)
   Storage (local disks, NFS, parallel file systems)
Application Areas of HPC in
Science & Engineering



HPC in Science
Space Science
• Applications in Astrophysics and
Astronomy
Earth Science
• Applications in understanding
Physical Properties of Geological
Structures, Water Resource
Modelling, Seismic Exploration
Atmospheric Science
• Applications in Climate and
Weather Forecasting, Air Quality



HPC in Science
Life Science
• Applications in Drug Designing, Genome
Sequencing, Protein Folding

Nuclear Science
• Applications in Nuclear Power, Nuclear
Medicine (cancer etc.), Defence

Nano Science
• Applications in Semiconductor Physics,
Microfabrication, Molecular Biology,
Exploration of New Materials



HPC in Engineering
Crash Simulation
• Applications in Automobile and
Mechanical Engineering
Aerodynamics Simulation & Aircraft
Designing
• Applications in Aeronautics and
Mechanical Engineering
Structural Analysis
• Applications in Civil Engineering and
Architecture
Multimedia and Animation

DreamWorks Animation SKG produces all its animated movies using HPC graphics technology

Graphical Animation: an application of HPC in Multimedia and Animation



Thank You

[email protected]

Centre for Development of Advanced Computing
