Introduction To High Performance Computing: Unit-I
Syed Rameem Zahra
(Assistant Professor)
Department of CSE, NSUT
What & Why
❑ What is high performance computing (HPC)?
❖ The use of the most efficient algorithms on computers capable of
the highest performance to solve the most demanding problems.
❑ Why HPC?
❖ Numerical simulation to predict the behaviour of physical
systems.
❖ High performance graphics—particularly visualization, and
animation.
❖ Big data analytics for strategic decision making.
❖ Synthesis of molecules for designing medicines.
HPC Examples: Blood Flow in Human Vascular Network
❑ Cardiovascular disease accounts for
about 50% of deaths in western world;
❑ Formation of arterial disease strongly
correlated to blood flow patterns;
In one minute, the heart pumps the entire blood supply of 5 quarts through 60,000 miles of vessels, that is a quarter of the distance between the moon and the earth.
Blood flow involves multiple scales.
Computational challenges:
Enormous problem size
HPC Examples
Earthquake simulation
[Workflow figure: simulation data is fed to viz software, yielding data visualization, validation, and physical insight]
Major Applications of Next Generation Supercomputer
Targeted as grand challenges
Performance Metrics
❑ FLOPS, or FLOP/S: FLoating-point Operations Per
Second
❖ MFLOPS: MegaFLOPS, 10^6 flops
❖ GFLOPS: GigaFLOPS, 10^9 flops, home PC
❖ TFLOPS: TeraFLOPS, 10^12 flops, present-day
supercomputers (www.top500.org)
❖ PFLOPS: PetaFLOPS, 10^15 flops, by 2011
❖ EFLOPS: ExaFLOPS, 10^18 flops, by 2020
❖ MIPS = Million Instructions Per Second = MegaHertz (if 1 instruction per cycle)
Note: von Neumann computer -- 0.00083 MIPS
Performance Metrics
❑ Theoretical peak performance (R_theor): maximum
FLOPS a machine can reach in theory.
❖ Clock_rate * no_cpus * no_FPU/CPU
❖ 3GHz, 2 cpus, 1 FPU/CPU, then R_theor = 3x10^9 * 2 = 6
GFLOPS
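As a quick check of this formula, here is a minimal sketch (values copied from the example above; it also assumes one floating-point operation per FPU per cycle) that computes R_theor:

#include <stdio.h>

/* Sketch: theoretical peak R_theor = clock_rate * no_cpus * no_FPU_per_CPU.
 * Values match the example above; assumes 1 floating-point op per FPU per cycle. */
int main(void) {
    double clock_rate_hz = 3.0e9;   /* 3 GHz */
    int no_cpus = 2;
    int no_fpu_per_cpu = 1;

    double r_theor = clock_rate_hz * no_cpus * no_fpu_per_cpu;
    printf("R_theor = %.1f GFLOPS\n", r_theor / 1.0e9);   /* prints 6.0 GFLOPS */
    return 0;
}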
❑ Real performance (R_real): FLOPS for specific
operations, e.g. vector multiplication
❑ Sustained performance (R_sustained): performance
on an application, e.g. CFD
Not uncommon: R_sustained << R_real << R_theor, with R_sustained often below 10% of R_theor.
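One rough way to estimate R_real is to time a specific operation such as element-wise vector multiplication and divide the floating-point operation count by the elapsed time. The sketch below is an unoptimized illustration with an arbitrary array size, so the figure it reports will normally sit well below R_theor:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch: estimate R_real for element-wise vector multiplication.
 * N and clock()-based timing are arbitrary; an illustration, not a benchmark. */
#define N 10000000L

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    clock_t start = clock();
    for (long i = 0; i < N; i++)
        c[i] = a[i] * b[i];                       /* N floating-point multiplications */
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("R_real ~ %.2f GFLOPS (c[0] = %.1f)\n", N / seconds / 1.0e9, c[0]);
    free(a); free(b); free(c);
    return 0;
}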
Computer Performance
❑ CPU operates on data. If no data, CPU has to wait;
performance degrades.
❖ Typical workstation: 3.2 GHz CPU, 667 MHz memory; the memory is about 5 times slower.
❖ Moore’s law: CPU speed doubles every 18 months
❖ Memory speed increases much more slowly;
❑ Fast CPU requires sufficiently fast memory.
❑ Rule of thumb: Memory size in GB=R_theor in GFLOPS
❖ 1 CPU cycle (1 FLOP) handles 1 byte of data
❖ 1MFLOPS needs 1MB of data/memory
❖ 1GFLOPS needs 1GB of data/memory
Many “tricks” designed for performance improvement target the memory
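To see why memory is the usual bottleneck, one can time a streaming operation and report the effective memory bandwidth alongside the achieved FLOP rate. The sketch below is illustrative only (the array size and clock()-based timing are arbitrary choices; serious measurements use benchmarks such as STREAM):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch: effective memory bandwidth of a streaming read (array sum).
 * One addition per 8 bytes read, so the achieved FLOP rate is limited by
 * memory, not by the CPU's peak. Illustrative only. */
#define N 50000000L

int main(void) {
    double *a = malloc(N * sizeof(double));
    if (!a) return 1;
    for (long i = 0; i < N; i++) a[i] = 1.0;

    clock_t start = clock();
    double sum = 0.0;
    for (long i = 0; i < N; i++)
        sum += a[i];                              /* 1 FLOP per 8 bytes read */
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("~%.2f GB/s, ~%.2f GFLOPS (sum = %.0f)\n",
           N * sizeof(double) / seconds / 1.0e9, N / seconds / 1.0e9, sum);
    free(a);
    return 0;
}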
CPU Performance
❑ Computer time is measured in terms of CPU cycles
❖ Minimum time to execute 1 instruction is 1 CPU cycle
❑ Time to execute a given program:
[Pipeline diagram: 7 instructions each pass through the IF, RD, EX, WB stages in successive clock cycles, completing in 10 cycles]
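The diagram corresponds to the usual pipeline timing estimate: with k stages and n instructions, pipelined execution takes about n + k - 1 cycles instead of n * k. The short sketch below (illustrative arithmetic, not from the slides) reproduces the 7-instruction, 4-stage case and shows that the speedup approaches k only when n is much larger than k:

#include <stdio.h>

/* Sketch: cycle counts with and without a k-stage pipeline.
 * Pipelined:   n + k - 1 cycles (one instruction completes per cycle once full).
 * Unpipelined: n * k cycles. Illustrative arithmetic only. */
int main(void) {
    int k = 4;                                    /* stages: IF, RD, EX, WB */
    long ns[] = {7, 100, 100000};
    for (int i = 0; i < 3; i++) {
        long n = ns[i];
        long pipelined   = n + k - 1;
        long unpipelined = n * k;
        printf("n=%6ld: pipelined=%7ld, unpipelined=%7ld cycles, speedup=%.2f\n",
               n, pipelined, unpipelined, (double)unpipelined / pipelined);
    }
    return 0;
}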
● Even though this method reduces the time to complete the set of
jobs, it also inherits the disadvantages of temporal parallelism and,
to some extent, those of data parallelism.
● The method is effective only if the number of jobs given to each
pipeline is much larger than the number of stages in the pipeline.
● Multiple pipeline processing was used in supercomputers such as
Cray and NEC-SX, as this method is very efficient for numerical
computing, in which many long vectors and large matrices are used
as data and can be processed simultaneously.
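A rough model of this condition (an assumption for illustration, not a formula from the slides): if n jobs are split evenly over p pipelines of k stages each, every pipeline needs about n/p + k - 1 cycles, so the speedup over a single unpipelined unit approaches the ideal p*k only when n/p >> k:

#include <stdio.h>

/* Sketch: n jobs spread over p pipelines, each pipeline having k stages.
 * Time per pipeline ~ n/p + k - 1 cycles; the ideal speedup p*k over one
 * unpipelined unit is reached only when n/p >> k. Illustrative arithmetic. */
int main(void) {
    int k = 4, p = 2;
    long ns[] = {8, 80, 800000};
    for (int i = 0; i < 3; i++) {
        long n = ns[i];
        double t_multi = (double)n / p + k - 1;   /* p pipelines working together */
        double t_seq   = (double)n * k;           /* one unpipelined unit */
        printf("n=%7ld: speedup=%.2f (ideal %d)\n", n, t_seq / t_multi, p * k);
    }
    return 0;
}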
Other Parallelisms
● Data Parallelism with Dynamic Assignment
○ Here a head examiner gives one answer paper to each teacher and
keeps the rest with him. All teachers simultaneously correct the paper
given to them. A teacher who completes correction goes to the head
examiner for another paper which is given to him for correction. If a
second teacher completes correction at the same time, then he
queues up in front of the head examiner and waits for his turn to get
an answer paper. The procedure is repeated till all the answer papers
are corrected.
● Data Parallelism with Quasi-dynamic Scheduling
○ Here the head examiner gives each teacher an unequal-sized set of answer papers to correct (a rough code analogue of these scheduling schemes is sketched below).
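In code, dynamic assignment maps naturally onto a work queue from which each idle worker takes one task at a time; OpenMP's schedule(dynamic, 1) loop scheduling behaves this way (quasi-dynamic scheduling, with unequal chunks, is only loosely analogous, e.g. schedule(guided)). The sketch below is a minimal illustration under that analogy, not anything from the slides: PAPERS and grade() are made-up placeholders for the answer papers and the unequal effort of correcting them.

#include <stdio.h>

/* Sketch: dynamic assignment of "answer papers" to "teachers" (threads).
 * With schedule(dynamic, 1), each idle thread fetches one iteration at a time
 * from a shared queue, like a teacher returning to the head examiner for the
 * next paper. grade() is a placeholder for work of unequal size.
 * Compile with: cc -fopenmp example.c */
#define PAPERS 100

static double grade(int paper) {
    double s = 0.0;
    for (int i = 0; i < (paper % 10 + 1) * 100000; i++)
        s += i * 1e-9;
    return s;
}

int main(void) {
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int p = 0; p < PAPERS; p++)
        total += grade(p);           /* papers handed out one at a time */
    printf("total = %f\n", total);
    return 0;
}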
Other Parallelisms