01 - Lecture Intro To HPC

The document outlines the course CSE 455 on High Performance Computing (HPC), covering topics such as HPC fundamentals, parallel programming, and performance analysis. It emphasizes the need for HPC due to increasing computational demands and introduces various parallel architectures and programming concepts. Assessment methods include assignments, quizzes, a midterm, a major project, and a final exam.


CSE 455

High Performance
Computing
LECTURE 1
Course Outline
• High Performance Computing
• Flynn’s Taxonomy
• Interconnection Networks
• Performance Analysis of Multiprocessor Architecture
• Shared Memory Architecture
• Parallel Programming with MPI
• Parallelization Fundamentals
• Parallel Programming with OpenMP
• Graphical Processing Units (GPUs)
• Task Scheduling and Allocation
Textbooks
Assessment
Assessment Marks
Assignments 15
Quizzes 5
Midterm 20
Major Task (Project) 20
Final Exam 40
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
What is High Performance Computing (HPC)?

• Using computing resources that provide more computing power to solve a problem in a reasonable amount of time.
• These problems need large amounts of computing power for short periods of time.
• HPC systems range from workstations up to the largest supercomputers.
What is High Performance Computing (HPC)?

HPC includes work on ‘four basic building blocks’ in this course:
• Theory (numerical laws, physical models, speed-up performance, etc.)
• Technology (multi-core, supercomputers, networks, storage, etc.)
• Architecture (shared-memory, distributed-memory, interconnects, etc.)
• Software (libraries, schedulers, monitoring, applications, etc.)
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Why we need ever-increasing
performance
◼ Computational power is increasing, but so
are our computation problems and needs.
◼ Problems we never dreamed of have been
solved because of past increases, such as
decoding the human genome.
◼ More complex problems are still waiting to
be solved.



Characteristics of Problems
Solved Using HPC
◼ It takes a long time to compute its results (needs more compute power).
◼ Needs a large quantity of resources (memory, etc.).
◼ Requires multiple runs.
◼ Time critical.



Motivational Examples
A few examples of large-scale problems for motivation.
◼ Currently that means Tera-scale or Peta-scale.

Prefix   Meaning        Decimal   Binary (approx.)
Kilo     thousand       10^3      2^10
Mega     million        10^6      2^20
Giga     billion        10^9      2^30
Tera     trillion       10^12     2^40
Peta     quadrillion    10^15     2^50
Exa      quintillion    10^18     2^60
◼ Processing speed is measured in ???

◼ One operation may take several cycles.

◼ How many operations/sec can be done on a 1 GHz processor ???

◼ Exa-scale is a billion times more than that (giga-scale). (The same prefixes apply to flops and bytes.)


Solving a linear system
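(The slide's figure is not reproduced here.) As a rough, hedged illustration that is not taken from the slide: factoring a dense n x n linear system by Gaussian elimination costs roughly (2/3)*n^3 floating-point operations, so even moderate problem sizes outgrow a single processor. The matrix size and machine speeds below are assumptions chosen purely for illustration.

    #include <stdio.h>

    int main(void) {
        double n     = 1.0e5;                    /* assumed matrix dimension (illustrative)  */
        double flops = (2.0 / 3.0) * n * n * n;  /* approximate cost of Gaussian elimination */

        double rates[]      = { 1.0e9, 1.0e12, 1.0e15 };          /* assumed machine speeds */
        const char *names[] = { "Gflop/s", "Tflop/s", "Pflop/s" };

        for (int i = 0; i < 3; i++) {
            double seconds = flops / rates[i];
            printf("n = %.0f at 1 %s: about %.2e seconds (%.2f hours)\n",
                   n, names[i], seconds, seconds / 3600.0);
        }
        return 0;
    }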
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Climate modeling



Protein folding



Drug discovery



Energy research



Data analysis



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Why we’re building parallel
systems.
◼ From 1986 to 2002, microprocessors were speeding like a rocket, increasing in performance by an average of 50% per year.

◼ Since then, the increase has dropped to about 20% per year.



An intelligent solution
◼ Instead of designing and building faster
microprocessors, put multiple processors
on a single integrated circuit.



Now it’s up to the programmers
◼ Adding more processors doesn’t help
much if programmers aren’t aware of
them…
◼ … or don’t know how to use them.

◼ Serial programs don’t benefit from this approach (in most cases).



Why we’re building parallel
systems
◼ Up to now, performance increases have
been attributable to increasing density of
transistors.

◼ But there are inherent problems.



A little physics lesson
◼ Smaller transistors = faster processors.
◼ Faster processors = increased power
consumption.
◼ Increased power consumption = increased
heat.
◼ Increased heat = unreliable processors.



Solution
◼ Move away from single-core systems to
multicore processors.
◼ “core” = central processing unit (CPU)

◼ Introducing parallelism!!!



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Why we need to write parallel
programs
◼ Running multiple instances of a serial
program often isn’t very useful.
◼ Think of running multiple instances of your
favorite game.

◼ What you really want is for it to run faster.



Approaches to the serial problem
◼ Rewrite serial programs so that they’re
parallel.

◼ Write translation programs that automatically convert serial programs into parallel programs.
◼ This is very difficult to do.
◼ Success has been limited.



More problems
◼ Some coding constructs can be
recognized by an automatic program
generator, and converted to a parallel
construct.
◼ However, it’s likely that the result will be a
very inefficient program.
◼ Sometimes the best parallel solution is to
step back and devise an entirely new
algorithm.



Example
◼ Compute n values and add them together.
◼ Serial solution:
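The slide shows the serial code as an image; a minimal, self-contained C sketch of it is given below. Compute_next_value is only a stand-in for whatever actually produces each value.

    #include <stdio.h>

    /* Stand-in for whatever produces each value; it depends only on the
       index so that the sketch runs on its own. */
    double Compute_next_value(int i) {
        return (double)((7 * i) % 10);
    }

    int main(void) {
        int    n   = 24;
        double sum = 0.0;

        /* Serial solution: compute each of the n values and accumulate it. */
        for (int i = 0; i < n; i++) {
            double x = Compute_next_value(i);
            sum += x;
        }

        printf("sum = %.1f\n", sum);
        return 0;
    }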



Example (cont.)
◼ We have p cores, p much smaller than n.
◼ Each core performs a partial sum of approximately n/p values.

◼ Each core uses its own private variables and executes this block of code independently of the other cores.
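A hedged sketch of that per-core block follows; the name Partial_sum, the my_rank parameter, and the even n/p split are assumptions made for illustration, and Compute_next_value is the stand-in defined in the serial sketch above.

    /* Executed by the core whose id is my_rank (0 <= my_rank < p).
       Assumes, for simplicity, that p divides n evenly. */
    double Partial_sum(int my_rank, int p, int n) {
        int    my_n       = n / p;
        int    my_first_i = my_rank * my_n;
        int    my_last_i  = my_first_i + my_n;
        double my_sum     = 0.0;   /* private to this core */

        for (int my_i = my_first_i; my_i < my_last_i; my_i++) {
            double my_x = Compute_next_value(my_i);
            my_sum += my_x;
        }
        return my_sum;
    }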



Example (cont.)
◼ After each core completes execution of the code, its private variable my_sum contains the sum of the values computed by its calls to Compute_next_value.

◼ Ex.: with 8 cores and n = 24, the calls to Compute_next_value return:
1,4,3, 9,2,8, 5,1,1, 6,2,7, 2,5,0, 4,1,8, 6,5,1, 2,3,9



Example (cont.)
◼ Once all the cores are done computing their private my_sum, they form a global sum by sending their results to a designated “master” core, which adds them to produce the final result.
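A sketch of this naive global sum, simulated serially for illustration: the sends and receives are replaced by array indexing, and the partial sums 8, 19, 7, 15, 7, 13, 12, 14 are the ones from the 8-core example that follows.

    #include <stdio.h>

    #define P 8   /* number of cores in the example */

    int main(void) {
        /* Partial sums each core would hold (values from the example). */
        double my_sum[P] = { 8, 19, 7, 15, 7, 13, 12, 14 };

        /* The "master" core (core 0) receives every other core's my_sum and
           adds it to its own: p - 1 receives and p - 1 additions. */
        double sum = my_sum[0];
        for (int core = 1; core < P; core++)
            sum += my_sum[core];

        printf("global sum on the master core: %.0f\n", sum);   /* 95 */
        return 0;
    }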



Example (cont.)



Example (cont.)

Core 0 1 2 3 4 5 6 7
my_sum 8 19 7 15 7 13 12 14

Global sum
8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95

Core 0 1 2 3 4 5 6 7
my_sum 95 19 7 15 7 13 12 14



But wait!
There’s a much better way
to compute the global sum.



Better parallel algorithm
◼ Don’t make the master core do all the
work.
◼ Share it among the other cores.
◼ Pair the cores so that core 0 adds its result
with core 1’s result.
◼ Core 2 adds its result with core 3’s result,
etc.
◼ Work with odd and even numbered pairs of
cores.
Better parallel algorithm (cont.)
◼ Repeat the process now with only the
evenly ranked cores.
◼ Core 0 adds the result from core 2.
◼ Core 4 adds the result from core 6, etc.

◼ Now cores divisible by 4 repeat the process, and so forth, until core 0 has the final result.
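A sketch of this tree-structured global sum, again simulated serially for illustration (same assumed partial sums as before): in each round, every core whose rank is a multiple of 2*step adds in the value held by the core step ranks above it.

    #include <stdio.h>

    #define P 8   /* number of cores in the example */

    int main(void) {
        /* Partial sums each core would hold (values from the example). */
        double my_sum[P] = { 8, 19, 7, 15, 7, 13, 12, 14 };

        /* Round 1: 0+=1, 2+=3, 4+=5, 6+=7.  Round 2: 0+=2, 4+=6.  Round 3: 0+=4. */
        for (int step = 1; step < P; step *= 2)
            for (int rank = 0; rank + step < P; rank += 2 * step)
                my_sum[rank] += my_sum[rank + step];

        printf("global sum on core 0: %.0f\n", my_sum[0]);   /* 95 */
        return 0;
    }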



Multiple cores forming a global
sum



Analysis
◼ In the first example, the master core
performs 7 receives and 7 additions.

◼ In the second example, the master core performs 3 receives and 3 additions.

◼ The improvement is more than a factor of 2!



Analysis (cont.)
◼ The difference is more dramatic with a
larger number of cores.
◼ If we have 1000 cores:
◼ The first example would require the master to
perform 999 receives and 999 additions.
◼ The second example would only require 10
receives and 10 additions.

◼ That’s an improvement of almost a factor of 100!
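In general (implied by, though not stated on, the slide): with p cores, the naive scheme costs the master p - 1 receives and p - 1 additions, while the tree-structured scheme costs core 0 only about log2(p) of each. For example:

    p = 1000:   p - 1 = 999   vs.   ceil(log2(1000)) = 10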
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
How do we write parallel
programs?
◼ Task parallelism
◼ Partition the various tasks carried out in solving the problem among the cores.

◼ Data parallelism
◼ Partition the data used in solving the problem among the cores.
◼ Each core carries out similar operations on its part of the data (see the sketch after this list).
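A hedged sketch contrasting the two partitionings, using the exam-grading example from the slides that follow (300 exams, 15 questions); the function names and the grade stub are hypothetical and exist only for illustration.

    #include <stdio.h>

    #define N_EXAMS     300
    #define N_QUESTIONS 15

    /* Hypothetical stub standing in for the real grading work. */
    void grade(int exam, int question) { (void)exam; (void)question; }

    /* Data parallelism: this core ("TA") handles every question, but only
       on its own block of exams. */
    void grade_data_parallel(int my_first_exam, int my_last_exam) {
        for (int exam = my_first_exam; exam < my_last_exam; exam++)
            for (int q = 0; q < N_QUESTIONS; q++)
                grade(exam, q);
    }

    /* Task parallelism: this core handles only its own block of questions,
       but on every exam. */
    void grade_task_parallel(int my_first_q, int my_last_q) {
        for (int exam = 0; exam < N_EXAMS; exam++)
            for (int q = my_first_q; q < my_last_q; q++)
                grade(exam, q);
    }

    int main(void) {
        grade_data_parallel(0, 100);   /* e.g. TA#1 in the data-parallel split */
        grade_task_parallel(0, 5);     /* e.g. TA#1 in the task-parallel split */
        printf("done\n");
        return 0;
    }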



Professor P

15 questions
300 exams



Professor P’s grading assistants

TA#1, TA#2, TA#3



Division of work – data parallelism

Each TA grades 100 of the 300 exams (TA#1: 100 exams, TA#2: 100 exams, TA#3: 100 exams).


Division of work – task parallelism

TA#1 grades Questions 1 - 5, TA#2 grades Questions 6 - 10, and TA#3 grades Questions 11 - 15.


Division of work –
data parallelism



Division of work –
task parallelism

Tasks
1) Receiving
2) Addition



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Parallel Programming Concepts
Parallel programming means explicitly indicating how different portions of the computation may be executed concurrently by different processors.
Programmers/compilers must be able to identify operations that may be performed in parallel.
Terminology
◼ Concurrent computing – a program is one
in which multiple tasks can be in progress
at any instant.
◼ Parallel computing – a program is one in which multiple tasks cooperate closely to solve a problem.
◼ Distributed computing – a program may
need to cooperate with other programs to
solve a problem.



Coordination
◼ Cores usually need to coordinate their work.
◼ Communication – one or more cores send
their current partial sums to another core.
◼ Load balancing – share the work evenly
among the cores so that one is not heavily
loaded.
◼ Synchronization – because each core works
at its own pace, make sure cores do not get
too far ahead of the rest.



Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Parallel Architectures
◼ Multi-Processor Systems
◼ Multiple-CPU computer with shared memory.
◼ The same address on different CPUs refers
to the same memory location.
◼ Centralized multiprocessor: all the primary
memory is in one place.
◼ Distributed multiprocessor: the primary
memory is distributed among the processors.

Parallel Architectures
◼ Multi-Computer Systems
◼ Has disjoint local address spaces (memory).
◼ Each CPU has direct access to its local
memory only.
◼ The same address on different CPUs refers to
different memory locations.
◼ CPUs interact with each other by passing
messages.

Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Type of parallel systems

Shared-memory: Multi-Processor
Distributed-memory: Multicomputer
Parallel Algorithm Design
A four-step process for designing parallel algorithms: Foster’s Design Methodology.
Parallel Algorithm Design
Decomposition: partitioning the problem
into tasks.
Communication: connecting tasks to each
other.
Agglomeration: reducing the number of
tasks to reduce communication overhead.
Mapping: assigning tasks to processes.
Concluding Remarks (1)
◼ The laws of physics have brought us to the
doorstep of multicore technology.
◼ Serial programs typically don’t benefit from
multiple cores.
◼ Automatic parallel program generation
from serial program code isn’t the most
efficient approach to get high performance
from multicore computers.



Concluding Remarks (2)
◼ Learning to write parallel programs
involves learning how to coordinate the
cores.
◼ Parallel programs are usually very complex and therefore require sound programming techniques and development.

