01 - Lecture Intro To HPC
High Performance Computing
LECTURE 1
Course Outline
• High Performance Computing
• Flynn’s Taxonomy
• Interconnection Networks
• Performance Analysis of Multiprocessor Architecture
• Shared Memory Architecture
• Parallel Programming with MPI
• Parallelization Fundamentals
• Parallel Programming with OpenMP
• Graphical Processing Units (GPUs)
• Task Scheduling and Allocation
Textbooks
Assessment
Assessment            Marks
Assignments             15
Quizzes                  5
Midterm                 20
Major Task (Project)    20
Final Exam              40
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
What is High Performance Computing (HPC)?
◼ Introducing parallelism!!!
◼ Example: computing a global sum on 8 cores. Each core first computes a partial sum (my_sum) of its share of the data:
Core     0   1   2   3   4   5   6   7
my_sum   8  19   7  15   7  13  12  14
◼ Core 0 then adds the partial sums to obtain the global sum:
8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95
Core     0   1   2   3   4   5   6   7
my_sum  95  19   7  15   7  13  12  14
◼ Data parallelism
◼ Partition the data used in solving the problem among the cores.
◼ Each core carries out similar operations on its part of the data (see the sketch below).
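A minimal sketch of this data-parallel global sum in C with OpenMP (OpenMP is covered later in the course); the problem size and data values here are made up for illustration:

#include <stdio.h>
#include <omp.h>

#define N 8000                     /* made-up problem size */

int main(void) {
    static double x[N];
    for (int i = 0; i < N; i++)
        x[i] = 1.0;                /* made-up data values */

    double global_sum = 0.0;

    /* Data parallelism: the N iterations (and hence the elements of x) are
       split among the threads; each thread sums its own part, and the
       reduction clause combines the per-thread partial sums. */
    #pragma omp parallel for reduction(+:global_sum)
    for (int i = 0; i < N; i++)
        global_sum += x[i];

    printf("global sum = %.1f\n", global_sum);
    return 0;
}

With gcc this compiles using the -fopenmp flag; without that flag the pragma is ignored and the program still runs correctly as a serial sum.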
◼ Analogy: 3 TAs grading 300 exams, each exam with 15 questions.
◼ Dividing the data: each of TA#1, TA#2, and TA#3 grades 100 complete exams.
◼ Dividing the work by task: TA#1 grades questions 1 - 5, TA#2 grades questions 6 - 10, and TA#3 grades questions 11 - 15 on all 300 exams.
◼ In the global sum example, core 0's work can be divided into two tasks:
1) Receiving a partial sum from another core.
2) Addition: adding the received value to the running total.
Parallel Architectures
◼ Multicomputer Systems
◼ Have disjoint local address spaces (memories).
◼ Each CPU has direct access to its local
memory only.
◼ The same address on different CPUs refers to
different memory locations.
◼ CPUs interact with each other by passing messages (see the sketch below).
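A minimal sketch of message passing in C with MPI (MPI is covered later in the course), mirroring the global sum example on a multicomputer: each process holds a made-up local value and sends it to process 0, which receives and adds them.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int my_sum = rank + 1;          /* made-up local (partial) result */

    if (rank == 0) {
        int total = my_sum;
        for (int src = 1; src < size; src++) {
            int recvd;
            /* Task 1: receive a partial sum from another process */
            MPI_Recv(&recvd, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            /* Task 2: add it to the running total */
            total += recvd;
        }
        printf("global sum = %d\n", total);
    } else {
        /* Every other process sends its local value to process 0;
           no process ever reads another process's memory directly. */
        MPI_Send(&my_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}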
Agenda
What is High Performance Computing (HPC)?
Why do we need high performance computing?
HPC Domain of Applications
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
Parallel Programming Concepts
Parallel Architectures
Parallel Algorithm Design
Types of parallel systems
Shared-memory system: multiprocessor
Distributed-memory system: multicomputer
Parallel Algorithm Design
A four-step process for designing parallel
algorithms
Foster’s Design Methodology
Parallel Algorithm Design
Decomposition: partitioning the problem
into tasks.
Communication: connecting tasks to each
other.
Agglomeration: reducing the number of
tasks to reduce communication overhead.
Mapping: assigning tasks to processes (see the sketch below).
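A rough sketch (not from the slides, written serially for clarity) of where each of Foster's phases shows up when designing a parallel sum of N values; N and the process count P are hypothetical.

#include <stdio.h>

#define N 1000   /* problem size (assumed)                */
#define P 4      /* number of processes/threads (assumed) */

int main(void) {
    static int a[N];
    for (int i = 0; i < N; i++) a[i] = 1;   /* made-up data */

    /* Decomposition: treat each addition "+= a[i]" as a primitive task.    */
    /* Communication: the results of those tasks must be combined into one  */
    /*                global sum.                                           */
    /* Agglomeration: group N/P additions into one larger task per process, */
    /*                so only P partial sums need to be communicated.       */
    /* Mapping: assign one agglomerated chunk to each of the P processes    */
    /*          (simulated below by the loop over p).                       */
    long global_sum = 0;
    for (int p = 0; p < P; p++) {
        long my_sum = 0;
        for (int i = p * (N / P); i < (p + 1) * (N / P); i++)
            my_sum += a[i];        /* local work of one mapped task     */
        global_sum += my_sum;      /* the communication/combining step  */
    }
    printf("sum = %ld\n", global_sum);   /* prints 1000 */
    return 0;
}

In an actual parallel version each mapped chunk would run on a different process or thread, and the combining step would be done with message passing or a shared-memory reduction.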
Concluding Remarks (1)
◼ The laws of physics have brought us to the
doorstep of multicore technology.
◼ Serial programs typically don’t benefit from
multiple cores.
◼ Automatic generation of parallel programs from serial code isn't the most efficient approach for getting high performance from multicore computers.