Lec1 and 2
Lecture 1
Why Parallel Computing?
INTRODUCTION
WEEK 01
Course Objectives
◦Learn how to program parallel processors and systems
◦Learn how to think in parallel and write correct parallel programs
◦Achieve performance and scalability through an understanding of architecture and software mapping
◦Gain significant hands-on programming experience: develop real applications on real hardware
◦Discuss the current parallel computing context: the drivers that make this course timely, contemporary programming models and architectures, and where the field is going
Why is this Course Important?
The multi-core and many-core era is here to stay. Why? Technology trends.
Learn how to put these vast machine resources to the best use!
Useful for:
◦Joining industry
◦Graduate school
Our focus: teaching core concepts
Roadmap
Why we need ever-increasing performance.
Why we’re building parallel systems.
Why we need to write parallel programs.
How do we write parallel programs?
What we’ll be doing.
Concurrent, parallel, distributed!
Parallel and Distributed
Computing
Parallel computing (processing):
the use of two or more processors (computers), usually within a single system, working in combination to solve a single problem
Parallel programming:
the human process of developing programs that express what computations should be executed in parallel
Parallel Computing
A problem is broken up so it can be run using multiple CPUs:
◦The problem is broken into discrete parts that can be solved concurrently
◦Each part is further broken down into a series of instructions
◦Instructions from each part execute simultaneously on different CPUs
Parallel Computing
The simultaneous use of multiple compute resources to solve a
computational problem.
Compute Resources
The compute resources can include:
◦A single computer with multiple processors/cores
◦An arbitrary number of computers connected by
a network
◦A combination of both
Why we need ever-increasing
performance
Computational power is increasing, but so are our
computation problems and needs.
Problems we never dreamed of have been
solved because of past increases, such as
decoding the human genome.
More complex problems are still waiting to be
solved.
Climate modeling
National Oceanic and Atmospheric Administration
(NOAA) has more than 20PB of data and processes
80TB/day
Climate modeling
One processor computes one part of the model grid while another processor computes a different part in parallel.
Data analysis
CERN’s Large Hadron Collider (LHC) produces about 15PB per year
High-energy physics workflows involve a range of both data-intensive and
compute-intensive activities.
The collision data from the detectors on the LHC needs to be filtered to select a
few thousand interesting collisions from as many as one billion that may take
place each second.
The WLCG produces a massive sample of billions of simulated beam crossings,
trying to predict the response of the detector and compare it to known physics
processes and potential new physics signals.
Drug discovery
Computational drug discovery and design (CDDD) based on HPC combines pharmaceutical chemistry, computational chemistry, and biology on supercomputers, and has become a critical technology in drug research and development.
Why Parallel Computing?
The Real World is Massively Parallel:
◦Parallel computing attempts to emulate the natural world
◦Many complex, interrelated events happening at the same time, yet within a
sequence.
Why Parallel Computing?
To solve larger, more complex problems:
◦Numerical simulations of complex systems and "Grand Challenge Problems" such as climate modeling, high-energy physics data analysis, and drug discovery
To provide Concurrency:
Commercial applications require the processing of large amounts of data
in sophisticated ways.
Why Parallel Computing?
◦ To save time
◦ To solve larger problems
◦ To provide concurrency
Who and What?
Top500.org provides statistics on parallel computing
users.
The Future?
During the past 20 years, the trends indicated by ever faster
networks, distributed systems, and multi-processor
architectures clearly show that parallelism is the future of
computing.
In this same time period, there has been a greater than
500,000x increase in supercomputer performance, with no
end currently in sight.
The race is already on for Exascale Computing!
Exaflop = 10^18 calculations per second
Towards parallel hardware
Why we’re building parallel
systems
Up to now, performance increases have been
attributable to increasing density of transistors.
A little physics lesson
Smaller transistors = faster processors.
Faster processors = increased power
consumption.
Increased power consumption =
increased heat.
Increased heat = unreliable processors.
Evolution of processors in the last 50 years
How small is 5nm?
https://www.tsmc.com/english/dedicatedFoundry/technology/logic/l_5nm
An intelligent solution
Instead of designing and building faster
microprocessors, put multiple processors on a
single integrated circuit.
Move away from single-core systems to
multicore processors.
Introducing parallelism!!!
Basic Computer Architecture
Old computers had one unit (core) to execute instructions; new computers have 4 or more CPU cores.
Memory Cache
L1 Cache
Size is up to 2MB
Typically 100 times faster than RAM
L2 Cache
Size is typically between 256KB to 8MB
Typically 25 times faster than RAM
L3 Cache
Size is up to 64MB
L3 cache is a general memory pool that the entire chip can make use of
Basic Concepts
Task
◦A logically discrete (independent) section of computational work.
◦A task is typically a program or set of instructions executed by a
processor.
Basic Concepts
Parallel Task
◦A task that can be executed by multiple processors safely (yields
correct results)
Basic Concepts
Parallel Program
◦A program which consists of multiple tasks running on multiple
processors, simultaneously.
Basic Concepts
Serial Execution
◦Execution of a program sequentially, one statement at a time. All
parallel tasks will have sections that must be executed serially.
Parallel Execution
◦Execution of a program by more than one task, with each task being
able to execute the same or different statement at the same moment
in time (simultaneously).
Basic Concepts
Node
◦A standalone "computer in a box".
◦Usually comprised of multiple processors/cores, memory, network
interfaces, etc.
Basic Concepts
Communications
◦The data exchange between parallel tasks.
◦There are several ways this can be accomplished, such as through a
shared memory bus or over a network.
Basic Concepts
Synchronization
◦The coordination of parallel tasks in real time, very often associated
with communications.
◦Synchronization usually involves waiting by at least one task, and can
therefore cause a parallel application's execution time to increase.
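As an illustrative sketch of this idea (not from the slides — the counter and thread count are assumptions), a lock coordinates tasks that update shared data; the waiting each task does at the lock is exactly the synchronization overhead described above:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n_increments):
    global counter
    for _ in range(n_increments):
        # A task must wait here whenever another task holds the lock:
        # this waiting is what makes synchronization add to execution time.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 — correct only because the tasks synchronized
```

Without the lock, the unsynchronized read-modify-write of `counter` could lose updates; with it, the result is correct but the tasks serialize at the critical section.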
Basic Concepts
Massively Parallel
◦Refers to the hardware that comprises a parallel system - having
many processors.
◦The meaning of many keeps increasing (up to 6 digits!!!!).
Basic Concepts
Parallel Computing System
◦Consists of multiple processors having direct (usually bus based)
access to common physical memory.
◦All processors communicate with each other through the shared memory.
Basic Concepts
Distributed System
◦Contains multiple processors connected by a communication
network.
◦Refers to network based memory access for physical memory that is
not common.
Memory Models
There are three common kinds of parallel
memory models
Shared
Distributed
Hybrid
Shared Memory Model
All cores share the same pool of memory
HPC Architecture – we talked about the
memory available on one node
Any memory change is seen by all processors
Benefits and Drawback
Benefit:
Data sharing is fast
Drawback:
Adding more processors may lead to performance
issues when accessing the same shared memory
resource (memory contention)
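A minimal sketch of the shared-memory model (an illustration, not from the slides — the list and thread layout are assumptions): two threads in one process write into the same list, so a change made by one is immediately visible to the other.

```python
import threading

shared = [0] * 8  # one pool of memory visible to every thread

def fill(start, stop):
    for i in range(start, stop):
        shared[i] = i * i  # writes land in the common memory

# Each thread fills a disjoint half, so no lock is needed here.
t1 = threading.Thread(target=fill, args=(0, 4))
t2 = threading.Thread(target=fill, args=(4, 8))
t1.start(); t2.start()
t1.join(); t2.join()
print(shared)  # both threads' changes are visible: [0, 1, 4, 9, 16, 25, 36, 49]
```

Data sharing is this cheap precisely because everything lives in one memory; the flip side is that many threads hammering the same locations contend for it.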
Distributed Memory Model
In a distributed memory model, each core has its own
memory
Processors communicate only through a network connection and/or a communication protocol (e.g., MPI)
Changes to local memory associated with processor do not
have an impact on other processors
Remote-memory access must be explicitly managed by the
programmer
Benefits and Drawbacks
Biggest benefit is scalability
Adding more processors doesn’t result in resource
contention as far as memory is concerned
Biggest Drawback
Can be tedious to program for distributed memory
models
All data relocation must be programmed by hand
Hybrid Memory Model
As the name implies, the hybrid memory model is a
combination of the shared and distributed memory
models
Most large, fast clusters today use a hybrid memory model
A certain number of cores share the memory on one
node, but are connected to the cores sharing memory
on other nodes through a network
Benefits and Drawbacks
Benefit:
Scalability
Drawback
Must know how to program communication
between nodes (e.g., MPI)