Overview
(CS 3006)
• Parallelism provides:
– Solving larger problems in the same time (or even less)
– Solving fixed-size problems faster
Why do parallel computing?
Two main aspects:
1. Technology Push
2. Applications Pull
Technology Push
Single Processor Systems
– From the mid-1980s until 2004, computers got faster:
• More transistors were packed onto chips every year
• Transistors continued to get smaller
• The result was faster processors (frequency scaling: more GHz means more performance), as observed in Moore's Law
Moore’s Law
• An observation by Gordon Moore (co-founder of
Intel)
• Projection based on a historical trend in processor design, dating from the 1970s:
Number of transistors in a processor doubles roughly every two years (Moore's original 1965 projection said every year)
• Explicit Parallelism:
– VLIW, more execution units
Figure: Multi-core CPU chip with four cores (core 1 to core 4)
Figure: Threads may be time-sliced across the cores (like a uniprocessor)
Other Option: Simultaneous Multithreading
Figure: Four cores, each executing multiple hardware threads simultaneously
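One way to see how many hardware threads (physical cores × SMT ways) a machine exposes is std::thread::hardware_concurrency() in standard C++; a minimal sketch:

#include <iostream>
#include <thread>

int main() {
    // Reports the number of concurrent threads the hardware supports;
    // on an SMT machine this is typically cores x threads-per-core.
    // May return 0 if the value is not computable.
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "Hardware threads: " << n << '\n';
    return 0;
}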
Mobile Parallel Processing
Power constraints also heavily influence the design of mobile systems.
Figure: Intel processor die map (source: http://simplecore.intel.com/newsroom/wp-content/uploads/sites/11/HSW-E-Die-Mapping-Hi-Res.jpg)
Why do parallel computing?
Applications Pull
Figure: A single processor with a Fetch/Decode unit, an execution unit (ALU), and an execution context, running one instruction stream (ld r0, addr[r1]; mul r1, r0, r0; mul r1, r1, r0; ...; st addr[r2], r0) that computes y[i] from x[i].

Figure: Two simpler cores, each with its own Fetch/Decode unit, process two elements x[i] and x[j] in parallel. Even if each simpler core runs at only 0.75× the speed of the original core, there are now two cores: 2 × 0.75 = 1.5 (potential for speedup!)
Expressing parallelism using C++ threads
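A minimal sketch of expressing this two-core parallelism with C++ threads (the cube function, array size, and two-way split are illustrative assumptions, not from the slides):

#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

// Illustrative work function, in the spirit of the mul-heavy
// instruction stream above: compute y[i] = x[i]^3 over a range.
void cube_range(const std::vector<double>& x, std::vector<double>& y,
                std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        y[i] = x[i] * x[i] * x[i];
}

int main() {
    std::vector<double> x(8, 2.0), y(8, 0.0);

    // Two cores -> two threads, each handling half of the elements.
    std::thread t1(cube_range, std::cref(x), std::ref(y), 0, x.size() / 2);
    std::thread t2(cube_range, std::cref(x), std::ref(y), x.size() / 2, x.size());

    t1.join();
    t2.join();

    std::cout << "y[0] = " << y[0] << '\n';  // prints 8
    return 0;
}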
Some Terminology
• Task
– A logically discrete section of computational work.
– A task is typically a program or program-like set of
instructions that is executed by a processor.
• Parallel Task
– A task that can be executed by multiple processors
safely (producing correct results)
• Serial Execution
– Execution of a program sequentially, one statement at
a time using one processor.
Steps in Creating Parallel Programs
Parallelization Strategy
1. Problem Understanding
2. Partitioning/Decomposition
3. Assignment
4. Orchestration
5. Mapping
1. Problem Understanding
• The first step in developing a parallel application is to understand the problem that you wish to solve
• Potential solutions:
– Restructure the program
– Use a different algorithm
– Overlap communication with computation (see the sketch below)
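One way to overlap communication with computation in standard C++ is to launch the next data transfer asynchronously while computing on the data already in hand; a sketch (load_chunk and process are hypothetical placeholders, not from the slides):

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a communication step (e.g., fetching remote data).
std::vector<int> load_chunk(int id) {
    return std::vector<int>(1000, id);
}

// Hypothetical computation on one chunk.
long long process(const std::vector<int>& chunk) {
    return std::accumulate(chunk.begin(), chunk.end(), 0LL);
}

int main() {
    long long total = 0;
    std::vector<int> current = load_chunk(0);

    for (int next = 1; next <= 3; ++next) {
        // Start fetching the next chunk asynchronously (communication)...
        auto pending = std::async(std::launch::async, load_chunk, next);
        // ...while computing on the chunk we already have (computation).
        total += process(current);
        current = pending.get();  // wait for the overlapped fetch
    }
    total += process(current);

    std::cout << "total = " << total << '\n';
    return 0;
}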
Amdahl's Law
• Speedup = wall-clock time of serial execution / wall-clock time of parallel execution
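From this definition, Amdahl's Law bounds the speedup achievable when only part of a program parallelizes; a standard form (the serial fraction s and processor count N are notation introduced here, not from the slides):

S(N) = \frac{1}{s + \frac{1 - s}{N}} \le \frac{1}{s}

For example, with s = 0.1 and N = 8: S = 1 / (0.1 + 0.9/8) ≈ 4.7, and no processor count can push S past 10.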
2. Partitioning/Decomposition
• Divide the work into discrete "chunks"
• These chunks can then be assigned to specific tasks
• Tasks can be executed concurrently
• Decomposition: (1) domain decomposition, (2) functional decomposition
• Decomposition granularity (a code sketch follows the figure below):
– Fine-grain: large number of small tasks
– Coarse-grain: small number of large tasks
Figure: Fine-grained tasks (many small chunks) vs. coarse-grained tasks (few large chunks)
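A minimal sketch of coarse-grained domain decomposition with C++ threads (the array-summing workload and one-chunk-per-thread scheme are illustrative assumptions):

#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1'000'000;
    const unsigned nthreads = 4;  // one coarse-grained chunk per thread
    std::vector<double> data(n, 1.0);
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> workers;

    // Domain decomposition: each thread owns one contiguous chunk of the array.
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            std::size_t begin = t * n / nthreads;
            std::size_t end = (t + 1) * n / nthreads;
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) w.join();

    double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::cout << "sum = " << total << '\n';  // expect 1000000
    return 0;
}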
Granularity
The choice of granularity involves trade-offs in:
a) Inter-task communication
b) Synchronization among tasks
c) Data locality
d) Other system-related considerations (NUMA, etc.)
Communication
Who Needs Communication?
• No communication required:
– Some problems can be decomposed and executed in parallel with virtually no need for tasks to share data
• Imagine an image-processing operation where every pixel in a black-and-white image needs to have its color reversed
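A sketch of that communication-free case with C++ threads, assuming the image is a flat vector of 0/255 grayscale values (the representation and thread count are assumptions for illustration):

#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    // A "black and white" image as a flat array of pixel values (0 or 255).
    std::vector<unsigned char> image(512 * 512, 0);
    const unsigned nthreads = 4;
    std::vector<std::thread> workers;

    // Each thread inverts its own slice of pixels; no task ever reads or
    // writes another task's data, so no communication is required.
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            std::size_t begin = t * image.size() / nthreads;
            std::size_t end = (t + 1) * image.size() / nthreads;
            for (std::size_t i = begin; i < end; ++i)
                image[i] = 255 - image[i];
        });
    }
    for (auto& w : workers) w.join();

    std::cout << "image[0] = " << static_cast<int>(image[0]) << '\n';  // 255
    return 0;
}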