Comp322 s19 Lec01 Slides v1 PDF
Fundamentals of
Parallel Programming
Lecture 1: Task Creation & Termination
(async, finish)
http://comp322.rice.edu
• Instruction-level parallelism
(ILP) in hardware has also
plateaued below 10
instructions/cycle
• ⇒ Parallelism must be
managed by software!
[Figure: Schematic of a dual-core processor (labels: L2 Cache, interaction bus)]
Observations:
• The decision to sum up the elements from left to right was arbitrary
• The computation graph shows that all operations must be executed sequentially

This algorithm is simple to understand since it sums the elements of the array from left to right. However, we could have obtained the same algebraic result by summing the elements in a different order instead. This over-specification of the ordering of operations in sequential programs is referred to as the Von Neumann bottleneck [1]. The left-to-right evaluation order can be seen in the computation graph shown in Figure 1. We will study computation graphs later in the course. For now, think of each node or vertex (denoted by a circle) as an operation, and each edge (denoted by an arrow) as an ordering constraint between the operations, arising from the flow of the output from the first operation to the input of the second operation.

COMP 322, Spring 2019 (M. Joyner, Z. Budimlić)
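A minimal sketch of the sequential array sum discussed above, in plain Java. The class name, method names, and sample array are assumptions made for illustration, not taken from the slides. It shows that the left-to-right order is a choice: summing from the other end gives the same result for integer data, yet in both loops each addition depends on the previous one.

```java
class SequentialSum {
    // Sums the array from index 0 upward (left to right).
    static long sumLeftToRight(int[] x) {
        long sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i]; // each addition needs the result of the previous one
        }
        return sum;
    }

    // Sums the same array from the last index downward (right to left).
    static long sumRightToLeft(int[] x) {
        long sum = 0;
        for (int i = x.length - 1; i >= 0; i--) {
            sum += x[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] x = {3, 1, 4, 1, 5, 9, 2, 6}; // hypothetical sample data
        System.out.println(sumLeftToRight(x)); // prints 31
        System.out.println(sumRightToLeft(x)); // prints 31
    }
}
```

Either loop forms a single chain of dependent additions, which is why the computation graph has no two operations that could run at the same time.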
Parallelization Strategy for two cores
(Two-way Parallel Array Sum)
Task 0: Compute sum of lower half of array
Task 1: Compute sum of upper half of array
Basic idea:
• Decompose problem into two tasks for partial sums
• Combine results to obtain final answer
• Parallel divide-and-conquer pattern
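The two-task decomposition above can be sketched in standard Java. This is an illustration under stated assumptions, not the course's own code: it uses `CompletableFuture` from `java.util.concurrent` in place of the course constructs, and the names `TwoWayArraySum`, `sumRange`, and `twoWaySum` are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

class TwoWayArraySum {
    // Sequentially sums x[lo] .. x[hi - 1].
    static long sumRange(int[] x, int lo, int hi) {
        long sum = 0;
        for (int i = lo; i < hi; i++) {
            sum += x[i];
        }
        return sum;
    }

    // Splits the array in half, sums each half as a separate task,
    // then combines the two partial sums.
    static long twoWaySum(int[] x) {
        int mid = x.length / 2;
        // Task 0: lower half of the array.
        CompletableFuture<Long> lower =
                CompletableFuture.supplyAsync(() -> sumRange(x, 0, mid));
        // Task 1: upper half of the array.
        CompletableFuture<Long> upper =
                CompletableFuture.supplyAsync(() -> sumRange(x, mid, x.length));
        // join() waits for each task to complete before the results are added.
        return lower.join() + upper.join();
    }

    public static void main(String[] args) {
        int[] x = {3, 1, 4, 1, 5, 9, 2, 6}; // hypothetical sample data
        System.out.println(twoWaySum(x)); // prints 31
    }
}
```

The two partial sums touch disjoint halves of the array, so neither task depends on the other; only the final addition needs both results.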
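// T0 (Parent task)
STMT0;
finish {            // Begin finish
  async {
    STMT1;          // T1 (Child task)
  }
  STMT2;            // Continue in T0
                    // Wait for T1
}                   // End finish
STMT3;              // Continue in T0

[Figure: fork/join diagram. T0 executes STMT0, then forks T1 at the async. T1 executes STMT1 while T0 executes STMT2. The two tasks join at the end of the finish, after which T0 executes STMT3.]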
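One way to approximate the ordering of this pseudocode in standard Java is with `Thread.start()` for the fork and `Thread.join()` for the join. This is only a sketch: the course's `async` and `finish` are library constructs that are not reproduced here, and the class name `AsyncFinishSketch` and the log list are assumptions for illustration. Each STMT is replaced by an entry appended to a log so the observed order can be inspected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AsyncFinishSketch {
    // Runs stand-ins for STMT0..STMT3 and returns the order in which they ran.
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        log.add("STMT0");                               // T0, before the finish
        Thread t1 = new Thread(() -> log.add("STMT1")); // child task T1
        t1.start();                                     // fork (plays the role of async)
        log.add("STMT2");                               // T0 continues; may overlap with STMT1
        try {
            t1.join();                                  // join (end of finish): wait for T1
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        log.add("STMT3");                               // T0 resumes only after T1 is done
        return log;
    }

    public static void main(String[] args) {
        // STMT0 is always first and STMT3 always last;
        // STMT1 and STMT2 may appear in either order.
        System.out.println(run());
    }
}
```

Running it several times may show STMT1 and STMT2 in different orders, which reflects the absence of an ordering constraint between the child task and the parent's continuation.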
since there is no edge or sequence of edges connecting Tasks T2 and T3. This indicates that tasks T2 and T3 can execute in parallel with each other; for example, if your computer has two processor cores, T2 and T3 can be executed on two different processors at the same time. We will see much richer examples of programs using async, finish and other constructs during the course.

Two-way Parallel Array Sum
using async & finish constructs