Parallel Programming 1
Parallel Programming 1
●
Some parts of the program are not
parallelizable (because of dependencies)
●
Some parts may be parallelizable, but the
overhead dwarfs the increased speedup.
How could two statements execute in parallel?
●
On one processor:
statement 1;
statement 2;
●
On two processors:
processor1: processor2:
statement1; statement2;
Fundamental Assumption
●
Processors execute independently:
no control over order of execution
among processors
How could two statements execute in parallel?
• Possibility 1
Processor1: Processor2:
statement1;
statement2;
• Possibility 2
Processor1: Processor2:
statement2:
statement1;
How could two statements execute in parallel?
●
Their order of execution must not matter!
●
In other words,
statement1; statement2;
must be equivalent to
statement2; statement1;
Example 1
a = 1;
b = a;
●
Statements cannot be executed in parallel
●
Program modifications may make it possible.
Example 2
a = f(x);
b = a;
●
May not be wise to change the program.
Example 3
a = 1;
a = 2;
●
Statements cannot be executed in parallel.
Dependencies
●
What prevent us from parallelizing the
previous codes is the dependency between
the code statements.
Types of Dependencies
●
True dependency
●
Anti dependency
●
Output dependency
True dependence
Statements S1, S2
S2 has an anti-dependence on S1
iff →
S2 writes a value read by S1.
Output Dependence
Statements S1, S2.
●
No dependencies
●
Iterations can be executed in parallel.
Example 5
for(i=0; i<100; i++) {
a[i] = i;
b[i] = 2*i;
}
for(i=0;i<100;i++) a[i] = i;
for(i=0;i<100;i++) b[i] = 2*i;
●
There is a dependence … on itself!
●
Loop is still parallelizable.
Example 8
for( i=0; i<100; i++ )
a[i] = f(a[i-1]);
●
Dependence between a[i] and a[i-1].
●
Loop iterations are not parallelizable.
Loop-carried dependence
●
A loop carried dependence is a
dependence that is present only if the
statements are part of the execution of a
loop.
●
Otherwise, we call it a loop-independent
dependence.
●
Loop-carried dependencies prevent loop
iteration parallelization.
Example 9
for(i=0; i<100; i++ )
for(j=1; j<100; j++ )
a[i][j] = f(a[i][j-1]);
●
Loop-independent dependence on i.
●
Loop-carried dependence on j.
●
Outer loop can be parallelized, inner
loop cannot.
Example 10
for( j=1; j<100; j++ )
for( i=0; i<100; i++ )
a[i][j] = f(a[i][j-1]);
●
Inner loop can be parallelized, outer loop
cannot.
●
Less desirable situation.
●
Loop interchange is sometimes possible.
Level of loop-carried dependence
●
Is the nesting depth of the loop that
carries the dependence.
●
Indicates which loops can be parallelized.
Be careful … Example 11
printf(“a”);
printf(“b”);
●
Dependence between a[10], a[20], …
●
Dependence between a[11], a[21], …
●
…
●
Some parallel execution is possible.
Be careful … Example 14
for( i=1; i<100;i++ ) {
a[i] = …;
... = a[i-1];
}
●
Dependence between a[i] and a[i-1]
●
Complete parallel execution impossible
●
Pipelined parallel execution possible
Be careful … Example 15
for( i=0; i<100; i++ )
a[i] = f(a[indexa[i]]);