L28 Parallelization

Data Dependence, Parallelization, and Locality Enhancement
Optimizing Compilers: Parallelization
Todd C. Mowry, Carnegie Mellon
(courtesy of Tarek Abdelrahman, University of Toronto)

Data Dependence

S1: A = 1.0
S2: B = A + 2.0
S3: A = C + D
S4: A = B / C

We define four types of data dependence.

• Flow (true) dependence: a statement Si precedes a statement Sj in execution and Si computes a data value that Sj uses.
  – Implies that Si must execute before Sj.

  Si δt Sj   (here S1 δt S2 and S2 δt S4)


• Anti dependence: a statement Si precedes a statement Sj in execution and Si uses a data value that Sj computes.
  – It implies that Si must be executed before Sj.

  Si δa Sj   (here S2 δa S3)

• Output dependence: a statement Si precedes a statement Sj in execution and Si computes a data value that Sj also computes.
  – It implies that Si must be executed before Sj.

  Si δo Sj   (here S1 δo S3 and S3 δo S4)
• Input dependence: a statement Si precedes a statement Sj in execution and Si uses a data value that Sj also uses.
  – Does this imply that Si must execute before Sj? (No: a shared read imposes no ordering constraint.)

  Si δI Sj   (here S3 δI S4)

Data Dependence (continued)

• The dependence is said to flow from Si to Sj because Si precedes Sj in execution.
• Si is said to be the source of the dependence; Sj is said to be the sink of the dependence.
• The only "true" dependence is flow dependence; it represents the flow of data in the program.
• The other types of dependence are caused by programming style; they may be eliminated by renaming:

S1: A  = 1.0
S2: B  = A + 2.0
S3: A1 = C + D
S4: A2 = B / C

Data Dependence (continued)

• Data dependence in a program may be represented using a dependence graph G = (V, E), where the nodes V represent statements in the program and the directed edges E represent dependence relations.

For the running example:

S1: A = 1.0
S2: B = A + 2.0
S3: A = C + D
S4: A = B / C

[Dependence graph: edges S1 →t S2, S1 →o S3, S2 →a S3, S2 →t S4, S3 →o S4, S3 →I S4.]

Value or Location?

• There are two ways a dependence is defined: value-oriented or location-oriented.

Example 1

do i = 2, 4
S1:   a(i) = b(i) + c(i)
S2:   d(i) = a(i)
end do

[Instance diagram: in each iteration i = 2, 3, 4, S1[i] writes a(i) and S2[i] reads it; the flow edges stay within one iteration.]

• There is an instance of S1 that precedes an instance of S2 in execution and S1 produces data that S2 consumes.
• S1 is the source of the dependence; S2 is the sink of the dependence.
• The dependence flows between instances of statements in the same iteration (loop-independent dependence).
• The number of iterations between source and sink (the dependence distance) is 0. The dependence direction is =.

  S1 δt S2, or S1 δt_0 S2

Example 2

do i = 2, 4
S1:   a(i) = b(i) + c(i)
S2:   d(i) = a(i-1)
end do

[Instance diagram: S1[i] writes a(i), which S2[i+1] reads in the next iteration; the flow edges cross iterations.]

• There is an instance of S1 that precedes an instance of S2 in execution and S1 produces data that S2 consumes.
• S1 is the source of the dependence; S2 is the sink of the dependence.
• The dependence flows between instances of statements in different iterations (loop-carried dependence).
• The dependence distance is 1. The direction is positive (<).

  S1 δt S2, or S1 δt_1 S2

Example 3

do i = 2, 4
S1:   a(i) = b(i) + c(i)
S2:   d(i) = a(i+1)
end do

[Instance diagram: S2[i] reads a(i+1), which S1[i+1] overwrites in the next iteration; the anti edges cross iterations.]

• There is an instance of S2 that precedes an instance of S1 in execution, and S2 uses a data value that S1 subsequently computes.
• S2 is the source of the dependence; S1 is the sink of the dependence.
• The dependence is loop-carried.
• The dependence distance is 1.

  S2 δa S1, or S2 δa_1 S1

• Are you sure you know why it is S2 δa S1 even though S1 appears before S2 in the code? (The dependence relates statement instances, not textual positions: the instance S2[i] executes before the instance S1[i+1].)

Example 4

do i = 2, 4
  do j = 2, 4
S:    a(i,j) = a(i-1,j+1)
  end do
end do

[Instance diagram: S[i,j] writes a(i,j), which S[i+1,j-1] reads one i-iteration later; e.g., a(2,3) is written at (i,j) = (2,3) and read at (3,2).]

• An instance of S precedes another instance of S and S produces data that S consumes.
• S is both source and sink of the dependence.
• The dependence is loop-carried.
• The dependence distance is (1, -1).

  S δt_(<,>) S, or S δt_(1,-1) S
Problem Formulation

• Consider the following perfect nest of depth d:

do I1 = L1, U1
  do I2 = L2, U2
    ...
    do Id = Ld, Ud
      a(f1(I), f2(I), ..., fm(I)) = ...
      ... = a(g1(I), g2(I), ..., gm(I))
    end do
    ...
  end do
end do

• Here I = (I1, I2, ..., Id) is the iteration vector, and L = (L1, L2, ..., Ld) and U = (U1, U2, ..., Ud) are the bounds, with L ≤ I ≤ U componentwise. The k-th argument slot of an array reference is its k-th subscript position, and each fk(I) and gk(I) is a subscript function (subscript expression).
• The subscript functions are assumed to be linear: b0 + b1*I1 + b2*I2 + ... + bd*Id.

• Dependence will exist if there exist two iteration vectors k and j such that L ≤ k ≤ j ≤ U and:

  f1(k) = g1(j)  and  f2(k) = g2(j)  and  ...  and  fm(k) = gm(j)

• That is:

  f1(k) - g1(j) = 0  and  f2(k) - g2(j) = 0  and  ...  and  fm(k) - gm(j) = 0

Problem Formulation - Example

do i = 2, 4
S1:   a(i) = b(i) + c(i)
S2:   d(i) = a(i-1)
end do

• Does there exist two iteration vectors i1 and i2, such that 2 ≤ i1 ≤ i2 ≤ 4 and i1 = i2 - 1?
• Answer: yes; i1=2 & i2=3, and i1=3 & i2=4.
• Hence, there is dependence!
• The dependence distance vector is i2 - i1 = 1.
• The dependence direction vector is sign(1) = <.

Problem Formulation - Example

do i = 2, 4
S1:   a(i) = b(i) + c(i)
S2:   d(i) = a(i+1)
end do

• Does there exist two iteration vectors i1 and i2, such that 2 ≤ i1 ≤ i2 ≤ 4 and i1 = i2 + 1?
• Answer: yes; i1=3 & i2=2, and i1=4 & i2=3. (But, but! These pairs have i1 > i2.)
• Hence, there is dependence!
• The dependence distance vector is i2 - i1 = -1.
• The dependence direction vector is sign(-1) = >.
• Is this possible? (A ">" direction says the "source" would execute after the "sink"; read the other way around, this is the anti dependence S2 δa S1 of Example 3.)
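Formulations this small can be checked by brute force. The following is a minimal Fortran sketch of my own (an illustration, not any production dependence tester) that enumerates iteration pairs for the first example above; f encodes the write-side subscript of a(i) and g the read-side subscript of a(i-1):

program brute_force_dep
  implicit none
  integer, parameter :: L = 2, U = 4
  integer :: i1, i2
  ! Enumerate all pairs with i1 <= i2 and test the subscript
  ! equation f(i1) = g(i2) from the formulation above.
  do i1 = L, U
     do i2 = i1, U
        if (f(i1) == g(i2)) then
           print *, 'dependence: source i1 =', i1, ' sink i2 =', i2, &
                    ' distance =', i2 - i1
        end if
     end do
  end do
contains
  integer function f(i)   ! write side: a(i)
    integer, intent(in) :: i
    f = i
  end function f
  integer function g(i)   ! read side: a(i-1)
    integer, intent(in) :: i
    g = i - 1
  end function g
end program brute_force_dep

Running it reports exactly the two pairs found above, (2,3) and (3,4), each with distance 1.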
Problem Formulation - Example

do i = 1, 10
S1:   a(2*i) = b(i) + c(i)
S2:   d(i) = a(2*i+1)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and 2*i1 = 2*i2 + 1?
• Answer: no; 2*i1 is even and 2*i2 + 1 is odd.
• Hence, there is no dependence!

Problem Formulation

• Dependence testing is equivalent to an integer linear programming (ILP) problem of 2d variables and m+d constraints!
• An algorithm that determines whether there exist two iteration vectors k and j that satisfy these constraints is called a dependence tester.
• The dependence distance vector is given by j - k.
• The dependence direction vector is given by sign(j - k).
• Dependence testing is NP-complete!
• A dependence test that reports dependence only when there is dependence is said to be exact. Otherwise it is inexact.
• A dependence test must be conservative; if the existence of dependence cannot be ascertained, dependence must be assumed.

Dependence Testers

• Lamport's Test.
• GCD Test.
• Banerjee's Inequalities.
• Generalized GCD Test.
• Power Test.
• I-Test.
• Omega Test.
• Delta Test.
• Stanford Test.
• etc.

Lamport's Test

• Lamport's Test is used when there is a single index variable in the subscript expressions, and when the coefficients of the index variable in both expressions are the same:

A(..., b*i + c1, ...) = ...
... = A(..., b*i + c2, ...)

• The dependence problem: does there exist i1 and i2, such that Li ≤ i1 ≤ i2 ≤ Ui and b*i1 + c1 = b*i2 + c2, i.e., i2 - i1 = (c1 - c2)/b?
• There is an integer solution if and only if (c1 - c2)/b is an integer.
• The dependence distance is d = (c1 - c2)/b, provided |d| ≤ Ui - Li (the distance must be realizable within the loop bounds).
• d > 0 → true dependence.
  d = 0 → loop-independent dependence.
  d < 0 → anti dependence.
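A direct transcription of this test into Fortran might look as follows. This is a sketch under the slide's assumptions (a single index variable whose coefficient b is the same, and positive, in both subscripts); the routine name and interface are mine:

program lamport_demo
  implicit none
  integer :: d
  ! i subscripts of a(i,j) = a(i-1,j+1): b = 1, c1 = 0, c2 = -1,
  ! with assumed bounds 1..100.
  if (lamport_dep(1, 0, -1, 1, 100, d)) then
     print *, 'dependence in i, distance =', d
  else
     print *, 'no dependence in i'
  end if
contains
  logical function lamport_dep(b, c1, c2, Li, Ui, d)
    integer, intent(in)  :: b, c1, c2, Li, Ui
    integer, intent(out) :: d
    lamport_dep = .false.
    d = 0
    if (mod(c1 - c2, b) /= 0) return   ! (c1-c2)/b is not an integer
    d = (c1 - c2) / b                  ! the dependence distance
    lamport_dep = abs(d) <= Ui - Li    ! distance must fit in the bounds
  end function lamport_dep
end program lamport_demo

The sign of d then distinguishes true (d > 0), loop-independent (d = 0), and anti (d < 0) dependence, as listed above.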
Lamport's Test - Example

do i = 1, n
  do j = 1, n
S:    a(i,j) = a(i-1,j+1)
  end do
end do

• i subscript: i1 = i2 - 1?  Here b = 1, c1 = 0, c2 = -1, so (c1 - c2)/b = 1.
  There is dependence; the distance in i is 1.
• j subscript: j1 = j2 + 1?  Here b = 1, c1 = 0, c2 = 1, so (c1 - c2)/b = -1.
  There is dependence; the distance in j is -1.

  S δt_(1,-1) S, or S δt_(<,>) S

Lamport's Test - Example

do i = 1, n
  do j = 1, n
S:    a(i,2*j) = a(i-1,2*j+1)
  end do
end do

• i subscript: i1 = i2 - 1?  Here b = 1, c1 = 0, c2 = -1, so (c1 - c2)/b = 1.
  There is dependence in i; the distance is 1.
• j subscript: 2*j1 = 2*j2 + 1?  Here b = 2, c1 = 0, c2 = 1, so (c1 - c2)/b = -1/2, which is not an integer.
  There is no dependence in j.
• Since a dependence must satisfy every subscript equation, there is no dependence!

GCD Test

• Given the following equation (the ai's and c are integers):

  a1*x1 + a2*x2 + ... + an*xn = c

  an integer solution exists if and only if gcd(a1, a2, ..., an) divides c.
• Problems:
  – it ignores loop bounds.
  – it gives no information on distance or direction of dependence.
  – often gcd(...) is 1, which always divides c, resulting in false dependences.

GCD Test - Example

do i = 1, 10
S1:   a(2*i) = b(i) + c(i)
S2:   d(i) = a(2*i-1)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and 2*i1 = 2*i2 - 1, i.e., 2*i2 - 2*i1 = 1?
• There will be an integer solution if and only if gcd(2, -2) divides 1.
• This is not the case, and hence, there is no dependence!
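The check is mechanical; here is a minimal Fortran sketch for the example above (the gcd helper and the driver are mine):

program gcd_test_demo
  implicit none
  ! 2*i1 = 2*i2 - 1, rewritten as 2*i2 - 2*i1 = 1:
  ! coefficients (2, -2) and constant c = 1.
  if (mod(1, gcd(2, -2)) == 0) then
     print *, 'gcd divides c: assume dependence'
  else
     print *, 'gcd does not divide c: no dependence'
  end if
contains
  recursive function gcd(a, b) result(g)
    integer, intent(in) :: a, b
    integer :: g
    if (b == 0) then
       g = abs(a)
    else
       g = gcd(b, mod(a, b))
    end if
  end function gcd
end program gcd_test_demo

Here gcd(2, -2) = 2, which does not divide 1, so the test correctly reports no dependence.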
GCD Test - Example

do i = 1, 10
S1:   a(i) = b(i) + c(i)
S2:   d(i) = a(i-100)
end do

• Does there exist two iteration vectors i1 and i2, such that 1 ≤ i1 ≤ i2 ≤ 10 and i1 = i2 - 100, i.e., i2 - i1 = 100?
• There will be an integer solution if and only if gcd(1, -1) divides 100.
• This is the case, and hence, there is dependence! Or is there? (The loop runs only 10 iterations, so a distance of 100 is impossible; because the GCD test ignores loop bounds, it reports a false dependence here.)

Dependence Testing Complications

• Unknown loop bounds:

do i = 1, N
S1:   a(i) = a(i+10)
end do

What is the relationship between N and 10? If N ≤ 10 the references can never overlap; a tester that cannot tell must conservatively assume dependence.

• Triangular loops:

do i = 1, N
  do j = 1, i-1
S:    a(i,j) = a(j,i)
  end do
end do

Must impose j < i as an additional constraint.

More Complications

• User variables:

do i = 1, 10
S1:   a(i) = a(i+k)
end do

This poses the same problem as unknown loop bounds, but it can also arise from loop transformations (e.g., normalization):

do i = L, H
S1:   a(i) = a(i-1)
end do
      →
do i = 1, H-L
S1:   a(i+L) = a(i+L-1)
end do

• Scalars:

do i = 1, N
S1:   x = a(i)
S2:   b(i) = x
end do
      →  (scalar expansion)
do i = 1, N
S1:   x(i) = a(i)
S2:   b(i) = x(i)
end do

j = N-1
do i = 1, N
S1:   a(i) = a(j)
S2:   j = j - 1
end do
      →  (induction-variable substitution)
do i = 1, N
S1:   a(i) = a(N-i)
end do

sum = 0
do i = 1, N
S1:   sum = sum + a(i)
end do
      →  (expand sum, then reduce at the end)
do i = 1, N
S1:   sum(i) = a(i)
end do
sum = sum(1) + sum(2) + ... + sum(N)
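A sum like the last one is the textbook reduction, and once recognized it can be parallelized directly; for instance, with OpenMP in Fortran (a sketch of mine; the array contents are placeholders):

program reduction_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: a(n), s
  a = 1.0
  s = 0.0
  ! reduction(+:s) gives each thread a private partial sum and
  ! combines them at the end, removing the carried dependence on s.
  !$omp parallel do reduction(+:s)
  do i = 1, n
     s = s + a(i)
  end do
  !$omp end parallel do
  print *, 's =', s   ! expect 1000.0
end program reduction_demo

Compiled without OpenMP, the directive is just a comment and the loop runs sequentially.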
Serious Complications

• Aliases.
  – Equivalence statements in Fortran:

    real a(10,10), b(10)
    equivalence (a, b)

    makes b the same as the first column of a. (The equivalence line itself is reconstructed here; it is what the accompanying remark describes.)

  – Common blocks: Fortran's way of having shared/global variables:

    common /shared/ a, b, c
    ⋮
    subroutine foo (...)
    common /shared/ x, y, z

    Different routines may declare the same block with different names, so x, y, z alias a, b, c.

Loop Parallelization

• A dependence is said to be carried by a loop if the loop is the outermost loop whose removal eliminates the dependence. If a dependence is not carried by the loop, it is loop-independent.

do i = 2, n-1
  do j = 2, m-1
    a(i, j) = ...
    ... = a(i, j)

    b(i, j) = ...
    ... = b(i, j-1)

    c(i, j) = ...
    ... = c(i-1, j)
  end do
end do

The three pairs of references create three flow dependences:

• a(i, j) is read in the same iteration that writes it: δt_(=,=), loop-independent.
• b(i, j-1) reads the value written in the previous j iteration: δt_(=,<), carried by the j loop.
• c(i-1, j) reads the value written in the previous i iteration: δt_(<,=), carried by the i loop.

• The outermost loop with a non-"=" direction carries the dependence!

Loop Parallelization

• The iterations of a loop may be executed in parallel with one another if and only if no dependences are carried by the loop!

Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...
    ... = b(i, j-1)
  end do
end do

The b references give δt_(=,<): the dependence is carried by the j loop.

[Diagram: a fork spawns iterations i = 2, 3, ..., n-1, which run concurrently until a join.]

• Iterations of loop j must be executed sequentially, but the iterations of loop i may be executed in parallel.
• This is outer loop parallelism.
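In OpenMP Fortran, for example, this outer loop parallelism could be expressed as below. This is a sketch: the concrete right-hand side, sizes, and initialization are assumptions of mine; only the dependence pattern matches the slide.

program outer_parallel
  implicit none
  integer, parameter :: n = 400, m = 400
  integer :: i, j
  real :: b(n, m)
  b = 1.0
  ! The (=,<) dependence is carried by the j loop, so j stays
  ! sequential; distinct i iterations are independent and run in parallel.
  !$omp parallel do private(j)
  do i = 2, n-1
     do j = 2, m-1
        b(i, j) = b(i, j-1) + 1.0
     end do
  end do
  !$omp end parallel do
  print *, b(2, m-1)
end program outer_parallel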
Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...
    ... = b(i-1, j)
  end do
end do

The b references give δt_(<,=): the dependence is carried by the i loop.

[Diagram: for each i in turn, a fork spawns iterations j = 2, 3, ..., m-1 and a join collects them before i advances.]

• Iterations of loop i must be executed sequentially, but the iterations of loop j may be executed in parallel.
• This is inner loop parallelism.

Loop Parallelization - Example

do i = 2, n-1
  do j = 2, m-1
    b(i, j) = ...
    ... = b(i-1, j-1)
  end do
end do

The b references give δt_(<,<): again carried by the i loop.

• Iterations of loop i must be executed sequentially, but the iterations of loop j may be executed in parallel. Why? The i loop is the outermost loop with a non-"=" direction, so it carries the dependence; once i runs sequentially, nothing is carried by the j loop.
• This is inner loop parallelism.
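For inner loop parallelism the directive moves inside the sequential i loop; a sketch for the first example above, with the same caveats as the previous one:

program inner_parallel
  implicit none
  integer, parameter :: n = 400, m = 400
  integer :: i, j
  real :: b(n, m)
  b = 1.0
  do i = 2, n-1              ! sequential: carries the (<,=) dependence
     !$omp parallel do
     do j = 2, m-1           ! independent iterations: run in parallel
        b(i, j) = b(i-1, j) + 1.0
     end do
     !$omp end parallel do
  end do
  print *, b(n-1, 2)
end program inner_parallel

Note that the fork/join now happens once per i iteration, which is why outer loop parallelism, when legal, is the coarser-grained and usually cheaper choice.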

Loop Interchange

• Loop interchange changes the order of the loops to improve the spatial locality of a program:

do j = 1, n
  do i = 1, n
    ... a(i,j) ...
  end do
end do
      →
do i = 1, n
  do j = 1, n
    ... a(i,j) ...
  end do
end do

[Diagram: processor P, cache C, memory M; before interchange successive accesses stride across memory, after interchange they walk along contiguous locations and hit in the cache.]

Note: this assumes the rows of a are contiguous in memory, as in the slide's diagram; under Fortran's column-major layout the preferred order is the reverse.
Loop Interchange

• Loop interchange can improve the granularity of parallelism!

do i = 1, n
  do j = 1, n
    a(i,j) = b(i,j)
    c(i,j) = a(i-1,j)
  end do
end do
      →
do j = 1, n
  do i = 1, n
    a(i,j) = b(i,j)
    c(i,j) = a(i-1,j)
  end do
end do

The dependence on a is δt_(<,=), carried by the i loop. Before interchange only the inner j loop is parallel (fine-grained); after interchange the parallel j loop is outermost (coarse-grained).

• When is loop interchange legal?

[Diagrams: iteration spaces for the two loop orders, with the dependence direction vectors drawn before and after interchange.]
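After the interchange, the coarse-grained parallelism could be expressed as below (same caveats as the earlier OpenMP sketches; the array contents are placeholders of mine):

program coarse_grain
  implicit none
  integer, parameter :: n = 400
  integer :: i, j
  real :: a(0:n, n), b(n, n), c(n, n)
  a = 0.0
  b = 1.0
  ! In (j,i) order the direction vector is (=,<): the dependence is
  ! carried by the sequential inner i loop, and the columns j are
  ! independent, so the outer j loop runs in parallel.
  !$omp parallel do private(i)
  do j = 1, n
     do i = 1, n
        a(i, j) = b(i, j)
        c(i, j) = a(i-1, j)
     end do
  end do
  !$omp end parallel do
  print *, c(n, n)
end program coarse_grain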
• Loop interchange is legal when the "interchanged" dependences remain lexicographically positive! For example, δt_(<,=) becomes δt_(=,<) after interchange and stays lexicographically positive, so the interchange above is legal; a δt_(<,>) dependence (as in Example 4) would become δt_(>,<), which is lexicographically negative, so interchanging it would reverse the order of dependent iterations and is illegal.

Loop Blocking (Loop Tiling)

Loop blocking exploits temporal locality in a loop nest:

do t = 1, T
  do i = 1, n
    do j = 1, n
      ... a(i,j) ...
    end do
  end do
end do

The blocked (tiled) version traverses the iteration space one B × B block at a time; ic and jc are the control loops that step from block to block:

do ic = 1, n, B
  do jc = 1, n, B
    do t = 1, T
      do i = 1, B
        do j = 1, B
          ... a(ic+i-1, jc+j-1) ...
        end do
      end do
    end do
  end do
end do

B: block size

[Diagram: the n × n iteration space divided into B × B blocks; as jc and then ic advance, the blocks are visited one at a time, and all T time steps are applied to a block before moving on.]
Loop Blocking (Tiling)

• The transformation, in summary: the original (t, i, j) nest becomes the blocked (ic, jc, t, i, j) nest shown above, with the two control loops added outermost and the t loop moved inside them.
• When is loop blocking legal?
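To make the blocked nest concrete, here is a runnable variant of mine. It indexes blocks directly with i = ic..min(ic+B-1, n), which is equivalent to the slide's a(ic+i-1, jc+j-1) form; the min() clips the last partial block when B does not divide n (a detail the slides leave implicit), and the trivial update is a placeholder:

program blocked_sweep
  implicit none
  integer, parameter :: n = 10, B = 4, T = 3
  integer :: ic, jc, t, i, j
  real :: a(n, n)
  a = 0.0
  do ic = 1, n, B            ! control loops step from block to block
     do jc = 1, n, B
        do t = 1, T          ! all T sweeps reuse the same B x B block
           do i = ic, min(ic+B-1, n)
              do j = jc, min(jc+B-1, n)
                 a(i, j) = a(i, j) + 1.0
              end do
           end do
        end do
     end do
  end do
  print *, a(n, n)   ! every element visited T times: expect 3.0
end program blocked_sweep

Moving the t loop inside the block loops is what creates the reuse, and it is exactly the step whose legality the question above asks about; it is safe here only because each a(i, j) is updated independently of the others.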
