VSS-NumericalLibraries
Libraries
Sathish Vadhiyar
Gaussian Elimination - Review
Version 1
... for each column i, zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
    ... for each row j below row i
    for j = i+1 to n
        ... add a multiple of row i to row j
        ... (A(j, i)/A(i, i) denotes the multiplier as computed before row j is modified)
        for k = i to n
            A(j, k) = A(j, k) - A(j, i)/A(i, i) * A(i, k)
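[Figure: matrix A at step i. Columns to the left of column i are already zero below the diagonal; the pivot (i,i), row j, column k, and the entries X below the pivot are marked.]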
Gaussian Elimination - Review
Version 2 – Remove A(j, i)/A(i, i) from inner loop
... for each column i, zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
    ... for each row j below row i
    for j = i+1 to n
        m = A(j, i) / A(i, i)
        for k = i to n
            A(j, k) = A(j, k) - m * A(i, k)
Gaussian Elimination - Review
Version 3 – Don’t compute what we already know
... for each column i, zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
    ... for each row j below row i
    for j = i+1 to n
        m = A(j, i) / A(i, i)
        for k = i+1 to n
            A(j, k) = A(j, k) - m * A(i, k)
Gaussian Elimination - Review
Version 4 – Store multipliers m below diagonals
... for each column i, zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
    ... for each row j below row i
    for j = i+1 to n
        A(j, i) = A(j, i) / A(i, i)
        for k = i+1 to n
            A(j, k) = A(j, k) - A(j, i) * A(i, k)
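Version 4 can be tried out directly. The following is a minimal Python sketch of the loop above on a list-of-lists matrix, using 0-based indices; it assumes every pivot A(i, i) is nonzero (no pivoting), and the helper names `lu_in_place`, `split_lu` and `matmul` are illustrative, not taken from any library.

```python
def lu_in_place(A):
    """Version 4: overwrite A with the multipliers below the diagonal and U on and above it."""
    n = len(A)
    for i in range(n - 1):                # for each column i (0-based)
        for j in range(i + 1, n):         # for each row j below row i
            A[j][i] = A[j][i] / A[i][i]   # store the multiplier where the zero would go
            for k in range(i + 1, n):     # update the rest of row j
                A[j][k] -= A[j][i] * A[i][k]
    return A


def split_lu(A):
    """Unpack the overwritten matrix into unit lower-triangular L and upper-triangular U."""
    n = len(A)
    L = [[1.0 if r == c else (A[r][c] if r > c else 0.0) for c in range(n)]
         for r in range(n)]
    U = [[A[r][c] if r <= c else 0.0 for c in range(n)] for r in range(n)]
    return L, U


def matmul(X, Y):
    """Plain triple-loop product of two square matrices."""
    n = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]


original = [[4.0, 3.0, 2.0], [8.0, 7.0, 9.0], [4.0, 6.0, 5.0]]
factored = lu_in_place([row[:] for row in original])  # work on a copy
L, U = split_lu(factored)
product = matmul(L, U)                                 # should reproduce `original`
```

The stored multipliers form the strictly lower triangle of L, so multiplying L by U gives back the input matrix for this example.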
GE - Runtime
Divisions
1 + 2 + 3 + ... + (n-1) = n^2/2 (approx.)
Multiplications / subtractions
1^2 + 2^2 + 3^2 + 4^2 + 5^2 + ... + (n-1)^2 = n^3/3 - n^2/2 (approx.)
Total
2n^3/3 (approx., counting each multiplication and each subtraction)
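These counts can be checked by instrumenting the Version 4 loop nest. The sketch below runs the loops without doing any arithmetic and only counts; `count_ge_operations` is a hypothetical helper name.

```python
def count_ge_operations(n):
    """Count divisions and multiply-subtract pairs performed by the Version 4 loops."""
    divisions = 0
    mult_subs = 0                          # one multiplication plus one subtraction each
    for i in range(1, n):                  # i = 1 .. n-1
        for j in range(i + 1, n + 1):      # j = i+1 .. n
            divisions += 1
            for k in range(i + 1, n + 1):  # k = i+1 .. n
                mult_subs += 1
    return divisions, mult_subs


divisions, mult_subs = count_ge_operations(100)
total_flops = divisions + 2 * mult_subs    # compare with 2 * 100**3 / 3
```

For n = 100 the total is within about one percent of 2n^3/3; the gap comes from the lower-order terms.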
Parallel GE
1st step – 1-D block partitioning: the n columns are divided into
contiguous blocks among the p processors
[Figure: matrix at step i, with zeros below the diagonal in the finished columns and the pivot (i,i), row j, and column k marked.]
1D block partitioning - Steps
1. Divisions
n^2/2
2. Broadcast
x log(p) + y log(p-1) + z log(p-2) + ... + log 1 < n^2 log p
3. Multiplications and Subtractions
(n-1)n/p + (n-2)n/p + ... + 1·n/p = n^3/(2p) (approx.) < n^3/p
Runtime:
< n^2/2 + n^2 log p + n^3/p
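To see how the multiply-subtract work is spread over processors under 1-D column block partitioning, one can count it per owner. This is a small single-machine sketch, not a parallel program: it assumes blocks of ceil(n/p) consecutive columns, 0-based indices, and the hypothetical helper names `column_owner` and `work_per_processor`.

```python
def column_owner(col, n, p):
    """Processor that owns 0-based column `col` when n columns form p contiguous blocks."""
    block = -(-n // p)                 # ceil(n / p) columns per block (assumption)
    return col // block


def work_per_processor(n, p):
    """Multiply-subtract pairs each processor performs over the whole elimination."""
    work = [0] * p
    for i in range(n - 1):             # elimination step i
        rows_below = n - 1 - i         # rows j > i that get updated
        for k in range(i + 1, n):      # columns still being updated
            work[column_owner(k, n, p)] += rows_below
    return work


work = work_per_processor(8, 4)        # per-processor counts for n = 8, p = 4
```

In this example the processor holding the last column block does the most work and the one holding the first block does the least, while the largest count stays below n^3/p.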
2-D block
To speed up the divisions
[Figure: the matrix on a 2-D processor grid whose dimensions are labeled P and Q; the pivot (i,i), row j, and column k are marked.]
2D block partitioning - Steps
1. Broadcast of (k,k)
log Q
2. Divisions
n^2/Q (approx.)
3. Broadcast of multipliers
x log(P) + y log(P-1) + z log(P-2) + ... = (n^2/Q) log P
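The reason 2-D partitioning speeds up the divisions can be illustrated by listing which processors hold the entries of a column below the diagonal. The sketch below assumes a grid with Q processor rows and P processor columns (an interpretation inferred from the log Q and n^2/Q costs above), block sizes of ceil(n/Q) by ceil(n/P), and 0-based indices; `owner_2d` and `processors_dividing_column` are hypothetical names.

```python
def owner_2d(row, col, n, P, Q):
    """(grid_row, grid_col) of the processor holding entry (row, col)."""
    row_block = -(-n // Q)             # ceil(n / Q) matrix rows per processor row
    col_block = -(-n // P)             # ceil(n / P) matrix columns per processor column
    return row // row_block, col // col_block


def processors_dividing_column(i, n, P, Q):
    """Processors holding entries of column i below the diagonal (the ones that divide)."""
    return sorted({owner_2d(j, i, n, P, Q) for j in range(i + 1, n)})


two_d = processors_dividing_column(0, 8, P=2, Q=4)   # column shared by 4 processors
one_d = processors_dividing_column(0, 8, P=4, Q=1)   # 1-D column blocks: a single owner
```

With Q = 1 the layout reduces to 1-D column blocks and one processor performs all divisions for the column; with Q = 4 they are shared by four processors.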
[Figure: positions labeled cyclically 0, 1, 2, 3, 0, 1, 2, 3, ...]
• Row pivoting
• Involves distributed search and exchange – O(n/P)+O(logP)
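The O(n/P) + O(log P) cost of the pivot search can be pictured as a local scan followed by a pairwise reduction. The following sketch simulates this on one machine with no real message passing; it assumes the column is split into P contiguous chunks and does not model the row exchange. The name `distributed_pivot_search` is hypothetical.

```python
def distributed_pivot_search(column, P):
    """Return ((row_index, value) of the largest-magnitude entry, reduction rounds used)."""
    n = len(column)
    chunk = -(-n // P)                          # ceil(n / P) entries per simulated processor
    # Local phase: each simulated processor scans its own chunk, O(n/P) comparisons.
    candidates = []
    for proc in range(P):
        rows = range(proc * chunk, min((proc + 1) * chunk, n))
        if len(rows) > 0:
            best = max(rows, key=lambda r: abs(column[r]))
            candidates.append((best, column[best]))
    # Reduction phase: combine candidates pairwise, halving their number each round.
    rounds = 0
    while len(candidates) > 1:
        candidates = [max(candidates[a:a + 2], key=lambda c: abs(c[1]))
                      for a in range(0, len(candidates), 2)]
        rounds += 1
    return candidates[0], rounds


pivot, rounds = distributed_pivot_search(
    [1.0, -7.0, 3.0, 2.0, 5.0, -9.0, 4.0, 0.5], P=4)
```

With P = 4 the reduction finishes in two rounds, matching log2(P).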
13 14 15 16
 9 10 11 12
 5  6  7  8
 1  2  3  4
Red-Black Ordering
Color alternate nodes in each dimension red and black
Number red nodes first and then black nodes
Red nodes can be updated simultaneously, followed by simultaneous black node updates
2D Grid example – Red-Black Ordering
15  7 16  8
 5 13  6 14
11  3 12  4
 1  9  2 10
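The numbering in this example can be generated programmatically. This is a minimal sketch, assuming the bottom-left node is red and that nodes of each color are numbered in natural (row by row, bottom to top) order; `red_black_numbering` is a hypothetical name.

```python
def red_black_numbering(rows, cols):
    """Return grid[r][c], with r = 0 as the bottom row, numbering red nodes before black."""
    # Natural order: bottom row first, left to right within each row.
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    red = [(r, c) for (r, c) in cells if (r + c) % 2 == 0]    # alternate in both dimensions
    black = [(r, c) for (r, c) in cells if (r + c) % 2 == 1]
    grid = [[0] * cols for _ in range(rows)]
    for number, (r, c) in enumerate(red + black, start=1):
        grid[r][c] = number
    return grid


grid = red_black_numbering(4, 4)
top_down = grid[::-1]          # reverse so the first entry is the top row, as drawn above
```

In the 4 x 4 case the red nodes receive numbers 1 to 8 and the black nodes 9 to 16, which reproduces the grid shown above.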