Fox Example
Thomas Anastasio
November 23, 2003
Fox’s algorithm for matrix multiplication is described in Pacheco [1]. This handout gives an example of the
algorithm applied to 2 × 2 matrices, A and B. The product is a 2 × 2 matrix C.
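$$
A = \begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}, \quad
B = \begin{pmatrix} B_{00} & B_{01} \\ B_{10} & B_{11} \end{pmatrix}, \quad
C = \begin{pmatrix} A_{00}B_{00} + A_{01}B_{10} & A_{00}B_{01} + A_{01}B_{11} \\ A_{10}B_{00} + A_{11}B_{10} & A_{10}B_{01} + A_{11}B_{11} \end{pmatrix}
$$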
Assume that we have $n^2$ processes, one for each position $(i, j)$ in the matrices; here $n = 2$, so there are four. Call the processes $P_{00}$, $P_{01}$,
$P_{10}$, and $P_{11}$, and think of them as being arranged in a grid as follows:
P00 P01
P10 P11
Allocate space on each process $P_{ij}$ for one element of A, one element of B, and one element of C.
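As a rough sketch (not taken from Pacheco), the per-process storage can be modeled in Python as a small record with one slot per matrix. The class `ProcessStorage` and the function `allocate_grid` are hypothetical names, and the assumption that $P_{ij}$ starts out holding $B_{ij}$ reflects the placement used in stage 0.

```python
from dataclasses import dataclass


@dataclass
class ProcessStorage:
    """Hypothetical local storage for one process P_ij: one element of each matrix."""
    a: float = 0.0  # overwritten by the row broadcast of A in every stage
    b: float = 0.0  # starts as B_ij (assumed), replaced when B is shifted up
    c: float = 0.0  # running sum for C_ij; every C_ij starts at 0


def allocate_grid(B):
    """Build an n x n grid of processes; P_ij is assumed to start with B_ij."""
    n = len(B)
    return [[ProcessStorage(b=B[i][j]) for j in range(n)] for i in range(n)]


grid = allocate_grid([[5.0, 6.0], [7.0, 8.0]])
print(grid[1][0])  # ProcessStorage(a=0.0, b=7.0, c=0.0)
```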
Fox’s algorithm takes n stages for matrices of order n. The algorithm starts off with every $C_{i,j} = 0$. In
stage $k$ ($k = 0, 1, \dots, n-1$), process $P_{i,j}$ computes
$$
C_{i,j} = C_{i,j} + A_{i,(i+k) \bmod n} \times B_{(i+k) \bmod n,\, j}
$$
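Why n stages suffice can be checked by summing the stage updates over k. The short derivation below is an added sketch, not taken from Pacheco:

$$
C_{i,j} = \sum_{k=0}^{n-1} A_{i,(i+k) \bmod n}\, B_{(i+k) \bmod n,\, j}
        = \sum_{m=0}^{n-1} A_{i,m}\, B_{m,j}
$$

The second equality holds because, as $k$ runs over $0, 1, \dots, n-1$, the index $m = (i+k) \bmod n$ takes each value $0, 1, \dots, n-1$ exactly once. For example, with $n = 2$ and $i = 1$, stage 0 uses $m = 1$ and stage 1 uses $m = 0$. The right-hand side is the usual definition of the matrix product.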
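In this example the matrices are of order 2, so there are two stages. In stage 0, $P_{i,j}$ computes
$C_{i,j} = C_{i,j} + A_{i,i} \times B_{i,j}$. In stage 1, $P_{i,j}$ computes $C_{i,j} = C_{i,j} + A_{i,(i+1) \bmod 2} \times B_{(i+1) \bmod 2,\, j}$, using the element one column to the “right” in
A and one row “down” in B (wrapping around to the start when the index passes $n - 1$).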
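One way to check the stage formula is to apply it sequentially, with no message passing at all, and compare the result with the ordinary triple-loop product. This is a minimal Python sketch: `fox_by_formula` and `naive_product` are illustrative names, and plain nested lists stand in for the distributed matrices, so it shows only the index pattern of each stage, not a parallel implementation.

```python
def fox_by_formula(A, B):
    """Sequentially apply Fox's stage update to square matrices A and B.

    Stage k adds A[i][(i+k) % n] * B[(i+k) % n][j] to C[i][j]; no
    communication is modeled, only the index pattern of each stage.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]  # every C_ij starts at 0
    for k in range(n):               # stages 0, 1, ..., n-1
        for i in range(n):
            m = (i + k) % n          # column of A (and row of B) used by row i in stage k
            for j in range(n):
                C[i][j] += A[i][m] * B[m][j]
    return C


def naive_product(A, B):
    """Ordinary matrix product, used only as a reference."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(fox_by_formula(A, B))  # [[19, 22], [43, 50]]
print(naive_product(A, B))   # [[19, 22], [43, 50]]
```

For the 2 × 2 case, stage 0 contributes the products $A_{i,i} B_{i,j}$ and stage 1 contributes the wrapped-around products, matching the two stages described above.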
1. Stage 0
(a) We want $A_{i,i}$ on process $P_{i,j}$, so broadcast the diagonal elements of A across the rows ($A_{ii} \to P_{ij}$).
This places $A_{0,0}$ on each $P_{0,j}$ and $A_{1,1}$ on each $P_{1,j}$. The A elements on the process grid will be
A00 A00
A11 A11
(b) We want $B_{i,j}$ on process $P_{i,j}$. This is a one-to-one placement ($B_{ij} \to P_{ij}$), not a broadcast: each process
gets only its own element of B. The A and B values on the process grid will be
P00: A = A00, B = B00      P01: A = A00, B = B01
P10: A = A11, B = B10      P11: A = A11, B = B11
[1] Peter Pacheco, Parallel Programming with MPI, Morgan Kaufmann, 1996, Section 7.2.
(c) Each process computes $C_{ij} = C_{ij} + A \times B$ using its local A and B values. Since every $C_{ij}$ started at 0, this gives
P00: A = A00, B = B00, C00 = A00 B00      P01: A = A00, B = B01, C01 = A00 B01
P10: A = A11, B = B10, C10 = A11 B10      P11: A = A11, B = B11, C11 = A11 B11
We are now ready for the second stage. In this stage, each row broadcasts the element of A in the next column (mod n)
across its processes, and the B values are shifted up one row (mod n).
2. Stage 1
(a) The next element of A is $A_{0,1}$ for the first row and $A_{1,0}$ for the second row (the column index wraps around,
mod n). Broadcast it across each row:
P00: A = A01, B = B00, C00 = A00 B00      P01: A = A01, B = B01, C01 = A00 B01
P10: A = A10, B = B10, C10 = A11 B10      P11: A = A10, B = B11, C11 = A11 B11
(b) Shift the B values up. $B_{1,0}$ moves up from process $P_{1,0}$ to process $P_{0,0}$, and $B_{0,0}$ moves up (mod n, wrapping around)
from $P_{0,0}$ to $P_{1,0}$. Similarly for $B_{1,1}$ and $B_{0,1}$:
P00: A = A01, B = B10, C00 = A00 B00      P01: A = A01, B = B11, C01 = A00 B01
P10: A = A10, B = B00, C10 = A11 B10      P11: A = A10, B = B01, C11 = A11 B11
(c) Each process computes $C_{ij} = C_{ij} + A \times B$ using its local A and B values:
P00: A = A01, B = B10, C00 = C00 + A01 B10      P01: A = A01, B = B11, C01 = C01 + A01 B11
P10: A = A10, B = B00, C10 = C10 + A10 B00      P11: A = A10, B = B01, C11 = C11 + A10 B01
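The steps of this example can be replayed symbolically to confirm which products each process accumulates. This hypothetical Python sketch (`fox_trace` is an illustrative name) simulates the process grid in a single program: nested lists stand in for the per-process storage, a list comprehension stands in for each row broadcast of A and for the upward shift of B, and matrix elements are strings so the result can be compared with the grids above. It follows this handout's ordering (broadcast A, then shift B in stages 1 and later, then multiply); shifting B at the end of every stage instead would produce the same products.

```python
def fox_trace(n):
    """Symbolic trace of Fox's algorithm with one matrix element per process.

    Returns an n x n grid whose entry [i][j] lists the products added to
    C_ij, in stage order. Names like "A01" assume n <= 10 (single digits).
    """
    # P_ij starts with B_ij; C_ij starts empty (i.e. 0).
    b = [[f"B{i}{j}" for j in range(n)] for i in range(n)]
    c = [[[] for _ in range(n)] for _ in range(n)]
    for k in range(n):
        # (a) Row i broadcasts A_{i,(i+k) mod n} to every process in the row.
        a = [[f"A{i}{(i + k) % n}" for _ in range(n)] for i in range(n)]
        # (b) From stage 1 on, shift B up one row, wrapping around (mod n):
        #     P_ij receives the B value held by P_{(i+1) mod n, j}.
        if k > 0:
            b = [list(b[(i + 1) % n]) for i in range(n)]
        # (c) Every process multiplies its local A and B and adds to C.
        for i in range(n):
            for j in range(n):
                c[i][j].append(f"{a[i][j]} {b[i][j]}")
    return c


for row in fox_trace(2):
    print(row)
# [['A00 B00', 'A01 B10'], ['A00 B01', 'A01 B11']]
# [['A11 B10', 'A10 B00'], ['A11 B11', 'A10 B01']]
```

For n = 2 the printed rows match the final grid: for example, $C_{10}$ accumulates $A_{11}B_{10}$ in stage 0 and $A_{10}B_{00}$ in stage 1, which together give the entry of C in the second row and first column.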
The algorithm is complete after n stages, and process $P_{i,j}$ then contains the final result for $C_{i,j}$.