Read Various Algorithms Listed
Algorithms
Gal Beniamini, The Hebrew University of Jerusalem ([email protected])
Nathan Cheng, University of California at Berkeley ([email protected])
Olga Holtz, University of California at Berkeley ([email protected])
1 See Section 2.1 for the connection between the number of additions and the leading coefficient.
2 See Section 2.1 for definition.
Table 1: Examples of improved leading coefficients
Beniamini and Schwartz [1] extended the lower bound to the generalized setting, in which the input and output can be transformed to a basis of larger dimension. They also found that the leading coefficient of any such algorithm with a 2 × 2 base case using 7 multiplications is at least 5.

Obtaining alternative basis algorithms. Recursive-bilinear algorithms can be described by a triplet of matrices, dubbed the encoding and decoding matrices (see Section 2.1). The alternative basis technique [22, 23] utilizes a decomposition of each of these matrices into a pair of matrices – a basis transformation, and a sparse encoding or decoding matrix. Once a decomposition is found, applying the algorithm is straightforward (see Section 2.1). The leading coefficient of the arithmetic complexity is determined by the number of non-zero (and non-singleton) entries in each of the encoding/decoding matrices, while the basis transformations only affect the low-order terms of the arithmetic complexity (see Section 2.1). Thus, reducing the leading coefficient of fast matrix multiplication algorithms translates to the matrix sparsification (MS) problem.

Matrix sparsification. Unfortunately, matrix sparsification is NP-Hard to solve [28] and NP-Hard to approximate to within a factor of 2^{log^{0.5−o(1)} n} [15] (over Q, assuming NP does not admit quasi-polynomial time deterministic algorithms). Despite the problem being NP-Hard, search heuristics can be leveraged to obtain bases which significantly sparsify the encoding/decoding matrices of fast matrix multiplication algorithms with small base cases.

There are a few heuristics that can solve the problem under severe assumptions, such as the full rank of any square submatrix, or requiring that the rank of each submatrix equal the size of the largest matching in the induced bipartite graph (cf. [8, 17, 28, 29]). These assumptions rarely hold in practice and, specifically, do not apply to any matrix multiplication algorithm we know of. Gottlieb and Neylon's algorithm [15] sparsifies an n × m matrix with no assumptions about the input. It does so by using calls to an oracle for the Sparsest Independent Vector problem.

and obtain novel alternative-basis algorithms, often resulting in arithmetic complexity with leading coefficients superior to those known previously (see Table 1, Table 2, and Appendix A).

The first two methods were obtained by the introduction of new solutions to the Sparsest Independent Vector problem, which were then used as oracles for Gottlieb and Neylon's algorithm. As matrix sparsification is known to be NP-Hard, it is no surprise that these methods exhibit exponential worst-case complexity. Nevertheless, they perform well in practice on the encoding/decoding matrices of fast matrix multiplication algorithms.

Our third method for matrix sparsification simultaneously minimizes the number of non-singleton values in the matrix. This method does not guarantee an optimal solution for matrix sparsification. Nonetheless, it obtains solutions with the same (and, in some cases, better) leading coefficients than the former two methods when applied to many of the fast matrix multiplication algorithms in our corpus, and runs significantly faster than the first two when implemented using Z3 [14]. For completeness, we also present the sparsification heuristic used in [22, 23].

1.3 Paper Organization.
In Section 2, we recall preliminaries regarding fast matrix multiplication and recursive-bilinear algorithms, followed by a summary of the Alternative Basis technique [22, 23]. We then present Matrix Sparsification (MS, Problem 2.13), alongside Gottlieb and Neylon's [15] algorithm for solving MS by relying on an oracle for Sparsest Independent Vector (SIV, Problem 2.15). In Section 3 we present our two algorithms (Algorithms 3 and 4) for implementing SIV. In Section 4, we introduce Algorithm 5, the sparsification heuristic of [22, 23], and a new efficient heuristic for sparsifying matrices while simultaneously minimizing non-singleton values (Algorithm 6). In Section 5 we present the resulting fast matrix multiplication algorithms. Section 6 contains a discussion and plans for future work.
where ⊙ is the element-wise (Hadamard) product.

Definition 2.2. [22, 23] (Encoding/Decoding matrices). We refer to the matrix triplet ⟨U, V, W⟩ of a recursive-bilinear algorithm (see Fact 2.1) as its encoding/decoding matrices (U, V are the encoding matrices and W is the decoding matrix).

Notation 2.3. [1] Denote the number of nonzero entries in a matrix by nnz(A), and the number of non-singleton (i.e., not ±1) entries in a matrix by nns(A). Let the number of rows/columns be nrows(A) and ncols(A), respectively.

Remark 2.4. [1] The number of linear operations used by a bilinear algorithm is determined by its encoding/decoding matrices. The number of arithmetic operations performed by each of the encodings is:
OpsU = nnz(U) + nns(U) − nrows(U)
OpsV = nnz(V) + nns(V) − nrows(V)
The number of operations performed by the decoding is:
OpsW = nnz(W) + nns(W) − ncols(W)

Remark 2.5. We assume that none of the rows of the U, V, and W matrices is zero. This is because any zero row in U, V is equivalent to an identically 0 multiplicand, and any zero row in W is equivalent to a multiplication that is never used in the output. Hence, such rows can be omitted, resulting in asymptotically faster algorithms.

Corollary 2.6. [1] Let ALG be an ⟨n_0, m_0, k_0; t_0⟩-algorithm that performs OpsU, OpsV, OpsW linear operations at the base case, and let n = n_0^l, m = m_0^l, k = k_0^l (l ∈ N). The arithmetic complexity of ALG is:

F(n, m, k) = (1 + OpsU/(t_0 − n_0 m_0) + OpsV/(t_0 − m_0 k_0) + OpsW/(t_0 − n_0 k_0)) · t_0^l
             − (OpsU · nm/(t_0 − n_0 m_0) + OpsV · mk/(t_0 − m_0 k_0) + OpsW · nk/(t_0 − n_0 k_0))
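To make the bookkeeping of Remark 2.4 and Corollary 2.6 concrete, the following sketch (assuming NumPy, with the encoding/decoding matrices supplied by the caller) counts non-zero and non-singleton entries and evaluates the coefficient of the t_0^l term; it is an illustration, not the paper's code.

```python
import numpy as np

def nnz(A):
    return int(np.count_nonzero(A))

def nns(A):
    # non-singleton entries: non-zero entries whose value is not +1 or -1
    return int(np.count_nonzero((A != 0) & (np.abs(A) != 1)))

def leading_coefficient(U, V, W, n0, m0, k0, t0):
    # Remark 2.4: linear operations performed by the two encodings and the decoding
    ops_u = nnz(U) + nns(U) - U.shape[0]   # nrows(U)
    ops_v = nnz(V) + nns(V) - V.shape[0]   # nrows(V)
    ops_w = nnz(W) + nns(W) - W.shape[1]   # ncols(W)
    # Corollary 2.6: the coefficient multiplying t0^l in F(n, m, k)
    return 1 + ops_u / (t0 - n0 * m0) + ops_v / (t0 - m0 * k0) + ops_w / (t0 - n0 * k0)
```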
Definition 2.7. Let P_{I×J} denote the permutation matrix that exchanges row-order for column-order of the vectorization of an I × J matrix.

Given a recursive-bilinear ⟨n, m, k; t⟩_{ϕ,ψ,υ}-algorithm ALG, an alternative basis matrix multiplication operates as follows:

Algorithm 1 Alternative Basis Matrix Multiplication Algorithm
Input: A ∈ R^{n×m}, B ∈ R^{m×k}
Output: n × k matrix C = A · B
1: function Mult(A, B)
2:   Ã = ϕ(A)              ▷ R^{n×m} basis transformation
3:   B̃ = ψ(B)              ▷ R^{m×k} basis transformation
4:   C̃ = ALG(Ã, B̃)         ▷ ⟨n, m, k; t⟩_{ϕ,ψ,υ}-algorithm
5:   C = υ^{−1}(C̃)          ▷ R^{n×k} basis transformation
6:   return C
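The control flow of Algorithm 1 can be summarized by the short sketch below, where phi, psi, inv_upsilon and alg are caller-supplied callables (hypothetical names; the paper does not prescribe an interface):

```python
def alternative_basis_mult(A, B, phi, psi, inv_upsilon, alg):
    """Sketch of Algorithm 1: multiply in the alternative bases, then map back.

    phi and psi map A and B into the alternative bases, alg is a recursive-bilinear
    <n, m, k; t>_{phi,psi,upsilon}-algorithm applied in those bases, and inv_upsilon
    maps the product back to the standard basis.
    """
    A_tilde = phi(A)                  # R^{n x m} basis transformation
    B_tilde = psi(B)                  # R^{m x k} basis transformation
    C_tilde = alg(A_tilde, B_tilde)   # multiplication in the alternative bases
    return inv_upsilon(C_tilde)       # R^{n x k} basis transformation
```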
Lemma 2.11. [22, 23] Let R be a ring, and let ϕ, ψ, υ be automorphisms of R^{n·m}, R^{m·k}, R^{n·k} (respectively). Then ⟨U, V, W⟩ are encoding/decoding matrices of an ⟨n, m, k; t⟩_{ϕ,ψ,υ}-algorithm if and only if ⟨Uϕ, Vψ, Wυ^{−T}⟩ are encoding/decoding matrices of an ⟨n, m, k; t⟩-algorithm.

Alternative basis multiplication is fast since the basis transformations are fast and incur an asymptotically negligible overhead:

Claim 2.12. [22, 23] Let R be a ring, let ψ : R^{n_0×m_0} → R^{n_0×m_0} be a linear map, and let A ∈ R^{n×m} where n = n_0^k, m = m_0^k. The complexity of ψ(A) is
F(n, m) = (q/(n_0 m_0)) · nm · log_{n_0 m_0}(nm)
where q is the number of linear operations performed.

2.3 Matrix Sparsification.
Finding a basis that minimizes the number of additions and subtractions performed by a fast matrix multiplication algorithm is equivalent, by Remark 2.4, to the Matrix Sparsification problem:
Problem 2.13. Matrix Sparsification Problem (MS): Let U be an n × m matrix. The objective is to find an invertible matrix A such that
A = argmin_{A ∈ GL_n} (nnz(AU))

Remark 2.14. It is traditional to think of the matrices U, V, and W as "tall and skinny", i.e., with n ≥ m. However, in the area of matrix sparsification, it is traditional to deal with matrices satisfying n ≤ m and transformations applied from the left. Since nnz(AU) = nnz(U^T A^T), we can simply apply MS to U^T and use A^T as our basis transformation. From now on, we will therefore switch to the convention n ≤ m used in matrix sparsification.

To solve MS, we make use of Gottlieb and Neylon's algorithm [15], which solves the matrix sparsification problem for n × m matrices by repeatedly invoking an oracle for the Sparsest Independent Vector problem (Problem 2.15).

Problem 2.15. Sparsest Independent Vector Problem (SIV): Let U ∈ R^{n×m} (n ≤ m) and let Ω = {ω_1, ..., ω_k} ⊂ [n]. Find a vector v ∈ R^m s.t. v is in the row space of U, v is not in the span of U_{ω_1}, ..., U_{ω_k}, and v has a minimal number of nonzero entries.

Suppose we are given a subroutine SIV(U, Ω) which returns a pair (v, i), where v is the sparse vector required by SIV and i ∈ [n] \ Ω is an integer such that the i'th row of U can be replaced by v without changing the span of U. Then Algorithm 2 returns an exact solution for MS [15].

Algorithm 2 MS via SIV [15]
1: procedure MS(U)
2:   Ω ← ∅
3:   for j = 1, ..., n
4:     (v_j, i) ← SIV(U, Ω)
5:     Replace the i'th row of U with v_j
6:     Ω ← Ω ∪ {i}
7:   return U
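Algorithm 2 itself is a short loop; a sketch in NumPy, assuming an siv(U, omega) oracle with the interface described above (the oracle is provided separately, e.g., by Algorithm 3 or Algorithm 4 below):

```python
import numpy as np

def ms_via_siv(U, siv):
    """Sketch of Algorithm 2 (Gottlieb-Neylon MS via an SIV oracle)."""
    U = np.array(U, dtype=float)
    omega = set()
    for _ in range(U.shape[0]):
        v, i = siv(U, omega)   # v: optimally sparse Omega-independent vector, i in [n] \ Omega
        U[i, :] = v            # replacing row i preserves the row span of U
        omega.add(i)
    return U
```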
3 OPTIMAL SPARSIFICATION METHODS
In this section, we reframe SIV as a problem of finding a maximal subset of columns of the input matrix U according to constraints given by Ω (see Definition 3.2). We refer to such sets as Ω-valid sets and show that Ω-valid sets are tied to sparse independent vectors (Section 3.1), and that any algorithm which finds an Ω-valid set of maximal cardinality can be used as an oracle in Algorithm 2. Finally, we show how to find maximal Ω-valid sets (Section 3.2), and obtain two algorithms that solve SIV.

Recall that we use the convention that U ∈ F^{n×m} where n ≤ m (see Remark 2.14). Throughout this section, we also assume that U is of full rank n and that Ω ⊊ [n].

Notation 3.1. For a set S and an integer k, let C_k(S) denote the set of all subsets of S with k elements.

Definition 3.2. S ⊂ [m] is Ω-valid if there exists i ∉ Ω such that U_{i,S} is in the span of the rows U_{[n]\{i},S}. Formally, a set S ⊂ [m] is Ω-valid if there exists λ ∈ F^n with supp(λ) ⊄ Ω s.t. λ^T U_{:,S} = 0 (where supp(λ) = {i : λ_i ≠ 0}).

Notation 3.3. Given an Ω-valid set S, we will refer to a vector λ ∈ F^n with supp(λ) ⊄ Ω s.t. λ^T U_{:,S} = 0 as an Ω-validator of S.

Next, we provide a definition for vectors which are candidates for a solution of SIV:

Definition 3.4. A vector v in the row space of U is called Ω-independent if v is not in the row space of U_{Ω,:}.

Note that any solution to SIV (Problem 2.15) is, by definition, an optimally sparse Ω-independent vector.

Remark 3.5. Note that given a set S ⊂ [m], it is possible to verify whether S is Ω-valid and find an appropriate Ω-validator for it in cubic time (e.g., via Gaussian elimination).
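One way to realize the check of Remark 3.5 is sketched below with NumPy (floating point with a tolerance standing in for exact arithmetic): compute a basis of the left null space of U_{:,S} and look for a basis vector with a non-zero entry outside Ω. Since the support of any combination of basis vectors is contained in the union of their supports, inspecting the basis vectors suffices. The helper name omega_validator is ours, not the paper's.

```python
import numpy as np

def omega_validator(U, S, omega, tol=1e-9):
    """Return an Omega-validator of S (a vector lam with supp(lam) not contained in Omega
    and lam^T U[:, S] = 0), or None if S is not Omega-valid."""
    n = U.shape[0]
    cols = U[:, sorted(S)]                       # n x |S| submatrix
    _, s, vh = np.linalg.svd(cols.T)             # left null space of cols = null space of cols.T
    r = int(np.sum(s > tol))                     # numerical rank of U[:, S]
    for lam in vh[r:, :]:                        # each such row satisfies lam @ cols ~ 0
        if any(abs(lam[i]) > tol for i in range(n) if i not in omega):
            return lam
    return None
```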
3.1 Sparse Independent Vectors and maximal Ω-valid sets.
The crux of our algorithms lies in the idea of finding an Ω-valid set of maximal cardinality and using it to compute a solution for SIV, which can then be used by Algorithm 2. The connection between Ω-valid sets and Ω-independent vectors is given by the following lemmas:

Lemma 3.6. Let v ∈ F^m be an Ω-independent vector. Then the set S = {j : v_j = 0} is an Ω-valid set of size zeros(v).

Proof. By Definition 3.4, there exists a vector λ ∈ F^n s.t. v = Σ_{i=1}^n λ_i U_{i,:} (i.e., v = λ^T U) and λ_{i_0} ≠ 0 for some i_0 ∉ Ω (hence supp(λ) ⊄ Ω). Thus, λ is an Ω-validator of S, and therefore, S is Ω-valid. □

Lemma 3.7. Let S ⊂ [m] be an Ω-valid set and let λ ∈ F^n be an Ω-validator of S. Then v = λ^T U is an Ω-independent vector with at least |S| zero entries.

Proof. Since S is valid, there exists λ ∈ F^n s.t. λ^T U_{:,S} = 0 and supp(λ) ⊄ Ω. Denote v = λ^T U. By definition, v has at least |S| zero entries, since ∀i ∈ S, v_i = λ^T U_{:,i} = 0. Next we show that v is Ω-independent. Note that v = λ^T U = Σ_{i=1}^n λ_i U_{i,:} is in the row space of U, since it is a linear combination of the rows of U. Furthermore, since supp(λ) ⊄ Ω, there exists i_0 ∉ Ω s.t. λ_{i_0} ≠ 0. Therefore, v is not in the row span of U_{Ω,:}, since we assume (Remark 2.14) that all rows of U are linearly independent. Hence, v = λ^T U is an Ω-independent vector with at least |S| zero entries. □

Corollary 3.8. Let M ⊂ [m] be a maximal Ω-valid set (i.e., M is not a subset of any other Ω-valid set), and let v ∈ F^m be an Ω-independent vector s.t. ∀i ∈ M, v_i = 0. Then ∀j ∉ M, v_j ≠ 0.

Proof. Denote the set of indices of zero entries of v by M′ = {j : v_j = 0}. Since v is Ω-independent, Lemma 3.6 yields that M′ is valid. Hence, by maximality of M, M = M′ and |M| = zeros(v). Therefore, ∀i ∈ [m], v_i = 0 if, and only if, i ∈ M. □

Corollary 3.9. Let M ⊂ [m] be a maximal Ω-valid set and let λ ∈ F^n be an Ω-validator of M. Then v = λ^T U is an Ω-independent vector with exactly |M| zero entries.

Proof. Follows directly from Lemma 3.7 and Corollary 3.8. □
The final two claims will show how Ω-validity can serve as an oracle for Algorithm 2. Recall that Algorithm 2 uses an oracle which returns a pair (v, i), where v is an optimally sparse Ω-independent vector, and replacing the i'th row of U with v does not change the row span of U. The next claim shows that a maximally sparse Ω-independent vector is equivalent to an Ω-valid set of maximal cardinality.

Claim 3.10. An Ω-independent vector v ∈ F^m is optimally sparse if, and only if, M = {i : v_i = 0} is an Ω-valid set of maximal cardinality.

Proof. First, assume that v ∈ F^m is a maximally sparse Ω-independent vector (i.e., for any Ω-independent vector u, zeros(u) ≤ zeros(v)). From Lemma 3.6, we know that M is Ω-valid. Lemma 3.7 shows that if there exists an Ω-valid set S s.t. |M| < |S|, then there also exists an Ω-independent vector u ∈ F^m s.t. zeros(u) ≥ |S| > zeros(v). This contradicts v being a maximally sparse Ω-independent vector.

Now, assume that M is an Ω-valid set of maximal cardinality (i.e., for any Ω-valid set S, |S| ≤ |M|) and let λ_M be an Ω-validator of M. By Corollary 3.9, v_M = λ_M^T U is an Ω-independent vector with exactly |M| zero entries. Assume by contradiction that there exists an Ω-independent vector u ∈ F^m with z > |M| zero entries; then by Lemma 3.6, there is an Ω-valid set S s.t. |M| < |S|, in contradiction to M being an Ω-valid set of maximal cardinality. Therefore, v_M = λ_M^T U is a maximally sparse Ω-independent vector. □

The following claim shows that given an Ω-valid set S and its corresponding Ω-independent vector v (as in Lemma 3.7), the support of the Ω-validator of S can be used to find an index i s.t. the i'th row of U can be replaced with v without changing the row span of U.

Claim 3.11. Let S be an Ω-valid set, let λ be an Ω-validator of S, and let v = λ^T U. Then for any i ∈ supp(λ) \ Ω, replacing row i of U with v does not change the row span of U. That is:
span(rows(U)) = span(rows(U_{[n]\{i},:}) ∪ {v})

Proof. Fix i_0 ∈ supp(λ) \ Ω. Since v is a linear combination of rows of U and λ_{i_0} ≠ 0, it suffices to show that u ∈ span(rows(U_{[n]\{i_0},:}) ∪ {v}) for any u ∈ span(rows(U)). Now, let α ∈ F^n be the vector with α_j = −λ_j (for j ≠ i_0) and α_{i_0} = 0. Then w = α^T U ∈ span(rows(U_{[n]\{i_0},:})), therefore w + v = λ_{i_0} U_{i_0,:} ∈ span(rows(U_{[n]\{i_0},:}) ∪ {v}). Hence, span(rows(U)) = span(rows(U_{[n]\{i_0},:}) ∪ {v}). □

Therefore, any algorithm which finds an Ω-valid set of maximal cardinality is an oracle for Algorithm 2.

3.2 Computing maximal Ω-valid sets.
Given a maximal Ω-valid set, we now have the tools to compute optimally sparse Ω-independent vectors. As the next stage, we show how to compute a maximal Ω-valid set M using a small subset of columns S ⊂ M. The key intuition here is that if λ ∈ F^n is an Ω-validator of S, then λ is orthogonal to all columns indexed by S (since λ^T U_{:,S} = 0), and to any linear combination of columns of S. This leads to the following extension of sets:

Definition 3.12. Let S ⊂ [m]. We define the extension of S, E(S), to be the largest set E ⊂ [m] s.t. span(col(U_{:,S})) = span(col(U_{:,E})).

Lemma 3.13. Let S ⊂ [m]. Then S is Ω-valid if, and only if, E(S) is Ω-valid.

Proof. Assume E(S) is Ω-valid. By definition of Ω-validity, there exists a vector λ ∈ F^n s.t. supp(λ) ⊄ Ω and λ^T U_{:,E(S)} = 0. Since S ⊂ E(S), λ^T U_{:,S} = 0; therefore, S is valid.
Conversely, let S ⊂ [m] be an Ω-valid set, and let λ ∈ F^n with supp(λ) ⊄ Ω s.t. λ^T U_{:,S} = 0. Since col(U_{:,E(S)}) = col(U_{:,S}), all columns indexed by E(S) are linear combinations of the columns indexed by S. Since λ is orthogonal to all columns of U indexed by S, it is also orthogonal to all their linear combinations. Therefore, λ^T U_{:,E(S)} = 0. Hence E(S) is valid. □

Next we show that the search for a maximal Ω-valid set can be reduced to a search over maximal extensions of sets of size n − 1.

Remark 3.14. Note that rank(U_{:,S}) ≤ n − 1 for any Ω-valid set S. This is due to the fact that if rank(U_{:,S}) = n then λ^T U_{:,S} = 0 implies that λ = 0, since the rows of U are linearly independent.

Lemma 3.15. Let S be an Ω-valid set and let λ ∈ F^n be an Ω-validator of S. Then
E(S) ⊂ {i : (λ^T U)_i = 0}

Proof. Let D = {i : (λ^T U)_i = 0}. By Definition 3.12, columns indexed by E(S) are linear combinations of the columns indexed by S, and λ is orthogonal to all columns of U_{:,S} (and their linear combinations). Hence, λ^T U_{:,E(S)} = 0 and E(S) ⊂ D. □

Lemma 3.16. Let S be an Ω-valid set s.t. rank(U_{:,S}) = n − 1, and let D be an Ω-valid set s.t. S ⊂ D. Then D ⊂ E(S).

Proof. Since S ⊂ D, n − 1 = rank(U_{:,S}) ≤ rank(U_{:,D}). However, from Remark 3.14, we know that rank(U_{:,D}) ≤ n − 1; therefore, span(col(U_{:,S})) = span(col(U_{:,D})). Hence, by definition, D ⊂ E(S). □

Corollary 3.17. Let S be an Ω-valid set s.t. rank(U_{:,S}) = n − 1, and let λ ∈ F^n be an Ω-validator of S. Then
E(S) = {i : (λ^T U)_i = 0}

Proof. This is a direct result of Lemma 3.15 and Lemma 3.16. □

Note that Corollary 3.17 gives us the tools to quickly compute the extension of any Ω-valid set S such that rank(U_{:,S}) = n − 1.
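With a validator in hand, Corollary 3.17 makes the extension immediate; for instance (a sketch reusing the hypothetical omega_validator from above):

```python
def extension(U, lam, tol=1e-9):
    """E(S) for an Omega-valid S with rank(U[:, S]) = n - 1, given an Omega-validator lam
    of S (Corollary 3.17): the set of column indices where lam^T U vanishes."""
    v = lam @ U
    return {j for j in range(U.shape[1]) if abs(v[j]) <= tol}
```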
Next we prove that any maximal Ω-valid set is an extension of an Ω-valid set of n − 1 linearly independent columns of U:

Claim 3.18. Let S ⊂ [m] be a maximal Ω-valid set; then
rank(U_{:,S}) = n − 1
Proof. Let S ⊂ [m] be a maximal Ω-valid set, and let i_0 ∉ Ω be such that U_{i_0,S} ∈ span(rows(U_{[n]\i_0,S})) (such an i_0 exists by the definition of an Ω-valid set). Suppose, by contradiction, that rank(U_{:,S}) = n − r for some r > 1.

Note that since U_{i_0,S} is in the row span of U_{[n]\i_0,S}, rank(U_{:,S}) = rank(U_{[n]\i_0,S}) = n − r. Therefore, there exists S′ ⊂ S s.t. |S′| = n − r and rank(U_{:,S′}) = n − r.

Let Q ⊂ [m] \ S be such that |Q| = r − 1, rank(U_{[n]\i_0,Q}) = r − 1, and each column indexed by Q is not in the column span of U_{[n]\i_0,S}. Such a Q exists because the matrix U_{[n]\{i_0},:} has full rank n − 1 (since U is of full row rank n).

Since the matrix U_{[n]\{i_0},S′∪Q} is a square (n − 1) × (n − 1) matrix of full rank, U_{i_0,S′∪Q} is in the span of the rows of U_{[n]\{i_0},S′∪Q}. Therefore, S′ ∪ Q is an Ω-valid set.

By Lemma 3.13, the extension of S′ ∪ Q is also valid. Furthermore, S ∪ Q ⊂ E(S′ ∪ Q) because we have chosen S′ s.t. it spans the same column space as S. However, by construction of Q, we know that S ∩ Q = ∅, meaning that |E(S′ ∪ Q)| ≥ |S ∪ Q| > |S|. This is in contradiction to the maximality of S. □

Corollary 3.19. Let S ⊂ [m] be a maximal Ω-valid set and let C ⊂ S s.t. rank(U_{:,C}) = n − 1. Then S = E(C).

Proof. C ⊂ S; therefore, span(col(U_{:,C})) ⊂ span(col(U_{:,S})). Because S is maximal, Claim 3.18 shows that rank(U_{:,S}) = n − 1. We have, by rank equality, that span(col(U_{:,C})) = span(col(U_{:,S})). By definition, E(C) is the maximal set E s.t. span(col(U_{:,C})) ⊂ span(col(U_{:,E})); therefore, S ⊂ E(C). However, by maximality of S, we have S = E(C). □

Corollary 3.20. Let S ⊂ [m] be a maximal Ω-valid set; then there exists C ∈ C_{n−1}([m]) s.t. S = E(C).

Proof. This is a direct result of Corollary 3.19. □

3.3 First algorithm for SIV.
Our first algorithm performs an exhaustive search over all maximal Ω-valid sets in order to find one with maximal cardinality. This is a result of the observation given by Claim 3.10, which states that any solution to SIV is tied to an Ω-valid set of maximal cardinality (and vice versa). The search is done by combining Corollary 3.20, which states that any maximal Ω-valid set is the extension of an Ω-valid set of n − 1 independent columns, and Corollary 3.17, which provides a method to compute said extension.

Lemma 3.21. Algorithm 3 iterates over all maximal Ω-valid sets.

Proof. By Corollary 3.20, for any maximal Ω-valid set E, there exists an Ω-valid set C ∈ C_{n−1}([m]) s.t. rank(U_{:,C}) = n − 1 and E is the extension of C. Therefore, the algorithm iterates over all Ω-valid sets C ∈ C_{n−1}([m]) s.t. rank(U_{:,C}) = n − 1. Furthermore, by Corollary 3.17, if rank(U_{:,C}) = n − 1 and λ is an Ω-validator of C, then E(C) = {i : (λ^T U)_i = 0}. The algorithm performs this computation in lines 8–10. Hence, the algorithm iterates over all maximal Ω-valid sets. □

Algorithm 3 Sparsest Independent Vector (1)
1: procedure SIV(U, Ω)
2:   sparsity ← 0
3:   sparsest ← null
4:   i ← null
5:   for C ∈ C_{n−1}({1, ..., m})
6:     if rank(U_{:,C}) < n − 1 or C is not Ω-valid
7:       continue
8:     λ ← Ω-validator of C
9:     v ← λ^T U
10:    E ← {i : v_i = 0}
11:    if |E| > sparsity
12:      sparsity ← |E|
13:      sparsest ← v
14:      i ← any element of supp(λ) \ Ω
15:  return (sparsest, i)
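A compact sketch of Algorithm 3 (reusing the hypothetical omega_validator helper from Section 3; exponential in m, and without the blacklist described in Section 3.4):

```python
from itertools import combinations
import numpy as np

def siv_exhaustive(U, omega, tol=1e-9):
    """Sketch of Algorithm 3: scan all (n-1)-subsets of columns and keep the candidate
    whose extension (the zero pattern of lam^T U, Corollary 3.17) is largest."""
    n, m = U.shape
    sparsest, best_i, best_size = None, None, 0
    for C in combinations(range(m), n - 1):
        if np.linalg.matrix_rank(U[:, list(C)]) < n - 1:
            continue
        lam = omega_validator(U, C, omega, tol)      # None if C is not Omega-valid
        if lam is None:
            continue
        v = lam @ U
        size = int(np.sum(np.abs(v) <= tol))         # |E(C)|
        if size > best_size:
            best_size, sparsest = size, v
            best_i = next(i for i in range(n) if abs(lam[i]) > tol and i not in omega)
    return sparsest, best_i
```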
Theorem 3.22. Algorithm 3 produces an optimal solution to SIV, and is an oracle for Algorithm 2.

Proof. By Lemma 3.21, Algorithm 3 iterates over all maximal Ω-valid sets. Lines 11–14 check whether a given Ω-valid set has greater cardinality than any previously found maximal Ω-valid set and, if it does, the algorithm chooses this set as a working solution. Hence, at the end of the algorithm, the chosen vector v corresponds to a maximal cardinality Ω-valid set. By Claim 3.10, v is an optimal solution to SIV (a maximally sparse Ω-independent vector) if, and only if, the set E = {i : v_i = 0} is an Ω-valid set of maximal cardinality. Therefore, the vector chosen at the end of the algorithm is a maximally sparse Ω-independent vector. Finally, by Claim 3.11, the pair (v, i) serves as the oracle for SIV required by Algorithm 2. □

3.4 Implementation of our first optimal algorithm.
In order for Algorithm 3 to perform well, we have added a blacklist to the algorithm's operation. Since the maximal Ω-valid sets are generated by computing the extension (Definition 3.12) of n − 1 independent columns, once a given Ω-valid set is found, we wish to blacklist all of its subsets of size n − 1, since we need not revisit that extension. However, in addition to memory costs, looking up an element in the blacklist incurs a significant overhead as the blacklist grows. To address this problem, rather than storing all subsets C_{n−1}(S) of a given set S, we store S itself in the blacklist, in which case C is not blacklisted if ∀B ∈ blacklist, C ⊄ B. Despite this measure, in some cases the blacklist still grew too large, so we imposed a limit on the maximum size of the blacklist, storing only the M largest sets found so far.
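The containment test described above (a candidate C need not be revisited if it is contained in some previously recorded set) and the size cap might look like this sketch; the limit parameter is our own illustration of the cap on the blacklist size:

```python
def is_covered(C, blacklist):
    """C (a set of column indices) is skipped if some recorded Omega-valid set contains it."""
    return any(C <= B for B in blacklist)

def record(S, blacklist, limit):
    """Record a newly found Omega-valid set, keeping only the largest `limit` sets."""
    blacklist.append(set(S))
    blacklist.sort(key=len, reverse=True)
    del blacklist[limit:]
```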
3.5 Second algorithm for SIV.
While our first algorithm performs well in many cases, we have found that it performs poorly when the largest Ω-valid set is very large. In such cases the algorithm quickly finds the correct solution, but then continues its exhaustive search for a very long time. Our second algorithm is slightly simpler and avoids this inefficiency by using a top-down approach, searching for Ω-valid sets in descending order of cardinality to find an Ω-valid set of maximal cardinality.
Just like our first algorithm, it relies on the observation of Claim 3.10, which ties any solution of SIV (a maximally sparse Ω-independent vector) to an Ω-valid set of maximal cardinality.

Algorithm 4 Sparsest Independent Vector (2)
1: procedure SIV(U, Ω)
2:   for z = m − 1, ..., n − 1
3:     for C ∈ C_z([m])
4:       if rank(U_{:,C}) = n − 1 and C is Ω-valid
5:         λ ← Ω-validator of C
6:         v ← λ^T U
7:         i ← any element of supp(λ) \ Ω
8:         return (v, i)

To prove the correctness of Algorithm 4, we use the following lemma, which provides bounds on the size of a maximal Ω-valid set.

Lemma 3.23. Let S ⊂ [m] be a maximal Ω-valid set; then n − 1 ≤ |S| ≤ m − 1.

Proof. First, we show that |S| < m. Assume, by contradiction, that |S| = m and let λ ∈ F^n be an Ω-validator of S. Then λ^T U = 0, which means that Σ_{i∈[n]} λ_i U_{i,:} = 0, in contradiction to U having full row rank n. Hence, |S| ≤ m − 1. Next, by Claim 3.18, since S is a maximal Ω-valid set, rank(U_{:,S}) = n − 1; therefore, n − 1 ≤ |S|. Hence n − 1 ≤ |S| ≤ m − 1. □

Theorem 3.24. Algorithm 4 produces an optimal solution to SIV, and is an oracle for Algorithm 2.

Proof. Claim 3.10 states that v ∈ F^m is a solution to SIV (an optimally sparse Ω-independent vector) if and only if S = {i : v_i = 0} is an Ω-valid set of maximal cardinality. The algorithm iterates over all subsets of [m] in descending order of cardinality. Therefore, the first Ω-valid set found is an Ω-valid set of maximal cardinality. Furthermore, Lemma 3.23 states that any maximal Ω-valid set is of size n − 1 ≤ z ≤ m − 1; hence, the algorithm iterates over all candidates S ⊂ [m] that could be Ω-valid sets of maximal cardinality. Therefore, Algorithm 4 returns a sparsest Ω-independent vector. Finally, by Claim 3.11, the pair (v, i) serves as the oracle for SIV required by Algorithm 2. □
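Algorithm 4 admits a similarly short sketch (again with the hypothetical omega_validator helper); the first Ω-valid set encountered in the descending scan is returned:

```python
from itertools import combinations
import numpy as np

def siv_topdown(U, omega, tol=1e-9):
    """Sketch of Algorithm 4: search candidate column sets in descending order of size;
    by Lemma 3.23 the sizes m-1, ..., n-1 cover every maximal Omega-valid set."""
    n, m = U.shape
    for z in range(m - 1, n - 2, -1):                    # z = m-1, ..., n-1
        for C in combinations(range(m), z):
            if np.linalg.matrix_rank(U[:, list(C)]) != n - 1:
                continue
            lam = omega_validator(U, C, omega, tol)
            if lam is None:
                continue
            v = lam @ U
            i = next(i for i in range(n) if abs(lam[i]) > tol and i not in omega)
            return v, i
    return None, None
```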
A second heuristic for matrix sparsification, inspired by Gottlieb
To prove the correctness of our Algorithm 4, we use the following and Neylon’s algorithm (Algorithm 2), employs an even simpler
lemma, which provides bounds on the size of a maximal Ω-valid set. greedy approach.
Recall that for a given n × m matrix U (n ≤ m), we seek an n × n
Lemma 3.23. Let S ⊂ [m] be a maximal Ω-valid set, then n − 1 ≤ matrix A which minimizes nnz (AU ) + nns (AU ). For this purpose,
|S | ≤ m − 1. rather than searching for the entire invertible matrix A achieving
this objective, we could instead search for each row of A individually.
Proof. First, we show that |S | < m. Assume, by contradiction, Concretely, we iteratively compose the matrix A row-wise; where
that |S | = m and let λ ∈ Fn be an Ω-validator of S. Then λT U = 0, at each step i, we obtain the sparsest row vector vi such that vi is
which means that i ∈[n] λi Ui,: = 0, in contradiction to U having independent of {v 1 , . . . , vi−1 } and minimizes nnz (vU ) + nns (vU ).
Í
full row rank n. Hence, |S | ≤ m − 1. This yields the following algorithm:
Next, by Claim 3.18, since S is a maximal Ω-valid set, its rank
isn − 1, therefore, n − 1 ≤ |S |. Hence n − 1 ≤ |S | ≤ m − 1. Algorithm 6 Greedy Sparsification
1: procedure Greedy − Sparsi f ication(U )
Theorem 3.24. Algorithm 3 produces an optimal solution to SIV, 2: A←∅
and is an oracle for Algorithm 2. 3: for i = 1, . . . , n
v← nnz v T U + nns v T U
argmin
Fm
4:
Proof. Claim 3.10 states that v ∈ is a solution to SIV (an op- v ∈Fm
r k ({v 1 , . . .,v i −1 ,v })=i
timally sparse, Ω-independent vector) if and only if S = {i : vi = 0}
is Ω-valid. The algorithm iterates all subsets of [m] in descending 5: Ai,: ← vT
order of cardinality. Therefore, the first Ω-valid set found is an 6: return A
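Algorithm 5 translates almost line by line into the following NumPy sketch (practical only for small base cases, since all binom(m, n) column subsets are tried):

```python
from itertools import combinations
import numpy as np

def ks_sparsification(U, tol=1e-9):
    """Sketch of Algorithm 5: invert every full-rank n-column submatrix of U and keep
    the inverse that yields the sparsest product with U."""
    n, m = U.shape
    sparsity = np.count_nonzero(U)
    basis = np.eye(n)
    for C in combinations(range(m), n):
        sub = U[:, list(C)]
        if np.linalg.matrix_rank(sub) < n:
            continue
        sparsifier = np.linalg.inv(sub)
        candidate = int(np.count_nonzero(np.abs(sparsifier @ U) > tol))
        if candidate < sparsity:
            sparsity, basis = candidate, sparsifier
    return basis
```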
4.2 Greedy sparsification.
A second heuristic for matrix sparsification, inspired by Gottlieb and Neylon's algorithm (Algorithm 2), employs an even simpler greedy approach. Recall that for a given n × m matrix U (n ≤ m), we seek an n × n matrix A which minimizes nnz(AU) + nns(AU). For this purpose, rather than searching for the entire invertible matrix A achieving this objective, we could instead search for each row of A individually. Concretely, we iteratively compose the matrix A row-wise, where at each step i we obtain the sparsest row vector v_i such that v_i is independent of {v_1, ..., v_{i−1}} and minimizes nnz(v^T U) + nns(v^T U). This yields the following algorithm:

Algorithm 6 Greedy Sparsification
1: procedure Greedy-Sparsification(U)
2:   A ← ∅
3:   for i = 1, ..., n
4:     v ← argmin_{v ∈ F^n, rk({v_1, ..., v_{i−1}, v}) = i} (nnz(v^T U) + nns(v^T U))
5:     A_{i,:} ← v^T
6:   return A

In order to implement the subroutine for finding each row vector v_i, we encoded the objective as a MaxSAT instance and used Z3 [14], an SMT theorem prover, to find the optimal solution. Our MaxSAT instance employs two types of "soft" constraints: one which penalizes non-zero entries, and another which penalizes non-singleton entries. Therefore, optimal solutions will minimize the sum of non-zero and non-singleton entries, thereby minimizing the associated arithmetic complexity (Remark 2.4).
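The per-row objective could be encoded for Z3's optimizer roughly as in the sketch below (assuming the z3-solver Python package). It shows only the soft constraints penalizing non-zero and non-singleton entries of v^T U; the linear-independence requirement against previously chosen rows is omitted and would have to be added by the caller, and this encoding is ours rather than the authors' exact MaxSAT instance.

```python
from fractions import Fraction
from z3 import Optimize, Real, RealVal, Or, Sum, sat

def sparsest_row(U_entries):
    """One greedy step of Algorithm 6 (sketch): find v minimizing nnz(v^T U) + nns(v^T U).
    U_entries: the matrix U as a list of rows, each entry an int or a string such as "1/8"."""
    n, m = len(U_entries), len(U_entries[0])
    opt = Optimize()
    v = [Real(f"v_{i}") for i in range(n)]
    opt.add(Or([vi != 0 for vi in v]))   # exclude the all-zero row (independence not modeled here)
    for j in range(m):
        w_j = Sum([v[i] * RealVal(U_entries[i][j]) for i in range(n)])    # (v^T U)_j
        opt.add_soft(w_j == 0, weight=1)                                  # penalize non-zero entries
        opt.add_soft(Or(w_j == 0, w_j == 1, w_j == -1), weight=1)         # penalize non-singletons
    if opt.check() == sat:
        model = opt.model()
        return [Fraction(str(model.eval(vi, model_completion=True))) for vi in v]
    return None
```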
This algorithm, while not proven to be optimal, has the advantage of considering both non-zeros and non-singletons, and can therefore produce decompositions resulting in a lower arithmetic complexity than the optimal algorithms (Algorithms 3, 4). For a summary of these results, see Table 2.

5 APPLICATION AND RESULTING ALGORITHMS
Table 2 contains a list of alternative basis algorithms found using our new methods. All of the algorithms used were taken from the repository of Ballard and Benson [2]³.

³ The algorithms can be found at github.com/arbenson/fast-matmul
Table 2: Alternative Basis Algorithms
The alternative basis algorithms obtained represent a significant improvement over the original versions, with the reduction in the leading coefficient ranging between 15% and 88%. Almost all of the results were found using our exhaustive methods (Algorithms 3 and 4). In certain cases (marked (?)), where the U, V, W matrices contain non-singleton values, our search heuristic's (Algorithm 6) results exceeded those of our exhaustive algorithms. For example, bases obtained for the ⟨4, 4, 2; 26⟩-algorithm by Algorithms 3 and 4 reduced the number of arithmetic operations from 235 to 110, while Algorithm 6 reduced the number of arithmetic operations even further, to 105.

Comparison of different search methods. The exhaustive algorithms (Algorithms 3, 4) solve the SIV problem. Their proof of correctness, coupled with that of Gottlieb and Neylon's algorithm, guarantees that they obtain decompositions minimizing the number of non-zero entries. As MS and SIV are both NP-Hard problems, these algorithms exhibit an exponential worst-case complexity. For this reason, the decomposition of some of the larger instances required the use of the Mira supercomputer. However, after some tuning of Algorithms 3 and 4 (see Section 3.4) and the implementation of Algorithm 6 using Z3, all decompositions completed on a PC within a reasonable time. Specifically, all runs of Algorithms 3 and 4 completed within 40 minutes, while Algorithm 6 took less than one minute, on a PC⁴. It should be remembered that Algorithms 3 and 4 guarantee optimal sparsification, while Algorithm 6 has no such guarantee. However, in all cases, Algorithm 6 ran much faster and produced an equally good decomposition, with better results when there were non-singleton values.

⁴ Matebook X (i7-7500U CPU and 8GB RAM)

6 DISCUSSION AND FUTURE WORK
We have improved the leading coefficient of several fast matrix multiplication algorithms by introducing new methods to sparsify the encoding/decoding matrices of fast matrix multiplication algorithms. The number of arithmetic operations depends on both non-zero and non-singleton entries. This means that in order to minimize the arithmetic complexity, the sum of both non-zero and non-singleton entries should be minimized; otherwise an optimal sparsification may result in a 2-approximation of the minimal number of arithmetic operations when matrix entries are not limited to 0, ±1. Further work is required in order to find a provably optimal algorithm which minimizes both non-zero and non-singleton values.

We attempted sparsification of additional algorithms for larger dimensions (e.g., Pan's ⟨44, 44, 44; 36133⟩-algorithm [31], which is asymptotically faster than those presented here). However, the size of the base case of these algorithms led to prohibitively long runtimes.

The methods presented in this paper apply to finding square invertible matrices solving the MS problem. Other classes of sparse decompositions exist which do not fall within this category. For example, Beniamini and Schwartz's [1] decomposed recursive-bilinear framework relies upon decompositions in which the sparsifying matrix may be rectangular, rather than square. Some of the leading coefficients in [1] are better than those presented here. For example, they obtained a leading coefficient of 2 for a ⟨3, 3, 3; 23⟩-algorithm of [2] and a ⟨4, 3, 3; 29⟩-algorithm of [36], compared to our values 5.36 and 6.96, respectively. However, the arithmetic overhead of basis transformation in Karstadt and Schwartz [22, 23] (and therefore here as well) is O(n² log n), whereas in [1] it may be larger. Note also that the decomposition heuristic of [1] does not always guarantee optimality. Further work is required to find new decomposition methods for such settings.

7 ACKNOWLEDGEMENTS
We thank Austin R. Benson for providing details regarding the ⟨2, 3, 2; 11⟩-algorithm. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This work was supported by the PetaCloud industry-academia consortium. This research was supported by a grant from the United States-Israel Bi-national Science Foundation, Jerusalem, Israel. This work was supported by the HUJI Cyber Security Research Center in conjunction with the Israel National Cyber Bureau in the Prime Minister's Office. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 818252).

REFERENCES
[1] Gal Beniamini and Oded Schwartz. 2019. Faster Matrix Multiplication via Sparse Decomposition. In Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures. ACM, 11–22.
[2] Austin R Benson and Grey Ballard. 2015. A framework for practical parallel fast matrix multiplication. ACM SIGPLAN Notices 50, 8 (2015), 42–53.
[3] Dario Bini, Milvio Capovani, Francesco Romani, and Grazia Lotti. 1979. O(n^2.7799) complexity for n×n approximate matrix multiplication. Information Processing Letters 8, 5 (1979), 234–235.
[4] Marco Bodrato. 2010. A Strassen-like matrix multiplication suited for squaring and higher power computation. In Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation. ACM, 273–280.
[5] Richard P Brent. 1970. Algorithms for matrix multiplication. Technical Report. Stanford University CA Department of Computer Science.
[6] Nader H Bshouty. 1995. On the additive complexity of 2×2 matrix multiplication. Information Processing Letters 56, 6 (1995), 329–335.
[7] Murat Cenk and M Anwar Hasan. 2017. On the arithmetic complexity of Strassen-like matrix multiplications. Journal of Symbolic Computation 80 (2017), 484–501.
[8] S Frank Chang and S Thomas McCormick. 1992. A hierarchical algorithm for making sparse matrices sparser. Mathematical Programming 56, 1 (1992), 1–30.
[9] Henry Cohn and Christopher Umans. 2003. A group-theoretic approach to fast matrix multiplication. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on. IEEE, 438–449.
[10] Don Coppersmith and Shmuel Winograd. 1982. On the asymptotic complexity of matrix multiplication. SIAM J. Comput. 11, 3 (1982), 472–492.
[11] Don Coppersmith and Shmuel Winograd. 1990. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation 9, 3 (1990), 251–280.
[12] Hans F de Groote. 1978. On varieties of optimal algorithms for the computation of bilinear mappings I. The isotropy group of a bilinear mapping. Theoretical Computer Science 7, 1 (1978), 1–24.
[13] Hans F de Groote. 1978. On varieties of optimal algorithms for the computation of bilinear mappings II. Optimal algorithms for 2×2-matrix multiplication. Theoretical Computer Science 7, 2 (1978), 127–148.
[14] Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340.
[15] Lee-Ad Gottlieb and Tyler Neylon. 2010. Matrix sparsification and the sparse null space problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Springer, 205–218.
[16] Vince Grolmusz. 2008. Modular representations of polynomials: Hyperdense coding and fast matrix multiplication. IEEE Transactions on Information Theory 54, 8 (2008), 3687–3692.
[17] Alan J Hoffman and ST McCormick. 1984. A fast algorithm that makes matrices optimally sparse. Progress in Combinatorial Optimization (1984), 185–196.
[18] John E Hopcroft and Leslie R Kerr. 1971. On minimizing the number of multiplications necessary for matrix multiplication. SIAM J. Appl. Math. 20, 1 (1971), 30–36.
[19] John E Hopcroft and Jean Musinski. 1973. Duality applied to the complexity of matrix multiplications and other bilinear forms. In Proceedings of the Fifth Annual ACM Symposium on Theory of Computing. ACM, 73–87.
[20] Rodney W Johnson and Aileen M McLoughlin. 1986. Noncommutative Bilinear Algorithms for 3×3 Matrix Multiplication. SIAM J. Comput. 15, 2 (1986), 595–603.
[21] Igor Kaporin. 1999. A practical algorithm for faster matrix multiplication. Numerical Linear Algebra with Applications 6, 8 (1999), 687–700.
[22] Elaye Karstadt and Oded Schwartz. 2017. Matrix multiplication, a little faster. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures. ACM, 101–110.
[23] Elaye Karstadt and Oded Schwartz. 2020. Matrix multiplication, a little faster. Journal of the ACM (JACM) 67, 1 (2020), 1–31.
[24] Donald E Knuth. 1981. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA (1981).
[25] Julian Laderman, Victor Y Pan, and Xuan-He Sha. 1992. On practical algorithms for accelerated matrix multiplication. Linear Algebra and Its Applications 162 (1992), 557–588.
[26] Julian D Laderman. 1976. A noncommutative algorithm for multiplying 3×3 matrices using 23 multiplications. In Am. Math. Soc, Vol. 82. 126–128.
[27] François Le Gall. 2014. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation. ACM, 296–303.
[28] S Thomas McCormick. 1983. A Combinatorial Approach to Some Sparse Matrix Problems. Technical Report. Stanford University CA Systems Optimization Lab.
[29] S Thomas McCormick. 1990. Making sparse matrices sparser: Computational results. Mathematical Programming 49, 1-3 (1990), 91–111.
[30] Victor Y Pan. 1978. Strassen's algorithm is not optimal: trilinear technique of aggregating, uniting and canceling for constructing fast algorithms for matrix operations. In Foundations of Computer Science, 1978., 19th Annual Symposium on. IEEE, 166–176.
[31] Victor Y Pan. 1982. Trilinear aggregating with implicit canceling for a new acceleration of matrix multiplication. Computers & Mathematics with Applications 8, 1 (1982), 23–34.
[32] Robert L Probert. 1976. On the additive complexity of matrix multiplication. SIAM J. Comput. 5, 2 (1976), 187–203.
[33] Francesco Romani. 1982. Some properties of disjoint sums of tensors related to matrix multiplication. SIAM J. Comput. 11, 2 (1982), 263–267.
[34] Arnold Schönhage. 1981. Partial and total matrix multiplication. SIAM J. Comput. 10, 3 (1981), 434–455.
[35] Alexey V Smirnov. 2013. The bilinear complexity and practical algorithms for matrix multiplication. Computational Mathematics and Mathematical Physics 53, 12 (2013), 1781–1795.
[36] Alexey V Smirnov. 2017. Several bilinear algorithms for matrix multiplication. Technical Report.
[37] Andrew James Stothers. 2010. On the complexity of matrix multiplication. Thesis (2010).
[38] Volker Strassen. 1969. Gaussian elimination is not optimal. Numerische Mathematik 13, 4 (1969), 354–356.
[39] Volker Strassen. 1986. The asymptotic spectrum of tensors and the exponent of matrix multiplication. In Foundations of Computer Science, 1986., 27th Annual Symposium on. IEEE, 49–54.
[40] Petr Tichavskỳ and Teodor Kováč. 2015. Private communication with Ballard and Benson, see [2] for benchmarking. (2015).
[41] Petr Tichavskỳ, Anh-Huy Phan, and Andrzej Cichocki. 2017. Numerical CP decomposition of some difficult tensors. J. Comput. Appl. Math. 317 (2017), 362–370.
[42] Virginia V Williams. 2012. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing. ACM, 887–898.
[43] Shmuel Winograd. 1971. On multiplication of 2×2 matrices. Linear Algebra and Its Applications 4, 4 (1971), 381–388.
A SAMPLES OF ALTERNATIVE BASIS ALGORITHMS
In this section we present the encoding/decoding matrices of the alternative basis algorithms listed in Table 2. To verify the correctness of these algorithms, recall Lemma 2.11 and use the following fact:

Fact A.1. (Triple product condition). [5, 24] Let R be a ring.

Table 5: ⟨3, 3, 3; 23⟩-algorithm [35]
(Encoding/decoding matrices Uϕ, Vψ, Wυ and basis transformations ϕ, ψ, υ^{−T}.)