0% found this document useful (0 votes)

13 views11 pages

Read Various Algorithms Listed

This document presents three new methods for sparsifying the operators of fast matrix multiplication algorithms, aimed at reducing their leading coefficients and improving practical performance. The authors apply these methods to existing algorithms, achieving significant reductions in leading coefficients while noting that the sparsification problem is NP-Hard. The paper also discusses previous work and provides a framework for alternative basis algorithms, contributing to the ongoing research in efficient matrix multiplication techniques.

Uploaded by

Shafi Mn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views11 pages

Read Various Algorithms Listed

Uploaded by

Shafi Mn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Sparsifying the Operators of Fast Matrix Multiplication

Algorithms
Gal Beniamini Nathan Cheng Olga Holtz
The Hebrew University of Jerusalem University of California at Berkeley University of California at Berkeley
[email protected] [email protected] [email protected]

Elaye Karstadt Oded Schwartz

The Hebrew University of Jerusalem The Hebrew University of Jerusalem
[email protected] [email protected]
arXiv:2008.03759v1 [cs.DS] 9 Aug 2020

ABSTRACT 1.1 Previous work

Fast matrix multiplication algorithms may be useful, provided that Reducing the leading coefficients. Winograd [43] reduced the lead-
their running time is good in practice. Particularly, the leading coef- ing coefficient of Strassen’s algorithm’s arithmetic complexity from
ficient of their arithmetic complexity needs to be small. Many sub- 7 to 6 by decreasing the number of additions and subtractions in
cubic algorithms have large leading coefficients, rendering them the 2 × 2 base case from 18 to 151 . Later, Bodrato [4] introduced
impractical. Karstadt and Schwartz (SPAA’17, JACM’20) demon- the intermediate representation method, that successfully reduces
strated how to reduce these coefficients by sparsifying an algo- the leading coefficient to 5, for repeated squaring and chain matrix
rithm’s bilinear operator. Unfortunately, the problem of finding multiplication. Cenk and Hasan [7] presented a non-uniform im-
optimal sparsifications is NP-Hard. plementation of Strassen-Winograd’s algorithm [43], which also
We obtain three new methods to this end, and apply them to reduces the leading coefficient from 6 to 5, but incurs additional
existing fast matrix multiplication algorithms, thus improving their penalties such as a larger memory footprint and higher communi-
leading coefficients. These methods have an exponential worst case cation costs. Independently, Karstadt and Schwartz [22, 23] used a
running time, but run fast in practice and improve the performance technique similar to Bodrato’s, and obtained a matrix multiplication
of many fast matrix multiplication algorithms. Two of the methods algorithm with a 2 × 2 base case, using 7 multiplications, and a
are guaranteed to produce leading coefficients that, under some leading coefficient of 5. Their method also applies to other base
assumptions, are optimal. cases, improving the leading coefficients of multiple algorithms.
Beniamini and Schwartz [1] introduced the decomposed recursive
bilinear framework, which generalizes [22, 23]. Their technique
1 INTRODUCTION allows a further reduction of the leading coefficient, yielding several
Matrix multiplication is a fundamental computation kernel, used in fast matrix multiplication algorithms with a leading coefficient of
many fields ranging from imaging to signal processing and artificial 2, matching that of the classical algorithm.
neural networks. The need to improve performance has attracted Lower bounds on leading coefficients. Probert [32] proved that 15
much attention from the science and engineering communities. additions are necessary for any recursive-bilinear matrix multipli-
Strassen’s discovery of the first sub-cubic algorithm [38] sparked cation algorithm with a 2 × 2 base case using 7 multiplications over
intensive research into the complexity of matrix multiplication F2 , which corresponds to a leading coefficient of 6. This was later
algorithms (cf. [1–3, 9–13, 16, 18, 20–23, 25–27, 30, 31, 33–37, 39, matched by Bshouty [6], who used a different technique to obtain
42, 43]). the same lower bound over an arbitrary ring. Both cases have been
The research efforts can be divided into two main branches. The interpreted as a proof of optimality for the leading coefficient of
first revolves around the search for asymptotic upper bounds on Winograd’s algorithm [43].
the arithmetic complexity of matrix multiplication (cf. [9–11, 16, Karstadt and Schwartz’s h2, 2, 2; 7i-algorithm2 [22, 23] requires
27, 37, 39, 42]). This approach focuses on asymptotics, typically dis- 12 additions (thus having a leading coefficient of 5) and seemingly
regarding the hidden constants of the algorithms and other aspects contradicts these lower bounds. Indeed, they showed that these
of practical importance. Many of these algorithms remain highly lower bounds [6, 32] do not hold under alternative basis multipli-
theoretical due to their large hidden constants, and furthermore, cation. In addition, they extended the lower bounds to apply to
they apply only to matrices of very high dimensions. algorithms that utilize basis transformations, and showe that 12
In constrast, the second branch focuses on obtaining matrix additions are necessary for any recursive-bilinear matrix multipli-
multiplication algorithms that are both asymptotically fast and cation algorithm with a 2 × 2 base case using 7 multiplications,
practical. This requires the algorithms to have reasonable hidden regardless of basis. Thus proving a lower bound of 5 on the leading
constants that are applicable even to small instances (cf., [1–3, 18, coefficient of such algorithms.
20–23, 25, 26, 30, 31, 33–36, 43]).

, 1 See Section 2.1 for the connection between the number of additions and the leading
2020. . coefficient.
DOI: 2 See Section 2.1 for definition.

1
Table 1: Examples of improved leading coefficients

Arithmetic Operations Leading Coefficient Improvement

Algorithm Leading Monomial Original [22, 23] Here Original [22, 23] Here [22, 23] Here
h2, 2, 2; 7i [38] n log2 7 ≈ n 2.80735 18 12 12 7 5 5 28.57% 28.57%
3
h3, 2, 3; 15i [2] n log18 15 ≈ n 2.81076 64 52 39 9.61 7.94 6.17 17.37% 35.84%
3
h4, 2, 3; 20i [35] n log24 20 ≈ n 2.82789 78 58 51 8.9 7.46 5.88 16.17% 33.96%
h3, 3, 3; 23i [2] n log3 23 ≈ n 2.85404 87 75 66 7.21 6.57 5.71 8.87% 20.79%
3
h6, 3, 3; 40i [35] n log54 40 ≈ n 2.77429 1246 202 190 55.63 9.36 8.9 83.17% 84.01%
The leading monomial of rectangular hn, m, k; t i-algorithms refers to their composition [19] into square nmk, nmk, nmk ; t 3 -algorithms.
The improvement column is the ratio between the new and the original leading coefficients of the arithmetic complexity. See Table 2 for a
full list of results.

Beniamini and Schwartz [1] extended the lower bound to the and obtain novel alternative-basis algorithms, often resulting in
generalized setting, in which the input and output can be trans- arithmetic complexity with leading coefficients superior to those
formed to a basis of larger dimension. They also found that the known previously (See Table 1, Table 2, and Appendix A).
leading coefficient of any such algorithm with a 2 × 2 base case The first two methods were obtained by the introduction of new
using 7 multiplications is at least 5. solutions to the Sparsest Independent Vector problem, which were
then used as oracles for Gottlieb and Neylon’s algorithm. As matrix
Obtaining alternative basis algorithms. Recursive-bilinear algo- sparsification is known to be NP-Hard, it is no surprise that these
rithms can be described by a triplet of matrices, dubbed the encod- methods exhibit exponential worst case complexity. Nevertheless,
ing and decoding matrices (see Section 2.1). The alternative basis they perform well in practice on the encoding/decoding matrices
technique [22, 23] utilizes a decomposition of each of these matrices of fast matrix multiplication algorithms.
into a pair of matrices – a basis transformation, and a sparse encod- Our third method for matrix sparsification simultaneously mini-
ing or decoding matrix. Once a decomposition is found, applying mizes the number of non-singular values in the matrix. This method
the algorithm is straightforward (see Section 2.1). does not guarantee an optimal solution for matrix sparsification.
The leading coefficient of the arithmetic complexity is deter- Nonetheless, it obtains solutions with the same (and, in some cases,
mined by the number of non-zero (and non-singleton) entries in better) leading coefficients than the former two methods when
each of the encoding/decoding matrices, while the basis transforma- applied to many of the fast matrix multiplication algorithms in
tions only affect the low order terms of the arithmetic complexity our corpus, and runs significantly faster than the first two when
(see Section 2.1). Thus, reducing the leading coefficient of fast ma- implemented using Z3 [14]. For completeness, we also present the
trix multiplication algorithms translates to the matrix sparsification sparsification heuristic used in [22, 23].
(MS) problem.
Matrix sparsification. Unfortunately, matrix sparsification is NP-
Hard to solve [28] and NP-Hard to approximate to within a factor 1.3 Paper Organization.
.5−o(1)
of 2log n [15] (Over Q, assuming NP does not admit quasi- In Section 2, we recall preliminaries regarding fast matrix mul-
polynomial time deterministic algorithms). Despite the problem tiplication and recursive-bilinear algorithms, followed by a sum-
being NP-hard, search heuristics can be leveraged to obtain bases mary of the Alternative Basis technique [22, 23]. We then present
which significantly sparsify the encoding/decoding matrices of fast Matrix Sparsification (MS, Problem 2.13), alongside Gottlieb and
matrix multiplication algorithms with small base cases. Neylon’s [15] algorithm for solving MS by relying on an oracle for
There are a few heuristics that can solve the problem, under Sparsest Independent Vector (SIV, Problem 2.15). In Section 3 we
severe assumptions, such as the full rank of any square submatrix, present our two algorithms (Algorithms 3 and 4) for implementing
and requiring that the rank of each submatrix be equal to the size SIV. In Section 4, we introduce Algorithm 5 - the sparsification
of the largest matching in the induced bipartite graph (cf., [8, 17, heuristic of [22, 23], and a new efficient heuristic for sparsifying
28, 29]). These assumptions rarely hold in practice, and specifically, matrices while simultaneously minimizing non-singular values
do not apply to any matrix multiplication algorithm we know. (Algorithm 6). In Section 5 we present the resulting fast matrix
Gottlieb and Neylon’s algorithm [15] sparsifies an n × m matrix multiplication algorithms. Section 6 contains a discussion and plans
with no assumptions about the input. It does so by using calls to for future work.
an oracle for the sparsest independent vector problem.

1.2 Our contribution. 2 PRELIMINARIES

We obtain three new methods for matrix sparsification, based on 2.1 Encoding and Decoding matrices.
Gottlieb and Neylon’s [15] matrix sparsification algorithm. We Fast matrix multiplication algorithms are recursive divide-and-
apply these methods to multiple matrix multiplication algorithms conquer algorithms, which utilize a small base case. We use the
2
notation hn 0 , m 0 , k 0 ; t 0 i-algorithm to refer to an algorithm multi- Lemma 2.8. [19] Let hU , V , W i be the encoding/decoding matri-
plying n 0 × m 0 by m 0 × k 0 matrices in its base case, using t 0 scalar ces of an hm, k, n; ti-algorithm. Then hW Pn×m , U , V Pn×k i are the
multiplications, where n 0 , m 0 , k 0 and t 0 are fixed positive integers. encoding/decoding matrices of an hn, m, k; ti-algorithm.
When multiplying n × m by m × k matrix multiplication, the
algorithm splits each matrix into blocks (each of size nn0 × m m Remark 2.9. In addition to Lemma 2.8, Hopcroft and Musin-
m k
0
ski [19] proved that any hn, m, k; ti-algorithm defines algorithms
and m0 × k0 ,
respectively), and works block-wise, according to for all permutations of n, m, and k. Note, however, that while the
the base algorithm. Additions and subtractions in the base-case number of non-zero and non-singular entries does not change, it fol-
algorithm become block-wise additions and subtractions. Similarly, lows from Remark 2.4 and Corollary 2.6 that the leading coefficient
multiplication by a scalar become multiplication of a block matrix varies according to the dimensions of the decoding matrix.
by a scalar. Matrix multiplications in the algorithm are performed
via recursion. 2.2 Alternative Basis Matrix Multiplication.
Throughout this paper, we refer to an algorithm by its base case.
Definition 2.10. [22, 23] Let R be a ring and let ϕ, ψ , υ be au-
Hence, an hn, m, k; ti-algorithm may refer to either the algorithm’s
base case or the corresponding block recursive algorithm, as obvi- tomorphisms of R n ·m , Rm ·k , R n ·k (respectively). We denote a
ous from the context. recursive bilinear matrix multiplication algorithm which takes
ϕ (A) , ψ (B) as inputs and outputs υ (A · B) using t multiplications
Fact 2.1. [22, 23] Let R be a ring, and let f : R n × Rm → R k by hn, m, k; ti ϕ,ψ ,υ . If n = m = k and ϕ = ψ = υ, we can use the no-
be a bilinear function that performs t multiplications. There exist tation hn, n, n; ti ϕ -algorithm. This notation extends the hn, m, k; ti-
U ∈ R t ×n , V ∈ R t ×m , W ∈ R t ×k such that algorithm notation, as the latter applies when the three basis trans-
∀x ∈ R n , y ∈ Rm , f (x, y) = W T ((U · x) (V · y)) formations are the identity map.

where is the element-wise product (Hadamard product). Given a recursive bilinear, hn, m, k; ti ϕ,ψ ,υ -algorithm ALG, an
alternative basis matrix multiplication operates as follows:
Definition 2.2. [22, 23] (Encoding/Decoding matrices). We refer
to the matrix triplet hU , V , W i of a recursive-bilinear algorithm (see
Fact 2.1) as its encoding/decoding matrices (U , V are the encoding Algorithm 1 Alternative Basis Matrix Multiplication Algorithm
matrices and W is the decoding matrix).
Input: A ∈ R n×m , Bm×k
Notation 2.3. [1] Denote the number of nonzero entries in a
Output: n × k matrix C = A · B
matrix by nnz (A), and the number of non-singleton (i.e., not ±1)
1: function Mult(A, B)
entries in a matrix by nns (A). Let the number of rows/columns be
2: Ã = ϕ(A) . R n×m basis transformation
nrows (A) and ncols (A), respectively.
3: B̃ = ψ (B) . Rm×k basis transformation
Remark 2.4. [1] The number of linear operations used by a bi- 4: C̃ = ALG(Ã, B̃) . hn, m, k; ti ϕ,ψ ,υ -algorithm
linear algorithm is determined by its encoding/decoding matrices. 5: C = υ −1 (C̃) . R n×k basis transformation
6: return C
The number of arithmetic operations performed by each of the
encodings is:
OpsU = nnz (U ) + nns (U ) − nrows (U )
OpsV = nnz (V ) + nns (V ) − nrows (V ) Lemma 2.11. [22, 23] Let R be a ring, and let ϕ, ψ , υ be automor-
The number of operations performed by the decoding is: phisms of R n ·m , Rm ·k , R n ·k (respectively). Then hU , V , W i are en-
coding/decoding matrices of an hn, m, k; ti ϕ,ψ ,υ -algorithm if and only
OpsW = nnz (W ) + nns (W ) − ncols (W )
if U ϕ, Vψ , W υ −T are encoding/decoding matrices of an hn, m, k; ti-
Remark 2.5. We assume that none of the rows of the U , V , and W algorithm
matrices is zero. This is because any zero row in U , V is equivalent
Alternative basis multiplication is fast since the basis transfor-
to an identically 0 multiplicand, and any zero row inW is equivalent
mations are fast and incur an asymptotically negligible overhead:
to a multiplication that is never used in the output. Hence, such
rows can be omitted, resulting in asymptotically faster algorithms. Claim 2.12. [22, 23] Let R be a ring, let ψ : R n0 ×m0 → R n0 ×m0
Corollary 2.6. [1] Let ALG be an hn 0 , m 0 , k 0 ; t 0 i-algorithm that be a linear map, and let A ∈ R n×m where n = nk0 , m = mk0 . The
performs OpsU, OpsV, OpsW linear operations at the base case and let complexity of ψ (A) is
n = nl0 , m = ml0 , k = k 0l (l ∈ N). The arithmetic complexity of ALG
is: q
F (n, m) = nm · logn0m0 (nm)

OpsU OpsV OpsW
n 0m 0
F (n, m, k) = 1 + + + tl
t 0 − n 0m 0 t 0 − m 0 k 0 t 0 − n 0 k 0 0 where q is the number of linear operations performed.
OpsU · nm OpsV · mk OpsW · nk

− + +
t 0 − n 0m 0 t0 − m0 k0 t0 − n0 k0 2.3 Matrix Sparsification.
Definition 2.7. Let P I ×J denote the permutation matrix that ex- Finding a basis that minimizes the number of additions and sub-
changes row-order for column-order of the vectorization of an I × J tractions performed by a fast matrix multiplication algorithm is
matrix. equivalent, by Remark 2.4, to the Matrix Sparsification problem:
3
Problem 2.13. Matrix Sparsification Problem (MS): Let U be an Notation 3.3. Given an Ω-valid set S, we will refer to a vector
n × m matrix. The objective is to find an invertible matrix A such that λ ∈ Fn with supp (λ) 1 Ω s.t. λT U:, S = 0 as an Ω-validator of S.
A = argmin (nnz (AU )) Next, we provide a definition for vectors which are candidates
A∈G L n
for a solution of SIV:
Remark 2.14. It is traditional to think of the matrices U , V , and
W as “tall and skinny”, i.e., with n ≥ m. However, in the area of Definition 3.4. A vector v in the row space of U is called Ω-
matrix sparsification, it is traditional to deal with matrices satisfying independent if v is not in the row space of U Ω, : .
n ≤ m and transformations applied from the left. However, since Note that any solution to SIV (Problem 2.15) is, by definition, an
nnz(AU ) = nnz(U T AT ), we can simply apply MS to U T and use AT optimally sparse Ω-independent vector.
as our basis transformation. From now on, we will therefore switch
to the convention n ≤ m used in matrix sparsification. Remark 3.5. Note that given a set S ⊂ [m], it is possible to verify
whether S is Ω-validand find an appropriate Ω-validator for it in
To solve MS, we make use of Gottlieb and Neylon’s algorithm [15], cubic time (e.g., via Gaussian elimination).
which solves the matrix sparsification problem for n×m matrices, by
repeatedly invoking an oracle for the Sparsest Independent Vector 3.1 Sparse Independent Vectors and maximal
problem (Problem 2.15).
Ω-valid sets.
Problem 2.15. Sparsest Independent Vector Problem (SIV): Let The crux of our algorithms lies in the idea of finding an Ω-valid set
U ∈ R n×m (n ≤ m) and let Ω = {ω1 , . . . , ωk } ⊂ [m]. Find a of maximal cardinality and using it to compute a solution for SIV,
vector v ∈ R n s.t. v is in the row space of U , v is not in the span of which can then be used by Algorithm 2. The connection between
Uω1 , . . . , Uωk , and v has a minimal number of nonzero entries.

Ω-valid sets and Ω-independent vectors is given by the following
Given a subroutine SIV (U , Ω) which returns a pair (v, i), where lemmas:
v is the sparse vector as required by SIV, and i ∈ [n] \ Ω is an Lemma 3.6. Let v ∈ Fn be an Ω-independent vector. Then the set
integer such that the i’th row of U can be replaced by v without S = j : v j = 0 is an Ω-valid set of size zeros (v).

changing the span of U . Then Algorithm 2 returns an exact solution
for MS [15]. Proof. By Definition 3.4, there exists a vector λ ∈ Fn s.t. v =
Ín T
i=1 λi Ui (i.e., v = λ U ) and λi 0 , 0 for some i 0 < Ω (hence
Algorithm 2 MS via SIV [15] supp (λ) 1 Ω). Thus, λ is an Ω-validator of S, and therefore, S is
1: procedure MS(U) Ω-valid.
2: Ω←∅ Lemma 3.7. Let S ⊂ [m] be an Ω-valid set and let λ ∈ Fn an
3: for j = 1, ..., n Ω-validator of S. Then v = λT U is an Ω-independent vector with at
4: (v j , i) ← SIV (U , Ω) least |S | zero entries.
5: Replace i’th row of U with v j
6: Ω ← Ω ∪ {i} Proof. Since S is valid, there exists λ ∈ Fn s.t. λT U:,S = 0 and
return U supp (λ) 1 Ω. Denote v =λT U . By
definition, v has at least |S | zero
entries since ∀i ∈ S vi = λT U:, S = 0. Next we show that v is Ω-
i Í
3 OPTIMAL SPARSIFICATION METHODS independent. Note that, v = λT U = ni=1 λi Ui, : is in the row space
of U since it is a linear combination of the rows of U . Furthermore,
In this section, we reframe SIV as a problem of finding a maximal
since supp (λ) 1 Ω, there exists i 0 < Ω s.t. λi 0 , 0. Therefore,
subset of columns of the input matrix U according to constraints
v is not in the row span of U Ω, : since we assume (Remark 2.14)
given by Ω (see Definition 3.2). We refer to such sets as Ω-valid sets
that all rows of U are linearly independent. Hence, v = λT U is an
and show that Ω-valid sets are tied to sparse independent vectors
Ω-independent vector with at least |S | zero entries.
(Section 3.1) and that any algorithm which finds an Ω-valid set
of maximal cardinality can be used as an oracle in Algorithm 2. Corollary 3.8. Let M ⊂ [m] be a maximal Ω-valid set (i.e.,
Finally, we show how to find maximal Ω-valid sets (Section 3.2), M is not a subset of any other Ω-valid set), and let v ∈ Fn be an
and obtain two algorithms that solve SIV. Ω-independent vector s.t. ∀i ∈ M vi = 0. Then ∀j < M, v j , 0.
Recall that we use the convention that U ∈ Fn×m where n ≤ m
(see Remark 2.14). Throughout this section, we also assume that U Proof. Denote the set of indices of zero entries of v by M =0
0
is of full rank n and Ω ( [n]. j : v j = 0 . Since v is Ω-independent, Lemma 3.6 yields that M
is valid. Hence, by maximality of M, M = M 0 and |M | = zeros (v).
Notation 3.1. For a set S and an integer k, let Ck (S) denote the
Therefore, ∀i ∈ [m] vi = 0 if, and only if, i ∈ M.
set of all subsets of S with k elements.
Definition 3.2. S ⊂ [m] is Ω-valid if there exists i < Ω such that Corollary 3.9. Let M ⊂ [m] be a maximal Ω-valid set and let
λ ∈ Fn be an Ω-validator of M. Then v = λT U is an Ω-independent
Ui, S is in the span of rows U[n]\{i }, S .
vector with exactly |M | zero entries.
Formally, a set S ⊂ [m] is Ω-valid if exists λ ∈ Fn with supp (λ) 1
Ω s.t. λT U:, S = 0 (where supp (λ) = {i : λi , 0}). Proof. Follows directly from Lemma 3.7 and Corollary 3.8
4
The final two claims will show how Ω-validity can serve as (since λT U (:, S) = 0), and any linear combinations of columns of S.
an oracle for Algorithm 2. Recall that Algorithm 2 uses an or- This leads to the following extension of sets:
acle which returns a pair (v, i), where v is an optimally sparse
Ω-independent vector, and replacing the i’th row of U with v does Definition 3.12. Let S ⊂ [m]. We define theextension of S, E (S),
to be the largest set E ⊂ [m] s.t. span col U:, S = span col U:, E .

not change the row span of U . The next claim shows that a maxi-
mally sparse Ω-independent vector is equivalent to an Ω-valid set Lemma 3.13. Let S ⊂ [m]. Then S is Ω-valid if, and only if, E (S)
of maximal cardinality. is Ω-valid.
Claim 3.10. An Ω-independent vector v ∈ Fm is optimally sparse
if, and only if, M = {i : vi = 0} is an Ω-valid set of maximal cardi- Proof. Assume E (S) is Ω-valid. By definition of Ω-validity,
nality. exists a vector λ ∈ Fn s.t. supp (λ) 1 Ω and λT U:, E(S ) = 0. Since
S ⊂ E (S), λT U:,S = 0, therefore, S is valid.
Proof. First, assume that v ∈ Fm is a maximally sparse Ω- Let S ⊂ [m] be an Ω-valid set, and let λ ∈ Fn with supp (λ) 1 Ω
independent vector (i.e., for any Ω-independent vector u, zeros (u) ≤ s.t. λT U:, S = 0. Since col (U (:, E (S))) = col (U (:, S)), all columns
zeros (v)). From Lemma 3.6, we know that M is Ω-valid. Lemma 3.7 indexed by E (S) are linear combinations of the columns indexed by
shows that if there exists an Ω-valid set S s.t. |M | < |S |, then S. Since λ is orthogonal to all columns of U indexed by S, it is also
there also exists an Ω-independent vector u ∈ Fm s.t. zeros (u) ≥ orthogonal to all their linear combinations. Therefore, λT U:, E(S ) =
|S | > zeros (v). This contradicts v being a maximally sparse Ω- 0. Hence E (S) is valid.
independent
vector. Next we show that the search for a maximal Ω-valid set can be
Now, assume that M is an Ω-valid set of maximal cardinality (i.e., reduced to the search over maximal extensions of sets of size n − 1.
for any Ω-valid set S, |S | ≤ |M |) and let λ M be an Ω-validator of
M. By Corollary 3.9, v M = λTM U is an Ω-independent vector with Remark 3.14. Note that rank U:, S ≤ n − 1 for any Ω-valid set

exactly |M | zero entries. Assume by contradiction that exists u ∈ S. This is due to the fact that if rank U:, S = n then λT U:, S = 0

Fm with z > |M | zero entries, then by Lemma 3.6, there is an Ω- implies that λ = 0 since the rows of U are linearly independent.
valid set S s.t. |M | < |S |, in contradiction to M being an Ω-valid set
Lemma 3.15. Let S be an Ω-valid set and let λ ∈ Fn be an Ω-
of maximal cardinality. Therefore, v M = λ M U is a maximally sparse
validator of S. Then
Ω-independent vector.
E (S) ⊂ i : λT U = 0
n o
The following claim shows that given an Ω-valid set, S, and i
its corresponding Ω-independent vector v (as in Lemma 3.7), the
support of the Ω-validator of S can be used to find an index i s.t.
n o
Proof. Let D = i : λT U = 0 . By Definition 3.12, columns
the i’th row of U can be replaced with v without changing the row i
indexed by E (S) are linear combinations of the columns indexed
span of U .
by S and λ is orthogonal to all columns of U:, S (and their linear
Claim 3.11. Let S be an Ω-valid set, let λ be an Ω-validator of S, combinations). Hence, λT U:, E(S ) = 0 and E (S) ⊂ D.
and let v = λT U . Then for any i ∈ supp (λ) \ Ω, replacing row i of U
with v does not the change row span of U . That is: Lemma 3.16. Let S be an Ω-valid set s.t. rank U:, S = n − 1, and

let D be an Ω-valid set s.t. S ⊂ D. Then D ⊂ E (S).

span (rows (U )) = span rows U[n]\{i },: ∪ {v}
Proof. Since S ⊂ D, n − 1 = rank U:, S ≤ rank
U:, D . How-

Proof. Fix i 0 ∈ supp (λ) \ Ω. Since
v is a linear combination
we know that rank
U:, D ≤ n − 1, there-
ever, from Remark 3.14,
of rows of U and λi 0 , 0, u ∈ span rows U[n]\{i 0 },: ∪ {v} , for
fore, span col U:, S = span col U:, D . Hence, by definition,
any u ∈ span (rows (U )). Now, let α ∈ Fn be the α j = −λ
vector j D ⊂ E (S).
(for j , i 0 ) and α i 0 = 0. Then w = α T U ∈ span rows U[n]\{i 0 },: ,
Corollary 3.17. Let S be an Ω-valid set s.t. rank U:, S = n − 1,

therefore w +v = λi 0 Ui 0,: ∈ span rows U[n]\{i 0 },: ∪ {v} . Hence, n
and let λ ∈ F be an Ω-validator of S. Then
span (rows )) =
(U
E (S) = i : λT U = 0
n o
span rows U[n]\{i },: ∪ {v} . i

Therefore, any algorithm which finds an Ω-valid set of maximal Proof. This is a direct result of Lemma 3.15 and Lemma 3.16.
cardinality is an oracle for Algorithm 2.
Note that Corollary 3.17 gives us the tools to quicklycompute
3.2 Computing maximal Ω-valid sets. the extension of any Ω-valid set S such that rank U:, S = n − 1.
Given a maximal Ω-valid set, we now have the tools to compute Next we prove that any maximal Ω-valid set is an extension of an
optimally sparse Ω-independent vectors. As the next stage, we show Ω-valid set of n − 1 linearly independent columns of U :
how to compute a maximal Ω-valid set M using a small subset of
Claim 3.18. Let S ⊂ [m] be a maximal Ω-valid set, then
columns S ⊂ M. The key intuition here is that if λ ∈ Fn is an
Ω-validator of S, then λ is orthogonal to all columns indexed by S rank U:, S = n − 1

5
Proof. Let S ⊂ [m] be a maximal Ω-valid set, and let i 0 < Ω such Algorithm 3 Sparsest Independent Vector (1)
that Ui 0, S ∈ span rows U[n]\i 0, S (such i 0 exists by definition of 1: procedure SIV (U , Ω)
an Ω-valid set). Suppose, by contradiction, that rank U:, S = n − r
2: sparsity ← 0
for some r > 1. 3: sparsest ← null
Note that since Ui 0, S is in the row span U[n]\i 0, S , rank U:, S =
4: i ← null
5: for C ∈ Cn−1 ({1, ..., m})
rank U[n]\i 0, S = n − r . Therefore, exists S 0 ⊂ S s.t. |S | = n − r 6: if rank U:,C < n − 1 or C is not Ω-valid
and rank U:, S 0 = n − r .

7: continue
Let Q ⊂ [m]\S s.t. |Q | = r −1, rank U[n]\i 0, Q = r −1, and each 8: λ ← Ω-validator of C
column indexed by Q is not in the column span of U[n]\i 0, S . Such 9: v ← λT U
Q exists because the matrix U[n]\{i 0 }, : has full rank n − 1 (since U 10: E ← {i : vi = 0}
is of full row rank n). 11: if |E| > sparsity
Since the matrix U[n]\{i 0 }, S 0 ∪Q is a square n − 1 ×n − 1 matrix of 12: sparsity ← |E|
13: sparsest ← v
full rank, Ui 0,S 0 ∪Q is in the span of row U[n]\{i 0 }, S 0 ∪Q . Therefore, 14: i ← any element of supp (λ) \ Ω
S 0 ∪ Q is an Ω-valid set. return (v, i)
By Lemma 3.13, the extension of S 0 ∪Q is also valid. Furthermore,
S ∪Q ⊂ E (S 0 ∪ Q) because we have chosen S 0 s.t. it spans the same
column space as S. However, by construction of Q, we know that Theorem 3.22. Algorithm 3 produces an optimal solution to SIV,
S ∩ Q = ∅, meaning that |E (S 0 ∪ Q)| ≥ |S ∪ Q | > |S |. This in and is an oracle for Algorithm 2.
contradiction to maximality of S. Proof. By Lemma 3.21, Algorithm 3 iterates over all maximal Ω-
valid sets. Lines 14-17 check whether a given Ω-valid set has greater
Corollary 3.19. Let S ⊂ [m] be a maximal Ω-valid set and let
cardinality than any previously found maximal Ω-valid set and if it
C ⊂ S s.t. rank U:, C = n − 1. Then S = E (C).
does, the algorithm choose this set as a working solution. Hence, at
the end of the algorithm, the chosen vector v correlates to a maximal
Proof. C ⊂ S, Therefore, span col U:, C ⊂ span col U:, S .

cardinality Ω-valid set. By Claim 3.10, v is an optimal solution to SIV
Because S is maximal, Claim 3.18 shows that rank
U:, S = n−1. We (a maximally sparse Ω-independent vector) if, and only if, the set
have, by rank equality, that span col U:, C = span col U:,S .

E = {i : vi = 0} is an Ω-valid set of maximal cardinality. Therefore,
By definition, E (C) is the maximal set E s.t. span col U:, C ⊂
the vector chosen at the end of the algorithm is a maximally sparse
span col U:, E , therefore, S ⊂ E (C). However, by maximality of
Ω-independent vector. Finally, by Claim 3.11, the pair (v, i) serves
S, we have S = E (C).
as the oracle for SIV required by Algorithm 2.
Corollary 3.20. Let S ⊂ [m] be a maximal Ω-valid set, then
exist C ∈ Cn−1 ([m]) s.t. S = E (C).
3.4 Implementation of our first optimal
algorithm.
Proof. This is a direct result of Corollary 3.19 In order for Algorithm 3 to perform well, we have added a blacklist
to the algorithm’s operation. Since the maximal Ω-valid sets are
3.3 First algorithm for SIV. generated by computing the extension (Definition 3.12) of n − 1
Our first algorithm performs an exhaustive search over all maximal independent columns, once a given Ω-valid set is found, we wish
Ω-valid sets in order find one with maximal cardinality. This is a to blacklist all of its subsets of size n − 1 since we need not revisit
result of the observation given by Claim 3.10, which states that any that extension. However, in addition to memory costs, looking
solution to SIV is tied to an Ω-valid set of maximal cardinality (and up an element in the blacklist incurs a significant overhead as the
vice versa). The search is done using by combining Corollary 3.20, blacklist grows. To address this problem, rather than storing all
which states that any maximal Ω-valid set is the extension of an subsets Cn−1 (S) of a given set S, we store S itself in the blacklist,
Ω-valid set of n − 1 independent columns, and Corollary 3.17, which in which case C is not blacklisted if ∀B ∈ blacklist C 1 B. Despite
provides a method to compute said extension. this measure, in some cases the blacklist still grew too large, so we
and imposed a limit on the maximum size of the blacklist, storing
Lemma 3.21. Algorithm 3 iterates over all maximal Ω-valid sets. only the M largest sets found so far.

Proof. By Corollary 3.20, for any maximal Ω-valid set E, there 3.5 Second algorithm for SIV.
exist an Ω-valid set C ∈ Cn−1 ([m]) s.t. rank U:, C = n − 1 and
While our first algorithm performs well in many cases, we have
E is the extension of C. Therefore, the algorithm iterates over all
found that it performs poorly when the largest Ω-valid set is very
Ω-valid sets C ∈ Cn−1 ([m]) s.t. rank U:, C = n − 1. Furthermore,

large. In such cases the algorithm quickly finds the correct solution,
by Corollary 3.17,nif rank U:, C =o n − 1 and λ is an Ω-validator of
but then continues its exhaustive search for a very long time. Our
C then E (C) = i : λT U = 0 . The algorithm performs this second algorithm is slightly simpler and avoids this inefficiency
i
computation at lines 11-13. Hence, the algorithm iterates over all by using a top-down approach, searching for Ω-valid sets in de-
Ω-valid sets. scending order of cardinality to find an Ω-valid set of maximal
6
cardinality. Just like our first algorithm, it relies on the observation Algorithm 5 Row basis sparsification [22, 23]
of Claim 3.10, which ties any solution of SIV (maximally sparse, 1: procedure KS-Sparsification(U)
Ω-independent vector) to an Ω-valid set of maximal cardinality. 2: sparsity ← nnz (U )
3: basis ← In
Algorithm 4 Sparsest Independent Vector (2) 4: for C ∈ Cm (n)
5: if U:, C is of full rank
procedure SIV (U , Ω)
sparsifier ← U:,−1C
1:
6:
2: for z = m − 1, ..., n − 1
7: if nnz (sparsi f ier · U ) < sparsity
3: for C ∈ Cz ([m])
8: sparsity ← nnz (sparsi f ier · U )
4: if rank U:,C = n − 1 and C is Ω-valid
9: basis ← sparsifier
5: λ ← Ω-validator of C
6: v ← λT U 10: return basis
7: i ← any element of supp (λ) \ Ω
8: return (v, i) 4.2 Greedy sparsification.
A second heuristic for matrix sparsification, inspired by Gottlieb
To prove the correctness of our Algorithm 4, we use the following and Neylon’s algorithm (Algorithm 2), employs an even simpler
lemma, which provides bounds on the size of a maximal Ω-valid set. greedy approach.
Recall that for a given n × m matrix U (n ≤ m), we seek an n × n
Lemma 3.23. Let S ⊂ [m] be a maximal Ω-valid set, then n − 1 ≤ matrix A which minimizes nnz (AU ) + nns (AU ). For this purpose,
|S | ≤ m − 1. rather than searching for the entire invertible matrix A achieving
this objective, we could instead search for each row of A individually.
Proof. First, we show that |S | < m. Assume, by contradiction, Concretely, we iteratively compose the matrix A row-wise; where
that |S | = m and let λ ∈ Fn be an Ω-validator of S. Then λT U = 0, at each step i, we obtain the sparsest row vector vi such that vi is
which means that i ∈[n] λi Ui,: = 0, in contradiction to U having independent of {v 1 , . . . , vi−1 } and minimizes nnz (vU ) + nns (vU ).
Í
full row rank n. Hence, |S | ≤ m − 1. This yields the following algorithm:
Next, by Claim 3.18, since S is a maximal Ω-valid set, its rank
isn − 1, therefore, n − 1 ≤ |S |. Hence n − 1 ≤ |S | ≤ m − 1. Algorithm 6 Greedy Sparsification
1: procedure Greedy − Sparsi f ication(U )
Theorem 3.24. Algorithm 3 produces an optimal solution to SIV, 2: A←∅
and is an oracle for Algorithm 2. 3: for i = 1, . . . , n
v← nnz v T U + nns v T U

argmin
Fm
4:
Proof. Claim 3.10 states that v ∈ is a solution to SIV (an op- v ∈Fm
r k ({v 1 , . . .,v i −1 ,v })=i
timally sparse, Ω-independent vector) if and only if S = {i : vi = 0}
is Ω-valid. The algorithm iterates all subsets of [m] in descending 5: Ai,: ← vT
order of cardinality. Therefore, the first Ω-valid set found is an 6: return A
Ω-valid set of maximal cardinality. Furthermore, Lemma 3.23 states
that any maximal Ω-valid set is of size n − 1 ≤ z ≤ m − 1, hence, the In order to implement the subroutine for finding each row vec-
algorithm iterates all candidates S ⊂ [m] that could be Ω-valid sets tor vi , we encoded the objective as a MaxSAT instance and used
of maximal cardinality. Therefore, Algorithm 4 returns a sparsest Z3 [14], an SMT Theorem Prover, to find the optimal solution. Our
Ω-independent vector. Finally, by Claim 3.11, the pair (v, i) serves MaxSAT instance employs two types of “soft” constraints: one
as the oracle for SIV required by Algorithm 2. which penalizes non-zero entries, and another which penalizes
non-singleton entries. Therefore, optimal solutions will minimize
4 ADDITIONAL SPARSIFICATION METHODS the sum of non-zero and non-singleton entries, thereby minimizing
the associated arithmetic complexity (Remark 2.4).
4.1 Sparsification via subset of rows.
This algorithm, while not proven to be optimal, has the advan-
The alternative bases presented in Karstadt and Schwartz’s [22, 23] tage of considering both non-zeros and non-singletons, and can
paper were found using a straightforward heuristic of iterating over therefore produce decompositions resulting in a lower arithmetic
all sets of n linearly independent rows of an n ×m matrix of full rank complexity than the optimal algorithms (Algorithms 3, 4). For a
(where n ≥ m). This heuristic was based on the observation that summary of these results, see Table 2.
using the columns of the original matrix for sparsification ensures
that the sparsified matrix contains n rows, each with only a single 5 APPLICATION AND RESULTING
non-zero entry. ALGORITHMS
While this method is inefficient, requiring ( m
n ) passes, it finds
sparsifications which significantly improve the leading coefficients Table 2 contains a list of alternative basis algorithms found using
of multiple algorithms. The refinement of this method led to the our new methods. All of the algorithms used were taken from
development of Algorithm 3. It is therefore presented here for the repository of Ballard and Benson [2]3 . The alternative basis
completeness. 3 The algorithms can be found at github.com/arbenson/fast-matmul
7
Table 2: Alternative Basis Algorithms

Arithmetic Operations Leading Coefficient

Algorithm Leading Monomial Original Here Original Here Improvement
h2, 2, 2; 7i [38] n log2 7 ≈ n 2.80735 18 12 7 5 28.57%
3
h3, 2, 2; 11i [2] n log12 11 ≈ n 2.89495 22 18 5.06 4.26 15.82%
3
h2, 3, 2; 11i [40] n log12 11 ≈ n 2.89495 22 18 4.71 3.91 16.97%
3
h4, 2, 2; 14i [2] n log16 14 ≈ n 2.85551 48 28 8.33 5.27 36.8%
3
h3, 2, 3; 15i [18] n log18 15 ≈ n 2.81076 55 39 8.28 6.17 25.5%
3
h3, 2, 3; 15i [2] n log18 15 ≈ n 2.81076 64 39 9.61 6.17 35.84%
3
h5, 2, 2; 18i [2] n log20 18 ≈ n 2.89449 53 32 6.98 4.46 36.06%
3
h4, 2, 3; 20i [35] n log24 20 ≈ n 2.82789 78 51 8.9 5.88 33.96%
3
h4, 2, 3; 20i [2] n log24 20 ≈ n 2.82789 82 51 9.19 5.88 36.01%
3
h4, 2, 3; 20i [2] n log24 20 ≈ n 2.82789 86 54 9.38 6.12 34.77%
3
h4, 2, 3; 20i [2] n log24 20 ≈ n 2.82789 104 56 11.38 6.38 43.9%
3
h2, 3, 4; 20i [2] n log24 20 ≈ n 2.82789 96 58 9.96 6.12 38.59%
h3, 3, 3; 23i [2] n log3 23 ≈ n 2.85404 87 66 7.21 5.71 20.79%
h3, 3, 3; 23i [2] n log3 23 ≈ n 2.85404 88 65 7.29 5.64 22.55%
h3, 3, 3; 23i [2] n log3 23 ≈ n 2.85404 89 65 7.36 5.64 23.3%
h3, 3, 3; 23i [2] n log3 23 ≈ n 2.85404 97 61 7.93 5.36 32.43%
h3, 3, 3; 23i [2] n log3 23 ≈ n 2.85404 166 73 12.86 6.21 51.67%
h3, 3, 3; 23i [26] n log3 23 ≈ n 2.85404 98 74 8 6.29 21.43%
h3, 3, 3; 23i [35] n log3 23 ≈ n 2.85404 84 68 7 5.86 16.33%
3
h4, 4, 2; 26i [2] n log32 26 ≈ n 2.82026 235 105 (?) 18.1 7.81 56.84%
3
h4, 3, 3; 29i [2] n log36 29 ≈ n 2.81898 164 102 10.27 6.73 34.49%
3
h3, 4, 3; 29i [36] n log36 29 ≈ n 2.81898 137 109 8.54 6.96 18.46%
3
h3, 4, 3; 29i [2] n log36 29 ≈ n 2.81898 167 105 10.27 6.73 34.49%
3
h3, 5, 3; 36i [36] n log45 36 ≈ n 2.82414 199 139 9.62 6.87 28.6%
3
h6, 3, 3; 40i [35] n log54 40 ≈ n 2.77429 1246 190 (?) 55.63 8.9 84.01%
3
h3, 3, 6; 40i [41] n log54 40 ≈ n 2.77429 1822 190 (?) 79.28 8.9 88.78%
(?) Denotes algorithms with non-singular values, where the result of Algorithm 6 was better than those of the exhaustive algorithms.

algorithms obtained represent a significant improvement over the this reason, the decomposition of some of the larger instances re-
original versions, with the reduction in the leading coefficient rang- quired the use of Mira supercomputer. However after some tuning
ing between 15% and 88%. Almost all of the results were found of Algorithms 3 and 4 (see Section 3.4) and the implementation
using our exhaustive methods (Algorithms 3 and 4). In certain cases of Algorithm 6 using Z3, all decompositions completed on a PC
(marked (?)), where the U , V , W matrices contain non-singular within a reasonable time. Specifically, all runs of Algorithms 3 and
values, our search heuristic’s (Algorithm 6) result exceeded those 4 completed within 40 minutes, while Algorithm 6 took less than
of our exhaustive algorithms. For example, bases obtained for the one minute, on a PC4 . It should be remembered that Algorithms 3
h4, 4, 2; 26i-algorithm by Algorithms 3 and 4 reduced the number of and 4 guarantee optimal sparsification, while Algorithm 6 has no
arithmetic operations from 235 to 110, while Algorithm 6 reduced such guarantee. However, in all cases, Algorithm 6 ran much faster
the number of arithmetic operations even further, to 105. and produced an equally good decomposition, with better results
when there were non-singular values.

Comparison of different search methods. The exhaustive algo- 6 DISCUSSION AND FUTURE WORK
rithms (Algorithms 3, 4) solve the SIV problem. Their proof of
We have improved the leading coefficient of several fast matrix
correctness, coupled with that of Gottlieb and Neylon’s algorithm,
multiplication algorithms by introducing new methods to solve to
guarantee that they obtain decompositions minimizing the number
of non-zero entries. As MS and SIV are both NP-Hard problems,
these algorithms exhibit an exponential worst-case complexity. For 4 Matebook X (i7-7500U CPU and 8GB RAM)
8
sparsify the encoding/decoding matrices of fast matrix multipli- [6] Nader H Bshouty. 1995. On the additive complexity of 2×2 matrix multiplication.
cation algorithms. The number of arithmetic operations depends Information processing letters 56, 6 (1995), 329–335.
[7] Murat Cenk and M Anwar Hasan. 2017. On the arithmetic complexity of Strassen-
on both both non-zero and non-singular entries. This means that like matrix multiplications. Journal of Symbolic Computation 80 (2017), 484–501.
in order to minimize the arithmetic complexity, the sum of both [8] S Frank Chang and S Thomas McCormick. 1992. A hierarchical algorithm for
making sparse matrices sparser. Mathematical Programming 56, 1 (1992), 1–30.
non-zero and non-singular entries should be minimized, otherwise [9] Henry Cohn and Christopher Umans. 2003. A group-theoretic approach to fast
an optimal sparsification may result in a 2-approximation of the matrix multiplication. In Foundations of Computer Science, 2003. Proceedings. 44th
minimal number of arithmetic operations when matrix entries are Annual IEEE Symposium on. IEEE, 438–449.
[10] Don Coppersmith and Shmuel Winograd. 1982. On the asymptotic complexity
not limited to 0, ±1. Further work is required in order to find a of matrix multiplication. SIAM J. Comput. 11, 3 (1982), 472–492.
provably optimal algorithm which minimizes both non-zero and [11] Don Coppersmith and Shmuel Winograd. 1990. Matrix multiplication via arith-
non-singleton values. metic progressions. Journal of symbolic computation 9, 3 (1990), 251–280.
[12] Hans F de Groote. 1978. On varieties of optimal algorithms for the computation
We attempted sparsification of additional algorithms for larger of bilinear mappings I. the isotropy group of a bilinear mapping. Theoretical
dimensions (e.g., Pan’s h44, 44, 44; 36133i-algorithm [31], which is Computer Science 7, 1 (1978), 1–24.
[13] Hans F de Groote. 1978. On varieties of optimal algorithms for the computa-
asymptotically faster than those presented here). However, the tion of bilinear mappings II. Optimal algorithms for 2×2-matrix multiplication.
size of the base case of these algorithms led to prohibitively long Theoretical Computer Science 7, 2 (1978), 127–148.
runtimes. [14] Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In
International conference on Tools and Algorithms for the Construction and Analysis
The methods presented in this paper apply to finding square of Systems. Springer, 337–340.
invertible matrices solving the MS problem. Other classes of sparse [15] Lee-Ad Gottlieb and Tyler Neylon. 2010. Matrix sparsification and the sparse
decompositions exist which do not fall within this category. For ex- null space problem. In Approximation, Randomization, and Combinatorial Opti-
mization. Algorithms and Techniques. Springer, 205–218.
ample, Beniamini and Schwartz’s [1] decomposed recursive-bilinear [16] Vince Grolmusz. 2008. Modular representations of polynomials: Hyperdense
framework relies upon decompositions in which the sparsifying coding and fast matrix multiplication. IEEE Transactions on Information Theory
54, 8 (2008), 3687–3692.
matrix may be rectangular, rather than square. Some of the leading [17] Alan J Hoffman and ST McCormick. 1984. A fast algorithm that makes matrices
coefficients in [1] are better than those presented here. For example, optimally sparse. Progress in Combinatorial Optimization (1984), 185–196.
they obtained a leading coeffcient of 2 for a h3, 3, 3; 23i-algorithm [18] John E Hopcroft and Leslie R Kerr. 1971. On minimizing the number of multi-
plications necessary for matrix multiplication. SIAM J. Appl. Math. 20, 1 (1971),
of [2] a h4, 3, 3; 29i-algorithm of [36], compared to our values 5.36 30–36.
and 6.96 respectively. However, the arithmetic overhead of basis [19] John E Hopcroft and Jean Musinski. 1973. Duality applied to the complexity of
transformation in Karstadt and Schwartz [22, 23] (and therefore matrix multiplications and other bilinear forms. In Proceedings of the fifth annual
ACM symposium on Theory of computing. ACM, 73–87.
here as well) is O n 2 log n , whereas in [1] it may be larger. Note [20] Rodney W Johnson and Aileen M McLoughlin. 1986. Noncommutative Bilinear
also that the decomposition heuristic of [1] does not always guaran- Algorithms for 3×3 Matrix Multiplication. SIAM J. Comput. 15, 2 (1986), 595–603.
[21] Igor Kaporin. 1999. A practical algorithm for faster matrix multiplication. Nu-
tee optimality. Further work is required to find new decomposition merical linear algebra with applications 6, 8 (1999), 687–700.
methods for such settings. [22] Elaye Karstadt and Oded Schwartz. 2017. Matrix multiplication, a little faster.
In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and
Architectures. ACM, 101–110.
7 ACKNOWLEDGEMENTS [23] Elaye Karstadt and Oded Schwartz. 2020. Matrix multiplication, a little faster.
We thank Austin R. Benson for providing details regarding the Journal of the ACM (JACM) 67, 1 (2020), 1–31.
[24] Donald E Knuth. 1981. The Art of Computer Programming, Volume 2: Seminu-
h2, 3, 2; 11i-algorithm. This research used resources of the Argonne merical Algorithms, Addison-Wesley. Reading, MA (1981).
Leadership Computing Facility, which is a DOE Office of Science [25] Julian Laderman, Victor Y Pan, and Xuan-He Sha. 1992. On practical algorithms
User Facility supported under Contract DE-AC02-06CH11357. This for accelerated matrix multiplication. Linear Algebra and Its Applications 162
(1992), 557–588.
work was supported by the PetaCloud industry-academia consor- [26] Julian D Laderman. 1976. A noncommutative algorithm for multiplying 3×3
tium. This research was supported by a grant from the United matrices using 23 multiplications. In Am. Math. Soc, Vol. 82. 126–128.
[27] François Le Gall. 2014. Powers of tensors and fast matrix multiplication. In
States-Israel Bi-national Science Foundation, Jerusalem, Israel. This Proceedings of the 39th international symposium on symbolic and algebraic com-
work was supported by the HUJI Cyber Security Research Center putation. ACM, 296–303.
in conjunction with the Israel National Cyber Bureau in the Prime [28] S Thomas McCormick. 1983. A Combinatorial Approach to Some Sparse Matrix
Problems. Technical Report. Stanford university CA systems optimization lab.
Minister’s Office. This project has received funding from the Euro- [29] S Thomas McCormick. 1990. Making sparse matrices sparser: Computational
pean Research Council (ERC) under the European Union’s Horizon results. Mathematical Programming 49, 1-3 (1990), 91–111.
2020 research and innovation programme (grant agreement No [30] Victor Y Pan. 1978. Strassen’s algorithm is not optimal trilinear technique of
aggregating, uniting and canceling for constructing fast algorithms for matrix
818252). operations. In Foundations of Computer Science, 1978., 19th Annual Symposium
on. IEEE, 166–176.
REFERENCES [31] Victor Y Pan. 1982. Trilinear aggregating with implicit canceling for a new
acceleration of matrix multiplication. Computers & Mathematics with Applications
[1] Gal Beniamini and Oded Schwartz. 2019. Faster Matrix Multiplication via Sparse 8, 1 (1982), 23–34.
Decomposition. In Proceedings of the 31st ACM Symposium on Parallelism in [32] Robert L Probert. 1976. On the additive complexity of matrix multiplication.
Algorithms and Architectures. ACM, 11–22. SIAM J. Comput. 5, 2 (1976), 187–203.
[2] Austin R Benson and Grey Ballard. 2015. A framework for practical parallel fast [33] Francesco Romani. 1982. Some properties of disjoint sums of tensors related to
matrix multiplication. ACM SIGPLAN Notices 50, 8 (2015), 42–53. matrix multiplication. SIAM J. Comput. 11, 2 (1982), 263–267.
[3] Dario Bini, Milvio Capovani, Francesco Romani, and Grazia Lotti. 1979. O(n 2.7799 ) [34] Arnold Schönhage. 1981. Partial and total matrix multiplication. SIAM J. Comput.
complexity for n×n approximate matrix multiplication. Information processing 10, 3 (1981), 434–455.
letters 8, 5 (1979), 234–235. [35] Alexey V Smirnov. 2013. The bilinear complexity and practical algorithms for
[4] Marco Bodrato. 2010. A Strassen-like matrix multiplication suited for squaring matrix multiplication. Computational Mathematics and Mathematical Physics 53,
and higher power computation. In Proceedings of the 2010 International Sympo- 12 (2013), 1781–1795.
sium on Symbolic and Algebraic Computation. ACM, 273–280. [36] Alexey V Smirnov. 2017. Several bilinear algorithms for matrix multiplication.
[5] Richard P Brent. 1970. Algorithms for matrix multiplication. Technical Report. Technical Report.
Stanford university CA department of computer science.

9
[37] Andrew James Stothers. 2010. On the complexity of matrix multiplication. Thesis [41] Petr Tichavskỳ, Anh-Huy Phan, and Andrzej Cichocki. 2017. Numerical CP
(2010). decomposition of some difficult tensors. J. Comput. Appl. Math. 317 (2017),
[38] Volker Strassen. 1969. Gaussian elimination is not optimal. Numerische mathe- 362–370.
matik 13, 4 (1969), 354–356. [42] Virginia V Williams. 2012. Multiplying matrices faster than Coppersmith-
[39] Volker Strassen. 1986. The asymptotic spectrum of tensors and the exponent Winograd. In Proceedings of the forty-fourth annual ACM symposium on Theory
of matrix multiplication. In Foundations of Computer Science, 1986., 27th Annual of computing. ACM, 887–898.
Symposium on. IEEE, 49–54. [43] Shmuel Winograd. 1971. On multiplication of 2×2 matrices. Linear algebra and
[40] Petr Tichavskỳ and Teodor Kováč. 2015. Private communication with Ballard its applications 4, 4 (1971), 381–388.
and Benson, see [2] for benchmarking. (2015).

10
A SAMPLES OF ALTERNATIVE BASIS Table 5: h3, 3, 3; 23i-algorithm [35]
ALGORITHMS
Uϕ Vψ Wυ
0 0 0 0 0 0 0 0 1 −1 0 0 0 0 0 0 −1 0 0 −1 0 0 −1 0 0 1 0
In this section we present the encoding/decoding matrices of the 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 −1 0 0 0 0 0 1 0 0 0 −1 0
0 1 0 0 −1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 −1 0 0
alternative basis algorithms listed in Table 2. To verify the correct- 1 0 0 0 0 0 −1 0 0 0 0 −1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 −1 0 0
ness of these algorithms, recall Corollary 2.11 and use the following 0
−1
0
0
0
0
−1
0
0
0
1
0
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
0
0
−1
0
−1
0
0
1
0
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
0
1
−1 0
1

fact: 0 0 −1 0 −1 0 −1 0 1 0 0 0 0 0 0 0 1 0 0 −1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 −1 0 0 −1 1 0 0 0 0 0
−1 −1
Fact A.1. (Triple product condition). [5, 24] Let R be a ring, and
0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0
0 0 −1 0 0 0 0 0 −1 0 0 0 0 0 −1 0 0 0 0 1 0 −1 0 −1 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1

let U ∈ R t ×n ·m , V ∈ R t ×m ·k , W ∈ R t ×n ·k . Then hU , V , W i are 0

0
0
0
0
−1
−1
0
0
−1
0
0
0
0
0
0 0
1 −1
0
0
0
0
−1
1
0
0
0
−1
0
0
0
0
1
0
−1
0
0
0
0
0
0
0
0
1
0
1
1
0
0
0
0
0
0

encoding/decoding matrices of an hn, m, k; ti-algorithm if and only

0 0 0 0 0 0 −1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 −1 0 0 0 0
0 0 0 0 0 1 0 −1 0 0 0 0 1 −1 0 0 0 0 1 0 0 0 0 0 0 0 0
0 −1 0 0 0 1 0 0 0 0 0 0 0 −1 1 1 0 0 −1 0 0 0 0 0 0 0 1
if: 0 0 0 1 0 −1 0 1 1 1 0 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 1 0 0
−1 −1
∀i 1 , k 1 ∈ [n] , j 1 , i 2 ∈ [m] , j 2 , k 2 ∈ [k]
0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 −1
−1 0 0 0 0 0 0 −1 0 0 −1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
−1 −1 −1 −1
t
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Õ ϕ ψ υ −T

Ur,(i 1,i 2 )Vr,(j1, j2 )Wr,(k1,k2 ) = δi 1,k1 δi 2, j1 δ j2,k2

0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 −1 0 0 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 −1 0 1
0 0 0 0 0 0 0 1 0 0 −1 −1 0 0 0 0 0 0 0 0 0 0 −1 1 0 0 0
r =1 0
−1
0
0
−1
0
0
0
0
0
0
0
0
−1
0
−1
0
0
0
0
0
0
0
0
0
0
0
1
0
1
0
0
0
−1
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
−1

where δi, j = 1 if i = j and 0 otherwise. 0

1
0
0
−1
0
0
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
−1
1
0
0
0
0
0
0
0
−1
0
0
0
0
0
1
0
−1
0
0
0
0
0
0 0
0 0
0
1
1
0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0

A.1 A sample of Algorithms Table 6: h6, 3, 3; 40i-algorithm [35]

Table 3: h3, 2, 3; 15i-algorithm [2]
Uϕ Vψ Wυ
0 0 0 0 0 0 −1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 −1 0 −1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 −1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 −1 −1 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
−1 −1 −1
Uϕ Vψ Wυ
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 −1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 −1 0 −1 −1 0 0 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 −1
−1 −1 −1
0 −1 0 1 0 1 1 0 0 0 0 −1 0 0 1 0 0 0 0 0 0 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
−1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
1
0 0
0
0 0
0
−1
0
−1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
−1 0
0 0
0 0
0
0
0
0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 1 −1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 −1 0 −1 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 1 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
0 0 1 1 0 0 0 0 −1 0 1 0 −1 0 0 0 0 0 −1 0 0 0
0
0
0
0
0
0
0
0
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
1
0
1
0
0
−1 0
0 −1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
−1
0 0 0 0 −1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 −1 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
−1 0
0 0
0
0
0
0
0
0
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
0
0
1
0
0
1
0
−1
−1
0
−1
0
−1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0 0 0 0 0 −1 0 0 0 0 −1 0 0 0 0 0 0 0 −1 0 0 0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
−1
0
0
0
0
0
−1
−1
0
0
0
0
0
0
0
0
0
−1
0
0
1
−1
0
0
0
0 0
0
1
0 −1
1
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
−1
−1
−1
−1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
−1 −1 −1
1 0 0 0 −1 0 0 0 −1 0 0 0 0 0 0 0 0 −1 0 0 0 0
0
0
1 0
0
0
0
0
0
0
0
1
0
0
1
0 −1
0 0
0
0
1
0
−1
0
0
0
0
1
0
0
0
0
0
0
0
0
−1 0 0
0
−1
1
0
0
−1
1
0 0
0 0
0
0
0
1
0
0
−1 0
0
0
0
0
0 0
0
1
0
0
0
0
0
0
0 0
0
0
0 0
0
0
−1
0
0
0
0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 −1 0 0 −1 0 0 1 −1 0 −1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 −1 0 0 1 0 0 −1 0 −1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 −1 0 1 0 0 0 0 0 1 0 0 0 0 0 −1 0 0 0 0 0 −1 0 0 0 −1 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 −1 −1 0 1 0 0 0 0 0 0 1 0 0 −1 0 0 −1 −1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 −1 0 0 0 −1 0
0 −1 0 0 0 0 1 0 0 −1 0 0 0 0 0 0 0 −1 −1 0 −1 0 1 0 0 0 0 1 0 0 0 0 0 −1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 −1 0 0 0 0 0 0 0 0 −1 0 1 −1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 −1 0 0 −1 0 0 1 0 1 0 −1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 1 0 −1 1 0 0 1 0 0 0 0 −1 0 0
1
0
0
−1
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
−1
1
0
0
0
0
0
0
0
0
0
−1 0
0 0
1
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
−1
1
0
−1
0
0
0
0
0
0
0
0
1
0
0
0
0
−1
−1 0 0 0 1 −1 0 −1 0 0 1 0 0 −1 0 0 0 0 −1 0 0 0
0
0
0
−1
0
0
0
0
−1 0
0
0
1
1
0 0
0
0
0
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
0
0
−1
−1
0
0
0
0
−1
0
0
−1
1
0
0
0
0
0 1
0 0
0
0
0
0
0
0
0
0
0
−1
1
1
0
0
0
0
0
0
0 0
0 0
1
0
−1
0
1
0
0
0
0
1
0
0
0

0 0 −1 0 0 0 1 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0
0
0
0
0
0
0
0
1
0
0
−1 0
0
0
0 0
0
0
0
0
0
−1
0 0
0 0
1
0
0
0
0
0
0
0
0
−1
1
0
0
0
0
0
0
0
0 0
0 1
0
0
0
0
0
0
1
0
0
0
0
0
0
−1
0
0
0
0
0
1
0
0
0
0
0 0
0 0
0
0
0
0
0
0
0
0
0
0
1
−1
−1
−1 −1 −1
1 0 0 0 0 0 0 −1 0 0 0 1 0 0 0 0 −1 0 0 1 −1 0
0
0
0
0
−1
0
0
0
0
1
0
0
0
1
0
0
0
0
0 1
0
0
0
0
0
−1
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0 0
0
0
0
1
0
0
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
0
0
0
0
−1
0
0
0
0 0
0 1
0 −1
0
0
0
0
0
0
0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 −1 0 0 0 0 0 0 −1
0 0 −1 −1 0 −1 1 0 0 −1 −1 0 0 −1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 −1 0 0 0 0 0 0 0 0 −1 0 1 0 0 0 0 1 0 0 0 0 0 0 −1 0 0 1 0 0 0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0 −1 1 0 0 0 0 0 0 0 0 0 0 −1 0 −1 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 1 0 −1 0 0 0 −1 −1 0 0 0 0 0 1 0 0 0 0 −1 0
0
0
0
0
0
0
0
−1
0
0
−1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
−1
0
0
0
1
0
0
−1
0
0
0
−1
−1
0
0
0
0
0
0
0
0
0
0
0
−1
0
0
0
0
0
−1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
−1
1
0 0
0 0
−1 0
0 0
0
0
0
0
0
0 1 0 0 0 −1 0 −1 0 0 0 0 −1 0 0 0 0 0 0 1 0 0
1
0
0
0
0
0
0
0
0
1
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
−1
0
0
1
−1
−1
−1
0
0
0
0
0
−1
0
0
1
0
1
0
0
−1
0
0
0
0
0
0
0
0
0
0
−1
−1
0
0
0
0
0
0
0
0
0
0
0
0
−1
0
0
0
0
0
0 0
0
0
0 1
0
0
1
−1
0
0
0

ϕ ψ υ −T
1 0 0 0 0 0 0 0 0 0 0 0 −1 0 1 −1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 −1 0 −1 0 0
ϕ ψ υ −T
− 18 1 0 −1 0 −1 −1 0 1 1 0 0 1 − 18 0 −1 0 0 0 0 0 0 1 −1 0 −1 1 0 0 1 1 0 0 1 0 0 0 − 18 0 0 −2 1 0 1 0
0 0 0 0 1 −1 0 0 1 0 0 1 0 0 0 0 1 0 0 −1 0 0
8
0 0.25 0 0 2 −1 1 −1 −1 1 1
8
− 18 1
8 − 18 0 0 0 0 0 0 1 −1 0 −1 1 0 1 1 1
8
1
8
1
8
1
8
8
0 0 0 0 0 0 0 0 0 − 18
8
− 18 − 18
0 0 − 18 0 −1 0 0 −1 0 1 0 0 0 0 − 18 −1 0 2 1 −1 0 0 0 0 −1 1 0 0 0 1 1 0.25 0 1 0 0 0 1 0 0 0 1 0 1 0
−1 −1 −1
8 8 8 8
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0
0
0
− 18
0
2
0
1
0
0
0
0
1
−1
−1
0
1
1
1
0
−1
0
1
0
1
0
− 18
1
8
1
−1
0
0
0
0
0
−1
−1
0
0
−1
1
0
0
1
−1
1
1
1
1
−1
−1
0
0
0
1
0
1
0
0
1
8
0
− 18
1
− 18
1
0
0
0
1
0
1
− 18
0
1
8
1
1
8
0
−1
−1
1
1
1
2
0
0
0
− 18
0
0
8 8 8 8 8 8 8
0 0 0 0 0 −1 −1 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0
0
0
0
0
0
−1
1
1
−1
−1
−1
0
1
0
−1
0
−1 1
0 0
−1
0
−1
− 18
0
1
8
0
− 81
0
1
−1
−1
1
1
1
0
1
0
0
0
−1
0
0
2
1
0
−1 1
0 −2
−1
0
0
0
0
0
0
−1
−1
− 18
− 18
0
0
0
0
− 18
0
0
1
0
1
0
0
1
8
1
0
0
2
1
0
−1 0
1 0
1
− 18
0
0
1
8 8 8 8 8
−1 0 1 0 0 1 0 0 1 −1 0 1 0 0 −1 0 0 −1 0 0 0 0
0
0
0
0
− 18
1
1
−1
0
1
−1
1
0
−1
−1
1
0
1
1
−1
0
1
0
0
− 18
0
− 18
0
0
−1
0
1
1
−1
1
1
−2
0
0
−1
0
0
0
−1
0
1
0
−1
0
1
−2
0
0
1
−1
1
−1
0
−1
1
8
− 18
0
− 81
0
− 18
0
− 18
1
8
− 81
1
8
− 18
1
8
0
0
0
− 18
0
0
−1
0
−1
1
−1
0
0
− 18
0
0
0
0 0 −1 0 1 −1 −1 −1 0 0 −1 0 0 0 −1 0 0 0 −1 0 −1 − 18
0
− 81
0
0
− 18
0
−1
−1
0
0
1
1
0
0
−1
−1 1
0 1
0
0
0
0
0
− 18
0
1
8
− 18
0
0
0
1
−1
1
1
0
1
0
−1
−1
−1
0
0
1
8
0
− 18
0
− 18
0
0
0
0
0
− 18
− 18
0
1
8
1
8
1
8
−1
−1
1
1
0
1
0
1
8
− 18
− 18
0
− 18
− 18 1 − 18 1 1 − 18 1 1 − 18
1 −1 −1 0 0 0 0 0 1 0 −1 0 −1 1 0 −1 1 0 −1 0 0 0
0
0
0 1
8
−1
0
0
−1
−1
0
0
−1
−1
0
0
−1 0
1 0
1
0
1
8
− 81 1
8
0
0
0
−1
−1
0
−1
0
−1
−1
−1
−1
1
0
8
0
8
1
8 − 18
8
1
8
8
0 0
0
0
0
1
8
0
0
−1
0
−1
0
1
1
0
− 18
0
0
0
− 81
− 18 −1 0 −1 − 18 1 − 81 − 18 1 1 1 −1 1
1 −1 0 0 −1 0 0 0 0 0
1
8
0
− 18 1
8
0
1
1
−1
0
1
1
−1
0
1 −1 0 0
1
0 − 18
8
1
8
0
− 18 0
1
0
0
0
0 0
1
0
1
1
0
0
0 1
8
1
8
8
1
8
0
0
0
0
8
0
0
1
8
8
0 0
1
0
0
1
0
− 18
8
0
0
1
8
− 18 1 0 0 −1 0 −1 0 1 1 0 0 0 0 − 18 0 −1 1 1 −1 0 0 1 1 − 81 0 0 0 − 18 0 0 0 −1 − 18 0 − 81
0 0 −1 0 0 0 0 −1 0 0
8
0 1
8 0 −1 0 1 0 1 0 −1 1 − 18 − 18 0 −1 0 0 0 0 −1 0
8
− 81
8
1
8 − 81 0 0 1
8 0 1
8 1 1 0 0 − 18 0
− 18 − 81 0 1 0 1 0 −1 0 0 −1 1 0 0 1 −1 0 0 0 0 1 1 0 0 0 − 18 1 0 − 18 0 1 −1 0 − 81 0 1

0 0 −1 0 0 −1 1 0 0 8 8 8 8

Table 4: h4, 2, 3; 20i-algorithm [35]

Uϕ Vψ Wυ
0 0 −1 0 0 1 0 0 −1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 −1 0
0 0 −1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 −1 0 0 0 0 1 0
0 0 0 0 0 0 −1 1 0 0 0 1 −1 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 −1
0 0 0 0 0 −1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0
0 −1 0 0 0 0 0 −1 0 −1 0 0 0 0 −1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 −1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 −1 0 0 0 0 0 0 0
0 0 0 0 1 0 0 −1 0 −1 0 1 −1 0 0 0 0 −1 0 0 0 0 0 1 −1 0
0 0 0 −1 0 −1 0 0 0 0 1 −1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
0 −1 0 0 0 0 −1 0 −1 0 0 0 −1 0 0 0 0 0 0 0 1 0 0 −1 0 1
−1 0 0 0 0 −1 0 0 0 0 −1 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0
0 0 0 0 0 0 −1 0 0 0 0 0 −1 1 0 0 0 0 0 0 0 0 0 0 0 −1
0 0 0 0 0 1 0 −1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0
−1 0 0 0 0 0 0 −1 −1 0 0 −1 0 0 0 0 0 −1 0 0 0 0 0 0 0 0
0 0 0 0 1 −1 0 0 0 0 −1 1 0 0 0 0 0 0 0 0 0 1 −1 0 0 0
0 0 0 −1 0 0 −1 0 0 0 −1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0 0 0 −1 1 0 −1 −1 0 0 0 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 0 0 −1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0
0 0 0 0 −1 0 1 0 0 0 1 0 −1 1 0 0 0 0 0 0 1 0 0 0 0 0
0 −1 0 0 0 0 0 0 0 −1 0 0 −1 0 0 0 1 0 0 0 0 0 0 0 0 −1
ϕ ψ υ −T
−1 0 0 0 0 0 0 −1 −1 0 0 0 0 0 0 0 0 −1 0 0 0 0 0 0 0 0
−1 0 −1 0 0 0 0 0 −1 −1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0 0 0 −1 0 0 1 0 0 0 0 0 0 0 0 0
−1 −1 0 0 0 0 0 0 1 0 0 −1 0 −1 0 0 0 0 0 0 0 0 0 −1 1 0
1 0 0 0 0 0 0 0 1 1 −1 0 0 0 1 −1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 −1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 −1 −1
1 0 0 −1 0 0 0 0 0 0 −1 −1 0 1 0 0 0 0 1 0
1 0 0 0 0 0 −1 0 0 0 0 0 0 0 −1 1 1 1 −1 −1
−1 0 0 0 0 0 0 1 0 0 0 0
0 0 0 −1 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1 −1 −1
0 0 1 1 −1 −1 0 0 0 0 0 0

Matrix Operations - Richard Bronson - 2011 - SCHAUM's OUTLINES
100% (4)
Matrix Operations - Richard Bronson - 2011 - SCHAUM's OUTLINES
244 pages
Matrix Comp
No ratings yet
Matrix Comp
716 pages
Victor Pan Structured Matrix 2001
No ratings yet
Victor Pan Structured Matrix 2001
299 pages
Design and Analysis of Algorithms - Algebraic Algorithms
No ratings yet
Design and Analysis of Algorithms - Algebraic Algorithms
6 pages
G. W. Stewart - Matrix Algorithms-Society For Industrial and Applied Mathematics (1998)
No ratings yet
G. W. Stewart - Matrix Algorithms-Society For Industrial and Applied Mathematics (1998)
479 pages
(Original PDF) Introduction To Enumerative and Analytic Combinatorics, Second Edition Instant Download
100% (7)
(Original PDF) Introduction To Enumerative and Analytic Combinatorics, Second Edition Instant Download
46 pages
DAA-Q Bank (THEORY) Solved
No ratings yet
DAA-Q Bank (THEORY) Solved
63 pages
Matrix Operations
No ratings yet
Matrix Operations
4 pages
DQBJ Vol 2
100% (1)
DQBJ Vol 2
788 pages
Calculus With Matlab
No ratings yet
Calculus With Matlab
40 pages
Functions of Matrices Theory and Computation TQW - Darksiderg PDF
100% (3)
Functions of Matrices Theory and Computation TQW - Darksiderg PDF
446 pages
Chap2 Force Systems
100% (1)
Chap2 Force Systems
36 pages
Unit 2 Strassen's Algo
No ratings yet
Unit 2 Strassen's Algo
7 pages
Lec3 dnc2 v1 Light 1up
No ratings yet
Lec3 dnc2 v1 Light 1up
37 pages
7 - Pictorial Projection
100% (6)
7 - Pictorial Projection
48 pages
Fast Reliable Algorithms For Matrices With Structure-Ed Kailith-Sayed
No ratings yet
Fast Reliable Algorithms For Matrices With Structure-Ed Kailith-Sayed
356 pages
(NATO ASI Series 274) Xxxxxxxxxxxxs (Eds.) - C2014.14 (1994)
No ratings yet
(NATO ASI Series 274) Xxxxxxxxxxxxs (Eds.) - C2014.14 (1994)
736 pages
Schaum S Outline of Theory and Problems of Matrix Operations PDF
No ratings yet
Schaum S Outline of Theory and Problems of Matrix Operations PDF
235 pages
Linear Equation Faster Than Mul
No ratings yet
Linear Equation Faster Than Mul
18 pages
CLRS 4 1 2
No ratings yet
CLRS 4 1 2
7 pages
Fundamentals of Numerical Linear Algebra
No ratings yet
Fundamentals of Numerical Linear Algebra
265 pages
Direct Methods For Space Matrices
No ratings yet
Direct Methods For Space Matrices
7 pages
Efficiently Calculating The Determinant of A Matrix
No ratings yet
Efficiently Calculating The Determinant of A Matrix
8 pages
Class4 1
No ratings yet
Class4 1
18 pages
DAA IA-1 Case Study Material-CSE
No ratings yet
DAA IA-1 Case Study Material-CSE
9 pages
Math 6610 - Analysis of Numerical Methods I
No ratings yet
Math 6610 - Analysis of Numerical Methods I
103 pages
Strassen
No ratings yet
Strassen
11 pages
Strassen S
No ratings yet
Strassen S
10 pages
2020 NW Grade 12 Assignment Term 1 Memo2020
No ratings yet
2020 NW Grade 12 Assignment Term 1 Memo2020
13 pages
Dear The Weight
From Everand
Dear The Weight
Masud Rana
No ratings yet
Ebook 2
No ratings yet
Ebook 2
8 pages
Strassen Algorithm
No ratings yet
Strassen Algorithm
6 pages
Giris Hetal 2013
No ratings yet
Giris Hetal 2013
8 pages
Dynamic Programming
No ratings yet
Dynamic Programming
43 pages
10.1007@s40305 019 00280 X
No ratings yet
10.1007@s40305 019 00280 X
15 pages
Set 9 Pure Math 2025
No ratings yet
Set 9 Pure Math 2025
9 pages
Iterative Matrix Computation
No ratings yet
Iterative Matrix Computation
55 pages
LA Exercises 2
No ratings yet
LA Exercises 2
23 pages
Matrix Computations, Marko Huhtanen
No ratings yet
Matrix Computations, Marko Huhtanen
63 pages
Reader (09 10)
No ratings yet
Reader (09 10)
232 pages
Ipse Ilsen
No ratings yet
Ipse Ilsen
135 pages
Final Report Daa Case Study 1
No ratings yet
Final Report Daa Case Study 1
19 pages
Strassen Matrix DAA
No ratings yet
Strassen Matrix DAA
14 pages
Matrix Multiplication Algorithms With Better Time Complexity
No ratings yet
Matrix Multiplication Algorithms With Better Time Complexity
9 pages
19 Ways To Evaluate The Exponential of Matrices
No ratings yet
19 Ways To Evaluate The Exponential of Matrices
46 pages
Gee7 2011
No ratings yet
Gee7 2011
318 pages
Thesis
No ratings yet
Thesis
127 pages
Strassen's Matrix Multiplication
No ratings yet
Strassen's Matrix Multiplication
13 pages
Strassens Matrix Multiflication
No ratings yet
Strassens Matrix Multiflication
14 pages
Csce411 Divideconquer2
No ratings yet
Csce411 Divideconquer2
12 pages
Matrix Mult 09
No ratings yet
Matrix Mult 09
12 pages
Recurrence Relation Presentation MEERA
0% (1)
Recurrence Relation Presentation MEERA
27 pages
Knight Math221 Paper
No ratings yet
Knight Math221 Paper
34 pages
15cs204j-Algorithm Design and Analysis
No ratings yet
15cs204j-Algorithm Design and Analysis
3 pages
Divide and Conquer: Andreas Klappenecker (Based On Slides by Prof. Welch)
No ratings yet
Divide and Conquer: Andreas Klappenecker (Based On Slides by Prof. Welch)
27 pages
Efficient Algorithms and Structures with Heaps: Definitive Reference for Developers and Engineers
From Everand
Efficient Algorithms and Structures with Heaps: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Fast Sparse Matrix Multiplication
No ratings yet
Fast Sparse Matrix Multiplication
11 pages
SE-Comps SEM4 AOA-CBCGS DEC18 SOLUTION
No ratings yet
SE-Comps SEM4 AOA-CBCGS DEC18 SOLUTION
15 pages
Strassen Matrix Multiplication: Under The Guidance of
No ratings yet
Strassen Matrix Multiplication: Under The Guidance of
10 pages
Notes Differential Equations
No ratings yet
Notes Differential Equations
37 pages
Tensor Structures and Applications: Definitive Reference for Developers and Engineers
From Everand
Tensor Structures and Applications: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
BMATE101
No ratings yet
BMATE101
5 pages
Analysis of Time Complexity of Strassens Algo
No ratings yet
Analysis of Time Complexity of Strassens Algo
4 pages
2.2 Conduction Heat Transfer Part II
No ratings yet
2.2 Conduction Heat Transfer Part II
31 pages
Mesh Generation: Application to Finite Elements
From Everand
Mesh Generation: Application to Finite Elements
Pascal Frey
No ratings yet
1980020753
No ratings yet
1980020753
152 pages
An Easy New Approach For Matrix Decomposition: Mohammed Hassan Elzubair
No ratings yet
An Easy New Approach For Matrix Decomposition: Mohammed Hassan Elzubair
3 pages
Functions of Several Complex Variables and Their Singularities Wolfgang Ebeling PDF Download
No ratings yet
Functions of Several Complex Variables and Their Singularities Wolfgang Ebeling PDF Download
77 pages
Lec 05
No ratings yet
Lec 05
49 pages
AAD Flow Networks and Divide and Conquer
No ratings yet
AAD Flow Networks and Divide and Conquer
17 pages
Libya Free High Study Academy / Misrata: - Report of Address
No ratings yet
Libya Free High Study Academy / Misrata: - Report of Address
19 pages
Introductory Mathematical Analysis: For Business, Economics, and The Life and Social Sciences
No ratings yet
Introductory Mathematical Analysis: For Business, Economics, and The Life and Social Sciences
31 pages
Category Theory LITE Alpha 2025 1 - 1
No ratings yet
Category Theory LITE Alpha 2025 1 - 1
18 pages
Xi Limits and Derivatives Examplar
No ratings yet
Xi Limits and Derivatives Examplar
19 pages
Marketing Part 3
No ratings yet
Marketing Part 3
4 pages
Right Triangle Trig Evaluating Ratios PDF
100% (1)
Right Triangle Trig Evaluating Ratios PDF
4 pages
Matrix Multiplication1
No ratings yet
Matrix Multiplication1
10 pages
CS124 Spring 2011: (N) Is The Number of Comparisons, Then T (N) 2T (n/2) + 2. (The 2T (n/2) Term Comes From
No ratings yet
CS124 Spring 2011: (N) Is The Number of Comparisons, Then T (N) 2T (n/2) + 2. (The 2T (n/2) Term Comes From
4 pages
Strassen Algorithm
No ratings yet
Strassen Algorithm
4 pages
Notes Unconstrained Max PDF
No ratings yet
Notes Unconstrained Max PDF
10 pages
Addition of Angular Momenta
No ratings yet
Addition of Angular Momenta
6 pages
MME A-Level-Formula-Booklet
No ratings yet
MME A-Level-Formula-Booklet
7 pages
Example 1.2: Several Springs Are Connected As Shown in Figure 1.2 Subjected To An Axial Force
No ratings yet
Example 1.2: Several Springs Are Connected As Shown in Figure 1.2 Subjected To An Axial Force
5 pages
Goudas Paper PDF
No ratings yet
Goudas Paper PDF
9 pages
Properties of Repeated Reflections
No ratings yet
Properties of Repeated Reflections
6 pages
Alg 1 Standard Pace Chart
No ratings yet
Alg 1 Standard Pace Chart
2 pages
Macaulays Method 5A
No ratings yet
Macaulays Method 5A
13 pages
Final Al-Mahdi11thg 16-17
No ratings yet
Final Al-Mahdi11thg 16-17
2 pages
Mesh Generation: Advances and Applications in Computer Vision Mesh Generation
From Everand
Mesh Generation: Advances and Applications in Computer Vision Mesh Generation
Fouad Sabry
No ratings yet
Kernel Methods: Fundamentals and Applications
From Everand
Kernel Methods: Fundamentals and Applications
Fouad Sabry
No ratings yet

Read Various Algorithms Listed

Uploaded by

Read Various Algorithms Listed

Uploaded by

Sparsifying the Operators of Fast Matrix Multiplication

Elaye Karstadt Oded Schwartz

ABSTRACT 1.1 Previous work

Arithmetic Operations Leading Coefficient Improvement

1.2 Our contribution. 2 PRELIMINARIES

Arithmetic Operations Leading Coefficient

let U ∈ R t ×n ·m , V ∈ R t ×m ·k , W ∈ R t ×n ·k . Then hU , V , W i are 0

encoding/decoding matrices of an hn, m, k; ti-algorithm if and only

Ur,(i 1,i 2 )Vr,(j1, j2 )Wr,(k1,k2 ) = δi 1,k1 δi 2, j1 δ j2,k2

where δi, j = 1 if i = j and 0 otherwise. 0

A.1 A sample of Algorithms Table 6: h6, 3, 3; 40i-algorithm [35]

Table 4: h4, 2, 3; 20i-algorithm [35]

You might also like