XX^t Can Be Faster
Abstract
We present RXTX, a new algorithm for computing the product of a matrix with its transpose, XX^t. RXTX uses 5% fewer multiplications and additions than the previous state of the art and provides speedups even for small matrix sizes. The algorithm was discovered by combining machine learning-based search methods with combinatorial optimization.
1 Introduction
Table 1: The new algorithm (RXTX) is based on recursive 4 × 4 block matrix multiplication: it uses 8 recursive calls and 26 general products, whereas the previous SotA uses 16 recursive calls and 24 general products. R(n), S(n), M(n) denote the number of multiplications performed by RXTX, the previous SotA, and the Strassen algorithm, respectively, for an n × n matrix X. The RXTX asymptotic constant 26/41 ≈ 0.6341 is 5% smaller than 2/3 ≈ 0.6666, the asymptotic constant of the previous SotA.
Finding faster matrix multiplication algorithms is a central challenge in computer science and numerical linear algebra. Since the groundbreaking results of Strassen [1969] and Winograd [1968], which demonstrated that the number of multiplications required for a general matrix product AB can be significantly reduced, extensive research has explored this problem. Techniques in the area range from gradient descent approaches Smirnov [2013] and heuristics Éric Drevet et al. [2011], to group-theoretic methods Ye and Lim [2018], graph-based random walks Kauers and Moosbauer [2022], and deep reinforcement learning Fawzi et al. [2022].
Algorithm 1 RXTX - AI-discovered asymptotic SotA for XX^t
1: Input: 4 × 4 block-matrix X
2: Output: C = XX^t using 8 recursive calls and 26 general products.
3: m1 = (−X2 + X3 − X4 + X8) · (X8 + X11)^t
4: m2 = (X1 − X5 − X6 + X7) · (X15 + X5)^t
5: m3 = (−X2 + X12) · (−X10 + X16 + X12)^t
6: m4 = (X9 − X6) · (X13 + X9 − X14)^t
7: m5 = (X2 + X11) · (−X6 + X15 − X7)^t
8: m6 = (X6 + X11) · (X6 + X7 − X11)^t
9: m7 = X11 · (X6 + X7)^t
10: m8 = X2 · (−X14 − X10 + X6 − X15 + X7 + X16 + X12)^t
11: m9 = X6 · (X13 + X9 − X14 − X10 + X6 + X7 − X11)^t
12: m10 = (X2 − X3 + X7 + X11 + X4 − X8) · X11^t
13: m11 = (X5 + X6 − X7) · X5^t
14: m12 = (X2 − X3 + X4) · X8^t
15: m13 = (−X1 + X3 + X5 + X6 − X7 + X11) · X15^t
16: m14 = (−X1 + X5 + X6) · (X9 + X13 + X15)^t
17: m15 = (X2 + X4 − X8) · (X11 + X12 + X16)^t
18: m16 = (X1 − X8) · (X9 − X16)^t
19: m17 = X12 · (X10 − X12)^t
20: m18 = X9 · (X13 − X14)^t
21: m19 = (−X2 + X3) · (−X15 + X7 + X8)^t
22: m20 = (X5 + X9 − X8) · X9^t
23: m21 = X8 · (X12 + X9 − X8)^t
24: m22 = (−X6 + X7) · (X5 + X7 − X11)^t
25: m23 = X1 · (X13 − X5 + X16)^t
26: m24 = (−X1 + X4 + X12) · X16^t
27: m25 = (X9 + X2 + X10) · X14^t
28: m26 = (X6 + X10 + X12) · X10^t
29: s1 = X1 · X1^t
30: s2 = X2 · X2^t
31: s3 = X3 · X3^t
32: s4 = X4 · X4^t
33: s5 = X13 · X13^t
34: s6 = X14 · X14^t
35: s7 = X15 · X15^t
36: s8 = X16 · X16^t
37: C11 = s1 + s2 + s3 + s4
38: C12 = m2 − m5 − m7 + m11 + m12 + m13 + m19
39: C13 = m1 + m3 + m12 + m15 + m16 + m17 + m21 − m24
40: C14 = m2 − m3 − m5 − m7 − m8 + m11 + m13 − m17 + m23 + m24
41: C22 = m1 + m6 − m7 + m10 + m11 + m12 + m22
42: C23 = m1 − m4 + m6 − m7 − m9 + m10 + m12 + m18 + m20 + m21
43: C24 = m2 + m4 + m11 + m14 + m16 − m18 − m20 + m23
44: C33 = m4 − m6 + m7 + m9 − m17 − m18 + m26
45: C34 = m3 + m5 + m7 + m8 + m17 + m18 + m25
46: C44 = s5 + s6 + s7 + s8
47: return C
Despite this progress, much less attention has been paid to matrix products with additional structure, such as B = A or B = A^t, or products involving sparsity or symmetry Dumas et al. [2020, 2023], Arrigoni et al. [2021]. This is surprising given that expressions like AA^t are widely used in fields such as statistics, data analysis, deep learning, and wireless communications. For example, AA^t often represents a covariance or Gram matrix, while in linear regression, the solution for the data pair (X, y) involves the data covariance matrix X^tX:

β = (X^tX)^{-1} X^t y.

From a theoretical standpoint, computing XX^t has the same asymptotic complexity as general matrix multiplication. As a result, only constant-factor speedups are possible. The RXTX algorithm, presented in Algorithm 1, achieves such a speedup by exploiting structure specific to XX^t.
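For reference, the previous SotA that RXTX is measured against (recursive Strassen for XX^t, analyzed in Section 2) splits X into 2 × 2 blocks and uses four recursive calls plus two general products per level. The following is a minimal NumPy sketch of that baseline (our illustration, not code from the paper); the cutoff value is arbitrary and a plain matmul stands in for a fast general multiply:

```python
import numpy as np

def general_product(A, B):
    """Stand-in for a fast general multiply (Strassen-Winograd in the paper)."""
    return A @ B

def xxt_recursive(X):
    """Previous SotA scheme for X @ X.T: split X into 2x2 blocks, use
    4 recursive calls on the diagonal and 2 general products off-diagonal."""
    n = X.shape[0]
    if n <= 64:                              # arbitrary cutoff for this sketch
        return X @ X.T
    h = n // 2
    X1, X2, X3, X4 = X[:h, :h], X[:h, h:], X[h:, :h], X[h:, h:]
    C11 = xxt_recursive(X1) + xxt_recursive(X2)           # 2 recursive calls
    C22 = xxt_recursive(X3) + xxt_recursive(X4)           # 2 recursive calls
    C12 = general_product(X1, X3.T) + general_product(X2, X4.T)  # 2 general products
    return np.block([[C11, C12], [C12.T, C22]])

X = np.random.default_rng(0).standard_normal((256, 256))
assert np.allclose(xxt_recursive(X), X @ X.T)
```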
2 Analysis of RXTX
We define:
• R(n): the number of multiplications performed by RXTX for an n × n matrix X;
• S(n): the number of multiplications performed by recursive Strassen Arrigoni et al. [2021] for an n × n matrix X;
• M(n): the number of multiplications performed by the Strassen-Winograd algorithm for a general product of n × n matrices;
• R+(n): the number of additions and multiplications performed by RXTX for an n × n matrix X;
• S+(n): the number of additions and multiplications performed by recursive Strassen Arrigoni et al. [2021] for an n × n matrix X.
2.1 Number of multiplications

Theorem 1. The number of multiplications for RXTX is

R(n) = (26/41) M(n) + (15/41) n^{3/2} = (26/41) n^{log_2 7} + (15/41) n^{3/2}.

The number of multiplications for recursive Strassen is

S(n) = (2/3) M(n) + (1/3) n^2 = (2/3) n^{log_2 7} + (1/3) n^2.
Proof. The definition of RXTX involves 8 recursive calls and 26 general matrix multiplications. It follows that

R(n) = 8R(n/4) + 26M(n/4).

The general solution of this recurrence has the form Cormen et al. [2009]

R(n) = αM(n) + βn^{3/2}.

Plugging in n = 1 and n = 4 we get

1 = α + β,
34 = 49α + 8β.

Solving this system we obtain

α = 26/41 ≈ 0.6341,  β = 15/41 ≈ 0.3658.

Similarly, recursive Strassen for XX^t uses 4 recursive calls and 2 general matrix multiplications:

S(n) = 4S(n/2) + 2M(n/2).

The general solution has the form

S(n) = γM(n) + δn^2.

Plugging in n = 1 and n = 2 we get

1 = γ + δ,  6 = 7γ + 4δ.

Solving this system we obtain γ = 2/3 ≈ 0.6666 and δ = 1/3 ≈ 0.3333.
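Both recurrences are easy to check numerically. The following short script (our addition, not from the paper) unrolls them and verifies the closed forms; n ranges over powers of 4 so all quantities stay exact integers:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def M(n):   # Strassen-Winograd multiplications for a general n x n product
    return 1 if n == 1 else 7 * M(n // 2)

@lru_cache(maxsize=None)
def R(n):   # RXTX: 8 recursive calls + 26 general products on n/4 blocks
    return 1 if n == 1 else 8 * R(n // 4) + 26 * M(n // 4)

@lru_cache(maxsize=None)
def S(n):   # recursive Strassen: 4 recursive calls + 2 general products
    return 1 if n == 1 else 4 * S(n // 2) + 2 * M(n // 2)

for k in range(1, 8):
    n = 4 ** k
    assert 41 * R(n) == 26 * M(n) + 15 * 8 ** k   # n^(3/2) = 8^k for n = 4^k
    assert 3 * S(n) == 2 * M(n) + n * n           # S(n) = (2/3)M(n) + (1/3)n^2
    print(n, R(n) / S(n))   # ratio tends to (26/41)/(2/3) = 39/41 ≈ 0.951
```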
In Figure 1 we can see the ratio R(n)/S(n) for n given by powers of 4. The ratio always stays below 100% and approaches the asymptotic 95%, which indicates a 5% reduction in the number of multiplications. The same holds in Figure 2, where we use an optimal cutoff, i.e., for small enough matrix sizes we use standard matrix multiplication instead of further recursive calls.
Figure 1: Comparison of the number of multiplications of RXTX to the previous SotA and the naive algorithm.

Figure 2: Comparison of the number of multiplications of RXTX with optimal cutoff to the previous SotA and the naive algorithm. Panels show the ratio of R_opt(n) to S_opt(n) and the ratio of R_opt(n) to n^2(n + 1)/2.
2.2 Number of additions and multiplications

The same technique yields the total operation counts. One step of RXTX performs 8 recursive calls, 26 general products, and 100 block additions (53 in Algorithm 2 plus 47 in Algorithm 3) on (n/4) × (n/4) blocks, so

R+(n) = 8R+(n/4) + 26M+(n/4) + (25/4)n^2,

while one step of recursive Strassen performs 4 recursive calls, 2 general products, and 3 block additions on (n/2) × (n/2) blocks, so

S+(n) = 4S+(n/2) + 2M+(n/2) + (3/4)n^2.

The general solution for S+ has the form S+(n) = (2/3)M+(n) + γn^2 log_2 n + δn^2. Plugging in the values n = 1 and n = 2 gives γ = −7/4 and δ = 1/3. It is known Cenk and Hasan [2017] that

M+(n) = 6n^{log_2 7} − 5n^2.

It follows that

R+(n) = (26/41)(6n^{log_2 7} − 5n^2) − (95/164)n^2 + (155/164)n^{3/2} = (156/41)n^{log_2 7} − (615/164)n^2 + (155/164)n^{3/2}

and

S+(n) = (2/3)(6n^{log_2 7} − 5n^2) − (7/4)n^2 log_2 n + (1/3)n^2 = 4n^{log_2 7} − (7/4)n^2 log_2 n − 3n^2.
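Evaluating these closed forms numerically (our addition) reproduces the crossover reported in Figure 3 below:

```python
import math

# Total operation counts from the closed forms above.
def R_plus(n):
    return 156/41 * n**math.log2(7) - 615/164 * n**2 + 155/164 * n**1.5

def S_plus(n):
    return 4 * n**math.log2(7) - 7/4 * n**2 * math.log2(n) - 3 * n**2

for k in range(1, 7):
    n = 4 ** k
    print(f"n = {n:5d}  R+/S+ = {R_plus(n) / S_plus(n):.4f}")
# The ratio drops below 1 at n = 256, matching Figure 3.
```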
Figure 3: Comparison of the number of operations of RXTX to recursive Strassen and the naive algorithm. RXTX outperforms recursive Strassen for n ≥ 256 and the naive algorithm for n ≥ 1024.

Figure 4: Comparison of algorithms with optimal cutoffs, i.e., for small enough matrices the recursion switches to the algorithm with the fewest operations. RXTX outperforms the naive algorithm for n ≥ 32 and the previous SotA for n ≥ 256.
Algorithm 2 First stage of optimized addition scheme. The number of additions is reduced from 77 to 53.
1: Input: X1, X2, ..., X16
2: Output: Left elements L1, ..., L26 and right elements R1, ..., R26 of multiplications m1, ..., m26
3: y1 ← X13 − X14
4: y2 ← X12 − X10
5: w1 ← X2 + X4 − X8
6: w2 ← X1 − X5 − X6
7: w3 ← X6 + X7
8: w4 ← X14 + X15
9: w5 ← y2 + X16
10: w6 ← X10 + X11
11: w7 ← X9 + y1
12: w8 ← X9 − X8
13: w9 ← X7 − X11
14: w10 ← X6 − X7
15: w11 ← X2 − X3
16: L1 ← −w1 + X3            R1 ← X8 + X11
17: L2 ← w2 + X7             R2 ← X15 + X5
18: L3 ← −X2 + X12           R3 ← w5
19: L4 ← X9 − X6             R4 ← w7
20: L5 ← X2 + X11            R5 ← X15 − w3
21: L6 ← X6 + X11            R6 ← w3 − X11
22: L7 ← X11                 R7 ← w3
23: L8 ← X2                  R8 ← w3 − w4 + w5
24: L9 ← X6                  R9 ← w7 − w6 + w3
25: L10 ← w1 − X3 + X7 + X11 R10 ← X11
26: L11 ← X5 + w10           R11 ← X5
27: L12 ← w11 + X4           R12 ← X8
28: L13 ← −w2 + X3 − w9      R13 ← X15
29: L14 ← −w2                R14 ← w7 + w4
30: L15 ← w1                 R15 ← w6 + w5
31: L16 ← X1 − X8            R16 ← X9 − X16
32: L17 ← X12                R17 ← −y2
33: L18 ← X9                 R18 ← y1
34: L19 ← −w11               R19 ← −X15 + X7 + X8
35: L20 ← X5 + w8            R20 ← X9
36: L21 ← X8                 R21 ← X12 + w8
37: L22 ← −w10               R22 ← X5 + w9
38: L23 ← X1                 R23 ← X13 − X5 + X16
39: L24 ← −X1 + X4 + X12     R24 ← X16
40: L25 ← X9 + X2 + X10      R25 ← X14
41: L26 ← X6 + X10 + X12     R26 ← X10
Algorithm 3 Second stage of optimized addition scheme. The number of additions is reduced from 62 to 47.
1: Input: m1, m2, ..., m26 and s1, ..., s8.
2: Output: Entries Cij using 47 additions.
3: z1 ← m7 − m11 − m12
4: z2 ← m1 + m12 + m21
5: z3 ← m3 + m17 − m24
6: z4 ← m2 + m11 + m23
7: z5 ← m5 + m7 + m8
8: z6 ← m4 − m18 − m20
9: z7 ← m6 − m7 − m9
10: z8 ← m17 + m18
11: C11 ← s1 + s2 + s3 + s4
12: C12 ← m2 − m5 − z1 + m13 + m19
13: C13 ← z2 + z3 + m15 + m16
14: C14 ← z4 − z3 − z5 + m13
15: C22 ← m1 + m6 − z1 + m10 + m22
16: C23 ← z2 − z6 + z7 + m10
17: C24 ← z4 + z6 + m14 + m16
18: C33 ← m4 − z7 − z8 + m26
19: C34 ← m3 + z5 + z8 + m25
20: C44 ← s5 + s6 + s7 + s8
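To make Algorithms 2 and 3 concrete, here is a self-contained sanity check (our addition, not part of the paper): with 1 × 1 blocks every product m_i = L_i · R_i^t degenerates to a scalar product and every s_i to a square, so the assembled C must equal XX^t for a plain 4 × 4 matrix X whose entries X1, ..., X16 are numbered row by row.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(-9, 10, size=(4, 4)).astype(float)
(X1, X2, X3, X4, X5, X6, X7, X8,
 X9, X10, X11, X12, X13, X14, X15, X16) = X.flatten()

# Stage 1 (Algorithm 2): shared subexpressions, then the 26 factor pairs.
y1 = X13 - X14; y2 = X12 - X10
w1 = X2 + X4 - X8; w2 = X1 - X5 - X6; w3 = X6 + X7
w4 = X14 + X15; w5 = y2 + X16; w6 = X10 + X11
w7 = X9 + y1; w8 = X9 - X8; w9 = X7 - X11
w10 = X6 - X7; w11 = X2 - X3
L = [-w1 + X3, w2 + X7, -X2 + X12, X9 - X6, X2 + X11, X6 + X11, X11,
     X2, X6, w1 - X3 + X7 + X11, X5 + w10, w11 + X4, -w2 + X3 - w9,
     -w2, w1, X1 - X8, X12, X9, -w11, X5 + w8, X8, -w10, X1,
     -X1 + X4 + X12, X9 + X2 + X10, X6 + X10 + X12]
R = [X8 + X11, X15 + X5, w5, w7, X15 - w3, w3 - X11, w3, w3 - w4 + w5,
     w7 - w6 + w3, X11, X5, X8, X15, w7 + w4, w6 + w5, X9 - X16, -y2,
     y1, -X15 + X7 + X8, X9, X12 + w8, X5 + w9, X13 - X5 + X16, X16,
     X14, X10]
m = [None] + [l * r for l, r in zip(L, R)]      # 1-based, as in the paper
s = [None, X1 * X1, X2 * X2, X3 * X3, X4 * X4,  # recursive calls s1..s8
     X13 * X13, X14 * X14, X15 * X15, X16 * X16]

# Stage 2 (Algorithm 3): 8 shared sums, then the upper-triangular entries.
z1 = m[7] - m[11] - m[12]; z2 = m[1] + m[12] + m[21]
z3 = m[3] + m[17] - m[24]; z4 = m[2] + m[11] + m[23]
z5 = m[5] + m[7] + m[8];   z6 = m[4] - m[18] - m[20]
z7 = m[6] - m[7] - m[9];   z8 = m[17] + m[18]
C = np.zeros((4, 4))
C[0, 0] = s[1] + s[2] + s[3] + s[4]
C[0, 1] = m[2] - m[5] - z1 + m[13] + m[19]
C[0, 2] = z2 + z3 + m[15] + m[16]
C[0, 3] = z4 - z3 - z5 + m[13]
C[1, 1] = m[1] + m[6] - z1 + m[10] + m[22]
C[1, 2] = z2 - z6 + z7 + m[10]
C[1, 3] = z4 + z6 + m[14] + m[16]
C[2, 2] = m[4] - z7 - z8 + m[26]
C[2, 3] = m[3] + z5 + z8 + m[25]
C[3, 3] = s[5] + s[6] + s[7] + s[8]
C = np.triu(C) + np.triu(C, 1).T                # C is symmetric
assert np.allclose(C, X @ X.T)
```

The same skeleton becomes the full recursive algorithm when the Xi are matrix blocks: the 26 products call a fast general multiply and s1, ..., s8 recurse into the routine itself.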
Figure 5: The average runtime of RXTX is 2.524 s, which is 9% faster than the 2.778 s average runtime of the specific BLAS routine. RXTX was faster in 99% of the runs.
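For context, the natural BLAS baseline for XX^t is a syrk-style rank-k update, and the figure's "specific BLAS routine" is presumably of this kind. Below is a minimal timing sketch of the baseline side only (our addition; the paper's RXTX implementation is not reproduced here, and the size and repetition counts are arbitrary):

```python
import time
import numpy as np
from scipy.linalg.blas import dsyrk

n = 4096
# Fortran order avoids an internal copy when handing the array to BLAS.
X = np.asfortranarray(np.random.default_rng(0).standard_normal((n, n)))

times = []
for _ in range(10):
    t0 = time.perf_counter()
    C = dsyrk(1.0, X)          # computes one triangle of X @ X.T
    times.append(time.perf_counter() - t0)
print(f"syrk: best of 10 runs = {min(times):.3f}s")
```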
3 Discovery Methodology
3.1 Description of RL-guided Large Neighborhood Search
In this section we briefly present our methodology. The full methodology, together with other discovered accelerations, will be described in Rybin et al. [2025]. We combine RL-guided Large Neighborhood Search Wu et al. [2021], Addanki et al. [2020] with a two-level MILP pipeline:
1. The RL agent proposes a (potentially redundant) set of rank-1 bilinear products;
2. MILP-A exhaustively enumerates tens of thousands of linear relations between these candidate rank-1
bilinear products and target expressions;
3. MILP-B then selects the smallest subset of products whose induced relations cover every target
expression of XX^t.
The loop iterates under a Large Neighborhood Search regime. One way to view this pipeline is as a simplification of the AlphaTensor RL approach Fawzi et al. [2022]: instead of sampling tensors from R^{n^2} ⊗ R^{n^2} ⊗ R^{n^2}, we sample candidate tensors from R^{n^2} ⊗ R^{n^2} and let the MILP solver find optimal linear combinations of the sampled candidates.
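The selection step lends itself to a small covering MILP. The gurobipy sketch below is one plausible reading of MILP-B (ours; the paper does not spell out the exact model), and its inputs are hypothetical: relations stands for MILP-A's output, a list of (target_id, product_ids) pairs recording that a target is a linear combination of those candidate products.

```python
import gurobipy as gp
from gurobipy import GRB

def select_min_products(num_products, num_targets, relations):
    m = gp.Model("milp_b")
    x = m.addVars(num_products, vtype=GRB.BINARY)    # product p is kept
    y = m.addVars(len(relations), vtype=GRB.BINARY)  # relation r is used
    for r, (_, prods) in enumerate(relations):
        for p in prods:                   # a used relation requires all of
            m.addConstr(y[r] <= x[p])     # its products to be selected
    for t in range(num_targets):          # every target needs some relation
        m.addConstr(gp.quicksum(y[r] for r, (tt, _) in enumerate(relations)
                                if tt == t) >= 1)
    m.setObjective(x.sum(), GRB.MINIMIZE) # fewest products overall
    m.optimize()
    return [p for p in range(num_products) if x[p].X > 0.5]
```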
3.2 Example: matrix-times-transpose algorithm search for a 2 × 2 matrix

Consider the example of a 2 × 2 matrix X. We want to compute XX^t:

XX^t = [x1 x2; x3 x4] · [x1 x3; x2 x4] = [x1^2 + x2^2, x1x3 + x2x4; x1x3 + x2x4, x3^2 + x4^2].

We identify 3 target expressions

T = { x1^2 + x2^2, x3^2 + x4^2, x1x3 + x2x4 }.
We randomly sample thousands of products p1, ..., pm, each given by

(α1x1 + α2x2 + α3x3 + α4x4) · (β1x1 + β2x2 + β3x3 + β4x4)

with αi, βj ∈ {−1, 0, +1} chosen by the RL policy πθ. MILP-A enumerates ways to write the target expressions from T as linear combinations ∑ γi pi of the sampled products. MILP-B selects the minimal number of sampled products such that every target expression can be obtained as a linear combination of the selected ones. The key observation is that MILP-A and MILP-B are rapidly solvable with solvers like Gurobi Gurobi Optimization, LLC [2024].
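At this toy scale the whole loop can be imitated without a MILP solver. In the sketch below (ours, not from the paper), brute-force subset search stands in for MILP-A/MILP-B, and a few hand-seeded symmetric products stand in for the RL policy's steering, without which purely random ±1 samples rarely admit small exact covers:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
monos = [(i, j) for i in range(4) for j in range(i, 4)]  # 10 monomials x_i x_j

def product_vector(alpha, beta):
    # Coefficients of (sum_i alpha_i x_i)(sum_j beta_j x_j) over the monomials.
    v = np.zeros(len(monos))
    for i in range(4):
        for j in range(4):
            v[monos.index((min(i, j), max(i, j)))] += alpha[i] * beta[j]
    return v

def e(i):                         # coefficient vector picking out x_i alone
    v = [0, 0, 0, 0]; v[i] = 1; return v

# Targets: x1^2 + x2^2, x3^2 + x4^2, x1 x3 + x2 x4 (0-based indices here).
targets = [product_vector(e(0), e(0)) + product_vector(e(1), e(1)),
           product_vector(e(2), e(2)) + product_vector(e(3), e(3)),
           product_vector(e(0), e(2)) + product_vector(e(1), e(3))]

# Seeded proposals (playing the RL policy's role) plus random samples.
pool = [product_vector(e(i), e(i)) for i in range(4)]        # x_i^2
pool += [product_vector([1, 0, 1, 0], [1, 0, 1, 0]),         # (x1 + x3)^2
         product_vector([0, 1, 0, 1], [0, 1, 0, 1])]         # (x2 + x4)^2
pool += [product_vector(rng.integers(-1, 2, 4), rng.integers(-1, 2, 4))
         for _ in range(10)]

def in_span(P, t):                # is target t a linear combination of P's columns?
    return np.linalg.matrix_rank(np.column_stack([P, t])) == np.linalg.matrix_rank(P)

for size in range(1, 7):          # smallest covering subset: MILP-B's job
    found = next((idx for idx in itertools.combinations(range(len(pool)), size)
                  if all(in_span(np.array([pool[i] for i in idx]).T, t)
                         for t in targets)), None)
    if found is not None:
        print(f"{size} products suffice for this sample: {found}")
        break
```

The seeded pool always admits a cover of size at most 6, since x1x3 + x2x4 = ((x1 + x3)^2 + (x2 + x4)^2 − x1^2 − x2^2 − x3^2 − x4^2)/2; the brute-force loop is exponential, which is exactly why the real pipeline relies on MILP solvers instead.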
4 Acknowledgements
The work of Z.-Q. Luo was supported by the Guangdong Major Project of Basic and Applied Basic Research
(No. 2023B0303000001), the Guangdong Provincial Key Laboratory of Big Data Computing, and the National
Key Research and Development Project under grant 2022YFA1003900.
References
R. Addanki, V. Nair, and M. Alizadeh. Neural large neighborhood search. In Learning Meets Combinatorial
Algorithms @ NeurIPS 2020, 2020.
Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2024. URL https://www.gurobi.com.
M. Kauers and J. Moosbauer. The FBHHRBNRSSSHK-algorithm for multiplication in Z_2^{5×5} is still not the end of the story. ArXiv, abs/2210.04045, 2022. URL https://api.semanticscholar.org/CorpusID:252780122.
E. Mårtensson and P. S. Wagner. The number of the beast: Reducing additions in fast matrix multiplication
algorithms for dimensions up to 666. Cryptology ePrint Archive, Paper 2024/2063, 2024. URL https:
//eprint.iacr.org/2024/2063.
D. Rybin, Y. Zhang, and Z.-Q. Luo. Accelerating structured matrix computations with machine learning
based search. in progress, 2025.
A. V. Smirnov. The bilinear complexity and practical algorithms for matrix multiplication. Compu-
tational Mathematics and Mathematical Physics, 53(12):1781–1795, Dec. 2013. ISSN 1555-6662. doi:
10.1134/s0965542513120129. URL http://dx.doi.org/10.1134/S0965542513120129.
V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, Aug. 1969. ISSN
0945-3245. doi: 10.1007/bf02165411. URL http://dx.doi.org/10.1007/BF02165411.
S. Winograd. A new algorithm for inner product. IEEE Transactions on Computers, 100(7):693–694, 1968.
Y. Wu, W. Song, Z. Cao, and J. Zhang. Learning large neighborhood search policy for integer programming.
arXiv preprint arXiv:2111.03466, 2021.
K. Ye and L.-H. Lim. Algorithms for structured matrix-vector product of optimal bilinear complexity. In
2016 IEEE Information Theory Workshop (ITW), pages 310–314. IEEE, 2016.
K. Ye and L.-H. Lim. Fast structured matrix computations: tensor rank and Cohn-Umans method. Foundations of Computational Mathematics, 18:45-95, 2018.
C. Éric Drevet, M. Nazrul Islam, and Éric Schost. Optimization techniques for small matrix multiplication.
Theoretical Computer Science, 412(22):2219–2236, 2011. ISSN 0304-3975. doi: https://doi.org/10.1016/j.tcs.
2010.12.012. URL https://www.sciencedirect.com/science/article/pii/S0304397510007036.