A Multi-Level Blocking Distinct-Degree Factorization Algorithm
1. Introduction
The problem of factoring a univariate polynomial P(x) over a finite field F
often arises in computational algebra [7, 11, 12]. An important case is when F
has small characteristic and P(x) has high degree but is sparse, that is, P(x) has
only a small number of nonzero terms.
To simplify the exposition we restrict attention to the case where F = GF(2)
and P(x) is a trinomial
$$P(x) = x^r + x^s + 1, \quad r > s > 0,$$
although the ideas apply more generally and should be useful for factoring sparse
polynomials over fields of small characteristic.
1991 Mathematics Subject Classification. Primary 11B83, 11Y05, 11Y16; Secondary 11-04,
11K31, 11N35, 11R09, 11T06, 11Y55, 12-04, 68Q25.
Key words and phrases. Amortized complexity, distinct-degree factorization, finite field,
irreducible trinomial, Mersenne exponent, polynomial factorization, primitive trinomial.
© 2008 the authors.
2 RICHARD P. BRENT AND PAUL ZIMMERMANN
Our aim is to give an algorithm with good amortized complexity, that is, one
that works well on average. Since we are restricting attention to trinomials, we
average over all trinomials of fixed degree r.
Our motivation is to speed up previous algorithms for searching for irreducible
trinomials of high degree [5, 6, 13, 14]. For given degree r, we want to find all
irreducible trinomials $x^r + x^s + 1$.
In our examples the degree r is a Mersenne exponent, i.e., $2^r - 1$ is a Mersenne
prime. In this case an irreducible trinomial of degree r is necessarily primitive. In
general, without the restriction to Mersenne exponents, we would need the prime
factorisation of $2^r - 1$ in order to test primitivity (see e.g., [10]).
We are only interested in Mersenne exponents r = ±1 mod 8, because in other
cases Swan’s theorem [15, 21, 22] rules out irreducible trinomials of degree r
(except for s = 2 or r − 2, but these cases are usually easy to handle: for example
if r = 13466917 or 20996011 we have r = 1 mod 3, so $x^r + x^2 + 1$ is divisible by
$x^2 + x + 1$).
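The divisibility claim in the parenthesis above can be checked without building a polynomial of degree r. Since $x^3 = 1 \bmod (x^2 + x + 1)$, only the exponents mod 3 matter. The following Python sketch stores polynomials over GF(2) as integer bit masks (bit i is the coefficient of $x^i$); the helper names are illustrative and not taken from the authors' program.

```python
def polymod(a: int, p: int) -> int:
    """Remainder of a modulo p, for GF(2) polynomials stored as bit masks."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def sparse_poly(*exponents: int) -> int:
    """Sum of x^e over the given exponents, with cancellation in characteristic 2."""
    f = 0
    for e in exponents:
        f ^= 1 << e
    return f


X2_X_1 = 0b111  # x^2 + x + 1


def divisible_by_x2_x_1(r: int, s: int) -> bool:
    # x^3 = 1 mod (x^2 + x + 1), so exponents may be reduced mod 3 first.
    return polymod(sparse_poly(r % 3, s % 3, 0), X2_X_1) == 0


for r in (13466917, 20996011):
    # Both exponents are 3 or 5 mod 8 (excluded by Swan's theorem) and 1 mod 3.
    print(r, r % 8, r % 3, divisible_by_x2_x_1(r, 2))

# Direct check, with no exponent reduction, on a small exponent with 7 = 1 mod 3.
print(polymod(sparse_poly(7, 2, 0), X2_X_1) == 0)
```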
Mersenne exponents can be found on the GIMPS website [23]. At the time
of writing, the five largest known Mersenne exponents r satisfying the condition
r = ±1 mod 8 are r = 6972593, 24036583, 25964951, 30402457 and 32582657. In
the smallest case r = 6972593, a primitive trinomial was found by Brent, Larvala
and Zimmermann [6] using an efficient implementation of the naive algorithm.
However, it was not feasible to consider the larger Mersenne exponents r using the
same algorithm, since the time complexity of this algorithm is roughly of order $r^3$,
and the next case r = 24036583 would take about 41 times longer than r = 6972593.
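The factor of about 41 follows directly from the cubic cost model. A minimal arithmetic check in Python, assuming the cost is exactly proportional to $r^3$ (the variable names are illustrative):

```python
r_small = 6972593
r_next = 24036583

# Under a cost model proportional to r^3, the slowdown is the cube of the ratio.
predicted_slowdown = (r_next / r_small) ** 3
print(round(predicted_slowdown))
```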
With the new “fast” algorithm described in this paper we have been able to find two
primitive trinomials of degree r = 24036583 in less time than the naive algorithm
took for r = 6972593. The speedup over the naive algorithm for r = 24036583 is
about a factor of 560.
If $x^r + x^s + 1$ is reducible then we want to provide an easily-checked certificate
of reducibility. The certificate can simply be an encoding of an irreducible factor
f of $x^r + x^s + 1$. We choose the factor f of smallest degree d > 0. In case
there are several factors of equal smallest degree d, we give the one that is least in
lexicographic order, e.g., $x^3 + x + 1$ is preferred to $x^3 + x^2 + 1$.
1.1. Distinct-degree factorization. Factorization of polynomials over finite
fields typically proceeds in three stages: square-free factorization, distinct-degree
factorization, and equal-degree factorization. The most time-consuming stage, and
the one that we consider in this paper, is distinct-degree factorization [8, 10, 11].
The program described in §4.3 performs equal-degree factorization when it is
necessary to split a product of equal-degree factors in order to give the unique cer-
tificate described above, but this is cheap (on average) because it is rarely required
for factors of high degree.
In the complexity analysis we only consider the time required to find one non-
trivial factor (it will be a factor of smallest degree) or output “irreducible”, since
that is what is required in the search for irreducible trinomials. However the algo-
rithm outlined in §2.4 readily extends to a complete distinct-degree factorization.
1.2. Factorization over GF(2). It is well-known that $x^{2^d} + x$ is the product
of all irreducible polynomials of degree dividing d. For example,
$$x^{2^3} + x = x(x + 1)(x^3 + x + 1)(x^3 + x^2 + 1).$$
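The example for d = 3 can be confirmed by carry-less multiplication. The sketch below again represents polynomials over GF(2) as Python integers (bit i is the coefficient of $x^i$); `polymul` and `polymod` are illustrative helpers written for this check.

```python
def polymul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def polymod(a: int, p: int) -> int:
    """Remainder of a modulo p over GF(2)."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


# x, x + 1, x^3 + x + 1, x^3 + x^2 + 1: the irreducibles of degree 1 or 3.
factors = [0b10, 0b11, 0b1011, 0b1101]
product = 1
for f in factors:
    product = polymul(product, f)

x8_plus_x = (1 << 8) | (1 << 1)  # x^(2^3) + x
print(product == x8_plus_x)

# x^2 + x + 1 has degree 2, which does not divide 3, so it leaves a remainder.
print(polymod(x8_plus_x, 0b111))
```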
In this way we replace ℓ GCDs by one GCD and ℓ − 1 multiplications mod P(x).
The drawback of blocking is that we may have to backtrack if P(x) has more
than one factor with degree in the interval [d′, d′ + ℓ), since the algorithm produces
the product of these factors. Thus ℓ should not be too large. The optimal strategy
depends on the expected size distribution of factors and the ratio of times for GCDs
and multiplications.
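The trade-off can be seen on a toy example. The sketch below is an assumption-laden illustration, not the authors' implementation: it forms the product of $x^{2^d} + x$ over a block of degrees modulo P(x) and takes a single GCD, using naive bit-mask arithmetic. The function names and the test polynomial $x^5 + x^4 + 1 = (x^2 + x + 1)(x^3 + x + 1)$ are chosen here for illustration.

```python
def polymul(a: int, b: int) -> int:
    """Carry-less product of GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def polymod(a: int, p: int) -> int:
    """Remainder of a modulo p over GF(2)."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def polygcd(a: int, b: int) -> int:
    """Euclidean GCD over GF(2)."""
    while b:
        a, b = b, polymod(a, b)
    return a


def block_gcd(p: int, d_start: int, length: int) -> int:
    """One GCD for all degrees d in [d_start, d_start + length)."""
    x = 0b10
    t = x
    for _ in range(d_start):          # t = x^(2^d_start) mod p
        t = polymod(polymul(t, t), p)
    prod = 1
    for _ in range(length):
        # Accumulate (x^(2^d) + x); the first product is by 1, so this
        # costs length - 1 genuine multiplications.
        prod = polymod(polymul(prod, t ^ x), p)
        t = polymod(polymul(t, t), p)  # advance to x^(2^(d+1)) mod p
    return polygcd(p, prod)


P = 0b110001  # x^5 + x^4 + 1 = (x^2 + x + 1)(x^3 + x + 1)
print(bin(block_gcd(P, 1, 2)))  # block [1, 3) isolates the degree-2 factor
print(bin(block_gcd(P, 1, 3)))  # block [1, 4) catches both factors at once
```

With the longer block the GCD is P(x) itself, which is the situation where backtracking over the block is needed to separate the factors.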
2.4. Multi-level blocking. Our new idea is to use a finer level of blocking
to replace most multiplications by squarings, which speeds up the computation in
GF(2)[x]/P(x) of the above interval polynomials. The idea is to split the interval
[d′, d′ + ℓ) into k ≥ 2 smaller intervals of length m over which
$$p_m(X, x) = \prod_{j=0}^{m-1} \left(X^{2^j} + x\right) = \sum_{j=0}^{m} x^{m-j} s_{j,m}(X), \tag{2.2}$$
where
$$s_{j,m}(X) = \sum_{0 \le k < 2^m,\ w(k) = j} X^k, \tag{2.3}$$
and w(k) denotes the Hamming weight of k, that is the number of nonzero bits in
the binary representation of k.
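Identity (2.2) can be checked symbolically for small m. In the sketch below a bivariate polynomial over GF(2) is a Python set of exponent pairs (i, j) standing for $X^i x^j$, and addition is symmetric difference; this representation and the function names are choices made for the check, not part of the paper.

```python
def mul(a: set, b: set) -> set:
    """Product of two bivariate GF(2) polynomials given as sets of (i, j) exponents."""
    out = set()
    for i1, j1 in a:
        for i2, j2 in b:
            out ^= {(i1 + i2, j1 + j2)}  # coefficients add mod 2
    return out


def product_form(m: int) -> set:
    """Left side of (2.2): product over j < m of (X^(2^j) + x)."""
    p = {(0, 0)}
    for j in range(m):
        p = mul(p, {(2 ** j, 0), (0, 1)})
    return p


def sum_form(m: int) -> set:
    """Right side of (2.2): X^k x^(m - w(k)) for 0 <= k < 2^m, w = Hamming weight."""
    return {(k, m - bin(k).count("1")) for k in range(2 ** m)}


for m in range(1, 7):
    print(m, product_form(m) == sum_form(m))
```

Each k in $[0, 2^m)$ appears exactly once, so $s_{j,m}(X)$ has $\binom{m}{j}$ terms and the whole sum has $2^m$ terms.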
2.5. Sieving out small factors. We define a small factor to be one with
degree $d < \frac{1}{2}\log_2 r$, so $2^d < \sqrt{r}$. The constant $\frac{1}{2}$ in the definition is arbitrary and
could be replaced by any fixed constant in (0, 1). A large factor is a factor that is
not small.
It would be inefficient to find small factors in the same way as large factors.
Instead, let $D = 2^d - 1$, $r' = r \bmod D$, $s' = s \bmod D$. Then
$$P(x) = x^r + x^s + 1 = x^{r'} + x^{s'} + 1 \bmod (x^D - 1),$$
so we need only compute
$$\mathrm{GCD}(x^{r'} + x^{s'} + 1,\ x^D - 1).$$
Because $r', s' < D < \sqrt{r}$, the cost of finding small factors is negligible,
both theoretically and in practice.
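A minimal Python sketch of this sieving step follows, with GF(2) polynomials stored as integer bit masks. The function `small_factor_gcd` and its helpers are hypothetical names; the sketch only illustrates the exponent reduction and is not the authors' code.

```python
def polymod(a: int, p: int) -> int:
    """Remainder of a modulo p, for GF(2) polynomials stored as bit masks."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def polygcd(a: int, b: int) -> int:
    """Euclidean GCD over GF(2)."""
    while b:
        a, b = b, polymod(a, b)
    return a


def small_factor_gcd(r: int, s: int, d: int) -> int:
    """GCD(x^r' + x^s' + 1, x^D - 1) with D = 2^d - 1, r' = r mod D, s' = s mod D."""
    D = (1 << d) - 1
    # XOR handles the cases where reduced exponents coincide or are zero.
    reduced = (1 << (r % D)) ^ (1 << (s % D)) ^ 1
    x_D_minus_1 = (1 << D) | 1  # x^D - 1 equals x^D + 1 in characteristic 2
    return polygcd(x_D_minus_1, reduced)


# r = 13466917, s = 2, d = 2: only polynomials of degree < 3 are ever touched.
print(bin(small_factor_gcd(13466917, 2, 2)))

# x^7 + x + 1 is irreducible, so no factor of degree dividing 2 is found.
print(small_factor_gcd(7, 1, 2))
```

The first call recovers $x^2 + x + 1$ as a factor of $x^{13466917} + x^2 + 1$ while working only with polynomials of degree at most D = 3.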
2.6. Outer-level blocking strategy. The blocksize in the outer level of
blocking is ℓ = km. We take a linearly increasing sequence of block sizes
$k = k_0 j$ for $j = 1, 2, 3, \ldots$,
where the first interval starts at about log r (since small factors will have been
found by sieving).
The choice $k = k_0 j$ leads to a quadratic polynomial for the interval bounds.
More generally, we could take k to be a polynomial of degree δ > 0 in j, so the
interval bounds would be a polynomial of degree δ + 1. The analysis of §4 would go
through with minor changes. Generally, increasing δ reduces the number of GCDs
but increases the number of squarings/multiplications. In practice, we found that
the simple choice δ = 1 is close to optimal.
In principle, using the data that we have obtained on the distribution of degrees
of smallest factors of trinomials (see §3), and assuming that this distribution is not
very sensitive to the degree r, we could obtain a strategy that is close to optimal.
However, the choice $k_0 j$ with suitable $k_0$ is easy to implement and not too far from
optimal. The number of GCD and sqr/mul operations is usually within a factor of
1.5 of the minimum possible in our experiments.
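The outer-level schedule with δ = 1 can be sketched as a short generator. The starting degree and the values of $k_0$ and m below are placeholders chosen for illustration, not the parameters used in the experiments.

```python
def outer_blocks(d_start: int, d_max: int, k0: int, m: int):
    """Yield half-open degree intervals [lo, hi); block j has length k0 * j * m."""
    lo = d_start
    j = 1
    while lo < d_max:
        hi = lo + k0 * j * m          # blocksize l = k * m with k = k0 * j
        yield lo, min(hi, d_max)
        lo = hi
        j += 1


# Interval bounds grow quadratically: d_start + k0 * m * j * (j + 1) / 2.
print(list(outer_blocks(10, 40, k0=1, m=2)))

# The number of blocks (hence GCDs) needed to reach degree d grows like sqrt(d).
print(len(list(outer_blocks(0, 10 ** 6, k0=1, m=1))))
```

Each block costs one GCD, so covering degrees up to d takes a number of GCDs proportional to $\sqrt{d}$, in line with the linear growth of the block sizes.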
Table 1 gives the observed values of $d\pi_d$ for r = 3021377, r = 6972593, and
r = 24036583. The maximum values for each r are given in bold. The table shows
that the values of $d\pi_d$ are remarkably stable for small d, and bounded by 4 for
large d (this is because there are four irreducible trinomials of degree 3021377 and
also four of degree 24036583, when we count both trinomials $x^r + x^s + 1$ and their
reciprocals $x^r + x^{r-s} + 1$).
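$$E_\beta := \sum_{d=1}^{r} d^\beta p_d = \begin{cases} O(1) & \text{if } \beta < 1, \\ O(\log r) & \text{if } \beta = 1, \\ O(r^{\beta-1}) & \text{if } \beta > 1. \end{cases}$$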
Proof. The proof is similar to that of Lemma 3.2. We end with the upper
bound
$$\sum_{d=D}^{r-1} \frac{(d+1)^\beta - d^\beta}{d} + D^\beta \pi_{D-1}.$$
From Hypothesis 3.1, $\pi_{D-1} = O(1/D)$, and the sum over d is $O(D^{\beta-1})$, so the
result follows.
and from Lemma 3.2 this is O(m log r). Thus, the expected cost of sqr/mul operations
per trinomial is
$$O\left(S(r) \log r \sqrt{M(r)/S(r)}\right) = O\left(\log r \sqrt{M(r)S(r)}\right) = O\left(r (\log r)^{3/2} (\log\log r)^{1/2}\right). \tag{4.1}$$
If we used only a single level of blocking, then the cost of multiplications would
dominate that of squarings, with an expected cost per trinomial of
$O(\log r \, M(r)) = O(r (\log r)^2 \log\log r)$.
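The last equality in (4.1) and the single-level bound can be unpacked as follows. This assumes the cost models $M(r) = O(r \log r \log\log r)$ for FFT-based multiplication and $S(r) = O(r)$ for squaring modulo a trinomial; these models are not restated in this passage and are inferred here from the bounds themselves.

$$\sqrt{M(r)S(r)} = O\left(\sqrt{r \log r \log\log r \cdot r}\right) = O\left(r (\log r)^{1/2} (\log\log r)^{1/2}\right),$$

so that

$$\log r \sqrt{M(r)S(r)} = O\left(r (\log r)^{3/2} (\log\log r)^{1/2}\right), \qquad \log r \, M(r) = O\left(r (\log r)^{2} \log\log r\right).$$

Under these assumptions the two-level scheme improves on single-level blocking by a factor of order $(\log r \log\log r)^{1/2}$.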
The bound (4.1) is correct as r → ∞. In practice, for $r < 6.4 \times 10^7$, our implementation
of Schönhage’s FFT-based polynomial multiplication algorithm [17] calls
a different multiplication routine (usually TC4) to perform smaller multiplications,
rather than recursively calling itself. TC4 has exponent $\alpha' = \ln(7)/\ln(4) \approx 1.4$, so
the effective exponent for FFT multiplication is $\alpha = (1 + \alpha')/2 \approx 1.2 > 1$. In this
case, the expected cost of sqr/mul operations per trinomial is
$$O\left(\log r \sqrt{M(r)S(r)}\right) = O\left(r^{(1+\alpha)/2} \log r\right) = O\left(r^{1.1\cdots} \log r\right). \tag{4.2}$$
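The exponents quoted above are easy to reproduce numerically. The short Python check below assumes $M(r) = O(r^\alpha)$ and $S(r) = O(r)$, so that $\sqrt{M(r)S(r)} = O(r^{(1+\alpha)/2})$; the variable names are illustrative.

```python
import math

alpha_prime = math.log(7) / math.log(4)   # TC4 exponent
alpha = (1 + alpha_prime) / 2             # effective exponent for FFT multiplication
cost_exponent = (1 + alpha) / 2           # exponent of r in bound (4.2)

print(f"{alpha_prime:.4f} {alpha:.4f} {cost_exponent:.4f}")
```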
4.1. Expected cost of GCDs. Suppose that P(x) has a smallest factor of
degree d. The number of GCDs required to find the factor, using our (quadratic
polynomial) blocking strategy, is at least 1, and $O(\sqrt{d})$ if d is large. By Hypothesis 3.1,
the expected number of GCDs for a trinomial with no small factor is
$$1 + O\left(\sum_{\log_2 r < 2d \le r} d^{1/2} p_d\right),$$
improved the basecase multiplication code; more details concerning efficient multi-
plication in GF(2)[x] are available in [4]. Finally, we implemented a subquadratic
GCD routine, since NTL only provides a classical GCD for binary polynomials.
5. Conclusion
The new double-blocking strategy, combined with fast multiplication and GCD
algorithms, has allowed us to find new primitive trinomials of record degree.
The same ideas should work over finite fields GF(p) for small prime p > 2, and
for factoring sparse polynomials P(x) that are not necessarily trinomials: all we
need is that the time for p-th powers (mod P(x)) is much less than the time for
multiplication (mod P(x)).
References
[1] M. Bodrato, Towards Optimal Toom-Cook Multiplication for Univariate and Multivariate
Polynomials in Characteristic 2 and 0, Lecture Notes in Computer Science 4547, 119–136.
Springer, 2007. http://bodrato.it/papers/#WAIFI2007
[2] W. Bosma and J. Cannon, Handbook of Magma Functions, School of Mathematics and
Statistics, University of Sydney, 1995. http://magma.maths.usyd.edu.au/
[3] R. P. Brent, Search for primitive trinomials (mod 2), http://wwwmaths.anu.edu.au/~brent/trinom.html
[4] R. P. Brent, P. Gaudry, E. Thomé and P. Zimmermann, Faster Multiplication in GF(2)[x],
Proceedings of ANTS VIII, A. van der Poorten, A. Stein, editors, Lecture Notes in Computer
Science, 2008, to appear. Also INRIA Tech Report RR-6359, http://hal.inria.fr/inria-00188261/en/,
Nov. 2007, 19 pp.
[5] R. P. Brent, S. Larvala and P. Zimmermann, A fast algorithm for testing reducibility of
trinomials mod 2 and some new primitive trinomials of degree 3021377, Math. Comp. 72
(2003), 1443–1452. http://wwwmaths.anu.edu.au/~brent/pub/pub199.html
[6] R. P. Brent, S. Larvala and P. Zimmermann, A primitive trinomial of degree 6972593, Math.
Comp. 74 (2005), 1001–1002. http://wwwmaths.anu.edu.au/~brent/pub/pub224.html
[7] D. G. Cantor and H. Zassenhaus, A new algorithm for factoring polynomials over finite fields,
Math. Comp. 36 (1981), 587–592.
[8] Ph. Flajolet, X. Gourdon and D. Panario, The complete analysis of a polynomial factorization
algorithm over finite fields, J. of Algorithms 40 (2001), 37–81.
[9] M. Fürer, Faster integer multiplication, Proceedings of the 39th annual ACM Symposium on
Theory of Computing (STOC 2007), 57–66.
[10] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, Cambridge University Press,
Cambridge, UK, 1999.
[11] J. von zur Gathen and J. Gerhard, Polynomial factorization over F2 , Math. Comp. 71 (2002),
1677–1698.
[12] J. von zur Gathen and V. Shoup, Computing Frobenius maps and factoring polynomials,
Computational Complexity 2 (1992), 187–224. http://www.shoup.net/papers/
[13] J. R. Heringa, H. W. J. Blöte and A. Compagner, New primitive trinomials of Mersenne-
exponent degrees for random-number generation, International J. of Modern Physics C 3
(1992), 561–564.
[14] T. Kumada, H. Leeb, Y. Kurita and M. Matsumoto, New primitive t-nomials (t = 3, 5) over
GF(2) whose degree is a Mersenne exponent, Math. Comp. 69 (2000), 811–814. Corrigenda:
ibid 71 (2002), 1337–1338.
[15] A.-E. Pellet, Sur la décomposition d’une fonction entière en facteurs irréductibles suivant un
module premier p, Comptes Rendus de l’Académie des Sciences Paris 86 (1878), 1071–1072.
[16] J. M. Pollard, A Monte Carlo method for factorization, BIT 15 (1975), 331–334.
[17] A. Schönhage, Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2,
Acta Inf. 7 (1977), 395–398.
[18] A. Schönhage and V. Strassen, Schnelle Multiplikation großer Zahlen, Computing 7 (1971),
281–292.
[19] V. Shoup, NTL: A library for doing number theory, Version 5.4.1, http://www.shoup.net/ntl/
[20] A. Steel, personal communications, July 5–9, 2007.
[21] L. Stickelberger, Über eine neue Eigenschaft der Diskriminanten algebraischer Zahlkörper,
Verhandlungen des ersten Internationalen Mathematiker-Kongresses, Zürich, 1897, 182–193.
[22] R. G. Swan, Factorization of polynomials over finite fields, Pacific J. Math. 12 (1962), 1099–
1106.
[23] G. Woltman et al., GIMPS, The Great Internet Mersenne Prime Search, http://www.mersenne.org/
Centre de Recherche INRIA Nancy - Grand Est, 615 rue du Jardin Botanique, 54600
Villers-lès-Nancy, France
E-mail address: [email protected]