Integer Factorization
we won’t prove it here. The reason is that for a randomly chosen odd composite
integer n, the expected number of nonwitnesses to the compositeness of n is likely
to be very much smaller than $(n-1)/2$.
If the integer n is not chosen randomly, however, the best that can be proven is
that the number of nonwitnesses is at most $(n-1)/4$, using an improved version
of Theorem 31.38. Furthermore, there do exist integers n for which the number of
nonwitnesses is $(n-1)/4$.
Exercises
31.8-1
Prove that if an odd integer n > 1 is not a prime or a prime power, then there exists
a nontrivial square root of 1 modulo n.
31.8-2 ★
It is possible to strengthen Euler’s theorem slightly to the form
$a^{\lambda(n)} \equiv 1 \pmod{n}$ for all $a \in \mathbb{Z}_n^*$,
where $n = p_1^{e_1} \cdots p_r^{e_r}$ and $\lambda(n)$ is defined by
$\lambda(n) = \operatorname{lcm}(\phi(p_1^{e_1}), \ldots, \phi(p_r^{e_r}))$. (31.42)
Prove that $\lambda(n) \mid \phi(n)$. A composite number n is a Carmichael number if
$\lambda(n) \mid n - 1$. The smallest Carmichael number is $561 = 3 \cdot 11 \cdot 17$; here,
$\lambda(n) = \operatorname{lcm}(2, 10, 16) = 80$, which divides 560. Prove that Carmichael numbers
must be both “square-free” (not divisible by the square of any prime) and the
product of at least three primes. (For this reason, they are not very common.)
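The numbers quoted for 561 can be checked directly. The following is only a numerical sanity check, not a proof of anything the exercise asks for; the helper name `carmichael_lambda` and its list-of-prime-powers argument are assumptions made for this sketch, and `math.lcm` requires Python 3.9 or later.

```python
import math


def carmichael_lambda(prime_powers):
    """lambda(n) for n given as [(p, e), ...] with distinct primes p.

    Uses phi(p^e) = p^(e-1) * (p - 1) together with definition (31.42).
    """
    return math.lcm(*(p ** (e - 1) * (p - 1) for p, e in prime_powers))


n = 3 * 11 * 17                                   # 561
lam = carmichael_lambda([(3, 1), (11, 1), (17, 1)])
print(lam, (n - 1) % lam)                         # 80 0

# a^lambda(n) = 1 (mod n) for every a in Z_n^*.
assert all(pow(a, lam, n) == 1 for a in range(1, n) if math.gcd(a, n) == 1)
```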
31.8-3
Prove that if x is a nontrivial square root of 1, modulo n, then $\gcd(x - 1, n)$ and
$\gcd(x + 1, n)$ are both nontrivial divisors of n.
Suppose we have an integer n that we wish to factor, that is, to decompose into a
product of primes. The primality test of the preceding section may tell us that n is
composite, but it does not tell us the prime factors of n. Factoring a large integer n
seems to be much more difficult than simply determining whether n is prime or
composite. Even with today’s supercomputers and the best algorithms to date, we
cannot feasibly factor an arbitrary 1024-bit number.
976 Chapter 31 Number-Theoretic Algorithms
POLLARD-RHO(n)
 1  i = 1
 2  x_1 = RANDOM(0, n - 1)
 3  y = x_1
 4  k = 2
 5  while TRUE
 6      i = i + 1
 7      x_i = (x_{i-1}^2 - 1) mod n
 8      d = gcd(y - x_i, n)
 9      if d ≠ 1 and d ≠ n
10          print d
11      if i == k
12          y = x_i
13          k = 2k
$x_1, x_2, x_4, x_8, x_{16}, \ldots$.
Line 3 saves the value $x_1$, and line 12 saves $x_k$ whenever i is equal to k. The
variable k is initialized to 2 in line 4, and line 13 doubles it whenever line 12
updates y. Therefore, k follows the sequence $2, 4, 8, 16, \ldots$ and always gives the
subscript of the next value $x_k$ to be saved in y.
Lines 8–10 try to find a factor of n, using the saved value of y and the current
value of $x_i$. Specifically, line 8 computes the greatest common divisor
$d = \gcd(y - x_i, n)$. If line 9 finds d to be a nontrivial divisor of n, then line 10
prints d.
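The procedure can be sketched in Python roughly as follows. This is a minimal sketch rather than a definitive implementation: the function name `pollard_rho`, the `max_steps` cutoff (the pseudocode loops forever), the optional `x1` argument, and returning the divisors in a list instead of printing them are all assumptions made here for illustration.

```python
import math
import random


def pollard_rho(n, max_steps=10_000, x1=None):
    """Collect the nontrivial divisors of n that POLLARD-RHO would print.

    Assumptions (not in the pseudocode): the loop stops after max_steps
    iterations, x1 may be fixed by the caller, and divisors are returned
    in a list rather than printed.
    """
    i = 1
    x = random.randint(0, n - 1) if x1 is None else x1  # line 2
    y = x                                               # line 3
    k = 2                                               # line 4
    found = []
    for _ in range(max_steps):                          # line 5, but bounded
        i += 1                                          # line 6
        x = (x * x - 1) % n                             # line 7
        d = math.gcd(y - x, n)                          # line 8
        if d != 1 and d != n:                           # line 9
            found.append(d)                             # line 10 (print d)
        if i == k:                                      # line 11
            y = x                                       # line 12
            k *= 2                                      # line 13
    return found


# The run of Figure 31.7: n = 1387 = 19 * 73, starting from x1 = 2.
divisors = pollard_rho(1387, max_steps=20, x1=2)
print(divisors[0])  # 19, found when i = 7
```

Only the current value of x and the saved value y are kept, so the memory used does not grow with the number of iterations.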
This procedure for finding a factor may seem somewhat mysterious at first.
Note, however, that POLLARD-RHO never prints an incorrect answer; any number
it prints is a nontrivial divisor of n. POLLARD-RHO might not print anything
at all, though; it comes with no guarantee that it will print any divisors. We shall
see, however, that we have good reason to expect POLLARD-RHO to print a factor
p of n after $\Theta(\sqrt{p})$ iterations of the while loop. Thus, if n is composite, we
can expect this procedure to discover enough divisors to factor n completely after
approximately $n^{1/4}$ updates, since every prime factor p of n except possibly the
largest one is less than $\sqrt{n}$.
We begin our analysis of how this procedure behaves by studying how long
it takes a random sequence modulo n to repeat a value. Since $\mathbb{Z}_n$ is finite, and
since each value in the sequence (31.44) depends only on the previous value, the
sequence (31.44) eventually repeats itself. Once we reach an $x_i$ such that $x_i = x_j$
for some $j < i$, we are in a cycle, since $x_{i+1} = x_{j+1}$, $x_{i+2} = x_{j+2}$, and so on.
The reason for the name “rho heuristic” is that, as Figure 31.7 shows, we can draw
the sequence $x_1, x_2, \ldots, x_{j-1}$ as the “tail” of the rho and the cycle $x_j, x_{j+1}, \ldots, x_i$
as the “body” of the rho.
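To make the tail and the cycle concrete, the following sketch finds the first repeated value of a sequence by remembering every value seen so far. The helper name `first_repeat` and the dictionary-based bookkeeping are illustrative assumptions; they use memory proportional to the length of the rho and are not part of POLLARD-RHO itself.

```python
def first_repeat(f, x1):
    """Return (j, i) with j < i, the first indices for which x_i == x_j.

    Indices start at 1, so x_1, ..., x_{j-1} form the tail of the rho
    and the cycle has length i - j.
    """
    seen = {}          # value -> index at which it first appeared
    x, i = x1, 1
    while x not in seen:
        seen[x] = i
        x = f(x)
        i += 1
    return seen[x], i


n = 1387
j, i = first_repeat(lambda x: (x * x - 1) % n, 2)
print(j, i)  # 6 18: x_18 = x_6 = 1186, so the tail has 5 values and the cycle 12
```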
Let us consider the question of how long it takes for the sequence of $x_i$ to repeat.
This information is not exactly what we need, but we shall see later how to modify
the argument. For the purpose of this estimation, let us assume that the function
$f_n(x) = (x^2 - 1) \bmod n$
behaves like a “random” function. Of course, it is not really random, but this
assumption yields results consistent with the observed behavior of POLLARD-RHO.
We can then consider each $x_i$ to have been independently drawn from $\mathbb{Z}_n$ according
to a uniform distribution on $\mathbb{Z}_n$. By the birthday-paradox analysis of Section 5.4.1,
we expect $\Theta(\sqrt{n})$ steps to be taken before the sequence cycles.
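The birthday-paradox estimate can be explored with a small simulation of the idealized model, in which each value is drawn independently and uniformly from $\mathbb{Z}_n$. This is a sketch under that modeling assumption only; the function names, the number of trials, and the fixed seed are choices made here for repeatability, not part of the analysis.

```python
import math
import random


def steps_until_repeat(n, rng):
    """Draw uniformly from Z_n until some value appears a second time."""
    seen = set()
    steps = 0
    while True:
        x = rng.randrange(n)
        steps += 1
        if x in seen:
            return steps
        seen.add(x)


def average_steps(n, trials=2000, seed=0):
    rng = random.Random(seed)  # fixed seed only to make the run repeatable
    return sum(steps_until_repeat(n, rng) for _ in range(trials)) / trials


# Compare the observed average with sqrt(n) for two moduli.
for n in (100, 10_000):
    print(n, average_steps(n), math.sqrt(n))
```

Multiplying n by 100 should multiply the observed average by roughly 10, which is the $\Theta(\sqrt{n})$ growth the analysis predicts.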
Now for the required modification. Let p be a nontrivial factor of n such that
$\gcd(p, n/p) = 1$. For example, if n has the factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, then
we may take p to be $p_1^{e_1}$. (If $e_1 = 1$, then p is just the smallest prime factor of n,
a good example to keep in mind.)
[Figure 31.7: three diagrams of the sequence $x_1, x_2, \ldots$, drawn (a) mod 1387, (b) mod 19, and (c) mod 73.]
Figure 31.7 Pollard’s rho heuristic. (a) The values produced by the recurrence
$x_{i+1} = (x_i^2 - 1) \bmod 1387$, starting with $x_1 = 2$. The prime factorization of 1387 is $19 \cdot 73$. The heavy
arrows indicate the iteration steps that are executed before the factor 19 is discovered. The light
arrows point to unreached values in the iteration, to illustrate the “rho” shape. The shaded values are
the y values stored by POLLARD-RHO. The factor 19 is discovered upon reaching $x_7 = 177$, when
$\gcd(63 - 177, 1387) = 19$ is computed. The first x value that would be repeated is 1186, but the
factor 19 is discovered before this value is repeated. (b) The values produced by the same recurrence,
modulo 19. Every value $x_i$ given in part (a) is equivalent, modulo 19, to the value $x'_i$ shown here.
For example, both $x_4 = 63$ and $x_7 = 177$ are equivalent to 6, modulo 19. (c) The values produced
by the same recurrence, modulo 73. Every value $x_i$ given in part (a) is equivalent, modulo 73, to the
value $x''_i$ shown here. By the Chinese remainder theorem, each node in part (a) corresponds to a pair
of nodes, one from part (b) and one from part (c).
$= ((x'_i)^2 - 1) \bmod p$
$= f_p(x'_i)$.
Thus, although we are not explicitly computing the sequence $\langle x'_i \rangle$, this sequence is
well defined and obeys the same recurrence as the sequence $\langle x_i \rangle$.
Reasoning as before, we find that the expected number of steps before the sequence
$\langle x'_i \rangle$ repeats is $\Theta(\sqrt{p})$. If p is small compared to n, the sequence $\langle x'_i \rangle$ might
repeat much more quickly than the sequence $\langle x_i \rangle$. Indeed, as parts (b) and (c) of
Figure 31.7 show, the $\langle x'_i \rangle$ sequence repeats as soon as two elements of the
sequence $\langle x_i \rangle$ are merely equivalent modulo p, rather than equivalent modulo n.
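A small numerical check of this claim, using the values of Figure 31.7 (n = 1387, p = 19) and taking $x'_i$ to be $x_i \bmod p$ as in the figure caption. The helper names `f` and `sequence` are assumptions introduced for this sketch.

```python
def f(x, m):
    """The recurrence step f_m(x) = (x^2 - 1) mod m."""
    return (x * x - 1) % m


def sequence(x1, m, length):
    """Return [x_1, ..., x_length] for the recurrence modulo m."""
    xs = [x1]
    while len(xs) < length:
        xs.append(f(xs[-1], m))
    return xs


n, p = 1387, 19
xs = sequence(2, n, 8)               # x_1, ..., x_8 modulo n
xs_prime = [x % p for x in xs]       # x'_i = x_i mod p

# Reducing modulo p commutes with the recurrence: x'_{i+1} = f_p(x'_i).
assert xs_prime == sequence(2 % p, p, 8)

print(xs)        # [2, 3, 8, 63, 1194, 1186, 177, 814]
print(xs_prime)  # [2, 3, 8, 6, 16, 8, 6, 16]
```

The eight values modulo n are all distinct, while modulo p the value 8 already reappears at the sixth position.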
Let t denote the index of the first repeated value in the $\langle x'_i \rangle$ sequence, and let
$u > 0$ denote the length of the cycle that has been thereby produced. That is, t
and $u > 0$ are the smallest values such that $x'_{t+i} = x'_{t+u+i}$ for all $i \ge 0$. By the
above arguments, the expected values of t and u are both $\Theta(\sqrt{p})$. Note that if
$x'_{t+i} = x'_{t+u+i}$, then $p \mid (x_{t+u+i} - x_{t+i})$. Thus, $\gcd(x_{t+u+i} - x_{t+i}, n) > 1$.
Therefore, once POLLARD-RHO has saved as y any value $x_k$ such that $k \ge t$,
then y mod p is always on the cycle modulo p. (If a new value is saved as y,
that value is also on the cycle modulo p.) Eventually, k is set to a value that
is greater than u, and the procedure then makes an entire loop around the cycle
modulo p without changing the value of y. The procedure then discovers a factor
of n when $x_i$ “runs into” the previously stored value of y, modulo p, that is, when
$x_i \equiv y \pmod{p}$.
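For the example of Figure 31.7 these quantities can be read off directly. The sketch below hard-codes t = 3 and u = 3 for p = 19 (obtained by inspecting the sequence modulo 19, with indices starting at 1) rather than computing them; the dictionary `x` is just a convenience for 1-based indexing.

```python
import math

n, p = 1387, 19

# x_1, ..., x_8 for the recurrence x_{i+1} = (x_i^2 - 1) mod n, x_1 = 2.
xs = [2]
while len(xs) < 8:
    xs.append((xs[-1] ** 2 - 1) % n)
x = dict(enumerate(xs, start=1))      # x[i] is x_i, indexed from 1

# Modulo p the sequence is 2, 3, 8, 6, 16, 8, 6, 16, so t = 3 and u = 3.
t, u = 3, 3
assert x[t] % p == x[t + u] % p
print(math.gcd(x[t + u] - x[t], n))   # 19, since p divides x_6 - x_3

# POLLARD-RHO saves y = x_4 (k = 4 >= t) and then walks once around the cycle.
y = x[4]
print(math.gcd(y - x[7], n))          # 19, since x_7 = y (mod p) with 7 - 4 = u
```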
Presumably, the factor found is the factor p, although it may occasionally happen
that a multiple of p is discovered. Since the expected values of both t and u are
$\Theta(\sqrt{p})$, the expected number of steps required to produce the factor p is $\Theta(\sqrt{p})$.
This algorithm might not perform quite as expected, for two reasons. First, the
heuristic analysis of the running time is not rigorous, and it is possible that the cycle
of values, modulo p, could be much larger than $\sqrt{p}$. In this case, the algorithm
performs correctly but much more slowly than desired. In practice, this issue seems
to be moot. Second, the divisors of n produced by this algorithm might always be
one of the trivial divisors 1 or n. For example, suppose that $n = pq$, where p
and q are prime. It can happen that the values of t and u for p are identical with
the values of t and u for q, and thus the factor p is always revealed in the same
gcd operation that reveals the factor q. Since both factors are revealed at the same
time, the trivial divisor $pq = n$ is revealed, which is useless. Again, this problem
seems to be insignificant in practice. If necessary, we can restart the heuristic with
a different recurrence of the form $x_{i+1} = (x_i^2 - c) \bmod n$. (We should avoid the
values $c = 0$ and $c = 2$ for reasons we will not go into here, but other values are
fine.)
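One way the restart might look in code is sketched below. The function names, the particular list of constants, the fixed starting value, and the `max_steps` cutoff used to decide that an attempt has failed are all assumptions of this sketch; the text only says that a different c may be tried and that c = 0 and c = 2 should be avoided.

```python
import math


def rho_with_constant(n, c, x1=2, max_steps=10_000):
    """Run the rho heuristic with x_{i+1} = (x_i^2 - c) mod n.

    Returns the first nontrivial divisor found, or None if none shows up
    within max_steps iterations (a cutoff assumed for this sketch).
    """
    x = y = x1 % n
    i, k = 1, 2
    for _ in range(max_steps):
        i += 1
        x = (x * x - c) % n
        d = math.gcd(y - x, n)
        if d != 1 and d != n:
            return d
        if i == k:
            y, k = x, 2 * k
    return None


def find_factor(n, constants=(1, 3, 4, 5, 6, 7)):
    """Try several recurrences in turn, skipping c = 0 and c = 2."""
    for c in constants:
        d = rho_with_constant(n, c)
        if d is not None:
            return d
    return None


d = find_factor(1387)
print(d, 1387 // d)  # 19 73
```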
Of course, this analysis is heuristic and not rigorous, since the recurrence is
not really “random.” Nonetheless, the procedure performs well in practice, and
it seems to be as efficient as this heuristic analysis indicates. It is the method of
choice for finding small prime factors of a large number. To factor a $\beta$-bit composite
number n completely, we only need to find all prime factors less than $\lfloor n^{1/2} \rfloor$,
and so we expect POLLARD-RHO to require at most $n^{1/4} = 2^{\beta/4}$ arithmetic operations
and at most $n^{1/4} \beta^2 = 2^{\beta/4} \beta^2$ bit operations. POLLARD-RHO’s ability to find
a small factor p of n with an expected number $\Theta(\sqrt{p})$ of arithmetic operations is
often its most appealing feature.
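For a sense of scale, the two bounds can be evaluated for specific bit lengths. The figures below simply substitute $\beta = 64$ and $\beta = 1024$ into the expressions above; they treat the heuristic bounds as if they were exact and ignore constant factors, so they are rough indications only.

```latex
\begin{align*}
\beta = 64:   &\quad n^{1/4} = 2^{16} = 65{,}536,
               &\quad n^{1/4}\beta^2 &= 2^{16}\cdot 2^{12} = 2^{28} \approx 2.7\times 10^{8},\\
\beta = 1024: &\quad n^{1/4} = 2^{256} \approx 1.2\times 10^{77},
               &\quad n^{1/4}\beta^2 &= 2^{256}\cdot 2^{20} = 2^{276}.
\end{align*}
```

The second line is consistent with the earlier remark that an arbitrary 1024-bit number cannot feasibly be factored, while a small factor p is still found in about $\sqrt{p}$ steps regardless of the size of n.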
Exercises
31.9-1
Referring to the execution history shown in Figure 31.7(a), when does POLLARD-RHO
print the factor 73 of 1387?
31.9-2
Suppose that we are given a function $f : \mathbb{Z}_n \to \mathbb{Z}_n$ and an initial value $x_0 \in \mathbb{Z}_n$.
Define $x_i = f(x_{i-1})$ for $i = 1, 2, \ldots$. Let t and $u > 0$ be the smallest values such
that $x_{t+i} = x_{t+u+i}$ for $i = 0, 1, \ldots$. In the terminology of Pollard’s rho algorithm,
t is the length of the tail and u is the length of the cycle of the rho. Give an efficient
algorithm to determine t and u exactly, and analyze its running time.
31.9-3
How many steps would you expect POLLARD-RHO to require to discover a factor
of the form $p^e$, where p is prime and $e > 1$?
31.9-4 ★
One disadvantage of POLLARD-RHO as written is that it requires one gcd computation
for each step of the recurrence. Instead, we could batch the gcd computations
by accumulating the product of several $x_i$ values in a row and then using this
product instead of $x_i$ in the gcd computation. Describe carefully how you would
implement this idea, why it works, and what batch size you would pick as the most
effective when working on a $\beta$-bit number n.