
Integer Factorization

The document discusses integer factorization, specifically focusing on Pollard's rho heuristic, which is an effective method for factoring composite numbers. It outlines the algorithm's steps, its expected performance, and the conditions under which it operates, emphasizing that while it is not guaranteed to always find a factor, it is generally effective in practice. The analysis includes considerations about the algorithm's efficiency and potential pitfalls, but concludes that it remains a preferred method for finding small prime factors of large integers.

Uploaded by kunalrastogi13
Copyright © All Rights Reserved

31.9 Integer factorization 975

we won’t prove it here. The reason is that for a randomly chosen odd composite integer n, the expected number of nonwitnesses to the compositeness of n is likely to be very much smaller than $(n-1)/2$.
If the integer n is not chosen randomly, however, the best that can be proven is that the number of nonwitnesses is at most $(n-1)/4$, using an improved version of Theorem 31.38. Furthermore, there do exist integers n for which the number of nonwitnesses is $(n-1)/4$.

Exercises

31.8-1
Prove that if an odd integer n > 1 is not a prime or a prime power, then there exists
a nontrivial square root of 1 modulo n.

31.8-2 ⋆
It is possible to strengthen Euler’s theorem slightly to the form
$$a^{\lambda(n)} \equiv 1 \pmod{n} \quad \text{for all } a \in \mathbb{Z}_n^*,$$
where $n = p_1^{e_1} \cdots p_r^{e_r}$ and $\lambda(n)$ is defined by
$$\lambda(n) = \operatorname{lcm}(\phi(p_1^{e_1}), \ldots, \phi(p_r^{e_r})) . \tag{31.42}$$
Prove that $\lambda(n) \mid \phi(n)$. A composite number n is a Carmichael number if $\lambda(n) \mid n - 1$. The smallest Carmichael number is $561 = 3 \cdot 11 \cdot 17$; here, $\lambda(n) = \operatorname{lcm}(2, 10, 16) = 80$, which divides 560. Prove that Carmichael numbers must be both “square-free” (not divisible by the square of any prime) and the product of at least three primes. (For this reason, they are not very common.)
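As a quick numerical check of the 561 example, the following Python sketch evaluates $\lambda(n)$ directly from equation (31.42), using $\phi(p^e) = p^{e-1}(p-1)$, and tests the Carmichael condition $\lambda(n) \mid n - 1$. The function names and the `{prime: exponent}` input format are illustrative assumptions, not anything defined in the text, and the code only checks examples; it proves nothing about the exercise.

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a // gcd(a, b) * b

def carmichael_lambda(factorization):
    """lambda(n) as in equation (31.42), for n given as a dict {p: e} with n = prod p**e."""
    result = 1
    for p, e in factorization.items():
        phi_pe = p ** (e - 1) * (p - 1)   # phi(p**e) for a prime power
        result = lcm(result, phi_pe)
    return result

def is_carmichael(factorization):
    """True if n is composite and lambda(n) divides n - 1 (the exercise's definition)."""
    n = 1
    for p, e in factorization.items():
        n *= p ** e
    composite = sum(factorization.values()) > 1
    return composite and (n - 1) % carmichael_lambda(factorization) == 0

print(carmichael_lambda({3: 1, 11: 1, 17: 1}))  # 80
print(is_carmichael({3: 1, 11: 1, 17: 1}))      # True
```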

31.8-3
Prove that if x is a nontrivial square root of 1, modulo n, then $\gcd(x-1, n)$ and $\gcd(x+1, n)$ are both nontrivial divisors of n.
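A brute-force illustration for a small modulus, assuming we simply search every x with $x^2 \equiv 1 \pmod{n}$. For $n = 1387 = 19 \cdot 73$ (the modulus used in Figure 31.7) it finds the two nontrivial square roots, and for each one the two gcds recover both prime factors. The helper name `nontrivial_sqrts_of_one` is hypothetical, and the exhaustive search is only practical for tiny n.

```python
from math import gcd

def nontrivial_sqrts_of_one(n):
    """All x in [2, n - 2] with x*x ≡ 1 (mod n); exhaustive, so small n only."""
    return [x for x in range(2, n - 1) if x * x % n == 1]

n = 1387  # 19 * 73
for x in nontrivial_sqrts_of_one(n):
    print(x, gcd(x - 1, n), gcd(x + 1, n))
# 512 73 19
# 875 19 73
```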

⋆ 31.9 Integer factorization

Suppose we have an integer n that we wish to factor, that is, to decompose into a
product of primes. The primality test of the preceding section may tell us that n is
composite, but it does not tell us the prime factors of n. Factoring a large integer n
seems to be much more difficult than simply determining whether n is prime or
composite. Even with today’s supercomputers and the best algorithms to date, we
cannot feasibly factor an arbitrary 1024-bit number.
976 Chapter 31 Number-Theoretic Algorithms

Pollard’s rho heuristic


Trial division by all integers up to R is guaranteed to factor completely any number up to $R^2$. For the same amount of work, the following procedure, POLLARD-RHO, factors any number up to $R^4$ (unless we are unlucky). Since the procedure is only a heuristic, neither its running time nor its success is guaranteed, although the procedure is highly effective in practice. Another advantage of the POLLARD-RHO procedure is that it uses only a constant number of memory locations. (If you wanted to, you could easily implement POLLARD-RHO on a programmable pocket calculator to find factors of small numbers.)
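For comparison, here is a minimal Python sketch of the trial-division baseline: divide by every integer from 2 to R. If $n \le R^2$, whatever cofactor remains is 1 or a prime, because a composite cofactor $m \le R^2$ would have a prime factor at most $\sqrt{m} \le R$. The function name `trial_division` and its `(factors, cofactor)` return value are assumptions made for illustration.

```python
def trial_division(n, R):
    """Divide n by each d in 2..R; return (factors found, remaining cofactor).

    When n <= R**2 the cofactor is 1 or prime, so n is then fully factored.
    """
    factors = []
    for d in range(2, R + 1):
        while n % d == 0:
            factors.append(d)
            n //= d
    return factors, n

print(trial_division(1387, 38))  # ([19], 73), since 1387 <= 38**2 = 1444
```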

POLLARD-RHO(n)
1  i = 1
2  x_1 = RANDOM(0, n - 1)
3  y = x_1
4  k = 2
5  while TRUE
6      i = i + 1
7      x_i = (x_{i-1}^2 - 1) mod n
8      d = gcd(y - x_i, n)
9      if d ≠ 1 and d ≠ n
10         print d
11     if i == k
12         y = x_i
13         k = 2k
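The pseudocode translates almost line for line into Python. This sketch departs from POLLARD-RHO in two deliberate ways: it returns the first nontrivial divisor instead of printing forever, and it gives up after a hypothetical `max_steps` bound, since the heuristic is not guaranteed to succeed. The optional `x1` argument (also an addition) fixes the starting value so that the run in Figure 31.7 can be reproduced.

```python
import math
import random

def pollard_rho(n, x1=None, max_steps=1_000_000):
    """Return the first nontrivial divisor of n found by POLLARD-RHO, or None.

    Follows the pseudocode, except that it stops at the first divisor instead of
    printing forever and gives up after max_steps iterations of the loop.
    """
    x = random.randrange(n) if x1 is None else x1 % n  # line 2: x_1 = RANDOM(0, n - 1)
    y = x                                              # line 3: y = x_1
    k = 2                                              # line 4
    for i in range(2, max_steps + 2):                  # lines 5-6: i = 2, 3, ...
        x = (x * x - 1) % n                            # line 7: only the latest x_i is kept
        d = math.gcd(y - x, n)                         # line 8 (math.gcd ignores sign)
        if d != 1 and d != n:                          # line 9
            return d                                   # line 10, but stop rather than print
        if i == k:                                     # line 11
            y = x                                      # line 12: save x_k in y
            k = 2 * k                                  # line 13
    return None

print(pollard_rho(1387, x1=2))  # 19
```

Starting from $x_1 = 2$, the call returns 19 at iteration $i = 7$, matching Figure 31.7(a).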

The procedure works as follows. Lines 1–2 initialize i to 1 and $x_1$ to a randomly chosen value in $\mathbb{Z}_n$. The while loop beginning on line 5 iterates forever, searching for factors of n. During each iteration of the while loop, line 7 uses the recurrence
$$x_i = (x_{i-1}^2 - 1) \bmod n \tag{31.43}$$
to produce the next value of $x_i$ in the infinite sequence
$$x_1, x_2, x_3, x_4, \ldots , \tag{31.44}$$
with line 6 correspondingly incrementing i. The pseudocode is written using subscripted variables $x_i$ for clarity, but the program works the same if all of the subscripts are dropped, since only the most recent value of $x_i$ needs to be maintained. With this modification, the procedure uses only a constant number of memory locations.
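The constant-memory point can be made concrete with a Python generator that produces sequence (31.44) while holding only the current value. The name `rho_sequence` is illustrative; the output shown is the start of the sequence for $n = 1387$ and $x_1 = 2$.

```python
from itertools import islice

def rho_sequence(x1, n):
    """Yield x_1, x_2, x_3, ... of recurrence (31.43), storing only the current value."""
    x = x1 % n
    while True:
        yield x
        x = (x * x - 1) % n

print(list(islice(rho_sequence(2, 1387), 8)))
# [2, 3, 8, 63, 1194, 1186, 177, 814]
```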
Every so often, the program saves the most recently generated $x_i$ value in the variable y. Specifically, the values that are saved are the ones whose subscripts are powers of 2:
$$x_1, x_2, x_4, x_8, x_{16}, \ldots .$$
Line 3 saves the value $x_1$, and line 12 saves $x_k$ whenever i is equal to k. The variable k is initialized to 2 in line 4, and line 13 doubles it whenever line 12 updates y. Therefore, k follows the sequence 2, 4, 8, 16, … and always gives the subscript of the next value $x_k$ to be saved in y.
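To see exactly which values land in y, the following sketch replays lines 3 and 11–13 and records each saved $(i, x_i)$ pair. The function `saved_y_values` and its `steps` parameter are hypothetical helpers; for $n = 1387$ and $x_1 = 2$ it lists the values stored in y during the first 16 values of the sequence.

```python
def saved_y_values(x1, n, steps):
    """Return the (i, x_i) pairs that POLLARD-RHO stores in y for i = 1..steps.

    Mirrors line 3 (save x_1) and lines 11-13 (save x_k when i == k, then double k).
    """
    saved = [(1, x1)]
    x, k = x1, 2
    for i in range(2, steps + 1):
        x = (x * x - 1) % n
        if i == k:
            saved.append((i, x))
            k *= 2
    return saved

print(saved_y_values(2, 1387, 16))
# [(1, 2), (2, 3), (4, 63), (8, 814), (16, 595)]
```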
Lines 8–10 try to find a factor of n, using the saved value of y and the current value of $x_i$. Specifically, line 8 computes the greatest common divisor $d = \gcd(y - x_i, n)$. If line 9 finds d to be a nontrivial divisor of n, then line 10 prints d.
This procedure for finding a factor may seem somewhat mysterious at first. Note, however, that POLLARD-RHO never prints an incorrect answer; any number it prints is a nontrivial divisor of n. POLLARD-RHO might not print anything at all, though; it comes with no guarantee that it will print any divisors. We shall see, however, that we have good reason to expect POLLARD-RHO to print a factor p of n after $\Theta(\sqrt{p})$ iterations of the while loop. Thus, if n is composite, we can expect this procedure to discover enough divisors to factor n completely after approximately $n^{1/4}$ updates, since every prime factor p of n except possibly the largest one is less than $\sqrt{n}$.
We begin our analysis of how this procedure behaves by studying how long it takes a random sequence modulo n to repeat a value. Since $\mathbb{Z}_n$ is finite, and since each value in the sequence (31.44) depends only on the previous value, the sequence (31.44) eventually repeats itself. Once we reach an $x_i$ such that $x_i = x_j$ for some $j < i$, we are in a cycle, since $x_{i+1} = x_{j+1}$, $x_{i+2} = x_{j+2}$, and so on. The reason for the name “rho heuristic” is that, as Figure 31.7 shows, we can draw the sequence $x_1, x_2, \ldots, x_{j-1}$ as the “tail” of the rho and the cycle $x_j, x_{j+1}, \ldots, x_i$ as the “body” of the rho.
Let us consider the question of how long it takes for the sequence of $x_i$ to repeat. This information is not exactly what we need, but we shall see later how to modify the argument. For the purpose of this estimation, let us assume that the function
$$f_n(x) = (x^2 - 1) \bmod n$$
behaves like a “random” function. Of course, it is not really random, but this assumption yields results consistent with the observed behavior of POLLARD-RHO. We can then consider each $x_i$ to have been independently drawn from $\mathbb{Z}_n$ according to a uniform distribution on $\mathbb{Z}_n$. By the birthday-paradox analysis of Section 5.4.1, we expect $\Theta(\sqrt{n})$ steps to be taken before the sequence cycles.
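The birthday-paradox estimate can be checked empirically under exactly the assumption stated here, namely that each $x_i$ is an independent uniform draw from $\mathbb{Z}_n$. This Python sketch counts draws until the first repeat and averages over many trials; a standard birthday-problem result puts the mean near $\sqrt{\pi n / 2}$, which is $\Theta(\sqrt{n})$. The seed, n, and trial count are arbitrary choices.

```python
import math
import random

def draws_until_repeat(n, rng):
    """Draw independent uniform values from Z_n until one repeats; return the draw count."""
    seen = set()
    while True:
        v = rng.randrange(n)
        if v in seen:
            return len(seen) + 1   # all earlier draws were distinct; count the repeat too
        seen.add(v)

rng = random.Random(31)   # fixed seed, chosen arbitrarily, for reproducibility
n, trials = 10_000, 500
avg = sum(draws_until_repeat(n, rng) for _ in range(trials)) / trials
print(f"average {avg:.1f} vs sqrt(pi*n/2) = {math.sqrt(math.pi * n / 2):.1f}")
```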
Now for the required modification. Let p be a nontrivial factor of n such that $\gcd(p, n/p) = 1$. For example, if n has the factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, then we may take p to be $p_1^{e_1}$. (If $e_1 = 1$, then p is just the smallest prime factor of n, a good example to keep in mind.)

[Figure 31.7: (a) mod 1387, (b) mod 19, (c) mod 73]

Figure 31.7 Pollard’s rho heuristic. (a) The values produced by the recurrence $x_{i+1} = (x_i^2 - 1) \bmod 1387$, starting with $x_1 = 2$. The prime factorization of 1387 is $19 \cdot 73$. The heavy arrows indicate the iteration steps that are executed before the factor 19 is discovered. The light arrows point to unreached values in the iteration, to illustrate the “rho” shape. The shaded values are the y values stored by POLLARD-RHO. The factor 19 is discovered upon reaching $x_7 = 177$, when $\gcd(63 - 177, 1387) = 19$ is computed. The first x value that would be repeated is 1186, but the factor 19 is discovered before this value is repeated. (b) The values produced by the same recurrence, modulo 19. Every value $x_i$ given in part (a) is equivalent, modulo 19, to the value $x'_i$ shown here. For example, both $x_4 = 63$ and $x_7 = 177$ are equivalent to 6, modulo 19. (c) The values produced by the same recurrence, modulo 73. Every value $x_i$ given in part (a) is equivalent, modulo 73, to the value $x''_i$ shown here. By the Chinese remainder theorem, each node in part (a) corresponds to a pair of nodes, one from part (b) and one from part (c).

The sequence $\langle x_i \rangle$ induces a corresponding sequence $\langle x'_i \rangle$ modulo p, where
$$x'_i = x_i \bmod p$$
for all i.
Furthermore, because $f_n$ is defined using only arithmetic operations (squaring and subtraction) modulo n, we can compute $x'_{i+1}$ from $x'_i$; the “modulo p” view of the sequence is a smaller version of what is happening modulo n:
$$
\begin{aligned}
x'_{i+1} &= x_{i+1} \bmod p \\
&= f_n(x_i) \bmod p \\
&= ((x_i^2 - 1) \bmod n) \bmod p \\
&= (x_i^2 - 1) \bmod p && \text{(by Exercise 31.1-7)} \\
&= ((x_i \bmod p)^2 - 1) \bmod p \\
&= ((x'_i)^2 - 1) \bmod p \\
&= f_p(x'_i) .
\end{aligned}
$$
Thus, although we are not explicitly computing the sequence $\langle x'_i \rangle$, this sequence is well defined and obeys the same recurrence as the sequence $\langle x_i \rangle$.
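The derivation can be spot-checked numerically: iterate $f_n$ on $x_i$ and $f_p$ on $x'_i$ side by side, and compare $x_i \bmod p$ with $x'_i$. This assumes p divides n, as it does here (Exercise 31.1-7 needs that). The helper names `f` and `mod_p_view` are hypothetical.

```python
def f(x, m):
    """f_m(x) = (x^2 - 1) mod m."""
    return (x * x - 1) % m

def mod_p_view(x1, n, p, steps):
    """Return ([x_i mod p], [x'_i]); the second list iterates f_p alone, never touching n."""
    x, xp = x1, x1 % p
    reduced, direct = [], []
    for _ in range(steps):
        reduced.append(x % p)
        direct.append(xp)
        x, xp = f(x, n), f(xp, p)
    return reduced, direct

reduced, direct = mod_p_view(2, 1387, 19, 12)
print(reduced == direct)  # True
print(direct[:7])         # [2, 3, 8, 6, 16, 8, 6]
```

The values $x'_4 = x'_7 = 6$ are the congruences $x_4 = 63 \equiv 6$ and $x_7 = 177 \equiv 6 \pmod{19}$ noted in Figure 31.7(b).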
Reasoning as before, we find that the expected number of steps before the sequence $\langle x'_i \rangle$ repeats is $\Theta(\sqrt{p})$. If p is small compared to n, the sequence $\langle x'_i \rangle$ might repeat much more quickly than the sequence $\langle x_i \rangle$. Indeed, as parts (b) and (c) of Figure 31.7 show, the $\langle x'_i \rangle$ sequence repeats as soon as two elements of the sequence $\langle x_i \rangle$ are merely equivalent modulo p, rather than equivalent modulo n.
Let t denote the index of the first repeated value in the $\langle x'_i \rangle$ sequence, and let $u > 0$ denote the length of the cycle that has been thereby produced. That is, t and $u > 0$ are the smallest values such that $x'_{t+i} = x'_{t+u+i}$ for all $i \ge 0$. By the above arguments, the expected values of t and u are both $\Theta(\sqrt{p})$. Note that if $x'_{t+i} = x'_{t+u+i}$, then $p \mid (x_{t+u+i} - x_{t+i})$. Thus, $\gcd(x_{t+u+i} - x_{t+i}, n) > 1$.
Therefore, once POLLARD-RHO has saved as y any value $x_k$ such that $k \ge t$, then $y \bmod p$ is always on the cycle modulo p. (If a new value is saved as y, that value is also on the cycle modulo p.) Eventually, k is set to a value that is greater than u, and the procedure then makes an entire loop around the cycle modulo p without changing the value of y. The procedure then discovers a factor of n when $x_i$ “runs into” the previously stored value of y, modulo p, that is, when $x_i \equiv y \pmod{p}$.
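For the running example, t and u can be computed by brute force, recording the first index at which each value appears. This Python sketch stores every value it sees, so it uses $O(t + u)$ memory, unlike POLLARD-RHO itself; the function name `tail_and_cycle` is an illustrative assumption, and indices are 1-based as in sequence (31.44).

```python
def tail_and_cycle(x1, m):
    """Return (t, u) for x_1, x_2, ... under x -> (x^2 - 1) mod m, with 1-based indices.

    t is the index of the first value that later repeats and u is the cycle length,
    so x_{t+i} = x_{t+u+i} for all i >= 0.
    """
    first_index = {}
    x, i = x1 % m, 1
    while x not in first_index:
        first_index[x] = i
        x, i = (x * x - 1) % m, i + 1
    t = first_index[x]
    return t, i - t

for m in (19, 73, 1387):
    print(m, tail_and_cycle(2, m))
# 19 (3, 3)
# 73 (6, 4)
# 1387 (6, 12)
```

Modulo 19, $t = u = 3$. POLLARD-RHO saves $y = x_4$ (so $k = 4 \ge t$), and $x_7 \equiv x_4 \pmod{19}$ because $7 - 4 = u$, which is why the factor 19 appears at $x_7$ in Figure 31.7. Modulo 1387, the tail length is $\max(3, 6)$ and the cycle length is $\operatorname{lcm}(3, 4)$, consistent with the Chinese remainder theorem correspondence in the figure.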
Presumably, the factor found is the factor p, although it may occasionally happen that a multiple of p is discovered. Since the expected values of both t and u are $\Theta(\sqrt{p})$, the expected number of steps required to produce the factor p is $\Theta(\sqrt{p})$.
This algorithm might not perform quite as expected, for two reasons. First, the heuristic analysis of the running time is not rigorous, and it is possible that the cycle of values, modulo p, could be much larger than $\sqrt{p}$. In this case, the algorithm performs correctly but much more slowly than desired. In practice, this issue seems to be moot. Second, the divisors of n produced by this algorithm might always be one of the trivial divisors 1 or n. For example, suppose that $n = pq$, where p and q are prime. It can happen that the values of t and u for p are identical with the values of t and u for q, and thus the factor p is always revealed in the same gcd operation that reveals the factor q. Since both factors are revealed at the same time, the trivial divisor $pq = n$ is revealed, which is useless. Again, this problem seems to be insignificant in practice. If necessary, we can restart the heuristic with a different recurrence of the form $x_{i+1} = (x_i^2 - c) \bmod n$. (We should avoid the values $c = 0$ and $c = 2$ for reasons we will not go into here, but other values are fine.)
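A restart strategy along these lines might look like the following Python sketch. It makes c a parameter of the recurrence, tries $c = 1, 3, 4, 5, \ldots$ (skipping 0 and 2, as advised) with a fresh random starting value each time, and bounds each attempt by a hypothetical `max_steps`. The function names, the attempt limit, and the order in which c values are tried are all assumptions rather than part of the text.

```python
import math
import random

def rho_with_c(n, c, x1, max_steps):
    """POLLARD-RHO with x_{i+1} = (x_i^2 - c) mod n; first nontrivial divisor or None."""
    x = y = x1 % n
    k = 2
    for i in range(2, max_steps + 2):
        x = (x * x - c) % n
        d = math.gcd(y - x, n)
        if d != 1 and d != n:
            return d
        if i == k:
            y, k = x, 2 * k
    return None

def rho_with_restarts(n, max_steps=10_000, attempts=20, seed=None):
    """Retry with c = 1, 3, 4, 5, ... (never 0 or 2) and a fresh random x_1 each time."""
    rng = random.Random(seed)
    c = 1
    for _ in range(attempts):
        d = rho_with_c(n, c, rng.randrange(n), max_steps)
        if d is not None:
            return d
        c = 3 if c == 1 else c + 1
    return None

print(rho_with_c(1387, 1, 0, 1000))     # None: x_1 = 0 gives 0, 1386, 0, ... forever
print(rho_with_restarts(1387, seed=1))  # 19 or 73, depending on the random starts
```

The first call shows why a restart can be needed even with $c = 1$: from $x_1 = 0$ the sequence alternates between 0 and $n - 1$, so every gcd is 1 or n.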
Of course, this analysis is heuristic and not rigorous, since the recurrence is not really “random.” Nonetheless, the procedure performs well in practice, and it seems to be as efficient as this heuristic analysis indicates. It is the method of choice for finding small prime factors of a large number. To factor a $\beta$-bit composite number n completely, we only need to find all prime factors less than $\lfloor n^{1/2} \rfloor$, and so we expect POLLARD-RHO to require at most $n^{1/4} = 2^{\beta/4}$ arithmetic operations and at most $n^{1/4}\beta^2 = 2^{\beta/4}\beta^2$ bit operations. POLLARD-RHO’s ability to find a small factor p of n with an expected number $\Theta(\sqrt{p})$ of arithmetic operations is often its most appealing feature.
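Putting the pieces together, a complete factorization routine can recursively split n with Pollard’s rho and stop when a part passes a primality test. In this Python sketch the primality test is Miller-Rabin with the first twelve primes as fixed bases, which is known to be exact for $n < 2^{64}$; for larger n it should be treated as probabilistic. Stripping the factors 2, 3, and 5 first, and restarting with a new random c and $x_1$ whenever the gcd equals n, are design choices of this sketch, not something the text prescribes.

```python
import math
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_probable_prime(n):
    """Miller-Rabin with the fixed bases SMALL_PRIMES (exact for n < 2**64)."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:              # write n - 1 = 2**s * d with d odd
        d, s = d // 2, s + 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False           # a is a witness: n is composite
    return True

def rho_divisor(n, rng):
    """Return a nontrivial divisor of composite n, restarting whenever the gcd hits n."""
    while True:
        c = rng.randrange(1, n)
        if c == 2:                 # the text advises avoiding c = 0 and c = 2
            continue
        x = y = rng.randrange(n)
        i, k = 1, 2
        while True:
            i += 1
            x = (x * x - c) % n
            d = math.gcd(y - x, n)
            if d == n:
                break              # only the trivial divisor n: try a new c and x_1
            if d != 1:
                return d
            if i == k:
                y, k = x, 2 * k

def factor(n, rng=None):
    """Return the prime factors of n >= 1 in ascending order, with multiplicity."""
    rng = rng or random.Random(0)
    factors = []
    for p in (2, 3, 5):
        while n % p == 0:
            factors.append(p)
            n //= p
    pending = [n] if n > 1 else []
    while pending:
        m = pending.pop()
        if is_probable_prime(m):
            factors.append(m)
        else:
            d = rho_divisor(m, rng)
            pending.extend((d, m // d))
    return sorted(factors)

print(factor(1387))          # [19, 73]
print(factor(600851475143))  # [71, 839, 1471, 6857]
```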

Exercises

31.9-1
Referring to the execution history shown in Figure 31.7(a), when does POLLARD-RHO print the factor 73 of 1387?

31.9-2
Suppose that we are given a function $f : \mathbb{Z}_n \to \mathbb{Z}_n$ and an initial value $x_0 \in \mathbb{Z}_n$. Define $x_i = f(x_{i-1})$ for $i = 1, 2, \ldots$. Let t and $u > 0$ be the smallest values such that $x_{t+i} = x_{t+u+i}$ for $i = 0, 1, \ldots$. In the terminology of Pollard’s rho algorithm, t is the length of the tail and u is the length of the cycle of the rho. Give an efficient algorithm to determine t and u exactly, and analyze its running time.

31.9-3
How many steps would you expect POLLARD-RHO to require to discover a factor of the form $p^e$, where p is prime and $e > 1$?

31.9-4 ⋆
One disadvantage of POLLARD-RHO as written is that it requires one gcd computation for each step of the recurrence. Instead, we could batch the gcd computations by accumulating the product of several $x_i$ values in a row and then using this product instead of $x_i$ in the gcd computation. Describe carefully how you would implement this idea, why it works, and what batch size you would pick as the most effective when working on a $\beta$-bit number n.
