
RT Hcs

The document discusses the Pell equation x^2 - ny^2 = 1, where n is a nonsquare integer. It introduces the equation and its connection to irrational square roots. The document then: 1) Explains that solutions (x,y) to the equation form a group under composition of solutions. 2) Provides an example of finding all solutions to the equation x^2 - 2y^2 = 1 by composing the smallest nontrivial solution (3,2). 3) Notes that while the smallest solution depends on n in a mysterious way, once found all other solutions can be generated by a simple formula.

Uploaded by

london

5

The Pell equation

PREVIEW
The so-called Pell equation $x^2 - ny^2 = 1$ (wrongly attributed to Pell
by Euler) is one of the oldest equations in mathematics and it is
fundamental to the study of quadratic Diophantine equations. The
Greeks studied the special case $x^2 - 2y^2 = 1$ because they realized
that its natural number solutions throw light on the nature of $\sqrt{2}$.
There is a similar connection between the natural number solutions
of $x^2 - ny^2 = 1$ and $\sqrt{n}$ when $n$ is any nonsquare natural number.
The irrationality of $\sqrt{n}$ when $n$ is nonsquare causes strange behavior
in the solutions of $x^2 - ny^2 = 1$. Nevertheless, the irrationality of
$\sqrt{n}$ reflects light back on the equation: it leads to simple algebraic
structure, and a simple general formula for all integer solutions of
$x^2 - ny^2 = 1$ in terms of the smallest natural number solution.
But there is no simple formula for the smallest natural number solution
and it is not trivial even to prove that it exists. In this chapter we
give two proofs: the first is a relatively direct proof due to Dirichlet,
based on the approximation of $\sqrt{n}$ by rational numbers. The second
(in the starred sections at the end of the chapter) is based on a more
general theory of quadratic forms due to Conway.
We include Conway’s theory because it is a natural extension of
our study of the Euclidean algorithm (particularly the results in the
starred sections of Chapter 2) and because it gives a very simple
explanation of periodicity phenomena connected with the Pell equation
and $\sqrt{n}$. It also gives a highly visual approach to the subject,
which makes the complex behavior of the Pell equation surprisingly
easy to grasp.


5.1 Side and diagonal numbers


The ancient Greeks met the equation $x^2 - 2y^2 = 1$ in their efforts to
understand $\sqrt{2}$, the diagonal of the unit square, which they knew to be
irrational. They found a way to produce arbitrarily large solutions
$(x_1, y_1), (x_2, y_2), \ldots$ of this equation, and hence fractions $x_i/y_i$
that approximate $\sqrt{2}$ arbitrarily closely. The fractions $x_i/y_i$ tend
to $\sqrt{2}$, because if $x_i^2 - 2y_i^2 = 1$ then

\[ \frac{x_i^2}{y_i^2} = 2 + \frac{1}{y_i^2} \to 2 \quad \text{as } y_i \to \infty. \]

Thus if $y_i$ is the side of a square, $x_i$ approximates the diagonal.


The Greeks discovered the solutions $(x_i, y_i)$ among the “side numbers”
$s_i$ and “diagonal numbers” $d_i$ defined by

\[ d_1 = 3, \quad s_1 = 2, \qquad d_{i+1} = d_i + 2s_i, \quad s_{i+1} = d_i + s_i. \]

It follows from these equations that

\[ d_1^2 - 2s_1^2 = 1, \qquad d_{i+1}^2 - 2s_{i+1}^2 = -(d_i^2 - 2s_i^2). \]

Hence the odd-numbered pairs $(d_1, s_1), (d_3, s_3), (d_5, s_5), \ldots$ satisfy the
equation $x^2 - 2y^2 = 1$ while the rest satisfy $x^2 - 2y^2 = -1$.
The first equation is an example of a Pell equation, the general form of
which is $x^2 - ny^2 = 1$ where $n$ is a nonsquare integer. The second is closely
related to it; in fact we later look at all values of $x^2 - ny^2$ in order to see
whether they include the value 1.
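The recurrence for side and diagonal numbers is easy to experiment with. The following Python sketch is an illustration only; the function name `side_diagonal_pairs` is our own choice, not notation from the text. It generates the first few pairs and shows the alternation between the values 1 and −1.

```python
def side_diagonal_pairs(count):
    """Return the first `count` (diagonal, side) pairs (d_i, s_i).

    Hypothetical helper: starts from d_1 = 3, s_1 = 2 and applies
    d_{i+1} = d_i + 2*s_i, s_{i+1} = d_i + s_i.
    """
    d, s = 3, 2
    pairs = []
    for _ in range(count):
        pairs.append((d, s))
        d, s = d + 2 * s, d + s   # both updates use the old d and s
    return pairs


for i, (d, s) in enumerate(side_diagonal_pairs(6), start=1):
    # The last column is d_i^2 - 2*s_i^2, which alternates in sign.
    print(i, (d, s), d * d - 2 * s * s)
```

The odd-numbered rows give solutions of $x^2 - 2y^2 = 1$ and the even-numbered rows give solutions of $x^2 - 2y^2 = -1$.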

Irrational square roots


In dealing with equations $x^2 - ny^2 = 1$, where $n$ is a nonsquare integer, we
rely heavily on the irrationality of $\sqrt{n}$ proved in Section 2.5.
The upside of irrationality is that we can encode a pair of integers $(a, b)$
by a single real number $a + b\sqrt{n}$; we say that this number has rational part
$a$ and irrational part $b$. Rational and irrational parts are meaningful because
if $\sqrt{n}$ is irrational, $a_1, b_1, a_2, b_2 \in \mathbb{Z}$, and

\[ a_1 + b_1\sqrt{n} = a_2 + b_2\sqrt{n}, \]

then $a_1 = a_2$ and $b_1 = b_2$.

Suppose, on the contrary, that $b_1 \ne b_2$. Then

\[ a_1 - a_2 = (b_2 - b_1)\sqrt{n}, \]

and, since $b_2 - b_1 \ne 0$, we get $\sqrt{n} = \frac{a_1 - a_2}{b_2 - b_1}$. This contradicts
the irrationality of $\sqrt{n}$. Hence $b_1 = b_2$, and therefore $a_1 = a_2$. □

Exercises

In the sections that follow we use numbers of the form $x_i + y_i\sqrt{n}$ to encode
solution pairs of $x^2 - ny^2 = 1$. To give a taste of how this works, the following
two exercises use numbers of the form $a + b\sqrt{2}$ to encode (diagonal, side) pairs.

5.1.1 Check that $(1 + \sqrt{2})^2 = 3 + 2\sqrt{2}$ and that
\[ (x + y\sqrt{2})(1 + \sqrt{2}) = x + 2y + (x + y)\sqrt{2}. \]
5.1.2 Use induction to show from Exercise 5.1.1 that $(1 + \sqrt{2})^{n+1} = d_n + s_n\sqrt{2}$.
When $n$ is an integer square, the equation $x^2 - ny^2 = 1$ is not so interesting,
so we dispose of it right now.
5.1.3 By factorizing the left-hand side of $x^2 - y^2 = 1$, show that it has only two
integer solutions.
5.1.4 Show similarly that $x^2 - ny^2 = 1$ has only two integer solutions when $n$ is a
square positive integer.

5.2 The equation $x^2 - 2y^2 = 1$


It is straightforward to find all rational solutions of $x^2 - ny^2 = 1$ by
Diophantus’ method (draw the line of slope $t$ through the rational point $(1, 0)$).
Thus the method of solution is completely independent of $n$.
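To make the remark about rational solutions concrete, here is a small Python sketch. It is an illustration under our own assumptions: the function name `rational_solution` is hypothetical, and the formulas come from substituting $y = t(x - 1)$ into $x^2 - ny^2 = 1$ and cancelling the factor $x - 1$, which gives $x = (nt^2 + 1)/(nt^2 - 1)$ and $y = 2t/(nt^2 - 1)$.

```python
from fractions import Fraction


def rational_solution(n, t):
    """Second intersection of the line y = t(x - 1) with x^2 - n*y^2 = 1.

    Hypothetical helper. For nonsquare n and rational t the denominator
    n*t^2 - 1 cannot vanish, because sqrt(n) is irrational.
    """
    t = Fraction(t)
    denom = n * t * t - 1
    x = (n * t * t + 1) / denom
    y = 2 * t / denom
    return x, y


# Every rational slope t gives a rational point on the curve, whatever n is.
for t in (Fraction(1, 2), Fraction(-3, 7), 5):
    x, y = rational_solution(2, t)
    assert x * x - 2 * y * y == 1
```

The same function works unchanged for any nonsquare $n$, which is the sense in which the rational problem does not depend on $n$.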
It is a different matter to find even one integer solution of $x^2 - ny^2 = 1$
other than the obvious ones $(\pm 1, 0)$. The least positive solution $\ne (\pm 1, 0)$
depends on $n$ in a mysterious way. However, once this least nontrivial solution
is found, all other integer solutions are generated by a simple formula.
We illustrate the method for the case $n = 2$.
When $x^2 - 2y^2 = 1$ the smallest integer solution $\ne (\pm 1, 0)$ can be found
by trial to be $(3, 2)$. Other solutions can then be found by the following
composition rule: if $(x_1, y_1)$ and $(x_2, y_2)$ are solutions of $x^2 - 2y^2 = 1$, then
so is $(x_3, y_3)$, where $x_3$ and $y_3$ are defined by

\[ (x_1 + y_1\sqrt{2})(x_2 + y_2\sqrt{2}) = x_3 + y_3\sqrt{2}. \]

To show that this rule gives a new solution we first calculate $x_3$ and
$y_3$. Expanding the left-hand side, and collecting its rational and irrational
parts, we find that
\[ x_3 = x_1 x_2 + 2y_1 y_2, \qquad y_3 = x_1 y_2 + y_1 x_2. \]
It can then be checked by multiplication that
\[ (x_1 x_2 + 2y_1 y_2)^2 - 2(x_1 y_2 + y_1 x_2)^2 = (x_1^2 - 2y_1^2)(x_2^2 - 2y_2^2) = 1 \times 1 = 1. \]
Hence $x_3^2 - 2y_3^2 = 1$, as required. □
Examples. Composing the solution $(3, 2)$ with itself, we get a new solution
$(x_3, y_3)$, where
\[ x_3 + y_3\sqrt{2} = (3 + 2\sqrt{2})^2 = 9 + 8 + 12\sqrt{2} = 17 + 12\sqrt{2}. \]
Equating rational and irrational parts, $x_3 = 17$, $y_3 = 12$, which is indeed
another solution. If we then compose $(17, 12)$ with $(3, 2)$ we get
\[ (17 + 12\sqrt{2})(3 + 2\sqrt{2}) = 51 + 48 + (36 + 34)\sqrt{2} = 99 + 70\sqrt{2}, \]
hence another solution is $(99, 70)$, and so on. By this process we can obtain
infinitely many integer solutions, but it is not clear how close we are
to finding all integer solutions. The situation becomes clearer when we
observe that a group structure is present.
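The composition rule for $n = 2$ can be tried out directly. The sketch below is illustrative only; the names `compose` and `solutions_from` are our own. It reproduces the examples $(17, 12)$ and $(99, 70)$ by composing repeatedly with $(3, 2)$.

```python
def compose(p, q):
    """Compose two solutions of x^2 - 2y^2 = 1 (hypothetical helper).

    Implements x3 = x1*x2 + 2*y1*y2, y3 = x1*y2 + y1*x2.
    """
    (x1, y1), (x2, y2) = p, q
    return (x1 * x2 + 2 * y1 * y2, x1 * y2 + y1 * x2)


def solutions_from(base, count):
    """Return base, base composed with base, ... (`count` solutions in all)."""
    sols = []
    current = base
    for _ in range(count):
        sols.append(current)
        current = compose(current, base)
    return sols


for x, y in solutions_from((3, 2), 5):
    # The last column confirms that each pair satisfies x^2 - 2y^2 = 1.
    print((x, y), x * x - 2 * y * y)
```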

Exercises
Another way to arrive at the composition rule is to use the irrational factorization
\[ x^2 - 2y^2 = (x - y\sqrt{2})(x + y\sqrt{2}). \tag{*} \]
We suppose that $1 = x_1^2 - 2y_1^2$ and $1 = x_2^2 - 2y_2^2$, so that
\[ 1 = 1 \times 1 = (x_1^2 - 2y_1^2)(x_2^2 - 2y_2^2). \tag{**} \]
5.2.1 Apply the factorization (*) to each factor on the right-hand side of (**),
then combine the factors in a different way to show that
\[ 1 = [x_1 x_2 + 2y_1 y_2 - (x_1 y_2 + y_1 x_2)\sqrt{2}] \times [x_1 x_2 + 2y_1 y_2 + (x_1 y_2 + y_1 x_2)\sqrt{2}]. \]

5.2.2 Deduce from Exercise 5.2.1 that $x_3^2 - 2y_3^2 = 1$, where
\[ x_3 = x_1 x_2 + 2y_1 y_2 \quad \text{and} \quad y_3 = x_1 y_2 + y_1 x_2. \]

In Section 5.4 we generalize this method to find a composition rule for
solutions of $x^2 - ny^2 = 1$.

5.3 The group of solutions


Not only do solutions $(x_1, y_1)$ and $(x_2, y_2)$ of $x^2 - 2y^2 = 1$ have a “product”
$(x_1 x_2 + 2y_1 y_2, x_1 y_2 + y_1 x_2)$, corresponding to the product of numbers
\[ (x_1 + y_1\sqrt{2})(x_2 + y_2\sqrt{2}), \]
the numbers $x + y\sqrt{2}$ such that $x^2 - 2y^2 = 1$ include $1 = 1 + 0\sqrt{2}$ and the
multiplicative inverse $x - y\sqrt{2}$ of the number $x + y\sqrt{2}$:
\[ (x + y\sqrt{2})(x - y\sqrt{2}) = x^2 - 2y^2 = 1 \]
since $x^2 - 2y^2 = 1$ by the assumption that $(x, y)$ is a solution.

Thus the solutions $(x, y)$ form a group, with the same structure as the
set of numbers $x + y\sqrt{2}$, where $x, y$ are integers such that $x^2 - 2y^2 = 1$. To
understand this group we first focus on the subgroup of positive numbers
$x + y\sqrt{2}$ where $x^2 - 2y^2 = 1$.

Structure of positive solutions. The group of positive $x + y\sqrt{2}$, where
$(x, y)$ is an integer solution of $x^2 - 2y^2 = 1$, is the infinite cyclic group of
powers of $3 + 2\sqrt{2}$.

To see why, apply the log function to all the positive numbers $x + y\sqrt{2}$
where $x, y$ are integers such that $x^2 - 2y^2 = 1$. Since $\log(ab) = \log a + \log b$,
the resulting numbers $\log(x + y\sqrt{2})$ then form a group under +.
This group has a least positive element, $\log(3 + 2\sqrt{2})$, because

• $3 + 2\sqrt{2}$ is the least $x + y\sqrt{2}$ corresponding to solutions $(x, y)$ with
$x, y > 0$,

• solutions $(x, -y)$ with $y > 0$ are inverses of solutions $(x, y)$ with $x, y > 0$.
Hence the corresponding $x - y\sqrt{2}$ are $< 1$, and their logs are $< 0$.

But any such group of numbers consists of the integer multiples of its
least positive element $m$: if any element $k$ lies between multiples of $m$,

\[ mn < k < m(n + 1), \]

we also have $k - mn$ in the group, and the size of this element,

\[ 0 < k - mn < |m|, \]

contradicts the minimality of $m$. □

Thus all solutions $(x, y)$ of $x^2 - 2y^2 = 1$ for which $x + y\sqrt{2} > 0$ correspond
to powers of $3 + 2\sqrt{2}$. Now for any solution $(x, y)$ either $x + y\sqrt{2}$ or
$-x - y\sqrt{2}$ is $> 0$. Hence the remaining solutions $(x, y)$ are just the negatives
of those obtained from the powers of $3 + 2\sqrt{2}$.
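The claim that the powers of $3 + 2\sqrt{2}$ account for every solution can be checked numerically for small values. The sketch below is an illustration, not a proof; both function names are our own. It compares a direct search for solutions with $0 < y \le y_{\max}$ against the list produced by repeated multiplication by $3 + 2\sqrt{2}$.

```python
from math import isqrt


def brute_force_solutions(y_max):
    """All (x, y) with x > 0, 0 < y <= y_max and x^2 - 2y^2 = 1, by direct search."""
    found = []
    for y in range(1, y_max + 1):
        x_squared = 2 * y * y + 1
        x = isqrt(x_squared)
        if x * x == x_squared:     # 2y^2 + 1 is a perfect square
            found.append((x, y))
    return found


def powers_of_generator(y_max):
    """Pairs (u_k, v_k), k >= 1, with u_k + v_k*sqrt(2) = (3 + 2*sqrt(2))^k and v_k <= y_max."""
    result = []
    x, y = 3, 2
    while y <= y_max:
        result.append((x, y))
        # (x + y*sqrt(2)) * (3 + 2*sqrt(2)) = (3x + 4y) + (2x + 3y)*sqrt(2)
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return result


# The two lists agree: no solution is missed by the powers of the generator.
assert brute_force_solutions(500) == powers_of_generator(500)
```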

Exercises
Suppose we define integer pairs $(u_k, v_k)$ by the equation
\[ u_k + v_k\sqrt{2} = (3 + 2\sqrt{2})^k \quad \text{for all integers } k. \]

Then what we have just proved is that the pairs $(u_k, v_k)$ are all the integer solutions
$(x, y)$ of $x^2 - 2y^2 = 1$ with $x$ positive. It is now quite easy to express $u_k$ and $v_k$ as
explicit functions of $k$, though (not surprisingly) these functions involve $\sqrt{2}$.
5.3.1 Given that $(3 + 2\sqrt{2})^k = u_k + v_k\sqrt{2}$, what is $(3 - 2\sqrt{2})^k$?
5.3.2 Deduce from Exercise 5.3.1 that
\[ u_k = \frac{1}{2}\left[(3 + 2\sqrt{2})^k + (3 - 2\sqrt{2})^k\right], \qquad v_k = \frac{1}{2\sqrt{2}}\left[(3 + 2\sqrt{2})^k - (3 - 2\sqrt{2})^k\right]. \]

5.3.3 Deduce from Exercise 5.3.2 that $u_k$ = nearest integer to $(3 + 2\sqrt{2})^k/2$. And
$v_k = {?}$


5.4 The general Pell equation and $\mathbb{Z}[\sqrt{n}]$
If $n$ is a nonsquare integer we define
\[ \mathbb{Z}[\sqrt{n}] = \{x + y\sqrt{n} : x, y \in \mathbb{Z}\}. \]

Just as we used the numbers $x + y\sqrt{2}$ to study $x^2 - 2y^2 = 1$ we use the
numbers $x + y\sqrt{n}$ to study $x^2 - ny^2 = 1$.
In fact, $x^2 - ny^2$ is what we call the norm of $x + y\sqrt{n}$ in $\mathbb{Z}[\sqrt{n}]$, the
product of $x + y\sqrt{n}$ by its conjugate $x - y\sqrt{n}$:
\[ \operatorname{norm}(x + y\sqrt{n}) = (x - y\sqrt{n})(x + y\sqrt{n}) = x^2 - ny^2. \]

Thus finding solutions of the Pell equation is the same as finding elements
of $\mathbb{Z}[\sqrt{n}]$ with norm 1.

The advantage of searching in $\mathbb{Z}[\sqrt{n}]$, rather than among pairs $(x, y)$ of
integers, is that we can use algebra on numbers in $\mathbb{Z}[\sqrt{n}]$.

Brahmagupta composition rule. If $(x_1, y_1)$ and $(x_2, y_2)$ are both solutions
of the Pell equation $x^2 - ny^2 = 1$, then so is

\[ (x_3, y_3) = (x_1 x_2 + ny_1 y_2, \; x_1 y_2 + y_1 x_2). \]

This generalizes the “composition” rule used for $n = 2$ in Section 5.2 and
it may be proved as follows, using factorization in $\mathbb{Z}[\sqrt{n}]$.
Since $(x_1, y_1)$ and $(x_2, y_2)$ are solutions,

\[ x_1^2 - ny_1^2 = 1 = x_2^2 - ny_2^2. \]

Therefore

\begin{align*}
1 &= (x_1^2 - ny_1^2)(x_2^2 - ny_2^2) \\
  &= (x_1 - y_1\sqrt{n})(x_1 + y_1\sqrt{n}) \times (x_2 - y_2\sqrt{n})(x_2 + y_2\sqrt{n}) \\
  &= (x_1 - y_1\sqrt{n})(x_2 - y_2\sqrt{n}) \times (x_1 + y_1\sqrt{n})(x_2 + y_2\sqrt{n}) \\
  &= [x_1 x_2 + ny_1 y_2 - (x_1 y_2 + y_1 x_2)\sqrt{n}] \times [x_1 x_2 + ny_1 y_2 + (x_1 y_2 + y_1 x_2)\sqrt{n}] \\
  &= (x_1 x_2 + ny_1 y_2)^2 - n(x_1 y_2 + y_1 x_2)^2 \\
  &= x_3^2 - ny_3^2. \qquad \square
\end{align*}

This “composition” of solutions to form a new solution was discovered
by the Indian mathematician Brahmagupta around 600 CE (but without
using $\sqrt{n}$).
We also have an identity solution $(1, 0)$ and an inverse $(x, -y)$ of each
solution $(x, y)$, hence the solutions form a group, as we saw previously in
the special case $n = 2$. As in that case, we can prove that all solutions come
from powers of the smallest positive solution.
Example. Solutions of $x^2 - 3y^2 = 1$.
We find by trial that the smallest positive solution is $(2, 1)$. Composing
$(2, 1)$ with itself, and then with the result, we get the solutions

\[ (2 \times 2 + 3 \times 1 \times 1, \; 2 \times 1 + 1 \times 2) = (7, 4), \]
\[ (2 \times 7 + 3 \times 1 \times 4, \; 2 \times 4 + 1 \times 7) = (26, 15), \]

and so on. These solutions correspond to the powers of $2 + \sqrt{3}$.
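The Brahmagupta rule translates directly into code for general $n$. In the sketch below, offered as an illustration with our own function names, `compose` applies the rule once and `power` computes the $k$-fold composition by repeated squaring, which is one convenient way (not the only one) to reach large powers quickly.

```python
def compose(n, p, q):
    """Brahmagupta composition of two pairs for the form x^2 - n*y^2 (hypothetical helper)."""
    (x1, y1), (x2, y2) = p, q
    return (x1 * x2 + n * y1 * y2, x1 * y2 + y1 * x2)


def power(n, base, k):
    """k-fold composition of `base` with itself (k >= 0), by repeated squaring."""
    result = (1, 0)                       # identity solution
    while k > 0:
        if k & 1:
            result = compose(n, result, base)
        base = compose(n, base, base)
        k >>= 1
    return result


# Powers of (2, 1) for n = 3 reproduce the example: (2, 1), (7, 4), (26, 15), ...
for k in range(1, 6):
    x, y = power(3, (2, 1), k)
    print(k, (x, y), x * x - 3 * y * y)
```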

The calculation used to prove the Brahmagupta composition rule actually
shows a more general property, which holds not only with integer
coefficients $x, y$ but also with rational coefficients, that is, quotients of
integers. We use the symbol $\mathbb{Q}$ (“quotients”) for the rational numbers and
make the natural generalization of $\mathbb{Z}[\sqrt{n}]$ to
\[ \mathbb{Q}[\sqrt{n}] = \{x + y\sqrt{n} : x, y \in \mathbb{Q}\}. \]
This set of numbers is the set of quotients of elements of $\mathbb{Z}[\sqrt{n}]$ and it is a
number field, that is, closed under +, −, ×, and ÷ (by nonzero members).
The closure properties are easily checked by calculation (exercises).

We extend the definition of norm to $\mathbb{Q}[\sqrt{n}]$ by the same formula

\[ \operatorname{norm}(x + y\sqrt{n}) = x^2 - ny^2. \]

This formula remains meaningful because each element of $\mathbb{Q}[\sqrt{n}]$ is uniquely
expressible as $x + y\sqrt{n}$ with $x, y \in \mathbb{Q}$, by the argument of Section 5.1.

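Multiplicative property of the norm. For any $\alpha$ and $\beta$ in $\mathbb{Q}[\sqrt{n}]$
\[ \operatorname{norm}(\alpha)\operatorname{norm}(\beta) = \operatorname{norm}(\alpha\beta). \]

Proof. Let $\alpha = x_1 + y_1\sqrt{n}$ and $\beta = x_2 + y_2\sqrt{n}$. Then
\begin{align*}
\operatorname{norm}(\alpha)\operatorname{norm}(\beta) &= (x_1^2 - ny_1^2)(x_2^2 - ny_2^2) \\
  &= (x_1 x_2 + ny_1 y_2)^2 - n(x_1 y_2 + y_1 x_2)^2 \quad \text{by the calculation above} \\
  &= \operatorname{norm}(\alpha\beta). \qquad \square
\end{align*}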
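The multiplicative property can be spot-checked with exact rational arithmetic. The sketch below is an illustration only: it represents $x + y\sqrt{n}$ by the pair `(x, y)` of `Fraction` values, and the names `multiply` and `norm` are our own.

```python
from fractions import Fraction


def multiply(n, alpha, beta):
    """Product in Q[sqrt(n)], with x + y*sqrt(n) stored as the pair (x, y)."""
    (x1, y1), (x2, y2) = alpha, beta
    return (x1 * x2 + n * y1 * y2, x1 * y2 + y1 * x2)


def norm(n, alpha):
    """norm(x + y*sqrt(n)) = x^2 - n*y^2."""
    x, y = alpha
    return x * x - n * y * y


n = 7
alpha = (Fraction(3, 2), Fraction(-1, 5))
beta = (Fraction(4), Fraction(2, 3))
assert norm(n, alpha) * norm(n, beta) == norm(n, multiply(n, alpha, beta))

# With integers: 1 + sqrt(2) has norm -1, so its square 3 + 2*sqrt(2) has norm 1.
assert multiply(2, (1, 1), (1, 1)) == (3, 2)
```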

Exercises
5.4.1 Show that sums, differences, and products of numbers in $\mathbb{Q}[\sqrt{n}]$ are
themselves numbers in $\mathbb{Q}[\sqrt{n}]$.
5.4.2 Show that $1/(x + y\sqrt{n})$ for $x, y \in \mathbb{Q}$ (not both zero) is of the form $x' + y'\sqrt{n}$
for $x', y' \in \mathbb{Q}$. Deduce that $\mathbb{Q}[\sqrt{n}]$ is closed under ÷ by nonzero members.
The multiplicative property of the norm can be restated as follows.
5.4.3 If $(x_1, y_1)$ satisfies $x^2 - ny^2 = k_1$ and $(x_2, y_2)$ satisfies $x^2 - ny^2 = k_2$, show
that $(x_1 x_2 + ny_1 y_2, \; x_1 y_2 + y_1 x_2)$ satisfies $x^2 - ny^2 = k_1 k_2$.
Brahmagupta used this fact to solve $x^2 - ny^2 = 1$ via easier equations $x^2 - ny^2 = k$.
His method is most convenient when there is an obvious solution of $x^2 - ny^2 = -1$.
5.4.4 Find a nontrivial solution of $x^2 - 17y^2 = -1$ by inspection, and use it to find
a nontrivial solution of $x^2 - 17y^2 = 1$.
5.4.5 Similarly find a nontrivial solution of $x^2 - 37y^2 = 1$.

5.5 The pigeonhole argument


The smallest nontrivial solution of $x^2 - ny^2 = 1$ is not always so easy to
find as for $n = 2$ and $n = 3$. For example, the smallest nontrivial solution
of $x^2 - 61y^2 = 1$ is

(x, y) = (1766319049, 226153980)!

This amazing example was discovered by Bhaskara II in 12th century India
and rediscovered by Fermat.
The smallest nontrivial solution appears so unpredictably that its
existence is not clear in general. However, Lagrange proved in 1768 that if
$n$ is any nonsquare positive integer, the Pell equation $x^2 - ny^2 = 1$ has an
integer solution $\ne (\pm 1, 0)$.
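The contrast between small and large least solutions is easy to see by experiment. The sketch below is illustrative; the function name and the search limit `y_max` are our own assumptions. It finds the least solution by trial where that is feasible and verifies the quoted solution for $n = 61$ with exact integer arithmetic.

```python
from math import isqrt


def smallest_solution_by_trial(n, y_max):
    """Least (x, y) with 1 <= y <= y_max and x^2 - n*y^2 = 1, or None if there is none.

    Hypothetical helper: plain trial over y, so only practical when the
    least solution is small.
    """
    for y in range(1, y_max + 1):
        x_squared = n * y * y + 1
        x = isqrt(x_squared)
        if x * x == x_squared:
            return (x, y)
    return None


print(smallest_solution_by_trial(2, 100))      # trial succeeds at once for n = 2
print(smallest_solution_by_trial(61, 10**5))   # trial up to y = 100000 finds nothing

# The solution quoted in the text does satisfy the equation.
x, y = 1766319049, 226153980
assert x * x - 61 * y * y == 1
```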
An interesting new proof of this was given by Dirichlet around 1840.
He used what is now called the “pigeonhole principle”: if more than k
pigeons go into k boxes then at least one box contains at least two pigeons
(finite version); if infinitely many pigeons go into k boxes, then at least one
box contains infinitely many pigeons (infinite version).
Dirichlet’s argument can be subdivided into the following steps. First,
a theorem on the approximation of irrational numbers:

Dirichlet’s approximation theorem. For any irrational $\sqrt{n}$ and integer
$B > 0$ there are integers $a, b$ with $0 < b < B$ and
\[ |a - b\sqrt{n}| < \frac{1}{B}. \]
Proof. For any integer $B > 0$ consider the $B - 1$ numbers $\sqrt{n}, 2\sqrt{n}, \ldots,
(B - 1)\sqrt{n}$. For each multiplier $k$ choose the integer $A_k$ such that

\[ 0 < A_k - k\sqrt{n} < 1. \]

Since $\sqrt{n}$ is irrational, the $B - 1$ numbers $A_k - k\sqrt{n}$ are strictly between 0
and 1 and they are all different for the same reason (by the result of Section
5.1). Thus we have $B + 1$ different numbers
\[ 0, \; A_1 - \sqrt{n}, \; A_2 - 2\sqrt{n}, \; \ldots, \; A_{B-1} - (B - 1)\sqrt{n}, \; 1 \]

in the interval from 0 to 1.

If we then divide this interval into $B$ subintervals of length $1/B$, it follows
by the finite pigeonhole principle that at least one subinterval contains
two of the numbers. The difference between these two numbers, which is
of the form $a - b\sqrt{n}$ for some integers $a$ and $b$, is therefore irrational and
such that
\[ |a - b\sqrt{n}| < \frac{1}{B}. \]
Also, $b < B$ because $b$ is the difference of two positive integers less than $B$.
□
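The pigeonhole proof is constructive, and it can be followed step by step in code. The sketch below is our own illustration, with a hypothetical function name. It assumes $B \ge 2$ (so that some $b$ with $0 < b < B$ exists) and a nonsquare $n$, and it uses only integer arithmetic: since $Bk\sqrt{n}$ is irrational, the subinterval containing $A_k - k\sqrt{n}$ has index $BA_k - \lfloor Bk\sqrt{n} \rfloor - 1$.

```python
from math import isqrt


def dirichlet_pair(n, B):
    """Find integers (a, b) with 0 < b < B and |a - b*sqrt(n)| < 1/B.

    Hypothetical helper following the pigeonhole proof; assumes n is a
    nonsquare positive integer and B >= 2.
    """
    # Box j is the subinterval from j/B to (j+1)/B. The numbers 0 and 1
    # (multiplier 0) sit in the first and last boxes.
    seen = {0: (0, 0), B - 1: (1, 0)}
    for k in range(1, B):
        A = isqrt(n * k * k) + 1                     # so 0 < A - k*sqrt(n) < 1
        box = B * A - isqrt(n * B * B * k * k) - 1   # floor(B * (A - k*sqrt(n)))
        if box in seen:
            A0, k0 = seen[box]
            return (A - A0, k - k0)                  # difference of two numbers in one box
        seen[box] = (A, k)
    raise AssertionError("unreachable for nonsquare n and B >= 2")


print(dirichlet_pair(2, 10))
print(dirichlet_pair(61, 1000))
```

Because the multipliers are visited in increasing order, the returned $b = k - k_0$ is automatically positive.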
The next few steps are short and directed towards applications of the
infinite pigeonhole principle.

1. Since Dirichlet’s approximation theorem holds for all $B > 0$, we can
make $1/B$ arbitrarily small, thus forcing the choice of new values
of $a$ and $b$. Thus there are infinitely many integer pairs $(a, b)$ with
$|a - b\sqrt{n}| < 1/B$. Since $0 < b < B$, we have
\[ |a - b\sqrt{n}| < \frac{1}{b}. \]

2. It follows from step 1 that
\[ |a + b\sqrt{n}| \le |a - b\sqrt{n}| + |2b\sqrt{n}| \le |3b\sqrt{n}|, \]
and therefore
\[ |a^2 - nb^2| \le \frac{1}{b} \cdot 3b\sqrt{n} = 3\sqrt{n}. \]
Hence there are infinitely many $a - b\sqrt{n} \in \mathbb{Z}[\sqrt{n}]$ with norm of size
$\le 3\sqrt{n}$.

3. By the infinite pigeonhole principle we obtain in turn

• infinitely many $a - b\sqrt{n}$ with the same norm, $N$ say,
• infinitely many of these with $a$ in the same congruence class,
mod $N$,
• infinitely many of these with $b$ in the same congruence class,
mod $N$.

4. From step 3 we get two positive numbers, $a_1 - b_1\sqrt{n}$ and $a_2 - b_2\sqrt{n}$,
with

• the same norm $N$,
• $a_1 \equiv a_2 \pmod{N}$,
• $b_1 \equiv b_2 \pmod{N}$.

The final step uses the quotient $a - b\sqrt{n}$ of the two numbers just found.
Its norm $a^2 - nb^2$ is clearly 1 by the multiplicative property of norm. It
is not so clear that $a$ and $b$ are integers, but this now follows from the
congruence conditions in step 4.

Nontrivial solution of the Pell equation. When $n$ is a nonsquare positive
integer, the equation $x^2 - ny^2 = 1$ has an integer solution $(a, b) \ne (\pm 1, 0)$.

Proof. Consider the quotient $a - b\sqrt{n}$ of the two numbers $a_1 - b_1\sqrt{n}$ and
$a_2 - b_2\sqrt{n}$ found in step 4. We have
\begin{align*}
a - b\sqrt{n} = \frac{a_1 - b_1\sqrt{n}}{a_2 - b_2\sqrt{n}}
  &= \frac{(a_1 - b_1\sqrt{n})(a_2 + b_2\sqrt{n})}{a_2^2 - nb_2^2} \\
  &= \frac{a_1 a_2 - nb_1 b_2}{N} + \frac{a_1 b_2 - b_1 a_2}{N}\sqrt{n},
\end{align*}
where $N = a_2^2 - nb_2^2$ is the common norm of $a_1 - b_1\sqrt{n}$ and $a_2 - b_2\sqrt{n}$.
Since the latter numbers have equal norms, their quotient $a - b\sqrt{n}$ has norm
1 by the multiplicative property of norm (Section 5.4).
Since $a_1 - b_1\sqrt{n}$ and $a_2 - b_2\sqrt{n}$ are unequal and positive, their quotient
$a - b\sqrt{n} \ne \pm 1$. It remains to show that $a$ and $b$ are integers. This amounts
to showing that $N$ divides $a_1 a_2 - nb_1 b_2$ and $a_1 b_2 - b_1 a_2$, or that

\[ a_1 a_2 - nb_1 b_2 \equiv a_1 b_2 - b_1 a_2 \equiv 0 \pmod{N}. \]

The first congruence follows from the fact that $a_1^2 - nb_1^2 = N$, which
implies

\[ 0 \equiv a_1^2 - nb_1^2 \equiv a_1 a_1 - nb_1 b_1 \equiv a_1 a_2 - nb_1 b_2 \pmod{N}, \]

replacing one factor $a_1$ and one factor $b_1$ by their respective congruent values,
$a_1 \equiv a_2 \pmod{N}$ and $b_1 \equiv b_2 \pmod{N}$, found in step 4.
The second congruence follows from $a_1 \equiv a_2 \pmod{N}$ and $b_2 \equiv b_1
\pmod{N}$ by multiplying to obtain $a_1 b_2 \equiv b_1 a_2 \pmod{N}$, in other words,
$a_1 b_2 - b_1 a_2 \equiv 0 \pmod{N}$. □
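Dirichlet's argument can also be run as a (deliberately naive) search. The sketch below is our own illustration, not an efficient algorithm. For $b = 1, 2, 3, \ldots$ it takes $a$ to be the nearest integer to $b\sqrt{n}$, keeps the pairs with $|a^2 - nb^2| \le 3\sqrt{n}$, and stops as soon as two pairs share the same norm $N$ and the same residues of $a$ and $b$ mod $N$; their quotient is then a nontrivial solution. The function name is hypothetical, and $n$ is assumed to be a nonsquare positive integer.

```python
from math import isqrt


def pell_solution_by_pigeonhole(n):
    """Return some (x, y) with y > 0 and x^2 - n*y^2 = 1, found by a pigeonhole collision.

    Hypothetical helper; the result need not be the smallest solution.
    """
    seen = {}
    b = 1
    while True:
        r = isqrt(n * b * b)
        # Nearest integer to b*sqrt(n): round up exactly when b*sqrt(n) > r + 1/2.
        a = r + 1 if (2 * r + 1) ** 2 < 4 * n * b * b else r
        N = a * a - n * b * b                     # nonzero because n is nonsquare
        if N * N <= 9 * n:                        # same as |N| <= 3*sqrt(n)
            key = (N, a % abs(N), b % abs(N))
            if key in seen:
                a2, b2 = seen[key]
                # Quotient (a - b*sqrt(n)) / (a2 - b2*sqrt(n)); both divisions
                # are exact by the congruence conditions.
                x = (a * a2 - n * b * b2) // N
                y = (a * b2 - b * a2) // N
                return abs(x), abs(y)
            seen[key] = (a, b)
        b += 1


for n in (2, 3, 5, 6, 7):
    x, y = pell_solution_by_pigeonhole(n)
    print(n, (x, y), x * x - n * y * y)
```

The search terminates because every pair with $b \ge 2$ and $|a - b\sqrt{n}| < 1/b$ has $a$ equal to the nearest integer to $b\sqrt{n}$, so the infinitely many pairs of step 2 are all examined, while only finitely many keys are possible.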

5.6 ∗ Quadratic forms


Dirichlet’s pigeonhole argument is one of the neatest ways to prove the
existence of nontrivial solutions of the Pell equation and it contains ideas
that can be applied in other situations. Nevertheless, it is not obviously
relevant to other quadratic Diophantine equations, so there is reason to give a
second proof: one that draws on a general theory of quadratic forms.
A binary quadratic form $Ax^2 + Bxy + Cy^2$, where $A, B, C \in \mathbb{Z}$, can be
viewed as an integer-valued function of integer pairs, or vectors $(x, y)$.
Many classical questions in number theory are concerned with the values
of quadratic forms. For example, the Pell equation asks whether 1 is a value
of the form $x^2 - ny^2$, when $n$ is a nonsquare natural number. To approach
such questions we use two elementary properties of quadratic forms that
can be confirmed by simple algebra.
Properties of quadratic forms. If $f(x, y) = Ax^2 + Bxy + Cy^2$ and $v = (x, y)$
then

1. $f(kv) = k^2 f(v)$,

2. $f(v_1 + v_2) + f(v_1 - v_2) = 2[f(v_1) + f(v_2)]$.

Proof. 1. If $v = (x, y)$ then $kv = (kx, ky)$. Hence

\[ f(kv) = A(kx)^2 + B(kx)(ky) + C(ky)^2 = k^2(Ax^2 + Bxy + Cy^2) = k^2 f(v). \]

2. If $v_1 = (x_1, y_1)$ and $v_2 = (x_2, y_2)$ then

\[ f(v_1) = Ax_1^2 + Bx_1 y_1 + Cy_1^2 \quad \text{and} \quad f(v_2) = Ax_2^2 + Bx_2 y_2 + Cy_2^2. \]

Also

\begin{align*}
f(v_1 + v_2) &= A(x_1 + x_2)^2 + B(x_1 + x_2)(y_1 + y_2) + C(y_1 + y_2)^2 \\
  &= Ax_1^2 + Ax_2^2 + Bx_1 y_1 + Bx_2 y_2 + Cy_1^2 + Cy_2^2 \\
  &\quad + 2Ax_1 x_2 + Bx_2 y_1 + Bx_1 y_2 + 2Cy_1 y_2, \\
f(v_1 - v_2) &= A(x_1 - x_2)^2 + B(x_1 - x_2)(y_1 - y_2) + C(y_1 - y_2)^2 \\
  &= Ax_1^2 + Ax_2^2 + Bx_1 y_1 + Bx_2 y_2 + Cy_1^2 + Cy_2^2 \\
  &\quad - 2Ax_1 x_2 - Bx_2 y_1 - Bx_1 y_2 - 2Cy_1 y_2.
\end{align*}

Hence
\begin{align*}
f(v_1 + v_2) + f(v_1 - v_2) &= 2Ax_1^2 + 2Bx_1 y_1 + 2Cy_1^2 + 2Ax_2^2 + 2Bx_2 y_2 + 2Cy_2^2 \\
  &= 2[f(v_1) + f(v_2)]. \qquad \square
\end{align*}
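Both properties are easy to confirm numerically. In the sketch below (illustrative; `make_form` is our own helper name) a form is built from its coefficients and the two identities are checked on a grid of small integer vectors.

```python
def make_form(A, B, C):
    """Return the function f(v) = A*x^2 + B*x*y + C*y^2 for v = (x, y)."""
    def f(v):
        x, y = v
        return A * x * x + B * x * y + C * y * y
    return f


f = make_form(1, 0, -3)            # the form x^2 - 3y^2
small = range(-3, 4)
vectors = [(x, y) for x in small for y in small]

for v1 in vectors:
    # Property 1 with k = 5: f(kv) = k^2 f(v).
    assert f((5 * v1[0], 5 * v1[1])) == 25 * f(v1)
    for v2 in vectors:
        total = (v1[0] + v2[0], v1[1] + v2[1])
        diff = (v1[0] - v2[0], v1[1] - v2[1])
        # Property 2: f(v1 + v2) + f(v1 - v2) = 2[f(v1) + f(v2)].
        assert f(total) + f(diff) == 2 * (f(v1) + f(v2))
```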
A simple consequence of Property 1 is that $f(-v) = f(v)$, so a quadratic
form makes no distinction between a vector $v$ and its negative. Property 1
also says that $f(kv)$ is a multiple of $f(v)$; in particular $f(v)$ is prime (or
1) only for vectors $v = (x, y)$ that are not integer multiples of other integer
vectors, that is, for $(x, y)$ with relatively prime $x$ and $y$. We call these
primitive vectors.
In Section 2.8 we found a map of all the primitive vectors with positive
$x$ and $y$. We also found that the latter vectors are generated from $i = (1, 0)$
and $j = (0, 1)$ by the processes $(v_1, v_2) \to (v_1 + v_2, v_2)$ and $(v_1, v_2) \to
(v_1, v_1 + v_2)$. In the next section we see that vectors with $x$ and $y$ of opposite
sign are similarly generated from $(0, -1)$ and $(1, 0)$. Then Property 2
shows that there is a simple relation between the values of $f$ at successive
stages in these processes. This leads to a “map” of the values of $f$.

Equivalent forms
Another view of a quadratic form $f$, related to the one described above,
surveys all equivalent forms $f^*(x, y) = f(px + qy, rx + sy)$, obtained by
replacing the row vector $(x \;\; y)$ by
\[ (px + qy \quad rx + sy) = (x \quad y)\begin{pmatrix} p & r \\ q & s \end{pmatrix} = (x \quad y)M, \]
where the matrix $M$ and its inverse $M^{-1}$ both have integer entries. When $M$
satisfies these conditions, the pairs $(px + qy, rx + sy)$ run through the set $\mathbb{Z}^2$
of all integer pairs when $(x, y)$ does. Indeed, if $(x', y')$ is any integer pair,
we have
\[ (x' \quad y') = (x \quad y)M \iff (x \quad y) = (x' \quad y')M^{-1}. \]
Thus equivalent forms have the same set of values. Examples are $x^2 + y^2$
and $x^2 + 2xy + 2y^2$, the latter obtained from $x^2 + y^2$ when $(x, y)$ is replaced
by $(x + y, y)$.
When $M$ and $M^{-1}$ both have integer entries, then $\det M$ and $\det M^{-1}$
are both integers. Since
\[ MM^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]
it follows by taking the determinant of both sides that

\[ \det M \cdot \det M^{-1} = 1 \]

(due to the multiplicative property: $\det(M_1 M_2) = \det M_1 \cdot \det M_2$). The only
possible values for $\det M$ and $\det M^{-1}$ are therefore $\pm 1$. Thus the condition
for a matrix $M$ to define an equivalence of quadratic forms is that $M$ have
integer entries and that $\det M = ps - qr = \pm 1$. Such a matrix is called
unimodular.
Now an arbitrary quadratic form can be expressed as a matrix product,
\[ Ax^2 + Bxy + Cy^2 = (x \quad y)\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{*} \]
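So it follows from what we have just seen that any equivalent form is
obtained by replacing
\[ \begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} \quad \text{by} \quad M\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}M^{T}, \]
where $M$ is unimodular and $M^{T}$ is its transpose. This is so because the new
matrix effects the replacement of the row vector $(x \;\; y)$ by $(x \;\; y)M$, and
correspondingly of the column vector on the right of (*) by $M^{T}$ times that column.
Formula (*) reveals an invariant of the form $Ax^2 + Bxy + Cy^2$ under
equivalence, namely the determinant $AC - B^2/4$ of its matrix. Indeed, the
determinant of any equivalent,
\[ \det\left( M\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}M^{T} \right), \]
is equal (again by the multiplicative property of determinants) to
\begin{align*}
\det M \, \det\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} \det M^{T}
  &= (\pm 1)^2 \det\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} \\
  &= \det\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix},
\end{align*}
since $\det M = \det M^{T} = \pm 1$ by hypothesis. Thus all equivalents of the
form $Ax^2 + Bxy + Cy^2$ have the same determinant.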
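A change of variables can be carried out on the coefficients directly. The sketch below is an illustration with our own names: a form is the triple `(A, B, C)`, the matrix $M$ is given by its rows `((p, r), (q, s))`, and the new coefficients are obtained by expanding $f(px + qy, rx + sy)$. To stay within the integers it compares $4AC - B^2$, which is four times the determinant $AC - B^2/4$.

```python
def transform(form, M):
    """Coefficients of f*(x, y) = f(p*x + q*y, r*x + s*y), where M = ((p, r), (q, s))."""
    A, B, C = form
    (p, r), (q, s) = M
    return (A * p * p + B * p * r + C * r * r,
            2 * A * p * q + B * (p * s + q * r) + 2 * C * r * s,
            A * q * q + B * q * s + C * s * s)


def four_times_determinant(form):
    """4 * (A*C - B^2/4), kept as an integer."""
    A, B, C = form
    return 4 * A * C - B * B


# Replacing (x, y) by (x + y, y) turns x^2 + y^2 into x^2 + 2xy + 2y^2.
assert transform((1, 0, 1), ((1, 0), (1, 1))) == (1, 2, 2)

# A unimodular change of variables leaves the determinant unchanged.
pell_form = (1, 0, -3)                 # x^2 - 3y^2
M = ((2, 3), (1, 2))                   # ps - qr = 2*2 - 1*3 = 1
assert four_times_determinant(transform(pell_form, M)) == four_times_determinant(pell_form)
```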

Exercises
Although equivalent forms have the same determinant, the converse is not always
true. It so happens that the form $x^2 + y^2$ is equivalent to all other forms with
determinant 1, but $x^2 + 5y^2$ is not equivalent to all other forms with determinant 5.
5.6.1 Show that $13x^2 + 16xy + 5y^2$ has determinant 1, and that it is equivalent to
$x^2 + y^2$ via the matrix $M = \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix}$.
5.6.2 Show that $2x^2 + 2xy + 3y^2$ has the same determinant as $x^2 + 5y^2$, but that
it is not equivalent to $x^2 + 5y^2$, by showing that $x^2 + 5y^2$ does not take the
value 7.
5.6.3 More generally, show that $x^2 + 5y^2$ takes no values $\equiv 3$ or $7 \pmod{20}$, by
working out the possible values of $x^2 + 5y^2 \pmod{20}$.

5.7 ∗ The map of primitive vectors


In Section 2.8 we described a partition of the plane (a “map”) into regions
labelled by $(1, 0)$, $(0, 1)$ and all the primitive vectors $(a, b)$ of natural
numbers. Figure 5.1 (right half) shows this map again, rotated through 90°,
together with a near mirror image of it (left half) in which the second
coordinate of each pair has a negative sign.

[Figure 5.1: Two partial maps of primitive vectors. Right half: regions labelled (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (1, 3), (2, 3), (3, 2), (3, 1), with a schematic in which the region at the end of the edge between regions v1 and v2 is labelled v1 + v2. Left half: the mirror image, with regions labelled (0, −1), (1, 0), (1, −1), (1, −2), (2, −1), (1, −3), (2, −3), (3, −2), (3, −1).]

Also in the right half of the figure we have the schematic vector sum
rule that generates all the labels from $(1, 0)$ and $(0, 1)$, and in the left half
the mirror image rule that obviously applies there.

We put these two maps side by side because we want to join them
together, but we seem prevented from doing so by the incompatible labels,
$(0, 1)$ and $(0, -1)$, in the upper central region. The conflict can be resolved
by giving each label a ± sign. This yields Figure 5.2, which we call the
(complete) map of primitive vectors, for the obvious reason that it contains
every primitive vector. The ± labelling fuses the two vector sum rules into
the single vector difference/sum rule shown at the bottom of the Figure.

[Figure 5.2: The complete map of primitive vectors. Regions labelled ±(0, 1), ±(1, 0), ±(1, 1), ±(1, 2), ±(2, 1), ±(1, 3), ±(2, 3), ±(3, 2), ±(3, 1) on the right and ±(1, −1), ±(1, −2), ±(2, −1), ±(1, −3), ±(2, −3), ±(3, −2), ±(3, −1) on the left. The schematic at the bottom shows an edge with ±v1 above, ±v2 below, ±(v1 − v2) at its left end, and ±(v1 + v2) at its right end.]

This rule needs some clarification because of the ambiguous signs. In
a ± pair of vectors, say $\pm(1, 2)$, we are free to choose either $(1, 2)$ or
$-(1, 2)$ as $v_1$. Likewise for the pair, say $\pm(2, 3)$, labelling a region below
an edge of region $\pm v_1$: we can choose either $(2, 3)$ or $-(2, 3)$ to be $v_2$.
The vector difference/sum rule says that, for some choice of $v_1$ and $v_2$, the
region between $v_1$ and $v_2$ at the left end of their common edge is labelled
$\pm(v_1 - v_2)$ and the region at the right end is labelled $\pm(v_1 + v_2)$. In this
example the regions are as in Figure 5.3.

[Figure 5.3: Regions above, below, and at the ends of an edge. The edge has ±(1, 2) above, ±(2, 3) below, ±(1, 1) at its left end, and ±(3, 5) at its right end; the regions as they appear in the map are shown alongside the equivalent schematic.]

Figure 5.3 shows how lines may be deformed to conform with the
schematic diagram for the difference/sum rule—in particular the edge com-
mon to regions ±(1, 2) and ±(2, 3) is not really horizontal—within bounds
that preserve the meanings of “above”, “below”, “right end”, and “left end”
for the edge common to the regions ±(1, 2) and ±(2, 3). Here the choice

\[ v_1 = (1, 2), \; v_2 = (2, 3) \quad \text{gives} \quad v_1 + v_2 = (3, 5), \; v_1 - v_2 = -(1, 1), \]

so at the right end $\pm(3, 5) = \pm(v_1 + v_2)$ and at the left end $\pm(1, 1) =
\pm(v_1 - v_2)$, as required.
It follows from the vector sum rules in the separate left and right maps
in Figure 5.1 that the vector difference/sum rule holds in the complete map.
This is proved by a finite number of simple checks, similar to the example
above but more general. The details are left to the exercises.
The sign ambiguity $\pm(x, y)$ has no effect on the value of a quadratic
form because

\[ Ax^2 + Bxy + Cy^2 = A(-x)^2 + B(-x)(-y) + C(-y)^2. \]

Hence the map of primitive vectors gives an unambiguous map of all values
of the quadratic form $f(x, y) = Ax^2 + Bxy + Cy^2$ for relatively prime $x$ and
$y$, obtained by entering each value $f(a, b)$ in the region $\pm(a, b)$. Moreover,
it is possible to see some pattern in this map, thanks to the parallel between
the vector difference/sum rule and Property 2 of quadratic forms proved
in the previous section. We show this in the next section, assisted by the
invariance of the determinant $AC - B^2/4$ under change of variables. The
complete map also displays such changes, as we are about to see.

The tree of integral bases


In Section 5.6 we defined forms $f$, $f^*$ to be equivalent if $f^*(x, y)$ results
from $f(x, y)$ by replacing the vector $(x, y)$ by a vector $(px + qy, rx + sy)$,
which is equivalent to it in the sense that $(px + qy, rx + sy)$ runs through $\mathbb{Z}^2$
when $(x, y)$ does. Since

\[ (x, y) = x(1, 0) + y(0, 1) \quad \text{and} \quad (px + qy, rx + sy) = x(p, r) + y(q, s), \]

this amounts to replacing the vectors $(1, 0)$ and $(0, 1)$ by the new vectors
$(p, r)$ and $(q, s)$. We call the pair of vectors $(1, 0)$ and $(0, 1)$ an integral
basis of $\mathbb{Z}^2$ because any integer vector $(x, y)$ is a linear combination of
them with integer coefficients, namely $x(1, 0) + y(0, 1)$.

Equivalence says that the replacement $M : (x, y) \mapsto (px + qy, rx + sy)$ is
invertible, so the inverse matrix $M^{-1}$ has integer coefficients and the new
vectors also form an integral basis. Thus the criterion for a pair of vectors
$(p, r)$ and $(q, s)$ to form an integral basis is the criterion derived in Section
5.6 for $M$ and $M^{-1}$ to be integral, namely $ps - qr = \pm 1$.
Now in Section 2.7 we showed that, if $(p, r)$ and $(q, s)$ are labels on
two regions with a common edge in the map of relatively prime pairs, then

\[ ps - rq = \pm 1. \]

It is easily seen that this property extends to the complete map of Figure
5.2. Thus each edge in the map of primitive vectors represents an integral
basis of $\mathbb{Z}^2$, namely the pair of labels on the regions that meet along the
edge. The ± signs on the labels give four different bases, but they are
essentially the same. Since the edges of the map form a tree, and each
edge is associated in this way with an integral basis (up to sign), we call
the edge complex of the map of primitive vectors the tree of integral bases.
As the name suggests, the tree represents all integral bases. We do not
need this fact. However, it is easy to prove using the vector difference/sum
rule to implement a kind of Euclidean algorithm (see exercises).
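The criterion $ps - qr = \pm 1$ can be seen at work in a short computation. The sketch below is illustrative, with our own function names: it tests the criterion and, when it holds, finds the integer coefficients of an arbitrary integer vector in the basis by Cramer's rule, where dividing by the determinant $\pm 1$ is the same as multiplying by it.

```python
def is_integral_basis(u, v):
    """True when u = (p, r) and v = (q, s) satisfy ps - qr = +1 or -1."""
    (p, r), (q, s) = u, v
    return p * s - q * r in (1, -1)


def coordinates(u, v, w):
    """Integers (x, y) with w = x*u + y*v, assuming (u, v) is an integral basis."""
    (p, r), (q, s) = u, v
    det = p * s - q * r                 # +1 or -1 by assumption
    a, b = w
    # Cramer's rule for x*p + y*q = a, x*r + y*s = b; 1/det equals det here.
    x = (a * s - b * q) * det
    y = (b * p - a * r) * det
    return x, y


u, v = (2, 1), (3, 2)                   # labels of adjacent regions in the map
assert is_integral_basis(u, v)
x, y = coordinates(u, v, (5, -7))
assert (x * u[0] + y * v[0], x * u[1] + y * v[1]) == (5, -7)
assert not is_integral_basis((2, 0), (0, 1))   # determinant 2: not a basis
```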

Exercises
To prove that the vector difference/sum rule holds in the complete map of primitive
vectors we check that it holds in the middle and in “general position” on the right
and left.
5.7.1 Verify that the difference/sum rule holds in the middle of the map (Figure
5.4) by choosing $v_1 = (0, 1)$ and $v_2 = (1, 0)$.

[Figure 5.4: The middle of the complete map. An edge with ±(0, 1) above, ±(1, 0) below, ±(1, −1) at its left end, and ±(1, 1) at its right end.]

5.7.2 Figure 5.5 shows one “general position” on the right side of the complete
map. By choosing $v_1 = u_1$ and $v_2 = u_1 + u_2$, verify that the difference/sum
rule holds here.

[Figure 5.5: One “general position” on the right. Regions labelled ±u1, ±u2, ±(u1 + u2), and ±(2u1 + u2); the edge between ±u1 and ±(u1 + u2) has ±u2 at one end and ±(2u1 + u2) at the other.]

5.7.3 Work out which other general positions occur on the right and on the left
and verify that the difference/sum rule holds for each of them.
5.7.4 The “vector sum/difference rule” shown in Figure 5.6 is also valid. Why?

[Figure 5.6: The sum/difference rule. An edge with ±v1 above, ±v2 below, ±(v1 + v2) at its left end, and ±(v1 − v2) at its right end.]

To prove that the tree in the complete map represents all integral bases we
use the difference/sum and sum/difference rules to trace a path from a given basis
{(p, r), (q, s)} back to {(1, 0), (0, 1)}. Exercise 5.7.5 is an example, and Exercises
5.7.6–5.7.8 show why such a path can always be found.
5.7.5 By repeatedly subtracting the “smaller” vector from the “larger”, reduce the
pair {(35, 3), (23, 2)} to the pair {(1, 0), (11, 1)}. The latter pair is repre-
sented in the tree (why?), hence so is the former (why?).
5.7.6 Show that if
(p′, r′) = (p + q, r + s), (q′, s′) = (q, s)
or
(p′, r′) = (p, r), (q′, s′) = (p + q, r + s)
then ps − qr = ±1 ⇔ p′s′ − q′r′ = ±1.
5.7.7 By repeatedly adding or subtracting one vector from the other, show that
any pair {(p, r), (q, s)} with ps − qr = ±1 reduces to a pair of the form
{(p′, 0), (q′, s′)}. (Hint: gcd(r, s) = 1. Why?) Deduce from Exercise 5.7.6
that p′ = ±1, s′ = ±1.
5.7.8 Deduce that {(p′, 0), (q′, s′)} in Exercise 5.7.7 is represented by an edge in
the tree, and hence so is {(p, r), (q, s)}.

5.8 ∗ Periodicity in the map of x² − ny²


In the previous section we briefly mentioned how a map of any quadratic
form f may be superimposed on the map of primitive vectors by marking
the region ±v with the value f (v) = f (−v). We now investigate maps of
quadratic forms in more depth and, to get an idea of what to expect, we
first present the map of x² − 3y² in Figure 5.7. Only the right half is shown,
because the left half is its mirror image. The values are marked as numbers
in circles.

Region labels and values in the right half of the map:
(1, 3) −26
(1, 2) −11
(0, 1) −3
(2, 3) −23
(1, 1) −2
(1, 0) 1
(3, 2) −3
(2, 1) 1
(3, 1) 6

Figure 5.7: The map of x² − 3y²

In this map there seems to be a single dividing line between positive
and negative values of x² − 3y². Conway calls this line the river, and
we have drawn it heavily in Figure 5.7. On either side of the river the
values of x² − 3y² appear to increase in absolute value as one moves away
from it (which is why one expects there to be only one river). And, rather
unexpectedly, the values along the river seem to be periodic: in successive
regions “above” the river the values are −3, −2, −3, −2, ... and below each
pair of successive regions with values −3, −2 there is a single region with
value 1. Figure 5.8 confirms the pattern a bit further.

Regions above the river: (0, 1) −3, (1, 1) −2, (3, 2) −3, (5, 3) −2, (12, 7) −3, (19, 11) −2
Regions below the river: (1, 0) 1, (2, 1) 1, (7, 4) 1, (26, 15) 1

Figure 5.8: The river for x² − 3y²

If this pattern continues indefinitely, then we can generate the sequence
of positive solutions of the Pell equation x² − 3y² = 1, namely (2, 1), (7, 4),
(26, 15), ..., by applying the vector addition rule for the map of primitive
vectors to locate the successive regions with value 1 (see exercises).
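
The procedure just described can be sketched in Python. This is an illustration rather than part of the text: the function name river_solutions is made up, and the loop assumes (as the figures suggest and the rest of this section proves) that regions of value 1 keep recurring along the river. At each step the river edge separates a positive region pos from a negative region neg, and the region at the far end of the edge is their vector sum.

```python
from math import isqrt


def river_solutions(n, count):
    """Walk the river of f(x, y) = x^2 - n*y^2, starting from the edge
    between (1, 0) and (0, 1), and collect the regions of value 1."""
    if isqrt(n) ** 2 == n:
        raise ValueError("n must be a nonsquare natural number")

    def f(v):
        return v[0] ** 2 - n * v[1] ** 2

    pos, neg = (1, 0), (0, 1)          # f(pos) = 1 > 0 and f(neg) = -n < 0
    solutions = []
    while len(solutions) < count:
        new = (pos[0] + neg[0], pos[1] + neg[1])   # vector addition rule
        if f(new) > 0:
            pos = new                  # river passes above the new region
            if f(new) == 1:
                solutions.append(new)
        else:
            neg = new                  # river passes below the new region
    return solutions


print(river_solutions(3, 3))
```

For n = 3 the walk visits (1, 1), (2, 1), (3, 2), (5, 3), (7, 4), ... in the same order as the regions in Figure 5.8.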
The form x² − 3y² is typical of what happens with any
indefinite quadratic form, that is, one that takes both positive and negative
values but not the value zero. With the help of the following proposition
we can show that any indefinite quadratic form has a unique “river”, with
periodic behavior.
Arithmetic progression rule. If L, U, D, R (for “left”, “up”, “down”,
“right”) are the values of a quadratic form f around an edge as shown in
Figure 5.9 then

U
L R
D

Figure 5.9: Values in regions around an edge

1. L, U + D, R is an arithmetic progression.
2. If (p, r) and (q, s) respectively are the regions above and below the
edge, then the common difference in this progression is the coefficient
of xy in the quadratic form f (px + qy, rx + sy).

Proof. The difference/sum rule in the map of primitive vectors (Section
5.7) implies that

L = f (v1 − v2 ), U = f (v1 ), D = f (v2 ), R = f (v1 + v2 ),

where v1 and v2 are the regions above and below the middle edge. It then
follows from Property 2 of quadratic forms (Section 5.6) that

L + R = 2(U + D), or (U + D) − L = R − (U + D),

and this says that L, U + D, R is an arithmetic progression.


Recall from Section 5.7 that if the basis i = (1, 0), j = (0, 1) of Z² is replaced
by the basis v1 = (p, r), v2 = (q, s), then the form f(x, y) is replaced
by the equivalent form f*(x, y) = f(px + qy, rx + sy) = Ax² + Bxy + Cy²,
say. Also, the values of f at v1, v2, v1 + v2 and v1 − v2 are the same as
the values of f* at i, j, i + j and i − j, namely A, C, A + B + C and A − B + C
respectively.
Thus the common difference, (U + D) − L, of the arithmetic progression
is A + C − (A − B + C) = B, as claimed. □
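
Both parts of the rule are easy to spot-check numerically. The sketch below is illustrative only: check_progression is a made-up name, the form is written as ax² + hxy + by² with integer coefficients, and the expression for B comes from expanding f(px + qy, rx + sy) by hand.

```python
def check_progression(a, h, b, v1, v2):
    """For f(x, y) = a*x^2 + h*x*y + b*y^2 and regions v1 = (p, r) above,
    v2 = (q, s) below an edge, return ((L, U + D, R), B)."""
    def f(x, y):
        return a * x * x + h * x * y + b * y * y

    (p, r), (q, s) = v1, v2
    U, D = f(p, r), f(q, s)
    L, R = f(p - q, r - s), f(p + q, r + s)
    # Coefficient of xy in f(px + qy, rx + sy)
    B = 2 * a * p * q + h * (p * s + q * r) + 2 * b * r * s
    return (L, U + D, R), B


# x^2 - 3y^2 around the edge between (2, 1) and (1, 1)
(L, M, R), B = check_progression(1, 0, -3, (2, 1), (1, 1))
print(L, M, R, B)
```

In this example L, U + D, R is 1, −1, −3, an arithmetic progression whose common difference −2 equals B.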

Part 1 of the arithmetic progression rule is enough to show:


Uniqueness of the river. For any form x² − ny², where n is a nonsquare
natural number, there is a unique edge path in the map of primitive vectors
that separates regions of positive value from regions of negative value.
Proof. Such a form is never zero, because x² − ny² = 0 implies n = x²/y²
is a square; and x² − ny² certainly takes both positive and negative values.
Consider a place on its map where a region of value L < 0 meets two
regions with values U, D > 0 as in Figure 5.9. (If the region with value
L is actually on the right, it is still true that L, U + D, R is an arithmetic
progression.)
Then Part 1 implies that R − (U + D) = (U + D) − L > U + D, hence
R > max(U, D). Thus moving one edge away from the border between
positive and negative values leads to a region of greater positive value.
More generally, if D > max(U, L) then R > D by a similar application
of Part 1, so it follows that values of regions continually increase as we
move further from the negative region. Similarly, values on the negative
side continually decrease as we move further from the boundary path between
positive and negative regions. Hence there is only one edge path
separating the positive- from negative-valued regions. □

We need Part 2 of the arithmetic progression rule to prove the more
difficult periodicity property, which guarantees the existence of nontrivial
solutions of the Pell equation.
Periodicity of the river. When n is a nonsquare natural number, the pattern
of values along the sides of the river for x² − ny² is periodic.
Proof. It will suffice to prove that regions sharing edges with the river are
bounded in absolute value. Indeed, if that is so, the values L, U and D in
Figure 5.9 around some edge in the river will recur; hence so will the value
R (being determined by L, U and D according to the arithmetic progression
rule), whose region also shares an edge with the river, and so on.
As we saw in the proof of Part 2, the values U and D equal A and
C, where Ax² + Bxy + Cy² is a quadratic form f* equivalent to f(x, y) =
x² − ny². But we know from Section 5.6 that the determinant AC − B²/4
is the same for all equivalents f* of f. Here A and C, being the values of
regions on opposite sides of the river, have opposite signs. Hence

|AC − B²/4| = |A||C| + B²/4.

Since AC − B²/4 is constant (it equals −n, its value for f itself) and A and C
are nonzero integers, it follows that |A| and |C|, the absolute values of U and D,
are at most n, and hence bounded as required. □
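
The bound in this proof can be watched in action by tracking the equivalent form Ax² + Bxy + Cy² along the river instead of the vectors. The sketch below is an illustration under stated assumptions: river_forms is a made-up name, and the two update rules are derived here (not in the text) by substituting the new basis, (v1 + v2, v2) when A + B + C > 0 and (v1, v1 + v2) otherwise, into the current form.

```python
from math import isqrt


def river_forms(n, steps):
    """Return the forms (A, B, C) met along the river of x^2 - n*y^2,
    where A > 0 and C < 0 are the values on the two sides of each edge."""
    if isqrt(n) ** 2 == n:
        raise ValueError("n must be a nonsquare natural number")
    A, B, C = 1, 0, -n                 # the form itself, basis (1, 0), (0, 1)
    forms = [(A, B, C)]
    for _ in range(steps):
        s = A + B + C                  # value of the region v1 + v2
        if s > 0:
            A, B = s, B + 2 * C        # new basis (v1 + v2, v2)
        else:
            B, C = 2 * A + B, s        # new basis (v1, v1 + v2)
        forms.append((A, B, C))
    return forms


for A, B, C in river_forms(3, 6):
    # 4|A||C| + B^2 stays equal to 4n, so |A| and |C| never exceed n
    print((A, B, C), 4 * abs(A) * abs(C) + B * B)
```

For n = 3 the triple (1, 0, −3) returns after three steps, which is the period visible in Figure 5.8.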

Exercises
The “Pell quadratic forms” x² − ny² are by no means the only indefinite forms.
Another interesting example is x² + xy − y², which is related to the golden ratio
(1 + √5)/2 and the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, ....
5.8.1 Show that x² + xy − y² = (x + y(1 + √5)/2)(x + y(1 − √5)/2) and deduce from this
that the form x² + xy − y² is indefinite.
5.8.2 Construct enough of the river for x² + xy − y² to show that its period looks
like Figure 5.10.

−1
1

Figure 5.10: The period of x² + xy − y²



5.8.3 Show that the positive labels (x_i, y_i) alternately below and above the river
(in the regions marked alternately 1 and −1) satisfy

(x_1, y_1) = (1, 1), (x_{i−1}, y_{i−1}) + (x_i, y_i) = (x_{i+1}, y_{i+1}).

5.8.4 Deduce from Exercise 5.8.3 that the natural number pairs satisfying the
equation x² + xy − y² = 1 are (F_{2n+1}, F_{2n+2}) for n = 0, 1, 2, 3, ..., where
F_1 = F_2 = 1 and F_i + F_{i−1} = F_{i+1} (the Fibonacci sequence).
Periodicity in the shape of the river leads naturally to recurrence relations
between the vectors labelling riverside regions. The Fibonacci relation arising
from x² + xy − y² is the simplest example of such a recurrence relation. Another
is the relation for x² − 3y², whose river was constructed above.
5.8.5 Use two successive periods in the river for x² − 3y² to show that the non-
negative solutions (x_i, y_i) of x² − 3y² = 1 satisfy

(x_0, y_0) = (1, 0), (x_{i+1}, y_{i+1}) = 4(x_i, y_i) − (x_{i−1}, y_{i−1}).

The river also shows why certain equations do not have solutions.
5.8.6 Explain why the equation x² − 3y² = −1 has no integer solution.
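
The recurrences in Exercises 5.8.4 and 5.8.5 can be checked numerically before attempting the proofs. The Python sketch below is only a sanity check, not a solution: the function names are made up, and the second starting value (2, 1) is taken from Figure 5.8.

```python
def pell3_by_recurrence(count):
    """Generate pairs from (x0, y0) = (1, 0), (x1, y1) = (2, 1) using
    (x_{i+1}, y_{i+1}) = 4(x_i, y_i) - (x_{i-1}, y_{i-1})."""
    sols = [(1, 0), (2, 1)]
    while len(sols) < count:
        (x0, y0), (x1, y1) = sols[-2], sols[-1]
        sols.append((4 * x1 - x0, 4 * y1 - y0))
    return sols[:count]


def fibonacci_pairs(count):
    """Return (F_1, F_2), (F_3, F_4), (F_5, F_6), ... with F_1 = F_2 = 1."""
    a, b = 1, 1
    pairs = []
    for _ in range(count):
        pairs.append((a, b))
        a, b = a + b, a + 2 * b        # (F_k, F_{k+1}) -> (F_{k+2}, F_{k+3})
    return pairs


for x, y in pell3_by_recurrence(5):
    print((x, y), x * x - 3 * y * y)
for x, y in fibonacci_pairs(4):
    print((x, y), x * x + x * y - y * y)
```

Every pair printed gives the value 1 for its form.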

5.9 Discussion
The Pell equation x² − ny² = 1 is one of the oldest and most important
quadratic Diophantine equations. Probably its only rival is the Pythagorean
equation x² + y² = z². The Pell equation also dates back to the time of the
Pythagoreans (around 500 BCE), who studied the special case x² − 2y² = 1
in connection with √2, as mentioned in Section 5.1.


Another famous Pell equation is due to Archimedes. His “cattle problem”
leads to the Pell equation x² − 4729494y² = 1, the least nontrivial solution
of which has an x with 206545 digits! This solution was surely not
known to Archimedes, though perhaps he knew that Pell equations could
have remarkably large solutions. For an excellent discussion of the cattle
problem, and the computational issues it raises, see Lenstra (2002).
The Pell equation was rediscovered in India, where mathematicians
were also fascinated by short questions with long answers. Around 600
CE, Brahmagupta discovered the formula for composing solutions we used
in Section 5.4. He used a generalization of it to find the minimal solution
(1151, 120) of x² − 92y² = 1 (saying that “a person solving this equation
within a year is a mathematician”). In 1150 CE Bhaskara II extended Brahmagupta’s
idea to a method that solves all Pell equations, illustrating it with
the well chosen example x² − 61y² = 1. He found its minimal solution,
(1766319049, 226153980), which is by far the largest minimal solution of
any Pell equation x² − ny² = 1 with n ≤ 61.
In Europe nothing was known of the Indian discoveries, but the Pell
equation resurfaced in the 17th century when Fermat independently dis-
covered how to solve it. He did not reveal his method, but he evidently
knew what he was doing, because he too picked x² − 61y² = 1 as a challenge
to other mathematicians. He also posed the even more formidable
equation x² − 109y² = 1, the minimal solution of which is

(158070671986249, 15140424455100).

His English rivals Wallis and Brouncker rose to the challenge with a method
that solves the Pell equation, not unlike the method of Bhaskara II (see Weil
(1984), p. 94). In the 18th century these methods morphed into the simpler
and more elegant continued fraction algorithm, which can be viewed as the
Euclidean algorithm applied to the pair (√n, 1).
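
For readers who want to reproduce the minimal solutions quoted above, here is a minimal Python sketch of the continued fraction method. It uses the standard integer recurrence for the continued fraction of √n and its convergents; it is not a reconstruction of the methods of Bhaskara II, Wallis or Brouncker, and the function name is illustrative.

```python
from math import isqrt


def pell_minimal_solution(n):
    """Return the least positive (x, y) with x^2 - n*y^2 = 1, found as
    the first convergent h/k of sqrt(n) with h^2 - n*k^2 = 1."""
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must be a nonsquare natural number")
    m, d, a = 0, 1, a0                 # sqrt(n) = a0 + ..., complete quotient (sqrt(n) + m)/d
    h_prev, h = 1, a0                  # numerators of successive convergents
    k_prev, k = 0, 1                   # denominators of successive convergents
    while h * h - n * k * k != 1:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d              # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


for n in (92, 61, 109):
    print(n, pell_minimal_solution(n))
```

Python integers have arbitrary precision, so the large values for n = 61 and n = 109 come out exactly.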
All of these methods are based on the observation of periodicity in certain
computations. It is likely that the ancient Greeks observed periodicity
in the Euclidean algorithm, because simple geometric arguments show its
periodicity on pairs such as (√2, 1) and (√3, 1) (see, for example, Stillwell
(1998), p. 268, or Artmann (1999), p. 242). However, while many could
use periodicity to solve instances of the Pell equation, the first to prove
that periodicity always occurs was Lagrange (1768). He thereby showed
that the continued fraction method always works. He underlined the im-
portance of this result by showing that solving the Pell equation leads to
the solution of all quadratic Diophantine equations in two variables.
Conway’s visual approach, expounded in Sections 5.6–5.8, is certainly
related to the old approaches to the Pell equation. But it is essentially
simpler in that it replaces a process (the Euclidean algorithm) by a picture
(the map of primitive vectors). I have attempted to make this as clear as
possible by deriving the map of primitive vectors and its properties directly
from properties of the Euclidean algorithm, before imprinting the values
of a quadratic form on it. (Conway assumes the simplest properties of the
map, or sketches topological proofs, and proves others with the help of
quadratic forms.) For further insights obtainable from Conway’s approach,
see the book Conway (1997) or his related video ax² + hxy + by² available
from the American Mathematical Society.
http://www.springer.com/978-0-387-95587-2
