Mathematics Magazine
EDITOR
Walter Stromquist
ASSOCIATE EDITORS
Bernardo M. Ábrego
California State University, Northridge
Paul J. Campbell
Beloit College
Annalisa Crannell
Franklin & Marshall College
Deanna B. Haunsperger
Carleton College
Warren P. Johnson
Connecticut College
Victor J. Katz
University of District of Columbia, retired
Keith M. Kendig
Cleveland State University
Roger B. Nelsen
Lewis & Clark College
Kenneth A. Ross
University of Oregon, retired
David R. Scott
University of Puget Sound
Paul K. Stockmeyer
College of William & Mary, retired
Harry Waldman
MAA, Washington, DC
LETTER FROM THE EDITOR
The cover refers to Mad Vet puzzles, in which animals are transformed into other
animals. These puzzles are the starting point for the article by Gene Abrams and
Jessica Sklar in this issue. They show how each of these puzzles is related to
a particular semigroup. Understand the semigroup and solve the puzzle! From
there they find connections to graph theory and to current research.
Other animals (some horses, but also beasts like Lebesgue measure) take the
stage when Julia Barnes and Lorelei Koss invite us to their carnival. It is a carnival
of mappings, exploring the implications of G. D. Birkhoff's Ergodic Theorem.
Ever drill a hole through the center of a sphere? In calculus problems, perhaps.
Vincent Coll and Jeff Dodd consider what other solids you might drill through
instead. The diameters of the Earth and of a hydrogen atom are mentioned.
Danielle Arett and Suzanne Dorée tell us about Tower of Hanoi graphs. They
explore properties of these graphs and use them to derive combinatorial
identities. Arett was Dorée's student at Augsburg College when this work began.
In the Notes section, Todd Will gives us a definitive treatment of a
sums-of-squares problem, partly by combining (and sometimes reconciling) old results.
There are also pieces by Ron Hirshon on random walks with barriers (or gambling
games, if we prefer), Christopher Frayer on polynomial root squeezing, and
Alexander Kheifets and James Propp on integration by parts. At the back of the
issue are problems, solutions, and results from the 50th International
Mathematical Olympiad.
But let us begin with some beginnings. Ko-Wei Lih introduces us to a magic
square from 18th-century Korea, long before Euler's work on Latin squares.
Could Choe's square have influenced Benjamin Franklin? He would surely have
been interested, and it was in print before he was ten years old.
Walter Stromquist, Editor
ARTICLES
A Remarkable Euler Square before Euler
KO-WEI LIH
Institute of Mathematics
Academia Sinica
Nankang, Taipei 115, Taiwan
[email protected]
Orthogonal Latin squares and Choe's configuration
A Latin square of order n is formed when the cells of an $n \times n$ square array are filled
with elements taken from a set of cardinality n so that all cells along any row or any
column are occupied with distinct elements. A notion of orthogonality between two
Latin squares can be defined as follows. We may juxtapose two Latin squares A and B
of order n into one square array so that each cell is occupied with an ordered pair, first
component from A and second component from B. When all $n^2$ of these ordered pairs
are distinct, we say that A is orthogonal to B. Obviously, this orthogonality relation is
symmetric. The juxtaposition of two orthogonal Latin squares is called a Graeco-Latin
square by Euler, who was the first to study the properties of Latin and Graeco-Latin
squares in a short paper [2] written in 1776. His motivation was to produce magic
squares from Graeco-Latin squares. We call a Graeco-Latin square an Euler square in
this article.
A magic square of order n is an arrangement of the numbers $1, 2, \ldots, n^2$ into an
$n \times n$ square array so that the sum of numbers along any row, any column, or either of
the two main diagonals is equal to the fixed number $n(n^2+1)/2$.
To make things simpler, we always suppose that a Latin square of order n is filled
with numbers from the set $\{1, 2, \ldots, n\}$. Euler used the simple algorithm of mapping
the pair $(x, y)$ into the number $n(x-1)+y$ to convert a Graeco-Latin square of order
n into an array of order n. We call this mapping the canonical mapping in the sequel.
It is easy to see that the range of this mapping is the set $\{1, 2, \ldots, n^2\}$ and the sum of
numbers along any row or column of the array is $n(n^2+1)/2$. If we can arrange to
have both main diagonals sum to $n(n^2+1)/2$, then a magic square is produced.
The highest order of an Euler square explicitly constructed in [2] is five. The following
is an example from [2] in matrix form with entry xy representing a pair (x, y) in
the Euler square. Applying the canonical mapping to this square, we obtain the magic
square on the right.
$$
\begin{pmatrix}
34 & 45 & 51 & 12 & 23 \\
25 & 31 & 42 & 53 & 14 \\
11 & 22 & 33 & 44 & 55 \\
52 & 13 & 24 & 35 & 41 \\
43 & 54 & 15 & 21 & 32
\end{pmatrix}
\qquad
\begin{pmatrix}
14 & 20 & 21 & 2 & 8 \\
10 & 11 & 17 & 23 & 4 \\
1 & 7 & 13 & 19 & 25 \\
22 & 3 & 9 & 15 & 16 \\
18 & 24 & 5 & 6 & 12
\end{pmatrix}
$$
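The passage from the Euler square to the magic square can be checked mechanically. The following is a minimal Python sketch, not part of the original article; the function names `canonical_map` and `is_magic` are ours, and each entry xy is stored as a two-digit integer purely for convenience.

```python
def canonical_map(square, n):
    """Send each pair (x, y) to n*(x - 1) + y, as in the canonical mapping."""
    return [[n * (x - 1) + y for (x, y) in row] for row in square]


def is_magic(grid):
    """Check that all rows, columns, and both main diagonals share one sum."""
    n = len(grid)
    target = n * (n * n + 1) // 2
    rows = all(sum(row) == target for row in grid)
    cols = all(sum(grid[i][j] for i in range(n)) == target for j in range(n))
    diag = sum(grid[i][i] for i in range(n)) == target
    anti = sum(grid[i][n - 1 - i] for i in range(n)) == target
    return rows and cols and diag and anti


# The order-5 Euler square from the text; entry xy is written as 10*x + y.
EULER_5 = [
    [34, 45, 51, 12, 23],
    [25, 31, 42, 53, 14],
    [11, 22, 33, 44, 55],
    [52, 13, 24, 35, 41],
    [43, 54, 15, 21, 32],
]

# Split each two-digit entry back into the ordered pair (x, y).
pairs = [[divmod(entry, 10) for entry in row] for row in EULER_5]
magic = canonical_map(pairs, 5)
```

Here `magic` reproduces the square on the right, and `is_magic(magic)` confirms the common sum 65.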
Math. Mag. 83 (2010) 163–167. doi:10.4169/002557010X494805. © Mathematical Association of America
Orthogonal Latin squares have been known to predate Euler in Europe. A comprehensive
history of Latin squares can be found in [1]. However, it is surprising that an
Euler square of order higher than five was already in existence in the Orient, prior to
Euler's paper. In a Korean mathematical treatise Kusuryak (Summary of the
Nine Branches of Numbers) written by Choe Sŏk-chŏng (1646–1715), an Euler
square of order nine appeared. Choe, a Confucian scholar and at one time the prime
minister of the Choson Dynasty, wrote his treatise presumably after his retirement in
1710. Figure 1 is a facsimile of the pages copied from [5] (vol. 1, pp. 698–699) exhibiting
Choe's configurations. The $9 \times 9$ square on the right is our main concern in this
note. (The square begins with the rightmost column on the left-hand page and extends
over most of the right-hand page.)
Figure 1 A facsimile of Choe's configurations
The reader is referred to [3] and [4] for background information on the history of
Korean mathematics. Choe's treatise was entirely written in Chinese characters. He
did not reveal any clue as to how he arrived at his configurations. A modern matrix form
M of his square is displayed as follows.
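$$
M = \begin{pmatrix}
51 & 63 & 42 & 87 & 99 & 78 & 24 & 36 & 15 \\
43 & 52 & 61 & 79 & 88 & 97 & 16 & 25 & 34 \\
62 & 41 & 53 & 98 & 77 & 89 & 35 & 14 & 26 \\
27 & 39 & 18 & 54 & 66 & 45 & 81 & 93 & 72 \\
19 & 28 & 37 & 46 & 55 & 64 & 73 & 82 & 91 \\
38 & 17 & 29 & 65 & 44 & 56 & 92 & 71 & 83 \\
84 & 96 & 75 & 21 & 33 & 12 & 57 & 69 & 48 \\
76 & 85 & 94 & 13 & 22 & 31 & 49 & 58 & 67 \\
95 & 74 & 86 & 32 & 11 & 23 & 68 & 47 & 59
\end{pmatrix}
$$
$$
L = \begin{pmatrix}
5 & 6 & 4 & 8 & 9 & 7 & 2 & 3 & 1 \\
4 & 5 & 6 & 7 & 8 & 9 & 1 & 2 & 3 \\
6 & 4 & 5 & 9 & 7 & 8 & 3 & 1 & 2 \\
2 & 3 & 1 & 5 & 6 & 4 & 8 & 9 & 7 \\
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
3 & 1 & 2 & 6 & 4 & 5 & 9 & 7 & 8 \\
8 & 9 & 7 & 2 & 3 & 1 & 5 & 6 & 4 \\
7 & 8 & 9 & 1 & 2 & 3 & 4 & 5 & 6 \\
9 & 7 & 8 & 3 & 1 & 2 & 6 & 4 & 5
\end{pmatrix}
\qquad
R = \begin{pmatrix}
1 & 3 & 2 & 7 & 9 & 8 & 4 & 6 & 5 \\
3 & 2 & 1 & 9 & 8 & 7 & 6 & 5 & 4 \\
2 & 1 & 3 & 8 & 7 & 9 & 5 & 4 & 6 \\
7 & 9 & 8 & 4 & 6 & 5 & 1 & 3 & 2 \\
9 & 8 & 7 & 6 & 5 & 4 & 3 & 2 & 1 \\
8 & 7 & 9 & 5 & 4 & 6 & 2 & 1 & 3 \\
4 & 6 & 5 & 1 & 3 & 2 & 7 & 9 & 8 \\
6 & 5 & 4 & 3 & 2 & 1 & 9 & 8 & 7 \\
5 & 4 & 6 & 2 & 1 & 3 & 8 & 7 & 9
\end{pmatrix}
$$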
It is also observed in [6] that each pair of corresponding rows of L and R form
a palindrome. Let $P_n = (p_{i,j})$ be an $n \times n$ permutation matrix with $p_{i,j} = 1$ when
$j = n + 1 - i$. Then this observation amounts to the matrix equality $R = LP_9$.
In the next section, we list new observations about nice properties of M. In the last
section we will explain how M can be constructed by a matrix product method. The
construction will make clear why these properties hold.
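The claims about M so far (its two digit layers are Latin squares, they are orthogonal, and the second is the row reversal of the first) can be verified directly. The sketch below is ours, not the author's; it reads L and R off as the tens and units digits of M, and the helper names are hypothetical.

```python
M = [
    [51, 63, 42, 87, 99, 78, 24, 36, 15],
    [43, 52, 61, 79, 88, 97, 16, 25, 34],
    [62, 41, 53, 98, 77, 89, 35, 14, 26],
    [27, 39, 18, 54, 66, 45, 81, 93, 72],
    [19, 28, 37, 46, 55, 64, 73, 82, 91],
    [38, 17, 29, 65, 44, 56, 92, 71, 83],
    [84, 96, 75, 21, 33, 12, 57, 69, 48],
    [76, 85, 94, 13, 22, 31, 49, 58, 67],
    [95, 74, 86, 32, 11, 23, 68, 47, 59],
]

# First and second components of each entry.
L = [[entry // 10 for entry in row] for row in M]
R = [[entry % 10 for entry in row] for row in M]


def is_latin(square):
    """Every row and every column is a permutation of 1..n."""
    n = len(square)
    full = set(range(1, n + 1))
    rows_ok = all(set(row) == full for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == full for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """All n^2 ordered pairs (a[i][j], b[i][j]) are distinct."""
    n = len(a)
    return len({(a[i][j], b[i][j]) for i in range(n) for j in range(n)}) == n * n


def times_reversal(square):
    """Right-multiplication by P_n, which reverses every row."""
    return [row[::-1] for row in square]
```

With these definitions, `is_latin(L)`, `is_latin(R)`, `are_orthogonal(L, R)`, and `times_reversal(L) == R` all hold for Choe's square.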
More nice properties of Choe's square
Sums of centrally symmetric cells Any pair of cells in a matrix of odd order is said
to be centrally symmetric if they are located symmetrically with respect to the center
cell. In the square L (or R), any pair of entries at centrally symmetric cells sum to 10.
It follows that, in Choe's square M, if we read each entry as a two-digit integer, any
pair of centrally symmetric entries sums to 110. (In the magic square formed by the
canonical map, any pair of centrally symmetric entries sums to 82.)
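A partition into orthogonal Latin squares We split M right down the central
vertical line to get two matrices $L'$ and $R'$.
$$
L' = \begin{pmatrix}
5 & 1 & 6 & 3 & 4 & 2 & 8 & 7 & 9 \\
4 & 3 & 5 & 2 & 6 & 1 & 7 & 9 & 8 \\
6 & 2 & 4 & 1 & 5 & 3 & 9 & 8 & 7 \\
2 & 7 & 3 & 9 & 1 & 8 & 5 & 4 & 6 \\
1 & 9 & 2 & 8 & 3 & 7 & 4 & 6 & 5 \\
3 & 8 & 1 & 7 & 2 & 9 & 6 & 5 & 4 \\
8 & 4 & 9 & 6 & 7 & 5 & 2 & 1 & 3 \\
7 & 6 & 8 & 5 & 9 & 4 & 1 & 3 & 2 \\
9 & 5 & 7 & 4 & 8 & 6 & 3 & 2 & 1
\end{pmatrix}
\qquad
R' = \begin{pmatrix}
9 & 7 & 8 & 2 & 4 & 3 & 6 & 1 & 5 \\
8 & 9 & 7 & 1 & 6 & 2 & 5 & 3 & 4 \\
7 & 8 & 9 & 3 & 5 & 1 & 4 & 2 & 6 \\
6 & 4 & 5 & 8 & 1 & 9 & 3 & 7 & 2 \\
5 & 6 & 4 & 7 & 3 & 8 & 2 & 9 & 1 \\
4 & 5 & 6 & 9 & 2 & 7 & 1 & 8 & 3 \\
3 & 1 & 2 & 5 & 7 & 6 & 9 & 4 & 8 \\
2 & 3 & 1 & 4 & 9 & 5 & 8 & 6 & 7 \\
1 & 2 & 3 & 6 & 8 & 4 & 7 & 5 & 9
\end{pmatrix}
$$
Again, $R' = L'P_9$ and the juxtaposition of $L'$ and $R'$ is an Euler square.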
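$$
\begin{pmatrix}
59 & 17 & 68 & 32 & 44 & 23 & 86 & 71 & 95 \\
48 & 39 & 57 & 21 & 66 & 12 & 75 & 93 & 84 \\
67 & 28 & 49 & 13 & 55 & 31 & 94 & 82 & 76 \\
26 & 74 & 35 & 98 & 11 & 89 & 53 & 47 & 62 \\
15 & 96 & 24 & 87 & 33 & 78 & 42 & 69 & 51 \\
34 & 85 & 16 & 79 & 22 & 97 & 61 & 58 & 43 \\
83 & 41 & 92 & 65 & 77 & 56 & 29 & 14 & 38 \\
72 & 63 & 81 & 54 & 99 & 45 & 18 & 36 & 27 \\
91 & 52 & 73 & 46 & 88 & 64 & 37 & 25 & 19
\end{pmatrix}
$$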
... $L_i$ and $R_i$, then $R_i = L_i P_9$ and $L_i R_i$ is again an Euler square.
Our method to construct Choe's square
First we define a formal Kronecker product of two matrices. Let $U = (u_{i,j})$ be an
$m \times m$ matrix and $V = (v_{i,j})$ be an $n \times n$ matrix. Define $U \otimes V$ to be an $mn \times mn$
matrix
$$
\begin{pmatrix}
Y_{1,1} & Y_{1,2} & \cdots & Y_{1,m} \\
Y_{2,1} & Y_{2,2} & \cdots & Y_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
Y_{m,1} & Y_{m,2} & \cdots & Y_{m,m}
\end{pmatrix},
$$
where $Y_{i,j}$ is an $n \times n$ matrix whose $(s, t)$-entry is equal to the pair $(u_{i,j}, v_{s,t})$.
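There are six permutations of the numbers 1, 2, and 3. They can be grouped into
two $3 \times 3$ orthogonal Latin squares A and B such that $B = AP_3$.
$$
A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}
\qquad
B = \begin{pmatrix} 1 & 3 & 2 \\ 3 & 2 & 1 \\ 2 & 1 & 3 \end{pmatrix}
$$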
Next we substitute $3(a-1)+b$ for the entry $(a, b)$ in $A \otimes A$. The result is the
matrix L. Any pair of entries at centrally symmetric cells in A sum to 4. Therefore,
the above substitution implies that any pair of entries at centrally symmetric cells in
$A \otimes A$ sum to 10.
Similarly, we may compute $B \otimes B$ and perform the same substitution and the
outcome is the matrix R. Again, any pair of entries at centrally symmetric cells in $B \otimes B$
sum to 10.
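The construction just described is short enough to run. The following Python sketch is our illustration, under the assumption that the formal Kronecker product orders rows by (i, s) and columns by (j, t) as in the block layout above; the names `formal_kronecker`, `substitute`, and `reverse_rows` are ours.

```python
A = [[2, 3, 1], [1, 2, 3], [3, 1, 2]]


def reverse_rows(square):
    """Right-multiplication by P_n reverses every row."""
    return [row[::-1] for row in square]


def formal_kronecker(u, v):
    """mn x mn array of pairs; block (i, j) holds the pairs (u[i][j], v[s][t])."""
    m, n = len(u), len(v)
    return [
        [(u[i][j], v[s][t]) for j in range(m) for t in range(n)]
        for i in range(m)
        for s in range(n)
    ]


def substitute(pairs, base):
    """Replace each pair (a, b) by base*(a - 1) + b."""
    return [[base * (a - 1) + b for (a, b) in row] for row in pairs]


B = reverse_rows(A)                        # B = A P_3
L = substitute(formal_kronecker(A, A), 3)  # first components of M
R = substitute(formal_kronecker(B, B), 3)  # second components of M

# Juxtapose L and R, writing each pair as a two-digit integer.
M = [[10 * l + r for l, r in zip(l_row, r_row)] for l_row, r_row in zip(L, R)]
```

The first row of `M` comes out as 51 63 42 87 99 78 24 36 15, matching Choe's square, and centrally symmetric entries of `M` sum to 110.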
We also note that $(A \otimes A)P_9 = AP_3 \otimes AP_3 = B \otimes B$. Consequently, the properties of L
and R
+ y
.
Associativity of + on W is inherited from the associativity of + on S. Thus, (W, +)
is a semigroup, called the Mad Vet semigroup of its corresponding Mad Vet scenario.
Since addition is clearly commutative on S, every Mad Vet semigroup (W, +) is
commutative.
EXAMPLE. We revisit Scenario #1 and examine its Mad Vet semigroup (W, +).
We showed previously that in this case W is the 3-element set
W = {[(1, 0, 0)], [(2, 0, 0)], [(3, 0, 0)]}.
Using the operation + in W, we get, for instance,
[(1, 0, 0)] + [(1, 0, 0)] = [(1 + 1, 0, 0)] = [(2, 0, 0)],
as we'd expect. But perhaps it's a bit surprising that
[(1, 0, 0)] + [(3, 0, 0)] = [(4, 0, 0)] = [(1, 0, 0)].
In other words, [(3, 0, 0)] behaves like an identity element with respect to the
element [(1, 0, 0)] in W. In fact, [(i, 0, 0)] + [(3, 0, 0)] = [(i, 0, 0)] for any $1 \le i \le 3$.
So for this Mad Vet scenario the Mad Vet semigroup (W, +) is a monoid, with identity
[(3, 0, 0)]. Further, since
[(1, 0, 0)] + [(2, 0, 0)] = [(3, 0, 0)]
in W, every element in (W, +) has an inverse. Therefore, (W, +) is in fact a group;
since its order is 3, it must be isomorphic to the group $\mathbb{Z}_3$.
5. Not all Mad Vet semigroups are groups
Perhaps it is not surprising that the Mad Vet semigroup of Scenario #1 is a group, in
light of the explicit description of its elements. In many Mad Vet scenarios, (W, +)
is indeed a group; however, we will later see a Mad Vet semigroup that is not even a
monoid. Notably, given any Mad Vet semigroup W, the obvious choice, [0], for an
identity element of W is not even contained in W, since 0 is not in S.
Scenario #2. Suppose the same Mad Vet has replaced two of her machines with
new machines.
Machine 1 still turns one ant into one beaver;
Machine 2 now turns one beaver into one ant and one cougar;
Machine 3 now turns one cougar into two cougars.
In this situation W is a monoid, but not a group. First, we claim that
$$W = \{[(i, 0, 0)] : i \in \mathbb{Z}^+\} \cup \{[(0, 0, 1)]\},$$
where $\mathbb{Z}^+$ denotes the set of positive integers. Indeed, let (a, b, c) be a menagerie
for this scenario. If a = b = 0 (that is, there are only cougars in the menagerie) then
VOL. 83, NO. 3, JUNE 2010 173
$c - 1$ applications of Machine 3 yield that $(0, 0, c) \sim (0, 0, 1)$. Else, suppose that
at least one of a or b is nonzero. Since $(a, b, c) \sim (a + b, 0, c)$ (using Machine 1 in
reverse b times), we may assume that the menagerie contains at least one ant and no
beavers. If $c = 0$, then we are done. If $c \neq 0$, then we can apply Machine 3 in the
appropriate direction $|a - c|$ times, obtaining a menagerie that contains a ants and a
cougars; thus, $(a, 0, c) \sim (a, 0, a)$. Then applying Machine 2 in reverse a times yields
$(a, 0, a) \sim (0, a, 0)$, which is equivalent to (a, 0, 0) (using Machine 1).
Hence, W consists of the indicated elements. We may now use arguments similar
to the argument utilized in studying Scenario #1 to show that these elements are all
distinct in W. This establishes our claim.
The same sorts of computations as before show that [(0, 0, 1)] is an identity element
for this Mad Vet semigroup, and hence W in this case is a monoid. But W is not a
group, because, for instance, there is no element [x] in W for which [(1, 0, 0)] +[x] =
[(0, 0, 1)].
Given a Mad Vet scenario, we can pose a variety of questions regarding the structure
of its Mad Vet semigroup. For instance, is its semigroup finite or infinite? Is it a
monoid? If it is a monoid, is it a group? Note that if it is a group, then that group is
necessarily abelian (since all Mad Vet semigroups are commutative), but is it necessarily
cyclic?
To give some sense of just how diverse Mad Vet semigroups can be, we provide
below five additional Mad Vet scenarios (Scenarios #3–7) which include, in some order,
a scenario for which (1) W is an infinite group; (2) W is a finite noncyclic group; (3)
W is a finite nonmonoid; (4) W is a finite cyclic group, not isomorphic to $\mathbb{Z}_3$; and (5)
W is an infinite nonmonoid.
In fact, these five different structures even arise in scenarios where the Mad Vet has
just three species in her lab. Our readers are encouraged to try their hands at matching
the above-described scenarios with those of Scenarios #3–7. Teachers can also find a
sample Mad Vet homework assignment, appropriate for a first-semester abstract algebra
course, at the MAGAZINE website. Descriptions of the semigroups arising in the
following five Mad Vet scenarios are provided at the end of the article, so that readers
can check their work.
Scenario #3.
Machine 1 turns one ant into one beaver and one cougar;
Machine 2 turns one beaver into one ant and one cougar;
Machine 3 turns one cougar into one ant and one beaver.
Scenario #4.
Machine 1 turns one ant into two ants;
Machine 2 turns one beaver into two beavers;
Machine 3 turns one cougar into two cougars.
Scenario #5.
Machine 1 turns one ant into one beaver and one cougar;
Machine 2 turns one beaver into one ant and one beaver;
Machine 3 turns one cougar into one ant and one cougar.
Scenario #6.
Machine 1 turns one ant into one beaver;
Machine 2 turns one beaver into one cougar;
Machine 3 turns one cougar into one cougar.
Scenario #7.
Machine 1 turns one ant into one ant, one beaver and one cougar;
Machine 2 turns one beaver into one ant and one cougar;
Machine 3 turns one cougar into one ant and one beaver.
Given the varied properties of Mad Vet semigroups displayed thus far, one may
wonder how one can possibly identify when Mad Vet semigroups are groups. In the
next section, we translate this algebraic question into a comparable graph-theoretical
question, whose solution is used to obtain an answer in the algebraic realm.
6. The Mad Vet Group Test
In this section, we answer the question: Given a Mad Vet scenario, when is its Mad
Vet semigroup W actually a group?
We need a bit more (standard) graph theory terminology. A path in a directed graph $\Gamma$
is a sequence $P = e_1 e_2 \cdots e_m$ of one or more edges in $\Gamma$ for which $t(e_j) = i(e_{j+1})$
for each $1 \le j \le m - 1$; we say that P is a path from $i(e_1)$ to $t(e_m)$. If v and w are
vertices in $\Gamma$, we say v connects to w in case either v = w or there is a path in $\Gamma$ from
v to w. More generally, if $P = e_1 e_2 \cdots e_m$ is any path in $\Gamma$ and v is any vertex in $\Gamma$,
we say v connects to P in case v connects to $i(e_j)$ for some edge $e_j$ of P, $1 \le j \le m$.
For a vertex v in V, a cycle based at v is a path $e_1 e_2 \cdots e_m$ from v to v for which the
vertices $i(e_1), i(e_2), \ldots, i(e_m)$ are distinct. A loop at a vertex is therefore a cycle, with
m = 1.
The following graph-theoretic definitions might be less familiar to the reader. A
finite graph $\Gamma$ is cofinal in case every vertex v of $\Gamma$ connects to every cycle and to
every sink in $\Gamma$. Next, if $C = f_1 f_2 \cdots f_m$ is a cycle in $\Gamma$, then an edge e is called an
exit for C if $i(e) = i(f_j)$ for some $1 \le j \le m$, and $e \neq f_j$. (Intuitively, an exit for
C is an edge e, not included in C, which provides a way to momentarily step away
from C.)
EXAMPLE. Consider the following graph.
[Figure: a directed graph with vertices x, y, z and edges labeled e, f, g, h.]
The cycle eg based at y has three different exits: f, h and the loop at y. These same
three edges are also exits for the cycle ge based at z. Similarly, the loop at y has exits
e, f and h. On the other hand, the loop at x has no exit. Also, notice that this graph is
not cofinal, since, for example, vertex x does not connect to the cycle eg.
Now we are ready to answer the main question of this section.
MAD VET GROUP TEST. The Mad Vet semigroup W of a Mad Vet scenario is a
group if and only if the corresponding Mad Vet graph $\Gamma$ has the following two
properties.
(1) $\Gamma$ is cofinal; and
(2) Every cycle in $\Gamma$ has an exit.
The proof of this test is too long for this article; however, in Section 7 we will show
how the result follows from a more general theorem (whose complete proof is provided
in a supplement at the MAGAZINE website). Here, we see how this test applies to some
Mad Vet scenarios.
EXAMPLES. Consider again the Mad Vet graph $\Gamma$ associated with Scenario #1.
[Figure: the Mad Vet graph of Scenario #1 on vertices $A_1$, $A_2$, $A_3$.]
By inspection we see that $\Gamma$ is cofinal (there are no sinks in $\Gamma$ and every vertex
connects to each of the cycles in $\Gamma$) and that every cycle in $\Gamma$ has an exit. Thus the Mad
Vet Group Test reconfirms that the Mad Vet semigroup for this scenario is indeed a
group, a fact we established directly in Section 4. On the other hand, recall the Mad
Vet graph $\Gamma$ of Scenario #2.
[Figure: the Mad Vet graph of Scenario #2 on vertices $A_1$, $A_2$, $A_3$.]
We see that $\Gamma$ is not cofinal, since vertex $A_3$ does not connect to the cycle $A_1 A_2 A_1$. So
the Mad Vet Group Test reconfirms that the Mad Vet semigroup of Scenario #2 is not
a group, as we saw in Section 5.
Scenario #8. Consider the Mad Vet scenario described by Harris [7], in which the
Mad Vet has three machines with the following properties.
Machine 1 turns one cat into two dogs and five mice;
Machine 2 turns one dog into three cats and three mice;
Machine 3 turns one mouse into a cat and a dog.
This scenario has the following Mad Vet graph, where $A_1$ = Cat, $A_2$ = Dog, and
$A_3$ = Mouse. The label (d) on an edge e indicates that there are actually d edges in
the graph from i(e) to t(e).
[Figure: the Mad Vet graph of Scenario #8 on vertices $A_1$, $A_2$, $A_3$, with edge labels (2), (5), (3), (3).]
It is straightforward to see that this graph satisfies the two properties enumerated in
the Mad Vet Group Test; thus, the Mad Vet semigroup in this case is a group, which
we identify in Section 8.
You may now want to draw the Mad Vet graphs of Scenarios #3–7, and use the Mad
Vet Group Test to determine (or confirm) which three of those Mad Vet scenarios
produce Mad Vet groups. Here's one additional observation about the Mad Vet graphs of
the remaining two scenarios: One of the graphs is cofinal but contains a cycle without
an exit, and the other is not cofinal, though each of its cycles has an exit.
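For readers who would like to check their drawings by machine, here is a minimal Python sketch of the two conditions in the Mad Vet Group Test. It is our own illustration, not the authors' code. We assume the graph is given as a matrix `adj` in which `adj[i][j]` counts the edges from vertex i to vertex j, and we use the observation that a cycle has an exit exactly when some vertex on it emits at least two edges (counted with multiplicity). The function names are hypothetical.

```python
def reachable(adj, start):
    """Vertices w such that start connects to w (start itself included)."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w, count in enumerate(adj[v]):
            if count > 0 and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def simple_cycles(adj):
    """Vertex sequences of cycles, each listed once from its smallest vertex."""
    cycles = []

    def extend(path):
        for w, count in enumerate(adj[path[-1]]):
            if count == 0:
                continue
            if w == path[0]:
                cycles.append(list(path))
            elif w > path[0] and w not in path:
                extend(path + [w])

    for start in range(len(adj)):
        extend([start])
    return cycles


def mad_vet_group_test(adj):
    """Return (is_cofinal, every_cycle_has_exit) for the graph described by adj."""
    n = len(adj)
    reach = [reachable(adj, v) for v in range(n)]
    cycles = simple_cycles(adj)
    sinks = [v for v in range(n) if sum(adj[v]) == 0]
    cofinal = all(reach[v] & set(c) for v in range(n) for c in cycles) and all(
        s in reach[v] for v in range(n) for s in sinks
    )
    # An exit is an edge leaving a cycle vertex other than the cycle's own edge.
    every_cycle_has_exit = all(
        any(sum(adj[u]) >= 2 for u in c) for c in cycles
    )
    return cofinal, every_cycle_has_exit


# Scenario #2: ant -> beaver; beaver -> ant + cougar; cougar -> two cougars.
SCENARIO_2 = [[0, 1, 0], [1, 0, 1], [0, 0, 2]]
# Scenario #8: cat -> 2 dogs + 5 mice; dog -> 3 cats + 3 mice; mouse -> cat + dog.
SCENARIO_8 = [[0, 2, 5], [3, 0, 3], [1, 1, 0]]
```

The semigroup is a group exactly when both returned flags are true; Scenario #8 passes both, while Scenario #2 fails cofinality.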
7. Explanation of the Mad Vet Group Test
With the Mad Vet Group Test in hand, we have achieved the second main goal of our
article: that is, answering an algebraic question using graph theory. But we have not
proven the Mad Vet Group Test. We omit its lengthy proof, but note that the result
follows from a theorem about graph semigroups. In Section 2, we described a natural
connection between Mad Vet scenarios and directed graphs. In fact, a tighter connection
can be forged. Any directed graph $\Gamma$ has an associated commutative graph monoid,
$(M_\Gamma, +)$. (The interested reader can find the specifics of this construction on p. 163 of
Ara et al. in [2].) It turns out that if $x, y \in M_\Gamma$ ... are isomorphic.
Thus, information about graph semigroups may be brought to bear in a Mad Vet context.
In particular, the main question of the previous section can be answered if we can
answer the related question: Given a directed graph $\Gamma$, when is its graph semigroup
$W_\Gamma$ actually a group?
As it turns out, this question about graph semigroups has recently received signif-
icant attention in various mathematical research circles. Some of the related research
ideas are described in Section 9. Though in this article we are interested only in sink-
free graphs, we do not limit ourselves to such graphs in stating the following result.
GRAPH SEMIGROUP GROUP TEST. Let $\Gamma$ be a finite directed graph. Then the
graph semigroup $W_\Gamma$ ... , defined as follows:
Suppose $\Gamma$ has n vertices, $v_1, v_2, \ldots, v_n$. Then $A_\Gamma$ is the $n \times n$ matrix $(d_{ij})$, where $d_{ij}$ is
the number of edges with initial vertex $v_i$ and terminal vertex $v_j$ (for all $1 \le i, j \le n$).
For example, if $\Gamma$ is the graph of Scenario #1, then
$$
A_\Gamma = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}.
$$
First, we form the matrix $I_n - A_\Gamma$, where $I_n$ is the $n \times n$ identity matrix. For
instance, using the above matrix $A_\Gamma$, we have
$$
I_3 - A_\Gamma = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 0 & -1 \\ -1 & -1 & 1 \end{pmatrix}.
$$
Then we put the (square) matrix $I_n - A_\Gamma$ ... $\alpha_1, \alpha_2, \ldots, \alpha_q, 0, 0, \ldots, 0$
such that $\alpha_i$ divides $\alpha_{i+1}$ for each $1 \le i \le q - 1$. The Smith normal form of a matrix
A can be obtained by performing on A a combination of these matrix operations:
interchanging rows or columns, or adding an integer multiple of a row [column] to
another row [column]. The resulting Smith normal form of matrix A is thus of the
form PAQ, where P and Q are integer-valued matrices with determinants equal to
$\pm 1$. Many computer algebra systems have a built-in Smith normal form function. For
more information about the Smith normal form of a matrix, see, for example, Stein
[10] or Chapter 23 in Hogben [8].
Here's a way of answering the "just exactly what group is it?" question.
MAD VET GROUP IDENTIFICATION THEOREM. Given a Mad Vet scenario whose
Mad Vet semigroup, W, is a group, let $\Gamma$ be its associated Mad Vet graph. Then
$$
W \cong \mathbb{Z}_{\alpha_1} \times \mathbb{Z}_{\alpha_2} \times \cdots \times \mathbb{Z}_{\alpha_q} \times \mathbb{Z}^{n-q},
$$
where $\alpha_1, \alpha_2, \ldots, \alpha_q$ are the nonzero diagonal entries of the Smith normal form of the
matrix $I_n - A_\Gamma$.
The justification of this theorem is beyond the scope of this article, but the very
enthusiastic reader can find a similar justification in Section 3 of Abrams et al. [1].
For instance, to use Maple to compute the Smith normal form of a matrix B, define B in Maple, load the
package LinearAlgebra, and use the command SmithForm(B). A word of caution: the Smith normal form function
in some computer algebra systems will not find the Smith normal form of a matrix of determinant 0, even though
such a Smith normal form always exists in this case. A matrix of that type may arise in some Mad Vet scenarios;
indeed, it arises in one of our eight numbered Mad Vet scenarios.
EXAMPLE. Letting $\Gamma$ be the Mad Vet graph of Scenario #1, the Smith normal form
of the matrix $I_3 - A_\Gamma$ is the matrix
$$
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}.
$$
Because we already know that Scenario #1's semigroup is a group, the Mad Vet Group
Identification Theorem implies that it is isomorphic to
$\mathbb{Z}_1 \times \mathbb{Z}_1 \times \mathbb{Z}_3 \cong \{0\} \times \{0\} \times \mathbb{Z}_3 \cong \mathbb{Z}_3$, as expected.
See if you can now use this method to identify the three groups which arise among
Scenarios #3–7. Finally, try applying this method to Scenario #8; you should get that
the Mad Vet group in that case is isomorphic to $\mathbb{Z}_{34}$.
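If no computer algebra system is at hand, the nonzero diagonal entries of the Smith normal form of a small integer matrix can also be found from its determinantal divisors: the k-th entry is $d_k / d_{k-1}$, where $d_k$ is the gcd of all $k \times k$ minors. The Python sketch below uses that characterization rather than the row and column operations described above; it is our illustration only, the function names are ours, and it is meant for small matrices such as the $3 \times 3$ ones in this article.

```python
from itertools import combinations
from math import gcd


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def invariant_factors(m):
    """Nonzero diagonal entries of the Smith normal form, via gcds of minors."""
    n = len(m)
    factors = []
    previous = 1
    for k in range(1, n + 1):
        g = 0
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[m[r][c] for c in cols] for r in rows]
                g = gcd(g, det(sub))
        if g == 0:
            break  # all k x k minors vanish: the remaining diagonal entries are 0
        factors.append(g // previous)
        previous = g
    return factors


def identity_minus(a):
    """The matrix I_n - A for a square integer matrix A."""
    n = len(a)
    return [[(1 if i == j else 0) - a[i][j] for j in range(n)] for i in range(n)]


A_SCENARIO_1 = [[0, 1, 0], [1, 1, 1], [1, 1, 0]]
A_SCENARIO_8 = [[0, 2, 5], [3, 0, 3], [1, 1, 0]]

factors_1 = invariant_factors(identity_minus(A_SCENARIO_1))
factors_8 = invariant_factors(identity_minus(A_SCENARIO_8))
```

If the returned list has q entries for an $n \times n$ matrix, the remaining n − q diagonal entries are zero, which corresponds to the factor $\mathbb{Z}^{n-q}$ in the theorem.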
9. Beyond the Mad Vet
By this point, you may be wondering: Who really cares about Mad Vet semigroups
anyway? Good question! In case you are not convinced that Mad Vet semigroups are
of interest in their own right, we present the following theorem. Although this result
is rather technical, our point in stating it is to emphasize the fact that Mad Vet
semigroups do indeed play a central role in current, active lines of mathematical research.
Not only that, but this theorem actually bridges two apparently different branches of
mathematics (algebra and analysis) and the Graph Semigroup Group Test is exactly
the link between them.
PURELY INFINITE SIMPLICITY THEOREM. For a finite directed sink-free graph
$\Gamma$, the following are equivalent:
(1) The Leavitt path algebra $L_{\mathbb{C}}(\Gamma)$ is purely infinite and simple. (This is a statement
about an algebraic structure.)
(2) The graph $C^*$-algebra $C^*(\Gamma)$ ... is a group.
In the interest of brevity, we have not stated the most general form of this result. Pardo's
direct proof of the equivalence of (3) and (4), which involves only undergraduate-level
graph- and group-theoretic ideas, is new; the only published proof of this equivalence
of which the authors are aware involves showing that both (3) and (4) are equivalent
to (1). The very energetic reader may wish to consult Aranda Pino et al. [3].
Finally, as promised earlier, here is a description of the Mad Vet semigroups arising
in Scenarios #3–7. In order, these scenarios' semigroups are (up to isomorphism) the
group $\mathbb{Z}_2 \times \mathbb{Z}_2$, a 7-element nonmonoid, the group $\mathbb{Z}$, the nonmonoid $\mathbb{Z}^+$, and the group
$\mathbb{Z}_4$. For details, see our Analyses of Mad Vet Scenarios #3–7, available at the
MAGAZINE website.
Acknowledgment The authors express their gratitude to Enrique Pardo for allowing them to use and modify
his proof of the Graph Semigroup Group Test for this article; to Amelia Taylor and Brian Hopkins for carefully
reading and offering helpful suggestions about the article; and to Ken Ross for his valuable comments, advice, and
support. The first author was introduced to Mad Veterinarian puzzles at a June 2008 workshop on Math Teachers'
Circles, sponsored by the American Institute of Mathematics, Palo Alto, CA. The author is grateful for AIM's
support.
REFERENCES
1. G. Abrams, P. N. Ánh, A. Louly, and E. Pardo, The classification question for Leavitt path algebras, Journal
of Algebra 320(5) (2008) 1983–2026. doi:10.1016/j.jalgebra.2008.05.020
2. P. Ara, M. A. Moreno, and E. Pardo, Nonstable K-Theory for graph algebras, Algebra Rep. Th. 10 (2007)
157–178. doi:10.1007/s10468-006-9044-z
3. G. Aranda Pino, F. Perera, and M. Siles Molina (eds.), Graph Algebras: Bridging the Gap between Analysis
and Algebra, Universidad de Málaga Press, Málaga, Spain, 2007.
4. Norman L. Biggs, E. Keith Lloyd, and Robin J. Wilson, Graph Theory 1736–1936, Oxford University Press,
New York, 1999.
5. John B. Fraleigh, A First Course in Abstract Algebra, 7th ed., Addison Wesley, Boston, 2002.
6. P. A. Grillet, Commutative Semigroups, Springer, New York, 2001.
7. Robert S. Harris, Bob's Mad Veterinarian Puzzles, http://www.bumblebeagle.org/madvet/index.html.
8. Leslie Hogben, ed., Handbook of Linear Algebra, Chapman & Hall/CRC, Boca Raton, FL, 2006.
9. John M. Howie, Fundamentals of Semigroup Theory, Oxford Science Publications, Oxford, UK, 1996.
10. William Stein, Finitely generated abelian groups, http://modular.fas.harvard.edu/papers/ant/html/node9.html.
11. Douglas B. West, Introduction to Graph Theory, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 2000.
12. Robin J. Wilson and John J. Watkins, Graphs: An Introductory Approach: A First Course in Discrete
Mathematics, Wiley, New York, 1990.
13. Joshua Zucker, Math Teachers' Circle: An introduction to problem solving,
http://www.mathteacherscircle.org/materials/JZproblemsolvingstrategies.pdf.
Summary In this paper, we explore Mad Veterinarian scenarios. We show how these recreational puzzles
naturally give rise to semigroups (which are sometimes groups), and we point out a beautiful, striking connection
between abstract algebra and graph theory. Linear algebra also plays a role in our analysis.
GENE ABRAMS received his Ph.D. in Mathematics from the University of Oregon in 1981 under the direction
of Frank Anderson. He is pleased to have coauthored this article with a (much younger, much wiser) mathematical
sibling! He has been an algebraist at the University of Colorado at Colorado Springs since 1983. He is proud to
have been designated as a University of Colorado systemwide President's Teaching Scholar, as well as the 2002
MAA Rocky Mountain Section Distinguished Teaching Award recipient. When not out riding his bicycle, he
surrenders to his passions for baseball and the New York Times Sunday Crossword.
JESSICA K. SKLAR received her Ph.D. in Mathematics from the University of Oregon in 2001, and is happy
to have collaborated on this paper with a mathematical "older brother." She is a Pacific Lutheran University
algebraist and animal enthusiast. She swears she would never transmogrify her cats into goldfish, but wouldn't
mind turning her neighborhood woodpeckers into something less destructive. Like tapirs. Or grizzly bears.
The Ergodic Theory Carnival
JULIA BARNES
Western Carolina University
Cullowhee, NC 28723
[email protected]
LORELEI KOSS
Dickinson College
Carlisle, PA 17013
[email protected]
Ladies and gentlemen, children of all ages. Come one, come all, to see the amazing
sights at our ergodic theory carnival! Step right up, friends, and we will show you some
of the mysteries seen around the carousel and in a taffy pulling booth. We will see a
carnival photographer and find out what kinds of carousel rotations work best for her
photographs. We will meet a magician who knows how to find a jewel in a pile of taffy
without getting his hands sticky!
You've got to see it to believe it, but these situations can be analyzed by an area
of mathematics called ergodic theory. That's right, folks, not only will we look at a
collection of basic piecewise linear functions that model activities at the carnival, but
we will also use ergodic theory to distinguish between these activities. Come right
over and watch how very small differences in local behavior cause big differences in
the long-term behavior of functions!
What else is ergodic theory good for, you ask? Well, let me tell you. You can use
it to explain what happens to a system over time. This marvelous mathematics was
first used to study statistical mechanics and investigate the motion or flow of gases
over time [7, 10, 15]. But wait! There's more! For no extra cost you can use ergodic
properties in number theory to calculate how frequently any digit occurs in the
real-number base $\beta > 1$ expansion of a number in [0, 1] [3, 12, 15]. Believe it or not, you
can even use ergodic theory in the field of environmental science to assess the validity
of ecosystem models for pine forests [11]. An ergodic function has the property that if
you look long enough at its iterates on an arbitrary point you can obtain information
that represents the entire system. Starting at any other point gives you exactly the same
information. There is no sleight of hand here, folks; what you see is what you get.
Gather around and watch what we are going to do! Grab some cotton candy, bring
your mathematical intuition, and join us for a great show.
Basic examples
As you enter our carnival, stop rst at the carousel with its artistically crafted horses
and distinctive music. Find a place to stand by the side of the carousel and watch the
activity for a while. Notice the photographer taking pictures of children riding on the
carousel. She has set up her tripod at the best vantage point, and she takes a picture
every time the carousel stops.
As a mathematician, you notice that each movement of the carousel can be de-
scribed as a function on a circle, ignoring the up-and-down movement of the horses.
Pick the horse nearest to you on the edge of the carousel and call its initial point zero.
Let the circumference of the carousel be one unit. As the carousel moves, the distance
Math. Mag. 83 (2010) 180–190. doi:10.4169/002557010X494823. © Mathematical Association of America
of the horse from you, measured along the circumference of the carousel in the direc-
tion of motion, increases from 0 to 1. But wait! When it has traveled one unit, it is back
at its initial point. The location of any horse at any instant is described by its distance
along the edge of the circle in the counterclockwise direction, or a number in [0, 1]
where 0 and 1 represent the same location.
Let's practice by describing the motion of the horses while the carousel is stopped
to let children on. If a horse starts at the location x, let I(x) be its location at the end
of this motion. It isn't moving! So, I(x) = x. That was easy!
The operator starts the carousel again and could stop it after it travels any distance.
For now, the horse that starts at zero moves halfway around the circle and stops. It has
been a very short ride for everyone, and because the carousel is a solid structure, every
horse has moved exactly halfway around the circle. Using your mathematical skills,
you think of a function to represent this circular motion. While you might consider a
function taking the circle to itself, here we represent the distance traveled along the
edge of the carousel as a function from [0, 1] to [0, 1]. If a horse starts at x, let C(x)
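be its location at the end of this motion. Then C(x) is defined by
$$C(x) = \begin{cases} x + 1/2 & \text{if } 0 \le x < 1/2 \\ x - 1/2 & \text{if } 1/2 \le x < 1. \end{cases}$$
[...]
$$T(x) = \begin{cases} 2x & \text{if } 0 \le x < 1/2 \\ -2x + 2 & \text{if } 1/2 \le x < 1 \end{cases}$$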
to describe the taffy pull, and the graph of T(x) appears in FIGURE 3. This map is
commonly referred to as a tent map because of the shape of the graph.
Figure 2 (a) original taffy; (b) stretch the taffy to twice its original length; (c) fold taffy in half; (d) smush taffy
Figure 3 The taffy function T(x)
As you watch the repetitive motion of the clowns, a magician appears and, with a
sly smile, leans over the taffy and drops a shiny jewel into the sticky mess. It lands
about 3/4 of the distance from the first taffy puller toward the second taffy puller and,
after one quick stretch and fold, you catch a glimpse of it about halfway between them.
The taffy pullers are stretching and folding so quickly that you lose track of the jewel,
and you wonder if the magician will be able to find it again.
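The three carnival maps seen so far can be sketched in a few lines of Python. This is only an illustration: it assumes exact rational arithmetic via `fractions.Fraction`, the names `identity`, `carousel`, and `taffy` are ours rather than the article's, and the taffy map is extended to x = 1 for convenience.

```python
from fractions import Fraction

def identity(x):
    """I(x) = x: the carousel is stopped."""
    return x

def carousel(x):
    """C(x) = (x + 1/2) mod 1: every horse moves halfway around."""
    return (x + Fraction(1, 2)) % 1

def taffy(x):
    """T(x): stretch to twice the length, then fold (the tent map)."""
    return 2 * x if x < Fraction(1, 2) else -2 * x + 2

jewel = Fraction(3, 4)
print(taffy(jewel))      # where the jewel sits after one stretch and fold
print(carousel(jewel))   # where a horse starting at 3/4 stops
```

With the jewel dropped at exactly 3/4, one application of `taffy` puts it at 1/2, matching the glimpse "about halfway" between the two taffy pullers.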
Invertibility
While you stand there eating your cotton candy and watching the carnival sights, you
contemplate how the attractions you have seen are similar and how they are different.
You can begin by comparing the properties of the carnival functions we have already
defined, I(x), C(x), and T(x).
What would happen if the carousel were rotated in reverse? What if the taffy pullers
were to try to undo their work? It is easy to see that I(x) can be reversed. That is, since
the horses don't move at all, every horse arriving at I(x) comes from one previous
point: in this case, x. Mathematically, this property is called invertibility. A function
f(x) is called invertible if it is one-to-one, so that for any element y in the range of f,
there is exactly one element x in the domain with f(x) = y. Even when the carousel
rotates, it is certainly possible to undo the rotation, sending each horse backwards to
the place where it started. Therefore, the carousel function, C(x), is invertible. (Parents
are lucky that the carousel is invertible; if it weren't, reversing the direction of the
carousel would take a horse and child back to more than one location: the one he or
she originally started from as well as cloned duplicates of the child in other locations.
Children are hard enough to keep up with already!)
Attempting to invert T(x), however, is a little more sticky. Notice that T(1/4) =
T(3/4) = 1/2. That is, applying the taffy function in reverse would take each portion
of the taffy, break it into two pieces, and send these pieces to different locations. It
becomes a gummy mess, which is what is expected if one attempts to unmix taffy. It
also means that the taffy function is not invertible.
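The difference between the two maps shows up as soon as we try to list preimages. The sketch below is an illustration under our own naming (`carousel_preimages`, `taffy_preimages`); it assumes the carousel map C(x) = (x + 1/2) mod 1 and the tent-shaped taffy map T, and returns the set of points sent to a given y.

```python
from fractions import Fraction

def carousel_preimages(y):
    """All x in [0, 1) with C(x) = (x + 1/2) mod 1 equal to y."""
    return {(y - Fraction(1, 2)) % 1}

def taffy_preimages(y):
    """All x in [0, 1] with T(x) = y, one from each branch of the tent."""
    return {y / 2, 1 - y / 2}

half = Fraction(1, 2)
print(carousel_preimages(half))  # a single point: C can be undone
print(taffy_preimages(half))     # two points, 1/4 and 3/4: T cannot
```

For every y below 1 the taffy map has two distinct preimages, which is exactly the "gummy mess" described above.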
Lebesgue measure
Now, we take you on a quick trip away from the midway. Up next, we show you the
strange and mystifying sideshow attraction of measure theory. Those with sensitive
stomachs should look away as we generalize the concept of length to frightening and
grotesque subsets of the real line.
Step right up, ladies and gentlemen, young and old, to see the wonderful and mysterious
Lebesgue measure. If you have previously seen the secrets of the fantastic
integration developed by Henri Lebesgue then you may move immediately to the next
section of our carnival. But no one else should miss this attraction!
The familiar Riemann integration that you learned in calculus originated in the work
of Newton and Leibniz, and it only works on functions that are relatively nice. In
particular, we expect the sets that we use to be no more complicated than countable
unions of disjoint intervals contained in [0, 1]. Using that the length of an interval
[a, b] is $\ell([a, b]) = b - a$, we can clearly define the length of sets that are countable
unions of disjoint intervals. The length of the set is just the sum of the lengths of the
intervals. This concept of length is critical to the definition of Riemann integration.
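For a finite collection of disjoint intervals, this notion of length is a one-line computation. The sketch below is only an illustration; the helper name `total_length` and the representation of an interval as a pair (a, b) are our assumptions.

```python
from fractions import Fraction

def total_length(intervals):
    """Sum of b - a over a list of pairwise disjoint intervals (a, b)."""
    return sum((b - a for a, b in intervals), Fraction(0))

pieces = [(Fraction(0), Fraction(1, 4)), (Fraction(1, 2), Fraction(3, 4))]
print(total_length(pieces))  # 1/4 + 1/4
```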
Henri Lebesgue worked to extend the concepts of integration to functions that are
much more bizarre. He did this by generalizing the notion of length to what is called
a measure that is defined on more complicated sets. We now offer entire classes (one
may be starting soon in the Chautauqua tent right over there!) on the theory of measur-
able sets, measures, and integration, and mathematicians are still conducting interest-
ing research in these areas. Here, we sketch an outline of the development of Lebesgue
measure, the details of which can be found in Halmos's book on measure theory [6].
To begin, we define Lebesgue outer measure $\mu^*$, which is a function defined on all
subsets E of [0, 1]. First, we take a countable collection of open intervals whose union
contains the set E and find the sum of the lengths of the intervals in that collection.
Then we take the greatest lower bound of these sums over all such collections of open
intervals containing E. This serves to minimize any overlap and measure E as closely
as possible. The greatest lower bound is called the outer measure of E, or $\mu^*(E)$.
Now, we really want to have the relationship $\mu^*(E) + \mu^*([0, 1] \setminus E) = 1$. That is,
E and its complement should surely combine to have the length of [0, 1], and no
more. That's just common sense! When that happens, we say that E is a Lebesgue
measurable set, and we define the Lebesgue measure of E, $\mu(E)$, to be $\mu(E) = \mu^*(E)$.
If [a, b] is an interval, then $\mu([a, b]) = b - a$, as we expect. So Lebesgue measure is a
generalized length function that can be applied to more complicated subsets of [0, 1].
But not all subsets! Unfortunately, there are some complicated subsets of the interval
(sideshow horrors, unsuitable for most visitors) for which Lebesgue outer measure
gives rise to some paradoxes that conflict with properties that we expect any length
function to have, so Lebesgue outer measure is not really a length function. Do you
want to enter the Sideshow of Strange Pathologies? No, no, turn back! The uninitiated
may be shocked by the behavior of sets born from the Axiom of Choice. Skip the next
two paragraphs!
We will now construct for you a set that has no Lebesgue measure. The first step is
to suppose that x and y are two numbers in [0, 1] and define x to be equivalent to y
if and only if $x - y$ is a rational number. That seemingly tame axiom we mentioned
allows us to conjure up a subset of [0, 1] that contains exactly one element from each
equivalence class; call it N. For each rational number r in [0, 1], we define another
subset $N_r$ as follows:
$$N_r = \{x + r : x \in N \cap [0, 1 - r)\} \cup \{x + r - 1 : x \in N \cap [1 - r, 1]\}.$$
This sleight of hand moves N to the right by r units, and then moves the part that extends
beyond the point 1 backwards by 1 unit. It takes a little bit of work, but it is not difficult
to show that [0, 1] is the disjoint union of the sets $N_r$.
The length of $N_r$ for each r should be the same because they are just translations of
N, but the sum of the lengths of the $N_r$'s over the countably infinite rational numbers
in [0, 1] must be the length of the entire interval. If each $N_r$ has a positive length then
the sum would be infinite, contradicting your knowledge that the length of [0, 1] is
one. Similarly, if each $N_r$ has length 0, then the sum would be 0. This is a paradox,
and we end up with a strange and alarming set whose length cannot be measured.
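The paradox can be summarized in one line. Assuming, as the argument does, that a length function $\mu$ is countably additive and unchanged by translation, and writing $\mu(N)$ for the supposed common length of the sets $N_r$ (this notation is ours), we would need

```latex
1 = \mu([0,1])
  = \sum_{r \in \mathbb{Q} \cap [0,1]} \mu(N_r)
  = \sum_{r \in \mathbb{Q} \cap [0,1]} \mu(N)
  = \begin{cases} 0 & \text{if } \mu(N) = 0, \\ \infty & \text{if } \mu(N) > 0, \end{cases}
```

and neither value equals 1.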
Welcome back, and for the sake of your sanity, be glad that you skipped the last two
paragraphs!
Measure-preserving functions
Back in the safety of the midway, we return to comparing the properties of the func-
tions that we have seen. How do the carousel and taffy pull treat our new friend
Lebesgue measure? Do children change in size? Does the amount of taffy shrink?
Mathematically, we are asking whether the functions preserve measure. To introduce
the formal definition, we need to define the preimage of the set A as the set
$f^{-1}(A) = \{x : f(x) \in A\}$.
DEFINITION 1. If $\mu$ is Lebesgue measure on [0, 1] and $f : [0, 1] \to [0, 1]$ is a
function, then f preserves the measure $\mu$ if $\mu(f^{-1}(A)) = \mu(A)$ for every measurable
set A.
If we consider the identity map, the inverse image of any measurable set A is simply
itself, $I^{-1}(A) = A$, so it easily follows that I preserves measure.
The carousel function, C(x), simply rotates every point x to a location halfway
around the carousel, and $C^{-1}(x)$ rotates the carousel halfway the other direction. If
we see a certain number of children on the carousel right now, there were the same
number there before the carousel rotated. The children did not multiply or disappear.
The measure of any set of children is not changed by $C^{-1}(x)$, and therefore C(x)
is measure-preserving. Even if we modify C(x) to rotate by an amount other than
1/2, $C_a(x) = (x + a) \bmod 1$ for some real number a, Lebesgue measure is preserved
because $C_a(x)$ is still just a translation. FIGURE 4 shows the graph of one example of
a modified carousel function, $C_{\sqrt{2}/2}(x) = (x + \sqrt{2}/2) \bmod 1$.
Unlike C(x), the taffy function T(x) is not a simple translation, so it may appear
as if this function does not preserve measure. This function is not even invertible! But,
does T(x) preserve measure?
When the taffy is pulled, the original slab of taffy is stretched to twice its length.
Then the taffy is folded over to make a new piece of taffy the same length as the
original piece. If we reverse the process, any set A in the interval [0, 1] of taffy has to
be unfolded into two pieces, and then each piece is shrunk to half of its length (yes, this
would be difficult to do in real life!). FIGURE 5 illustrates this procedure. Since each
piece is reduced by half its length and there are two pieces, the pre-image of A has the
same measure as A.
Figure 4 The modified carousel function $C_{\sqrt{2}/2}(x) = (x + \sqrt{2}/2) \bmod 1$
Figure 5 (a) select part of the taffy; (b) un-smush taffy; (c) unfold taffy; (d) shrink taffy back to original length
Looking at this more formally, imagine having an interval [a, b] in [0, 1]. Then
$$T^{-1}([a, b]) = \left[\tfrac{a}{2}, \tfrac{b}{2}\right] \cup \left[1 - \tfrac{b}{2}, 1 - \tfrac{a}{2}\right].$$
It follows that the Lebesgue measure of $T^{-1}([a, b])$ is
$$\left[\tfrac{b}{2} - \tfrac{a}{2}\right] + \left[\left(1 - \tfrac{a}{2}\right) - \left(1 - \tfrac{b}{2}\right)\right] = 2\left(\tfrac{b}{2} - \tfrac{a}{2}\right) = b - a,$$
which is the measure of [a, b]. Since this holds for all intervals and since any measurable set
A has a measure based on all intervals containing A, it follows that T is a measure-
preserving function. In other words, we don't lose any taffy in the process, and it is
spread evenly in each step.
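The interval computation can be checked mechanically. The following sketch is an illustration, not part of the original argument: the helper names `taffy_preimage` and `measure` are ours, and exact fractions are assumed so that the two lengths can be compared without rounding.

```python
from fractions import Fraction

def taffy_preimage(a, b):
    """Preimage of [a, b] under the tent map T: one interval per branch."""
    left = (a / 2, b / 2)            # from the branch T(x) = 2x
    right = (1 - b / 2, 1 - a / 2)   # from the branch T(x) = -2x + 2
    return [left, right]

def measure(intervals):
    """Total length of intervals that overlap in at most one point."""
    return sum((hi - lo for lo, hi in intervals), Fraction(0))

a, b = Fraction(1, 5), Fraction(7, 10)
pre = taffy_preimage(a, b)
print(pre, measure(pre), b - a)  # the last two values agree
```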
The fact that our taffy function is measure-preserving is based on the fact that when
we mix the taffy, any newly mixed piece (interval) comes from two pieces which
are each half the length of the new piece. This is directly related to the fact that we
stretched the taffy to twice its length. What if we modify the taffy function to allow
stretching by a different amount? Suppose that a new taffy-pulling clown arrives at the
scene. Instead of stretching the taffy to twice its length, the new clown stretches the
taffy from a length of one to a length of 3/2 and then folds the newly stretched taffy
over, making a crease at the point one unit from 0 like before. This time, part of the
taffy is not covered by the newly stretched part, and the graph is not symmetric, as
seen in FIGURE 6. The new resulting taffy fold function becomes:
$$T_{3/2}(x) = \begin{cases} \frac{3}{2}x & \text{if } 0 \le x < \frac{2}{3} \\ -\frac{3}{2}x + 2 & \text{if } \frac{2}{3} \le x < 1. \end{cases}$$
Figure 6 The modified taffy fold function $T_{3/2}$
If we consider a portion of the taffy near 0, say the interval A = [0, 1/3], then the
measure of the pre-image of A, $T_{3/2}^{-1}(A)$, is (1/3)/(3/2) = 2/9. But the measure of A
is 1/3. Since A is a measurable set, $T_{3/2}$ is not a measure-preserving function, even
though no taffy is lost in the process. The main difference is that the taffy is not mixed
evenly in this case.
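A grid count gives the same answer. This is a rough sketch under our own assumptions: the names `modified_taffy` and `preimage_measure` are hypothetical, and the measure of the preimage is estimated by counting which of n evenly spaced midpoints of [0, 1] are sent into [a, b].

```python
from fractions import Fraction

def modified_taffy(x):
    """T_{3/2}(x): stretch to length 3/2, then fold at the point 1."""
    r = Fraction(3, 2)
    return r * x if x < Fraction(2, 3) else -r * x + 2

def preimage_measure(a, b, n=9000):
    """Estimate the measure of the preimage of [a, b] on a grid of n midpoints."""
    hits = sum(1 for k in range(n)
               if a <= modified_taffy(Fraction(2 * k + 1, 2 * n)) <= b)
    return Fraction(hits, n)

print(preimage_measure(Fraction(0), Fraction(1, 3)))  # 2/9, while A has measure 1/3
```

The grid size 9000 is chosen so that the endpoint 2/9 falls exactly between two midpoints; other grid sizes give values close to, but not exactly, 2/9.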
Folks, both our taffy and carousel functions as originally defined preserve Lebesgue
measure. However, when we modify the functions, all carousel-like functions preserve
Lebesgue measure, but not all taffy-like functions preserve Lebesgue measure.
Ergodicity
Ladies and gentlemen, you are now about to witness the secrets behind ergodicity and
how it relates to our carnival functions and modified versions of these functions. We
will first show you strange sets that are equal to their preimages.
What does this mean, you ask, to have a set A with $f^{-1}(A) = A$? Watch carefully
as our carousel carnies dress two children on opposite sides of our carousel in red
clown wigs. Keep your eyes wide open when the operator runs the carousel backwards.
That's right, friends, $C^{-1}(x)$ is a rotation halfway around the carousel, so no child ends
in the same location as he or she began. But wait! After a half rotation of the carousel
backwards, the red wigs are located in exactly the same positions as before, even if the
children themselves are in different locations! That's right, folks, if A denotes the set
of locations of red clown wigs, we have that $C^{-1}(A) = A$.
Why are sets with $f^{-1}(A) = A$ important? In general, if we have a measurable
subset $A \subseteq [0, 1]$ and measure-preserving function f such that $f^{-1}(A) = A$, then it
is also true that $f^{-1}([0, 1] \setminus A) = [0, 1] \setminus A$. In this case, we could simplify things;
we could study f by looking at its restriction to A independently from its restriction
to $[0, 1] \setminus A$. However, if $\mu(A) = 0$ or $\mu(A) = 1$, then we haven't significantly
simplified our study. Functions that cannot be simplified in this way are called ergodic. In
other words, if the measure of A is strictly between 0 and 1, then for f to be ergodic,
it is necessary that $f^{-1}$ moves at least part of the set A to somewhere else.
DEFINITION 2. If $\mu$ is Lebesgue measure on [0, 1] and $f : [0, 1] \to [0, 1]$ is a
measure-preserving function, then f is ergodic if the only measurable sets A with
$f^{-1}(A) = A$ satisfy $\mu(A) = 0$ or 1.
Although the carousel function C(x) is measure-preserving, it is not ergodic, and it
is very easy to construct a measurable set to verify this. Let $A = [0, 1/4) \cup [1/2, 3/4)$,
representing children riding in the first or third quadrants of the circle. Then A is
clearly measurable with $\mu(A) = 1/2$, and $C^{-1}(A) = A$, so C(x) is not ergodic. We
will see in the next section why this is significant.
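One way to convince yourself that this set is invariant is to test membership point by point: x lies in $C^{-1}(A)$ exactly when C(x) lies in A. The sketch below is an illustration only; the names `carousel` and `in_A` and the choice of a 1000-point grid are our assumptions.

```python
from fractions import Fraction

def carousel(x):
    """C(x) = (x + 1/2) mod 1."""
    return (x + Fraction(1, 2)) % 1

def in_A(x):
    """Membership in A = [0, 1/4) U [1/2, 3/4)."""
    return 0 <= x < Fraction(1, 4) or Fraction(1, 2) <= x < Fraction(3, 4)

# x lies in the preimage of A exactly when C(x) lies in A.
grid = [Fraction(k, 1000) for k in range(1000)]
print(all(in_A(carousel(x)) == in_A(x) for x in grid))  # invariance on the grid
print(Fraction(sum(in_A(x) for x in grid), 1000))       # share of grid points in A
```

A finite grid is of course not a proof, but it agrees with the claim that A is its own preimage and occupies half of the circle.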
What about the other carousel-like functions defined by $C_a(x) = (x + a) \bmod 1$
for a real number a? We know that they are all measure-preserving, but are any of
these functions ergodic? If a = 0 then we have that $C_0(x)$ is the identity function
I(x), and in this case $I^{-1}(A) = A$ for any set A, so I(x) is clearly not ergodic. If
the translation number a is any other rational number, then $C_a$ is also not ergodic. For
when a is rational, a = p/q for some integers p and q with $q \ne 0$, and p and q have no
common factors besides 1. Define
$$A = \left[0, \tfrac{1}{2q}\right] \cup \left[\tfrac{1}{q}, \tfrac{3}{2q}\right] \cup \left[\tfrac{2}{q}, \tfrac{5}{2q}\right] \cup \cdots \cup \left[\tfrac{q-1}{q}, \tfrac{2q-1}{2q}\right].$$
Then $C_a^{-1}(A) = A$, but the measure of A is 1/2.
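The same point-by-point test works for any rational rotation. In this sketch (our own illustration; `rotation` and `in_A` are hypothetical names), membership in A is decided by the observation that x lies in one of the intervals [k/q, k/q + 1/(2q)] exactly when the fractional part of qx is at most 1/2.

```python
from fractions import Fraction

def rotation(a):
    """Return the carousel-like map C_a(x) = (x + a) mod 1."""
    return lambda x: (x + a) % 1

def in_A(x, q):
    """Membership in A = [0, 1/(2q)] U [1/q, 3/(2q)] U ... for a given q."""
    return (x * q) % 1 <= Fraction(1, 2)

p, q = 2, 5                      # a = p/q in lowest terms
C_a = rotation(Fraction(p, q))
grid = [Fraction(k, 1000) for k in range(1000)]
print(all(in_A(C_a(x), q) == in_A(x, q) for x in grid))
```

Rotating by p/q adds the integer p to qx, so the fractional part of qx, and with it membership in A, is unchanged.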
If the translation number a of $C_a(x)$ is irrational, then we are in a much different
situation. If a is irrational, then $C_a(x)$ is ergodic. While this is difficult to prove
rigorously from the definition, it is not too challenging to see why the conditions of
ergodicity must hold on intervals. Suppose that the set A contained an interval [c, d]. We
know that no matter how many times we run the carousel, we always end up with a set
of length $d - c$. Since $C_a^{-1}(A) = A$, it follows that $C_a^{-1}([c, d]) \subseteq A$. Using the same
reasoning, $C_a^{-1}(C_a^{-1}([c, d])) \subseteq A$, and so on. However, since a is an irrational number,
the points $C_a^{-n}(c) = C_a^{-1} \circ C_a^{-1} \circ \cdots \circ C_a^{-1}(c)$, where we perform n compositions, fill
out the circle. That is, no matter where you decide to stand around the carousel, at some
time the left endpoint c will stop arbitrarily close to you. If kids with red clown wigs
were sitting in the interval [c, d], then no matter where you stand before the carousel
moves, at some time there will be a red wig almost directly in front of you; thus there
was a red wig in front of you before the carousel moved. So all points on the carousel
must belong to A, and $\mu(A) = 1$. Recall that FIGURE 4 shows the graph of a modified
carousel function with $a = \sqrt{2}/2$, which we now know is ergodic since the translation
number is irrational.
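The claim that the points $C_a^{-n}(c)$ fill out the circle can be explored numerically. The sketch below is an experiment rather than a proof; it assumes floating-point arithmetic, and the names `circle_distance` and `closest_approach` are ours.

```python
import math

def circle_distance(x, y):
    """Distance between two points of [0, 1] with 0 and 1 identified."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def closest_approach(a, start, target, steps):
    """Smallest distance from target to the points (start - n*a) mod 1, n = 1..steps."""
    return min(circle_distance((start - n * a) % 1.0, target)
               for n in range(1, steps + 1))

a = math.sqrt(2) / 2
for target in (0.1, 0.5, 0.9):
    print(target, closest_approach(a, 0.0, target, 1000))
```

Wherever you stand (the `target`), the endpoint passes very close to you within the first thousand backward rotations; taking more steps brings it closer still.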
Don't let this sleight of hand fool you into thinking that this is a complete proof
that $C_a(x)$ is ergodic when a is irrational! Remember from our earlier discussion that
Lebesgue measurable sets are more complicated than intervals, or even infinite unions
or intersections of intervals. Still, examining intervals gives us some idea about why
$C_a$ is ergodic when a is an irrational number, and you can find a complete proof in
standard ergodic theory books, like ones by Petersen [10] or Walters [15].
What about the taffy function T(x)? It, too, is ergodic. Again, we can examine
intervals to obtain a glimmer of understanding as to why this is true. Imagine that you
used red food coloring to color a visible section of your taffy that belonged to a set A.
After unfolding and shrinking, there would be a red piece closer to the first taffy puller
and a red piece closer to the second taffy puller, so those regions had to belong to the
original set A as well. Continuing this process, we see that if A contains an interval
and $T^{-1}(A) = A$, then $\mu(A) = 1$. A rigorous proof can be found in Nicolis's book on
nonlinear science [9].
You might suspect that any measure-preserving, noninvertible function is ergodic,
but that is false. In fact, we can easily modify the taffy function to obtain a measure-
preserving function that isn't ergodic. Let's suppose that our original taffy stretching
clown, who was skilled at stretching the taffy to twice its original length, returns to the
booth. However, after the second clown stretches the taffy to twice its length, instead
of folding the taffy, he cuts it in half. Then each clown performs his own taffy fold at
the midpoint of his own piece. After this is completed, the second clown sticks the two
ends of his piece to the fold of the first clown's piece, so they now have a piece of taffy
that has length one again, and they repeat the process. We can represent this function
with the equation
$$S(x) = \begin{cases} 2x & \text{if } 0 \le x < 1/4 \\ -2x + 1 & \text{if } 1/4 \le x < 1/2 \\ 2x - 1/2 & \text{if } 1/2 \le x < 3/4 \\ -2x + 5/2 & \text{if } 3/4 \le x < 1. \end{cases} \tag{1}$$
See FIGURE 7 for a graph of S(x). Again, if we look at pre-images of any interval,
we end up with exactly two pieces of the same length. In addition, it is not difficult
to see that the graph in FIGURE 7 can be decomposed into the function on [0, 1/2]
and the function on [1/2, 1]. Using arguments similar to those for T(x), the function
S(x) is non-invertible and measure-preserving. However, S is not ergodic because
$S^{-1}([0, 1/2)) = [0, 1/2)$.
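The way S keeps the two halves of the taffy apart can be watched on individual orbits. This is a small sketch under our own assumptions: exact fractions are used, the helper `orbit` is a hypothetical name, and S is extended to x = 1 by its last branch.

```python
from fractions import Fraction

def S(x):
    """The cut-and-fold taffy map S from equation (1)."""
    if x < Fraction(1, 4):
        return 2 * x
    if x < Fraction(1, 2):
        return -2 * x + 1
    if x < Fraction(3, 4):
        return 2 * x - Fraction(1, 2)
    return -2 * x + Fraction(5, 2)

def orbit(x, steps):
    """Return the list x, S(x), S(S(x)), ... of the given length."""
    points = []
    for _ in range(steps):
        points.append(x)
        x = S(x)
    return points

print(orbit(Fraction(1, 7), 6))   # stays in the first clown's half
print(orbit(Fraction(5, 7), 6))   # stays in the second clown's half
```

A jewel dropped at 1/7 never crosses the midpoint, and neither does one dropped at 5/7. Boundary points such as x = 1/4, where S(1/4) = 1/2, are exceptions, but they form a set of measure zero.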
So the original taffy function is ergodic but the carousel function is not. However,
as we have shown, we can modify the carousel function to obtain one that is ergodic,
and we can modify the taffy function to obtain one that is not!
Figure 7 The modified taffy fold function S(x)
The ergodic theorem
Folks, you may not yet be convinced that ergodic functions are useful or important,
but stick around to see the famous Birkhoff ergodic theorem, proved by George David
Birkhoff in 1931 [2]. The ergodic theorem ensures that what you observe is represen-
tative of the entire system. We will use this theorem to help our photographer, who
would like to take pictures of all children on the carousel. If she only takes photos
when the carousel stops, which carousel functions will allow her to photograph all of
the children? We will also use this theorem to help our magician, who has dropped the
jewel into the taffy.
Stick around, friends, and we will show you a simplified version of Birkhoff's ergodic
theorem that will resolve the conundrums of our photographer and magician. To
do this, we need to define the characteristic function of A,
$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases}$$
THEOREM 3. (BIRKHOFF'S ERGODIC THEOREM) If $\mu$ is Lebesgue measure on
[0, 1] and $f : [0, 1] \to [0, 1]$ is a measure-preserving function, then f is ergodic if
and only if
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}_A(f^n(x)) = \mu(A)$$
for each measurable set A and for almost every $x \in [0, 1]$ (for all $x \in [0, 1]$ except
for at most a set of measure 0).
The left-hand side of the equation in Birkhoff's ergodic theorem represents the limit
of the average number of times f(x), f(f(x)), f(f(f(x))), . . . lands in the set A.
This is commonly known as the time average, and the right-hand side of the equation
is known as the space average. In other words, for almost every possible point x,
the sequence f(x), f(f(x)), f(f(f(x))), . . . will eventually land in every set of positive
measure, and about as often as the measure of the set would indicate. The full statement
and proof of Birkhoff's ergodic theorem are beyond the scope of this paper, but we refer
the interested reader to Birkhoff's paper [2], or ergodic theory books by Petersen [10]
or Walters [15].
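Time averages are easy to estimate for the carousel-like maps. The sketch below is a numerical experiment, not a proof; it assumes floating-point arithmetic, the name `time_average` is ours, and the set A = [0, 1/4) and starting point x = 0.1 are arbitrary choices.

```python
import math

def time_average(a, x, lo, hi, steps):
    """Fraction of the points C_a^n(x), n = 1..steps, that land in [lo, hi)."""
    hits = sum(1 for n in range(1, steps + 1) if lo <= (x + n * a) % 1.0 < hi)
    return hits / steps

# A = [0, 1/4) has Lebesgue measure 1/4.
print(time_average(math.sqrt(2) / 2, 0.1, 0.0, 0.25, 100_000))  # irrational a
print(time_average(0.5, 0.1, 0.0, 0.25, 100_000))               # a = 1/2
```

For the irrational rotation the time average comes out close to the space average 1/4. For a = 1/2 the orbit only alternates between two points, one of which lies in A, so the time average is 1/2 instead. The same experiment is unreliable for the taffy map T in floating point, because repeated doubling in binary arithmetic typically drives the computed orbit to 0 within a few dozen steps.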
How does the ergodic theorem apply to our photographer, who is taking pictures
every time that the carousel stops? If the carousel moves according to the original
carousel function C(x), the photographer would photograph the same two children
over and over again. This is because C(x) rotates exactly halfway around each time.
If we look at $C_a(x)$ for any rational a, she would still see a finite number of children
as the day continues. She much prefers the motion described by $C_a(x)$ when a is
irrational. Why? Since we know this system is ergodic, Birkhoff's ergodic theorem
implies that almost every point along the edge of the carousel will eventually move
into the camera's field of view. The photographer does not have to move, yet she can
take photographs of each child if she waits long enough. If she had selected a different
location to set up her camera, she would still photograph every child. Hence, when a
is irrational, we have a happy photographer.
What about our magician? He simply asks a member of the audience to select one
small region on the table to stare at as the taffy pullers work. The magician is convinced
that the jewel will reappear in this one location as long as the group waits long enough.
Since we showed that the taffy function T(x) is ergodic in the previous section, the
ergodic theorem implies that he is correct. However, our magician knows better than to
do his jewel trick with the modified taffy function S(x) shown in FIGURE 7. He can't
guarantee that the audience member will choose a spot where the jewel will reappear
because S(x) is not ergodic.
This brings us back to the question of how ergodic theory is used. In physics, the
ergodic theorem implies that studying the motion of a single particle of gas over the
long term (the time average) gives the same information as looking at all particles at a
particular instant (the space average) [7, 10, 15]. Ergodicity is also useful in biomedical
signal and image processing. For many tests, such as the electrocardiogram (ECG) and
the electroencephalogram (EEG), technicians take only one sample recording from
a patient and calculate a time average. If the process is ergodic, then they can use the
time average to estimate the mean and variance of the signal (the space averages) using
the ergodic theorem [8]. These examples, my friends, are not just a day at the carnival.
A final question After spending a hot, sticky day at the midway, we want to leave
you with one more enticing idea that will compel you to return to our carnival again.
How can we distinguish between the ergodic examples? There are many other
properties that play important roles in ergodic theory; we mention one more. We say a
function f is strong mixing if for all measurable sets A and B
$$\lim_{n \to \infty} \mu(A \cap f^{-n}B) = \mu(A)\mu(B).$$
This means that, in the long run, f distributes B fairly evenly throughout [0, 1]. Strong
mixing implies ergodicity, but not all ergodic functions are strong mixing. One of the
ergodic examples in this paper is strong mixing with respect to Lebesgue measure, and
the other is not. Can you figure out which is which? The answers can be found in the
references [9, 15].
Acknowledgment The referees made insightful comments and suggestions that greatly improved this paper,
and so we novice carnival workers gratefully acknowledge the assistance of our Lot Managers.
REFERENCES
1. J. Barnes and J. Hawkins, Families of ergodic and exact one-dimensional maps, Dyn. Syst. 22(2) (2007)
203–217. doi:10.1080/14689360600914730
2. G. D. Birkhoff, Proof of the ergodic theorem, Proc. Natl. Acad. Sci. 17 (1931) 656–660.
doi:10.1073/pnas.17.12.656
3. K. Dajani and C. Kraaikamp, Ergodic Theory of Numbers, Mathematical Association of America,
Washington, DC, 2002.
4. W. de Melo and S. van Strien, One Dimensional Dynamics, Springer-Verlag, Berlin, 1993.
5. P. Collet and J. P. Eckmann, Iterated Maps on the Interval as Dynamical Systems, Birkhäuser, Boston, 1980.
6. P. R. Halmos, Measure Theory, Van Nostrand, New York, 1950.
7. U. Krengel, Ergodic Theorems, de Gruyter Studies in Mathematics #6, Walter de Gruyter, Berlin, 1985.
8. K. Najarian and R. Splinter, Biomedical Signal and Image Processing, CRC Press, Boca Raton, FL, 2006.
9. G. Nicolis, Introduction to Nonlinear Science, Cambridge University Press, Cambridge, UK, 1995.
10. K. Petersen, Ergodic Theory, Cambridge Studies in Advanced Mathematics #2, Cambridge University Press,
Cambridge, UK, 1983.
11. S. Pietsch and H. Hasenauer, Using ergodic theory to assess the performance of ecosystem models, Tree
Physiology 25 (2005) 825–837.
12. A. Rényi, Representations for real numbers and their ergodic properties, Acta Math. Acad. Sci. Hungar. 8
(1957) 477–493. doi:10.1007/BF02020331
13. D. Rudolph, Fundamentals of Measurable Dynamics: Ergodic Theory on Lebesgue Spaces, Clarendon Press,
Oxford, UK, 1990.
14. C. E. Silva, An Invitation to Ergodic Theory, American Mathematical Society, Providence, RI, 2008.
15. P. Walters, An Introduction to Ergodic Theory, Springer-Verlag, New York, 1982.
Summary The Birkhoff ergodic theorem, proved by George David Birkhoff in 1931, allows us to investigate
the long-term behavior of certain dynamical systems. In this article, we explain what it means for a function to
be ergodic, and we present Birkhoff's theorem. We construct models of activities typically found at carnivals and
compare and contrast them by analyzing their ergodic theory properties. We use these carnival models to show
how Birkhoff's ergodic theorem can be used to help a photographer set up her equipment to take pictures of all
children on a carousel and to aid a magician in finding a lost jewel in a sticky mess of taffy.
JULIA BARNES received her Ph.D. from UNC-Chapel Hill in 1996, and has been teaching at Western Carolina
University ever since. Her research area is a cross between ergodic theory and complex dynamical systems.
Although she has not visited a carnival, ridden a carousel, or watched a clown pull taffy lately, she does enjoy
looking at fun applications of mathematics.
LORELEI KOSS is an associate professor in the Department of Mathematics and Computer Science at Dickin-
son College in Carlisle, Pennsylvania. She received a Ph.D. in Mathematics from the University of North Carolina
at Chapel Hill (1998). In addition to her interest in teaching undergraduate mathematics, she enjoys research on
complex dynamical systems and ergodic theory. She also loves taffy.
To appear in College Mathematics Journal, September 2010
THE FAIRNESS ISSUE
Articles
An Interview with Steven J. Brams, by Michael A. Jones
A Geometric Approach to Fair Division, by Julius Barbanel
Cutting Cakes Carefully, by Theodore P. Hill and Kent E. Morrison
Taking Turns, by Brian Hopkins
Who Does the Housework? by Angela Vierling-Claassen
Lewis Carroll, Voting, and the Taxicab Metric, by Thomas C. Ratliff
Gerrymandering and Convexity, by Jonathan K. Hodge, Emily Marshall, and
Geoff Patterson
Classroom Capsule
Visualizing Elections using Saari Triangles, by Mariah Birgen
Which Surfaces of Revolution
Core Like a Sphere?
VINCENT COLL
Lehigh University
Bethlehem PA 18015
[email protected]
JEFF DODD
Jacksonville State University
Jacksonville, AL 36265
[email protected]
A spherical ring is the object that remains when a cylindrical drill bit bores through a
solid sphere along an axis, removing from the sphere a capsule consisting of a cylinder
with a spherical cap on each end, as shown in FIGURE 1. Remarkably, the volume of
such a spherical ring depends only on its height, defined as the height of its cylindrical
inner boundary, and not on the radius of the sphere from which it was cut.
Figure 1 Cutting a spherical ring of height h from a sphere.
Figure 2 A spherical ring as a solid of revolution.
One straightforward way to verify this fact is to note that all the objects in FIGURE 1
are solids of revolution. This is depicted in FIGURE 2, where everything shown in the
xy-plane is to be revolved around the x-axis. There a sphere of radius r is represented
Math. Mag. 83 (2010) 191–199. doi:10.4169/002557010X494832. © Mathematical Association of America
by the semicircular graph of $y = \sqrt{r^2 - x^2}$, and a spherical ring of height h cut from
this sphere is represented by the shaded region below the semicircle and above the
horizontal line segment of length h inscribed in the semicircle. We can calculate the
volume of this spherical ring by integrating the areas of its annular cross-sections taken
perpendicular to the x-axis (the washer method):
$$V = \int_{-h/2}^{h/2} \pi\left[\left(\sqrt{r^2 - x^2}\right)^2 - \left(\sqrt{r^2 - (h/2)^2}\right)^2\right] dx
= \pi \int_{-h/2}^{h/2} \left[(h/2)^2 - x^2\right] dx = \frac{\pi h^3}{6}.$$
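The independence from r can also be checked numerically. The sketch below is an illustration under our own assumptions: it applies a midpoint rule to the washer-method integrand, the name `ring_volume` is ours, and r is kept moderate because the subtraction of two nearly equal squares loses floating-point accuracy when r is enormous.

```python
import math

def ring_volume(r, h, n=2000):
    """Washer-method volume of the ring of height h cut from a sphere of radius r."""
    inner_sq = r * r - (h / 2) ** 2          # squared radius of the drilled hole
    dx = h / n
    total = 0.0
    for k in range(n):                       # midpoint rule on [-h/2, h/2]
        x = -h / 2 + (k + 0.5) * dx
        total += math.pi * ((r * r - x * x) - inner_sq) * dx
    return total

for r in (1.0, 10.0, 100.0):
    print(r, ring_volume(r, 1.0), math.pi / 6)
```

For h = 1 each radius gives a value very close to $\pi/6$, in line with $V = \pi h^3/6$.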
At the outset it looks as though V should depend on both r and h, but it turns out
to be a function of h only. This is a surprise that challenges many people's intuition.
For example, a spherical ring of height one centimeter cut out of a sphere the size of
the earth has the same volume as a spherical ring of height one centimeter cut out of
a sphere the size of a baseball. How can this be? The reason is that while the inner
radius of the ring cut out of the earth is much larger, the radial thickness of this ring is
much smaller: about $2 \times 10^{-10}$ cm, which is less than the diameter of a hydrogen atom.
For spherical rings of any fixed height h cut out of spheres of increasing radius r, this
tradeoff between increasing inner radius (the quantity $\sqrt{r^2 - (h/2)^2}$ in FIGURE 2) and
decreasing radial thickness (the quantity $r - \sqrt{r^2 - (h/2)^2}$ in FIGURE 2) preserves a
fixed volume.
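The thickness quoted for the earth-sized sphere can be reproduced in a few lines. This sketch makes two assumptions of its own: a mean earth radius of about $6.371 \times 10^8$ cm, which is not stated in the text, and an algebraically equivalent form of the thickness that avoids subtracting two nearly equal numbers. The name `radial_thickness` is ours.

```python
import math

def radial_thickness(r, h):
    """r - sqrt(r^2 - (h/2)^2), rewritten to avoid cancellation for large r."""
    half = h / 2
    return half * half / (r + math.sqrt(r * r - half * half))

earth_radius_cm = 6.371e8       # assumed mean radius of the earth, in cm
print(radial_thickness(earth_radius_cm, 1.0))   # roughly 2e-10 cm
```

The rewritten formula comes from multiplying and dividing $r - \sqrt{r^2 - (h/2)^2}$ by $r + \sqrt{r^2 - (h/2)^2}$.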
This property of the sphere appears in many calculus textbooks as an exercise in
calculating volumes of solids of revolution. It has also caught the eye of many recre-
ational mathematicians, perhaps getting its most public airing in the newspaper column
of Marilyn vos Savant [11]. But, despite its prominence, it seems to lack a name. Since
the process of cutting a spherical ring out of a sphere is much like coring an apple, we
refer to this property as the coring property of the sphere.
Many surfaces of revolution can be similarly cored by cylindrical drill bits centered
on their axes of revolution. So it is natural to ask to what extent the coring property
characterizes the sphere among surfaces of revolution. Here we pose this question pre-
cisely and answer it completely using only elementary ideas from calculus, informed
at critical junctures by geometric insight.
The coring property The first order of business is to state the coring property in
such a way that it applies to surfaces of revolution other than spheres. The coring
property of the sphere compares spheres of different radii r, but each of these is just
the unit sphere scaled up or down by the linear scale factor r. So we say that a surface
of revolution satisfies the coring property if, when the surface is scaled up or down
by a linear scale factor and then cored by a cylindrical drill bit centered on its axis of
revolution, what remains (exterior to the drill bit) is a ring whose volume depends only
on its height, and not on the scale factor. We define a ring to be a one-piece solid of
revolution having a single cylindrical inner boundary, and the height of such a ring to
be the height of its cylindrical inner boundary.
To flesh out this formulation of the coring property, and to give us a workable setup
for our investigation of it, we need a picture. In general, a surface of revolution S is
generated by revolving a plane curve C, called the profile curve of S, around a line
lying in the same plane as C, which we have already called the axis of revolution
of S. In particular, a sphere is the surface generated by revolving a semicircle around
the line containing its diameter. (In fact, this is how Euclid defined a sphere in his
Elements [6, p. 261]!) Since we are essentially generalizing a property of the sphere,
we begin with a profile curve looking much like a semicircle, as depicted in FIGURE 3.
Figure 3 An even profile function y = f(x) scaled by a linear scale factor r. (Labeled: the curve y = r f(x/r), its x-intercepts (−ra, 0) and (ra, 0), its peak (0, rb), the ring height h, and the points (−h/2, r f(h/2r)) and (h/2, r f(h/2r)).)
The profile curve in FIGURE 3 is the graph of an even profile function y = f(x)
and is to be revolved around the x-axis. We scale the surface S generated by the
graph of f by a linear scale factor r, yielding surfaces S(r) generated by the curves
y/r = f(x/r), or y = r f(x/r). (For example, if S is a sphere of radius 1, then S(r)
is a sphere of radius r.) We can cut a ring out of the solid bounded by S(r) by boring
through it with a cylindrical drill bit centered on the x-axis. The resulting ring is generated
by revolving the shaded region around the x-axis in FIGURE 3. We say that the
surface S satisfies the coring property if the volume V(r, h) of a ring of height h cut
out of the solid bounded by S(r) is a function of h alone.
Before striking out in search of surfaces satisfying the coring property, let's examine
the assumptions implicit in FIGURE 3, since these will be the hypotheses for any
conclusions that we reach based on this picture. To begin with, the profile curve in
FIGURE 3 is not self-intersecting and it has exactly two x-intercepts. We accept these
assumptions as geometrically natural, because they ensure that the resulting surface S
is closed: that is, it encloses a single 3-dimensional region.
Two other prominent features of this profile curve are:
1. It is the graph of a function y = f (x).
2. It has a vertical line of symmetry, which conveniently and with no loss of generality
is the y-axis.
These assumptions are not quite as cumbersome as they might seem because, for our
purposes, the first is subsumed by the second. That is, if a curve C generates a surface
that satisfies the coring property and if C is symmetric with respect to the y-axis, then y
must be a function of x on C. This is because for any profile curve C that is symmetric
with respect to the y-axis on which y is not a function of x, there will be values of h
for which two or more rings having the same height h but different volumes can be cut
out of the surface generated by C by cylindrical drill bits of different sizes, so that the
volume of a ring cannot be a function of its height alone. For example, consider the
profile curve C indicated in FIGURE 4. For the value of h indicated there, cylindrical
drill bits of radii $R_1$, $R_2$, and $R_3$ will cut rings out of the surface generated by C having
the same height h but different volumes. A surface generated by a curve C having a
vertical line of symmetry is centrally symmetric. That is, it has a center of symmetry:
a point P (in this case the origin) bisecting every line segment passing through P that
connects two points on the surface.
Figure 4 A symmetric profile curve not defined by a function. (Labeled: the x-intercepts (−a, 0) and (a, 0), the peak (0, b), the ring height h, and the points $(\pm h/2, R_1)$, $(\pm h/2, R_2)$, and $(\pm h/2, R_3)$ on the curve.)
So a closed, centrally symmetric surface of revolution S satisfying the coring property
must be generated by the graph of an even profile function f having exactly two
x-intercepts. In addition, f must be increasing to the left of x = 0 and decreasing to
the right of x = 0, since only then will coring the surface S with a cylindrical drill bit
always result in what we have defined to be a ring, which needs to be in one piece.
Therefore, to determine which closed, centrally symmetric surfaces of revolution satisfy
the coring property, it is safe to use FIGURE 3 as a starting point.
The symmetric case: a calculus argument The volume V(r, h) of the ring formed
in FIGURE 3 is twice the volume of the right half of the ring, which is the volume
enclosed by S(r) on the interval $0 \le x \le h/2$ less the volume of the cylinder drilled
out on that same interval:
\[ V(r, h) = 2 \left[ \int_0^{h/2} \pi \left[ r f\!\left( \frac{x}{r} \right) \right]^2 dx - \pi \left[ r f\!\left( \frac{h}{2r} \right) \right]^2 \frac{h}{2} \right]. \tag{1} \]
We wish to identify the functions f for which V depends only on h and not on r.
Towards this end, the simplest strategy turns out to be the best: we simply set equal to
each other the volumes of two different rings of the same height, and see what we can
say about f based on the resulting equation.
In particular, note that for a ring cut out of the unscaled surface S, whose height
h will satisfy $0 \le h/2 \le a$, another ring of the same height can be cut out of any
scaled-up surface S(r) where r > 1, and the volumes of these two rings should be
the same. That is, for any h such that $0 \le h/2 \le a$ and any $r \ge 1$, we should have
V(1, h) = V(r, h), or from (1):
\[ 2 \left[ \int_0^{h/2} \pi [f(x)]^2 \, dx - \pi \left[ f\!\left( \frac{h}{2} \right) \right]^2 \frac{h}{2} \right] = 2 \left[ \int_0^{h/2} \pi \left[ r f\!\left( \frac{x}{r} \right) \right]^2 dx - \pi \left[ r f\!\left( \frac{h}{2r} \right) \right]^2 \frac{h}{2} \right] \tag{2} \]
which is easily rearranged to yield
\[ \int_0^{h/2} \left( [f(x)]^2 - [r f(x/r)]^2 \right) dx = (h/2) \left( [f(h/2)]^2 - r^2 [f(h/2r)]^2 \right). \tag{3} \]
For fixed $r \ge 1$, let
\[ g(x) = [f(x)]^2 - r^2 [f(x/r)]^2. \]
Then for $0 \le h/2 \le a$, g satisfies
\[ \int_0^{h/2} g(x) \, dx = \frac{h}{2} \, g(h/2). \tag{4} \]
Dividing both sides of (4) by h/2, we see that the average value of g on any subinterval
[0, h/2] of [0, a] is its value at the right endpoint of the subinterval: g(h/2). Does this
mean that g must be constant? If f is continuous on the interval [0, a], then so is g, so
that both sides of (4) are differentiable functions of h. Differentiating yields
\[ \frac{1}{2} g(h/2) = \frac{1}{2} g(h/2) + \frac{h}{4} g'(h/2) \]
so that $g'$ [...] $\left( \frac{b}{a} \right)^2 h^3$
which depends only on the shape of the spheroid and on h, and not on the scale of the
spheroid. So we have shown:
PROPOSITION 1. A closed, centrally symmetric surface of revolution generated by
a continuous profile curve satisfies the coring property if and only if it is a spheroid.
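Proposition 1 can be illustrated numerically by evaluating formula (1) for two profile functions. This is a sketch only: the helper names `ring_volume`, `spheroid`, and `parabola` are hypothetical, the integral is approximated by composite Simpson's rule, and the parabola is simply one convenient non-spheroidal profile chosen for contrast. For a spheroid with semi-axes a and b, direct integration of (1) gives $\frac{\pi}{6} (b/a)^2 h^3$.

```python
import math

def ring_volume(f, r, h, n=2000):
    """Formula (1): ring volume V(r, h) for the even profile function f scaled by r.

    The integral over [0, h/2] uses composite Simpson's rule with n (even) subintervals.
    """
    half = h / 2
    step = half / n

    def area(x):
        # Cross-sectional disk area of the scaled solid at position x.
        return math.pi * (r * f(x / r)) ** 2

    total = area(0.0) + area(half)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * area(i * step)
    solid = total * step / 3          # volume enclosed by S(r) on [0, h/2]
    cylinder = area(half) * half      # volume drilled out on [0, h/2]
    return 2 * (solid - cylinder)

def spheroid(a, b):
    """Profile of a spheroid with x-intercepts at -a and a and peak height b."""
    return lambda x: b * math.sqrt(max(0.0, 1 - (x / a) ** 2))

def parabola(x):
    """A non-spheroidal even profile with x-intercepts at -1 and 1."""
    return 1 - x * x

if __name__ == "__main__":
    h = 1.0
    for r in (1.0, 2.0, 5.0):
        print(r, ring_volume(spheroid(2.0, 1.0), r, h), ring_volume(parabola, r, h))
```

In the printed table the spheroid column stays constant as r changes, while the parabola column does not.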
The non-symmetric case: a geometric insight To expand our search for closed
surfaces of revolution satisfying the coring property, we need to look at surfaces that
are not centrally symmetric. But the profile curve of such a surface need not be the
graph of a profile function. So how do we describe the profile curves among which we
want to search? We must replace FIGURE 3 by the more complicated FIGURE 5.
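Figure 5 A family of non-symmetric profile curves C(r). (Labeled: the curves x = rG(y/r) on the left and x = rF(y/r) on the right, the peak (0, rb), the drill-bit radius R, and the ring height h.)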
There a non-symmetric profile curve C generating a non-symmetric surface S is
scaled by a linear scale factor r to produce a family of profile curves C(r) that generate
surfaces S(r). For convenience, we locate the maximum y-value b on the curve C at
the point (0, b). Since by hypothesis the curve C has exactly two x-intercepts, one
portion of C must connect the rightmost of these x-intercepts with (0, b) and another
portion of C must connect the leftmost of these x-intercepts with (0, b). On each of
these portions y need not be a function of x, but x is a function of y. Otherwise, coring
the surface S with a cylindrical drill bit centered on its axis would not always produce
a ring, which by definition has to be in one piece. So the curve C is the union of the
graphs of two functions: x = F(y) on the right and x = G(y) on the left. The domain
of both F and G is $0 \le y \le b$ and F(b) = G(b) = 0.
Fortunately, we can reduce this more complicated situation to the simpler one we
have already analyzed. We merely symmetrize the profile curve C in FIGURE 5 with
respect to the y-axis. That is, for each y we horizontally shift the line segment determined
by the points (G(y), y) and (F(y), y) on C so that its center is on the y-axis.
The left and right endpoints of the shifted line segment then lie the same distance
$(F(y) - G(y))/2$ to the left and the right of the y-axis, respectively. This transforms
C to the symmetric curve C
generated by C
is the
symmetrization of the surface S generated by C relative to the plane x = 0. Clearly S
is centrally symmetric.
Now suppose we scale both the original curve C and the symmetrized curve C
by the same linear scale factor r. Coring the resulting surfaces of revolution using the
same cylindrical drill bit of radius R centered on the x-axis yields two rings having the
same height $h(r) = r \, (F(R/r) - G(R/r))$. These rings are generated by revolving the
shaded regions around the x-axes in FIGURE 5 and FIGURE 6. If the volumes of these
rings are calculated using the shell method, the answer is the same in each case:
\[ V = \int_{R}^{rb} 2 \pi y \left( r F(y/r) - r G(y/r) \right) dy. \]
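The shell integral depends on the profile only through the width F − G, which is why symmetrizing does not change the ring volume. Here is a small numerical sketch of that observation. Everything in it is an illustrative assumption rather than the article's own example: the lopsided profile (`F`, `G`) is a hypothetical curve whose horizontal widths match the unit circle, and the integral is approximated with a plain midpoint rule.

```python
import math

def shell_ring_volume(F, G, r, R, b=1.0, n=200_000):
    """Shell-method volume: integral from R to r*b of 2*pi*y*(r F(y/r) - r G(y/r)) dy.

    Approximated with the composite midpoint rule on n subintervals.
    """
    step = (r * b - R) / n
    total = 0.0
    for i in range(n):
        y = R + (i + 0.5) * step
        total += 2 * math.pi * y * r * (F(y / r) - G(y / r))
    return total * step

def ring_height(F, G, r, R):
    """Height h(r) = r (F(R/r) - G(R/r)) of the ring cut by a drill bit of radius R."""
    return r * (F(R / r) - G(R / r))

def arc(y):
    """Right half of the unit circle, written as x in terms of y."""
    return math.sqrt(max(0.0, 1 - y * y))

# Hypothetical lopsided profile: x-intercepts at -0.5 and 1.5, peak at (0, 1).
F = lambda y: 1.5 * arc(y)
G = lambda y: -0.5 * arc(y)
# Its symmetrization about the y-axis is the unit circle.
F_sym = lambda y: arc(y)
G_sym = lambda y: -arc(y)

if __name__ == "__main__":
    for r, R in ((1.0, 0.5), (3.0, 2.0)):
        h = ring_height(F, G, r, R)
        print(r, R, shell_ring_volume(F, G, r, R),
              shell_ring_volume(F_sym, G_sym, r, R), math.pi * h ** 3 / 6)
```

For this particular choice the symmetrized curve is a circle, so both volumes should agree with $\pi h^3 / 6$ up to the quadrature error.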
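Figure 6 Symmetrized versions of the profile curves. (Labeled: the curves $x = \frac{r}{2} \left( G(y/r) - F(y/r) \right)$ on the left and $x = \frac{r}{2} \left( F(y/r) - G(y/r) \right)$ on the right, the peak (0, rb), the drill-bit radius R, and the ring height h.)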
SUZANNE DORÉE
Augsburg College
Minneapolis, MN 55454-1338
[email protected]
The Tower of Hanoi graphs are intricate, highly symmetric, little-known combinatorial
graphs that arise from the multipeg generalization of the well-known Tower of Hanoi
puzzle. In this paper, we tour this family of graphs, exploring what we and others have
shown, and what is open for further investigation. Even a quick glance at FIGURES 1–4
showing the first few examples (which we define more carefully within the paper)
suggests patterns waiting to be discovered. We count the order, size, and degrees of
vertices and show how alternate methods of counting these objects can be used to derive
combinatorial identities. We describe the standard labeling of these graphs, from
which we demonstrate that, although these graphs become more complex as their order
increases, one measure of their complexity, the chromatic number, remains remarkably
simple.
Figure 1 The Hanoi graph $H_3^2$
Figure 2 The Hanoi graph $H_4^2$
Figure 3 The Hanoi graph $H_3^3$
Figure 4 The Hanoi graph $H_4^3$
Math. Mag. 83 (2010) 200–209. doi:10.4169/002557010X494841. © Mathematical Association of America
The Hanoi graphs
The graphs begin with the Tower of Hanoi puzzle. The classic version has three pegs
and several disks with distinct diameters, as in FIGURE 5. At the beginning, all of
the disks are stacked on the rst peg in order by size, with the largest at the bottom.
The object is to move the disks so that they are similarly stacked on the second peg.
Only one disk may be moved at a time, from the top of one stack to the top of another
stack (or onto an empty peg), and no disk may ever sit atop a smaller disk. Readers
who have never tried the puzzle might wish to play one of the many available online
versions.
Figure 5 The tower of Hanoi puzzle
Figure 6 Adjacent states in $H_4^5$
The puzzle was invented in 1883 by French number theorist and recreational mathematician
Édouard Lucas (1842–1891). It was quickly generalized. Lucas himself explored
multipeg puzzles as early as 1889. A 4-peg puzzle known as The Reve's Puzzle
appeared in 1908 in The Canterbury Puzzles and Other Curious Problems [3].
The problem of counting the number of steps needed to solve the multipeg puzzle (as
a function of the numbers of pegs and disks) was posed in 1939 in the Monthly [17].
Lucas counted the minimum number of moves needed to solve the 3-peg puzzle, but
the minimum number of moves needed to solve the 4-peg puzzle has yet to be settled.
Of course, if the number of pegs exceeds the number of disks, then the puzzle is trivial,
but with each added peg the corresponding graphs become more complicated. Andreas
Hinz gives a more detailed history of the puzzle [4].
Associated with many puzzles and games is a model called a state graph, or configuration
graph. Its vertices are the legal states, in our case the allowable configurations
of disks on pegs. Two vertices are connected by an edge if a single move takes us from
one state to the other. The state graph of a Tower of Hanoi puzzle with d disks on p
pegs for $p \ge 3$ is called a generalized Tower of Hanoi graph, or just Hanoi graph, and
is denoted $H_p^d$. These graphs are undirected since every move is reversible.
For example, FIGURE 6 shows two states in the puzzle with five disks on four pegs.
We get from the first state to the second by moving the next-to-smallest (light gray)
disk from the first to fourth peg. Thus the vertices corresponding to these two states
are connected by an edge in the graph $H_4^5$.
To see how these graphs are built, note that for the (admittedly silly) one-disk puzzle
on p pegs, the state graph consists of p vertices with an edge connecting each pair of
vertices. That is, $H_p^1 = K_p$, the complete graph on p vertices. Another observation
for those just getting to know these graphs is that the corners of the large triangle in
FIGURE 3 correspond to states with all three disks stacked on a single peg.
For two disks, the subgraph of $H_p^2$ whose edges correspond to moves of the smaller
disk is p disjoint copies of $H_p^1 = K_p$. (Each copy of $H_p^1$ corresponds to a particular
fixed placement of the larger disk.) To build the full graph $H_p^2$, we connect vertices
from different components when there is a move of the larger disk between their corresponding
states. For example, FIGURE 1 shows the graph $H_3^2$ built from three copies
of the triangle $H_3^1 = K_3$, and FIGURE 2 shows the graph $H_4^2$ built from four copies
of the kite $H_4^1 = K_4$. Using our imagination, we see $H_5^2$ built from five copies of the
pentagram $H_5^1 = K_5$ and so on. We can more easily track this construction using the
vertex labeling we present later.
In general, the d-disk graph $H_p^d$ is built from p copies of $H_p^{d-1}$, each corresponding
to a fixed placement of the largest disk, where we connect remote vertices if there is
a corresponding move of this largest disk. For example, FIGURE 3 shows the graph
$H_3^3$ built from three copies of $H_3^2$ and FIGURE 4 shows the graph $H_4^3$ built from four
copies of $H_4^2$.
This recursive construction suggests that the graphs are connected: that we can get
from any arrangement of disks on pegs to any other in the puzzle. Though connectedness
is not obvious from the puzzle itself, Hinz and Daniele Parisse prove that the
Hanoi graphs are not only connected when $p \ge 3$, but also Hamiltonian: there exists
a cycle visiting each vertex exactly once [7]. They also assert that $H_p^d$ is $(p - 1)$-connected:
that the removal of any $p - 2$ vertices and their corresponding edges does
not disconnect the graph.
The Hanoi graphs for the classic 3-peg puzzle were introduced in 1944 in The Mathematical
Gazette [16]. They bear striking resemblance to Sierpiński's triangles and are
a special case of the Sierpiński graphs discussed by various authors [8, 9, 12, 18]. They
are related to Pascal's triangle, as discussed by David Poole [15] and Hinz [5]. As an
application, Paul Cull and Ingrid Nelson discuss the 3-peg graph's role in perfect
1-error correcting codes [2]. The Hanoi graphs for the puzzle on more than three pegs
have been studied since the 1980s, for example by Xiaowu Lu [13] and Hinz [4].
Though we are interested in the graphs, it is worth mentioning the connection to
solving the puzzle. A path in a graph is a sequence of distinct vertices, each consecutive
pair connected by an edge. The length of the path is the number of edges. Solving
the puzzle amounts to finding a path from the starting vertex to the ending vertex,
and of particular interest are paths of minimal length. In the 3-peg graphs, a minimal
path follows the side of the triangle. Hinz and others have expressed hope that understanding
the Hanoi graphs might lead to insight on minimal solutions of the puzzle for
p > 3 pegs.
Counting on the Hanoi graphs
A graph can be measured in many ways, often beginning with the number of vertices,
number of edges, and degrees of vertices. In this section, we calculate these quantities
for the Hanoi graphs. Then, we derive some combinatorial identities. These results
appear (or are implicit) in the work of Sandi Klavžar, Uroš Milutinović, and Ciril
Petr [10].
How many vertices does $H_p^d$ have? Each of the d disks can be assigned to any of
the p pegs. Since disks must be piled largest to smallest on each peg, each assignment
produces a unique configuration. Therefore, there are $p^d$ different configurations and,
thus, $p^d$ vertices in the graph.
How many edges does $H_p^d$ have? For a fixed pair of pegs, we can move a disk
from precisely one of those pegs to the other at every state except where both pegs are
empty. Since there are $(p - 2)^d$ states with both pegs empty, there are $p^d - (p - 2)^d$
states where we can move a disk between this pair of pegs. Each move is counted once at
each of its two end states, which is to say, counted twice. Accounting for our choice of
pegs as well, we find the total number of edges is
\[ \frac{1}{2} \binom{p}{2} \left[ p^d - (p - 2)^d \right]. \]
For example, the graph $H_3^3$ shown in FIGURE 3 has 27 vertices and 39 edges, and
the graph $H_4^2$ shown in FIGURE 2 has 16 vertices and 36 edges.
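The vertex and edge counts can be confirmed by building small Hanoi graphs directly from the rules of the puzzle. The sketch below is an illustration, not the authors' code: a state is stored as a tuple whose entry i is the peg holding disk i + 1 (disk 1 being the smallest), and the names `hanoi_graph` and `edge_formula` are hypothetical.

```python
from itertools import product
from math import comb

def hanoi_graph(d, p):
    """Return (vertices, edges) of the Hanoi graph with d disks on p pegs.

    A state is a tuple s with s[i] = peg holding disk i + 1 (disk 1 is smallest).
    """
    vertices = list(product(range(p), repeat=d))
    edges = set()
    for s in vertices:
        for i, a in enumerate(s):
            smaller = s[:i]                  # pegs holding the disks smaller than disk i + 1
            if a in smaller:                 # a smaller disk sits on top, so this disk is stuck
                continue
            for b in range(p):
                if b != a and b not in smaller:   # target peg holds no smaller disk
                    t = s[:i] + (b,) + s[i + 1:]
                    edges.add(frozenset((s, t)))  # undirected edge, stored once
    return vertices, edges

def edge_formula(d, p):
    """Closed-form edge count (1/2) * C(p, 2) * (p^d - (p - 2)^d)."""
    return comb(p, 2) * (p ** d - (p - 2) ** d) // 2

if __name__ == "__main__":
    for d, p in ((3, 3), (2, 4)):
        vertices, edges = hanoi_graph(d, p)
        print(d, p, len(vertices), len(edges), edge_formula(d, p))
```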
Alternatively, for each $1 \le i \le d$, we can move disk i between peg A and peg B
as long as none of the $i - 1$ smaller disks sit on either of these pegs. There are
\[ \binom{p}{2} p^{d-i} (p - 2)^{i-1} \]
edges that correspond to moving disk i. Summing to get the total number of edges and
equating with our previous count gives the identity
\[ \sum_{i=1}^{d} \binom{p}{2} p^{d-i} (p - 2)^{i-1} = \frac{1}{2} \binom{p}{2} \left[ p^d - (p - 2)^d \right]. \]
We could have derived this by algebraic manipulation (using the factorization of
$x^n - y^n$, where here $x - y = 2$), but it is more amusing when it appears from counting
on Hanoi graphs.
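The disk-by-disk count and the identity it produces are easy to check in exact integer arithmetic. A minimal sketch, with the hypothetical helper name `edges_by_disk`:

```python
from math import comb

def edges_by_disk(d, p):
    """Number of edges of the d-disk, p-peg Hanoi graph that move disk i, for i = 1..d."""
    return [comb(p, 2) * p ** (d - i) * (p - 2) ** (i - 1) for i in range(1, d + 1)]

if __name__ == "__main__":
    d, p = 3, 3
    per_disk = edges_by_disk(d, p)
    # Compare twice the sum with C(p, 2) * (p^d - (p - 2)^d) to stay in integers.
    print(per_disk, 2 * sum(per_disk), comb(p, 2) * (p ** d - (p - 2) ** d))
```

For three disks on three pegs this gives 27 edges for the smallest disk, 9 for the middle disk, and 3 for the largest, matching the 39 edges counted earlier.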
What is the degree of each vertex? At each vertex there is one incident edge for
every pair of pegs, except when both pegs are empty in the corresponding state. Thus,
the degree of a vertex corresponding to a state with k occupied pegs, or equivalently k
top disks, is
\[ \binom{p}{2} - \binom{p - k}{2}, \]
where the second term is understood to equal zero if $k = p - 1$ or $k = p$.
Alternatively, the only disks that move are top disks, which can move to any other
peg unless that peg is occupied by a smaller top disk. Thus, counting from smallest top
disk to largest, we find the degree of a vertex corresponding to a state with k occupied
pegs equals
\[ (p - 1) + (p - 2) + \cdots + (p - k) = kp - \binom{k + 1}{2} = \binom{p}{2} - \binom{p - k}{2}. \]
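The two degree formulas can be compared directly. This is a small sketch with hypothetical function names; it relies on Python's `math.comb` returning 0 when the lower index exceeds the upper one, which matches the convention stated above for $k = p - 1$ and $k = p$.

```python
from math import comb

def degree_by_pairs(p, k):
    """Degree counted by pairs of pegs: C(p, 2) - C(p - k, 2)."""
    return comb(p, 2) - comb(p - k, 2)

def degree_by_top_disks(p, k):
    """Degree counted by top disks: (p - 1) + (p - 2) + ... + (p - k)."""
    return sum(p - j for j in range(1, k + 1))

if __name__ == "__main__":
    p = 8
    for k in range(1, p + 1):
        print(k, degree_by_pairs(p, k), degree_by_top_disks(p, k))
```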
Notice that the degree depends on the number of occupied pegs in the corresponding
state. How many states have exactly k occupied pegs? For this count we use the Stirling
number of the second kind, S(d, k), which equals the number of ways to partition
d distinguishable objects into k nonempty subsets. A standard recursion to calculate
S(d, k) for $0 \le k \le d$ is
\[ S(0, 0) = 1; \qquad S(d, 0) = 0 \text{ for } d \ge 1; \]
and
\[ S(d, k) = S(d - 1, k - 1) + k \, S(d - 1, k), \text{ for } d \ge 1. \]
(To see why, note that the first summand counts the partitions where the dth element
is in a singleton set.)
Thus we can sort d disks into exactly k nonempty subsets in S(d, k) ways. We
can assign these subsets to p pegs in $p(p - 1) \cdots (p - (k - 1))$ ways; we denote
this falling factorial by $(p)_k$. Since the subsequent placement of each disk onto its
subset's assigned peg is uniquely determined by size, the number of states with exactly
k occupied pegs is $S(d, k) (p)_k$.
Klavžar et al. use the Hanoi graphs to derive various combinatorial identities [10].
For example, summing over the possible number of occupied pegs and equating our
two counts for the total number of vertices give the well-known Stirling identity
\[ \sum_{k=1}^{p} S(d, k) (p)_k = p^d \]
for any positive integers d and p.
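The recursion for S(d, k), the count of states with k occupied pegs, and the Stirling identity can all be checked against brute-force enumeration for small cases. The following is a sketch; the names `stirling2`, `falling`, and `states_with_k_pegs` are hypothetical, and states are again tuples listing the peg of each disk.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def stirling2(d, k):
    """Stirling number of the second kind via the standard recursion."""
    if d == 0 and k == 0:
        return 1
    if d == 0 or k == 0:
        return 0
    return stirling2(d - 1, k - 1) + k * stirling2(d - 1, k)

def falling(p, k):
    """Falling factorial (p)_k = p (p - 1) ... (p - k + 1)."""
    result = 1
    for j in range(k):
        result *= p - j
    return result

def states_with_k_pegs(d, p, k):
    """Brute-force count of states of d disks on p pegs with exactly k occupied pegs."""
    return sum(1 for s in product(range(p), repeat=d) if len(set(s)) == k)

if __name__ == "__main__":
    d, p = 4, 3
    for k in range(1, p + 1):
        print(k, stirling2(d, k) * falling(p, k), states_with_k_pegs(d, p, k))
    print(sum(stirling2(d, k) * falling(p, k) for k in range(1, p + 1)), p ** d)
```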
Similarly, we can compare the number of edges. We count $S(d, k) (p)_k$ vertices
corresponding to states with exactly k occupied pegs, each with degree
$\binom{p}{2} - \binom{p - k}{2}$.
Thus the number of edges in the graph is
\[ \frac{1}{2} \sum_{k=1}^{p} S(d, k) (p)_k \left[ \binom{p}{2} - \binom{p - k}{2} \right]. \]
Equating with our previous count and simplifying give
\[ \sum_{k=1}^{p-2} S(d, k) (p)_{k+2} = p (p - 1) (p - 2)^d, \]
which might appear to be novel but, alas, after canceling $p(p - 1)$ reduces to the same
Stirling identity for $p - 2$.
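Both the degree-sum count of edges and the simplified identity can be verified with exact integers. This is a sketch under the same conventions as before; the helper names are hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def stirling2(d, k):
    """Stirling number of the second kind via the standard recursion."""
    if d == 0 and k == 0:
        return 1
    if d == 0 or k == 0:
        return 0
    return stirling2(d - 1, k - 1) + k * stirling2(d - 1, k)

def falling(p, k):
    """Falling factorial (p)_k = p (p - 1) ... (p - k + 1)."""
    result = 1
    for j in range(k):
        result *= p - j
    return result

def twice_edges_by_degree(d, p):
    """Sum of all vertex degrees, grouped by the number k of occupied pegs."""
    return sum(stirling2(d, k) * falling(p, k) * (comb(p, 2) - comb(p - k, 2))
               for k in range(1, p + 1))

def simplified_identity_holds(d, p):
    """Check sum_{k=1}^{p-2} S(d, k) (p)_{k+2} = p (p - 1) (p - 2)^d."""
    left = sum(stirling2(d, k) * falling(p, k + 2) for k in range(1, p - 1))
    return left == p * (p - 1) * (p - 2) ** d

if __name__ == "__main__":
    print(twice_edges_by_degree(3, 3) // 2)      # edge count of the 3-disk, 3-peg graph
    print(simplified_identity_holds(3, 5))
```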
There are further enumerative uses of the Hanoi graphs. Klavžar et al. showed connections
to second order Euler numbers, Lah numbers, and Catalan numbers; they
suggest that there may be additional identities available [11]. Hinz et al. connect the
graphs to Stern's diatomic sequence [6].
Labeling and coloring the Hanoi graphs
It is helpful to label each vertex of the Hanoi graph in a way that lets us read off
the state of the puzzle it represents. In this section, we describe the standard labeling,
which leads to a natural definition of the recursive structure introduced informally
earlier and is key to coloring the vertices.
It is customary to number the pegs $0, 1, 2, \ldots, p - 1$ and the disks $1, 2, 3, \ldots, d$
from smallest to largest. We say the ith disk sits on peg $s_i$, for $i = 1, 2, \ldots, d$, and
label the vertex corresponding to this state with the string $s_d \cdots s_2 s_1$ in this (reverse)
order. Note that the labeling denotes where each disk goes; imagine placing the disks
on the pegs, starting with the largest disk and working down by size.
For example, the state shown in FIGURE 7 corresponds to the vertex labeled 173033
in $H_8^6$.
Figure 7 State corresponding to vertex labeled 173033 in $H_8^6$
We list the labels of its twenty-two adjacent vertices in a table.
Disk  to peg 0  to peg 1  to peg 2  to peg 3  to peg 4  to peg 5  to peg 6  to peg 7
1     173030    173031    173032    -         173034    173035    173036    173037
2     -         -         -         -         -         -         -         -
3     -         173133    173233    -         173433    173533    173633    173733
4     -         -         -         -         -         -         -         -
5     -         113033    123033    -         143033    153033    163033    -
6     -         -         273033    -         473033    573033    673033    -
As another example, note that FIGURE 6 corresponds to the edge between vertices
labeled 01302 (top) and 01332 (bottom) in $H_4^5$. Conversely, we can determine the state
from its vertex label.
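The labeling makes it straightforward to list a vertex's neighbors and to recover the state from a label. The sketch below assumes at most ten pegs, so that each peg number is a single digit; the function names `neighbors` and `state` are hypothetical.

```python
def neighbors(label, p):
    """Map each movable disk to the labels reachable by moving it.

    `label` is the string s_d ... s_2 s_1 of single-digit peg numbers (so p is at most 10).
    """
    pegs = [int(ch) for ch in reversed(label)]    # pegs[i] = peg holding disk i + 1
    result = {}
    for i, a in enumerate(pegs):
        smaller = set(pegs[:i])                   # pegs that hold a smaller disk
        if a in smaller:                          # covered by a smaller disk: cannot move
            continue
        for b in range(p):
            if b != a and b not in smaller:
                moved = pegs[:i] + [b] + pegs[i + 1:]
                new_label = "".join(str(x) for x in reversed(moved))
                result.setdefault(i + 1, []).append(new_label)
    return result

def state(label, p):
    """Disks on each peg, listed from bottom (largest) to top (smallest)."""
    d = len(label)
    stacks = [[] for _ in range(p)]
    for position, ch in enumerate(label):         # the label lists the largest disk first
        stacks[int(ch)].append(d - position)
    return stacks

if __name__ == "__main__":
    for disk, labels in sorted(neighbors("173033", 8).items()):
        print(disk, labels)
    print(state("173033", 8))
```

Running it on 173033 with eight pegs reproduces the rows of the table above, with disks 2 and 4 absent because they are covered by smaller disks.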
Notice the vertex labeled $s_d \cdots s_2 s_1$ has $k = |\{s_d, \ldots, s_2, s_1\}|$ occupied pegs. For
example, the vertex labeled 173033 in $H_8^6$ has
\[ k = |\{1, 7, 3, 0, 3, 3\}| = 4 \]
occupied pegs and thus degree
\[ \binom{8}{2} - \binom{8 - 4}{2} \]
[...] $(p - 2)^{d-1}$ edges of this type. Therefore, $e_{1,p} = \binom{p}{2}$ and
for $d \ge 2$,
\[ e_{d,p} = p \, e_{d-1,p} + \binom{p}{2} (p - 2)^{d-1}. \]
The reader can check that our previous count satisfies this recursion.
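The check left to the reader can be done mechanically. This is a minimal sketch that iterates the recursion for $e_{d,p}$ and compares it with the closed-form edge count; the function names are hypothetical.

```python
from math import comb

def edges_recursive(d, p):
    """Edge count from e_{1,p} = C(p, 2) and e_{d,p} = p e_{d-1,p} + C(p, 2) (p - 2)^(d-1)."""
    e = comb(p, 2)
    for n in range(2, d + 1):
        e = p * e + comb(p, 2) * (p - 2) ** (n - 1)
    return e

def edges_closed_form(d, p):
    """Edge count (1/2) C(p, 2) (p^d - (p - 2)^d)."""
    return comb(p, 2) * (p ** d - (p - 2) ** d) // 2

if __name__ == "__main__":
    for d, p in ((3, 3), (2, 4), (6, 8)):
        print(d, p, edges_recursive(d, p), edges_closed_form(d, p))
```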
Thus far we have looked at known properties of the Hanoi graphs. We are now ready
to prove a new result. The Hanoi graphs are complicated, but thanks to their symmetry
and our convenient labeling, they can be easily colored.
For a positive integer c, a graph can be c-colored if there is a way to label the
vertices with the colors $0, 1, \ldots, c - 1$ such that adjacent vertices are different colors.
The chromatic number of a graph G is the smallest number of colors needed and is
denoted $\chi(G)$. For example, $\chi(H_p^1) = \chi(K_p) = p$.
At any vertex of the full graph $H_p^d$, the subgraph corresponding to moving only the
smallest disk is a copy of $H_p^1 = K_p$. Thus $\chi(H_p^d) \ge p$.
To see that p colors suffice, color the vertex labeled $s_d \cdots s_2 s_1$ by the sum of its peg
numbers modulo p. That is, the vertex labeled $s_d \cdots s_2 s_1$ receives the color
\[ s_d + \cdots + s_2 + s_1 \pmod{p}. \]
To check that this is a p-coloring, observe that the labels of adjacent vertices differ in
exactly one place, corresponding to the sole moved disk between the states. The two
sums therefore differ by a nonzero amount smaller than p in absolute value, so they are
distinct modulo p.
FIGURE 10 shows this coloring of $H_4^3$ with white (0), light gray (1), dark gray (2),
and black (3).
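The claim that the digit sum modulo p is a proper coloring can be tested exhaustively on small graphs. A sketch, assuming the tuple representation of states used for illustration here (entry i is the peg of disk i + 1); the names `neighbors`, `color`, and `is_proper_coloring` are hypothetical.

```python
from itertools import product

def neighbors(s, p):
    """Yield the states one legal move away from s (s[i] = peg holding disk i + 1)."""
    for i, a in enumerate(s):
        smaller = s[:i]
        if a in smaller:
            continue
        for b in range(p):
            if b != a and b not in smaller:
                yield s[:i] + (b,) + s[i + 1:]

def color(s, p):
    """Color of a state: the sum of its peg numbers modulo p."""
    return sum(s) % p

def is_proper_coloring(d, p):
    """True if no edge of the d-disk, p-peg Hanoi graph joins two vertices of equal color."""
    return all(color(s, p) != color(t, p)
               for s in product(range(p), repeat=d)
               for t in neighbors(s, p))

if __name__ == "__main__":
    for d, p in ((3, 3), (3, 4), (4, 5)):
        print(d, p, is_proper_coloring(d, p))
```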
Alternatively, this coloring can be built recursively. Begin with $H_p^1$ colored by its
vertex labeling. For $d \ge 2$, given p copies of $H_p^{d-1}$ each initially p-colored the same,
place the number a in front of each vertex label in the ath copy and twist the coloring
of each vertex in that copy by adding a modulo p. Formally, if v is a vertex label in
$H_p^{d-1}$, the twisted coloring on $H_p^d$ assigns to the vertex labeled av the color of v plus a,
reduced modulo p.
The reader can now verify that each type of edge in $H_p^d$ connects vertices of different
colors and also that we obtain the same coloring as before.
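The recursive construction can be written out in a few lines and compared with the digit-sum coloring. In this sketch labels are tuples $(s_d, \ldots, s_1)$ and `twisted_coloring` is a hypothetical name.

```python
def twisted_coloring(d, p):
    """Build the coloring recursively: prepend a to each label and add a to its color mod p."""
    coloring = {(a,): a for a in range(p)}        # the one-disk graph, colored by its labels
    for _ in range(2, d + 1):
        coloring = {(a,) + v: (c + a) % p
                    for a in range(p)
                    for v, c in coloring.items()}
    return coloring

if __name__ == "__main__":
    coloring = twisted_coloring(3, 4)
    # Every label should carry the sum of its digits modulo p.
    print(all(c == sum(v) % 4 for v, c in coloring.items()))
```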
Notice that, although the number of vertices and number of edges of the Hanoi
graphs each grow exponentially in the number of disks, the chromatic number is inde-
pendent of the number of disks.
Figure 10 $H_4^3$ with colored vertices
Another way to measure a graph is by its independence number, which is the maximum
number of non-adjacent vertices, usually called $\alpha(G)$. In the Hanoi graphs, the
$p^{d-1}$ vertices of a fixed color in a minimal coloring form an independent set and so
$\alpha(H_p^d) \ge p^{d-1}$. Conversely, any independent set may include at most one vertex from
each copy of $K_p$ corresponding to moving only the smallest disk. As there are $p^{d-1}$
copies, $\alpha(H_p^d) = p^{d-1}$.
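For very small cases the independence number can be found by exhaustive search and compared with $p^{d-1}$. The following sketch uses a simple branch-and-bound enumeration of independent sets; it is only practical for a handful of vertices, and the names are hypothetical.

```python
from itertools import product

def neighbors(s, p):
    """Yield the states one legal move away from s (s[i] = peg holding disk i + 1)."""
    for i, a in enumerate(s):
        smaller = s[:i]
        if a in smaller:
            continue
        for b in range(p):
            if b != a and b not in smaller:
                yield s[:i] + (b,) + s[i + 1:]

def independence_number(d, p):
    """Size of a largest independent set in the d-disk, p-peg Hanoi graph (small cases only)."""
    vertices = list(product(range(p), repeat=d))
    adjacent = {s: set(neighbors(s, p)) for s in vertices}
    best = 0

    def extend(candidates, size):
        nonlocal best
        best = max(best, size)
        for index, v in enumerate(candidates):
            # Keep only later candidates that are not adjacent to v.
            rest = [u for u in candidates[index + 1:] if u not in adjacent[v]]
            if size + 1 + len(rest) > best:       # prune branches that cannot beat the record
                extend(rest, size + 1)

    extend(vertices, 0)
    return best

if __name__ == "__main__":
    for d, p in ((1, 3), (2, 3), (3, 3), (2, 4)):
        print(d, p, independence_number(d, p), p ** (d - 1))
```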
Further investigation
While we understand much about the Hanoi graphs, there is much we still do not know.
Hinz and Parisse have calculated the chromatic index (edge-coloring number) of the
Hanoi graphs [8]. Any permutation of the peg numbers gives an automorphism of the
graph. Recently, So Eun Park has shown that these are the only automorphisms of
the graph: $\operatorname{Aut}(H_p^d) \cong S_p$ [14]. Most graph theoretic measures of the Hanoi graphs,
including the domination number, covering number, and pebbling numbers, are unknown.
Some of these quantities have been calculated for the Sierpiński graphs but not
the Hanoi graphs for more than three pegs [18].
We are particularly interested in the diameter: the maximum over all pairs of vertices
of the minimal length of a path connecting them. The minimum number of moves
needed to solve the Tower of Hanoi puzzle is bounded above by the diameter of the graph and
equal to the diameter in the classic 3-peg graph. The diameters of the multipeg graphs
are, in general, unknown and it is known that in some cases the diameter is larger than
the minimum number of moves. Thus it is not clear whether calculating the diameter is
more or less difficult than calculating the minimum number of moves needed to solve
the puzzle. Some results on the diameter of variants of the puzzle are known [1].
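For small graphs, both the diameter and the length of a shortest solution can be computed by breadth-first search. This is a sketch only, feasible for a few disks: states are tuples as in the earlier illustrations, the puzzle is taken to start with all disks on peg 0 and end with all disks on peg 1, and the function names are hypothetical.

```python
from collections import deque
from itertools import product

def neighbors(s, p):
    """Yield the states one legal move away from s (s[i] = peg holding disk i + 1)."""
    for i, a in enumerate(s):
        smaller = s[:i]
        if a in smaller:
            continue
        for b in range(p):
            if b != a and b not in smaller:
                yield s[:i] + (b,) + s[i + 1:]

def distances_from(source, p):
    """Breadth-first search: shortest path length from source to every reachable state."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for t in neighbors(s, p):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist

def diameter(d, p):
    """Largest shortest-path distance over all pairs of states."""
    return max(max(distances_from(s, p).values())
               for s in product(range(p), repeat=d))

def solve_length(d, p):
    """Minimum number of moves from all disks on peg 0 to all disks on peg 1."""
    return distances_from((0,) * d, p)[(1,) * d]

if __name__ == "__main__":
    for d in range(1, 5):
        print(d, diameter(d, 3), solve_length(d, 3), diameter(d, 4), solve_length(d, 4))
```

On three pegs the two columns coincide at $2^d - 1$, the classical move count; on four pegs the search simply reports whatever the small cases give.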
The 3-peg Hanoi graphs are planar: they can be drawn in the plane without any
edges crossing. Hinz and Parisse [7] prove that the only planar Hanoi graphs on more
than three pegs are $H_4^1$ and $H_4^2$. (We challenge the reader to draw $H_4^2$ without crossing.
If you try and are stuck, consider these possibly cryptic hints: View $K_4$ as if looking
at the top of a tetrahedron and do a little cat's cradle. In case you are still puzzled,
look for a representation of $H_4^2$ as a planar graph in the October 2010 issue of this
MAGAZINE.) For any nonplanar graph, it is natural to ask about the crossing number:
the minimum number of crossings needed to draw it in the plane. (Technically, a crossing
involves only two edges at a time.) Alternatively we might inquire whether there
are other surfaces on which the graph can be drawn without crossings; the genus of a
graph is the smallest genus of such a surface. The genus is no larger than the crossing
number, as one can add a bypass handle at each edge crossing, but efficiencies
often lead to a smaller genus. The genera of the complete graphs are known, but the
crossing numbers are not. Results on the crossing numbers of the related Sierpiński
graphs are given by Klavžar and Bojan Mohar [12]. The genera and crossing numbers
of nonplanar multidisk Hanoi graphs are unknown.
We offer one final direction for further investigation. Poole lists numerous variants
of the puzzle [15]. For example, in Straightline Hanoi on three pegs, we may only
move disks to and from the first peg. In Cyclic Hanoi the pegs are arranged in a
circle and we may only move disks counterclockwise. In Rainbow Hanoi the disks
are colored and various restrictions are placed on moves based on the color of the disks.
In Multidisk Hanoi there are multiple copies of each disk (either distinguishable or
not). Hinz claims that Lucas suggested the variation of allowing the disks to be out of
order at the start (larger disks on smaller ones), subject to the usual rules later in the
play. Still other variants allow a larger disk to sit on the next smallest disk, but not any
smaller disks than that. To our knowledge, very little about their graphs is known.
Acknowledgment We thank Paul Cull for introducing Danielle to these graphs at his Research Experience
for Undergraduates at Oregon State University in Summer 1999, Andreas Hinz for expert advice, and Matthew
Richey for help with the graphics and for cheering Suzanne on.
REFERENCES
1. Daniel Berend and Amir Sapir, The diameter of Hanoi graphs, Inform. Process. Lett. 98(2) (2006) 79–85. doi:10.1016/j.ipl.2005.12.004
2. Paul Cull and Ingrid Nelson, Error-correcting codes on the towers of Hanoi graphs, Discrete Math. 208/209(28) (1999) 157–175. doi:10.1016/S0012-365X(99)00070-9
3. Henry E. Dudeney, The Canterbury puzzles and other curious problems, E. P. Dutton, New York, 1908. [4th edition, Dover Publications, Mineola, NY, 1958.]
4. Andreas M. Hinz, The tower of Hanoi, Enseign. Math. (2) 35(2) (1989) 289–321.
5. Andreas M. Hinz, Pascal's triangle and the tower of Hanoi, Amer. Math. Monthly 99 (1992) 538–544. doi:10.2307/2324061
6. Andreas M. Hinz, Sandi Klavžar, Uroš Milutinović, Daniele Parisse, and Ciril Petr, Metric properties of the tower of Hanoi graphs and Stern's diatomic sequence, European J. Combin. 26(5) (2005) 693–708. doi:10.1016/j.ejc.2004.04.009
7. Andreas M. Hinz and Daniele Parisse, On the planarity of Hanoi graphs, Expo. Math. 20(3) (2002) 263–268.
8. Andreas M. Hinz and Daniele Parisse, Coloring Hanoi graphs, preprint, 2006.
9. M. Jakovac and Sandi Klavžar, Vertex-, edge-, and total-colorings of Sierpiński-like graphs, Discrete Math. 309(6) (2009) 1548–1556. doi:10.1016/j.disc.2008.02.026
10. Sandi Klavžar, Uroš Milutinović, and Ciril Petr, Combinatorics of topmost discs of multi-peg tower of Hanoi problem, Ars Combin. 59 (2001) 55–64.
11. Sandi Klavžar, Uroš Milutinović, and Ciril Petr, Hanoi graphs and some classical numbers, Expo. Math. 23(4) (2005) 371–378.
12. Sandi Klavžar and Bojan Mohar, Crossing number of Sierpiński-like graphs, J. Graph Theory 50(3) (2005) 186–198. doi:10.1002/jgt.20107
13. Xiaowu Lu, Tower of Hanoi graphs, Int. J. Comput. Math. 19 (1986) 23–38. doi:10.1080/00207168608803502
14. S. Eun Park, The group of symmetries of the Tower of Hanoi graph, Amer. Math. Monthly 117 (2010) 353–360. doi:10.4169/000298910X480829
15. David G. Poole, The towers and triangles of Professor Claus (or, Pascal knows Hanoi), Math. Mag. 67 (1994) 323–344.
16. R. S. Scorer, P. M. Grundy, and C. A. B. Smith, Some binary games, Math. Gaz. 28(280) (1944) 96–103. doi:10.2307/3606393
17. B. M. Stewart, Advanced problem 3918, Amer. Math. Monthly 46 (1939) 363–364. doi:10.2307/2302907
18. Alberto M. Teguia and Anant P. Godbole, Sierpiński gasket graphs and some of their properties, Australas. J. Combin. 35 (2006) 181–192.
Summary The Tower of Hanoi graphs make up a beautifully intricate and highly symmetric family of graphs
that show moves in the Tower of Hanoi puzzle played on three or more pegs. Although the size and order of these
graphs grow exponentially large as a function of the number of pegs, $p$, and disks, $d$ (there are $p^d$ vertices
and even more edges), their chromatic number remains remarkably simple. The interplay between the puzzles and the
graphs provides fertile ground for counts, alternative counts, and still more alternative counts.
DANIELLE ARETT graduated with a double major in Mathematics and English from Augsburg College in
2000. She now works for the Hartford Life Insurance Company in Fargo, North Dakota. In her free time, Danielle
enjoys writing prose, composing music, and playing piano and guitar. She first learned of the Tower of Hanoi
graphs in an REU at Oregon State University in the summer of 1999.
SUZANNE DORÉE earned her doctorate in mathematics at the University of Wisconsin. She has taught at
Augsburg College since 1989 where she adores working with students, from directing undergraduate research
projects in combinatorics, to helping mathematics majors develop their reasoning and speaking skills, to engaging
diverse learners in the developmental algebra course she developed. For fun, Suzanne enjoys playing bridge,
solving puzzles, interior design, and getting her hands dirty, literally, in the garden.
NOTES
When Is $n^2$ a Sum of $k$ Squares?
TODD G. WILL
University of Wisconsin–La Crosse
La Crosse, WI 54601
[email protected]
The square 169 can be written as a sum of two squares $5^2 + 12^2$, as a sum of three squares
$3^2 + 4^2 + 12^2$, as a sum of four squares $1^2 + 2^2 + 8^2 + 10^2$, as a sum of five squares
$1^2 + 2^2 + 2^2 + 4^2 + 12^2$, and so on for quite a long while. In fact, Jackson, et al. [5] note
that 169 can be written as a sum of $k$ positive squares for all $k$ from 1 to 155 and first fails as a
sum of length 156. The authors go on to ask whether there is any limit to such a string of sums.
Specifically, for every positive integer $b$ is there an integer $n$ which can be written as a sum of
$k$ positive squares for all $k$ from 1 to $b$? We assemble a collection of results, most of which have
been known for quite some time, to answer this question and, in fact, to specify all possible lengths
for sums of squares equal to a given square.
This investigation began when I read a manuscript in which the author proved that a certain
combinatorially defined integer $c(k)$ could be written as a sum of $k$ positive integer squares.
Although the proof technique was interesting, I wondered if it wouldn't be more surprising to find
that a sufficiently large integer couldn't be written as a sum of $k$ squares. For that reason, in
what follows we address the possible lengths for sums of squares equal to a given integer which may
or may not be a square.
Sums of 5 or more positive squares. Dickson [1] credits Dubouis with publishing the following
theorem in 1911. An integer $n \ge 34$ can be written as a sum of $k$ positive squares for all $k$
satisfying $5 \le k \le n$ except for $k = n - 13, n - 10, n - 7, n - 5, n - 4, n - 2, n - 1$.
Writing 20 years later, Pall [7] laments having duplicated Dubouis's work before noticing the report
of it but resists presenting his own proof. Writing over 75 years later still, I suspect that both
Dubouis's and Pall's proofs resembled the following.
First we show that no integer $n$ can be written as a sum of $k$ positive squares for
$k \in \{n - 13, n - 10, n - 7, n - 5, n - 4, n - 2, n - 1\}$. To see this note that the sum of $k$
positive squares $n = s_1^2 + \cdots + s_k^2$ can be obtained from the sum of $n$ ones by repeatedly
replacing $s_i^2$ of the ones with the single square $s_i^2$. This replacement reduces the number of
summands by $s_i^2 - 1$. For example, replacing four ones, $1 + 1 + 1 + 1$, with a single square
$2^2$ reduces the number of summands by 3. A replacement of $3^2$ ones reduces the number of summands
by 8 and larger squares reduce the number of summands by at least 15. A quick check shows that the
count of $n$ summands in the sum of all ones cannot be reduced by any of the amounts 1, 2, 4, 5, 7,
10, 13 using reductions of 3 and 8.
We now use induction to show that $n$ can be written as sums of the specified lengths, securing the
base case of $n = 34$ with a hand check. For $n > 34$ we add $1^2$ to each of the sums of squares
equal to $n - 1$ given by the induction hypothesis. This gives all of the required lengths of sums
for $n$ except for a length 5 sum.
Math. Mag. 83 (2010) 210–213. doi:10.4169/002557010X494850. © Mathematical Association of America
The proof is completed by showing that all $n > 34$ can be written as a sum of 5 positive squares.
A computer check (an additional hand check for Pall and Dubouis) verifies this for $34 < n \le 169$.
For $n > 169$ we use Lagrange's theorem, which states that every positive integer can be written as
a sum of four or fewer positive squares: write $n - 169$ as a sum of 1, 2, 3 or 4 positive squares,
then add the appropriate representation of 169 as the sum of 4, 3, 2, or 1 positive squares to
obtain five positive squares summing to $n$.
So, except for lengths of 2, 3, and 4, this result specifies all possible lengths for sums of squares
equal to a given square. In addition the result greatly simplifies the question in Jackson, et al.,
since if a square $n$ can be written as a sum of 2, 3, and 4 positive squares then $n$ can be written
as a sum of $k$ positive squares for all $1 \le k \le n - 14$.
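The theorem credited to Dubouis can be spot-checked by machine on a small range. The following is a minimal sketch, not the proof above: the function names and the upper limit of 200 are choices made here for illustration. It records, for every m up to the limit, the set of lengths k for which m is a sum of k positive squares (stored as bits of an integer), then compares the lengths of 5 or more with the stated list of exceptions.

```python
def length_masks(limit):
    """masks[m] has bit k set exactly when m is a sum of k positive squares."""
    masks = [0] * (limit + 1)
    masks[0] = 1  # the empty sum: zero squares add up to 0
    for m in range(1, limit + 1):
        s = 1
        while s * s <= m:
            # appending s^2 to a length-k sum for m - s^2 gives a length-(k+1) sum for m
            masks[m] |= masks[m - s * s] << 1
            s += 1
    return masks


def lengths(mask):
    """Decode a bitmask into the set of achievable lengths."""
    return {k for k in range(mask.bit_length()) if (mask >> k) & 1}


def dubouis_lengths(n):
    """Lengths k >= 5 predicted by the theorem for n >= 34."""
    gaps = {13, 10, 7, 5, 4, 2, 1}
    return set(range(5, n + 1)) - {n - g for g in gaps}


MASKS = length_masks(200)
mismatches = [n for n in range(34, 201)
              if {k for k in lengths(MASKS[n]) if k >= 5} != dubouis_lengths(n)]
print(mismatches)  # an empty list means the theorem's prediction matched on this range
```

The same table also shows why the theorem starts at 34: the entry for 33 has no sum of length 5.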
Sums of two positive squares. There seems to be some disagreement about when an integer can be
written as a sum of two positive squares. In the 1959 article [3] the condition is stated that the
integer must have the form $4^a n_1 n_2^2$, with integral $a \ge 0$, $n_1 > 1$, the prime factors of
$n_1$ congruent to 1 mod 4 and the prime factors of $n_2$ congruent to 3 mod 4. In the 2006 book [6]
the condition is the same except that $4^a$ is replaced with $2^e$, with $e$ a nonnegative integer.
In both sources the claims are said to follow easily from previous results, but proofs are not given.
However, neither of these conditions includes $18 = 2 \cdot 3^2 = 3^2 + 3^2$ since 18 has no $4k + 1$
prime factor. More generally the conditions exclude the numbers $n = m^2 + m^2$ where $m$ has no
$4k + 1$ prime factor. Perhaps the authors meant to describe conditions under which $n$ could be
written as a sum of two distinct positive squares.
In any case, the correct statement is that a positive integer $n$ can be written as the sum of two
positive squares if and only if either $n$ is twice a square or $n$ has at least one $4k + 1$ prime
factor and all of its $4k + 3$ prime factors appear to even powers.
This fact follows easily from the much deeper theory for computing $r_k(n)$, which is defined to be
the number of ways of writing $n$ as a sum of $k$ integer squares. In computing $r_k(n)$ the squares
of both positive and negative integers as well as $0^2$ are allowed and permutations of addends are
counted as distinct sums. So, for example, $r_2(9) = 4$ since
$9 = 0^2 + (\pm 3)^2 = (\pm 3)^2 + 0^2$ are the four ways to express 9 as the sum of two integer
squares.
Let $n = 2^k \prod p_i^{a_i} \prod q_j^{b_j}$ be the prime factorization of $n$ with the $p_i$ and
$q_j$ being the primes congruent to 1 and 3 mod 4, respectively. Gauss showed that if any of the
$b_j$ are odd then $r_2(n) = 0$ and otherwise $r_2(n) = 4 \prod (1 + a_i)$. So for example, since
$n = 9$ has no $4k + 3$ primes to an odd power, and all $4k + 1$ primes occur to the zero power,
$r_2(9) = 4(1 + 0) = 4$ as counted above.
Now assume that $n = a^2 + b^2$ is the sum of two positive squares. Either $n$ is twice a square or
$a \ne b$, in which case $n = (\pm a)^2 + (\pm b)^2 = (\pm b)^2 + (\pm a)^2$ shows that
$r_2(n) \ge 8$. From this it follows that all $4k + 3$ primes appear to even powers and there is at
least one $4k + 1$ prime. Conversely, if $n = 2k^2$, then clearly $n$ is the sum of two nonzero
squares. If, on the other hand, all $4k + 3$ primes appear to even power and there is at least one
$4k + 1$ prime, then $r_2(n) \ge 8$. Since at most 4 of these sums can use $0^2$, there must be a
sum with two positive squares.
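The corrected criterion is easy to compare against a direct search. The sketch below is an illustration only: the function names and the search limit are assumptions made here, and trial division stands in for any serious factoring method.

```python
from math import isqrt


def two_positive_squares(n):
    """Direct search: is n = a^2 + b^2 with a, b >= 1?"""
    for a in range(1, isqrt(n - 1) + 1 if n > 1 else 1):
        b2 = n - a * a
        if isqrt(b2) ** 2 == b2:
            return True
    return False


def criterion(n):
    """Twice a square, or a 4k+1 prime factor with every 4k+3 prime to an even power."""
    if n % 2 == 0 and isqrt(n // 2) ** 2 == n // 2:
        return True
    has_1_mod_4 = False
    m, p = n, 2
    while p * p <= m:
        e = 0
        while m % p == 0:       # only primes can divide m here
            m //= p
            e += 1
        if e and p % 4 == 1:
            has_1_mod_4 = True
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    if m > 1:                   # one leftover prime factor, to the first power
        if m % 4 == 3:
            return False
        if m % 4 == 1:
            has_1_mod_4 = True
    return has_1_mod_4


disagreements = [n for n in range(1, 3001) if two_positive_squares(n) != criterion(n)]
print(disagreements)
```

The case 18 from the text is covered by the "twice a square" branch, which is exactly the branch the two cited conditions leave out.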
Sums of three positive squares. When an integer can be written as a sum of three positive squares
has not quite been pinned down. Legendre showed that numbers of the form $4^h(8k + 7)$ are those
which cannot be written as the sum of three or fewer positive squares. But this left open the set of
numbers which cannot be written as a sum of three positive squares but can be written as a sum of
one or two. In 1959 Grosswald, et al., [3] proved that there exists a finite set of integers $S$ such
that $n$ is not the sum of three positive squares if and only if $n = 4^h q$ where
$q \equiv 7 \bmod 8$ or $q$ is an element of the finite set $S$. They conjectured that
$S = \{1, 2, 5, 10, 13, 25, 37, 58, 85, 130\}$ but their proof showed only that the set $S$ is finite.
Despite this disappointment, it is known which squares are sums of three positive squares. Hurwitz
[4] proved that with the exception of $(2^k)^2$ and $(5 \cdot 2^k)^2$, every positive square can be
written as a sum of three positive squares. Fraser and Gordon later gave an elementary proof of this
fact in [2].
As a digression, note that Hurwitz's result shows that the set $S$ contains no squares other than 1
and 25. So, in considering whether there might be additional numbers in $S$, we need only consider
nonsquares. If $n$ is not a square, then for $n = a^2 + b^2$ neither $a$ nor $b$ is zero and so the
orderings in the three sums $0^2 + a^2 + b^2$, $a^2 + 0^2 + b^2$, $a^2 + b^2 + 0^2$ are distinct. If
$n$ cannot be written as a sum of three positive squares, then all sums of three squares equal to
$n$ must have one of these three forms. Thus if $n$ is not a square, then $n$ cannot be written as a
sum of three positive squares if and only if $r_3(n) = 3 r_2(n)$. In three hours, a laptop search
using Mathematica's built-in SquaresR function verified that the conjectured values for $S$ are
correct for $n \le 5 \cdot 10^6$.
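A much smaller version of that search can be reproduced without Mathematica. The sketch below does not use the $r_3(n) = 3 r_2(n)$ test or SquaresR; it simply enumerates sums of three positive squares up to a limit chosen here (3000) and compares the misses with the conjectured description $4^h q$. All names are illustrative.

```python
from math import isqrt

S = {1, 2, 5, 10, 13, 25, 37, 58, 85, 130}  # the conjectured finite set from the text


def conjectured_exception(n):
    """True when n = 4^h * q with q congruent to 7 mod 8 or q in S."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7 or n in S


def three_positive_squares(limit):
    """hit[n] is True when n = a^2 + b^2 + c^2 with 1 <= a <= b <= c."""
    hit = [False] * (limit + 1)
    top = isqrt(limit)
    for a in range(1, top + 1):
        for b in range(a, top + 1):
            for c in range(b, top + 1):
                t = a * a + b * b + c * c
                if t > limit:
                    break
                hit[t] = True
    return hit


LIMIT = 3000
hit = three_positive_squares(LIMIT)
counterexamples = [n for n in range(1, LIMIT + 1)
                   if hit[n] == conjectured_exception(n)]
print(counterexamples)
```

An empty list here only confirms the conjecture on this short range; it says nothing beyond it.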
Sums of four positive squares. In [6], Pall is credited with showing that $n$ can be written as a
sum of four positive squares if and only if $n$ is not one of $\{1, 3, 5, 9, 11, 17, 29, 41\}$ or of
the form $2 \cdot 4^k$, $6 \cdot 4^k$, $14 \cdot 4^k$. In a footnote of the cited work [7], Pall says
that "the reader will have no difficulty in proving [this result] by using the following classical
result, which was first stated by Fermat, and was first proved by Legendre in 1798. A positive
integer is a sum of three [or fewer positive] squares if and only if it is not of the form
$4^h(8k + 7)$." With such a challenge I picked up my pen and searched for the proof. Minutes ticked
away to hours with my ego sinking all the while. I eventually did hit upon the following proof,
similar to the one I later found in [8].
First note that $4^h(8k + 7) \equiv 0, 4, 7 \bmod 8$. If $n \equiv 2, 3, 4, 6, 7 \bmod 8$, then
$n - 13^2 \equiv 1, 2, 3, 5, 6 \bmod 8$ and so $n - 13^2$ is not of the form $4^h(8k + 7)$. Thus for
$n > 13^2$, Legendre's result shows that $n - 13^2$ can be written as a sum of three or fewer
positive squares. Augment this sum with the appropriate choice from among
$13^2 = 5^2 + 12^2 = 3^2 + 4^2 + 12^2$ to obtain four positive squares summing to $n$. For
$n \le 13^2$, a computer check finds that $\{2, 3, 6, 11, 14\}$ are the only integers in these
congruence classes which cannot be written as a sum of four positive squares.
If $n \equiv 1, 5 \bmod 8$, then $n - 26^2 \equiv 5, 1 \bmod 8$. So for $n > 26^2$, $n - 26^2$ can be
written as a sum of three or fewer positive squares. Augment this sum with the appropriate choice
from among $26^2 = 10^2 + 24^2 = 6^2 + 8^2 + 24^2$ to obtain four positive squares summing to $n$.
For $n \le 26^2$, a computer check finds that $\{1, 5, 9, 17, 29, 41\}$ are the only integers in these
congruence classes which cannot be written as a sum of four positive squares.
If $n \equiv 0 \bmod 8$, consideration mod 8 shows that $n$ is a sum of four positive squares if and
only if $n/4$ is. Repeated applications of this observation allow $n$ to be written as
$4^a \cdot 2j$ where $2j \not\equiv 0 \bmod 8$ and $n$ is a sum of four positive squares if and only
if $2j$ is. Previous cases show that $2j \not\equiv 0 \bmod 8$ is not a sum of four positive squares
only for $2j = 2, 6, 14$.
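The statement credited to Pall can be checked the same way on a short range. This is a sketch under assumptions made here (function names, a limit of 3000); it tracks only lengths up to 4, so each table entry fits in five bits.

```python
def four_positive_squares(limit):
    """result[n] is True when n is a sum of exactly four positive squares."""
    ways = [0] * (limit + 1)   # bit k of ways[m]: m is a sum of k positive squares, k <= 4
    ways[0] = 1
    for m in range(1, limit + 1):
        s = 1
        while s * s <= m:
            ways[m] |= (ways[m - s * s] << 1) & 0b11111
            s += 1
    return [bool(w & 0b10000) for w in ways]


def pall_exception(n):
    """True for the sporadic values and for 2*4^k, 6*4^k, 14*4^k."""
    if n in {1, 3, 5, 9, 11, 17, 29, 41}:
        return True
    while n % 4 == 0:
        n //= 4
    return n in (2, 6, 14)


LIMIT = 3000
four = four_positive_squares(LIMIT)
bad = [n for n in range(1, LIMIT + 1) if four[n] == pall_exception(n)]
print(bad)
```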
Conclusion What, then, are the possible lengths for sums of squares equal to a given
positive square?
The possible lengths of 5 and higher are specified by Dubouis's result for squares 36 and above. A
direct check shows that the same result holds for 16 and 25 and that the possible sum lengths for 9
are 1, 3, 6, 9. Since a square cannot also be twice a square, the squares which can be written as a
sum of two positive squares are those with a prime factor congruent to 1 mod 4. We see that among
positive squares, $(2^k)^2$ and $(5 \cdot 2^k)^2$ are the only ones which cannot be written as a sum
of three positive squares and that 1 and 9 are the only ones which cannot be written as a sum of four
positive squares.
Combining these conditions, we learn that with the exception of $(5 \cdot 2^k)^2$, a square can be
written as sums of 2, 3, and 4 positive squares if and only if it has at least one prime factor
congruent to 1 mod 4. Moreover such a square $n$ can be written as a sum of $k$ positive squares for
all $k$ from 1 to $n - 14$.
The first few squares meeting the combined conditions are 169, 225, 289, 625, 676, 841, 900. Going
out a little farther we find $n = 1\,000\,002\,000\,001 = (101 \cdot 9901)^2$ with 101 being a prime
congruent to 1 mod 4. So this square can be written as a sum of $k$ positive squares for all $k$
from 1 to 1 000 001 999 987, making 169's run of 155 look not so special after all.
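The list of small squares and the run of 155 for 169 can both be recovered by a short computation. The following sketch is illustrative (the helper name and the cutoff $30^2 = 900$ are choices made here): it tabulates all achievable lengths for every integer up to 900 and then filters the squares.

```python
def length_masks(limit):
    """masks[m] has bit k set exactly when m is a sum of k positive squares."""
    masks = [0] * (limit + 1)
    masks[0] = 1
    for m in range(1, limit + 1):
        s = 1
        while s * s <= m:
            masks[m] |= masks[m - s * s] << 1
            s += 1
    return masks


MASKS = length_masks(900)

# squares that are sums of 2, 3, and 4 positive squares
good = [m * m for m in range(1, 31)
        if all((MASKS[m * m] >> k) & 1 for k in (2, 3, 4))]

# length of the unbroken run 1, 2, 3, ... of achievable lengths for 169
first_gap = next(k for k in range(1, 171) if not (MASKS[169] >> k) & 1)
run_169 = first_gap - 1

# for each such square n, confirm that every length from 1 to n - 14 occurs
full_runs = all((MASKS[n] >> k) & 1 for n in good for k in range(1, n - 13))
print(good, run_169, full_runs)
```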
REFERENCES
1. Leonard Eugene Dickson, History of the Theory of Numbers, Vol. II: Diophantine Analysis, Dover, New York,
2005.
2. Owen Fraser and Basil Gordon, On representing a square as the sum of three squares, Amer. Math. Monthly
76 (1969) 922–923. doi:10.2307/2317949
3. E. Grosswald, A. Calloway, and J. Calloway, The representation of integers by three positive squares, Proc.
Amer. Math. Soc. 10 (1959) 451–455. doi:10.2307/2032865
4. Adolf Hurwitz, Mathematische Werke. Bd. II: Zahlentheorie, Algebra und Geometrie, Birkhäuser Verlag,
Basel, 1963.
5. Kelly Jackson, Francis Masat, and Robert Mitchell, Extensions of a sums-of-squares problem, Math. Mag. 66
(1993) 41–43.
6. Carlos J. Moreno and Samuel S. Wagstaff, Jr., Sums of Squares of Integers, Chapman & Hall/CRC, Boca
Raton, FL, 2006.
7. Gordon Pall, On sums of squares, Amer. Math. Monthly 40 (1933) 10–18. doi:10.2307/2301257
8. Don Redmond, Number Theory: An Introduction, Marcel Dekker, New York, 1996.
Summary This note shows that with the exception of $(5 \cdot 2^k)^2$, an integer square can be written as sums of
2, 3, and 4 positive squares if and only if it has at least one prime factor congruent to 1 mod 4. Moreover such
a square $n$ can be written as a sum of $k$ positive squares for all $k$ from 1 to $n - 14$. The question of when a
non-square can be written as a sum of $k$ positive squares is also examined.
How Fast Will We Lose?
RON HIRSHON
College of Staten Island
Staten Island, NY 10314
[email protected]
Two players X and Y play a gambling game. They start with bankrolls of $x$ and $y$ dollars
respectively, where $x$ and $y$ are positive integers and $(x, y) \ne (1, 1)$. They repeatedly flip
a coin, which may be a fair or unfair coin. When heads appears, X wins and receives one dollar from
Y; when tails appears, X loses and pays one dollar to Y. The game continues until one player runs
out of money. Let $L$ be the event that X loses the match; that is, that it is X who ends the game
with a zero balance.
Math. Mag. 83 (2010) 213–218. doi:10.4169/002557010X494869. © Mathematical Association of America
We assume that the flips are independent. We write $p$ for the probability that X wins a given flip,
and we always write $q$ for $1 - p$. Then the probability that X loses is
$$\Pr(L) = q^x\,\frac{p^y - q^y}{p^{x+y} - q^{x+y}} \quad (p \ne q); \qquad \Pr(L) = \frac{y}{x + y} \quad \left(p = q = \tfrac{1}{2}\right). \tag{1}$$
This is a well-known formula. Our gambling game is called "gambler's ruin," and can also be
described as a random walk on the integers with two absorbing barriers. A classical reference is
Feller [1], chapters III and XIV; see especially equations (3.4) and (3.5) in section XIV.3. The
theory goes back over 300 years, and early investigators include Huygens, DeMoivre, Montmort, and
two Bernoullis. A good source, both for history and results, is Takács [4]. Formula (1) is also used
in [3], for which this paper is a sequel.
In this paper, we study the probability of the event $L_n$ that X loses in exactly $n$ flips.
DeMoivre calculated this probability in 1718, but his formula was quite complicated; see [4],
equations (13) and (12). Our goal is to give a simple method for finding these probabilities. As
explained in the last section of [3], this will involve the parallel goal of counting the number
$c_n = c_n(x, y)$ of different sequences of H and T of length $n$ that lead to losing in exactly $n$
flips. Also, given that X loses, we determine the expected time it will take to lose.
For X to go broke, X must lose $x$ more coin flips than X wins. Thus, for some integer $k \ge 0$,
the sequence consists of $x + k$ tails and $k$ heads. The probability of each such sequence is
$q^{x+k} p^k$, and the number of such sequences is $c_{x+2k}$. Thus
$\Pr(L_{x+2k}) = c_{x+2k}\, q^{x+k} p^k$. If $n$ is not of the form $x + 2k$, then $\Pr(L_n) = 0$.
Therefore
$$\Pr(L) = \sum_{k=0}^{\infty} c_{x+2k}\, q^{x+k} p^k. \tag{2}$$
As noted in [3], the numbers $c_{x+2k}$ are the coefficients for the power series of a certain
rational function $g = g_{x,y}$. This means that $g$ is a generating function for the sequence
$\{c_{x+2k}\}$, $k = 0, 1, 2, \ldots$.
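First we rewrite equation (1) using $S_n = p^{n-1} + p^{n-2} q + \cdots + p q^{n-2} + q^{n-1}$,
which is positive for $0 \le p \le 1$. Observe that
$$S_n = \frac{p^n - q^n}{p - q} \quad (p \ne q) \qquad \text{and} \qquad S_n = \frac{n}{2^{n-1}} \quad \left(p = q = \tfrac{1}{2}\right). \tag{3}$$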
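It follows that
$$\Pr(L) = q^x\,\frac{S_y}{S_{x+y}} \quad \text{for } 0 \le p \le 1; \tag{4}$$
to see this, for $p \ne q$ divide the numerator and denominator in (1) by $p - q$, and for
$p = \frac{1}{2}$, note that
$$q^x\,\frac{S_y}{S_{x+y}} = \frac{1}{2^x} \cdot \frac{y}{2^{y-1}} \cdot \frac{2^{x+y-1}}{x + y} = \frac{y}{x + y}.$$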
LEMMA. The expression $S_n$ may be expressed as a polynomial in $u = pq$ with integer coefficients.
Proof. For $n = 1$ or $n = 2$, (3) reduces to 1 so that $S_1 = S_2 = 1$. Since
$$p^{n+1} - q^{n+1} = p^n p - q^n q = p^n (1 - q) - q^n (1 - p) = (p^n - q^n) - pq\,(p^{n-1} - q^{n-1}),$$
for $p \ne q$ and $n \ge 2$ we have from (3) that
$$S_{n+1} = S_n - pq\,S_{n-1} = S_n - u\,S_{n-1}. \tag{5}$$
This identity also holds for $p = \frac{1}{2}$, which can be verified directly or by using a
continuity argument. The lemma follows by induction.
Iterating (5), we obtain the sample calculations summarized in TABLE 1.
TABLE 1
$S_3$        $S_4$         $S_5$               $S_6$
$1 - u$      $1 - 2u$      $1 - 3u + u^2$      $1 - 4u + 3u^2$
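Recurrence (5) is simple to iterate by machine. In the sketch below (the function name and the coefficient-list representation are choices made here), a polynomial in $u$ is stored as a list of integer coefficients, lowest degree first, so that TABLE 1 can be regenerated and extended.

```python
def s_poly(n):
    """Coefficients of S_n as a polynomial in u = pq, lowest degree first."""
    prev, cur = [1], [1]                      # S_1 and S_2
    for _ in range(n - 2):
        # S_{m+1} = S_m - u * S_{m-1}: pad S_m, then subtract S_{m-1} shifted by one degree
        nxt = cur + [0] * (len(prev) + 1 - len(cur))
        for i, c in enumerate(prev):
            nxt[i + 1] -= c
        prev, cur = cur, nxt
    return cur


for n in range(3, 8):
    print(n, s_poly(n))
```

As a cross-check against the definition, `s_poly(5)` evaluated at $u = pq$ should agree with $p^4 + p^3 q + p^2 q^2 + p q^3 + q^4$ for any $p$ with $q = 1 - p$.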
Set the expressions in (2) and (4) for $\Pr(L)$ equal and cancel $q^x$ from both sides of the
resulting identity. Setting $u = pq$, we obtain the identity
$$\sum_{k=0}^{\infty} c_{x+2k}\, u^k = \frac{S_y}{S_{x+y}}. \tag{6}$$
We write $g(u) = g_{x,y}(u)$ for the rational function $S_y / S_{x+y}$. From (6) and (4), we have
$$g(u) = \sum_{k=0}^{\infty} c_{x+2k}\, u^k \qquad \text{and} \qquad \Pr(L) = q^x g(u). \tag{7}$$
We call $g$ the loss function of X for the parameters $x$ and $y$ (in the variable $u$), and we call
the coefficients of the Maclaurin expansion in (7) the loss sequence of X for these parameters. To
repeat, the first term in the loss sequence is always $c_x = 1$.
THEOREM. Given the loss sequence $c_{x+2k}$, we have
$$\Pr(L_{x+2k}) = c_{x+2k}\, q^{x+k} p^k \quad \text{for integers } k \ge 0. \tag{8}$$
Similarly, there is a win sequence $d_{y+2k}$ for X, based on Y's loss function $g_{y,x}$, so that
X's probability of winning in exactly $y + 2k$ steps is $d_{y+2k}\, p^{y+k} q^k$ (a win for X in
$y + 2k$ steps consists of $y + k$ heads and $k$ tails).
Note that, in the beginning, we had equation (2) but we did not know the coefficients. Equation (7)
shows that the power series represents a rational function $g$. Now by direct means, we can obtain
the rational function, then its power series, and then easily read off as many coefficients as we
like. This is valid because of the uniqueness theorem for power series: If two power series agree on
an interval, then their coefficients are equal.
To illustrate the Theorem, see TABLE 2. For example, from the $(x, y) = (4, 2)$ line, we conclude
that $\Pr(L_4) = q^4$, $\Pr(L_6) = 4 q^5 p$, $\Pr(L_8) = 13 q^6 p^2$, $\Pr(L_{10}) = 40 q^7 p^3$,
etc. Note also that the number of ways of losing in 22 flips is 29,524.
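The counts quoted for $(x, y) = (4, 2)$ can also be obtained by brute force, without any generating function. The sketch below is an independent check rather than the method of this paper; the function name and the dictionary output are choices made here. It follows X's bankroll flip by flip and counts the sequences that first reach zero at each step.

```python
def loss_counts(x, y, max_flips):
    """counts[n] = number of H/T sequences after which X is first ruined at flip n."""
    total = x + y
    paths = [0] * (total + 1)    # paths[s]: live sequences leaving X with s dollars
    paths[x] = 1
    counts = {}
    for n in range(1, max_flips + 1):
        nxt = [0] * (total + 1)
        for s in range(1, total):
            nxt[s - 1] += paths[s]   # tails: X pays a dollar
            nxt[s + 1] += paths[s]   # heads: X receives a dollar
        if nxt[0]:
            counts[n] = nxt[0]
        nxt[0] = nxt[total] = 0      # both barriers absorb, so these games are over
        paths = nxt
    return counts


print(loss_counts(4, 2, 22))
```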
TABLE 2
$x$  $y$  Loss function $g(u)$           Loss sequence (first ten terms)
1    2    $1/(1 - u)$                    1, 1, 1, 1, 1, 1, 1, 1, 1, 1
1    3    $(1 - u)/(1 - 2u)$             1, 1, 2, $2^2$, $2^3$, $2^4$, $2^5$, $2^6$, $2^7$, $2^8$
1    4    $(1 - 2u)/(1 - 3u + u^2)$      1, 1, 2, 5, 13, 34, 89, 233, 610, 1597
2    4    $(1 - 2u)/(1 - 4u + 3u^2)$     1, 2, 5, 14, 41, 122, 365, 1094, 3281, 9842
5    1    $1/(1 - 4u + 3u^2)$            1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524
4    2    $1/(1 - 4u + 3u^2)$            1, 4, 13, 40, 121, 364, 1093, 3280, 9841, 29524
3    3    $1/(1 - 3u)$                   1, 3, $3^2$, $3^3$, $3^4$, $3^5$, $3^6$, $3^7$, $3^8$, $3^9$
The loss functions in TABLE 2 were obtained using equation (6) and the results in TABLE 1. Most of
the loss sequences in TABLE 2 can be verified by rewriting the loss function using partial fractions
and then using the expansion $\frac{1}{1 - w} = \sum_{k=0}^{\infty} w^k$. For example, for
$(x, y) = (1, 3)$, we obtain
$$g(u) = \frac{1 - u}{1 - 2u} = 1 + \frac{u}{1 - 2u} = 1 + \sum_{k=1}^{\infty} 2^{k-1} u^k,$$
which explains the powers of 2 in the loss sequence. The relationship
$g_{2,4}(u) = u\, g_{4,2}(u) + \frac{1}{1 - u}$ explains why the loss sequences in lines (4, 2) and
(2, 4) look similar.
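Instead of partial fractions, the loss sequence can be read off by dividing the two polynomials as power series. The following sketch assumes the coefficient-list representation introduced here for illustration; because the denominator $S_{x+y}$ has constant term 1, the division stays within the integers.

```python
def s_poly(n):
    """Coefficients of S_n in u (lowest degree first), from S_{m+1} = S_m - u S_{m-1}."""
    prev, cur = [1], [1]
    for _ in range(n - 2):
        nxt = cur + [0] * (len(prev) + 1 - len(cur))
        for i, c in enumerate(prev):
            nxt[i + 1] -= c
        prev, cur = cur, nxt
    return cur


def loss_sequence(x, y, terms):
    """First `terms` Maclaurin coefficients of g(u) = S_y / S_{x+y}."""
    num, den = s_poly(y), s_poly(x + y)
    coeffs = []
    for k in range(terms):
        c = num[k] if k < len(num) else 0
        # match the u^k coefficient in den(u) * g(u) = num(u); den[0] == 1
        for i in range(1, min(k, len(den) - 1) + 1):
            c -= den[i] * coeffs[k - i]
        coeffs.append(c)
    return coeffs


for pair in [(1, 2), (1, 3), (1, 4), (2, 4), (5, 1), (4, 2), (3, 3)]:
    print(pair, loss_sequence(*pair, 10))
```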
The sequence for $(x, y) = (1, 4)$ in TABLE 2 no doubt looks familiar. In fact, it is
$1, f_1, f_3, f_5, \ldots$ where $f_n$ is the Fibonacci sequence ($f_1 = f_2 = 1$, $f_3 = 2$,
$f_4 = 3$, $f_5 = 5, \ldots$). To see this, we note that
$$f_1 + f_2 z + f_3 z^2 + f_4 z^3 + f_5 z^4 + \cdots = \frac{1}{1 - z - z^2};$$
see, for example, formulas (6.116) and (6.117) in [2]. Also
$$f_1 - f_2 z + f_3 z^2 - f_4 z^3 + f_5 z^4 - \cdots = \frac{1}{1 + z - z^2}.$$
Adding and dividing by 2, we obtain
$$f_1 + f_3 z^2 + f_5 z^4 + \cdots = \frac{1 - z^2}{(1 - z^2)^2 - z^2}.$$
Replacing $z^2$ by $u$, we get for $0 < u < 1$,
$$f_1 + f_3 u + f_5 u^2 + \cdots = \frac{1 - u}{1 - 3u + u^2},$$
and so
$$1 + f_1 u + f_3 u^2 + f_5 u^3 + \cdots = 1 + u\,\frac{1 - u}{1 - 3u + u^2} = \frac{1 - 2u}{1 - 3u + u^2}.$$
The rational function on the right is the loss function $g(u)$ in TABLE 2 for $(x, y) = (1, 4)$, and
we now see why the corresponding loss sequence consists of Fibonacci numbers.
We return to the power series in (7). $\Pr(L)$ is defined for all $p$ between 0 and 1 inclusive.
Hence, from (7), $\sum_{k=0}^{\infty} c_{x+2k} u^k$ converges for $p = \frac{1}{2}$ or
$u = \frac{1}{4}$, so the radius of convergence $R$ of the Maclaurin series of any loss function,
with $(x, y) \ne (1, 1)$, obeys $\frac{1}{4} \le R < 1$. If we set $u = \frac{1}{4}$ and
$p = q = \frac{1}{2}$ in (7), and if we use the second equation of (1), we obtain the following
useful relation for the loss sequence:
$$\sum_{k=0}^{\infty} c_{x+2k} \left(\tfrac{1}{4}\right)^k = g\!\left(\tfrac{1}{4}\right) = 2^x \Pr(L) = \frac{2^x y}{x + y}. \tag{9}$$
This shows that, given the value of $x$ and the loss sequence of X, the value of $y$ is uniquely
determined. As an example, suppose the loss sequence is one whose general term is $3^k$, $k \ge 0$.
If $x = 3$, then by using (9), $y$ is determined by the equations
$\sum_{k=0}^{\infty} \left(\frac{3}{4}\right)^k = 4 = \frac{8y}{3 + y}$, which has unique solution
$y = 3$.
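Relation (9) can be confirmed exactly for small bankrolls by evaluating $S_y$ and $S_{x+y}$ at $u = \frac{1}{4}$ with rational arithmetic. This is a sketch with names chosen here; it iterates recurrence (5) on values rather than on polynomials.

```python
from fractions import Fraction


def s_value(n, u):
    """Value of S_n at the given u, from S_1 = S_2 = 1 and S_{m+1} = S_m - u S_{m-1}."""
    prev = cur = Fraction(1)
    for _ in range(n - 2):
        prev, cur = cur, cur - u * prev
    return cur


def g_at_quarter(x, y):
    """Loss function g = S_y / S_{x+y} evaluated at u = 1/4."""
    u = Fraction(1, 4)
    return s_value(y, u) / s_value(x + y, u)


print(g_at_quarter(3, 3), Fraction(2 ** 3 * 3, 3 + 3))
```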
Different pairs $(x, y)$ may yield the same loss function. For example, $(x, y) = (n, 1)$ yields the
same loss function as $(x, y) = (n - 1, 2)$. In each case, the common loss function is
$1/S_{n+1}$. However, one can never find three distinct pairs $(x, y)$ that have the same loss
function. To see this, note that if for $n > 1$, we arrange the powers of $u$ in the expansion of
$S_n$ in ascending order as in TABLE 1, the first two terms in this expansion will be
$$1 - (n - 2)u. \tag{10}$$
This is easily proved by induction using the defining relation $S_{n+1} = S_n - u S_{n-1}$.
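Now suppose that two pairs $(x, y)$ and $(x', y')$ yield the same loss function. Then
$S_y / S_{x+y} = S_{y'} / S_{x'+y'}$ and
$$S_y\, S_{x'+y'} = S_{x+y}\, S_{y'}. \tag{11}$$
First suppose that both $y$ and $y'$ are greater than 1. By (10), the terms of degree at most 1 on
the two sides of (11) are those of
$$[1 - (y - 2)u][1 - (x' + y' - 2)u] = [1 - (x + y - 2)u][1 - (y' - 2)u].$$
Equating coefficients of $u$, we find that $x = x'$. But then $y = y'$ by the statement following
equation (9).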
Now suppose that $y = 1$, so we are investigating the case when $(x, 1)$ and $(x', y')$ yield the
same loss function. Then $S_1 / S_{x+1} = S_{y'} / S_{x'+y'}$ and $S_{x'+y'} = S_{x+1} S_{y'}$. The
same analysis as in the last paragraph leads to $x' = x$ if $y' = 1$, and $x' = x - 1$ if $y' > 1$.
So there cannot be three distinct pairs $(x, 1)$, $(x - 1, y')$, $(x - 1, y'')$ with the same loss
function because, as we noted after equation (9), the $y$ value is uniquely determined by the $x$
value and the loss sequence. Thus only one $y$ value can go with $x - 1$ and $y' = y''$.
Finally, here are two questions that come to mind.
QUESTION 1. Can an infinite number of the loss functions have a common root?
QUESTION 2. Our main ideas are actually "probability free" in their definition.
Can one give, in a manner as simple as ours, a method of determining the loss function
for any $(x, y)$ without referring to the probability result (1)?
Average time to lose. As promised, we compute the expected time it will take to lose, given that we
lose. If $T$ represents the number of flips before losing, then we want the conditional expectation
$E(T \mid L)$ and this equals
$$\frac{1}{\Pr(L)} \sum_{k=0}^{\infty} (x + 2k)\, c_{x+2k}\, q^{x+k} p^k = \frac{x q^x g(u) + 2 q^x u\, g'(u)}{q^x g(u)} = x + 2u\,\frac{g'(u)}{g(u)}.$$
For $(x, y) = (4, 2)$, we have $g'(u)/g(u) = \frac{4 - 6u}{3u^2 - 4u + 1}$, so the expected number
of flips is
$$4 + 2pq\,\frac{4 - 6pq}{3p^2 q^2 - 4pq + 1}.$$
For $p = q = \frac{1}{2}$ and $u = \frac{1}{4}$, the expected time to lose is 32/3.
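The closed form for $(x, y) = (4, 2)$ can be compared with a direct numerical computation of $E(T \mid L)$. The sketch below is an illustration with names and a truncation length chosen here: it propagates the probability of each bankroll flip by flip and accumulates $n \Pr(L_n)$, stopping after a fixed number of flips, by which point the remaining probability mass is negligible for these small bankrolls.

```python
def expected_loss_time(x, y, p, steps=2000):
    """Approximate E(T | L) by summing n * Pr(L_n) over the first `steps` flips."""
    q = 1 - p
    total = x + y
    prob = [0.0] * (total + 1)   # prob[s]: chance the game is live with X holding s dollars
    prob[x] = 1.0
    lose = weighted = 0.0
    for n in range(1, steps + 1):
        nxt = [0.0] * (total + 1)
        for s in range(1, total):
            nxt[s + 1] += prob[s] * p   # heads
            nxt[s - 1] += prob[s] * q   # tails
        lose += nxt[0]                  # Pr(L_n)
        weighted += n * nxt[0]
        nxt[0] = nxt[total] = 0.0       # absorbed games stop
        prob = nxt
    return weighted / lose


def closed_form_4_2(p):
    """x + 2u g'(u)/g(u) for (x, y) = (4, 2), with u = pq."""
    u = p * (1 - p)
    return 4 + 2 * u * (4 - 6 * u) / (3 * u * u - 4 * u + 1)


print(expected_loss_time(4, 2, 0.5), closed_form_4_2(0.5))
```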
Acknowledgment The author would like to express his thanks to Emeric Deutsch for reading several versions
of this paper and for general advice. Special thanks are due to Ken Ross, Associate Editor, for a great deal of
improvement of this paper, mathematically, historically, and stylistically.
REFERENCES
1. W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed., John Wiley, 1968.
2. R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science,
Addison-Wesley, 1989.
3. R. Hirshon and R. De Simone, An offer you can't refuse, Mathematics Magazine 81 (2008) 146–152.
4. L. Takács, On the classical ruin problems, J. American Statistical Association 64 (1969) 889–906.
doi:10.2307/2283470
Summary In a version of "gambler's ruin," players start with $x$ and $y$ dollars respectively, and flip coins for
one dollar per flip until one player runs out of money. This is a random walk with two absorbing barriers. We
consider the number of ways for the first player to lose on the $n$th flip, for $n = x, x + 2, \ldots$. We use probabilistic
arguments to construct generating functions for these quantities along with explicit methods for computing them.
This paper builds on the paper by Hirshon and De Simone, Mathematics Magazine 81 (2008) 146–152.
More Polynomial Root Squeezing
CHRISTOPHER FRAYER
University of Wisconsin–Platteville
Platteville, WI 53818-3099
[email protected]
Suppose you're looking at the graph of a polynomial $y = p(x)$ in a Java applet, with blue dots on
the x-axis indicating the polynomial's roots, and red dots on the x-axis showing the positions of
the critical points. Let's assume that all the roots are real and that you grab the blue dots and
move them around on the x-axis. As you do this, what happens to the red dots?
This is a fair question because the roots determine the polynomial up to a constant multiple, and
they determine the critical points exactly. For simplicity (and without loss of generality) we will
only consider monic polynomials (that is, polynomials with leading coefficient 1).
If you move all the blue points (roots) the same amount, the whole graph just translates, and all
the red dots simply move along for the ride. If you move all the roots in the same direction but by
different amounts, it seems reasonable that the critical points all move in that same direction.
This is in fact true, according to the Polynomial Root Dragging Theorem (see [1], [3]). But suppose
you take two roots and symmetrically squeeze them closer to each other, something we call polynomial
root squeezing. Then what do the critical points do? In [2], Boelkins, From and Kolins answer this
for critical points that are outside the interval between the two selected roots. In this article we
extend their analysis to cover critical points at or between the two squeezed roots.
Math. Mag. 83 (2010) 218–221. doi:10.4169/002557010X494878. © Mathematical Association of America
Notation and definitions. Let $p(x)$ be a monic degree-$n$ polynomial with real roots
$r_1 \le r_2 \le \cdots \le r_n$ and critical points $c_1 \le c_2 \le \cdots \le c_{n-1}$. Rolle's
Theorem tells us that there is a critical point strictly between each pair of adjacent roots. We
know that wherever there are $r$ roots together at a single point, there are also $(r - 1)$ critical
points. So we have
$$r_1 \le c_1 \le r_2 \le c_2 \le \cdots \le c_{n-1} \le r_n \tag{1}$$
with $r_i < c_i < r_{i+1}$ whenever $r_i < r_{i+1}$. By polynomial root squeezing we mean selecting
two indices $i$ and $j$ with $r_i$ strictly less than $r_j$; we then move the smaller root from
$r_i$ to $r_i + d$ and the larger root from $r_j$ to $r_j - d$, where $d > 0$. We insist that
$d < \frac{r_j - r_i}{2}$, so that the roots don't pass each other.
As an example, consider the polynomial $p(x) = x^2 (x + 1)(x - 2)$. It has single roots at $-1$ and
2, and a double root at 0. Its critical points are at (approximately) $-.693$, 0, and 1.443. After
squeezing the roots at $-1$ and 2 to $-.5$ and 1.5 respectively, the polynomial becomes
$\tilde{p}(x) = x^2 (x + .5)(x - 1.5)$. The left critical point moves to the right from $-.693$ to
$-.343$, and the right critical point moves to the left from 1.443 to 1.093. However the center
critical point remains at zero. This example is illustrated in FIGURE 1.
[Figure 1: plot of the two curves, labeled $q(x)$ and $p(x)$.]
Figure 1 Two roots of the polynomial $p(x) = x^2 (x + 1)(x - 2)$ have been squeezed together to form
$\tilde{p}(x)$. In this example, $x = 0$ is a critical point of $p(x)$ and $q(x)$.
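The numbers in this example are quick to reproduce. For a quartic of the form $x^2 (x - a)(x - b)$ the derivative factors as $x\,(4x^2 - 3(a + b)x + 2ab)$, so the critical points are 0 and the two roots of a quadratic. The sketch below uses that factorization; the function name is a choice made here, and it assumes $ab < 0$ so that the quadratic has real roots on either side of 0.

```python
from math import sqrt


def critical_points(a, b):
    """Critical points of x^2 (x - a)(x - b), assuming a and b have opposite signs."""
    s, t = a + b, a * b
    disc = sqrt(9 * s * s - 32 * t)   # discriminant of 4x^2 - 3s x + 2t
    return sorted([0.0, (3 * s - disc) / 8, (3 * s + disc) / 8])


before = critical_points(-1.0, 2.0)    # roots at -1 and 2
after = critical_points(-0.5, 1.5)     # both squeezed by d = 0.5
print([round(c, 3) for c in before])
print([round(c, 3) for c in after])
```

Both outer critical points end up closer to the midpoint 0.5 of the squeezed roots, while the critical point at 0 stays put.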
Why doesn't the critical point at zero move? It is because $x = 0$ is a repeated root of
$\frac{p(x)}{(x + 1)(x - 2)}$, and as long as this repeated root remains fixed, so must the critical
point. More generally, if $c_k$ is a repeated root of $\frac{p(x)}{(x - r_i)(x - r_j)}$, then $c_k$
will remain a critical point when $r_i$ and $r_j$ are squeezed together. For this reason, we say
that a critical point is stubborn if it is a repeated root of $\frac{p(x)}{(x - r_i)(x - r_j)}$, and
ordinary otherwise.
A stubborn critical point can move if it lies at $r_i$ or $r_j$. If $r_i$ (or $r_j$) lies at a
repeated root of multiplicity greater than two, then there is a repeated stubborn critical point
there. When $r_i$ is dragged to the right, one of the stubborn critical points will move to the
right, while the others will remain fixed. In order to state the theorem as succinctly as possible
we exclude the case of stubborn critical points and leave the details as an exercise.
The theorem Boelkins, From and Kolins [2] proved the Polynomial Root Squeezing
Theorem. That theorem explains how squeezing two roots together affects the critical
points that are outside of the interval between the two squeezed roots. Our proof of the
Polynomial Root Squeezing Theorem extends their analysis to the critical points that
lie at or between the two squeezed roots.
THEOREM. If the roots at $r_i$ and $r_j$ move equal distances toward each other, then each ordinary
critical point moves toward $(r_i + r_j)/2$. If the roots at $r_i$ and $r_j$ move equal distances
away from each other, then each ordinary critical point moves away from $(r_i + r_j)/2$.
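[Figure 2: number line marking the roots $r_2$ and $r_4$, the critical points $c_1, \ldots, c_5$, and the midpoint $\frac{r_2 + r_4}{2}$.]
Figure 2 The Polynomial Root Squeezing Theorem: when we drag $r_2$ and $r_4$ together, the critical
points move toward $(r_2 + r_4)/2$.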
Proof. We prove the root squeezing part of the theorem. The root separating part (moving $r_i$ and
$r_j$ equal distances away from each other) follows similarly.
Let $p(x)$ be a polynomial of degree $n$ with (possibly repeated) real roots
$r_1 \le r_2 \le \cdots \le r_n$, $r_i < r_j$ and $c_k$ any critical point of $p(x)$. Let
$\tilde{p}(x)$ be the polynomial that results from squeezing $r_i$ and $r_j$ a fixed distance $d$,
with $0 < d < \frac{1}{2}(r_j - r_i)$. That is
$$\tilde{p}(x) = (x - r_i - d)(x - r_j + d) \prod_{k \ne i, j} (x - r_k) = (x - r_i - d)(x - r_j + d)\, q(x).$$
Denote the roots of $\tilde{p}(x)$ by $\tilde{r}_1 \le \tilde{r}_2 \le \cdots \le \tilde{r}_n$ and
the critical points by $\tilde{c}_1 \le \tilde{c}_2 \le \cdots \le \tilde{c}_{n-1}$.
If $c_k$ lies outside the interval from $r_i$ to $r_j$, then the conclusion follows from [2]. (It
also follows from a slight variation of the reasoning below.) If $c_k$ is between $r_i$ and
$r_i + d$, or between $r_j - d$ and $r_j$ (that is, if one of the moving roots passes by $c_k$) then
the result follows from counting intervals in (1).
We now assume that $c_k$ is not at a repeated root of $p$ and that $r_i + d < c_k < r_j - d$. Our
goal is to compare $c_k$ and $\tilde{c}_k$. We do so by investigating $\tilde{p}'(c_k)$. Let
$$p(x) = (x - r_i)(x - r_j)\, q(x),$$
so that
$$p'(x) = (x - r_i + x - r_j)\, q(x) + (x - r_i)(x - r_j)\, q'(x), \tag{2}$$
and
$$\tilde{p}'(x) = (x - r_i + x - r_j)\, q(x) + (x - r_i - d)(x - r_j + d)\, q'(x). \tag{3}$$
Subtracting (2) from (3) at $x = c_k$, where $p'(c_k) = 0$, yields
$$\tilde{p}'(c_k) = d\,(r_j - r_i - d)\, q'(c_k). \tag{4}$$
Since r
j
r
i
d > 0, this implies that p
(c
k
) and q
(c
k
) have the same sign.
Without loss of generality we assume that p(x) < 0 on (r
k
, r
k+1
) and that |c
k
r
i
| <
|c
k
r
j
| (The cases where |c
k
r
i
| > |c
k
r
j
| and or p(x) > 0 are similar.) Since
r
i
< c
k
< r
j
, it follows that (c
k
r
i
)(c
k
r
j
) < 0 so that q(c
k
) > 0. As p
(c
k
) = 0,
0 = p
(c
k
) = (c
k
r
i
+c
k
r
j
)q(c
k
) +(c
k
r
i
)(c
k
r
j
)q
(c
k
).
An analysis of the sign of the terms, with the assumption that |c
k
r
i
| < |c
k
r
j
|,
implies that q
(c
k
) < 0. It then follows from (4) that p
(c
k
) < 0.
Since p(c
k
) < 0, the equation
p(c
k
)(c
k
r
i
d)(c
k
r
j
+d) = p(c
k
)(c
k
r
i
)(c
k
r
j
)
implies that p(c
k
) < 0. Since we assume that r
i
+ d < c
k
< r
j
d and c
k
is not a
repeated root of p, it follows that r
k
= r
k
or r
k
= r
i
+d while r
k+1
= r
k+1
or r
k+1
=
r
j
d. In all four cases, r
k
< c
k
< r
k+1
with p(c
k
) < 0 which implies that p(x) < 0
on ( r
k
, r
k+1
). Therefore p
(c
k
) <
0, it follows that c
k
< c
k
and c
k
has moved toward (r
i
+r
j
)/2.
This extended version of the Polynomial Root Squeezing Theorem completely char-
acterizes the behavior of all the critical points when distinct roots are squeezed or sep-
arated a uniform distance. In every case, if a critical point moves at all, it moves in the
same direction as the moving root that is nearest to it.
Unfortunately, this intuition does not help us when two distinct roots are squeezed
together a nonuniform distance. Neither does it tell us what happens when more than
two roots are moved simultaneously. These problems could prompt some interesting
undergraduate research.
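As a numerical sanity check of the theorem (not part of the original note), the following Python sketch squeezes two roots of a sample polynomial and reports the direction in which each critical point moves. The sample roots, the function names, and the choice of bisection on $p'(x)/p(x) = \sum 1/(x - r_k)$ are all assumptions made for illustration; the method as written requires distinct roots.

```python
def critical_points(roots, tol=1e-12):
    """Critical points of p(x) = prod (x - r), assuming distinct real roots.

    Between consecutive roots, p'(x)/p(x) = sum 1/(x - r) decreases from
    +infinity to -infinity, so bisection finds its unique zero there.
    """
    rs = sorted(roots)

    def log_derivative(x):
        return sum(1.0 / (x - r) for r in rs)

    points = []
    for lo, hi in zip(rs, rs[1:]):
        a, b = lo, hi
        while b - a > tol:
            mid = (a + b) / 2
            if log_derivative(mid) > 0:
                a = mid  # the zero lies to the right of mid
            else:
                b = mid  # the zero lies at or to the left of mid
        points.append((a + b) / 2)
    return points


def squeeze(roots, i, j, d):
    """Move roots[i] right by d and roots[j] left by d (0-based indices)."""
    moved = list(roots)
    moved[i] += d
    moved[j] -= d
    return moved


roots = [0.0, 1.0, 3.0, 6.0, 7.0]   # hypothetical example polynomial
i, j, d = 1, 3, 0.5                  # squeeze the roots at 1 and 6 by 0.5
center = (roots[i] + roots[j]) / 2   # (r_i + r_j)/2

before = critical_points(roots)
after = critical_points(squeeze(roots, i, j, d))
for c, c_hat in zip(before, after):
    toward = (c_hat - c) * (center - c) > 0
    print(f"{c:.4f} -> {c_hat:.4f}", "toward" if toward else "not toward", center)
```

In this example the squeezed roots do not pass any other root, so every critical point is an ordinary one and each should be reported as moving toward the midpoint.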
Acknowledgment The author wishes to express his gratitude to James Swenson and Tony Thomas for helpful
conversations.
REFERENCES
1. Bruce Anderson, Polynomial root dragging, Amer. Math. Monthly 100 (1993) 864–866. doi:10.2307/2324665
2. Matthew Boelkins, Justin From, and Samuel Kolins, Polynomial root squeezing, Math. Mag. 81 (2008) 39–44.
3. Gideon Peyser, On the roots of the derivative of a polynomial with real roots, Amer. Math. Monthly 74 (1967)
1102–1104. doi:10.2307/2313625
Summary Given a polynomial with all real roots, the Polynomial Root Dragging Theorem states that moving
one or more roots of the polynomial to the right will cause every critical point to move to the right, or stay fixed.
But what happens to the position of a critical point when roots are dragged in opposite directions? In this note
we discuss the Polynomial Root Squeezing Theorem, which states that moving two roots, $r_i$ and $r_j$, an equal
distance toward each other without passing other roots, will cause each critical point to move toward $(r_i + r_j)/2$,
or remain fixed.
A Counterexample to Integration by Parts
ALEXANDER KHEIFETS
Department of Mathematical Sciences
University of Massachusetts Lowell
Alexander [email protected]
JAMES PROPP
Department of Mathematical Sciences
University of Massachusetts Lowell
James [email protected]
Math. Mag. 83 (2010) 222–225. doi:10.4169/002557010X494896. © Mathematical Association of America
The integration-by-parts formula
$$\int f'(x)g(x)\,dx = f(x)g(x) - \int f(x)g'(x)\,dx$$
carries with it an implicit quantification over functions $f$, $g$ to which the formula applies.
So, what conditions must $f$ and $g$ satisfy in order for us to be able to apply the
formula?
A natural guess, which some teachers might even offer to a student who raised
the question, would be that this formula applies whenever $f$ and $g$ are differentiable.
Clearly this condition is necessary, since otherwise the integrands $f'(x)g(x)$
and $f(x)g'(x)$ are not defined. But is this condition sufficient? We will show that
it is not. That is, we will give an example of two differentiable functions $f$, $g$ on
$[0, 1]$ for which the definite integrals $\int_0^1 f'(x)g(x)\,dx$ and $\int_0^1 f(x)g'(x)\,dx$ do not
exist (the former is $-\infty$ and the latter is $+\infty$); it follows that the functions $f'(x)g(x)$
and $f(x)g'(x)$ do not have antiderivatives on the interval $[0, 1]$, so that the indefinite
integrals $\int f'(x)g(x)\,dx$ and $\int f(x)g'(x)\,dx$ do not exist either.
Let
$$f(x) = \begin{cases} x^2 \sin\dfrac{1}{x^4}, & x \ne 0 \\ 0, & x = 0 \end{cases}$$
and
$$g(x) = \begin{cases} x^2 \cos\dfrac{1}{x^4}, & x \ne 0 \\ 0, & x = 0 \end{cases}$$
on the interval $[0, 1]$. Both functions are continuous on $[0, 1]$ and differentiable on
$[0, 1]$. Indeed, if we consider $f$ and $g$ as defined above to be defined on all of $\mathbb{R}$,
both functions are differentiable everywhere; for, away from 0 we can use the chain
rule, while at 0 we have $|(f(h) - f(0))/(h - 0)| = |f(h)/h| \le |h^2/h| = |h|$ so that
$f'(0) = \lim_{h \to 0} (f(h) - f(0))/(h - 0) = 0$, and likewise $g'(0) = 0$. Consequently the product $fg$ is
differentiable on $[0, 1]$, and the improper integral $\int_0^1 [f(x)g(x)]'\,dx$
exists. However, we will show that both integrals $\int_0^1 f'(x)g(x)\,dx$ and $\int_0^1 f(x)g'(x)\,dx$
are divergent. It suffices to show that the first integral is divergent. For $x \ne 0$,
$$f'(x) = 2x \sin\frac{1}{x^4} - 4x^2 \cos\frac{1}{x^4} \cdot \frac{1}{x^5}.$$
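The derivative formula and the bound $|f(h)/h| \le |h|$ can be spot-checked numerically. The sketch below is only an illustration: the sample points, the step size, and the helper names are arbitrary choices, and a central difference is used in place of an exact derivative.

```python
import math


def f(x):
    """f(x) = x^2 sin(1/x^4) for x != 0, and f(0) = 0."""
    return x**2 * math.sin(x**-4) if x != 0 else 0.0


def f_prime(x):
    """Closed form for x != 0; f'(0) = 0 comes from the difference quotient."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(x**-4) - 4 * x**-3 * math.cos(x**-4)


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, used only as a numerical cross-check."""
    return (func(x + h) - func(x - h)) / (2 * h)


sample_points = [0.6, 0.75, 0.9, 1.0]
max_error = max(abs(central_difference(f, x) - f_prime(x)) for x in sample_points)

# The difference quotient at 0 is f(h)/h = h sin(1/h^4), bounded by |h|.
quotient_bounded = all(abs(f(h) / h) <= abs(h) for h in [0.5, 0.1, 1e-2, 1e-3])

print(max_error, quotient_bounded)
```

The sample points are kept away from 0 on purpose: near the origin $f'$ oscillates with unbounded amplitude, which is exactly the behavior the counterexample exploits, and a fixed-step difference quotient is no longer a reliable check there.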
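The first term in this representation of $f'(x)$ is continuous, and $g(x)$ is continuous, so
their product is continuous and therefore integrable. So, we focus on the second term
times $g(x)$, namely
$$-4\int_0^1 x^2 \cos\frac{1}{x^4} \cdot \frac{1}{x^5}\, g(x)\,dx = -4\int_0^1 x^4 \cos^2\frac{1}{x^4} \cdot \frac{1}{x^5}\,dx = \int_0^1 x^4 \cos^2\frac{1}{x^4}\; d\!\left(\frac{1}{x^4}\right).$$
After the substitution
$$u = \frac{1}{x^4}$$
the integral turns into
$$-\int_1^\infty \frac{1}{u} \cos^2(u)\,du$$
(with the minus sign coming from the interchange of upper and lower limits of integration).
To show that this integral diverges, let $k$ be a positive integer. Then for every
$u$ in the interval $[2\pi k - \frac{\pi}{4},\, 2\pi k]$ we have
$$\cos^2(u) \ge \frac{1}{2} \quad\text{and}\quad \frac{1}{u} \ge \frac{1}{2\pi k}.$$
Therefore,
$$\int_1^\infty \frac{1}{u} \cos^2(u)\,du \ge \sum_{k=1}^{\infty} \int_{2\pi k - \pi/4}^{2\pi k} \frac{1}{u} \cos^2(u)\,du \ge \sum_{k=1}^{\infty} \frac{1}{2\pi k} \cdot \frac{1}{2} \cdot \frac{\pi}{4} = \frac{1}{16} \sum_{k=1}^{\infty} \frac{1}{k} = \infty.$$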
This completes the proof.
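The two estimates used in the divergence argument can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it uses a hand-written composite Simpson rule (an arbitrary choice) to confirm that each piece of the integral is at least $1/(16k)$, and prints partial sums of the minorizing harmonic series.

```python
import math


def simpson(func, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def piece(k):
    """Integral of cos^2(u)/u over [2*pi*k - pi/4, 2*pi*k]."""
    return simpson(lambda u: math.cos(u) ** 2 / u,
                   2 * math.pi * k - math.pi / 4, 2 * math.pi * k)


def lower_bound(K):
    """Partial sum (1/16) * sum_{k=1}^{K} 1/k of the minorizing series."""
    return sum(1.0 / k for k in range(1, K + 1)) / 16


for K in (10, 100, 1000, 10000):
    print(K, round(lower_bound(K), 4))
```

The partial sums grow like $(\ln K)/16$, so the growth is slow but unbounded, which is all the argument needs.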
Our analysis shows that the (improper) definite integrals $\int_0^1 f'(x)g(x)\,dx$ and
$\int_0^1 f(x)g'(x)\,dx$ diverge, so the functions $f'(x)g(x)$ and $f(x)g'(x)$ do not have
antiderivatives on $[0, 1]$. For, if these functions had antiderivatives, the fundamental
theorem of calculus would yield finite values for the definite integrals.
We have shown that the functions $f'g$ and $fg'$ [...] $\int_0^1 f'(x)\,dx$ exists: for all $\epsilon > 0$ the Fundamental
Theorem of Calculus implies $\int_\epsilon^1 f'(x)\,dx = f(1) - f(\epsilon)$, which converges to $f(1) - f(0)$
as $\epsilon \to 0$, implying that $\int_0^1 f'(x)\,dx$ exists and equals $f(1) - f(0)$. Likewise $g'$ [...]
$\int uv'\,dx = uv - \int u'v\,dx$). More common are books and web sites that present the
integration by parts formula and give examples without specifying the conditions under
which the formula applies. A provocative treatment of other pedagogical aspects
of the integration by parts theorem is [2].
So, what should the calculus teacher say?
In an ordinary calculus class, the integration by parts formula should be stated as
a theorem that begins "If $f'$ and $g'$ [...] is continuous).
For a more advanced course (an honors calculus class or an introductory real analy-
sis class), our example could be presented in detail and used to motivate the notion of
bounded variation, since the lack of bounded variation of the derivatives of the func-
tions near the origin is the source of the problem. We also mention that, in lieu of
adopting the hypothesis that f (or g) is continuously differentiable, one might require
that f be Riemann-Stieltjes integrable with respect to dg. Then it can be shown that the
integration by parts formula (where the integrals now are Riemann-Stieltjes integrals)
is valid, and it is part of the conclusion that g will be Riemann-Stieltjes integrable with
respect to d f (see [1]).
Finally, we mention that if the functions $f'$ and $g'$ [...] $f'g$ and $fg'$ are not
integrable, so that the integration by parts formula does not apply.
PROBLEMS
BERNARDO M. ÁBREGO, Editor
California State University, Northridge
Assistant Editors: SILVIA FERNÁNDEZ-MERCHANT, California State University, Northridge;
JOSÉ A. GÓMEZ, Facultad de Ciencias, UNAM, México; ROGELIO VALDEZ, Facultad
de Ciencias, UAEM, México; WILLIAM WATKINS, California State University, Northridge
PROPOSALS
To be considered for publication, solutions should be received by November 1,
2010.
1846. Proposed by Eddie Cheng and Jerrold W. Grossman, Department of Mathemat-
ics and Statistics, Oakland University, Rochester, MI.
For which $n \ge 1$ is it possible to place the numbers $1, 2, \dots, n$ in some order (a) on
a line segment, or (b) on a circle, so that for every $s$ from 1 to $\frac{1}{2}n(n + 1)$ there is a
connected subset of the segment or circle such that the sum of the numbers on that
subset is $s$?
1847. Proposed by Panagiote Ligouras, Leonardo da Vinci High School, Noci,
Italy.
Let ABC be a scalene triangle. Let $h_a$, $l_a$, and $m_a$ be the respective lengths of the
height, bisector, and median of ABC with respect to A, and let $r_a$ be the exradius of
the excircle of ABC opposite to A. Similarly, define $h_b$, $l_b$, $m_b$, and $r_b$ with respect
to B, and $h_c$, $l_c$, $m_c$, and $r_c$ with respect to C. Prove that
$$\frac{l_a^4 (m_a^2 - h_a^2)}{h_a^3 r_a (l_a^2 - h_a^2)} + \frac{l_b^4 (m_b^2 - h_b^2)}{h_b^3 r_b (l_b^2 - h_b^2)} + \frac{l_c^4 (m_c^2 - h_c^2)}{h_c^3 r_c (l_c^2 - h_c^2)} > \frac{16}{3}.$$
1848. Proposed by Herb Bailey, Rose-Hulman Institute of Technology, Terre Haute,
IN.
Let $N$ be a base ten positive integer with nonzero last digit. Let $N'$ be the integer
formed by moving the last digit of $N$ to the front. For example, if $N = 867053$ then
$N' = 386705$. [...]
Math. Mag. 83 (2010) 226–233. doi:10.4169/002557010X494904. © Mathematical Association of America
We invite readers to submit problems believed to be new and appealing to students and teachers of advanced
undergraduate mathematics. Proposals must, in general, be accompanied by solutions and by any bibliographical
information that will assist the editors and referees. A problem submitted as a Quickie should have an unexpected,
succinct solution. Submitted problems should not be under consideration for publication elsewhere.
Solutions should be written in a style appropriate for this MAGAZINE.
Solutions and new proposals should be mailed to Bernardo M. Ábrego, Problems Editor, Department of
Mathematics, California State University, Northridge, 18111 Nordhoff St, Northridge, CA 91330-8313, or mailed
electronically (ideally as a LaTeX or pdf file) to [email protected]. All communications, written or
electronic, should include on each page the reader's name, full address, and an e-mail address and/or FAX
number.
1849. Proposed by Ovidiu Furdui, Campia Turzii, Cluj, Romania.
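Find the sum
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} \frac{(-1)^{n+m}}{\left(\lfloor \sqrt{n + m} \rfloor\right)^3},$$
where $\lfloor a \rfloor$ denotes the greatest integer less than or equal to $a$.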
1850. Proposed by Richard Stephens, Department of Mathematics, Columbus State
University, Columbus, GA.
Let $\tau$ be a topology on a finite set $X$. Define a topology $\tau$ on $X$ to be regular if for any
nonempty closed $E \subseteq X$ and $x \in X \setminus E$, there exist disjoint open sets $U$ and $V$ in $\tau$
such that $E \subseteq V$ and $x \in U$. Prove or disprove that the topological space $(X, \tau)$ is
regular if and only if $\tau$ has a base $B$ which is a partition of $X$.
Quickies
Answers to the Quickies are on page 232.
Q1001. Proposed by Herman Roelants, Center for Logic, Institute of Philosophy, Uni-
versity of Leuven, Leuven, Belgium.
The recursive sequence $(a_n)$ is defined as follows: $a_1 = 0$ and $a_{n+1} = \sqrt{a_n^2 + 1} + a_n$
for $n \ge 1$. Determine the value of
$$\lim_{n \to \infty} \frac{2^n}{a_n}.$$
Q1002. Proposed by Michael W. Botsko, Saint Vincent College, Latrobe, PA.
Let $g$ be a positive, continuous, real-valued function on $[0, \infty)$, and let
$$f(x) = g(x) \int_0^x \frac{1}{(g(t))^2}\,dt.$$
Prove that $f$ is unbounded on $[0, \infty)$.
Solutions
Locating the intersection of the diagonals June 2009
1821. Proposed by Abdullah Al-Sharif and Mowaffaq Hajja, Yarmouk University, Ir-
bid, Jordan.
Let ABCD be a convex quadrilateral, let X and Y be the midpoints of sides BC and DA
respectively, and let O be the point of intersection of diagonals of ABCD. Prove that
O lies inside of quadrilateral ABXY if and only if
Area(AOB) < Area(COD).
I. Solution by Michel Bataille, Rouen, France.
Let U and V be the points of intersection of XY with AC and BD, respectively (see
figure).
Let positive real numbers $p$, $q$ be defined by
$$\overrightarrow{OC} = -p\,\overrightarrow{OA}, \qquad \overrightarrow{OD} = -q\,\overrightarrow{OB}$$
so that $C = -pA + (1 + p)O$ and $D = (1 + q)O - qB$.
Then, $2X = B + C = -pA + (1 + p)O + B$ and similarly, $2Y = A + (1 + q)O - qB$.
It follows that the equation of the line XY, in barycentric coordinates
$(x, y, z)$ relative to $(A, O, B)$, is
$$(pq + 1 + 2q)x + (pq - 1)y + (pq + 1 + 2p)z = 0,$$
and so the coordinates of U and V are $U = (1 - pq,\ pq + 1 + 2q,\ 0)$ and
$V = (0,\ pq + 1 + 2p,\ 1 - pq)$, that is,
$$2(1 + q)\,\overrightarrow{OU} = (1 - pq)\,\overrightarrow{OA} \quad\text{and}\quad 2(1 + p)\,\overrightarrow{OV} = (1 - pq)\,\overrightarrow{OB}.$$
Thus O is in the interior of ABXY if and only if $pq > 1$.
On the other hand,
$$\mathrm{Area}(COD) = \tfrac{1}{2}\, OC \cdot OD \sin(\angle COD) = \tfrac{1}{2}\, p\,OA \cdot q\,OB \sin(\angle AOB) = pq\,\mathrm{Area}(AOB).$$
Thus $pq > 1$ if and only if Area(COD) > Area(AOB).
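The criterion $pq > 1$ can be checked numerically in a normalized configuration. The sketch below is an illustration only: it places $O$ at the origin with $A = (1, 0)$ and $B = (0, 1)$ (an assumption that loses no generality up to an affine map, which preserves midpoints, incidence, and area ratios), and tests on which side of the line XY the point O falls. All function names are hypothetical.

```python
def cross(u, v):
    """z-component of the 2D cross product."""
    return u[0] * v[1] - u[1] * v[0]


def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])


def triangle_area(p, q, r):
    return abs(cross(sub(q, p), sub(r, p))) / 2


def check(p, q):
    """Return (O inside ABXY, Area(AOB) < Area(COD)) for OC = -p OA, OD = -q OB."""
    O, A, B = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
    C, D = (-p, 0.0), (0.0, -q)
    X = ((B[0] + C[0]) / 2, (B[1] + C[1]) / 2)  # midpoint of BC
    Y = ((D[0] + A[0]) / 2, (D[1] + A[1]) / 2)  # midpoint of DA
    side_O = cross(sub(Y, X), sub(O, X))
    side_A = cross(sub(Y, X), sub(A, X))
    inside = side_O * side_A > 0                # O strictly on A's side of XY
    areas = triangle_area(A, O, B) < triangle_area(C, O, D)
    return inside, areas


for p, q in [(0.5, 1.5), (2.0, 0.75), (3.0, 0.2), (2.0, 2.0)]:
    print(p, q, check(p, q), p * q > 1)
```

Since XY splits ABCD into ABXY and YXCD, and O lies inside ABCD, being on A's side of XY is the same as lying inside ABXY; the two booleans returned should agree with each other and with $pq > 1$.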
II. Solution by David Getling, Berlin, Germany.
Let Z and W be the midpoints of CD and AB, respectively. Varignon's Theorem
says that XWYZ is a parallelogram. Indeed, XW and YZ are parallel to AC and also YW
and XZ are parallel to BD. As a consequence O always lies inside this parallelogram.
Also, O lies inside ABXY if and only if O lies inside the triangle XYW, that is, O lies
inside ABXY if and only if [WXOY] < [YOXZ], where [WXOY] designates the area of
WXOY.
In the figure, all triangular regions with the same area have been labeled with the
same number. The condition [WXOY] < [YOXZ] is equivalent to
[1] + [1] + [3] + [4] < [2] + [2] + [3] + [4], or [1] + [1] + [3] < [2] + [2] + [3].
But [WBX] = [ABC]/4, from which [1] + [3] = [5] + [8], and similarly, [2] + [3] =
[7] + [8]. Thus the condition is equivalent to
[1] + [5] + [8] < [2] + [7] + [8], or (1/2)[AOB] = [1] + [5] < [2] + [7] = (1/2)[COD],
which completes the proof.
Also solved by Robert Calcaterra, Robert L. Doucette, Fisher Problem Solving Group, Dmitry Fleischman,
Michael Goldenberg and Mark Kaplan, Eugen J. Ionascu, Young Ho Kim (Korea), Omran Kouba (Syria), Victor
Y. Kutsenok, Aaron Panchal, Joel Schlosberg, Edward Schmeichel, Marian Tetiva (Romania), and the proposers.
An inequality for $\sqrt[3]{u/v} + \sqrt[3]{v/u}$ [...] $\sqrt{x + 2}$.
Note that
$$8x - 17 + \frac{2}{x(x^2 - 3)} = (x - 2)\left(8 - \frac{(x + 1)^2}{x(x^2 - 3)}\right).$$
Writing $(x + 1)^2 = (x^2 - 3) + 2x + 4$, and noting that $x^2 - 3 > 1$ for $x > 2$, it follows that
$$\frac{(x + 1)^2}{x(x^2 - 3)} = \frac{1}{x} + \frac{2x + 4}{x(x^2 - 3)} < \frac{1}{x} + \frac{2x + 4}{x} = 2 + \frac{5}{x} < 2 + \frac{5}{2} = \frac{9}{2}.$$
Hence
$$8x - 17 + \frac{2}{x(x^2 - 3)} > (x - 2)\left(8 - \frac{9}{2}\right) > 0,$$
and the first inequality is proved. To prove the second inequality, note that $x > 2$ implies
$\sqrt{x + 2} > 2$, and consequently
$$x < 2(x - 1) < (x - 1)\sqrt{x + 2}.$$
For the problem at hand, let $x = \sqrt[3]{u/v} + \sqrt[3]{v/u}$ [...] $\sqrt{x + 2}$.
Because $x^3 = u/v + v/u + 3x$, it follows that
$$(x - 1)^2 (x + 2) = x^3 - 3x + 2 = 2 + \frac{u}{v} + \frac{v}{u} = (u + v)\left(\frac{1}{u} + \frac{1}{v}\right)$$
and
$$\frac{2}{x(x^2 - 3)} = \frac{2}{x^3 - 3x} = \frac{2}{u/v + v/u} = \frac{2uv}{u^2 + v^2}.$$
Therefore
$$\frac{1}{8}\left(17 - \frac{2uv}{u^2 + v^2}\right) < \sqrt[3]{\frac{u}{v}} + \sqrt[3]{\frac{v}{u}} < \sqrt{(u + v)\left(\frac{1}{u} + \frac{1}{v}\right)}.$$
Moreover, if $u = v$ then all three expressions in the inequality are equal, so equality
holds if and only if $u = v$.
Editor's Note. Stan Wagon verified that the constant $\frac{1}{8}$ in the first inequality cannot be
improved. Eugene A. Herman proved the stronger inequality
$\frac{4}{9}\left(5 - uv/(u^2 + v^2)\right) < \sqrt[3]{u/v} + \sqrt[3]{v/u}$ [...] $\sqrt[4]{v/u}$ and
verified that the statement no longer holds with fifth roots.
Also solved by Arkady Alt, Michel Bataille (France), Minh Can, Hongwei Chen, John Christopher, Chip Cur-
tis, Robert L. Doucette, John Ferdinands, Leon Gerber, Michael Goldenberg and Mark Kaplan, Eugene A. Her-
man, Eugen J. Ionascu and Sarah E. Ewing, Parvis Khalili, Elias Lampakis (Greece), Kee-Wai Lau (China), Gra-
ham Lord, José H. Nieto (Venezuela), Northwestern University Math Problem Solving Group, Occidental College
Problem Solving Group, Paolo Perfetti (Italy), Gabriel T. Prăjitură, Joel Schlosberg, John L. Simmons (Holland),
Nicholas C. Singer, Sanghun Song (Korea), Albert Stadler (Switzerland), David Stone and John Hawkins, Mar-
ian Tetiva (Romania), Texas State Problem Solvers Group, Michael Vowe (Switzerland), Stan Wagon, and the
proposer. There were two incorrect submissions.
Permutations with k initial entries of the same parity June 2009
1823. Proposed by Emeric Deutsch, Polytechnic University, Brooklyn, NY.
Let n and k be positive integers. Find a closed-form expression for the number of
permutations of {1, 2, . . . , n} for which the initial k entries have the same parity, but
the initial k + 1 entries do not. (As an example, for the permutation 5712463, the
number of initial entries of the same parity is 3, the order of the set {5, 7, 1}.)
Solution by José H. Nieto, Universidad del Zulia, Maracaibo, Venezuela.
Let $I_n = \{1, 2, \dots, n\}$. Denote by $E(n, k)$ and $O(n, k)$ the sets of permutations
of $I_n$ with just $k$ initial even entries, respectively with just $k$ initial odd entries. The
problem asks to find an expression for $p(n, k) = |E(n, k)| + |O(n, k)|$.
If $n = 2m$ is even, the first $k$ entries of a permutation in $E(n, k)$ can be chosen
in $m(m - 1) \cdots (m - k + 1)$ ways, the $(k + 1)$th entry in $m$ ways, and the remaining
$n - k - 1$ entries in $(2m - k - 1)!$ ways, hence $|E(2m, k)| = \binom{m}{k} k!\, m\, (2m - k - 1)!$.
By symmetry $|O(2m, k)| = |E(2m, k)|$ and
$$p(2m, k) = 2m \binom{m}{k} k!\, (2m - k - 1)!.$$
Analogously, if $n = 2m + 1$ then $|E(2m + 1, k)| = \binom{m}{k} k!\, (m + 1)(2m - k)!$ and
$|O(2m + 1, k)| = \binom{m+1}{k} k!\, m\, (2m - k)!$, hence
$$p(2m + 1, k) = \left[(m + 1)\binom{m}{k} + m\binom{m + 1}{k}\right] k!\, (2m - k)!.$$
Both formulas for $n$ even and odd may be summarized as follows:
$$p(n, k) = \left[\left\lceil \frac{n}{2} \right\rceil \binom{\lfloor n/2 \rfloor}{k} + \left\lfloor \frac{n}{2} \right\rfloor \binom{\lceil n/2 \rceil}{k}\right] k!\, (n - k - 1)!.$$
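The closed form can be compared against a brute-force count for small $n$. The following Python sketch is only a check under the stated reading of the problem (the first $k$ entries share a parity and entry $k + 1$ has the other parity, so $1 \le k \le n - 1$); the function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def p_formula(n, k):
    """Closed form: floor(n/2) even numbers and ceil(n/2) odd numbers in 1..n."""
    lo, hi = n // 2, (n + 1) // 2
    return (hi * comb(lo, k) + lo * comb(hi, k)) * factorial(k) * factorial(n - k - 1)


def p_brute(n, k):
    """Count permutations whose first k entries share a parity that entry k+1 breaks."""
    count = 0
    for perm in permutations(range(1, n + 1)):
        parity = perm[0] % 2
        if all(x % 2 == parity for x in perm[:k]) and perm[k] % 2 != parity:
            count += 1
    return count


for n in range(2, 7):
    print(n, [(p_formula(n, k), p_brute(n, k)) for k in range(1, n)])
```

For example, with $n = 3$ and $k = 1$ both counts equal 4: the permutations 123, 213, 231, and 321.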
Editor's Note. Graham Lord observed that if the set $I_n$ is partitioned into sets $A$ and
$B$ with $|A| = a$ and $|B| = b$, then the number of permutations of $I_n$ where the first $k$
entries are in $A$ and the next $j$ entries are in $B$ is equal to
$$\binom{a}{k} k! \binom{b}{j} j!\, (n - j - k)!.$$
Also solved by Michel Bataille (France), Jany C. Binz (Switzerland), Robert Calcaterra, Chip Curtis,
M. N. Deshpande (India), Dmitry Fleischman, Ralph P. Grimaldi, Eugene A. Herman, Peter M. Joyce and
Richard F. McCoart Jr., Victor Y. Kutsenok, Elias Lampakis (Greece), Graham Lord, Rob Pratt, Joel Schlos-
berg, John Sumner and Aida Kadic-Galeb, Nicholas C. Singer, Texas State Problem Solvers Group, Michael
Woltermann, and the proposer.
An Intermediate Value Theorem conclusion June 2009
1824. Proposed by Cezar Lupu, student, University of Bucharest, Bucharest, Roma-
nia.
Let $f$ be a continuous real-valued function defined on $[0, 1]$ and satisfying
$$\int_0^1 f(x)\,dx = \int_0^1 x f(x)\,dx.$$
Prove that there exists a real number $c$, $0 < c < 1$, such that
$$c f(c) = \int_0^c x f(x)\,dx.$$
Solution by Dave Trautman, Department of Mathematics and Computer Science, The
Citadel, Charleston, SC.
Because $f$ is continuous and $\int_0^1 (1 - x) f(x)\,dx = 0$, the Mean Value Theorem for
Integrals assures the existence of some $c_1$, $0 < c_1 < 1$, such that $(1 - c_1) f(c_1) = 0$.
Clearly this means $f(c_1) = 0$. If $\int_0^{c_1} x f(x)\,dx = 0$, then $c = c_1$ proves the required
identity. Replacing $f$ by $-f$ if necessary, it can be assumed that $\int_0^{c_1} x f(x)\,dx > 0$.
Because the function $G(x) = x f(x)$ is continuous on $[0, 1]$, there exists $c_2$, $0 \le c_2 < c_1$,
such that $G(c_2)$ is the maximum value of $G$ on $[0, c_1]$. For $0 \le x \le c_1$, let
$$H(x) = \int_0^x t f(t)\,dt.$$
Because $c_2 < 1$, it follows that
$$H(c_2) = \int_0^{c_2} t f(t)\,dt \le c_2\, G(c_2) < G(c_2).$$
On the other hand,
$$H(c_1) = \int_0^{c_1} t f(t)\,dt > 0 = G(c_1).$$
Thus the Intermediate Value Theorem says that there exists $c$, $c_2 < c < c_1$, such that
$G(c) = H(c)$, that is, $c f(c) = \int_0^c x f(x)\,dx$.
Editor's Note. A number of readers pointed out that the same conclusion follows if the
hypothesis is replaced by the weaker condition of $f$ being continuous and $f(x_0) = 0$
for some $0 < x_0 < 1$.
Also solved by Michael R. Bacon and Charles K. Cook, Michel Bataille (France), Gerald E. Bilodeau, Michael
W. Bosko, Robert Calcaterra, Hongwei Chen, John Christopher, Andrés Fielbaum (Chile), Fisher Problem Solving
Group, G.R.A.20 Problem Solving Group (Italy), William Hodge, Eugen J. Ionascu, Parviz Khalili, Elias
Lampakis (Greece), Kee-Wai Lau (China), Kim McInturff, Occidental Problem Solving Group, Ángel Plaza and
José M. Pacheco (Spain), Edward Schmeichel, Sanghun Song (Korea), Marian Tetiva (Romania), Jeremy
Thibodeaux, Thomas P. Turiel, Nicholas J. Willis, and the proposer.
Non-nested subsets of a ring closed under multiplication June 2009
1825. Proposed by Greg Oman and Kevin Schoenecker, The Ohio State University,
Columbus, OH.
Let $R$ be a ring with more than two elements. Prove that there exist subsets $S$ and $T$
of $R$, both closed under multiplication, and such that $S \not\subseteq T$ and $T \not\subseteq S$. (Note: We
do not assume that $R$ is commutative nor do we assume that $R$ has a multiplicative
identity.)
Solution by Howard E. Bell, Department of Mathematics, Brock University, St. Cather-
ines, Ontario, Canada.
If $R$ contains an element $a$ such that $a^n \ne 0$ for all $n \in \mathbb{Z}^+$, then the sets $S = \{0\}$ and
$T = \{a^n : n \in \mathbb{Z}^+\}$ satisfy the required properties. Assume now that $R$ is a nil ring, that is,
for every $x \in R$ there is a positive integer $n$ such that $x^n = 0$. Let the index of $x$ be the
smallest positive integer with this property. If $R$ contains two distinct elements $a$ and
$b$ of index 2, then let $S = \{0, a\}$ and $T = \{0, b\}$. Clearly $S$ and $T$ satisfy the required
conditions. This case occurs if the maximum index in $R$ is 2. It also occurs when there
exists $a \in R$ with index $k \ge 4$, for in this case $a^{k-1}$ and $a^{k-2}$ are two elements of index
2. The only remaining case is that $R$ contains an element $a$ of index 3, in which case
$a$, $a^2$, and $a + a^2$ are nonzero and $a \ne a^2$, $a \ne a + a^2$, and $a^2 \ne a + a^2$. Thus the sets
$S = \{0, a, a^2\}$ and $T = \{0, a + a^2, a^2\}$ satisfy the requirements.
Note. It is possible to insist that $S \cup T$ be commutative, for if $R$ is a noncommutative
ring with maximum index 2 and $a$ and $b$ are noncommuting elements of $R$, then $a$,
$b$, and $a + b$ all have square zero, so that $ab + ba = 0$ and hence both $ab$ and $ba$
are nonzero. Thus, $S = \{0, ab\}$ and $T = \{0, ba\}$ satisfy the requirements and $S \cup T$ is
commutative.
Also solved by Paul Budney, Robert Calcaterra, John Ferdinands, John N. Fitch, Rod Hardy and Alin
A. Stancu, Elias Lampakis (Greece), David P. Lang, Missouri State University Problem Solving Group, Justin
Neil and Paul Peck, José H. Nieto, Northwestern University Math Problem Solving Group, Éric Pité (France),
Gabriel T. Prăjitură, Nicholas C. Singer, John Sumner and Aida Kadic-Galeb, Vadim Ponomarenko, Marian
Tetiva (Romania), Texas State University Problem Solvers Group, Gregory P. Wene (Mexico), and the proposers.
There was one incorrect submission.
Answers
Solutions to the Quickies from page 227.
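A1001. The answer is $\pi$. Note that $1/a_2 = 1 = \tan(\pi/2^2)$. By induction, if $1/a_n =
\tan(\pi/2^n)$, then for positive angles less than $\pi/2$ the Tangent Half-Angle Formula
gives
$$\tan\left(\frac{\pi}{2^{n+1}}\right) = \frac{-1 + \sqrt{1 + \tan^2(\pi/2^n)}}{\tan(\pi/2^n)} = \frac{-1 + \sqrt{1 + a_n^{-2}}}{a_n^{-1}} = -a_n + \sqrt{a_n^2 + 1} = \frac{1}{\sqrt{a_n^2 + 1} + a_n} = \frac{1}{a_{n+1}}.$$
Therefore
$$\lim_{n \to \infty} \frac{2^n}{a_n} = \lim_{n \to \infty} \left(2^n \tan\left(\frac{\pi}{2^n}\right)\right) = \lim_{n \to \infty} \left(\pi \cdot \frac{\tan(\pi/2^n)}{\pi/2^n}\right) = \pi.$$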
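The limit can also be observed numerically by iterating the recursion directly. The sketch below is an illustration only; the function name is hypothetical, and floating-point arithmetic limits how far the iteration can usefully be pushed.

```python
import math


def ratio(n):
    """Return 2**n / a_n for n >= 2, where a_1 = 0 and a_{m+1} = sqrt(a_m**2 + 1) + a_m."""
    a = 0.0  # a_1
    for _ in range(n - 1):
        a = math.sqrt(a * a + 1.0) + a
    return 2.0**n / a


for n in (2, 5, 10, 20):
    print(n, ratio(n), abs(ratio(n) - math.pi))
```

Since $2^n \tan(\pi/2^n) = \pi + \pi^3/(3 \cdot 4^n) + \cdots$, the error should shrink by roughly a factor of 4 with each step.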
A1002. Suppose $f$ is bounded on $[0, \infty)$. Let $h(x) = \int_0^x (g(t))^{-2}\,dt$ so that $h'(x) =
(g(x))^{-2}$. Note that $h(x) > 0$ on $(0, \infty)$. Because $f$ is bounded, there exists $B >
0$ such that $f(x) = g(x)h(x) \le B$ on $[0, \infty)$. Therefore $g^2(x)h^2(x) \le B^2$ and thus
$h'(x)/h^2(x) \ge 1/B^2$ on $(0, \infty)$. Integrating this inequality yields
$$\frac{1}{h(1)} - \frac{1}{h(x)} = \int_1^x \frac{h'(t)}{h^2(t)}\,dt \ge \int_1^x \frac{1}{B^2}\,dt = \frac{1}{B^2}(x - 1) \quad\text{on } [1, \infty).$$
Therefore
$$\frac{1}{B^2}(x - 1) < \frac{1}{h(x)} + \frac{1}{B^2}(x - 1) \le \frac{1}{h(1)} \quad\text{on } [1, \infty),$$
which is a contradiction.
Editor's Note. By letting $B(x) = c$ [...] $\sqrt{x}$ is also
unbounded. On the other hand, the function $g(x) = \sqrt{x + 2}$ shows that it is possible
for $f(x)/($ [...]
[...] $s_i$ [...] 0. Since $d = s_{s_{i+1}} - s_{s_i}$, we see $s_{i+1} - s_i$ is bounded. The difference achieves a
maximum $s_{a+1} - s_a = M$ and minimum $s_{b+1} - s_b = m$. Let $s_a = k$. Let $s_b = l$.
Then $s_{s_{s_{a+1}}} - s_{s_{s_a}} = s_{s_{s_a + M}} - s_{s_{s_a}} = s_{s_{k+M}} - s_{s_k} = Md$ since $s_{s_i}$ is an arithmetic
progression with common difference $d$. Since $M$ is the maximum of $s_{i+1} - s_i$, and
the average value of $s_{i+1} - s_i$ from $s_{s_{s_a}}$ to $s_{s_{s_{a+1}}}$ is $M$, it follows $s_{s_{s_a}+1} - s_{s_{s_a}} = M$.
But $s_{s_i+1} - s_{s_i}$ is constant, so it equals $M$. By a similar argument using that $m$ is the
minimum of $s_{i+1} - s_i$, we have $s_{s_i+1} - s_{s_i} = m$. Hence $M = m$ and the given sequence
is arithmetic.
This problem was proposed by Gabriel Carroll of the USA.
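4. Let $\alpha = \angle DAC$. Then $\angle CAB = 2\alpha$, $\angle BCA = \angle CBA = 90^\circ - \alpha$, and
$\angle EBC = 45^\circ - \alpha/2$, $\angle CEK = 3\alpha/2$ and $\angle CKE = 135^\circ - \alpha$. Finally, $\angle CKA = 135^\circ$ [...]
$$EC = 2AC \sin(\alpha)\, \frac{\sin(45^\circ - \alpha/2)}{\sin(45^\circ + 3\alpha/2)}. \qquad (1)$$
Apply the Law of Sines to AKC and simplify to obtain $KC = \sqrt{2}\, AC \sin(\alpha/2)$.
Finally, apply the Law of Sines to EKC and rearrange to obtain
$EC = KC \sin(135^\circ - \alpha)/\sin(3\alpha/2)$. Combining,
$$EC = \sqrt{2}\, AC \sin(\alpha/2)\, \frac{\sin(135^\circ - \alpha)}{\sin(3\alpha/2)}. \qquad (2)$$
Then equating (1) and (2) and cancelling $AC$,
$$2 \sin(\alpha)\, \frac{\sin(45^\circ - \alpha/2)}{\sin(45^\circ + 3\alpha/2)} = \sqrt{2} \sin(\alpha/2)\, \frac{\sin(135^\circ - \alpha)}{\sin(3\alpha/2)}. \qquad (3)$$
Solving this equation, $\alpha = 30^\circ$ or $\alpha = 45^\circ$, so $\angle CAB = 60^\circ$ or $90^\circ$.
The problem was suggested by Jan Vonk, Belgium, Peter Vandendriessche, Belgium
and Hojoo Lee, Korea.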
5. First note that if a triangle has positive integer side lengths 1, $a$, $b$, then by the triangle
inequality $a - 1 \le b \le a + 1$. If the triangle is non-degenerate, then $a = b$. Using $a = 1$,
then $f(b) = f(b + f(a) - 1)$. Now the claim is that $f(a) = 1$, since otherwise if
$f(a) > 1$ then $f$ is periodic of period $f(a) - 1$, and $f$ is bounded above. Then choosing
$a$ larger than twice the upper bound violates the triangle inequality. Using $b = 1$, then
$a$, $f(1) = 1$, $f(f(a))$ are the side lengths of a triangle, so $a = f(f(a))$ for all $a$. Thus
$f$ is injective.
Now assume $f(2) = k > 2$. Hence $f(b) - 1 \le f(b + f(2) - 1) \le f(b) + 1$. Then
check the 3 possibilities for $f(b + f(2) - 1)$:
(a) $f(b + f(2) - 1) = f(b)$, so $f(2) = 1$ which is impossible.
(b) $f(b + f(2) - 1) = f(b) - 1$, so set $k = f(2) - 1$, so $f(b + k) = f(b) - 1$.
By induction $f(b + n \cdot k) = f(b) - n$. Choosing $n = f(b) - 1$ leads to
function value 1, contradicting injectivity.
(c) $f(b + f(2) - 1) = f(b) + 1$. Set $b = 1$, $f(2) - 1 = k$, so $f(1 + k) =
f(1) + 1$. Inducting, $f(1 + n \cdot k) = n + 1$. Now if $k > 1$, then $1 \le k - 1 <
k + 1 < 1 + n \cdot k$. This means that $f(k - 1) = 1 = f(1)$ or $k = 2$ implies
$f(2) = 3$ and $f(b + 2) = f(b) + 1$ and finally $f(2) = 3$ and $f(5) = 3$ which
is impossible. Conclude that $k = 1$, so $f(b + 1) = f(b) + 1$ and $f(n) = n$.
This problem was proposed by Bruno Le Floch of France.
6. Induct on $n$. The cases $n = 1$ and $n = 2$ are easy. For $n \ge 3$, without loss of generality
let $a_n$ be the largest jump size and let $m_1$ be the smallest element of $M$. Consider 3
cases.
(a) If $m_1 < a_n$ and $a_n \notin M$, then begin with a jump of size $a_n$. That jump avoids
$m_1$, and the induction hypothesis means that the grasshopper can arrange the
remaining $n - 1$ jumps to avoid the remaining $n - 2$ values of $M$.
(b) If $m_1 < a_n$ but $a_n \in M$, say $a_n = m_j$ for some $j$, then consider the starting
two-jump sequences $(a_1, a_n), \dots, (a_{n-1}, a_n)$. There are $n - 1$ of these
sequences, and the landing values are all distinct and different from $m_j$.
Therefore there are not enough forbidden values in $M$ to block all of them.
For some $i$, the grasshopper can start with two safe jumps of size $a_i$ and $a_n$.
These two jumps take the grasshopper past $m_1$ and $m_j$, and by induction the
grasshopper can arrange the remaining $n - 2$ jumps to avoid the remaining
$n - 3$ values of $M$.
(c) If $m_1 \ge a_n$ the grasshopper needs a different strategy. Begin with jump $a_n$,
ignore the value $m_1$, and arrange the remaining jumps to avoid the remaining
$n - 2$ values of $M$ other than $m_1$. If this arrangement avoids $m_1$, the proof is
done. Otherwise, suppose that the grasshopper lands on $m_1$ just before making
a jump of size $a_i$. Then modify the jump sequence by exchanging jumps $a_n$
and $a_i$. Then verify that the modified sequence avoids all the values of $M$.
This solution is by Anton Mellit, IMO observer with the Ukraine delegation, and
Ilya Bogdanov, IMO observer with the Russian delegation, with simplifications by Brian
Basham, a mathematics student at MIT.
Immediately following the IMO, Terry Tao hosted a collaborative solution on his
blog site as a mini-polymath project, [3]. The polymath collaborative solution con-
tinued two days [4] until the contributors agreed upon a solution. Terry Tao followed
with an analysis of the polymath process, [5]. Michael Nielsen wrote up 5 variant proofs
from the collaboration [2].
This problem was proposed by Dmitry Khramtsov of Russia.
2009 International Mathematical Olympiad Results At the IMO 530 young mathematicians
from 104 countries competed on July 15–16, 2009. The USA team ranked 6th
among all 104 participating countries. The USA team has consistently finished in the top
ten at the IMO. As part of the 50th anniversary of the IMO, Terry Tao and 5 other famous
mathematicians who were IMO medalists gave commemorative lectures. The students visited
a mag-lev train demonstration project, the North Sea resort island Wangerooge, and
the historic Bremen city center.
John Berman, a graduate of John T. Hoggard High School, Wilmington, NC, won a Gold
medal.
Wenyu Cao, a student at Phillips Academy, Andover, Massachusetts, won a Silver medal.
Eric Larson, who graduated from South Eugene High School, Eugene, OR, won a Gold
medal.
Delong Meng, who graduated from Baton Rouge Magnet School, Baton Rouge, LA, won
a Silver medal.
Evan O'Dorney, who attends the Venture School and is from Danville, CA, won a Silver
medal.
Qinxuan Pan, who graduated from Wooton High School in Rockville, MD, won a Silver
medal.
REFERENCES
1. IMO Moderators, Questions of the IMO 2009 Germany, 2009 (accessed Mar. 24, 2010).
http://www.artofproblemsolving.com/Forum/index.php?f=580.
2. Michael Nielsen, Imo 2009 Q6, 2009 (accessed Mar. 24, 2010).
http://michaelnielsen.org/polymath1/index.php?title=Imo_2009_q6.
3. Terry Tao, IMO 2009 Q6 as a mini-polymath project, 2009 (accessed Mar. 24, 2010).
http://terrytao.wordpress.com/2009/07/20/imo-2009-q6-as-a-mini-polymath-project/.
4. Terry Tao, IMO 2009 Q6 mini-polymath project cont., 2009 (accessed Mar. 24, 2010).
http://terrytao.wordpress.com/2009/07/21/imo-2009-q6-mini-polymath-project-cont/.
5. Terry Tao, IMO 2009 Q6 mini-polymath project cont., 2009 (accessed Mar. 24, 2010).
http://terrytao.wordpress.com/2009/07/22/imo-2009-q6-mini-polymath-project-impressions-reflections-analysis/.