THE AMERICAN MATHEMATICAL MONTHLY

VOLUME 123, NO. 7, AUGUST–SEPTEMBER 2016

The Range of a Rotor Walk 627


Laura Florescu, Lionel Levine, and Yuval Peres

A Tale of Three Theorems 643


Lawrence Zalcman

An Electric Network for Nonreversible Markov Chains 657


Márton Balázs and Áron Folly

A Characteristic Averaging Property of the Catenary 683


Vincent Coll and Jeff Dodd

The Six Circles Theorem Revisited 689


Dennis Ivanov and Serge Tabachnikov

NOTES
Cayley's Formula: A Page from The Book 699
Arnon Avron and Nachum Dershowitz

On Tangents and Secants of Infinite Sums 701


Michael Hardy

Factorization of a Matrix Differential Operator Using 704


Functions in Its Kernel
Alex Kasman

Computing ζ(2m) by Using Telescoping Sums 710


Brian D. Sittinger

Generating Iterated Function Systems for the Vicsek 716


Snowflake and the Koch Curve
Yuanyuan Yao and Wenxia Li

PROBLEMS AND SOLUTIONS 722


BOOK REVIEW
An Introduction to Statistical Learning with Applications 731
in R by Gareth James, Daniela Witten, Trevor Hastie,
and Robert Tibshirani
Reviewed by Matthew Richey

An Official Publication of the Mathematical Association of America


Enjoy math contests? Take a look at our latest titles.

Euclidean Geometry in Mathematical Olympiads
Evan Chen

This is a challenging problem-solving book in Euclidean geometry. Each chapter contains carefully chosen worked examples, which explain not only the solutions to the problems but also describe in close detail how one would invent the solution to begin with. The text contains a selection of practice problems of varying difficulty from contests around the world, with extensive hints and selected solutions. The exposition is friendly and relaxed, and accompanied by many beautifully drawn figures.

To order, visit www.maa.org/ebooks/EGMO

A Gentle Introduction to the American Invitational Mathematics Exam
Scott Annin

This book celebrates mathematical problem solving at the level of the American Invitational Mathematics Examination. The book's many fully solved problems contain examples from several decades of AIME competitions. In some cases, multiple solutions are presented to highlight variable approaches. To help problem solvers with the exercises, the author provides two levels of hints to each exercise in the book, one to help get an idea how to begin and another to provide more guidance in navigating an approach to the solution.

To order, visit www.maa.org/ebooks/GIA

EDITOR
Scott T. Chapman
Sam Houston State University

EDITOR-ELECT
Susan Colley
Oberlin College

NOTES EDITOR
Sergei Tabachnikov
Pennsylvania State University

BOOK REVIEW EDITOR
Jeffrey Nunemacher
Ohio Wesleyan University

PROBLEM SECTION EDITORS
Douglas B. West, University of Illinois
Gerald Edgar, Ohio State University
Doug Hensley, Texas A&M University

ASSOCIATE EDITORS
William Adkins, Louisiana State University
David Aldous, University of California, Berkeley
Elizabeth Allman, University of Alaska, Fairbanks
Jonathan M. Borwein, University of Newcastle
Jason Boynton, North Dakota State University
Edward B. Burger, Southwestern University
Minerva Cordero-Epperson, University of Texas, Arlington
Allan Donsig, University of Nebraska, Lincoln
Michael Dorff, Brigham Young University
Daniela Ferrero, Texas State University
Luis David Garcia-Puente, Sam Houston State University
Sidney Graham, Central Michigan University
Tara Holm, Cornell University
Lea Jenkins, Clemson University
Daniel Krashen, University of Georgia
Ulrich Krause, Universität Bremen
C. Dwight Lahr, Dartmouth College
Jeffrey Lawson, Western Carolina University
Susan Loepp, Williams College
Irina Mitrea, Temple University
Bruce P. Palka, National Science Foundation
Vadim Ponomarenko, San Diego State University
Catherine A. Roberts, College of the Holy Cross
Rachel Roberts, Washington University, St. Louis
Ivelisse M. Rubio, Universidad de Puerto Rico, Rio Piedras
Adriana Salerno, Bates College
Edward Scheinerman, Johns Hopkins University
Anne Shepler, University of North Texas
Frank Sottile, Texas A&M University
Susan G. Staples, Texas Christian University
Daniel Ullman, George Washington University
Daniel Velleman, Amherst College
Steven Weintraub, Lehigh University

ASSISTANT MANAGING EDITOR
Bonnie K. Ponce

MANAGING EDITOR
Beverly Joy Ruedi
NOTICE TO AUTHORS

The MONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of MONTHLY readers.

The MONTHLY's readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. MONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged.

Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue.

Submission of articles, notes, and filler pieces is required via the MONTHLY's Editorial Manager System. Initial submissions in pdf or LaTeX form can be sent to the Editor-Elect Susan Colley at www.editorialmanager.com/monthly.

The Editorial Manager System will cue the author for all required information concerning the paper. The MONTHLY has instituted a double-blind refereeing policy. Manuscripts that contain the authors' names will be returned. Questions concerning submission of papers can be addressed to the Editor-Elect at [email protected]. Authors who use LaTeX can find our article/note template at www.maa.org/monthly.html. This template requires the style file maa-monthly.sty, which can also be downloaded from the same webpage. A formatting document for MONTHLY references can be found there too.

Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the MONTHLY more lively, entertaining, and informative can be forwarded to the Editor-Elect at [email protected].

The online MONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-331-1622 for more information.

See the MONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles: www.maa.org/monthly.html

Proposed problems or solutions should be sent to:

DOUG HENSLEY, MONTHLY Problems
Department of Mathematics
Texas A&M University
3368 TAMU
College Station, TX 77843-3368

In lieu of duplicate hardcopy, authors may submit pdfs to [email protected].

Advertising correspondence should be sent to:

MAA Advertising
1529 Eighteenth St. NW
Washington DC 20036
Phone: (202) 319-8461
E-mail: [email protected]

Further advertising information can be found online at www.maa.org.

Change of address, missing issue inquiries, and other subscription correspondence can be sent to:

[email protected]

or

The MAA Customer Service Center
P.O. Box 91112
Washington, DC 20090-1112
(800) 331-1622
(301) 617-7800

Recent copies of the MONTHLY are available for purchase through the MAA Service Center at the address above.

Microfilm Editions are available at: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106.

The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, NW, Washington, DC 20036 and Lancaster, PA, and copyrighted by the Mathematical Association of America (Incorporated), 2015, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright 2016 Mathematical Association of America. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA's Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster: Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, NW, Washington, DC 20036-1385.
The Range of a Rotor Walk
Laura Florescu, Lionel Levine, and Yuval Peres

Abstract. In rotor walk on a graph, the exits from each vertex follow a prescribed periodic sequence. We show that any rotor walk on the d-dimensional lattice Z^d visits at least on the order of t^{d/(d+1)} distinct sites in t steps. This result extends to Eulerian graphs with a volume growth condition. In a uniform rotor walk, the first exit from each vertex is to a neighbor chosen uniformly at random. We prove a shape theorem for the uniform rotor walk on the comb graph, showing that the size of the range is of order t^{2/3} and the asymptotic shape of the range is a diamond. Using a connection to the mirror model, we show that the uniform rotor walk is recurrent on two different directed graphs obtained by orienting the edges of the square grid: the Manhattan lattice and the F-lattice. We end with a short discussion of the time it takes for rotor walk to cover a finite Eulerian graph.

1. INTRODUCTION. Imagine walking your dog on an infinite square grid of city streets. At each intersection, your dog tugs you one block further north, east, south, or west. After you've been dragged in this fashion down t blocks, how many distinct intersections have you seen?

The answer depends of course on your dog's algorithm. If she makes a beeline for the north, then every block brings you to a new intersection, so you see t + 1 distinct intersections. At the opposite extreme, she could pull you back and forth repeatedly along her favorite block so that you only ever see two distinct intersections.
In the clockwise rotor walk, each intersection has a signpost pointing the way when you first arrive there. But your dog likes variety, and she has a capacious memory. If you come back to an intersection you have already visited, your dog chooses the direction 90° clockwise from the direction you went the last time you were there.

We can formalize the city grid as the infinite graph Z^2. The intersections are all the points (x, y) in the plane with integer coordinates, and the city blocks are the line segments from (x, y) to (x ± 1, y) and (x, y ± 1). More generally, we can consider a d-dimensional city Z^d or even an arbitrary graph, but the 90° clockwise rule will have to be replaced by something more abstract (a rotor mechanism, defined below).
In a rotor walk on a graph, the exits from each vertex follow a prescribed periodic sequence. Such walks were first studied in [28] as a model of mobile agents exploring a territory and in [23] as a model of self-organized criticality. Propp proposed rotor walk as a deterministic analogue of random walk, a perspective explored in [7, 10, 16]. This paper is concerned with the following questions. How much territory does a rotor walk cover in a fixed number of steps? Conversely, how many steps does it take for a rotor walk to completely explore a given finite graph?

Let G = (V, E) be a finite or infinite directed graph. For v ∈ V, let E_v ⊆ E be the set of outbound edges from v, and let C_v be the set of all cyclic permutations of E_v. A rotor configuration on G is a choice of an outbound edge ρ(v) ∈ E_v for each v ∈ V. A rotor mechanism on G is a choice of cyclic permutation m(v) ∈ C_v for each v ∈ V. Given ρ and m, the simple rotor walk started at X_0 is a sequence of vertices

http://dx.doi.org/10.4169/amer.math.monthly.123.7.627
MSC: Primary 60C05

August–September 2016] THE RANGE OF A ROTOR WALK 627


X_0, X_1, … ∈ V and rotor configurations ρ = ρ_0, ρ_1, … such that for all integer times t ≥ 0

    ρ_{t+1}(v) = m(v)(ρ_t(v)) if v = X_t,    and    ρ_{t+1}(v) = ρ_t(v) if v ≠ X_t,

and

    X_{t+1} = ρ_{t+1}(X_t)^+,

where e^+ denotes the target of the directed edge e. In words, the rotor at X_t rotates to point to a new neighbor of X_t, and then the walker steps to that neighbor.
We have chosen the retrospective rotor convention (each rotor at an already visited vertex indicates the direction of the most recent exit from that vertex) because it makes a few of our results, such as Lemma 2.2, easier to state.
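As a concrete illustration (ours, not the paper's), the definition can be run directly in code. The sketch below assumes G = Z^2 with the clockwise mechanism and draws each rotor uniformly at random on first visit, so it simulates a clockwise uniform rotor walk under the retrospective convention (rotate first, then step):

```python
import random

# Clockwise cyclic order of the four neighbor directions in Z^2.
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]        # N, E, S, W
NEXT = {d: DIRS[(i + 1) % 4] for i, d in enumerate(DIRS)}

def rotor_walk(t_max, seed=0):
    """Clockwise rotor walk on Z^2 for t_max steps, uniform initial rotors."""
    rng = random.Random(seed)
    rho = {}                                     # rho[v] = direction of most recent exit from v
    x = (0, 0)
    path = [x]
    for _ in range(t_max):
        if x not in rho:                         # first visit: uniform initial rotor
            rho[x] = rng.choice(DIRS)
        rho[x] = NEXT[rho[x]]                    # rotate 90 degrees clockwise ...
        x = (x[0] + rho[x][0], x[1] + rho[x][1]) # ... then step to that neighbor
        path.append(x)
    return path, rho

path, _ = rotor_walk(8)
print(path)
```

Here `rho` plays the role of ρ_t and `NEXT` plays the role of m(v); the range R_t of the next section is simply `set(path[1:])`.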

Figure 1. The range of a clockwise uniform rotor walk on Z^2 after 80 returns to the origin. The mechanism m cycles through the four neighbors in clockwise order (north, east, south, west), and the initial rotors ρ(v) were oriented independently north, east, south, or west, each with probability 1/4. Colors indicate the first 20 excursion sets A_1, …, A_20, defined in §2.

The range of rotor walk at time t is the set

    R_t = {X_1, …, X_t}.

We investigate the size of the range, #R_t, in terms of the growth rate of balls in the underlying graph G. Fix an origin o ∈ V (the starting point of our rotor walk). For r ∈ N, the ball of radius r centered at o, denoted B(o, r), is the set of vertices reachable from o by a directed path of length at most r. Suppose that there are constants d, k > 0 such that

    #B(o, r) ≥ k r^d    (1)

for all r ≥ 1. Intuitively, this condition says that G is at least d-dimensional.

A directed graph is called Eulerian if each vertex has as many incoming as outgoing
edges. In particular, any undirected graph can be made Eulerian by converting each
edge into a pair of oppositely oriented directed edges.

Theorem 1.1. For any Eulerian graph G of bounded degree satisfying (1), the number of distinct sites visited by a rotor walk started at o in t steps satisfies

    #R_t ≥ c t^{d/(d+1)}

for a constant c > 0 depending only on G (and not on ρ or m).

Priezzhev et al. [23] and Povolotsky et al. [22] gave a heuristic argument that #R_t has order t^{2/3} for the clockwise rotor walk on Z^2 with uniform random initial rotors. Theorem 1.1 gives a lower bound of this order, and our proof is directly inspired by their argument.

The upper bound promises to be more difficult because it depends on the initial rotor configuration ρ. Indeed, the next theorem shows that for certain ρ, the number of visited sites #R_t grows linearly in t (which, needless to say, is much faster than t^{2/3}!). A rotor walk is called recurrent if X_t = X_0 for infinitely many t and transient otherwise.

Theorem 1.2. For any Eulerian graph G and any mechanism m, if the initial rotor configuration ρ has an infinite path directed toward o, then a rotor walk started at o is transient and

    #R_t ≥ t/Δ,

where Δ is the maximal degree of a vertex in G.

Theorems 1.1 and 1.2 are proved in §3. But enough about the size of the range; what about its shape?
Each pixel in Figure 1 corresponds to a vertex of Z^2, and R_t is the set of all colored pixels (the different colors correspond to excursions of the rotor walk, defined in §2); the mechanism m is clockwise, and the initial rotors independently point north, east, south, or west with probability 1/4 each. Although the set R_t of Figure 1 looks far from round, Kapri and Dhar have conjectured that for very large t it becomes nearly a circular disk! From now on, by uniform rotor walk we will always mean that the initial rotors {ρ(v)}_{v ∈ V} are independent and uniformly distributed on E_v.

Conjecture 1.3 (Kapri–Dhar [25]). The set of sites R_t visited by the clockwise uniform rotor walk in Z^2 is asymptotically a disk. There exists a constant c such that for any ε > 0,

    P{ D_{(c−ε)t^{1/3}} ⊆ R_t ⊆ D_{(c+ε)t^{1/3}} } → 1

as t → ∞, where D_r = {(x, y) ∈ Z^2 : x^2 + y^2 < r^2}.

We are a long way from proving anything like Conjecture 1.3, but we can show that an analogous shape theorem holds on a much simpler graph, the comb obtained from Z^2 by deleting all horizontal edges except those along the x-axis (Figure 2).





Figure 2. A piece of the comb graph (left) and the set of sites visited by a uniform rotor walk on the comb
graph in 10,000 steps.

Theorem 1.4. For uniform rotor walk on the comb graph, #R_t has order t^{2/3}, and the asymptotic shape of R_t is a diamond.

For the precise statement, see §4. This result contrasts with random walk on the comb, for which the expected number of sites visited is only on the order of t^{1/2} log t, as shown by Pach and Tardos [21]. Thus, the uniform rotor walk explores the comb more efficiently than random walk. (On the other hand, it is conjectured to explore Z^2 less efficiently than random walk!)
The main difficulty in proving upper bounds for #R_t lies in showing that the uniform rotor walk is recurrent. This seems to be a difficult problem in Z^2, but we can show it for two different directed graphs obtained by orienting the edges of Z^2: the Manhattan lattice and the F-lattice, pictured in Figure 3. The F-lattice has two outgoing horizontal edges at every odd node and two outgoing vertical edges at every even node (we call (x, y) odd or even according to whether x + y is odd or even). The Manhattan lattice is full of one-way streets: Rows alternate pointing left and right, while columns alternate pointing up and down.

(a) F-lattice (b) Manhattan lattice

Figure 3. Two different periodic orientations of the square grid with indegree and outdegree 2.

Theorem 1.5. Uniform rotor walk is recurrent on both the F-lattice and the Manhat-
tan lattice.

The proof uses a connection to the mirror model and critical bond percolation on Z^2; see §5.

Theorems 1.1–1.5 bound the rate at which a rotor walk explores various infinite graphs. In §6, we bound the time it takes a rotor walk to completely explore a given finite graph.

Related work. By comparing to a branching process, Angel and Holroyd [3] showed that uniform rotor walk on the infinite b-ary tree is transient for b ≥ 3 and recurrent for b = 2. In the latter case, the corresponding branching process is critical, and the distance traveled by rotor walk before returning n times to the root is doubly exponential in n. They also studied rotor walk on a singly infinite comb with the most transient initial rotor configuration. They showed that if n particles start at the origin, then order n^{1/2} of them escape to infinity (more generally, order n^{1−2^{1−d}} for a d-dimensional analogue of the comb).
In rotor aggregation, each of n particles starting at the origin performs rotor walk until reaching an unoccupied site, which it then occupies. For rotor aggregation in Z^d, the asymptotic shape of the set of occupied sites is a Euclidean ball [20]. For the layered square lattice (Z^2 with an outward bias along the x- and y-axes), the asymptotic shape becomes a diamond [18]. Huss and Sava [17] studied rotor aggregation on the two-dimensional comb with the most recurrent initial rotor configuration. They showed that at certain times the boundary of the set of occupied sites is composed of four segments of exact parabolas. It is interesting to compare their result with Theorem 1.4: The asymptotic shape, and even the scaling, is different.

2. EXCURSIONS. Let G = (V, E) be a connected Eulerian graph. In this section, G can be either finite or infinite, and the rotor mechanism m can be arbitrary. The main idea of the proof of Theorem 1.1 is to decompose rotor walk on G into a sequence of excursions. This idea was also used in [4] to construct recurrent rotor configurations on Z^d for all d and in [5, 6, 30] to bound the cover time of rotor walk on a finite graph (about which we say more in §6). For a vertex o ∈ V, we write deg(o) for the number of outgoing edges from o, which equals the number of incoming edges since G is Eulerian.

Definition. An excursion from o is a rotor walk started at o and run until it returns to
o exactly deg(o) times.

More formally, let (X_t)_{t≥0} be a rotor walk started at X_0 = o. For t ≥ 0, let

    u_t(x) = #{1 ≤ s ≤ t : X_s = x}.

For n ≥ 0, let

    T(n) = min{t ≥ 0 : u_t(o) ≥ n deg(o)}

be the time taken for the rotor walk to complete n excursions from o (with the convention that the min of the empty set is ∞). For all n ≥ 1 such that T(n − 1) < ∞, define

    e_n := u_{T(n)} − u_{T(n−1)}



so that e_n(x) counts the number of visits to x during the nth excursion. To make sense of this expression when T(n) = ∞, we write u_∞(x) ∈ N ∪ {∞} for the increasing limit of the sequence u_t(x).

Our first lemma says that each x ∈ V is visited at most deg(x) times per excursion. The assumption that G is Eulerian is crucial here.

Lemma 2.1. [4, Lemma 8]; [6, §4.2] For any initial rotor configuration ρ,

    e_1(x) ≤ deg(x) for all x ∈ V.

Proof. If the rotor walk never traverses the same directed edge twice, then u_t(x) ≤ deg(x) for all t and x, so we are done. Otherwise, consider the smallest t such that (X_s, X_{s+1}) = (X_t, X_{t+1}) for some s < t. By definition, rotor walk reuses an outgoing edge from X_t only after it has used all of the outgoing edges from X_t. Therefore, at time t the vertex X_t has been visited deg(X_t) + 1 times, but by the minimality of t, each incoming edge to X_t has been traversed at most once. Since G is Eulerian, it follows that X_t = X_0 = o and t = T(1).

Therefore, every directed edge is used at most once during the first excursion, so each x ∈ V is visited at most deg(x) times during the first excursion.
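The excursion decomposition is easy to experiment with. The sketch below (ours, not the paper's) runs a first excursion on a small undirected graph, encoded as an Eulerian directed graph by listing each edge in both directions; the ordering of each out-edge list serves as the cyclic mechanism m(v). It confirms the bound e_1(x) ≤ deg(x), here with equality, since in the chosen initial configuration every rotor at x ≠ o points directly at o (the situation of Lemma 2.2 below):

```python
# A small undirected graph on {0, 1, 2, 3}, made Eulerian by listing each
# edge in both directions; the order of out_edges[v] is the mechanism m(v).
out_edges = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}

def first_excursion(o, rho):
    """Visit counts e_1 for a rotor walk from o run until deg(o) returns to o.

    rho[v] is the index (into out_edges[v]) of the current rotor at v."""
    idx = dict(rho)
    e1 = {v: 0 for v in out_edges}
    x, returns = o, 0
    while returns < len(out_edges[o]):
        idx[x] = (idx[x] + 1) % len(out_edges[x])  # rotate ...
        x = out_edges[x][idx[x]]                   # ... then step
        e1[x] += 1
        if x == o:
            returns += 1
    return e1

e1 = first_excursion(0, {v: 0 for v in out_edges})
print(e1)  # e_1(x) = deg(x) for every vertex: {0: 3, 1: 2, 2: 3, 3: 2}
```

Since each rotor at 1, 2, 3 initially points at the origin, every directed edge is traversed exactly once during this excursion, matching the proof above.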

Lemma 2.2. If T(1) < ∞ and there is a directed path of initial rotors from x to o, then

    e_1(x) = deg(x).

Proof. Let y be the first vertex after x on the path of initial rotors from x to o. By induction on the length of this path, y is visited exactly deg(y) times in an excursion from o. Each incoming edge to y is traversed at most once by Lemma 2.1, so in fact each incoming edge to y is traversed exactly once. In particular, the edge (x, y) is traversed. Since ρ(x) = (x, y), the edge (x, y) is the last one traversed out of x, so x must be visited at least deg(x) times.

If G is finite, then T(n) < ∞ for all n since, by Lemma 2.1, the number of visits to a vertex during each excursion is at most the degree of that vertex. If G is infinite, then depending on the rotor mechanism m and initial rotor configuration ρ, rotor walk may or may not complete an excursion from o. In particular, Lemma 2.2 implies the following.

Corollary 2.3. If ρ has an infinite path directed toward o, then T(1) = ∞.

Now let

    A_n = {x ∈ V : e_n(x) > 0}

be the set of sites visited during the nth excursion. We also set e_0 = δ_o (where, as usual, δ_o(x) = 1 if x = o and 0 otherwise) and A_0 = {o}. For a subset A ⊆ V, define its outer boundary ∂A as the set

    ∂A := {y ∉ A : (x, y) ∈ E for some x ∈ A}.

Lemma 2.4. For each n ≥ 0, if T(n + 1) < ∞, then

(i) e_{n+1}(x) ≤ deg(x) for all x ∈ V,
(ii) e_{n+1}(x) = deg(x) for all x ∈ A_n,
(iii) A_{n+1} ⊇ A_n ∪ ∂A_n.

Proof. Part (i) is immediate from Lemma 2.1.

Part (ii) follows from Lemma 2.2 and the observation that in the rotor configuration ρ_{T(n)}, the rotor at each x ∈ A_n points along the edge traversed most recently from x, so for each x ∈ A_n there is a directed path of rotors in ρ_{T(n)} leading to X_{T(n)} = o.

Part (iii) follows from (ii): The (n + 1)st excursion traverses each outgoing edge from each x ∈ A_n, so in particular, it visits each vertex in A_n ∪ ∂A_n.

Note that the balls B(o, n) can be defined inductively by B(o, 0) = {o} and

    B(o, n + 1) = B(o, n) ∪ ∂B(o, n)

for each n ≥ 0. Inducting on n using Lemma 2.4(iii), we obtain the following.

Corollary 2.5. For each n ≥ 1, if T(n) < ∞, then B(o, n) ⊆ A_n.

Rotor walk is called recurrent if T(n) < ∞ for all n. Consider the rotor configuration ρ_{T(n)} at the end of the nth excursion. By Lemma 2.4, each vertex x ∈ A_n is visited exactly deg(x) times during the Nth excursion for each N ≥ n + 1, so we obtain the following.

Corollary 2.6. For a recurrent rotor walk, ρ_{T(N)}(x) = ρ_{T(n)}(x) for all x ∈ A_n and all N ≥ n.

The following proposition is a kind of converse to Lemma 2.4 in the case of undirected graphs.

Proposition 2.7. [5, Lemma 3]; [4, Prop. 11] Let G = (V, E) be an undirected graph. For a sequence S_1, S_2, … ⊆ V of sets inducing connected subgraphs such that S_{n+1} ⊆ S_n ∪ ∂S_n for all n ≥ 1, and any vertex o ∈ S_1, there exists a rotor mechanism m and initial rotors ρ such that the nth excursion for rotor walk started at o traverses each edge incident to S_n exactly once in each direction and no other edges.

3. LOWER BOUND ON THE RANGE. In this section, G = (V, E) is an infinite connected Eulerian graph. Fix an origin o ∈ V and let v(n) be the number of directed edges incident to the ball B(o, n). Let W(m) = Σ_{n=0}^{m−1} v(n), and write W^{-1}(t) = min{m ∈ N : W(m) > t}.
Fix a rotor mechanism m and an initial rotor configuration ρ on G. For x ∈ V, let u_t(x) be the number of times x is visited by a rotor walk started at o and run for t steps. In the proof of the next theorem, our strategy for lower bounding the size of the range

    R_t = {x ∈ V : u_t(x) > 0}

will be to (i) upper bound the number of excursions completed by time t, in order to (ii) upper bound the number of times each vertex is visited, so that (iii) many distinct vertices must be visited.



Theorem 3.1. For any rotor mechanism m, any initial rotor configuration ρ on G, and any time t ≥ 0, the following bounds hold.

(i) u_t(o)/deg(o) < W^{-1}(t).
(ii) u_t(x)/deg(x) ≤ u_t(o)/deg(o) + 1 for all x ∈ V.
(iii) Let Δ_t = max_{x ∈ B(o,t)} deg(x). Then

    #R_t ≥ t / (Δ_t (W^{-1}(t) + 1)).    (2)

Before proving this theorem, let us see how it implies Theorem 1.1. The volume growth condition (1) implies v(r) ≥ k r^d, so W(r) ≥ k′ r^{d+1} for a constant k′, so W^{-1}(t) ≤ (t/k′)^{1/(d+1)}. Now if G has bounded degree, then the right side of (2) is at least c t^{d/(d+1)} for a constant c (which depends only on k and the maximal degree).

Proof of Theorem 3.1. We first argue that the total length T(m) of the first m excursions is at least W(m). By Corollary 2.5, the nth excursion visits every site in the ball B(o, n). Therefore, by Lemma 2.4(ii), the (n + 1)st excursion visits every site x ∈ B(o, n) exactly deg(x) times, so the (n + 1)st excursion traverses each directed edge incident to B(o, n). The length T(n + 1) − T(n) of the (n + 1)st excursion is therefore at least v(n). Summing over n < m yields the desired inequality T(m) ≥ W(m).

Now let m = W^{-1}(t). Since t < W(m), the rotor walk has not yet completed its mth excursion at time t, so u_t(o) < m deg(o), which proves (i).

Part (ii) now follows from Lemma 2.1 since e_1(x) = u_{T(1)}(x) ≤ deg(x). During each completed excursion, the origin o is visited deg(o) times while x is visited at most deg(x) times. The +1 accounts for the possibility that time t falls in the middle of an excursion.

Part (iii) follows from the fact that t = Σ_{x ∈ B(o,t)} u_t(x). By parts (i) and (ii), each term in the sum is at most Δ_t (W^{-1}(t) + 1), so there are at least t/(Δ_t (W^{-1}(t) + 1)) nonzero terms.
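The lower bound can be probed numerically. The simulation below (our sketch, not code from the paper) measures #R_t for the clockwise uniform rotor walk on Z^2 and prints the ratio #R_t / t^{2/3}, the quantity that Theorem 1.1 bounds away from zero when d = 2:

```python
import random

DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]        # N, E, S, W in clockwise order

def range_size(t, seed=1):
    """#R_t for a clockwise uniform rotor walk on Z^2 started at the origin."""
    rng = random.Random(seed)
    rho, x, visited = {}, (0, 0), set()
    for _ in range(t):
        i = rho.setdefault(x, rng.randrange(4))  # uniform initial rotor on first visit
        i = (i + 1) % 4                          # rotate clockwise ...
        rho[x] = i
        x = (x[0] + DIRS[i][0], x[1] + DIRS[i][1])  # ... then step
        visited.add(x)
    return len(visited)

for t in (10**3, 10**4, 10**5):
    r = range_size(t)
    print(t, r, round(r / t ** (2 / 3), 3))      # ratio stays bounded away from 0
```

By Theorem 3.1(iii), this ratio cannot degenerate to 0 as t grows, while the heuristic of [22, 23] suggests it remains of constant order.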

Pausing to reflect on the proof, we see that an essential step was the inclusion B(o, n) ⊆ A_n of Corollary 2.5. Can this inclusion ever be an equality? Yes! By Proposition 2.7, if G is undirected, then there exists a rotor walk (that is, a particular m and ρ) for which

    A_n = B(o, n) for all n ≥ 1.

If G = Z^d (or any undirected graph satisfying (1) along with its upper bound counterpart, #B(o, n) ≤ K n^d for a constant K), then the range of this particular rotor walk satisfies R_{W(n)} = B(o, n) and hence

    #R_t ≤ #B(o, W^{-1}(t)) ≤ C t^{d/(d+1)}

for a constant C. So in this case, the exponent in Theorem 1.1 is best possible. We derived this upper bound just for a particular rotor walk by choosing a rotor mechanism m and initial rotors ρ. For example, when G = Z^2 the rotor mechanism is clockwise and the initial rotors are shown in Figure 4. Next, we are going to see that by varying ρ we can make #R_t a lot larger.

Figure 4. Minimal range rotor configuration for Z^2. The excursion sets are diamonds.

Part (i) of the next theorem gives a sufficient condition for rotor walk to be tran-
sient. Parts (i) and (ii) together prove Theorem 1.2. Part (iii) shows that on a graph
of bounded degree, the number of visited sites #Rt of a transient rotor walk grows
linearly in t.

Theorem 3.2. On any Eulerian graph, the following hold.

(i) If ρ has an infinite path of initial rotors directed toward the origin o, then u_t(o) < deg(o) for all t ≥ 1.
(ii) If u_t(o) < deg(o), then #R_t ≥ t/Δ_t, where Δ_t = max_{x ∈ B(o,t)} deg(x).
(iii) If rotor walk is transient, then there is a constant C = C(m, ρ) such that

    #R_t ≥ (t − C)/Δ_t

for all t ≥ 1.

Proof. (i) By Corollary 2.3, if ρ has an infinite path directed toward o, then rotor walk never completes its first excursion from o.

(ii) If rotor walk does not complete its first excursion, then it visits each vertex x at most deg(x) times by Lemma 2.1, so it must visit at least t/Δ_t distinct vertices.

(iii) If rotor walk is transient, then for some n it does not complete its nth excursion, so this follows from part (ii), taking C to be the total length of the first n − 1 excursions.

4. UNIFORM ROTOR WALK ON THE COMB. The two-dimensional comb is the subgraph of the square lattice Z^2 obtained by removing all of its horizontal edges except for those on the x-axis (Figure 2). Vertices on the x-axis have degree 4, and all other vertices have degree 2.

Recall that the uniform rotor walk starts with independent random initial rotors ρ(v) with the uniform distribution on outgoing edges from v. The following result shows that the range of the uniform rotor walk on the comb is close to the diamond

    D_n := {(x, y) ∈ Z^2 : |x| + |y| < n}.


Theorem 4.1. Consider uniform rotor walk on the comb with any rotor mechanism. Let n ≥ 2 and t = ⌈16n^3/3⌉. For any a > 0 there exist constants c, C > 0 such that

    P( D_{n − c√(n log n)} ⊆ R_t ⊆ D_{n + c√(n log n)} ) > 1 − C n^{−a}.

Since the bounding diamonds have area 2n^2 (1 + o(1)) while t has order n^3, it follows that the size of the range is of order t^{2/3}. More precisely, by the first Borel–Cantelli lemma,

    #R_t / t^{2/3} → (1/2) (3/2)^{2/3}

as t → ∞, almost surely. See [11] for more details.


Figure 5. An initial rotor configuration on Z (top) and the corresponding rotor walk.

The proof of Theorem 4.1 is based on the observation that rotor walk on the comb, viewed at the times when it is on the x-axis, is a rotor walk on Z. If 0 < x_1 < x_2 < ⋯ are the positions of rotors on the positive x-axis that will send the walker left before right, and 0 > x_{−1} > x_{−2} > ⋯ are the positions on the negative x-axis that will send the walker right before left, then the x-coordinate of the rotor walk on the comb follows a zigzag path: right from 0 to x_1, then left to x_{−1}, right to x_2, left to x_{−2}, and so on (Figure 5).
Likewise, a rotor walk on the comb, viewed at the times when it is on a fixed vertical line x = k, is also a rotor walk on Z. Let 0 < y_{k,1} < y_{k,2} < ⋯ be the heights of the rotors on the line x = k above the x-axis that initially send the walker down, and let 0 > y_{k,−1} > y_{k,−2} > ⋯ be the heights of the rotors on the line x = k below the x-axis that initially send the walker up.
We only sketch the remainder of the proof; the full details are in [11]. For uniform initial rotors, the quantities x_i and y_{k,i} are sums of independent geometric random variables of mean 2. We have E x_i = 2|i| and E y_{k,j} = 2|j|. Standard concentration inequalities ensure that these quantities are close to their expectations, so that a rotor walk on the comb run for n/2 excursions visits each site (x, 0) ∈ D_n about (n − |x|)/2 times and hence visits each site (x, y) ∈ D_n about (n − |x| − |y|)/2 times. Summing

636 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
over (x, y) ∈ D_n shows that the total time to complete these n/2 excursions is about (16/3)n³. With high probability, every site in the smaller diamond D_{n−c√(n log n)} is visited at least once during these n/2 excursions, whereas no site outside the larger diamond D_{n+c√(n log n)} is visited.
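Theorem 4.1 is easy to probe empirically. The simulation sketch below is ours, not from [11]: the function name and the particular cyclic order N, E, S, W of the on-axis rotors are our choices (the theorem allows any rotor mechanism). It runs a uniform rotor walk on the comb for t = ⌊16n³/3⌋ steps and reports the size of its range, which should be of constant order relative to t^{2/3}.

```python
import random

def rotor_walk_comb(n, seed=0):
    """Uniform rotor walk on the comb for t = floor(16*n^3/3) steps.

    The comb keeps all vertical edges of Z^2 but only the horizontal
    edges on the x-axis, so on-axis vertices have 4 outgoing edges and
    all others have 2.  Each rotor starts at a uniformly random outgoing
    edge (revealed on first visit) and advances cyclically before every
    move.  Returns (t, set of visited sites)."""
    rng = random.Random(seed)
    t = 16 * n ** 3 // 3
    N, E, S, W = (0, 1), (1, 0), (0, -1), (-1, 0)
    rotor = {}              # vertex -> index into its list of directions
    pos = (0, 0)
    visited = {pos}
    for _ in range(t):
        x, y = pos
        dirs = [N, E, S, W] if y == 0 else [N, S]
        i = rotor.get(pos, rng.randrange(len(dirs)))
        i = (i + 1) % len(dirs)          # advance the rotor ...
        rotor[pos] = i
        dx, dy = dirs[i]
        pos = (x + dx, y + dy)           # ... then follow it
        visited.add(pos)
    return t, visited

t, R = rotor_walk_comb(12)
print(len(R) / t ** (2 / 3))   # stays bounded as n grows, by Theorem 4.1
```

For n = 12 this takes t = 9216 steps and visits on the order of 2n² ≈ 288 sites, consistent with a range comparable to the diamond D_n.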

5. DIRECTED LATTICES AND THE MIRROR MODEL. Figure 3 shows two


different orientations of the square grid Z²: The F-lattice has outgoing vertical arrows (N and S) at even sites and outgoing horizontal arrows (E and W) at odd sites. The Manhattan lattice has every even row pointing E, every odd row pointing W, every even column pointing S, and every odd column pointing N. In these two lattices, every vertex has outdegree 2, so there is a unique rotor mechanism on each lattice (namely, exits from a given vertex alternate between the two outgoing edges), and a rotor walk is completely specified by its starting point and the initial rotor configuration ρ.
In this section, we relate the uniform rotor walk on these lattices to percolation and the Lorentz mirror model [14, §13.3]. Consider the half dual lattice L, a square grid whose vertices are the points (x + 1/2, y + 1/2) for x, y ∈ Z with x + y even and the usual lattice edges: (x + 1/2, y + 1/2) ~ (x + 1/2, y − 1/2), (x + 1/2, y + 1/2) ~ (x − 1/2, y + 1/2), (x + 1/2, y + 1/2) ~ (x + 3/2, y + 1/2), and (x + 1/2, y + 1/2) ~ (x + 1/2, y + 3/2). We consider critical bond percolation on L: each possible lattice edge of L is either open or closed independently with probability 1/2.
Note that each vertex v of Z² lies on a unique edge e_v of L. We consider two different rules for placing two-sided mirrors at the vertices of Z².
F-lattice: Each vertex v has a mirror, which is oriented parallel to e_v if e_v is closed and perpendicular to e_v if e_v is open.
Manhattan lattice: If e_v is closed, then v has a mirror oriented parallel to e_v; otherwise, v has no mirror.

(a) F-Lattice (b) Manhattan lattice


Figure 6. Percolation on L: dotted blue edges are open, solid blue edges are closed. Shown in green are the
corresponding mirrors on the F-lattice (left) and Manhattan lattice.

Consider now the first glance mirror walk: Starting at the origin o, it travels along a uniform random outgoing edge ρ(o). On its first visit to each vertex v ∈ Z² \ {o}, the walker behaves like a light ray. If there is a mirror at v, then the walker reflects by a right angle, and if there is no mirror, then the walker continues straight. At this point, v is assigned the rotor ρ(v) = (v, w), where w is the vertex of Z² visited immediately after v. On all subsequent visits to v, the walker follows the usual rules of rotor walk.



Figure 7. Mirror walk on the Manhattan lattice.

Lemma 5.1. With the mirror assignments described above, uniform rotor walk on the
Manhattan lattice or the F-lattice has the same law as the first glance mirror walk.

Proof. The mirror placements are such that the first glance mirror walk must follow a directed edge of the corresponding lattice. The rotor ρ(v) assigned by the first glance mirror walk when it first visits v is uniform on the outgoing edges from v; this remains true even if we condition on the past because all previously assigned rotors are independent of the status of the edge e_v (open or closed), and changing the status of e_v changes ρ(v).

Write ω_e = 1{e is open}. Given the random variables ω_e ∈ {0, 1} indexed by the edges of L, we have described how to set up mirrors and run a rotor walk, using the mirrors to reveal the initial rotors as needed. The next lemma holds pointwise in ω.

Lemma 5.2. If there is a cycle of closed edges in L surrounding o, then rotor walk
started at o returns to o at least twice before visiting any vertex outside the cycle.

Proof. Denote by C the set of vertices v such that e_v lies on the cycle and by A the set of vertices enclosed by the cycle. Let w be the first vertex not in A ∪ C visited by the rotor walk. Since the cycle surrounds o, the walker must arrive at w along an edge (v, w) where v ∈ C. Since e_v is closed, the walker reflects off the mirror e_v the first time it visits v, so only on the second visit to v does it use the outgoing edge (v, w). Moreover, the two incoming edges to v are on opposite sides of the mirror. Therefore, by minimality of w, the walker must use the same incoming edge (u, v) twice before visiting w. The first edge to be used twice is incident to the origin by Lemma 2.1, so the walk must return to the origin twice before visiting w.

Now we use a well-known theorem about critical bond percolation: There are infinitely many disjoint cycles of closed edges surrounding the origin. Together with Lemma 5.2, this completes the proof that the uniform rotor walk is recurrent on both the Manhattan lattice and the F-lattice.
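Recurrence is also easy to observe in simulation. The sketch below is ours, not from the paper: it runs the uniform rotor walk on the Manhattan lattice, whose unique rotor mechanism simply alternates each vertex's two outgoing edges, and counts returns to the origin.

```python
import random

def manhattan_rotor_walk(t, seed=0):
    """Uniform rotor walk on the Manhattan lattice for t steps.

    Even rows point E and odd rows W; even columns point S and odd
    columns N, so every vertex has exactly two outgoing edges and the
    unique rotor mechanism alternates between them.  The initial rotor
    at each vertex is uniformly random (revealed on first visit).
    Returns the set of visited sites and the number of returns to o."""
    rng = random.Random(seed)
    rotor = {}
    pos = (0, 0)
    visited = {pos}
    returns = 0
    for _ in range(t):
        x, y = pos
        horiz = (1, 0) if y % 2 == 0 else (-1, 0)   # E on even rows, W on odd
        vert = (0, -1) if x % 2 == 0 else (0, 1)    # S on even columns, N on odd
        dirs = [horiz, vert]
        i = 1 - rotor.get(pos, rng.randrange(2))    # flip the rotor, then move
        rotor[pos] = i
        pos = (x + dirs[i][0], y + dirs[i][1])
        visited.add(pos)
        if pos == (0, 0):
            returns += 1
    return visited, returns

R, returns = manhattan_rotor_walk(100_000)
print(len(R), returns)   # range of order t^(2/3); returns grow like log t
```

In line with the experiments reported below, the number of distinct sites visited stays far below t, while the origin is revisited a handful of times, consistent with the logarithmic bound of Theorem 5.4.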

To make a quantitative statement, consider the probability of finding a closed cycle within a given annulus. The following result is a consequence of the Russo–Seymour–Welsh estimate and the FKG inequality; see [14, §11.72].

Theorem 5.3. Let S_ℓ = [−ℓ, ℓ] × [−ℓ, ℓ]. Then for all ℓ ≥ 1,

P(there exists a cycle of closed edges surrounding the origin in S_{3ℓ} \ S_ℓ) > p

for a constant p that does not depend on ℓ.

Let u_t(o) be the number of visits to o by the first t steps of uniform rotor walk on the Manhattan or F-lattice.

Theorem 5.4. For any a > 0 there exists c > 0 such that

P(u_t(o) < c log t) < t^{−a}.

Proof. By Lemma 5.2, the event {u_t(o) < k} is contained in the event that at most k/2 of the annuli S_{3^j} \ S_{3^{j−1}} for j = 1, . . . , ⌊(1/10) log t⌋ contain a cycle of closed edges surrounding the origin. Taking k = c log t for sufficiently small c, this event has probability at most t^{−a} by Theorem 5.3.

Figure 8. Set of sites visited by uniform rotor walk after 250,000 steps on the F-lattice (left) and the Manhattan lattice (right). Green represents at least two visits to the vertex and red one visit.

Although we used the same technique to show that the uniform rotor walk on these two lattices is recurrent, experiments suggest that the behavior of the two walks is rather different. The number of distinct sites visited in t steps appears to be of order t^{2/3} on the Manhattan lattice but of order t on the F-lattice. This difference is clearly visible in Figure 8.

6. TIME FOR ROTOR WALK TO COVER A FINITE EULERIAN GRAPH.


Let (X_t)_{t≥0} be a rotor walk on a finite connected Eulerian directed graph G = (V, E) with diameter D. The vertex cover time is defined by

\[ t_{\mathrm{vertex}} = \min\{t : \{X_s\}_{s=1}^{t} = V\}. \]

The edge cover time is defined by

\[ t_{\mathrm{edge}} = \min\{t : \{(X_{s-1}, X_s)\}_{s=1}^{t} = E\} \]



where E is the set of directed edges. Yanovski, Wagner, and Bruckstein [30] show that t_edge ≤ 2D·#E for any Eulerian directed graph. Our next result improves this bound slightly, replacing 2D by D + 1.

Theorem 6.1. For rotor walk on a finite Eulerian graph G of diameter D, with any rotor mechanism m and any initial rotor configuration ρ,

t_vertex ≤ D·#E

and

t_edge ≤ (D + 1)·#E.

Proof. Consider the time T(n) for rotor walk to complete n excursions from o. If G has diameter D, then A_D = V by Corollary 2.5 and e_{D+1} = deg by Lemma 2.4(ii). It follows that t_vertex ≤ T(D) and t_edge ≤ T(D + 1). By Lemma 2.1, each directed edge is used at most once per excursion, so T(n) ≤ n·#E for all n ≥ 0.
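The bounds of Theorem 6.1 are easy to check on a small example. The sketch below is ours, not from the paper: the helper cover_times and the choice of test graph, a bidirected 5-cycle (each undirected edge doubled, so #E = 10 directed edges and diameter D = 2), are illustrative assumptions.

```python
from itertools import count

def cover_times(outedges, rotor0, start=0):
    """Rotor walk on a finite Eulerian digraph; returns (t_vertex, t_edge),
    the first times {X_1..X_t} = V and {(X_{t-1}, X_t)} = E, respectively.

    outedges[v] lists v's outgoing neighbors in cyclic rotor order, and
    rotor0[v] is the index of the initial rotor at v."""
    n_edges = sum(len(nbrs) for nbrs in outedges.values())
    rotor = dict(rotor0)
    pos = start
    seen_v, seen_e = set(), set()
    t_vertex = None
    for t in count(1):
        rotor[pos] = (rotor[pos] + 1) % len(outedges[pos])  # advance rotor
        nxt = outedges[pos][rotor[pos]]                     # ... and follow it
        seen_e.add((pos, nxt))
        pos = nxt
        seen_v.add(pos)
        if t_vertex is None and len(seen_v) == len(outedges):
            t_vertex = t
        if len(seen_e) == n_edges:
            return t_vertex, t

# Bidirected 5-cycle: #E = 10 directed edges, diameter D = 2.
outedges = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}
tv, te = cover_times(outedges, {v: 0 for v in range(5)})
print(tv, te)   # prints: 5 10
```

Here t_vertex = 5 ≤ D·#E = 20 and t_edge = 10 ≤ (D + 1)·#E = 30, comfortably within the bounds: the walk circles once in each direction, covering all 10 directed edges.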

Bampas et al. [5] prove a corresponding lower bound: On any finite undirected graph, there exist a rotor mechanism m and initial rotor configuration ρ such that t_vertex ≥ (1/4)·D·#E.

Hitting times for random walk. The upper bounds for t_vertex and t_edge in Theorem 6.1 match (up to a constant factor) those found by Friedrich and Sauerwald [13] on an impressive variety of graphs: regular trees, stars, tori, hypercubes, complete graphs, lollipops, and expanders. Intriguingly, the method of [13] is different. Using a theorem of Holroyd and Propp [16] relating rotor walk to the expected time H(u, v) for random walk started at u to hit v, they infer that t_vertex ≤ K + 1 and t_edge ≤ 3K, where

\[ K := \max_{u,v \in V} \left( H(u, v) + \#E + \frac{1}{2} \sum_{(i,j) \in E} |H(i, v) - H(j, v) - 1| \right). \]

A curious consequence of the upper bound t_vertex ≤ K + 1 of [13] and the lower bound max_{m,ρ} t_vertex(m, ρ) ≥ (1/4)·D·#E of [5] is the following inequality.

Corollary 6.2. For any undirected graph G of diameter D, we have

K ≥ (1/4)·D·#E − 1.
4
Is K always within a constant factor of D·#E? It turns out the answer is no. To construct a counterexample, we will build a graph G = G_{ℓ,N} of small diameter that has so few long-range edges that random walk effectively does not feel them (Figure 9). Let ℓ, N ≥ 2 be integers, and set V = {1, . . . , ℓ} × {1, . . . , N} with edges (x, y) ~ (x′, y′) if either x′ ≡ x ± 1 (mod ℓ) or y′ = y. The diameter of G is 2: Any two vertices (x, y) and (x′, y′) are linked by the path (x, y) ~ (x + 1, y′) ~ (x′, y′). Each vertex (x, y) has 2N short-range edges to (x ± 1, y′) and ℓ − 3 long-range edges to (x′, y). It turns out that if ℓ is sufficiently large and N is much larger still (N = ℓ⁵), then K > (1/10)·ℓ·#E, showing that K can exceed D·#E by an arbitrarily large factor. The details can be found in [11].
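The stated diameter and degree counts can be verified by brute force for small parameters. The sketch below is ours (the function names are our choices): it builds G_{ℓ,N} with ℓ = 4 and N = 2, as in Figure 9, and checks the diameter by breadth-first search.

```python
from collections import deque
from itertools import product

def thick_cycle(l, N):
    """Adjacency lists of the thick cycle G_{l,N}: vertices {0..l-1} x {0..N-1};
    (x,y) ~ (x',y') iff x' = x +/- 1 (mod l) (short-range), or y' = y with
    x' outside {x-1, x, x+1} (long-range)."""
    V = list(product(range(l), range(N)))
    adj = {v: [] for v in V}
    for (x, y), (x2, y2) in product(V, V):
        if (x2 - x) % l in (1, l - 1):          # short-range edge
            adj[(x, y)].append((x2, y2))
        elif y2 == y and x2 != x:               # long-range edge
            adj[(x, y)].append((x2, y2))
    return adj

def diameter(adj):
    """Exact diameter via BFS from every vertex."""
    best = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        best = max(best, max(dist.values()))
    return best

adj = thick_cycle(4, 2)
print(diameter(adj))   # prints: 2
```

Each vertex indeed has 2N + ℓ − 3 = 5 neighbors here, and every pair of vertices is joined by a path of length at most 2, as claimed.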

Figure 9. The thick cycle G_{ℓ,N} with ℓ = 4 and N = 2. Long-range edges are dotted, and short-range edges are solid.

We conclude with a curious observation and a question. Corollary 6.2 is a fact


purely about random walk on a graph. Can it be proved without resorting to rotor
walk?

7. ACKNOWLEDGEMENTS. This work was initiated while the first two authors
were visiting Microsoft Research in Redmond, WA. We thank Sam Watson for help
with some of the simulations, Tobias Friedrich for bringing to our attention references
[5] and [30], and the referees for their careful reading. This research has been partially
supported by NSF grant DMS-1243606 and a Sloan Fellowship.

REFERENCES

1. N. Alon, J. Spencer, The Probabilistic Method. Third ed. John Wiley & Sons, Hoboken, NJ 2008.
2. S. Angelopoulos, B. Doerr, A. Huber, K. Panagiotou, Tight bounds for quasirandom rumor spreading,
Electron. J. Comb. 16 (2009).
3. O. Angel, A. E. Holroyd, Rotor walks on general trees, SIAM J. Discrete Math. 25 (2011) 423446,
arxiv.org/1009.4802.
4. O. Angel, A. E. Holroyd, Recurrent rotor–router configurations, J. Comb. 3 no. 2 (2012) 185–194,
arxiv.org/1101.2484.
5. E. Bampas, L. Gasieniec, N. Hanusse, D. Ilcinkas, R. Klasing, A. Kosowski, Euler tour lock-in problem
in the rotor-router model, in Distributed Computing. Springer, Berlin, Heidelberg, 2009. 423–435.
6. S. N. Bhatt, S. Even, D. S. Greenberg, R. Tayar, Traversing directed Eulerian mazes, J. Graph. Algorithms
Appl. 6 no. 2 (2002) 157173.
7. J. N. Cooper, J. Spencer, Simulating a random walk with constant error, Combin., Probab. Comput. 15
no. 6 (2006) 815822, arxiv.org/0402323.
8. P. Diaconis, L. Smith, Honest Bernoulli excursions, J. Appl. Prob. 25 (1988) 464477.
9. B. Doerr, T. Friedrich, T. Sauerwald, Quasirandom rumor spreading, in Proc. 19th SODA. SIAM, San
Francisco, CA 2008.
10. L. Florescu, S. Ganguly, L. Levine, Y. Peres, Escape rates for rotor walks in Zd , SIAM J. Discrete Math.
28 no. 1 (2014) 323334, arxiv.org/1301.3521.
11. L. Florescu, L. Levine, Y. Peres, The Range of a Rotor Walk, arxiv.org/1408.5533.
12. T. Friedrich, L. Levine, Fast simulation of large-scale growth models, ArXiv e-prints (2010).
13. T. Friedrich, T. Sauerwald, The cover time of deterministic random walks, Electron. J. Combin. 17 (2010).
14. G. Grimmett, Percolation. Second ed. Springer, Berlin, 1999.
15. A. E. Holroyd, L. Levine, K. Mészáros, Y. Peres, J. Propp, D. B. Wilson, Chip-firing and rotor-routing
on directed graphs, in In and Out of Equilibrium 2. Vol. 60, Progr. Probab., Birkhäuser, Basel, 2008,
dx.doi.org/10.1007/978-3-7643-8786-0 17.
16. A. E. Holroyd, J. G. Propp, Rotor walks and Markov chains, contemporary mathematics, Algorithmic
Probab. Combin. 520 (2010) 105126, arxiv.org/0904.4507.
17. W. Huss, E. Sava, Rotor-router aggregation on the comb, Elect. J. Comb. 18 (2011) 224,
arxiv.org/1103.4797.
18. W. Kager, L. Levine, Rotor-router aggregation on the layered square lattice Electron. J. Combin. 17 no. 1
(2010) P 152, 16 pp. arxiv.org/abs/1003.4017.
19. G. F. Lawler, Intersections of Random Walks. Birkhäuser, Boston, 1996.



20. L. Levine, Y. Peres, Strong spherical asymptotics for rotorrouter aggregation and the divisible sandpile,
Potential Anal. 30 (2009) 127, arxiv.org/0704.0688.
21. J. Pach, G. Tardos, The range of a random walk on a comb, Electron. J. Combin. 20 no. 3 (2013).
22. A. M. Povolotsky, V. B. Priezzhev, R. R. Shcherbakov, Dynamics of Eulerian walkers, Phys. Rev. E 58
no. 5 (1998) 5449, arxiv.org/cond-mat/9802070.
23. V. B. Priezzhev, D. Dhar, A. Dhar, S. Krishnamurthy, Eulerian walkers as a model of self-organized
criticality, Phys. Rev. Lett. 77 (1996) 5079-5082, arxiv.org/1009.4802.
24. J. Propp, Random walk and random aggregation, derandomized, 2003,
http://research.microsoft.com/apps/video/default.aspx?id=104906.
25. K. Rajeev, D. Dhar, Asymptotic shape of the region visited by an Eulerian walker, Phys. Rev. E. 80 no. 5
(2009).
26. A. Reddy, R. Tulasi, A recurrent rotor-router configuration in Z3, 2010, arxiv.org/1005.3962.
27. O. Schramm, Personal communication. 2003.
28. I. A. Wagner, M. Lindenbaum, A. M. Bruckstein, 4th Israel Symposium on Theory of Computing and
Systems (ISTCS 96). 1996.
29. I. A. Wagner, M. Lindenbaum, A. M. Bruckstein, Efficiently searching a graph by a smell-oriented vertex
process, Ann. Math. Artif. Intell. 24 (1998) 211223, http://dx.doi.org/10.1023/A:1018957401093.
30. V. Yanovski, I. A. Wagner, A. M. Bruckstein, A distributed ant algorithm for efficiently patrolling a
network, Algorithmica 37 no. 3 (2003) 165186.

LAURA FLORESCU is a Ph.D. student at Courant Institute, New York University.


Courant Institute of Mathematical Sciences, NYU, New York, NY 10012
[email protected]

LIONEL LEVINE is an assistant professor at Cornell University. His Ph.D. is from Berkeley, where his
advisor was Yuval Peres. You can usually find him thinking about why things are the way they are or else
about why they aren't the way they aren't.
Cornell University, Ithaca, NY 14853
[email protected]

YUVAL PERES is a principal researcher at Microsoft Research in Redmond, WA. He has written more than
250 research papers in probability theory, ergodic theory, and analysis and theoretical computer science, in
particular on fractals, random walks, Brownian motion, percolation, online learning, and Markov chain mixing
times. He obtained his Ph.D. at the Hebrew University of Jerusalem in 1990 and was later a faculty member
there and at the University of California, Berkeley. Yuval was awarded the Rollo Davidson Prize in 1995, the
Love Prize in 2001, and was a corecipient of the David P. Robbins Prize in 2011. He was an invited speaker
at the International Congress of Mathematicians in 2002. He is most proud of the 21 Ph.D. students he has
mentored and hopes to keep learning from them.
Microsoft Research, Redmond, WA 98052
[email protected]

A Tale of Three Theorems
Lawrence Zalcman

To Brit Kirwan on his Jubilee, with admiration and affection

Ampliat aetatis spatium sibi vir bonus: hoc est


vivere bis, vita posse priore frui.
Martial x.23

Abstract. This article presents a bird's-eye view of the celebrated theorems of Picard and their close relatives, with particular emphasis on some surprising developments of the past half century.

1. A VERY GOOD YEAR. The tale I have to tell begins on May 19, 1879, when the 23-year-old Émile Picard astounded the mathematical world with his announcement [23] that a nonconstant function holomorphic on the entire complex plane C (an entire function) must take on every complex value with at most one exception. Previously, the most that had been known was that the range of a nonconstant entire function is dense in C. That Picard's theorem is sharp is shown, for instance, by the function f(z) = e^z, which never assumes the value 0.
Our interest in this paper focuses on meromorphic functions on the plane, i.e., functions holomorphic on C except for isolated poles (which may accumulate at ∞). Picard's theorem extends to such functions in a straightforward fashion. Indeed, if f is meromorphic on C and fails to take on the value c ∈ C, then g = 1/(f − c) is entire and therefore omits ∞ and at most one complex value. It follows that f = 1/g + c omits at most two values in the extended complex plane Ĉ = C ∪ {∞}. We may summarize this as follows.

Picard's Theorem. A nonconstant meromorphic function on C takes on every value in Ĉ with at most two exceptions. Equivalently, a meromorphic function on C whose range omits three values in Ĉ is constant.

Remarkably, Picard's theorem is very far from the last word on the values assumed (or omitted) by entire or meromorphic functions. Just five months after his first announcement, on October 20, 1879, Picard announced [24] that an entire function that is not a polynomial takes on every finite value, with at most one exception, infinitely often. And on November 5, 1879, he announced the general form of what has come to be known as the big Picard theorem, or Picard's great theorem, concerning the behavior of a meromorphic function in the neighborhood of an (isolated) essential singularity [25]. For functions defined on the whole complex plane, this takes the following form.
Augmented text of a lecture entitled "Picard Theorems 1879–2013," delivered on March 10, 2015, at the Kirwan Mathematics Festival, held at the University of Maryland at College Park in celebration of W. E. Kirwan's 50 years as a mathematician.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.643
MSC: Primary 30D35, Secondary 30D45

August–September 2016] THREE THEOREMS 643


Picard's Great Theorem. A transcendental (i.e., nonrational) meromorphic function on C takes on every value in Ĉ infinitely often, with at most two exceptions.

Note that Picard's little theorem (as the result simply called Picard's theorem above has become affectionately known) is an instant consequence of the great theorem. This holds a fortiori in case f is transcendental; while if f is a nonconstant rational function, f(Ĉ) = Ĉ, so f can omit at most one complex value on C.
Less than half a year separates the formulation and proof of Picard's little theorem from that of the great theorem, but it would take almost a third of a century before the third theorem of our title would be proved. Indeed, the very formulation of that result requires new definitions and notational conventions. It is to these that we turn in our next section.

2. SPHERES OF INFLUENCE. The extended complex plane Ĉ = C ∪ {∞} can be identified with the euclidean sphere

\[ S = \{(x_1, x_2, x_3) : x_1^2 + x_2^2 + (x_3 - 1/2)^2 = 1/4\} \]

in R³ via the stereographic projection π : S → Ĉ, which associates to the point x = (x_1, x_2, x_3) ∈ S (x_3 ≠ 1) the point of intersection of the line through (0, 0, 1) and x with the x_1x_2-plane, which is then identified with C via the correspondence (x_1, x_2) ↦ x_1 + i x_2. More explicitly,

\[ \pi(x_1, x_2, x_3) = \begin{cases} \dfrac{x_1}{1 - x_3} + i\,\dfrac{x_2}{1 - x_3}, & x_3 \neq 1, \\[4pt] \infty, & x_3 = 1. \end{cases} \]

This identification induces a natural topology on Ĉ, which agrees with the usual topology of C on bounded sets, while a sequence {z_n} ⊂ C converges to ∞ precisely when the sequence of preimages {π^{-1}(z_n)} tends to (0, 0, 1). This topology on Ĉ is given by the so-called chordal metric, defined by

\[ \chi(z, w) = |\pi^{-1}(z) - \pi^{-1}(w)|_{\mathbb{R}^3}, \qquad z, w \in \hat{\mathbb{C}}, \]

which is the length in R³ of the chord of S joining the preimages of z and w. In complex coordinates,

\[ \chi(z, w) = \frac{|z - w|}{\sqrt{1 + |z|^2}\,\sqrt{1 + |w|^2}}, \quad z, w \in \mathbb{C}; \qquad \chi(z, \infty) = \frac{1}{\sqrt{1 + |z|^2}}, \tag{1} \]

where, as usual, |z| = |x + iy| = √(x² + y²). Clearly, χ(z, w) ≤ |z − w|; moreover, with the usual convention 1/∞ = 0, 1/0 = ∞, we have χ(1/z, 1/w) = χ(z, w).
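The agreement of the chordal definition with the closed form (1) is easy to confirm numerically. In the sketch below (ours; the function names are our choices), inv_proj sends z to the point (x, y, |z|²)/(1 + |z|²) of S, the standard formula for π^{-1} on the sphere of radius 1/2 centered at (0, 0, 1/2).

```python
import math

def inv_proj(z):
    """pi^{-1}: C -> S, the inverse stereographic projection onto the
    sphere of radius 1/2 centered at (0, 0, 1/2)."""
    d = 1 + abs(z) ** 2
    return (z.real / d, z.imag / d, abs(z) ** 2 / d)

def chordal(z, w):
    """chi(z, w) as defined: the Euclidean length of the chord of S
    joining the preimages of z and w."""
    return math.dist(inv_proj(z), inv_proj(w))

def chordal_formula(z, w):
    """The closed form (1), for finite z and w."""
    return abs(z - w) / math.sqrt((1 + abs(z) ** 2) * (1 + abs(w) ** 2))

pts = [0j, 1 + 0j, 2 - 1j, 0.5j, -3 + 4j]
for z in pts:
    for w in pts:
        # the two expressions agree, and chi(z, w) <= |z - w|
        assert abs(chordal(z, w) - chordal_formula(z, w)) < 1e-12
        assert chordal(z, w) <= abs(z - w) + 1e-12
```

The inversion identity χ(1/z, 1/w) = χ(z, w) can be checked the same way on any sample of nonzero points.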
For our purposes, a meromorphic function f defined on a plane domain D is most profitably viewed as a map

\[ f : (D, |\cdot|) \to (\hat{\mathbb{C}}, \chi), \tag{2} \]

where |·| denotes the euclidean distance in C induced by the absolute value. The continuity of such a map is evident: if f is holomorphic at z_0 ∈ D, then χ(f(z), f(z_0)) ≤ |f(z) − f(z_0)|, which tends to 0 as z → z_0; while if f has a pole at z_0, then 1/f is holomorphic at z_0, and the result follows from the previous case since χ(f(z), f(z_0)) = χ(1/f(z), 1/f(z_0)).
Associated with the map f in (2) is the so-called spherical derivative, in which distances in the target space Ĉ are measured via χ while distances in D are measured in the euclidean metric:

\[ f^{\#}(z) = \lim_{h \to 0} \frac{\chi(f(z + h), f(z))}{|h|} = \frac{|f'(z)|}{1 + |f(z)|^2} \qquad (f(z) \neq \infty). \tag{3} \]

If f has a pole at z, then 1/f is holomorphic at z and we define f^{#}(z) = (1/f)^{#}(z). Note that f^{#}(z) = (1/f)^{#}(z) whenever f(z) ≠ 0, ∞. It follows that f^{#} is continuous wherever f is meromorphic. In particular, if f is meromorphic on all of C, f^{#} is bounded on compact subsets of C.
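Definition (3) can likewise be checked numerically: for small h, the chordal difference quotient should approach |f′(z)|/(1 + |f(z)|²). The sketch below (ours) does this for f = exp, whose derivative is again exp.

```python
import cmath
import math

def chordal(z, w):
    """The chordal metric (1) for finite z, w."""
    return abs(z - w) / math.sqrt((1 + abs(z) ** 2) * (1 + abs(w) ** 2))

def spherical_derivative(fprime_z, f_z):
    """f#(z) = |f'(z)| / (1 + |f(z)|^2), the right-hand side of (3)."""
    return abs(fprime_z) / (1 + abs(f_z) ** 2)

def quotient(f, z, h):
    """The chordal difference quotient chi(f(z + h), f(z)) / |h|."""
    return chordal(f(z + h), f(z)) / abs(h)

z = 0.3 + 0.7j
exact = spherical_derivative(cmath.exp(z), cmath.exp(z))   # f = exp, f' = exp
approx = quotient(cmath.exp, z, 1e-6)
print(exact, approx)   # the two values agree to several decimal places
```

One can also verify the identity f#(z) = (1/f)#(z) here: (1/f)′ = −f′/f², and substituting into (3) reproduces the same value.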
We are now prepared to make the definitions necessary for formulating our third
theorem.

3. NORMAL FAMILIES. The birth of modern complex function theory can be dated precisely to 1907, with the introduction of notions of compactness into complex analysis by Paul Montel in his theory of normal families [16]. Let us briefly recall the main lines of this important theory.
A family F of functions meromorphic on a plane domain D is said to be normal on D if each sequence {f_n} of functions belonging to F has a subsequence {f_{n_k}} that converges uniformly with respect to the chordal metric on compact subsets of D. It is a standard fact that the limit function is itself meromorphic on D or identically equal to infinity. For families of holomorphic functions, the above requirement is equivalent to the existence of a subsequence that either converges uniformly (with respect to the euclidean metric) on compacta in D or diverges uniformly on compacta. Such, in fact, is Montel's original definition of normality for families of holomorphic functions; for the (easy) extension to families of meromorphic functions, see [19, pp. 124–125]. The classical theory of normal families is laid out in detail in Montel's magisterial treatment [19], and more recent developments are discussed in Schiff's monograph [27].

Example. The family of holomorphic functions F = {f_n : n = 1, 2, . . . }, where f_n(z) = nz, is not normal on any domain in C containing the origin. Indeed, f_n(0) = 0 for all n, while f_n(z) → ∞ as n → ∞ for any z ≠ 0. Thus, no subsequence of the sequence {f_n} can converge or diverge uniformly on any neighborhood of 0.

Clearly, normality is a compactness notion. Indeed, a family F of functions meromorphic on D ⊂ C is normal on D if and only if it is relatively compact (i.e., has compact closure) in the topology of uniform convergence with respect to the chordal metric on compact subsets of D. By the Arzelà–Ascoli theorem (see, for instance, [27, p. 35]), this is equivalent to the equicontinuity (with respect to the chordal metric) on compact subsets of D of the family F. Since we are dealing with functions that are smooth (in an appropriate sense), this equicontinuity is equivalent to the uniform boundedness on compacta of (appropriately defined) derivatives of the functions in F. The reader will have no difficulty in accepting that the appropriate



notion of derivative here is precisely that given in (3). And so, we have the following result [14, p. 190] from Frédéric Marty's thesis, written in 1931 under Montel's direction.

Marty's Theorem. A family F of functions meromorphic on the plane domain D is normal on D if and only if for each compact set K ⊂ D, there exists a constant M(K) such that

f^{#}(z) ≤ M(K)

for all f ∈ F and all z ∈ K.
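Marty's theorem makes the failure of normality in the example of Section 3 quantitative: for f_n(z) = nz one computes f_n#(z) = n/(1 + n²|z|²), which is unbounded in n at z = 0, while the AM–GM inequality 1 + n²|z|² ≥ 2n|z| gives f_n#(z) ≤ 1/(2|z|) uniformly in n, so the family is normal on C \ {0}. A short check (ours):

```python
def spherical_derivative_fn(n, z):
    """f_n(z) = n*z has f_n'(z) = n, so by (3)
    f_n#(z) = n / (1 + n^2 |z|^2)."""
    return n / (1 + n ** 2 * abs(z) ** 2)

# At the origin, f_n#(0) = n is unbounded in n: no constant M(K) works
# for any compact K containing 0, so {f_n} is not normal there.
vals_at_0 = [spherical_derivative_fn(n, 0j) for n in (1, 10, 100)]
print(vals_at_0)   # prints: [1.0, 10.0, 100.0]

# Away from the origin, AM-GM gives f_n#(z) <= 1/(2|z|) uniformly in n,
# so Marty's criterion holds on compacta of C \ {0}.
```

This is exactly the dichotomy visible in the example: divergence at 0 against locally uniform behavior everywhere else.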

That such a basic result as Marty's theorem should have been proved so late in the development of the theory may seem more than a little surprising. This apparent anomaly stems from the fact that the use of the chordal metric in complex analysis and, more specifically, the definition of normality given above in terms of the chordal metric were introduced (by Alexander Ostrowski [22]) only in 1926. Until then, Montel's original definition, as extended to families of meromorphic functions, reigned supreme. For a snapshot biography of Marty, see [2, p. 219, n. 81]. The fascinating volume [2] also contains much more of interest, including an extremely warm appreciation of Montel's work, contained in a letter from Lebesgue to Élie Cartan [2, pp. 240–248].

4. MONTEL'S THEOREM AND BLOCH'S PRINCIPLE. Central among the results in the theory of normal families is the critère fondamental [19, pp. 61–64] for normality established by Montel in 1912.

Montel's Theorem. The collection F of all meromorphic functions on the plane domain D that omit three (distinct) fixed values a, b, c ∈ Ĉ is a normal family on D.

For a simple proof, see [37, p. 218].


Montel's theorem makes available the mechanism of normal families for proving global results in (one-dimensional) complex dynamics and thus stands at the very foundation of that theory. Over the years, it has undergone a number of far-reaching extensions and generalizations; for instance, the three values a, b, and c can be replaced by three distinct functions meromorphic on D. For a survey of such results, see [38].
Our interest here in Montel's theorem is based on the fact that it implies, via a very simple argument, the great Picard theorem (see, for instance, [27, pp. 60–61]). Thus, we have the sequence of implications

Montel's theorem ⟹ great Picard theorem ⟹ little Picard theorem.

The fact that in Montel's theorem the hypothesis on the functions in the family F is precisely the condition that in Picard's little theorem forces a meromorphic function on C to be constant suggests that there might be a close relation between these two results. Could it be that the little Picard theorem actually implies Montel's theorem?
Such, indeed, turns out to be the case [36, p. 815]. Thus, the implications displayed above are actually equivalences, and Montel's theorem takes its place as the third theorem of our title. With all dramatis personae now on stage, we are ready to continue our entertainment. But first, a word of elaboration on this latest development.

That Picard's little theorem implies Montel's theorem turns out to be a special case of a much more general phenomenon, which goes by the name of Bloch's principle. According to Bloch's principle, if P is a property that forces a function meromorphic on C to be constant, the family of all functions meromorphic on the plane domain D that have P is normal on D. Stated so baldly, this assertion is simply false. Consider, for instance, the property of being bounded. A bounded meromorphic function on C can have no poles and is therefore a bounded entire function, which, by Liouville's theorem, must be constant. On the other hand, each of the functions f_n(z) = nz is bounded on the unit disk ∆; but, as noted in the previous section, {f_n} is not normal on ∆. However, Bloch's principle is true in specific cases frequently enough to make explication of those conditions under which it is valid an interesting and challenging endeavor. First steps in this direction were taken in [36], but much more remains to be done. See [3] for an extensive discussion of these matters.
Bloch's principle is named for the French mathematician André Bloch, who is perhaps best remembered for having originated Bloch's constant (cf. [27, p. 112]) and whose visionary insights into the theory of several complex variables retain vitality to this day. Bloch was a profoundly troubled individual, who spent essentially his entire adult life (and produced his very considerable mathematical oeuvre) while incarcerated in the famous psychiatric hospital at Charenton. Further details on his life and work can be found in [30] and [5].
Finally, a comment on the attribution of Bloch's principle to Bloch. I have been unable to find anything remotely resembling the enunciation of this principle in Bloch's published work. The first statement of the principle of which I am aware occurs in Valiron's monograph of 1929 [28, p. 2] but without any mention of Bloch. Bloch's name is mentioned in connection with the principle in a later work of Valiron [29, p. 4]. Finding a concrete connection between Bloch and his principle (or proving that none exists) would thus be a worthy project for a doctoral dissertation in the history of mathematics.
And now, after all these digressions, we return to our tale.

5. A GREAT LEAP FORWARD. We are about to fast forward almost half a century, from 1912 to 1959. Such a leap inevitably skips over important developments. The most notable of these, so far as complex analysis is concerned, is undoubtedly Rolf Nevanlinna's theory of value distribution for meromorphic functions [20], termed by Hermann Weyl "one of the few great mathematical events in our century" [33, p. 8]. Although our principal concerns in the remainder of this article are very much in this direction, the beautiful intricacies of Nevanlinna theory (as it is often called) need not detain us here. The interested reader is directed to Hayman's classical exposition of the subject [9] (see also [35]) and the comprehensive monograph [7]. In recent years, considerable attention has focused on an amazing analogy between Nevanlinna theory and diophantine approximation, first observed by C. F. Osgood and later elaborated by Paul Vojta and others, for which see [31].

6. HAYMAN'S ALTERNATIVE AND ITS CONGENERS. In 1959, Walter Hayman published the following striking result, which has come to be known as Hayman's alternative [8, Theorem 3].

Hayman's Alternative. Let f be a transcendental meromorphic function on C. Then either
(i) f assumes each value a ∈ C infinitely often,
or
(ii) f^{(k)} assumes each value b ∈ C \ {0} infinitely often for k = 1, 2, 3, . . . .

Note that the value ∞ is excluded from consideration, as it is evident that f and f^{(k)} take on the value ∞ at exactly the same points, viz., the poles of f.
In the sequel, we focus on the case k = 1, returning to the case of general k at the very end of the paper. Accordingly, considering f − a in place of f, we may restate Hayman's alternative in the following equivalent form.

Theorem A. Let f be a transcendental meromorphic function on C that has at most finitely many zeros. Then f′ takes on every nonzero complex value infinitely often.

For entire functions, this had been proved as early as 1923 by Walter Saxer [26];
however, it needs to be emphasized that the passage from holomorphic to meromorphic
functions represents an enormous advance.
Theorem A can be considered a kind of refinement of the great Picard theorem,
in that it provides additional information on (the derivative of) f in the case that f
takes on some finite complex value at most finitely often. In much the same way that
the great Picard theorem implies the little Picard theorem, Theorem A implies the
following result.

Theorem B. [8, p. 20] Let f be a meromorphic function on C such that f(z) ≠ 0 and f′(z) ≠ 1 for all z ∈ C. Then f is constant.

Theorem B highlights what is perhaps the most remarkable aspect of Hayman's
alternative, viz., the magic number 3 in the Picard theorems has been reduced
to 2.
Finally, in analogy with Montel's theorem, we have the following result of
Yongxing Gu.

Theorem C. [12] Let F be a family of functions meromorphic on the plane domain D.
Suppose that for each f ∈ F, f(z) ≠ 0 and f′(z) ≠ 1 for all z ∈ D. Then F is a
normal family on D.

More generally, the constants 0 and 1 in Theorems B and C can be replaced by
arbitrary a ∈ C and b ∈ C \ {0}, as one sees by applying these theorems to the function
g(z) = [f(z) − a]/b. Theorems B and C also remain true if f′ is replaced by f^(k) for
any fixed positive integer k > 1.

7. THE WONDERS OF MODERN TECHNOLOGY. It is instructive to consider
the rate of progress represented by the results of the previous section. Some 36 years
separate Saxer's result [26] from Hayman's alternative, i.e., the passage from a result
about entire functions to its analogue for meromorphic functions on the plane. The
analogue of Theorem C for analytic functions was first proved by Miranda [15] in
1935, so the gap between the normal families result for analytic functions and that
for meromorphic functions is even greater, 44 years. No less remarkable (in view of
Bloch's principle) is the fact that fully two decades separate Theorem B (the extension
of the little Picard theorem) from Theorem C (the extension of Montel's theorem), i.e.,
between a result for a single (meromorphic) function and the corresponding normality
result for a family of meromorphic functions.

648 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Technical developments which go back to [36] have reduced the work of decades
to (literally) just a few lines, leading to renewed interest and activity at an unparalleled
level in the study of normal families. See [37] and [3] for details and many examples.
One insight that has been gained through these developments is the fact that in
results of the sort we have been considering, the assumption that a function f fails to
vanish can often be replaced by the assumption that all zeros of f have suitably high
multiplicity. Such is the case with Theorems A, B, and C, as shown by Yufei Wang
and Mingliang Fang in [32].

Theorem A′. [32, Theorem 3] Let f be a transcendental meromorphic function on C,
all of whose zeros have multiplicity at least 3. Then f′ takes on every nonzero complex
value infinitely often.

Like Hayman's alternative, Theorem A′ is proved by skillful manipulations in
Nevanlinna theory, and, like it, it remains true when f′ is replaced by f^(k) for k ≥ 2.

Theorem B′. (Cf. [32, Lemma 8]) Let f be a meromorphic function on C, all of whose
zeros have multiplicity at least 3. Then if f′(z) ≠ 1 for z ∈ C, f must be constant.

Theorem C′. [32, Theorem 7] Let F be a family of meromorphic functions on the
plane domain D, all of whose zeros have multiplicity at least 3. If f′(z) ≠ 1 for all
f ∈ F and z ∈ D, then F is a normal family on D.

8. A PICAYUNE QUESTION? In view of the results stated in the previous section,
we may ask the following question:
Can the number 3 in Theorems A′, B′, and C′ be replaced by 2?
In other words (since any zero has multiplicity at least 1), can the lower bound for the
multiplicity of zeros be reduced to the minimum?
Lest the sceptical reader be tempted to view such a query as an instance of gener-
alized nitpicking, let us cite the opinion [1, p. 3] of Lars Ahlfors, the dominant figure
in complex function theory during the middle years of the past century, on research in
the theory of analytic functions of one complex variable.

The classical literature abounds in sharp and explicit results which are inter-
twined in a network of relationships. It is a challenge for modern research to
discover new aspects and new methods which bring about a deeper understand-
ing of these questions . . . the point is that one can and must penetrate much
deeper in the case of two dimensions, and the deep methods are precisely the
ones that do not have easy generalizations.

Armed with this warrant, let us examine the evidence.

Exhibit B. Let a, b ∈ C, a ≠ b, and set

f(z) = (z − a)²/(z − b) = z + (b − 2a) + (a − b)²/(z − b).

Then f has a single zero, at z = a, whose multiplicity is 2, and

f′(z) = 1 − (a − b)²/(z − b)² ≠ 1.



Thus, a meromorphic function on C, all of whose zeros are multiple, which satisfies
f′(z) ≠ 1 for all z ∈ C need not be constant. In other words, the multiplicity condition
in Theorem B′ cannot be reduced from 3 to 2.
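The algebra in Exhibit B is easy to confirm by machine. The following sketch (using sympy, which is not part of the original article) checks both the partial-fraction form of f and the formula for f′.

```python
import sympy as sp

# Exhibit B: f(z) = (z - a)^2/(z - b) with a != b.
z, a, b = sp.symbols('z a b')
f = (z - a)**2 / (z - b)

# partial-fraction form quoted in the text
form_check = sp.simplify(f - (z + (b - 2*a) + (a - b)**2/(z - b)))

# f'(z) = 1 - (a - b)^2/(z - b)^2, which misses the value 1 whenever a != b
diff_check = sp.simplify(sp.diff(f, z) - (1 - (a - b)**2/(z - b)**2))

print(form_check, diff_check)  # 0 0
```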

Exhibit C. Let F = {f_λ}, λ ∈ C \ {0}, where

f_λ(z) = (z − λ)²/(z − 2λ).

Then

f_λ(z) = z + λ²/(z − 2λ), so f_λ′(z) = 1 − λ²/(z − 2λ)² ≠ 1 for z ∈ C.

For any neighborhood V = {z : |z| < ε} of 0, f_λ takes on both the values 0 and ∞ in
V if |λ| < ε/2. Thus, the family F is clearly not equicontinuous at 0 and hence fails
to be normal on any plane domain containing 0.
Hence, one cannot reduce the multiplicity condition in Theorem C′ from 3 to 2.

It has been said that "the virtue of a logical proof is not that it compels belief, but
that it suggests doubts, and the proof tells us where to concentrate our doubts" [11,
p. 282]. It is no less true that the chief virtue of a counterexample is not that it compels
doubt but that it suggests belief, and the counterexample tells us where to concentrate
our belief.
Where, then, do the counterexamples we have just seen tell us to concentrate our
belief? Where, indeed, if not on the possibility of reducing 3 to 2 in Theorem A′?

9. MEASURING MEROMORPHIC FUNCTIONS. The growth of an entire function
f is measured by the maximum modulus function

M(r) = M(r, f) = max_{|z|≤r} |f(z)|.

The order ρ of f is then defined by

ρ = limsup_{r→∞} log log M(r) / log r,

so that 0 ≤ ρ ≤ ∞. Thus, for instance, the entire function f(z) = e^{z^n} has order n.
For functions having poles, M(r) = ∞ as soon as r is greater than or equal to the
distance of the nearest pole of f to the origin. Nonetheless, it is relatively simple to
define a measure of growth for meromorphic functions on C that leads to the same
order as defined above when the function is entire. Indeed, for f a meromorphic
function on C, let

S(t) = (1/π) ∫∫_{|z|<t} [f#(z)]² dx dy,

where f# is the spherical derivative of f given by (3). This function represents the
(normalized) spherical area of the image on Ĉ of the disc D_t = {z : |z| < t} under the
mapping f, counted with multiplicity. Then the Ahlfors–Shimizu characteristic of f is
650
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
T₀(r) = ∫₀^r S(t)/t dt,

and we define the order of f by

ρ = limsup_{r→∞} log T₀(r) / log r.

It is not difficult to show that, when f is an entire function, this definition coincides
with that given previously in terms of M(r, f).
It follows immediately from the definitions above that if f# remains (uniformly)
bounded throughout C, then ρ ≤ 2. As we shall see, the finiteness of ρ in this case
plays an important role in the subsequent development.
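As a quick numerical illustration (not from the article), one can watch the limsup definition converge for a concrete entire function. We use f(z) = cos √z, whose maximum modulus on |z| ≤ r is cosh √r, so its order is 1/2; the choice of function and the sample radii are our own.

```python
import math

# For f(z) = cos(sqrt(z)) one has M(r) = cosh(sqrt(r)), so the ratio
# log log M(r) / log r should approach the order rho = 1/2.
def order_estimate(r):
    log_M = math.log(math.cosh(math.sqrt(r)))
    return math.log(log_M) / math.log(r)

for r in (1e2, 1e3, 1e4, 1e5):
    print(r, order_estimate(r))   # values drift toward 0.5
```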

10. A POSITIVE RESULT. Historically speaking, functions of finite order have


played a dominant role in the theory of entire functions and, to a considerable degree,
in the theory of meromorphic functions as well. It is thus natural, in pursuit of a posi-
tive result, to restrict our attention initially to functions of finite order. Such a modest
approach is crowned with success. Indeed, we have the following result.

Theorem A′₀. Let f be a transcendental meromorphic function on C of finite order,
all of whose zeros are multiple. Then f′ takes on every nonzero value in C infinitely
often.

This follows almost immediately from the following beautiful theorem of
Bergweiler and Eremenko [4, Theorem 3].

Theorem BE. Let f be a transcendental meromorphic function on C of finite order
that has an infinite number of multiple zeros. Then f′ takes on every nonzero value in
C infinitely often.

To derive Theorem A′₀ from Theorem BE, let f be as in Theorem A′₀. If f has
only finitely many zeros, then f′ takes on every nonzero value in C infinitely often
by Hayman's alternative. Otherwise, f has infinitely many zeros, all of which are (by
hypothesis) multiple. Thus, the condition of Theorem BE holds, so (again) f′ takes on
each nonzero value in C infinitely often.
Unfortunately, there is no hope of extending the proof given above to the case of
functions of infinite order, as Theorem BE does not hold for all such functions.

Example. [4, p. 370] Let a, b ∈ C, where 1 + ab = 0 and 1 + ae^b = 0. (Choose b
such that e^b = b and let a = −1/b.) Consider the entire function

f(z) = z + a ∫₀^z exp[b e^ζ − ζ] dζ.   (4)

Then

f′(z) = 1 + a exp[b e^z − z].   (5)

Clearly, f(0) = 0 and, by (5), f′(0) = 1 + ae^b = 0, so f has a multiple zero at the
origin. We claim that f has period 2πi. Once this has been shown, it follows that f
has multiple zeros at the points 2πik, k ∈ Z.



To verify the claim, make the change of variable w = e^ζ in (4). Then

f(z + 2πi) − f(z) = 2πi + a ∮_γ e^{bw}/w² dw,   (6)

where γ is the image of the segment joining z to z + 2πi under the map ζ ↦ e^ζ,
i.e., the positively oriented circle about the origin of radius e^{Re z}. Now, by Cauchy's
formula for derivatives, the integral in (6) equals 2πi times the derivative of e^{bw} at 0,
i.e., 2πib. Thus, the right-hand side of (6) is 2πi(1 + ab) = 0, as claimed. Thus, f is
a function having multiple zeros at all integral multiples of 2πi.
But clearly, by (5), f′(z) ≠ 1 for z ∈ C.
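The example can be checked numerically. The sketch below (using mpmath, our own choice of tool, not part of the article) takes the concrete root b = −W(−1) of e^b = b, verifies the double zero at the origin, and integrates f′ along the segment from 0 to 2πi to confirm the period.

```python
import mpmath as mp

# e^b = b is solved by b = -W(-1), with W the principal branch of Lambert W.
b = -mp.lambertw(-1)
a = -1/b
assert abs(mp.exp(b) - b) < 1e-12        # so 1 + a*b = 1 + a*e^b = 0

fprime = lambda z: 1 + a*mp.exp(b*mp.exp(z) - z)     # formula (5)
assert abs(fprime(0)) < 1e-12            # f'(0) = 0: multiple zero of f at 0

# f(2*pi*i) - f(0): integrate f' along the straight segment from 0 to 2*pi*i
delta = mp.quad(fprime, [0, 2j*mp.pi])
print(abs(delta))   # essentially 0, confirming the period 2*pi*i
```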

11. QUO VADIMUS? In the absence of any obvious direction to pursue in hopes of
extending Theorem A′₀ to functions of infinite order, it makes sense to backtrack a bit
and have a look at the proof of Theorem A′.
Suppose, then, that f is a meromorphic function on C, all of whose zeros have
multiplicity at least 3, but that f′ takes on some nonzero value, say 1, only finitely
often. Then (by Theorem A′₀) f has infinite order, so f# is unbounded on C. Since
f# is bounded on each compact subset of C, it follows that there exists a sequence of
points {z_n} such that z_n → ∞ and f#(z_n) → ∞. Define f_n(z) = f(z + z_n); then for
|z| < 1, f_n′(z) = f′(z + z_n) ≠ 1 for n sufficiently large, say n ≥ N. Consider the
family of meromorphic functions F = {f_n : n ≥ N} on the unit disk Δ. Then each
element of F satisfies the hypothesis of Theorem C′, and hence, F is normal on Δ.
But f_n#(0) = f#(z_n) → ∞, which contradicts Marty's theorem.
Since Theorem C′, on which the proof of Theorem A′ given immediately above
depends, does not admit an extension to functions all of whose zeros are (merely)
multiple, we find our path blocked by a brick wall.
Walls can be climbed, or they can be broken through. But it is often simplest to
circumvent them. That is the track we take in the next section.

12. QUASINORMAL FAMILIES. The alternate route we follow is based on an
extension of the notion of normality introduced by Montel in 1922 [18]. A family
F of functions meromorphic on a plane domain D is said to be quasinormal on D
if from each sequence {f_n} ⊂ F one can extract a subsequence {f_{n_k}} that converges
locally uniformly with respect to the chordal metric in D \ E, where E ⊂ D (which
may depend on {f_{n_k}}) has no accumulation point in D. If E can always be chosen to
contain no more than M points, F is said to be quasinormal of order M on D.

Example. The family {nz} is not normal on {|z| < 1}, but it is quasinormal of
order 1 there, as it is normal on {0 < |z| < 1}.

Following its introduction, the theory of quasinormal families enjoyed a certain
vogue and then ultimately slipped from view. More recently, it has played a role in
the analytic theory of continued fractions and in complex dynamics. Its relevance
to the question posed in Section 8 above derives from the following substitute for
Theorem C′; cf. [21, Theorem 1].

Theorem C″. Let F be a family of meromorphic functions on the plane domain D,
all of whose zeros are multiple. If for each f ∈ F, f′(z) ≠ 1 for all z ∈ D, then F is
quasinormal of order 1 on D.

It is easy to see that the assumption concerning the zeros of the functions in F
cannot be dropped. Indeed, consider the family of holomorphic functions F = {f_n},
where f_n(z) = z^n − 3^n, on the annulus D = {z : 2 < |z| < 4}. Then f_n′(z) = nz^{n−1}
≠ 1 for z ∈ D. However, no subsequence of {f_n} can converge uniformly on a neigh-
borhood of any point of E = {z : |z| = 3}. Thus, F is not quasinormal on any domain
intersecting E.

13. BRINGING IN THE SHEAVES. While the proof of Theorem C″ is not easy, it
steers clear of the technicalities of classical value distribution theory. Thus, it is quite
remarkable that it implies the following positive response to the query of Section 8
[21, Theorem 2].

Theorem A″. The derivative of a transcendental meromorphic function on C, all
but finitely many of whose zeros are multiple, takes on every nonzero complex value
infinitely often.

The derivation of Theorem A″ from Theorem C″ is not difficult. To avoid techni-
calities, we content ourselves with indicating how the following extended version of
Theorem B′ follows from Theorem C″.

Theorem B″. Let f be a transcendental meromorphic function, all of whose zeros are
multiple. Then f′ takes on every nonzero value in C.

Proof. Suppose not, say f′(z) ≠ 1, z ∈ C. Then by Theorem A′₀, f has infinite
order, so there exist points z_n such that z_n → ∞ and f#(z_n) → ∞. Consider the family
F of functions {f_n}, where f_n(z) = f(z_n z)/z_n, on C. Then f_n has only multiple zeros
on C, and f_n′(z) = f′(z_n z) ≠ 1 on C. Thus, F is quasinormal of order 1 on C, by
Theorem C″.
On the other hand, for |z_n| ≥ 1,

f_n#(1) = |f′(z_n)| / (1 + |f(z_n)/z_n|²) ≥ |f′(z_n)| / (1 + |f(z_n)|²) = f#(z_n) → ∞,

so by Marty's theorem, no subsequence of F can be normal in a neighborhood of
z = 1.
Similarly, for each ε > 0 and |z_n| ≥ 1,

sup_{|z|≤ε} f_n#(z) = sup_{|z|≤ε} |f′(z_n z)| / (1 + |f(z_n z)/z_n|²)
  ≥ sup_{|z|≤ε} |f′(z_n z)| / (1 + |f(z_n z)|²)
  = sup_{|w|≤ε|z_n|} f#(w) → ∞,

so that no subsequence of F can be normal at 0, either. Thus, the family F cannot
be quasinormal of order 1 on any domain containing the points 0 and 1, a contradic-
tion.

Note that we have called Theorem B″ an extended version, not an extension, of
Theorem B′. Indeed, as has already been observed, the extension of Theorem B′ to
functions all of whose zeros are multiple is false. Theorem B″ states the remarkable
fact that this extension remains true for transcendental functions.

14. ONE FOR THE BOOK. An instant consequence of Theorem A″ is the follow-
ing answer to a celebrated problem of Hayman (see [10, 1.19]), whose solution (see
[37, p. 226]) stretches over three and a half decades.

Corollary. If f is a transcendental meromorphic function on C, then f′f^n takes on
every nonzero complex value infinitely often for each positive integer n.

Proof. Every zero of f^{n+1} has multiplicity at least n + 1 ≥ 2, so Theorem A″ applies
to it, and (f^{n+1})′ = (n + 1) f′ f^n.

15. EPILOGUE. Our tale is told, but the story is not over. We have already observed
that Theorem A′ remains true if f′ is replaced by f^(k) for any natural number k. It is
natural to ask whether this remains true for Theorem A″:

Let f be a transcendental meromorphic function on C, all (but finitely many)
of whose zeros are multiple. Must f^(k) take on every nonzero complex value
infinitely often for k = 2, 3, . . .?

The answer to this question turns out to be surprisingly elusive. Thus (as already
noted), when all zeros of f have multiplicity at least 3, f^(k) does indeed take on each
nonzero complex value infinitely often. However, the proof of this in [32] fails com-
pletely (even for k = 1) when the multiplicity of the zeros drops to 2. Moreover, the
analogue of Theorem C″ for k = 2 definitely fails: A family F of functions meromor-
phic on a plane domain D, all of whose zeros are multiple and such that f″ ≠ 1 on D
for each f ∈ F, need not be quasinormal of any finite order on D.
On the other hand, one has a positive result for functions of finite order.

Theorem. If f is a transcendental meromorphic function of finite order, all of
whose zeros are multiple, and k ≥ 2, then f^(k) takes on each nonzero complex value
infinitely often.

Proof. Suppose that f^(k) = 1 only finitely often. If f has only finitely many poles,
then

f^(k)(z) = 1 + R(z)e^{P(z)},

where R is a rational function and P is a polynomial. In this case, an elementary
calculation leads to a contradiction.
Otherwise, f has infinitely many poles. However, according to a theorem of Lang-
ley [13, Theorem 1.2], a meromorphic function g of finite order such that g^(k) has only
finitely many zeros for some k ≥ 2 has only finitely many poles. Applying this result
to g(z) = f(z) − z^k/k! yields a contradiction. (Note that the hypothesis of multiple
zeros was not used in this case.)

Alas, Langley's result most definitely does not hold for meromorphic functions of
infinite order [13, pp. 107–108]. Thus, the evidence seems equivocal, at best.
Nevertheless, . . . the ayes have it.

Theorem. Let f be a transcendental meromorphic function on C, all but finitely many
of whose zeros are multiple. Then f^(k) takes on each nonzero complex value infinitely
often for k = 1, 2, 3, . . . .

This has now been shown by Mingliang Fang and Yuefei Wang [6, Theorem 10],
who derive the result in fairly straightforward fashion from an inequality proved in
the breakthrough paper [34] of Katsutoshi Yamanoi, in which he proves the celebrated
Goldberg conjecture.
But that is another story.

REFERENCES

1. L. V. Ahlfors, Classical and contemporary analysis, SIAM Rev. 3 (1961) 1–9.
2. M. Audin, Fatou, Julia, Montel: The Great Prize of Mathematical Sciences of 1918 and Beyond. Lecture Notes in Mathematics, Vol. 2014, Springer-Verlag, Berlin, 2011.
3. W. Bergweiler, Bloch's principle, Comput. Methods Funct. Theory 6 (2006) 77–108.
4. W. Bergweiler, A. Eremenko, On the singularities of the inverse to a meromorphic function of finite order, Rev. Mat. Iberoamericana 11 (1995) 355–373.
5. H. Cartan, J. Ferrand, The case of André Bloch, Math. Intelligencer 10 no. 1 (1988) 23–26.
6. M.-L. Fang, Y.-F. Wang, A note on the conjectures of Hayman, Mues and Goldberg, Comput. Methods Funct. Theory 13 (2013) 533–543.
7. A. A. Goldberg, I. V. Ostrovskii, Value Distribution of Meromorphic Functions. American Mathematical Society, Providence, RI, 2008.
8. W. K. Hayman, Picard values of meromorphic functions and their derivatives, Ann. of Math. 70 no. 2 (1959) 9–42.
9. W. K. Hayman, Meromorphic Functions. Clarendon Press, Oxford, 1964.
10. W. K. Hayman, Research Problems in Function Theory. Athlone Press, London, 1967.
11. M. Kline, Logic versus pedagogy, Amer. Math. Monthly 77 (1970) 264–282.
12. Y. X. Ku, Un critère de normalité des familles de fonctions méromorphes, Sci. Sinica Special Issue 1 (1979) 267–274.
13. J. K. Langley, The second derivative of a meromorphic function of finite order, Bull. London Math. Soc. 35 (2003) 97–108.
14. F. Marty, Recherches sur la répartition des valeurs d'une fonction méromorphe, Ann. Fac. Sci. Univ. Toulouse 23 no. 3 (1931) 183–261.
15. C. Miranda, Sur un nouveau critère de normalité pour les familles de fonctions holomorphes, Bull. Soc. Math. France 63 (1935) 185–196.
16. P. Montel, Sur les suites infinies de fonctions, Ann. École Norm. Sup. 24 no. 3 (1907) 233–334.
17. P. Montel, Sur les familles de fonctions analytiques qui admettent des valeurs exceptionnelles dans un domaine, Ann. École Norm. Sup. 29 no. 3 (1912) 487–535.
18. P. Montel, Sur les familles quasi-normales de fonctions holomorphes, Mém. Acad. Roy. Belgique 6 no. 2 (1922) 1–41.
19. P. Montel, Leçons sur les familles normales de fonctions analytiques et leurs applications. Corrected reprint of 1927 edition. Chelsea, New York, 1974.
20. R. Nevanlinna, Zur Theorie der meromorphen Funktionen, Acta Math. 46 (1925) 1–99.
21. S. Nevo, X.-C. Pang, L. Zalcman, Quasinormality and meromorphic functions with multiple zeros, J. Analyse Math. 101 (2007) 1–23.
22. A. Ostrowski, Über Folgen analytischer Funktionen und einige Verschärfungen des Picardschen Satzes, Math. Z. 24 (1926) 215–258.
23. É. Picard, Sur une propriété des fonctions entières, C. R. Acad. Sci. Paris 88 (1879) 1024–1027.
24. É. Picard, Sur les fonctions entières, C. R. Acad. Sci. Paris 89 (1879) 662–665.
25. É. Picard, Sur les fonctions analytiques uniformes dans le voisinage d'un point singulier essentiel, C. R. Acad. Sci. Paris 89 (1879) 745–747.
26. W. Saxer, Über die Picardschen Ausnahmewerte sukzessiver Derivierten, Math. Z. 17 (1923) 206–227.
27. J. Schiff, Normal Families. Springer-Verlag, New York, 1993.
28. G. Valiron, Familles normales et quasi-normales de fonctions méromorphes. Gauthier-Villars, Paris, 1929.
29. G. Valiron, Sur les valeurs exceptionnelles des fonctions méromorphes et leurs dérivées. Hermann et Cie, Paris, 1937.
30. G. Valiron, Des théorèmes de Bloch aux théories d'Ahlfors, Bull. Sci. Math. 73 no. 2 (1949) 152–162.
31. P. Vojta, Diophantine approximation and Nevanlinna theory, in Arithmetic Geometry, Lecture Notes in Mathematics, Vol. 2009, Springer-Verlag, Berlin, 2011, 111–224.
32. Y.-F. Wang, M.-L. Fang, Picard values and normal families of meromorphic functions with multiple zeros, Acta Math. Sinica (N.S.) 14 (1998) 17–26.
33. H. Weyl, Meromorphic Functions and Analytic Curves. Princeton Univ. Press, Princeton, NJ, 1943.
34. K. Yamanoi, Zeros of higher derivatives of meromorphic functions in the complex plane, Proc. London Math. Soc. 106 no. 3 (2013) 703–780.
35. L. Yang, Value Distribution Theory. Springer-Verlag, Berlin, 1993.
36. L. Zalcman, A heuristic principle in complex function theory, Amer. Math. Monthly 82 (1975) 813–817.
37. L. Zalcman, Normal families: new perspectives, Bull. Amer. Math. Soc. (N.S.) 35 (1998) 215–230.
38. L. Zalcman, Variations on Montel's theorem, Bull. Soc. Sci. Lett. Łódź Sér. Rech. Déform. 59 (2009) 25–36.

LAWRENCE ZALCMAN attended Dartmouth College, where he learned complex function theory from
A. S. Besicovitch and functional analysis from Misha Cotlar. He moved on to MIT, where he continued his
studies in these areas under Henry McKean and Kenneth Hoffman. After 17 years teaching at Stanford and
the University of Maryland, he relocated to Israel in 1985 as Lady Davis Professor of Mathematics at Bar-Ilan
University. Recently emerited, he is now in his 29th year as editor of the Journal d'Analyse Mathématique.
Department of Mathematics, Bar-Ilan University, Ramat Gan 5290002, Israel
[email protected]

An Electric Network for Nonreversible
Markov Chains

Márton Balázs and Áron Folly

Abstract. We give an analogy between nonreversible Markov chains and electric networks
much in the flavor of the classical reversible results originating from Kakutani and later
Kemeny–Snell–Knapp and Kelly. Nonreversibility is made possible by a voltage multiplier,
a new electronic component. We prove that absorption probabilities, escape probabilities,
expected number of jumps over edges, and commute times can be computed from electri-
cal properties of the network as in the classical case. The central quantity is still the effective
resistance, which we do have in our networks despite the fact that individual parts cannot be
replaced by a simple resistor. We rewrite a recent nonreversible result of Gaudillière–Landim
about the Dirichlet and Thomson principles into the electrical language. We also give a few
tools that can help in reducing and solving the network. The subtlety of our network is, how-
ever, that the classical Rayleigh monotonicity is lost.

1. INTRODUCTION. Random walks or, more generally, reversible Markov chains
have a strong connection to electric resistor networks. Our knowledge of this anal-
ogy started with the work of Kakutani [8]; Doob [3]; Kemeny, Snell, and Knapp [9];
and Nash-Williams [11]. Since then, the field became a foundational part of the the-
ory of reversible Markov chains; we refer the readers to Doyle and Snell [5]; Telcs
[13]; Lyons and Peres [10]; and Chandra, Raghavan, Ruzzo, Smolensky, and Tiwari
[2] as a few references in the huge literature. Among several results, escape proba-
bilities, transience–recurrence problems, and commute and mixing times have been
successfully investigated with the use of this analogy. Two fundamental tools were
the Thomson (or Dirichlet) energy minimum principles and Rayleigh's monotonic-
ity law. The former say that under given boundary conditions, the physical current
(or voltage, respectively) minimizes the power losses on the resistors. As a conse-
quence, Rayleigh's monotonicity law states that the effective resistance of the network
is increasing in any of its individual resistances.
The resistor is a symmetric component, and this fact has fundamentally restricted
applications to the family of reversible Markov chains. Much less is known therefore in
the nonreversible case. The Thomson and Dirichlet principles have been established by
Doyle [4] and Gaudillière–Landim [7] and reproved in an elementary way by Slowik
[12]. As an application, Gaudillière and Landim also prove recurrence theorems in
some nonreversible systems. These studies use notions like energy, potential, and con-
ductance, but a genuine electric network is not featured behind these ideas.
In this paper, we build a full electrical framework behind nonreversible Markov
chains. The basic idea is to replace the single resistor by a nonsymmetric electrical
component. This new part consists of traditional resistors and a new voltage-multiplier
unit, which we will just call amplifier in short. As shown below, this unit is very
directly linked to how much a jump is nonreversible in the Markov chain. In par-
ticular, the amplifier becomes trivial and the network reduces to the classical resistor
http://dx.doi.org/10.4169/amer.math.monthly.123.7.657
MSC: Primary 60J10, Secondary 82C41

August–September 2016] ELECTRIC NETWORK FOR NONREVERSIBLE 657


circuit if the chain is reversible. Also, reversing a chain with respect to its stationary
distribution will simply have the effect of reversing the amplifiers.
With this new component, many of the classical analogies work out flawlessly. The
starting point is, as in the reversible case, to make voltages in the network directly
related to absorption probabilities of the Markov chain. The electric current also has
a probabilistic interpretation. Our first observation is that, despite the fact that indi-
vidual components are more complicated than a single resistor, relevant networks can
be replaced by a single effective resistance between any two vertices (or even subsets
on two different constant potentials). We derive that the effective conductance (recip-
rocal of the effective resistance) is equal to what people call capacity in the theory
of Markov chains. We show how symmetry properties and other simple observations
regarding the capacity and escape probabilities follow from the electrical point of view.
The beautiful observation of Chandra, Raghavan, Ruzzo, Smolensky, and Tiwari [2]
that connects commute times and effective resistance also generalizes without prob-
lems to the nonreversible setting. We remark here that a nice mapping of states of non-
reversible Markov chains to Euclidean space, based on commute times, had been
worked out earlier by Doyle and Steiner [6].
Problems start when we look at Rayleigh's monotonicity principle. In its simple,
naive form, monotonicity is just not true in our networks; this we demonstrate with a
counterexample. What can possibly come as a replacement is a question for the future.
As a possible first step toward answering this, we rewrite the Dirichlet and Thomson
principles by Gaudillière–Landim [7] and Slowik [12] into the electrical language. All
terms are then assigned an electrical-energetic meaning, but this has not helped our
intuition enough to come up with a sensible way of establishing monotonicity.
Finally, we give a bit of further insight into the behavior of our electric networks by
showing how series and parallel substitutions work. In connection with the lack of
(naive) monotonicity, it turns out that delta–star transformations, being essential in the
theory of resistor networks, cannot hold in general for our case.

2. AN ELECTRIC PART. We begin with describing the electric component that we
can later use in our analogy with irreversible Markov chains. The schematic picture
we use is as follows.

[Schematic: between vertices x and y, in series, a resistor R_xy/2, a voltage amplifier of parameter γ_yx, and a resistor R_xy/2; the end potentials are u_x and u_y, and the current through the unit is i_xy.]

This unit is thought of as being connected to neighboring vertices x and y of a graph.
These vertices are on respective electric potentials u_x and u_y, which induces a current
i_xy through the unit from vertex x to vertex y. We will always consider i as an antisym-
metric quantity in the sense that i_xy = −i_yx. Our unit consists of three components:
• an ordinary resistor of R_xy/2 > 0 Ohms,
• a voltage amplifier of parameter γ_yx > 0,
• another ordinary resistor of R_xy/2 Ohms.
The resistors each satisfy Ohm's classical law: The signed difference between the
potentials at their two ends is proportional to the current that flows through them,
and the rate is the value of the resistance. The resistance value R is considered as a
symmetric quantity: R_xy = R_yx. The new element is the voltage amplifier of parameter
γ_yx. It has the following characteristics:

• the current that flows into it on one end agrees with the current that comes out on
the other end;
• the potential, measured with respect to ground, from its left end (closer to x) to its
right end (closer to y) gets multiplied by the positive real parameter γ_yx.
As the definition naturally suggests, the parameter is log-antisymmetric: We always
assume γ_yx = 1/γ_xy. We will also allow the graph to have loops (edges connecting a
vertex to itself) from some vertex x to x, in which case we require γ_xx = 1.
According to the above, we now follow the potential (with respect to ground) from
left to right in the above unit. First, according to Ohm's law, a drop of i_xy·R_xy/2 in
the potential occurs on the first resistor. Then this dropped potential gets multiplied by
γ_yx. Finally, a second drop by i_xy·R_xy/2 occurs on the second resistor. Therefore,

u_y = (u_x − i_xy·R_xy/2)·γ_yx − i_xy·R_xy/2,  or
i_xy = [2C_xy/(1 + γ_yx)]·(γ_yx·u_x − u_y)   (1)

with the introduction of the (Ohmic) conductance C_xy = C_yx = 1/R_xy. Notice that the
case γ_yx = 1 reduces our unit to the classical single resistor of value R_xy. Notice also
that currents are automatically zero along loops: i_xx = 0 whenever x is a vertex with a
loop.
We write z ∼ x for neighboring vertices z and x in the graph. This includes x ∼ x
for vertices x with a loop. For later use, we introduce

θ_xy := √γ_xy = 1/θ_yx,  D_xy := C_xy·2√γ_xy/(1 + γ_xy) = D_yx,  D_x := Σ_{z∼x} D_xz·θ_zx.   (2)

The symmetry of the matrix D follows from that of C and log-antisymmetry of γ and
θ. With these quantities, we rewrite the above as

i_xy = D_xy·(θ_yx·u_x − θ_xy·u_y).   (3)
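The equivalence of (1) and (3) is a two-line computation. The following sympy sketch (ours, not the authors') checks it for a single edge, writing gamma for γ_yx.

```python
import sympy as sp

u_x, u_y = sp.symbols('u_x u_y')
C, g = sp.symbols('C gamma', positive=True)   # g stands for gamma_{yx}

# form (1)
i1 = 2*C/(1 + g) * (g*u_x - u_y)

# form (3): theta_yx = sqrt(g), theta_xy = 1/sqrt(g); the symmetric weight
# D_xy = C * 2*sqrt(gamma_xy)/(1 + gamma_xy) simplifies to 2*C*sqrt(g)/(1 + g)
D = 2*C*sp.sqrt(g)/(1 + g)
i2 = D * (sp.sqrt(g)*u_x - u_y/sp.sqrt(g))

print(sp.simplify(i1 - i2))   # 0
```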

We emphasise that the voltage amplifier is not a natural object. Sophisticated engi-
neering would be required to build a black box with these characteristics, and this
black box would require an outer energy source (or energy absorber) for its operation.
We do not consider this energy source (or absorber) as part of our network.

Two alternative parts. Two alternative units will facilitate calculations in our net-
works. Using these is not required for any of the later arguments but simplifies matters.
Consider

[Schematic, left to right: the original unit (resistor R_xy/2, amplifier γ_yx, resistor R_xy/2); the primer unit (a single resistor R^pr_yx followed by an amplifier γ^pr_yx); and the secunder unit (an amplifier γ^se_yx followed by a single resistor R^se_yx). In each case the end potentials are u_x and u_y and the current is i_xy.]

which are the original unit, the primer unit, and the secunder unit, respectively. The
primer and secunder units are built of the same types of elements as before. Repeating
the arguments, we see for the latter two cases

$$u_y=(u_x-i_{xy}R^{pr}_{yx})\,\alpha^{pr}_{yx}\qquad\text{and}\qquad u_y=\alpha^{se}_{yx}u_x-i_{xy}R^{se}_{yx}.$$

Comparing this with (1), we conclude that these three units behave in a completely
identical way under the choices

$$\alpha^{pr}_{yx}=\alpha^{se}_{yx}=\alpha_{yx},\qquad R^{pr}_{yx}=R_{xy}\,\frac{\alpha_{yx}+1}{2\alpha_{yx}},\qquad R^{se}_{yx}=R_{xy}\,\frac{\alpha_{yx}+1}{2},\tag{4}$$

which we will assume whenever we write the pr or se indexed quantities. Notice that
the primer and secunder resistances are not symmetric quantities anymore.
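As a quick numerical sanity check (our own arbitrary values, not from the paper), the three units indeed produce the same output potential for the same input potential and current once (4) is imposed:

```python
# A minimal sketch (arbitrary example values, not from the paper): the
# original, primer, and secunder units respond identically once the
# parameter choices (4) are imposed.
R_xy, alpha = 3.0, 0.2      # alpha stands for alpha_yx
u_x, i_xy = 5.0, -0.4       # input potential and current through the unit

u_y_orig = (u_x - i_xy*R_xy/2)*alpha - i_xy*R_xy/2   # original unit

R_pr = R_xy * (alpha + 1) / (2*alpha)    # primer resistance from (4)
u_y_pr = (u_x - i_xy*R_pr) * alpha       # primer unit

R_se = R_xy * (alpha + 1) / 2            # secunder resistance from (4)
u_y_se = alpha*u_x - i_xy*R_se           # secunder unit

print(u_y_orig, u_y_pr, u_y_se)   # all three equal 1.72
```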

Existence and uniqueness of solutions. An electric network for our purposes con-
sists of our units placed along the edges of a finite, connected graph G = (V, E).
We allow $G$ to have loops as well. Suppose that a subset $W$ of the vertices is taken to fixed potentials $U_x$, $x\in W$. The only requirement we make is that $W$ is nonempty. We show below that there exists a unique solution of the network with these boundary values, that is, a unique set of currents $i$ with
$$\sum_{y\sim x}i_{xy}=0\qquad\text{for all }x\notin W,\tag{5}$$
and voltages $u_x$ with
$$u_x=U_x\qquad\text{for all }x\in W,$$
that satisfy (1) for all $x\sim y$. We start with uniqueness.

Proposition 1. Given the graph $G$ and the boundary set $W$, fix the boundary condition $(U_x)_{x\in W}$, and suppose that we have two solutions $u', u$ and $i', i$ with this boundary condition. Then $u'\equiv u$ and $i'\equiv i$.

Proof. As (1) is linear, the difference of two solutions is yet another solution. Therefore, $u'-u$ and $i'-i$ is another solution with boundary condition 0 for all $x\in W$. Define now the set of vertices in $V\setminus W$ where $u'-u$ is positive. If this set is nonempty, then any edge that connects it with the rest of $V$ sees an outflowing current by (1). But this contradicts (5) (summed over the vertices $x$ of this set). A similar argument shows that there are no vertices of negative potential either; thus $u'-u\equiv 0$. Then by (1) it follows that $i'-i\equiv 0$ as well.

We proceed by existence of solutions. Call the incoming current to a vertex $x\in W$ $i_x$, which is defined as
$$i_x:=\sum_{y\sim x}i_{xy}.\tag{6}$$
This is zero for all $x\notin W$.

660 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Lemma 1. Given the graph $G$ and the boundary set $W$, suppose we have a solution for all boundary conditions $(U_x)_{x\in W}$. Then for any $x\in W$, $i_x$ is an affine increasing function of $U_x$ when keeping all other boundary voltages $U_y$, $x\neq y\in W$, constant.

Proof. Fix $x\in W$; consider $U'_x>U_x$, $U'_y=U_y$ for all $x\neq y\in W$, and the corresponding solutions $u', u$ and $i', i$. Then $u'-u$ and $i'-i$ is another solution with boundary condition $U'_x-U_x>0$ for $x$ and 0 for all $x\neq y\in W$. Again, looking at the current out of the set of vertices with positive potential, it is clear that the incoming current $i'_x-i_x$ is strictly positive in this setting. Now, multiplying all of $u'-u$, $i'-i$ by any factor is yet another solution. It follows that $i'_x-i_x$ is a positive constant multiple of $U'_x-U_x$, and the proof is complete.

Proposition 2. There is a solution for any set $\emptyset\neq W\subseteq V$ and boundary condition $(U_x)_{x\in W}$.

Proof. We will call the set $V\setminus W$ the free vertices and perform an induction on its size $n=|V\setminus W|$. When $n=0$, all potentials are fixed, and the currents are simply computed by (1). Suppose now that the statement is true for $n$, and consider a set $W$ with $|V\setminus W|=n+1$. Pick any vertex $x\in V\setminus W$. Fixing all boundary values $U_y$ for $y\in W$ and also the value $U_x$, we only have $n$ free vertices, and we know by the induction hypothesis that we have a solution. We also know by the above lemma that the incoming current $i_x$ is an affine function of $U_x$. Therefore, there exists a particular value $U^0_x$ with the corresponding incoming current $i^0_x=0$, and a solution $u^0, i^0$ that goes with the boundary condition $U^0_x$ for $x$ and $U_y$ for $y\in W$. This will be a solution with boundary condition $U_y$ for $y\in W$ only, and the induction step is complete.

3. IRREVERSIBLE MARKOV CHAINS.

Absorption probabilities and the connection. In this section, we make a connection of electric networks, built of our units, to Markov chains. The novelty is that the chain does not need to be reversible. For the following proposition, two nonintersecting subsets, $A$ and $B$, of the vertex set $V$ are supposed to be connected to constant external potentials $U_A$ and $U_B$:
$$u_a\equiv U_A\quad(a\in A)\qquad\text{and}\qquad u_b\equiv U_B\quad(b\in B).$$
All other vertices are free: they just connect to neighboring ones via our units. The starting point is as follows.

Proposition 3. For every $x\notin A\cup B$, we have
$$u_x=\sum_{y\sim x}\frac{D_{xy}\lambda_{xy}}{D_x}\,u_y.\tag{7}$$

Proof. Consider a vertex $x\notin A\cup B$ and its neighboring vertices $y\sim x$, $x\neq y$. We demonstrate the situation for two neighbors with the picture


[Figure: vertex $x$ between two neighbors $y$ and $y'$: on top, the original units with amplifiers $\alpha_{xy}$, $\alpha_{xy'}$ and split resistors $R_{xy}/2$, $R_{xy'}/2$; below, the equivalent secunder rewriting with resistors $R^{se}_{xy}$ and $R^{se}_{xy'}$,]

where the second line shows an equivalent rewriting of the original setting. However, with this secunder representation, our formula follows easily after realizing that the potentials on the $x$-side of the amplifiers are $\alpha_{xy}u_y$ for the respective vertices $y$, and that, from here, the potential $u_x$ is computed using the well-known formula for a voltage divider (of conductances $C^{se}_{xy}=1/R^{se}_{xy}$).
Putting all that in formulas, we have
$$u_x=\sum_{\substack{y\sim x\\y\neq x}}\alpha_{xy}u_y\,\frac{C^{se}_{xy}}{\sum_{\substack{z\sim x\\z\neq x}}C^{se}_{xz}}
=\sum_{\substack{y\sim x\\y\neq x}}u_y\,\frac{C_{xy}\,\frac{2\alpha_{xy}}{\alpha_{xy}+1}}{\sum_{\substack{z\sim x\\z\neq x}}C_{xz}\,\frac{2}{\alpha_{xz}+1}}
=\frac{\sum_{y\neq x}u_y\,D_{xy}\lambda_{xy}}{D_x-D_{xx}\lambda_{xx}}$$
with the use of (4), (2), and $\lambda_{xx}=1$. Rearranging the equation finishes the proof.

Let $P$ be the transition probabilities of an irreducible Markov chain on the finite, connected graph $G$. Throughout this manuscript, we assume that $P_{xy}>0$ whenever $(x,y)\in E$ is an edge of the graph; totally asymmetric steps are not handled by our methods (although we suspect that a meaningful limit could be worked out for these cases). As usual, the graph has a loop on vertex $x$ whenever $P_{xx}>0$. We now give a recipe of how to build an electric network for this chain so that the resulting voltages and currents have the classical probabilistic interpretations (see, e.g., Doyle and Snell [5]). This will be done regardless of whether the chain is reversible or irreversible. The unique stationary distribution of the chain will be called $\pi$, and we now make the following choices:

$$D_{xy}:=\sqrt{\pi_xP_{xy}\,\pi_yP_{yx}};\qquad \lambda_{xy}:=\sqrt{\frac{\pi_xP_{xy}}{\pi_yP_{yx}}}.\tag{8}$$

Notice first that these choices are consistent with the respective symmetry and log-antisymmetry of $D$ and $\lambda$. It is also clear that the conductances $C$ and the amplifying factors $\alpha$ can also be expressed with the help of the above quantities. Following the definition (2), we have
$$D_x=\sum_{z\sim x}D_{xz}\lambda_{zx}=\sum_{z\sim x}\pi_zP_{zx}=\pi_x.\tag{9}$$
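To see the choices (8) in action, here is a small sketch on a toy chain of our own (a doubly stochastic, hence uniform-$\pi$, but irreversible chain on three states; not an example from the paper). It checks the symmetry of $D$, the log-antisymmetry of $\lambda$, and the identity (9):

```python
import math

# Our own toy example (not from the paper): a doubly stochastic but
# irreversible chain on 3 states, so pi is uniform.  We check that the
# choices (8) give a symmetric D, a log-antisymmetric lambda, and D_x = pi_x.
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]
pi = [1/3, 1/3, 1/3]

D = [[math.sqrt(pi[x]*P[x][y] * pi[y]*P[y][x]) for y in range(3)]
     for x in range(3)]
lam = [[math.sqrt(pi[x]*P[x][y] / (pi[y]*P[y][x])) for y in range(3)]
       for x in range(3)]

sym = max(abs(D[x][y] - D[y][x]) for x in range(3) for y in range(3))
antisym = max(abs(lam[x][y]*lam[y][x] - 1) for x in range(3) for y in range(3))
D_diag = [sum(D[x][z]*lam[z][x] for z in range(3)) for x in range(3)]

print(sym, antisym)   # both zero up to rounding
print(D_diag)         # each entry equals pi_x = 1/3, as in (9)
```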

Recall the nonintersecting subsets $A$ and $B$ of the vertex set $V$, and define, for $x\in V$, the first reaching times
$$\tau^0_A:=\inf\{t\geq 0: X(t)\in A\},\tag{10}$$
and similarly $\tau^0_B$, of these sets by the Markov chain started from $x$. When $x\in A$ ($B$), we define $\tau^0_A$ ($\tau^0_B$, respectively) to be 0. $\mathbf{P}_x$ will stand for the probabilities associated with the chain started from $x$. For short, we will set $h_x:=\mathbf{P}_x\{\tau^0_A<\tau^0_B\}$.

Theorem 1. Set up an electric network with the choices (8) as follows. Apply constant potentials $U_A\equiv 1$ on vertices of the set $A$ and $U_B\equiv 0$ on vertices of the set $B$. Moreover, make no external connections to vertices of $V\setminus(A\cup B)$. Then for every $x\in V$ we have $u_x=h_x$.

Proof. By definition, we have $h_x=1$ for vertices $x\in A$, $h_x=0$ for vertices $x\in B$, and by a first step analysis of the Markov chain,
$$h_x=\sum_{y\sim x}h_yP_{xy}$$
when $x\in V\setminus(A\cup B)$. Next, we find that by definition we have $u_x=1$ for vertices $x\in A$, $u_x=0$ for vertices $x\in B$, and by (7), (8), and (9),
$$u_x=\sum_{y\sim x}\frac{D_{xy}\lambda_{xy}}{D_x}\,u_y=\sum_{y\sim x}u_yP_{xy}$$
when $x\in V\setminus(A\cup B)$. Thus, $h$ and $u$ satisfy the same (well-defined) equations with the same boundary conditions; therefore, they agree on all vertices.
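A minimal numerical illustration of Theorem 1 (again on our own toy three-state chain, not an example from the paper), with $A=\{0\}$ and $B=\{2\}$: iterating the probabilistic first-step equation for $h$ and the electrical equation (7) for $u$ gives the same value at the free vertex.

```python
import math

# Our own toy 3-state example of Theorem 1: boundary potential 1 on A = {0}
# and 0 on B = {2}; vertex 1 is free.  We iterate the first-step equation
# for h and the electrical equation (7) for u; both converge to 1/3.
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]
pi = [1/3, 1/3, 1/3]
D = [[math.sqrt(pi[x]*P[x][y]*pi[y]*P[y][x]) for y in range(3)] for x in range(3)]
lam = [[math.sqrt(pi[x]*P[x][y]/(pi[y]*P[y][x])) for y in range(3)] for x in range(3)]
D_diag = [sum(D[x][z]*lam[z][x] for z in range(3)) for x in range(3)]  # = pi, by (9)

h = [1.0, 0.5, 0.0]   # values on A and B are fixed; the free value converges
u = [1.0, 0.5, 0.0]
for _ in range(100):
    h[1] = sum(P[1][y]*h[y] for y in range(3))                      # first step
    u[1] = sum(D[1][y]*lam[1][y]/D_diag[1]*u[y] for y in range(3))  # eq. (7)

print(h[1], u[1])   # both approximately 1/3
```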

A nice consequence of the analogy is what happens to our electric network when we reverse our Markov chain. The reversed Markov chain has the same stationary distribution $\pi_x$ as the original one, and its transition probabilities become
$$\hat P_{xy}=\frac{\pi_y}{\pi_x}\,P_{yx}.$$
We will simply call the network that corresponds to the reversed chain the reversed network, and its parameters will be marked by hats. They are
$$\hat D_{xy}=\sqrt{\pi_x\hat P_{xy}\,\pi_y\hat P_{yx}}=\sqrt{\pi_yP_{yx}\,\pi_xP_{xy}}=D_{xy};\qquad
\hat\lambda_{xy}=\sqrt{\frac{\pi_x\hat P_{xy}}{\pi_y\hat P_{yx}}}=\sqrt{\frac{\pi_yP_{yx}}{\pi_xP_{xy}}}=\lambda_{yx}=\frac{1}{\lambda_{xy}}.\tag{11}$$
This also implies $\hat C_{xy}=C_{xy}$ and $\hat\alpha_{xy}=\alpha_{yx}=1/\alpha_{xy}$ for the reversed network; in other words, reversing the Markov chain simply reverses the direction of our voltage amplifiers while keeping the resistance values intact. A Markov chain is reversible if and only if the corresponding network has all its amplifiers with $\alpha_{xy}\equiv 1$. Indeed, an amplifier of parameter 1 is just a plain wire. Therefore, this case reduces to the classical reversible setting with ordinary resistors on the edges.

Markovian networks. Notice that $P$ being a Markov transition probability imposes restrictions on our electric network. From now on, $\sum_{z\sim x\in V}$ will be our notation for double summation on all neighboring vertices $x$ and $z$ in $V$.



Theorem 2. Suppose that we are given an electric network built of our components on the edges of the finite connected graph $G=(V,E)$. There is an irreducible Markov chain of graph $G$ and transition probabilities $P$ such that (8) holds if and only if we have both
$$\sum_{z\sim x}D_{xz}\lambda_{xz}=\sum_{z\sim x}D_{xz}\lambda_{zx}\quad(x\in V),\qquad\text{and}\tag{12}$$
$$\sum_{z\sim x\in V}D_{xz}\lambda_{zx}=1.\tag{13}$$
In this case,
$$\pi_x=\sum_{z\sim x}D_{xz}\lambda_{zx}\qquad\text{and}\qquad P_{xy}=\frac{D_{xy}\lambda_{xy}}{\sum_{z\sim x}D_{xz}\lambda_{zx}}.\tag{14}$$

Proof. If (8) holds for a Markov transition probability $P$, then the above formulas follow from direct verification. Conversely, if (12) and (13) hold, then we make the definition
$$P_{xy}=\frac{D_{xy}\lambda_{xy}}{\sum_{z\sim x}D_{xz}\lambda_{zx}},$$
and notice that, by (12),
$$\sum_{y\sim x}P_{xy}=\frac{\sum_{y\sim x}D_{xy}\lambda_{xy}}{\sum_{z\sim x}D_{xz}\lambda_{zx}}=1\quad(x\in V),$$
which shows that $P$ is a Markov transition probability matrix. It is irreducible by positivity of our parameters, and its stationary distribution is the unique vector $\pi$ with
$$\pi_y=\sum_{x\sim y}\pi_xP_{xy}=\sum_{x\sim y}\pi_x\,\frac{D_{xy}\lambda_{xy}}{\sum_{z\sim x}D_{xz}\lambda_{zx}}\quad(y\in V)\qquad\text{and}\qquad\sum_{x\in V}\pi_x=1.$$


Notice, however, that the vector Dx z zx xV
satisfies the same properties:
zx

  Dx y x y  
Dx z zx Px y = Dx z zx  = Dx y x y = D yx x y
xy zx xy zx
Dxw wx xy xy
wx

by the symmetry of D, and



Dx z zx = 1
zxV

by (13). Therefore, these two vectors agree.

Remark. The normalization (13) is just an artificial choice and is not essential at all.
Given an electric network, multiplying every resistor value by the same constant K
while keeping the amplifiers unchanged will result in the same voltages everywhere
with currents multiplied by 1/K . In particular, Theorem 1 holds true in this case.

Remark. The condition (12) is, on the other hand, very essential, and we will refer to networks with this property as Markovian. On a technical level, it states that we can extend the definition (9) of $D$ by
$$D_x=\sum_{z\sim x}D_{xz}\lambda_{zx}=\sum_{z\sim x}D_{xz}\lambda_{xz}.$$
Notice also that this implies
$$D_x=\sum_{z\sim x}D_{xz}\,\frac{\lambda_{zx}+\lambda_{xz}}{2}=\sum_{z\sim x}C_{xz}.\tag{15}$$
The Markovian property also has a rather intuitive meaning: Considering (7), it states that the constant potential $u_x\equiv U$ for all vertices is a valid solution of the (free) network.

This was, of course, trivially true for the all-resistors networks that correspond to reversible Markov chains. Consider a set of connected resistors, and apply potential $U$ on one of the vertices. Then all vertices will stay at potential $U$, with no current flowing anywhere in the network. This is not at all straightforward with our generalized networks of resistors and amplifiers. Applying potential $U$ on one of the vertices, the amplifiers will change voltages for different parts of the network, and this can keep up currents in the cycles of the graph $G$. The Markovian property says that, nevertheless, each vertex will still stay at the same potential $U$ even if circular (that is, divergence-free) currents flow in the system.
A classical result for Markov chains follows easily from the analogy.

Corollary 1. A Markov chain is reversible if and only if for every closed cycle $x_0,x_1,x_2,\dots,x_n=x_0$ in the graph $G$ we have
$$P_{x_0x_1}P_{x_1x_2}\cdots P_{x_{n-1}x_0}=P_{x_0x_{n-1}}P_{x_{n-1}x_{n-2}}\cdots P_{x_1x_0}.$$
In particular, any Markov chain on a finite connected tree $G$ is necessarily reversible.

Proof. Rewriting the above formula and using (14) together with the symmetry of $D$, we arrive at the equivalent statement
$$\lambda_{x_0x_1}\lambda_{x_1x_2}\cdots\lambda_{x_{n-1}x_0}=\lambda_{x_0x_{n-1}}\lambda_{x_{n-1}x_{n-2}}\cdots\lambda_{x_1x_0},\qquad\text{or}\qquad
\alpha_{x_0x_1}\alpha_{x_1x_2}\cdots\alpha_{x_{n-1}x_0}=1.$$

This is of course trivially true in the reversible case where all of the amplifiers have $\alpha_{xy}=1$. For the other direction, assume now that the above formula holds, and turn it into electrical language. It says that the total multiplication factor of the potentials is one along any closed cycle of the circuit. It follows that, fixing one vertex at potential $U$, zero currents everywhere in the network is a solution. By uniqueness, this is the only solution. The network being Markovian, on the other hand, tells us that every vertex has to be on potential $U$. With no currents, the only way this can happen is that all of the amplifiers have parameter one, and the chain is reversible.

A similar argument works directly for the tree. Since there are no cycles, no current can flow if only one vertex is fixed at potential $U$. The Markovian property again tells us that every vertex will be on potential $U$, which again means parameter one for all of the amplifiers and thus reversibility of the chain.
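Corollary 1's cycle criterion is easy to test numerically. A sketch on our own toy chain (not an example from the paper):

```python
# Our own toy example of Corollary 1's cycle criterion: for this
# irreversible chain the two orientations of the cycle 0 -> 1 -> 2 -> 0
# give different probability products, so the chain cannot be reversible.
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]

forward  = P[0][1] * P[1][2] * P[2][0]   # 0 -> 1 -> 2 -> 0
backward = P[0][2] * P[2][1] * P[1][0]   # 0 -> 2 -> 1 -> 0
print(forward, backward)   # roughly 0.216 versus 0.027

# On a tree there is no cycle to test, matching the corollary's last claim.
```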

Effective resistance, capacity, and escape probabilities. In this section, we make sense of effective resistance in our network and give it a probabilistic interpretation similar to that of the classical case. The setting is the one of Proposition 3: two disjoint subsets $A$ and $B$ of the vertices are forced to be on constant potentials $U_A$ and $U_B$, respectively. Define the total incoming current to the set $A$ (cf. (6)) as
$$i_A:=\sum_{x\in A}i_x=\sum_{y\sim x\in A}i_{xy}.\tag{16}$$
Notice that, by conservation of currents, this agrees with the sum of currents of edges across the boundary of $A$, and also $i_A+i_B=0$. The existence of the effective resistance between sets $A$ and $B$ means that the network between these sets can be replaced by a single resistor. This is not true for arbitrary configurations since the amplifiers in general push the characteristics away from that of a single resistor. It is, however, true for networks that match a Markov chain; this is formulated in the next theorem.

Theorem 3. In a Markovian electric network, for any disjoint $A,B\subseteq V$ there is a constant $R^{\mathrm{eff}}_{AB}>0$ such that
$$U_A-U_B=R^{\mathrm{eff}}_{AB}\,i_A\qquad(U_A,U_B\in\mathbf{R}).$$

Proof. The proof will again proceed along the lines of linearity. When $U_A=U_B$, then we just have the Markovian solution with zero incoming currents, thus $i_A=0$ and everything is trivial. Suppose that we are given arbitrary reals $U'_A\neq U'_B$, $U_A\neq U_B$. We consider two solutions of our network: the one $u', i'$ that satisfies the given boundary conditions $u'_x\equiv U'_A$ for $x\in A$ and $u'_x\equiv U'_B$ for $x\in B$, and one that comes from the Markovian property: $u^M_x\equiv U'_B$, $i^M_x\equiv 0$ (incoming currents to vertex $x$, not to be mixed with currents $i^M_{xy}$ of edges!) for all $x\in V$. We think of this latter one as a solution with boundary conditions $u^M_x\equiv U'_B$ for all $x\in A\cup B$.

The difference $u'-u^M$ and $i'-i^M$ of these two is yet another solution due to linearity. It has boundary conditions $u'_x-u^M_x\equiv U'_A-U'_B$ on $x\in A$ and $u'_x-u^M_x\equiv U'_B-U'_B=0$ on $x\in B$, and notice that the incoming current to the set $A$ is still $i'_A-i^M_A=i'_A-0=i'_A$.

Again, by linearity, every current and potential can be multiplied by the factor
$$\frac{U_A-U_B}{U'_A-U'_B}$$
and we still have a valid system. This looks like
$$(u'-u^M)\,\frac{U_A-U_B}{U'_A-U'_B},\qquad (i'-i^M)\,\frac{U_A-U_B}{U'_A-U'_B}$$
and therefore has boundary conditions
$$(U'_A-U'_B)\,\frac{U_A-U_B}{U'_A-U'_B}=U_A-U_B\ \text{ on }x\in A,\qquad\text{and}\qquad 0\cdot\frac{U_A-U_B}{U'_A-U'_B}=0\ \text{ on }x\in B,$$
with incoming current $i'_A\,\frac{U_A-U_B}{U'_A-U'_B}$ on the set $A$.

Finally, add the Markovian solution with constant potential $\tilde u^M\equiv U_B$ everywhere, and $\tilde i^M$ with zero incoming currents in all vertices. This results in
$$(u'-u^M)\,\frac{U_A-U_B}{U'_A-U'_B}+\tilde u^M,\qquad (i'-i^M)\,\frac{U_A-U_B}{U'_A-U'_B}+\tilde i^M$$
and therefore has boundary conditions
$$U_A-U_B+U_B=U_A\ \text{ on }x\in A,\qquad\text{and}\qquad 0+U_B=U_B\ \text{ on }x\in B,$$
with incoming current $i'_A\,\frac{U_A-U_B}{U'_A-U'_B}+0=i'_A\,\frac{U_A-U_B}{U'_A-U'_B}$ on the set $A$. We have thus produced the solution for the boundary conditions $U_A$ and $U_B$ and concluded that the corresponding incoming current to the set $A$ is
$$i_A=i'_A\,\frac{U_A-U_B}{U'_A-U'_B}.$$
This is equivalent to the statement of the theorem, i.e., the ratio $(U_A-U_B)/i_A$ is a constant for all boundary potentials $U_A$ and $U_B$.
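Here is a small numerical sketch of Theorem 3 (on our own toy three-state chain with $A=\{0\}$, $B=\{2\}$; not an example from the paper): the ratio $(U_A-U_B)/i_A$ does not depend on the boundary values.

```python
import math

# Our own toy 3-state example of Theorem 3: for two different boundary
# pairs the ratio (U_A - U_B)/i_A comes out the same; this common value is
# the effective resistance.
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]
pi = [1/3, 1/3, 1/3]
D = [[math.sqrt(pi[x]*P[x][y]*pi[y]*P[y][x]) for y in range(3)] for x in range(3)]
lam = [[math.sqrt(pi[x]*P[x][y]/(pi[y]*P[y][x])) for y in range(3)] for x in range(3)]

def ratio(UA, UB):
    # the single free vertex 1 is harmonic: u_1 = sum_y P_1y u_y
    u1 = (P[1][0]*UA + P[1][2]*UB) / (1 - P[1][1])
    u = [UA, u1, UB]
    # incoming current to A = {0}, edge currents from (3)
    iA = sum(D[0][y]*(lam[y][0]*u[0] - lam[0][y]*u[y]) for y in range(3))
    return (UA - UB) / iA

print(ratio(1.0, 0.0), ratio(7.0, 3.0))   # equal; R_eff = 30/7 here
```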

A Markovian network has a further peculiar property. The effective resistance $R^{\mathrm{eff}}_{AB}$ stays the same if we reverse each of the amplifiers. Recall that the network then turns into that of the reversed Markov chain. To prove this property, we follow Slowik's argument [12] and first define the one-step Markov generator $L$ on functions $f:V\to\mathbf{R}$ as follows:
$$(Lf)_x=\sum_{y\sim x}P_{xy}(f_y-f_x).$$
Rewriting this via (14) into electrical terms and then applying (12), we get
$$(Lf)_x=-\sum_{y\sim x}\frac{D_{xy}\lambda_{xy}}{D_x}(f_x-f_y)=-\frac{1}{D_x}\sum_{y\sim x}D_{xy}(\lambda_{yx}f_x-\lambda_{xy}f_y).$$
This latter formula is meaningful in electrical terms, as soon as we imagine $f$ as a potential applied on vertices of the graph, and define the resulting currents
$$i^f_{xy}:=D_{xy}(\lambda_{yx}f_x-\lambda_{xy}f_y)\tag{17}$$
via (3). We thus see, comparing to (6), that
$$(Lf)_x=-\frac{i^f_x}{D_x}$$
where $i^f_x$ is the current we are required to pump in vertex $x$ in order to maintain potential $f_x$. The quantity
$$\mathcal{E}(f):=-\sum_{x\in V}\pi_xf_x(Lf)_x=\sum_xf_xi^f_x$$
is referred to as the energy associated to the pair $P$, $\pi$, and we now see that it is the total electric power we need to pump in the system in order to maintain potential $f_x$ at each vertex $x$. (As usual, we do not count the external energy sources (absorbers) required by the amplifiers to work.)
With this preparation, we now prove the following.

Proposition 4. Reversing a Markovian network does not affect the effective resistance. In other words,
$$\hat R^{\mathrm{eff}}_{AB}=R^{\mathrm{eff}}_{AB}.$$

Proof. We repeat Slowik's arguments [12] in the electrical language. Take two functions $f$ and $g$ on $V$, and apply (12) in the first term and symmetry of the double summation and of $D$ in the second term below:
$$\begin{aligned}
\sum_xf_xi^g_x&=\sum_{x\in V}f_x\sum_{y\sim x}D_{xy}(\lambda_{yx}g_x-\lambda_{xy}g_y)\\
&=\sum_{x\in V}f_xg_x\sum_{y\sim x}D_{xy}\lambda_{yx}-\sum_{y\sim x\in V}f_xD_{xy}\lambda_{xy}g_y\\
&=\sum_{x\in V}f_xg_x\sum_{y\sim x}D_{xy}\lambda_{xy}-\sum_{y\sim x\in V}g_xD_{xy}\lambda_{yx}f_y\\
&=\sum_{x\in V}g_x\sum_{y\sim x}D_{xy}(\lambda_{xy}f_x-\lambda_{yx}f_y)=\sum_xg_x\hat i^f_x.
\end{aligned}\tag{18}$$
(This equation is the electrical way of saying that the adjoint of the generator is the one of the reversed process.) As before, fix the boundary conditions $u_x\equiv 1\equiv\hat u_x$ on $x\in A$ and $u_x\equiv 0\equiv\hat u_x$ on $x\in B$ for two scenarios: $u, i$ of the original network and $\hat u, \hat i$ of the reversed one. This latter has all its amplifiers reversed, and it corresponds to the reversed Markov chain. We claim that in our situation $i^u_x\equiv 0\equiv\hat i^{\hat u}_x$ on $x\notin A\cup B$ since these are free vertices. This, together with the common boundary condition for the two networks, implies
$$\mathcal{E}(u)=\sum_{x\in V}u_xi^u_x=\sum_{x\in A}u_xi^u_x=\sum_{x\in A}\hat u_xi^u_x=\sum_{x\in V}\hat u_xi^u_x
=\sum_{x\in V}u_x\hat i^{\hat u}_x=\sum_{x\in A}u_x\hat i^{\hat u}_x=\sum_{x\in A}\hat u_x\hat i^{\hat u}_x=\sum_{x\in V}\hat u_x\hat i^{\hat u}_x=\hat{\mathcal{E}}(\hat u).$$
Rewriting the power we apply to maintain our boundary conditions gives
$$C^{\mathrm{eff}}_{AB}=(U_A-U_B)^2\,C^{\mathrm{eff}}_{AB}=\mathcal{E}(u)=\hat{\mathcal{E}}(\hat u)=(U_A-U_B)^2\,\hat C^{\mathrm{eff}}_{AB}=\hat C^{\mathrm{eff}}_{AB}.$$
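Proposition 4 can be checked numerically on our own toy three-state chain (not an example from the paper): the power maintaining the boundary conditions is the same in the original and in the reversed network. Since $\pi$ is uniform for this chain, $\hat P$ is simply the transpose of $P$.

```python
# Our own toy example of Proposition 4.  The external current needed at x
# to maintain a potential f is i_x = pi_x (f_x - sum_y Q_xy f_y), as
# derived from the generator; we evaluate the energy for P and for the
# reversed chain P-hat (the transpose here, since pi is uniform).
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]
pi = [1/3, 1/3, 1/3]
Phat = [[P[y][x] for y in range(3)] for x in range(3)]

def energy(Q):
    # boundary u_0 = 1, u_2 = 0; the free vertex 1 is harmonic for Q
    u1 = Q[1][0] / (1 - Q[1][1])
    u = [1.0, u1, 0.0]
    i = [pi[x] * (u[x] - sum(Q[x][y]*u[y] for y in range(3)))
         for x in range(3)]
    return sum(u[x]*i[x] for x in range(3))

print(energy(P), energy(Phat))   # both equal 7/30
```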

Next, we introduce what is called the capacity in the theory of Markov chains and show that it has close connections to the effective resistance. We again assume that $A,B\subseteq V$ are nonempty, disjoint, and follow Gaudillière–Landim and Slowik [7, 12] by defining
$$\tau_A:=\inf\{t>0: X(t)\in A\}$$
(cf. (10)) and
$$\mathrm{cap}(A,B):=\sum_{x\in A}\pi_x\,\mathbf{P}_x\{\tau_B<\tau_A\}.$$

Proposition 5. The above capacity is simply the effective conductance $C^{\mathrm{eff}}_{AB}=1/R^{\mathrm{eff}}_{AB}$ between the sets $A$ and $B$.

Proof. We use the analogy set up in Theorem 1 as
$$\begin{aligned}
\mathrm{cap}(A,B)&=\sum_{x\in A}\pi_x\sum_{y\sim x}P_{xy}\,\mathbf{P}_y\{\tau^0_B<\tau^0_A\}\\
&=\sum_{y\sim x\in A}\pi_xP_{xy}\,(1-\mathbf{P}_y\{\tau^0_A<\tau^0_B\})\\
&=\sum_{y\sim x\in A}D_{xy}\lambda_{xy}(1-u_y)=\sum_{y\sim x\in A}D_{xy}(\lambda_{yx}\cdot 1-\lambda_{xy}u_y)\\
&=\sum_{y\sim x\in A}D_{xy}(\lambda_{yx}u_x-\lambda_{xy}u_y)=\sum_{y\sim x\in A}i_{xy}\\
&=i_A=U_A\,C^{\mathrm{eff}}_{AB}=C^{\mathrm{eff}}_{AB}.
\end{aligned}$$
Along the way, we also used (12), the fact that $u_x\equiv U_A=1$ for all $x\in A$, and finally (3).
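On our own toy three-state chain (not an example from the paper), Proposition 5 can be checked directly: the probabilistic capacity and the incoming current under unit boundary potentials agree.

```python
import math

# Our own toy example of Proposition 5 with A = {0}, B = {2}: cap(A, B)
# computed probabilistically versus the incoming current i_A under
# potentials 1 on A and 0 on B.
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]
pi = [1/3, 1/3, 1/3]

h1 = P[1][0] / (1 - P[1][1])                         # P_1{tau0_A < tau0_B}
cap = pi[0] * (P[0][1]*(1 - h1) + P[0][2])           # pi_0 P_0{tau_B < tau_A}

D = [[math.sqrt(pi[x]*P[x][y]*pi[y]*P[y][x]) for y in range(3)] for x in range(3)]
lam = [[math.sqrt(pi[x]*P[x][y]/(pi[y]*P[y][x])) for y in range(3)] for x in range(3)]
u = [1.0, h1, 0.0]                                   # Theorem 1: u_x = h_x
iA = sum(D[0][y]*(lam[y][0]*u[0] - lam[0][y]*u[y]) for y in range(3))

print(cap, iA)   # both equal 7/30
```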

It follows immediately that the capacity is a symmetric quantity in its two arguments $A$ and $B$. The identity $\mathrm{cap}(A,B)=\widehat{\mathrm{cap}}(B,A)$ also follows from the previous proposition.

Remark. Gaudillière–Landim and Slowik [7, 12] also establish
$$\mathrm{cap}(A,B)=\frac12\sum_{x\sim y\in V}\pi_xP^s_{xy}(h_x-h_y)^2,$$
with the symmetrized transitions $P^s_{xy}=\frac12(P_{xy}+\hat P_{xy})$. Equation (14) together with (11) and (2) gives
$$P^s_{xy}=\frac{D_{xy}\lambda_{xy}+D_{xy}\lambda_{yx}}{2D_x}=\frac{C_{xy}}{D_x},\tag{19}$$



and the capacity gets another interesting interpretation. In the setting of Theorem 1,
$$\mathrm{cap}(A,B)=\frac12\sum_{x\sim y\in V}C_{xy}(u_x-u_y)^2,\tag{20}$$
the ohmic power loss on the resistors, should we apply the actual voltages $u$ on them without the amplifiers. It is important to note that this interpretation is nonphysical. With the amplifiers the ohmic losses are not given by the above formula; without the amplifiers, the voltages $u$ would be totally different.

The capacity, being $C^{\mathrm{eff}}_{AB}$, is, however, equal to the total power $(U_A-U_B)^2C^{\mathrm{eff}}_{AB}=(1-0)^2C^{\mathrm{eff}}_{AB}$ we need to pump in the set $A$ to keep it on potential $U_A=1$.

We repeat the computation for (20) in the electrical language. First, notice that, by (17) and (2),
$$\frac12(i^u_{xy}+\hat i^u_{xy})=D_{xy}\,\frac{\lambda_{yx}+\lambda_{xy}}{2}\,(u_x-u_y)=C_{xy}(u_x-u_y):$$
fixing the voltages everywhere, the average of the current and the reversed current is the one of the network without amplifiers. This is the starting point to expand the right-hand side of (20):

$$\begin{aligned}
\frac12&\sum_{x\sim y\in V}C_{xy}(u_x-u_y)^2
=\frac14\sum_{x\sim y\in V}(u_x-u_y)(i^u_{xy}+\hat i^u_{xy})\\
&=\frac14\sum_{x\sim y\in V}u_xi^u_{xy}+\frac14\sum_{x\sim y\in V}u_x\hat i^u_{xy}-\frac14\sum_{x\sim y\in V}u_yi^u_{xy}-\frac14\sum_{x\sim y\in V}u_y\hat i^u_{xy}\\
&=\frac14\sum_{x\sim y\in V}u_xi^u_{xy}+\frac14\sum_{x\sim y\in V}u_x\hat i^u_{xy}+\frac14\sum_{x\sim y\in V}u_yi^u_{yx}+\frac14\sum_{x\sim y\in V}u_y\hat i^u_{yx}\\
&=\frac14\sum_{x\in V}u_xi^u_x+\frac14\sum_{x\in V}u_x\hat i^u_x+\frac14\sum_{y\in V}u_yi^u_y+\frac14\sum_{y\in V}u_y\hat i^u_y
=\sum_{x\in V}u_xi^u_x
\end{aligned}$$
with the use of the adjoint identity (18). Now, as in the proof of Proposition 4, apply our usual boundary conditions, and the right-hand side becomes the total power required to maintain the boundary conditions or, equivalently, $C^{\mathrm{eff}}_{AB}$.
Finally, we define the escape probability from the set $A$ and show its connection to the effective resistance. This goes exactly as in the reversible case. Suppose that the Markov chain is started from its stationary distribution $\pi$, conditioned on being in the set $A$. (When $A=\{a\}$ is a singleton, this is just the unit mass on the vertex $a$.) The escape probability is the chance that the chain reaches set $B$ before its first return to $A$:
$$\mathbf{P}\{\tau_B<\tau_A\}=\sum_{x\in A}\frac{\pi_x}{\pi(A)}\,\mathbf{P}_x\{\tau_B<\tau_A\}=\frac{\mathrm{cap}(A,B)}{\pi(A)}=\frac{C^{\mathrm{eff}}_{AB}}{\sum_{z\in A}D_z}=\frac{C^{\mathrm{eff}}_{AB}}{\sum_{y\sim z\in A}C_{zy}}.$$

The last step used (15). The right-hand side agrees word for word with the classical reversible result, the starting point of elegant recurrence-transience proofs.
By symmetrizing a Markov chain we mean replacing its transition probabilities by $P^s$.

Corollary 2. Symmetrizing a Markov chain never increases the escape probabilities.

Proof. By symmetrizing, we keep the stationary distribution $\pi=D$ and the conductances $C$ unchanged while the amplifiers all become trivial: $\alpha\equiv 1$. This can easily be seen via (11) and (19). Denote the potentials that result from our usual boundary conditions $U_A\equiv 1$ and $U_B\equiv 0$ by $u$ in the original network and by $u^s$ in the symmetrized one. The classical Dirichlet principle for the reversible case tells us that $u^s$ is the potential that minimizes the ohmic power losses in the resistors for the reversible chain. Therefore,
$$\frac12\sum_{x\sim y\in V}C_{xy}(u^s_x-u^s_y)^2\leq\frac12\sum_{x\sim y\in V}C_{xy}(u_x-u_y)^2.$$
Since the conductances agree for the two networks, the left-hand side is the capacity of the symmetrized network, while the right-hand side is the one of the original network. Dominance of the capacities then implies that of the escape probabilities as the conductances are not changed by symmetrizing.
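Corollary 2 is visible numerically on our own toy three-state chain (not an example from the paper):

```python
# Our own toy example of Corollary 2: escape probability from A = {0} to
# B = {2} for the original chain versus its symmetrization
# P^s = (P + P-hat)/2.  Here pi is uniform, so P-hat is the transpose of P.
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]
Ps = [[(P[x][y] + P[y][x]) / 2 for y in range(3)] for x in range(3)]

def escape(Q):
    # probability, starting from 0, of hitting 2 before returning to 0
    h1 = Q[1][0] / (1 - Q[1][1])          # P_1{reach 0 before 2}
    return Q[0][1] * (1 - h1) + Q[0][2]

print(escape(P), escape(Ps))   # roughly 0.7 versus 0.675: not increased
```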

The current. In this section, we give a probabilistic interpretation of the currents $i$ of the network. We take a singleton set $A=\{a\}$ and another, arbitrary set $B\not\ni a$. Start the Markov chain from $a$, and define $v_x$ as the expected number of visits to vertex $x$ before the first hitting of the set $B$. Clearly, $v_x\equiv 0$ for all $x\in B$. A last step analysis shows that for any $a\neq x\notin B$ we have
$$v_x=\sum_{y\sim x}v_yP_{yx}=\sum_{y\sim x}\frac{\pi_x}{\pi_y}\,v_y\hat P_{xy}.$$

In other words, $v_x/\pi_x$ is harmonic w.r.t. $\hat P$. The boundary conditions are $v_a/\pi_a$, fixed (to be determined later), and $v_x/\pi_x\equiv 0$ on $x\in B$. Consider now the corresponding electric network of parameters $C$ and $\hat\alpha$ with these boundary conditions. It follows that its potentials $\hat u_x$ agree with $v_x/\pi_x$ for all $x\in V$, and its currents are (see (3))
$$\hat i_{xy}=D_{xy}\Bigl(\hat\lambda_{yx}\frac{v_x}{\pi_x}-\hat\lambda_{xy}\frac{v_y}{\pi_y}\Bigr)
=D_{xy}\Bigl(\lambda_{xy}\frac{v_x}{\pi_x}-\lambda_{yx}\frac{v_y}{\pi_y}\Bigr)=v_xP_{xy}-v_yP_{yx}.$$

This latter is the expected number of jumps from $x$ to $y$ minus that from $y$ to $x$ before absorption in $B$. It remains to fix the boundary term $\hat u_a=v_a/\pi_a$. This is done by the simple observation that the chain has to exit vertex $a$ one more time than enter it; thus,
$$\hat i_a=\sum_{y\sim a}\hat i_{ay}=1.$$
This fixes a unique potential $\hat u_a=v_a/\pi_a$ for the boundary. Everything is analogous to the classical reversible case, except that we need to use the reversed network.



Commute times and costs. We have found that a classical result of Chandra, Raghavan, Ruzzo, Smolensky, and Tiwari [2] on effective resistance and commute times (or costs) can be extended easily to the irreversible case. Work on commute times in this case has also been done by Doyle and Steiner [6]. Fix two vertices $a\neq b$ of the graph and a cost function $k_{xy}$ on edges of the graph. Costs $k_{xy}$ and $k_{yx}$ can be different; we do not require any relation between these two. The expected cost of the chain from $a$ to $b$ is the expected price to pay until the first hitting of the chain to vertex $b$ if started from $a$. It is defined as
$$H^k_{ab}:=\mathbf{E}_a\sum_{t=1}^{\tau^0_b}k_{X(t-1)X(t)},$$
where we (ab)used definition (10) (by writing $\tau^0_b$ for $\tau^0_{\{b\}}$). We consider an empty sum to be zero for the case $H^k_{aa}=0$. In particular, for $k\equiv 1$ we arrive at the expected hitting time $H^1_{ab}=\mathbf{E}_a(\tau^0_b)$. Define the expected commute cost $K^k_{ab}=H^k_{ab}+H^k_{ba}$; this becomes the expected commute time for $k\equiv 1$.

Theorem 4. The expected commute cost can be computed by
$$K^k_{ab}=R^{\mathrm{eff}}_{ab}\,D^k,\qquad\text{with}\quad D^k:=\sum_{y\sim x\in V}D_{xy}\lambda_{xy}k_{xy}.$$

We spell out the case $k\equiv 1$. The expected commute time is
$$K^1_{ab}=R^{\mathrm{eff}}_{ab}\,D^1=R^{\mathrm{eff}}_{ab}\sum_{y\sim x\in V}D_{xy}\lambda_{xy}=R^{\mathrm{eff}}_{ab}\sum_{x\in V}D_x=R^{\mathrm{eff}}_{ab}\sum_{z\sim x\in V}C_{xz}$$
via (15). This is the exact same formula as the one of [2] for the reversible case.
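A numerical sketch of the $k\equiv 1$ case on our own toy three-state chain (not an example from the paper): under the chain's normalization (13), the total conductance $\sum_{z\sim x\in V}C_{xz}$ equals $\sum_x\pi_x=1$, so the expected commute time should equal $R^{\mathrm{eff}}_{ab}$ itself.

```python
# Our own toy example of Theorem 4 with k = 1, between a = 0 and b = 2.
P = [[0.1, 0.6, 0.3],
     [0.3, 0.1, 0.6],
     [0.6, 0.3, 0.1]]
pi = [1/3, 1/3, 1/3]

def expected_hitting(Q, target):
    # first-step equations for the two non-target states, via Cramer's rule
    a, b = [x for x in range(3) if x != target]
    A11, A12 = 1 - Q[a][a], -Q[a][b]
    A21, A22 = -Q[b][a], 1 - Q[b][b]
    det = A11*A22 - A12*A21
    return {a: (A22 - A12)/det, b: (A11 - A21)/det, target: 0.0}

commute = expected_hitting(P, 2)[0] + expected_hitting(P, 0)[2]

h1 = P[1][0] / (1 - P[1][1])                    # escape-probability route
cap = pi[0] * (P[0][1]*(1 - h1) + P[0][2])      # C_eff between {0} and {2}
total_C = sum(pi[x]*(P[x][y] + P[y][x])/2       # sum of C_xy = pi_x P^s_xy
              for x in range(3) for y in range(3))

print(commute, (1/cap)*total_C)   # both equal 30/7
```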

Proof. We start with a first step analysis and write, for any $x\neq b$,
$$H^k_{xb}=\sum_{y\sim x}P_{xy}(k_{xy}+H^k_{yb})=\sum_{y\sim x}\frac{D_{xy}\lambda_{xy}}{D_x}(k_{xy}+H^k_{yb}).\tag{21}$$
In our electric network, impose boundary conditions $u_x=H^k_{xb}$ on each vertex $x$. This results in currents
$$i_{xy}=D_{xy}(\lambda_{yx}H^k_{xb}-\lambda_{xy}H^k_{yb}),$$
according to (3), and, via (21), the necessity of pumping external currents
$$i_x=\sum_{y\sim x}i_{xy}=\sum_{y\sim x}D_{xy}\lambda_{yx}H^k_{xb}-\sum_{y\sim x}D_{xy}\lambda_{xy}H^k_{yb}
=D_xH^k_{xb}-D_xH^k_{xb}+\sum_{y\sim x}D_{xy}\lambda_{xy}k_{xy}=\sum_{y\sim x}D_{xy}\lambda_{xy}k_{xy}$$
into each vertex $x\neq b$. By conservation of current,
$$i_b=\sum_{y\sim b}D_{by}\lambda_{by}k_{by}-D^k$$
with $D^k:=\sum_{y\sim x\in V}D_{xy}\lambda_{xy}k_{xy}$.
A second configuration we consider is $u'_x=H^k_{xa}$ on each vertex $x$. In a similar fashion, this has external currents
$$i'_x=\sum_{y\sim x}D_{xy}\lambda_{xy}k_{xy}$$
for all $x\neq a$, and
$$i'_a=\sum_{y\sim a}D_{ay}\lambda_{ay}k_{ay}-D^k.$$
Our equations being linear, the difference $u-u'$ is also a solution of the network. It has potentials and external currents
$$u_a-u'_a=H^k_{ab}-H^k_{aa}=H^k_{ab}\quad\text{and}\quad i_a-i'_a=\sum_{y\sim a}D_{ay}\lambda_{ay}k_{ay}-\sum_{y\sim a}D_{ay}\lambda_{ay}k_{ay}+D^k=D^k\ \text{ in vertex }a,$$
$$u_b-u'_b=H^k_{bb}-H^k_{ba}=-H^k_{ba}\quad\text{and}\quad i_b-i'_b=\sum_{y\sim b}D_{by}\lambda_{by}k_{by}-D^k-\sum_{y\sim b}D_{by}\lambda_{by}k_{by}=-D^k\ \text{ in vertex }b,$$
$$i_x-i'_x=\sum_{y\sim x}D_{xy}\lambda_{xy}k_{xy}-\sum_{y\sim x}D_{xy}\lambda_{xy}k_{xy}=0\qquad\text{elsewhere}.$$
Therefore, this combination only has boundary conditions at $a$ and $b$, and all other vertices are free. The effective conductance between $a$ and $b$ is given by
$$C^{\mathrm{eff}}_{ab}=\frac{i_a-i'_a}{u_a-u'_a-u_b+u'_b}=\frac{D^k}{H^k_{ab}+H^k_{ba}}=\frac{D^k}{K^k_{ab}},$$
which completes the proof.

A nonmonotone example. We have seen many nice properties of the network. The next step in the reversible case is making use of Rayleigh's monotonicity property: in the reversible case, the effective resistance is a nondecreasing function of any of the individual resistances. Here, we show an example to demonstrate that this is not so in the irreversible case; the naive approach does not work. Resistance values below are in ohms.



[Figure: vertices $a$ and $b$ joined by two parallel two-unit paths, with a resistor $R$ connecting the two midpoints $x$ (top) and $y$ (bottom). Top path: a unit of resistance 3 (split as $3/2+3/2$) and amplification $1/5$, then a unit of resistance 2 (split as $2/2+2/2$) and amplification $5/13$. Bottom path: a unit of resistance 3 and amplification 5, then a unit of resistance 2 and amplification $13/5$.]

We immediately rewrite this network to an equivalent form using the primer and secunder alternatives.

[Figure: the same network rewritten: from $a$ to $x$ a secunder unit (amplifier $1/5$, resistor $9/5$) and from $x$ to $b$ a primer unit (resistor $18/5$, amplifier $5/13$); from $a$ to $y$ a secunder unit (amplifier 5, resistor 9) and from $y$ to $b$ a primer unit (resistor $18/13$, amplifier $13/5$); the resistor $R$ still connects $x$ and $y$.]

First notice that a circular current of 4 amperes in the positive direction and no current through $R$ gives a constant 9 volts free solution; thus, the network is Markovian for all $R$ values. Therefore, it has an effective resistance, and it is perhaps easiest to compute if we fix $u_a=5$ volts and $u_b=0$. Then we just need to figure out currents in this diagram.

[Figure: the dividers seen from the midpoints: $x$ is fed by a 1 V source through $9/5$ ohms and by ground through $18/5$ ohms; $y$ is fed by a 25 V source through 9 ohms and by ground through $18/13$ ohms; the resistor $R$ connects $x$ and $y$.]

One way of proceeding is to write the equations for the voltage dividers in $x$ and $y$. These are
$$u_x=\frac{1\cdot\frac59+0\cdot\frac{5}{18}+u_y\cdot\frac1R}{\frac59+\frac{5}{18}+\frac1R}=\frac{10R}{15R+18}+\frac{6}{5R+6}\,u_y,$$
$$u_y=\frac{25\cdot\frac19+0\cdot\frac{13}{18}+u_x\cdot\frac1R}{\frac19+\frac{13}{18}+\frac1R}=\frac{50R}{15R+18}+\frac{6}{5R+6}\,u_x.$$
The solution is $u_x=\frac{10R+72}{15R+36}$ and $u_y=\frac{50R+72}{15R+36}$, and the effective resistance is
$$R^{\mathrm{eff}}_{ab}=\frac{u_a}{i_a}=\frac{u_a}{-i_b}=\frac{u_a}{\frac{5}{18}u_x+\frac{13}{18}u_y}=\frac{675R+1620}{350R+648}=\frac{27}{14}+\frac{1296}{1225R+2268},$$
a decreasing function of $R$. The situation reminds the authors of Braess's paradox [1].
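The example's computation is easy to reproduce numerically (a sketch using the voltage-divider equations above; the numbers are the paper's, the code is ours):

```python
# Sketch reproducing the example's effective resistance as a function of R.
# We solve the two voltage-divider equations for u_x and u_y and compare
# with the closed form 27/14 + 1296/(1225 R + 2268).
def R_eff(R):
    a, b = 10*R/(15*R + 18), 50*R/(15*R + 18)
    c = 6/(5*R + 6)
    u_x = (a + c*b) / (1 - c*c)      # from u_x = a + c u_y, u_y = b + c u_x
    u_y = b + c*u_x
    i_a = (5/18)*u_x + (13/18)*u_y   # current arriving at b (= leaving a)
    return 5 / i_a                   # u_a - u_b = 5 volts

for R in [0.5, 1.0, 2.0, 10.0]:
    closed = 27/14 + 1296/(1225*R + 2268)
    print(round(R_eff(R), 9), round(closed, 9))   # agree; decreasing in R
```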

The Dirichlet and Thomson principles. Having the direct monotonicity approach fail, we now give an insight into the Dirichlet and Thomson energy minimum principles for the irreversible case. These are the fundamental principles that enable one to derive Rayleigh's monotonicity law in the reversible case. The irreversible case was established in this form by Gaudillière–Landim and Slowik [7, 12]. Below, we simply give a translation of their results, without proof, into the electrical language. In fact, we stick to the notation of Slowik's Proposition 2.6 as closely as possible.
As before, take two sets $A,B\subseteq V$ of vertices with $A\cap B=\emptyset$, and define
$$\mathcal{H}_{A,B}:=\{u:V\to\mathbf{R}\ :\ u|_A\equiv 1,\ u|_B\equiv 0\}.$$

We think of such functions as voltages with the respective boundary conditions on $A$ and $B$. Set $\mathcal{U}^0_{A,B}$ as the set of currents $i$ with zero external currents $i_x$ for $x\notin(A\cup B)$ (compare to (6)), and total incoming current $i_A=0=i_B$ to the set $A$ (and therefore to the set $B$ as well); see (16). The next quantity to define is, for currents $i$,
$$\mathcal{D}(i)=\frac12\sum_{y\sim x\in V}\frac{1}{\pi_xP^s_{xy}}\,i^2_{xy}=\frac12\sum_{y\sim x\in V}R_{xy}\,i^2_{xy},$$
the ohmic power losses on the resistors; see (19). Finally, recall (17), and reverse the amplifiers in there to get $\hat i^f$.

Proposition 6 (Dirichlet principle, Slowik [12] Proposition 2.6). The capacity between sets $A$ and $B$ is given by
$$\mathrm{cap}(A,B)=\min_{u\in\mathcal{H}_{A,B}}\ \min_{i\in\mathcal{U}^0_{A,B}}\mathcal{D}(\hat i^u-i).$$
The minimum is attained for $u^*=\frac12(u^{AB}+\hat u^{AB})$ and $i^*=\hat i^{u^*}-i^{s,u^{AB}}$, where $u^{AB}$ and $\hat u^{AB}$ are the physical potentials in the network and in the reversed network under our boundary conditions, respectively, and $i^{s,u^{AB}}$ is the current that would result under the potential $u^{AB}$ without the amplifiers.

Using words, find a potential function $u$ with our boundary conditions (this results in currents $\hat i^u$ in the reversed network) and a divergence-free current $i$ on the free vertices with total incoming flow $i_A=0$ such that the difference of these two currents minimizes the ohmic losses on the network. Then these ohmic losses sum up to the total physical power required to maintain the boundary conditions (this is the effective conductance $C^{\mathrm{eff}}_{AB}=\mathrm{cap}(A,B)$ since the boundary voltage difference is $U_A-U_B=1$). We emphasize that the minimizers $u^*$ and $i^*$ are nonphysical, except in the reversible case when $u^*$ becomes the physical voltage while $i^*\equiv 0$.
Next, we define

G^0_{A,B} := {u : V → R : u|_A ≡ u|_B ≡ 0},
and U^1_{A,B} as the set of currents i with zero external currents i_x for x ∉ (A ∪ B)
(compare to (6)), and total incoming current i_A = 1 = −i_B to the set A (and therefore
−1 to the set B); see (16).
Proposition 7 (Thomson principle, Slowik [12] Proposition 2.6). The capacity
between sets A and B is also given by

August–September 2016] ELECTRIC NETWORK–NONREVERSIBLE 675

cap(A, B) = max_{i ∈ U^1_{A,B}} max_{u ∈ G^0_{A,B}} 1/D(i − i∗_u).

The maximum is attained for u = (1/2)(u_{AB} − u∗_{AB})/cap(A, B) and i = i_0^{s,u_{AB}} + i∗_u,
where i_0^{s,u_{AB}} is the current that would result under the potential u_{AB} without the
amplifiers, except that it is normalized to have unit total inflow in A.
In words: find a potential function u that vanishes in A and B (this results in
currents i∗_u in the reversed network) and a unit flow i such that the difference of these
currents minimizes the ohmic losses on the network. Then these ohmic losses sum up
to the total physical power required to maintain a unit flow (the reciprocal is again
the effective conductance since the total current flow is one). Again, the minimizers
are nonphysical, except for the reversible case when u ≡ 0 and i is the physical unit
current flow.
How these principles can be used toward monotonicity is a question left for future
work.
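In the reversible special case both principles reduce to the classical Dirichlet and Thomson principles for resistor networks. As a toy illustration (our own sketch, not from the text; the two-resistor network and all values are arbitrary choices), the following checks that the Dirichlet minimum and the Thomson value both give the effective conductance of two resistors in series:

```python
from fractions import Fraction as F

# Two resistors R1, R2 in series between A and B through one free node.
# Dirichlet: cap(A, B) is the minimum over mid-node potentials u of the
# dissipation D(u), with boundary values u_A = 1, u_B = 0.
# Thomson: cap(A, B) is 1 / (dissipation of the unit flow); in a series
# network the only unit flow pushes current 1 through both resistors.
R1, R2 = F(2), F(3)

def dissipation(u):
    return (1 - u) ** 2 / R1 + u ** 2 / R2

u_star = R2 / (R1 + R2)      # physical mid-node voltage (Kirchhoff)
cap = dissipation(u_star)    # Dirichlet minimum
thomson = 1 / (R1 + R2)      # Thomson value for the unit series flow
```

Both values equal 1/(R1 + R2), the effective conductance, and perturbing u away from the physical voltage can only increase the dissipation.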

4. SERIES, PARALLEL, AND STAR–DELTA TRANSFORMATIONS. In this
last section, we dive a little bit into network algebra by showing how units in series,
parallel, star, or delta configurations behave. Series and parallel are simple and nice
situations.
Proposition 8. Two of our units in series, of respective parameters (R, γ) and (Q, δ),
can be replaced by a single unit of parameters

( R·δ(γ + 1)/(γδ + 1) + Q·(δ + 1)/(γδ + 1) , γδ ).
Proof. We use the primer and secunder alternatives as follows.


[Circuit diagrams: the unit (R, γ) is replaced by its primer alternative R^pr and the unit (Q, δ) by its secunder and then primer alternatives; the two amplifiers are merged into a single amplifier γδ, leaving one primer unit S^pr.]
It is obvious that the parameter of the voltage amplifier is γδ. Applying the transcription
formula for each step, the resistance of the substitute element can be determined:

S = (2γδ/(γδ + 1)) S^pr = (2γδ/(γδ + 1)) (R^pr + Q^pr/γ)
  = R·δ(γ + 1)/(γδ + 1) + Q·(δ + 1)/(γδ + 1).

676 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123

Proposition 9. Two of our units in parallel, of respective parameters (R, γ) and
(Q, δ), can be replaced by a single unit of parameters

( RQ/(R + Q) , γ·Q(δ + 1)/[Q(δ + 1) + R(γ + 1)] + δ·R(γ + 1)/[Q(δ + 1) + R(γ + 1)] ).
Notice the classical parallel formula for the resistance and the weighted average for
the amplifier.

Proof. This case cannot be reduced with transformations into one single unit, but
the alternative elements are still useful. Below are two equivalent circuits.


[Circuit diagrams: between u_x and u_y, the top branch (R/2, γ, R/2) and the bottom branch (Q/2, δ, Q/2) are replaced by their secunder alternatives R^se with γ and Q^se with δ, in parallel.]
The total current from x to y will be the sum of currents like in (1) for the top and the
bottom branches; therefore, it will be in the same form. This proves that a single unit
can be used as a replacement. Its secunder alternative will be as follows.


[Circuit diagram: a single secunder unit S^se with amplifier σ between u_x and u_y.]
It remains to determine the parameters S and σ. Assume first u_x = 1 and zero total
current; that is, leave vertex y free. Then the secunder resistors act as a voltage divider,
giving

u_y = (γQ^se + δR^se)/(Q^se + R^se).
In the simple unit this agrees with the value of the amplifier; therefore

σ = (γQ^se + δR^se)/(Q^se + R^se) = [γQ(δ + 1) + δR(γ + 1)]/[Q(δ + 1) + R(γ + 1)].
Next, when u_x = 0, the amplifiers keep the potentials at zero, and the parallel formula

S^se = R^se Q^se/(R^se + Q^se)
follows. Returning to the original alternative,

S = (2/(σ + 1)) S^se = [2(Q(δ + 1) + R(γ + 1))/((R + Q)(γ + 1)(δ + 1))] · [R^se Q^se/(R^se + Q^se)] = RQ/(R + Q).
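This chain of equalities can also be checked with exact rational arithmetic. The secunder resistances R^se = R(γ + 1)/2 and Q^se = Q(δ + 1)/2 are our reading of the transcription formula (4); the rest follows the proof step by step:

```python
from fractions import Fraction as F

def parallel_unit(R, g, Q, d):
    # Parallel combination of units (R, g) and (Q, d), replaying the proof.
    # Secunder resistances R_se = R(g + 1)/2 and Q_se = Q(d + 1)/2 are an
    # assumed reading of the transcription formula (4).
    R_se = R * (g + 1) / 2
    Q_se = Q * (d + 1) / 2
    sigma = (g * Q_se + d * R_se) / (Q_se + R_se)   # voltage-divider value
    S_se = R_se * Q_se / (R_se + Q_se)              # classical parallel formula
    S = 2 * S_se / (sigma + 1)                      # back-transcription
    return S, sigma

R, g, Q, d = F(3), F(2), F(5), F(7)
S, sigma = parallel_unit(R, g, Q, d)
```

The result is exactly RQ/(R + Q) together with the weighted average of the two amplifiers, as Proposition 9 claims.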

Notice that in both the series and parallel formulas the resulting resistances are monotone
increasing functions of the original ones. Not all networks can, however, be
reduced using only series or parallel substitutions. The next step of transformations
for classical resistor networks is the star–delta transformation. As we will see shortly,
it is here where nonmonotonicity issues begin. Our nonmonotone example is also one
that cannot be reduced using only series or parallel substitutions.
In our case, star and delta look like
[Circuit diagrams: the star joins U_x, U_y, U_z to a common center through the units (Q/2, α), (R/2, β), (S/2, γ), respectively; the delta joins the pairs x–y, y–z, z–x directly through the units (S′/2, α′), (Q′/2, β′), (R′/2, γ′).]
where it is essential that the center of the star has no further connections. The question is
whether the parameters can be linked so that these two networks behave identically
under all scenarios. We start by rewriting the above into the equivalent secunder alternatives
and work with those thereafter. Any formula can be rewritten in terms of the original
parameters via (4); we avoid doing so for the sake of simplicity.
[Circuit diagrams: the same star and delta in secunder form, with resistances Q^se, R^se, S^se (star) and S′^se, Q′^se, R′^se (delta) and the same amplifiers.]
Proposition 10. Any star can be transformed into an equivalent delta, the parameters
of which are given by

S′^se = (R^se S^se + Q^se S^se + Q^se R^se)/(β S^se),
Q′^se = (R^se S^se + Q^se S^se + Q^se R^se)/(γ Q^se),
R′^se = (R^se S^se + Q^se S^se + Q^se R^se)/(α R^se),

and α′ = α/β, β′ = β/γ, γ′ = γ/α. The notations are the ones used in the above picture.
Not every delta, however, can be transformed into a star.

Proposition 11. A delta can be transformed into an equivalent star if and only if

α′β′γ′ = 1. (22)

Even in this case, the resulting star is not unique. With any positive number λ > 0, it
can have parameters

S^se = λ γ′^{1/3} α′^{2/3} R′^se Q′^se / (γ′^{2/3} α′^{1/3} Q′^se + α′^{2/3} β′^{1/3} R′^se + γ′^{1/3} β′^{2/3} S′^se),
Q^se = λ α′^{1/3} β′^{2/3} R′^se S′^se / (γ′^{2/3} α′^{1/3} Q′^se + α′^{2/3} β′^{1/3} R′^se + γ′^{1/3} β′^{2/3} S′^se),
R^se = λ γ′^{2/3} β′^{1/3} Q′^se S′^se / (γ′^{2/3} α′^{1/3} Q′^se + α′^{2/3} β′^{1/3} R′^se + γ′^{1/3} β′^{2/3} S′^se),

and α = λ α′^{2/3} β′^{1/3}, β = λ β′^{2/3} γ′^{1/3}, γ = λ γ′^{2/3} α′^{1/3}.
Proof of both star–delta and delta–star. We determine and compare the incoming
currents at the vertices x, y, and z in the two networks. We start with the star. The voltages
at the outer points of the resistances are αU_x, βU_y, and γU_z; thus the voltage in the
center point is

U = (αU_x/Q^se + βU_y/R^se + γU_z/S^se)/(1/Q^se + 1/R^se + 1/S^se)
  = (αU_x R^se S^se + βU_y Q^se S^se + γU_z Q^se R^se)/(R^se S^se + Q^se S^se + Q^se R^se).
Therefore, the respective currents flowing from x, y, and z into the center point are

i_x = (αU_x − U)/Q^se = [α(S^se + R^se)U_x − βS^se U_y − γR^se U_z]/(R^se S^se + Q^se S^se + Q^se R^se),
i_y = (βU_y − U)/R^se = [β(S^se + Q^se)U_y − αS^se U_x − γQ^se U_z]/(R^se S^se + Q^se S^se + Q^se R^se),
i_z = (γU_z − U)/S^se = [γ(R^se + Q^se)U_z − αR^se U_x − βQ^se U_y]/(R^se S^se + Q^se S^se + Q^se R^se).
Next, we turn to the delta. The currents flowing on the edges are

i′_xy = (α′U_x − U_y)/S′^se,  i′_yz = (β′U_y − U_z)/Q′^se,  i′_zx = (γ′U_z − U_x)/R′^se.
Thus, the currents flowing from x, y, and z into the network can be written as

i′_x = i′_xy − i′_zx = [(α′R′^se + S′^se)U_x − R′^se U_y − γ′S′^se U_z]/(R′^se S′^se),
i′_y = i′_yz − i′_xy = [(β′S′^se + Q′^se)U_y − α′Q′^se U_x − S′^se U_z]/(Q′^se S′^se),
i′_z = i′_zx − i′_yz = [(γ′Q′^se + R′^se)U_z − Q′^se U_x − β′R′^se U_y]/(Q′^se R′^se).
In a star/delta substitution the currents have to be equal for all possible voltages U_x,
U_y, and U_z. Hence, by comparing the coefficients of the voltages in the formulas for the
currents, the connections between the quantities can be determined. It is convenient
to consider first the coefficient of U_y in the formula for i_x, the coefficient of U_z in the
formula for i_y, and the coefficient of U_x in the formula for i_z:
1/S′^se = βS^se/(R^se S^se + Q^se S^se + Q^se R^se),
1/Q′^se = γQ^se/(R^se S^se + Q^se S^se + Q^se R^se),      (23)
1/R′^se = αR^se/(R^se S^se + Q^se S^se + Q^se R^se).
Second, consider the coefficient of U_z in i_x, the coefficient of U_x in i_y, and the coefficient
of U_y in i_z:

γ′/R′^se = γR^se/(R^se S^se + Q^se S^se + Q^se R^se),
α′/S′^se = αS^se/(R^se S^se + Q^se S^se + Q^se R^se),
β′/Q′^se = βQ^se/(R^se S^se + Q^se S^se + Q^se R^se).
By dividing the corresponding equations in the two triplets of equations, we get

γ′ = γ/α,  α′ = α/β,  and  β′ = β/γ.
Finally, with these substitutions the coefficients of U_x in i_x, of U_y in i_y, and of U_z in i_z also
match. This proves Proposition 10.
Notice that, for any star, in the substitute delta we have α′β′γ′ = 1 and that multiplying
the amplifiers in the star by a common constant does not change the parameters
of the voltage amplifiers in the substitute delta. Therefore, the inversion of the previous
formulas is possible only if (22) holds, and even in this case, it is not uniquely determined;
a constant factor has to be chosen. This can be written as αβγ = λ³, where
λ > 0 is the free parameter. Then

α = λ α′^{2/3} β′^{1/3},  β = λ β′^{2/3} γ′^{1/3},  and  γ = λ γ′^{2/3} α′^{1/3}.
To invert the resistances (23), first note that

1/(αβ R′^se S′^se) + 1/(βγ Q′^se S′^se) + 1/(γα Q′^se R′^se) = 1/(R^se S^se + Q^se S^se + Q^se R^se).
Thus, from (23) and from (22):

S^se = [1/(β S′^se)] / [1/(αβ R′^se S′^se) + 1/(βγ Q′^se S′^se) + 1/(γα Q′^se R′^se)]
     = λ γ′^{1/3} α′^{2/3} R′^se Q′^se / (γ′^{2/3} α′^{1/3} Q′^se + α′^{2/3} β′^{1/3} R′^se + γ′^{1/3} β′^{2/3} S′^se),

Q^se = [1/(γ Q′^se)] / [1/(αβ R′^se S′^se) + 1/(βγ Q′^se S′^se) + 1/(γα Q′^se R′^se)]
     = λ α′^{1/3} β′^{2/3} R′^se S′^se / (γ′^{2/3} α′^{1/3} Q′^se + α′^{2/3} β′^{1/3} R′^se + γ′^{1/3} β′^{2/3} S′^se),

R^se = [1/(α R′^se)] / [1/(αβ R′^se S′^se) + 1/(βγ Q′^se S′^se) + 1/(γα Q′^se R′^se)]
     = λ γ′^{2/3} β′^{1/3} Q′^se S′^se / (γ′^{2/3} α′^{1/3} Q′^se + α′^{2/3} β′^{1/3} R′^se + γ′^{1/3} β′^{2/3} S′^se).
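The Greek amplifier parameters in this section were lost in reproduction, so the formulas above are a reconstruction. As a sanity check of the forward (star-to-delta) direction, the following sketch verifies with exact rational arithmetic that the nodal currents of a star and of its substitute delta coincide, using our reconstruction of Proposition 10:

```python
from fractions import Fraction as F

# Star (secunder) parameters: resistances toward x, y, z and amplifiers.
Q, R, S = F(2), F(3), F(5)
al, be, ga = F(1, 2), F(3), F(4)        # alpha, beta, gamma

Sig = R * S + Q * S + Q * R
# Substitute delta, following the reconstructed Proposition 10.
S1 = Sig / (be * S)                      # S' on the x-y edge
Q1 = Sig / (ga * Q)                      # Q' on the y-z edge
R1 = Sig / (al * R)                      # R' on the z-x edge
al1, be1, ga1 = al / be, be / ga, ga / al

def star_currents(Ux, Uy, Uz):
    # Center-point voltage, then the three incoming currents.
    U = (al * Ux / Q + be * Uy / R + ga * Uz / S) / (1 / Q + 1 / R + 1 / S)
    return (al * Ux - U) / Q, (be * Uy - U) / R, (ga * Uz - U) / S

def delta_currents(Ux, Uy, Uz):
    ixy = (al1 * Ux - Uy) / S1
    iyz = (be1 * Uy - Uz) / Q1
    izx = (ga1 * Uz - Ux) / R1
    return ixy - izx, iyz - ixy, izx - iyz
```

For every choice of boundary voltages the two triples of currents agree exactly, and the substitute delta satisfies α′β′γ′ = 1.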

The condition (22) is that at any constant potential the delta has no circular current
by itself. This is rather restrictive; thus, delta–star transformations cannot be used to
reduce a general network. After the lack of monotonicity, this is the second serious
drawback of our networks compared to the classical resistor-only case.
ACKNOWLEDGMENT. The authors wish to thank Edward Crane, Nic Freeman, Alexandre Gaudillière,
Claudio Landim, Gábor Pete, Martin Slowik, András Telcs, and Bálint Tóth for stimulating discussions on this
project. M. Balázs was partially supported by the Hungarian Scientific Research Fund (OTKA) grants K60708,
F67729, K100473, K109684, and the Bolyai Scholarship of the Hungarian Academy of Sciences.
REFERENCES

1. Wikipedia contributors, Braess's paradox, Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Braesss_paradox.
2. A. K. Chandra, P. Raghavan, W. L. Ruzzo, R. Smolensky, P. Tiwari, The electrical resistance of a graph captures its commute and cover times, Comput. Complexity 6 no. 4 (1996) 312–340.
3. J. L. Doob, Discrete potential theory and boundaries, J. Math. Mech. 8 (1959) 433–458; erratum 993.
4. P. G. Doyle, Energy for Markov chains (1994), Preprint under GNU FDL, http://www.math.dartmouth.edu/~doyle/.
5. P. G. Doyle, J. L. Snell, Random Walks and Electric Networks. Carus Mathematical Monographs, Vol. 22, Mathematical Association of America, Washington, DC, 1984, http://arxiv.org/abs/math/0001057, available under GNU GPL.
6. P. G. Doyle, J. Steiner, Commuting time geometry of ergodic Markov chains (2011), Preprint in the public domain, http://arxiv.org/abs/1107.2612.
7. A. Gaudillière, C. Landim, A Dirichlet principle for non-reversible Markov chains and some recurrence theorems, Prob. Theory Related Fields 158 no. 1–2 (2014) 55–89.
8. S. Kakutani, Markov processes and the Dirichlet problem, Proc. Jap. Acad. 21 (1945) 227–233.
9. J. G. Kemeny, J. L. Snell, A. W. Knapp, Denumerable Markov Chains. Van Nostrand, Princeton, NJ–Toronto, Ont.–London, 1966.
10. R. Lyons, Y. Peres, Probability on Trees and Networks. Cambridge Univ. Press, 2014, http://mypage.iu.edu/~rdlyons/prbtree/prbtree.html.
11. C. St. J. A. Nash-Williams, Random walk and electric currents in networks, Proc. Cambridge Philos. Soc. 55 (1959) 181–194.
12. M. Slowik, A note on variational representations of capacities for reversible and non-reversible Markov chains (2013), Unpublished manuscript.
13. A. Telcs, The Art of Random Walks. Lecture Notes in Mathematics, Vol. 1885, Springer-Verlag, Berlin, 2006.
MÁRTON BALÁZS University of Bristol. Part of this work was done while the author was affiliated with
the Institute of Mathematics, Budapest University of Technology and Economics, the MTA-BME Stochastics
Research Group, and the Alfréd Rényi Institute of Mathematics.
[email protected]

ÁRON FOLLY This work was done while the author was affiliated with the Department of Mathematics,
Ludwig-Maximilians-Universität München and the Institute of Mathematics, Budapest University of Technology
and Economics.
[email protected]
100 Years Ago This Month in The American Mathematical Monthly

Edited by Vadim Ponomarenko
The Mathematical Gazette, May, 1916, contains an interesting article by Professor
H. S. CARSLAW on "A progressive income-tax," a scheme of taxation introduced in
Australia. Schedules are deduced for "Rate of tax upon income derived from personal
exertion," and "Rate upon income derived from property."
According to cable reports from London, the Council of Trinity College, Cam-
bridge, has removed Professor BERTRAND RUSSELL from his lectureship in logic
and principles of mathematics on account of his having been convicted under the
defense of the realm act for publishing a leaflet defending the Conscientious Objec-
tor to service in the British army. Professor Russell is well known in this country
through his mathematical writings.
Excerpted from Notes and News 23 (1916) 315320.

A Characteristic Averaging Property of the
Catenary
Vincent Coll and Jeff Dodd

Abstract. It is well-known that the catenary is characterized by an extremal centroidal
condition: It is the shape of the curve whose centroid is the lowest among all curves having a
prescribed length and specified endpoints. Here, we establish a broad characteristic averaging
property of the catenary that yields two new centroidal characterizations.
1. INTRODUCTION. In the May 1690 Acta Eruditorum¹, Jacob Bernoulli challenged
the mathematical community to determine the shape of the chain curve
formed by an idealized chain hanging from two points with no force other than gravity
acting upon it. The following year, Johann Bernoulli, Huygens, and Leibniz independently
solved the problem to find the curve that Huygens named the catenary and that,
in today's notation, is the graph of the hyperbolic cosine function

y = f(x) = k cosh((x − c)/k). (1)
Johann Bernoulli and Leibniz noted three surprising ways in which the catenary
mimics the graph of a constant function [5]. To formulate these, consider the graph
y = f (x) of a smooth, strictly positive function f as depicted in Figure 1.

[Figure 1: the graph y = f(x) over an interval [a, b], showing the curve segment C with centroid (x_C, y_C) and the shaded region A under the curve with centroid (x_A, y_A).]

Figure 1. Two centroidal properties of the catenary (in standard vertical position)
For an interval [a, b] on the x-axis, let C denote the segment of the graph of f lying
over [a, b], and let A denote the shaded planar region lying over [a, b] that is bounded
http://dx.doi.org/10.4169/amer.math.monthly.123.7.683
MSC: Primary 34A00
¹Latin for transactions of the scholars, Acta Eruditorum, the first German journal of science and scholarship, was published from 1682 to 1782.

August–September 2016] CHARACTERISTIC AVERAGING PROPERTY 683
above by C. Let (x_C, y_C) be the coordinates of the centroid of C and (x_A, y_A) be
the coordinates of the centroid of A. Both Bernoulli and Leibniz formulated what we
would now call a differential equation for the catenary, reading

f(x) = k √(1 + [f′(x)]²)  for each x ∈ R, (2)

and it follows directly from (2) that the catenary in standard vertical position (1) shares
the following properties with the graph of the constant function f(x) = k.
Proportionality. For every interval [a, b], (area of A) = k · (arclength of C):

∫_a^b f(x) dx = k ∫_a^b √(1 + [f′(x)]²) dx. (3)

Horizontal Collocation. For every interval [a, b], x_A = x_C:

∫_a^b x f(x) dx / ∫_a^b f(x) dx = ∫_a^b x √(1 + [f′(x)]²) dx / ∫_a^b √(1 + [f′(x)]²) dx. (4)

Vertical Bisection. For every interval [a, b], y_A = (1/2) y_C:

∫_a^b (1/2)[f(x)]² dx / ∫_a^b f(x) dx = (1/2) ∫_a^b f(x) √(1 + [f′(x)]²) dx / ∫_a^b √(1 + [f′(x)]²) dx. (5)
Note that while the horizontal collocation property (4) and the vertical bisection prop-
erty (5) look a bit more complicated than the proportionality property (3), they are
more elegant in that they involve no unit-dependent constants, whereas the constant k
in the proportionality property has the dimension of length.
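All three properties are easy to test numerically. The sketch below (our own illustration; the parameter values and the interval are arbitrary choices) checks (3), (4), and (5) for a catenary by Simpson's rule:

```python
import math

def simpson(g, a, b, n=2000):
    # Composite Simpson's rule; n must be even.
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

# Catenary in standard vertical position; k, c and [a, b] are arbitrary.
k, c = 1.3, 0.4
f = lambda x: k * math.cosh((x - c) / k)
ds = lambda x: math.sqrt(1 + math.sinh((x - c) / k) ** 2)  # arclength element

a, b = -0.7, 1.9
area = simpson(f, a, b)
arclen = simpson(ds, a, b)
x_area = simpson(lambda x: x * f(x), a, b) / area       # x-centroid of A
x_curve = simpson(lambda x: x * ds(x), a, b) / arclen   # x-centroid of C
y_area = simpson(lambda x: 0.5 * f(x) ** 2, a, b) / area
y_curve = simpson(lambda x: f(x) * ds(x), a, b) / arclen
```

Up to quadrature error, area = k · arclen, the two x-centroids coincide, and the y-centroid of the region is half that of the curve.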
It is natural to ask to what extent each of these properties characterizes the catenary.
As far as we know, this question has never been addressed for the two centroidal prop-
erties of horizontal collocation and vertical bisection. Here, we show that each of these
two centroidal properties is in fact a characteristic property of the catenary. The proof
of this fact is surprisingly subtle. Moreover, it reveals a broad averaging property of
the catenary that, despite its geometric manifestations, is essentially analytic in nature.

2. FROM PROPERTIES TO CHARACTERIZATIONS. We wish to determine


the extent to which the horizontal collocation property (4) and vertical bisection prop-
erty (5) characterize the catenary. But first, we warm up by addressing the same ques-
tion for the proportionality property (3). To what extent does this property characterize
the catenary?
A rough answer to this question requires only the fundamental theorem of calculus.
For a continuously differentiable function f , fixing a and differentiating with respect
to b in (3) recasts the global proportionality property as a local property in the form
of the differential equation (2). This differential equation has the singular solution
f (x) = k, and separation of variables and integration yield a catenary of the form
(1). This straightforward argument has been included as an example, or prompted
as an exercise, in differential equation textbooks for at least the last 150 years; see,
for example, the well-known 1859 text by George Boole [2]. It leaves one with the

impression that the catenary is the only nontrivial continuously differentiable function
satisfying the proportionality property.
The subtlety here, as noted recently by E. Parker [6], is that it is possible to form
positive, nonconstant, continuously differentiable solutions of (2) by joining a por-
tion of the graph of f (x) = k with the left half and/or right half of a catenary of the
form (1). But of course, these piecewise defined functions are not twice differentiable
everywhere, so the precise answer to our warm up question reads this way.
Parker's characterization of the catenary. Catenaries of the form (1) are the only
positive, nonconstant, twice-differentiable functions satisfying the differential equation
(2) or the proportionality property (3).
We were surprised to discover that it is not such a straightforward matter to charac-
terize the functions that satisfy the horizontal collocation property (4) or the vertical
bisection property (5). (We invite the reader to spend a few minutes trying!) It is eas-
iest to see the source of the difficulty and to distill what we need to overcome it if we
take a step back and notice that each of the four quantities appearing in (4) and (5) can
be written in this form:
∫_a^b g(x)w(x) dx / ∫_a^b w(x) dx (6)
where w is a positive, continuous function. For example, on the left-hand side of


(4), g(x) is the horizontal coordinate x and w(x) d x is the differential area element
f (x) d x. In general, the expression (6) is a weighted mean of the function g over
[a, b], where the weight function w is defined globally on all of R but is normalized
locally over each interval [a, b]. The natural hope is that if we could untangle these
global and local aspects of the horizontal collocation and vertical bisection properties,
then we might be able to localize these properties completely in the form of the dif-
ferential equation (2). Toward this end, we have formulated and proven the following
fact, which is new to us.

Lemma (Equal Averages Principle). Let g be a function that is continuously differentiable
on an interval [c, d] and such that g′(x) ≠ 0 for x ∈ (c, d). Suppose that w1
and w2 are functions that are continuous on [c, d] and positive on (c, d). Then

∫_a^b g(x)w1(x) dx / ∫_a^b w1(x) dx = ∫_a^b g(x)w2(x) dx / ∫_a^b w2(x) dx  for every subinterval [a, b] of [c, d] (7)

if and only if w1 = k w2 for some constant k > 0.
Proof. It is clear that if w1 = k w2 for some constant k > 0, then (7) holds. To
prove the converse implication, let W_i(x) = ∫_c^x w_i(s) ds for i = 1 and 2. Then for any
x ∈ (c, d), writing (7) for the interval [c, x] gives us

∫_c^x g(t)w1(t) dt / W1(x) = ∫_c^x g(t)w2(t) dt / W2(x).

Integrating by parts in the numerators on both sides yields

[g(x)W1(x) − ∫_c^x g′(t)W1(t) dt] / W1(x) = [g(x)W2(x) − ∫_c^x g′(t)W2(t) dt] / W2(x). (8)

Because g′ is continuous and nonzero on (c, d), g′(x) is either positive or negative on
all of (c, d), so we can safely rewrite (8) as

∫_c^x g′(t)W1(t) dt / (g′(x)W1(x)) = ∫_c^x g′(t)W2(t) dt / (g′(x)W2(x)),  that is,
g′(x)W1(x) / ∫_c^x g′(t)W1(t) dt = g′(x)W2(x) / ∫_c^x g′(t)W2(t) dt.

Letting G_i(x) = ∫_c^x g′(t)W_i(t) dt for i = 1 and 2, we have that

G′1(x)/G1(x) = G′2(x)/G2(x)  ⟹  ln|G1(x)| = ln|G2(x)| + I  ⟹  G1(x) = kG2(x), (9)

where I is a constant of integration and, because G1(x) and G2(x) have the same sign
on (c, d), k > 0. Finally, differentiating the last equation in (9) and again keeping in
mind that g′(x) ≠ 0 for all x ∈ (c, d), it follows for all such x that

g′(x)W1(x) = kg′(x)W2(x)  ⟹  W1(x) = kW2(x)  ⟹  w1(x) = kw2(x).
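The "if" direction of the lemma, and the failure of (7) for a non-proportional weight, can be illustrated numerically (our own sketch; the functions and the interval are arbitrary choices):

```python
import math

def wmean(g, w, a, b, n=2000):
    # Weighted mean (6) of g over [a, b] with weight w, by Simpson's rule.
    h = (b - a) / n
    def integ(fn):
        s = fn(a) + fn(b) + sum((4 if i % 2 else 2) * fn(a + i * h)
                                for i in range(1, n))
        return s * h / 3
    return integ(lambda x: g(x) * w(x)) / integ(w)

g = lambda x: x ** 3 + x            # g' > 0 on the interval
w2 = lambda x: 1 + x * x            # a positive weight
w1 = lambda x: 3 * (1 + x * x)      # proportional weight (k = 3)
w3 = lambda x: math.exp(x)          # a non-proportional weight

m1 = wmean(g, w1, -0.5, 1.25)
m2 = wmean(g, w2, -0.5, 1.25)
m3 = wmean(g, w3, -0.5, 1.25)
```

The proportional weights give the same weighted mean on every interval; the exponential weight visibly does not, which is the substance of the converse direction.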

Armed with the equal averages principle, we can now formulate characterizations
for the catenary based on the horizontal collocation property and the vertical bisection
property.

Theorem (Centroidal Characterizations of the Catenary). Catenaries of the form
(1) are the only positive, nonconstant, twice-differentiable functions satisfying either
the horizontal collocation property or the vertical bisection property.

Proof. If a positive, nonconstant, twice-differentiable f satisfies the horizontal collocation
property, then it satisfies (4). Applying the equal averages principle with
g(x) = x, w1(x) = f(x), and w2(x) = √(1 + [f′(x)]²) yields the differential equation
(2), which must hold everywhere, and the conclusion follows from Parker's characterization
of the catenary.
If a positive, nonconstant, twice-differentiable f satisfies the vertical bisection
property, then it satisfies (5). Applying the equal averages principle with g(x) = (1/2)f(x),
w1(x) = f(x), and w2(x) = √(1 + [f′(x)]²) yields the differential equation (2), which
must hold on any open interval where f′ ≠ 0.
By the continuity of f′, S = {x ∈ R : f′(x) ≠ 0} can be written as the disjoint
union of open intervals, and at any endpoint p of such an open interval, f′(p) = 0. On
each of these open intervals, by Parker's characterization of the catenary, f is given
by a segment of a catenary of the form (1), so each open interval must be of the form
(−∞, c1) or (c2, ∞). Outside of these intervals, f′ = 0, so f is constant. The only way
in which f can be twice-differentiable is if S = (−∞, c1) ∪ (c2, ∞) where c1 = c2,
yielding a catenary of the form (1).
Remark. It is tempting to modify the vertical bisection property by requiring that the
graph of a positive, continuously differentiable function y = f(x) satisfy y_A = λ y_C
over all intervals [a, b] for some λ ≠ 1/2. However, there are no such functions. If

∫_a^b (1/2)[f(x)]² dx / ∫_a^b f(x) dx = λ ∫_a^b f(x) √(1 + [f′(x)]²) dx / ∫_a^b √(1 + [f′(x)]²) dx  for all intervals [a, b],

then letting b → a and evaluating the resulting indeterminate limits by l'Hospital's
rule yields λ = 1/2.
In the equal averages principle, letting the weight functions w1 and w2 be f and
√(1 + [f′]²) yields the same differential equation for f regardless of the choice of the
function g, which is the differential equation (2) whose only positive, nonconstant,
twice-differentiable solution is the catenary function. So the horizontal collocation and
vertical bisection properties are only special cases of a broad characteristic averaging
property of the catenary or, to say it another way, of a multitude of characteristic averaging
properties of the catenary corresponding to different choices for the function g.
For example, letting g(x) = (x − x0)ⁿ for any real number x0 and any integer
n ≥ 1, we see that catenaries of the form (1) are the only positive, nonconstant, twice-differentiable
functions f such that over every interval [a, b],

∫_a^b (x − x0)ⁿ f(x) dx / ∫_a^b f(x) dx = ∫_a^b (x − x0)ⁿ √(1 + [f′(x)]²) dx / ∫_a^b √(1 + [f′(x)]²) dx. (10)
That is, for each interval [a, b], the nth moment of the region A under the graph of
f on [a, b] and the nth moment of the segment C of the graph of f over [a, b] are
the same with respect to any vertical axis x = x0 . If the graph of the catenary (1) is
assigned a uniform linear mass density and the region below this graph is assigned a
uniform area mass density, then (10) has natural physical interpretations when n = 1
and n = 2.
When n = 1, (10) is the horizontal collocation property and a straightforward phys-
ical interpretation is that when f is a catenary of the form (1), then for any inter-
val [a, b], the x-coordinate of the center of mass of the region A is the same as the
x-coordinate of the center of mass of the segment C. An equivalent, and perhaps more
counterintuitive, physical interpretation is that for any interval [a, b], the x-coordinate
of the center of mass of C ∪ A is unaffected by the uniform mass densities assigned
to the graph of the catenary and the region under the graph of the catenary. That is,
suppose we were to build a fence on level ground with the top of the fence following
a catenary of the form (1). Then the horizontal position of the center of mass of any
slice of the fence bounded by two vertical lines would be unchanged by the addition of
a railing of uniform linear mass density running along the top of the fence, no matter
how heavy the railing.
When n = 2, the left- and right-hand sides of (10) represent, respectively, the radius
of gyration x̄_A(x0) of the region A about the axis x = x0 and the radius of gyration
x̄_C(x0) of the segment C about the axis x = x0. (The radius of gyration of an object
O about the axis x = x0 is defined as the distance x̄_O(x0) such that if the mass of the
object were concentrated into a point mass at a distance x̄_O(x0) from the axis x = x0,
then this point mass would have the same moment of inertia around the axis x = x0
as the object O itself.) So when f is a catenary of the form (1) then, for every interval
[a, b] and any axis x = x0, the radius of gyration x̄_C(x0) of the segment C of the graph
of f over [a, b] is the same as the radius of gyration x̄_A(x0) of the region A under the
graph of f on [a, b].
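The case n = 2 of (10), the equality of second moments about an arbitrary vertical axis, can be checked numerically as well (our own sketch; k, c, x0 and the interval are arbitrary choices):

```python
import math

def simpson(g, a, b, n=2000):
    # Composite Simpson's rule; n must be even.
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

k, c, x0 = 0.8, -0.3, 0.5
f = lambda x: k * math.cosh((x - c) / k)
ds = lambda x: math.sqrt(1 + math.sinh((x - c) / k) ** 2)

a, b = -1.2, 1.0
# Second moments (n = 2 in (10)) of the region A and of the curve C
# about the axis x = x0, each normalized by its own total mass.
m_region = simpson(lambda x: (x - x0) ** 2 * f(x), a, b) / simpson(f, a, b)
m_curve = simpson(lambda x: (x - x0) ** 2 * ds(x), a, b) / simpson(ds, a, b)
```

The two normalized second moments agree to quadrature accuracy, as (10) predicts.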

3. CONCLUSION. It is remarkable to us that new mathematical properties, characterizations,
and generalizations of the catenary continue to be discovered (see, for
example, the recent Amer. Math. Monthly articles by Apostol and Mnatsakanian [1]
and Coll and Harrison [3]).
Here is just one follow up question for further investigation. Upon revolution about
the x-axis, the catenary produces the catenoid, which is the unique minimal surface
of revolution in R3 . The generalized catenaries are curves that, in an analogous way,

generate the unique minimal hypersurfaces of revolution in Rn (see [3], [4], and [7]).
Surely, these curves have characteristic centroidal properties. What are they?

REFERENCES

1. T. Apostol, M. Mnatsakanian, Volume/surface area relations for n-dimensional spheres, pseudospheres, and catenoids, Amer. Math. Monthly 122 (2015) 745–756.
2. G. Boole, A Treatise on Differential Equations. Macmillan, Cambridge, 1859.
3. V. Coll, M. Harrison, Two generalizations of a property of the catenary, Amer. Math. Monthly 121 (2014) 109–118.
4. V. Coll, M. Harrison, Hypersurfaces of revolution with proportional principal curvatures, Adv. Geom. 13 (2013) 485–496.
5. G. W. Leibniz, The string whose curve is described as bending under its own weight, and the remarkable resources that can be discovered from it by however many proportional means and logarithms, Acta Eruditorum June (1691) 277–291. Translation by P. Beaudry in P. Beaudry, G. W. Leibniz: Two papers on the catenary curve and logarithmic curve, Fidelio Magazine 10 (2001) 54–61.
6. E. Parker, A property characterizing the catenary, Math. Mag. 83 (2010) 63–64.
7. M. Pinl, W. Ziller, Minimal hypersurfaces in spaces of constant curvature, J. Diff. Geom. 11 (1976) 335–343.
VINCENT E. COLL, JR. received a B.S. from Loyola University in New Orleans, an M.S. from Texas A&M
University and a Ph.D. from the University of Pennsylvania in 1990 under the direction of Murray Gerstenhaber.
He is a professor of practice at Lehigh University. His research interests include deformation theory and
the study of Frobenius Lie algebras. His outside interests include boxing and ice hockey, but not at the same
time.
Lehigh University, 27 Memorial Drive West, Bethlehem PA 18015
[email protected]
JEFF DODD received a B.S. from the University of Maryland at College Park, an M.A. from the University
of Pennsylvania and a Ph.D. from the University of Maryland at College Park in 1996 under the direction of
Robert L. Pego. He is a professor of mathematics at Jacksonville State University.
Jacksonville State University, 700 Pelham Road North, Jacksonville AL 36265
[email protected]

The Six Circles Theorem Revisited
Dennis Ivanov and Serge Tabachnikov

Abstract. The six circles theorem of C. Evelyn, G. Money-Coutts, and J. Tyrrell concerns
chains of circles inscribed into a triangle: the first circle is inscribed in the first angle, the
second circle is inscribed in the second angle and tangent to the first circle, the third circle is
inscribed in the third angle and tangent to the second circle, and so on, cyclically. The theorem
asserts that if all the circles touch the sides of the triangle, and not their extensions, then the
chain is 6-periodic. We show that, in general, the chain is eventually 6-periodic but may have
an arbitrarily long pre-period.

1. INTRODUCTION. Given a triangle P1P2P3, we construct a chain of circles as
follows: C1, inscribed in the angle P1; C2, inscribed in the angle P2 and tangent to
C1; C3, inscribed in the angle P3 and tangent to C2; C4, inscribed in the angle P1 and
tangent to C3, and so on. The claim of the six circles theorem is that this process is
6-periodic: C7 = C1; see Figure 1.

[Figure 1: a triangle with the chain of circles; the centers of the consecutive circles, labeled 1 through 7, cycle around the triangle, and the seventh circle coincides with the first.]

Figure 1. The six circles theorem: The centers of the consecutive circles are labeled 1, 2, . . . , 7
This beautiful theorem is one of many in the book [3] which is a result of the col-
laboration of three geometry enthusiasts, C. Evelyn, G. Money-Coutts, and J. Tyrrell.
The following is a quotation from John Tyrrell's obituary [6]:

John also worked with two amateur mathematicians, C. J. A. Evelyn and G. B. Money-Coutts,
who found theorems by using outsize drawing instruments to draw large figures. They then
looked for concurrencies, collinearities, or other special features. The three men used to meet
for tea at the Café Royal and talk about mathematics, and then go to the opera at Covent
Garden, where Money-Coutts had a box.

We refer to [12, 8, 4, 9, 10] for various proofs and generalizations of the theorem and to
[11] for a brief biography of C. J. A. Evelyn. See also [2, 13, 14] for Internet resources.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.689
MSC: Primary 52C26

August–September 2016] THE SIX CIRCLES THEOREM REVISITED 689


2. A REFINEMENT. The formulation of the six circles theorem needs clarification.
First, there are two choices for each succeeding circle; we assume that each time the
smaller of the two circles tangent to the previous one is chosen (that is, the one which
is closer to the respective vertex of the triangle). Second, it may well happen that the
next circle is tangent not to a side of the triangle but rather to its extension.
The six circles theorem, as previously stated, holds for a chain of circles for which
all tangency points lie on the sides of the triangle, not their extensions. And what about
the latter case? Figure 2 shows what may happen.


Figure 2. The chain of circles is eventually 6-periodic with pre-period of length two: C9 = C3, but C8 ≠ C2

Theorem. Assume that, for the initial circle, at least one of the tangency points lies
on a side of the triangle. Then the chain of circles is eventually 6-periodic. One can
choose the shape of a triangle and an initial circle so that the pre-period is arbitrarily
long.

The existence of pre-periods is due to the fact that the map assigning the next circle
to the previous one is not 1-1; that is, the inverse map is multivalued.
Concerning the assumption that at least one of the tangency points of a circle with
the sides of the angle of a triangle lies on a side of the triangle, and not its extension,
we observe the following.

Lemma 1. If the first circle in the chain satisfies this assumption, then so do all the
consecutive circles.

Proof. If circle C1 touches side P1 P2 , then circle C2 also touches this side, at a point
closer to P2 than the previous tangency point. Shifting the index by one, if circle C2
does not touch side P2 P3 but touches side P1 P2 , then it intersects side P1 P3 , and the
next circle C3 touches side P1 P3 , at a point closer to P3 than the intersection points.
See Figure 5 below for an illustration.

What about the case when the initial circle touches the extensions of both sides,
P1 P2 and P1 P3 ? If the circle does not intersect side P2 P3 , then the next circle in the
chain cannot be constructed, so this case is not relevant to us. If the first circle intersects
side P2 P3 , then the next circle touches side P2 P3 , and thus satisfies the assumption of
the theorem, see Figure 3. Hence this assumption holds, starting with the second circle
in the chain, and we may make it without loss of generality.

690 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123

Figure 3. When the initial circle touches the extensions of both sides of the triangle

3. BEGINNING OF THE PROOF. The proof consists of reducing the system to an iteration of a piecewise linear function; this is achieved by a trigonometric change of
variables (see [3, 9, 10, 4] for versions of this approach). The choices of coordinates
and various manipulations may look somewhat unmotivated; they are merely justified
by the fact that they work. The reader interested in a coordinate-free, but less elemen-
tary, approach is referred to [9].
Let us introduce some notation. The angles of the triangle are $2\alpha_1$, $2\alpha_2$, and $2\alpha_3$; its side lengths are $a_1$, $a_2$, $a_3$ (with the usual convention that the $i$th side is opposite the $i$th vertex). Let $p = (a_1 + a_2 + a_3)/2$. We note that $p > a_i$ for $i = 1, 2, 3$: this is the triangle inequality. We denote the radii of the circles $C_i$ by $r_i$, $i = 1, 2, \ldots$, and assume that $C_i$ is a circle that is inscribed into the $(i \bmod 3)$rd angle.


Figure 4. The first case of equation (1): |P1 A| + |AB| + |B P2 | = |P1 P2 |

If two circles of radii $r_1$ and $r_2$ are tangent externally, then the length of their common tangent segment (segment $AB$ in Figures 4, 5, 6) is

$\sqrt{(r_1 + r_2)^2 - (r_1 - r_2)^2} = 2\sqrt{r_1 r_2}.$




Figure 5. The second case of equation (1): |P2 A| − |AB| + |B P3| = |P2 P3|


Figure 6. Another illustration for the second case of equation (1): |P2 A| − |AB| + |B P3| = |P2 P3|

Thus, depending on the mutual positions of the consecutive circles, as shown in Figures 4, 5, and 6, we obtain the equations

$r_1 \cot\alpha_1 + 2\sqrt{r_1 r_2} + r_2 \cot\alpha_2 = a_3 \quad\text{or}\quad r_1 \cot\alpha_1 - 2\sqrt{r_1 r_2} + r_2 \cot\alpha_2 = a_3,$ (1)

or the cyclic permutation of the indices 1, 2, 3 thereof. Specifically, if C1 is tangent to the side P1 P2, then we have the first equation in (1), and if C1 is tangent to the extension of side P1 P2, then we have the second equation.

4. SOLVING THE EQUATIONS. The equations in (1) determine the new radius r2
as a function of the previous one r1 . We shall solve these equations in two steps. First,
introduce the notations

$u_1 = \sqrt{r_1 \cot\alpha_1}, \qquad e_3 = \sqrt{\tan\alpha_1 \tan\alpha_2},$

and their cyclic permutations. Then (1) is rewritten as

$u_1^2 \pm 2 e_3 u_1 u_2 + u_2^2 = a_3,$ (2)

or

$u_1(u_1 \pm e_3 u_2) + u_2(u_2 \pm e_3 u_1) = a_3.$ (3)

Solving (2) for $u_2$, we obtain

$u_2 = -e_3 u_1 + \sqrt{a_3 - (1 - e_3^2)u_1^2}, \quad\text{or}\quad u_2 = e_3 u_1 - \sqrt{a_3 - (1 - e_3^2)u_1^2},$ (4)

accordingly as the sign in (2) is positive or negative. The minus sign in front of the
radical in the second formula (4) is because our construction chooses the smaller of
the two circles tangent to the previous one. Likewise, solving for $u_1$ yields

$u_1 = -e_3 u_2 + \sqrt{a_3 - (1 - e_3^2)u_2^2}, \quad\text{or}\quad u_1 = e_3 u_2 + \sqrt{a_3 - (1 - e_3^2)u_2^2},$

again depending on the sign in (2). The plus sign in front of the radical in the second
formula is due to the fact that, going in the reverse direction, from C2 to C1 , one
chooses the greater of the two circles. Substituting this into (3), we obtain

$u_1\sqrt{a_3 - (1 - e_3^2)u_2^2} \pm u_2\sqrt{a_3 - (1 - e_3^2)u_1^2} = a_3.$ (5)

The sign depends on whether $u_1^2$ is smaller or greater than $a_3$ (and if $u_1^2 = a_3$, then $u_2 = 0$ in (4)).

5. TRIGONOMETRIC SUBSTITUTION. We shall rewrite the previous formula as the formula for the sine of the sum or difference of two angles. To do so, we need a lemma.
Given a triangle $ABC$, let $a$, $b$, $c$ be its sides, $p$ its semiperimeter, and $2\alpha$, $2\beta$, $2\gamma$ its angles.

Lemma 2. For the triangle $ABC$, as described above, one has

$1 - \tan\alpha\tan\beta = \dfrac{c}{p}.$


Figure 7. For the proof of Lemma 2

Proof. Let $R$ be the inradius and $S$ the area of the triangle. Let

$T_A = AF = AG, \quad T_B = BG = BE, \quad T_C = CE = CF;$

see Figure 7. Then $p = T_A + T_B + T_C$ and $S = Rp$. By Heron's formula, $S = \sqrt{p\,T_A T_B T_C}$. Therefore, $R\sqrt{p} = \sqrt{T_A T_B T_C}$.



On the other hand, $\tan\alpha = R/T_A$, $\tan\beta = R/T_B$, hence

$1 - \tan\alpha\tan\beta = 1 - \dfrac{R^2}{T_A T_B} = 1 - \dfrac{T_C}{p} = \dfrac{T_A + T_B}{p} = \dfrac{c}{p},$

as claimed.
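Lemma 2 is easy to confirm numerically. The following Python sketch is our addition, not part of the article: it recovers the half-angles $\alpha$, $\beta$ of an arbitrarily chosen triangle via the law of cosines and compares $1 - \tan\alpha\tan\beta$ with $c/p$.

```python
import math

def lemma2_check(a, b, c):
    """For a triangle with sides a, b, c and angles 2*alpha, 2*beta, 2*gamma
    opposite them, return (1 - tan(alpha)*tan(beta), c/p), p the semiperimeter."""
    p = (a + b + c) / 2.0
    # full angles from the law of cosines; alpha, beta are their halves
    A = math.acos((b * b + c * c - a * a) / (2 * b * c))
    B = math.acos((a * a + c * c - b * b) / (2 * a * c))
    alpha, beta = A / 2.0, B / 2.0
    return 1.0 - math.tan(alpha) * math.tan(beta), c / p

lhs, rhs = lemma2_check(3.0, 4.0, 5.0)   # both equal 5/6 for the 3-4-5 triangle
```

Any valid side lengths may be substituted; the two returned values agree to machine precision.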

Using the lemma, we rewrite (5) as

$\dfrac{u_1}{\sqrt p}\sqrt{1 - \dfrac{u_2^2}{p}} \pm \dfrac{u_2}{\sqrt p}\sqrt{1 - \dfrac{u_1^2}{p}} = \sqrt{\dfrac{a_3}{p}}.$ (6)

We are ready for the final change of variables. Let

$\beta_i = \arcsin\left(\dfrac{u_i}{\sqrt p}\right), \qquad \gamma_i = \arcsin\left(\sqrt{\dfrac{a_i}{p}}\right).$

To justify the second formula, we note that $a_i < p$. Likewise, each circle is tangent to a side of the triangle, so $u_i^2$ is not greater than some side, and hence less than $p$. This justifies the first formula.
In the new variables, (6) can be rewritten as $\sin(\beta_1 \pm \beta_2) = \sin\gamma_3$, where one has a plus sign for $\beta_1 < \gamma_3$ and a minus sign otherwise. Hence

$\beta_2 = |\beta_1 - \gamma_3|.$ (7)

This equation describes the dynamics of the chain of circles.


Before studying the dynamics of this function we note that the angles $\gamma_i$ satisfy the triangle inequality, as the next lemma asserts. Assume that $\gamma_1 \le \gamma_2 \le \gamma_3$.

Lemma 3. With the same hypothesis as Lemma 2, one has $\gamma_3 < \gamma_1 + \gamma_2$.

Proof. We start by noting that $\sin\gamma_i < 1$ for $i = 1, 2, 3$, and that

$\sin^2\gamma_1 + \sin^2\gamma_2 + \sin^2\gamma_3 = 2 \quad\text{or}\quad \sin^2\gamma_3 = \cos^2\gamma_1 + \cos^2\gamma_2.$

Assume that the triangle inequality is violated for some triangle. Since the inequality holds for an equilateral triangle, one can deform it to obtain a triangle for which $\gamma_3 = \gamma_1 + \gamma_2$. Then

$\cos^2\gamma_1 + \cos^2\gamma_2 = \sin^2\gamma_3 = (\sin\gamma_1\cos\gamma_2 + \sin\gamma_2\cos\gamma_1)^2.$

It follows, after some manipulations, that

$\sin\gamma_1\sin\gamma_2\cos\gamma_1\cos\gamma_2 = \cos^2\gamma_1\cos^2\gamma_2 \quad\text{or}\quad \sin\gamma_1\sin\gamma_2 = \cos\gamma_1\cos\gamma_2.$

Therefore,

$\cos(\gamma_1 + \gamma_2) = 0 \quad\text{or}\quad \gamma_1 + \gamma_2 = \frac{\pi}{2}.$

Hence $\sin^2\gamma_1 + \sin^2\gamma_2 = 1$, and thus $\sin\gamma_3 = 1$. This is a contradiction.
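Lemma 3 can also be stress-tested numerically. The sketch below is our addition: it samples arbitrary valid triangles, computes the sorted values $\gamma_i = \arcsin\sqrt{a_i/p}$, and checks the triangle inequality for them.

```python
import math
import random

def gammas(a1, a2, a3):
    """Return the sorted gamma_i = arcsin(sqrt(a_i / p)), p the semiperimeter."""
    p = (a1 + a2 + a3) / 2.0
    return sorted(math.asin(math.sqrt(s / p)) for s in (a1, a2, a3))

def gammas_satisfy_triangle_inequality(trials=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = sorted(rng.uniform(0.1, 1.0) for _ in range(2))
        z = rng.uniform(y - x + 1e-9, x + y - 1e-9)  # valid third side
        g1, g2, g3 = gammas(x, y, z)
        if not g3 < g1 + g2:
            return False
    return True
```

Every sampled triangle passes, as the lemma predicts.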

6. PIECEWISE LINEAR DYNAMICS. We are ready to investigate the function
(7). Although the dynamics of a piecewise linear function can be very complex [7],
ours is quite simple.
Iterating the map three times, with the values of the index $i = 1, 2, 3$, yields the function $y = |||x - \gamma_1| - \gamma_2| - \gamma_3|$. We scale the $xy$ plane so that $\gamma_1 = 1$ and rewrite the function as

$f(x) = |||x - 1| - a| - b|,$ (8)

where $a \le b$ and $b < a + 1$. We will show that every orbit of the map $f$ is eventually
2-periodic, see Figure 8.


Figure 8. Iteration of the function f (x) for a = 3.6, b = 4.2

The graph of f (x) is shown in Figure 9 with the characteristic points marked.


Figure 9. The graph $y = f(x)$. The segment $[b - a, 1]$ consists of 2-periodic points.

It is clear that iterations of the function $f$ take every orbit to the segment $[0, b]$, and this segment is mapped to itself. Indeed, if $x \ge a + b + 1$, then $f(x) = x - a - b - 1$, and if $x \le a + b + 1$, then $f(x) \le b$. Thus iterations of the function $f$ will keep decreasing $x$ until it lands on $[0, b]$.
Let

$I_1 = [0, b - a], \quad I_2 = [b - a, 1], \quad I_3 = [1, b].$

Then $I_2$ consists of 2-periodic points, and we need to show that every orbit lands on this interval. Indeed, $f(I_1) = [1, b - a + 1] \subset I_3$. On the other hand, each iteration of $f$ chops off from the left a segment of length $1 + a - b$ from $I_3$ and sends it to $I_2$. It follows that every orbit eventually reaches $I_2$.
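Both claims are easy to confirm experimentally. The Python sketch below is our addition; it uses the parameter values $a = 3.6$, $b = 4.2$ of Figure 8.

```python
def f(x, a, b):
    """One triple step of the circle dynamics: f(x) = |||x - 1| - a| - b|."""
    return abs(abs(abs(x - 1.0) - a) - b)

def preperiod(x, a, b, max_iter=10**6):
    """Number of iterations before the orbit of x lands in I2 = [b - a, 1]."""
    for n in range(max_iter):
        if b - a <= x <= 1.0:
            return n
        x = f(x, a, b)
    raise RuntimeError("orbit did not reach I2")

a, b = 3.6, 4.2
steps = preperiod(12.0, a, b)     # finite: the orbit decreases into [0, b]
x = 0.7                           # a point of I2 = [0.6, 1]
two_step = f(f(x, a, b), a, b)    # f swaps points of I2 in pairs: recovers 0.7 up to rounding
```

The orbit of $x = 12$ reaches $I_2$ after seven steps, and once there it oscillates between two values forever.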



If $|I_2| = a + 1 - b$ is small, it may take an orbit a long time to reach $I_2$. For example, take $a = 1$ and $b = 2 - \varepsilon$. Then, choosing $\varepsilon$ sufficiently small, one can make the pre-period of the point $x = \varepsilon$ arbitrarily long. This choice corresponds to an isosceles triangle with the obtuse angle close to $\pi$ and a small initial circle C1; compare this with Figure 2.
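This growth of the pre-period is easy to observe directly. The sketch below is our addition; it iterates $f$ with $a = 1$ and $b = 2 - \varepsilon$, starting the orbit at $x = \varepsilon$.

```python
def f(x, a, b):
    # f(x) = |||x - 1| - a| - b|
    return abs(abs(abs(x - 1.0) - a) - b)

def preperiod_length(x, a, b, cap=10**7):
    """Steps until the orbit enters the 2-periodic segment I2 = [b - a, 1]."""
    n = 0
    while not (b - a <= x <= 1.0):
        x = f(x, a, b)
        n += 1
        if n > cap:
            raise RuntimeError("no convergence")
    return n

# |I2| = a + 1 - b = eps shrinks, and the pre-period grows roughly like 1/eps
lengths = [preperiod_length(eps, 1.0, 2.0 - eps) for eps in (0.1, 0.01, 0.001)]
```

The three pre-periods grow steadily as $\varepsilon$ decreases, in line with the theorem.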

7. FINAL COMMENTS.
1. Although our considerations are close to those in [3], the authors of this book
did not consider the pre-periodic behavior of the chain of circles. They addressed
the issue of the two choices in each step of the construction and noted:

... we may make the first three sign choices quite arbitrarily provided that, thereafter, we make correct choices ...

so that the chain becomes 6-periodic.


2. For a parallelogram, a similar phenomenon holds: the chain of circles is eventu-
ally 4-periodic but with a pre-period, see [10]. Our analysis is similar to that of
Troubetzkoy.
3. For n > 3, the chain of circles inscribed in an n-gon is generically chaotic, see
[10] for a proof when n = 4 and Figure 10 for an illustration when n = 5. How-
ever, for every n, there is a class of n-gons enjoying 2n-periodicity, see [9].
Presumably, this periodicity is also eventual, with an arbitrarily long pre-period.

Figure 10. A chain of circles in a pentagon

4. A version of the six circles theorem holds for curvilinear triangles made of arcs
of circles [3, 12, 8], and a generalization to n-gons is available as well [9]. Again,
one expects eventual periodicity with arbitrarily long pre-periods.
5. Constructing the chains of circles, we consistently chose the smaller of the two
circles tangent to the previous one. It is interesting to investigate what happens
when other choices are made; for example, one may toss a coin at each step. See
Figure 11 for an experiment with a randomly chosen triangle.
6. The six circles theorem is closely related to the Malfatti problem: to inscribe
three pairwise tangent circles into the three angles of a triangle; see, e.g., [5] and
the references therein. This 3-periodic chain of circles exists and is unique for


Figure 11. The histogram represents 3000 chains of circles in a generic triangle. The selection, out of two,
of each next circle in a chain is random. The horizontal axis represents the length of the pre-period, and the
vertical axis the number of chains having this pre-period.

every triangle; it corresponds to the fixed point of the function f (x). See [1] for
a discussion of the Malfatti problem close to our considerations.

ACKNOWLEDGMENT. Most of the experiments that inspired this note, and most of the drawings, were made in
GeoGebra. The second author was supported by the NSF grant DMS-1105442. We are grateful to the referees
for their criticism and advice.

REFERENCES

1. V. Belenky, A. Zaslavsky, On the Malfatti problem (in Russian), Kvant No. 4 (1994) 38–42.
2. A. Bogomolny, Six Circles Theorem, http://www.cut-the-knot.org/Curriculum/Geometry/Evelyn.shtml
3. C. J. A. Evelyn, G. Money-Coutts, J. A. Tyrrell, The Seven Circles Theorem and Other New Theorems. Stacey Int., London, 1974.
4. D. Fuchs, S. Tabachnikov, Mathematical Omnibus. Thirty Lectures on Classic Mathematics. Amer. Math. Soc., Providence, RI, 2007.
5. R. Guy, The lighthouse theorem, Morley & Malfatti–A budget of paradoxes, Amer. Math. Monthly 114 (2007) 97–141.
6. M. Laird, J. Silvester, John Alfred Tyrrell, 1932–1992, Bull. Lond. Math. Soc. 43 (2011) 401–405.
7. M. Nathanson, Piecewise linear functions with almost all points eventually periodic, Proc. Amer. Math. Soc. 60 (1976) 75–81.
8. J. Rigby, On the Money-Coutts configuration of nine antitangent cycles, Proc. Lond. Math. Soc. 43 (1981) 110–132.
9. S. Tabachnikov, Going in circles: Variations on the Money-Coutts theorem, Geom. Dedicata 80 (2000) 201–209.
10. S. Troubetzkoy, Circles and polygons, Geom. Dedicata 80 (2000) 289–296.
11. J. A. Tyrrell, Cecil John Alvin Evelyn, Bull. Lond. Math. Soc. 9 (1977) 328–329.
12. J. A. Tyrrell, M. T. Powell, A theorem in circle geometry, Bull. Lond. Math. Soc. 3 (1971) 70–74.
13. http://en.wikipedia.org/wiki/Six_circles_theorem
14. E. Weisstein, Six Circles Theorem–From MathWorld, A Wolfram Web Resource, http://mathworld.wolfram.com/SixCirclesTheorem.html



DENNIS IVANOV is a physicist by education and an amateur mathematician. He likes mathematical experiments; see http://www.geogebratube.org/user/profile/id/4032 for his GeoGebra applets. His hobbies include yoga, music, and outdoor activities. He lives in Moscow, Russia.
[email protected]

SERGE TABACHNIKOV was educated in the Soviet Union (Ph.D. from Moscow State University); since
1990, he has been teaching at universities in the USA. His mathematical interests include geometry, topology,
and dynamics. He served as a Deputy Director of ICERM (Institute for Computational and Experimental
Research in Mathematics) at Brown University. He is the Editor-in-Chief of Experimental Mathematics,
the Editor of the Mathematical Gems and Curiosities column of the Mathematical Intelligencer, and in 2010–2015, he served as the Notes Editor of this Monthly.
Department of Mathematics, Penn State, University Park, PA 16802
[email protected]

NOTES
Edited by Sergei Tabachnikov

Cayley's Formula: A Page From The Book


Arnon Avron and Nachum Dershowitz

Abstract. We present a simple proof of Cayley's formula.

We give a short elementary proof of Cayley's famous formula for the enumeration $T_n$ of free, unrooted trees with $n \ge 1$ labeled nodes. We first count $F_{n,k}$, the number of $n$-node forests composed of $k$ rooted, directed trees, for $1 \le k \le n$. For the history of the formula, including Jim Pitman's use of directed forests, see [1, pp. 221–226].
The crux of the proof is simple double counting. There are two equivalent ways
of counting the number of k-tree forests with one designated internal (nonroot) node,
which shows, for all $k = 1, \ldots, n - 1$, that

$(n - k)F_{n,k} = kn\,F_{n,k+1}.$ (*)

For the left side of (*): Consider one of the $F_{n,k}$ forests with $k$ trees. Designate any one of its $n - k$ internal nodes.
For the right side: Consider one of the Fn,k+1 forests with k + 1 trees. Choose any
one of the n nodes, and hang from it any one of the k trees not containing that node.
The root of that grafted subtree is the designated internal node.
Iterating (*) $n - 1$ times gives

$F_{n,1} = \frac{1 \cdot n}{n-1} F_{n,2} = \frac{1 \cdot n}{n-1} \cdot \frac{2 \cdot n}{n-2} F_{n,3} = \cdots = \frac{1 \cdot n}{n-1} \cdot \frac{2 \cdot n}{n-2} \cdots \frac{(n-1) \cdot n}{1} F_{n,n}.$

The $k$ and $n - k$ factors all cancel each other out. Because there is precisely one way of turning $n$ nodes into $n$ distinct trees (each root being a whole tree), we have $F_{n,n} = 1$. Thus, the number $F_{n,1}$ of $n$-node rooted trees is $n^{n-1}$. Since any of the $n$ nodes in a tree can be the root, $F_{n,1} = nT_n$; Cayley's formula, $T_n = n^{n-2}$, follows.
Applying (*) only $n - k$ times yields $F_{n,k} = \binom{n}{k} k\, n^{n-k-1}$, for $k = 1, \ldots, n$.
Alternatively, the relation $(k + 1)R_{n,k} = kn\,R_{n,k+1}$ for the number $R_{n,k}$ of $n$-node forests with $k$ designated roots leads to $R_{n,k} = k\,n^{n-k-1}$ and to $T_n = R_{n,1} = n^{n-2}$.
As a final remark, there are $(n + 1)^{n-1}$ rooted trees with $n + 1$ nodes that all share the same root. Each corresponds to a rooted forest with $n$ nodes: just chop off the root node. Therefore, the limit of the ratio of rooted labeled forests to rooted labeled trees, as their size grows, is $\lim_{n\to\infty} (n + 1)^{n-1}/n^{n-1} = e$.
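For small $n$ the closed forms can be confirmed by brute force. The Python sketch below is our addition: it enumerates all parent assignments on $n = 4$ labeled nodes and tallies the acyclic ones by number of roots.

```python
from itertools import product
from math import comb

def rooted_forests(n):
    """Tally n-node rooted forests by number of trees: each node is either
    a root (parent -1) or points to another node; keep acyclic assignments."""
    counts = {}
    choices = [[-1] + [j for j in range(n) if j != i] for i in range(n)]
    for parents in product(*choices):
        ok = True
        for start in range(n):
            seen, v = set(), start
            while v != -1 and ok:       # follow parents; a repeat means a cycle
                if v in seen:
                    ok = False
                seen.add(v)
                v = parents[v]
            if not ok:
                break
        if ok:
            k = sum(1 for p in parents if p == -1)
            counts[k] = counts.get(k, 0) + 1
    return counts

n = 4
F = rooted_forests(n)
# C(n-1, k-1) * n^(n-k) equals C(n,k) * k * n^(n-k-1), kept integral here
predicted = {k: comb(n - 1, k - 1) * n ** (n - k) for k in range(1, n + 1)}
```

The tallies match $F_{n,k} = \binom{n}{k} k\, n^{n-k-1}$ exactly, their sum is $(n+1)^{n-1} = 125$, and $F_{4,1} = 4^3 = 64$ as the formula for rooted trees predicts.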

ACKNOWLEDGMENT. We thank Ed Reingold for his suggestions and everyone who read earlier drafts.

http://dx.doi.org/10.4169/amer.math.monthly.123.7.699
MSC: Primary 05C05



REFERENCE

1. M. Aigner, G. M. Ziegler, Proofs from THE BOOK. Fifth edition. Illustrations by K. H. Hofmann. Springer-
Verlag, Berlin, 2014.

School of Computer Science, Tel Aviv University, Ramat Aviv, 69978 Israel
[email protected]
School of Computer Science, Tel Aviv University, Ramat Aviv, 69978 Israel
[email protected]

Congratulations to the U.S. team, Ankan Bhattacharya, Michael Kural, Allen Liu,
Junyao Peng, Ashwin Sah, and Yuan Yao for their second consecutive win at the
57th International Mathematical Olympiad in Hong Kong! Not only did the team
bring home first place for the U.S., but students Allen Liu and Yuan Yao earned
perfect scores on the exam, and all six U.S. students took home a gold medal for
their individual high scores.

On Tangents and Secants of Infinite Sums
Michael Hardy

Abstract. We prove some identities involving tangents, secants, and cosecants of infinite sums.

For $k = 0, 1, 2, \ldots$ let $e_k$ be the $k$th-degree elementary symmetric function of $\tan\theta_j$, $j = 1, 2, 3, \ldots$, i.e., the sum of all products of $k$ of the tangents. It is routine to prove by induction on the number of terms, when that number is finite, that

$\tan\sum_j \theta_j = \dfrac{e_1 - e_3 + e_5 - \cdots}{e_0 - e_2 + e_4 - \cdots},$

$\sec\sum_j \theta_j = \dfrac{\prod_j \sec\theta_j}{e_0 - e_2 + e_4 - \cdots},$ (1)

$\csc\sum_j \theta_j = \dfrac{\prod_j \sec\theta_j}{e_1 - e_3 + e_5 - \cdots}.$

As far as I know, the last two identities do not appear in any refereed source. The
case of the first one in which only finitely many terms appear on the left appears in
[1, page 47].
We will prove that the last two identities hold when the sum on the left converges
absolutely. The first identity in that case follows as a corollary. I added a quick sketch
of these proofs to Wikipedia's List of trigonometric identities [2] in 2012.

1. ELEMENTARY SYMMETRIC FUNCTIONS AND CONVERGENCE. The $k$th-degree elementary symmetric function $e_{n,k}$ in finitely many variables $x_i$, $i \in \{1, \ldots, n\}$ is the sum of all products of $k$ of those variables:

$e_{n,k} = \sum_{\substack{A \subseteq \{1,\ldots,n\} \\ |A| = k}} \prod_{j \in A} x_j.$

In particular, $e_{n,0} = 1$ and if $k > n$ then $e_{n,k} = 0$. For example, $e_{4,2} = x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3 + x_2x_4 + x_3x_4$.
The $k$th-degree elementary symmetric function $e_k$ in variables $x_i$, $i \in \mathbb{N} = \{1, 2, 3, \ldots\}$ is the sum of all products of $k$ of those variables:

$e_k = \sum_{\substack{A \subseteq \mathbb{N} \\ |A| = k}} \prod_{j \in A} x_j.$ (2)

http://dx.doi.org/10.4169/amer.math.monthly.123.7.701
MSC: Primary 33B10



In the two infinite series

$e_0 - e_2 + e_4 - \cdots$
$e_1 - e_3 + e_5 - \cdots$

each term shown is itself $\pm 1$ times one of the infinite series (2). As long as convergence is absolute, the order of summation will not affect the value of the sum, but our proofs will involve limits of partial sums. Hence, we will evaluate the sums in the following order:

$e_0 - e_2 + e_4 - \cdots = \lim_{n\to\infty} \sum_{\text{even } k \le n} (-1)^{k/2} e_{n,k},$
$e_1 - e_3 + e_5 - \cdots = \lim_{n\to\infty} \sum_{\text{odd } k \le n} (-1)^{(k-1)/2} e_{n,k}.$ (3)

2. EXAMPLES OF TRIGONOMETRIC IDENTITIES. In all that follows, we will have $x_j = \tan\theta_j$ for $j = 1, 2, 3, \ldots$. Here are our identities in the case where only three $\theta$s are not 0:

$\tan(\theta_1 + \theta_2 + \theta_3) = \dfrac{\tan\theta_1 + \tan\theta_2 + \tan\theta_3 - \tan\theta_1\tan\theta_2\tan\theta_3}{1 - \tan\theta_1\tan\theta_2 - \tan\theta_1\tan\theta_3 - \tan\theta_2\tan\theta_3},$

$\sec(\theta_1 + \theta_2 + \theta_3) = \dfrac{\sec\theta_1\sec\theta_2\sec\theta_3}{1 - \tan\theta_1\tan\theta_2 - \tan\theta_1\tan\theta_3 - \tan\theta_2\tan\theta_3},$

$\csc(\theta_1 + \theta_2 + \theta_3) = \dfrac{\sec\theta_1\sec\theta_2\sec\theta_3}{\tan\theta_1 + \tan\theta_2 + \tan\theta_3 - \tan\theta_1\tan\theta_2\tan\theta_3}.$
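These three identities are easy to spot-check numerically. The sketch below is our addition; the sample angles are arbitrary, chosen only so that no denominator vanishes.

```python
import math

th = (0.3, 0.5, 0.7)                 # sample angles (radians), away from poles
t1, t2, t3 = (math.tan(t) for t in th)
e1 = t1 + t2 + t3                    # first elementary symmetric function
e2 = t1 * t2 + t1 * t3 + t2 * t3     # second
e3 = t1 * t2 * t3                    # third
s = sum(th)

sec = lambda u: 1.0 / math.cos(u)
csc = lambda u: 1.0 / math.sin(u)
prod_sec = sec(th[0]) * sec(th[1]) * sec(th[2])

tan_err = abs(math.tan(s) - (e1 - e3) / (1 - e2))
sec_err = abs(sec(s) - prod_sec / (1 - e2))
csc_err = abs(csc(s) - prod_sec / (e1 - e3))
```

All three errors are at machine-precision level.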

3. RESULTS IN THE INFINITE CASE.

Theorem 1. Suppose $\sum_{j=1}^\infty \theta_j$ converges absolutely. Then the three identities (1) hold, with the infinite sums involving symmetric functions defined as in (3). Convergence in (3) is absolute.

Proof. We take the codomain of trigonometric functions to be the one-point compactification $\mathbb{R} \cup \{\infty\}$. Thus, they are continuous everywhere, and as $n \to \infty$, the secant, cosecant, and tangent of $\sum_{j=1}^n \theta_j$ approach the values of those functions at $\sum_{j=1}^\infty \theta_j$, including the case in which a pole occurs at the value of that sum.
Our central tactic is to rearrange the last two identities in (1) in the case involving only finitely many terms as follows:

$e_{n,0} - e_{n,2} + e_{n,4} - \cdots = \dfrac{\prod_{j=1}^n \sec\theta_j}{\sec\sum_{j=1}^n \theta_j},$

$e_{n,1} - e_{n,3} + e_{n,5} - \cdots = \dfrac{\prod_{j=1}^n \sec\theta_j}{\csc\sum_{j=1}^n \theta_j}$ (4)

(the sums on the left have only finitely many nonzero terms). The identities (4) are
proved by a routine induction on n.

For large enough $N$, for all $j \ge N$, we have $\theta_j$ so close to 0 that $1 \le \sec\theta_j \le 1 + \theta_j^2 \le \exp(\theta_j^2)$. Hence,

$1 \le \prod_{j=N}^n \sec\theta_j \le \prod_{j=N}^n (1 + \theta_j^2) \le \prod_{j=N}^n \exp(\theta_j^2) = \exp\sum_{j=N}^n \theta_j^2,$

and that converges as $n \to \infty$ since $\sum_j \theta_j$ converges absolutely. Thus, the right sides
of (4) converge and, therefore, so do the left sides, and to the same limit.
To show that convergence on the left side is absolute, we let $f_{n,k}$ be the $k$th-degree elementary symmetric function in the absolute values $|x_1|, \ldots, |x_n|$. It is enough to prove that

$\lim_{n\to\infty} \sum_{k=0}^n f_{n,k} < \infty.$

We have

$\lim_{n\to\infty} \sum_{k=0}^n f_{n,k} = \lim_{n\to\infty} \prod_{i=1}^n (1 + |x_i|) \le \lim_{n\to\infty} \prod_{i=1}^n \exp|x_i| = \lim_{n\to\infty} \exp\sum_{i=1}^n |x_i| < \infty.$

Finally, since $\tan\theta = \sec\theta/\csc\theta$, the first identity in (1) follows.
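The finite identities (4), on which the proof rests, can be verified mechanically for any $n$ by reading the $e_{n,k}$ off the coefficients of $\prod_j (1 + t\tan\theta_j)$. The sketch below is our addition, with an arbitrary sample of five angles.

```python
import math

def elem_sym(xs):
    """Elementary symmetric functions e_{n,0..n}: coefficients of prod (1 + x_j t)."""
    e = [1.0]
    for x in xs:
        prev = e + [0.0]
        e = [prev[k] + (x * prev[k - 1] if k > 0 else 0.0) for k in range(len(prev))]
    return e

thetas = [0.4, -0.2, 0.3, 0.1, -0.35]   # arbitrary sample; sum away from poles
e = elem_sym([math.tan(t) for t in thetas])
even = sum((-1) ** (k // 2) * e[k] for k in range(0, len(e), 2))
odd = sum((-1) ** ((k - 1) // 2) * e[k] for k in range(1, len(e), 2))

prod_sec = 1.0
for t in thetas:
    prod_sec /= math.cos(t)
s = sum(thetas)
# (4) predicts: even = prod_sec / sec(s) = prod_sec*cos(s), odd = prod_sec*sin(s)
```

Both alternating sums match the predictions of (4) to machine precision.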

REFERENCES

1. E. W. Hobson, A Treatise on Plane Trigonometry, Cambridge Univ. Press, Cambridge, 1891.
2. https://en.wikipedia.org/wiki/List_of_trigonometric_identities

Hamline University, Saint Paul, MN 55104


[email protected]



Factorization of a Matrix Differential
Operator Using Functions in Its Kernel
Alex Kasman

Abstract. Just as knowing some roots of a polynomial allows one to factor it, a well-known
result provides a factorization of any scalar differential operator given a set of linearly inde-
pendent functions in its kernel. This note provides a straightforward generalization to the case
of matrix coefficient differential operators.

1. MOTIVATION. An ordinary differential operator (ODO) is a commonly used mathematical object that turns one function¹ into another by adding up the products of some other specified functions with its derivatives. Symbolically, an ODO is a polynomial in $\partial$ having coefficients that are functions of $x$. The highest power of $\partial$ appearing with a nonzero coefficient is called the order of the operator. ODOs act on functions of $x$ according to the rule:

$L = \sum_{i=0}^n \alpha_i(x)\,\partial^i \quad\text{implies}\quad L(f) = \sum_{i=0}^n \alpha_i(x)\,f^{(i)}(x).$ (1)

Much interest in differential operators comes from their use in writing linear differential equations. However, ODOs also have the algebraic structure of a noncommutative ring. They can be added as one would add any polynomials, by combining the coefficients of similar powers of $\partial$, and multiplication is defined by extending the following rule for the product of two monomials linearly over sums:

$(\alpha(x)\,\partial^m) \cdot (\beta(x)\,\partial^n) = \sum_{i=0}^m \binom{m}{i}\,\alpha(x)\,\beta^{(i)}(x)\,\partial^{m+n-i}.$ (2)

Although this definition may look complicated at first, in fact, it is simply a con-
sequence of the usual product rule from calculus, applied here to guarantee that
multiplication corresponds to operator composition as one would expect. That is, this
definition was chosen so that $(L \cdot Q)(f) = L(Q(f))$.
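The ring structure can be made concrete with a small sketch. The code below is our addition (it is not from this note, and it restricts attention to polynomial coefficients so that all arithmetic is exact): it implements rules (1) and (2) and checks that multiplication agrees with composition.

```python
from math import comb

# An ODO is a list L with L[i] the coefficient of d^i; each coefficient is a
# polynomial given as a list of monomial coefficients in x (lowest degree first).

def p_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]

def p_mul(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def p_deriv(p):
    return [k * p[k] for k in range(1, len(p))] or [0.0]

def p_eval(p, x):
    return sum(c * x ** k for k, c in enumerate(p))

def odo_apply(L, f):
    """Apply L = sum_i a_i(x) d^i to the polynomial f, per rule (1)."""
    out, d = [0.0], f
    for a in L:
        out = p_add(out, p_mul(a, d))
        d = p_deriv(d)
    return out

def odo_mul(L, Q):
    """Product of ODOs per rule (2): (a d^m)(b d^n) = sum_i C(m,i) a b^(i) d^(m+n-i)."""
    out = [[0.0] for _ in range(len(L) + len(Q) - 1)]
    for m, a in enumerate(L):
        for n, b in enumerate(Q):
            d = b
            for i in range(m + 1):
                out[m + n - i] = p_add(out[m + n - i],
                                       [comb(m, i) * c for c in p_mul(a, d)])
                d = p_deriv(d)
    return out

# multiplication corresponds to composition: (L*Q)(f) == L(Q(f))
L = [[1.0], [0.0, 1.0]]                  # 1 + x d
Q = [[0.0, 0.0, 1.0], [0.0], [1.0]]      # x^2 + d^2
f = [0.0, 2.0, 0.0, 1.0]                 # 2x + x^3
lhs = p_eval(odo_apply(odo_mul(L, Q), f), 1.7)
rhs = p_eval(odo_apply(L, odo_apply(Q, f)), 1.7)
```

Here $(1 + x\partial)(x^2 + \partial^2)$ applied to $2x + x^3$ gives $6x^5 + 8x^3 + 12x$ either way, illustrating that (2) is exactly what the product rule forces.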
Given the rule (2) for multiplication, the inverse question of factorization naturally arises. One factorization method for ODOs is surprisingly reminiscent of a familiar fact about polynomials. If $x = \lambda$ is a root of the polynomial $p(x)$, then you know that it has a factor of $x - \lambda$, the simplest first degree polynomial with this property. Similarly, for any nonzero function $f(x)$, $\partial - f'/f$ is the simplest first order differential operator having $f$ in its kernel, and if $L$ is any ODO with $f$ in its kernel, then $L = Q \cdot (\partial - f'/f)$ for some differential operator $Q$. Moreover, just as knowing additional roots of the polynomial $p$ would allow further factorization, one may factor a differential
http://dx.doi.org/10.4169/amer.math.monthly.123.7.704
MSC: Primary 16S32
¹Here and throughout this note, functions of x will be understood to mean sufficiently differentiable functions of x even if differentiability is not specifically mentioned.

operator of order $n$ into the product of operators of orders $n - k$ and $k$ from the knowledge of $k$ linearly independent functions in its kernel. The general statement written in terms of Wronskian determinants² is as follows.

Theorem 1 (Scalar Case). Let $\psi_1, \ldots, \psi_m$ be functions such that $\mathrm{Wr}(\psi_1, \ldots, \psi_m) \ne 0$. Then (a) the unique monic differential operator $K$ of order $m$ satisfying $K(\psi_i) = 0$ for $1 \le i \le m$ is the one whose action on an arbitrary function $f(x)$ is given by the formula

$K(f) = \dfrac{\mathrm{Wr}(\psi_1, \ldots, \psi_m, f)}{\mathrm{Wr}(\psi_1, \ldots, \psi_m)}$ (3)

and (b) if $L$ is any differential operator satisfying $L(\psi_i) = 0$ for $1 \le i \le m$, then there exists a differential operator $Q$ such that $L = Q \cdot K$.
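Formula (3) is easy to test numerically. The sketch below is our addition: with $m = 2$, $\psi_1 = \sin$, and $\psi_2 = \cos$, the unique monic annihilator is $\partial^2 + 1$, so the Wronskian ratio should return $f'' + f$ for any $f$.

```python
import math

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# each "jet" returns [value, first derivative, second derivative] at x
sin_jet = lambda x: [math.sin(x), math.cos(x), -math.sin(x)]
cos_jet = lambda x: [math.cos(x), -math.sin(x), -math.cos(x)]
exp_jet = lambda x: [math.exp(x), math.exp(x), math.exp(x)]

def K(f_jet, x):
    """K(f) = Wr(sin, cos, f) / Wr(sin, cos), evaluated at x."""
    cols = [sin_jet(x), cos_jet(x), f_jet(x)]
    wr3 = det3([[cols[j][i] for j in range(3)] for i in range(3)])
    wr2 = math.sin(x) * (-math.sin(x)) - math.cos(x) * math.cos(x)   # = -1
    return wr3 / wr2
```

As expected, $K$ kills $\sin$ and $\cos$ and sends $e^x$ to $2e^x$.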

2. THE MATRIX CASE. The purpose of this note is to generalize this well-known and useful result to the case of matrix coefficients³. A matrix coefficient ordinary differential operator (MODO) is again a polynomial in $\partial$ with coefficients that depend on $x$, but we now consider the case in which those coefficients are $N \times N$ matrices. The formulas for multiplying MODOs and for applying them to functions remain the same (see (1) and (2)), but now the products involving the coefficients are understood to be matrix products and the function $f$ is an $N$-vector valued function.
Matrix analogues of Theorem 1(a) already appear in the literature. For example,
a result of Etingof-Gelfand-Retakh [1] allows one to produce a monic MODO with
a specified kernel using quasi-determinants. However, not only is there no published
analogue of Theorem 1(b) for MODOs, it appears that many researchers have sus-
pected that it does not generalize nicely to the matrix case. A common proof of
Theorem 1(b) depends on the fact that any nonzero ODO of order at most n has
a kernel of dimension at most n. So, the fact that any MODO with a singular
leading coefficient has an infinite-dimensional kernel is both an obstacle to gener-
alizing the proof and reason to doubt the validity of the equivalent statement for
MODOs.
It is therefore good news that Theorem 2(b) below does indeed fully generalize the
scalar result to the matrix case without imposing any additional restrictions on the
leading coefficient, order, or kernel of the operator L. In addition to proving this new
result, what follows can be seen as providing a novel alternative proof to Theorem 1
and Theorem 2(a).

Definition. Let $\varphi_1(x), \ldots, \varphi_{MN}(x)$ be $N$-vector valued functions and let $\Phi$ be the $MN \times MN$ block Wronskian matrix

$\Phi = \begin{pmatrix} \varphi_1 & \varphi_2 & \cdots & \varphi_{MN} \\ \varphi_1' & \varphi_2' & \cdots & \varphi_{MN}' \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_1^{(M-1)} & \varphi_2^{(M-1)} & \cdots & \varphi_{MN}^{(M-1)} \end{pmatrix}.$ (4)
²The Wronskian determinant $\mathrm{Wr}(\psi_1, \ldots, \psi_n)$ of the functions $\psi_i(x)$ is defined to be the determinant of the $n \times n$ matrix having $\frac{d^{i-1}}{dx^{i-1}}\psi_j(x)$ in row $i$ and column $j$.
³Another sort of higher-dimensional generalization is the case of partial differential operators, which is considered in [2].



Note that the first $N$ rows of $\Phi$ are given by the selected vector functions and then every other element is the derivative of the element $N$ rows above it. In the case $N = 1$, the nonsingularity of this matrix is famously equivalent to the linear independence of the functions, but that is not true in general. For instance, if $N = 2$, $M = 1$, $\varphi_1 = (1\ \ 1)^\top$ and $\varphi_2 = (x\ \ x)^\top$, then $\det(\Phi) = 0$ even though these are linearly independent functions of $x$.

For the sake of brevity, the following notations will be utilized below. The symbol $\Psi$ will denote the $N \times MN$ matrix $(\varphi_1\ \cdots\ \varphi_{MN})$ and $D(\Psi) = 0$ will be used as a shorthand for the statement that $D(\varphi_i)$ is equal to the zero vector for each $1 \le i \le MN$ (i.e., that the vector functions are in the kernel of some linear operator $D$). The $N \times N$ identity matrix will be written simply as $I$, and more generally, the $m \times m$ identity matrix for any natural number $m$ will be denoted by $I_m$.
The main result can now be stated concisely.

Theorem 2 (Matrix Case). If the functions $\varphi_i$ are chosen so that $\det\Phi \ne 0$, then (a) the differential operator

$K = I\partial^M - \left(\varphi_1^{(M)}\ \cdots\ \varphi_{MN}^{(M)}\right)\Phi^{-1}\begin{pmatrix} I \\ I\partial \\ \vdots \\ I\partial^{M-1} \end{pmatrix}$ (5)

is the unique monic MODO $K$ of order $M$ such that $K(\Psi) = 0$ and (b) if $L$ is any MODO such that $L(\Psi) = 0$, then there exists a MODO $Q$ such that $L = Q \cdot K$.

Proof. For a natural number $m \ge M$, let $\Phi_m$ denote the $(mN + N) \times (MN + N)$ matrix with block decomposition

$\Phi_m = \begin{pmatrix} \varphi_1 & \varphi_2 & \cdots & \varphi_{MN} & I \\ \varphi_1' & \varphi_2' & \cdots & \varphi_{MN}' & I\partial \\ \vdots & \vdots & & \vdots & \vdots \\ \varphi_1^{(m)} & \varphi_2^{(m)} & \cdots & \varphi_{MN}^{(m)} & I\partial^m \end{pmatrix}.$

It is a nearly trivial observation that for a MODO $L$ of order at most $m$ one has

$L = \sum_{i=0}^m \alpha_i(x)\,\partial^i \quad\Longrightarrow\quad (\alpha_0\ \alpha_1\ \cdots\ \alpha_m)\,\Phi_m = (L(\varphi_1)\ \cdots\ L(\varphi_{MN})\ \ L).$ (6)

That is, for any choice of $L$ the product of the $N \times (mN + N)$ matrix made from its coefficients with $\Phi_m$ has the vector $L(\varphi_i)$ as its $i$th column for $1 \le i \le MN$ and the last $N \times N$ block is a copy of the operator itself. For any choice of $N \times (mN + N)$ matrix, its product with $\Phi_m$ yields a matrix that records a differential operator $L$ and its action on each of the vector functions as in (6).
In the case that $m = M$ and the $N \times N$ blocks $\alpha_i$ are defined by (writing $\Psi^{(M)} = (\varphi_1^{(M)}\ \cdots\ \varphi_{MN}^{(M)})$ for the row of $M$th derivatives and $B$ for the column of blocks $(I\ \ I\partial\ \cdots\ I\partial^{M-1})^\top$)

$(\alpha_0\ \cdots\ \alpha_M) = \left(-\Psi^{(M)}\ \ I\right)\begin{pmatrix} \Phi^{-1} & 0 \\ 0 & I \end{pmatrix} = \left(-\Psi^{(M)}\Phi^{-1}\ \ I\right),$

the product in (6) would equal

$\left(-\Psi^{(M)}\Phi^{-1}\ \ I\right)\Phi_M = \left(-\Psi^{(M)}\Phi^{-1}\ \ I\right)\begin{pmatrix} \Phi & B \\ \Psi^{(M)} & I\partial^M \end{pmatrix} = (0\ \cdots\ 0\ \ K)$

for some MODO $K$. Because the MODOs in the block $B$ are of degree at most $M - 1$, $K$ is monic of degree $M$. We know that $K(\Psi) = 0$ since the first $MN$ columns, which are all zero, record its action on the functions $\varphi_i$. This operator $K$ can equivalently be produced with the formula (5), as the quasi-determinant $|\Phi_M|_{M+1,M+1}$ where $\Phi_M$ is viewed as an $(M + 1) \times (M + 1)$ matrix with entries that are $N \times N$ matrices or as the Schur complement of the invertible block $\Phi$ in the matrix $\Phi_M$. This demonstrates the existence of the operator $K$ promised in (a). (This could also have been achieved in many other ways, including merely by direct computation of $K(\varphi_i)$, but doing it this way sets us up nicely for proving the rest of the claim.)
Let $G$ be the $(mN + N) \times (mN + N)$ matrix whose decomposition into $N \times (mN + N)$ blocks

$G = \begin{pmatrix} G_0 \\ G_1 \\ \vdots \\ G_m \end{pmatrix} \quad\text{is such that}\quad \begin{pmatrix} G_0 \\ G_1 \\ \vdots \\ G_{M-1} \end{pmatrix} = \left(\Phi^{-1}\ \ 0\right)$

and for $i \ge M$ the block row $G_i = (g_{i0}\ g_{i1}\ \cdots\ g_{im})$ where the functions $g_{ij}$ are defined by the equation $\partial^{i-M} \cdot K = \sum_{j=0}^m g_{ij}(x)\,\partial^j$. Consider the product $G\Phi_m$. Its first $MN$ rows would have the form $(I_{MN}\ \ B)$ since $\Phi^{-1}\Phi = I_{MN}$. For $i \ge M$, the product of block row $G_i$ with $\Phi_m$ is designed so that the MODO $\partial^{i-M} \cdot K$ shows up as its last $N \times N$ block. According to (6), the previous columns would be the image of $\Psi$ under the action of that operator, but $\partial^{i-M} \cdot K(\Psi) = 0$, so

$G\Phi_m = \begin{pmatrix} I_{MN} & B \\ 0 & K \\ 0 & \partial K \\ \vdots & \vdots \\ 0 & \partial^{m-M} K \end{pmatrix}.$ (7)

Now, suppose $L$ is any MODO of order at most $m$ satisfying $L(\Psi) = 0$. Label its coefficients as in (6), and consider the product of the $N \times (mN + N)$ matrix⁴ $q = (\alpha_0\ \cdots\ \alpha_m)G^{-1}$ and the matrix $G\Phi_m$. On the one hand, because the $G^{-1}$ and $G$ cancel, $q \cdot G\Phi_m = (\alpha_0\ \cdots\ \alpha_m)\Phi_m$ is equal to the expression on the right in (6). Specifically, given the assumption $L(\Psi) = 0$, it has the form $(0\ \cdots\ 0\ \ L)$.
On the other hand, defining the $N \times N$ functions $q_i$ by the block decomposition $q = (q_0\ \cdots\ q_m)$ and making use of the block form of $G\Phi_m$ in (7), it is clear that $q \cdot G\Phi_m$ is also equal to $(q_0\ q_1\ \cdots\ q_{M-1}\ \ L)$ because the block $I_{MN}$ picks out and preserves the first $M$ coefficient blocks.
Combining these two observations, we conclude that q_i = 0 for 0 ≤ i ≤ M − 1.
Considering only the last block column in the product qΨ, one obtains an expression
for L as a sum involving the functions q_i as coefficients multiplied by the operators in

⁴The matrix G is invertible because its top-left MN × MN block (Φ̂^{−1}) is invertible, and below that, it is
lower block triangular with identity blocks along the diagonal.
August–September 2016] NOTES 707


the last block column of GΦ̂_m, but given the fact that the first M of those coefficients
are zero, this simplifies to

\[
L=\sum_{i=M}^{m}q_i\,\partial^{\,i-M}\circ K=\Bigl(\sum_{i=0}^{m-M}q_{i+M}\,\partial^{\,i}\Bigr)\circ K.
\]

Then the operator in parentheses above is the operator Q satisfying the claim in (b).
Finally, suppose K′ was also a monic MODO of order M such that K′(Φ) = 0. Then
D = K − K′ is an operator of order strictly less than M with this same property. The
only way that D can have order less than M and also satisfy D = Q ∘ K for some
MODO Q is if Q = D = 0, which demonstrates the uniqueness of K and completes
the proof.

3. EXAMPLE. Consider the case M = N = 2 and the vector functions

\[
\phi_1=\begin{pmatrix}x^3\\ x^3\end{pmatrix},\quad
\phi_2=\begin{pmatrix}x^2\\ 0\end{pmatrix},\quad
\phi_3=\begin{pmatrix}0\\ x\end{pmatrix},\quad\text{and}\quad
\phi_4=\begin{pmatrix}1\\ 0\end{pmatrix}.
\]

Then the block Wronskian matrix in (4) is the matrix

\[
\hat\Phi=\begin{pmatrix}
x^3 & x^2 & 0 & 1\\
x^3 & 0 & x & 0\\
3x^2 & 2x & 0 & 0\\
3x^2 & 0 & 1 & 0
\end{pmatrix},
\]

whose invertibility assures us that there is a unique monic MODO of order 2 having
all four of these vectors in its kernel. Using (5), we can easily determine that this
operator is

\[
K=I\partial^2+\begin{pmatrix}-\tfrac{1}{x} & -\tfrac{3}{2x}\\[2pt] 0 & -\tfrac{3}{x}\end{pmatrix}\partial
+\begin{pmatrix}0 & \tfrac{3}{2x^2}\\[2pt] 0 & \tfrac{3}{x^2}\end{pmatrix}.
\]

Another MODO that obviously also has each φ_i in its kernel is

\[
L=\begin{pmatrix}1 & -1\\ 1 & -1\end{pmatrix}\partial^3,
\]

especially since the leading coefficient of L is nonzero and singular, a situation that
cannot arise in the case N = 1. Without the theorem above it would not be clear that
there is an algebraic relationship between L and K. However, Theorem 2 assures us
that any MODO having these vectors in its kernel must have K as a right factor, and so
there must be a differential operator Q such that L = Q ∘ K. (In fact, one can check
that

\[
Q=\begin{pmatrix}1 & -1\\ 1 & -1\end{pmatrix}\partial
+\begin{pmatrix}\tfrac{1}{x} & -\tfrac{3}{2x}\\[2pt] \tfrac{1}{x} & -\tfrac{3}{2x}\end{pmatrix}
\]

realizes this factorization.)
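The factorization in this example lends itself to a direct numerical check. The sketch below is an illustrative addition, not part of the original note: it hard-codes the coefficient matrices of K, L, and Q as reconstructed above (together with their derivatives) and verifies at a sample point that K annihilates φ₁ and that L f = Q(K f) for a generic test function f. The helper names are ad hoc.

```python
from math import sin, cos, exp

def mul(M, v):  # 2x2 matrix times 2-vector
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def add(*vs):   # sum of 2-vectors
    return [sum(v[0] for v in vs), sum(v[1] for v in vs)]

# coefficients of K = I d^2 + A d + C, Q = J d + A0, L = J d^3, and derivatives
def A(x):  return [[-1/x, -3/(2*x)], [0.0, -3/x]]
def dA(x): return [[1/x**2, 3/(2*x**2)], [0.0, 3/x**2]]
def C(x):  return [[0.0, 3/(2*x**2)], [0.0, 3/x**2]]
def dC(x): return [[0.0, -3/x**3], [0.0, -6/x**3]]
def A0(x): return [[1/x, -3/(2*x)], [1/x, -3/(2*x)]]
J = [[1.0, -1.0], [1.0, -1.0]]

def K_of(f, df, ddf, x):
    return add(ddf(x), mul(A(x), df(x)), mul(C(x), f(x)))

x0 = 1.3
# K annihilates phi_1 = (x^3, x^3)^T (and likewise phi_2, phi_3, phi_4)
phi1 = K_of(lambda x: [x**3]*2, lambda x: [3*x**2]*2, lambda x: [6*x]*2, x0)
assert max(map(abs, phi1)) < 1e-12

# L = Q K, checked on the generic test function f = (sin 2x, e^x)^T
f, df = lambda x: [sin(2*x), exp(x)], lambda x: [2*cos(2*x), exp(x)]
ddf, dddf = lambda x: [-4*sin(2*x), exp(x)], lambda x: [-8*cos(2*x), exp(x)]
g = K_of(f, df, ddf, x0)                                 # (K f)(x0)
dg = add(dddf(x0), mul(dA(x0), df(x0)), mul(A(x0), ddf(x0)),
         mul(dC(x0), f(x0)), mul(C(x0), df(x0)))         # (K f)'(x0)
left, right = mul(J, dddf(x0)), add(mul(J, dg), mul(A0(x0), g))
assert max(abs(left[i] - right[i]) for i in range(2)) < 1e-9
```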

ACKNOWLEDGMENT. The author wishes to thank the referee, Maarten Bergvelt, Michael Gekhtman, Tom
Kunkle, and Chunxia Li for advice, assistance, and encouragement.
REFERENCES

1. P. Etingof, I. Gelfand, V. Retakh, Factorization of differential operators, quasideterminants, and nonabelian
Toda field equations, Math. Res. Lett. 4 (1997) 413–425.
2. A. Kasman, Kernel inspired factorizations of partial differential operators, J. Math. Anal. Appl. 234 (1999)
580–591.

Department of Mathematics, College of Charleston, Charleston SC 29424


[email protected]

The Paul R. Halmos–Lester R. Ford Awards for 2015

The Paul R. Halmos–Lester R. Ford Awards, established in 1964, are made annually
to authors of outstanding expository papers in the MONTHLY. The award is named
for Paul R. Halmos and Lester R. Ford, Sr., both distinguished mathematicians and
former editors of the MONTHLY. Winners of the Halmos–Ford Awards for expository
papers appearing in Volume 122 (2015) of the MONTHLY are as follows.

Alex Chin, Gary Gordon, Kellie MacPhee, and Charles Vincent, Pick a tree – any
tree, pp. 424–432.
Kenneth S. Williams, A four integers theorem and a five integers theorem,
pp. 528–536.
Manya Raman-Sundstrom, A pedagogical history of compactness, pp. 619–635.
Zhiqin Lu and Julie Rowlett, The sound of symmetry, pp. 815–835.

Computing (2m) by Using Telescoping Sums
Brian D. Sittinger

Abstract. In this article, we give another proof for the closed form of (2m) inspired by the
elementary telescoping sum proof for (2) given by Daners [3]. This proof, which begins
with recurrence relations derived from certain integrals by using integration by parts, yields
a identity giving the value of (2m) in terms of (2), (4), ..., (2m 2). A quick proof by
induction yields the closed form of (2m).

1. INTRODUCTION. One of the more fascinating formulas from the theory of infinite series has to be the identity

\[
\zeta(2)=\sum_{k=1}^{\infty}\frac{1}{k^2}=\frac{\pi^2}{6}.
\]

This identity, originally established by Euler around 1735, has received much interest
throughout the years. Chapman has compiled many proofs of this result in [2].
More generally, Euler proved that for any m ∈ ℕ,

\[
\zeta(2m)=\sum_{k=1}^{\infty}\frac{1}{k^{2m}}=\frac{(-1)^{m+1}2^{2m-1}B_{2m}\pi^{2m}}{(2m)!},
\]

where B_n denotes the nth Bernoulli number, defined by the generating function

\[
\frac{t}{e^t-1}=\sum_{n=0}^{\infty}B_n\frac{t^n}{n!}.
\]
n!

The above identity for ζ(2m) has been established in many ways as well, a couple of
which are found in [1] and [4]. The purpose of this article is to generalize Daners'
approach for computing ζ(2) found in [3] to compute ζ(2m). In [3], he establishes
that ζ(2) = π²/6 by relying on nothing more sophisticated than recurrence relations
obtained from applying integration by parts to a simple family of integrals. Although
the reasoning is more involved, the main ideas are similar, and we find a recurrence
that will establish Euler's identity for ζ(2m).

2. DERIVATION OF THE RESULT. We start by defining a family of integrals
indexed by nonnegative integers k and n:

\[
I_{k,n}=\int_0^{\pi/2}x^{2k}\cos^{2n}x\,dx.
\]

The following result collects two recurrences for I_{k,n} that we will repeatedly use.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.710
MSC: Primary 40-01

Lemma 1. For all positive integers n,

\[
\text{(1)}\quad I_{0,n}=\frac{2n-1}{2n}\,I_{0,n-1},
\qquad
\text{(2)}\quad I_{k,n}=\frac{1}{(2k+2)(2k+1)}\bigl(2n(2n-1)I_{k+1,n-1}-4n^2 I_{k+1,n}\bigr).
\]

Proof. The first recurrence follows from one application of integration by parts with
u = cos^{2n−1} x and dv = cos x dx and using sin²x = 1 − cos²x.
We establish the second recurrence similarly. Applying integration by parts with
u = cos^{2n} x and dv = x^{2k} dx yields

\[
I_{k,n}=\frac{2n}{2k+1}\int_0^{\pi/2}x^{2k+1}\cos^{2n-1}x\,\sin x\,dx.
\]

Applying integration by parts again and using sin²x = 1 − cos²x yields the desired
result.
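Both recurrences can be spot-checked by numerical quadrature. The sketch below, added for illustration, approximates I_{k,n} with a composite Simpson rule and confirms (1) and (2) for small k and n.

```python
from math import cos, pi

def simpson(f, a, b, n=2000):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i*h) for i in range(1, n))
    return s * h / 3

def I(k, n):
    return simpson(lambda x: x**(2*k) * cos(x)**(2*n), 0.0, pi/2)

# recurrence (1)
for n in (1, 2, 3):
    assert abs(I(0, n) - (2*n - 1)/(2*n) * I(0, n - 1)) < 1e-9
# recurrence (2)
for k in (0, 1):
    for n in (1, 2):
        rhs = (2*n*(2*n - 1)*I(k + 1, n - 1) - 4*n**2 * I(k + 1, n)) / ((2*k + 2)*(2*k + 1))
        assert abs(I(k, n) - rhs) < 1e-9
```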

Next, we use the recurrences from Lemma 1 in a manner similar to Daners [3] to
derive the following recurrence. Hereafter, we set

\[
S(k)=\sum_{1\le i_1\le i_2\le\cdots\le i_k}\frac{1}{i_1^2\cdots i_k^2}.
\]

Proposition 1. For any m ∈ ℕ,

\[
\sum_{k=0}^{m}\frac{(-1)^k\,\pi^{2m-2k}}{(2m-2k+1)!}\,S(k)=0.
\]

Proof. Start with recurrence (2) from Lemma 1, and divide both sides by n²I_{0,n}:

\[
\frac{I_{k,n}}{n^2 I_{0,n}}=\frac{1}{n^2(2k+2)(2k+1)}\Bigl(2n(2n-1)\frac{I_{k+1,n-1}}{I_{0,n}}-4n^2\frac{I_{k+1,n}}{I_{0,n}}\Bigr).
\]

Next, we apply the identity I_{0,n−1} = (2n/(2n − 1)) I_{0,n} to obtain

\[
\frac{I_{k,n}}{n^2 I_{0,n}}=\frac{4}{(2k+2)(2k+1)}\Bigl(\frac{I_{k+1,n-1}}{I_{0,n-1}}-\frac{I_{k+1,n}}{I_{0,n}}\Bigr).
\]

Define S_N(0) = 1 and, for k > 0,

\[
S_N(k)=\sum_{1\le i_1\le\cdots\le i_k\le N}\frac{1}{i_1^2\cdots i_k^2};
\]

note that S_N(k) is a truncated sum of S(k). Repeated telescoping with the previous
identity as k varies yields

\[
\sum_{k=0}^{m}\frac{(-1)^k\,(2m)!}{2^{2k}(2m-2k)!}\,\frac{I_{m-k,0}}{I_{0,0}}\,S_N(k)=\frac{I_{m,N}}{I_{0,N}}. \tag{*}
\]

To explain how the telescoping process works, we establish that (*) is true by induction on m. For the base case m = 1, we start with the recurrence above with k = 0:

\[
\frac{1}{n^2}=2\Bigl(\frac{I_{1,n-1}}{I_{0,n-1}}-\frac{I_{1,n}}{I_{0,n}}\Bigr).
\]

Since these hold for all n ∈ ℕ, summing over n = 1 to k yields

\[
\sum_{n=1}^{k}\frac{1}{n^2}=2\Bigl(\frac{I_{1,0}}{I_{0,0}}-\frac{I_{1,k}}{I_{0,k}}\Bigr),
\]

which can be rewritten in the form of the claim with m = 1.


For the inductive step, we assume that the claim is true for m. Substituting this into
the recurrence (along with some minor relabels of variables) yields

1  (1)m+k (2m)! Ik,0


m
4 I Im+1, j 
m+1, j1
S j (m k) = .
j 2 k=0 22m2k (2k)! I0,0 (2m + 2)(2m + 1) I0, j1 I0, j

N
S j (n)
Summing both sides from j = 1 to N and noting that = S N (n + 1) yields
j=1
j2

m
(1)m+k (2m)! Ik,0 4 I Im+1,N 
m+1,0
S j (m k + 1) = .
k=0
22m2k (2k)! I0,0 (2m + 2)(2m + 1) I0,0 I0,N

This can be readily rewritten in a form that verifies the claim for m + 1, as required.
We now bound the right side of (*). Using the fact that sin x 2x
on [0, 2 ], in
conjunction with I0,n = 2n1 I
2n 0,n1
, we obtain
  2m   2m  2m I
2 2 0,N
Im,N sin x cos2N x d x sin2 x cos2N x d x = .
0 2 0 2 2 N +1

Im,N  2m I
0,N
Since the integrands are nonnegative, it follows that 0 < .
I0,N 2 N +1
Consequently, we can rewrite (*) as

m
(1)k (2m)! Imk,0 2m
0< S N (k) .
k=0
22k (2m 2k)! I0,0 22m (N + 1)

Letting N , the squeeze theorem yields

m
(1)k (2m)! Imk,0
S(k) = 0.
k=0
22k (2m 2k)! I0,0

In,0 2n
We can simplify this further by noting that = 2n by a simple integra-
I0,0 2 (2n + 1)
tion calculation. Upon applying this fact, we obtain the desired recurrence.
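As a numerical sanity check on the proposition (an added sketch, not part of the argument; it uses the closed form for S(k) proved in Lemma 2 below, so it is a consistency check rather than an independent verification):

```python
from math import pi, factorial

# S(0) = 1 and, by Lemma 2, S(k) = 2(1 - 2^(1-2k)) zeta(2k)
zetas = {1: pi**2/6, 2: pi**4/90, 3: pi**6/945, 4: pi**8/9450}
S = {0: 1.0}
for k, z in zetas.items():
    S[k] = 2 * (1 - 2.0**(1 - 2*k)) * z

def prop_sum(m):
    # left side of Proposition 1
    return sum((-1)**k * pi**(2*m - 2*k) * S[k] / factorial(2*m - 2*k + 1)
               for k in range(m + 1))

for m in (1, 2, 3, 4):
    assert abs(prop_sum(m)) < 1e-10
```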

1 
We next compute the value of S(k) = . Although this result is
... i k2 i2
1i 1 i k 1
known in the literature (see [5], for instance), we will give a short derivation of this
result below.

Lemma 2. For any k ∈ ℕ, we have S(k) = 2(1 − 2^{1−2k}) ζ(2k).
Proof. Consider the generating function G(x) = Σ_{k=0}^{∞} S(k)x^{2k}, where we are taking
S(0) = 1. Before rewriting G(x), we quote the following two classic results from
complex analysis without proof (see [6] for further details):

\[
\sin z=z\prod_{j=1}^{\infty}\Bigl(1-\frac{z^2}{(j\pi)^2}\Bigr)
\quad\text{and}\quad
\csc z=\frac{1}{z}-2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}z}{z^2-(n\pi)^2}.
\]

First, directly applying the definition of S(n) to G(x) followed by using the infinite product for sine yields

\[
G(x)=\prod_{j=1}^{\infty}\Bigl(1-\frac{x^2}{j^2}\Bigr)^{-1}=\frac{\pi x}{\sin\pi x}=\pi x\csc(\pi x).
\]

Next, by applying the expansion for cosecant, we obtain

\[
G(x)=1-2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}x^2}{x^2-n^2}.
\]

However, x²/(x² − n²) = −Σ_{k=1}^{∞}(x/n)^{2k} by the geometric series. Applying this to our last
expression for G(x) and interchanging the order of summation yields

\[
G(x)=1+2\sum_{k=1}^{\infty}\Bigl(\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^{2k}}\Bigr)x^{2k}.
\]

Now we rewrite the inner sum in terms of the zeta function:

\[
\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n^{2k}}=\sum_{n=1}^{\infty}\frac{1}{n^{2k}}-2\sum_{n=1}^{\infty}\frac{1}{(2n)^{2k}}=(1-2^{1-2k})\,\zeta(2k).
\]

Therefore, we obtain G(x) = 1 + Σ_{k=1}^{∞} 2(1 − 2^{1−2k})ζ(2k)x^{2k}. On the other hand, since
G(x) = Σ_{k=0}^{∞} S(k)x^{2k}, equating like coefficients of x^{2k} yields the desired result.
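The lemma is easy to probe by brute force: truncate the multiple sum at i_k ≤ N and compare with the closed form. The sketch below (an illustrative addition) computes the truncated S(k) by a simple dynamic program over the largest index.

```python
from math import pi

N = 20000
zetas = {1: pi**2/6, 2: pi**4/90, 3: pi**6/945}
prev = [1.0] * (N + 1)          # S_j(0) = 1
S = {}
for k in (1, 2, 3):
    run, cur = 0.0, [0.0] * (N + 1)
    for j in range(1, N + 1):
        run += prev[j] / j**2   # S_j(k) = sum_{i <= j} S_i(k-1)/i^2
        cur[j] = run
    prev, S[k] = cur, run

for k in (1, 2, 3):
    assert abs(S[k] - 2 * (1 - 2.0**(1 - 2*k)) * zetas[k]) < 1e-3
```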

Before proving the main result, we need the following identity among the Bernoulli
numbers, which we will also prove by using generating functions.

Lemma 3. For any m ∈ ℕ, we have

\[
\sum_{k=0}^{m}\binom{2m+1}{2k}\bigl(1-2^{2k-1}\bigr)B_{2k}=0.
\]

Proof. Since

\[
\frac{t}{e^t-1}=\frac{1+e^t}{2}\cdot\frac{2t}{e^{2t}-1},
\]

the generating function for the Bernoulli numbers and the Maclaurin series for e^t yield

\[
\sum_{k=0}^{\infty}B_k\frac{t^k}{k!}=\frac{1}{2}\Bigl(1+\sum_{n=0}^{\infty}\frac{t^n}{n!}\Bigr)\sum_{k=0}^{\infty}B_k\frac{(2t)^k}{k!}.
\]

Subtracting (1/2)Σ_k B_k(2t)^k/k! from both sides, multiplying by e^t, and using the fact
that e^{2t} · 2t/(e^{2t} − 1) = 2t + 2t/(e^{2t} − 1), we can rearrange this as

\[
\sum_{r=0}^{\infty}\Bigl(\sum_{j=0}^{r}\binom{r}{j}\bigl(1-2^{\,j-1}\bigr)B_j\Bigr)\frac{t^r}{r!}
=t+\sum_{r=0}^{\infty}2^{\,r-1}B_r\frac{t^r}{r!}.
\]

Equate the coefficients for t^{2m+1} from both sides, noting that B_{2m+1} = 0:

\[
\sum_{j=0}^{2m+1}\binom{2m+1}{j}\bigl(1-2^{\,j-1}\bigr)B_j=0.
\]

Since the odd indices contribute nothing to the sum on the left side of the equality, we
can rewrite this (upon relabeling the even indices) as

\[
\sum_{k=0}^{m}\binom{2m+1}{2k}\bigl(1-2^{2k-1}\bigr)B_{2k}=0.
\]
2k

Finally, we establish the closed form of ζ(2m) for any positive integer m by putting
together the proposition with the last two lemmata.

Theorem 1. For any m ∈ ℕ,

\[
\zeta(2m)=\frac{(-1)^{m+1}2^{2m-1}B_{2m}\pi^{2m}}{(2m)!}.
\]

Proof. First of all, note that applying Lemma 2 to the proposition yields

\[
\zeta(2m)=\frac{(-1)^{m+1}}{1-2^{1-2m}}\Bigl(\frac{\pi^{2m}}{2(2m+1)!}
+\sum_{k=1}^{m-1}(-1)^k\bigl(1-2^{1-2k}\bigr)\zeta(2k)\frac{\pi^{2m-2k}}{(2m-2k+1)!}\Bigr).
\]

With this identity, we prove the theorem by strong induction on m. By letting m = 1,
we obtain

\[
\zeta(2)=\frac{\pi^2}{6}=\frac{(-1)^2\,2^1 B_2\,\pi^2}{2!},
\]

establishing the base case. Next, we assume that the claim is true for all integers
k = 1, 2, …, m − 1. Applying the inductive hypothesis to the aforementioned identity, we
obtain

\[
\zeta(2m)=\frac{(-1)^{m+1}\pi^{2m}2^{2m-1}}{(2m)!}\cdot\frac{1}{(2m+1)(2^{2m-1}-1)}
\sum_{k=0}^{m-1}\binom{2m+1}{2k}\bigl(1-2^{2k-1}\bigr)B_{2k}.
\]

Since Lemma 3 implies that

\[
B_{2m}=\frac{1}{(2m+1)(2^{2m-1}-1)}\sum_{k=0}^{m-1}\binom{2m+1}{2k}\bigl(1-2^{2k-1}\bigr)B_{2k},
\]

we deduce that the claim is also true for m, thereby completing the proof.

REFERENCES

1. F. Beukers, E. Calabi, J. Kolk, Sums of generalized harmonic series and volumes, Nieuw Arch. Wiskd. 11
no. 4 (1993) 217–224.
2. R. Chapman, Evaluating ζ(2), 2003, available at http://empslocal.ex.ac.uk/people/staff/
rjchapma/etc/zeta2.pdf.
3. D. Daners, A short elementary proof of Σ 1/k² = π²/6, Math. Mag. 85 (2012) 361–364.
4. T. Osler, Finding ζ(2p) from a product of sines, Amer. Math. Monthly 111 (2004) 52–54.
5. Y. Ohno, W. Zudilin, Zeta stars, Commun. Number Theory Phys. 2 (2008) 325–347.
6. R. Silverman, Introductory Complex Analysis. Dover, New York, 1972.

Department of Mathematics, California State University Channel Islands, Camarillo, CA 93012
[email protected]

Generating Iterated Function Systems for the
Vicsek Snowflake and the Koch Curve
Yuanyuan Yao and Wenxia Li

Abstract. We determine all generating iterated function systems for certain self-similar sets
such as the Vicsek snowflake and the Koch curve.

1. INTRODUCTION. Our work is motivated by a basic problem in fractal geometry: How does one find all generating iterated function systems (IFSs) for a self-similar
set? Applications of IFSs can be seen in rep-tiles [2] and image compression [1, 3, 6].
We call a nonempty compact set F ⊆ ℝ^d a self-similar set if it is a finite union
of its self-similar copies; that is, there exists a family of contractive similitudes
{φ_i(x) = ρ_i U_i x + b_i}_{i=1}^{N} (N is an integer no smaller than 2) such that
F = ∪_{i=1}^{N} φ_i(F), where ρ_i ∈ (0, 1), U_i is an orthonormal d × d matrix, and b_i is
a translation vector. The family {φ_i}_{i=1}^{N} is called a generating IFS for F. It is well known
that a generating IFS determines F uniquely, but the converse is not true.
Investigating all generating IFSs for a self-similar set was first done by Feng and
Wang in ℝ [5]. However, in the higher-dimensional case, the situation is somewhat
different since the form of an orthonormal matrix is much more complicated. The
discussion is limited either to homogeneous IFSs (all ρ_i U_i are the same) with the
strong separation condition [4] or to special kinds of planar self-similar sets [8].
In this note, we first give an easy-to-check theorem. We then use it to deal with all
generating IFSs for some self-similar sets that cannot be covered by the above works.
We denote by I_E the collection of all isometries of a set E ⊆ ℝ^d. Readers can refer
to [7] for more information about I_E. By f_i, we mean f_{i_1} ∘ ⋯ ∘ f_{i_ℓ} if i = i_1 … i_ℓ
is a finite sequence in ∪_{k=1}^{∞} {1, …, N}^k, which is the set of all finite words over
{1, …, N}. Then we have the following.

Theorem. Let E ⊆ ℝ^d be the self-similar set generated by an IFS {φ_i(x)}_{i=1}^{N}. Assume
that for each contractive similitude ψ(x) with ψ(E) ⊆ E, we have ψ(E) ⊆ φ_i(E) for
some i ∈ {1, …, N}. Then every contractive similitude ψ satisfying ψ(E) ⊆ E can
be written as φ_i ∘ S for some i ∈ ∪_{k=1}^{∞} {1, …, N}^k and S ∈ I_E.

Remark. As an application, we investigate all generating IFSs for the triadic Cantor
set C generated by the IFS {φ₁(x) = x/3, φ₂(x) = (x + 2)/3}. Note that each contractive similitude ψ with ψ(C) ⊆ C satisfies ψ(C) ⊆ φ_i(C) for some i ∈ {1, 2}, and
I_C = {x, 1 − x}. Then by the Theorem, every map ψ_k(x) in a generating IFS {ψ_k} for
C must be of the form φ_i(x) or φ_i(1 − x).
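The remark can be illustrated computationally. A rational x ∈ [0, 1] lies in C exactly when repeatedly applying the inverse branches x ↦ 3x and x ↦ 3x − 2 never lands in the open middle third; for rationals this iteration eventually cycles, so membership is decidable. The sketch below (our own illustration; the helper names are ad hoc) checks that maps of the form φ_i(x) and φ_i(1 − x) send sample points of C back into C.

```python
from fractions import Fraction

def in_cantor(x):
    seen = set()
    while x not in seen:
        seen.add(x)
        if x < 0 or x > 1:
            return False
        if 3*x <= 1:
            x = 3*x
        elif 3*x >= 2:
            x = 3*x - 2
        else:
            return False        # forced middle-third digit
    return True

phi1 = lambda x: x / 3
phi2 = lambda x: (x + 2) / 3
refl = lambda x: 1 - x          # the nontrivial isometry of C

pts = [Fraction(0), Fraction(1), Fraction(1, 3), Fraction(1, 4), Fraction(3, 4)]
assert all(in_cantor(p) for p in pts)
for p in pts:
    for phi in (phi1, phi2):
        assert in_cantor(phi(p)) and in_cantor(phi(refl(p)))
```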

2. PROOF OF THEOREM AND SOME EXAMPLES.

Proof of Theorem. Suppose that ψ(E) ⊆ φ_{i_1}(E) for some i₁ ∈ {1, …, N}. Then
φ_{i_1}^{−1} ∘ ψ(E) ⊆ E, so either φ_{i_1}^{−1} ∘ ψ(E) = E or, by repeating the above process,
φ_{i_2}^{−1} ∘ φ_{i_1}^{−1} ∘ ψ(E) ⊆ E for some i₂ ∈ {1, …, N}. Since each application of some
φ_{i_j}^{−1} multiplies the similarity ratio by a factor of at least (max_i ρ_i)^{−1} > 1, and a
similitude of ratio greater than 1 cannot map the bounded set E into itself, this process
terminates: we obtain by induction that φ_i^{−1} ∘ ψ(E) = E for some
i ∈ ∪_{k=1}^{∞} {1, …, N}^k, and hence, φ_i^{−1} ∘ ψ ∈ I_E.

http://dx.doi.org/10.4169/amer.math.monthly.123.7.716
MSC: Primary 28A80, Secondary 28A78

Let E be the self-similar set generated by an IFS {φ_i}_{i=1}^{N}. Assume that F is a compact set satisfying φ_i(F) ⊆ F for all 1 ≤ i ≤ N. Then the following fact about self-similar sets will be used in all our examples:

\[
E=\bigcap_{k=1}^{\infty}\ \bigcup_{i\in\{1,\dots,N\}^k}\varphi_i(F)\subseteq F. \tag{1}
\]

The first example is the Vicsek snowflake, a just-touching self-similar set.

Example 1. The Vicsek snowflake V (Figure 1) is generated by the IFS {f_i}_{i=1}^{5}, where

\[
f_1\binom{x}{y}=\frac13\binom{x}{y},\quad
f_2\binom{x}{y}=\frac13\binom{x}{y}+\binom{2/3}{0},\quad
f_3\binom{x}{y}=\frac13\binom{x}{y}+\binom{2/3}{2/3},
\]
\[
f_4\binom{x}{y}=\frac13\binom{x}{y}+\binom{0}{2/3},\quad
f_5\binom{x}{y}=\frac13\binom{x}{y}+\binom{1/3}{1/3}.
\]

Suppose that ψ is a contractive similitude in ℝ² with ψ(V) ⊆ V. Then ψ = f_i ∘ S
for some i ∈ ∪_{k=1}^{∞} {1, …, 5}^k and S ∈ I_V.

Figure 1. The first three iterations of the Vicsek snowflake

Proof. By the Theorem, it suffices to prove that ψ(V) ⊆ f_i(V) for some i ∈ {1, …, 5}.
By (1), V is a subset of [0, 1] × [0, 1] and contains its two diagonals AC and BD
(Figure 2). We claim that each line segment I in V is parallel to AC or BD.

Figure 2. The diagonals of [0, 1] × [0, 1] are in the Vicsek snowflake.

Assume that I ⊆ f_i(V) for some i ∈ ∪_{k=0}^{∞} {1, …, 5}^k and I ⊄ f_{ii′}(V) for all
i′ ∈ {1, …, 5}, with the word in {1, …, 5}⁰ giving the identity. Then f_i^{−1}(I) is a line segment in V and it
intersects at least two different f_{i′}([0, 1] × [0, 1])'s.

Let M and N be the two endpoints of f_i^{−1}(I). We claim that MN lies on either AC
or BD. We can reduce its proof to the following two cases.
Case 1. M ∈ f₁(V) and N ∈ f₂(V). Then EF ⊆ MN, which contradicts the fact
that EF ∩ V is the Cantor set and MN is a line segment in V.
Case 2. M ∈ f₁(V) and N ∈ f₅(V). In this case, E ∈ MN. By f₅^{−1}(E) = A, we
can assume EN ⊆ f_{51^m}(V) and EN ⊄ f_{51^{m+1}}(V) for some nonnegative integer m,
where 1^m is the word obtained by repeating 1 a total of m times. Then f_{51^m}^{−1}(EN) is a subset
of V and it intersects both f₁(V) and some f_i(V) with i ≠ 1, so it lies on AC, giving
EN ⊆ f_{51^m}(AC) ⊆ AC. Therefore, MN ⊆ AC, proving our claim.
The two diagonals of ψ([0, 1] × [0, 1]) are in V; hence, they are parallel to AC
and BD, respectively. Thus, all sides of ψ([0, 1] × [0, 1]) are parallel to either AB
or AD. Suppose that ψ(V) ⊄ f_i(V) for all i ∈ {1, …, 5}. Then the four vertices of
ψ([0, 1] × [0, 1]) are in four different regions of the form f_i([0, 1] × [0, 1]) with i ≠ 5.
Therefore, ψ(AB ∩ V) is a scaled copy of the Cantor set and contains a gap of length
at least 1/3, which is impossible as ψ is a contractive similitude.

The second example is a self-similar set for which substantial overlaps occur.

Example 2. The self-similar set W is generated by the IFS {g₁, g₂, g₃}, where

\[
g_1\binom{x}{y}=\rho\binom{x}{y},\qquad
g_2\binom{x}{y}=\rho\binom{x}{y}+\binom{\rho^2}{0},\qquad
g_3\binom{x}{y}=\rho^2\binom{x}{y}+\binom{0}{\rho}
\]

with ρ = (√5 − 1)/2. Suppose that ψ is a contractive similitude in ℝ² with ψ(W) ⊆ W.
Then ψ = g_i for some i ∈ ∪_{k=1}^{∞} {1, 2, 3}^k.

Figure 3a and Figure 3b. The triangle OAB with O = (0, 0), A = (1, 0), and B = (0, 1), together with the points E, F, G, H, I used in the proof.
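The complete overlap g₁₂₂ = g₂₁₁ invoked in the proof can be checked directly; the key point is that ρ = (√5 − 1)/2 satisfies ρ² + ρ = 1. The following numerical sketch is an added illustration, assuming the maps as displayed above.

```python
rho = (5**0.5 - 1) / 2          # so rho^2 + rho = 1

def g1(p): return (rho*p[0], rho*p[1])
def g2(p): return (rho*p[0] + rho**2, rho*p[1])
def g3(p): return (rho**2 * p[0], rho**2 * p[1] + rho)

def word(maps, p):
    # word([g1, g2, g2], p) computes (g1 o g2 o g2)(p)
    for g in reversed(maps):
        p = g(p)
    return p

assert abs(rho**2 + rho - 1) < 1e-12
for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.3, 0.4)]:
    a = word([g1, g2, g2], p)
    b = word([g2, g1, g1], p)
    assert abs(a[0] - b[0]) < 1e-12 and abs(a[1] - b[1]) < 1e-12
```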

Proof. We point out that g₁₂₂ = g₂₁₁, which is a complete overlap. Note that
I_W = {identity}. By the Theorem, we only need to show that ψ(W) ⊆ g_i(W) for
some i ∈ {1, 2, 3}. Figure 3 may help in following the proof.
Using (1), we can check that W is a subset of △OAB and contains all its three sides
but no points of (△EFG)°, where (△EFG)° is the interior of △EFG (Figure 3a).
We first prove that ψ(△OAB) ⊆ g_i(△OAB) for some i ∈ {1, 2, 3}. Suppose otherwise. Let P₁P₂ ⊆ W be one side of ψ(△OAB); there are two cases to consider.
Case 1. P₁ ∈ △BEF \ {E, F}, P₂ ∈ (△EOI \ {E}) ∪ (△FHA \ {F}).
Then P₁P₂ lies on either BO or BA, say BO, since it cannot intersect (△EFG)°.
Reasoning similarly, the third vertex of ψ(△OAB) lies on BA. So P₁ equals B and
△EFG ⊆ ψ(△OAB), which contradicts the fact that ψ is a contractive similitude.
Case 2. P₁ ∈ △EOI \ △GHI, P₂ ∈ △FHA \ △GHI.
Recall that (△EFG)° contains no points of W. So does g^k((△EFG)°) by similarity,
where g = g₁₂ or g₂₁, and g^k is the kth iteration of g.
Notice that the g^k(△EFG) connect to each other one by one. They are located along the
line FH (or EI) and approach the point H (or I) as k → ∞. Thus, either P₁P₂ or
another side of ψ(△OAB) passes through Y_n for some n (Figure 3b gives the case Y₂),
where

\[
Y_n=\Bigl(\bigcup_{k=0}^{n}g_{12}^{\,k}\bigl((\triangle EFG)^{\circ}\bigr)\Bigr)\cup\Bigl(\bigcup_{k=0}^{n}g_{21}^{\,k}\bigl((\triangle EFG)^{\circ}\bigr)\Bigr).
\]

This is impossible since all three sides of ψ(△OAB) are in W.
Having shown that ψ(△OAB) ⊆ g_i(△OAB) for some i ∈ {1, 2, 3}, we will prove
that ψ(W) ⊆ g_i(W) by verifying g₁(W) ∩ △GHI = g₂(W) ∩ △GHI.
It follows from g₁₂₂ = g₂₁₁ that g_{21^n} ∘ g₁ = g₁ ∘ g₂₂ ∘ g_{1^{n−1}}. Moreover, for each n ∈ ℕ, we
can check that g_{21^n} ∘ g₃(O) lies in EI, g_{21^n} ∘ g₂₂(O) equals I, and g_{21^n} ∘ g₂₃(O) lies over
EI. So

\begin{align*}
\triangle GHI\cap g_{21^n}(W)
&=\triangle GHI\cap\bigl(g_1 g_{2^2 1^{n-1}}(W)\cup g_{21^n}g_2(W)\cup g_{21^n}g_3(W)\bigr)\\
&\subseteq g_1(W)\cup\bigl(\triangle GHI\cap g_{21^n}g_2(W)\bigr)\\
&=g_1(W)\cup\Bigl(\triangle GHI\cap\bigl(g_{21^{n+1}}(W)\cup g_{21^n}g_{22}(W)\cup g_{21^n}g_{23}(W)\bigr)\Bigr)\\
&\subseteq g_1(W)\cup\bigl(\triangle GHI\cap g_{21^{n+1}}(W)\bigr).
\end{align*}

Then

\[
\triangle GHI\cap g_2(W)=\triangle GHI\cap g_{21}(W)\subseteq g_1(W)\cup\bigl(\triangle GHI\cap g_{21^n}(W)\bigr).
\]

Therefore, △GHI ∩ g₂(W) ⊆ g₁(W), as ∩_{n=1}^{∞} g_{21^n}(W) = {I} ⊆ g₁(W). Using a
similar argument yields △GHI ∩ g₁(W) ⊆ g₂(W), which completes the proof.

All generating IFSs for the above examples can be iterated from their defining IFSs.
However, the famous Koch curve, a fractal invented by the Swedish mathematician Helge von Koch in 1904, is one of the exceptions.
Start with E₀ = [0, 1]. Let E₁ be the set consisting of the four line segments obtained
by replacing the middle third of E₀ by the other two sides of the equilateral triangle
based on the removed segment. Inductively, we construct E_k by applying the same
procedure to each line segment in E_{k−1} (Figure 4). Finally, we arrive at the Koch
curve.

Figure 4. E_k with k = 0, 1, 2

Example 3. The Koch curve K is generated by the IFS {h₁, h₂}, where

\[
h_1\binom{x}{y}=\begin{pmatrix}\tfrac{x}{2}+\tfrac{\sqrt3}{6}\,y\\[2pt]\tfrac{\sqrt3}{6}\,x-\tfrac{y}{2}\end{pmatrix}
\quad\text{and}\quad
h_2\binom{x}{y}=\begin{pmatrix}-\tfrac{x}{2}-\tfrac{\sqrt3}{6}\,y+1\\[2pt]\tfrac{\sqrt3}{6}\,x-\tfrac{y}{2}\end{pmatrix}.
\]

Suppose that ψ is a contractive similitude in ℝ² with ψ(K) ⊆ K. Then ψ = h_i or
h_i ∘ T for some i ∈ ∪_{k=1}^{∞} {1, 2, 3}^k, where

\[
h_3\binom{x}{y}=\begin{pmatrix}\tfrac{x}{3}+\tfrac13\\[2pt]\tfrac{y}{3}+\tfrac{\sqrt3}{9}\end{pmatrix}
\quad\text{and}\quad
T\binom{x}{y}=\binom{1-x}{y}.
\]
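These formulas can be checked against the vertex labels used in Figure 5 below: h₁ should send the triangle ABC to DBA, h₂ should send it to ECA, and the square of h₁, a reflection composed with itself, should reduce to the scaling x ↦ x/3. The following sketch is an added illustration.

```python
r3 = 3**0.5

def h1(p):
    x, y = p
    return (x/2 + r3/6*y, r3/6*x - y/2)

def h2(p):
    x, y = p
    return (-x/2 - r3/6*y + 1, r3/6*x - y/2)

A, B, C = (0.5, r3/6), (0.0, 0.0), (1.0, 0.0)
D, E = (1/3, 0.0), (2/3, 0.0)

def close(p, q):
    return abs(p[0] - q[0]) < 1e-12 and abs(p[1] - q[1]) < 1e-12

assert close(h1(A), D) and close(h1(B), B) and close(h1(C), A)
assert close(h2(A), E) and close(h2(B), C) and close(h2(C), A)
p = (0.37, 0.11)
assert close(h1(h1(p)), (p[0]/3, p[1]/3))   # h1 squared is scaling by 1/3
```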

Proof. Note that h₁ and h₂ map △ABC to △DBA and △ECA, respectively (Figure 5). Owing to (1), we have K ⊆ △ABC. By the Theorem, it remains to prove that
ψ(K) ⊆ h_i(K) for some i ∈ {1, 2, 3}, as I_K = {identity, T}.

Figure 5. Possible positions of vertices of ψ(△ABC). Here A = (1/2, √3/6), B = (0, 0), D = (1/3, 0), E = (2/3, 0), and C = (1, 0), with auxiliary points M, Q, J, N, H, I, P.

Suppose that ψ(K) ⊄ h_i(K) for i ∈ {1, 2}. Denote the three vertices of ψ(△ABC)
by I, J, and L. Without loss of generality, we can assume I ∈ △ABD \ {A} and J ∈
△AEC \ {A}. Let H be the intersection point of IJ and AE. Let MN be the line segment
passing through H and parallel to BC, with M ∈ AB and N ∈ AC.
Note that K ∩ AB (also K ∩ AC and K ∩ BC) is a similar copy of the triadic Cantor
set C. So is ψ(K) ∩ IJ. Therefore, |PH| ≤ min{|HJ|, |PI|}, which implies I = M
and J = N. Otherwise, as we can see in Figure 5, |HJ| < |HN| = |QH| < |PH|, a
contradiction!
Finally, we claim that L = A. Otherwise, either LM or LN is parallel to BC by
applying the same argument as above, which is impossible.
Now we get that ψ(K ∩ AB) is a subset of K ∩ AM or K ∩ AN. We only consider
the former case; a similar proof works for the latter case.
Notice that K ∩ AB = C/√3. Letting |AM|/|AB| = α yields α(C/√3) ⊆ C/√3
or, equivalently, αC ⊆ C. By the Remark in the introduction, we have α = 3^{−n} for
some positive integer n. Thus, ψ(K) ⊆ h₃(K), which finishes our proof.

ACKNOWLEDGMENT. We thank the anonymous referee for careful reading of the manuscript and making
many useful suggestions. The first author is supported by NSFC #11101148 and the Fundamental Research
Funds for the Central Universities, ECUST #222201514321. The second author is supported by the NSFC
#11271137 and STCSM #13dz2260400.

REFERENCES

1. M. F. Barnsley, L. P. Hurd, Fractal Image Compression, A K Peters, Wellesley, MA, 1993.
2. H. T. Croft, K. J. Falconer, R. K. Guy, Unsolved Problems in Geometry, Springer-Verlag, New York, 1991.
3. A. Deliu, J. Geronimo, R. Shonkwiler, On the inverse fractal problem for two-dimensional attractors, Proc.
R. Soc. London, Ser. A 355 (1997) 1017–1062.
4. Q. R. Deng, K. S. Lau, On the equivalence of homogeneous iterated function systems, Nonlinearity 26
(2013) 2767–2775.
5. D. J. Feng, Y. Wang, On the structure of generating iterated function systems of Cantor sets, Adv. Math.
222 (2009) 1964–1981.
6. N. Lu, Fractal Imaging, Academic Press, San Diego, CA, 1997.
7. M. Moran, The group of isometries of a self-similar set, J. Math. Anal. Appl. 392 (2012) 89–98.
8. Y. Y. Yao, Generating iterated function systems of some planar self-similar sets, J. Math. Anal. Appl. 421
(2015) 938–949.

Department of Mathematics, East China University of Science and Technology, Shanghai 200237, P.R. China
[email protected]

Department of Mathematics, Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai
200241, P.R. China
[email protected]

PROBLEMS AND SOLUTIONS
Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West
with the collaboration of Itshak Borosh, Paul Bracken, Ezra A. Brown, Randall
Dougherty, Tamas Erdelyi, Zachary Franco, Christian Friesen, Ira M. Gessel, Laszlo
Liptak, Frederick W. Luttmann, Vania Mascioni, Frank B. Miles, Steven J. Miller,
Mohamed Omar, Richard Pfiefer, Dave Renfro, Cecil C. Rousseau, Leonard Smiley,
Kenneth Stolarsky, Richard Stong, Walter Stromquist, Daniel Ullman, Charles Vanden
Eynden, and Fuzhen Zhang.

Proposed problems should be submitted online via http://www.americanmathematicalmonthly.submittable.com/submit. Proposed solutions to the problems below should be submitted on or before January 31, 2017 at the same
link. More detailed instructions are available online. Solutions to problems numbered 11921 or below should continue to be submitted via e-mail to
[email protected]. Proposed problems must not be under consideration concurrently at any other journal nor be posted to the internet before the deadline date
for solutions. An asterisk (*) after the number of a problem or a part of a problem
indicates that no solution is currently available.

PROBLEMS
11922. Proposed by Max Alekseyev, George Washington University, Washington, DC.
Find every positive integer n such that both n and n² are palindromes when written in
the binary numeral system (and with no leading zeros).
11923. Proposed by Omran Kouba, Higher Institute for Applied Sciences and Technology, Damascus, Syria. Let f_p be the function on (0, π/2) given by

\[
f_p(x)=(1+\sin x)^p-(1-\sin x)^p-2\sin(px).
\]

Prove f_p > 0 for 0 < p < 1/2 and f_p < 0 for 1/2 < p < 1.
11924. Proposed by Cornel Ioan Valean, Timis, Romania. Calculate

\[
\int_0^{\pi/2}\frac{\{\tan x\}}{\tan x}\,dx,
\]

where {u} denotes u − ⌊u⌋.
11925. Proposed by Leonard Giugiuc, Drobeta Turnu Severin, Romania. Let n be an
integer with n ≥ 4. Find the largest k such that for any list a of n real numbers that
sum to 0,

\[
\Bigl(\sum_{j=1}^{n}a_j^2\Bigr)^{3}\ge k\Bigl(\sum_{j=1}^{n}a_j^3\Bigr)^{2}.
\]

http://dx.doi.org/10.4169/amer.math.monthly.123.7.722

11926. Proposed by Ovidiu Furdui, Technical University of Cluj-Napoca, Cluj-Napoca, Romania. Let k be an integer, k ≥ 2. Find

\[
\int_0^{\infty}\frac{\log|1-x|}{x^{1+1/k}}\,dx.
\]
11927. Proposed by Finbarr Holland, University College Cork, Cork, Ireland. Let
O, G, I, and K be, respectively, the circumcenter, centroid, incenter, and symmedian point (also called Lemoine point or Grebe point) of triangle ABC. Prove |OG| ≤
|OI| ≤ |OK|, with equality if and only if ABC is equilateral.
11928. Proposed by Hideyuki Ohtsuka, Saitama, Japan. For positive integers n and m
and for a sequence a_i, prove

\[
\sum_{i=0}^{n}\sum_{j=0}^{m}\binom{n}{i}\binom{m}{j}a_{i+j}=\sum_{k=0}^{n+m}\binom{n+m}{k}a_k
\]

and

\[
\sum_{0\le i<j\le n}\binom{n}{i}\binom{n}{j}\binom{i+j}{n}=\sum_{0\le i<j\le n}\binom{n}{i}\binom{n}{j}^{2}.
\]
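Both identities are easy to test by direct computation. The quick sketch below is an added illustration: it verifies finitely many cases, not the general claims.

```python
from math import comb
import random

random.seed(1)
# first identity: a Vandermonde-type convolution, tested with random a_i
n, m = 5, 7
a = [random.random() for _ in range(n + m + 1)]
lhs = sum(comb(n, i)*comb(m, j)*a[i + j] for i in range(n + 1) for j in range(m + 1))
rhs = sum(comb(n + m, k)*a[k] for k in range(n + m + 1))
assert abs(lhs - rhs) < 1e-9

# second identity, checked exactly for small n
for n in range(1, 9):
    lhs2 = sum(comb(n, i)*comb(n, j)*comb(i + j, n)
               for i in range(n + 1) for j in range(i + 1, n + 1))
    rhs2 = sum(comb(n, i)*comb(n, j)**2
               for i in range(n + 1) for j in range(i + 1, n + 1))
    assert lhs2 == rhs2
```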

SOLUTIONS

Special Multiples of an Integer


11789 [2014, 648]. Proposed by Gregory Galperin, Eastern Illinois University,
Charleston, IL, and Yury J. Ionin, Central Michigan University, Mount Pleasant,
MI.
Let a and k be positive integers. Prove that for every positive integer d there exists
a positive integer n such that d divides ka n + n.
Solution by Mark Wildon, Royal Holloway, University of London, Egham, U. K.
We shall show by induction on d that there are infinitely many solutions. If d = 1,
then any n N is a solution. Consider d > 1.
Suppose first that a is not a unit modulo d. Choose a prime p dividing gcd(a, d)
and exponents and such that a = a p and d = d p , where a and d are not divis-
ible by p. Let n = p m. When m is sufficiently large, ka n + n n 0 (mod p ).
Therefore, ka n + n 0 (mod d) if and only if
m
ka p + p m 0 (mod d ),

or, equivalently, if and only if

bm + m 0 (mod d ),

where b = a p and  N is chosen so that k p (mod d ). By the induction
hypothesis, we find infinitely many choices for m.
In the remaining case, d > 1 and a is a unit modulo d. Let c = gcd((d), d). By
the induction hypothesis, there exists m N such that ka m + m 0 (mod c). Let ka m
+ m = r c, where r N, and choose s, t N so that c = s(d) + td. Set

AugustSeptember 2016] PROBLEMS AND SOLUTIONS 723


n = m r s(d) + (d)d, where N is chosen so that n N. Now a (d) 1
(mod d), and hence,
ka n + n = ka mr s(d)+(d)d + m r s(d) + (d)d
ka m + m r s(d)d
= r (c s(d)) + (d)d = r td + +(d)d,
where the congruence is modulo d. Hence, d divides ka n + n, as required. We now
obtain infinitely many solutions by varying .
Also solved by J. H. Lindsey II, B. Maji (India), M. Omarjee (France), K. Razminia (Iran), N. Safei (Iran),
J. C. Smith, A. Stenger, R. Stong, R. Tauraso (Italy), R. Viteam (India), University of Louisiana at Lafayette
Math Club, and the proposers.

A Lacunary Recurrence for Bernoulli Numbers



11791 [2014, 648]. Proposed by Marian Stofka, Slovak University of Technology,
Bratislava, Slovakia.
Show that for r 1,
r 
6r + 1 6r + 1
B6s2 = ,
s=1
6s 2 6
where Bn denotes the nth Bernoulli number.
Solution by Allen Stenger, Boulder, CO. Let be a primitive sixth root of unity so
that 3 = 1 and 2
+ 1 = 0. Recall that the nth Bernoulli polynomial Bn (x) is
defined by Bn (x) = ns=0 ns Bs x ns . We first prove
 r 
1
5
6r + 1
B6s2 = (1)m B6r +1 ( m ). (1)
s=1
6s 2 6 m=0

We know that 5m=0 am is 6 if a is divisible by 6 and is 0 otherwise. By the definition


of Bernoulli polynomials, we have
5 5 
6r +1 
3m 6r + 1
(1) B6r +1 ( ) =
m m
Bk ( m )6r +1k
m=0 m=0 k=0
k


6r +1   5
6r + 1
= Bk m(6r 2k)
k=0
k m=0

  r 
6r + 1 6r + 1
=6 Bk = 6 B6s2 ,
0k6r +1
k s=1
6s 2
6|k+2

which proves (1).


From the generating function
ze x z 
zn
= Bn (x) ,
ez 1 n=0
n!
it is easy to prove the well-known facts that Bn (0) = Bn (1) = 0 when n is odd and
exceeds 1 and that Bn (x + 1) Bn (x) = nx n1 for all n. Thus, Bn (1) = n when n
is odd and exceeds 1. Now

724 
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123

5
(1)m B6r +1 ( m )
m=0

= 0 (B6r +1 () B6r +1 ( 2 )) + (6r + 1) + (B6r +1 ( 4 ) B6r +1 ( 5 )).

Using 2 = 1, we evaluate the two grouped terms on the right as


B6r +1 () B6r +1 ( 2 ) = B6r +1 ( 2 + 1) = B6r +1 ( 2 )
= (6r + 1)( 2 )6r = 6r + 1
and
B6r +1 ( 4 ) B6r +1 ( 5 ) = B6r +1 () B6r +1 ( 2 )
= B6r +1 () B6r +1 (1 )
= (6r + 1)()6r = (6r + 1).
Combining these facts yields

5
(1)m B6r +1 ( m ) = (6r + 1).
m=0

Editorial comment. As noted by several solvers, this formula was proved by S.


Ramanujan, Some properties of Bernoullis numbers, J. Indian Math. Soc. 3 (1911),
219234, and further work on related identities can be found in D. H. Lehmer, Lacu-
nary recurrence formulas for the numbers of Bernoulli and Euler, Ann. of Math. (2)
36 (1935), 637649 and F. T. Howard, A general lacunary recurrence formula, in
Applications of Fibonacci Numbers, Vol. 9, Kluwer Acad. Publ., Dordrecht, 2004,
pp. 121135.
Also solved by U. Abel (Germany), D. Beckwith, R. Chapman (U.K.), C. Georghiou (Greece),
F. T. Howard, B. Karaivanov, O. Kouba (Syria), O. P. Lossers (Netherlands), R. Tauraso (Italy), C. Vig-
nat & V. H. Moll, M. Vowe (Switzerland), and GCHQ Problem Solving Group (U.K.).

A Functional Equation
11794 [2014, 648]. Proposed by George Stoica, University of New Brunswick, Saint
John, Canada. Find every twice differentiable function f on R such that for all nonzero
x and y, x f ( f (y)/x) = y f ( f (x)/y).
Solution by O. P. Lossers, Eindhoven University of Technology, Eindhoven, The Netherlands. We assume only that f is once differentiable. We claim that the solutions are f(x) = a(x + a), where a ∈ R.
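Before the proof, it is worth checking directly that each f(x) = a(x + a) does satisfy the equation (this check is ours, not part of the published solution): for nonzero x and y,

```latex
x\,f\!\left(\frac{f(y)}{x}\right)
  = x\,a\!\left(\frac{a(y+a)}{x} + a\right)
  = a^{2}(y+a) + a^{2}x
  = a^{2}(x+y+a),
```

which is symmetric in x and y.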
The functional equation
$$x\,f\!\left(\frac{f(y)}{x}\right) = y\,f\!\left(\frac{f(x)}{y}\right) \tag{1}$$
can be written in the form
$$\frac{f\bigl(f(y)/x\bigr)}{f(y)/x} = \frac{f\bigl(f(x)/y\bigr)}{f(y)/y} \tag{2}$$
when f(y) ≠ 0.

August–September 2016] PROBLEMS AND SOLUTIONS 725


Taking x > 0, y < 0 in (1), we conclude either that f has a zero or that f has both positive and negative values, so again f has a zero.
Case 1: f(0) ≠ 0. If f(x) = f(y) = 0, then from (1) we get x = y, so the zero is unique in this case. If f satisfies (1), then F(x) = a²f(x/a) also does, so we may assume F(1) = 0. Setting x = 1 and then y = 0 gives F(F(0)) = 0, so by the uniqueness, F(0) = 1. Setting x = 1 then yields F(F(y)) = y for all y, so F is surjective. If we substitute z for F(y) in (1) and differentiate with respect to x, we obtain
$$F\!\left(\frac{z}{x}\right) - \frac{z}{x}\,F'\!\left(\frac{z}{x}\right) = F'\!\left(\frac{F(x)}{F(z)}\right) F'(x).$$
Setting z = x yields F'(1)(1 + F'(x)) = 0 for all x. It follows that F(x) = 1 − x. We conclude that the general solution in this case is f(x) = a(a + x) with a ≠ 0.
Case 2: f(0) = 0. We claim f'(0) = 0. If not, then by scaling F(x) = a²f(x/a) as before, we produce an F so that F'(0) = 1. Set x = −y in (2) and let y go to zero to obtain F(−1)·(−1) = F(−1), so F(−1) = 0. Now set x = −1 in (2) and let y go to zero. We get (F'(0))² = 0, a contradiction; hence, f'(0) = 0.
Now we claim that f is identically zero. If not, then there exists a such that f(a) ≠ 0. Set y = f(x)/a. We want to let x → 0. Since f is continuous at 0, we have f(x) → 0 as x → 0. Thus, y = f(x)/a → 0 as x → 0. Now f(y) → 0 as x → 0. Since f'(0) = 0, also f(x)/x → 0, so y/x → 0 as x → 0. Furthermore, f(y)/y → 0 and f(y)/x → 0 as x → 0. Thus, by (2),
$$f(a) = f\!\left(\frac{f(x)}{y}\right) = \frac{f(y)}{y}\cdot\frac{f\bigl(f(y)/x\bigr)}{f(y)/x} \to f'(0)\cdot f'(0) = 0$$
as x → 0, a contradiction. Hence, f(x) is identically zero, and we get a(x + a) in the last case with a = 0.
Editorial comment. If the functional equation is required only for positive x, y, then there are many other solutions, such as f(x) = (x² + 1)^{1/2}.
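Indeed, for x, y > 0 this example can be verified in one line (the verification is ours):

```latex
x\,f\!\left(\frac{f(y)}{x}\right)
  = x\sqrt{\frac{y^{2}+1}{x^{2}}+1}
  = \sqrt{x^{2}+y^{2}+1},
```

which is symmetric in x and y.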
Also solved by E. A. Herman, Y. J. Ionin, and R. Stong.

A Trig Integral with Gamma


11796 [2015, 738]. Proposed by Gleb Glebov, Simon Fraser University, Burnaby, Canada. Find
$$\int_0^\infty \frac{\sin((2n+1)x)}{\sin x}\, e^{-\alpha x}\, x^{m-1}\, dx$$
in terms of α, m, and n, when α > 0, m ≥ 1, and n is a nonnegative integer.
Solution by Michel Bataille, Rouen, France. Define I to be the proposed integral. Then we can write it as
$$I = \int_0^\infty \left(1 + 2\sum_{k=1}^{n} \cos(2kx)\right) e^{-\alpha x}\, x^{m-1}\, dx = I_0 + 2\sum_{k=1}^{n} I_k.$$

From the definition of the gamma function, we have
$$I_0 = \int_0^\infty e^{-\alpha x}\, x^{m-1}\, dx = \frac{\Gamma(m)}{\alpha^m},$$

and for k ≥ 1,
$$I_k = \operatorname{Re}\int_0^\infty e^{-(\alpha - 2ki)x}\, x^{m-1}\, dx = \operatorname{Re}\frac{\Gamma(m)}{(\alpha - 2ki)^m} = \operatorname{Re}\frac{\Gamma(m)\,(\alpha + 2ki)^m}{(\alpha^2 + 4k^2)^m}.$$
To put this in real form, define θ_k = tan^{−1}(2k/α) so we can write (α + 2ki)^m = (α² + 4k²)^{m/2}[cos(mθ_k) + i sin(mθ_k)]. The real part I_k of the integral can be calculated from this, and
$$I = \Gamma(m)\left(\frac{1}{\alpha^m} + 2\sum_{k=1}^{n} \frac{\cos\bigl(m\tan^{-1}(2k/\alpha)\bigr)}{(\alpha^2 + 4k^2)^{m/2}}\right).$$
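The closed form can be spot-checked against direct numerical quadrature (our sketch, not part of the published solution; the removable singularities of sin((2n+1)x)/sin x at multiples of π are handled by L'Hôpital's rule):

```python
import math

def integrand(x, alpha, m, n):
    # sin((2n+1)x)/sin(x) has removable singularities at multiples of pi;
    # switch to the L'Hopital limit when sin(x) is tiny
    if abs(math.sin(x)) < 1e-9:
        ratio = (2 * n + 1) * math.cos((2 * n + 1) * x) / math.cos(x)
    else:
        ratio = math.sin((2 * n + 1) * x) / math.sin(x)
    return ratio * math.exp(-alpha * x) * x ** (m - 1)

def closed_form(alpha, m, n):
    # Gamma(m) [ alpha^-m + 2 sum_k cos(m arctan(2k/alpha)) / (alpha^2 + 4k^2)^(m/2) ]
    total = alpha ** (-m)
    for k in range(1, n + 1):
        total += 2 * math.cos(m * math.atan2(2 * k, alpha)) / (alpha ** 2 + 4 * k ** 2) ** (m / 2)
    return math.gamma(m) * total

def simpson(f, a, b, steps):
    # composite Simpson's rule (steps must be even)
    h = (b - a) / steps
    s = f(a) + f(b)
    for i in range(1, steps):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3

alpha, m, n = 1.0, 2, 1
numeric = simpson(lambda x: integrand(x, alpha, m, n), 0.0, 40.0, 200000)
assert abs(numeric - closed_form(alpha, m, n)) < 1e-6
```

For α = 1, m = 2, n = 1 the formula gives 1 + 2cos(2 arctan 2)/5 = 1 − 6/25 = 0.76, which the quadrature reproduces.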

Also solved by U. Abel (Germany), K. F. Andersen (Canada), R. Bagby, D. Beckwith, R. Boukharfane (France), K. N. Boyadzhiev, P. Bracken, B. Bradie, M. A. Carlton, R. Chapman (U. K.), H. Chen, D. F. Connon (U. K.), B. E. Davis, J. L. Ekstrom, C. Georghiou (Greece), M. L. Glasser, E. A. Herman, M. Hoffman, B. Karaivanov & T. S. Vassilev (U.S.A. & Canada), O. Kouba (Syria), K. D. Lathrop, M. Omarjee (France), P. Perfetti (Italy), C. M. Russell, R. Sargsyan (Armenia), M. A. Shayib & M. Misaghian, A. Stenger, R. Stong, R. Tauraso (Italy), N. Thornber, E. I. Verriest, Z. Voros (Hungary), M. Vowe (Switzerland), H. Widmer (Switzerland), M. Wildon (U. K.), GCHQ Problem Solving Group (U. K.), and the proposer.

Growth Rate for Solution


11799 [2014, 739]. Proposed by Vicentiu Radulescu, King Abdulaziz University, Jeddah, Saudi Arabia. Let a, b, and c be positive.
(a) Prove that there is a unique continuously differentiable function f from [0, ∞) into R such that f(0) = 0 and, for all x ≥ 0,
$$f'(x)\bigl(1 + a|f(x)|^b\bigr)^c = 1.$$
(b) Find, in terms of a, b, and c, the largest λ such that f(x) = O(x^λ) as x → ∞.
Solution by Kenneth F. Andersen, Edmonton, AB, Canada. (a) If f is a solution, then
$$f'(x)\bigl(1 + a|f(x)|^b\bigr)^c = 1 \tag{1}$$
implies f'(x) > 0 for all x ≥ 0. Since f(0) = 0, we conclude that f is nonnegative and strictly increasing on [0, ∞). We claim that f is unbounded. Indeed, if f(x) ≤ M, then (1) shows f'(x) ≥ (1 + aM^b)^{−c} > 0 for x ≥ 0 so that
$$f(x) = \int_0^x f'(t)\,dt \ge \frac{x}{(1 + aM^b)^c},$$
and thus, f(x) > M for sufficiently large x, a contradiction. Thus, f is a bijection of [0, ∞) onto itself, with a continuously differentiable inverse f^{−1} satisfying f^{−1}(0) = 0 and
$$\bigl(f^{-1}\bigr)'(f(x))\,f'(x) = 1$$
for all x ≥ 0. Combining this with (1) yields
$$\bigl(f^{-1}\bigr)'(y) = \bigl(1 + ay^b\bigr)^c$$
for all y ∈ [0, ∞). Since (1 + ay^b)^c is a continuous function of y, by the fundamental theorem of calculus,
$$f^{-1}(y) = \int_0^y \bigl(1 + at^b\bigr)^c\,dt.$$
Since inverses are unique, this uniquely determines f on [0, ∞).



(b) Clearly, there is no maximal value of λ satisfying the stated requirement. If λ satisfies the requirement, then so does λ + 1. We will show that the minimal value of λ satisfying the stated requirement is (1 + bc)^{−1}. Note that, substituting t = ys,
$$\lim_{y\to\infty} y^{-1-bc}\int_0^y \bigl(1 + at^b\bigr)^c\,dt = \lim_{y\to\infty}\int_0^1 \bigl(y^{-b} + as^b\bigr)^c\,ds = a^c\int_0^1 s^{bc}\,ds = \frac{a^c}{bc+1}.$$
Thus,
$$\lim_{x\to\infty} x^{-\lambda} f(x) = \lim_{y\to\infty}\left(\int_0^y \bigl(1 + at^b\bigr)^c\,dt\right)^{-\lambda} y$$
is finite if and only if λ ≥ (1 + bc)^{−1}.
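The key limit above is easy to confirm numerically (a sketch of ours with hypothetical helper names, using composite Simpson quadrature for f^{−1}(y) = ∫₀^y (1 + at^b)^c dt):

```python
def f_inverse(y, a, b, c, steps=200000):
    # f^{-1}(y) = integral_0^y (1 + a t^b)^c dt by composite Simpson's rule
    g = lambda t: (1.0 + a * t ** b) ** c
    h = y / steps
    s = g(0.0) + g(y)
    for i in range(1, steps):
        s += g(i * h) * (4 if i % 2 else 2)
    return s * h / 3

a, b, c = 2.0, 1.5, 0.8
limit = a ** c / (b * c + 1)                       # the value a^c/(bc+1) computed above
ratio = f_inverse(1e4, a, b, c) / 1e4 ** (1 + b * c)
assert abs(ratio - limit) < 1e-3 * limit           # y^{-1-bc} f^{-1}(y) -> a^c/(bc+1)
```

The relative error at y = 10⁴ is already far below the 0.1% tolerance, consistent with the growth exponent 1 + bc for f^{−1} (and hence 1/(1 + bc) for f).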


Editorial comment. In simplifying the statement, the editors mistakenly wrote "largest" instead of "smallest."
Also solved by R. Bagby, R. Chapman (U. K.), J.-P. Grivaux (France), O. Kouba (Syria), J. H. Lindsey II,
I. Pinelis, A. Stenger, R. Stong, M. L. Treuden, E. I. Verriest, GCHQ Problem Solving Group (U. K.), and the
proposer.

Arithmetic Minus Geometric Means


11800 [2014, 739]. Proposed by Oleksiy Klurman, University of Montreal, Montreal, Canada. Let f be a continuous function from [0, 1] into R⁺. Prove that
$$\int_0^1 f(x)\,dx - \exp\left(\int_0^1 \log f(x)\,dx\right) \le \max_{0\le x,y\le 1}\left(\sqrt{f(x)} - \sqrt{f(y)}\right)^{2}.$$

Composite Solution by Rafik Sargsyan, Yerevan State University, Yerevan, Armenia, and Kenneth Schilling, Mathematics Department, University of Michigan–Flint, Flint, MI. We will prove a stronger result. Recall that the arithmetic, geometric, and harmonic means A, G, and H of f on [0, 1] satisfy
$$A = \int_0^1 f(x)\,dx, \qquad G = \exp\left(\int_0^1 \log f(x)\,dx\right), \qquad H = \left(\int_0^1 \frac{dx}{f(x)}\right)^{-1}.$$
Let M and m be the maximum and minimum values of f, respectively. We show that A − H ≤ (√M − √m)², which, since G ≥ H, is stronger than the requested inequality A − G ≤ (√M − √m)².
For x ∈ [0, 1], define s(x) by the relation
$$f(x) = s(x)\,m + (1 - s(x))\,M$$
and let t = ∫₀¹ s(x) dx. We have
$$\int_0^1 f(x)\,dx = tm + (1-t)M.$$
By convexity, 1/f(x) ≤ s(x)/m + (1 − s(x))/M, so
$$\int_0^1 \frac{dx}{f(x)} \le \frac{t}{m} + \frac{1-t}{M} = \frac{tM + (1-t)m}{mM}.$$

Also note that
$$0 \le \left(t\sqrt{M} - (1-t)\sqrt{m}\right)^{2} = tM + (1-t)m - t(1-t)\left(\sqrt{M} + \sqrt{m}\right)^{2},$$
so tM + (1−t)m ≥ t(1−t)(√M + √m)². Now
$$A - H \le tm + (1-t)M - \frac{mM}{tM + (1-t)m} = \frac{t(1-t)(M-m)^2}{tM + (1-t)m} \le \frac{t(1-t)(M-m)^2}{t(1-t)\left(\sqrt{M}+\sqrt{m}\right)^{2}} = \left(\sqrt{M} - \sqrt{m}\right)^{2},$$
as claimed.
Note: The argument above applies even if f need only be a bounded, positive measurable function. In that version, equality holds in the strengthened inequality when f takes the value m on a set of measure √m/(√M + √m) and the value M on a set of complementary measure √M/(√M + √m).
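The equality case in this note can be checked directly; with m = 1 and M = 9 the extremal measures are 1/4 and 3/4, and A − H = (√M − √m)² exactly (our sketch, not part of the published solution):

```python
import math

m, M = 1.0, 9.0
t = math.sqrt(m) / (math.sqrt(M) + math.sqrt(m))   # measure of {f = m}; here 1/4
A = t * m + (1 - t) * M                            # arithmetic mean of the two-valued f
H = 1.0 / (t / m + (1 - t) / M)                    # harmonic mean of the two-valued f
bound = (math.sqrt(M) - math.sqrt(m)) ** 2
assert abs((A - H) - bound) < 1e-12                # equality is attained: 7 - 3 = 4
```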

Editorial comment. The discrete case of the strengthened inequality above appeared as Problem 11469 in this Monthly (problem in December 2009 and solution in May 2011) and in B. Meyer, Some inequalities for elementary mean values, Math. Comp. 42 (1984) 193–194. The best possible upper bound for A − G in terms of m and M can be proved by similar arguments to the ones above. This upper bound is the maximum of tm + (1−t)M − m^t M^{1−t}, which is attained at t = log[M log(M/m)/(M − m)]/log(M/m). This result is also proved in S. H. Tung, On lower and upper bounds of the difference between the arithmetic and geometric mean, Math. Comp. 29 (1975) 834–836.
Also solved by R. Bagby, R. Boukharfane (France), R. Chapman (U. K.), E. A. Herman, B. Karaivanov
& T. S. Vassilev (U.S.A. & Canada), J. H. Lindsey II, P. W. Lindstrom, M. Omarjee (France), P. Perfetti
(Italy), I. Pinelis, R. Sargsyan (Armenia), K. Schilling, A. Stenger, R. Stong, S. Yi (Korea), Z. Zhang (China),
FAU Problem Solving Group, Northwestern University Math Problem Solving Group, NSA Problems Group,
University of Louisiana at Lafayette Math Club, and the proposer.

Rational Polynomials with no Nonnegative Zeros


11801 [2014, October]. Proposed by David Carter, Nahant, MA. Let f be a polynomial
in one variable with rational coefficients that has no nonnegative real root. Show that
there is a nonzero polynomial g with rational coefficients such that the coefficients of
f g are all positive.
Solution by Richard Stong, Center for Communications Research, San Diego, CA. We prove the seemingly weaker statement that if a polynomial f has real coefficients and no nonnegative real root, then there is a polynomial h with real coefficients such that the coefficients of f h are nonnegative. To see that this suffices, note that if f(x)h(x) has degree d and nonnegative coefficients, then (x^d + x^{d−1} + ⋯ + x + 1) f(x)h(x) has degree 2d and positive coefficients. Invoking continuity and the density of the rationals, we can then take g to be a polynomial with rational coefficients close enough to (x^d + x^{d−1} + ⋯ + x + 1)h(x) to solve the problem as stated (with a slightly weaker hypothesis on f).
The weaker fact has a multiplicative property: If polynomials h_1 and h_2 exist such that f_1 h_1 and f_2 h_2 have nonnegative coefficients, then (f_1 f_2)(h_1 h_2) also has nonnegative coefficients. Thus, by factoring over the reals, it suffices to prove the result in


three cases: for f a nonzero constant polynomial, for f(x) = x + a with a > 0, and for f(x) = (x − b)² + a with a > 0.
If f(x) = c with c ≠ 0, then set h(x) = c, yielding f(x)h(x) = c² > 0. If f(x) = x + a with a > 0, then take h(x) = 1. For f(x) = (x − b)² + a with a > 0, let α denote the root b + i√a of f. We may also write α = re^{iθ}. Since α lies in the upper half-plane, 0 < θ < π, and hence, the origin lies in the convex hull of {e^{ikθ} : 0 ≤ k ≤ ⌈2π/θ⌉}. Letting d = ⌈2π/θ⌉, we can write 0 = Σ_{k=0}^{d} c_k e^{ikθ} for nonnegative real constants c_0, . . . , c_d with sum 1. Rewriting this as 0 = Σ_{k=0}^{d} c_k r^{−k} α^k expresses α as a root of the polynomial p with nonnegative real coefficients defined by p(x) = Σ_{k=0}^{d} c_k r^{−k} x^k. Hence, f divides p, and p/f is the desired polynomial h.
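For a concrete instance of this construction (our example, not from the published solution): f(x) = (x − 1)² + 1 = x² − 2x + 2 has root α = 1 + i = √2 e^{iπ/4}, and taking c_0 = c_4 = 1/2 (so 0 = c_0 + c_4 e^{4iπ/4}) gives p(x) = 1/2 + x⁴/8, whose quotient by f is h(x) = (x² + 2x + 2)/8:

```python
from fractions import Fraction

def polymul(p, q):
    # multiply polynomials given as coefficient lists, lowest degree first
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

f = [Fraction(2), Fraction(-2), Fraction(1)]          # x^2 - 2x + 2
h = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 8)]  # (x^2 + 2x + 2)/8
p = polymul(f, h)                                     # should be 1/2 + x^4/8
assert p == [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(0), Fraction(1, 8)]
assert all(coeff >= 0 for coeff in p)                 # f*h has nonnegative coefficients
```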
Also solved by A. J. Bevelacqua, R. Chapman (U. K.), N. Grivaux (France), E. A. Herman, Y. J. Ionin,
O. Kouba (Syria), R. E. Prather, N. C. Singer, A. Stenger, R. Tauraso (Italy), T. Viteam (India), Z. Wu (China),
NSA Problems Group, and the proposer.

A Deranged Sum
11802 [2014, 739]. Proposed by Istvan Mezo, Nanjing University of Information Science and Technology, Nanjing, China. Let H_{n,2} = Σ_{k=1}^{n} k^{−2}, and let D_n = n! Σ_{k=0}^{n} (−1)^k/k!. (This is the derangement number of n, that is, the number of permutations of {1, . . . , n} that fix no element.) Prove that
$$\sum_{n=1}^{\infty}\frac{(-1)^n}{n!}\,H_{n,2} = \frac{\pi^2}{6e} - \sum_{n=0}^{\infty}\frac{D_n}{n!\,(n+1)^2}.$$

Solution by Brian Bradie, Christopher Newport University, Newport News, VA. Because the sum on the left side is absolutely convergent, the order of summation can be interchanged. Hence,
$$\sum_{n=1}^{\infty}\frac{(-1)^n}{n!}\,H_{n,2} = \sum_{n=1}^{\infty}\frac{(-1)^n}{n!}\sum_{k=1}^{n}\frac{1}{k^2} = \sum_{k=1}^{\infty}\frac{1}{k^2}\sum_{n=k}^{\infty}\frac{(-1)^n}{n!}$$
$$= \sum_{k=1}^{\infty}\frac{1}{k^2}\left(\sum_{n=0}^{\infty}\frac{(-1)^n}{n!} - \sum_{n=0}^{k-1}\frac{(-1)^n}{n!}\right)$$
$$= \left(\sum_{k=1}^{\infty}\frac{1}{k^2}\right)\left(\sum_{n=0}^{\infty}\frac{(-1)^n}{n!}\right) - \sum_{k=1}^{\infty}\frac{1}{k^2}\sum_{n=0}^{k-1}\frac{(-1)^n}{n!}$$
$$= \frac{\pi^2}{6}\cdot\frac{1}{e} - \sum_{k=1}^{\infty}\frac{D_{k-1}}{(k-1)!\,k^2} = \frac{\pi^2}{6e} - \sum_{k=0}^{\infty}\frac{D_k}{k!\,(k+1)^2}.$$
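A floating-point check of the identity (our sketch, not part of the published solution; note that the right-hand series converges only like 1/N, so it needs far more terms than the rapidly convergent left side):

```python
import math

def lhs(N):
    # partial sum of sum_{n>=1} (-1)^n H_{n,2} / n!  (converges very fast)
    H, total, fact = 0.0, 0.0, 1.0
    for n in range(1, N + 1):
        fact *= n
        H += 1.0 / (n * n)
        total += (-1) ** n * H / fact
    return total

def rhs(N):
    # pi^2/(6e) - sum_{n=0}^{N} D_n/(n!(n+1)^2), using D_n/n! = sum_{k=0}^{n} (-1)^k/k!
    partial, fact, s = 1.0, 1.0, 0.0
    for n in range(N + 1):
        if n > 0:
            fact *= n
            partial += (-1) ** n / fact
        s += partial / (n + 1) ** 2
    return math.pi ** 2 / (6 * math.e) - s

assert abs(lhs(40) - rhs(100000)) < 1e-4
```

Both sides agree to about −0.55293, within the 1/N truncation error of the right-hand series.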

Also solved by U. Abel (Germany), A. Ali (India), K. F. Andersen (Canada), M. Andreoli, R. Bagby,
S. Banerjee & B. Maji (India), M. Bataille (France), R. Boukharfane (France), K. N. Boyadzhiev, P. Bracken,
M. A. Carlton, R. Chapman (U. K.), H. Chen, D. Fleischman, N. Fontes-Merz, O. Geupel (Germany),
M. L. Glasser, J.-P. Grivaux (France), E. A. Herman, M. Hoffman, B. Karaivanov & T. Vassilev (Canada),
P. M. Kayll, O. Kouba (Syria), J. Minkus, M. Omarjee (France), R. Sargsyan (Armenia), A. Stenger, R. Stong,
R. Tauraso (Italy), J. Vinuesa (Spain), M. Vowe (Switzerland), M. Wildon (U. K.), J. Zacharias, Armstrong
Problem Solvers, FAU Problem Solving Group, GCHQ Problem Solving Group (U. K.), GWstat Problem
Solving Group, NSA Problems Group, and the proposer.

REVIEWS
Edited by Jeffrey Nunemacher
Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015

An Introduction to Statistical Learning with Applications in R. By Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. Springer, New York, 2013, xiv + 426 pp., ISBN 978-1-4614-7137-0, $79.99.

Reviewed by Matthew Richey


Mathematics has had no shortage of golden ages, eras during which our discipline not only experienced significant intellectual advances but also unprecedented growth in its applicability to the world around us. Think of the applied mathematics of Archimedes; Newton and Leibniz and the calculus; Gauss and mathematical physics; and the computational mathematics that grew out of the Manhattan Project.
I posit that we are now in a new golden age defined by the interplay of the mathematical sciences (the trinity of mathematics, statistics, and computer science) with data. Phrases such as "data science," "big data," and "predictive analytics" are more than just the current buzz. They encompass a range of methodologies used in a variety of disciplines to understand the massive influx of data that is altering how we think about and interact with the world. One can argue that there is nothing new here. Many mathematical scientists, particularly applied mathematicians, statisticians, and computer scientists, have been working on data-intensive questions for a long time. As far back as 1962, one can find features of data science articulated by John Tukey in his paper "The Future of Data Analysis" [1]. Whether data science started with Tukey is not the point. Data science is here and is here to stay.
But what is data science? One view is that it is simply an amalgamation of methods that attempt to extract knowledge and understanding from data, with data being very broadly defined. Data science shares many characteristics with statistics, but it's more than t-tests, chi-square tables, and linear regression. In his 1962 paper, Tukey identified some of the differences between statistics and the work he was doing:

For a long time I have thought I was a statistician, interested in inferences from the particular to
the general. But as I have watched mathematical statistics evolve, I have had cause to wonder
and to doubt. ... All in all I have come to feel that my central interest is in data analysis, which I
take to include, among other things: procedures for analyzing data, techniques for interpreting
the results of such procedures, ways of planning the gathering of data to make its analysis
easier, more precise or more accurate, and all the machinery and results of (mathematical)
statistics which apply to analyzing data.1

Replace "data analysis" with "data science" and, even today, it is hard to find a more elegant and precise definition of this discipline. As well, data science and statistics differ in their respective goals and perspectives. For example, statistics primarily focuses
http://dx.doi.org/10.4169/amer.math.monthly.123.7.731
1. I first became aware of this quote in a compelling paper entitled "50 Years of Data Science" by David Donoho, presented at a conference honoring the 100th birthday of Tukey. The paper is currently available on the Simply Statistics blog (simplystatistics.org).

August–September 2016] REVIEWS 731


on inference (e.g., what is the effect of age on income level?) whereas data science is primarily concerned with prediction (for a specific individual, what is the best prediction of their income?). Data science is the engine behind Google, Amazon, and Netflix. It helps biologists to grapple with microarray data involving tens of thousands of genes. It allows doctors to make complex diagnoses. The quants on Wall Street love it, perhaps to the detriment of the rest of us. It is used in almost every discipline, even in the humanities (textual analysis is a prime example). Suffice it to say that data science is something very important that lives at the intersection of mathematics, statistics, and computing. From an intellectual perspective, data science is an ideal subject for the mathematical sciences. In many ways, it is a perfect subject for the undergraduate mathematics curriculum.
As mathematicians, we should also care about data science from a more practical perspective, namely as a natural career option for our majors. Many (the majority, in all likelihood) of our students are not going off to careers in which they will be proving theorems. It is more likely that they will be working with data; in effect, they will become entry-level data scientists. The standard advice for these students is "take a statistics course before you graduate." Agreed, that's a good start, but as noted, data science is more than statistics. The challenge is that our standard undergraduate mathematics curriculum does not contain a course that provides a good entry into the world of data science. But how do we teach a subject that is so new we hardly know how to define it? In most departments, especially those at smaller, primarily undergraduate institutions, it's unlikely you'll find anyone who identifies themselves as a data scientist. For now, it is hard to imagine an influx of data scientists (whoever these folks are) coming to teach undergraduate mathematics. No doubt, this will happen, here and there, but for now there is a fundamental supply and demand problem, not to mention a salary issue.
One thing we can do is to introduce courses that prepare our students for careers in data science. Developing a course outside of one's disciplinary expertise takes work, if not courage. Having a good textbook is clearly helpful. This brings us to the book under review, An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, and Tibshirani, hereafter referred to as ISLR. The use of the word "statistical" in the title reflects the authors' professional orientation. One could, and should, consider this text to be an introduction to the world of machine learning, a phrase that may sound ominous, raising the specter of the Terminator movies or banks of super-cooled computers. However, here "machine" really means an algorithm and "learning" means we are going to learn what we can from our data. This combination is a central tenet of data science. Machine learning, by itself, is not data science but is an important component of data science. In particular, it is a topic ideally suited for mathematically inclined students who envision entering careers with a heavy emphasis on data.
As a textbook for an introduction to data science through machine learning, there is much to like about ISLR. It's thorough, lively, written at a level appropriate for undergraduates, and usable by nonexperts. It's chock full of interesting examples of how modern predictive machine learning algorithms work (and don't work) in a variety of settings. I have had the pleasure of teaching out of ISLR twice, both times to classes of around 20, predominantly senior, mathematics majors; some, but not all, with a background in statistics. The only prerequisite I required was one of linear algebra, an introductory computer science course, or an introductory statistics course (one that includes regression). I have found that students with a statistics background do not necessarily have an advantage in this course. The order of the topics in ISLR is excellent; over the course of ten chapters, the reader is led through a slew of modern algorithms:

nearest-neighbor, penalized regression, regression splines, trees, boosting, k-means
clustering, hierarchical clustering, and support vector machines. Naturally, there is
an emphasis on implementing the algorithms in the programming language R. Each
chapter concludes with a detailed lab stepping through the computational techniques
just described.
It should be noted that ISLR is the undergraduate version of Elements of Statistical Learning: Data Mining, Inference, and Prediction [3] (hereafter ESL) by Hastie, Tibshirani, and Friedman, written for graduate students and practicing statisticians. ESL gives a more sophisticated approach to almost all the topics covered in ISLR; the two books even share many of the same examples. Any instructor not already comfortable with machine learning (such as this reviewer) and planning to use ISLR should keep ESL handy as a reference. ESL is particularly useful as a means of augmenting the content of ISLR in a manner appropriate for advanced undergraduates, especially those considering graduate school. Trevor Hastie and Rob Tibshirani, both at Stanford University, the two authors common to both ISLR and ESL, are leaders in developing and using advanced algorithms to better understand the rich and complex data of the world around us. This adds to the appeal of ISLR; students will be learning directly from some of the modern leaders in the field of data science.
At first glance, ISLR might appear to be a text written by statisticians for other statisticians and data scientists. I believe, however, that with a little effort, any mathematician with a working, not necessarily expert, knowledge of R is capable of teaching an upper-level course to mathematics majors from this book. Anyone who has taught a statistics course using multivariate regression is a candidate to teach from ISLR. Overall, the mathematical level of the text is not especially high, and R is used in a very straightforward manner. Throughout, the focus is on algorithms and applications, not clever programming tricks. Every algorithm is already implemented in R and is easy to use, even if from a "black box" perspective. Because I have been teaching this to mostly senior mathematics majors, I enjoy augmenting the material in places with more mathematically advanced perspectives. It is quite plausible that ISLR could be used in a mid-level course, with less emphasis on the mathematical subtleties.
There are some things in ISLR that could be better. None of these is a major problem, but they are issues of which one should be aware. Understandably, the authors can't avoid bringing in their statistics perspective, but they occasionally overemphasize it in a way that doesn't bring any additional insight. For example, I don't feel that it is necessary to cite the formula for the standard error of estimation for regression coefficients since error estimates are estimated from data later on. At times, this bifurcated perspective (inference, as in statistics, or prediction, as in data science?) muddles the message. Another rough spot is that occasionally the level oscillates from basic ideas (e.g., reminding the reader that the second derivative measures concavity) to advanced notions such as L² spaces. The data sets used in the text are useful as a way of introducing the methods, but they pale in comparison to what one can easily dig up from other sources. The labs at the end of each chapter are helpful to students but a bit outdated relative to current R programming paradigms (no ggplot or dplyr). Perhaps in a new edition, these will be modernized.
Let's review the content of ISLR. It begins, naturally, with a discussion of what "statistical" or "machine" learning is all about. Notions covered include the difference between supervised learning (e.g., regression or any other scenario with a response variable) and unsupervised learning (e.g., clustering models in which structure is waiting to be discovered). The authors do an especially nice job describing the bias-variance tradeoff, i.e., how total error can be decomposed into squared bias, prediction variance, and unaccounted-for error. Bias-variance tradeoff is a



fundamental principle of machine learning and comes up throughout the book. I find
it useful early on to spend a little extra time with computational examples illustrating
how it works in practice. As the book unfolds, students will be learning all sorts of
nifty algorithms. It is crucial they understand the philosophy behind machine learning
and the role that bias-variance tradeoff plays in the process of deciding which method
to use.
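The kind of computational illustration the reviewer has in mind can be sketched in a few lines (our toy example, in Python rather than the book's R, and not taken from ISLR: a k-nearest-neighbor fit at a single point, where growing k trades variance for bias):

```python
import random, statistics

def knn_bias_variance(k, n_train=50, trials=500, x0=0.5, seed=0):
    # Monte Carlo estimate of squared bias and variance of a k-NN prediction
    # at x0, for the toy truth f(x) = x^2 observed with N(0, 0.1) noise
    rng = random.Random(seed)
    f = lambda x: x * x
    preds = []
    for _ in range(trials):
        xs = [rng.uniform(0, 1) for _ in range(n_train)]
        train = [(x, f(x) + rng.gauss(0, 0.1)) for x in xs]
        nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
        preds.append(sum(y for _, y in nearest) / k)
    bias2 = (statistics.fmean(preds) - f(x0)) ** 2
    return bias2, statistics.pvariance(preds)

b1, v1 = knn_bias_variance(1)      # flexible fit: low bias, high variance
b50, v50 = knn_bias_variance(50)   # rigid fit (the global mean): high bias, low variance
assert v1 > v50 and b50 > b1
```

The assertion captures the tradeoff: the one-neighbor fit tracks the truth but inherits the noise, while averaging all 50 points washes out the noise at the price of systematic error.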
ISLR really gets going in Chapter 3 with a discussion of linear regression and nearest-neighbor algorithms. Here students can learn everything they need to know about linear regression in just 40 pages. This might seem brisk, but it could be more condensed. Regression is a great tool, no doubt, but most students have already seen it in other contexts. For those who haven't, this introduction is sufficient and consistent with how other methods are covered later in the book. For statistics students, the contrast with a nearest-neighbors algorithm is valuable in that it shows there are viable alternatives to linear regression. Chapter 4 is a quick tour of some basic classification models: logistic regression, linear and quadratic discriminant analysis, and nearest-neighbor for classification.
The juxtaposition of regression with the nearest-neighbor algorithm illustrates a
strong distinction between statistics and data science. As with many statistical meth-
ods, regression relies on assumptions about the data, for example, that the response
variable is normally distributed around a mean determined as a linear function of the
predictors. As a result of these assumptions, analytic expressions for standard errors
and estimators are available. This approach has been called "data-centric" [2] in that the theoretical assumptions of how the data are produced drive conclusions about estimators and error. In contrast, machine learning can be thought of as "algorithm-centric" in that the algorithm is front and center. In many cases, nearest neighbors
included, there are few assumptions about the data; in statistical terms, the methods
are nonparametric. With regression, the resulting model is the set of coefficients (basi-
cally a distillation of the original data, one for each predictor along with a constant
term), and individual predictions rely on just these coefficients. With nearest neigh-
bors, the model includes all of the data; the entire data set is required each time a
prediction is made. The ability of a model to carry along all its data is an example
of how advances in computing affect the methods one can consider.
The next chapter covers resampling methods such as bootstrapping and cross-validation. Although there are only a few pages devoted to these approaches, these tools, especially cross-validation, are the keys to measuring the performance of machine learning algorithms. Cross-validation, in short, is a method by which the performance of an algorithm is judged by how well it performs on the data on which it was built. This might sound self-referential, but the idea is fundamental to data science. Using cross-validation, there is no call to the theoretical gods for insight into estimated errors on future data. An algorithm is judged by how well it works on the data on which it was built. With any luck, this data will reflect the data on which it will be used. If not, we will learn from our mistakes and adapt the algorithms as we move forward.
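In code, the cross-validation idea is just a loop over held-out folds (a generic sketch of ours in Python rather than the book's R; all names are hypothetical):

```python
import random

def k_fold_cv(data, fit, loss, k=5, seed=0):
    # estimate prediction error: train on k-1 folds, score on the held-out fold
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    fold_errors = []
    for i in range(k):
        train = [p for j in range(k) if j != i for p in folds[j]]
        model = fit(train)
        fold_errors.append(sum(loss(model, p) for p in folds[i]) / len(folds[i]))
    return sum(fold_errors) / k

# toy use: the "model" is just the mean response, scored by squared error
fit = lambda train: sum(y for _, y in train) / len(train)
loss = lambda model, point: (point[1] - model) ** 2
data = [(x, 2.0 * x) for x in range(100)]
cv_error = k_fold_cv(data, fit, loss)
```

Any fitting procedure with the same `fit`/`loss` interface can be dropped in, which is exactly how one compares competing methods or tunes a penalty parameter.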
Armed with cross-validation, the reader next moves to methods such as ridge regression and the least absolute shrinkage and selection operator (also known as Lasso; among its many virtues, machine learning has an entertaining knack for catchy acronyms). Both ridge regression and Lasso are variants of traditional linear regression, only this time with a penalty governing the growth of the coefficients in order to control overfitting the data. The appropriate choice of the penalty is determined by cross-validation. These algorithms work in real time on large data sets because of efficient numerical algorithms (a topic more fully explored in ESL). Here, we see

the difference between working in the L² norm (ridge regression) and the L¹ norm (Lasso). This chapter also contains a fairly traditional discussion of best subset selection for multivariate regression. I don't spend much time on this subject since it is covered in detail in statistics courses on regression.
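The flavor of the penalty is visible even in one dimension, where ridge regression has a closed form (our sketch, in Python rather than R; in practice the penalty λ would be chosen by cross-validation):

```python
def ridge_1d(xs, ys, lam):
    # one-predictor ridge regression through the origin:
    # minimize sum (y - b x)^2 + lam * b^2, whose minimizer is
    # b = sum(x y) / (sum(x^2) + lam); lam = 0 recovers least squares
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0 = ridge_1d(xs, ys, 0.0)   # ordinary least squares slope
b5 = ridge_1d(xs, ys, 5.0)   # the penalty shrinks the slope toward 0
assert abs(b5) < abs(b0)
```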
The next chapter covers regression splines, localized regression (known as loess and
familiar to most R users) and GAMs (generalized additive models, basically combina-
tions of splines and other generalized fitting techniques). After that, there is a chapter
devoted to tree-based methods: regression and classification trees, bagging (bootstrap
aggregation in which the average of a collection of models will generally outperform a
single model), random forests (bagging with a small, but effective, twist), and boosting
(an algorithm in which less is more, especially if you do less a lot). These last three
methods are examples of randomized algorithms, a topic undergraduate students do
not often see.
The final supervised learning chapter covers support vector machines (SVMs).
SVMs are probably the most technically challenging topic covered but are handled
with an appropriately streamlined approach. Because of their complexity, I find SVMs
provide an excellent platform for showing students how machine learning relies on
ideas from linear algebra, geometry, and numerical methods. The final chapter of the
book is devoted to unsupervised methods: principal components, k-means clustering,
and hierarchical clustering. Hierarchical clustering is a wonderful topic on which to
end because of its use with gene expression (microarray) data and other situations with
messy data in search of order.
It is quite possible to cover almost all of ISLR in one semester, with students doing mini-projects along the way to learn the algorithms and then leaving some time for substantial projects at the end. I don't use too many of the exercises, but this is a personal preference. By the end of the semester, the students are equipped with an impressive array of algorithms that are applicable to a wide variety of situations. As well, I find they learn to use R in a creative and generative manner. Best of all, they have fun. It's thrilling to take on challenging and, often, unusual data sets. For example, they encounter very wide data sets taken from DNA microarrays, say, with tens of thousands of predictors (individual genes) and only a few dozen samples (patients). They also get a chance to see how mathematics fits together: how geometry, linear algebra, and calculus can be brought to bear on challenging and exciting problems. Although the singular value decomposition (SVD) is not mentioned explicitly in ISLR, the topics encountered make it easy to introduce students to this wonderful idea. It's a shame that SVD is not yet a standard topic in our first course in linear algebra.
Understandably, some topics and algorithms are left out. For example, a case could be made for including multivariate adaptive regression splines (MARS, another entertaining acronym), neural networks, or Markov chain Monte Carlo methods. Including any of these, or other, topics would mean leaving something out. One can quibble over such matters of taste, but I feel the authors have done an excellent job of selecting topics. The goal of ISLR is to serve as an introduction to machine and/or statistical learning appropriate for undergraduates, not as a reference or encyclopedia (ESL already serves that purpose). In this sense, it succeeds admirably.
There are a couple of artifacts of the digital era that make this ISLR even more
enticing. To start, both ISLR and ESL are freely available as pdfs. I like having the
real book, being somewhat old school, but not surprisingly, many of the students
opt for the digital version. Another treat is that Tibshirani and Hastie ran a MOOC
(massive open online course) for the book on edX and Stanford Online. The course
has been archived and is currently (and, I assume, in perpetuity since nothing ever
goes away on the internet) freely available. In the MOOC, Tibshirani and Hastie go

August–September 2016] REVIEWS 735


through each section of the book in sufficient detail to allow the class sessions to be
devoted to computational explorations and active learning labs. In essence, you have a
ready-made flipped classroom.
In summary, I recommend that mathematicians take a hard look at An Introduction
to Statistical Learning with Applications in R and consider using it to support a course
in machine learning for mathematics majors. It's not perfect; the R code is sorely in
need of some updating, and the data sets could be better. But ISLR is a very good
book and a very good option for preparing mathematically inclined students for the
fascinating and important world of data science. Doing so would be an important step
toward ensuring that undergraduate mathematics programs stay current and vital.

REFERENCES

1. J. Tukey, The future of data analysis, Ann. Math. Stat. 33 (1962) 1–67.
2. L. Breiman, Statistical modeling: The two cultures, Statist. Sci. 16 (2001) 199–231.
3. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and
Prediction. Second edition. Springer-Verlag, New York, NY, 2009.

St. Olaf College, Northfield, MN 55015


[email protected]

736 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123