S 15
S 15
S 15
NOTES
Cayleys Formula: A Page From The Book 699
Arnon Avron and Nachum Dershowitz
7KLVLVDFKDOOHQJLQJSUREOHPVROYLQJERRNLQ(XFOLGHDQJHRP
HWU\(DFKFKDSWHUFRQWDLQVFDUHIXOO\FKRVHQZRUNHGH[DPSOHV
ZKLFKH[SODLQQRWRQO\WKHVROXWLRQVWRWKHSUREOHPVEXWDOVRGH
VFULEHLQFORVHGHWDLOKRZRQHZRXOGLQYHQWWKHVROXWLRQWREHJLQ
ZLWK7KHWH[WFRQWDLQVDVHOHFWLRQRISUDFWLFHSUREOHPVRI
6
YDU\LQJGLFXOW\IURPFRQWHVWVDURXQGWKHZRUOGZLWKH[WHQ
5
VLYHKLQWVDQGVHOHFWHGVROXWLRQV7KHH[SRVLWLRQLVIULHQGO\DQG
UHOD[HGDQGDFFRPSDQLHGE\RYHUEHDXWLIXOO\GUDZQJXUHV
7
H,6%1 HERRN SDJHV
7RRUGHUYLVLWZZZPDDRUJHERRNV(*02
9
Scott Annin
7KLVERRNFHOHEUDWHVPDWKHPDWLFDOSUREOHPVROYLQJDWWKHOHYHO
8 10
RIWKH$PHULFDQ,QYLWDWLRQDO0DWKHPDWLFV([DPLQDWLRQ7KHUH
DUHPRUHWKDQIXOO\VROYHGSUREOHPVLQWKHERRNFRQWDLQLQJ
H[DPSOHVIURP$,0(FRPSHWLWLRQVRIWKHVVV
DQGV,QVRPHFDVHVPXOWLSOHVROXWLRQVDUHSUHVHQWHGWR
KLJKOLJKWYDULDEOHDSSURDFKHV7RKHOSSUREOHPVROYHUVZLWKWKH
H[HUFLVHVWKHDXWKRUSURYLGHVWZROHYHOVRIKLQWVWRHDFKH[HUFLVH
LQWKHERRNRQHWRKHOSJHWDQLGHDKRZWREHJLQDQGDQRWKHUWR
SURYLGHPRUHJXLGDQFHLQQDYLJDWLQJDQDSSURDFKWRWKHVROXWLRQ
EDITOR
Scott T. Chapman
Sam Houston State University
ASSOCIATE EDITORS
William Adkins Jeffrey Lawson
Louisiana State University Western Carolina University
David Aldous C. Dwight Lahr
University of California, Berkeley Dartmouth College
Elizabeth Allman Susan Loepp
University of Alaska, Fairbanks Williams College
Jonathan M. Borwein Irina Mitrea
University of Newcastle Temple University
Jason Boynton Bruce P. Palka
North Dakota State University National Science Foundation
Edward B. Burger Vadim Ponomarenko
Southwestern University San Diego State University
Minerva Cordero-Epperson Catherine A. Roberts
University of Texas, Arlington College of the Holy Cross
Allan Donsig Rachel Roberts
University of Nebraska, Lincoln Washington University, St. Louis
Michael Dorff Ivelisse M. Rubio
Brigham Young University Universidad de Puerto Rico, Rio Piedras
Daniela Ferrero Adriana Salerno
Texas State University Bates College
Luis David Garcia-Puente Edward Scheinerman
Sam Houston State University Johns Hopkins University
Sidney Graham Anne Shepler
Central Michigan University University of North Texas
Tara Holm Frank Sottile
Cornell University Texas A&M University
Lea Jenkins Susan G. Staples
Clemson University Texas Christian University
Daniel Krashen Daniel Ullman
University of Georgia George Washington University
Ulrich Krause Daniel Velleman
Universitt Bremen Amherst College
Steven Weintraub
Lehigh University
Abstract. In rotor walk on a graph, the exits from each vertex follow a prescribed periodic
sequence. We show that any rotor walk on the d-dimensional lattice Zd visits at least on the
order of t d/(d+1) distinct sites in t steps. This result extends to Eulerian graphs with a volume
growth condition. In a uniform rotor walk, the first exit from each vertex is to a neighbor
chosen uniformly at random. We prove a shape theorem for the uniform rotor walk on the
comb graph, showing that the size of the range is of order t 2/3 and the asymptotic shape of
the range is a diamond. Using a connection to the mirror model, we show that the uniform
rotor walk is recurrent on two different directed graphs obtained by orienting the edges of the
square grid: the Manhattan lattice and the F-lattice. We end with a short discussion of the time
it takes for rotor walk to cover a finite Eulerian graph.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.627
MSC: Primary 60C05
and
X t+1 = t+1 (X t )+
where e+ denotes the target of the directed edge e. In words, the rotor at X t rotates
to point to a new neighbor of X t and then the walker steps to that neighbor.
We have chosen the retrospective rotor conventioneach rotor at an already vis-
ited vertex indicates the direction of the most recent exit from that vertexbecause it
makes a few of our results such as Lemma 2.2 easier to state.
Figure 1. The range of a clockwise uniform rotor walk on Z2 after 80 returns to the origin. The mechanism
m cycles through the four neighbors in clockwise order (north, east, south, west), and the initial rotors (v)
were oriented independently north, east, south, or west, each with probability 1/4. Colors indicate the first 20
excursion sets A1 , . . . , A20 , defined in 2.
Rt = {X 1 , . . . , X t }.
#B(o, r ) kr d (1)
628
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
A directed graph is called Eulerian if each vertex has as many incoming as outgoing
edges. In particular, any undirected graph can be made Eulerian by converting each
edge into a pair of oppositely oriented directed edges.
Theorem 1.1. For any Eulerian graph G of bounded degree satisfying (1), the number
of distinct sites visited by a rotor walk started at o in t steps satisfies
#Rt ct d/(d+1)
Priezzhev et al. [23] and Povolotsky et al. [22] gave a heuristic argument that #Rt
has order t 2/3 for the clockwise rotor walk on Z2 with uniform random initial rotors.
Theorem 1.1 gives a lower bound of this order, and our proof is directly inspired by
their argument.
The upper bound promises to be more difficult because it depends on the initial
rotor configuration . Indeed, the next theorem shows that for certain , the number of
visited sites #Rt grows linearly in t (which we need not point out is much faster than
t 2/3 !). A rotor walk is called recurrent if X t = X 0 for infinitely many t and transient
otherwise.
Theorem 1.2. For any Eulerian graph G and any mechanism m, if the initial rotor
configuration has an infinite path directed toward o, then a rotor walk started at o
is transient and
t
#Rt ,
where is the maximal degree of a vertex in G.
Theorems 1.1 and 1.2 are proved in 3. But enough about the size of the range; what
about its shape?
Each pixel in Figure 1 corresponds to a vertex of Z2 , and Rt is the set of all colored
pixels (the different colors correspond to excursions of the rotor walk, defined in 2);
the mechanism m is clockwise, and the initial rotors independently point north, east,
south, or west with probability 1/4 each. Although the set Rt of Figure 1 looks far
from round, Kapri and Dhar have conjectured that for very large t it becomes nearly
a circular disk! From now on, by uniform rotor walk we will always mean that the
initial rotors {(v)}vV are independent and uniformly distributed on E v .
Conjecture 1.3 (Kapri-Dhar [25]). The set of sites Rt visited by the clockwise uni-
form rotor walk in Z2 is asymptotically a disk. There exists a constant c such that for
any > 0,
We are a long way from proving anything like Conjecture 1.3, but we can show that
an analogous shape theorem holds on a much simpler graph, the comb obtained from
Z2 by deleting all horizontal edges except those along the x-axis (Figure 2).
Figure 2. A piece of the comb graph (left) and the set of sites visited by a uniform rotor walk on the comb
graph in 10,000 steps.
Theorem 1.4. For uniform rotor walk on the comb graph, #Rt has order t 2/3 and the
asymptotic shape of Rt is a diamond.
For the precise statement, see 4. This result contrasts with random walk on the
comb, for which the expected number of sites visited is only on the order of t 1/2 log t
as shown by Pach and Tardos [21].
Thus, the uniform rotor walk explores the comb more efficiently than random walk.
(On the other hand, it is conjectured to explore Z2 less efficiently than random walk!)
The main difficulty in proving upper bounds for #Rt lies in showing that the uniform
rotor walk is recurrent. This seems to be a difficult problem in Z2 , but we can show it
for two different directed graphs obtained by orienting the edges of Z2 : the Manhattan
lattice and the F-lattice, pictured in Figure 3. The F-lattice has two outgoing horizon-
tal edges at every odd node and two outgoing vertical edges at every even node (we
call (x, y) odd or even according to whether x + y is odd or even). The Manhattan
lattice is full of one-way streets: Rows alternate pointing left and right, while columns
alternate pointing up and down.
Figure 3. Two different periodic orientations of the square grid with indegree and outdegree 2.
630
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Theorem 1.5. Uniform rotor walk is recurrent on both the F-lattice and the Manhat-
tan lattice.
The proof uses a connection to the mirror model and critical bond percolation on
Z2 ; see 5.
Theorems 1.1-1.5 bound the rate at which a rotor walk explores various infinite
graphs. In 6, we bound the time it takes a rotor walk to completely explore a given
finite graph.
Related work. By comparing to a branching process, Angel and Holroyd [3] showed
that uniform rotor walk on the infinite b-ary tree is transient for b 3 and recurrent for
b = 2. In the latter case, the corresponding branching process is critical, and the dis-
tance traveled by rotor walk before returning n times to the root is doubly exponential
in n. They also studied rotor walk on a singly infinite comb with the most transient
initial rotor configuration . They showed that if n particles start at the origin, then
1d
order n of them escape to infinity (more generally, order n 12 for a d-dimensional
analogue of the comb).
In rotor aggregation, each of n particles starting at the origin performs rotor walk
until reaching an unoccupied site, which it then occupies. For rotor aggregation in
Zd , the asymptotic shape of the set of occupied sites is a Euclidean ball [20]. For the
layered square lattice (Z2 with an outward bias along the x- and y-axes), the asymp-
totic shape becomes a diamond [18]. Huss and Sava [17] studied rotor aggregation
on the two-dimensional comb with the most recurrent initial rotor configuration.
They showed that at certain times the boundary of the set of occupied sites is com-
posed of four segments of exact parabolas. It is interesting to compare their result with
Theorem 1.4: The asymptotic shape, and even the scaling, is different.
Definition. An excursion from o is a rotor walk started at o and run until it returns to
o exactly deg(o) times.
For n 0, let
be the time taken for the rotor walk to complete n excursions from o (with the conven-
tion that min of the empty set is ). For all n 1 such that T (n 1) < , define
en u T (n) u T (n1)
Lemma 2.1. [4, Lemma 8]; [6, 4.2] For any initial rotor configuration ,
e1 (x) deg(x) x V.
Proof. If the rotor walk never traverses the same directed edge twice, then u t (x)
deg(x) for all t and x, so we are done. Otherwise, consider the smallest t such that
(X s , X s+1 ) = (X t , X t+1 ) for some s < t. By definition, rotor walk reuses an outgoing
edge from X t only after it has used all of the outgoing edges from X t . Therefore, at
time t the vertex X t has been visited deg(X t ) + 1 times, but by the minimality of t each
incoming edge to X t has been traversed at most once. Since G is Eulerian, it follows
that X t = X 0 = o and t = T (1).
Therefore, every directed edge is used at most once during the first excursion, so
each x V is visited at most deg(x) times during the first excursion.
Lemma 2.2. If T (1) < and there is a directed path of initial rotors from x to o,
then
e1 (x) = deg(x).
Proof. Let y be the first vertex after x on the path of initial rotors from x to o. By
induction on the length of this path, y is visited exactly deg(y) times in an excursion
from o. Each incoming edge to y is traversed at most once by Lemma 2.1, so in fact
each incoming edge to y is traversed exactly once. In particular, the edge (x, y) is
traversed. Since (x) = (x, y), the edge (x, y) is the last one traversed out of x, so x
must be visited at least deg(x) times.
If G is finite, then T (n) < for all n since by Lemma 2.1 the number of visits to
a vertex is at most or equal to the degree of that vertex. If G is infinite, then depending
on the rotor mechanism m and initial rotor configuration , rotor walk may or may not
complete an excursion from o. In particular, Lemma 2.2 implies the following.
Now let
An = {x V : en (x) > 0}
be the set of sites visited during the nth excursion. We also set e0 = o (where, as
usual, o (x) = 1 if x = o and 0 otherwise) and A0 = {o}. For a subset A V , define
its outer boundary A as the set
A := {y
/ A : (x, y) E for some x A}.
632
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
1. en+1 (x) deg(x) for all x V ,
2. en+1 (x) = deg(x) for all x An ,
3. An+1 An An .
Note that the balls B(o, n) can be defined inductively by B(o, 0) = {o} and
Rotor walk is called recurrent if T (n) < for all n. Consider the rotor config-
uration T (n) at the end of the nth excursion. By Lemma 2.4, each vertex in x An
is visited exactly deg(x) times during the N th excursion for each N n + 1, so we
obtain the following.
Corollary 2.6. For a recurrent rotor walk, T (N ) (x) = T (n) (x) for all x An and all
N n.
The following proposition is a kind of converse to Lemma 2.4 in the case of undi-
rected graphs.
Proposition 2.7. [5, Lemma 3]; [4, Prop. 11] Let G = (V, E) be an undirected graph.
For a sequence S1 , S2 , . . . V of sets inducing connected subgraphs such that Sn+1
Sn Sn for all n 1, and any vertex o S1 , there exists a rotor mechanism m and
initial rotors such that the nth excursion for rotor walk started at o traverses each
edge incident to Sn exactly once in each direction and no other edges.
Rt = {x V : u t (x) > 0}
will be to (i) upper bound the number of excursions completed by time t, in order to
(ii) upper bound the number of times each vertex is visited, so that (iii) many distinct
vertices must be visited.
t
#Rt . (2)
t (W 1 (t) + 1)
Before proving this theorem, let us see how it implies Theorem 1.1. The volume
growth condition (1) implies v(r ) kr d , so W (r ) k r d+1 for a constant k , so
W 1 (t) (t/k )1/(d+1) . Now if G has bounded degree, then the right side of (2) is at
least ct d/(d+1) for a constant c (which depends only on k and the maximal degree).
Proof of Theorem 3.1. We first argue that the total length T (m) of the first m excur-
sions is at least W (m). By Corollary 2.5, the nth excursion visits every site in the ball
B(o, n). Therefore, by Lemma 2.4(ii), the (n + 1)st excursion visits every site x
B(o, n) exactly deg(x) times, so the (n + 1)st excursion traverses each directed edge
incident to B(o, n). The length T (n + 1) T (n) of the (n + 1)st excursion is there-
fore at least v(n). Summing over n < m yields the desired inequality T (m) W (m).
Now let m = W 1 (t). Since t < W (m), the rotor walk has not yet completed its mth
excursion at time t, so u t (o) < m deg(o), which proves (i).
Part (ii) now follows from Lemma 2.1 since e1 (x) = u T (1) (x) deg(x). During
each completed excursion, the origin o is visited deg(o) times while x is visited at
most deg(x) times. The +1 accounts for the possibility that time t falls in the middle
of an excursion.
Part (iii) follows from the fact that t = xB(o,t) u t (x). By parts (i) and (ii), each
term in the sum is at most t (W 1 (t) + 1), so there are at least t/(t (W 1 (t) + 1))
nonzero terms.
Pausing to reflect on the proof, we see that an essential step was the inclusion
B(o, n) An of Corollary 2.5. Can this inclusion ever be an equality? Yes! By Propo-
sition 2.7, if G is undirected, then there exists a rotor walk (that is, a particular m and
) for which
If G = Zd (or any undirected graph satisfying (1) along with its upper bound coun-
terpart, #B(o, n) K n d for a constant K ), then the range of this particular rotor walk
satisfies RW (n) = B(o, n) and hence
for a constant C. So in this case, the exponent in Theorem 1.1 is best possible. We
derived this upper bound just for a particular rotor walk by choosing a rotor mecha-
nism m and initial rotors . For example, when G = Z2 the rotor mechanism is clock-
wise and the initial rotors are shown in Figure 4. Next, we are going to see that by
varying we can make #Rt a lot larger.
634
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Figure 4. Minimal range rotor configuration for Z2 . The excursion sets are diamonds.
Part (i) of the next theorem gives a sufficient condition for rotor walk to be tran-
sient. Parts (i) and (ii) together prove Theorem 1.2. Part (iii) shows that on a graph
of bounded degree, the number of visited sites #Rt of a transient rotor walk grows
linearly in t.
for all t 1.
Proof. (i) By Corollary 2.3, if has an infinite path directed toward o, then rotor
walk never completes its first excursion from o.
(ii) If rotor walk does not complete its first excursion, then it visits each vertex x at
most deg(x) times by Lemma 2.1, so it must visit at least t/t distinct vertices.
(iii) If rotor walk is transient, then for some n it does not complete its nth excursion,
so this follows from part (ii) taking C to be the total length of the first n 1 excursions.
Since the bounding diamonds have area 2n 2 (1 + o(1)) while t has order n 3 , it fol-
lows that the size of the range is of order t 2/3 . More precisely, by the first Borel
Cantelli lemma,
2/3
#Rt 3
t 2/3 2
x1
x1
x2
x2
Figure 5. An initial rotor configuration on Z (top) and the corresponding rotor walk.
The proof of Theorem 4.1 is based on the observation that rotor walk on the comb,
viewed at the times when it is on the x-axis, is a rotor walk on Z. If 0 < x1 < x2 < . . .
are the positions of rotors on the positive x-axis that will send the walker left before
right and 0 > x1 > x2 > . . . are the positions on the negative x-axis that will send
the walker right before left, then the x-coordinate of the rotor walk on the comb follows
a zigzag path: right from 0 to x1 , then left to x1 , right to x2 , left to x2 , and so on
(Figure 5).
Likewise, a rotor walk on the comb, viewed at the times when it is on a fixed vertical
line x = k, is also a rotor walk on Z. Let 0 < yk,1 < yk,2 < . . . be the heights of the
rotors on the line x = k above the x-axis that initially send the walker down, and let
0 > yk,1 > yk,2 > . . . be the heights of the rotors on the line x = k below the x-axis
that initially send the walker up.
We only sketch the remainder of the proof; the full details are in [11]. For uniform
initial rotors, the quantities xi and yk,i are sums of independent geometric random
variables of mean 2. We have Exi = 2|i| and Eyk, j = 2| j|. Standard concentration
inequalities ensure that these quantities are close to their expectations so that a rotor
walk on the comb run for n/2 excursions visits each site (x, 0) Dn about (n |x|)/2
times and hence visits each site (x, y) Dn about (n |x| |y|)/2 times. Summing
636
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
over (x, y) Dn shows that the total time to complete these n/2 excursions is about
n . With high probability, every site in the smaller diamond Dncn log n is visited
16 3
3
at least once during these n/2 excursions, whereas no site outside the larger diamond
Dn+cn log n is visited.
Consider now the first glance mirror walk: Starting at the origin o, it travels along a
uniform random outgoing edge (o). On its first visit to each vertex v = Z2 {o}, the
walker behaves like a light ray. If there is a mirror at v, then the walker reflects by a
right angle, and if there is no mirror, then the walker continues straight. At this point,
v is assigned the rotor (v) = (v, w) where w is the vertex of Z2 visited immediately
after v. On all subsequent visits to v, the walker follows the usual rules of rotor walk.
Lemma 5.1. With the mirror assignments described above, uniform rotor walk on the
Manhattan lattice or the F-lattice has the same law as the first glance mirror walk.
Proof. The mirror placements are such that the first glance mirror walk must fol-
low a directed edge of the corresponding lattice. The rotor (v) assigned by the first
glance mirror walk when it first visits v is uniform on the outgoing edges from v; this
remains true even if we condition on the past because all previously assigned rotors
are independent of the status of the edge ev (open or closed), and changing the status
of ev changes (v).
Write e = 1{e is open}. Given the random variables e {0, 1} indexed by the
edges of L, we have described how to set up mirrors and run a rotor walk, using the
mirrors to reveal the initial rotors as needed. The next lemma holds pointwise in .
Lemma 5.2. If there is a cycle of closed edges in L surrounding o, then rotor walk
started at o returns to o at least twice before visiting any vertex outside the cycle.
Proof. Denote by C the set of vertices v such that ev lies on the cycle and by A the
set of vertices enclosed by the cycle. Let w be the first vertex not in A C visited by
the rotor walk. Since the cycle surrounds o, the walker must arrive at w along an edge
(v, w) where v C. Since ev is closed, the walker reflects off the mirror ev the first
time it visits v, so only on the second visit to v does it use the outgoing edge (v, w).
Moreover, the two incoming edges to v are on opposite sides of the mirror. Therefore,
by minimality of w, the walker must use the same incoming edge (u, v) twice before
visiting w. The first edge to be used twice is incident to the origin by Lemma 2.1, so
the walk must return to the origin twice before visiting w.
Now we use a well-known theorem about critical bond percolation: There are
infinitely many disjoint cycles of closed edges surrounding the origin. Together with
Lemma 5.2 this completes the proof that the uniform rotor walk is recurrent both on
the Manhattan lattice and the F-lattice.
638
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
To make a quantitative statement, consider the probability of finding a closed cycle
within a given annulus. The following result is a consequence of the RussoSeymour
Welsh estimate and FKG inequality; see [14, 11.72].
P(there exists a cycle of closed edges surrounding the origin in S3 S ) > p
Let u t (o) be the number of visits to o by the first t steps of uniform rotor walk in
the Manhattan or F-lattice.
Theorem 5.4. For any a > 0 there exists c > 0 such that
Proof. By Lemma 5.2, the event {u t (o) < k} is contained in the event that at most
k/2 of the annuli S3 j S3 j1 for j = 1, . . . , 101 log t contain a cycle of closed edges
surrounding the origin. Taking k = c log t for sufficiently small c, this event has prob-
ability at most t a by Theorem 5.3.
Figure 8. Set of sites visited by uniform rotor walk after 250,000 steps on the F-lattice and the Manhattan
lattice (right). Green represents at least two visits to the vertex and red one visit.
Although we used the same technique to show that the uniform rotor walk on these
two lattices is recurrent, experiments suggest that behavior of the two walks is rather
different. The number of distinct sites visited in t steps appears to be of order t 2/3 on
the Manhattan lattice but of order t for F-lattice. This difference is clearly visible in
Figure 8.
Theorem 6.1. For rotor walk on a finite Eulerian graph G of diameter D, with any
rotor mechanism m and any initial rotor configuration ,
tvertex D#E
and
tedge (D + 1)#E.
Proof. Consider the time T (n) for rotor walk to complete n excursions from o. If G
has diameter D, then A D = V by Corollary 2.5 and e D+1 deg by Lemma 2.4(ii). It
follows that tvertex T (D) and tedge T (D + 1). By Lemma 2.1, each directed edge
is used at most once per excursion, so T (n) n#E for all n 0.
Bampas et al. [5] prove a corresponding lower bound: On any finite undirected
graph, there exist a rotor mechanism m and initial rotor configuration such that
tvertex 14 D#E.
Hitting times for random walk. The upper bounds for tvertex and tedge in Theorem 6.1
match (up to a constant factor) those found by Friedrich and Sauerwald [13] on an
impressive variety of graphs: regular trees, stars, tori, hypercubes, complete graphs,
lollipops, and expanders. Intriguingly, the method of [13] is different. Using a theorem
of Holroyd and Propp [16] relating rotor walk to the expected time H (u, v) for random
walk started at u to hit v, they infer that tvertex K + 1 and tedge 3K , where
1
K := max H (u, v) + #E + |H (i, v) H ( j, v) 1| .
u,vV 2 (i, j)E
A curious consequence of the upper bound tvertex K + 1 of [13] and the lower bound
maxm, tvertex (m, ) 14 D#E of [5] is the following inequality.
1
K D#E 1.
4
Is K always within a constant factor of D#E? It turns out the answer is no. To
construct a counterexample, we will build a graph G = G ,N of small diameter that has
so few long-range edges that random walk effectively does not feel them (Figure 9).
Let , N 2 be integers, and set V = {1, . . . , } {1, . . . , N } with edges (x, y)
(x , y ) if either x x 1 (mod ) or y = y. The diameter of G is 2: Any two
vertices (x, y) and (x , y ) are linked by the path (x, y) (x + 1, y ) (x , y ). Each
vertex (x, y) has 2N short-range edges to (x 1, y ) and 3 long-range edges to
(x , y). It turns out that if is sufficiently large and N is much larger still (N = 5 ),
then K > 10 1
#E, showing that K can exceed D#E by an arbitrarily large factor. The
details can be found in [11].
640
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
2 2
1 1
1 1
2 2
Figure 9. The thick cycle G ,N with = 4 and N = 2. Long-range edges are dotted, and short-range edges
are solid.
7. ACKNOWLEDGEMENTS. This work was initiated while the first two authors
were visiting Microsoft Research in Redmond, WA. We thank Sam Watson for help
with some of the simulations, Tobias Friedrich for bringing to our attention references
[5] and [30], and the referees for their careful reading. This research has been partially
supported by NSF grant DMS-1243606 and a Sloan Fellowship.
REFERENCES
1. N. Alon, J. Spencer, The Probabilistic Method. Third ed. John Wiley & Sons, Hoboken, NJ 2008.
2. S. Angelopoulos, B. Doerr, A. Huber, K. Panagiotou, Tight bounds for quasirandom rumor spreading,
Electron. J. Comb. 16 (2009).
3. O. Angel, A. E. Holroyd, Rotor walks on general trees, SIAM J. Discrete Math. 25 (2011) 423446,
arxiv.org/1009.4802.
4. O. Angl, A. E. Holroyd, Recurrent rotorrouter configurations, J. Comb. 3 no. 2 (2012) 185194,
arxiv.org/1101.2484.
5. E. Bampas, L. Gasieniec, N. Hanuss, D. Ilcinkas, R. Klasing, A. Kosowski, Euler tour lock-in problem
in the rotor-router model, in Distributed Computing. Springer, Berline, Heidelberg 2009. 423435.
6. S. N. Bhatt, S. Even, D. S. Greenberg, R. Tayar, Traversing directed Eulerian mazes, J. Graph. Algorithms
Appl. 6 no. 2 (2002) 157173.
7. J. N. Cooper, J. Spencer, Simulating a random walk with constant error, Combin., Probab. Comput. 15
no. 6 (2006) 815822, arxiv.org/0402323.
8. P. Diaconis, L. Smith, Honest Bernoulli excursions, J. Appl. Prob. 25 (1988) 464477.
9. B. Doerr, T Friedrich, T. Sauerwald, Quasirandom rumor spreading, in Proc. 19th SODA. SIAM, San
Francisco, CA 2008.
10. L. Florescu, S. Ganguly, L. Levine, Y. Peres, Escape rates for rotor walks in Zd , SIAM J. Discrete Math.
28 no. 1 (2014) 323334, arxiv.org/1301.3521.
11. L. Florescu, L. Levine, Y. Peres, The Range of a Rotor Walk, arxiv.org/1408.5533.
12. T. Friedrich, L. Levine, Fast simulation of large-scale growth models, ArXiv e-prints (2010).
13. T. Friedrich, T. Sauerwald, The cover time of deterministic random walks, Electron. J. Combin. 17 (2010).
14. G. Grimmett, Percolation. Second ed. Springer, Berlin, 1999.
15. A. E. Holroyd, L. Levine, K. Meszaros, Y. Peres, J. Propp, D. B. Wilson, Chip-firing and rotor-routing
on directed graphs, in In and Out of Equilibrium 2. Vol. 60, Progr. Probab., Birkhauser, Basel, 2008,
dx.doi.org/10.1007/978-3-7643-8786-0 17.
16. A. E. Holroyd, J. G. Propp, Rotor walks and Markov chains, contemporary mathematics, Algorithmic
Probab. Combin. 520 (2010) 105126, arxiv.org/0904.4507.
17. W. Huss, E. Sava, Rotor-router aggregation on the comb, Elect. J. Comb. 18 (2011) 224,
arxiv.org/1103.4797.
18. W. Kager, L. Levine, Rotor-router aggregation on the layered square lattice Electron. J. Combin. 17 no. 1
(2010) P 152, 16 pp. arxiv.org/abs/1003.4017.
19. G. F. Lawler, Intersections of Random Walks. Birkhauser, Boston, 1996.
LIONEL LEVINE is an assistant professor at Cornell University. His Ph.D. is from Berkeley, where his
advisor was Yuval Peres. You can usually find him thinking about why things are the way they are or else
about why they arent the way they arent.
Cornell University, Ithaca, NY 14853
[email protected]
YUVAL PERES is a principal researcher at Microsoft Research in Redmond, WA. He has written more than
250 research papers in probability theory, ergodic theory, and analysis and theoretical computer science, in
particular on fractals, random walks, Brownian motion, percolation, online learning, and Markov chain mixing
times. He obtained his Ph.D. at the Hebrew University of Jerusalem in 1990 and was later a faculty member
there and at the University of California, Berkeley. Yuval was awarded the Rollo Davidson Prize in 1995, the
Love Prize in 2001, and was a corecipient of the David P. Robbins Prize in 2011. He was an invited speaker
at the International Congress of Mathematicians in 2002. He is most proud of the 21 Ph.D. students he has
mentored and hopes to keep learning from them.
Microsoft Research, Redmond, WA 98052
[email protected]
642
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
A Tale of Three Theorems
Lawrence Zalcman
Abstract. This article presents a birds eye view of the celebrated theorems of Picard and
their close relatives, with particular emphasis on some surprising developments of the past
half century.
1. A VERY GOOD YEAR. The tale I have to tell begins on May 19, 1879, when the
23-year-old Emile Picard astounded the mathematical world with his announcement
[23] that a nonconstant function holomorphic on the entire complex plane C (an entire
function) must take on every complex value with at most one exception. Previously, the
most that had been known was that the range of a nonconstant entire function is dense
in C. That Picards theorem is sharp is shown, for instance, by the function f (z) = e z ,
which never assumes the value 0.
Our interest in this paper focuses on meromorphic functions on the plane, i.e.,
functions holomorphic on C except for isolated poles (which may accumulate at ).
Picards theorem extends to such functions in a straightforward fashion. Indeed, if f is
meromorphic on C and fails to take on the value c C, then g = 1/( f c) is entire
and therefore omits and at most one complex value. It follows that f = 1/g + c
omits at most two values in the extended complex plane C = C {}. We may sum-
marize this as follows.
Remarkably, Picards theorem is very far from the last word on the values
assumed (or omitted) by entire or meromorphic functions. Just five months after
his first announcement, on October 20, 1879, Picard announced [24] that an entire
function that is not a polynomial takes on every finite value with at most one
exception infinitely often. And on November 5, 1879, he announced the general
form of what has come to be known as the big Picard theorem, or Picards great
theorem, concerning the behavior of a meromorphic function in the neighborhood of
an (isolated) essential singularity [25]. For functions defined on the whole complex
plane, this takes the following form.
Augmented text of a lecture entitled Picard Theorems 18792013, delivered on March 10, 2015,
at the Kirwan Mathematics Festival, held at the University of Maryland at College Park in celebration of
W.E. Kirwans 50 years as a mathematician.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.643
MSC: Primary 30D35, Secondary 30D45
Note that Picards little theorem (as the result simply called Picards theorem
above has become affectionately known) is an instant consequence of the great
theorem. This holds a fortiori in case f is transcendental; while if f is a noncon-
= C,
stant rational function, f (C) so f can omit at most one complex value on C.
Less than half a year separates the formulation and proof of Picards little theorem
from that of the great theorem, but it would take almost a third of a century before the
third theorem of our title would be proved. Indeed, the very formulation of that result
requires new definitions and notational conventions. It is to these that we turn in our
next section.
= C {} can be
2. SPHERES OF INFLUENCE. The extended complex plane C
identified with the euclidean sphere
This identification induces a natural topology on C, which agrees with the usual topol-
ogy of C on bounded sets, while a sequence {z n } C converges to precisely when
is given by
the sequence of preimages { 1 (z n )} tends to (0, 0, 1). This topology on C
the so-called chordal metric, defined by
|z w|
(z, w) = z, w C
1 + |z|2 1 + |w|2
(1)
1
(z, ) = ,
1 + |z|2
where, as usual, |z| = |x + i y| = x 2 + y 2 . Clearly, (z, w) |z w|; moreover,
with the usual convention 1/ = 0, 1/0 = , we have (1/z, 1/w) = (z, w).
For our purposes, a meromorphic function f defined on a plane domain D is most
profitably viewed as a map
f
),
(D, | |) (C, (2)
644
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
where | | denotes the euclidean distance in C induced by the absolute value.
The continuity of such a map is evident: if f is holomorphic at z 0 D, then
( f (z), f (z 0 )) | f (z) f (z 0 )|, which tends to 0 as z z 0 ; while if f has a
pole at z 0 , then 1/ f is holomorphic at z 0 , and the result follows from the previous case
since ( f (z), f (z 0 )) = (1/ f (z), 1/ f (z 0 )).
Associated with the map f in (2) is the so-called spherical derivative, in which
distances in the target space C are measured via while distances in D are measured
in the euclidean metric:
( f (z + h), f (z))
f # (z) = lim
h0 |h|
(3)
| f (z)|
= ( f (z) = ).
1 + | f (z)|2
f # (z) M(K )
That such a basic result as Martys theorem should have been proved so late in
the development of the theory may seem more than a little surprising. This apparent
anomaly stems from the fact that the use of the chordal metric in complex analysis
and, more specifically, the definition of normality given above in terms of the chordal
metric were introduced (by Alexander Ostrowski [22]) only in 1926. Until then,
Montels original definition, as extended to families of meromorphic functions, reigned
supreme. For a snapshot biography of Marty, see [2, p. 219, n. 81]. The fascinat-
ing volume [2] also contains much more of interest, including an extremely warm
appreciation of Montels work, contained in a letter from Lebesgue to Elie Cartan
[2, pp. 240-248].
The fact that in Montels theorem the hypothesis on the functions in the family F is
precisely the condition that in Picards little theorem forces a meromorphic function
on C to be constant suggests that there might be a close relation between these two
results. Could it be that the little Picard theorem actually implies Montels theorem?
Such, indeed, turns out to be the case [36, p. 815]. Thus, the implications dis-
played above are actually equivalences, and Montels theorem takes its place as the
third theorem of our title. With all dramatis personae now on stage, we are ready to
continue our entertainment. But first, a word of elaboration on this latest development.
646
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
That Picards little theorem implies Montels theorem turns out to be a special case
of a much more general phenomenon, which goes by the name of Blochs principle.
According to Blochs principle, if P is a property that forces a function meromorphic
on C to be constant, the family of all functions meromorphic on the plane domain D
that have P is normal on D. Stated so baldly, this assertion is simply false. Consider,
for instance, the property of being bounded. A bounded meromorphic function on C
can have no poles and is therefore a bounded entire function, which, by Liouvilles
theorem, must be constant. On the other hand, each of the functions f n (z) = nz is
bounded on the unit disk ; but, as noted in the previous section, { f n } is not normal
on . However, Blochs principle is true in specific cases frequently enough to make
explication of those conditions under which it is valid an interesting and challenging
endeavor. First steps in this direction were taken in [36], but much more remains to be
done. See [3] for an extensive discussion of these matters.
Blochs principle is named for the French mathematician Andre Bloch, who is per-
haps best remembered for having originated Blochs constant (cf. [27, p. 112]) and
whose visionary insights into the theory of several complex variables retain vitality to
this day. Bloch was a profoundly troubled individual, who spent essentially his entire
adult life (and produced his very considerable mathematical oeuvre) while incarcer-
ated in the famous psychiatric hospital at Charenton. Further details on his life and
work can be found in [30] and [5].
Finally, a comment on the attribution of Blochs principle to Bloch. I have been
unable to find anything remotely resembling the enunciation of this principle in
Blochs published work. The first statement of the principle of which I am aware
occurs in Valirons monograph of 1929 [28, p. 2] but without any mention of Bloch.
Blochs name is mentioned in connection with the principle in a later work of Valiron
[29, p. 4]. Finding a concrete connection between Bloch and his principle (or proving
that none exists) would thus be a worthy project for a doctoral dissertation in the
history of mathematics.
And now, after all these digressions, we return to our tale.
5. A GREAT LEAP FORWARD. We are about to fast forward almost half a century,
from 1912 to 1959. Such a leap inevitably skips over important developments. The
most notable of these, so far as complex analysis is concerned, is undoubtedly Rolf
Nevanlinnas theory of value distribution for meromorphic functions [20], termed by
Hermann Weyl one of the few great mathematical events in our century [33, p. 8].
Although our principal concerns in the remainder of this article are very much in this
direction, the beautiful intricacies of Nevanlinna theory (as it is often called) need not
detain us here. The interested reader is directed to Haymans classical exposition of
the subject [9] (see also [35]) and the comprehensive monograph [7]. In recent years,
considerable attention has focused on an amazing analogy between Nevanlinna theory
and diophantine approximation, first observed by C. F. Osgood and later elaborated by
Paul Vojta and others, for which see [31].
Note that the value is excluded from consideration, as it is evident that f and
f (k) take on the value at exactly the same points, viz., the poles of f.
In the sequel, we focus on the case k = 1, returning to the case of general k at the
very end of the paper. Accordingly, considering f a in place of f, we may restate
Haymans alternative in the following equivalent form.
For entire functions, this had been proved as early as 1923 by Walter Saxer [26];
however, it needs to be emphasized that the passage from holomorphic to meromorphic
functions represents an enormous advance.
Theorem A can be considered a kind of refinement of the great Picard theorem,
in that it provides additional information on (the derivative of) f in the case that f
takes on some finite complex value at most finitely often. In much the same way that
the great Picard theorem implies the little Picard theorem, Theorem A implies the
following result.
Theorem B. [8, p. 20] Let f be a meromorphic function on C such that f (z) = 0 and
f (z) = 1 for all z C. Then f is constant.
648
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Technical developments which go back to [36] have reduced the work of decades
to (literally) just a few lines, leading to renewed interest and activity at an unparalleled
level in the study of normal families. See [37] and [3] for details and many examples.
One insight that has been gained through these developments is the fact that in
results of the sort we have been considering, the assumption that a function f fails to
vanish can often be replaced by the assumption that all zeros of f have suitably high
multiplicity. Such is the case with Theorems A, B, and C, as shown by Yufei Wang
and Mingliang Fang in [32].
Theorem B . (Cf. [32, Lemma 8]) Let f be a meromorphic function on C, all of whose
zeros have multiplicity at least 3. Then if f (z) = 1 for z C, f must be constant.
The classical literature abounds in sharp and explicit results which are inter-
twined in a network of relationships. It is a challenge for modern research to
discover new aspects and new methods which bring about a deeper understand-
ing of these questions . . . the point is that one can and must penetrate much
deeper in the case of two dimensions, and the deep methods are precisely the
ones that do not have easy generalizations.
(z a)2 (a b)2
f (z) = = z + (b 2a) + .
zb zb
Then f has a single zero, at z = a, whose multiplicity is 2, and
(a b)2
f (z) = 1 = 1.
(z b)2
(z )2
f (z) = .
z 2
Then
2
f (z) = z + , so f (z) = 1 for z C.
(z 2)
For any neighborhood V = {z : |z| < } of 0, f takes on both the values 0 and in
V if || < /2. Thus, the family F is clearly not equicontinuous at 0 and hence fails
to be normal on any plane domain containing 0.
Hence, one cannot reduce the multiplicity condition in Theorem C from 3 to 2.
It has been said that The virtue of a logical proof is not that it compels belief, but
that it suggests doubts, and the proof tells us where to concentrate our doubts [11,
p. 282]. It is no less true that the chief virtue of a counterexample is not that it compels
doubt but that it suggests belief, and the counterexample tells us where to concentrate
our belief.
Where, then, do the counterexamples we have just seen tell us to concentrate our
belief? Where, indeed, if not on the possibility of reducing 3 to 2 in Theorem A ?
where f # is the spherical derivative of f given by (3). This function represents the
(normalized) spherical area of the image on C of the disc Dt = {z : |z| < t} under the
mapping f, counted with multiplicity. Then the AhlforsShimizu characteristic of f is
650
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
r
S(t)
T0 (r ) = dt,
0 t
log T0 (r )
= lim sup .
r log r
It is not difficult to show that when f is an entire function; this definition coincides
with that given previously in terms of M(r, f ).
It follows immediately from the definitions above that if f # remains (uniformly)
bounded throughout C, then 2. As we shall see, the finiteness of in this case
plays an important role in the subsequent development.
Then
Clearly, f (0) = 0 and, by (5), f (0) = 1 + aeb = 0, so f has a multiple zero at the
origin. We claim that f has period 2i. Once this has been shown, it follows that f
has multiple zeros at the points 2ik, k Z.
11. QUO VADIMUS? In the absence of any obvious direction to pursue in hopes of
extending Theorem A to functions of infinite order, it makes sense to backtrack a bit
and have a look at the proof of Theorem A .
Suppose, then, that f is a meromorphic function on C, all of whose zeros have
multiplicity at least 3 but that f takes on some nonzero value, say 1, only finitely
often. Then (by Theorem A0 ) f has infinite order, so f # is unbounded on C. Since
f # is bounded on each compact set of C, it follows that there exists a sequence of
points {z n } such that z n and f # (z n ) . Define f n (z) = f (z + z n ); then for
|z| < 1, f n (z) = f (z + z n ) = 1 for n sufficiently large, say n N . Consider the
family of meromorphic functions F = { f n : n N } on the unit disk . Then each
element of F satisfies the hypothesis of Theorem C , and hence, F is normal on .
But f n# (0) = f # (z n ) , which contradicts Martys theorem.
Since Theorem C , on which the proof of Theorem A given immediately above
depends, does not admit an extension to functions all of whose zeros are (merely)
multiple, we find our path blocked by a brick wall.
Walls can be climbed, or they can be broken through. But it is often simplest to
circumvent them. That is the track we take in the next section.
Example. The family {nz} is not normal on {|z| < 1}, but it is quasinormal of
order 1 there, as it is normal on {0 < |z| < 1}.
652
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
It is easy to see that the assumption concerning the zeros of the functions in F
cannot be dropped. Indeed, consider the family of holomorphic functions F = { f n },
where f n (z) = z n 3n , on the annulus D = {z : 2 < |z| < 4}. Then f n (z) = nz n1
= 1 for z D. However, no subsequence of { f n } can converge uniformly on a neigh-
borhood of any point of E = {z : |z| = 3}. Thus, F is not quasinormal on any domain
intersecting E.
13. BRINGING IN THE SHEAVES. While the proof of Theorem C is not easy, it
steers clear of the technicalities of classical value distribution theory. Thus, it is quite
remarkable that it implies the following positive response to the query of Section 8
[21, Theorem 2].
| f (z n )| | f (z n )|
f n# (1) = = f n# (z n ) ,
1 + | f (z n )/z n |2 1 + | f (z n )|2
| f (z n z)|
sup f n# (z) = sup
|z| |z| 1 + | f (z n z)/z n |2
| f (z n z)|
sup
|z| 1 + | f (z n z)|2
= sup f # (w) ,
|w||z n |
14. ONE FOR THE BOOK. An instant consequence of Theorem A is the follow-
ing answer to a celebrated problem of Hayman (see [10, 1.19]), whose solution (see
[37, p. 226]) stretches over three and a half decades.
Proof. ( f n+1 ) = (n + 1) f f n .
15. EPILOGUE. Our tale is told, but the story is not over. We have already observed
that Theorem A remains true if f is replaced by f (k) for any natural number k. It is
natural to ask whether this remains true for Theorem A :
The answer to this question turns out to be surprisingly elusive. Thus (as already
noted), when all zeros of f have multiplicity at least 3, f (k) does indeed take on each
nonzero complex value infinitely often. However, the proof of this in [32] fails com-
pletely (even for k = 1) when the multiplicity of the zeros drops to 2. Moreover, the
analogue of Theorem C for k = 2 definitely fails: A family F of functions meromor-
phic on a plane domain D, all of whose zeros are multiple and such that f = 1 on D
for each f F need not be quasinormal of any finite order on D.
On the other hand, one has a positive result for functions of finite order.
Proof. Suppose that f (k) = 1 only finitely often. If f has only finitely many poles,
then
Alas, Langleys result most definitely does not hold for meromorphic functions of
infinite order [13, pp. 107-108]. Thus, the evidence seems equivocal, at best.
Nevertheless, . . . the ayes have it.
654
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Theorem. Let f is a transcendental meromorphic function on C, all but finitely many
of whose zeros are multiple. Then f (k) takes on each nonzero complex values infinitely
often for k = 1, 2, 3, . . . .
This has now been shown by Mingliang Fang and Yuefei Wang [6, Theorem 10],
who derive the result in fairly straightforward fashion from an inequality proved in
the breakthrough paper [34] of Katsutoshi Yamanoi, in which he proves the celebrated
Goldberg conjecture.
But that is another story.
REFERENCES
LAWRENCE ZALCMAN attended Dartmouth College, where he learned complex function theory from
A. S. Besicovitch and functional analysis from Misha Cotlar. He moved on to MIT, where he continued his
studies in these areas under Henry McKean and Kenneth Hoffman. After 17 years teaching at Stanford and
the University of Maryland, he relocated to Israel in 1985 as Lady Davis Professor of Mathematics at Bar-Ilan
University. Recently emerited, he is now in his 29th year as editor of Journal dAnalyse Mathematique.
Department of Mathematics, Bar-Ilan University, Ramat Gan 5290002, Israel
[email protected]
656
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
An Electric Network for Nonreversible
Markov Chains
Marton Balazs and Aron Folly
Abstract. We give an analogy between nonreversible Markov chains and electric networks
much in the flavor of the classical reversible results originating from Kakutani and later
Kemeny-Snell-Knapp and Kelly. Nonreversibility is made possible by a voltage multiplier
a new electronic component. We prove that absorption probabilities, escape probabilities,
expected number of jumps over edges, and commute times can be computed from electri-
cal properties of the network as in the classical case. The central quantity is still the effective
resistance, which we do have in our networks despite the fact that individual parts cannot be
replaced by a simple resistor. We rewrite a recent nonreversible result of Gaudilli`ere-Landim
about the Dirichlet and Thomson principles into the electrical language. We also give a few
tools that can help in reducing and solving the network. The subtlety of our network is, how-
ever, that the classical Rayleigh monotonicity is lost.
yx
ux Rx y /2 Rx y /2 uy
ix y
658
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
the current that flows into it on one end agrees with the current that comes out on
the other end;
the potential, measured with respect to ground, from its left end (closer to x) to its
right end (closer to y) gets multiplied by the positive real parameter yx .
As the definition naturally suggests, the parameter is log-antisymmetric: We always
assume yx = 1/x y . We will also allow the graph to have loops (edges connecting a
vertex to itself) from some vertex x to x, in which case we require x x = 1.
According to the above, we now follow the potential (with respect to ground) from
left to right in the above unit. First, according to Ohms law, a drop of i x y Rx y /2 in
the potential occurs on the first resistor. Then this dropped potential gets multiplied by
yx . Finally, a second drop by i x y Rx y /2 occurs on the second resistor. Therefore,
Rx y Rx y
u y = ux ix y yx i x y , or
2 2
(1)
2C x y
ix y = ( yx u x u y )
1 + yx
with the introduction of the (Ohmic) conductance C x y = C yx = 1/Rx y . Notice that the
case yx = 1 reduces our unit to the classical single resistor of value Rx y . Notice also
that currents are automatically zero along loops: i x x = 0 whenever x is a vertex with a
loop.
We write z x for neighboring vertices z and x in the graph. This includes x x
for vertices x with a loop. For later use, we introduce
1 2x y
x y : = x y = , Dx y : = C x y = D yx , Dx : = Dx z zx . (2)
yx 1 + x y zx
The symmetry of the matrix D follows from that of C and log-antisymmetry of and
. With these quantities, we rewrite the above as
i x y = Dx y ( yx u x x y u y ). (3)
We emphasise that the voltage amplifier is not a natural object. Sophisticated engi-
neering would be required to build a black box with these characteristics, and this
black box would require an outer energy source (or energy absorber) for its operation.
We do not consider this energy source (or absorber) as part of our network.
Two alternative parts. Two alternative units will facilitate calculations in our net-
works. Using these is not required for any of the later arguments but simplifies matters.
Consider
yx
ux Rx y /2 Rx y /2 uy
ix y
pr
yx se
yx
pr
ux R yx uy ux R se
yx uy
ix y ix y
u y = (u x i x y R pr
yx ) yx
pr
and u y = u x se
yx i x y R yx .
se
Comparing this with (1), we conclude that these three units behave in a completely
identical way under the choices
yx + 1 yx + 1
yx = yx = yx ,
pr se
yx = R x y
R pr , yx = R x y
R se , (4)
2 yx 2
which we will assume whenever we write the pr or se indexed quantities. Notice that
the primer and secunder resistances are not symmetric quantities anymore.
Existence and uniqueness of solutions. An electric network for our purposes con-
sists of our units placed along the edges of a finite, connected graph G = (V, E).
We allow G to have loops as well. Suppose that a subset W of the vertices is taken to
fixed potentials, Ux , x W . The only requirement we make is that W is nonempty.
We show below that there exists a unique solution of the network with these boundary
values that is a unique set of currents i with
ix y = 0 x
/ W, (5)
yx
u x = Ux x W
Proposition 1. Given the graph G and the boundary set W , fix the boundary condition
(Ux )xW , and suppose that we have two solutions u , u and i , i with this boundary
condition. Then u u and i i.
Proof. As (1) is linear, the difference of two solutions is yet another solution. There-
fore, u u and i i is another solution with boundary condition 0 for all x W .
Define now the set V W of vertices where u u is positive. If this set is
nonempty, then any edge that connects it with the rest of V sees an outflowing current
by (1). But this contradicts (5) (summed up for x ). A similar argument shows
that there are no vertices of negative potential either, thus u u 0. Then by (1) it
follows that i i 0 as well.
660
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Lemma 1. Given the graph G and the boundary set W , suppose we have a solution
for all boundary conditions (Ux )xW . Then for any x W , i x is an affine increasing
function of Ux when keeping all other boundary voltages U y , x = y W constant.
Proof. Fix x W ; consider Ux > Ux , U y = U y for all x = y W and the cor-
responding solutions u , u and i , i. Then u u and i i is another solution with
boundary condition Ux Ux > 0 for x and 0 for all x = y W . Again, looking at the
current out of the set of vertices with positive potential, it is clear that the incoming
current i x i x is strictly positive in this setting. Now, multiplying all of u u, i i
by any factor is yet another solution. It follows that i x i x is a positive constant
multiple of Ux Ux , and the proof is complete.
Proof. We will call the set V W the free vertices and perform an induction on its
size n = |V W |. When n = 0, all potentials are fixed, and the currents are simply
computed by (1). Suppose now that the statement is true for n, and consider a set W
with |V W | = n + 1. Pick any vertex x V W . Fixing all boundary values U y
for y W and also the value Ux , we only have n free vertices, and we know by the
induction hypothesis that we have a solution. We also know by the above lemma that
the incoming current i x is an affine function of Ux . Therefore, there exists a particular
value Ux0 with the corresponding incoming current i x0 = 0, and a solution u 0 , i 0 that
goes with the boundary condition Ux0 for x and U y for y W . This will be a solution
with boundary condition U y for y W only, and the induction step is complete.
ua U A a A and ub UB b B.
All other vertices are free: they just connect to neighboring ones via our units. The
starting point is as follows.
Dx y x y
ux = uy . (7)
yx
Dx
where the second line shows an equivalent rewriting of the original setting. However,
with this secunder representation, our formula follows easily after realizing that
the potentials on the x-side of the amplifiers are x y u y for the respective vertices
y;
from here, the potential u x is computed using the well-known formula for a voltage
divider (of conductances C xsey = 1/Rxsey ).
Putting all that in formulas, we have
2 u y Dx y x y u x Dx x
C xsey C x y x yx+1
y
yx
ux = x y u y se = uy =
yx Cx z yx C x z 2+1 Dx Dx x
zx zx xz
y=x z=x y=x
z=x
with the use of (4), (2) and x x = 1. Rearranging the equation finishes the proof.
Notice first that these choices are consistent with the respective symmetry and log-
antisymmetry of D and . It is also clear that the conductances C and the amplifying
factors can also be expressed with the help of the above quantities. Following the
definition (2), we have
Dx = Dx z zx = z Pzx = x . (9)
zx zx
Recall the nonintersecting subsets A and B of the vertex set V , and define, for x V ,
the first reaching times
662
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
and similarly B0 of these sets by the Markov chain started from x. When x A (B), we
define A0 ( B0 , respectively) to be 0. Px will stand for the probabilities associated with
the chain started from x. For short, we will set h x : = Px { A0 < B0 }.
Theorem 1. Set up an electric network with the choices (8) as follows. Apply constant
potentials U A 1 on vertices of the set A and U B 0 on vertices of the set B. More-
over, make no external connections to vertices of V A B. Then for every x V
we have u x = h x .
A nice consequence of the analogy is what happens to our electric network when
we reverse our Markov chain. The reversed Markov chain has the same stationary
distribution x as the original one, and its transition probabilities become
y
Px y = Pyx .
x
We will simply call the network that corresponds the reversed chain the reversed net-
work, and its parameters will be marked by hats. They are
D x y = x Px y y Pyx = y Pyx x Px y = Dx y ;
(11)
x Px y y Pyx 1
x y = = = yx = .
y Pyx x Px y x y
This also implies Ĉ_{xy} = C_{xy} and λ̂_{xy} = λ_{yx} = 1/λ_{xy} for the reversed network; in other
words, reversing the Markov chain simply reverses the direction of our voltage ampli-
fiers while keeping the resistance values intact. A Markov chain is reversible if and
only if the corresponding network has all its amplifiers with λ_{xy} ≡ 1. Indeed, an ampli-
fier of parameter 1 is just a plain wire. Therefore, this case reduces to the classical
reversible setting with ordinary resistors on the edges.
In this case,
π_x = Σ_{z∼x} D_{xz} λ_{zx}   and   P_{xy} = D_{xy} λ_{xy} / Σ_{z∼x} D_{xz} λ_{zx} .   (14)
Proof. If (8) holds for a Markov transition probability P, then the above formulas
follow from direct verification. Conversely, if (12) and (13) hold, then we make the
definition
P_{xy} = D_{xy} λ_{xy} / Σ_{z∼x} D_{xz} λ_{zx} ,
Notice, however, that the vector (Σ_{z∼x} D_{xz} λ_{zx})_{x∈V} satisfies the same properties:
Σ_{x∼y} (Σ_{z∼x} D_{xz} λ_{zx}) P_{xy} = Σ_{x∼y} (Σ_{z∼x} D_{xz} λ_{zx}) · D_{xy} λ_{xy} / (Σ_{w∼x} D_{xw} λ_{wx}) = Σ_{x∼y} D_{xy} λ_{xy} = Σ_{x∼y} D_{yx} λ_{xy}
Remark. The normalization (13) is just an artificial choice and is not essential at all.
Given an electric network, multiplying every resistor value by the same constant K
while keeping the amplifiers unchanged will result in the same voltages everywhere
with currents multiplied by 1/K . In particular, Theorem 1 holds true in this case.
Remark. The condition (12) is, on the other hand, essential, and we will refer to
networks with this property as Markovian. On a technical level, it states that we can
extend the definition (9) of D_x by
D_x = Σ_{z∼x} D_{xz} λ_{zx} = Σ_{z∼x} D_{xz} λ_{xz} .
The Markovian property also has a rather intuitive meaning: Considering (7), it states
that the constant potential u_x ≡ U for all vertices is a valid solution of the (free)
network.
This was, of course, trivially true for the all-resistors networks that correspond to
reversible Markov chains. Consider a set of connected resistors, and apply potential
U on one of the vertices. Then all vertices will stay at potential U , with no current
flowing anywhere in the network. This is not at all straightforward with our generalized
networks of resistors and amplifiers. Applying potential U on one of the vertices, the
amplifiers will change voltages for different parts of the network, and this can keep up
currents in the cycles of the graph G. The Markovian property is that, nevertheless,
each vertex will still stay at the same potential U even if circular (that is, divergence
free) currents flow in the system.
A classical result for Markov chains follows easily from the analogy.
Corollary 1. A Markov chain is reversible if and only if for every closed cycle
x0 , x1 , x2 , . . . , xn = x0 in the graph G we have
Proof. Rewriting the above formula and using (14) together with the symmetry of
D, we arrive at the equivalent statement
This is of course trivially true in the reversible case where all of the amplifiers have
x y = 1. For the other direction, assume now the above formula holds, and turn it into
electrical language. It says that the total multiplication factor of the potentials is one
along any closed cycle of the circuit. It follows that fixing one vertex at potential U ,
zero currents everywhere in the network is a solution. By uniqueness, this is the only
Notice that, by conservation of currents, this agrees with the sum of currents of edges
across the boundary of A, and also i A + i B = 0. The existence of the effective resis-
tance between sets A and B means that the network between these sets can be replaced
by a single resistor. This is not true for arbitrary configurations since the amplifiers in
general push the characteristics away from that of a single resistor. It is, however, true
for networks that match a Markov chain; this is formulated in the next theorem.
U_A − U_B = R^{eff}_{AB} · i_A   (U_A, U_B ∈ ℝ).
Proof. The proof will again proceed along the lines of linearity. When U A = U B ,
then we just have the Markovian solution with zero incoming currents, thus i A = 0 and
everything is trivial. Suppose that we are given arbitrary reals U A = U B , U A = U B .
We consider two solutions of our network: the one u, i that satisfies the given bound-
ary conditions u x U A for x A and u x U B for x B, and one that comes from
the Markovian property: u M x U B , i x 0 (incoming currents to vertex x, not to
M
be mixed with currents i x y of edges!) for all x V . We think of this latter one as a
M
U A U B U A U B
(u u M ) , (i i M )
UA UB UA UB
and therefore has boundary conditions
U A U B
(U A U B ) = U A U B on x A, and
UA UB
U A U B
0 = 0 on x B,
UA UB
U A U B
with incoming current i A U A U B
on the set A.
Finally, add the Markovian solution with constant potential u M U B everywhere,
and i M with zero incoming currents in all vertices. This results in
U A U B U A U B
(u u M ) + U B , (i i M ) + iM
UA UB UA UB
U A U B + U B = U A on x A, and
0+ U B = U B on x B,
U A U B U A U B
with incoming current i A U A U B
+ 0 = iA U A U B
on the set A.
We have thus produced the solution for the boundary conditions U A and U B and con-
cluded that the corresponding incoming current to the set A is
U A U B
i A = i A .
UA UB
This is equivalent to the statement of the theorem, i.e., the ratio (U A U B )/i A is a
constant for all boundary potentials U A and U B .
Rewriting this via (14) into electrical terms and then applying (12), we get
(L f)_x = Σ_{y∼x} (D_{xy} λ_{xy} / D_x) (f_x − f_y) = (1/D_x) Σ_{y∼x} D_{xy} (λ_{yx} f_x − λ_{xy} f_y).
In other words, with the edge currents
i^f_{xy} := D_{xy} (λ_{yx} f_x − λ_{xy} f_y)   (17)
we have
(L f)_x = i^f_x / D_x ,
is referred to as the energy associated to the pair P, , and we now see that it is the
total electric power we need to pump in the system in order to maintain potential f x
at each vertex x. (As usual, we do not count the external energy sources (absorbers)
required by the amplifiers to work.)
With this preparation, we now prove the following.
Proposition 4. Reversing a Markovian network does not affect the effective resistance.
In other words,
R̂^{eff}_{AB} = R^{eff}_{AB} .
Proof. We repeat Slowik's arguments [12] in the electrical language. Take two func-
tions f and g on V, and apply (12) in the first term and symmetry of the double sum-
mation and of D in the second term below:
Σ_x f_x i^g_x = Σ_{x∈V} f_x Σ_{y∼x} D_{xy} (λ_{yx} g_x − λ_{xy} g_y)
  = Σ_{x∈V} f_x g_x Σ_{y∼x} D_{xy} λ_{yx} − Σ_{y∼x∈V} f_x D_{xy} λ_{xy} g_y
  = Σ_{x∈V} f_x g_x Σ_{y∼x} D_{xy} λ_{xy} − Σ_{y∼x∈V} g_x D_{xy} λ_{yx} f_y   (18)
  = Σ_{x∈V} g_x Σ_{y∼x} D_{xy} (λ_{xy} f_x − λ_{yx} f_y) = Σ_x g_x î^f_x .
(This equation is the electrical way of saying that the adjoint of the generator is the
one of the reversed process.) As before, fix the boundary conditions u_x ≡ 1 ≡ û_x on
x ∈ A and u_x ≡ 0 ≡ û_x on x ∈ B for two scenarios: u, i of the original network and
û, î of the reversed one. This latter has all its amplifiers reversed, and it corresponds to
the reversed Markov chain. We claim that in our situation i^u_x ≡ 0 ≡ î^û_x on x ∉ A ∪ B
since these are free vertices. This, together with the common boundary condition for
the two networks, implies
E(u) = Σ_{x∈V} u_x i^u_x = Σ_{x∈A} u_x i^u_x = Σ_{x∈A} û_x i^u_x = Σ_{x∈V} û_x i^u_x
  = Σ_{x∈V} u_x î^û_x = Σ_{x∈A} u_x î^û_x = Σ_{x∈A} û_x î^û_x = Σ_{x∈V} û_x î^û_x = Ê(û).
Rewriting the power we apply to maintain our boundary conditions gives
C^{eff}_{AB} = (U_A − U_B)² C^{eff}_{AB} = E(u) = Ê(û) = (U_A − U_B)² Ĉ^{eff}_{AB} = Ĉ^{eff}_{AB} .
Next, we introduce what is called the capacity in the theory of Markov chains and
show that it has close connections to the effective resistance. We again assume that
A, B V are nonempty, disjoint, and follow Gaudilli`ere-Landim and Slowik [7, 12]
by defining
Proposition 5. The above capacity is simply the effective conductance C^{eff}_{AB} = 1/R^{eff}_{AB}
between the sets A and B.
= i A = U A C eff
AB = C AB .
eff
Along the way, we also used (12), the fact that u x U A = 1 for all x A, and
finally (3).
It follows immediately that the capacity is a symmetric quantity in its two argu-
ments A and B. The identity cap(A, B) = cap(B, A) also follows from the previous
proposition.
cap(A, B) = (1/2) Σ_{x,y∈V} π_x P^s_{xy} (h_x − h_y)² ,
with the symmetrised transitions P^s_{xy} = ½(P_{xy} + P̂_{xy}). Equation (14) together with (11)
and (2) gives
P^s_{xy} = (D_{xy} λ_{xy} + D_{xy} λ_{yx}) / (2 D_x) = C_{xy} / D_x ,   (19)
the ohmic power loss on the resistors, should we apply the actual voltages u on them
without the amplifiers. It is important to note that this interpretation is nonphysical.
With the amplifiers the ohmic losses are not given by the above formula; without the
amplifiers, the voltages u would be totally different.
The capacity, being C^{eff}_{AB}, is, however, equal to the total power (U_A − U_B)² C^{eff}_{AB}
needed to maintain the boundary conditions.
We repeat the computation for (20) in the electrical language. First, notice that, by
(17) and (2),
½ (i^u_{xy} + î^u_{xy}) = D_{xy} · ((λ_{xy} + λ_{yx})/2) · (u_x − u_y) = C_{xy} (u_x − u_y);
fixing the voltages everywhere, the average of the current and the reversed current is
the one of the network without amplifiers. This is the starting point to expand the right-
hand side of (20):
½ Σ_{x,y∈V} C_{xy} (u_x − u_y)²
  = ¼ Σ_{x,y∈V} (u_x − u_y)(i^u_{xy} + î^u_{xy})
  = ¼ Σ_{x,y∈V} u_x i^u_{xy} + ¼ Σ_{x,y∈V} u_x î^u_{xy} − ¼ Σ_{x,y∈V} u_y i^u_{xy} − ¼ Σ_{x,y∈V} u_y î^u_{xy}
  = ¼ Σ_{x,y∈V} u_x i^u_{xy} + ¼ Σ_{x,y∈V} u_x î^u_{xy} + ¼ Σ_{x,y∈V} u_y i^u_{yx} + ¼ Σ_{x,y∈V} u_y î^u_{yx}
  = ¼ Σ_{x∈V} u_x i^u_x + ¼ Σ_{x∈V} u_x î^u_x + ¼ Σ_{y∈V} u_y i^u_y + ¼ Σ_{y∈V} u_y î^u_y = Σ_{x∈V} u_x i^u_x
with the use of the adjoint identity (18). Now, as in the proof of Proposition 4, apply our
usual boundary conditions, and the right-hand side becomes the total power required
to maintain the boundary conditions or, equivalently, C^{eff}_{AB}.
Finally, we define the escape probability from the set A and show its connection to
the effective resistance. This goes exactly as in the reversible case. Suppose that the
Markov chain is started from its stationary distribution π, conditioned on being in the
set A. (When A = {a} is a singleton, this is just the unit mass on the vertex a.) The
escape probability is the chance that the chain reaches set B before its first return to A:
P_π{τ_B < τ_A^+} = Σ_{x∈A} (π_x / π(A)) P_x{τ_B < τ_A^+} = cap(A, B)/π(A) = C^{eff}_{AB} / (Σ_{z∈A} D_z) = C^{eff}_{AB} / (Σ_{z∈A} Σ_{y∼z} C_{zy}) .
The last step used (15). The right-hand side agrees word for word with the classical
reversible result, the starting point of elegant recurrence-transience proofs.
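The chain of identities above lends itself to a quick numerical sanity check. The sketch below is a minimal illustration, not taken from the paper: the transition matrix, the sets A and B, and all names are arbitrary choices. It computes h_x by a linear solve, evaluates the symmetrized Dirichlet form of h, and compares it with the escape-probability expression Σ_{x∈A} π_x P_x{τ_B < τ_A^+}; the two numbers agree.

```python
import numpy as np

# A small irreducible, nonreversible Markov chain on 5 states (illustrative choice).
rng = np.random.default_rng(0)
P = rng.random((5, 5)); np.fill_diagonal(P, 0); P /= P.sum(axis=1, keepdims=True)

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))]); pi /= pi.sum()

A, B = [0], [4]                                   # disjoint sets A and B
free = [x for x in range(5) if x not in A + B]

# h_x = P_x{tau_A < tau_B}: harmonic for P off A∪B, with h = 1 on A and h = 0 on B.
h = np.zeros(5); h[A] = 1.0
M = np.eye(len(free)) - P[np.ix_(free, free)]
rhs = P[np.ix_(free, A)].sum(axis=1)              # contribution of the boundary value 1 on A
h[free] = np.linalg.solve(M, rhs)

# Symmetrized chain P^s = (P + P_hat)/2, where P_hat[x, y] = pi_y P[y, x] / pi_x.
P_hat = (pi[:, None] ** -1) * P.T * pi[None, :]
Ps = 0.5 * (P + P_hat)
cap_dirichlet = 0.5 * sum(pi[x] * Ps[x, y] * (h[x] - h[y]) ** 2
                          for x in range(5) for y in range(5))

# cap(A, B) via escape: sum_{x in A} pi_x sum_y P[x, y] (h[x] - h[y]).
cap_escape = sum(pi[x] * P[x, y] * (h[x] - h[y]) for x in A for y in range(5))

print(cap_dirichlet, cap_escape)                  # agree up to rounding
```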
By symmetrizing a Markov chain we mean replacing its transition probabilities
by P s .
Since the conductances agree for the two networks, the left-hand side is the capacity
of the symmetrized network, while the right-hand side is the one of the original net-
work. Dominance of the capacities then implies that of the escape probabilities as the
conductances are not changed by symmetrizing.
This latter is the expected number of jumps from x to y minus that from y to x before
absorption in B. It remains to fix the boundary term u_a = v_a/π_a. This is done by the
simple observation that the chain has to exit vertex a one more time than it enters it; thus,
i_a = Σ_{y∼a} i_{ay} = 1.
via (15). This is the exact same formula as the one of [2] for the reversible case.
Proof. We start with a first step analysis and write, for any x = b,
Dx y x y
k
Hxb = Px y (k x y + Hyb
k
)= (k x y + Hyb
k
). (21)
yx yx
D x
i x y = Dx y ( yx Hxb
k
x y Hyb
k
),
according to (3), and, via (21), the necessity of pumping external currents
ix = ix y = Dx y yx Hxb
k
Dx y x y Hyb
k
yx yx yx
= Dx Hxb
k
Dx Hxb
k
+ Dx y x y k x y
yx
= Dx y x y k x y
yx
into each vertex x = b. By conservation of current,
ib = Dby by kby D k
yb
with D k : = Dx y x y k x y .
yxV
A second configuration we consider is u x = Hxa
k
on each vertex x. In a similar
fashion, this has external currents
i x = Dx y x y k x y
yx
Our equations being linear, the difference u u is also a solution of the network. It
has potentials and external currents
u a u a = Hab
k
Haak
= Habk
and
i a i a = Day ay kay Day ay kay + D k = D k in vertex a,
ya ya
u b u b = Hbb
k
Hbak
= Hba k
and
i b i b = Dby by kby D k Dby by kby = D k in vertex b,
yb yb
i x i x = Dx y x y k x y Dx y x y k x y = 0 elsewhere.
yx yx
Therefore, this combination only has boundary conditions at a and b, and all other
vertices are free. The effective conductance between a and b is given by
i a i a Dk Dk
eff
Cab =
= k = k ,
ua ua ub + ub
Hab + Hba
k
K ab
A nonmonotone example. We have seen many nice properties of the network. The
next step in the reversible case is making use of Rayleighs monotonicity property.
In the reversible case, the effective resistance is a nondecreasing function of any of
the individual resistances. Here, we show an example demonstrating that this mono-
tonicity fails in the irreversible case; the naive approach does not work. Resistance values
below are in ohms.
[Diagram: the example network between the vertices a and b, built from the variable resistor R and units with the values 5, 3/2, 3/2, 2/2, 2/2, and 13/5.]
We immediately rewrite this network to an equivalent form using the primer and secun-
der alternatives.
[Diagram: the same network rewritten in the equivalent primer/secunder form, with amplifiers of parameters 1/5 and 5/13 and resistances 9/5, 18/5, 5, R, 13/5, 9, and 18/13.]
First notice that a circular current of 4 amperes in the positive direction and no current
through R gives a constant 9 volts free solution; thus, the network is Markovian for all
R values. Therefore, it has an effective resistance, and it is perhaps easiest to compute
if we fix u a = 5 volts and u b = 0. Then we just need to figure out currents in this
diagram.
[Diagram: the reduced circuit, with sources of 1 V and 25 V, resistances 9/5, 18/5, 9, and 18/13, and the resistor R joining the internal vertices x and y.]
One way of proceeding is to write the equations for the voltage dividers in x and y.
These are
u_x = (1 · (5/9) + 0 · (5/18) + u_y · (1/R)) / (5/9 + 5/18 + 1/R) = 10R/(15R + 18) + (6/(5R + 6)) u_y ,
u_y = (25 · (1/9) + 0 · (13/18) + u_x · (1/R)) / (1/9 + 13/18 + 1/R) = 50R/(15R + 18) + (6/(5R + 6)) u_x .
a decreasing function of R. The situation reminds the authors of Braess's paradox [1].
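For concreteness, here is a minimal sketch that solves the two voltage-divider equations displayed above for u_x and u_y at a few values of R, using exact rational arithmetic. The function name and the sample values of R are ad hoc; the sketch reproduces only this intermediate step of the computation, not the full effective-resistance formula.

```python
from fractions import Fraction

def solve_dividers(R):
    """Solve u_x = 10R/(15R+18) + 6 u_y/(5R+6),
             u_y = 50R/(15R+18) + 6 u_x/(5R+6)   for u_x, u_y."""
    R = Fraction(R)
    a = Fraction(6) / (5 * R + 6)          # coupling coefficient 6/(5R+6)
    bx = 10 * R / (15 * R + 18)
    by = 50 * R / (15 * R + 18)
    # Substitute the second equation into the first: u_x = bx + a*(by + a*u_x).
    ux = (bx + a * by) / (1 - a * a)
    uy = by + a * ux
    return ux, uy

for R in (1, 2, 5, 10):
    ux, uy = solve_dividers(R)
    print(R, float(ux), float(uy))
```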
The Dirichlet and Thomson principles. Having the direct monotonicity approach
fail, we now give an insight into the Dirichlet and Thomson energy minimum princi-
ples for the irreversible case. These are the fundamental principles that enable one to
derive Rayleigh's monotonicity law in the reversible case. The irreversible case was
established in this form by Gaudillière–Landim and Slowik [7, 12]. Below, we simply
give a translation of their results, without proof, into the electrical language. In fact, we
stick to the notation of Slowik's Proposition 2.6 as closely as possible.
As before, take two sets A, B ⊂ V with A ∩ B = ∅, and define
H_{A,B} := {u : V → ℝ : u|_A ≡ 1, u|_B ≡ 0}.
D(i) = ½ Σ_{y∼x∈V} (1/(π_x P^s_{xy})) i²_{xy} = ½ Σ_{y∼x∈V} R_{xy} i²_{xy} ,
the ohmic power losses on the resistors; see (19). Finally, recall (17), and reverse the
amplifiers in there to get i . We can now state the following.
Using words, find a potential function u with our boundary conditions (this results
in currents î^u in the reversed network) and a divergence-free current i on the free
vertices with total incoming flow i_A = 0 such that the difference of these two currents
minimizes the ohmic losses on the network. Then these ohmic losses sum up to the
total physical power required to maintain the boundary conditions (this is the effective
conductance C^{eff}_{AB} = cap(A, B) since the boundary voltage difference is U_A − U_B
= 1). We emphasize that the minimizers u and i are nonphysical, except in the
reversible case when u becomes the physical voltage while i ≡ 0.
Next, we define
G⁰_{A,B} := {u : V → ℝ : u|_A ≡ u|_B ≡ 0},
and U¹_{A,B} as the set of currents i with zero external currents i_x for x ∉ (A ∪ B) (com-
pare to (6)), and total incoming current i_A = 1 = −i_B to the set A (and therefore −1 to
the set B); see (16).
Using words, find a potential function u that vanishes in A and B (this results in
currents iu in the reversed network) and a unit flow i such that the difference of these
currents minimizes the ohmic losses on the network. Then these ohmic losses sum up
to the total physical power required to maintain a unit flow (the reciprocal is again
the effective conductance since the total current flow is one). Again, the minimizers
are nonphysical, except for the reversible case when u 0 and i is the physical unit
current flow.
How these principles can be used toward monotonicity is a question left for future
work.
Proposition 8. Two of our units in series of respective parameters (R, ) and (Q, )
can be replaced by a single unit of parameters
( + 1) +1
R +Q , .
+ 1 + 1
[Diagram: the successive equivalent rewritings used in the proof: the two units split into half-resistances R/2, R/2 and Q/2, Q/2; then the alternatives R^{pr} and Q^{se}; then R^{pr} and Q^{pr}; and finally a single unit S^{pr} with half-resistances S/2 and S/2.]
It is obvious that the parameter of the voltage amplifier is . Applying the transcrip-
tion formula for each step, the resistance of the substitute element can be determined:
2 2 2 2
= (R pr + Q )
pr
S = S pr = R pr + Q se
+ 1 + 1 + 1 + 1
( + 1) +1
=R +Q .
+ 1 + 1
Notice the classical parallel formula for the resistance and the weighted average for
the amplifier.
Proof. This case cannot be reduced with transformations into one single unit, but
the alternative elements are still useful. Below are two equivalent circuits.
R/2 R/2 R se
ux uy ux uy
Q/2 Q/2 Q se
The total current from x to y will be the sum of currents like in (1) for the top and the
bottom branches; therefore, it will be in the same form. This proves that a single unit
can be used as a replacement. Its secunder alternative will be as follows.
ux S se uy
It remains to determine the parameters S and . Assume first u x = 1 and zero total
current;that is, leave vertex y free. Then the secunder resistors act as a voltage divider,
giving
Q se + R se
uy = .
Q se + R se
In the simple unit this agrees to the value of the amplifier, therefore
Q se + R se Q( + 1) + R( + 1)
= = .
Q +Rse se Q( + 1) + R( + 1)
Next, when u x = 0, the amplifiers keep the potentials at zero, and the parallel formula
R se Q se
S se =
R se + Q se
2 Q( + 1) + R( + 1) R se Q se RQ
S= S se = 2 se = .
+1 (R + Q)( + 1)( + 1) R + Q se R+Q
Notice that in both the series and parallel formulas the resulting resistances are mono-
tone increasing functions of the original ones. Not all networks can, however, be
reduced using only series or parallel substitutions. The next step of transformations
for classical resistor networks is the star–delta transformation. As we will see shortly,
it is here where nonmonotonicity issues begin. Our nonmonotone example is also one
that cannot be reduced using only series or parallel substitutions.
In our case, star and delta look like
[Diagram: the star configuration, in which the potentials U_x, U_y, U_z are joined to a common center through the units of resistances S, Q, R (each split into two halves), and the delta configuration, in which U_x, U_y, U_z are joined to one another directly through the units of resistances S', Q', R'.]
where it is essential that the centre of the star has no further connections. The question is
whether the parameters can be linked so that these two networks behave identically
under all scenarios. We start by rewriting the above into the equivalent secunder alter-
natives and work with those thereafter. Any formulas can be rewritten into the original
parameters via (4), and we avoid that for the sake of simplicity.
[Diagram: the same star and delta redrawn with the secunder alternatives R^{se}, Q^{se}, S^{se} and R'^{se}, Q'^{se}, S'^{se}.]
Proposition 10. Any star can be transformed into an equivalent delta, the parameters
of which are given by
S'^{se} = (R^{se} S^{se} + Q^{se} S^{se} + Q^{se} R^{se}) / S^{se} ,
Q'^{se} = (R^{se} S^{se} + Q^{se} S^{se} + Q^{se} R^{se}) / Q^{se} ,
R'^{se} = (R^{se} S^{se} + Q^{se} S^{se} + Q^{se} R^{se}) / R^{se} ,
and = , = , = . The notations are the ones used in the above picture.
Proposition 11. A delta can be transformed into an equivalent star if and only if the
product of its three amplifier parameters equals one:
αβγ = 1.   (22)
Even in this case, the resulting star is not unique. With any positive number ω > 0, it
can have parameters
1/3 2/3 R se Q se
S se = ,
2/3 1/3 Q se + 2/3 1/3 R se + 1/3 2/3 S se
1/3 2/3 R se S se
Q se = ,
2/3 1/3 Q se + 2/3 1/3 R se + 1/3 2/3 S se
2/3 1/3 Q se S se
R se = ,
2/3 1/3 Q se + 2/3 1/3 R se + 1/3 2/3 S se
Proof of both stardelta and deltastar. We determine and compare the incoming
currents on vertices x, y and z in the two networks. We start with star. The voltages
at the outer points of the resistances are Ux , U y , and Uz , thus the voltage in the
center point is
Therefore, the respective currents flowing from x, y and z into the center point are
Ux U (S se + R se )Ux S se U y R se Uz
ix = = ,
Q se R se S se + Q se S se + Q se R se
U y U (S se + Q se )U y S se Ux Q se Uz
iy = = ,
R se R se S se + Q se S se + Q se R se
Uz U (R se + Q se )Uz R se Ux Q se U y
iz = = .
S se R se S se + Q se S se + Q se R se
Next, we turn to delta. The currents flowing on the edges are
Ux U y U y Uz U z U x
ix y = , i yz = , i zx = .
S se Q se R se
( R se + S se )Ux R se U y S se Uz
i x = i x y i zx = ,
R se S se
( S se + Q se )U y Q se Ux S se Uz
i y = i yz i x y = ,
Q se S se
( Q se + R se )Uz Q se Ux R se U y
i z = i zx i yz = .
Q se R se
In a star / delta substitution the currents have to be equal for all possible voltages U_x,
U_y, and U_z. Hence, by comparing the coefficients of the voltages in the formulas for the
currents, the connections between the quantities can be determined. It is convenient
to consider first the coefficient of U_y in the formula for i_x, the coefficient of U_z in the
formula for i_y, and the coefficient of U_x in the formula for i_z:
1 S se
= ,
S se
R se S se + Q se S se + Q se R se
1 Q se
= , (23)
Q se R se S se + Q se S se + Q se R se
1 R se
= .
R se R se S se + Q se S se + Q se R se
R se
se = ,
R R S + Q se S se + Q se R se
se se
S se
= ,
S se R se S se + Q se S se + Q se R se
Q se
= .
Q se R se S se + Q se S se + Q se R se
= , = and = .
= =
2/3 1/3
=
1/3 2/3 2/3 1/3
, and .
To invert the resistances (23), first note that
1 1 1 1 1 1 1
se se + se se + se se = .
R S Q S Q R R S + Q S se + Q se R se
se se se
1/3 2/3 R se Q se
= ,
2/3 1/3 Q se + 2/3 1/3 R se + 1/3 2/3 S se
1
Q se
Q se = 1
R se
1
S se
+ 1
Q se
1
S se
+ 1
Q se
1
R se
1/3 2/3 R se S se
= ,
2/3 1/3 Q se + 2/3 1/3 R se + 1/3 2/3 S se
1
R se
R se = 1
R se
1
S se
+ 1
Q se
1
S se
+ 1
Q se
1
R se
2/3 1/3 Q se S se
= .
2/3 1/3 Q se + 2/3 1/3 R se + 1/3 2/3 S se
The condition (22) is that at any constant potential the delta has no circular current
by itself. This is rather restrictive; thus, deltastar transformations cannot be used to
reduce a general network. After the lack of monotonicity, this is the second serious
drawback of our networks compared to the classical resistor-only case.
ACKNOWLEDGMENT. The authors wish to thank Edward Crane, Nic Freeman, Alexandre Gaudillière,
Claudio Landim, Gábor Pete, Martin Slowik, András Telcs, and Bálint Tóth for stimulating discussions on this
project. M. Balázs was partially supported by the Hungarian Scientific Research Fund (OTKA) grants K60708,
F67729, K100473, K109684, and the Bolyai Scholarship of the Hungarian Academy of Sciences.
REFERENCES
MÁRTON BALÁZS University of Bristol. Part of this work was done while the author was affiliated with
the Institute of Mathematics, Budapest University of Technology and Economics, the MTA-BME Stochastics
Research Group, and the Alfréd Rényi Institute of Mathematics.
[email protected]
ÁRON FOLLY This work was done while the author was affiliated with the Department of Mathematics,
Ludwig-Maximilians-Universität München and the Institute of Mathematics, Budapest University of Technol-
ogy and Economics.
[email protected]
A Characteristic Averaging Property of the
Catenary
Vincent Coll and Jeff Dodd
Johann Bernoulli and Leibniz noted three surprising ways in which the catenary
mimics the graph of a constant function [5]. To formulate these, consider the graph
y = f (x) of a smooth, strictly positive function f as depicted in Figure 1.
Figure 1. Two centroidal properties of the catenary (in standard vertical position)
For an interval [a, b] on the x-axis, let C denote the segment of the graph of f lying
over [a, b], and let A denote the shaded planar region lying over [a, b] that is bounded
http://dx.doi.org/10.4169/amer.math.monthly.123.7.683
MSC: Primary 34A00
¹ Latin for transactions of the scholars; the Acta Eruditorum was the first German journal of science and scholarship.
and it follows directly from (2) that the catenary in standard vertical position (1) shares
the following properties with the graph of the constant function f (x) = k.
Proportionality. For every interval [a, b], (area of A) = k · (arclength of C):
∫_a^b f(x) dx = k ∫_a^b √(1 + [f′(x)]²) dx.   (3)
Vertical Bisection. For every interval [a, b], ȳ_A = ½ ȳ_C:
∫_a^b (1/2)[f(x)]² dx / ∫_a^b f(x) dx = (1/2) · ∫_a^b f(x) √(1 + [f′(x)]²) dx / ∫_a^b √(1 + [f′(x)]²) dx.   (5)
Note that while the horizontal collocation property (4) and the vertical bisection prop-
erty (5) look a bit more complicated than the proportionality property (3), they are
more elegant in that they involve no unit-dependent constants, whereas the constant k
in the proportionality property has the dimension of length.
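A quick numerical check of the two displayed properties is easy to set up. The sketch below assumes that the catenary in standard vertical position (1) is y = k cosh(x/k) (that display is not reproduced here, so this is an assumption of the sketch) and verifies the proportionality property (3) and the vertical bisection property (5) on an arbitrarily chosen interval; the parameter value and interval are illustrative.

```python
import numpy as np
from scipy.integrate import quad

k = 1.7                                  # assumed catenary parameter in y = k*cosh(x/k)
f  = lambda x: k * np.cosh(x / k)
fp = lambda x: np.sinh(x / k)
ds = lambda x: np.sqrt(1 + fp(x) ** 2)   # arclength element sqrt(1 + f'(x)^2)

a, b = -0.8, 2.3                         # an arbitrary interval [a, b]

area      = quad(f, a, b)[0]
arclength = quad(ds, a, b)[0]
print(area, k * arclength)               # proportionality (3): the two agree

ybar_A = quad(lambda x: 0.5 * f(x) ** 2, a, b)[0] / area
ybar_C = quad(lambda x: f(x) * ds(x), a, b)[0] / arclength
print(ybar_A, 0.5 * ybar_C)              # vertical bisection (5): the two agree
```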
It is natural to ask to what extent each of these properties characterizes the catenary.
As far as we know, this question has never been addressed for the two centroidal prop-
erties of horizontal collocation and vertical bisection. Here, we show that each of these
two centroidal properties is in fact a characteristic property of the catenary. The proof
of this fact is surprisingly subtle. Moreover, it reveals a broad averaging property of
the catenary that, despite its geometric manifestations, is essentially analytic in nature.
impression that the catenary is the only nontrivial continuously differentiable function
satisfying the proportionality property.
The subtlety here, as noted recently by E. Parker [6], is that it is possible to form
positive, nonconstant, continuously differentiable solutions of (2) by joining a por-
tion of the graph of f (x) = k with the left half and/or right half of a catenary of the
form (1). But of course, these piecewise defined functions are not twice differentiable
everywhere, so the precise answer to our warm up question reads this way.
Parkers characterization of the catenary. Catenaries of the form (1) are the only
positive, nonconstant, twice-differentiable functions satisfying the differential equation
(2) or the proportionality property (3).
We were surprised to discover that it is not such a straightforward matter to charac-
terize the functions that satisfy the horizontal collocation property (4) or the vertical
bisection property (5). (We invite the reader to spend a few minutes trying!) It is eas-
iest to see the source of the difficulty and to distill what we need to overcome it if we
take a step back and notice that each of the four quantities appearing in (4) and (5) can
be written in this form:
∫_a^b g(x) w(x) dx / ∫_a^b w(x) dx   (6)
Proof. It is clear that if w_1 = kw_2 for some constant k > 0, then (7) holds. To
prove the converse implication, let W_i(x) = ∫_c^x w_i(s) ds for i = 1 and 2. Then for any
x ∈ (c, d), writing (7) for the interval [c, x] gives us
x x
g(t)W1 (t) dt g(t)W2 (t) dt
c
= c .
W1 (x) W2 (x)
G 1 (x) G (x)
= 2 = ln |G 1 (x)| = ln |G 2 (x)| + I = G 1 (x) = kG 2 (x) (9)
G 1 (x) G 2 (x)
where I is a constant of integration and, because G_1(x) and G_2(x) have the same sign
on (c, d), k > 0. Finally, differentiating the last equation in (9) and again keeping in
mind that g(x) ≠ 0 for all x ∈ (c, d), it follows for all such x that
g(x)W_1(x) = kg(x)W_2(x)  ⟹  W_1(x) = kW_2(x)  ⟹  w_1(x) = kw_2(x).
Armed with the equal averages principle, we can now formulate characterizations
for the catenary based on the horizontal collocation property and the vertical bisection
property.
Remark. It is tempting to modify the vertical bisection property by requiring that the
graph of a positive, continuously differentiable function y = f(x) satisfy ȳ_A = λ ȳ_C
over all intervals [a, b] for some λ ≠ 1/2. However, there are no such functions. If
∫_a^b (1/2)[f(x)]² dx / ∫_a^b f(x) dx = λ · ∫_a^b f(x) √(1 + [f′(x)]²) dx / ∫_a^b √(1 + [f′(x)]²) dx,   for all intervals [a, b],
In the equal averages principle, letting the weight functions w_1 and w_2 be f and
√(1 + [f′]²) yields the same differential equation for f regardless of the choice of the
function g, which is the differential equation (2) whose only positive, nonconstant,
twice-differentiable solution is the catenary function. So the horizontal collocation and
vertical bisection properties are only special cases of a broad characteristic averaging
property of the catenary or, to say it another way, of a multitude of characteristic aver-
aging properties of the catenary corresponding to different choices for the function g.
For example, letting g(x) = (x x0 )n for any real number x0 and any integer
n 1, we see that catenaries of the form (1) are the only positive, nonconstant, twice-
differentiable functions f such that over every interval [a, b],
∫_a^b (x − x_0)^n f(x) dx / ∫_a^b f(x) dx = ∫_a^b (x − x_0)^n √(1 + [f′(x)]²) dx / ∫_a^b √(1 + [f′(x)]²) dx.   (10)
That is, for each interval [a, b], the nth moment of the region A under the graph of
f on [a, b] and the nth moment of the segment C of the graph of f over [a, b] are
the same with respect to any vertical axis x = x0 . If the graph of the catenary (1) is
assigned a uniform linear mass density and the region below this graph is assigned a
uniform area mass density, then (10) has natural physical interpretations when n = 1
and n = 2.
When n = 1, (10) is the horizontal collocation property and a straightforward phys-
ical interpretation is that when f is a catenary of the form (1), then for any inter-
val [a, b], the x-coordinate of the center of mass of the region A is the same as the
x-coordinate of the center of mass of the segment C. An equivalent, and perhaps more
counterintuitive, physical interpretation is that for any interval [a, b], the x-coordinate
of the center of mass of C A is unaffected by the uniform mass densities assigned
to the graph of the catenary and the region under the graph of the catenary. That is,
suppose we were to build a fence on level ground with the top of the fence following
a catenary of the form (1). Then the horizontal position of the center of mass of any
slice of the fence bounded by two vertical lines would be unchanged by the addition of
a railing of uniform linear mass density running along the top of the fence, no matter
how heavy the railing.
When n = 2, the left- and right-hand sides of (10) represent, respectively, the radius
of gyration x A (x0 ) of the region A about the axis x = x0 and the radius of gyration
x C (x0 ) of the segment C about the axis x = x0 . (The radius of gyration of an object
O about the axis x = x0 is defined as the distance x O (x0 ) such that if the mass of the
object were concentrated into a point mass at a distance x O (x0 ) from the axis x = x0 ,
then this point mass would have the same moment of inertia around the axis x = x0
as the object O itself.) So when f is a catenary of the form (1) then, for every interval
[a, b] and any axis x = x0 , the radius of gyration x C (x0 ) of the segment C of the graph
of f over [a, b] is the same as the radius of gyration x A (x0 ) of the region A under the
graph of f on [a, b].
REFERENCES
VINCENT E. COLL, JR. received a B.S. from Loyola University in New Orleans, an M.S. from Texas A&M
University and a Ph.D. from the University of Pennsylvania in 1990 under the direction of Murray Gersten-
haber. He is a professor of practice at Lehigh University. His research interests include deformation theory and
the study of Frobenius Lie algebras. His outside interests include boxing and ice hockey, but not at the same
time.
Lehigh University, 27 Memorial Drive West, Bethlehem PA 18015
[email protected]
JEFF DODD received a B.S. from the University of Maryland at College Park, an M.A. from the University
of Pennsylvania and a Ph.D. from the University of Maryland at College Park in 1996 under the direction of
Robert L. Pego. He is a professor of mathematics at Jacksonville State University.
Jacksonville State University, 700 Pelham Road North, Jacksonville AL 36265
[email protected]
The Six Circles Theorem Revisited
Dennis Ivanov and Serge Tabachnikov
Abstract. The six circles theorem of C. Evelyn, G. Money-Coutts, and J. Tyrrell concerns
chains of circles inscribed into a triangle: the first circle is inscribed in the first angle, the
second circle is inscribed in the second angle and tangent to the first circle, the third circle is
inscribed in the third angle and tangent to the second circle, and so on, cyclically. The theorem
asserts that if all the circles touch the sides of the triangle, and not their extensions, then the
chain is 6-periodic. We show that, in general, the chain is eventually 6-periodic but may have
an arbitrarily long pre-period.
Figure 1. The six circles theorem: The centers of the consecutive circles are labeled 1, 2, . . . , 7
This beautiful theorem is one of many in the book [3] which is a result of the col-
laboration of three geometry enthusiasts, C. Evelyn, G. Money-Coutts, and J. Tyrrell.
The following is a quotation from John Tyrrells obituary [6]:
John also worked with two amateur mathematicians, C. J. A. Evelyn and G. B. Money-Coutts,
who found theorems by using outsize drawing instruments to draw large figures. They then
looked for concurrencies, collinearities, or other special features. The three men used to meet
for tea at the Cafe Royal and talk about mathematics, and then go to the opera at Covent
Garden, where Money-Coutts had a box.
We refer to [12, 8, 4, 9, 10] for various proofs and generalizations of the theorem and to
[11] for a brief biography of C. J. A. Evelyn. See also [2, 13, 14] for Internet resources.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.689
MSC: Primary 52C26
Figure 2. The chain of circles is eventually 6-periodic with pre-period of length two: C_9 = C_3, but C_8 ≠ C_2
Theorem. Assume that, for the initial circle, at least one of the tangency points lies
on a side of the triangle. Then the chain of circles is eventually 6-periodic. One can
choose the shape of a triangle and an initial circle so that the pre-period is arbitrarily
long.
The existence of pre-periods is due to the fact that the map assigning the next circle
to the previous one is not 1-1; that is, the inverse map is multivalued.
Concerning the assumption that at least one of the tangency points of a circle with
the sides of the angle of a triangle lies on a side of the triangle, and not its extension,
we observe the following.
Lemma 1. If the first circle in the chain satisfies this assumption, then so do all the
consecutive circles.
Proof. If circle C1 touches side P1 P2 , then circle C2 also touches this side, at a point
closer to P2 than the previous tangency point. Shifting the index by one, if circle C2
does not touch side P2 P3 but touches side P1 P2 , then it intersects side P1 P3 , and the
next circle C3 touches side P1 P3 , at a point closer to P3 than the intersection points.
See Figure 5 below for an illustration.
What about the case when the initial circle touches the extensions of both sides,
P1 P2 and P1 P3 ? If the circle does not intersect side P2 P3 , then the next circle in the
chain cannot be constructed, so this case is not relevant to us. If the first circle intersects
side P2 P3 , then the next circle touches side P2 P3 , and thus satisfies the assumption of
Theorem 2, see Figure 3. Hence this assumption holds, starting with the second circle
in the chain, and we may make it without loss of generality.
Figure 3. When the initial circle touches the extensions of both sides of the triangle
If two circles of radii r1 and r2 are tangent externally, then the length of their com-
mon tangent segment (segment AB in Figures 4, 5, 6) is
√((r_1 + r_2)² − (r_1 − r_2)²) = 2√(r_1 r_2).
Figure 6. Another illustration for the second case of equation (1): |P2 A| |AB| + |B P3 | = |P2 P3 |
4. SOLVING THE EQUATIONS. The equations in (1) determine the new radius r_2
as a function of the previous one r_1. We shall solve these equations in two steps. First,
introduce the notations
u_1 = √(r_1 cot α_1),   u_2 = √(r_2 cot α_2),   e_3 = √(tan α_1 tan α_2),
u_1² ± 2 e_3 u_1 u_2 + u_2² = a_3 ,   (2)
or
u_1 (u_1 ± e_3 u_2) + u_2 (u_2 ± e_3 u_1) = a_3 .   (3)
Solving (2) for u_2, we obtain
u_2 = −e_3 u_1 + √(a_3 − (1 − e_3²) u_1²) ,   or   u_2 = e_3 u_1 − √(a_3 − (1 − e_3²) u_1²) ,   (4)
accordingly as the sign in (2) is positive or negative. The minus sign in front of the
radical in the second formula (4) is because our construction chooses the smaller of
the two circles tangent to the previous one. Likewise, solving for u_1 yields
u_1 = −e_3 u_2 + √(a_3 − (1 − e_3²) u_2²) ,   or   u_1 = e_3 u_2 + √(a_3 − (1 − e_3²) u_2²) ,
again depending on the sign in (2). The plus sign in front of the radical in the second
formula is due to the fact that, going in the reverse direction, from C_2 to C_1, one
chooses the greater of the two circles. Substituting this to (3), we obtain
u_1 √(a_3 − (1 − e_3²) u_2²) ± u_2 √(a_3 − (1 − e_3²) u_1²) = a_3 .   (5)
[Figure: the triangle ABC with its inscribed circle touching the sides at E, F, and G.]
Proof. Let R be the inradius and S the area of the triangle. Let
T_A = AF = AG,   T_B = BG = BE,   T_C = CE = CF,
so that
1 − tan(A/2) tan(B/2) = 1 − R²/(T_A T_B) = 1 − T_C/p = (T_A + T_B)/p = c/p ,
as claimed.
To justify the second formula, we note that a_i < p. Likewise, each circle is tangent
to a side of the triangle, so u_i² is not greater than some side, and hence less than p. This
justifies the first formula.
In the new variables, (6) can be rewritten as sin(φ_1 ± φ_2) = sin φ_3, where one has a
plus sign for φ_1 < φ_3 and a minus sign otherwise. Hence
φ_2 = |φ_1 − φ_3|.   (7)
Assume that the triangle inequality is violated for some triangle. Since the inequality
holds for an equilateral triangle, one can deform it to obtain a triangle for which 3 =
1 + 2 . Then
sin 1 sin 2 cos 1 cos 2 = cos2 1 cos2 2 or sin 1 sin 2 = cos 1 cos 2 .
Therefore,
cos(1 + 2 ) = 0 or 1 + 2 = .
2
6. PIECEWISE LINEAR DYNAMICS. We are ready to investigate the function
(7). Although the dynamics of a piecewise linear function can be very complex [7],
ours is quite simple.
Iterating the map three times, with the values of the index i = 1, 2, 3, yields the
function y = |||x 1 | 2 | 3 |. We scale the x y plane so that 1 = 1 and rewrite
the function as
where a b and b < a + 1. We will show that every orbit of the map f is eventually
2-periodic, see Figure 8.
The graph of f (x) is shown in Figure 9 with the characteristic points marked.
[Figure 9: the graph of f(x), with the characteristic points b − a, 1, a + 1, and a + b + 1 on the x-axis and the values b, b − a, and b − a + 1 marked.]
It is clear that iterations of the function f take every orbit to the segment [0, b], and
this segment is mapped to itself. Indeed, if x ≥ a + b + 1, then f(x) = x − a − b − 1,
and if x ≤ a + b + 1, then f(x) ≤ b. Thus iterations of the function f will keep
decreasing x until it lands on [0, b].
Let
Then I_2 consists of 2-periodic points, and we need to show that every orbit lands on
this interval. Indeed, f(I_1) = [1, b − a + 1] ⊂ I_3. On the other hand, each iteration of
f chops off from the left a segment of length 1 + a − b from I_3 and sends it to I_2. It
follows that every orbit eventually reaches I_2.
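The argument is easy to illustrate numerically. The sketch below iterates the scaled map f(x) = |||x − 1| − a| − b| obtained above from many random starting points and records how long each orbit takes to become 2-periodic; the parameter values, the tolerance, and the starting range are arbitrary choices, in the spirit of the experiment behind Figure 11.

```python
import random

def f(x, a, b):
    # the scaled triple-fold map ||| x - 1 | - a | - b |
    return abs(abs(abs(x - 1) - a) - b)

def preperiod(x, a, b, tol=1e-9, max_iter=10_000):
    """Number of steps until the orbit of x becomes (numerically) 2-periodic."""
    for n in range(max_iter):
        if abs(f(f(x, a, b), a, b) - x) < tol:
            return n
        x = f(x, a, b)
    raise RuntimeError("no 2-periodic behavior detected")

random.seed(1)
a, b = 0.35, 0.9                       # parameters with a <= b < a + 1
lengths = [preperiod(random.uniform(0, 12), a, b) for _ in range(1000)]
print(max(lengths), sum(lengths) / len(lengths))   # longest and average pre-period
```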
7. FINAL COMMENTS.
1. Although our considerations are close to those in [3], the authors of this book
did not consider the pre-periodic behavior of the chain of circles. They addressed
the issue of the two choices in each step of the construction and noted:
... we may make the first three sign choice quite arbitrarily provided that, thereafter, we
make correct choices ...
4. A version of the six circles theorem holds for curvilinear triangles made of arcs
of circles [3, 12, 8], and a generalization to n-gons is available as well [9]. Again,
one expects eventual periodicity with arbitrarily long pre-periods.
5. Constructing the chains of circles, we consistently chose the smaller of the two
circles tangent to the previous one. It is interesting to investigate what happens
when other choices are made; for example, one may toss a coin at each step. See
Figure 11 for an experiment with a randomly chosen triangle.
6. The six circles theorem is closely related with the Malfatti problem: to inscribe
three pairwise tangent circles into the three angles of a triangle; see, e.g., [5] and
the references therein. This 3-periodic chain of circles exists and is unique for
400
300
200
100
0 10 20 30 40 50 60 70
Figure 11. The histogram represents 3000 chains of circles in a generic triangle. The selection, out of two,
of each next circle in a chain is random. The horizontal axis represents the length of the pre-period, and the
vertical the number of chains having this pre-period.
every triangle; it corresponds to the fixed point of the function f (x). See [1] for
a discussion of the Malfatti problem close to our considerations.
ACKNOWLEDGMENT. Most of the experiments that inspired this note and of the drawings were made in
GeoGebra. The second author was supported by the NSF grant DMS-1105442. We are grateful to the referees
for their criticism and advice.
REFERENCES
1. V. Belenky, A. Zaslavsky, On the Malfatti problem (in Russian), Kvant No. 4 (1994) 38–42.
2. A. Bogomolny, Six Circles Theorem, http://www.cut-the-knot.org/Curriculum/Geometry/Evelyn.shtml
3. C. J. A. Evelyn, G. Money-Coutts, J. A. Tyrrell, The Seven Circles Theorem and Other New Theorems. Stacey Int., London, 1974.
4. D. Fuchs, S. Tabachnikov, Mathematical Omnibus. Thirty Lectures on Classic Mathematics. Amer. Math. Soc., Providence, RI, 2007.
5. R. Guy, The lighthouse theorem, Morley & Malfatti – A budget of paradoxes, Amer. Math. Monthly 114 (2007) 97–141.
6. M. Laird, J. Silvester, John Alfred Tyrrell, 1932–1992, Bull. Lond. Math. Soc. 43 (2011) 401–405.
7. M. Nathanson, Piecewise linear functions with almost all points eventually periodic, Proc. Amer. Math. Soc. 60 (1976) 75–81.
8. J. Rigby, On the Money-Coutts configuration of nine antitangent cycles, Proc. Lond. Math. Soc. 43 (1981) 110–132.
9. S. Tabachnikov, Going in circles: Variations on the Money-Coutts theorem, Geom. Dedicata 80 (2000) 201–209.
10. S. Troubetzkoy, Circles and polygons, Geom. Dedicata 80 (2000) 289–296.
11. J. A. Tyrrell, Cecil John Alvin Evelyn, Bull. Lond. Math. Soc. 9 (1977) 328–329.
12. J. A. Tyrrell, M. T. Powell, A theorem in circle geometry, Bull. Lond. Math. Soc. 3 (1971) 70–74.
13. http://en.wikipedia.org/wiki/Six_circles_theorem
14. E. Weisstein, Six Circles Theorem – From MathWorld, A Wolfram Web Resource, http://mathworld.wolfram.com/SixCirclesTheorem.html
SERGE TABACHNIKOV was educated in the Soviet Union (Ph.D. from Moscow State University); since
1990, he has been teaching at universities in the USA. His mathematical interests include geometry, topology,
and dynamics. He served as a Deputy Director of ICERM (Institute for Computational and Experimental
Research in Mathematics) at Brown University. He is the Editor-in-Chief of Experimental Mathematics,
the Editor of the Mathematical Gems and Curiosities column of the Mathematical Intelligencer, and in
20102015, he served as the Notes Editor of this Monthly.
Department of Mathematics, Penn State, University Park, PA 16802
[email protected]
NOTES
Edited by Sergei Tabachnikov
We give a short elementary proof of Cayley's famous formula for the enumeration T_n
of free, unrooted trees with n ≥ 1 labeled nodes. We first count F_{n,k}, the number of
n-node forests composed of k rooted, directed trees, for 1 ≤ k ≤ n. For the history of
the formula, including Jim Pitman's use of directed forests, see [1, pp. 221–226].
The crux of the proof is simple double counting. There are two equivalent ways
of counting the number of k-tree forests with one designated internal (nonroot) node,
which shows, for all k = 1, . . . , n − 1, that
(n − k) F_{n,k} = nk F_{n,k+1}.   (*)
For the left side of (*): Consider one of the Fn,k forests with k trees. Designate any
one of its n k internal nodes.
For the right side: Consider one of the Fn,k+1 forests with k + 1 trees. Choose any
one of the n nodes, and hang from it any one of the k trees not containing that node.
The root of that grafted subtree is the designated internal node.
Iterating (*) n − 1 times gives
F_{n,1} = (n · 1/(n − 1)) F_{n,2} = (n · 1/(n − 1)) (n · 2/(n − 2)) F_{n,3} = ⋯ = (n · 1/(n − 1)) (n · 2/(n − 2)) ⋯ (n · (n − 1)/1) F_{n,n}.
The k and n − k factors all cancel each other out. Because there is precisely one way of
turning n nodes into n distinct trees (each root being a whole tree), we have F_{n,n} = 1.
Thus, the number F_{n,1} of n-node rooted trees is n^{n−1}. Since any of the n nodes in a tree
can be the root, F_{n,1} = nT_n; Cayley's formula, T_n = n^{n−2}, follows.
Applying (*) only n − k times yields F_{n,k} = k · (n choose k) · n^{n−k−1}, for k = 1, . . . , n.
Alternatively, the relation (k + 1)R_{n,k} = kn R_{n,k+1} for the number R_{n,k} of n-node
forests with k designated roots leads to R_{n,k} = k n^{n−k−1} and to T_n = R_{n,1} = n^{n−2}.
As a final remark, there are (n + 1)n1 rooted trees with n + 1 nodes that all share
the same root. Each corresponds to a rooted forest with n nodesjust chop off the root
node. Therefore, the limit of the ratio of rooted labeled forests to rooted labeled trees,
as their size grows, is limn (n + 1)n1 /n n1 = e.
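As an independent sanity check (not part of the proof), one can count labeled trees by brute force for small n and compare with T_n = n^{n−2}. The helper below simply enumerates all (n − 1)-edge graphs on n labeled vertices and keeps the acyclic ones; all names are ad hoc.

```python
from itertools import combinations

def count_labeled_trees(n):
    """Count spanning trees on vertices 0..n-1 by enumerating all (n-1)-edge graphs."""
    vertices = range(n)
    all_edges = list(combinations(vertices, 2))
    count = 0
    for edges in combinations(all_edges, n - 1):
        parent = list(vertices)                 # union-find forest for a cycle test
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        ok = True
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                ok = False                      # this edge closes a cycle, so not a tree
                break
            parent[ru] = rv
        count += ok
    return count

for n in range(2, 7):
    # T_n by brute force versus Cayley's formula (and n*T_n gives F_{n,1}).
    print(n, count_labeled_trees(n), n ** (n - 2))
```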
ACKNOWLEDGMENT. We thank Ed Reingold for his suggestions and everyone who read earlier drafts.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.699
MSC: Primary 05C05
1. M. Aigner, G. M. Ziegler, Proofs from THE BOOK. Fifth edition. Illustrations by K. H. Hofmann. Springer-
Verlag, Berlin, 2014.
School of Computer Science, Tel Aviv University, Ramat Aviv, 69978 Israel
[email protected]
School of Computer Science, Tel Aviv University, Ramat Aviv, 69978 Israel
[email protected]
Congratulations to the U.S. team, Ankan Bhattacharya, Michael Kural, Allen Liu,
Junyao Peng, Ashwin Sah, and Yuan Yao for their second consecutive win at the
57th International Mathematical Olympiad in Hong Kong! Not only did the team
bring home first place for the U.S., but students Allen Liu and Yuan Yao earned
perfect scores on the exam, and all six U.S. students took home a gold medal for
their individual high scores.
On Tangents and Secants of Infinite Sums
Michael Hardy
Abstract. We prove some identities involving tangents, secants, and cosecants of infinite sums.
As far as I know, the last two identities do not appear in any refereed source. The
case of the first one in which only finitely many terms appear on the left appears in
[1, page 47].
We will prove that the last two identities hold when the sum on the left converges
absolutely. The first identity in that case follows as a corollary. I added a quick sketch
of these proofs to Wikipedias List of trigonometric identities [2] in 2012.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.701
MSC: Primary 33B10
e_0 − e_2 + e_4 − ⋯
e_1 − e_3 + e_5 − ⋯
each term shown is itself ±1 times one of the infinite series (2). As long as conver-
gence is absolute, the order of summation will not affect the value of the sum, but
our proofs will involve limits of partial sums. Hence, we will evaluate the sums in
the following order:
e_0 − e_2 + e_4 − ⋯ = lim_{n→∞} Σ_{even k ≤ n} (−1)^{k/2} e_{n,k}
e_1 − e_3 + e_5 − ⋯ = lim_{n→∞} Σ_{odd k ≤ n} (−1)^{(k−1)/2} e_{n,k}.   (3)
(the sums on the left have only finitely many nonzero terms). The identities (4) are
proved by a routine induction on n.
For large enough N, for all j ≥ N, we have θ_j so close to 0 that 1 ≤ sec θ_j ≤
1 + θ_j² ≤ exp(θ_j²). Hence,
1 ≤ ∏_{j=N}^{n} sec θ_j ≤ ∏_{j=N}^{n} (1 + θ_j²) ≤ ∏_{j=N}^{n} exp(θ_j²) = exp(Σ_{j=N}^{n} θ_j²),
and that converges as n → ∞ since Σ_j θ_j converges absolutely. Thus, the right sides
of (4) converge and, therefore, so do the left sides, and to the same limit.
To show that convergence on the left side is absolute, we let f_{n,k} be the kth-degree
elementary symmetric function in the absolute values |x_1|, . . . , |x_n|. It is enough to
prove that
lim_{n→∞} Σ_{k=0}^{n} f_{n,k} < ∞.
We have
lim_{n→∞} Σ_{k=0}^{n} f_{n,k} = lim_{n→∞} ∏_{i=1}^{n} (1 + |x_i|) ≤ lim_{n→∞} exp(Σ_{i=1}^{n} |x_i|) = exp(Σ_{i=1}^{∞} |x_i|) < ∞.
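For readers who want to see the finite-n versions of these identities in action, here is a small numerical sketch. It builds the elementary symmetric functions e_k of tan θ_1, …, tan θ_n and checks the alternating-sum quotients against tan and sec of the sum of the angles; the particular angles are arbitrary, and the identities being checked are the finite-sum versions described in the setup of the note rather than formulas reproduced verbatim here.

```python
import math
from itertools import combinations

theta = [0.3, -0.2, 0.7, 0.1]                      # arbitrary angles
x = [math.tan(t) for t in theta]

def e(k):
    """k-th elementary symmetric polynomial in tan(theta_1), ..., tan(theta_n)."""
    return sum(math.prod(c) for c in combinations(x, k)) if k else 1.0

n = len(x)
even = sum((-1) ** (k // 2) * e(k) for k in range(0, n + 1, 2))        # e0 - e2 + e4 - ...
odd  = sum((-1) ** ((k - 1) // 2) * e(k) for k in range(1, n + 1, 2))  # e1 - e3 + e5 - ...

s = sum(theta)
print(math.tan(s), odd / even)                                         # tangent quotient
print(1 / math.cos(s), math.prod(1 / math.cos(t) for t in theta) / even)  # secant quotient
```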
REFERENCES
1. E.W. Hobson, A Treatise on Plane Trigonometry, Cambridge Univ. Press, Cambridge, 1891.
2. https://en.wikipedia.org/wiki/List_of_trigonometric_identities
Abstract. Just as knowing some roots of a polynomial allows one to factor it, a well-known
result provides a factorization of any scalar differential operator given a set of linearly inde-
pendent functions in its kernel. This note provides a straightforward generalization to the case
of matrix coefficient differential operators.
L = Σ_{i=0}^{n} α_i(x) ∂^i   implies   L(f) = Σ_{i=0}^{n} α_i(x) f^{(i)}(x).   (1)
Much interest in differential operators comes from their use in writing linear dif-
ferential equations. However, ODOs also have the algebraic structure of a noncom-
mutative ring. They can be added as one would add any polynomials, by combining
the coefficients of similar powers of , and multiplication is defined by extending the
following rule for the product of two monomials linearly over sums:
(α(x) ∂^m) · (β(x) ∂^n) = Σ_{i=0}^{m} (m choose i) α(x) β^{(i)}(x) ∂^{m+n−i}.   (2)
Although this definition may look complicated at first, in fact, it is simply a con-
sequence of the usual product rule from calculus, applied here to guarantee that
multiplication corresponds to operator composition as one would expect. That is, this
definition was chosen so that L Q( f ) = L(Q( f )).
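The multiplication rule (2) is easy to implement and test. The sketch below represents an ODO by its list of coefficient expressions (index = order of ∂), multiplies two operators with rule (2), and confirms on a sample function that the product acts as the composition L(Q(f)); the particular operators, the sample function, and all names are arbitrary choices.

```python
import sympy as sp

x = sp.symbols('x')

def apply_odo(coeffs, f):
    """Apply L = sum_i coeffs[i] * d^i/dx^i to the expression f, as in (1)."""
    return sp.simplify(sum(c * sp.diff(f, x, i) for i, c in enumerate(coeffs)))

def mult_odo(L, Q):
    """Multiply two ODOs given as coefficient lists, using rule (2) monomial by monomial."""
    out = [sp.Integer(0)] * (len(L) + len(Q) - 1)
    for m, alpha in enumerate(L):
        for n, beta in enumerate(Q):
            for i in range(m + 1):
                out[m + n - i] += sp.binomial(m, i) * alpha * sp.diff(beta, x, i)
    return [sp.simplify(c) for c in out]

# Example operators (arbitrary choices): L = 1 + x*d^2, Q = x^2 + d.
L = [sp.Integer(1), sp.Integer(0), x]
Q = [x**2, sp.Integer(1)]

f = sp.sin(x) + x**3
lhs = apply_odo(mult_odo(L, Q), f)      # (L*Q)(f)
rhs = apply_odo(L, apply_odo(Q, f))     # L(Q(f))
print(sp.simplify(lhs - rhs))           # 0: multiplication matches composition
```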
Given the rule (2) for multiplication, the inverse question of factorization naturally
arises. One factorization method for ODOs is surprisingly reminiscent of a familiar
fact about polynomials. If x = r is a root of the polynomial p(x), then you know that it
has a factor of x − r, the simplest first degree polynomial with this property. Similarly,
for any nonzero function f(x), ∂ − f′/f is the simplest first order differential operator
having f in its kernel, and if L is any ODO with f in its kernel, then L = Q ∘ (∂ −
f′/f) for some differential operator Q. Moreover, just as knowing additional roots
of the polynomial p would allow further factorization, one may factor a differential
1 Here and throughout this note, functions of x will be understood to mean sufficiently differentiable
operator of order n into the product of operators of orders n − k and k from the knowledge of k
linearly independent functions in its kernel. The general statement written in terms of
Wronskian determinants² is as follows.
K(f) = Wr(φ_1, . . . , φ_m, f) / Wr(φ_1, . . . , φ_m)   (3)
and (b) if L is any differential operator satisfying L(φ_i) = 0 for 1 ≤ i ≤ m, then there
exists a differential operator Q such that L = Q ∘ K.
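A quick symbolic check of formula (3) is instructive: building the quotient of Wronskians directly (per the definition in footnote 2), one sees that it annihilates the chosen kernel functions and is monic of order m. The kernel functions below are illustrative choices, not examples from the text.

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')

def wr(funcs):
    """Wronskian determinant: the (i, j) entry is the (i-1)st derivative of funcs[j]."""
    n = len(funcs)
    return sp.Matrix(n, n, lambda i, j: sp.diff(funcs[j], x, i)).det()

def K(phis, g):
    """K(g) = Wr(phi_1, ..., phi_m, g) / Wr(phi_1, ..., phi_m), as in (3)."""
    return sp.simplify(wr(list(phis) + [g]) / wr(list(phis)))

phis = [sp.exp(x), sp.sin(x)]               # illustrative kernel functions

print([K(phis, p) for p in phis])           # [0, 0]: K annihilates each phi_i
Kf = K(phis, f(x))
print(Kf)                                   # a monic second-order expression in f, f', f''
```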
2. THE MATRIX CASE. The purpose of this note is to generalize this well-known
and useful result to the case of matrix coefficients3 . A matrix coefficient ordinary dif-
ferential operator (MODO) is again a polynomial in with coefficients that depend on
x, but we now consider the case in which those coefficients are N N matrices. The
formulas for multiplying MODOs and for applying them to functions remain the same
(see (1) and (2)), but now the products involving the coefficients are understood to be
matrix products and the function f is an N -vector valued function.
Matrix analogues of Theorem 1(a) already appear in the literature. For example,
a result of EtingofGelfandRetakh [1] allows one to produce a monic MODO with
a specified kernel using quasi-determinants. However, not only is there no published
analogue of Theorem 1(b) for MODOs, it appears that many researchers have sus-
pected that it does not generalize nicely to the matrix case. A common proof of
Theorem 1(b) depends on the fact that any nonzero ODO of order at most n has
a kernel of dimension at most n. So, the fact that any MODO with a singular
leading coefficient has an infinite-dimensional kernel is both an obstacle to gener-
alizing the proof and reason to doubt the validity of the equivalent statement for
MODOs.
It is therefore good news that Theorem 2(b) below does indeed fully generalize the
scalar result to the matrix case without imposing any additional restrictions on the
leading coefficient, order, or kernel of the operator L. In addition to proving this new
result, what follows can be seen as providing a novel alternative proof to Theorem 1
and Theorem 2(a).
Definition. Let φ_1(x), . . . , φ_{MN}(x) be N-vector valued functions and let Φ be the
MN × MN block Wronskian matrix
Φ = ( φ_1 φ_2 ⋯ φ_{MN} ; φ′_1 φ′_2 ⋯ φ′_{MN} ; ⋮ ; φ_1^{(M−1)} φ_2^{(M−1)} ⋯ φ_{MN}^{(M−1)} ).   (4)
² The Wronskian determinant Wr(φ_1, . . . , φ_n) of the functions φ_i(x) is defined to be the determinant of the
n × n matrix having d^{i−1}φ_j(x)/dx^{i−1} in row i and column j.
3 Another sort of higher-dimensional generalization is the case of partial differential operators, which is
considered in [2].
For the sake of brevity, the following notations will be utilized below. The symbol
will denote the N MN matrix (1 MN ) and D() = 0 will be used as a
shorthand for the statement that D(i ) is equal to the zero vector for each 1 i MN
(i.e., that the vector functions are in the kernel of some linear operator D). The N N
identity matrix will be written simply as I , and more generally, the m m identity
matrix for any natural number m will be denoted by Im .
The main result can now be stated concisely.
Theorem 2 (Matrix Case). If the functions φ_i are chosen so that det Φ ≠ 0, then (a)
the differential operator
K = I ∂^M − ( φ_1^{(M)} ⋯ φ_{MN}^{(M)} ) Φ^{−1} ( I ; I∂ ; ⋮ ; I∂^{M−1} )   (5)
is the unique monic MODO K of order M such that K(φ_i) = 0 for 1 ≤ i ≤ MN, and (b) if L is any
MODO such that L(φ_i) = 0 for 1 ≤ i ≤ MN, then there exists a MODO Q such that L = Q ∘ K.
It is a nearly trivial observation that for a MODO L of order at most m one has
m
L= i (x) i (0 1 m ) m = (L(1 ) L(MN ) L). (6)
i=0
That is, for any choice of L the product of the N (m N + N ) matrix made from
its coefficients with m has the vector L(i ) as its i th column for 1 i MN and the
last N N block is a copy of the operator itself. For any choice of N (m N + N )
matrix , its product with m yields a matrix that records a differential operator L and
its action on each of the vector functions as in (6).
In the case that m = M and the N N blocks i are defined by
1 0
= (0 M ) = 1(M) (M)
MN I ,
0 I
the product m in (6) would equal
1
(M) IMN
(M) (M) 0 B
1 MN I M = I = (0 K )
0 I (M) M I
for some MODO K . Because the MODOs in the block B are of degree at most M 1,
K is monic of degree M. We know that K () = 0 since the first MN columns, which
are all zero, record its action on the functions i . This operator K can equivalently be
produced with the formula (5), as the quasi-determinant | M | M+1,M+1 where M is
viewed as an (M + 1) (M + 1) matrix with entries that are N N matrices or as
the Schur complement of the invertible block in the matrix M . This demonstrates
the existence of the operator K promised in (a). (This could also have been achieved in
many other ways, including merely by direct computation of K (i ), but doing it this
way sets us up nicely for proving the rest of the claim.)
Let G be the (m N + N ) (m N + N ) matrix whose decomposition into
N (m N + N ) blocks
G0 G0
G1 G 1
1
G=
... is such that . =
.. 0
Gm G M1
and for i M the block row G i = (gi0 gi1 gim ) where the functions gi j are defined
by the equation iM K = mj=0 gi j (x) j . Consider the product Gm . Its first MN
rows would have the form (IMN B) since 1 = IMN . For i M, the product of
block row G i with m is designed so that the MODO iM K shows up as its last
N N block. According to (6), the previous columns would be the image of under
the action of that operator, but iM K () = 0 so
IMN B
0 K
0 K
Gm = . .. . (7)
.
. .
0 mM K
i=M i=0
Then the operator in parentheses above is the operator Q satisfying the claim in (b).
Finally, suppose K was also a monic MODO of order M such that K () = 0. Then
D = K K is an operator of order strictly less than M with this same property. The
only way that D can have order less than M and also satisfy D = Q K for some
MODO Q is if Q = D = 0, which demonstrates the uniqueness of K and completes
the proof.
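To make the construction concrete, here is a small symbolic sketch. It uses one natural reading of formula (5), namely K(f) = f^{(M)} − Φ^{(M)} Φ^{−1} (f; f′; …; f^{(M−1)}), where Φ^{(M)} denotes the row of Mth derivatives of the φ_i, and checks that K annihilates the kernel vectors. The vector functions (N = 2, M = 2) are invented for illustration and are not the example discussed in the text.

```python
import sympy as sp

x = sp.symbols('x')

def modo_K(phis, M, f):
    """Apply the monic MODO of order M built from the block Wronskian of the
    N-vector functions phis (len(phis) == M*N), in the form
        K(f) = f^(M) - Phi^(M) * Phi^{-1} * (f; f'; ...; f^(M-1))."""
    Phi = sp.Matrix.vstack(*[sp.Matrix.hstack(*[sp.diff(p, x, r) for p in phis])
                             for r in range(M)])                 # MN x MN block Wronskian
    PhiM = sp.Matrix.hstack(*[sp.diff(p, x, M) for p in phis])   # N x MN row of M-th derivatives
    stack = sp.Matrix.vstack(*[sp.diff(f, x, r) for r in range(M)])
    return (sp.diff(f, x, M) - PhiM * Phi.inv() * stack).applyfunc(sp.simplify)

# An invented example with N = 2, M = 2, so MN = 4 kernel vectors.
phis = [sp.Matrix([1, x]), sp.Matrix([x, 1]),
        sp.Matrix([sp.exp(x), 0]), sp.Matrix([0, sp.exp(-x)])]

print([modo_K(phis, 2, p).T for p in phis])    # four zero vectors: K(phi_i) = 0
```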
whose invertibility assures us that there is a unique monic MODO of order 2 having
all four of these vectors in its kernel. Using (5), we can easily determine that this
operator is
x1 2x3 0 2x32
K =I + 2
+ .
0 x3 0 3
x2
especially since the leading coefficient of L is nonzero and singular, a situation that
cannot arise in the case N = 1. Without the theorem above it would not be clear that
there is an algebraic relationship between L and K . However, Theorem 2 assures us
that any MODO having these vectors in its kernel must have K as a right factor, and so
there must be a differential operator Q such that L = Q K . (In fact, one can check
that
1 3
1 1
Q= + x1 2x3
1 1
x 2x
ACKNOWLEDGMENT. The author wishes to thank the referee, Maarten Bergvelt, Michael Gekhtman, Tom
Kunkle, and Chunxia Li for advice, assistance, and encouragement.
708
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
REFERENCES
The Paul R. HalmosLester R. Ford Awards, established in 1964, are made annually
to authors of outstanding expository papers in the MONTHLY. The award is named
for Paul R. Halmos and Lester R. Ford, Sr., both distinguished mathematicians and
former editors of the Monthly. Winners of the HalmosFord Awards for expository
papers appearing in Volume 122 (2015) of the Monthly are as follows.
Alex Chin, Gary Gordon, Kellie MacPhee, and Charles Vincent, Pick a tree any
tree, pp. 424432.
Kenneth S. Williams, A four integers theorem and a five integers theorem,
pp. 528536.
Manya Raman-Sundstrom, A pedagogical history of compactness, pp. 619635.
Zhiqin Lu and Julie Rowlett, The sound of symmetry, pp. 815835.
Abstract. In this article, we give another proof for the closed form of (2m) inspired by the
elementary telescoping sum proof for (2) given by Daners [3]. This proof, which begins
with recurrence relations derived from certain integrals by using integration by parts, yields
a identity giving the value of (2m) in terms of (2), (4), ..., (2m 2). A quick proof by
induction yields the closed form of (2m).
1. INTRODUCTION. One of the more fascinating formulas from the theory of infi-
nite series has to be the identity
1 2
(2) = = .
k=1
k2 6
This identity, originally established by Euler around 1735, has received much interest
throughout the years. Chapman has compiled many proofs of this result in [2].
More generally, Euler proved that for any m N
1 (1)m+1 22m1 B2m 2m
(2m) = = ,
k=1
k 2m (2m)!
where Bn denotes the n-th Bernoulli number, defined by the generating function
t tn
= Bn .
et 1 n=0
n!
The above identity for (2m) has been established in many ways as well, a couple of
which are found in [1] and [4]. The purpose of this article is to generalize Daners
approach for computing (2) found in [3] to compute (2m). In [3], he establishes
2
that (2) = 6 by relying on nothing more sophisticated than recurrence relations
obtained from applying integration by parts on a simple family of integrals. Although
the reasoning is more involved, the main ideas are similar, and we find a recurrence
that will establish Eulers identity for (2m).
The following result collects two recurrences for Ik,n that we will repeatedly use.
http://dx.doi.org/10.4169/amer.math.monthly.123.7.710
MSC: Primary 40-01
710
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Lemma 1. For all positive integers n,
2n 1
(1) I0,n = I0,n1 ,
2n
1
(2) Ik,n = 2n(2n 1)Ik+1,n1 4n 2 Ik+1,n .
(2k + 2)(2k + 1)
Proof. The first recurrence follows from one application of integration by parts with
u = cos2n1 x and dv = sin x d x and using sin2 x = 1 cos2 x.
We establish the second recurrence similarly. Applying integration by parts with
u = cos2n x and dv = x 2k d x yields
2n 2
Ik,n = x 2k+1 cos2n1 x sin x d x.
2k + 1 0
Applying integration by parts again and using sin2 x = 1 cos2 x yields the desired
result.
Next, we use the recurrences from Lemma 1 in a manner similar to Daners [3] to
1
derive the following recurrence. Hereafter, we set S(k) = .
i ... i k2
1i i 1
2
1 k
m
1 k 1
Proposition 1. For any m N, S(k) = 0.
k=0
2 (2m 2k + 1)!
Proof. Start with recurrence (2) from Lemma 1, and divide both sides by n 2 I0,n :
Ik,n 1 Ik+1,n1
2 Ik+1,n
= 2n(2n 1) 4n .
n 2 I0,n n 2 (2k + 2)(2k + 1) I0,n I0,n
Ik,n 4 I Ik+1,n
k+1,n1
= .
n 2 I0,n (2k + 2)(2k + 1) I0,n1 I0,n
1
Define S N (0) = 1, and S N (k) = for k > 0; note that S N (k) is
1i 1 i k N
i 2
1 ... i 2
k
a truncated sum of S(k). Repeated telescoping with the previous identity as k varies
yields
m
(1)k (2m)! Imk,0 Im,N
S N (k) = . ()
k=0
2 (2m 2k)! I0,0
2k I0,N
To explain how the telescoping process works, we establish that (*) is true by induc-
tion on m. For the base case m = 1, we start with the recurrence above with k = 0:
1 I I1,n
1,n1
= 2 .
n2 I0,n1 I0,n
k
1 I I1,k
1,0
= 2 ,
n=1
n2 I0,0 I0,k
N
S j (n)
Summing both sides from j = 1 to N and noting that = S N (n + 1) yields
j=1
j2
m
(1)m+k (2m)! Ik,0 4 I Im+1,N
m+1,0
S j (m k + 1) = .
k=0
22m2k (2k)! I0,0 (2m + 2)(2m + 1) I0,0 I0,N
This can be readily rewritten in a form that verifies the claim for m + 1, as required.
We now bound the right side of (*). Using the fact that sin x 2x
on [0, 2 ], in
conjunction with I0,n = 2n1 I
2n 0,n1
, we obtain
2m 2m 2m I
2 2 0,N
Im,N sin x cos2N x d x sin2 x cos2N x d x = .
0 2 0 2 2 N +1
Im,N 2m I
0,N
Since the integrands are nonnegative, it follows that 0 < .
I0,N 2 N +1
Consequently, we can rewrite (*) as
m
(1)k (2m)! Imk,0 2m
0< S N (k) .
k=0
22k (2m 2k)! I0,0 22m (N + 1)
m
(1)k (2m)! Imk,0
S(k) = 0.
k=0
22k (2m 2k)! I0,0
In,0 2n
We can simplify this further by noting that = 2n by a simple integra-
I0,0 2 (2n + 1)
tion calculation. Upon applying this fact, we obtain the desired recurrence.
1
We next compute the value of S(k) = . Although this result is
... i k2 i2
1i 1 i k 1
known in the literature (see [5], for instance), we will give a short derivation of this
result below.
712
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Lemma 2. For any k N, we have S(k) = 2(1 212k ) (2k).
Proof. Consider the generating function G(x) = S(k)x 2k , where we are taking
k=0
S(0) = 1. Before rewriting G(x), we quote the following two classic results from
complex analysis without proof (see [6] for further details):
z2 (1)n+1 z
1
sin z = z 1 and csc z = 2 .
j=1
( j)2 z n=1
z 2 (n)2
First, directly applying the definition of S(n) to G(x) followed by using the infi-
x 2 1 x
nite product for sine yields G(x) = 1 2 = = x csc( x). Next,
j=1
j sin x
(1)n+1 x 2
by applying the expansion for cosecant, we obtain G(x) = 1 2 .
n=1
x 2 n2
x2 x 2k
However, 2 = by the geometric series. Applying this to our last
x n2 k=1
n
expression for G(x) and interchanging the order of summation yields
(1)n+1
G(x) = 1 + 2 x 2k .
k=1 n=1
n 2k
Therefore, we obtain G(x) = 1 + 2(1 212k ) (2k)x 2k . On the other hand, since
k=1
G(x) = 2k
S(k)x , equating like coefficients of x 2k yields the desired result.
k=0
Before proving the main result, we need the following identity among the Bernoulli
numbers, which we will also prove by using generating functions.
m
2m + 1
Lemma 3. For any m N, we have (1 22k1 )B2k = 0.
k=0
2k
t 1 + et 2t
Proof. Since = 2t , the generating function for the Bernoulli
1 et 2 e 1
numbers and the Maclaurin series for et yields
1 t n Bk (2t)k
Bk t k
= 1+ .
k=1
k! 2 n=0
n! k=1
k!
r
r 1 tr r tr
Br (1 2 ) = (1 2 j1 B j ) .
r =1
r! r =0 j=0
j r!
Equate the coefficients for t 2m+1 from both sides, noting that B2m+1 = 0:
2m+1
2m + 1
(1 2 j1 )B j = 0.
j=0
j
Since the odd indices contribute nothing to the sum on the left side of the equality, we
can rewrite this (upon relabeling the even indices) as
m
2m + 1
(1 22k1 )B2k = 0.
k=0
2k
Finally, we establish the closed form of (2m) for any positive integer m by putting
together the proposition with the last two lemmata.
Proof. First of all, note that applying Lemma 2 to the proposition yields
m1
(1)m+1 2m 1 1 k 1 212k
(2m) = + (2k) .
1 212m 2(2m + 1)! k=1 2 (2m 2k + 1)!
m1
(1)m+1 2m 22m1 1 2m + 1
(2m) = (22k1 1)B2k .
(2m)! (2m + 1)(22m1 1) k=0 2k
m1
1 2m + 1
B2m = (22k1 1)B2k ,
(2m + 1)(22m1 1) k=0 2k
we deduce that the claim is also true for m, thereby completing the proof.
714
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
REFERENCES
1. F. Beukers, E. Calabi, J. Kolk, Sums of generalized harmonic series and volumes, Nieuw Arch. Wiskd. 11
no. 4 (1993) 217224.
2. R. Chapman, Evaluating (2), 2003, available at http://empslocal.ex.ac.uk/people/staff/
rjchapma/etc/zeta2.pdf.
3. D. Daners, A short elementary proof of 1/k 2 = 2 /6, Math. Mag. 85 (2012) 361364.
4. T. Osler, Finding (2 p) from a Product of Sines, Amer. Math. Monthly 111 (2004) 5254.
5. Y. Ohno, W. Zudilin, Zeta stars, Commun. Number Theory Phys. 2 (2008) 325347.
6. R. Silverman, Introductory Complex Analysis. Dover, New York, 1972.
Abstract. We determine all generating iterated function systems for certain self-similar sets
such as the Vicsek snowflake and the Koch curve.
Remark. As an application, we investigate all generating IFSs for the triadic Cantor
set C generated by the IFS {1 (x) = x/3, 2 (x) = (x + 2)/3}. Note that each con-
tractive similitude with (C) C satisfies (C) i (C) for some i {1, 2}, and
IC = {x, 1 x}. Then by the Theorem, every k (x) in a generating IFS {k }k=1 for
C must be of the form i (x) or i (1 x).
Proof of Theorem. Suppose that (E) i1 (E) for some i 1 {1, . . . , N }. Then
i1
1
(E) E, so either i1
1
(E) = E or, by repeating the above process, i12
http://dx.doi.org/10.4169/amer.math.monthly.123.7.716
MSC: Primary 28A80, Secondary 28A78
716
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
i1 (E) E for some i 2 {1, . . . , N }. We can obtain by induction that i1
1
(E) = E for some i 1
k=1 {1, . . . , N } , and hence, i I E .
k
E= i (F) F. (1)
k=1 i{1,...,N }k
D C
f4 f3
f5
E F
f1 f2
A B
The second example is a self-similar set for which substantial overlaps occur.
B (0, 1) B (0,1)
f3
F F
E E
f1 G f G
2
Proof. We point out that g122 = g211 , which is a complete overlap. Note that
IW = {identity}. By the Theorem, we only need to show that (W ) gi (W ) for
some i {1, 2, 3}. Figure 3 may help in following the proof.
Using (1), we can check that W is a subset of OAB and contains all its three sides
but no points of (EFG) , where (EFG) is the interior of EFG (Figure 3a).
We first prove that (O AB) gi (OAB) for some i {1, 2, 3}. Suppose other-
wise. Let P1 P2 W be one side of (OAB); there are two cases to consider.
Case 1. P1 BEF \ {E, F}, P2 (EOI \ {E}) (FHA \ {F}).
Then P1 P2 lies on either BO or BA, say BO since it cannot intersect (EFG) .
Reasoning similarly, the third vertex of (OAB) lies on BA. So P1 equals B and
EFG (OAB), which contradicts the fact that is a contractive similitude.
Case 2. P1 EOI \ GHI, P2 FHA \ GHI.
Recall that (EFG) contains no points of W . So does g k (EFG) by similarity,
where g = g12 or g21 , and g k is the kth iteration of g.
718
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Notice that g k (EFG) connect each other one by one. They are located along the
line FH (or EI) and approach to the point H (or I ) as k . Thus, either P1 P2 or
another side of (OAB) passes through Yn for some n (Figure 3b gives the case Y2 ),
where
n
n
Yn = k
g12 (EFG) k
g21 (EFG) .
k=0 k=0
Then
GHI g2 (W ) = GHI g21 (W ) g1 (W ) GHI g21 n
(W ) .
Therefore, GHI g2 (W ) g1 (W ) as n=1 g21 n
(W ) = {I } g1 (W ). Using a
similar argument yields GHI g1 (W ) g2 (W ), which completes the proof.
All generating IFSs for the above examples can be iterated from their defining IFS.
However, the famous Koch curve, which is a fractal invented by the Swedish mathe-
matician Helge von Koch in 1904, is one of the exceptions.
Start with E 0 = [0, 1]. Let E 1 be the set consists of the four line segments obtained
by replacing the middle third of E 0 by the other two sides of the equilateral triangle
based on the removed segment. Inductively, we construct E k by applying the same
procedure to each line segments in E k1 (Figure 4). Finally, we arrive at the Koch
curve.
E0 E1 E2
Figure 4. E k with k = 0, 1, 2
Proof. Note that h 1 and h 2 map ABC to DBA and ECA, respectively (Fig-
ure 5). Owing to (1), we have K ABC. By the Theorem, it remains to prove that
(K ) h i (K ) for some i {1, 2, 3} as I K = {identity, T }.
A (1/2, 3/6)
M Q J N
H
I P
B (0, 0) D (1/3, 0) E (2/3, 0) C (1, 0)
Suppose that (K )
h i (K ) for i {1, 2}. Denote the three vertices of (ABC)
by I, J and L. Without loss of generality, we can assume I ABD \ {A} and J
AEC \ {A}. Let H be the intersection point of IJ and AE. Let MN be the line segment
passing through H and parallel to BC with M AB and N AC.
Note that K AB (also K AC and K BC) is a similar copy of the triadic Cantor
set C. So is (K ) IJ. Therefore, |PH| min{|HJ|, |PI|}, which implies I = M
and J = N . Otherwise, as we can see in Figure 5, |HJ| < |HN| = |QH| < |PH|, a
contradiction!
Finally, we claim that L = A. Otherwise, either LM or LN is parallel to BC by
applying the same argument as above, which is impossible.
Now we get that (K AB) is a subset of K AM or K AN. We only consider
the former case. A similar proof
works for the latter case.
Notice that K AB = C/ 3. Letting |AM|/|AB| = yields C/ 3 C/ 3
or, equivalently, C C. By the Remark in the introduction, we have = 3n for
some positive integer n. Thus, (K ) h 3 (K ), which finishes our proof.
ACKNOWLEDGMENT. We thank the anonymous referee for careful reading of the manuscript and making
many useful suggestions. The first author is supported by NSFC #11101148 and the Fundamental Research
Funds for the Central Universities, ECUST #222201514321. The second author is supported by the NSFC
#11271137 and STCSM #13dz2260400.
REFERENCES
720
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Department of Mathematics, East China University of Science and Technology, Shanghai 200237, P.R. China
[email protected]
Department of Mathematics, Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai
200241, P.R. China
[email protected]
PROBLEMS
11922. Proposed by Max Alekseyev, George Washington University, Washington, DC
Find every positive integer n such that both n and n 2 are palindromes when written in
the binary numeral system (and with no leading zeros).
11923. Proposed by Omran Kouba, Higher Institute for Applied Sciences and Tech-
nology, Damascus, Syria. Let f p be the function on (0, /2) given by
Prove f p > 0 for 0 < p < 1/2 and f p < 0 for 1/2 < p < 1.
11924. Proposed by Cornel Ioan Valean, Timis, Romania. Calculate
/2
{tan x}
d x,
0 tan x
where {u} denotes u u.
11925. Proposed by Leonard Giugiuc, Drobeta Turnu Severin, Romania. Let n be an
integer with n 4. Find the largest k such that for any list a of n real numbers that
sum to 0,
3 2
n n
a 2j k a 3j .
j=1 j=1
http://dx.doi.org/10.4169/amer.math.monthly.123.7.722
722
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
11926. Proposed by Ovidiu Furdui, Technical University of Cluj-Napoca, Cluj-
Napoca, Romania. Let k be an integer, k 2. Find
log |1 x|
d x.
0 x (1+1/k)
11927. Proposed by Finbarr Holland, University College Cork, Cork, Ireland. Let
O, G, I , and K be, respectively, the circumcenter, centroid, incenter, and symme-
dian point (also called Lemoine point or Grebe point) of triangle ABC. Prove |OG|
|O I | |O K |, with equality if and only if ABC is equilateral.
11928. Proposed by Hideyuki Ohtsuka, Saitama, Japan. For positive integers n and m
and for a sequence ai , prove
n m n+m
n m n+m
ai+ j = ak
i=0 j=0
i j k=0
k
and
n n i + j n n 2
= .
i< j
i j n i< j
i j
SOLUTIONS
bm + m 0 (mod d ),
where b = a p and N is chosen so that k p (mod d ). By the induction
hypothesis, we find infinitely many choices for m.
In the remaining case, d > 1 and a is a unit modulo d. Let c = gcd((d), d). By
the induction hypothesis, there exists m N such that ka m + m 0 (mod c). Let ka m
+ m = r c, where r N, and choose s, t N so that c = s(d) + td. Set
6r +1 5
6r + 1
= Bk m(6r 2k)
k=0
k m=0
r
6r + 1 6r + 1
=6 Bk = 6 B6s2 ,
0k6r +1
k s=1
6s 2
6|k+2
724
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
5
(1)m B6r +1 ( m )
m=0
A Functional Equation
11794 [2014, 648]. Proposed by George Stoica, University of New Brunswick, Saint
John, Canada. Find every twice differentiable function f on R such that for all nonzero
x and y, x f ( f (y)/x) = y f ( f (x)/y).
Solution by O. P. Lossers, Eindhoven University of Technology, Eindhoven, The
Netherlands. We assume only that f is once differentiable. We claim that the solutions
are f (x) = a(x + a), where a R.
The functional equation
f (y) f (x)
xf = yf (1)
x y
can be written in the form
f (y)
f f (y) f (x)
x
f (y)
= f (2)
x
y y
when f (y) = 0.
726
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
and for k 1,
(2ki)x (m) (m) ( + 2ki)m
Ik = Re e x m1
dx = Re = Re .
0 ( 2ki)m ( 2 + 4k 2 )m
To put this in real form, define k = tan1 (2k/) so we can write ( + 2ki)m
= ( 2 + 4k 2 )m [cos(mk ) + i sin(mk )]. The real part Ik of the integral can be calcu-
lated from this, and
1 m
cos m tan1 2k
I = (m) +2 .
m k=1
( 2 + 4k 2 )m/2
By convexity, 1
f (x)
s(x)
m
+ 1s(x)
M
, so
1
dx t 1t t M + (1 t)m
+ = .
0 f (x) m M mM
728
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
Also note that
2
2
0 t M (1 t) m = t M + (1 t)m t (1 t) M+ m ,
2
so t M + (1 t)m t (1 t) M+ m . Now
mM t (1 t)(M m)2
A H tm + (1 t)M =
t M + (1 t)m t M + (1 t)m
t (1 t)(M m) 2
2
= M m ,
2
t (1 t) M+ m
as claimed.
Note: The argument above applies even if f need only be a bounded, positive mea-
surable function. In that version, equality holds in the strengthened inequality when f
takes the value m on a set of measure M+mm and the value M on a set of complemen-
M
tary measure .
M+ m
Editorial comment. The discrete case of the strengthened inequality above appeared
as Problem 11469 in this Monthly (problem in December 2009 and solution in May
2011) and in B. Meyer, Some inequalities for elementary mean values, Math. Comp.
42 (1984) 193194. The best possible upper bound for A G in terms of m and M can
be proved by similar arguments to the ones above. This upper bound is the maximum of
tm + (1 t)M m t M 1t , which is attained at t = log[M log(M/m)/(Mm)]
log(M/m)
. This result is
also proved in S. H. Tung, On lower and upper bounds of the difference between the
arithmetic and geometric mean, Math. Comp. 29 (1975) 834836.
Also solved by R. Bagby, R. Boukharfane (France), R. Chapman (U. K.), E. A. Herman, B. Karaivanov
& T. S. Vassilev (U.S.A. & Canada), J. H. Lindsey II, P. W. Lindstrom, M. Omarjee (France), P. Perfetti
(Italy), I. Pinelis, R. Sargsyan (Armenia), K. Schilling, A. Stenger, R. Stong, S. Yi (Korea), Z. Zhang (China),
FAU Problem Solving Group, Northwestern University Math Problem Solving Group, NSA Problems Group,
University of Louisiana at Lafayette Math Club, and the proposer.
Letting d = 2/, we can write 0 = dk=0 ck eik for nonnegative real constants
A Deranged Sum
11802 [2014, 739]. Proposed by Istvan Mezo, Nanjing University
n of Information
2
Science
and Technology, Nanjing, China. Let H n,2 = k=1 k , and let Dn
= n! nk=0 (1)k /k!. (This is the derangement number of n, that is, the number of
permutations of {1, . . . , n} that fix no element.) Prove that
(1)n 2
Dn
Hn,2 = .
n=1
n! 6e n=0
n!(n + 1)2
2 1 2
Dk1 Dk
= = .
6 e k=1 (k 1)! k 2 6e k=0
k! (k + 1)2
Also solved by U. Abel (Germany), A. Ali (India), K. F. Andersen (Canada), M. Andreoli, R. Bagby,
S. Banerjee & B. Maji (India), M. Bataille (France), R. Boukharfane (France), K. N. Boyadzhiev, P. Bracken,
M. A. Cariton, R. Chapman (U. K.), H. Chen, D. Fleischman, N. Fontes-Merz, O. Geupel (Germany),
M. L. Glasser, J.-P. Grivaux (France), E. A. Herman, M. Hoffman, B. Karaivanov & T. Vassilev (Canada),
P. M. Kayll, O. Kouba (Syria), J. Minkus, M. Omarjee (France), R. Sargsyan (Armenia), A. Stenger, R. Stong,
R. Tauraso (Italy), J. Vinuesa (Spain), M. Vowe (Switzerland), M. Wildon (U. K.), J. Zacharias, Armstrong
Problem Solvers, FAU Problem Solving Group, GCHQ Problem Solving Group (U. K.), GWstat Problem
Solving Group, NSA Problems Group, and the proposer.
730
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
REVIEWS
Edited by Jeffrey Nunemacher
Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015
For a long time I have thought I was a statistician, interested in inferences from the particular to
the general. But as I have watched mathematical statistics evolve, I have had cause to wonder
and to doubt. ... All in all I have come to feel that my central interest is in data analysis, which I
take to include, among other things: procedures for analyzing data, techniques for interpreting
the results of such procedures, ways of planning the gathering of data to make its analysis
easier, more precise or more accurate, and all the machinery and results of (mathematical)
statistics which apply to analyzing data.1
Replace data analysis with data science and, even today, it is hard to find a more
elegant and precise definition of this discipline. As well, data science and statistics dif-
fer in their respective goals and perspectives. For example, statistics primarily focuses
http://dx.doi.org/10.4169/amer.math.monthly.123.7.731
1I first became aware of this quote in a compelling paper entitled 50 Years of Data Science by David
Donoho, presented at a conference honoring the 100th birthday of Tukey. The paper is currently available on
the Simply Statistics blog (simplystatistics.org).
732
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
nearest-neighbor, penalized regression, regression splines, trees, boosting, k-means
clustering, hierarchical clustering, and support vector machines. Naturally, there is
an emphasis on implementing the algorithms in the programming language R. Each
chapter concludes with a detailed lab stepping through the computational techniques
just described.
It should be noted that ISLR is the undergraduate version of Elements of Statistical
Learning: Data Mining, Inference, and Predictions [3] (hereafter ESL) by Hastie, Tib-
shirani, and Friedman, written for graduate students and practicing statisticians. ESL
gives a more sophisticated approach to almost all the topics covered in ISLR; the two
books even share many of the same examples. Any instructor not already comfort-
able with machine learning (such as this reviewer) and planning to use ISLR should
keep ESL handy as a reference. ESL is particularly useful as a means of augmenting
the content of ISLR in a manner appropriate for advanced undergraduates, especially
those considering graduate school. Trevor Hastie and Rob Tibshirani, both at Stanford
University, the two authors common to both ISLR and ESL, are leaders in developing
and using advanced algorithms to better understand the rich and complex data of the
world around us. This adds to the appeal of ISLR; students will be learning directly
from some of the modern leaders in the field of data science.
At first glance, ISLR might appear to be a text written by statisticians for other
statisticians and data scientists. I believe, however, that with a little effort, any mathe-
matician with a working, not necessarily expert, knowledge of R is capable of teaching
an upper-level course to mathematics majors from this book. Anyone who has taught
a statistics course using multivariate regression is a candidate to teach from ISLR.
Overall, the mathematical level of the text is not especially high, and R is used in a
very straightforward manner. Throughout, the focus is on algorithms and applications,
not clever programming tricks. Every algorithm is already implemented in R and is
easy to use, if even from a black box perspective. Because I have been teaching this
to mostly senior mathematics majors, I enjoy augmenting the material in places with
more mathematically advanced perspectives. It is quite plausible that ISLR could be
used in a mid-level course, with less emphasis on the mathematical subtleties.
There are some things in ISLR that could be better. None of these is a major prob-
lem, but they are issues of which one should be aware. Understandably, the authors
cant avoid bringing in their statistics perspective, but they occasionally overempha-
size it in a way that doesnt bring any additional insight. For example, I dont feel that it
is necessary to cite the formula for the standard error of estimation for regression coef-
ficients since error estimates are estimated from data later on. At times, this bifurcated
perspectiveare we interested in inference (statistics) or prediction (data science)?
muddles the message. Another rough spot is that occasionally the level oscillates from
basic ideas (e.g., reminding the reader that the second derivative measures concavity)
to advanced notions such as L 2 spaces. The data sets used in the text are useful as a
way of introducing the methods, but they pale in comparison to what one can easily
dig up from other sources. The labs at the end of each chapter are helpful to students
but a bit outdated relative to current R programming paradigms (no ggplot or dplyr).
Perhaps in a new edition, these will be modernized.
Lets review the content of ISLR. It begins, naturally, with a discussion of what
statistical or machine learning is all about. Notions covered include the differ-
ence between supervised learning (e.g., regression or any other scenario with a
response variable) and unsupervised learning (e.g., clustering models in which struc-
ture is waiting to be discovered). The authors do an especially nice job describing
the bias-variance tradeoff, i.e., how total error can be decomposed into squared
bias, predication variance, and unaccounted-for error. Bias-variance tradeoff is a
734
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
the difference between working in the L 2 norm (ridge regression) and the L 1 norm
(Lasso). This chapter also contains a fairly traditional discussion of best subset selec-
tion for multivariate regression. I dont spend much time on this subject since it is
covered in detail in statistics courses on regression.
The next chapter covers regression splines, localized regression (known as loess and
familiar to most R users) and GAMs (generalized additive models, basically combina-
tions of splines and other generalized fitting techniques). After that, there is a chapter
devoted to tree-based methods: regression and classification trees, bagging (bootstrap
aggregation in which the average of a collection of models will generally outperform a
single model), random forests (bagging with a small, but effective, twist), and boosting
(an algorithm in which less is more, especially if you do less a lot). These last three
methods are examples of randomized algorithms, a topic undergraduate students do
not often see.
The final supervised learning chapter covers support vector machines (SVMs).
SVMs are probably the most technically challenging topic covered but are handled
with an appropriately streamlined approach. Because of their complexity, I find SVMs
provide an excellent platform for showing students how machine learning relies on
ideas from linear algebra, geometry, and numerical methods. The final chapter of the
book is devoted to unsupervised methods: principal components, k-means clustering,
and hierarchical clustering. Hierarchical clustering is a wonderful topic on which to
end because of its use with gene expression (microarray) data and other situations with
messy data in search of order.
It is quite possible to cover almost all of ISLR in one semester, with students doing
mini-projects along the way to learn the algorithms and then leaving some time for
substantial projects at the end. I dont use too many of the exercises, but this is a
personal preference. By the end of the semester, the students are equipped with an
impressive array of algorithms that are applicable to a wide variety of situations. As
well, I find they learn to use R in a creative and generative manner. Best of all, they
have fun. Its thrilling to take on challenging and, often, unusual data sets. For example,
they encounter very wide data sets taken from DNA microarrays, say, with tens of
thousands predictors (individual genes) and only a few dozen samples (patients). They
also get a chance to see how mathematics fits together: how geometry, linear algebra,
and calculus can be brought to bear on challenging and exciting problems. Although
the singular value decomposition (SVD) is not mentioned explicitly in ISLR, the topics
encountered make it easy to introduce students to this wonderful idea. Its a shame that
SVD is not yet a standard topic in our first course in linear algebra.
Understandably, some topics and algorithms are left out. For example, a case could
be made for including multivariate regression splines (MARS, another entertaining
acronym), neural networks, or Markov chain Monte Carlo methods. Including any
of these, or other, topics would mean leaving something out. One can quibble over
such matters of taste, but I feel the authors have done an excellent job of selecting
topics. The goal of ISLR is to serve as an introduction to machine and/or statistical
learning appropriate for undergraduates, not as an reference or encyclopedia (ESL
already serves that purpose). In this sense, it succeeds admirably.
There are a couple of artifacts of the digital era that make this ISLR even more
enticing. To start, both ISLR and ESL are freely available as pdfs. I like having the
real book, being somewhat old school, but not surprisingly, many of the students
opt for the digital version. Another treat is that Tibshirini and Hastie ran a MOOC
(massive open online course) for the book in edX and Stanford Online. The course
has been archived and is currently (and, I assume, in perpetuity since nothing ever
goes away on the internet) freely available. In the MOOC, Tibshirini and Hastie go
REFERENCES
1. J. Tukey, The future of data analysis, Ann. Math. Stat. 33 (1962) 167.
2. L. Breiman, Statistical modeling: Two cultures, Statist. Sci. 16 (2001) 199231.
3. T. Hastie, R. Tibshirani, J. Friedman, Elements of Statistical Learning: Data Mining, Inference, and Pre-
diction, Second Edition, Springer-Verlag, New York, NY, 2009.
736
c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 123
A MERICAN M ATHEMATICAL S OCIETY
XYZ Series, Volume 16; 2015; 203 pages; Hardcover; Socks Are Like
ISBN: 978-0-9885622-8-8; List US$59.95; AMS members US$47.96;
Order code XYZ/16
Pants, Cats Are
Like Dogs
The Case of Games, Puzzles & Activities
Academician for Choosing, Identifying &
Nikolai Sorting Math!
Nikolaevich Luzin Malke Rosenfeld and
Gordon Hamilton
Sergei S. Demidov, Russian
Academy of Sciences, Moscow, Mathematical thinking
Russia and Boris V. Lvshin, and calculating are two dif-
Editors
ferent things. Of the two, the
Translated by Roger Cooke former skill is far more important to develop than the latter, especially
This book chronicles the 1936 today, when electronic calculators and computers are everywhere.
attack on mathematician Young children, who may know nothing of calculating, can be
Nikolai Nikolaevich Luzin during the USSR campaign to Sovietize remarkably good at mathematical thinking. They do it naturally
all sciences. in their play. The puzzles in this book are meant to be approached
History of Mathematics, Volume 43; 2016; 416 pages; Hardcover; playfully, and they help children build upon their natural capacities
ISBN: 978-1-4704-2608-8; List US$59; AMS members US$47.20; for mathematical thought.
Order code HMATH/43 Peter Gray, Research Professor, Boston College, and author of
Free to Learn: Why Releasing the Instinct to Play Will Make Our
Teaching School Children Happier, More Self-Reliant, and Better Students for Life
Mathematics: From Explore the mathematics of choosing, identifying, and sorting
Pre-Algebra to Algebra through a diverse collection of math games, puzzles, and activities.
A publication of Delta Stream Media, an imprint of Natural Math. Distributed in North America by the
American Mathematical Society.
Hung-Hsi Wu, University of
California, Berkeley, CA Natural Math Series, Volume 4; 2016; 84 pages; Softcover;
ISBN: 978-0-9776939-0-0; List US$15; AMS members US$12;
This two-volume set includes a system- Order code NMATH/4
atic exposition of a major part of the
mathematics of grades 59 (excluding
statistics) written specifically for
Common-Core era teachers.
Common Sense
Mathematics
Ethan D. Bolker and Maura B. Mast
&RPPRQ6HQVH0DWKHPDWLFVLVD
WH[WIRUDRQHVHPHVWHUFROOHJHOHYHO
FRXUVHLQTXDQWLWDWLYHOLWHUDF\7KH
WH[WHPSKDVL]HVFRPPRQVHQVHDQG
FRPPRQNQRZOHGJHLQDSSURDFKLQJ
UHDOSUREOHPVWKURXJKSRSXODUQHZVLWHPVDQGQGLQJXVHIXOPDWK
HPDWLFDOWRROVDQGIUDPHVZLWKZKLFKWRDGGUHVVWKRVHTXHVWLRQV