Measurement - Paul Lockhart
MEASUREMENT
PAUL LOCKHART
Lockhart, Paul.
Measurement / Paul Lockhart.
p. cm.
Includes index.
ISBN 978-0-674-05755-5 (hardcover : alk. paper)
1. Geometry. I. Title.
QA447.L625 2012
516—dc23 2012007726
For Will, Ben, and Yarrow
CONTENTS
Acknowledgments
Index
REALITY AND IMAGINATION
There are many realities out there. There is, of course, the physical reality we
find ourselves in. Then there are those imaginary universes that resemble
physical reality very closely, such as the one where everything is exactly the
same except I didn’t pee in my pants in fifth grade, or the one where that
beautiful dark-haired girl on the bus turned to me and we started talking and
ended up falling in love. There are plenty of those kinds of imaginary
realities, believe me. But that’s neither here nor there.
I want to talk about a different sort of place. I’m going to call it
“mathematical reality.” In my mind’s eye, there is a universe where beautiful
shapes and patterns float by and do curious and surprising things that keep me
amused and entertained. It’s an amazing place, and I really love it.
The thing is, physical reality is a disaster. It’s way too complicated, and
nothing is at all what it appears to be. Objects expand and contract with
temperature, atoms fly on and off. In particular, nothing can truly be
measured. A blade of grass has no actual length. Any measurement made in
this universe is necessarily a rough approximation. It’s not bad; it’s just the
nature of the place. The smallest speck is not a point, and the thinnest wire is
not a line.
Mathematical reality, on the other hand, is imaginary. It can be as simple
and pretty as I want it to be. I get to have all those perfect things I can’t have
in real life. I will never hold a circle in my hand, but I can hold one in my
mind. And I can measure it. Mathematical reality is a beautiful wonderland of
my own creation, and I can explore it and think about it and talk about it with
my friends.
Now, there are lots of reasons people get interested in physical reality.
Astronomers, biologists, chemists, and all the rest are trying to figure out how
it works, to describe it.
I want to describe mathematical reality. To make patterns. To figure out
how they work. That’s what mathematicians like me try to do.
The point is I get to have them both—physical reality and mathematical
reality. Both are beautiful and interesting (and somewhat frightening). The
former is important to me because I am in it, the latter because it is in me. I
get to have both these wonderful things in my life and so do you.
My idea with this book is that we will design patterns. We’ll make patterns
of shape and motion, and then we will try to understand our patterns and
measure them. And we will see beautiful things!
But I won’t lie to you: this is going to be very hard work. Mathematical
reality is an infinite jungle full of enchanting mysteries, but the jungle does
not give up its secrets easily. Be prepared to struggle, both intellectually and
creatively. The truth is, I don’t know of any human activity as demanding of
one’s imagination, intuition, and ingenuity. But I do it anyway. I do it because
I love it and because I can’t help it. Once you’ve been to the jungle, you can
never really leave. It haunts your waking dreams.
So I invite you to go on an amazing adventure! And of course, I want you
to love the jungle and to fall under its spell. What I’ve tried to do in this book
is to express how math feels to me and to show you a few of our most
beautiful and exciting discoveries. Don’t expect any footnotes or references or
anything scholarly like that. This is personal. I just hope I can manage to
convey these deep and fascinating ideas in a way that is comprehensible and
fun.
Still, expect it to be slow going. I have no desire to baby you or to protect
you from the truth, and I’m not going to apologize for how hard it is. Let it
take hours or even days for a new idea to sink in—it may have originally
taken centuries!
I’m going to assume that you love beautiful things and are curious to learn
about them. The only things you will need on this journey are common sense
and simple human curiosity. So relax. Art is to be enjoyed, and this is an art
book. Math is not a race or a contest; it’s just you playing with your own
imagination. Have a wonderful time!
ON PROBLEMS
When you connect each corner of a triangle to the middle of the opposite
side, the three lines seem to all meet at a point. You try this for a wide variety
of triangles, and it always seems to happen. Now you have a mystery! But
let’s be very clear about exactly what the mystery is. It’s not about your
drawings or what appears to be happening on paper. The question of what
pencil-and-paper triangles may or may not do is a scientific one about
physical reality. If your drawing is sloppy, for example, then the lines won’t
meet. I suppose you could make an extremely careful drawing and put it
under a microscope, but you would learn a lot more about graphite and paper
fibers than you would about triangles.
The real mystery is about imaginary, too-perfect-to-exist triangles, and the
question is whether these three perfect lines meet in a perfect point in
mathematical reality. No pencils or microscopes will help you now. (This is a
distinction I will be stressing throughout the book, probably to the point of
annoyance.) So how are we to address such a question? Can anything ever
really be known about such imaginary objects? What form could such
knowledge take?
Before examining these issues, let’s take a moment to simply delight in the
question itself and to appreciate what is being said here about the nature of
mathematical reality.
This kind of triangle is also called equilateral (Latin for “same sides”).
Now, I know this is an absurdly atypical situation, but the idea is that if we
can somehow explain why the lines meet in this special case, it might give us
a clue about how to proceed with a more general triangle. Or it might not.
You never know, you just have to mess around—what we mathematicians like
to call “doing research.”
In any event, we have to start somewhere, and it should at least be easier to
figure something out in this case. What we have going for us in this special
situation is tons of symmetry. Do not ignore symmetry! In many ways, it is
our most powerful mathematical tool. (Put it in your backpack with your
machete and canteen.)
Here symmetry allows us to conclude that anything that happens on one
side of the triangle must also happen on the other. Another way to say this is
that if we flipped the triangle across its line of symmetry, it would look
exactly the same.
In particular, the midpoints of the two sides would switch places, as would
the lines connecting them to their opposite corners.
But this means that the crossing point of these two lines can’t be on one
side of the line of symmetry, else when we flip the triangle it would move to
the other side, and we could tell that it got flipped!
So the crossing point must actually be on the line of symmetry. Clearly our
third line (the one connecting the top corner to the middle of the bottom side)
is simply the line of symmetry itself, and so that is why all three lines meet at
a point. Isn’t that a nice explanation?
This is an example of a mathematical argument, otherwise known as a
proof. A proof is simply a story. The characters are the elements of the
problem, and the plot is up to you. The goal, as in any literary fiction, is to
write a story that is compelling as a narrative. In the case of mathematics, this
means that the plot not only has to make logical sense but also be simple and
elegant. No one likes a meandering, complicated quagmire of a proof. We
want to follow along rationally to be sure, but we also want to be charmed
and swept off our feet aesthetically. A proof should be lovely as well as
logical.
Which brings me to another piece of advice: improve your proofs. Just
because you have an explanation doesn’t mean it’s the best explanation. Can
you eliminate any unnecessary clutter or complexity? Can you find an entirely
different approach that gives you deeper insight? Prove, prove, and prove
again. Painters, sculptors, and poets do the same thing.
Our proof just now, for instance, despite its logical clarity and simplicity,
has a slightly arbitrary feature. Even though we made an essential use of
symmetry, there’s something annoyingly asymmetrical about the proof (at
least to me). Specifically, the argument favors one corner. Not that it’s so very
bad to pick one corner and use its line as our line of symmetry; it’s just that
the triangle is so symmetrical that we shouldn’t have to make such an arbitrary
choice.
We could, for instance, use the fact that in addition to having flip-
symmetry, our triangle is also rotationally symmetric. That is, if we turn it
one-third of a full turn around, it looks exactly the same. This means that our
triangle must have a center.
Now, if we flip the triangle across any of its three lines of symmetry
(favoring none of them), the triangle doesn’t change, so its center must stay
put. This means that the center point lies on all three lines of symmetry. So
that’s why the lines all meet!
Now, I’m not trying to say that this argument is so much better or even all
that different. (And in fact, there are lots of other ways to prove it.) All I’m
saying is that deeper insight and understanding can be gained by coming at a
problem in more than one way. In particular, the second proof not only tells
me that the lines meet, it tells me where—namely, at the center of rotation.
Which makes me wonder, where exactly is that? Specifically, how far up an
equilateral triangle is its center?
Throughout the book, questions like this will come up. Part of becoming a
mathematician is learning to ask such questions, to poke your stick around
looking for new and exciting truths to uncover. Problems and questions that
occur to me I will put in boldface type. Then you can think about them and
work on them as you please and hopefully also come up with problems of
your own. So here’s your first one:
Now going back to the original problem, we see that we have barely made
a dent. We have an explanation for why the lines meet in an equilateral
triangle, but our arguments are so dependent on symmetry, it’s hard to see
how this will help in the more general situation. Actually, I suppose our first
argument still works if our triangle has two equal sides:
The reason is that this kind of triangle, known as isosceles (Greek for
“same legs”), still possesses a line of symmetry. This is a nice example of
generalization—getting a problem or an argument to make sense in a wider
context. But still, for the average asymmetrical triangle, our arguments clearly
won’t work.
This puts us in a place that is all too familiar to mathematicians. It’s called
stuck. We need a new idea, preferably one that doesn’t hinge so much on
symmetry. So let’s go back to the drawing board.
What do we notice? Well, we’ve divided the original triangle into four
smaller ones. In the symmetrical case, they are clearly identical. What
happens in general?
Are the triangles all the same? Actually, it looks like three of them might
just be smaller (half-scale) versions of the original triangle. Could that be
true? What about the middle one? Could it also be the same, only rotated
upside down? What exactly have we stumbled onto here?
We’ve stumbled onto a glimmer of truth, pattern, and beauty, that’s what.
And maybe this will lead to something wholly unexpected, possibly having
nothing to do with our original problem. So be it. There’s nothing sacred
about our three lines problem; it’s a question like any other. If your thoughts
on one problem lead you to another, then good for you! Now you have two
problems to work on. My advice: be open-minded and flexible. Let a problem
take you where it takes you. If you come across a river in the jungle, follow it!
Let’s suppose this is true. And that, by the way, is a perfectly fine thing to
do. Mathematicians are always supposing things and seeing what would
happen (the Greeks even had a word for it—they called it analysis). There are
thousands of apparent mathematical truths out there that we humans have
discovered and believe to be true but have so far been unable to prove. They
are called conjectures. A conjecture is simply a statement about mathematical
reality that you believe to be true (usually you also have some examples to
back it up, so it is a reasonably educated guess). I hope that you will find
yourself conjecturing all over the place as you read this book and do
mathematics. Maybe you will even prove some of your conjectures. Then you
get to call them theorems.
Supposing that our conjecture about the four triangles is true (and, of
course, we still want a nice proof of this), the next question would be whether
this helps us solve our original problem. Maybe it will, maybe it won’t. You
just have to see if anything comes to you.
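By the way, if you want more evidence for our four-triangles conjecture before you hunt for a proof, a computer makes a tireless generator of examples. Here is a little experiment (a sketch in Python, my choice of tool, nothing canonical about it) that draws hundreds of random triangles, connects the midpoints, and checks that all four pieces have sides exactly half as long as the original’s:

```python
import math
import random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mid(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def sides(p, q, r):
    # the three side lengths of a triangle, in increasing order
    return sorted([dist(p, q), dist(q, r), dist(r, p)])

random.seed(2)
for _ in range(500):
    a, b, c = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(3)]
    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    # a half-scale copy of the original would have these side lengths
    half = [s / 2 for s in sides(a, b, c)]
    # the three corner triangles and the middle one
    for tri in [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]:
        for s, h in zip(sides(*tri), half):
            assert abs(s - h) < 1e-9
```

Of course, five hundred happy examples prove nothing; they only make the conjecture a more reasonably educated guess.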
Essentially, engaging in the practice of mathematics means that you are
playing around, making observations and discoveries, constructing examples
(as well as counterexamples), formulating conjectures, and then—the hard
part—proving them. I hope you will find this work fascinating and
entertaining, challenging, and ultimately deeply rewarding.
So I will leave the problem of the triangle and its intersecting lines in your
capable hands.
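If you would like a bit of encouragement before you start, you can at least play the “wide variety of triangles” game electronically. This Python sketch (again an experiment, not a proof) finds where two of the medians of a random triangle cross and checks that the third passes through the same point:

```python
import random

def mid(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def cross(o, p, q):
    # twice the signed area of triangle o,p,q; zero exactly when the
    # three points lie on a common line
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

def meet(p1, p2, q1, q2):
    # the point where line p1-p2 crosses line q1-q2
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (q2[0] - q1[0], q2[1] - q1[1])
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    t = ((q1[0] - p1[0]) * d2[1] - (q1[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

random.seed(1)
for _ in range(1000):
    a, b, c = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(3)]
    if abs(cross(a, b, c)) < 0.1:
        continue  # skip nearly flat triangles, where roundoff takes over
    # where the first two medians cross ...
    x = meet(a, mid(b, c), b, mid(c, a))
    # ... also lies on the third median (up to floating-point fuzz)
    assert abs(cross(c, mid(a, b), x)) < 1e-6
```

A thousand well-behaved triangles make the mystery more tantalizing, not solved.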
Which brings me to my next bit of advice: critique your work. Subject your
arguments to scathing criticism by yourself and by others. That’s what all
artists do, especially mathematicians. As I’ve said, for a piece of mathematics
to fully qualify as such, it has to stand up to two very different kinds of
criticism: it must be logically sound and convincing as a rational argument,
and it must also be elegant, revelatory, and emotionally satisfying. I’m sorry
that these criteria are so painfully steep, but that is the nature of the art.
Now, aesthetic judgments are obviously quite personal, and they can
change with time and place. Certainly that has happened with mathematics no
less than with other human endeavors. An argument that was considered
beautiful a thousand or even a hundred years ago might now be looked upon
as clumsy and inelegant. (A lot of classical Greek mathematics, for example,
appears quite dreadful to my modern sensibilities.)
My advice is not to worry about trying to hold yourself to some impossibly
high standard of aesthetic excellence. If you like your proof (and most of us
are fairly proud of our hard-won creations), then it is good. If you are
dissatisfied in some way (and most of us are), then you have more work to do.
As you gain experience, your taste will grow and develop, and you may find
later that you are unhappy with some of your earlier work. That is as it should
be.
I think the same could be said for logical validity as well. As you do more
mathematics, you will literally get smarter. Your logical reasoning will
become tighter, and you will begin to develop a mathematical “nose.” You
will learn to be suspicious, to sense that some important details have been
glossed over. So let that happen.
Now, there is a certain obnoxious type of mathematician who simply
cannot allow false statements to be made at any time. I am not one of them. I
believe in making a mess—that’s how great art happens. So your first essays
in this craft are likely to be logical disasters. You will believe things to be
true, and they won’t be. Your reasoning will be flawed. You will jump to
conclusions. Well, go ahead and jump. The only person you have to satisfy is
yourself. Believe me, you will discover plenty of errors in your own
deduction. You will declare yourself a genius at breakfast and an idiot at
lunch. We’ve all done it.
Part of the problem is that we are so concerned with our ideas being simple
and beautiful that when we do have a pretty idea, we want so much to believe
it. We want it to be true so badly that we don’t always give it the careful
scrutiny that we should. It’s the mathematical version of “rapture of the
deep.” Divers see such beautiful sights that they forget to come up for air.
Well, logic is our air, and careful reasoning is how we breathe. So don’t forget
to breathe!
The real difference between you and more experienced mathematicians is
that we’ve seen a lot more ways that we can fool ourselves. So we have more
nagging doubts and therefore insist on a much higher standard of logical rigor.
We learn to play the devil’s advocate.
Whenever I am working on a conjecture, I always entertain the possibility
that it is false. Sometimes I work to prove it, other times I try to refute it—to
prove myself wrong. Occasionally, I discover a counterexample showing that
I was indeed misled and that I need to refine or possibly scrap my conjecture.
Still other times, my attempts to construct a counterexample keep running
into the same barrier, and this barrier then becomes the key to my eventual
proof. The point is to keep an open mind and not to let your hopes and wishes
interfere with your pursuit of truth.
Of course, as much as we mathematicians may ultimately insist on the most
persnickety level of logical clarity, we also know from experience when a
proof “smells right,” and it is clear that we could supply the necessary details
if we wished. The truth of the matter is that math is a human activity, and we
humans make mistakes. Great mathematicians have “proved” utter nonsense,
and so will you. (It’s another good reason to collaborate with other people—
they can raise objections to your arguments that you might overlook.)
The point is to get out there in mathematical reality, make some
discoveries, and have fun. Your desire for logical rigor will grow with
experience; don’t worry.
So go ahead and do your mathematical art. Subject it to your own standards
of rationality and beauty. Does it please you? Then great! Are you a
tormented struggling artist? Even better. Welcome to the jungle!
PART ONE
Let me tell you why I find this kind of thing so attractive. First of all, it
involves some of my favorite shapes.
I like these shapes because they are simple and symmetrical. Shapes like
these that are made of straight lines are called polygons (Greek for “many
corners”). A polygon with all its sides the same length and all its angles equal
is called regular. So I guess what I’m saying is, I like regular polygons.
Another reason why the design is appealing is that the pieces fit together so
nicely. There are no gaps between the tiles (I like to think of them as ceramic
tiles, like in a mosaic), and the tiles don’t overlap. At least, that’s how it
appears. Remember, the objects that we’re really talking about are perfect,
imaginary shapes. Just because the picture looks good doesn’t mean that’s
what is really going on. Pictures, no matter how carefully made, are part of
physical reality; they can’t possibly tell us the truth about imaginary,
mathematical objects. Shapes do what they do, not what we want them to do.
So how can we be sure that the polygons really do fit perfectly? For that
matter, how can we know anything about these objects? The point is, we need
to measure them—and not with any clumsy real-world implements like rulers
or protractors, but with our minds. We need to find a way to measure these
shapes using philosophical argument alone.
Do you see that in this case what we need to measure are the angles? In
order to check that a mosaic pattern like this will work, we need to make sure
that at every corner (where the tiles meet) the angles of the polygons add up
to a full turn. For instance, the ordinary square tiling works because the angles
of a square are quarter turns and it takes four of them to make a full turn.
To get a feel for this, you might want to make some paper triangles and cut
off their corners. When you join them together, they will always form a
straight line. What a beautiful discovery! But how can we really know that it
is true?
One way to see it is to view the triangle as being sandwiched between two
parallel lines.
Notice how these lines form Z shapes with the sides of the triangle. (I
suppose you might call the one on the right side a backward Z, but it doesn’t
really matter.) Now, the thing about Z shapes is that their angles are always
equal.
This is because a Z shape is symmetrical: it looks exactly the same if you
rotate it a half turn around its center point. That means the angle at the top
must be the same as the angle at the bottom. Does that make sense? This is a
typical example of a symmetry argument. The invariance of a shape under a
certain set of motions allows us to deduce that two or more measurements
must be the same.
Going back to our triangle sandwich, we see that each angle at the bottom
corresponds to an equal angle at the top.
This means that the three angles of the triangle join together at the top to
form a straight line. So the three turns add up to a half turn. What a delightful
piece of mathematical reasoning!
This is what it means to do mathematics. To make a discovery (by
whatever means, including playing around with physical models like paper,
string, and rubber bands), and then to explain it in the simplest and most
elegant way possible. This is the art of it, and this is why it is so challenging
and fun.
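Incidentally, the paper-triangle experiment has an electronic cousin. This Python sketch (just another way of playing with models, not a substitute for the sandwich argument) measures the three angles of random triangles in turns, so that a half turn is 0.5, and confirms the sum to the limits of floating-point arithmetic:

```python
import math
import random

def angle_at(p, q, r):
    # interior angle at corner p of triangle p,q,r, measured in full turns
    a1 = math.atan2(q[1] - p[1], q[0] - p[0])
    a2 = math.atan2(r[1] - p[1], r[0] - p[0])
    turns = abs(a1 - a2) / (2 * math.pi)
    return min(turns, 1 - turns)

random.seed(3)
for _ in range(1000):
    p, q, r = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(3)]
    total = angle_at(p, q, r) + angle_at(q, r, p) + angle_at(r, p, q)
    assert abs(total - 0.5) < 1e-9
```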
One consequence of this discovery is that if our triangle happens to be
equilateral (that is, regular) then its angles are all equal, so they must each be
1/6. Another way to see this is to imagine driving around the perimeter of the
triangle.
We make three equal turns to get back to where we started. Since we end
up making one complete turn, each of these must be exactly 1/3. Notice that
the turns we’ve made are actually the outside angles of the triangle.
Since the inside and outside angles combine to make a half turn, the inside
angles must be 1/2 - 1/3 = 1/6 of a full turn.
Hey, this makes a regular hexagon! So as a bonus, we get that the angles of
a regular hexagon must be twice those of the triangle, in other words 1/3. This
means that three hexagons fit together perfectly.
So it works!
(By the way, if you don’t like doing arithmetic with fractions, you can
always avoid it by changing your measuring units. For example, if you’d
prefer, we could decide to measure angles in twelfths of a turn, so that the
angle of a regular hexagon would simply be 4, a square would have an angle
of 3, and a triangle an angle of 2. Then the angles of our shapes would add up
to 4 + 3 + 3 + 2 = 12; that is, a full turn.)
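Since the whole game is fraction arithmetic, it is easy to have a computer double-check our angle bookkeeping. A sketch in Python, using exact fractions of a full turn:

```python
from fractions import Fraction

# the angles we have measured so far, as fractions of a full turn
TRIANGLE = Fraction(1, 6)
SQUARE = Fraction(1, 4)
HEXAGON = Fraction(1, 3)

# the three classic tilings: the angles around each corner make one full turn
assert 6 * TRIANGLE == 1
assert 4 * SQUARE == 1
assert 3 * HEXAGON == 1

# the hexagon, square, triangle, square corner of our mosaic
assert HEXAGON + SQUARE + TRIANGLE + SQUARE == 1

# the same check in twelfths of a turn: 4 + 3 + 2 + 3 = 12
assert [a * 12 for a in (HEXAGON, SQUARE, TRIANGLE, SQUARE)] == [4, 3, 2, 3]
```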
I especially love how symmetrical this mosaic pattern is. Each corner has
the same exact sequence of shapes around it: hexagon, square, triangle,
square. This means that once we’ve checked that the angles fit at one corner,
we automatically know that they work at all the other corners. Notice that the
pattern can be continued indefinitely so that it covers an entire infinite plane.
It makes me wonder: what other beautiful mosaic patterns might be out there
in mathematical reality?
Naturally, we’re going to need to know the angles of the various regular
polygons. Can you figure out how to measure them?
This time we have squares and triangles, but instead of lying flat, they are
arranged to form a sort of ball shape. This kind of object is called a
polyhedron (Greek for “many sides”). People have been playing around with
them for thousands of years. One approach to thinking about them is to
imagine unfolding them flat onto a plane. For example, one corner of my
shape would unfold to look like this:
Here, we have two squares and two triangles around a point, but they leave
a gap so that the shape can be folded up into a ball. So in the case of
polyhedra, we need the angles to add up to less than a full turn.
Another difference between polyhedra and flat mosaics is that the design
involves only a finite number of tiles. The pattern will still go on forever (in a
sense), but it will not extend indefinitely into space. Naturally, I’m curious
about these patterns, too.
In other words, what are all the different ways to make polyhedra out of
regular polygons so that at each corner we see the same pattern? Archimedes
figured out all of the possibilities. Can you?
Of course, the most symmetrical kind of polyhedron would be one where
all the faces are identical, like a cube. These are called regular polyhedra. It
is an ancient discovery that there are exactly five of these (the so-called
Platonic solids). Can you find all five?
2
What is measuring? What exactly are we doing when we measure something?
I think it is this: we are making a comparison. We are comparing the thing we
are measuring to the thing we are measuring it with. In other words,
measuring is relative. Any measurement that we make, whether real or
imaginary, will necessarily depend on our choice of measuring unit. In the
real world, we deal with these choices every day—a cup of sugar, a ton of
coal, a thing of fries, whatever.
The question is, what sort of units do we want for our imaginary
mathematical universe? For instance, how are we going to measure the
lengths of these two sticks?
Let’s suppose (for the sake of argument) that the first stick is exactly twice
as long as the second. Does it really matter how many inches or centimeters
they come out to be? I certainly don’t want to subject my beautiful
mathematical universe to something mundane and arbitrary like that. For me,
it’s the proportion (that 2:1 ratio) that’s the important thing. In other words,
I’m going to measure these sticks relative to each other.
One way to think of it is that we simply aren’t going to have any units at
all, just proportions. Since there isn’t a natural choice of unit for measuring
length, we won’t have one. So there. The sticks are just exactly as long as
they are. But the first one is twice as long as the second.
The other way to go is to say that since the units don’t matter, we’ll choose
whatever unit is convenient. For example, I could choose the second stick to
be my unit, or ruler, so that the lengths come out nice. The first stick has
length 2, the second stick has length 1. I could just as easily say the lengths
are 4 and 2, 6 and 3, or 1 and 1/2. It just doesn’t matter. When we make
shapes or patterns and measure them, we can choose any unit that we want to,
keeping in mind that what we are really measuring is a proportion.
I guess a simple example would be the perimeter of a square. If we choose
our unit to be the side of the square (and why not?), then the perimeter would
obviously be 4. What that really means is that for any square, the perimeter is
four times as long as the side.
This business of units is related to the idea of scale. If we take some shape
and blow it up by a certain factor, say 2, then all of our length measurements
on the big shape will come out just as if we were measuring the original shape
with a half-size ruler.
Let’s call the process of blowing up (or shrinking down) scaling. So the
second shape is obtained from the first by scaling by a factor of 2. Or, if we
like, we could say that the first shape is the second one scaled by a factor of
1/2.
Two figures related by a scaling are called similar. All I’m really trying to
say here is that if two shapes are similar, related by a certain scaling factor,
then all corresponding length measurements are related by that same factor.
People say that such things are “in proportion.” Notice that scaling doesn’t
affect angles at all. The shape stays the same, only the size changes.
The nice thing about not having arbitrary units and always choosing to
measure relative proportions is that it makes all our questions scale
independent. To me, this is the simplest and most aesthetically pleasing
approach. And given the fact that your shapes are in your head and mine are
in mine, I really don’t see any other alternative. Is your imaginary circle
bigger or smaller than mine? Does that question even have any meaning?
But before we can begin to go about measuring something, we need to
know precisely what object it is that we are talking about.
Let’s suppose I have a square.
Now, there are some things I know about this shape right off the bat, such
as the fact that it has four equal sides. The thing about information like this is
that it is not really a discovery, nor does it require any explanation or proof.
It’s simply part of what I mean by the word square. Whenever you create or
define a mathematical object, it always carries with it the blueprint of its own
construction—the defining features that make it what it is and not some other
thing. The questions we are asking as mathematicians then take this form: If I
ask for such and such, what else do I get as a consequence? For example, if I
ask for four equal sides, does that force my shape to be a square? Clearly, it
doesn’t.
Suppose we ask that the angles of our rhombus (as a four-equal-sided shape is called) all be right angles. That
certainly forces our shape to be a square, because that’s what the word square
means! Now is there any room left for it to wiggle around? There is in fact
one more degree of freedom remaining, which is that it could change its size.
(This would be relative, of course, to some other object we are considering. If
all we had were a square, then size would have no meaning.)
3
Two objects with the same shape (that is, similar objects) are easy to compare
—the bigger one is bigger and the smaller one is smaller. It’s when we
compare different shapes that things get interesting. For instance, which one
of these is bigger, and what does that even mean?
One idea is to compare the amount of space the two shapes take up. This
measurement is usually called area. As with any measurement, there is no
such thing as absolute area—only area relative to other areas. Our choice of
unit is arbitrary; we could choose any shape and call the amount of space it
occupies “one unit of area,” and all other areas could be measured against it.
On the other hand, once we make a choice of length unit, there is a natural
(and traditional) choice of area unit, namely the amount of space occupied by
a square of unit sides.
So the measurement of area really boils down to the question, how much
room is my shape taking up compared to a unit square?
Some areas are relatively easy to measure. For example, suppose we have a
3 by 5 rectangle.
It is easy to see that we can chop this rectangle into fifteen identical pieces,
each of which is a unit square. So the area of the rectangle is 15. That is, it
takes up exactly fifteen times as much space as a unit square does. In general,
if the sides of a rectangle are nice whole numbers, say m and n, then the area
is simply their product, mn. We can just count the m rows of n squares each.
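If you like, the counting can be spelled out quite literally, say in Python:

```python
# an m-by-n rectangle chopped into unit squares: m rows of n squares each
m, n = 3, 5
squares = [(row, col) for row in range(m) for col in range(n)]
assert len(squares) == m * n == 15
```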
But what if the sides don’t come out even? How can we measure the area
of a rectangle if we can’t chop it up nicely into unit squares?
Here are two rectangles of the same height.
One interesting feature of area is the way it behaves with respect to scaling.
We can think of a scaling as the result of two stretches by the same factor,
one in each direction. If we have a square, and we scale it by a factor of r,
then its area will get multiplied by r². For example, if you blow up a square
by a factor of 2, its perimeter will double, but its area will quadruple.
As a matter of fact, this will be true for any shape. The effect of scaling on
area is to multiply by the square of the scaling factor, no matter what shape
you’re dealing with. A nice way to see this is to imagine a square with the
same area as your shape.
After scaling by a factor of r, their areas will still be equal—the two shapes
enclose the same amount of space whether or not I change my ruler. Since the
area of the square gets multiplied by r², so must the area of the other shape.
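We can also watch this happen numerically. The sketch below (in Python; the “shoelace” area formula it uses is a standard computational tool, though nothing in this book depends on it) scales an arbitrary polygon and compares areas:

```python
import math

def shoelace_area(pts):
    # the standard "shoelace" formula for the area of a polygon
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

# an arbitrary, not at all symmetrical polygon
shape = [(0, 0), (4, 1), (5, 3), (2, 4), (-1, 2)]
for r in [2, 3, 0.5]:
    scaled = [(r * x, r * y) for x, y in shape]
    # scaling by r multiplies the area by r squared
    assert math.isclose(shoelace_area(scaled), r ** 2 * shoelace_area(shape))
```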
There is also the question of three-dimensional size. This is usually called
volume. Naturally, we can take as our unit of volume that of a cube with unit
sides. The first question is how to measure a simple three-dimensional box.
4
The study of size and shape is called geometry. One of the oldest and most
influential problems in the history of geometry is this one: How long is the
diagonal of a square?
Naturally, what we are really asking about is the proportion of diagonal to
side. For convenience, let’s take the side of the square to have length 1, and
write d for the length of the diagonal. Now look at this design.
The only way this can happen is if the top number a when multiplied by
itself is exactly twice as big as the bottom number b multiplied by itself. In
other words, we need to find two whole numbers a and b so that
a² = 2b².
Since we’re only interested in the ratio a/b, there is no point in looking at
numbers a and b that are both even (we could just cancel any common factors
of 2). We can also rule out the possibility that a is odd: if a were an odd
number, then a² would also be odd, and there would be no way for it to be
double the size of b².
So the only numbers we need to consider are those where a is even and b
is odd. But then a² is not only even but twice an even (that is, divisible by 4).
Do you see why?
Now, since b is odd, b² must also be odd, and so 2b² is twice an odd. But
we need a² to be equal to 2b². How can twice an even be twice an odd? It
can’t.
What does this mean? It means that there simply aren’t any whole numbers
a and b with a² = 2b². In other words, there is no fraction whose square is 2.
Our diagonal to side proportion d cannot be expressed as a fraction in any
way—no matter how many pieces we divide our unit into, the diagonal will
never come out evenly.
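A brute-force search makes a nice companion to the parity argument. This little Python sketch (my own illustration) hunts for whole numbers a and b with a² = 2b²; the proof says it will always come up empty-handed:

```python
# Search for whole numbers a, b with a*a == 2*b*b.
# The parity argument above guarantees there are none.

def search(limit):
    hits = []
    for a in range(1, limit + 1):
        for b in range(1, limit + 1):
            if a * a == 2 * b * b:
                hits.append((a, b))
    return hits

print(search(500))  # [] -- no solutions, just as the argument predicts
```

Of course, a search up to 500 proves nothing by itself; that is exactly why the parity argument matters.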
This discovery tends to have a rather unsettling effect on people. Usually,
when we think about measuring something, we imagine it requiring only a
finite number of applications of our ruler (including possibly dividing it into
smaller, equal-size pieces). But this is not the case in mathematical reality.
Instead, we find that there are geometric measurements (e.g., the diagonal and
side of a square) that are incommensurable—that is, not simultaneously
measurable as multiples of a common unit. This forces us to abandon the
naïve idea that all measurements are describable as whole number
proportions.
This number d we’ve discovered is called the square root of 2 and is
written √2. Of course, this is really just a convenient shorthand way of saying,
“the number that when multiplied by itself is 2.” In other words, the only
thing we really know about √2 is that its square is 2. We have no hope of
saying what this number is (at least as a whole number fraction), though of
course we can approximate it. For example, √2 ≈ 1.4142. Whatever. That’s
hardly the point. We want to understand the truth.
Well, the truth seems to be that we can’t really measure the diagonal of a
square. This is not to say that the diagonal doesn’t exist or that it doesn’t have
a length. It does. The number is out there; we just can’t talk about it in the
way we want to. The problem is not with the diagonal; it’s with our language.
Maybe it’s the price we pay for mathematical beauty. We’ve created this
imaginary universe (the only place where measurement is truly possible), and
now we have to face the consequences. Numbers like this that cannot be
expressed as fractions are called irrational (meaning “not a ratio”). They
arise naturally in geometry, and we just have to somehow get comfortable
with that. The diagonal of a square is precisely √2 times as long as the side,
and that’s all we can really say about it.
The big circles are clearly half as wide as the square. How
about the small circle?
5
What about the diagonal of a rectangle?
Of course, it depends on how long the sides are, but in what way? The
relationship between the diagonal and the sides was discovered about four
thousand years ago, and it’s just as surprising now as it was then.
Notice how the diagonal cuts the rectangle into two identical triangles.
Let’s take one of these triangles and put a square on each of its sides.
The amazing discovery is this: the big square takes up exactly as much area
as the two smaller squares put together. No matter what shape the rectangle
has, its sides and diagonal will always conspire to make these squares add up
this way.
But why on earth should that be true? Here is a pretty way to see it using
mosaic designs.
The first one uses the two smaller squares, together with four copies of the
triangle, to make one big square. The second design uses the larger square
(the one built on the diagonal) and those same four triangles to make another
big square. The point is that these two big squares are identical; they both
have sides equal to the two sides of the rectangle added together. In particular,
this means that the two mosaics have the same total area. Now, if we remove
the four triangles from each, the remaining areas must also match, so the two
smaller squares really do take up exactly as much space as the larger one.
Let’s call the sides of the rectangle a and b and the diagonal c. Then the
square of side a together with the square of side b has the same total area as
the square of side c. In other words,
a² + b² = c².
This is the famous Pythagorean theorem relating the diagonal and sides of
a rectangle. It’s named after the Greek philosopher Pythagoras (circa 500 BC),
although the discovery is actually far older, dating back to the ancient
Babylonian and Egyptian civilizations.
For example, we find that a 1 by 2 rectangle has a diagonal of length √5.
As usual, this number is hopelessly irrational. Generally speaking, a rectangle
whose sides are nice whole numbers will almost always have an irrational
diagonal. This is because the Pythagorean relation involves the square of the
diagonal rather than the diagonal itself. On the other hand, a 3 by 4 rectangle
has a diagonal of length 5, since 3² + 4² = 5². Can you find any other nice
rectangles like that?
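If you want to hunt for these “nice” rectangles by machine, here is a small Python sketch (an illustration of mine, not part of the problem) that searches for whole-number sides whose diagonal is also a whole number:

```python
# Search for rectangles with whole-number sides a, b whose diagonal c
# is also a whole number -- that is, Pythagorean triples a^2 + b^2 = c^2.

import math

def nice_rectangles(limit):
    found = []
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            c = math.isqrt(a * a + b * b)
            if c * c == a * a + b * b:
                found.append((a, b, c))
    return found

print(nice_rectangles(20))
# includes (3, 4, 5), (5, 12, 13), (8, 15, 17), and multiples like (6, 8, 10)
```

Searching is one thing; the deeper question of describing all such triples is a classic problem in its own right.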
6
I think we’re now in a position to do some serious measuring, but before we
do, I want to address a serious question. Why are we doing this? What is the
point of making up these imaginary shapes and then trying to measure them?
It’s certainly not for any practical purpose. In fact, these imaginary shapes
are actually harder to measure than real ones. Measuring the diagonal of a
rectangle requires insight and ingenuity; measuring the diagonal of a piece of
paper is easy—just get out a ruler. There are no truths, no surprises, no
philosophical problems at all. No, the issues we’re going to be dealing with
have nothing to do with the real world in any way. For one thing, the patterns
we will choose to measure will be chosen because they are beautiful and
curious, not because they are useful. People don’t do mathematics because it’s
useful. They do it because it’s interesting.
But what’s so interesting about a bunch of measurements? Who cares what
the length of some diagonal happens to be, or how much space some
imaginary shape takes up? Those numbers are what they are. Does it really
matter what?
Actually, I don’t think it does. The point of a measurement problem is not
what the measurement is; it’s how to figure out what it is. The answer to the
question about the diagonal of a square is not √2; it’s the mosaic design. (At
least that’s one possible answer!)
The solution to a math problem is not a number; it’s an argument, a proof.
We’re trying to create these little poems of pure reason. Of course, like any
other form of poetry, we want our work to be beautiful as well as meaningful.
Mathematics is the art of explanation, and consequently, it is difficult,
frustrating, and deeply satisfying.
It’s also a great philosophical exercise. We are capable of creating in our
minds perfect imaginary objects, which then have perfect imaginary
measurements. But can we get at them? There are truths out there. Do we
have access to them? It’s really a question about the limits of the human
mind. What can we know? This is the real question at the heart of every
mathematics problem.
So the point of making these measurements is to see if we can. We do it
because it’s a challenge and an adventure and because it’s fun. We do it
because we’re curious, and we want to understand mathematical reality and
the minds that can conceive it.
7
Let’s start by trying to measure the regular polygons. The simplest one is the
equilateral triangle.
The question is, how much of the box does the triangle occupy? (Notice
that this makes the question independent of any choice of units.) There’s a
certain number out there, intrinsic to the nature of triangles and squares,
which is beyond our control. What is it? More important, how can we figure
out what it is?
It turns out that some regular polygons are easier to measure than others.
Depending on the number of sides, these measurements can be more or less
difficult to obtain. For instance, the regular hexagon (six sides) and octagon
(eight sides) are relatively easy to measure, whereas the heptagon (seven
sides) is quite spectacularly difficult.
Can you measure the diagonals and areas of the regular
hexagon and octagon?
Another one you might enjoy measuring is the regular dodecagon (twelve
sides).
I want to show you a very pretty (and ingenious) way to measure the
diagonal. As usual, we’ll take the side of the pentagon to be our unit, and
write d for the diagonal length. The idea is to chop the pentagon into
triangles, like so:
From the picture, it sure looks like triangles A and B are the same. Are
they? It also looks as though triangle C has the same shape as the other two,
only smaller. Does it? We’re asking if the three triangles are similar. It turns
out that they are. The question is why.
For triangle C, on the other hand, it’s the long side that has length 1. What
about the short sides? This is where the cleverness comes in: a short side of C
and a short side of B join together to make a complete diagonal. This means
that the short sides of C must have length d − 1. Here’s a picture of triangle C:
Now, the point is that these two triangles are similar. This means that the
big one is a blowup of the little one by a certain factor. Comparing the long
sides of the two triangles, we can see that the scaling factor must be d itself.
In particular, the short sides of the little triangle, when scaled by this factor,
must become the short sides of the big triangle. This means that our number d
must satisfy the relation
d(d − 1) = 1.
8
The tangling and untangling of numerical relationships is called algebra. This
kind of mathematics has a very long and fascinating history, dating back to
the ancient Babylonians. In fact, the technique I want to show you is over four
thousand years old.
The reason it’s hard to untangle something like d(d − 1) is that instead of
being a square (which could then be square-rooted), it’s a product of two
different numbers. What the Babylonians discovered is that a product of two
numbers can always be expressed as the difference of two squares. This
makes it possible to rewrite relationships in terms of squares, so that square
roots can be used to untangle them.
One way I like to think of it is to imagine the two numbers as sides of a
rectangle, so their product is the area. Then the idea is to even out the sides of
this rectangle by chopping some area off the top and reattaching it to the side.
This forms a square shape with a smaller square notch in it; in other words,
a difference of two squares. In doing this, we are taking off exactly as much
from the long side of the rectangle as we are adding to the short side. This
means that the side of the square will be the average of the two sides of the
rectangle.
As for the little square notch, its side length is just the amount by which the
two sides of the rectangle differ from their average. Let’s call that amount the
spread. Then what we’re saying is this: the product of two numbers is equal
to the square of their average minus the square of their spread. For example,
11 × 15 = 13² − 2².
If a is the average of two numbers and s is their spread, then the numbers
themselves must be a + s and a − s. Our result can then be written
(a + s)(a − s) = a² − s².
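The identity is easy to check with actual numbers. A quick Python sketch, purely illustrative:

```python
# The Babylonian identity: a product of two numbers equals the square of
# their average minus the square of their spread.

def average_and_spread(x, y):
    avg = (x + y) / 2
    spread = abs(x - y) / 2
    return avg, spread

avg, spread = average_and_spread(11, 15)
print(avg, spread)                    # 13.0 2.0
print(11 * 15 == avg**2 - spread**2)  # True
```

Try a few pairs of your own; the identity never fails, because it is just (a + s)(a − s) = a² − s² in disguise.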
Suppose you are given both the sum and difference of two
numbers. How can you determine the numbers
themselves? What if it’s the sum and product that are
given?
Let’s use the Babylonian method to get a new description of our number d.
The average of d and d − 1 is d − 1/2, and the spread is 1/2, so we have
d(d − 1) = (d − 1/2)² − (1/2)².
Now we can rewrite our relationship d(d − 1) = 1 as
(d − 1/2)² − 1/4 = 1,
which means
(d − 1/2)² = 5/4.
The point being that now we can unscramble this using square roots, to get
d − 1/2 = √5/2,
or if you prefer,
d = (1 + √5)/2.
9
So it’s true, the diagonal of a pentagon can be expressed in terms of square
roots. In particular, it’s easy to see from the expression d = (1 + √5)/2 that d
is an irrational number, approximately 1.618. Of course, we could also have
obtained that information directly from d(d − 1) = 1. In fact, the two
expressions are equivalent in every way and tell us exactly the same things
about d. There is not the slightest difference in mathematical content between
the two.
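Both descriptions pin down the same number, and it is easy to check numerically that they agree. A quick Python sketch:

```python
# Check that d = (1 + sqrt(5))/2 really does satisfy d(d - 1) = 1.

import math

d = (1 + math.sqrt(5)) / 2
print(d)            # 1.618..., the diagonal of a unit-sided pentagon
print(d * (d - 1))  # 1.0, up to floating-point error
```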
I suppose the cynical view of the situation would be that we have expended
a great deal of effort to go precisely nowhere. We began with a description of
d as “the number that when multiplied by one less than itself equals 1,” and
we ended with d described as “half of one more than the number whose
square is 5.” That’s progress? If all the information about d is contained in the
original equation, why did we bother solving it?
On the other hand, why bother baking bread? We could just eat the raw
ingredients.
The point of doing algebra is not to solve equations; it’s to allow us to
move back and forth between several equivalent representations, depending
on the situation at hand and depending on our taste. In this sense, all algebraic
manipulation is psychological. The numbers are making themselves known to
us in various ways, and each different representation has its own feel to it and
can give us ideas that might not occur to us otherwise.
For example, the representation makes me think of this picture:
Again, I can’t really help you here; you’re on your own. There is a blank
canvas in front of you, and you need an idea. Maybe you will have one,
maybe not. That’s art.
Two of my favorites.
10
How about a circle? You certainly can’t ask for a prettier shape.
Circles are simple, symmetrical, and elegant. But how on earth are we
going to measure them? For that matter, how will we measure any curved
shapes?
The first thing to notice about a circle is that all the points on it are the
same distance away from the center. That is, after all, what makes it a circle.
That distance is called the radius of the circle. Since all circles have the same
shape, it’s really the radius that makes one circle different from another.
The perimeter of a circle is called its circumference (Latin for “carrying
around”). I think the natural measurements to make of a circle are its area and
circumference.
Let’s start by making some approximations. If we place a certain number of
equally spaced points around the circle, and then connect the dots, we get a
nice regular polygon.
The area and perimeter of this polygon are smaller than the corresponding
measurements for the circle, but they’re pretty close. If we used more points,
we could do even better. Suppose we use some large number of points, say n.
Then we get a regular n-gon whose area and perimeter are really close to the
true area and circumference of the circle. The important thing is that as the
number of sides of the polygon is increased, the approximations get better and
better.
What is the area of this polygon? Let’s chop it into n identical triangles.
Each triangle has width equal to a side of the polygon, say s. The height of
each triangle is the distance from the center of the circle to the side of the
polygon. Let’s call that distance h. Then each triangle will have area (1/2)hs.
This means that the area of the polygon is (1/2)hsn. Notice that sn is just the
perimeter of the polygon. As the number of points grows, the perimeter gets
closer and closer to the circumference C of the circle, and the height h gets
closer and closer to the radius r. So in the limit we can say
A = (1/2)rC.
The area of a circle is exactly half the product of its radius and
circumference.
A nice way to think of this is to imagine unrolling the circumference onto a
line, so that it forms a right triangle with the radius.
What our formula is saying is that the circle takes up exactly the same
amount of space as this triangle.
Something really serious has just happened here. We have somehow
obtained an exact description of the area of a circle using nothing but
approximations. The point is that we didn’t just make a few good
approximations, we made infinitely many. We constructed an infinite
sequence of increasingly better approximations, and there was enough of a
pattern in those approximations that we could tell where they were heading.
In other words, an infinite sequence of lies with a pattern can tell us the truth.
It is arguable that this is the single greatest idea the human race has ever had.
This amazing technique, known as the method of exhaustion, was
invented by the Greek mathematician Eudoxus (a student of Plato) around
370 BC. It allows us to measure curved shapes by constructing an infinite
sequence of straight-line approximations. The trick is to do this in such a way
that the approximations have a pattern to them—an infinite list of random
numbers doesn’t tell us anything. It’s not enough to have an infinite sequence;
we have to be able to read it.
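Here is what that sequence of approximations looks like numerically for the unit circle, using inscribed polygons. This Python sketch is only an illustration (it borrows the sine and cosine from the math library to get the side lengths and heights):

```python
# Areas of regular n-gons inscribed in a circle of radius 1: each triangle
# has area (1/2)*h*s, and there are n of them.

import math

def polygon_area(n, r=1.0):
    s = 2 * r * math.sin(math.pi / n)  # side length of the inscribed n-gon
    h = r * math.cos(math.pi / n)      # distance from center to a side
    return 0.5 * h * s * n

for n in (6, 12, 96, 1000):
    print(n, polygon_area(n))
# the areas climb toward pi = 3.14159..., the true area of the unit circle
```

The approximations themselves are all “lies” (every polygon falls short of the circle), but the pattern in them tells the truth.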
Show that if two points are connected to the same arc, the
resulting angles must be the same.
11
We have expressed the area of a circle in terms of its circumference. But can
we measure the circumference? With a square, it’s natural to measure the
perimeter in proportion to the side length—the ratio of the length around to
the length across. We can do the same thing with a circle. The distance across
a circle is called its diameter (of course it’s just twice the radius). So the
analogous measurement for a circle would be the ratio of the circumference to
the diameter. Since all circles are similar, this ratio is the same for every circle
and is denoted by the Greek letter pi, or π. This number is to circles what 4 is
to squares.
It’s not hard to approximate pi. For example, suppose we put a regular
hexagon inside a circle.
The perimeter of the hexagon is exactly three times the diameter. Since the
circumference of the circle is a bit longer, we see that π is just a little greater
than 3. If we use polygons with more sides we can get better estimates.
Archimedes (circa 250 BC) used a 96-gon to get π ≈ 22/7. Many people are
under the misconception that this is an exact equality, but it isn’t. The actual
value is a bit smaller, a decent approximation being π ≈ 3.1416. Still better is
the fifth-century Chinese estimate π ≈ 355/113.
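You can reproduce these polygon estimates yourself. Here is a Python sketch (an illustration, leaning on the math library’s sine rather than Archimedes’s painstaking hand computation):

```python
# Perimeter of a regular n-gon inscribed in a circle, divided by the
# diameter. This gives a lower estimate for pi that improves with n.

import math

def inscribed_ratio(n):
    return n * math.sin(math.pi / n)

print(inscribed_ratio(6))   # 3.0 (up to floating-point error) -- the hexagon bound
print(inscribed_ratio(96))  # about 3.1410, just under the true value of pi
```

Inscribed polygons approach pi from below; Archimedes also used circumscribed polygons to trap pi from above.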
But what is pi exactly? Well, the news is pretty bad. Pi is irrational (this
was proved by Lambert in 1768), so there’s no hope of expressing it as a ratio
of whole numbers. In particular, there is no way to measure both the diameter
and circumference evenly.
The situation is actually even worse than for the diagonal of a square.
Although √2 is irrational, it is at least describable as “the number whose
square is 2.” In other words, this number satisfies a relation that can be
expressed in the language of whole number arithmetic; namely, it is the
number x such that x² = 2. We may not be able to say what √2 is, but we can
say what it does.
It turns out pi is different. Not only is it incapable of being expressed as a
fraction, but in fact pi fails to satisfy any algebraic relationship whatsoever.
What does pi do? It doesn’t do anything. It is what it is. Numbers like this are
called transcendental (Latin for “climbing beyond”). Transcendental
numbers—and there are lots of them—are simply beyond the power of
algebra to describe. Lindemann proved that pi is transcendental in 1882. It is
an amazing thing that we are able to know something like that.
On the other hand, mathematicians have found alternative descriptions of
pi. For instance, in 1674 Leibniz discovered the formula
π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − ⋯
The idea is that the more terms you add together on the right, the closer the
sum gets to the number on the left. So pi can be expressed as an infinite sum.
This at least provides us with a purely numerical description of pi, and it is
also philosophically quite interesting. More important, such representations
are all we’ve got.
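You can watch Leibniz’s sum crawl toward pi. A small Python sketch, offered only as an illustration:

```python
# Partial sums of the Leibniz series 1 - 1/3 + 1/5 - 1/7 + ...
# Multiplying by 4 gives successive approximations to pi.

def leibniz(terms):
    total = 0.0
    for k in range(terms):
        total += (-1) ** k / (2 * k + 1)
    return total

for n in (10, 1000, 100000):
    print(4 * leibniz(n))
# 3.0418..., 3.1405..., 3.14158... -- slowly creeping toward pi
```

The convergence is famously slow: roughly one more correct digit for every tenfold increase in the number of terms.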
So that’s the story. The ratio of circumference to diameter is pi, and there’s
nothing we can do about it. We’ll simply have to expand our language to
include it.
In particular, a circle of radius 1 has diameter 2, and so its circumference is
2π. The area of this circle is half the product of the radius and circumference,
which is just π. Blowing up by a factor of r, we find that for a circle of radius
r, the circumference and area are given by
C = 2πr,
A = πr².
This object is made in the same way as the cylinder, except the top and
bottom faces are no longer necessarily circles but perhaps some other figure.
Let’s call this sort of thing a generalized cylinder. In this case there isn’t a
nice symmetrical way to chop it up, so the rectangular slicing idea is as good
a way as any. We still get the volume of the generalized cylinder as the
product of its height and base area. My point is that slicing this way works
whether there is symmetry or not. It’s a good example of the flexibility of the
exhaustion technique.
Notice how dramatically the shapes are affected. For instance, the square
becomes a rectangle (so its sides aren’t all the same length anymore). The
equilateral triangle is transformed into a mere isosceles triangle, and the circle
becomes an entirely new shape known as an ellipse.
In general, dilation is a pretty destructive process. Lengths and angles often
get severely distorted. In particular, there is usually no relationship whatever
between the perimeter of a shape before dilation and its perimeter afterward.
The perimeter of an ellipse, for instance, is a very difficult classical
measurement problem, mainly because it has no connection with the
circumference of a circle.
On the other hand, dilation turns out to be very compatible with area. We
already know how dilation affects the area of rectangles: if a rectangle is
dilated by a certain factor (in a direction parallel to one of its sides), then its
area gets multiplied by that same factor. Using the method of exhaustion, we
can see that this remains true for any shape whatsoever. To be precise, let’s
suppose we have some shape, and we dilate by the factor r in some direction.
We want to see that the area of the shape will get multiplied by r.
The idea is to slice our shape into thin rectangular strips, parallel to the
direction of dilation, so that the area of our shape is closely approximated by
the total area of the rectangles.
After dilation, the strips become dilated as well, so their areas get
multiplied by r. This means that the approximate area of the dilated shape is
exactly r times the approximate area of the original. Letting the number of
strips increase indefinitely (so that their thickness approaches zero), we see
that the true areas must be related by a factor of r as well. I think it’s quite
surprising and wonderful that we can keep track of the area of a shape even
after such an extreme distortion.
My favorite way to do this is to place the pyramid in a box with the same
base and height. I like to think of this box as the carrying case for the
pyramid.
The natural question is, how much of the volume of the box does the
pyramid take up? This is a pretty hard problem. It is also very old, dating back
to ancient Egypt (naturally). One way to begin is to notice—very cleverly—
that a cube can be divided into pyramids by joining its center to each of its
eight corners.
There are six of these pyramids because there is one for each face of the
cube. These pyramids are all identical, so each must have a volume equal to
one-sixth of the cube. The carrying case for one of these pyramids would be
half the cube. So these pyramids take up exactly one-third of the volume of
their carrying cases. I think that’s a really pretty argument.
The trouble is this only works for that particular shape of pyramid (where
the height is exactly half the length of a side of the base). Most pyramids
won’t fit together to make anything nice at all. They’re either too steep or too
shallow.
Does this mean that we can only measure one particular shape of pyramid?
Not at all! The point is that any pyramid can be obtained from one of these
special ones by appropriate dilations. If we want a steeper one, we can dilate
vertically by whatever factor we need to until the height matches the one we
want.
Now, here’s my favorite part: dilation affects the volume of the pyramid
and the volume of the carrying case in exactly the same way. They both get
multiplied by the dilation factor. This means that the ratio of the two volumes
is unaffected. Since the proportion was one-third for the special pyramids, it
must be the same for any pyramid. So the volume of a pyramid is always
exactly one-third the volume of the box it sits in. I just love that sequence of
ideas. Notice how subtly and powerfully the method of exhaustion plays its
role.
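For the skeptical, the one-third can also be seen numerically by the slicing idea itself: stack up thin slabs whose cross-sections shrink toward the tip. A Python sketch (my own illustration, not the dilation argument above):

```python
# Approximate a pyramid by a stack of thin boxes. Cross-sectional area
# scales like the square of the distance from the tip, so each slab has
# area base_area * (k/n)^2. The totals should head toward one-third of
# the carrying case (base_area * height).

def pyramid_approx(slices, base_area=1.0, height=1.0):
    thickness = height / slices
    total = 0.0
    for k in range(1, slices + 1):
        scale = k / slices  # fraction of the way down from the tip
        total += base_area * scale**2 * thickness
    return total

for n in (10, 100, 10000):
    print(pyramid_approx(n))
# the totals head toward 1/3, the pyramid's share of its box
```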
13
Let’s try to measure the volume of a cone.
I hope you agree that this is a beautiful and interesting shape. Naturally,
some sort of exhaustion technique is called for, and the first idea that comes
to mind is to approximate the cone by a stack of thin cylinders.
As the number of slices increases, and the cylinders get thinner, their total
volume will approach the true volume of the cone. All we need to do is to
figure out what the pattern is to these approximations and see where they are
heading.
Unfortunately, this turns out not to be so easy. The volume of each cylinder
depends on its radius, and these radii are varying as we move up and down
the cone. It’s a bit delicate. In fact, it would require a fairly expert algebraist
to read this pattern and understand its behavior.
Imagine a pyramid of the same height as the cone, with a square base equal
in area to the circular base of the cone. The idea is to show that these two
objects have the same volume.
To do this, let’s go back to our slicing idea. This time, we’ll slice both the
cone and the pyramid.
Notice that when we slice through a cone like this we create a small cone
on top. This little cone has exactly the same shape as the original, only
smaller. In other words, it is a scaled-down version of the large cone. Same
for the pyramid. In fact, since the cone and the pyramid have the same height,
and the little ones also have the same height, the scaling factor must be the
same for both shapes. Since the original cone and pyramid have equal base
areas, so must their scaled versions.
The point I’m trying to make is that no matter what height we slice at, the
cone and the pyramid will have equal cross-sectional areas. While I’m at it,
I’d like to be able to say that they have “equal cross-sections” and have you
understand that I mean the areas match, not that the cross-sections are
necessarily the same identical shape. So the cone and pyramid have equal
cross-sections.
This means that back in our volume approximation, corresponding
cylinders and boxes have equal bases. Since they also have the same
thickness, their volumes must agree. So each little cylinder has the same exact
volume as the corresponding box. In particular, the total volume of all the
cylinders must equal the total volume of all the boxes. This means that no
matter how many slices we make, the approximations for the cone and
pyramid will always match.
As the slices get thinner, these approximations are then heading
simultaneously toward the volume of the cone and the volume of the pyramid.
It must be that these volumes are exactly the same. In other words, the
volume of a cone is equal to the volume of a pyramid with the same height
and base area.
What I love about this is that we don’t have to figure out what the patterns
to these approximations are, only that they’re the same as each other. We
manage to avoid a difficult algebraic computation by making a well-chosen
comparison.
To make this even nicer, let’s put our cone inside a cylinder. This is
analogous to the pyramid sitting inside its box.
Since the cylinder and the box have the same size base and height, they
must have the same volume as well. In particular, the cone must take up
exactly one-third of its carrying case just as the pyramid does.
Now, it turns out that the cone is just the tip of the iceberg. This
comparison idea is really quite general. Any solid object can be approximated
by a stack of thin (generalized) cylinders, and if two such objects can be
arranged so that they have equal cross-sections, then the method of
exhaustion will guarantee that they have equal volumes.
This is a very old and beautiful result, known as the Cavalieri principle.
(Although it was originally developed by Archimedes, the technique was
rediscovered in the 1630s by Galileo’s student Bonaventura Cavalieri.) The
idea is not to calculate volumes but to compare them; the trick is in choosing
the right objects to compare.
In order to use the Cavalieri principle, it is necessary to position the two
objects in space so that for every horizontal slicing plane, the corresponding
cross-sections always have equal areas. (In particular, the objects must have
the same height.) This ensures that no matter how fine the cylindrical
approximations become, they will continue to agree in volume.
It is also important to understand that this is not the only way that two
objects can have the same volume. It is easy to find solids of equal volume
that do not have equal cross-sections.
Unfortunately, the Cavalieri principle doesn’t work for surface area. The
cylindrical approximations that are good for volume do not do a good job of
approximating the surface area. (This is a rather subtle point, actually.) In any
case, there are plenty of objects with equal cross-sections and different
surface areas. Can you find some?
A really simple way to use the Cavalieri principle is when the two objects
have identical cross-sections. That is, not only do the cross-sections have the
same area, they are actually the exact same shape.
These two boxes have the same square base and the same height, but one is
straight up and the other is slanted. If we look at the horizontal cross-sections,
we can see that they are all the same—the same square as the base. It’s as if
the different cross-sections have merely slid into new positions with their
shape unchanged. The Cavalieri principle tells us that these two boxes have
the same volume.
The point is, as long as we simply move the various cross-sections around
(we could even rotate them), their areas won’t change, and the two solids will
have the same volume. Of course, there is nothing special about squares; this
would work with any shape.
What about a slanted pyramid? Or a slanted cone? We could even imagine
a sort of generalized cone, which has any shape whatever for its base and
comes to a point at some height above it.
If we build a pyramid again, with the same base area and height as the
generalized cone, the scaling argument from before tells us that the
corresponding cross-sections are equal. What this means is that the volume of
any generalized cone is simply one-third that of the corresponding
generalized cylinder.
14
The most spectacular use of the Cavalieri principle was made almost two
thousand years before Cavalieri was born. This was Archimedes’s
measurement of the volume of a sphere.
Slice the sphere horizontally at a height h above its center. The cross-section
is a circle, and if we call its radius a, the Pythagorean theorem tells us that
a² + h² = r².
This has a nice geometric interpretation. It says that the cross-sectional area
is the same as the difference between the area of a circle of radius r and a
circle of radius h. In other words, the area of a ring.
As the height h of the cross-section increases, this “pineapple ring” gets
thinner. The outer radius stays the same, while the inner radius grows.
Archimedes realized that these rings are precisely the cross-sections of a
cylinder with a cone removed.
The cylinder has radius r and height r, and so does the cone. This makes it
so that any slice of the cone will have the same radius as its height, which is
precisely what the inner radius of the pineapple ring is supposed to do. Since
the outer circle always has radius r, we can see that Archimedes was right.
We can now construct a solid with the same cross-sectional areas as the
sphere. We simply glue together two copies of the cylinder with the cone
removed, one for the top half of the sphere and one for the bottom. In other
words, we get a cylinder with a double cone removed.
Since these two objects have the same cross-sections at every level, the
Cavalieri principle says that they must have the same volume. Is that great, or
what!
The thing is, we know how to deal with cones and cylinders. The two cones
together must take up one-third of the cylinder since each one is taking up
one-third of its own half of the cylinder. So the volume of Archimedes’s solid
is two-thirds of the volume of the cylinder. Notice that this cylinder has the
same radius and height as the sphere itself. So we can say, as Archimedes
himself did so long ago, that a sphere takes up exactly two-thirds of the
cylinder it sits in.
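Here is a little Python sketch of Archimedes's comparison, if you want to see the numbers come out (the radius and slice count are arbitrary): at every height the sphere's disk and the cylinder-minus-double-cone's ring have exactly the same area, and summing the slices gives two-thirds of the cylinder.

```python
import math

# At height z above the center, the sphere's slice is a disk of area
# pi*(r**2 - z**2); the cylinder-minus-double-cone's slice is a ring of area
# pi*r**2 - pi*z**2 -- the very same number.  Summing slices then recovers the
# sphere's volume as two-thirds of the enclosing cylinder's.
r = 1.0
n = 200_000
dz = 2 * r / n

sphere_vol = 0.0
for i in range(n):
    z = -r + (i + 0.5) * dz
    disk = math.pi * (r**2 - z**2)           # sphere cross-section
    ring = math.pi * r**2 - math.pi * z**2   # cylinder minus double cone
    assert abs(disk - ring) < 1e-12           # Cavalieri: identical at every level
    sphere_vol += disk * dz

cylinder_vol = math.pi * r**2 * (2 * r)
print(sphere_vol / cylinder_vol)  # very nearly 2/3
```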
This is measurement at its finest. Of course, if you prefer, we can express
the volume of the sphere solely in terms of its radius. The volume of the
cylinder would then be
πr² × 2r = 2πr³,
so the sphere, at two-thirds of this, has volume V = (4/3)πr³.
While we’ve got the sphere sitting here, let’s measure its surface area. I
want to mimic our treatment of the circle, where we used polygons to
approximate both its area and circumference. Now the idea will be to
approximate the sphere using polyhedra with many faces.
It doesn’t particularly matter how we do this, as long as the faces get
smaller and smaller as we go. This will ensure that the volume and surface
area of the polyhedron will approach those of the sphere. To keep it simple,
let’s suppose that all the faces are triangles.
To measure the volume of this polyhedron, we’ll break it into pieces. If we
connect the center of the sphere to the corners of each face, we form a bunch
of thin triangular pyramids.
The volume of the polyhedron is the sum of the volumes of all of these
little tetrahedrons. This is analogous to how we chopped up the polygonal
approximation to the circle into lots of triangles.
Now, here’s the idea. The heights of these little pyramids are all very close
to the radius r of the sphere. So the volume of each pyramid is roughly one-
third the radius times its base area. Putting these together, we get that the total
volume of the polyhedron is about one-third the radius times its surface area.
This is only approximate, because those heights weren’t quite equal to the
radius, but it gets closer and closer to the truth.
What this means is that for the sphere, the volume V and the surface area S
satisfy V = (1/3)rS exactly. If we want to, we can combine this with V = (4/3)πr³ to get
S = 4πr².
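The polyhedral argument itself can be imitated on a computer. In this Python sketch (a repeatedly subdivided octahedron inscribed in a unit sphere; the choice of polyhedron and the number of subdivisions are arbitrary), the polyhedron's volume comes out very close to one-third the radius times its surface area, and both quantities approach the sphere's.

```python
import math

def normalize(p):
    """Push a point out (or in) to the unit sphere."""
    d = math.sqrt(sum(c * c for c in p))
    return tuple(c / d for c in p)

def subdivide(faces):
    """Split each triangle into four, pushing new vertices onto the sphere."""
    out = []
    for a, b, c in faces:
        ab = normalize([(a[i] + b[i]) / 2 for i in range(3)])
        bc = normalize([(b[i] + c[i]) / 2 for i in range(3)])
        ca = normalize([(c[i] + a[i]) / 2 for i in range(3)])
        out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return out

# start from an octahedron inscribed in the unit sphere
X, Y, Z = (1, 0, 0), (0, 1, 0), (0, 0, 1)
Xn, Yn, Zn = (-1, 0, 0), (0, -1, 0), (0, 0, -1)
faces = [(x, y, z) for x in (X, Xn) for y in (Y, Yn) for z in (Z, Zn)]
for _ in range(5):
    faces = subdivide(faces)

def cross(u, v):
    return (u[1]*v[2]-u[2]*v[1], u[2]*v[0]-u[0]*v[2], u[0]*v[1]-u[1]*v[0])

volume = surface = 0.0
for a, b, c in faces:
    n = cross([b[i]-a[i] for i in range(3)], [c[i]-a[i] for i in range(3)])
    surface += math.sqrt(sum(x*x for x in n)) / 2         # triangle area
    volume += abs(sum(a[i]*n[i] for i in range(3))) / 6   # thin pyramid (tetrahedron)

print(volume, surface / 3)      # nearly equal: V = (1/3) r S with r = 1
print(volume, 4 * math.pi / 3)  # both approach the sphere's volume
```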
15
Now I want to tell you about a really pretty discovery that was made in the
early fourth century, as the classical period of geometry was coming to an
end. The idea first appears in a collection of mathematical writings by the
Greek geometer Pappus of Alexandria (circa 320 AD).
I have to say right from the start that I am a bit apprehensive about getting
into this subject. Certain aspects of it are quite delicate, and it’s not clear to
me how I’m going to explain them. (There may be points where I simply have
to throw up my hands.)
Let’s start with a doughnut—I mean the shape, not the snack.
In any case, the point is that many beautiful shapes can be viewed as being
the result of a motion of some kind.
The question is whether thinking of a shape in this way helps at all in the
measuring department. This is part of a recurring theme in geometry, the
relationship between description and measurement. How does the
measurement of an object depend on the way in which it is described?
In particular, if an object is obtained from the motion of some simpler
shape, precisely how are its measurements related to that shape and the way it
moves? This is the question that Pappus of Alexandria was asking sixteen
centuries ago, and it is his great discovery that I want to try to explain to you.
I’d like to start with the pineapple rings that we looked at before when we
were measuring the sphere.
What we’re talking about is the space between two concentric circles. This
kind of region is called an annulus (Latin for “ring”). Naturally, we can think
of it as a large circular region with a smaller one removed.
On the other hand, an annulus shape can also be viewed as being swept out
by a stick moving along a circular path, like a snowplow going around a tree.
Of course, if the stick (or snowplow) were to travel along a straight path, it
would sweep out a rectangle. We can now see the annulus and rectangle as
being related aspects of the same idea—shapes formed by a moving stick.
This is interesting because geometrically an annulus is very different from a
rectangle. If you tried to bend a rectangle into a ring, for instance, it wouldn’t
work out very well; the inner edge would buckle and the outer edge would
rip. Not a pretty picture.
The interesting question about rings and rectangles is how to compare their
areas. Suppose we take a stick and drive it around a circular path to form an
annulus. How long should a straight path be in order to sweep out the same
area? This is just the kind of thing Pappus was wondering about.
Finally, the area of the rectangle is the product of its length and width. If the annulus has inner radius r and outer radius R, the stick has length R − r and the middle circle has circumference 2π(R + r)/2, so the area is
2π((R + r)/2) × (R − r) = π(R² − r²),
which is precisely the area of the annulus. I love how the algebra and
geometry connect here. The difference of squares relation is reflected
geometrically by the equivalence of ring and rectangle.
A nice way to think about it is to observe that the middle circle is simply
the path traced out by the midpoint of the stick. In other words, it’s the
distance that the center travels that is important. Specifically, we have found
that if the center of a stick travels along a circular path of a certain length, it
sweeps out the same area as it would if the path were straight. In either case,
the area is simply the product of the length of the stick and the length of the
path.
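Here is the ring-and-rectangle comparison as a few lines of Python (the particular radii are an arbitrary choice): the annulus area should equal the stick's length times the distance its midpoint travels.

```python
import math

# A quick check of the stick-sweeping rule: an annulus with inner radius 2 and
# outer radius 5 is swept by a stick of length 3 whose midpoint travels around
# a circle of radius 3.5.  The area should be stick length times midpath length.
inner, outer = 2.0, 5.0
stick = outer - inner
midpath = 2 * math.pi * (inner + outer) / 2   # circumference of the middle circle

annulus = math.pi * (outer**2 - inner**2)
print(annulus, stick * midpath)  # identical
```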
This is a nice example of the way description (the annulus is described by
the motion of the stick) affects measurement (the area depends in an elegant
way on the stick and the path). Like I said, the connection between
description and measurement is what geometry is all about.
We can take this example a bit further. Suppose we were to push the stick
(by its midpoint) along some arbitrary path.
Does our result remain valid? Can we say that the area of the swept out
region is the same as if it were straight? Is it simply the product of stick
length and path length, or are we pushing our luck?
In fact, it’s true regardless of the shape of the path. Let me see if I can
explain why. First of all, notice that it works for paths that are partial circles,
or arcs.
This is because both the arc length and the swept out area are in the same
proportion to those of a complete annulus. In particular, the result holds for
tiny annular “slivers” as well as for very thin rectangles. The idea is to piece
these together to form more complicated shapes.
The various paths taken by the center of the stick fit together to form one
big path made up of tiny circular and straight sections. By arranging these
properly, we can make a path that approximates any desired path as closely as
we wish.
In particular, we can (by creating an infinite sequence of such
approximations) make the total length of our path approach the length of the
desired path, and the area of our conglomeration of slivers will approach the
true area of the desired region. Since the approximate area is the product of
the length of the stick and the length of the path, and this remains true as the
approximations improve, it must be true for the actual region under
consideration. Again, the method of exhaustion comes to the rescue.
This is our first example of the amazing generality of Pappus’s result: the
area of a region swept out by a moving stick is the product of the length of the
stick and the distance traveled by the center of the stick.
There are a couple of subtleties here. The first is that for this to work, the
stick must remain perpendicular to the direction of motion at all times.
Pushing the stick at an angle messes it up.
For example, the Pappus theorem fails miserably for a slanted rectangle.
Since we assembled our shapes, at least approximately, from slivers of rings
and rectangles, where the stick and the path are always at right angles, this is
the only kind of motion our method can handle. Perpendicular motion is an
essential ingredient of the Pappus philosophy.
The other issue is self-intersection.
If the path curves too sharply, we’ll end up tracing out parts of the region
twice, and those areas of overlap will get counted double. As long as we stay
perpendicular, and avoid sharp turns, we’re fine.
16
Now how about that doughnut? Since a torus is described by a circle traveling
along a circular path, it makes sense to look at the object traced out by the
same circle moving along a straight path. In other words, a cylinder.
If the tube of the doughnut has radius a and its center travels around a circle of radius b, it turns out that the volume of the torus is the same as that of the corresponding cylinder:
V = πa² × 2πb.
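Here is a Python sketch of that claim (the radii and slice count are arbitrary): slicing the torus horizontally gives washers, and summing their areas reproduces πa² × 2πb.

```python
import math

# Slice the torus horizontally.  At height z the slice is a washer with outer
# radius b + sqrt(a**2 - z**2) and inner radius b - sqrt(a**2 - z**2).
# Summing the washer areas should reproduce pi*a**2 * 2*pi*b.
a, b = 1.0, 3.0   # tube radius, and radius of the center's circular path
n = 200_000
dz = 2 * a / n

vol = 0.0
for i in range(n):
    z = -a + (i + 0.5) * dz
    s = math.sqrt(a * a - z * z)
    vol += math.pi * ((b + s)**2 - (b - s)**2) * dz

print(vol, math.pi * a**2 * 2 * math.pi * b)  # both ~ 2*pi^2 * a^2 * b
```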
Let’s imagine a flat region of some fixed shape, dragged through space in
such a way that it remains perpendicular to its direction of motion. This traces
out some ridiculous solid. Pappus discovered that even the volume of this
crazy blob obeys the same pattern as before: it is the product of the area of the
original region and the length of a certain path. Naturally, this is the path
taken by the average point in the region. But what does that mean exactly?
For symmetrical shapes like a circle or a square, there is a clear candidate,
a center. Where is the center of an asymmetrical region?
It turns out there is a way to define the center of an object, regardless of its
shape. It even has a nice physical description: it’s the place where if you put
your finger there, the object would balance. This point is unique to each shape
and is called its centroid (the corresponding physical notion is center of
mass). The problem for geometers is to make sense of this idea in a purely
abstract way, since geometric objects are imaginary and don’t have any actual
mass or ability to balance. That this can be done is wonderful, but not so easy
to explain. I think I’ll leave it for you as a nice, open-ended research project:
In any case, the point is that every object has a centroid, and Pappus’s great
discovery is this: the volume of a solid described by a moving plane figure is
equal to the product of the area of the figure and the length of the path traced
out by its centroid. (Provided, of course, that the figure remains perpendicular
to the direction of travel, and there is no self-intersection caused by sharp
turns.) Notice that our discovery about moving sticks fits right in with this
general philosophy. The centroid of a stick is just its midpoint.
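One pretty way to test Pappus's volume theorem is to revolve a half-disk about its flat edge, which sweeps out a sphere. In this Python sketch (unit radius, arbitrary slice count), we compute the half-disk's centroid numerically and check that area times centroid-path length gives the sphere's volume.

```python
import math

# Revolving a half-disk of radius r about its flat edge produces a sphere.
# Pappus says: volume = (area of half-disk) * (distance traveled by centroid).
r = 1.0
n = 200_000
dx = r / n

# centroid distance of the half-disk {x >= 0, x^2 + y^2 <= r^2} from the axis x = 0
area = math.pi * r * r / 2
moment = 0.0
for i in range(n):
    x = (i + 0.5) * dx
    strip = 2 * math.sqrt(r * r - x * x) * dx   # vertical strip of the half-disk
    moment += x * strip
centroid = moment / area                         # ~ 4r / (3*pi)

pappus_volume = area * (2 * math.pi * centroid)
print(pappus_volume, 4 * math.pi * r**3 / 3)    # the sphere's volume
```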
Finally, there is the issue of surface area. How can we measure the surface
area of a torus? This time we’re only interested in the crust of the doughnut. It
is the circle itself that traces out the surface, not the disk. In other words, it’s
the points on the circle as it goes around that describe the surface we want to
measure.
Again, it turns out to be the same as if we moved straight. The surface area
of a torus is the product of the circumference of the moving circle and the
length of the central path. So the surface area of the torus we looked at before
would be
S = 2πa × 2πb.
In general, the surface area of an object traced out by a moving plane figure
is the product of the perimeter of the figure and the length of the path traced
out by a certain point. (That is, assuming the shape doesn’t rotate as it travels
along the path.) Now, however, it’s not the centroid of the region that matters,
it’s the centroid of the perimeter.
For a circle, or some such symmetrical object, these two notions of center
coincide, but in general they don’t. To get a rough idea, imagine two physical
models made in the same flat shape. One is solid metal. The other has just a
rim of metal and an interior made of a much lighter material. The balance
points of the two models will not necessarily be the same. This sounds like
another good research project:
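To get a feel for the difference, here is a small Python sketch (a half-disk of unit radius sitting on its flat edge; the numbers of sample strips are arbitrary): the centroid of the solid plate and the centroid of just its curved rim land at different heights.

```python
import math

# For a half-disk of radius 1 resting on the x-axis, compute the centroid
# height of the solid plate and of just its curved rim (a half-circle of wire).
r = 1.0
n = 100_000

# centroid of the plate: average height over horizontal strips
area = math.pi * r * r / 2
moment = sum(((i + 0.5) / n * r) * 2 * math.sqrt(r*r - ((i + 0.5) / n * r)**2) * (r / n)
             for i in range(n))
plate_centroid = moment / area          # ~ 4r/(3*pi) ~ 0.424

# centroid of the rim: average height of points on the half-circle
rim_centroid = sum(math.sin(math.pi * (i + 0.5) / n) for i in range(n)) / n * r
# ~ 2r/pi ~ 0.637

print(plate_centroid, rim_centroid)     # noticeably different heights
```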
I hope this hasn’t been too frustrating. These ideas are very deep and hard
to explain. I just wanted to give you a taste of them now because I think they
are so beautiful.
17
The shapes we’ve been dealing with so far—squares, circles, cylinders, and
so on—are actually quite special. They are simple, symmetrical, and easy to
describe. In other words, they’re pretty. In fact, I would go so far as to say
that they are pretty because they are easy to describe. The shapes that are the
most pleasing to the eye are those that need the fewest words to specify. In
geometry, as in the rest of mathematics, simple is beautiful.
But what about more complicated, irregular shapes? I think we need to look
at them, too. After all, most shapes are not so simple and pretty. We’ll surely
miss the big picture if we restrict our attention to only the most elegant
objects.
Take polygons, for instance. Up to now we’ve dealt almost exclusively
with regular polygons (the ones with all their sides the same length and all
their angles equal). Certainly these are the prettiest. But there are lots of other
polygons out there. Here is a not-so-regular one.
Of course, polygons like this are more complicated, and we’re going to
have to pay a price. The price is greater technicality—awkward shapes are
going to be awkward to describe. Nevertheless, we need some way to indicate
precisely which polygon we’re talking about. We’re not going to be able to
make measurements or communicate ideas about a shape that is described
only as “that thing that looks sort of like a hat.”
The most natural way to specify a particular polygon is to simply list all its
angles and side lengths (in their proper order, of course). This information is
like a blueprint; it pins down precisely which polygon we mean.
If you prefer, we can also think of a polygon as a sequence of distances and
turns, as if we were traveling along its perimeter.
These outside turns will then add up to one complete turn. Of course, we
have to be careful to count left and right turns oppositely. If we were traveling
counterclockwise, for example, it would make sense to count left turns as
positive and right turns as negative. Then the grand total would be one full
(counterclockwise) turn.
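The turning rule is easy to check with coordinates. In this Python sketch (the particular five-sided polygon is an arbitrary choice), we walk counterclockwise around an irregular convex polygon and add up the signed exterior turns; the total comes out to one full turn.

```python
import math

# Walk counterclockwise around an irregular convex polygon and sum the signed
# exterior turns at each corner.  The total should be one full turn: 2*pi.
verts = [(0, 0), (4, 0), (5, 2), (3, 4), (1, 3)]

total_turn = 0.0
m = len(verts)
for i in range(m):
    ax, ay = verts[i]
    bx, by = verts[(i + 1) % m]
    cx, cy = verts[(i + 2) % m]
    h1 = math.atan2(by - ay, bx - ax)        # heading along one edge
    h2 = math.atan2(cy - by, cx - bx)        # heading along the next edge
    turn = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi  # signed turn in (-pi, pi]
    total_turn += turn

print(total_turn, 2 * math.pi)  # one full (counterclockwise) turn
```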
In general, the best strategy for dealing with polygons is to chop them into
pieces. This is called a dissection of the polygon. In particular, we can always
dissect a polygon into triangles.
This has the effect of reducing any problem about polygons down to a
(possibly large) collection of triangle problems. For example, the area of a
simple closed polygon would be the sum of the areas of the triangular pieces.
To understand polygons, we need only understand the simplest ones:
triangles. This is good! I’d much rather be thinking about triangles anyway;
triangles are simpler, and simpler is better.
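Dissection into triangles is also how a computer would measure a polygon. Here is a Python sketch (the polygon is an arbitrary convex one): fan it into triangles from the first vertex and add up their areas.

```python
# Fan a convex polygon into triangles from its first vertex and sum the
# triangle areas, each from the cross-product formula.
verts = [(0, 0), (4, 0), (5, 2), (3, 4), (1, 3)]

def tri_area(p, q, r):
    return abs((q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])) / 2

area = sum(tri_area(verts[0], verts[i], verts[i+1]) for i in range(1, len(verts)-1))
print(area)  # 13.5
```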
18
The study of triangles is called trigonometry (Greek for “triangle
measurement”). The problem is to figure out how the various measurements
of a triangle—angles, side lengths, and area—relate to each other. How, for
example, does the area of a triangle depend on its sides? What is the
relationship between the sides and the angles?
The first thing to notice about triangles is that they’re completely
determined by their sides. If you tell me the three side lengths, I’ll know
precisely which triangle you’re talking about. Unlike other polygons, triangles
can’t wiggle.
What is the area of this triangle? Whatever it is, it must depend only on a,
b, and c, since they determine the triangle, and hence its area, uniquely. The
perimeter, for instance, is simply the sum of the three sides, a + b + c. Does
the area have a similar algebraic description? If so, what is it? More
important, how can we figure out what it is?
A natural way to begin is to drop a line from the top of the triangle down to
its base.
Let’s call this height h. Then the area A of the triangle can be expressed as
The problem now becomes how to determine the height h in terms of the
sides a, b, and c.
Before we get started, I want to say a few things about what we should
expect. Our problem is to measure the area of a triangle given its sides. This
question is completely symmetrical, in the sense that it treats the three sides
equally; there are no “special” sides. In particular, there is no base involved in
the question itself. What this means algebraically is that whatever our
expression for the area turns out to be, it must treat the symbols a, b, and c
symmetrically. If we were to switch all the a’s and b’s, for example, the
formula should remain unchanged.
Another thing to notice is that because of the way that area is affected by
scaling, our formula will have to be homogeneous of degree 2, meaning that if
we replace the symbols a, b, and c by the scaled versions ra, rb, and rc, the
effect must be to multiply the whole expression by r2. So we expect the area
to be given by an algebraic combination of a, b, and c that is symmetrical and
homogeneous. For example, it could look something like A = a² + b² + c².
Unfortunately, it’s not going to be quite that simple. Let’s see what happens.
Notice how the height breaks the base c into two pieces. Let’s call the
pieces x and y. Our original triangle has been split into two right triangles.
Now we can use the Pythagorean relation to get information about x, y, and
h. Hopefully, this will be enough information for us to actually figure out
what they are. We have
x + y = c,
x² + h² = a²,
y² + h² = b².
This looks a bit like alphabet soup. With so many letters and symbols
flying around, it’s important for us to keep the meaning and status of each one
clear in our minds. Here a, b, and c refer to the sides of the original triangle.
These are numbers that we supposedly know from the start. The symbols x, y,
and h, on the other hand, are unknowns. Their values are currently a mystery.
We need to solve this mystery by somehow unscrambling the above equations
to get x, y, and h expressed explicitly in terms of a, b, and c.
Generally speaking, this kind of problem can almost always be solved,
provided there are enough equations. A good rule of thumb is that you need at
least as many equations as you have unknowns (although that is no
guarantee). In our case, since we have three of each, it should be possible to
unscramble our equations. Of course, no rule of thumb can tell us how to
unscramble them; that’s where sheer algebraic skill comes in.
The first thing to do is to figure out what x and y are. See if you can
rearrange our equations to get
x = c/2 + (a² − b²)/2c,
y = c/2 − (a² − b²)/2c.
This tells us how the base of the triangle breaks up—the point where the
height hits the base is precisely (a² − b²)/2c units away from the midpoint.
This shift will be either to the left or right depending on which of a and b is
larger.
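This breakup of the base is easy to test with coordinates. In this Python sketch (the apex position and base length are arbitrary choices), we place the base on the x-axis, measure the sides, and check that the formula recovers the foot of the altitude.

```python
import math

# Put the base c on the x-axis and choose an apex; the foot of the altitude
# should land (c**2 + a**2 - b**2)/(2c) units from the base's left end.
apex = (1.7, 2.3)                          # an arbitrary apex
c = 5.0                                    # base runs from (0, 0) to (c, 0)
a = math.hypot(apex[0], apex[1])           # side from the left end to the apex
b = math.hypot(apex[0] - c, apex[1])       # side from the right end to the apex

x = (c**2 + a**2 - b**2) / (2 * c)
print(x, apex[0])  # the formula recovers the apex's horizontal position
```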
The next step is to find the height h. Because of the way in which h appears
in our equations, it’s actually going to be a little easier to deal with h2 instead.
In fact, to make things prettier, let’s rewrite x as (c2 + a2 − b2)/2c and use the
equation x2 + h2 = a2, so that
Notice the asymmetry of this expression. This is partly due to the fact that
we chose c as a base and h as the height to that base, so that c is being treated
differently from a and b (we also used only the relation between x and h, and
not the one involving y).
Now we can get at the area A. Again, it’s a bit nicer to deal with A2 instead.
Since the area is given by A = 1/2ch, we can write
This is not good. Although we’ve succeeded in measuring the area of the
triangle, the algebraic form of this measurement is aesthetically unacceptable.
First of all, it is not symmetrical; second, it’s hideous. I simply refuse to
believe that something as natural as the area of a triangle should depend on
the sides in such an absurd way. It must be possible to rewrite this ridiculous
expression in a more attractive form.
We can start by noticing that the whole thing can be written as the
difference of two squares. Namely,
A² = (ca/2)² − ((c² + a² − b²)/4)².
To simplify matters, let’s multiply both sides of our equation by 16 to get rid
of all the unpleasant denominators. We get
Again, we have differences of squares. This means we can break it down even
further to get
16A² = ((c + a)² − b²)(b² − (c − a)²)
= (a + b + c)(c + a − b)(a + b − c)(b + c − a).
Now, that’s more like it! The symmetry is finally revealed, and the pattern is
actually quite beautiful.
Of course, we haven’t really changed anything mathematically. These
equations have all been saying the same exact thing about how the area
depends on the sides—all of this clever algebraic manipulation hasn’t
changed that relationship. What has changed is its relationship to us. We’re
the ones who wanted to rearrange the information into a form that was more
meaningful aesthetically. Triangles don’t care. They do what they do
regardless of how we choose to describe it. Algebra is really about
psychology; it doesn’t affect the truth, only how we relate to it. On the other
hand, mathematics is not merely about truths; it’s about beautiful truths. It’s
not enough to have a formula for the area of a triangle; we want a pretty one.
And now that’s what we have.
Finally, to get the area A itself, we just need to divide our expression by 16
and take the square root. Notice that since there are four terms in the product,
dividing by 16 is tantamount to cutting each one in half. Our formula for the
area of a triangle becomes
A = √( ((a + b + c)/2) ((a + b − c)/2) ((a − b + c)/2) ((−a + b + c)/2) ).
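This symmetrical product is the formula known as Heron's formula, and it is easy to put to the test. In this Python sketch (the 5-12-13 right triangle and the coordinate triangle are arbitrary choices), we compare it with areas computed directly.

```python
import math

def heron(a, b, c):
    """Area of a triangle from its three sides (Heron's formula)."""
    s1, s2, s3, s4 = (a+b+c)/2, (a+b-c)/2, (a-b+c)/2, (-a+b+c)/2
    return math.sqrt(s1 * s2 * s3 * s4)

print(heron(5, 12, 13))  # 30.0 -- half of 5 * 12, as a right triangle demands

# an arbitrary triangle given by coordinates
p, q, r = (0, 0), (4, 1), (1, 3)
a = math.hypot(q[0]-r[0], q[1]-r[1])
b = math.hypot(p[0]-r[0], p[1]-r[1])
c = math.hypot(p[0]-q[0], p[1]-q[1])
cross_area = abs((q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])) / 2
print(heron(a, b, c), cross_area)  # agree
```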
Can you find two different triangles with the same area
and perimeter?
If a triangle has sides a, b, and c, what is the radius of the
inscribed circle?
19
The most fundamental problems in geometry concern the relationship
between length and angle. For example, suppose we travel a certain distance,
turn a certain amount, and then go another distance. How far are we from
where we started?
Another way to think of this question is to imagine two sticks that are held
together at one end.
If we move the sticks apart, increasing the angle, the ends of the sticks get
farther away; pushing the sticks together brings the ends closer. What exactly
is the relationship between the angle of the sticks and the distance between
their endpoints? This is perhaps the most basic question in all of geometry.
We can, of course, view this as being a problem about triangles. Essentially,
we’re asking how the side of a triangle depends on the opposite angle.
Perhaps it’s time to introduce a convenient labeling scheme for triangles.
The idea is to use small letters a, b, and c for the sides, and capital letters A,
B, and C for the corresponding opposite angles.
As a matter of fact, the only method we’ve ever had for measuring lengths
is to somehow get them involved in right triangles. This is why the
Pythagorean relation is so important.
As before, let’s call this height h, and the base pieces x and y.
Notice the similarity between this equation and the Pythagorean relation—
the 2bx piece must be some sort of correction term that measures the
departure of C from being a right angle. We should consider this formula a
generalized Pythagorean theorem that is valid for any angle, not just right
angles.
Of course, the present form of this expression is rather unsatisfactory, the
two most glaring reasons being that it is not symmetrical in a and b (as it
should be) and that the angle C itself does not make an appearance.
Essentially, the problem comes down to determining this length x.
Let’s take a closer look at the right triangle involving x, a, and the angle C.
Notice that this triangle is completely determined by the angle C and the
hypotenuse a. In fact, C alone is enough to pin down the shape of this
triangle. This is because the angles of a triangle always add up to a half turn;
if we know one of the angles of a right triangle, we automatically know the
other.
In particular, this means that our triangle is just a scaled version (by a
factor of a) of the right triangle with angle C and hypotenuse 1.
So to find x, we just need to multiply the length of the side marked * by the
scaling factor a. Thus x = a* and our formula for the third side of a triangle
becomes
c² = a² + b² − 2ab*.
The point is that the length * depends only on the angle C and not on the
sides a and b. Our equation is now symmetrical and reveals completely the
dependence of c on the other two sides. The only thing remaining is to figure
out exactly how * depends on C. Notice that this question involves only this
right triangle and not the original triangle we started with.
Something interesting has happened here. Our problem about triangles in
general has been reduced to a problem about right triangles in particular. This
is part of a general pattern: polygons are reduced to triangles; triangles are
reduced to right triangles. A complete understanding of right triangles would
tell us everything about polygons.
Our basic problem is this. We have a right triangle with a certain angle, and
a hypotenuse of length 1. How long are its sides?
The sides of a right triangle are sometimes called its legs. In our case the
two legs depend only on the angle. The vertical one, opposite the angle, is
usually called the sine of the angle (it’s where your sinuses would be if the
triangle were your nose). The leg adjacent to the angle is called the cosine of
the angle. I suppose what I actually mean is that the sine and cosine are the
lengths of the legs, not the legs themselves. (Of course, we’ve been glossing
over that sort of distinction this whole time, so why start worrying about it
now!)
We can also think of the sine and cosine as being proportions.
The sine of an angle will be the ratio of the opposite side to the hypotenuse;
the cosine is the proportion of adjacent side to hypotenuse. This is true
regardless of whether the hypotenuse has unit length or not; the angle
determines the shape of the right triangle, and these ratios are independent of
scaling.
How are the sines and cosines of the two angles of a right
triangle related to each other?
In any case, the upshot is that to each angle there corresponds a pair of
numbers, its sine and cosine, which depend only on that angle. If C is an
angle, it is customary to write sin C and cos C for its sine and cosine. With
this terminology, our formula now reads:
c² = a² + b² − 2ab cos C.
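The two-sticks question from the start of this section can now be answered with coordinates, and it makes a nice check of the formula. In this Python sketch (the stick lengths and angles are arbitrary choices), we hinge two sticks at an angle C and compare the measured distance between their free ends with the generalized Pythagorean relation.

```python
import math

# Hinge sticks of lengths a and b at an angle C, measure the distance between
# their free ends directly from coordinates, and compare with
# c^2 = a^2 + b^2 - 2ab cos C.
a, b = 3.0, 4.0
for C in [0.2, 0.9, math.pi / 2, 2.5]:     # acute, right, and obtuse angles
    end1 = (b, 0.0)                        # one stick along the x-axis
    end2 = (a * math.cos(C), a * math.sin(C))   # the other at angle C from it
    direct = math.hypot(end1[0] - end2[0], end1[1] - end2[1])
    formula = math.sqrt(a*a + b*b - 2*a*b*math.cos(C))
    print(direct, formula)  # equal for every angle, with no case distinctions
```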
Of course, we can still drop the same perpendicular, only this time it lies
outside our triangle and forms a new angle C′, which sits next to our original
angle C.
Show that in this case we get
c² = a² + b² + 2ab cos C′.
So the Pythagorean relation for large angles is pretty much the same as
before, only instead of subtracting the correction term 2ab cos C, we are
adding 2ab cos C′.
We seem to have three separate cases (with three separate formulas),
depending on whether the angle C is less than, equal to, or greater than a right
angle. This kind of thing is always a bit galling; after all, two sticks can
smoothly open and close on their angle hinge, and the distance between their
endpoints will vary continuously. Shouldn’t there be one nice, simple pattern?
One way to proceed is simply to be clever with our definitions. Since cos C
(at present) only has meaning when C is less than a right angle, we are free to
give it any meaning we wish when C is larger. The idea is to do this in such a
way that our Pythagorean relation c² = a² + b² − 2ab cos C remains valid in
all three cases. That is, we let the pattern determine our choice of meaning.
This is a major theme throughout mathematics; it could even be said that this
is the essence of the art—listening to patterns and adjusting our definitions
and intuitions accordingly.
This leads us first to define the cosine of a right angle to be zero (so that we
recover the usual Pythagorean theorem) and then, more strangely, to define
the cosine of C when C is larger than a right angle to be the negative of the
cosine of C′, the angle next to C.
What we have done here is to expand the meaning of cosine. Originally, we
defined the cosine of an angle in terms of side lengths of a right triangle. Now
we are choosing to give cos C meaning even when C is too big to fit in a right
triangle. We are doing this so that we get one universal pattern instead of
three separate ones. But more important, we are letting math do the talking.
We are being sensitive to what angles and lengths want. They want cosine to
generalize, and they are telling us what they need that generalization to be.
Now it is up to us to reconcile that with our intuition.
One way to do this is to imagine a stick (of unit length, say) at an angle
with the ground.
Depending on this angle, the shadow of the stick will be longer or shorter
(I’m assuming the metaphorical sun is directly overhead). In fact, we can see
that the length of this shadow is precisely what we have been calling the
cosine of the angle.
Now, as the angle increases, the shadow gets shorter, until the stick is
straight up (at a right angle with the ground) and the shadow has length zero.
If we keep going, the shadow reappears, only on the other side. Its length is
now the cosine of the angle next to ours.
c² = a² + b² − 2ab cos C
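The shadow picture is one line of Python away from the usual trigonometric identity. In this sketch (the particular obtuse angles are arbitrary), the shadow of a unit stick at an obtuse angle t reappears on the other side with length cos(π − t), which is exactly −cos t.

```python
import math

# The "shadow" of a unit stick at angle t has length cos(t); past a right angle
# it reappears on the other side with length cos(pi - t) -- exactly -cos(t).
for t in [1.8, 2.2, 3.0]:                    # angles larger than a right angle
    neighbor = math.pi - t                   # the angle next to ours
    print(math.cos(t), -math.cos(neighbor))  # identical
```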
20
Given an angle (measured, say, as a portion of a full turn), how can we figure
out its sine and cosine? Conversely, if we are told what its sine and cosine are,
how can we determine the angle itself?
Some angles have sines and cosines that are fairly easy to measure. For
instance, an angle of 1/8 (or 45 degrees) makes a right triangle that is half a
square.
This means its sine and cosine are both equal to the ratio of the side of a
square to its diagonal, or 1/√2.
By the way, it turns out that the sine and cosine of an angle are a bit
redundant; if you know one of them, you can deduce the other. The
connection between them comes from the Pythagorean relation. Can you
figure out what that connection is?
But what about the angle of a 3-4-5 right triangle, the angle whose sine is
4/5? What portion of a full turn is it? This number turns
out to be transcendental as well. What this means is that “the angle whose
sine is 4/5” is as good a description as we’re ever going to get. There’s simply
no way to take the numbers 3, 4, and 5 and do some finite sequence of
algebraic operations with them to arrive at the measurement of this angle.
All in all it’s a very depressing (and somewhat embarrassing) situation.
We’ve managed to reduce every problem concerning the measurement of
polygons down to this one essential question of how the sine and cosine of an
angle depend on the angle itself, and what I’m telling you is that this problem
is (in general) intrinsically unsolvable. This is not to say that there aren’t
certain pretty angles—like 1/8 or 1/6—whose sine and cosine are nice
numbers that can be expressed algebraically, but they are a small minority
indeed.
What I think is interesting about a situation like this is that we are able to
ask perfectly natural geometric questions that we can’t answer. Moreover, we
can prove that they are unanswerable. In other words, we can know that
something is unknowable. Maybe this is not so depressing after all—it’s a
pretty amazing human accomplishment!
Of course, I’ve done nothing to help explain how it is that we do know
such things. It’s all very well for me to say that such and such a number is
transcendental; it’s quite another for me to show you why.
I’m in a truly unfortunate predicament here. It’s important to me that you
understand the positive nature of statements like “π is transcendental” or
“√2 is irrational.” When a mathematician like me says that something is
impossible, be it that π cannot be represented algebraically, or that there is no
fraction whose square is 2, I’m not saying something negative about what we
can’t do or don’t have. I’m talking about what we do have: an explanation!
We know that √2 is irrational, and we understand why. We have a perfectly
reasonable explanation—namely, Pythagoras’s argument about even and odd
numbers.
Over the centuries, mathematics, like any art form, has achieved a certain
depth. Many works of art are extremely sophisticated and require years of
study to properly understand and appreciate. This is the case with the
transcendence of π, unfortunately. Proofs exist, even very beautiful ones, but
that doesn’t mean that I can easily explain them to you here. For now, I think
you’re just going to have to take my word for it.
Can you use a regular pentagon to find the sine and cosine
of one-fifth of a turn?
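If you want to check your pentagon answer numerically, here is a sketch in Python (fair warning: the last line is a partial spoiler):

```python
import math

theta = 2 * math.pi / 5        # one-fifth of a turn, in radians

print(round(math.cos(theta), 6))   # 0.309017
print(round(math.sin(theta), 6))   # 0.951057

# Whatever algebraic expression the pentagon hands you should match
# these decimals. Partial spoiler: expect sqrt(5) to make an appearance.
print(round((math.sqrt(5) - 1) / 4, 6))    # also 0.309017
```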
21
What do we want out of trigonometry? In the best of all possible worlds, we
would like to be able to determine all the measurements of any given triangle.
Let’s say that a triangle has been completely measured once we know its
angles, side lengths, and area. Of course, we would have to know some of
these measurements to begin with in order to specify which triangle we’re
even talking about.
How much information do we need? Which combinations of angle and side
information are sufficient to pin down a triangle precisely? There are several
possibilities:
Three sides. In this case the triangle is certainly determined uniquely. The
generalized Pythagorean theorem can then be used to find the angles (or at
any rate their cosines, which is morally the same, and all we can reasonably
hope for). Heron’s formula gives us the area directly from the three sides, so
in this case we can always measure the triangle completely.
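The whole three-sides procedure fits in a short sketch in Python. (What I have been calling the generalized Pythagorean theorem is the textbook law of cosines, here solved for each cosine in turn.)

```python
import math

def measure_triangle(a, b, c):
    """Completely measure a triangle from its three side lengths."""
    # Generalized Pythagorean theorem (law of cosines), solved for each cosine.
    cos_A = (b**2 + c**2 - a**2) / (2 * b * c)
    cos_B = (a**2 + c**2 - b**2) / (2 * a * c)
    cos_C = (a**2 + b**2 - c**2) / (2 * a * b)
    # Heron's formula: the area straight from the sides, via the semiperimeter s.
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))
    return cos_A, cos_B, cos_C, area

# The 3-4-5 right triangle: the largest angle is right, so its cosine is 0.
print(measure_triangle(3, 4, 5))   # (0.8, 0.6, 0.0, 6.0)
```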
Two sides. This is generally not enough information to specify a particular
triangle, unless we have some additional angle information. If we know the
angle between the two sides, or at least its cosine, then the generalized
Pythagorean theorem will give us the other side, and we’re done. Otherwise,
if all we have is one of the other angles, that won’t be enough to determine
the triangle. Do you see why?
Why are two sides and an angle insufficient in general to
specify a triangle?
It’s easy to see that longer sides are opposite larger angles; the question is
whether we can say anything more precise than that.
Since we’re dealing with angles and lengths, we naturally expect sines and
cosines to make an appearance, and in fact they do. The relationship between
the sides of a triangle and their opposite angles is one of the most beautiful
patterns in geometry: the sides are in the same proportion as the sines of the
angles. In other words,
a : b : c = sin A : sin B : sin C.
To see why, drop a height h from the vertex C perpendicular to the opposite side c. Notice that this height h is opposite both angles A and B. This means that
h = b sin A and h = a sin B.
Thus a : b = sin A : sin B, and the sides are in the same proportion as the sines
of the opposite angles. As with the generalized Pythagorean theorem, we’re
seeing how angles communicate length information via their sines and
cosines. I like the present version of this sentiment because of its symmetry.
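The pattern is easy to test on any triangle you like. Here is a quick numeric check in Python; the three vertices are an arbitrary choice of mine, and the angles are recovered from the sides by the law of cosines:

```python
import math

# Any three vertices make a triangle; this one is deliberately lopsided.
P, Q, R = (0.0, 0.0), (5.0, 0.0), (1.0, 3.0)

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

# Side lengths, each named after the opposite vertex.
p, q, r = dist(Q, R), dist(P, R), dist(P, Q)

def angle(opposite, s1, s2):
    # Law of cosines, solved for the angle opposite the first side.
    return math.acos((s1**2 + s2**2 - opposite**2) / (2 * s1 * s2))

A, B, C = angle(p, q, r), angle(q, p, r), angle(r, p, q)

# The law of sines: all three ratios come out the same.
print(p / math.sin(A), q / math.sin(B), r / math.sin(C))
```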
One thing I just realized about this argument is that it presupposes that the
angles are all acute (that is, less than right angles). What happens if we have a
triangle with a larger, obtuse angle? Do such triangles still obey the law of
sines? For that matter, what do we even want the sine of such an angle to
mean?
Using the law of sines, the generalized Pythagorean theorem, and Heron’s
formula, we can completely measure any triangle—at least in the sense that
we can reduce the measurement of any triangle (and hence any polygon) to
the determination of a bunch of sines and cosines. The buck usually stops
here, unless there’s some sort of amazing symmetry or coincidence, because
of the transcendental nature of sine and cosine. The goal of trigonometry then
becomes not to calculate these numbers but to find patterns and relationships
among them.
How are the sine and cosine of an angle related to the sine
and cosine of an angle twice as large?
I should point out that everything we’ve been saying about polygons works
the same way in three dimensions for polyhedra. In particular, polyhedra can
always be dissected into various pyramids, and these can be measured using
triangles. In this way, all problems concerning polyhedra come down to sines
and cosines as well.
22
What shapes are left for us to measure? The answer is, most of them! In fact,
we haven’t even begun to deal with the vast majority of shapes out there.
Everything we’ve looked at so far has had some sort of special property, like
straight sides or symmetry, that sets it apart and makes it atypical. Most
shapes have no such distinguishing features. Most shapes are asymmetrical
and ugly: curved, but not in any particularly pleasing way.
But why would we want to work with something like that? Why should we
(meaning you) expend time and energy trying to understand some ugly blob?
And even if we did want to, how would we do it? How do we even describe,
let alone measure, an irregularly curved shape like this one? For that matter,
what do I even mean by “this one”—what one? Exactly which shape am I
talking about here?
If I were doing something practical, I could simply say “the shape in the
diagram” and be done with it. The picture itself would be the shape, and
rough measurements could be made right from it.
Mathematically, however, the picture is no help at all. A diagram, being a
part of the physical world we live in, is much too crude and imprecise to refer
to a specific mathematical object. And it’s not merely a question of accuracy.
A circle etched in gold by a laser to within a billionth of an inch is just as
irrelevant as (if not more so than) one made by a kindergartener out of
construction paper. Neither one is anything like a true circle.
The important thing to understand is that diagrams and other such models
are made of atoms, not idealized imaginary points. In particular, this means
that a diagram cannot accurately describe anything. Not that diagrams are
completely useless; we just need to understand that their role is not to specify
or define but to stimulate creativity and imagination. A construction paper
circle may not be a circle, but it still might give me ideas.
So how, then, are we going to describe a particular irregularly curved
shape? Such a shape would contain infinitely many points, and unlike a
polygon, no finite collection of them would be enough to pin the shape down
—we would need an infinite list of points. But how can I think about a shape,
or tell you about it, if I need to provide an infinite amount of information?
The question is not what shapes do we want to talk about, but what shapes
can we talk about.
The disturbing truth is that most shapes cannot be talked about. They’re out
there all right; we just have no way to refer to them. Being human, using
finite languages over finite lifetimes, the only mathematical objects we can
ever deal with are those that have finite descriptions. A random spatter of
infinitely many points can never be described, and neither can a random
curve.
What I’m saying is, the only shapes that we are ever going to be able to
specify precisely are those that have enough of a pattern to them to allow an
infinity of points to be described in a finite way. The reason we can talk about
a circle is not because of the kindergarten cutout, but because of the phrase
“all the points at a certain distance from a fixed center.” Since the circle has
such a simple pattern, I don’t need to tell you where each of its individual
points are; I can just tell you the pattern they obey.
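The circle's pattern is so simple it fits in a single line of code. As a sketch in Python (with a tolerance, because floating-point numbers are themselves only approximations):

```python
import math

def on_circle(point, center=(0.0, 0.0), radius=1.0, tol=1e-9):
    """The pattern itself: all the points at a fixed distance from a fixed center."""
    return abs(math.dist(point, center) - radius) < tol

print(on_circle((0.6, 0.8)))   # True: 0.36 + 0.64 = 1
print(on_circle((0.5, 0.5)))   # False
```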
My point is that’s all we can ever do. The only shapes we can talk about are
those with a pattern, and it is the pattern itself—a finite set of words in a finite
language—that defines the shape. Those shapes that do not have such a
pattern (the vast majority, I’m afraid) can never be referred to, let alone
measured, by any human beings, ever. The set of objects that we can think
about and describe to others is limited from the start by our own humanity.
This is actually something of a theme throughout mathematics. For instance,
the only numbers we can talk about are those with a pattern; most numbers
can never be referred to either.
Geometry, then, is not so much about shapes themselves as it is about the
verbal patterns that define them. The central problem of geometry is to take
these patterns and produce measurements—numbers which themselves must
necessarily be given by verbal patterns. We have already talked about
polygons, which can be specified easily by a finite list of sides and angles,
and circles, which have their own very simple pattern. What are some other
patterns we can think of? What sorts of descriptions are possible? What
curves besides circles can we talk about?
23
There is one curve besides the circle that we have already come across, and in
fact it’s one of the oldest and most beautiful objects in all of geometry: the
ellipse.
For convenience, let’s think of the first plane as being horizontal. Any
point on this plane could then be lifted straight up to a corresponding point on
the slanted plane. In this way, any shape on the first plane can be transformed
into a new shape on the second plane.
Notice also that if the two planes happen to be parallel, then projection
doesn’t do anything at all—it’s dilation by a factor of 1!
At any rate, we now have a radically different way to think about dilation.
Rather than a stretching of a single plane, we can view it as a projection
through space of one plane onto another. In particular, the dilated form of any
object (not just a circle) occurs as a suitably slanted cross-section of the
generalized cylinder with that object as its base.
24
An entirely different approach to ellipses is through their so-called focal
properties. It turns out that inside every ellipse are two special points, called
focal points, which have the amazing feature that every point on the ellipse
has the same combined distance to them.
In other words, as a point travels along the perimeter of the ellipse, the
distances to the two focal points will change, but the sum will remain
constant. This makes it possible for us to describe an ellipse in a new way, as
“the set of points whose distances to two fixed points have a fixed sum,” or
some such phrase. Some people even choose to take this as their definition of
an ellipse.
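For a dilated circle, the focal property is easy to test numerically. Take the unit circle stretched horizontally by a factor a; the focal points then sit at distance √(a² − 1) from the center. (That location is a standard fact, taken on faith here rather than derived.) A sketch in Python:

```python
import math

a = 3.0                       # stretch the unit circle by 3, horizontally
c = math.sqrt(a**2 - 1)       # standard focal distance for this ellipse
F1, F2 = (-c, 0.0), (c, 0.0)

def total_distance(t):
    # A point on the dilated circle: (a cos t, sin t).
    x, y = a * math.cos(t), math.sin(t)
    return math.hypot(x - F1[0], y - F1[1]) + math.hypot(x - F2[0], y - F2[1])

# The combined distance is the same all the way around (it equals 2a).
sums = [total_distance(2 * math.pi * k / 12) for k in range(12)]
print(min(sums), max(sums))   # both 6.0, up to rounding
```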
Of course, it doesn’t really matter whether you think of an ellipse as a
dilated circle that happens to have an interesting focal property, or if you
think of the focal property as the defining characteristic of ellipses, which
then happen to be dilated circles. Either way, we have some work to do. I
mean, a dilated circle is one thing, a curve with focal points is another. Why
should they be the same? More to the point, how can we prove they are the
same?
This is the sort of thing I love about mathematics. Not only are there
amazing discoveries to be made, but you have the additional challenge of
understanding why such a thing should be true and of crafting a beautiful and
logical explanation. You get all the pleasure of art and science all in one
package, plus it’s all in your head!
I want to show you an ingenious argument (discovered by Dandelin in
1822) that explains why dilated circles have this focal property. Let’s start by
viewing our ellipse as a cross-section of a cylinder by a slanted plane.
If we’re going to be able to show that this curve satisfies the focal property,
where on earth are the focal points going to be? The answer is shockingly
beautiful.
Take a sphere S (of the same diameter as the cylinder) and drop it into the
cylinder from above, so that it falls and hits the slicing plane at a point P. Do
the same thing with another sphere S′ from below, pushing up until it hits the
plane at a point P′.
These two points P and P′ (where the spheres hit) turn out to be the focal
points. Is that gorgeous, or what!
Of course, to confirm this, we need to show that no matter what point we
choose on the ellipse, the total distance to these two points will always be the
same. Let’s suppose Q is an arbitrary point somewhere on the ellipse. Now
imagine the line through Q and P.
This line has a very interesting feature: it touches the sphere S exactly once.
This is quite unusual. Most lines either miss a sphere entirely or pass through
it, hitting it twice. A line that touches a sphere only once is called a tangent
(Latin for “touching”). The line through Q and P is a tangent to the sphere S
because it is contained in the plane, which hits the sphere only at P.
There is another way to make a tangent to S, and that is to take a vertical
line through Q, intersecting the sphere S on its equator.
In general, there are many tangents to a sphere from a given point. The
interesting thing is that they all have the same length.
That is, the distance from a point outside the sphere to a point on the sphere
where the tangent hits is the same no matter which tangent you use.
In particular, the distance from our point Q to the alleged focal point P is
the same as the vertical distance from Q to the equator of S. To make it
simpler, let me chop off our original cylinder at the equators of the two
spheres.
Then what we’re saying is that the distance from Q to P is the same as the
distance from Q to the top of this cylinder, and by similar reasoning, the
distance from Q to P′ must be the same as the distance from Q to the bottom
of the cylinder.
This means that the total distance from Q to the two points P and P′ must
simply be the height of the cylinder. Since this height is independent of the
position of the point Q, our ellipse really does satisfy the focal property, and
this beautiful proof shows us why. What an inspired work of art!
How do people come up with such ingenious arguments? It’s the same way
people come up with Madame Bovary or Mona Lisa. I have no idea how it
happens. I only know that when it happens to me, I feel very fortunate.
25
Now I want to tell you about another remarkable property of ellipses, which is
interesting not only mathematically, but also from a “real world” point of
view. Probably the simplest way to describe it is to think of an ellipse as a sort
of pool table with a cushion running around its perimeter. Imagine a hole at
one of the focal points and a ball placed at the other. Then it turns out that no
matter which direction you shoot the ball, it will always bounce off the
cushion straight into the hole!
In other words, ellipses are bent in just the right way so that lines from one
focal point get reflected into lines to the other. Geometrically, this is saying
that the two lines meet the ellipse in equal angles.
What makes this a little confusing is that we’re dealing with a curve; what
does angle mean exactly?
The most elegant way out of this dilemma is to use a tangent: a line that
touches the ellipse at exactly one point. Each point on the ellipse has a unique
tangent line through it that indicates the direction in which the curve is
bending there.
This gives us a way to talk about angles made by curves. The angle
between two curves is just the angle made by their tangents.
As I’ve said before, the task of the mathematician is not only to discover
fascinating truths but also to explain them. It’s one thing to draw some
ellipses and lines and say that such and such is happening—it’s quite another
to prove it. So I want to show you a proof of the tangent property. The
explanation I have in mind is not only simple and pretty but is general enough
to apply to many other situations besides ellipses.
In fact, let’s start by looking at a different (but related) problem. Suppose
we have two points situated on the same side of an infinite line (it’s nicer to
deal with an infinite line because its length and position don’t become an
issue).
The question is, what is the shortest path from one point to the other that
touches the line? (Naturally, the part about touching the line is the interesting
part. If we dropped that requirement then the answer would simply be the
straight line connecting the two points.)
Clearly the shortest path must look something like this:
Since our path has to hit the line somewhere, we can’t do better than to go
straight there. The question is, where is there? Among all the possible points
on the line, which one gives us the shortest path? Or does it even matter?
Maybe they all have the same length!
The truth is, it does matter. There is only one shortest path, and I’ll tell you
how to find it. Let’s first give the points names, say P and Q. Suppose we
have a path from P to Q that touches the line.
There’s a very simple way to tell if such a path is as short as possible. The
idea, which is one of the most startling and unexpected in all of geometry, is
to look at the reflection of the path across the line. To be specific, let’s take
one part of the path, say from where it hits the line to where it hits the point
Q, and reflect that part over the line.
We now have a new path that starts at P, crosses the line, and ends up at the
point Q′, the reflection of the original point Q. In this way, any path from P to
Q that touches the line can be transformed into a new path from P to Q′.
Now, here’s the thing: the new path has exactly the same length as the
original. This means that the problem of finding the shortest path from P to Q
that hits the line is really the same as finding the shortest path from P to Q′.
But that’s easy—it’s just a straight line. In other words, the path we’re
looking for, the shortest path between the points that touches the line, is
simply the path that when reflected becomes straight.
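The reflection trick is concrete enough to compute with. Here is a sketch in Python, with the mirror line taken to be the x-axis for convenience: reflect Q across the line, draw the straight path from P to Q′, and the bounce point is where that path crosses the line. The two legs then make equal angles with the line.

```python
import math

P, Q = (0.0, 2.0), (6.0, 3.0)      # two points above the line y = 0
Q_reflected = (Q[0], -Q[1])        # reflect Q across the line

# The straight path from P to Q' crosses y = 0 at the bounce point.
t = P[1] / (P[1] - Q_reflected[1])             # fraction of the way along
x_bounce = P[0] + t * (Q_reflected[0] - P[0])

# The angle each leg makes with the line, measured at the bounce point.
angle_P = math.atan2(P[1], x_bounce - P[0])
angle_Q = math.atan2(Q[1], Q[0] - x_bounce)

print(x_bounce)                    # 2.4, up to floating-point rounding
print(angle_P, angle_Q)            # the same angle, twice
```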
Apart from its sheer loveliness, this argument is also an excellent example
of the modern mathematical viewpoint that considers problems as occurring
within a framework of structures and structure-preserving transformations. In
this case, the relevant structures are paths and their lengths, and the key to the
problem is recognizing reflection as the appropriate structure-preserving
transformation. This is admittedly a rather professional point of view, but I
think it’s a valuable way for anyone to think about math problems.
Now that we know precisely what the shortest path looks like, we can think
about alternative descriptions of it. One of the simplest is that it’s the path that
makes equal angles with the line. The shortest path is the one that “bounces
off.”
Why does the shortest path make equal angles with the
line?
Of course, the reason I bring all this up is that it helps explain the tangent
property of ellipses. In that situation, we have a path running from one focal
point to the other, by way of a point on the perimeter. We want an explanation
as to why the angles made with the ellipse (that is to say, with the tangent)
must be equal.
Well, the reason is that this path happens to be the shortest path between
the focal points that touches the tangent line. This is easy to see from the focal
property of the ellipse: all the points on the ellipse have the same total
distance to the focal points. Naturally, points inside the ellipse will have a
smaller total distance, and points outside will have a larger total distance. In
particular, since any point of the tangent line (other than the actual point of
contact with the ellipse) is strictly outside the ellipse, the path through such a
point must be longer than the one through the contact point itself.
Since our path is shortest, it must then make equal angles with the tangent.
The tangent property, or “pool table” effect, comes directly from the focal
property of ellipses and the fact that shortest paths always bounce.
26
I want to say a few more things about the relationship between geometry and
reality. Of course, in some sense, they are entirely different, one being a
completely imaginary construction of the human mind and the other
(presumably) not. Physical reality was here before there were conscious
human beings, and it will still be here when we’re gone. Mathematical reality,
on the other hand, depends on consciousness for its very existence. An ellipse
is an idea. There are no actual ellipses out there in the real world. Anything
real is necessarily a wriggling, jiggling mass of trillions of atoms and is
therefore far too complicated to ever be described by human beings in any
precise way.
There are two important differences between physical atoms (which real
things are made of) and the mathematical points that make up our imaginary
geometric objects. First, atoms are constantly in motion, flying on and off and
smashing into each other. Points do what we tell them to do; the center of a
circle doesn’t wiggle around. Secondly, atoms are discrete—they stay away
from each other. Two atoms can be brought only so close together; the forces
of nature (apparently) do not allow them to get closer. Of course, we place no
such restriction on our imaginary points. Mathematical objects are governed
by aesthetic choices, not physical laws. In particular, a line or curve of points
is impossible to realize physically. Any “curve” made of real particles is
necessarily going to be lumpy, with all kinds of gaps in it—more like a string
of pearls than a strand of hair (and that includes, of course, an actual strand of
hair).
On the other hand, it’s not true that there is absolutely no connection
between geometry and reality. There may not be any perfect cubes or spheres
in the world, but there are some pretty good approximate ones. Any property
that the mathematical cube and sphere enjoy must be roughly true for a
wooden box and a bowling ball.
A good example is the tangent property of the ellipse. The pool table
analogy is not merely a brilliant rhetorical device; we actually could build
such a pool table, with green felt and everything. It might take a little trial and
error to adjust the size of the hole and the springiness of the cushions, but we
could definitely get it to work; we could shoot an actual ball in any direction
and it would always go in. People have also built elliptical rooms that exhibit
the tangent property in a different way. Two people stand at the focal points
and whisper to each other. All of the sound waves emitted by one person
bounce off the walls and end up in the other person’s ear. The result is that
they can hear each other, and no one else in the room can hear a thing.
So how is it that these things actually work? If atoms and points are so
different, why does a pool table made of atoms behave so much like an
imaginary ellipse made of points? What is the connection between real
objects and mathematical ones?
First of all, notice that something like the elliptical pool table wouldn’t
work if it were too small; for instance, if it were only a few hundred atoms
wide. This object would behave nothing like an ellipse. An atom-sized ball
would simply fly through the gaps in the wall or get involved in some
complicated electromagnetic interaction with it. In order to behave at all
geometrically, an object has to contain enough atoms to statistically cancel
out these kinds of effects. It has to be big enough.
On the other hand, if the pool table were too big, say the size of a galaxy, it
would also fail due to gravitational and relativistic effects. To be like a
geometric thing, a real thing has to be the right size; namely, it has to be about
our size. It has to be roughly at the scale at which we humans operate. Why?
Because we’re the ones who made up the mathematics!
We are creatures of a certain size, and we experience the world in a certain
way. We’re much too big to have any direct experience with atoms; our senses
can’t pick up anything that small. So we have no intuition at that scale. Our
imaginations are informed by our experiences; it’s only natural that the kind
of imaginary objects we would create in our minds would be simplified and
perfected versions of the things we’ve seen and felt. If we were a radically
different size, we would have developed a very different type of geometry—at
least initially. Over the centuries, people have invented lots of different
geometries, some of which work well as models of reality at very small or
very large scales and some that have nothing to do with the real world
whatsoever.
So the connection between geometry and reality is us. We are the bridge
between the two. Mathematics takes place in our minds, our minds are a by-
product of our brains, our brains are part of our bodies, and our bodies are
real.
27
An ellipse is one of those rare shapes that we can actually talk about; it has a
definite, precise pattern that can be put into words. Of course, all we’ve really
done is to take a preexisting pattern (namely that of a circle) and modify it
slightly; it’s not like we built up the ellipse pattern from scratch. An ellipse is
a transformed circle, and it is the transformation itself (namely dilation) that
endows the ellipse with its various properties and allows us to speak of it at
all. The classical focal definition is another way; it’s a generalization of the
idea of a circle and its center.
The point is that we created a new shape by modifying an old one. Any
geometric transformation can be used in this way, provided we have a precise
description of how it works (“a sphere with a dent in it” is a bit too vague). In
particular, if a shape has a definite, describable pattern, then so will any
dilation of it.
One simple way to get new shapes from old ones is by taking cross-
sections. Ellipses occur as cross-sections of a cylinder. What happens when
we slice other three-dimensional objects? A sphere is certainly an attractive
candidate. Unfortunately, all of its cross-sections are circles, so we get
nothing new. What about the cross-sections of a cone?
Surprisingly, these turn out to be ellipses. This might at first seem very
strange, seeing as how cones are so different from cylinders. You would
expect them to have more asymmetrical, egg-shaped cross-sections. On the
other hand, it’s not hard to modify our earlier argument with the Dandelin
spheres to show that the cross-sections of a cone satisfy the exact same focal
property.
Again we have two spheres, of different sizes this time, which each hit the
plane at exactly one point. The main difference now is that the spheres no
longer touch the cone along their equators, but along parallel circles above
their equators. Nevertheless, the same argument with the tangents shows that
the cross-sectional curve has the right focal property, so it is in fact an ellipse.
This kind of conic, called a parabola (Greek for “thrown beside”), is also
infinite, but (as we will soon discover) is shaped quite differently than a
hyperbola.
The point is, there are different ways to slice a cone, and depending on
what kind of slice you make, you get different types of curves with very
different properties. The conic sections were extensively studied by the
classical geometers, most notably by Apollonius (circa 230 BC). One of the
great discoveries of this period was that hyperbolas and parabolas have their
own focal and tangent properties, just as ellipses do. Of course, I want to tell
you about them, but first I thought it would be nice to show you a somewhat
different, more modern way to think about the conic sections.
The idea is to think of them projectively. Let’s imagine two planes in space.
Instead of choosing a particular direction to project in, let’s fix a certain point
in space (not on either plane) to project from.
Points on the first plane are projected onto the second plane by straight
lines through this projection point. (Sometimes I like to think of the projection
point as the sun, and the projected images as shadows.) Of course, there’s
nothing saying that the second plane has to be behind the first; we might be
projecting toward the point instead of away from it. We could even place the
projection point between the two planes.
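Central projection is simple to write down in coordinates. Here is a minimal sketch in Python, with my own (arbitrary) choices: the source plane is z = 0, the target plane is x = 1, and we project from the point O = (0, 0, 1). Two parallel lines on the source plane come out as lines that crowd toward a single point.

```python
def project(S, O=(0.0, 0.0, 1.0)):
    """Project a point S of the plane z = 0 from O onto the plane x = 1."""
    # Where the line of sight through S crosses x = 1. Points with
    # S[0] == O[0] are exactly the ones that get lost: their line of
    # sight is parallel to the target plane.
    t = (1.0 - O[0]) / (S[0] - O[0])
    return tuple(O[i] + t * (S[i] - O[i]) for i in range(3))

# Two parallel lines on the source plane: y = x and y = x + 2.
for c in (0.0, 2.0):
    images = [project((x, x + c, 0.0)) for x in (2.0, 10.0, 1000.0)]
    print(images)

# Far out along either line, the images approach the same point
# (1, 1, 1): in the projection, the two parallel lines meet.
```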
28
Since projection corresponds to a change in perspective, it’s natural to think
of two objects related by a projection as being the same; that is, two different
views of the same object. The philosophy of projective geometry is that the
only properties of a shape that matter are those that are unaffected by
projections. What is intrinsic, what is “real” about a shape, should not depend
on one’s point of view; beauty should not be in the eye of the beholder. Any
feature that changes under projection is not so much a property of the object
itself but of the way in which it is being viewed. This is a rather modern way
of thinking. We have a certain type of transformation (in this case projection),
and we are interested in those structures that are invariant.
Are all triangles the same projectively? How about all four-
sided polygons?
The biggest difference between classical and projective geometry is that the
traditional measurements—angle, length, area, and volume—no longer have
any meaning. Projection warps a shape so much that all such measurements
get radically changed. In this sense, projection is extremely destructive.
What’s happening here is that the point where the two lines intersect
doesn’t appear on the target plane at all. In fact, in any central projection,
there will be special points on one plane that don’t make it onto the other.
The trouble is that sometimes the line of sight from the projection point is
parallel to the target plane. This is something of a disaster. It means that
projection is bad—it loses information. In particular, it can lose the
information of whether two lines intersect or not.
The extent of the damage is that there is an entire infinite line of points that
will disappear under projection.
Specifically, it’s the line parallel to the target plane that is at the same
height as the projection point. All the points on this line will be lost in the
projection process.
Something similarly disastrous happens in reverse. If we start with parallel
lines on a plane and project them onto another plane, we obtain something
truly monstrous—a pair of crossed lines with the crossing point missing.
From a practical point of view, say to an artist or architect, this is all good
news. It’s very nice to be able to draw convincing pictures of railroad tracks,
and nobody needs to lose any sleep over some missing points.
Mathematically, however, it’s quite disturbing. Disturbing enough, in fact, to
lead geometers to take a very bold and imaginative step—to redefine space
itself.
The idea is really quite ingenious. What goes wrong with projection is that
not all lines through a point necessarily hit a given plane.
The trouble is, things can be parallel. Lines can be parallel to lines, planes
can be parallel to other planes, and, as in this case, lines and planes can be
parallel to each other. Since it is parallelism that causes the problem, the idea
is to get rid of it—to make it so that lines or planes that lie in the same
direction actually do meet.
The plan is this: for each direction in space, we imagine a new point
somehow infinitely far away in that direction. The idea is that all lines that lie
in that direction will now meet at the new imaginary point. It’s that simple.
We just throw in enough new points (one for each direction is enough) so that
parallel lines and planes now intersect each other.
One nice way to think about it is to imagine a line and a point and see how
various lines through that point intersect the line.
As the lines get closer to being parallel, the intersection points move farther
and farther to the right. The philosophy is that when the line becomes exactly
parallel, it still has an intersection point, one that is infinitely far away to the
right. Interestingly, the same thing happens with lines slanted to the left. The
new points we are adding lie both infinitely far to the right and to the left. It’s
as if our lines are somehow like circles that pass through infinity and come
back around the other side.
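If you happen to like modern bookkeeping, there is a tidy way to make this concrete using so-called homogeneous coordinates (my own sketch, not part of the classical story): encode the line ax + by + c = 0 as the triple (a, b, c). Then the intersection of two lines is just their cross product, and a triple whose last coordinate is 0 is exactly one of our new points at infinity.

```python
# A line ax + by + c = 0 in the plane, encoded as the triple (a, b, c).
# In homogeneous coordinates, the intersection of two lines is their
# cross product; a last coordinate of 0 signals a point at infinity.

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

l1 = (0, 1, 0)    # the horizontal line y = 0
l2 = (0, 1, -1)   # the parallel horizontal line y = 1

p = cross(l1, l2)
print(p)  # (-1, 0, 0): last coordinate 0, so the two parallel lines
          # meet at the point at infinity in their shared direction
```

Notice that every pair of parallel lines in that direction produces the same point at infinity, which is exactly the philosophy: one new point per direction.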
Does this sound like the insane ravings of a lunatic? I admit it takes a bit of
getting used to. Perhaps you object to these new points on the grounds that
they are imaginary—they’re not really there. But none of the things we’ve
been talking about are real anyway. There’s no “there” there in the first place.
We made up imaginary points, lines, and other shapes so that things could be
simple and beautiful—we did it for art’s sake. Now we’re doing it again, this
time so that projections will be simple and beautiful. It’s really nice, once you
get accustomed to it.
These points we’re adding in are called points at infinity. The new enlarged
space we’ve created, consisting of ordinary three-dimensional space plus all
the points at infinity, is known as projective space. It is customary to add the
appropriate points at infinity onto all the various lines and planes as well. A
projective line is thus an ordinary line together with the point at infinity
corresponding to its direction. A projective plane is a plane, along with all the
points at infinity that you would expect—the ones corresponding to the
various directions on that plane.
The upshot of this is that we have a new geometry, one where parallelism
has been banished. Two lines on a plane intersect, period. If they intersected
before, they still do. If they were parallel before, they now meet at infinity.
This is a much prettier, more symmetrical situation than in classical geometry.
What about two planes? Normally, two planes intersect in a line. What
happens when the planes are parallel? Notice that parallel planes have the
exact same points at infinity, and that these points then constitute the
intersection of the two planes. This makes it desirable to view the points at
infinity of a plane as lying on a line at infinity. Now we can say with complete
generality that two projective planes in projective space always intersect in a
projective line.
Similarly, it is nice to think of the complete set of points at infinity in
projective space as forming a projective plane at infinity. Then we can say, for
instance, that a line and a plane always meet at exactly one point (unless, of
course, the line happens to lie in the plane).
Of course, the right way to deal with projective space is to forget about the
distinction between ordinary points and points at infinity. Projectively, there is
no such distinction; what is ordinary from one perspective is infinite from
another. Projective space is a completely symmetrical environment, and all of
its points are created equal.
In particular, the distinction between parallel and central projection is
rather spurious. Parallel projection is just central projection from a point at
infinity. So we might as well drop the adjectives, which reflect a classical
bias, and simply call them both projection.
We now have a completely reformed projection transformation, and we’ve
identified a few of its invariants—straightness, tangency, and intersection.
Can you find any others?
In this case we are again projecting the circle from the horizontal plane
onto the slanted plane using the tip of the cone as our projection point. So
parabolas are certainly projections of a circle. Notice that there is exactly one
point of the circle that does not project onto the slicing plane proper; it ends
up being projected to a point at infinity on the slicing plane. This means that a
parabola is simply a circle with one of its points at infinity. The line at infinity
is then tangent to the circle.
So the circle projects to two bowl-shaped curves, one pointing up and the
other pointing down. A hyperbola, then, should be thought of as consisting of
two pieces. Again it is a projection of a circle, only this time there are two
points at infinity.
So the conic sections, properly understood, are just projected circles. This
means that, projectively speaking, they are circles. The differences between
them from a classical point of view depend on how the circle intersects the
line at infinity—whether in zero, one, or two points.
Not only that, it turns out that every projection of a circle is a conic section.
No matter how you project a circle, you will always get either an ellipse,
parabola, or hyperbola. There aren’t any other curves out there that are
projectively equivalent to a circle. In particular, this means that a slanted cone
gives us the same cross-sectional curves as an upright cone.
Even if the base of the cone is itself a conic section, say an ellipse, we still
don’t get anything new. That is, a projection of a conic is still a conic.
29
As gratifying as it may be to view the conic sections projectively—to see
them as different perspective views of the same circle—it doesn’t actually tell
us that much about the geometry of these curves. It’s all very well to know
that hyperbolas, parabolas, and ellipses are projectively equivalent, but they
are still different shapes. What do they look like exactly? How, for instance,
does a parabola differ from a hyperbola?
At this point, we know a lot more about ellipses than we do about the other
conics. We know that ellipses are dilated circles, and we know they have
particularly nice focal and tangent properties. Can we say anything similar
about hyperbolas and parabolas? It turns out that we can.
Hyperbolas have a very beautiful focal property, as a matter of fact. Like an
ellipse, a hyperbola contains two special focal points, and as a point moves
along the hyperbola, its distances to these focal points follow a simple pattern.
This time, however, it’s not the sum of the distances that remains constant,
it’s the difference. That is, a hyperbola is the set of points whose distances to
two fixed points differ by a fixed amount.
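We can at least spot-check this claim numerically. In coordinates (my choice, not the book's), the hyperbola x²/9 − y²/16 = 1 has its focal points at (±5, 0), and the difference of distances should come out the same for every point on the curve:

```python
import math

a, b = 3.0, 4.0
c = math.sqrt(a*a + b*b)          # distance from center to each focus: 5.0
f1, f2 = (-c, 0.0), (c, 0.0)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

diffs = []
for x in (3.0, 4.0, 7.0, 50.0):
    y = b * math.sqrt(x*x/(a*a) - 1)      # a point on the right-hand piece
    diffs.append(abs(dist((x, y), f1) - dist((x, y), f2)))

print(diffs)   # every entry is 6.0 (up to rounding): the difference stays fixed
```

The constant difference here is 6, twice the distance from the center of the hyperbola to its nearest point.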
Naturally, such an outrageous claim requires some sort of proof. We need
to show that if a cone is sliced steeply (so as to make a hyperbola) then the
points of the cross-section must obey this new focal property. As you might
expect, this can be done with spheres and tangents in the usual way.
The focal property tells us quite a bit about hyperbolas. For one thing, it
means they must be fairly symmetrical.
Not only is each of the two pieces of the hyperbola itself
symmetrical, but the two pieces are mirror images of each other. There is symmetry
across the line connecting the focal points and also across the perpendicular
line between them.
Neither of these lines actually touches the hyperbola, but as you travel out
along the hyperbola you get closer and closer to them. In other words, these
lines are simply the tangents to the hyperbola at infinity.
The simplest way to think of it is to view the hyperbola as a circle in
projective space that meets the line at infinity at two points. (That is, after all,
what a hyperbola is.) The circle then has two tangent lines at these points, and
these are the lines that we’re seeing.
Since hyperbolas are symmetrical, the crossing point of the tangents must
be exactly halfway between the two focal points.
There is a very pretty connection between these tangent lines and the focal
property of the hyperbola. If we draw lines through the focal points, parallel
to the tangents, we get a diamond shape.
Because of the symmetry, the sides of this diamond must all be the same.
The angles won’t necessarily be right angles, so we can’t say it’s a square, but
it’s still a nice diamond. (You could also call it a rhombus if you like that
word better.)
The focal property says that the distances from each point on the hyperbola
to the focal points have a constant difference. It turns out that this constant
difference, what we might call the focal constant of the hyperbola, is exactly
equal to the side length of the diamond.
(Probably the easiest way to see it is to imagine a point moving along the
hyperbola toward infinity, and think about what happens to the lines
connecting it to the focal points.)
One consequence of this is that a hyperbola is completely determined by its
tangents at infinity (the two crossed lines) and its focal points.
In particular, there is the very special right hyperbola whose tangents meet
at a right angle.
In this way we can think of an ellipse as having a long radius and a short
radius. The ellipse is then completely determined by these two lengths.
Similarly, we can talk about a unit hyperbola. This would be the right
hyperbola whose distance from center to edge is exactly one unit.
Another amusing similarity between ellipses and hyperbolas is the way the
focal constant appears geometrically.
It also happens that ellipses and hyperbolas have similar tangent properties.
For the ellipse it’s the “pool table” effect. What is it for the hyperbola?
Can you discover the tangent property of a hyperbola?
The parabola, it turns out, is another story altogether. It does have a focal
property, but it is of a very different character from those of the ellipse and
hyperbola. Instead of two focal points, a parabola has only one. When we
slice a cone at exactly the same slant as the cone itself, we create only one
compartment capable of housing a sphere in the right way.
That is, there is only one sphere that is simultaneously tangent to the cone
and the slicing plane. As usual, the focal point of the parabola is the point
where this sphere hits the plane. The distance from a point on the parabola to
the focal point is the same, then, as the distance from the point to the sphere,
along the cone. In other words, the distance to the circle where the sphere
meets the cone.
In the case of the ellipse and the hyperbola, we had another focal distance
we could compare this with. Here we’ve got bupkes. How can we understand
what this length means geometrically? I think the best way to see what’s
going on is to slice the cone twice horizontally, both through the circle and
through our chosen point, to make a sort of lamp-shade thing.
This removes any unnecessary cone baggage. Notice that the plane through
the circle intersects the slicing plane in a certain line. This line turns out to be
the key to the whole business. The important thing is that it depends only on
the parabola itself, not on which point we happened to choose.
Now, here’s the beautiful observation. The length that we’re interested in
(the distance from our point to the focal point) is just the distance along the
lamp shade between the two horizontal planes. We can swing this length
around the lamp shade without changing it. In particular, we can roll it around
until it’s directly opposite the slicing plane.
Now it’s easy to see what this length is—it’s just the distance from our
point to the special line. How pretty! So the deal with parabolas is that not
only is there a focal point, there is also a focal line, and the points of the
parabola obey the beautiful pattern of being equidistant to both.
This focal property of a parabola has a number of interesting consequences.
For one thing, it means parabolas must be symmetrical (not that that’s any
great surprise).
What this means is that any two parabolas are just scalings of each other. In
other words, all parabolas are similar. There’s really only one parabola shape.
There are lots of different ellipses and hyperbolas depending on how you
stretch them, but there’s only one parabola. That makes it very special.
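Here is a one-line confirmation in coordinates (not the book's style of argument, but consistent with it): after a rigid motion, any parabola can be written y = kx², and a dilation centered at the vertex carries it onto any other member of the family.

```latex
% Dilation by the factor s sends (x, y) to (X, Y) = (sx, sy).
% A point (x, kx^2) on the parabola y = kx^2 lands at X = sx, Y = skx^2,
% and eliminating x gives
\[
Y = sk\left(\frac{X}{s}\right)^{2} = \frac{k}{s}\,X^{2},
\]
% so choosing s = k/k' carries y = kx^2 onto y = k'x^2:
% any parabola is a scaling of any other.
```

Nothing like this works for ellipses or hyperbolas, where the two radii (or the two tangent directions at infinity) can be stretched independently.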
One nice way to think about parabolas is to view them as infinite ellipses: a
parabola is what you get when you fix one focal point of an ellipse and send
the other one off to infinity.
(We could also think of them as infinite hyperbolas in the same way.)
Parabolas lie, in a sense, on the borderline between ellipses and hyperbolas.
Right away, this tells us what the tangent property of a parabola must be: if
we shoot out from the focal point, we will hit the wall of the parabola and
bounce straight out to infinity.
This has a number of amusing practical applications. For one thing, it says
that if we make a parabolic mirror (a mirror in the shape of a paraboloid) and
place a lightbulb at the focal point, then all the radiation will be sent straight
out; none of the energy will be wasted. This is exactly how flashlights and
automobile headlights are designed. Running this in reverse, a parabolic
mirror also makes a terrific solar oven. All the sunlight entering the mirror is
focused at a single point. (That’s why it’s called a focal point.) Conic sections
make good lenses; they bend light in an interesting and useful way.
If I have lingered on the subject of conic sections for so long, it is because
they are so beautiful and have so many interesting properties, and I can’t
resist telling you about them. The other reason is that it is possible to tell you
about them. It’s not that easy to talk about curves, and conics are relatively
simple as far as things go.
I want to stress something. These conic sections are very particular and
specific curves—not every bowl-shaped object is a parabola or hyperbola.
Most curves don’t have anything like a focal or tangent property. These things
are special, and we should cherish them!
If you connect lines in this evenly spaced pattern, a
parabola appears. Why?
One final word about conics, and that concerns their measurement. We’ve
already discussed the ellipse situation. Since an ellipse is simply a dilated
circle, its area is easy to measure; for the same reason, its perimeter is not. To
be precise, if we have an ellipse whose long and short radii are a and b, say,
then the area is simply πab. Do you see why? The perimeter, on the other
hand, depends on a and b in a transcendental way. There is no formula in the
sense of a finite algebraic description.
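As a sanity check on the πab formula, here is a quick numerical sketch of my own, approximating the area of an ellipse by thin vertical strips:

```python
import math

a, b = 5.0, 2.0               # long and short radii (an arbitrary choice)
n = 100_000
dx = a / n

# Area of the quarter ellipse y = b*sqrt(1 - x^2/a^2), 0 <= x <= a,
# by a midpoint Riemann sum, then multiplied by 4 for the full ellipse.
area = 4 * sum(b * math.sqrt(1 - ((i + 0.5)*dx)**2 / (a*a)) * dx
               for i in range(n))

print(area, math.pi * a * b)   # both about 31.4159
```

The strips, of course, are just the method of exhaustion in thin disguise.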
Unfortunately, that’s par for the course; the same is true for the parabola
and hyperbola. Of course, those curves are infinite, so we can’t really talk
about their perimeters as such. But even if we chop them off at some point,
their lengths are not algebraically describable. Not that they aren’t very
interesting. In fact, we will return to the subject of conic section lengths a
little later on, when we will have some more powerful measurement
techniques.
There is one measurement that we are in a position to make, and that is the
area of a parabolic sector.
This is the kind of region formed by two straight lines drawn from the focal
point out to the curve itself. The nicest way to measure this area is to compare
it to the parabolic rectangle made by dropping lines straight down to the focal
line.
Using the method of exhaustion, Archimedes was able to show that the area
of the sector is exactly half that of the rectangle. Can you do the same?
30
What an amazing can of worms we opened up just by slicing a cone! If such a
simple shape as that has such interesting cross-sections, what will happen if
we slice something more complicated? What sort of curves do we get when
we slice, say, a doughnut?
It turns out that this curve, despite the nice symmetrical oval shape, is
definitely not an ellipse. It doesn’t have the right focal or tangent properties,
and it isn’t a dilated anything; it’s an entirely new kind of curve that we
haven’t seen before. I suppose we could call it a toric section if we wanted to.
If we move the slicing plane over a little, so that it just touches the inner
rim of the torus, we get an even more exotic cross-section.
Of course, this is not just any old figure eight sort of shape but a very
specific kind of curve with a very specific kind of pattern—the one that
comes from being a slice of a doughnut. This is a fairly sophisticated
geometric object. What sort of properties might this curve have? How on
earth would we measure such a thing?
A while back I was talking about the description problem. The only shapes
we can talk about are the ones that have a describable pattern. The geometer’s
job is to somehow turn this pattern information into measurement
information. Naturally, this is going to be a whole lot easier to do when the
pattern is a simple one. The more elaborate our descriptions, the harder it is to
say anything about the shapes they describe.
The sad truth is that measurement is almost always impossible. It is only
the simplest objects that we have any hope of measuring. Even then it’s no
picnic. Remember how clever we had to be to measure a sphere? What
chance do we have against a shape whose description is at all involved?
What I’m saying is that in addition to a description problem, we also have a
complexity problem. Not only do our shapes have to have a pattern, they have
to have a simple pattern. The problem is that the only way we have to
measure curved shapes is the method of exhaustion, and if the patterns get too
complicated, it quickly becomes unwieldy.
The situation is kind of ironic in a way. Before, we were worried about not
being able to describe any new shapes at all. Now we have lots of ways to do
that. For instance, we could start with one of these figure eight toric sections,
rotate it in space to form a surface, and take a cross-section of that.
God only knows what sort of curve this is! No way is it an ellipse. No, our
problem is not a shortage of new patterns. In fact, we’ve developed quite a
little arsenal of description tools: we can dilate and project, take cross-
sections, make Pappus-type constructions, and perform any and all of these
operations in succession. We are in a position to create some truly
nightmarish mathematical objects, and there is absolutely no hope of being
able to measure them. We may be out of the description frying pan, but we’re
definitely into the measurement fire.
And you know what? I don’t care. As our descriptions get more and more
elaborate, not only does measurement become more and more difficult, but I
get less and less interested. I really don’t care about the cross-sections of a
rotated toric section. For me, the point of doing mathematics is to see
something beautiful, not to create a bunch of increasingly rococo patterns just
because we can.
So are there any beautiful shapes left? As a matter of fact, there are. One
particularly nice example is the helix.
Now, that’s the kind of simple, elegant shape I’m talking about! I would
love to think about something pretty like that. Of course, before we do, we’ll
need some sort of precise description. What exactly is a helix?
My favorite way to think of it is to imagine a circular disk in space, lying
horizontally let’s say, with a specially marked point on its rim. If we rotate the
disk, and at the same time raise it up vertically, the special point should trace
out a perfect helix.
There are many different styles of helix, of course, depending on how fast
the circle rises relative to how quickly it turns. An easy way to determine a
particular helix shape is to specify both the radius of the rotating circle and
the height increase that the point makes after one full rotation.
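The rotating-disk description translates directly into coordinates (an anachronism for this chapter, but faithful to it): a helix of radius r that rises by a height h per full turn can be traced as follows.

```python
import math

def helix_point(t, r=1.0, h=0.5):
    """Point traced after the disk has turned through angle t (radians),
    rising h units per full turn of 2*pi."""
    return (r * math.cos(t), r * math.sin(t), h * t / (2 * math.pi))

x, y, z = helix_point(2 * math.pi)              # one full rotation
print(round(math.hypot(x, y), 6), round(z, 6))  # radius still 1.0, height 0.5
```

The two numbers r and h are exactly the two choices described above: the radius of the rotating circle and the rise per full turn.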
Sometimes it’s nice to imagine a helix living on the surface of a cylinder,
like a barber pole. The helix can then be described in terms of the size and
shape of the cylinder and the number of full turns made by the helix.
A helix is an example of what are called mechanical curves; that is, curves
that are described as the path of a point on a moving object. Among the most
fascinating and beautiful mechanical curves is the cycloid, the curve traced
out by a point on a rolling circle.
This is a completely new shape, unlike anything we’ve seen before. It also
turns out to have an amazing number of interesting properties; if there were a
“Most Interesting Curve of the Seventeenth Century” award, the cycloid
would win hands down.
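The rolling description also translates into a standard parametrization (not given in the text, but equivalent to it): if a circle of radius r rolls along the ground and has turned through the angle t, the marked point sits at the position below.

```python
import math

def cycloid_point(t, r=1.0):
    """Marked point on a circle of radius r after rolling through angle t.
    The center has moved r*t along the ground (rolling without slipping),
    and the marked point has swung around the center by the angle t."""
    return (r * (t - math.sin(t)), r * (1 - math.cos(t)))

# The marked point touches the ground (height 0) once per full turn:
print(cycloid_point(0), cycloid_point(2 * math.pi))
```

Between touches, the point rises to a maximum height of 2r, at the top of each arch.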
There are several interesting variations on this cycloid idea. One is to have
the circle rolling around inside another circle. This traces out a so-called
hypocycloid. Of course, it could also roll on the outside, making an
epicycloid.
Another idea is to allow the tracing point to be in the interior of the rolling
disk. In the hypocycloid case, this produces the very beautiful spirograph
curves.
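The same rolling idea handles the whole family at once (a standard parametrization, consistent with but not spelled out in the text): a circle of radius r rolls inside a circle of radius R, with the pen a distance d from the rolling circle's center. Taking d = r gives the hypocycloid itself; d < r gives the spirograph curves.

```python
import math

def spirograph_point(t, R=5.0, r=3.0, d=2.0):
    """Trace of a pen fixed a distance d from the center of a circle of
    radius r rolling inside a circle of radius R, after the rolling
    circle's center has swept through angle t."""
    return ((R - r) * math.cos(t) + d * math.cos((R - r) / r * t),
            (R - r) * math.sin(t) - d * math.sin((R - r) / r * t))

print(spirograph_point(0.0))   # starting point (R - r + d, 0) = (4.0, 0.0)
```

Flipping the sign of r in the right places gives the epicycloid version, with the circle rolling around the outside.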
A ladder slips down the wall until it hits the floor. What
curve does its midpoint describe?
PART TWO
We’ll need a way to distinguish the two directions; otherwise, if we say that
something is at position 1 we won’t know which one we mean.
So our reference system not only needs an origin and a unit, but also an
orientation. That is, we need to decide which direction is forward and which
is backward. Of course, it doesn’t matter which we choose. A line in the
abstract has no left or right, and it is completely up to us to decide what those
words mean.
In any case, once we’ve made our choices of origin, unit, and orientation,
we can then refer to any position on the line unambiguously. We could say,
for example, that a point was at position 3 in the backward direction, and that
would pin it down completely.
An even nicer way to proceed is to use positive numbers for one direction
and negative numbers for the other. Then we could simply say that our point
was at position –3.
There are several advantages to this scheme. For one thing, it means that all
locations can be described by a single number, instead of a number and a
direction. More important, it allows us to connect geometry and arithmetic in
a very pleasing and beautiful way.
First of all, notice that moving one unit in the positive direction simply
increases the position number by one.
I like to think of such a move as a shift. I imagine the entire line shifting
over (to the right in this case) so that what was at position 0 is now at position
1, and so on. There is a shift for every number; we can shift by 2 or by the
square root of 2 or by pi. We can also shift the other way; a shift (backward)
by 2 units would correspond to the number −2.
Geometrically, shifting is very nice because it preserves distances. If two
points are at a certain distance from each other before shifting, they will be at
the same distance afterward. A geometric transformation like this that
preserves distances is called an isometry (Greek for “same measure”). A nice
feature of isometries is that if you perform one isometry and then another, the
result is also an isometry. In particular, if you shift by a certain amount, say 2,
and then shift by another amount, say −3, the result is also a shift, in fact a
shift of −1. So not only do two shifts make a shift, but the corresponding
positive or negative numbers get added.
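The isomorphism is easy to express concretely (a little sketch of my own): represent the shift by an amount a as a function of position, and check that composing two shifts adds their amounts.

```python
def shift(a):
    """The isometry that slides every position x over by the amount a."""
    return lambda x: x + a

s = shift(2)
t = shift(-3)

composed = lambda x: t(s(x))    # shift by 2, then shift by -3
print(composed(10))             # 9: the same as a single shift by -1
print(shift(-1)(10))            # 9
```

Composing the geometric transformations on one side corresponds exactly to adding the numbers on the other, which is all an isomorphism is.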
This means that the geometry of shifts has the same structure as the
addition of numbers. In mathematical parlance, the two systems are
isomorphic. This is what mathematicians are always on the lookout for—
isomorphisms between apparently different structures.
So the main benefit of using positive and negative numbers to indicate
direction is that we get this nice isomorphism between the group of shift
isometries and the group of numbers under addition.
Actually, we get a lot more. There are other natural geometric
transformations besides shifts; for instance, there are reflections. Reflections
are nice because they are isometries. What happens if we reflect a point from
one side of the origin to the other? Of course, its position number gets
negated. Position 3 reflects to position −3 and vice versa. So we can say that
the arithmetic operation of negation corresponds to the geometric idea of
reflection.
2
Having devised a way to locate points on a line, we can now try to do the
same thing for a plane. How do we make a map of the plane? One idea is to
simply mimic the system used for street maps:
Of course, this kind of grid system (where such and such a street might be
found in square B-3) is much too crude for our needs. If we want to describe
the motion of a point in a plane, we’ll need to know precisely where it is at all
times. We need the finest grid possible—one with no space at all between the
gridlines. In other words, every horizontal and vertical position needs to
receive a label. The customary way to do this is to use two number lines, one
horizontal and one vertical. (It is also traditional to use the same unit for both
lines and to have them intersect at their origins.)
Then any point in the plane can be referred to by its horizontal and vertical
position numbers. It is just the same as a street map except that instead of
blocks, we have a whole continuum of possible positions in each direction.
Another major difference is that the plane has no intrinsic landmarks. There
is no “center of town,” no “north,” and no customary unit of distance like a
mile. A grid, or coordinate system, on the plane is a completely arbitrary
construct that we impose on it. There is no such thing as horizontal or vertical
on an imaginary plane. These are choices that we make for our own
convenience. When we coordinatize the plane, we are choosing two arbitrary
(usually perpendicular) directions and deciding to call one of them horizontal
and the other vertical. Obviously, there’s no one best way to do this.
I think it’s important to understand the choices that we’re making in more
detail. First of all, there is the choice of reference point or origin. Of course, it
can be anywhere; you get to decide where you want to put it. Then there’s the
choice of unit, which is again entirely up to you. I usually like to choose a
reference point and a unit that have something to do with the objects and
motions under consideration—to tailor them to the situation at hand.
The really interesting choices involve the two lines. Since each line will
have to be oriented, meaning that a choice of forward and backward along
each line will have to be made, it is nicest to think that instead of two lines
we’re really choosing two directions. These will be the positive directions
along the horizontal and vertical lines of our grid.
But there’s one more choice to make after we’ve chosen the two directions;
namely, which is which. With street maps, it is customary to use letters for
one direction and numbers for the other. This avoids confusion. In our case,
we can’t get away with that because we don’t have an infinite continuum of
letters. So we distinguish them using order. We’ll pick one direction to be first
and the other to be second. If you want to call them horizontal and vertical, or
the other way around, fine, just be aware that those words are meaningless.
Words like up and down, clockwise and counterclockwise, left and right,
horizontal and vertical refer to the way things are oriented with respect to
your body. When Australians and Canadians point up, they are both pointing
in the direction from their feet to their heads, but they point in (roughly)
opposite directions in space.
The point is that we need to choose two directions and designate one of
them as the first direction and the other as the second. This set of choices is
what constitutes an orientation of the plane. In particular, we could designate
a certain rotational direction, say from the first direction toward the second, as
clockwise. So just as for the line, a reference system for the plane consists of
an origin, a unit, and an orientation.
Here are two perfectly good coordinate systems (I’ve marked the first
direction with an arrow and the second with a double arrow).
Once we’ve set up a system like this, each point in the plane will get a
unique label consisting of two numbers. It is customary to write such a label
as a number pair, such as (2, 3) or (0, π).
How does the distance between two points in a plane
depend on their coordinates?
Just as we did with the line, we can relate the geometry of position in the
plane to the algebra of shifts. A shift of the plane moves every point a certain
distance in a certain direction. Notice again that a shift of a shift is still a shift.
A nice way to represent such a shift is by an arrow of the appropriate length in
the appropriate direction.
An arrow like this is called a vector (Latin for “carrier”). Since every shift
corresponds to a vector and vice versa, we can talk about adding two vectors
to get another vector. This would correspond to two shifts resulting in a total
shift.
Once again we have a nice isomorphism between the shift isometries and
an algebra of some kind. The point of vectors is that they encode geometric
information algebraically. In particular, we can imagine a very simple vector-
based reference system for locations in the plane. If we choose a fixed
reference point, then every location in the plane can be thought of as being the
tip of an arrow emanating from this origin.
Can you use vector algebra to show that the lines drawn
from the corners of a triangle to the midpoints of the
opposite sides all meet in a single point?
Then instead of saying that a point has coordinates (2, 3) we can say that
the corresponding position vector is the sum of two vectors: the first unit
vector scaled by 2, plus the second unit vector scaled by 3. That is, we can
write algebraic descriptions like p = 2u1 + 3u2 to describe where we are. Not
that there’s any real difference between the two schemes, just a slight change
in viewpoint and notation. By the way, it’s not at all necessary for our system
to be rectangular; that is, the two directions or unit vectors need not be
perpendicular. We still get a perfectly usable (albeit crooked) map of the
plane.
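As a quick illustration (with coordinates of my own choosing): take two unit vectors that are deliberately not perpendicular, and locate the point p = 2u1 + 3u2 on the resulting crooked grid.

```python
import math

# Two unit-length direction vectors, deliberately not perpendicular:
u1 = (1.0, 0.0)
u2 = (math.cos(math.pi / 3), math.sin(math.pi / 3))   # 60 degrees from u1

def scale(c, v):
    return (c * v[0], c * v[1])

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

p = add(scale(2, u1), scale(3, u2))    # the position vector p = 2*u1 + 3*u2
print(p)   # a perfectly good address for a point, crooked grid and all
```

Every point of the plane still gets exactly one pair of scaling numbers; the grid is slanted, but the addressing scheme works just as well.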
3
What about three-dimensional space? Can we do something similar? Yes, we
can! Only we’ll need three directions:
4
In order to describe motion, we not only need a way to locate position but
also the ability to tell time. Of course, we’re not talking about real time, the
kind of time that goes by in the physical world (and God only knows what the
deal is with that!), but rather a purely abstract mathematical version of time,
which, as with everything mathematical, we get to invent. What do we want
mathematical time to mean?
The most elegant answer is this: time is a line. The points on this time line
represent moments, and moving around on the line corresponds to going
forward or backward in time. The choice of a line to represent time is
interesting, because it gives us a geometrical way to think about something
that (at least to me) is not in itself particularly visual.
So how do we tell what time it is? Naturally, we need some sort of clock.
But what is a clock, exactly? A clock is a reference system! It is a way of
assigning numbers to moments in time. To set up our clock, we simply do the
same thing for time that we did for space: choose an origin (a reference time),
a time unit, and an orientation (clockwise?). Having done this, every moment
in time can then be represented by a single (positive or negative) number.
Sometimes I like to think of motions as experiments, and the time line as
my stopwatch. The origin is then the moment I start my experiment. If we call
our time unit a second (and we’re free to call anything anything) then the
number 2 would refer to the precise instant two seconds after the start of the
experiment, whereas the number –π would correspond to the time exactly pi
seconds before my experiment began.
Naturally, as with spatial reference systems, any clock is completely
arbitrary, and we are free to design clocks to suit our present purposes,
whatever they may be.
Let’s imagine a point moving along a line in a certain way, perhaps
speeding up and slowing down, maybe changing direction from time to time
—whatever. Suppose we’ve chosen convenient reference systems for both
time and space, so that the position of the point and the time of day are each
represented by a single number.
Then the motion of the point can be completely described by knowing
exactly which time numbers go with exactly which position numbers. If, for
instance, the point is at position number 2 when the clock reads 1, then that
information (position is 2 when time is 1) constitutes an event in the history
of the motion, and complete knowledge of all such events is tantamount to the
motion itself. Geometrically, we can represent an event like this as a pair of
points, one in space and one in time, linked together by the motion of the
point itself.
Of course, knowledge of one or even a million such correspondences
between space and time is not enough. We need to know all of them. Just as
we cannot measure a shape unless we know exactly where every one of its
points is, we can’t measure a motion unless we can say precisely where the
object is at every instant. This brings us right back to our description problem:
we can’t talk about a motion unless it has a pattern, and a pattern we humans
can describe.
This means that with respect to a given reference system, the position
numbers and the time numbers must satisfy some sort of numerical
relationship that we can state in a finite amount of time.
For example, suppose that a point is moving at a steady rate and we’ve
chosen a time unit (call it seconds) and a space unit (let’s say inches) and that
the constant speed of the point is, say, two inches per second. If we calibrate
our clock so that the point is at position number 0 at the start of the
experiment, then we know that when the time number is 0, the position
number is 0 also.
Abbreviating the time number by the letter t and the position number by the
letter p, we can say that when t = 0, p = 0. Also, when t = 1, p = 2; and when t
= 2, p = 4. We could even make a little chart:

t: 0 1 2
p: 0 2 4
Since the point is moving at a steady rate, we know that the position
number will always be exactly twice as big as the time number. So when t =
1/2, p = 1; when t = √2, p = 2√2; and when t = −π, p = −2π (assuming the
point was moving before we started the stopwatch).
This means that we know every event in the history of this motion. This is
because the pattern is describable. Either the phrase “a point moving at a
constant rate of two inches per second” or the more succinct p = 2t serves to
describe the pattern completely. Notice that both these descriptions depend on
the choice of units: if we chose a different unit of time or distance or both, we
would get a different description of the same motion.
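The pattern p = 2t really does capture every event of this motion. Here is a quick sketch in Python (mine, not the book's) that checks the little chart and a couple of the fancier time numbers:

```python
# A sketch (not from the book): the pattern p = 2t, describing a point
# moving at a constant two inches per second, generates every event.
import math

def position(t):
    """Position number at time number t (units: seconds and inches)."""
    return 2 * t

# The little chart: t = 0, 1, 2 gives p = 0, 2, 4.
events = {t: position(t) for t in (0, 1, 2)}
print(events)               # {0: 0, 1: 2, 2: 4}

# The pattern also covers fractional and negative time numbers:
print(position(0.5))        # 1.0
print(position(-math.pi))   # -6.283... (that is, -2π)
```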
In fact, if a point is moving along a line at a constant speed in a certain
direction, we can always choose our orientations, units, and origins so that the
pattern of position and time is simply p = t. Of course, if we had two points
moving on the same line in different ways, we wouldn’t be able to choose a
reference system in which both motions could be described so simply.
5
One thing that makes thinking about motion somewhat more difficult, or at
least feel different from thinking about size and shape, is that we have no
picture. A shape has a shape, but a motion is a relationship. How can we
“see” a relationship?
One idea, of course, is to make a graph. In the case of a point moving on a
line, we could imagine a chart with, say, time as the horizontal and position as
the vertical.
At each moment in time, given by a number on the time line, there would
correspond a position number, and we can simply plot these numbers on the
chart to give us a visual representation of the pattern of motion.
Notice that what we end up making is a curve. It is absolutely crucial to
understand the status of this curve. This is not the path that the point traces
out—after all, the point is traveling along a straight line—rather, it is the
record of its motion. The point itself is traveling within a one-dimensional
space, whereas this curve here, this graph of the motion, is sitting in a two-
dimensional space.
This two-dimensional space is very interesting. It is not entirely spatial,
since one of its dimensions corresponds to time, and not entirely temporal
either, since the other dimension refers to position. This environment is called
space-time. The points of space-time can be thought of as events, and a
motion is then a curve of events.
The important thing is to be able to read such a diagram: the points are not
moving in a plane; they are moving along a straight line. It is the introduction
of time as an extra dimension that gives us a planar diagram. Motion in one-
dimensional space corresponds to a curve in two-dimensional space-time.
Rather than asking how things move in our universe, a physicist trying to
understand billiard balls and other moving things can rephrase the question
as, what curves in space-time are possible?
6
The thing that got me started talking about motion in the first place was that I
wanted to understand mechanical curves like the helix and the spirograph
curves. These shapes are described by points moving on rolling circles. Next
to constant speed motion along a straight line, this is the simplest motion I can
think of—the motion of a point, at constant speed, around a circle.
Of course, if the only thing that’s going on is a point moving along a circle,
then we’re in essentially the same situation as with a line. We can choose any
point on the circle as our reference point, pick one direction around the circle
as positive, and obtain a number circle in the same way as a number line.
In this manner, we can record the position of a moving point at all times.
The only new wrinkle is that, since the circle is closed, the numbers will wrap
around and each point will receive infinitely many labels. The circle will have
a certain length (depending on our choice of unit), and the various numbers
corresponding to a particular position will differ from each other by multiples
of that amount. For example, if we choose our unit so that the circle has
length 1 (and why not?), then the origin of our coordinate system will receive
the labels 0 and 1, as well as 2, 3, −1, and all the other positive and negative
whole numbers.
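The wrap-around can be made concrete. Here is a small sketch (my own, assuming a circle of circumference 1): two labels name the same point exactly when they differ by a whole number of circumferences.

```python
# A sketch (not from the book): on a number circle of circumference 1,
# each point receives infinitely many labels, differing by whole multiples
# of the circumference.
def same_point(a, b, circumference=1.0):
    """Do the labels a and b name the same point on the number circle?"""
    d = (a - b) % circumference
    return d < 1e-12 or circumference - d < 1e-12

# The origin answers to 0, 1, 2, 3, -1, and so on.
print(same_point(0, 1))      # True
print(same_point(0, -1))     # True
print(same_point(0.25, 3.25))  # True
print(same_point(0, 0.5))    # False -- genuinely different places
```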
Except for this slight twist of having multiple coordinates, the circle
behaves the same as the line. A motion on the circle can be described in the
usual way, as a numerical pattern relating the position number to the time on
the clock. The simple relationship p = t, which describes constant speed
motion on a line, also describes constant speed circular motion.
In fact, the same goes for any curve whatsoever; all curves can be
coordinatized in this way, so describing motion on one curve is the same as
for another. In other words, all curves are intrinsically the same. Well,
actually, that’s not quite true; there is the difference between open and closed.
But that is the only difference. Structurally, any two open curves are identical,
and any two closed curves are also. That is, if your universe were one
dimensional and so were mine, we couldn’t tell the difference between them.
If we both chose reference points and units and set up coordinate systems,
then every location in my world would have a corresponding place in yours,
and no experiment we could perform could detect the difference—except, of
course, for the experiment of going off in one direction and seeing if we ever
come back or not. From a classification point of view, there are exactly two
one-dimensional geometries. (I’m leaving out the unpleasant possibility of
boundary points where space suddenly ends, such as in a line segment with
endpoints.)
A geometry, in the modern sense, is a space of some sort endowed with a
metric (that is, a notion of distance), and two geometries are considered the
same if there is a correspondence between them that preserves the distance
between points. In other words, the structure-preserving transformations are
the isometries.
So if we have two curves, let’s say a wiggly one and a straight one, we can
coordinatize each in whatever way we please, and this sets up an isometry
between them. Namely, we just correspond points with the same numerical
label.
But if a curved line and a straight one are geometrically identical, then
what does curved mean? What are we detecting about these two shapes when
we look at one and call it straight and the other not?
Intrinsically—that is, from the inside—the experiences of people living in
these two spaces are absolutely identical. What is different about them is
extrinsic: the view from the outside. The two curves are the same in and of
themselves; the difference is the way in which they have been embedded in
the plane. Differences between the two curves can be detected by two-
dimensional creatures living in the plane. For instance, the distance between
two points can be measured on one of the curves (and I mean the distance
between them in the plane) and compared with the corresponding
measurement on the other curve.
Now these measurements do not come out the same. The point is that one-
dimensional creatures use rulers that exist inside their universe, so they can’t
measure how their world might be bending with respect to some larger
ambient universe.
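A unit circle in the plane makes a good test case for this. Here is a sketch (mine, not the book's) comparing the intrinsic distance between two of its points, measured along the curve, with the extrinsic chord distance measured through the plane:

```python
# A sketch (not from the book): intrinsic vs. extrinsic distance between
# two points on a unit circle embedded in the plane.
import math

def intrinsic(s):
    """Distance measured along the circle, by a one-dimensional creature."""
    return s

def extrinsic(s):
    """Straight-line (chord) distance through the plane between the points
    with circular coordinates 0 and s. Equals 2*sin(s/2)."""
    return math.dist((1.0, 0.0), (math.cos(s), math.sin(s)))

s = math.pi / 2           # a quarter of the way around
print(intrinsic(s))       # 1.5707...
print(extrinsic(s))       # 1.4142... -- the plane's shortcut is shorter
```

For tiny separations the two measurements nearly agree, which is exactly why the one-dimensional creatures, with their in-universe rulers, can never notice the bending.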
So what curved means is that one space has been stuck inside another in
such a way that its intrinsic metric disagrees with that of the larger outside
space. A straight line in the plane is straight because whether you measure it
from the inside or the outside, you get the same distances (assuming, of
course, that the one-dimensional creatures are using the same measuring unit
as their two-dimensional brethren).
So curved and straight are relative notions. A one-dimensional space is
neither straight nor curved until you inject it into a higher-dimensional space.
Then the two metrics can be compared. It’s not the curve itself that is curved
so much as it is the manner in which it is embedded.
In general, whenever one space sits inside another—whether it is a curve in
a plane, an arc lying in a sphere, or a torus floating in space—the larger,
“parent” space induces a metric on the smaller space. Any other metric that
this subspace may have intrinsically can then be compared with the one it
inherited. If they agree, it means that the smaller space was injected into the
larger isometrically—straight, or flat, or whatever you want to call it.
Otherwise, it got bent.
This is, of course, the modern viewpoint. Under this interpretation, the
circle, as well as every other curve, is intrinsically flat. A nice way to think of
a flat circle is to imagine a stick with “magical” endpoints.
The idea is that when you sail off one endpoint, you immediately reappear
on the other. In other words, the two endpoints represent the same exact place.
The point is, there is no intrinsic difference between this magical space and
the customary idea of a circle. What makes a circle circular is the way it is
situated in the plane.
All of this is a very long-winded way of saying that circular motion is only
really circular when there’s something else going on to compare it with. The
cycloid is a good example. A point is not simply moving along a circle; the
circle is rolling along a line. To understand this motion, we need to know
what circular motion looks like not from the inside point of view of the circle
but from the outside view of the plane.
7
So the right questions to ask are about how a circle sits in the plane. In
essence, we have two competing coordinate systems: the intrinsic circular one
and the one coming from the ambient space of the plane. The question is how
these two systems compare.
Of course, there’s no such thing as the coordinate system for either the
circle or the plane. Coordinate systems depend on choices. If we make ugly,
unpleasant choices, the systems will relate to each other in an ugly, unpleasant
way.
So what would be the nicest choices? We have a circle sitting in a plane.
The first thing to do is to choose a reference point in the plane. I can’t
imagine a nicer, more symmetrical location than the center of the circle.
As for the two directions, we might as well make them perpendicular, and
then, of course, the symmetry of the circle makes it pretty irrelevant which
two directions we choose. So, let’s pick some random direction and call it
horizontal and call the other vertical. It is customary to orient these on the
page as left to right and down to up, respectively, but that is, of course,
entirely up to you. Let’s say we do it the usual way. It is also customary to
choose the horizontal as the first coordinate. Having oriented our system (or
ourselves, whichever way you want to think of it), we need to metrize it by
choosing a unit. Since the circle is the only interesting thing in sight, we
might as well choose its radius as our unit.
Now our rectangular coordinate system in the plane is all set up. Every
point in the plane (including, especially, those points on the circle) can now
be given a coordinate label consisting of two numbers. The top of the circle,
for instance, would be assigned the pair (0, 1).
The other coordinate system we are interested in is the one coming from
the circle itself. This is a one-dimensional system. Any time a curve sits in a
surface, the geometry of the situation will come down to a comparison of a
one-dimensional system with a two-dimensional system.
To set up the circular system, we will need to choose a reference point on
the circle. I can’t say that there’s any particularly strong candidate. I suppose
we might as well choose one of the four points where the perpendicular axes
cross the circle, the classic choice being the right-most point (1, 0). Not that it
matters in the least. Then, there is the clockwise versus counterclockwise
issue. Which direction will be positive? Again it doesn’t matter. Custom
dictates making it counterclockwise (i.e., from the first direction toward the
second). So we’ll start at the right-most point of the circle and lay off units
counterclockwise around the circumference. Naturally, we will use the same
units as we did for the rectangular system, so we won’t have any unnecessary
conversions to do. In other words, we’re measuring the circle using its own
radius as our ruler. Under this system, the total length of the circle is 2π, so
for instance, the top of the circle will receive the label π/2, being one-quarter of
the way around. (Of course, it will also receive the labels 5π/2, −3π/2, and
infinitely many others, circles being closed and all.)
Now, here’s the point. Every location on the circle receives both a circular
and a rectangular coordinate label. The top of the circle is at position π/2 along
the circle, meaning the distance along the circle from the starting point is π/2,
whereas its rectangular label is (0, 1). The origin of our circular system, of
course, gets the label 0, and (by our choice) it has rectangular coordinates (1,
0). The fundamental question about circles in the plane is how to convert
between the two systems. There is absolutely no way to understand something
like a rolling ball without being able to go back and forth between rectangular
and circular reference systems.
Suppose we have a point somewhere on the circle. Let’s call its circular
coordinate s. Then the question is how exactly its rectangular coordinates, say
x and y, depend on s.
We know, from the way we set it up, that when s = 0, then x = 1 and y = 0.
We can even make a little chart of the four corners:

s: 0    π/2    π    3π/2
x: 1     0    −1     0
y: 0     1     0    −1
For other points, the correspondence is subtler. Consider, for instance, the
point halfway from the origin to the top of the circle. Its circular coordinate is,
of course, just π/4, one-eighth of the way around the full circle. But where is
that point in the up-and-down, side-to-side sense?
One way to see it is to make a little triangle. Since the angle of this right
triangle is one-eighth of a full turn (or 45 degrees), we know that the triangle
is half of a square. The long side of the triangle has length 1, since our unit
was chosen as the radius of the circle. So the two short sides must both be 1/√2
(the diagonal of a square being √2 times its side). Thus when s = π/4, we get x
= 1/√2 and y = 1/√2.
Alternatively, we could reason that since this is a right triangle of
hypotenuse 1, its legs are precisely what we called the sine and cosine of the
angle, which in this case is one-eighth of a turn.
Generally speaking, this is the best we can do. For a random point on the
circle, the only way to talk about its rectangular coordinates is via the sine and
cosine of the angle formed by this little right triangle.
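The half-square reasoning can be checked numerically. In this sketch (mine, not the book's), Python's cosine and sine happen to use exactly the radius-unit convention, so the circular coordinate π/4 can be fed to them directly:

```python
# A sketch (not from the book): the point one-eighth of the way around the
# circle. Its little right triangle is half a square with hypotenuse 1,
# so both legs should be 1/sqrt(2).
import math

s = math.pi / 4                 # circular coordinate of the halfway point
x, y = math.cos(s), math.sin(s)
print(x, y)                     # 0.7071... 0.7071...
print(1 / math.sqrt(2))         # 0.7071... -- the half-square prediction
```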
As our point swings around, the angle that it makes with the horizontal
increases from nothing to a full turn. When the angle, let’s call it A, is small,
then the horizontal and vertical coordinates of the point are simply cos A and
sin A, respectively. When the point passes the top of the circle (the π/2 mark),
the corresponding right triangle is now on the other side of the circle, and its
angle is not A anymore but the angle next to A. This is exactly the same thing
that happened to us when we were measuring triangles. We ended up deciding
that it would be most convenient to define the cosine of an angle A in this
range to be the exact negative of the cosine of the angle next to it. This is
lucky for us, because that is precisely what the horizontal coordinate of a
point in this range should be. It’s no coincidence that the two problems—
measuring the distance between two sticks at an angle and determining the
location of a point on a circle—should require the same choice of extension of
sine and cosine. We build mathematical objects to be beautiful, and beautiful
things, like crystals, have tremendous consistency: they follow patterns, and
they don’t like to have those patterns disrupted.
Similarly, the nicest extension of sine to angles in this range (between one-
quarter and one-half of a turn) is to have sin A be the same as the sine of the
angle next to A, not the negative of it. This choice allows the law of sines to
remain valid for large angles, as well as giving us the right vertical coordinate
for points in this quarter of the circle.
Really, what’s going on here is this: we have two problems, triangle
measurement and the comparison of circular and rectangular coordinate
systems. Well, they turn out to be the same. More precisely, the sine and
cosine of an angle are a special case of the circle problem—the case of
smallish angles. So we have an old definition of sine and cosine in terms of
right triangle proportions. Now we’re forging a new definition, and lucky for
us, it’s not conflicting with the old one. This is a recurring theme throughout
mathematics—the extension of a naïve concept to a wider and more general
context.
So the idea is to give a meaning to the sine and cosine of any angle
whatsoever. If the angle is small (between zero and one-quarter of a turn) then
we know what sine and cosine mean, namely the sides of the corresponding
right triangle of hypotenuse 1. Between one-quarter and one-half of a turn, we
look at the outside turn and its sine and cosine. Then the sine of our angle is
the same as the sine of the outside angle, and the cosine is the negative. In
both cases, the sine and cosine of our angle are just the rectangular
coordinates of the corresponding point on the circle. Naturally, the plan is to
define the sine and cosine of any angle in this way. So here we go: the cosine
of an angle is the horizontal coordinate of the point on the circle described by
that angle, and the sine is the vertical coordinate.
8
The situation is now this: for a point on the circle, with circular coordinate s
and rectangular coordinates x and y, we have
x = cos A,
y = sin A,
where A is the angle formed at the center of the circle by the point,
counterclockwise from the horizontal. (Of course, in some sense this is totally
content free; it’s really just a restatement of our failure to measure triangles
algebraically.) In any case, the whole issue now comes down to how this
angle A depends on s.
The number s represents a length—the length around the circle to our point
—and A is the corresponding angle. Traditionally, the relationship between
lengths and angles is somewhat strained. There is a lot of mistrust and
resentment, and also sines and cosines. But that’s really about angles and
straight lengths. The relationship between angles and circular lengths is a
whole different story. In fact, it’s about as simple as can be: they’re
proportional. A full turn corresponds to a complete circumference length, a
half turn to half a circumference, and so on. So depending on your choice of
length and angle units, the two will just be off by some factor. In particular, if
we measure length using the radius and angle using full turns (as we have
been), then the relationship is simply s = 2πA.
Right away we could end this discussion by saying that the conversion
between circular and rectangular systems is simply this:
x = cos (s/2π),
y = sin (s/2π).
And that’s that. If you have the circular coordinate s, all you have to do is
scale it down by 2π to convert it to an angle measured in full turns, then
convert the angle back to a pair of lengths x and y using sine and cosine. The
positive and negative signs are taken care of by our clever new definition of
sine and cosine. And so we get the rectangular coordinates.
The only thing that is a little obnoxious about this is that we have to
convert our arc length to an angle and then convert the angle back into a pair
of lengths. This is happening for two reasons. One is our choice of units—
we’re measuring angles in full turns. Of course, if we measured them in
degrees, it would be even worse; the conversion from arc length to angle
measurement would be A = (360/2π)s. The question is, what are the best units for
angle measurement? Should a full turn be thought of as 360 degrees, or one
full turn, or what? Of course, it doesn’t really matter; it’s just a question of
convenience. But convenience is a nice thing anyway. My feeling is that for
polygon measurement (e.g., when we were looking for possible tiling
patterns), measuring angles as portions of a full turn is simple and natural.
Now that we’re comparing circular and rectangular coordinate systems,
though, it seems a bit clunky. I don’t really like that 2π conversion factor.
The other thing that’s getting in our way is our interpretation of what it is
that sine and cosine do for a living. We’ve been thinking all along, and
naturally enough, that they convert angles into lengths—or more precisely,
ratios of lengths. This necessarily means that we have to go through angles
any time we want to measure circles or circular motions, and that just doesn’t
seem right.
So here’s my proposal. It’s rather modern, and it may seem strange and
arbitrary, but bear with me. First of all, we’re going to choose a new way of
measuring angles. A full turn will not be 360 something-or-others, nor will it
be our unit. A full turn will be 2π. That is, we’re going to use the circular
coordinate system itself to measure angles. So a right angle receives a
measurement of π/2.
Not that anything is really any different from before, just the units and the
attitude. The nice thing is that we can eliminate angles from the situation and
simply say that if s is the circular coordinate of a point on our circle, then its
rectangular coordinates are
x = cos s,
y = sin s.
And this makes complete sense for any number s whatsoever. Of course,
this is really just a restatement of our new definition of sine and cosine. I
suppose the real content of this is that there is no disagreement with any of
our prior interpretations. What sine and cosine do is convert circular
measurements into rectangular ones. They are the abstract mathematical
version of “putting a round peg into a square hole.”
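The two attitudes can be set side by side. In this sketch (mine, not the book's), Python's cos and sin already measure angles in the new unit (a full turn is 2π), so the clunky conversion factor cancels out and disappears:

```python
# A sketch (not from the book): converting a circular coordinate s into
# rectangular coordinates, the old way (through angles measured in full
# turns) and the new way (no angles in sight).
import math

def rect_via_turns(s):
    """Old style: scale s down by 2π to an angle in full turns, then
    convert the angle back into a pair of lengths."""
    A = s / (2 * math.pi)                  # angle in full turns
    return math.cos(2 * math.pi * A), math.sin(2 * math.pi * A)

def rect_direct(s):
    """New style: x = cos s, y = sin s."""
    return math.cos(s), math.sin(s)

s = math.pi / 2                            # top of the circle
print(rect_via_turns(s))                   # (about 0.0, 1.0)
print(rect_direct(s))                      # (about 0.0, 1.0)
```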
9
The simplest nonlinear motion I can think of is a point moving in a circular
path at a constant speed, usually referred to as uniform circular motion. To
describe such a motion, we would need, as always, to choose coordinate
systems for time and space—to build a clock and a map suitable for the
situation.
Naturally, the simplest choices would be a length unit equal to the radius of
the circle and a time unit chosen so that the speed of the point was equal to 1
(in other words, to choose our unit of time to be the amount of time it takes
the point to travel an arc length equal to 1 length unit). With these choices, the
description of the motion is as simple as can be: if s is the circular coordinate
and t is the time, then the motion is given simply by the pattern s = t.
Of course, if we are concerned with the relationship between our point and
some external object, say another point or line in the plane, we would prefer a
description of the motion from the plane’s point of view. We could do this by
choosing a rectangular system for the plane, with coordinates x and y say, and
describe the motion of the point in those terms. The simplest setup would be
what we had before, with the center of the circle as our origin, and so on. If
we orient our system so that the motion of the point is counterclockwise and
its initial position (at time t = 0) is the customary starting point x = 1, y = 0,
then the motion can be described by the set of relations s = t, x = cos s, y = sin
s. More simply, we could just write:
x = cos t,
y = sin t.
10
Now let’s try to describe the cycloid. This curve is traced out by a point on a
rolling circle, so what we need is a precise description of this motion. Where
exactly is the moving point at any given moment? Of course, the first job is to
design an appropriate coordinate system.
I like to choose the radius of the circle as my spatial unit, the line it’s
rolling on as my first direction (oriented in the direction that the disk is
rolling), and my origin (in both space and time) to be a moment when the
point is touching the line; that is, when the point has rolled completely
underneath the circle.
The only thing left is to choose the time unit. This is tantamount to
choosing the speed of the rolling. Of course, it makes no real difference; the
same curve will be traced out whether it rolls quickly or slowly. So we may as
well choose our units so that the speed is pretty. Let’s say that the speed is 1.
By that I mean that if you look at the disk in isolation, independent of the line
it’s rolling on, it rotates so that the moving point has constant unit speed along
the circle.
Actually, this idea of looking at the motion from different points of view is
extremely valuable. It usually goes by the name of relativity. A bug sitting
somewhere in the plane would see this motion as a point on a disk rolling on a
fixed line, whereas another bug who was riding on the disk (sitting at the
center, let’s say) would simply see the point rotating around it, with the line
speeding by.
The point is, neither is right or wrong; they’re both right from their own
point of view. The important thing is for them to be able to communicate with
each other. That is one thing that makes the vector approach to motion
representation very convenient: since positions are already described in terms
of shifts, it’s very easy to adjust to someone else’s perspective—we just add
on another shift!
Let me be as clear about this as I can (as if up until now I’ve been
purposely vague). Let’s look at the vector representation of our moving point;
that is, the shift that takes us from the origin to the point itself.
Of course, this vector is changing all the time and in a complicated way.
That’s the whole point; the cycloid motion is not so simple, and this vector is
getting longer and rotating up and down in a subtle way that we are trying to
describe precisely.
The idea of relativity is to try to find another perspective from which the
motion is simpler, for instance from the center of the circle.
Now we can view our vector (which describes the cycloid motion) as being
a sum of two simpler vectors, namely the one from the origin to the center of
the circle and the radial vector from there to the point itself. The motion of the
center is simple because there is no rotation, and the radial vector is simple
because it is purely rotation.
This is an extremely useful technique: a clever change of perspective
breaks down a complex motion into a sum of simpler motions. We can go
even further with this. Instead of watching the center from the point of view
of the origin, it’s a little nicer to watch it from a position one unit higher.
This means we’re breaking the vector to the center into a sum of two pieces
—a vector up one unit and a horizontal vector from that position to the center
of the circle. These are both simpler motions, since their directions don’t
change. In fact, the first vector doesn’t change at all.
So we have broken the relatively complicated motion of a point on a rolling
disk into three much simpler motions: a constant vector to get us up to the
level of the center of the disk, a purely horizontal vector from there to the
center itself, and then finally the vector from the center to the rotating point.
If we were starting from the right-hand side of the circle and traveling
counterclockwise, we would have exactly the situation we looked at before
with circular motion, namely the coordinates of the point would be cos t and
sin t. That is, the vector from the center of the circle to the point would simply
be (cos t) u1 + (sin t) u2. Since we’re starting at the bottom and moving
clockwise, this needs to be modified to (−sin t) u1 + (−cos t) u2.
Maybe the best way to see this is to think about the coordinates separately.
The horizontal position needs to begin at 0, decrease to −1, and then move
back to 0, up to 1, and back to 0 again. That’s exactly what sin t does—only
negated. So the horizontal coordinate is moving in the −sin t pattern; similarly
for the vertical, only with cosine. Alternatively, our point is moving in the
customary way from the perspective of someone standing on the other side of
the plane, looking sideways. This has the effect of reversing the coordinates
and negating them—more relativity, I suppose. In any case, we have a precise
description of the motion of the point from the point of view of the center.
Next we need to describe the motion of the center itself. The vertical part of
it is easy; it’s just u2. It’s the horizontal part that’s going to be a bit tricky.
Probably the simplest way to measure the horizontal motion is to let the circle
make one complete rotation.
Now because the disk is rolling (that is, it’s not slipping or skidding), the
full circumference is laid down horizontally. In other words, the distance
along the road that the disk travels is one circumference. This means that in
the amount of time it takes the point to make one full rotation (and thus travel
a distance of one circumference), the center moves the exact same distance
horizontally.
This means the horizontal speed of the center is the same as the speed of
the rotating point along the circle. But we chose our time unit so that this
speed is 1. Thus the horizontal speed of the center is also 1. Is that at all
understandable? This is by far the hardest part of the problem—interpreting
what “rolling” means exactly.
So that’s pretty. The center travels horizontally at exactly the same speed as
the point rotates. Algebraically, what this means is that the horizontal vector
is simply tu1. That is, it points in the positive horizontal direction, and its
length is always equal to t, since it starts off at zero and grows at a constant
unit rate.
Putting everything together, we get that the position vector p of our point is
given by
x = t − sin t,
y = 1 − cos t,
where as usual, x denotes the horizontal and y the vertical coordinate of the
point at any time t. This is a fairly nice description, given how complex the
motion appears to be.
Let’s test this out a bit. When t = 0, this says that x = 0 and y = 0. So that’s
good. It means that the point starts out at the origin, as planned. When t = 2π,
we get x = 2π, y = 0, which also agrees with what we decided before—that the
disk travels a distance of 2π after one full rotation.
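The three-vector decomposition can be assembled and tested directly. Here is a sketch (mine, not the book's) that builds the cycloid exactly as described: a constant vector up to hub level, a horizontal drift at unit speed, and the rotating radial vector:

```python
# A sketch (not from the book): the cycloid as a sum of three simple
# motions, giving x = t - sin t, y = 1 - cos t.
import math

def cycloid(t):
    up = (0.0, 1.0)                        # constant vector to hub level
    along = (t, 0.0)                       # the center drifts at unit speed
    radial = (-math.sin(t), -math.cos(t))  # rotation, starting at the bottom
    return (up[0] + along[0] + radial[0],
            up[1] + along[1] + radial[1])

print(cycloid(0))             # (0.0, 0.0) -- starts at the origin
print(cycloid(2 * math.pi))   # (6.283..., about 0.0) -- down after one turn
print(cycloid(math.pi))       # (3.141..., 2.0) -- the top of the arch
```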
11
So we have solved the description problem for the cycloid. We can now say
exactly where the moving point is at all times; namely, we have the precise
description
x = t − sin t,
y = 1 − cos t.
Like any map, the musical staff has an orientation (high notes toward the
top, low notes toward the bottom) and a unit (one step). The various clefs and
key signatures determine the origin of the system. A piece of music can then
be graphed in pitch-time. The horizontal direction measures time (the unit is
the beat, the origin being the start of the piece). The little black dots denote
the musical “events.” (I suppose we could consider loudness as another
dimension in our space of note points, so a piece of sheet music is really a
graph of a two-dimensional motion.)
So both composers and mathematicians construct coordinate systems
appropriate to their description problem and use a symbolic language to
encode the patterns. And just as a good violinist can glance at a line of sheet
music and hear the tune in her head, an experienced geometer can see and feel
the shapes and motions described by a system of equations (at least if they are
reasonably simple).
12
What we have been talking about is representation. Whenever one thing is
used to represent another, there are always interesting philosophical
consequences. For one thing, there is the question of exactly who is
representing whom. Is the sheet music a transcription of the sound, or is the
performance an enactment of the sheet music? Or is it that both the writing
and the playing are representations of the same abstract musical idea?
In our case, we have a geometric object (a shape or motion) and its
representation by a set of equations. But is it the shape that is the real thing
and the equations only a convenient algebraic encoding, or could we just as
easily view the equations (i.e., the number pattern) as the true object of
interest and the shape or motion as a mere visual or mechanical representation
of it?
Of course, we have known all along that pictures themselves are not very
useful description tools (their value is mostly psychological) and that when
we speak of a circle we are not really talking about a picture but rather a
linguistic pattern: the collection of points at a certain distance from a fixed
center. What Descartes realized is that any such verbal description that is
precise enough to specify a shape or motion exactly can be replaced by a
numerical pattern and represented as a set of equations. For instance, the
circle can be encoded (with our usual choice of coordinates) as x² + y² = 1.
On the other hand, any equation or set of equations involving any number
of varying numerical quantities (usually called variables) can be interpreted
as describing a shape or motion. That is, every numerical relationship has a
certain “look” to it. The relation b = 2a + 1 (which encodes the purely abstract
numerical information that the variable b is always one more than twice the
value of a) can, if we wish, be thought of as a line in two-dimensional space:
Or, if we prefer, we could view this as a space-time picture and imagine
that a is time and b is position. In this way, the relationship b = 2a + 1
becomes the record of a constant speed motion beginning at position 1 and
moving forward at a rate of 2.
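Here is a small Python sketch (the names are mine) of that space-time reading: stepping the time a forward one unit at a time, the position b starts at 1 and advances by 2 each step:

```python
def b_of(a):
    # The relation b = 2a + 1, read as: position b at time a
    return 2 * a + 1

positions = [b_of(a) for a in range(5)]
print(positions)  # [1, 3, 5, 7, 9]: starting at 1, advancing 2 per unit of time
```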
Ultimately, of course, it is neither shapes, motions, nor equations that are
the real object of study but patterns. If you choose to represent your pattern
geometrically or algebraically, that’s fine. Either way, it is the abstract pattern
relationship that you are really talking about.
What happens when we view shapes as mere visual representations of
number patterns? Well, for one thing we get a lot of new shapes! This so-
called coordinate geometry (initiated by the publication of Descartes’ La
Géométrie in 1637) not only provides a convenient solution to the description
problem—providing us with a uniform linguistic framework in which to
describe geometric patterns—but at the same time gives geometers an entirely
new way to construct shapes and motions. We now have almost unlimited
descriptive ability.
The question then becomes: which equations correspond to which shapes?
(Of course, I’m including motions here, since we can always think of a
motion as a curve in space-time.) What we need is a “dictionary” to help us
translate between geometric and algebraic descriptions. We could start with:
One of the most beautiful discoveries of this period (the early 1600s) was
that simple equations correspond to simple shapes. The simplest numerical
relationships are those that involve no multiplications among the variables,
only addition and scaling by constants. In two dimensions, these have the
form Ax + By = C and correspond to lines. (For this reason such equations are
often called linear.) In three dimensions we would have another variable, Ax
+ By + Cz = D. The picture (or graph) is now a plane in space.
The graph of any degree 2 equation in two variables, Ax² + Bxy + Cy² + Dx + Ey + F = 0, is always a conic section. That is, the class of curves we have been calling conic sections corresponds exactly to the set of degree 2 equations in two variables. In other words, the simplest nonlinear curves correspond to the simplest nonlinear equations. So we have another entry in our dictionary:
Consider, for instance, the equation y² = x³ + 1. It turns out that its graph is a new shape. It's not a circle or a conic or a cycloid or a spiral, or anything else we have a name for. It is "the graph of y² = x³ + 1," and that's the simplest description we're ever going to have. This is what I
meant by a lot of new shapes. Any numerical relationship you want to write
down will carve out some sort of shape, and all but the simplest few will be
absolutely brand-new. This is the expressive power of algebra—the moment
we put numbers on a line to make a map, we get this amazing wealth of new
shapes.
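If you would like to meet a few points of this new shape, here is a quick Python sketch (a sampler of my own devising, nothing canonical) that solves y² = x³ + 1 for y at a few values of x:

```python
import math

def points_on_curve(xs):
    """Sample points (x, y) satisfying y**2 = x**3 + 1 (real y only)."""
    pts = []
    for x in xs:
        s = x ** 3 + 1
        if s >= 0:                  # y**2 can't be negative
            y = math.sqrt(s)
            pts.append((x, y))
            if y != 0:
                pts.append((x, -y)) # the curve is symmetric about the x-axis
    return pts

print(points_on_curve([-1, 0, 2]))  # [(-1, 0.0), (0, 1.0), (0, -1.0), (2, 3.0), (2, -3.0)]
```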
What is the largest circle that can sit at the bottom of a
parabola?
13
Having solved the description problem for motions (at least in the sense of
having a universal language in which to describe them), it is clearly time to
start measuring.
What is there to measure about a motion? A coordinate description tells us
where it is when. Questions like, “where was it at such and such a time?” or
“what time was it when it was here?” can be answered directly from the
equations describing the motion. This would come down to some scrambling
and unscrambling of our numerical relationships—in other words, doing some
algebra. This could conceivably be quite unpleasant in practice, but it presents
no particularly deep philosophical problems.
Far more interesting are the questions: How fast was it going? How far did
it travel? These are clearly related, since how far you go depends very much
on how fast you go. So our first really interesting problem about motion is the
measurement of speed.
Suppose we have a moving point described by a set of coordinate
equations. How can we determine its speed? Since the entire motion is
completely and precisely specified by the numerical relationships (i.e., the
way the coordinates depend on time), the equations must somehow hold the
speed information within them. How do we get that information out?
Let’s start with the simplest possible situation: uniform linear motion in
one dimension. (I suppose the simplest situation would be no motion at all,
but that is not terribly exciting.) Here our motion can be described by a simple
equation like p = 3t + 2 (where as usual p is the position number and t is the
time coordinate). In this case, it is particularly easy to read off the speed
information: the point is traveling (forward) at a speed of 3 (space units per
time unit). In other words, the speed is simply the coefficient of time—the
factor by which t is being multiplied. Thus for any uniform linear motion p =
At + B, the initial position is B and the speed is A.
Of course, if this were the only object of interest, we could simply take the
path of the point as our universe and view the motion as being one
dimensional. But in general (which is the nicest way to work), we might
require a three-dimensional ambient space. There may be other moving points
in the picture, for instance.
What are the equations of such a motion? The simplest way to think about
it is to use a vector description.
As usual, let’s have p denote the position vector of our moving point. So p
depends on the time t in some way. As before, we can break this vector
(which grows and shrinks and turns in a subtle way as time goes by) into a
sum of simpler pieces. The first piece is the initial position vector; that is, the
vector pointing to the location of our point at time t = 0. This is often written
as p0 to denote that it is the value of p at time 0. The other piece is the vector
from the initial position to the current position. Notice that this vector points
in the direction that our point is heading, and since the motion has constant
speed, its length grows at a constant rate. This means that it must have the
form tv for some fixed vector v. Putting the pieces together, we get that every
uniform linear motion in space must have the form
p = p0 + tv.
At time t = 0, this says that p = p0, the initial position. As time goes by, the
position changes, so that every second (or whatever you want to call your
time unit) the position shifts by the vector v. This means that the vector v not
only holds the heading information but also the speed. In fact, the speed is
simply the length of v, since that is how much distance is traveled every
second. So we see that in higher dimensions, the speed and direction are most
nicely considered together as a vector. This vector is called the velocity of the
motion. (In the one-dimensional case, the velocity is just a single number; its
sign then carries the heading information.) Notice, as in the one-dimensional
setting, the velocity is easily read off from the equation as the (vector)
coefficient of time.
If we wish, we can always rewrite any vector description as a set of
equations in the coordinates, for example:
x = 3t + 2,
y = 2t – 1,
z = −t.
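A short Python sketch (the names are mine) shows how the velocity sits inside these equations as the coefficient of time:

```python
import math

def position(t):
    # The uniform linear motion x = 3t + 2, y = 2t - 1, z = -t
    return (3 * t + 2, 2 * t - 1, -t)

velocity = (3, 2, -1)  # the (vector) coefficient of time
speed = math.sqrt(sum(c * c for c in velocity))
print(speed)  # sqrt(14), about 3.742

# In one time unit the position shifts by exactly the velocity vector:
shift = tuple(b - a for a, b in zip(position(0), position(1)))
print(shift)  # (3, 2, -1)
```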
The real problem is that most motions aren’t uniform. In general, a moving
point does not keep a steady velocity; it speeds up and slows down and
changes its direction constantly. In other words, the velocity vector itself
depends on time.
The usual way to picture it is to imagine the velocity vector as an arrow
situated at each point along the path:
Here we have a motion in the plane, and the velocity arrows show the point
speeding up and then slowing down again. Notice that since the velocity
vector always points in the direction of motion, these arrows will always be
tangent to the path. I like to think of our point as a tiny moving car equipped
with a speedometer and a compass. At every moment they together indicate
the velocity (e.g., northwest at 40 mph).
So our fundamental problem is this: given a motion (that is, a description of
how the position vector varies with time), to determine its velocity (also a
vector varying with time). This is what the measurement of motion comes
down to: transforming one vector equation (position) into another (velocity).
14
We now know what we want to measure, but how do we go about measuring
it? We have a moving position vector p, and we want to determine the
corresponding velocity vector (usually denoted by ṗ). For example, if p = 2t –
1 is a one-dimensional motion, then ṗ = 2 is its (constant) velocity. In general,
of course, things are not so simple. The position vector moves around in a
complicated way, and it’s not at all obvious how we are going to use its
description to obtain the velocity information.
Let’s start with the one-dimensional situation and imagine a point moving
along a line in some complicated way. The space-time picture might look
something like this:
We saw before that for constant speed motion the velocity could be viewed
as the slantedness of the space-time curve (which was, of course, a straight
line). It was Isaac Newton’s insightful observation that this remains true for
any motion. More precisely, Newton recognized the steepness of the tangent
line at a point of the space-time curve to be a geometric representation of the
velocity at that precise instant. What a beautiful connection between shape
and motion! The seventeenth-century problem of velocity is the same as the
classical Greek problem of finding the tangent to a plane curve.
If you like, you can imagine that each point on the space-time curve carries
with it its tangent line. As the moving point speeds up, the tangent line gets
steeper, and as it slows down, the tangent line flattens out. If the point starts
backing up, the tangent line slants down. Notice that at the precise moment
that the point reverses direction, its velocity is exactly zero. The tangent line
is horizontal!
What this means is that at that precise instant the point is neither traveling
forward nor backward. When you throw a ball in the air, it goes up and then
comes down (so they say), but there is a split second there where it “hangs.”
(Of course, we are concerned with imaginary idealized motions. What really
happens with a ball is anyone’s guess!)
Of course, a line through our point and a nearby point of the curve does not have quite the right slantedness (that's what being an approximation means), but as we move the nearby point closer and closer to our point, the approximation gets better and better.
So we can get the true slantedness of the tangent line from these
approximating lines—provided there is some sort of pattern to their
slantedness. Naturally, the slantedness pattern will have to come from the
curve itself; that is, from its equation.
Of course, the Greek geometers knew that the tangent problem could be
approached in this manner; the new idea was to combine it with Descartes’
method of coordinates. In the case of a one-dimensional motion and its
associated space-time curve, what we are doing is applying the method of
exhaustion to time.
If we select a particular moment, which we might as well call “now,” we
can approximate the slantedness at that point (that is, the velocity) by
selecting slightly nearby moments—let’s say slightly later ones—and figuring
out the slant of the line connecting them (the approximate velocity). If we’re
lucky (and we often are), there will be a pattern to this slantedness as the
“later” point gets closer and closer to “now”—that is, as the elapsed time
shrinks to zero. If we’re clever (and we often are), we can read this pattern
and see where it’s heading. And that’s how we’ll get the exact velocity.
Does this all sound a bit farfetched? There are certainly a number of ways
this plan could go awry. What if we can’t figure out the approximate
velocities? What if they don’t have a pattern? What if they have a pattern, but
it is too hard for us to read?
It turns out that the first of these is no problem at all. The approximate velocity (or slantedness, if you prefer) is simply the ratio of position change to time change:

approximate velocity = (change in position)/(change in time).
So given two points whose time and position coordinates are known, it is a
relatively simple matter to calculate the slantedness of the line connecting
them. The tricky part is going to be figuring out where these approximations
are heading.
As an example, suppose we have the motion p = t² (the simplest
nonuniform motion). Let’s try to determine the velocity at the moment t = 1, p
= 1.
If we write p(t) (as has become customary) for the value of the position p at the time t, and we let a small amount of time o elapse, then the change in position is simply

p(1 + o) − p(1) = (1 + o)² − 1² = 2o + o²,

and the elapsed time is simply o itself. Thus, the approximate velocity would be

(2o + o²)/o.
Now the question is, where is this number heading as o approaches zero?
Notice that as o gets smaller and smaller, both the top and bottom of this
fraction approach zero. Essentially what’s happening is that we are trying to
calculate a certain slantedness using a sequence of ever-shrinking little
triangles:
Even though the triangles themselves are shrinking away to nothing, their
slantedness is not: it’s heading toward the truth, namely the velocity we are
after. The problem is how to tease that information out of our approximation
pattern. We can’t simply watch idly by as a fraction becomes 0/0. We need to
understand how it’s getting there. Is the numerator approaching zero twice as
fast as the denominator? Half as fast? Where is the proportion heading? To
paraphrase Newton, we want the ratio of the quantities not before they vanish,
nor afterward, but with which they vanish.
This is our first potential disaster. The fraction (2o + o²)/o becomes the meaningless 0/0 when o reaches zero, so let's simplify it while o is still nonzero:

(2o + o²)/o = 2 + o.

Now, that's more like it! Not only is 2 + o much simpler looking, but it's
also quite easy to see where it is heading, namely 2. In other words, the
instantaneous velocity at the precise moment when t = 1 is exactly 2. More
succinctly, we could write ṗ(1) = 2. So if a point is moving in the pattern p =
t² (with respect to a certain map and clock), then at time t = 1, it is moving
forward at a rate of two space units per time unit. Or, if you prefer, we can say
that the tangent line to the parabola p = t² at the point (1, 1) has a slant of 2.
At least in this very simple case, our plan has been completely successful.
In fact, we can calculate in the same way the velocity of this motion p = t² at any moment whatsoever. At time t, the approximate velocity is

((t + o)² − t²)/o = 2t + o,

and this clearly approaches 2t as o approaches zero. Thus we get ṗ(t) = 2t. The
velocity at any moment is simply twice the time (in agreement with our
intuition that the point should be speeding up). So here is our first nonobvious
fact about velocity:
p = t² → ṗ = 2t.
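We can also watch the approximations do their work numerically. A Python sketch (with o playing the role of the small elapsed time):

```python
def p(t):
    return t * t  # the motion p = t^2

t = 1.0
for o in [0.1, 0.01, 0.001]:
    approx = (p(t + o) - p(t)) / o  # algebraically this is 2 + o
    print(o, approx)                # heads toward the true velocity, 2
```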
On the face of it, it would seem that we got very lucky: we were able to
rearrange the approximations in a way that allowed us to see what they were
up to. Does the ability to calculate velocities necessarily come down to a
question of algebraic skill?
In general, for any one-dimensional motion p(t) (regardless of how complex the dependence on time), we can say that as o approaches zero,

(p(t + o) − p(t))/o → ṗ(t).

This gives us a systematic way to calculate the velocity pattern ṗ(t) from
the motion pattern p(t) itself. The only question is whether we are clever
enough to tell where the approximations are heading.
15
Let’s step back from the details for a minute and think about exactly what we
are doing. As usual, we have three equivalent ways to view the situation. The
geometric view is that the objects we are interested in are curves, and we want
to measure their slantedness and how it changes. The kinetic view is that we
have a motion and we want to calculate its velocity at all times. More
abstractly, we can regard our problem as taking one number pattern (that
which describes the curve or motion in some coordinate system) and from it
deriving another pattern (that of the slantedness or speed). For this reason, the
second pattern is usually called the derivative of the first.
Suppose we graph our motion pattern in space-time:
We can then make a new graph by plotting the velocity at each time:
Notice that the vertical scales in these two pictures are completely different.
The first is a coordinate map of the one-dimensional space in which the point
moves, whereas the second graph is plotted on a scale of possible rates, a
very different thing entirely. We could say that the first picture is a curve in
space-time and the second is a curve in rate-time (so in particular the units are
quite different).
Our velocity project then comes down to transforming the first picture into
the second. Qualitatively, we can see that since the derivative picture records
the slantedness, it will have large values where the original curve is steep,
small values where it flattens out, and negative values where it slants down.
To say anything more precise, we need a way to take a number pattern p and
produce its derivative number pattern ṗ. In the abstract, we have just solved this problem: namely, ṗ(t) is precisely the number that is approached by

(p(t + o) − p(t))/o

as o gets closer to zero. The only question is whether we can always get an
explicit description of how ṗ depends on time. For instance, we saw that when
p = t², we could actually calculate ṗ = 2t.
The abstract viewpoint allows us to shed any geometric or mechanical
prejudices and simply view our problem as the study of the transformation p
→ ṗ. What does this transformation look like algebraically? How does it
behave? As p gets more complicated (that is, the way it depends on t gets
more algebraically involved), presumably ṗ does also. But precisely how?
Here are a few things we do know:
If p is constant, then ṗ = 0.
If p = ct for some constant c, then ṗ = c.
If p = t², then ṗ = 2t.
What about a motion like p = t² + 3t − 4? Running it through the same approximation procedure as before, we find

p = t² + 3t − 4 → ṗ = 2t + 3.
Notice that this is exactly what we would have gotten if we had simply
“dotted” each piece of p separately. That is, if we had thought of p as a sum of
three pieces: t², 3t, and −4. Then dotting each piece gives us the correct total.
This means that dotting is a very well behaved operation. An algebraist would
say that it “respects addition,” meaning that if you have a motion of the form
p = a + b, where a and b are themselves motion patterns (variables which
depend on t in some way), then the simple and beautiful truth is that
p = a + b → ṗ = ȧ + ḃ.
In other words, the velocity of a sum is the sum of the velocities. Of course,
we can’t assume that this is always true just because it happened to work for
the one special case we just looked at. But it is in fact universally valid, and
it’s not hard to see why. The reason is that if p = a + b, then for any time t,
In particular,
In other words, the amount p moves in a short time interval is the sum of
how much a moves and how much b moves. Dividing by the elapsed time, we get the approximate velocity relationship

(p(t + o) − p(t))/o = (a(t + o) − a(t))/o + (b(t + o) − b(t))/o.
Letting o approach zero, we see that the left-hand side approaches ṗ and the
right-hand side approaches ȧ + ḃ, so they must be equal. And, of course, the
same goes for any number of pieces, so we have

p = a + b + c + · · · → ṗ = ȧ + ḃ + ċ + · · · .
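A quick numerical check of this addition law, sketched in Python (the helper approx_dot is my own name for the difference quotient):

```python
def approx_dot(f, t, o=1e-6):
    # Approximate velocity: change in position over elapsed time
    return (f(t + o) - f(t)) / o

a = lambda t: t * t        # a-dot should be 2t
b = lambda t: 3 * t - 4    # b-dot should be 3
p = lambda t: a(t) + b(t)  # p = a + b

moment = 2.0
whole = approx_dot(p, moment)
pieces = approx_dot(a, moment) + approx_dot(b, moment)
print(whole, pieces)  # both close to 2t + 3 = 7
```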
Let’s try the same idea as before, letting a small amount of time o elapse.
The position vector then changes from its current value p(t) to the nearby
vector p(t + o).
The difference p(t + o) – p(t) is then also a vector, namely the shift from the
current position to the slightly later position. This vector is tiny, but points
very nearly in the direction of motion. In other words, its heading is very
close to that of the true velocity at time t. As for its length, it is, of course,
approaching zero, but when divided by o it should give a good approximation
to the speed, since (at least for small values of o) the length of p(t + o) – p(t)
is pretty much the same as the distance traveled by the point during that small
interval of time. (Note that division by o doesn’t change the direction, only
the length.)
Intuitively, one can imagine that each of the parts of the sum is
compelling the point to travel in a certain direction at a certain speed, tugging
on it, as it were, in its own way, and the resulting motion is the effect of these
separate tugs acting simultaneously. For example, we can view a helical
motion as a sum of a rotational (uniform circular) motion together with a
linear motion.
The linear motion is pulling the point forward at a certain speed, and the
circular motion is pushing it around the circle.
The combined effect of these two (their vector sum) is then the actual
velocity of the helical motion.
It is this addition law of velocities that makes mechanical relativity—the
breaking down of motions into sums of simpler motions—such a useful idea.
If the velocity of a compound motion could not be easily recovered from the
velocities of its separate pieces, there would not be so much value to the
breakdown in the first place.
What we have now is a reduction strategy. If we want to understand a
complex motion, we can look for ways to break it up into simple parts and
then study the parts separately. The good news is that (at least in the case of
velocity) we can easily reassemble the information piece by piece.
16
Now let’s see if we can use these ideas to find the velocity of the cycloid
motion. We have already broken this motion down into a sum of three pieces:
a constant vector, a uniform linear motion, and a uniform circular motion.
The velocity vector always points directly along the circle, so it must be
perpendicular to the radial position vector. Since we chose our units so that
the radius and the speed are both equal to 1, these two vectors are both of unit
length. The radial vector starts its journey at the bottom of the circle (that is, it
is equal to −u2 at time t = 0) and rotates clockwise, so we found it to have the
coordinate description (−sin t)u1 + (−cos t)u2, which for simplicity we could
write as (−sin t, −cos t). What are the coordinates of the velocity vector?
Notice that when two vectors in the plane are perpendicular, they both form
the same little right triangle—only one’s up is the other one’s across (and the
orientation gets flipped). More precisely, if a vector has coordinates (x, y) and
we rotate it one-quarter of a turn clockwise (that is, from the second direction
toward the first), the new coordinates will be (y, −x).
This means that the velocity vector of our (clockwise, starting at the
bottom) uniform circular motion must have coordinates (−cos t, sin t). We
could also see this by observing that the velocity vector itself is undergoing
uniform circular motion, beginning at (−1, 0) and proceeding clockwise. In
any case, we can now assemble the pieces:

p = u2 + tu1 + (−sin t)u1 + (−cos t)u2,

and therefore

ṗ = u1 + (−cos t)u1 + (sin t)u2,

which we could also write as (1 − cos t, sin t) for short. So we now know the
exact velocity of our moving point at all times. In particular, its speed at time
t is given by

√((1 − cos t)² + sin²t) = √(2 − 2 cos t).

Here I'm making use of the customary abbreviation sin²t in place of the more cumbersome (sin t)².
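A Python spot check (the names are mine) that the velocity (1 − cos t, sin t) behaves as claimed:

```python
import math

def velocity(t):
    # The cycloid velocity (1 - cos t, sin t)
    return (1 - math.cos(t), math.sin(t))

def speed(t):
    vx, vy = velocity(t)
    return math.sqrt(vx * vx + vy * vy)

t = 1.3  # an arbitrary moment
print(speed(t), math.sqrt(2 - 2 * math.cos(t)))  # the two expressions agree

print(speed(math.pi))  # 2.0: the top of the disk, moving forward at rate 2
print(speed(0))        # 0.0: the moment the point "hangs"
```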
Let’s take a look at some specific moments in the history of this motion.
At time t = π, when the point has reached the top of the rolling disk, the
position vector p = (π, 2) and the velocity vector (according to our formula) is
ṗ = (2, 0), meaning that our point is moving directly forward at a rate of 2,
twice the speed at which the center of the disk is traveling. Notice also that at
time t = 0 (and again at times 2π, 4π, etc.), the velocity vector is 0. These are
the moments when the motion of the point reverses direction and “hangs.”
Finally, at time t = π/3 (one-sixth of the way through the first rotation), we have ṗ = (1/2, √3/2), which means our point is heading forward and up at an angle of 60 degrees:
The circular velocity (as we have seen) is perpendicular to the radial vector,
and the linear velocity adds to this a horizontal shift. Both of these vectors
have length equal to the radius by our choices. Our velocity vector is just the
sum of these two vectors, so they make a little triangle.
Now here’s the clever observation: if we rotate this triangle 90 degrees
clockwise, then the circular velocity becomes a radius and the horizontal push
turns into a downward vertical vector.
The velocity vector has become a so-called chord of the circle, connecting
our moving point to the point where the circle touches the ground. In other
words, we can see the velocity as simply being a rotated version of the chord.
The upshot of this calculation is that the speed of the cycloid motion is
equal to the length of the chord. Which is not to say that such information
could not be obtained by other means. The cycloid, for instance, is simple
enough that one does not require vector or coordinate descriptions at all, as
long as one is sufficiently clever. (In fact, the measurement of the area of a
cycloid preceded Descartes’ work by several years.)
The point is not that these techniques—vectors and coordinates, relativity,
exhaustion—are always necessary (although they often are), but that they are
so wonderfully general and require no particular inspiration or genius on the
part of the user. That is, we have a uniform way to treat geometric and
mechanical problems. Of course, there will be occasions when a simpler or
more symmetrical approach is possible, but these tend to be rather ad hoc and
special, though undeniably quite beautiful and imaginative.
In general, we can break any planar motion into coordinates, writing

p = x u1 + y u2,
where x and y are the separate horizontal and vertical components, which,
since they depend on time in some way, can be viewed as one-dimensional
motions in their own right. Then our addition law tells us that
ṗ = ẋu1 + ẏu2.
In particular, the speed of the moving point (being the length of this vector)
is just the Pythagorean combination of the separate one-dimensional speeds,
namely

√(ẋ² + ẏ²).
Then what we are saying is that not only is the position of the point simply
the pair of positions of the dials, but the velocity of the point is also just the
pair of velocities. So if at some instant the horizontal knob is moving at a rate
of 3 and the vertical knob at a rate of 4, then the point itself has a speed of 5 at
that moment and is moving in the “over 3, up 4” direction. Of course, this
also works in three dimensions or higher, the only difference being one of
visualization (and having more knobs). So in general, for a motion in any
dimension whatsoever,
p = (x, y, z, . . .) → ṗ = (ẋ, ẏ, ż, . . .).
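The "over 3, up 4" example takes a line or two of Python to confirm:

```python
import math

# Component rates at some instant: horizontal knob moving at 3, vertical at 4
velocity = (3.0, 4.0)
speed = math.hypot(*velocity)  # the Pythagorean combination of the two rates
print(speed)  # 5.0
```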
For uniform circular motion (assuming the standard choices), the horizontal and vertical components of the motion are just cos t and sin t respectively. That is, we could think of uniform circular motion as the pair of one-dimensional motions
x = cos t,
y = sin t.
Notice that these patterns are just shifted versions of each other; the cosine
of a number is always equal to the sine of a number π/2 greater.
Since the velocity vector of a uniform circular motion is just the position vector rotated a quarter turn, we find that

x = cos t → ẋ = −sin t,
y = sin t → ẏ = cos t.
So the slantedness of the sine wave at any time is just the height of the
cosine wave at that moment and vice versa (with the added twist of the
negative sign). The sine and cosine patterns form a very incestuous pair—
each one is (essentially) the derivative of the other. By the way, the annoying
negative sign is unavoidable; if we changed our conventions regarding
orientation, it would still be there, only in a different place.
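A Python sketch (reusing the difference-quotient idea, with names of my own) confirms the pair numerically:

```python
import math

def approx_dot(f, t, o=1e-6):
    # Approximate velocity: change over a short elapsed time o
    return (f(t + o) - f(t)) / o

t = 0.7  # an arbitrary moment
print(approx_dot(math.sin, t), math.cos(t))   # slantedness of sine ~ height of cosine
print(approx_dot(math.cos, t), -math.sin(t))  # slantedness of cosine ~ minus height of sine
```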
17
At the risk of being redundant (a risk I seem to be quite willing to take), I
want to say a few more words about the philosophy we have adopted. The
idea is to subsume the study of shape and motion into the larger, more abstract
world of numerical variables and relationships. This viewpoint not only has
the benefit of simplicity (there are no units to worry about and we don’t need
to be able to visualize anything) but also tremendous flexibility and
generality.
In fact, it would be hard to find any scientist, architect, or engineer who is
not in some way engaged in the process of modeling—creating an abstract,
simplified representation of their problem by a set of variables and equations
(e.g., a biologist’s model of mammalian territorial behavior, a cardiologist’s
model of vascular pressure, or an electrical engineer’s model of energy
capacity). Of course, in cases like these, there is a considerable difference
between the real object of interest (i.e., nature) and a mathematical model of
it. This is pretty much what scientists spend their time worrying about—the
aptness of their mathematical models of reality. For example, when new
experiments are performed, or new data is collected, it often leads to the
rejection of the current model and its replacement by an updated version.
The situation is quite different for mathematicians: for us, the mathematical
model is the object of study! There is nothing empirical here; we are not
awaiting any confirmation or test results. A mathematical structure is what it
is, and anything we discover about it is the truth. In particular, if we choose to
model an imaginary curve or motion by a set of equations, we are not making
any guesses or losing any information through oversimplification: our objects
are already (for aesthetic reasons) as simple as they can be. There is no
possibility of conflating reality and imagination if everything is imaginary in
the first place.
So henceforth, our objects of study will be systems of variables (which
Newton called fluents, Latin for “that which flows”) and relations; that is,
equations expressing the relationships among the variables.
Sometimes I like to think of my variables as being sliders on an imaginary
multichannel mixing board:
Here we have a sum of a uniform circular motion (with all the standard
choices) and a uniform linear motion that, rather than being perpendicular to
the rotating disk (as in a conventional helix), is tilted at a 45-degree angle:
x = cos t + t,
y = sin t,
z = t.
The velocity components are then (1 − sin t, cos t, 1), so the speed at time t
is √(3 − 2 sin t). This tells us that our point is speeding up and slowing down
in a fairly subtle way, and since sin t varies between −1 and +1, the speed
ranges from 1 to √5.
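As a numerical sanity check (nothing more), here is a sketch in Python. The particular equations x = cos t + t, y = sin t, z = t are my assumption for the tilted-helix motion, chosen to be consistent with the speed range just described.

```python
import math

# Assumed equations for the tilted-helix motion (my reading, not the text's):
#   x = cos t + t,  y = sin t,  z = t
def position(t):
    return (math.cos(t) + t, math.sin(t), t)

def speed(t, h=1e-6):
    # estimate speed as (distance traveled) / (time elapsed) over a tiny interval
    return math.dist(position(t - h), position(t + h)) / (2 * h)

def exact_speed(t):
    # the differential calculus predicts speed = sqrt(3 - 2 sin t)
    return math.sqrt(3 - 2 * math.sin(t))

slowest = exact_speed(math.pi / 2)    # sin t = +1 gives speed 1
fastest = exact_speed(-math.pi / 2)   # sin t = -1 gives speed sqrt(5)
```

The numerical estimate and the exact formula agree to many decimal places at any sample moment.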
The point being that these measurements come directly from the abstract
numerical relationships and not from any visual or kinetic image. The model
doesn’t know, or need to know, what it is a model of (if anything). Our project
has subtly shifted (if my constant harping on this point can be called subtle)
from being the study of velocities of motions to the study of the derivative—
the abstract transformation by which a variable p produces a new variable ṗ,
which Newton referred to as the fluxion of the fluent p.
This immediately suggests the possibility of double dotting—viewing ṗ
itself as a variable, which could then be dotted to produce p̈, and even three
dots, and so on. If we interpret p as a motion (that is, as the position of a
moving point), then p̈ would measure the rate at which the velocity ṗ changes;
in other words, the acceleration. Geometrically, p̈ could be seen as a way to
measure the rate at which the slantedness of a curve changes—what a
geometer would call its curvature. As mathematicians, of course, we are free
to make either interpretation, or neither. We can simply speak of higher
derivatives in the abstract and then study their interesting properties.
For example, as we take more and more derivatives, the squaring function
(p = t2) transforms into doubling (ṗ = 2t), which in turn becomes constant (p̈ =
2) and finally zero for all higher derivatives.
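For readers who like to experiment, this tower of derivatives can be watched numerically. The following Python sketch (the function names are mine) approximates each dotting by a small central difference:

```python
H = 1e-4  # a small increment for approximating instantaneous rates

def derivative(f):
    # approximate the derivative of f by a central difference
    return lambda t: (f(t + H) - f(t - H)) / (2 * H)

p = lambda t: t * t           # the squaring function p = t^2
p_dot = derivative(p)         # behaves like the doubling function 2t
p_ddot = derivative(p_dot)    # behaves like the constant 2
p_dddot = derivative(p_ddot)  # behaves like 0, as do all higher derivatives
```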
18
Before we develop these ideas any further, I want to show you an even more
general and abstract approach that I vastly prefer. Maybe the best way to start
is with an analogy. We’ve seen many times that measurements are always
relative and that any well-posed measurement question always (at least
implicitly) comes down to a comparison of some sort. For example, if we
wanted to measure a certain length or area, we would be asking about the
extent of a line or the space enclosed by a region, measured in comparison to
some other object of the same kind. We could, of course, choose some
standard of comparison, such as a certain fixed square whose side length and
area we could take to be our units of measurement. Any new object could
then be compared with this standard and measured against it.
To me, this is a repugnant idea. I don’t want any unnecessary and contrived
units cluttering up my beautiful imaginary universe. If I want to measure the
diagonal of a pentagon compared to its side, I don’t need to measure both
with respect to some preexisting standard length and then compare the two; I
can compare them directly to each other. (I know I’ve talked about this a
million times, but bear with me.)
The way I like to think of it is that lines have length or extent whether or
not we measure it. A region encloses space whether or not I choose to
compare it with anything else. So length and area aren’t numbers, they are
abstract geometric quantities. Only when we compare them and form ratios
do we obtain numerical values. The diagonal of a square has a length, and so
does the side. Neither of these is a number, but nevertheless one is exactly √2
times the other.
If this all sounds like I’m flogging a dead horse, the point is that we have
been unconsciously putting this same kind of arbitrary and unnecessary
obstacle in our way when we measure velocity. If you watch a cheetah
running, it has a speed or a rate independent of any measuring, just as a circle
encloses an amount of space. This rate is not a number and there are no units,
and yet, if a horse were running alongside it (so now I’m flogging a live
horse?), we could tell that the cheetah was going twice as fast. That is, we
could wait a certain amount of time (no need to measure it in seconds or
anything) and see how far the two animals traveled (also no need to measure
in any units) and compare the distances to each other. So the abstract,
unmeasured rate of something is meaningful. What we have been doing up to
now is choosing a standard unit of speed—namely, the speed of our clock!
That is, the rate of time itself has been our (soon-to-be-discarded) unit of
measurement.
Let’s imagine that we have two time-dependent variables a and b, related to
each other by some equation.
If we were interested in the relative speeds of a and b at a particular
moment, we could, of course, calculate ȧ and ḃ and form their ratio ȧ/ḃ. But this is
every bit as unnecessary (and aesthetically appalling) as the geometric
examples I mentioned. We shouldn’t need to involve time at all.
Suppose, for instance, that only the a and b sliders were accessible to us,
and the t slider was hidden behind the scenes somewhere. We could still give
the mixing board a little kick, and the sliders would each move slightly.
As usual, we would obtain the small variations Δa and Δb. Only now,
instead of comparing them both to Δt, we simply compare Δa and Δb directly.
Then as these small differences both approach zero, we get the true proportion
of their instantaneous velocities.
Does this make sense? To make this easier to talk about, let me introduce
some notation—this is, after all, the whole point of notation. Let’s write dx
for “the instantaneous rate of change of the variable x.” That is, dx is the
abstract, nonnumerical velocity, analogous to “cheetah speed.” (This notation
was first introduced by Leibniz in the 1670s.) Then what we are saying is that
the proportion of small changes Δa : Δb approaches the true velocity
proportion da : db as these tiny increments simultaneously vanish.
In terms of our new Leibnizian notation, the fluxion ẋ is simply the ratio
dx/dt. This means that all of our results concerning fluxions can be rephrased
easily in this new abstract language. For instance, our previous computation
that the fluxion of t2 is 2t becomes
d(t2) = 2t dt.
This is not a special statement about time; this is true for any variable
whatever. So (using w for “whatever”) we have
d(w2) = 2w dw,
and this says that “the rate at which the variable w2 changes is always exactly
twice the current value of w times as fast as the rate of w itself.” Note the
economy of the notation; we don’t need to give names to the patterns and then
dot the names, we can just d the patterns directly. Thus we also have, for instance,
d(sin w) = cos w dw and d(w3) = 3w2 dw.
I want to make two things perfectly clear. The first is that dx (the so-called
differential of x) is not a number; it is an abstract rate. Cheetah speed is not a
number and neither is horse speed, but we can still say one is twice the other
(as with lengths, areas, and all other measurements). The other is that this d
we are using (the Leibniz d-operator) is not a number either. When we write
dx we are not multiplying d by x, we are applying the d-operator to the
variable x to obtain the differential of x. The notational ambiguity is slightly
annoying I admit, but as long as we are reasonably careful (which would
include not choosing d as one of our variable names!), it’s not really much of
a problem. On the contrary, Leibniz’s notation is extremely flexible and
convenient once you get used to it.
Generally speaking, most of our measurement problems will come down to
finding the relative velocities of a set of variables. Whether time is included
among them is up to you and depends on the specific problem at hand. If you
were interested in a particular motion, then perhaps it would make sense to
think of time as one of your variables and all the others dependent on it. A
purely geometric question, on the other hand, has no need for any ticking
clocks.
Suppose we have a set of variables a, b, and c connected to each other by a
set of equations:
a2 = b2 + 3,
c = 2a + b.
Notice that in this example none of the variables is special. None of them
plays the role of time—there is no “master” slider that controls the others.
Instead we have an interdependence among the variables. At any given
moment, the variables will have certain values, as well as differentials (i.e.,
their instantaneous rates of change at that moment). The question is, how
exactly do the relationships among the variables control the relative
proportions of their differentials? How can we take the information
a2 = b2 + 3,
c = 2a + b,
and convert it into information about da, db, and dc? Simple: we apply the
d-operator to both sides of each equation. After all, if two variables are always
equal, their rates must also be equal.
Expanding these accordingly, we obtain the differential equations
2a da = 2b db,
dc = 2 da + db.
Suppose now that at a certain moment we happen to have a = 2 and b = 1
(and hence c = 5). At that instant, the differential equations read
4 da = 2 db,
dc = 2 da + db.
Thus at that precise instant, b is moving twice as fast as a, and c four times
as fast. In other words, the ratio da : db : dc is 1 : 2 : 4. We now have a simple
and direct method for solving any problem concerning relative rates of change
—just d everything!
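Here is a small numerical experiment (a sketch only; the moment a = 2, b = 1, c = 5 matches the example above): nudge a slightly, let b and c respond through the equations, and watch the ratios of the changes approach 1 : 2 : 4.

```python
import math

h = 1e-8                              # a tiny kick to the a slider
b = lambda a: math.sqrt(a * a - 3)    # solve a^2 = b^2 + 3 for b
c = lambda a: 2 * a + b(a)            # c = 2a + b

a0, a1 = 2.0, 2.0 + h
da = a1 - a0
db = b(a1) - b(a0)
dc = c(a1) - c(a0)

ratio_ba = db / da   # should approach 2
ratio_ca = dc / da   # should approach 4
```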
Incidentally, Leibniz’s original interpretation was somewhat different. His
view was that dx, rather than representing the instantaneous rate at which x
changes, is instead an infinitesimal change in x itself. That is, as Δx shrinks
away to nothing, it sort of “hovers” at the value dx, which, though not exactly
zero, is nevertheless smaller than any positive quantity. (Imagine what his
critics had to say about that!) Actually, there is no real problem with this view,
as long as you are sufficiently careful. After all, in a small interval of time, the
proportion of two velocities is the same as the proportion of the distances
traveled. The point is that the approximation Δa : Δb approaches the true
proportion da : db. We may interpret it however we wish.
19
The problem of velocity can now be reduced to the study of the Leibniz d-
operator. Given any set of equations which describe a motion, we can simply
d them to obtain the relative rate information. The only remaining problem is
to determine exactly how the d-operator behaves. How exactly does an
interdependence among variables get transformed into a relationship among
differentials?
We saw before that if a and b are two variables and we form a new variable
c = 2a + b, then the rate at which c changes can be easily determined from the
rates of a and b:
dc = d(2a + b)
= 2 da + db.
Here we have used the fact that d behaves linearly; that is, for any variables
x and y, and any constant c, we have
d(x + y) = dx + dy,
d(cx) = c dx.
But what if the relationships among the variables are more complicated?
Suppose, for example, we wanted to compare the rates of x and y, where y =
x3 sin x? We can certainly say that dy = d(x3 sin x), but in order to relate this
to dx itself, we need to understand more about the d-ing process. In particular,
we need to know how d acts on products of variables. How exactly does d(ab)
depend on da and db? This is where our study of motion has taken us—we’re
now asking questions about the abstract behavior of a differential operator.
How does d act on square roots? On division? Any operation that can be
performed on a number or set of numbers could conceivably be used to
describe an interdependence among variables, and to understand their relative
rates of change, we would need to know how d behaves when confronted with
such an operation. Of course, many operations (such as the x3 sin x example
above) can be viewed as having been built up from simpler ones (e.g., x3 sin x
is x3 times sin x), so that if we can figure out the behavior of d for a few
simple operations (in particular, multiplication), we can hopefully deal with
more complex combinations of them.
So let’s try to determine d(xy) in terms of dx and dy. We’ll do this “by
hand” as it were, imagining that x and y somehow depend on t (which we can
think of as time if we want to), and then we’ll see what happens as we vary t a
little bit. Essentially, what this does is to choose dt as our unit of speed. This
is analogous to the situation in geometry where we choose an arbitrary unit,
use it to make measurements, and then discard it once we have discovered the
correct relationships. (It’s not unlike the scaffolding used during the
construction of a building. It is temporarily quite helpful, but it is ultimately
removed.)
So imagine that t changes a little, let’s say by an amount Δt. Then x and y
react, becoming x + Δx and y + Δy respectively. The amount of change in xy is
then
Δ(xy) = (x + Δx)(y + Δy) − xy = x Δy + y Δx + Δx Δy,
where the last term has been reorganized for the sake of symmetry. Letting Δt
approach zero, we see that the last term shrinks away to nothing, and we get
d(xy) = x dy + y dx.
We see that for small changes in x and y we get a change in area equal to
that of an L-shaped sliver, so we again get
Δ(xy) = x Δy + y Δx + Δx Δy.
For tiny increments Δx and Δy, the final term is of much smaller magnitude
than the other terms. Both Newton and Leibniz recognized that this term is
comparatively negligible and ultimately makes no contribution to the velocity.
(Berkeley was less than convinced.) A more modern explanation (which,
ironically, is practically the same one that Archimedes or Eudoxus would
have given) is that Δx · Δy is never equal to zero, but that its proportion to the
other terms approaches zero (the first two terms are on the order of Δt,
whereas the last term is more like (Δt)2 in magnitude). So in fact, we are
justified in replacing all terms of the form Δw with the corresponding
differential dw, so long as we omit all “higher-order” terms involving
products of Δ’s. Thus we again obtain Leibniz’s beautiful formula,
d(xy) = x dy + y dx.
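The product formula can also be checked numerically. In this Python sketch, the sample motions x = t² and y = sin t are my own arbitrary choices:

```python
import math

t, dt = 0.7, 1e-7                 # a sample moment and a tiny time change
x = lambda u: u * u               # an arbitrary sample motion for x
y = lambda u: math.sin(u)         # an arbitrary sample motion for y

dx = x(t + dt) - x(t)             # small change in x
dy = y(t + dt) - y(t)             # small change in y
change_in_xy = x(t + dt) * y(t + dt) - x(t) * y(t)
leibniz = x(t) * dy + y(t) * dx   # Leibniz's x dy + y dx

# change_in_xy and leibniz differ only by the negligible corner term dx * dy
```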
Now let’s examine some of its consequences. First of all, notice that our
previous result d(w2) = 2w dw follows immediately from our product formula:
d(w2) = d(w · w)
= w dw + w dw = 2w dw.
Now we can calculate things like d(x3 sin x). From our earlier work, we
know that d(sin x) = cos x dx, so our product formula gives us
d(x3 sin x) = x3 d(sin x) + sin x d(x3)
= (x3 cos x + 3x2 sin x) dx.
So, for example, when x = π, the variable x3 sin x is traveling exactly –π3
times as fast as x is (i.e., π3 times as fast in the opposite direction). It’s pretty
amazing that we have access to information like that at all, let alone that we
can get at it so easily (if you call the development of an entire theory of
mathematical motion over a period of twenty centuries easy).
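If you want to see the −π³ with your own eyes, here is a quick numerical check (a sketch of mine, not part of the text's argument):

```python
import math

h = 1e-6
f = lambda x: x ** 3 * math.sin(x)

# rate of x^3 sin x relative to x, estimated near x = pi
rate = (f(math.pi + h) - f(math.pi - h)) / (2 * h)
# the differential calculus predicts exactly -pi^3 here
```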
As a further consequence of Leibniz’s rule, we can easily obtain a formula
for the differential of the reciprocal of a variable, d(1/w). The simplest way to
proceed is to go back to the very definition of reciprocal, namely
w · (1/w) = 1.
Applying d to both sides (the differential of the constant 1 being zero), the
product formula gives
w d(1/w) + (1/w) dw = 0,
so that
d(1/w) = −dw/w2.
Now we know the precise rate at which 1/w changes, depending on how w
itself is varying.
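A one-line numerical check of the reciprocal formula (the sample value is mine):

```python
w, dw = 2.5, 1e-7                  # a sample value and a tiny change
change = 1 / (w + dw) - 1 / w      # actual change in 1/w
predicted = -dw / w ** 2           # the formula d(1/w) = -dw / w^2
```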
20
At this point we have compiled a fairly extensive library of facts about the
Leibniz d-operator. Here is a summary of what we know so far:
Constants: dc = 0 for any constant c.
Sums: d(x + y) = dx + dy.
Constant multiples: d(cx) = c dx.
Products: d(xy) = x dy + y dx.
Powers: d(w2) = 2w dw, d(w3) = 3w2 dw, and so on.
Sines and cosines: d(sin x) = cos x dx, d(cos x) = −sin x dx.
Quotients: d(x/y) = (y dx − x dy)/y2.
Square roots: d(√w) = dw/(2√w).
Of course, we will be adding to this list, but not very much; this is already
an extremely powerful collection of results and allows us to calculate the
differential of almost any combination of variables you can imagine. Here is
an illustrative example, worked through in detail.
Sometimes I like to think of the d-operator as a sort of enzyme acting on
long, complicated molecules (the atoms are the variables themselves). For
instance, if x and y are my atoms, I can construct the complex molecule (y cos
√x)3. We can view this as being constructed hierarchically as follows: start
with x, square-root it, take the cosine of that, multiply by y, and then cube the
whole thing. So structurally, I can think of it as being a cube. That is, I can
blur my eyes (so to speak) and picture it as (blah)3, where for the moment I
ignore the details of what “blah” stands for. Then my d-enzyme goes to work:
d((y cos √x)3) = 3(y cos √x)2 d(y cos √x).
This is because d(w3) = 3w2 dw for any variable w, no matter what it looks
like. So d doesn’t care what “blah” is; it just goes to work unraveling the
molecule step by step. (This process is commonly referred to as chaining.)
We are now reduced to finding d(blah). Now “blah” itself is a product,
namely y cos √x. So using the product pattern, we get
d(y cos √x) = y d(cos √x) + cos √x dy.
This reveals the next layer of structure in our molecule, so we then need to
break down d(cos √x):
d(cos √x) = −sin √x d(√x).
Finally, from our table we find
d(√x) = dx/(2√x).
Putting the pieces together, we obtain
d((y cos √x)3) = 3(y cos √x)2 (cos √x dy − (y sin √x)/(2√x) dx).
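To convince yourself that the unraveling really works, here is a numerical sketch. The sample motions x = t² + 1 and y = t are my own choices, made so that √x stays well-defined:

```python
import math

t, h = 1.2, 1e-7
x = lambda u: u * u + 1           # sample motion for x (my assumption)
y = lambda u: u                   # sample motion for y (my assumption)
molecule = lambda u: (y(u) * math.cos(math.sqrt(x(u)))) ** 3

dm = molecule(t + h) - molecule(t)     # actual change in the molecule
dx = x(t + h) - x(t)
dy = y(t + h) - y(t)

X, Y = x(t), y(t)
blah = Y * math.cos(math.sqrt(X))
# the fully unraveled differential:
# d(blah^3) = 3 blah^2 (cos(sqrt(x)) dy - y sin(sqrt(x)) / (2 sqrt(x)) dx)
predicted = 3 * blah ** 2 * (
    math.cos(math.sqrt(X)) * dy
    - Y * math.sin(math.sqrt(X)) / (2 * math.sqrt(X)) * dx
)
```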
Let’s try our methods on a new problem: measuring the speed of a spiral
motion. The first question is, what exactly do we mean by spiral? I like to think of it
as a point on the end of a rotating stick that gets longer as it turns. For
simplicity, let’s say that the rate of turning and the rate of lengthening are
both uniform. In fact, let’s take them both to be 1. If the stick were simply
rotating, we could use the standard description for uniform circular motion: x
= cos t, y = sin t. Because the stick is growing, this will have to be modified to
x = t cos t,
y = t sin t.
At time t = 2π, our point has position (2π, 0). How fast is it traveling?
Applying d to our equations gives
dx = (cos t − t sin t) dt,
dy = (sin t + t cos t) dt,
so we get a velocity vector (ẋ, ẏ) equal at all times to (cos t – t sin t, sin t + t
cos t). In particular, at the end of one rotation (when t = 2π), we have a
velocity of (1, 2π) and hence a speed of √(1 + 4π2).
Show that this spiral motion has the same speed at all times
as the parabolic motion x = t, y = t2/2.
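The exercise can be checked numerically before you prove it. In this sketch, both speeds should come out to √(1 + t²) at every moment (the parabolic motion x = t, y = t²/2 is my reading of the exercise):

```python
import math

def spiral_speed(t):
    # velocity of (t cos t, t sin t) is (cos t - t sin t, sin t + t cos t)
    vx = math.cos(t) - t * math.sin(t)
    vy = math.sin(t) + t * math.cos(t)
    return math.hypot(vx, vy)

def parabola_speed(t):
    # velocity of (t, t^2/2) is (1, t)
    return math.hypot(1, t)

# in particular, at the end of one rotation:
speed_at_2pi = spiral_speed(2 * math.pi)   # should be sqrt(1 + 4 pi^2)
```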
21
We now have a simple and reliable way to measure the relative rates of a set
of interrelated numerical variables. In particular, we have completely solved
the problem of velocity. Given any motion (that is, a set of time-dependent
variables and equations expressing this dependence), we can simply apply the
Leibniz d-operator to these equations and, using our differential calculus,
obtain the velocity components ẋ as ratios dx/dt.
This alone would be more than enough justification for all the effort (both
conceptual and technical) that went into the development of the differential
calculus, but the fact is that not only velocity but virtually all measurement
problems can be expressed in the language of variables and differentials, and
the differential calculus allows us to solve a great many of them quite easily.
(Of course, I would argue that the real justification lies in the beauty and
profundity of the ideas themselves.)
In particular, it was discovered quite early in the development of the
differential calculus that these methods can even be applied to the problems of
classical geometry—that is, the measurement of angle, length, area, and
volume. In many ways, this is quite surprising. After all, differentials are
instantaneous rates of variable quantities, whereas geometric measurements
are fixed and static.
A while back, I was telling you about Archimedes’s measurement of the
parabola. The very beautiful discovery was that a parabolic section always
takes up exactly two-thirds of its box.
Let’s set this measurement in motion: imagine the chopping point traveling
along the parabola. As the point moves, the area of the enclosed region
changes. Our problem becomes not merely the determination of one particular
parabolic area, but the measurement of all such areas. In other words, we are
interested in the relationship between where we chop the parabola and how
much area is enclosed.
If you like, we can think of the parabola as a bowl slowly filling with
liquid.
As the level of the liquid increases, so does its area (this is imaginary two-
dimensional liquid), and our question becomes, how does the area depend on
the height?
The important point is that now that the area is variable, it has a rate. In
place of a cold, dead area just sitting there, we have an active, exciting area,
which, as a consequence of its fluent nature, possesses a differential. Let’s see
if we can get our hands on it.
Going back to our coordinate description, we see that at any moment in the
life of this motion we have three related variables: x, y, and the area A. The
problem is to determine the precise relationship among them.
If we give our picture a little kick, the point moves, x and y change slightly,
and so does A.
The small change ΔA appears as a thin sliver of area. How big is it?
Intuitively, we would say that it is about the same size as a rectangle of the
same height and width; that is, about 2x Δy.
More precisely, this sliver (being curved) must be slightly larger than the
inside rectangle, whose area is 2x Δy, and slightly smaller than the outside
rectangle with area 2(x + Δx) Δy. This means that
2x Δy < ΔA < 2(x + Δx) Δy.
Of course, all three of these quantities are approaching zero, but their
relative proportions are not. In particular, we have ΔA/Δy sandwiched
between two values,
2x < ΔA/Δy < 2(x + Δx).
Since both the upper and lower bounds approach the same thing, namely
2x, it must be that ΔA/Δy does also. That is, the ratio ΔA/Δy approaches 2x
as the small changes vanish.
On the other hand, ΔA/Δy, being the ratio of small changes in the variables
A and y, must also approach the true proportion of their differentials. Thus
dA/dy = 2x. Multiplying by dy, we obtain a differential equation for the area
of a parabolic section:
dA = 2x dy.
Together with the description of the parabola itself, we now have two
equations:
y = x2,
dA = 2x dy.
The first equation indicates the shape we are measuring, and the second
comes from our geometric reasoning. At this point, we can forget about
origins and motivations and view the problem as a purely abstract question
about three variables. How do we express A in terms of x and y?
The first step might be to eliminate y from the discussion. After all, we
know what it is, namely x2. So we can rewrite our differential equation as
dA = 2x d(x2) = 4x2 dx.
Now we need to un-d this; that is, to find a variable whose differential is
4x2 dx. Since d(x3) = 3x2 dx, the adjusted guess (4/3)x3 does the job. Does
that mean A = (4/3)x3? Almost. Suppose a and b are any two variables with
the same differential, da = db. Then
d(a – b) = da – db = 0,
and this means that a – b must be constant. So there is some ambiguity in un-
d-ing, but not too much. Just as a number has two square roots, a differential
has an infinite number of un-d-ings, all of them differing from each other by
additive constants.
So we cannot directly conclude from our differential equation
dA = 4x2 dx
that A itself equals (4/3)x3, but we do know that at worst they
differ by a constant. That is,
A = (4/3)x3 + c for some fixed constant c.
That is the most we can say from the differential equation alone. There is
no way to rule out a possible constant on differential grounds, just as there is
no way to tell from the speedometer alone whether you are in the car or the
trailer. This ambiguity comes from the fact that our geometric argument only
considered the change in the area, not where we started measuring it from.
But in fact we do have slightly more information—namely, we have a so-
called initial condition. At the bottom of the parabola, we clearly have both x
and A equal to zero. Since the above equation expresses a relationship
between our variables which is valid at all times, it must hold at this particular
moment as well. This implies that our (putative) constant must in fact be zero.
So we can conclude that A = (4/3)x3 after all.
Generally speaking, there are two parts to solving a differential equation:
intelligent guessing with modification to get a so-called generic solution and
then using special values of the variables—typically initial conditions—to
determine any ambiguous constants.
Going back to our coordinate picture, we see that the rectangle containing
our parabolic region has area 2xy = 2x3. Thus the parabola-to-box proportion
is
A/2x3 = (4/3)x3/2x3 = 2/3.
Since this ratio is independent of any units, we can throw away all the
scaffolding—coordinate systems, variables, equations, and all—and simply
say (along with Archimedes) that a parabolic section always takes up two-
thirds of its box.
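The classical pile-of-rocks check still works, of course, and makes a nice contrast with the differential argument. This sketch slices the region under the parabola into thin horizontal strips (the specific parabola y = x² and height 1 are my choices):

```python
# width of the parabola y = x^2 at height y is 2 * sqrt(y);
# slice the region from y = 0 up to y = 1 into n thin strips
n = 100_000
dy = 1.0 / n
area = sum(2 * ((k + 0.5) * dy) ** 0.5 * dy for k in range(n))

box = 2.0 * 1.0          # enclosing box: width 2, height 1
ratio = area / box       # approaches Archimedes' 2/3
```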
22
Let’s step back a little bit and think about what just happened. I don’t want
the big ideas to get lost in the computational details. The point is that we can
apply our differential methods even to something as seemingly static as a
geometric measurement. The key idea is this: get your measurements moving.
Every application of the differential calculus—to geometry, mathematical
physics, electrical engineering, and anything else—comes down to this one
idea. If you want to measure something, wiggle it. Once a measurement is in
motion, it has a rate of motion, and if we are at all fortunate (and we usually
are), we can derive some sort of differential equation describing the way our
measurement behaves.
What this means is that the study of measurement ultimately reduces to the
study of differential equations (a possible exception being the measurement of
polygons, i.e., trigonometry, where simpler methods are available). Questions
concerning the existence and uniqueness of solutions (as well as their ability
to be explicitly described) dominated the mathematics of the eighteenth
century and continue to be an active area of mathematical research.
The method we used to obtain a differential equation for the area of the
parabola is actually quite general. First, we found a simple way to view the
area we wanted as a variable quantity; that is, we got it moving. Then, we
estimated the change in area in terms of the changes in the coordinate
variables. Finally, we let the small changes approach zero so that our
approximation became an exact statement about instantaneous rates; that is, a
differential equation.
Suppose, for instance, that we had a closed curve whose area we wanted to
measure.
A simple way to get the area moving is to choose a direction and “sweep
out” the area in that direction, as though we were putting the curve through a
scanner:
In this way, the variable area depends on the location of the scanning line.
Let’s denote the position of the line (i.e., the width of the area collected so
far) by w and the swept-out area by A. At any given moment, we have a
certain cross-sectional length, say l, and as the scanner moves along, we get
variations in w, l, and A.
Of course, the way that w and l are related depends on the shape of the
curve (in fact, it practically defines the shape of the curve). If we make a
slight change in the scanner position, say from w to w + Δw, we get
corresponding changes in the length l and the area A. The new sliver of area
is approximately a rectangle of length l and width Δw; that is,
ΔA ≈ l Δw.
Another way of saying this is that ΔA/Δw, being in some sense the
“average” cross-sectional length during this small scanning interval, must be
roughly equal to l. Of course, as the small changes approach zero and the thin
sliver of area ΔA gets thinner, this average length approaches l exactly. So we
get a differential equation of the form
dA = l dw.
What this says is that the rate of change in the scanned area is just the
product of the cross-sectional length and the rate of the scanning motion.
Does this remind you of the Pappus philosophy? Leibniz’s own view was that
areas are comprised of infinitely many infinitesimally thin rectangles, so that
the above differential equation is essentially an infinitesimal version of the
“length times width” formula for a rectangular area.
However you wish to interpret it, the above equation is quite general; we
can say this for any curve and any scanning direction. Because of this, it is up
to us to choose our orientations wisely so that we get the simplest differential
equation possible. (In particular, our choices will determine the precise form
of the relationship between w and l, which will seriously—and subtly—affect
our ability to solve such an equation.)
A nice example of this method is the measurement of the area of a
sinusoidal arch; that is, one of the humps in the graph of the relation y = sin
x.
Here it makes good sense to sweep horizontally so that the position of the
scanning line is simply the coordinate x itself (running from 0 to π) and the
cross-sectional length is just sin x. Then our differential equation for area
reads
dA = sin x dx.
To un-d this, we need a variable whose differential is sin x dx. Since d(cos x)
= −sin x dx, the variable −cos x does the job, so A = −cos x + c for some
constant c. The initial condition (A = 0 when x = 0) forces c = 1, and we get
A = 1 – cos x.
This tells us what the swept-out area is for all positions of the scanning
line. In particular, when x = π, we get the nice result that the area of a
complete arch is exactly
1 – cos π = 1 – (–1) = 2.
How beautiful! I’ve always found this result surprising (and somewhat
ironic, given the transcendental nature of the sine function).
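Exhaustion agrees: adding up thin rectangles under the arch gives the same answer. A minimal Python check of mine:

```python
import math

n = 100_000
dx = math.pi / n
# sum of sin(x) dx over one arch, sampling each strip at its midpoint
arch_area = sum(math.sin((k + 0.5) * dx) * dx for k in range(n))
# approaches 1 - cos(pi) = 2
```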
I think it is important to understand the close connection between these
techniques and the classical method of exhaustion.
The classical idea would be to choose a direction and slice up our area into
tiny approximating rectangles. If we got incredibly lucky, we might notice a
pattern to the approximations and be able to figure out where they are
heading. With the differential approach, we don’t need to be clever at all; we
simply write down the equations and let the d-operator do all the pattern
bookkeeping for us. The difficulty is transferred from the polygons and the
details of the approximation pattern to the solving of a differential equation.
This is almost always a trade worth making. Even forgetting about the
technical details, at least the differential method is completely uniform,
whereas in the classical case, each new shape has to be handled in its own ad
hoc and idiosyncratic way.
So the “rocks and symbols” analogy is really quite apt. The classical
method of exhaustion is like dealing with massive piles of rocks, and the
differential calculus is like adding columns of digits—so much so that we can
even build machines to do the computations for us. This fits into the larger
historical trend in mathematics known as the arithmetization of geometry.
Shapes become number patterns, and their measurements are governed by
differential equations.
As a final example, let me show you an amusing way to determine the area
of a circle (this is, after all, the prototypical example of classical exhaustion).
I know we already understand the circle (as much as it can be understood,
anyway), but my point is to show that our new methods can give us fresh
perspectives on old problems. In this case, instead of sweeping out area, I’m
going to grow area out from the center.
So both the radius r and the area A will be variables. In this case, a small
change Δr in the radius leads to a thin circular ring of area, roughly a bent
rectangle whose length is the circumference 2πr and whose width is Δr.
Letting the small changes approach zero, we obtain the differential equation
dA = 2πr dr.
Un-d-ing (and using the initial condition that A = 0 when r = 0), we recover
the classical result A = πr2.
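Summing thin rings recovers the familiar area formula numerically as well. In this sketch, the final radius R = 1.5 is an arbitrary choice of mine:

```python
import math

R = 1.5                    # an arbitrary final radius
n = 100_000
dr = R / n
# each thin ring has area about 2 pi r dr
area = sum(2 * math.pi * ((k + 0.5) * dr) * dr for k in range(n))
# approaches pi R^2
```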
23
So all the subtle and beautiful variety of shapes and motions and all the
fascinating questions concerning their measurements can be reduced to the
problem of applying and inverting the Leibniz d-operator. Whereas d itself
transforms variables into differentials, many of the most interesting
measurements (such as area and volume) involve doing the opposite. So in
many ways un-d-ing is the more interesting process, especially since we have
a calculus for d itself.
Leibniz, of course, had a somewhat different interpretation. He considered
dx not to be the instantaneous rate of x (although he certainly understood
Newton’s theory of fluxions perfectly), but rather (and somewhat more
mystically) as the infinitesimal change in x. A useful analogy here might be to
think of x as a list of numbers—a so-called discrete variable:
x: 0, 1, 3, 2, 5, 6, 4, . . .
Then dx would stand for the list of successive differences of these numbers,
and un-d-ing would amount to summing such differences back up. This is
why Leibniz chose an elongated S (written ∫, for the Latin summa) as his
symbol for the un-d-ing operation: given a differential relation such as
dy = x2 dx,
we write y = ∫ x2 dx for a variable y that satisfies it.
By the way, this is usually read as “the integral of x2 dx” rather than using
the older word summa. (The word integral comes from the Latin integer,
meaning “whole.”) Leibniz’s symbol is known as the integral sign and the un-
d-ing process is usually referred to as integration.
In this particular case, we can guess and modify to obtain the result
∫ x2 dx = (1/3)x3.
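Thinking of integration as summing, we can watch y accumulate step by step; this sketch of mine builds y from its increments dy = x² dx and compares the total with x³/3 at x = 1:

```python
n = 100_000
dx = 1.0 / n
y = 0.0
for k in range(n):
    x = (k + 0.5) * dx   # midpoint of the k-th little step
    y += x * x * dx      # accumulate the increment dy = x^2 dx
integral_at_1 = y        # should be very close to 1/3
```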
Now in practice, most people use both the square root sign and the
integral sign in a somewhat cavalier manner. That is to say, when I write
√16 = 4, I am quite aware of the fact that –4 is also a square root of 16.
Sometimes I might even remind myself of this possibility by writing
√16 = ±4. Similarly, I will often write things like
∫ x2 dx = (1/3)x3,
knowing full well that there is a potential additive constant. The same goes
for any operation that collapses information; if several different numbers all
go to the same place, the inverse operation will carry a certain amount of
ambiguity. How you deal with that fact notationally is your business, but
confusing things can happen if you aren’t careful!
Anyway, most working mathematicians use the integral sign (at least in this
context) to mean any variable with the prescribed differential, and the
ambiguity is not usually explicitly written, though, of course, it is understood.
When one speaks of “the square root of 16” or “the integral of x2 dx” one
needs to understand this somewhat professional meaning of the word the.
So the art of measurement pretty much comes down to understanding the
behavior of the ∫-operator. As I mentioned already, this turns out not to be so
simple. Which is not to say that we know nothing. Over the last 350 years,
people have discovered and compiled hundreds of patterns and formulae in
the form of so-called integral tables, which, in effect, give us a sort of integral
calculus (albeit a rather humiliating one, since many of the most interesting
and naturally occurring differentials do not appear).
Continuing the square root analogy, it does, of course, happen that one gets
lucky (e.g., √9 = 3) and can rewrite an expression in a more explicit form, but
most of the time, as in the case of √2, it is not a question of finding a simpler
form; the number itself is simply not expressible in the language you wish to
use.
Similarly, most integrals are not expressible in terms of so-called
elementary operations (e.g., addition and subtraction, multiplication and
division, square roots, sine and cosine). For example, the integral
is certainly “out there” as a variable, depending on x in some definite way (at
least up to the usual ambiguous constant), but that dependence is provably not
describable in terms of algebraic and trigonometric patterns. This is a pretty
spectacular example of the power of modern mathematics that we can even
devise such arguments (and, of course, I can’t explain them to you here,
which is admittedly rather frustrating).
So we are in a very amusing position philosophically. Just considering the
area of a closed curve, for instance, we have first of all the rather humbling
state of affairs that almost all curves are indescribable in principle (because
they have no pattern that can be encoded in a finite language), and then on top
of that, even the ones that can be talked about (i.e., the ones that can be
described by a set of variables and relations) almost always lead to differential
equations whose solutions are not explicitly describable. We have this
amazingly beautiful and powerful theory of differentials (including a calculus
for crying out loud), but the powers that be (the mathematical gods?) have
decreed that we are only to have definite explicit knowledge in the tiniest
fraction of cases.
On the bright side, at the very least we have a uniform language for
measurement description, and through this means we are able to make
connections and see relationships between measurements, even if we are
forbidden from knowing them explicitly. In particular, if two seemingly
unrelated problems lead to the same differential equation, then even if we
cannot solve it, we still know there is a deep underlying connection between
them. This is, ultimately, the only real value of linguistic constructs and the
only thing that conscious beings can ever do with language, if you think about
it.
24
I love the contrast between the ancient and modern approaches to geometric
measurement. The classical Greek idea is to hold your measurement down
and chop it into pieces; the seventeenth-century method is to let it run free
and watch how it changes. There is something slightly perverse (or at least
ironic) about how much easier it is to deal with an infinite family of varying
measurements than with a single static one. Again, the trick is to figure out a
way to get your measurement moving.
Of course, this is particularly easy to do when your problem involves
motion—it’s not very hard to get things moving if they’re moving already. As
an illustration of this idea, let’s try to measure the length of a cycloid.
The natural measurement would be the length of one complete arch of the
cycloid (let’s say compared to the diameter of the rolling circle). The classical
approach would be to chop up the cycloid into tiny pieces, approximate them
by straight lines, and try to figure out where the approximate total length is
heading. (This approach was in fact carried out by Wren and others in the
1650s.)
The modern approach is instead to use the rolling motion itself. Taking the
rolling circle to have unit radius and to roll at unit speed, the moving point
traces
x = t – sin t,
y = 1 – cos t.
At any time t, our moving point is located at the position (x, y). Let’s call
the traced-out length l. Our problem is to determine how l depends on t. As
usual, the idea is to obtain a differential equation for l.
Let’s imagine a very small amount of time going by and consider the small
changes in x, y, and l.
When these changes are very small, the length Δl is practically the
hypotenuse of the right triangle formed by Δx and Δy. (This is the way the
classical idea still comes into play.) The Pythagorean relation then gives us
the approximation
Δl² ≈ Δx² + Δy².
Letting the time interval Δt approach zero, we get the desired differential
equation,
dl² = dx² + dy².
It has become customary, by the way, to write dx² in place of the more
cumbersome (dx)². We just have to be careful not to get dx² confused with
d(x²). Of course, you can always use parentheses if you are worried about it.
So we find a sort of “infinitesimal” Pythagorean relation, which expresses
the differential arc length dl in terms of its horizontal and vertical
components. Another way to think about it is that since dl/dt measures the
rate at which distance is traversed, it must be the same as the speed of the
moving point; that is, the length of the velocity vector (ẋ, ẏ). We get
dl/dt = √(ẋ² + ẏ²).
For our cycloid, ẋ = 1 – cos t and ẏ = sin t, so the speed √((1 – cos t)² + sin² t)
can be rewritten in a very simple and elegant way. Imagine a circular arc of
length t.
We can then view 1 – cos t and sin t as the sides of a right triangle with
hypotenuse √((1 – cos t)² + sin² t). In other words, the thing that we are
interested in is exactly the length of the chord spanning an arc of length t. (We
saw this before when we measured the velocity of the cycloid motion.) Now
here’s the clever idea: rotate the circle so that this chord is vertical.
Now we can see that the chord consists of two halves, each of which is just
the sine of an arc half as long. That is, the length of our chord can also be
written as 2 sin(t/2). So a simple change of perspective (and isn’t that what
every great idea comes down to?) leads to the surprising and beautiful result
that
√((1 – cos t)² + sin² t) = 2 sin(t/2).
There are many such interrelationships between sine and cosine, all of them
ultimately coming from the symmetry and simplicity of uniform circular
motion.
We can now rewrite our differential equation for the arc length of a cycloid
as
dl = 2 sin(t/2) dt.
Now, that’s more like it! Here is something we have a real chance of being
able to integrate. In fact, a reasonable guess would be l = –4 cos(t/2). Now
d(–4 cos(t/2)) = 2 sin(t/2) dt.
And so there it is! We have successfully measured the arc length variable of
a cycloid using the differential calculus (plus a pretty clever idea about
circles). In particular, a full arch (from t = 0 to t = 2π) has length
4 – 4 cos π = 8.
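If you have a computer handy, you can check this figure in the classical spirit: chop the arch into tiny chords and add up their lengths. Here is a little sketch in Python (purely a numerical sanity check, not part of the argument):

```python
import math

def cycloid_arch_length(steps=100000):
    """Add up the lengths of tiny chords along one arch of the cycloid
    x = t - sin t, y = 1 - cos t (a unit circle rolling at unit speed)."""
    total = 0.0
    for i in range(steps):
        t0 = 2 * math.pi * i / steps
        t1 = 2 * math.pi * (i + 1) / steps
        dx = (t1 - math.sin(t1)) - (t0 - math.sin(t0))
        dy = math.cos(t0) - math.cos(t1)  # (1 - cos t1) - (1 - cos t0)
        total += math.hypot(dx, dy)
    return total

print(cycloid_arch_length())  # ≈ 8, four times the diameter of the rolling circle
```

With a hundred thousand chords, the total comes out at 8 to many decimal places.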
Show that the area swept out by one full turn of a spiral
takes up exactly one-third of the corresponding circle.
25
I’m going to have to ask you to bear with me once again while I make a few
philosophical remarks. The big idea here is that geometry—the study of size
and shape—can be subsumed into the study of variables (also known as
analysis). It is always interesting when seemingly quite different
mathematical structures turn out to be the same. As I’ve said before, the real
object of interest to mathematicians is pattern. If you wish to view such a
pattern geometrically, that might give you a certain kind of insight; whereas if
you think of it as a set of abstract numerical variables, that may lead to
another sort of understanding—and certainly the two viewpoints feel very
different emotionally.
The curious thing is why history went the way it did, and why the modern
approach has been so much more successful. The classical Greek geometers
were every bit as brilliant and resourceful as their seventeenth-century
counterparts (if not more so). It’s certainly not a question of mathematical
talent. There are plenty of reasons why the Greeks preferred direct geometric
reasoning, aesthetic taste, of course, being one of them. In fact, this prejudice
was taken to such an extreme that numbers themselves tended to be viewed
geometrically (as lengths of sticks), and numerical operations were thought of
as geometric transformations (e.g., multiplication as scaling). This severely
hampered their understanding.
The modern approach is almost the exact opposite. Curves and other
geometric objects are replaced by numerical patterns, and the problem of
measurement essentially becomes the study of differential equations. Why, if
these two viewpoints are equivalent, should one of them be so much more
powerful and convenient?
There is no question that as visual animals we prefer a picture to a string of
alchemical symbols. I, for one, want to feel connected to my problem on a
visceral, tactile level. It helps me understand the relevant issues when I can
imagine running my hand over a surface or wiggling part of an object and
picturing in my mind’s eye what happens. But I know that when push comes
to shove, the truth is in the details, and the details are in the number pattern.
Of course, any analytic argument could be painstakingly translated into
purely geometric terms, and in fact, this is the way many seventeenth-century
mathematicians worked; even then there was still a great deal of prejudice in
favor of geometric reasoning. This tends, however, to produce very contorted
and artificial explanations in place of concise, almost-too-simple-to-believe
analytic arguments.
I suppose what I’m really talking about here is modernism. The exact same
issues—abstraction, the study of pattern for its own sake, and (sadly) the
resulting alienation of the layperson—are all present in modern art, music,
and literature. I would even venture to say that we mathematicians have gone
the furthest in this direction, for the simple reason that there is nothing
whatever to stop us. Untethered from the constraints of physical reality, we
can push much further in the direction of simple beauty. Mathematics is the
only true abstract art.
For me, the psychological fact of the matter is that however aesthetically
and emotionally satisfying the geometric view may be, the analytic approach
is, in the end, far more elegant and powerful. We’ve already seen a number of
instances of this—the increased descriptive power, the advantages of a
uniform language that reveals hidden connections, and the ease of
generalization. For instance, the classical geometers (as far as I know) never
even conceived of four-dimensional space, whereas adding another variable is
an obvious and natural analytic extension.
Which is not to say that I am advocating the abandonment of the geometric
viewpoint. Obviously, the greatest mathematical pleasure is to be had by
synthesizing different approaches—to be fluent and comfortable with as many
as you can and to inform each part of your mathematical self via the others.
Think geometrically when a visual image is helpful (usually to get a big idea
or an intuitive connection) and work analytically when that seems appropriate
(usually to make a precise measurement).
Maybe it all comes down to this. There are lots of beautiful patterns out
there. Some, such as a triangle taking up half its box, can be easily seen and
felt; others, like d(x³) = 3x² dx, are not so immediately available to our visual
imagination. So be it; I myself want to be open to all forms of beauty. For me,
that’s what being a mathematician is all about.
26
Now I want to tell you about another fantastically beautiful and powerful
application of the differential calculus, possibly the most useful in practical
terms.
Imagine a cone sitting inside a sphere. A very flat, squat cone has almost no
volume, and so does a very tall, thin one.
Clearly the best cone (in the sense of maximizing volume) lies somewhere
in between. Intuitively, it feels to me like the base of the cone should be
slightly below the equator of the sphere, but it’s certainly not obvious exactly
where.
These kinds of questions—where we’re trying to maximize (or minimize) a
particular measurement—have a long history and are known as extremal
problems. For example, the Babylonians knew that among all rectangles with
a given perimeter, the square has the most area. Here is a related problem for
you to think about:
As the height of the cone increases, so does its volume, until it starts
becoming detrimental to be so tall and thin, and then the volume decreases
down to zero again (I am including the extreme cases of a single-point “cone”
of zero volume on the left and a stick of zero volume on the right). Anyway,
somewhere in the middle is the cone we want—a little to the right of the
middle, if my intuition is correct.
To make this more precise, let’s construct a “variables and relations” model
of the situation. (As always, this is the hard part.) Let’s begin by taking the
radius of the sphere as our unit (at least that’s not changing!), and we’ll
denote the height and radius of the cone by h and r respectively. Slicing the
sphere vertically through the center, we see this cross-section:
The geometrical constraint on our cone is that it fit snugly in the sphere.
This means that h and r must be related somehow. In fact, we can see that the
distance from the center of the sphere to the base of the cone is just h – 1. (I
guess I’m tacitly assuming the base lies below the equator, otherwise it would
be 1 – h.)
The distance from the center of the sphere to the edge of the cone is 1, so
Pythagoras says that
(h – 1)² + r² = 1.
Note that (because of the squaring) we would get the same equation if the
base of the cone were above the equator. I love it when that happens. So in
either case we get
r² = 1 – (h – 1)² = 2h – h².
This tells us how the radius of the cone varies with the height. Now the
volume of the cone is given by
V = (π/3) r² h = (π/3)(2h² – h³).
Our question about cones has become an abstract numerical one: What value
of h makes V the largest?
Imagine for a moment that this were the space-time picture of a motion
(that is, h represents time and V the height of a ball, say). We are asking at
what time the ball reaches its maximum height. The answer, of course, is
when its speed is zero. Alternatively, we could say it is when the tangent line
to the graph is horizontal.
Taking the differential, we get
dV = (π/3)(4h – 3h²) dh,
and we see that this becomes zero precisely when 4h = 3h². (Notice that we
don’t have to worry about the differential dh being zero, since at that moment
the radius and height are still changing.) Thus we conclude that h = 4/3. So
the largest cone is attained when the base is one-third of the way below the
equator.
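If you want reassurance, a computer can simply try every height. Here is a quick numerical sketch (mine, not the book’s) that scans a fine grid of heights for the cone of largest volume:

```python
import math

def cone_volume(h):
    """Volume of a cone of height h inscribed in a unit sphere.
    The snug fit gives r^2 = 2h - h^2, as in the text."""
    return math.pi * (2 * h - h * h) * h / 3

# Scan all heights between 0 and 2 on a fine grid and keep the best one.
best_h = max((i / 10000 for i in range(1, 20000)), key=cone_volume)
print(best_h)  # ≈ 1.3333, i.e. h = 4/3
```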
Here is another classic extremal problem: among all cylindrical cans with a
given surface area, which one holds the most volume?
The shape of a cylinder is determined by its radius r and height h, and its
volume and surface area are given by
V = πr²h,
S = 2πrh + 2πr².
(I’m including the top and bottom lids of the can, of course.) The meaningful
range of variation runs from a flat can (h = 0) to a stick (r = 0). All the while,
the surface area S is being held fixed. This means that r and h are connected.
If I wished, I could even write
h = (S – 2πr²)/2πr
and express everything in terms of the single variable r. But it so happens that
I do not wish. Instead, I want to show you another way to proceed that I feel
is more elegant.
The idea is this. Since S is constant, we must have dS = 0 at all times. Since
we want the moment when volume is maximized, we must have dV = 0 at that
instant. In particular, at the moment of interest, we will have both dV = 0 and
dS = 0. Thus we get two differential equations for r and h:
2rh dr + r² dh = 0,
(h + 2r) dr + r dh = 0.
Multiplying the second by r and subtracting the first, we get (2r² – rh) dr = 0.
This means that the best cylinder is attained when 2r² = rh. There are two
solutions to this equation, namely r = 0 and 2r = h. The first is clearly an
artifact at the boundary, and the second is our maximum. Thus the best-
shaped soup can has a height equal to its diameter.
In other words, it’s a rotated square. How beautiful! Maybe not entirely
unexpected, but still. I never cease to be impressed by the simple economy of
this technique.
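The same brute-force check works here. This sketch (again mine, with an arbitrary choice of surface area) scans over all radii consistent with a fixed S:

```python
import math

S = 10.0  # the fixed surface area; the particular value is an arbitrary choice

def can_volume(r):
    """Volume of a closed can with radius r whose total surface area is S.
    From S = 2*pi*r*h + 2*pi*r^2 we solve for the height h."""
    h = (S - 2 * math.pi * r * r) / (2 * math.pi * r)
    return math.pi * r * r * h if h > 0 else 0.0

r_max = math.sqrt(S / (2 * math.pi))  # any wider and the height would be negative
best_r = max((r_max * i / 20000 for i in range(1, 20000)), key=can_volume)
best_h = (S - 2 * math.pi * best_r ** 2) / (2 * math.pi * best_r)
print(best_h / best_r)  # ≈ 2: the best can's height equals its diameter
```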
Find the largest cylinder that fits inside a given cone. How
about in a given sphere?
27
One of the best illustrations of the contrast between the classical and modern
viewpoints is the measurement of the conic sections. Historically, conics have
always been a natural test case for geometers, being (apart from straight lines)
the simplest curves there are. From a classical perspective, conic sections are
literally just that—cross-sections of a cone. These fall naturally into three
categories—ellipse, parabola, and hyperbola—depending on the slantedness
of the slicing plane. All the classical results (e.g., the focal and tangent
properties) follow from this description. Then we have the projective
viewpoint, where the conics can be seen as the various projections of a circle.
Perhaps simplest of all is the algebraic perspective, which reveals the conics
to be those (nondegenerate) curves given by quadratic (i.e., degree 2)
equations of the form
Ax² + Bxy + Cy² + Dx + Ey + F = 0.
An ellipse with long radius a and short radius b can be viewed as a unit
circle stretched by factors of a and b along the coordinate directions. Since a
unit circle can be described by the equations x = cos t, y = sin t, we see that to
make an ellipse we only need to modify these to
x = a cos t,
y = b sin t.
Alternatively, if you don’t like carrying the parameter t around, you could
instead write this as
x²/a² + y²/b² = 1.
Now that we have an equation for an ellipse, what do the integrals for
length and area look like? Of course, we expect the area integral to be
elementary (i.e., explicitly describable) since it is just a dilated circle, but let’s
see. From x = a cos t, y = b sin t we get dx = –a sin t dt and dy = b cos t dt,
so the arc length differential is dl = √(a² sin² t + b² cos² t) dt, and the
circumference is the integral of this over a full turn.
Integrals of this form (known as elliptic integrals, naturally) arise fairly often
in analysis and are now known to be generically nonelementary. Of course
when a = b, for example, we get ∫ a dt = at, corresponding to arc length along
a circle, but in general the circumference of an ellipse is a nonelementary
transcendental function of a and b, so there is no chance of an explicit
description. We might have hoped that since a circle has a circumference of
2π, the circumference of an ellipse might be given by some equally simple
formula in a and b.
Well, it isn’t. So no wonder the Greeks had a hard time. It’s not that they
weren’t clever enough, it’s that the thing they wanted to say isn’t sayable in
the language they wanted to say it in.
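We can still compute the circumference numerically, of course; we just can’t write it down in closed form. Here is a midpoint-rule sketch (a sanity check of mine, not from the text):

```python
import math

def ellipse_circumference(a, b, steps=100000):
    """Midpoint-rule value of the arc length integral for the ellipse
    x = a cos t, y = b sin t, over a full turn 0 <= t <= 2*pi."""
    total = 0.0
    for i in range(steps):
        t = 2 * math.pi * (i + 0.5) / steps
        total += math.hypot(a * math.sin(t), b * math.cos(t))
    return total * 2 * math.pi / steps

print(ellipse_circumference(1, 1))  # the circle case: ≈ 6.2832, i.e. 2π
print(ellipse_circumference(2, 1))  # ≈ 9.6884 -- no simple formula in a and b gives this
```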
As for the parabola, we have been using the equation y = x². In case you
haven’t derived this yourself, let me show you why it makes sense. Suppose
we have a parabola, and we choose our units and orientations so that it is
symmetrical with respect to the y axis and the focal point is at (0, 1).
The focal property of the parabola says that the distance from any point on
the curve to the focal point is the same as the distance to the focal line (which
in this case would be the line y = –1). So if (x, y) is a point on the parabola,
we must have
√(x² + (y – 1)²) = y + 1.
Squaring both sides and simplifying, the y² terms cancel, leaving x² = 4y.
So our parabola has the equation 4y = x². Rescaling if we want (so that the
focal distance becomes 1/4), we get y = x² as usual. Since every parabola is
similar to every other, we may as well use the simplest equation we can.
We have already dealt with the area integral
∫ x² dx = x³/3,
which is not only elementary, but algebraic (no trigonometric functions are
involved). This is why Archimedes was successful. By contrast, we have the
arc length integral (here I prefer to use the equation y = x²/2, so that dy = x dx)
∫ √(1 + x²) dx.
Finally, we come to the hyperbola. We can use the focal property of the hyperbola (and rescaling if
necessary) to get the equation
x² – y² = 1.
Notice how this is the same as the equation for an ellipse, only with a minus
sign. Of course, this is related to the difference in their focal properties.
On the other hand, choosing our axes to be the tangents at infinity gives us
a different view of a right hyperbola. Since
x² – y² = (x + y)(x – y),
the equation x² – y² = 1 says that the product of the new coordinates x + y and
x – y is constant.
For the sake of simplicity, let’s restrict our attention to the right hyperbola
xy = 1. Of course, there are lots of other hyperbolas out there for us to
measure, but all the difficulties are present in this special case. The relevant
integrals are
∫ dx/x
and
∫ √(1 + 1/x⁴) dx.
Once again, we have two perfectly harmless-looking integrals, which are in
fact quite thorny. It turns out that the second one (the one for arc length) is
provably nonelementary and can be rewritten in terms of (modified) elliptic
integrals. So there is at least an abstract sense in which hyperbola length is
related to ellipse length. It is the area integral, however, that is the real
surprise. What, we can’t integrate dx/x? What a scandalous state of affairs!
Are we really going to stand for this?
Before we deal with this disturbing development, let’s go back to the arc
length integral for the parabola, ∫ √(1 + x²) dx. It turns out that this is intimately
connected to the hyperbolic area integral ∫ dx/x. I want to show you this for
two reasons. First, because I think it is surprising and wonderful that the
length of one conic section is related to the area of another, and second,
because it is a great example of analytic technique—the power of symbol
jiggling.
So we are interested in the integral
∫ √(1 + t²) dt.
(I’ve changed notation so there will be no confusion with any of our earlier
symbol choices.)
Let’s abbreviate √(1 + t²) by s (so s is a new variable I’ve invented to take
the place of this more complicated expression—a surprisingly powerful
technique as you will see). Now we have
s² – t² = 1,
which is the equation of a right hyperbola. And our integral becomes simply
∫ s dt, which is precisely the area integral. So already the connection is being
revealed, and all we have done is abbreviate. But we can go further. Writing
u = s + t,
v = s – t,
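Carried to its conclusion, this substitution produces a logarithm, which is to say a hyperbolic area. Without reproducing the algebra, we can at least witness the connection numerically. This sketch compares the integral with its standard closed form (a calculus fact I am supplying, not derived in the text), whose second term is exactly a hyperbolic area:

```python
import math

def parabola_arc_integral(T, steps=100000):
    """Midpoint-rule value of the parabolic arc length integral of
    sqrt(1 + t^2) dt from 0 to T."""
    total = 0.0
    for i in range(steps):
        t = T * (i + 0.5) / steps
        total += math.sqrt(1 + t * t)
    return total * T / steps

T = 1.0
numeric = parabola_arc_integral(T)
# Standard closed form: half algebraic, half a logarithm. The logarithm term
# is precisely a hyperbolic area -- the connection being revealed.
closed = 0.5 * (T * math.sqrt(1 + T * T) + math.log(T + math.sqrt(1 + T * T)))
print(numeric, closed)  # both ≈ 1.1478
```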
28
Our attempt to measure the conic sections has put us in a rather awkward and
embarrassing position. These are, after all, the simplest possible curves, and
they certainly do lead to very elegant and simple-looking differential
equations, but for some reason we don’t seem to be able to solve them. In
particular, both hyperbolic area and parabolic length come down to the same
question: What on earth is ∫ dx/x?
Quite apart from the intrinsic interest of measuring conics, this integral is
analytically interesting in its own right. What could be a simpler and more
natural differential than dx/x? Surely its integral must be simple and natural as
well, mustn’t it? So what’s the problem?
The obvious way to proceed would be to make a series of highly intelligent
(and hopefully lucky) guesses until we find some clever combination of
algebraic or trigonometric functions whose derivative is the reciprocal
function. Unfortunately, this is a hopeless endeavor. As you have probably
guessed, not only is our integral simple and natural, it also represents an
entirely new transcendental function.
So we are not going to be able to use analytical methods to solve the
problem of hyperbolic area. Instead, I want to show you how we can use the
geometry of the hyperbola to get information about the integral. This is
another nice example of the ongoing conversation between geometry and
analysis.
For the sake of definiteness, let’s write A(w) for the area collected under
the hyperbola xy = 1 as x runs from 1 to w.
(Of course, I would much prefer to collect area starting from x = 0, but the
reciprocal curve is infinite there, so x = 1 seems the next best choice.) Now
A(w) is exactly the function we seek—that is, dA = dw/w.
In general, we would want to measure the area between any two points a
and b.
If both a and b are greater than 1, then this area can be viewed as the
difference A(b) – A(a). (We’ll see in a minute how to deal with areas that lie
to the left of x = 1.) Thus, knowledge of the function A—that is, a precise
understanding of exactly how A(w) depends on w—would completely solve
the problem of hyperbolic area. Conversely, any information about hyperbolic
area would tell us something about the behavior of A(w).
As it happens, the reciprocal curve does have a very beautiful area
property: scaling invariance. To illustrate, let’s look at two pieces of
hyperbolic area.
Notice that the second region (running from 3 to 6) is three times wider
than the first (from 1 to 2). It is also one-third as high, because we are dealing
with the reciprocal curve. More precisely, every vertical stick in the first
region corresponds to a stick in the second region whose horizontal position is
three times as large, while being one-third as tall. If we want, we can think of
the second area as a dilation of the first—we stretch horizontally by a factor
of 3 and vertically by 1/3. Does that make sense?
The point here is that these two areas must then be equal. Dilations
multiply area by the stretch factor, and we’ve used two factors that cancel
each other out. Of course, there is nothing special about the number 3. The
general statement would be that the area from a to b is the same as from ac to
bc. Do you see why?
This means that the area of a region under the reciprocal curve depends
only on the ratio of the endpoints, not on the endpoints themselves. In
particular, for any two numbers a and b, the area from 1 to a is the same as
from b to ab.
If we express this analytically, in terms of our area function A, it says that
A(ab) – A(b) = A(a) – A(1). Since A(1) = 0, we can rewrite this in the elegant
form
A(ab) = A(a) + A(b).
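The scaling argument is easy to test on a machine. This sketch (a numerical check of mine) approximates hyperbolic area by the midpoint rule and confirms that the area from 1 to ab equals the area from 1 to a plus the area from 1 to b:

```python
def hyperbolic_area(w, steps=100000):
    """Area under the reciprocal curve y = 1/x from x = 1 to x = w,
    approximated by the midpoint rule."""
    total = 0.0
    for i in range(steps):
        x = 1 + (w - 1) * (i + 0.5) / steps
        total += 1.0 / x
    return total * (w - 1) / steps

a, b = 2.0, 3.0
print(hyperbolic_area(a * b))                   # area from 1 to 6
print(hyperbolic_area(a) + hyperbolic_area(b))  # the same number, to high accuracy
```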
The point being that the number of individual steps (one-digit sums with
possible carrying) is equal to the number of digits. So adding ten-digit
numbers would take only twice as long, even though the numbers themselves
are astronomically larger. Subtraction is similar.
Multiplication, on the other hand, is a nightmare (and don’t get me started
on division!). The trouble is that it takes too long: to multiply two five-digit
numbers requires twenty-five single-digit multiplications (to say nothing of
the necessary adding and carrying). If we want the product of two ten-digit
numbers, this entails over a hundred individual calculations. Forgetting about
the practical issues facing navigators and accountants, I find it interesting on a
purely theoretical level that one operation is so much more costly than the
other. Not that this is in any way surprising; multiplication is, after all,
repeated addition.
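The digit-pair counting can be made concrete in one line. A toy sketch (mine), counting one single-digit multiplication per pair of digits in the grade-school method:

```python
def single_digit_multiplications(x, y):
    """Count the single-digit multiplications in the grade-school method:
    one for every pair of digits."""
    return len(str(x)) * len(str(y))

print(single_digit_multiplications(12345, 67890))            # 25
print(single_digit_multiplications(1234567890, 9876543210))  # 100
```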
In any case, it came as a great relief to those in the arithmetic business
when the Scottish mathematician John Napier published a better system in
1614.
The idea is this. First, notice how easy it is to multiply by 10: 367 × 10 =
3670. This is not due to any special property of the number ten, but rather to
our choice of ten as a grouping size. That is, when we write a number like
367, we are choosing to represent that quantity in terms of groups of ten (3
hundreds, 6 tens, 7 ones). So each position in the digit string is worth ten
times the next. Multiplication by ten then simply shifts each digit one space to
the left so that it counts ten times as much. Of course, we could just as easily
use a different grouping size, say seven, and then multiplication by seven
would shift the digits. (The advantage of a smaller grouping size would be
less memorization—there would be only six nonzero digits, so the
multiplication table would be smaller. The disadvantage would be that the
representations themselves would then be longer.) The choice of ten as a
grouping size thus has no particular mathematical benefit; it is simply a
cultural choice stemming from the fact that we happen to have ten fingers. Of
course, once such a “decimal” system is in general use, then multiplication by
ten becomes especially convenient.
In particular, numbers that are powers of 10, such as 100 or 10000, are
especially easy to multiply together: we just count the shifts. Since 100 is the
same as 1 with two shifts, and 10000 is 1 with four shifts, their product is
simply 1000000 (i.e., 1 with six shifts). The key observation here is that
multiplication of powers of 10 is essentially addition. That is, to multiply two
such numbers we need only to add the shifts: 10ᵐ · 10ⁿ = 10ᵐ⁺ⁿ.
Of course, the same goes for any other number, not just 10. For any number
a, we always have
aᵐ · aⁿ = aᵐ⁺ⁿ
because that’s what repeated multiplication means. By the way, when one
writes something like 2⁵, the number 2 that is being repeatedly multiplied is
called the base and 5 is the exponent (Latin for “on display”). This number
would then be referred to as “2 raised to the fifth power” or simply “2 to the
fifth,” for short.
This pattern is so simple and so pretty, that it is often extended to include
negative and fractional exponents as well. That is, we can make sense of
something like 2⁻³/⁸ by insisting that whatever we choose it to mean, we want
the pattern 2ᵐ⁺ⁿ = 2ᵐ · 2ⁿ to be preserved. This is a major theme in
mathematics: extending ideas and patterns into new territory. Mathematical
patterns are like crystals; they hold their shape and can grow beyond their
original confines. Our extension of sine and cosine to arbitrary angles is one
example; projective space is another. Now we’re going to do the same thing
with repeated multiplication.
Let’s start with the powers of 2. Writing out the first few, we notice a
simple pattern:
2¹ = 2, 2² = 4, 2³ = 8, 2⁴ = 16, . . .
Each time the exponent goes up by 1, the number itself doubles. Of course,
this is patently obvious. But it also means that whenever the exponent goes
down by 1, the number gets cut in half. And this allows us to extend the
meaning of 2ⁿ. First of all, it suggests that 2⁰ should equal 1! What is
interesting here is that the original meaning of 2ⁿ, namely “n copies of 2
multiplied together,” no longer makes any sense. Are we really saying that no
2s multiplied together is equal to 1? I guess we could say it if we want to, but
what we really mean is that we are shifting the meaning of 2ⁿ from “n copies
of 2 multiplied together” to “whatever it needs to be to keep the pretty pattern
going.” It would not be much of an overstatement to say that this is how all
meaning in mathematics is made.
Continuing the pattern, we find that
2⁻¹ = 1/2, 2⁻² = 1/4, 2⁻³ = 1/8, . . .
Let’s go a little further. Is there a good way to give meaning to 2¹/²? The
pattern, if it can continue to hold in such uncharted territory, would say that
2¹/² · 2¹/² = 2¹ = 2.
This means that whatever 2¹/² is, when we multiply it by itself we get 2. So
it must be √2. Similarly, 3¹/² must be √3, and in general a¹/² = √a.
Actually, we have to be a little careful here, since √a is slightly ambiguous.
There are, after all, two square roots of a number a, if a is positive. Which one
do we want a¹/² to mean? Also, if a is negative then we have an even bigger
problem. We don’t yet have a meaning for the square root of a negative
number, so what are we going to do with something like (–2)¹/²?
One easy way out is to simply restrict ourselves to positive bases. That is,
we will only assign meaning to a¹/² when a is a positive number. The other
possibility is to extend our number system to include new objects like √–2.
This can actually be done—and you should do it! Unfortunately, this still
doesn’t solve our ambiguity problem. We still need to choose the meaning of
a¹/² (if we want it to have meaning) to be one of the square roots of a. Which
one? Well, the usual choice when a is positive is to choose the positive square
root. Thus 4¹/² = 2, not –2. Of course, this is somewhat arbitrary, but at least it
makes a nice consistent pattern.
For the time being, let’s agree that our base will always be positive and that
whenever we need to make a choice, we will choose positive values. So we
will say that a¹/² only has meaning when a is positive, and its meaning is the
(unique) positive square root of a.
Of course, you may find this whole enterprise repulsive and not wish to
make any of these choices. You may see no advantage whatever in writing
things this way. I personally like it because it illustrates the persistence of
pattern. I feel like this is what the pattern wants—to be set free of its shackles.
So let’s keep going.
How should we define something like a³/⁴? Whatever it is, when we raise it
to the fourth power (that is, multiply four copies of it together), we should get
a³. Do you see why? This means that a³/⁴ must be the fourth root of a cubed,
or (a³)¹/⁴. The general pattern is now clear: aᵐ/ⁿ must be an n-th root of aᵐ (and,
of course, we choose it to be the positive one).
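This extended pattern is exactly the convention modern computing languages follow for positive bases; a quick sketch (mine, not the book’s) in Python:

```python
# Python's ** operator already follows the extended pattern for positive bases.
a = 5.0

print(a ** 0.75)         # a^(3/4), via the extended meaning of exponents
print((a ** 3) ** 0.25)  # the positive fourth root of a cubed -- the same number

# The defining pattern a^b * a^c = a^(b+c) survives the extension:
print(a ** 0.5 * a ** 0.25 - a ** 0.75)  # ≈ 0
```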
With exponents extended in this way, the pattern persists:
aᵇ · aᶜ = aᵇ⁺ᶜ.
In particular, whatever the numbers 3^√2 and 3^π are (and they will most
certainly be transcendental if anything) we insist that 3^(√2 + π) be their product.
(Not that we are in a position to insist on anything; we’re simply hoping that
this is possible.)
Now here is Napier’s idea. Suppose we have some number like 32768.
Clearly, this lies between 10⁴ and 10⁵. Napier’s realization was that there
must be some number p between 4 and 5 so that 32768 = 10ᵖ. In other words,
every number is a power of 10. Since powers of 10 are easy to multiply, this
should mean that all numbers are easy to multiply. Of course, the hard part is
to figure out what power of 10 a given number is. So there are really two
problems here. First, is it really true that every number is a power of 10? And
second, how on earth can we hope to calculate such an exponent? These are
pretty serious questions, actually.
On the other hand, for all practical purposes all that is needed are
approximations. Here the subtle mathematical issues disappear. We don’t
need to know if a^b makes sense for irrational numbers b, because every
number is approximately a fraction. For example, if I want to represent a
number like 37 as an approximate power of 10, I only need to find a fraction
m/n so that 10^(m/n) ≈ 37. In other words, 10^m should be roughly 37^n. Let’s look
at some powers of 37 that are reasonably close to powers of 10:

37^2 = 1369 ≈ 10^3,
37^7 = 94931877133 ≈ 10^11.

So 3/2 = 1.5 should be a so-so estimate, and 11/7 ≈ 1.57 a pretty good one.
The point is that we don’t need an exact value for the exponent in order to
navigate a ship or any other mundane purpose like that. If we cared enough,
we could even obtain an extremely accurate estimate like 1.56820. Of course,
it would require an enormous amount of work to obtain such approximations
for every number we might wish to use, but just as for trigonometric tables,
the work would only have to be done once. And this is just what Napier set
out to do.
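If you happen to have a machine handy, this hunt for good fractions can be sketched in a few lines of Python. This is my illustration, certainly not Napier’s procedure; it simply looks, for each n, at whichever power of 10 sits closest to 37^n on a multiplicative scale:

```python
# A brute-force hunt for fractions m/n with 10^(m/n) close to 37 --
# that is, with 10^m close to 37^n. (A modern sketch, not Napier's method.)
estimates = []
for n in range(1, 8):
    power = 37 ** n                 # 37^n, in exact integer arithmetic
    m = len(str(power)) - 1         # the power of 10 just below 37^n
    # pick m or m+1, whichever is closer to 37^n on a multiplicative scale
    if 10 ** (m + 1) / power < power / 10 ** m:
        m += 1
    estimates.append((m, n))

for m, n in estimates:
    print(f"10^{m} is near 37^{n}, so log10(37) is roughly {m}/{n} = {m/n:.4f}")
```

Among the fractions this turns up are the 3/2 (from 37^2 = 1369 ≈ 10^3) and 11/7 (from 37^7 ≈ 10^11) mentioned above.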
For each number N, we are trying to find (at least approximately) a number
p so that N = 10p. Napier called p the logarithm of N (from the Greek logos +
arithmos, meaning “way of reckoning”). So, for example, the logarithm of 37
is about 1.5682. Let’s write L(N) for the logarithm of N. Then a section of
Napier’s table might look something like this:

N      L(N)
2      0.3010
3      0.4771
10     1.0000
37     1.5682
Now, here’s the point. Suppose we wanted to multiply two numbers
together, say the ones we had before: 32768 and 48597. Ordinarily, this would
be an annoying, multistep procedure. But using Napier’s “admirable table of
logarithms,” we can rewrite these numbers (again, approximately) as powers
of 10:
32768 ≈ 10^4.5154,
48597 ≈ 10^4.6866,

so that their product is approximately 10^4.5154 · 10^4.6866 = 10^9.2020.
Consulting the logarithm tables (in reverse), we find that the number whose
logarithm is closest to 9.2020 is 1592208727. This means that the true
product should be pretty close. In fact, 32768 × 48597 = 1592426496, so our
estimate is accurate up to the fourth decimal place. In other words, we are off
by about one part in ten thousand. But the point is that we only had to do
three table look-ups and one addition. So this is a huge time-saver.
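To see the whole scheme in miniature, here is a Python sketch in which the computer’s own log10 stands in for Napier’s printed table (an assumption on my part: a real table would offer only four or five decimal places, which is what the rounding simulates):

```python
import math

# Multiplying with logarithms: three table look-ups and one addition.
# math.log10 plays the role of Napier's printed table, rounded to four places.
x, y = 32768, 48597
lx = round(math.log10(x), 4)   # "look up" L(32768), about 4.5154
ly = round(math.log10(y), 4)   # "look up" L(48597), about 4.6866
s = lx + ly                    # the only arithmetic we do: one addition
estimate = 10 ** s             # reverse look-up: the antilogarithm of 9.2020
exact = x * y
print(estimate, exact)                 # about 1.5922e9 vs 1592426496
print(abs(estimate - exact) / exact)   # off by about one part in ten thousand
```

The relative error comes entirely from the four-decimal-place rounding, just as it would with a real table.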
The skeptical reader may find it improbable that such tables exist going up
to numbers as large as 1592208727, and the skeptical reader would be right—
they don’t. In actual practice, one only requires logarithm tables for numbers
between 1 and 10. Everything else can be obtained by shifting. For example,
if I wanted L(32768), I would actually be content to look up L(3.2768) =
0.5154, and then add 4. This is because multiplication by 10 has the effect of
adding 1 to the exponent, that is, to the logarithm. Similarly, to find the
“antilogarithm” of 9.2020, I would look for 0.2020 in the logarithm column
and see that it corresponds to the number 1.5922 (assuming my tables are
accurate to four decimal places, which is pretty standard). Then I would
multiply this by 109 to get 1592200000, which is pretty much as accurate as
before.
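The shifting trick can be sketched the same way (again with the machine’s log10 standing in for the printed table, an assumption of my own):

```python
import math

# The shifting trick: a table covering only 1 to 10 suffices.
# L(32768) = L(3.2768) + 4, since each factor of 10 adds 1 to the logarithm.
table_entry = round(math.log10(3.2768), 4)   # what the printed table would show
print(table_entry)          # roughly 0.5154
print(table_entry + 4)      # roughly 4.5154, the logarithm of 32768
```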
Of course, the practical use of logarithms for arithmetic computation is
now obsolete, due to the advent of high-speed electronic calculators. In fact,
almost all computation these days is done by machine (as Leibniz himself
predicted). My point in bringing up logarithms was not their computational
utility—now a historical footnote—but to illustrate a particularly curious
example of an unforeseen connection in mathematics: hyperbolic area (the
integral of dx/x) turns out to be related to the behavior of exponents
(logarithms). How strange that a method intended to speed up practical
arithmetic should turn out to be so intimately connected to the classical
measurement of conics! Again, the connection is that in both cases
multiplication is somehow being converted into addition.
29
From a modern perspective, Napier’s logarithm can be viewed as an
isomorphism between two apparently different algebraic structures. On the
one hand, we have the system of positive numbers under multiplication, and
on the other hand, the system consisting of all numbers (positive and
negative) under addition. Napier’s logarithm provides a “dictionary” between
these two worlds: multiplication of positive numbers on one side corresponds to addition of their logarithms on the other. More generally, any function log satisfying

log(xy) = log(x) + log(y)

for all positive numbers x and y qualifies as a logarithm (I’m using the generic
symbol log to represent any such activity). Thus Napier’s function L, the
binary logarithm log_2, and the hyperbolic area function A are all logarithms in
this abstract sense.
Given any such function log, let’s call the reverse process exp (short for
exponentiation). Then
log(exp(x)) = exp(log(x)) = x,
because that is what reverse means. In particular, if you choose log to be the
Napier logarithm, then exp will simply be base 10 exponentiation, exp(x) =
10^x. In general, the function exp inherits the property

exp(x + y) = exp(x) · exp(y).
Now here is an idea that I find very clever and pretty. The property of being
a logarithm implies that

log(x^m) = m log(x).

Do you see why? Applying exp to both sides of this equation, we get x^m =
exp(m log x). This makes good sense for any positive whole number m. But
the right-hand side is in fact meaningful for any number m—rational,
irrational, whatever. So the existence of a logarithm allows us to define what
it means to raise any positive number a to any power b:

a^b = exp(b log a).

All the properties that you want a^b to have follow directly from the
properties of log and exp.
So, given any logarithm (or what I like to think of as a log/exp pair), we get
a corresponding definition of a^b. Luckily, as we will see, it turns out that its
value does not depend on which logarithm we choose.
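If you would like to see this independence numerically, here is a quick Python check using two ready-made log/exp pairs, the natural one and the binary one. It is a spot check at one sample point, not a proof:

```python
import math

# Two different log/exp pairs -- natural (log, e^x) and binary (log2, 2^x).
# The recipe a^b = exp(b * log(a)) should give the same answer with either pair.
def power_via_natural(a, b):
    return math.exp(b * math.log(a))

def power_via_binary(a, b):
    return 2.0 ** (b * math.log2(a))

a, b = 5.0, math.pi
print(power_via_natural(a, b))   # 5^pi via the natural pair
print(power_via_binary(a, b))    # 5^pi via the binary pair
print(a ** b)                    # Python's built-in power, for comparison
```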
Now we have to be a little careful here about circular reasoning. Our
problem with Napier’s logarithm, you may recall, was that we didn’t quite
know what 10^x should mean (at least when x is irrational). Now that we have
a satisfactory definition of a^b, it might seem as though our logarithm
problems are solved. The trouble is that our clever definition of a^b requires
that a well-defined logarithm already be in place. So we can’t then turn
around and use this to define our logarithm. On the face of it, this looks pretty
bad. We seem to need a definition of exponentiation to define a logarithm,
and vice versa.
But wait—our hyperbolic area function is a logarithm! And luckily, it
requires no notion of exponentiation to get it off the ground; it is simply the
collected area under the reciprocal curve. This means that we can base our
entire theory of exponents and logarithms on the integral of dx/x.
So here’s the plan: we will define, once and for all, the natural logarithm
of a positive number x to be A(x), the area under the reciprocal curve from 1
to x. Since this particular logarithm is the only one mathematicians ever use
(and we will shortly see why), we will do it the honor of being written simply
as log x. (This convention varies somewhat, actually. Some people—
scientists, engineers, calculator manufacturers—prefer to use the symbol log
for Napier’s decimal logarithm; others—mostly computer scientists—like to
use it to denote the binary logarithm. The natural logarithm is then given the
unappetizing name ln.)
Now that we have a well-defined logarithm, we likewise define the natural
exponential to be the corresponding exponential function, which we will
write simply as exp. So exp(3), for instance, refers to the number w with A(w)
= 3. We can then define a^b as exp(b log a) without any circularity in our
reasoning. Thus, the number 2^π can now be seen as that number that gives us
π times as much collected area as 2 does.
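The whole plan can even be tested numerically: approximate the collected area under the reciprocal curve by chopping it into thin rectangles, and watch the logarithm property emerge. Here is a Python sketch; the midpoint rule is my choice of approximation, not anything from the text:

```python
def area(x, steps=100000):
    """Approximate A(x), the area under 1/t from t = 1 to t = x,
    by the midpoint rule (a numerical sketch for x >= 1)."""
    h = (x - 1) / steps
    return sum(h / (1 + (i + 0.5) * h) for i in range(steps))

# A behaves like a logarithm: the area for a product is the sum of the areas.
print(area(6))              # collected area out to 6
print(area(2) + area(3))    # same total, since 2 * 3 = 6
```

The two printed numbers agree to many decimal places, which is exactly the property log(ab) = log(a) + log(b) in disguise.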
As bizarre as this sequence of ideas may initially seem, the point is that we
get a precise definition of exponentiation that satisfies all the properties that
we want it to have.
In particular, now that we have a precise notion of what a^b means, it’s not
hard to determine the base a logarithm of a number x to be

log_a(x) = log x / log a.

But who wants some ugly constant like 1/log 10 cluttering up the place? If all
logarithms are more or less equivalent, why not go with the one whose
differential is as nice as possible?
Another way to think of this is to look at the graphs of the various
logarithm functions.
Being proportional, these curves all behave pretty much the same way (in
particular, logarithms are famous for their exceedingly slow growth). But
notice how their tangents at the point x = 1 vary from nearly horizontal to
nearly vertical. The natural logarithm is the one whose slantedness is exactly
halfway between these extremes, making a nice 45-degree angle with the
axes.
So the natural logarithm is the simplest, and it therefore is the only
logarithm that mathematicians ever use. It also deserves its name, since it
arose naturally from our attempt to measure conics, as opposed to making
some arbitrary choice of base. But that raises an interesting question. What is
the base of the natural logarithm?
Since the base of a logarithm is just the number whose logarithm is equal to
1, we are asking how far we have to go along the reciprocal curve to collect
exactly one unit of area.
This number, usually denoted by the letter e (for exponential), stands out
from all other numbers as the most aesthetically pleasing base. So what
number is it? Well, it turns out that e ≈ 2.71828, and I don’t suppose it would
surprise you very much to learn that it is transcendental. (As a matter of fact,
e was the first naturally occurring mathematical constant to be proved
transcendental, by Hermite in 1873.)
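We can even hunt for e numerically: bisect for the point along the reciprocal curve where the collected area reaches exactly 1. This Python sketch uses a midpoint-rule approximation of the area (my choice of method, not a construction from the text):

```python
def area(x, steps=20000):
    # Midpoint-rule area under 1/t from 1 to x (a numerical sketch).
    h = (x - 1) / steps
    return sum(h / (1 + (i + 0.5) * h) for i in range(steps))

# Bisect for the number whose collected area is exactly 1: that number is e.
lo, hi = 2.0, 3.0
for _ in range(30):
    mid = (lo + hi) / 2
    if area(mid) < 1.0:
        lo = mid
    else:
        hi = mid
print(lo)   # converges toward 2.71828...
```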
This means that just as we did for the trigonometric functions and for pi,
we will need to enlarge our language to include log, exp, and e. Isn’t it funny
how every time we run across an interesting number it turns out to be
inexpressible? Maybe numbers like e and π are simply too beautiful to be
captured by something as prosaic as a fraction or an algebraic equation. If e
were rational, for instance, what numerator and denominator could possibly
be good enough? In any case, we have no choice but to simply give names to
these things and then incorporate them into our vocabulary. (In particular, it is
customary to include log and exp in the category of elementary functions.)
Let’s step back a bit and try to figure out what has really happened here.
We began with a problem: What is the integral of dx/x? Did we solve this
problem? In some sense it seems like we cheated—all we did was to name it
log x. (Similarly, we just named the proportion of circumference to diameter
pi and then walked away.) What kind of “solution” is that? Are
mathematicians just a pack of namers and abbreviators?
No. The words and symbols are irrelevant. What matters are the patterns
and our ideas about them. (As Gauss famously quipped, what we need are
notions, not notations.) Maybe we didn’t solve our problem in the sense of
expressing the integral of dx/x in algebraic terms (which we now know to be
impossible), but we did discover that whatever it is (and we may as well call
it log x), it satisfies the surprising and elegant property log(ab) = log(a) +
log(b). If the names and abbreviations help us to understand the pattern, then
they’re worth it. Otherwise, they’re in the way. We should give names to
things only when we need to and in such a way that it helps us to reveal and
to distinguish more clearly the patterns that obtrude themselves upon our
imaginations.
Speaking of which, there is one more pattern that I would like to show you.
We saw how the natural logarithm distinguishes itself from the other
logarithms by having the nicest differential. Shouldn’t the natural exponential
have a similar property? What is the differential of an exponential ax?
Let’s start with the natural exponential exp(x) (which, if you want, you can
also write as e^x). The simplest way to proceed is to give this a name, say y.
Then y = exp(x), so x = log y. Taking differentials, we get dx = dy/y. This
means that dy = y dx. In other words,

d(e^x) = e^x dx.
What a beautiful discovery! The natural exponential has the property that its
derivative is itself. Geometrically, this means that the slantedness of the graph
of y = e^x at a point is always equal to its height.
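This self-derivative property is easy to check on a machine: compare the slope of exp over a tiny interval with its height at the same point. A Python spot check (an illustration, not a proof):

```python
import math

# The slope of y = exp(x) over a tiny interval versus the height of the
# curve at that point -- they should agree at every x.
def slope(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)   # symmetric difference quotient

for x in [0.0, 1.0, 2.5]:
    print(slope(math.exp, x), math.exp(x))   # the two columns agree closely
```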
Finally, to put the last finishing touches on our differential calculus, we can
now extend our formula for d(x^m) to arbitrary exponents m: d(x^m) = m x^(m−1) dx, and consequently

x^m dx = d( x^(m+1) / (m+1) )

for all numbers m, except for m = −1, where the expression on the right
becomes meaningless. In this latter case, as odd as it may seem, the pattern is
broken, and we get, of all things, the natural logarithm.
Show that for any two variables x and y, we have d(x^y) =
y x^(y−1) dx + x^y log x dy.
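For a quick sanity check of the formula in this exercise, here is a numerical comparison at one sample point (Python, my illustration; a spot check, not a derivation):

```python
import math

# Numerically check d(x^y) = y x^(y-1) dx + x^y log(x) dy
# at the sample point x = 2, y = 3, with tiny nudges dx and dy.
x, y = 2.0, 3.0
dx, dy = 1e-6, 1e-6
actual_change = (x + dx) ** (y + dy) - x ** y
predicted_change = y * x ** (y - 1) * dx + x ** y * math.log(x) * dy
print(actual_change, predicted_change)   # agree to many decimal places
```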
30
What a wild and amazing place mathematical reality is! There is just no end
to its mystery and beauty. And there is so much more I want to tell you about
it—so many more delightful and surprising (and scary) discoveries.
Nevertheless, I feel that the time has come for me to put down my pen.
(Perhaps you have had that feeling for quite some time!)
Not that we’ve done much more than scratch the surface. Mathematics is a
vast, ever-expanding jungle, and measurement is only one of its many rivers
(though certainly a major one). But my goal was not to be exhaustive, only
illustrative (and hopefully entertaining). I suppose what I really wanted to do
was to give you a feeling for what it is we mathematicians do and why we do
it.
I especially wanted to get across the idea that mathematics is a
quintessentially human activity—that whatever strange product of
evolutionary biochemistry our minds are, one thing is for sure: we love
patterns. Mathematics is a meeting place for language, pattern, curiosity, and
joy. And it has given me a lifetime of free entertainment.
There is one small issue I feel I should address before I go: reality. Why
haven’t we talked about the real world at all? What about all those wonderful
applications of geometry and analysis to the problems of physics,
engineering, and architecture? What about the motions of the heavenly
spheres, for crying out loud? How can I claim to have written a book about
measurement when I take such a dismissive view of the very reality in which
my brain is located?
Well, first of all, I am me, and I write about what I am interested in, which
is the nature of mathematical reality. What else can anyone do? Second, it’s
not like there is a tremendous shortage of books about the physical universe.
They’re all over the place, and many of them are quite good. I felt the need to
write a book about mathematics, because, quite frankly, there really aren’t
very many. Not many that are honest and personal. Not many that feel like
real books with a point of view. Also, I didn’t want to talk about the
applications of mathematics to the sciences (which are fairly obvious anyway)
because I feel that the value of mathematics lies not in its utility but in the
pleasure it gives.
Which is not to say that reality isn’t interesting and exciting. Don’t get me
wrong, I’m very happy to be here. There are birds and trees and love and
chocolate. I have no complaints about physical reality, only a much deeper
intellectual and aesthetic attraction to pattern in the abstract. Maybe the
bottom line is that I don’t have that much to say about the real world. Maybe
part of it is that I’m not altogether entirely here a lot of the time. Maybe the
point of this book is to give you a glimpse of what it is like to live a
mathematical life—to have the better part of one’s mentality off in an
imaginary world. At any rate, I know that I am by nature permanently isolated
from reality—my brain is alone, receiving only the (possibly illusory) sensory
input that it does—but mathematical reality is me.
Which brings me to you. This mathematical reality we have been talking
about—although it certainly feels like it’s “out there” somewhere—I don’t
want you to feel shy about entering it, as if it were located in some restricted
government facility and being worked on by experts in lab coats.
Mathematical reality is not “theirs”—it’s yours. You have an imaginary
universe in your head whether you like it or not. You can choose to ignore it,
or you can ask questions about it, but you cannot deny that it is very much a
part of you. Which is one of the reasons why mathematics is so compelling:
you are discovering things about yourself and the way your personal mental
constructions behave.
So keep exploring! It doesn’t matter how much experience you have.
Whether you are an expert or a beginner, the feeling is the same. You are
wandering around in the jungle, following one river and then another. The
journey is endless, and the only goal is to explore and have fun. Enjoy!