Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach

John Hamal Hubbard
Barbara Burke Hubbard
Cornell University

Prentice Hall, Upper Saddle River, New Jersey 07458

CONTENTS

PREFACE

CHAPTER 0  Preliminaries
0.0  Introduction
0.1  Reading Mathematics
0.2  How to Negate Mathematical Statements
0.3  Set Theory
0.4  Real Numbers
0.5  Infinite Sets and Russell's Paradox
0.6  Complex Numbers
0.7  Exercises for Chapter 0

CHAPTER 1  Vectors, Matrices, and Derivatives
1.0  Introduction
1.1  Introducing the Actors: Vectors
1.2  Introducing the Actors: Matrices
1.3  A Matrix as a Transformation
1.4  The Geometry of R^n
1.5  Convergence and Limits
1.6  Four Big Theorems
1.7  Differential Calculus
1.8  Rules for Computing Derivatives
1.9  Criteria for Differentiability
1.10 Exercises for Chapter 1

CHAPTER 2  Solving Equations
2.0  Introduction
2.1  The Main Algorithm: Row Reduction
2.2  Solving Equations Using Row Reduction
2.3  Matrix Inverses and Elementary Matrices
2.4  Linear Combinations, Span, and Linear Independence
2.5  Kernels and Images
2.6  Abstract Vector Spaces
2.7  Newton's Method
2.8  Superconvergence
2.9  The Inverse and Implicit Function Theorems
2.10 Exercises for Chapter 2

CHAPTER 3  Higher Partial Derivatives, Quadratic Forms, and Manifolds
3.0  Introduction
3.1  Curves and Surfaces
3.2  Manifolds
3.3  Taylor Polynomials in Several Variables
3.4  Rules for Computing Taylor Polynomials
3.5  Quadratic Forms
3.6  Classifying Critical Points of Functions
3.7  Constrained Critical Points and Lagrange Multipliers
3.8  Geometry of Curves and Surfaces
3.9  Exercises for Chapter 3

CHAPTER 4  Integration
4.0  Introduction
4.1  Defining the Integral
4.2  Probability and Integrals
4.3  What Functions Can Be Integrated?
4.4  Integration and Measure Zero (Optional)
4.5  Fubini's Theorem and Iterated Integrals
4.6  Numerical Methods of Integration
4.7  Other Pavings
4.8  Determinants
4.9  Volumes and Determinants
4.10 The Change of Variables Formula
4.11 Improper Integrals
4.12 Exercises for Chapter 4

CHAPTER 5  Lengths of Curves, Areas of Surfaces, ...
5.0  Introduction
5.1  Parallelograms and their Volumes
5.2  Parametrizations
5.3  Arc Length
5.4  Surface Area
5.5  Volume of Manifolds
5.6  Fractals and Fractional Dimension
5.7  Exercises for Chapter 5

CHAPTER 6  Forms and Vector Calculus
6.0  Introduction
6.1  Forms as Integrands over Oriented Domains
6.2  Forms on R^n
6.3  Integrating Form Fields over Parametrized Domains
6.4  Forms and Vector Calculus
6.5  Orientation and Integration of Form Fields
6.6  Boundary Orientation
6.7  The Exterior Derivative
6.8  The Exterior Derivative in the Language of Vector Calculus
6.9  Generalized Stokes's Theorem
6.10 The Integral Theorems of Vector Calculus
6.11 Potentials
6.12 Exercises for Chapter 6

APPENDIX A: Some Harder Proofs
A.0  Introduction
A.1  Proof of the Chain Rule
A.2  Proof of Kantorovitch's Theorem
A.3  Proof of Lemma 2.8.4 (Superconvergence)
A.4  Proof of Differentiability of the Inverse Function
A.5  Proof of the Implicit Function Theorem
A.6  Proof of Theorem 3.3.9: Equality of Crossed Partials
A.7  Proof of Proposition 3.3.19
A.8  Proof of Rules for Taylor Polynomials
A.9  Taylor's Theorem with Remainder
A.10 Proof of Theorem 3.5.3 (Completing Squares)
A.11 Proof of Propositions 3.8.12 and 3.8.13 (Frenet Formulas)
A.12 Proof of the Central Limit Theorem
A.13 Proof of Fubini's Theorem
A.14 Justifying the Use of Other Pavings
A.15 Existence and Uniqueness of the Determinant
A.16 Rigorous Proof of the Change of Variables Formula
A.17 A Few Extra Results in Topology
A.18 Proof of the Dominated Convergence Theorem
A.19 Justifying the Change of Parametrization
A.20 Proof of Theorem 6.7.3 (Computing the Exterior Derivative)
A.21 The Pullback
A.22 Proof of Stokes' Theorem
A.23 Exercises

APPENDIX B: Programs
B.1  MATLAB Newton Program
B.2  Monte Carlo Program
B.3  Determinant Program

BIBLIOGRAPHY
INDEX

PREFACE

    The numerical interpretation ... is however necessary. ... So long as it is not obtained, the solutions may be said to remain incomplete and useless, and the truth which it is proposed to discover is no less hidden in the formulae of analysis than it was in the physical problem itself. (Joseph Fourier, The Analytic Theory of Heat)

This book covers most of the standard topics in multivariate calculus, and a substantial part of a standard first course in linear algebra. The teacher may find the organization rather less standard.

There are three guiding principles which led to our organizing the material as we did. One is that at this level linear algebra should be more a convenient setting and language for multivariate calculus than a subject in its own right. We begin most chapters with a treatment of a topic in linear algebra and then show how the methods apply to corresponding nonlinear problems. In each chapter, enough linear algebra is developed to provide the tools we need in teaching multivariate calculus (in fact, somewhat more: the spectral theorem for symmetric matrices is proved in Section 3.7). We discuss abstract vector spaces in Section 2.6, but the emphasis is on R^n, as we believe that most students find it easiest to move from the concrete to the abstract.

Another guiding principle is that one should emphasize computationally effective algorithms, and prove theorems by showing that those algorithms really work: to marry theory and applications by using practical algorithms as theoretical tools. We feel this better reflects the way this mathematics is used today, in both applied and pure mathematics. Moreover, it can be done with no loss of rigor.

For linear equations, row reduction (the practical algorithm) is the central tool from which everything else follows, and we use row reduction to prove all the standard results about dimension and rank. For nonlinear equations, the cornerstone is Newton's method, the best and most widely used method for solving nonlinear equations. We use Newton's method both as a computational tool and as the basis for proving the inverse and implicit function theorems, rather than basing those proofs on Picard iteration, which converges too slowly to be of practical interest.
    Jean Dieudonné, for many years a leader of Bourbaki, is the very personification of rigor in mathematics. In his book Infinitesimal Calculus, he put the harder proofs in small type, saying "... a beginner will do well to accept plausible results without taxing his mind with subtle proofs." Following this philosophy, we have put many of the more difficult proofs in the appendix, and feel that for a first course, these proofs should be omitted. Students should learn how to drive before they learn how to take the car apart.

In keeping with our emphasis on computations, we include a section on numerical methods of integration, and we encourage the use of computers both to reduce tedious calculations (row reduction in particular) and as an aid in visualizing curves and surfaces. We have also included a section on probability and integrals, as this seems to us too important a use of integration to be ignored.

A third principle is that differential forms are the right way to approach the various forms of Stokes's theorem. We say this with some trepidation, especially after some of our most distinguished colleagues told us they had never really understood what differential forms were about. We believe that differential forms can be taught to freshmen and sophomores, if forms are presented geometrically, as integrands that take an oriented piece of a curve, surface, or manifold, and return a number. We are aware that students taking courses in other fields need to master the language of vector calculus, and we devote three sections of Chapter 6 to integrating the standard vector calculus into the language of forms. The great conceptual simplification gained by doing electromagnetism in the language of forms is a central motivation for using forms, and we will apply the language of forms to electromagnetism in a subsequent volume.

Although most long proofs have been put in Appendix A, we made an exception for the material in Section 1.6. These theorems in topology are often not taught, but we feel we would be doing the beginning student a disservice not to include them, particularly the mean value theorem and the theorems concerning convergent subsequences in compact sets and the existence of minima and maxima of functions. In our experience, students do not find this material particularly hard, and systematically avoiding it leaves them with an uneasy feeling that the foundations of the subject are shaky.

Different ways to use the book

This book can be used either as a textbook in multivariate calculus or as an accessible textbook for a course in analysis. We see calculus as analogous to learning how to drive, while analysis is analogous to learning how and why a car works. To use this book to "learn how to drive," the proofs in Appendix A should be omitted. To use it to "learn how a car works," the emphasis should be on those proofs. For most students, this will be best attempted when they already have some familiarity with the material in the main text.

Students who have studied first year calculus only

(1) For a one-semester course taken by students who have studied neither linear algebra nor multivariate calculus, we suggest covering only the first four chapters, omitting the sections marked "optional," which, in the analogy of learning to drive rather than learning how a car is built, correspond rather to learning how to drive on ice.
(These sections include the part of Section 2.8 concerning a stronger version of the Kantorovitch theorem, and Section 4.4 on measure 0.) Other topics that can be omitted in a first course include the proof of the fundamental theorem of algebra in Section 1.6, the discussion of criteria for differentiability in Section 1.9, Section 3.2 on manifolds, and Section 3.8 on the geometry of curves and surfaces. (In our experience, beginning students do have trouble with the proof of the fundamental theorem of algebra, while manifolds do not pose much of a problem.)

(2) The entire book could also be used for a full year's course. This could be done at different levels of difficulty, depending on the students' sophistication and the pace of the class. Some students may need to review the material in Sections 0.3 and 0.5; others may be able to include some of the proofs in the appendix, such as those of the central limit theorem and the Kantorovitch theorem.

(3) With a year at one's disposal (and excluding the proofs in the appendix), one could also cover more than the present material, and a second volume is planned, covering applications of differential forms; abstract vector spaces, inner product spaces, and Fourier series; electromagnetism; differential equations; and eigenvalues, eigenvectors, and differential equations.

We favor this third approach; in particular, we feel that the last two topics above are of central importance. Indeed, we feel that three semesters would not be too much to devote to linear algebra, multivariate calculus, differential forms, differential equations, and an introduction to Fourier series and partial differential equations. This is more or less what the engineering and physics departments expect students to learn in second year calculus, although we feel this is unrealistic.

Students who have studied some linear algebra or multivariate calculus

The book can also be used for students who have some exposure to either linear algebra or multivariate calculus, but who are not ready for a course in analysis. We used an earlier version of this text with students who had taken a course in linear algebra, and feel they gained a great deal from seeing how linear algebra and multivariate calculus mesh. Such students could be expected to cover Chapters 1-6, possibly omitting some of the optional material discussed above. For a less fast-paced course, the book could also be covered in an entire year, possibly including some proofs from the appendix.

    We view Chapter 0 primarily as a resource for students, rather than as part of the material to be covered in class. An exception is Section 0.4, which might well be covered in a class on analysis.

    Mathematical notation is not always uniform. For example, |A| can mean the length of a matrix A (the meaning in this book) or it can mean the determinant of A. Different notations for partial derivatives also exist. This should not pose a problem for readers who begin at the beginning and end at the end, but for those who are using only selected chapters it could be confusing. Notations used in the book are listed on the front inside cover, along with an indication of where they are first introduced.

Students ready for a course in analysis

If the book is used as a text for an analysis course, then in one semester one could hope to cover all six chapters and some or most of the proofs in Appendix A.
This could be done at varying levels of difficulty; students might be expected to follow the proofs, for example, or they might be expected to understand them well enough to construct similar proofs. Several exercises in Appendix A and in Section 0.4 are of this nature.

Numbering of theorems, examples, and equations

Theorems, lemmas, propositions, corollaries, and examples share the same numbering system. For example, Proposition 2.3.8 is not the eighth proposition of Section 2.3; it is the eighth numbered item of that section, and the first numbered item following Example 2.3.7. We often refer back to theorems, examples, and so on, and hope this numbering will make them easier to find.

Figures are numbered independently; Figure 3.2.3 is the third figure of Section 3.2. All displayed equations are numbered, with the numbers given at right; Equation 4.2.3 is the third equation of Section 4.2. When an equation is displayed a second time, it keeps its original number, but the number is in parentheses.

We use the symbol △ to mark the end of an example or remark, and the symbol □ to mark the end of a proof.

Exercises

Exercises are given at the end of each chapter, grouped by section. They range from very easy exercises intended to make the student familiar with vocabulary, to quite difficult exercises. The hardest exercises are marked with a star (or, in rare cases, two stars). On occasion, figures and equations are numbered in the exercises; in this case, they are given the number of the exercise to which they pertain.

In addition, there are occasional "mini-exercises" incorporated in the text, with answers given in footnotes. These are straightforward questions containing no tricks or subtleties, and are intended to let the student test his or her understanding (or be reassured that he or she has understood). We hope that even the student who finds them too easy will answer them; working with pen and paper helps vocabulary and techniques sink in.

Web page

Errata will be posted on the web page http://math.cornell.edu/~hubbard/vectorcalculus. The three programs given in Appendix B will also be available there. We plan to expand the web page, making the programs available on more platforms, and adding new programs and examples of their uses. Readers are encouraged to write the authors at [email protected] to signal errors, or to suggest new exercises, which will then be shared with other readers via the web page.

Acknowledgments

Many people contributed to this book. We would in particular like to express our gratitude to Robert Terrell of Cornell University, for his many invaluable suggestions, including ideas for examples and exercises, and advice on notation; Adrien Douady of the University of Paris at Orsay, whose insights shaped our presentation of integrals in Chapter 4; and Régine Douady of the University of Paris-VII, who inspired us to distinguish between points and vectors. We would also like to thank Allen Back, Harriet Hubbard, Peter Papadopol, Birgit Speh, and Vladimir Veselov, for their many contributions.

Cornell undergraduates in Math 221, 223, and 224 showed exemplary patience in dealing with the inevitable shortcomings of an unfinished text in photocopied form. They pointed out numerous errors, and they made the course a pleasure to teach.
We would like to thank in particular Allegra Angus, Daniel Bauer, Vadim Grinshpun, Michael Henderson, Tomohiko Ishigami, Victor Kam, Paul Kautz, Kevin Knox, Mikhail Kobyakov, Jie Li, Surachate Limkumnerd, Mon-Jed Liu, Karl Papadantonakis, Marc Ratkovic, Susan Rushmer, Samuel Scarano, Warren Scott, Timothy Slate, and Chan-ho Suh. Another Cornell student, Jon Rosenberger, produced the Newton program in Appendix B.1. Karl Papadantonakis helped produce the picture used on the cover.

For insights concerning the history of linear algebra, we are indebted to the essay by J.-L. Dorier in L'Enseignement de l'algèbre linéaire en question. Other books that were influential include Infinitesimal Calculus by Jean Dieudonné, Advanced Calculus by Lynn Loomis and Shlomo Sternberg, and Calculus on Manifolds by Michael Spivak.

Ben Salzberg of Blue Sky Research saved us from despair when a new computer refused to print files in Textures. Barbara Beeton of the American Mathematical Society's technical support gave prompt and helpful answers to technical questions.

We would also like to thank George Lobell at Prentice Hall, who encouraged us to write this book; Nicholas Romanelli for his editing and advice; and Gale Epps, as well as the mathematicians who served as reviewers for Prentice-Hall and made many helpful suggestions and criticisms: Robert Boyer, Drexel University; Ashwani Kapila, Rensselaer Polytechnic Institute; Krystyna Kuperberg, Auburn University; Ralph Oberste-Vorth, University of South Florida; and Ernest Stitzinger, North Carolina State University. We are of course responsible for any remaining errors, as well as for all our editorial choices.

We are indebted to our son, Alexander, for his suggestions, for writing numerous solutions, and for writing a program to help with the indexing. We thank our oldest daughter, Eleanor, for the goat figure of Section 3.8, and for earning part of her first-year college tuition by reading through the text and pointing out both places where it was not clear and passages she liked; the first was invaluable for the text, the second for our morale. With her sisters, Judith and Diana, she also put food on the table when we were too busy to do so. Finally, we thank Diana for discovering errors in the page numbers in the table of contents.

John H. Hubbard
Barbara Burke Hubbard
Ithaca, N.Y.
[email protected]

John H. Hubbard is a professor of mathematics at Cornell University and the author of several books on differential equations. His research mainly concerns complex analysis, differential equations, and dynamical systems. He believes that mathematics research and teaching are activities that enrich each other and should not be separated.

Barbara Burke Hubbard is the author of The World According to Wavelets, which was awarded the prix d'Alembert by the French Mathematical Society in 1996.

0  Preliminaries

0.0 INTRODUCTION

    We recommend not spending much time on Chapter 0. In particular, if you are studying multivariate calculus for the first time you should definitely skip certain parts of Section 0.4 (Definition 0.4.4 and Proposition 0.4.6). However, Section 0.4 contains a discussion of sequences and series which you may wish to consult when we come to Section 1.5 about convergence and limits, if you find you don't remember the convergence criteria for sequences and series from first year calculus.

This chapter is intended as a resource, providing some background for those who may need it.
In Section 0.1 we share some guidelines that in our experience make reading mathematics easier, and discuss a few specific issues like sum notation. Section 0.2 analyzes the rather tricky business of negating mathematical statements. (To a mathematician, the statement "All seven-legged alligators are orange with blue spots" is an obviously true statement, not an obviously meaningless one.) Section 0.3 reviews set theory notation; Section 0.4 discusses the real numbers; Section 0.5 discusses countable and uncountable sets and Russell's paradox; and Section 0.6 discusses complex numbers.

0.1 READING MATHEMATICS

    The most efficient logical order for a subject is usually different from the best psychological order in which to learn it. Much mathematical writing is based too closely on the logical order of deduction in a subject, with too many definitions without, or before, the examples which motivate them, and too many answers before, or without, the questions they address. (William Thurston)

Reading mathematics is different from other reading. We think the following guidelines can make it easier. First, keep in mind that there are two parts to understanding a theorem: understanding the statement, and understanding the proof. The first is more important than the second.

What if you don't understand the statement? If there's a symbol in the formula you don't understand, perhaps a δ, look to see whether the next line continues, "where δ is such-and-such." In other words, read the whole sentence before you decide you can't understand it. In this book we have tried to define all terms before giving formulas, but we may not have succeeded everywhere.

If you're still having trouble, skip ahead to examples. This may contradict what you have been told: that mathematics is sequential, and that you must understand each sentence before going on to the next. In reality, although mathematical writing is necessarily sequential, mathematical understanding is not: you (and the experts) never understand perfectly up to some point and not at all beyond. The "beyond," where understanding is only partial, is an essential part of the motivation and the conceptual background of the "here and now." You may often (perhaps usually) find that when you return to something you left half-understood, it will have become clear in the light of the further things you have studied, even though the further things are themselves obscure.

Many students are very uncomfortable in this state of partial understanding, like a beginning rock climber who wants to be in stable equilibrium at all times. To learn effectively one must be willing to leave the cocoon of equilibrium. So if you don't understand something perfectly, go on ahead and then circle back.
In particular, an example will often be easier to follow than a general statement; you can then go back and reconstitute the meaning of the statement in light of the example. Even if you still have trouble with the general statement, you will be ahead of the game if you understand the examples. We feel so strongly about this that we have sometimes flouted mathematical tradition and given examples before the proper definition.

Read with pencil and paper in hand, making up little examples for yourself as you go on.

Some of the difficulty in reading mathematics is notational. A pianist who has to stop and think whether a given note on the staff is A or F will not be able to sight-read a Bach prelude or Schubert sonata. The temptation, when faced with a long, involved equation, may be to give up. You need to take the time to identify the "notes."

    The Greek alphabet. Greek letters that look like Roman letters are not used as mathematical symbols; for example, A is capital alpha, not capital a. The names to learn are alpha, beta, gamma, delta, epsilon, zeta, eta, theta, iota, kappa, lambda, mu, nu, xi, omicron, pi, rho, sigma, tau, upsilon, phi, chi, psi, and omega. The letter χ is pronounced "kye," to rhyme with "sky"; φ, ψ and ξ may rhyme with either "sky" or "tea."

Learn the names of Greek letters: not just the obvious ones like alpha, beta, and pi, but the more obscure psi, xi, tau, omega. The authors know a mathematician who calls all Greek letters "xi" (ξ), except for omega (ω), which he calls "w." This leads to confusion. Learn not just to recognize these letters, but how to pronounce them. Even if you are not reading mathematics out loud, it is hard to think about formulas if ξ, ψ, τ, ω, φ are all "squiggles" to you.

Sum and product notation

Sum notation can be confusing at first; we are accustomed to reading in one dimension, from left to right, but something like

    \sum_{k=1}^{n} a_{i,k} b_{k,j}    (0.1.1)

requires what we might call two-dimensional (or even three-dimensional) thinking. It may help at first to translate a sum into a linear expression:

    \sum_{i=0}^{\infty} 2^i = 2^0 + 2^1 + 2^2 + \cdots    (0.1.2)

    \sum_{k=1}^{n} a_{i,k} b_{k,j} = a_{i,1} b_{1,j} + a_{i,2} b_{2,j} + \cdots + a_{i,n} b_{n,j}.    (0.1.3)

    In Equation 0.1.3, the limits on the Σ say that the sum will have n terms. Since the expression being summed is a_{i,k} b_{k,j}, each of those n terms will have the form a b.

Two Σ placed side by side do not denote the product of two sums; one sum is used to talk about one index, the other about another. The same thing could be written with one Σ, with information about both indices underneath. For example,

    \sum_{i=1}^{3} \sum_{j=2}^{4} (i+j) = \sum_{i=1}^{3} \bigl( (i+2) + (i+3) + (i+4) \bigr)
        = \bigl( (1+2) + (1+3) + (1+4) \bigr) + \bigl( (2+2) + (2+3) + (2+4) \bigr) + \bigl( (3+2) + (3+3) + (3+4) \bigr);    (0.1.4)

this double sum is illustrated in Figure 0.1.1.

    FIGURE 0.1.1. In the double sum of Equation 0.1.4, each sum has three terms, so the double sum has nine terms.

Rules for product notation are analogous to those for sum notation:

    \prod_{i=1}^{n} a_i = a_1 a_2 \cdots a_n;  for example,  \prod_{i=1}^{n} i = n!.

Proofs

We said earlier that it is more important to understand a mathematical statement than to understand its proof. We have put some of the harder proofs in the appendix; these can safely be skipped by a student studying multivariate calculus for the first time. We urge you, however, to read the proofs in the main text. By reading many proofs you will learn what a proof is, so that (for one thing) you will know when you have proved something and when you have not.

    When Jacobi complained that Gauss's proofs appeared unmotivated, Gauss is said to have answered: you build the building and remove the scaffolding. Our sympathy is with Jacobi's reply: he likened Gauss to the fox who erases his tracks in the sand with his tail.

In addition, a good proof doesn't just convince you that something is true; it tells you why it is true. You presumably don't lie awake at night worrying about the truth of the statements in this or any other math textbook. (This is known as "proof by eminent authority"; you assume the authors know what they are talking about.) But reading the proofs will help you understand the material.

If you get discouraged, keep in mind that the content of this book represents a cleaned-up version of many false starts. For example, John Hubbard started by trying to prove Fubini's theorem in the form presented in Section 4.5. When he failed, he realized (something he had known and forgotten) that the statement was in fact false. He then went through a stack of scrap paper before coming up with a correct proof. Other statements in the book represent the efforts of some of the world's best mathematicians over many years.
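Before leaving this section, it may help to see the notation of Equations 0.1.1-0.1.4 in executable form: each Σ or Π is just a loop. The following short Python sketch is purely an illustration we add here (it is not one of the programs of Appendix B, and the variable names are ours).

    # Translate sum and product notation into loops.

    # Equation 0.1.4: sum over i = 1..3 and j = 2..4 of (i + j).
    # range(1, 4) runs i through 1, 2, 3; range(2, 5) runs j through 2, 3, 4.
    double_sum = sum((i + j) for i in range(1, 4) for j in range(2, 5))
    print(double_sum)  # 45: nine terms in all, as in Figure 0.1.1

    # Product notation: the product of i for i = 1..n is n factorial.
    n = 5
    product = 1
    for i in range(1, n + 1):
        product *= i
    print(product)  # 120 = 5!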
0.2 HOW TO NEGATE MATHEMATICAL STATEMENTS

    Statements that to the ordinary mortal are false or meaningless are thus accepted as true by mathematicians; if you object, the mathematician will retort, "find me a counterexample."

Even professional mathematicians have to be careful not to get confused when negating a complicated mathematical statement. The rules to follow are:

(1) The opposite of

    [For all x, P(x) is true]  is  [There exists x for which P(x) is not true].    (0.2.1)

Above, P stands for "property." Symbolically the same sentence is written:

    The opposite of (∀x) P(x) is (∃x | not P(x)).    (0.2.2)

Instead of using the bar | to mean "such that" we could write the last line (∃x)(not P(x)). Sometimes (not in this book) the symbols ¬ and ∼ are used to mean "not."

(2) The opposite of

    [There exists x for which R(x) is true]  is  [For all x, R(x) is not true].    (0.2.3)

Symbolically the same sentence is written:

    The opposite of (∃x)(R(x)) is (∀x) not R(x).    (0.2.4)

These rules may seem reasonable and simple. Clearly the opposite of the (false) statement "All rational numbers equal 1" is the statement "There exists a rational number that does not equal 1."

However, by the same rules, the statement "All seven-legged alligators are orange with blue spots" is true, since if it were false, then there would exist a seven-legged alligator that is not orange with blue spots. The statement "All seven-legged alligators are black with white stripes" is equally true.

In addition, mathematical statements are rarely as simple as "All rational numbers equal 1." Often there are many quantifiers, and even the experts have to watch out. At a lecture attended by one of the authors, it was not clear to the audience in what order the lecturer was taking the quantifiers; when he was forced to write down a precise statement, he discovered that he didn't know what he meant and the lecture fell apart.

Here is an example where the order of quantifiers really counts: in the definitions of continuity and uniform continuity. A function f is continuous if for all x, and for all ε > 0, there exists δ > 0 such that for all y, if |x − y| < δ, then |f(x) − f(y)| < ε. That is, f is continuous if

    (∀x)(∀ε > 0)(∃δ > 0)(∀y)  ( |x − y| < δ  ⟹  |f(x) − f(y)| < ε ),

while f is uniformly continuous if for all ε > 0, there exists δ > 0 such that for all x and all y, if |x − y| < δ, then |f(x) − f(y)| < ε. That is, f is uniformly continuous if

    (∀ε > 0)(∃δ > 0)(∀x)(∀y)  ( |x − y| < δ  ⟹  |f(x) − f(y)| < ε ).

For a uniformly continuous function a single δ works for every x at once; for mere continuity, δ is allowed to depend on x.

0.3 SET THEORY

A ⊃ B means B ⊂ A, as you probably guessed. Expressions are sometimes condensed:

    {x ∈ ℝ | x is a square}  means  {x | x ∈ ℝ and x is a square},    (0.3.2)

i.e., the set of non-negative real numbers.

A slightly more elaborate variation is indexed unions and intersections: if S_a is a collection of sets indexed by a ∈ A, then

    ⋂_{a∈A} S_a denotes the intersection of all the S_a, and
    ⋃_{a∈A} S_a denotes their union.

For instance, if l_n ⊂ ℝ² is the line of equation y = n, then ⋃_{n∈ℤ} l_n is the set of points in the plane whose y-coordinate is an integer.
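Python's built-in sets give a convenient way to experiment with this notation on finite stand-ins for the S_a. The sketch below is only an illustration (the sample sets and names are ours).

    # Indexed unions and intersections, with finite stand-ins for the sets S_a.
    S = {1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3, 4}}  # index set A = {1, 2, 3}

    union = set().union(*S.values())               # the union of all the S_a
    intersection = set.intersection(*S.values())   # the intersection of all the S_a
    print(union)         # {0, 1, 2, 3, 4}
    print(intersection)  # {2}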
We will use exponents to denote multiple products of sets; A × A × ⋯ × A with n terms is denoted Aⁿ: the set of n-tuples of elements of A.

If this is all there is to set theory, what is the fuss about? For one thing, historically, mathematicians apparently did not think in terms of sets, and the introduction of set theory was part of a revolution at the end of the 19th century that included topology and measure theory. We explore another reason in Section 0.5, concerning infinite sets and Russell's paradox.

0.4 REAL NUMBERS

All of calculus, and to a lesser extent linear algebra, is about real numbers. In this introduction, we will present real numbers, and establish some of their most useful properties. Our approach privileges the writing of numbers in base 10; as such it is a bit unnatural, but we hope you will like our real numbers being exactly the numbers you are used to. Also, addition and multiplication will be defined in terms of finite decimals.

    Showing that all such constructions lead to the same numbers is a fastidious exercise, which we will not pursue.

    There are more elegant approaches to defining real numbers: Dedekind cuts, for instance (see, for example, Michael Spivak, Calculus, second edition, 1980, pp. 554-572), or Cauchy sequences of rational numbers; one could also mirror the present approach, writing numbers in any base, for instance 2. Since this section is partially motivated by the treatment of floating-point numbers on computers, base 2 would seem very natural.

    The least upper bound property of the reals is often taken as an axiom; indeed, it characterizes the real numbers, and it sits at the foundation of every theorem in calculus. However, at least with the description below of the reals, it is a theorem, not an axiom. The least upper bound sup X is sometimes denoted l.u.b. X; the notation max X is also sometimes used, but suggests to some people that max X ∈ X.

Numbers and their ordering

By definition, the set of real numbers is the set of infinite decimals: expressions like 2.95765392045756..., preceded by a plus or a minus sign (in practice the plus sign is usually omitted). The number that you usually think of as 3 is the infinite decimal 3.0000..., ending in all zeroes.

The following identification is absolutely vital: a number ending in all 9's is equal to the "rounded up" number ending in all 0's:

    0.34999999... = 0.350000... .    (0.4.1)

Also, +.0000... = −.0000... .

Other than these exceptions, there is only one way of writing a real number. Numbers that start with a + sign, except +0.000..., are positive; those that start with a − sign, except −0.00..., are negative. If x is a real number, then −x has the same string of digits, but with the opposite sign in front. For k > 0, we will denote by [x]_k the truncated finite decimal consisting of all the digits of x before the decimal, and exactly k digits after the decimal. To avoid ambiguity, if x is a real number with two decimal expressions, [x]_k will be the finite decimal built from the infinite decimal ending in 0's; for the number in Equation 0.4.1, [x]_3 = 0.350.

Given any two different numbers x and y, one is always bigger than the other. This is defined as follows: if x is positive and y is non-positive, then x > y. If both are positive, then in their decimal expansions there is a first digit in which they differ; whichever has the larger digit in that position is larger. If both are negative, then x > y if −y > −x.
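The truncation [x]_k is easy to compute with. The following Python sketch is an illustration we add here (the function name truncate is ours); it uses the standard decimal module so that no binary rounding interferes.

    from decimal import Decimal

    def truncate(x, k):
        """[x]_k: keep all digits before the decimal point and exactly k after it."""
        q = Decimal(10) ** (-k)
        return Decimal(x).quantize(q, rounding="ROUND_DOWN")

    print(truncate("2.95765392045756", 3))  # 2.957
    print(truncate("0.350000", 3))          # 0.350, as for the number in Equation 0.4.1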
The least upper bound property

Definition 0.4.1 (Upper bound; least upper bound). A number a is an upper bound for a subset X ⊂ ℝ if for every x ∈ X we have x ≤ a. A least upper bound is an upper bound b such that no smaller number is an upper bound; it is denoted sup X.

Theorem 0.4.2. Every non-empty subset X ⊂ ℝ that has an upper bound has a least upper bound sup X.

    Recall that [a]_j denotes the finite decimal consisting of all the digits of a before the decimal, and j digits after the decimal.

Proof. Choose an upper bound a for X and an element x ∈ X; we will suppose x ≥ 0 (the case x < 0 is slightly different). If x = a, we are done: the least upper bound is a.

If x ≠ a, there is a first j such that the jth digit of x is smaller than the jth digit of a. Consider all the numbers in [x, a] that can be written using only j digits after the decimal, then all zeroes. This is a finite non-empty set (in fact it has at most 10 elements), and [a]_j is one of them. Let b_j be the largest which is not an upper bound. Now consider the set of numbers in [b_j, a] that have only j + 1 digits after the decimal point, then all zeroes. Again this is a finite non-empty set, so you can choose the largest which is not an upper bound; call it b_{j+1}. It should be clear that b_{j+1} is obtained by adding one digit to b_j. Keep going this way, defining numbers b_{j+2}, b_{j+3}, ..., each time adding one digit to the previous number. We can let b be the number whose kth decimal digit is the same as that of b_k; we claim that b = sup X.

Indeed, if there exists y ∈ X with y > b, then there is a first digit k of y which differs from the kth digit of b, and then b_k was not the largest number with k digits which is not an upper bound, since using the kth digit of y would give a bigger one. So b is an upper bound. Now suppose that b′ < b is also an upper bound. Again there is a first digit k of b which is different from that of b′. This contradicts the fact that b_k was not an upper bound, since then b_k > b′. □

    We use the symbol □ to mark the end of a proof, and the symbol △ to denote the end of an example or a remark.

Arithmetic of real numbers

The next task is to make arithmetic work for the reals: defining addition, multiplication, subtraction, and division, and showing that the usual rules of arithmetic hold. This is harder than one might think: addition and multiplication always start at the right, and for reals there is no right.

    Because you learned to add, subtract, divide, and multiply in elementary school, the algorithms used may seem obvious. But understanding how computers simulate real numbers is not nearly as routine as you might imagine. A real number involves an infinite amount of information, and computers cannot handle such things: they compute with finite decimals. This inevitably involves rounding off, and writing arithmetic subroutines that minimize round-off errors is a whole art in itself. In particular, computer addition and multiplication are not commutative or associative. Anyone who really wants to understand numerical problems has to take a serious interest in "computer arithmetic."

The underlying idea is to show that if you take two reals, truncate (cut) them further and further to the right, add them (or multiply them, or subtract them, etc.), and look only at the digits to the left of any fixed position, the digits we see will not be affected by where the truncation takes place, once it is well beyond where we are looking. The problem with this is that it isn't quite true.

Example 0.4.3 (Addition). Consider adding the following two numbers:

    .222222...222...
    .777777...778...

The sum of the truncated numbers will be .9999...9 if we truncate before the position of the 8, and 1.0000...0 if we truncate after the 8. So there cannot be any rule which says: the 100th digit will stay the same if you truncate after the Nth digit, however large N is. The carry can come from arbitrarily far to the right. △
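This instability is easy to see in an experiment. The Python sketch below (an illustration we add, not part of the book; it places the 8 in the eighth decimal position, a choice that is ours) adds longer and longer truncations of the two numbers of Example 0.4.3, using exact Decimal arithmetic.

    from decimal import Decimal

    # Example 0.4.3: x = .2222...2... and y = .7777...78..., with the 8 in position 8.
    x = Decimal("0.22222222")
    y = Decimal("0.77777778")

    for k in (4, 6, 8):  # truncate both numbers to k digits, then add
        xk = str(x)[:2 + k]   # "0." plus the first k digits
        yk = str(y)[:2 + k]
        print(k, Decimal(xk) + Decimal(yk))
    # 4 0.9999        <- truncated before the 8
    # 6 0.999999
    # 8 1.00000000    <- truncating after the 8 lets the carry propagate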
If you insist on defining everything in terms of digits, it can be done but is quite involved: even showing that addition is associative involves many different cases, and although none is hard, keeping straight what you are doing is quite delicate. Exercise 0.4.1 should give you enough of a taste of this approach. Proposition 0.4.6 allows a general treatment; the development is quite abstract, and you should definitely not think you need to understand this in order to proceed.

Let us denote by 𝔻 the set of finite decimals.

    𝔻 stands for "finite decimal." We use A for addition, M for multiplication, and S for subtraction; the function Assoc is needed to prove associativity of addition.

Definition 0.4.4 (Finite decimal continuity). A mapping f : 𝔻ⁿ → 𝔻 will be called finite decimal continuous (𝔻-continuous) if for all integers N and k, there exists l such that if (x₁, ..., xₙ) and (y₁, ..., yₙ) are two elements of 𝔻ⁿ with all |xᵢ|, |yᵢ| < N, and if |xᵢ − yᵢ| < 10^{−l} for all i = 1, ..., n, then

    |f(x₁, ..., xₙ) − f(y₁, ..., yₙ)| < 10^{−k}.    (0.4.2)

Exercise 0.4.3 asks you to show that the functions A(x, y) = x + y, M(x, y) = xy, S(x, y) = x − y, and Assoc(x, y, z) = (x + y) + z are 𝔻-continuous, and that 1/x is not.

To see why Definition 0.4.4 is the right definition, we need to define what it means for two points x, y ∈ ℝⁿ to be close.

    Since we do not yet have a notion of subtraction of reals, we cannot simply write |x − y| < 10^{−k}; the definition of k-close uses only subtraction of finite decimals.

Definition 0.4.5 (k-close). Two points x, y ∈ ℝⁿ are k-close if for each i = 1, ..., n, we have |[xᵢ]_k − [yᵢ]_k| ≤ 10^{−k}.

    The notion of k-close is the correct way of saying that two numbers agree to k digits after the decimal point. It takes into account the convention by which a number ending in all 9's is equal to the rounded up number ending in all 0's: the numbers .9998 and 1.0001 are 3-close.

Notice that if two numbers are k-close for all k, then they are equal (see Exercise 0.4.2).

If f : 𝔻ⁿ → 𝔻 is 𝔻-continuous, then define f̃ : ℝⁿ → ℝ by the formula

    f̃(x) = sup_k inf_{l≥k} f([x₁]_l, ..., [xₙ]_l).    (0.4.3)

Proposition 0.4.6. The function f̃ : ℝⁿ → ℝ is the unique function that coincides with f on 𝔻ⁿ and which satisfies the following continuity condition: for all k ∈ ℕ and all N ∈ ℕ, there exists l ∈ ℕ such that when x, y ∈ ℝⁿ are l-close and all coordinates xᵢ of x satisfy |xᵢ| < N, then f̃(x) and f̃(y) are k-close.

The proof of Proposition 0.4.6 is the object of Exercise 0.4.4.

    The functions Ã and M̃ satisfy the conditions of Proposition 0.4.6; thus they apply to the real numbers, while A and M without tildes apply to finite decimals.

With this proposition, setting up arithmetic for the reals is plain sailing. Consider the 𝔻-continuous functions A(x, y) = x + y and M(x, y) = xy; then we define addition and multiplication of reals by setting

    x + y = Ã(x, y)  and  xy = M̃(x, y).    (0.4.4)
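The notion of k-close is also easy to experiment with. The sketch below is only an illustration (the helper names truncate and k_close are ours); it reuses the truncation helper from the earlier sketch and checks, for instance, that .9998 and 1.0001 are 3-close, as claimed in the remark above.

    from decimal import Decimal

    def truncate(x, k):
        """[x]_k: truncate x to k digits after the decimal point."""
        return Decimal(x).quantize(Decimal(10) ** -k, rounding="ROUND_DOWN")

    def k_close(x, y, k):
        """Definition 0.4.5 in one dimension: |[x]_k - [y]_k| <= 10^-k."""
        return abs(truncate(x, k) - truncate(y, k)) <= Decimal(10) ** -k

    print(k_close("0.9998", "1.0001", 3))  # True:  |0.999 - 1.000| = 10^-3
    print(k_close("0.9998", "1.0001", 4))  # False: |0.9998 - 1.0001| = 3 * 10^-4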
It is not hard to show that the basic laws of arithmetic hold:

    x + y = y + x                  Addition is commutative.
    (x + y) + z = x + (y + z)      Addition is associative.
    x + (−x) = 0                   Existence of additive inverse.
    xy = yx                        Multiplication is commutative.
    (xy)z = x(yz)                  Multiplication is associative.
    x(y + z) = xy + xz             Multiplication is distributive over addition.

These are all proved the same way; let us prove the last. Consider the function F : 𝔻³ → 𝔻 given by

    F(x, y, z) = M(x, A(y, z)) − A(M(x, y), M(x, z)).    (0.4.5)

We leave it to you to check that F is 𝔻-continuous, and that

    F̃(x, y, z) = M̃(x, Ã(y, z)) − Ã(M̃(x, y), M̃(x, z)).    (0.4.6)

But F is identically 0 on 𝔻³, and the identically 0 function on ℝ³ is a function which coincides with 0 on 𝔻³ and satisfies the continuity condition of Proposition 0.4.6, so F̃ vanishes identically by the uniqueness part of Proposition 0.4.6. That is what was to be proved. □

This sets up almost all of arithmetic; the missing piece is division. Exercise 0.4.5 asks you to define division in the reals.

    It is one of the basic irritants of elementary school math that division is not defined in the world of finite decimals.

Sequences and series

A sequence is an infinite list (of numbers or vectors or matrices ...).

Definition 0.4.7 (Convergent sequence). A sequence aₙ of real numbers is said to converge to the limit a if for all ε > 0, there exists N such that for all n > N, we have |a − aₙ| < ε.

    All of calculus is based on this definition, and on the closely related definition of limits of functions.

Many important sequences appear as partial sums of series. A series is a sequence where the terms are to be added. If a₁, a₂, ... is a series of numbers, then the associated sequence of partial sums is the sequence s₁, s₂, ..., where

    sₙ = \sum_{m=1}^{n} aₘ.    (0.4.7)

For example, if a₁ = 1, a₂ = 2, a₃ = 3, and so on, then s₄ = 1 + 2 + 3 + 4.

Definition 0.4.8 (Convergent series). If the sequence of partial sums of a series has a limit S, we say that the series converges, and its limit is

    S = \sum_{m=1}^{\infty} aₘ.    (0.4.8)

    If a series converges, then its terms, viewed as a sequence, must converge to 0. The converse is not true. For example, the harmonic series 1 + 1/2 + 1/3 + ⋯ does not converge, although the terms tend to 0.

Example 0.4.9 (Geometric series). If |r| < 1, then

    \sum_{n=0}^{\infty} a rⁿ = \frac{a}{1 − r}.    (0.4.9)

    Example of a geometric series: 2 + 2(0.1) + 2(0.1)² + ⋯ = 2/(1 − 0.1) = 20/9.

Indeed, the following subtraction shows that Sₙ(1 − r) = a − a r^{n+1}:

     Sₙ = a + ar + ar² + ar³ + ⋯ + arⁿ
    rSₙ =     ar + ar² + ar³ + ⋯ + arⁿ + ar^{n+1}
    ---------------------------------------------
    Sₙ(1 − r) = a − a r^{n+1}.

But lim_{n→∞} a r^{n+1} = 0 when |r| < 1, so we can forget about the −a r^{n+1}: as n → ∞, we have Sₙ → a/(1 − r). △

Proving convergence

The weakness of the definition of a convergent sequence is that it involves the limit value. At first, it is hard to see how you will ever be able to prove that a sequence has a limit if you don't know the limit ahead of time.

    It is hard to overstate the importance of this problem: proving that a limit exists without knowing it ahead of time has been central to the history of mathematics, and remains a critical dividing point between first year calculus and multivariate calculus, and more generally, between elementary and advanced mathematics.

The first result along these lines is the following theorem.

Theorem 0.4.10. A non-decreasing sequence aₙ converges if and only if it is bounded.

Proof. Since the sequence aₙ is bounded, it has a least upper bound A. We claim that A is the limit. This means that for any ε > 0, there exists N such that if n > N, then |aₙ − A| < ε. Choose ε > 0; if A − aₙ ≥ ε for all n, then A − ε is an upper bound for the sequence, contradicting the definition of A. So there is a first N with A − a_N < ε, and it will do, since when n > N, we must have A − aₙ ≤ A − a_N < ε. □
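Example 0.4.9 and Theorem 0.4.10 can both be watched in action: for a > 0 and 0 < r < 1, the partial sums Sₙ of the geometric series form a bounded, non-decreasing sequence approaching a/(1 − r). A minimal Python sketch, added here purely for illustration (the sample values a = 2, r = 0.1 are ours, matching the margin example):

    # Partial sums of a + ar + ar^2 + ... with a = 2, r = 0.1:
    # a bounded, non-decreasing sequence converging to a/(1-r) = 20/9 = 2.222...
    a, r = 2.0, 0.1
    limit = a / (1 - r)

    s = 0.0
    for n in range(8):
        s += a * r**n
        print(n, s, limit - s)  # the remaining tail a*r^(n+1)/(1-r) shrinks to 0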
Another key consequence of the least upper bound property is the intermediate value theorem.

Theorem (Intermediate value theorem). If f : [a, b] → ℝ is a continuous function with f(a) < 0 and f(b) > 0, then there exists c ∈ [a, b] such that f(c) = 0.

Proof. Let X be the set of x ∈ [a, b] such that f(x) < 0. Note that X is non-empty (a is in it) and it has an upper bound, namely b, so that it has a least upper bound, which we call c. We claim f(c) = 0.

Since f is continuous, for any ε > 0 there exists δ > 0 such that when |x − c| < δ, then |f(x) − f(c)| < ε. Therefore, if f(c) > 0, we can set ε = f(c), and there exists δ > 0 such that if |x − c| < δ, then |f(x) − f(c)| < f(c). In particular, we see that if x > c − δ/2, then f(x) > 0, so c − δ/2 is also an upper bound for X, which is a contradiction. If f(c) < 0, a similar argument shows that there exists δ > 0 such that f(c + δ/2) < 0, contradicting the assumption that c is an upper bound for X. The only choice left is f(c) = 0. □
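The proof above is not far from an algorithm: repeatedly bisecting [a, b] and keeping the half on which the function changes sign traps a root. The Python sketch below is an illustration we add here (bisect is our function name, not Python's bisect module); it is applied to x³ − 3x + 1, a polynomial that reappears in Example 0.6.7.

    def bisect(f, a, b, steps=40):
        """Find c in [a, b] with f(c) = 0, assuming f(a) <= 0 <= f(b)."""
        for _ in range(steps):
            c = (a + b) / 2
            if f(c) <= 0:
                a = c  # the root is trapped in [c, b]
            else:
                b = c  # the root is trapped in [a, c]
        return (a + b) / 2

    f = lambda x: x**3 - 3*x + 1
    print(bisect(f, 1.0, 2.0))  # 1.5320888862..., a root of x^3 - 3x + 1
                                # (compare Example 0.6.7)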
Russell’s paradox Soon after Cantor published his work on set theory, Bertrand Russell (1872- 1970) wrote him a letter containing the following argument: ‘This paradox has a long his- tory, in various guises: the Greeks ‘knew it as the paradox of the bar- ber, who lived on the island of Mi- los, and decided to shave all the men of the island who did not shave themselves. Does the bar- ber shave himself? 14 Chapter 0. Preliminaries Consider the set X of all sets that do not contain themselves. If X € X, then X does contain itself, so X ¢ X. But if X ¢ X, then X is a set which does not contain itself, so X € X. Russell's paradox was (and remains) extremely perplexing: Cantor's reaction was to answer that Russell had completely destroyed his work, showing that there is an inconsistency in set theory right at the foundation of the subject. History has been kinder, but Russell's paradox has never been quite “resolved.” The “solution,” such as it is, is to say that the naive idea that any property defines a set is untenable, and that sets must be built up, allowing you to take subsets, unions, products, ... of sets already defined; moreover, to make the theory interesting, you must assume the existence of an infinite set. Set theory (still an active subject of research) consists of describing exactly the allowed construction procedures, and seeing what consequences can be derived. 0.6 CoMPLEX NUMBERS ‘Complex numbers (long consid- ‘ered “impossible” numbers) were first used in 16th century Italy, ‘8 0 crutch that made it posi- ble to find real roots of real cubic polynomials. But they turned out to have immense significance in many fields of mathematics, lead- John Stillwell to write in his ‘Mathematics and Its History that “this resolution of the paradox of v=1 was 80 powerful, unexpected and beautiful that only the word ‘miracle’ seems adequate to de- scribe it.” Complex numbers are written a+ bi, where a and 6 are real numbers, and addition and multiplication are defined in Equations 0.6.1 and 0.6.2. It follows from those rules that i= V=T. The complex number a + ib is often plotted as the point ($) € R2. The real number a is called the real part of a+ ib, denoted Re (a+ ib), and the real number 6 is called the imaginary part, denoted Im (a+ ib). The reals R can be considered as a subset of the complex numbers C, by identifying @ € R with a +40 € C; such complex numbers are called “real,” as you might imagine. Real numbers are systematically identified with the real complex numbers, and @ + i0 is usually denoted simply a. Numbers of the form 0 + ib are called purely imaginary. What complex numbers, if any, are both real and purely imaginary?! If we plot a + ib as the point ($) © R?, what do the purely real numbers correspond to? The purely imaginary numbers?? Arithmetic in C Complex numbers are added in the obvious way: (a1 + ibs) + (az + tbe) = (a1 + a9) + i(b, + by). 0.6.1 ‘Thus the identification with R? preserves the operation of addition. iThe only complex number which is both real and purely imaginary is 0 = 0 + 0. ?The purely real numbers are all found on the z-axis, the purely imaginary numbers on the y-axis, Equation 0.6.2 is not the only definition of multiplication one can imagine. For instance, we could define (ai +ib1)+(a2-+iba) = (a1a2) + i(bib2). But in that case, there would be lots of elements by which one could not divide, since the product of any purely real number and any purely imag- inary number would be 0: (0+ iba) =0. 
If the product of any two non-zero numbers a and 8 is 0: af = 0, then division by either is impossi- ble; if we try to divide by a, we arrive at the contradiction = 0: (ay +30) * Bq = 82 OL ele ry aire ‘These four properties, concern- ing addition, don’t depend on the special nature of complex num- bers; we can simnilarly define addi- tion for n-tuples of real numbers, and these rules will still be true The multiplication in these five properties is of conrse the special multiplication of complex num- bers, defined in Equation 0.6.2. Multiplication can only be defined for pairs of real numbers. If we were to define a new kind of num- ber as the 3-tuple (a,b,¢) there would be no way to multiply two such 3-tuples that satisfies these five requirements. ‘There is a way to define mul- tiplication for 4-tuples that fies all but commutativity, called Hamilton's quaternions. 0.6 Complex Numbers 15 What makes C interesting is that complex numbers can also be inultiplied: (a + ibj)(aq + tba) = (aa2 ~ brbp) + é(arbe + abr). 0.6.2 This formula cousists of multiplying a) + ib, and az + iby (treating i like the variable z of a polynomial) to find (a1 + ibs)(a2 + tba) = ayaz + i(arby + aabr) + i7(brba) 0.6.3 and then setting #7 -1 Example 0.6.1 (Multiplying complex numbers). (a) 2+ i)(1- 81) = (243) +i(1-6) =5-5i (6) (IF)? = 21. A 064 Addition and multiplication of reals viewed as complex numbers coincides with ordinary addition and multiplication: (a +10) + (b+ i0)=(a+d)+i0 (a + 10)(6+ 10) = Exercise 0.6.1 asks you to check the following nine rules, for 21,22 € (ab) +10. 0.6.5 21+(22+25) Addition is associative. Addition is commutative. 0 (Le, the complex number 0 + 0i) is an additive identity. (-a ~ ib) is the additive inverse of ati, 1) (21+22)+23 = Qateenta (3) 2+0=2 (4) (a+ ib) + (-a ib) = Multiplication is associative. Multiplication is commutative. 1 (ie., the complex number 1 + 0i) is a multiplicative identity. If z £0, then z has a multiplicative inverse. Multiplication is distributive over addition. (5) (2122)23 = 21(2223) (6) 2122 = 22% (Diz= (8) (0+ ib) (arp - igh) = 1 (9) 21(22 + 23) = 122 + 2123 The complex conjugate Definition 0.6.2 (Complex conjugate). The complex conjugate of the complex number z =a + ib is the number 7 = a Complex conjugation preserves all of arithmetic: Z¥wW=2+ and zw= 066 Ficure 0.6.1 ‘When multiplying two complex numbers, the absolute values are multiplied and the arguments (po- lar angles) are added. 16 Chapter 0. Preliminaries ‘The real numbers are the complex numbers z which are equal to their complex conjugates: 7 = z, and the purely imaginary complex numbers are those which are the opposites of their complex conjugates: 7 = There is a very useful way of writing the length of a complex number in terms of complex conjugates: If z = a + ib, then 2 =a? + 6?. The number k= Va eP = VB 06.7 is called the absolute value (or the modulus) of z. Clearly, |a +-ib| is the distance from the origin to ( 4): Complex numbers in polar coordinates Let z = a+ib # 0 be a complex number. Then the point (3) can be represented in polar coordinates as (7D) where rsind ra Vere =l2I, 0.6.8 and @ is an angle such that cos@= 5 and sind = 5 0.6.9 so that 2 = r(cos0 + ising). 0.6.10 The polar angle 8, called the argument of z, is determined by Equation 0.6.9 up to addition of a multiple of 2n. ‘The marvelous thing about this polar representation is that it gives a geo- metric representation of multiplication, as shown in Figure 0.6.1. 
Proposition 0.6.3 (Geometrical representation of multiplication of complex numbers). The modulus of the product 212 is the product of the moduli |2,| za]. ‘The polar angle of the product is the sum of the polar angies 64, 63: (¢x(c008, +i8in0))(ra(c006 +éaine)) = ira (con(Os +65) +isin(®, +6). Proof. Multiply out, and apply the addition rules of trigonometry: c0s(01 + 62) = co8 8; cos 42 — sin 6, sin 4 sin(@, + 62) = sin® cos 62 + cos, siné2. O 0.6.11 ‘The following formula, known as de Moivre's formula, follows immediately: uy’ uy e Me FIGURE 0.6.2. The fifth roots of form a reg- ular pentagon, with one vertex at polar angle 6/5, and the others ro- tated from that one by miultiples of 2n/5. Immense psychological difficul- ties had to he overcome before complex numbers were accepted 1s an integral part of mathemat- jes; when Gauss came up with proof of the fundamental the- orem of algebra, complex num- bers were still not sufficiently re- spectable that he could use them in his statement of the theorem {although the proof depends on them). 0.6 Complex Numbers 17 Corollary 0.6.4 (De Moivre’s formula). If z = r(cos@ + isin@), then 2" = r"(cosnd + isinnd). 0.6.12 De Moivre’s formula itself has a very important cousequence, showing that in the process of adding a square rout of ~1 to the real immbers, we have actually added all the roots of complex numbers one might hope for. Proposition 0.6.5. Every complex number z= r(cos@ + isin8) with r #0 has n distinct complex nth roots, which are the numbers 2 (out Ey in), bao 0+ 2kn 0.6.13. tisin Note that r!/" stands for the positive real nth root of the positive number r, Figure 0.6.2 illustrates Proposition 0.6.5 for n = 5. Proof. All that needs to be checked is that (1) (r'/")" =r, which is trne by definition; (2) con HEF = cose and sinn 222k" sing, 06.14 which is true since n2#24% = @ + 2km, and sin aud cos are periodic with period 2: and (3) The numbers in Equation 0.6.13 are distinct. which is true since the polar angles do not differ by a multiple of 2n. 1 A great deal more is true: all polynomial equations with complex coefficients have all the roots one might hope for. This is the content of the fundamen- tal theorem of algebra, Theorem 1.6.10, proved by d'Alembert in 1746 and by Gauss around 1799. This milestone of mathematics followed by some 200 years the first introduction of complex numbers, about 1560, by several Italian math- ematicians who were trying to solve cubic equations. Their work represented the rebirth of mathematics in Europe after a long sleep, of over 15 centuri Historical background: solving the cubic equation ‘We will show that a cubic equation can be solved using formulas analogous to the formula b+ VP Fac 2a for the quadratic equation az? + br +. =0. 0.6.15 Here we see something bizarre: in Example 0.6.6, the polynomi has only one real root and we can find it using only real numbers, but in Example 0.6.7 there are three real roots, and we can't find any of them using only real num- bers. We will see below that it is always true that when Cardano’s formula is used, then if a real poly- nomial has one teal root, we can always find it using only real num- bers, but if it has three real roots, we never can find any of them us- ing real numbers. 18 Chapter 0. Preliminaries Let us start with two examples; the explanation of the tricks will follow. Example 0.6.6 (Solving a cubic equation). Let us solve the equation +n 0. First substitute z = u — 1/3u, to get 1 1 1 -= Sorter a tl=0. 
After simplification and multiplication by u³ this becomes

    u⁶ + u³ − 1/27 = 0.    0.6.17

This is a quadratic equation for u³, which can be solved by formula 0.6.15, to yield

    u³ = (1/2)(−1 ± √(1 + 4/27)) ≈ 0.0358…, −1.0358….    0.6.18

Both of these numbers have real cube roots: approximately u₁ ≈ 0.3295 and u₂ ≈ −1.0118. This allows us to find x = u − 1/(3u):

    x = u₁ − 1/(3u₁) = u₂ − 1/(3u₂) ≈ −0.6823.  △    0.6.19

Example 0.6.7. Let us solve the equation x³ − 3x + 1 = 0. As we will explain below, the right substitution to make in this case is x = u + 1/u, which leads to

    (u + 1/u)³ − 3(u + 1/u) + 1 = 0.    0.6.20

After multiplying out, canceling and multiplying by u³, this gives the quadratic equation

    u⁶ + u³ + 1 = 0  with solutions  v₁,₂ = (1/2)(−1 ± i√3) = cos(2π/3) ± i sin(2π/3).    0.6.21

The cube roots of v₁ (the solution with positive imaginary part) are

    cos(2π/9) + i sin(2π/9),  cos(8π/9) + i sin(8π/9),  cos(14π/9) + i sin(14π/9).    0.6.22

In all three cases, we have 1/u = ū, so that u + 1/u = 2 Re u, leading to the three roots

    x₁ = 2 cos(2π/9) ≈ 1.532089,  x₂ = 2 cos(8π/9) ≈ −1.879385,  x₃ = 2 cos(14π/9) ≈ 0.347296.  △    0.6.23

The substitutions x = u − 1/(3u) in Example 0.6.6 and x = u + 1/u in Example 0.6.7 were special cases.

Eliminating the term in x² means changing the roots so that their sum is 0: if the roots of a cubic polynomial are a₁, a₂, and a₃, then we can write the polynomial

    p = (x − a₁)(x − a₂)(x − a₃) = x³ − (a₁ + a₂ + a₃)x² + (a₁a₂ + a₁a₃ + a₂a₃)x − a₁a₂a₃.

Thus eliminating the term in x² means that a₁ + a₂ + a₃ = 0. We will use this to prove Proposition 0.6.9.

Derivation of Cardano's formulas

If we start with the equation x³ + ax² + bx + c = 0, we can eliminate the term in x² by setting x = y − a/3: the equation becomes

    y³ + py + q = 0,  where  p = b − a²/3  and  q = c − ab/3 + 2a³/27.    0.6.24

Now set y = u − p/(3u); the equation y³ + py + q = 0 then becomes

    u⁶ + qu³ − p³/27 = 0,    0.6.25

which is a quadratic equation for u³.

Let v₁ and v₂ be the two solutions of the quadratic equation v² + qv − p³/27 = 0, and let u_{i,1}, u_{i,2}, u_{i,3} be the three cube roots of v_i, for i = 1, 2. We now have apparently six roots for the equation y³ + py + q = 0: the numbers

    y_{i,j} = u_{i,j} − p/(3 u_{i,j}),  i = 1, 2;  j = 1, 2, 3.    0.6.26

Exercise 0.6.2 asks you to show that −p/(3 u_{1,j}) is a cube root of v₂, and that we can renumber the cube roots of v₂ so that −p/(3 u_{1,j}) = u_{2,j}. If that is done, we find that y_{1,j} = y_{2,j} for j = 1, 2, 3; this explains why the apparently six roots are really only three.

The discriminant of the cubic

Definition 0.6.8 (Discriminant of cubic equation). The number Δ = 27q² + 4p³ is called the discriminant of the cubic equation x³ + px + q.

Proposition 0.6.9. The discriminant Δ vanishes exactly when x³ + px + q = 0 has a double root.

Proof. If there is a double root, then the roots are necessarily {a, a, −2a} for some number a, since the sum of the roots is 0. Multiply out

    (x − a)²(x + 2a) = x³ − 3a²x + 2a³,

so p = −3a² and q = 2a³, and indeed 4p³ + 27q² = −4 · 27a⁶ + 4 · 27a⁶ = 0.

Now we need to show that if the discriminant is 0, the polynomial has a double root. Suppose Δ = 0, and call a the square root of −p/3 such that 2a³ = q; such a square root exists since 4a⁶ = 4(−p/3)³ = −4p³/27 = q². Now multiply out

    (x − a)²(x + 2a) = x³ + x(−4a² + a²) + 2a³ = x³ + px + q,

and we see that a is a double root of our cubic polynomial.  □

FIGURE 0.6.3. The graphs of three cubic polynomials. The polynomial at the top has three roots. As it is varied, the two roots to the left coalesce to give a double root, as shown by the middle figure. If the polynomial is varied a bit further, the double root vanishes (actually becoming a pair of complex conjugate roots).

Thus indeed, if a real cubic polynomial has three real roots, and you want to find them by Cardano's formula, you must use complex numbers, even though both the problem and the result involve only reals.
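As a numerical illustration (not part of the original text), here is a minimal Python sketch of the derivation just given, using cmath; the function name cardano is ours. It also prints the discriminant of Definition 0.6.8, whose sign anticipates Proposition 0.6.10 below.

    import cmath

    def cardano(p, q):
        """Roots of x^3 + p x + q = 0 via the substitution x = u - p/(3u).

        Following Equations 0.6.24-0.6.26: solve v^2 + q v - p^3/27 = 0
        for v = u^3, take the three cube roots of one solution (Prop. 0.6.5),
        and form x = u - p/(3u).  Assumes v != 0; real roots come out with
        tiny imaginary parts from rounding.
        """
        disc = 27 * q**2 + 4 * p**3            # discriminant (Definition 0.6.8)
        v = (-q + cmath.sqrt(q**2 + 4 * p**3 / 27)) / 2
        r, theta = cmath.polar(v)
        roots = []
        for k in range(3):                      # the three cube roots of v
            u = r**(1/3) * cmath.exp(1j * (theta + 2 * k * cmath.pi) / 3)
            roots.append(u - p / (3 * u))
        return disc, roots

    print(cardano(1, 1))    # Example 0.6.6: disc 31 > 0, one real root ~ -0.6823
    print(cardano(-3, 1))   # Example 0.6.7: disc -81 < 0, roots ~ 1.5321, -1.8794, 0.3473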
Cardano's formula for real polynomials

Suppose p, q are real. Figure 0.6.3 should explain why equations with double roots are the boundary between equations with one real root and equations with three real roots.

Proposition 0.6.10 (Number of real roots of a cubic polynomial). The real cubic polynomial x³ + px + q has three real roots if the discriminant 27q² + 4p³ < 0, and one real root if 27q² + 4p³ > 0.

Proof. If the polynomial has three real roots, then it has a positive maximum at −√(−p/3), and a negative minimum at √(−p/3). In particular, p must be negative. Thus we must have

    ((−√(−p/3))³ + p(−√(−p/3)) + q)((√(−p/3))³ + p(√(−p/3)) + q) < 0.    0.6.27

After a bit of computation, this becomes the result we want:

    q² + 4p³/27 < 0.  □    0.6.28

Faced with this dilemma, the Italians of the 16th century, and their successors until about 1800, held their noses and computed with complex numbers. The name "imaginary" they used for such numbers expresses what they thought of them.

Several cubics are proposed in the exercises, as well as an alternative to Cardano's formula which applies to cubics with three real roots (Exercise 0.6.6), and a sketch of how to deal with quartic equations (Exercise 0.6.7).

0.7 EXERCISES

Exercises for Section 0.4: Real Numbers

Stars (*) denote difficult exercises. Two stars indicate a particularly challenging exercise. Many of the exercises for Chapter 0 are quite theoretical, and too difficult for students taking multivariate calculus for the first time. They are intended for use when the book is being used for a first analysis class. Exceptions include Exercises 0.5.1 and part (a) of 0.5.2.

TABLE 0.4.6
    digit            0       1
    even position    left    right
    odd position     right   left

0.4.1 (a) Let x and y be two positive reals. Show that x + y is well defined by showing that for any k, the digit in the kth position of [x]_N + [y]_N is the same for all sufficiently large N. Note that N cannot depend just on k, but must depend also on x and y.

*(b) Now drop the hypothesis that the numbers are positive, and try to define addition. You will find that this is quite a bit harder than part (a).

*(c) Show that addition is commutative. Again, this is a lot easier when the numbers are positive.

**(d) Show that addition is associative, i.e., x + (y + z) = (x + y) + z. This is much harder, and requires separate consideration of the cases where each of x, y and z is positive and negative.

0.4.2 Show that if two numbers are k-close for all k, then they are equal.

*0.4.3 Show that the functions A(x, y) = x + y, M(x, y) = xy, and S(x, y) = x − y are D-continuous, and that 1/x is not. Notice that for A and S, the number required by Definition 0.4.4 does not depend on N, but that it does for M.

**0.4.4 Prove Proposition 0.4.6. This can be broken into the following steps.

(a) Show that sup_k inf_{l ≥ k} f([x₁]_l, …, [x_n]_l) is well defined, i.e., that the sets of numbers involved are bounded. Looking at the function S from Exercise 0.4.3, explain why both the sup and the inf are there.

(b) Show that the resulting function has the required continuity properties.

(c) Show the uniqueness.

*0.4.5 Define division of reals, using the following steps.

(a) Show that the algorithm of long division of a positive finite decimal a by a positive finite decimal b defines a repeating decimal a/b, and that b(a/b) = a.
(b) Show that the function inv(x) defined for x > 0 by the formula inv(x) = inf_k 1/[x]_k satisfies x inv(x) = 1 for all x > 0.

(c) Now define the inverse for any x ≠ 0, and show that x inv(x) = 1 for all x ≠ 0.

**0.4.6 In this exercise we will construct a continuous mapping γ : [0,1] → ℝ², the image of which is a (full) triangle T. We will write our numbers in [0,1] in base 2, so such a number might be something like 0.0011101000011…, and we will use Table 0.4.6.

Take a right triangle T. We will associate to a string s = s₁, s₂, … of digits 0 and 1 a sequence of points x₀, x₁, x₂, … of T by starting at the right angle x₀, dropping the perpendicular to the opposite side, landing at x₁(s), and deciding to turn left or right according to the digit s₁, as interpreted by the bottom line of the table, since this digit is the first digit (and therefore in an odd position): on 0 turn right and on 1 turn left. Now drop the perpendicular to the opposite side, landing at x₂(s), and turn right or left according to the digit s₂, as interpreted by the top line of the table, etc.

FIGURE 0.4.6. This sequence corresponds to the string of digits 00100010010….

This construction is illustrated in Figure 0.4.6.

(a) Show that for any string of digits (s), the sequence x_n(s) converges.

(b) Suppose a number t ∈ [0,1] can be written in base 2 in two different ways (one ending in 0's and the other in 1's), and call (s), (s′) the two strings of digits. Show that lim_{n→∞} x_n(s) = lim_{n→∞} x_n(s′). Hint: Construct the sequences associated to .1000… and .0111…. This allows us to define γ(t) = lim_{n→∞} x_n(s).

(c) Show that γ is continuous.

(d) Show that every point in T is in the image of γ. What is the maximum number of distinct numbers t₁, …, t_k such that γ(t₁) = ⋯ = γ(t_k)? Hint: Choose a point in T, and draw a path of the sort above which leads to it.

0.4.7 (a) Show that the function

    f(x) = sin(1/x) if x ≠ 0,  f(0) = 0

is not continuous.

(b) Show that f satisfies the conclusion of the intermediate value theorem: if f(x₁) = a₁ and f(x₂) = a₂, then for any number a between a₁ and a₂, there exists a number x between x₁ and x₂ such that f(x) = a.

Exercises for Section 0.5: Infinite Sets and Russell's Paradox

0.5.1 (a) Show that the set of rational numbers is countable, i.e., that you can list all rational numbers.

(b) Show that the set of finite decimals is countable.

0.5.2 (a) Show that the open interval (−1, 1) has the same infinity of points as the reals. Hint: Consider the function f(x) = tan(πx/2).

*(b) Show that the closed interval [−1, 1] has the same infinity of points as the reals. For some reason, this is much trickier than (a). Hint: Choose two sequences, (1) a₀ = 1, a₁, a₂, …; and (2) b₀ = −1, b₁, b₂, …, and consider the map g with g(x) = x if x is not in either sequence, g(a_n) = a_{n+1}, and g(b_n) = b_{n+1}.

*(c) Show that the points of the circle {(x, y) ∈ ℝ² : x² + y² = 1} have the same infinity of elements as ℝ. Hint: Again, try to choose an appropriate sequence.

Exercise 0.5.4, part (b): This proof, due to Cantor, proves that transcendental numbers exist without exhibiting a single one. Many contemporaries of Cantor were scandalized, largely for this reason.

Exercise 0.5.5 is the one-dimensional case of the celebrated Brouwer fixed point theorem, to be discussed in a subsequent volume. In dimension one it is an easy consequence of the intermediate value theorem, but in higher dimensions (even two) it is quite a delicate result.
*(d) Show that ℝ² has the same infinity of elements as ℝ.

*0.5.3 Is it possible to make a list of the rationals in (0, 1], written as decimals, so that the entries on the diagonal also give a rational number?

0.5.4 An algebraic number is a root of a polynomial equation with integer coefficients: for instance, the rational number p/q is algebraic, since it is a solution of qx − p = 0, and so is √2, since it is a root of x² − 2 = 0. A number that is not algebraic is called transcendental. It isn't obvious that there are any transcendental numbers; the following exercise gives a (highly unsatisfactory) proof for their existence.

(a) Show that the set of all algebraic numbers is countable. Hint: List the finite collection of all roots of linear polynomials with coefficients with absolute value ≤ 1. Then list the roots of all quadratic equations with coefficients ≤ 2 (which will include the linear equations, for instance 0x² + 2x − 1 = 0), then all roots of cubic equations with coefficients ≤ 3, etc.

(b) Derive from part (a) that there exist transcendental numbers, in fact uncountably many of them.

0.5.5 Show that if f : [a, b] → [a, b] is continuous, there exists c ∈ [a, b] with f(c) = c.

0.5.6 Show that if p(x) is a polynomial of odd degree with real coefficients, then there is a real number c such that p(c) = 0.

Exercises for Section 0.6: Complex Numbers

For Exercise 0.6.2, see the subsection on the derivation of Cardano's formulas (Equation 0.6.26 in particular).

0.6.1 Verify the nine rules for addition and multiplication of complex numbers. Statements (5) and (9) are the only ones that are not immediate.

0.6.2 Show that −p/(3 u_{1,j}) is a cube root of v₂, and that we can renumber the cube roots of v₂ so that −p/(3 u_{1,j}) = u_{2,j}.

0.6.3 (a) Find all the cube roots of 1. (b) Find all the 4th roots of 1. *(c) Find all the 5th roots of 1. Use your formula to construct a regular pentagon using ruler and compass construction. (d) Find all the 6th roots of 1.

0.6.4 Show that the following cubics have exactly one real root, and find it.

    (a) x³ − 18x + 35 = 0    (b) x³ + 3x² + 4x + 2 = 0

0.6.5 Show that the polynomial x³ − 7x + 6 has three real roots, and find them.

In Exercise 0.6.6, part (a), use de Moivre's formula: cos nθ + i sin nθ = (cos θ + i sin θ)ⁿ.

Exercise 0.6.7 uses results from Section 3.1.

FIGURE 0.6.7(A). The two parabolas of Equation 0.7.1; note that their axes are respectively the y-axis and the x-axis.

FIGURE 0.6.7(B). The three pairs of lines that go through the intersections of the two parabolas.

0.6.6 There is a way of finding the roots of real cubics with three real roots, using only real numbers and a bit of trigonometry.

(a) Prove the formula 4cos³θ − 3cosθ − cos 3θ = 0.

(b) Set x = ay in the equation x³ + px + q = 0, and show that there is a value of a for which the equation becomes 4y³ − 3y − q₁ = 0; find the values of a and of q₁.

(c) Show that there exists an angle θ such that cos 3θ = q₁ precisely when 27q² + 4p³ < 0, i.e., precisely when the original polynomial has three real roots.

(d) Find a formula (involving arccos) for all three roots of a real cubic polynomial with three real roots.

*0.6.7 In this exercise, we will find formulas for the solution of 4th degree polynomials, known as quartics. Let w⁴ + aw³ + bw² + cw + d be a quartic polynomial.
(a) Show that if we set w = x − a/4, the quartic equation becomes

    x⁴ + px² + qx + r = 0,

and compute p, q and r in terms of a, b, c, d.

(b) Now set y = x² + p/2, and show that solving the quartic is equivalent to finding the intersections of the parabolas Γ₁ and Γ₂ of equations

    x² − y + p/2 = 0  and  y² + qx + r − p²/4 = 0,

respectively, pictured in Figure 0.6.7(A).

The parabolas Γ₁ and Γ₂ intersect (usually) in four points, and the curves of equation

    f_m(x, y) = (y² + qx + r − p²/4) + m(x² − y + p/2) = 0    0.7.1

are exactly the curves given by quadratic equations which pass through those four points; some of these curves are shown in Figure 0.6.7(C).

(c) What can you say about the curve given by Equation 0.7.1 when m = 1? When m is negative? When m is positive?

(d) The assertion in (b) is not quite correct: there is one curve that passes through those four points, and which is given by a quadratic equation, that is missing from the family given by Equation 0.7.1. Find it.

(e) The next step is the really clever part of the solution. Among these curves, there are three, shown in Figure 0.6.7(B), that consist of a pair of lines, i.e., each such "degenerate" curve consists of a pair of diagonals of the quadrilateral formed by the intersection points of the parabolas. Since there are three of these, we may hope that the corresponding values of m are solutions of a cubic equation, and this is indeed the case. Using the fact that a pair of lines is not a smooth curve near the point where they intersect, show that the numbers m for which the equation f_m = 0 defines a pair of lines, and the coordinates x, y of the point where they intersect, are the solutions of the system of three equations in three unknowns

    y² + qx + r − p²/4 + m(x² − y + p/2) = 0
    2y − m = 0
    q + 2mx = 0.

(f) Expressing x and y in terms of m using the last two equations, show that m satisfies the equation

    m³ − 2pm² + (p² − 4r)m + q² = 0;

this equation is called the resolvent cubic of the original quartic equation.

FIGURE 0.6.7(C). The curves f_m(x, y) = (y² + qx + r − p²/4) + m(x² − y + p/2) = 0 for seven different values of m.

Let m₁, m₂ and m₃ be the roots of the equation, and let (x₁, y₁), (x₂, y₂) and (x₃, y₃) be the corresponding points of intersection of the diagonals. This doesn't quite give the equations of the lines forming the two diagonals. The next part gives a way of finding them.

(g) Let (x₁, y₁) be one of the points of intersection, as above, and consider the line l_k through the point (x₁, y₁) with slope k, of equation y − y₁ = k(x − x₁). Show that the values of k for which l_k is a diagonal are also the values for which the restrictions of the two quadratic functions y² + qx + r − p²/4 and x² − y + p/2 to l_k are proportional. Show that this gives the equations

    k²/1 = (2ky₁ + q)/(2x₁ − k) = (y₁² + qx₁ + r − p²/4)/(x₁² − y₁ + p/2),

which can be reduced to the single quadratic equation

    k²(x₁² − y₁ + p/2) = y₁² + qx₁ + r − p²/4.

Now the full solution is at hand: compute (m₁, x₁, y₁) and (m₂, x₂, y₂); you can ignore the third root of the resolvent cubic or use it to check your answers. Then for each of these compute the slopes k_{i,1} and k_{i,2} = −k_{i,1} from the equation above. You now have four lines, two through A = (x₁, y₁) and two through B = (x₂, y₂). Intersect them in pairs to find the four intersections of the parabolas.

(h) Solve the quartic equations

    x⁴ − x² + x + 1 = 0  and  x⁴ + 4x² + x − 1 = 0.
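As an illustration (not part of the original exercises), here is a small Python sketch, assuming NumPy is available, that checks numerically the relation between the resolvent cubic of part (f) and the singular points of part (e); the sample values of p, q, r are ours.

    import numpy as np

    # For each root m of the resolvent cubic m^3 - 2p m^2 + (p^2 - 4r) m + q^2,
    # the point x = -q/(2m), y = m/2 from the last two equations of the system
    # in part (e) should satisfy f_m(x, y) = 0.
    p, q, r = -1.0, 1.0, 1.0          # a sample depressed quartic x^4 + p x^2 + q x + r

    def f(m, x, y):
        return y**2 + q*x + r - p**2/4 + m*(x**2 - y + p/2)

    for m in np.roots([1.0, -2*p, p**2 - 4*r, q**2]):
        x, y = -q / (2*m), m / 2
        print(m, abs(f(m, x, y)))     # second column ~ 0 for each root m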
1  Vectors, Matrices, and Derivatives

"It is sometimes said that the great discovery of the nineteenth century was that the equations of nature were linear, and the great discovery of the twentieth century is that they are not." (Tom Körner, Fourier Analysis)

1.0 INTRODUCTION

In this chapter, we introduce the principal actors of linear algebra and multivariable calculus.

By and large, first year calculus deals with functions f that associate one number f(x) to one number x. In most realistic situations, this is inadequate: the description of most systems depends on many functions of many variables.

In physics, a gas might be described by pressure and temperature as a function of position and time, two functions of four variables. In biology, one might be interested in numbers of sharks and sardines as functions of position and time; a famous study of sharks and sardines in the Adriatic, described in The Mathematics of the Struggle for Life by Vito Volterra, founded the subject of mathematical ecology.

In micro-economics, a company might be interested in production as a function of input, where that function has as many coordinates as the number of products the company makes, each depending on as many inputs as the company uses. Even thinking of the variables needed to describe a macro-economic model is daunting (although economists and the government base many decisions on such models). The examples are endless and found in every branch of science and social science.

Mathematically, all such things are represented by functions f that take n numbers and return m numbers; such functions are denoted f : ℝⁿ → ℝᵐ. In that generality, there isn't much to say; we must impose restrictions on the functions we will consider before any theory can be elaborated.

The strongest requirement one can make is that f should be linear; roughly speaking, a function is linear if when you double the input, you double the output. Such linear functions are fairly easy to describe completely, and a thorough understanding of their behavior is the foundation for everything else.

The first four sections of this chapter are devoted to laying the foundations of linear algebra. We will introduce the main actors, vectors and matrices, relate them to the notion of function (which we will call transformation), and develop the geometrical language (length of a vector, length of a matrix, …) that we will need in multivariable calculus. In Section 1.5 we will discuss sequences, subsequences, limits and convergence. In Section 1.6 we will expand on that discussion, developing the topology needed for a rigorous treatment of calculus.

The notion that one can think about and manipulate higher dimensional spaces by considering a point in n-dimensional space as a list of its n "coordinates" did not always appear as obvious to mathematicians as it does today. In 1846, the English mathematician Arthur Cayley pointed out that a point with four coordinates can be interpreted geometrically without recourse to "any metaphysical notion concerning the possibility of four-dimensional space."

"Vol" denotes the number of shares traded, "High" and "Low," the highest and lowest price paid per share, "Close," the price when trading stopped at the end of the day, and "Chg," the difference between the closing price and the closing price on the previous day.
Most functions are not linear, but very often they are well approximated by linear functions, at least for some values of the variables. For instance, as long as there are few hares, their number may well double every year, but as soon as they become numerous, they will compete with each other, and their rate of increase (or decrease) will become more complex. In the last three sections of this chapter we will begin exploring how to approximate a nonlinear function by a linear function: specifically, by its higher-dimensional derivative.

1.1 INTRODUCING THE ACTORS: VECTORS

Much of linear algebra and multivariate calculus takes place within ℝⁿ. This is the space of ordered lists of n real numbers.

You are probably used to thinking of a point in the plane in terms of its two coordinates: the familiar Cartesian plane with its x, y axes is ℝ². Similarly, a point in space (after choosing axes) is specified by its three coordinates: Cartesian space is ℝ³. Analogously, a point in ℝⁿ is specified by its n coordinates; it is a list of n real numbers. Such ordered lists occur everywhere, from grades on a transcript to prices on the stock exchange.

Seen this way, higher dimensions are no more complicated than ℝ² and ℝ³; the lists of coordinates just get longer. But it is not obvious how to think about such spaces geometrically. Even the experts understand such objects only by educated analogy to objects in ℝ² or ℝ³; the authors cannot "visualize ℝ⁴," and we believe that no one really can. The object of linear algebra is, at least in part, to extend to higher dimensions the geometric language and intuition we have concerning the plane and space, familiar to us all from everyday experience. It will enable us to speak for instance of the "space of solutions" of a particular system of equations as being a four-dimensional subspace of ℝ⁷.

Example 1.1.1 (The stock market). The following data is from the Ithaca Journal, Dec. 14, 1996.

LOCAL NYSE STOCKS
                     Vol     High    Low     Close   Chg
    Airgas           193     24¼     23¼     23⅝     −⅝
    AT&T             36606   39¼     38⅝     39      ⅝
    Borg Warner      74      38⅝     38      38      −⅜
    Corning          4575    44⅝     43      44¼     1¼
    Dow Jones        1606    33¼     32⅞     33¼     ¼
    Eastman Kodak    7774    80⅝     79⅛     79¾     −¾
    Emerson Elec.    3335    97⅞     95⅝     95⅝     −1⅛
    Federal Express  5828    42¼     41      41⅝     1/16

Each of these lists of eight numbers is an element of ℝ⁸; if we were listing the full New York Stock Exchange, they would be elements of ℝ³³⁵⁶.

The Swiss mathematician Leonhard Euler (1707-1783) touched on all aspects of the mathematics and physics of his time. He wrote textbooks on algebra, trigonometry, and infinitesimal calculus; all texts in these fields are in some sense rewrites of Euler's. He set the notation we use from high school on: sin, cos, and tan for the trigonometric functions, and f(x) to indicate a function of the variable x, are all due to him. Euler's complete works fill 85 large volumes, more than the number of mystery novels published by Agatha Christie; some were written after he became completely blind in 1771. Euler spent much of his professional life in St. Petersburg. He and his wife had thirteen children, five of whom survived to adulthood.

We can think of this table as five columns, each an element of ℝ⁸:

    Vol = (193, 36606, 74, 4575, 1606, 7774, 3335, 5828),
    High = (24¼, 39¼, 38⅝, 44⅝, 33¼, 80⅝, 97⅞, 42¼),
    Low = (23¼, 38⅝, 38, 43, 32⅞, 79⅛, 95⅝, 41),
    Close = (23⅝, 39, 38, 44¼, 33¼, 79¾, 95⅝, 41⅝),
    Chg = (−⅝, ⅝, −⅜, 1¼, ¼, −¾, −1⅛, 1/16),

each written as a column of eight entries.

Note that we write elements of ℝⁿ as columns, not rows. The reason for preferring columns will become clear later: we want the order of terms in matrix multiplication to be consistent with the notation f(x), where the function is placed before the variable, notation established by the famous mathematician Euler. Note also that we use parentheses for "positional" data and brackets for "incremental" data; the distinction is discussed below.
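As an illustration (not part of the original text, and assuming NumPy is available), here is a minimal Python sketch representing such columns as arrays; the numeric values are illustrative decimals, not the Ithaca Journal data. As discussed later in this section, the Close column is a "state" and the Chg column is a "change of state."

    import numpy as np

    # A point ("state of the market") and a vector ("change of state") are
    # both lists of numbers; only the interpretation differs.
    close = np.array([23.625, 39.0, 38.0])      # closing prices: a state
    chg   = np.array([-0.625, 0.625, -0.375])   # the day's increments

    print(close - chg)    # subtracting the increment: yesterday's state
    print(close + chg)    # adding another such increment: a new state
    print(chg + chg)      # increment + increment: change over two days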
Points and vectors: positional data versus incremental data

An element of ℝⁿ is simply an ordered list of n numbers, but such a list can be interpreted in two ways: as a point representing a position or as a vector representing a displacement or increment.

Definition 1.1.2 (Point, vector, and coordinates). The element of ℝⁿ with coordinates x₁, x₂, …, x_n can be interpreted in two ways: as the point x = (x₁, …, x_n), written with parentheses, or as the vector x⃗ = [x₁, …, x_n], written with brackets, which represents an increment.

Example 1.1.3 (An element of ℝ² as a point and as a vector). The element of ℝ² with coordinates x = 2, y = 3 can be interpreted as the point (2, 3) in the plane, as shown in Figure 1.1.1. But it can also be interpreted as the instructions "start anywhere and go two units right and three units up," rather like instructions for a treasure hunt: "take two giant steps to the east, and three to the north"; this is shown in Figure 1.1.2. Here we are interested in the displacement: if we start at any point and travel [2, 3], how far will we have gone, and in what direction? When we interpret an element of ℝⁿ as a position, we call it a point; when we interpret it as a displacement, or increment, we call it a vector. △

FIGURE 1.1.1. The point (2, 3).

FIGURE 1.1.2. All the arrows represent the same vector, [2, 3].

As shown in Figure 1.1.2, in the plane (and in three-dimensional space) a vector can be depicted as an arrow pointing in the direction of the displacement. The amount of displacement is the length of the arrow. This does not extend well to higher dimensions. How are we to picture the "arrow" in ℝ³³⁵⁶ representing the change in prices on the stock market? How long is it, and in what "direction" does it point? We will show how to compute these magnitudes and directions for vectors in ℝⁿ in Section 1.4.

Example 1.1.4 (A point as a state of a system). It is easy to think of a point in ℝ² or ℝ³ as a position; in higher dimensions, it can be more helpful to think of a point as a "state" of a system. If 3356 stocks are listed on the New York Stock Exchange, the list of closing prices for those stocks is an element of ℝ³³⁵⁶, and every element of ℝ³³⁵⁶ is one theoretically possible state of the stock market. This corresponds to thinking of an element of ℝ³³⁵⁶ as a point. The list telling how much each stock gained or lost compared with the previous day is also an element of ℝ³³⁵⁶, but this corresponds to thinking of the element as a vector, with direction and magnitude: did the price of each stock go up or down? How much? △

Remark. In physics textbooks and some first year calculus books, vectors are often said to represent quantities (velocity, forces) that have both "magnitude" and "direction," while other quantities (length, mass, volume, temperature) have only "magnitude" and are represented by numbers (scalars). We think this focuses on the wrong distinction, suggesting that some quantities are always represented by vectors while others never are, and that it takes more information to specify a quantity with direction than one without.

The volume of a balloon is a single number, but so is the vector expressing the difference in volume between an inflated balloon and one that has popped. The first is a number in ℝ, while the second is a vector in ℝ. The height of a child is a single number, but so is the vector expressing how much he has grown since his last birthday. A temperature can be a "magnitude," as in "It got down to −20 last night," but it can also have "magnitude and direction," as in "It is 10 degrees colder today than yesterday." Nor can "static" information always be expressed by a single number: the state of the stock market at a given instant requires as many numbers as there are stocks listed, as does the vector describing the change in the stock market from one day to the next. △

Points can't be added; vectors can

As a rule, it doesn't make sense to add points together, any more than it makes sense to "add" the positions "Boston" and "New York" or the temperatures 50 degrees Fahrenheit and 70 degrees Fahrenheit. (If you opened a door between two rooms at those temperatures, the result would not be two rooms at 120 degrees!)
The volume of a balloon is a single number, but so is the vector expressing the difference in volume between an inflated balloon and one that has popped. The first is a number in R, while the second is a vector in R. The height of a child is a single numiber, but 0 is the vector expressing how much he has grown since his last birthday. A temperature can be a “magnitude,” as in “It got down to -20 last night,” but it can also have “magnitude and direction,” as in “It is 10 degrees colder today than yesterday.” Nor can “static” information always be expressed by a single number: the state of the Stock Market at a given instant requires as many numbers as there are stocks listed—as does the vector describing the change in the Stock Market from one day to the next. A Points can’t be added; vectors can Asa rule, it doesn't make sense to add points together, any more than it makes sense to “add” the positions “Boston” and “New York” or the temperatures 50 degrees Fahrenheit and 70 degrees Fahrenheit. (If you opened a door between two rooms at those temperatures, the result, would not be two rooms at 120 We will not consistently use different notation for the point zero and the zero vector, although Philosophically the two are quite different. The zero vector, i-e., the “zero increment,” has @ universal meaning, the same regardless of the frame of reference. The point zero iy arbitrary, just as “zero de- grees” is arbitrary, and has a dif- ferent meaning in the Centigrade system and in Fahrenheit. Sometimes, often at a key point in the proof of a hard theorem, we will suddenly start thinking of points as vectors, or vice versa; this happens in the proof of Kan- torovitch's theorem in Appendix A2, for example. Fioure 1.1.3, ‘The difference a—b between point a and point b is the vector Joining them. The difference can be computed by subtracting the coordinates of b from those of a. 1.1 Vectors 31 degrees!) But it does make sense to measure the difference between points (i. to subtract them): you can talk about the distance between Boston and New York, or about the difference in temperature between two rooms. The result of subtracting one point from another is thus a vector specifying the increment you need to add to get from one point to another. You can also add increments (vectors) together, giving another increment. For instance the vectors “advance five meters east then take two giant steps south” and “take three giant steps north and go seven meters west” can be added, to get “advance 2 meters west and one giant step north.” Similarly, in the NYSE table in Example 1.1.1, adding the Close columns on two successive days does not produce a meaningful answer. But adding the Chg columns for each day of a week produces a perfectly meaningful increment: the change in the market over that week. It is also meaningful to add increments to points (giving a point): adding a Chg column to the previous day's Close column produces the current day's Close—the new state of the system. To help distinguish these two kinds of elements of R", we will denote them differently: points will be denoted by boldface lower case letters, and vectors will be lower case boldface letters with arrows above them. Thus x is a point in R?, while X is a vector in R?. We do not distinguish between entries of points and entries of vectors; they are all written in plain type, with subscripts. 
However, when we write elements of R” as columns, we will use parentheses for @ point x and square brackets for a vector %: in R?, x = G ) and <= le] 2 2 Remark. An element of IR" is an element of R"--i.e., an ordered list of numbers—whether it is interpreted as a point or as a vector. But we lave very different images of points and vectors, and we hope that sharing them with you explicitly will help you build a sound intuition. In linear algebra, you should Just think of elements of R" as vectors. However, differential calculus is all ‘about increments to points. It is because the increments are vectors that linear algebra is a prerequisite for multivariate calculus: it provides the right language and tools for discussing these increments. Subtraction and addition of vectors and points The difference between point a and point b is the vector a—b, as shown in Figure 1.1.3. Vectors are added by adding the corresponding coordinates: uw wy uty % w, v2 + w. i+ alo. fs LL Un Wy Un + Wn Se v w v+w If we were working with com- plex vector spaces, our scalars would be complex numbers: in number theory, scalars might be the rational numbers: in coding theory, they might be elements of a finite field. (You may have run into such things nnder the name of “clock urithmetic.") We use the word “scalar” rather than “real number” because most theorems in linear algebra are just ax true for complex vector spaces or ratio- nal vector spaces as for real ones, and we don’t want to restrict the validity of the statements unnec- cessarly ‘The symbol € means “element of” Out loud, one says “in.” The expression “x,y € V" means “X € V and ¥ € V." If you are unfamil- iar with the notation of set theory, see the discussion in Section 0.3. 32 Chapter 1. Vectors, Matrices, and Derivatives the result is a vector. Similarly, vectors are subtracted by subtracting the corresponding coordinates to get a new vector. A point and a vector are added by adding the corresponding coordinates; the result is a point. In the plane. the sum 7 + W is the diagonal of the parallelogram of which two adjacent sides are 7 and W, as shown in Figure 1.1.4 (left). We can also add vectors by placing the beginning of one vector at the end of the other, as shown in Figure 1.1.4 (right). fh FicurE 1.1.4. In the plane, the sum ¥ + w is the diagonal of the parallelogram at left. We can also add them by putting thein head to tail Multiplying vectors by scalars Multiplication of a vector by a scalar is straightforward: a az, 3 3v3 a}: |= | : |; forexample, V3|-1] = | -1v3] . 112 2 23 In this book, our vectors will be lists of real numbers, so that our scalars— the kinds of nuinbers we are allowed to multiply vectors or matrices by—are real numbers. Zn azn Subspaces of 2” A subspace of R” is a subset of R” that is closed under addition and multipli- cation by scalars." (This R" should be thought of as made up of vectors, not points.) in Section 2.6 we will discuss abstract vector spaces. ‘These are sets in which one can add and multiply by scalars, and where these operations satisfy rules (ten of them) that make them clones of R". Subspaces of R™ will be our main examples of vector spaces ‘To be closed under multiplica- tion a subspace must contain the zero vector. $0 that ov=6, ‘The notation for the standard basis vectors is ambiguous; at right we have three different vec- tors, all denoted 6. ‘The subscript tells us which entry is 1 but does not say how many entries the vee- tor has~-i.e., whether it is a vector in B?, BS or what. 
Definition 1.1.5 (Subspace of ℝⁿ). A non-empty subset V ⊂ ℝⁿ is called a subspace if it is closed under addition and closed under multiplication by scalars; i.e., V is a subspace if when

    x⃗, y⃗ ∈ V and a ∈ ℝ,  then  x⃗ + y⃗ ∈ V and ax⃗ ∈ V.

For example, a straight line through the origin is a subspace of ℝ² and of ℝ³. A plane through the origin is a subspace of ℝ³. The set consisting of just the zero vector {0⃗} is a subspace of any ℝⁿ, and ℝⁿ is a subspace of itself. These last two, {0⃗} and ℝⁿ, are considered trivial subspaces.

Intuitively, it is clear that a line that is a subspace has dimension 1, and a plane that is a subspace has dimension 2. Being precise about what this means requires some "machinery" (mainly the notions of linear independence and span), introduced in Section 2.4.

The standard basis vectors

We will meet one particular family of vectors in ℝⁿ often: the standard basis vectors. In ℝ² there are two standard basis vectors, e⃗₁ and e⃗₂; in ℝ³, there are three:

    in ℝ²: e⃗₁ = [1, 0], e⃗₂ = [0, 1];  in ℝ³: e⃗₁ = [1, 0, 0], e⃗₂ = [0, 1, 0], e⃗₃ = [0, 0, 1].

Similarly, in ℝ⁵ there are five standard basis vectors, from e⃗₁ = [1, 0, 0, 0, 0] to e⃗₅ = [0, 0, 0, 0, 1].

The notation for the standard basis vectors is ambiguous; above we have several different vectors all denoted e⃗₁. The subscript tells us which entry is 1 but does not say how many entries the vector has, i.e., whether it is a vector in ℝ², ℝ³ or what.

The standard basis vectors in ℝ² and ℝ³ are often denoted i⃗, j⃗, and k⃗. We do not use this notation but mention it in case you encounter it elsewhere.

Definition 1.1.6 (Standard basis vectors). The standard basis vectors in ℝⁿ are the vectors e⃗ⱼ with n entries, the jth entry 1 and the others zero.

Geometrically, there is a close connection between the standard basis vectors in ℝ² and a choice of axes in the Euclidean plane. When in school you drew an x-axis and y-axis on a piece of paper and marked off units so that you could plot a point, you were identifying the plane with ℝ²: each point on the plane corresponded to a pair of real numbers, its coordinates with respect to those axes. A set of axes providing such an identification must have an origin, and each axis must have a direction (so you know what is positive and what is negative) and it must have units (so you know, for example, where x = 3 or y = 2 is).
Definition 1.1.7 (Vector field). A vector field on R” is a function whose input is a point in R" and whose output is a vector (also in R") emanating from that point. ‘We will distinguish between functions and vector fields by putting arrows on vector fields, as in F in Example 1.1. Ficure 1.1.6, A vector field associates a vee- Example 1.1.8 (Vector fields in R?). The identity function in R? tor to each point. Here we show the radial vector field £(3)= (3) 11s B (3) = [ | : takes @ point in R? and returns the same point. But the vector field ‘Vector fields generally are easier to F(t). |= deni wien se aes the eta Cd aa com. = Hn ae done above and takes a point in R? and assigns to it the vector corresponding to that point, as igure 1.1. shown in Figure 1.1.6. To the point with coordinates (1, 1) it assigns the vector 1 ‘ ‘ [ 1 | to the point with coordinates (4,2) it assigns the vector [3]: 1.2 Matrices 35 shown in Figure 1.1.7, takes z ay? Similarly, the vector field F ( 3) = [ oe 7 Actually, a vector field simply a er associates to each point a vector; point in R? and assigns to it the vector |" |. A imagine that vector is to yon, But'it i always helpfal Vector fields are often used to describe the flow of ude or gases: the vector to imagine each vector anchored assigned to each point gives the velocity and direction of the flow. For flows that ‘at, or emanating from, the corre- don't change over time (steady-state flows), such a vector field gives a complete ‘sponding point. description. In more realistic cases where the flow is constantly changing, the vector field gives a snapshot of the flow at a given instant. Vector fields are also used to describe force fields such as electric fields or gravitational fields. 1.2 INTRODUCING THE ACTORS: MATRICES Probably no other area of mathematics has been applied in such numerous and diverse contezts as the theory of matrices. In mechanics, electro- ‘magnetics, statistics, economics, operations research, the social sciences, and so on, the list of applications seems endless. By and large this is due to the utility of matrix structure and methodology in conceptualiz- ing sometimes complicated relationships and in the orderly processing of otherwise tedious algebraic calculations and numerical manipulations. — James Cochran, Applied Mathematics: Principles, Techniques, and Ap- . plications : ‘The other central actor in linear algebra is the matric. Figure 1.1.7. ‘The vector field Definition 1.2.1 (Matrix). An m x n matrix is a rectangular array of #(3) (2c? entries, m high and n wide. y, z-y)" ‘We use capital letters to denote matrices. Usually our matrices will be arrays of numbers, real or complex, but matrices can be arrays of polynomials, or of ‘more general functions; a matrix can even be an array of other matrices, A vector ¥ € R™ is an m x 1 matrix. Addition of matrices, and multiplication of a matrix by a scalar, work in the ‘When a matrix is il fhen rie a described, Sous way, height is given first, then width an m Xn matrix high and wide. After struggling fe youn Example 1.2.2 (Addition of matrices and multiplication by a scalar). to remember which goes first, one ofthe authors hit on a mnemonic: [1 09] fo -3] [1 ~3 first take the elevator, then walk | 9 “fi 3]-[; 3) a | 1 s]= 2 é a down the hall. i 3 a 7G -2 3)" |-4 6 So far, it's not clear that matrices gain us anything. Why put numbers (or other entries) into a rectangular array? What do we gain by talking about the How would you add the matri- ces 125 ia] | [02s] om [0 3]? 
You can’t: matrices can be added only if they have the same height and same width. Matrices were introduced by Arthur Cayley, a lawyer who be- came a mathematician, in A Mem- oir on the Theory of Matrices, published in 1858. He denoted the multiplication of a 3x3 matrix by the vector | y| using the format 2 (a, 6, ¢)z,m2) Ge te ane Oe es when Werner Heisenberg discovered ‘matrix’ mechanics in 1925, he didn’t know what a ma- trix was (Max Born had to tell him), and neither Heisenberg nor Born knew what to make of the appearance of matrices in the con- text of the atom.”—Manfred R. Schroeder, “Number Theory and the Real World,” Mathematical Intelligencer, Vol. 7, No. 4 36 Chapter 1. Vectors, Matrices, and Derivatives a 2x2 matrix le 4] rather than the point | © | © &¢? The answer i that the a matrix format allows another operation to be performed: matriz multiplication. ‘We will see in Section 1.3 that every linear transformation corresponds to mul- tiplication by @ inatrix. This is one reason matrix multiplication is a natural and important operation; other important applications of matrix multiplication are found in probability theory and graph theory. Matrix multiplication is best learned by example. The simplest way to mul- tiply A times B is to write B above and to the right of A. Then the product AB fits in the space to the right of A and below B, the i, jth entry of AB being the intersection of the ith row of A and the jth column of B, as shown in Example 1.2.3. Note that for AB to exist, the width of A must equal the height of B. The resulting matrix then has the height of A and the width of B. Example 1.2.3 (Matrix multiplication). The first entry of the product AB is obtained by multiplying, one by one, the entries of the first row of A by those of the first column of B, and adding these products together: in Equation 1.2.1, (2x 1) + (-1 x 3) = -1. The second entry is obtained by multiplying the first row of A by the second column of B: (2 x 4) +(~1 x0) =8. After multiplying the first row of A by all the columns of B, the process is repeated with the second row of A: (3 x 1) + (2x 3) =9, and 60 on. B ———_ (ALB) = [48] [+] [3 el : Mie) PAG eal Given the matrices 10 _fio 04 1-101 a-[} 3 2 [° 1] q [i 0 a -[} ‘| what are the products AB, AC and CD? Check your answers below.? Now compute BA. What do you notice? What if you try to compute CA? ahs 40 [fh ap: = Matrix multiplication isnot commutative; BA = [? 5): which ie not equal to AB o1 AB [ 4 sl: Although the product AC exists, you cannot compute CA. Definition 1.2.4 says nothing new, but it provides some prac- tice moving between the concrete (multiplying two particular matri- ces) and the symbolic (express- ing this operation vo that it ap- plies to any two matrices of appro- priate dimensions, even if the en- tries are complex numbers or even functions, rather than real num- bers.) In linear algebra one is constantly moving from one form of representation (one “language”) to another. For example, as we have seen, a point in R" can be considered as a single entity, b, or as the ordered list of its coordi- nates; matrix A can be thought of asa single entity or as a rectangu- lar array of its entries. In Example 1.2.3, A is a 2x 2 matrix and B is a 23 matrix, 90 that n= 2, m product C is then a 2 x 3 matrix. If we set i = 2 and j = 3, we see that the entry ca.2 of the matrix C is Using the format for matrix multiplication shown in Example 1.2.3, the i, jth entry is the entry at the intersection of the ith row ‘and jth column. 
1.2) Matrices 37 Below we state the formal definition of the process we've just. described. If the indices bother you, do refer to Figure 1.2.1. Definition 1.2.4 (Matrix multiplication). If A is an m xn matrix whose (i.3)th entry is a,j, and B is an n x p matrix whose (i, j)th entry is bi. then C = AB is the m x p matrix with entries Vandy ra 1.22 ig i, rb,j + Gi.abay +++ + Ginbn.- 3 = £ £ ' é Pa BS] FiGurE 1.2.1. The entry c., of the matrix C = AB is the sum of the products of the entries of the ax of the matrix A and the corresponding entry by,, of the matrix B. The entries a,,, are all in the éth row of A; the first index # is constant. and the second index k varies. The eutries by,y are all in the jth column of B: the first index varies, and the second index j is constant. Since the width of A equals the height of B, the entries of A and those of B can be paired up exactly. Remark. Often people write a problem in matrix multiplication in a row: [A][B] = [AB]. The format shown in Example 1.2.3 avoids coufusion: Product of the ith row of A and the jth column of B lies at the that row and column. It also avoids recopying matrices wheu « the tersection of 14 repeated Ficure 1.2.2. ‘The ith column of the product AB depends on all the entries of A but only the ith column of B. | =| j| a8 | 1 | { Se Figure 1.2.3. The jth row of the product AB depends on all the entries of B but only the jth row of A. 38 Chapter 1. Vectors, Matrices, and Derivatives multiplications, for example A times B times C times D: felfe|[>], ay [+] [on] ane} [a Multiplying a matrix by a standard basis vector Observe that multiplying a matrix A by the standard basis vector 6; selects out the ith column of A, as shown in the following example. We will use this fact often. Example 1.2.5 (The ith column of A is 48). Below, we show that the second column of A is Aga: 4 card cane 3-2 0] [27 multiplies the 2nd column by 1: x1= 21 2 1 043 4 10 2 0 * Ae A 124 Similarly, the ith column of AB is Ab,, where b; is the ith column of B, as shown in Example 1.2.6 and represented in Figure 1.2.2. The jth row of AB is the product of the jth row of A and the matrix B, as shown in Example 1.2.7 and Figure 1.2.3. Example 1.2.6.The second column of the product AB is the same as the Product of the second column of A and the matrix B: B & ——_—_— “a 1 4-2 4 30 2 0. 1.25 2-1] [-1 8 -6 2-1] [8 32 9 12 -2 3 2) [12 * ae * ‘ae Example 1.2.7. The second row of the product AB is the same as the product of the second row of A and the matrix B: In his 1858 article on matrices, Cayley stated that matrix multi- plication is associative but gave no proof. The impression one gets is that he played around with ma- trices (mostly 2 x 2 and 3 x 3) to get some feeling for how they behave, without worrying about rigor. Concerning another matrix result (the Cayley-Hamilton theo- rem) he verifies it for 3 x 3 matri- ces, adding I have not thought it necessary to undertake the labour of a formal proof of the theorem in the general case of a matriz of any degree. Figure 1.2.4. ‘This way of writing the ma- trices corresponds to calculating (AB)C. Ficure 1.2.5. This way of writing the ma- trices corresponds to calculating A(BC). 1.2 Matrices 39 oe id Tce ae banss > meee Matrix multiplication is associative When multiplying the matrices A,B, and C, we could set up the repeated ‘multiplication as we did in Equation 1.2.3, which corresponds to the product (AB)C. We can use another format to get the product A(BC): [| [e][¢] A] [asl [ane . 
[2] [ |: 12.7 I I | [4] [420] Is (AB)C the same as (AB)C? In Section 1.3 we give a conceptual reason why they are; here we give a computational proof. Proposition 1.2.8 (Matrix multiplication is associative). If A is an nx m matrix, B is an mx p matrix and C is a p x q matrix, 60 that (AB)C and A(BC) are both defined, then they are equal: (AB)C = A(BC). 1.28 Proof, Figures 1.2.4 and 1.2.5 show that the i, jth entry of both A(BC) and (AB)C depend only on the ith line of A and the jth column of C (but on all the entries of B), and that without loss of generality we can assume that A is a line matrix and that C is a column matrix, ie., that n = q = 1, so that both (AB)C and A(BC) are numbers. The proof is now an application of associativity of multiplication of numbers: (apyo= >> ( ob) 4 eat th Fy of AB = Yeh xe (Em2) talks 1.29 A(BC). 0 we feth entry of BC Exercise 1.2.2 provides prac- tice on matrix raultiplication. At the end of this sect Example 1.2.22, involving shows a setting where matrix multiplica- tion is a natural and powerful tool. ‘The main diagonal is also called the diagonal. The diagonal from bottom left to top right is the anti- diagonal. Multiplication by the identity matrix J does not change the ma- trix being multiplied. The columns of the identity matrix Jn, are of course the stan- dard basis vectors 61, ..,&n! 100 0 S 0 La a 1 sooo goon mS Se 40 Chapter 1. Vectors, Matrices, and Derivatives Non-commutativity of matrix multiplication ‘As we saw earlier, matrix multiplication is most definitely not commutative. It may well be possible to multiply A by B but not B by A. Even if both matrices have the same number of rows and columns, AB will usually not equal BA, as shown in Example 1.2.9. Example 1.2.9 (Matrix multiplication is not commutative). If you multiply the matrix it tl by the matrix i a the answer you get will depend on which one you put first: f 5 is not equal to i | A 1210 (a) rg] (to) fo 4] The identity matrix ‘The identity matrix I plays the same role in matrix multiplication as the number 1 does in multiplication of numbers: [A = A = AI. Definition 1.2.10 (Identity matrix). The identity matrix Jn is the nx n- matrix with 1’s along the main diagonal (the diagonal from top left to bottom right) and 0's elsewhere. For example, 10 0 0 ta=[} 9] ma na[o io 2 an ooo 1 If Ais an n x m-matrix, then IA=AI=A, or, More precisely, 1,4 = Alm = A, 12.12 since if n # m one must change the size of the identity matrix to match the size of A. When the context is clear, we will omit the index. Matrix inverses ‘The inverse A~' of a matrix A plays the same role in matrix multiplication as the inverse 1/a does for the number a. We will see in Section 2.3 that we can use the inverse of a matrix to solve systems of linear equations. 1.2. Matrices 41 The only number that does not have an inverse is 0, but many matrices do not have inverses. In addition, the non-commutativity of matrix multiplication makes the definition more complicated. Definition 1.2.11 (Left and right inverses of matrices). Let A be a matrix. If there is another matrix B such that BA=I, then B is called a left inverse of A. If there is another matrix C such that AC =I, then C is called a right inverse of A. It is possible for a nonzero matrix to have neither a right nor a left inverse. We will ae i Section 2.3 that only square matrices can have a iworided mere, Le, WED matrix (3 a does not have a right or a left inverse. 
‘To see this, assume it hab left inverse then that left in: verse is necessarily also a right in- verse; similarly, if it has a right in- verse, that right inverse fv neces. 1 O}fa 6]_f1 0 _ sarily a left inverse. 0 O}[ce dJ~ jo 1 ie Example 1.2.12 (A matrix with neither right nor left inverse). The has a right inverse. Then there exists a matrix [i ‘| such that, It is possible for @ non-square wf? ol; : matrix to have lots of left inverses But that product is | 9], i.e., in the bottom right-hand comer, 0 = 1. A and no right inverse, or lots of similar computation shows that there is no le right inverses and no left inverse, as explored in Exercise 1.2.20. verse. A Definition 1.2.13 (Invertible matrix). An invertible matrix is a matrix that has both a left inverse and a right inverse. While we can write the inverse Associativity of matrix multiplication gives us the following result: of a number z either as z~’ or as ia ma 4) =) Proposition and Definition 1.2.14, If matrix A has both a left and a aaa! We cannot divide “ight inverse, then it has anly one left inverse and one Tight inverse, and they by @ matriz. If for two matrices fe identical; such a matrix is called the inverse of A and is denoted A~, A and B you were to write A/B, it would be unclear whether this meant Proof. If a matrix A has a right inverse B, then AB = J. If it has a left inverse C, then CA = I. So BOA or AB O(AB)=CI=C and (CA)B=IB=B, 0 C=B. 0 1214 ‘We discuss how to find inverses of matrices in Section 2.3. A formula exists for 2 x 2 matrices: the inverse of ¢ al 2.2.15 a-[2 ‘] is Ato We are indebted to Robert Ter- rell for the mnemonic, “socks on, shoes on; shoes off, socks off.” To undo a process. you undo first the last thing you did. 1 Wee (| then its transpose ; v' =(10 1) Do not confuse a matrix with its transpose, and in particular, never write a vector horizontally. If you write a vector written hor- izontally you have actually writ- ten its transpose; confusion be- tween a vector (or matrix) and its transpose leads to endless difficul- ties with the order in which things should be multiplied, as you can see from Theorem 1.2.17. 42 Chapter 1. Vectors, Matrices. and Derivatives as Exercise 1.2.12 asks you to confirin by matrix multiplication of AA~' and AW'A, (Exercise 1.4.12 discusses the formula for the inverse of a 3 x 3 matrix.) Notice that a 2 x 2 matrix is invertible if ad ~ be # 0. The converse is also true: if ad — be = 0, the matrix is not invertible, as you arc asked to show in Exercise 1.2.13. Associativity of matrix multiplication is also used to prove that the inverse of the product of two invertible matrices is the product of their inverses, in reverse order: Proposition 1.2.15 (The inverse of the product of matrices). If A and B are invertible matrices, then AB is invertible, and the inverse is given by the formula (AB)"' = B7'A™'. 1.2.16 Proof. The computation (AB)(B™'A™!) = A(BB™')A™! = AA“! = T 1.217 and a similar one for (B-'A~')(AB) prove the result. O. Where was associativity used in the proof? Check your answer below.* The transpose The transpose is an operation on matrices that will be useful when we come to the dot product, and in many other places. Definition 1.2.16 (Transpose). The transpose AT of a matrix A is formed by interchanging all the rows and columns of A, reading the rows from left to right, and columns from top to bottom. 
1 4-2 ne For example, if A = sthen AT=| 4 0 30 2 FG The transpose of a single row of a matrix is a vector; we will use this in Section 1.4, “Associativity is used for the first two equalities below: > «er we fF al) = A (Ba BBVA) = ATA" (AB) (BA!) = 4 (BBA) = A(BBY A) = AA) aay ao ‘The proof of Theorem 1.2.17 ig straightforward and is left as Exercise 1.2.14. 110 10 3 i. 3 0 A symmetric matrix © 12 1 03 2-3 0. ‘An anti-symmetric matrix 10 8 0200 oo10 0000 ‘An upper triangular matrix Soon cone once Heooo A diagonal matrix Exercise 1.2.10 asks you to show that if A and B are upper trian- gular nx n matrices, then 80 is AB. 1.2 Matrices 43 ‘Theorem 1.2.17 (The transpose of a product). The transpose of a product is the product of the transposes in reverse order: (AB)" = BTAT. 12.18 Some special kinds of matrices Definition 1.2.18 (Symmetric matrix). A symmetric matrix is equal to its transpose. An anti-symmetric matrix is equal to minus its transpose. Definition 1.2.19 (Triangular matrix). An upper triangular matrix is a ‘square matrix with nonzero entries only on or above the main diagonal. A lower triangular matrix is a square matrix with nonzero entries only on or below the main diagonal. Definition 1.2.20 (Diagonal matrix). A diagonal matrix is a square matrix with nonzero entries (if any) only on the main diagonal. What happens if you square the diagonal matrix (3 al If you cube it? Applications of matrix multiplication: probabilities and graphs While from the perspective of this book matrices are most important because they represent linear transformations, discussed in the next section, there are other important applications of matrix multiplication. Two good examples are probability theory and graph theory. Example 1.2.21 (Matrices and probabilities). Suppose you have three reference books on a shelf: a thesaurus, a French dictionary, and an English dictionary. Each time you consult one of these books, you put it back on the shelf at the far left. When you need a reference, we denote the probability that it will be the thesaurus P,, the French dictionary P, and the English dictionary Ps, There are six possible arrangements on the shelf: 123 (thesaurus, French dictionary, English dictionary), 132, and so on. [bP -[s abe 7 -[e 2) For example, the move from (213) to (321) has probability Ps (associated with the English dic- tionary), since if you start with the order (213) (French dictio- nary, thesaurus, English dictio- rary), consult the English dictio- inary, and put it back to the far left, you will then have the order (321). So the entry at the 3rd row, 6th column is Py. ‘The move from (213) to (312) has proba bility 0, since moving the English dictionary won't change the posi- tion of the other books. So the entry at the 3rd row, 5th column iso. A situation like this one, where each outcome depends only on the fone just before it, it called a Markov chain, Sometimes easy access isn't the goal. In Zola’s novel Au Bon- heur des Dames. the epic story of the growth of the first big depart- ment store in Paris, the hero has fan inspiration: he places his mer- chandise in the most inconvenient arrangement possible, forcing his customers to pass through parts of the store where they otherwise wouldn't set foot, and which are imined with temptations for im- pulse shopping. 44° Chapter 1. Vectors. 
Matrices, and Derivatives ‘We can then write the following 6 x 6 transition matrix, indicating the prob- ability of going from one arrangement to another: (1.2.3) (1,32) (2,13) (23,1) (31,2) (8,21) 0 0 0 (2.3) Ph Py Py (1.3.2) O a Ph 0 Py 0 (21.3) Ph 0 Pr 0 0 Py (2.3.1) 0 P 0 0 Pr 0 Ps (a2) 0) A 0 Pa Py 0 (3.21) 0 PB 0 P, 0 Py Now say you start with the fourth arrangement, (2,3,1). Multiplying the line matrix (0,0,0, 1,0, 0) (probability 1 for the fourth choice, 0 for the others) by the transition matrix T gives the probabilities P;,0,0, P20, Ps. This is of course just the 4th row of the matrix. The interesting point here is to explore the long-term probabilities. At the second step, we would multiply the line matrix P1,0,0, Pz,0, Ps by T; at the third we would multiply that product by T, . If we know actual values for Pj, Pz, and Ps we can compute the probabilities for the various configurations after a great many iterations. If we don’t know the probabilities, we can use this system to deduce them from the configuration of the bookshelf after different numbers of iterations. ‘This kind of approach is useful in determining efficient storage. How should a lumber yard store different sizes and types of woods, so as little time as possible is lost digging out a particular plank from under others? For computers, what applications should be easier to access than others? Based on the way you use your computer, how should its operating system store data most efficiently? A. Example 1.2.22 is important for many applications. It introduces no new theory and can be skipped if time is at a premium, but it provides an enter~ taining setting for practice at matrix multiplication, while showing some of its power. Example 1.2.22 (Matrices and graphs). We are going to take walks on the edges of a unit cube; if in going from a vertex V; to another vertex V_ we walk along n edges. we will say that our walk is of length n. For example, in Figure 1.2.6. if we go from vertex V; to Ve, passing by V4 and Vs, the total length of our walk is 3. We will stipulate that each segment of the walk has to take us from one vertex to a different vertex; the shortest possible walk from a vertex to itself is of length 2. How many walks of length n are there that go from a vertex to itself, or, more generally, from a given vertex to a second vertex? As we will see in Proposition 1.2.23, we answer that question by raising to the nth power the adjacency matrix of the graph. The adjacency matrix for our cube is the 8 x 8 matrix You may appreciate this result more if you try to make a rough estimate of the number of walks of length 4 from a vertex to itself. ‘The authors did and discovered later that they had missed quite 1 few possible walks. ‘As you would expect, all the 1's in the adjacency matrix A have turned into 0's in A‘; if two ver- tices are connected by a single ‘edge, then when n is even there will be no walks of length n be- ‘tween them. Of course we used a computer to compute this matrix. For all but simple problems involving ma- trix multiplication, use Matlab or an equivalent. 1.2 Matrices 45 whose rows and columns are labeled by the vertices Vi,..., Vs, and such that the i, jth entry is 1 if there is an edge joining V; to V;, and 0 if not, as shown in Figure 1.2.6. For example, the entry 4, 1 is 1 (underlined in the matrix) because there is an edge joining Vs to Vi; the entry 4,6 is 0 (also underlined) because there is no edge joining Vs to Vs. 
—_ Vive Va YU Vs Ve vw Ve aN M0 1 0 1 0 1 6 0 io 1 0 i 0 0 6 ft oO 2 hn y 01010001 A= i 6 10 1 6 0 0 ee ee M1 0 0 0 1 0 4 0 hv WoO 1000101 : “WOO 101010 FIGURE 1.2.6. Left: The graph of a cube. Right: Its adjacency matrix A. If two vertices Vi and Vj are joined by a single edge, the (i, )th and (j,)th entries of the matrix are 1; otherwise they are 0. ‘The reason this matrix is important is the following. Proposition 1.2.23. For any graph formed of vertices connected by edges, the number of possible walks of length n from vertex V; to vertex V; is given by the i, jth entry of the matrix A” formed by taking the nth power of the graph’s adjacency matrix A. For example, there are 20 different walks of length 4 from Vs to Vz (or vice versa), but no walks of length 4 ftom Vs to Va because 210 20 0 20 0 2 0 0 21 0 2 0 2 0 2 2% 0 2 0 2% 0 2% 0 Aba] 9 2 0 2% 0 2% 0 2% 20 0 20 0 21 0 2 0 0 2 0 2 0 2 0 2 20 0 2 0 2 0 2 0 0 2 0 2 0 2 0 2 Proof. This will be proved by induction, in the context of the graph above; the general case is the same. Let By be the 8 x 8 matrix whose i, jth entry is the number of walks from V; to V; of length n, for a graph with eight vertices; Like the transition matrices of probability theory, matrices repre- senting the length of walks from ‘one vertex of a graph to another have important applications for computers and multiprocessing. Exercise 1.2.15 asks you to con- struct the adjacency matrix for a triangle and for a square. We can also make a matrix that al- lows for one-way streets (one-way edges), as Exercise 1.2.18 asks you to show. 46 Chapter 1. Vectors, Matrices, and Derivatives we must prove B, = A". First notice that By = A! = A: the number A,,, is exactly the number of walks of length 1 from v, to v, Next, suppose it is true for n, and let us see it for n+ 1. A walk of length n+ from V, to V, must be at some vertex V; at time n. The number of such walks is the sum, over all such Ve, of the number of ways of getting from V, to Vi in ni steps, times the number of ways of getting from Vj to V; in one step. This will be 1 if Vi is next to V;. and 0 otherwise. In symbols, this becomes (Baths = (Brie (Bide — =. ete ey, Pitas “Stas vow 5 = A") Ay SAA ea = (4% os s inoncve 354, PE pats, oe which is precisely the definition of A"*!. O Above, what do we mean by A"? If you look at the proof, you will see that what we used was AM = ((...(A)A)A)AL 1.2.20 viacorn Matrix multiplication is associative, so you can also put the parentheses any ‘way you want; for example, AN = (A(ACA). )): 1.2.21 In this case, we can see that itis true, and simultaneously make the associativity Jess abstract: with the definition above, BnBm = Buim- Indeed, a walk of length n +m from V; to V; is a walk of length n from V, to some Vi, followed by a walk of length m from Vi to V;. In formulas, this gives (Bnmis = Yo(Bn)ia(Bm)i5- 1.2.22 1.3 WHAT THE AcTors Do: A MATRIX AS A TRANSFORMATION In Section 2.2 we will see how matrices are used to solve systems of linear equations, but first let us consider a different view of matrices. In that view, multiplication of a matrix by a vector is seen as a linear transformation, a special kind of mapping. This is the central notion of linear algebra, which ‘The words mapping (or map) ‘and function are synonyms, gen- erally used in different contexts. A function normally takes a point ‘and gives a number. Mapping is a more recent word; it was first used in topology and geometry and has spread to all parts of mathemat- ies. 
In higher dimensions, we tend to use the word mapping rather than function, but there is noth- ing wrong with calling a mapping from R° — R® a function. \— eZ Ea Figure 1.3.1. ‘A mapping: every point on the left goes to only one point on the righ eee Figure 1.3.2. Not a mapping: not well de- fined at a, not defined at 6. ‘The domain of our mathemati- cal “final grade function” is IR; ita range is R. In practice this func- tion has a “socially acceptable” domain of the realistic grade vee- tors (no negative numbers, for ex- ample) and also a “socially accept- able" range, the set of possible fi- nal grades. Often mathemati- cal function modeling a real sys- tem has domain and range consid- erably larger than the realistic val- ues. 1.3. Transformations 47 allows us to put matrices in context and to see them as something other than “pushing numbers around.” Mappings ‘A mapping associates elements of one set to elements of another. In common speech, we deal with mappings all the time. Like the character in Moliére's play Le Bourgeois Gentilhomme, who discovered that he had been speaking prose all his life without knowing it, we use mappings from early childhood, typically with the word “of” or its equivalent: “the price of a book” goes from books to money; “the capital of a country” goes from countries to cities. This is not an analogy intended to ease you into the subject. “The father of” is a mapping. not “sort of like” a mapping. We could write it with symbols: J (2) = y where x = a person and y = that person’s father: f(John Jr.) = John. (Of course in English it would be more natural to say, “John Jr.'s father” rather than “the father of John Jr.” A school of algebraists exists that uses this notation: they write (z)f rather than f(z).) The difference between expressions like “the father of” in everyday speech and mathematical mappings is that in mathematics one must, be explicit, about. things that are taken for granted in speech. Rigorous mathematical terminology requires specifying three things about a mapping: (1) the set of departure (the domain), (2) the set of arrival (the range), (3) a rule going from one to the other. If the domain of a mapping M is the real numbers R and its range is the rational numbers Q, we denote it M: R — Q, which we read “M from B to Q.” Such a mapping takes a real number as input and gives a rational number as output. What about a mapping T : R" — R™? Its input is a vector with n entries; its output is a vector with m entries: for example, the mapping from R” to R that takes n grades on homework, tests, and the final exam and gives you a final grade in a course. The rule for the “final grade” mapping above consists of giving weights to homework, tests, and the final exam. But the rule for a mapping does not have to be soincthing that can be stated in a neat mathematical formula. For example, the mapping M : R — R that changes every digit 3 and turns it into a 5 is a valid mapping. When you invent a mapping you enjoy the rights of an absolute dictator: you don't have to justify your mapping by saying that “look, if you square a number z, then inultiply it by the cosine of 2m, subtract 7 and then raise the whole thing to the power 3/2, and finally do such-and-such, then Note that in correct mathem: ical usage, “the father of” ay a mapping from people to people is not the sane mapping as “the fa- ther of” as a mapping from peo- ple to men. A mapping includes a domain, a range. and a rule going from the first to the second. 
48 Chapter I, Vectors, Matrices, and Derivatives if contains a 3, that 3 will tum into a 5, and everything else will remain unichanged.” There isn’t any such sequence of operations that will “carry out” your mapping for you. and you don't need one.® ‘A mapping going “from” IR" “to” 2.” is said to be defined on its domain 2" ‘A mapping in the mathematical sense inust be well defined: it must be defined at every point of the domain, and for each, must return a unique element of the range. A mapping takes you, unambiguously, from one element of the set of departure to onc element of the set of arrival, as shown in Figures 1.3.1 and 1.3.2. (This does not mean that you can go unambiguously (or at all) in the reverse direction: in Figure 1.3.1. going backwards from the point d in the range will take you to either a or b in the domain, and there is na path from ¢ in the range to any point in the domain.) Not all expressions “the this of the that” are true mappings in this sense. “The daughter of.” as a mapping from people to girls and womten. is not ev- ‘erywhere defined. because not everyone has a daughter; it is nit well defined because some people have more than one danghter. It is not a mapping, But “the number of daughters of,” as a mapping from womten to numbers, is every- where defined and well defined, at a particular time. And “the father of,” as a mapping from people to men, is everywhere defined, and well defined; every person has a father, and only one. (We speak here of biological fathers.) Remark. We use the word “range” to mean the space of arrival, or “target space”; some authors use it to mean those elements of the arrival space that are actually reached. In that usage, the range of the squaring fimiction F : R + R, given by F(x) = .r? is the non-negative real numbers. while in our usage the range is R. We will see in Section 2.5 that what these authors call the range, we call the image. As far as we know, those authors who se the word range to denote the image either have no word for the space of arrival, or use the word interchangeably to inean both space of arrival and image. We find it useful to have two distinct words to denote these two distinct objects. A Here's another “pathological” but perfectly valid mapping: the mapping M : % + ‘R.that takes every number in the interval (0, 1j that can be written in base 3 without using I's, changes every 2 to a 1, and then considers the result as a number in base 2. If the number has a 1, it changes ail the digits after the first 1 into O's and considers the result as a number in base 2. Cantor proposed this mapping to point out the need for greater precision in a number of theorems. in particular the fundamental theorem of calculus. At the time it was viewed as pathological but it turns out to be important for understanding Newton’s method for cubic polynomials in the complex. Mappings Sunt ket occur everywhere in complex dynamics surprising dnovery ofthe erly 1980's. 1.3 Transformations 49 Existence and uniqueness of solutions Given a mapping T, is there a solution to the equation T(z) = 6, for every b in the range (set of arrival)? If so, the mapping is said to be onto, or surjective. “Onto” is thus a way to talk about the eristence of solutions. The mapping “the father of” as a mapping from people to men is not onto, because not all men are fathers. There is no solution to the equation “The father of z is Mr. Childless.” An onto mapping is shown in Figure 1.3.3. A second question of interest concerns uniqueness of solutions. 
Is there at most one solution to the equation T(z) = 6, for every 6 in the set of arrival, or might there be many? If there is at most one solution to the equation T(z) = b, the mapping T is said to be one to one, or injective. The mapping “the father of” is not one to one. There are, in fact, four solutions to the equation “The father of z is John Hubbard.” But the mapping “the twin sibling of,” as a mapping from twins to twins, is one to one: the equation “the twin sibling of x = y” has a unique solution for each y. “One to one” is thus a way to talk about the uniqueness of solutions. A one to one mapping is shown in Figure 1.3.4. A mapping T that is both onto and one to one (also called bijective) has Ficure 1.3.4. an inverse mapping T~' that undoes it. Because T is onto, T! is everywhere ‘A mapping: 1-1, not onto, no defined; because T is one to one, T-) is well defined. So T-? qualifies as a ry ey mapping. To summarize: Figure 1.3.3. ‘An onto mapping, not 1-1, @ and b g0 to the same point. Definition 1.3.1 (Onto). A mapping is onto (or surjective) if every element of the set of arrival corresponds to at least one element of the set of departure. Definition 1.3.2 (One to one). A mapping is one to one (or injective) if every element of the set of arrival corresponds to at most one element of the set of departure. “Onto” is a way to talk about — crenea dung, Definition 1.3.3 (Blijective). A mapping is bijective if itis both onto and to the equation T(z) = b forevery 08 tone. AA bijective mapping is invertible. © in the set of arrival (the range of T). “One to one” is a way to talk about the uniqueness of solu. Example 1.3.4 (One to one and onto). The mapping “the Social Security tions: Tis one to one if for every number of” as a mapping from Americans to numbers is not onto because there 6 there is at most one solution to exist numbers that aren’t Social Security numbers. But it is one to one: no two the equation T(z) = b. Americans have the same Social Security number. The mapping f(z) = z? from real numbers to real positive numbers is onto because every real positive number has a real square root, but it is not one to one because every real positive number has both a positive and a negative square root. ‘A composition is written from left to right but computed from right to left: you apply the map- ping g to the argument z and then apply the mapping J to the result. Exercise 1.3.12 provides some practice. When computers do composi- tions it is not quite true that com- position is associative. One way of doing the calculation may be more computationally effective than an- other; because of round-off errors, the computer may even come up with different answers, depend- ing on where the parentheses are placed. Although composition is asso- ciative, in many settings, ((fea)on) and (fo(90h)) correspond to different ways of inking. Already, the “father of the maternal grandfather” and “the paternal grandfather of the mother” are two ways of thinking of the same person; the author of a biography might use the first term when focusing on the relationship between the subject's grandfather and that grandfather's father, and use the other when focusing on the relationship between the subject's ‘mother and her grandfather. 50 Chapter 1. Vectors, Matrices, and Der Composition of mappings Often one wishes to apply, consecutively, more than one mapping. This is known a8 composition. Definition 1.3.5 (Composition). 
The composition f og of two mapping, f and g, is (Fea)(z) = £(9(2))- 1.3.1 Example 1.3.6 (Composition of “the father of” and “the mother of”). Consider the following two mappings from the set of persons to the set of persons (alive or dead): F, “the father of,” and M, “the mother of.” Composing these gives: FoM (the father of the mother of = maternal grandfather of) MoF (the mother of the father of = paternal grandmother of). It is clear in this case that composition is associative: Fo(FoM)=(FoF)oM. 132 ‘The father of David's maternal grandfather is the same person as the paternal grandfather of David's mother. Of course it is not commutative: the “father of the mother” is not the “mother of the father.”) A. Example 1.3.7 (Composition of two functions). If f(z) = 2-1, and g(z) = 2?, then (fo 9)(2) = f(9(z)) =27-1. A 13.3 Proposition 1.3.8 (Composition is associative). Composition is asso- ciative: fogoh=(fog)oh=fo(goh). 134 Proof. This is simply the computation (£09) 0h) (2) = (F o9)(A(2)) = F(a(A(z))) whereas (Fe (goh))(z) = F((goh)(z)) = f(9(A(z))). O 1.3.5 You may find this “proof” devoid of content. Composition of mappings is part of our basic thought processes: you use a composition any time you speak of “the this of the that of the other.” ‘The words transformation and ‘mapping are synonyms, 30 we could call the matrix A of Figure 1.3.5 @ mapping. But in linear al- gebra the word transformation is ‘more common. In fact, the matrix Ais a linear transformation, but we haven't formally defined that term yet. Mathematicians usually denote ‘linear transformation by its as- rather than say- inners to shopping list” transformation is the multi- plication Ab = é, they would call this transformation A. 1.3. Transformations 51 Matrices and transformations ‘A special class of mappings consists of those mappings that are encoded by matrices. By “encoded” we mean that multiplication by a matrix is the rule that turns an input vector into an output vector: just as /(z) = y takes a number z and gives y, AV = W takes a vector ¥ and gives a vector W. Such mappings, called linear transformations, are of central importance in linear algebra (and every place else in mathematics). Throughout mathemat- ics, the constructs of central interest are the mappings that preserve whatever structure is at hand. In linear algebra, “preserve structure” means that you can first add, then map, or first map, then add, and get the same answer; similarly, first multiplying by a scalar and then mapping gives the same result as first mapping and then multiplying by a scalar.) One of the great discoveries at the end of the 19th century was that the natural way to do mathematics is to look at sets with structure, such as IR”, with addition and multiplication by scalars, and to consider the mappings that preserve that structure. ‘We give a mathematical definition of linear transforinations in Definition 1.3.11, but first let’s see an example. Example 1.3.9 (Frozen dinners). In a food processing plant making three types of frozen dinners, one might associate the number of dinners of various sorts produced to the total ingredients needed (beef, chicken, noodles, cream, salt, ... ). As shown in Figure 1.3.5, this mapping is given by multiplication (on the left) by the matrix A, which gives the amount of each ingredient needed for each dinner: A tells how to go from b, which tells how many dinners of each kind are produced, to the product €, which tells the total ingredients needed. 
For example, 21 pounds of beef are needed, because (.25 x 60)+(.20x30)+(0x 40) = 21, For chicken, (0 x 60) + (0 x 30) + (.45 x 40) = 18. 5 Dione T 60 strogano ) 30 ravioli 40 fried chicken Ibs. of beef + 25 2 0 21 Ib of beef Ibs. of chicken + 0 0 45 18 Ib of chicken Ibe. of noodles + ++Ib of noodles Ibe. of rice + . : «Ib of rice liters of cream — te : - «liters of cream, Deof troganofl ravioll fred chicken __a_—_—--—_—_—eE_I_é__———- A Ingrediente per dinner 2 Total needed FIGURE 1.3.5. The matrix A is the transformation associating the number of dinners of various sorts produced to the total ingredients needed. A. Notice that matrix multiplica- tion emphatically does not allow for feedback. For instance, it does not allow for the possibility that if you buy more you will get a discount for quantity, or that if you buy even more you might cre- ‘ate scarcity and drive prices up. Thia is a key feature of linearity, and is the fundamental weakness of all models that linearize map- pings and interactions. T 2x )= 2T(x) Ficure 1.3.6. For any linear transformation T, T(ax) = a7(x). 52 Chapter 1. Vectors, Matrices, and Derivatives Example 1.3.10 (Frozen foods: composition). For the food plant of Example 1.3.9, one might make a matrix D, 1 high and n wide (n being the total number of ingredients), that would list the price of each ingredient, per pound or liter. The product DA would then tell the cost of each ingredient in each dinner, since A tells how much of each ingredient is in each dinner. The product (DA)b would give the total cost of the ingredients for all b dinners. ‘We could also compose these transformations in a different order, frst figuring how much of each ingredient we need for all b dinners—the product AB. Then, using D, we could figure the total cost: D(Ab). Clearly, (DA)b = D(Ab), although the two correspond to slightly different perspectives. A Real-life matrices We kept Example 1.3.9 simple, but you can easily see how this works in a more realistic situation. In real life—modeling the economy, designing buildings, modeling airflow over the wing of an airplane—vectors of input data contain tens of thousands of entries, or more, and the matrix giving the transformation has millions of entries. ‘We hope you can begin to see that a matrix might be a very useful way of mapping from R” to R". To go from IR, where vectors all have three entries, wy 1 vs (= to R¢, where vectors have four entries, W = | 02, you would 3 0% u = vy wt 136 we ws wa ‘One can imagine doing the same thing when the n and m of R” and R™ are arbitrarily large. One can somewhat less easily imagine extending the same idea to infinite-dimensional spaces, but making sense of the notion of multiplication of infinite matrices gets into some deep water, beyond the scope of this book. Our matrices are finite: rectangular arrays, m high and n wide. ws multiply ¥ on the left by a 4 x 3 matrix: Linearity The assumption that a transformation is linear is the main simplifying axsump- tion that scientists and social scientists (especially economists) make to under- stand their models of the world. Roughly speaking, linearity means that if you double the input, you double the output; triple the input, triple the output The Italian mathemat vatore Pincherle, one of the early pioneers of linear algebra, called a linear transformation a distribu- tive transformation (operazioni distributive), a name that is per- hhape more suggestive of the formu- Jes than is “linear.” Every linear transformation is given by a matrix. 
The matrix can bbe found by seeing how the trans- formation acts on the standard ba- sis vectors 1 0 2 z. : a=}.). én = 0 0. 1 1.3 Transformations 53 In Example 1.3.9, the transformation A is linear: each frozen beef stroganoff dinner will require the same amount of beef, whether one is making one dinner or 10,000. We treated the price function D in Example 1.3.10 as linear, but in real life it is cheaper per pound to buy 10,000 pounds of beef than one. Many, perhaps most, real-life problems are nonlinear. It is always easier to treat them as if they were linear; knowing when it is safe to do so is a central issue of applied mathematics. Definition 1.3.11 (Linear transformation). A linear transformation T :R" + R™ is a mapping such that for ell scalars a and all ¥, WER", T(#+W)=7(%)+7(W) and T(a¥) = a7(¥). 13.7 ‘The two formulas can be combined into one (where 6 is also a scalar): T(a¥ + bw) = aT (7) + OT (W). 138 Example 1.3.12 (Linearity at the checkout counter). Suppose you need to buy three gallons of cider and six packages of doughnuts for a Halloween party. The transformation T is performed by the scanner at the checkout counter, reading the UPC code to determine the price. Equation 1.3.7 is noth- ing but the obvious statement that if you do your shopping all at once, it will cost you exactly the same amount as it will if you go through the checkout line nine times, once for each item: T(3gal-cider +6 pkg. doughnuts) =3(T(1 gal.cider)) +6(7(1 pkg. doughnuts), unless the supermarket introduces nonlinearities such as “buy two, get one free.” Example 1.3.13 (A matrix gives a linear transformation). Let A be an mx n matrix. Then A defines a linear transformation T : R" + R™ by matrix multiplication: Av. 13.9 T(¥) Such mappings are indeed linear, because A(7+#) = AV+AW and A(c?) = CAV, as you are asked to check in Exercise 1.3.14. A ‘The crucial result of Theorem 1.3.14 below is that every linear transformation R" — R™ is given by a matrix, which one can construct by seeing how the transformation acts on the standard basis vectors. This is rather remarkable. A priori the notion of a transformation from R® to R™ is quite vague and abstract; one might not think that merely by imposing the condition of lineatity one could say something so precise about this shapeless set of mappings as saying that each is given by a matrix. 54 Chapter 1. Vectors, Matrices, and Derivatives ‘Theorem 1.3.14 (Linear transformations given by matrices). Every linear transformation T : R" — R™ is given by multiplication by the m xn To find the matrix for a linear matrix (Tl, the ith column of which is T(&;). transformation, ask: what is the Ceatotne teeth bese vectors? _ Putting the columns together, this gives 7(@) = (T}¥. This means that The ith column of the matrix for Example 1.3.13 is “the general” linear transformation in R”. a linear transformation T is T(é,); to get the ith column of the ma- Proof. Start with a linear transformation T : R" -» R", and manufacture the ask: what does the trans- matrix [T'] according to the rule given immediately above: the ith column of IT] is T(G). We may write any vector 7 € R” in terms of the standard basis vectors: y % 1 0 0 1 v2 0 1 ; 1 veal leu|ilon [ole eul if. 13.10 ' : : i 0 ! Un 0. 0. 1 4 ~~ ~ % % * pions tee We can write this more succinetly: The orthogonal projection of the point (}) onto the zaxis V= 18) + usd: +---+tméq, oF, with sum notation, A is the point (3): Projection” Then by linearity, means we draw a line from the point to the z-axis. 
“Orthogonal” TW =TY ve 7 uT(&), 1.3.12 awe Pema dicular to the z-axis, which is precisely the column vector [T]¥. a If this isn’t apparent, try translating it out of sum notation: * 7%) 1) 6.) } Sarwan [fon ff ete] iad a a ~ “ a naa sat te oe here u % [ r | "| a ine Figure 1.3.8. of Every point on one face is re- flected to the corresponding point Example 1.3.15 (Finding the matrix of a linear transformation). What of the other. is the matrix for the transformation that takes any point in IR? and gives its 1.3 Transformations 55 orthogonal (perpendicular) projection on the z-axis, as illustrated in Figure 1.3.7? You should assume this transformation is linear. Check your answer in the footnote below.” ‘What is the orthogonal projection on the linc of equation x = y of the point (3)? Again, assume this isa Hnear transformation, and check below * Example 1.3.16 (Reflection with respect to a line through the origin). Let us show that the transformation that reflects a point through a line through the origin is linear. This is the transformation that takes a point on one side of the line and moves it perpendicular to the line, crosses it, and continues the same distance away from the line, as shown in Figure 1.3.8. We will first assume that the transformation T is linear, and thus given by a matrix whose ith column is T(é;). Again, all we have to do is figure out what the T does to & and &. We can then apply that transformation to any point we like, by multiplying it by the matrix. There’s no need to do an elaborate computation for each point. To obtain the first column of our matrix we thus consider where € is mapped to. Suppose that our line makes an angle @ (theta) with the z-axis, as shown tl To get the second column, we in Figure 1.3.9. Then & is mapped to [ 1]. oo} 1 o) fa) ft ith this questi : 2] [2] = [8]. tryou had trouble with this question, you are making hard for yourself. The power of Theorem 1.3.14 is that you don't need to look for the transformation itself to construct its matriz. Just ask: what is the result of applying that transformation to the standard basis vectors? The ith column of the matrix for a linear transformation T is T(é;). So to get the first column of the matrix, ask, what does the transformation do to &? Since &; lies on the z-axis, it is projected onto iteelf. The frst column of the transformation matrix is thus (3): The second standard basis vector, &, lies on the y-axis and is projected onto the origin, so the "The matrix is hich you will note is consistent with Figure 1.3.7, since too second corn ofthe mati [2 The matrix for this inear transformation i [1/3 12 1/2 line from (3) to the line of equation x = y intersects that line at (18). as does , since the perpendicular the parcial ine tom [9]. dati th tgp ofthe pit ip (3), many [8/2 ¥2] [3] = [2]. ne sat we te cnet point ( _? ) as a vector in order to carry out the multiplication; we can’t multiply a rtd» on i) 56 Chapter 1. Vectors, Matrices, and Der see that &2 is mapped to cos(20 - 90°)] _[ sin 20 an sin (20 ~ 90°)] = |—cos 20 So the “reflection” matrix is Tey) 20 sin 26) root eee 1315 For example, we can compute that the point with coordinates z = 2,y = 1 . 8 a) : ‘2cos 20 + sin 20 reflects to the point fig 26 _ coa.29 |" sine sin 20 0828 sin20] [2] _ [2c0s20 + sin 20 Te ae sin28 ~cos 20 7] = [patos cees eet ‘The transformation is indeed linear because given two vectors ¥ and W, we Ficure 1.3.9. have T(¥ + W) = T(¥) + T(W), as shown in Figure 1.3.10. It is also apparent ee ae he fi T(c¥) =cT(¥). 
A Paget from the figure that T(c¥) = cT(@) oe A [on Example 1.3.17 (Rotation by an angle 8). The matrix giving the trans- 9, sin 26 }* formation R (“rotation by @ around the origin”) is and : 5.) rig, _ [0088 -sind «-[° sin2@ (R(@1), R(@)] = eee eo : 1} °° |-cos26 ‘The transformation is linear, as shown in Figure 1.3.11: rather than thinking of rotating just the vectors # and W, we can rotate the whole parallelogram P(¥,¥#) that they span. Then R(P(V,W)) is the parallelogram spanned by R(#), R(#), and in particular the diagonal of R(P(V,W)) is R(V+W). A Ty) = Tet) Exercise 1.3.15 asks you to use composition of the transformation in Example 1.3.17 to derive the fundamental theorems of trigo- nometry. FIGURE 1.3.10. Reflection is linear: the sum of the reflections is the reflection of the sum. 1.3 Transformations 57 isforination RS — RS Exercise 1.3.13 asks you to find the matrix for the tr that rotates by 30° around the y-axis. Now we will see that composition corresponds to matrix multiplication. Theorem 1.3.18 (Composition corresponds to matrix multiplics- tion). Suppose $:R" + R™ and T: R™ — R! are linear transformations given by the matrices [5] and {T] respectively. Then the matrix of the com- Rea) = oe position T oS equals the product (T)[S] of the matrices of S and T: \ a [res] = (IIs). 13.17 7 ii Proof. This is » statement about matrix multiplication and cannot be proved without explicit reference to how matrices are multiplied. Our only references to the multiplication algorithm will be the following facts, both discussed in Section 1.1. (1) 48; is the ith column of A (as illustrated by Example 1.2.5); (2) the ith column of AB is Abi, where 6, is the ith column of B (as illustrated by Example 1.2.6). Fioure 1.3.11. Now to prove the theorem; to make it unambiguous when we are applying Rotation is linear: the sum of transformation to a variable and when we are multiplying matrices, we will the rotations is the rotation of the write matrix multiplication with a star +. sum. The composition (To S) is itself a linear transformation and thus can be given by a matrix, which we will call [T'o S], accounting for the first equality Many mathematicians would below. The definition of composition gives the second equality. Next we replace say that Theorem 1.3.18 justifies by its matrix [S], and finally we replace T by its matri et Ne ey cat cadets — (Tes = (ToSyB) = 7(5(@)) = TUS] «&) = (r+ (I5}* novice, who probably feels that ‘So the first term in this sequence, [T' 0 S] * 8, which is the ith column of ‘composition of linear mappings is [To S] by fact (1), is equal to ‘more bi than matrix multi- ss {T) + the ith column of {5}, 1319 which is the ith column of [7] + [S] by fact (2). Each column of [T'o S] is equal to the corresponding column of [T] * [5], so the two matrices are equal. O) 1.3.18 Exercise 1.3.16 asks you tocon- We gave a computational proof of the associativity of matrix multiplication firm by matrix multipl ication that in Proposition 1.2.8; this associativity is also an immediate consequence of reflecting a point across the line, Theorem 1.3.18. and then back again, lands you beck at the original point. Corollary 1.3.19. Matrix multiplication is associative: if A, B, C are matri- ces such that the matrix multiplication (AB) C is allowed, then 80 is A(BC), and they are equal. Proof. Composition of mappings is associative. O 58 Chapter 1. Vectors, Matrices, and Derivatives 1.4 GEOMETRY OF BR" . 
To acquire the feeling for calculus that is indispensable even in the most abstract speculations, one must have learned to distinguish that which is “big” from that which is “little,” that which is “preponderant” and that which is “negligible. "Jean Dieudonné, Calcul infinitésimal Whereas algebra is all about equalities, calculus is about inequalities: about things being arbitrarily small or large, about some terms being dominant or negligible compared to others. Rather than saying that things are exactly true, we need to be able to say that they are almost true, so they “become true in the limit.” For example, (5 + A)® = 125+ 75h +. approximation 80 if h = .01, we could use the (5.01)? = 125 + (75 - .01) = 125.75. 141 The issue then is to quantify the error. ‘Such notions cannot be discussed in the language about R" that has been developed so far: we need lengths of vectors to say that vectors are small, of that points are close to each other. We will also need lengths of matrices to say that linear transformations are “close” to each other. Having a notion of dis- tance between transformations will be crucial in proving that under appropriate circumstances Newton's method converges to a solution (Section 2.7). In this section we introduce these notions. The formulas are all more or less immediate generalizations of the Pythagorean theorem and the cosine law, but they acquire a whole new meaning in higher dimensions (and more yet in infinitely many dimensions). The dot product ‘The dot product in RR" is the basic construct that gives notions of lengths and angles. to all the geometric Definition 1.4.1 (Dot product). The dot product %- ¥ of two vectors % FER" is: 1) fu . wa} | ve ‘The dot product is also known Y=]. /- = tii +222 +--+ Tada. 14.2 ‘as the standard inner product. : : tnd Lym. For example, 7 1 :] [| = (1x1) +(2x0)+ (8x1) ‘What we call the length of ‘vector is often called the Buclidean norm. Some texts use double lines to denote the length of a vector: lI! rather than ||. We reserve double lines to denote the norm of a ma- trix, defined in Section 2.8. Please do not confuse the length of a vec- tor with the absolute value of a number. In one dimension, the ‘two are the same; the “length” of the one-entry vector ¥ = [=2 is VP = 2. 14 Geometry of R” 59 The dot product is obviously commutative: R-VHF-%, 143 is distributive, ie., that and it is not much harder to check that it (Hi +o) = (K-Hi) + (K-Fa), and (B+ 2) -F = (% 7) + R-¥)- ‘The dot product of two vectors can be written as the matriz product of the transpose of one vector by the other: %-¥ = "9 =97X. 14.4 " ay fu we mR i + [+ [| is the same as ; tnd Lye 1 t2:°+ tn) [tin + 22¥2 +--+ Zndn] Seah ——— a ‘ranepooe 37 9 145 Conversely, the #, jth entry of the matrix product AB is the dot product of the jth column of B and the transpose of the ith row of A. For example, the entry 1,2 of AB below is 5, which is the dot product of the transpose of the firet row of A and the second column of B: 2 ao —o a = il 6 [3 2 3 5)) b= [2] (] . 14.6 304 7 13, ing ty Ye 4 en Definition 1.4.2 (Length of a vector). The length || of a vector < is bi] = VER = 23 +9 + +22. 1.4.7 1 What is the length |¥] of 7 = (‘|r 1 Length and dot product: geometric interpretation in R? and R In the plane and in space, the length of a vector has a geometric interpretation: {| is then the ordinary distance between 0 and %. As indicated by Figure 1.4.1, = VPS FT = V3. 
“its length is 17) Definition 1.4.2 is a version of the Pythagorean theorem: in two dimensions, the vector = [* is the hypotenin of ight fas gle of which the other two sides have lengths x; and 22: xt aay? +a". FIGURE 1.4.2. The cosine law gives [8-9P = 8? +191? 2121191 cosa. 60 Chapter 1. Vectors, Matrices, and Derivatives this is exactly what the Pythagorean theorem says in the case of the plane; space, this it is still true, since OAB is still a right triangle. | varst on FIGURE 1.4.1. In the plane, the length of the vector with coordinates (a, is the ordinary distance between 0 and the point [3] In space, the length of the vector with coordinates (a,b,c) is the ordinary distance between 0 and the point with coordinates (0,82). ‘The dot product. also has an interpretation in ordinary geometry: Proposition 1.4.3 (Geometric interpretation of the dot product). If%,¥ are two vectors in R? or RS, then 8-9 = [RI|V| cose, 14.8 where a is the angle between % and J. Remark. Proposition 1.4.3 says that the dot product is independent of the coordinate system we use. You can rotate a pair of vectors in the plane, or in space, without changing the dot product, as long as you don’t change the angle between them. A Proof. This is an application of the cosine law from trigonometry, which says that if you know all of a triangle’s sides, or two sides and the angle between them, or two angles and a side, then you can determine the others. Let a triangle have sides of length a,), , and let + be the angle opposite the side with length e. Then ? = a? +B? — 2abcosy. Cosine Law 1.4.9 Consider the triangle formed by the three vectors %,7 and X— J, and let a be the angle between X and ¥, as shown in Figure 1.4.2. If you don’t see how we got the numerator in Equation 1.4.12, note that the dot product of a standard basis vector € and any vector ¥ is the éth entry of #. For example, in B®, rE =0tm+0=m. Figure 1.4.3. The projection of 7 onto the line spanned by 2 is @. This gives RI = [RlVlcosa = basi = baa 1.4 Geometry of R" 61 Applying the cosine law, we find 1-91? = Ix)? + IF? - 21RII¥l cosa. 1.4.10 But we can also write (remembering that the dot product is distributive): 18-9? = (R-¥)-(R-F) = (R-F)-8) - (R-J)-F) (8-8) -(F-*)- Rt (G-H) 14.11 (%-) + (F-¥) — 28-7 = [RP + [VP - 28-7. This leads to ¥-¥ =[ilVleosa, (1.48) which is the formula we want. Example 1.4.4 (Finding an angle). What is the angle between the diagonal of a cube and any side? Let us assume our cube is the unit cube 0 < x,y,z <1, 1 so that the standard basis vectors é, 62, 83 are sides, and the vector d = | 1 is a diagonal. The length of the diagonal is {dj = V3, so the required angle « satisfies wala cosa = iaied w 1.4.12 Thus a = arceos ¥3/3 = 54.7°. A Corollary 1.4.5 restates Proposition 1.4.3 in terms of projections; it is illus trated by Figure 1.4.3. Corollary 1.4.5 (The dot product in terms of projections). If and ¥ are two vectors in R? or R®, then £-J is the product of || and the signed length of the projection of f onto the line spanned by %. The signed length of the projection is positive if it points in the direction of %; it is negative if it points in the opposite direction. Defining angles between vectors in R" ‘We want to use Equation 1.4.8 beckwards, to provide a definition of angles in R", where we can’t invoke elementary geometry when n > 3. Thus, we want to define vw 7 ae) @ = arceos awe ive., define a s0 that cos 2.4.13 VI 3 WY Figune 1.44. 
Left to positive dis- criminant gives two roots; a zero discriminant gives one root; a neg- ative discriminant gives no roots. 62 Chapter 1. Vectors, Matrices, and Derivatives But there’s a problem: how do we know that -1s <1, 14.14 ~ Teil $0 that the arccosine exists? Schwarz's inequality provides the answer.'® It is an absolutely fundamental result regarding dot products. ww ‘Theorem 1.4.6 (Schwara’s Inequality). For any two vectors 7 and W, 1-1 SIF 14.15 The two sides are equal if and only if 7 or W is a multiple of the other by scalar. Proof. Consider the function |¥+ tw]? as a function of t. It is a second degree polynomial of the form at? + bt + c; in fact, Iv + tw? = [tw + 9? = [W?t? + 2(7- w)E+ [VIP 1.4.16 All its values are > 0, since it is the left-hand term squared; therefore, the graph of the polynomial must not cross the t-axis. But remember the quadratic formula you learned in high school: for an equation of the form at? + bt +c = 0, ~b+ VP — tac oe If the discriminant (the quantity 6?—4ac under the square root the equation will have two distinct solutions, and its graph will cross the t-axis twice, as shown in the left-most graph in Figure 1.4.4. Substituting |¥|? for a, 2V-W for b and |¥|? for c, we see that the discriminant of Equation 1.4.16 is t A(9- W)? — 47 PP? 1.4.18 All the values of Equation 1.4.16 are > 0, so its discriminant can’t be positive: 4(7-W)? — 4|7P WP <0, and therefore |7-W] < [7II¥), which is what we wanted to show. ‘The second part of Schwarz’s inequality, that |7 - ¥| = |¥]|W| if and only if ¥ or W is a multiple of the other by a scalar, has two directions. If W is a multiple of ¥, say W = tv, then 12] = Nell? = (ID (HIM) = Ih, 1.4.19 ‘A more abetract form of Schwarz’s inequality concerns inner products of vectors in possibly infinite-dimensional vector spaces, not just the standard dot product: in IR. The general case is no more difficult to prove: the definition of an abstract inner Product is precively what is required to make this proof work. ‘The proof of Schwar2’s inequal- ity is clever; you can follow it line by line, like any proof which is written out in detail, but you won't find it by simply following your nose! There is considerable Contention for the credit: Cauchy and Bunyakovski are often consid- ered the inventors, particularly in France and in Russia. We see that the dot product of two vectors is positive if the angle between them is less than 1/2, and negati bigger than 1/2. We prefer the word orthogonal to its synonym perpendicular for etymological reasons. Orthogonal comes from the Greek for “right angle,” while perpendicular comes from the Latin for “plumb line,” which suggests a vertical line. The word normal is also used, both as a noun and as an adjective, to ‘express a right angle. Figure 1.4.5. The triangle inequality: + yi a2, we sum all a?, for i from 1 ton and j from 1 to m. As in the case of the length of a vector, do not confuse the length |A| of a matrix with the absolute value of a number. (But the length of the 1 x 1 matrix consisting of the single entry [n] is indeed the absolute value of n.) Length and matrix multiplication We said earlier that the point of writing the entries of R"” as matrices is to allow matrix multiplication, yet it isn’t clear that this notion of length, in which a matrix is considered simply as a list of numbers, is in any way related to matrix multiplication. The following proposition says that it is. Proposition 1.4.11. (a) If A is ann x m matrix, and is a vector in R™, then JAB] < |A}16). 
1.4.25 (6) If A is ann x m matrix, and B is am x k matrix, then |AB] < |A|IBI. 1.4.26 Proposition 1.4.11 will soon be- come an old friend; itis a very use- ful tool in a number of proois. Of course, part (a) is the spe- cial case of part (b) (where k = 1), but the intuitive content is suffi- ciently different that we state the two parts separately, In any case, the proof of the second part lows from the frst. Remark 1.4.12. It follows from Proposition 1.4.11 that a linear transformation is continuous. Say- ing that a linear transformation A is continuous means that for every and every x € R®, there exists 6 such that if |¥- 9 < 4, then JAR - A¥| < & By Proposition 14.11, |AR- A¥I= 1A(R-¥)]< 1I]e-J). So, set lar Then if we have | — é vi< 6, I< iz and . \Ale |Ax ~ Ay] < Ale lag ~ Agi < Tr We have actually proved more: the 6 we found did not depend on x; this means that a linear trans- formation RY — Ri always uni- formly continuous. The definition of uniform continuity was given in Equation 0.2.6. A 14 Geometry of R" 65 a Proof. First note that if the niatrix A consists of a single row, ie, if A= is the transpose of a vector , the assertion of the theorem is exactly Schwarz’s inequality: [AB] = |a- Bl < [a5] 14.27 Bi 6} The idea of the proof is to consider that the rows of A are the transposes of vectors aj,...@,, as shown in Figure 1.4.6, and to apply the argument above ‘to each row separately. Remember that since the ith row of A is a], the ith entry (Ab), of AB is precisely the dot product a; - b. (This accounts for the ‘equal sign marked (1) in Equation 1.4.28.) 6 aT a B= (Ab) ay a2 -b = (Ab)2 a matrix A Tara FIGURE 1.4.6. Think of the rows of A as the transposes of the vectors ai, 2... in. Then the product a, B is the same as the dot product a, -B. Note that AD is a vector, not a matrix. ‘This leads to MBP = SBE = oso? 1 Now use Schwarz’s inequality (2); factor out |b|? (step 3), and consider (step 4) the length squared of A to be the sum of the squares of the lengths of a. (Of course, |a,|? = ja" |?). Thus, 5 < Yee = (Se BP = |APIBP. yee s zecwe = (y @ rl 1.4.29 ‘This gives us the result we wanted: [ABP < [AP |B? 66 Chapter 1. Vectors, Matrices, and Derivatives For the second, we decompose the matrix B into its columns and proceed as above. Let By,...,By be the columns of B. Then When solving big systems of A i: A linear questions was in any case _. et ee out of the question, determinants ABI? = DABS? < D|APPIB,F? = 1A? DIB, = [AMBP were a reasonable approach to the gat ah Ce theory of linear equations. With which proves the second part. O the advent of computers they lost importance, as systems of lear equations can be solved far more Determinants in R? effectively with row reduction (to be discussed in Sections 2.1 aud The determinant is a function of square matrices: it takes a square matrix as 2.2). However, determinante have input and gives a number as output. fan intersting geometric interpre tation; in Chapters 4, 5 and espe- cially 6, we use determinants con- Definition 1.4.13 (Determinant in R?). The determinant of a 2 x 2 stantly. trix | 2. matrix [01 2] i Recall that the formula for the A inverse of « 2x2 matric A det [2 9] = ont — on 1431 ab a c aj™ a ne ‘The determinant is an interesting umber to associate to a matrix because be [ -¢ el] if we think of the determinant as a function of the vectors @ and b in R?, then Soa 2x2 matrix A is invertible fit has a geometric interpretation, illustrated by Figure 1.4.7: and only if det A #0. 
Proposition 1.4.14 (Geometric interpretation of the determinant in R*). (a) The area of the parallelogram spanned by the vectors In this section we limit our dis 5 cussion to determinants of 2 x 2 2] cou 2] and 3 x 3 matrices; we discuss de terminants in higher dimensions in Section 48. be |B] is positive if and only if 6 lies counterclock- wise from a; it is negative if and only if 6 lies clockwise from a. Proof. (a) The area of the parallelogram is its height times its base. Its base is |b] = 0 +8. Its height h is ‘A= sin Ola = sin@\/a? + a3. 1.4.32 We can compute cos@ by using Equation 1.4.8: cos = SB = sib + cab 1.4.33 * [all Jabs oh Vor 1.4 Geometry of R" 67 So we get sin@ as follows: : __ [ah + o3)(0F + 88) - (1b + ab2)? sind = V1—cos?6 = 4/4 (a+ ay T+ afop + afb + afb + 0903 — af bf — 2aybrarba- af} 4 4 (af + a) (6F + 08) _ [era = antsy? “ V (aE+ att + BY Using this value for sin@ in the equation for the area of a parallelogram gives Area = [b| |alsin@ = Base height : sp, [eta — aby _ 1.4.35 Figure 1.4.7. = VUE + OR at + aby! Cry eee) = (eres aahil. ‘The area of the parallelogram ae rere ‘spanned by & and b is | det{a, b]}. (b) The vector obtained by rotating & counterclockwise by x/2 is € = < “ee, and we oe that &.B = def, waz] fi] _. =det (21 i ] [2 anb, + aba = de [2 2]: 1.4.36 Since (Proposition 1.4.3) the dot product of two vectors is positive if the angle between them is less than 7/2, the determinant is positive if the angle between B and @ is less than 7/2. So b lies counterclockwise from a, as shown in Figure 148. 0 Exercise 1.4.6 gives 8 more geometric proof of Proposition 1.4.14. Fioure 1 In the two cases at the top, the Determinants in R° angle between B and is less than 1/2, 60 det(é,b) > 0; this cor Definition 1.4.15 (Determinant in R°). The determinant of a 3x 3 responds to 5 being counterclock- matrix is ‘wise from &. At the bottom, the [: ho ‘tangle between 5 and € is more - be oo) bh bo —LUrC—C—s a c :| adel? al nat[ ei] +csae[ 2 wise from &, and det(é,B) is neg- ative. = 01 (baes — byea) — a2(b103 — bac) + a5(b102 — baci). Exercise 1.4.12 shows that a 3.x 3 matrix is invertible if its de- terminant is not 0, For larger matrices, the formu- las rapidly get out of hand; we will see in Section 4.8 that such de- terminants can be computed much more reasonably by row (or col- umn) reduction, ‘The determinant can also be ‘computed using the entries of the first row, rather than of the first column, as coefficients. ‘The cross product exists only in R? (and to some extent in R’). 68 Chapter 1. Vectors, Matrices, and Derivatives Each entry of the first column of the original matrix serves as the coefficient for the determinant of a 2x2 matrix; the first and third (a; and as) are positive, the middle one is negative. To remember which 2 x 2 matrix goes with which coefficient, cross out the row and column the coefficient is in; what is left is the matrix you want. To get the 2 x 2 matrix for the coefficient a2: bo é. 1.4.37 ay bs cy a fa} by Example 1.4.16 (Determinant of a 3 x 3 matrix). S13 det}1 2 4] =3de]? t]-24[3 “t] +e [2 ql 20 i Ue 0 2 4) 1438 =8(2-0)-(14+0)+2(44+4)=21 A The cross product of two vectors Although the determinant is a number, as is the dot product, the cross product is a vector: Definition 1.4.17 (Cross product in R°). 
The cross product a x 6 in Ris an [ts] a] fo [es aby ~ asba [=}-{]- ae [2 ‘| = | -a1bs + a3; 1.4.39 a} [és 7 by — aby 1 sa[’s 2] Think of your vectors as a 3 x 2 matrix; first cover up the first row and take the determinant of what's left. That gives the first entry of the cross product. Then cover up the second row and take minus the determinant of what’s left, giving the second entry of the cross product. The third entry is obtained by covering up the third row and taking the determinant of what's left. Like the determinant, the cross product has a geometric interpre- tation. The right-hand rule: if you put the thumb of your right hand on and your index finger on B, while bending your third finger, then your third finger points in the direction of a xB. (Alternatively, ccurl the fingers of your right hand from a to b; then your right thumb will point in the direction of ax.) 14 Geometry of R" 69 Example 1.4.18 (Cross product of two vectors in R*). det tt ‘| BG |=t ah «[ 4] 1.4.40 Proposition 1.4.19 (Geometric interpretation of the cross product). ‘The cross product a x b is the vector satisfying three properties: (1) It is orthogonal to the plane spanned by a and 6; i.e., b)=0 and B-(axb)= (2) Its length (3) The three vectors a, and a x B satisfy the right-hand rule 1.4.41 B| is the area of the parallelogram spanned by a and Proof. For the first part, it is not difficult to check that the cross product @ x B is orthogonal to both @ and B: we check that the dot product in each case is zero (Corollary 1.4.8). Thus a x b is orthogonal to & because = @yazb3 — a, A3by — a1 4b3 + 22036) + 014362 — a703b, 1.4.42 For the second part, the area of the parallelogram spanned by a and 6 is that a1b1 + Goby + aabs Joh + ah + an fH ++ 50 we have sind = V1 —cos?O = 4/1 - (aibi + @zbz + aybs)? hs ahs oR + +) (af + a + 03)(6} + 03 + 03) — (arbi + aabe + axbs)? (af + a} + 09) (6j +3 +2) {B| sin, where @ is the angle between a and b. We know (Equation 1.4.8) 1.4.43 1.4.44 The last equality in Equation 1.4.47 comes of course from Defi- nition 1.4.17. ‘You may object that the middle term of the square root looks dif ferent than the middle entry of the ‘cross product as given in Defini- tion 1.4.17, but since we are squar- ing it, (~axbs + asbi)? = (a1b» — abi)? 70 Chapter 1. Vectors, Matrices, and Derivatives ‘so that [al[b| sing (a} + a3 + 02)(b% + 03 + 68) — (arb + abs + agbs)?. 1.4.45 Carrying out the niultiplication results in a formula for the area that looks worse than it is: a long string of terins too big to fit on this page under one square root sign. That's a good excuse for omitting it here. But if you do the computations you'll see that after cancellations we have for the right-hand side: Jabs + 030} — 2ayb,arby+a;b3 + 030; — 2a,b,a3by + 0303 + 0303 — Zagbyagby, tent cn hese en a ee (anba—aabs)? (oxba~aaby)? (aats~anbs)? 1.4.46 which conveniently gives us Area = [allB|siné = /(ayb2 — abi)? + (aibs — agbi)® + (a2bs — aabe)* = |ax bd] 1.4.47, So far, then, we have seen that the cross product a x 6 is orthogonal to a and 6, and that its length is the area of the parallelogram spanned by a and b. What about the right-hand rule? Equation 1.4.39 for the cross product cannot actually specify that the three vectors obey the right-hand rule, because your right hand is not an object of mathematis. What we can show is that if one of your hands fits &}, 62,5, then it will also fit a,b,a x B. Suppose a and B are not collinear. 
You have one hand that fits a,b, ax; ie., you can put the thumb in the direction of &, your index finger in the direction of B and the middle finger in the direction of ax b without bending your knuckles backwards. You can move d to point in the same direction as €,, for instance, by rotating all of space (in particular b, a x b and your hand) around the line perpendicular to the plane containing & and &}. Now rotate all of space (in particular a xb and your hand) around the z-axis, until b is in the (z,y)-plane, with the y-coordinate positive. These movements simply rotated your hand, so it still fits the vectors. Now we see that our vectors have become a _ fos 0 @=|0) and B=]h], so axb=| 0 |. 1.4.48 0. 0 ab ‘Thus, your thumb is in the direction of the positive z-axis, your index finger is horizontal, pointing into the part of the (2, y)-plane where y is positive, and since both @ and b2 are positive, your middle finger points straight up. So the same hand will fit the vectors as will fit the standard basis vectors: the right hand if you draw them the standard way (z-axis coming out of the paper straight at you, y-axis pointing to the right, z-axis up.) A ‘The word parallelepiped seems to have fallen into disuse; we've met students who got a 5 on the Calculus BC exam who don't know what the term means. It is simply a possibly slanted box: a box with six faces, each of which is a parallelogram; opposite faces are equal. ‘The determinant is 0 if the three vectors are co-planar. Ficure 1.4.9, The determinant of B, é gives the volume of the parallelepiped spanned by those vectors 14 Geometry of R” 71 Geometric interpretation of the determinant in R° ‘The determinant of three vectors &, 6 and € can also be thought of as the dot product of one vector with the cross product of the other two, @- (b x €): a(t a [=]: ~aet[ 2 = a3 be ale, al o pat 2 e| anaet|? a] +e aee[f e ne 1.4.49 ‘As such it has a geometric interpretation: Proposition 1.4.20. (a) The absolute value of the determinant of three vectors , b, é forming a 3 x 3 matrix gives the volume of the parallelepiped they span. (b) The determinant is positive if the vectors satisfy the right-hand rule, and negative otherwise. Proof. (a) The volume is height times the area of the base, the base shown in Figure 1.4.9 as the parallelogram spanned by b and é. Thst area is given by the length of the cross product, |B x é]. The height h is the projection of & onto a line orthogonal to the base. Let's choose the line spanned by the cross product 6 x ¢—that is, the line in the same direction as that vector. Then h = |ii|cos8, where is the angle between a and b x ¢, and we have Volume of parallelepiped = |b x ¢} |a| cos = |a- (b x ¢) |. 1.4.50 Cee I ‘base eight determinant (b) The determinant is positive if cos8 > 0 if the angle between a and 1B x ¢ is less than 1/2). Put your right hand to fit b x ¢,5,é; since B x é is perpendicular to the plane spanned by b and é, you can move your thumb in any direction by any angle less than 7/2, in particular, in the direction of a. (This requires a mathematically correct, very supple thumb.) A Remark. The correspondence between algebra and geometry is a constant theme of mathematics. Figure 1.4.10 summarizes the relationships discussed in this section. 72 Chapter 1. Vectors, Matrices, and Derivatives Correspondence of Algebra and Geometry Operation Algebra. 
Correspondence of Algebra and Geometry

- Dot product. Algebra: $\vec{v} \cdot \vec{w} = \sum_i v_i w_i$. Geometry: $\vec{v} \cdot \vec{w} = |\vec{v}|\,|\vec{w}| \cos\theta$.
- Determinant of a $2 \times 2$ matrix. Algebra: $\det\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix} = a_1 b_2 - a_2 b_1$. Geometry: $|\det[\vec{a}, \vec{b}]|$ is the area of the parallelogram spanned by $\vec{a}$ and $\vec{b}$.
- Cross product. Algebra: $\vec{a} \times \vec{b} = (a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1)$. Geometry: $(\vec{a} \times \vec{b}) \perp \vec{a}$ and $(\vec{a} \times \vec{b}) \perp \vec{b}$; its length is the area of the parallelogram spanned by $\vec{a}$ and $\vec{b}$; right-hand rule.
- Determinant of a $3 \times 3$ matrix. Algebra: $\det[\vec{a}, \vec{b}, \vec{c}] = \vec{a} \cdot (\vec{b} \times \vec{c})$. Geometry: $|\det[\vec{a}, \vec{b}, \vec{c}]|$ is the volume of the parallelepiped spanned by $\vec{a}, \vec{b}, \vec{c}$.

FIGURE 1.4.10. Mathematical "objects" often have two interpretations: algebraic and geometric.

1.5 CONVERGENCE AND LIMITS

Integrals, derivatives, series, approximations: calculus is all about convergence and limits. It could easily be argued that these notions are the hardest and deepest of all of mathematics. They give students a lot of trouble, and historically, mathematicians struggled to come up with correct definitions for two hundred years. In this section, we collect the relevant definitions. Fortunately, these notions do not become more difficult in several variables than they are in one variable.

The inventors of calculus in the 17th century did not have rigorous definitions of limits and continuity; these were achieved only in the 1870s. Rigor is ultimately necessary in mathematics, but it does not always come first, as Archimedes acknowledged about his own work, in a manuscript discovered in 1906. In it Archimedes reveals that his deepest results were found using dubious infinitary arguments, and only later proved rigorously, because "it is of course easier to supply the proof when we have previously acquired some knowledge of the questions by the method, than it is to find it without any previous knowledge." (We found this story in John Stillwell's Mathematics and Its History.)

More students have foundered on these definitions than on anything else in calculus: the combination of Greek letters, precise order of quantifiers, and inequalities is a hefty obstacle. Working through a few examples will help you understand what the definitions mean, but a proper appreciation can probably only come from use; we hope you have already started on this path in one-variable calculus.

Open and closed sets

In mathematics we often need to speak of an open set $U$; whenever we want to approach points of a set $U$ from every side, $U$ must be open. Think of a set or subset as your property, surrounded by a fence. The set is open (Figure 1.5.1) if the entire fence belongs to your neighbor. As long as you stay on your property, you can get closer and closer to the fence, but you can never reach it. No matter how close you are to your neighbor's property, there is always an epsilon-thin buffer zone of your property between you and it, just as no matter how close a nonzero point on the real number line is to 0, you can always find points that are closer. The set is closed (Figure 1.5.2) if you own the fence. Now, if you sit on your fence, there is nothing between you and your neighbor's property. If you move even an epsilon further, you will be trespassing.

FIGURE 1.5.1. An open set includes none of the fence; no matter how close a point in the open set is to the fence, you can always surround it with a ball of other points in the open set.

FIGURE 1.5.2. A closed set includes its fence.
What if some of the fence belongs to you and some belongs to your neighbors? Then the set is neither open nor closed.

Remark 1.5.1. Even very good students often don't see the point of specifying that a set is open. But it is absolutely essential, for example in computing derivatives. If a function $f$ is defined on a set that is not open, and thus contains at least one point $x$ that is part of the fence, then talking of the derivative of $f$ at $x$ is meaningless. To compute $f'(x)$ we need to compute

$$f'(x) = \lim_{h \to 0} \frac{1}{h}\bigl(f(x + h) - f(x)\bigr), \qquad (1.5.1)$$

but $f(x + h)$ won't necessarily exist for $h$ arbitrarily small, since $x + h$ may be outside the fence and thus not in the domain of $f$. This situation gets much worse in $\mathbb{R}^n$.¹² $\triangle$

¹²It is possible to make sense of the notion of derivatives in closed sets, but these results, due to the great American mathematician Hassler Whitney, are extremely difficult, well beyond the scope of this book.

In order to define open and closed sets in proper mathematical language, we first need to define an open ball. Imagine a balloon of radius $r$, centered around a point $\mathbf{x}$. The open ball of radius $r$ around $\mathbf{x}$ consists of all points $\mathbf{y}$ inside the balloon, but not the skin of the balloon itself: whatever $\mathbf{y}$ you choose, the distance between $\mathbf{x}$ and $\mathbf{y}$ is always less than the radius $r$.

Definition 1.5.2 (Open ball). For any $\mathbf{x} \in \mathbb{R}^n$ and any $r > 0$, the open ball of radius $r$ around $\mathbf{x}$ is the subset

$$B_r(\mathbf{x}) = \{\mathbf{y} \in \mathbb{R}^n \text{ such that } |\mathbf{x} - \mathbf{y}| < r\}.$$

Note that $|\mathbf{x} - \mathbf{y}|$ must be less than $r$ for the ball to be open; it cannot be $= r$.

Definition 1.5.3 (Open set of $\mathbb{R}^n$). A subset $U \subset \mathbb{R}^n$ is open if for every point $\mathbf{x} \in U$ there exists $r > 0$ such that $B_r(\mathbf{x}) \subset U$.

The symbol $\subset$ used in Definition 1.5.3 means "subset of." If you are not familiar with the symbols used in set theory, you may wish to read the discussion of set-theoretic notation in Section 0.3.

However close a point in the open subset $U$ is to the "fence" of the set, by choosing $r$ small enough, you can surround it with an open ball in $\mathbb{R}^n$ that is entirely in the open set, not touching the fence. A set that is not open is not necessarily closed: an open set owns none of its fence. A closed set owns all of its fence:

Definition 1.5.4 (Closed set of $\mathbb{R}^n$). A closed set of $\mathbb{R}^n$, $C \subset \mathbb{R}^n$, is a set whose complement $\mathbb{R}^n - C$ is open.

Note that parentheses denote an open set: $(a, b)$, while brackets denote a closed set: $[a, b]$. Sometimes, especially in France, backwards brackets are used to denote an open set: $]a, b[\; = (a, b)$.

Example 1.5.5 (Open sets). (1) If $a < b$, then the interval

$$(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$$

is open.

The use of the word domain in Example 1.5.6 is not really mathematically correct: a function is the triple of (1) a set $X$: the domain; (2) a set $Y$: the range; (3) a rule $f$ that associates an element $f(x) \in Y$ to each element $x \in X$. Strictly speaking, the formula $1/(y - x^2)$ isn't a function until we have specified the domain and the range, and nobody says that the domain must be the complement of the parabola of equation $y = x^2$: it could be any subset of this set. Mathematicians usually disregard this, and think of a formula as defining a function, whose domain is the natural domain of the formula, i.e., those arguments for which the formula is defined.

Example 1.5.6 (Natural domain of a formula). The natural domain of the formula $1/(y - x^2)$ is the complement of the parabola $P$ of equation $y = x^2$; let us check that this complement is open. Suppose the point $\begin{pmatrix} a \\ b \end{pmatrix}$ is not on $P$, so that $|b - a^2| = C > 0$, for some constant $C$. Then if

$$|u|, |v| < \min\left\{1,\; \frac{C}{3},\; \frac{C}{6|a|}\right\}, \qquad (1.5.6)$$

we have

$$\bigl|(b + v) - (a + u)^2\bigr| = \bigl|b - a^2 + v - 2au - u^2\bigr| \ge C - \bigl(|v| + 2|a||u| + |u|^2\bigr) > C - \left(\frac{C}{3} + \frac{C}{3} + \frac{C}{3}\right) = 0, \qquad (1.5.7)$$

so the point $\begin{pmatrix} a + u \\ b + v \end{pmatrix}$ is not on the parabola. Thus, setting $r = \min\{1, C/3, C/(6|a|)\}$, we can draw a square of side length $2r$ around the point $\begin{pmatrix} a \\ b \end{pmatrix}$ and know that any point in that open square will not be on the parabola. (We used that since $|u| < 1$, we have $|u|^2 < |u|$.)

FIGURE 1.5.3. It seems obvious that given a point off the parabola $P$, you can draw a disk around the point that avoids the parabola. Actually finding a formula for the radius of such a disk is more tedious than you might expect.
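The bound in Equation 1.5.6 is concrete enough to test numerically. The following sketch is our illustration, not the book's (the name safe_radius is invented): for a sample point off the parabola it computes $r = \min\{1, C/3, C/(6|a|)\}$ and then checks on a grid of perturbations with $|u|, |v| < r$ that Equation 1.5.7 holds, so no point of the open square lands on the parabola.

```python
def safe_radius(a, b):
    """r = min{1, C/3, C/(6|a|)} from Equation 1.5.6, where C = |b - a^2|
    measures how far (a, b) is from the parabola y = x^2."""
    C = abs(b - a * a)
    assert C > 0, "the point must lie off the parabola"
    r = min(1.0, C / 3)
    if a != 0:                    # the C/(6|a|) constraint is vacuous when a = 0
        r = min(r, C / (6 * abs(a)))
    return r

a, b = 2.0, 3.5                   # off the parabola, since 3.5 != 2^2 = 4
r = safe_radius(a, b)

# Grid of perturbations (u, v) with |u|, |v| < r. The quantity
# (b + v) - (a + u)^2 should keep a single sign over the whole grid,
# so it never vanishes: the square of side 2r misses the parabola.
n = 100
grid = [-0.999 * r + 2 * 0.999 * r * i / n for i in range(n + 1)]
signs = {(b + v - (a + u) ** 2) > 0 for u in grid for v in grid}
assert len(signs) == 1
```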
If we had defined an open set in terms of squares around points rather than balls around points, we would now be finished: we would have shown that the complement of the parabola $P$ is open. But to be complete we now need to point out the obvious fact that there is an open ball that fits in that open square. We do this by saying that if

$$\left|\begin{pmatrix} x \\ y \end{pmatrix} - \begin{pmatrix} a \\ b \end{pmatrix}\right| < r,$$

then $|x - a| < r$ and $|y - b| < r$, so the open ball of radius $r$ around $\begin{pmatrix} a \\ b \end{pmatrix}$ is contained in the open square of side length $2r$.
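The containment just used, that each coordinate of a vector is bounded by its Euclidean length, is easy to confirm numerically as well. This last sketch (ours, with invented helper names) samples random points, keeps those in the ball, and checks that each lies in the square.

```python
import math
import random

def in_ball(p, center, r):
    return math.dist(p, center) < r          # Euclidean distance (Python 3.8+)

def in_square(p, center, r):
    """Open square of side 2r: each coordinate within r of the center."""
    return all(abs(pi - ci) < r for pi, ci in zip(p, center))

center, r = (2.0, 3.5), 0.04
random.seed(0)
for _ in range(10_000):
    p = tuple(ci + random.uniform(-r, r) for ci in center)
    if in_ball(p, center, r):                # keep only points of the ball...
        assert in_square(p, center, r)       # ...all of which lie in the square
```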
