
A Personal Formulary & Reference Text:

Results from Calculus, Linear Algebra, Set Theory,


Analysis, Topology, Measure Theory, & More

Table of Contents

Page

1 Miscellaneous Topics from Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8


1.1 Exponent and Logarithm Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Denesting Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Tricks & Identities for Factorizing and Root-Finding . . . . . . . . . . . . . . . . . . . . 13
1.4 Quadratic, Cubic, Quartic Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2 Items from Trigonometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21


2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Special Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Various Trigonometry Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Arcfunction Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.5 Triangle Laws (Laws of Sines, Cosines, & More) . . . . . . . . . . . . . . . . . . . . . . . 39
2.6 Some Useful Values for Fourier Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3 Items from Basic Calculus & Related Topics . . . . . . . . . . . . . . . . . . . . . . . . . 41


3.1 Primer & Identities for Hyperbolic Trig Functions . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.1 Basic Definitions & Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.2 Special Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.3 Asymptotic Notations (O, o, ω, Ω, Θ,...) . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.1 Definitions & Basic Properties/Theorems . . . . . . . . . . . . . . . . . . . . . 50
3.3.2 Derivative Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.3 Basic Derivative Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3.4 The “♡ of (Differential) Calculus” Formula . . . . . . . . . . . . . . . . . . . . 53
3.4 Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4.1 Fundamental Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4.2 Basic Identities & Links to Integral Tables . . . . . . . . . . . . . . . . . . . . . 55
3.4.3 Basic & Advanced Solution Techniques . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.4 Applications & Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.4.5 Special Integral Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.6 The “True” Antiderivative of 1/x . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Taylor, Maclaurin, & Other Special Series . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6 Convergence Tests for Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.7 Convergence Tests for Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.8 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.8.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.8.2 Some Properties & Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.8.3 Some Common Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.9 Fourier Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.9.1 Basic Definitions & Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.9.2 Important Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.9.3 Common Fourier Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.9.4 Discrete Fourier Transform (DFT) . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.10 Laplace Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.10.1 Basic Definitions & Reference Tables . . . . . . . . . . . . . . . . . . . . . . . . 86
3.10.2 Laplace Transform Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.10.3 Common Laplace Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.11 Cauchy Principal Value (PV/CPV) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

4 Items from Vector & Higher-Dimensional Calculus (Calculus II/III) . . . . . . . . . 90


4.1 Parametric/Polar Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2 Basics on Vectors, Dot/Cross Products, & Lines/Planes . . . . . . . . . . . . . . . . . . 91
4.3 Vector Calculus: Derivatives & Integrals of Vector-Valued Functions . . . . . . . . . . . 93
4.4 Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.5 Directional Derivatives & Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.6 Differentials & Linearization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.7 Optimization & Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.8 Multiple Integrals, Differentials, Jacobians, Applications . . . . . . . . . . . . . . . . . . 101
4.9 Line Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.10 Parameterized Surfaces: Areas & Surface Integrals . . . . . . . . . . . . . . . . . . . . . 106

5 Vector Calculus Identities (Calculus III) . . . . . . . . . . . . . . . . . . . . . . . . . . . 109


5.1 Fundamental Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 Useful Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3 Identities with the Levi-Civita Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.4 Alternative (3D) Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

6 Items from Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . 120

7 Items from Partial Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

8 Matrices & Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


8.1 Vector Spaces: Axioms, Concepts, & Examples . . . . . . . . . . . . . . . . . . . . . . . 122
8.2 Linear Transformations; Rank, Kernel, Nullity, etc. . . . . . . . . . . . . . . . . . . . . . 124
8.3 Matrix Operations & Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.4 Transposition & Related Notions (Hermitian, Unitary, & More) . . . . . . . . . . . . . . 126
8.5 Determinants: Definitions & Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.6 Determinants: Adjugates/Adjoints & Cofactor Matrices . . . . . . . . . . . . . . . . . . 129
8.7 Determinants: Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.8 Similarity & Properties Thereof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.9 Eigenstuff (-values, -vectors, -pairs, -spaces, -decomposition...) . . . . . . . . . . . . . . . 134
8.10 Matrix Norms, Equivalence, Inequalities, & Related Notions . . . . . . . . . . . . . . . . 137
8.11 Bessel’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.12 Gram-Schmidt Orthonormalization Process . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.13 (Moore-Penrose) Pseudoinverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.14 Characteristic Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.15 Minimal Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.16 The Cayley-Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

8.17 The Power Method / Power Iteration / von Mises Iteration . . . . . . . . . . . . . . . . 148
8.18 Definiteness: Positive & Negative (Semi-)Definite . . . . . . . . . . . . . . . . . . . . . . 149
8.19 Dual Spaces; Adjoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.20 Various Matrix Decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
8.20.1 Eigendecomposition / Spectral Decomposition . . . . . . . . . . . . . . . . . . . 151
8.20.2 Singular Value Decomposition (SVD) . . . . . . . . . . . . . . . . . . . . . . . . 152
8.20.3 QR Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.20.4 Householder Triangularization & QR Stuff . . . . . . . . . . . . . . . . . . . . . 159
8.20.5 Hessenberg Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
8.20.6 Cholesky Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.20.7 Schur Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

9 Notes from Self-Studying Linear Algebra & Numerical Analysis . . . . . . . . . . . . 164


9.1 (Trefethen & Bau) Lecture 1: Matrix Multiplication, Rank, Inverses, etc. . . . . . . . . . 164
9.2 (Trefethen & Bau) Notes on Exercises 1.1, 3.4 (Multiplication to Change a Matrix) . . . 166
9.3 (Trefethen & Bau) Lecture 2: Orthogonality, Unitary Matrices . . . . . . . . . . . . . . 168
9.4 (Trefethen & Bau) Lecture 2 Addendum (Useful Norm/Inner Product Equalities) . . . . 170
9.5 (Trefethen & Bau) Lecture 3: Matrix & Vector Norms . . . . . . . . . . . . . . . . . . . 171
9.6 (Trefethen & Bau) Lecture 4: Singular Value Decomposition (SVD) . . . . . . . . . . . . 174
9.7 (Trefethen & Bau) Lecture 5: More on the SVD . . . . . . . . . . . . . . . . . . . . . . . 176
9.8 (Trefethen & Bau) Lecture 6: Projectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.9 (Trefethen & Bau) Lecture 7: QR Factorization & Gram-Schmidt . . . . . . . . . . . . . 179
9.10 (Trefethen & Bau) Lecture 10: Householder Triangularization . . . . . . . . . . . . . . . 181
9.11 (Trefethen & Bau) Lecture 11: Least Squares Problems . . . . . . . . . . . . . . . . . . 183
9.12 (Trefethen & Bau) Lecture 12: Conditioning; Condition Numbers . . . . . . . . . . . . . 184

10 Set-Theoretic Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185


10.1 More Basic Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
10.2 Some More Useful & Noteworthy Ones . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
10.3 Identities on Functions, Images, & Preimages . . . . . . . . . . . . . . . . . . . . . . . . 187
10.4 Limits of Sequences of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
10.5 Axiom of Choice (Overview) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

11 Set-Theoretic Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192


11.1 Tabulated Summary of Common Relations . . . . . . . . . . . . . . . . . . . . . . . . . . 192
11.2 Basics of an Important Visual Construction . . . . . . . . . . . . . . . . . . . . . . . . . 193
11.3 Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
11.3.1 Reflexive-Like Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
11.3.1.1 Coreflexive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
11.3.1.2 Irreflexivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
11.3.1.3 Left quasi-reflexive . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
11.3.1.4 Quasi-reflexive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
11.3.1.5 Reflexivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
11.3.1.6 Right quasi-reflexive . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.3.2 Symmetry-Like Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
11.3.2.1 Antisymmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
11.3.2.2 Asymmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11.3.2.3 Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
11.3.3 Transitivity-Like Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
11.3.3.1 Antitransitive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
11.3.3.2 Cotransitive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.3.3.3 Intransitive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
11.3.3.4 Left Euclidean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
11.3.3.5 Quasi-transitive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

11.3.3.6 Right Euclidean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
11.3.3.7 Transitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
11.3.4 Comparability Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
11.3.4.1 Connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
11.3.4.2 Converse Well-Founded . . . . . . . . . . . . . . . . . . . . . . . . . 225
11.3.4.3 Trichotomous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
11.3.4.4 Well-Founded . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
11.3.5 Function-Like Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
11.3.5.1 Bijectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
11.3.5.2 Functional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
11.3.5.3 Injectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
11.3.5.4 (Left-)Totality / Seriality . . . . . . . . . . . . . . . . . . . . . . . . 234
11.3.5.5 Surjectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.4 Combinations of Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.4.1 Dense Posets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.4.2 Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
11.4.3 Equivalence / Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . 238
11.4.4 Partial Equivalence Relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
11.4.5 Partial Orders / Posets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
11.4.6 Preorders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
11.4.7 Prewellorders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
11.4.8 Pseudo-Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
11.4.9 Strict Partial Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
11.4.10 Strict Total Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
11.4.11 Total Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
11.4.12 Total Preorders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
11.4.13 Tournaments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
11.4.14 Well-order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
11.5 Basic Operations & Derived Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.5.1 Property Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.5.2 Property Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
11.5.3 Relation Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
11.5.4 Transpose of Relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

12 Items from Abstract Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258


12.1 Algebraic Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.1.1 Overview of Basic Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.1.2 Structural Graph for Group-Like Structures . . . . . . . . . . . . . . . . . . . . 259
12.1.3 Structural Graph for Ring-Like Structures . . . . . . . . . . . . . . . . . . . . . 260
12.2 Isomorphism Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
12.3 Catalogue of Important Groups & Group Structures . . . . . . . . . . . . . . . . . . . . 265
12.3.1 Specific Examples of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
12.3.2 Important Classes/Classifications of Groups . . . . . . . . . . . . . . . . . . . . 266
12.3.3 Important Subgroups/Substructures . . . . . . . . . . . . . . . . . . . . . . . . 267
12.3.4 Items Tied to Group Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
12.4 Catalogue of Important Rings & Ring Structures . . . . . . . . . . . . . . . . . . . . . . 270
12.4.1 Relevant (Somewhat Nonstandard) Notations . . . . . . . . . . . . . . . . . . . 270
12.4.2 Definitions of Ring-Like Structures . . . . . . . . . . . . . . . . . . . . . . . . . 271
12.4.3 Types of Elements in a Ring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
12.4.4 Specific Examples of Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
12.4.5 Important Classes of Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
12.4.6 Important Ring Substructures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
12.4.7 Functions of Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
12.5 (Dummit & Foote, Chapter 1 ) Group Theory: Basics (Groups, Actions, Morphisms) . . 286

12.6 (Dummit & Foote, Chapter 2 ) Group Theory: Subgroups . . . . . . . . . . . . . . . . . 289
12.7 (Dummit & Foote, Chapter 3 ) Group Theory: Quotients; Homomorphisms . . . . . . . . 290
12.8 (Dummit & Foote, Chapter 4 ) Group Theory: More on Actions . . . . . . . . . . . . . . 295
12.9 (Dummit & Foote, Chapter 7 ) Ring Theory: Basic Definitions/Examples . . . . . . . . . 298
12.10 (Dummit & Foote, Chapter 7 ) Ring Theory: Homomorphisms, Quotients, Ideals . . . . 303
12.11 (Dummit & Foote, Chapter 8 ) Ring Theory: Domains (Euclidean, PIDs, UFDs) . . . . . 308
12.12 (Dummit & Foote, Chapter 13 ) Field Theory: Basics of Field Extensions . . . . . . . . . 312
12.13 (Dummit & Foote, Chapter 13 ) Field Theory: Algebraic Extensions . . . . . . . . . . . . 315
12.14 (Dummit & Foote, Chapter 13 ) Field Theory: Splitting Fields; Algebraic Closures . . . 317
12.15 (Dummit & Foote, Chapter 13 ) Field Theory: Separability . . . . . . . . . . . . . . . . . 319
12.16 (Dummit & Foote, Chapter 14 ) Galois Theory: Basic Definitions . . . . . . . . . . . . . 321
12.17 (Dummit & Foote, Chapter 14 ) Galois Theory: The Fundamental Theorem . . . . . . . 323

13 Items from Topology, Metric Spaces, & Real Analysis . . . . . . . . . . . . . . . . . . . 325


13.1 Topological Operations on Sets / Related Sets . . . . . . . . . . . . . . . . . . . . . . . . 325
13.1.1 Boundary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
13.1.2 Closure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
13.1.3 Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
13.1.4 Interior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
13.1.5 Limit Points / Accumulation Points / Derived Set . . . . . . . . . . . . . . . . 330
13.2 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
13.3 Continuity & Types Thereof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
13.4 Dense . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
13.5 Infimum & Supremum; ε Characterization (“Capturing”) . . . . . . . . . . . . . . . . . . 337
13.6 Limit Inferior & Limit Superior of Sequences (lim inf an , lim sup an ) . . . . . . . . . . . . 338
13.7 Limit Inferior & Limit Superior of a Function . . . . . . . . . . . . . . . . . . . . . . . . 340
13.8 Lp Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
13.9 Open and Closed Sets; Gδ and Fσ sets; Topologies . . . . . . . . . . . . . . . . . . . . . 342
13.10 Riemann Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
13.11 Sequences of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

14 Notes from Self-Studying Real Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346


14.1 (Baby Rudin, Chapters 1 & 2) Basics & Fundamentals . . . . . . . . . . . . . . . . . . . 346
14.2 (Baby Rudin, Chapter 2) Metric Spaces & Topology . . . . . . . . . . . . . . . . . . . . 348
14.3 (Baby Rudin, Chapter 3) Sequences & Series . . . . . . . . . . . . . . . . . . . . . . . . . 351
14.4 (Baby Rudin, Chapter 4) Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
14.5 (Baby Rudin, Chapter 5) Differentiation in R . . . . . . . . . . . . . . . . . . . . . . . . 368
14.6 (Baby Rudin, Chapter 6) Riemann(-Stieltjes) Integration . . . . . . . . . . . . . . . . . . 371
14.7 (Baby Rudin, Chapter 7) Sequences & Series of Functions . . . . . . . . . . . . . . . . . 377

15 Items from Measure Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383


15.1 (Lebesgue/Jordan) Outer Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
15.2 (Lebesgue) Measure & Measurability of Sets . . . . . . . . . . . . . . . . . . . . . . . . . 384
15.3 (Lebesgue) Measurable Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
15.4 Convergence in Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
15.5 (Lebesgue) Integrals for Nonnegative Functions . . . . . . . . . . . . . . . . . . . . . . . 390
15.6 (Lebesgue) Integrals for Arbitrary Real Functions . . . . . . . . . . . . . . . . . . . . . . 392
15.7 Repeated Integration: Fubini-Tonelli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
15.8 Differentiation (As in Lectures) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
15.9 Differentiation (As in Measure & Integral, Chapter 7) . . . . . . . . . . . . . . . . . . . . 399
15.10 Functions of Bounded Variation (in R) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
15.11 Absolute Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
15.12 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
15.13 Lp Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412

16 Items from Complex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
16.1 Complex Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
16.2 Complex Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
16.3 Auxiliary Inequalities/Results for Contour Integrals . . . . . . . . . . . . . . . . . . . . 417

17 Items from Functional Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418


17.1 (Kreyszig, Ch. 1) Basics of Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 418
17.2 (Kreyszig, Ch. 1) Brief Items From Topology . . . . . . . . . . . . . . . . . . . . . . . . 420
17.3 (Kreyszig, Ch. 2) Normed & Banach Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 422

18 Items from Analytic Number Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423


18.1 Functions of Algebraic Importance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
18.2 Important Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
18.2.1 Mobius’ µ Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
18.2.2 Mobius Function of Order k, µk . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
18.2.3 Merten’s M Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
18.2.4 Euler’s Totient Function φ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
18.2.5 Jordan’s Totient Functions Jk . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
18.2.6 Liouville’s λ Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
18.2.7 The Divisor-Sum Functions σα . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
18.2.8 The Number of Prime Divisor Functions, ω, Ω, ν . . . . . . . . . . . . . . . . . 436
18.2.9 Mangoldt’s Λ Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
18.2.10 Chebyshev’s ψ Function / Second Function . . . . . . . . . . . . . . . . . . . . 440
18.2.11 Chebyshev’s ϑ Function / First Function . . . . . . . . . . . . . . . . . . . . . . 441
18.2.12 The Prime-Counting Function π . . . . . . . . . . . . . . . . . . . . . . . . . . 443
18.2.13 The Riemann ζ Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
18.3 Assorted Other Useful Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
18.3.1 Statements Equivalent to the Prime Number Theorem . . . . . . . . . . . . . . 446
18.3.2 Euler’s Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
18.3.3 Abel’s Identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
18.3.4 Some Tauberian Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
18.4 Congruences & Modular Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
18.4.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
18.4.2 Basic Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
18.5 Dirichlet Characters & Finite Abelian Groups . . . . . . . . . . . . . . . . . . . . . . . . 452
18.5.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
18.5.2 Basic Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
18.6 On Arithmetical Progressions & Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
18.7 More on Dirichlet Characters & Gauss Sums . . . . . . . . . . . . . . . . . . . . . . . . . 457
18.7.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
18.7.2 Some Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
18.8 Quadratic Residues & Quadratic Reciprocity . . . . . . . . . . . . . . . . . . . . . . . . 461
18.8.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
18.8.2 Main Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462

19 Special (Often Important & Nonelementary) Functions . . . . . . . . . . . . . . . . . . 465


19.1 Bessel Functions – Jα (x), Yα (x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
19.2 Beta Function – B(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
19.3 Digamma Function – ψ(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
19.4 Error Functions – erf(z), erfc(z), erfi(z) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
19.5 Exponential Integral – Ei(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
19.6 Fresnel Integrals – S(x), C(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
19.7 Gamma Function – Γ(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
19.8 Lambert W Function – W (x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472

19.9 Polylogarithms – Lin (x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
19.10 Trig Integrals – Si(x), Ci(x), etc. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474

20 Useful Inequalities Across Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475


20.1 Basic/Miscellaneous Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
20.2 HM-GM-LM-AM-QM-CM Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
20.3 Inequalities for Trigonometry (Regular & Hyperbolic) . . . . . . . . . . . . . . . . . . . 478
20.4 Inequalities for Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
20.5 Inequalities for Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
20.6 Inequalities for Summations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
20.7 Inequalities for Integrals / Lp Inclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
20.8 Inequalities for Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
20.9 Inequalities for Matrix/Vector Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
20.10 Inequalities for Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
20.11 Very General Inequalities (e.g. Inner Product Spaces, Metric Spaces) . . . . . . . . . . . 493

21 Miscellaneous Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494


21.1 Algebraic & Transcendental Numbers; Lindemann-Weierstrass . . . . . . . . . . . . . . . 494
21.2 Borwein Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
21.3 Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
21.4 Cauchy Product of Sums/Series & Discrete Convolution . . . . . . . . . . . . . . . . . . 498
21.5 Euler’s Summation Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
21.6 Formulas for the Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
21.7 Lagrange Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
21.8 Induced Metrics & Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
21.9 Pochhammer Symbols (Rising & Falling Factorials) . . . . . . . . . . . . . . . . . . . . . 503
21.10 Special Indicator Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
21.11 Vieta’s Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
21.12 Volume of ℓp unit ball in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
21.13 Weierstrass Factorization Theorem (Infinite Products In Terms of Roots) . . . . . . . . . 507

22 Reference Tables for Formulas, LaTeX Stuff, & More . . . . . . . . . . . . . . . . . . . . 508

§1: Miscellaneous Topics from Algebra

§1.1: Exponent and Logarithm Properties

(log(x) and ln(x) may be used interchangeably, but note that the base log(x) implies is generally
field-dependent: in mathematics it is e, hence ln(x).)

Note that, as functions of real values and real images,

• e^x has domain R and range R>0

• ln(x) has domain R>0 and range R

• e^(ln(x)) = ln(e^x) = x, whenever defined

Some properties follow:

• Exponents: Assume a, b > 0.

  ◦ a^(x+y) = a^x a^y
  ◦ a^(x−y) = a^x / a^y
  ◦ (a^x)^y = a^(xy) = (a^y)^x
  ◦ (ab)^x = a^x b^x
  ◦ a^0 = 1
  ◦ 0^x = 0 for x > 0
  ◦ a^(1/x) = ˣ√a (the x-th root of a)

• Logarithms: Be mindful that these need not extrapolate to C. Assume a, x, y > 0 and a ≠ 1.

  ◦ log_a(xy) = log_a(x) + log_a(y)
  ◦ log_a(x/y) = log_a(x) − log_a(y)
  ◦ log_a(x^y) = y · log_a(x)
  ◦ log_a(a) = 1
  ◦ log_a(1) = 0

• Change of Base Formula: To convert base a logarithms to base b,

    log_a(x) = ln(x)/ln(a) = log_b(x)/log_b(a)
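
  For instance, log_2(10) = ln(10)/ln(2) ≈ 2.302585/0.693147 ≈ 3.321928.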

§1.2: Denesting Roots

Introduction:
We motivate this by discussing the denesting of

    ⁴√(89 − 28√10)                                    (1)

in the vein of the answer here.

We consider the more general problem,

    ⁿ√(A + B·ᵐ√C)                                     (2)

(1) is just (2) with (A, B, C, m, n) = (89, −28, 10, 2, 4).


We will need the binomial theorem: recall,

    (x + y)^n = Σ_{k=0}^n C(n,k) x^k y^(n−k)

    where C(n,k) = n! / (k!(n − k)!)
    and k! = k(k − 1)(k − 2) · · · (2)(1)

Now, returning to (2), we want a, b such that

    A + B·ᵐ√C = (a + b·ᵐ√C)^n

Well, we can expand the right-hand side by the binomial theorem. Notice that, then,

    (a + b·ᵐ√C)^n = Σ_{k=0}^n C(n,k) a^k (b·ᵐ√C)^(n−k)

From here, we'll note that α + β√γ = α′ + β′√γ (for irrational √γ) if and only if α = α′ and β = β′. That is,
we'll factor out our radical after expanding via the binomial theorem, set like coefficients equal, and do
whatever manipulations necessary to get to an end result.
Broadly, these subsequent manipulations consist of the following:

• We'll have two equations, in two unknowns, our a, b.

• We'll multiply one equation by a constant so that the constant terms in each are equal, forcing the
  variable terms to be equal as well.

• Once we have an equation in integer coefficients, we divide by the highest power of b.

• The result is a polynomial in the variable (a/b). We find its roots.

• If r is a root, then this means a/b = r ⇐⇒ a = br. We substitute this back in to our system of
  equations, to see if it results in solutions or not.

Starting Out:
In (1), then, we want a, b such that
    89 − 28√10 = (a + b√10)⁴

Well, expanding the right hand side,


√ 4 4 3 √ 4 2 √ 2 √ 3 √ 4
           
 4 4 4 4
a + b 10 = a + a b 10 + a b 10 + a b 10 + b 10
0 1 2 3 4
4
√ 3 2 2
√ 3 4
= a + 4 10a b + 60a b + 40 10ab + 100b

We know a, b need to be such that,


√ √ √
89 − 28 10 = a4 + 4 10a3 b + 60a2 b2 + 40 10ab3 + 100b4

so we factor on the right-hand side to get the parts with 10 and those without:
√ √
89 − 28 10 = a4 + 60a2 b2 + 100b4 + 10 4a3 b + 40ab3


so we have the pair of equations (


89 = a4 + 60a2 b2 + 100b4
(3)
−28 = 4a3 b + 40ab3

Dealing With Our System of Equations:


Now, look at the left-hand sides of both equations. If we multiply the second equation by −89/28, the
left-hand sides become equal, so the right-hand sides must be equal as well:

    a⁴ + 60a²b² + 100b⁴ = −(89/7)a³b − (890/7)ab³

Let's make our lives just a tad easier by multiplying by 7:

    7a⁴ + 420a²b² + 700b⁴ = −89a³b − 890ab³

Move everything to the left-hand side:

    7a⁴ + 89a³b + 420a²b² + 890ab³ + 700b⁴ = 0

Divide through by b⁴, our highest power of it. Then every term has a power of (a/b):

    7(a/b)⁴ + 89(a/b)³ + 420(a/b)² + 890(a/b) + 700 = 0

This is a quartic equation in the variable x = a/b:

    7x⁴ + 89x³ + 420x² + 890x + 700 = 0

Utilizing the Roots:
Ultimately, you’ll need to use whatever tools are at your disposal for solving polynomial equations.
Inspection, graphing, the rational root theorem; it ultimately depends. Here, we can find that the roots of
this are

    x = −5,    x = −2,    x = −(20 ± 3i√10)/7
We'll focus on a real root, say x = −5. (The other real root, x = −2, leads to the same denesting.) Then

    x = −5  =⇒  a/b = −5  =⇒  a = −5b
Consider our initial pair of equations in (3). Replace a with −5b then. We get

    89 = (−5b)⁴ + 60(−5b)²b² + 100b⁴
    −28 = 4(−5b)³b + 40(−5b)b³

Simplifying,

    89 = 2225b⁴
    −28 = −700b⁴

Either equation gives you

    b⁴ = 1/25  =⇒  b = 1/√5

Then, since we know a = −5b, then a = −5/√5 = −√5.

Conclusion:

This gives us

    89 − 28√10 = ( −√5 + (1/√5)·√10 )⁴ = ( √2 − √5 )⁴

Taking fourth roots, then, we get

    ⁴√(89 − 28√10) = |√2 − √5|

(Recall that √(x²) = |x|? The idea is the same here.) Well, clearly,

    |√2 − √5| = |√5 − √2| = √5 − √2

since √5 > √2 and |x − y| = |y − x| (just multiply by −1).
Finally, then, we have our denested expression:

    ⁴√(89 − 28√10) = √5 − √2

Addendum: What does the no-solution-from-a-root case look like?
Consider now a different case: what if x = −1 had been one of our roots? Then

    x = a/b = −1  =⇒  a = −b

Then our equations of concern in (3) become

    89 = (−b)⁴ + 60(−b)²b² + 100b⁴
    −28 = 4(−b)³b + 40(−b)b³

and, with simplification,

    89 = 161b⁴
    −28 = −44b⁴

Note that if you multiply the second equation by −1, and then by 89/28 (so the left-hand sides would be
equal), the coefficient on the right-hand side becomes 979/7 ≈ 139.857 ≠ 161. In this case, then, no b works
– so sometimes you'll have to try multiple different roots.
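
A quick computational check of the result – a minimal sketch assuming the sympy library is available
(the nsimplify call and the chosen surd basis are illustrative, not part of the method above):

    from sympy import sqrt, root, nsimplify

    expr = root(89 - 28*sqrt(10), 4)          # the nested radical (1)
    denested = sqrt(5) - sqrt(2)              # the claimed denesting

    # Numerical agreement to high precision:
    print(abs(expr.evalf(30) - denested.evalf(30)) < 1e-25)   # True

    # nsimplify can often recover such forms from a numeric value,
    # given a basis of surds to combine (should print -sqrt(2) + sqrt(5)):
    print(nsimplify(expr.evalf(50), [sqrt(2), sqrt(5)]))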

§1.3: Tricks & Identities for Factorizing and Root-Finding

Relatively Basic Tricks:

• Fundamental Theorem of Algebra: For p ∈ C[z], and not-necessarily-distinct roots rᵢ, we have

    p(z) := Σ_{i=0}^n aᵢ zⁱ   =⇒   p(z) = aₙ ∏_{i=1}^n (z − rᵢ)

• The Usual, For Quadratics: If ax² + bx + c = 0 with a, b, c ∈ Z, try finding factors p, q of ac that
  sum up to b. (Hence pq = ac and p + q = b.) Then

    ax² + bx + c = a(x + p/a)(x + q/a)

• Complete the Square (Quadratics): Given x² + bx + c = 0, move c to the other side, then take
  1/2 of b, square it, and add it to both sides. On the LHS, you get

    x² + bx + (b/2)² = (x + b/2)²

• Conjugate Root Theorem: If α + βi ∈ C is a root of p ∈ R[x] for α ∈ R, β ∈ R≠0,
  then α − βi is a root too.
• Reversed Coefficients & Reciprocal Roots: (More of a root-finding theorem.) Suppose that

    p(x) = Σ_{i=0}^n aᵢ xⁱ = a₀ + a₁x + a₂x² + a₃x³ + · · · + aₙxⁿ

  has root r ≠ 0. Then 1/r is a root of xⁿ p(1/x) and in particular of p(1/x). Note that

    xⁿ p(1/x) = Σ_{i=0}^n aᵢ x^(n−i) = a₀xⁿ + a₁x^(n−1) + a₂x^(n−2) + · · · + aₙ₋₁x + aₙ

  Note that, hence, if r is a root of a polynomial, then 1/r is a root of the polynomial with reversed
  coefficient order.
• Binomial Theorem: (a + b)ⁿ = Σ_{k=0}^n C(n,k) aᵏ b^(n−k)   or   (1 + x)ⁿ = Σ_{k=0}^n C(n,k) xᵏ

  ◦ This is where the perfect square trinomial idea, (x + a)² = x² + 2ax + a² for instance, comes from.

• Multinomial Theorem: (x₁ + x₂ + · · · + xₘ)ⁿ = Σ_{k₁+k₂+···+kₘ=n, kᵢ∈Z≥0} C(n; k₁, k₂, · · ·, kₘ) ∏_{j=1}^m xⱼ^(kⱼ)

• Sum/Difference of Powers: For the sums, apply the geometric series formula below with n odd
  and b ↦ −b, so (−1)ⁿ = −1. The difference factorization applies for any n.

  ◦ Difference of Squares: a² − b² = (a − b)(a + b)
  ◦ Difference of Fourths: a⁴ − b⁴ = (a − b)(a³ + a²b + ab² + b³)
  ◦ Difference of Powers: aⁿ − bⁿ = (a − b)(a^(n−1) + a^(n−2)b + · · · + b^(n−1)) for any n
  ◦ Sum of Cubes: a³ + b³ = (a + b)(a² − ab + b²)
  ◦ Sum of Fifths: a⁵ + b⁵ = (a + b)(a⁴ − a³b + a²b² − ab³ + b⁴)
  ◦ Sum of Odd Powers: aⁿ + bⁿ = (a + b)(a^(n−1) − a^(n−2)b + · · · + b^(n−1)) for n odd

• Finite Geometric Series: The second implies the first, with x = a/b and then multiplying by bⁿ:

    aⁿ − bⁿ = (a − b)(a^(n−1) + a^(n−2)b + · · · + ab^(n−2) + b^(n−1))

    xⁿ − 1 = (x − 1)(x^(n−1) + x^(n−2) + · · · + x + 1)

• Polarization-Like Identity: 4ab = (a + b)² − (a − b)²

More Obscure/Niche Tricks:

• Rational Root Theorem: (Wikipedia) Take a polynomial in integer coefficients p ∈ Z[x], with
  leading coefficient ℓ and constant coefficient c. Then the only possible rational roots are of the form

    ± a/b   where a | c and b | ℓ

  Any rational root will have this form, but not all combinations will be roots (it is possible for all roots
  to be irrational, even).
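
  For instance, p(x) = 2x³ + 3x² − 8x + 3 has ℓ = 2 and c = 3, so the only candidates are
  ±1, ±3, ±1/2, ±3/2. Testing them finds the roots x = 1, x = 1/2, and x = −3.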
• Descartes' Rule of Signs: (Wikipedia) Descartes' rule of signs can give the number of positive and
  negative roots, or at least narrow it down.
  Write your polynomial in descending power order:

    p(x) = aₙxⁿ + aₙ₋₁x^(n−1) + aₙ₋₂x^(n−2) + · · · + a₂x² + a₁x + a₀

  ◦ # of Positive Roots (rₚ): Count the number of sign changes between consecutive terms. Let
    this number be s. Then rₚ = s − 2k for some k ∈ {0, 1, 2, · · ·}. (Of course, rₚ ≥ 0. Hence, if
    s = 0 or 1, then rₚ = s.)
  ◦ # of Negative Roots (rₙ): Find p(−x) and apply the positive test to it, or multiply the
    coefficients of odd power only by −1 and then apply the positive test to it. The rₚ you find
    there becomes rₙ, with the same caveat: rₙ = s − 2ℓ for some ℓ ∈ Z≥0, with the restriction rₙ ≥ 0.
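
  For instance, p(x) = x³ − x² − 2 has sign pattern (+, −, −): one sign change, so exactly one
  positive root. And p(−x) = −x³ − x² − 2 has pattern (−, −, −): no sign changes, so no negative roots.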
• An Extension of Completing The Square: Say we have x⁴ⁿ + a² and wish to factor it. We can
  do a completing-the-square-like method. Ideally, we would have

    x⁴ⁿ + a² =? (x²ⁿ + a)²

  but in reality

    (x²ⁿ + a)² = x⁴ⁿ + 2ax²ⁿ + a²

  To mitigate this, add and subtract 2ax²ⁿ, the term that gives a perfect square. Then leveraging
  difference of squares yields

    x⁴ⁿ + a² = x⁴ⁿ + 2ax²ⁿ + a² − 2ax²ⁿ
             = (x²ⁿ + a)² − 2ax²ⁿ
             = (x²ⁿ + a)² − (√(2a)·xⁿ)²
             = (x²ⁿ + a − √(2a)·xⁿ)(x²ⁿ + a + √(2a)·xⁿ)

  For instance,

    x⁴ + 4 = x⁴ + 4x² + 4 − 4x²
           = (x² + 2)² − 4x²
           = (x² + 2 − 2x)(x² + 2 + 2x)

    x⁸ + 81 = x⁸ + 18x⁴ + 81 − 18x⁴
            = (x⁴ + 9)² − 18x⁴
            = (x⁴ + 9 − 3√2·x²)(x⁴ + 9 + 3√2·x²)

• Simon's Favorite Factoring Trick (SFFT): Art of Problem Solving link.

    xy + jx + ky = a =⇒ (x + k)(y + j) = a + jk

  This is essentially factoring by grouping, but forced and expressed formulaically. (We factor a group
  from the LHS, and add whatever is necessary to ensure the untouched term on the LHS can also factor,
  with the same factor resulting.)

• Sturm's Theorem: (Wikipedia) We consider a polynomial p with no repeated roots. Form the Sturm
  sequence of p, Sp := (p₀, p₁, · · ·, p_deg(p)), with

    p₀ = p
    p₁ = p′
    pᵢ₊₁ = −(remainder on Euclidean division of pᵢ₋₁ by pᵢ)

  Take ξ ∈ dom(p) and consider the sequence p₀(ξ), p₁(ξ), · · ·, p_deg(p)(ξ). Let V(ξ) be the number of sign
  changes at ξ in this sequence (à la Descartes' rule of signs).
  Sturm's theorem states the number of roots of p in (a, b] is V(a) − V(b).
  We may extend to rays and R with the conventions

  ◦ pᵢ(+∞) is the leading coefficient
  ◦ pᵢ(−∞) is the leading coefficient if deg(pᵢ) is even, and −1 times it if odd

  (A computational sketch follows this bullet.)
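
  A minimal sketch of the procedure, assuming the sympy library (sympy also ships a ready-made
  sympy.sturm; the manual construction below just mirrors the definition above):

    import sympy as sp

    x = sp.symbols('x')
    p = sp.Poly(x**3 - 3*x + 1, x, domain='QQ')   # three distinct real roots

    # Sturm sequence: p, p', then negated Euclidean remainders.
    seq = [p, p.diff(x)]
    while not seq[-1].is_zero and seq[-1].degree() > 0:
        seq.append(-(seq[-2] % seq[-1]))

    def V(xi):
        # count sign changes in the sequence evaluated at xi
        vals = [s.eval(xi) for s in seq]
        nonzero = [v for v in vals if v != 0]
        return sum(1 for u, w in zip(nonzero, nonzero[1:]) if u*w < 0)

    print(V(-2) - V(2))   # number of roots in (-2, 2]: prints 3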
• A Trick For Cubics: Let f(x) be a cubic with distinct real roots, i.e. with a, b, c ∈ R,

    f(x) = (x − a)(x − b)(x − c)

  (WLOG, we may let a < b < c.) Let m := (a + b)/2, the average of the two smallest roots.
  Then the tangent line to f at m is given by

    y = f(m) + f′(m)(x − m)

  and y = 0 precisely when x = c.


First found in a video by Dr. Peyam here; Desmos demo here.
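
For instance, f(x) = x(x + 1)(x − 2) = x³ − x² − 2x has roots a = −1, b = 0, c = 2, so m = −1/2.
Then f(m) = 5/8 and f′(m) = −1/4, and the tangent line y = 5/8 − (1/4)(x + 1/2) hits y = 0 at
x = 2 = c, as claimed.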

External Links to Extremely Niche/Specialized Topics:

• Ferrari's Method for Quartics: (Encyclopedia of Math link) A means of turning solving a quartic
  into an issue of solving cubics and quadratics.
• Gauss-Lucas Theorem: (Wikipedia) For p ∈ C[z] non-constant, the roots of p′ are in the convex
  hull of those from p.
• Kronecker's Method: (Wikipedia) Uses that x ∈ Z =⇒ p(x) ∈ Z for p ∈ Z[x]. Requires a lot of
  brute-force, however, and is not recommended by hand.

§1.4: Quadratic, Cubic, Quartic Formulas

Quadratic Formula
The quadratic formula is familiar and kept as a formality:

    ax² + bx + c = 0  =⇒  x = ( −b ± √(b² − 4ac) ) / (2a)

Cubic Formula
The cubic formula, for ax³ + bx² + cx + d = 0, is given by first defining

    p = −b/(3a)
    q = p³ + (bc − 3ad)/(6a²)
    r = c/(3a)

Then

    x = ∛( q + √(q² + (r − p²)³) ) + ∛( q − √(q² + (r − p²)³) ) + p

Written in its full cumbersome madness,

    x = ∛( −b³/(27a³) + (bc − 3ad)/(6a²) + √( (−b³/(27a³) + (bc − 3ad)/(6a²))² + (c/(3a) − b²/(9a²))³ ) )
      + ∛( −b³/(27a³) + (bc − 3ad)/(6a²) − √( (−b³/(27a³) + (bc − 3ad)/(6a²))² + (c/(3a) − b²/(9a²))³ ) )
      − b/(3a)

The previous writing makes the use of complex roots implicit. To mitigate this, one may write the three
solutions explicitly as
    x₁ = ∛( q + √(q² + (r − p²)³) ) + ∛( q − √(q² + (r − p²)³) ) + p

    x₂ = −(1 − i√3)/2 · ∛( q + √(q² + (r − p²)³) ) − (1 + i√3)/2 · ∛( q − √(q² + (r − p²)³) ) + p

    x₃ = −(1 + i√3)/2 · ∛( q + √(q² + (r − p²)³) ) − (1 − i√3)/2 · ∛( q − √(q² + (r − p²)³) ) + p

with p, q, r as defined above; to uncompress fully into a, b, c, d, e, substitute q = −b³/(27a³) + (bc − 3ad)/(6a²),
r − p² = c/(3a) − b²/(9a²), and p = −b/(3a) throughout.
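
A worked instance of the shorthand version, with x³ − 6x² + 11x − 6 = 0 (roots 1, 2, 3):

    p = −(−6)/(3·1) = 2
    q = p³ + (bc − 3ad)/(6a²) = 8 + ((−6)(11) − 3(1)(−6))/6 = 8 − 8 = 0
    r = c/(3a) = 11/3,   so r − p² = −1/3 and q² + (r − p²)³ = −1/27

    x₁ = ∛( √(−1/27) ) + ∛( −√(−1/27) ) + 2 = (1/2 + i/(2√3)) + (1/2 − i/(2√3)) + 2 = 3

Note the complex intermediate values even though the root is real – this is the casus irreducibilis
phenomenon.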

Quartic Formula:

The formula for equations of the type ax⁴ + bx³ + cx² + dx + e = 0 is significantly worse, worry not.
Define the shorthands

    p = 2c³ − 9bcd + 27ad² + 27b²e − 72ace

    q = c² − 3bd + 12ae

    r = ∛( p + √(−4q³ + p²) ) / ( 3a·∛2 )

    s = ∛2·q / ( 3a·∛( p + √(−4q³ + p²) ) )

    t = b²/(4a²) − 2c/(3a)

    u = −b³/a³ + 4bc/a² − 8d/a

Then we have

    x = −b/(4a) ± (1/2)√(t + s + r) ± (1/2)√( 2t − s − r ± u/(4√(t + s + r)) )

where each of the four roots arises from a distinct choice of + or − on the two outer radicals; the sign on
the u term matches the choice made on the first radical.
Solely for the sake of posterity, each root, fully uncompressed into just a, b, c, d, e, can be found here and
its underlying LaTeX here. (There's no way it could fit on this page.)

Quintic and Higher:


Using Galois theory, one may show there exists no formula (in terms of +, −, ×, /, powers, and roots)
solving a general quintic or higher equation in C[z]. This is the Abel-Ruffini theorem (Wikipedia).
The simplest such example is x⁵ − x − 1 = 0.

§2: Items from Trigonometry

§2.1: Basic Definitions

A good, compact summary from Paul’s Online Math Notes.

• Right Triangle Definitions: For an acute angle θ in a right triangle,

  ◦ sin θ = opposite / hypotenuse
  ◦ cos θ = adjacent / hypotenuse
  ◦ tan θ = opposite / adjacent
  ◦ csc θ = hypotenuse / opposite
  ◦ sec θ = hypotenuse / adjacent
  ◦ cot θ = adjacent / opposite
  ◦ Mnemonic: SOH-CAH-TOA (for the first 3)

• Unit Circle Definitions: Make an angle θ with one side on the positive x-axis, measured counter-
  clockwise. The other (terminal) side crosses the unit circle x² + y² = 1 at a point (x, y). Then:

  ◦ sin θ = y
  ◦ cos θ = x
  ◦ tan θ = y/x
  ◦ csc θ = 1/y
  ◦ sec θ = 1/x
  ◦ cot θ = x/y
  ◦ Thus the point crossing the unit circle is (x, y) = (cos θ, sin θ)
  ◦ The other four can be found visually too: Desmos demo.

• Functions in Terms of Sine & Cosine:

  ◦ tan θ = sin θ / cos θ
  ◦ csc θ = 1 / sin θ
  ◦ sec θ = 1 / cos θ
  ◦ cot θ = cos θ / sin θ
• Domains: Interpreted as functions D → R for D the "natural domain".

  ◦ sin θ has domain R
  ◦ cos θ has domain R
  ◦ tan θ has domain R \ { (2n+1)π/2 }_{n∈Z}
  ◦ csc θ has domain R \ { nπ }_{n∈Z}
  ◦ sec θ has domain R \ { (2n+1)π/2 }_{n∈Z}
  ◦ cot θ has domain R \ { nπ }_{n∈Z}

• Ranges: Interpreted as functions D → R for D the "natural domain".

  ◦ sin θ has range [−1, 1]
  ◦ cos θ has range [−1, 1]
  ◦ tan θ has range R
  ◦ csc θ has range R \ (−1, 1)
  ◦ sec θ has range R \ (−1, 1)
  ◦ cot θ has range R

§2.2: Special Values

(An extended, fuller list of special values, beyond the basics from this section, can be found on Wikipedia.)

Some of the basic ones can be thought of in terms of the 30◦ -60◦ -90◦ and 45◦ -45◦ -90◦ triangles, and some
basic geometry.

Note that, per the unit circle definitions, the "All Students Take Calculus" mnemonic holds.

• Label the quadrants with "A", "S", "T", "C" in numerical order
• All functions are positive where "A" is (0° < θ < 90°, i.e. 0 < θ < π/2)
• Sine (and thus cosecant) is positive where "S" is (90° < θ < 180°, i.e. π/2 < θ < π)
• Tangent (and thus cotangent) is positive where "T" is (180° < θ < 270°, i.e. π < θ < 3π/2)
• Cosine (and thus secant) is positive where "C" is (270° < θ < 360°, i.e. 3π/2 < θ < 2π)

The basic unit circle values, tabulated, for 30°- and 45°-related angles:

  θ (deg) | θ (rad) |  sin θ  |  cos θ  |  tan θ  |  sec θ  |  csc θ  |  cot θ
  --------+---------+---------+---------+---------+---------+---------+--------
    0°    |    0    |    0    |    1    |    0    |    1    | undef.  | undef.
   30°    |   π/6   |   1/2   |  √3/2   |  1/√3   |  2/√3   |    2    |   √3
   45°    |   π/4   |  1/√2   |  1/√2   |    1    |   √2    |   √2    |    1
   60°    |   π/3   |  √3/2   |   1/2   |   √3    |    2    |  2/√3   |  1/√3
   90°    |   π/2   |    1    |    0    | undef.  | undef.  |    1    |    0
  120°    |  2π/3   |  √3/2   |  −1/2   |  −√3    |   −2    |  2/√3   | −1/√3
  135°    |  3π/4   |  1/√2   | −1/√2   |   −1    |  −√2    |   √2    |   −1
  150°    |  5π/6   |   1/2   | −√3/2   | −1/√3   | −2/√3   |    2    |  −√3
  180°    |    π    |    0    |   −1    |    0    |   −1    | undef.  | undef.
  210°    |  7π/6   |  −1/2   | −√3/2   |  1/√3   | −2/√3   |   −2    |   √3
  225°    |  5π/4   | −1/√2   | −1/√2   |    1    |  −√2    |  −√2    |    1
  240°    |  4π/3   | −√3/2   |  −1/2   |   √3    |   −2    | −2/√3   |  1/√3
  270°    |  3π/2   |   −1    |    0    | undef.  | undef.  |   −1    |    0
  300°    |  5π/3   | −√3/2   |   1/2   |  −√3    |    2    | −2/√3   | −1/√3
  315°    |  7π/4   | −1/√2   |  1/√2   |   −1    |   √2    |  −√2    |   −1
  330°    | 11π/6   |  −1/2   |  √3/2   | −1/√3   |  2/√3   |   −2    |  −√3
  360°    |   2π    |    0    |    1    |    0    |    1    | undef.  | undef.

Those 30°- and 45°-related values, visualized on the unit circle: (unit-circle figure omitted)
§2.3: Various Trigonometry Identities

Some common proof techniques for these formulas to avoid memorization are to:

• Use the geometric unit-circle definition

• Write a rotation as e^(iθ) = cos θ + i sin θ

• Use 2D rotation matrices:

    [ cos θ   −sin θ ]
    [ sin θ    cos θ ]

Anyhow, onto the list of identities:

• Each function in terms of only one of the others: The ± depends on the location of θ (use "All
  Students Take Calculus")

• Pythagorean Identities: First is immediate from the unit circle; others come from dividing it by
  sin²θ or cos²θ (or the unit circle visual):

  ◦ sin²θ + cos²θ = 1
  ◦ 1 + cot²θ = csc²θ
  ◦ tan²θ + 1 = sec²θ

• Parity, Cofunction, & Reflection Identities: Easily motivated by the unit circle. (The first column
  denotes those for evenness/oddness, and the second are the cofunction identities.)

• Periodicity: Sine, cosine, secant, and cosecant are 2π-periodic, and tangent and cotangent are π-
  periodic. These are easily motivated by the graphs.

• Harmonic Addition Formula: We may write

    c₁ cos θ + c₂ sin θ = A cos(θ + φ)

  wherein

    A = sgn(c₁) · √(c₁² + c₂²)
    cos φ = c₁/A
    sin φ = −c₂/A

  We need both of the last two to accurately determine where φ lies. You can also use

    tan φ = −c₂/c₁  =⇒  φ = arctan(−c₂/c₁)

  with no real issue. (A worked example follows this bullet.) We may generalize to arbitrary phase shifts
  with

    a sin(x + θ_a) + b sin(x + θ_b) = c sin(x + φ)
    where c² = a² + b² + 2ab cos(θ_a − θ_b)
    and φ = arctan( (a sin θ_a + b sin θ_b) / (a cos θ_a + b cos θ_b) )

  and to arbitrarily many with

    Σ_{i=1}^n aᵢ sin(x + θᵢ) = A sin(x + φ)
    where A² = Σ_{i,j=1}^n aᵢaⱼ cos(θᵢ − θⱼ)
    and φ = arctan( ( Σ_{i=1}^n aᵢ sin θᵢ ) / ( Σ_{i=1}^n aᵢ cos θᵢ ) )
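
  For instance, 3 cos θ + 4 sin θ: A = sgn(3)·√(9 + 16) = 5, cos φ = 3/5, sin φ = −4/5, so
  φ = arctan(−4/3) ≈ −0.9273, giving 3 cos θ + 4 sin θ = 5 cos(θ − 0.9273...).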

• Angle Sum Identities: Motivated by triangles with angles meeting at acute angles α, β. Wikipedia
  lists some proofs here.

  ◦ Sine: sin(α ± β) = sin α cos β ± cos α sin β (sine cosine sign cosine sine)
  ◦ Cosine: cos(α ± β) = cos α cos β ∓ sin α sin β (cosine cosine cosign sine sine)
  ◦ Tangent: tan(α ± β) = ( tan α ± tan β ) / ( 1 ∓ tan α tan β )
  ◦ Cosecant: csc(α ± β) = sec α sec β csc α csc β / ( sec α csc β ± csc α sec β )
  ◦ Secant: sec(α ± β) = sec α sec β csc α csc β / ( csc α csc β ∓ sec α sec β )
  ◦ Cotangent: cot(α ± β) = ( cot α cot β ∓ 1 ) / ( cot β ± cot α )
  ◦ Arcsine: arcsin x ± arcsin y = arcsin( x√(1 − y²) ± y√(1 − x²) )
  ◦ Arccosine: arccos x ± arccos y = arccos( xy ∓ √((1 − x²)(1 − y²)) )
  ◦ Arctangent: arctan x ± arctan y = arctan( (x ± y)/(1 ∓ xy) )
  ◦ Arccotangent: arccot x ± arccot y = arccot( (xy ∓ 1)/(y ± x) )

• Sums of Infinitely-Many Angles Inside: Extends to arbitrarily-many finite angles with θᵢ = 0
  "eventually" (for i large enough). Some similar formulas exist for the other four here.

  ◦ sin( Σ_{i=1}^∞ θᵢ ) = Σ_{k ≥ 1 odd} (−1)^((k−1)/2) Σ_{A a k-subset of N} ( ∏_{i∈A} sin θᵢ )( ∏_{i∉A} cos θᵢ )

  ◦ cos( Σ_{i=1}^∞ θᵢ ) = Σ_{k ≥ 0 even} (−1)^(k/2) Σ_{A a k-subset of N} ( ∏_{i∈A} sin θᵢ )( ∏_{i∉A} cos θᵢ )

• Double Angle Formulas: Can motivate with α = β in angle-sum formulas.

  ◦ sin(2θ) = 2 sin θ cos θ = (sin θ + cos θ)² − 1 = 2 tan θ / (1 + tan²θ)
  ◦ cos(2θ) = cos²θ − sin²θ = 2cos²θ − 1 = 1 − 2sin²θ = (1 − tan²θ)/(1 + tan²θ)
  ◦ tan(2θ) = 2 tan θ / (1 − tan²θ)
  ◦ cot(2θ) = (cot²θ − 1)/(2 cot θ) = (1 − tan²θ)/(2 tan θ)
  ◦ sec(2θ) = sec²θ/(2 − sec²θ) = (1 + tan²θ)/(1 − tan²θ)
  ◦ csc(2θ) = (sec θ csc θ)/2 = (1 + tan²θ)/(2 tan θ)

• Triple Angle Formulas:

  ◦ sin(3θ) = 3 sin θ − 4 sin³θ = 4 sin θ sin(π/3 − θ) sin(π/3 + θ)
  ◦ cos(3θ) = 4 cos³θ − 3 cos θ = 4 cos θ cos(π/3 − θ) cos(π/3 + θ)
  ◦ tan(3θ) = (3 tan θ − tan³θ)/(1 − 3 tan²θ) = tan θ tan(π/3 − θ) tan(π/3 + θ)
  ◦ cot(3θ) = (3 cot θ − cot³θ)/(1 − 3 cot²θ)
  ◦ sec(3θ) = sec³θ/(4 − 3 sec²θ)
  ◦ csc(3θ) = csc³θ/(3 csc²θ − 4)

• Multi-Angle Formulas:

  ◦ sin(nθ) = Σ_{k odd} (−1)^((k−1)/2) C(n,k) cos^(n−k)θ sin^k θ

            = sin θ · Σ_{i=0}^{(n+1)/2} Σ_{j=0}^{i} (−1)^(i−j) C(n, 2i+1) C(i, j) cos^(n−2(i−j)−1) θ

            = 2^(n−1) ∏_{k=0}^{n−1} sin( θ + kπ/n )

  ◦ cos(nθ) = Σ_{k even} (−1)^(k/2) C(n,k) cos^(n−k)θ sin^k θ

            = Σ_{i=0}^{n/2} Σ_{j=0}^{i} (−1)^(i−j) C(n, 2i) C(i, j) cos^(n−2(i−j)) θ

    and, as products,

      cos((2n+1)θ) = (−1)^n 2^(2n) ∏_{k=0}^{2n} cos( θ + kπ/(2n+1) )

      cos(2nθ) = (−1)^n 2^(2n−1) ∏_{k=0}^{2n−1} cos( θ + (2k+1)π/(4n) )

  ◦ tan(nθ) = [ Σ_{k odd} (−1)^((k−1)/2) C(n,k) tan^k θ ] / [ Σ_{k even} (−1)^(k/2) C(n,k) tan^k θ ]

• Chebyshev's Multi-Angle Formula Recursions: (a computational sketch follows this list)

  ◦ cos(nx) = 2 cos(x) cos((n − 1)x) − cos((n − 2)x)
  ◦ sin(nx) = 2 cos(x) sin((n − 1)x) − sin((n − 2)x)
  ◦ tan(nx) = ( tan((n − 1)x) + tan(x) ) / ( 1 − tan((n − 1)x) tan(x) )

  If we let Sₙ = sin(nx), Cₙ = cos(nx), and Tₙ = tan(nx), then these are restated as

  ◦ Cₙ = 2C₁Cₙ₋₁ − Cₙ₋₂
  ◦ Sₙ = 2C₁Sₙ₋₁ − Sₙ₋₂
  ◦ Tₙ = (Tₙ₋₁ + T₁) / (1 − Tₙ₋₁T₁)
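
  A minimal sketch of the cosine recursion in Python (the function name and test values are mine):

    import math

    def cos_nx(n, x):
        # C_n = 2*C_1*C_{n-1} - C_{n-2}, seeded with C_0 = 1, C_1 = cos(x)
        c_prev, c = 1.0, math.cos(x)
        for _ in range(n - 1):
            c_prev, c = c, 2*math.cos(x)*c - c_prev
        return c_prev if n == 0 else c

    print(cos_nx(5, 0.7))    # ≈ cos(3.5)
    print(math.cos(5*0.7))   # same value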

• Lagrange's Identities: Provided θ ≢ 0 (mod 2π),

  ◦ Σ_{k=0}^n sin kθ = [ cos(θ/2) − cos((n + 1/2)θ) ] / [ 2 sin(θ/2) ]

  ◦ Σ_{k=0}^n cos kθ = [ sin(θ/2) + sin((n + 1/2)θ) ] / [ 2 sin(θ/2) ]

• Dirichlet Kernel: Related to the above:

    1 + 2 Σ_{k=1}^n cos kθ = sin((n + 1/2)θ) / sin(θ/2)
• Half-Angle Formulas: The ± sign convention is based upon where θ/2 ends up landing w.r.t. the
  unit circle.

  ◦ sin(θ/2) = ±√( (1 − cos θ)/2 )

  ◦ cos(θ/2) = ±√( (1 + cos θ)/2 )

  ◦ tan(θ/2) = (1 − cos θ)/sin θ
             = sin θ/(1 + cos θ)
             = csc θ − cot θ
             = tan θ/(1 + sec θ)
             = sgn(sin θ) √( (1 − cos θ)/(1 + cos θ) )
             = ( −1 + sgn(cos θ) √(1 + tan²θ) ) / tan θ

  ◦ cot(θ/2) = (1 + cos θ)/sin θ
             = sin θ/(1 − cos θ)
             = csc θ + cot θ
             = sgn(sin θ) √( (1 + cos θ)/(1 − cos θ) )
• Various Power Reduction Formulas:

  ◦ sin²θ = (1 − cos 2θ)/2
  ◦ sin³θ = (3 sin θ − sin 3θ)/4
  ◦ sin⁴θ = (3 − 4 cos 2θ + cos 4θ)/8
  ◦ sin⁵θ = (10 sin θ − 5 sin 3θ + sin 5θ)/16
  ◦ cos²θ = (1 + cos 2θ)/2
  ◦ cos³θ = (3 cos θ + cos 3θ)/4
  ◦ cos⁴θ = (3 + 4 cos 2θ + cos 4θ)/8
  ◦ cos⁵θ = (10 cos θ + 5 cos 3θ + cos 5θ)/16
  ◦ sin²θ cos²θ = (1 − cos 4θ)/8
  ◦ sin³θ cos³θ = (3 sin 2θ − sin 6θ)/32
  ◦ sin⁴θ cos⁴θ = (3 − 4 cos 4θ + cos 8θ)/128
  ◦ sin⁵θ cos⁵θ = (10 sin 2θ − 5 sin 6θ + sin 10θ)/512
  ◦ In general, utilizing the binomial theorem,

      cos^n θ = (1/2^(n−1)) Σ_{k=0}^{(n−1)/2} C(n,k) cos((n − 2k)θ),                            n odd
      cos^n θ = (1/2^n) C(n, n/2) + (1/2^(n−1)) Σ_{k=0}^{n/2−1} C(n,k) cos((n − 2k)θ),          n even

      sin^n θ = (1/2^(n−1)) Σ_{k=0}^{(n−1)/2} (−1)^((n−1)/2 − k) C(n,k) sin((n − 2k)θ),         n odd
      sin^n θ = (1/2^n) C(n, n/2) + (1/2^(n−1)) Σ_{k=0}^{n/2−1} (−1)^(n/2 − k) C(n,k) cos((n − 2k)θ),  n even

• Product-to-Sum Formulas: Also called prosthaphaeresis formulas. The first 3 historically are
  known as Werner's formulas.

  ◦ cos α cos β = [ cos(α − β) + cos(α + β) ]/2
  ◦ sin α sin β = [ cos(α − β) − cos(α + β) ]/2
  ◦ sin α cos β = [ sin(α + β) + sin(α − β) ]/2
  ◦ tan α tan β = [ cos(α − β) − cos(α + β) ] / [ cos(α − β) + cos(α + β) ]

  ◦ ∏_{k=1}^n cos θₖ = (1/2^n) Σ_{(x₁,···,xₙ)∈{1,−1}^n} cos( x₁θ₁ + · · · + xₙθₙ )

  ◦ ∏_{k=1}^n sin θₖ = ( (−1)^⌊n/2⌋ / 2^n ) Σ_{(x₁,···,xₙ)∈{1,−1}^n} cos( x₁θ₁ + · · · + xₙθₙ ) ∏_{j=1}^n xⱼ,   n even
                     = ( (−1)^⌊n/2⌋ / 2^n ) Σ_{(x₁,···,xₙ)∈{1,−1}^n} sin( x₁θ₁ + · · · + xₙθₙ ) ∏_{j=1}^n xⱼ,   n odd

• Sum-to-Product Formulas:

  ◦ sin α ± sin β = 2 sin( (α ± β)/2 ) cos( (α ∓ β)/2 )
  ◦ cos α + cos β = 2 cos( (α + β)/2 ) cos( (α − β)/2 )
  ◦ cos α − cos β = −2 sin( (α + β)/2 ) sin( (α − β)/2 )
  ◦ tan α ± tan β = sin(α ± β) / (cos α cos β)
• Miscellany:

  ◦ ∏_{k=1}^{n−1} sin(kπ/n) = n / 2^(n−1)

  ◦ ∏_{k=1}^n cos( kπ/(2n+1) ) = 1/2^n    (Source: Video by Dr. Michael Penn)

§2.4: Arcfunction Identities

The arcfunctions (say, arc-f(x)) are defined to be the inverses of their original functions (say, f(x)) on
certain intervals – the ones that end up defining their ranges.

Note that this means if solving the equation, say, sin(x) = y, you need to account for periodicity of the
original function. Thus,

    sin(x) = y =⇒ x = arcsin(y) + 2πk or x = π − arcsin(y) + 2πk
    tan(x) = y =⇒ x = arctan(y) + πk

for k ∈ Z, and similar extensions for the other functions.


Now, onto the identities:

ˆ Cofunction-Like Identities:
π
◦ arccos(x) = − arcsin(x)
2
π
◦ arccot(x) = − arctan(x)
2
π
◦ arccsc(x) = − arcsec(x)
2
◦ In general, arc-f (x) + arc-(co-f )(x) = π/2
ˆ Parity-Like Identities: arcsin, arctan, arccsc are odd; arccos, arccot, arcsec are neither.

◦ arcsin(−x) = − arcsin(x)
◦ arccos(−x) = π − arccos(x)
◦ arctan(−x) = − arctan(x)
◦ arccot(−x) = π − arccot(x)
◦ arcsec(−x) = π − arcsec(x)
◦ arccsc(−x) = − arccsc(x)
◦ Alternate angle: arc-f (−x) + arc-f (x) ∈ {0, π}
ˆ Function-Arcfunction Compositions: (of the type f (f −1 (x))) Each is easily motivated with a
triangle: if we want sin(arctan(x)) for instance, let θ = arctan(x) =⇒ tan θ = x, and label a triangle.
Then find sin θ.
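For instance, carrying that example through: tan θ = x = x/1, so label the opposite leg x, the adjacent
leg 1, and hence the hypotenuse √(1 + x²); reading off the triangle then gives

  sin(arctan(x)) = x/√(1 + x²)        cos(arctan(x)) = 1/√(1 + x²)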

ˆ Reciprocal Arguments:

◦ arcsin(1/x) = arccsc(x)

◦ arccos(1/x) = arcsec(x)

◦ arcsec(1/x) = arccos(x)

◦ arccsc(1/x) = arcsin(x)

◦ arctan(1/x) = { π/2 − arctan(x),  x > 0        = { arccot(x),      x > 0
                { −π/2 − arctan(x), x < 0          { −π + arccot(x), x < 0

◦ arccot(1/x) = { π/2 − arccot(x),  x > 0        = { arctan(x),      x > 0
                { 3π/2 − arccot(x), x < 0          { π + arctan(x),  x < 0
ˆ Fragmentary Formulas: Meant to be used only if one has a fraction of a sine table. If complex
numbers get involved in the roots, we choose the ones with positive real part, or positive imaginary
part if it has negative real part.

  arccos(x) = arcsin(√(1 − x²)), if 0 ≤ x ≤ 1, from which you get

  arccos((1 − x²)/(1 + x²)) = arcsin(2x/(1 + x²)), if 0 ≤ x ≤ 1

  arcsin(√(1 − x²)) = π/2 − sgn(x) arcsin(x)

  arccos(x) = (1/2) arccos(2x² − 1), if 0 ≤ x ≤ 1

  arcsin(x) = (1/2) arccos(1 − 2x²), if 0 ≤ x ≤ 1

  arcsin(x) = arctan(x/√(1 − x²))

  arccos(x) = arctan(√(1 − x²)/x)

  arctan(x) = arcsin(x/√(1 + x²))

  arccot(x) = arccos(x/√(1 + x²))

One that follows from these is

  arctan(x) = arccos(√(1/(1 + x²))), if x ≥ 0

because

  cos(arctan(x)) = √(1/(1 + x²)) = cos(arccos(√(1/(1 + x²))))

The tangent half-angle formula yields

  arcsin(x) = 2 arctan(x/(1 + √(1 − x²)))

  arccos(x) = 2 arctan(√(1 − x²)/(1 + x)), if −1 < x ≤ 1

  arctan(x) = 2 arctan(x/(1 + √(1 + x²)))
ˆ Double-Angle-Like:

◦ 2 arcsin(x) = arcsin(2x√(1 − x²))

◦ 2 arccos(x) = arccos(2x² − 1)

◦ 2 arctan(x) = arcsin(2x/(1 + x²)) = arccos((1 − x²)/(1 + x²)) = arctan(2x/(1 − x²))

ˆ Triple-Angle-Like:

◦ 3 arcsin(x) = arcsin(3x − 4x3 )


◦ 3 arccos(x) = arccos(4x3 − 3x)
◦ 3 arctan(x) = arctan((3x − x³)/(1 − 3x²))

ˆ Angle-Sum-Like:

◦ arcsin(x) ± arcsin(y) = arcsin(x√(1 − y²) ± y√(1 − x²))

◦ arccos(x) ± arccos(y) = arccos(xy ∓ √((1 − x²)(1 − y²)))

◦ arctan(u) ± arctan(v) = arctan((u ± v)/(1 ∓ uv)) (mod π), uv ̸= 1
§2.5: Triangle Laws (Laws of Sines, Cosines, & More)

Consider a triangle of sides a, b, c opposing angles α, β, γ:

The following hold:

ˆ Law of Sines:
sin α sin β sin γ
◦ = =
a b c
◦ The ratio of a sine of an angle to its side length is constant in a triangle

ˆ Law of Cosines:

◦ a2 = b2 + c2 − 2bc cos α
◦ b2 = a2 + c2 − 2ac cos β
◦ c2 = a2 + b2 − 2ab cos γ
◦ The square of one side equals the sum of the squares of the other two sides, minus twice their
product times the cosine of the angle between them

ˆ Law of Tangents:

◦ (a − b)/(a + b) = tan((α − β)/2) / tan((α + β)/2)

◦ (b − c)/(b + c) = tan((β − γ)/2) / tan((β + γ)/2)

◦ (a − c)/(a + c) = tan((α − γ)/2) / tan((α + γ)/2)
   
ˆ Mollweide’s Formulas: (a + b)/c = cos((α − β)/2)/sin(γ/2)   and   (a − b)/c = sin((α − β)/2)/cos(γ/2)

◦ Divide the two for the law of tangents.
◦ These can also be used to derive the laws of sines and cosines.
§2.6: Some Useful Values for Fourier Analysis

These follow easily from previous discussions but it’s sometimes nice to have a quick reference for these.
Throughout, assume n ∈ Z.

ˆ sin(nπ) = 0

ˆ cos(nπ) = (−1)^n

ˆ sin(nπ/2) = { 1,  n = 4k + 1 for a k ∈ Z
             { −1, n = 4k + 3 for a k ∈ Z     = (−1)^{(n−1)/2} · [1 − (−1)^n]/2 = [(−1)^{⌊(n−1)/2⌋} + (−1)^{⌊n/2⌋}]/2
             { 0,  otherwise

ˆ cos(nπ/2) = { 1,  n = 4k for a k ∈ Z
             { −1, n = 4k + 2 for a k ∈ Z    = (−1)^{n/2} · [1 − (−1)^{n−1}]/2 = [(−1)^{⌊n/2⌋} + (−1)^{⌊(n+1)/2⌋}]/2 = [i^n + (−i)^n]/2
             { 0,  otherwise

ˆ sin((2n + 1)π/2) = (−1)^n

ˆ cos((2n + 1)π/2) = 0
§3: Items from Basic Calculus & Related Topics

§3.1: Primer & Identities for Hyperbolic Trig Functions

This is largely just to go over the fundamental definitions and identities for these functions. We define

  sinh(x) := (eˣ − e⁻ˣ)/2

  cosh(x) := (eˣ + e⁻ˣ)/2

and tanh(x), sech(x), etc., are defined analogously to the ordinary trig functions:

  tanh(x) := sinh(x)/cosh(x) = (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ)

  sech(x) := 1/cosh(x) = 2/(eˣ + e⁻ˣ)

  csch(x) := 1/sinh(x) = 2/(eˣ − e⁻ˣ)

  coth(x) := 1/tanh(x) = (eˣ + e⁻ˣ)/(eˣ − e⁻ˣ)

Each has an associated “arc-” function that is its inverse. (“ar-” is arguably the more proper, and common,
prefix; “arc” in ordinary trig refers to arc lengths, while “ar-” here refers to area, one of the
defining geometric ideas behind these functions.)
Some noteworthy identities include the following:

ˆ Arguments of ix in the original trig functions:

◦ sinh(x) = −i sin(ix)
◦ cosh(x) = cos(ix)
◦ tanh(x) = −i tan(ix)
◦ coth(x) = i cot(ix)
◦ sech(x) = sec(ix)
◦ csch(x) = i csc(ix)

ˆ Parities: sinh(x), tanh(x), coth(x), csch(x) are odd functions; cosh(x), sech(x) are even

◦ E(x) is even iff E(−x) = E(x)


◦ O(x) is odd iff O(−x) = −O(x)

ˆ Reciprocal Arcfunction Identities:

◦ arcsech(x) = arccosh(1/x)

◦ arccsch(x) = arcsinh(1/x)

◦ arccoth(x) = arctanh(1/x)

◦ General scheme: arc-(1/f )(x) = arc-f (1/x)

ˆ Sum Identities:

◦ cosh(x) + sinh(x) = eˣ

◦ cosh(x) − sinh(x) = e⁻ˣ

◦ A cosh(x) + B sinh(x) = { A√(1 − B²/A²) cosh(x + arctanh(B/A)),  if A² > B²
                          { B√(1 − A²/B²) sinh(x + arctanh(A/B)),  if A² < B²
                          { A exp(x),                              if A = B
ˆ Pythagorean-like Identities:

◦ cosh2 (x) − sinh2 (x) = 1


◦ 1 − tanh2 (x) = sech2 (x) (divide “main one” by cosh2 (x))
◦ coth2 (x) − 1 = csch2 (x) (divide “main one” by sinh2 (x))

ˆ Argument Summation Identities:

◦ sinh(x ± y) = sinh(x) cosh(y) ± cosh(x) sinh(y) (sine cosine sign cosine sine)
◦ cosh(x ± y) = cosh(x) cosh(y) ± sinh(x) sinh(y) (cosine cosine sign sine sine)
◦ tanh(x ± y) = [tanh(x) ± tanh(y)] / [1 ± tanh(x) tanh(y)]

ˆ Double Angle-like Identities:

◦ sinh(2x) = 2 sinh(x) cosh(x)

◦ cosh(2x) = sinh²(x) + cosh²(x) = 2 sinh²(x) + 1 = 2 cosh²(x) − 1

◦ tanh(2x) = 2 tanh(x)/(1 + tanh²(x))

ˆ Sum to Product Formulas:

◦ sinh(x) ± sinh(y) = 2 sinh((x ± y)/2) cosh((x ∓ y)/2)

◦ cosh(x) + cosh(y) = 2 cosh((x + y)/2) cosh((x − y)/2)

◦ cosh(x) − cosh(y) = 2 sinh((x + y)/2) sinh((x − y)/2)

ˆ Half-Angle-Like Formulas: sign(x) is the usual sign function.

◦ sinh(x/2) = sinh(x)/√(2(1 + cosh(x))) = sign(x) √[(cosh(x) − 1)/2]

◦ cosh(x/2) = √[(cosh(x) + 1)/2]

◦ tanh(x/2) = sinh(x)/(1 + cosh(x)) = sign(x) √[(cosh(x) − 1)/(cosh(x) + 1)]

◦ For x ̸= 0, then tanh(x/2) = (cosh(x) − 1)/sinh(x) = coth(x) − csch(x)

ˆ Power Reduction Formulas:


◦ sinh²(x) = (cosh(2x) − 1)/2

◦ cosh²(x) = (cosh(2x) + 1)/2
ˆ Arcfunctions in terms of logarithms:

◦ arcsinh(x) = ln(x + √(x² + 1))

◦ arccosh(x) = ln(x + √(x² − 1)) for x ≥ 1

◦ arctanh(x) = (1/2) ln((1 + x)/(1 − x)) for |x| < 1

◦ arccoth(x) = (1/2) ln((x + 1)/(x − 1)) for |x| > 1

◦ arcsech(x) = ln(1/x + √(1/x² − 1)) = ln((1 + √(1 − x²))/x) for 0 < x ≤ 1

◦ arccsch(x) = ln(1/x + √(1/x² + 1)) for x ̸= 0

ˆ Arcfunction Summation Formulas:

◦ arcsinh(u) ± arcsinh(v) = arcsinh(u√(1 + v²) ± v√(1 + u²))

◦ arccosh(u) ± arccosh(v) = arccosh(uv ± √((u² − 1)(v² − 1)))

◦ arctanh(u) ± arctanh(v) = arctanh((u ± v)/(1 ± uv))

◦ arccoth(u) ± arccoth(v) = arccoth((1 ± uv)/(u ± v))

◦ arcsinh(u) + arccosh(v) = arcsinh(uv + √((1 + u²)(v² − 1))) = arccosh(v√(1 + u²) + u√(v² − 1))

ˆ Miscellaneous arcfunction identities:

◦ 2 arccosh(x) = arccosh(2x² − 1) for x ≥ 1

◦ 4 arccosh(x) = arccosh(8x⁴ − 8x² + 1) for x ≥ 1

◦ 2 arcsinh(x) = arccosh(2x² + 1) for x ≥ 0

◦ 4 arcsinh(x) = arccosh(8x⁴ + 8x² + 1) for x ≥ 0

◦ ln(x) = arccosh((x² + 1)/(2x)) = arcsinh((x² − 1)/(2x)) = arctanh((x² − 1)/(x² + 1))

ˆ Function/arcfunction composition:
◦ sinh(arccosh(x)) = √(x² − 1) for |x| > 1

◦ sinh(arctanh(x)) = x/√(1 − x²) for −1 < x < 1

◦ cosh(arcsinh(x)) = √(1 + x²)

◦ cosh(arctanh(x)) = 1/√(1 − x²) for −1 < x < 1

◦ tanh(arcsinh(x)) = x/√(1 + x²)

◦ tanh(arccosh(x)) = √(x² − 1)/x for |x| > 1

◦ arcsinh(tan α) = arctanh(sin α) = ln((1 + sin α)/cos α) = ± arccosh(1/cos α)

◦ ln(|tan α|) = − arctanh(cos 2α)

ˆ Conversions to other arcfunctions:

◦ ln(x) = arctanh((x² − 1)/(x² + 1)) = arcsinh((x² − 1)/(2x)) = ± arccosh((x² + 1)/(2x))

◦ arctanh(x) = arcsinh(x/√(1 − x²)) = ± arccosh(1/√(1 − x²))

◦ arcsinh(x) = arctanh(x/√(1 + x²)) = ± arccosh(√(1 + x²))

◦ arccosh(x) = arcsinh(√(x² − 1)) = arctanh(√(x² − 1)/x)

§3.2: Limits

§3.2.1: Basic Definitions & Results

We say that, for a function f : R → R,


lim f (x) = L
x→c

means the following, in the given fundamental cases:


 
ˆ Case 1 (c, L ∈ R): (∀ε > 0)(∃δ > 0) 0 < |x − c| < δ =⇒ |f (x) − L| < ε
 
ˆ Case 2 (c = +∞, L ∈ R): (∀ε > 0)(∃N > 0)(∀x ≥ N ) |f (x) − L| < ε
 
ˆ Case 3 (c ∈ R, L = +∞): (∀M > 0)(∃δ > 0) 0 < |x − c| < δ =⇒ f (x) > M
 
ˆ Case 4 (c = L = +∞): (∀M > 0)(∃N > 0)(∀x ≥ N ) f (x) > M

Analogously, for the −∞ cases,


 
ˆ Case 5 (c = −∞, L ∈ R): (∀ε > 0)(∃N > 0)(∀x ≤ N ) |f (x) − L| < ε
 
ˆ Case 6 (c ∈ R, L = −∞): (∀M < 0)(∃δ > 0) 0 < |x − c| < δ =⇒ f (x) < M
 
ˆ Case 7 (c = L = −∞): (∀M < 0)(∃N < 0)(∀x ≤ N ) f (x) < M

If f is continuous at L := lim_{x→c} g(x) (say, on an open interval about it), we may bring limits inside:

  lim_{x→c} f (g(x)) = f (lim_{x→c} g(x))

If the limits individually exist and are finite, then (for α, β ∈ R)

  lim_{x→c} [αf (x) + βg(x)] = α lim_{x→c} f (x) + β lim_{x→c} g(x)

  lim_{x→c} [f (x)g(x)] = (lim_{x→c} f (x)) (lim_{x→c} g(x))

  lim_{x→c} f (x)/g(x) = (lim_{x→c} f (x)) / (lim_{x→c} g(x)) if the g limit is nonzero

These may sometimes be extended to the cases with ±∞, if one is careful.


Of course,
f ≤ g near c =⇒ lim f (x) ≤ lim g(x)
x→c x→c

and, if f ≤ g ≤ h, then the squeeze theorem holds:

lim f (x) = lim h(x) =: L =⇒ lim g(x) = L


x→c x→c x→c

Provided that f, g are such that

  lim_{x→c} |f (x)| = lim_{x→c} |g(x)| = ∞    or    lim_{x→c} f (x) = lim_{x→c} g(x) = 0

(so that f /g takes on an ∞/∞ or 0/0 form in the limit) and f, g are differentiable near c, then L’Hopital’s
rule applies:

  lim_{x→c} f (x)/g(x) = lim_{x→c} f ′(x)/g ′(x)

This can be applied repeatedly. A common use case is with functions of the type f (x)g(x) for which you can
take the log first and rewrite as, say,
  L = lim_{x→c} f (x)^{g(x)} =⇒ ln(L) = lim_{x→c} ln(f (x)^{g(x)}) = lim_{x→c} g(x) ln(f (x)) = lim_{x→c} ln(f (x))/(1/g(x))
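For instance (a standard worked case, for illustration): for L = lim_{x→0⁺} xˣ,

  ln(L) = lim_{x→0⁺} ln(x)/(1/x) = lim_{x→0⁺} (1/x)/(−1/x²) = lim_{x→0⁺} (−x) = 0 =⇒ L = e⁰ = 1

by one application of L’Hopital’s rule.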

§3.2.2: Special Limits

A collection of some others may be found here.

ˆ lim_{x→0} (1 + αx)^{β/x} = lim_{x→∞} (1 + α/x)^{βx} = e^{αβ}

◦ Special case: lim_{x→0} (1 + x)^{1/x} = lim_{x→∞} (1 + 1/x)ˣ = e

◦ Special case: lim_{x→0} (1 − x)^{1/x} = lim_{x→∞} (1 − 1/x)ˣ = 1/e

ˆ lim_{x→0} (aˣ − 1)/x = ln(a) if a > 0

ˆ lim_{x→0} sin(x)/x = 1

ˆ lim_{x→0} (1 − cos(x))/x = 0

ˆ lim_{n→∞} (−ln(n) + ∑_{k=1}^{n} 1/k) = γ (Euler–Mascheroni constant)

ˆ lim_{n→∞} n/(n!)^{1/n} = e

ˆ lim_{n→∞} (n!)^{1/n} = ∞

ˆ lim_{n→∞} π(n)/(n/ln(n)) = 1

ˆ lim_{n→∞} √(2πn)(n/e)ⁿ/n! = 1 (Stirling approximation)

§3.2.3: Asymptotic Notations (O, o, ω, Ω, Θ,...)

A Wikipedia article, for reference.


Assume for simplicity that f and all constants mentioned are positive if not stated otherwise. Replace f
with |f | otherwise.
There are two kinds of Ω notation, that used by Knuth and that used by Hardy & Littlewood. Re-
spectively, they’re primarily used in complexity theory, and number theory. They will be denoted ΩK , ΩHL
respectively.

ˆ Informally:
◦ f (x) ∼ g(x) if f, g grow about the same and equal in the limit
◦ f (x) = O(g(x)) if g is eventually always larger than f (up to a constant multiple); analogous to
upper bound
◦ f (x) = o(g(x)) if any constant multiple of g is always eventually an upper bound of f ; g grows
much faster; g is an unreachable upper bound
◦ f (x) = Θ(g(x)) if f can be bounded above and below by constant multiples of g (giving an exact
bound)
◦ f (x) = ω(g(x)) if g(x) = o(f (x)) (f grows much faster than g)
◦ f (x) = ΩK (g(x)) if a constant multiple of g is eventually a lower bound of f
◦ f (x) = ΩHL (g(x)) if f is not properly dominated by g in the limit
ˆ Thinking about inequalities of growth rates, then,

f (x) = O(g(x)) ⇐⇒ GrowthRate(f ) ≤ GrowthRate(g)


f (x) = ΩK (g(x)) ⇐⇒ GrowthRate(f ) ≥ GrowthRate(g)
f (x) = Θ(g(x)) ⇐⇒ GrowthRate(f ) = GrowthRate(g)
f (x) = o(g(x)) ⇐⇒ GrowthRate(f ) < GrowthRate(g)
or perhaps GrowthRate(f ) ≪ GrowthRate(g)
f (x) = ω(g(x)) ⇐⇒ GrowthRate(f ) > GrowthRate(g)
or perhaps GrowthRate(f ) ≫ GrowthRate(g)

ˆ Formally:

◦ f (x) ∼ g(x) if

  ∀ε > 0, ∃N ∈ N such that ∀x > N , we have |f (x)/g(x) − 1| < ε; i.e., lim_{x→∞} f (x)/g(x) = 1

◦ f (x) = O(g(x)) if

  lim sup_{x→∞} f (x)/g(x) < ∞; i.e., ∃M > 0 and ∃a ∈ R such that, ∀x ≥ a, we have f (x) ≤ M g(x)

◦ f (x) = o(g(x)) if

  ∀M > 0, ∃a ∈ R such that, ∀x > a, we have |f (x)| < M g(x); i.e., lim_{x→∞} f (x)/g(x) = 0

◦ f (x) = Θ(g(x)) if f (x) = O(g(x)) and f (x) = Ω_K (g(x))

  For instance, ∃m, M > 0 and a ∈ R such that, for all x > a, mg(x) ≤ f (x) ≤ M g(x).

◦ f (x) = ω(g(x)) if

  g(x) = o(f (x)); i.e., ∀M > 0, ∃a ∈ R such that, ∀x > a, we have f (x) > M g(x); i.e., lim_{x→∞} f (x)/g(x) = ∞

◦ f (x) = Ω_K (g(x)) if

  ∃M > 0 and a ∈ R such that, ∀x > a, we have f (x) ≥ M g(x); i.e., lim inf_{x→∞} f (x)/g(x) > 0

◦ f (x) = Ω_{HL} (g(x)) if

  ∃M > 0 such that, ∀x ∈ R, ∃x₀ > x where f (x₀) ≥ M g(x₀); i.e., lim sup_{x→∞} f (x)/g(x) > 0

§3.3: Derivatives

§3.3.1: Definitions & Basic Properties/Theorems

The fundamental definition of the usual derivative, for a function f , is a second function f ′ given
pointwise by

  f ′(x) := lim_{h→0} [f (x + h) − f (x)]/h = lim_{h→x} [f (x) − f (h)]/(x − h)

The derivative can be used to give the line tangent to f at x₀ by the point-slope rule

  y = f (x₀) + f ′(x₀)(x − x₀)

This also forms a good linear approximation for x ≈ x₀.

Some fundamental results for differentiable functions f : [a, b] → R:

ˆ Extreme Value Theorem/First Derivative Test: Continuous functions attain their suprema/in-
fima; moreover, if differentiable, f ′ is 0 or nonexistent at those points.

ˆ Second Derivative Test: At a local extremum, f ′′ < 0 indicates a maximum (concave down) and
f ′′ > 0 a minimum (concave up). Inflection points occur where f ′′ changes sign (so f ′′ = 0 there, when
it exists).

ˆ Rolle’s Theorem: f (a) = f (b) =⇒ ∃ξ ∈ (a, b) where f ′(ξ) = 0

ˆ Mean Value Theorem: ∃ξ ∈ (a, b) where [f (b) − f (a)]/(b − a) = f ′(ξ)

ˆ Monotonicity: f ′ > 0 on (a, b) means f is monotone-increasing; analogous results exist.

ˆ Newton’s Method: For a root r of f (i.e. f (r) = 0) and x₀ ≈ r chosen well enough, the iterates

  x_{n+1} := xₙ − f (xₙ)/f ′(xₙ) −→ r as n → ∞
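A minimal sketch of the iteration in code (illustrative only; the function, derivative, tolerance, and
helper name are ad hoc choices):

def newton(f, fprime, x0, tol=1e-12, max_iter=100):
    # Iterate x <- x - f(x)/f'(x) until the step is tiny.
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    return x  # may not have converged; a real implementation would signal this

# Example: sqrt(2) as the positive root of f(x) = x^2 - 2.
root = newton(lambda x: x**2 - 2, lambda x: 2*x, x0=1.0)
print(root)  # 1.4142135623730951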

§3.3.2: Derivative Properties

If each function involved is differentiable with α, β ∈ R, then the following hold:

ˆ Linearity: (αf + βg)′ = αf ′ + βg ′

ˆ Product Rule: (f g)′ = f ′ g + g ′ f

◦ More generally, (f gh)′ = f ′ gh + f g ′ h + f gh′ , etc.

◦ We can just write down the product of the n functions n times, then differentiate a different factor
in each copy.

◦ Formally, (∏_{i=1}^{n} f_i)′ = ∑_{i=1}^{n} f_i′ · ∏_{1≤j≤n, j̸=i} f_j

◦ General Leibniz Rule: Generalizes this to the nth derivative: we have

  (f g)⁽ⁿ⁾ = ∑_{k=0}^{n} C(n,k) f ⁽ⁿ⁻ᵏ⁾ g⁽ᵏ⁾

  (∏_{i=1}^{m} f_i)⁽ⁿ⁾ = ∑_{k₁,···,k_m ∈ Z≥0, k₁+···+k_m = n} [n!/(k₁!···k_m!)] ∏_{ℓ=1}^{m} f_ℓ^{(k_ℓ)}

ˆ Quotient Rule: (f /g)′ = (f ′g − f g ′)/g²   (“Low d-High minus High d-Low, over Low squared”)

ˆ Chain Rule: [f (g(x))]′ = f ′(g(x)) · g ′(x);   that is, df /dx = (df /dg)|_{g(x)} · (dg/dx)|_x

◦ Faà di Bruno’s Formula: Generalizes this to n derivatives:

  dⁿ/dxⁿ f (g(x)) = ∑_{m₁,···,mₙ ∈ Z≥0, 1m₁+2m₂+···+nmₙ = n} [n!/(m₁!···mₙ!)] f ^{(m₁+···+mₙ)}(g(x)) ∏_{j=1}^{n} (g⁽ʲ⁾(x)/j!)^{m_j}

ˆ Inverse Function Rule: Suppose f has inverse function f⁻¹. Then

  d(f⁻¹)/dx = 1/(f ′ ∘ f⁻¹)(x)   or equivalently   df /dx = 1/((f⁻¹)′ ∘ f )(x)

ˆ Functional Power Rule: Just use of the chain & power rules:

  [f (x)^{g(x)}]′ = [e^{g(x) ln(f (x))}]′ = f (x)^{g(x)} (g(x) f ′(x)/f (x) + g ′(x) ln(f (x)))

or more briefly

  (f ^g)′ = (e^{g ln f })′ = f ^g (f ′ g/f + g ′ ln f )
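A quick symbolic sanity check of the functional power rule (a sketch using sympy; the specific f and g
below are arbitrary choices, not from the text):

import sympy as sp

x = sp.symbols('x', positive=True)
f = sp.sin(x) + 2     # arbitrary positive base
g = sp.cos(x)         # arbitrary exponent

lhs = sp.diff(f**g, x)
rhs = f**g * (g * sp.diff(f, x) / f + sp.diff(g, x) * sp.log(f))
print(sp.simplify(lhs - rhs))  # 0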

§3.3.3: Basic Derivative Formulas

ˆ The very basics:

◦ d/dx [constant] = 0                  ◦ d/dx sin(x) = cos(x)
◦ d/dx xⁿ = n xⁿ⁻¹                     ◦ d/dx cos(x) = − sin(x)
◦ d/dx eˣ = eˣ                         ◦ d/dx ln(x) = 1/x (x > 0)

ˆ Related to the basics:

◦ d/dx aˣ = aˣ ln(a) (a > 0)
◦ d/dx ln|x| = 1/x (x ̸= 0)
◦ d/dx log_a(x) = 1/(x ln(a)) (a, x > 0)

ˆ Other (basic) trig identities:

◦ d/dx tan(x) = sec²(x)                ◦ d/dx csc(x) = − csc(x) cot(x)
◦ d/dx sec(x) = sec(x) tan(x)          ◦ d/dx cot(x) = − csc²(x)

ˆ Inverse trig functions:

◦ d/dx arcsin(x) = 1/√(1 − x²)         ◦ d/dx arcsec(x) = 1/(|x|√(x² − 1))
◦ d/dx arccos(x) = −1/√(1 − x²)        ◦ d/dx arccsc(x) = −1/(|x|√(x² − 1))
◦ d/dx arctan(x) = 1/(1 + x²)          ◦ d/dx arccot(x) = −1/(1 + x²)

ˆ Hyperbolic trig functions:

◦ d/dx sinh(x) = cosh(x)               ◦ d/dx sech(x) = − sech(x) tanh(x)
◦ d/dx cosh(x) = sinh(x)               ◦ d/dx csch(x) = − csch(x) coth(x)
◦ d/dx tanh(x) = sech²(x)              ◦ d/dx coth(x) = − csch²(x)

ˆ Inverse hyperbolic trig functions:

◦ d/dx arcsinh(x) = 1/√(1 + x²)        ◦ d/dx arccoth(x) = 1/(1 − x²) (|x| > 1)
◦ d/dx arccosh(x) = 1/√(x² − 1) (x > 1)   ◦ d/dx arcsech(x) = −1/(x√(1 − x²)) (0 < x < 1)
◦ d/dx arctanh(x) = 1/(1 − x²) (|x| < 1)  ◦ d/dx arccsch(x) = −1/(|x|√(1 + x²)) (x ̸= 0)

§3.3.4: The “♡ of (Differential) Calculus” Formula

(As ruminated on in guides from Dr. Johnson & Dr. Kunin at UAH.)

In principle, in the first approximation, the increment of a differentiable function is proportional to the
increment of the argument, with constant of proportionality the derivative, when appropriately understood.
It is, in essence, the linear approximation, though its use is woefully understated in most calculus texts.
For a function of one variable,

f (x + dx) = f (x) + f ′ (x) dx (♡ of calculus formula)

where dx indeed represents “an infinitesimally small quantity”, in the sense of 0 < dx < r ∀r > 0. Formally,

  f (x + h) = f (x) + f ′(x) · h + o(h)    where o(h)/h → 0 as h → 0

For vector-valued or multivariable functions, we may generalize:

  F⃗(t + dt) = F⃗(t) + F⃗′(t) dt                                  (vector-valued functions)

  f (x⃗ + dx⃗) = f (x⃗) + ∇f (x⃗) · dx⃗                              (scalar-valued function of several variables)

  F_i (x⃗ + dx⃗) = F_i (x⃗) + ∑_j (∂F_i/∂x_j) dx_j                  (vector-valued function of multiple variables (vector fields))

An example of utility is the derivative of F⃗ × G⃗. We may think of the derivative as

  f ′(x) := lim_{dx→0} [f (x + dx) − f (x)]/dx

Then

  (F⃗ × G⃗)′(t) = lim_{dt→0} [F⃗(t + dt) × G⃗(t + dt) − F⃗(t) × G⃗(t)]/dt                          (definition)

              = lim_{dt→0} [(F⃗(t) + F⃗′(t) dt) × (G⃗(t) + G⃗′(t) dt) − F⃗(t) × G⃗(t)]/dt           (♡ formula)

              = lim_{dt→0} (1/dt)[F⃗ × G⃗ + F⃗′ × G⃗ dt + F⃗ × G⃗′ dt + F⃗′ × G⃗′ (dt)² − F⃗ × G⃗]|_t   (cross product rules, compactify notation)

              = lim_{dt→0} (1/dt)[F⃗′ × G⃗ dt + F⃗ × G⃗′ dt + F⃗′ × G⃗′ (dt)²]|_t                    (cancellation)

              = lim_{dt→0} [F⃗′ × G⃗ + F⃗ × G⃗′ + F⃗′ × G⃗′ dt]|_t                                   (cancellation)

              = F⃗′(t) × G⃗(t) + F⃗(t) × G⃗′(t)                                                    (take limit)

Remark: This did not make much use of the cross product aside from its distributivity. One may use this
to prove the product rule almost identically as a result.

§3.4: Integrals

§3.4.1: Fundamental Properties

Formally, the antiderivative of a function f is a function F such that F ′ = f . We may write


Z
F (x) = f (x) dx

Note that any two antiderivatives will differ by a constant, i.e. F ′ = G′ = f =⇒ F = G + C. The +C is
the constant of integration.
We may define the definite Riemann integral of f on [a, b] as discussed in the analysis sections on
Riemann integration. Some other results of note lie there. We focus on formulas here.
Some fundamental properties: for α, β ∈ R and f, g ∈ R[a, b] (Riemann integrable on [a, b]),
ˆ Reverse Order of Bounds: ∫_a^b f (x) dx = − ∫_b^a f (x) dx

ˆ Zero-Width Interval: ∫_a^a f (x) dx = 0

ˆ Linearity: ∫_a^b (αf (x) + βg(x)) dx = α ∫_a^b f (x) dx + β ∫_a^b g(x) dx

ˆ Monotonicity: f ≤ g means ∫_a^b f (x) dx ≤ ∫_a^b g(x) dx. In particular,

  M := sup_{x∈[a,b]} f (x)        m := inf_{x∈[a,b]} f (x)

give the results

  m(b − a) ≤ ∫_a^b f (x) dx ≤ M (b − a)

ˆ Average Value: The average value of f on [a, b] is (1/(b − a)) ∫_a^b f (x) dx

ˆ Mean Value Theorem: ∃ξ ∈ [a, b] such that f (ξ) = (1/(b − a)) ∫_a^b f (x) dx

ˆ Fundamental Theorems of Calculus:

◦ Part 1: x ↦ ∫_a^x f (t) dt is continuous on [a, b] and differentiable with derivative f .

◦ Part 2: ∫_a^b f (x) dx = [F (x)]_{x=a}^{x=b} if F ′ = f

ˆ u-Substitution: ∫_a^b f (g(x))g ′(x) dx = ∫_{g(a)}^{g(b)} f (u) du, for g “nice enough” (e.g. g ∈ C¹)

ˆ Integration by Parts: ∫_a^b f (x)g ′(x) dx = [f (x)g(x)]_a^b − ∫_a^b f ′(x)g(x) dx   (“LIATE” rule for choosing f )

§3.4.2: Basic Identities & Links to Integral Tables

Some common integral identities. Many tables exist, though, e.g. here, here, & here. Most of the basics
are just the inverses of derivatives, however, so there is some repetition here as well.

ˆ Polynomials/x^α:

◦ ∫ dx = x + C                         ◦ ∫ xⁿ dx = xⁿ⁺¹/(n + 1) + C, when n ̸= −1

◦ ∫ 0 dx = C                           ◦ ∫ (1/x) dx = ln|x| + C

ˆ Ordinary Trig Functions:

◦ ∫ sin(x) dx = − cos(x) + C           ◦ ∫ csc(x) cot(x) dx = − csc(x) + C

◦ ∫ cos(x) dx = sin(x) + C             ◦ ∫ csc²(x) dx = − cot(x) + C

◦ ∫ sec²(x) dx = tan(x) + C            ◦ ∫ tan(x) dx = ln|sec(x)| + C

◦ ∫ sec(x) tan(x) dx = sec(x) + C      ◦ ∫ cot(x) dx = ln|sin(x)| + C

◦ ∫ sec(x) dx = ln|sec(x) + tan(x)| + C; to prove, multiply the integrand by (sec(x) + tan(x))/(sec(x) + tan(x))

◦ ∫ csc(x) dx = ln|csc(x) − cot(x)| + C; to prove, multiply the integrand by (csc(x) + cot(x))/(csc(x) + cot(x))

ˆ Exponentials & Logarithms:

◦ ∫ eˣ dx = eˣ + C                     ◦ ∫ aˣ dx = aˣ/ln(a) + C for a > 0

◦ ∫ ln(x) dx = x ln(x) − x + C; to prove, IBP with 1 · ln(x)
ˆ Inverse Trig Functions: ∫ arc-f (x) dx can be found with the 1 · arc-f (x) and IBP trick.

◦ ∫ dx/√(a² − x²) = arcsin(x/a) + C         ◦ ∫ arcsin(x) dx = x arcsin(x) + √(1 − x²) + C

◦ ∫ dx/(x² + a²) = (1/a) arctan(x/a) + C    ◦ ∫ arctan(x) dx = x arctan(x) − (1/2) ln(1 + x²) + C

◦ ∫ dx/(x√(x² − a²)) = (1/a) arcsec(x/a) + C    ◦ ∫ arccos(x) dx = x arccos(x) − √(1 − x²) + C

ˆ Hyperbolic Trig Functions:

◦ ∫ sinh(x) dx = cosh(x) + C           ◦ ∫ csch(x) coth(x) dx = − csch(x) + C

◦ ∫ cosh(x) dx = sinh(x) + C           ◦ ∫ sech(x) dx = arctan(sinh(x)) + C

◦ ∫ tanh(x) dx = ln(cosh(x)) + C       ◦ ∫ sech²(x) dx = tanh(x) + C

◦ ∫ sech(x) tanh(x) dx = − sech(x) + C ◦ ∫ csch²(x) dx = − coth(x) + C

ˆ Some Reduction Formulas: A pretty big list of others is here.

◦ Iₙ := ∫ sinⁿ(x) dx = −(1/n) sinⁿ⁻¹(x) cos(x) + ((n − 1)/n) Iₙ₋₂

◦ Iₙ := ∫ cosⁿ(x) dx = (1/n) cosⁿ⁻¹(x) sin(x) + ((n − 1)/n) Iₙ₋₂

◦ Iₙ := ∫ tanⁿ(x) dx = (1/(n − 1)) tanⁿ⁻¹(x) − Iₙ₋₂

◦ Iₙ := ∫ secⁿ(x) dx = (1/(n − 1)) secⁿ⁻²(x) tan(x) + ((n − 2)/(n − 1)) Iₙ₋₂

◦ Iₙ := ∫ cscⁿ(x) dx = −(1/(n − 1)) cscⁿ⁻²(x) cot(x) + ((n − 2)/(n − 1)) Iₙ₋₂

§3.4.3: Basic & Advanced Solution Techniques

Some solving techniques follow. Some textual resources enumerated here.

ˆ Integration by Parts with 1: Sometimes it’s handy to rewrite an integral as


Z Z
f (x) dx = 1 · f (x) dx

This is particularly handy with the inverse function rule, when f is the inverse of a well-understood
function such as f (x) = arctan(x). In this scheme, you would opt by default to differentiate f and
work from there.

◦ Since this is just the integral product rule, there is the generalization of
Z 2
Y n
Y n Z
X Y
f1′ (x) fj (x) dx = fi (x) − fi′ (x) fj (x) dx
j=2 i=1 i=2 1≤j≤n
j̸=i

ˆ Trig Substitution: For integrals containing the following forms, use the suggested substitution to
leverage a Pythagorean trig identity:

◦ √(a² − b²x²) =⇒ x = (a/b) sin θ   (Identity to Leverage: sin²θ + cos²θ = 1)

◦ √(b²x² − a²) =⇒ x = (a/b) sec θ   (Identity to Leverage: tan²θ = sec²θ − 1)

◦ √(a² + b²x²) =⇒ x = (a/b) tan θ   (Identity to Leverage: tan²θ = sec²θ − 1)

You can also use the following if you wish, though they’re sometimes poorer-behaved depending on the
functions used, whether they’re increasing/decreasing compared to their counterpart, and sometimes
the relevant derivatives introduce additional negatives.

◦ √(a² − b²x²) =⇒ x = (a/b) cos θ   (Identity to Leverage: sin²θ + cos²θ = 1)

◦ √(b²x² − a²) =⇒ x = (a/b) csc θ   (Identity to Leverage: csc²θ − 1 = cot²θ)

◦ √(a² + b²x²) =⇒ x = (a/b) cot θ   (Identity to Leverage: csc²θ = 1 + cot²θ)

Lesser-known hyperbolic trig substitutions exist which sometimes make for more manageable integrals.
The standard ones:

◦ √(b²x² + a²) =⇒ x = (a/b) sinh(u)   (Identity to Leverage: cosh²u − sinh²u = 1)

◦ √(b²x² − a²) =⇒ x = (a/b) cosh(u)   (Identity to Leverage: cosh²u − sinh²u = 1)

Likewise, some of the poorer-behaved ones:

◦ √(a² − b²x²) =⇒ x = (a/b) tanh(u)   (Identity to Leverage: 1 − tanh²u = sech²u)

◦ √(a² − b²x²) =⇒ x = (a/b) sech(u)   (Identity to Leverage: 1 − sech²u = tanh²u)

◦ √(b²x² − a²) =⇒ x = (a/b) coth(u)   (Identity to Leverage: coth²u − 1 = csch²u)

◦ √(b²x² + a²) =⇒ x = (a/b) csch(u)   (Identity to Leverage: coth²u − 1 = csch²u)
ˆ Weierstrass Substitution: Use on rational functions of sine and cosine. We define

  t := tan(x/2)

and consequently have

  sin(x) = 2t/(1 + t²)        cos(x) = (1 − t²)/(1 + t²)        dx = 2 dt/(1 + t²)

This also shows that

  ((1 − t²)/(1 + t²), 2t/(1 + t²)),   t ∈ R

parameterizes the unit circle, starting from (1, 0), treating (−1, 0) as t = ∞.

ˆ Hyperbolic Weierstrass Substitution: Analogously, let

  t := tanh(x/2)

and consequently have

  sinh(x) = 2t/(1 − t²)       cosh(x) = (1 + t²)/(1 − t²)       dx = 2 dt/(1 − t²)
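To illustrate the (ordinary) substitution in practice: with t = tan(x/2), the integrand of ∫ dx/(2 + cos x)
becomes 2 dt/(t² + 3), giving (2/√3) arctan(tan(x/2)/√3) + C. A quick symbolic check of that claim
(a sketch; sympy is used here only to differentiate back):

import sympy as sp

x = sp.symbols('x')
F = (2 / sp.sqrt(3)) * sp.atan(sp.tan(x / 2) / sp.sqrt(3))
print(sp.simplify(sp.diff(F, x) - 1 / (2 + sp.cos(x))))  # 0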

ˆ Bioche’s Rules: (Wikipedia). For ∫ f (t) dt, let ω(t) := f (t) dt, with f a rational function of sine and
cosine; Bioche’s rules suggest some potential substitutions:

◦ If ω(−t) = ω(t), then try u = cos(t)


◦ If ω(π − t) = ω(t), then try u = sin(t)
◦ If ω(π + t) = ω(t), then try u = tan(t)
◦ For both of the previous two, sometimes u = cos(2t) is good
◦ Otherwise, try u = tan(t/2), the Weierstrass substitution, as your last resort

An analogue for f a rational function of sinh(t), cosh(t) exists: just use the respective hyperbolic
version. u = et also just gives a rational function.
ˆ Parities: Use odd/evenness, self-explanatory.

◦ There is an even-odd decomposition too,

  f (x) = [f (x) + f (−x)]/2 + [f (x) − f (−x)]/2

with the first summand being even and the second being odd.

◦ For f, g even and h odd, we have

  ∫_{−a}^{a} f (x)/(1 ± g(x)^{h(x)}) dx = ∫_0^a f (x) dx

ˆ Contour Integration: Its core is the residue theorem, whereby

  ∮_γ f (z) dz = 2πi ∑_k Res(f, a_k)

where

  Res(f, c) = lim_{z→c} (z − c)f (z)

at simple poles c ∈ C. For poles of order n, in general,

  Res(f, c) = (1/(n − 1)!) lim_{z→c} dⁿ⁻¹/dzⁿ⁻¹ [(z − c)ⁿ f (z)]
ˆ Leibniz’s Rule/Feynman’s Trick: In its most general form, for f such that f (x, t), f_t (x, t) are
continuous in t, x in the bounds of integration, with a, b ∈ C¹,

  d/dx ∫_{a(x)}^{b(x)} f (x, t) dt = f (x, b(x))b′(x) − f (x, a(x))a′(x) + ∫_{a(x)}^{b(x)} ∂f (x, t)/∂x dt

though it is most commonly used with a, b constant:

  d/dx ∫_a^b f (x, t) dt = ∫_a^b ∂f (x, t)/∂x dt

Some generalizations, examples here. There do not appear to be general “rules” for what to do, but
some common parameterizations include using

x 7→ tx
x 7→ f −1 (tf (x)), for f a function appearing elsewhere in the integral
xn 7→ xt , where n was given and fixed, but t is our parameter
Z ∞
sin(x)
f (x) 7→ f (x)e−tx ; example: for dx
0 x

Another is just to add it in somewhere random, e.g. Example 3.4 here. The end goal is to write the
integral as a function of a parameter t and solve a corresponding DE.
A variety of example parameterizations follow, mostly from Differentiating Under the Integral Sign by
Keith Conrad (download).
  ∫_0^∞ e^{−x} dx −→ ∫_0^∞ t e^{−tx} dx

  ∫_0^∞ (sin x/x) dx −→ ∫_0^∞ e^{−tx} (sin x/x) dx

  ∫_0^∞ e^{−x²/2} dx −→ ∫_0^∞ e^{−t²(1+x²)/2}/(1 + x²) dx

  ∫_0^∞ (log x/√x) e^{−x} dx −→ ∫_0^∞ x^t e^{−x} dx   (since ∂_t x^t = x^t log x)

  ∫_0^∞ 1/(1 + x²) dx −→ ∫_0^∞ 1/(t² + x²) dx

  ∫_0^1 (x² − 1)/ln x dx −→ ∫_0^1 (x^t − 1)/ln x dx
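To sketch how the sin x/x parameterization plays out (the standard computation, included for
illustration): with I(t) := ∫_0^∞ e^{−tx} (sin x/x) dx,

  I′(t) = − ∫_0^∞ e^{−tx} sin x dx = −1/(1 + t²),   and I(t) → 0 as t → ∞,

so I(t) = π/2 − arctan(t), whence ∫_0^∞ (sin x/x) dx = I(0⁺) = π/2.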

ˆ Smaller/More Niche Tricks:

◦ Add/Subtract The Same Thing: Common in numerators of rational functions, e.g. x/(x + 1).
◦ Partial Fractions: Self-explanatory, use on integrals of ratios of polynomials.
◦ Convert to Series: Self-explanatory, typically will swap summation & integration. Sometimes
the conversion to a Riemann sum is warranted, specifically if an infinite limit is involved with the
limiting variable both as an upper bound and a summand of some sort.
◦ Sine & Cosine in Complexes: May be convenient to write

  sin(x) = (e^{ix} − e^{−ix})/(2i)        cos(x) = (e^{ix} + e^{−ix})/2

◦ Leverage Laplace/Fourier Transforms: For instance,

  ∫_0^∞ f (t)/t dt = ∫_0^∞ L_f (s) ds

◦ Euler Substitution: Never used it, but read up on it here.


◦ King’s Property: ∫_a^b f (x) dx = ∫_a^b f (a + b − x) dx (u-sub: u = a + b − x).
May be thought of as integrating backwards. An application: if we take

  I = ∫_0^a √(a − x)/(√x + √(a − x)) dx

applying this rule and then adding I to itself gives 2I = ∫_0^a 1 dx

◦ Harmonic Functions: In multivariable calculus, f : R² → R is harmonic if ∂ₓ²f + ∂ᵧ²f = 0. It
then follows that for such functions, with r > 0,

  ∫_0^{2π} f (a + r cos θ, b + r sin θ) dθ = 2π · f (a, b)

◦ Glasser’s Master Theorem (Special Case): For f ∈ C(R) with ∫_R f (x) dx existing, then ∀a > 0,

  ∫_{−∞}^{∞} f (x) dx = ∫_{−∞}^{∞} f (x − a/x) dx

◦ Frullani Integral: We have ∫_0^∞ [f (ax) − f (bx)]/x dx = (−f (0) + lim_{x→∞} f (x)) · ln(a/b)

◦ By Lobachevsky’s Integral Formula: For f π-periodic, we have

  ∫_0^∞ (sin²(x)/x²) f (x) dx = ∫_0^∞ (sin(x)/x) f (x) dx = ∫_0^{π/2} f (x) dx

and

  ∫_0^∞ (sin⁴(x)/x⁴) f (x) dx = ∫_0^{π/2} f (x) dx − (2/3) ∫_0^{π/2} sin²(x) f (x) dx

◦ For f with bounded antiderivative on [0, ∞),

  ∫_0^∞ f (x) dx = (1/2) ∫_0^∞ [f (x) + (1/x²) f (1/x)] dx

by u = 1/x.

ˆ Measure Theory: MCT & DCT: (See the measure theory section for relevant definitions.)

◦ Monotone Convergence Theorem (MCT): Take {f_k}_{k∈N} measurable on E.

  (Ascending & Below) If f_k ↗ f a.e., and ∃φ ∈ L¹(E) with f_k ≥ φ a.e. ∀k, then

    ∫_E f_k → ∫_E f as k → ∞;   that is, lim_{k→∞} ∫_E f_k = ∫_E lim_{k→∞} f_k = ∫_E f

  (Descending & Above) If f_k ↘ f a.e., and ∃φ ∈ L¹(E) with f_k ≤ φ a.e. ∀k, then the same
  conclusion holds.

◦ Uniform Convergence Theorem: Take {f_k}_{k∈N} ⊆ L¹(E) with f_k → f uniformly on E,
with µ(E) < ∞. Then:

  f ∈ L¹(E)    and    ∫_E f_k → ∫_E f as k → ∞

◦ Dominated Convergence Theorem (DCT): Take {f_k}_{k∈N} measurable on E with f_k → f
pointwise-a.e. If ∃φ ∈ L¹(E) with |f_k| ≤ φ a.e. ∀k, then

    ∫_E f_k → ∫_E f as k → ∞;   that is, lim_{k→∞} ∫_E f_k = ∫_E lim_{k→∞} f_k = ∫_E f

  Bounded Convergence Theorem: Follows immediately as a corollary for φ ≡ M ∈ R.
  Take {f_k}_{k∈N} measurable on E (with µ(E) < ∞) with f_k → f pointwise-a.e. If ∃M ∈ R with
  |f_k| ≤ M a.e. ∀k, then the same conclusion holds.

  Sequential/Generalized Version: Take {f_k}_{k∈N}, {φ_k}_{k∈N} sequences of measurable functions with

  · f_k → f pointwise a.e. in E
  · φ_k → φ pointwise a.e. in E
  · |f_k| ≤ φ_k a.e. in E for all k
  · φ ∈ L¹(E)
  · ∫_E φ_k → ∫_E φ

  Then ∫_E |f_k − f | → 0 as k → ∞

  Convergence In Measure Version: Suppose {f_k}_{k∈N} has f_k → f in measure on E, and
  |f_k| ≤ φ ∈ L¹(E) for some φ. Then f ∈ L¹(E) and ∫_E f_k → ∫_E f as k → ∞.
  Proved by showing every subsequence {f_{k_j}}_{j∈N} has a subsubsequence {f_{k_{j_i}}}_{i∈N} with
  ∫_E f_{k_{j_i}} → ∫_E f as i → ∞.

◦ Corollary of Fatou, DCT, & MCT: Take {f_k}_{k∈N} nonnegative measurable functions with
f_k → f pointwise-a.e. on E and f_k ≤ f a.e. and for each k. Then

    ∫_E f_k → ∫_E f as k → ∞

Note there is no assumption on integrability of f , unlike, say, DCT. Some call this also the MCT,
despite being strictly stronger and more practical. Some discussion on MSE here.
It may be considered a corollary of the DCT for ∫_E f < ∞, as well, and hence the usage of Fatou
arises (and can be used independently) for ∫_E f = ∞. Note, too, Fatou can be considered implied
by the MCT (see typical proofs).

§3.4.4: Applications & Approximations

Some formulas from applications:


ˆ Area Between f and g: Given by ∫_a^b [f (x) − g(x)] dx if f ≥ g on [a, b].

ˆ Arc Length: Given by ∫_a^b √(1 + [f ′(x)]²) dx

ˆ Volume of Solid of Revolution About x-Axis: Given by π ∫_a^b f ²(x) dx; adjust as needed. Can
combine with the area-between-curves formula for the washer method and to go about different axes.

ˆ Surface Area of Solid of Revolution About x-Axis: Given by 2π ∫_a^b f (x) √(1 + [f ′(x)]²) dx.
Similar thoughts/alternatives apply.

Some approximation methods for definite integrals:

ˆ General Constructions: Break up [a, b] into n equal subintervals [xi−1 , xi ] of width ∆xi = (b − a)/n.
We will evaluate f at a point ξi in each interval.
ˆ Constant Interpolations:

◦ General Form: ∫_a^b f (x) dx ≈ ((b − a)/n) ∑_{i=1}^{n} f (ξ_i)

◦ Left-Endpoint Rule (ξ_i = x_{i−1}): ∫_a^b f (x) dx ≈ ((b − a)/n) ∑_{i=1}^{n} f (x_{i−1})

  Error is bounded above by ((b − a)²/(2n)) sup_{x∈[a,b]} |f ′(x)|

◦ Right-Endpoint Rule (ξ_i = x_i): ∫_a^b f (x) dx ≈ ((b − a)/n) ∑_{i=1}^{n} f (x_i)

  Error is bounded above by ((b − a)²/(2n)) sup_{x∈[a,b]} |f ′(x)|

◦ Midpoint Rule (ξ_i = (x_{i−1} + x_i)/2): ∫_a^b f (x) dx ≈ ((b − a)/n) ∑_{i=1}^{n} f ((x_{i−1} + x_i)/2)

  Error is bounded above by ((b − a)³/(24n²)) sup_{x∈[a,b]} |f ′′(x)|

ˆ Linear Interpolation/Trapezoid Rule:

◦ Given by ∫_a^b f (x) dx ≈ ((b − a)/n) ∑_{k=1}^{n} [f (x_{k−1}) + f (x_k)]/2 = ((b − a)/(2n)) (f (x₀) + f (xₙ) + 2 ∑_{k=1}^{n−1} f (x_k))

◦ Error bounded above by ((b − a)³/(12n²)) sup_{x∈[a,b]} |f ′′(x)|

ˆ Quadratic Interpolation/Simpson’s Rule:

◦ Must split up [a, b] into n-many subintervals for n even.

◦ ∫_a^b f (x) dx ≈ ((b − a)/(3n)) (f (x₀) + f (xₙ) + 4 ∑_{j=1}^{n/2} f (x_{2j−1}) + 2 ∑_{j=1}^{n/2−1} f (x_{2j}))

◦ Error bounded above by ((b − a)⁵/(180n⁴)) sup_{x∈[a,b]} |f ⁽⁴⁾(x)|
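A minimal reference implementation of the composite rule just stated (a sketch; the function and
argument names are ad hoc, not from any particular library):

import math

def simpson(f, a, b, n):
    # Composite Simpson's rule on [a, b] with n subintervals (n must be even).
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    total = f(xs[0]) + f(xs[-1])
    total += 4 * sum(f(xs[2 * j - 1]) for j in range(1, n // 2 + 1))
    total += 2 * sum(f(xs[2 * j]) for j in range(1, n // 2))
    return (b - a) / (3 * n) * total

# Example: integrate sin on [0, pi]; exact value is 2.
print(simpson(math.sin, 0.0, math.pi, 10))  # ~2.0000068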

ˆ Simpson’s 2nd Rule (Cubics): Read here

ˆ Newton-Cotes: Read here.

§3.4.5: Special Integral Values

Some are listed here.

ˆ Gamma Function-type Integrals:

◦ ∫_0^∞ xⁿ e^{−x} dx = Γ(n + 1) = n! for n ∈ Z≥0

◦ ∫_0^∞ √x e^{−x} dx = Γ(3/2) = (1/2)! = √π/2

◦ ∫_0^∞ (1/√x) e^{−x} dx = Γ(1/2) = (−1/2)! = √π
ˆ Gaussian: ∫_0^∞ e^{−x²} dx = √π/2

ˆ Zeta-y Integrals:

◦ ∫_0^∞ x/(eˣ − 1) dx = ζ(2) = π²/6

◦ ∫_0^∞ x²/(eˣ − 1) dx = 2ζ(3)

◦ ∫_0^∞ x³/(eˣ − 1) dx = 6ζ(4) = π⁴/15

◦ ∫_{[0,1]ⁿ} 1/(1 − ∏_{i=1}^{n} x_i) ∏_{i=1}^{n} dx_i = ζ(n)

ˆ ∫_0^∞ (sin x/x) dx = ∫_0^∞ (sin²x/x²) dx = π/2

ˆ ∫_0^{π/2} sinⁿ(x) dx = ∫_0^{π/2} cosⁿ(x) dx = ((n − 1)!!/n!!) · { 1, n odd; π/2, n even }   for n ∈ Z≥1

ˆ Sophomore’s Dream: ∫_0^1 x^{−x} dx = ∑_{n=1}^{∞} n^{−n}   and   ∫_0^1 xˣ dx = − ∑_{n=1}^{∞} (−n)^{−n}

§3.4.6: The “True” Antiderivative of 1/x

(First seen in a video by Dr. Trefor Bazett here. Likely alluded to by Dr. Kunin. Desmos demo here.)

It is commonly stated that


Z
1
dx = ln|x| + C for x ∈ (−∞, 0) ∪ (0, +∞) = R̸=0
x
This doesn’t tell the full story. The fact that the domain is disconnected allows us to consider shifting each
“half” of the graph of ln|x| by different constants, yet maintain the property that the derivative is 1/x.
That is to say, it is most general to say

  ∫ (1/x) dx = { ln|x| + C, x > 0     = { ln(x) + C,  x > 0
               { ln|x| + D, x < 0       { ln(−x) + D, x < 0

for any constants C, D ∈ R you please.


One may point to the mean value theorem for a claimed contradiction. The mean value theorem has a
corollary stating that
f ′ = g ′ on (a, b) =⇒ f = g + C for a constant C
but note that this requires an open interval, a connected set.
One may say, then, most generally:
Z
f (x) dx = F (x) + C(x)

where C(x) is a locally constant function, and F ′ = f . C(x) being locally constant means it will be
constant, in particular, over every connected component of the domain, i.e. C ′ ≡ 0 on its domain. (A
connected component is a maximal subset w.r.t. inclusion that cannot be written as the union of two
disjoint open sets. Hence, in R, C(x) must be constant over every open interval in the domain, though on
different maximally-sized intervals, it may take on different constants.) It may prove helpful to think, then,
of the +C in integration not as a constant, but rather something with a vanishing derivative.
As a consequence, for instance,

  ∫ (1/x²) dx = { −1/x + C, x > 0
                { −1/x + D, x < 0

or

  ∫ dx/(1 − x²) = { [ln|1 + x| − ln|1 − x|]/2 + A, x < −1
                  { [ln(1 + x) − ln(1 − x)]/2 + B, x ∈ (−1, 1)
                  { [ln|1 + x| − ln|1 − x|]/2 + C, x > 1
i.e. this is nothing special to do with the logarithm, and is merely an artifact of the domains over which
these functions are defined.

§3.5: Taylor, Maclaurin, & Other Special Series

(A more comprehensive list on Wikipedia is here.)


ˆ Generally: For f ∈ C^∞, the Taylor series centered at c ∈ C is given by f (x) = ∑_{n=0}^{∞} (f ⁽ⁿ⁾(c)/n!)(x − c)ⁿ

ˆ Binomial Theorems: We assume C(n,k) = 0 when k > n, and define, for all n ∈ C, k ∈ Z≥0,

  C(n,k) := n(n − 1)(n − 2) · · · (n − k + 1)/k!   (“k numbers on top, over k!”)

◦ Usual: (1 + x)ⁿ = ∑_{k=0}^{n} C(n,k) xᵏ = ∑_{k=0}^{∞} C(n,k) xᵏ, for n ∈ Z≥0

◦ Newton’s Generalization: (1 − x)^{−α} = ∑_{k=0}^{∞} C(α + k − 1, k) xᵏ for α ∈ C

ˆ Sums of Powers (∑ k^α): Some more discussion on Wikipedia’s article on Faulhaber’s formula.

◦ ∑_{k=1}^{n} k = n(n + 1)/2

◦ ∑_{k=1}^{n} k² = n(n + 1)(2n + 1)/6

◦ ∑_{k=1}^{n} k³ = [n(n + 1)/2]²

◦ ∑_{k=1}^{∞} 1/k² =: ζ(2) = π²/6

◦ ∑_{k=1}^{∞} 1/k⁴ =: ζ(4) = π⁴/90

◦ ∑_{k=1}^{∞} 1/k⁶ =: ζ(6) = π⁶/945

◦ ∑_{k=1}^{∞} 1/k^{2n} =: ζ(2n) = (−1)^{n+1} B_{2n}(2π)^{2n}/(2 · (2n)!) for B_k the kth Bernoulli number, and n ∈ Z≥1

ˆ Geometric Series: Many like identities exist, by manipulating derivatives and integrals.

◦ Finite: ∑_{k=0}^{n} xᵏ = (1 − xⁿ⁺¹)/(1 − x)

◦ Infinite: ∑_{k=0}^{∞} xᵏ = 1/(1 − x) if |x| < 1

ˆ Exponentials & Logarithms:

◦ eˣ = exp(x) = ∑_{k=0}^{∞} xᵏ/k!

◦ ln(1 − x) = − ∑_{k=1}^{∞} xᵏ/k

◦ ln(1 + x) = ∑_{k=1}^{∞} ((−1)^{k+1}/k) xᵏ

◦ ln(x) = ∑_{k=1}^{∞} ((−1)^{k+1}/k) (x − 1)ᵏ

ˆ Ordinary Trigonometric Functions: Many are excluded since their series are complicated, rarely
used, or involve Bernoulli numbers. Note that if x is in the denominator, it is not technically a power series.

◦ sin(x) = ∑_{k=0}^{∞} ((−1)ᵏ/(2k + 1)!) x^{2k+1}

◦ cos(x) = ∑_{k=0}^{∞} ((−1)ᵏ/(2k)!) x^{2k}

◦ arcsin(x) = ∑_{k=0}^{∞} [(2k)!/(2^{2k}(k!)²(2k + 1))] x^{2k+1} for |x| ≤ 1

◦ arccos(x) = π/2 − ∑_{k=0}^{∞} [(2k)!/(2^{2k}(k!)²(2k + 1))] x^{2k+1} for |x| ≤ 1   (π/2 − arcsin(x))

◦ arctan(x) = { ∑_{k=0}^{∞} ((−1)ᵏ/(2k + 1)) x^{2k+1},             |x| ≤ 1
              { π/2 − ∑_{k=0}^{∞} ((−1)ᵏ/(2k + 1)) · 1/x^{2k+1},   x ≥ 1
              { −π/2 − ∑_{k=0}^{∞} ((−1)ᵏ/(2k + 1)) · 1/x^{2k+1},  x ≤ −1

◦ arccsc(x) = ∑_{k=0}^{∞} [(2k)!/(2^{2k}(k!)²(2k + 1))] · 1/x^{2k+1} for |x| ≥ 1   (arccsc(x) = arcsin(1/x))

◦ arcsec(x) = π/2 − ∑_{k=0}^{∞} [(2k)!/(2^{2k}(k!)²(2k + 1))] · 1/x^{2k+1} for |x| ≥ 1   (arcsec(x) = arccos(1/x))

◦ arccot(x) = { π/2 − ∑_{k=0}^{∞} ((−1)ᵏ/(2k + 1)) x^{2k+1},       |x| ≤ 1
              { ∑_{k=0}^{∞} ((−1)ᵏ/(2k + 1)) · 1/x^{2k+1},         x ≥ 1
              { π + ∑_{k=0}^{∞} ((−1)ᵏ/(2k + 1)) · 1/x^{2k+1},     x ≤ −1

ˆ Hyperbolic Trigonometric Functions:

◦ sinh(x) = ∑_{k=0}^{∞} x^{2k+1}/(2k + 1)!   (sin(x) without the −1 stuff)

◦ cosh(x) = ∑_{k=0}^{∞} x^{2k}/(2k)!   (cos(x) without the −1 stuff)

◦ arctanh(x) = ∑_{k=0}^{∞} x^{2k+1}/(2k + 1) = (1/2) ln((1 + x)/(1 − x)) for |x| < 1

◦ (1 + aπ · coth(aπ))/(2a²) = ∑_{n=0}^{∞} 1/(n² + a²)

§3.6: Convergence Tests for Series

(A Wikipedia list of various convergence tests.)


First, some trivial observations: consider the summation ∑_{n=1}^{∞} aₙ.

ˆ nth Term Limit Test: (Wikipedia) It is required that aₙ → 0 for the summation to converge. (It is
not sufficient, however; see the harmonic series.)

ˆ Absolute Convergence Test: If a series converges absolutely, then it converges, i.e. ∑_{n=1}^{∞} |aₙ|
converging =⇒ ∑_{n=1}^{∞} aₙ does too.

Now an array of tests:

ˆ Abel’s Test (link): Consider the summation ∑_{n=1}^{∞} aₙ, known to be convergent.
Let {bₙ}_{n∈N} be a bounded, monotone sequence.
Then ∑_{n=1}^{∞} aₙbₙ is also convergent.

ˆ Alternating Series Test / Leibniz’s Criterion (link): Consider the summation ∑_{n=1}^{∞} (−1)ⁿ aₙ.
Suppose aₙ ≥ aₙ₊₁ ≥ aₙ₊₂ ≥ · · · ≥ 0 and aₙ → 0.
Then the summation in question converges. More generally, all sums of the types

  ∑_{n=k}^{∞} (−1)ⁿ aₙ        ∑_{n=ℓ}^{∞} (−1)ⁿ⁺¹ aₙ

will converge for that sequence {aₙ}_{n∈N}.


Moreover, the value of the series lies between any two consecutive partial sums; that is, ∀N ∈ N, either

  ∑_{n=k}^{N} (−1)ⁿ aₙ ≤ ∑_{n=k}^{∞} (−1)ⁿ aₙ ≤ ∑_{n=k}^{N+1} (−1)ⁿ aₙ

or

  ∑_{n=k}^{N+1} (−1)ⁿ aₙ ≤ ∑_{n=k}^{∞} (−1)ⁿ aₙ ≤ ∑_{n=k}^{N} (−1)ⁿ aₙ

We may likewise say that

  |∑_{n=k}^{∞} (−1)ⁿ aₙ − ∑_{n=k}^{N} (−1)ⁿ aₙ| ≤ |∑_{n=k}^{N+1} (−1)ⁿ aₙ − ∑_{n=k}^{N} (−1)ⁿ aₙ| = a_{N+1}

ˆ Cauchy Condensation Test (link): Consider the summation S := ∑_{n=1}^{∞} aₙ, with {aₙ}
nonnegative and non-increasing.
It converges if and only if S* := ∑_{n=0}^{∞} 2ⁿ a_{2ⁿ} does. (Moreover, this new sum will satisfy S ≤ S* ≤ 2S.)

ˆ Direct Comparison (of Series) Test (link): Consider the summation ∑_{n=1}^{∞} aₙ.
Let ∑_{n=1}^{∞} bₙ be a known absolutely convergent series, with |aₙ| ≤ |bₙ| for all n sufficiently large.
Then ∑_{n=1}^{∞} aₙ converges absolutely.
This test may be applied to integrals in the obvious way, and has the ratio comparison test as a
corollary.

ˆ Dirichlet’s Test (link): Take {aₙ}_{n∈N} ⊆ R and {bₙ}_{n∈N} ⊆ C with

◦ aₙ ≥ aₙ₊₁ ≥ · · ·
◦ aₙ → 0
◦ ∃M > 0 such that, ∀N ∈ N, |∑_{n=1}^{N} bₙ| ≤ M (uniformly bounded partial sums)

Then ∑_{n=1}^{∞} aₙbₙ converges.

ˆ Integral Test (link): Consider the summation ∑_{n=1}^{∞} aₙ.
Let f : [1, ∞) → R be such that f (n) = aₙ and f ≥ 0 and f is decreasing. Then let

  L = ∫_1^∞ f (x) dx

If L < ∞, the summation converges, and otherwise we have divergence. (Series converges iff integral
does.)
One may, of course, use ∫_N^∞ f (x) dx for ∑_{n=N}^{∞} aₙ.

ˆ Limit Comparison Test / Ratio of Limits (link): Consider the summations ∑_{n=1}^{∞} aₙ, ∑_{n=1}^{∞} bₙ.
We suppose that aₙ, bₙ > 0 for all n.
Define

  L := lim_{n→∞} aₙ/bₙ

If L exists and 0 < L < ∞, then each sum diverges iff the other does (and same for convergence).
(Both diverge or both converge.)

ˆ p-Series Test (ζ function tails): Consider the summation ∑_{n=1}^{∞} aₙ.
In particular, given p, we consider that

  ζ(p) := ∑_{n=1}^{∞} 1/nᵖ   or even just the tails ∑_{n=k}^{∞} 1/nᵖ

These summations converge only for p > 1 (if p ∈ R).

ˆ Ratio Test / d’Alembert’s Criterion (link): Consider the summation ∑_{n=1}^{∞} aₙ.
Define

  L := lim_{n→∞} |aₙ₊₁/aₙ|

Then:
◦ L < 1 =⇒ absolute convergence
◦ L > 1 =⇒ divergence
◦ L = 1 is inconclusive

Sometimes the limit L does not exist so we may look at the more general

  M := lim sup_{n→∞} |aₙ₊₁/aₙ|        m := lim inf_{n→∞} |aₙ₊₁/aₙ|

and conclude:
◦ if M < 1, absolute convergence
◦ if m > 1, divergence
◦ If |aₙ₊₁/aₙ| ≥ 1 for all n large enough, divergence
◦ Inconclusive otherwise

Somewhat obscure extensions of the test may be found here.

ˆ Root Test / Cauchy’s Criterion (link): Consider the summation ∑_{n=1}^{∞} aₙ.
Define

  L := lim sup_{n→∞} |aₙ|^{1/n}

◦ L < 1 =⇒ convergence
◦ L > 1 =⇒ divergence
◦ L = 1 is inconclusive

The root test is stronger than the ratio test: root test conclusions imply those of the ratio test, but
not vice versa.
The root test has an application in the Cauchy-Hadamard theorem, stating that the radius of
convergence of a power series f (x) := ∑_{n=1}^{∞} aₙ(x − c)ⁿ is given by

  r = 1 / (lim sup_{n→∞} |aₙ|^{1/n})

with r = 0 if the limit is ∞ and r = ∞ if the limit is 0.

ˆ Weierstrass M-test (link): Take a sequence {fₙ : A → C}_{n∈N}, and suppose ∃{Mₙ}_{n∈N} ⊆ R≥0 such
that

◦ |fₙ(x)| ≤ Mₙ for all n ∈ N and x ∈ A

◦ ∑_{n=1}^{∞} Mₙ < ∞

Then the series ∑_{n=1}^{∞} fₙ(x) converges absolutely and uniformly on A in the sense that, for each x,

  ∑_{k=1}^{n} f_k(x) −→ ∑_{k=1}^{∞} f_k(x)   uniformly/absolutely as n → +∞
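For example (a standard application, for illustration): ∑_{n=1}^{∞} sin(nx)/n² converges uniformly on all
of R by taking Mₙ = 1/n², since |sin(nx)/n²| ≤ 1/n² and ∑ 1/n² = π²/6 < ∞.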

§3.7: Convergence Tests for Integrals

ˆ Generalizing Series Tests: In general, suppose that f : R≥1 → R≥0 is non-increasing and f ∈ R[1, b]
for all b > 1. Then we may use series convergence tests, as

  ∫_1^∞ f converges ⇐⇒ ∑_{n=1}^{∞} f (n) converges

In particular, the non-increasing & positivity conditions ensure that the sum is an upper bound for
the integral. (Of course, we may start at any integer if desired, not merely 1.)

ˆ Absolute Convergence Test: For a, b ∈ [−∞, +∞], if ∫_a^b |f | converges, so must ∫_a^b f . (Used to turn
the integrand nonnegative to help with other tests.)

ˆ Direct Comparison Test: Let f, g ∈ C(R≥a , R≥0 ) with f ≤ g. Then:

◦ ∫_a^∞ g converges =⇒ ∫_a^∞ f converges

◦ ∫_a^∞ f diverges =⇒ ∫_a^∞ g diverges

ˆ Dirichlet’s Test (link): Take f, g : R≥a → R continuous, with f having uniformly bounded partial
integrals (i.e. |∫_a^x f | ≤ M for all x, for some M independent of x) and g ≥ 0 monotone-decreasing
with g(x) → 0 as x → ∞.
Then ∫_a^∞ f g converges.

R∞
ˆ Limit At Infinity Test: Suppose that lim f (x) = L. Then, trivially, a
f diverges if L ̸= 0.
x→∞
(Converse need not be true.)

ˆ Limit Comparison Test: Let f, g ∈ C(R≥a , R≥0 . Suppose

f (x)
lim =L
x→∞ g(x)

Then
R∞ R∞
◦ If L ∈ (0, ∞), a
g converges ⇐⇒ a f converges
R∞ R∞
◦ If L ∈ (0, ∞), a
g diverges ⇐⇒ a f diverges
R∞ R∞
◦ If L = 0, a g converges =⇒ a f converges
R∞ R∞
◦ If L = 0, a f diverges =⇒ a g diverges
R∞ R∞
◦ If L = ∞, a f converges =⇒ a g converges
R∞ R∞
◦ If L = ∞, a g diverges =⇒ a f diverges

These hold for the “Type 2” improper integrals concerned with discontinuities.

§3.8: Fourier Series

§3.8.1: Basic Definitions

Given a function f , several different types of Fourier series may exist. We will adopt several (not
necessarily standard) notations for each. In each, we generally are extending a function f : I → R for I
some interval, and extending it periodically in some fashion:

ˆ (Ordinary) Fourier Series: We will have, for f : [−L, L] → R,

  FS[f ](x) = a₀/2 + ∑_{n=1}^{∞} [aₙ cos(nπx/L) + bₙ sin(nπx/L)]

wherein

  aₙ = (1/L) ∫_{−L}^{L} f (x) cos(nπx/L) dx        bₙ = (1/L) ∫_{−L}^{L} f (x) sin(nπx/L) dx

If dom(f ) = [0, 2L], then we instead integrate over that interval but the formula is otherwise unchanged.
(In general, for f 2L-periodic or periodically-extended, any interval (a, a + 2L) may be used.)
Observe that if f is even, then f (x) sin(nx) is odd and bₙ = 0. Similarly, f odd gives aₙ = 0.

ˆ Fourier Cosine Series (link): Given f : [−L, L] → R even, we have

  FCS[f ](x) = a₀/2 + ∑_{n=1}^{∞} aₙ cos(nπx/L)   where aₙ = (2/L) ∫_0^L f (x) cos(nπx/L) dx

If we have f : [0, L] → R not necessarily even, we may give it an even extension to a function
[−L, L] → R and apply the above formula with no modifications.

ˆ Fourier Sine Series (link): Given f : [−L, L] → R odd, we have

  FSS[f ](x) = ∑_{n=1}^{∞} bₙ sin(nπx/L)   where bₙ = (2/L) ∫_0^L f (x) sin(nπx/L) dx

If we have f : [0, L] → R not necessarily odd, we give it an odd extension to a function [−L, L] → R
and apply the above formula with no modifications.

ˆ Fourier Exponential Series: Given f : [−L/2, L/2] → R, we have

  FES[f ](x) = ∑_{n=−∞}^{+∞} cₙ exp(2πinx/L)   with   cₙ = (1/L) ∫_{−L/2}^{L/2} f (x) exp(−2πinx/L) dx

§3.8.2: Some Properties & Results

Some notes on convergence:

ˆ If f is α-Holder continuous for any α > 0, FS[f ] converges uniformly everywhere to f . (Called the
Dirichlet-Dini criterion.)

◦ For instance, for f ∈ C¹.

ˆ If f is of bounded variation, FS[f ] converges to f everywhere.

ˆ If f is continuous with ∑_{n∈Z} |cₙ| < ∞, then FES[f ] converges uniformly.

ˆ For f continuous or Lᵖ (1 < p ≤ ∞) in general, FS[f ] converges a.e. (known as Carleson’s theorem).

◦ Limit is meant in the Lᵖ norm
◦ Kolmogorov found an L¹ function for which FS[f ] diverges a.e. (later shown everywhere).
◦ Given E of measure zero, ∃f continuous with FS[f ] non-convergent at each point of E

Some other results follow. We let f have FS[f ] coefficients aₙ, bₙ and FES[f ] coefficients cₙ.

ˆ Riemann-Lebesgue Lemma: For f ∈ R[0, L] or R[−L, L] as needed, we have aₙ, bₙ, cₙ → 0 as n → ∞.

ˆ Parseval’s Theorem: For f ∈ L²(0, L), then (1/L) ∫_0^L |f |² = ∑_{n∈Z} |cₙ|²

ˆ Hesham’s Identity: Let f ∈ L⁴(0, L) and c₀, · · ·, c_{M−1} ̸= 0 but cₙ = 0 for n ≥ M (WLOG), then

◦ cₙ ∈ C =⇒ (1/L) ∫_0^L |f |⁴ = ∑_{k=0}^{M−1} ∑_{ℓ=0}^{M−1} c_k c̄_ℓ ( ∑_{m=k−ℓ}^{M−1} c̄_m c_{m−k+ℓ} [if k ≥ ℓ] + ∑_{m=k−ℓ}^{M−1} c_{m−ℓ+k} c̄_m [if k < ℓ] )

◦ cₙ ∈ R =⇒ (1/L) ∫_0^L |f |⁴ = ∑_{k=0}^{M−1} ∑_{ℓ=0}^{M−1} c_k c_ℓ ∑_{m=|k−ℓ|}^{M−1} c_m c_{m−|k−ℓ|}

2
ˆ Plancherel’s Theorem: Given {cn }n∈Z such that < ∞ (finite ℓ2 norm), then ∃! f ∈ L2 (0, L)
P
n∈Z |cn |
with FES[f ] given by those cn .

Given r, s : [0, P ] → R with coefficients R[n] := rₙ and S[n] := sₙ, we have the following properties
(a function f on the left, and what happens to the coefficients of FES[f ] on the right). For instance, we
see that

  FES[Re(s)](x) = ∑_{n=−∞}^{∞} [(sₙ + s̄₋ₙ)/2] exp(2πinx/P )
Some others:

ˆ Derivatives: If FES[f ] uses coefficients cₙ (with the period-L convention above), then

◦ FES[f ′] has coefficients (2πin/L) cₙ

◦ FES[f ⁽ᵏ⁾] has coefficients (2πin/L)ᵏ cₙ

§3.8.3: Some Common Fourier Series

Some common ordinary/sine-cosine Fourier series (i.e. FS[f ]):
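For instance, for the square wave f (x) = sgn(x) and the sawtooth f (x) = x, each on [−π, π] and
extended 2π-periodically:

  FS[sgn](x) = (4/π) ∑_{k=0}^{∞} sin((2k + 1)x)/(2k + 1)

  FS[x](x) = 2 ∑_{n=1}^{∞} ((−1)^{n+1}/n) sin(nx)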

§3.9: Fourier Transforms

§3.9.1: Basic Definitions & Conventions

We define the Fourier transform of a function f : R → C to be the function f̂ : R → C given by

  f̂ (ξ) := ∫_{−∞}^{∞} f (x)e^{−2πiξx} dx

and its inverse transform by

  f (x) := ∫_{−∞}^{∞} f̂ (ξ)e^{2πiξx} dξ

We sometimes use the notations

  F[f (x)](ξ)        F{f (x)}(ξ)        F[f (x)]|_ξ        F_f (ξ)

for the so-called forward transform (f ↦ f̂ ), and analogous ones for the inverse transform F⁻¹.
In Rⁿ we will use

  F_f (ξ) := ∫_{Rⁿ} f (x)e^{−2πi⟨ξ,x⟩} dx

instead (with ⟨·, ·⟩ denoting the usual dot product). Most statements here for R can be generalized to Rⁿ
with this mild modification.

Warning (Physics). In physics in particular, it is commonplace to express these in terms
of an angular frequency ω = 2πξ. This leads to the convention, for instance,

  F_f (ω) = ∫_R f (x)e^{−iωx} dx

but this breaks a symmetry in that with this convention we must define

  F⁻¹_f (x) := (1/2π) ∫_R F_f (ω)e^{iωx} dω

This gives rise to the unitary convention that allows for F[F⁻¹[f ]] = F⁻¹[F[f ]] = f :

  F_f (ω) := (1/√(2π)) ∫_R f (x)e^{−iωx} dx

In the case of Rⁿ, we will have, in the non-unitary case,

  F_f (ω) := ∫_{Rⁿ} f (x)e^{−i⟨ω,x⟩} dx        F⁻¹_f (x) := (1/(2π)ⁿ) ∫_{Rⁿ} F_f (ω)e^{i⟨ω,x⟩} dω

and in the unitary case

  F_f (ω) := (1/(2π)^{n/2}) ∫_{Rⁿ} f (x)e^{−i⟨ω,x⟩} dx        F⁻¹_f (x) := (1/(2π)^{n/2}) ∫_{Rⁿ} F_f (ω)e^{i⟨ω,x⟩} dω

Generally, here, we will assume ordinary frequency (using 2π explicitly) and
R¹ (not Rⁿ) unless specified otherwise. Identities tied to the others are summarized
here but are quite analogous.

§3.9.2: Important Properties

Some properties follow. Let α, β ∈ C and f, g, h ∈ L¹(R) (hence ∫_R |f | < ∞, for instance).

ˆ Linearity: F_{αf +βg}(ξ) = αF_f (ξ) + βF_g (ξ)

ˆ Shifting:

◦ Translation/Time: F[f (x − x₀)](ξ) = e^{−2πix₀ξ} F_f (ξ)
◦ Modulation/Frequency: F[e^{2πixξ₀} f (x)](ξ) = F_f (ξ − ξ₀)

ˆ Time Scaling: If α ̸= 0, then F[f (αx)](ξ) = (1/|α|) F_f (ξ/α)

ˆ Conjugation: F_{f̄}(ξ) = conj(F_f (−ξ))

◦ In particular, if f is real-valued, then it has the reality condition F_f (−ξ) = conj(F_f (ξ))
◦ Hence, F_f is Hermitian in such a case.
◦ If f is purely-imaginary, then F_f (−ξ) = −conj(F_f (ξ))

ˆ Multiplication by Sine/Cosine: We have

  F[f (x) cos(2παx)](ξ) = (1/2)[F_f (ξ − α) + F_f (ξ + α)]

  F[f (x) sin(2παx)](ξ) = (1/2i)[F_f (ξ − α) − F_f (ξ + α)]

ˆ Real & Imaginary Parts: F_{Re(f )}(ξ) = [F_f (ξ) + conj(F_f (−ξ))]/2 and F_{Im(f )}(ξ) = [F_f (ξ) − conj(F_f (−ξ))]/(2i)

◦ Related: If f is purely real and even, so is F_f . If f is purely imaginary and odd, so is F_f .

ˆ Invertibility/Periodicity: For suitable f , we note that F[F[f ]] = f (−x) and hence (F∘F∘F∘F)[f ] = f ,
or more simply (implying composition) F⁴[f ] = f , making the Fourier transform 4-periodic on these
functions. Hence F³[f̂ ] = f under this convention.

◦ This can be summed up with the Fourier inversion theorem or Fourier integral theorem, which
states

  f (x) = ∫_{R²} e^{2πi·(x−y)·ξ} f (y) dy dξ

which may be thought of as F⁻¹ applied to F_f .

◦ The theorem can be said to hold for all Schwartz functions (Wikipedia). That is, it applies if f
is such that

  f ∈ C^∞(R, C) and ∀α, β ∈ N, ∥f ∥_{S(R,C)} := sup_{x∈R} |x^α f ⁽ᵝ⁾(x)| < ∞

which encodes all functions of “rapidly decreasing functions”. (An analogous idea holds in Rn .)
◦ Even broader, it holds for all functions f with f, Ff ∈ L1 (R) with each continuous a.e.
◦ Some more discussion here.
ˆ Differentiation: We assume that f ∈ C 1 (R) ∩ L1 (R) with f ′ ∈ L1 (R). Then Ff ′ (ξ) = 2πiξ · Ff (ξ)

◦ Taking f ∈ C n (R) and f, f ′ , · · ·, f (n) ∈ L1 (R) gives Ff (n) (ξ) = (2πiξ)n · Ff (ξ)
◦ This admits the rule of thumb “f is smooth iff Ff decays to 0 quickly as |ξ| → ∞” and likewise
“f decays to 0 quickly as |x| → ∞ iff Ff is smooth”

◦ Related Identity: F[xⁿ f (x)](ξ) = (i/2π)ⁿ dⁿF_f (ξ)/dξⁿ

ˆ Convolution Theorem: We may define the convolution by

  (f ∗ g)(x) := ∫_R f (y)g(x − y) dy

Then F_{f ∗g} = F_f · F_g and F_{f ·g} = F_f ∗ F_g .

ˆ Uniform Continuity & Riemann-Lebesgue: The following hold since f ∈ L¹(R):

◦ F_f is uniformly continuous
◦ ∥F_f ∥_∞ ≤ ∥f ∥₁ (in the Lᵖ space senses)
◦ F_f (ξ) → 0 as |ξ| → ∞

ˆ Parseval’s Theorem: Let f, g ∈ L²(R) specifically. Then

  ⟨f, g⟩_{L²(R)} = ⟨F_f , F_g ⟩_{L²(R)} ;   that is,   ∫_R f ḡ = ∫_R F_f · conj(F_g)

◦ Corollary: Plancherel Theorem: Take f = g to get

  ∥f ∥²_{L²(R)} = ∥F_f ∥²_{L²(R)} ;   that is,   ∫_R |f |² = ∫_R |F_f |²

ˆ Poisson Summation Formula: ∑_{n∈Z} f (n) = ∑_{n∈Z} F_f (n) (Wikipedia article)

ˆ Cross-Correlation (Wiener-Khinchin) Theorem: If we define the cross-correlation of f, g by

  (f □ g)(x) := ∫_R f̄ (y)g(x + y) dy

then F_{f □g} = conj(F_f ) · F_g . (More on cross-correlation on Wikipedia.)


ˆ Fourier-Laplace Relation: Ff (ix) = Lf (−2πx)

ˆ Lᵖ Relations:

◦ F : L¹(R) → L^∞(R) is bounded as an operator (as ∥F_f ∥_∞ ≤ ∥f ∥₁). Moreover, F(L¹(R)) ⊆ C₀(R)
(though with no equality).

◦ F : L²(R) → L²(R) is unitary (hence bijective and preserves inner products)

◦ F : Lᵖ(R) → L^q(R) more generally where q = p/(p − 1) is the Holder conjugate of p, i.e.
1/p + 1/q = 1. The exact image is hard to characterize unless p = 2.

§3.9.3: Common Fourier Transforms

Link to Tables & Conventional Definitions:


A list of transforms (for all three conventions) may be found here on Wikipedia.
A few functions need defining.
ˆ Heaviside Step u(x): We let u(x) := 1_{(0,∞)}(x) ≡ { 1, x > 0; 0, x < 0 } (with conventions for u(0) varying)

ˆ sinc(x): We use the normalized sinc function, sinc(x) := sin(πx)/(πx) for x ̸= 0, with sinc(0) := 1

ˆ rect(x): The rectangular pulse or Heaviside Π, defined by rect(x) ≡ Π(x) := { 1, |x| < 1/2; 1/2, |x| = 1/2; 0, |x| > 1/2 }

ˆ tri(x): The triangle function/pulse, hat function, or tent function, with

  tri(x) ≡ Λ(x) := max{0, 1 − |x|} ≡ { 1 − |x|, |x| < 1; 0, |x| ≥ 1 }

Notably,

  tri(x) = (rect ∗ rect)(x) = rect(x/2) · (1 − |x|)
ˆ Dirac Delta δ(x): Loosely we may think of

  δ(x) :=? { +∞, x = 0
           { 0,  x ̸= 0

with ∫_R δ ≡ 1. This is purely heuristic. Formally, as a distribution (generalized function), δ is a linear
functional on the space of test functions (bump functions, or functions of compact support: C_c^∞(R)),
with

  δ[φ] = φ(0) for every φ ∈ C_c^∞(R)

and hence we say

  δ[φ] := ∫_R φ(x)δ(x) dx = φ(0)

Square-Integrable One-Dimensional Functions (L²(R)):

ˆ F[rect(αx)](ξ) = (1/|α|) sinc(ξ/α)

ˆ F[sinc(αx)](ξ) = (1/|α|) rect(ξ/α)

ˆ F[sinc²(αx)](ξ) = (1/|α|) tri(ξ/α)

ˆ F[tri(αx)](ξ) = (1/|α|) sinc²(ξ/α)

ˆ F[e^{−αx} u(x)](ξ) = 1/(α + 2πiξ)

ˆ F[e^{−αx²}](ξ) = √(π/α) e^{−π²ξ²/α}

ˆ F[e^{−iαx²}](ξ) = √(π/α) exp(i(π²ξ²/α − π/4))

ˆ F[e^{−α|x|}](ξ) = 2α/(α² + 4π²ξ²)

ˆ F[sech(αx)](ξ) = (π/α) sech(π²ξ/α)
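A couple of these entries can be spot-checked symbolically (a sketch; sympy’s fourier_transform happens
to use the same e^{−2πiξx} convention as here):

import sympy as sp

x, xi = sp.symbols('x xi', real=True)
a = sp.symbols('a', positive=True)

# F[e^{-a x^2}](xi) should be sqrt(pi/a) * e^{-pi^2 xi^2 / a}
print(sp.fourier_transform(sp.exp(-a * x**2), x, xi))

# F[e^{-a|x|}](xi) should be 2a / (a^2 + 4 pi^2 xi^2)
# (if this returns an unevaluated FourierTransform, the entry can be checked numerically instead)
print(sp.fourier_transform(sp.exp(-a * sp.Abs(x)), x, xi))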

Distributions in One Dimension (e.g. C_c^∞(R)):

Derivatives of δ (i.e. δ⁽ⁿ⁾) should be interpreted in the weak-derivative sense. To recall, the weak derivative
of f is a function Df such that

  ∫_R f φ′ = − ∫_R (Df )φ for every φ ∈ C_c^∞(R)
ˆ F[1](ξ) = δ(ξ)

ˆ F[δ(x)](ξ) = 1

ˆ F[e^{2πiαx}](ξ) = δ(ξ − α)

ˆ F[cos(2παx)](ξ) = (1/2)[δ(ξ − α) + δ(ξ + α)]

ˆ F[sin(2παx)](ξ) = (1/2i)[δ(ξ − α) − δ(ξ + α)]

ˆ F[cos(αx²)](ξ) = √(π/α) cos(π²ξ²/α − π/4)

ˆ F[sin(αx²)](ξ) = −√(π/α) sin(π²ξ²/α − π/4)

ˆ F[xⁿ](ξ) = (i/2π)ⁿ δ⁽ⁿ⁾(ξ)

ˆ F[δ⁽ⁿ⁾(x)](ξ) = (2πiξ)ⁿ

ˆ F[1/x](ξ) = −iπ · sign(ξ)

ˆ F[1/xⁿ](ξ) = F[((−1)ⁿ⁻¹/(n − 1)!) dⁿ/dxⁿ ln|x|](ξ) = −iπ · [(−2πiξ)ⁿ⁻¹/(n − 1)!] · sign(ξ)

ˆ F[|x|^α](ξ) = −[2 · Γ(α + 1)/|2πξ|^{α+1}] · sin(πα/2)

ˆ F[1/√|x|](ξ) = 1/√|ξ|

ˆ F[sign(x)](ξ) = 1/(iπξ)

ˆ F[u(x)](ξ) = (1/2)(1/(iπξ) + δ(ξ))

§3.9.4: Discrete Fourier Transform (DFT)

(Some discussion on Wikipedia. Related: the Fast Fourier Transform (FFT) (link).)

Given {x_k}_{k=0}^{N−1} ⊆ C, its discrete Fourier transform (DFT) is the sequence {X_k}_{k=0}^{N−1} ⊆ C defined by

  X_k := ∑_{n=0}^{N−1} xₙ exp(−(2πi/N) kn)

Given the latter sequence, the inverse is given by

  xₙ = (1/N) ∑_{k=0}^{N−1} X_k exp(+(2πi/N) kn)

Let X := DFT[{xₙ}_{n=0}^{N−1}] give the DFT of x := {x_k}_{k=0}^{N−1}, with similar conventions for y, Y in naming
convention. Then the following hold:
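As a sanity check, the definition above matches numpy’s FFT convention (a sketch; the sample vector is
an arbitrary choice):

import numpy as np

x = np.array([1.0, 2.0, 0.5, -1.0])
N = len(x)

# Naive DFT straight from the definition X_k = sum_n x_n e^{-2*pi*i*k*n/N}
X = np.array([sum(x[n] * np.exp(-2j * np.pi * k * n / N) for n in range(N))
              for k in range(N)])

assert np.allclose(X, np.fft.fft(x))   # same forward convention
assert np.allclose(x, np.fft.ifft(X))  # inverse carries the 1/N factor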

ˆ Linearity: Given α, β ∈ C, then

  DFT[{αxₙ + βyₙ}_{n=0}^{N−1}] = α · DFT[{xₙ}_{n=0}^{N−1}] + β · DFT[{yₙ}_{n=0}^{N−1}]

ˆ Time/Frequency Reversal: We see that

  (DFT[{xₙ}_{n=0}^{N−1}])_k = X_k =⇒ (DFT[{x_{N−n}}_{n=0}^{N−1}])_k = X_{N−k}

ˆ Time Conjugation: We see that

  (DFT[{xₙ}_{n=0}^{N−1}])_k = X_k =⇒ (DFT[{conj(xₙ)}_{n=0}^{N−1}])_k = conj(X_{N−k})

ˆ Real/Imaginary Part Properties: Let X_k = (DFT[{xₙ}_{n=0}^{N−1}])_k. Then

◦ (DFT[{Re(xₙ)}_{n=0}^{N−1}])_k = [X_k + conj(X_{N−k})]/2

◦ (DFT[{Im(xₙ)}_{n=0}^{N−1}])_k = [X_k − conj(X_{N−k})]/(2i)

◦ (DFT[{(xₙ + conj(x_{N−n}))/2}_{n=0}^{N−1}])_k = Re(X_k)

◦ (DFT[{(xₙ − conj(x_{N−n}))/(2i)}_{n=0}^{N−1}])_k = Im(X_k)

ˆ Parseval & Plancherel: Let X = DFT[{xₙ}_{n=0}^{N−1}], Y = DFT[{yₙ}_{n=0}^{N−1}]. Then Parseval gives

  ∑_{n=0}^{N−1} xₙ conj(yₙ) = (1/N) ∑_{k=0}^{N−1} X_k conj(Y_k)

Plancherel has x = y with

  ∑_{n=0}^{N−1} |xₙ|² = (1/N) ∑_{k=0}^{N−1} |X_k|²   (ℓ²(C) norm)

A few sequence/DFT pairs:

§3.10: Laplace Transforms

§3.10.1: Basic Definitions & Reference Tables

(Tables of Laplace transforms are here on Paul’s Online Math Notes or here on Wikipedia.)

The Laplace transform is an integral transform defined by

  L_f (s) := ∫_0^∞ f (t)e^{−st} dt

Note that this gives a function in variable s from a function of variable t. Other notations exist, commonly
L{f (t)}(s), but I will focus on Lf (s) for simplicity unless details of f must be accentuated. Sometimes we
let F = Lf and f = L−1 F as well.
Some utilize a two-sided transform, or other alternatives; Wikipedia has some discussion. Here, we
focus exclusively on the usual, one-sided transform if not stated otherwise.
§3.10.2: Laplace Transform Properties

ˆ Linearity: $\mathcal{L}_{\alpha f+\beta g}(s) = \alpha\,\mathcal{L}_f(s) + \beta\,\mathcal{L}_g(s)$

ˆ Time Scaling: $\mathcal{L}\{f(\alpha t)\}(s) = \frac{1}{|\alpha|}\,\mathcal{L}_f\!\left(\frac{s}{\alpha}\right)$
ˆ Shifting:
◦ Frequency: $\mathcal{L}\big\{e^{\alpha t}f(t)\big\}(s) = \mathcal{L}_f(s - \alpha)$
◦ Time: $\mathcal{L}\{f(t-\alpha)\}(s) = e^{-\alpha s}\,\mathcal{L}_f(s)$ (cf. the Heaviside rule below)

ˆ Acting on Derivatives & Integrals: Here, $f(0^+) := \lim_{x\to 0^+} f(x)$
◦ $\mathcal{L}_{f'}(s) = s\,\mathcal{L}_f(s) - f(0^+)$
◦ $\mathcal{L}_{f''}(s) = s^2\,\mathcal{L}_f(s) - s f(0^+) - f'(0^+)$
◦ $\mathcal{L}_{f^{(n)}}(s) = s^n\,\mathcal{L}_f(s) - \sum_{k=0}^{n-1} s^{n-k-1} f^{(k)}(0^+)$
◦ $\mathcal{L}_{u*f}(s) = \mathcal{L}\!\left\{\int_0^t f(u)\,du\right\}(s) = \frac{1}{s}\,\mathcal{L}_f(s)$
◦ $\mathcal{L}\!\left\{\frac{f(t)}{t}\right\}(s) = \int_s^\infty \mathcal{L}_f(u)\,du$
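A quick check of the first derivative rule (my example): take f(t) = sin(αt), so f(0⁺) = 0 and f′(t) = α cos(αt). The rule gives
$$\mathcal{L}_{f'}(s) = s\,\mathcal{L}_f(s) - f(0^+) = \frac{\alpha s}{s^2+\alpha^2}$$
matching $\mathcal{L}\{\alpha\cos(\alpha t)\}(s) = \alpha s/(s^2+\alpha^2)$ from the table in the next subsection.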

ˆ Multiplication & Convolution:
◦ $\mathcal{L}_{f*g}(s) = \mathcal{L}\!\left\{\int_0^t f(t-\xi)\,g(\xi)\,d\xi\right\}(s) = \mathcal{L}_f(s)\cdot\mathcal{L}_g(s)$
◦ $\mathcal{L}_{f\cdot g}(s) = \frac{1}{2\pi i}\big(\mathcal{L}_f * \mathcal{L}_g\big)(s)$
ˆ Multiplication by t:
◦ $\mathcal{L}\{t\,f(t)\}(s) = -\mathcal{L}_f'(s)$
◦ $\mathcal{L}\{t^n f(t)\}(s) = (-1)^n\,\dfrac{d^n \mathcal{L}_f(s)}{ds^n}$
ˆ Multiplication by Heaviside Function:
◦ $\mathcal{L}\{u_\alpha(t)\,f(t-\alpha)\}(s) = e^{-\alpha s}\,\mathcal{L}_f(s)$, where $u_\alpha(t) := \begin{cases} 0, & t < \alpha \\ 1, & t > \alpha \end{cases}$
◦ $\mathcal{L}\{u_\alpha(t)\,f(t)\}(s) = e^{-\alpha s}\,\mathcal{L}\{f(t+\alpha)\}(s)$

ˆ Conjugation: $\mathcal{L}_{\overline{f}}(s) = \overline{\mathcal{L}_f(\overline{s})}$

ˆ Final & Initial Value Theorems:
◦ $f(0^+) = \lim_{s\to\infty} s\,\mathcal{L}_f(s)$
◦ $\lim_{t\to\infty} f(t) =: f(\infty) = \lim_{s\to 0} s\,\mathcal{L}_f(s)$, valid if $s\,\mathcal{L}_f(s)$ has poles only for Re(s) < 0.

ˆ Fourier-Laplace Relation: Ff (ix) = Lf (−2πx)
§3.10.3: Common Laplace Transforms
ˆ Polynomials/Polynomial-Like:
◦ $\mathcal{L}\{1\}(s) = \frac{1}{s}$
◦ $\mathcal{L}\{t^\alpha\}(s) = \frac{\alpha!}{s^{\alpha+1}} = \frac{\Gamma(\alpha+1)}{s^{\alpha+1}}$ (for α > −1)
ˆ Exponentials:
◦ $\mathcal{L}\big\{e^{\alpha t}\big\}(s) = \frac{1}{s-\alpha}$ (facilitates a shift, of a sort)

ˆ Trigonometric & Hyperbolic Functions:
◦ $\mathcal{L}\{\sin(\alpha t)\}(s) = \frac{\alpha}{s^2+\alpha^2}$
◦ $\mathcal{L}\{\cos(\alpha t)\}(s) = \frac{s}{s^2+\alpha^2}$
◦ $\mathcal{L}\{\sinh(\alpha t)\}(s) = \frac{\alpha}{s^2-\alpha^2}$
◦ $\mathcal{L}\{\cosh(\alpha t)\}(s) = \frac{s}{s^2-\alpha^2}$
ˆ Special Functions:
◦ Dirac Delta: $\mathcal{L}\{\delta(t-\alpha)\}(s) = e^{-\alpha s}$
◦ Heaviside at α: $\mathcal{L}\{u_\alpha(t)\}(s) = \frac{e^{-\alpha s}}{s}$, where $u_\alpha(t) := \begin{cases} 0, & t < \alpha \\ 1, & t > \alpha \end{cases}$
§3.11: Cauchy Principal Value (PV/CPV)

(Some information on Wikipedia.)

(More to add later...)
§4: Items from Vector & Higher-Dimensional Calculus (Calculus II/III)

§4.1: Parametric/Polar Equations

ˆ Items on Parametric Equations:

◦ Derivatives: Let x, y be parameterized by t, i.e. we have x(t), y(t). Then, if dx/dt ≠ 0, and all derivatives involved exist,
$$\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{y'(t)}{x'(t)} \qquad\qquad \frac{d^2y}{dx^2} = \frac{\frac{d}{dt}\!\left(\frac{dy}{dx}\right)}{dx/dt}$$
 
◦ Arc Length: If x, y are parameterized in t, and $C = \{(x(t), y(t))\}_{t\in[a,b]}$, then
$$\text{the length of } C = \int_a^b \sqrt{[x'(t)]^2 + [y'(t)]^2}\,dt$$
Notably, it has the associated differential $ds = \sqrt{(dx)^2 + (dy)^2}$.
◦ Areas of Surfaces of Revolution: Take a curve C defined by x, y, all as above, revolved about the x-axis to form a surface S. Then
$$\text{the area of } S = \int_a^b 2\pi y\,\sqrt{[x'(t)]^2 + [y'(t)]^2}\,dt$$
If revolved about the y-axis instead, we use 2πx in place of 2πy.

ˆ Polar Coordinate Stuff:

◦ Conversions: Recall that, to convert Cartesian (x, y) to polar (r, θ), we have
$$r = \sqrt{x^2+y^2} \qquad \theta = \operatorname{atan2}(y, x) \quad \text{(usual polar angle; best to use a picture)}$$
To convert from polar to Cartesian, use
$$x = r\cos\theta \qquad y = r\sin\theta$$
Note that the Jacobian gives $dx\,dy = r\,dr\,d\theta$.
◦ Derivatives: Given an equation r = f(θ), we have (because of parameterizations)
$$\left.\frac{dy}{dx}\right|_{(r,\theta)} = \frac{f'(\theta)\sin\theta + f(\theta)\cos\theta}{f'(\theta)\cos\theta - f(\theta)\sin\theta}$$

◦ Area: The area between the origin and the curve r = f(θ), for θ ∈ [α, β], is given by
$$\text{area} = \frac{1}{2}\int_\alpha^\beta r^2\,d\theta = \frac{1}{2}\int_\alpha^\beta f(\theta)^2\,d\theta$$
If you want the area between r₁(θ) and r₂(θ), you can subtract as in the Cartesian case.
◦ Arc Length: For r = f(θ) a C¹ function and a curve $C := \{(f(\theta), \theta)\}_{\theta\in[\alpha,\beta]}$, we have
$$\text{arc length of } C = \int_\alpha^\beta \sqrt{r^2 + (r')^2}\,d\theta = \int_\alpha^\beta \sqrt{[f(\theta)]^2 + [f'(\theta)]^2}\,d\theta$$
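A worked check (my example): r = f(θ) = 2 cos θ on θ ∈ [−π/2, π/2] traces the circle (x − 1)² + y² = 1. The area formula gives $\frac{1}{2}\int_{-\pi/2}^{\pi/2} 4\cos^2\theta\,d\theta = \pi$, and the arc length formula gives $\int_{-\pi/2}^{\pi/2}\sqrt{4\cos^2\theta + 4\sin^2\theta}\,d\theta = 2\pi$ - the expected area and circumference of a radius-1 circle.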
§4.2: Basics on Vectors, Dot/Cross Products, & Lines/Planes

ˆ Some Vector Basics:


◦ Dot Product: Given $\vec{u} := (u_i)_{i=1}^n,\ \vec{v} := (v_i)_{i=1}^n \in \mathbb{R}^n$, we have
$$\vec{u}\cdot\vec{v} = \langle\vec{u},\vec{v}\rangle_{\mathbb{R}^n} = \sum_{i=1}^n u_i v_i$$
We note that $\vec{u},\vec{v}$ are perpendicular iff this is zero, i.e. $\vec{u}\perp\vec{v} \iff \langle\vec{u},\vec{v}\rangle = 0$.
◦ Angle Between Vectors: Given $\vec{u},\vec{v}$, the angle θ between them satisfies
$$\cos\theta = \frac{\langle\vec{u},\vec{v}\rangle}{\|\vec{u}\|\,\|\vec{v}\|}$$
◦ Vector Projection: The projection of $\vec{u}$ onto $\vec{v}$ (the shadow of $\vec{u}$ as cast from a light perpendicular to $\vec{v}$) is given by
$$\operatorname{proj}_{\vec{v}}(\vec{u}) = \frac{\langle\vec{u},\vec{v}\rangle}{\|\vec{v}\|^2}\,\vec{v}$$
(This is the projection of the argument onto the subscript.)


◦ Cross Product: $\vec{u}\times\vec{v}$ has magnitude given by the area of the parallelogram they span, and direction (i.e. that of n̂) from the right-hand rule. One may write
$$\vec{u}\times\vec{v} = \|\vec{u}\|\,\|\vec{v}\|\sin\theta\cdot\hat{n} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}$$
opting to use a pseudo-determinant to memorize it. We note that $\vec{u},\vec{v}$ are parallel iff $\vec{u}\times\vec{v} = \vec{0}$. Moreover, $\vec{u}\times\vec{v} \perp \vec{u},\vec{v}$.
Some cross product properties include
$$(\alpha\vec{u})\times(\beta\vec{v}) = \alpha\beta\,(\vec{u}\times\vec{v})$$
$$\vec{u}\times\vec{v} = -(\vec{v}\times\vec{u})$$
$$\vec{0}\times\vec{v} = \vec{0}$$
$$\vec{u}\times(\vec{v}+\vec{w}) = (\vec{u}\times\vec{v}) + (\vec{u}\times\vec{w})$$
$$(\vec{v}+\vec{w})\times\vec{u} = (\vec{v}\times\vec{u}) + (\vec{w}\times\vec{u})$$
$$\vec{u}\times(\vec{v}\times\vec{w}) = \langle\vec{u},\vec{w}\rangle\,\vec{v} - \langle\vec{u},\vec{v}\rangle\,\vec{w}$$
$$|\langle\vec{u}\times\vec{v},\vec{w}\rangle| = \|\vec{u}\times\vec{v}\|\,\|\vec{w}\|\,|\cos\theta| \quad \text{(box product: volume of spanned parallelepiped)}$$
$$\langle\vec{u}\times\vec{v},\vec{w}\rangle = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}$$

ˆ Lines & Planes In Space, With Vectors:
◦ Parameterized Line: Given $\vec{r}_0 := x_0\hat{\imath} + y_0\hat{\jmath} + z_0\hat{k} \in \mathbb{R}^3$ and $\vec{v} := v_x\hat{\imath} + v_y\hat{\jmath} + v_z\hat{k} \in \mathbb{R}^3$, the line through $\vec{r}_0$ (as a point) parallel to $\vec{v}$ can be given by the vector equation
$$\vec{r}(t) = \vec{r}_0 + t\vec{v} \qquad t\in\mathbb{R}$$
Hence the line may be parameterized by
$$x = x_0 + tv_x \qquad y = y_0 + tv_y \qquad z = z_0 + tv_z \qquad t\in\mathbb{R}$$
◦ Distance from Point to Line: Consider points P, S in R³. Suppose we want the distance from S to a line through P parallel to $\vec{v}$. Let $\vec{d}$ be the vector from P to S (so $\vec{d} = S - P$). Then:
$$\text{the desired distance} = \frac{\|\vec{d}\times\vec{v}\|}{\|\vec{v}\|}$$
◦ Distance from Point to Plane: Consider a point P on a plane that has normal vector $\vec{n}$, and a point S in space. (Hence let $\vec{v}$ be the vector from P to S, $\vec{v} = S - P$.) Then
$$\text{distance from } S \text{ to the plane} = \left|\left\langle\vec{v}, \frac{\vec{n}}{\|\vec{n}\|}\right\rangle\right|$$

◦ Planes in Space: The plane through a point $\vec{v}_0 := x_0\hat{\imath} + y_0\hat{\jmath} + z_0\hat{k} \in \mathbb{R}^3$ with normal vector $\vec{n} = [A, B, C]$ can be given as
Vectors: all points $\vec{v} := (x, y, z)$ satisfying $\langle\vec{n}, \vec{v} - \vec{v}_0\rangle = 0$
Components: all points (x, y, z) such that $A(x-x_0) + B(y-y_0) + C(z-z_0) = 0$
Components (Alt): all points (x, y, z) such that $Ax + By + Cz = Ax_0 + By_0 + Cz_0$
If you know points in the plane, you can use them to generate (normal) vectors: remember that $\vec{v}\times\vec{w} \perp \vec{v},\vec{w}$.
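A quick instance (my example): the plane through P = (1, 0, 0) with normal [1, 2, 3] is 1(x − 1) + 2y + 3z = 0, i.e. x + 2y + 3z = 1. The distance from S = (0, 0, 0) to it is $|\langle S - P, \vec{n}/\|\vec{n}\|\rangle| = |-1|/\sqrt{14} = 1/\sqrt{14}$.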
§4.3: Vector Calculus: Derivatives & Integrals of Vector-Valued Functions

ˆ Differentiation:

◦ Derivatives: Take $\vec{r}(t) := f(t)\hat{\imath} + g(t)\hat{\jmath} + h(t)\hat{k} \in \mathbb{R}^3$ a vector-valued function. Then derivatives are componentwise:
$$\vec{r}\,'(t) \equiv \frac{d\vec{r}}{dt} := \frac{df}{dt}\hat{\imath} + \frac{dg}{dt}\hat{\jmath} + \frac{dh}{dt}\hat{k}$$
◦ Tangent Line: The tangent line to the curve $\vec{r}(t)$ traces out, at a point $f(t_0)\hat{\imath} + g(t_0)\hat{\jmath} + h(t_0)\hat{k}$ on it, is the line through that point parallel to $\vec{r}\,'(t_0)$.
◦ Derivative Rules: Most standard rules hold. Some noteworthy ones:
Scalar-Vector Product Rule: Take f : R → R and $\vec{r}: \mathbb{R}\to\mathbb{R}^3$ (i.e. f(t) and $\vec{r}(t)$). Then
$$\frac{d}{dt}\big[f(t)\,\vec{r}(t)\big] = f'(t)\,\vec{r}(t) + f(t)\,\vec{r}\,'(t)$$
Dot Product Rule: For $\vec{u},\vec{v}: \mathbb{R}\to\mathbb{R}^3$ vector-valued functions of t,
$$\frac{d}{dt}\big\langle\vec{u}(t),\vec{v}(t)\big\rangle = \big\langle\vec{u}\,'(t),\vec{v}(t)\big\rangle + \big\langle\vec{u}(t),\vec{v}\,'(t)\big\rangle$$
Cross Product Rule: Similarly,
$$\frac{d}{dt}\big[\vec{u}(t)\times\vec{v}(t)\big] = \vec{u}\,'(t)\times\vec{v}(t) + \vec{u}(t)\times\vec{v}\,'(t)$$
◦ Constant Length: A differentiable $\vec{r}(t)$ has constant length iff $\langle\vec{r},\vec{r}\,'\rangle = 0$ for each t.
#» r , #»
r ′ ⟩ = 0 for each t.

ˆ Further Derivative Discussion & Applications:

◦ Unit Tangent Vector: The unit tangent vector is given by
$$\hat{T}(t) = \frac{\vec{r}\,'(t)}{\|\vec{r}\,'(t)\|}$$

◦ Curvature: Suppose $\hat{T}(t)$ is the unit tangent vector to a smooth curve $\vec{r}(t)$. Then it has curvature
$$\kappa := \left\|\frac{d\hat{T}}{ds}\right\| = \frac{\|\hat{T}'(t)\|}{\|\vec{r}\,'(t)\|}$$
Here, s is representative of arc length: curvature is considered the rate at which the tangent vector turns along the curve, per unit length.
We may also define a radius of curvature ρ = 1/κ: it defines a circle of curvature, tangent to the curve at that point.
◦ Principal Unit Normal Vector: Given κ ≠ 0, the principal unit normal vector $\hat{N}$ is orthogonal to the aforementioned $\hat{T}$:
$$\hat{N} = \frac{1}{\kappa}\frac{d\hat{T}}{ds} = \frac{\hat{T}'(t)}{\|\hat{T}'(t)\|}$$
◦ Acceleration: We may write the acceleration vector as
$$\vec{a} = \frac{d^2\vec{r}}{dt^2} = a_T\hat{T} + a_N\hat{N} \qquad a_T = \frac{d^2s}{dt^2} = \frac{d}{dt}\|\vec{r}\,'\| \qquad a_N = \kappa\left(\frac{ds}{dt}\right)^2 = \kappa\,\|\vec{r}\,'\|^2$$
where $a_T, a_N$ are the tangential & normal scalar components of acceleration, respectively. Note that acceleration always lies in the plane spanned by these two. One may also note
$$a_N = \sqrt{\|\vec{r}\,''\|^2 - a_T^2}$$
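A standard worked example (mine): the helix $\vec{r}(t) = \cos t\,\hat{\imath} + \sin t\,\hat{\jmath} + t\,\hat{k}$ has $\|\vec{r}\,'(t)\| = \sqrt{2}$, so $a_T = \frac{d}{dt}\sqrt{2} = 0$; its curvature is κ = 1/2, giving $a_N = \kappa\|\vec{r}\,'\|^2 = 1$, consistent with $\vec{r}\,''(t) = -\cos t\,\hat{\imath} - \sin t\,\hat{\jmath}$ and $a_N = \sqrt{\|\vec{r}\,''\|^2 - a_T^2} = 1$.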

ˆ Integration: We let $\vec{r}(t) := f(t)\hat{\imath} + g(t)\hat{\jmath} + h(t)\hat{k}$ unless specified otherwise.
◦ Integrals: The integral of $\vec{r}$ over t ∈ [a, b] may be found componentwise:
$$\int_a^b \vec{r}(t)\,dt = \left(\int_a^b f(t)\,dt\right)\hat{\imath} + \left(\int_a^b g(t)\,dt\right)\hat{\jmath} + \left(\int_a^b h(t)\,dt\right)\hat{k}$$
The fundamental theorem of calculus also applies:
$$\int_a^b \vec{r}(t)\,dt = \vec{R}(t)\Big|_a^b \quad \text{whenever } \vec{R}' = \vec{r}$$

◦ Arc Length: The arc length of $\vec{r}(t)$'s curve on t ∈ [a, b] is given in the obvious way:
$$\text{arc length} = \int_a^b \sqrt{[x'(t)]^2 + [y'(t)]^2 + [z'(t)]^2}\,dt = \int_a^b \|\vec{r}\,'(t)\|\,dt$$
provided the path traced out does not cross itself.
§4.4: Partial Derivatives

Formally, the partial derivative of f(x, y) at (x₀, y₀) is given by
$$\left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)} := \lim_{h\to 0}\frac{f(x_0+h, y_0) - f(x_0, y_0)}{h} \equiv \left.\frac{d}{dx}f(x, y_0)\right|_{x=x_0}$$
if the limit exists. (We have analogous expressions for other directions - x, y, z, etc. - throughout $\mathbb{R}^n$.) Note how this holds all but x fixed in f.
We say that f is differentiable (at a point) at (x₀, y₀) if $f_x, f_y$ exist there and
$$\Delta f = f_x(x_0,y_0)\Delta x + f_y(x_0,y_0)\Delta y + \varepsilon_1\Delta x + \varepsilon_2\Delta y$$
is satisfied with $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta x, \Delta y \to 0$. One may show that if $f_x, f_y$ are defined in an open region containing (x₀, y₀) and are continuous at that point, then
$$\Delta f = f(x_0+\Delta x, y_0+\Delta y) - f(x_0, y_0)$$
(moving from $(x_0, y_0)$ to $(x_0+\Delta x, y_0+\Delta y)$) satisfies such an equation.


Some results:

ˆ Multivariable Chain Rule - One Independent Variable: For simplicity, consider a function f : R² → R in the variables x, y, which are parameterized in t. That is, in full, we may write $f(x(t), y(t))$. Then
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
(Analogous results hold for functions with $\operatorname{dom}(f) = \mathbb{R}^n$.)
ˆ Multivariable Chain Rule - Two Independent Variables: Consider a function f : R³ → R in the variables x, y, z, each parameterized in s, t. Hence, in full, we have $f(x(s,t), y(s,t), z(s,t))$. Then we may find the partials of f w.r.t. s, t as so:
$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial t}$$
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial s}$$
The obvious generalizations exist.
ˆ Implicit Differentiation: Suppose F(x, y) is differentiable, and F(x, y) = 0 defines y as a differentiable function of x (e.g. sin x + 2y = 0 for F(x, y) = sin x + 2y). Then, provided the denominator is nonzero,
$$\frac{dy}{dx} = -\frac{\partial_x F}{\partial_y F}$$
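For instance (my example): F(x, y) = x² + y² − 1 gives dy/dx = −(2x)/(2y) = −x/y on the unit circle, matching direct implicit differentiation of x² + y² = 1.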

ˆ Implicit Function Theorem: Suppose F(x, y, z) = 0 defines z as a function of x, y, i.e. z = f(x, y) and F(x, y, f(x, y)) = 0. Then
$$\frac{\partial z}{\partial x} = -\frac{\partial_x F}{\partial_z F} \qquad\qquad \frac{\partial z}{\partial y} = -\frac{\partial_y F}{\partial_z F}$$
The implicit function theorem says this holds if
The implicit function theorem says this holds if
◦ F ’s partial derivatives are continuous in a region containing the evaluation point (x0 , y0 , z0 )
◦ F (x0 , y0 , z0 ) = c for some c ∈ R
◦ ∂z F (x0 , y0 , z0 ) ̸= 0
◦ F (x, y, z) = c defines z as a differentiable function of x, y near that point

ˆ Clairaut's Theorem / Mixed Derivative Theorem: Suppose $f, f_x, f_y, f_{xy}, f_{yx}$ are defined and continuous at (a, b) and in an open region about it. Then
$$f_{xy}(a, b) = f_{yx}(a, b)$$
allowing interchange of the partial derivatives. Bear in mind that
$$f_{xy} = \frac{\partial}{\partial y}\frac{\partial f}{\partial x} = (f_x)_y = \partial_y\partial_x f \qquad\qquad f_{yx} = \frac{\partial}{\partial x}\frac{\partial f}{\partial y} = (f_y)_x = \partial_x\partial_y f$$

ˆ Continuous Partials =⇒ Differentiability (=⇒ Continuity): If f has continuous partial derivatives in all coordinates, then f is differentiable (and hence continuous).
§4.5: Directional Derivatives & Gradients

Given f : R² → R, with $\vec{v} := v_1\hat{\imath} + v_2\hat{\jmath} \in \mathbb{R}^2$ a unit vector, and a point P = (x₀, y₀) in R², we say the derivative of f at P in the direction of $\vec{v}$ is defined by
$$\left.\frac{df}{ds}\right|_{\vec{v},P} \equiv \left.D_{\vec{v}}f\right|_P := \lim_{s\to 0}\frac{f(x_0+sv_1,\,y_0+sv_2) - f(x_0,y_0)}{s}$$
if the limit exists. Some comments:

ˆ Think of s as arc length.


ˆ The partial derivatives $\partial_x f$ and $\partial_y f$ are just the directional derivatives for $\vec{v} \equiv \hat{\imath}, \hat{\jmath}$ respectively.
ˆ Of course, higher-dimensional analogues exist.

We may define the gradient of f by
$$\operatorname{grad} f \equiv \nabla f := \frac{\partial f}{\partial x}\hat{\imath} + \frac{\partial f}{\partial y}\hat{\jmath}$$
evaluating at the desired points and extrapolating to higher dimensions as desired. Some identities of this are discussed here.
One may write, then,
$$D_{\vec{v}} f = \langle\nabla f, \vec{v}\rangle = \|\nabla f\|\cos\theta$$
for θ the angle between $\vec{v}$ and ∇f (recalling $\vec{v}$ is a unit vector).
This soon shows us all of the following (focusing on 2D with obvious extrapolations), and yields further discussion:

ˆ Directions of Fastest Increase/Decrease:
◦ f increases most rapidly when cos θ = 1, i.e. when θ = 0 and $\vec{v}$ is parallel to ∇f. That is, f increases most rapidly at a point P when its input runs in the direction of ∇f at P.
◦ Decreasing similarly happens fastest in the direction of −∇f.
◦ If $\vec{v}$ is orthogonal to $\nabla f \neq \vec{0}$, then no change occurs in f in that direction (the cosine becomes zero).
◦ Consider a level curve f(x, y) ≡ c. At any point (x, y) on the curve, ∇f is normal to the curve. (Then −∇f points in the direction of decreasing values, i.e. it is like pointing downriver on a topographic map.)

ˆ Tangent Line to Level Curve: The tangent line to a level curve containing (x₀, y₀) is hence given by
$$\left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)}(x - x_0) + \left.\frac{\partial f}{\partial y}\right|_{(x_0,y_0)}(y - y_0) = 0$$

ˆ Tangent Planes & Gradients: Given a point P = (x₀, y₀, z₀) which is on the level surface f(x, y, z) = c, the tangent plane at P is the plane through P normal to $\nabla f|_P$. The normal line of the surface at P is the line through P and parallel to $\nabla f|_P$.
Hence the tangent plane has equation
$$\left.\frac{\partial f}{\partial x}\right|_P(x-x_0) + \left.\frac{\partial f}{\partial y}\right|_P(y-y_0) + \left.\frac{\partial f}{\partial z}\right|_P(z-z_0) = 0$$
and the normal line may be expressed by
$$x(t) = x_0 + t\left.\frac{\partial f}{\partial x}\right|_P \qquad y(t) = y_0 + t\left.\frac{\partial f}{\partial y}\right|_P \qquad z(t) = z_0 + t\left.\frac{\partial f}{\partial z}\right|_P \qquad t\in\mathbb{R}$$
If we let z = f(x, y) define a surface in space, then at the point (x₀, y₀, f(x₀, y₀)) ∈ R³, we have the tangent plane as
$$\left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)}(x-x_0) + \left.\frac{\partial f}{\partial y}\right|_{(x_0,y_0)}(y-y_0) = z - \underbrace{f(x_0,y_0)}_{\text{like } z_0}$$

ˆ Chain Rule for Paths: Suppose we have $\vec{r}(t) := x(t)\hat{\imath} + y(t)\hat{\jmath} + z(t)\hat{k}$ making a smooth path C, and $(f\circ\vec{r})(t) = f(\vec{r}(t))$ a scalar function evaluating along that path. Then we may write
$$\frac{d}{dt}f(\vec{r}(t)) = \big\langle\nabla f(\vec{r}(t)),\,\vec{r}\,'(t)\big\rangle$$
§4.6: Differentials & Linearization

ˆ Change in a Direction: The change of f after moving a small distance ds from a point P in a direction parallel to a (unit) vector $\vec{v}$ is given by
$$df \approx \big\langle\nabla f|_P,\,\vec{v}\big\rangle\,ds$$

ˆ Linearization: The linearization of f (as a function of x, y) at the point P = (x₀, y₀) is the function
$$L_P(x, y) := f(P) + \left.\frac{\partial f}{\partial x}\right|_P(x - x_0) + \left.\frac{\partial f}{\partial y}\right|_P(y - y_0)$$
Hence, for (x, y) near P, we have f(x, y) ≈ L_P(x, y). Analogous expressions exist in higher dimensions.
The error E in the approximation is given by
$$|E(x, y)| \leq \frac{M}{2}\big(|x - x_0| + |y - y_0|\big)^2 \quad \text{where } |f_{xx}|, |f_{xy}|, |f_{yy}| \leq M$$
The generalization to $\mathbb{R}^n$ is as expected.

ˆ Total Differentials: We may define the total differential of f (as (x₀, y₀) moves to (x₀+dx, y₀+dy)) by
$$df = \left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)}dx + \left.\frac{\partial f}{\partial y}\right|_{(x_0,y_0)}dy$$
The generalization to $\mathbb{R}^n$ is as expected.
§4.7: Optimization & Lagrange Multipliers

Unconstrained Optimization:

ˆ First Derivative Test: If (a, b) ∈ int(dom f ) is a local maximum or local minimum of f at which
fx , fy exist, then fx (a, b) = fy (a, b) = 0.
ˆ Critical/Saddle Points: We say (a, b) is a critical point if $f_x(a,b) = f_y(a,b) = 0$, or at least one does not exist.
We say (a, b) is a saddle point if it is a critical point such that every open ball centered at (a, b) contains points $(x_p, y_p)$ and $(x_n, y_n)$ with $f(x_n, y_n) < f(a, b) < f(x_p, y_p)$.
ˆ Second Derivative Test: Take f ∈ C² (all first/second partial derivatives are continuous) in a ball about (a, b), with $f_x(a,b) = f_y(a,b) = 0$. Then
◦ (a, b) is a local maximum if $f_{xx} < 0$ and $f_{xx}f_{yy} - f_{xy}^2 > 0$ there
◦ (a, b) is a local minimum if $f_{xx} > 0$ and $f_{xx}f_{yy} - f_{xy}^2 > 0$ there
◦ (a, b) is a saddle point if $f_{xx}f_{yy} - f_{xy}^2 < 0$ there
◦ The test is inconclusive if $f_{xx}f_{yy} - f_{xy}^2 = 0$ at the point in question
The first criterion may be tested with $f_{yy}$ instead.
The quantity $f_{xx}f_{yy} - f_{xy}^2$ is known as the Hessian discriminant, sometimes written as a determinant.
ˆ Note on Extrema Testing: Aside from critical points, be sure to test the boundary of dom(f ) too,
if finding absolute extrema.

Constrained Optimization & Lagrange Multipliers:

Take f, g differentiable with variables x, y, z, and such that $g(x,y,z) = 0 \implies \nabla g \neq \vec{0}$.
Our goal: find the extrema of f subject to the constraint g(x, y, z) = 0. To do this, we introduce a new variable, λ, and find the x, y, z, λ such that
$$\nabla f = \lambda\nabla g \qquad\text{and}\qquad g(x, y, z) = 0$$
Generalizations to $\mathbb{R}^n$ are obvious. If we implement arbitrarily-many constraints, say $g_1, \cdots, g_m$ all being zero, then we introduce $\lambda_1, \cdots, \lambda_m$ and try to satisfy
$$\nabla f = \sum_i \lambda_i\nabla g_i \qquad\text{and}\qquad g_i(x, y, z) = 0 \quad \text{for } i \in \{1, 2, \cdots, m\}$$
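A tiny worked instance (mine): extremize f(x, y) = xy subject to g(x, y) = x + y − 1 = 0. Then ∇f = (y, x) = λ(1, 1) = λ∇g forces x = y = λ, and the constraint gives x = y = 1/2 - the maximizer, with f = 1/4.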
§4.8: Multiple Integrals, Differentials, Jacobians, Applications

Multiple integrals may be defined in the obvious way; more thorough, general definitions are discussed in the sections on Riemann integration & Lebesgue integration. We focus on relevant ideas and results here.

ˆ Fubini's Theorem: For R := [a, b] × [c, d] and f ∈ C(R), we may write
$$\iint_R f(x,y)\,dA = \int_{y=c}^{y=d}\int_{x=a}^{x=b} f(x,y)\,dx\,dy = \int_{x=a}^{x=b}\int_{y=c}^{y=d} f(x,y)\,dy\,dx$$
Analogous results exist for boxes in $\mathbb{R}^n$: integral order may be interchanged. We may also do this over a (non-rectangular) region D, with inner integral bounds defined as functions of the outer variable:
$$\iint_D f(x,y)\,dA = \int_{y=c}^{y=d}\int_{x=h_1(y)}^{x=h_2(y)} f(x,y)\,dx\,dy = \int_{x=a}^{x=b}\int_{y=g_1(x)}^{y=g_2(x)} f(x,y)\,dy\,dx$$

ˆ Finding Area, Volume, etc. & Differentials:
◦ The area of R is given by $\iint_R dA$
◦ The volume of E is given by $\iiint_E dV$
◦ The surface area of E's boundary, called ∂E, may be given by $\iint_{\partial E} dS$
◦ Here, dA is the differential area element, dV that for volume, and dS that for surface area. These may vary upon your coordinate system. Some formulas are here.

ˆ Jacobians & Substitutions:
◦ The aforementioned conversions may be justified with the Jacobian or Jacobian determinant. In two dimensions, suppose x = g(u, v) and y = h(u, v) (e.g. x = r cos θ, y = r sin θ). Then
$$J(u, v) \equiv \frac{\partial(x, y)}{\partial(u, v)} := \begin{vmatrix} \partial_u x & \partial_v x \\ \partial_u y & \partial_v y \end{vmatrix}$$
Notice that the variables differentiated w.r.t. "enumerate" the columns, and the functions differentiated "enumerate" the rows. In three dimensions, likewise,
$$J(u, v, w) := \begin{vmatrix} \partial_u x & \partial_v x & \partial_w x \\ \partial_u y & \partial_v y & \partial_w y \\ \partial_u z & \partial_v z & \partial_w z \end{vmatrix}$$
◦ If you make the substitution x = g(u, v), y = h(u, v) in an integral (of f over R), then if R becomes G under the transform (University Calculus specifies preimage?), then
$$\iint_R f(x, y)\,dx\,dy = \iint_G f\big(g(u,v), h(u,v)\big)\cdot\left|\frac{\partial(x, y)}{\partial(u, v)}\right|\,du\,dv$$
That is,
$$dx\,dy = \left|\frac{\partial(x, y)}{\partial(u, v)}\right|\,du\,dv$$
The obvious extension to $\mathbb{R}^n$ holds.
◦ Hence, if you start with a Cartesian integral, and convert elsewhere, you pick up an extra factor determined by the Jacobian. (A quick check appears below.)
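A quick check on the polar substitution (my example): with x = r cos θ, y = r sin θ,
$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r\cos^2\theta + r\sin^2\theta = r$$
recovering dx dy = r dr dθ as promised in §4.1.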
ˆ Masses, Moments, Centers, Etc.:
◦ Mass (Zeroth Moment): Given a density δ := δ(x, y, z), the mass of the object occupying a region D in space is given by
$$M = \iiint_D \delta(x, y, z)\,dV$$
◦ First Moments in 3D: Moments describe the shape of a graph: mass is a zeroth moment, center of mass is a first moment (when normalized), and moment of inertia is the second moment. (Probability distributions have first moments as averages, and second moments as variances.)
The first moments about the coordinate planes are denoted by $M_{yz}, M_{xz}, M_{xy}$, the subscripts naming the plane the moment is taken about: $M_{yz}$ is the moment "about x", for instance. We'll have
$$\underbrace{M_{yz} = \iiint_D x\,\delta\,dV}_{\text{moment about } x} \qquad \underbrace{M_{xz} = \iiint_D y\,\delta\,dV}_{\text{moment about } y} \qquad \underbrace{M_{xy} = \iiint_D z\,\delta\,dV}_{\text{moment about } z}$$

Two-dimensional analogues follow in the obvious manner.


◦ Centers of Mass/Centroid: We let $\bar{x}, \bar{y}, \bar{z}$ denote the center of mass, or centroid, in the x, y, z coordinates respectively. These are just normalized first moments. Hence:
$$\text{centroid} = (\bar{x}, \bar{y}, \bar{z}) = \left(\frac{M_{yz}}{M}, \frac{M_{xz}}{M}, \frac{M_{xy}}{M}\right) = \left(\frac{\iiint_D x\,\delta\,dV}{\iiint_D \delta\,dV},\ \frac{\iiint_D y\,\delta\,dV}{\iiint_D \delta\,dV},\ \frac{\iiint_D z\,\delta\,dV}{\iiint_D \delta\,dV}\right)$$

◦ Second Moments: The second moments $I_x, I_y, I_z$ about the x, y, z axes (resp.) are given by
$$I_x = \iiint_D (y^2+z^2)\,\delta\,dV \qquad I_y = \iiint_D (x^2+z^2)\,\delta\,dV \qquad I_z = \iiint_D (x^2+y^2)\,\delta\,dV$$
In general, for a line L, its second moment is $I_L$. If r(x, y, z) is the distance from (x, y, z) to L, then
$$I_L = \iiint_D r^2(x, y, z)\,\delta(x, y, z)\,dV$$
§4.9: Line Integrals

The Case of Scalar Functions:

Let f be continuous over a curve C which is parameterized by $\vec{r}(t) := g(t)\hat{\imath} + h(t)\hat{\jmath} + k(t)\hat{k}$ for t ∈ [a, b]. Hence, $C = \{\vec{r}(t)\}_{t\in[a,b]}$. We may evaluate the line integral of f over C by
$$\int_C f(x, y, z)\,ds = \int_a^b f\big(g(t), h(t), k(t)\big)\cdot\|\vec{r}\,'(t)\|\,dt$$
Here, ds is a differential length element, satisfying the usual
$$ds = \sqrt{(dx)^2 + (dy)^2 + (dz)^2}$$
Note that, even if two paths C, C′ start and end at the same points, we may have $\int_C f \neq \int_{C'} f$.

Mass & Moments:


Suppose we have C as a smooth curve representative of a wire, coil, etc., and δ a mass-density per unit
length along it. Then the usual ideas apply, summed up below:
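In brief, the standard formulas mirror those of §4.8, with $\int_C \cdot\,ds$ in place of the triple integrals (a sketch of mine, as the source's summary table is a picture):
$$M = \int_C \delta\,ds \qquad M_{yz} = \int_C x\,\delta\,ds \ (\text{etc.}) \qquad \bar{x} = \frac{M_{yz}}{M} \ (\text{etc.}) \qquad I_L = \int_C r^2(x,y,z)\,\delta\,ds$$
with r the distance to the line L.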
The Case of Vector-Valued Functions:

Now suppose we have a vector-valued function $\vec{F}$ and a curve C parameterized by $\vec{r}(t)$ for t ∈ [a, b]. Then the line integral of $\vec{F}$ over C is
$$\int_C \langle\vec{F}, \hat{T}\rangle\,ds = \int_C \left\langle\vec{F}, \frac{d\vec{r}}{ds}\right\rangle ds = \int_C \langle\vec{F}, d\vec{r}\rangle = \int_a^b \left\langle\vec{F}(\vec{r}(t)), \frac{d\vec{r}}{dt}\right\rangle dt$$
(albeit usually written in the dot product notation). Observe that, for instance,
$$\vec{F} := M(x, y, z)\,\hat{\imath} \implies \int_C \vec{F}\cdot d\vec{r} = \int_C M(x, y, z)\,dx$$

Hence if $\vec{r}(t) := g(t)\hat{\imath} + h(t)\hat{\jmath} + k(t)\hat{k}$ parameterizes C, then
$$\int_C M(x,y,z)\,dx = \int_a^b M\big(g(t), h(t), k(t)\big)\,g'(t)\,dt$$
$$\int_C M(x,y,z)\,dy = \int_a^b M\big(g(t), h(t), k(t)\big)\,h'(t)\,dt$$
$$\int_C M(x,y,z)\,dz = \int_a^b M\big(g(t), h(t), k(t)\big)\,k'(t)\,dt$$

Of note, this integral can give the work W done by the vector field over the curve, or the flow of the vector field along it. If C is a closed curve, then the flow is called the circulation around C (with positive orientation =⇒ counterclockwise motion).
Flux is analogously given w.r.t. a normal vector n̂ to C, as below. We assume here the curve is in the x, y plane, parameterized by $\vec{r}(t) = g(t)\hat{\imath} + h(t)\hat{\jmath}$, and that $\vec{F} = M\hat{\imath} + N\hat{\jmath}$:
$$\text{flux of } \vec{F} \text{ over } C = \int_C \langle\vec{F}, \hat{n}\rangle\,ds = \oint_C M\,dy - N\,dx$$
Note further that C must be traced exactly once for this to apply.
It is common to have integrals of the type
$$\int_C -y\,dx + z\,dy + 2x\,dz \qquad C = \big\{\cos(t)\hat{\imath} + \sin(t)\hat{\jmath} + t\hat{k}\big\}_{t\in[0,2\pi]}$$
which mix up x, y, z with the other differentials. For this, apply your parameterization first and get all in terms of the parameter t. For instance, the parameterization
$$x = x(t) = \cos t \qquad y = y(t) = \sin t \qquad z = z(t) = t$$
could be applied to this integral. Replace x, y, z accordingly. Find the differentials next, per the formula df = f′(t) dt; so, for instance,
$$dx = -\sin(t)\,dt \qquad dy = \cos(t)\,dt \qquad dz = dt$$
and replace the differentials accordingly. Then simplify, and integrate over t ∈ [0, 2π]. (This example is carried out below.)
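Carrying the example through (my computation): the integrand becomes
$$(-\sin t)(-\sin t\,dt) + t\cos t\,dt + 2\cos t\,dt = \big(\sin^2 t + t\cos t + 2\cos t\big)\,dt$$
and $\int_0^{2\pi}\sin^2 t\,dt = \pi$, $\int_0^{2\pi} t\cos t\,dt = 0$, $\int_0^{2\pi} 2\cos t\,dt = 0$, so the integral equals π.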
Path Independence / Conservative Fields:
Recall that the line integral across two different curves starting and ending at the same points may still differ.
If $\vec{F}$ is a field for which $\int_C \vec{F}\cdot d\vec{r}$ is the same regardless of which path C is used (provided all such C start and end at the same points), we say this integral is path independent and the field $\vec{F}$ is conservative.

ˆ Conservative Fields Are Gradient Fields: Let $\vec{F} = M\hat{\imath} + N\hat{\jmath} + P\hat{k}$ for M, N, P continuous on a connected domain. Then, if $\vec{F} = \nabla f$ for a scalar function f (and some other conditions), $\vec{F}$ is also conservative. The converse is true. We say f is a potential function of $\vec{F}$.

ˆ Fundamental Theorem of Line Integrals: Suppose $\vec{F}$ is conservative with potential f, and the path C starts at P and runs to Q, and is smooth, with parameterization given by $\vec{r}(t)$. Then
$$\int_C \vec{F}\cdot d\vec{r} = f(Q) - f(P)$$
Notice that if C is a closed loop, then the integral is zero.

ˆ Loop Property: $\vec{F}$ is conservative iff the integral is zero on each closed loop.

ˆ Component Test: Take $\vec{F} = M(x,y,z)\hat{\imath} + N(x,y,z)\hat{\jmath} + P(x,y,z)\hat{k}$, each component having continuous first partial derivatives. Then $\vec{F}$ is conservative (on an open simply connected domain) iff
$$\frac{\partial P}{\partial y} = \frac{\partial N}{\partial z} \qquad \frac{\partial M}{\partial z} = \frac{\partial P}{\partial x} \qquad \frac{\partial N}{\partial x} = \frac{\partial M}{\partial y}$$
You take each pair of components, say the α and β components $C_\alpha, C_\beta$, and then differentiate each w.r.t. the other component's variable and see if they equate, e.g. $\partial C_\alpha/\partial\beta \stackrel{?}{=} \partial C_\beta/\partial\alpha$. (A worked check appears after this list.)

ˆ Exactness: We may say, equivalently, that $M\,dx + N\,dy + P\,dz$ is exact if these equations are satisfied. The definition of this expression being exact is
$$\exists f \text{ such that } M\,dx + N\,dy + P\,dz = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz =: df$$

ˆ Green’s Theorems: Discussed here.
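A worked check of the component test (my example): $\vec{F} = y\hat{\imath} + x\hat{\jmath} + 0\hat{k}$ has $\partial N/\partial x = 1 = \partial M/\partial y$ (the z-equations being trivially 0 = 0), so $\vec{F}$ is conservative with potential f = xy. By the fundamental theorem, $\int_C \vec{F}\cdot d\vec{r} = f(Q) - f(P)$ for any path, e.g. 2 − 0 = 2 from (0, 0, 0) to (1, 2, 0).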
§4.10: Parameterized Surfaces: Areas & Surface Integrals

Areas & Parameterized Surfaces:

Suppose a surface R is parameterized by
$$\vec{r}(u, v) = f(u,v)\hat{\imath} + g(u,v)\hat{\jmath} + h(u,v)\hat{k} \qquad u\in[a,b] \quad v\in[c,d]$$
Then the area of the surface is given by
$$\text{area} = \iint_R \left\|\frac{\partial\vec{r}}{\partial u}\times\frac{\partial\vec{r}}{\partial v}\right\| dA = \int_c^d\int_a^b \left\|\frac{\partial\vec{r}}{\partial u}\times\frac{\partial\vec{r}}{\partial v}\right\| du\,dv$$
This gives rise to the surface area differential,
$$d\sigma = \left\|\frac{\partial\vec{r}}{\partial u}\times\frac{\partial\vec{r}}{\partial v}\right\| du\,dv$$

If a surface R is implicitly defined via F(x, y, z) = c and has normal vector $\vec{p} \in \{\hat{\imath}, \hat{\jmath}, \hat{k}\}$, with $\langle\nabla F, \vec{p}\rangle \neq 0$, then
$$\text{surface area of } R = \iint_R \frac{\|\nabla F\|}{|\langle\nabla F, \vec{p}\rangle|}\,dA$$
Likewise, if z = f(x, y) defines a surface over R², then its surface area is
$$\text{surface area} = \iint_R \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 + 1}\;dx\,dy$$
Surface Integrals:
Given a smooth surface S with parameterization $S = \{\vec{r}(u,v)\}_{(u,v)\in R}$, where $\vec{r}(u,v) = f(u,v)\hat{\imath} + g(u,v)\hat{\jmath} + h(u,v)\hat{k}$, and G ∈ C(S), the surface integral of G on S is
$$\iint_S G(x, y, z)\,d\sigma = \iint_R G\big(f(u,v), g(u,v), h(u,v)\big)\,\underbrace{\left\|\frac{\partial\vec{r}}{\partial u}\times\frac{\partial\vec{r}}{\partial v}\right\| du\,dv}_{=\,d\sigma}$$
If the surface is implicitly defined, under the same circumstances as on the previous page,
$$\iint_S G(x,y,z)\,d\sigma = \iint_R G(x, y, z)\,\frac{\|\nabla F\|}{|\langle\nabla F, \vec{p}\rangle|}\,dA$$
If S = graph(f) for z = f(x, y), then
$$\iint_S G(x,y,z)\,d\sigma = \iint_R G\big(x, y, f(x,y)\big)\,\sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 + 1}\;dx\,dy$$

We have the usual discussion on moments and the like:
Surface Integral of a Vector Field:

If S is oriented by n̂, we have the surface integral of $\vec{F}$ as given by
$$\iint_S \langle\vec{F}, \hat{n}\rangle\,d\sigma = \iint_S \left\langle\vec{F}, \frac{\partial\vec{r}}{\partial u}\times\frac{\partial\vec{r}}{\partial v}\right\rangle du\,dv$$
by choosing n̂ via the cross product, with $\vec{r}$ parameterizing S. This integral encodes flux through S.
Of course, as usual, if S is defined by a level surface g(x, y, z) = c, we may take $\hat{n} = \pm\nabla g/\|\nabla g\|$ for whichever direction is nicer.

Stokes' Theorem:
Stokes' Theorem is a noteworthy result as well. Given S a piecewise smooth surface (with unit normal n̂), with $\vec{F}$ having continuous first partial derivatives in each component, then
$$\text{circulation of } \vec{F} \text{ around } \partial S \text{ (CCW w.r.t. } \hat{n}\text{)} = \oint_{\partial S} \langle\vec{F}, d\vec{r}\rangle = \iint_S \langle\nabla\times\vec{F}, \hat{n}\rangle\,d\sigma$$
If ∂S lies in the x, y plane and is oriented counterclockwise, then n̂ = k̂ and
$$\oint_{\partial S} \langle\vec{F}, d\vec{r}\rangle = \iint_S \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dx\,dy$$
the circulation-curl form of Green's Theorem.
Thanks to the property
$$\operatorname{curl}\operatorname{grad} f = \vec{0}\,; \text{ that is, } \nabla\times\nabla f = \vec{0}$$
if $\nabla\times\vec{F} = \vec{0}$ everywhere, then the integral in question is 0.

Divergence Theorem:

This claims, for $\vec{F}$ as in Stokes' Theorem, the flux through the surface S = ∂D (with S having unit normal n̂) is given by
$$\iint_{\partial D} \langle\vec{F}, \hat{n}\rangle\,d\sigma = \iiint_D \nabla\cdot\vec{F}\,dV$$
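A standard check (my example): for $\vec{F} = x\hat{\imath} + y\hat{\jmath} + z\hat{k}$ and D the unit ball, $\nabla\cdot\vec{F} = 3$, so the flux through the unit sphere is $3\cdot\frac{4}{3}\pi = 4\pi$ - with no surface parameterization needed.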
§5: Vector Calculus Identities (Calculus III)

§5.1: Fundamental Definitions


Throughout, we assume $\mathbf{V}: \mathbb{R}^n\to\mathbb{R}^n$ is a vector field (vector-valued function) and $\varphi: \mathbb{R}^n\to\mathbb{R}$ is a scalar field (scalar-valued function).
It is conventional to treat ∇ as the pseudo-vector $(\partial_i)_{i=1}^n$ where $\partial_i := \partial/\partial x_i$ (derivative in the ith coordinate).
We assume Cartesian Rn coordinates here, for simplicity.

ˆ Formal Definitions:
◦ Gradient: The gradient of φ is the vector field whose dot product with any vector $\vec{v}$ (evaluated at $\vec{x}_0$) is the directional derivative of φ along $\vec{v}$:
$$\big\langle\nabla\varphi(\vec{x}_0), \vec{v}\big\rangle = D_{\vec{v}}\,\varphi(\vec{x}_0)$$
◦ Divergence: Taking a volume τ with outward unit normal n̂ and surface σ ≡ ∂τ, the divergence at $x_0$ is
$$\nabla\cdot\mathbf{V}\Big|_{x_0} := \lim_{\mu(\tau)\to 0}\frac{1}{\mu(\tau)}\oint_\sigma \mathbf{V}\cdot\hat{n}\,d\sigma$$
The integral here measures the flux of V as it leaves σ. One can also think of $\mu(\tau) \equiv \iiint_\tau d\tau$.
◦ Curl: The curl at $x_0$ is given as so: take a volume τ (surface σ, outward unit normal n̂) containing $x_0$, and shrink its volume to zero in the following formula:
$$\nabla\times\mathbf{V}\Big|_{x_0} := \lim_{\mu(\tau)\to 0}\frac{1}{\mu(\tau)}\oint_\sigma \big(\hat{n}\times\mathbf{V}\big)\,d\sigma$$
The cross product gives us a vector perpendicular to V that is (equivalent to) a vector tangential to the differential surface element (which we can think of as a circle, i.e. the cross product gives a vector in the direction of rotation, tangent to a circle and perpendicular to its radius).
ˆ Usual Definitions:
◦ Gradient: The gradient grad(φ) or ∇φ is the vector-valued function
$$\nabla\varphi := \sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}\,\hat{e}_{x_i} = \sum_{i=1}^n \partial_i\varphi\,\hat{e}_i$$
◦ Divergence: The divergence div(V) or ∇·V is the scalar-valued function
$$\nabla\cdot\mathbf{V} := \sum_{i=1}^n \partial_i V_i$$
◦ Curl: In R³, the curl, denoted curl(V) or ∇×V, is easily memorized by thinking of a determinant:
$$\nabla\times\mathbf{V} \equiv \begin{vmatrix} \hat{x} & \hat{y} & \hat{z} \\ \partial_x & \partial_y & \partial_z \\ V_x & V_y & V_z \end{vmatrix}$$
§5.2: Useful Identities

Throughout, $\vec{A}, \vec{B}$ represent vector fields and ψ, φ scalar fields.
These come in essence from Wikipedia. Some other stuff can be found in the NRL Plasma Formulary,
for instance, that you used in your space science math methods class; it is accessible online here; you should
have also downloaded a copy. Bear in mind conventions about various coordinate systems.

ˆ Linearity: The curl, gradient, and divergence are linear operators. Hence, for any α, β ∈ R,

◦ Gradient: ∇(αφ + βψ) = α∇φ + β∇ψ


◦ Divergence: $\nabla\cdot(\alpha\vec{A} + \beta\vec{B}) = \alpha(\nabla\cdot\vec{A}) + \beta(\nabla\cdot\vec{B})$
◦ Curl: $\nabla\times(\alpha\vec{A} + \beta\vec{B}) = \alpha(\nabla\times\vec{A}) + \beta(\nabla\times\vec{B})$

ˆ Gradient:
◦ $\nabla(\psi + \varphi) = \nabla\psi + \nabla\varphi$
◦ $\nabla(\psi\varphi) = \varphi\nabla\psi + \psi\nabla\varphi$
◦ $\nabla\!\left(\dfrac{\psi}{\varphi}\right) = \dfrac{\varphi\nabla\psi - \psi\nabla\varphi}{\varphi^2}$
◦ $\nabla(\psi\vec{A}) = \nabla\psi\otimes\vec{A} + \psi\nabla\vec{A}$
◦ $\nabla(\vec{A}\cdot\vec{B}) = (\vec{A}\cdot\nabla)\vec{B} + (\vec{B}\cdot\nabla)\vec{A} + \vec{A}\times(\nabla\times\vec{B}) + \vec{B}\times(\nabla\times\vec{A})$

ˆ Divergence:
◦ $\nabla\cdot(\vec{A}+\vec{B}) = \nabla\cdot\vec{A} + \nabla\cdot\vec{B}$
◦ $\nabla\cdot(\psi\vec{A}) = \psi\,\nabla\cdot\vec{A} + \vec{A}\cdot\nabla\psi$
◦ $\nabla\cdot(\vec{A}\times\vec{B}) = (\nabla\times\vec{A})\cdot\vec{B} - (\nabla\times\vec{B})\cdot\vec{A}$

ˆ Curl:
◦ $\nabla\times(\vec{A}+\vec{B}) = \nabla\times\vec{A} + \nabla\times\vec{B}$
◦ $\nabla\times(\psi\vec{A}) = \psi(\nabla\times\vec{A}) - \vec{A}\times(\nabla\psi) = \psi(\nabla\times\vec{A}) + (\nabla\psi)\times\vec{A}$
◦ $\nabla\times(\psi\nabla\varphi) = \nabla\psi\times\nabla\varphi$
◦ $\nabla\times(\vec{A}\times\vec{B}) = \vec{A}(\nabla\cdot\vec{B}) - \vec{B}(\nabla\cdot\vec{A}) + (\vec{B}\cdot\nabla)\vec{A} - (\vec{A}\cdot\nabla)\vec{B}$

ˆ Material Derivatives:
◦ $(\vec{A}\cdot\nabla)\vec{B} = \frac{1}{2}\Big[\nabla(\vec{A}\cdot\vec{B}) - \nabla\times(\vec{A}\times\vec{B}) - \vec{B}\times(\nabla\times\vec{A}) - \vec{A}\times(\nabla\times\vec{B}) - \vec{B}(\nabla\cdot\vec{A}) + \vec{A}(\nabla\cdot\vec{B})\Big]$
◦ $(\vec{A}\cdot\nabla)\vec{A} = \frac{1}{2}\nabla\|\vec{A}\|^2 - \vec{A}\times(\nabla\times\vec{A}) = \frac{1}{2}\nabla\|\vec{A}\|^2 + (\nabla\times\vec{A})\times\vec{A}$
ˆ Second Derivatives, e.g. Laplacian: Recall we define ∇² := ∇·∇ =: ∆.
◦ $\nabla\cdot(\nabla\times\vec{A}) = 0$
◦ $\nabla\times(\nabla\psi) = \vec{0}$
◦ $\nabla\cdot(\nabla\psi) = \nabla^2\psi$
◦ $\nabla(\nabla\cdot\vec{A}) - \nabla\times(\nabla\times\vec{A}) = \nabla^2\vec{A}$
◦ $\nabla\cdot(\varphi\nabla\psi) = \varphi\nabla^2\psi + \nabla\varphi\cdot\nabla\psi$
◦ $\psi\nabla^2\varphi - \varphi\nabla^2\psi = \nabla\cdot(\psi\nabla\varphi - \varphi\nabla\psi)$
◦ $\nabla^2(\varphi\psi) = \varphi\nabla^2\psi + 2(\nabla\varphi)\cdot(\nabla\psi) + (\nabla^2\varphi)\psi$
◦ $\nabla^2(\psi\vec{A}) = \vec{A}\nabla^2\psi + 2(\nabla\psi\cdot\nabla)\vec{A} + \psi\nabla^2\vec{A}$
◦ Green's Vector Identity: $\nabla^2(\vec{A}\cdot\vec{B}) = \vec{A}\cdot\nabla^2\vec{B} - \vec{B}\cdot\nabla^2\vec{A} + 2\nabla\cdot\big[(\vec{B}\cdot\nabla)\vec{A} + \vec{B}\times(\nabla\times\vec{A})\big]$

ˆ Third Derivatives:
◦ $\nabla^2(\nabla\psi) = \nabla\big(\nabla\cdot(\nabla\psi)\big) = \nabla(\nabla^2\psi)$
◦ $\nabla^2(\nabla\cdot\vec{A}) = \nabla\cdot\big(\nabla(\nabla\cdot\vec{A})\big) = \nabla\cdot(\nabla^2\vec{A})$
◦ $\nabla^2(\nabla\times\vec{A}) = -\nabla\times\big(\nabla\times(\nabla\times\vec{A})\big) = \nabla\times(\nabla^2\vec{A})$

ˆ Surface-Volume Integral Identities: V will denote volumes and S = ∂V their surfaces.
◦ $\oint_{\partial V} \psi\,d\vec{S} = \int_V \nabla\psi\,dV$
◦ $\oint_{\partial V} \vec{A}\times d\vec{S} = -\int_V \nabla\times\vec{A}\,dV$
◦ Divergence Theorem: $\oint_{\partial V} \vec{A}\cdot d\vec{S} = \int_V \nabla\cdot\vec{A}\,dV$
◦ Green's First Identity: $\oint_{\partial V} \psi\nabla\varphi\cdot d\vec{S} = \int_V \big(\psi\nabla^2\varphi + \nabla\varphi\cdot\nabla\psi\big)\,dV$
◦ Green's Second Identity: $\oint_{\partial V} (\psi\nabla\varphi - \varphi\nabla\psi)\cdot d\vec{S} = \int_V \big(\psi\nabla^2\varphi - \varphi\nabla^2\psi\big)\,dV$

ˆ Integration-by-Parts-Like Identities: A subset of the previous.
◦ $\int_V \vec{A}\cdot\nabla\psi\,dV = \oint_{\partial V} \psi\vec{A}\cdot d\vec{S} - \int_V \psi\,\nabla\cdot\vec{A}\,dV$
◦ $\int_V \vec{A}\cdot(\nabla\times\vec{B})\,dV = \int_V (\nabla\times\vec{A})\cdot\vec{B}\,dV - \oint_{\partial V} (\vec{A}\times\vec{B})\cdot d\vec{S}$

ˆ Curve-Surface Integrals: S is a surface and C = ∂S a closed curve upon it.
◦ $\oint_{\partial S} \psi\,d\vec{\ell} = -\iint_S \nabla\psi\times d\vec{S}$
◦ Stokes' Theorem: $\oint_{\partial S} \vec{A}\cdot d\vec{\ell} = \iint_S (\nabla\times\vec{A})\cdot d\vec{S}$
∂S S

◦ Green’s Theorems in the Plane: We let V := Mı̂ + Nȷ̂ with M, N continuous partial first
derivatives in an open set containing R a region in the plane, and C = ∂R a closed curve containing
R.

111
Circulation-Curl/Tangential Form: This claims

#» #»
I I ZZ  
∂N ∂M
F · T ds = M dx + N dy = − dx dy
∂x ∂y
|C {zC } R
| {z }
CCW circulation circulation density
 #»
z component of curl: curl F · k̂

Flux-Divergence/Normal Form: This claims


I I ZZ  
∂M ∂N
F · n̂ ds = M dy − N dx = + dx dy
∂x ∂y
|C {zC } R
| {z }
outward flux through C flux density

div( F )

In using Green’s theorems, it’s usually a good idea to parameterize C in terms of a variable t to
make it into a calculable Riemann integral. (For instance, if C is a circle of radius r, let x = r cos t
and y = r sin t for t ∈ [0, 2π).) Then you can calculate the differentials dx, dy for the line integrals
as you would for differentials:
df = f ′ (x) dx
In the circle example, then,
$$dx = dx(t) = d(r\cos t) = -r\sin t\,dt$$

as an example. (You’ll also want to convert M, N to terms of t for the line integral, if needed.)
§5.3: Identities with the Levi-Civita Symbol

We define the Levi-Civita symbol by, given $i_1, \cdots, i_n \in \{1, \cdots, n\}$,
$$\varepsilon_{i_1,\cdots,i_n} := \begin{cases} +1, & (i_1, i_2, \cdots, i_n) \text{ is an even permutation of } (1, 2, \cdots, n) \text{ in } S_n \\ -1, & (i_1, i_2, \cdots, i_n) \text{ is an odd permutation of } (1, 2, \cdots, n) \text{ in } S_n \\ 0, & \text{the } i_k \text{ are not all distinct} \end{cases}$$
Equivalently, $\varepsilon_{i_1,\cdots,i_n} = (-1)^p$ when the $i_k$ are all distinct, where p (the parity) is the number of swaps needed to reach $(1, 2, \cdots, n)$.
Some further reading is on Wikipedia, including identities tying it to the Kronecker δ.


We adopt the Einstein summation convention (implicit summation over repeated indices) and focus on the n = 3 case. Then we have, given vector fields $\vec{U} := (U_i)_{i=1}^3$, $\vec{V} := (V_i)_{i=1}^3$ and a scalar field φ, the identities:
$$\langle\vec{U},\vec{V}\rangle = U_i V_i \quad \text{(inner/dot/scalar product)}$$
$$\nabla\varphi = \partial_i\varphi\,\hat{e}_i \quad \text{(gradient)}$$
$$\nabla\cdot\vec{V} = \partial_i V_i \quad \text{(divergence)}$$
$$\vec{U}\times\vec{V} = \varepsilon_{i,j,k}\,U_j V_k\,\hat{e}_i \quad \text{(cross/outer/vector product)}$$
$$\nabla\times\vec{V} = \varepsilon_{i,j,k}\,\partial_j V_k\,\hat{e}_i \quad \text{(curl)}$$
$$(\vec{U}\cdot\nabla)\vec{V} = U_i\,\partial_i V_j\,\hat{e}_j \quad \text{(material derivative)}$$
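The most-used Kronecker-δ identity alluded to above (n = 3, sum over i implicit) is
$$\varepsilon_{i,j,k}\,\varepsilon_{i,m,n} = \delta_{j,m}\delta_{k,n} - \delta_{j,n}\delta_{k,m}$$
From it, the BAC-CAB rule falls out in two lines (a worked derivation of mine):
$$\big[\vec{A}\times(\vec{B}\times\vec{C})\big]_i = \varepsilon_{i,j,k}A_j\,\varepsilon_{k,l,m}B_l C_m = (\delta_{i,l}\delta_{j,m} - \delta_{i,m}\delta_{j,l})A_j B_l C_m = B_i(\vec{A}\cdot\vec{C}) - C_i(\vec{A}\cdot\vec{B})$$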
§5.4: Alternative (3D) Coordinate Systems

As usual, V : R3 → R3 is vector-valued and φ : R3 → R is scalar-valued.


This only goes over the basic operations I use a lot: others, such as the vector Laplacian ∇2 ≡ ∇ · ∇ ≡ ∆,
material derivative (A · ∇)B, and more can be found here on Wikipedia or in the NRL Plasma Formulary,
for instance, that you used in your space science math methods class; it is accessible online here; you should
have also downloaded a copy. Bear in mind conventions about various coordinate systems.

Basic Conversions:
Conversions of System A (Left) to System B (Top), Using System B Conventions:
Conversions of System A (Left) to System B (Top), Using System A Conventions:
Differentials in Alternative Systems
Note that these differentials apply in all cases, not just as a means of conversion. That is, if you have f(r, θ, φ) a function in spherical coordinates, and integrate it over a body E, then
$$\iiint_E f(r, \theta, \varphi)\,dV = \iiint_E f(r, \theta, \varphi)\cdot r^2\sin\theta\,dr\,d\theta\,d\varphi$$

in the conventions/notes below. Be mindful of various conventions of notation, especially where spherical coordinates are concerned. For posterity, the conventions University Calculus: Early Transcendentals uses are summed up by the bullets below (the book's accompanying picture is omitted):

ˆ ρ is the radius

ˆ ϕ is the angle made with the z axis

ˆ θ is the angle in the x, y plane measured from the positive x axis

ˆ Cartesian (R2 ): Coordinates: (x, y)

◦ Length (First Coordinate Changed): dx


◦ Length (Second Coordinate Changed): dy
◦ Area: dA = dx dy

ˆ Polar (R2 ): Coordinates: (r, θ)

◦ Length (First Coordinate Changed): dr
◦ Length (Second Coordinate Changed): r dθ
◦ Area: dA = r dr dθ

ˆ Cartesian (R3 ): Coordinates: (x, y, z)

◦ Length (First Coordinate Changed): dx


◦ Length (Second Coordinate Changed): dy
◦ Length (Third Coordinate Changed): dz
◦ Area (First Coordinate Fixed): dS = dy dz
◦ Area (Second Coordinate Fixed): dS = dx dz
◦ Area (Third Coordinate Fixed): dS = dx dy
◦ Volume: dV = dx dy dz

ˆ Cylindrical (R3 ): Coordinates: (r, θ, z)

◦ Length (First Coordinate Changed): dr
◦ Length (Second Coordinate Changed): r dθ
◦ Length (Third Coordinate Changed): dz
◦ Area (First Coordinate Fixed): dS = r dθ dz
◦ Area (Second Coordinate Fixed): dS = dr dz
◦ Area (Third Coordinate Fixed): dS = r dr dθ
◦ Volume: dV = r dr dθ dz

ˆ Spherical (R3 ): Coordinates: (r, θ, φ)

◦ Length (First Coordinate Changed): dr


◦ Length (Second Coordinate Changed): r dθ
◦ Length (Third Coordinate Changed): r sin θ dφ
◦ Area (First Coordinate Fixed): dS = r2 sin θ dθ dφ
◦ Area (Second Coordinate Fixed): dS = r sin θ dr dφ
◦ Area (Third Coordinate Fixed): dS = r dr dθ
◦ Volume: dV = r2 sin θ dr dθ dφ
Important Operator Conversions:
Only some are found here. Others may be found in a table on Wikipedia, or archived here on Imgur.

ˆ Cylindrical Coordinates (r, θ, z): Think polar, extended to 3D. Take $\mathbf{V} = V_r\hat{e}_r + V_\theta\hat{e}_\theta + V_z\hat{e}_z$ specifically.
◦ Gradient: $\nabla\varphi = \frac{\partial\varphi}{\partial r}\hat{e}_r + \frac{1}{r}\frac{\partial\varphi}{\partial\theta}\hat{e}_\theta + \frac{\partial\varphi}{\partial z}\hat{e}_z$
◦ Divergence: $\nabla\cdot\mathbf{V} = \frac{1}{r}\frac{\partial}{\partial r}(rV_r) + \frac{1}{r}\frac{\partial V_\theta}{\partial\theta} + \frac{\partial V_z}{\partial z}$
◦ Curl:
$$\nabla\times\mathbf{V} = \left(\frac{1}{r}\frac{\partial V_z}{\partial\theta} - \frac{\partial V_\theta}{\partial z}\right)\hat{e}_r + \left(\frac{\partial V_r}{\partial z} - \frac{\partial V_z}{\partial r}\right)\hat{e}_\theta + \frac{1}{r}\left(\frac{\partial}{\partial r}(rV_\theta) - \frac{\partial V_r}{\partial\theta}\right)\hat{e}_z$$

ˆ Spherical Coordinates (r, θ, ϕ): Take $\mathbf{V} = V_r\hat{e}_r + V_\theta\hat{e}_\theta + V_\phi\hat{e}_\phi$ specifically. Due to conflicting notations in the literature, θ represents the polar angle (the angle from the z axis) and ϕ the azimuthal angle (that in the x, y plane). (Yes, the definition of each term is confusing. No, I don't understand this.) Some info on the conflicting conventions is explained here.

◦ Gradient: $\nabla\varphi = \frac{\partial\varphi}{\partial r}\hat{e}_r + \frac{1}{r}\frac{\partial\varphi}{\partial\theta}\hat{e}_\theta + \frac{1}{r\sin\theta}\frac{\partial\varphi}{\partial\phi}\hat{e}_\phi$
◦ Divergence: $\nabla\cdot\mathbf{V} = \frac{1}{r^2}\frac{\partial}{\partial r}\big(r^2 V_r\big) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}(V_\theta\sin\theta) + \frac{1}{r\sin\theta}\frac{\partial V_\phi}{\partial\phi}$
◦ Curl:
$$\nabla\times\mathbf{V} = \frac{1}{r\sin\theta}\left(\frac{\partial}{\partial\theta}(V_\phi\sin\theta) - \frac{\partial V_\theta}{\partial\phi}\right)\hat{e}_r + \frac{1}{r}\left(\frac{1}{\sin\theta}\frac{\partial V_r}{\partial\phi} - \frac{\partial}{\partial r}(rV_\phi)\right)\hat{e}_\theta + \frac{1}{r}\left(\frac{\partial}{\partial r}(rV_\theta) - \frac{\partial V_r}{\partial\theta}\right)\hat{e}_\phi$$
§6: Items from Ordinary Differential Equations
(More to add later...)
§7: Items from Partial Differential Equations
(More to add later...)
§8: Matrices & Linear Algebra

§8.1: Vector Spaces: Axioms, Concepts, & Examples

A vector space or linear space V over a field F has two operations of addition + : V × V → V and
scalar multiplication · : V × F → V satisfying the following:

(i) Addition Commutes: ∀x, y ∈ V, x + y = y + x

(ii) Addition Associates: ∀x, y, z ∈ V, x + (y + z) = (x + y) + z; hence x + y + z is unambiguous


(iii) Zero Vector: ∃0 ∈ V such that x + 0 = x ∀x ∈ V
(iv) Additive Inverse: ∀x ∈ V, ∃y ∈ V such that x + y = 0; we often let −x be this y notationally
(v) Multiplication by 1: For the 1 of F, we have 1x = x for each x ∈ V

(vi) Association of Scalar Multiplication: ∀α, β ∈ F and ∀x ∈ V , we have (αβ)x = α(βx)


(vii) Left Distribution (Over Vectors): ∀α ∈ F and ∀x, y ∈ V , we have α(x + y) = αx + αy
(viii) Right Distribution (Over Scalars): ∀α, β ∈ F and ∀x ∈ V , we have (α + β)x = αx + βx

Common examples include: here, F denotes a field (e.g. R, C)

ˆ The trivial vector space, consisting of a zero vector only: {0} (sometimes ⟨0⟩)

ˆ Fn := {(xi )ni=1 | xi ∈ F ∀i}, with pointwise addition and scalar multiplication



ˆ Fω or F∞ , the space of sequences $(x_i)_{i=1}^\infty$ in F

ˆ Pn (F) or Fn [x], the space of all p ∈ F[x] such that deg(p) ≤ n.

ˆ F[x] itself

ˆ Fm×n , the space of m × n matrices over F

ˆ F(S, F), the set of functions S → F

ˆ C(R), the space of continuous functions on R (likewise, C)

Given W ⊆ V and that V is an F-vector space, we say W is a (vector) subspace of V (sometimes


denoted W ≤ V ) if W is a vector space in its own right, using the operations of V . One could show that W
is a subspace iff these hold:

(i) 0 ∈ W (for 0 the zero vector of V )


(ii) αx + βy ∈ W for each α, β ∈ F and x, y ∈ W

Trivially, for any vector space V , V and the trivial space are subspaces, as is the intersection of subspaces.
The span of a set S ⊆ V is an important subspace. We let
$$\operatorname{span}(S) := \left\{\sum_{i=1}^n \alpha_i v_i \;\middle|\; n\in\mathbb{N},\ \alpha_i\in\mathbb{F},\ v_i\in S\ \forall i\right\}$$
with span(∅) := ⟨0⟩.
We say S ⊆ V is linearly dependent if there exist $\{x_i\}_{i=1}^n \subseteq S$ and $\{\alpha_i\}_{i=1}^n \subseteq \mathbb{F}$ (not all 0) such that $\sum_i \alpha_i x_i = 0$. Otherwise, if
$$\sum_{i=1}^n \alpha_i x_i = 0 \implies \alpha_i = 0 \quad \text{for any choice of finitely-many } x_i \in S$$
then we say S is linearly independent. (S = ∅ is vacuously independent; a singleton {x} is independent iff x ≠ 0.) We say S is a basis if it is linearly independent and span(S) = V. For V having a basis of finitely-many (say, n < ∞) elements - and hence every basis of it does - we say that $\dim_{\mathbb{F}} V = n$.
§8.2: Linear Transformations; Rank, Kernel, Nullity, etc.

For V, W F-vector spaces, a function T : V → W is a linear transformation if it is linear as a function:

T (αx + βy) = αT (x) + βT (y) for all α, β ∈ F and for all x, y ∈ V

(The space of all such T is denoted L(V, W ), or L(V ) if they’re the same. The dual space of V is
V ∗ := L(V, F).)
We define some important spaces and parameters for such a T :

ˆ Null Space: $N(T) \equiv \operatorname{null}(T) \equiv \ker(T) := \{x \in V \mid T(x) = 0\}$

ˆ Range: $R(T) \equiv \operatorname{range}(T) \equiv \operatorname{im}(T) := \{w \in W \mid \exists x \in V \text{ such that } T(x) = w\}$

ˆ Nullity: $\operatorname{nullity}(T) := \dim_{\mathbb{F}}\ker(T)$

ˆ Rank: $\operatorname{rank}(T) := \dim_{\mathbb{F}}\operatorname{im}(T)$
We note:

ˆ Rank-Nullity Theorem: $\operatorname{nullity}(T) + \operatorname{rank}(T) = \dim_{\mathbb{F}}\ker(T) + \dim_{\mathbb{F}}\operatorname{im}(T) = \dim_{\mathbb{F}} V$ (for V finite-dimensional)
ˆ ker(T) ≤ V and im(T) ≤ W
ˆ $\beta := \{v_i\}_{i=1}^n$ a basis of V =⇒ im(T) = span(T(v₁), ···, T(vₙ)) (but not a basis without trivial kernel)
ˆ A matrix A is of full rank when rank(A) = min{m, n}. As it happens, the set of full rank matrices A ∈ Cm×n is dense. (The norm used to measure this is irrelevant since all norms are equivalent on a finite-dimensional space.)

We may represent linear transformations with matrices as so:

ˆ Let T : V → W be linear
ˆ Let V have basis $\beta := \{x_j\}_{j=1}^n$ and W have basis $\gamma := \{y_i\}_{i=1}^m$
ˆ Then there exist unique $a_{i,j} \in \mathbb{F}$ (1 ≤ i ≤ m, 1 ≤ j ≤ n) such that
$$T(x_j) = \sum_{i=1}^m a_{i,j}\,y_i \qquad \text{for each } 1 \leq j \leq n$$
Then the matrix $A := (a_{i,j})_{1\leq i\leq m,\,1\leq j\leq n}$ is the representation of T in the bases β, γ, and we may write $A = [T]_\beta^\gamma$ (omitting one if β = γ). When T = I (the identity transformation), we say that $[I]_\beta^\gamma$ is the change of basis matrix for the bases, of course.
§8.3: Matrix Operations & Notations

Let ai,j ∈ R for some commutative ring R (e.g. Z, R, C). The m-row, n-column matrix A formed by
these ai,j is denoted A := (ai,j )1≤i≤m,1≤j≤n . We would say that A ∈ Rm×n (or Mm×n (R)), with A being
square if m = n.
We may write GLn (R) for those n × n matrices which are invertible (general linear group).

ˆ Addition: Let A := (ai,j )1≤i≤m,1≤j≤n , B := (bi,j )1≤i≤m,1≤j≤n ∈ Rm×n . We may define A + B


pointwise/entry-wise:
A + B := (ai,j + bi,j )1≤i≤m,1≤j≤n
Thus, + : Rm×n × Rm×n → Rm×n . You can only add matrices of the same size, and obtain one of the
same size.
ˆ Scalar Multiplication: Take A := (ai,j )1≤i≤m,1≤j≤n ∈ Rm×n and α ∈ R. Then the matrix αA is
defined by
αA := (αai,j )1≤i≤m,1≤j≤n
Hence, scalar multiplication may be thought of as a map R × Rm×n → Rm×n .
ˆ Matrix Multiplication: Take $A := (a_{i,j}) \in R^{m\times n}$ and $B := (b_{i,j}) \in R^{n\times r}$. (Notice: A has as many columns as B has rows.) Then we define $AB \in R^{m\times r}$ by
$$AB = (c_{i,j})_{1\leq i\leq m,\,1\leq j\leq r} \qquad \text{where } c_{i,j} := \big\langle A_{i,*}, B_{*,j}\big\rangle = \sum_{k=1}^n a_{i,k}\,b_{k,j}$$
where $A_{i,*} := (a_{i,j})_{j=1}^n \in R^{1\times n}$ is the ith row of A, and $B_{*,j} := (b_{i,j})_{i=1}^n \in R^{n\times 1}$ is the jth column of B: each entry of AB pairs a row of A with a column of B.
Hence, one may think of matrix multiplication as a map $R^{m\times n}\times R^{n\times r}\to R^{m\times r}$. Consider the dimensions represented:
$$(m\times n)\,(n\times r)$$
In multiplying matrices, with dimensions represented in this order, the "inner" pair of dimensions must be equal for the two to be "compatible" in this sense, and their product has size determined by the outer numbers.
Note that matrix multiplication is not necessarily commutative, even if the entries lie in a field. (A quick example follows.)
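For instance (my example), with
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}: \qquad AB = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} \neq \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} = BA$$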
§8.4: Transposition & Related Notions (Hermitian, Unitary, & More)

Some further operations:

ˆ Tranpose: The transpose of A := (ai,j )1≤i≤m,1≤j≤n is AT := (aj,i )1≤j≤n,1≤i≤m . That is, for
A ∈ Rm×n , we get AT ∈ Rn×m with entries swapping their rows and columns (flipping across the
diagonal, in effect, for square matrices).
Sometimes, this is denoted by A′ or AT (with no mind for the serifs).
ˆ Conjugate-Transpose: When A's entries lie in R ($A := (a_{i,j})_{1\leq i\leq m,\,1\leq j\leq n}$), this is identical to the transpose. When some are non-real and lie in C, however, we can define the distinct conjugate transpose, denoted A* and defined by
$$A^* := \big(\overline{a_{j,i}}\big)_{1\leq j\leq n,\,1\leq i\leq m}$$
That is, you take the complex conjugate of each entry, and then transpose (or in the other order). Sometimes this is denoted by $A^H$, and sometimes known as the Hermitian transpose, transjugate, or (confusingly) adjoint. If we let $\overline{A}$ denote the entry-wise operation of complex conjugation, i.e.
$$A = (a_{i,j})_{1\leq i\leq m,\,1\leq j\leq n} \implies \overline{A} = \big(\overline{a_{i,j}}\big)_{1\leq i\leq m,\,1\leq j\leq n}$$
then we can also define $A^* := (\overline{A})^T = \overline{A^T}$.

Some definitions tied to transpose/conjugate-transpose:

Over R: A = Aᵀ is symmetric; A = −Aᵀ is skew-symmetric (anti-symmetric); A⁻¹ = Aᵀ is orthogonal.
Over C: A = A* is Hermitian (self-adjoint); A = −A* is skew-Hermitian (anti-Hermitian); A⁻¹ = A* is unitary; AA* = A*A is normal.

Some properties follow. Unless stated otherwise, even though conjugate-transposes are used, the same considerations may be used in the real (ordinary transpose) cases.

ˆ Involution: (M*)* = M
ˆ Respects Addition: (A + B)* = A* + B*
ˆ Reverses Products: (AB)* = B*A*
ˆ Respects Scalar Multiplication: $(cM)^T = cM^T$ (though $(cM)^* = \overline{c}\,M^*$)
ˆ Determinants: $\det(M^T) = \det(M)$ and $\det(M^*) = \overline{\det(M)}$
ˆ Traces: $\operatorname{trace}(A^T) = \operatorname{trace}(A)$ and $\operatorname{trace}(A^*) = \overline{\operatorname{trace}(A)}$
ˆ Positive Semi-Definite: For $A\in\mathbb{R}^{n\times n}$ we have $A^TA$ as positive semi-definite.
ˆ Inverses: $(M^{-1})^* = (M^*)^{-1}$
ˆ Eigenvalues: The eigenvalues of M and $M^T$ are the same. If λ is an eigenvalue of M, then $\overline{\lambda}$ is one for M*.
§8.5: Determinants: Definitions & Notations

Introduction:
The determinant of a matrix M is denoted det(M). Sometimes we denote it with absolute-value-style bars around the matrix itself, e.g.
$$M = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \implies \det(M) = \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix}$$
It is only defined for square matrices, and represents the volume dilation and orientation change by the linear transformation M represents.

Main Definition: Laplace Expansion:


To calculate it, we expand along a row or column (Laplace expansion): for each entry in it, we delete the entry's row and column, and take the determinant of what remains, times a factor. That factor is the entry, times a sign from the sign matrix: it is the same size as M, with first row being alternating +, −, +, −, ···, and then each subsequent row changing sign. Use the sign from the position corresponding to the entry of M.
We call the sign times the smaller determinant the cofactor, and the smaller determinants the minors. (The bringing in of the entries offers no new terminology.) If we let $M_{i,j}$ denote the minor generated from entry $m_{i,j}$, then
$$\det(M) = \sum_{j=1}^n (-1)^{i+j}\,m_{i,j}\,M_{i,j} = \sum_{i=1}^n (-1)^{i+j}\,m_{i,j}\,M_{i,j}$$
by expanding along row i or column j, respectively.

Brief Example of Laplace Expansion:

Example: for a 3 × 3 matrix we have the sign matrix
$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}$$
and expanding along the first row,
$$\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix} = 1\cdot\underbrace{(+1)\begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix}}_{\text{cofactor } (-1)^{1+1}M_{1,1}} + 2\cdot(-1)\cdot\underbrace{\begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix}}_{\text{minor } M_{1,2}} + 3\cdot(+1)\cdot\begin{vmatrix} 4 & 5 \\ 7 & 8 \end{vmatrix}$$
We proceed iteratively.
Small Determinants (1 × 1, 2 × 2, 3 × 3):

Trivially, the determinant of a 1 × 1 matrix is its sole entry: M = [a] =⇒ det(M) = a.
We may calculate 2 × 2 determinants by
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$

For 3 × 3 determinants we may use Sarrus’ rule, which uses a shoelace/Pac-Man-like pattern: extend
down-right on the main diagonals from the top row, take the products of the entries, and add them. Do the
same for the down-left anti-diagonal patterns (or up-right from the bottom row as pictured). Find the first
value minus the second value.
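Applied to the earlier example (my check):
$$\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix} = (1\cdot 5\cdot 9 + 2\cdot 6\cdot 7 + 3\cdot 4\cdot 8) - (3\cdot 5\cdot 7 + 1\cdot 6\cdot 8 + 2\cdot 4\cdot 9) = 225 - 225 = 0$$
consistent with the rows being linearly dependent ($\text{row}_1 - 2\,\text{row}_2 + \text{row}_3 = 0$).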

Alternative Definition of Determinants: Leibniz's Formula:

Consider $A := (a_{i,j})_{1\leq i,j\leq n} \in R^{n\times n}$. We may define
$$\det(A) := \sum_{\sigma\in S_n} \operatorname{sign}(\sigma)\prod_{i=1}^n a_{i,\sigma(i)}$$
where $\operatorname{sign}(\sigma) = (-1)^p$, with p the number of swaps σ needs to return to the identity form (1, 2, 3, ···, n). Hence, one may utilize the Levi-Civita symbol to write
$$\det(A) := \sum_{i_1,\cdots,i_n=1}^n \varepsilon_{i_1,i_2,\cdots,i_n}\prod_{j=1}^n a_{j,i_j}$$
§8.6: Determinants: Adjugates/Adjoints & Cofactor Matrices

We may define a cofactor matrix. There seems to be no standard notation, so let $M_c$ be that for M. Then
$$(M_c)_{1\leq i,j\leq n} = \big((-1)^{i+j}M_{i,j}\big)_{1\leq i,j\leq n}$$
where $M_{i,j}$ is the smaller determinant from the Laplace expansion, generated by the entry $m_{i,j}$ in M.
We can also define the adjugate matrix (or classical adjoint) as the transpose of the cofactor matrix. That is, since the cofactors are $(-1)^{i+j}M_{i,j}$, we have
$$(\operatorname{adj}(M))_{1\leq i,j\leq n} = \big((-1)^{i+j}M_{j,i}\big)_{1\leq i,j\leq n} = M_c^T$$
The adjugate satisfies the following properties, for M ∈ Rn×n :

ˆ adj(I) = I, and adj(0) = 0 for dimensions > 1. For dimension 1 × 1, then adj(0) = I.

ˆ adj(cM ) = cn−1 · adj(M )

ˆ adj(M T ) = (adj(M ))T

ˆ det(adj(M )) = det(M )n−1

ˆ adj(M ) = det(M ) · M −1 for M invertible

◦ Hence, adj(M )−1 = det(M )−1 M


◦ Hence, adj(M −1 ) = adj(M )−1

ˆ $\operatorname{adj}(\overline{M}) = \overline{\operatorname{adj}(M)}$ for $\overline{M}$ the entry-wise complex conjugate




ˆ adj(M ∗ ) = adj(M )∗ for M ∗ the conjugate transpose

ˆ adj(AB) = adj(B) adj(A) and hence adj(M k ) = adj(M )k (working for negative k on M invertible)

ˆ If M is triangular/diagonal, orthogonal, unitary, symmetric/Hermitian, skew-symmetric/skew-Hermitian,


or normal, that same property is true of adj(M )
ˆ $\underbrace{\operatorname{adj}\circ\operatorname{adj}\circ\cdots\circ\operatorname{adj}}_{k\text{ times}}(M) = \det(M)^{[(n-1)^k - (-1)^k]/n}\;M^{(-1)^k}$

ˆ $\det\!\Big(\underbrace{\operatorname{adj}\circ\operatorname{adj}\circ\cdots\circ\operatorname{adj}}_{k\text{ times}}(M)\Big) = \det(M)^{(n-1)^k}$
§8.7: Determinants: Properties

ˆ Immediate from Laplace Expansion:

◦ Any choice of expansion for row or column yields the same result.
◦ If an entire row or column is all zeroes, then det(M ) = 0.
Moreover, if two rows or two columns are identical, then det(M ) = 0.
In fact, if the rows or columns are not linearly independent (you can write one row as a linear
combination of the others), then det(M ) = 0.
◦ det(In ) = 1 (for In the n × n identity matrix)

ˆ Row Operations: Let M be the starting matrix, and M ′ the matrix after the operation.

◦ Transpose (No Change): det(M T ) = det(M )


◦ Adding Rows (No Change): If you add a (multiple of) row to another row of M to make M ′ ,
then det(M ′ ) = det(M ).
◦ Row Swap (−1): If you swap two rows or two columns to get M ′ , then det(M ′ ) = − det(M ).
◦ Row Scaling: If you multiply a row/column by k to get M ′ , then det(M ′ ) = k · det(M ). (In
particular, det(kM ) = k n · det(M ) for M n × n.)
◦ Add a Vector to a Row: Suppose we add a vector v to a column/row of M to make M′. Then det(M′) is det(M) plus the determinant of M with that column/row replaced by v. For instance,
$$\begin{vmatrix} a+x & b & c \\ d+y & e & f \\ g+z & h & i \end{vmatrix} = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} + \begin{vmatrix} x & b & c \\ y & e & f \\ z & h & i \end{vmatrix}$$

ˆ Matrix Operations:

◦ Multiplicative: det(AB) = det(A) · det(B)


◦ Inverse: det(A−1 ) = 1/ det(A) for A invertible.

ˆ Special Matrix Types:

◦ Adjugates/Adjoint: Recall that, where the minors (smaller determinants) are $M_{i,j}$ (generated by entry $m_{i,j}$ in M), we have
$$\operatorname{adj}(M) := \big((-1)^{i+j}M_{j,i}\big)_{1\leq i,j\leq n}$$
the transpose of the cofactor matrix. We assume that $M \in R^{n\times n}$.
Commuting with Adjugate: $\det(M)\cdot I = M\operatorname{adj}(M) = \operatorname{adj}(M)\,M$
Inverses & Adjugates: $M^{-1} = \dfrac{1}{\det(M)}\operatorname{adj}(M)$
Hence, $\det(\operatorname{adj}(M)) = \det(M)^{n-1}$ and $\det(\operatorname{adj}(\operatorname{adj}(M))) = \det(M)^{(n-1)^2}$
◦ Cofactors: Recall the cofactors in M are of the type $(-1)^{i+j}M_{i,j}$ for $M_{i,j}$ the smaller determinants generated by $m_{i,j}$. Let $M_c$ be the cofactor matrix of M. Then, since $\operatorname{adj}(M) = M_c^T$,
$$\det(M_c) = \det(\operatorname{adj}(M)) = \det(M)^{n-1}$$
(so $\det(M_c) = \det(M)^2$ in the 3 × 3 case).

◦ Diagonal/Triangular: For a diagonal, upper-triangular, or lower-triangular matrix, the determinant is the product of the entries on the main diagonal.
◦ Items on Block Matrices: Take A, B, C, D of dimensions n × n, n × m, m × n, and m × m respectively. Then:
$$\begin{vmatrix} A & 0 \\ C & D \end{vmatrix} = \det(A)\det(D) = \begin{vmatrix} A & B \\ 0 & D \end{vmatrix}$$
For A invertible, $\begin{vmatrix} A & B \\ C & D \end{vmatrix} = \det(A)\det\!\big(D - CA^{-1}B\big)$
When m = n and CD = DC, then $\begin{vmatrix} A & B \\ C & D \end{vmatrix} = \det(AD - BC)$
When m = n and A = D and B = C, then $\begin{vmatrix} A & B \\ B & A \end{vmatrix} = \det(A - B)\det(A + B)$
◦ Positive Semi-Definite Matrices: We say M is positive semi-definite (PSD) if $x^TMx \geq 0$ for each vector x compatible with M. (If working within C, we use x*, the conjugate transpose, instead.) Positive-definite requires > 0 strictly; negative notions are defined analogously.
Consider A, B, C positive semi-definite of the same size. Then
$$\det(A+B+C) + \det(C) \geq \det(A+C) + \det(B+C)$$
$$\det(A+B) \geq \det(A) + \det(B) \quad \text{(corollary of above, as } 0 \text{ is PSD)}$$
For A, B Hermitian (A = A*, etc.) and positive definite of size n, then (Minkowski's determinant inequality)
$$\sqrt[n]{\det(A+B)} \geq \sqrt[n]{\det(A)} + \sqrt[n]{\det(B)}$$
◦ Vandermonde Matrices: A (square) Vandermonde matrix takes the form
$$V = \begin{pmatrix} 1 & x_1 & x_1^2 & x_1^3 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & x_2^3 & \cdots & x_2^{n-1} \\ 1 & x_3 & x_3^2 & x_3^3 & \cdots & x_3^{n-1} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & x_n^3 & \cdots & x_n^{n-1} \end{pmatrix}$$
We have that
$$\det(V) = \prod_{1\leq i<j\leq n} (x_j - x_i)$$
and, in particular, det(V) ≠ 0 iff the $x_i$ are distinct.

ˆ Miscellaneous Properties:
◦ A Property of Characteristic Polynomials: We have that, for A, B square, $p_{AB} = p_{BA}$. Moreover, if $A\in R^{m\times n}$ and $B\in R^{n\times m}$, then $AB\in R^{m\times m}$ and $BA\in R^{n\times n}$, with
$$p_{BA}(\lambda) = \lambda^{n-m}\,p_{AB}(\lambda)$$
◦ Characterization: The determinant may be considered to be uniquely characterized by the properties that $\det(I_{n\times n}) = 1$, that det(M) = 0 when two rows or two columns are identical, and a slight extension of the "vector adding" property, which may be exemplified through
$$\begin{vmatrix} a+\alpha x & b & c \\ d+\alpha y & e & f \\ g+\alpha z & h & i \end{vmatrix} = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} + \alpha\begin{vmatrix} x & b & c \\ y & e & f \\ z & h & i \end{vmatrix}$$
which is called multi-linearity. Proof of the claim may be found, e.g., here.
which is called multi-linearity. Proof of the claim may be found, e.g., here.
◦ Cramer's Rule: For Ax = b, with A n × n and invertible, let $A_i$ be the matrix obtained by replacing column i in A by b. Then, if $x := (x_i)_{i=1}^n \in R^n$,
$$x_i = \frac{\det(A_i)}{\det(A)}$$
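A 2 × 2 instance (mine): for x + y = 3, x − y = 1,
$$x_1 = \frac{\begin{vmatrix} 3 & 1 \\ 1 & -1 \end{vmatrix}}{\begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix}} = \frac{-4}{-2} = 2 \qquad x_2 = \frac{\begin{vmatrix} 1 & 3 \\ 1 & 1 \end{vmatrix}}{-2} = \frac{-2}{-2} = 1$$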
◦ Derivative Identities (Jacobi's Formula): If M depends on x, then
$$\frac{d\det(M)}{dx} = \operatorname{trace}\!\left(\operatorname{adj}(M)\,\frac{dM}{dx}\right)$$
For M invertible, we thus have
$$\frac{d\det(M)}{dx} = \det(M)\operatorname{trace}\!\left(M^{-1}\frac{dM}{dx}\right)$$
◦ Eigenvalues: For $\lambda_i$ the eigenvalues of M (with repetition), $\det(M) = \prod_i \lambda_i$.
◦ Sylvester's Determinant Theorem: Take $A\in R^{m\times n}$ and $B\in R^{n\times m}$, so that $AB\in R^{m\times m}$ and $BA\in R^{n\times n}$. Then
Main Result: $\det(I_m + AB) = \det(I_n + BA)$
Corollary: For $r\in R^{1\times m}$ a row vector and $c\in R^{m\times 1}$ a column vector, we have
$$\det(I_m + cr) = 1 + rc$$
Corollary: For X m × m invertible,
$$\det(X + AB) = \det(X)\det\!\big(I_n + BX^{-1}A\big)$$
Corollary Combination: With the above conventions,
$$\det(X + cr) = \det(X)\big(1 + rX^{-1}c\big) = \det(X) + r\operatorname{adj}(X)\,c$$
Corollary: If m = n, then AB and BA have the same characteristic polynomials and eigenvalues.
◦ Trace: Recall: trace(M) is the sum of M's diagonal entries. We have that
$$\det(\exp(M)) = \exp(\operatorname{trace}(M))$$
for complex M. For real M we may say
$$\operatorname{trace}(M) = \log(\det(\exp(M)))$$
§8.8: Similarity & Properties Thereof

We are often interested in writing a matrix A in terms of an invertible matrix Q and a matrix B as

A = QBQ−1

(See, for instance, eigendecomposition.) In such a case, we say A, B are similar matrices, and can say
they represent the same linear transformation w.r.t. (possibly) different bases.
This notion induces an equivalence relation.
If two matrices are similar, the following properties are shared:

ˆ Rank (and hence nullity over finite-dimensional spaces)

ˆ Characteristic polynomial (hence, determinant, trace, eigenvalues & their multiplicities of both types)

ˆ Minimal polynomial

ˆ Frobenius & Jordan normal forms (up to permuting Jordan blocks)

ˆ Index of nilpotence

ˆ Elementary divisors
§8.9: Eigenstuff (-values, -vectors, -pairs, -spaces, -decomposition...)

Introductory Definitions & Concepts:

Consider a matrix (or more generally, a linear transformation) $A\in\mathbb{R}^{n\times n}$. Suppose there exist λ ∈ R and $v\in\mathbb{R}^n$, v ≠ 0, such that
$$Av = \lambda v \qquad \text{(eigenequation)}$$
Then we say (λ, v) is an eigenpair of A, with λ the eigenvalue and v its eigenvector. (Note: eigenvectors are defined only up to multiplicative constants; that is, (λ, v) being an eigenpair ensures the same for (λ, αv), for each nonzero α ∈ R.)
Some further definitions:

ˆ The set of eigenpairs of A is called its eigensystem.


ˆ The set of eigenvectors of A tied to a fixed eigenvalue, and the zero vector, is its eigenspace or
characteristic space of that eigenvalue
ˆ If the set of eigenvectors of A is a basis of dom(A), then we say it is an eigenbasis
ˆ The collection of all eigenvalues of A is called its spectrum, sometimes denoted σ(A) or Spec(A).

To find the eigenvalues λ of A, note that
$$Av = \lambda v \implies (A - \lambda I)v = 0 \implies v = 0 \text{ or, more importantly, } \det(A - \lambda I) = 0$$
The polynomial $p_A(\lambda) := \det(A - \lambda I)$ is called the characteristic polynomial of A, and its roots are the eigenvalues of A. Of course, they need not be distinct, leading to further concepts:

ˆ Algebraic Multiplicity: The algebraic multiplicity µA (λ) of the eigenvalue λ of A is simply its
multiplicity as a root of pA . We say λ is a simple eigenvalue if µA (λ) = 1.
ˆ Geometric Multiplicity: We define the eigenspace of a fixed eigenvalue λ by

Eλ := {v ∈ dom(A) | (A − λI)v = 0} = ker(A − λI)

Note that this contains all scalar multiples of any eigenvector of λ. (Moreover, Eλ ≤ dom(A).) The geometric
multiplicity of λ is given by

γA (λ) := dim Eλ = nullity(A − λI) = (# of linearly-independent eigenvectors associated to λ)



Some notes on these:

ˆ 1 ≤ γA (λ) ≤ µA (λ) ≤ n for a fixed λ of a fixed A ∈ Rn×n


ˆ We say λ is semi-simple if γA (λ) = µA (λ).
ˆ If A is diagonalizable, the direct sum of A’s eigenspaces is the domain space:
⊕_{λ∈σ(A)} Eλ = dom(A) = R^n
(In general the sum of the eigenspaces is direct, but it may be a proper subspace.)

ˆ n linearly independent eigenvectors of A may form an eigenbasis for Rn

Further Properties & Results of Note:

ˆ Eigendecomposition: Let (λi , vi ) be the eigenpairs of A, with eigenvectors linearly independent (not
necessarily eigenvalues). Then let P = [v1 | v2 | · · · | vn ] and D = diag(λ1 , · · ·, λn ). Then A = P DP −1 .
This is the eigendecomposition of A. (Note: We say M, N are similar matrices if ∃P invertible
such that M = P N P −1 . Hence, the eigendecomposition diagonalizes A and shows it is similar to a
diagonal matrix, sort of representing the same transformation in different bases.)
If an eigendecomposition does not exist, the matrix is said to be defective and we may appeal to
generalized eigenvectors and the Jordan normal form.
ˆ Determinant: For A with σ(A) := {λi}_{i=1}^n (including repetition), we have det(A) = ∏_i λi.
ˆ Trace: For A with σ(A) := {λi}_{i=1}^n (including repetition), we have trace(A) = Σ_i λi.

ˆ If λ is an eigenvalue of A then λk is one of Ak .

ˆ A is invertible iff 0 ∉ σ(A) (all eigenvalues are nonzero); A^{-1} then has eigenvalues 1/λi.

ˆ If A = A∗ (Hermitian; symmetric if real), then λ ∈ R ∀λ ∈ σ(A).

◦ Further, if also positive-definite, positive semi-definite, negative-definite, or negative semi-definite,


then the eigenvalues reflect that: λ > 0, λ ≥ 0, λ < 0, λ ≤ 0 respectively.

ˆ If A−1 = A∗ (unitary), then |λ| = 1 ∀λ ∈ σ(A).

ˆ If λ is an eigenvalue of A, then A + αI has eigenvalue λ + α for α ∈ C.

◦ More generally, for p ∈ C[x], λ ∈ σ(A) =⇒ p(λ) ∈ σ(p(A))

ˆ If each column sums to λ, or each row sums to λ, then λ is an eigenvalue of that matrix. (The all-ones
vector will be an eigenvector for A or AT .)

The Eigenvector-Eigenvalue Identity (famous due to Tao):


Originally due to Robert Thompson in 1966 but resurged in notoriety with Tao’s multiple proofs of it.
Take A ∈ Rn×n normal (i.e. AA∗ = A∗ A) with eigenvalues λi , and eigenvectors correspondingly vi . Let
vi,j be the jth entry in vi (1 ≤ j ≤ n).
Let Aj ∈ R^{(n−1)×(n−1)} be that obtained by removing row & column j from A, with eigenvalues λ_k^{(j)}. Then

|v_{i,j}|² · ∏_{1≤k≤n, k≠i} (λi − λk) = ∏_{k=1}^{n−1} (λi − λ_k^{(j)})

and, for pA, pAj the characteristic polynomials of A, Aj, we equivalently have

|v_{i,j}|² = pAj(λi) / p′A(λi)

assuming a nonzero derivative.
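A numerical verification of the identity on a random symmetric (hence normal) matrix — a minimal NumPy sketch:

import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                         # symmetric, hence normal

lam, V = np.linalg.eigh(A)                # eigenpairs, unit eigenvectors
i, j = 2, 4
lam_sub = np.linalg.eigvalsh(np.delete(np.delete(A, j, 0), j, 1))

lhs = abs(V[j, i])**2 * np.prod([lam[i] - lam[k] for k in range(n) if k != i])
rhs = np.prod(lam[i] - lam_sub)
print(lhs, rhs)                           # agree up to rounding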

Spectral Radius:
The spectral radius ρ(A) for A ∈ C^{n×n} is given as the largest eigenvalue by magnitude:

ρ(A) := max_{λ∈σ(A)} |λ|

Some results:

ˆ For any matrix norm induced by one on vectors, ρ(A) ≤ ∥A∥. Moreover, we have Gelfand’s formula, stating

ρ(A) = lim_{k→∞} ∥A^k∥^{1/k}

ˆ ρ(A) < 1 iff A^k → 0 as k → ∞, and ρ(A) > 1 iff ∥A^k∥ → ∞, for any matrix norm.

ˆ Sub-Multiplicative: For commuting Ai ∈ C^{m×m}, ρ(∏_i Ai) ≤ ∏_i ρ(Ai).
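Gelfand’s formula in action (a sketch; the matrix is deliberately non-normal so that ∥A∥2 strictly exceeds ρ(A)):

import numpy as np

A = np.array([[0.5, 2.0], [0.0, 0.3]])            # eigenvalues 0.5, 0.3
rho = max(abs(np.linalg.eigvals(A)))

for k in [1, 5, 20, 100]:
    Ak = np.linalg.matrix_power(A, k)
    print(k, np.linalg.norm(Ak, 2) ** (1.0 / k))  # -> rho = 0.5
print("rho =", rho)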

§8.10: Matrix Norms, Equivalence, Inequalities, & Related Notions

Recall: a norm on any F-vector space V is a function ∥·∥ : V → R≥0 such that

(i) Positive Semi-Definiteness: ∥x∥ ≥ 0 ∀x ∈ V , with ∥x∥ = 0 ⇐⇒ x = 0


(ii) Triangle Inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥ ∀x, y ∈ V
(iii) Absolute Homogeneity: ∥αx∥ = |α| · ∥x∥ ∀α ∈ F and ∀x ∈ V

We may assign these to spaces of matrices as well. We may, for instance, discuss a matrix norm induced
by a vector norm. Given A ∈ Fm×n and norms ∥·∥(m) and ∥·∥(n) on Fm , Fn respectively, define the induced
matrix norm (on A by these norms) by
∥A∥_{(m,n)} := inf{ C ≥ 0 : ∥Ax∥_{(m)} ≤ C∥x∥_{(n)} for all x ∈ C^n }
≡ sup_{x∈C^n, x≠0} ∥Ax∥_{(m)} / ∥x∥_{(n)} ≡ sup_{x∈C^n, ∥x∥_{(n)}=1} ∥Ax∥_{(m)}
= the maximum factor by which a vector is stretched by A (w.r.t. the norms)


As an example, consider the action of A := \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} as a map R² → R² under the 1-, 2-, and ∞-norms; these are computed in the sketch after the list below.

Some notes:

ˆ Induced matrix norms are submultiplicative: ∥AB∥ ≤ ∥A∥∥B∥ for instance.


ˆ The induced 1-norm is given by (for A ∈ C^{m×n}) the max column sum:

∥A∥_1 ≡ max_{1≤j≤n} ∥a_j∥_1 ≡ max_{1≤j≤n} Σ_{i=1}^m |a_{i,j}|

ˆ The induced ∞-norm is given by (for A ∈ C^{m×n}) the maximum row sum:

∥A∥_∞ ≡ max_{1≤i≤m} ∥a_i^*∥_1

137
ˆ The induced 2-norm is the spectral norm. We have that

∥A∥_2 = √(λmax(A*A)) ≡ σmax(A)

where λmax(M) := max_{λ∈σ(M)} λ and σmax(M) := the largest singular value of M (in the SVD sense).
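A check of the three induced norms on the example matrix above (a minimal NumPy sketch):

import numpy as np

A = np.array([[1.0, 2.0], [0.0, 2.0]])

print(np.linalg.norm(A, 1))       # max column sum: max(1, 4) = 4
print(np.linalg.norm(A, np.inf))  # max row sum:    max(3, 2) = 3
print(np.linalg.norm(A, 2))       # spectral norm = sigma_max
print(np.sqrt(max(np.linalg.eigvalsh(A.T @ A))))  # same value, via lambda_max(A*A)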

Related notions:

ˆ A norm ∥·∥ on Fm×n is consistent with norms ∥·∥(n) , ∥·∥(m) on Fn , Fm if

∥Ax∥(m) ≤ ∥A∥ · ∥x∥(n) for all A ∈ Fm×n and all x ∈ Fn

We say the matrix norm is compatible with the vector norm in the case both norms are equal (and
hence m = n).
All induced norms are definitionally consistent, and submultiplicative norms in general induce com-
patible vector norms.

Matrix norms need not be induced. For instance, one has a class of entry-wise matrix norms, in the same
spirit as the vector norms. (One may think of A ∈ Fm×n as instead a vector in Fmn to motivate these.) For
instance, for p ∈ [1, ∞), one may define
∥A∥_p := ( Σ_{i,j} |a_{i,j}|^p )^{1/p}

Notably, the case p = 2 gives the Frobenius norm or Hilbert-Schmidt norm, ∥A∥F, which satisfies

∥A∥F = √( Σ_{i,j} |a_{i,j}|² ) = √(trace(A*A)) = √(trace(AA*)) = √( Σ_i σi²(A) )

for σi (A) the singular values of A. (Much like the 2-norm, it is invariant under multiplication by unitary
matrices or those with orthonormal columns or rows.) This final identity also relates the norm to the
Schatten 2-norm (Wikipedia).
More generally, for p, q ∈ [1, ∞), one has the Lp,q norm

∥A∥_{p,q} := ( Σ_{j=1}^n ( Σ_{i=1}^m |a_{i,j}|^p )^{q/p} )^{1/q}

Norm Equivalence & p-norm Inequalities:
We say two norms ∥·∥α, ∥·∥β (in general, on an F-vector space V) are equivalent if

∃C1, C2 > 0 such that, ∀x ∈ V, C1∥x∥α ≤ ∥x∥β ≤ C2∥x∥α

(hence inducing the same topology). All norms on finite-dimensional spaces are equivalent.
For matrices A ∈ R^{m×n}, of rank r, we have the inequalities below. Subscripts refer to induced p-norms. The max norm is given by ∥A∥max := max_{i,j} |a_{i,j}|.

ˆ ∥A∥_2 ≤ ∥A∥_F ≤ √r ∥A∥_2
ˆ ∥A∥_F ≤ ∥A∥_* ≤ √r ∥A∥_F (middle is the Schatten 1-norm)
ˆ ∥A∥max ≤ ∥A∥_2 ≤ √(mn) ∥A∥max
ˆ ∥A∥_∞ ≤ √n ∥A∥_2 ≤ √(mn) ∥A∥_∞
ˆ ∥A∥_1 ≤ √m ∥A∥_2 ≤ √(mn) ∥A∥_1
ˆ ∥A∥_2 ≤ √( ∥A∥_1 ∥A∥_∞ )

§8.11: Bessel’s Inequality

Let {ek}_{k∈N} (or a finite sequence) be orthonormal in a Hilbert space H. Then

Σ_k |⟨x, ek⟩_H|² ≤ ∥x∥²_H

with equality when the ek form an (orthonormal) basis. That case is known as Parseval’s identity. Proof of Bessel’s inequality as above follows trivially from expanding 0 ≤ ∥x − Σ_k ⟨x, ek⟩_H ek∥²_H via ∥x − y∥² = ∥x∥² − 2Re⟨x, y⟩ + ∥y∥²
(itself following from the definition of an induced norm).
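A finite-dimensional sketch (real case; an orthonormal set is generated via QR, which is an assumption of convenience rather than part of the statement):

import numpy as np

rng = np.random.default_rng(2)
m, k = 6, 3
Q, _ = np.linalg.qr(rng.standard_normal((m, k)))   # k orthonormal columns
x = rng.standard_normal(m)

coeffs = Q.T @ x                                   # <x, e_j> for each j
print(np.sum(coeffs**2), np.linalg.norm(x)**2)     # Bessel: lhs <= rhs

# with a full orthonormal basis (k = m), equality holds (Parseval)
Qf, _ = np.linalg.qr(rng.standard_normal((m, m)))
print(np.sum((Qf.T @ x)**2), np.linalg.norm(x)**2)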

§8.12: Gram-Schmidt Orthonormalization Process

(Wikipedia article)

Classical Gram-Schmidt (CGS; unstable):


n
Given a set of linearly independent vectors {ai }i=1 , the Gram-Schmidt process generates a set of vectors
n
{qi }i=1 meeting the following conditions:
k k
ˆ The first k qi span the same space as the first k ai : span{qi }i=1 = span{ai }i=1

ˆ The qi are pairwise orthogonal (qi ⊥ qj for i ̸= j)

ˆ The qi have unit norm (hence, ⟨qi , qj ⟩ = δi,j )

The calculation is as so:


q_i = u_i / ∥u_i∥_2,   u_i := a_i − Σ_{k=1}^{i−1} proj_{u_k}(a_i),   proj_u(a) := (⟨u, a⟩/⟨u, u⟩) u = (⟨u, a⟩/∥u∥²) u

Essentially, what happens is that a_i is projected orthogonally onto span{a_j}_{j=1}^{i−1} = span{q_j}_{j=1}^{i−1}. The difference between the original vector and its projection (of the type v − P v) is thus orthogonal to that space; we normalize from there if desired.
(A runnable sketch of this and the modified variant appears below, after the next heading.)
Modified Gram-Schmidt (MGS; stabler):


The above-mentioned process is unstable and does not necessarily give orthogonality when done on computers, which defeats the entire point. MGS instead updates all remaining vectors as each new direction is found; both variants, applied to {v_i}_{i=1}^n, are sketched below.
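Since the original pseudocode lived in figures, here is a minimal Python/NumPy sketch of both variants (the function names cgs/mgs are mine; the columns of A are assumed linearly independent):

import numpy as np

def cgs(A):
    """Classical Gram-Schmidt on the columns of A (unstable)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    for i in range(n):
        u = A[:, i].copy()
        for k in range(i):                  # subtract all projections at once
            u -= (Q[:, k] @ A[:, i]) * Q[:, k]
        Q[:, i] = u / np.linalg.norm(u)
    return Q

def mgs(A):
    """Modified Gram-Schmidt (stabler): deflate the remaining vectors in turn."""
    V = np.asarray(A, dtype=float).copy()
    m, n = V.shape
    Q = np.zeros((m, n))
    for i in range(n):
        Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
        for j in range(i + 1, n):           # immediately remove the new direction
            V[:, j] -= (Q[:, i] @ V[:, j]) * Q[:, i]
    return Q

A = np.random.default_rng(0).standard_normal((6, 4))
for f in (cgs, mgs):
    print(np.allclose(f(A).T @ f(A), np.eye(4)))   # orthonormal columns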

Comparison:
Side by side, the GS algorithms applied to {x_i}_{i=1}^k to generate {y_i}_{i=1}^k differ only in the order of operations. Notice that MGS modifies all of the vectors in turn, and subtracts out the higher-indexed directions as soon as each new vector is formed.

§8.13: (Moore-Penrose) Pseudoinverse

(Wikipedia article.)

The (Moore-Penrose) pseudoinverse to a matrix A ∈ Fm×n is a second matrix A+ ∈ Fn×m satisfying


the criteria:

ˆ AA+ A = A (generalized inverse condition: maps columns of A to themselves)


ˆ A+ AA+ = A+ (reflexive inverse condition: weak inverse; imagine if AA+ = I)
ˆ (AA+ )∗ = AA+ and (A+ A)∗ = A+ A (AA+ , A+ A are Hermitian)

This exists for any A. When A is of full rank (linearly independent columns; A∗ A invertible), we have

A+ = (A∗ A)−1 A∗

When the rows are linearly independent, we instead have

A+ = A∗ (AA∗ )−1

These serve roles of left and right inverses, respectively, to such A.


Some properties:

ˆ The pseudoinverse exists and is unique for any A.

ˆ If we have the SVD A = U ΣV ∗ then A+ = V Σ+ U ∗

ˆ If A has real entries, so does A+ .

ˆ If A ∈ GLm (F), then A+ = A−1 ∈ GLm (F)

ˆ 0+ = 0T for the zero matrix

ˆ (A+ )+ = A (involution)
ˆ Commutes with transpose, conjugation, and conjugate-transpose:

(A^T)^+ = (A^+)^T    (\overline{A})^+ = \overline{A^+}    (A^*)^+ = (A^+)^*

ˆ Scalar multiples invert: if α ≠ 0, then (αA)^+ = (1/α) A^+
ˆ If A∗ A = I, then A+ = A∗ .

ˆ AA+ , A+ A are orthogonal projections onto the range of A, A∗ resp.

Some equalities:

ˆ A = AA∗ (A+ )∗ = (A+ )∗ A∗ A

ˆ A+ = A+ (A+ )∗ A∗ = A∗ (A+ )∗ A+

ˆ A∗ = A∗ AA+ = A+ AA∗

§8.14: Characteristic Polynomials

The characteristic polynomial of A ∈ Fn×n is given by

PA (λ) := det(A − λI) or equivalently PA (λ) := det(λI − A)

Note that PA ∈ F[x], is of degree n, and (with the latter definition) is always monic. (The pair differ by a
sign of (−1)n , which means the first definition is monic iff n is even.)
Some notes:

ˆ The roots of PA are A’s eigenvalues, naturally.

ˆ PA has constant term det(A) under the first definition, or det(−A) = (−1)n det(A) under the second.

ˆ PA has −trace(A) as the coefficient of λ^{n−1} under the second definition, or (−1)^{n−1} trace(A) under the first.
ˆ If A = P BP −1 for some P , then PA ≡ PB .

ˆ PA ≡ PAT

ˆ A is similar to a triangular matrix iff PA is factorable into linear factors.

ˆ PAB ≡ PBA for square matrices.

ˆ For A ∈ Fm×n , B ∈ Fn×m , then AB ∈ Fm×m and BA ∈ Fn×n , giving us PBA (λ) = λn−m PAB (λ)

ˆ Cayley-Hamilton Theorem: PA (A) = 0n×n as a matrix. Note that you can’t just plug it into the
determinant form owing to this result (a naive but wrong proof).
ˆ For f ∈ F[x], the eigenvalues of f(A) are f(λi) for λi those of A; that is, P_{f(A)}(λ) = ∏_i (λ − f(λi)) (spectral mapping).

The Faddeev-LeVerrier algorithm can calculate the coefficients of PA(x) := Σ_k c_k x^k, by the rule

c_{n−m} = \frac{(−1)^m}{m!} \begin{vmatrix} \operatorname{tr} A & m−1 & 0 & \cdots & \\ \operatorname{tr} A^2 & \operatorname{tr} A & m−2 & \cdots & \\ \vdots & \vdots & \ddots & \ddots & \\ \operatorname{tr} A^{m−1} & \operatorname{tr} A^{m−2} & \cdots & \cdots & 1 \\ \operatorname{tr} A^m & \operatorname{tr} A^{m−1} & \cdots & \cdots & \operatorname{tr} A \end{vmatrix}
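The same coefficients come more cheaply from the recursive form of the algorithm; a minimal NumPy sketch (faddeev_leverrier is my name for it, using the standard recursion M_k = AM_{k−1} + c_{n−k+1}I, c_{n−k} = −tr(AM_k)/k):

import numpy as np

def faddeev_leverrier(A):
    """Coefficients c_0..c_n of p_A(x) = det(xI - A) = sum c_k x^k."""
    n = A.shape[0]
    c = np.zeros(n + 1)
    c[n] = 1.0
    M = np.zeros_like(A, dtype=float)
    for k in range(1, n + 1):
        M = A @ M + c[n - k + 1] * np.eye(n)
        c[n - k] = -np.trace(A @ M) / k
    return c                        # index k holds the coefficient of x^k

A = np.array([[1.0, 2.0], [-2.0, 1.0]])
print(faddeev_leverrier(A))         # [5, -2, 1]: x^2 - 2x + 5
print(np.poly(A))                   # numpy's version, highest power first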

§8.15: Minimal Polynomials

The minimal polynomial µA ∈ F[x] of A ∈ Fn×n is one satisfying the following:

ˆ µA is monic
ˆ µA (A) = 0
ˆ µA is of minimal degree to satisfy this condition
ˆ If P ∈ F[x] has P (A) = 0, then µA | P

To compute µA , we note:

ˆ µA | PA , the characteristic polynomial of A: PA (λ) = det(A − λI)


ˆ If an irreducible factor appears in PA , then it must in µA

Hence (assuming F is algebraically closed)

PA(x) = ∏_i (x − λi)^{m_i} =⇒ µA(x) = ∏_i (x − λi)^{e_i} for some e_i ∈ {1, 2, ···, m_i}

(Hence, µA = PA if all of the eigenvalues are distinct.)


Some results:

ˆ These are equivalent:


◦ λ is a root of µA
◦ λ is a root of PA
◦ λ ∈ σ(A)
ˆ If A, B are similar, i.e. ∃P invertible with A = P BP −1 , then µA = µB .
ˆ One can show that (if m is the multiplicity of λ) then

ker(A − λI) ⊊ ker((A − λI)²) ⊊ ker((A − λI)³) ⊊ ··· ⊊ ker((A − λI)^m) = ker((A − λI)^M)

for all M ≥ m.
ˆ Primary Decomposition Theorem: Relatedly, suppose
µA(x) = ∏_i (x − λi)^{m_i}

where A ∈ F^{n×n} and hence has domain F^{n×1} (in the sense of a function). We also suppose the eigenvalues λi are distinct. Then we have that

F^{n×1} = ⊕_i ker((A − λi I)^{m_i})

where each kernel is invariant under A (in the sense Ak ∈ K for each k ∈ K and K one of the prescribed
kernels).
Note that ker(A − λI) is an eigenspace; hence, A’s domain is a direct sum of eigenspaces iff mi = 1 for
all i.
In turn, µA encapsulates how much we need to enlarge the eigenspaces to generalized eigenspaces to
form the prescribed direct sum.

ˆ Relatedly, A is diagonalizable iff µA has multiplicity 1 for each eigenvalue, i.e.

µA(x) = ∏_i (x − λi)

Note: A is diagonalizable iff there is an eigenbasis for its domain.

§8.16: The Cayley-Hamilton Theorem

Recall: the characteristic polynomial of A ∈ Fn×n is given by


PA(λ) := det(A − λI) = Σ_{i=0}^n a_i λ^i

where ai ∈ F, ensuring PA ∈ F[x]. The Cayley-Hamilton theorem states that


PA (A) = 0n×n
ˆ A (faulty) proof is given with det(A − AI) = 0, but this is a scalar, not a matrix. One must make use
of the polynomial.
ˆ If F is an algebraically closed field (e.g. C) a proof comes from the Schur decomposition A = QT Q∗
(Q unitary, T upper triangular), and factorizing PA into linear factors:
" #
Y Y  Y
∗ ∗
PA (A) = (A − λi I) = QT Q − λi QQ = Q T − λi I Q∗
i i i
From here, prove the product is zero, by showing the first k columns of the first k factors’ product are
zero (induction).
One may use this to show that if A ∈ F^{n×n} and q ∈ F[x], then ∃r ∈ F[x] with q(A) = r(A) and deg(r) ≤ n − 1.
We can, more generally, use Cayley-Hamilton to write matrix powers/inverses as linear combinations of
the original matrix and I.

ˆ Example: Consider the matrix

A := \begin{pmatrix} 1 & 2 \\ −2 & 1 \end{pmatrix}
Note that
PA (λ) = λ2 − 2λ + 5 =⇒ A2 − 2A + 5I = 0
Therefore,
A2 = 2A − 5I
and, for instance,
A4 = (2A − 5I)2 = −12A + 5I
ˆ Continued Example: We may likewise find A^{-1}. Notice that

A² − 2A + 5I = 0 =⇒ I = (1/5)(2A − A²) = A · (1/5)(2I − A)

and hence A^{-1} = (1/5)(2I − A).
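A NumPy check of the worked example (a minimal sketch; np.poly supplies the monic characteristic coefficients):

import numpy as np

A = np.array([[1.0, 2.0], [-2.0, 1.0]])
c = np.poly(A)                            # [1, -2, 5]: lambda^2 - 2 lambda + 5

# Cayley-Hamilton: p_A(A) = A^2 - 2A + 5I = 0
print(c[0] * A @ A + c[1] * A + c[2] * np.eye(2))   # zero matrix

# powers and the inverse as linear combinations of A and I
print(np.allclose(np.linalg.matrix_power(A, 4), -12 * A + 5 * np.eye(2)))
print(np.allclose(np.linalg.inv(A), (2 * np.eye(2) - A) / 5))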
ˆ Example: One may likewise get use out of Bezout’s identity for these problems. For instance, if

A := \begin{pmatrix} 1 & 4 & 1 \\ 0 & 2 & 2 \\ 1 & 0 & 3 \end{pmatrix}

and one wants (A² + I)^{-1}, first we note that A² + I as a polynomial in F[λ] is λ² + 1, and that gcd(PA(λ), λ² + 1) = 1 (PA has no root at ±i), so there are polynomials r, q ∈ F[λ] with

r(λ) PA(λ) + q(λ)(λ² + 1) = 1

The left-hand term vanishes with λ ↦ A by Cayley-Hamilton, and then q(A) is the desired inverse.

§8.17: The Power Method / Power Iteration / von Mises Iteration

Given a matrix A ∈ F^{n×n}, let λ be its eigenvalue of largest magnitude (assumed strictly dominant), with v its corresponding eigenvector. Then the recursion

b_{k+1} = A b_k / ∥A b_k∥

(for any reasonable approximation b0 to the eigenvector as initial) has a subsequence convergent to v.
Moreover, we can likewise recover the eigenvalue (and hence the spectral radius ρ(A) := max|λi|, since λ is dominant) once we know its eigenvector:

λ = (v^T A^T v)/(v^T v) = ((Av)^T v)/∥v∥² = ⟨Av, v⟩/⟨v, v⟩

This formula is known as the Rayleigh quotient.


Relatedly, for many matrices, computing A^n for large n often has its dominant eigenvector appear (up to scale) in the columns.
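A minimal NumPy sketch of the iteration (the function name and sample matrix are mine; the matrix is symmetric so convergence is straightforward):

import numpy as np

def power_iteration(A, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        b = A @ b
        b /= np.linalg.norm(b)
    lam = b @ A @ b / (b @ b)             # Rayleigh quotient
    return lam, b

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_iteration(A)
print(lam, max(abs(np.linalg.eigvals(A))))   # dominant eigenvalue both ways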

§8.18: Definiteness: Positive & Negative (Semi-)Definite

Definitions:
For M ∈ Fn×n which is symmetric (F = R and thus x∗ = xT ) or Hermitian (F = C),

ˆ Positive Definite: x∗ M x > 0 for all x ̸= 0

ˆ Positive Semi-Definite: x∗ M x ≥ 0 for all x


ˆ Negative Definite: x∗ M x < 0 for all x ̸= 0
ˆ Negative Semi-Definite: x∗ M x ≤ 0 for all x

ˆ Indefinite: None of these apply

Some shorthands:

ˆ “PD” = “positive definite”

ˆ “PSD” = “positive semi-definite”


ˆ “ND” = “negative definite”
ˆ “NSD” = “negative semi-definite”

Some orderings some people use:

ˆ M ≼ 0 means M is NSD

ˆ M ≺ 0 means M is ND

ˆ M ≽ 0 means M is PSD

ˆ M ≻ 0 means M is PD

Note that PD matrices are PSD, and ND matrices are NSD.

Results of Note:

ˆ A ∈ GLn(R) =⇒ A^T A is PD: x^T(A^T A)x = (Ax)^T(Ax) = ∥Ax∥² > 0, as A invertible and x ≠ 0 give Ax ≠ 0

ˆ Results about eigenvalues:

◦ M is PD iff all eigenvalues are positive


◦ M is PSD iff all eigenvalues are nonnegative
◦ M is ND iff all eigenvalues are negative
◦ M is NSD iff all eigenvalues are nonpositive
◦ M is indefinite iff it has eigenvalues λ, λ′ such that λ′ < 0 < λ

ˆ PSD matrices have a decomposition M = B ∗ B (and the converse applies). M is PD iff B is invertible.
This decomposition need not be unique.
ˆ PSD M have real, nonnegative diagonals and hence trace(M ) ≥ 0.
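As a small NumPy sketch (the classifier and its numerical tolerance are my illustrative choices):

import numpy as np

def classify(M, tol=1e-12):
    """Classify a Hermitian matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(M)
    if np.all(lam > tol):   return "PD"
    if np.all(lam >= -tol): return "PSD"
    if np.all(lam < -tol):  return "ND"
    if np.all(lam <= tol):  return "NSD"
    return "indefinite"

print(classify(np.array([[2.0, -1.0], [-1.0, 2.0]])))   # PD (eigenvalues 1, 3)
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))     # PSD (eigenvalues 0, 2)
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))    # indefinite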

§8.19: Dual Spaces; Adjoints

Recall: given X an F-vector space, we define a few dual spaces:

ˆ The algebraic dual (denoted X # , X ′ , X ∨ , or X ∗ ) is given by L(X, F), i.e. all linear functionals
φ : X → F.

ˆ The continuous/topological dual (denoted V ′ ) is the subset of L(X, F) which consists of continuous
functionals, i.e. L(X, F) ∩ C(X, F).

Take u ∈ L(X, Y). Then this map has algebraic adjoint/dual or pullback

u^# : Y^# → X^#,  f ↦ f ∘ u

If X, Y are topological vector spaces, then we can let ^t u denote a restriction: ^t u = u^#|_{Y′} : Y′ → X^#.

§8.20: Various Matrix Decompositions

§8.20.1: Eigendecomposition / Spectral Decomposition

More to come later....

§8.20.2: Singular Value Decomposition (SVD)

(Wikipedia article here.)

The Basics & Geometric Intuition:

Assume for now that rank(A) = n for A ∈ Cm×n and m ≥ n (full rank).
The image of the unit sphere under a linear transformation A is a hyperellipse. We may use this fact to
motivate the SVD as so:

ˆ We rotate space so some directions vi align with the standard basis (WLOG, vi have unit 2-norm)

ˆ We stretch each direction by some factors σi

ˆ We rotate space again so the standard basis aligns with some new directions ui (unit 2-norm)

We say and note:

ˆ The vectors {σi ui } are the principal semiaxes of the ellipse and have lengths σi .

ˆ For rank(A) = r, r-many of σi are nonzero

ˆ If A ∈ Cm×n , with m ≥ n, at most n σi are nonzero

ˆ The {σi } are the singular values of A; we usually write σ1 ≥ σ2 ≥ · · · > 0.

ˆ The {ui } are the left singular vectors of A (we generally have min{m, n}-many of them)

ˆ The {vi } are the right singular vectors of A

ˆ We have that Avj = σj uj for each j.

If we write the vectors columnwise and form a diagonal matrix of the singular values, we have

A [v1 | ··· | vn] = [u1 | ··· | un] diag(σ1, ···, σn)

or compactly, AV = Û Σ̂ with

ˆ A ∈ Cm×n

ˆ V ∈ Cn×n with orthonormal columns (hence, it is unitary and V ∗ = V −1 )

ˆ Û ∈ Cm×n with orthonormal columns

ˆ Σ̂ ∈ Cn×n diagonal

The unitary property thus lets us write the reduced SVD of A:

A = Û Σ̂ V* = Û Σ̂ V^{-1}

(the latter form being the more intuitive geometrically).

Note that Û has n orthonormal vector-columns in Cm , and hence (unless m = n) they are not a basis.
However, if we append m − n orthonormal columns to it, Û is extended to a unitary matrix U . Doing so
requires Σ̂ to be changed, by making it square with the appending of extra (m − n) rows of only zeroes.
This yields the full SVD of A:
A = U ΣV ∗ = U ΣV −1
wherein

ˆ A ∈ Cm×n
ˆ U ∈ Cm×m and unitary
ˆ Σ ∈ Cm×n (same size as A), with singular values on the “main diagonal” (top left going down right)
and zeroes elsewhere
ˆ V ∈ Cn×n and unitary

Note that if A is rank-deficient (not full rank, i.e. r := rank(A) < min{m, n}), the factorization applies
even still. We just append m−r (not m−n) orthonormal vectors to Û instead, and append n−r orthonormal
vectors to V . Σ will have r positive entries on the diagonal, and n − r are zero.
For such rank-deficient A, the reduced SVD may be formed either with

ˆ Û ∈ Cm×n
ˆ Σ̂ ∈ Cn×n with some zeroes on the diagonal

or

ˆ Û ∈ Cm×r
ˆ Σ̂ ∈ Cr×r with no zeroes on the diagonal

A final reminder on the geometric interpretation (rotate, stretch, rotate), though strictly speaking unitary matrices can also cause reflections.

Some Results & Notes:


We use A = U ΣV ∗ .

ˆ Every matrix has an SVD; with the restriction that the singular values are nonincreasing in order, the SVD becomes unique (up to the usual caveats for the singular vectors).
ˆ Consequences of A’s determinant:

◦ If det(A) > 0, then U, V can both be rotations with reflections, or rotations without reflections.
◦ If det(A) < 0, exactly one of U, V constitute a reflection.
◦ If det(A) = 0, no restriction exists.

ˆ Properties from the SVD: Take A ∈ Cm×n , with p := min{m, n} and r the number of singular
values of A (that are nonzero). Then some properties centered around the SVD or easily derived from
it:

◦ rank(A) = r (Bau, Thm. 5.1)


◦ range(A) = span{u1 , · · ·, ur } (Bau, Thm. 5.2)
◦ ker(A) = span{vr+1 , · · ·, vn } (Bau, Thm. 5.2)
◦ ∥A∥2 = σ1 (Bau, Thm. 5.3)
◦ ∥A∥F = √( Σ_i σi² ) (Bau, Thm. 5.3)
◦ For λ ∈ σ(AA*) = σ(A*A) nonzero, √λ is a singular value of A (Bau, Thm. 5.4)
◦ For A = A* (Hermitian), λ ∈ σ(A) gives |λ| a singular value of A (Bau, Thm. 5.5)
◦ If A is square, then |det(A)| = ∏_i σi (Bau, Thm. 5.6)
◦ A is a sum of r rank-1 matrices: A = Σ_{j=1}^r σj uj vj* (Bau, Thm. 5.7)

154
◦ For 0 ≤ ν ≤ r, define the νth partial sum (Bau, Thm. 5.8)

Aν := Σ_{j=1}^ν σj uj vj*

with σ_{ν+1} := 0 for ν = p. Then we have

∥A − Aν∥_2 = inf_{B∈C^{m×n}, rank(B)≤ν} ∥A − B∥_2 = σ_{ν+1}

giving that Aν is the best 2-norm approximation to A among matrices of rank at most ν.

◦ For the Frobenius norm, we likewise have (Bau, Thm. 5.9)

∥A − Aν∥_F = inf_{B∈C^{m×n}, rank(B)≤ν} ∥A − B∥_F = √( Σ_{i=ν+1}^r σi² )

Notes & Results Useful for Computation:
We use M = U ΣV ∗ as its SVD.

ˆ σ ∈ R≥0 is a singular value for M iff ∃u ∈ Cm , v ∈ Cn of unit norm such that

M v = σu M ∗ u = σv

u is a left-singular vector and v is a right singular vector . We say a singular value with at least
two linearly independent vectors of a type is degenerate (and the span of those two also constitute
singular vectors).
ˆ The SVD of M satisfies the following relations (immediate from the unitary nature of U, V ):

M ∗ M = V (Σ∗ Σ)V ∗
M M ∗ = U (ΣΣ∗ )U ∗

with the RHS’s giving both SVDs and eigendecompositions for M M ∗ , M ∗ M . Hence,

◦ The columns of V (the right-singular vectors) are eigenvectors of M ∗ M


◦ The columns of U (the left-singular vectors) are eigenvectors of M M ∗
◦ The eigenvectors in either case are orthogonal; take them to be of unit norm for simplicity

◦ If λ ∈ σ(M*M) = σ(MM*), then √λ is a singular value of M
◦ For positive semidefinite matrices, the eigendecomposition and SVD coincide

To compute the SVD of most matrices M , we can:

ˆ Find M M ∗ , M ∗ M

ˆ Find their eigenvalues λi; then σi = √λi make the singular values
ˆ Find the eigenvectors of M M ∗ ; call them ui (corresponding to λi ) and form U = [u1 | · · · | ur ].

ˆ Attach additional, arbitrary, orthonormal vectors as needed.

ˆ Form Σ = diag(σ1 , · · ·, σr ), appending zeroes as needed, with σ1 ≥ σ2 ≥ · · · ≥ 0.

ˆ To avoid another eigenvector computation, note that

M = U ΣV ∗ =⇒ U ∗ M = ΣV ∗

Compute each side of this equality, and compare columnwise to fill in V ∗ .
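A NumPy sketch of the relations above (random rectangular matrix; np.linalg.svd returns V* directly as Vh):

import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(M)                 # full SVD; Vh = V*
print(np.allclose(M, U[:, :3] * s @ Vh))    # reduced reconstruction

# singular values are square roots of the eigenvalues of M*M
print(np.sqrt(np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]))
print(s)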

§8.20.3: QR Factorization

Goal: To write any A ∈ Cm×m in the form A = QR for Q a unitary matrix (Q∗ = Q−1 ) and R an
upper-triangular matrix. Such a factorization exists (albeit in several different forms) for any such A.
In short, the way we do this is as so:

ˆ Consider the columns ai of A.

ˆ Generate an orthonormal basis from them, by means of Gram-Schmidt:


q_i = u_i / ∥u_i∥_2,   u_i := a_i − Σ_{k=1}^{i−1} proj_{u_k}(a_i),   proj_u(a) := (⟨u, a⟩/⟨u, u⟩) u = (⟨u, a⟩/∥u∥²) u

ˆ Q is formed from the qi .

ˆ R has entries ri,j = ⟨qi , aj ⟩.

ˆ One may find R as R = Q∗ A.

We define ⟨v1, ···, vn⟩ := span{v_i}_{i=1}^n.
QR factorization has the focus of orthogonalizing the range of A ∈ C^{m×n}. Specifically, we want to get {q_i}_{i=1}^n ⊆ C^m such that

ˆ ⟨q1 , · · ·, qk ⟩ = ⟨a1 , · · ·, ak ⟩ for each k ∈ {1, 2, · · ·, n}

ˆ qi ⊥ qj for i ̸= j, i.e. qi∗ qj = ⟨qi , qj ⟩Cm = δi,j (orthogonality)

ˆ ∥qi ∥2 = 1 (unit vectors / orthonormality)

The result can be framed in a few ways, then: A = Q̂R̂, written columnwise as

[a1 | ··· | an] = [q1 | ··· | qn] \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,n} \\ 0 & r_{2,2} & \cdots & r_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{n,n} \end{pmatrix}

with A ∈ C^{m×n}, Q̂ ∈ C^{m×n} having orthonormal columns, and R̂ ∈ C^{n×n} upper-triangular,

or as a set of equations,

a1 = r_{1,1} q1
a2 = r_{1,2} q1 + r_{2,2} q2
⋮
an = r_{1,n} q1 + r_{2,n} q2 + ··· + r_{n,n} qn = Σ_{i=1}^n r_{i,n} q_i

Note that A = Q̂R̂ is a reduced QR factorization; the full QR factorization A = QR is formed as
so: for m ≥ n,
ˆ Add m − n orthonormal columns to Q̂ to get Q ∈ C^{m×m} unitary. (The extra columns {q_j}_{j=n+1}^m are a basis of range(A)^⊥ = ker(A*).)
ˆ Add equally many all-zero rows to R̂ to get R ∈ Cm×n (upper triangular, of a sort)

§8.20.4: Householder Triangularization & QR Stuff

CGS/MGS may be thought of as applying elementary triangular matrices Rk on the right of A:

A R1 ··· Rn = Q̂ =⇒ R̂ = R_n^{-1} ··· R_1^{-1}

The Householder method instead applies elementary unitary transformations Qk on A’s left:

Qn · · ·Q1 A = R =⇒ Q = Q∗1 · · ·Q∗n

That is, GS is a “triangular orthogonalization” method, with Householder an “orthogonal triangularization


method”.
The Qk take the form

Qk := \begin{pmatrix} I_{(k−1)×(k−1)} & 0 \\ 0 & F_k \end{pmatrix}  for F_k ∈ C^{(m−k+1)×(m−k+1)} unitary

We see that

F_k = I − 2 (v_k v_k^*)/(v_k^* v_k) = I − 2 (v_k v_k^*)/∥v_k∥²_2

where x_k ∈ C^{m−k+1} is taken from A’s kth column, as the entries in rows k to m (x_k := A_{k:m,k}), and then

v_k := sign(x1) · ∥x_k∥_2 · e1 + x_k  (x1 the first entry of x_k)

Note that F_k = I − 2P for a certain projection P, and reflects space about the hyperplane through the origin and perpendicular to v_k. F_k is a full-rank, unitary matrix.
In code, then, an implicit QR factorization of A may be constructed; a sketch follows.
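A minimal NumPy sketch (the function names householder_qr and apply_Qstar are mine; real case for simplicity):

import numpy as np

def householder_qr(A):
    """Implicit QR: returns the reflector vectors W (column k = v_k) and R."""
    R = np.asarray(A, dtype=float).copy()
    m, n = R.shape
    W = np.zeros((m, n))
    for k in range(n):
        x = R[k:, k].copy()
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])   # v = sign(x_1)||x|| e_1 + x
        v /= np.linalg.norm(v)
        W[k:, k] = v
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])  # apply F_k to the active block
    return W, R

def apply_Qstar(W, b):
    """Compute Q* b by applying the stored reflectors in order."""
    b = np.asarray(b, dtype=float).copy()
    for k in range(W.shape[1]):
        v = W[k:, k]
        b[k:] -= 2.0 * v * (v @ b[k:])
    return b

A = np.random.default_rng(5).standard_normal((5, 3))
W, R = householder_qr(A)
print(np.allclose(apply_Qstar(W, A[:, 0]), R[:, 0]))   # Q*A = R, column by column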

It being implicit is no issue, since we may still find Q*b and Qx easily: apply the stored reflectors in order for Q* (as in apply_Qstar above), or in reverse order for Q.
The latter procedure applied to e_k for each k can reconstruct Q explicitly if need be.
Some properties of Householder matrices:

ˆ They are Hermitian, unitary, and hence involutions

ˆ Their eigenvalues are all ±1, with exactly one being −1

ˆ They have determinant −1 as a result

§8.20.5: Hessenberg Matrices

Hessenberg matrices arise as a class of “almost triangular” matrix, e.g. in real Schur decompositions.
Let A := (ai,j )ni,j=1 ∈ Fn×n .

ˆ A is upper Hessenberg if it is “almost upper triangular”, in that the first subdiagonal may have nonzero entries. For example, in the 4 × 4 case the allowed pattern is

\begin{pmatrix} × & × & × & × \\ × & × & × & × \\ 0 & × & × & × \\ 0 & 0 & × & × \end{pmatrix}

It satisfies a_{i,j} = 0 when i > j + 1.

ˆ A is lower Hessenberg if it is “almost lower triangular”: the first superdiagonal may have nonzero entries. For example, in the 4 × 4 case the allowed pattern is

\begin{pmatrix} × & × & 0 & 0 \\ × & × & × & 0 \\ × & × & × & × \\ × & × & × & × \end{pmatrix}

It satisfies a_{i,j} = 0 when j > i + 1.

A few notes:

ˆ A is upper (lower) Hessenberg iff AT is lower (upper) Hessenberg.

ˆ A matrix which is both upper and lower Hessenberg is tridiagonal.

ˆ A matrix which has all nonzero entries on the critical super-/sub-diagonal is said to be unreduced .

One may construct the Hessenberg form of any square matrix A (over R, C) as so:

ˆ Let A′ ∈ C^{(n−1)×n} be A sans its first row; a′1 the first column of A′.

ˆ Define the Householder matrix V1 ∈ F^{(n−1)×(n−1)} by

V1 := I_{n−1} − 2 (ww*)/∥w∥²,  w := ∥a′1∥e1 − a′1 if a′_{1,1} = 0, and w := ∥a′1∥e1 − (a′_{1,1}/|a′_{1,1}|) x otherwise

(What is x? Presumably a′1, matching the reflector convention of the Householder section.)
ˆ Then the block matrix U1 = blockdiag(1, V1 ) eliminates the necessary entries in A’s first column.

ˆ Construct U2 , U3 , · · ·, Un−2 analogously.

ˆ Then let U = Un−2 Un−3 · · ·U2 U1 .

ˆ Then H = U AU ∗ .
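In practice the Hessenberg form comes packaged; a minimal sketch using SciPy (scipy.linalg.hessenberg with calc_q=True returns both factors):

import numpy as np
from scipy.linalg import hessenberg

A = np.random.default_rng(6).standard_normal((5, 5))

H, Q = hessenberg(A, calc_q=True)       # A = Q H Q*
print(np.allclose(A, Q @ H @ Q.T))      # Q is real orthogonal here
print(np.allclose(np.tril(H, -2), 0))   # zero below the first subdiagonal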

§8.20.6: Cholesky Factorization

(Wikipedia article.)

The Cholesky decomposition applies to A which are Hermitian (A = A∗ ) and positive-definite (x∗ Ax > 0
for each vector x). It writes it in the form
A = LL∗
where L is lower-triangular, with positive real diagonal entries (ℓ_{i,i} ∈ R_{>0}). If a matrix has such a decomposition, it is unique; moreover, the existence of such a decomposition ensures it is Hermitian and positive-definite.
Note that, given any matrix A, A*A is Hermitian and positive semi-definite; it is positive-definite iff A has linearly independent columns.
One may extend this to PSD matrices (x∗ Ax ≥ 0), but the decomposition may not be unique and
ℓi,i ∈ R≥0 . It is unique with that stipulation that if rank(A) = r, then L has r positive entries on the
diagonal, with n − r columns of all zeroes.
One also has the LDL or LDLT or “square-root-free Cholesky” decomposition, which writes

A = LDL∗

Here, L is further a unit lower triangular matrix (so ℓ_{i,i} = 1 for all i), and D is a diagonal matrix. If A = MM* is the classical Cholesky decomposition and S = diag(M), these are related by the rule

L = M S^{-1},  D = S²

D has strictly positive diagonal when A is positive-definite. If A is PSD, then D has rank(A)-many nonzero entries on its diagonal.
Computation is trivial. Write the multiplication A = LL∗ (or whichever) for explicitly labeled entries,
and solve the resulting equations (this is the Cholesky-Banachiewicz or Cholesky-Crout algorithm).
This can be done column by column. The equations that result are

ℓ_{j,j} = √( a_{j,j} − Σ_{k=1}^{j−1} |ℓ_{j,k}|² )  (taking the positive root)

ℓ_{i,j} = (1/ℓ_{j,j}) ( a_{i,j} − Σ_{k=1}^{j−1} ℓ_{i,k} \overline{ℓ_{j,k}} )  (for i > j)
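A minimal NumPy sketch of the Cholesky-Banachiewicz recurrence (real symmetric PD case; the function name is mine):

import numpy as np

def cholesky_banachiewicz(A):
    """Lower-triangular L with A = L L^T, for real symmetric PD A."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(i + 1):
            s = A[i, j] - L[i, :j] @ L[j, :j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = cholesky_banachiewicz(A)
print(L)
print(np.allclose(L, np.linalg.cholesky(A)))   # matches the library routine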

§8.20.7: Schur Decomposition

Overview:
Given A ∈ Fn×n , there is a factorization

A = QT Q∗

with

ˆ Q unitary (Q∗ = Q−1 )

ˆ T upper-triangular (eigenvalues of A on its diagonal)

Note that this means that every matrix is similar to a triangular matrix, using unitary ones as its other
factors; that is, every square matrix is unitarily equivalent to a triangular matrix.
It is often computationally used to further decompose T as

T = N + D

for N strictly upper triangular (zero diagonal) and D diagonal.

Computation:
Typically, computation is uninteresting, and most are just concerned with its existence.
Computing the Schur decomposition is analogous to the QR factorization.

ˆ Find all eigenvalue and eigenvector pairs (λi , vi ) for A.

ˆ Generate an orthonormal basis from the eigenvectors (Gram-Schmidt).

ˆ Fill in additional orthonormal columns as needed to get a total of n vectors (for A ∈ F^{n×n}).

ˆ Arrange these in columns to form Q.

ˆ T = Q∗ AQ
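For completeness, a minimal SciPy sketch (scipy.linalg.schur with output="complex" guarantees a triangular T):

import numpy as np
from scipy.linalg import schur

A = np.random.default_rng(7).standard_normal((4, 4))

T, Q = schur(A, output="complex")         # A = Q T Q*, T upper triangular
print(np.allclose(A, Q @ T @ Q.conj().T))
print(np.allclose(np.tril(T, -1), 0))     # strictly lower part vanishes
print(np.diag(T))                         # eigenvalues of A on the diagonal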

§9: Notes from Self-Studying Linear Algebra & Numerical Analysis

§9.1: (Trefethen & Bau) Lecture 1: Matrix Multiplication, Rank, Inverses, etc.

Basic Assumptions:
Important notations:

ˆ For a matrix A, we have its entries as ai,j . (Lowercase, two indices: row, column respectively.)

ˆ For a vector x, we have its entries as xi . (Single index.)

ˆ For a matrix A, we have its columns as aj . (Lowercase, one index. Bad, but for consistency...)

We focus on matrices over C.

Matrix/Vector & Matrix/Matrix Multiplication:

We start first with A ∈ Cm×n and x ∈ Cn . We say


Ax = b ⟺ b_i = Σ_{j=1}^n a_{i,j} x_j for each i ∈ {1, 2, ···, m}

The map x 7→ Ax is linear as a map Cn → Cm . Conversely, each such linear map is representable by a
matrix.
An important change in perspective: we may write

b = Ax = Σ_{j=1}^n x_j a_j

with each x_j a scalar and each a_j a column.

We may think, then:

ˆ b is a linear combination of the columns aj of A

ˆ x acts on A, not the reverse

This perspective may be generalized to matrix products: for A ∈ C^{ℓ×m}, B ∈ C^{ℓ×n}, C ∈ C^{m×n},

B = AC ⟺ b_{i,j} = Σ_{k=1}^m a_{i,k} · c_{k,j} for each fixed i, j
⟺ b_j = Ac_j = Σ_{k=1}^m c_{k,j} · a_k

Thus, a column b_j of B is a linear combination of the columns of A, as determined by the coefficients in column c_j. (A quick check of this perspective follows.)
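A minimal NumPy sketch of the column-combination view (arbitrary sample data):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
x = np.array([10.0, -1.0])

b = A @ x
b_as_combo = x[0] * A[:, 0] + x[1] * A[:, 1]   # x acts on A's columns
print(np.allclose(b, b_as_combo))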

Important Sets & Parameters: Rank, Range, Null Space

ˆ Range: The range of A ∈ C^{m×n} (a map C^n → C^m, mind) is like that of a function:
range(A) := {y ∈ C^m | ∃x ∈ C^n such that Ax = y}
= (the space spanned by the columns of A, by previous discussion) (Bau, Thm. 1.1)
= the column space of A
ˆ Null Space: The null space (or kernel ) of A ∈ Cm×n is


null(A) ≡ ker(A) := {x ∈ Cn | Ax = 0}
Hence, x ∈ ker(A) iff its entries are the coefficients for a linear combination of A’s columns that yield
the zero vector.
ˆ Nullity: Simply, nullity(A) := dim_C ker(A)

ˆ Rank: There are notions of row rank and column rank, the dimensions of the row space and column space respectively, i.e. the spaces spanned by the rows (or columns) of A.
One may prove that these are always the same using SVD or other means, and hence may just speak of the rank:

rank(A) ≡ dim_C (the column space of A) ≡ dim_C (the row space of A) ≡ dim_C range(A)

We say A ∈ C^{m×n} is of full rank if rank(A) = min{m, n} (the maximum possible rank). Hence, if m ≥ n and A is of full rank, then it has n linearly independent columns, and the mapping x ↦ Ax is injective.
Conversely (still with m ≥ n), x ↦ Ax is injective iff A is of full rank. (Bau, Thm. 1.2)

Invertible Matrices:
A square matrix A ∈ Cm×m is said to be non-singular or invertible under any of several equivalent
conditions: (Bau, Thm. 1.3)
ˆ A has an (unique) inverse matrix A−1 , i.e. one such that AA−1 = A−1 A = I
ˆ rank(A) = m (full rank)
ˆ range(A) = Cm (the columns or rows of A form a basis of Cm )
ˆ ker(A) = {0} (trivial kernel, hence injective)
ˆ 0 is not an eigenvalue of A (0 ̸∈ σ(A))
ˆ 0 is not a singular value of A
ˆ det(A) ̸= 0
When thinking of the multiplication A−1 b, we may think of
ˆ that vector being the unique solution to Ax = b
ˆ x gives the coefficients of the linear combination of columns of A that forms b (since A’s columns form
a basis of Cm when invertible, then b can be written in terms of those columns, and x encodes those
coefficients)

§9.2: (Trefethen & Bau) Notes on Exercises 1.1, 3.4 (Multiplication to Change
a Matrix)

Problem 1 in Lecture 1 from Bau’s text goes as so. My solution follows.

Let B ∈ C4×4 be a matrix to which we apply the following operations:

(1) double column 1
(2) halve row 3
(3) add row 3 to row 1
(4) interchange columns 1 and 4
(5) subtract row 2 from each of the other rows
(6) replace column 4 by column 3
(7) delete column 1 (so that the column dimension is reduced by 1)

Then we are asked to:

(a) Write the result as a product of 8 matrices
(b) Write it again as a product ABC (same B) of three matrices

The key to this is to interpret the multiplication BM = A as M acting on B. The result matrix has
its kth column as being a linear combination of B’s columns, as determined by the coefficients in the kth
column of the actor matrix M . For instance, consider
  
1 2 3 11 12 13
4 5 6 14 15 16 = A
7 8 9 17 18 19
| {z }| {z }
=:B =:M

In the result matrix, we have:

ˆ First column, given by 11 times the first column of B, plus 14 times its second column, plus 17 times
its third column
ˆ Second column, given by 12b1 + 15b2 + 18b3

ˆ Third column given by 13b1 + 16b2 + 19b3

Note that column bk may be preserved by letting mk = ek , the unit vector. This interpretation may be
dually done on the rows of M . Let m(r) denote M ’s rth row.
  
1 2 3 11 12 13
4 5 6 14 15 16 = A
7 8 9 17 18 19
| {z }| {z }
=:B =:M

ˆ A’s first row will have 1m(1) + 2m(2) + 3m(3) (coefficients in b(1) times the rows of M )

ˆ Its second will have 4m(1) + 5m(2) + 6m(3)

ˆ Its third will have 7m(1) + 8m(2) + 9m(3)

Note how this compares to the “usual” multiplication algorithm one likes thinking of. Of course, letting m^{(r)} = e_r^T lets us preserve row r.

Hence, we may write the desired product of part (a) by doing things which change rows on the left, and things which change columns on the right:

A = M5 M3 M2 · B · M1 M4 M6 M7

The row operations M2, M3, M5 multiply on the left (later steps further left); the column operations M1, M4, M6, M7 multiply on the right (later steps further right), with M7 ∈ C^{4×3} the deletion matrix. Each of the other Mi is I_{4×4} suitably modified per the rules below.

Mi executes step i in the process.

ˆ To scale row r by α, use I as your base matrix, but replace the 1 in row r by α.

ˆ To scale column c by α, use I as your base matrix, but replace the 1 in column c by α.

ˆ To swap columns k, ℓ, just swap them in your base matrix of I.

ˆ To add row k to row ℓ, make entry k in row ℓ be 1. (Or α, to add α times row k.)

etc.

On Deletion Matrices (Inspired by Exercise 3.4):

More properly on the deletion matrices: given A ∈ Cm×n , if we wish to delete a row/column, we want
an Ar ∈ C(m−1)×n or Ac ∈ Cm×(n−1) (deleting a row or column respectively).
The deletion matrix then is either a Dr ∈ C(m−1)×m or Dc ∈ Cn×(n−1) . To define Dr , begin with the
identity matrix Im×m and delete the row of it you wish to delete from A; for Dc , do the same for the column
of In×n .
Then Ar = Dr A in the row case or Ac = ADc in the column case.
For example, with

Dr := \begin{pmatrix} 1 & 0 \end{pmatrix},  A := \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix},  Dc := \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}

Dr deletes the second row, and Dc the third column. (Compare each to I_{2×2} and I_{3×3}.)

§9.3: (Trefethen & Bau) Lecture 2: Orthogonality, Unitary Matrices

Basic Definitions:

ˆ Complex Conjugate: Given z := α + iβ ∈ C (α, β ∈ R), its complex conjugate is given by

z̄ := α − iβ (sometimes denoted z*)

ˆ Hermitian Conjugate/Adjoint: Given A ∈ C^{m×n}, its Hermitian conjugate or adjoint is given by

A* := (ā_{j,i})_{1≤j≤n, 1≤i≤m} ≡ \overline{A^T} ≡ \overline{A}^T ∈ C^{n×m}

for \overline{A} := (ā_{i,j})_{1≤i≤m, 1≤j≤n} the entry-wise conjugation.

ˆ Special Names for Certain Properties: Given certain equalities for a matrix A, we may call it certain names. Note that if A = A^T or A = A* or the like, then A is square.

◦ Over R: A = A^T is symmetric; A = −A^T is skew-symmetric (anti-symmetric); A^{-1} = A^T is orthogonal.
◦ Over C: A = A* is Hermitian (self-adjoint); A = −A* is skew-Hermitian (anti-Hermitian); A^{-1} = A* is unitary; AA* = A*A is normal.

ˆ (Usual) Inner Product/Norm: Given x, y ∈ C^m, we denote their inner product by

⟨x, y⟩_{C^m} := x*y ≡ Σ_{i=1}^m x̄_i y_i

and the usual (2-)norm by

∥x∥_2 ≡ ∥x∥ := √(x*x) ≡ √⟨x, x⟩_{C^m} ≡ √( Σ_{i=1}^m |x_i|² )

Finally, the angle α between x, y (w.r.t. this inner product) is given by

cos α ≡ (x*y)/(∥x∥ · ∥y∥) = ⟨x, y⟩_{C^m}/(∥x∥_2 · ∥y∥_2)

ˆ Orthogonality: We say x, y are orthogonal (x ⊥ y) if ⟨x, y⟩_{C^m} = x*y = 0.
If X := {x_i}_{i=1}^k, Y := {y_i}_{i=1}^ℓ are sets of vectors, we say X, Y are orthogonal if x_i ⊥ y_j for each i, j.
A single set X := {x_i}_{i=1}^n ⊆ C^m is said to be orthogonal if 0 ∉ X and x_i ⊥ x_j for all i ≠ j. Moreover, if each vector is of unit norm (∥x∥_2 = 1 for all x ∈ X) we say X is an orthonormal set. (Then ⟨x_i, x_j⟩ = x_i^* x_j = δ_{i,j}.)

Some Results, Theorems, or Identities of Note:

ˆ Inner Products & Sesquilinearity: The inner product is sesquilinear over C; hence, it is “half linear” in the first coordinate (scalars pulled out become their conjugates) and fully linear in the second. Thus

⟨x1 + x2, y⟩_{C^m} = ⟨x1, y⟩_{C^m} + ⟨x2, y⟩_{C^m}
⟨x, y1 + y2⟩_{C^m} = ⟨x, y1⟩_{C^m} + ⟨x, y2⟩_{C^m}
⟨αx, βy⟩_{C^m} = ᾱβ ⟨x, y⟩_{C^m}

or, in terms of ⟨x, y⟩_{C^m} ≡ x*y,

(x1 + x2)*y = x1*y + x2*y
x*(y1 + y2) = x*y1 + x*y2
(αx)*(βy) = ᾱβ x*y

ˆ The vectors of an orthogonal set (i.e. pairwise orthogonal nonzero vectors) are linearly independent. (Bau, Thm. 2.1)

◦ As a corollary of the above, if {x_i}_{i=1}^m ⊆ C^m is an orthogonal set, it is a basis of C^m.
m
ˆ Orthogonal Decompositions: Consider an orthonormal basis {q_i}_{i=1}^m of C^m. Then we may write

v = Σ_{i=1}^m ⟨q_i, v⟩_{C^m} q_i = Σ_{i=1}^m (q_i^* v) q_i = Σ_{i=1}^m (q_i q_i^*) v

(where the first and second sums are obviously equivalent). Here, the final sum is a sum of the
orthogonal projections of v onto the qi .
ˆ Unitary Matrices: (Q is unitary if QQ* = Q*Q = I.) We have that q_i^* q_j = δ_{i,j}.
We may think of it this way: if b has entries giving its expansion in the standard basis {e_i}_{i=1}^m, then Q^{-1}b = Q*b has entries that expand b in the basis of {q_i}_{i=1}^m (the columns of Q).
Unitary matrices preserve inner products, and hence norms and distances:

⟨Qx, Qy⟩_{C^m} = (Qx)*(Qy) = x*Q*Qy = x*y = ⟨x, y⟩_{C^m}
∥Qx∥_2 = ∥x∥_2

In fact, for unitary Q over R (that is, Q ∈ Rm×m being orthogonal: QQT = QT Q = I), these
transformations are strictly rotations (det Q = +1) or reflections (det Q = −1).

§9.4: (Trefethen & Bau) Lecture 2 Addendum (Useful Norm/Inner Product
Equalities)

Note that Lecture 2 did not mention that

⟨x, y⟩Cm = ⟨y, x⟩Cm

Hence we may also make use of the identities

z + z̄ = 2 · Re(z)
z − z̄ = 2i · Im(z)
z · z̄ = |z|²

ˆ Four Vectors, Four Scalars: A fuller expression is that, given α, β, γ, δ ∈ C and v, w, x, y ∈ C^m,

⟨αv + βw, γx + δy⟩_{C^m} = ⟨αv, γx⟩ + ⟨αv, δy⟩ + ⟨βw, γx⟩ + ⟨βw, δy⟩
= ᾱγ⟨v, x⟩ + ᾱδ⟨v, y⟩ + β̄γ⟨w, x⟩ + β̄δ⟨w, y⟩

or, in terms of ⟨x, y⟩_{C^m} = x*y,

(αv + βw)*(γx + δy) = ᾱγ v*x + ᾱδ v*y + β̄γ w*x + β̄δ w*y

ˆ Norms (2 Vectors, Scalars Are 1): Taking v = x, w = y, α = β = γ = δ = 1 in the above,

∥x + y∥²_2 = ⟨x + y, x + y⟩_{C^m}
= ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
= ⟨x, x⟩ + ⟨x, y⟩ + \overline{⟨x, y⟩} + ⟨y, y⟩
= ⟨x, x⟩ + 2 · Re⟨x, y⟩ + ⟨y, y⟩ (∗)
= ∥x∥²_2 + ∥y∥²_2 + 2 · Re⟨x, y⟩ (∗∗)

In terms of the x*y definition of ⟨x, y⟩, the key equalities are

∥x + y∥²_2 = (x + y)*(x + y) = x*x + 2 · Re(x*y) + y*y = ∥x∥²_2 + ∥y∥²_2 + 2 · Re(x*y)

§9.5: (Trefethen & Bau) Lecture 3: Matrix & Vector Norms

Vector Norms:
A (vector) norm is a function ∥·∥ : Cm → R satisfying the following conditions:

(i) Positive Semi-Definiteness: ∥x∥ ≥ 0 ∀x ∈ Cm , with ∥x∥ = 0 ⇐⇒ x = 0


(ii) Triangle Inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥ ∀x, y ∈ Cm

(iii) Absolute Homogeneity: ∥αx∥ = |α| · ∥x∥ ∀α ∈ C and ∀x ∈ C^m

The most important class of vector norms are the p norms. Given p ∈ [1, ∞) we define

∥x∥_p := ( Σ_{i=1}^m |x_i|^p )^{1/p},   ∥x∥_∞ := max_{1≤i≤m} |x_i|

The unit balls in (R2 , ∥·∥p ) for some p are shown below. (Desmos demo.)

A weighted norm (weighing a given norm ∥·∥) may be given by

∥x∥W := ∥W x∥

for W = diag(w1 , · · ·, wm ) with wi ̸= 0. (More generally, any W ∈ GLm (C) will do.)
Some results:

ˆ Hölder Inequality: Take 1/p + 1/q = 1 for p, q ∈ [1, ∞]. Then for x, y ∈ C^m we have

|⟨x, y⟩| = |x*y| ≤ ∥x∥_p ∥y∥_q

For p = q = 2, we have the Cauchy-Schwarz inequality:

|⟨x, y⟩| = |x*y| ≤ ∥x∥_2 ∥y∥_2

The latter holds in general for ⟨·, ·⟩V an inner product on V and ∥·∥V its induced norm:

|⟨x, y⟩V | ≤ ∥x∥V ∥y∥V

Matrix Norms:
One may give a norm to A ∈ Cm×n by envisioning it as a vector in Cmn .
One may introduce an induced matrix norm, induced by a given vector norm. Specifically, let
∥·∥(n) , ∥·∥(m) be norms on Cn , Cm , respectively. We may define the induced matrix norm ∥·∥(m,n) for
A ∈ Cm×n by
∥A∥_{(m,n)} := inf{ C ≥ 0 : ∥Ax∥_{(m)} ≤ C∥x∥_{(n)} for all x ∈ C^n }
≡ sup_{x∈C^n, x≠0} ∥Ax∥_{(m)} / ∥x∥_{(n)} ≡ sup_{x∈C^n, ∥x∥_{(n)}=1} ∥Ax∥_{(m)}
= the maximum factor by which a vector is stretched by A (w.r.t. the norms)

As an example, consider the action of A := \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} as a map R² → R² under the 1-, 2-, and ∞-norms.

Sometimes we say the norm on A is the induced p-norm if we apply the same p-norm on Ax and x,
up to the dimension of concern.
Some results:

ˆ For diagonal matrices D := diag(d1 , · · ·, dm ), ∥D∥p = max|di | ∀p ∈ [1, ∞].

ˆ The induced 1-norm is given by (for A ∈ C^{m×n}) the maximum sum of a column’s absolute values:

∥A∥_1 ≡ max_{1≤j≤n} ∥a_j∥_1 ≡ max_{1≤j≤n} Σ_{i=1}^m |a_{i,j}|

ˆ The induced ∞-norm is given by (for A ∈ C^{m×n}) the maximum sum of a row’s absolute values:

∥A∥_∞ ≡ max_{1≤i≤m} ∥a_i^*∥_1

◦ The text confusingly uses ai to denote the ith column of A, and a∗i the ith row, despite potential
confusion with the adjoint. Consider it purely notational.

ˆ Sub-Multiplicative: Take ∥·∥k a norm on Ck for k ∈ {ℓ, m, n}. Let A ∈ Cℓ×m and B ∈ Cm×n . Then
∀x ∈ Cn we have
∥ABx∥(ℓ) ≤ ∥A∥(ℓ,m) ∥Bx∥(m) ≤ ∥A∥(ℓ,m) ∥B∥(m,n) ∥x∥(n)
and hence
∥AB∥(ℓ,n) ≤ ∥A∥(ℓ,m) ∥B∥(m,n)
This need not be an equality.

Non-Induced Norms:
Not all norms on matrices are induced from those on vectors; satisfying the axioms of a norm is sufficient.
The Hilbert-Schmidt norm or Frobenius norm ∥·∥F is defined as a 2-norm on Cmn for A ∈ Cm×n :
∥A∥_F := √( Σ_{i,j} |a_{i,j}|² ) = √( Σ_{j=1}^n ∥a_j∥²_2 ) = √( Σ_{i=1}^m ∥a_i^*∥²_2 )
≡ √(trace(A*A)) = √(trace(AA*))

Note that

∥AB∥_F ≤ ∥A∥_F ∥B∥_F

Norm Preservation Under Unitary Multiplication:


A theorem gives the following result: for A ∈ Cm×n and Q ∈ Cm×m unitary, the matrix 2- and Frobenius
norms are invariant under Q in the sense

∥QA∥2 = ∥A∥2 ∥QA∥F = ∥A∥F

This holds for Q rectangular with orthonormal columns and Q ∈ Cp×m with p > m.
One may reframe this result for right multiplication, and in turn orthonormal rows rather than unitary
matrices.

§9.6: (Trefethen & Bau) Lecture 4: Singular Value Decomposition (SVD)

Assume for now that rank(A) = n for A ∈ Cm×n and m ≥ n (full rank).
The image of the unit sphere under a linear transformation A is a hyperellipse. We may use this fact to
motivate the SVD as so:

ˆ We rotate space so some directions vi align with the standard basis (WLOG, vi have unit 2-norm)

ˆ We stretch each direction by some factors σi

ˆ We rotate space again so the standard basis aligns with some new directions ui (unit 2-norm)

We say and note:

ˆ The vectors {σi ui } are the principal semiaxes of the ellipse and have lengths σi .

ˆ For rank(A) = r, r-many of σi are nonzero

ˆ If A ∈ Cm×n , with m ≥ n, at most n σi are nonzero

ˆ The {σi } are the singular values of A; we usually write σ1 ≥ σ2 ≥ · · · > 0.

ˆ The {ui } are the left singular vectors of A (we generally have min{m, n}-many of them)

ˆ The {vi } are the right singular vectors of A

ˆ We have that Avj = σj uj for each j.

If we write the vectors columnwise and form a diagonal matrix of the singular values, we have

A [v1 | ··· | vn] = [u1 | ··· | un] diag(σ1, ···, σn)

or compactly, AV = Û Σ̂ with

ˆ A ∈ Cm×n

ˆ V ∈ Cn×n with orthonormal columns (hence, it is unitary and V ∗ = V −1 )

ˆ Û ∈ Cm×n with orthonormal columns

ˆ Σ̂ ∈ Cn×n diagonal

The unitary property thus lets us write the reduced SVD of A:

A = Û Σ̂ V* = Û Σ̂ V^{-1}

(the latter form being the more intuitive geometrically).

Note that Û has n orthonormal vector-columns in Cm , and hence (unless m = n) they are not a basis.
However, if we append m − n orthonormal columns to it, Û is extended to a unitary matrix U . Doing so
requires Σ̂ to be changed, by making it square with the appending of extra (m − n) rows of only zeroes.
This yields the full SVD of A:
A = U ΣV ∗ = U ΣV −1
wherein

ˆ A ∈ Cm×n
ˆ U ∈ Cm×m and unitary
ˆ Σ ∈ Cm×n (same size as A), with singular values on the “main diagonal” (top left going down right)
and zeroes elsewhere
ˆ V ∈ Cn×n and unitary

Note that if A is rank-deficient (not full rank, i.e. r := rank(A) < min{m, n}), the factorization applies
even still. We just append m−r (not m−n) orthonormal vectors to Û instead, and append n−r orthonormal
vectors to V . Σ will have r positive entries on the diagonal, and n − r are zero.
For such rank-deficient A, the reduced SVD may be formed either with

ˆ Û ∈ Cm×n
ˆ Σ̂ ∈ Cn×n with some zeroes on the diagonal

or

ˆ Û ∈ Cm×r
ˆ Σ̂ ∈ Cr×r with no zeroes on the diagonal

§9.7: (Trefethen & Bau) Lecture 5: More on the SVD

We take
A = U ΣV ∗ for A, Σ ∈ Cm×n and U ∈ Cm×m , V ∈ Cn×n unitary
Some emergent properties:

ˆ The SVD says each matrix is diagonal, up to a change of basis. Recall: b ∈ Cm may be
expanded in terms of the basis of the columns of U , and x ∈ Cn is likewise expandable in the basis of
V ’s columns. These expansions give vectors of coefficients b′ , x′ by
b′ = U^{-1}b = U*b   x′ = V^{-1}x = V*x
and hence
Ax = b =⇒ U ∗ b = U ∗ Ax = U ∗ U ΣV ∗ x =⇒ b′ = Σx′
ˆ Comparison to eigendecomposition: Recall: For A ∈ C^{m×m}, if it has m linearly independent eigenvectors p_i attached to eigenvalues λ_i, we may write

A = PDP^{-1}  for P = [p1 | ··· | pm] and D = diag(λ1, ···, λm)

This is a conversion to and from the eigenbasis (i.e. uses only one basis), whereas the SVD uses two.
The SVD’s bases are orthonormal, whereas that is not necessarily true of the eigendecomposition.
Moreover, the SVD has the further advantage of applying to all matrices, whereas not even all square
matrices have eigendecompositions.
ˆ Properties from the SVD: Take A ∈ Cm×n , with p := min{m, n} and r the number of singular
values of A (that are nonzero). Then:

◦ rank(A) = r (Bau, Thm. 5.1)


◦ range(A) = span{u1 , · · ·, ur } (Bau, Thm. 5.2)
◦ ker(A) = span{vr+1 , · · ·, vn } (Bau, Thm. 5.2)
◦ ∥A∥2 = σ1 (Bau, Thm. 5.3)
◦ ∥A∥F = √( Σ_i σi² ) (Bau, Thm. 5.3)
◦ For λ ∈ σ(AA*) = σ(A*A) nonzero, √λ is a singular value of A (Bau, Thm. 5.4)
◦ For A = A* (Hermitian), λ ∈ σ(A) gives |λ| a singular value of A (Bau, Thm. 5.5)
◦ If A is square, then |det(A)| = ∏_i σi (Bau, Thm. 5.6)
◦ A is a sum of r rank-1 matrices: A = Σ_{j=1}^r σj uj vj* (Bau, Thm. 5.7)
◦ For 0 ≤ ν ≤ r, define the νth partial sum (Bau, Thm. 5.8)

Aν := Σ_{j=1}^ν σj uj vj*

with σ_{ν+1} := 0 for ν = p. Then we have

∥A − Aν∥_2 = inf_{B∈C^{m×n}, rank(B)≤ν} ∥A − B∥_2 = σ_{ν+1}

giving that Aν is the best 2-norm approximation to A among matrices of rank at most ν.

◦ For the Frobenius norm, we likewise have (Bau, Thm. 5.9)

∥A − Aν∥_F = inf_{B∈C^{m×n}, rank(B)≤ν} ∥A − B∥_F = √( Σ_{i=ν+1}^r σi² )

To compute the SVD of most matrices M , we can:

ˆ Find M M ∗ , M ∗ M

ˆ Find their eigenvalues λi; then σi = √λi make the singular values
ˆ Find the eigenvectors of M M ∗ ; call them ui (corresponding to λi ) and form U = [u1 | · · · | ur ].

ˆ Attach additional, arbitrary, orthonormal vectors as needed.

ˆ Form Σ = diag(σ1 , · · ·, σr ), appending zeroes as needed, with σ1 ≥ σ2 ≥ · · · ≥ 0.

ˆ To avoid another eigenvector computation, note that

M = U ΣV ∗ =⇒ U ∗ M = ΣV ∗

Compute each side of this equality, and compare columnwise to fill in V ∗ .

§9.8: (Trefethen & Bau) Lecture 6: Projectors

Some basic definitions:

ˆ A square P ∈ Cm×m is said to be a projector or idempotent if P 2 = P


ˆ The complementary projector of P is I − P
ˆ An orthogonal projector P is one with range(P ) ⊥ ker(P ). Equivalently, P ∗ = P .

Some notes:

ˆ A projector P shines a light onto range(P ), with P v the shadow v projects.


ˆ If v ∈ range(P ), then it lies on its own shadow (P v = v); else, we see P v − v ∈ ker(P ) (apply P to it):

(Notice that P v − v is parallel to the “light”, the direction of projection; it gets mapped onto the null
space that way: the null space is the direction parallel to the light.)
ˆ We note that
range(I − P ) = ker(P )
ker(I − P ) = range(P )
ker(I − P ) ∩ ker(P ) = ⟨0⟩
range(P ) ∩ ker(P ) = ⟨0⟩
Hence, a projector with domain C^m separates C^m into two subspaces; the converse holds. (Given S, T ≤ C^m with S ⊕ T = C^m, there is a projector P with range(P) = S and ker(P) = T. We say P is the projector onto S along T - the projector onto range(P) along ker(P).)
ˆ An orthogonal projector may take the form P = Q̂Q̂*, where Q̂ has orthonormal columns. (This is inspired by the SVD, but note Q̂ need not be square; it need only have orthonormal columns.)
The map v ↦ Σ_{i=1}^n (q_i q_i^*)v is an orthogonal projector onto range(Q̂). A special case arises with the rank-1 orthogonal projector isolating the component in a single direction q,

P_q := qq*/∥q∥²

and the complementary rank-(m − 1) orthogonal projector eliminating that direction,

P_{⊥q} = I − qq*/∥q∥²
∥q∥

ˆ One need not use an orthonormal basis; an orthogonal projector onto S ≤ C^m can be constructed as so. Suppose S = span{a_i}_{i=1}^n for a_i linearly independent, and let A = [a1 | ··· | an]. Then the projection P onto range(A) is given by

P = A(A*A)^{-1}A*
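A minimal NumPy sketch checking the defining properties (random spanning set):

import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 2))                 # columns span a 2-dim subspace

P = A @ np.linalg.inv(A.T @ A) @ A.T            # orthogonal projector onto range(A)
print(np.allclose(P @ P, P))                    # idempotent
print(np.allclose(P, P.T))                      # self-adjoint, hence orthogonal
print(np.allclose(P @ A, A))                    # fixes range(A)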

§9.9: (Trefethen & Bau) Lecture 7: QR Factorization & Gram-Schmidt

QR Factorization:
We define ⟨v1, ···, vn⟩ := span{v_i}_{i=1}^n.
QR factorization has the focus of orthogonalizing the range of A ∈ C^{m×n}. Specifically, we want to get {q_i}_{i=1}^n ⊆ C^m such that

ˆ ⟨q1 , · · ·, qk ⟩ = ⟨a1 , · · ·, ak ⟩ for each k ∈ {1, 2, · · ·, n}


ˆ qi ⊥ qj for i ̸= j, i.e. qi∗ qj = ⟨qi , qj ⟩Cm = δi,j (orthogonality)
ˆ ∥qi ∥2 = 1 (unit vectors / orthonormality)

The result can be framed in a few ways, then: A = Q̂R̂, written columnwise as

[a1 | ··· | an] = [q1 | ··· | qn] \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,n} \\ 0 & r_{2,2} & \cdots & r_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{n,n} \end{pmatrix}

with A ∈ C^{m×n}, Q̂ ∈ C^{m×n} having orthonormal columns, and R̂ ∈ C^{n×n} upper-triangular,
or as a set of equations,

a1 = r_{1,1} q1
a2 = r_{1,2} q1 + r_{2,2} q2
⋮
an = r_{1,n} q1 + r_{2,n} q2 + ··· + r_{n,n} qn = Σ_{i=1}^n r_{i,n} q_i

Note that A = Q̂R̂ is a reduced QR factorization; the full QR factorization A = QR is formed as


so: for m ≥ n,

ˆ Add m − n orthonormal columns to Q̂ to get Q ∈ Cm×m unitary


ˆ Add equally many all-zero rows to R̂ to get R ∈ Cm×n (upper triangular, of a sort)

A QR factorization exists in both forms for all A ∈ C^{m×n}, unique in the reduced case with the suppositions that A is of full rank and r_{j,j} > 0 (R̂ is strictly positive on the diagonal).
Note that, in the full QR factorization, the extra columns {q_j}_{j=n+1}^m are a basis of range(A)^⊥ = ker(A*).

Gram-Schmidt Orthogonalization Algorithm:
The classical, unstable algorithm constructs the ri,j and qj from the ai of A as so:
r_{i,j} = q_i^* a_j = ⟨q_i, a_j⟩_{C^m}  (i ≠ j),
r_{k,k} = ∥a_k − Σ_{i=1}^{k−1} r_{i,k} q_i∥_2,
q_k = ( a_k − Σ_{i=1}^{k−1} r_{i,k} q_i ) / r_{k,k}

(In code, this is the classical Gram-Schmidt routine sketched back in §8.12.)

L2 Space:

We can define an inner product on a space of functions by

⟨f, g⟩_{L²(Ω)} := ∫_Ω f̄ g

(conjugating the first slot, matching the conventions above).

Solving Ax = b:
Consider Ax = b where A has the QR factorization A = QR. It is trivial to show, then,

Rx = Q∗ b

Thus, to solve the problem:

(i) Get A’s QR factorization


(ii) Find y = Q∗ b

(iii) Solve Rx = y for x - trivial, since R is triangular
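A minimal sketch of this recipe (NumPy for the QR, SciPy for the triangular back-substitution):

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

Q, R = np.linalg.qr(A)
x = solve_triangular(R, Q.T @ b)        # back substitution on Rx = Q*b
print(np.allclose(x, np.linalg.solve(A, b)))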

§9.10: (Trefethen & Bau) Lecture 10: Householder Triangularization

CGS/MGS may be thought of as applying elementary triangular matrices Rk on the right of A:

A R1 ··· Rn = Q̂ =⇒ R̂ = R_n^{-1} ··· R_1^{-1}

The Householder method instead applies elementary unitary transformations Qk on A’s left:

Qn · · ·Q1 A = R =⇒ Q = Q∗1 · · ·Q∗n

That is, GS is a “triangular orthogonalization” method, with Householder an “orthogonal triangularization


method”.
The Qk take the form

Qk := \begin{pmatrix} I_{(k−1)×(k−1)} & 0 \\ 0 & F_k \end{pmatrix}  for F_k ∈ C^{(m−k+1)×(m−k+1)} unitary

We see that

F_k = I − 2 (v_k v_k^*)/(v_k^* v_k) = I − 2 (v_k v_k^*)/∥v_k∥²_2

where x_k ∈ C^{m−k+1} is taken from A’s kth column, as the entries in rows k to m (x_k := A_{k:m,k}), and then

v_k := sign(x1) · ∥x_k∥_2 · e1 + x_k  (x1 the first entry of x_k)

Note that F_k = I − 2P for a certain projection P, and reflects space about the hyperplane through the origin and perpendicular to v_k. F_k is a full-rank, unitary matrix.
In code, an implicit QR factorization of A may be constructed as in the sketch given back in §8.20.4.

It being implicit is no issue, since we may still find Q*b and Qx easily, by applying the stored reflectors in order (for Q*) or in reverse order (for Q).
The latter procedure applied to e_k for each k can reconstruct Q explicitly if need be.

§9.11: (Trefethen & Bau) Lecture 11: Least Squares Problems

Consider an overdetermined system Ax = b (A ∈ C^{m×n}, b ∈ C^m, x ∈ C^n, m > n). The goal is to minimize the norm of the residual r := b − Ax; that is, find x such that

x = argmin_x ∥b − Ax∥_2 = argmin ∥r∥_2

ˆ The satisfactory x is neatly geometrically seen as that ensuring b − Ax ⊥ range(A).
ˆ One may think of Ax, for the satisfactory x, as the orthogonal projection Pb of b onto range(A).
ˆ One may show that x is satisfactory (the norm-minimizing x) iff any of these equivalent statements hold:

b − Ax ⊥ range(A)    A*r = 0    A*Ax = A*b (the “normal equations”)    Pb = Ax

The solution is unique iff A is of full rank.


ˆ We use this as a means to define the pseudoinverse of A:

A+ := (A*A)^{-1}A*

which implies the solution to the least-squares problem is x = A+b.
As far as solving least squares problems more algorithmically:
ˆ Via Normal Equations / Cholesky Factorization:
◦ Find A∗ A and A∗ b
◦ Get the Cholesky factorization A∗ A = R∗ R (R is upper triangular)
◦ Solve R∗ w = A∗ b for w (easy, it’s lower-triangular)
◦ Solve Rx = w for x (easy, it’s upper-triangular)
 
◦ Complexity: ∼ mn² + (1/3)n³ flops
◦ Use Case: Speed
ˆ Via Reduced QR: The orthogonal projector is P = Q̂Q̂∗ . Letting Ax = P b = y, we get R̂x = Q̂∗ b.
◦ Find A = Q̂R̂
◦ Find Q̂∗ b
◦ Solve R̂x = Q̂∗ b for x (easy, it’s upper-triangular)
 
◦ Complexity: ∼ 2mn² − (2/3)n³ flops
◦ Use Case: Everyday use
◦ Implied Pseudoinverse: A+ ≡ R̂^{-1}Q̂*
ˆ Via Reduced SVD: The orthogonal projector is P = Û Û ∗ , so Ax = P b gives Σ̂V ∗ x = Û ∗ b.
◦ Find the SVD A = Û Σ̂V ∗
◦ Calculate Û ∗ b
◦ Solve the system Σ̂w = Û ∗ b for w (easy, diagonal system)
◦ Then x = V w
◦ Complexity: ∼ 2mn² + 11n³ flops


◦ Use Case: Nearly-rank-deficient matrices
◦ Implied Pseudoinverse: A+ ≡ V Σ̂−1 Û ∗
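The three recipes side by side, as a minimal NumPy/SciPy sketch (random overdetermined system; all three agree with np.linalg.lstsq):

import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

rng = np.random.default_rng(10)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

# normal equations + Cholesky
x1 = cho_solve(cho_factor(A.T @ A), A.T @ b)

# reduced QR
Q, R = np.linalg.qr(A)
x2 = solve_triangular(R, Q.T @ b)

# reduced SVD
U, s, Vh = np.linalg.svd(A, full_matrices=False)
x3 = Vh.T @ ((U.T @ b) / s)

print(np.allclose(x1, x2), np.allclose(x2, x3))
print(np.allclose(x1, np.linalg.lstsq(A, b, rcond=None)[0]))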

§9.12: (Trefethen & Bau) Lecture 12: Conditioning; Condition Numbers

ˆ Abstractly, a problem is a function f : X → Y of normed vector spaces, data and solutions respectively.
ˆ Problems are well-conditioned if small perturbations in input x give small changes in f (x), and
ill-conditioned otherwise.
ˆ Herein, δx is a small perturbation of x, and δf := f (x + δx) − f (x).
ˆ The absolute condition number κ̂ ≡ κ̂(x) of a problem at x is

κ̂ := lim_{δ→0} sup_{∥δx∥_X ≤ δ} ∥δf∥_Y / ∥δx∥_X = sup_{infinitesimal δx} ∥δf∥_Y / ∥δx∥_X

and the relative condition number κ ≡ κ(x)

κ := lim_{δ→0} sup_{∥δx∥_X ≤ δ} ( ∥δf∥_Y / ∥f(x)∥_Y ) / ( ∥δx∥_X / ∥x∥_X ) = sup_{infinitesimal δx} ( ∥δf∥_Y / ∥f(x)∥_Y ) / ( ∥δx∥_X / ∥x∥_X )

Problems for which κ, κ̂ are small are well-conditioned; if large, they are ill-conditioned.
ˆ Recall that the Jacobian J(x) of a differentiable problem f at x is the matrix

J(x) := [ ∂f_i/∂x_j ]_{i,j}

for i, j in the appropriate ranges. Then, with δf = J(x) δx (as infinitesimals), we have

κ̂ = ∥J(x)∥_{X,Y}        κ = ∥J(x)∥_{X,Y} / ( ∥f(x)∥_Y / ∥x∥_X )

in the norm induced by those on X, Y.
ˆ In the case of matrix-vector multiplication (the problem map x ↦ Ax), then

κ = ∥A∥ · ∥x∥/∥Ax∥

If A ∈ GL_m(C), then

κ ≤ ∥A∥ ∥A^{-1}∥, or some choose κ = α ∥A∥ ∥A^{-1}∥ for α := ∥x∥ / ( ∥Ax∥ ∥A^{-1}∥ )

We may replace A^{-1} with A^+ in the nonsquare-but-full-rank case. The inverse problem (b ↦ A^{-1}b) has A ↦ A^{-1} and vice versa.
ˆ A matrix has its own condition number,

κ(A) := ∥A∥ ∥A^{-1}∥

with the usual definitions for what it means for A to be ill-/well-conditioned. (If A is noninvertible, we say κ(A) = ∞.) We note that, if A ∈ C^{m×m},

∥A∥_2 = σ_1 and ∥A^{-1}∥_2 = 1/σ_m =⇒ κ(A) = σ_1/σ_m

in the 2-norm case (eccentricity of the hyperellipse).
ˆ For A ∈ C^{m×n} of full rank, with m ≥ n, then we let

κ(A) := ∥A∥ ∥A^+∥ =⇒ κ(A) = σ_1/σ_n

A quick numerical check of the 2-norm case is sketched below.
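The sketch (the nearly-singular example matrix is an arbitrary choice):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])               # nearly singular
    sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    print(sigma[0] / sigma[-1])                 # sigma_1 / sigma_m
    print(np.linalg.cond(A, 2))                 # agrees: the 2-norm kappa(A)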

§10: Set-Theoretic Identities

§10.1: More Basic Identities

ˆ Unions and intersections associate and commute:

◦ A∪B =B∪A
◦ A∩B =B∩A
◦ (A ∪ B) ∪ C = A ∪ (B ∪ C), may remove parentheses
◦ (A ∩ B) ∩ C = A ∩ (B ∩ C), may remove parentheses
ˆ Union distributes over intersection and vice versa

◦ A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
◦ A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
◦ Analogous to a · (b + c) = a · b + a · c
ˆ Empty set and universal set (U ) stuff:

◦ A∪∅=A
◦ A∩U =A
◦ A ∪ Ac = U
◦ A ∩ Ac = ∅
◦ A∪U =U
◦ A∩∅=∅
◦ ∅c = U
◦ Uc = ∅
◦ A−U =∅
◦ A−A=∅
◦ ∅−A=∅
◦ A−∅=A
◦ ∅ is identity of union, and U that of intersection

ˆ Miscellaneous basic ones:

◦ (Ac )c = A
◦ A∪A=A∩A=A
◦ A ∪ (A ∩ B) = A ∩ (A ∪ B) = A

§10.2: Some More Useful & Noteworthy Ones

A very big list is here as well, I’ll only try to focus on ones I tend to actively use here.

ˆ Definitions:

◦ Set Difference: A − B := A ∩ B c
◦ Symmetric Difference: A △ B := (A−B) ∪ (B−A) = (A∪B) − (A∩B) = {x | x is in exactly one of A, B}
◦ Cartesian Product:
For two sets, A × B := {(a, b) | a ∈ A, b ∈ B}
For finitely many sets, A_1 × · · · × A_n := {(a_i)_{i=1}^n | a_i ∈ A_i}
For infinitely many sets, ∏_{i∈I} A_i := { f : I → ⋃_{i∈I} A_i | ∀i ∈ I, f(i) ∈ A_i }
◦ Disjoint Union:
This is not the plain union of two sets; rather, it tags the elements of each set before taking the union, so that elements shared between sets stay distinct. It is sometimes called a discriminated union.
Various notations include ⊔, ∐, or a + or · inside a ∪ symbol.
For two sets, A ⊔ B := {(a, 1)}_{a∈A} ∪ {(b, 2)}_{b∈B}
In general, ⨆_{i∈I} A_i := ⋃_{i∈I} {(a, i) | a ∈ A_i for the given i}
ˆ De Morgan's laws: ( ⋃_{i∈I} A_i )^c = ⋂_{i∈I} A_i^c and ( ⋂_{i∈I} A_i )^c = ⋃_{i∈I} A_i^c

ˆ Identities around set differences:

◦ C − (A ∩ B) = (C − A) ∪ (C − B)
◦ C − (A ∪ B) = (C − A) ∩ (C − B)
◦ C − (B − A) = (A ∩ C) ∪ (C − B)
◦ (B − A) ∩ C = (B ∩ C) − A = B ∩ (C − A)
◦ (B − A) ∪ C = (B ∪ C) − (A − C)
◦ (B − A) − C = B − (A ∪ C)

ˆ Items around Cartesian products:

◦ (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D)
◦ (A × B) − (C × D) = [(A − C) × B] ∪ [A × (B − D)]
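A quick Python sanity check of a few of the identities above, on arbitrarily chosen finite sets:

    A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

    assert C - (A & B) == (C - A) | (C - B)         # difference De Morgan law
    assert (B - A) - C == B - (A | C)
    sym_diff = lambda X, Y: (X - Y) | (Y - X)
    assert sym_diff(A, B) == (A | B) - (A & B)

    AxB = {(a, b) for a in A for b in B}
    CxB = {(c, b) for c in C for b in B}
    assert AxB & CxB == {(x, y) for x in A & C for y in B}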

§10.3: Identities on Functions, Images, & Preimages

A good reference on MSE with links to proofs and such.


Throughout, assume f : X → Y , with A, Ai ⊆ X and B, Bi ⊆ Y unless stated otherwise.
We define the image and preimage by

f(A) := { f(x) ∈ Y | x ∈ A }
f^{-1}(B) := { x ∈ X | f(x) ∈ B }

Results of note:

ˆ A ⊆ f −1 (f (A)), with equality if injective

ˆ f (f −1 (B)) ⊆ B, with equality if surjective

ˆ A1 ⊆ A2 =⇒ f (A1 ) ⊆ f (A2 )

ˆ B1 ⊆ B2 =⇒ f −1 (B1 ) ⊆ f −1 (B2 )
ˆ f( ⋃_{i∈I} A_i ) = ⋃_{i∈I} f(A_i)

ˆ f(A_1 ∩ A_2) ⊆ f(A_1) ∩ f(A_2), with equality if injective (iff true on all subsets)

ˆ f( ⋂_{i∈I} A_i ) ⊆ ⋂_{i∈I} f(A_i), with equality if injective

ˆ f^{-1}( ⋂_{i∈I} B_i ) = ⋂_{i∈I} f^{-1}(B_i)

ˆ f^{-1}( ⋃_{i∈I} B_i ) = ⋃_{i∈I} f^{-1}(B_i)

ˆ f (A1 ) − f (A2 ) ⊆ f (A1 − A2 )

ˆ f −1 (B1 ) − f −1 (B2 ) = f −1 (B1 − B2 )

ˆ f (A) ∩ B = f (A ∩ f −1 (B))

ˆ f (A) ∪ B ⊇ f (A ∪ f −1 (B))

ˆ A ∩ f −1 (B) ⊆ f −1 (f (A) ∩ B)

ˆ A ∪ f −1 (B) ⊆ f −1 (f (A) ∪ B)

ˆ f (A)△f (B) ⊆ f (A△B) with equality if injective
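A finite sanity check of two of these, with a deliberately non-injective f (the dictionary encoding is just for illustration):

    f = {1: 'a', 2: 'a', 3: 'b'}                 # f(1) = f(2), not injective
    image = lambda S: {f[x] for x in S}
    preimage = lambda T: {x for x in f if f[x] in T}

    A1, A2 = {1, 3}, {2, 3}
    assert image(A1 & A2) < image(A1) & image(A2)   # strict: f not injective
    assert A1 <= preimage(image(A1))                # A ⊆ f^{-1}(f(A))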

Some of note also come from Dr. Ai. Note that we define, for a function f : E ⊆ R^n → R̄ := R ∪ {+∞, −∞}, with E measurable and a ∈ R,

{f > a} := f^{-1}( (a, ∞] ) := {x ∈ E | f(x) > a}

and analogous notions for other sets, e.g. {f < a}, {f ≥ a}, {f = a}, etc. Then:
ˆ E = {f = −∞} ∪ {f > −∞} = ( ⋂_{k=1}^∞ {f ≤ −k} ) ∪ ( ⋃_{k=1}^∞ {f > −k} )

ˆ {f ≤ a} = {f > a}^c

ˆ E = {f ≥ a} ∪ {f < a}

ˆ {f > a} = ⋃_{n=1}^∞ {f ≥ a + 1/n}

ˆ {f ≥ a} = ⋂_{n=1}^∞ {f > a − 1/n}

ˆ {f = +∞} = ⋂_{n=1}^∞ {f > n} = ⋂_{n=1}^∞ {f ≥ n}

ˆ {f = −∞} = ⋂_{n=1}^∞ {f < −n} = ⋂_{n=1}^∞ {f ≤ −n}

ˆ {a < f ≤ b} = {f > a} ∩ {f ≤ b}
Some others with additional constraints/context needed:
ˆ When a > b, {f > a} ⊆ {f > b}
ˆ If f ≥ g on all of E, then {g > a} ⊆ {f > a}
ˆ We have that {f > g} = {f − g > 0}
◦ This does not hold for possible equality, i.e. {f ≥ g} ≠ {f − g ≥ 0} in general
◦ It will hold in that case, however, if f, g are finite on E
ˆ Take a sequence of functions {f_k : E → R̄}_{k∈N} and define pointwise g(x) := sup_{k∈N} f_k(x), h(x) := inf_{k∈N} f_k(x). Then

◦ {g > a} = ⋃_{k=1}^∞ {f_k > a}

◦ {h < a} = ⋃_{k=1}^∞ {f_k < a}

(both are unions: a sup exceeds a iff some f_k does, and an inf dips below a iff some f_k does)

ˆ For a sequence {f_k : E → R̄}_{k∈N} and any function f : E → R̄, we define

{f_k → f} := { x ∈ E | f_k(x) → f(x) as k → ∞ }
{f_k ̸→ f} := ( {f_k → f} )^c

Then we have that

{f_k → f} = ⋂_{M=1}^∞ ⋃_{N=1}^∞ ⋂_{K=N}^∞ { |f_K − f| < 1/M }

{f_k ̸→ f} = ⋃_{M=1}^∞ ⋂_{N=1}^∞ ⋃_{K=N}^∞ { |f_K − f| ≥ 1/M }

§10.4: Limits of Sequences of Sets

Take {E_k}_{k∈N} a sequence of sets. We may write

lim sup_{k→∞} E_k := ⋂_{j=1}^∞ ⋃_{k=j}^∞ E_k

lim inf_{k→∞} E_k := ⋃_{j=1}^∞ ⋂_{k=j}^∞ E_k

and speak of lim_{k→∞} E_k when the above are equal. Of note:

ˆ Clearly, lim inf_{k→∞} E_k ⊆ lim sup_{k→∞} E_k.

ˆ This follows as the former consists of all points eventually in E_k (in E_k for all k ≥ some k_0); the latter is all points in infinitely-many E_k.

We say:

ˆ {E_k}_{k∈N} increases to ⋃_{k∈N} E_k if E_k ⊆ E_{k+1} (“the sets widen”). Notation: E_k ↗ ⋃_{k∈N} E_k

ˆ {E_k}_{k∈N} decreases to ⋂_{k∈N} E_k if E_k ⊇ E_{k+1} (“the sets shrink”). Notation: E_k ↘ ⋂_{k∈N} E_k

Consequently, we then have

⋃_{k=j}^∞ E_k ↘ lim sup_{k→∞} E_k        ⋂_{k=j}^∞ E_k ↗ lim inf_{k→∞} E_k

Properties of note:

ˆ ( lim sup_{k→∞} E_k )^c = lim inf_{k→∞} (E_k^c) (M&I, Prob. 1.3)

ˆ If either E_k ↗ E or E_k ↘ E holds, then lim sup_{k→∞} E_k = lim inf_{k→∞} E_k = E (M&I, Prob. 1.3)
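A tiny Python illustration (for this 2-periodic sequence any tail gives the same union and intersection, so a finite truncation computes the limits exactly):

    E = lambda k: {1, 2} if k % 2 == 0 else {2, 3}   # E_k alternates
    tail = [E(k) for k in range(10, 30)]

    limsup = set.union(*tail)            # points in infinitely many E_k
    liminf = set.intersection(*tail)     # points eventually in every E_k
    print(limsup, liminf)                # {1, 2, 3} {2}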

§10.5: Axiom of Choice (Overview)

The axiom of choice may be formulated formally as follows:

∀X [ ∅ ∉ X =⇒ ∃f : X → ⋃X such that ∀A ∈ X (f(A) ∈ A) ]

or, perhaps easier to parse,

∀X := {X_α}_{α∈A} ⊆ Ob(Set) \ {∅}, ∃f : X → ⋃_{α∈A} X_α such that ∀X ∈ X we have f(X) ∈ X

Informally, given a collection of sets X , all nonempty, there is a choice function which sends a set X to a
specific element x ∈ X.
Note that these statements are a nonissue (provable outright) for A a finite indexing set, and in some particular situations even for countable ones.
Some common equivalent statements:

ˆ Construct A Choice Set: Given a collection {Xα }α∈A of pairwise-disjoint and nonempty sets, ∃ a
set C with precisely one element from each set Xα .
ˆ Cartesian Product: Given a collection {X_α}_{α∈A} of nonempty sets, their Cartesian product ∏_{α∈A} X_α is also nonempty.
ˆ Well-Ordering Theorem: Every set can be well-ordered.

◦ A well-ordering (X, ≤) is a strict total order with the property that each nonempty S ⊆ X has a least element.
◦ To build it up: a partial order (poset) has reflexivity, antisymmetry, transitivity.
◦ A strict poset instead has irreflexivity, asymmetry, and transitivity.
◦ A strict total order has the additional property of connexity: we also have a ≤ b or b ≤ a for
each a, b in the set.
◦ Thus: a well-ordering has the properties of irreflexivity, asymmetry, transitivity, connexity, and
subsets containing their minimum elements.
ˆ Zorn’s Lemma: If a poset P has the property that all chains in P have upper bounds in P , then P
has at least one maximal element.

◦ Posets have the properties of reflexivity, antisymmetry, transitivity.


◦ A chain is just a totally ordered subset under the induced order.
◦ Total orders have the additional property of connexity (either a ≤ b or b ≤ a, always, in that
poset under ≤).
◦ This is commonly used in the proof that every vector space has a basis.

Less common equivalent statements:

ˆ Surjections:

◦ Given nonempty sets A, B, there is a surjection A → B or a surjection B → A.


◦ Given a surjective function, it has a right inverse.
ˆ Tarski's Theorem About Choice: For all A infinite, ∃ a bijection A → A × A.

ˆ Trichotomy: Given sets A, B, one of these is true: |A| = |B| or |A| < |B| or |B| < |A|.

Items often claimed as mere consequences, but which are apparently equivalent:

ˆ Every vector space has a basis

ˆ Every nontrivial unital ring has a maximal ideal.

ˆ For all nonempty sets S, we can define a binary operation ∗ on S such that (S, ∗) is a group.

ˆ The Cartesian product of connected topological spaces is connected.

ˆ The closure of a product of topological spaces is the product of the closures of the factors.

ˆ (Tychonoff's Theorem) The Cartesian product of compact topological spaces is compact

Despite the commonly-cited issues and concerns with the Axiom of Choice, per MathOverflow, some strange
results follow without it. Some highlights from the link:

ˆ A nonempty tree graph may have no leaves, yet also no infinite path. (Every finite path in the tree may be extended one more step, so paths of every finite length exist, but there is no infinite path.)
ˆ There may exist x ∈ X ⊆ R with no sequence {xn }n∈N ⊆ X with xn → x. (That is, the property that
elements of a closure are limiting values of sequences requires AC.)
ˆ You may have a function continuous in the sense of preserving sequential limits (that being xn → x =⇒ f (xn ) → f (x))
that fails the ε-δ definition.
ˆ A set S may be infinite with no countably-infinite subset. (We cannot, then, say that ℵ0 is the smallest
infinite cardinality.)
ˆ There may be an equivalence relation on R with more equivalence classes than R has elements.

ˆ There is a field without an algebraic closure, and Q can have multiple non-isomorphic closures (and
such closures may even be countable).
ˆ There can be a vector space without a basis. Moreover, a vector space may have bases β, β ′ with
|β| < |β ′ |.
ˆ R may be a countable union of countable sets. (This does not make it countable; that a countable union of countable sets is countable itself requires the axiom of countable choice.)
ˆ All sets may be measurable, in the Lebesgue sense – not that Lebesgue theory is very useful given the previous item.

§11: Set-Theoretic Relations

§11.1: Tabulated Summary of Common Relations

A brief table summarizing some relations (from Wikipedia) is below. A more thorough write up follows
throughout the rest of this section.

§11.2: Basics of an Important Visual Construction

So for clarity, let’s have a relation R on a set A, so that R ⊆ A × A. In general, we will be focusing on
finite sets and relations, but the logic extends fine - it just makes for harder-to-contend-with pictures.
Our visual is one of a directed graph. We let the elements of A be our nodes/vertices, and draw arrows
pointing between them to indicate relationship. More specifically, a points to b if and only if (a, b) ∈ R.
Throughout this post, I will write a → b to indicate “a points to b” for brevity's sake. We will generally focus on binary (homogeneous) relations over the same set A.
Some examples of this construction:

ˆ R = {(1, 1), (1, 2), (1, 3), (1, 4)} on the set A = {1, 2, 3, 4}

ˆ R = {(1, 2), (2, 3), (3, 4), (4, 1)} on the set A = {1, 2, 3, 4}

ˆ R = ∅ on the set A = {1, 2, 3, 4} (i.e. the empty-set relation, sometimes called the empty relation)

ˆ R = A × A on the set A = {1, 2, 3} (i.e. the full product taken as a relation, sometimes called the universal or complete relation)

We will now explore some basic properties of these.
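Since a relation on a finite set is just a set of ordered pairs, the properties below are easy to test mechanically. A small Python sketch (function names are my own) that can be used to check the examples as we go:

    def is_reflexive(R, A):   return all((a, a) in R for a in A)
    def is_irreflexive(R, A): return all((a, a) not in R for a in A)
    def is_symmetric(R):      return all((y, x) in R for (x, y) in R)
    def is_antisymmetric(R):  return all(x == y for (x, y) in R if (y, x) in R)
    def is_transitive(R):
        return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

    A = {1, 2, 3, 4}
    R = {(1, 2), (2, 3), (3, 4)}
    print(is_irreflexive(R, A), is_transitive(R))    # True False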

§11.3: Basic Properties

§11.3.1: Reflexive-Like Properties

§11.3.1.1: Coreflexive

ˆ Formal Definition: (∀x, y ∈ A)((x, y) ∈ R =⇒ x = y)

ˆ Informal Definition: If anything is related to anything, those things must be the same thing.

ˆ Directed Graph Analogy: Nodes can only point to themselves.

ˆ Notes/Comments:

◦ Consequently, a coreflexive relation R is a relation which is a subset of the identity relation


I = {(x, x) | x ∈ A}.
◦ Moreover, equality itself is the only relation which is both coreflexive and reflexive.
◦ Given relations C coreflexive and T transitive, the union C ∪ T is also transitive.

ˆ Visual Examples: R = {(1, 1), (2, 2), (3, 3), (4, 4)} (left); R = {(2, 2)} (right)

ˆ Visual Non-Examples: R = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 3)} (left); R = {(1, 2), (2, 3), (3, 1)}
(right)

§11.3.1.2: Irreflexivity

ˆ Formal Definition: (∀a ∈ A)((a, a) ̸∈ R)

ˆ Informal Definition: No element is related to itself.

ˆ Directed Graph Analogy: No element points to itself.

ˆ Basic Examples:

◦ Nonequality (i.e. (a, b) ∈ R ⇐⇒ a ̸= b)


◦ Coprimality on Z≥2 (i.e. (a, b) ∈ R ⇐⇒ gcd(a, b) = 1)
◦ Proper set inclusion (i.e. (A, B) ∈ R ⇐⇒ A ⫋ B)
◦ Strict greater than / strict less than on R

ˆ Visual Examples: R = {(1, 2), (2, 3), (3, 4)} (left); R = {(1, 2), (2, 1), (3, 4), (4, 3)} (right)

ˆ Visual Non-Examples: R = {(1, 1), (1, 2), (2, 3), (3, 4)} (left); R = {(1, 1), (1, 2), (1, 3), (1, 4)} (right)

§11.3.1.3: Left quasi-reflexive

ˆ Formal Definition: (∀x, y ∈ A)((x, y) ∈ R =⇒ (x, x) ∈ R)

ˆ Informal Definition: Anything related to something must be related to itself. (The informal defini-
tion might be a bit confusing for each one-sided quasi-reflexive case.)
ˆ Directed Graph Analogy: If a node points to anything, it must also point to itself.

ˆ Visual Examples: R = {(1, 1), (1, 2), (3, 3)} (left); R = {(1, 1), (2, 2), (3, 3), (1, 4), (2, 4), (3, 4)} (right)

ˆ Visual Non-Examples: R = {(2, 2), (2, 3), (1, 2), (3, 3)} (left); R = {(1, 4), (2, 4), (3, 4)} (right)

§11.3.1.4: Quasi-reflexive

ˆ Formal Definition: (∀x, y ∈ A)((x, y) ∈ R =⇒ (x, x), (y, y) ∈ R)

ˆ Informal Definition: If anything is related to anything else, both of those things are related to
themselves.
ˆ Directed Graph Analogy: Any node which has an arrow pointing to or from it needs an arrow
pointing to itself.
ˆ Notes/Comments:

◦ Hence a quasi-reflexive relation is both left quasi-reflexive and right quasi-reflexive. It is a weaker
notion of reflexivity in that the (x, x) pairs only pop up in the relation when they are actually
relating to something.
◦ An equivalent property is that R is quasi-reflexive if and only if the symmetric closure R ∪ RT is
left- or right-quasi-reflexive.
◦ A relation which is both symmetric and transitive is quasi-reflexive.
ˆ Basic Example: Let x = {x_i}_{i=1}^∞, y = {y_i}_{i=1}^∞ be sequences in a metric space, e.g. in R with the usual distance function. Define a relation R by

(x, y) ∈ R ⇐⇒ lim_{n→∞} x_n = lim_{n→∞} y_n (with both limits existing)

This relation is not necessarily reflexive. (Let x be a sequence whose limit does not exist, e.g. x_n = (−1)^n.) However, whenever (x, y) ∈ R, both limits exist, and trivially each sequence's limit equals its own, so (x, x), (y, y) ∈ R: quasi-reflexivity.
ˆ Visual Examples: R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 3)} (left); R = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 4), (2, 4), (3, 4)}
(right)

ˆ Visual Non-Examples: R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 3), (3, 4)} (left); R = {(2, 2), (3, 3), (4, 4), (1, 4), (2, 4), (3, 4)}
(right)

§11.3.1.5: Reflexivity

ˆ Formal Definition: (∀a ∈ A)((a, a) ∈ R)

ˆ Informal Definition: Every element is related to itself.

ˆ Directed Graph Analogy: Every element points to itself.

ˆ Notes/Comments:

◦ A reflexive relation cannot be irreflexive, asymmetric, or antitransitive.


◦ There are 2^{n^2 − n} reflexive relations possible on a set of n elements.
ˆ Basic Examples:

◦ Equality (i.e. (a, b) ∈ R ⇐⇒ a = b)


◦ The ”less than/greater than or equal to” relations ≥, ≤ on R or subsets thereof (i.e. (a, b) ∈ R ⇐⇒ a ≥ b,
or (a, b) ∈ R ⇐⇒ a ≤ b)
◦ Set inclusion where equality is permissible (i.e. (A, B) ∈ R ⇐⇒ A ⊆ B)
◦ Divisibility on Z (i.e. (a, b) ∈ R ⇐⇒ a | b)

ˆ Visual Examples: R = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (4, 4)} (left); R = {(1, 1), (2, 2), (3, 3), (3, 2), (4, 4)}
(right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 2), (2, 3), (3, 3), (4, 4)} (left); R = {(2, 2), (3, 3), (3, 2)} (right)

§11.3.1.6: Right quasi-reflexive

ˆ Formal Definition: (∀x, y ∈ A)((x, y) ∈ R =⇒ (y, y) ∈ R)


ˆ Informal Definition: Anything that something is related to must be related to itself. (The informal definition might be a bit confusing for each one-sided quasi-reflexive case.)
ˆ Directed Graph Analogy: If a node is pointed to by anything, then that node also needs to point
to itself.
ˆ Visual Examples: R = {(1, 2), (2, 2), (2, 3), (3, 3), (4, 4)} (left); R = {(2, 2), (3, 3), (3, 2)} (right)

ˆ Visual Non-Examples: R = {(1, 1), (1, 2), (2, 3), (3, 3)} (left); R = {(1, 2), (3, 3), (3, 2), (4, 2)} (right)

§11.3.2: Symmetry-Like Properties

§11.3.2.1: Antisymmetry

ˆ Formal Definition: (∀x, y ∈ A)((x, y), (y, x) ∈ R =⇒ x = y)

ˆ Informal Definition: For distinct elements, relation is a one-way street. Two elements cannot be
related to each other unless they are the same element.
ˆ Directed Graph Analogy: There are no closed loops between pairs of distinct nodes.

ˆ Notes/Comments:

◦ Despite the naming, relations can be both symmetric & antisymmetric. Such relations happen to
also be coreflexive and thus a subset of the identity relation.
◦ Hence we can look at relations which are symmetric, antisymmetric, both, or neither.
◦ Note that this is not the same as asymmetry. In fact a relation is asymmetric iff it is antisymmetric
and irreflexive.

ˆ Basic Examples:

◦ Divisibility on Z+ (i.e. (a, b) ∈ R ⇐⇒ a | b). Notice that in this context, a | b =⇒ b ≥ a. Hence a | b and b | a iff a = b.
◦ The relations of “less/greater than or equal to”, ≤ and ≥, on R and its subsets (i.e. (a, b) ∈ R ⇐⇒ a ≤ b, or (a, b) ∈ R ⇐⇒ a ≥ b).
◦ Set inclusion (i.e. (A, B) ∈ R ⇐⇒ A ⊆ B).
ˆ Visual Examples: R = {(1, 2), (2, 4), (4, 1), (4, 3)} (left); R = {(1, 1), (1, 2), (1, 3), (1, 4)} (right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 1), (3, 3), (4, 4), (3, 4)} (left); R = {(1, 3), (3, 1), (3, 4), (4, 3), (4, 1)}
(right)

§11.3.2.2: Asymmetry

ˆ Formal Definition: (∀x, y ∈ A)((x, y) ∈ R ⇐⇒ (y, x) ̸∈ R)

ˆ Informal Definition: Relation is solely a one-way street; elements cannot be related to each other.
However unlike antisymmetry, this now forbids elements relating to themselves.
ˆ Directed Graph Analogy: Pairs of distinct nodes do not have loops between them. Nodes do not
point to themselves.
ˆ Notes/Comments:

◦ A relation is asymmetric iff it is antisymmetric and irreflexive.


◦ One may frame the definition as being “∀a, b ∈ A, at least one of (a, b) or (b, a) is not in R”
◦ If a relation is transitive, it is asymmetric iff it is irreflexive (and hence a strict partial order).
◦ The only symmetric and asymmetric relation is R = ∅.
◦ If R is asymmetric, then so is its converse RT .

ˆ Basic Examples:

◦ Strict less/greater than (< or >) on R or its subsets (i.e. (a, b) ∈ R ⇐⇒ a < b (or a > b)).
◦ Strict set inclusion (i.e. (A, B) ∈ R ⇐⇒ A ⫋ B)
◦ Divisibility on positive integers disallowing equality (i.e. (a, b) ∈ R ⇐⇒ a | b ∧ b ̸= a).

ˆ Visual Examples: R = {(1, 2), (2, 4), (4, 1), (4, 3)} (left); R = {(1, 2), (1, 3), (1, 4)} (right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 1), (3, 3), (4, 4), (3, 4)} (left); R = {(1, 3), (3, 1), (3, 4), (4, 3), (4, 1), (1, 1)}
(right)

§11.3.2.3: Symmetry

ˆ Formal Definition:

◦ (∀x, y ∈ A)((x, y) ∈ R =⇒ (y, x) ∈ R)


◦ Equivalently, we may use (∀x, y ∈ A)((x, y) ∈ R ⇐⇒ (y, x) ∈ R)

ˆ Informal Definition: If an element x is related to another y, the reverse is also true.

ˆ Directed Graph Analogy: There are no ”stray/lone” edges in the graph where an element solely
points to another. There will always be just closed loops between any two distinct elements (or no
connections at all).
ˆ Notes/Comments:

◦ Framed in terms of the converse relation, R is symmetric iff R = RT .


◦ Of course, then, the symmetric closure R ∪ RT is always symmetric.
◦ There are 2^{n(n+1)/2} symmetric relations possible on a set of n elements.

ˆ Basic Examples:

◦ Equality itself (i.e. (a, b) ∈ R ⇐⇒ a = b)


◦ Many relations derive their symmetry from a use of equality. For instance, on the set of functions
with domain and codomain R, let (f, g) ∈ R ⇐⇒ f (0) = g(0). This is a symmetric relation.
◦ Others attain it from commutative operations. For instance, in C, define R by (a, b) ∈ R ⇐⇒ a+b = 0.
Well, since b + a = 0 ⇐⇒ a + b = 0, symmetry arises.
◦ Partitioning by parity in Z+ (i.e. (a, b) ∈ R ⇐⇒ a, b are both odd or both even)

ˆ Visual Examples: R = {(1, 2), (2, 1), (2, 4), (4, 2), (3, 4), (4, 3)} (left); R = {(1, 4), (4, 1), (2, 2), (3, 3)}
(right)

ˆ Visual Non-Examples: R = {(1, 3), (3, 4), (4, 2), (2, 1), (2, 3), (1, 4)} (left); R = {(1, 3), (3, 4), (4, 2), (2, 1), (1, 4), (4, 1)}
(right)

§11.3.3: Transitivity-Like Properties

§11.3.3.1: Antitransitive

ˆ Formal Definition: (∀a, b, c ∈ A)((a, b), (b, c) ∈ R =⇒ (a, c) ̸∈ R)

ˆ Informal Definition: Transitivity is never satisfied.

ˆ Directed Graph Analogy: Whenever two sides of a triangle are formed, the third side that transi-
tivity would dictate is not present.
ˆ Notes/Comments:

◦ Equivalent definitions include the following:

(∀a, b, c ∈ A)((a, b), (a, c) ∈ R =⇒ (b, c) ̸∈ R)


(∀a, b, c ∈ A)((a, c), (b, c) ∈ R =⇒ (a, b) ̸∈ R)

◦ Notice that taking a = b = c in the definition, an antitransitive relation is consequently irreflexive.


◦ If R is on a set A with |A| ≥ 4, and if R is antitransitive, then it is never connected.
◦ A relation which is irreflexive and left- or right-unique is antitransitive.

ˆ Visual Examples: R = {(1, 3), (2, 1), (4, 2)} (left); R = {(1, 2), (2, 1), (1, 3), (3, 4), (2, 4)} (right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 4), (4, 1)} (left); R = {(1, 1), (2, 2), (1, 2), (2, 4), (1, 3)} (right)

§11.3.3.2: Cotransitive

ˆ Formal Definition: (∀x, y, z ∈ A)((x, z) ∈ R =⇒ (x, y) ∈ R ∨ (y, z) ∈ R)

ˆ Informal Definition: If two elements are related, on any intermediate related pair, at least one of
those pairs will also be related.
ˆ Directed Graph Analogy: Provided x → z, then for any y ∈ A, either x → y or y → z (or both).

ˆ Notes/Comments:

◦ One may frame the definition as being that ”R is cotransitive iff its complement Rc is transitive.”
(This is the usual set-theoretic complement.)
◦ Cotransitive relations are also quasi-transitive.
◦ A cotransitive relation is connected iff it is irreflexive.
◦ A cotransitive relation may be transitive. Sufficient conditions include being left-Euclidean, right-
Euclidean, or antisymmetric.

ˆ Example 1: Treat (1, 4) as a sort of ”root” in making the relation. Then for any y ∈ A := {1, 2, 3, 4},
we have to add either (1, y) or (y, 4). Always choosing the former, we get R = {(1, 2), (1, 3), (1, 4)}.

ˆ Example 2: Suppose we start with (1, 1). For any y ∈ A, we need to add (1, y) or (y, 1). Adding the latter always, we get the points (2, 1), (3, 1), (4, 1).

ˆ Visual Non-Examples: Remove any of the newly-added arrows from the above relations to get a
nonexample:

§11.3.3.3: Intransitive

ˆ Formal Definition: (∃a, b, c ∈ A)((a, b), (b, c) ∈ R ∧ (a, c) ̸∈ R)

ˆ Informal Definition: There exists some trio of elements where transitivity does not hold.

ˆ Directed Graph Analogy: You can find two sides of a triangle (a would-be 3-cycle) which are not closed off by the third.
ˆ Notes/Comments:

◦ This essentially amounts to ”is not transitive.”


◦ This is a weaker version of antitransitivity.

ˆ Visual Examples: R = {(1, 2), (2, 4)} (left); R = {(3, 4), (4, 2)} (right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 4), (1, 4)} (left); R = {(3, 4), (4, 2), (3, 2)} (right)

§11.3.3.4: Left Euclidean

ˆ Formal Definition: (∀x, y, z ∈ A)((x, y), (x, z) ∈ R =⇒ (y, z) ∈ R)

ˆ Informal Definition: If an element is related to multiple things, those things are related to each
other too.
ˆ Directed Graph Analogy: If a node points to multiple others, then those nodes need to point to
each other. (If you know a bit of graph theory, think of the neighborhood of a vertex too.)
ˆ Notes/Comments:

◦ Notice, (x, y), (x, z) ∈ R for R left-Euclidean (as defined here) gives both (y, z), (z, y) ∈ R, just by swapping the roles of y and z.
◦ A right- or left-Euclidean reflexive relation is symmetric. Thus by the previous, it is transitive
and thus an equivalence relation.
◦ Right- and left-Euclidean relations are quasi-transitive.
◦ A connected, right- or left-Euclidean relation on a set of cardinality at least 3 is never antisym-
metric.
◦ Left-Euclidean relations are left-unique iff they are antisymmetric. Such relations are also transi-
tive, by vacuous logic.
◦ Left-Euclidean relations are left quasi-reflexive. A relation is left quasi-reflexive iff it is both
left-Euclidean and left-unique.

ˆ Visual Examples: R = {(1, 2), (1, 4), (2, 4), (4, 2)} (left); R = {(1, 2), (1, 3), (1, 4), (2, 3), (3, 2), (2, 4), (4, 2), (3, 4), (4, 3)}
(right). In these, focus on the nodes that 1 is pointing to.

ˆ Visual Non-Examples: R = {(1, 1), (1, 2), (1, 4), (2, 4)} (left); R = {(1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}

§11.3.3.5: Quasi-transitive

ˆ Formal Definition: (∀a, b, c ∈ A)((a, b), (b, c) ∈ R ∧ (b, a), (c, b) ∉ R =⇒ (a, c) ∈ R ∧ (c, a) ∉ R)

ˆ Informal Definition: Hard to phrase cleanly; essentially, the asymmetric (“strict”) part of R is transitive.

ˆ Directed Graph Analogy: Whenever two sides of a triangle are formed, the third side is closed off.
Moreover, there are not arrows going in the reverse direction for any arrow of that triangle. (This
latter condition ensures that quasitransitive relations are transitive but not necessarily the reverse.)
ˆ Notes/Comments:

◦ Quasi-transitive antisymmetric relations are also transitive.


◦ R is quasi-transitive iff there are relations S symmetric and T transitive whereby S ∩ T = ∅ and
R = S ∪ T . (S, T may not be unique.)
◦ Symmetric relations are quasitransitive.
◦ Transitive relations are quasitransitive.
◦ A relation R is quasi-transitive iff its set-theoretic complement Rc is.
◦ A relation R is quasi-transitive iff its converse RT is.

ˆ Visual Examples: R = {(1, 2), (1, 4), (2, 4)} (left); R = {(1, 2), (1, 3), (2, 3), (4, 2), (4, 3)} (right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 4)} (left); R = {(1, 2), (2, 1), (1, 4), (4, 1), (2, 4), (4, 2)} (right)

§11.3.3.6: Right Euclidean

ˆ Formal Definition: (∀x, y, z ∈ A)((y, x), (z, x) ∈ R =⇒ (y, z) ∈ R)

ˆ Informal Definition: If multiple things are related to the same thing, those things are related to
each other too.
ˆ Directed Graph Analogy: If multiple nodes point to the same one, they need to point to each other
too.
ˆ Notes/Comments:

◦ Notice, (y, x), (z, x) ∈ R for R right-Euclidean (as defined here) implies (y, z), (z, y) ∈ R too, just by swapping the roles of y and z.
◦ While similar to transitivity, it is not the same: ≤ on R is transitive, yet not right-Euclidean, for instance. However, a connected right-Euclidean relation is transitive.
◦ If R is symmetric, then being any one of transitive, right-Euclidean, or left-Euclidean implies the other two.
◦ A right- or left-Euclidean reflexive relation is symmetric. Thus by the previous, it is transitive
and thus an equivalence relation.
◦ Right- and left-Euclidean relations are quasi-transitive.
◦ A connected, right- or left-Euclidean relation on a set of cardinality at least 3 is never antisym-
metric.
◦ Right-Euclidean relations are right-unique iff they are antisymmetric. Such relations are also
transitive, by vacuous logic.
◦ Right-Euclidean relations are right quasi-reflexive. A relation is right quasi-reflexive iff it is both
right-Euclidean and right-unique.

ˆ Visual Examples: R = {(2, 2), (1, 4), (4, 1), (1, 3), (4, 3)} (left); R = {(1, 2), (2, 1), (1, 3), (1, 4), (2, 3), (2, 4)}
(right)

ˆ Visual Non-Examples: R = {(2, 4), (3, 4)} (left); R = {(2, 1), (3, 1), (4, 1), (2, 4), (3, 4)} (right)

§11.3.3.7: Transitivity

ˆ Formal Definition: (∀a, b, c ∈ A)((a, b), (b, c) ∈ R =⇒ (a, c) ∈ R)

ˆ Informal Definition: If a is related to b and b is related to c, then a is related to c.

ˆ Directed Graph Analogy: For three distinct nodes, if two sides of a triangle are formed, then the
third side needs to be closed off. Similar arguments arise for indistinct nodes chosen.
ˆ Notes/Comments:

◦ Unlike symmetry and reflexivity, there is not yet a nice closed form for the number of transitive
relations on a set of n elements. Some more can be read on the OEIS here.
◦ If R is transitive, so is its converse relation R^T.
◦ The intersection of transitive relations is transitive, but not necessarily their unions or comple-
ments.
◦ A transitive relation is asymmetric iff it is irreflexive.

ˆ Basic Examples:

◦ Equality, or the less/greater than comparisons (with or without equality) in R. So you may define
a relation R by (a, b) ∈ R ⇐⇒ a = b (or a < b, or a ≤ b, or a > b, or a ≥ b) and get a transitive
relation.
◦ Divisibility (i.e. (a, b) ∈ R ⇐⇒ a | b)
◦ Set inclusion, with or without inequality (i.e. (A, B) ∈ R ⇐⇒ A ⫋ B (or A ⊆ B))

ˆ Visual Examples: R = {(1, 2), (3, 1), (3, 2)} (left); R = {(1, 1), (1, 2), (2, 4), (4, 3), (3, 1), (1, 4), (4, 1)}
(right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 4), (4, 3), (3, 1)} (left); R = { (x, y) ∈ {1, 2, 3, 4}^2 | x ≠ y } (right), the latter failing since e.g. (1, 2), (2, 1) ∈ R but (1, 1) ∉ R.

§11.3.4: Comparability Properties

§11.3.4.1: Connectedness

ˆ Formal Definition: (∀x, y ∈ A)((x, y) ∈ R ∨ (y, x) ∈ R ∨ x = y)

ˆ Informal Definition: For any pair of distinct elements, we know one is always related to the other.

ˆ Directed Graph Analogy: Between any pair of distinct nodes, there is an arrow in at least one direction.
ˆ Notes/Comments:

◦ A relation where (∀x, y ∈ A)((x, y) ∈ R ∨ (y, x) ∈ R) is sometimes called ”strongly connected.”


◦ R is strongly connected iff A × A ⊆ R ∪ R^T iff R^c ⊆ R^T iff R^c is asymmetric. (R^T denotes the converse relation and R^c is the usual set-theoretic complement.)
◦ R is connected iff I^c ⊆ R ∪ R^T iff R^c ⊆ R^T ∪ I iff R^c is antisymmetric. (Here I is the identity relation on A, I := {(a, a) | a ∈ A}.)
◦ If R is a strongly-connected symmetric relation on A, then R = A × A.
◦ A relation is strongly connected iff it is connected and reflexive.
◦ Connected relations on A of cardinality at least 4 are never antitransitive.
◦ If R is connected on A, then the range of R, range(R), is A, or A minus one element. The same
is true of the domain, domain(R).

ˆ Visual Examples: R = {(1, 2), (1, 3), (1, 4), (3, 2), (3, 4), (4, 2)} (left); R = {(1, 1), (2, 1), (1, 4), (1, 3), (3, 1), (3, 2), (3, 4),
(right)

ˆ Visual Non-Examples: R = {(1, 2), (1, 3), (1, 4), (2, 2), (3, 3), (4, 4)} (left); R = {(1, 4), (4, 3), (4, 2), (3, 2)}
(right)

§11.3.4.2: Converse Well-Founded

ˆ Definition: R is converse well-founded iff the converse relation R^T is well-founded.

ˆ Directed Graph Analogy: Each nonempty subgraph has a node out of which nothing points (within that subgraph).

ˆ Example 1: Letting (a, b) ∈ R if a > b on {1, 2, 3, 4} works. It gives the below graph. The maximum
element is the node of concern in a given subset.

ˆ Example 2: R = {(5, 2), (5, 3), (6, 3), (6, 4), (7, 4), (8, 4), (2, 1), (3, 1), (4, 1)}

ˆ Visual Non-Examples: R = {(1, 3), (3, 4), (4, 2), (2, 1)} (left); R = {(1, 3), (3, 1), (4, 2), (2, 4)} (right)

§11.3.4.3: Trichotomous

ˆ Formal Definition:
 
(∀x, y ∈ X) (x, y) ∈ R ∧ (y, x) ̸∈ R ∧ x ̸= y
 
∨ (x, y) ̸∈ R ∧ (y, x) ∈ R ∧ x ̸= y
 
∨ (x, y) ̸∈ R ∧ (y, x) ̸∈ R ∧ x = y

ˆ Informal Definition: For any x, y, one and only one of the statements xRy, yRx, and x = y may
hold.
ˆ Directed Graph Analogy: No node points to itself, and between any two distinct nodes there is exactly one arrow.

ˆ Notes/Comments:

◦ A relation is trichotomous iff it is asymmetric and connected.


◦ A trichotomous, transitive relation is also a strict total order.

ˆ Visual Examples: R = {(1, 2), (1, 3), (1, 4), (3, 2), (4, 3), (4, 2)} (left); R = {(4, 2), (4, 1), (3, 4), (3, 2), (3, 1), (4, 2), (2, 1)}

ˆ Visual Non-Examples: R = {(1, 2), (2, 4), (3, 4), (4, 3), (3, 2), (1, 4), (1, 3), (3, 1)} (left); R = {(2, 1), (1, 3), (3, 4), (4, 2), (
(right)

§11.3.4.4: Well-Founded

ˆ Formal Definition: (∀ nonempty S ⊆ A)(∃s ∈ S)(∀a ∈ S)((a, s) ̸∈ R)

ˆ Informal Definition: Any subset has an element in it to which no element is related.

ˆ Directed Graph Analogy: Any subgraph has a node to which no arrow points.

ˆ Notes/Comments:

◦ Any well-founded relation is irreflexive.


◦ Assuming the axiom of dependent choice, R is well-founded if and only if there are no infinite descending chains: sequences {x_i}_{i=1}^∞ in A where (x_{n+1}, x_n) ∈ R for each n ∈ N.

ˆ Example 1: For the set {1, 2, 3, 4, 5, 6}, we let (a, b) ∈ R iff a < b with respect to the usual ordering
on R. The graph below results. In this case the relevant node is the minimum of any subset taken.

ˆ Example 2: Any finite, directed, acyclic graph corresponds to a well-founded relation. For instance,
as below, consider R = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}.

ˆ Visual Non-Examples: R = {(1, 2), (2, 4), (4, 1)} (left) due to a cycle; R = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (1, 3), (1, 4)
(right) due to reflexivity

§11.3.5: Function-Like Properties

§11.3.5.1: Bijectivity

ˆ Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
ˆ Definition: A relation which is both injective and surjective (and, to match the usual notion of a bijective function, also functional and left-total). This necessitates that X, Y have equal cardinality.
ˆ Directed Graph Analogy: For each node, there is exactly one arrow leaving and entering it.

ˆ Visual Examples: R = {(1, 2), (2, 4), (4, 3), (3, 1)} (left); R = {(1, 4), (4, 2), (2, 1), (3, 3)} (right)

ˆ Visual Non-Examples: R = {(1, 2), (1, 3), (1, 4)} (left); R = {(2, 1), (3, 4), (1, 3), (3, 1)} (right)

§11.3.5.2: Functional

ˆ Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
ˆ Formal Definition: (∀x ∈ X)(∀y, z ∈ Y )((x, y), (x, z) ∈ R =⇒ y = z)

ˆ Informal Definition: Any element can be related to at most one other element.

ˆ Directed Graph Analogy: For each node, there is at most one arrow leaving it.

ˆ Notes/Comments: This embodies the definition of a ”partial function”, in that a partial function
maps anything to at most one other value. An ordinary or total function is one where each element in
the domain maps to something, exactly one such ”something.” That additional property is known as
left-totality.
ˆ Visual Examples: R = {(1, 2), (2, 3)} (left); R = {(3, 1), (1, 2), (2, 4), (4, 4)} (right)

ˆ Visual Non-Examples: R = {(1, 2), (1, 3), (1, 4)} (left); R = {(1, 2), (2, 1), (1, 1), (2, 2), (3, 3), (4, 4)}
(right)

§11.3.5.3: Injectivity

ˆ Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
ˆ Formal Definition: (∀x, z ∈ X)(∀y ∈ Y )((x, y), (z, y) ∈ R =⇒ x = z)

ˆ Informal Definition: If an element is related to something else, it can be related to at most one such
thing.
ˆ Directed Graph Analogy: Every node has at most one arrow pointing to it.

ˆ Visual Examples: R = {(3, 3), (3, 1), (1, 2), (2, 4)} (left); R = {(1, 2), (3, 4)} (right)

ˆ Visual Non-Examples: R = {(1, 4), (2, 4), (3, 4)} (left); R = {(1, 2), (1, 4), (3, 2), (3, 4)} (right)

§11.3.5.4: (Left-)Totality / Seriality

ˆ Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
ˆ Formal Definition: (∀x ∈ X)(∃y ∈ Y )((x, y) ∈ R)

ˆ Informal Definition: Every element is related to something.

ˆ Directed Graph Analogy: Every element points to something, i.e. every node has an arrow leaving
it.
ˆ Notes/Comments:

◦ If R is serial with RT its transpose, then I ⊆ RT ◦ R (for I the identity relation).


◦ Note that R is serial iff RT is surjective.

ˆ Visual Examples: R = {(1, 2), (2, 3), (3, 4), (4, 2)} (left); R = {(1, 2), (1, 3), (1, 4), (2, 2), (3, 3), (4, 4)}
(right)

ˆ Visual Non-Examples: R = {(1, 2), (2, 4), (4, 1)} (left); R = {(1, 2), (1, 3), (1, 4)} (right)

§11.3.5.5: Surjectivity

ˆ Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
ˆ Formal Definition: (∀y ∈ Y )(∃x ∈ X)((x, y) ∈ R)

ˆ Informal Definition: Every element in Y has something in X related to it.

ˆ Directed Graph Analogy: Every node has at least one arrow pointing to it.

ˆ Notes/Comments: If R is surjective with RT its transpose, then I ⊆ R ◦ RT (for I the identity


relation).
ˆ Visual Examples: R = {(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)} (left) ; R = {(1, 1), (1, 2), (2, 4), (4, 3)}
(right)

ˆ Visual Non-Examples: R = {(1, 4), (2, 4), (3, 4)} (left); R = {(1, 2), (1, 3), (3, 2)} (right)

§11.4: Combinations of Properties

§11.4.1: Dense Posets

ˆ Definition: R is a dense poset if it has the following properties:

◦ R is a poset
◦ (∀x, y ∈ A)(x < y =⇒ (∃z ∈ A)(x < z ∧ z < y)), writing x < y for (x, y) ∈ R with x ≠ y

§11.4.2: Dependencies

ˆ Definition: R is a dependency if it has the following properties:

◦ R is reflexive
◦ R is symmetric
◦ R is finite

§11.4.3: Equivalence / Equivalence Relations

ˆ Definition: R is an equivalence relation if it has the following properties:

◦ Reflexivity
◦ Symmetry
◦ Transitivity

ˆ Notes/Comments:

◦ Hence any equivalence relation is both a partial equivalence relation (which becomes an equivalence upon adding reflexivity) and a preorder (which becomes an equivalence upon adding symmetry).
◦ There is a one-to-one correspondence with the equivalence relations on a set, and the partitions
of the set.
◦ There is no nice closed form for the number of equivalence relations on a set of n elements. The closest one may get is

∑_{k=0}^n S(n, k)

where S(n, k) denotes Stirling numbers of the second kind (this sum is the Bell number B_n). You can find more details on the OEIS here.

ˆ Equivalence Classes:

◦ Recall that an equivalence class is the set of all elements that are related to each other. Hence,
an equivalence class of an element x can be visualized as all of the nodes y from which you can
walk from x to y and back again along the same path.
◦ Example 1: Our equivalence classes are {1, 2}, {3}, {4}.

◦ Example 2: Our equivalence classes are {1, 3}, {2, 4}.

◦ Example 3: Our sole equivalence class is {1, 2, 3, 4}.
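A Python sketch of extracting the classes mechanically (assuming R really is an equivalence relation on A; this reproduces Example 1 above):

    def equivalence_classes(R, A):
        classes = []
        for a in A:
            cls = {b for b in A if (a, b) in R}
            if cls not in classes:
                classes.append(cls)
        return classes

    R = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 1)}
    print(equivalence_classes(R, {1, 2, 3, 4}))      # [{1, 2}, {3}, {4}]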

§11.4.4: Partial Equivalence Relation

ˆ Definition: R is a partial equivalence relation if it has the following properties:

◦ Symmetry
◦ Transitivity

ˆ Notes/Comments:

◦ Any partial equivalence relation is right Euclidean and left Euclidean. (The converse need not be
true.) Consequently, they are also quasi-reflexive.

§11.4.5: Partial Orders / Posets

ˆ Definition: R is a partial order (poset) if it has the following properties:

◦ R is reflexive
◦ R is antisymmetric
◦ R is transitive

ˆ Notes/Comments:

◦ There is no nice closed form for the number of posets on a set of n elements. You can find more
details on the OEIS here.

§11.4.6: Preorders

ˆ Definition: R is a preorder if it has the following properties:

◦ Reflexivity
◦ Transitivity

ˆ Notes/Comments:

◦ There is no nice closed form for the number of preorders on a set of n elements. You can find
more details on the OEIS here.

§11.4.7: Prewellorders

ˆ Definition: R is a prewellorder if it has the following properties:

◦ R is a preorder
◦ R is well-founded, in the sense that the relation S given by (x, y) ∈ S ⇐⇒ (x, y) ∈ R ∧ (y, x) ̸∈ R
is well-founded.

§11.4.8: Pseudo-Orders

ˆ Definition: R is a pseudo-order if it has the following properties:

◦ R is asymmetric
◦ R is cotransitive
◦ If (x, y), (y, x) ̸∈ R then x = y

§11.4.9: Strict Partial Orders

ˆ Definition: R is a strict partial order if it has the following properties:

◦ R is irreflexive
◦ R is antisymmetric
◦ R is transitive

ˆ Notes/Comments:

◦ Notice that strict partial orders differ from ordinary posets in the first axiom: usual posets have reflexivity, whereas strict ones have irreflexivity. You can think of strict partial orders as embodying the behavior of < whereas posets embody ≤ (though without the connectedness axiom those relations satisfy on the reals).

§11.4.10: Strict Total Order

ˆ Definition: R is a strict total order if it has the following properties:

◦ R is irreflexive
◦ R is antisymmetric
◦ R is transitive
◦ R is connected (strong connectedness would force reflexivity, contradicting the first axiom)

ˆ Notes/Comments:

◦ Notice that strict total orders differ from ordinary total orders in the first axiom: ordinary total
orders have reflexivity, whereas strict ones have irreflexivity. You can think of strict total orders
embodying the behavior of < whereas ordinary total orders have ≤.

§11.4.11: Total Orders

ˆ Definition: R is a total order if it has the following properties:

◦ R is reflexive
◦ R is antisymmetric
◦ R is transitive
◦ R is strongly connected

ˆ Notes/Comments:

◦ The number of total orders on a set of n elements is precisely n!.

§11.4.12: Total Preorders

ˆ Definition: R is a total preorder if it has the following properties:

◦ R is reflexive
◦ R is transitive
◦ R is strongly connected

ˆ Notes/Comments:

◦ There is no nice closed form for the number of total preorders on a set of n elements. The closest one may get is

∑_{k=0}^n k! · S(n, k)

where S(n, k) denotes the Stirling numbers of the second kind (this sum is the ordered Bell number). You can find more details on the OEIS here.

§11.4.13: Tournaments

ˆ Definition: R is a tournament if it has the following properties:

◦ R is irreflexive
◦ R is antisymmetric

ˆ Notes/Comments:

◦ Typically, this refers to a type of graph (namely simple and directed).

§11.4.14: Well-order

ˆ Definition: R is a well-ordering if it has the following properties:

◦ R is a total order
◦ Any given subset of our original set has a least element w.r.t. this ordering. That is, if R is over
A,
(∀ nonempty S ⊆ A)(∃m ∈ S)(∀s ∈ S)((m, s) ∈ R)

§11.5: Basic Operations & Derived Relations

§11.5.1: Property Closure

ˆ Definition: The closure of a relation R (with respect to a certain property or collection thereof) is
the smallest relation R′ such that R′ satisfies that property and R ⊆ R′ .
For instance:

◦ The reflexive closure is the smallest reflexive relation containing R.


◦ The symmetric closure is the smallest symmetric relation containing R.
◦ The reflexive, symmetric, and transitive closure is the smallest equivalence relation containing R.

... and so on and so forth.


ˆ Reflexive & Symmetric Closures: Owing to their simplicity, we may readily define the reflexive
and symmetric closures Rref and Rsym respectively (for R on A) by

Rref = R ∪ {(x, x) | x ∈ A} Rsym = R ∪ {(y, x) | (x, y) ∈ R}

ˆ Existence & Minimality: For a property P and relation R, the P-closure of R need not always exist. For the cases of reflexivity, transitivity, and symmetry, it does, because such relations (interpreted as sets) are closed under arbitrary intersection. Hence we may codify “smallest” in the following sense: the P-closure R_P of R ⊆ A × A is

R_P = ⋂ { S | R ⊆ S ⊆ A × A and S satisfies P }

ˆ Notation: Sometimes, to borrow notation from topology, the P -closure may be denoted clP (R). For
instance, clref (R). A notation used for the reflexive closure is sometimes R= .
ˆ Visual: In terms of our graph visualization, you will want to begin with the given relation, and modify
the graph by adding arrows (adding new pairs to the relation) until you have a relation with the desired
property.
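A Python sketch of the three simplest closures (the transitive closure computed by iterating composition to a fixed point; function names are my own):

    def reflexive_closure(R, A):
        return R | {(x, x) for x in A}

    def symmetric_closure(R):
        return R | {(y, x) for (x, y) in R}

    def transitive_closure(R):
        S = set(R)
        while True:                 # add implied pairs until nothing changes
            new = S | {(x, w) for (x, y) in S for (z, w) in S if y == z}
            if new == S:
                return S
            S = new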

§11.5.2: Property Reduction

(Note: Some notation here may be nonstandard. I only saw this offhandedly referenced on Wikipedia with
regards to the reflexive reduction.)

ˆ Definition: For a property P, the P-reduction of a relation R is denoted red_P(R). It is typically defined to be the largest relation contained in R satisfying the property opposite to P (e.g. irreflexivity for reflexivity). For instance: for relations R over A,
ˆ The reflexive reduction redref (R) is the largest irreflexive relation contained in R, i.e.

redref (R) = R\{(x, x) | x ∈ A}

ˆ The transitive reduction redtra (R) is the largest antitransitive relation contained in R.

§11.5.3: Relation Composition

ˆ Definition: The composition of relations merits a slightly different visualization than we’ve used
throughout these posts.
Suppose we have two relations R, S. For full generality, I’ll let R ⊆ A × B and S ⊆ B × C. Then the
relation S ◦ R is defined by

S ◦ R = {(a, c) | (a, b) ∈ R ∧ (b, c) ∈ S}

ˆ Properties:

◦ Composition associates: (R ◦ S) ◦ T = R ◦ (S ◦ T )
◦ Taking the converse gives (R ◦ S)T = S T ◦ RT
◦ If R, S are injective (surjective) relations, then S ◦ R is injective (surjective).
◦ If S ◦ R is injective (surjective), then we can only say for sure that R is injective (S is surjective).

ˆ Digraph Visualization: To visualize this sort of composition and find it visually, it is best to write
the elements of A, B, C in three columns in that order.

◦ For each pair (a, b) ∈ R, draw a line a → b from column A to column B
◦ For each pair (b, c) ∈ S, draw a line b → c from column B to column C
◦ The pair (a, c) is then in S ◦ R if, starting from a in column A, you can walk a path to c in column C.

Some examples will follow.
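A one-line Python sketch of the composition, checked against Example 1 below:

    def compose(S, R):
        """S ∘ R = {(a, c) : (a, b) ∈ R and (b, c) ∈ S for some b}."""
        return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

    R = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 5), (5, 5)}
    S = {(1, 5), (2, 4), (4, 2), (5, 1)}
    assert compose(S, R) == {(1, 5), (1, 4), (2, 4), (3, 2), (4, 2), (4, 1), (5, 1)}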

ˆ Example 1: Define two relations on {1, · · ·, 5} as below:

R = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 5), (5, 5)}
S = {(1, 5), (2, 4), (4, 2), (5, 1)}

Then we have

S ◦ R = {(1, 5), (1, 4), (2, 4), (3, 2), (4, 2), (4, 1), (5, 1)}

The visual setup of the relation is below. Ensure that this visual and how the composition is found
makes sense.

ˆ Example 2: Define two relations on {1, · · ·, 5} as below:

R = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5)}
S = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 5), (5, 5)}

Then we have that

S ◦ R = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5)}

The visual setup is below:

ˆ Example 3: Finally, on {1, · · ·, 5}, define the relation

R = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)}

We can compose R with itself. In particular, we see that

R ◦ R = {(1, 3), (2, 4), (3, 5), (4, 1), (5, 2)}
R ◦ R ◦ R = {(1, 4), (2, 5), (3, 1), (4, 2), (5, 3)}

The visual setup:

§11.5.4: Transpose of Relation

ˆ Definition: Given a relation R, the transpose or converse relation RT is defined by

RT = {(b, a) | (a, b) ∈ R}

ˆ Digraph Visual: Hence, in visualizing a relation R as a directed graph, the graph for RT amounts
to simply flipping around each arrow’s orientation.
ˆ Notation: A number of notations exist for the transpose/converse relation. Some include:

R^T, R^C, R^{-1}, R̆, R^◦, R^∨

§12: Items from Abstract Algebra
§12.1: Algebraic Structures

§12.1.1: Overview of Basic Structures

A brief table summarizing the group-like (one set, one operation) structures (from Wikipedia) and their
corresponding properties is below.

Some of the ring-like (one set, two operations) structures are summarized below, somewhat. ((R, +, ×) is a semiring when (R, +) is a commutative monoid, (R, ×) is a monoid, and distribution is satisfied. The below table has an error in that respect.)

Some quick links to other common structures of note, and their Wikipedia articles:

ˆ Modules
ˆ Vector spaces
ˆ Algebras (over fields)
ˆ Associative algebras & non-associative algebras

§12.1.2: Structural Graph for Group-Like Structures

Full resolution image here. A TikZ editor version is here.

§12.1.3: Structural Graph for Ring-Like Structures

We note that

ˆ (R, +, ×) is a semiring when (R, +) is a commutative monoid, (R, ×) is a monoid, and distribution
is satisfied.
ˆ (R, +, ×) is a near-ring if (R, +) is a group, (R, ×) is a semigroup, and distribution is satisfied. (Left- and right-near-rings exist, each satisfying only one distribution law.)
ˆ (R, +, ×) is a rng if (R, +) is an abelian group and (R, ×) is a semigroup, with distribution satisfied.
Note that to make R a ring with identity (whereas a rng is a ring without identity) one needs to
introduce a multiplicative identity, making (R, ×) a monoid.

The graph is not as complete or proper as the previous, so unseen connections with appropriate axioms
or conditions may exist. In particular, this chain of inclusions represents some of the major structures quite
well, in lieu of the “addition of axioms” structure below:

{rngs} ⊇ {rings}
⊇ {commutative rings}
⊇ {integral domains}
⊇ {integrally closed domains}
⊇ {GCD domains}
⊇ {unique factorization domains (UFDs)}
⊇ {principal ideal domains (PIDs)}
⊇ {Euclidean domains}
⊇ {fields}
⊇ {algebraically closed fields}

Full resolution available here, and some code for it here. The TikZ editor version is here; it doesn't totally reflect the image linked to or included here, though only up to aesthetic details.

§12.2: Isomorphism Theorems

(Isomorphism theorems on ProofWiki - link.)

Any suggestion as to numbering depends on the source, and hence should be taken carefully here.

Isomorphism Theorems for Groups:


Recall that N is a normal subgroup of G (N ⊴ G) when N is a subgroup of G closed under conjugation:

gng −1 ∈ N for all g ∈ G and for all n ∈ N

We define the quotient group of G and a normal subgroup N by the collection of distinct cosets:

G/N = {gN }g∈G

The isomorphism theorems:

ˆ First Isomorphism Theorem (Fundamental Isomorphism Theorem): Let f : G → H be a


group homomorphism. Then:

◦ ker(f ) ⊴ G
◦ im(f ) ≤ H
◦ im(f) ≅ G/ker(f)

The isomorphism for the third item is φ : G/ker(f) → im(f) given by φ(x ker f) = f(x). Some details.
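A finite numeric check of the statement, for the hypothetical homomorphism φ : Z_6 → Z_3, φ(x) = x mod 3:

    n, m = 6, 3
    phi = lambda x: x % m
    G = range(n)

    kernel = {x for x in G if phi(x) == 0}                  # {0, 3}
    cosets = {frozenset((x + k) % n for k in kernel) for x in G}
    image = {phi(x) for x in G}
    assert len(cosets) == len(image)                        # |G/ker f| = |im f|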

ˆ Second Isomorphism Theorem (Diamond/Parallelogram Theorem): For G a group, with
S ≤ G and N ⊴ G, we have

◦ SN ≤ G, where SN := {sn | s ∈ S, n ∈ N }
◦ S∩N ⊴S
◦ SN/N ≅ S/(S ∩ N)
The isomorphism in the third is given by φ : S → SN/N with φ(s) = sN . Some details.
One need not have N normal, provided S is a subgroup of N ’s normalizer.

ˆ Third Isomorphism Theorem: For G a group with N ⊴ G, we have

◦ Whenever N ≤ K ≤ G, then G/N has a subgroup isomorphic to K/N


◦ All subgroups of G/N take the form K/N for some N ≤ K ≤ G
◦ If K ⊴ G with N ≤ K ≤ G, then G/N has a normal subgroup isomorphic to K/N
◦ All normal subgroups of G/N take the form K/N for some normal K with N ≤ K ≤ G
◦ If K ⊴ G with N ≤ K ≤ G, then (G/N)/(K/N) ≅ G/K

A proof here.
ˆ Fourth Isomorphism Theorem (Lattice Theorem/Correspondence Theorem): Take G a
group and N ⊴ G. Let

G := {A | N ≤ A ≤ G}, the subgroups of G containing N


N := {S | S ≤ G/N }, all subgroups of G/N

Then there is a bijection


φ : G → N defined by φ(A) := A/N
Moreover, if A, B ∈ G,
◦ A ⊆ B ⇐⇒ A/N ⊆ B/N
◦ If A ⊆ B, then |B : A| = |B/N : A/N| (|B : A| is the number of cosets bA of A in B)
◦ ⟨A, B⟩/N = ⟨A/N, B/N ⟩ (⟨A, B⟩ the subgroup of G generated by A ∪ B)
◦ (A ∩ B)/N = (A/N ) ∩ (B/N )
◦ A ⊴ G ⇐⇒ A/N ⊴ G/N
Some discussion on Wikipedia here and ProofWiki here.

Isomorphism Theorems for Rings:
We let ≤ denote the subring relation here.

ˆ First Isomorphism Theorem (Fundamental Isomorphism Theorem): For φ : R → S a ring


homomorphism,
◦ ker(φ) is an ideal of R
◦ im(φ) ≤ S
◦ im(φ) ≅ R/ker(φ)
ˆ Second Isomorphism Theorem (Diamond/Parallelogram Theorem): For S ≤ R as rings and
I an ideal of R,
◦ S + I := {s + i | s ∈ S, i ∈ I} ≤ R
◦ S ∩ I is an ideal of S
◦ (S + I)/I ≅ S/(S ∩ I)
ˆ Third Isomorphism Theorem: For R a ring and I an ideal of R,

◦ If A ≤ R with I ≤ A ≤ R, then A/I ≤ R/I


◦ Subrings of R/I take the form A/I for some A ≤ R where I ≤ A ≤ R
◦ If J is an ideal of R with I ≤ J ≤ R, then J/I is an ideal of R/I
◦ Each ideal of R/I takes the form J/I for some ideal J of R where I ≤ J ≤ R
◦ If J is an ideal of R with I ≤ J ≤ R, then (R/I)/(J/I) ≅ R/J.
ˆ Fourth Isomorphism Theorem (Lattice Theorem/Correspondence Theorem): Take I an
ideal of R a ring. The map A 7→ A/I is an inclusion-preserving bijection between the set of subrings
of R containing I, and the set of subrings of R/I.
Moreover, if A is a subring containing I, it is an ideal of R iff A/I is an ideal of R/I.

Isomorphism Theorems for Modules & Vector Spaces:
We let ≤ denote the submodule/vector subspace relation here.
Note that modules over a field are, in fact, vector spaces.
For finite-dimensional vector spaces, these all follow by rank-nullity.
Throughout, “module” here always refers to an R-module, for R any fixed ring.

ˆ First Isomorphism Theorem (Fundamental Isomorphism Theorem): For φ : M → N a


module homomorphism,
◦ ker(φ) ≤ M
◦ im(φ) ≤ N
◦ im(φ) ≅ M/ker(φ)

ˆ Second Isomorphism Theorem (Diamond/Parallelogram Theorem): For S, T ≤ M as mod-


ules,
◦ S + T := {s + t | s ∈ S, t ∈ T } ≤ M
◦ S∩T ≤M
◦ (S + T)/T ≅ S/(S ∩ T)
ˆ Third Isomorphism Theorem: For T ≤ M as modules,

◦ If S ≤ M with T ≤ S ≤ M , then S/T ≤ M/T


◦ Submodules of M/T take the form S/T for some S ≤ M where T ≤ S ≤ M
◦ If S is a submodule of M with T ≤ S ≤ M, then (M/T)/(S/T) ≅ M/S.
ˆ Fourth Isomorphism Theorem (Lattice Theorem/Correspondence Theorem): Take N ≤ M
as modules. There is an inclusion-preserving bijection A 7→ A/N for A ≥ N between the set of
submodules of M containing N, and the set of submodules of M/N.

§12.3: Catalogue of Important Groups & Group Structures

§12.3.1: Specific Examples of Groups

Specific Examples:

ˆ Some Basics: R, C, Q, A under +, or their nonzero elements under multiplication. The same is true
of the finite fields Z/pZ ≡ Fp .
ˆ Dihedral Groups: D2n is the set of rigid symmetries of a regular n-gon in the plane, generated by
rotations r by 2π/n radians in the counterclockwise direction about the origin, and reflections s about
a line through the origin and a fixed “first” vertex. (The line does not change after operations.) It has
presentation
D_{2n} = ⟨ r, s | r^n = s^2 = 1, rs = sr^{-1} ⟩

ˆ Symmetric Groups: Sn is the set of bijections f : {1, 2, · · ·, n} → {1, 2, · · ·, n} under function


composition.

◦ Alternating Group: An is the subgroup of Sn of even permutations.

ˆ Quaternion Group: Q8 is the set {±1, ±i, ±j, ±k}, which follows the rules you know for quaternion
multiplication in the larger set H (that set being sometimes called the Hamiltonians).
ˆ Klein 4-Group: (Denoted V, V4 , K4 .) The smallest non-cyclic group, and the only non-cyclic group
of order 4. It possesses the multiplication table (writing V_4 = {e, a, b, ab})

        e    a    b    ab
  e     e    a    b    ab
  a     a    e    ab   b
  b     b    ab   e    a
  ab    ab   b    a    e

and has presentation V_4 = ⟨ a, b | a^2 = b^2 = (ab)^2 = e ⟩.
More is available on Wikipedia & Groupprops.
ˆ Automorphism Groups:

◦ Aut(G) is the set of all isomorphisms G → G, known as automorphisms.


◦ For fixed g ∈ G, conjugation by g (i.e. the automorphism h 7→ ghg −1 ) is called an inner auto-
morphism. The collection of these is denoted Inn(G). (Note: Inn(G) ⊴ Aut(G).)
◦ The outer automorphism group of G is Aut(G)/Inn(G).

§12.3.2: Important Classes/Classifications of Groups

ˆ Cyclic Groups: Cyclic groups are those generated by a single element. We may say that the cyclic group of order n is

Z_n ≡ C_n := ⟨ x | x^n = 1 ⟩
or it may be infinite.
ˆ Quotient Groups: G/H is read “G modulo H” or “G mod H”

◦ Definition by Homomorphisms/Kernels: Let φ : G → H be a homomorphism. We may define the quotient group G/ker φ as the set of fibers/preimages {φ^{-1}(h)}_{h ∈ im(φ)} with the operation

φ^{-1}(a) ◦ φ^{-1}(b) := φ^{-1}(ab)

Equivalently, G/ker φ is {g ker(φ)}_{g∈G}, the set of left cosets of ker φ in G, with the group operation

(g ker φ) ◦ (h ker φ) := (gh) ker φ

Right cosets may also be used.


◦ Definition by Normal Subgroup: We may generally define G/N for some N ≤ G, but only if
N is the kernel of some homomorphism, and hence only if N ⊴ G.
Correspondingly, G/N is the set of left (or right) cosets {gN }g∈G under the aforementioned
operation.
◦ Definition by Equivalence Relation: Given N ⊴ G, define a relation ∼ by a ∼ b ⇐⇒ ab−1 ∈ N .
The equivalence classes G/ ∼ form a group, G/N , under the analogous operation.

ˆ Simple Groups: A group G is simple if |G| > 1 and its only normal subgroups are ⟨1⟩ and G itself.
(Specifically, we say it has exactly two normal subgroups, so ⟨1⟩ is not simple.)
ˆ Composition Series: We say, for a group G, that the sequence of subgroups

⟨1⟩ = N0 ⊴ N1 ⊴ · · · ⊴ Nk = G where Ni+1 /Ni is simple for each i

is a composition series, with the quotients G’s composition factors.


ˆ Solvable Groups: We say, for a group G, if there is a chain of subgroups

⟨1⟩ = N0 ⊴ N1 ⊴ · · · ⊴ Nk = G where Ni+1 /Ni is abelian for each i

then G is solvable.

§12.3.3: Important Subgroups/Substructures

Important Results:

ˆ Recall that H ≤ G ⇐⇒ H ̸= ∅ and x, y ∈ H =⇒ xy −1 ∈ H.



ˆ Lagrange's Theorem: For H ≤ G with G a finite group, |H| divides |G|, and the number of left (or right) cosets of H in G is |G|/|H|.
We denote the index by |G : H| and let it be the number of these cosets, even in the infinite case.
Observe that |G/N| ≡ |G|/|N| in the finite case.

◦ Corollary: The order of each individual element divides that of G (take H := ⟨x⟩), i.e. |x| divides |G|. Moreover, x^{|G|} = 1_G for all x ∈ G.


◦ Corollary: If G is of prime order p, then G is cyclic and G ≅ Z_p.
◦ Note: The converse is not necessarily true, i.e. n | |G| does not necessarily mean G has a subgroup
of order n.
◦ The converse is always true for abelian groups.

ˆ Partial Converse: Cauchy’s Theorem: If G is a finite group and p is a prime with p | |G|, then ∃x ∈ G
with |x| = p. (M&I, Thm. 3.2.11)
ˆ Stronger Partial Converse: Sylow’s Theorem: If |G| = p^α m, for α ∈ Z≥1, p prime, and p ∤ m, then
G has a subgroup of order p^α.
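A quick numerical check of Lagrange’s corollary above (my own sketch, not from the source): in G = (Z/12Z)^×, every element’s order divides |G| = 4, and x^{|G|} = 1 for each x.

    # Check |x| divides |G| and x^{|G|} = 1 in the unit group mod 12 (my own sketch).
    from math import gcd

    n = 12
    G = [x for x in range(1, n) if gcd(x, n) == 1]   # units mod 12: [1, 5, 7, 11]

    def order(x):
        # multiplicative order of x modulo n
        k, y = 1, x % n
        while y != 1:
            y, k = (y * x) % n, k + 1
        return k

    for x in G:
        assert len(G) % order(x) == 0      # |x| divides |G|
        assert pow(x, len(G), n) == 1      # x^{|G|} = 1
    print([order(x) for x in G])           # [1, 2, 2, 2]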

Important Substructures:

ˆ Centralizer: Given a group G and a set ∅ ̸= A ⊆ G, the centralizer of A in G is

C_G(A) := {g ∈ G | gag^{-1} = a ∀a ∈ A} = {g ∈ G | ga = ag ∀a ∈ A}

the set of all elements of G that commute with every element of A. We may write C_G(a) := C_G({a}).
We have CG (A) ≤ G.

ˆ Center: Given a group G, its center is

Z(G) := {g ∈ G | gx = xg ∀x ∈ G} ≡ CG (G)

so Z(G) ⊴ G and Z(G) is an abelian group. (If G is abelian, Z(G) = G.)


ˆ Normalizer: We define gAg^{-1} := {gag^{-1} | a ∈ A}. Given a group G and A ⊆ G, the normalizer of
A in G is
N_G(A) := {g ∈ G | gAg^{-1} = A}


Note that C_G(A) ≤ N_G(A) ≤ G. The normalizer is looser than the centralizer: elements within
A need not be fixed by conjugation, only shuffled around within A.

ˆ Kernel of Homomorphism: For a homomorphism φ : G → H,

ker φ := φ−1 (1H ) = {g ∈ G | φ(g) = 1H }

Note that ker φ ⊴ G.

ˆ Commutator Subgroup: The commutator of x, y ∈ G a group is defined by

[x, y] := x−1 y −1 xy

The commutator (first derived) subgroup is denoted by [G, G], G′ , or G(1) , and defined by

[G, G] := ⟨[x, y] | x, y ∈ G⟩, the subgroup generated by all commutators

Note that [G, G] ⊴ G and, more strongly, [G, G] char G.


ˆ Normal Subgroups: Given N ≤ G, we say N is normal if it is normalized by each g ∈ G; that is
to say, gN g −1 = N for each g ∈ G (conjugation by G always only shuffles elements). We denote this
N ⊴ G.
The following are common equivalent conditions to say N ⊴ G given N ≤ G:

◦ ∀g ∈ G, we have gN g −1 = N
◦ ∀g ∈ G, we have gN g −1 ⊆ N
◦ ∀g ∈ G, we have N ⊆ gN g −1
◦ ∀g ∈ G, ∀n ∈ N we have gng −1 ∈ N
◦ ∀g ∈ G, ∀n ∈ N we have gng −1 ∈ N ⇐⇒ n ∈ N
◦ ∀g ∈ G, we have gN = N g
◦ A left coset is always a right coset and vice versa
◦ ∃φ : G → H a homomorphism with ker φ = N

ˆ Characteristic Subgroups: We say H ≤ G is characteristic in G (H char G) if σ(H) = H for all


σ ∈ Aut(G). (Each automorphism of G sends H to itself, not necessarily fixing it or its elements.)
ˆ Sylow Theorem / p-(sub)group Stuff: G is a group and p is prime.

◦ If |G| = pα for α ∈ Z≥1 , we say G is a p-group


◦ If H ≤ G for any group G, and H is a p-group, we say H is a p-subgroup
◦ If |G| = p^α m with p ∤ m, and H ≤ G has |H| = p^α, then H is a Sylow p-subgroup of G
◦ The collection of Sylow p-subgroups of G is Syl_p(G).

◦ The number of Sylow p-subgroups, |Syl_p(G)|, is denoted n_p or n_p(G).

§12.3.4: Items Tied to Group Actions

ˆ Group Action: The group action of a group G on a set A is a mapping − · − : G × A → A such


that
(i) g1 · (g2 · a) = (g1 g2 ) · a for each g1 , g2 ∈ G, a ∈ A
(ii) 1 · a = a for each a ∈ A
ˆ Stabilizer: Suppose G is acting on S a set and s ∈ S. Then the stabilizer of s in G is

Gs := {g ∈ G | g · s = s}

We have Gs ≤ G.
ˆ (Group Action) Kernel: The kernel of the aforementioned group action is

{g ∈ G | ∀s ∈ S, g · s = s}

and is a subgroup of G. If this kernel is ⟨1⟩, we say the action is faithful .


ˆ Orbits: The orbit of G containing a is given by Oa := {g · a}g∈G .

◦ We say the action is transitive if there is only one distinct orbit (hence, ∀a, b ∈ A we may find
g ∈ G where a = g · b).
◦ Suppose G is a transitive permutation group on A. A block is a nonempty B ⊆ A such that
∀σ ∈ G we either have
σ(B) := {σ(b)}b∈B = B or σ(B) ∩ B = ∅
◦ A transitive group G on A is said to be primitive if all blocks are trivial: size 1, or A itself.
◦ A transitive permutation group G on A is doubly transitive if ∀a ∈ A, Ga is transitive on
A − {a}.
◦ Consider the group action of conjugation (with A = G):

g · a := gag^{-1} for a, g ∈ G

The orbits O_a := {gag^{-1}}_{g∈G} of this action are the conjugacy classes of G.

§12.4: Catalogue of Important Rings & Ring Structures

§12.4.1: Relevant (Somewhat Nonstandard) Notations

ˆ Ideals: We may use I ⊴ R to say “I is an ideal of R”, analogous to the notation for normal sub-
groups. This notation has appeared in P.M. Cohn’s Introduction to Ring Theory (2000 edition); cf.
this MSE post.
ˆ Center of a Ring: Dummit & Foote do not use an explicit notation for the center of a ring. We will
use Z(R) to denote the center of the ring R, analogous to the notation for centers of groups. Another
notation is C(R), which nicely parallels the centralizer notation CR (R), since Z(R) = CR (R).
ˆ Nonzero Elements vs. Units: The notation R× has become commonplace, especially for fields,
to denote the invertible elements of a ring. In fields, “nonzero elements” and “invertible elements”
coincide, but in general rings they need not. To avoid confusion, we will use R̸=0 to denote the nonzero
elements of a ring, and R× for the units/invertible elements.

§12.4.2: Definitions of Ring-Like Structures

A Certain Hierarchy:
While not perfect, we have the following chain of inclusions:

{rngs} ⊇ {rings}
⊇ {commutative rings}
⊇ {integral domains}
⊇ {integrally closed domains}
⊇ {GCD domains}
⊇ {unique factorization domains (UFDs)}
⊇ {principal ideal domains (PIDs)}
⊇ {Euclidean domains}
⊇ {fields}
⊇ {algebraically closed fields}

Fundamental Classes of Rings:


The most fundamental hierarchy starts with a nonunital ring, and slowly adds axioms:

ˆ Semiring: Prior to even the rng, we define a semiring R (with two operations, +, ·) as satisfying

(i) (R, +) is a commutative monoid (so + is closed, commutes, associates, has identity)
(ii) (R, ·) is a monoid (so · is closed, associates, and has identity)
(iii) Distribution on both sides is satisfied: (a + b) · r = a · r + b · r, and r · (a + b) = r · a + r · b

ˆ Rng / Ring Without Unit: A rng (also: ring without unity , nonunital ring , ring without
1 , etc.) is a set R paired with operations + (addition) and · (multiplication) such that

(i) (R, +) is an abelian group (so + is closed, commutes, associates, has identity, has inverses)
(ii) (R, ·) is a semigroup (so · is closed & associates)
(iii) Distribution on both sides is satisfied: (a + b) · r = a · r + b · r, and r · (a + b) = r · a + r · b

Some authors take a ring R to have identity by default (a unital ring); others do not and choose
to make a distinction (and hence a ring may not have 1). Be careful.
ˆ Commutative Rng: A rng in which · commutes is a commutative rng . Hence,

(i) (R, +) is an abelian group


(ii) (R, ·) is a commutative semigroup
(iii) Distribution on both sides is satisfied: (a + b) · r = a · r + b · r, and r · (a + b) = r · a + r · b

ˆ Ring / Unital Ring: A ring (also: ring with identity , unital ring , ring with 1 , etc.) is a rng
R such that · has a multiplicative identity as well. This ensures that:

(i) (R, +) is an abelian group


(ii) (R, ·) is a monoid
(iii) Distribution on both sides is satisfied

One may note that the condition that (R, +) is abelian is superfluous in unital rings, since the remaining
axioms force commutativity under +. We see that by left distribution

(1 + 1)(a + b) = 1(a + b) + 1(a + b) = a + b + a + b

and by right distribution

(1 + 1)(a + b) = (1 + 1)a + (1 + 1)b = a + a + b + b

One cancels the leftmost and rightmost terms to get b + a = a + b, i.e. commutativity.
ˆ Commutative Ring: A ring in which · commutes is a commutative ring . Hence

(i) (R, +) is an abelian group


(ii) (R, ·) is a commutative monoid
(iii) Distribution on both sides is satisfied (with · commutative, verifying one side suffices)

ˆ Division Ring: Given a unital ring R (not necessarily commutative), if 0 ̸= 1 (so as to be nontrivial)
and each element in R̸=0 is invertible (R× = R̸=0 ), we say R is a division ring . Hence:

(i) R is nontrivial (0 ̸= 1)
(ii) (R, +) is an abelian group
(iii) (R̸=0 , ·) is a group
(iv) Distribution on both sides is satisfied

Some say a division ring which specifically is not commutative is a skew field .
Equivalent conditions to be a division ring:

◦ A unital ring R is a division ring iff its only left ideals are 0 and R (one-sided ideals suffice here)

ˆ Field: A field R is a commutative division ring. Hence

(i) R is nontrivial (0 ̸= 1)
(ii) (R, +) is an abelian group
(iii) (R̸=0 , ·) is an abelian group
(iv) Distribution on both sides is satisfied

Equivalent conditions for R to be a field:


◦ R’s only ideals are 0 and R.
◦ 0 is a maximal ideal and R is commutative

Other Classes:

ˆ Integral Domain: A commutative unital ring R with 1 ̸= 0 which satisfies

ab = 0 =⇒ a = 0 or b = 0 (or both)

is said to be an integral domain; that is, it contains no zero divisors.

◦ GCD Domain: If an integral domain R has a greatest common divisor for each pair of its
elements, it is a GCD domain.
Specifically: given each a, b ∈ R with at least one of a, b ̸= 0, let d ∈ R have the properties
(i) d | a (i.e. ∃r ∈ R where a = dr)
(ii) d | b
(iii) δ | a and δ | b implies δ | d
Then d is a greatest common divisor (gcd) of a, b, denoted d = gcd(a, b). (Note this d need not
be unique.)
◦ Unique Factorization Domain (UFD): A unique factorization domain is an integral
domain R in which each r ∈ R̸=0 has a factorization in terms of irreducibles pi and a unit u. That
is,
r = up1 · · ·pn for some n ≥ 0
Moreover, this factorization must be unique up to associates and reordering. That is, if for some
other irreducibles q_i and unit w we have

r = u ∏_i p_i = w ∏_i q_i

then there are equally many q_i as p_i, and there is a bijection σ : {1, · · ·, n} → {1, · · ·, n} such that
p_i and q_{σ(i)} are associates.
It is often convenient to use the equivalent definition that each r ∈ R̸=0 may be written as a
product of a unit and some primes in R.
◦ Principal Ideal Domains (PIDs): An integral domain R is said to be a principal ideal
domain if each ideal is principal, i.e. generated by a single element.
◦ Bezout Domain: For R an integral domain, we say R is a Bezout domain when each ideal
generated by two elements is principal, i.e. ∀a, b ∈ R, (a, b) = (c) for some c ∈ R.

◦ Euclidean Domain: If an integral domain R possesses a division algorithm, it is a Euclidean
domain.
This requires that ∃N : R → Z≥0 (a norm) with the property that N (0) = 0 and

given a, b ∈ R, b ̸= 0, then ∃q, r ∈ R with a = bq + r and either r = 0 or N (r) < N (b)

Note that this gives rise to the Euclidean algorithm: iterate the division

a = q_0 b + r_0
b = q_1 r_0 + r_1
r_0 = q_2 r_1 + r_2
· · ·
r_{n-2} = q_n r_{n-1} + r_n
r_{n-1} = q_{n+1} r_n

Here, r_n is the last nonzero remainder; since N(b) > N(r_0) > N(r_1) > · · · > N(r_n), the process
must terminate eventually. Moreover, r_n = gcd(a, b).
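The following is a minimal sketch of the algorithm (my own, for R = Z with norm N(r) = |r|, where Python’s % supplies the division algorithm); the same loop works in any Euclidean domain once division-with-remainder is available.

    # Euclidean algorithm in Z: repeatedly divide and keep the remainder (my own sketch).
    def euclid(a, b):
        # returns the last nonzero remainder, i.e. gcd(a, b)
        while b != 0:
            a, b = b, a % b   # a = qb + r with 0 <= r < |b|
        return abs(a)

    print(euclid(2310, 546))  # 42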

ˆ Boolean Ring: A rng R is said to be a Boolean ring if r2 = r for all r ∈ R. (These are necessarily
commutative as well.)
ˆ Quotient Ring: Let φ : R → S have ker φ = I. Then the fibers of φ are the additive cosets of the
kernel, so we define

R/I := {r + I}r∈R (r + I) + (s + I) := (r + s) + I (r + I) · (s + I) := (r · s) + I

These define a ring, and we call R/I the quotient ring . Note this is fundamentally just the quotient
of the additive groups, with I ⊴ R.
We can equivalently define the quotient R/I for any ideal I, since I is an ideal iff it is a kernel.
ˆ Simple Rings: A ring is simple if its only ideals are 0 and itself.

ˆ Local Rings: A commutative ring is local if it has a unique maximal ideal.

§12.4.3: Types of Elements in a Ring

ˆ The Identities: 0 or 0R often denotes the additive identity, and 1R or 1 denotes the multiplicative
identity, if present.
ˆ Inverses: In a unital ring R, x−1 is the (multiplicative) inverse of x when xx−1 = x−1 x = 1.

◦ u ∈ R has a left inverse when ∃uℓ ∈ R such that uℓ · u = 1


◦ u ∈ R has a right inverse when ∃ur ∈ R such that u · ur = 1
◦ u is a unit iff it has a left & right inverse
◦ If u has multiple right inverses, u is a zero divisor

ˆ Units & The Unit: Confusingly, it is common to say r ∈ R is a unit if it is invertible under
multiplication, and we call 1 ∈ R in a unital ring the unit.
The collection of all units of R is denoted R× .
ˆ Zero Divisors: r ∈ R̸=0 is said to be a zero divisor if ∃s ∈ R̸=0 such that rs = 0 or sr = 0.

◦ a ∈ R̸=0 is said to be a left zero divisor if ∃x ∈ R̸=0 with ax = 0


◦ b ∈ R̸=0 is said to be a right zero divisor if ∃y ∈ R̸=0 with yb = 0

ˆ Irreducible Elements: r ∈ R (an integral domain) is said to be irreducible if

(i) r is a non-unit (it is not invertible)


(ii) r is not the product of two non-units, i.e. for any p, q ̸∈ R× , we have r ̸= pq

In integral domains, primes are irreducible; in PIDs, primes and irreducibles coincide.
ˆ Prime Elements: We say p ∈ R (a commutative ring) is prime when

(i) p ̸= 0
(ii) p is a non-unit (so these combined means p ∈ R̸=0 − R× )
(iii) p | ab =⇒ p | a or p | b

Equivalently, p is prime iff (p) is a prime ideal.


In integral domains, primes are irreducible; in PIDs/UFDs, primes and irreducibles coincide.
ˆ Associating Elements: For a, b ∈ R, if a = ub for a unit u, we say a, b are associates in R.
ˆ Nilpotent Elements: r ∈ R a ring is said to be nilpotent if rn = 0 for an n ∈ Z+ . The collection
of all of these elements is the ring’s nilradical N(R).
ˆ (Greatest Common) Divisor: In a commutative ring R, take a, b ∈ R with b ̸= 0.

◦ We say a is a multiple of b, or b divides a, denoted b | a, if ∃x ∈ R such that a = bx.


◦ d is a greatest common divisor of a, b if
(i) d ̸= 0
(ii) d|a
(iii) d|b
(iv) δ | a and δ | b =⇒ δ | d
For such d, we may write d = gcd(a, b) or d = (a, b).
◦ If 1 is a greatest common divisor of a, b, we say a, b are coprime or relatively prime

ˆ Least Common Multiple: In a commutative unital ring R with a, b ∈ R̸=0 , a least common
multiple of a, b is an ℓ ∈ R such that

(i) a | ℓ
(ii) b | ℓ
(iii) If a | λ and b | λ, then ℓ | λ

We may write ℓ = lcm(a, b) = [a, b].

ˆ Universal Side Divisors: Let R̃ := R^× ∪ {0}, the units of R together with 0. Then u ∈ R − R̃ is said
to be a universal side divisor if
(∀x ∈ R)(∃z ∈ R̃)(u | (x − z))

Thus there is a sort of “division algorithm” for each u: any x may be written in the form

x = qu + z

for z zero or a unit.

§12.4.4: Specific Examples of Rings

Basic Examples:

ˆ The prototypical example is Z under the usual addition and multiplication, a commutative ring.

ˆ nZ for n ̸= 0, 1 forms a (nontrivial, nonunital) ring.

ˆ Anything we know as a field (e.g., Q, R, C) is an obvious example.

ˆ The “real Hamilton quaternions” H form a noncommutative ring.

ˆ The collection of functions f : X → A for A any ring and X ̸= ∅; this commutes iff A does. Special
subclasses exist, such as C([0, 1], R), a commutative ring. Similarly, C_c(R, R) is a commutative
non-unital ring.

ˆ The Trivial Ring / Zero Ring: The singleton {0}. The only definable operations,

+ : {0} × {0} → {0}


0+0=0
· : {0} × {0} → {0}
0·0=0

make it satisfy all of the axioms of a commutative ring. Typically definitions of division rings and fields
exclude it from being either, however, by forcing 0 ̸= 1. (Notice that the additive & multiplicative
identities in the trivial ring are both zero.)
(Some define the trivial rings to be an entire class: for any abelian group, define multiplication by
g · h := 0.)

§12.4.5: Important Classes of Rings

Polynomial/Series Rings:

ˆ Polynomials: Given a ring R, we define the ring of polynomials (in the formal variable x,
with coefficients in R) to be
R[x] := { Σ_{n=0}^{N} a_n x^n | N ∈ N and a_n ∈ R for each n }

These are meant as formal sums. Hence, they are manipulated in the ways we expect, but we do not
make concerns about convergence in the infinite case.

ˆ Formal Power Series: Given a ring R, we define the ring of power series (formal variable x,
coefficients in R) to be

R[[x]] := { Σ_{n=0}^{∞} a_n x^n | a_n ∈ R for each n }

ˆ Formal Laurent Series: Given a commutative unital ring R (or, often as in Dummit & Foote, a
field), we define the ring of formal Laurent series to be
R((x)) := { Σ_{n=N}^{∞} a_n x^n | N ∈ Z and a_n ∈ R for each n }

Note that this means p ∈ R((x)) may be written as a sum p = p_s + p_p with p_s ∈ R[[x]] and p_p ∈ R[1/x].
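Because these are formal sums, arithmetic in R[[x]] may be carried out degree-by-degree on truncations, with no convergence concerns. Below is a small sketch of my own (for R = Z): it multiplies truncated series and inverts a series whose constant term is a unit, exhibiting 1/(1 − x) = 1 + x + x^2 + · · · as a purely formal identity.

    # Formal power series over Z as truncated coefficient lists (my own sketch).
    def mul(a, b, N):
        # product of two series, truncated to degree < N
        c = [0] * N
        for i, ai in enumerate(a[:N]):
            for j, bj in enumerate(b[:N - i]):
                c[i + j] += ai * bj
        return c

    def inverse(a, N):
        # inverse of a series whose constant term is a unit of Z (i.e. +-1)
        b = [1 // a[0]] + [0] * (N - 1)
        for n in range(1, N):
            b[n] = -b[0] * sum(a[k] * b[n - k]
                               for k in range(1, n + 1) if k < len(a))
        return b

    geom = inverse([1, -1], 6)     # coefficients of 1/(1 - x)
    print(geom)                    # [1, 1, 1, 1, 1, 1]
    print(mul([1, -1], geom, 6))   # [1, 0, 0, 0, 0, 0]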

Matrix Rings:
Given a ring R and n ∈ Z≥1 , we have the ring
M_n(R) = R^{n×n} = M_{n×n}(R) = { (r_{i,j})_{i,j=1}^{n} | r_{i,j} ∈ R } = {all n × n matrices over R}

Note that Mn (R) need not be commutative (even in the extreme case of R a field).
We say A ∈ M_n(R) is a scalar matrix if all diagonal entries are the same constant, and all other entries
are 0. The scalar matrices form a subring of M_n(R) isomorphic to R.
We have the following subclasses (though usually for more special R, namely a field F, most commonly
R or C):

ˆ General Linear Group: GLn (F) is the invertible n × n matrices over F:

GL_n(F) := { M ∈ F^{n×n} | det(M) ̸= 0 }
ˆ Special Linear Group: SLn (F) is all n × n matrices of determinant 1 over F

SL_n(F) := { M ∈ F^{n×n} | det(M) = 1 }
ˆ Orthogonal Group: On (F) is all orthogonal n × n matrices:

O_n(F) := { M ∈ F^{n×n} | M M^T = M^T M = I_{n×n} } = { M ∈ F^{n×n} | M^{-1} = M^T }
This is usually used for F ⊆ R. Note that if M ∈ On (F) then det(M ) = ±1.

ˆ Unitary Group: The complex case of the previous, Un (F), uses unitary matrices:

U_n(F) := { M ∈ F^{n×n} | M M^* = M^* M = I_{n×n} } = { M ∈ F^{n×n} | M^{-1} = M^* }
This can be applied to F ⊆ C. Note that if M ∈ Un (F) then |det(M )| = 1.


ˆ Special Orthogonal Group: This restricts On (F) to those of unit determinant:

SO_n(F) := { M ∈ F^{n×n} | M^{-1} = M^T and det(M) = 1 }
ˆ Special Unitary Group: This likewise restricts Un (F) to those of unit determinant:

SU_n(F) := { M ∈ F^{n×n} | M^{-1} = M^* and det(M) = 1 }
Other Types:

ˆ Group Rings: Let R be a nontrivial unital commutative ring, and G := {g_i}_{i=1}^{n} a finite multiplicative
group. We can define the group ring RG of G with coefficients in R as all formal sums, as so:

RG := { Σ_{i=1}^{n} a_i g_i | a_i ∈ R }

If g1 is the identity of G, we may write a1 g1 as a1 ; the summand 1g may be written as g.


A special case is ZG, the integral group ring of G.
Addition in RG is defined componentwise:
Σ_{i=1}^{n} a_i g_i + Σ_{i=1}^{n} b_i g_i := Σ_{i=1}^{n} (a_i + b_i) g_i

Multiplication comes by

(agi ) · (bgj ) := (ab)gk where k is such that gk = gi gj

i.e.
(agi ) · (bgj ) := (ab)(gi gj )
and hence

(Σ_{i=1}^{n} a_i g_i) · (Σ_{j=1}^{n} b_j g_j) := Σ_{k=1}^{n} ( Σ_{g_i g_j = g_k} a_i b_j ) g_k

These make RG a ring, and it is commutative iff G is abelian.
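As a concrete illustration (my own sketch, not from the text), RG can be realized on a computer as finitely supported coefficient dictionaries; here R = Z and G = Z/3Z, with the group law passed in as a function.

    # Group ring Z[Z/3Z]: elements are dicts {group element: coefficient} (my own sketch).
    def add(x, y):
        # componentwise addition of formal sums
        out = dict(x)
        for g, b in y.items():
            out[g] = out.get(g, 0) + b
        return {g: c for g, c in out.items() if c != 0}

    def mul(x, y, op):
        # (a g_i)(b g_j) := (ab)(g_i g_j), extended bilinearly
        out = {}
        for g, a in x.items():
            for h, b in y.items():
                k = op(g, h)
                out[k] = out.get(k, 0) + a * b
        return {g: c for g, c in out.items() if c != 0}

    op = lambda g, h: (g + h) % 3   # group law of Z/3Z, written additively
    x = {0: 1, 1: 2}                # e + 2g
    y = {1: 1, 2: -1}               # g - g^2
    print(add(x, y))                # {0: 1, 1: 3, 2: -1}
    print(mul(x, y, op))            # -2e + g + g^2 (dict order may vary)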


ˆ Ring of Integers: Recall that we may define
R[√D] := { a + b√D | a, b ∈ R }

given a ring R. (It seems preferable to denote this R(√D) if the result forms a field.)
Consider a nontrivial commutative ring R and a monic p ∈ Z[x]. If p(r) = 0, i.e.
r^n + a_{n-1} r^{n-1} + a_{n-2} r^{n-2} + · · · + a_1 r + a_0 = 0 (where ka := Σ_{i=1}^{k} a for k ∈ Z)

then r is said to be an algebraic integer, and its order is said to be the minimum degree of such
polynomials p that r satisfies. One then defines the ring of (algebraic) integers from R by

OR := {r ∈ R | r is an algebraic integer}

One typically defines this for R an algebraic number field , a field such that R/Q has finite degree.

◦ Example: For D ∈ Z squarefree, we have

O_{Q(√D)} = Z[ω] := {a + bω}_{a,b∈Z} where ω := √D if D ≡ 2, 3 (mod 4), and ω := (1 + √D)/2 if D ≡ 1 (mod 4)

◦ Example: Z[i] is the Gaussian integers, a subset of C.

§12.4.6: Important Ring Substructures

Miscellaneous Structures:

ˆ Subrings: S ⊆ R a ring is said to be a subring (denoted S ≤ R) if it is a ring in its own right. This
gives rise to the subring test: we must verify the following:

(i) S ̸= ∅
(ii) S ⊆ R
(iii) For any x, y ∈ S, we have x − y, x · y ∈ S

ˆ Annihilator: For a ring R and r ∈ R:

◦ The right annihilator of r is given by r.Ann_R(r) := {x ∈ R | rx = 0}
◦ The left annihilator of r is given by ℓ.Ann_R(r) := {y ∈ R | yr = 0}
◦ Both may be generalized to the annihilator of a set. For instance, the left annihilator of S is
ℓ. AnnR (S) := {r ∈ R | ∀s ∈ S, rs = 0}.
◦ If L is a left ideal of R then ℓ. AnnR (L) ⊴ R.

ˆ Centralizer: Given a ring R and S ⊆ R, we define the centralizer of S to be

CR (S) := {r ∈ R | ∀s ∈ S, sr = rs}

i.e. the set of elements in R commuting with each element of S (under multiplication).

ˆ Center: Given a ring R, its center Z(R) is defined by

Z(R) := {r ∈ R | ∀ρ ∈ R, rρ = ρr} ≡ CR (R)

i.e. the set of all elements in R commuting with every other element (under multiplication).
ˆ Kernel: Given a homomorphism of rings φ : R → S, we have the kernel of φ as in the homomorphism
of additive groups φ : (R, +R ) → (S, +S ):

ker φ := φ−1 (0) := {r ∈ R | φ(r) = 0}

ˆ Nilradical: Given a commutative ring R, its nilradical is its nilpotent elements:

N(R) := {r ∈ R | r is nilpotent, i.e. rn = 0 for some n ∈ N}

ˆ Valuation Rings: For K a field (more generally, an integral domain), a discrete valuation on K
is a function ν : K × → Z such that
(i) ν(ab) = ν(a) + ν(b) (a homomorphism (K × , ·) → (Z, +))
(ii) ν is surjective
(iii) ν(x + y) ≥ min{ν(x), ν(y)} for any x, y ∈ K × where x + y ̸= 0
We then say that {x ∈ K^× | ν(x) ≥ 0} ∪ {0} is the valuation ring of ν.
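A standard concrete instance of the last definition (my own numerical sketch): the p-adic valuation ν_p on K = Q, whose valuation ring is Z_(p), the rationals with denominator prime to p.

    # The p-adic valuation on Q is a discrete valuation (my own check).
    from fractions import Fraction

    def v_p(n, p):
        # exponent of the prime p in the nonzero integer n
        n, k = abs(n), 0
        while n % p == 0:
            n, k = n // p, k + 1
        return k

    def nu(x, p):
        # nu_p(a/b) = v_p(a) - v_p(b) for a nonzero rational x = a/b
        return v_p(x.numerator, p) - v_p(x.denominator, p)

    x, y, p = Fraction(18, 5), Fraction(2, 3), 3
    assert nu(x * y, p) == nu(x, p) + nu(y, p)       # (i) homomorphism
    assert nu(x + y, p) >= min(nu(x, p), nu(y, p))   # (iii) ultrametric bound
    print(nu(x, p), nu(y, p), nu(x + y, p))          # 2 -1 -1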

Ideals & Related Structures:
Given a ring R and I ≤ R:

ˆ Left Ideal: If rI := {ri}_{i∈I} ⊆ I for every r ∈ R, we say I is a left ideal.

ˆ Right Ideal: If Ir := {ir}_{i∈I} ⊆ I for every r ∈ R, we say I is a right ideal.
ˆ Two-Sided Ideal: If it is both, we say I is a (two-sided) ideal and write (nonstandardly) I ⊴ R,
given the analogies to normal subgroups.
ˆ Nilpotent Ideal: N ⊴ R is said to be nilpotent if N n = ⟨0⟩ for some n ∈ Z≥1 .
ˆ Generated Ideal: For A ⊆ R, (A) denotes the ideal generated by A, the smallest ideal containing
A. Specifically,

(A) = ∩_{A ⊆ I ⊴ R} I

If R is commutative, then (A) = AR = RA = RAR where


AR := { Σ_{i=1}^{n} a_i r_i | n ∈ Z^+, a_i ∈ A, r_i ∈ R }, RA := { Σ_{i=1}^{n} r_i a_i | n ∈ Z^+, a_i ∈ A, r_i ∈ R }, RAR := { Σ_{i=1}^{n} r_i a_i s_i | n ∈ Z^+, a_i ∈ A, r_i, s_i ∈ R }

RA is the left ideal generated by A, and AR the right ideal generated by A.


Less symbolically, in a commutative ring, (a) will be all R-multiples of a: {ar}r∈R . For R non-
commutative this need not be true: it will be all finite sums of elements of the type ras (r, s ∈ R).
ˆ Principal Ideal: An ideal is principal if generated by a singleton.
ˆ Maximal Ideal: M ⊴ R is maximal if M ̸= R and M ⊴ I ⊴ R =⇒ M = I or I = R. (The only
ideals containing M are M and R.) If R is commutative and this maximal ideal is unique, we say R is
local.
ˆ Prime Ideal: In R commutative, P ⊴ R is prime if P ̸= R and ab ∈ P =⇒ a ∈ P or b ∈ P .
ˆ Primary Ideal: For R commutative, an ideal Q ⊴ R with Q ̸= R is said to be primary if
ab ∈ Q and a ̸∈ Q imply b^n ∈ Q for some n ∈ N. (Equivalently, every zero divisor of R/Q is
nilpotent.)

We say R is simple if its only ideals are 0 and itself.


We may define certain operations on ideals or related/derived ideas:

ˆ Sum of Ideals: Given I, J ⊴ R, define I + J := {i + j}i∈I,j∈J


We say I, J ⊴ R are comaximal ideals if I + J = R.
ˆ Product of Ideals: Given I, J ⊴ R, define
IJ := { Σ_{k=1}^{n} i_k j_k | n ∈ N, i_k ∈ I, j_k ∈ J }

and, for n ∈ Z≥1


I n := {all finite sums of the type a1 · · ·an for ai ∈ I}
Equivalently, I n = II n−1 with I 1 = I.

ˆ Radical of Ideal: For I ⊴ R commutative, we let

rad(I) := { r ∈ R | r^n ∈ I for some n ∈ Z^+ }

be the radical of I. We say I itself is a radical ideal if rad I = I.

ˆ Jacobson Radical: For I ⊴ R a commutative ring,


Jac(I) := ∩ { M ⊴ R | M is maximal and I ⊆ M }

with the convention Jac(R) := R. We call Jac(0) the Jacobson radical of R.

§12.4.7: Functions of Rings

ˆ Homomorphisms & Isomorphisms: A function of rings φ : R → S is a (ring) homomorphism


when it respects both operations:

φ(a + b) = φ(a) + φ(b)


φ(a · b) = φ(a) · φ(b)

If it is a bijective function, we call it an isomorphism.


ˆ Norm: There are several varying definitions:

◦ Definition 1 (analogous to vector norm): A norm on a commutative ring R is a function


N : R → R≥0 such that
(i) N (r) = 0 ⇐⇒ r = 0
(ii) N (r + s) ≤ N (r) + N (s)
(iii) N (rs) ≤ N (r)N (s)
◦ Definition 2 (multiplicative function): Another sense in which norm may be meant for
commutative rings R. A norm can be a function N : R → R≥0 such that
(i) N (rs) = N (r)N (s)
(ii) N (r) = 1 ⇐⇒ r is a unit
◦ Definition 3 (as in Dummit & Foote): Given R an integral domain, a norm is a function
N : R → Z≥0 with N (0) = 0. If r ̸= 0 =⇒ N (r) > 0, then N is a positive norm.
Dedekind–Hasse Norm: For N a positive norm, N is a Dedekind–Hasse norm if,
for all a, b ∈ R̸=0, either a ∈ (b) or there is a nonzero x ∈ (a, b) with N(x) < N(b). That is, either b | a
or ∃s, t ∈ R with 0 < N(sa − tb) < N(b).

ˆ Evaluation Map: For A a ring, X ̸= ∅, and R the set of all functions f : X → A, the evaluation
at c map is given by
Ec : R → A with Ec (f ) := f (c)
This is a ring homomorphism with R/ ker E_c ≅ A.
ˆ Discrete Valuations: For K a field (more generally, an integral domain), a discrete valuation on
K is a function ν : K^× → Z such that

(i) ν(ab) = ν(a) + ν(b) (a homomorphism (K × , ·) → (Z, +))


(ii) ν is surjective
(iii) ν(x + y) ≥ min{ν(x), ν(y)} for any x, y ∈ K × where x + y ̸= 0
We then say that {x ∈ K × | ν(x) ≥ 0 or x = 0} is the valuation ring of ν.

ˆ Characteristic of a Ring: Given a ring R, its characteristic char(R) is the minimum n such that
n1 = 0, i.e.
char(R) := min{ n ∈ Z≥1 | 1 + 1 + · · · + 1 (n times) = 0 } = min{ n ∈ Z≥1 | Σ_{i=1}^{n} 1 = 0 }

If said minimum does not exist, we say char(R) = 0.
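For instance (a tiny sketch of my own), char(Z/nZ) = n, and the definition can be run literally by summing copies of 1 until 0 appears:

    # char(Z/nZ) computed directly from the definition (my own sketch).
    def characteristic_mod(n):
        total, count = 1 % n, 1
        while total != 0:
            total, count = (total + 1) % n, count + 1
        return count

    print([characteristic_mod(n) for n in (2, 5, 12)])   # [2, 5, 12]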

§12.5: (Dummit & Foote, Chapter 1 ) Group Theory: Basics (Groups, Actions,
Morphisms)

Fundamental Definitions:

ˆ Binary Operation: A binary operation on G is a function ∗ : G2 → G (with ∗(a, b) ≡ a ∗ b). We


may say it is associative if (a ∗ b) ∗ c = a ∗ (b ∗ c), and commutative if a ∗ b = b ∗ a, each for all
a, b, c ∈ G. We may omit the ∗ and write a ∗ b = ab.
ˆ Group: A group is an ordered pair (G, ∗) where ∗ is a binary operation on G (a set) such that

◦ ∗ is associative
◦ ∃e ∈ G (often labeled 1), the identity of G, such that ge = eg = g ∀g ∈ G
◦ ∀g ∈ G, ∃g −1 ∈ G, the inverse of g, such that gg −1 = g −1 g = e

If ∗ is in addition commutative, then G is said to be abelian


ˆ Order (Element): The order of g ∈ G, denoted |g|, is the smallest n such that g^n = 1. That is,

|g| := min{n ∈ Z≥1 | g^n = 1}

if such an n exists. Else, we say |g| = ∞.


ˆ Order (Group): This is the same as that for a set; the order of a group is its cardinality.

ˆ Direct Product: Given groups (G, ∗), (H, ◦), their product is the set G × H equipped with a binary
operation ⊙ defined by
(a, b) ⊙ (c, d) := (a ∗ c, b ◦ d)

Basic Results:

Let G be a group.

ˆ e−1 = e for e the identity

ˆ The identity of G is unique (D&F, Prop. 1.1.1)

ˆ The inverse of a given element is unique (D&F, Prop. 1.1.1)


ˆ (g −1 )−1 = g for any g ∈ G (D&F, Prop. 1.1.1)
ˆ (xy)−1 = y −1 x−1 for any x, y ∈ G (D&F, Prop. 1.1.1)

ˆ a1 a2 · · ·an may be bracketed in any way with no ambiguity (D&F, Prop. 1.1.1)
ˆ au = av =⇒ u = v and ub = vb =⇒ u = v, for a, b, u, v ∈ G (D&F, Prop. 1.1.2)
ˆ The solutions x, y to ax = b and ya = b are unique (D&F, Prop. 1.1.2)
ˆ |x| = 1 ⇐⇒ x is the identity

ˆ |x| = |x^{-1}| = |gxg^{-1}|


ˆ |xy| = |yx|

Important Examples:

ˆ Dihedral Groups, D2n : Representative of the symmetries on a regular n-gon, rotations and reflec-
tions. May be represented by

D_{2n} := ⟨r, s | r^n = s^2 = 1, rs = sr^{-1}⟩

Here, r is a rotation counterclockwise by 2π/n radians, and s is a reflection about a line: this line
is that through the shape’s center, and a fixed vertex. (This line does not change after performing
actions on the shape.)
ˆ Symmetric Groups, Sn : SΩ is the set of bijections f : Ω → Ω, with operation of function composi-
tion. Sn is the case for Ω = {1, 2, · · ·, n}, with |Sn | = n!. We often write σ ∈ Sn with a cycle notation,
e.g.
σ = (1 3 2 4) ⇐⇒ σ(1) = 3, σ(3) = 2, σ(2) = 4, σ(4) = 1
with cycles of length 1 omitted (those numbers are fixed). Any σ ∈ Sn is a product (composition) of
disjoint such cycles (each are their own elements of Sn after all).
Some further notes for now:

◦ Disjoint cycles commute


◦ Sn is nonabelian for n ≥ 3
◦ |σ| is the lcm of the lengths of the cycles in its composition into disjoint cycles
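A short sketch of the last point (my own, storing a permutation 0-indexed as a tuple with sigma[i] the image of i):

    # |sigma| equals the lcm of the disjoint-cycle lengths (my own sketch).
    from math import lcm

    def cycle_lengths(sigma):
        # walk each unvisited point around its cycle, recording cycle lengths
        seen, lengths = set(), []
        for i in range(len(sigma)):
            if i not in seen:
                j, n = i, 0
                while j not in seen:
                    seen.add(j)
                    j, n = sigma[j], n + 1
                lengths.append(n)
        return lengths

    sigma = (1, 0, 3, 4, 2)             # the product of the cycles (0 1)(2 3 4)
    print(cycle_lengths(sigma))         # [2, 3]
    print(lcm(*cycle_lengths(sigma)))   # 6, the order of sigma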

ˆ General Linear Groups: A field is a set F with addition + and multiplication · as binary operations,
such that (F, +) and (F \{0}, ·) are abelian groups, satisfying the obvious distribution law. (We let
F × := F \{0}.)
A general linear group may be defined by

GLn (F ) := {all n × n matrices over F which are invertible}

This group is nonabelian for all n ≥ 2 and all fields F .


ˆ Quaternion Group, Q8 : (Not to be confused with the quaternion ring or Hamiltonians,
H.) We define
Q8 := {±1, ±i, ±j, ±k}
satisfying the relations

1a = a, (−1)^2 = 1, i^2 = j^2 = k^2 = −1, ij = k, jk = i, ki = j, kj = −i, ji = −k, ik = −j

Homomorphisms & Isomorphisms:
A structure-preserving map φ : G → H of groups (G, ∗), (H, ◦), i.e.

φ(x ∗ y) = φ(x) ◦ φ(y) or compactly φ(xy) = φ(x)φ(y)

is a homomorphism. If this map is a bijection, it is an isomorphism (G ≅ H). If G, H are the same
group as well, then it is an automorphism and φ ∈ Aut(G).
Some results:

ˆ About Homomorphisms: Here φ : G → H is a homomorphism

◦ φ(xn ) = φ(x)n for each x ∈ G, n ∈ Z (D&F, Prob. 1.6.1)


◦ ker(φ) := {x ∈ G | φ(x) = 1} ≤ G (D&F, Prob. 1.6.14)
◦ φ is injective iff ker(φ) is the trivial group (D&F, Prob. 1.6.14)

ˆ About Isomorphic Groups & Isomorphisms: Here, φ : G → H is an isomorphism and hence G ≅ H

◦ |G| = |H|
◦ G is abelian iff H is
◦ |x| = |φ(x)| for each x ∈ G
◦ G ≅ H iff, for each fixed order, they each have equally many elements of that order
◦ If G ≅ H, then they may be presented the same way (in the sense of ⟨· · · | · · ·⟩), with the same
generators and relations (up to the naming thereof).

Group Actions:
The group action of a group G on a set A is a mapping · : G × A → A such that

(i) g1 · (g2 · a) = (g1 g2 ) · a for each g1 , g2 ∈ G, a ∈ A


(ii) 1 · a = a for each a ∈ A

(When there is no danger of confusion, · may be replaced with concatenation.) A key example is matrix-
vector multiplication.
Some definitions and notes:

ˆ Each g ∈ G (when acting on A) induces a map σg : A → A with σg (a) := g · a. We note that σg is a


permutation of A, and the map g 7→ σg ∈ SA is a homomorphism.
ˆ If g ̸= h =⇒ σg ̸= σh , then we say the action is faithful

ˆ The kernel of the group action is {g ∈ G | g · a = a ∀a ∈ A}, those for whom the induced permutation
is the identity permutation.

§12.6: (Dummit & Foote, Chapter 2 ) Group Theory: Subgroups

Introduction:
For G a group and H ⊆ G, we say H (under the same operation) is a subgroup of G if x, y ∈ H =⇒ xy, x−1 , y −1 ∈ H.
(Note: given H ̸= ∅, this implies 1 ∈ H, etc., which are usually redundantly stated.) We write H ≤ G.
Some notes:

ˆ ⟨1⟩, G ≤ G for any G


ˆ Subgroup Test: H ̸= ∅ is a subgroup of G iff x, y ∈ H =⇒ xy −1 ∈ H (D&F, Prop. 2.1.1)

ˆ Lagrange’s Theorem: For H ≤ G a finite group, |H| |G| (D&F, Prob. 1.7.19)

Special Subgroups:

We define the following:

ˆ Centralizer: Given a group G and a set ∅ ̸= A ⊆ G, the centralizer of A in G is

C_G(A) := {g ∈ G | gag^{-1} = a ∀a ∈ A} = {g ∈ G | ga = ag ∀a ∈ A}

the set of all elements of G that commute with every element of A. We may write C_G(a) := C_G({a}).
We have CG (A) ≤ G.
ˆ Center: Given a group G, its center is

Z(G) := {g ∈ G | gx = xg ∀x ∈ G} ≡ CG (G)

so Z(G) ≤ G and Z(G) is an abelian group. (If G is abelian, Z(G) = G.)


ˆ Normalizer: We define gAg^{-1} := {gag^{-1} | a ∈ A}. Given a group G and A ⊆ G, the normalizer of
A in G is
N_G(A) := {g ∈ G | gAg^{-1} = A}

Note that C_G(A) ≤ N_G(A) ≤ G. The normalizer is looser than the centralizer: elements within
A need not be fixed by conjugation, only shuffled around within A.
ˆ Stabilizer: Suppose G is acting on S a set and s ∈ S. Then the stabilizer of s in G is

Gs := {g ∈ G | g · s = s}

We have Gs ≤ G.
ˆ (Group Action) Kernel: The kernel of the aforementioned group action is

{g ∈ G | ∀s ∈ S, g · s = s}

and is a subgroup of G.

Some notes:

ˆ CG (Z(G)) = G and hence NG (Z(G)) = G (D&F, Prob. 2.2.2)


ˆ A ⊆ B ⊆ G =⇒ CG (B) ≤ CG (A) (D&F, Prob. 2.2.3)
ˆ H ≤ G =⇒ H ≤ NG (H), CG (H) (D&F, Prob. 2.2.6)

§12.7: (Dummit & Foote, Chapter 3 ) Group Theory: Quotients; Homomor-
phisms

Basic Definitions:
Here, G, H are groups unless stated otherwise.

ˆ For φ : G → H a function, φ is a (group) homomorphism if φ(ab) = φ(a)φ(b) for all a, b ∈ G.


(Note that the operations in G and H may differ. If φ is additionally a bijection, we say φ is an
isomorphism and that G ≅ H.)
ˆ The kernel of φ is given by ker φ := φ−1 (eH ) = {g ∈ G | φ(g) = eH }.

ˆ Let φ : G → H be a homomorphism and have K := ker φ. The quotient (factor) group G/K is
defined by {gK}g∈G where gK := {gk}k∈K . One may in general define this for any K ⊴ G since such
K are the kernels of some homomorphism (cf. Theorem 3.1.3). The elements of G/K are called cosets
(aK is a left coset and Ka a right coset).

ˆ We say gng −1 is the conjugate of n ∈ N by g ∈ G, and likewise for gN g −1 . We say g normalizes


N if gN g −1 = N , and say N is a normal subgroup of G (N ⊴ G) if all elements of G normalize N ,
i.e. gN g −1 = N ∀g ∈ G.
ˆ For N ⊴ G, the homomorphism π : G → G/N given by π(g) := gN is the natural projection
homomorphism of G onto G/N . If H ≤ G/N , then π −1 (H) is the complete preimage of H under
π.
ˆ A group G is simple if |G| > 1 and N ⊴ G =⇒ N ∈ {⟨1⟩, G}.

ˆ For a group G, a sequence of subgroups

⟨1⟩ = N0 ⊴ N1 ⊴ · · · ⊴ Nk = G

is a composition series when N_{i+1}/N_i is simple for each i. These quotients are called the composi-
tion factors of G. By Jordan–Hölder, every finite group G ̸= ⟨1⟩ has a composition series, and its
composition factors are unique up to reordering and isomorphism.
ˆ A group G is said to be solvable if the composition factors are all abelian.

ˆ The alternating group An is a subgroup of Sn consisting of all even permutations in Sn .

Basic Results:

ˆ For a homomorphism φ : G → H: (D&F, Prop. 3.1.1)

◦ φ(1G ) = 1H (identity is preserved)


◦ φ(g n ) = φ(g)n for any g ∈ G, n ∈ Z
◦ ker φ ≤ G (in fact, ker φ ⊴ G) (D&F, Prob. 3.1.1)
◦ im φ ≤ H
◦ E ≤ H =⇒ φ−1 (E) ≤ G and E ⊴ H =⇒ φ−1 (E) ⊴ G (D&F, Prob. 3.1.1)

ˆ The following are equivalent: (D&F, Thm. 3.1.6)

◦ N ⊴G
◦ NG (N ) = G
◦ gN = N g ∀g ∈ G
◦ The operation on left cosets of N given by aN ∗ bN := (ab)N , with a, b ∈ G, makes the left cosets
into a group.
◦ gN g −1 ⊆ N for any g ∈ G (equality is not needed)
◦ ∃φ : G → H a homomorphism with N ≡ ker φ (D&F, Prop. 3.1.7)

ˆ On operations on normal subgroups:


◦ If {N_i}_{i∈I} have N_i ⊴ G for all i, then ∩_{i∈I} N_i ⊴ G. (D&F, Prob. 3.1.22)
◦ If {Ni }i∈I have Ni ⊴ G for all i, then ⟨Ni ⟩i∈I ⊴ G. (D&F, Prob. 3.1.23)
◦ If N ⊴ G and H ≤ G, then N ∩ H ⊴ H (D&F, Prob. 3.1.24)

ˆ Items on indices: |G : H| is the number of left cosets of H in G (cf. Lagrange’s theorem)

◦ If H, K ≤ G have finite indices in G, then (D&F, Thm. 3.2.10)


 
lcm(|G : H|, |G : K|) ≤ |G : H ∩ K| ≤ |G : H| · |G : K|

Then if these indices are coprime,

|G : H ∩ K| = |G : H| · |G : K|

◦ H ≤ K ≤ G =⇒ |G : H| = |G : K| · |K : H| (D&F, Prob. 3.2.11)


◦ For H ≤ G and N ⊴ G, G finite, if |H|, |G : N | are coprime, then H ≤ N . (D&F, Prob. 3.2.18)
◦ If N ⊴ G with G finite and |N |, |G : N | coprime, then N is the only subgroup of its order. (D&F,
Prob. 3.2.19)

ˆ The collection of all cosets of a given subgroup partitions the group.

ˆ If G = ⟨S⟩ and N ⊴ G, then G/N = ⟨sN⟩_{s∈S}. (D&F, Prob. 3.1.16)
ˆ G/Z(G) cyclic =⇒ G abelian (D&F, Prob. 3.1.36)
ˆ |G| = pq for prime p, q gives G abelian or Z(G) = ⟨1⟩. (D&F, Prob. 3.2.4)

ˆ xN, yN in G/N commute iff their commutator [x, y] := x−1 y −1 xy ∈ N (D&F, Prob. 3.1.40)
ˆ The commutator subgroup N := ⟨[x, y] | x, y ∈ G⟩ has N ⊴ G and G/N abelian. (D&F, Prob. 3.1.41)

ˆ For H, K ≤ G (H, K finite), with HK := {hk}h∈H,k∈K , then (D&F, Thm. 3.2.13)

|HK| = |H||K| / |H ∩ K|

and HK ≤ G iff HK = KH. (This is not the condition to be abelian.) (D&F, Prop. 3.2.14)
ˆ As a corollary: H, K ≤ G with H ≤ NG (K) gives HK ≤ G. In particular, K ⊴ G and H ≤ G gives
HK ≤ G. (D&F, Cor. 3.2.15)
ˆ If H ≤ G and g ∈ G, then gHg^{-1} ≤ G and |H| = |gHg^{-1}| (D&F, Prob. 3.2.5a)

ˆ If H ≤ G is the only subgroup of a given order in G, then H ⊴ G. (D&F, Prob. 3.2.5b)
ˆ For H ≤ G and g, g ′ ∈ G, if Hg = g ′ H, then Hg = gH and g ∈ NG (H). (D&F, Prob. 3.2.6)
ˆ If H, K ≤ G with H, K finite of coprime orders, then H ∩ K = ⟨1⟩. (D&F, Prob. 3.2.8)
ˆ If G is an abelian simple group, then G ≅ Z_p for p prime. (D&F, Prob. 3.4.1)

ˆ If n | |G|, a finite abelian group G has a subgroup of order n. (D&F, Prob. 3.4.4)
ˆ If G is solvable and H ≤ G and N ⊴ G, then H and G/N are solvable. (D&F, Prob. 3.4.5)
ˆ A finite group G is solvable iff, whenever n | |G| with n and |G|/n coprime, G has a subgroup of order n.

ˆ If N, G/N are solvable, G is solvable.

ˆ For a finite group G, these are equivalent: (D&F, Prob. 3.4.8)

◦ G is solvable
◦ The composition factors of G have prime order.
◦ G has a chain of subgroups
⟨1⟩ = H_0 ⊴ H_1 ⊴ · · · ⊴ H_s = G
with Hi+1 /Hi cyclic. (Difference from composition series is the cyclic condition.)
◦ G has a chain of subgroups
⟨1⟩ = N0 ⊴ N1 ⊴ · · · ⊴ Nt = G
with Ni ⊴ G for all i, and Ni+1 /Ni abelian for all i. (Difference from composition series lies with
Ni ⊴ G and the abelian (not simple) condition.)

ˆ For H ⊴ G with H nontrivial and G solvable, ∃A ≤ H nontrivial with A ⊴ G and A abelian. (D&F,
Prob. 3.4.11)

More Important Results:


ˆ Lagrange’s Theorem: For H ≤ G a finite group, |H| divides |G|, and the number of left (or right)
cosets of H in G is |G|/|H|. (D&F, Thm. 3.2.8)


We denote the index by |G : H| and let it be the number of these cosets, even in the infinite case.
Observe that |G/N | ≡ |G|/|N | in the finite case.

◦ Corollary: The order of each individual element divides that of G (take H := ⟨x⟩), i.e. |x| divides |G|.
Moreover, x^{|G|} = 1_G for all x ∈ G. (D&F, Cor. 3.2.9)


◦ Corollary: If G is of prime order p, then G is cyclic and G ≅ Z_p. (D&F, Cor. 3.2.10)
◦ Note: The converse is not necessarily true, i.e. n | |G| does not necessarily mean G has a subgroup
of order n.
◦ The converse is always true for abelian groups.

ˆ Partial Converse: Cauchy’s Theorem: If G is a finite group and p is a prime with p | |G|, then ∃x ∈ G
with |x| = p. (D&F, Thm. 3.2.11)
ˆ Stronger Partial Converse: Sylow’s Theorem: If |G| = p^α m, for α ∈ Z≥1, p prime, and p ∤ m, then
G has a subgroup of order p^α.
ˆ First Isomorphism Theorem: For φ : G → H a homomorphism: (D&F, Thm. 3.3.1)

◦ ker φ ⊴ G
◦ G/ ker φ ≅ im φ
◦ Corollary: φ is injective iff ker φ = ⟨1⟩
◦ Corollary: |G : ker φ| = |im φ|

ˆ Second/Diamond Isomorphism Theorem: For A, B ≤ G and A ≤ NG (B), we have (D&F, Thm.


3.3.18)

◦ AB ≤ G
◦ B ⊴ AB
◦ A∩B ⊴A
◦ AB/B ≅ A/(A ∩ B)

ˆ Third Isomorphism Theorem: For H, K ⊴ G with H ≤ K:

◦ K/H ⊴ G/H
◦ (G/H)/(K/H) ≅ G/K

ˆ Fourth Isomorphism Theorem (Lattice Theorem/Correspondence Theorem): Take G a


group and N ⊴ G. Let

G := {A | N ≤ A ≤ G}, the subgroups of G containing N


N := {S | S ≤ G/N }, all subgroups of G/N

Then there is a bijection


φ : G → N defined by φ(A) := A/N
Moreover, if A, B ∈ G,
◦ A ⊆ B ⇐⇒ A/N ⊆ B/N

◦ A ⊆ B then |B : A| = |B/N : A/N | (|B : A| is the number of cosets bA of A in B)
◦ ⟨A, B⟩/N = ⟨A/N, B/N ⟩ (⟨A, B⟩ the subgroup of G generated by A ∪ B)
◦ (A ∩ B)/N = (A/N ) ∩ (B/N )
◦ A ⊴ G ⇐⇒ A/N ⊴ G/N

Some discussion on Wikipedia here and ProofWiki here.


ˆ The elements of S_n are products of transpositions. A_n is simple for n ≥ 5.

§12.8: (Dummit & Foote, Chapter 4 ) Group Theory: More on Actions

Basic Definitions:
We assume G is a group acting on a nonempty set A if not stated otherwise.

ˆ There is a map, to each g ∈ G,

σg : A → A
a 7→ g · a
σg ∈ SA

which permutes A. On a higher level, ∃φ : G → SA a homomorphism sending each g ∈ G to its


permutation:

φ : G → SA
g 7→ σg

φ is called the permutation representation of the action.


ˆ Since there are equally many actions of G on A as homomorphisms G → S_A, we may say a permu-
tation representation of G is any such homomorphism, with a given action said to afford/induce
the representation it corresponds to (by the rule g · a := φ(g)(a)).
ˆ A group action has kernel
{g ∈ G | ∀a ∈ A, g · a = a}
and each a ∈ A has stabilizer
Ga := {g ∈ G | g · a = a}
We say the action is faithful if the kernel is ⟨1⟩.
ˆ The orbit of G containing a is given by Oa := {g · a}g∈G .

◦ We say the action is transitive if there is only one distinct orbit (hence, ∀a, b ∈ A we may find
g ∈ G where a = g · b).
◦ Suppose G is a transitive permutation group on A. A block is a nonempty B ⊆ A such that
∀σ ∈ G we either have
σ(B) := {σ(b)}b∈B = B or σ(B) ∩ B = ∅
◦ A transitive group G on A is said to be primitive if all blocks are trivial: size 1, or A itself.
◦ A transitive permutation group G on A is doubly transitive if ∀a ∈ A, Ga is transitive on
A − {a}.

ˆ Consider the group action of conjugation (with A = G):

g · a := gag^{-1} for a, g ∈ G

The orbits O_a := {gag^{-1}}_{g∈G} of this action are the conjugacy classes of G.

ˆ Aut(G) is the set of all isomorphisms G → G, known as automorphisms.


ˆ For fixed g ∈ G, conjugation by g (i.e. the automorphism h 7→ ghg −1 ) is called an inner automor-
phism. The collection of these is denoted Inn(G). (Note: Inn(G) ⊴ Aut(G).)
ˆ The outer automorphism group of G is Aut(G)/ Inn(G).

ˆ We say H ≤ G is characteristic in G (H char G) if σ(H) = H for all σ ∈ Aut(G). (Each automor-
phism of G sends H to itself, not necessarily fixing it or its elements.)
ˆ Sylow Theorem / p-group Stuff: G is a group and p is prime.

◦ If |G| = pα for α ∈ Z≥1 , we say G is a p-group


◦ If H ≤ G for any group G, and H is a p-group, we say H is a p-subgroup
◦ If |G| = p^α m with p ∤ m, and H ≤ G has |H| = p^α, then H is a Sylow p-subgroup of G
◦ The collection of Sylow p-subgroups of G is Sylp (G).

◦ The number of Sylow p-subgroups, |Syl_p(G)|, is denoted n_p or n_p(G).

Basic Results:

We assume G is a group acting on a nonempty set A if not stated otherwise.

ˆ If a, b ∈ A and b = g · a for a g ∈ G (hence, b ∈ Oa ), then (D&F, Prob. 4.1.1)


G_b = g G_a g^{-1}, and the action’s kernel is ∩_{g∈G} g G_a g^{-1}

ˆ If G is a finite group and p is the smallest prime dividing |G|, then any H ≤ G with |G : H| = p
satisfies H ⊴ G. (D&F, Cor. 4.2.5)
ˆ S ⊆ G has |G : NG (S)|-many conjugates, and s ∈ G has |G : CG (s)|-many conjugates. (D&F, Prop.
4.3.6)
ˆ Two elements/sets which are conjugate share their orders.

ˆ If H char G, then H ⊴ G.

ˆ If H is the only subgroup of G of a certain order, then H char G.

ˆ If K char H ⊴ G, then K ⊴ G.

More Important Results:

ˆ Cayley’s Theorem: Any group is isomorphic to a subgroup of some symmetric group. If |G| = n,
then G ≅ H for some H ≤ S_n. (D&F, Cor. 4.2.4)
ˆ The Class Equation: Let G be a finite group, and let gi ∈ Oi (i = 1, · · ·, r), where Oi are the distinct
conjugacy classes of G (orbits of the action of conjugation on G by G) not contained in the center
Z(G). Then (D&F, Thm. 4.3.7)
|G| = |Z(G)| + Σ_{i=1}^{r} |G : C_G(g_i)|

◦ Corollary: if |G| = pα for p prime and α ∈ Z≥1 , then Z(G) is nontrivial. (D&F, Thm. 4.3.8)
◦ Corollary: if |G| = p^2 for p prime, then G is abelian and G ≅ Z_{p^2} or G ≅ Z_p × Z_p. (D&F, Cor.
4.3.9)
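As a sanity check (my own brute-force sketch, not from Dummit & Foote, with permutations of {0, 1, 2} as tuples), the class equation for S_3 reads 6 = 1 + 2 + 3:

    # Verify |G| = |Z(G)| + sum of nontrivial class sizes for S_3 (my own sketch).
    from itertools import permutations

    G = list(permutations(range(3)))
    compose = lambda a, b: tuple(a[b[i]] for i in range(3))          # (a.b)(i) = a(b(i))
    inverse = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))  # a^{-1}

    def conj_class(x):
        return {compose(compose(g, x), inverse(g)) for g in G}

    center = [x for x in G if all(compose(g, x) == compose(x, g) for g in G)]
    classes = {frozenset(conj_class(x)) for x in G if x not in center}
    print(len(G), "=", len(center), "+", sorted(len(c) for c in classes))
    # prints: 6 = 1 + [2, 3]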

ˆ The Sylow Theorems: Let |G| = p^α m for p prime and p ∤ m. (D&F, Thm. 4.5.18)

◦ Syl_p(G) ̸= ∅ (every such group has a Sylow p-subgroup)


◦ If P ∈ Sylp (G) and Q ≤ G is a p-subgroup, then ∃g ∈ G with Q ≤ gP g −1 , i.e. p-subgroups are
contained in conjugates of Sylow p-subgroups. (Moreover, Sylow p-subgroups are conjugate to
each other.)
◦ The number of Sylow p-subgroups np satisfies

np ≡ 1 (mod p)

and np = |G : NG (P )| for any P ∈ Sylp (G). Thus np | m.
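For instance (a standard worked example, not from the text): if |G| = 12 = 2^2 · 3, then n_3 ≡ 1 (mod 3) and n_3 | 4 force n_3 ∈ {1, 4}, while n_2 ≡ 1 (mod 2) and n_2 | 3 force n_2 ∈ {1, 3}.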

ˆ For P ∈ Sylp (G), these are equivalent: (D&F, Cor. 4.5.20)

◦ np = 1
◦ P ⊴G
◦ P char G
◦ If every x ∈ X ⊆ G has |x| = p^{β_x} for some β_x ∈ N depending on x, then ⟨X⟩ is a p-group.

§12.9: (Dummit & Foote, Chapter 7 ) Ring Theory: Basic Definitions/Examples

Definitions Given:

ˆ Types of Rings:

◦ Ring: A ring (without unity) is a set R together with binary operations +, × (addition, multi-
plication) such that
(i) (R, +) is an abelian group
(ii) (R, ×) is closed and associative
(iii) Distribution holds: for each a, b, c ∈ R,

(a + b) × c = (a × c) + (b × c)
a × (b + c) = (a × b) + (a × c)

We say R is a commutative ring if multiplication commutes, and we say R has identity/
contains 1 if multiplication has an identity.
◦ Division Ring: A division ring or skew field is a ring R such that
(i) R has 1
(ii) 1 ̸= 0
(iii) Each r ∈ R̸=0 has a multiplicative inverse
If commutative, we call it a field .
◦ Integral Domain: A commutative unital nontrivial ring R is an integral domain if it contains
no zero divisors.
◦ Boolean Ring: A ring R is a Boolean ring if r^2 = r for all r ∈ R.

ˆ Types of Ring Elements:

◦ Zero Divisors: For a ∈ R̸=0, we say a is a zero divisor if ab = 0 or ba = 0 for some b ∈ R̸=0.

a ∈ R̸=0 is said to be a left zero divisor if ∃x ∈ R̸=0 with ax = 0
b ∈ R̸=0 is said to be a right zero divisor if ∃y ∈ R̸=0 with yb = 0
◦ Units: In a nontrivial unital ring R, u ∈ R is a unit if ∃v ∈ R with uv = vu = 1, i.e. if u is
invertible under multiplication. The collection of all units in R is denoted R× .
◦ Nilpotent: r ∈ R is said to be nilpotent if rn = 0 for some n ∈ Z+

ˆ Subring: For R a ring and S ⊆ R, we say S is a subring of R if it is a ring in its own right.
It suffices to show that S ̸= ∅, S ⊆ R, and x, y ∈ S =⇒ x − y, xy ∈ S.

ˆ Valuation Rings: For K a field (more generally, an integral domain), a discrete valuation on K
is a function ν : K × → Z such that
(i) ν(ab) = ν(a) + ν(b) (a homomorphism (K × , ·) → (Z, +))
(ii) ν is surjective
(iii) ν(x + y) ≥ min{ν(x), ν(y)} for any x, y ∈ K × where x + y ̸= 0
We then say that {x ∈ K × | ν(x) ≥ 0} ∪ {0} is the valuation ring of ν.
ˆ Certain types of rings:

◦ Polynomial Ring: Given a commutative unital ring R (or, more loosely, any rng), we let
R[x] := { Σ_{i=0}^{n} a_i x^i | n ∈ N and, for all i, a_i ∈ R }

with the obvious operations. This ring of polynomials is also a commutative unital ring.
Relatedly:
Formal Power Series: Given a ring R, we define the ring of power series (formal variable
x, coefficients in R) to be
R[[x]] := { Σ_{n=0}^{∞} a_n x^n | a_n ∈ R for each n }

Formal Laurent Series: Given a commutative unital ring R (or, often as in Dummit &
Foote, a field), we define the ring of formal Laurent series to be
R((x)) := { Σ_{n=N}^{∞} a_n x^n | N ∈ Z and a_n ∈ R for each n }

Note that this means p ∈ R((x)) may be written as a sum p = p_s + p_p with p_s ∈ R[[x]] and p_p ∈ R[1/x].

ˆ Matrix Rings: Given a ring R and n ∈ Z≥1 , we have the ring


n o
n
Mn (R) = Rn×n = Mn×n (R) = (ri,j )i,j=1 = {all n × n matrices over R}
ri,j ∈R

Note that Mn (R) need not be commutative (even in the extreme case of R a field).
We say A ∈ M_n(R) is a scalar matrix if all diagonal entries are the same constant, and all other
entries are 0. The scalar matrices form a subring of M_n(R) isomorphic to R.
We have the following subclasses (though usually for more special R, namely a field F, most commonly R or
C):

◦ General Linear Group: GLn (F) is the invertible n × n matrices over F:

GL_n(F) := { M ∈ F^{n×n} | det(M) ̸= 0 }
◦ Special Linear Group: SLn (F) is all n × n matrices of determinant 1 over F

SL_n(F) := { M ∈ F^{n×n} | det(M) = 1 }
◦ Orthogonal Group: On (F) is all orthogonal n × n matrices:

O_n(F) := { M ∈ F^{n×n} | M M^T = M^T M = I_{n×n} } = { M ∈ F^{n×n} | M^{-1} = M^T }
This is usually used for F ⊆ R. Note that if M ∈ On (F) then det(M ) = ±1.
◦ Unitary Group: The complex case of the previous, Un (F), uses unitary matrices:

U_n(F) := { M ∈ F^{n×n} | M M^* = M^* M = I_{n×n} } = { M ∈ F^{n×n} | M^{-1} = M^* }
This can be applied to F ⊆ C. Note that if M ∈ Un (F) then |det(M )| = 1.


◦ Special Orthogonal Group: This restricts On (F) to those of unit determinant:

SO_n(F) := { M ∈ F^{n×n} | M^{-1} = M^T and det(M) = 1 }
◦ Special Unitary Group: This likewise restricts Un (F) to those of unit determinant:

SU_n(F) := { M ∈ F^{n×n} | M^{-1} = M^* and det(M) = 1 }
ˆ Group Rings: Let R be a nontrivial unital commutative ring, and G := {g_i}_{i=1}^{n} a finite multiplicative
group. We can define the group ring RG of G with coefficients in R as all formal sums, as so:

RG := { Σ_{i=1}^{n} a_i g_i | a_i ∈ R }

If g1 is the identity of G, we may write a1 g1 as a1 ; the summand 1g may be written as g.


A special case is ZG, the integral group ring of G.
Addition in RG is defined componentwise:
Σ_{i=1}^{n} a_i g_i + Σ_{i=1}^{n} b_i g_i := Σ_{i=1}^{n} (a_i + b_i) g_i

Multiplication comes by

(agi ) · (bgj ) := (ab)gk where k is such that gk = gi gj

i.e.
(agi ) · (bgj ) := (ab)(gi gj )
and hence

(Σ_{i=1}^{n} a_i g_i) · (Σ_{j=1}^{n} b_j g_j) := Σ_{k=1}^{n} ( Σ_{g_i g_j = g_k} a_i b_j ) g_k

These make RG a ring, and it is commutative iff G is abelian.

Trivial Properties and Results:

ˆ In a ring R, the following hold:

◦ 0a = a0 = 0 (D&F, Prop. 7.1.1)


◦ (−a)b = a(−b) = −(ab) (D&F, Prop. 7.1.1)
◦ (−a)(−b) = ab (D&F, Prop. 7.1.1)
◦ If R is a unital ring, then 1 is unique, and −a ≡ (−1)a (D&F, Prop. 7.1.1)

ˆ In a unital ring R, the following hold:

◦ (−1)2 = 1 (D&F, Prob. 7.1.1)


◦ If u is a unit, so is −u (D&F, Prop. 7.1.2)
◦ If 1 ∈ S ≤ R and u ∈ S is a unit, then u is a unit in R (D&F, Prob. 7.1.3)
◦ If {S_i}_{i∈I} have S_i ≤ R for all i, then ∩_{i∈I} S_i ≤ R (D&F, Prob. 7.1.4)

ˆ For R a commutative ring:

◦ If r is nilpotent (rn = 0 for an n ∈ Z+ ):


r = 0 or r is a zero divisor (D&F, Prob. 7.1.14)
ar is nilpotent for any a ∈ R (D&F, Prob. 7.1.14)
1 + r is a unit (D&F, Prob. 7.1.14)
The sum of a nilpotent & a unit is a unit (D&F, Prob. 7.1.14)

ˆ For R a division ring:

◦ Z(R) is a field (D&F, Prob. 7.1.7)


◦ C(r) is a division ring for any r ∈ R (D&F, Prob. 7.1.10)

ˆ For a discrete valuation ring R over a field K: (D&F, Prob. 7.1.26)

◦ R ≤ K, and has its identity


◦ If x ∈ K̸=0 , then x ∈ R or x−1 ∈ R
◦ x ∈ R is a unit iff ν(x) = 0

ˆ A zero divisor is never a unit. Moreover, in fields, F × = F − {0}, and fields have no zero divisors.
ˆ Let a, b, c ∈ R a ring, with a not a zero divisor. Then ab = ac =⇒ a = 0 or b = c (D&F, Prop. 7.1.2)
ˆ Any finite integral domain is a field (D&F, Cor. 7.1.3)
ˆ Wedderburn’s Little Theorem: a finite division ring must commute, and hence be a field
ˆ All Boolean rings are commutative
ˆ Results for polynomial rings:

◦ (R an integral domain) For p, q ∈ R[x], deg(p · q) = deg(p) + deg(q) (D&F, Prop. 7.2.4)
◦ (R an integral domain) The units of R[x] are those of R (D&F, Prop. 7.2.4)
◦ (R an integral domain) R[x] is an integral domain (D&F, Prop. 7.2.4)
◦ (R an integral domain) If R has no zero divisors, then neither does R[x] (D&F, Prop. 7.2.4)
◦ (R a commutative ring) If f g = 0 for a nonzero g ∈ R[x], then cf = 0 for some nonzero c ∈ R (D&F,
Prop. 7.2.4)

◦ (R an integral domain) S ≤ R =⇒ S[x] ≤ R[x] (D&F, Prop. 7.2.4)
◦ (R a comm. ring with 1) p ∈ R[x] is a zero divisor iff ∃b ∈ R̸=0 with bp = 0.

ˆ Results for the power series ring R[[x]] over R a commutative unital ring:

◦ R[[x]] is a commutative unital ring (D&F, Prob. 7.2.3)


◦ Σ_{n=0}^{∞} a_n x^n ∈ R[[x]] is a unit in R[[x]] iff a_0 is a unit in R (D&F, Prob. 7.2.3)
◦ If R is an integral domain, so is R[[x]] (D&F, Prob. 7.2.4)

ˆ Results for the Laurent series ring F ((x)) over a field F :

◦ F ((x)) is a field (D&F, Prob. 7.2.5)


◦ The map ν : F((x))^× → Z given by

ν( Σ_{n=N}^{∞} a_n x^n ) := N (where a_N ̸= 0), the order of the zero/pole of the series at 0,

is a discrete valuation with valuation ring F[[x]]. (D&F, Prob. 7.2.5)

ˆ Results for matrix rings:

◦ Z(Mn (R)) is the set of all scalar matrices (D&F, Prob. 7.2.7)
◦ Strictly upper triangular matrices are nilpotent in Mn (R), n ≥ 2 (D&F, Prob. 7.2.8)

§12.10: (Dummit & Foote, Chapter 7 ) Ring Theory: Homomorphisms, Quo-
tients, Ideals

Basic Definitions:

ˆ Homomorphisms: A function of rings φ : R → S is a (ring) homomorphism when, ∀a, b ∈ R,

φ(a + b) = φ(a) + φ(b)


φ(a · b) = φ(a) · φ(b)

◦ We define the kernel by ker φ := φ−1 (0) := {r ∈ R | φ(r) = 0}, as in the additive group sense.
◦ If φ is a bijection, it is an isomorphism.

ˆ Quotient Ring: Let φ : R → S have ker φ = I. Then the fibers of φ are the additive cosets of the
kernel, so we define

R/I := {r + I}r∈R (r + I) + (s + I) := (r + s) + I (r + I) · (s + I) := (r · s) + I

These define a ring, and we call R/I the quotient ring . Note this is fundamentally just the quotient
of the additive groups, with I ⊴ R.
We can equivalently define the quotient R/I for any ideal I, since I is an ideal iff it is a kernel.

ˆ Ideals: Given a ring R and I ≤ R, we have the following kinds of ideals.


We say R is simple if its only ideals are 0 and itself.

◦ Left Ideal: If rI := {ri}_{i∈I} ⊆ I for every r ∈ R, we say I is a left ideal.

◦ Right Ideal: If Ir := {ir}_{i∈I} ⊆ I for every r ∈ R, we say I is a right ideal.
◦ Two-Sided Ideal: If it is both, we say I is a (two-sided) ideal and write (nonstandardly)
I ⊴ R, given the analogies to normal subgroups.
◦ Generated Ideal: For A ⊆ R, (A) denotes the ideal generated by A, the smallest ideal
containing A. Specifically,

(A) = ∩_{A ⊆ I ⊴ R} I

If R is commutative, then (A) = AR = RA = RAR where


AR := { Σ_{i=1}^{n} a_i r_i | n ∈ Z^+, a_i ∈ A, r_i ∈ R }, RA := { Σ_{i=1}^{n} r_i a_i | n ∈ Z^+, a_i ∈ A, r_i ∈ R }, RAR := { Σ_{i=1}^{n} r_i a_i s_i | n ∈ Z^+, a_i ∈ A, r_i, s_i ∈ R }

RA is the left ideal generated by A, and AR the right ideal generated by A.


Less symbolically, in a commutative ring, (a) will be all R-multiples of a: {ar}r∈R . For R non-
commutative this need not be true: it will be all finite sums of elements of the type ras
(r, s ∈ R).
◦ Principal Ideal: An ideal is principal if generated by a singleton.
◦ Maximal Ideal: M ⊴ R is maximal if M ̸= R and M ⊴ I ⊴ R =⇒ M = I or I = R. (The
only ideals containing M are M and R.) If R is commutative and this maximal ideal is unique, we
say R is local.
◦ Prime Ideal: In R commutative, P ⊴ R is prime if P ̸= R and ab ∈ P =⇒ a ∈ P or b ∈ P .

ˆ Sum of Ideals: Given I, J ⊴ R, define I + J := {i + j}i∈I,j∈J
We say I, J ⊴ R are comaximal ideals if I + J = R.
ˆ Product of Ideals: Given I, J ⊴ R, define
IJ := { Σ_{k=1}^{n} i_k j_k | n ∈ N, i_k ∈ I, j_k ∈ J }

and, for n ∈ Z≥1


I n := {all finite sums of the type a1 · · ·an for ai ∈ I}
Equivalently, I n = II n−1 with I 1 = I.

◦ Nilpotent Ideal: N ⊴ R is said to be nilpotent if N n = ⟨0⟩ for some n ∈ Z≥1 .

ˆ Evaluation Map: For A a ring, X ̸= ∅, and R the set of all functions f : X → A, the evaluation
at c map is given by
Ec : R → A with Ec (f ) := f (c)
This is a ring homomorphism with R/ ker E_c ≅ A.
ˆ Annihilator: For a ring R and r ∈ R:

◦ The right annihilator of r is given by r.Ann_R(r) := {x ∈ R | rx = 0}
◦ The left annihilator of r is given by ℓ.Ann_R(r) := {y ∈ R | yr = 0}
◦ Both may be generalized to the annihilator of a set. For instance, the left annihilator of S is
ℓ. AnnR (S) := {r ∈ R | ∀s ∈ S, rs = 0}.
◦ If L is a left ideal of R then ℓ. AnnR (L) ⊴ R.

ˆ Characteristic of a Ring: Given a ring R, its characteristic char(R) is the minimum n such that
n1 = 0, i.e.
char(R) := min{ n ∈ Z≥1 | 1 + 1 + · · · + 1 (n times) = 0 } = min{ n ∈ Z≥1 | Σ_{i=1}^{n} 1 = 0 }

If said minimum does not exist, we say char(R) = 0.


ˆ Nilradical: Given a commutative ring R, its nilradical is its nilpotent elements:

N(R) := {r ∈ R | r is nilpotent, i.e. rn = 0 for some n ∈ N}

ˆ Radical of Ideal: For I ⊴ R commutative, we let

rad(I) := { r ∈ R | r^n ∈ I for some n ∈ Z^+ }

be the radical of I. We say I itself is a radical ideal if rad I = I.


ˆ Jacobson Radical: For I ⊴ R a commutative ring,
Jac(I) := ∩ { M ⊴ R | M is maximal and I ⊆ M }

with the convention Jac(R) := R. We call Jac(0) the Jacobson radical of R.

Basic Results:

ˆ In a commutative ring with unity, the binomial theorem holds as usual: (D&F, Prob. 7.2.25)
(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^{n−k}

with the binomial coefficient interpreted as an integer.


ˆ For φ : R → S a ring homomorphism:

◦ im φ ≤ S (D&F, Prop. 7.2.5)


◦ ker φ ≤ R and r · ker φ, ker φ · r ⊆ ker φ (defined pointwise) (D&F, Prop. 7.2.5)
−1
◦ J ⊴ S =⇒ φ (J) ⊴ R (D&F, Prob. 7.2.24)
◦ If φ is surjective, then I ⊴ R =⇒ φ(I) ⊴ S (D&F, Prob. 7.2.24)
◦ If r ∈ R is nilpotent, φ(r) ∈ S is too (D&F, Prob. 7.2.32)
◦ For R a field, φ must be injective (D&F, Cor. 7.4.10)

ˆ If S ≤ R and I ⊴ R, then S ∩ I = ⟨0⟩ =⇒ (S + I)/I ≅ S (i.e. S embeds in R/I) (D&F, Prob. 7.2.23)
ˆ On ring characteristics:

◦ In a commutative ring of prime characteristic p, (a + b)^p = a^p + b^p (D&F, Prob. 7.2.26)


◦ A nontrivial Boolean ring R has char(R) = 2 (D&F, Prob. 7.2.27)
◦ An integral domain R has char(R) = p for p a prime or 0 (D&F, Prob. 7.2.28)

ˆ On ring nilradicals: for R a commutative ring

◦ N(R) ⊴ R (D&F, Prob. 7.2.29)



◦ The only nilpotent element in R/N(R) is 0, i.e. N(R/N(R)) = {0} (D&F, Prob. 7.2.30)
ˆ On polynomials: for a commutative ring R and p(x) = Σ_{i=0}^{n} a_i x^i ∈ R[x]:

◦ p ∈ R[x] is a unit iff a0 is a unit and a1 , · · ·, an are nilpotent (D&F, Prob. 7.2.33)
◦ p ∈ R[x] is nilpotent iff all coefficients ai are nilpotent (D&F, Prob. 7.2.33)

ˆ On sums & products of ideals, for I, J, K ⊴ R

◦ I + J ⊴ R (D&F, Prob. 7.2.34)
◦ I + J = ∩_{S ⊴ R, I,J ⊆ S} S (the smallest ideal containing both) (D&F, Prob. 7.2.34)

◦ IJ ⊴ I ∩ J (D&F, Prob. 7.2.34)


◦ If R commutes and I + J = R, then IJ = I ∩ J (D&F, Prob. 7.2.34)
◦ I(J + K) = IJ + IK (D&F, Prob. 7.2.35)
◦ (I + J)K = IK + JK (D&F, Prob. 7.2.35)
◦ J ⊆ I implies I ∩ (J + K) = J + (I ∩ K) (D&F, Prob. 7.2.35)

ˆ More properties of ideals:

◦ For I ⊴ R (R unital), I = R iff I contains a unit (D&F, Prop. 7.4.9)


◦ For R a commutative ring, R is a field iff its only ideals are 0 and R (D&F, Prop. 7.4.9)

ˆ On maximal ideals:

◦ In a unital ring, all proper ideals are contained in a maximal ideal (D&F, Prop. 7.4.11)
◦ In a ring R, M ⊴ R is maximal iff R/M is a field (D&F, Prob. 7.4.5)
◦ In a commutative ring R, R is a field iff 0 is a maximal ideal (D&F, Prob. 7.4.4)
◦ In a commutative unital ring R, (x) ∈ R[x] is maximal iff R is a field (D&F, Prob. 7.4.7)
◦ For φ : R → S a surjective homomorphism of commutative rings, and M ⊴ S maximal, then
φ−1 (M ) ⊴ R is maximal.

ˆ On prime ideals, for R a commutative unital ring:

◦ P ⊴ R is prime iff R/P is an integral domain (D&F, Prop. 7.4.13)


◦ Every maximal ideal of R a commutative ring is a prime ideal (D&F, Cor. 7.4.14)
◦ (x) ∈ R[x] is prime iff R is an integral domain (D&F, Prob. 7.4.7)
◦ If P ⊴ R is prime and contains no zero divisors, then R is an integral domain (D&F, Prob. 7.4.10)
◦ If I, J, P ⊴ R with P prime and containing IJ, then I ⊆ P or J ⊆ P (D&F, Prob. 7.4.11)
◦ For φ : R → S a homomorphism of commutative rings, and P ⊴ S prime, then φ−1 (P ) either is
R or a prime ideal of it. (D&F, Prob. 7.4.13)
◦ N(R) ⊆ P for every P ⊴ R prime (D&F, Prob. 7.4.26)
◦ Prime ideals are radical (D&F, Prob. 7.4.31)
◦ In any given R, there is a minimal prime ideal w.r.t. inclusion (D&F, Prob. 7.4.36)

ˆ If any of the following are true, prime and maximal ideals in R coincide:

◦ R is finite (D&F, Prob. 7.4.19)


◦ R is Boolean (D&F, Prob. 7.4.23)
◦ R is commutative and each r ∈ R satisfies r^{n_r} = r for some integer n_r > 1 (D&F, Prob. 7.4.25)

ˆ On radicals and ideals: for R a commutative ring,

◦ rad I ⊴ R (D&F, Prob. 7.4.30)


◦ rad I ⊇ I (D&F, Prob. 7.4.30)
◦ (rad I)/I = N(R/I) (D&F, Prob. 7.4.30)
◦ Prime ideals are radical (D&F, Prob. 7.4.31)

ˆ On Jacobson radicals / Jac(I):

◦ I ⊆ rad(I) ⊆ Jac(I) ⊴ R (D&F, Prob. 7.4.32)

ˆ A nonzero finitely generated ideal I ⊴ R admits an ideal B ⊴ R which is maximal w.r.t. the property of not containing I. (D&F, Prob. 7.4.35)

Important Results:

ˆ First Isomorphism Theorem (Fundamental Isomorphism Theorem): For φ : R → S a ring


homomorphism,
◦ ker(φ) ⊴ R
◦ im(φ) ≤ S
◦ im(φ) ≅ R/ker(φ)

ˆ Second Isomorphism Theorem: For A ≤ R, B ⊴ R:

◦ A + B := {a + b | a ∈ A, b ∈ B} ≤ R
◦ A∩B ⊴A
◦ (A + B)/B ≅ A/(A ∩ B)
ˆ Third Isomorphism Theorem: For I, J ⊴ R and I ⊆ J:

◦ J/I ⊴ R/I
◦ (R/I)/(J/I) ≅ R/J
ˆ Fourth Isomorphism Theorem (Lattice Theorem/Correspondence Theorem): Take I an
ideal of R a ring. The map A 7→ A/I is an inclusion-preserving bijection between the set of subrings
of R containing I, and the set of subrings of R/I.
Moreover, if A is a subring containing I, it is an ideal of R iff A/I is an ideal of R/I.

§12.11: (Dummit & Foote, Chapter 8 ) Ring Theory: Domains (Euclidean, PIDs,
UFDs)

Definitions Given:

ˆ Norm on Integral Domain: Given R an integral domain, a norm is a function N : R → Z≥0 with
N (0) = 0. If r ̸= 0 =⇒ N (r) > 0, then N is a positive norm.

◦ Dedekind-Hasse Norm: For N a positive norm, N is a Dedekind-Hasse norm if, ∀a, b ∈ R,


then a ∈ (b) or ∃x ∈ (a, b) nonzero with N (x) < N (b). That is, either b | a or ∃s, t ∈ R with
0 < N (sa − tb) < N (b).

ˆ Euclidean Domain: An integral domain R is said to be a Euclidean domain and to possess a


division algorithm if ∃N a norm on R such that,
 
(∀a ∈ R)(∀b ∈ R̸=0 )(∃q, r ∈ R) a = qb + r where r = 0 or N (r) < N (b)

We call q the quotient and r the remainder of the division.


Note that this gives rise to the Euclidean algorithm: divide repeatedly,

a = q0 b + r0
b = q1 r0 + r1
r0 = q2 r1 + r2
⋮
r_{n−2} = q_n r_{n−1} + r_n
r_{n−1} = q_{n+1} r_n

Here, r_n is the last nonzero remainder; since N(b) > N(r0) > N(r1) > · · · > N(r_n), the process must terminate eventually. Moreover, r_n = gcd(a, b).
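For example, in Z with N(r) = |r|:

1071 = 2 · 462 + 147
462 = 3 · 147 + 21
147 = 7 · 21 + 0

so gcd(1071, 462) = 21, the last nonzero remainder.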
ˆ Bezout Domain: For R an integral domain, we say R is a Bezout domain when each ideal generated
by two elements is principal, i.e. ∀a, b ∈ R, (a, b) = (c) for some c ∈ R.
ˆ Principal Ideal Domain (PID): A principal ideal domain is an integral domain with every ideal
being principal.

ˆ Unique Factorization Domain (UFD): A unique factorization domain is an integral domain
R such that, for all r ∈ R (r a nonzero non-unit),

(i) r may be written in the form r = p1 p2 · · ·pn for irreducibles pi ∈ R, not necessarily distinct
(ii) This product is unique up to associates

ˆ Universal Side Divisors: Let R̃ := R^× ∪ {0}, the units of R together with 0. Then u ∈ R − R̃ is said to be a universal side divisor if

(∀x ∈ R)(∃z ∈ R̃)( u | (x − z) )

Thus there is a sort of “division algorithm” for each u: any x may be written in the form

x = qu + z

for z zero or a unit.


ˆ (Greatest Common) Divisor: In a commutative ring R, take a, b ∈ R with b ̸= 0.

◦ We say a is a multiple of b, or b divides a, denoted b | a, if ∃x ∈ R such that a = bx.


◦ d is a greatest common divisor of a, b if
(i) d ̸= 0
(ii) d|a
(iii) d|b
(iv) δ | a and δ | b =⇒ δ | d
For such d, we may write d = gcd(a, b) or d = (a, b).
◦ If 1 is a greatest common divisor of a, b, we say a, b are coprime or relatively prime

ˆ Least Common Multiple: In a commutative unital ring R with a, b ∈ R̸=0 , a least common
multiple of a, b is an ℓ ∈ R such that

(i) a | ℓ
(ii) b | ℓ
(iii) If a | λ and b | λ, then ℓ | λ

We may write ℓ = lcm(a, b) = [a, b].


ˆ Irreducible Element: For r ∈ R an integral domain, suppose r is a nonzero non-unit. If r = ab for a, b ∈ R forces a or b to be a unit, then r is irreducible. Otherwise, it is reducible.
ˆ Prime Element: For p ∈ R an integral domain, p is prime if (p) is a prime ideal. Hence, p is prime
iff p is a non-unit and p | ab =⇒ p | a or p | b (or both) for any a, b ∈ R.
ˆ Associate Elements: For a, b ∈ R, if a = ub for a unit u, we say a, b are associates in R.

Basic Results:

ˆ The following are Euclidean domains:

◦ Any field, under any norm


◦ Z under the norm N (r) := |r|
◦ For F a field, F [x] under the norm N (p) := deg(p). The division algorithm is just long polynomial
division.
◦ Certain quadratic integer rings (not all of them; e.g. the Gaussian integers Z[i] are Euclidean).
◦ Discrete valuation rings

ˆ The following are PIDs:

◦ Z

ˆ The following are UFDs:

◦ All fields
◦ All PIDs
◦ Z (Fundamental Theorem of Arithmetic) (D&F, Cor. 8.3.15)
◦ F [x], for F a field
◦ R[x] for R a UFD

ˆ Results on divisors & GCDs, assuming R is an integral domain and the elements live therein:

◦ b | a ⇐⇒ a ∈ (b) ⇐⇒ (a) ⊆ (b)


◦ If d | a and d | b, then (a), (b) ⊆ (d) so (a, b) ⊆ (d)
◦ If d = gcd(a, b), then (a, b) ⊆ (d); if (a, b) ⊆ (δ), then (d) ⊆ (δ)
◦ If (a, b) = (d) for a, b, d ̸= 0, then d = gcd(a, b) (D&F, Prop. 8.1.2)
◦ If (d) = (δ), then δ = ud for a unit u ∈ R (D&F, Prop. 8.1.3)
◦ If d, δ are both gcd’s of a, b, then δ = ud for a unit u ∈ R (D&F, Prop. 8.1.3)
◦ Let n_x denote the number of digits of x. Then the Euclidean algorithm in Z terminates within O(5 · n_{min{a,b}}) division steps. (This is logarithmic in min{a, b}, since the number of digits of x grows like log x.)
◦ In a Euclidean domain R, if d = gcd(a, b) is generated by the division algorithm, then d is a
R-linear combination of a, b, i.e. ∃x, y ∈ R such that d = ax + by. (D&F, Thm. 8.1.4)
The x, y need not be unique. Indeed, if gcd(a, b) = ax0 + by0, then we may generate other solutions x, y to gcd(a, b) = ax + by by the rule

(x, y) = ( x0 + m · b/gcd(a, b), y0 − m · a/gcd(a, b) ) for m ∈ Z
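For example, continuing the computation above: back-substitution gives 21 = gcd(1071, 462) = 1071 · (−3) + 462 · 7, and the rule with m = 1 yields another solution (x, y) = (−3 + 462/21, 7 − 1071/21) = (19, −44); indeed 1071 · 19 + 462 · (−44) = 21.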

ˆ Results on least common multiples:

◦ If ℓ = lcm(a, b) exists, it is a generator for the unique largest principal ideal contained in (a) ∩ (b) (D&F, Prob. 8.1.11)
◦ In a Euclidean domain, any pair of elements have a LCM, up to multiplication by a unit (D&F,
Prob. 8.1.11)
◦ In a Euclidean domain, lcm(a, b) = ab/ gcd(a, b) (D&F, Prob. 8.1.11)

ˆ Results on Euclidean domains:

◦ Euclidean domains are UFDs (D&F, Thm. 8.3.14)
◦ All ideals in Euclidean domains are principal ideals (D&F, Prop. 8.1.1)
◦ A nonfield Euclidean domain has universal side divisors. (D&F, Prop. 8.1.5)
◦ Let m := min_{r ∈ R≠0} N(r). If N(r) = m, then r is a unit. Likewise, if r ≠ 0 has N(r) = 0, it is a unit. (D&F, Prob. 8.1.3)


◦ If gcd(a, b) = 1 and a | bc, then a | c. (D&F, Prob. 8.1.4)
◦ If a | bc for a, b ≠ 0, then a/gcd(a, b) divides c. (D&F, Prob. 8.1.4)

ˆ For an integral domain R, if these two hold, then R is a PID: (D&F, Prob. 8.2.4)

(i) a, b ∈ R̸=0 has a gcd d which may be written as d = ra + sb for some r, s ∈ R


(ii) If · · · | a3 | a2 | a1 (i.e. each a_{i+1} divides a_i) for ai ∈ R≠0, then ∃N ∈ Z≥1 such that an = un aN for n ≥ N and un a unit

ˆ In an integral domain R, if every prime ideal is principal, R is a PID (D&F, Prob. 8.2.6)
ˆ Results on PIDs:

◦ PIDs are UFDs (D&F, Thm. 8.3.14)


◦ Nonzero prime ideals in PIDs are maximal (D&F, Prop. 8.2.7)
◦ If R is a commutative ring with R[x] a PID (or Euclidean domain), then R is a field (D&F, Cor.
8.2.8)
◦ An integral domain R is a PID iff R has a Dedekind-Hasse norm (D&F, Prop. 8.2.9)
◦ The ideals (a), (b) are comaximal iff 1 is a gcd of a, b (D&F, Prob. 8.2.1)
◦ If a, b ∈ R a PID, then lcm(a, b) exists (D&F, Prob. 8.2.2)
◦ For R a PID and P ⊴ R prime, then R/P is a PID (D&F, Prob. 8.2.3)
◦ PIDs have multiplicative Dedekind-Hasse norms on them (D&F, Prob. 8.3.16)

ˆ Results on Bezout domains:

◦ R an integral domain is a Bezout domain iff each a, b ∈ R has a gcd d ∈ R which we can write as

d = ax + by

for some x, y ∈ R. (D&F, Prob. 8.2.7)


◦ All finitely generated ideals of a Bezout domain are principal. (D&F, Prob. 8.2.7)

ˆ In an integral domain, prime elements are irreducible; in PIDs & UFDs, they coincide. (D&F, Prop.
8.3.10-12)
ˆ Take a, b ∈ R a UFD, with (D&F, Prop. 8.3.13)
a = u ∏_i p_i^{e_i}    b = v ∏_i p_i^{f_i}

for distinct primes p_i, integers e_i, f_i ∈ Z≥0, and units u, v. Then a gcd of a, b is given by

d := ∏_i p_i^{min{e_i, f_i}}

or d = 1 if ei = fi = 0 for all i.
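For example, in Z: a = 12 = 2^2 · 3 and b = 18 = 2 · 3^2 give d = 2^{min{2,1}} · 3^{min{1,2}} = 2 · 3 = 6, and correspondingly lcm(12, 18) = 12 · 18/6 = 36.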

§12.12: (Dummit & Foote, Chapter 13 ) Field Theory: Basics of Field Extensions

Reiteration of Old Definitions:

ˆ Field: A field is a commutative unital ring (F, +, ·) with F̸=0 all invertible elements. Hence, (F, +)
and (F≠0, ·) are abelian groups. (We identify F^× as the set of invertible elements, so in fields F^× = F≠0 := F − {0}.)
Note that in fields, we have 0 ̸= 1.

ˆ Characteristic: The characteristic of a field F is denoted char(F ) (as for rings in Dummit & Foote)
or ch(F ) (as for fields in Dummit & Foote). We say
ch(F) := min{ n ∈ Z≥1 | n · 1_F := ∑_{i=1}^{n} 1_F = 0 }

or ch(F) = 0 if said minimum does not exist.


As some examples, ch(Q) = ch(R) = 0 and ch(Z/pZ) = ch(Z/pZ[x]) = p for primes p.
ˆ Polynomial Ring: Given any commutative ring R, R[x] is the collection of polynomials in variable
x and coefficients in R. This holds even for fields.
ˆ Field of Rational Functions (cf. Section 7.5): Given F a field, F (x) denotes the field of rational
functions. That is,

F(x) := { p(x)/q(x) | p, q ∈ F[x], q ≠ 0 }

with the obvious operations. This is the field of fractions of F[x].

ˆ Field Homomorphism: Given fields F, K, a homomorphism is a mapping φ : F → K such that

φ(x + y) = φ(x) + φ(y)


φ(xy) = φ(x)φ(y)

i.e. it is just a homomorphism of the pair as rings, since fields are rings. Of course, an isomorphism is a bijective homomorphism. Here, the textbook begins to denote isomorphisms by φ : F −∼→ K.

ˆ Eisenstein’s Criterion (cf. Section 9.4): Let P ⊴ R be prime, and let f ∈ R[x] take the form
f(x) = x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0 (monic, n ≥ 1)

Suppose further that R is an integral domain, a_i ∈ P for each i ≤ n − 1, and a_0 ∉ P^2. Then f is irreducible in R[x].
In the more familiar case of Z, we might have
In the more familiar case of Z, we might have

f(x) = ∑_{i=0}^{n} a_i x^i where a_i ∈ Z

Then if, for some prime p, these conditions apply:

(i) p | a_i for i = 0, 1, 2, · · ·, n − 1
(ii) p ∤ a_n
(iii) p^2 ∤ a_0

then f is irreducible over Q (and Z). That is, it cannot be factored into nontrivial polynomials over these rings.
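For example, f(x) = x^5 − 2 with p = 2: here 2 divides a_0 = −2 and a_1 = · · · = a_4 = 0, while 2 ∤ a_5 = 1 and 2^2 = 4 ∤ −2, so f is irreducible over Q.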

Basic (New) Definitions:

ˆ Prime Subfield: Given a field F , its prime subfield is that generated by its identity, i.e. (1F ). It is
isomorphic to Q for ch(F ) = 0, or Fp for ch(F ) = p.
ˆ Field Extension: Suppose F, K are fields with F ≤ K (i.e. F is a subfield - or subring - of K). Then
K is said to be an extension (field) of F .
This relation is denoted by K/F - meaning “K over F ” and not quotients.
ˆ Degree of Field Extension: Given K/F , then the degree, relative degree, or index of the
extension is denoted [K : F ] and given by

[K : F] := dim_F (K)

(This is well-defined as K/F =⇒ K is a vector space over the field F .) We say that the extension is
finite if the degree is, and infinite otherwise.
ˆ Generated Field: Let K/F and let α1 , · · ·, αn ∈ K. Then the smallest subfield of K containing F ,
α1 , α2 , · · ·, αn−1 , and αn is the field generated by α1 , · · ·, αn over F and denoted F (α1 , · · ·, αn ).
ˆ Simple Extension; Primitive Element: If K/F has K = F(α) for an α ∈ K, then we say K is a simple extension of F, and that the α in question is a primitive element for the extension.

Basic Results:

We assume F, K are fields throughout.

ˆ ch(F ) is prime or 0 for every field (follows from being integral domains). (D&F, Prop. 13.1.1)
Moreover, if ch(F) = p, then p · a := a + a + · · · + a (p summands) = 0 for each a ∈ F.

ˆ If φ : F → K is a field homomorphism, then φ ≡ 0 or φ(F) ≅ F. (D&F, Prop. 13.1.2)

More Important Results:


We again assume F, K are fields.

ˆ Let p ∈ F [x] be irreducible. Then ∃K a field with a subfield isomorphic to F , where K contains a root
of p(x). Hence, F has an extension field in which p has a root. (D&F, Thm. 13.1.3)
ˆ Let p ∈ F [x] be irreducible with deg(p) = n, and K := F [x]/(p(x)), a field. Let θ := x mod p(x) lie in
K. Then {θ^k}_{k=0}^{n−1} is a basis of K (as an F-vector space), so

[K : F ] = n

and K = {f ∈ F [θ] | deg f < n}. (D&F, Thm. 13.1.4)


Hence, F [x]/(f (x)) is simply all polynomials of degree < deg(f ) in the variable r for r a root of f .
One may add polynomials in this set as normal. To maintain closure, find the usual product a(x)b(x),
and then find the representative of degree < n in the coset a(x)b(x) + (p(x)) by finding a(x)b(x)/p(x)
and taking the remainder. This makes it a field.

ˆ Let us have the extension K/F and let p ∈ F [x] be irreducible and r a root lying in K. Then (D&F,
Thm. 13.1.6)
F(r) ≅ F[x]/(p(x))

If deg(p) = n then F(r) = {f ∈ F[r] | deg(f) < n} ⊆ K. (D&F, Cor. 13.1.7)
Note that this means that if r, s are distinct roots of such p, then F(r) ≅ F(s) ≅ F[x]/(p(x)); you may say the roots are algebraically indistinguishable.
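For example, p(x) = x^2 − 2 is irreducible over Q (Eisenstein with p = 2), so Q(√2) ≅ Q[x]/(x^2 − 2) = {a + b√2 | a, b ∈ Q}; products reduce using (√2)^2 = 2, e.g. (1 + √2)(3 + √2) = 3 + 4√2 + 2 = 5 + 4√2.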

ˆ Suppose φ : F −∼→ F′ is an isomorphism (and hence F[x] ≅ F′[x]). Let p ∈ F[x] and p′ ∈ F′[x] be irreducibles, where

p(x) := ∑_{i=0}^{n} a_i x^i and we let p′(x) := ∑_{i=0}^{n} φ(a_i) x^i

Let α be a root of p in some extension of F , and α′ a root of p′ in some extension of F ′ . Then there
is an isomorphism

σ : F(α) −∼→ F′(α′),    α ↦ α′

which extends φ, i.e. σ|F ≡ φ (or rather, restriction to the constant polynomials).
This is represented by a commutative square diagram (not reproduced here); its vertical bars mean field extension (hence F(α)/F and F′(α′)/F′).

§12.13: (Dummit & Foote, Chapter 13 ) Field Theory: Algebraic Extensions

Definitions:

ˆ Algebraic/Transcendental: Let K/F be a field extension and α ∈ K. We say α is algebraic (over


F ) if ∃f ∈ F [x] nonzero such that f (α) = 0. Otherwise we say α is transcendental (over F ).
If each α ∈ K is algebraic over F , we say K (or K/F ) is an algebraic field extension.
If α is algebraic over F , it is algebraic over any field extension of F . (That is, it is a property of the
base field F and the element α, not the extending field.)
ˆ Minimal Polynomial: Let α be algebraic over F . Then ∃!mα,F ∈ F [x] which is monic and irreducible
and has α as a root. Moreover, f ∈ F [x] has α as a root iff mα,F | f in F [x].
This polynomial is called the minimal polynomial of α in F . Sometimes F is omitted if understood.
Details of the derivation are in Proposition 13.2.9.
ˆ Finitely Generated Field Extension: An extension K/F is said to be finitely generated if
∃α1 , · · ·, αn ∈ K such that K = F (α1 , · · ·, αn ).
ˆ Composite Field: Let K1 , K2 ≤ F as fields. The composite field of K1 and K2 , denoted K1 K2 ,
is the smallest subfield of F containing both K1 , K2 . We may generalize this to arbitrary collections,
and may describe this with the usual intersection.
ˆ Formally Real: F is said to be formally real if we cannot write −1 as a sum of squares of F ’s
elements.

Basic Results:

ˆ If α has degree 1 in an extension K/F , then α ∈ F (minimal polynomial x − α)


ˆ If α is algebraic over F and L, and L/F , then mα,L | mα,F in L[x]. (D&F, Cor. 13.2.10)
ˆ Let α be algebraic over F . Then (D&F, Prop. 13.2.11)

F(α) ≅ F[x]/(m_α(x)) and [F(α) : F] = deg(α) = deg(m_α)

ˆ α is algebraic over F iff the simple extension F (α)/F is finite. (D&F, Prop. 13.2.12)
If α is an element of an extension of degree n over F , then α is the root of polynomial of degree at
most n over F .
If α is the root of a polynomial of degree n over F , then [F (α) : F ] ≤ n.
ˆ If K/F is finite, it is algebraic. (D&F, Cor. 13.2.13)
ˆ F (α, β) = (F (α))(β) (D&F, Lem. 13.2.16)
ˆ If K1/F and K2/F are finite extensions with respective bases {α_i}_{i=1}^{n} and {β_i}_{i=1}^{m}, then

K1 K2 = F (α1 , · · ·, αn , β1 , · · ·, βm )

ˆ For K1 /F and K2 /F , we have [K1 K2 : F ] ≤ [K1 : F ][K2 : F ] (D&F, Prop. 13.2.21)


Equality holds iff a basis for one field is linearly independent over the other.
If {α_i}_{i=1}^{n} is a basis of K1/F and {β_i}_{i=1}^{m} of K2/F, then {α_i β_j}_{1≤i≤n, 1≤j≤m} spans K1K2 (and is a basis exactly when equality holds).
Equality holds when the [Ki : F ] are coprime. (D&F, Cor. 13.2.22)

Important Results:

ˆ The quadratic formula holds in any field of characteristic ̸= 2.

ˆ Tower Law: For F ≤ K ≤ L as fields (L/K/F ), then (D&F, Thm. 13.2.14)

[L : F ] = [L : K][K : F ]

Hence, for L/F a finite extension and K such that L/K/F, (D&F, Cor. 13.2.15)

[K : F] | [L : F]
ˆ The extension K/F is finite iff K is generated by finitely many algebraic elements over F . (D&F,
Thm. 13.2.17)
That is, a field over F generated by finitely many algebraic elements of degrees d1 , · · ·, dn , is algebraic
of degree ≤ d1 d2 · · ·dn .

ˆ Suppose α, β are algebraic over F . The following are too, then: (D&F, Cor. 13.2.18)

◦ α ± β
◦ αβ
◦ α/β (if β ≠ 0)
◦ 1/α (if α ≠ 0)

These lie in F (α, β); use the theorem to verify.


ˆ For an extension L/F , the elements that are algebraic over F form a subfield of L. (D&F, Cor.
13.2.19)

ˆ If L/K/F with K/F and L/K each algebraic, so is L/F. (D&F, Thm. 13.2.20)

§12.14: (Dummit & Foote, Chapter 13 ) Field Theory: Splitting Fields; Alge-
braic Closures

Basic Definitions:
We assume F, K are fields.

ˆ Splitting Field: The extension K/F is said to be a splitting field of f ∈ F [x] if f may be factored
completely into linear factors (split completely ) when in K[x], and when f does not split completely
for any proper subfield L of K containing F (i.e. an L where F < L < K).

ˆ Normal Extension: If K/F is algebraic and K is the splitting field of some family of polynomials in
F [x], we say K is a normal extension.
ˆ Primitive Root of Unity: Recall that the roots of x^n − 1 (viewed over C) are the nth roots of unity. These form a subgroup of C^× under multiplication, a cyclic one at that.
ζ_k^{(n)} := exp(2πik/n), or, in Dummit & Foote's notation, ζ_n^k

A primitive nth root of unity is one that generates the whole (cyclic) group of nth roots of unity; these are the ζ_k^{(n)} with k coprime to n.
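For example, with n = 4 the roots of unity are {1, i, −1, −i}, and the primitive ones are i and −i (k = 1, 3, the k coprime to 4).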

ˆ Cyclotomic Field: The field Q(ζn ) is called the cyclotomic field of the nth roots of unity. It
is the splitting field of xn − 1.
ˆ Algebraic Closure: The field F̄ is said to be an algebraic closure of F if F̄/F is algebraic and F̄ is a splitting field of every f ∈ F[x]. (Equivalently, F̄ consists of all the elements algebraic over F.)
We say a field F is algebraically closed if every nonconstant f ∈ F[x] has a root in F.

Basic Results:
We assume F, K are fields.

ˆ If deg(f ) = n for f ∈ F [x], then f has at most n roots in F , and precisely n if it splits completely in
F [x].
ˆ For any f ∈ F [x], a splitting field exists for f . (D&F, Thm. 13.4.25)
ˆ The splitting field K of f ∈ F [x] with deg(f ) = n itself has [K : F ] ≤ n!. (D&F, Prop. 13.4.26)

ˆ The algebraic closure of a field is algebraically closed. (D&F, Prop. 13.2.29)


ˆ For any field F , ∃K ≥ F an algebraically closed field. (D&F, Prop. 13.2.30)
ˆ Algebraic closures are unique up to isomorphism. (D&F, Prop. 13.2.31)

Important Results:


ˆ Let φ : F −∼→ F′ for fields F, F′. Let f ∈ F[x] and f′ ∈ F′[x] be given by (D&F, Thm. 13.2.27)

f(x) := ∑_{i=0}^{n} a_i x^i    f′(x) := ∑_{i=0}^{n} φ(a_i) x^i

with splitting fields E, E′ over F, F′ respectively. Then φ extends to an isomorphism σ : E −∼→ E′, i.e.

∃σ : E −∼→ E′ such that σ|_F ≡ φ

ˆ The splitting field of a given polynomial is unique up to isomorphism (D&F, Cor. 13.2.28)

§12.15: (Dummit & Foote, Chapter 13 ) Field Theory: Separability

Definitions:
We assume F is a field.

ˆ Separable Polynomial: For f ∈ F [x], we say f is separable if it has no repeated roots (i.e. each
root has multiplicity 1), and inseparable otherwise.

ˆ Polynomial Derivative: For f ∈ F [x], given by


f(x) = ∑_{i=0}^{n} a_i x^i

we define its derivative f′ or D_x f ∈ F[x] by

D_x f(x) := ∑_{i=1}^{n} i a_i x^{i−1}

(as if this was ordinary calculus). We make no concerns about existence or convergence: this is an
algebraic definition of a new function.
ˆ Frobenius Endomorphism: For F a field of characteristic p, the map φ(x) := x^p from F to itself is the Frobenius endomorphism of F.
ˆ Perfect Field: Let ch(F ) = p. Then we say F is perfect if all elements of F are pth powers in F ,
and hence F = F p .
All fields of characteristic zero are called perfect. We can show all finite fields are too.
ˆ Separable Degree: Let p ∈ F[x] be irreducible, with ch(F) = p. Then ∃! k ≥ 0 and ∃! separable p_s ∈ F[x] with p(x) = p_s(x^{p^k}).
deg(p_s) is called the separable degree of p(x) and is denoted deg_s(p(x)).
The integer p^k is called the inseparable degree of p(x), denoted deg_i(p(x)).
ˆ Separable Fields: The field K is said to be separable (or separably algebraic) over F if each
α ∈ K is the root of a separable f ∈ F[x]. (Equivalently, m_{α,F} is separable for each α ∈ K.)
Otherwise, we say K is inseparable.

Basic Results:

ˆ The linearity and product rules apply for Dx f, Dx g as in normal calculus.

ˆ In fields of characteristic p, (a + b)^p = a^p + b^p. (D&F, Prop. 13.5.35)


ˆ If F is a finite field with ch(F ) = p, then each element of F is a pth power. (D&F, Cor. 13.5.36)

ˆ p is separable iff degi (p) = 1 iff degs (p) = deg(p).

ˆ deg(p) = degs (p) · degi (p).

ˆ Finite extensions of perfect fields are separable. (In particular, Q and finite fields.) (D&F, Cor.
13.5.39)

Important Results:

ˆ f has a multiple root α iff α is also a root of D_x f; in that case m_α divides both f and D_x f.

Hence, f is separable iff gcd(f, D_x f) = 1. (D&F, Prop. 13.5.33)
ˆ Irreducible polynomials over fields of characteristic 0 are separable. (D&F, Cor. 13.5.34)
Polynomials over such fields are separable iff they’re a product of irreducible polynomials.

ˆ Irreducible polynomials over finite fields are separable. (D&F, Cor. 13.5.37)
Polynomials over such fields are separable iff they’re a product of irreducible polynomials.

§12.16: (Dummit & Foote, Chapter 14 ) Galois Theory: Basic Definitions

Old Definitions:
We assume K is a field.

ˆ Automorphism: Given a ring or field K, if σ : K − → K is an isomorphism, we say σ is an auto-
morphism (an isomorphism from a structure to itself). The collection of these automorphisms on K
is denoted Aut(K).
Dummit & Foote often use the functional analysis-esque shorthand of σα := σ(α).
ˆ Fixed Element/Set: φ ∈ Aut(K) is said to fix an α ∈ K if φα = α. Likewise, given S ⊆ K (or
even say a subfield), then φ fixes S when φs = s for every s ∈ S. (Note that this is stronger than
φ(S) = S.)

New Definitions:
We assume K is a field, as is F .

ˆ Automorphisms Fixing Subfield: Let K/F be a field extension. Then we denote the automor-
phisms of K which fix F by Aut(K/F ), i.e.

Aut(K/F ) := {φ ∈ Aut(K) | φ(x) = x for every x ∈ F }

ˆ Fixed Field: Let H ≤ Aut(K) as groups. Define

F := {x ∈ K | ∀φ ∈ H, φ(x) = x}

Then F is a subfield of K (cf. Proposition 14.1.3). F is called the fixed field of H.


Some choose to use the notations

Fix(H) ≡ FixK (H) ≡ KH := {x ∈ K | ∀φ ∈ H, φ(x) = x}

ˆ Galois Field Extension: Let K/F be a finite field extension. We say K is Galois over F - and that
K/F is a Galois extension - if |Aut(K/F )| = [K : F ].
ˆ Galois Group: If K/F is a Galois extension, then Aut(K/F ) is called the Galois group of K/F .
It is given the special name Gal(K/F ).
(Some choose to define this for any K/F , not merely finite.)
ˆ Galois Group of Polynomial: Let f ∈ F [x] be separable over F , with splitting field K. Then the
Galois group of f over F is Gal(K/F ) (i.e. the splitting field over its field of origin).
Some choose to denote this by Gal(f ).

Basic Results:
We assume K is a field, as is F .

ˆ Each φ ∈ Aut(K) fixes the prime subfield (that generated by 1). Consequently, Aut(Q) and Aut(Fp )
are trivial.
ˆ Aut(K) is a group, with a subgroup Aut(K/F ), under function composition. (D&F, Prop. 14.1.1)

ˆ Let K/F be a field extension and α ∈ K algebraic over F . Let σ ∈ Aut(K/F ). Then σα is a root of
mα,F . Hence, Aut(K/F )’s elements permute the roots of irreducible polynomials - hence, if α is a root
of f ∈ F [x], so is σα. (D&F, Prop. 14.1.2)
Consequently, if K is generated over F by some elements, σ ∈ Aut(K/F ) is determined completely by
its action on the generators.

ˆ The association of fields to groups reverses inclusion: (D&F, Prop. 14.1.4)

◦ F1 ≤ F2 ≤ K as fields =⇒ Aut(K/F2 ) ≤ Aut(K/F1 )


◦ If H1 ≤ H2 ≤ Aut(K) as groups with Hi having fixed field Fi := FixK(Hi), then F2 ≤ F1.

ˆ For K the splitting field of f ∈ F[x], we have (D&F, Prop. 14.1.5)

|Aut(K/F)| ≤ [K : F]

with equality if f is separable over F (and then, hence, K/F is Galois). (D&F, Cor. 14.1.6)
ˆ More generally, for any finite extension K/F , |Aut(K/F )| ≤ [K : F ]. (D&F, Cor. 14.2.10)

§12.17: (Dummit & Foote, Chapter 14 ) Galois Theory: The Fundamental The-
orem

Definitions:

ˆ Character: Given a group G and L a field, a (linear) character of G is a mapping χ : G → L


where
(i) χ(gh) = χ(g)χ(h) ∀g, h ∈ G (i.e. it is a homomorphism G → L× )
(ii) χ(g) ̸= 0 ∀g ∈ G
ˆ Linearly Independent Characters: Given characters χ1, · · ·, χn of G, we say they are linearly independent if they are such as functions on G. That is, {χ_i}_{i=1}^{n} is linearly independent if and only if

( ∑_{i=1}^{n} α_i χ_i(g) = 0 for all g ∈ G ) =⇒ ∀i ∈ {1, · · ·, n}, α_i = 0
ˆ Field Embedding: Given fields F, K, an embedding of F into K is simply an injective homomor-
phism φ : F → K.

ˆ Galois Conjugates: Let K/F be Galois. If α ∈ K, then {σα}σ∈Gal(K/F ) are the (Galois) conju-
gates of α over F .
ˆ Galois Conjugate Field: If K/F is Galois and F ≤ E ≤ K as fields, and σ ∈ Gal(K/F ), then σ(E)
is called the conjugate field of E over F .

Basic Results:

ˆ If {χ_i}_{i=1}^{n} are distinct characters G → L, then they are linearly independent. (D&F, Thm. 14.2.7)

ˆ If {σ_i : K → L}_{i=1}^{n} are distinct field embeddings, then they are linearly independent functions on K. Hence, distinct automorphisms are linearly independent as functions on K. (D&F, Cor. 14.2.8)

ˆ Suppose G := {σ_i}_{i=1}^{n} ≤ Aut(K) as groups. Then (D&F, Thm. 14.2.9)

[K : Fix_K(G)] = n = |G|

ˆ Let G ≤ Aut(K) be finite and F := FixK (G). Then each σ ∈ Aut(K) that fixes F is contained in G.
That is, Aut(K/F ) = G with Galois group G. (D&F, Cor. 14.2.11)
ˆ If G, H ≤ Aut(K) have G ̸= H, then FixK (G) ̸= FixK (H). (D&F, Cor. 14.2.12)
ˆ The extension K/F is Galois iff K is the splitting field of some f ∈ F [x]. If this is the case, then
any irreducible g ∈ F [x] which has a root in K is separable and has all its roots in K, so K/F is a
separable extension. (D&F, Thm. 14.2.13)

Important Results:

ˆ The Fundamental Theorem of Galois Theory: Let K/F be a Galois extension. Then there is a
bijection
{fields E | F ≤ E ≤ K} ↔ {groups H | 1 ≤ H ≤ Gal(K/F )}
with the correspondences

E 7→ {σ ∈ Gal(K/F ) | σ fixes E, i.e. σ(x) = x for x ∈ E}


in the reverse direction, H 7→ Fix(H)

Under these correspondences:

◦ Inclusion is reversed. E1 ≤ E2 as fields iff H2 ≤ H1 as groups


◦ [K : E] = |H| and [E : F] = |Gal(K/F) : H|
◦ K/E is always Galois, with Galois group Gal(K/E) = H
◦ E is Galois over F iff H ⊴ Gal(K/F). If so, then

Gal(E/F) ≅ Gal(K/F)/H

◦ More generally, even if H is not normal in Gal(K/F ), the isomorphisms of E into a fixed algebraic
closure of F that contains K, which fix F , are in bijection with the cosets {σH}σ∈Gal(K/F ) .
◦ If E1 , E2 respectively correspond to H1 , H2 , then E1 ∩ E2 corresponds to the group ⟨H1 , H2 ⟩,
and the composite field E1 E2 corresponds to H1 ∩ H2 . Thus the lattice of subfields of K that
contain F , and the lattice of subgroups of Gal(K/F ), are dual (flipped upside down versions of
each other).
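For example, K = Q(√2, √3) over F = Q is Galois with Gal(K/Q) ≅ Z/2 × Z/2; its three subgroups of order 2 correspond, with inclusions reversed, to the three intermediate fields Q(√2), Q(√3), and Q(√6).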

§13: Items from Topology, Metric Spaces, & Real Analysis

§13.1: Topological Operations on Sets / Related Sets

§13.1.1: Boundary

Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.

ˆ ∂X S

ˆ bdX S

ˆ BdX S

ˆ bdryX S

ˆ frX S

Common definitions:

ˆ ∂S := S̄ − int(S)

ˆ ∂S := S̄ ∩ cl(S^c)

ˆ ∂S := {p ∈ X | ∀Up a nbh of p, Up ∩ S ≠ ∅ and Up ∩ S^c ≠ ∅}

Some identities:

ˆ S̄ = S ∪ ∂S

ˆ X = int(S) ∪ ∂S ∪ int(S c ) for any S ⊆ X (trichotomy); the three are pairwise disjoint

ˆ ∂S = ∂(S c )

ˆ int(∂S) = ∅ for any S which is at least one of open or closed

ˆ ∂∂∂S = ∂∂S ⊆ ∂S

Some other results:

ˆ ∂S is always closed

ˆ S is closed iff ∂S ⊆ S

ˆ S is open iff S and ∂S are disjoint

ˆ S is dense and open in X, iff ∂S = S c
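For example, in X = R: ∂((0, 1)) = {0, 1}, while ∂Q = Q̄ ∩ cl(R − Q) = R. Note Q is neither open nor closed, and indeed int(∂Q) = R ≠ ∅, showing why the hypothesis in the identity int(∂S) = ∅ above is needed.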

§13.1.2: Closure

Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.

ˆ S⁻

ˆ S̄_X

ˆ S̄^X

ˆ S̄

Common definitions:

ˆ S̄ := {x ∈ X | ∀Ux nbh's of x we have Ux ∩ S ≠ ∅}

ˆ S̄ := S ∪ {all limit points of S}

ˆ S̄ := ⋂ {F | F ⊇ S, F is closed}

ˆ By the above one, S̄ is the smallest closed set containing S

ˆ S̄ = S ∪ ∂S

ˆ S̄ = {x ∈ X | ∃ a net in S converging to x}

Some identities:

ˆ int(S) ⊆ S ⊆ S̄

ˆ S̄ = S ∪ ∂S

ˆ S̄ = int(S^c)^c; equivalently, ∁S̄ = int(∁S)

ˆ int(S)^c = cl(S^c); equivalently, ∁ int(S) = cl(∁S)
ˆ For {S_i}_{i∈N}:

◦ If ∀i we have S_i closed in X, then ⋃_{i∈N} int(S_i) = int(⋃_{i∈N} S_i)

◦ If ∀i we have S_i open in X, then int(⋂_{i∈N} S_i) = ⋂_{i∈N} int(S_i)
i∈N i∈N
ˆ cl(⋂_{i∈I} S_i) ⊆ ⋂_{i∈I} S̄_i (reverse may not hold)

ˆ ⋃_{i∈I} S̄_i ⊆ cl(⋃_{i∈I} S_i) (reverse may not hold)

ˆ cl(⋃_{i=1}^{n} S_i) = ⋃_{i=1}^{n} S̄_i (specifically finite)

Some other results:

ˆ (Monotonicity) S ⊆ T =⇒ S̄ ⊆ T̄

ˆ S is closed iff S = S̄

ˆ If A is closed, then S ⊆ A iff S̄ ⊆ A

ˆ f : X → Y is continuous iff f⁻¹(C) is closed in X for all C closed in Y

ˆ f : X → Y is continuous iff ∀A ⊆ X we have f(Ā) ⊆ cl(f(A))
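For example, in R: cl((0, 1)) = [0, 1] and Q̄ = R. For strictness of the union identity above, enumerate Q = {q_1, q_2, · · ·} and set S_i := {q_i}; then ⋃ S̄_i = Q while cl(⋃ S_i) = R.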

§13.1.3: Complement

Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.

ˆ S^c

ˆ ∁S

ˆ CS

ˆ S′

ˆ S̃

ˆ S̄

Common definitions:

ˆ S c := X − S, in an understood universe X

Some identities:

ˆ X = int(S) ∪ ∂S ∪ int(S c ) for any S ⊆ X (trichotomy); the three are pairwise disjoint

ˆ ∂S = ∂(S c )

ˆ S̄ = int(S^c)^c; equivalently, ∁S̄ = int(∁S)

ˆ int(S)^c = cl(S^c); equivalently, ∁ int(S) = cl(∁S)

Some other results:

ˆ S is dense and open in X, iff ∂S = S c

§13.1.4: Interior

Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.

ˆ intX (S)
ˆ IntX (S)
ˆ S̊X
ˆ S°_X

Common definitions:
ˆ Some define the related exterior, by ext(S) := int(S^c) = (S̄)^c. Note that X = int(S) ∪ ∂S ∪ ext(S)

ˆ x ∈ int(S) ⇐⇒ ∃Ux an open nbh s.t. x ∈ Ux ⊆ S

ˆ int(S) := ⋃ {G | G ⊆ S, G is open}

ˆ The above gives that int(S) is the largest open set contained in S

Some identities:

ˆ int(S) ⊆ S ⊆ S̄

ˆ X = int(S) ∪ ∂S ∪ int(S^c) for any S ⊆ X (trichotomy); the three are pairwise disjoint

ˆ int(∂S) = ∅ for any S which is at least one of open or closed

ˆ S̄ = int(S^c)^c; equivalently, ∁S̄ = int(∁S)

ˆ int(S)^c = cl(S^c); equivalently, ∁ int(S) = cl(∁S)
ˆ For {S_i}_{i∈N}:

◦ If ∀i we have S_i closed in X, then ⋃_{i∈N} int(S_i) = int(⋃_{i∈N} S_i)

◦ If ∀i we have S_i open in X, then int(⋂_{i∈N} S_i) = ⋂_{i∈N} int(S_i)

Some other results:

ˆ (Idempotence) int(int(S)) = int(S)


ˆ (Monotonicity) S ⊆ T =⇒ int(S) ⊆ int(T )
ˆ S is open iff S = int(S)
ˆ If T is open in X, then T ⊆ S iff T ⊆ int(S)
ˆ int(S ∩ T ) = int(S) ∩ int(T ) (M&I, Prob. 1.7)
ˆ int(S) ∪ int(T) ⊆ int(S ∪ T) (M&I, Prob. 1.7)
ˆ If S is closed in X, and int(T ) = ∅, then int(S ∪ T ) = int(S)

§13.1.5: Limit Points / Accumulation Points / Derived Set

The elements we’re talking about may be called limit points, accumulation points, or adherent
points, and the collection of them (aside from the obvious, e.g. set of limit points) may be called the
derived set
Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.

ˆ S′

ˆ L(S)

Common definitions:

ˆ We say x ∈ S is an isolated point of S if x ∉ S′.

ˆ x ∈ S ′ ⇐⇒ (∀Ux nbh’s of x)((Ux − {x}) ∩ S ̸= ∅)

Some identities:

ˆ s ∈ S′ =⇒ s ∈ (S − {s})′

ˆ (S ∪ T )′ = S ′ ∪ T ′

ˆ S ⊆ T =⇒ S ′ ⊆ T ′

Some other results:

ˆ S is closed iff S ′ ⊆ S

ˆ We say S is dense-in-itself if S ⊆ S ′ and perfect if S = S ′ .

ˆ x ∈ S′ ⇐⇒ x ∈ cl(S − {x})

§13.2: Compactness

To define compactness in a topological space (X, τ ):

ˆ {U_i}_{i∈I}, a collection of open sets, is an open cover of S ⊆ X if ⋃_{i∈I} U_i ⊇ S
ˆ A topological space (X, τ ) is said to be compact iff arbitrary open covers always have finite subcovers

Some equivalent results:

ˆ X is compact iff it has the finite intersection axiom

ˆ X is compact iff each ultrafilter on X converges

ˆ X is compact iff each filter on X has a limit point in X

ˆ If X is a metric space, X is compact iff each sequence in X has a convergent subsequence with limit
in X (called sequential compactness)

◦ Measure & Integral only discusses the Rn case (M&I, Thm. 1.12)

X is said to satisfy the finite intersection axiom if this holds:


ˆ A collection of sets S is said to have the finite intersection property if, for any finite {S_i}_{i=1}^{n} ⊆ S, we have ⋂_{i=1}^{n} S_i ≠ ∅.

ˆ X satisfies the finite intersection axiom if, for any family S of closed sets having the finite intersection property, ⋂ S ≠ ∅. (Notice how it must apply to the whole family now.)

Notable results, others here:

ˆ (Heine-Borel) S ⊆ Rn is closed and bounded iff S is compact

ˆ The above holds for S ⊆ X when X is a finite-dimensional normed vector space

ˆ Heine-Borel for metric spaces is: (X, d) is compact iff complete and totally bounded, latter defined by
◦ X is totally bounded iff ∀ε > 0, ∃{x_i}_{i=1}^{n} ⊆ X such that inf_{1≤i≤n} d(x_i, x) < ε for each x ∈ X

ˆ (Tychonoff) A product ∏_{i∈I} X_i is compact iff X_i is compact ∀i ∈ I
ˆ Finite unions of compact sets are compact

ˆ If f : X → Y is continuous and X is compact, then so is f(X)
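For example, [0, 1] ⊆ R is compact by Heine-Borel, while (0, 1) is not: the open cover {(1/n, 1)}_{n≥2} has no finite subcover.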

§13.3: Continuity & Types Thereof

Results are stated in terms of R unless stated otherwise but generalize nicely. Domains are not stated
unless needed.
We may write g ∈ C(S) to mean it is continuous on S, and g ∈ C(S, T ) to mean g : S → T is continuous.

Types of Continuity & Basic Definitions:


Several types of continuity exist. These may often be defined for metric or other such spaces in the
obvious ways.

ˆ Traditional Continuity: f is continuous at c ∈ dom(f ) iff ...

◦ Limits: lim_{x→c} f(x) = f(c)

◦ ε−δ: (∀ε > 0)(∃δ > 0)( |x − c| < δ =⇒ |f(x) − f(c)| < ε )

◦ Neighborhoods: ...whenever V_{f(c)} is a neighborhood of f(c), there is a neighborhood Uc of c with f(Uc) ⊆ V_{f(c)}

◦ Sequences: xn → c =⇒ f(xn) → f(c)
ˆ Topological Continuity: Let f : X → Y be a function of topological spaces. Equivalent conditions
for its continuity:

◦ ∀V ⊆ Y open, f −1 (V ) is open. (Preimages of open sets are open.)


◦ ∀V ⊆ Y closed, f −1 (V ) is closed. (Preimages of closed sets are closed.)
ˆ Topological Continuity at a Point: Let f : X → Y be a function of topological spaces. f is
continuous at x ∈ X iff ...

◦ ∀Vf (x) a neighborhood of f (x), ∃Ux a neighborhood of x such that f (Ux ) ⊆ Vf (x) .
◦ f −1 (V ) is a neighborhood of x for all neighborhoods V of f (x)
ˆ Uniform Continuity: f is uniformly continuous on D := dom(f ) iff
 
(∀ε > 0)(∃δ > 0)(∀x, y ∈ D) |x − y| < δ =⇒ |f (x) − f (y)| < ε

ˆ Lipschitz Continuous: f is Lipschitz continuous on D := dom(f) iff


 
(∃K > 0)(∀x, y ∈ D) |f (x) − f (y)| ≤ K · |x − y|

A mapping may be said to be contractive if K < 1 and expansive if K > 1.


ˆ Hölder Continuity: For α > 0, f is α-Hölder continuous on D := dom(f) if
 
(∃K > 0)(∀x, y ∈ D) |f (x) − f (y)| ≤ K · |x − y|α

The α = 1 case gives Lipschitz continuity.

ˆ Absolute Continuity: Take [a, b] ⊆ R an interval and f : [a, b] → R. (C may also be used.) f is
n
absolutely continuous if, ∀ε > 0, ∃δ > 0 such that, whenever {(ai , bi )}i=1 ⊆ P([a, b]) are finitely-many
disjoint and nonempty subintervals, meeting the condition
∑_{k=1}^{n} (b_k − a_k) = ∑_{k=1}^{n} µ((a_k, b_k)) < δ

then

∑_{k=1}^{n} |f(b_k) − f(a_k)| < ε

More compactly: if

I_{a,b,δ} := { {(a_k, b_k)}_{k=1}^{n} | n ∈ N, the (a_k, b_k) are disjoint nonempty subintervals of [a, b], and ∑_{k=1}^{n} (b_k − a_k) < δ }

then f is absolutely continuous if and only if

(∀ε > 0)(∃δ > 0)(∀{(a_k, b_k)}_{k=1}^{n} ∈ I_{a,b,δ})( ∑_{k=1}^{n} |f(b_k) − f(a_k)| < ε )

A Brief Hierarchy:

Hierarchy of some continuous function sets: when α ∈ (0, 1],

{Continuously differentiable functions}


⊆ {Lipschitz continuous functions}
⊆ {α-Hölder continuous functions}
⊆ {uniformly continuous functions}
⊆ {(traditionally) continuous functions}

Properties of (Typical) Continuity:

ˆ Topological Invariants: These are properties such that, if A has the property, so does f (A) if f is
continuous:
◦ compactness
◦ connectedness
◦ path-connectedness
◦ being a Lindelöf space
◦ being separable
ˆ Extreme Value Theorem: Continuous functions on compact sets attain their suprema and infima,
and hence are bounded (M&I, Thm. 1.15)

ˆ Intermediate Value Theorem: If f ∈ C([a, b]), then for each y between f(a) and f(b) there is an x ∈ [a, b] such that f(x) = y
ˆ Lebesgue-Vitali or Riemann-Lebesgue Theorem: If f is bounded and continuous a.e. on a compact interval, it is Riemann integrable

ˆ Tietze Extension Theorem: Let F ⊆ Rn be closed and f : F → R continuous. Then ∃g : Rn → R


continuous with g|F = f . (We may extend continuous functions on closed sets to larger domains.)
ˆ Weierstrass Approximation Theorem: Given f ∈ C[a, b], then f may be approximated arbitrarily
well by a polynomial, in the sense that ∀ε > 0, ∃p a polynomial on [a, b] such that

∥f − p∥_∞ := sup_{x∈[a,b]} |f(x) − p(x)| < ε

ˆ f : X → Y is continuous on X iff...

◦ ∀B ⊆ Y we have f⁻¹(int(B)) ⊆ int(f⁻¹(B))

◦ ∀A ⊆ X we have f(Ā) ⊆ cl(f(A))

Properties of Uniform Continuity:

ˆ Heine-Cantor Theorem: If f is continuous on a compact set, it is uniformly continuous

ˆ For A totally bounded and f uniformly continuous, f (A) is totally bounded

ˆ If f : R≥0 → R is continuous and lim_{x→∞} f(x) exists and is finite, then f is uniformly continuous

Properties of Lipschitz Continuity:

ˆ g ∈ C 1 (R, R) is Lipschitz continuous iff g ′ is bounded; then we have Lipschitz constant sup|g ′ |.
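For example, sin ∈ C¹(R, R) has |sin′| = |cos| ≤ 1, so it is Lipschitz with constant 1; by contrast g(x) = x² has unbounded derivative on R and is not Lipschitz there.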

Properties of Absolute Continuity:


Take f : [a, b] → R unless needed or stated otherwise.

ˆ If f is absolutely continuous on a compact interval, it is uniformly continuous

◦ Measure & Integral gives it for just compact sets (M&I, Thm. 1.15)
ˆ If f : [a, b] → R is absolutely continuous, it is differentiable a.e., and ∃g ∈ L[a, b] with

f(x) = f(a) + ∫_a^x g(t) dt

for each x ∈ [a, b].

ˆ f, g absolutely continuous implies that f ± g are too

ˆ f, g absolutely continuous on a compact interval implies it for f g

ˆ Absolutely continuous functions are of bounded variation

ˆ Absolutely continuous functions may be written in the form g − h for g, h monotone non-decreasing on
[a, b]

§13.4: Dense

Common definitions, for S ⊆ X a topological space, where we want to claim S is dense in X

ˆ S̄ = X (the smallest closed subset of X containing S is X itself)

ˆ int(S c ) = ∅

ˆ If x ∈ X, then x ∈ S or x ∈ S′ (since S̄ = S ∪ S′)

ˆ If x ∈ X, and Ux is any nbh of x, then Ux ∩ S ̸= ∅

ˆ For any nonempty open A ⊆ X, S ∩ A ≠ ∅

ˆ Any nonempty element in a basis of X intersects S

Some other results:

ˆ S is dense and open in X, iff ∂S = S c

ˆ X (the topological space) is always dense in itself

ˆ If A ⊆ B ⊆ C ⊆ X, and A is dense in B, which is dense in C, then A is dense in C

ˆ If f : X → Y is continuous and surjective, and D ⊆ X is dense in X, then f (D) is dense in Y

ˆ If X has D dense and connected in X, X itself is connected

ˆ If f, g : X → Y agree in a dense D ⊆ X, with f continuous and Y Hausdorff, then f ≡ g on all of X
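For example, Q is dense in R: every nonempty open interval contains a rational. The last bullet then gives the familiar fact that two continuous real-valued functions agreeing on Q agree on all of R.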

§13.5: Infimum & Supremum; ε Characterization (“Capturing”)

Recall: for a set S ⊆ R, we say that

ˆ u is an upper bound of S if u ≥ s ∀s ∈ S

ˆ ℓ is a lower bound of S if ℓ ≤ s ∀s ∈ S

ˆ sup S = α ⇐⇒ α is an upper bound of S, and the least such one (in the sense that if γ is another
upper bound, then α ≤ γ)
ˆ inf S = β ⇐⇒ β is a lower bound of S, and the greatest such one (in the sense that if γ is another
lower bound, then β ≥ γ)

ˆ We let sup ∅ = −∞ and inf ∅ = +∞

ˆ α, β as given above may be infinite if need be, but we focus on finite here

The ε-characterization is as follows. You may envision a half-interval stretching away from α, β to “capture”
other elements:

ˆ sup S = α ⇐⇒ α is an upper bound of S and, ∀ε > 0, ∃s ∈ S such that α − ε < s ≤ α

ˆ inf S = β ⇐⇒ β is a lower bound of S and, ∀ε > 0, ∃s ∈ S such that β ≤ s < β + ε
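For example, S = (0, 1) has sup S = 1 ∉ S: given ε ∈ (0, 1), the point s = 1 − ε/2 lies in S with 1 − ε < s < 1. The ≤ in the characterization matters when the supremum is attained, e.g. S = {0}.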

Some properties of infima/suprema include the following:

ˆ Negatives & Reversal: sup_{n∈N}(−x_n) = − inf_{n∈N} x_n and inf_{n∈N}(−x_n) = − sup_{n∈N} x_n

ˆ Scaling:

◦ For α ≥ 0: sup_{n∈N}(αx_n) = α · sup_{n∈N} x_n and inf_{n∈N}(αx_n) = α · inf_{n∈N} x_n

◦ For α < 0: sup_{n∈N}(αx_n) = α · inf_{n∈N} x_n and inf_{n∈N}(αx_n) = α · sup_{n∈N} x_n

ˆ Sub-/Super-Additivity: Recall: we say f is sub-additive if f(x + y) ≤ f(x) + f(y) and super-additive if f(x + y) ≥ f(x) + f(y).

◦ sup_{n∈N}(x_n + y_n) ≤ sup_{n∈N} x_n + sup_{n∈N} y_n

◦ inf_{n∈N}(x_n + y_n) ≥ inf_{n∈N} x_n + inf_{n∈N} y_n

ˆ Sub-/Super-Multiplicativity: Recall: we say f is sub-multiplicative if f(x·y) ≤ f(x)·f(y) and super-multiplicative if f(x·y) ≥ f(x)·f(y). For nonnegative sequences:

◦ sup_{n∈N}(x_n y_n) ≤ (sup_{n∈N} x_n)(sup_{n∈N} y_n)

◦ inf_{n∈N}(x_n y_n) ≥ (inf_{n∈N} x_n)(inf_{n∈N} y_n)

§13.6: Limit Inferior & Limit Superior of Sequences (lim inf an , lim sup an )

Notations:
The limit inferior of a sequence {a_n}_{n∈N} (implicit: as n → ∞) may be denoted by

lim inf a_n    lim̲ a_n (underlined lim)    lim inf_{n→∞} a_n    lim̲_{n→∞} a_n

and likewise the limit superior by

lim sup a_n    lim̄ a_n (overlined lim)    lim sup_{n→∞} a_n    lim̄_{n→∞} a_n

Definitions:
Definitions differ a little; a few common (equivalent) ones are

lim inf_{n→∞} a_n := lim_{n→∞} ( inf_{m≥n} a_m ) ≡ sup_{n∈N} ( inf_{m≥n} a_m )    (limit of future infima)

lim sup_{n→∞} a_n := lim_{n→∞} ( sup_{m≥n} a_m ) ≡ inf_{n∈N} ( sup_{m≥n} a_m )    (limit of future suprema)

We may also use subsequential limits. Let A be the collection of all subsequential limits of {an }n∈N . That
is
a ∈ A ⇐⇒ ∃ a subsequence {a_{n_k}}_{k∈N} ⊆ {a_n}_{n∈N} such that a_{n_k} → a as k → ∞
⇐⇒ ∃ a strictly increasing sequence {n_k}_{k∈N} ⊆ N such that a_{n_k} → a as k → ∞

Then
lim inf_{n→∞} a_n := inf(A)    lim sup_{n→∞} a_n := sup(A)

The definition for a sequence of functions {fn (x)}n∈N is entirely analogous, but that for a function
itself is not quite the same.
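For example, a_n = (−1)^n (1 + 1/n) has subsequential limits A = {−1, 1}, so lim inf_{n→∞} a_n = −1 and lim sup_{n→∞} a_n = 1, while lim_{n→∞} a_n does not exist.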

Some Other Interpretations / ε-Capturing:


Take {x_n}_{n∈N} ⊆ R with lim sup_{n→∞} x_n, lim inf_{n→∞} x_n ∈ (−∞, +∞).

ˆ lim sup xn is the smallest b ∈ R such that, ∀ε > 0, ∃N ∈ N, such that xn < b + ε ∀n > N .

ˆ Hence, any number larger than lim sup xn is an eventual upper bound for the sequence (all terms are
“eventually” bounded by it).
ˆ Moreover, only finitely many terms will be larger than b + ε.

ˆ Similarly, lim inf x_n is the largest a ∈ R such that, ∀ε > 0, ∃N ∈ N, such that x_n > a − ε for all n > N.

ˆ Thus, any number smaller than lim inf x_n is an eventual lower bound of {x_n}_{n∈N}, and only finitely many terms will be less than a − ε.

Properties of Limit Inferior & Limit Superior:
We take {xn }n∈N , {yn }n∈N ⊆ R.

ˆ inf_{n∈N} x_n ≤ x_N ≤ sup_{n∈N} x_n for all N ∈ N, and inf_{n∈N} x_n ≤ lim inf_{n→∞} x_n ≤ lim sup_{n→∞} x_n ≤ sup_{n∈N} x_n.

ˆ If lim_{n→∞} x_n exists, then lim_{n→∞} x_n = lim sup_{n→∞} x_n = lim inf_{n→∞} x_n.

◦ It is common to prove a limit exists by showing that lim sup_{n→∞} x_n ≤ lim inf_{n→∞} x_n.

ˆ Negatives & Reversal: lim sup_{n→∞}(−x_n) = − lim inf_{n→∞} x_n and lim inf_{n→∞}(−x_n) = − lim sup_{n→∞} x_n

ˆ Scaling:

◦ For α ≥ 0: lim sup_{n→∞}(αx_n) = α · lim sup_{n→∞} x_n and lim inf_{n→∞}(αx_n) = α · lim inf_{n→∞} x_n

◦ For α < 0: lim sup_{n→∞}(αx_n) = α · lim inf_{n→∞} x_n and lim inf_{n→∞}(αx_n) = α · lim sup_{n→∞} x_n

ˆ Sub-/Super-Additivity: Recall: we say f is sub-additive if f(x + y) ≤ f(x) + f(y) and super-additive if f(x + y) ≥ f(x) + f(y).

◦ lim sup_{n→∞}(x_n + y_n) ≤ lim sup_{n→∞} x_n + lim sup_{n→∞} y_n

◦ lim inf_{n→∞}(x_n + y_n) ≥ lim inf_{n→∞} x_n + lim inf_{n→∞} y_n

◦ If lim_{n→∞} x_n = X and lim sup_{n→∞} y_n = Y, then lim sup_{n→∞}(x_n + y_n) = X + Y

◦ If lim_{n→∞} x_n = X and lim inf_{n→∞} y_n = Y, then lim inf_{n→∞}(x_n + y_n) = X + Y

ˆ Sub-/Super-Multiplicativity: Recall: we say f is sub-multiplicative if f(x·y) ≤ f(x)·f(y) and super-multiplicative if f(x·y) ≥ f(x)·f(y). For nonnegative sequences:

◦ lim sup_{n→∞}(x_n y_n) ≤ (lim sup_{n→∞} x_n)(lim sup_{n→∞} y_n)

◦ lim inf_{n→∞}(x_n y_n) ≥ (lim inf_{n→∞} x_n)(lim inf_{n→∞} y_n)

◦ If lim_{n→∞} x_n = X and lim sup_{n→∞} y_n = Y, then lim sup_{n→∞}(x_n y_n) = XY

◦ If lim_{n→∞} x_n = X and lim inf_{n→∞} y_n = Y, then lim inf_{n→∞}(x_n y_n) = XY

◦ All of the above only hold in the case 0 · ∞ does not arise on the RHS.

ˆ lim sup_{n→∞} |x_n| = max{ lim sup_{n→∞} x_n, − lim inf_{n→∞} x_n }

§13.7: Limit Inferior & Limit Superior of a Function

(For sequences of functions, see the previous section.)

Consider a metric space (X, d) with E ⊆ X and f : E → R. Let a ∈ E ′ (the set of limit points). We
define the limit inferior and limit superior of f by

lim inf_{x→a} f(x) := lim_{ε→0} ( inf_{x ∈ E∩B(a;ε)∖{a}} f(x) )    lim sup_{x→a} f(x) := lim_{ε→0} ( sup_{x ∈ E∩B(a;ε)∖{a}} f(x) )

Note that, as ε ↘ 0, the supremum is monotone-decreasing, and the infimum monotone-increasing, so we


may write instead

lim inf_{x→a} f(x) := sup_{ε>0} ( inf_{x ∈ E∩B(a;ε)∖{a}} f(x) )    lim sup_{x→a} f(x) := inf_{ε>0} ( sup_{x ∈ E∩B(a;ε)∖{a}} f(x) )

In the topological case, let (X, τ) be a topological space and all else as before. Then

lim inf_{x→a} f(x) := sup_{U} ( inf_{x ∈ E∩U∖{a}} f(x) )    lim sup_{x→a} f(x) := inf_{U} ( sup_{x ∈ E∩U∖{a}} f(x) )

where U ranges over the open sets of X with a ∈ U and E ∩ U ∖ {a} ≠ ∅.

One could write this with limits and nets and a neighborhood filter.

§13.8: Lp Spaces

(To be added to later.)

§13.9: Open and Closed Sets; Gδ and Fσ sets; Topologies

A topology on a set X is a set τ ⊆ P(X) such that

ˆ ∅, X ∈ τ

ˆ τ is closed under arbitrary unions

ˆ τ is closed under finite intersections

If G ∈ τ then G is said to be open; if CF ∈ τ , then F is closed.


Items of note:

ˆ This means arbitrary unions of open sets are open, and finite unions of closed sets are closed.

ˆ G is open iff int(G) = G

ˆ F is closed iff F = F̄

ˆ In a metric space, it is enough to be concerned with open balls

ˆ In a metric space, a set is open iff it is an arbitrary union of open balls

ˆ In R, every open set is a countable union of disjoint open intervals (balls) (M&I, Thm. 1.10)
ˆ All open sets in Rn are a countable union of nonoverlapping closed cubes (M&I, Thm. 1.11)
◦ Can also use partly-open cubes
◦ Cubes are just intervals (in the Rn sense) where each side length is the same

We say that G is G_δ if, for a sequence of open sets {G_k}_{k∈N},

G = ⋂_{k=1}^{∞} G_k

We say that F is F_σ if, for a sequence of closed sets {F_k}_{k∈N},

F = ⋃_{k=1}^{∞} F_k

Note that G need not be open, and that F need not be closed. (However, CG is Fσ and CF is Gδ by De
Morgan.)
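For example, Q = ⋃_{q∈Q} {q} is F_σ in R (a countable union of closed singletons), and hence the set of irrationals R − Q is G_δ.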

§13.10: Riemann Integration

In line with Measure and Integral, we can define the integral in Rn by


ˆ Take I to be an interval: I := ∏_{i=1}^{n} [a_i, b_i] ⊆ R^n

ˆ We define its volume v(I) := ∏_{i=1}^{n} (b_i − a_i)

ˆ Let f : I → R be bounded
ˆ Let Γ := {I_k}_{k=1}^{N} partition I into finitely many nonoverlapping intervals (“nonoverlapping” = “intersects only on boundary”)

ˆ Define ∥Γ∥ := max_{1≤k≤N} diam(I_k), where diam(S) := sup_{x,y∈S} d(x, y) in a metric space (X, d) (in R^n, the 2-norm)
ˆ Take tags Ξ := {ξ_k}_{k=1}^{N} where ξ_k ∈ I_k.

ˆ Define the Riemann sum R and upper/lower Darboux sums U, L (w.r.t. f, Γ, and, for R, the tags Ξ; the Darboux sums do not depend on Ξ) by

R_{f,Γ,Ξ} := ∑_{k=1}^{N} f(ξ_k) · v(I_k)

U_{f,Γ} := ∑_{k=1}^{N} ( sup_{x∈I_k} f(x) ) · v(I_k)

L_{f,Γ} := ∑_{k=1}^{N} ( inf_{x∈I_k} f(x) ) · v(I_k)

Then Measure and Integral offers several definitions: we say A := ∫_I f iff

ˆ lim_{∥Γ∥→0} R_{f,Γ,Ξ} = A

ˆ Formally: (∀ε > 0)(∃δ > 0)(∀Γ such that ∥Γ∥ < δ)(∀Ξ)( |A − R_{f,Γ,Ξ}| < ε )

ˆ Equivalently, inf_Γ U_{f,Γ} = sup_Γ L_{f,Γ} = A (M&I, Prob. 1.1r)

◦ Some write inf_Γ U_{f,Γ} as the upper integral of f over I, and sup_Γ L_{f,Γ} as the lower integral

Other equivalent conditions follow, with the choice of tags Ξ, Ξ_n implied:

ˆ (∀ε > 0)(∃δ > 0)(∀Γ with ∥Γ∥ < δ)( U_{f,Γ} − L_{f,Γ} < ε )

ˆ (∀ε > 0)(∃Γ)( U_{f,Γ} − L_{f,Γ} < ε ) for f bounded (M&I, Prob. 1.15)

ˆ Cauchy Criterion: (∀ε > 0)(∃δ > 0)(∀Γ, Γ′ such that ∥Γ∥, ∥Γ′∥ < δ)( |R_{f,Γ,Ξ} − R_{f,Γ′,Ξ′}| < ε )

ˆ (∃Γ_n with ∥Γ_n∥ → 0 as n → ∞)( U_{f,Γ_n} − L_{f,Γ_n} → 0 as n → ∞ )

ˆ (∀ε > 0)(∃s a step function on I)( |f(x) − s(x)| < ε )

Notable integrability results:

ˆ Lebesgue-Vitali or Riemann-Lebesgue Theorem: If f is bounded and continuous a.e., it is integrable (this covers, e.g., monotone functions, whose discontinuities are at most countable)
ˆ A Squeeze Theorem: (∀ε > 0)(∃α, β ∈ R(I) such that α ≤ f ≤ β on I)( ∫_I (β − α) < ε )

§13.11: Sequences of Functions

For now, we speak of {fn : D ⊆ R → R}n∈N and f as a hypothetical limiting function as needed.

ˆ Pointwise Convergence: We say {fn }n∈N converges pointwise to f iff...

◦ lim_{n→∞} f_n(x) = f(x) for each x ∈ D

◦ (∀x ∈ D)(∀ε > 0)(∃N := N_{x,ε} ∈ N)(∀n ≥ N)( |f_n(x) − f(x)| < ε )

ˆ Uniform Convergence: We say {fn }n∈N converges uniformly to f iff


 
(∀ε > 0)(∃N := N_ε ∈ N)(∀n ≥ N)(∀x ∈ D)( |f_n(x) − f(x)| < ε )
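For example, f_n(x) = x^n on [0, 1] converges pointwise to f with f(x) = 0 for x < 1 and f(1) = 1, but not uniformly: sup_{x∈[0,1)} |f_n(x) − f(x)| = 1 for every n. (Consistent with the Uniform Limit Theorem below, this pointwise limit fails to be continuous.)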

Items of note:

ˆ Uniform Limit Theorem: (Wikipedia) If fn are continuous (on E) and converge uniformly to f
(finite everywhere), then f is continuous (on E) (M&I, Thm. 1.16)
ˆ fn converges uniformly to f iff

◦ ∃{M_n}_{n∈N} ⊆ R≥0 with M_n → 0 such that sup_{x∈D} |f_n(x) − f(x)| ≤ M_n for all n large enough

◦ Cauchy Criterion: (∀ε > 0)(∃N ∈ N)(∀n, m ≥ N)(∀x ∈ D)( |f_m(x) − f_n(x)| < ε )

ˆ If fn converges to f uniformly, then...

◦ fn → f pointwise
◦ if each f_n is bounded on D, then f_n, f are uniformly bounded on D (i.e. bounded, all by the same constant)
◦ You can interchange limit and integral if the f_n are Riemann-integrable:

lim_{n→∞} ∫_D f_n = ∫_D lim_{n→∞} f_n = ∫_D f

ˆ Suppose {f_n}_{n∈N} ⊆ C¹[a, b] with each f_n′ Riemann integrable on [a, b]. Moreover, let {f_n′}_{n∈N} converge uniformly to g, and suppose there is some x₀ ∈ [a, b] such that {f_n(x₀)}_{n∈N} converges as a sequence in R. Then {f_n}_{n∈N} converges uniformly to some f ∈ C¹[a, b] with f′ = g.
ˆ Weierstrass M-test: (Wikipedia) Consider {f_n : A → F ∈ {R, C}}_{n∈N} and suppose ∃{M_n}_{n∈N} ⊆ R≥0 such that the following hold:

◦ |f_n(x)| ≤ M_n, ∀n ∈ N and ∀x ∈ A

◦ ∑_{n=1}^{∞} M_n converges.

Then ∑_{n=1}^{∞} f_n(x) converges absolutely & uniformly on A.
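For example, ∑_{n=1}^{∞} sin(nx)/n² converges absolutely & uniformly on R: take M_n = 1/n², since |sin(nx)/n²| ≤ 1/n² and ∑ 1/n² converges.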

§14: Notes from Self-Studying Real Analysis

§14.1: (Baby Rudin, Chapters 1 & 2) Basics & Fundamentals

Familiar definitions and ideas are skipped. Those less memorable or possibly confusing/conventional are
noted.

ˆ Subset: Rudin writes “A ⊂ B” to mean A ⊆ B (i.e. his ⊂ allows equality and does not mean proper subset).

ˆ Common Sets: Rudin chooses “Q” to mean Q, “R” to mean R, etc.

ˆ Order Relation: Given a set S, an order < on S is a relation such that

(i) Given x, y ∈ S, one and only one of x < y, x = y, y < x are true
(ii) Given x, y, z ∈ S, then x < y and y < z imply x < z (transitivity)
S with such a relation < is called an ordered set. We write ≤ when equality is allowed.
ˆ Extrema: Notions of upper bounds, lower bounds, greatest lower bound (infimum), and least upper
bound (supremum) may be defined w.r.t. this ordering. Given E ⊂ S, then

◦ Upper Bound: β ∈ S is an upper bound of E (and E is bounded above) if x ≤ β for each


x∈E
◦ Lower Bound: α ∈ S is a lower bound of E (and E is bounded below ) if x ≥ α for each
x∈E
◦ Supremum: β ∈ S is the supremum of E, denoted β = sup E, if β is the least upper bound
(lub) of E. That is, β is an upper bound, and if β ′ is also an upper bound, then β ≤ β ′ .
Equivalently, if β ′ < β, then β ′ is not an upper bound of E.
◦ Infimum: α ∈ S is the infimum of E, denoted α = inf E, if α is the greatest lower bound (glb)
of E. That is, α is a lower bound, and if α′ is also a lower bound, then α ≥ α′ . Equivalently, if
α′ > α, then α′ is not a lower bound of E.
◦ We define (or can argue) that

inf ∅ = +∞ sup ∅ = −∞

◦ Sets need not contain their suprema/infima, e.g. (0, 1) ⊆ R.


◦ An ordered set (S, <) is said to have the least upper bound property if

∀E ⊆ S with E ̸= ∅ and E bounded above, then sup E ∈ S

and has the greatest lower bound property if

∀E ⊆ S with E ̸= ∅ and E bounded below, then inf E ∈ S

A set with the least upper bound property has the greatest lower bound property, and vice versa.
Moreover, given E ⊆ S nonempty and bounded below,

sup{α | α is a lower bound of E} = inf E ∈ S

i.e. the infimum is the supremum of the lower bounds. Likewise, the supremum is the infimum of the upper bounds. (Rudin's PMA, Thm. 1.11)

ˆ On Functions, Sets, & Cardinality:

◦ Rudin uses “onto” for surjections, “one-to-one” for injections, and “one-to-one correspondence”
for bijections.
◦ If ∃f : A → B a bijection, then A and B have equal cardinality and cardinal numbers, a relation
denoted by A ∼ B.
◦ Let Jn := {1, 2, · · ·, n} and J = Z+ = {1, 2, · · ·}. Then Rudin defines the following, given a set A:
A is finite if A ∼ Jn for some n ∈ N or if A = ∅
A is infinite otherwise
A is countable if A ∼ J (i.e. countable means countably infinite)
A is uncountable if neither finite nor countable
A is at most countable if finite or countable
A is enumerable/denumerable if countable
◦ Formally, a sequence (a_n)_{n=1}^{∞} is just a function f : N → S (for some S in which the sequence lives) where f(n) = a_n.
◦ If B ⊆ A and A is countable and B is infinite, then B is countable. (Rudin’s PMA, Thm. 2.8)
◦ If {E_n}_{n∈N} is a family of countable sets, then ⋃_{n∈N} E_n is countable. (Rudin's PMA, Thm. 2.12)

§14.2: (Baby Rudin, Chapter 2) Metric Spaces & Topology

ˆ On Metric Spaces: A Definition-Dump: Assume we are working in a metric space (X, d) with
E ⊆ X unless stated otherwise.

◦ Metric / Metric Space: Given a set X of points, we say d : X × X → R≥0 is a metric on X


(and (X, d) a metric space) if
d(x, y) = 0 ⇐⇒ x = y (positivity)
d(x, y) = d(y, x) (symmetry)
d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)
◦ Intervals & Cells: We define, in R, the intervals in the usual way.
Given a := (a_i)_{i=1}^{n}, b := (b_i)_{i=1}^{n} ∈ R^n with a_i < b_i for each i, we define the n-cell as the points x := (x_i)_{i=1}^{n} such that a_i ≤ x_i ≤ b_i for each i. (So the product, really: ∏_{i=1}^{n} [a_i, b_i].)
◦ Balls: If x ∈ Rn , r > 0, then we define the open & closed (resp.) balls (in the usual Euclidean
way)
B(x; r) := {z ∈ Rn | |x − z| < r} B(x; r) := {z ∈ Rn | |x − z| ≤ r}

◦ Convex Sets: A set E ⊆ Rn is convex if

∀x, y ∈ E, ∀λ ∈ [0, 1], we have λx + (1 − λ)y ∈ E

i.e. the line segment connecting x, y is in E.


◦ Neighborhood: An (open) neighborhood of p ∈ X is a set Nr (p) of all q with d(p, q) < r for
an r > 0. (r may be called its radius. Note that this is just an open ball, B(p; r).)
◦ Limit Point: p is a limit point of E ⊆ X if each neighborhood of p contains a point q distinct
from p lying in E, i.e.

∀r > 0, ∃q ∈ Nr (p) such that q ̸= p and q ∈ Nr (p) ∩ E

If not a limit point, p is said to be an isolated point. We may write E ′ as the set of its limit
points.
◦ Closed: E is said to be closed if it contains all its limit points, i.e. E ′ ⊆ E.
◦ Interior: p is said to be an interior point of E if there is a neighborhood of p contained entirely
in E, i.e.
∃r > 0 such that Nr (p) ⊆ E
◦ Open: E is open if all of its points are interior points.
◦ Complement: E c , defined as those points in the grander space not in E.
◦ Perfect: E is said to be perfect if it is closed and all points of E are limit points, i.e. E ′ = E.
◦ Bounded: E is said to be bounded if ∃M ∈ R and q ∈ X such that d(p, q) < M for each p ∈ E.
◦ Dense: E is dense in X if each point of X is a limit point of E or lies in E, or both, i.e.
X = E ∪ E′.
◦ Closure: The closure of E is itself alongside its limit points, i.e. E := E ∪ E ′ .
◦ Open Cover: An open cover of E is a set {G_α}_{α∈A} of open sets in X where E ⊆ ⋃_{α∈A} G_α.
◦ Compact: K is said to be compact in X if open covers of K contain finite subcovers, i.e. if given {G_α}_{α∈A} an open cover of K, one can find G_1, · · ·, G_n ∈ {G_α}_{α∈A} such that {G_i}_{i=1}^{n} is also an open cover of K, so K ⊆ ⋃_{i=1}^{n} G_i.

◦ Connected & Separated: A, B ⊆ X are said to be separated if A ∩ B = A ∩ B = ∅. (Note
that separated =⇒ disjoint, but not the converse, e.g. [0, 1] and (1, 2).)
We say that E is a connected set if it cannot be written as a union of two, nonempty, separated
sets.

ˆ Results from the Definition-Dump:

◦ If p ∈ E ′ , then each neighborhood of p has infinitely many points of E. (Rudin’s PMA, Thm.
2.20)
◦ E is closed iff E c is open, and E is open iff E c is closed. (Rudin’s PMA, Thm. 2.23)
◦ Arbitrary unions of open sets are open (Rudin’s PMA, Thm. 2.24a)
◦ Arbitrary intersections of closed sets are closed (Rudin’s PMA, Thm. 2.24b)
◦ Finite intersections of open sets are open (Rudin’s PMA, Thm. 2.24c)
◦ Finite unions of closed sets are closed (Rudin’s PMA, Thm. 2.24d)
◦ Closures of sets are closed (Rudin’s PMA, Thm. 2.27a)
◦ E is closed iff E = E (Rudin’s PMA, Thm. 2.27b)
◦ E is the smallest closed set containing E, i.e. if E ⊆ F and F is closed then E ⊆ F (Rudin’s
PMA, Thm. 2.27c)
◦ If E ⊆ R is nonempty and bounded above, then sup E ∈ E (Rudin’s PMA, Thm. 2.28)
◦ Compact sets are closed. (Rudin’s PMA, Thm. 2.34)
◦ Closed subsets of compact sets are themselves compact. (Rudin’s PMA, Thm. 2.35)
◦ If $\{K_\alpha\}_{\alpha\in A}$ are all compact, and every finite subcollection $\{K_i\}_{i=1}^{n} \subseteq \{K_\alpha\}_{\alpha\in A}$ has $\bigcap_{i=1}^{n} K_i \ne \emptyset$, then $\bigcap_{\alpha\in A} K_\alpha \ne \emptyset$. (Rudin's PMA, Thm. 2.36)
◦ If E ⊆ K, E infinite and K compact, then E has a limit point in K. (Rudin’s PMA, Thm. 2.37)
◦ Heine-Borel: Given E ⊆ Rn , the following are equivalent: (Rudin’s PMA, Thm. 2.41)
(i) E is closed and bounded
(ii) E is compact
(iii) Infinite subsets of E have limit points in E
(Heine-Borel is typically just taken as the equivalence of the first two.)
◦ Due to Weierstrass: Bounded infinite subsets of Rn have limit points in Rn . (Rudin’s PMA,
Thm. 2.42)
◦ Nonempty perfect sets are uncountable. (Rudin’s PMA, Thm. 2.43)

ˆ Properties, from exercises:

◦ $E'$ is closed (Rudin's PMA, Prob. 2.6)
◦ $\overline{E}{}' = E'$, i.e. E and $\overline{E}$ have the same limit points (Rudin's PMA, Prob. 2.6)
◦ $\overline{\bigcup_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \overline{A_i}$ (Rudin's PMA, Prob. 2.7a)
◦ $\overline{\bigcup_{i=1}^{\infty} A_i} \supseteq \bigcup_{i=1}^{\infty} \overline{A_i}$ (Rudin's PMA, Prob. 2.7b)
For an example of strict inclusion, take
$$A_n = \left(0,\ 1 - \frac{1}{n}\right) \implies \overline{A_n} = \left[0,\ 1 - \frac{1}{n}\right]$$
so
$$\bigcup_{n\in\mathbb{N}} A_n = (0, 1) \qquad \bigcup_{n\in\mathbb{N}} \overline{A_n} = [0, 1) \qquad \overline{\bigcup_{n\in\mathbb{N}} A_n} = [0, 1]$$
◦ The interior $E^\circ$ is open (Rudin's PMA, Prob. 2.9a)
◦ E is open $\iff E = E^\circ$ (Rudin's PMA, Prob. 2.9b)
◦ If $G \subseteq E$ with G open, then $G \subseteq E^\circ$ (Rudin's PMA, Prob. 2.9c)
Hence, $E^\circ$ is the largest open set contained in E
◦ $(E^\circ)^c = \overline{E^c}$ (Rudin's PMA, Prob. 2.9d)
ˆ Defined items and properties from the exercises:

◦ Separable: (X, d) is separable if ∃D ⊆ X which is dense in X and countable.


◦ Base: In (X, d), {Vα }α∈A ⊆ P(X) is said to be a base of X if :
(i) each Vα is open
(ii) ∀x ∈ X, ∀G ⊆ X such that G is open and x ∈ G, then ∃α ∈ A s.t. x ∈ Vα ⊆ G
Hence, open sets in X are a union of the elements of {Vα }α∈A
◦ Separable metric spaces have a basis of countably-many open sets. (Rudin’s PMA, Prob. 2.23)
◦ If each infinite subset of (X, d) has a limit point, X is separable. (Rudin’s PMA, Prob. 2.24)
Such X are also compact. (Rudin’s PMA, Prob. 2.26)
◦ Compact metric spaces have countable bases, and hence are separable. (Rudin’s PMA, Prob.
2.25)
◦ Condensation Point: $p \in X$ is a condensation point of $E \subseteq X$ if each neighborhood of p contains uncountably-many points of E.
For an uncountable set E, the collection P of its condensation points is perfect, with at-most-countably-many points of E not in P. (Rudin's PMA, Prob. 2.27)
◦ Closed sets in separable metric spaces are the union of a perfect set (possibly empty) and a set
which is at most countable. (Rudin’s PMA, Prob. 2.28)
◦ Open intervals in R are the union of at-most-countably-many disjoint open intervals. (Rudin’s
PMA, Prob. 2.29)

§14.3: (Baby Rudin, Chapter 3) Sequences & Series

Basic Definitions:
We assume we’re working in a metric space (X, d) unless stated otherwise. Let {pn }n∈N live in X.

ˆ Kinds of Sequences:

◦ Convergent: We say {pn }n∈N converges to p ∈ X if:


 
(∀ε > 0)(∃N ∈ N)(∀n ≥ N ) d(pn , p) < ε

We say $\{p_n\}_{n\in\mathbb{N}}$ is divergent otherwise. Note that p is the limit, and must live in the space. We denote this relationship by
$$\lim_{n\to\infty} p_n = p \qquad p_n \xrightarrow{n\to\infty} p$$
◦ Diverges to ±∞: Suppose $\{s_n\}_{n\in\mathbb{N}} \subseteq \mathbb{R}$.
$$s_n \xrightarrow{n\to\infty} +\infty \iff (\forall M \in \mathbb{R})(\exists N \in \mathbb{Z})(\forall n \ge N)(s_n \ge M)$$
$$s_n \xrightarrow{n\to\infty} -\infty \iff (\forall M \in \mathbb{R})(\exists N \in \mathbb{Z})(\forall n \ge N)(s_n \le M)$$

Note that such sequences are still divergent (in R).


◦ Subsequences: Given a sequence {pn }n∈N and a collection {nk }k∈N ⊆ N with the property that
ni < ni+1 , then {pnk }k∈N is a subsequence of {pn }n∈N .
If p is a limit of some subsequence {pnk }k∈N , we say p is a subsequential limit.
◦ Cauchy: We say that {pn }n∈N is Cauchy if
 
(∀ε > 0)(∃N ∈ N)(∀n, m ≥ N ) d(pn , pm ) < ε

◦ Bounded: We say {pn }n∈N is bounded if it is bounded as a subset of X. We may say that the
points {p1 , p2 , · · ·} form the range of the sequence.
◦ Monotonicity: We say {pn }n∈N ⊆ R is monotonically increasing if pn ≤ pn+1 for each n,
and monotonically decreasing if pn ≥ pn+1 for each n.

ˆ Diameter: Given $E \subseteq X$ (E nonempty, (X, d) a metric space), we define its diameter to be
$$\operatorname{diam} E := \sup_{p, q \in E} d(p, q)$$

ˆ Cauchy Criterion: Theorem 3.11 gives us that {pn }n∈N is convergent in Rn iff {pn }n∈N is Cauchy.
This is called the Cauchy criterion for convergence.
ˆ Completeness: A metric space where all Cauchy sequences in the space converge (in the space) is
said to be complete.
ˆ Limit Supremum / Limit Infimum: Define, for a given $\{x_n\}_{n\in\mathbb{N}} \subseteq \mathbb{R}$,
$$E := \left\{ x \in \overline{\mathbb{R}} \,\middle|\, \exists \{x_{n_k}\}_{k\in\mathbb{N}} \text{ a subsequence of } \{x_n\}_{n\in\mathbb{N}} \text{ such that } x_{n_k} \xrightarrow{k\to\infty} x \right\}$$
(allowing extended-real subsequential limits). Then we define the limit supremum and limit infimum of $\{x_n\}_{n\in\mathbb{N}}$ by
$$\limsup_{n\to\infty} x_n := \sup E \qquad \liminf_{n\to\infty} x_n := \inf E$$
These extrema are uniquely defined, and lie in E. (Rudin's PMA, Thm. 3.17)
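A quick worked example (mine, not from Rudin): for $x_n = (-1)^n\left(1 + \frac{1}{n}\right)$, the set of subsequential limits is $E = \{-1, 1\}$, so
$$\limsup_{n\to\infty} x_n = 1 \qquad \liminf_{n\to\infty} x_n = -1$$
while $\sup_n x_n = \frac{3}{2}$ (attained at $n = 2$), so the limit supremum can be strictly smaller than the supremum.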

ˆ Series: Remember that
$$\sum_{n=p}^{q} a_n := a_p + a_{p+1} + \cdots + a_q \qquad \sum_{n=1}^{\infty} a_n := \lim_{N\to\infty} \sum_{n=1}^{N} a_n$$
ˆ Power Series: Given $\{c_n\}_{n\in\mathbb{N}} \subseteq \mathbb{C}$ a sequence of coefficients, its power series is given by $\sum_{n=0}^{\infty} c_n z^n$. Applying the ratio or root tests to the series, we get
$$R := \begin{cases} \dfrac{1}{\limsup_{n\to\infty} \sqrt[n]{|c_n|}}, & \limsup_{n\to\infty} \sqrt[n]{|c_n|} \in (0, \infty) \\ \infty, & \limsup_{n\to\infty} \sqrt[n]{|c_n|} = 0 \\ 0, & \limsup_{n\to\infty} \sqrt[n]{|c_n|} = \infty \end{cases}$$
We say R is the radius of convergence of the series, i.e. it converges for $|z| < R$ and diverges for $|z| > R$. It may or may not converge for $|z| = R$.
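A quick worked check (mine, not from Rudin): for $\sum_{n=1}^{\infty} \frac{z^n}{n}$ we have $\sqrt[n]{|c_n|} = n^{-1/n} \xrightarrow{n\to\infty} 1$, so $R = 1$. At $z = 1$ the series is the (divergent) harmonic series, while at $z = -1$ it converges by the alternating series test, so behavior on $|z| = R$ really can go either way.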
ˆ Absolute Convergence: If $\sum |a_n|$ converges, we say $\sum a_n$ absolutely converges.
Basic Results:
We assume, unless stated otherwise, that we’re working in a metric space (X, d), and sequences live in
that space.

ˆ The obvious arithmetic properties of sequences hold in $\mathbb{R}^m$ and $\mathbb{C}$. So, if (Rudin's PMA, Thm. 3.3)
$$\{x_n\}_{n\in\mathbb{N}}, \{y_n\}_{n\in\mathbb{N}} \subseteq \mathbb{C} \qquad \alpha, \beta \in \mathbb{C} \qquad x_n \xrightarrow{n\to\infty} x \qquad y_n \xrightarrow{n\to\infty} y$$
then
$$\alpha x_n + \beta y_n \xrightarrow{n\to\infty} \alpha x + \beta y \qquad x_n y_n \xrightarrow{n\to\infty} xy \qquad \frac{1}{x_n} \xrightarrow{n\to\infty} \frac{1}{x} \ \text{(if } x \ne 0 \text{ and } x_n \ne 0 \text{ for all sufficiently large } n\text{)}$$
And, if we define
$$\{x_n\}_{n\in\mathbb{N}}, \{y_n\}_{n\in\mathbb{N}} \subseteq \mathbb{R}^m \qquad \{\gamma_n\}_{n\in\mathbb{N}} \subseteq \mathbb{R} \qquad \alpha, \beta, \gamma \in \mathbb{R}$$
$$x_n := \big[\xi_1^{(n)}, \xi_2^{(n)}, \cdots, \xi_m^{(n)}\big] \in \mathbb{R}^m \qquad y_n := \big[\eta_1^{(n)}, \eta_2^{(n)}, \cdots, \eta_m^{(n)}\big] \in \mathbb{R}^m$$
$$x := [\xi_1, \cdots, \xi_m] \in \mathbb{R}^m \qquad y := [\eta_1, \cdots, \eta_m] \in \mathbb{R}^m$$
$$x_n \xrightarrow{n\to\infty} x \qquad y_n \xrightarrow{n\to\infty} y \qquad \gamma_n \xrightarrow{n\to\infty} \gamma$$
then (Rudin's PMA, Thm. 3.4a)
$$x_n \xrightarrow{n\to\infty} x \iff \text{componentwise convergence: } \xi_i^{(n)} \xrightarrow{n\to\infty} \xi_i \text{ for each } i$$
and (Rudin's PMA, Thm. 3.4b)
$$\alpha x_n + \beta y_n \xrightarrow{n\to\infty} \alpha x + \beta y \qquad \gamma_n x_n \xrightarrow{n\to\infty} \gamma x \qquad \langle x_n, y_n\rangle_{\mathbb{R}^m} \xrightarrow{n\to\infty} \langle x, y\rangle_{\mathbb{R}^m}$$

ˆ Series also have their obvious arithmetic properties if convergent. Suppose (Rudin's PMA, Thm. 3.47)
$$\sum a_n = A \qquad \sum b_n = B \qquad \alpha, \beta \in \mathbb{C}$$
Then
$$\sum (\alpha a_n + \beta b_n) = \alpha A + \beta B$$
$$\left(\sum_{n=0}^{\infty} a_n\right)\left(\sum_{n=0}^{\infty} b_n\right) = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n} a_k b_{n-k}\right) \qquad \text{(Cauchy product)}$$
Note that the product of convergent series may diverge: consider
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}}$$
It converges by the alternating series test, but its square (Cauchy product with itself) does not.
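A short reason why (my sketch of the standard argument): by AM-GM, $\sqrt{(k+1)(n-k+1)} \le \frac{n+2}{2}$, so the terms of the square satisfy
$$|c_n| = \sum_{k=0}^{n} \frac{1}{\sqrt{(k+1)(n-k+1)}} \ge \frac{2(n+1)}{n+2} \xrightarrow{n\to\infty} 2 \ne 0$$
and the nth term test already forces divergence.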


ˆ $p_n \xrightarrow{n\to\infty} p$ iff $\forall r > 0$, $B(p; r)$ contains all but finitely many of the $p_n$ (Rudin's PMA, Thm. 3.2a)
ˆ $p_n \xrightarrow{n\to\infty} p \iff p_{n_k} \xrightarrow{k\to\infty} p$ for all subsequences $\{p_{n_k}\}_{k\in\mathbb{N}}$ of $\{p_n\}_{n\in\mathbb{N}}$
ˆ The collection of subsequential limits of a given sequence forms a closed set (Rudin's PMA, Thm. 3.7)
ˆ $\{p_n\}_{n\in\mathbb{N}}$ is Cauchy iff, for $E_N := \{p_i\}_{i=N}^{\infty}$, we have $\operatorname{diam} E_N \xrightarrow{N\to\infty} 0$.
ˆ $\operatorname{diam} \overline{E} = \operatorname{diam} E$ (Rudin's PMA, Thm. 3.10a)
ˆ For $K_1 \supseteq K_2 \supseteq K_3 \supseteq \cdots$ all compact, with $\operatorname{diam} K_n \xrightarrow{n\to\infty} 0$, then $\bigcap_{n=1}^{\infty} K_n$ is a singleton set. (Rudin's PMA, Thm. 3.10b)

ˆ Closed subsets of complete spaces are themselves complete.

ˆ Given a monotone sequence, it converges iff it is bounded. (Rudin’s PMA, Thm. 3.14)

Items on Limit Supremum & Limit Infimum:

ˆ Suppose we have $\{s_n\}_{n\in\mathbb{N}}, \{t_n\}_{n\in\mathbb{N}} \subseteq \mathbb{R}$ with $s_n \le t_n$ for all n large enough. Then (Rudin's PMA, Thm. 3.19)
$$\liminf_{n\to\infty} s_n \le \liminf_{n\to\infty} t_n \qquad \limsup_{n\to\infty} s_n \le \limsup_{n\to\infty} t_n$$
ˆ Given $\{c_n\}_{n\in\mathbb{N}} \subseteq (0, \infty)$, then
$$\liminf_{n\to\infty} \frac{c_{n+1}}{c_n} \le \liminf_{n\to\infty} \sqrt[n]{c_n} \le \limsup_{n\to\infty} \sqrt[n]{c_n} \le \limsup_{n\to\infty} \frac{c_{n+1}}{c_n}$$

ˆ We have, given $\{a_n\}_{n\in\mathbb{N}}, \{b_n\}_{n\in\mathbb{N}} \subseteq \mathbb{R}$, (Rudin's PMA, Prob. 3.5)
$$\limsup_{n\to\infty}(a_n + b_n) \le \limsup_{n\to\infty} a_n + \limsup_{n\to\infty} b_n$$
provided the right-hand side is not of the indeterminate form $\infty + (-\infty)$.
Series Convergence Tests & Such:

ˆ Cauchy Criterion: $\sum a_n$ converges iff $(\forall \varepsilon > 0)(\exists N \in \mathbb{N})\Big( m \ge n \ge N \implies \big| \sum_{i=n}^{m} a_i \big| \le \varepsilon \Big)$ (Rudin's PMA, Thm. 3.22)
ˆ nth Term Test: $\sum a_n$ converges $\implies a_n \xrightarrow{n\to\infty} 0$ (not the converse) (Rudin's PMA, Thm. 3.23)
ˆ If $a_n \ge 0$, then $\sum a_n$ converges iff the partial sums are a bounded sequence (Rudin's PMA, Thm. 3.24)
ˆ Comparison Tests: If $|a_n| \le c_n$ eventually, then $\sum c_n$ converging implies the same for $\sum a_n$. Likewise, if $a_n \ge d_n \ge 0$ eventually and $\sum d_n$ diverges, so must $\sum a_n$. (Rudin's PMA, Thm. 3.25)
ˆ Cauchy Condensation: Suppose $a_1 \ge a_2 \ge \cdots \ge 0$. Then $\sum_{n=1}^{\infty} a_n$ converges iff $\sum_{k=0}^{\infty} 2^k a_{2^k}$ does. (Rudin's PMA, Thm. 3.27)
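A quick worked application (mine, not from Rudin): for the p-series $a_n = n^{-p}$, $p > 0$,
$$\sum_{k=0}^{\infty} 2^k a_{2^k} = \sum_{k=0}^{\infty} 2^k \cdot 2^{-kp} = \sum_{k=0}^{\infty} \left(2^{1-p}\right)^k$$
is geometric, converging iff $2^{1-p} < 1$, i.e. iff $p > 1$; this recovers the p-series test listed among the notable series further below.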

ˆ Root Test: Given $\sum a_n$, let $\alpha := \limsup_{n\to\infty} \sqrt[n]{|a_n|}$. Then the sum converges if $\alpha < 1$, and diverges if $\alpha > 1$. The test is inconclusive if $\alpha = 1$. (Rudin's PMA, Thm. 3.35)
ˆ Ratio Test: Given $\sum a_n$, then if (Rudin's PMA, Thm. 3.36)
$$\limsup_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| < 1$$
then we have convergence. If $|a_{n+1}/a_n| \ge 1$ for all n eventually, we have divergence.

ˆ The ratio test is weaker than the root test; the root test works on more series, and comes to the same
conclusions as the ratio test when the latter comes to a conclusion at all.
ˆ Summation by Parts: Given $\{a_n\}_{n\in\mathbb{N}}, \{b_n\}_{n\in\mathbb{N}}$, define (Rudin's PMA, Thm. 3.41)
$$A_n := \sum_{k=0}^{n} a_k \text{ for } n \ge 0 \quad \text{and} \quad A_{-1} := 0$$
Then for $q \ge p \ge 0$ it holds that
$$\sum_{n=p}^{q} a_n b_n = \sum_{n=p}^{q-1} A_n (b_n - b_{n+1}) + A_q b_q - A_{p-1} b_p$$

ˆ Use $\{a_n\}_{n\in\mathbb{N}}, \{b_n\}_{n\in\mathbb{N}}, \{A_n\}_{n\in\mathbb{N}}$ as given above, with $\{A_n\}_{n\in\mathbb{N}}$ a bounded sequence, $\{b_n\}_{n\in\mathbb{N}}$ monotone decreasing, and $b_n \xrightarrow{n\to\infty} 0$. Then $\sum a_n b_n$ converges. (Rudin's PMA, Thm. 3.42)
ˆ Alternating Series Test: Given {cn }n∈N with |c1 | ≥ |c2 | ≥ · · ·, odd-index terms nonnegative, even-
n→∞ P
index terms nonpositive, and cn −−−−→ 0, then cn converges. (Rudin’s PMA, Thm.
3.43)

ˆ If cn z n has coefficients decreasing to 0 and radius of convergence 1, then the series converges every-
P
where on the circle |z| = 1, except possibly at z = 1. (Rudin’s PMA, Thm.
3.44)
ˆ Absolute Convergence Test: A series which converges absolutely is convergent.

ˆ Suppose $\sum a_n$ converges absolutely, $\sum a_n = A$, $\sum b_n = B$, and $c_n = \sum_{k=0}^{n} a_k b_{n-k}$. Then
$$AB = \sum_{n=0}^{\infty} c_n$$
i.e. this product will converge, to the result we expect. (Rudin's PMA, Thm. 3.50)
ˆ If $\sum a_n$, $\sum b_n$, $\sum c_n$ converge to A, B, C respectively, and $c_n = \sum_{k=0}^{n} a_k b_{n-k}$, then $C = AB$. (Rudin's PMA, Thm. 3.51)
ˆ If $\sum a_n$ converges and $\{b_n\}_{n\in\mathbb{N}}$ is a monotonic bounded sequence, then $\sum a_n b_n$ converges. (Rudin's PMA, Prob. 3.8)
ˆ If $\sum a_n$, $\sum b_n$ are absolutely convergent, so is their Cauchy product. (Rudin's PMA, Prob. 3.13)
More Important Results:

ˆ Convergent versus Cauchy: (Rudin’s PMA, Thm. 3.11)

◦ In all cases, convergent =⇒ Cauchy (Rudin’s PMA, Thm. 3.11a)


n
◦ In compact spaces & R , Cauchy ⇐⇒ convergent (Rudin’s PMA, Thm. 3.11b,c)
n
◦ Hence, R and compact spaces are complete spaces.
ˆ If $\{p_n\}_{n\in\mathbb{N}}$ is Cauchy and has a subsequence $\{p_{n_k}\}_{k\in\mathbb{N}}$ with $p_{n_k} \xrightarrow{k\to\infty} p$ for some $p \in X$, then the whole sequence converges to p, i.e. $p_n \xrightarrow{n\to\infty} p$ (Rudin's PMA, Prob. 3.20)
ˆ If $p_n \xrightarrow{n\to\infty} p$ and $p_n \xrightarrow{n\to\infty} p'$, then $p = p'$; i.e., limits are unique (Rudin's PMA, Thm. 3.2b)

ˆ If {pn }n∈N converges, it is bounded (Rudin’s PMA, Thm. 3.2c)


ˆ If $p \in E'$, we may find a sequence $\{p_n\}_{n\in\mathbb{N}} \subseteq E$ with $p_n \xrightarrow{n\to\infty} p$ (limit points are reached by sequences) (Rudin's PMA, Thm. 3.2d)
ˆ If $\{p_n\}_{n\in\mathbb{N}}$ lives in a compact metric space, then $\exists \{p_{n_k}\}_{k\in\mathbb{N}}$ a subsequence with $p_{n_k} \xrightarrow{k\to\infty} p$ for some $p \in X$ (Rudin's PMA, Thm. 3.6a)
ˆ Each bounded sequence in Rn contains a convergent subsequence (Rudin’s PMA, Thm. 3.6b)

ˆ $\lim_{n\to\infty} s_n = s \iff \limsup_{n\to\infty} s_n = \liminf_{n\to\infty} s_n = s$

ˆ Baire's Theorem: Take X a nonempty complete metric space, and $\{G_n\}_{n\in\mathbb{N}} \subseteq \mathcal{P}(X)$ a sequence of dense, open subsets. Then $\bigcap_{n=1}^{\infty} G_n \ne \emptyset$, and is in fact dense in X. (Rudin's PMA, Prob. 3.22)
Notable Sequences & Series:
For sequences, as in Theorems 3.20, 3.31:
$$\frac{1}{n^p} \xrightarrow{n\to\infty} 0 \quad \text{for } p > 0$$
$$\sqrt[n]{p} \xrightarrow{n\to\infty} 1 \quad \text{for } p > 0$$
$$\sqrt[n]{n} \xrightarrow{n\to\infty} 1$$
$$\frac{n^\alpha}{(1+p)^n} \xrightarrow{n\to\infty} 0 \quad \text{for } p > 0 \text{ and } \alpha \in \mathbb{R}$$
$$x^n \xrightarrow{n\to\infty} 0 \quad \text{for } |x| < 1$$
$$\left(1 + \frac{1}{n}\right)^n \xrightarrow{n\to\infty} e$$
For series, as in Theorems 3.26, 3.28-3.29:
$$\sum_{n=0}^{\infty} x^n = \frac{1}{1-x} \quad \text{for each } x \in [0, 1) \text{ or, more broadly, } |x| < 1$$
$$\sum_{n=1}^{\infty} \frac{1}{n^p} \ \text{is} \ \begin{cases} \text{convergent}, & p > 1 \\ \text{divergent}, & p \le 1 \end{cases}$$
$$\sum_{n=2}^{\infty} \frac{1}{n(\log n)^p} \ \text{is} \ \begin{cases} \text{convergent}, & p > 1 \\ \text{divergent}, & p \le 1 \end{cases}$$
$$\sum_{n=0}^{\infty} \frac{1}{n!} =: e$$
§14.4: (Baby Rudin, Chapter 4) Continuity

Some Conventions:
If not otherwise stated:

ˆ X, Y denote metric spaces, with metrics dX , dY respectively

ˆ f : X → Y is a function of metric spaces

ˆ E ⊆ dom(f ) = X

ˆ f + g, f − g, f g, f /g, λf , and f · g are the functions defined by the rules

(f + g)(x) := f (x) + g(x)


(f − g)(x) := f (x) − g(x)
(f g)(x) := f (x)g(x)
 
f f (x)
(x) :=
g g(x)
(λf )(x) := λf (x)
(f · g)(x) := f (x) · g(x) ≡ ⟨f (x), g(x)⟩Rn (for functions with outputs in Rn )

Note that f · g is a function with output in R.

Fundamental Definitions:

ˆ Bounded: We say a function f : X → Rn is bounded if ∃M ∈ R≥0 such that, ∀x ∈ X, we have


∥f (x)∥2 ≤ M .
ˆ Continuity: Take f : E ⊆ X → Y a function of metric spaces with p ∈ E. We say f is continuous
at p if  

(∀ε > 0)(∃δ > 0) x ∈ E and dX (x, p) < δ =⇒ dY f (x), f (p) < ε
We say f is continuous on the set E if it is continuous at every point.
Note that f must be defined at the point p in question to be continuous there.
ˆ Uniform Continuity: Let f : X → Y be a function of metric spaces. We say f is uniformly
continuous on X if
  
(∀ε > 0)(∃δ > 0) dX (p, q) < δ =⇒ dY f (p), f (q) < ε

Note that this δ works for all p, q ∈ X, i.e. is independent of the point chosen. This is strictly stronger
than ordinary continuity.
ˆ One-Sided Limits: Take $f : (a, b) \to Y$. For $x \in [a, b)$ and $x \in (a, b]$ respectively, we say
$$f(x^+) = q \iff \forall \{t_n\}_{n\in\mathbb{N}} \subseteq (x, b) \text{ with } t_n \xrightarrow{n\to\infty} x \text{ we have } f(t_n) \xrightarrow{n\to\infty} q$$
$$f(x^-) = q \iff \forall \{t_n\}_{n\in\mathbb{N}} \subseteq (a, x) \text{ with } t_n \xrightarrow{n\to\infty} x \text{ we have } f(t_n) \xrightarrow{n\to\infty} q$$
Note that $\lim_{t\to x} f(t)$ exists $\iff f(x^+) = f(x^-) = \lim_{t\to x} f(t)$

ˆ Types of Discontinuity: Let f : (a, b) → Y . We say f has a simple discontinuity , or disconti-


nuity of the first kind , at x if f (x+ ), f (x− ) exist but f is discontinuous there.
Otherwise, f is said to have a discontinuity of the second kind at x.
Simple discontinuities arise if f (x+ ) ̸= f (x− ) or f (x+ ) = f (x− ) ̸= f (x).
ˆ Monotonicity: Let f : (a, b) → R. We say that
(i) f is (weakly) monotonically increasing if a < x < y < b =⇒ f (x) ≤ f (y)
(ii) f is (weakly) monotonically decreasing if a < x < y < b =⇒ f (x) ≥ f (y)
Monotone functions include either type.
ˆ Neighborhoods of Infinity: In R, open sets of the type (c, ∞) are said to be neighborhoods of
+∞, and those of the type (−∞, c) are said to be neighborhoods of −∞.
ˆ Possibly-Infinite Limits: Let f : E ⊆ R → R. We say
lim f (t) = A for x, A ∈ R ∪ {±∞}
t→x

if, ∀ neighborhoods U of A, ∃ a neighborhood V of x, with V ∩ E ̸= ∅ and, ∀t ∈ (V ∩ E)\{x}, we have


f (t) ∈ U .
ˆ Open/Closed Mappings: We say f : X → Y is an open mapping if V ⊆ X open =⇒ f (V ) ⊆ Y
is open. Likewise, if V ⊆ X closed =⇒ f (V ) ⊆ Y closed, then f is said to be a closed mapping .
ˆ Distance to Set: Given $\emptyset \ne E \subseteq X$ for a metric space (X, d), the distance from $x \in X$ to E is defined by
$$\rho_E(x) := \inf_{z\in E} d(x, z)$$

ˆ Convex Function: f : (a, b) → R is said to be convex if


 
∀x, y ∈ (a, b), ∀λ ∈ [0, 1], f λx + (1 − λ)y ≤ λf (x) + (1 − λ)f (y)

Limit of a Function:
Given f : X → Y a function of metric spaces and p ∈ E ′ (E ⊆ X) we say that
x→p
f (x) → q as x → p f (x) −−−→ q lim f (x) = q
x→p

if and only if ∃q ∈ Y with the property:


 
(∀ε > 0)(∃δ > 0) 0 < dX (x, p) < δ =⇒ dY (f (x), q) < ε

In Rn and Cn , for instance, with d(x, y) := |x − y| or ∥x − y∥,


 
(∀ε > 0)(∃δ > 0) 0 < |x − p| < δ =⇒ |f (x) − q| < ε

Some basic results:

ˆ We have (Rudin’s PMA, Thm. 4.1)

lim f (x) = q ⇐⇒ lim f (pn ) = q


x→p n→∞

n→∞
for every {pn }n∈N ⊆ E with pn ̸= p for all n but pn −−−−→ p.
Hence, if f has a limit as x → p, then the limit is unique.
ˆ Take f, g : X → C with E ⊆ X and p ∈ E ′ and (Rudin’s PMA, Thm. 4.4)

lim f (x) = A lim g(x) = B


x→p x→p

Then:

lim (f + g)(x) = A + B
x→p

lim (f g)(x) = AB
x→p
 
f A
lim (x) = , if B ̸= 0
x→p g B

If f, g : X → Rn instead then we can replace the middle statement with

lim (f · g)(x) = lim ⟨f (x), g(x)⟩Rn = ⟨A, B⟩Rn


x→p x→p

ˆ For a shorthand,

C(X, Y ) := {f : X → Y | f is continuous}
C(X) := C(X, X)

Continuous Functions:
Some results:

ˆ While not commented upon, Lipschitz continuity is used and is convenient: it is stronger than
ordinary continuity. Recall: f : X → Y is Lipschitz continuous with constant L if
 
∀x, y ∈ X we have dY f (x), f (y) ≤ L · dX (x, y)

For instance, this is used with the reverse triangle inequality,
$$\big|\, |x| - |y| \,\big| \le |x - y| \qquad \big|\, \|x\| - \|y\| \,\big| \le \|x - y\|$$
to show that the Euclidean norm/absolute value are 1-Lipschitz, hence continuous, on $\mathbb{R}^n$.

Suppose f, g : X → C are continuous. Then the following are too: (Rudin’s PMA, Thm. 4.9)

ˆ f ±g

ˆ fg

ˆ f /g (when g(x) ̸= 0 for each x)

If f, g : X → Rn , then these are continuous: (Rudin’s PMA, Thm. 4.10b)

ˆ f ±g

ˆ f · g (defined by (f · g)(x) := ⟨f (x), g(x)⟩Rn )

The composition $f \circ g : X \to Z$ of continuous functions $g : X \to Y$ and $f : Y \to Z$ is also continuous. (Rudin's PMA, Thm. 4.7)
Equivalent conditions to continuity:

ˆ If p is an isolated point of E, any function on E is continuous at p.


Otherwise, if p ∈ E ′ , then f is continuous at p iff lim f (x) = f (p) (Rudin’s PMA, Thm. 4.6)
x→p

ˆ f is continuous iff the preimage of open sets is open, i.e. f −1 (V ) is open in dom(f ) for each open V in
cod(f ) (Rudin’s PMA, Thm. 4.8)

ˆ f is continuous iff the preimage of closed sets is closed, i.e. f −1 (C) is closed in dom(f ) for each closed
C in cod(f )
ˆ If a function $f : X \to \mathbb{R}^k$ is defined by (Rudin's PMA, Thm. 4.10)
$$f(x) := (f_1(x), \cdots, f_k(x)) \quad \text{for } x \in X \text{ and } f_i : X \to \mathbb{R}$$
then f is continuous iff, for each i, $f_i \in C(X, \mathbb{R})$.

Topological invariants: those items preserved by continuous functions f ∈ C(X, Y ).

ˆ Connectedness. (If C is connected, so is f (C).) (Rudin’s PMA, Thm. 4.22)


ˆ Compactness. (If K is compact, so is f (K).) (Rudin’s PMA, Thm. 4.14)
ˆ Density. (If D is dense, so is f (D).) (Rudin’s PMA, Prob. 4.4)
Moreover, if f |D = g|D , then f = g on X (determined by values on dense set).

Interactions Between Compactness & Continuity:

ˆ Continuous functions preserve compactness. (If X is compact and f ∈ C(X, Y ), then f (X) is compact.)
(Rudin’s PMA, Thm. 4.14)
Hence if f ∈ C(X, Rn ) for X compact, then f (X) is closed & bounded. (Rudin’s PMA, Thm. 4.15)
ˆ If $f \in C(X, \mathbb{R})$ for X compact, then $\exists x^*, x_* \in X$ such that (Rudin's PMA, Thm. 4.16)
$$f(x^*) = \sup_{p\in X} f(p) \qquad f(x_*) = \inf_{p\in X} f(p)$$
i.e. continuous functions from a compact set attain their extrema.


ˆ Let f ∈ C(X, Y ) be a bijection with X compact. Then f −1 ∈ C(Y, X). (Rudin’s PMA, Thm. 4.17)
ˆ Uniformly continuous functions are continuous.

ˆ If f ∈ C(X, Y ) for X compact, then f is uniformly continuous on X. (Rudin’s PMA, Thm. 4.19)
That is, continuity on a compact set gives uniform continuity there.
ˆ If E ⊆ R is not compact, then: (Rudin’s PMA, Thm. 4.20)

◦ ∃f continuous on E which is not bounded


◦ ∃g continuous and bounded on E which has no maximum
◦ If E is bounded, ∃h continuous on E which is not uniformly continuous

The proof uses these, for E bounded with $x_0 \in E' - E$:
$$f(x) = \frac{1}{x - x_0} \qquad g(x) = \frac{1}{1 + (x - x_0)^2}$$
For E unbounded Rudin uses instead
$$f(x) = x \qquad g(x) = \frac{x^2}{1 + x^2}$$
Interactions Between Connectedness & Continuity:

ˆ If f ∈ C(X, Y ) and E ⊆ X is connected, so is f (E). (Rudin’s PMA, Thm. 4.22)


ˆ Intermediate Value Theorem: If $f \in C([a, b], \mathbb{R})$ with $f(a) < c < f(b)$ or $f(b) < c < f(a)$, then $\exists \xi \in (a, b)$ with $f(\xi) = c$.

Monotonicity & Continuity:

ˆ For f monotone-increasing on (a, b), then $f(x^+), f(x^-)$ exist everywhere, with
$$\sup_{t\in(a,x)} f(t) = f(x^-) \le f(x) \le f(x^+) = \inf_{t\in(x,b)} f(t)$$
and, if $a < x < y < b$ then $f(x^+) \le f(y^-)$. (Rudin's PMA, Thm. 4.29)
An analogous result holds in the decreasing case.
As a result, monotone functions only have simple discontinuities.
ˆ Monotone functions have at-most-countably-many discontinuities. (Rudin’s PMA, Thm. 4.30)

Other Properties from the Exercises:

ˆ For $f \in C(X, Y)$ we have $f\big(\overline{E}\big) \subseteq \overline{f(E)}$. (Rudin's PMA, Prob. 4.2)
ˆ For $f \in C(X, \mathbb{R})$, $\{f = 0\} := \{x \in X \mid f(x) = 0\}$ is closed. (Rudin's PMA, Prob. 4.3)
ˆ For f ∈ C(X, Y ) with D ⊆ X dense, then f (D) is dense. (Rudin’s PMA, Prob. 4.4)
Moreover, if f |D = g|D , then f = g on X (determined by values on dense set).

ˆ If $f \in C(E, Y)$, with $E \subseteq \mathbb{R}$ closed, then there is a continuous extension $\tilde{f} : \mathbb{R} \to Y$ with $\tilde{f} = f$ on E. (Rudin's PMA, Prob. 4.5)
Can extend to vector-valued functions on $\mathbb{R}^n$.

ˆ For E compact, f ∈ C(E, R) if and only if graph(f ) is compact. (Rudin’s PMA, Prob. 4.6)
ˆ If $f : E \subseteq \mathbb{R} \to \mathbb{R}$ is uniformly continuous on a bounded set E, then f is bounded on E. (Rudin's PMA, Prob. 4.8)
ˆ f : X → Y is uniformly continuous iff ∀ε > 0, ∃δ > 0 such that, if E ⊆ X has diam E < δ, then
diam f (E) < ε (Rudin’s PMA, Prob. 4.9)

ˆ If f : X → Y is uniformly continuous and {xn }n∈N is Cauchy in X, then {f (xn )}n∈N is Cauchy in Y .
(Rudin’s PMA, Prob. 4.11)
ˆ The composition of uniformly continuous functions is uniformly continuous. (Rudin’s PMA, Prob.
4.12)

ˆ If f : D → R is uniformly continuous, and D ⊆ X is dense, then f has a unique continuous extension


fe ∈ C(X, R). (Rudin’s PMA, Prob. 4.13)
ˆ Open continuous maps R → R are monotone. (Rudin’s PMA, Prob. 4.15)
ˆ If f : (a, b) → R, it has at most countably many simple discontinuities. (Rudin’s PMA, Prob. 4.17)

ˆ ρE (x) = 0 ⇐⇒ x ∈ E (Rudin’s PMA, Prob. 4.20a)


ˆ ρE (the “distance to a set” function) is uniformly continuous on X, and is in fact 1-Lipschitz as a
function X → R. (Rudin’s PMA, Prob. 4.20b)
ˆ If $K, F \subseteq X$ a metric space, with K compact, F closed, and $K \cap F = \emptyset$, then $\exists \delta > 0$ such that, $\forall p \in K, q \in F$, we have $d(p, q) > \delta$.
That is, a compact set can be separated from a disjoint closed set. (Rudin's PMA, Prob. 4.21)
ˆ Convex functions are continuous. (Rudin’s PMA, Prob. 4.23)
ˆ If f is monotone increasing and convex, and g is convex, f ◦ g is convex. (Rudin’s PMA, Prob. 4.23)

ˆ For f convex on (a, b), with $a < s < t < u < b$, then we have the three chord lemma: (Rudin's PMA, Prob. 4.23)
$$\frac{f(t) - f(s)}{t - s} \le \frac{f(u) - f(s)}{u - s} \le \frac{f(u) - f(t)}{u - t}$$

ˆ For $f \in C\big((a, b), \mathbb{R}\big)$ with the property (Rudin's PMA, Prob. 4.24)
$$\forall x, y \in (a, b), \quad f\left(\frac{x + y}{2}\right) \le \frac{f(x) + f(y)}{2}$$
then f is convex.

ˆ For A, B ⊆ Rn , we define (Rudin’s PMA, Prob. 4.25)

A + B := {a + b | a ∈ A, b ∈ B}

If K is compact and C is closed, both in Rn , then K + C is closed.

ˆ For X, Y, Z metric spaces with Y compact, f : X → Y , g ∈ C(Y, Z) a continuous injection, and


h = g ◦ f , we have that: (Rudin’s PMA, Prob. 4.26)

◦ h uniformly continuous =⇒ f is uniformly continuous


◦ h continuous =⇒ f continuous

§14.5: (Baby Rudin, Chapter 5) Differentiation in R

Fundamental Definitions:

ˆ Difference Quotient; Derivative: Given $f : [a, b] \to \mathbb{R}$ and $x \in [a, b]$, define the difference quotient $\varphi : (a, b)\setminus\{x\} \to \mathbb{R}$ by
$$\varphi(t) := \frac{f(t) - f(x)}{t - x} \quad \text{for all } t \in (a, b),\ t \ne x$$
and the derivative (provided the limit exists)
$$f'(x) := \lim_{t\to x} \varphi(t) = \lim_{t\to x} \frac{f(t) - f(x)}{t - x}$$
with $f' : D \to \mathbb{R}$, where $D \subseteq (a, b)$ is those points where the limit exists.
If f ′ is defined at x, we say f is differentiable at x, and differentiable on E if differentiable at each
x ∈ E.
One-sided (left- and right-hand) derivatives arise in the obvious way, with limits as t → x− and
t → x+ respectively. Rudin does not choose to discuss these in any detail, and leaves the matter of
differentiability at the endpoints of intervals mostly undiscussed (but chooses to imply these derivatives
are undefined).
ˆ Passage to C/Rn-Valued Functions: For $f : [a, b] \to \mathbb{C}$ in the form $f = f_1 + if_2$ (as a decomposition into real and imaginary parts), then
$$f' = f_1' + if_2'$$
with differentiability at $z \in [a, b]$ iff $f_1, f_2$ are differentiable at z. (This can be proven.) The derivative is still defined in the usual way for C-valued functions, i.e.
$$f'(z) := \lim_{t\to z} \frac{f(t) - f(z)}{t - z}$$
If $f : [a, b] \to \mathbb{R}^n$, then we perform the derivative with the norm: given such f, we define $f'(x)$ to be the point of $\mathbb{R}^n$ such that
$$\lim_{t\to x} \left\| \frac{f(t) - f(x)}{t - x} - f'(x) \right\|_2 = 0$$
if said point exists. Of course, as with C-valued functions, Rn-valued functions $f = (f_1, \cdots, f_n)$ are differentiable at $x \in [a, b]$ iff they are differentiable in each component at x.
ˆ Extrema Definitions: For f : X → R with (X, d) a metric space, we say the following of p ∈ X:

◦ p is a local maximum of f if ∃δ > 0 such that f (q) ≤ f (p) for all q ∈ B(p; δ).
◦ p is a local minimum of f if ∃δ > 0 such that f (q) ≥ f (p) for all q ∈ B(p; δ).

Elementary Results:

ˆ Arithmetic Properties of Derivatives: Given f, g : [a, b] → C differentiable at x ∈ [a, b], then we


have:

(i) Sum Rule: $(f + g)'(x) = f'(x) + g'(x)$ (Rudin's PMA, Thm. 5.3a)
(ii) Product Rule: $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$ (Rudin's PMA, Thm. 5.3b)
(iii) Product Rule (Vector Functions): For $f, g : [a, b] \to \mathbb{R}^n$ differentiable at x, then
$$(f \cdot g)'(x) := \frac{d}{dx}\langle f(x), g(x)\rangle_{\mathbb{R}^n} = (f' \cdot g)(x) + (f \cdot g')(x) = \langle f'(x), g(x)\rangle_{\mathbb{R}^n} + \langle f(x), g'(x)\rangle_{\mathbb{R}^n}$$
(iv) Quotient Rule: $\left(\dfrac{f}{g}\right)'(x) = \dfrac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}$ if $g(x) \ne 0$ (Rudin's PMA, Thm. 5.3c)
(v) Chain Rule: If f ∈ C[a, b] and differentiable at x, and g : I ⊆ range(f ) → R with differentiability
at f (x), then (Rudin’s PMA, Thm. 5.5)

(g ◦ f )′ (x) = g ′ (f (x)) · f ′ (x)

(vi) L'Hopital's Rule: Suppose $f, g : (a, b) \to \mathbb{R}$ are differentiable, with $g' \ne 0$ on (a, b). (Here, $a < b$ and $a, b \in [-\infty, +\infty]$.) Suppose (Rudin's PMA, Thm. 5.13)
$$\lim_{x\to a} \frac{f'(x)}{g'(x)} = A$$
Then if
$$\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0 \qquad \text{or} \qquad \lim_{x\to a} g(x) = \infty$$
then
$$\lim_{x\to a} \frac{f(x)}{g(x)} = A = \lim_{x\to a} \frac{f'(x)}{g'(x)}$$
(One may let $g(x) \to -\infty$, or $x \to b$.)


L'Hopital's rule can fail for C-valued functions; see $f(x) = e^{ix}$ on $[0, 2\pi]$, or the pair
$$f(x) = x \qquad g(x) = x + x^2 e^{i/x^2} \qquad \operatorname{dom}(f) = \operatorname{dom}(g) = (0, 1)$$

ˆ Monotonicity Relations: For f differentiable on (a, b), (Rudin’s PMA, Thm. 5.11)

◦ If f ′ ≥ 0 on (a, b), then f is monotone increasing


◦ If f ′ ≤ 0 on (a, b), then f is monotone decreasing
◦ If f ′ = 0 on (a, b), then f is constant

More Important Results:

ˆ If f (whether it is R, C, or Rn -valued) is differentiable at x, it is continuous at x. (Rudin’s PMA,


Thm. 5.2)
ˆ Interior Extremum (Fermat's) Theorem: If $f : [a, b] \to \mathbb{R}$ has a local maximum/minimum at $x \in (a, b)$, and $f'(x)$ exists, then $f'(x) = 0$. (Rudin's PMA, Thm. 5.8)
ˆ Cauchy's Generalized Mean Value Theorem: If $f, g \in C[a, b]$ are differentiable on (a, b), then $\exists x \in (a, b)$ for which
$$[f(b) - f(a)]g'(x) = [g(b) - g(a)]f'(x)$$
or, more familiarly but slightly less generally,
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(x)}{g'(x)}$$
The "usual" mean value theorem arises for $g(x) = x$: (Rudin's PMA, Thm. 5.10)
$$\frac{f(b) - f(a)}{b - a} = f'(x)$$

ˆ Mean Value Theorem (Vector Case): Take $f \in C([a, b], \mathbb{R}^n)$ differentiable on (a, b). Then $\exists x \in (a, b)$ such that (Rudin's PMA, Thm. 5.19)
$$\|f(b) - f(a)\|_2 \le (b - a)\|f'(x)\|_2 \qquad \text{i.e.} \qquad \frac{\|f(b) - f(a)\|_2}{b - a} \le \|f'(x)\|_2$$

ˆ Intermediate Value Property of Derivatives: For f : [a, b] → R differentiable, suppose that


f ′ (a) < λ < f ′ (b) (or f ′ (a) > λ > f ′ (b)). Then ∃x ∈ (a, b) with f ′ (x) = λ. (Rudin’s PMA, Thm. 5.12)
As a corollary, for each such f , f ′ does not have simple discontinuities on [a, b] (but can have discon-
tinuities of the second kind).
ˆ Taylor's Theorem: Let us have: (Rudin's PMA, Thm. 5.15)
◦ $f : [a, b] \to \mathbb{R}$
◦ $n \in \mathbb{Z}^+$
◦ $f^{(n-1)} \in C[a, b]$
◦ $f^{(n)}(t)$ exists $\forall t \in (a, b)$
◦ $\alpha, \beta \in [a, b]$, $\alpha \ne \beta$
◦ $P(t) := \sum_{k=0}^{n-1} \dfrac{f^{(k)}(\alpha)}{k!}(t - \alpha)^k$
Then $\exists x$ between $\alpha$ and $\beta$ such that
$$f(\beta) - P(\beta) = \frac{f^{(n)}(x)}{n!}(\beta - \alpha)^n$$
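A quick worked instance (mine, not from Rudin): take $f = \exp$, $\alpha = 0$, $\beta = 1$, $n = 3$, so $P(\beta) = 1 + 1 + \frac{1}{2} = \frac{5}{2}$. The theorem gives some $x \in (0, 1)$ with
$$e - \frac{5}{2} = \frac{e^x}{3!}$$
and since $1 < e^x < e$ there, this brackets $\frac{1}{6} < e - \frac{5}{2} < \frac{e}{6}$, i.e. $\frac{8}{3} < e < 3$.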

§14.6: (Baby Rudin, Chapter 6) Riemann(-Stieltjes) Integration

Assumptions:
Unless stated otherwise:

ˆ f : [a, b] → R is bounded

ˆ α : [a, b] → R is monotone-increasing

ˆ Various notations will exist for the Riemann(-Stieltjes) integral. Rudin uses all of these equivalently:
$$\int f \quad\ \int f\,d\alpha \quad\ \int f\,d\alpha(x) \quad\ \int f(x)\,d\alpha(x) \quad\ \int_a^b f \quad\ \int_a^b f\,d\alpha \quad\ \int_a^b f\,d\alpha(x) \quad\ \int_a^b f(x)\,d\alpha(x)$$

Of course in the ordinary Riemann case, he may use dx in lieu of dα or dα(x).

Basic Definitions:

ˆ Partitions, Upper/Lower Sums, & Related Terms: Given an interval [a, b] ⊆ R, we define a
n
partition of it to be a finite set of points P := {xi }i=0 where

a = x0 ≤ x1 ≤ · · · ≤ xn−1 ≤ xn = b

We let the class of all these partitions of [a, b] be $\mathcal{P}_{a,b}$. We define
$$\Delta x_i := x_i - x_{i-1} \qquad \Delta f_i := f(x_i) - f(x_{i-1})$$
$$M_i := \sup_{x\in[x_{i-1},x_i]} f(x) \qquad m_i := \inf_{x\in[x_{i-1},x_i]} f(x)$$
$$U(P, f) := \sum_{i=1}^{n} M_i \Delta x_i \quad \text{(the upper (Darboux) sum of f over P)}$$
$$L(P, f) := \sum_{i=1}^{n} m_i \Delta x_i \quad \text{(the lower (Darboux) sum of f over P)}$$
$$\overline{\int_a^b} f\,dx := \inf_{P\in\mathcal{P}_{a,b}} U(P, f) \quad \text{(the upper (Darboux) integral of f)}$$
$$\underline{\int_a^b} f\,dx := \sup_{P\in\mathcal{P}_{a,b}} L(P, f) \quad \text{(the lower (Darboux) integral of f)}$$
ˆ Riemann-Integrable: If $\overline{\int_a^b} f\,dx = \underline{\int_a^b} f\,dx$, then we say f is Riemann-integrable on [a, b], write $f \in \mathcal{R}$ (the class of all Riemann-integrable functions) or $f \in \mathcal{R}[a, b]$ (those Riemann-integrable on [a, b]), and write their common value as $\int_a^b f\,dx$.
ˆ Terms for Riemann-Stieltjes Integrals: Given $\alpha : [a, b] \to \mathbb{R}$ monotone increasing, a function $f : [a, b] \to \mathbb{R}$, and a partition $P := \{x_i\}_{i=0}^{n}$ of [a, b], we define $\Delta x_i, m_i, M_i$ as before, alongside $\Delta\alpha_i := \alpha(x_i) - \alpha(x_{i-1})$. We then define, analogously, emphasizing dependence on $\alpha$, the upper & lower (Darboux-Stieltjes) sums and integrals of f:
$$U(P, f, \alpha) := \sum_{i=1}^{n} M_i \Delta\alpha_i \qquad L(P, f, \alpha) := \sum_{i=1}^{n} m_i \Delta\alpha_i$$
$$\overline{\int_a^b} f\,d\alpha := \inf_{P\in\mathcal{P}_{a,b}} U(P, f, \alpha) \qquad \underline{\int_a^b} f\,d\alpha := \sup_{P\in\mathcal{P}_{a,b}} L(P, f, \alpha)$$
ˆ Riemann-Stieltjes Integrable: If $\overline{\int_a^b} f\,d\alpha = \underline{\int_a^b} f\,d\alpha$, we denote their common value by
$$\int_a^b f(x)\,d\alpha(x) \equiv \int_a^b f\,d\alpha$$
which is the Riemann-Stieltjes integral of f w.r.t. $\alpha$ over [a, b]. Rudin chooses to say that f is integrable w.r.t. $\alpha$ in the Riemann sense, denoted $f \in \mathcal{R}(\alpha)$.
Note that the ordinary Riemann integral arises for $\alpha \equiv \operatorname{id}$.
ˆ Refinement: Given a partition P of [a, b], a partition P ∗ is said to be a refinement of P if P ⊆ P ∗ ,
i.e. it has the same points and possibly more.
Given P1 , P2 partitions of [a, b], their common refinement is the partition P1 ∪ P2 .

Basic Results:

ˆ Inequalities for Upper/Lower Riemann Sums: If $f(x) \in [m, M]$ for all $x \in [a, b]$, then
$$m(b - a) \le L(P, f) \le U(P, f) \le M(b - a)$$
Generally, for any collection of points $t_i \in [x_{i-1}, x_i]$, we have
$$L(P, f, \alpha) \le \sum_i f(t_i)\Delta\alpha_i \le U(P, f, \alpha)$$

ˆ Inequalities by Refinements: Suppose P ′ is a refinement of P . Then (Rudin’s PMA, Thm. 6.4)

L(P, f, α) ≤ L(P ′ , f, α) ≤ U (P ′ , f, α) ≤ U (P, f, α)

That is, the lower sums only grow with new partition points, and the upper sums only shrink.
ˆ Inequality of the Upper/Lower Integrals: $\underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha$ (Rudin's PMA, Thm. 6.5)
Properties of the Integral:

ˆ Linearity in Integrand: Take $f, g \in \mathcal{R}(\alpha)$ on [a, b], and $\beta, \gamma \in \mathbb{R}$. Then (Rudin's PMA, Thm. 6.12)
$$\beta f + \gamma g \in \mathcal{R}(\alpha) \text{ on } [a, b] \qquad \int_a^b (\beta f + \gamma g)\,d\alpha = \beta\int_a^b f\,d\alpha + \gamma\int_a^b g\,d\alpha$$
ˆ Linearity in Weight Function: Let $f \in \mathcal{R}(\alpha_1) \cap \mathcal{R}(\alpha_2)$ on [a, b] and $c_1, c_2 \in \mathbb{R}$; then $f \in \mathcal{R}(c_1\alpha_1 + c_2\alpha_2)$ on [a, b], with (Rudin's PMA, Thm. 6.12)
$$\int_a^b f\,d(c_1\alpha_1 + c_2\alpha_2) = c_1\int_a^b f\,d\alpha_1 + c_2\int_a^b f\,d\alpha_2$$
ˆ Monotonicity: If $f \le g$ for $f, g \in \mathcal{R}(\alpha)$ on [a, b], then (Rudin's PMA, Thm. 6.12)
$$\int_a^b f\,d\alpha \le \int_a^b g\,d\alpha$$
ˆ Additivity: If $f \in \mathcal{R}(\alpha)$ on [a, b], and $c \in (a, b)$, then $f \in \mathcal{R}(\alpha)$ on both [a, c] and [c, b], with (Rudin's PMA, Thm. 6.12)
$$\int_a^c f\,d\alpha + \int_c^b f\,d\alpha = \int_a^b f\,d\alpha$$
ˆ Boundedness: For $f \in \mathcal{R}(\alpha)$ with $|f(x)| \le M$ on [a, b], then (Rudin's PMA, Thm. 6.12)
$$\left| \int_a^b f\,d\alpha \right| \le M\big(\alpha(b) - \alpha(a)\big)$$
ˆ Triangle Inequality: For $f \in \mathcal{R}(\alpha)$ on [a, b], we have $|f| \in \mathcal{R}(\alpha)$ with (Rudin's PMA, Thm. 6.13)
$$\left| \int_a^b f\,d\alpha \right| \le \int_a^b |f|\,d\alpha$$
ˆ Product is Integrable: If $f, g \in \mathcal{R}(\alpha)$ on [a, b], then $fg \in \mathcal{R}(\alpha)$ too. (Rudin's PMA, Thm. 6.13)
ˆ If $\alpha$ is monotone increasing with derivative $\alpha' \in \mathcal{R}[a, b]$, and $f : [a, b] \to \mathbb{R}$ is bounded, then: (Rudin's PMA, Thm. 6.17)
◦ $f \in \mathcal{R}(\alpha) \iff f\alpha' \in \mathcal{R}$
◦ $\int_a^b f\,d\alpha = \int_a^b f(x)\alpha'(x)\,dx$

ˆ Change of Variable: Suppose: (Rudin’s PMA, Thm. 6.19)

◦ φ : [A, B] → [a, b] is strictly increasing, continuous, and a bijection


◦ α : [a, b] → R is monotone increasing
◦ f ∈ R(α) on [a, b]
◦ β : [A, B] → R with β = α ◦ φ
◦ g : [A, B] → R with g = f ◦ φ

Then $g \in \mathcal{R}(\beta)$ with
$$\int_A^B g\,d\beta = \int_a^b f\,d\alpha$$
Without defining $\beta, g$: $f \circ \varphi \in \mathcal{R}(\alpha \circ \varphi)$ with
$$\int_A^B (f \circ \varphi)\,d(\alpha \circ \varphi) = \int_a^b f\,d\alpha$$
Hence in particular (noting $\varphi(A) = a$ and $\varphi(B) = b$)
$$\int_a^b f(x)\,dx = \int_A^B f(\varphi(y))\varphi'(y)\,dy$$
ˆ Integration by Parts: Let $F, G : [a, b] \to \mathbb{R}$ be differentiable with (Rudin's PMA, Thm. 6.22)
$$F' = f \qquad G' = g \qquad f, g \in \mathcal{R}$$
Then
$$\int_a^b F(x)g(x)\,dx = F(b)G(b) - F(a)G(a) - \int_a^b f(x)G(x)\,dx$$
Stated in a slightly less general way,
$$\int_a^b f(x)g'(x)\,dx = \Big[ f(x)g(x) \Big]_a^b - \int_a^b f'(x)g(x)\,dx$$
(This follows immediately from the fundamental theorem applied to $H(x) := F(x)G(x)$.)

Criteria for Integrability:

ˆ Cauchy Criterion for Integrability: $f \in \mathcal{R}(\alpha)$ on [a, b] if and only if (Rudin's PMA, Thm. 6.6)
$$(\forall \varepsilon > 0)(\exists P \text{ a partition of } [a, b])\Big( U(P, f, \alpha) - L(P, f, \alpha) < \varepsilon \Big)$$
Moreover: (Rudin's PMA, Thm. 6.7)
◦ If $U(P, f, \alpha) - L(P, f, \alpha) < \varepsilon$ holds for just some P and some $\varepsilon$, then it holds for that same $\varepsilon$ and any refinement of P.
◦ If $U(P, f, \alpha) - L(P, f, \alpha) < \varepsilon$ holds for a given $P := \{x_i\}_{i=0}^{n}$ with $s_i, t_i \in [x_{i-1}, x_i]$, then
$$\sum_{i=1}^{n} |f(s_i) - f(t_i)|\,\Delta\alpha_i < \varepsilon$$
◦ Moreover, for the previous item, if $f \in \mathcal{R}(\alpha)$, then
$$\left| \sum_{i=1}^{n} f(t_i)\Delta\alpha_i - \int_a^b f\,d\alpha \right| < \varepsilon$$

ˆ f ∈ C[a, b] =⇒ f ∈ R(α) on [a, b] (Rudin’s PMA, Thm. 6.8)


ˆ For f monotone on [a, b], and α ∈ C[a, b], then f ∈ R(α). (Rudin’s PMA, Thm. 6.9)
ˆ If f is bounded on [a, b] with finitely many discontinuities, with α continuous at each point of discon-
tinuity of f , then f ∈ R(α). (Rudin’s PMA, Thm.
6.10)
ˆ If f ∈ R(α) with m ≤ f (x) ≤ M , and φ ∈ C[m, M ], then φ ◦ f ∈ R(α). (Rudin’s PMA, Thm. 6.11)

The Fundamental Theorem:

ˆ Fundamental Theorem, Part 1: Let $f \in \mathcal{R}[a, b]$, and define (Rudin's PMA, Thm. 6.20)
$$F : [a, b] \to \mathbb{R} \text{ by the rule } F(x) := \int_a^x f(t)\,dt$$
Then $F \in C[a, b]$ and, if f is continuous at $x_0$, then F is differentiable at $x_0$ with $F'(x_0) = f(x_0)$.
ˆ Fundamental Theorem, Part 2: For $f \in \mathcal{R}[a, b]$, if we can find F differentiable with $F' = f$, then
$$\int_a^b f(x)\,dx = F(b) - F(a)$$

Handling Vector-Valued Functions:

ˆ Riemann(-Stieltjes) Integral (Vector-Valued Functions): Let $f_1, \cdots, f_n : [a, b] \to \mathbb{R}$ and then $f(x) := (f_1(x), \cdots, f_n(x))$ be a vector-valued function $[a, b] \to \mathbb{R}^n$.
Given $\alpha : [a, b] \to \mathbb{R}$ monotone, we then define $f \in \mathcal{R}(\alpha)$ to mean $f_j \in \mathcal{R}(\alpha)$ for all j, i.e. the function is integrable componentwise, and define
$$\int_a^b f\,d\alpha := \left[ \int_a^b f_1\,d\alpha\,,\ \cdots,\ \int_a^b f_n\,d\alpha \right]$$
i.e. the integral is defined componentwise. Many of the same results hold for such integrals as they did in one dimension.

Application: Rectifiable Curves:
Some definitional groundwork, first:

ˆ Curves, Arcs: Given $\gamma : [a, b] \to \mathbb{R}^n$ continuous, we say $\gamma$ is a curve in $\mathbb{R}^n$ on [a, b]. If $\gamma$ is injective, we may further say it is an arc. If $\gamma(a) = \gamma(b)$ (i.e. it starts and stops at the same place), we will say $\gamma$ is a closed curve.
Note that a curve is defined to be a function: the set of points on it (its range) may not be unique to that function.
ˆ Length of a Curve: Given a partition $\{x_i\}_{i=0}^{n}$ of [a, b], and a curve $\gamma$ on [a, b], let
$$\Lambda(P, \gamma) := \sum_{i=1}^{n} |\gamma(x_i) - \gamma(x_{i-1})|$$
This is the length of the polygonal path in $\mathbb{R}^n$ with vertices $\gamma(x_0), \cdots, \gamma(x_n)$. One can then define the length of $\gamma$ by
$$\Lambda(\gamma) := \sup_{P\in\mathcal{P}_{a,b}} \Lambda(P, \gamma)$$
ˆ Rectifiable Curves: If $\Lambda(\gamma) < \infty$, then $\gamma$ is said to be rectifiable.
Some results:
ˆ If a curve $\gamma$ is continuously differentiable, i.e. $\gamma \in C^1[a, b]$, i.e. $\gamma, \gamma' \in C[a, b]$, then $\gamma$ is rectifiable and
$$\Lambda(\gamma) = \int_a^b |\gamma'(t)|\,dt$$
which mirrors the ordinary formula from calculus. (Rudin's PMA, Thm. 6.27)

§14.7: (Baby Rudin, Chapter 7) Sequences & Series of Functions

Given metric spaces X, Y and a sequence of functions $\{f_n : E \subseteq X \to Y\}_{n\in\mathbb{N}}$, we can define
$$f_\ell(x) := \lim_{n\to\infty} f_n(x) \qquad f_s(x) := \sum_{n=1}^{\infty} f_n(x)$$
to be the limit of the sequence and the sum of the series formed by the sequence, defined on some subset of E. These are the central objects of study in this section.

Definitions:

ˆ Rudin defines C(X) by

C(X) := {f : X → C | f is continuous and bounded}

(note that we usually don’t force boundedness) and

∥f ∥ := ∥f ∥C(X) := sup |f (x)|


x∈X

which turns C(X) into a complete, normed, C-vector space.

ˆ Pointwise-Bounded: For a sequence of functions {fn }n∈N on E, we say {fn }n∈N is pointwise-
bounded if
∃φ : E → R such that ∀x ∈ E and ∀n ∈ N we have |fn (x)| < φ(x)

ˆ Uniformly-Bounded: For a sequence of functions {fn }n∈N on E, we say {fn }n∈N is uniformly-
bounded if
∃M ∈ R such that ∀x ∈ E and ∀n ∈ N we have |fn (x)| < M

ˆ Equicontinuous Family: A collection F of functions f : E → C (E a subset of (X, d) a metric space)


is said to be equicontinuous on E if

∀ε > 0, ∃δ > 0 such that ∀x, y ∈ E and f ∈ F we have d(x, y) < δ =⇒ |f (x) − f (y)| < ε

ˆ Algebra of Functions: A collection F of functions E → C is an (complex) algebra if it is closed


under addition, multiplication, and scalar multiplication.
ˆ Uniformly Closed: We say an algebra F is uniformly closed if closed under uniformly-convergent
limits. The union of an algebra with all possible uniform limits is its uniform closure.

ˆ Point-Separating: A family F of functions E → C is said to separate points on E if, given distinct


x, y ∈ E (x ̸= y), then ∃f ∈ F with f (x) ̸= f (y).
ˆ Non-Vanishing: A family F of functions E → C is said to vanish at no point of E if, given x ∈ E,
∃f ∈ F with f (x) ̸= 0.

ˆ Self-Adjoint Algebra: An algebra F is said to be self-adjoint if closed under complex conjugation.

Modes of Convergence:
We consider a sequence of functions {fn : E ⊆ X → Y }n∈N and a prospective limiting function f as
needed.

ˆ Pointwise Convergence: The most basic kind of convergence, and generally what is implied when
no other qualifier is given. We say {fn }n∈N converges pointwise to f on E if

∀x ∈ E, lim fn (x) = f (x)


n→∞

or, more formally,
$$(\forall x \in E)(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall n \ge N)\Big( |f_n(x) - f(x)| < \varepsilon \Big)$$

ˆ Uniform Convergence: Stronger than pointwise convergence, we say fn converges uniformly to


f if  
(∀ε > 0)(∃N ∈ N)(∀x ∈ E)(∀n ≥ N ) |fn (x) − f (x)| < ε

i.e. the N in the formal definition is sufficient for all x. So while in pointwise convergence N may be
a function of ε and x, in uniform convergence N is only a function of ε.
One may likewise develop a criterion for the uniform convergence of series, using the sequence of partial
sums as the sequence of functions to converge uniformly.
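A standard illustrating example (mine, not tied to a particular theorem here): $f_n(x) = x^n$ on $[0, 1]$ converges pointwise to
$$f(x) = \begin{cases} 0, & 0 \le x < 1 \\ 1, & x = 1 \end{cases}$$
but not uniformly, since $\sup_{x\in[0,1]} |f_n(x) - f(x)| = \sup_{x\in[0,1)} x^n = 1$ for every n; no single N serves all x for, say, $\varepsilon = \frac{1}{2}$.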

Some Basic Results:

ˆ If {fn }n∈N , {gn }n∈N converge uniformly on E, then:

◦ {fn + gn }n∈N converges uniformly on E (Rudin’s PMA, Prob. 7.2)


◦ If {fn }n∈N , {gn }n∈N are bounded, then {fn gn }n∈N converges uniformly on E (Rudin’s PMA,
Prob. 7.3)

ˆ If fn → f uniformly on E, and xn → x (xn , x ∈ E), then (Rudin’s PMA, Prob. 7.9)

lim fn (xn ) = f (x)


n→∞

ˆ If $\{f_n\}_{n\in\mathbb{N}}$ has uniformly-bounded partial sums, and $g_n \to 0$ uniformly with $g_n \ge g_{n+1}$, then $\sum f_n g_n$ converges uniformly.

Switching of Limits & Properties Preserved in Limits:
Is the limit of continuous/integrable/differentiable/etc. functions necessarily also continuous/integrable/d-
ifferentiable/etc.?
In some respect, answering this amounts to “can we interchange the order of limits”, and the answer is
no.
Some notable examples/counterexamples:

ˆ Can we interchange limits? (Not necessarily.):
$$\lim_{m\to\infty}\left(\lim_{n\to\infty} \frac{m}{n + m}\right) = \lim_{m\to\infty}(0) = 0 \ne 1 = \lim_{n\to\infty}(1) = \lim_{n\to\infty}\left(\lim_{m\to\infty} \frac{m}{n + m}\right)$$
ˆ Is an infinite sum of continuous functions also continuous? (Not necessarily.):
$$\sum_{n=0}^{\infty} \frac{x^2}{(1 + x^2)^n} = \begin{cases} 0, & x = 0 \\ 1 + x^2, & x \ne 0 \end{cases} \qquad \text{by geometric series}$$

ˆ Is the limit of continuous functions continuous? (Not necessarily.)
$$\lim_{m\to\infty}\lim_{n\to\infty} \big[\cos(m!\,\pi x)\big]^{2n} = 1_{\mathbb{Q}}(x)$$
ˆ Is the limit of (Riemann-)integrable functions integrable? (Not necessarily.) See previous.
ˆ Do limits preserve derivatives? (Not necessarily.) Observe that
$$f_n(x) := \frac{\sin(nx)}{\sqrt{n}} \implies f_n'(x) = \sqrt{n}\cos(nx)$$
We see
$$f(x) := \lim_{n\to\infty} f_n(x) = 0 \implies f'(x) = 0$$
However, $\{f_n'\}$ does not converge to $f' \equiv 0$; e.g. $f_n'(0) = \sqrt{n} \xrightarrow{n\to\infty} \infty$.
ˆ Can we interchange limits and integrals? (Not necessarily.) Observe that
$$\int_0^1 \left(\lim_{n\to\infty} n^2 x(1 - x^2)^n\right) dx = \int_0^1 0\,dx = 0$$
since the integrand approaches zero on nonzero values (and is zero at zero). However,
$$\lim_{n\to\infty} \int_0^1 n^2 x(1 - x^2)^n\,dx = \lim_{n\to\infty} \frac{n^2}{2n + 2} = +\infty$$
Replacing $n^2$ with n, one still achieves an inequality, but with the latter limit evaluating to 1/2.

Uniform Continuity – Implications & Equivalences:

ˆ Cauchy Criterion for Uniform Convergence: Consider the sequence $\{f_n : E \to Y\}_{n\in\mathbb{N}}$. Then $\{f_n\}_{n\in\mathbb{N}}$ converges uniformly on E if and only if (Rudin's PMA, Thm. 7.8)
$$(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall n, m \ge N)(\forall x \in E)\Big( |f_n(x) - f_m(x)| < \varepsilon \Big)$$

ˆ Uniformly Convergent iff Convergent in L∞ : If one defines (Rudin’s PMA, Thm. 7.9)

lim fn (x) =: f (x) for each x ∈ E


n→∞

and
Mn := sup |fn (x) − f (x)| =: ∥fn − f ∥L∞ (E)
x∈E
n→∞
then fn → f uniformly iff Mn −−−−→ 0.
ˆ Weierstrass M-Test for Series: Let $\{f_n : E \to \mathbb{C}\}_{n\in\mathbb{N}}$ satisfy, for some sequence $\{M_n\}_{n\in\mathbb{N}} \subseteq \mathbb{R}$, (Rudin's PMA, Thm. 7.10)
$$|f_n(x)| \le M_n \text{ for all } x \in E, \text{ for any given } n \in \mathbb{N}$$
Then $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on E, provided $\sum_{n=1}^{\infty} M_n$ converges.
The converse need not hold.
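A quick worked application (mine): on $E = \mathbb{R}$, the series $\sum_{n=1}^{\infty} \frac{\sin(nx)}{n^2}$ has $\left| \frac{\sin(nx)}{n^2} \right| \le \frac{1}{n^2} =: M_n$ with $\sum M_n < \infty$, so it converges uniformly on all of $\mathbb{R}$ (and its sum is continuous, by Thm. 7.12 below).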

Uniform Convergence – Consequences for Continuity:
We presume {fn }n∈N is a sequence of functions fn : E → Y for (X, d), (Y, ρ) metric spaces unless specified
otherwise. f will be a prospective limit likewise.
ˆ Interchange of Limits is Valid: Let $f_n \xrightarrow{n\to\infty} f$ uniformly, with $x \in E'$ and (Rudin's PMA, Thm. 7.11)
$$\lim_{t\to x} f_n(t) = A_n \text{ for each } n \in \mathbb{N}$$
Then $\{A_n\}_{n\in\mathbb{N}}$ converges with
$$\lim_{t\to x} f(t) = \lim_{n\to\infty} A_n$$
More explicitly,
$$\lim_{t\to x}\left(\lim_{n\to\infty} f_n(t)\right) = \lim_{n\to\infty}\left(\lim_{t\to x} f_n(t)\right)$$

ˆ Limit is Continuous: If {fn }n∈N ⊆ C(E) and fn → f uniformly, then f ∈ C(E). (Rudin’s PMA,
Thm. 7.12)
The converse need not be true (the convergence of continuous functions to another continuous function
may not be uniform).
However, if the domain is compact and fn ≥ fn+1 , with a pointwise limit f , with fn , f all continuous,
then the convergence is uniform. (Rudin’s PMA, Thm. 7.13)

Uniform Convergence – Consequences for Integration, Differentiations, & Series:

ˆ Can Interchange Limit & Riemann-Stieltjes Integral: For $\alpha$ monotone increasing on [a, b] with $f_n \in \mathcal{R}(\alpha)$ on [a, b] and $f_n \to f$ uniformly on [a, b], we have that $f \in \mathcal{R}(\alpha)$ on [a, b] and (Rudin's PMA, Thm. 7.16)
$$\int_a^b f\,d\alpha = \int_a^b \lim_{n\to\infty} f_n\,d\alpha = \lim_{n\to\infty} \int_a^b f_n\,d\alpha$$
ˆ Can Interchange Integral & Sum: For $\alpha$ monotone increasing on [a, b], with $f_n \in \mathcal{R}(\alpha)$ on [a, b] and
$$f(x) = \sum_{n=1}^{\infty} f_n(x), \text{ a uniformly-convergent series on } [a, b]$$
then
$$\int_a^b f\,d\alpha = \int_a^b \sum_{n=1}^{\infty} f_n\,d\alpha = \sum_{n=1}^{\infty} \int_a^b f_n\,d\alpha$$
allowing for termwise integration.


ˆ Stricter Conditions on Derivatives: Suppose the following: (Rudin’s PMA, Thm. 7.17)

(i) {fn : [a, b] → C}n∈N are differentiable


(ii) {fn (x0 )}n∈N converges for some x0 ∈ [a, b]
(iii) {fn′ }n∈N converges uniformly on [a, b]

Then the following hold:

(i) {fn } converges uniformly, with a limit we’ll call f


(ii) f ′ (x) = lim fn′ (x) for all x ∈ [a, b]
n→∞

§15: Items from Measure Theory

§15.1: (Lebesgue/Jordan) Outer Measure

Recall that an interval $I \subseteq \mathbb{R}^n$ is a Cartesian product of one-dimensional intervals; we define it and its volume as so:
$$I := \prod_{i=1}^{n} [a_i, b_i] \implies v(I) := \prod_{i=1}^{n} (b_i - a_i)$$
Let $S := \{I_k\}_{k=1}^{\infty}$ be an at-most-countable collection of intervals covering a set $E \subseteq \mathbb{R}^n$. Let $\mathcal{S}$ be the class of all such covers of E. Then we define the outer (or external, or Jordan) Lebesgue measure (denoted $\mu_e(E)$ or $|E|_e$) by
$$\sigma(S) := \sum_{I_k \in S} v(I_k) \qquad |E|_e := \inf_{S\in\mathcal{S}} \sigma(S)$$

Some trivialities of note:

ˆ Clearly, $0 \le |E|_e \le +\infty$.
ˆ $|E|_e = \infty$ iff $\sigma(S) = \infty$ for each $S \in \mathcal{S}$. Consequently, $|E|_e < \infty$ if some $S \in \mathcal{S}$ has $\sigma(S) < \infty$.
ˆ For intervals I, $|I|_e \equiv v(I)$ and $|\partial I|_e = 0$.
ˆ At-most-countable sets E have $|E|_e = 0$. (Sets of measure zero may be called "null sets" later.) The converse is not true, e.g. the Cantor set.
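A sketch of the standard covering argument (mine) for the at-most-countable case: if $E = \{x_k\}_{k\in\mathbb{N}}$, pick an interval $I_k$ centered at $x_k$ with $v(I_k) \le \varepsilon / 2^k$. Then
$$|E|_e \le \sum_{k=1}^{\infty} v(I_k) \le \varepsilon \sum_{k=1}^{\infty} 2^{-k} = \varepsilon$$
and letting $\varepsilon \downarrow 0$ gives $|E|_e = 0$.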

More interesting properties:

ˆ Monotonicity: $A \subseteq B \implies |A|_e \le |B|_e$ (Rudin's PMA, Thm. 3.4)
ˆ Countable Subadditivity: $\left| \bigcup_{k\in\mathbb{N}} E_k \right|_e \le \sum_{k\in\mathbb{N}} |E_k|_e$ (Rudin's PMA, Thm. 3.12)
ˆ Characterization via Open Sets: $\forall \varepsilon > 0$, $\exists G$ open with $E \subseteq G$ such that (Rudin's PMA, Thm. 3.6)
$$|E|_e \le |G|_e \le |E|_e + \varepsilon \qquad \text{which implies we can say} \qquad |E|_e = \inf_{\text{open } G \supseteq E} |G|_e$$

ˆ Characterization via Gδ Sets: For each E ⊆ Rn there is a Gδ set H where E ⊆ H and |E|e = |H|e .
(Rudin’s PMA, Thm. 3.8)
ˆ Measure of Ascending Sequence: If Ek ↗ E then lim |Ek |e = |E|e
k→∞

ˆ If d(A, B) > 0 then |A ∪ B|e = |A|e + |B|e

§15.2: (Lebesgue) Measure & Measurability of Sets

We say that E ⊆ Rn is (Lebesgue) measurable if ∀ε > 0, ∃G a set such that

ˆ E⊆G
ˆ G is open
ˆ |G − E|e < ε

We say the (Lebesgue) measure of E (denoted |E| or µ(E)) is |E| := |E|e .


NOTE: |G − E|e < ε =⇒ |G|e ≤ |E|e + ε, but the converse need not hold. The implication arises as

G = E ∪ (G − E) =⇒ |G|e ≤ |E|e + |G − E|e < |E|e + ε

A few classes of measurable sets of note. Observe that these ensure the measurable sets of $\mathbb{R}^n$ form a σ-algebra.

ˆ All open or closed sets (Rudin’s PMA, Thm. 3.14)


ˆ Sets of outer measure zero
ˆ The countable union of measurable sets ( =⇒ countable subadditivity) (Rudin’s PMA, Thm. 3.12)
ˆ The countable intersection of measurable sets (Rudin’s PMA, Thm. 3.18)
ˆ Complements of measurable sets (Rudin’s PMA, Thm. 3.17)
ˆ Set differences of measurable sets (Rudin’s PMA, Thm. 3.19)
ˆ The limsup & liminf of a sequence of measurable sets

Since the set of measurable sets is closed under countable union and complement, and clearly ∅, Rn are
measurable, then the class of measurable sets in Rn is a σ-algebra.
The smallest σ-algebra containing the open sets of Rn is called the Borel σ-algebra, B or B(Rn ). It
includes, for instance, Fσ , Gδ , and their extensions Fσδ and Gδσ , among others. The elements of B are called
Borel sets. (Consequently, all Borel sets are measurable.)
Some properties of the Lebesgue measure are below. Let M := M(Rn ) denote the measurable sets in
n
R for notational pleasantry.

ˆ Many properties from the Lebesgue outer measure carry over, due to definitions. The below may not
carry to the outer measure.
ˆ Difference of Measure Zero: |A△B| = 0 =⇒ |A| = |B|.
◦ In particular, if A ⊆ B, then A△B = B − A since A − B = ∅. Then |B − A| = 0 =⇒ |B| = |A|.
ˆ Inclusion-Exclusion: |A ∪ B| = |A| + |B| − |A ∩ B| (Rudin’s PMA, Prob. 3.10)
ˆ Union with Null Set: If E ∈ M and Z is a null set, |E| = |E ∪ Z|.

ˆ Countable Additivity if Disjoint: Suppose $\{E_k\}_{k\in\mathbb{N}} \subseteq \mathcal{M}$ are pairwise disjoint. Then $\left| \bigcup_{k\in\mathbb{N}} E_k \right| = \sum_{k\in\mathbb{N}} |E_k|$. (Rudin's PMA, Thm. 3.23)
ˆ Measure of Difference: Let A, B ∈ M, A ⊆ B, and |A| < ∞. Then |B − A| = |B| − |A| (Rudin’s
PMA, Cor. 3.25)

ˆ Measure of Limit of Sequence: Let $\{E_k\}_{k\in\mathbb{N}} \subseteq \mathcal{M}$. Then: (Rudin's PMA, Thm. 3.26; 3.27)
◦ $E_k \nearrow E \implies \lim_{k\to\infty} |E_k| = |E|$
◦ $E_k \searrow E$ and $|E_k| < \infty$ for some k $\implies \lim_{k\to\infty} |E_k| = |E|$
◦ Moreover, if $\sum_k |E_k|_e < \infty$, then the limsup and liminf are null sets. (Rudin's PMA, Prob. 3.9)

ˆ Corollary of Carathéodory: If A ⊇ E ∈ M and |E| < ∞, then |A − E|e = |A|e − |E|

ˆ For E ⊆ Rn , there is H a Gδ set such that E ⊆ H, and for any M ∈ M, |E ∩ M |e = |H ∩ M |e .


(Rudin’s PMA, Thm. 3.32)
ˆ If $T : \mathbb{R}^n \to \mathbb{R}^n$ is Lipschitz, e.g. a linear transformation, then T maps measurable sets into measurable sets. Moreover, if T is linear and $E \in \mathcal{M}$, then $|TE| = |\det(T)| \cdot |E|$ (if we let $0 \cdot \infty = 0$).
ˆ Translations of measurable sets are measurable, with the same measure.

ˆ The product of measurable sets is measurable, with |A × B| = |A||B| in the appropriate dimensions of
Rn . We assume 0 · ∞ = 0.
ˆ Vitali’s Theorem: There exist nonmeasurable sets; in particular, sets of positive measure always
contain one. (Rudin’s PMA, Thm. 3.38, 3.39)

Some equivalent characterizations of when a set is measurable: E is measurable iff ...

ˆ Via Closed Set (Inner Regularity): ∀ε > 0, ∃F ⊆ E closed with |E − F |e < ε. (Rudin’s PMA,
Lem. 3.22)

ˆ Off from Gδ by Null Set: E = H − Z for H a Gδ set and Z a null set (Rudin’s PMA, Thm. 3.28)

ˆ Made by Fσ and Null Set: E = H ∪ Z for H an Fσ set and Z a null set (Rudin’s PMA, Thm. 3.28)

ˆ Carathéodory’s Characterization: For any set A, |A|e = |A ∩ E|e + |A − E|e (Rudin’s PMA,
Thm. 3.30)

ˆ Given |E|e < ∞, then E ∈ M iff ∀ε > 0, E = (S ∪ A) − B, for S a finite union of nonoverlapping
intervals, and |A|e , |B|e < ε

A note on a misconception: sets of positive measure need not contain intervals in any sense. The
irrationals are an example.

§15.3: (Lebesgue) Measurable Functions

Throughout, if not stated otherwise, let $f : E \subseteq \mathbb{R}^n \to \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$.
We say f is (Lebesgue) measurable if, $\forall a \in \mathbb{R}$, $\{x \in E \mid f(x) > a\}$ is measurable.
ˆ Measure and Integral defines $\{f > a\} := \{x \in E \mid f(x) > a\}$.
ˆ Note that $\{f > a\} = f^{-1}(a, \infty]$.
We will say f is Borel measurable if $f^{-1}(a, \infty] \in \mathcal{B}(\mathbb{R}^n)$ for each $a \in \mathbb{R}$.
An earlier section may be useful in contending with various set-theoretic identities involving preimages.
We can say f is measurable iff any of these equivalent conditions hold: (Rudin's PMA, Thm. 4.1)

ˆ {f ≥ a} ≡ f −1 [a, ∞] is measurable ∀a ∈ R
ˆ {f < a} ≡ f −1 [−∞, a) is measurable ∀a ∈ R
ˆ {f ≤ a} ≡ f −1 [−∞, a] is measurable ∀a ∈ R
ˆ The above are enough if holding $\forall a \in D$, with $D \subseteq \mathbb{R}$ dense (Rudin's PMA, Thm. 4.4)

If f is measurable, the following sets are measurable (assume a, b ∈ R and a < b): (Rudin’s PMA, Cor. 4.2)

ˆ {f > −∞} ≡ f −1 (R ∪ {∞})


ˆ {f < ∞} ≡ f −1 (R ∪ {−∞})
ˆ {f = a} ≡ f −1 ({a}) for any a ∈ R ∪ {±∞}
ˆ {a ≤ f ≤ b} := f −1 [a, b] (or the half-open/open versions) – note that {a < f ≤ b} = {f > a} − {f > b}
ˆ f −1 (G) for any open set G (Rudin’s PMA, Thm. 4.3)
◦ Conversely, if f −1 (G) is open for all open G as well as at least one of {f = ∞}, {f = −∞}, then
f is measurable
◦ Hence f with image(f ) ⊆ R is measurable iff the preimage of open sets is measurable
ˆ If f, g are measurable, then {f > g} is measurable (Rudin’s PMA, Thm. 4.7)

If f, g are measurable, so are the following, if we assume 0 · ±∞ = 0:

ˆ f + λ for λ ∈ R, i.e. the function (f + λ)(x) := f (x) + λ (Rudin’s PMA, Thm. 4.8)
ˆ λf for λ ∈ R: this function is defined by (λf )(x) := λ · f (x) (Rudin’s PMA, Thm. 4.8)
ˆ f + g (hence, ∀α, β ∈ R, αf + βg is measurable) (Rudin’s PMA, Thm. 4.9)
◦ Extends to any finite linear combination
◦ Ensures that the set of measurable functions on E is a vector space
ˆ f ·g (Rudin’s PMA, Thm. 4.10)
ˆ If g ̸= 0 everywhere, then f /g is too (Rudin’s PMA, Thm. 4.10)
◦ Measure and Integral goes for g ̸= 0 a.e., but you have to change the values on the null set, or
restrict f /g to the non-null set

ˆ Take $\{f_k\}_{k=1}^{\infty}$ measurable.
◦ Then $f_s(x) := \sup_{k\in\mathbb{N}} f_k(x)$ and $f_i(x) := \inf_{k\in\mathbb{N}} f_k(x)$ are too (Rudin's PMA, Thm. 4.11)
◦ Also, $\limsup_{k\to\infty} f_k$ and $\liminf_{k\to\infty} f_k$ are measurable. (If equal, this gives it for the limit.) (Rudin's PMA, Thm. 4.12)
We define $\limsup_{k\to\infty} f_k(x) := \inf_{j\in\mathbb{N}}\left( \sup_{k\ge j} f_k(x) \right)$ and $\liminf_{k\to\infty} f_k(x) := \sup_{j\in\mathbb{N}}\left( \inf_{k\ge j} f_k(x) \right)$:
the limsup is the infimum of eventual suprema; the liminf is the supremum of eventual infima.
Other results:

ˆ If D ⊆ Rn is dense, f is measurable iff {f > a} is measurable ∀a ∈ D (Rudin’s PMA, Thm. 4.4)


ˆ Continuous functions are measurable
ˆ If f = g a.e., and f is measurable, then so is g and |{g > a}| = |{f > a}| always. (Rudin’s PMA,
Thm. 4.5)
ˆ Take φ ∈ C(R, R) and f finite a.e. in E ⊆ Rn . Then φ ◦ f is measurable if f is. (Rudin’s PMA, Thm.
4.6)
◦ Commonly used with $\varphi(t) = |t|, |t|^p\ (p > 0)$, etc.
◦ Another is the example of $f^+(x) := \max\{f(x), 0\}$ and $f^-(x) := -\min\{f(x), 0\}$

The characteristic function or indicator function of a set is important:
$$\chi_A(x) = 1_A(x) := \begin{cases} 1, & x \in A \\ 0, & x \notin A \end{cases}$$
We say s is a simple function if it is a linear combination of characteristic functions; we can limit our consideration to $E_1, \cdots, E_N$ pairwise disjoint.
$$s(x) := \sum_{k=1}^{N} a_k \cdot 1_{E_k}(x) = \begin{cases} a_1, & x \in E_1 \\ a_2, & x \in E_2 \\ \ \vdots & \ \vdots \\ a_N, & x \in E_N \end{cases}$$

Note that s is measurable iff each Ei is measurable. Some notes: (Rudin’s PMA, Thm. 4.13)

ˆ Any function f is the (pointwise) limit of a sequence of simple functions {sk }k∈N
ˆ If f ≥ 0, said sequence may be chosen to be increasing, i.e. sk ≤ sk+1 on each k
ˆ If the limiting function is measurable, we may also choose the sk to be measurable

Some more important results on measurable functions:

ˆ Egorov’s Theorem: Let E be of finite measure and {fk : E → R}k∈N measurable, converging pointwise-
a.e. to f . Then ∀ε > 0, ∃F ⊆ E closed with |E − F | < ε and fk → f uniformly on F . (Rudin’s PMA,
Thm. 4.17)
ˆ Lusin’s Theorem: Let f be defined and finite on a measurable set E. Then f is measurable iff it
has property C. (Rudin’s PMA, Thm. 4.20)
◦ “Every measurable function is nearly continuous.”
◦ Define property C as so: f has it on E (E measurable) if, ∀ε > 0, ∃F ⊆ E closed where |E − F | < ε
and f |F is continuous.

◦ Simple functions (which are measurable) have this property
ˆ Alternate Form to Lusin’s Theorem: Let f : E → R be measurable and ε > 0. Then ∃ closed
F ⊆ E with |E − F | < ε, and ∃ g : Rn → R continuous with f ≡ g|F .
ˆ Frechet’s Theorem: Take E ⊆ Rn measurable and f : E → R finite-a.e. Then f is measurable on E
iff ∃{fk }k∈N ⊆ C(Rn , R) with fk → f pointwise-a.e. on E. (Functions are measurable iff they are the
pointwise-a.e. limit of continuous functions.)
ˆ If f : [a, b] ⊆ R → R is measurable, then ∃{pk }k∈N a sequence of polynomials such that pk → f
pointwise-a.e. on [a, b]. (May generalize to compact sets? Easily generalizes to the multidimensional
case with domain a box.)

§15.4: Convergence in Measure

We focus on the Lebesgue measure here. Some results depend on σ-finiteness, which is true for the Lebesgue measure.
Consider a sequence $\{f_k : E \subseteq \mathbb{R}^n \to \overline{\mathbb{R}}\}_{k\in\mathbb{N}}$ and $f : E \to \overline{\mathbb{R}}$, with $f, f_k$ measurable and finite a.e. Then $f_k$ converges in measure (on E) to f, denoted $f_k \xrightarrow{m} f$, if and only if any of these equivalent criteria hold:
ˆ $\forall \varepsilon > 0$, $\lim_{k\to\infty} \big| \{x \in E \mid |f(x) - f_k(x)| > \varepsilon\} \big| = 0$
ˆ $\forall \varepsilon > 0$, $|\{|f - f_k| > \varepsilon\}| \xrightarrow{k\to\infty} 0$
ˆ $\forall \varepsilon > 0$, $\forall \eta > 0$, $\exists N \in \mathbb{N}$ such that $\forall k \ge N$, $|\{|f - f_k| > \varepsilon\}| < \eta$
ˆ $\forall \varepsilon > 0$, $\exists N \in \mathbb{N}$ such that $\forall k \ge N$, $|\{|f - f_k| > \varepsilon\}| < \varepsilon$ (Rudin's PMA, Prob. 4.16)
ˆ Cauchy Criteria:
◦ $\forall \varepsilon > 0$, $\lim_{k,\ell\to\infty} |\{|f_k - f_\ell| > \varepsilon\}| = 0$ (Rudin's PMA, Thm. 4.23)
◦ $\forall \varepsilon > 0$, $\forall \eta > 0$, $\exists N \in \mathbb{N}$ such that $\forall k, \ell \ge N$, $|\{|f_k - f_\ell| > \varepsilon\}| < \eta$ (Rudin's PMA, Thm. 4.23 restated)
◦ $\forall \varepsilon > 0$, $\exists N \in \mathbb{N}$ such that $\forall k, \ell \ge N$, $|\{|f_k - f_\ell| > \varepsilon\}| < \varepsilon$ (Rudin's PMA, Prob. 4.16)
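A standard separating example (mine; often called the "typewriter" sequence): on $E = [0, 1]$, enumerate the dyadic intervals $[0,1], [0,\tfrac12], [\tfrac12,1], [0,\tfrac14], [\tfrac14,\tfrac12], \ldots$ as $\{I_k\}_{k\in\mathbb{N}}$ and set $f_k := 1_{I_k}$. Then $|\{|f_k - 0| > \varepsilon\}| = |I_k| \xrightarrow{k\to\infty} 0$, so $f_k \xrightarrow{m} 0$, yet $f_k(x)$ converges at no $x \in [0, 1]$: every x lies inside infinitely many $I_k$ and outside infinitely many others. (Consistent with Thm. 4.22 below, the subsequence $1_{[0, 2^{-j}]} \to 0$ a.e.)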
m m
Suppose fk −→ f and gk −→ g. Then the following convergences hold:
m
ˆ fk + gk −→ f + g
m
ˆ fk gk −→ f g (requires: |E| < ∞)
m
ˆ fk /gk −→ f /g (requires: |E| < ∞ and g ̸= 0 a.e. and gk → g pointwise-a.e.)
m
ˆ |fk | −→ |f | (the reverse need not hold)

Noteworthy results follow.

ˆ Let $f, \{f_k\}$ be measurable and finite a.e. in E of finite measure. Then $f_k \to f$ pointwise-a.e. $\implies f_k \xrightarrow{m} f$ on E. (Rudin's PMA, Thm. 4.21)
ˆ $f_k \xrightarrow{m} f$ on E $\implies$ there is a subsequence $\{f_{k_j}\}_{j\in\mathbb{N}} \subseteq \{f_k\}_{k\in\mathbb{N}}$ such that $f_{k_j} \to f$ pointwise-a.e. in E. (Rudin's PMA, Thm. 4.22)
ˆ $f_k \xrightarrow{m} f$ iff $\forall$ subsequences $\{f_{k_j}\}_{j\in\mathbb{N}} \subseteq \{f_k\}_{k\in\mathbb{N}}$, they themselves have subsequences $\{f_{k_{j_i}}\}_{i\in\mathbb{N}} \subseteq \{f_{k_j}\}_{j\in\mathbb{N}}$ such that $f_{k_{j_i}} \xrightarrow{i\to\infty} f$ pointwise-a.e. (Needs some sort of finiteness on the domain.)
ˆ Fatou’s lemma and the monotone convergence theorem hold if a.e.-convergence is replaced by conver-
gence in measure.

ˆ The Lebesgue dominated convergence theorem holds if a.e.-convergence is replaced by convergence in


measure, for σ-finite measures.
ˆ Given f measurable and $[a, b] \subseteq \mathbb{R}$, then we can find step functions $g_n$ or continuous functions $h_n$ such that $g_n \xrightarrow{m} f$ and $h_n \xrightarrow{m} f$ on [a, b].
ˆ For $f, f_k \in L^p(\mu)$ for a $p > 0$, with $f_n \xrightarrow{L^p \text{ norm}} f$, then $f_n \xrightarrow{m} f$. The converse is false.

§15.5: (Lebesgue) Integrals for Nonnegative Functions

Let $f : E \subseteq \mathbb{R}^n \to \overline{\mathbb{R}}$ be nonnegative for now.
We define the graph $G(f) = \Gamma(f) = \operatorname{graph}(f)$ (sometimes including E) of f, and the region R(f) under the graph of f, by
$$\Gamma(f) := \left\{ (x, f(x)) \in \mathbb{R}^{n+1} \,\middle|\, x \in E \text{ and } f(x) < \infty \right\}$$
$$R(f) := \left\{ (x, y) \in \mathbb{R}^{n+1} \,\middle|\, x \in E \text{ and either } y \in [0, f(x)] \text{ if } f(x) < \infty, \text{ or } y \in [0, \infty) \text{ if } f(x) = \infty \right\}$$
(We adopt the notation $\Gamma(f)$ to match Measure and Integral.)
If R(f) is measurable in $\mathbb{R}^{n+1}$, we define the Lebesgue integral (of f, over E) to be its measure, with these common, equivalent notations:
$$|R(f, E)| =: \int_E f(x)\,dx \equiv \int_E f\,dx \equiv \int_E f$$

Some results:

ˆ For f nonnegative and measurable, $\Gamma(f)$ is a null set (Rudin's PMA, Lem. 5.3)
ˆ $\int_E f$ exists iff f is measurable (Rudin's PMA, Thm. 5.1)
ˆ Let f be a simple function, $f \equiv \sum_j a_j 1_{E_j}$. Then $\int_E f = \sum_j a_j \cdot |E_j|$ (Rudin's PMA, Cor. 5.4)
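A quick worked example (mine): on $E = [0, 1]$, the Dirichlet function $f = 1_{\mathbb{Q}\cap[0,1]}$ (famously not Riemann-integrable) is simple, and
$$\int_{[0,1]} f = 1 \cdot \big|\mathbb{Q}\cap[0,1]\big| + 0 \cdot \big|[0,1]\setminus\mathbb{Q}\big| = 0$$
since countable sets are null.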

Some basic integral properties follow. Let F denote the class of measurable functions, and F + the class
of nonnegative measurable functions, of an understood domain E.
Z Z Z
ˆ Linearity: For α, β ∈ R and f, g ∈ F , then
+
(αf + βg) = α f +β g (Rudin’s PMA, Thm.
E E E
5.13; 5.14)
ˆ Monotonicity of Integrands: For f, g ∈ F + with 0 ≤ g ≤ f a.e., then (Rudin’s PMA, Thm. 5.5;
5.10) Z Z Z Z Z
g≤ f ; in particular, (inf f ) ≤ f≤ (sup f )
E E E E E
Z Z
ˆ Equal-a.e. Functions Have Equal Integral: If f, g ∈ F +
and f = g a.e., then f = g.
E E
(Rudin’s PMA, Thm. 5.10)
Z Z
ˆ Monotonicity of Domain: If f ∈ F + with A ⊆ B ⊆ E, then f≤ f (Rudin’s PMA, Thm.
A B
5.5)
ˆ Finite Integral =⇒ Finite a.e.: If f ∈ F + and
R
E
f is finite, then f < ∞ a.e. (Rudin’s PMA,
Thm. 5.5)
ˆ Null Domains & Zero Integral:
◦ For f ∈ F + and E a null set, E f = 0.
R
(Rudin’s PMA, Thm. 5.9)
◦ For f ∈ F + , E f = 0 iff f = 0 a.e.
R
(Rudin’s PMA, Thm. 5.11)
ˆ Integral on Partition:
Z Let E be partitioned into at-most-countably many disjoint measurable sets
XZ
{Ek }k∈N . Then = f (Rudin’s PMA, Thm. 5.7)
E j Ej

390
ˆ Riemann-Like Definition: Let f ∈ F + . Let E be the class of all decomposition of E into finitely
m
many, disjoint, measurable sets {Ej }j=1 . Then (Rudin’s PMA, Thm. 5.8)
Z X 
f = sup inf f (x) |Ej |
E E x∈Ej
j

Z Z Z Z
ˆ For f, φ ∈ F + with 0 ≤ f ≤ φ and f < ∞, then (φ − f ) = φ− f (Rudin’s PMA, Cor.
E E E E
5.15)

Z X ∞ Z
X
ˆ For {fk }k∈N ⊆ F , +
fk = fk (Rudin’s PMA, Thm. 5.16)
E k=1 k=1 E

Some more noteworthy results on integrals:


Z Z
k→∞
ˆ Monotone Convergence Theorem (MCT): If {fk }k∈N ⊆ F + with fk ↗ f on E, then fk −−−−→ f.
E E
(Rudin’s PMA, Thm. 5.6)
Z
1
ˆ Tchebyshev’s Inequality: For f ∈ F + and α > 0, f −1 (α, ∞) ≤

f (Rudin’s PMA, Cor. 5.12)
α E

ˆ Fatou’s Lemma & Reverse:


Z   Z
+
◦ For {fk }k∈N ⊆ F , lim inf fk ≤ lim inf fk (Rudin’s PMA, Thm. 5.17)
E k→∞ k→∞ E
◦ With further constraints, we also have
Z   Z Z Z  
lim inf fk ≤ lim inf fk ≤ lim sup fk ≤ lim sup fk
E k→∞ k→∞ E k→∞ E E k→∞

+
R bits being the reverse Fatou lemma. We require that ∃φ ∈ F such that fn ≤ φ for all
the latter
n and E φ < ∞ for it to work.

ˆ Lebesgue Dominated Convergence Theorem (DCT): For {fk : E ⊆ Rn → R}k∈N ⊆ F + , with


fk → f pointwise-a.e. on E, if ∃φ such that
◦ φ is measurable
◦ fk ≤ φ a.e. and for all k ∈ N
R
◦ Eφ<∞
Z Z Z Z Z
then fk → f i.e. lim fk = lim fk = f (Rudin’s PMA, Thm. 5.19)
E E k→∞ E E k→∞ E

k→∞
ˆ Corollary of Fatou, DCT, & MCT: Take {fk }k∈N nonnegative measurable functions with fk −−−−→ f
pointwise-a.e. on E and fk ≤ f a.e. and for each k. Then
Z Z
k→∞
fk −−−−→ f
E E

Note there is no assumption on integrability of f , unlike, say, DCT. Some call this also the MCT,
despite being strictly stronger and more practical. Some discussion on MSE here.
R
It may be considered a corollary of theRDCT for E f < ∞, as well, and hence the usage of Fatou arises
(and can be used independently) for E f = ∞. Note, too, Fatou can be considered implied by the
MCT (see typical proofs).

391
§15.6: (Lebesgue) Integrals for Arbitrary Real Functions

Recall: we define

f + (x) := max{0, f (x)}


f − (x) := − min{0, f (x)}

These functions satisfy a few identities, such as:

ˆ f+ − f− = f
ˆ f + + f − = |f |
ˆ f + g = (f + g)+ − (f + g)− = f + − f − + g + − g −
ˆ f ≤ g =⇒ f + ≤ g + and f − ≥ g −
ˆ f ± ≥ 0, and f measurable ensures they are too

We define the Lebesgue integral of f on E by


Z Z Z
f= f+ − f−
E E E

provided at least one of the latter two integrals are finite. We say:

ˆ E f exists if at least one of the latter two were finite (hence, E f ∈ [−∞, ∞])
R R

ˆ f is Lebesgue integrable on E if both were finite ( E f ± ∈ [0, ∞)) and write that f ∈ L(E) or
R

f ∈ L1 (E) in parallel to p-norms

We can go through a gauntlet of the usual results for nonnegative functions by use of the definitional
decomposition f = f + − f − , casework, and/or clever manipulations.

ˆ The integral of f on a set of measure zero affects nothing.


ˆ Triangle Inequalities:
Z Z Z Z Z Z Z Z
f = f+ − − f + + f − = + −

f ≤
f f = |f |
E E E E E E E E

ˆ f is integrable iff |f | is (Rudin’s PMA, Thm. 5.21)


ˆ f ∈ L(E) =⇒ f < ∞ a.e. on E (Rudin’s PMA, Thm. 5.22)
ˆ E f, E g existing with f ≤ g a.e. in E gives E f ≤ E g
R R R R
(Rudin’s PMA, Thm. 5.23(i))
ˆ f = g a.e. on E with E f existing gives E g existing with E f = E g (Rudin’s PMA, Thm. 5.23(i))
R R R R

ˆ A ⊆ B with A measurable and B f existing gives A f existing


R R
(Rudin’s PMA, Thm. 5.23(ii))
ˆ Countable Disjoint Additivity: Let E f exist with E = k∈N Ek countably-many disjoint mea-
R S
surable sets Ek . Then (Rudin’s PMA, Thm.
5.24) Z XZ
f= f
E k∈N Ek

ˆ Zero Integrals & Null Sets: If E is null or f ≡ 0 a.e. in E, then


R
E
f =0 (Rudin’s PMA, Thm.
5.25)

392
ˆ Linearity: Take α, β ∈ R and f, g ∈ L(E); then αf + βg ∈ L(E) and (Rudin’s PMA, Thm. 5.27,
5.28) Z Z Z
αf + βg = α f +β g
E E E
R R
◦ Scalar multiplication can be loosened. If α ∈ R and E f exists, then E
αf exists with (Rudin’s
PMA, Thm. 5.27) Z Z
αf = α f
E E

◦ As a corollary, let f, φ be measurable on E, f ≥ φ a.e., and φ ∈ L(E). Then we have


Z Z Z
f −φ= f− φ
E E E

which arises to avoid ∞ − ∞ situations. (Rudin’s PMA, Cor. 5.29)

ˆ If f ∈ L(E) and g is measurable on E, with |g| ≤ M a.e., then f g ∈ L(E) with (Rudin’s PMA, Thm.
5.30) Z Z
|f g| ≤ M |f |
E E

◦ As a corollary, if (further) f ≥ 0 and ∃α, β ∈ R with α ≤ g ≤ β a.e., then (Rudin’s PMA, Cor.
5.31) Z Z Z
α f≤ fg ≤ β f
E E E

ˆ Monotone Convergence Theorem (MCT): Take {fk }k∈N measurable on E. (Rudin’s PMA,
Thm. 5.32)

◦ (Ascending & Below) If fk ↗ f a.e., and ∃φ ∈ L(E) with fk ≥ φ a.e. ∀k, then
Z Z Z Z Z
k→∞
fk −−−−→ f ; that is, lim fk = lim fk = f
E E k→∞ E E k→∞ E

◦ (Descending & Above) If fk ↘ f a.e., and ∃φ ∈ L(E) with fk ≤ φ a.e. ∀k, then
Z Z Z Z Z
k→∞
fk −−−−→ f ; that is, lim fk = lim fk = f
E E k→∞ E E k→∞ E

unif.
ˆ Uniform Convergence Theorem: Take {fk }k∈N ⊆ L(E) with fk −
−−−→ f on E, with |E| < ∞.
k→∞
Then: (Rudin’s PMA, Thm. 5.33)

◦ f ∈ L(E)
Z Z
k→∞
◦ fk −−−−→ f
E E

ˆ Fatou’s Lemma & Reversal: Take {fk }k∈N measurable on E. Suppose ∃φ ∈ L(E) with fk ≥ φ a.e.
on E ∀k. Then (Rudin’s PMA, Thm. 5.34)
Z Z
lim inf fk ≤ lim inf fk
E k→∞ k→∞ E

As a corollary, we have the reverse, provided ∃ψ ∈ L(E) instead with fk ≤ ψ a.e. for all k: (Rudin’s
PMA, Cor. 5.35) Z Z
lim sup fk ≥ lim sup fk
E k→∞ k→∞ E

393
k→∞
ˆ Dominated Convergence Theorem (DCT): Take {fk }k∈N measurable on E with fk −−−−→ f
pointwise-a.e. If ∃φ ∈ L(E) with |fk | ≤ φ a.e. ∀k, then (Rudin’s PMA, Thm. 5.36)
Z Z Z Z Z
k→∞
fk −−−−→ f ; that is, lim fk = lim fk = f
E E k→∞ E E k→∞ E

◦ Bounded Convergence Theorem: Follows immediately as a corollary for φ ≡ M ∈ R.


k→∞
Take {fk }k∈N measurable on E with fk −−−−→ f pointwise-a.e. If ∃M ∈ R with |fk | ≤ M a.e. ∀k,
then (Rudin’s PMA, Cor. 5.37)
Z Z Z Z Z
k→∞
fk −−−−→ f ; that is, lim fk = lim fk = f
E E k→∞ E E k→∞ E

◦ Sequential/Generalized Version: Take {fk }k∈N , {φk }k∈N sequences of measurable functions
with (Rudin’s PMA, Prob. 5.23)
k→∞
fk −−−−→ f pointwise a.e. in E
k→∞
φk −−−−→ φ pointwise a.e. in E
|fk | ≤ φk a.e. in E for all k
φ ∈ L(E)
R k→∞ R
φ −−−−→ E φ
E k
Z
k→∞
Then |fk − f | −−−−→ 0
E
m
◦ Convergence In Measure Version: Suppose {fk }k∈N has fk −→ f on E, and |fk | ≤ φ ∈ L(E)
for some φ. Then f ∈ L(E) and (Rudin’s PMA, Prob. 5.26)
Z Z Z Z Z
k→∞
fk −−−−→ f ; that is, lim fk = lim fk = f
E E k→∞ E E k→∞ E

Proved by showing every subsequence {fkj }j∈N has a subsubsequence {fkji }i∈N with
Z Z
i→∞
fkji −−−→ f
E E

ˆ Corollary of Fatou, DCT, & MCT: Originally stated for positive functions, keeping here for
convenience.
k→∞
Take {fk }k∈N nonnegative measurable functions with fk −−−−→ f pointwise-a.e. on E and fk ≤ f a.e.
and for each k. Then Z Z
k→∞
fk −−−−→ f
E E
Note there is no assumption on integrability of f , unlike, say, DCT. Some call this also the MCT,
despite being strictly stronger and more practical. Some discussion on MSE here.
R
It may be considered a corollary of theRDCT for E f < ∞, as well, and hence the usage of Fatou arises
(and can be used independently) for E f = ∞. Note, too, Fatou can be considered implied by the
MCT (see typical proofs).

394
§15.7: Repeated Integration: Fubini-Tonelli

ˆ Cross-Sections: We define, respectively, the x, y cross-sections of a set E ⊆ Rp × Rq = Rp+q by

Ex := {y ∈ Rq | (x, y) ∈ E}
E y := {x ∈ Rp | (x, y) ∈ E}

ˆ Algebraic Cross Section Properties: The following properties are satisfied; given A, B, A1 , · · ·, An , · · · ∈ Rp ×Rq ,
and any x ∈ Rp , y ∈ Rq ,

(i) A ⊆ B =⇒ Ax ⊆ Bx
∞ ∞
!
[ [
(ii) An = (An )x
n=1 x n=1
∞ ∞
!
\ \
(iii) An = (An )x
n=1 x n=1
(iv) (A − B)x = Ax − Bx
(v) An ↗ A =⇒ (An )x ↗ Ax
(vi) An ↘ A =⇒ (An )x ↘ Ax
(vii) All of the above hold analogously for the y-sections

ˆ Measurability Cross-Section Properties: Suppose E ⊆ Rp × Rq is measurable. Then:

(i) ∀x ∈ Rp a.e., Ex is measurable


(ii) ∀x ∈ Rp a.e., f (x) := |Ex | is well-defined, nonnegative, and measurable on Rp
Z Z
(iii) f (x) dx = |Ex | dx = |E|
Rp Rp
(iv) All of the above hold analogously for the y-sections

ˆ Fubini’s Theorem: For Ik intervals, take f (x, y) ∈ L(I1 × I2 ). Then: (Rudin’s PMA, Thm. 6.1)

◦ ∀x ∈ I1 a.e., f (x, ·) is measurable and in L(I2 ).


R
◦ The function g(x) := I2 f (x, y) dy is measurable and integrable on I2 with
ZZ Z Z  Z Z 
f (x, y) dx dy = f (x, y) dy dx = f (x, y) dx dy
I1 ×I2 I1 I2 I2 I1

n m
This includes the case when I1 = R , I2 = R .

ˆ Comment on Fubini: If E f is finite, the corresponding iterated integrals are finite. The reverse is
RR

not true, even if the iterated integrals all equal. This is true for nonnegative measurable f , by Tonelli.
ˆ Lemmas Leading Up To Fubini:
m
◦ If Fubini applies to {fk }k=1 , it applies to a finite linear combination of them. (Rudin’s PMA,
Lem. 6.2)
◦ If Fubini applies to each of {fk }k∈N , with fk ↗ f or fk ↘ f , it applies to f .(Rudin’s PMA, Lem.
6.3)
◦ Fubini applies to χE for E Gδ (Rudin’s PMA, Lem. 6.4)
◦ Fubini applies to χZ for Z of measure zero. Moreover, ∀x ∈ R a.e., Zx := {y ∈ Rm | (x, y) ∈ Z}
n

is of measure zero in Rm . (Rudin’s PMA, Lem. 6.5)

395
◦ Fubini applies to characteristics of measurable sets of finite measure (Rudin’s PMA, Lem. 6.6)

ˆ Generalization of Fubini: (Rudin’s PMA, Thm. 6.7,6.8)

◦ For f measurable on Rn+m , ∀x ∈ Rn a.e., f (x, ·) is measurable on Rm . Moreover, if E ⊆ Rn+m


is measurable, then Ex := {y | (x, y) ∈ E} is measurable in Rm ∀x ∈ Rn a.e. (Rudin’s PMA,
Thm. 6.7)
◦ Let f (x, y) be measurable on E ⊆ Rn+m measurable. Then ∀x ∈ Rn a.e., f (x, ·) is measurable on
Ex . (Rudin’s PMA, Thm. 6.8i)
n
◦ Moreover,
R if f (x, y) ∈ L(E), then ∀x ∈ R a.e., f (x, y) is integrable on Ex w.r.t. y, with
g(x) := Ex f (x, y) dy integrable in terms of x, and (Rudin’s PMA, Thm. 6.8ii)
ZZ Z Z 
f (x, y) dx dy = f (x, y) dy dx
E Rn Ex

ˆ Tonelli’s Theorem: Take f (x, y) ≥ 0 on I1 × I2 ⊆ Rn+m an interval. Then: (Rudin’s PMA, Thm.
6.10)

◦ ∀x ∈ I1 a.e., f (x, ·) is measurable in y on I2


R
◦ g(x) := I2 f (x, y) dy is measurable as a function of x on I2
ZZ Z Z  Z Z 
◦ f= f (x, y) dy dx = f (x, y) dx dy
I1 ×I2 I1 I2 I2 I1
◦ Hence, if any one of these three integrals is finite, the other two are too. The finiteness of any
one of them for |f | thus gives integrability and equality.
◦ This may be trivially extrapolated to arbitrary measurable E.
◦ The integral equality holds even if the integrals evaluate to ±∞.

ˆ Fubini & Tonelli: The Difference:

◦ Fubini deals with iterated integrals of integrable functions.


◦ Tonelli deals with iterated integrals of nonnegative, measurable functions.
◦ To use Fubini, you may prove f ∈ L(E), i.e. show f + or f − or |f | are integrable. This can
necessitate Tonelli.

ˆ Application: Convolution: Recall: we define, for f, g : Rn → R, (Rudin’s PMA, Thm. 6.14)


Z
(f ∗ g)(x) := f (x − ξ)g(ξ) dξ
Rn

If f, g ∈ L(Rn ), then (f ∗ g)(x) exists ∀x ∈ Rn a.e., is measurable, and in L(Rn ). Moreover,


Z Z Z 
|f ∗ g| ≤ |f | |g|
Rn Rn Rn
Z Z Z 
(f ∗ g) = f g
Rn Rn Rn

396
§15.8: Differentiation (As in Lectures)

Recall: for f : [a, b] → R monotone (increasing or decreasing), then f has at-most-countably-many points
of discontinuity, and is differentiable a.e. For f increasing in particular,
Z b
f ′ ≤ f (b) − f (a)
a

in the Lebesgue sense.

Vitali Covering Lemma:


First, we define a Vitali cover of E ⊆ R as a collection V := {Iα }α∈A of intervals (not necessarily
contained in E, nor necessarily countable) such that:

ˆ Iα is closed for all α ∈ A

ˆ ∀x ∈ E, we may find a sequence {In }n∈N ⊆ V such that:

◦ x ∈ In for each n ∈ N
n→∞
◦ |In | −−−−→ 0

Equivalently, V is a Vitali covering if it consists of only closed intervals and

(∀x ∈ E)(∀ε > 0)(∃I ∈ V )(x ∈ I and |I| < ε)

A basic example is, given any E, the collection


 
1 1
V := x − ,x +
n n x∈E,n∈N

Vitali’s covering lemma states the following: given E ⊆ R bounded with Vitali cover V := {Iα }α∈A ,
∃{In }n∈N ⊆ V such that

ˆ {In }n∈N contains at-most-countably-many pairwise-disjoint closed intervals (may pad with ∅ if needed)

ˆ {In }n∈N covers almost-all of E, in the sense that




[
E − In = 0


n=1

397
Subsequential Derivatives:
To define the subsequential derivative (of f , at x0 , associated with {hn }n∈N ), list use have

ˆ f : [a, b] → R
ˆ x0 ∈ [a, b]
ˆ {hn }n∈N ⊆ [a − x0 , b − x0 ]\{0}
f (x0 + hn ) − f (x0 )
ˆ lim =: λ ∈ R (exists, and is in R)
n→∞ hn
Then we say λ is the subsequential derivative, of f , at x0 , associated with {hn }n∈N . Note that the
classical derivative appears for hn := 1/n.
Personal notation:
f (x0 + hn ) − f (x0 )
SSD(f ; x0 ; hn ) := lim
n→∞ hn
If needed, we may omit the hn if it is not of concern at that moment.
Consequently, f is differentiable at x0 iff all subsequential derivatives of f at x0 have the same λ. That
is to say, SSD(f ; x0 ; hn ) = λ for all sequences {hn }n∈N (and exists).

Results on Subsequential Derivatives:

ˆ Bounding Lemmas: Let f : [a, b] ⊇ E → R be strictly increasing. Let p > 0 be fixed. Suppose that,
∀x ∈ E, we have SSD(f ; x) = λx < p. Then
|f (E)|e ≤ p · |E|e
Likewise, if SSD(f ; x) = λx > q for some q ≥ 0 fixed, then
|f (E)|e ≥ q · |E|e
Finally, if SSD(f ; x) = +∞ for all x ∈ E,
|E| = 0
ˆ Lebesgue Differentiation Theorem for Monotone Functions: If f : [a, b] → R is monotone,
then f is differentiable a.e. in the classical sense, i.e. f ′ (x0 ) ∈ R and exists for all x ∈ [a.b].
ˆ Bound on Integral: Let f : [a, b] → R be non-decreasing (weakly increasing). Then f ′ ∈ L[a, b] and
Z b
f ′ ≤ f (b− ) − f (a+ ) ≤ f (b) − f (a)
a

The final inequality is a bit weaker due to endpoint behavior not mattering. Note that we define
f (ξ − ) := lim− f (x) f (ξ + ) := lim+ f (x)
x→ξ x→ξ

Misc. Results from Lectures:

ˆ If f is continuous, increasing, or decreasing, it has a fixed point


ˆ If f : [a, b] → R is increasing, it may be written as
f (x) = c(x) + j(x)
for c continuous & increasing and j a jump function which is non-decreasing and non-negative.
ˆ Monotone functions have at most countably many discontinuities, all of the jump-type.

398
§15.9: Differentiation (As in Measure & Integral, Chapter 7)

Set Functions:
To generalize the notion of indefinite integral , given f : A → R (where E ⊆ A ⊆ Rn and A, E are
measurable), we define the indefinite integral of f to be
Z
F : {measurable subsets of A} → R≥0 as defined by F (E) := f
E

Some notes on set functions:

ˆ Set Functions: F is a set function, a function defined on a σ-algebra Σ ⊆ P(A) such that

◦ F (E) < ∞ for each E ∈ Σ


◦ F is countably additive:

·
[ X
E= Ek for Ek ∈ Σ =⇒ E ∈ Σ and F (E) = F (Ek )
k∈N k∈N

The indefinite integral satisfies these properties (Theorems 5.5 & 5.24).
ˆ Continuity: A set function is said to be continuous if

lim F (E) = 0 where diam(E) := sup |x − y|


diam(E)→0 x,y∈E

Formally,  
(∀ε > 0)(∃δ > 0) diam(E) < δ =⇒ |F (E)| < ε

ˆ Absolute Continuity: A set function F is absolutely continuous (w.r.t. Lebesgue measure) if

lim F (E) = 0
|E|→0

Formally,  
(∀ε > 0)(∃δ > 0) |E| < δ =⇒ |F (E)| < ε

Such functions are obviously continuous, but the converse need not hold.
R
For f ∈ L(A), the indefinite integral F (E) := E f is absolutely continuous.
If F is a set function absolutely continuous w.r.t. Lebesgue measure, then ∃f which has indefinite
integral f . (This is the Radon-Nikodym theorem.)

399
Indefinite Integrals and Differentation:
n n
Qn a cube (i.e. Qn = [a, b] ⊆ R for some a, b ∈ R), and Qx a cube with center x (in the sense
Let Q denote
that Qx = i=1 [xi − r, xi + r] for some r > 0).
Let F be f ’s indefinite integral.
We consider: does Z
F (Qx ) 1
= f (ξ) dξ
|Qx | |Qx | Qx

converge to f (x) as Qx contracts to x. That is, does

F (Qx )
lim = f (x) (perhaps think of r → 0)
Qx ↘x |Qx |

hold?
If so, we say that f ’s indefinite integral is differentiable at x with derivative f ′ (x).
Note that in R1 this amounts to the question of whether
Z x+h
1 ?
lim f (ξ) dξ = f (x)
h→0 2h x−h

which is essentially equivalent to the usual statement.


Some results of note:

ˆ Lebesgue’s Differentiation Theorem: For f ∈ L(Rn ), its indefinite integral is differentiable a.e.,
with derivative f . This proof of this necsssitates several lemmas, and proceeds by approximating such
f by continuous Ck . (Rudin’s PMA, Thm. 7.2)

◦ An Extension (Lloc (Rn )): This holds for f ∈ Lloc (Rn ) (Rudin’s PMA, Thm. 7.11)
We say f ∈ Lloc (Rn ) (locally integrable if f ∈ L(B) for each bounded measurable (equivalently,
compact) B ⊆ Rn .
◦ An Extension (Points of Density): Observe that, for E measurable,

|E ∩ Q| |E ∩ Qx |
Z
1
χE = =⇒ lim = χE (x) a.e.
|Q| Q |Q| Qx ↘x |Qx |

A point for which the limit is 1 is a point of density of E and a point for which the limit is
0 is a point of dispersion. (Note that the equality above holds only a.e. Note that points of
dispersion of E are density points of E c , etc.)
Then the differentiation theorem gives: almost each point in E ⊆ Rn measurable is a point of
density in E. (Rudin’s PMA, Thm. 7.13)
◦ An Extension (Lebesgue Points): We note that
Z Z 
1 1 
lim f (ξ) dξ = f (x) =⇒ lim f (ξ) − f (x) dξ
Qx ↘x |Qx | Q Qx ↘x |Qx | Q
x x

for almost-all x when f ∈ Lloc (Rn ). If the slightly stronger claim


Z
1
lim f (ξ) − f (x) dξ

Qx ↘x |Qx | Q
x

is satisfied, then x is a Lebesgue point of f ; the collection of all these points is f ’s Lebesgue
set.
The differentiation theorem may be extended as so: for f ∈ Lloc (Rn ), almost-all x ∈ Rn are
Lebesgue points of f . (Rudin’s PMA, Thm. 7.15)

400
◦ An Extension (Broader Notion of Shrinking): What if we go beyond cubes? A family of
sets {Sn }n∈N is said to shrink regularly to x if
n→∞
(i) diam(Sn ) −−−−→ 0
(ii) If Q is the smallest cube of center x containing S, ∃k independent of S such that |Q| ≤ k|S|
Then for f ∈ Lloc (Rn ), if x is in f ’s Lebesgue set,
Z
1
n→∞
f (ξ) − f (x) dξ −−−−→ 0

|Sn | Sn

whenever {Sn }n∈N shrinks regularly to x and hence


Z
1 n→∞
f (ξ) dξ −−−−→ f (x)
|Sn | Sn

ˆ Approximation by C0 Functions: For f ∈ L(Rn ), ∃{Ck }k∈N ⊆ C0 (Rn ) with (Rudin’s PMA, Lem.
7.3) Z
k→∞ k→∞
|f − Ck | −−−−→ 0; that is, ∥f − Ck ∥L1 −−−−→ 0
Rn
Proof proceeds by showing it for finite linear combinations, and then limits of sequences.
ˆ Simple Vitali Cover Lemma: Let E ⊆ Rn have |E|e < ∞, and K := {Qi }i∈I a collection of cubes
covering E. Then ∃ (Rudin’s PMA, Lem. 7.4)

◦ β > 0, dependent only on n


◦ {Q1 , · · ·, QN } ⊆ K

such that
N
X
|Qj | ≥ β · |E|e
j=1

The set-theoretic equivalent is discussed partly in Exercise 7.18.


ˆ Hardy-Littlewood Maximal Functions: Given f : Rn → R integrable over any cube Q, the
Hardy-Littlewood maximal function is
Z
∗ 1
f (x) := sup |f (ξ)| dξ
Qx ⊆Rn |Qx | Qx

(with the supremum specifically over those cubes with edges parallel to the axes). This function satisfies
the following inner product-like properties:

◦ 0 ≤ f ∗ (x) ≤ ∞
◦ (f + g)∗ (x) ≤ f ∗ (x) + g ∗ (x)
◦ (cf )∗ (x) = |c| · f ∗ (x)
◦ If f ∗ (x0 )α for some x0 ∈ Rn and α > 0, then (by absolute continuity) f ∗ (x) > α for all x
sufficiently near x0 .
◦ Hence, f ∗ is lower semicontinuous and thus measurable
◦ However, f ∗ is never integrable on {|x| ≥ 1} unless f ≡ 0 a.e., and not even over bounded sets
(but only for some f ∈ L(Rn )). It will be integrable over bounded sets if f ∈ Lp (Rn ) for some
p > 1, or even |f | · (1 + ln+ |f |) ∈ L1 (Rn ).

401
ˆ Weak Lp : Suppose that, for f ∈ L(Rn ), ∀α > 0, ∃c (not dependent on α) such that
c
|{|f | > α}| ≤
α
Then f is said to belong to weak L1 . More generally, with p ∈ [1, ∞], f ∈ Lpweak (Rn ) (or the Lorentz
space Lp,∞ ) if  c p
|{|f | > α}| ≤
α
The best such C is given by the seminorm (triangle inequality fails)
h i1/p
∥f ∥Lp n = sup α · |{|f | > α}|
weak (R )
α>0

We note that Lp is contained in its weak Lp functions, and the latter have norm bounded about by
the Lp norm proper (when it exists).
ˆ Hardy-Littlewood Inequality: For f ∈ L(Rn ), we have f ∗ ∈ Lweak (Rn ), and ∃c independent of f, α
such that, for all α > 0, (Rudin’s PMA, Lem. 7.9)
Z
c
|{f ∗ > α}| ≤ |f |
α Rn

402
§15.10: Functions of Bounded Variation (in R)

Fundamental Definitions:
nP
Consider an interval [a, b] ⊆ R. An ordered partition P := {xi }i=0 is a set of points such that

a = x0 < x1 < x2 < · · · < xnP = b

As a personal notation we let

Pa,b := {all ordered partitions P of [a, b]}

To P ∈ Pa,b , we associate the sum


nP
X
V (f, P ) := |f (xi ) − f (xi−1 )|
i=1

or, more compactly, we may let


X
∆fi := f (xi ) − f (xi−1 ) so that V (f, P ) := |∆fi |
i

Then the variation of f over [a, b] is defined as


X
Vab (f ) := sup V (f, P ) ≡ sup |∆fi |
P ∈Pa,b P ∈Pa,b i

ˆ If Vab (f ) < ∞, we say f is of bounded variation and f ∈ BV[a, b]


ˆ Otherwise, if Vab (f ) = ∞, we say f is of unbounded variation
ˆ Typically we let Vaa (f ) := 0 by default

Sometimes we want to focus on the bits of positive or negative variation. Recall that, given f ,

f + (x) := max{f (x), 0} f − (x) := − min{0, f (x)}

to be its positive and negative parts respectively. In particular, applied to numbers, we let
( (
+ x, x > 0 − 0, x > 0
x := x :=
0, x ≤ 0 −x, x ≤ 0

Then we may define, given Γ ∈ Pa,b and f : [a, b] → R,



X +
X
P (f, Γ) := (∆fi ) N (f, Γ) := (∆fi )
i i

and in turn the positive/negative variations by

Pab (f ) := sup P (f, Γ) Nab (f ) := sup N (f, Γ)


Γ∈Pa,b Γ∈Pa,b

A function, sometimes called the total variation function of f on [a, b] is defined by

π(x) := Vax (f )

403
Unsorted Results on Bounded Variation Functions:

ˆ On altering the partition:

◦ If P ⊆ Q for P, Q ∈ Pa,b , then V (f, P ) ≤ V (f, Q).



◦ If [a′ , b′ ] ⊆ [a, b], then Vab′ (f ) ≤ Vab (f ). (Rudin’s PMA, Thm. 2.2(i))
◦ For c ∈ (a, b) we have Vab (f ) = Vac (f ) + Vcb (f ) (Rudin’s PMA, Thm. 2.2(ii))
n→∞
ˆ Recall that ∥P ∥ := max{xi − xi−1 }. Then if {Pn }n∈N ⊆ Pa,b has ∥Pn ∥ −−−−→ 0, and f ∈ C[a, b], we
n→∞
have V (f, Pn ) −−−−→ Vab (f ). That is to say, (Rudin’s PMA, Thm. 2.9)

Vab (f ) = lim V (f, P )


∥P ∥→0

ˆ Let α, β ∈ R, {hn }n∈N ⊆ BV[a, b], and f, g ∈ BV[a, b]. Then these are in BV[a, b]: (Rudin’s PMA,
Thm. 2.1(ii))

◦ αf + βg (hence, BV[a, b] is a vector space)


◦ fg
◦ f /g (provided g is bounded away from 0, i.e. ∃m > 0 s.t. |g| ≥ m)
◦ If hn → h pointwise with Vab (hn ) ≤ M for all n, then h ∈ BV[a, b] with Vab (h) ≤ M . (Rudin’s
PMA, Prob. 2.4)

ˆ As noted, BV[a, b] is a vector space. We may give it norm

∥f ∥BV[a,b] := Vab (f ) + |f (a)|

Under this norm, it is Banach.

ˆ For f ∈ C 1 [a, b], we have (Rudin’s PMA, Cor. 2.10)


Z b Z b Z b
Vab (f ) = |f ′ | Pab (f ) = (f ′ )+ Nab (f ) = (f ′ )−
a a a

ˆ Items ensuring f ∈ BV[a, b]:

◦ If f is Lipschitz, then f ∈ BV[a, b].


◦ If f is monotone, Vab (f ) = |f (b) − f (a)| and thus f ∈ BV[a, b]
Sn
◦ Suppose that [a, b]P = i=1 [ci , ci+1 ], and f is monotone on each subinterval [ci , ci+1 ]. Then one
n c
sees that Vab (f ) = i=1 Vcii+1 (f ). This ensures that f ∈ BV[a, b].
b
◦ If f : [a, b] → R has f ∈ BV[a + ε, b] ∀ε ∈ (0, b − a), and also has Va+ε (f ) ≤ M < ∞ for all ε,
then f ∈ BV[a, b]. (Rudin’s PMA, Prob. 2.5)

ˆ If f ∈ BV[a, b], then...

◦ f has countably-many discontinuities (all being jump/removable). (Rudin’s PMA, Thm. 2.8)
◦ Hence, f ∈ R[a, b], i.e. BV[a, b] ⊆ R[a, b]. (Rudin’s PMA, Prob. 2.32)
◦ If f ∈ BV[a, b], then f is bounded on [a, b]. (Rudin’s PMA, Thm. 2.1(i))

ˆ Equivalences to being in BV[a, b], i.e. f ∈ BV[a, b] if and only if ...

404
◦ (Jordan Decomposition; Cor. 2.7) f ∈ BV[a, b] ⇐⇒ f may be written as φ − ψ, for φ, ψ
monotone-increasing and bounded on [a, b].
We may extend this to f ∈ BV(−∞, ∞). (Rudin’s PMA, Prob. 2.8)
A proof comes from the fact that
π+f π−f
f= − or f = f + π − π
2 2

◦ f ∈ BV[a, b] iff ∃φ increasing on [a, b] such that

a ≤ y ≤ x ≤ b =⇒ f (x) − f (y) ≤ φ(x) − φ(y)

ˆ Results tied to total variation functions π(x) := Vax (f ):

x →x+
◦ Note: It is not necessarily true that Vxx0 (f ) −−0−−−→ 0
◦ If f ∈ BV[a, b], then π is increasing on [a, b] (hence π ′ exists a.e.)
◦ If f ∈ BV[a, b], then π and f share the same points of left-, right-, and full continuity
Rb
◦ a π ′ ≤ π(b) − π(a), giving the fundamental theorem-like result of
Z b
d x
V (f ) dx ≤ Vab (f )
a dx a

◦ |f ′ | = π ′ a.e.
Rx
ˆ Results tied to differentiability/integrability; here, F (x) := a
f is the indefinite integral

◦ F is Lipschitz when f ∈ R[a, b]


◦ F ′ exists a.e. when f ∈ R[a, b]
◦ F ′ = f when f ∈ R[a, b]
Rb
◦ F ∈ BV[a, b] with Vab (F ) ≤ a
|f |
◦ For f ∈ L[a, b], if F ≡ 0 everywhere, then f ≡ 0 a.e.
◦ For f ∈ L[a, b], F ′ = f a.e.

ˆ For some results on negative/positive variation:

◦ Pab (f ), Nab (f ) ∈ [0, ∞]


◦ P (f, Γ) + N (f, Γ) = V (f, Γ)
◦ P (f, Γ) − N (f, Γ) = f (b) − f (a)
◦ If any of Pab (f ), Nab (f ), Vab (f ) are finite, then all three are, with (Rudin’s PMA, Thm. 2.6)

Pab (f ) + Nab (f ) = Vab (f ) Pab (f ) − Nab (f ) = f (b) − f (a)

Equivalently,

Vab (f ) + f (b) − f (a) Vab (f ) − f (b) + f (a)


Pab (f ) = Nab (f ) =
2 2

405
§15.11: Absolute Continuity

Motivation:
Recall some fundamental results from calculus:
Rx
ˆ Fundamental Theorem of Calculus 1: For f ∈ R[a, b] and F (x) := a
f (t) dt the indefinite
integral of F defined on x ∈ [a, b], we have

◦ F is Lipschitz
◦ F ′ exists wherever f is continuous, and F ′ = f at such points
◦ F ′ exists a.e.

ˆ Fundamental Theorem of Calculus 2: Let f : [a, b] → R be differentiable on all of [a, b], with
f ′ ∈ R[a, b]. Then the fundamental theorem of calculus holds:
Z x
f (x) = f (a) + f ′ (x) dx
a

A measure-theoretic framing of the latter result is as so:

ˆ Lebesgue FTC2: For f ∈ AC[a, b]:

◦ f ′ exists a.e.
◦ f ′ ∈ L[a, b]
Z x
◦ f (x) = f (a) + f′
a

Basic Definitions:

ˆ Absolute Continuity: We say f is absolutely continuous on [a, b], denoted f ∈ AC[a, b], when,
n
∀ε > 0, ∃δ > 0, such that, for any collection {[ak , bk ]}k=1 of finitely-many non-overlapping intervals in
[a, b] satisfying X X
|bk − ak | < δ, we have |f (bi ) − f (ai )| < ε
k k

ˆ Singular Function: If f ′ ≡ 0 but f is non-constant on [a, b], we say f is a singular function. The
Devil’s staircase function (or Cantor-Lebesgue function) is an example:
 ∞ ∞
X an X 2an
, ∈ C and an ∈ {0, 1}


n 3n


n=1
2 n=1
c(x) :=

 sup c(y), x ̸∈ C
y≤x

y∈C

where C denotes the usual Cantor set.


ˆ (Lusin) N -Function: Given f : [a, b] → R, we say it is a (Lusin) N -function if, for any Z ⊆ [a, b]
of measure zero, then |f (Z)| = 0. (N -functions map zero-measure sets to measure-zero sets.)

Basic Results:

406
ˆ If f ∈ AC[a, b], then

◦ f is uniformly continuous on [a, b]


◦ f ∈ BV[a, b] (Rudin’s PMA, Thm. 7.27)
◦ f ′ ∈ L[a, b]
◦ |f ′ | ≤ Vab (f ) < ∞
◦ If f ′ ≡ 0 a.e., and f ∈ AC[a, b], then f is constant on [a, b] (Rudin’s PMA, Thm. 7.28)
Z x
x
◦ Va (f ) = |f ′ |
a
Z x
x
◦ Va (f ) ≤ |f ′ | holds for f ∈ BV[a, b] (AC[a, b] ⊆ BV[a, b])
a

ˆ f ∈ AC[a, b] if and only if these hold: (Rudin’s PMA, Thm. 7.29)

(i) f ′ exists a.e.


(ii) f ′ ∈ L[a, b]
Rx
(iii) f (x) = f (a) + a
f ′ (FTC is satisfied)

ˆ Some results about variation:

◦ Vax (f ) and f are absolutely continuous at the same points


d x
◦ V (f ) = |f ′ | (holds for f ∈ BV[a, b])
dx a
Z x
x
◦ If f ∈ AC[a, b], then Va (f ) = |f ′ |
a
Z x
◦ If f ∈ BV[a, b] ⊇ AC[a, b], then Vax (f ) ≤ |f ′ |
a

ˆ f Lipschitz on [a, b] =⇒ f ∈ AC[a, b]

ˆ If f ′ exists everywhere and is finite, then f ∈ AC[a, b]


Z x
ˆ For f ∈ L[a, b], we have F (x) := f ∈ AC[a, b]
a

ˆ f ∈ C[a, b] is an N -function iff E ⊆ [a, b] measurable =⇒ f (E) measurable

ˆ If all four of these hold, then f : [a, b] → R satisfies f ∈ AC[a, b]:

(i) f ∈ C[a, b]
(ii) f ′ exists a.e.
(iii) f ′ ∈ L[a, b]
(iv) f is an N -function or f satisfies the FTC

ˆ If f ∈ C[a, b] is differentiable except on an at-most-countable set, and f ′ ∈ L[a, b], then f ∈ AC[a, b]
and the FTC holds
ˆ If f ∈ BV[a, b], then we may write (Rudin’s PMA, Thm. 7.30)

f (x) = a(x) + s(x)

for a ∈ AC[a, b] and h singular on [a, b], each unique up to additive constants.

407
More Important Results:

ˆ Absolute Continuity of Integral: Let f ∈ L[a, b] (or generally, f ∈ L(Rn )). Then ∀ε > 0, ∃δ > 0
such that Z
for any E ⊆ [a, b] with |E| < ε, we have f <ε
E
or rather Z
lim f =0
|E|→0 E

ˆ Lebesgue FTC2: For f ∈ AC[a, b]:

◦ f ′ exists a.e.
◦ f ′ ∈ L[a, b]
Z x
◦ f (x) = f (a) + f′
a

ˆ Integration by Parts: Given f, g ∈ AC[a, b], then (Rudin’s PMA, Thm. 7.32)
Z b b Z b

f ′g

f g = f g −
a a a

ˆ Change of Variables: For f ∈ L[a, b] and φ : [α, β] → R with φ ∈ AC[a, b], φ(α) = a, φ(β) = b, and
φ is strictly increasing, we have
Z b Z β
f= f (φ(t))φ′ (t) dt
a α

ˆ A Fubini Theorem: Let {fn : [a, b] → R}n∈N be a sequence of increasing functions, finite everywhere
(hence −∞ < f (a) ≤ f (b) < +∞). Define

X
f (x) := fn (x)
n=1

Then f ′ exists a.e., with us being able to bring the derivative inside:

X
f ′ (x) = fn′ (x)
n=1

408
§15.12: Convex Functions

Motivation:
In ordinary calculus, we are tempted to say φ : (a, b) → R is convex (concave up) if φ′′ (x) ≥ 0 on (a, b),
and dually concave (concave down) if φ′′ (x) ≤ 0 on (a, b). These motivate a new definition.

Basic Definitions:

ˆ Convex (Concave Up): We say that φ : (a, b) → R is convex if


 
∀x, y ∈ (a, b), ∀λ ∈ [0, 1], φ (1 − λ)x + λy ≤ (1 − λ)φ(x) + λφ(y)

ˆ Concave (Concave Down): We say that φ : (a, b) → R is concave if


 
∀x, y ∈ (a, b), ∀λ ∈ [0, 1], φ (1 − λ)x + λy ≥ (1 − λ)φ(x) + λφ(y)

ˆ Support Line: Let φ be convex on (a, b) and x0 ∈ (a, b). A supporting line (of φ, through x0 ) is a
line through (x0 , φ(x0 )) lying on or below the graph of φ on (a, b).
If m ∈ [D− φ(x0 ), D+ φ(x0 )], then a line through that point with slope m is a support line.

Some Results:

ˆ Suppose φ1 , φ2 are convex on (a, b), and α, β ∈ R. Then αφ1 + βφ2 is convex on (a, b). (Rudin’s
PMA, Thm. 7.36)
∞ k→∞
ˆ If {φi }i=1 are convex on (a, b) and φk −−−−→ φ, then φ is convex (Rudin’s PMA, Thm. 7.36)
ˆ Chordal-Slope / Three Slopes Lemma: Let φ : (a, b) → R be convex, with a < ℓ < m < r < b
(respectively, ℓ, m, r can be thought of as a “left”, “middle”, and “right” point). Then

φ(m) − φ(ℓ) φ(r) − φ(ℓ) φ(r) − φ(m)


≤ ≤
m−ℓ r−ℓ r−m
| {z } | {z } | {z }
left-to-middle slope average slope middle-to-right slope

In particular, what this says is that, in the function

φ(x) − φ(y)
Φ(x, y) :=
x−y
increasing either x or y increases the overall value.
The converse is also true: if the chordal-slope lemma inequalities hold (and it is easiest to prove with
the first and last fractions), then φ is convex.
ˆ Let φ : (a, b) → R be convex. Then:

◦ φ ∈ C[a, b] (Rudin’s PMA, Thm. 7.40)



◦ φ exists on (a, b) except on an at-most-countable set (stronger than a.e. existence) (Rudin’s
PMA, Thm. 7.40)
◦ φ′ is monotone increasing (Rudin’s PMA, Thm. 7.40)

409
◦ φ is Lipschitz on any [a′ , b′ ] ⊆ (a, b) with Lipschitz constant
L := max φ′+ (a′ ) , φ′− (b′ )


In particular, we will have Z y


for x < y, φ(y) − φ(x) = φ′
x

ˆ Let φ ∈ C(a, b) be convex, and suppose that


φ(x) − φ(x0 )
D− φ(x0 ) ≡ φ′− (x0 ) := lim−
x→x0 x − x0
φ(x) − φ(x0 )
D+ φ(x0 ) ≡ φ′+ (x0 ) := lim+
x→x0 x − x0
exist and are both finite. Then:

◦ φ′− , φ′+ are increasing functions


◦ When ℓ < r, then φ′+ (ℓ) < φ′− (r)
◦ φ′− ≤ φ′+ on all of (a, b)
◦ φ ∈ AC(a, b)
Z x
◦ The FTC holds: φ(x) = φ(a) + φ′ for each x ∈ (a, b)
a
◦ Given x0 ∈ (a, b), let m ∈ φ′− (x0 ), φ′+ (x0 ) be arbitrary. Then
 

φ(x) ≥ φ(x0 ) + m(x − x0 )


The right-hand side is called the support line of φ through x0 (with slope m).

ˆ Sufficient conditions for φ : (a, b) → R to be convex:

◦ φ′ exists everywhere and is increasing (decreasing for concave), φ is convex


If
◦ D+ φ exists, is increasing everywhere, and is finite everywhere on (a, b), then φ is convex
If
◦ φ ∈ AC[a, b], and φ′ exists (almost?) everywhere and is increasing a.e., then φ is convex
If
◦ φ′′ ≥ 0, then φ is convex (if φ′′ ≤ 0, then φ is concave)
If
Z x+h
1
◦ If φ(x) ≤ φ for each x ∈ (a, b) and each h > 0 small, then φ is convex
2h x−h
ˆ Jensen’s Inequality (Discrete): Let (Rudin’s PMA, Thm. 7.35)
(i) φ : (a, b) → R be convex
N
(ii) {xi }i=1 ⊆ (a, b)
N
(iii) {λi }i=1 ⊆ R≥0
P
(iv) i λi = 1
!
X X
Then φ λi xi ≤ λi φ(xi )
i i
P
Without the restriction that i λi = 1, we have
X  X
λi xi λi φ(xi )
 i  i
φ ≤
 X X
λi λi

i i

If φ is concave, the inequality reverses.

410
ˆ Jensen’s Inequality (Integrals): Let us have (Rudin’s PMA, Thm. 7.44)
(i) E ⊆ Rn measurable
(ii) p ≥ 0
R
(iii) E p > 0
(iv) f, p ∈ L(E) and finite a.e.
R
(v) E |f (x)p(x)| dx < ∞
(vi) φ : (a, b) → R convex, with im f ⊆ im φ
Then Z  Z
 f (x)p(x) dx  φ(f (x))p(x) dx
EZ ≤ E
φ
  Z
p(x) dx p(x) dx
E E
R
In particular, taking p ≡ 1 and hence E p = 1, and |E| = 1, we see a more familiar version:
Z  Z
φ f (x) dx ≤ φ(f (x)) dx
E E

If φ is concave, the inequality reverses in each case.

411
§15.13: Lp Spaces

Definition of Lp Space:
Let E ⊆ Rn be measurable and p ∈ (0, ∞). Then
 Z 
Lp (E) := f : E → F ∈ R, C measurable ∥f ∥p :=

p,E |f | < ∞
E

Note that:

ˆ For p < 1, ∥·∥p,E does not represent a true norm (triangle inequality is violated)

ˆ L1 (E) is precisely L(E)


p
ˆ For p < 1, apparently ∥·∥p,E does define a true norm

To define the ∞-norm, we first define the essential supremum of f by

ess sup f ≡ ess sup f (x) := inf{α ∈ R | |{f > α}| = 0} ≡ inf{α ∈ R | f (x) ≤ α a.e.}
E x∈E

(As is typical, if there are no such α, we will have ess sup f = inf ∅ = +∞.) Then

∥f ∥∞,E := ess sup|f |


E

and  
L∞ (E) :=

f : E → R measurable ∥f ∥∞,E := ess sup|f | < ∞
E

We may often drop the set from the norm (writing ∥·∥p ) or the set (writing Lp ) if the set is to be
understood.

Some Basic Results:

ˆ If f ∈ L∞ (E), then |f | ≤ ∥f ∥∞ a.e., and

∥f ∥∞,E ≡ min{M ∈ R≥0 | |f | ≤ M a.e.} (should be infimum?)

ˆ For f ∈ C(E, R) bounded, then its ∞-norm is the supremum/uniform norm of C(E, R):

∥f ∥L∞ (E) ≡ ∥f ∥C(E,R) := sup |f (x)|


x∈E

ˆ If f ∈ L∞ (E), then ∀ε > 0, ∃E1 ⊆ E such that

(i) |E1 | > 0


(ii) |f (x)| > ∥f ∥∞ − ε for every x ∈ E1

ˆ For f : E → R and p, q ∈ [1, ∞] Holder conjugates, (Rudin’s PMA, Thm. 8.8)


Z 

∥f ∥p = sup f g g : E → R such that the integral exists and ∥g∥q ≤ 1
E

More Important Results:

412
ˆ ∞-norm is limit of p-norms: Specifically on E with |E| < ∞, we have

∥f ∥∞ ≡ lim ∥f ∥p
p→∞

This need not be true on infinite-measure sets. (Consider a nonzero constant function.)
ˆ Lp Inclusions: Suppose 0 < p < q ≤ ∞ and |E| < ∞. Then Lq (E) ⊆ Lp (E) (larger exponent gives
a smaller space). In particular, then, L∞ (E) ⊆ Lp (E) for every p ∈ (0, ∞].
ˆ Lp is a Vector Space: If f, g ∈ Lp (E) and α, β ∈ R, then αf + βg ∈ Lp (E).

ˆ Young’s Inequality: We have, for a, b ∈ R≥0 and p > 1, with q such that 1/p + 1/q = 1

ap bq
ab ≤ +
p q
with equality iff ap = bq .
ˆ Young’s Inequality for Integrals: Let us have:

(i) φ continuous, strictly increasing


(ii) φ(0) = 0
(iii) a, b ∈ R≥0

Then Z a Z b
ab ≤ φ+ φ−1
0 0

with equality iff b = φ(a).


ˆ Holder’s Inequality: Let us have

(i) p ∈ [1, ∞]
(ii) q its Holder conjugate, i.e. 1/p + 1/q = 1 (if one is infinity, the other is 1)
(iii) f ∈ Lp (E)
(iv) g ∈ Lq (E)

Then Z Z Z 1/p Z 1/q



f g ≤ p q
|f g| ≤ |f | |g|
E E E E

or rather
∥f g∥1 ≤ ∥f ∥p · ∥g∥q
with equality iff
(i) f g ≥ 0 on a.e., and
p q
(ii) |f | = α|g| a.e., for some α ∈ R>0

ˆ Cauchy-Schwarz Inequality: The case of p = q = 2 in Holder’s inequality.

ˆ Generalized Holder Inequality: Let us have

(i) pi > 1 for i = 1, 2, · · ·, m


X 1
(ii) =1
i
pi
(iii) fi ∈ Lpi (E) for each i

413
Then Z Y YZ 1/pi
pi
fi ≤ |f |

E
i E i

or rather
Y Y
fi ≤ ∥fi ∥pi


i 1 i

ˆ Interpolation Inequalities: Let 1 ≤ p < r < q < ∞; then Lp ∩ Lq ⊆ Lr .


Moreover, if θ ∈ (0, 1) satisfies

1 θ 1−θ 1/r − 1/q


= + ; that is, θ =
r p q 1/p − 1/q

then
θ 1−θ
∥f ∥r ≤ ∥f ∥p · ∥f ∥q
n o
∥f ∥r ≤ max ∥f ∥p , ∥f ∥q
p/r 1−(p/r)
∥f ∥r ≤ ∥f ∥p · ∥f ∥∞

ˆ Minkowski’s Inequality / Lp Triangle Inequality: If p ∈ [1, ∞], then

∥f + g∥p ≤ ∥f ∥p + ∥g∥p

i.e. Z 1/p Z 1/p Z 1/p


p p p
|f + g| ≤ |f | + |g|
E E E

and
ess sup|f + g| ≤ ess sup|f | + ess sup|g|
E E E

This inequality fails for p ∈ (0, 1).

414
§16: Items from Complex Analysis

§16.1: Complex Differentiation

A function f (x + iy) = u(x, y) + i · v(x, y) (f : C → C, with u, v : R2 → R) is said to be complex-


differentiable at z0 ∈ C if
f (z) − f (z0 )
f ′ (z0 ) := lim
z→z0 z − z0
exists. We analogously generalize to whole sets as well.
A function is holomorphic on an open set U ⊆ C if it is complex-differentiable at each point in U . We
say it is holomorphic at a point z0 if it is holomorphic on a neighboorhood of z0 .
2
Note that a function may be complex-differentiable at a point, but not holomorphic there. (f (z) = |z|
at z = 0.)
When a complex function f = u+iv is holomorphic, then the Cauchy-Riemann equations are satisfied
∂u ∂v ∂u ∂v
= =−
∂x ∂y ∂y ∂x
With continuity, the converse is true.

415
§16.2: Complex Integration

ˆ Fundamental Theorem for Complex Line Integrals: For f holomorphic on U ⊆ C open, and γ
a curve in U from za to zb , then Z
f ′ (z) dz = f (zb ) − f (za )
γ

(Note the path-independence when f has a single-valued antiderivative in U .)


ˆ Parameterization: If γ : [a, b] → C, is “nice enough” and f continuous, then z = γ(t) gives
Z Z b
f (z) dz = f (γ(t))γ ′ (t) dt
γ a

ˆ Cauchy’s Integral Theorem: Let γ : [a, b] → U be a smooth, closed curve, with U simply-connected
and open, and f : U → C holomorphic. Then
Z
f (z) dz = 0
γ

ˆ Cauchy’s Integral Formula: Take f holomorphic in a neighboorhood of a closed, simple curve γ,


oriented counterclockwise. Let Ω be the region enclosed by γ, and a ∈ Ω. Then
Z Z
1 f (z) f (z)
f (a) = dz =⇒ dz = 2πif (a)
2πi γ z − a γ z−a

This gives f ∈ C ∞ , and so


Z Z
(n) n! f (z) f (z) 2πi (n)
f (a) = dz =⇒ dz = f (a)
2πi γ (z − a)n+1 γ (z − a)n+1 n!

(sometimes called Cauchy’s differentiation formula). Of particular note is that


Z Z
dz
z n = 0 for n ∈ Z≥0 and = 2πi
|z|=1 |z|=1 z

ˆ Residue Theorem: For sufficiently “nice” f , we define the residue of f at c (a pole of order n) by

1 dn−1  
Res(f, c) ≡ Res f (z) := lim n−1 (z − c)n f (z)
z=c (n − 1)! z→c dz

(Simple poles are those of order 1.) One also notes that the Laurent series (centered at c) is given by
Z
X
n 1 f (z)
f (z) = an (z − c) an = dz
2πi γ (z − c)n+1
n∈Z

with γ a counterclockwise Jordan curve enclosing c and lying in an annulus in which f is holomor-
phic/analytic. Then
Res(f, c) = a−1 , the coefficient of (z − c)−1
We have that Z X
f (z) dz = 2πi Res(f )
γ z=c
c in γ

(Note that sometimes even with multiple singularities, residue theorem is not necessary; for instance,
the ML lemma handles this example.)

416
§16.3: Auxillary Inequalities/Results for Contour Integrals

Just a collection of useful results, estimations, inequalities, and reminders for contour integrals aside from
the aforementioned.

ˆ A Few Obvious Things:



◦ eiθ = 1 for any θ ∈ R
◦ More generally, |ez | = eRe(z)
◦ For any z on an arc of radius R, |z| = R
Z
◦ |dz| = length of γ
γ
1 1 1
◦ |z| − |w| ≤ |z + w| ≤ |z| + |w| =⇒ ≤ ≤

|z| + |w| |z ± w| |z| − |w|

eiz − e−iz
◦ sin(z) =
2i
eiz + e−iz
◦ cos(z) =
2
ˆ Numerical Integration: Using the formula
Z Z b
f (z) dz = f (γ(t))γ ′ (t) dt
γ a

(where γ is parameterized by {γ(t)}t∈[a,b] ), Wolfram Alpha can handle some contour integrals of simple
types. For instance, one may use
Z 2π
sin(eiθ ) · ieiθ
Z
sin(z)
4
dz = dz
|z|=1 (z − π/4) 0 (eiθ − π/4)4

to get an approximation. (Be sure to account properly for the differential.) The above example is here.
Z Z   
ˆ Estimation Lemma: For f continuous, f (z) dz ≤ |f (z)| |dz| ≤ sup |f (z)| length of γ


γ γ z on γ

Other names: “Triangle inequality for contour integrals,” “ML estimation lemma” (“M” for “max”,
“L” for “length”)
ˆ Jordan’s Lemma: Take the semicircular arc Cr := reiθ t∈[0,π] and f continuous satisfying f (z) = eiaz g(z)


there. (Here, r, a > 0.) Then Z


π
≤ · sup g reiθ


f (z) dz a
Cr θ∈[0,π]

417
§17: Items from Functional Analysis

§17.1: (Kreyszig, Ch. 1) Basics of Metric Spaces

A metric space (X, d) is a set X with a distance/metric function d : X 2 → R≥0 such that, ∀x, y, z ∈ X

ˆ d(x, y) = 0 ⇐⇒ x = y (positivity)

ˆ d(x, y) = d(y, x) (symmetry)


ˆ d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)

We say two metric spaces (X, d), (Y, ρ) are isometric (isomorphic) if there is a bijective distance-
preserving map T : X → Y : ρ(T x, T y) = d(x, y) Even without bijectivity, we would say T is an isometry.
We say two metric spaces are homeomorphic if there is a bijective continuous function T : X → Y with
continuous inverse. (Isometric spaces are homeomorphic.)
Some basic examples of metric spaces include:

ˆ Normed & Inner Product Vector Spaces: Any vector space V with inner product ⟨·, ·⟩ or ∥·∥
induces a metric by the rule(s)
p
d(x, y) := ∥x − y∥ = ⟨x − y, x − y⟩

For instance, Fn with any p-norm (p ≥ 1) and field F.

ˆ The sequence space ℓ∞ . ℓ∞ is defined by

ℓ∞ := {x := (ξi )∞
i=1 ∈ C

| |ξi | ≤ Mx for some Mx ∈ R}

Mx depends only on x; that is, the sequence is bounded. We may equip it with distance

d(x, y) := sup |ξi − ηi | x := (ξi )i∈N , y := (ηi )i∈N ∈ ℓ∞


j∈N

ˆ ℓp space: On the space of sequences in C, we define


!1/p
X p
∥x∥p := |xi |
i=1
n o
for p ∈ [1, ∞). Then ℓp := x := (ξi )∞
i=1 ∥x∥p < ∞ .

ˆ The function space C[a, b]. This is all f : [a, b] → R which are continuous, and equipped with
sup-norm:
d(x, y) := sup |x(t) − y(t)|
t∈[a,b]

(
1, x ̸= y
ˆ Trivial/Discrete Space: For any set X, let d(x, y) := 1 − δx,y =
0, x = y

Some results of note:

418
ˆ Holder’s Sum Inequality: Given p, q Holder conjugates (1/p + 1/q = 1), then

∥xy∥1 ≤ ∥x∥p ∥y∥q

when using a pointwise product. More explicitly:


!1/p !1/q
X X p
X q
|ξi ηi | ≤ |ξi | |ηi |
i i i

ˆ Cauchy-Schwarz: This is the case p = q = 2 above.

ˆ Minkowski’s Sum Inequality: This is the triangle inequality for ℓp :

∥x + y∥p ≤ ∥x∥p + ∥y∥p

or !1/p !1/p !1/p


X p
X p X p
|ξi + ηi | ≤ |ξi | + |ηi |
i i i

419
§17.2: (Kreyszig, Ch. 1) Brief Items From Topology

Unless stated otherwise, X = (X, d) is a metric space.

ˆ Balls/Spheres: Given x0 ∈ X a metric space and r ∈ R≥0 , we define open balls, closed balls, and
spheres as so:

B(x0 ; r) := {x ∈ X | d(x, x0 ) < r}


B(x0 ; r) := {x ∈ X | d(x, x0 ) ≤ r}
S(x0 ; r) := {x ∈ X | d(x, x0 ) = r}

Note that, unlike Rn , B(x; r) ̸= B(x; r) in general.


ˆ Open, Closed, Topology: M ⊆ X a metric space is open if there is a ball about each x ∈ M , with
the ball completed contained in M . M is closed if M c is open. M is clopen if both open and closed.
A topology on a set X is a set τ ⊆ P(X) such that ∅, X ∈ τ , it is closed under arbitrary union, and
closed under finite intersection. The members of τ are open sets as well.
All metric spaces are topological spaces.
ˆ Neighborhood; Interior: One often calls B(x; ε) an ε-neighborhood of x. If M is a neighborhood
of x, then x is an interior point: x ∈ int(M ). int(M ) is the largest open set in M .
ˆ Accumulation Point; Closure: x ∈ X is an accumulation point of M ⊆ X if any neighborhood
of x contains a y ∈ M (y ̸= x), i.e.

B(x; ε) ∩ (M − {x}) ̸= ∅, ∀ε > 0

The set of accumulation points is M ′ . The closure of M is M ∪ M ′ =: M , and it is the smallest closed
set containing M .
ˆ Continuity: We say a function f : X → Y of metric spaces (X, d), (Y, ρ) is continuous at x0 ∈ X if

(∀ε > 0)(∃δ > 0)(∀x such that d(x, x0 ) < δ)(ρ(f (x), f (x0 )) < ε)

One notes that f is continuous iff the preimage of an open set is open. (Kreyszig, Thm. 1.3-4)
ˆ Density; Separability: M ⊆ X is dense in X if M = X. X is separable if it has a countable dense
set.
Rn , Cn are separable; ℓ∞ is not. ℓp is separable for p ∈ [1, ∞).
ˆ Convergence: A sequence {xn }n∈N in X converges to x ∈ X iff
n→∞
lim d(xn , x) = 0 sometimes written as lim xn = x or xn −−−−→ x
n→∞ n→∞

Some results:
n→∞ n→∞ n→∞
◦ We note that limits are unique, and xn −−−−→ x, yn −−−−→ y =⇒ d(xn , yn ) −−−−→ d(x, y).
(Kreyszig, Thm. 1.4-2)
◦ On closures, x ∈ M iff it is some sequence’s limit, and M is closed iff each sequence converges in
M. (Kreyszig, Thm. 1.4-6)
n→∞ n→∞
◦ Continuous functions preserve convergence (xn −−−−→ x =⇒ f (xn ) −−−−→ f (x)). (Kreyszig,
Thm. 1.4-8)
◦ Subsequences of convergent sequences converge to the same limit. (Kreyszig, Prob. 1.4.1)

420
◦ If a Cauchy sequence has a subsequence with limit L, the original sequence has limit L. (Kreyszig,
Prob. 1.4.2)
◦ All Cauchy sequences are bounded. (Kreyszig, Prob. 1.4.4)
ˆ Cauchy Sequences: {xn }n∈N in X is said to be Cauchy iff

(∀ε > 0)(∃N ∈ N)(∀m, n > N )(d(xm , xn ) < ε)

A metric space is said to be complete if every such sequence converges to a limit in X.


If a sequence does converge, it is a Cauchy sequence. (Kreyszig, Thm. 1.4-5)
n n ∞ p
R , C , ℓ , ℓ , C[a, b] are complete metric spaces. Another is the space c of all convergent sequences
(with ℓ∞ metric). In the case of C[a, b], convergence is uniform.

ˆ Boundedness of Sets: We say M ⊆ X is bounded if it has finite diameter :

δ(M ) ≡ diam(M ) := sup d(x, y)


x,y∈M

A sequence is bounded if its members constitute a bounded subset. A bounded set satisfies M ⊆ B(x; r)
for r sufficiently large.

421
§17.3: (Kreyszig, Ch. 2) Normed & Banach Spaces

Basic Definitions:

ˆ Vector Space: A vector space over a scalar field F is a set V of vectors with operations
+ : V 2 → V, · : V × F → V such that

◦ (V, +) is an abelian group


◦ α(βx) = (αβ)x ∀α, β ∈ F, ∀x ∈ V
◦ 1x = x ∀x ∈ V
◦ α(x + y) = αx + αy ∀α ∈ F, ∀x, y ∈ V
◦ (α + β)x = αx + βx ∀α, β ∈ F, ∀x ∈ V

A (vector) subspace is a subset Y ⊆ X closed under linear combinations.

ˆ Normed Space: A vector space is a normed space when equipped with a vector norm on ∥·∥ on it.
A norm is a mapping ∥·∥ : V → R≥0 such that

◦ ∥x∥ = 0 ⇐⇒ x = 0 (positive-definite)
◦ ∥αx∥ = |α| · ∥x∥ (absolute homogenity)
◦ ∥x + y∥ ≤ ∥x∥ + ∥y∥ (triangle inequality)

A norm induces a metric, d(x, y) := ∥x − y∥.


ˆ Banach Space: A normed space is considered Banach when complete under the induced norm.

ˆ Operator: A function T : X → Y is called an operator .

ˆ Functional: If X is a normed vector space over F, a mapping f : X → F is a functional .

422
§18: Items from Analytic Number Theory

§18.1: Functions of Algebraic Importance

The Identity Function, N


N is the identity function, in the sense that N = id in more common vernacular, i.e.

N (n) = n

It has Dirichlet inverse N −1 = µN .

The All-Ones or Unit Function, u


u(n) = 1

The Identity Function, I


I is the identity w.r.t. Dirichlet multiplication, i.e. f ∗ I = I ∗ f = f . It is given by
  (
1 1, n = 1
I(n) = =
n 0, n > 1

Recall, for clarity, that X n X n


(f ∗ g)(n) = f (d)g = g(d)f
d d
d|n d|n

The Dirichlet Inverse

(This comes from Theorem 2.8.)


Suppose f is arithmetical and f (1) ̸= 0. Then it has a unique Dirichlet inverse f −1 , a function whereby

f ∗ f −1 = f −1 ∗ f = I

and it can be given by


1 −1 X  n  −1
f −1 (1) = f −1 (n) = f f (d)
f (1) f (1) d
d|n
d<n

423
§18.2: Important Functions

§18.2.1: Mobius’ µ Function

(Wikipedia article.)

Definition
For n > 1, write its prime decomposition n = pa1 1 · · ·par r . Then


 1, n = 1
µ(n) := (−1)r , ai = 1 ∀i

0, otherwise

Hence µ(n) = 0 iff n is squarefree. Equivalently,



+1, n is squarefree, with an even number of prime factors

µ(n) := δω(n),Ω(n) · λ(n) = −1, n is squarefree, with an odd number of prime factors

0, n is not squarefree

Some Identities

X X n
ˆ (Mobius Inversion Formula) f (n) = g(d) =⇒ g(n) = f (d)µ (Apostol, Thm. 2.9)
d
d|n d|n

ˆ (Multiplicative) µ(mn) = µ(m)µ(n) if m, n are coprime


ˆ f multiplicative is completely multiplicative iff f −1 = µf
X Y
ˆ f multiplicative =⇒ µ(d)f (d) = 1 − f (p) (Apostol, Thm. 2.18)
d|n p|n

  (
X 1 1, n=1
ˆ µ(d) = = (Apostol, Thm. 2.1)
n 0, n>1
d|n

◦ Equivalently, µ ∗ u = I and µ−1 = u.


X n
ˆ φ(n) = µ(d) (or rather, φ = µ ∗ N ) (Apostol, Thm. 2.3)
d
d|n
X
ˆ φ−1 = u ∗ µN =⇒ φ−1 (n) = dµ(d)
d|n

ˆ λ−1 = |µ|
X X
ˆ |µ(d)| = 1 = 2ω(n)
d|n d|n
d squarefree

424
X
ˆ µ(d) = µ2 (n) (Apostol, Prob. 2.6)
d2 |n
(
X 0, mk | n for some n > 1
ˆ µ(d) =
1, otherwise
dk |n

X   1, n = 1

ˆ For p prime, µ(d)µ gcd(p, d) = 2, n = pa for some a ∈ Z≥1 (Apostol, Prob. 2.7)

d|n 
0, otherwise
X
ˆ If n has more than m distinct prime factors, m ≥ 1, µ(d) logm (d) = 0
d|n

ˆ Let φ(x, n) be the number of integers k ∈ [0, x] coprime to n. Then


Xj x k X x n
φ(n, n) = φ(n) φ(x, n) = φ , = ⌊x⌋ (Apostol, Prob. 2.9)
d d d
d|n d|n

 
X k
ˆ µ(n) is the sum of the primitive nth roots of unity: µ(n) = exp 2πi
n
1≤k≤n
gcd(k,n)=1
X n
ˆ λ(n) = µ 2 (Apostol, Prob. 2.33)
2
d
d |n


X µ(n) 1
ˆ α
= for α ∈ R+
̸=1
n=1
n ζ(α)

X |µ(n)| ζ(α)
ˆ α
= (Source: Wikipedia)
n=1
n ζ(2α)

X µ(n)
ˆ =0 (Apostol, Thm. 4.16)
n=1
n
X jxk
ˆ µ(n) =1 (Apostol, Thm. 3.12)
n
n≤x

X |µ(d)| X n
ˆ φ1 (n) := n = µ(d)σ (Apostol, Prob. 3.11b)
d d2
d|n d2 |n


X µ(n) log(n)
ˆ = −1 (Source: Wikipedia)
n=1
n

X µ(n) log2 (n)
ˆ = −2γ (Source: Wikipedia)
n=1
n

ˆ (Schneider’s identities, per Wikipedia) For ϕ the golden ratio and ϕ := 1/ϕ its conjugate:
∞ ∞
X φ(k)  k
 X µ(k)  k

ϕ=− log 1 − ϕ ϕ=− log 1 − ϕ
k k
k=1 k=1

Consequently

X µ(k) − φ(k)  k

log 1 − ϕ = 1
k
k=1

425
The proof uses the formulas, for x ∈ (0, 1),
∞ ∞
X φ(k) x X µ(k)
− log(1 − xk ) = − log(1 − xk ) = x
k 1−x k
k=1 k=1

Asymptotics

1X
ˆ lim µ(n) = 0 (equivalent to the PNT)
x→∞ x
n≤x

X j x k2 1 2
ˆ µ(n) = x + O(x log x) (Apostol, Prob. 3.4a)
n ζ(2)
n≤x

X µ(n) j x k 1
ˆ = x + O(log x) (Apostol, Prob. 3.4b)
n n ζ(2)
n≤x

426
§18.2.2: Mobius Function of Order k, µk

Definition
Let k ∈ Z≥1 . If n > 1, give it the prime decomposition n = pa1 1 · · ·par r . We define


 1, n = 1
0, pk+1 | n for a prime p


µk (n) :=
(−1)r , n = pki1 · · ·pkir i pai i with ai < k (can pull out r kth prime powers)
Q



1, otherwise

That is: µk (n) = 0 if n is divisible by a (k + 1)th power, and is 1 unless you can factor out r-many kth prime
powers, and then it is (−1)r .
Note that µ1 = µ.

Some Identities

ˆ µk (nk ) = µ(n) (Apostol, Prob. 2.36)


ˆ µk is multiplicative (Apostol, Prob. 2.37)
X n n
ˆ ∀k ≥ 2, µk (n) = µk−1 k µk−1 (Apostol, Prob. 2.38)
k
d d
d |n
X
ˆ ∀k ≥ 1, |µk (n)| = µ(d) (Apostol, Prob. 2.39)
dk+1 |n

Asymptotics

ˆ (Nothing included at this time.)

427
§18.2.3: Merten’s M Function

(Wikipedia article.)

Definition
Defined by the partial sums of the Mobius function,
X
M (x) := µ(n)
n≤x

Some Identities
Z ∞
1 M (x)
ˆ Using a Mellin transform, =s dx on Re(s) > 1 (Source: Wikipedia)
ζ(s) 1 xs+1

X x
ˆ ψ(x) = M log(n) (Source: Wikipedia)
n=2
n

Asymptotics

ˆ (Merten’s Conjecture) The “best” big-O for M is not known. Numerical evidence suggests

|M (x)| < x on x > 1

i.e. M (x) = O( x). The best result known is given by the following: for some A > 0 constant, and
the function   −1/5 
δ(x) := exp − A log3/5 x log log x

then M (x) = O(xδ(x)).


M (x)
ˆ lim = 0 (equivalent to PNT), i.e. M (x) = o(x) (Apostol, Thm. 4.14)
x→∞ x
X x
ˆ M (x) log(x) + M Λ(n) = O(x) (Apostol, Prob. 4.23)
n
n≤x
 
X x
ˆ M (x) log(x) + M log(p) = O(x) (Apostol, Prob. 4.23)
p
p≤x
  
√ C log x
ˆ Riemann hypothesis ⇐⇒ M (x) = O exp for a C > 0 (Source: Wikipedia)
log log x

428
§18.2.4: Euler’s Totient Function φ

(Wikipedia article.)

Definition
φ(n) is the number of positive integers ≤ n which are coprime to it. That is,
X
φ(n) := 1
1≤k≤n
gcd(k,n)=1

Some Identities

ˆ (Euler’s Theorem) For a, n coprime, aφ(n) ≡ 1 (mod n) (Source: Wikipedia)


◦ n prime gives Fermat’s Little Theorem
ˆ n | φ(an − 1) (Source: Wikipedia)
ˆ φ(lcm(m, n)) · φ(gcd(m, n)) = φ(m)φ(n) (note: lcm(m, n) gcd(m, n) = mn) (Source: Wikipedia)
X
ˆ φ(d) = n (Apostol, Thm. 2.2)
d|n
X n
ˆ φ(n) = µ(d) (or rather, φ = µ ∗ N ) (Apostol, Thm. 2.3)
d
d|n

Y 1
ˆ φ(n) = n 1− (Apostol, Thm. 2.4)
p
p|n

ˆ φ(pa ) = pa − pa−1 for all p prime and all a ∈ Z≥1 (Apostol, Thm. 2.5a)
d
ˆ φ(mn) = φ(m)φ(n) where d := gcd(m, n) (Apostol, Thm. 2.5b)
φ(d)
ˆ a | b =⇒ φ(a) | φ(b) (Apostol, Thm. 2.5d)
X
ˆ (Dirichlet Inverse) φ−1 = u ∗ µN =⇒ φ−1 (n) = dµ(d)
d|n
Y
ˆ (Dirichlet Inverse) φ−1 (n) = (1 − p)
p|n

n X µ2 (d)
ˆ = (Apostol, Prob. 2.3)
φ(n) φ(d)
d|n

ˆ Let φ(x, n) be the number of integers k ∈ [0, x] coprime to n. Then


Xj x k X x n
φ(n, n) = φ(n) φ(x, n) = φ , = ⌊x⌋ (Exercise 2.9)
d d d
d|n d|n

429
Y Y d! µ(n/d)
ˆ k=n φ(n)
(Apostol, Prob. 2.20)
dd
1≤k≤n d|n
gcd(n,k)=1
X n
ˆ σ1 = φ ∗ σ0 =⇒ σ1 (n) = φ(d)σ0 (Apostol, Prob. 2.22)
d
d|n

X 1
ˆ k= nφ(n) (Source: Wikipedia)
2
1≤k≤n
gcd(n,k)=1
X
ˆ (Menon’s identity) gcd(k − 1, n) = φ(n)σ0 (n) (Source: Wikipedia)
1≤k≤n
gcd(k,n)=1

ˆ (Schneider’s identities, per Wikipedia) For ϕ the golden ratio and ϕ := 1/ϕ its conjugate:
∞ ∞
X φ(k)  k
 X µ(k)  k

ϕ=− log 1 − ϕ ϕ=− log 1 − ϕ
k k
k=1 k=1

Consequently

X µ(k) − φ(k)  k

log 1 − ϕ = 1
k
k=1

The proof uses the formulas, for x ∈ (0, 1),


∞ ∞
X φ(k) x k
X µ(k)
− log(1 − x ) = − log(1 − xk ) = x
k 1−x k
k=1 k=1


X φ(n) ζ(s − 1)
ˆ (Dirichlet Series) s
= on Re(s) > 2 (Source: Wikipedia)
n=1
n ζ(s)

Asymptotics

X 1 1 2
ˆ φ(n) = x + O(x log x) (Apostol, Thm. 3.7)
2 ζ(2)
n≤x
 4/3 
◦ Error improvable to O x log3/2 (x) + log log x (Source: Wikipedia)
X φ(n) 1
ˆ = x + O(log x) (Apostol, Prob. 3.5)
n ζ(2)
n≤x
 4/3 
◦ Error improvable to O log2/3 x log log x (Source: Wikipedia)

X φ(n) ∞  
1 γ X µ(n) log(n) log(x)
ˆ = log(x) + − +O (Apostol, Prob. 3.6)
n2 ζ(2) ζ(2) n=1 n2 x
n≤x

X φ(n) x2−α 1 ζ(α − 1)


ˆ + O x1−α log(x) when α ∈ R>1

α
= + ̸=2 (Apostol, Prob. 3.7)
n 2 − α ζ(2) ζ(α)
n≤x

430
X φ(n) x2−α 1
ˆ + O x1−α log(x) for α ∈ R≤1

α
= (Apostol, Prob. 3.8)
n 2 − α ζ(2)
n≤x
X n
ˆ = O(x) (Apostol, Prob. 3.9b)
φ(n)
n≤x

315 1 
2/3

◦ Improvable to: ζ(3)x − log x + O log (n) (Source: Wikipedia)
2π 4 2
X 1
ˆ = O(log x) (Apostol, Prob. 3.10)
φ(n)
n≤x
  !
315 X log(p) log2/3 n
◦ Improvable to ζ(3) log(x) + γ −
  +O (Source: Wikipedia)
2π 4 p2 − p + 1 n
p prime

X n  
ˆ Given m ∈ Z≥2 , 1= φ(n) + O 2ω(m) (Source: Wikipedia)
m
1≤k≤n
gcd(k,m)=1

Unsolved Problems

ˆ Lehmer’s Totient Problem (Wikipedia link): It is known φ(p) = p − 1 for p prime. Are there
composite n such that φ(n) | n − 1? If such n exists, it is odd, squarefree, ω(n) ≥ 14, and n > 1020 . If
3 | n, then n > 101937042 and ω(n) ≥ 298848.
ˆ Carmichael’s Totient Function Conjecture (Wikipedia link): Claims that there is no n such
that, ∀m ∈ N̸=n , we have φ(m) ̸= φ(n). That is, ∀n ∈ N, ∃m ∈ N̸=n such that φ(m) = φ(n). If there
is a counterexample, there are infinitely many, and the smallest n > 1010,000,000,000 . Per Pomerance, if
n is a counterexample, then for any p prime where p − 1 | φ(n), we have p2 | n

431
§18.2.5: Jordan’s Totient Functions Jk

(Wikipedia article.)

Definition
Y 1
Jk (n) := nk 1−
pk
p|n

Clearly, J1 ≡ φ.

Some Identities

ˆ Jk = µ ∗ N k and N k = Jk ∗ u (Apostol, Prob. 2.17a)



X Jk (n) ζ(s − k)
ˆ s
= (Source: Wikipedia)
n=1
n ζ(s)

Asymptotics

nk X Jk (n) nk
ˆ Jk has average order , i.e. = + (error). (Source: Wikipedia)
ζ(k + 1) n ζ(k + 1)
n≤x

432
§18.2.6: Liouville’s λ Function

(Wikipedia article.)

Definition
For n > 1, write a prime decomposition of n by n = pa1 1 · · ·par r . Then
(
Ω(n) (−1)a1 +...+ar , n > 1
λ(n) := (−1) =
1, n = 1

Some Identities

ˆ Suppose n = a2 b for b squarefree. Then λ(n) = µ(b). (Source: Wikipedia)


ˆ The divisor sum of λ is the square-indicator function
(
X √  √  1, n is a square
◦ That is, λ(d) = n − n−1 = (Apostol, Thm. 2.19)
d|n
0, otherwise

ˆ (Dirichlet Inverse) λ−1 = |µ| = µ2 = λµ


X n
ˆ λ(n) = µ 2 (Apostol, Prob. 2.33)
2
d
d |n


X λ(n) ζ(2s)
ˆ (Dirichlet Series) s
= (Source: Wikipedia)
n=1
n ζ(s)

X λ(n) log(n)
ˆ = −ζ(2) (Source: Wikipedia)
n=1
n

Asymptotics

ˆ (Nothing included at this time.)

433
§18.2.7: The Divisor-Sum Functions σα

(Wikipedia article.)

Definition
Let α ∈ C and n ∈ Z≥1 . Define the sum of the αth powers of n’s divisors by
X
σα (n) := dα =⇒ σα = u ∗ N α
d|n

In particular, we often let σ := σ1 (sum of divisors) and d := σ0 (number of divisors).

Some Identities

r
! r
Y Y
ˆ (Multiplicative) σα (mn) = σα (m)σα (n) for m, n coprime. In particular, σα pai i = σα (pai i )
i=1 i=1
 α(n+1)
p −1
, α ̸= 0
ˆ (Prime Powers) σα (pn ) = α
p −1
n + 1, α=0

X n
ˆ (Dirichlet Inverse) σα−1 (n) = dα µ(d)µ =⇒ σα−1 = µN α ∗ µ (Apostol, Thm. 2.20)
d
d|n
X
ˆ 2ω(d) = σ0 (n2 )
d|n
Y
ˆ d = nd(n)/2 (Apostol, Prob. 2.10)
d|n

 2
X X
ˆ d3 (r) =  d(r) (Apostol, Prob. 2.12)
r|n r|n

ˆ σ1 = φ ∗ σ0 (Apostol, Prob. 2.22)


X  mn 
ˆ σα (m)σα (n) = dα σα
d2
d|gcd(m,n)

X |µ(d)| X n
ˆ φ1 (n) := n = µ(d)σ (Apostol, Prob. 3.11b)
d d2
d|n d2 |n
X
ˆ (Menon’s identity) gcd(k − 1, n) = φ(n)σ0 (n) (Source: Wikipedia)
1≤k≤n
gcd(k,n)=1


X σα (n)
ˆ (Dirichlet Series) = ζ(s)ζ(s − α) for s > max{1, 1 + α} (Source: Wikipedia)
n=1
ns

434

X σα (n)σβ (n) ζ(s) · ζ(s − α) · ζ(s − β) · ζ(s − (α + β))
ˆ (Ramanujan) s
= (Source: Wikipedia)
n=1
n ζ(2s − (α + β))
∞  !
X 1 ℓ−1
ˆ σk (n) = ζ(k + 1)m k
1+2 cos πn (Source: Wikipedia)
ℓk+1 ℓ
ℓ=2

Asymptotics

ˆ ∀ε > 0, σ0 (n) = o(nε ) (Source: Wikipedia)


X √ 
ˆ d(n) = x log(x) + (2γ − 1)x + O x (Apostol, Thm. 3.3)
n≤x

◦ This error O(x1/2 ) may be improved. Kolesnik (1969) proved it can be O x(12/37)+ε ∀ε > 0.
◦ Hardy and Landau (1915) proved that O(xθ ) as the error satisfies inf θ ≥ 1/4. The exact infimum
is unknown.
◦ Huxley (2003) improved it to inf θ ≤ 131/416 ≈ 0.3149.
X ζ(2) 2
ˆ σ1 (n) = x + O(x log x) (Apostol, Thm. 3.4)
2
n≤x

X ζ(α + 1) α+1  
ˆ σα (n) = x + O xmax{1,α} for α ∈ R+
̸=1 and x ≥ 1 (Apostol, Thm. 3.5)
α+1
n≤x
( 
X ζ(α + 1) + O xmax{0,1−α} , α ̸= 1
ˆ σ−α (n) = wherein α > 0 (Apostol, Thm. 3.6)
n≤x
ζ(2)x + O(log x), α=1

X d(n) 1
ˆ = log2 (x) + 2γ log(x) + O(1) (Apostol, Prob. 3.2)
n 2
n≤x

X d(n) x1−α log(x)


ˆ α
= + ζ 2 (α) + O(x1−α ) for α ∈ R+
̸=1 (Apostol, Prob. 3.3)
n 1−α
n≤x

ˆ RH =⇒ σ1 (n) < eγ · n · log log n (Source: Wikipedia)


ˆ σ1 (n) < eγ · n · log log n + 0.6483·n
log log n for all n ≥ 3 (Source: Wikipedia)
 eHn
ˆ RH ⇐⇒ σ1 (n) < Hn + log(Hn ) for Hn the nth harmonic number (Source: Wikipedia)

435
§18.2.8: The Number of Prime Divisor Functions, ω, Ω, ν

(Wikipedia article.)

Definition
Let n > 1 have the prime factorization n = pa1 1 · · ·par r . We define:
X
ˆ ω(n) := 1 (the text also uses ν := ω)
p|n
X X
ˆ Ω(n) := 1= 1
pk |n k∈Z+
pk |n

The text lets ν(1) = 1 as well.


Essentially, ω counts the number of distinct prime divisors; Ω counts all prime-power divisors. (In the
prime decomposition given, ω(n) = r and Ω(n) = a1 + . . . + ar .)

Some Identities

ˆ Ω(n) ≥ ω(n); Ω(n) = ω(n) =⇒ n squarefree with µ(n) = (−1)ω(n) = (−1)Ω(n) (Source: Wikipedia)
ˆ Ω(n) = 1 =⇒ n prime (Source: Wikipedia)
X X
ˆ |µ(d)| = 1 = 2ω(n) (Source: Wikipedia)
d|n d|n
d squarefree
X
ˆ 2ω(d) = σ0 (n2 ) (Source: Wikipedia)
d|n

Asymptotics

X
ˆ ω(n) = x log log x + B1 x + o(x) for B1 ≈ 0.261 the Mertens constant (Source: Wikipedia)
n≤x

X X 1
ˆ Ω(n) = x log log x + B2 x + o(x) for B2 = B1 + ≈ 1.035 (Source: Wikipedia)
p
p(p − 1)
n≤x

X  2
ˆ ω 2 (n) = x log log x + O(x log log x) (Source: Wikipedia)
n≤x

X  k  k−1 
ˆ ω k (n) = x log log x + O (x log log x where k ∈ Z≥1 (Source: Wikipedia)
n≤x

436
X 
ˆ Ω(n) − ω(n) = O(x) (Source: Wikipedia)
n≤x
X n  
ˆ Given m ∈ Z≥2 , 1= φ(n) + O 2ω(m) (Source: Wikipedia)
m
1≤k≤n
gcd(k,m)=1

437
§18.2.9: Mangoldt’s Λ Function

(Wikipedia article.)

Definition
(
log(p), n = pm for a prime p and an m ∈ Z≥1
Λ(n) :=
0, otherwise

Some Identities
X
ˆ log(n) = Λ(d) (or log = Λ ∗ u) (Apostol, Thm. 2.10)
d|n
X n X
ˆ Λ(n) = µ(d) log =− µ(d) log(d) (or Λ = µ ∗ log) (Apostol, Thm. 2.11)
d
d|n d|n

ˆ (Selberg Identity) Λ · log +Λ ∗ Λ = log2 ∗µ,


X n X n
◦ Equivalently, Λ(n) log(n) + Λ(d)Λ = µ(d) log2 (Apostol, Thm. 2.27)
d d
d|n d|n
X jxk  
ˆ Λ(n) = log ⌊x⌋! (Apostol, Thm. 3.12)
n
n≤x


X Λ(n)
ˆ = log ζ(s), for Re(s) > 1 (Source: Wikipedia)
n=2
ns
log(n)

ζ ′ (s) X Λ(n)
◦ Consequently, =− (Source: Wikipedia)
ζ(s) n=1
ns
◦ More generally for f completely multiplicative, if we define

X f (n)
F (s) =
n=1
ns

convergent for Re(s) > α, then



F ′ (s) X f (n)Λ(n)
=−
F (s) n=1
ns

438
Asymptotics

X jxk
ˆ Λ(n) = x log x − x + O(log x) (Apostol, Thm. 3.15)
n
n≤x

1X
ˆ lim Λ(n) = 1 (equivalent to the PNT)
x→∞ x
n≤x

X Λ(n)
ˆ = log x + O(1) (Apostol, Thm. 4.9)
n
n≤x
X x
ˆ (Selberg’s Asymptotic Formula) ψ(x) log(x) + Λ(n)ψ = 2x log(x) + O(x) (Apostol, Thm.
n
n≤x
4.18)

◦ This is equivalent to the relations (by Exercise 4.22)


X x
ψ(x) log(x) + ψ log(p) = 2x log(x) + O(x)
p
p≤x
X x
ϑ(x) log(x) + ϑ log(p) = 2x log(x) + O(x)
p
p≤x

X x
ˆ M (x) log(x) + M Λ(n) = O(x) (Apostol, Prob. 4.23)
n
n≤x

439
§18.2.10: Chebyshev’s ψ Function / Second Function

(Wikipedia article.)

Definition
Defined as the partial sums of Mangoldt’s Λ:
X X X X  
ψ(x) := Λ(n) = log(p) = ϑ x1/m
n≤x m≤log2 (x) p≤x1/m m≤log2 (x)

Some Identities


ζ ′ (s)
Z
ψ(x)
ˆ Via Mellin transform, = −s dx for Re(s) > 1 (Source: Wikipedia)
ζ(s) 1 xs+1

Asymptotics

ˆ ψ(x) ∼
∞ x (equivalent to PNT) (Apostol, Thm. 4.4)
X x
ˆ ψ = x log x − x + O(log x) (Apostol, Thm. 4.11)
n
n≤x
X x
ˆ (Selberg’s Asymptotic Formula) ψ(x) log(x) + Λ(n)ψ = 2x log(x) + O(x) (Apostol, Thm.
n
n≤x
4.18)

◦ This is equivalent to the relations (by Exercise 4.22)


X x
ψ(x) log(x) + ψ log(p) = 2x log(x) + O(x)
p
p≤x
X x
ϑ(x) log(x) + ϑ log(p) = 2x log(x) + O(x)
p
p≤x

x
ˆ For x ≥ e22 , |ψ(x) − x| ≤ 0.006409 (Source: Wikipedia)
log x
√ √ √
ˆ For x ≥ 121, 0.9999 x < ψ(x) − ϑ(x) < 1.00007 x + 1.78 3 x (Source: Wikipedia)

ˆ Assuming RH, |ψ(x) − x| = Ox(1/2)+ε for all ε > 0 (Source: Wikipedia)


ˆ For x > 0, ψ(x) < 1.03883x (Source: Wikipedia)

440
§18.2.11: Chebyshev’s ϑ Function / First Function

(Wikipedia article.)

Definition
The definition for ϑ comes naturally from a rewriting of the definition of ψ;
X
ϑ(x) := log(p)
p≤x

Some Identities
Z x
π(t)
ˆ ϑ(x) = π(x) log(x) − dt (Apostol, Thm. 4.3)
2 t
Z x
ϑ(x) ϑ(t)
ˆ π(x) = + dt (Apostol, Thm. 4.3)
log(x) 2 t log2 (t)
ˆ Define Λ1 by (
log(n), n is prime
Λ1 (n) :=
0, otherwise
X
Then ϑ(x) = Λ1 (n)
n≤x


X x
ˆ ψ(x) = M log(n) (Source: Wikipedia)
n=2
n

Asymptotics

ˆ ϑ(x) ∼
∞ x (equivalent to PNT) (Apostol, Thm. 4.4)
X x
ˆ ϑ = x log x + O(x) (Apostol, Thm. 4.11)
n
n≤x
   
x x x
ˆ π(x) = +O ⇐⇒ ϑ(x) = x + O (Apostol, Prob. 4.18)
log x log2 x log x
X x
ˆ (Selberg’s Asymptotic Formula) ψ(x) log(x) + Λ(n)ψ = 2x log(x) + O(x) (Apostol, Thm.
n
n≤x
4.18)

441
◦ This is equivalent to the relations (by Exercise 4.22)
X x
ψ(x) log(x) + ψ log(p) = 2x log(x) + O(x)
p
p≤x
X x
ϑ(x) log(x) + ϑ log(p) = 2x log(x) + O(x)
p
p≤x

 
log log k − 2.050735
ˆ For k ≥ 10 , ϑ(pk ) ≥ k log k + log log k − 1 +
11
(Source: Wikipedia)
log k
 
log log k − 2
ˆ For k ≥ 198, ϑ(pk ) ≤ k log k + log log k − 1 + (Source: Wikipedia)
log k
x
ˆ For x ≥ 10, 544, 111, |ϑ(x) − x| ≤ 0.006788 (Source: Wikipedia)
log x
√ √ √
ˆ For x ≥ 121, 0.9999 x < ψ(x) − ϑ(x) < 1.00007 x + 1.78 3 x (Source: Wikipedia)

ˆ Assuming RH, |ϑ(x) − x| = Ox(1/2)+ε for all ε > 0 (Source: Wikipedia)


ˆ For x > 0, ϑ(x) < 1.000028x (Source: Wikipedia)

442
§18.2.12: The Prime-Counting Function π

(Wikipedia article.)

Definition
π(x) counts the number of primes ≤ x; formally,
X
π(x) := 1
p≤x

Some Identities
Z x
π(t)
ˆ ϑ(x) = π(x) log(x) − dt (Apostol, Thm. 4.3)
2 t
Z x
ϑ(x) ϑ(t)
ˆ π(x) = + dt (Apostol, Thm. 4.3)
log(x) 2 t log2 (t)
1 n n
ˆ < π(x) < 6 (Apostol, Thm. 4.6)
6 log(n) log(n)

Asymptotics

x
ˆ (Prime Number Theorem) π(x) ∼

log x
   
x x x
ˆ π(x) = +O ⇐⇒ ϑ(x) = x + O (Apostol, Prob. 4.18)
log x log2 x log x

ˆ RH =⇒ π(x) = li(x) + O( x log x). (Source: Wikipedia)

x log x
◦ Specifically, |π(x) − li(x)| < for x ≥ 2657

443
§18.2.13: The Riemann ζ Function

(Wikipedia article.)

Definition
For the purposes of this text, we limit s to the set (0, 1) ∪ (1, ∞) and define


 X 1

 , s>1
ns


 n=1 
ζ(s) :=

 X 1 x1−s 
 lim  − , s ∈ (0, 1)


x→∞
 n s 1−s
n≤x

Some Identities


X µ(n) 1
ˆ α
= for α ∈ R+
̸=1
n=1
n ζ(α)
Y 1
ˆ ζ(s) = 1−
ps
p prime


X φ(n) ζ(s − 1)
ˆ s
= on Re(s) > 2 (Source: Wikipedia)
n=1
n ζ(s)

X Jk (n) ζ(s − k)
ˆ s
= (Source: Wikipedia)
n=1
n ζ(s)

X λ(n) ζ(2s)
ˆ s
= (Source: Wikipedia)
n=1
n ζ(s)

X σα (n)
ˆ = ζ(s)ζ(s − α) for s > max{1, 1 + α} (Source: Wikipedia)
n=1


X σα (n)σβ (n) ζ(s) · ζ(s − α) · ζ(s − β) · ζ(s − (α + β))
ˆ (Ramanujan) s
= (Source: Wikipedia)
n=1
n ζ(2s − (α + β))

X Λ(n)
ˆ s log(n)
= log ζ(s), for Re(s) > 1 (Source: Wikipedia)
n=2
n

ζ ′ (s) X Λ(n)
◦ Consequently, =− (Source: Wikipedia)
ζ(s) n=1
ns

ζ ′ (s)
Z
ψ(x)
ˆ Via Mellin transform, = −s dx for Re(s) > 1 (Source: Wikipedia)
ζ(s) 1 xs+1

444
Asymptotics

X 1  
1
ˆ = log(x) + γ + O (Apostol, Thm. 3.2a)
n x
n≤x

x1−s
X 1  
1
ˆ = + ζ(s) + O for s > 0 and s ̸= 1 (Apostol, Thm. 3.2b)
ns 1−s xs
n≤x

X 1
ˆ = O x1−s for s > 1

s
(Apostol, Thm. 3.2c)
n>x
n

X xs+1
ˆ ns = + O(xs ) for s ≥ 0 (Apostol, Thm. 3.2d)
s+1
n≤x

φ(k) x1−s
 
X 1 X µ(d) 1
ˆ Given k ∈ Z≥1 , s
= + ζ(s) s
+ O (Apostol, Prob. 3.12)
n k 1−s d xs
1≤n≤x d|k
gcd(k,n)=1

445
§18.3: Assorted Other Useful Results

§18.3.1: Statements Equivalent to the Prime Number Theorem

The prime number theorem states that

x π(x)
π(x) ∼
∞ ; that is, lim =1
log(x) x→∞ x/ log x

The following statements are equivalent to this claim:

1X
ˆ lim µ(n) = 0 (can express with M )
x→∞ x
n≤x

1X
ˆ lim Λ(n) = 1 (can express with ψ)
x→∞ x
n≤x

ˆ ψ(x) ∼
∞ x (Apostol, Thm. 4.4)

ˆ ϑ(x) ∼
∞ x (Apostol, Thm. 4.4)
x
ˆ π(x) ∼
∞ (Apostol, Thm. 4.5)
log(π(x))
ˆ pn ∼
∞ n log(n) for pn the nth prime (Apostol, Thm. 4.5)
M (x)
ˆ lim = 0 (equivalent to PNT) (Apostol, Thm. 4.14)
x→∞ x
R x dt
ˆ π(x) ∼
∞ li(x) where li(x) :=
0 log(t)
(Source: Wikipedia)

§18.3.2: Euler’s Summation Formula

Theorem 3.1: Suppose 0 < y < x and f ∈ C 1 [y, x]. Then


X Z x Z x x


f (n) = f (t) dt + {t}f (t) dt − f (ξ){t}
y<n≤x y y y

Herein, {x} := x − ⌊x⌋ ∈ [0, 1) denotes the fractional part of x.

446
§18.3.3: Abel’s Identity

Theorem 4.2: For a arithmetical, define its partial sums


X
A(x) := a(n) A(x) = 0 for x < 1
n≤x

Let 0 < y < x and f ∈ C 1 [y, x]. Then


X x Z x
A(t)f ′ (t) dt

a(n)f (n) = A(ξ)f (ξ) −
y<n≤x y y

This can be seen as a generalization of Euler summation.

§18.3.4: Some Tauberian Theorems

Tauberian theorems are about weighted averages of functions.

Theorem 4.8 (Shapiro): Let a be an arithmetical function, satisfying


X jxk
a(n) = x log x + O(x)
n
n≤x

Then the following are true:


X a(n)
(a) = log x + O(1) (can divide out the x and drop braces)
n
n≤x
X
(b) ∃B > 0 constant such that a(n) ≤ Bx for all x ≥ 1
n≤x
X
(c) ∃A, x0 > 0 constant such that a(n) ≥ Ax for all x ≥ x0
n≤x

Hardy-Littlewood Tauberian Theorem: Suppose an ≥ 0 for all n ∈ Z≥0 , and that


P∞ ∞
an xn X
∼ 1
lim n=0
= 1; that is an xn x↗1
x↗1 1/(1 − x) n=0
1 − x

Then
n
X
∼ n
ak n→∞
k=0

447
Equivalently, by taking x := 1/ey , if

∼ 1
X
an e−ny y↘0
n=0
y
then
n
X
∼ n
ak n→∞
k=0

448
§18.4: Congruences & Modular Arithmetic

§18.4.1: Basic Definitions

ˆ Modular Congruence: Let a, b, m ∈ Z and m > 0. We write

a ≡ b (mod m) or a ≡m b

if and only if m | a − b.

ˆ Residue Classes: ≡m is an equivalence relation, with classes b 2, · · ·, m


1, b b partitioning Z. Take ak ∈ b
k
m
for each class. Then {ak }k=1 is a complete residue system modulo m. We say a collection of φ(m)-
φ(m)
many integers {ak }k=1 is a reduced residue system if they are each incongruent mod-m and coprime
with m.
 
◦ For my notekeeping, the collection CRS(m) ⊆ P {1, 2, · · ·, m} denotes the collection of complete
residue systems modulo m.
◦ Similarly, RRS(m) denotes the mod-m reduced residue systems.
ˆ Solving Congruences: We often concerned with solutions x to f (x) ≡m 0. In such cases, note that
x ≡m y =⇒ f (x) ≡m f (y), so x being a solution would mean y is too, and so infinitely many solutions
exist if one does. As a result, we count only solutions which are incongruent (come from distinct residue
classes), e.g. only count from the set {1, 2, · · ·, m}.
ˆ Reciprocals mod m: Suppose a, m are coprime and ax ≡m 1. The solution x is called the reciprocal
of a mod m, denoted a−1 or 1/a sometimes.

449
§18.4.2: Basic Results

ˆ ≡m is an equivalence relation. (Apostol, Thm. 5.1)


◦ Hence a ≡m a, and a ≡m b ⇐⇒ b ≡m a, and a ≡m b, b ≡m c =⇒ a ≡m c
◦ We write b
a for all x ∈ Z where x ≡m a (or a + mq for q ∈ Z). The set of these is Z/mZ.
ˆ Suppose a ≡m b, c ≡m d. Then (Apostol, Thm. 5.2)
◦ ∀x, y ∈ Z we have ax + cy ≡m bx + dy
◦ ac ≡m bd
◦ ∀n ∈ Z≥1 , we have an ≡m bn
◦ ∀f ∈ Z[x], we have f (a) ≡m f (b)
◦ That is, modular congruence is preserved by addition, multiplication, exponentiation, and poly-
nomials
ˆ (Cancellation Results)
◦ For c ∈ Z≥1 , we have a ≡m b ⇐⇒ ac ≡mc bc. (Can cancel from all 3 numbers.) (Apostol, Thm.
5.3)
 
m
◦ ac ≡ bc (mod m) =⇒ a ≡ b mod (Apostol, Thm. 5.4)
gcd(m, c)
Can cancel common factor from LHS & RHS if you divide the modulus by its gcd with the
common factor
ˆ (Assorted Basic Results) Suppose a ≡m b.
◦ d | a and d | m imply d | b (Apostol, Thm. 5.5)
◦ gcd(a, m) = gcd(b, m) (equivalent mod m means having the same gcd) (Apostol, Thm. 5.6)
◦ 0 ≤ |b − a| < m =⇒ a = b (Apostol, Thm. 5.7)
◦ a ≡m b ⇐⇒ a, b have same remainder on dividing by m (Apostol, Thm. 5.8)
◦ (Bezout) If gcd(a, b) = d then ∃x, y ∈ Z s.t. ax + by = d. (Apostol, Thm. 5.15)
ˆ a ≡n b and a ≡m b for m, n coprime =⇒ a ≡mn b (Apostol, Thm. 5.9)
ˆ (Equivalence Class Properties)

◦ b
a = bb ⇐⇒ a ≡m b (Apostol, Thm. 5.10)
◦ x, y ∈ b
a ⇐⇒ x, y ≡m a (Apostol, Thm. 5.10)
n om
◦ bk (in mod m) partition Z. (Hence are pairwise disjoint and union to Z.) (Apostol, Thm.
k=1
5.10)
m m
◦ For (k, m) coprime, if {ai }i=1 ∈ CRS(m) =⇒ {kai }i=1 ∈ CRS(m). (Apostol, Thm. 5.11)
φ(m) φ(m)
◦ For (k, m) coprime, {ai }i=1 ∈ RRS(m) =⇒ {kai }i=1 ∈ RRS(m) (Apostol, Thm. 5.16)
ˆ (On Linear Congruences)
◦ Let a, m be coprime. Then ax ≡m b has a single unqiue solution. (Apostol, Thm. 5.12)
◦ If gcd(a, m) = d, then ax ≡m b has solutions iff d | b. (Apostol, Thm. 5.13)
d−1 a b m
◦ If so, then there are exactly d solutions, given by {t + km/d}k=0 , where t solves x ≡ mod
d d d
(Apostol, Thm. 5.14)

450
◦ For a, m coprime, the solution to ax ≡m b satisfies x ≡m baφ(m)−1 (Apostol, Thm. 5.20)
ˆ (Euler & Fermat Style Results)

◦ (Euler-Fermat Theorem) For a, m coprime, aφ(m) ≡m 1 (Apostol, Thm. 5.17)


p−1
◦ If p̸ | a, then a ≡p 1 (previous with m prime) (Apostol, Thm. 5.18)
p
◦ (Fermat’s Little Theorem) For all primes p, a ≡p a (Apostol, Thm. 5.19)
◦ (Wilson’s Theorem) For primes p, (p − 1)! ≡p −1 (Apostol, Thm. 5.24)
Conversely, p is prime if and only if that relation is satisfied (Apostol, Prob. 5.7)
p−1
X (p − 1)!
◦ (Wolstenholme’s Theorem) For primes p ≥ 5, then ≡ 0 (mod p2 ) (Apostol,
k
k=1
Thm. 5.25)
For Hn the nth harmonic number, then, (p − 1)! · Hp−1 ≡p2 0
ˆ (On Polynomial Congruences)

◦ (Lagrange) Take p prime and f ∈ Z[x] with coefficients cn ̸≡p 0. Then f (x) ≡p 0 has at most
deg(f )-many solutions. (Apostol, Thm. 5.21)
◦ Corollary: If the equation has > deg(f ) solutions, then p | cn for each n. (Apostol, Thm. 5.22)
ˆ (On Systems of Congruences)
r
◦ (Chinese Remainder Theorem/CRT) Let {mi }i=1 be pairwise coprime, M their product,
r
and {bi }i=1 ⊆ Z. Then the system of congruences x ≡ bi (mod mi ) has a unique solution modulo
M. (Apostol, Thm. 5.26)
Let M = m1 · · ·mr and Mk = M/mk . Let Mk′ be the inverse of Mk modulo mk . (Hence,
Mk Mk′ ≡ 1 (mod mk ).) Then the solution is given by
r
X
x= bi Mi Mi′
i=1

pre-reduction modulo M .
−1
Opting for my own notation, where (n)[m] is the multiplicative inverse of n modulo m, so
−1
that n(n)[m] ≡m 1, we have
r
−1
X
x= bi Mi (Mi )[mi ]
i=1

in modulo M .
r
◦ Corollary: Let also {ai }i=1 be such that ai , mi is a coprime pair for each i. Then the system
ai x ≡ bi (mod mi ) has a unique solution modulo M . (Apostol, Thm. 5.27)
◦ Corollary: For f ∈ Z[x], f (x) ≡M 0 has solutions ⇐⇒ f (x) ≡mi 0 for all i. Moreover, if ν(n) is
the number of solutions mod n, then ν(M ) = ν(m1 ) · · ·ν(mr ). (Apostol, Thm. 5.28)
Hence the problem of f (x) ≡M 0 for M = pa1 1 · · ·par r can be reduced to looking at the
equations f (x) ≡ 0 (mod pai i ).

451
§18.5: Dirichlet Characters & Finite Abelian Groups

§18.5.1: Basic Definitions

We assume basic results and definitions tied to groups unless necessary.

ˆ Personal Notations:

◦ Char(G) is the collection (group) of characters of G


◦ DChar(m) is the collection (group) of Dirichlet characters mod m (the characters of G ∈ RRS(m))
φ(m)
◦ S ∈ RRS(m) means S is a reduced residue system mod m. Hence S := {xi }i=1 has φ(m)
elements, pairwise incongruent mod m, and each coprime to m.
Such an S is a finite abelian group, order φ(m), under the usual multiplication.
ˆ Characters: For G a group, f : G → C is a character of G if

◦ f (ab) = f (a)f (b) for all a, b ∈ G


◦ f (c) ̸= 0 for some c ∈ G
ˆ Principal Character: The f ∈ Char(G) such that f (x) = 1 for all x ∈ G. Often labeled f1 .
n n
ˆ Character Matrix: Let G := {ai }i=1 be finite abelian of cardinality n, its character group Char(G) := {fi }i=1 .
n
Define A := A(G) := (ai,j )i,j=1 by ai,j = fi (aj ).

ˆ Dirichlet Characters: Take G ∈ RRS(k), equivalence classes denoted n


b. To f ∈ Char(G), define
χ := χf , a function χ : N → C, by
(
f (b
n), n, k are coprime
χ(n) :=
0, otherwise

χ is a Dirichlet character (mod k). The principal character χ1 := χf1 satisfies


(
1, n, k are coprime
χ1 (n) :=
0, otherwise

452
§18.5.2: Basic Results

ˆ Some basics on groups:


◦ Any subground of a cyclic group is cyclic (Apostol, Prob. 6.4)
◦ (Lagrange) For G a finite group, H ≤ G, then card(H) | card(G). (Apostol, Prob. 6.5)
For any g ∈ G, ord(g) | card(G)
◦ (Cauchy) Let p be prime, p | card(G), and f (p) be the number of solutions to xp ≡p e. Then
p | f (p). (Apostol, Prob. 6.5)
This generalizes Lagrange’s theorem.
◦ A finite group has odd cardinality iff each element is a square. (Apostol, Prob. 6.9)
Generalization: card(G) is coprime to k iff each element is a kth power (Apostol, Prob. 6.10)
ˆ On the character group Char(G):
◦ For G finite abelian, f ∈ Char(G), then f (e) = 1 and f (a) is a root of unity (an = e =⇒ f n (a) = 1).
(Apostol, Thm. 6.7)
◦ For G finite abelian, |G| = n, then | Char(G)| = n. (Apostol, Thm. 6.8)
b
X φ(k)
◦ ∀a, b ∈ Z where a < b, χ(n) ≤ (Apostol, Prob. 6.15)


n=a
2
n
ˆ On orthogonality and the character matrix A := (fi (aj ))i,j=1 :
n
(
X n, i = 1
◦ The row sum o A is n for f1 and 0 otherwise: fi (ar ) = (Apostol, Thm. 6.10)
r=1
0, i ̸= 1
◦ A is invertible. Moreover, AA∗ = nI (for I identity and A∗ complex conjugate). (Apostol, Thm.
6.11)
1
Hence A−1 = A∗
n
n n
(
X X
−1 n, ai = aj
◦ fr (ai )fr (aj ) = fr (ai aj ) = (Apostol, Thm. 6.12)
r=1 r=1
0, ai ̸= aj
Take the element-wise product of two (not necessarily distinct) columns of A, complex-
conjugating one. Then the sum is 0 unless the columns are the same.
n
(
X n, aj = e
Corollary (ai = e / single-column sum): fr (aj ) =
r=1
0, otherwise

ˆ On Dirichlet Characters:
◦ |DChar(k)| = φ(k). Each χ ∈ DChar(k) is completely multiplicative and k-periodic. (Apostol,
Thm. 6.15)
φ(k)
◦ Take DChar(k) := {χi }i=1 and m, n ∈ Z with n, k coprime. Then: (Apostol, Thm. 6.16)
φ(k)
(
X φ(k), m ≡k n
χr (m)χr (n) =
r=1
0, otherwise

ˆ Sums of Dirichlet Characters:


◦ Take χ ∈ DChar(k), χ ̸= χ1 . Let f ∈ C 1 , f ≥ 0, with f ′ (x) < 0 for x ≥ x0 . Then: (Apostol,
Thm. 6.17)

453
X
χ(n)f (n) = O(f (x))
x<n≤y

x→∞
X
Suppose f (x) −−−−→ 0; then χ(n)f (n) converges.
n=1
X ∞
X
Moreover, for x ≥ x0 , χ(n)f (n) = χ(n)f (n) + O(f (x))
n≤x n=1

◦ Corollary: for f (x) = 1/x, f (x) = log(x)/x, and f (x) = x−1/2 respectively: (Apostol, Thm.
6.18)

X χ(n) X ∞  
χ(n) 1
= +O
n n=1
n x
n≤x
X χ(n) log(n) X ∞  
χ(n) log(n) log(x)
= +O
n n=1
n x
n≤x
X χ(n) X ∞  
χ(n) 1
√ = √ +O √
n n=1
n x
n≤x

X
◦ Let χ ∈ DChar(k) be real-valued, and A(n) = χ(d) (Apostol, Thm. 6.19)
d|n

A(n) ≥ 0 for all n


A(n) ≥ 1 for n square
X A(n) ∞
X χ(n)
◦ Following the previous, let B(x) := √ , L(1, χ) := (Apostol, Thm. 6.20)
n n=1
n
n≤x
x→∞
Then B(x) −−−−→ ∞

B(x) = 2 x · L(1, χ) + O(1) for x ≥ 1
Hence L(1, χ) ̸= 0

454
§18.6: On Arithmetical Progressions & Primes

ˆ There are infinitely many primes p of the forms below:

◦ p = 4n − 1 (Apostol, Thm. 7.1)


◦ p = 4n + 1 (Apostol, Thm. 7.2)
◦ p = 5n − 1
◦ p = 8n − 1
◦ p = 8n − 3
◦ p = 8n + 3
ˆ Dirichlet’s theorem:
X log p log x
◦ For k > 0; h, k coprime; x > 1; we have = + O(1) (Apostol, Thm. 7.3)
p φ(k)
p≤x
p≡k h

◦ As log(x) → ∞, then infinitely many primes lie in {nk + h}n=0 for h, k coprime
ˆ Useful results for Dirichlet’s theorem:

◦ Here, k ∈ Z≥1 is a fixed modulus; h ∈ Z is coprime to k; χi ∈ DChar(k); χ ∈ DChar(k) but


χ ̸= χ1 ; p prime; x > 1
◦ We define two series by, for χ ̸= χ1 ,
∞ ∞
X χ(n) X χ(n) log(n)
L(1, χ) = L′ (1, χ) = −
n=1
n n=1
n

◦ Let N (k) be the number of χ ∈ DChar(k), χ ̸= χ1 , such that L(1, χ) = 0.


X log p log x
◦ = + O(1) (Apostol, Thm. 7.3)
p φ(k)
p≤x
p≡k h
φ(k)
X log p log x 1 X X χr (p) log(p)
◦ = + χr (h) + O(1) (Apostol, Lem. 7.4)
p φ(k) φ(k) r=2 p
p≤x p≤x
p≡k h
X χ(p) log(p) X µ(n)χ(n)
◦ = −L′ (1, χ) + O(1) (Apostol, Lem. 7.5)
p n
p≤x n≤x
X µ(n)χ(n)
◦ L(1, χ) = O(1) (Apostol, Lem. 7.6)
n
n≤x
X log p 1 − N (k)
◦ = log(x) + O(1) (Apostol, Lem. 7.7)
p φ(k)
p≤x
p≡k 1

◦ It is easy to see that N (k) is even, and since the LHS → ∞, the RHS must too, and hence
N (k) = 0.
X µ(n)χ(n)
◦ If L(1, χ) = 0 for χ ̸= χ1 , then L′ (1, χ) = log(x) + O(1) (Apostol, Lem. 7.8)
n
n≤x

ˆ Distribution of primes in arithmetic progressions

455
X
◦ For k > 0 and a coprime to it, define πa (x) := 1
p≤x
p≡k a

◦ πa (x) counts the number of primes ≤ x, in the sequence {nk + a}n=0 .
π(x) ∼ 1 x
◦ Its version of the PNT is πa (x) ∼
∞ ∞
φ(k) φ(k) log(x)
◦ If the above holds, then πa (x) ∼
∞ πb (x) whenever a, b are coprime to k. The converse is true.

(Apostol, Thm. 7.10)

456
§18.7: More on Dirichlet Characters & Gauss Sums

§18.7.1: Basic Definitions

ˆ My Notations:

◦ ζk is the first, nonreal kth root of unity CCW from 1: ζk := e2πi/k


(f,g)
◦ We often reference sk , but as it depends on f, g, I’ll let sk ≡ sk
◦ PChar(k) is the set of primitive characters mod k (subset of DChar(k))
ˆ Some Definitions:

◦ Period: f arithmetic is k-periodic or periodic mod k if f (n + k) = f (n) ∀k ∈ Z. The smallest


period is the fundamental period.
k−1
X
◦ Finite Fourier Series: For some coefficients cm , f (n) = cm ζkmn . We often rewrite this as
X m=0
.
m mod k
X
◦ Ramanujan’s Sum: Define ck (n) := ζkmn
m mod k
gcd(m,k)=1
 
(f,g)
X k
◦ Generalization: Given f, g arithmetic, define sk (n) := sk (n) := f (d)g
d
d|gcd(n,k)
Pk
◦ Gauss Sum: Given χ ∈ DChar(k), define G(n, χ) := m=1 χ(m)ζkmn
◦ We say one is separable if G(n, χ) = χ(n) · G(1, χ).
◦ Induced Modulus: For χ ∈ DChar(k) and d | k and d > 0, d is an induced modulus of χ if

χ(a) = 1 for all a such that a, k are coprime and a ≡d 1

Hence χ is a like a character mod d on the elements in b


1 (mod d) coprime to k.
◦ Primitive Characters: If χ ∈ DChar(k) has no induced modulus except k itself, then we say χ
is primitive mod k. (Hence, ∀d proper divisors of k, ∃a ≡d 1 coprime to k, such that χ(a) ̸= 1.)
◦ Conductors: The smallest induced modulus of χ is the conductor of χ.
◦ If χ(n) = ψ(n) for χ ∈ DChar(k), ψ ∈ DChar(d), d | k, we say χ extends ψ. (If χ extends ψ, then
d is an induced modulus of χ.)

457
§18.7.2: Some Results

ˆ Early results:
k−1
(
X 0, k̸ | n
◦ ζkmn = (Apostol, Thm. 8.1)
m=0
k, k | n
k−1
◦ Lagrange interpolation (Th. 8.2): given {zi , wi }i=0 ⊆ C, with zi distinct, ∃! polynomial P
where deg(P ) ≤ k − 1 and P (zm ) = wm . It is given by defining
k−1 k−1
Y A(z) X Am (z)
A(z) := (z − zi ) Am (z) := P (z) = wm ·
i=0
z − zm m=0
Am (zm )

◦ Fourier Existence (Th. 8.4): For f arithmetical and k-periodic, ∃! g arithmetical and k-
periodic where
k−1 k−1
X 1 X
f (m) = g(n)ζkmn for g(n) = f (m)ζk−mn
n=0
k m=0

ˆ Ramanujan’s Sum Results:


◦ µ(k) = ck (1)
 
X k
◦ ck (n) = dµ
d
d|gcd(n,k)
(f,g)
◦ sk is k periodic (per where n pops up)
 
(f,g)
X X k d
◦ sk (n) = ak (m)ζkmn for ak (m) = g(d)f (Apostol, Thm. 8.5)
d k
m mod k d|gcd(m,k)
(id,µ)
◦ Taking f ≡ id, g ≡ µ, ck ≡ sk , i.e. sk ≡ ck . (Apostol, Thm. 8.6)
◦ Take f, g multiplicative and (a, k), (b, m) coprime pairs. Then: (Apostol, Thm. 8.7)
(f,g) (f,g) (f,g)
smk (ab) = sm (a) · sk (b)
(f,g) (f,g)
sm (ab) = sm (a) (taking k = 1)
(f,g) (f,g)
smk (a) = sm (a) · g(k) (taking b = 1)
◦ Let f be completely multiplicative, g = µh, h multiplicative, h(p) ̸= f (p) ̸= 0 for p prime. Then
 
(f,g) (f ∗ g)(k) k
sk (n) =  g
k gcd(n, k)
(f ∗ g)
gcd(n, k)
Taking F := f ∗ g and N := k/ gcd(n, k), this can be simplified as (Apostol, Thm. 8.8)

(f,g) F (k)g(N )
sk (n) =
F (N )
φ(k)µ(N )
Hence, ck (n) =
φ(N )
n
X X n
◦ ck (m) = dM for M Merten’s function (Apostol, Prob. 8.3a)
d
k=1 d|m
d
X 1 m X
◦ M (m) = m µ ck (d) (Apostol, Prob. 8.3b)
d d
d|m k=1

458
n  j k
X X k n
◦ ck (m) = dµ (Apostol, Prob. 8.3c)
m=1
d d
d|k

ˆ Gauss Sums: (χ ∈ DChar(k) unless stated otherwise)

◦ G(n, χ1 ) = ck (n)
◦ If n, k are coprime, then G(n, χ) = χ(n) · G(1, χ) (Apostol, Thm. 8.9)
◦ G(n, χ) is separable ∀n iff G(n, χ) whenever n, k are not coprime (Apostol, Thm. 8.10)
2
◦ If G(n, χ) is separable ∀n, then |G(1, χ)| = k (Apostol, Thm. 8.11)
◦ Let n be not coprime to k, G(n, χ) ̸= 0. Then ∃d | k, d < k, where (Apostol, Thm. 8.12)

χ(a) = 1 for all a coprime to k where a ≡d 1

ˆ Induced Moduli: (χ ∈ DChar(k) unless stated otherwise)

◦ k is always an induced modulus


◦ 1 is an induced modulus iff χ ≡ χ1 (Apostol, Thm. 8.13)
◦ For χ ̸= χ1 and k prime, χ ∈ PChar(k) (nonprincipal characters mod a prime are primitive)
(Apostol, Thm. 8.14)
◦ Restatements of some Gauss sum theorems: if χ ∈ PChar(k), (Apostol, Thm. 8.15)
G(n, χ) = 0 for all n not coprime to k
G(n, χ) is separable ∀n (the converse is true, cf. Th. 8.19)
2
|G(1, χ)| = k
◦ Let d | k and d > 0. d is an induced modulus if χ iff (Apostol, Thm. 8.16)

χ(a) = χ(b)

for all a, b coprime to k where a ≡d b.


◦ Let d | k and d > 0. These are equivalent: (Apostol, Thm. 8.17)
(a) d is an induced modulus of χ
(b) ∃ψ ∈ DChar(d) such that χ = ψ · χ1 (χ1 ∈ DChar(k) principal)
◦ Any χ can be written as χ = ψχ1 for χ1 ∈ DChar(k) principal and ψ ∈ PChar(c), where c is the
conductor of ψ. (Apostol, Thm. 8.18)
◦ For χ ∈ DChar(m) with induced moduli k1 , k2 , then gcd(k1 , k2 ) is an induced moduli too. (Apos-
tol, Prob. 8.6)
◦ The conductor of χ divides all induced moduli of χ. (Apostol, Prob. 8.7)
n
◦ Let {ki }i=1
be pairwise coprime, k := k1 k2 · · ·kn . Then χ ∈ DChar(k) has a unique factorization
χ = χ1 χ2 · · ·χn for χi ∈ DChar(ki ). (Apostol, Prob. 8.9)
For such a factorization and f (χ) its conductor, f (χ) = f (χ1 ) · · ·f (χn ). (Apostol, Prob.
8.10)
Moreover χ ∈ PChar(k) ⇐⇒ χi ∈ PChar(ki ) for all i. (Apostol, Prob. 8.12)
M
X χ(m) 2 √
◦ For χ ∈ PChar(k) and N < M , < k log(k) (Apostol, Prob. 8.13)

m N +1
m=N +1

ˆ On primitive characters:

◦ χ ∈ PChar(k) iff G(n, χ) is separable ∀n (Apostol, Thm. 8.19)

459
◦ For χ ∈ PChar(k), it has Fourier expansion (Apostol, Thm. 8.20)
k
τk (χ) X
χ(m) = √ χ(n)ζk−mn
k n=1

where
k
G(1, χ) 1 X
τk (χ) = √ = √ χ(m)ζkm
k k m=1
Note that |τk (χ)| = 1.
◦ There is no real χ ∈ PChar(2m) for m odd. (Apostol, Prob. 8.5)

X
ˆ Recall that ∀χ ∈ DChar(k), we have

χ(m) ≤ φ(k)
m≤x


X
ˆ Polya’s Inequality: For χ ∈ PChar(k), ∀x ≥ 1,

χ(m) < k log(k) (Apostol, Thm. 8.21)
m≤x

√ 2√
X

◦ Improvable to
χ(n) < k + k log k (Apostol, Prob. 8.14)
n≤x π
X √ 
◦ For χ nonprimitive mod k, χ(m) = O k log(k) (Apostol, Thm. 13.15)
m≤x

460
§18.8: Quadratic Residues & Quadratic Reciprocity

§18.8.1: Basic Definitions

ˆ Personal Notations:

◦ I write QR(p) to denote the collection of quadratic residues in Z/pZ.


◦ The Legendre, Jacobi, and Kronecker (from Wikipedia) symbols are all defined similarly, and
notated identically. For notation’s sake, I denote them, respectively,
     
n n n
p L p J p K

using the subscripts as an indicator.


◦ The quadratic character χ ∈ DChar(p) is defined by χ(r) := (r | p)L . I denote this one with χL .
ˆ Definitions:

◦ Residues: We say n is a quadratic residue mod p if ∃x ∈ Z (or Z/pZ) such that x2 ≡p n. (That
is, n has a square root of sorts, namely x.)
The text opts to say that n is a mod p quadratic residue by nRp.
A nonresidue has no such x; the text denotes it nRp.
◦ Indicator Symbols: (Sometimes (n | p) is used instead.)

 
n +1, n ∈ QR(p)

Legendre Symbol: := −1, n ̸∈ QR(p) (“is n a residue mod p?”)
p L 
0, p | n

r  a
n Y n i n
Jacobi Symbol: If P = pa1 1 · · ·par r , = and =1
P J i=1
pi L 1 J
Kronecker Symbol: We let the following hold to extend the Legendre symbol:

n  0, n even

:= +1, n ≡8 1, 7
2 L 
−1, n ≡8 3, 5

  (
n −1, n < 0
:=
−1 L +1, n ≥ 0
(
n 1, n = ±1
:=
0 L 0, otherwise
r
Y
Then we use the Jacobi symbol definition. Let P = u pai i as a prime decomposition, with
i=1
u = ±1. Then
n n r  ai
Y n
:= ·
P K u L
i=1
pi L
 
n
◦ Quadratic Character: The χ ∈ DChar(p) defined by χ(n) :=
p L

461
§18.8.2: Main Results

ˆ Preliminaries:
◦ For p an odd prime, any RRS(p) set has (p − 1)/2 quadratic residues, and (p − 1)/2 nonresidues.
The residues belong to the class in which these lie: (Apostol, Thm. 9.1)
 2
p−1
12 22 32 ···
2

(p−1)/2
◦ Gauss’ Lemma: Let p̸ | n. Take {kn mod p}k=1 . Let m be the number of these > p/2. Then
(Apostol, Thm. 9.6)  
n
= (−1)m
p L
(p−1)/2

 p2 − 1 X tn 
(n − 1) + , n even


8 p


t=1
◦ An Improvement: In this scenario, m ≡2 (p−1)/2
(Apostol,

 X tn 

 , n odd
p

t=1
Thm. 9.7)
ˆ Algebraic Properties of Legendre Symbols: (Throughout, p is an odd prime.)
 2 (
x 1, p̸ | x
◦ = (Source: Wikipedia)
p L 0, p | x
   
m n
◦ m ≡p n =⇒ = (p periodic on top)
p L p L
    
mn m n
◦ = (completely multiplicative on top) (Apostol, Thm. 9.3)
p L p L p L
    (
−1 p−1 (p−1)/2 +1, p ≡4 1
◦ = = (−1) = (Apostol, Thm. 9.4)
p L p L −1, p ≡4 3
  (
2 2 +1, p ≡8 1, 7
◦ = (−1)(p −1)/8 = (Apostol, Thm. 9.5)
p L −1, p ≡8 3, 5
  (
3 ⌊(p+1)/6⌋ +1, p ≡12 1, 11
◦ For p ̸= 3, = (−1) = (Source: Wikipedia)
p L −1, p ≡12 5, 7
  (
5 ⌊2(p+1)/5⌋ +1, p ≡5 1, 4
◦ For p ̸= 5, = (−1) = (Source: Wikipedia)
p L −1, p ≡5 2, 3
 
  (q−1)/2 (p−1)/2 
p Y Y k i 
◦ = sign − (Source: Wikipedia)
q L i=1
p q
k=1
  (p−1)/2
q Y sin(2πn · q/p)
◦ = (Source: Wikipedia)
p L n=1
sin(2πn/p)

ˆ Algebraic Properties of Jacobi Symbols: (Throughout P, Q are distinct, odd, and in Z>0 .)
 
a
◦ = −1 =⇒ a ∉ QR(p)
p J

462
 
a
◦ a ∈ QR(p) and a, p coprime =⇒ = 1 (converse need not hold!!)
p J
◦ All above properties for Legendre symbols (cf. Theorems 9.9, 9.10)
m m 
m

◦ = (compl. mult. on bottom) (Apostol, Thm. 9.9b)
P J Q J PQ J
 2 
a n n
◦ If a, P are coprime, = (Apostol, Thm. 9.9d)
P J P J

ˆ Algebraic Properties of Kronecker Symbols: (Sourced from Wikipedia.)


a
◦ = ±1 for a, n coprime, and 0 otherwise
n K
  a  b 
ab
◦ = unless n = −1 and one of a, b is zero and the other negative
n K n K n K
 a   a  a
◦ = unless a = −1, one of m, n is zero, and the other has “odd part” (the bit
mn K m K n K
when removing all factors of 2) ≡4 3.
◦ Suppose n > 0. Further suppose that
(
4n, n ≡4 2
q= a ≡q b
n, otherwise
a  
b
Then = . This is also true for n < 0 if sign(a) = sign(b).
n K n K
◦ Suppose 0 ̸= a ̸≡4 3. Let (
4|a|, a ≡4 2
q=
|a|, otherwise
a a
If m ≡q n, then = .
m K n K
◦ Kronecker Quadratic Reciprocity: For n ∈ Z̸=0 , with n = 2e n′ for e ≥ 0, define n′ to be its
odd part. (If n = 0, 0′ = 1. Take m, n ∈ Z coprime. Then
m  n  ′ ′
= u · (−1)(n −1)(m −1)/4
n K m K
where (
+1, m ≥ 0 ∨ n ≥ 0
u=
−1, m, n < 0
A nonsymmetric version for such m, n is
m  n  ′ ′
= (−1)(n −1)(m −1)/4
n K |m| K

◦ Some Special Values: Similar to the other two:


   
−1 (n′ −1)/2 2 2
= (−1) = (−1)(n −1)/8
n K n K

ˆ Big Results: (Throughout, p is an odd prime; if q is used, it is one too, and p ̸= q.)
 
n
◦ Euler Criterion: ≡p n(p−1)/2 (Apostol, Thm. 9.2)
p L

463
  
p q
◦ Legendre Quadratic Reciprocity: = (−1)(p−1)(q−1)/4 (Apostol, Thm. 9.8)
q L p L
  
q
− , p, q ≡4 3
  

p p L
Equivalently, since the symbols will be ±1, =  
q L  q
 , otherwise
p L

The law also holds for Jacobi symbols in the obvious way.
ˆ Some on Gauss Sums: (Throughout, p is an odd prime; if q is used, it is one too, and p ̸= q.)
X
◦ Recall: for χ ∈ DChar(p), G(n, χ) := χ(r)ζnnr
r mod p
m
X 2
nr
◦ Define the quadratic Gauss sum by G(n; m) := ζm
r=1
◦ For χ(r) := (r | p) (the quadratic character χL ), χL is primitive and G(n, χL ) = (n | p) · G(1, χL )
∀n.
 
−1
◦ G(1, χL )2 = p = ±p. (Apostol, Thm. 9.13)
p L
 
q
◦ G(1, χL )q−1 ≡p iff quadratic reciprocity holds (Apostol, Thm. 9.14)
p L
  X  r1 r2 · · ·rq 
q−1 q
◦ G(1, χL ) = (Apostol, Thm. 9.15)
p L p L
1≤i≤q
ri mod p
r1 +...rq ≡p q
 
n
◦ G(n; p) = · G(1; p)
p L
◦ Quadratic Reciprocity: ∀m ∈ Z≥1 , we have
 √
m, m ≡4 1




m  0, m ≡4 2
G(1; m) = (1 + i)(1 + e−πim/2 ) = √
2  i m, m ≡4 3



(1 + i) m, m ≡4 0

More generally, for h, k > 0 and h odd, (Apostol, Thm. 9.17)


r
k 1+i
G(h; k) = (1 + e−πihk/2 )G(k; h)
h 2

464
§19: Special (Often Important & Nonelementary) Functions

§19.1: Bessel Functions – Jα (x), Yα (x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

465
§19.2: Beta Function – B(x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

466
§19.3: Digamma Function – ψ(x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

467
§19.4: Error Functions – erf(z), erfc(z), erfi(z)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

468
§19.5: Exponential Integral – Ei(x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

469
§19.6: Fresnel Integrals – S(x), C(x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

470
§19.7: Gamma Function – Γ(x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic – Special Values)

(More to add later...)

471
§19.8: Lambert W Function – W (x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

472
§19.9: Polylogarithms – Lin (x)

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

473
§19.10: Trig Integrals – Si(x), Ci(x), etc.

(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)

(More to add later...)

474
§20: Useful Inequalities Across Mathematics

§20.1: Basic/Miscellaneous Inequalities

ˆ Bernoulli’s Inequality: Various forms:

(1 + x)r ≥ 1 + rx x ∈ R≥1 r ∈ R\(0, 1)


r
(1 + x) ≤ 1 + rx x ∈ R≥1 r ∈ [0, 1]
1 h 1
(1 + x)r ≤ x ∈ − 1, r ∈ R≥0
1 − rx r
rx
(1 + x)r ≤ 1 + x ∈ R≥0 r ∈ [−1, 0]
x+1
(1 + x)r ≤ 1 + (2r − 1) x ∈ [0, 1] r ∈ R\(0, 1)
n n n−1
(a + b) ≤ a + nb(a + b) a, b ∈ R≥0 n∈N

ˆ Binomial Coefficients:
n k (n − k + 1)k nk
      en k
n
◦ max , ≤ ≤ ≤
k k! k k! k
nn
 
n
◦ ≤ k
k k · (n − k)n−k
nk √
 
n
◦ ≤ ( n ≥ k ≥ 0)
4k! k
n
4n
     
4 1 2n 1
◦ √ 1− ≤ ≤ √ 1−
πn 8n n πn 9n
    
a c a+c
◦ ≤
b d b+d
   
tn n
◦ ≥ tk (t ≥ 1)
k k
d
X n     en d 
◦ ≤ min nd + 1, , 2n (n ≥ d ≥ 1)
k d
k=0

f (b) − f (a)
ˆ Cauchy: For a < b and f convex, f ′ (a) ≤ ≤ f ′ (b)
b−a
√ x1−α y α + xα y 1−α x+y
ˆ Heinz: xy ≤ ≤ (x, y > 0 and α ∈ [0, 1])
2 2
  Z b
a+b 1 f (a) + f (b)
ˆ Hermite: For f convex, f ≤ f≤
2 b−a a 2
ˆ Jensen: For φ convex, ψ concave, pi ≥ 0 and i pi = 1,
P

! !
X X X X
φ p i xi ≤ pi φ(xi ) ψ pi xi ≥ pi ψ(xi )
i i i i

√ √  √ √ 2
√ x+ y √ x−y x+ y x+y
ˆ Log-Mean: xy ≤ 4
xy ≤ ≤ ≤ (x, y > 0)
2 ln(x) − ln(y) 2 2

475
n
ˆ Maclaurin-Newton: Define, for some {ai }i=1 ⊆ R≥0 ,

1 X
Sk := n
 ai1 ai2 · · ·aik
k 1≤i1 <i2 <···<ik ≤n

√ p
Then Sk2 ≥ Sk−1 Sk+1 and k Sk ≥ k+1 Sk+1 for 1 ≤ k < n.
Y√ Y√ Y√
ˆ Mahler: If xi , yi > 0, n
xi + yi ≥ n
xi + n
yi
i i i

ˆ Sometimes Useful For L Work: We have that


p

|x + y| ≤ 2 max{|x|, |y|}

and, more generally,


p p p p p p
|x + y| ≤ 2p [max{|x|, |y|}] ≤ 2p max{|x| , |y| } ≤ 2p (|x| + |y| )

ˆ Square Roots:
√ √ √
◦ x+y ≤ x+ y (x, y ≥ 0)
√ √ 1 √ √ √ √
◦ 2 x+1−2 x < √ < x+1− x−1 <2 x−2 x=1 (x ≥ 1)
x
x x2 √ x
◦ 1− − ≤ 1−x ≤1− (x ≤ 1)
2 2 2
 n n √  n n √  n n  n n
ˆ Stirling: e ≤ 2πn e1/(12n+1) ≤ n! ≤ 2πn e1/(12n) ≤ en
e e e e
ˆ Young: For x, y, p, q > 0 with p, q Holder-conjugate (1/p + 1/q = 1), we have
−1
xp yq

1 1
p
+ q ≤ xy ≤ +
px qy p q
Ra Rb
For integrals, we will have 0
f+ 0
f −1 ≥ ab for f ∈ C[a, b] and strictly increasing

476
§20.2: HM-GM-LM-AM-QM-CM Inequalities

Abbreviations:
ˆ HM = harmonic mean
ˆ GM = geometric mean
ˆ LM = logarithmic mean (defined to be x if all arguments are x)
ˆ AM = arithmetic mean (also: average)
ˆ QM = quadratic mean (also: root-mean-squared (RMS), exponential mean)
ˆ CM = contraharmonic mean
We have, in order, simply
min ≤ HM ≤ GM ≤ LM ≤ AM ≤ QM ≤ CM ≤ max
For two numbers x, y,
r

2 x−y x+y x2 + y 2 x2 + y 2
min{x, y} ≤ ≤ xy ≤ ≤ ≤ ≤ ≤ max{x, y}
1 1 | {z } ln(x) − ln(y) | {z2 } | {z2 } x+y
+ (GM)
x y
| {z } | {z }
(LM) (AM) (QM) (CM)
| {z }
(HM)

For n-many xi ∈ R wherein these are well-defined,


n
n n √ X (n − 1)! · xi
min{xi }i=1 ≤ ≤ n
x1 · . . . · xn ≤
1 1
 
} i=1 Y xi
+ · · ·+
| {z
(GM) log
x1 xn xj
| {z } 1≤j≤n
(HM) j̸=i
| {z }
(LM)
r
x1 + · · · + xn x21 + · · · + x2n x2 + · · · + x2n n
≤ ≤ ≤ 1 ≤ max{xi }i=1
| n
{z } | {zn x
} | 1 + · · · + x n
{z }
(AM) (QM) (CM)

or with big-Σ/Π notation,


v v
u n n n u n Pn 2
n n uY X (n − 1)! · x i 1 X u1 X
2 ≤ Pi=1 xi ≤ max{x }n
min{xi }i=1 ≤ Pn 1 ≤ tn
xi ≤   ≤ x i ≤ t x i n i i=1
Y xi n i=1 n i=1 xi
i=1 x
| {z i} | i=1 i=1 log | i=1
xj | {z } {z }
(CM)
{z } | {z }
(HM) (GM) 1≤j≤n (AM) (QM)
j̸=i
| {z }
(LM)

Some notes:

ˆ Because of the bounding by the max/min, x1 = · · · = xn gives each of these equal.


The converse is true too (equality of any two means gives equality of the numbers).
ˆ Related is the weighted AM-GM inequality: with w1 + w2 + · · · + wn > 0,
P !Pi w
w x
Pi i i ≥
Y
xwi
i

i wi i

Equality holds iff the xk with wk > 0 are all equal. We assume 00 = 1.
ˆ Proof for the formula for the LM of multiple numbers is here.

477
§20.3: Inequalities for Trigonometry (Regular & Hyperbolic)

x3 x3
 
x cos(x) x
ˆ x−
p
3
≤ x cos(x) ≤ ≤ x cos(x) ≤ x − ≤ x cos √ ≤ sin(x) ≤ |sin(x)| ≤ x
2 1 − x2 /3 6 3
x3 x x cos(x) + 2x x2
ˆ x cos(x) ≤ 2 ≤ x cos2 ≤ sin(x) ≤ ≤
sinh (x) 2 3 sinh(x)

2 π 2 − x2 x2
 
sin(x) x tan(x)
ˆ max , 2 2
≤ ≤ cos ≤ 1 ≤ 1 + ≤ (x ∈ [0, π/2])
π π +x x 2 3 x

478
§20.4: Inequalities for Exponentiation

 x n
ˆ ex ≥ 1 + ≥1+x
n
x2
 
 x n
ˆ 1+ ≥ ex 1 − (n ≥ 1, |x| ≤ n)
n n
xn  x n+x/2
ˆ + 1 ≤ ex ≤ 1 +
n! n
 ex n
ˆ ex ≥ (x, n > 0)
n
 y
x
ˆ e > 1+
x
> exy/(x+y) (x, y > 0)
y
1+x
ˆ e2x ≤ (x ∈ (0, 1))
1−x
x3
ˆ xex ≥ x + x2 +
2
2
ˆ ex ≤ x + ex
2 2
ˆ ex + e−x ≤ 2ex /2
=⇒ cosh(x) ≤ ex /2

x
ˆ ex ≤ 1 − (x ∈ [0, 1.59])
2
ˆ ex ≤ 1 + x + x2 (x ∈ [0, 1.79])
ˆ xy + y x < 1
x
ˆ xy >
x+y
1
ˆ < xx < x2 − x + 1
2−x
ˆ x1/r (x − 1) ≤ rx(x1/r − 1) (x, r ≥ 1)
1 x
ˆ ≤1+ (x ∈ [0, 1])
2x 2
 p  q
x x
ˆ 1+ ≥ 1+ provided any of the following hold:
p q
◦ x > 0 and p > q > 0
◦ −p < −q < x < 0
◦ −q > −p > x > 0
 p  q
x x
ˆ 1+ ≤ 1+ provided any of the following hold:
p q
◦ q < 0 < p and −q > x > 0
◦ q < 0 < p and −p < x < 0

479
§20.5: Inequalities for Logarithms

x x(6 + x)
ˆ ≤ ln(1 + x) ≤ ≤x (x > −1)
1+x 6 + 4x
2 1 ln(1 + x) 1 2+x
ˆ ≤ p ≤ ≤ √ ≤ (x > −1)
2+x 2
1 + x + x /12 x 1+x 2 + 2x
n
1 1 X1
ˆ ln(n) + < ln(n + 1) < ln(n) + ≤ ≤ ln(n) + 1 (n ≥ −1)
n+1 n k
k=1

1 1
ˆ |ln(x)| ≤ x −
2 x
y
ˆ ln(x + y) ≤ ln(y) +
x
 
ˆ ln(x) ≤ y x1/y − 1 (x, y > 0)

x2
ˆ ln(1 + x) ≥ x − (x ≥ 0)
x
ˆ ln(1 + x) ≥ x − x2 (x ≥ −0.68)

480
§20.6: Inequalities for Summations

ˆ Abel: Suppose b1 ≥ b2 ≥ · · · ≥ bn ≥ 0. Then


( k ) n
( k )
X X X
b1 · min ai ≤ ai bi ≤ b1 · max ai
1≤k≤n 1≤k≤n
i=1 i=1 i=1

ˆ Aczél: Suppose that ai , bi are such that either of these hold:


n
X n
X
a21 > a2i b21 > b2i
i=2 i=2

Then !2 ! !
n
X n
X n
X
a1 b1 − ai bi ≥ a21 − a2i b21 − b2i
i=2 i=2 i=2
v
n u
X uYk n
X
ˆ Carleman: k
t |ai | ≤ e |ak |
k=1 i=1 k=1

n
!2 n
! n
!
X X X
ˆ Cauchy-Schwarz (ℓ (R)): 2
xi yi ≤ x2i yi2
i=1 i=1 i=1

◦ Related: Kantorovich: Given xi , yi > 0 with (for all i)


xi m+M √
0<m≤ ≤M <∞ A := G := mM
yi 2
! !  2 X !2
X X A
we have x2i yi2 ≤ xi yi
i i
G i

n
!2 n
! n
!
X X 2
X 2
ˆ Cauchy-Schwarz (ℓ (C)): 2
xi yi ≤ |xi | |yi |
i=1 i=1 i=1

ˆ Chebyshev: Given
◦ x1 ≤ · · · ≤ xn
◦ f, g non-decreasing functions
◦ pi ≥ 0
P
◦ i pi = 1
! !
X X X
we have f (xi )g(xi )pi ≥ f (xi )pi g(xi )pi
i i i

ˆ Gibbs: For ai , bi ≥ 0, A :=
P P
i ai and B := i bi ,
   
X ai A
ai ln ≥ A ln
i
bi B

More generally, for f convex,    


X bi B
ai f ≤ Af
i
ai A

481
∞ n
!p  ∞
p X
X 1X p
ˆ Hardy: For {an }n∈N ⊆ R≥0 and p > 1, ai ≤ apn
n=1
n i=1 p−1 n=1

ˆ Hilbert: For {an }n∈N , {bn }n∈N ⊆ R, we have


v v
∞ u ∞ u∞
X am bn uX
2
uX
≤π t am t b2n
m,n=1
m + n m=1 n=1
v v
∞ u ∞ u ∞
X am bn uX uX
≤ 4t a2m t b2n
m,n=1
max{m, n} m=1 n=1

ˆ Holder (ℓ1 , ℓp , ℓq ): Let p, q be conjugate, as in 1/p + 1/q = 1. p = ∞ =⇒ q = 1, etc. Then

n n
!1/p n
!1/q
X X p
X q
|xi yi | ≤ |xi | |yi |
i=1 i=1 i=1

◦ May extend to n = ∞ as well.


◦ May be stated in terms of ℓp norms by ∥xy∥1 ≤ ∥x∥p · ∥y∥q .
◦ We may assume 0 · ∞ = 0 for this.
◦ p = q = 2 gives Cauchy-Schwarz.
◦ For p, q ∈ (1, ∞) and x := {xn }n∈N ∈ ℓp and y := {yn }n∈N ∈ ℓq , then equality holds iff ∃α ∈ R≥0
p q
such that |xi | = α|yi | ∀i
◦ A Generalization (More Practical): For each r, s > 0,
n
!r+s n
!r n
!s
X r s
X r+s
X r+s
|xi | |yi | ≤ |xi | |yi |
i=1 i=1 i=1
P
◦ A Generalization (Chen, 2015): Let λi = 1. Then

n n
!λa n
!λb n
!λz
X λ λ λ
X X X
|ai | a |bi | b · · ·|zi | z ≤ |ai | |bi | ··· |zi |
i=1 i=1 i=1 i=1

ˆ Karamata: Suppose the following: ai , bi , φ are such that


◦ a1 ≥ a2 ≥ · · · ≥ an
◦ b1 ≥ b2 ≥ · · · ≥ bn
Pk Pk
◦ i=1 ai ≥ i=1 bi for each k ∈ {1, · · ·, n} (majorization)
P P
◦ i ai = i bi
◦ φ is convex
P P
Then i φ(ai ) ≥ i φ(bi ). (For φ concave, reverse the inequality.)
Q ai P
x ai xi
ˆ Ky Fan: For xi ∈ [0, 1/2], ai ∈ [0, 1], and i ai = 1, Q i i ai ≤ P i
P
i (1 − xi ) i ai (1 − xi )
P p P q
wi |xi | i wi |xi |
ˆ Lehmer: For p ≤ q and weights wi ≥ 0, P i p−1 ≤ P q−1
i wi |xi | i wi |xi |

1 X 2n 1
ˆ Mathieu: For c ̸= 0, 2 1 < 2 + c2 )2
< 2
c + 2 n=1
(n c

482
! ! ! !
X X ai bi X X
ˆ Milne: Given ai , bi ≥ 0, then ai + bi ≤ ai bi
i i
ai + bi i i

n
!1/p n
!1/p n
!1/p
X p
X p
X
ˆ Minkowski (ℓ , p ∈ [1, ∞)):
p
|xi + yi | ≤ |xi | + |yi |
i=1 i=1 i=1

◦ Inequality reverses when p < 1.


◦ Can extend to n = ∞.
◦ In terms of ℓp norm, ∥x + y∥p ≤ ∥x∥p +∥y∥p and is its triangle inequality. (Here, x := {xn }n∈N , y := {yn }n∈N .)
◦ Equality holds (with 1 < p < ∞) iff xi = αyi ∀i or yi ≡ 0 ∀i for some α ∈ F the field of concern
X ai n
ˆ Nesbitt: Given ai ≥ 0, P ≥
i j̸=i aj n−1

ˆ Radon: For ai , xi ≥ 0 and p ∈ [1, ∞), we have


X xp P p
i ( i xi )

aip−1 p−1
P
i ( i ai )

The inequality reverses for p ∈ [0, 1).


Z b b
X Z b+1
ˆ Sum/Integral: For f non-decreasing, f ≤e f (i) ≤ f
a−1 i=a a

ˆ Sum/Product: For |ai |, |bi | ≤ 1, we have



Y Y X
ai − bi ≤ |ai − bi |


i i i
Q
For ai > 0, t > 0, and i ai ≥ 1, we have
Y
(t + ai ) ≥ (1 + t)n
i

ˆ Weierstrass: If xi ∈ [0, 1] and wi ≥ 1, each ∀i, then


X Y w
1− wi xi ≤ (1 − xi ) i
i i
X Y w
Y
1+ wi xi ≤ (1 + xi ) i ≤ (1 − xi )−wi
i i i
P
Moreover, with i wi xi ≤ 1, we have
Y 1
(1 ± xi )wi ≤ P
i
1∓ i w i xi

483
§20.7: Inequalities for Integrals / Lp Inclusions

Lp Inclusions:
Quick list of various Lp inclusions, where p, q ∈ [1, ∞] are Holder conjugates ( p1 + 1
q = 1)

ˆ f, g ∈ Lp and α, β ∈ R =⇒ αf + βg ∈ Lp (Lp is a vector space)


ˆ {fk }k∈N ⊆ Lp Cauchy =⇒ lim fk ∈ Lp (Lp is Banach/complete)
ˆ p < q =⇒ Lq ⊆ Lp (smaller p =⇒ bigger space on finite-measure sets)

ˆ L∞ ⊆ Lp (for every p on finite-measure sets)


ˆ f, g ∈ L2 =⇒ f g ∈ L1 (from Cauchy-Schwarz)
ˆ f, g ∈ Lp =⇒ 1
2 (f ± g) ∈ Lq (p > 1, from Clarkson)

ˆ f ∈ Lp , g ∈ Lq =⇒ f g ∈ L1 (from Holder)
1 1
If 1 ≤ p < r < q < ∞ and r = p + 1q :

ˆ Lp ∩ Lq ⊆ Lr (from interpolation)

Auxiliary Inequalities:

ˆ Sometimes Useful For Lp Work: We have that

|x + y| ≤ 2 max{|x|, |y|}

and, more generally,


p p p p p p
|x + y| ≤ 2p [max{|x|, |y|}] ≤ 2p max{|x| , |y| } ≤ 2p (|x| + |y| )

Integral Inequalities:

ˆ Cauchy-Schwarz: Holder inequality for p = q = 2.


Z 2 Z Z 
2 n 2 2
◦ L (Ω), Ω ⊆ R : fg ≤ f g
Ω Ω Ω
Z 2 Z Z 
2
n
2 2
◦ L (Ω), Ω ⊆ C : f g ≤
|f | |g|
Ω Ω Ω

ˆ Clarkson’s Inequalities: Let f, g ∈ Lp (Ω) for p ∈ [2, ∞) and Ω in a measure space X. Then

f + g p
Z Z p Z 
+ f − g ≤ 1
Z
p p

2 2 |f | + |g|
Ω Ω 2 Ω Ω

We may write this as


f + g p f − g p

1 p p


2 p
+
2 p
≤ ∥f ∥Lp (Ω) + ∥g∥Lp (Ω)
L (Ω) L (Ω) 2

484
For p ∈ (1, 2), let q be its Holder conjugate q = p/(p − 1). (Hence 1/p + 1/q = 1.) Then we instead
have
f + g q 1/p f − g q 1/q
Z  Z   Z Z q/p
1 p 1 p

2
+
2
≤ |f | + |g|
Ω Ω 2 2 Ω Ω
We may write this as

f + g q f − g q
 q/p
1 p 1 p

2 p
+
≤ ∥f ∥Lp (Ω) + ∥g∥Lp (Ω)
L (Ω) 2 Lp (Ω) 2 2

ˆ Holder (L1 , Lp , Lq ): Let p, q ∈ [1, ∞] be conjugate, as in 1/p + 1/q = 1. p = ∞ =⇒ q = 1, etc.


Then Z Z 1/p Z 1/q
p q
|f g| ≤ |f | |g|
Ω Ω Ω
Some quick notes:
◦ May be stated in terms of Lp norms by ∥f g∥1 ≤ ∥f ∥p · ∥g∥q .
◦ We may assume 0 · ∞ = 0 for this.
◦ p = q = 2 gives Cauchy-Schwarz.
p q
◦ For p, q ∈ (1, ∞) and f ∈ Lp and g ∈ Lq , then equality holds iff ∃α ∈ R≥0 such that |f | = α|g|
a.e.
Related results:
◦ Generalized Holder Inequality: Let us have
(i) pi > 1 for i = 1, 2, · · ·, m
X 1
(ii) =1
i
pi
(iii) fi ∈ Lpi (E) for each i
Then Z Y YZ 1/pi
p
fi ≤ |f | i

E i
E i

or rather
Y Y
fi ≤ ∥fi ∥pi


i 1 i
Z Z p Z −(p−1)
1/p −1/(p−1)
◦ Reverse Holder Inequality: For p ∈ (1, ∞), |f g| ≥ |f | · |g|
Ω Ω Ω
p
Abusing L norm notation (these are not norms): ∥f g∥1 ≥ ∥f ∥1/p · ∥g∥−1/(p−1)
ˆ Interpolation Inequalities: Let 1 ≤ p < r < q < ∞; then Lp ∩ Lq ⊆ Lr .
Moreover, if θ ∈ (0, 1) satisfies

1 θ 1−θ 1/r − 1/q


= + ; that is, θ =
r p q 1/p − 1/q
then
θ 1−θ
∥f ∥r ≤ ∥f ∥p · ∥f ∥q
n o
∥f ∥r ≤ max ∥f ∥p , ∥f ∥q
p/r 1−(p/r)
∥f ∥r ≤ ∥f ∥p · ∥f ∥∞

485
Z 1/p Z 1/p Z 1/p
p p
ˆ Minkowski (L , p ∈ [1, ∞)):
p
|f + g| ≤ |f | + |g|
Ω Ω Ω

◦ Inequality reverses when p < 1.


◦ In terms of Lp norm, ∥f + g∥p ≤ ∥f ∥p + ∥g∥p and is its triangle inequality
◦ Equality holds (with 1 < p < ∞) iff f = αg or g ≡ 0 for some α ∈ F the field of concern
Z b b
X Z b+1
ˆ Sum/Integral: For f non-decreasing, f ≤e f (i) ≤ f
a−1 i=a a

ˆ Young: For x, y, p, q > 0 with p, q Holder-conjugate (1/p + 1/q = 1), we have


−1
xp yq

1 1
+ q ≤ xy ≤ +
pxp qy p q
Ra Rb
For integrals, we will have 0
f+ 0
f −1 ≥ ab for f ∈ C[a, b] and strictly increasing

486
§20.8: Inequalities for Matrices

n X
Y n n
Y
ˆ Hadamard: det(A) ≤ 2
A2i,j = ∥ai ∥2 for A := (Ai,j )1≤i,j≤n with columns ai .
i=1 j=1 i=1

Equality holds iff the columns are nonzero and pairwise orthogonal.
ˆ Schur: For λi the eigenvalues of A := (Ai,j )1≤i,j≤n ,
n
X n
X
λ2i < A2i,j
i=1 i,j=1

If A is symmetric with diagonal elements d1 , · · ·, dn , then for each m ∈ [1, n] ∩ Z,


m
X m
X
di < λi
i=1 i=1

487
§20.9: Inequalities for Matrix/Vector Norms

Inequalities of p-Norms:
Unless stated otherwise, assume we’re working in a sequence space (ℓp ), finite-dimensional vector space
(e.g. Cn for n < ∞), or function space (Lp ). We define (in the sequence case)


!1/p
X p
∥x∥p := |xi | ∥x∥∞ := sup|xi | ≡ lim ∥x∥p
i∈N p→∞
i=1

with analogous definitions for the other contexts.


Herein, assume x is an n-vector, and that p, q ∈ [1, ∞] satisfy 1/p + 1/q = 1.

ˆ ∥x∥p+α ≤ ∥x∥p (p > 0, α ≥ 0)

ˆ ∥x∥∞ ≤ ∥x∥p (for any p ∈ [1, ∞])



ˆ ∥x∥∞ ≤ ∥x∥2 ≤ n∥x∥∞

ˆ ∥x∥2 ≤ ∥x∥1 ≤ n∥x∥2

ˆ ∥x∥q ≤ ∥x∥p ≤ n1/p−1/q ∥x∥q (0 ≤ p ≤ q < ∞)

Some implications:

ˆ ℓ1 ⊆ ℓp ⊆ ℓq ⊆ ℓ∞ (1 < p < q < ∞)

ˆ For finite measure spaces (Ω, µ): L∞ ⊆ Lq ⊆ Lp ⊆ L1

ˆ For some measure spaces (S, µ): Let 0 < p < q ≤ ∞.

◦ Lq ⊆ Lp iff S does not contain sets of finite-but-arbitrarily-large measure


◦ Lp ⊆ Lq iff S does not contain sets of nonzero-but-arbitrarily-small measure
◦ Neither of these are true for Rn

For matrices A ∈ Cm×n , of rank r, we have the inequalities below. Subscripts refer to induced p-norms. The
max norm is given by
∥A∥max := max|ai,j |
i,j


ˆ ∥A∥2 ≤ ∥A∥F ≤ r∥A∥2

ˆ
p
∥A∥F ≤ ∥A∥∗ ≤ ∥A∥F (middle is the Schatten 1-norm)

ˆ ∥A∥max ≤ ∥A∥2 ≤ mn∥A∥max
√ √
ˆ ∥A∥∞ ≤ n∥A∥2 ≤ mn∥A∥∞
√ √
ˆ ∥A∥1 ≤ m∥A∥2 ≤ mn∥A∥1

ˆ
p
∥A∥2 ≤ ∥A∥1 ∥A∥∞

488
§20.10: Inequalities for Probability

ˆ Inequalities of Moments:

◦ Second Moment:
E[X]2
P(X > 0) ≥ (if E[X] ≥ 0)
E[X 2 ]
var[X]  
P(X = 0) ≤ (if E X 2 ̸= 0)
E[X 2 ]
  3/2
 4 h i E X2
◦ Fourth Moment: For E X ∈ (0, ∞), E |X| ≥  1/2
E[X 4 ]
P
◦ kth Moment: For k evenm
√ X1 , · · ·, Xn ∈ [0, 1] k-wise independent r.v.s, with X = i Xi and
µ := E[X], and Ck := 2 πke1/(6k) , we have
h i
  E (X − µ)k
P |X − µ| ≥ t ≤
tk
 k/2
  nk
P |X − µ| ≥ t ≤ Ck
et2

ˆ Azuma: Consider a martingale {Xn }n∈N with |Xi − Xi−1 | < ci almost-surely. Let α ≥ 0. Then

−α2
   
P |Xn − X0 | ≥ α ≤ 2 exp
2 i c2i
P

ˆ Bennett & Bernstein: Let us have the following:

◦ Xi independent r.v.s, n-many in number


◦ E[Xi ] = 0
◦ |Xi | ≤ M for each i, almost-surely
1X
◦ σ 2 := var[Xi ]
n i
◦ ε≥0
◦ θ(u) := (1 + u) log(1 + u) − u

Then we have Bennett’s inequality:


n
!
nσ 2
  
X Mε
P Xi ≥ ε ≤ exp − 2 · θ
i=1
M nσ 2

Bernstein’s inequality does not make use of θ:


n
! !
X ε2
P Xi ≥ ε ≤ exp − Mε

i=1
2 nσ 2 + 3

ˆ Bhatia-Davis: For X ∈ [m, M ], var[X] ≤ (M − E[X])(E[X] − m)

489
ˆ Chebyshev (for pdfs): Given t > 0, we have
  var[X]
P |X − E[X]| ≥ t ≤
t2
  var[X]
P X − E[X] ≥ t ≤
var[X] + t2

ˆ Chebyshev (for Sums): Given

◦ x1 ≤ · · · ≤ xn
◦ f, g non-decreasing functions
◦ pi ≥ 0
P
◦ i pi = 1
! !
X X X
we have f (xi )g(xi )pi ≥ f (xi )pi g(xi )pi . Consequently,
i i i
h i h i h i
E f (X)g(X) ≥ E f (X) · E g(X)

ˆ Chernoff: Many variants exist.

F (a)
for F (z) := k P(X = k)z k the probability generating function and a ≥ 1.
P
◦ P(X ≥ t) ≤ t
a P
◦ Let Xi ∈ [0, 1] be independent, with P(Xi = ±1) = 1/2 and X := i Xi . Let α ≥ 0. Then
2
P(X ≥ α) = P(X ≤ −α) ≤ e−α /(2n)

P
◦ For Xi ∈ [0, 1] independent r.v.s, X := Xi , µ := E[X], and α ≥ 0,
i

  eα −α2 µ
  
P X ≥ (1 + α)µ ≤ ≤ exp
(1 + α)1+α 2+α
−α

−α2 µ
  
  e
P X ≤ (1 − α)µ ≤ ≤ exp
(1 − α)1−α 2

1
◦ For R ≥ 2eµ ≈ 5.44µ, P(X ≥ R) ≤ for µ := E[X]
2R
◦ For Xi ∈ [0, 1] k-wise independent r.v.s and E[Xi ] = p,
!
n

X
k pk
P Xi ≥ t ≤ t

i k

◦ Let us have:
Xi ∈ [0, 1] k-wise independent r.v.s (n-many in total)
E[Xi ] = pi
P
X := i Xi
µ := E[X]
p := µ/n
δ>0
µδ
k≥b k=
1−p

490
Then
n b
 k
k p
 
P X ≥ (1 + δ)µ ≤ (1+δ)µ
k
b

ˆ Doob: For a martingale {Xn }n∈N and ε > 0, we have


 
1 h i
P max |Xk | ≥ ε ≤ E |Xn |
1≤k≤n ε

ˆ Efron-Stein & McDiarmid: Take Xi , Xi′ ∈ X independent r.v.s, with f : X n → R, Z := f (X1 , · · ·, Xn ),


and Z (i) := f (X1 , X2 , · · ·, Xi−1 , Xi′ , Xi+1 , · · ·, Xn ). Then
" n #
1 X 2
(i)
var[Z] ≤ E Z −Z
2 i=1

If, moreover, Z − Z (i) ≤ ci for all i and α ≥ 0, we have McDiarmid’s inequality:

−2α2
   
P |Z − E[Z]| ≥ α ≤ 2 exp P 2
i ci

Pk
ˆ Etemadi: Take Xi independent r.v.s with Sk := i=1 Xi . Let α ≥ 0. Then
   
P max|Sk | ≥ 3α ≤ 3 max P |Sk | ≥ α
k k

ˆ Hoeffding: Take Xi independent r.v.s with Xi ∈ [ai , bi ] almost-surely. Let X :=


P
i Xi and α ≥ 0.
Then
−2α2
   
P |X − E[X]| ≥ α ≤ 2 exp P 2
i (bi − ai )

Relatedly, for E[X] = 0 and only one r.v. (X ≡ X1 ), and λ ∈ R,


 2
λ (b − a)2

E eλX ≤ exp
 
8

ˆ Holder (ℓ1 , ℓp , ℓq ): Let p, q be conjugate, as in 1/p + 1/q = 1. p = ∞ =⇒ q = 1, etc. Then


h i  h i1/p  h i1/q
p q
E |XY | ≤ E |X| · E |Y |

ˆ Jensen: For φ convex, ψ concave, pi ≥ 0 and i pi = 1,


P

! !
X X X X
φ p i xi ≤ pi φ(xi ) ψ pi xi ≥ pi ψ(xi )
i i i i

In particular,   h i  
φ E[X] ≤ E φ(X) ψ E[X] ≥ E[ψ(X)]

n Pk
ˆ Kolmogorov: Take {Xi }i=1 independent r.v.s with E[Xi ] = 0 and var[Xi ] < ∞. Let Sk := i=1 Xi ,
and let α > 0. Then  
1 1 X
P max|Sk | ≥ α ≤ 2 var[Sn ] = 2 var[Xi ]
k α α i

ˆ Markov: Several variants exist.

491
h i 1 h i
◦ For a > 0, P |X| ≥ a ≤ E |x|
a
h i 1 − E[X]
◦ For X ∈ [0, 1] and c ∈ 0, E[X] , P(X ≤ c) ≤
1−c
h i
E f (X)
◦ For f ≥ 0 and f (x) ≥ s > 0 ∀x ∈ S, P(X ∈ S) ≤
s
ˆ Paley-Zygmund: Take X a nonnegative r.v. of finite variance, and α ∈ (0, 1). Then
  var[X]
P X ≥ αE[X] ≥ 1 − 2
(1 − α)2 E[X] + var[X]

ˆ Vysochanskij-Petunin-Gauss: Take X a unimodal r.v. with mode m. Let σ 2 := var[X] < ∞ and
let τ 2 := var[X] + (E[X] − m)2 = E[(X − m)2 ]. Then the following hold:
r
  4 8
P |X − E[X]| ≥ λσ ≤ 2 for λ ≥
9λ 3
  4τ 2 2τ
P |X − m| ≥ α ≤ for α ≥ √
9α2 3
  α 2τ
P |X − m| ≥ α ≤ 1 − √ for α ≤ √
τ 3 3

492
§20.11: Very General Inequalities (e.g. Inner Product Spaces, Metric Spaces)

ˆ Triangle Inequality:

◦ In R or C, |x + y| ≤ |x| + |y| for all x, y


◦ In a metric space (X, d), d(x, z) ≤ d(x, y) + d(y, z), ∀x, y, z ∈ X (put a point in middle)
◦ In a normed vector space (X, ∥·∥X ), ∥x + y∥X ≤ ∥x∥X + ∥y∥X ∀x, y ∈ X.
◦ One has reverse-triangle inequalities too:

|x| − |y| ≤ |x − y| ∥x∥ − ∥y∥ ≤ ∥x − y∥

ˆ Bessel: Consider a Hilbert space (H, ⟨·, ·⟩) with {en }n∈N orthonormal elements of H. Then ∀x ∈ H

X 2 2
|⟨x, ek ⟩| ≤ ∥x∥ := ⟨x, x⟩
k=1

ˆ Cauchy-Schwarz: Given an inner product space (V, ⟨·, ·⟩), then


q q
|⟨x, y⟩V | ≤ ∥x∥V · ∥y∥V = ⟨x, x⟩V ⟨y, y⟩V

493
§21: Miscellaneous Topics

§21.1: Algebraic & Transcendental Numbers; Lindemann-Weierstrass

At least in the context of R or C, we say α is algebraic (over Q) if ∃p ∈ Q[x] (equivalently Z[x] for that
example) such that p(α) = 0. That is, α is algebraic iff it is the root of a polynomial in rational (or integer)
coefficients. We say α is transcendental otherwise.
Sometimes we let A denote the algebraic numbers, and T the transcendentals.
Noteworthy subsets of A include:

ˆ Q itself

ˆ Roots of polynomials in Z[x], definitionally (this even circumvents the issue of solvability of quintics)

ˆ n a for all n ∈ N and a as suitably defined

ˆ Constructible numbers

ˆ sin(qπ), cos(qπ), and the other trig functions, when q ∈ Q (provided the function is defined)

Some noteworthy properties of A:

ˆ It forms a field, sometimes poorly denoted Q (but A also is used for the “adele ring” so there isn’t a
great notation)

ˆ Dense in R and C since Q, Q[i] are dense

ˆ Countably infinite

ˆ Measure zero in the Lebesgue sense

ˆ All are computable numbers

ˆ A is algebraically closed when taken as a subset of C, and is the smallest algebraically closed field
containing Q

π, e are noteworthy transcendentals.


Some other results:
n
ˆ Lindemann-Weierstrass Theorem: (Wikipedia) This claims that {αi }i=1 being distinct algebraic
n
numbers implies that {eαi }i=1 are linearly independent over A.

◦ In particular, {0, α} for α ∈ A̸=0 gives {1, eα } linearly independent over A and hence eα cannot
be algebraic: it must be transcendental.

ˆ Gelfond-Schneider Theorem: (Wikipedia) With A as a set of complex numbers, let a, b ∈ A,


with a ̸∈ {0, 1} and b ̸∈ Q. Then ab is transcendental. (Any value is, in fact, in the sense that
ab := exp(b log a) is multivalued.)

√ √ 2
◦ This proves the transcendence of, say, ii , eπ , 2 2
, 2

494
§21.2: Borwein Integrals

(Some info on Wikipedia.)

A pattern:
Z ∞
sin(x) π
dx =
0 x 2
Z ∞
sin(x) sin(x/3) π
dx =
0 x x/3 2
Z ∞
sin(x) sin(x/3) sin(x/5) π
dx =
0 x x/3 x/5 2
Z ∞
sin(x) sin(x/3) sin(x/5) sin(x/7) π
dx =
0 x x/3 x/5 x/7 2
Z ∞
sin(x) sin(x/3) sin(x/5) sin(x/7) sin(x/9) π
dx =
0 x x/3 x/5 x/7 x/9 2
Z ∞
sin(x) sin(x/3) sin(x/5) sin(x/7) sin(x/9) sin(x/11) π
dx =
0 x x/3 x/5 x/7 x/9 x/11 2
Z ∞
sin(x) sin(x/3) sin(x/5) sin(x/7) sin(x/9) sin(x/11) sin(x/13) π
dx =
0 x x/3 x/5 x/7 x/9 x/11 x/13 2
However,
Z ∞
sin(x) sin(x/3) sin(x/5) sin(x/7) sin(x/9) sin(x/11) sin(x/13) sin(x/15)
dx
0 x x/3 x/5 x/7 x/9 x/11 x/13 x/15
467807924713440738696537864469
= π
935615849440640907310521750000
π 6879714958723010531
= − π
2 935615849440640907310521750000
π
≈ − 2.31 × 10−11
2
Moreover,
Z n
∞ Y
sin(x/(2k + 1))
2 cos(x) dx
0 x/(2k + 1)
k=0

is π/2 through n = 55 (up to the odd number 111), and fails thereafter.
In fact, even more generally,
Z n
∞ Y
sin(ak x)
dx
0 ak x
k=0
Pn
is π/2 only when i=0 ai < 1.
The core reason behind this is that the Fourier transform of sin(πx)/πx is the rectilinear unit:
  (
sin(x) 1, t ∈ [−1, 1]
F (t) = rect(t) :=
x 0, otherwise

495
§21.3: Euclidean Algorithm

Pictorially, we have the setup below:

More descriptively:
Suppose we are given elements a, b ∈ R for some Euclidean domain. (These could be integers, but also
could be, say, polynomials over a sufficiently-structured ring, e.g. Z[x].) The Euclidean algorithm can be
used to achieve two main goals:

ˆ Find gcd(a, b)
ˆ Find the x, y such that ax + by = gcd(a, b)

We start with our given elements, and wish to find q0 , r0 ∈ R such that

a = q0 b + r0 with N (r0 ) < N (b) or r = 0

(N here is a norm. For Z, it is typically the absolute value; for Z[x] it is typically the degree of the
polynomial.) To find q0 , r0 :

ˆ Find a/b in the usual manner


ˆ The “whole” bit is q0
ˆ The messy leftover bit is r0 (once multiplied through by your b)

This gives you q0 , r0 . If r0 ̸= 0, we may proceed again, with the algorithm on q0 , r0 , since gcd(a, b) = gcd(b, r0 ).
Then we want q1 , r1 such that

b = q1 r0 + r1 with N (r1 ) < N (r0 ) or r1 = 0

496
We again divide our two known ones, b/r, and get q1 , r1 from there. If need be we go on to the algorithm
for
r0 = q2 r1 + r2
and repeat. Loosely speaking, we have
n o
(Mℓ , Mr ) ∈ (a, b) → (b, r0 ) → (r0 , r1 ) → (r1 , r2 ) → · · ·
n o
(Gℓ , Gr ) ∈ (q0 , r0 ) → (q1 , r1 ) → (q2 , r2 ) → (q3 , r3 ) → · · ·

sequentially plugged into the equation (column by column)

Mℓ = Mr Gℓ + Gr

(where M is the elements that “move along” and G those that get generated, but notice that only the ri get
put into the moving set). One may also describe the Euclidean algorithm recursively:

r_{i+1} = r_{i−1} − q_i r_i,   seeded by r_0 = a, r_1 = b

(note the re-indexing: the r_0 here is a itself, not the first remainder from above)

At the very end, we reach a step of the form

r_{n−1} = q_{n+1} r_n + 0

Then rn = gcd(a, b).


To use this to find the x, y such that ax + by = gcd(a, b), tedious back-substitution of the remainders is
needed; alternatively, one can carry Bézout coefficients along with the remainders, as in the sketch below.
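A minimal sketch in Python over Z (the function name extended_gcd is mine): tracking coefficient pairs alongside the remainders yields gcd(a, b) together with x, y simultaneously, with no back-substitution pass.

    def extended_gcd(a, b):
        r0, r1 = a, b            # the remainders that "move along"
        x0, x1 = 1, 0            # running coefficients of a
        y0, y1 = 0, 1            # running coefficients of b
        while r1 != 0:
            q = r0 // r1         # the "whole bit" q_i
            r0, r1 = r1, r0 - q * r1
            x0, x1 = x1, x0 - q * x1
            y0, y1 = y1, y0 - q * y1
        return r0, x0, y0        # gcd, x, y with a*x + b*y == gcd

    g, x, y = extended_gcd(240, 46)
    print(g, x, y, 240 * x + 46 * y)   # 2 -9 47 2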

§21.4: Cauchy Product of Sums/Series & Discrete Convolution

Given two series in coefficients {an }n∈N , {bn }n∈N , we get one in coefficients {cn }n∈N as so:

( Σ_{n=0}^{∞} a_n )( Σ_{n=0}^{∞} b_n ) = Σ_{n=0}^{∞} c_n   wherein c_n = Σ_{i+j=n} a_i b_j = Σ_{j=0}^{n} a_j b_{n−j}   (infinite summation)

( Σ_{n=0}^{∞} a_n x^n )( Σ_{n=0}^{∞} b_n x^n ) = Σ_{n=0}^{∞} c_n x^n   with the same c_n   (infinite power series)

( Σ_{n=0}^{N} a_n )( Σ_{n=0}^{M} b_n ) = Σ_{n=0}^{N+M} c_n   with the same c_n   (finite summation)

( Σ_{n=0}^{N} a_n x^n )( Σ_{n=0}^{M} b_n x^n ) = Σ_{n=0}^{N+M} c_n x^n   with the same c_n   (polynomials)

where we let a_n, b_n = 0 for n < 0 and for indices beyond the stated upper limits.


Observe that
{cn }n∈N = {an }n∈N ∗ {bn }n∈N
where ∗ denotes the discrete convolution of two sequences of numbers:
{a_n}_{n∈N} ∗ {b_n}_{n∈N} := ( Σ_{i+j=n} a_i b_j )_{n∈N}

a new sequence of numbers. (In the finite case — inputs of lengths N + 1 and M + 1 — the result has length N + M + 1, so it is usually longer than either input.)
3Blue1Brown has a great explanation here.
Note the comparison to the continuous analogue for functions, especially if you think of a_n as a function
a(n):

(f ∗ g)(ξ) = ∫_R f(x) g(ξ − x) dx
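A minimal Python sketch of the finite (polynomial) case: the coefficient lists are convolved exactly as above, with out-of-range terms treated as zero.

    def cauchy_product(a, b):
        # c[n] = sum over i + j = n of a[i] * b[j]
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj
        return c

    # (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
    print(cauchy_product([1, 2, 3], [4, 5]))   # [4, 13, 22, 15]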

§21.5: Euler’s Summation Transformation

(Seen in a video by Dr. Michael Penn here.)

Given a sequence {an }n∈N , we have the equality

Σ_{n=0}^{∞} (−1)^n a_n = Σ_{n=0}^{∞} [ (−1)^n / 2^{n+1} ] Σ_{k=0}^{n} (−1)^k C(n, k) a_{n−k}

(C(n, k) the binomial coefficient)

provided the series converge.


It is useful in two senses:

ˆ The transformed series typically converges much faster than the original, so it can accelerate the numerical summation of slowly-converging alternating series

ˆ It can provide means to evaluate certain identities; for instance, choosing an = 1/(2n + 1) gives

π/2 = Σ_{n=0}^{∞} n! / (2n + 1)!!

as in the video. The underlying value comes from simply noting that

Σ_{n=0}^{∞} (−1)^n/(2n+1) = [ Σ_{n=0}^{∞} ((−1)^n/(2n+1)) x^{2n+1} ] evaluated from x = 0 to 1

    = ∫₀¹ ( Σ_{n=0}^{∞} (−1)^n x^{2n} ) dx

    = ∫₀¹ 1/(1 + x²) dx

    = arctan(x) |₀¹

    = π/4

whereas the RHS comes from applying the Euler transformation to this series: the transformed terms simplify to n!/(2 · (2n + 1)!!), and doubling gives the stated identity. (How one might use this to evaluate a
given sum, as opposed to using a given identity to derive others, is beyond me.)
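A minimal Python sketch of the identity applied to a_n = 1/(2n + 1): ten raw Leibniz terms still miss π/4 in the second digit, while ten transformed terms are accurate to several digits.

    from math import comb, pi

    def euler_transform_partial(a, N):
        # sum_{n < N} (-1)^n / 2^(n+1) * sum_{k <= n} (-1)^k C(n,k) a(n-k)
        total = 0.0
        for n in range(N):
            inner = sum((-1) ** k * comb(n, k) * a(n - k) for k in range(n + 1))
            total += (-1) ** n / 2 ** (n + 1) * inner
        return total

    a = lambda n: 1 / (2 * n + 1)
    direct = sum((-1) ** n * a(n) for n in range(10))        # ~0.7605
    print(direct, euler_transform_partial(a, 10), pi / 4)    # ..., ~0.78514, 0.78540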

§21.6: Formulas for the Primes

Let pn denote the nth prime.


Willans’ formula (Wikipedia; YouTube explanation) gives
p_n = 1 + Σ_{i=1}^{2^n} ⌊ ( n / Σ_{j=1}^{i} ⌊ cos²( π · ((j − 1)! + 1)/j ) ⌋ )^{1/n} ⌋

It takes O(2^n) operations.
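Because ⌊cos²(π((j−1)! + 1)/j)⌋ is 1 exactly when j = 1 or j is prime (Wilson's theorem), and ⌊(n/c)^{1/n}⌋ is 1 exactly when 1 ≤ c ≤ n, the whole formula can be evaluated in exact integer arithmetic — a minimal Python sketch:

    from math import factorial

    def willans(n):
        def wilson_indicator(j):   # floor(cos^2(pi((j-1)!+1)/j)): 1 iff j = 1 or j prime
            return 1 if (factorial(j - 1) + 1) % j == 0 else 0
        total = 0
        count = 0                  # running sum of the indicator over j <= i
        for i in range(1, 2 ** n + 1):
            count += wilson_indicator(i)
            total += 1 if count <= n else 0   # = floor((n/count)^(1/n))
        return 1 + total

    print([willans(n) for n in range(1, 6)])   # [2, 3, 5, 7, 11]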


A formula claimed on Reddit (not reproduced here) builds p_n from a sum of roughly 2^{1+n} floored arccotangent terms, each applied to 1 − n plus a running count of primes, with primality detected through sums of cos²(πℓ/m) over candidate divisors m. It seems to have polynomial complexity, around O(n⁴).

§21.7: Lagrange Interpolation

(Some info on Wikipedia.)

Given a set of points {(x_i, y_i)}_{i=0}^{n} with the x_i distinct, through which we want a polynomial of
degree at most n, we start as so...

Begin with the Lagrange basis {ℓ_j}_{j=0}^{n} for polynomials through those points, whereby ℓ_j(x_i) = δ_{i,j} in
the Kronecker sense. Explicitly,

ℓ_j(x) = ∏_{0≤i≤n, i≠j} (x − x_i)/(x_j − x_i)

Then the interpolating polynomial L through the data points is given by

L(x) := Σ_{j=0}^{n} y_j ℓ_j(x) = Σ_{j=0}^{n} y_j ∏_{0≤i≤n, i≠j} (x − x_i)/(x_j − x_i)
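A minimal Python sketch evaluating L(x) straight from the basis products (here checked against y = 2x² + 1, which passes through the three sample points):

    def lagrange_eval(points, x):
        total = 0.0
        for j, (xj, yj) in enumerate(points):
            ell = 1.0                        # build l_j(x) as a running product
            for i, (xi, _) in enumerate(points):
                if i != j:
                    ell *= (x - xi) / (xj - xi)
            total += yj * ell
        return total

    pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 9.0)]   # samples of y = 2x^2 + 1
    print(lagrange_eval(pts, 3.0))               # 19.0, since 2*9 + 1 = 19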

§21.8: Induced Metrics & Norms

Given a normed vector space (V, ∥·∥), it induces a function that is a metric:

d(x, y) := ∥x − y∥

Likewise, given an inner product space (V, ⟨·, ·⟩), it induces a norm and a metric:
p p
∥x∥ := ⟨x, x⟩ d(x, y) := ∥x − y∥ = ⟨x − y, x − y⟩

We can see whether a metric d is induced by some norm: a norm-induced metric satisfies (cf. Kreyszig,
Lemma 2.2-9)

ˆ d(x + z, y + z) = d(x, y) (translation invariance)

ˆ d(αx, αy) = |α| · d(x, y) (like absolute homogeneity)

for every x, y, z in the vector space, and all scalars α. Conversely, any metric on a vector space with both
properties is induced by the norm ∥x∥ := d(x, 0).


A norm ∥·∥ is induced by an inner product iff the parallelogram law is satisfied:

∥x + y∥² + ∥x − y∥² = 2∥x∥² + 2∥y∥²

(in which case, over R, the inner product is recovered via polarization: ⟨x, y⟩ = (∥x + y∥² − ∥x − y∥²)/4).
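A quick numeric illustration in Python: on R², the ℓ² norm satisfies the parallelogram law while the ℓ¹ norm does not, so ∥·∥₁ is not induced by any inner product.

    def parallelogram_gap(norm, x, y):
        add = [a + b for a, b in zip(x, y)]
        sub = [a - b for a, b in zip(x, y)]
        return norm(add)**2 + norm(sub)**2 - 2 * norm(x)**2 - 2 * norm(y)**2

    l1 = lambda v: sum(abs(t) for t in v)
    l2 = lambda v: sum(t * t for t in v) ** 0.5

    x, y = (1.0, 0.0), (0.0, 1.0)
    print(parallelogram_gap(l2, x, y))   # ~0.0: the law holds
    print(parallelogram_gap(l1, x, y))   # 4.0: the law fails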

§21.9: Pochhammer Symbols (Rising & Falling Factorials)

(Some info on Wikipedia.)

§21.10: Special Indicator Functions

These are just various indicator functions that I think are neatly framed. While strictly speaking one
can just use the standard indicator/characteristic definition of
1_A(x) ≡ χ_A(x) := { 1 if x ∈ A;  0 if x ∉ A }

this isn’t very enlightening and may prove unhelpful for some algebraic arguments. Some I’ve found will
follow:

ˆ Kronecker δ: δx,y ≡ 1{x} (y) ≡ 1{y} (x)



ˆ Indicator function of squares: f(n) := ⌊√n⌋ − ⌊√(n − 1)⌋ (for n ≥ 1)
ˆ Indicator function of multiples of n: Let ζ be the first nth root of unity, in the sense that
ζ := exp(2πi/n) and it is the first complex root of unity counterclockwise from 1. Then the function
f(x) = (1/n) Σ_{k=0}^{n−1} ζ^{kx}

is an indicator function of the multiples of n, i.e. f = 1nZ or f (x) = 1 ⇐⇒ n | x.


ˆ Prime Indicator Function: From Willans’ formula for primes (video explanation). Let
f(x) = ⌊ cos²( π · ((x − 1)! + 1)/x ) ⌋

Then f(x) = 1 for x = 1 or x prime, and f(x) = 0 otherwise. The motivation comes from Wilson's
theorem (Wikipedia): for n > 1, (n − 1)! ≡ −1 (mod n) iff n is prime, i.e. n | 1 + (n − 1)! iff n is prime.
ˆ Indicator of Quadratic Residues: (Of sorts.) The Legendre, Jacobi, and Kronecker symbols; fairly
self-explanatory.
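Minimal Python sketches of the square, multiple-of-n, and prime indicators above (standard library only; the Wilson-theorem floor is computed exactly via the divisibility it encodes):

    from math import isqrt, factorial
    from cmath import exp, pi

    def is_square(n):                   # floor(sqrt(n)) - floor(sqrt(n-1)), n >= 1
        return isqrt(n) - isqrt(n - 1)

    def is_multiple(n, x):              # (1/n) sum of zeta^(kx) over k, rounded
        z = sum(exp(2j * pi * k * x / n) for k in range(n)) / n
        return round(z.real)

    def is_one_or_prime(x):             # floor(cos^2(pi((x-1)!+1)/x)) via Wilson
        return 1 if (factorial(x - 1) + 1) % x == 0 else 0

    print([is_square(n) for n in range(1, 11)])        # 1 0 0 1 0 0 0 0 1 0
    print([is_multiple(3, x) for x in range(7)])       # 1 0 0 1 0 0 1
    print([is_one_or_prime(x) for x in range(1, 12)])  # 1 1 1 0 1 0 1 0 0 0 1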

§21.11: Vieta’s Formulas

(Some info on Wikipedia.)

Consider p ∈ C[x] written as


p(x) = Σ_{k=0}^{n} a_k x^k

with roots (not necessarily distinct) r1 , · · ·, rn . Vieta’s formulas relate the ai and ri :
ˆ Σ_{i=1}^{n} r_i = −a_{n−1}/a_n (sum of roots)

ˆ Σ_{1≤i<j≤n} r_i r_j = a_{n−2}/a_n

ˆ ⋮

ˆ r_1 r_2 ··· r_n = (−1)^n a_0/a_n (product of roots)

More generally: for k ∈ {1, · · ·, n},

Σ_{1≤i₁<i₂<···<i_k≤n} ∏_{j=1}^{k} r_{i_j} = (−1)^k a_{n−k}/a_n

or, in a less cumbersome fashion,

Σ_{A⊆{1,2,···,n}, |A|=k} ∏_{i∈A} r_i = (−1)^k a_{n−k}/a_n

In a quadratic p(x) = ax² + bx + c, for instance,

r1 + r2 = −b/a,   r1 r2 = c/a

and for a cubic p(x) = ax³ + bx² + cx + d,

r1 + r2 + r3 = −b/a,   r1 r2 + r1 r3 + r2 r3 = c/a,   r1 r2 r3 = −d/a
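A minimal Python check: expand ∏(x − r_i) from chosen roots, then compare each elementary symmetric sum against (−1)^k a_{n−k}/a_n read off the coefficients.

    from itertools import combinations
    from math import prod

    roots = [1.0, 2.0, 3.0]

    coeffs = [1.0]                    # coefficients, highest degree first
    for r in roots:                   # multiply the running product by (x - r)
        new = coeffs + [0.0]
        for i, c in enumerate(coeffs):
            new[i + 1] -= r * c
        coeffs = new                  # ends as [1, -6, 11, -6]

    n = len(roots)
    for k in range(1, n + 1):
        e_k = sum(prod(c) for c in combinations(roots, k))
        vieta = (-1) ** k * coeffs[k] / coeffs[0]   # coeffs[k] is a_{n-k}
        print(k, e_k, vieta)          # pairs agree: 6, 11, 6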

§21.12: Volume of ℓp unit ball in Rn

(Some info on Wikipedia.)

Recall: on Rn or even Cn , for p ≥ 1, we may define the norms

∥x∥_p := ( Σ_{i=1}^{n} |x_i|^p )^{1/p}   where x := (x_i)_{i=1}^{n} ∈ F^n ∈ {R^n, C^n}

with p = 2 being the usual Euclidean norm. We focus on Rn for now.


The unit ball is then defined by
B := B_{n,p,0,1} := { x ∈ R^n : ∥x∥_p < 1 }

We can show its volume to be given by, in full generality,


volume of B = (2Γ(1 + 1/p))^n / Γ(1 + n/p)

with an extra factor of r^n if the ball is of radius r.


Let π_p be π for any such p-norm, i.e. the ratio of circumference to diameter for the unit circle in
(R², ∥·∥_p), with arc length also measured in the p-norm. With this definition in mind, one shows that

π_p = 2 ∫₀¹ ( 1 + | (d/dx)(1 − x^p)^{1/p} |^p )^{1/p} dx

(parametrize the quarter circle as y = (1 − x^p)^{1/p}; the p-norm element of arc length is (|dx|^p + |dy|^p)^{1/p})

for which the “usual π,” π := π2 , is a minimum. Some MSE discussion here.
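A minimal Python check of the volume formula against familiar bodies (the Euclidean ball, the ℓ¹ cross-polytope, and the p → ∞ cube limit):

    from math import gamma, pi

    def lp_ball_volume(n, p, r=1.0):
        # (2*Gamma(1 + 1/p))^n / Gamma(1 + n/p), times r^n for radius r
        return (2 * gamma(1 + 1 / p)) ** n / gamma(1 + n / p) * r ** n

    print(lp_ball_volume(3, 2), 4 * pi / 3)    # both ~4.18879 (Euclidean ball)
    print(lp_ball_volume(3, 1))                # 4/3: the octahedron |x|+|y|+|z| < 1
    print(lp_ball_volume(3, 200.0))            # ~8: approaching the cube (-1,1)^3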

§21.13: Weierstrass Factorization Theorem (Infinite Products In Terms of Roots)

(Some info on Wikipedia.)

The Weierstrass factorization theorem takes a few forms, and generalizes (in some sense) the fundamental
theorem of algebra to the case of (countably) infinitely-many roots.

ˆ Firstly: Elementary Factors: We define

E_n(z) := 1 − z  for n = 0,   E_n(z) := (1 − z) exp( Σ_{k=1}^{n} z^k/k )  for n ≥ 1

Lemma 15.8 of a Rudin work gives that |z| < 1 =⇒ |1 − E_n(z)| ≤ |z|^{n+1}.
ˆ Observation: Take {a_n}_{n∈N} ⊆ C_{≠0} with |a_n| → ∞, and {p_n}_{n∈N} ⊆ Z_{≥0} with

Σ_{n=1}^{∞} ( r/|a_n| )^{1+p_n} < ∞,  for all r > 0

(Such p_n always exist; p_n = n − 1 always works.) Then if we define

f(z) := ∏_{n=1}^{∞} E_{p_n}( z/a_n )

then f is entire and f (z) = 0 ⇐⇒ z = an for some n. If z is in {an }n∈N m times, then z is a root of
f of multiplicity m.
ˆ Weierstrass Factorization: Let f be entire with a zero of multiplicity m ≥ 0 at 0, and {a_n}_{n∈N} the
nonzero roots of f, repeated according to multiplicity. Then ∃g entire and ∃{p_n}_{n∈N} ⊆ Z_{≥0} such that

f(z) = z^m e^{g(z)} ∏_{n=1}^{∞} E_{p_n}( z/a_n )

Some common examples:


ˆ sin(πz) = πz ∏_{n∈Z, n≠0} (1 − z/n) e^{z/n} = πz ∏_{n=1}^{∞} (1 − z²/n²)

ˆ sin(z) = z ∏_{n=1}^{∞} (1 − z²/(π²n²))

ˆ cos(πz) = ∏_{n∈Z, n odd} (1 − 2z/n) e^{2z/n} = ∏_{n=0}^{∞} (1 − (z/(n + 1/2))²)

ˆ cos(z) = ∏_{n=1}^{∞} (1 − 4z²/((2n − 1)²π²))

ˆ 1/Γ(z) = z e^{γz} ∏_{n=1}^{∞} (1 + z/n) e^{−z/n}  for γ the Euler-Mascheroni constant
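A quick numeric sanity check in Python: truncating the sin(πz) product above at N factors converges (slowly, at rate roughly z²/N) to the true value.

    from math import sin, pi

    def sin_product(z, N=100000):
        out = pi * z
        for n in range(1, N + 1):
            out *= 1 - z * z / (n * n)
        return out

    print(sin_product(0.3), sin(pi * 0.3))   # ~0.8090163 vs 0.8090170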

§22: Reference Tables for Formulas, LATEX Stuff, & More
ˆ LATEX Stuff: (Not really important for formulary material but nice to know.)

◦ LATEX Compilers:
Overleaf (link): Cloud-based editor for LATEX, good for collaboration.
MiKTeX (link): A local LATEX distribution (it ships with the TeXworks editor) for compiling offline.
◦ Drawing Tools:
Ipe (link): Drawing editor which supports LATEX and is generally good for clean drawings,
whilst avoiding the complications of TikZ.
Here is a site to generate simple TikZ diagrams and get the corresponding code. Good for
basics in abstract algebra, category theory, and graph theory.
◦ Tutorial Items:
Math Stack Exchange has a MathJax reference here; MathJax is syntactically almost identical
to LATEX.
Overleaf has some tutorials here.
Commands and swatches for colors are available here. (My LATEX “header” file for them is
downloadable here.)
Here is a site to draw math symbols and get the corresponding LATEX symbol.
◦ Miscellaneous Items:
IguanaTeX (link): A PowerPoint plugin to give proper LATEX support.
Here is a site that can take a PDF and produce (roughly) the LATEX used to make it.
Here is a site that, while not LATEX-specific, can freely edit PDFs.

ˆ Trigonometry Reference Tables: (This formulary has a lot, but these can be more compact.)

◦ Trig “cheat sheet” from Paul’s Online Math Notes


◦ Various exact special trig values
◦ Various trigonometry identities and those for the arcfunctions

ˆ Integral Tables:

◦ From Pearson’s University Calculus: Early Transcendentals


◦ From Paul’s Online Math Notes (with general calculus notes)
◦ From integral-table.com
◦ Table of reduction formulas from Wikipedia

ˆ Advanced Calculus Stuff: (Convergence tests, inequalities, integral transforms, and more.)

◦ Various inequalities for sums, probability distributions, integrals, and more, with a list of named inequalities on Wikipedia
◦ Derivatives and integrals in nonstandard calculi, e.g. product integrals
◦ A compilation of various mathematical series
◦ Tables of Laplace transforms from Paul’s Online Math Notes and Wikipedia
◦ A table of basic Fourier transforms
◦ A table of basic Fourier series
◦ A compilation of various tests for infinite series convergence

ˆ Alternative 3D coordinate systems and conversions:

◦ A table converting operators like curl, Laplacian, etc. on Wikipedia (Imgur backup)
◦ The NRL Plasma Formulary

ˆ Probability Approximation Tables:

◦ χ2 distribution table
◦ Normal distribution z-score tables using tails (i.e. P (−∞ < X < z)) and distance from mean (i.e. P (0 ≤ X ≤ z))
◦ Student’s t distribution critical values

ˆ Miscellaneous/Uncategorized References:

◦ Massive list of set-theoretic identities


◦ Densities, expectations, variances, MGFs of various probability distributions
◦ A table of values (n ∈ {1, · · ·, 500}) for various analytic number theory functions: the Euler φ,
Möbius' µ, Mertens' M, Liouville's λ, its partial sums L, von Mangoldt's Λ and its exponential, and
the divisor sum functions σα(n) for α = 0, 1 (each raises the divisors of n to the power α and adds
them up)
