Personal Formulary
Table of Contents

3.7 Convergence Tests for Integrals
3.8 Fourier Series
    3.8.1 Basic Definitions
    3.8.2 Some Properties & Results
    3.8.3 Some Common Fourier Series
3.9 Fourier Transforms
    3.9.1 Basic Definitions & Conventions
    3.9.2 Important Properties
    3.9.3 Common Fourier Transforms
    3.9.4 Discrete Fourier Transform (DFT)
3.10 Laplace Transforms
    3.10.1 Basic Definitions & Reference Tables
    3.10.2 Laplace Transform Properties
    3.10.3 Common Laplace Transforms
3.11 Cauchy Principal Value (PV/CPV)
8.17 The Power Method / Power Iteration / von Mises Iteration
8.18 Definiteness: Positive & Negative (Semi-)Definite
8.19 Dual Spaces; Adjoints
8.20 Various Matrix Decompositions
    8.20.1 Eigendecomposition / Spectral Decomposition
    8.20.2 Singular Value Decomposition (SVD)
    8.20.3 QR Factorization
    8.20.4 Householder Triangularization & QR Stuff
    8.20.5 Hessenberg Matrices
    8.20.6 Cholesky Factorization
    8.20.7 Schur Decomposition
11.3.3.6 Right Euclidean
11.3.3.7 Transitivity
11.3.4 Comparability Properties
    11.3.4.1 Connectedness
    11.3.4.2 Converse Well-Founded
    11.3.4.3 Trichotomous
    11.3.4.4 Well-Founded
11.3.5 Function-Like Properties
    11.3.5.1 Bijectivity
    11.3.5.2 Functional
    11.3.5.3 Injectivity
    11.3.5.4 (Left-)Totality / Seriality
    11.3.5.5 Surjectivity
11.4 Combinations of Properties
    11.4.1 Dense Posets
    11.4.2 Dependencies
    11.4.3 Equivalence / Equivalence Relations
    11.4.4 Partial Equivalence Relation
    11.4.5 Partial Orders / Posets
    11.4.6 Preorders
    11.4.7 Prewellorders
    11.4.8 Pseudo-Orders
    11.4.9 Strict Partial Orders
    11.4.10 Strict Total Order
    11.4.11 Total Orders
    11.4.12 Total Preorders
    11.4.13 Tournaments
    11.4.14 Well-order
11.5 Basic Operations & Derived Relations
    11.5.1 Property Closure
    11.5.2 Property Reduction
    11.5.3 Relation Composition
    11.5.4 Transpose of Relation
12.6 (Dummit & Foote, Chapter 2) Group Theory: Subgroups
12.7 (Dummit & Foote, Chapter 3) Group Theory: Quotients; Homomorphisms
12.8 (Dummit & Foote, Chapter 4) Group Theory: More on Actions
12.9 (Dummit & Foote, Chapter 7) Ring Theory: Basic Definitions/Examples
12.10 (Dummit & Foote, Chapter 7) Ring Theory: Homomorphisms, Quotients, Ideals
12.11 (Dummit & Foote, Chapter 8) Ring Theory: Domains (Euclidean, PIDs, UFDs)
12.12 (Dummit & Foote, Chapter 13) Field Theory: Basics of Field Extensions
12.13 (Dummit & Foote, Chapter 13) Field Theory: Algebraic Extensions
12.14 (Dummit & Foote, Chapter 13) Field Theory: Splitting Fields; Algebraic Closures
12.15 (Dummit & Foote, Chapter 13) Field Theory: Separability
12.16 (Dummit & Foote, Chapter 14) Galois Theory: Basic Definitions
12.17 (Dummit & Foote, Chapter 14) Galois Theory: The Fundamental Theorem
16 Items from Complex Analysis
    16.1 Complex Differentiation
    16.2 Complex Integration
    16.3 Auxiliary Inequalities/Results for Contour Integrals
19.9 Polylogarithms – Liₙ(x)
19.10 Trig Integrals – Si(x), Ci(x), etc.
§1: Miscellaneous Topics from Algebra
(log(x) and ln(x) may be used interchangeably, but note that the base log(x) implies generally depends on the field of study: in mathematics it is e, hence ln(x).)
◦ ax+y = ax ay
◦ ax−y = ax /ay
◦ (ax )y = axy = (ay )x
◦ (ab)x = ax bx
◦ a0 = 1
◦ 0^x = 0 for x > 0
◦ a^(1/x) = ˣ√a
Logarithms: Be mindful that these need not extrapolate to C. Assume a, x, y > 0 and a ≠ 1.
§1.2: Denesting Roots
Introduction:
We motivate this by discussing the denesting of
∜(89 − 28√10)   (1)
Well, we can expand the right-hand side by the binomial theorem (writing C(n, k) for the binomial coefficient). Notice that, then,
(a + b·ᵐ√C)ⁿ = Σ_{k=0}^{n} C(n, k)·aᵏ·(b·ᵐ√C)^{n−k}
√ ′√
From here, we’ll note that α+β γ = α′ +β γ if and only if α = α′ and β = β ′ . That is, we’ll factor out our
radical after expanding via the binomial theorem, set like coefficients equal, and do whatever manipulations
necessary to get to an end result.
Broadly, these subsequent manipulations consist of the following:
We'll multiply one equation by a constant so the constant term in each is equal; the variable terms must then be equal.
Once we have an equation in integer coefficients, we divide by the highest power of b.
If r is a root, then this means a/b = r ⟺ a = br. We substitute this back into our system of equations to see whether it yields solutions or not.
Starting Out:
In (1), then, we want a, b such that
89 − 28√10 = (a + b√10)⁴
Utilizing the Roots:
Ultimately, you’ll need to use whatever tools are at your disposal for solving polynomial equations.
Inspection, graphing, the rational root theorem; it ultimately depends. Here, we can find that the roots of this are
x = −5, x = −2, x = −(20 ± 3i√10)/7
We’ll focus on the real roots, namely x = −5. Then
x = −5 ⟹ a/b = −5 ⟹ a = −5b
Consider our initial pair of equations in (3), and replace a with −5b. Simplifying,
89 = 2225b⁴
−28 = −700b⁴
each of which gives b⁴ = 1/25.
Conclusion:
This gives us
89 − 28√10 = (−√5 + (1/√5)·√10)⁴ = (√2 − √5)⁴
Taking fourth roots, then, we get
∜(89 − 28√10) = |√2 − √5|
(Recall that √(x²) = |x|? The idea is the same here.) Well, clearly,
|√2 − √5| = |√5 − √2| = √5 − √2
since √5 > √2 and |x − y| = |y − x| (just multiply by −1).
Finally, then, we have our denested expression:
∜(89 − 28√10) = √5 − √2
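Identities like this are cheap to falsify numerically; a quick Python sanity check of the result (a minimal sketch, not part of the derivation above):

    import math

    lhs = (89 - 28 * math.sqrt(10)) ** 0.25   # the nested fourth root
    rhs = math.sqrt(5) - math.sqrt(2)         # the denested form
    print(lhs, rhs)                           # both ~0.82187...
    assert math.isclose(lhs, rhs, rel_tol=1e-12)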
Addendum: What does the no-solution-from-a-root case look like?
Consider now a different case: what if x = −1 in our roots? Then
x = a/b = −1 ⟹ a = −b
Then our equations of concern in (3) become
89 = 161b⁴
−28 = −44b⁴
Note that if you multiply the second equation by −1, and then by 89/28 (so the left-hand sides are equal), the coefficient on the right-hand side becomes 979/7 ≈ 139.857 ≠ 161. In this case, then, no b works; sometimes you'll have to try multiple different roots.
§1.3: Tricks & Identities for Factorizing and Root-Finding
The Usual, For Quadratics: If ax² + bx + c = 0 with a, b, c ∈ Z, try finding factors p, q of ac that sum to b. (Hence pq = ac and p + q = b.) Then
ax² + bx + c = a(x + p/a)(x + q/a)
Complete the Square (Quadratics): Given x² + bx + c = 0, move c to the other side, then take half of b, square it, and add it to both sides. On the LHS, you get
x² + bx + (b/2)² = (x + b/2)²
Conjugate Root Theorem: Following this up, if α + βi ∈ C is a root of p ∈ R[x] for α ∈ R and β ∈ R, β ≠ 0, then α − βi is a root too.
Reversed Coefficients & Reciprocal Roots: (More of a root-finding theorem.) Suppose that
p(x) = Σ_{i=0}^{n} aᵢxⁱ = a₀ + a₁x + a₂x² + a₃x³ + ··· + aₙxⁿ
has root r ≠ 0. Then 1/r is a root of xⁿp(1/x) (and in particular satisfies p(1/x) = 0). Note that
xⁿ·p(1/x) = Σ_{i=0}^{n} aᵢx^{n−i} = a₀xⁿ + a₁x^{n−1} + a₂x^{n−2} + ··· + a_{n−1}x + aₙ
Note that, hence, if r is a root of a polynomial, then 1/r is a root of the polynomial with reversed coefficient order.
Binomial Theorem: (a + b)ⁿ = Σ_{k=0}^{n} C(n, k)·aᵏb^{n−k} or (1 + x)ⁿ = Σ_{k=0}^{n} C(n, k)·xᵏ
◦ This is where the perfect square trinomial idea, (x + a)2 = x2 + 2ax + a2 for instance, comes from.
Multinomial Theorem: (Σ_{i=1}^{m} xᵢ)ⁿ = Σ_{k₁+k₂+···+k_m=n, kᵢ∈Z≥0 ∀i} C(n; k₁, k₂, ···, k_m)·Π_{j=1}^{m} x_j^{k_j}, where C(n; k₁, ···, k_m) = n!/(k₁!···k_m!) is the multinomial coefficient
Sum/Difference of Powers: For the sums, apply the geometric series formula below with n odd and b ↦ −b, so that (−b)ⁿ = −bⁿ; for the differences, apply it directly.
◦ Difference of Squares: a² − b² = (a − b)(a + b)
◦ Difference of Fourths: a⁴ − b⁴ = (a − b)(a³ + a²b + ab² + b³)
◦ Difference of Powers: aⁿ − bⁿ = (a − b)(a^{n−1} + a^{n−2}b + ··· + b^{n−1}) for any n ∈ Z≥1 (in particular, n even)
◦ Sum of Cubes: a³ + b³ = (a + b)(a² − ab + b²)
◦ Sum of Fifths: a⁵ + b⁵ = (a + b)(a⁴ − a³b + a²b² − ab³ + b⁴)
◦ Sum of Odd Powers: aⁿ + bⁿ = (a + b)(a^{n−1} − a^{n−2}b + ··· + b^{n−1}) for n odd
Finite Geometric Series: The factorizations above follow from this, with x = a/b and then multiplying by bⁿ:
xⁿ − 1 = (x − 1)(x^{n−1} + x^{n−2} + ··· + x + 1)
More Obscure/Niche Tricks:
Rational Root Theorem: (Wikipedia) Take a polynomial with integer coefficients p ∈ Z[x], with leading coefficient ℓ and constant coefficient c. Then the only possible rational roots are of the form
±a/b where a | c and b | ℓ
Any rational root will have this form, but not all combinations will be roots (it is possible for all roots to be irrational, even); a brute-force search over the candidates is sketched below.
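A minimal sketch of that candidate search (the helper rational_roots is illustrative, not from the original text; it assumes a nonzero constant term and uses exact Fraction arithmetic):

    from fractions import Fraction

    def rational_roots(coeffs):               # coeffs = [a0, a1, ..., an], a0 != 0
        c, lead = coeffs[0], coeffs[-1]
        divs = lambda m: [d for d in range(1, abs(m) + 1) if m % d == 0]
        cands = {s * Fraction(a, b)
                 for a in divs(c) for b in divs(lead) for s in (1, -1)}
        p = lambda r: sum(co * r**k for k, co in enumerate(coeffs))
        return sorted(r for r in cands if p(r) == 0)

    # x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3):
    print(rational_roots([-6, 11, -6, 1]))    # [Fraction(1), Fraction(2), Fraction(3)]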
Descartes’ Rule of Signs: (Wikipedia) Descartes’ rule of signs can give the number of positive and
negative roots, or at least narrow it down.
Write your polynomial in descending power order:
p(x) = aₙxⁿ + a_{n−1}x^{n−1} + a_{n−2}x^{n−2} + ··· + a₂x² + a₁x + a₀
◦ # of Positive Roots (r_p): Count the number of sign changes between consecutive (nonzero) terms. Let this number be s. Then r_p = s − 2k for some k ∈ {0, 1, 2, ···}. (Of course, r_p ≥ 0. Hence, if s = 0, 1, then r_p = s.)
◦ # of Negative Roots (r_n): Find p(−x) and apply the positive test to it, or negate the coefficients of odd power only and then apply the positive test. The r_p you find there becomes r_n, with the same caveat: r_n = s − 2ℓ for some ℓ ∈ Z≥0, with the restriction r_n ≥ 0. A sign-change counter is sketched below.
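A small sketch of the sign-change count (the helper sign_changes is mine, not from the original text; sign changes are invariant under an overall negation, so the p(−x) step below is only correct up to a harmless global sign):

    def sign_changes(coeffs):
        signs = [c > 0 for c in coeffs if c != 0]   # zeros are skipped
        return sum(a != b for a, b in zip(signs, signs[1:]))

    # p(x) = x^3 - x^2 - x + 1 = (x-1)^2 (x+1), descending power order:
    p = [1, -1, -1, 1]
    p_neg = [c * (-1) ** k for k, c in enumerate(p)]  # p(-x), up to overall sign
    print(sign_changes(p), sign_changes(p_neg))       # 2 and 1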
An Extension of Completing The Square: Say we have x^{4n} + a² and wish to factor it. We can do a completing-the-square-like method. Ideally, we would have
x^{4n} + a² =? (x^{2n} + a)²
but in reality
(x^{2n} + a)² = x^{4n} + 2ax^{2n} + a²
To mitigate this, add and subtract 2ax^{2n}, the term that gives a perfect square. Then leveraging difference of squares yields
x^{4n} + a² = x^{4n} + 2ax^{2n} + a² − 2ax^{2n}
 = (x^{2n} + a)² − 2ax^{2n}
 = (x^{2n} + a)² − (√(2a)·xⁿ)²
 = (x^{2n} + a − √(2a)·xⁿ)(x^{2n} + a + √(2a)·xⁿ)
For instance,
x⁴ + 4 = x⁴ + 4x² + 4 − 4x²
 = (x² + 2)² − 4x²
 = (x² + 2 − 2x)(x² + 2 + 2x)
x⁸ + 81 = x⁸ + 18x⁴ + 81 − 18x⁴
 = (x⁴ + 9)² − 18x⁴
 = (x⁴ + 9 − 3√2·x²)(x⁴ + 9 + 3√2·x²)
Sturm's Theorem: (Wikipedia) We consider a polynomial p with no repeated roots. Form the Sturm sequence of p, S_p := {pᵢ}_{i=0}^{deg(p)}, with
p₀ = p
p₁ = p′
pᵢ₊₁ = −rem(pᵢ₋₁, pᵢ) (the negated remainder on polynomial division, continued until a constant)
Take ξ ∈ dom(p) and consider the sequence p₀(ξ), p₁(ξ), ···, p_{deg(p)}(ξ). Let V(ξ) be the number of sign changes at ξ in this sequence (à la Descartes' rule of signs, ignoring zeros).
Sturm's theorem states the number of roots of p in (a, b] is V(a) − V(b).
We may extend to rays and to all of R with the appropriate conventions for V(±∞).
(WLOG, we may let a < b < c.) Let m := (a + b)/2, the average of the two smallest roots. Then the tangent line to f at m is given by
y = f(m) + f′(m)(x − m)
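A hedged check of Sturm's theorem with sympy (assuming sympy is available; sp.sturm returns the Sturm sequence, and sp.count_roots gives an independent count):

    import sympy as sp

    x = sp.symbols('x')
    p = x**4 - 3*x**2 + 1            # squarefree quartic with 4 real roots
    seq = sp.sturm(p)                # p, p', then negated remainders

    def V(xi):                       # sign changes at xi, zeros ignored
        signs = [sp.sign(q.subs(x, xi)) for q in seq]
        signs = [s for s in signs if s != 0]
        return sum(a != b for a, b in zip(signs, signs[1:]))

    a, b = 0, 2
    print(V(a) - V(b))               # roots in (0, 2]: expect 2
    print(sp.count_roots(p, a, b))   # sympy's own count: 2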
External Links to Extremely Niche/Specialized Topics:
Ferrari’s Method for Quartics: (Encyclopedia of Math link) A means of turning solving a quartic
into an issue of solving cubics and quadratics.
Gauss-Lucas Theorem: (Wikipedia) For p ∈ C[z] non-constant, the roots of p′ are in the convex
hull of those from p.
Kronecker's Method: (Wikipedia) Uses that x ∈ Z ⟹ p(x) ∈ Z for p ∈ Z[x]. Requires a lot of brute force, however, and is not recommended for hand computation.
§1.4: Quadratic, Cubic, Quartic Formulas
Quadratic Formula
The quadratic formula is familiar and kept as a formality:
ax² + bx + c = 0 ⟹ x = (−b ± √(b² − 4ac))/(2a)
Cubic Formula
The cubic formula, for ax³ + bx² + cx + d = 0, is given by first defining
p = −b/(3a)
q = p³ + (bc − 3ad)/(6a²)
r = c/(3a)
Then
x = ∛(q + √(q² + (r − p²)³)) + ∛(q − √(q² + (r − p²)³)) + p
The previous writing makes the use of complex roots implicit. To mitigate this, one may write the three solutions explicitly. Abbreviating
D := (−b³/(27a³) + (bc − 3ad)/(6a²))² + (c/(3a) − b²/(9a²))³
the solutions are
x₁ = ∛(−b³/(27a³) + (bc − 3ad)/(6a²) + √D) + ∛(−b³/(27a³) + (bc − 3ad)/(6a²) − √D) − b/(3a)
x₂ = −((1 − i√3)/2)·∛(−b³/(27a³) + (bc − 3ad)/(6a²) + √D) − ((1 + i√3)/2)·∛(−b³/(27a³) + (bc − 3ad)/(6a²) − √D) − b/(3a)
x₃ = −((1 + i√3)/2)·∛(−b³/(27a³) + (bc − 3ad)/(6a²) + √D) − ((1 − i√3)/2)·∛(−b³/(27a³) + (bc − 3ad)/(6a²) − √D) − b/(3a)
or, for the analogous shortened version (note D = q² + (r − p²)³),
x₁ = ∛(q + √(q² + (r − p²)³)) + ∛(q − √(q² + (r − p²)³)) + p
x₂ = −((1 − i√3)/2)·∛(q + √(q² + (r − p²)³)) − ((1 + i√3)/2)·∛(q − √(q² + (r − p²)³)) + p
x₃ = −((1 + i√3)/2)·∛(q + √(q² + (r − p²)³)) − ((1 − i√3)/2)·∛(q − √(q² + (r − p²)³)) + p
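A numeric sanity check of the compact form, in the one-real-root case so that plain real cube roots suffice (a minimal sketch, not a robust solver; the test cubic x³ + x − 2 = 0 with root x = 1 is my illustration):

    import math

    def cbrt(v):                      # real cube root, handles negatives
        return math.copysign(abs(v) ** (1 / 3), v)

    a, b, c, d = 1, 0, 1, -2          # x^3 + x - 2 = 0, real root x = 1
    p = -b / (3 * a)
    q = p**3 + (b * c - 3 * a * d) / (6 * a**2)
    r = c / (3 * a)

    disc = q**2 + (r - p**2) ** 3     # >= 0 here: exactly one real root
    x = cbrt(q + math.sqrt(disc)) + cbrt(q - math.sqrt(disc)) + p
    print(x)                          # ~1.0
    assert math.isclose(a * x**3 + b * x**2 + c * x + d, 0, abs_tol=1e-9)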
Quartic Formula:
The formula for equations of the type ax⁴ + bx³ + cx² + dx + e = 0 is significantly worse, worry not.
Define the shorthands
where each of the four roots arises from a distinct choice of + or − on each radical.
Solely for the sake of posterity, each root, fully uncompressed into just a, b, c, d, e, can be found here and
its underlying LaTeX here. (There’s no way it could fit on this page.)
§2: Items from Trigonometry
Right-Triangle Definitions:
◦ sin θ = opposite/hypotenuse
◦ cos θ = adjacent/hypotenuse
◦ tan θ = opposite/adjacent
◦ csc θ = hypotenuse/opposite
◦ sec θ = hypotenuse/adjacent
◦ cot θ = adjacent/opposite
◦ Mnemonic: SOH-CAH-TOA (for the first 3)
Unit Circle Definitions: Make an angle θ with one side on the positive x-axis, measured counterclockwise. The other (terminal) side crosses the unit circle x² + y² = 1 at a point (x, y).
Then:
◦ sin θ = y
◦ cos θ = x
◦ tan θ = y/x
◦ csc θ = 1/y
◦ sec θ = 1/x
◦ cot θ = x/y
◦ Thus the point crossing the unit circle is (x, y) = (cos θ, sin θ)
◦ The other four can be found visually too: Desmos demo.
Functions in Terms of Sine & Cosine:
◦ tan θ = sin θ/cos θ
◦ csc θ = 1/sin θ
◦ sec θ = 1/cos θ
◦ cot θ = cos θ/sin θ
Domains: Interpreted as functions D → R for D the “natural domain”.
§2.2: Special Values
(An extended, fuller list of special values, beyond the basics from this section, can be found on Wikipedia.)
Some of the basic ones can be thought of in terms of the 30◦ -60◦ -90◦ and 45◦ -45◦ -90◦ triangles, and some
basic geometry.
Note that, per the unit circle definitions, the “All Students Take Calculus” mnemonic holds.
Label the quadrants with "A", "S", "T", "C" in numerical order
All functions are positive where "A" is (0° < θ < 90°, i.e. 0 < θ < π/2)
Sine (and thus cosecant) is positive where "S" is (90° < θ < 180°, i.e. π/2 < θ < π)
Tangent (and thus cotangent) is positive where "T" is (180° < θ < 270°, i.e. π < θ < 3π/2)
Cosine (and thus secant) is positive where "C" is (270° < θ < 360°, i.e. 3π/2 < θ < 2π)
The basic unit circle values, tabulated, for 30◦ - and 45◦ -related angles:
Those 30◦ - and 45◦ -related values, visualized on the unit circle:
§2.3: Various Trigonometry Identities
Some common proof techniques for these formulas, to avoid memorization, accompany each group below.
Each function in terms of only one of the others: The ± depends on the location of θ (use "All Students Take Calculus")
Pythagorean Identities: First is immediate from the unit circle; others come from dividing it by sin²θ or cos²θ (or the unit circle visual):
◦ sin²θ + cos²θ = 1
◦ 1 + cot²θ = csc²θ
◦ tan²θ + 1 = sec²θ
Parity, Cofunction, & Reflection Identities: Easily motivated by the unit circle. (The first column denotes those for evenness/oddness, and the second the cofunction identities.)
Periodicity: Sine, cosine, secant, and cosecant are 2π-periodic, and tangent and cotangent are π-
periodic. These are easily motivated by the graphs.
Harmonic Addition Formula: We may write
c₁ cos(x) + c₂ sin(x) = A cos(x + φ)
wherein
A = sign(c₁)·√(c₁² + c₂²), cos φ = c₁/A, sin φ = −c₂/A
We need both of the last two to accurately determine where φ lies. You can also use
tan φ = −c₂/c₁ ⟹ φ = arctan(−c₂/c₁)
(A numeric check of this sign convention appears below.)
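A numeric spot check of the formula as reconstructed above (a minimal sketch; the test coefficients are arbitrary):

    import math

    for c1, c2 in ((3.0, -4.0), (-2.0, 5.0)):
        A = math.copysign(math.hypot(c1, c2), c1)    # sign(c1)*sqrt(c1^2+c2^2)
        phi = math.atan(-c2 / c1)                    # requires c1 != 0
        for x in (0.0, 0.7, 2.5, -1.3):
            lhs = c1 * math.cos(x) + c2 * math.sin(x)
            rhs = A * math.cos(x + phi)
            assert math.isclose(lhs, rhs, rel_tol=1e-12), (c1, c2, x)
    print("harmonic addition convention checks out")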
Angle Sum Identities: Motivated by triangles with angles meeting at acute angles α, β. Wikipedia lists some proofs here.
◦ Sine: sin(α ± β) = sin α cos β ± cos α sin β (sine cosine sign cosine sine)
◦ Cosine: cos(α ± β) = cos α cos β ∓ sin α sin β (cosine cosine cosign sine sine)
◦ Tangent: tan(α ± β) = (tan α ± tan β)/(1 ∓ tan α tan β)
◦ Cosecant: csc(α ± β) = sec α sec β csc α csc β/(sec α csc β ± csc α sec β)
◦ Secant: sec(α ± β) = sec α sec β csc α csc β/(csc α csc β ∓ sec α sec β)
◦ Cotangent: cot(α ± β) = (cot α cot β ∓ 1)/(cot β ± cot α)
◦ Arcsine: arcsin x ± arcsin y = arcsin(x√(1 − y²) ± y√(1 − x²))
◦ Arccosine: arccos x ± arccos y = arccos(xy ∓ √((1 − x²)(1 − y²)))
◦ Arctangent: arctan x ± arctan y = arctan((x ± y)/(1 ∓ xy))
◦ Arccotangent: arccot x ± arccot y = arccot((xy ∓ 1)/(y ± x))
Double Angle Formulas: Can motivate with α = β in the angle-sum formulas.
◦ sin(2θ) = 2 sin θ cos θ = (sin θ + cos θ)² − 1 = 2 tan θ/(1 + tan²θ)
◦ cos(2θ) = cos²θ − sin²θ = 2 cos²θ − 1 = 1 − 2 sin²θ = (1 − tan²θ)/(1 + tan²θ)
◦ tan(2θ) = 2 tan θ/(1 − tan²θ)
◦ cot(2θ) = (cot²θ − 1)/(2 cot θ) = (1 − tan²θ)/(2 tan θ)
◦ sec(2θ) = sec²θ/(2 − sec²θ) = (1 + tan²θ)/(1 − tan²θ)
◦ csc(2θ) = (sec θ csc θ)/2 = (1 + tan²θ)/(2 tan θ)
Triple Angle Formulas:
◦ sin(3θ) = 3 sin θ − 4 sin³θ = 4 sin θ sin(π/3 − θ) sin(π/3 + θ)
◦ cos(3θ) = 4 cos³θ − 3 cos θ = 4 cos θ cos(π/3 − θ) cos(π/3 + θ)
◦ tan(3θ) = (3 tan θ − tan³θ)/(1 − 3 tan²θ) = tan θ tan(π/3 − θ) tan(π/3 + θ)
◦ cot(3θ) = (3 cot θ − cot³θ)/(1 − 3 cot²θ)
◦ sec(3θ) = sec³θ/(4 − 3 sec²θ)
◦ csc(3θ) = csc³θ/(3 csc²θ − 4)
Multi-Angle Formulas:
◦ sin(nθ) = Σ_{k odd} (−1)^{(k−1)/2}·C(n, k)·cos^{n−k}θ·sinᵏθ
  = sin θ · Σ_{i=0}^{⌊(n+1)/2⌋} Σ_{j=0}^{i} (−1)^{i−j}·C(n, 2i+1)·C(i, j)·cos^{n−2(i−j)−1}θ
  = 2^{n−1}·Π_{k=0}^{n−1} sin(θ + kπ/n)
◦ cos(nθ) = Σ_{k even} (−1)^{k/2}·C(n, k)·cos^{n−k}θ·sinᵏθ
  = Σ_{i=0}^{⌊n/2⌋} Σ_{j=0}^{i} (−1)^{i−j}·C(n, 2i)·C(i, j)·cos^{n−2(i−j)}θ
◦ cos((2n + 1)θ) = (−1)ⁿ·2^{2n}·Π_{k=0}^{2n} cos(θ + kπ/(2n + 1))
◦ cos(2nθ) = (−1)ⁿ·2^{2n−1}·Π_{k=0}^{2n−1} cos((2k + 1)π/(4n) − θ)
◦ tan(nθ) = [Σ_{k odd} (−1)^{(k−1)/2}·C(n, k)·tanᵏθ] / [Σ_{k even} (−1)^{k/2}·C(n, k)·tanᵏθ]
Lagrange's Identities: Provided θ ≢ 0 (mod 2π),
◦ Σ_{k=0}^{n} sin(kθ) = [cos(θ/2) − cos((n + 1/2)θ)] / (2 sin(θ/2))
◦ Σ_{k=0}^{n} cos(kθ) = [sin(θ/2) + sin((n + 1/2)θ)] / (2 sin(θ/2))
Dirichlet Kernel: Related to the above: 1 + 2·Σ_{k=1}^{n} cos(kθ) = sin((n + 1/2)θ) / sin(θ/2)
Half-Angle Formulas: The ± sign convention is based upon where θ/2 ends up landing w.r.t. the unit circle.
◦ sin(θ/2) = ±√((1 − cos θ)/2)
◦ cos(θ/2) = ±√((1 + cos θ)/2)
◦ tan(θ/2) = (1 − cos θ)/sin θ = sin θ/(1 + cos θ) = csc θ − cot θ = tan θ/(1 + sec θ)
  = sgn(sin θ)·√((1 − cos θ)/(1 + cos θ)) = (−1 + sgn(cos θ)·√(1 + tan²θ))/tan θ
◦ cot(θ/2) = (1 + cos θ)/sin θ = sin θ/(1 − cos θ) = csc θ + cot θ = sgn(sin θ)·√((1 + cos θ)/(1 − cos θ))
Various Power Reduction Formulas:
◦ sin²θ = (1 − cos 2θ)/2
◦ sin³θ = (3 sin θ − sin 3θ)/4
◦ sin⁴θ = (3 − 4 cos 2θ + cos 4θ)/8
◦ sin⁵θ = (10 sin θ − 5 sin 3θ + sin 5θ)/16
◦ cos²θ = (1 + cos 2θ)/2
◦ cos³θ = (3 cos θ + cos 3θ)/4
◦ cos⁴θ = (3 + 4 cos 2θ + cos 4θ)/8
◦ cos⁵θ = (10 cos θ + 5 cos 3θ + cos 5θ)/16
◦ sin²θ cos²θ = (1 − cos 4θ)/8
◦ sin³θ cos³θ = (3 sin 2θ − sin 6θ)/32
◦ sin⁴θ cos⁴θ = (3 − 4 cos 4θ + cos 8θ)/128
◦ sin⁵θ cos⁵θ = (10 sin 2θ − 5 sin 6θ + sin 10θ)/512
◦ In general, utilizing the binomial theorem,
cosⁿθ = (1/2^{n−1})·Σ_{k=0}^{(n−1)/2} C(n, k)·cos((n − 2k)θ), n odd
cosⁿθ = (1/2ⁿ)·C(n, n/2) + (1/2^{n−1})·Σ_{k=0}^{(n/2)−1} C(n, k)·cos((n − 2k)θ), n even
sinⁿθ = (1/2^{n−1})·Σ_{k=0}^{(n−1)/2} (−1)^{((n−1)/2)−k}·C(n, k)·sin((n − 2k)θ), n odd
sinⁿθ = (1/2ⁿ)·C(n, n/2) + (1/2^{n−1})·Σ_{k=0}^{(n/2)−1} (−1)^{(n/2)−k}·C(n, k)·cos((n − 2k)θ), n even
Product-to-Sum Formulas: Also called prosthaphaeresis formulas. The first three are historically known as Werner's formulas.
◦ cos α cos β = [cos(α − β) + cos(α + β)]/2
◦ sin α sin β = [cos(α − β) − cos(α + β)]/2
◦ sin α cos β = [sin(α + β) + sin(α − β)]/2
◦ tan α tan β = [cos(α − β) − cos(α + β)]/[cos(α − β) + cos(α + β)]
◦ Π_{k=1}^{n} cos θₖ = (1/2ⁿ)·Σ_{(x₁,···,xₙ)∈{1,−1}ⁿ} cos(x₁θ₁ + ··· + xₙθₙ)
◦ Π_{k=1}^{n} sin θₖ = ((−1)^{⌊n/2⌋}/2ⁿ)·Σ_{(x₁,···,xₙ)∈{1,−1}ⁿ} cos(x₁θ₁ + ··· + xₙθₙ)·Π_{j=1}^{n} xⱼ for n even,
  with cos replaced by sin for n odd
Sum-to-Product Formulas:
◦ sin α ± sin β = 2 sin((α ± β)/2) cos((α ∓ β)/2)
◦ cos α + cos β = 2 cos((α + β)/2) cos((α − β)/2)
◦ cos α − cos β = −2 sin((α + β)/2) sin((α − β)/2)
◦ tan α ± tan β = sin(α ± β)/(cos α cos β)
Miscellany:
◦ Π_{k=1}^{n−1} sin(kπ/n) = n/2^{n−1}
◦ Π_{k=1}^{n} cos(kπ/(2n + 1)) = 1/2ⁿ (Source: Video by Dr. Michael Penn)
§2.4: Arcfunction Identities
The arcfunctions (say, arc-f(x)) are defined to be the inverses of their original functions (say, f(x)) on certain intervals, the ones that end up defining their ranges.
Note that this means when solving an equation such as sin(x) = y, you need to account for the periodicity of the original function.
Cofunction-Like Identities:
◦ arccos(x) = π/2 − arcsin(x)
◦ arccot(x) = π/2 − arctan(x)
◦ arccsc(x) = π/2 − arcsec(x)
◦ In general, arc-f(x) + arc-cof(x) = π/2 for each function/cofunction pair
Parity-Like Identities: arcsin, arctan, arccsc are odd; arccos, arccot, arcsec are neither even nor odd.
◦ arcsin(−x) = −arcsin(x)
◦ arccos(−x) = π − arccos(x)
◦ arctan(−x) = −arctan(x)
◦ arccot(−x) = π − arccot(x)
◦ arcsec(−x) = π − arcsec(x)
◦ arccsc(−x) = −arccsc(x)
◦ Alternate summary: arc-f(−x) + arc-f(x) ∈ {0, π}
Function-Arcfunction Compositions: (of the type f (f −1 (x))) Each is easily motivated with a
triangle: if we want sin(arctan(x)) for instance, let θ = arctan(x) =⇒ tan θ = x, and label a triangle.
Then find sin θ.
Reciprocal Arguments:
◦ arcsin(1/x) = arccsc(x)
◦ arccos(1/x) = arcsec(x)
◦ arcsec(1/x) = arccos(x)
◦ arccsc(1/x) = arcsin(x)
◦ arctan(1/x) = { π/2 − arctan(x) = arccot(x), x > 0; −π/2 − arctan(x) = −π + arccot(x), x < 0 }
◦ arccot(1/x) = { π/2 − arccot(x) = arctan(x), x > 0; 3π/2 − arccot(x) = π + arctan(x), x < 0 }
Fragmentary Formulas: Meant to be used only if one has a fraction of a sine table. If complex numbers get involved in the roots, we choose the ones with positive real part, or positive imaginary part if it has negative real part.
arccos(x) = arcsin(√(1 − x²)), if 0 ≤ x ≤ 1, from which you get
arccos((1 − x²)/(1 + x²)) = arcsin(2x/(1 + x²)), if 0 ≤ x ≤ 1
arcsin(√(1 − x²)) = π/2 − sgn(x)·arcsin(x)
arccos(x) = (1/2)·arccos(2x² − 1), if 0 ≤ x ≤ 1
arcsin(x) = (1/2)·arccos(1 − 2x²), if 0 ≤ x ≤ 1
arcsin(x) = arctan(x/√(1 − x²))
arccos(x) = arctan(√(1 − x²)/x)
arctan(x) = arcsin(x/√(1 + x²))
arccot(x) = arccos(x/√(1 + x²))
One that follows from these is
arctan(x) = arccos(√(1/(1 + x²))), if x ≥ 0
because
cos(arctan(x)) = √(1/(1 + x²)) = cos(arccos(√(1/(1 + x²))))
The tangent half-angle formula yields
arcsin(x) = 2 arctan(x/(1 + √(1 − x²)))
arccos(x) = 2 arctan(√(1 − x²)/(1 + x)), if −1 < x ≤ 1
arctan(x) = 2 arctan(x/(1 + √(1 + x²)))
Double-Angle-Like:
◦ 2 arcsin(x) = arccos(1 − 2x²), for 0 ≤ x ≤ 1
◦ 2 arccos(x) = arccos(2x² − 1), for 0 ≤ x ≤ 1
◦ 2 arctan(x) = arcsin(2x/(1 + x²)) = arccos((1 − x²)/(1 + x²)) = arctan(2x/(1 − x²))
Triple-Angle-Like:
Angle-Sum-Like:
◦ arcsin(x) ± arcsin(y) = arcsin(x√(1 − y²) ± y√(1 − x²))
◦ arccos(x) ± arccos(y) = arccos(xy ∓ √((1 − x²)(1 − y²)))
◦ arctan(u) ± arctan(v) = arctan((u ± v)/(1 ∓ uv)) (mod π), uv ≠ 1
§2.5: Triangle Laws (Laws of Sines, Cosines, & More)
Law of Sines:
◦ sin α/a = sin β/b = sin γ/c
◦ The ratio of the sine of an angle to its opposite side length is constant in a triangle
Law of Cosines:
◦ a² = b² + c² − 2bc cos α
◦ b² = a² + c² − 2ac cos β
◦ c² = a² + b² − 2ab cos γ
◦ A side, squared, equals the sum of the squares of the other two sides, minus double their product with the cosine of the angle between them
Law of Tangents:
◦ (a − b)/(a + b) = tan((α − β)/2) / tan((α + β)/2)
◦ (b − c)/(b + c) = tan((β − γ)/2) / tan((β + γ)/2)
◦ (a − c)/(a + c) = tan((α − γ)/2) / tan((α + γ)/2)
Mollweide's Formulas: (a + b)/c = cos((α − β)/2)/sin(γ/2) and (a − b)/c = sin((α − β)/2)/cos(γ/2)
◦ Divide the two for the law of tangents.
◦ Can also use these for the laws of sines and cosines.
§2.6: Some Useful Values for Fourier Analysis
These follow easily from previous discussions but it’s sometimes nice to have a quick reference for these.
Throughout, assume n ∈ Z.
sin(nπ) = 0
cos(nπ) = (−1)ⁿ
sin(nπ/2) = { 1 if n = 4k + 1 for a k ∈ Z; −1 if n = 4k + 3 for a k ∈ Z; 0 otherwise }
 = (−1)^{(n−1)/2}·(1 − (−1)ⁿ)/2 = ((−1)^{⌊(n−1)/2⌋} + (−1)^{⌊n/2⌋})/2
cos(nπ/2) = { 1 if n = 4k for a k ∈ Z; −1 if n = 4k + 2 for a k ∈ Z; 0 otherwise }
 = (−1)^{n/2}·(1 − (−1)^{n−1})/2 = ((−1)^{⌊n/2⌋} + (−1)^{⌊(n+1)/2⌋})/2 = (iⁿ + (−i)ⁿ)/2
sin((2n + 1)π/2) = (−1)ⁿ
cos((2n + 1)π/2) = 0
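A brute-force check of the floor-based closed forms above (a small Python sketch; the range of n tested is arbitrary):

    import math

    for n in range(-8, 9):
        s = ((-1) ** ((n - 1) // 2) + (-1) ** (n // 2)) / 2
        c = ((-1) ** (n // 2) + (-1) ** ((n + 1) // 2)) / 2
        assert math.isclose(s, math.sin(n * math.pi / 2), abs_tol=1e-12)
        assert math.isclose(c, math.cos(n * math.pi / 2), abs_tol=1e-12)
    print("floor-sign formulas agree for n in [-8, 8]")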
§3: Items from Basic Calculus & Related Topics
This is largely just to go over the fundamental definitions and identities for these functions. We define
sinh(x) := (eˣ − e⁻ˣ)/2
cosh(x) := (eˣ + e⁻ˣ)/2
and tanh(x), sech(x), etc., are defined analogously to the ordinary trig functions:
tanh(x) := sinh(x)/cosh(x) = (eˣ − e⁻ˣ)/(eˣ + e⁻ˣ)
sech(x) := 1/cosh(x) = 2/(eˣ + e⁻ˣ)
csch(x) := 1/sinh(x) = 2/(eˣ − e⁻ˣ)
coth(x) := 1/tanh(x) = (eˣ + e⁻ˣ)/(eˣ − e⁻ˣ)
Each has an associated "arc-" function that is its inverse. ("ar-" is somewhat more of a proper, and common, name; "arc" in ordinary trig refers to arc lengths, whereas "ar-" is in reference to area, one of the defining geometric ideas behind these functions.)
Some noteworthy identities include the following:
◦ sinh(x) = −i sin(ix)
◦ cosh(x) = cos(ix)
◦ tanh(x) = −i tan(ix)
◦ coth(x) = i cot(ix)
◦ sech(x) = sec(ix)
◦ csch(x) = i csc(ix)
Parities: sinh(x), tanh(x), coth(x), csch(x) are odd functions; cosh(x), sech(x) are even
◦ General scheme: arc-(1/f)(x) = arc-f(1/x) (e.g., arccsch(x) = arcsinh(1/x))
Sum Identities:
◦ cosh(x) + sinh(x) = ex
◦ cosh(x) − sinh(x) = e−x
◦ A cosh(x) + B sinh(x) =
  A·√(1 − B²/A²)·cosh(x + arctanh(B/A)), if A² > B²
  B·√(1 − A²/B²)·sinh(x + arctanh(A/B)), if A² < B²
  A·exp(x), if A = B
Pythagorean-like Identity: cosh²(x) − sinh²(x) = 1
Angle-Sum Identities:
◦ sinh(x ± y) = sinh(x) cosh(y) ± cosh(x) sinh(y) (sine cosine sign cosine sine)
◦ cosh(x ± y) = cosh(x) cosh(y) ± sinh(x) sinh(y) (cosine cosine sign sine sine)
◦ tanh(x ± y) = (tanh(x) ± tanh(y))/(1 ± tanh(x) tanh(y))
◦ tanh(2x) = 2 tanh(x)/(1 + tanh²(x))
◦ For x ≠ 0, tanh(x/2) = (cosh(x) − 1)/sinh(x) = coth(x) − csch(x)
Function/arcfunction composition:
◦ sinh(arccosh(x)) = √(x² − 1) for |x| > 1
◦ sinh(arctanh(x)) = x/√(1 − x²) for −1 < x < 1
◦ cosh(arcsinh(x)) = √(1 + x²)
◦ cosh(arctanh(x)) = 1/√(1 − x²) for −1 < x < 1
◦ tanh(arcsinh(x)) = x/√(1 + x²)
◦ tanh(arccosh(x)) = √(x² − 1)/x for |x| > 1
◦ arcsinh(tan α) = arctanh(sin α) = ln((1 + sin α)/cos α) = ±arccosh(1/cos α)
◦ ln(|tan α|) = −arctanh(cos 2α)
§3.2: Limits
If the limits individually exist and are finite, we have (for α, β ∈ R)
lim_{x→c} [αf(x) + βg(x)] = α·lim_{x→c} f(x) + β·lim_{x→c} g(x)
lim_{x→c} f(x)g(x) = (lim_{x→c} f(x))·(lim_{x→c} g(x))
If f(x) → 0 and g(x) → 0, or both tend to ±∞, as x → c (so that f/g takes on an ∞/∞ or 0/0 form in the limit), and f, g are differentiable near c, then L'Hôpital's rule applies:
lim_{x→c} f(x)/g(x) = lim_{x→c} f′(x)/g′(x)
This can be applied repeatedly. A common use case is with functions of the type f(x)^{g(x)}, for which you can take the log first and rewrite as, say,
L = lim_{x→c} f(x)^{g(x)} ⟹ ln(L) = lim_{x→c} ln(f(x)^{g(x)}) = lim_{x→c} g(x) ln(f(x)) = lim_{x→c} ln(f(x))/(1/g(x))
§3.2.2: Special Limits
lim_{x→0} (1 + αx)^{β/x} = lim_{x→∞} (1 + α/x)^{βx} = e^{αβ}
◦ Special case: lim_{x→0} (1 + x)^{1/x} = lim_{x→∞} (1 + 1/x)ˣ = e
◦ Special case: lim_{x→0} (1 − x)^{1/x} = lim_{x→∞} (1 − 1/x)ˣ = 1/e
lim_{x→0} (aˣ − 1)/x = ln(a) if a > 0
lim_{x→0} sin(x)/x = 1
lim_{x→0} (1 − cos(x))/x = 0
lim_{n→∞} (−ln(n) + Σ_{k=1}^{n} 1/k) = γ (Euler-Mascheroni constant)
lim_{n→∞} n/ⁿ√(n!) = e
lim_{n→∞} ⁿ√(n!) = ∞
lim_{n→∞} π(n)/(n/ln(n)) = 1 (for π(n) the prime-counting function)
lim_{n→∞} √(2πn)·(n/e)ⁿ/n! = 1 (Stirling approximation)
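A rough numeric illustration of the last two limits (a sketch; convergence of the first is slow, so expect only a few correct digits; lgamma is used to avoid overflow):

    import math

    n = 2000
    # n / (n!)^(1/n) -> e; ~2.7119 at n = 2000, versus e ~ 2.71828
    print(n / math.exp(math.lgamma(n + 1) / n))
    # Stirling ratio sqrt(2 pi n) (n/e)^n / n! -> 1, compared in log-space;
    # ~0.99996 here, consistent with a relative error of about 1/(12n)
    log_ratio = (0.5 * math.log(2 * math.pi * n)
                 + n * (math.log(n) - 1) - math.lgamma(n + 1))
    print(math.exp(log_ratio))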
§3.2.3: Asymptotic Notations (O, o, ω, Ω, Θ,...)
Informally:
◦ f (x) ∼ g(x) if f, g grow about the same and equal in the limit
◦ f (x) = O(g(x)) if g is eventually always larger than f (up to a constant multiple); analogous to
upper bound
◦ f (x) = o(g(x)) if any constant multiple of g is always eventually an upper bound of f ; g grows
much faster; g is an unreachable upper bound
◦ f (x) = Θ(g(x)) if f can be bounded above and below by constant multiples of g (giving an exact
bound)
◦ f (x) = ω(g(x)) if g(x) = o(f (x)) (f grows much faster than g)
◦ f (x) = ΩK (g(x)) if a constant multiple of g is eventually a lower bound of f
◦ f (x) = ΩHL (g(x)) if f is not properly dominated by g in the limit
Thinking about inequalities of growth rates, then,
Formally:
◦ f(x) ~ g(x) if
∀ε > 0, ∃N ∈ N such that ∀x > N we have |f(x)/g(x) − 1| < ε, i.e.
lim_{x→∞} f(x)/g(x) = 1
◦ f(x) = O(g(x)) if
lim sup_{x→∞} |f(x)/g(x)| < ∞, i.e.
∃M > 0 and ∃a ∈ R such that, ∀x ≥ a, we have |f(x)| ≤ M·g(x)
◦ f(x) = o(g(x)) if
∀M > 0, ∃a ∈ R such that, ∀x > a, we have |f(x)| < M·g(x), i.e.
lim_{x→∞} f(x)/g(x) = 0
◦ f(x) = Θ(g(x)) if ∃m, M > 0 and a ∈ R such that, for all x > a, m·g(x) ≤ f(x) ≤ M·g(x)
◦ f(x) = ω(g(x)) if
g(x) = o(f(x)), i.e.
∀M > 0, ∃a ∈ R such that, ∀x > a, we have |f(x)| > M·g(x), i.e.
lim_{x→∞} f(x)/g(x) = ∞
◦ f(x) = Ω_K(g(x)) if
∃M > 0 and a ∈ R such that, ∀x > a, we have f(x) ≥ M·g(x), i.e.
lim inf_{x→∞} f(x)/g(x) > 0
§3.3: Derivatives
The fundamental definition of the usual derivative, for a function f, is a second function f′ given pointwise by
f′(x) := lim_{h→0} (f(x + h) − f(x))/h = lim_{h→x} (f(x) − f(h))/(x − h)
The derivative can be used to give the line tangent to f at x₀ by the point-slope rule: y = f(x₀) + f′(x₀)(x − x₀)
Extreme Value Theorem/First Derivative Test: Continuous functions on closed, bounded intervals attain their suprema/infima; moreover, if differentiable, f′ is 0 or nonexistent at those (interior) points.
Second Derivative Test: At local extrema, f″ < 0 for maxima (opens down) or f″ > 0 for minima (opens up). Candidate inflection points are where f″ = 0.
Rolle’s Theorem: f (a) = f (b) =⇒ ∃ξ ∈ (a, b) where f ′ (ξ) = 0
Mean Value Theorem: ∃ξ ∈ (a, b) where (f(b) − f(a))/(b − a) = f′(ξ)
Monotonicity: f ′ > 0 on (a, b) means f is monotone-increasing; analogous results exist.
Newton's Method (root-finding): x_{n+1} := x_n − f(x_n)/f′(x_n) → x with f(x) = 0 as n → ∞, when the starting point is chosen well enough (a sketch follows below).
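A minimal Newton iteration matching this recurrence (no safeguards against bad starting points or f′ = 0; a sketch only):

    def newton(f, fprime, x0, tol=1e-12, max_iter=50):
        x = x0
        for _ in range(max_iter):
            step = f(x) / fprime(x)
            x -= step
            if abs(step) < tol:
                break
        return x

    # sqrt(2) as the positive root of f(x) = x^2 - 2:
    print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))  # 1.41421356...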
§3.3.2: Derivative Properties
Quotient Rule: (f/g)′ = (f′g − fg′)/g² ("Low d-High minus High d-Low, over Low squared")
Chain Rule: [f(g(x))]′ = f′(g(x))·g′(x), i.e. df/dx = (df/dg)·(dg/dx), with the factors evaluated at g(x) and x respectively
Inverse Function Rule: (f⁻¹)′(x) = 1/(f′ ∘ f⁻¹)(x)
Generalized Power Rule, or more briefly:
(f^g)′ = (e^{g·ln f})′ = f^g·(f′·g/f + g′·ln f)
§3.3.3: Basic Derivative Formulas
The very basics:
◦ d/dx [constant] = 0
◦ d/dx xⁿ = n·x^{n−1}
◦ d/dx sin(x) = cos(x)
◦ d/dx cos(x) = −sin(x)
◦ d/dx eˣ = eˣ
◦ d/dx ln(x) = 1/x (x > 0)
Related to the basics:
◦ d/dx aˣ = aˣ·ln(a) (a > 0)
◦ d/dx ln|x| = 1/x (x ≠ 0)
◦ d/dx log_a(x) = 1/(x·ln(a)) (a, x > 0)
Other (basic) trig identities:
◦ d/dx tan(x) = sec²(x)
◦ d/dx sec(x) = sec(x) tan(x)
◦ d/dx csc(x) = −csc(x) cot(x)
◦ d/dx cot(x) = −csc²(x)
Inverse trig functions:
◦ d/dx arcsin(x) = 1/√(1 − x²)
◦ d/dx arccos(x) = −1/√(1 − x²)
◦ d/dx arctan(x) = 1/(1 + x²)
◦ d/dx arcsec(x) = 1/(|x|·√(x² − 1))
◦ d/dx arccsc(x) = −1/(|x|·√(x² − 1))
◦ d/dx arccot(x) = −1/(1 + x²)
Hyperbolic trig functions:
◦ d/dx sinh(x) = cosh(x)
◦ d/dx cosh(x) = sinh(x)
◦ d/dx tanh(x) = sech²(x)
◦ d/dx sech(x) = −sech(x) tanh(x)
◦ d/dx csch(x) = −csch(x) coth(x)
◦ d/dx coth(x) = −csch²(x)
Inverse hyperbolic trig functions:
◦ d/dx arcsinh(x) = 1/√(1 + x²)
◦ d/dx arccosh(x) = 1/√(x² − 1) (x > 1)
◦ d/dx arctanh(x) = 1/(1 − x²) (|x| < 1)
◦ d/dx arccoth(x) = 1/(1 − x²) (|x| > 1)
◦ d/dx arcsech(x) = −1/(x·√(1 − x²)) (0 < x < 1)
◦ d/dx arccsch(x) = −1/(|x|·√(1 + x²)) (x ≠ 0)
§3.3.4: The “♡ of (Differential) Calculus” Formula
(As ruminated on in guides from Dr. Johnson & Dr. Kunin at UAH.)
In principle, in the first approximation, the increment of a differentiable function is proportional to the increment of the argument, with constant of proportionality the derivative, when appropriately understood. It is, in essence, the linear approximation, though its use is woefully understated in most calculus texts. For a function of one variable,
f(x + dx) = f(x) + f′(x)·dx
where dx indeed represents "an infinitesimally small quantity", in the sense of 0 < dx < r ∀r > 0. Formally,
f(x + h) = f(x) + f′(x)·h + o(h), where o(h)/h → 0 as h → 0
An example of utility is the derivative of F × G for vector-valued F, G. We may think of the derivative as
f′(x) := lim_{dx→0} [f(x + dx) − f(x)]/dx
Then
(F × G)′(t) = lim_{dt→0} [F(t + dt) × G(t + dt) − F(t) × G(t)]/dt  (definition)
 = lim_{dt→0} [(F(t) + F′(t) dt) × (G(t) + G′(t) dt) − F(t) × G(t)]/dt  (♡ formula)
 = lim_{dt→0} (1/dt)·[F × G + (F′ × G) dt + (F × G′) dt + (F′ × G′)(dt)² − F × G]|_t  (cross product rules, compactify notation)
 = lim_{dt→0} (1/dt)·[(F′ × G) dt + (F × G′) dt + (F′ × G′)(dt)²]|_t  (cancellation)
 = lim_{dt→0} [F′ × G + F × G′ + (F′ × G′) dt]|_t  (cancellation)
 = F′(t) × G(t) + F(t) × G′(t)  (take limit)
Remark: This did not make much use of the cross product aside from its distributivity. One may use this
to prove the product rule almost identically as a result.
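A finite-difference spot check of the rule just derived (a sketch, assuming numpy; the two curves F, G are arbitrary test data):

    import numpy as np

    F  = lambda t: np.array([np.sin(t), t**2, 1.0])
    G  = lambda t: np.array([np.cos(t), 1.0, t])
    Fp = lambda t: np.array([np.cos(t), 2 * t, 0.0])   # F'
    Gp = lambda t: np.array([-np.sin(t), 0.0, 1.0])    # G'

    t, h = 0.7, 1e-6
    numeric = (np.cross(F(t + h), G(t + h))
               - np.cross(F(t - h), G(t - h))) / (2 * h)
    exact = np.cross(Fp(t), G(t)) + np.cross(F(t), Gp(t))
    print(np.max(np.abs(numeric - exact)))   # ~1e-10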
§3.4: Integrals
Note that any two antiderivatives will differ by a constant, i.e. F ′ = G′ = f =⇒ F = G + C. The +C is
the constant of integration.
We may define the definite Riemann integral of f on [a, b] as discussed in the analysis sections on
Riemann integration. Some other results of note lie there. We focus on formulas here.
Some fundamental properties: for α, β ∈ R and f, g ∈ R[a, b] (Riemann integrable on [a, b]),
Reverse Order of Bounds: ∫_a^b f(x) dx = −∫_b^a f(x) dx
Zero-Width Interval: ∫_a^a f(x) dx = 0
Linearity: ∫_a^b (αf(x) + βg(x)) dx = α·∫_a^b f(x) dx + β·∫_a^b g(x) dx
Monotonicity: f ≤ g means ∫_a^b f(x) dx ≤ ∫_a^b g(x) dx. In particular, |∫_a^b f| ≤ ∫_a^b |f|.
Integration by Parts: ∫_a^b f(x)g′(x) dx = [f(x)g(x)]_a^b − ∫_a^b f′(x)g(x) dx ("LIATE" rule for choosing f)
§3.4.2: Basic Identities & Links to Integral Tables
Some common integral identities. Many tables exist, though, e.g. here, here, & here. Most of the basics are just the inverses of derivatives, however, so there is some repetition here as well.
Polynomials/xᵅ & Friends:
◦ ∫ dx = x + C
◦ ∫ xⁿ dx = x^{n+1}/(n + 1) + C, when n ≠ −1
◦ ∫ 0 dx = C
◦ ∫ (1/x) dx = ln|x| + C
◦ ∫ eˣ dx = eˣ + C
◦ ∫ aˣ dx = aˣ/ln(a) + C for a > 0
◦ ∫ ln(x) dx = x ln(x) − x + C; to prove, IBP with 1·ln(x)
Inverse Trig Functions: ∫ arc-f(x) dx can be found with the 1·arc-f(x) and IBP trick.
◦ ∫ dx/√(a² − x²) = arcsin(x/a) + C
◦ ∫ arcsin(x) dx = x arcsin(x) + √(1 − x²) + C
◦ ∫ dx/(x² + a²) = (1/a) arctan(x/a) + C
◦ ∫ arctan(x) dx = x arctan(x) − (1/2) ln(1 + x²) + C
◦ ∫ dx/(x√(x² − a²)) = (1/a) arcsec(x/a) + C
◦ ∫ arccos(x) dx = x arccos(x) − √(1 − x²) + C
Hyperbolic Trig Functions:
◦ ∫ sinh(x) dx = cosh(x) + C
◦ ∫ cosh(x) dx = sinh(x) + C
◦ ∫ tanh(x) dx = ln(cosh(x)) + C
◦ ∫ sech(x) tanh(x) dx = −sech(x) + C
◦ ∫ csch(x) coth(x) dx = −csch(x) + C
◦ ∫ sech(x) dx = arctan(sinh(x)) + C
◦ ∫ sech²(x) dx = tanh(x) + C
◦ ∫ csch²(x) dx = −coth(x) + C
§3.4.3: Basic & Advanced Solution Techniques
This is particularly handy with the inverse function rule, when f is the inverse of a well-understood function such as f(x) = arctan(x). In this scheme, you would opt by default to differentiate f and work from there.
◦ Since this is just the integral product rule, there is the generalization
∫ f₁′(x)·Π_{j=2}^{n} f_j(x) dx = Π_{i=1}^{n} f_i(x) − Σ_{i=2}^{n} ∫ f_i′(x)·Π_{1≤j≤n, j≠i} f_j(x) dx
Trig Substitution: For integrals containing the following forms, use the suggested substitution to leverage a Pythagorean trig identity:
◦ √(a² − b²x²) ⟹ x = (a/b) sin θ (Identity to Leverage: sin²θ + cos²θ = 1)
◦ √(b²x² − a²) ⟹ x = (a/b) sec θ (Identity to Leverage: tan²θ = sec²θ − 1)
◦ √(a² + b²x²) ⟹ x = (a/b) tan θ (Identity to Leverage: tan²θ = sec²θ − 1)
You can also use the following if you wish, though they're sometimes poorer-behaved depending on the functions used, whether they're increasing/decreasing compared to their counterpart, and sometimes the relevant derivatives introduce additional negatives.
◦ √(a² − b²x²) ⟹ x = (a/b) cos θ (Identity to Leverage: sin²θ + cos²θ = 1)
◦ √(b²x² − a²) ⟹ x = (a/b) csc θ (Identity to Leverage: csc²θ − 1 = cot²θ)
◦ √(a² + b²x²) ⟹ x = (a/b) cot θ (Identity to Leverage: csc²θ = 1 + cot²θ)
Lesser-known hyperbolic trig substitutions exist which sometimes make for more manageable integrals. The standard ones:
◦ √(b²x² + a²) ⟹ x = (a/b) sinh(u) (Identity to Leverage: cosh²u − sinh²u = 1)
◦ √(b²x² − a²) ⟹ x = (a/b) cosh(u) (Identity to Leverage: cosh²u − sinh²u = 1)
Likewise, some of the poorer-behaved ones:
◦ √(a² − b²x²) ⟹ x = (a/b) tanh(u) (Identity to Leverage: 1 − tanh²u = sech²u)
◦ √(a² − b²x²) ⟹ x = (a/b) sech(u) (Identity to Leverage: 1 − sech²u = tanh²u)
◦ √(b²x² − a²) ⟹ x = (a/b) coth(u) (Identity to Leverage: coth²u − 1 = csch²u)
◦ √(b²x² + a²) ⟹ x = (a/b) csch(u) (Identity to Leverage: coth²u − 1 = csch²u)
Weierstrass Substitution: Use on rational functions of sine and cosine. We define
t := tan(x/2)
and consequently have
sin(x) = 2t/(1 + t²), cos(x) = (1 − t²)/(1 + t²), dx = 2 dt/(1 + t²)
This also shows that
((1 − t²)/(1 + t²), 2t/(1 + t²)), t ∈ R
parameterizes the unit circle, starting from (1, 0), treating (−1, 0) as t = ∞.
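A hedged symbolic check of the substitution on a sample rational function of cosine (assumes sympy; the integrand 1/(2 + cos x) is my illustration, not from the original text):

    import sympy as sp

    x, t = sp.symbols('x t')
    direct = sp.integrate(1 / (2 + sp.cos(x)), (x, 0, sp.pi / 2))
    # t = tan(x/2): cos x = (1 - t^2)/(1 + t^2), dx = 2 dt/(1 + t^2);
    # the bounds x: 0 -> pi/2 become t: 0 -> 1
    integrand = (1 / (2 + (1 - t**2) / (1 + t**2))) * 2 / (1 + t**2)
    subbed = sp.integrate(sp.cancel(integrand), (t, 0, 1))
    print(direct.evalf(), subbed.evalf())   # both ~0.60460 (= pi/(3*sqrt(3)))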
Hyperbolic Weierstrass Substitution: Analogously, let
t := tanh(x/2)
and consequently have
sinh(x) = 2t/(1 − t²), cosh(x) = (1 + t²)/(1 − t²), dx = 2 dt/(1 − t²)
Bioche's Rules: (Wikipedia) For ∫ f(t) dt, let ω(t) := f(t) dt, with f a rational function of sine and cosine.
◦ If ω(−t) = ω(t), substitute u = cos(t)
◦ If ω(π − t) = ω(t), substitute u = sin(t)
◦ If ω(π + t) = ω(t), substitute u = tan(t)
An analogue for f a rational function of sinh(t), cosh(t) exists: just use the respective hyperbolic version. u = eᵗ also just gives a rational function.
Parities: Use odd/evenness, self-explanatory.
Contour Integration/Residues: For f meromorphic inside and on a positively oriented simple closed contour γ (no poles on γ),
∮_γ f(z) dz = 2πi·Σ Res(f, c), summing over the poles c enclosed by γ,
where
Res(f, c) = lim_{z→c} (z − c)f(z)
at simple poles c ∈ C. For poles of order n, in general,
Res(f, c) = (1/(n − 1)!)·lim_{z→c} d^{n−1}/dz^{n−1} [(z − c)ⁿ·f(z)]
Leibniz's Rule/Feynman's Trick: In its most general form, for f such that f(x, t), f_t(x, t) are continuous in t, x in the bounds of integration, with a, b ∈ C¹,
(∂/∂x) ∫_{a(x)}^{b(x)} f(x, t) dt = f(x, b(x))·b′(x) − f(x, a(x))·a′(x) + ∫_{a(x)}^{b(x)} ∂f(x, t)/∂x dt
Some generalizations, examples here. There do not appear to be general "rules" for what to do, but some common parameterizations include using
x ↦ tx
x ↦ f⁻¹(t·f(x)), for f a function appearing elsewhere in the integral
xⁿ ↦ xᵗ, where n was given and fixed, but t is our parameter
f(x) ↦ f(x)·e^{−tx}; example: for ∫_0^∞ (sin(x)/x) dx
Another is just to add it in somewhere random, e.g. Example 3.4 here. The end goal is to write the integral as a function of a parameter t and solve a corresponding DE.
A variety of example parameterizations follow, mostly from Differentiating Under the Integral Sign by Keith Conrad (download).
∫_0^∞ e^{−x} dx ⟶ ∫_0^∞ t·e^{−tx} dx
∫_0^∞ (sin x)/x dx ⟶ ∫_0^∞ e^{−tx}·(sin x)/x dx
∫_0^∞ e^{−x²/2} dx ⟶ ∫_0^∞ e^{−t²(1+x²)/2}/(1 + x²) dx
∫_0^∞ (log x/√x)·e^{−x} dx ⟶ ∫_0^∞ xᵗ·e^{−x} dx (since ∂_t xᵗ = xᵗ log x)
∫_0^∞ 1/(1 + x²) dx ⟶ ∫_0^∞ 1/(t² + x²) dx
∫_0^1 (x² − 1)/ln x dx ⟶ ∫_0^1 (xᵗ − 1)/ln x dx
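A sketch of the last parameterization carried out with sympy (assuming sympy): I(t) = ∫_0^1 (xᵗ − 1)/ln x dx has I′(t) = ∫_0^1 xᵗ dx = 1/(t + 1), hence I(t) = ln(t + 1) since I(0) = 0.

    import sympy as sp

    x = sp.symbols('x')
    t = sp.symbols('t', positive=True)
    Iprime = sp.integrate(x**t, (x, 0, 1))   # d/dt of the integrand is x^t
    print(Iprime)                            # 1/(t + 1)
    print(sp.integrate(Iprime, (t, 0, 2)))   # log(3): the t = 2 case above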
Smaller/More Niche Tricks:
◦ Add/Subtract The Same Thing: Common in numerators of rational functions, e.g. x/(x + 1).
◦ Partial Fractions: Self-explanatory, use on integrals of ratios of polynomials.
◦ Convert to Series: Self-explanatory, typically will swap summation & integration. Sometimes
the conversion to a Riemann sum is warranted, specifically if an infinite limit is involved with the
limiting variable both as an upper bound and a summand of some sort.
◦ Sine & Cosine in Complexes: May be convenient to write
sin(x) = (e^{ix} − e^{−ix})/(2i), cos(x) = (e^{ix} + e^{−ix})/2
and, for suitable π-periodic f (a Lobachevsky-type identity),
∫_0^∞ (sin⁴(x)/x⁴)·f(x) dx = ∫_0^{π/2} f(x) dx − (2/3)·∫_0^{π/2} sin²(x)·f(x) dx
by u = 1/x.
Measure Theory: MCT & DCT: (See the measure theory section for relevant definitions.)
(Descending & Above) If f_k ↘ f a.e., and ∃φ ∈ L¹(E) with f_k ≤ φ a.e. for all k, then
∫_E f_k → ∫_E f; that is, lim_{k→∞} ∫_E f_k = ∫_E lim_{k→∞} f_k = ∫_E f
◦ Uniform Convergence Theorem: Take {f_k}_{k∈N} ⊆ L¹(E) with f_k → f uniformly on E, with µ(E) < ∞. Then:
f ∈ L¹(E)
∫_E f_k → ∫_E f as k → ∞
◦ Dominated Convergence Theorem (DCT): Take {f_k}_{k∈N} measurable on E with f_k → f pointwise-a.e. If ∃φ ∈ L¹(E) with |f_k| ≤ φ a.e. for all k, then
∫_E f_k → ∫_E f; that is, lim_{k→∞} ∫_E f_k = ∫_E lim_{k→∞} f_k = ∫_E f
Proved by showing every subsequence {f_{k_j}}_{j∈N} has a subsubsequence {f_{k_{j_i}}}_{i∈N} with
∫_E f_{k_{j_i}} → ∫_E f as i → ∞
◦ Corollary of Fatou, DCT, & MCT: Take {f_k}_{k∈N} nonnegative measurable functions with f_k → f pointwise-a.e. on E and f_k ≤ f a.e. for each k. Then
∫_E f_k → ∫_E f as k → ∞
Note there is no assumption on integrability of f, unlike, say, the DCT. Some call this also the MCT, despite being strictly stronger and more practical. Some discussion on MSE here.
It may be considered a corollary of the DCT for ∫_E f < ∞, as well; hence the usage of Fatou arises (and can be used independently) for ∫_E f = ∞. Note, too, Fatou can be considered implied by the MCT (see typical proofs).
§3.4.4: Applications & Approximations
General Constructions: Break up [a, b] into n equal subintervals [xi−1 , xi ] of width ∆xi = (b − a)/n.
We will evaluate f at a point ξi in each interval.
Constant Interpolations:
◦ General Form: ∫_a^b f(x) dx ≈ ((b − a)/n)·Σ_{i=1}^{n} f(ξᵢ)
◦ Left-Endpoint Rule (ξᵢ = x_{i−1}): ∫_a^b f(x) dx ≈ ((b − a)/n)·Σ_{i=1}^{n} f(x_{i−1})
  Error is bounded above by ((b − a)²/(2n))·sup_{x∈[a,b]} |f′(x)|
◦ Right-Endpoint Rule (ξᵢ = xᵢ): ∫_a^b f(x) dx ≈ ((b − a)/n)·Σ_{i=1}^{n} f(xᵢ)
  Error is bounded above by ((b − a)²/(2n))·sup_{x∈[a,b]} |f′(x)|
◦ Midpoint Rule (ξᵢ = (x_{i−1} + xᵢ)/2): ∫_a^b f(x) dx ≈ ((b − a)/n)·Σ_{i=1}^{n} f((x_{i−1} + xᵢ)/2)
  Error is bounded above by ((b − a)³/(24n²))·sup_{x∈[a,b]} |f″(x)|
Linear Interpolations (Trapezoid Rule):
◦ ∫_a^b f(x) dx ≈ ((b − a)/n)·Σ_{i=1}^{n} (f(x_{i−1}) + f(xᵢ))/2
◦ Error bounded above by ((b − a)³/(12n²))·sup_{x∈[a,b]} |f″(x)|
Quadratic Interpolations (Simpson's Rule):
◦ Must split up [a, b] into n-many subintervals for n even.
◦ ∫_a^b f(x) dx ≈ ((b − a)/(3n))·[f(x₀) + f(xₙ) + 4·Σ_{j=1}^{n/2} f(x_{2j−1}) + 2·Σ_{j=1}^{n/2−1} f(x_{2j})]
◦ Error bounded above by ((b − a)⁵/(180n⁴))·sup_{x∈[a,b]} |f⁽⁴⁾(x)|
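A minimal implementation of the midpoint and Simpson rules above (a sketch; the test integrand ∫_0^1 sin x dx is my choice); the errors should scale roughly like n⁻² and n⁻⁴ respectively:

    import math

    def midpoint(f, a, b, n):
        h = (b - a) / n
        return h * sum(f(a + (i + 0.5) * h) for i in range(n))

    def simpson(f, a, b, n):          # n must be even
        h = (b - a) / n
        s = f(a) + f(b)
        s += 4 * sum(f(a + (2 * j - 1) * h) for j in range(1, n // 2 + 1))
        s += 2 * sum(f(a + 2 * j * h) for j in range(1, n // 2))
        return s * h / 3

    exact = 1 - math.cos(1.0)         # integral of sin over [0, 1]
    for n in (4, 8, 16):
        print(n, abs(midpoint(math.sin, 0, 1, n) - exact),
                 abs(simpson(math.sin, 0, 1, n) - exact))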
§3.4.5: Special Integral Values
∫_0^∞ (sin x)/x dx = ∫_0^∞ (sin²x)/x² dx = π/2
∫_0^{π/2} sinⁿ(x) dx = ∫_0^{π/2} cosⁿ(x) dx = ((n − 1)!!/n!!)·{1 if n odd; π/2 if n even}, for n ∈ Z≥1
Sophomore's Dream: ∫_0^1 x^{−x} dx = Σ_{n=1}^{∞} n^{−n} and ∫_0^1 xˣ dx = −Σ_{n=1}^{∞} (−n)^{−n}
§3.4.6: The “True” Antiderivative of 1/x
(First seen in a video by Dr. Trefor Bazett here. Likely alluded to by Dr. Kunin. Desmos demo here.)
∫ f(x) dx = F(x) + C(x)
where C(x) is a locally constant function, and F′ = f. C(x) being locally constant means it will be
constant, in particular, over every connected component of the domain, i.e. C ′ ≡ 0 on its domain. (A
connected component is a maximal subset w.r.t. inclusion that cannot be written as the union of two
disjoint open sets. Hence, in R, C(x) must be constant over every open interval in the domain, though on
different maximally-sized intervals, it may take on different constants.) It may prove helpful to think, then,
of the +C in integration not as a constant, but rather something with a vanishing derivative.
As a consequence, for instance,
As a consequence, for instance,
∫ (1/x²) dx = { −1/x + C, x > 0; −1/x + D, x < 0 }
or
∫ dx/(1 − x²) = { (ln|1 + x| − ln|1 − x|)/2 + A, x < −1;
 (ln|1 + x| − ln|1 − x|)/2 + B, x ∈ (−1, 1);
 (ln|1 + x| − ln|1 − x|)/2 + C, x > 1 }
i.e. this is nothing special to do with the logarithm, and is merely an artifact of the domains over which these functions are defined.
§3.5: Taylor, Maclaurin, & Other Special Series
Generally: For f ∈ C^∞, the Taylor series centered at c ∈ C is given by f(x) = Σ_{n=0}^{∞} (f⁽ⁿ⁾(c)/n!)·(x − c)ⁿ
Binomial Theorems: We assume C(n, k) = 0 when k > n (for integer n) and define, for all n ∈ C, k ∈ Z≥0,
C(n, k) := n(n − 1)(n − 2)···(n − k + 1)/k! (k factors in the numerator)
◦ Usual: (1 + x)ⁿ = Σ_{k=0}^{n} C(n, k)·xᵏ = Σ_{k=0}^{∞} C(n, k)·xᵏ, for n ∈ Z≥0
◦ Newton's Generalization: (1 − x)^{−α} = Σ_{k=0}^{∞} C(α + k − 1, α − 1)·xᵏ for α ∈ C (and |x| < 1)
◦ Σ_{k=1}^{n} k = n(n + 1)/2
◦ Σ_{k=1}^{n} k² = n(n + 1)(2n + 1)/6
◦ Σ_{k=1}^{n} k³ = (n(n + 1)/2)²
◦ Σ_{k=1}^{∞} 1/k² =: ζ(2) = π²/6
◦ Σ_{k=1}^{∞} 1/k⁴ =: ζ(4) = π⁴/90
◦ Σ_{k=1}^{∞} 1/k⁶ =: ζ(6) = π⁶/945
◦ Σ_{k=1}^{∞} 1/k^{2n} =: ζ(2n) = (−1)^{n+1}·B_{2n}·(2π)^{2n}/(2·(2n)!) for B_k the kth Bernoulli number, and n ∈ Z≥1
Geometric Series: Many like identities exist, by manipulating derivatives and integrals.
◦ Finite: Σ_{k=0}^{n} xᵏ = (1 − x^{n+1})/(1 − x)
◦ Infinite: Σ_{k=0}^{∞} xᵏ = 1/(1 − x) if |x| < 1
◦ ln(1 − x) = −Σ_{k=1}^{∞} xᵏ/k for |x| < 1
◦ ln(1 + x) = Σ_{k=1}^{∞} (−1)^{k+1}·xᵏ/k
◦ ln(x) = Σ_{k=1}^{∞} (−1)^{k+1}·(x − 1)ᵏ/k
Ordinary Trigonometric Functions: Many are excluded since their series are complicated, rarely used, or involve Bernoulli numbers. Note that if x appears in a denominator, it is not technically a power series.
◦ sin(x) = Σ_{k=0}^{∞} ((−1)ᵏ/(2k + 1)!)·x^{2k+1}
◦ cos(x) = Σ_{k=0}^{∞} ((−1)ᵏ/(2k)!)·x^{2k}
◦ arcsin(x) = Σ_{k=0}^{∞} ((2k)!/(2^{2k}·(k!)²·(2k + 1)))·x^{2k+1} for |x| ≤ 1
◦ arccos(x) = π/2 − Σ_{k=0}^{∞} ((2k)!/(2^{2k}·(k!)²·(2k + 1)))·x^{2k+1} for |x| ≤ 1 (π/2 − arcsin(x))
◦ arctan(x) = { Σ_{k=0}^{∞} ((−1)ᵏ/(2k + 1))·x^{2k+1}, |x| ≤ 1;
 π/2 − Σ_{k=0}^{∞} ((−1)ᵏ/(2k + 1))·1/x^{2k+1}, x ≥ 1;
 −π/2 − Σ_{k=0}^{∞} ((−1)ᵏ/(2k + 1))·1/x^{2k+1}, x ≤ −1 }
◦ arccsc(x) = Σ_{k=0}^{∞} ((2k)!/(2^{2k}·(k!)²·(2k + 1)))·1/x^{2k+1} for |x| ≥ 1 (arccsc(x) = arcsin(1/x))
◦ arcsec(x) = π/2 − Σ_{k=0}^{∞} ((2k)!/(2^{2k}·(k!)²·(2k + 1)))·1/x^{2k+1} for |x| ≥ 1 (arcsec(x) = arccos(1/x))
◦ arccot(x) = { π/2 − Σ_{k=0}^{∞} ((−1)ᵏ/(2k + 1))·x^{2k+1}, |x| ≤ 1;
 Σ_{k=0}^{∞} ((−1)ᵏ/(2k + 1))·1/x^{2k+1}, x ≥ 1;
 π + Σ_{k=0}^{∞} ((−1)ᵏ/(2k + 1))·1/x^{2k+1}, x ≤ −1 }
◦ (1 + aπ·coth(aπ))/(2a²) = Σ_{n=0}^{∞} 1/(n² + a²)
§3.6: Convergence Tests for Series
nth Term Limit Test: (Wikipedia) It is required that an → 0 for the summation to converge. (It is
not sufficient, however; see the harmonic series.)
Absolute Convergence Test: If a series converges absolutely, then it converges, i.e. Σ_{n=1}^{∞} |aₙ| converging ⟹ Σ_{n=1}^{∞} aₙ does too.
Abel's Test (link): Consider the summation Σ_{n=1}^{∞} aₙ, known to be convergent.
Let {bₙ}_{n∈N} be a bounded, monotone sequence.
Then Σ_{n=1}^{∞} aₙbₙ is also convergent.
Alternating Series Test / Leibniz's Criterion (link): Consider the summation Σ_{n=1}^{∞} (−1)ⁿaₙ.
Suppose aₙ ≥ aₙ₊₁ ≥ aₙ₊₂ ≥ ··· ≥ 0 and aₙ → 0.
Then the summation in question converges. More generally, all sums of the types
Σ_{n=k}^{∞} (−1)ⁿaₙ and Σ_{n=ℓ}^{∞} (−1)^{n+1}aₙ
converge under these hypotheses, and consecutive partial sums bracket the total:
Σ_{n=k}^{N+1} (−1)ⁿaₙ ≤ Σ_{n=k}^{∞} (−1)ⁿaₙ ≤ Σ_{n=k}^{N} (−1)ⁿaₙ
(for N of the appropriate parity; otherwise the outer bounds swap).
Cauchy Condensation Test (link): Consider the summation S := Σ_{n=1}^{∞} aₙ, with aₙ nonnegative and non-increasing.
It converges if and only if S* := Σ_{n=0}^{∞} 2ⁿ·a_{2ⁿ} does. (Moreover, this new sum will satisfy S ≤ S* ≤ 2S.)
Direct Comparison (of Series) Test (link): Consider the summation Σ_{n=1}^{∞} aₙ.
Let Σ_{n=1}^{∞} bₙ be a known absolutely convergent series, with |aₙ| ≤ |bₙ| for all n sufficiently large.
Then Σ_{n=1}^{∞} aₙ converges absolutely.
This test may be applied to integrals in the obvious way, and has the ratio comparison test as a
corollary.
Dirichlet's Test (link): Take {aₙ}_{n∈N} ⊆ R and {bₙ}_{n∈N} ⊆ C with
◦ aₙ ≥ aₙ₊₁ ≥ ···
◦ aₙ → 0
◦ ∃M > 0 such that, ∀N ∈ N, |Σ_{n=1}^{N} bₙ| ≤ M (uniformly bounded partial sums)
Then Σ_{n=1}^{∞} aₙbₙ converges.
Integral Test (link): Consider the summation Σ_{n=1}^{∞} aₙ.
Let f : [1, ∞) → R be such that f(n) = aₙ, f ≥ 0, and f is decreasing. Then let
L = ∫_1^∞ f(x) dx
If L < ∞, the summation converges, and otherwise we have divergence. (Series converges iff integral does.)
One may, of course, use ∫_N^∞ f(x) dx for Σ_{n=N}^{∞} aₙ.
Limit Comparison Test / Ratio of Limits (link): Consider the summations Σ_{n=1}^{∞} aₙ, Σ_{n=1}^{∞} bₙ.
We suppose that aₙ, bₙ > 0 for all n. Define
L := lim_{n→∞} aₙ/bₙ
If L exists and 0 < L < ∞, then each sum diverges iff the other does (and same for convergence). (Both diverge or both converge.)
p-Series Test (ζ function tails): In particular, given p, we consider
ζ(p) := Σ_{n=1}^{∞} 1/nᵖ or even just the tails Σ_{n=k}^{∞} 1/nᵖ
which converge if and only if p > 1.
Ratio Test / d'Alembert's Criterion (link): Consider the summation Σ_{n=1}^{∞} aₙ. Define
L := lim_{n→∞} |aₙ₊₁/aₙ|
Then:
◦ L < 1 ⟹ absolute convergence
◦ L > 1 ⟹ divergence
◦ L = 1 is inconclusive
Sometimes the limit L does not exist, so we may look at the more general
M := lim sup_{n→∞} |aₙ₊₁/aₙ|, m := lim inf_{n→∞} |aₙ₊₁/aₙ|
and conclude:
◦ if M < 1, absolute convergence
◦ if m > 1, divergence
◦ if |aₙ₊₁/aₙ| ≥ 1 for all n large enough, divergence
◦ inconclusive otherwise
Somewhat obscure extensions of the test may be found here.
Root Test / Cauchy's Criterion (link): Consider the summation Σ_{n=1}^{∞} aₙ. Define
L := lim sup_{n→∞} ⁿ√|aₙ|
◦ L < 1 ⟹ convergence
◦ L > 1 ⟹ divergence
◦ L = 1 is inconclusive
The root test is stronger than the ratio test: it applies (with the same conclusion) whenever the ratio test does, but not vice versa.
The root test has an application in the Cauchy-Hadamard theorem, stating that the radius of convergence of a power series f(x) := Σ_{n=1}^{∞} aₙ(x − c)ⁿ is given by
r = 1 / lim sup_{n→∞} ⁿ√|aₙ|
Weierstrass M-test (link): Take a sequence {fₙ : A → C}_{n∈N}, and suppose ∃{Mₙ}_{n∈N} ⊆ R≥0 such that |fₙ(x)| ≤ Mₙ for all x ∈ A and all n, with Σₙ Mₙ < ∞. Then Σₙ fₙ converges absolutely and uniformly on A.
§3.7: Convergence Tests for Integrals
Generalizing Series Tests: In general, suppose that f : R≥1 → R≥0 is non-increasing and f ∈ R[1, b] (Riemann integrable) for all b > 1. Then we may use series convergence tests, as
∫_1^∞ f converges ⟺ Σ_{n=1}^{∞} f(n) converges
In particular, the non-increasing & positivity conditions ensure that the sum is an upper bound for the integral. (Of course, we may start at any integer if desired, not merely 1.)
Absolute Convergence Test: For a, b ∈ [−∞, +∞], if ∫_a^b |f| converges, so must ∫_a^b f. (Used to turn the integrand nonnegative to help with other tests.)
Dirichlet's Test (link): Take f, g : R≥a → R continuous, with the partial integrals of f uniformly bounded (i.e. |∫_a^x f| ≤ M for all x, for some M independent of x) and g monotone-decreasing with g(x) → 0 as x → ∞.
Then ∫_a^∞ fg converges.
Limit At Infinity Test: Suppose that lim_{x→∞} f(x) = L. Then, trivially, ∫_a^∞ f diverges if L ≠ 0. (Converse need not be true.)
Limit Comparison Test: For f, g eventually nonnegative, suppose
lim_{x→∞} f(x)/g(x) = L
Then
◦ If L ∈ (0, ∞), ∫_a^∞ g converges ⟺ ∫_a^∞ f converges
◦ If L ∈ (0, ∞), ∫_a^∞ g diverges ⟺ ∫_a^∞ f diverges
◦ If L = 0, ∫_a^∞ g converges ⟹ ∫_a^∞ f converges
◦ If L = 0, ∫_a^∞ f diverges ⟹ ∫_a^∞ g diverges
◦ If L = ∞, ∫_a^∞ f converges ⟹ ∫_a^∞ g converges
◦ If L = ∞, ∫_a^∞ g diverges ⟹ ∫_a^∞ f diverges
These hold likewise for the "Type 2" improper integrals concerned with discontinuities.
§3.8: Fourier Series
Given a function f, several different types of Fourier series may exist. We will adopt several (not necessarily standard) notations for each. In each, we generally are given a function f : I → R for I some interval, and extend it periodically in some fashion.
(Full) Fourier Series: Given f : [−L, L] → R,
FS[f](x) = a₀/2 + Σ_{n=1}^{∞} [aₙ cos(nπx/L) + bₙ sin(nπx/L)]
wherein
aₙ = (1/L)·∫_{−L}^{L} f(x) cos(nπx/L) dx, bₙ = (1/L)·∫_{−L}^{L} f(x) sin(nπx/L) dx
If dom(f) = [0, 2L], then we instead integrate over that interval but the formula is otherwise unchanged. (In general, for f 2L-periodic or periodically-extended, any interval (a, a + 2L) may be used.)
Observe that if f is even, then f(x) sin(nπx/L) is odd and bₙ = 0. Similarly, f odd gives aₙ = 0.
Fourier Cosine Series (link): Given f : [−L, L] → R even, we have
FCS[f](x) = a₀/2 + Σ_{n=1}^{∞} aₙ cos(nπx/L), where aₙ = (2/L)·∫_0^L f(x) cos(nπx/L) dx
If we have f : [0, L] → R not necessarily even, we may give it an even extension to a function [−L, L] → R and apply the above formula with no modifications.
Fourier Sine Series (link): Given f : [−L, L] → R odd, we have
FSS[f](x) = Σ_{n=1}^{∞} bₙ sin(nπx/L), where bₙ = (2/L)·∫_0^L f(x) sin(nπx/L) dx
If we have f : [0, L] → R not necessarily odd, we give it an odd extension to a function [−L, L] → R and apply the above formula with no modifications.
Fourier Exponential Series: Given f : [−L/2, L/2] → R, we have
FES[f](x) = Σ_{n=−∞}^{+∞} cₙ·exp(2πin·x/L), with cₙ = (1/L)·∫_{−L/2}^{L/2} f(x)·exp(−2πin·x/L) dx
§3.8.2: Some Properties & Results
If f is α-Hölder continuous for any α > 0, FS[f] converges uniformly everywhere to f. (Called the Dirichlet-Dini criterion.)
For f continuous or Lp (1 < p ≤ ∞) in general, FS[f ] converges a.e. (known as Carleson’s theorem).
Some other results follow. We let f have FS[f ] coefficients an , bn and FES[f ] coefficients cn .
Riemann-Lebesgue Lemma: For f ∈ R[0, L] or R[−L, L] as needed, we have aₙ, bₙ, cₙ → 0 as n → ∞.
Parseval's Theorem: For f ∈ L²(0, L), then (1/L)·∫_0^L |f|² = Σ_{n∈Z} |cₙ|²
For f with coefficients supported on 0 ≤ n ≤ M − 1:
◦ cₙ ∈ C ⟹ (1/L)·∫_0^L |f|⁴ = Σ_{k=0}^{M−1} Σ_{ℓ=0}^{M−1} [ Σ_{m=k−ℓ}^{M−1} cₖ·c̄ₗ·cₘ·c̄_{m−k+ℓ} (if k ≥ ℓ) + Σ_{m=ℓ−k}^{M−1} cₖ·c̄ₗ·c̄_{m−ℓ+k}·cₘ (if k < ℓ) ]
◦ cₙ ∈ R ⟹ (1/L)·∫_0^L |f|⁴ = Σ_{k=0}^{M−1} Σ_{ℓ=0}^{M−1} Σ_{m=|k−ℓ|}^{M−1} cₖ·cₗ·cₘ·c_{m−|k−ℓ|}
Plancherel's Theorem: Given {cₙ}_{n∈Z} such that Σ_{n∈Z} |cₙ|² < ∞ (finite ℓ² norm), then ∃! f ∈ L²(0, L) with FES[f] given by those cₙ.
Given r, s : [0, P] → C with coefficients R[n] := rₙ and S[n] := sₙ, we have the following properties. We see a function on the left, and what happens to the coefficients of its FES on the right. For instance, we see that
FES[Re(s)](x) = Σ_{n=−∞}^{∞} ((sₙ + s̄₋ₙ)/2)·exp(2πin·x/P)
Some others:
§3.8.3: Some Common Fourier Series
§3.9: Fourier Transforms
We define
Ff(ξ) ≡ f̂(ξ) := ∫_R f(x)·e^{−2πiξx} dx
for the so-called forward transform (f ↦ f̂), and analogously
F⁻¹g(x) := ∫_R g(ξ)·e^{+2πiξx} dξ
for the inverse transform F⁻¹. In Rⁿ we will use
Ff(ξ) := ∫_{Rⁿ} f(x)·e^{−2πi⟨ξ,x⟩} dx
instead (with ⟨·,·⟩ denoting the usual dot product). Most statements here for R can be generalized to Rⁿ with this mild modification.
Warning (Physics). In physics in particular, it is commonplace to express these in terms of an angular frequency ω = 2πξ. This leads to the convention, for instance,
Ff(ω) = ∫_R f(x)·e^{−iωx} dx
but this breaks a symmetry, in that with this convention we must define
F⁻¹g(x) := (1/2π)·∫_R g(ω)·e^{iωx} dω
This gives rise to the unitary convention that allows F[F⁻¹[f]] = F⁻¹[F[f]] = f:
Ff(ω) := (1/√(2π))·∫_R f(x)·e^{−iωx} dx
§3.9.2: Important Properties
Time Scaling: If α ≠ 0, then F[f(αx)](ξ) = (1/|α|)·Ff(ξ/α)
Invertibility/Periodicity: For suitable f, we note that F[F[f]](x) = f(−x) and hence (F∘F∘F∘F)[f] = f, or more simply (implying composition) F⁴[f] = f, making the Fourier transform 4-periodic on these functions. Hence F³[f̂] = f under this convention.
◦ This can be summed up with the Fourier inversion theorem or Fourier integral theorem, which states
f(x) = ∫_{R²} e^{2πi·(x−y)·ξ}·f(y) dy dξ
which may be thought of as F⁻¹ applied to Ff.
◦ The theorem can be said to hold for all Schwartz functions (Wikipedia). That is, it applies if f is such that
f ∈ C^∞(R, C) and, ∀α, β ∈ N, ∥f∥_{α,β} := sup_{x∈R} |x^α·f^{(β)}(x)| < ∞
which encodes the "rapidly decreasing" functions. (An analogous idea holds in Rⁿ.)
◦ Even broader, it holds for all functions f with f, Ff ∈ L1 (R) with each continuous a.e.
◦ Some more discussion here.
Differentiation: We assume that f ∈ C 1 (R) ∩ L1 (R) with f ′ ∈ L1 (R). Then Ff ′ (ξ) = 2πiξ · Ff (ξ)
◦ Taking f ∈ C n (R) and f, f ′ , · · ·, f (n) ∈ L1 (R) gives Ff (n) (ξ) = (2πiξ)n · Ff (ξ)
◦ This admits the rule of thumb “f is smooth iff Ff decays to 0 quickly as |ξ| → ∞” and likewise
“f decays to 0 quickly as |x| → ∞ iff Ff is smooth”
◦ Related Identity: F[xⁿ f(x)](ξ) = (i/2π)ⁿ · dⁿFf(ξ)/dξⁿ
Convolution: For suitable f, g (say f, g ∈ L¹(R)), then F_{f∗g} = Ff · Fg and F_{f·g} = Ff ∗ Fg.
Uniform Continuity & Riemann-Lebesgue: The following hold since f ∈ L1 (R):
◦ Ff is uniformly continuous
◦ ∥Ff ∥∞ ≤ ∥f ∥1 (in the Lp space senses)
◦ Ff(ξ) → 0 as |ξ| → ∞
Poisson Summation Formula: Σ_{n∈Z} f(n) = Σ_{n∈Z} Ff(n) (Wikipedia article)
Lp Relations:
◦ F : L¹(R) → L^∞(R) is bounded as an operator (as ∥Ff∥_∞ ≤ ∥f∥₁). Moreover, F(L¹(R)) ⊆ C₀(R)
(though with no equality).
◦ F : L2 (R) → L2 (R) is unitary (hence bijective and preserves inner products)
◦ F : Lᵖ(R) → L^q(R) more generally for p ∈ [1, 2], where q = p/(p − 1) is the Hölder conjugate of p, i.e.
1/p + 1/q = 1 (Hausdorff-Young). The exact image is hard to characterize unless p = 2.
§3.9.3: Common Fourier Transforms
Notably,

tri(x) = rect(x) ∗ rect(x) = rect(x/2) · (1 − |x|)
Dirac Delta δ(x): Loosely we may think of
δ(x) ?:= +∞ for x = 0, and 0 for x ≠ 0

with ∫_R δ ≡ 1. This is purely heuristic. Formally, as a distribution (generalized function), δ is a linear
functional on the space of test functions (bump functions, or functions of compact support: Cc∞ (R)),
with
δ[φ] = φ(0) for every φ ∈ Cc∞ (R)
and hence we say

δ[φ] := ∫_R φ(x) δ(x) dx = φ(0)
Square-Integrable One-Dimensional Functions (L2 (R)):
F[rect(αx)](ξ) = (1/|α|) sinc(ξ/α)

F[sinc(αx)](ξ) = (1/|α|) rect(ξ/α)

F[sinc²(αx)](ξ) = (1/|α|) tri(ξ/α)

F[tri(αx)](ξ) = (1/|α|) sinc²(ξ/α)

F[e^{−αx} u(x)](ξ) = 1/(α + 2πiξ)

F[e^{−αx²}](ξ) = √(π/α) e^{−π²ξ²/α}

F[e^{−iαx²}](ξ) = √(π/α) exp(i(π²ξ²/α − π/4))

F[e^{−α|x|}](ξ) = 2α/(α² + 4π²ξ²)

F[sech(αx)](ξ) = (π/α) sech(π²ξ/α)
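These table entries can be spot-checked by direct quadrature. A minimal sketch for the Gaussian pair (the test values of α, ξ are arbitrary choices of mine):

```python
import numpy as np

alpha, xi = 1.3, 0.7                         # arbitrary test values
x = np.linspace(-20, 20, 400001)
integrand = np.exp(-alpha * x**2) * np.exp(-2j * np.pi * xi * x)
numeric = np.trapz(integrand, x)             # approximate F[exp(-a x^2)](xi)
exact = np.sqrt(np.pi / alpha) * np.exp(-np.pi**2 * xi**2 / alpha)
print(numeric.real, exact)                   # agree to many digits
```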
Distributions in One Dimension (e.g. Cc∞ (R)):
Derivatives of δ (i.e. δ (n) ) should be interpreted in the weak-derivative sense. To recall, the weak derivative
of f is a function Df such that
∫_R f φ′ = − ∫_R (Df) φ   for every φ ∈ C_c^∞(R)
F[1](ξ) = δ(ξ)

F[δ(x)](ξ) = 1

F[e^{2πiαx}](ξ) = δ(ξ − α)

F[cos(2παx)](ξ) = (1/2)[δ(ξ − α) + δ(ξ + α)]

F[sin(2παx)](ξ) = (1/2i)[δ(ξ − α) − δ(ξ + α)]

F[cos(αx²)](ξ) = √(π/α) cos(π²ξ²/α − π/4)

F[sin(αx²)](ξ) = −√(π/α) sin(π²ξ²/α − π/4)

F[xⁿ](ξ) = (i/2π)ⁿ δ⁽ⁿ⁾(ξ)

F[δ⁽ⁿ⁾(x)](ξ) = (2πiξ)ⁿ

F[1/x](ξ) = −iπ · sign(ξ)

F[1/xⁿ](ξ) = F[((−1)ⁿ⁻¹/(n−1)!) · (dⁿ/dxⁿ) ln|x|](ξ) = −iπ · ((−2πiξ)ⁿ⁻¹/(n−1)!) · sign(ξ)

F[|x|^α](ξ) = −(2 · Γ(α+1)/|2πξ|^{α+1}) · sin(πα/2)

F[1/√|x|](ξ) = 1/√|ξ|

F[sign(x)](ξ) = 1/(iπξ)

F[u(x)](ξ) = (1/2)[1/(iπξ) + δ(ξ)]
§3.9.4: Discrete Fourier Transform (DFT)
(Some discussion on Wikipedia. Related: the Fast Fourier Transform (FFT) (link).)
Given {xₖ}_{k=0}^{N−1} ⊆ C, its discrete Fourier transform (DFT) is the sequence {Xₖ}_{k=0}^{N−1} ⊆ C defined by

Xₖ := Σ_{n=0}^{N−1} xₙ exp(−2πi · kn/N)
Parseval's theorem carries over to the DFT:

Σ_{n=0}^{N−1} xₙ ȳₙ = (1/N) Σ_{k=0}^{N−1} Xₖ Ȳₖ
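A minimal sketch tying the definition to practice (assuming numpy's FFT, which uses the same sign and normalization conventions as the definition above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8) + 1j * rng.normal(size=8)
N = len(x)

# Naive O(N^2) DFT straight from the definition
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)
X = W @ x

assert np.allclose(X, np.fft.fft(x))                 # matches the FFT
assert np.isclose(np.vdot(x, x), np.vdot(X, X) / N)  # Parseval (y = x)
```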
A few sequence/DFT pairs:
§3.10: Laplace Transforms
(Tables of Laplace transforms are here on Paul’s Online Math Notes or here on Wikipedia.)
The (usual, one-sided) Laplace transform is Lf(s) := ∫_0^∞ f(t) e^{−st} dt. Note that this gives a function in the variable s from a function of the variable t. Other notations exist, commonly
L{f(t)}(s), but I will focus on Lf(s) for simplicity unless details of f must be accentuated. Sometimes we
let F = Lf and f = L⁻¹F as well.
Some utilize a two-sided transform, or other alternatives; Wikipedia has some discussion. Here, we
focus exclusively on the usual, one-sided transform if not stated otherwise.
§3.10.2: Laplace Transform Properties
Conjugation: L{f̄}(s) = \overline{Lf(s̄)} (the transform of the conjugate is the conjugate of the transform at s̄)
§3.10.3: Common Laplace Transforms
Polynomials/Polynomial-Like:
◦ L{1}(s) = 1/s

◦ L{t^α}(s) = α!/s^{α+1} = Γ(α+1)/s^{α+1}

Exponentials:

◦ L{e^{αt}}(s) = 1/(s − α) (facilitates shift, of a sort)

Trigonometric Functions:

◦ L{sin(αt)}(s) = α/(s² + α²)

◦ L{cos(αt)}(s) = s/(s² + α²)

◦ L{sinh(αt)}(s) = α/(s² − α²)

◦ L{cosh(αt)}(s) = s/(s² − α²)
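The entries above can be spot-checked symbolically; a minimal sympy sketch (sympy's laplace_transform uses the same one-sided ∫_0^∞ e^{−st} convention):

```python
import sympy as sp

t, s = sp.symbols("t s", positive=True)
a = sp.Symbol("alpha", positive=True)

for f in (1, sp.exp(a * t), sp.sin(a * t), sp.cosh(a * t)):
    F = sp.laplace_transform(f, t, s, noconds=True)  # drop convergence conditions
    print(f, "->", F)
# 1 -> 1/s, exp(a t) -> 1/(s - a), sin(a t) -> a/(a**2 + s**2), cosh(a t) -> s/(s**2 - a**2)
```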
Special Functions:
§3.11: Cauchy Principal Value (PV/CPV)
§4: Items from Vector & Higher-Dimensional Calculus (Calculus
II/III)
◦ Derivatives: Let x, y be parameterized by t, i.e. we have x(t), y(t). Then, if dx/dt ̸= 0, and all
derivatives involved exist,
dy/dx = (dy/dt)/(dx/dt) = y′(t)/x′(t)     d²y/dx² = (d/dt)[dy/dx]/(dx/dt) = (y″(t)x′(t) − y′(t)x″(t))/[x′(t)]³
◦ Arc Length: If x, y are parameterized in t, and C = {(x(t), y(t))}_{t∈[a,b]}, then

the length of C = ∫_a^b √([x′(t)]² + [y′(t)]²) dt

Notably, it has the associated differential ds = √((dx)² + (dy)²).
◦ Areas of Surfaces of Revolution: Take a curve C defined by x(t), y(t), all as above, revolved about
the x axis to form a surface S. Then

the area of S = ∫_a^b 2πy(t) √([x′(t)]² + [y′(t)]²) dt
◦ Conversions: Recall that, to convert Cartesian (x, y) to polar (r, θ), we have
r = √(x² + y²)     θ = atan2(y, x) (usual polar angle, best to use a picture)
x = r cos θ         y = r sin θ
◦ Area: The area between the origin and the curve r = f (θ), for θ ∈ [α, β], is given by
area = (1/2) ∫_α^β r² dθ = (1/2) ∫_α^β f(θ)² dθ
If you want the area between r1 (θ) and r2 (θ), you can subtract as in the Cartesian case.
◦ Arc Length: For r = f (θ) a C 1 function and a curve C := {(f (θ), θ)}θ∈[α,β] , we have
arc length of C = ∫_α^β √(r² + (r′)²) dθ = ∫_α^β √([f(θ)]² + [f′(θ)]²) dθ
§4.2: Basics on Vectors, Dot/Cross Products, & Lines/Planes
u⃗ · v⃗ = ⟨u⃗, v⃗⟩_{Rⁿ} = Σ_{i=1}^n uᵢvᵢ

We note that u⃗, v⃗ are perpendicular iff this is zero, i.e. u⃗ ⊥ v⃗ ⟺ ⟨u⃗, v⃗⟩ = 0
◦ Angle Between Vectors: Given u⃗, v⃗, the angle θ between them satisfies the relation

cos θ = ⟨u⃗, v⃗⟩ / (∥u⃗∥∥v⃗∥)
◦ Vector Projection: The projection of u⃗ onto v⃗ (the shadow of u⃗ as cast from a light perpendicular to v⃗) is given by

proj_{v⃗}(u⃗) = (⟨u⃗, v⃗⟩ / ∥v⃗∥²) v⃗
◦ Distance from Point to Line: Consider points P, S in R³. Suppose we want the distance
from S to a line through P parallel to v⃗. Let d⃗ be the vector from P to S (so d⃗ = S − P).
Then:

the desired distance = ∥d⃗ × v⃗∥ / ∥v⃗∥
◦ Distance from Point to Plane: Consider a point P on a plane that has normal vector n⃗, and
a point S in space. (Hence let v⃗ be the vector from P to S, v⃗ = S − P.) Then

distance from S to the plane = |⟨v⃗, n⃗/∥n⃗∥⟩|
§4.3: Vector Calculus: Derivatives & Integrals of Vector-Valued Functions
Differentiation:
◦ Derivatives: Take r⃗(t) := f(t)ı̂ + g(t)ȷ̂ + h(t)k̂ ∈ R³ a vector-valued function. Then derivatives
are componentwise:

r⃗′(t) ≡ dr⃗/dt := (df/dt)ı̂ + (dg/dt)ȷ̂ + (dh/dt)k̂
◦ Tangent Line: The tangent line to the curve r⃗(t) traces out, at a point f(t₀)ı̂ + g(t₀)ȷ̂ + h(t₀)k̂
on it, is the line through the point in question, parallel to the derivative vector r⃗′(t₀).
◦ Derivative Rules: Most standard rules hold. Some noteworthy exceptions:
Scalar-Vector Product Rule: Take f : R → R and r⃗ : R → R³ (i.e. f(t) and r⃗(t)). Then

(d/dt)[f(t) · r⃗(t)] = f′(t) · r⃗(t) + f(t) · r⃗′(t)
Dot Product Rule: For u⃗, v⃗ : R → R³ vector-valued functions of t,

(d/dt)⟨u⃗(t), v⃗(t)⟩ = ⟨u⃗′(t), v⃗(t)⟩ + ⟨u⃗(t), v⃗′(t)⟩
Cross Product Rule: Similarly,

(d/dt)[u⃗(t) × v⃗(t)] = u⃗′(t) × v⃗(t) + u⃗(t) × v⃗′(t)
◦ Constant Length: A differentiable r⃗(t) has constant length iff ⟨r⃗, r⃗′⟩ = 0 for each t.
◦ Acceleration: We may write the acceleration vector as

a⃗ = d²r⃗/dt² = a_T T̂ + a_N N̂

a_T = d²s/dt² = (d/dt)∥r⃗′∥

a_N = κ(ds/dt)² = κ∥r⃗′∥²

where a_T, a_N are the tangential & normal scalar components of acceleration, respectively.
Note that acceleration always lies in the plane spanned by these two. One may also note

a_N = √(∥r⃗″∥² − a_T²)
Integration: We let r⃗(t) := f(t)ı̂ + g(t)ȷ̂ + h(t)k̂ unless specified otherwise.
§4.4: Partial Derivatives
The partial derivative of f w.r.t. x at (x₀, y₀) is f_x(x₀, y₀) := lim_{h→0} [f(x₀ + h, y₀) − f(x₀, y₀)]/h, if the limit exists. (We have analogous expressions for other directions - x, y, z, etc. - throughout Rⁿ.) Note
how this holds all but x fixed in f.
We say that f is differentiable at (x₀, y₀) if f_x, f_y exist there and

∆z = f_x(x₀, y₀)∆x + f_y(x₀, y₀)∆y + ε₁∆x + ε₂∆y

is satisfied with ∆x, ∆y → 0 ⟹ ε₁, ε₂ → 0. One may show that if f_x, f_y are defined in an open region
containing (x₀, y₀) and are continuous at that point, then f is differentiable at (x₀, y₀).
Multivariable Chain Rule - One Independent Variable: For simplicity, consider a function
f : R2 → R, in the variables x, y which are parameterized in t. That is, in full, we may write
f x(t), y(t)
Then
df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
(Analogous results hold for functions with dom(f ) = Rn in an analogous manner.)
Multivariable Chain Rule - Two Independent Variables: Consider a function f : R³ → R, in
the variables x, y, z, each parameterized in s, t. Hence in full, we have f(x(s, t), y(s, t), z(s, t)), and then

∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) + (∂f/∂z)(∂z/∂s)

and likewise for ∂f/∂t.
◦ F ’s partial derivatives are continuous in a region containing the evaluation point (x0 , y0 , z0 )
◦ F (x0 , y0 , z0 ) = c for some c ∈ R
◦ ∂z F (x0 , y0 , z0 ) ̸= 0
◦ Then F(x, y, z) = c defines z as a differentiable function of x, y near that point, with ∂z/∂x = −F_x/F_z and ∂z/∂y = −F_y/F_z
Clairaut's Theorem / Mixed Derivative Theorem: Suppose f, f_x, f_y, f_xy, f_yx are defined and
continuous at (a, b) and in an open region about it. Then f_xy(a, b) = f_yx(a, b).
§4.5: Directional Derivatives & Gradients
◦ f increases most rapidly when cos θ = 1, i.e. when θ = 0 and û is parallel to ∇f. That is, f
increases most rapidly at a point P to another point P′ infinitesimally nearby when its input runs
in the direction of ∇f at P.
◦ Decreasing similarly happens fastest in the direction of −∇f .
◦ If u⃗ is orthogonal to ∇f ≠ 0⃗, then no change occurs in f in that direction (the cosine becomes
zero).
◦ Consider a level curve f(x, y) ≡ c. At any point (x, y) on the curve, ∇f is normal to the curve.
(∇f points in the direction of increasing values - like pointing uphill on a topographic map - and
−∇f in the direction of decreasing values.)
Tangent Line to Level Curve: The tangent line to a level curve containing (x0 , y0 ) is hence given
by
(∂f/∂x)|_{(x₀,y₀)} · (x − x₀) + (∂f/∂y)|_{(x₀,y₀)} · (y − y₀) = 0
Tangent Planes & Gradients: Given a point P = (x₀, y₀, z₀) which is on the level surface f(x, y, z) = c,
the tangent plane at P is the plane through P normal to ∇f|_P. The normal line of the surface at P
is the line through P and parallel to ∇f|_P.
Hence the tangent plane has equation
(∂f/∂x)|_P · (x − x₀) + (∂f/∂y)|_P · (y − y₀) + (∂f/∂z)|_P · (z − z₀) = 0
and the normal line may be expressed by
x(t) = x₀ + t · (∂f/∂x)|_P
y(t) = y₀ + t · (∂f/∂y)|_P      t ∈ R
z(t) = z₀ + t · (∂f/∂z)|_P
If we let z = f (x, y) define a surface in space, then at the point (x0 , y0 , f (x0 , y0 )) ∈ R3 , we have the
tangent plane as
(∂f/∂x)|_{(x₀,y₀)} · (x − x₀) + (∂f/∂y)|_{(x₀,y₀)} · (y − y₀) = z − f(x₀, y₀)

(here f(x₀, y₀) plays the role of z₀).
Chain Rule for Paths: Suppose we have r⃗(t) := x(t)ı̂ + y(t)ȷ̂ + z(t)k̂ making a smooth path C,
and we have (f ∘ r⃗)(t) = f(r⃗(t)) a scalar function evaluated along that path. Then we may write

(d/dt) f(r⃗(t)) = ⟨∇f(r⃗(t)), r⃗′(t)⟩
§4.6: Differentials & Linearization
Change in a Direction: The change of f after moving a small distance ds from a point P in a
direction parallel to a vector v⃗ is given by

df ≈ ⟨∇f|_P, v⃗/∥v⃗∥⟩ · ds
Linearization: The linearization of f (as a function of x, y) at the point P = (x₀, y₀) is the function

L_P(x, y) := f(P) + (∂f/∂x)|_P (x − x₀) + (∂f/∂y)|_P (y − y₀)
Hence, for (x, y) near P , we have f (x, y) ≈ LP (x, y). Analogous expressions exist in higher dimensions.
The error E in the approximation is given by

|E(x, y)| ≤ (M/2)(|x − x₀| + |y − y₀|)²   where   |f_xx|, |f_xy|, |f_yy| ≤ M
The generalization to Rn is as expected.
Total Differentials: We may define the total differential of f (as (x₀, y₀) moves to (x₀+dx, y₀+dy)) by

df = (∂f/∂x)|_{(x₀,y₀)} · dx + (∂f/∂y)|_{(x₀,y₀)} · dy

The generalization to Rⁿ is as expected.
§4.7: Optimization & Lagrange Multipliers
Unconstrained Optimization:
First Derivative Test: If (a, b) ∈ int(dom f ) is a local maximum or local minimum of f at which
fx , fy exist, then fx (a, b) = fy (a, b) = 0.
Critical/Saddle Points: We say (a, b) is a critical point if fx (a, b) = fy (a, b) = 0, or at least one
does not exist.
We say (a, b) is a saddle point if it is a critical point and every open ball centered at (a, b) contains points
(x_p, y_p) and (x_n, y_n) with f(x_n, y_n) < f(a, b) < f(x_p, y_p).
Second Derivative Test: Take f ∈ C 2,2 (all first/second partial derivatives are continuous) in a ball
about (a, b), with fx (a, b) = fy (a, b) = 0. Then
◦ (a, b) is a local maximum if f_xx < 0 and f_xx f_yy − f_xy² > 0 there
◦ (a, b) is a local minimum if f_xx > 0 and f_xx f_yy − f_xy² > 0 there
◦ (a, b) is a saddle point if f_xx f_yy − f_xy² < 0 there
◦ The test is inconclusive if f_xx f_yy − f_xy² = 0 at the point in question
§4.8: Multiple Integrals, Differentials, Jacobians, Applications
Multiple integrals may be defined in the obvious way; more thorough, general definitions are discussed
in the sections on Riemann integration & Lebesgue integration. We focus on relevant ideas and results here.
Analogous results exist for boxes in Rn : integral order may be interchanged. We may also do this for
inner integral bounds defined as a function of the outer variable:
∬_R f(x, y) dA = ∫_{y=c}^{y=d} ∫_{x=h₁(y)}^{x=h₂(y)} f(x, y) dx dy = ∫_{x=a}^{x=b} ∫_{y=g₁(x)}^{y=g₂(x)} f(x, y) dy dx
◦ The aforementioned conversions may be justified with the Jacobian or Jacobian determinant.
In two dimensions, suppose x = g(u, v) and y = h(u, v) (e.g. x = r cos θ, y = r sin θ). Then
J(u, v) ≡ ∂(x, y)/∂(u, v) := det [ ∂x/∂u  ∂x/∂v ]  =  det [ ∂_u x  ∂_v x ]
                                 [ ∂y/∂u  ∂y/∂v ]         [ ∂_u y  ∂_v y ]
Notice that the variables differentiated w.r.t. “enumerate” the columns, and the functions differ-
entiated “enumerate” the rows. In three dimensions, likewise,
J(u, v, w) := det [ ∂_u x  ∂_v x  ∂_w x ]
                  [ ∂_u y  ∂_v y  ∂_w y ]
                  [ ∂_u z  ∂_v z  ∂_w z ]
◦ If you make the substitution x = g(u, v), y = h(u, v) in an integral (of f over R), then if R becomes
G under the transform (University Calculus specifies preimage?), then
∬_R f(x, y) dx dy = ∬_G f(g(u, v), h(u, v)) · |∂(x, y)/∂(u, v)| du dv

That is,

dx dy = |∂(x, y)/∂(u, v)| du dv
The obvious extension to Rn holds.
◦ Hence, if you start with a Cartesian integral, and convert elsewhere, you pick up an extra factor
determined by the Jacobian.
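A quick symbolic check of the polar Jacobian (a minimal sympy sketch):

```python
import sympy as sp

r, th = sp.symbols("r theta", positive=True)
x, y = r * sp.cos(th), r * sp.sin(th)

J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]])
print(sp.simplify(J.det()))   # r, hence dx dy = r dr dθ
```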
Masses, Moments, Centers, Etc.:
◦ Mass (Zeroth Moment): Given a density δ := δ(x, y, z), the mass of the object occupying a
region D in space is given by

M = ∭_D δ(x, y, z) dV
◦ First Moments in 3D: Moments describe the shape of a graph: mass is a zeroth moment,
center of mass is a first moment (when normalized), and moment of inertia is the second moment.
(Probability distributions have first moments as averages, and second moments as variances.)
The first moments about the coordinate planes are denoted by Mxyz with x, y, or z missing
depending on which of the three planes the moment is about: Myz is the moment about x for
instance. We’ll have
M_yz = ∭_D x · δ(x, y, z) dV (moment about x)     M_xz = ∭_D y · δ(x, y, z) dV (moment about y)     M_xy = ∭_D z · δ(x, y, z) dV (moment about z)
◦ Second Moments: The second moments I_x, I_y, I_z about the x, y, z axes (resp.) are given by

I_x = ∭_V (y² + z²) δ(x, y, z) dV
I_y = ∭_V (x² + z²) δ(x, y, z) dV
I_z = ∭_V (x² + y²) δ(x, y, z) dV

In general, for a line L, its second moment is I_L. If r(x, y, z) is the distance from (x, y, z) to L, then

I_L = ∭_V r²(x, y, z) · δ(x, y, z) dV
§4.9: Line Integrals
Note that, even if two paths C, C′ start and end at the same points, we may have ∫_C f ≠ ∫_{C′} f.
The Case of Vector-Valued Functions:
Now suppose we have a vector-valued function F⃗ and a curve C parameterized by r⃗(t) for t ∈ [a, b].
Then the line integral of F⃗ over C is

∫_C ⟨F⃗, T̂⟩ ds = ∫_C ⟨F⃗, dr⃗/ds⟩ ds = ∫_C F⃗ · dr⃗ = ∫_a^b ⟨F⃗(r⃗(t)), dr⃗/dt⟩ dt
(albeit usually written in the dot product notation). Observe that, for instance,

F⃗ := M(x, y, z)ı̂ ⟹ ∫_C F⃗ · dr⃗ = ∫_C M(x, y, z) dx
Hence if r⃗(t) := g(t)ı̂ + h(t)ȷ̂ + k(t)k̂ parameterizes C, then

∫_C M(x, y, z) dx = ∫_C M dx = ∫_a^b M(g(t), h(t), k(t)) · g′(t) dt
∫_C M(x, y, z) dy = ∫_C M dy = ∫_a^b M(g(t), h(t), k(t)) · h′(t) dt
∫_C M(x, y, z) dz = ∫_C M dz = ∫_a^b M(g(t), h(t), k(t)) · k′(t) dt
Of note, this integral can give the work W done by the vector field over the curve, or the flow of the
vector field along it. If C is a closed curve, then the flow is called the circulation around C (with positive
orientation =⇒ counterclockwise motion).
Flux is analogously given w.r.t. a normal vector n̂ to C, as below. We assume here the curve is in the
x, y plane, parameterized by r⃗(t) = g(t)ı̂ + h(t)ȷ̂, and that F⃗ = Mı̂ + Nȷ̂.

flux of F⃗ over C = ∫_C ⟨F⃗, n̂⟩ ds = ∮_C M dy − N dx
Note further that C must be traced exactly once for this to apply.
It is common to have integrals of the type
∫_C −y dx + z dy + 2x dz     C = {cos(t)ı̂ + sin(t)ȷ̂ + t k̂}_{t∈[0,2π]}
which mix up x, y, z with the other differentials. For these, apply your parameterization first and get everything
in terms of the parameter t; for instance, the parameterization above could be applied to this integral. Replace
x, y, z accordingly, then find the differentials per the formula

df = df(t) = f′(t) dt
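For the example integral above, carrying out exactly this substitution symbolically (a minimal sympy sketch):

```python
import sympy as sp

t = sp.Symbol("t")
x, y, z = sp.cos(t), sp.sin(t), t        # the helix parameterization above

# -y dx + z dy + 2x dz, with dx = x'(t) dt, etc.
integrand = (-y) * sp.diff(x, t) + z * sp.diff(y, t) + 2 * x * sp.diff(z, t)
print(sp.integrate(integrand, (t, 0, 2 * sp.pi)))   # pi
```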
Path Independence / Conservative Fields:
Recall that the line integral across two different curves starting and ending at the same points may still
differ.
If F⃗ is a field for which ∫_C F⃗ · dr⃗ is the same regardless of which path C is used (provided all such C start
and end at the same points), we say this integral is path independent and the field F⃗ is conservative.

Conservative Fields Are Gradient Fields: Let F⃗ = Mı̂ + Nȷ̂ + Pk̂ for M, N, P continuous on
a connected domain. Then, if F⃗ = ∇f for a scalar function f (and some other conditions), F⃗ is also
conservative. The converse is true. We say f is a potential function of F⃗.
Fundamental Theorem of Line Integrals: Suppose F⃗ is conservative with potential f, and the
path C starts at P and runs to Q, and is smooth, with parameterization given by r⃗(t). Then

∫_C F⃗ · dr⃗ = f(Q) − f(P)
Component Test: F⃗ = Mı̂ + Nȷ̂ + Pk̂ (with continuous first partials on an open, simply connected region) is conservative iff

∂P/∂y = ∂N/∂z     ∂M/∂z = ∂P/∂x     ∂N/∂x = ∂M/∂y

You take each pair of components, say the α and β components C_α, C_β, and then differentiate them
w.r.t. the other component's variable and see if they equate, e.g.

∂C_α/∂β ?= ∂C_β/∂α
§4.10: Parameterized Surfaces: Areas & Surface Integrals
If a surface R is implicitly defined via F(x, y, z) = c and has normal vector p⃗ ∈ {ı̂, ȷ̂, k̂}, with
⟨∇F, p⃗⟩ ≠ 0, then

surface area of R = ∬_R (∥∇F∥ / |⟨∇F, p⃗⟩|) dA
Surface Integrals:
Given a smooth surface S with parameterization S = { #» r (u, v)}(u,v)∈R , and G ∈ C(S), the surface integral
of G on S is
∬_S G(x, y, z) dσ = ∬_R G(f(u, v), g(u, v), h(u, v)) ∥∂r⃗/∂u × ∂r⃗/∂v∥ du dv

(the factor ∥∂r⃗/∂u × ∂r⃗/∂v∥ du dv being dσ), where r⃗(u, v) = f(u, v)ı̂ + g(u, v)ȷ̂ + h(u, v)k̂.
If the surface is implicitly defined, under the same circumstances as on the previous page,
∬_S G(x, y, z) dσ = ∬_R G(x, y, z) (∥∇F∥ / |⟨∇F, p⃗⟩|) dA
Surface Integral of Vector Field:
If S is oriented by n̂, we have the surface integral of F⃗ as given by

∬_S ⟨F⃗, n̂⟩ dσ = ∬_R ⟨F⃗, ∂r⃗/∂u × ∂r⃗/∂v⟩ du dv
Stokes’ Theorem:
Stokes' Theorem is a noteworthy result as well. Given S a piecewise smooth surface (with unit normal
n̂), with F⃗ having continuous first partial derivatives in each component, then

circulation of F⃗ around ∂S (CCW w.r.t. n̂) = ∮_{∂S} ⟨F⃗, dr⃗⟩ = ∬_S ⟨∇ × F⃗, n̂⟩ dσ

In the plane (with F⃗ = Mı̂ + Nȷ̂), this reduces to

∮_{∂S} ⟨F⃗, dr⃗⟩ = ∬_S (∂N/∂x − ∂M/∂y) dx dy
Divergence Theorem:
This claims, for F⃗ as in Stokes' Theorem, the flux through the surface S = ∂D (with S having unit
normal n̂) is given by

∬_{∂D} ⟨F⃗, n̂⟩ dσ = ∭_D ∇ · F⃗ dV
§5: Vector Calculus Identities (Calculus III)
Throughout, we assume V⃗ : Rⁿ → Rⁿ is a vector field (vector-valued function) and φ : Rⁿ → R is a
scalar field (scalar-valued function).
It is conventional to treat ∇ as the pseudo-vector (∂ᵢ)ᵢ₌₁ⁿ where ∂ᵢ := ∂/∂xᵢ (the derivative in the ith
coordinate).
We assume Cartesian Rⁿ coordinates here, for simplicity.
Formal Definitions:
◦ Gradient: The gradient of φ is the vector field whose dot product with any vector v⃗ (evaluated
at x⃗₀) is the directional derivative of φ along v⃗:

∇φ(x⃗₀) · v⃗ = D_{v⃗} φ(x⃗₀)
◦ Divergence: Take a volume τ with outward unit normal n̂ and surface σ ≡ ∂τ; the divergence
at x₀ is

(∇ · V⃗)|_{x₀} := lim_{μ(τ)→0} (1/μ(τ)) ∮_σ V⃗ · n̂ dσ

The integral here measures the flux of V⃗ as it leaves σ. One can also think of μ(τ) ≡ ∭_τ dτ.
◦ Curl: The curl at x₀ is given as so: take a volume τ (surface σ, outward unit normal n̂) containing
x₀, and shrink its volume to zero in the following formula:

(∇ × V⃗)|_{x₀} := lim_{μ(τ)→0} (1/μ(τ)) ∮_σ n̂ × V⃗ dσ

The cross product gives us a vector perpendicular to V⃗ that is (equivalent to) a vector tangential
to the differential surface element (which one can think of as a circle, i.e. the cross product gives a
vector in the direction of rotation, tangent to a circle and perpendicular to its radius).
Usual Definitions:
◦ Gradient: The gradient grad(φ) or ∇φ is the vector-valued function

∇φ := Σᵢ₌₁ⁿ (∂φ/∂xᵢ) ê_{xᵢ} = Σᵢ₌₁ⁿ (∂ᵢφ) êᵢ

◦ Divergence: The divergence div V⃗ or ∇ · V⃗ is the scalar-valued function

∇ · V⃗ := Σᵢ₌₁ⁿ ∂ᵢVᵢ

◦ Curl: In R³, the curl, denoted curl V⃗ or ∇ × V⃗, is easily memorized by thinking of a determinant:

∇ × V⃗ ≡ det [ x̂   ŷ   ẑ  ]
             [ ∂_x  ∂_y  ∂_z ]
             [ V_x  V_y  V_z ]
§5.2: Useful Identities
#» #»
Throughout, A, B represent vector fields and ψ, φ scalar fields.
These come in essence from Wikipedia. Some other stuff can be found in the NRL Plasma Formulary,
for instance, that you used in your space science math methods class; it is accessible online here; you should
have also downloaded a copy. Bear in mind conventions about various coordinate systems.
Linearity: The curl, gradient, and divergence are linear operators. Hence, for any α, β ∈ R, ∇(αψ + βφ) = α∇ψ + β∇φ, ∇·(αA⃗ + βB⃗) = α∇·A⃗ + β∇·B⃗, and ∇×(αA⃗ + βB⃗) = α(∇×A⃗) + β(∇×B⃗).
Gradient:
◦ ∇(ψ + φ) = ∇ψ + ∇φ
◦ ∇(ψφ) = φ∇ψ + ψ∇φ
◦ ∇(ψ/φ) = (φ∇ψ − ψ∇φ)/φ²
◦ ∇(ψA⃗) = ∇ψ ⊗ A⃗ + ψ∇A⃗
◦ ∇(A⃗ · B⃗) = (A⃗ · ∇)B⃗ + (B⃗ · ∇)A⃗ + A⃗ × (∇ × B⃗) + B⃗ × (∇ × A⃗)
Divergence:
◦ ∇ · (A⃗ + B⃗) = ∇ · A⃗ + ∇ · B⃗
◦ ∇ · (ψA⃗) = ψ∇ · A⃗ + A⃗ · ∇ψ
◦ ∇ · (A⃗ × B⃗) = (∇ × A⃗) · B⃗ − (∇ × B⃗) · A⃗
Curl:
◦ ∇ × (A⃗ + B⃗) = ∇ × A⃗ + ∇ × B⃗
◦ ∇ × (ψA⃗) = ψ(∇ × A⃗) − A⃗ × (∇ψ) = ψ(∇ × A⃗) + (∇ψ) × A⃗
◦ ∇ × (ψ∇φ) = ∇ψ × ∇φ
◦ ∇ × (A⃗ × B⃗) = A⃗(∇ · B⃗) − B⃗(∇ · A⃗) + (B⃗ · ∇)A⃗ − (A⃗ · ∇)B⃗
Material Derivatives:
◦ (A⃗ · ∇)B⃗ = (1/2)[∇(A⃗ · B⃗) − ∇ × (A⃗ × B⃗) − B⃗ × (∇ × A⃗) − A⃗ × (∇ × B⃗) − B⃗(∇ · A⃗) + A⃗(∇ · B⃗)]

◦ (A⃗ · ∇)A⃗ = (1/2)∇∥A⃗∥² − A⃗ × (∇ × A⃗) = (1/2)∇∥A⃗∥² + (∇ × A⃗) × A⃗
Second Derivatives, e.g. Laplacian: Recall we define ∇2 := ∇ · ∇ =: ∆.
◦ ∇ · (∇ × A⃗) = 0
◦ ∇ × (∇ψ) = 0⃗
◦ ∇ · (∇ψ) = ∇²ψ
◦ ∇(∇ · A⃗) − ∇ × (∇ × A⃗) = ∇²A⃗
◦ ∇ · (φ∇ψ) = φ∇²ψ + ∇φ · ∇ψ
◦ ψ∇²φ − φ∇²ψ = ∇ · (ψ∇φ − φ∇ψ)
◦ ∇²(φψ) = φ∇²ψ + 2(∇φ) · (∇ψ) + (∇²φ)ψ
◦ ∇²(ψA⃗) = A⃗∇²ψ + 2(∇ψ · ∇)A⃗ + ψ∇²A⃗
◦ Green's Vector Identity: ∇²(A⃗ · B⃗) = A⃗ · ∇²B⃗ − B⃗ · ∇²A⃗ + 2∇ · [(B⃗ · ∇)A⃗ + B⃗ × (∇ × A⃗)]
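Any of these identities can be machine-checked on a concrete field; a minimal sympy sketch for ∇ · (∇ × A⃗) = 0 (the test field A⃗ is an arbitrary choice of mine):

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
A = sp.Matrix([x**2 * y, sp.sin(y * z), sp.exp(x) * z])  # arbitrary smooth field

def curl(V):
    return sp.Matrix([sp.diff(V[2], y) - sp.diff(V[1], z),
                      sp.diff(V[0], z) - sp.diff(V[2], x),
                      sp.diff(V[1], x) - sp.diff(V[0], y)])

def div(V):
    return sp.diff(V[0], x) + sp.diff(V[1], y) + sp.diff(V[2], z)

print(sp.simplify(div(curl(A))))   # 0, as claimed
```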
Circulation-Curl/Tangential Form: This claims

∮_C F⃗ · T̂ ds = ∮_C M dx + N dy = ∬_R (∂N/∂x − ∂M/∂y) dx dy

with the left side the CCW circulation and the right-hand integrand the circulation density - the z component of curl, (curl F⃗) · k̂.

Flux-Divergence/Normal Form: Similarly,

∮_C F⃗ · n̂ ds = ∮_C M dy − N dx = ∬_R (∂M/∂x + ∂N/∂y) dx dy

with the left side the outward flux through C and the right-hand integrand the flux density, div(F⃗).
In using Green’s theorems, it’s usually a good idea to parameterize C in terms of a variable t to
make it into a calculable Riemann integral. (For instance, if C is a circle of radius r, let x = r cos t
and y = r sin t for t ∈ [0, 2π).) Then you can calculate the differentials dx, dy for the line integrals
as you would for differentials:
df = f ′ (x) dx
In the circle example, then,
dx = dx(t) = d(r cos t) = −r sin t dt
as an example. (You’ll also want to convert M, N to terms of t for the line integral, if needed.)
§5.3: Identities with the Levi-Civita Symbol
ε_{i₁,···,iₙ} := 0 if any two indices coincide, and otherwise (−1)^p, where p (parity) is the number of swaps to reach (1, 2, ···, n) if they're all distinct
§5.4: Alternative (3D) Coordinate Systems
Basic Conversions:
Conversions of System A (Left) to System B (Top), Using System B Conventions:
Conversions of System A (Left) to System B (Top), Using System A Conventions:
Differentials in Alternative Systems
Note that these differentials apply in all cases, not just as a means of conversion. That is, if you have
f (r, θ, φ) a function in spherical coordinates, and integrate it over a body E, then
∭_E f(r, θ, φ) dV = ∭_E f(r, θ, φ) · r² sin θ dr dθ dφ
in the conventions/notes below. Be mindful of various conventions of notation, especially where spherical
coordinates are concerned. For posterity, the conventions University Calculus: Early Transcendentals uses are
summed up by the picture and bullets below:

ρ is the radius
◦ Area (First Coordinate Fixed): dS = dy dz
◦ Area (Second Coordinate Fixed): dS = dx dz
◦ Area (Third Coordinate Fixed): dS = dx dy
◦ Volume: dV = dx dy dz
Important Operator Conversions:
Only some are found here. Others may be found in a table on Wikipedia, or archived here on Imgur.
Cylindrical Coordinates (r, θ, z): Think polar, extended to 3D. Take V = Vr êr + Vθ êθ + Vz êz
specifically.
◦ Gradient: ∇φ = (∂φ/∂r) ê_r + (1/r)(∂φ/∂θ) ê_θ + (∂φ/∂z) ê_z

◦ Divergence: ∇ · V = (1/r) ∂(rV_r)/∂r + (1/r) ∂V_θ/∂θ + ∂V_z/∂z

◦ Curl:

∇ × V = ((1/r) ∂V_z/∂θ − ∂V_θ/∂z) ê_r
      + (∂V_r/∂z − ∂V_z/∂r) ê_θ
      + (1/r)(∂(rV_θ)/∂r − ∂V_r/∂θ) ê_z
Spherical Coordinates (r, θ, ϕ): Take V = Vr êr + Vθ êθ + Vϕ êϕ specifically. Due to conflicting
notations in the literature, θ represents the polar angle (the angle from the z axis) and ϕ the azimuthal
angle (that in the x, y plane). (Yes, the definition of each term is confusing. No, I don’t understand
this.) Some info on the conflicting conventions is explained here.
◦ Gradient: ∇φ = (∂φ/∂r) ê_r + (1/r)(∂φ/∂θ) ê_θ + (1/(r sin θ))(∂φ/∂ϕ) ê_ϕ

◦ Divergence: ∇ · V = (1/r²) ∂(r²V_r)/∂r + (1/(r sin θ)) ∂(V_θ sin θ)/∂θ + (1/(r sin θ)) ∂V_ϕ/∂ϕ

◦ Curl:

∇ × V = (1/(r sin θ))(∂(V_ϕ sin θ)/∂θ − ∂V_θ/∂ϕ) ê_r
      + (1/r)((1/sin θ) ∂V_r/∂ϕ − ∂(rV_ϕ)/∂r) ê_θ
      + (1/r)(∂(rV_θ)/∂r − ∂V_r/∂θ) ê_ϕ
§6: Items from Ordinary Differential Equations
(More to add later...)
§7: Items from Partial Differential Equations
(More to add later...)
§8: Matrices & Linear Algebra
A vector space or linear space V over a field F has two operations of addition + : V × V → V and
scalar multiplication · : V × F → V satisfying the following:
The trivial vector space, consisting of a zero vector only: {0} (sometimes ⟨0⟩)
F[x] itself
Trivially, for any vector space V , V and the trivial space are subspaces, as is the intersection of subspaces.
The span of a set S ⊆ V is an important subspace. We let
span(S) := { Σᵢ₌₁ⁿ αᵢvᵢ : n ∈ N, αᵢ ∈ F, vᵢ ∈ S ∀i }
We say S ⊆ V is linearly dependent if ∃{xᵢ}ᵢ₌₁ⁿ ⊆ S and {αᵢ}ᵢ₌₁ⁿ ⊆ F (not all 0) such that Σᵢ αᵢxᵢ = 0.
Otherwise, if

Σᵢ₌₁ⁿ αᵢxᵢ = 0 ⟹ αᵢ = 0 for any choice of finitely-many xᵢ ∈ S
then we say S is linearly independent. (|S| ≤ 1 can be trivially/vacuously shown independent.) We say S
is a basis if it is linearly independent and span(S) = V . For V having a basis of finitely-many (say, n < ∞)
elements - and hence every basis of it does - we say that dimF V = n.
§8.2: Linear Transformations; Rank, Kernel, Nullity, etc.
(The space of all such T is denoted L(V, W ), or L(V ) if they’re the same. The dual space of V is
V ∗ := L(V, F).)
We define some important spaces and parameters for such a T :
We note:
Rank-Nullity Theorem: nullity(T) + rank(T) = dim_F ker(T) + dim_F im(T) = dim_F V (for V finite-dimensional)
ker(T ) ≤ V and im(T ) ≤ W
β := {vᵢ}ᵢ₌₁ⁿ a basis of V ⟹ im(T) = span(T(v₁), ···, T(vₙ)) (but not a basis without trivial kernel)
A matrix A is of full rank when rank(A) = min{m, n}. As it happens, the set of full rank matrices
A ∈ Cm×n is dense. (The norm used to measure this is irrelevant since all norms are equivalent on a
finite-dimensional space.)
Let T : V → W be linear
Let V have basis β := {xᵢ}ᵢ₌₁ⁿ and W have basis γ := {yᵢ}ᵢ₌₁ᵐ
Then the matrix A := (ai,j )1≤i≤m,1≤j≤n is the representation of T in the bases β, γ, and we may write
A = [T ]γβ (omitting one if β = γ). When T = I (the identity transformation), we say that [I]γβ is the change
of basis matrix for the bases of course.
§8.3: Matrix Operations & Notations
Let ai,j ∈ R for some commutative ring R (e.g. Z, R, C). The m-row, n-column matrix A formed by
these ai,j is denoted A := (ai,j )1≤i≤m,1≤j≤n . We would say that A ∈ Rm×n (or Mm×n (R)), with A being
square if m = n.
We may write GLn (R) for those n × n matrices which are invertible (general linear group).
Matrix multiplication is given by (AB)_{i,j} := A_{i,∗} B_{∗,j}, where A_{i,∗} := (a_{i,k})ₖ₌₁ⁿ ∈ R^{1×n} is the ith row of A, and B_{∗,j} := (b_{k,j})ₖ₌₁ⁿ ∈ R^{n×1} is the jth column of
B (for A ∈ R^{m×n}, B ∈ R^{n×r}). The multiplication algorithm is succinctly visualized by pairing the rows of A against the columns of B.
Hence, one may think of matrix multiplication as a map R^{m×n} × R^{n×r} → R^{m×r}. Consider the
dimensions represented, in this order:

(m × n)(n × r)

In multiplying matrices, the "inner" pair of dimensions must be equal for the two to be "compatible"
in this sense, and their product has size determined by the outer numbers.
Note that matrix multiplication is not necessarily commutative, even if the entries lie in
a field.
§8.4: Transposition & Related Notions (Hermitian, Unitary, & More)
Transpose: The transpose of A := (a_{i,j})_{1≤i≤m,1≤j≤n} is A^T := (a_{j,i})_{1≤j≤n,1≤i≤m}. That is, for
A ∈ Rm×n , we get AT ∈ Rn×m with entries swapping their rows and columns (flipping across the
diagonal, in effect, for square matrices).
Sometimes, this is denoted by A′ or AT (with no mind for the serifs).
Conjugate-Transpose: When A's entries lie in R (A := (a_{i,j})_{1≤i≤m,1≤j≤n}), this is identical to
the transpose. When some are non-real and lie in C, however, we can define the distinct conjugate
transpose, denoted A∗ and defined by

A∗ := (ā_{j,i})_{1≤j≤n,1≤i≤m}

That is, you take the complex conjugate of each entry, and then transpose (or in the other order).
Sometimes this is denoted by A^H, and sometimes known as the Hermitian transpose, transjugate,
or (confusingly) adjoint. If we let Ā denote the entry-wise operation of complex conjugation, i.e.
Ā := (ā_{i,j}), then A∗ = (Ā)^T = \overline{(A^T)}.
Some properties follow. Unless stated otherwise, even though conjugate-transposes are used, the same
considerations may be used in the real (ordinary transpose) cases.
Involution: (M∗)∗ = M
Respects Addition: (A + B)∗ = A∗ + B∗
Reverses Products: (AB)∗ = B∗A∗
Respects Scalar Multiplication: (cM)^T = cM^T (though (cM)∗ = c̄M∗)
Determinants: det(M^T) = det(M) and det(M∗) = \overline{det(M)}
Traces: trace(A^T) = trace(A) and trace(A∗) = \overline{trace(A)}
Positive Semi-Definite: For A ∈ Rⁿˣⁿ we have A^T A as positive-semidefinite.
Inverses: (M⁻¹)∗ = (M∗)⁻¹
Eigenvalues: The eigenvalues of M and M^T are the same. If λ is an eigenvalue of M, then λ̄ is one
for M∗.
§8.5: Determinants: Definitions & Notations
Introduction:
The determinant of a matrix M is denoted det(M ). Sometimes we denote it with absolute values
around the matrix itself, e.g.
M = [ 1 2 ]  ⟹  det(M) = | 1 2 |
    [ 3 4 ]              | 3 4 |

It is only defined for square matrices, and represents the volume dilation and orientation change of the linear
transformation M represents.
We proceed iteratively.
Small Determinants (1 × 1, 2 × 2, 3 × 3):
Trivially, the determinant of a 1 × 1 matrix is its sole entry: M = [a] ⟹ det(M) = a
We may calculate 2 × 2 determinants by
| a b |
| c d | = ad − bc
For 3 × 3 determinants we may use Sarrus’ rule, which uses a shoelace/Pac-Man-like pattern: extend
down-right on the main diagonals from the top row, take the products of the entries, and add them. Do the
same for the down-left anti-diagonal patterns (or up-right from the bottom row as pictured). Find the first
value minus the second value.
det(A) := Σ_{σ∈Sₙ} sign(σ) Πᵢ₌₁ⁿ a_{i,σ(i)}

where sign(σ) = (−1)^p, with p the number of swaps σ needs to return to the identity (1, 2, 3, ···, n).
Hence, one may utilize the Levi-Civita symbol to write

det(A) := Σ_{i₁,···,iₙ=1}ⁿ ε_{i₁,i₂,···,iₙ} · Πⱼ₌₁ⁿ a_{j,iⱼ}
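The Leibniz formula translates directly into (very inefficient, O(n·n!)) code; a minimal sketch:

```python
from itertools import permutations
import math

def sign(perm):
    # (-1)^p via counting inversions
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det(A):
    n = len(A)
    return sum(sign(p) * math.prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det([[1, 2], [3, 4]]))   # -2 = 1*4 - 2*3
```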
§8.6: Determinants: Adjugates/Adjoints & Cofactor Matrices
We may define a cofactor matrix . There seems to be no standard notation, so let Mc be that for M .
Then

(M_c)_{1≤i,j≤n} := ((−1)^{i+j} M_{i,j})_{1≤i,j≤n}

where M_{i,j} is the smaller determinant from the Laplace expansion, generated by the entry m_{i,j} in M.
We can also define the adjugate matrix (or classical adjoint) as the transpose of the cofactor matrix.
That is, since the cofactors are (−1)^{i+j} M_{i,j}, we have adj(M) = (M_c)^T.
adj(I) = I, and adj(0) = 0 for dimensions > 1. For dimension 1 × 1, then adj(0) = I.
adj(AB) = adj(B) adj(A) and hence adj(M k ) = adj(M )k (working for negative k on M invertible)
§8.7: Determinants: Properties
◦ Any choice of expansion for row or column yields the same result.
◦ If an entire row or column is all zeroes, then det(M ) = 0.
Moreover, if two rows or two columns are identical, then det(M ) = 0.
In fact, if the rows or columns are not linearly independent (you can write one row as a linear
combination of the others), then det(M ) = 0.
◦ det(In ) = 1 (for In the n × n identity matrix)
Row Operations: Let M be the starting matrix, and M ′ the matrix after the operation.
Matrix Operations:
◦ Adjugates/Adjoint: Recall that, where the minors (smaller determinants) are M_{i,j} (generated
by entry m_{i,j} in M), the cofactor matrix satisfies det(M_c) = det(adj(M)) = det(M)^{n−1}.
◦ Items on Block Matrices: Take A, B, C, D of dimensions n × n, n × m, m × n, and m × m
respectively. Then:
det [ A 0 ] = det(A) det(D) = det [ A B ]
    [ C D ]                       [ 0 D ]

For A invertible, det [ A B ] = det(A) det(D − CA⁻¹B)
                      [ C D ]

When m = n and CD = DC, then det [ A B ] = det(AD − BC)
                                 [ C D ]

When m = n and A = D and B = C, then det [ A B ] = det(A − B) det(A + B)
                                         [ B A ]
◦ Positive Semi-Definite Matrices: We say M is positive semi-definite if xT M x ≥ 0 for
each vector x compatible with M . (If working within C, we use x∗ , the complex-conjugate-
then-transpose, instead.) Positive-definite requires > 0 strictly; negative notions are defined
analogously.
Consider A, B, C positive semi-definite of the same size. Then
det(A + B + C) + det(C) ≥ det(A + C) + det(B + C)
det(A + B) ≥ det(A) + det(B) (corollary of above, 0 is PSD)
For A, B Hermitian (A = A∗, etc.) and positive definite of size n, then

ⁿ√(det(A + B)) ≥ ⁿ√(det(A)) + ⁿ√(det(B))
◦ Vandermonde Matrices: A (square) Vandermonde matrix takes the form

V = [ 1  x₁  x₁²  x₁³  ···  x₁ⁿ⁻¹ ]
    [ 1  x₂  x₂²  x₂³  ···  x₂ⁿ⁻¹ ]
    [ 1  x₃  x₃²  x₃³  ···  x₃ⁿ⁻¹ ]
    [ ⋮   ⋮    ⋮    ⋮    ⋱    ⋮   ]
    [ 1  xₙ  xₙ²  xₙ³  ···  xₙⁿ⁻¹ ]

We have that

det(V) = Π_{1≤i<j≤n} (xⱼ − xᵢ)
Miscellaneous Properties:
◦ Derivative Identities (Jacobi's Formula): If M depends on x, then

d det(M)/dx = trace( adj(M) · dM/dx )
Corollary: If m = n, then AB, BA have the same characteristic polynomials and eigenvalues.
◦ Trace: Recall: trace(M ) is the sum of M ’s diagonal entries. We have that
det(exp(M )) = exp(trace(M ))
§8.8: Similarity & Properties Thereof
We are often interested in writing a matrix A in terms of an invertible matrix Q and a matrix B as
A = QBQ−1
(See, for instance, eigendecomposition.) In such a case, we say A, B are similar matrices, and can say
they represent the same linear transformation w.r.t. (possibly) different bases.
This notion induces an equivalence relation.
If two matrices are similar, the following properties are shared:
Characteristic polynomial (hence, determinant, trace, eigenvalues & their multiplicities of both types)
Minimal polynomial
Index of nilpotence
Elementary divisors
§8.9: Eigenstuff (-values, -vectors, -pairs, -spaces, -decomposition...)
Av = λv =⇒ Av − λv = 0
=⇒ (A − λI)v = 0
=⇒ v = 0 or, more importantly, det(A − λI) = 0
The polynomial pA (λ) := det(A − λI) is called the characteristic polynomial of A, and its roots are the
eigenvalues of A. Of course, they need not be distinct, leading to further concepts:
Algebraic Multiplicity: The algebraic multiplicity µA (λ) of the eigenvalue λ of A is simply its
multiplicity as a root of pA . We say λ is a simple eigenvalue if µA (λ) = 1.
Geometric Multiplicity: We define the eigenspace of a fixed eigenvalue λ by E_λ := ker(A − λI).
Note that this contains all scalar multiples of any eigenvector of λ. (Moreover, E_λ ≤ dom(A).) The geometric
multiplicity of λ is given by γ_A(λ) := dim E_λ.
Further Properties & Results of Note:
Eigendecomposition: Let (λi , vi ) be the eigenpairs of A, with eigenvectors linearly independent (not
necessarily eigenvalues). Then let P = [v1 | v2 | · · · | vn ] and D = diag(λ1 , · · ·, λn ). Then A = P DP −1 .
This is the eigendecomposition of A. (Note: We say M, N are similar matrices if ∃P invertible
such that M = P N P −1 . Hence, the eigendecomposition diagonalizes A and shows it is similar to a
diagonal matrix, sort of representing the same transformation in different bases.)
If an eigendecomposition does not exist, the matrix is said to be defective and we may appeal to
generalized eigenvectors and the Jordan normal form.
Determinant: For A with σ(A) := {λᵢ}ᵢ₌₁ⁿ (including repetition), we have det(A) = Πᵢ λᵢ.

Trace: For A with σ(A) := {λᵢ}ᵢ₌₁ⁿ (including repetition), we have trace(A) = Σᵢ λᵢ.
A is invertible iff 0 ∉ σ(A) (all eigenvalues are nonzero). A⁻¹ then has eigenvalues 1/λᵢ.
If each column sums to λ, or each row sums to λ, then λ is an eigenvalue of that matrix. (The all-ones
vector will be an eigenvector for A or AT .)
Eigenvector-Eigenvalue Identity: For A Hermitian with unit eigenvectors vᵢ, and A_j denoting A with its jth row & column deleted,

|v_{i,j}|² = p_{A_j}(λᵢ) / p′_A(λᵢ)
Spectral Radius:
The spectral radius ρ(A) for A ∈ Cⁿˣⁿ is given as the largest eigenvalue by magnitude: ρ(A) := max_{λ∈σ(A)} |λ|.
For any matrix norm induced by one on vectors, ρ(A) ≤ ∥A∥. Moreover, we have Gelfand's formula,
stating

ρ(A) = lim_{k→∞} ∥Aᵏ∥^{1/k}

ρ(A) < 1 iff Aᵏ → 0 as k → ∞, and ρ(A) > 1 iff ∥Aᵏ∥ → ∞, for any matrix norm.
§8.10: Matrix Norms, Equivalence, Inequalities, & Related Notions
Recall: a norm on any F-vector space V is a function ∥·∥ : V → R≥0 such that
We may assign these to spaces of matrices as well. We may, for instance, discuss a matrix norm induced
by a vector norm. Given A ∈ Fm×n and norms ∥·∥(m) and ∥·∥(n) on Fm , Fn respectively, define the induced
matrix norm (on A by these norms) by
∥A∥_{(m,n)} := inf{ C ≥ 0 : ∥Ax∥_{(m)} ≤ C∥x∥_{(n)} for all x ∈ Cⁿ }
            ≡ sup_{x∈Cⁿ, x≠0} ∥Ax∥_{(m)}/∥x∥_{(n)} ≡ sup_{x∈Cⁿ, ∥x∥_{(n)}=1} ∥Ax∥_{(m)}
Some notes:
The induced ∞-norm is given by (for A ∈ C^{m×n}) the maximum row sum:

∥A∥_∞ ≡ max_{1≤i≤m} Σⱼ₌₁ⁿ |a_{i,j}|
The induced 2-norm is the spectral norm. We have that

∥A∥₂ = √(λ_max(A∗A)) ≡ σ_max(A)

where λ_max(M) := max_{λ∈σ(M)} λ and σ_max(M) := the largest singular value of M (in the SVD sense).
Related notions:
We say the matrix norm is compatible with the vector norm in the case both norms are equal (and
hence m = n).
All induced norms are definitionally consistent, and submultiplicative norms in general induce com-
patible vector norms.
Matrix norms need not be induced. For instance, one has a class of entry-wise matrix norms, in the same
spirit as the vector norms. (One may think of A ∈ Fm×n as instead a vector in Fmn to motivate these.) For
instance, for p ∈ [1, ∞), one may define
∥A∥_p := ( Σ_{i,j} |a_{i,j}|ᵖ )^{1/p}
Notably, the case p = 2 gives the Frobenius norm or Hilbert-Schmidt norm, ∥A∥F , which satisfies
∥A∥_F = √( Σ_{i,j} |a_{i,j}|² ) = √(trace(A∗A)) = √(trace(AA∗)) = √( Σᵢ σᵢ²(A) )
for σi (A) the singular values of A. (Much like the 2-norm, it is invariant under multiplication by unitary
matrices or those with orthonormal columns or rows.) This final identity also relates the norm to the
Schatten 2-norm (Wikipedia).
More generally, for p, q ∈ [1, ∞), one has the Lp,q norm
∥A∥_{p,q} := ( Σⱼ₌₁ⁿ ( Σᵢ₌₁ᵐ |a_{i,j}|ᵖ )^{q/p} )^{1/q}
Norm Equivalence & p-norm Inequalities:
We say two norms ∥·∥_α, ∥·∥_β (in general, on an F-vector space V) are equivalent if ∃ c, C > 0 with
c∥·∥_α ≤ ∥·∥_β ≤ C∥·∥_α (hence inducing the same topology). All norms on finite-dimensional spaces are equivalent.
For matrices A ∈ Rm×n , of rank r, we have the inequalities below. Subscripts refer to induced p-norms.
The max norm is given by ∥A∥_max := max_{i,j} |a_{i,j}|.

∥A∥₂ ≤ ∥A∥_F ≤ √r ∥A∥₂
∥A∥_F ≤ ∥A∥_∗ ≤ √r ∥A∥_F (the middle being the Schatten 1-norm)
∥A∥_max ≤ ∥A∥₂ ≤ √(mn) ∥A∥_max
∥A∥_∞ ≤ √n ∥A∥₂ ≤ √(mn) ∥A∥_∞
∥A∥₁ ≤ √m ∥A∥₂ ≤ √(mn) ∥A∥₁
∥A∥₂ ≤ √(∥A∥₁ ∥A∥_∞)
§8.11: Bessel’s Inequality
Let {ek }k∈N (or a finite sequence) be orthonormal in a Hilbert space H. Then
Σₖ₌₁^∞ |⟨x, eₖ⟩_H|² ≤ ∥x∥²_H
§8.12: Gram-Schmidt Orthonormalization Process
(Wikipedia article)
Essentially, what happens is that aᵢ is projected orthogonally onto span{aⱼ}ⱼ₌₁^{i−1} = span{qⱼ}ⱼ₌₁^{i−1}. The difference
between the original vector and its projection (of the type v − Pv) is thus orthogonal to that space; we
normalize from there if desired.
Expressed as pseudo-code,
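(A minimal Python rendering of my own; the modified flag switches between the classical and modified variants:)

```python
import numpy as np

def gram_schmidt(A, modified=True):
    """Orthonormalize the columns of A; returns Q with Q.T @ Q = I."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    for i in range(n):
        v = A[:, i].copy()
        for j in range(i):
            # CGS projects the *original* column; MGS re-projects the running v
            u = v if modified else A[:, i]
            v = v - (Q[:, j] @ u) * Q[:, j]
        Q[:, i] = v / np.linalg.norm(v)
    return Q

Q = gram_schmidt(np.random.default_rng(2).normal(size=(6, 4)))
assert np.allclose(Q.T @ Q, np.eye(4))
```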
Comparison:
Side by side, the GS algorithms applied to {xᵢ}ᵢ₌₁ᵏ to generate {yᵢ}ᵢ₌₁ᵏ look like this:
Notice that MGS modifies all of the vectors in turn, and subtracts out the higher-indexed directions.
§8.13: (Moore-Penrose) Pseudoinverse
(Wikipedia article.)
This exists for any A. When A is of full column rank (linearly independent columns; A∗A invertible), we have
A⁺ = (A∗A)⁻¹A∗ (a left inverse); when A is of full row rank,
A⁺ = A∗(AA∗)⁻¹ (a right inverse).
(A+ )+ = A (involution)
Commutes with transpose, conjugation, and conjugate-transpose:
Scalar multiples invert: if α ≠ 0, then (αA)⁺ = (1/α) A⁺
If A∗ A = I, then A+ = A∗ .
Some equalities:
A+ = A+ (A+ )∗ A∗ = A∗ (A+ )∗ A+
A∗ = A∗ AA+ = A+ AA∗
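A minimal numerical check of the full-column-rank formula against numpy's SVD-based pinv:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))                # full column rank (a.s.)

P1 = np.linalg.inv(A.T @ A) @ A.T          # (A*A)^{-1} A*
P2 = np.linalg.pinv(A)                     # SVD-based pseudoinverse
assert np.allclose(P1, P2)
assert np.allclose(A @ P2 @ A, A)          # one of the Penrose conditions
```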
§8.14: Characteristic Polynomials
(The two common definitions are P_A(λ) := det(A − λI) and P_A(λ) := det(λI − A).) Note that P_A ∈ F[x] is of degree n, and (with the latter definition) is always monic. (The pair differ by a
sign of (−1)ⁿ, which means the first definition is monic iff n is even.)
Some notes:
PA has constant term det(A) under the first definition, or det(−A) = (−1)n det(A) under the second.
P_A has −trace(A) as the coefficient of λⁿ⁻¹ under the second definition, or (−1)ⁿ⁻¹ trace(A) under the
first.
If A = P BP −1 for some P , then PA ≡ PB .
PA ≡ PAT
For A ∈ F^{m×n}, B ∈ F^{n×m}, then AB ∈ F^{m×m} and BA ∈ F^{n×n}, giving us P_{BA}(λ) = λ^{n−m} P_{AB}(λ)
Cayley-Hamilton Theorem: PA (A) = 0n×n as a matrix. Note that you can’t just plug it into the
determinant form owing to this result (a naive but wrong proof).
For f ∈ F[x], the eigenvalues of f(A) are exactly f(λ) for λ ∈ σ(A) (spectral mapping), so P_{f(A)}(λ) = Πᵢ (λ − f(λᵢ)).
The Faddeev-LeVerrier algorithm can calculate the coefficients of P_A(x) := Σₖ cₖxᵏ, by the rule

c_{n−m} = ((−1)ᵐ/m!) det [ tr A      m−1       0      ···        ]
                         [ tr A²     tr A      m−2    ···        ]
                         [   ⋮         ⋮         ⋱      ⋮        ]
                         [ tr Aᵐ⁻¹   tr Aᵐ⁻²   ···    ···   1    ]
                         [ tr Aᵐ     tr Aᵐ⁻¹   ···    ···  tr A  ]
§8.15: Minimal Polynomials
µA is monic
µA (A) = 0
µA is of minimal degree to satisfy this condition
If P ∈ F[x] has P (A) = 0, then µA | P
To compute µA , we note:
for all M ≥ m.
Primary Decomposition Theorem: Relatedly, suppose

µ_A(x) = Πᵢ (x − λᵢ)^{mᵢ}

where A ∈ F^{n×n} and hence has domain F^{n×1} (in the sense of a function). We also suppose the
eigenvalues are distinct. Then we have that

F^{n×1} = ⊕ᵢ ker((A − λᵢI)^{mᵢ})

where each kernel is invariant under A (in the sense Ak ∈ K for each k ∈ K and K one of the prescribed
kernels).
Note that ker(A − λI) is an eigenspace; hence, A’s domain is a direct sum of eigenspaces iff mi = 1 for
all i.
In turn, µA encapsulates how much we need to enlarge the eigenspaces to generalized eigenspaces to
form the prescribed direct sum.
Relatedly, A is diagonalizable iff µ_A has multiplicity 1 for each eigenvalue, i.e.

µ_A(x) = Πᵢ (x − λᵢ)
§8.16: The Cayley-Hamilton Theorem
§8.17: The Power Method / Power Iteration / von Mises Iteration
Iterate bₖ₊₁ := Abₖ/∥Abₖ∥; for A with a dominant eigenvalue and suitable b₀, the bₖ converge in direction to the dominant eigenvector v, and the eigenvalue is recovered from the Rayleigh quotient:

λ = (vᵀAᵀv)/(vᵀv) = ((Av)ᵀv)/∥v∥² = ⟨Av, v⟩/⟨v, v⟩
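A minimal sketch of the iteration (the 2×2 test matrix is an arbitrary choice of mine):

```python
import numpy as np

def power_iteration(A, iters=500, seed=0):
    """Dominant eigenpair of A via power iteration + Rayleigh quotient."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    lam = (v @ A @ v) / (v @ v)   # the Rayleigh quotient above
    return lam, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam, v = power_iteration(A)
print(lam)   # ~= (5 + sqrt(5))/2 ~= 3.618, the largest eigenvalue
```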
§8.18: Definiteness: Positive & Negative (Semi-)Definite
Definitions:
For M ∈ Fⁿˣⁿ which is symmetric (F = R, and thus x∗ = xᵀ) or Hermitian (F = C): M is positive definite (PD) if x∗Mx > 0 for all x ≠ 0, and positive semi-definite (PSD) if x∗Mx ≥ 0 for all x; negative (semi-)definite (ND/NSD) reverses the inequalities.
Some shorthands:
M ≼ 0 means M is NSD
M ≺ 0 means M is ND
M ≽ 0 means M is PSD
M ≻ 0 means M is PD
Results of Note:
A ∈ GLₙ(R) ⟹ AᵀA is PD: xᵀ(AᵀA)x = (Ax)ᵀAx = ∥Ax∥², positive as A is invertible and x ≠ 0
PSD matrices have a decomposition M = B ∗ B (and the converse applies). M is PD iff B is invertible.
This decomposition need not be unique.
PSD M have real, nonnegative diagonals and hence trace(M ) ≥ 0.
§8.19: Dual Spaces; Adjoints
The algebraic dual (denoted X # , X ′ , X ∨ , or X ∗ ) is given by L(X, F), i.e. all linear functionals
φ : X → F.
The continuous/topological dual (denoted X′) is the subset of L(X, F) which consists of continuous
functionals, i.e. L(X, F) ∩ C(X, F).
§8.20: Various Matrix Decompositions
§8.20.2: Singular Value Decomposition (SVD)
Assume for now that rank(A) = n for A ∈ Cm×n and m ≥ n (full rank).
The image of the unit sphere under a linear transformation A is a hyperellipse. We may use this fact to
motivate the SVD as so:
We rotate space so some directions vi align with the standard basis (WLOG, vi have unit 2-norm)
We rotate space again so the standard basis aligns with some new directions ui (unit 2-norm)
The vectors {σi ui } are the principal semiaxes of the ellipse and have lengths σi .
The {ui } are the left singular vectors of A (we generally have min{m, n}-many of them)
If we write the vectors columnwise and form a diagonal matrix of the singular values, we have
or compactly, AV = ÛΣ̂, with

A ∈ C^{m×n}
V ∈ C^{n×n} unitary (columns the vᵢ)
Û ∈ C^{m×n} with orthonormal columns (the uᵢ)
Σ̂ ∈ C^{n×n} diagonal
The unitary property thus lets us write the reduced SVD of A:
A = ÛΣ̂V∗ = ÛΣ̂V⁻¹

(the latter form being more intuitive geometrically).
Note that Û has n orthonormal vector-columns in Cm , and hence (unless m = n) they are not a basis.
However, if we append m − n orthonormal columns to it, Û is extended to a unitary matrix U. Doing so
requires Σ̂ to be changed, making it m × n with the appending of extra (m − n) rows of only zeroes.
This yields the full SVD of A:
A = U ΣV ∗ = U ΣV −1
wherein
A ∈ Cm×n
U ∈ Cm×m and unitary
Σ ∈ Cm×n (same size as A), with singular values on the “main diagonal” (top left going down right)
and zeroes elsewhere
V ∈ Cn×n and unitary
Note that if A is rank-deficient (not full rank, i.e. r := rank(A) < min{m, n}), the factorization applies
even still. We just append m−r (not m−n) orthonormal vectors to Û instead, and append n−r orthonormal
vectors to V . Σ will have r positive entries on the diagonal, and n − r are zero.
The reduced SVD of a rank-deficient A may accordingly take either shape:

Û ∈ C^{m×n}, Σ̂ ∈ C^{n×n} with some zeroes on the diagonal; or
Û ∈ C^{m×r}, Σ̂ ∈ C^{r×r} with no zeroes on the diagonal
A final reminder on the geometric interpretation, though strictly speaking unitary matrices can also cause
reflections:
Every matrix has an SVD; with the restriction that the singular values are nonincreasing in order, Σ
becomes unique (though U, V need not be).
Consequences of A’s determinant:
◦ If det(A) > 0, then U, V can both be rotations with reflections, or rotations without reflections.
◦ If det(A) < 0, exactly one of U, V constitute a reflection.
◦ If det(A) = 0, no restriction exists.
Properties from the SVD: Take A ∈ Cm×n , with p := min{m, n} and r the number of singular
values of A (that are nonzero). Then some properties centered around the SVD or easily derived from
it:
◦ For 0 ≤ ν ≤ r, define the νth partial sum (Bau, Thm. 5.8)

A_ν := Σⱼ₌₁^ν σⱼ uⱼ vⱼ∗

which satisfies ∥A − A_ν∥₂ = σ_{ν+1}: A_ν is a best rank-ν approximation of A in the 2-norm.
Notes & Results Useful for Computation:
We use M = U ΣV ∗ as its SVD.
Mv = σu     M∗u = σv
u is a left-singular vector and v is a right singular vector . We say a singular value with at least
two linearly independent vectors of a type is degenerate (and the span of those two also constitute
singular vectors).
The SVD of M satisfies the following relations (immediate from the unitary nature of U, V ):
M ∗ M = V (Σ∗ Σ)V ∗
M M ∗ = U (ΣΣ∗ )U ∗
with the RHS’s giving both SVDs and eigendecompositions for M M ∗ , M ∗ M . Hence,
Find MM∗, M∗M
Find their eigenvalues λᵢ; then the √λᵢ make the singular values
Find the eigenvectors of MM∗; call them uᵢ (corresponding to λᵢ) and form U = [u₁ | ··· | u_r]
Then M = UΣV∗ ⟹ U∗M = ΣV∗, from which V may be read off
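A minimal numeric sketch of this recipe (the test matrix is my choice; numpy's svd serves as the reference):

```python
import numpy as np

M = np.array([[3.0, 0.0], [4.0, 5.0]])

# Singular values as sqrt of the eigenvalues of M* M, per the recipe above
lam = np.linalg.eigvalsh(M.T @ M)     # ascending eigenvalues: [5, 45]
print(np.sqrt(lam[::-1]))             # descending: [6.708..., 2.236...]

U, S, Vt = np.linalg.svd(M)
print(S)                              # same values
```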
§8.20.3: QR Factorization
Goal: To write any A ∈ Cm×m in the form A = QR for Q a unitary matrix (Q∗ = Q−1 ) and R an
upper-triangular matrix. Such a factorization exists (albeit in several different forms) for any such A.
In short, the way we do this is as so:
We define ⟨v₁, ···, vₙ⟩ := span{vᵢ}ᵢ₌₁ⁿ

QR factorization has the focus of orthogonalizing the range of A ∈ C^{m×n}. Specifically, we want to get
{qᵢ}ᵢ₌₁ⁿ ⊆ Cᵐ such that

⟨q₁, ···, qⱼ⟩ = ⟨a₁, ···, aⱼ⟩ for each j = 1, ···, n
or as a set of equations,
a₁ = r₁,₁ q₁
a₂ = r₁,₂ q₁ + r₂,₂ q₂
⋮
aₙ = r₁,ₙ q₁ + r₂,ₙ q₂ + ··· + rₙ,ₙ qₙ = Σᵢ₌₁ⁿ rᵢ,ₙ qᵢ
Note that A = Q̂R̂ is a reduced QR factorization; the full QR factorization A = QR is formed as
so: for m ≥ n,
Add m − n orthonormal columns to Q̂ to get Q ∈ C^{m×m} unitary. (The extra columns {qⱼ}ⱼ₌ₙ₊₁ᵐ are
a basis of range(A)^⊥ = ker(A∗).)
Add equally many all-zero rows to R̂ to get R ∈ C^{m×n} (upper triangular, of a sort)
§8.20.4: Householder Triangularization & QR Stuff
The Householder method instead applies elementary unitary transformations Qk on A’s left:
We see that
Fₖ = I − 2 (vₖvₖ∗)/(vₖ∗vₖ) = I − 2 (vₖvₖ∗)/∥vₖ∥₂²

where xₖ ∈ C^{m−k+1} is taken from A's kth column, as the entries in rows k to m (xₖ := A_{k:m,k}), and then

vₖ := sign(x₁) · ∥xₖ∥₂ · e₁ + xₖ

Note that Fₖ = I − 2P for a certain projection P, and reflects space about the hyperplane through the origin
and perpendicular to vₖ. Fₖ is a full-rank, unitary matrix.
In pseudocode, then, an implicit QR factorization of A may be constructed:
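A minimal Python sketch of that construction (returning R and the reflector vectors vₖ, per the formulas above):

```python
import numpy as np

def householder_qr(A):
    """Implicit QR: overwrite A with R, collecting the reflector vectors v_k."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    vs = []
    for k in range(n):
        x = A[k:, k].copy()
        v = x.copy()
        v[0] += np.sign(x[0]) * np.linalg.norm(x)       # v_k = sign(x_1)||x|| e_1 + x
        v /= np.linalg.norm(v)
        vs.append(v)
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])   # apply F_k to the submatrix
    return A, vs

R, vs = householder_qr(np.random.default_rng(3).normal(size=(5, 3)))
assert np.allclose(np.tril(R, -1), 0, atol=1e-12)       # R is upper triangular
```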
It being implicit is no issue, since we may still find Q∗ b and Qx easily:
The latter algorithm applied to Qek for each k can reconstruct Q explicitly if need be.
Some properties of Householder matrices:
§8.20.5: Hessenberg Matrices
Hessenberg matrices arise as a class of “almost triangular” matrix, e.g. in real Schur decompositions.
Let A := (ai,j )ni,j=1 ∈ Fn×n .
A is upper Hessenberg if it is “almost upper triangular”, in that the first subdiagonal may have
nonzero entries. Examples:
[ 1 2 3 4 ]    [ 1 1 1 1 ]    [ × × × × ]
[ 2 3 4 −3 ]   [ 2 2 2 0 ]    [ × × × × ]
[ 0 3 1 2 ]    [ 0 3 4 5 ]    [ 0 × × × ]
[ 0 0 2 2 ]    [ 0 0 0 7 ]    [ 0 0 × × ]
A is lower Hessenberg if it is “almost lower triangular”: the first superdiagonal may have nonzero
entries. Examples:
[ 1  0  0  0 ]    [ 4  0 0 0 ]    [ × × 0 0 ]
[ 3  3  0  0 ]    [ 4  4 1 0 ]    [ × × × 0 ]
[ 4  4  5  0 ]    [ 1  2 3 7 ]    [ × × × × ]
[ −2 3 31 12 ]    [ 0 42 6 7 ]    [ × × × × ]
A few notes:
A matrix which has all nonzero entries on the critical super-/sub-diagonal is said to be unreduced .
One may construct the Hessenberg form of any square matrix A (over R, C) as so:
Let A′ ∈ C^{(n−1)×n} be A sans its first row; a′₁ the first column of A′. Take V₁ to be a Householder
reflector sending a′₁ to a multiple of e₁. (What is x?)
Then the block matrix U₁ = blockdiag(1, V₁) eliminates the necessary entries in A's first column.
Repeating on successive columns gives U = U_{n−2} ··· U₁; then H = UAU∗.
§8.20.6: Cholesky Factorization
(Wikipedia article.)
The Cholesky decomposition applies to A which are Hermitian (A = A∗ ) and positive-definite (x∗ Ax > 0
for each vector x). It writes it in the form
A = LL∗
where L is lower-triangular, with positive real diagonal entries (ℓᵢ,ᵢ ∈ R_{>0}). If a matrix has such a
decomposition, it is unique; moreover, the existence of such a decomposition ensures the matrix is Hermitian and
positive-definite.
Note that, given any invertible matrix A, A∗A satisfies the conditions of Hermitian positive-definite (for
general A, A∗A is still Hermitian PSD).
One may extend this to PSD matrices (x∗ Ax ≥ 0), but the decomposition may not be unique and
ℓi,i ∈ R≥0 . It is unique with that stipulation that if rank(A) = r, then L has r positive entries on the
diagonal, with n − r columns of all zeroes.
One also has the LDL or LDLT or “square-root-free Cholesky” decomposition, which writes
A = LDL∗
Here, L is further a unit lower triangular matrix (so ℓi,i = 1 for all i), and D is a diagonal matrix. If
A = MM∗ is the classical Cholesky decomposition and S = diag(M), these are related by the rule

L = MS⁻¹     D = S²
D is a positive matrix when A is positive-definite. If A is PSD, then D has rank(A)-many nonzero entries
on its diagonal.
Computation is trivial. Write the multiplication A = LL∗ (or whichever) for explicitly labeled entries,
and solve the resulting equations (this is the Cholesky-Banachiewicz or Cholesky-Crout algorithm).
This can be done column by column. The equations that result are
ℓⱼ,ⱼ = ±√( aⱼ,ⱼ − Σₖ₌₁^{j−1} |ℓⱼ,ₖ|² )   (take the positive root for the unique PD factorization)

(for i > j:)   ℓᵢ,ⱼ = (1/ℓⱼ,ⱼ) ( aᵢ,ⱼ − Σₖ₌₁^{j−1} ℓᵢ,ₖ ℓ̄ⱼ,ₖ )
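A minimal sketch of the Cholesky-Banachiewicz loop for real SPD input, following exactly those equations:

```python
import numpy as np

def cholesky(A):
    """Cholesky-Banachiewicz: A = L L^T for real SPD A."""
    n = len(A)
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            s = A[i, j] - L[i, :j] @ L[j, :j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = cholesky(A)
assert np.allclose(L @ L.T, A)
```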
§8.20.7: Schur Decomposition
Overview:
Given A ∈ Fn×n , there is a factorization
A = QT Q∗
with Q unitary and T upper triangular (over C; a real A may require complex Q, T, or else the real, quasi-triangular Schur form).
Note that this means that every matrix is similar to a triangular matrix, using unitary ones as its other
factors; that is, every square matrix is unitarily equivalent to a triangular matrix.
It is often computationally used to further decompose T as

T = N + D

with D = diag(λ₁, ···, λₙ) carrying the eigenvalues and N strictly upper triangular (hence nilpotent).
Computation:
Typically, computation is uninteresting, and most are just concerned with its existence.
Computing the Schur decomposition is analogous to the QR factorization.
Fill in any needed column vectors if needed to get a total of n vectors (for A ∈ F^{n×n}); orthonormalize and take them as the columns of Q, whence

T = Q∗AQ
§9: Notes from Self-Studying Linear Algebra & Numerical Analysis
§9.1: (Trefethen & Bau) Lecture 1: Matrix Multiplication, Rank, Inverses, etc.
Basic Assumptions:
Important notations:
For a matrix A, we have its entries as ai,j . (Lowercase, two indices: row, column respectively.)
For a matrix A, we have its columns as aj . (Lowercase, one index. Bad, but for consistency...)
The map x 7→ Ax is linear as a map Cn → Cm . Conversely, each such linear map is representable by a
matrix.
An important change in perspective: we may write

b = Ax = Σⱼ₌₁ⁿ xⱼ aⱼ

a linear combination of the columns aⱼ of A with scalar coefficients xⱼ.
This perspective may be generalized to matrix products: for A ∈ C^{ℓ×m}, B ∈ C^{ℓ×n}, C ∈ C^{m×n},

B = AC ⟺ b_{i,j} = Σₖ₌₁ᵐ a_{i,k} · c_{k,j} for each fixed i, j
       ⟺ bⱼ = Acⱼ = Σₖ₌₁ᵐ c_{k,j} · aₖ
Important Sets & Parameters: Rank, Range, Null Space
Range: The range of A ∈ C^{m×n} (analogous to a map Cⁿ → Cᵐ, mind) is like that of a function:

range(A) := {y ∈ Cᵐ | ∃x ∈ Cⁿ such that Ax = y}
          = (the space spanned by the columns of A, by previous discussion) (Bau, Thm. 1.1)
          = the column space of A
Rank: There are notions of row rank and column rank, the dimensions of the row space and column
space respectively, i.e. the spaces spanned by the rows (or columns) of A.
One may prove that these are always the same using SVD or other means, and hence may just speak
of the rank:

rank(A) ≡ dim_C(the column space of A) ≡ dim_C(the row space of A) ≡ dim_C range(A)
We say A ∈ C^{m×n} is of full rank if rank(A) = min{m, n} (the maximum possible rank). Hence,
if m ≥ n and A is of full rank, then it has n linearly independent columns, and the mapping x ↦ Ax is
injective.
Conversely, x 7→ Ax is injective iff A is full rank. (Bau, Thm. 1.2)
Invertible Matrices:
A square matrix A ∈ Cm×m is said to be non-singular or invertible under any of several equivalent
conditions: (Bau, Thm. 1.3)
A has an (unique) inverse matrix A−1 , i.e. one such that AA−1 = A−1 A = I
rank(A) = m (full rank)
range(A) = Cm (the columns or rows of A form a basis of Cm )
ker(A) = {0} (trivial kernel, hence injective)
0 is not an eigenvalue of A (0 ̸∈ σ(A))
0 is not a singular value of A
det(A) ̸= 0
When thinking of the multiplication A−1 b, we may think of
that vector being the unique solution to Ax = b
x gives the coefficients of the linear combination of columns of A that forms b (since A’s columns form
a basis of Cm when invertible, then b can be written in terms of those columns, and x encodes those
coefficients)
§9.2: (Trefethen & Bau) Notes on Exercises 1.1, 3.4 (Multiplication to Change
a Matrix)
(1) double column 1
(2) halve row 3
(3) add row 3 to row 1
(4) interchange columns 1 and 4
(5) subtract row 2 from each of the other rows
(6) replace column 4 by column 3
(7) delete column 1 (so that the column dimension is reduced by 1)

Then we are asked to:

(a) Write the result as a product of 8 matrices
(b) Write it again as a product ABC (same B) of three matrices
The key to this is to interpret the multiplication BM = A as M acting on B. The result matrix has
its kth column as being a linear combination of B’s columns, as determined by the coefficients in the kth
column of the actor matrix M . For instance, consider
[ 1 2 3 ]   [ 11 12 13 ]
[ 4 5 6 ] · [ 14 15 16 ] = A     (left factor =: B, right factor =: M)
[ 7 8 9 ]   [ 17 18 19 ]
First column, given by 11 times the first column of B, plus 14 times its second column, plus 17 times
its third column
Second column, given by 12b1 + 15b2 + 18b3
Note that column bk may be preserved by letting mk = ek , the unit vector. This interpretation may be
dually done on the rows of M . Let m(r) denote M ’s rth row.
[ 1 2 3 ]   [ 11 12 13 ]
[ 4 5 6 ] · [ 14 15 16 ] = A     (left factor =: B, right factor =: M)
[ 7 8 9 ]   [ 17 18 19 ]
A’s first row will have 1m(1) + 2m(2) + 3m(3) (coefficients in b(1) times the rows of M )
Note this comparison to the "usual" algorithm you like thinking of. Of course, letting m^{(r)} = eᵣᵀ lets us
preserve row r.
Hence, we may write the desired product of part (a) by doing things which change rows on the left, and
things which change columns on the right.
A = M5 M3 M2 B M1 M4 M6 M7, where (acting on the left)
M2 = diag(1, 1, 1/2, 1)   (halve row 3)
M3 = I with entry (1, 3) = 1 added   (add row 3 to row 1)
M5 = I with entries (1, 2) = (3, 2) = (4, 2) = −1 added   (subtract row 2 from the other rows)
and (acting on the right)
M1 = diag(2, 1, 1, 1)   (double column 1)
M4 = [e4 | e2 | e3 | e1]   (interchange columns 1 and 4)
M6 = [e1 | e2 | e3 | e3]   (replace column 4 by column 3)
M7 = [e2 | e3 | e4] ∈ C4×3   (delete column 1)
To scale row r by α, use I as your base matrix, but replace the 1 in row r by α.
To scale column c by α, use I as your base matrix, but replace the 1 in column c by α.
To add row k to row ℓ, make entry (ℓ, k) of your base identity matrix be 1. (Or α, to add α times row k.)
etc.
More properly on the deletion matrices: given A ∈ Cm×n , if we wish to delete a row/column, we want
an Ar ∈ C(m−1)×n or Ac ∈ Cm×(n−1) (deleting a row or column respectively).
The deletion matrix then is either a Dr ∈ C(m−1)×m or Dc ∈ Cn×(n−1) . To define Dr , begin with the
identity matrix Im×m and delete the row of it you wish to delete from A; for Dc , do the same for the column
of In×n .
Then Ar = Dr A in the row case or Ac = ADc in the column case.
For example, take
Dr = [1 0],   A = [1 2 3; 4 5 6],   Dc = [1 0; 0 1; 0 0]
Dr deletes the second row, and Dc the third column. (Compare each to I2×2 and I3×3.)
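A small NumPy sketch of these deletion matrices, mirroring the example above (the specific entries are just for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                 # A is 2x3

D_r = np.delete(np.eye(2), 1, axis=0)     # I_2 with row 2 deleted (1x2)
D_c = np.delete(np.eye(3), 2, axis=1)     # I_3 with column 3 deleted (3x2)

assert np.allclose(D_r @ A, A[[0], :])    # second row of A deleted
assert np.allclose(A @ D_c, A[:, [0, 1]]) # third column of A deleted
```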
§9.3: (Trefethen & Bau) Lecture 2: Orthogonality, Unitary Matrices
Basic Definitions:
For z = α + ıβ ∈ C, the conjugate is z̄ := α − ıβ (sometimes denoted z*).
Special Names for Certain Properties: Given certain equalities for a matrix A, we may call it certain names. Note that if A = Aᵀ or A = A* or the like, then A is square.
The angle α between x, y ∈ Cm satisfies
cos α ≡ (x*y)/(∥x∥₂ · ∥y∥₂) = ⟨x, y⟩_Cm / (∥x∥₂ · ∥y∥₂)
Some Results, Theorems, or Identities of Note:
Inner Products & Sesquilinearity: The inner product is sesquilinear over C; hence, it is "half linear" in the first coordinate (scalars pulled out become their conjugates) and fully linear in the second. Thus
⟨αx, βy⟩_Cm = (αx)*(βy) = ᾱβ⟨x, y⟩_Cm
The vectors of an orthogonal set (i.e. pairwise orthogonal nonzero vectors) are linearly independent. (Bau, Thm. 2.1)
◦ As a corollary of the above, if {x_i}_{i=1}^m ⊆ Cm is an orthogonal set, it is a basis of Cm.
Orthogonal Decompositions: Consider an orthonormal basis {q_i}_{i=1}^m of Cm. Then we may write
v = ∑_{i=1}^{m} ⟨q_i, v⟩_Cm q_i = ∑_{i=1}^{m} (q_i* v) q_i = ∑_{i=1}^{m} (q_i q_i*) v
(where the first and second sums are obviously equivalent). Here, the final sum is a sum of the orthogonal projections of v onto the q_i.
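A quick NumPy sketch of this expansion; the orthonormal basis is built via a QR factorization of a random matrix, an arbitrary choice for illustration (real case, so q_i* = q_iᵀ):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # columns: an orthonormal basis
v = rng.standard_normal(4)

# v = sum_i (q_i^* v) q_i, the sum of orthogonal projections onto the q_i
recon = sum((Q[:, i] @ v) * Q[:, i] for i in range(4))
assert np.allclose(recon, v)
```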
Unitary Matrices: (Q is unitary if QQ* = Q*Q = I.) We have that q_i* q_j = δ_{i,j}.
We may think of it as: if b has entries giving its expansion in the standard basis {e_i}_{i=1}^m, then Q⁻¹b = Q*b has entries that expand b in the basis {q_i}_{i=1}^m (the columns of Q).
Unitary matrices preserve inner products, and hence norms and distances: ⟨Qx, Qy⟩ = ⟨x, y⟩, and so ∥Qx∥₂ = ∥x∥₂.
In fact, for unitary Q over R (that is, Q ∈ Rm×m being orthogonal: QQᵀ = QᵀQ = I), these transformations are strictly rotations (det Q = +1) or reflections (det Q = −1).
§9.4: (Trefethen & Bau) Lecture 2 Addendum (Useful Norm/Inner Product
Equalities)
z + z̄ = 2 · Re(z)
z − z̄ = 2ı · Im(z)
z · z̄ = |z|²
§9.5: (Trefethen & Bau) Lecture 3: Matrix & Vector Norms
Vector Norms:
A (vector) norm is a function ∥·∥ : Cm → R satisfying the usual axioms (nonnegativity with ∥x∥ = 0 iff x = 0, homogeneity ∥αx∥ = |α|∥x∥, and the triangle inequality).
The most important class of vector norms are the p-norms. Given p ∈ [1, ∞) we define
∥x∥_p := ( ∑_{i=1}^{m} |x_i|^p )^{1/p},   ∥x∥_∞ := max_{1≤i≤m} |x_i|
The unit balls in (R2 , ∥·∥p ) for some p are shown below. (Desmos demo.)
Weighted norms also arise: ∥x∥_W := ∥Wx∥ for W = diag(w₁, · · ·, w_m) with w_i ≠ 0. (More generally, any W ∈ GLm(C) will do.)
Some results:
Hölder Inequality: Take 1/p + 1/q = 1 for p, q ∈ [1, ∞]. Then for x, y ∈ Cm we have
|x*y| ≤ ∥x∥_p ∥y∥_q
The case p = q = 2 is the Cauchy-Schwarz inequality. The latter holds in general for ⟨·, ·⟩_V an inner product on V and ∥·∥_V its induced norm: |⟨x, y⟩_V| ≤ ∥x∥_V ∥y∥_V.
Matrix Norms:
One may give a norm to A ∈ Cm×n by envisioning it as a vector in Cmn .
One may introduce an induced matrix norm, induced by a given vector norm. Specifically, let
∥·∥(n) , ∥·∥(m) be norms on Cn , Cm , respectively. We may define the induced matrix norm ∥·∥(m,n) for
A ∈ Cm×n by
∥A∥_(m,n) := inf{ C ≥ 0 | ∥Ax∥_(m) ≤ C∥x∥_(n) for all x ∈ Cn }
≡ sup_{x ∈ Cn, x ≠ 0} ∥Ax∥_(m) / ∥x∥_(n)
≡ sup_{x ∈ Cn, ∥x∥_(n) = 1} ∥Ax∥_(m)
Sometimes we say the norm on A is the induced p-norm if we apply the same p-norm on Ax and x,
up to the dimension of concern.
Some results:
The induced 1-norm is given by (for A ∈ Cm×n) the maximum sum of a column's absolute values:
∥A∥₁ ≡ max_{1≤j≤n} ∥a_j∥₁ ≡ max_{1≤j≤n} ∑_{i=1}^{m} |a_{i,j}|
The induced ∞-norm is given by (for A ∈ Cm×n) the maximum sum of a row's absolute values:
∥A∥_∞ ≡ max_{1≤i≤m} ∑_{j=1}^{n} |a_{i,j}|
◦ The text confusingly uses ai to denote the ith column of A, and a∗i the ith row, despite potential
confusion with the adjoint. Consider it purely notational.
Sub-Multiplicative: Take ∥·∥k a norm on Ck for k ∈ {ℓ, m, n}. Let A ∈ Cℓ×m and B ∈ Cm×n . Then
∀x ∈ Cn we have
∥ABx∥(ℓ) ≤ ∥A∥(ℓ,m) ∥Bx∥(m) ≤ ∥A∥(ℓ,m) ∥B∥(m,n) ∥x∥(n)
and hence
∥AB∥(ℓ,n) ≤ ∥A∥(ℓ,m) ∥B∥(m,n)
This need not be an equality.
Non-Induced Norms:
Not all norms on matrices are induced from those on vectors; satisfying the axioms of a norm is sufficient.
The Hilbert-Schmidt norm or Frobenius norm ∥·∥_F is defined as a 2-norm on Cmn for A ∈ Cm×n:
∥A∥_F := √( ∑_{i,j} |a_{i,j}|² ) = √( ∑_{j=1}^{n} ∥a_j∥₂² ) = √( ∑_{i=1}^{m} ∥a_i*∥₂² )
≡ √trace(A*A) = √trace(AA*)
Note that
∥AB∥_F ≤ ∥A∥_F ∥B∥_F
and that these norms are invariant under multiplication by unitary matrices: ∥QA∥_F = ∥A∥_F (and likewise for the induced 2-norm). This holds for Q rectangular with orthonormal columns too, i.e. Q ∈ Cp×m with p > m.
One may reframe this result for right multiplication, and in turn orthonormal rows rather than unitary matrices.
§9.6: (Trefethen & Bau) Lecture 4: Singular Value Decomposition (SVD)
Assume for now that rank(A) = n for A ∈ Cm×n and m ≥ n (full rank).
The image of the unit sphere under a linear transformation A is a hyperellipse. We may use this fact to
motivate the SVD as so:
We rotate space so some directions v_i align with the standard basis (WLOG, the v_i have unit 2-norm).
We rotate space again so the standard basis aligns with some new directions u_i (unit 2-norm).
The vectors {σ_i u_i} are the principal semiaxes of the ellipse and have lengths σ_i (the singular values of A).
The {u_i} are the left singular vectors of A (we generally have min{m, n}-many of them); the {v_i} are the right singular vectors.
If we write the vectors columnwise and form a diagonal matrix of the singular values, we have Av_j = σ_j u_j for each j, or compactly, AV = Û Σ̂ with
A ∈ Cm×n
V ∈ Cn×n unitary
Û ∈ Cm×n with orthonormal columns
Σ̂ ∈ Cn×n diagonal
The unitary property thus lets us write the reduced SVD of A:
A = Û Σ̂ V* = Û Σ̂ V⁻¹
(the latter form, a rotation, then a scaling, then a rotation, being the more geometrically intuitive one).
Note that Û has n orthonormal vector-columns in Cm , and hence (unless m = n) they are not a basis.
However, if we append m − n orthonormal columns to it, Û is extended to a unitary matrix U . Doing so
requires Σ̂ to be changed, by making it square with the appending of extra (m − n) rows of only zeroes.
This yields the full SVD of A:
A = U ΣV ∗ = U ΣV −1
wherein
A ∈ Cm×n
U ∈ Cm×m and unitary
Σ ∈ Cm×n (same size as A), with singular values on the “main diagonal” (top left going down right)
and zeroes elsewhere
V ∈ Cn×n and unitary
Note that if A is rank-deficient (not full rank, i.e. r := rank(A) < min{m, n}), the factorization still applies. We just append m − r (not m − n) orthonormal vectors to Û instead, and append n − r orthonormal vectors to V. Σ will have r positive entries on the diagonal, and the remaining min{m, n} − r diagonal entries are zero.
The reduced SVD may accordingly be written in either of two shapes:
Û ∈ Cm×n, Σ̂ ∈ Cn×n with some zeroes on the diagonal; or
Û ∈ Cm×r, Σ̂ ∈ Cr×r with no zeroes on the diagonal.
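NumPy's SVD routine exposes exactly this reduced/full distinction through its full_matrices flag; a sketch with arbitrary shapes:

```python
import numpy as np

m, n = 5, 3
A = np.random.default_rng(2).standard_normal((m, n))

U, s, Vh = np.linalg.svd(A, full_matrices=True)      # full: U is m x m
Uh, sh, Vhh = np.linalg.svd(A, full_matrices=False)  # reduced: U-hat is m x n
assert U.shape == (m, m) and Uh.shape == (m, n)

S = np.zeros((m, n))                                 # Sigma, same size as A
np.fill_diagonal(S, s)
assert np.allclose(U @ S @ Vh, A)
assert np.allclose(Uh @ np.diag(sh) @ Vhh, A)
```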
§9.7: (Trefethen & Bau) Lecture 5: More on the SVD
We take
A = U ΣV ∗ for A, Σ ∈ Cm×n and U ∈ Cm×m , V ∈ Cn×n unitary
Some emergent properties:
The SVD says each matrix is diagonal, up to a change of basis. Recall: b ∈ Cm may be expanded in terms of the basis of the columns of U, and x ∈ Cn is likewise expandable in the basis of V's columns. These expansions give vectors of coefficients b′, x′ by
b′ = U⁻¹b = U*b,   x′ = V⁻¹x = V*x
and hence
Ax = b =⇒ U*b = U*Ax = U*UΣV*x =⇒ b′ = Σx′
Comparison to eigendecomposition: Recall: for A ∈ Cm×m, if it has m linearly independent eigenvectors p_i attached to eigenvalues λ_i, we may write
A = PDP⁻¹ for P = [p₁ | · · · | p_m] and D = diag(λ₁, · · ·, λ_m)
This is a conversion to and from the eigenbasis (i.e. it uses only one basis), whereas the SVD uses two. The SVD's bases are orthonormal, whereas that is not necessarily true of the eigendecomposition. Moreover, the SVD has the further advantage of applying to all matrices, whereas not even all square matrices have eigendecompositions.
Properties from the SVD: Take A ∈ Cm×n, with p := min{m, n} and r the number of nonzero singular values of A. Then:
To compute the SVD of most matrices M, we can:
Find MM*, M*M.
Find their eigenvalues λ_i; then the √λ_i are the singular values.
Find the eigenvectors of MM*; call them u_i (corresponding to λ_i) and form U = [u₁ | · · · | u_r].
Recover V* from M = UΣV* =⇒ U*M = ΣV*.
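A rough sketch of this recipe, for a square random M (almost surely invertible); in practice one should simply call np.linalg.svd, which is far more reliable numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))

lam, U = np.linalg.eigh(M @ M.T)       # eigenpairs of M M^*
order = np.argsort(lam)[::-1]          # sort eigenvalues descending
sigma = np.sqrt(lam[order])            # singular values = sqrt(eigenvalues)
U = U[:, order]

Vh = np.diag(1 / sigma) @ U.T @ M      # from U^* M = Sigma V^*
assert np.allclose(U @ np.diag(sigma) @ Vh, M)
assert np.allclose(Vh @ Vh.T, np.eye(3))   # rows of V^* are orthonormal
```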
§9.8: (Trefethen & Bau) Lecture 6: Projectors
Some notes:
(Notice that P v − v is parallel to the “light”, the direction of projection; it gets mapped onto the null
space that way: the null space is the direction parallel to the light.)
We note that
range(I − P ) = ker(P )
ker(I − P ) = range(P )
ker(I − P ) ∩ ker(P ) = ⟨0⟩
range(P ) ∩ ker(P ) = ⟨0⟩
Hence, a projector with domain Cm separates Cm into two complementary subspaces; the converse holds as well. (Given subspaces S, T ≤ Cm with S ⊕ T = Cm, there is a unique projector P with range(P) = S and ker(P) = T.)
One need not use an orthonormal basis; an orthogonal projector onto S ≤ Cm can be constructed as so. Suppose S = span{a_i}_{i=1}^n for a_i linearly independent, and let A = [a₁ | · · · | aₙ]. Then the orthogonal projector P onto range(A) is given by
P = A(A*A)⁻¹A*
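A sketch verifying the defining properties of this formula (idempotence, symmetry, and acting as the identity on range(A)); the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2))        # linearly independent columns (a.s.)
P = A @ np.linalg.inv(A.T @ A) @ A.T   # P = A (A^* A)^{-1} A^*

assert np.allclose(P @ P, P)           # idempotent: a projector
assert np.allclose(P, P.T)             # Hermitian: an orthogonal projector
assert np.allclose(P @ A, A)           # identity on range(A)
```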
§9.9: (Trefethen & Bau) Lecture 7: QR Factorization & Gram-Schmidt
QR Factorization:
We define ⟨v₁, · · ·, vₙ⟩ := span{v_i}_{i=1}^n.
QR factorization has the focus of orthogonalizing the range of A ∈ Cm×n. Specifically, we want to get {q_i}_{i=1}^n ⊆ Cm such that
⟨q₁, · · ·, q_j⟩ = ⟨a₁, · · ·, a_j⟩ for j = 1, · · ·, n
A QR factorization exists in both forms for all A ∈ Cm×n , unique in the reduced case with the suppositions
that A is of full rank and rj,j > 0 (R̂ is strictly positive on the diagonal).
Note that, in the full QR factorization, the extra columns {q_j}_{j=n+1}^m are a basis of range(A)⊥ = ker(A*).
Gram-Schmidt Orthogonalization Algorithm:
The classical, unstable algorithm constructs the r_{i,j} and q_j from the a_i of A as so:
r_{i,j} = q_i* a_j = ⟨q_i, a_j⟩_Cm   (i ≠ j)
r_{k,k} = ∥ a_k − ∑_{i=1}^{k−1} r_{i,k} q_i ∥₂   (= ∥"top of q_k"∥₂)
q_k = ( a_k − ∑_{i=1}^{k−1} r_{i,k} q_i ) / r_{k,k}
In a pseudo-code manner,
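In lieu of the pseudocode, here is a Python sketch of the classical routine following the formulas above (a direct transcription, not optimized or stabilized):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Classical (unstable) Gram-Schmidt: A = QR, Q with orthonormal columns."""
    m, n = A.shape
    Q = np.zeros((m, n), dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    for k in range(n):
        v = A[:, k].astype(complex)
        for i in range(k):
            R[i, k] = Q[:, i].conj() @ A[:, k]   # r_{i,k} = <q_i, a_k>
            v = v - R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(v)              # r_{k,k} = ||a_k - sum(...)||_2
        Q[:, k] = v / R[k, k]                    # q_k
    return Q, R
```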
L2 Space:
Solving Ax = b:
Consider Ax = b where A has the QR factorization A = QR. It is trivial to show, then,
Rx = Q∗ b
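A minimal sketch of that solve, using NumPy's built-in QR (a generic solve stands in for a dedicated back-substitution routine):

```python
import numpy as np

rng = np.random.default_rng(5)
A, b = rng.standard_normal((4, 4)), rng.standard_normal(4)

Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.conj().T @ b)   # solve Rx = Q* b
assert np.allclose(A @ x, b)
```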
§9.10: (Trefethen & Bau) Lecture 10: Householder Triangularization
The Householder method instead applies elementary unitary transformations Qk on A’s left:
We see that
F_k = I − 2 (v_k v_k*)/(v_k* v_k) = I − 2 (v_k v_k*)/∥v_k∥₂²
where x_k ∈ C^{m−k+1} is taken from A's kth column, as the entries in rows k to m (x_k := A_{k:m,k}), and then
v_k := sign(x₁) · ∥x_k∥₂ · e₁ + x_k
Note that F_k = I − 2P for a certain projector P; it reflects space about the hyperplane through the origin and perpendicular to v_k. F_k is a full-rank, unitary matrix.
In pseudocode, then, an implicit QR factorization of A may be constructed:
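Again in lieu of the pseudocode figure, a Python sketch of Householder triangularization, storing the reflector vectors v_k so that Q*b may later be applied implicitly (simplified: it does not special-case x₁ = 0, where np.sign returns 0):

```python
import numpy as np

def householder(A):
    """Implicit QR: returns R and the normalized reflector vectors v_k."""
    A = A.astype(float).copy()
    m, n = A.shape
    V = []
    for k in range(n):
        x = A[k:, k].copy()
        v = x.copy()
        v[0] += np.sign(x[0]) * np.linalg.norm(x)      # v_k = sign(x_1)||x|| e_1 + x
        v /= np.linalg.norm(v)
        V.append(v)
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])  # apply F_k to the submatrix
    return V, np.triu(A)

def apply_Qstar(V, b):
    """Compute Q* b from the stored reflectors (apply F_1, ..., F_n in order)."""
    b = b.astype(float).copy()
    for k, v in enumerate(V):
        b[k:] -= 2.0 * v * (v @ b[k:])
    return b
```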
It being implicit is no issue, since we may still find Q∗ b and Qx easily:
The latter algorithm applied to Qek for each k can reconstruct Q explicitly if need be.
§9.11: (Trefethen & Bau) Lecture 11: Least Squares Problems
§9.12: (Trefethen & Bau) Lecture 12: Conditioning; Condition Numbers
Abstractly, a problem is a function f : X → Y of normed vector spaces, data and solutions respectively.
Problems are well-conditioned if small perturbations in input x give small changes in f (x), and
ill-conditioned otherwise.
Herein, δx is a small perturbation of x, and δf := f (x + δx) − f (x).
The absolute condition number κ̂ ≡ κ̂(x) of a problem at x is
κ̂ := lim_{δ→0} sup_{∥δx∥_X ≤ δ} ∥δf∥_Y / ∥δx∥_X = sup over infinitesimal δx of ∥δf∥_Y / ∥δx∥_X
The relative condition number κ ≡ κ(x) normalizes both perturbations:
κ := lim_{δ→0} sup_{∥δx∥_X ≤ δ} ( ∥δf∥_Y/∥f(x)∥_Y ) / ( ∥δx∥_X/∥x∥_X )
Problems for which κ, κ̂ are small are well-conditioned; if large, they are ill-conditioned.
Recall we may define the Jacobian J(x) of a differentiable problem f at x as the matrix
J(x) := [ ∂f_i/∂x_j ]_{i,j}
for i, j in the appropriate ranges. Then with δf = J(x) δx (as infinitesimals), we have
κ̂ = ∥J(x)∥_{X,Y},   κ = ∥J(x)∥_{X,Y} / ( ∥f(x)∥_Y / ∥x∥_X )
in the norms induced by those on X, Y.
In the case of matrix-vector multiplication (the problem map x ↦ Ax), we have
κ = ∥A∥ · ∥x∥/∥Ax∥
If A ∈ GLm(C), then
κ ≤ ∥A∥ · ∥A⁻¹∥
or some choose to write κ = α∥A∥∥A⁻¹∥ for α := ∥x∥/(∥Ax∥∥A⁻¹∥).
We may replace A⁻¹ with A⁺ in the nonsquare-but-full-rank case. The inverse problem (b ↦ A⁻¹b) is analogous, with the roles of A and A⁻¹ interchanged.
A matrix has its own condition number,
κ(A) := ∥A∥ · ∥A⁻¹∥
with the usual definitions for what it means for A to be ill-/well-conditioned. (If A is noninvertible, we say κ(A) = ∞.) We note that, if A ∈ Cm×m,
∥A∥₂ = σ₁ and ∥A⁻¹∥₂ = 1/σ_m   =⇒   κ(A) = σ₁/σ_m
in the 2-norm case (the eccentricity of the hyperellipse).
For A ∈ Cm×n of full rank, with m ≥ n, we instead let
κ(A) := ∥A∥ · ∥A⁺∥   =⇒   κ(A) = σ₁/σₙ
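A one-line numerical check that σ₁/σ_m agrees with the library's 2-norm condition number (arbitrary random A):

```python
import numpy as np

A = np.random.default_rng(7).standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
assert np.isclose(s[0] / s[-1], np.linalg.cond(A, 2))
```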
§10: Set-Theoretic Identities
◦ A∪B =B∪A
◦ A∩B =B∩A
◦ (A ∪ B) ∪ C = A ∪ (B ∪ C), may remove parentheses
◦ (A ∩ B) ∩ C = A ∩ (B ∩ C), may remove parentheses
Union distributes over intersection and vice versa
◦ A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
◦ A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
◦ Analogous to a · (b + c) = a · b + a · c
Empty set and universal set (U ) stuff:
◦ A∪∅=A
◦ A∩U =A
◦ A ∪ Ac = U
◦ A ∩ Ac = ∅
◦ A∪U =U
◦ A∩∅=∅
◦ ∅c = U
◦ Uc = ∅
◦ A−U =∅
◦ A−A=∅
◦ ∅−A=∅
◦ A−∅=A
◦ ∅ is identity of union, and U that of intersection
◦ (Ac )c = A
◦ A∪A=A∩A=A
◦ A ∪ (A ∩ B) = A ∩ (A ∪ B) = A
§10.2: Some More Useful & Noteworthy Ones
A very big list is available here as well; I'll focus only on the ones I tend to actively use.
Definitions:
◦ Set Difference: A − B := A ∩ B c
◦ Symmetric Difference: A △ B := (A−B)∪(B−A) = (A∪B)−(A∩B) = {x | x is one and only one of A, B}
◦ Cartesian Product:
For two sets, A × B := {(a, b) | a ∈ A, b ∈ B}
For finitely many sets, A₁ × · · · × Aₙ := {(a_i)_{i=1}^n | a_i ∈ A_i}
For infinitely many sets, ∏_{i∈I} A_i := { f : I → ∪_{i∈I} A_i | ∀i ∈ I, f(i) ∈ A_i }
◦ Disjoint Union:
This is not the ordinary union of two sets; rather, it takes the union of tagged copies of the sets, in a way that accounts for duplicate elements. It is sometimes called a discriminated union.
Various notations include ⊔, ∐, or a + or · inside a ∪ symbol.
For two sets, A ⊔ B := {(a, 1)}_{a∈A} ∪ {(b, 2)}_{b∈B}
In general, ⨆_{i∈I} A_i := ∪_{i∈I} {(a, i) | a ∈ A_i for the given i}
De Morgan's laws: ( ∪_{i∈I} A_i )ᶜ = ∩_{i∈I} A_iᶜ and ( ∩_{i∈I} A_i )ᶜ = ∪_{i∈I} A_iᶜ
◦ C − (A ∩ B) = (C − A) ∪ (C − B)
◦ C − (A ∪ B) = (C − A) ∩ (C − B)
◦ C − (B − A) = (A ∩ C) ∪ (C − B)
◦ (B − A) ∩ C = (B ∩ C) − A = B ∩ (C − A)
◦ (B − A) ∪ C = (B ∪ C) − (A − C)
◦ (B − A) − C = B − (A ∪ C)
◦ (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D)
◦ (A × B) − (C × D) = [(A − C) × B] ∪ [A × (B − D)]
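A few of these identities, spot-checked in Python on small arbitrary sets (a sanity check, not a proof):

```python
A, B, C, D = {1, 2}, {2, 3}, {1, 3}, {3, 4}

assert C - (A & B) == (C - A) | (C - B)
assert (B - A) - C == B - (A | C)

prod = lambda X, Y: {(x, y) for x in X for y in Y}
assert prod(A, B) & prod(C, D) == prod(A & C, B & D)
```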
§10.3: Identities on Functions, Images, & Preimages
For a function f : X → Y with A ⊆ X and B ⊆ Y, define the image and preimage
f(A) := {f(x) ∈ Y | x ∈ A}
f⁻¹(B) := {x ∈ X | f(x) ∈ B}
Results of note:
A1 ⊆ A2 =⇒ f (A1 ) ⊆ f (A2 )
B1 ⊆ B2 =⇒ f −1 (B1 ) ⊆ f −1 (B2 )
f( ∪_{i∈I} A_i ) = ∪_{i∈I} f(A_i)
f(A₁ ∩ A₂) ⊆ f(A₁) ∩ f(A₂), with equality if f is injective (iff true on all subsets)
f( ∩_{i∈I} A_i ) ⊆ ∩_{i∈I} f(A_i), with equality if injective
f⁻¹( ∩_{i∈I} B_i ) = ∩_{i∈I} f⁻¹(B_i)
f⁻¹( ∪_{i∈I} B_i ) = ∪_{i∈I} f⁻¹(B_i)
f(A) ∩ B = f(A ∩ f⁻¹(B))
f(A) ∪ B ⊇ f(A ∪ f⁻¹(B))
A ∩ f⁻¹(B) ⊆ f⁻¹(f(A) ∩ B)
A ∪ f⁻¹(B) ⊆ f⁻¹(f(A) ∪ B)
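A Python spot-check of a couple of the image/preimage facts, using a deliberately non-injective f (my own toy example):

```python
f = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}            # f(1) = f(2): not injective
image = lambda S: {f[x] for x in S}
preimage = lambda T: {x for x in f if f[x] in T}

A1, A2 = {1, 3}, {2, 3}
assert image(A1 & A2) <= image(A1) & image(A2)  # containment only
assert image(A1 & A2) != image(A1) & image(A2)  # strict here, since f(1) = f(2)

B = {'a', 'b'}
assert image(A1) & B == image(A1 & preimage(B))
```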
Some of note also come from Dr. Ai. Note that we define, for a function f : E ⊆ Rⁿ → R̄ := R ∪ {+∞, −∞}, with E measurable and a ∈ R,
{f > a} := f⁻¹((a, ∞]) := {x ∈ E | f(x) > a}
and analogous notions for other sets, e.g. {f < a}, {f ≥ a}, {f = a}, etc. Then:
E = {f = −∞} ∪ {f > −∞} = ( ∩_{k=1}^∞ {f ≤ −k} ) ∪ ( ∪_{k=1}^∞ {f > −k} )
{f ≤ a} = {f > a}ᶜ
E = {f ≥ a} ∪ {f < a}
{f > a} = ∪_{n=1}^∞ {f ≥ a + 1/n}
{f ≥ a} = ∩_{n=1}^∞ {f > a − 1/n}
{f = +∞} = ∩_{n=1}^∞ {f > n} = ∩_{n=1}^∞ {f ≥ n}
{f = −∞} = ∩_{n=1}^∞ {f < −n} = ∩_{n=1}^∞ {f ≤ −n}
{a < f ≤ b} = {f > a} ∩ {f ≤ b}
Some others with additional constraints/context needed:
When a > b, {f > a} ⊆ {f > b}
If f ≥ g on all of E, then {g > a} ⊆ {f > a}
We have that {f > g} = {f − g > 0}
◦ This does not hold with equality allowed, i.e. {f ≥ g} ≠ {f − g ≥ 0} in general
◦ It will hold in that case, however, if f, g are finite on E
Take a sequence of functions {f_k : E → R̄}_{k∈N} and define pointwise g(x) := sup_{k∈N} f_k(x), h(x) := inf_{k∈N} f_k(x). Then
◦ {g > a} = ∪_{k=1}^∞ {f_k > a}
◦ {h < a} = ∪_{k=1}^∞ {f_k < a}
(note the union in the second identity as well: inf_k f_k(x) < a iff some f_k(x) < a)
§10.4: Limits of Sequences of Sets
Define
lim inf_{k→∞} E_k := ∪_{n=1}^∞ ∩_{k=n}^∞ E_k,   lim sup_{k→∞} E_k := ∩_{n=1}^∞ ∪_{k=n}^∞ E_k
The former is all points "eventually in E_k" (in every E_k for all k ≥ some k₀); the latter is all points in infinitely-many E_k.
We say {E_k}_{k∈N}:
increases to ∪_{k∈N} E_k if E_k ⊆ E_{k+1}. Notation: E_k ↗ ∪_{k∈N} E_k
decreases to ∩_{k∈N} E_k if E_k ⊇ E_{k+1}. Notation: E_k ↘ ∩_{k∈N} E_k
Properties of note:
( lim sup_{k→∞} E_k )ᶜ = lim inf_{k→∞} (E_kᶜ) (M&I, Prob. 1.3)
If either E_k ↗ E or E_k ↘ E holds, then lim sup_{k→∞} E_k = lim inf_{k→∞} E_k = E (M&I, Prob. 1.3)
§10.5: Axiom of Choice (Overview)
Informally, given a collection of sets X , all nonempty, there is a choice function which sends a set X to a
specific element x ∈ X.
Note that these statements are a nonissue for A a finite indexing set, sometimes even countable.
Some common equivalent statements:
Construct A Choice Set: Given a collection {Xα }α∈A of pairwise-disjoint and nonempty sets, ∃ a
set C with precisely one element from each set Xα .
Cartesian Product: Given a collection {X_α}_{α∈A} of nonempty sets, their Cartesian product ∏_{α∈A} X_α is also nonempty.
Well-Ordering Theorem: Every set can be well-ordered.
◦ A well-ordering (X, ≤) is a strict total order with the property that each nonempty S ⊆ X has a least element.
◦ To build it up: a partial order (poset) has reflexivity, antisymmetry, transitivity.
◦ A strict poset instead has irreflexivity, asymmetry, and transitivity.
◦ A strict total order has the additional property of connexity: we also have a ≤ b or b ≤ a for
each a, b in the set.
◦ Thus: a well-ordering has the properties of irreflexivity, asymmetry, transitivity, connexity, and
subsets containing their minimum elements.
Zorn’s Lemma: If a poset P has the property that all chains in P have upper bounds in P , then P
has at least one maximal element.
Surjections: Every surjective function has a right inverse.
Trichotomy: Given sets A, B, one of these are true: |A| = |B| or |A| < |B| or |B| < |A|.
Items often claimed as results, but which are apparently equivalent:
For all nonempty sets S, we can define a binary operation ∗ on S such that (S, ∗) is a group.
The closure of a product of topological spaces is the product of the closures of the factors.
Despite the commonly-cited issues and concerns with the Axiom of Choice, per MathOverflow, some strange
results follow without it. Some highlights from the link:
A nonempty tree may have no leaves and yet no infinite path. (Every finite path in the tree may be extended one more step, so paths of every finite length exist, but there is no infinite path.)
There may exist a point x in the closure of some X ⊆ R with no sequence {xₙ}ₙ∈N ⊆ X with xₙ → x. (That is, the property that elements of a closure are limiting values of sequences requires AC.)
You may have a function that is continuous in the sense of preserving sequential limits (that being xₙ → x =⇒ f(xₙ) → f(x)) yet fails the ε-δ definition.
A set S may be infinite with no countably-infinite subset. (We cannot, then, say that ℵ0 is the smallest
infinite cardinality.)
There may be an equivalence relation on R with more equivalence classes than R has elements.
There is a field without an algebraic closure, and Q can have multiple non-isomorphic closures (and
such closures may even be countable).
There can be a vector space without a basis. Moreover, a vector space may have bases β, β ′ with
|β| < |β ′ |.
R is a countable union of countable sets. (This does not make R countable: concluding that a countable union of countable sets is countable itself requires the axiom of countable choice.)
All sets are measurable, in the Lebesgue sense (though, given the previous item, Lebesgue theory is then not very useful).
§11: Set-Theoretic Relations
A brief table summarizing some relations (from Wikipedia) is below. A more thorough write up follows
throughout the rest of this section.
§11.2: Basics of an Important Visual Construction
So for clarity, let’s have a relation R on a set A, so that R ⊆ A × A. In general, we will be focusing on
finite sets and relations, but the logic extends fine - it just makes for harder-to-contend-with pictures.
Our visual is one of a directed graph. We let the elements of A be our nodes/vertices, and draw arrows
pointing between them to indicate relationship. More specifically, a points to b if and only if (a, b) ∈ R.
Throughout this post, I will write a → b to indicate "a points to b" for brevity's sake. We will generally focus on binary (homogeneous) relations over the same set A.
Some examples of this construction:
R = {(1, 1), (1, 2), (1, 3), (1, 4)} on the set A = {1, 2, 3, 4}
R = {(1, 2), (2, 3), (3, 4), (4, 1)} on the set A = {1, 2, 3, 4}
R = ∅ on the set A = {1, 2, 3, 4} (i.e. the empty-set relation, sometimes called the empty relation)
R = A × A = {1, 2, 3} × {1, 2, 3} on the set A = {1, 2, 3} (i.e. the relation containing all pairs, sometimes called the universal or complete relation)
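For experimenting with the examples in this section, a relation can be stored as a set of pairs; here is a Python sketch of checkers for a few of the properties catalogued below, run on the 4-cycle example above:

```python
def is_reflexive(R, A): return all((a, a) in R for a in A)
def is_symmetric(R):    return all((b, a) in R for (a, b) in R)
def is_transitive(R):   return all((a, d) in R
                                   for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3, 4}
R = {(1, 2), (2, 3), (3, 4), (4, 1)}
print(is_reflexive(R, A), is_symmetric(R), is_transitive(R))  # False False False
```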
§11.3: Basic Properties
§11.3.1.1: Coreflexive
Informal Definition: If anything is related to anything, those things must be the same thing.
Notes/Comments:
Visual Examples: R = {(1, 1), (2, 2), (3, 3), (4, 4)} (left); R = {(2, 2)} (right)
Visual Non-Examples: R = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 3)} (left); R = {(1, 2), (2, 3), (3, 1)}
(right)
§11.3.1.2: Irreflexivity
Basic Examples:
Visual Examples: R = {(1, 2), (2, 3), (3, 4)} (left); R = {(1, 2), (2, 1), (3, 4), (4, 3)} (right)
Visual Non-Examples: R = {(1, 1), (1, 2), (2, 3), (3, 4)} (left); R = {(1, 1), (1, 2), (1, 3), (1, 4)} (right)
§11.3.1.3: Left quasi-reflexive
Informal Definition: Anything related to something must be related to itself. (The informal defini-
tion might be a bit confusing for each one-sided quasi-reflexive case.)
Directed Graph Analogy: If a node points to anything, it must also point to itself.
Visual Examples: R = {(1, 1), (1, 2), (3, 3)} (left); R = {(1, 1), (2, 2), (3, 3), (1, 4), (2, 4), (3, 4)} (right)
Visual Non-Examples: R = {(2, 2), (2, 3), (1, 2), (3, 3)} (left); R = {(1, 4), (2, 4), (3, 4)} (right)
§11.3.1.4: Quasi-reflexive
Informal Definition: If anything is related to anything else, both of those things are related to
themselves.
Directed Graph Analogy: Any node which has an arrow pointing to or from it needs an arrow
pointing to itself.
Notes/Comments:
◦ Hence a quasi-reflexive relation is both left quasi-reflexive and right quasi-reflexive. It is a weaker
notion of reflexivity in that the (x, x) pairs only pop up in the relation when they are actually
relating to something.
◦ An equivalent property is that R is quasi-reflexive if and only if the symmetric closure R ∪ RT is
left- or right-quasi-reflexive.
◦ A relation which is both symmetric and transitive is quasi-reflexive.
Basic Example: Let x = {x_i}_{i=1}^∞, y = {y_i}_{i=1}^∞ be sequences in a metric space, e.g. in R with the usual distance function. Define a relation R by
(x, y) ∈ R ⇐⇒ lim x and lim y both exist and are equal.
This relation is not necessarily reflexive. (Let x be a sequence whose limit does not exist, e.g. xₙ = (−1)ⁿ.) However, if the limits of x and y exist and are equal, then trivially each of the two sequences has a limit equal to its own, giving quasi-reflexivity.
Visual Examples: R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 3)} (left); R = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 4), (2, 4), (3, 4)}
(right)
Visual Non-Examples: R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 3), (3, 4)} (left); R = {(2, 2), (3, 3), (4, 4), (1, 4), (2, 4), (3, 4)}
(right)
§11.3.1.5: Reflexivity
Notes/Comments:
Visual Examples: R = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (4, 4)} (left); R = {(1, 1), (2, 2), (3, 3), (3, 2), (4, 4)}
(right)
Visual Non-Examples: R = {(1, 2), (2, 2), (2, 3), (3, 3), (4, 4)} (left); R = {(2, 2), (3, 3), (3, 2)} (right)
§11.3.1.6: Right quasi-reflexive
Visual Non-Examples: R = {(1, 1), (1, 2), (2, 3), (3, 3)} (left); R = {(1, 2), (3, 3), (3, 2), (4, 2)} (right)
§11.3.2: Symmetry-Like Properties
§11.3.2.1: Antisymmetry
Informal Definition: For distinct elements, the relation is a one-way street: two elements cannot be related to each other in both directions unless they are the same element.
Directed Graph Analogy: There are no closed loops between pairs of distinct nodes.
Notes/Comments:
◦ Despite the naming, relations can be both symmetric & antisymmetric. Such relations happen to
also be coreflexive and thus a subset of the identity relation.
◦ Hence we can look at relations which are symmetric, antisymmetric, both, or neither.
◦ Note that this is not the same as asymmetry. In fact a relation is asymmetric iff it is antisymmetric
and irreflexive.
Basic Examples:
Visual Examples: R = {(1, 2), (2, 4), (4, 1), (4, 3)} (left); R = {(1, 1), (1, 2), (1, 3), (1, 4)}
Visual Non-Examples: R = {(1, 2), (2, 1), (3, 3), (4, 4), (3, 4)} (left); R = {(1, 3), (3, 1), (3, 4), (4, 3), (4, 1)}
(right)
§11.3.2.2: Asymmetry
Informal Definition: The relation is solely a one-way street; elements cannot be related to each other in both directions. However, unlike antisymmetry, this now also forbids elements relating to themselves.
Directed Graph Analogy: Pairs of distinct nodes do not have loops between them. Nodes do not
point to themselves.
Notes/Comments:
Basic Examples:
◦ Strict less/greater than (< or >) on R or its subsets (i.e. (a, b) ∈ R ⇐⇒ a < b (or a > b)).
◦ Strict set inclusion (i.e. (A, B) ∈ R ⇐⇒ A ⫋ B)
◦ Divisibility on positive integers disallowing equality (i.e. (a, b) ∈ R ⇐⇒ a | b ∧ b ̸= a).
Visual Examples: R = {(1, 2), (2, 4), (4, 1), (4, 3)} (left); R = {(1, 2), (1, 3), (1, 4)}
Visual Non-Examples: R = {(1, 2), (2, 1), (3, 3), (4, 4), (3, 4)} (left); R = {(1, 3), (3, 1), (3, 4), (4, 3), (4, 1), (1, 1)}
(right)
§11.3.2.3: Symmetry
Formal Definition: (∀x, y ∈ A)((x, y) ∈ R =⇒ (y, x) ∈ R)
Directed Graph Analogy: There are no "stray/lone" edges in the graph where an element solely points to another. There are always closed loops between any two distinct related elements (or no connections at all).
Notes/Comments:
Basic Examples:
Visual Examples: R = {(1, 2), (2, 1), (2, 4), (4, 2), (3, 4), (4, 3)} (left); R = {(1, 4), (4, 1), (2, 2), (3, 3)}
(right)
Visual Non-Examples: R = {(1, 3), (3, 4), (4, 2), (2, 1), (2, 3), (1, 4)} (left); R = {(1, 3), (3, 4), (4, 2), (2, 1), (1, 4), (4, 1)}
(right)
§11.3.3: Transitivity-Like Properties
§11.3.3.1: Antitransitive
Directed Graph Analogy: Whenever two sides of a triangle are formed, the third side that transi-
tivity would dictate is not present.
Notes/Comments:
Visual Examples: R = {(1, 3), (2, 1), (4, 2)} (left); R = {(1, 2), (2, 1), (1, 3), (3, 4), (2, 4)} (right)
Visual Non-Examples: R = {(1, 2), (2, 4), (4, 1)} (left); R = {(1, 1), (2, 2), (1, 2), (2, 4), (1, 3)} (right)
§11.3.3.2: Cotransitive
Informal Definition: If two elements are related, then for any third element, at least one of the two "legs" through that element is also related.
Directed Graph Analogy: Provided x → z, then for any y ∈ A, either x → y or y → z (or both).
Notes/Comments:
◦ One may frame the definition as: "R is cotransitive iff its complement Rᶜ is transitive." (This is the usual set-theoretic complement.)
◦ Cotransitive relations are also quasi-transitive.
◦ A cotransitive relation is connected iff it is irreflexive.
◦ A cotransitive relation may be transitive. Sufficient conditions include being left-Euclidean, right-
Euclidean, or antisymmetric.
Example 1: Treat (1, 4) as a sort of ”root” in making the relation. Then for any y ∈ A := {1, 2, 3, 4},
we have to add either (1, y) or (y, 4). Always choosing the former, we get R = {(1, 2), (1, 3), (1, 4)}.
Example 2: Suppose we start with (1, 1). For any y ∈ A, we need to add (1, y) or (y, 1). Always adding the latter, we get the points (2, 1), (3, 1), (4, 1).
Visual Non-Examples: Remove any of the newly-added arrows from the above relations to get a
nonexample:
§11.3.3.3: Intransitive
Informal Definition: There exists some trio of elements where transitivity does not hold.
Directed Graph Analogy: You can find two sides of a 3-cycle (or, more appropriately, a triangle) which are not closed off by the third.
Notes/Comments:
Visual Examples: R = {(1, 2), (2, 4)} (left); R = {(3, 4), (4, 2)} (right)
Visual Non-Examples: R = {(1, 2), (2, 4), (1, 4)} (left); R = {(3, 4), (4, 2), (3, 2)} (right)
§11.3.3.4: Left Euclidean
Informal Definition: If an element is related to multiple things, those things are related to each
other too.
Directed Graph Analogy: If a node points to multiple others, then those nodes need to point to
each other. (If you know a bit of graph theory, think of the neighborhood of a vertex too.)
Notes/Comments:
◦ Notice, (y, x), (z, x) ∈ R for R left-Euclidean gives both (y, z), (z, y) ∈ R, just by swapping
arguments.
◦ A right- or left-Euclidean reflexive relation is symmetric. Thus by the previous, it is transitive
and thus an equivalence relation.
◦ Right- and left-Euclidean relations are quasi-transitive.
◦ A connected, right- or left-Euclidean relation on a set of cardinality at least 3 is never antisym-
metric.
◦ Left-Euclidean relations are left-unique iff they are antisymmetric. Such relations are also transi-
tive, by vacuous logic.
◦ Left-Euclidean relations are left quasi-reflexive. A relation is left quasi-reflexive iff it is both
left-Euclidean and left-unique.
Visual Examples: R = {(1, 2), (1, 4), (2, 4), (4, 2)} (left); R = {(1, 2), (1, 3), (1, 4), (2, 3), (3, 2), (2, 4), (4, 2), (3, 4), (4, 3)}
(right). In these, focus on the nodes that 1 is pointing to.
Visual Non-Examples: R = {(1, 1), (1, 2), (1, 4), (2, 4)} (left); R = {(1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}
§11.3.3.5: Quasi-transitive
Formal Definition: (∀a, b, c ∈ A)((a, b), (b, c) ∈ R ∧ (b, a), (c, b) ∉ R =⇒ (a, c) ∈ R ∧ (c, a) ∉ R)
Directed Graph Analogy: Whenever two sides of a triangle are formed with no arrows going in the reverse direction along them, the third side is closed off, again with no reverse arrow. (This condition makes transitivity imply quasi-transitivity, but not necessarily the reverse.)
Notes/Comments:
Visual Examples: R = {(1, 2), (1, 4), (2, 4)} (left); R = {(1, 2), (1, 3), (2, 3), (4, 2), (4, 3)} (right)
Visual Non-Examples: R = {(1, 2), (2, 4)} (left); R = {(1, 2), (2, 1), (1, 4), (4, 1), (2, 4), (4, 2)} (right)
§11.3.3.6: Right Euclidean
Informal Definition: If multiple things are related to the same thing, those things are related to
each other too.
Directed Graph Analogy: If multiple nodes point to the same one, they need to point to each other
too.
Notes/Comments:
◦ Notice, (x, y), (x, z) ∈ R for R right-Euclidean implies (y, z), (z, y) ∈ R too, just by swapping
arguments.
◦ While similar to transitivity, it is not the same: ≤ on R is transitive yet not right-Euclidean, for instance. However, a connected right-Euclidean relation is transitive.
◦ A symmetric relation is transitive iff it is right-Euclidean iff it is left-Euclidean.
◦ A right- or left-Euclidean reflexive relation is symmetric. Thus by the previous, it is transitive
and thus an equivalence relation.
◦ Right- and left-Euclidean relations are quasi-transitive.
◦ A connected, right- or left-Euclidean relation on a set of cardinality at least 3 is never antisym-
metric.
◦ Right-Euclidean relations are right-unique iff they are antisymmetric. Such relations are also
transitive, by vacuous logic.
◦ Right-Euclidean relations are right quasi-reflexive. A relation is right quasi-reflexive iff it is both
right-Euclidean and right-unique.
Visual Examples: R = {(2, 2), (1, 4), (4, 1), (1, 3), (4, 3)} (left); R = {(1, 2), (2, 1), (1, 3), (1, 4), (2, 3), (2, 4)}
(right)
Visual Non-Examples: R = {(2, 4), (3, 4)} (left); R = {(2, 1), (3, 1), (4, 1), (2, 4), (3, 4)} (right)
§11.3.3.7: Transitivity
Directed Graph Analogy: For three distinct nodes, if two sides of a triangle are formed, then the third side needs to be closed off. Similar conditions apply when the chosen nodes are not all distinct.
Notes/Comments:
◦ Unlike symmetry and reflexivity, there is not yet a nice closed form for the number of transitive
relations on a set of n elements. Some more can be read on the OEIS here.
◦ If R is transitive, so is its converse relation Rᵀ.
◦ The intersection of transitive relations is transitive, but not necessarily their unions or comple-
ments.
◦ A transitive relation is asymmetric iff it is irreflexive.
Basic Examples:
◦ Equality, or the less/greater than comparisons (with or without equality) in R. So you may define
a relation R by (a, b) ∈ R ⇐⇒ a = b (or a < b, or a ≤ b, or a > b, or a ≥ b) and get a transitive
relation.
◦ Divisibility (i.e. (a, b) ∈ R ⇐⇒ a | b)
◦ Set inclusion, with or without equality permitted (i.e. (A, B) ∈ R ⇐⇒ A ⫋ B (or A ⊆ B))
Visual Examples: R = {(1, 2), (3, 1), (3, 2)} (left); R = {(1, 1), (1, 2), (2, 4), (4, 3), (3, 1), (1, 4), (4, 1)}
(right)
Visual Non-Examples: R = {(1, 2), (2, 4), (4, 3), (3, 1)} (left); R = {(x, y) ∈ {1, 2, 3, 4}² | x ≠ y} (right), the latter failing because it is not reflexive (e.g. (1, 2), (2, 1) ∈ R would force (1, 1) ∈ R).
§11.3.4: Comparability Properties
§11.3.4.1: Connectedness
Informal Definition: For any pair of distinct elements, we know one is always related to the other.
Directed Graph Analogy: For any pair of distinct nodes, you can find a pathway from one to the
other.
Notes/Comments:
Visual Examples: R = {(1, 2), (1, 3), (1, 4), (3, 2), (3, 4), (4, 2)} (left); R = {(1, 1), (2, 1), (1, 4), (1, 3), (3, 1), (3, 2), (3, 4),
(right)
Visual Non-Examples: R = {(1, 2), (1, 3), (1, 4), (2, 2), (3, 3), (4, 4)} (left); R = {(1, 4), (4, 3), (4, 2), (3, 2)}
(right)
§11.3.4.2: Converse Well-Founded
Directed Graph Analogy: Each subgraph has a node from which nothing is pointing.
Example 1: Letting (a, b) ∈ R if a > b on {1, 2, 3, 4} works. It gives the below graph. The maximum
element is the node of concern in a given subset.
Example 2: R = {(5, 2), (5, 3), (6, 3), (6, 4), (7, 4), (8, 4), (2, 1), (3, 1), (4, 1)}
Visual Non-Examples: R = {(1, 3), (3, 4), (4, 2), (2, 1)} (left); R = {(1, 3), (3, 1), (4, 2), (2, 4)} (right)
§11.3.4.3: Trichotomous
Formal Definition:
(∀x, y ∈ X)[ ((x, y) ∈ R ∧ (y, x) ∉ R ∧ x ≠ y)
  ∨ ((x, y) ∉ R ∧ (y, x) ∈ R ∧ x ≠ y)
  ∨ ((x, y) ∉ R ∧ (y, x) ∉ R ∧ x = y) ]
Informal Definition: For any x, y, one and only one of the statements xRy, yRx, and x = y may
hold.
Directed Graph Analogy: No node points to itself, and between any two distinct nodes there is exactly one arrow (one node points to the other, but never both ways).
Notes/Comments:
Visual Examples: R = {(1, 2), (1, 3), (1, 4), (3, 2), (4, 3), (4, 2)} (left); R = {(4, 2), (4, 1), (3, 4), (3, 2), (3, 1), (2, 1)} (right)
Visual Non-Examples: R = {(1, 2), (2, 4), (3, 4), (4, 3), (3, 2), (1, 4), (1, 3), (3, 1)} (left); R = {(2, 1), (1, 3), (3, 4), (4, 2), (
(right)
§11.3.4.4: Well-Founded
Directed Graph Analogy: Any subgraph has a node to which no arrow points.
Notes/Comments:
Example 1: For the set {1, 2, 3, 4, 5, 6}, we let (a, b) ∈ R iff a < b with respect to the usual ordering
on R. The graph below results. In this case the relevant node is the minimum of any subset taken.
Example 2: Any finite, directed, acyclic graph corresponds to a well-founded relation. For instance,
as below, consider R = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}.
Visual Non-Examples: R = {(1, 2), (2, 4), (4, 1)} (left) due to a cycle; R = {(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (1, 3), (1, 4)
(right) due to reflexivity
§11.3.5: Function-Like Properties
§11.3.5.1: Bijectivity
Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
Definition: A relation which is both injective and surjective. This necessitates that X, Y have equal
cardinality.
Directed Graph Analogy: For each node, there is exactly one arrow leaving and entering it.
Visual Examples: R = {(1, 2), (2, 4), (4, 3), (3, 1)} (left); R = {(1, 4), (4, 2), (2, 1), (3, 3)} (right)
Visual Non-Examples: R = {(1, 2), (1, 3), (1, 4)} (left); R = {(2, 1), (3, 4), (1, 3), (3, 1)} (right)
§11.3.5.2: Functional
Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
Formal Definition: (∀x ∈ X)(∀y, z ∈ Y )((x, y), (x, z) ∈ R =⇒ y = z)
Informal Definition: Any element can be related to at most one other element.
Directed Graph Analogy: For each node, there is at most one arrow leaving it.
Notes/Comments: This embodies the definition of a "partial function", in that a partial function maps anything to at most one other value. An ordinary or total function is one where each element in the domain maps to something, exactly one such "something." That additional property is known as left-totality.
Visual Examples: R = {(1, 2), (2, 3)} (left); R = {(3, 1), (1, 2), (2, 4), (4, 4)} (right)
Visual Non-Examples: R = {(1, 2), (1, 3), (1, 4)} (left); R = {(1, 2), (2, 1), (1, 1), (2, 2), (3, 3), (4, 4)}
(right)
§11.3.5.3: Injectivity
Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
Formal Definition: (∀x, z ∈ X)(∀y ∈ Y )((x, y), (z, y) ∈ R =⇒ x = z)
Informal Definition: If an element is related to something else, it can be related to at most one such
thing.
Directed Graph Analogy: Every node has at most one arrow pointing to it.
Visual Examples: R = {(3, 3), (3, 1), (1, 2), (2, 4)} (left); R = {(1, 2), (3, 4)} (right)
Visual Non-Examples: R = {(1, 4), (2, 4), (3, 4)} (left); R = {(1, 2), (1, 4), (3, 2), (3, 4)} (right)
§11.3.5.4: (Left-)Totality / Seriality
Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
Formal Definition: (∀x ∈ X)(∃y ∈ Y)((x, y) ∈ R)
Directed Graph Analogy: Every element points to something, i.e. every node has an arrow leaving
it.
Notes/Comments:
Visual Examples: R = {(1, 2), (2, 3), (3, 4), (4, 2)} (left); R = {(1, 2), (1, 3), (1, 4), (2, 2), (3, 3), (4, 4)}
(right)
Visual Non-Examples: R = {(1, 2), (2, 4), (4, 1)} (left); R = {(1, 2), (1, 3), (1, 4)} (right)
§11.3.5.5: Surjectivity
Note: Typically, we look at this as a relation between two sets, X and Y . Hence R ⊆ X × Y . For the
purposes of our digraph visual however, we’ll keep X = Y .
Formal Definition: (∀y ∈ Y )(∃x ∈ X)((x, y) ∈ R)
Directed Graph Analogy: Every node has at least one arrow pointing to it.
Visual Non-Examples: R = {(1, 4), (2, 4), (3, 4)} (left); R = {(1, 2), (1, 3), (3, 2)} (right)
§11.4: Combinations of Properties
§11.4.1: Dense Orders
◦ R is a poset
◦ (∀x, y ∈ A)(x < y =⇒ (∃z ∈ A)(x < z ∧ z < y))
§11.4.2: Dependencies
◦ R is reflexive
◦ R is symmetric
◦ R is finite
§11.4.3: Equivalence / Equivalence Relations
◦ Reflexivity
◦ Symmetry
◦ Transitivity
Notes/Comments:
◦ Hence any equivalence relation is a partial equivalence relation with reflexivity added, and a preorder with symmetry added.
◦ There is a one-to-one correspondence with the equivalence relations on a set, and the partitions
of the set.
◦ There is no nice closed form for the number of equivalence relations on a set of n elements. The closest one may get is
∑_{k=0}^{n} S(n, k)
where S(n, k) denotes the Stirling numbers of the second kind. You can find more details on the OEIS here.
Equivalence Classes:
◦ Recall that an equivalence class is the set of all elements that are related to each other. Hence,
an equivalence class of an element x can be visualized as all of the nodes y from which you can
walk from x to y and back again along the same path.
◦ Example 1: Our equivalence classes are {1, 2}, {3}, {4}.
◦ Example 2: Our equivalence classes are {1, 3}, {2, 4}.
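A Python sketch computing the classes of a given equivalence relation; the parity relation here is my own example, which happens to reproduce the classes of Example 2:

```python
def equivalence_classes(R, A):
    classes = []
    for x in sorted(A):
        cls = {y for y in A if (x, y) in R}   # the class [x]
        if cls not in classes:
            classes.append(cls)
    return classes

A = {1, 2, 3, 4}
R = {(x, y) for x in A for y in A if (x - y) % 2 == 0}  # same-parity relation
print(equivalence_classes(R, A))  # [{1, 3}, {2, 4}]
```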
§11.4.4: Partial Equivalence Relation
◦ Symmetry
◦ Transitivity
Notes/Comments:
◦ Any partial equivalence relation is right Euclidean and left Euclidean. (The converse need not be
true.) Consequently, they are also quasi-reflexive.
§11.4.5: Partial Orders / Posets
◦ R is reflexive
◦ R is antisymmetric
◦ R is transitive
Notes/Comments:
◦ There is no nice closed form for the number of posets on a set of n elements. You can find more
details on the OEIS here.
§11.4.6: Preorders
◦ Reflexivity
◦ Transitivity
Notes/Comments:
◦ There is no nice closed form for the number of preorders on a set of n elements. You can find
more details on the OEIS here.
§11.4.7: Prewellorders
◦ R is a strongly connected (total) preorder
◦ R is well-founded, in the sense that the relation S given by (x, y) ∈ S ⇐⇒ (x, y) ∈ R ∧ (y, x) ∉ R is well-founded.
§11.4.8: Pseudo-Orders
◦ R is asymmetric
◦ R is cotransitive
◦ If (x, y), (y, x) ̸∈ R then x = y
§11.4.9: Strict Partial Orders
◦ R is irreflexive
◦ R is antisymmetric
◦ R is transitive
Notes/Comments:
◦ Notice that strict partial orders differ from ordinary posets in the first axiom: usual posets have reflexivity, whereas strict ones have irreflexivity. You can think of strict partial orders as embodying the behavior of < whereas posets have ≤ (although without necessarily some sort of connectedness axiom like those orders satisfy on the reals).
§11.4.10: Strict Total Order
◦ R is irreflexive
◦ R is antisymmetric
◦ R is transitive
◦ R is strongly connected
Notes/Comments:
◦ Notice that strict total orders differ from ordinary total orders in the first axiom: ordinary total
orders have reflexivity, whereas strict ones have irreflexivity. You can think of strict total orders
embodying the behavior of < whereas ordinary total orders have ≤.
§11.4.11: Total Orders
◦ R is reflexive
◦ R is antisymmetric
◦ R is transitive
◦ R is strongly connected
Notes/Comments:
§11.4.12: Total Preorders
◦ R is reflexive
◦ R is transitive
◦ R is strongly connected
Notes/Comments:
◦ There is no nice closed form for the number of total preorders on a set of n elements. The closest one may get is
∑_{k=0}^{n} k! · S(n, k)
where S(n, k) denotes the Stirling numbers of the second kind. You can find more details on the OEIS here.
§11.4.13: Tournaments
◦ R is irreflexive
◦ R is antisymmetric
Notes/Comments:
§11.4.14: Well-order
◦ R is a total order
◦ Any nonempty subset of our original set has a least element w.r.t. this ordering. That is, if R is over A,
(∀ nonempty S ⊆ A)(∃m ∈ S)(∀s ∈ S)((m, s) ∈ R)
§11.5: Basic Operations & Derived Relations
Definition: The closure of a relation R (with respect to a certain property or collection thereof) is
the smallest relation R′ such that R′ satisfies that property and R ⊆ R′ .
For instance:
Existence & Minimality: For a property P and relation R, the P-closure of R need not always exist. For the cases of reflexivity, transitivity, and symmetry, it does, because such relations (interpreted as sets of pairs) are closed under arbitrary intersection. Hence we may codify "smallest" in the following sense: the P-closure R_P of a relation R ⊆ A × A is
R_P = ∩ { S | R ⊆ S ⊆ A × A and S satisfies P }
Notation: Sometimes, to borrow notation from topology, the P -closure may be denoted clP (R). For
instance, clref (R). A notation used for the reflexive closure is sometimes R= .
Visual: In terms of our graph visualization, you will want to begin with the given relation, and modify
the graph by adding arrows (adding new pairs to the relation) until you have a relation with the desired
property.
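A Python sketch of the three closures that always exist; the transitive closure just adds implied pairs until nothing new appears:

```python
def reflexive_closure(R, A):
    return R | {(a, a) for a in A}

def symmetric_closure(R):
    return R | {(b, a) for (a, b) in R}

def transitive_closure(R):
    closure = set(R)
    while True:
        implied = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if implied <= closure:
            return closure
        closure |= implied

print(transitive_closure({(1, 2), (2, 3)}))  # {(1, 2), (2, 3), (1, 3)}
```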
§11.5.2: Property Reduction
(Note: Some notation here may be nonstandard. I only saw this offhandedly referenced on Wikipedia with
regards to the reflexive reduction.)
Definition: For a property P, the P reduction of a relation R is denoted red_P(R). It is typically defined to be the largest relation contained in R satisfying the opposing property. For instance, for relations R over A:
The reflexive reduction red_ref(R) is the largest irreflexive relation contained in R, i.e. R − {(x, x) | x ∈ A}.
The transitive reduction red_tra(R) is, in this spirit, the largest antitransitive relation contained in R (though the more common meaning of "transitive reduction" is the smallest relation whose transitive closure equals that of R).
§11.5.3: Relation Composition
Definition: The composition of relations merits a slightly different visualization than we’ve used
throughout these posts.
Suppose we have two relations R, S. For full generality, I'll let R ⊆ A × B and S ⊆ B × C. Then the relation S ∘ R is defined by
S ∘ R := {(a, c) ∈ A × C | ∃b ∈ B such that (a, b) ∈ R and (b, c) ∈ S}
Properties:
◦ Composition associates: (R ◦ S) ◦ T = R ◦ (S ◦ T )
◦ Taking the converse gives (R ◦ S)T = S T ◦ RT
◦ If R, S are injective (surjective) relations, then R ◦ S is injective (surjective).
◦ If R ◦ S is injective (surjective), then we can only say for sure that S is injective (respectively, that R is surjective).
Digraph Visualization: To visualize this sort of composition and find it visually, it is best to write
the elements of A, B, C in three columns in that order.
Example 1: Define two relations on {1, · · ·, 5} as below:
R = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 5), (5, 5)}
S = {(1, 5), (2, 4), (4, 2), (5, 1)}
Then we have
S ◦ R = {(1, 5), (1, 4), (2, 4), (3, 2), (4, 2), (4, 1), (5, 1)}
The visual setup of the relation is below. Ensure that this visual, and how the composition is read off from it, makes sense.
Example 2: Define two relations on {1, · · ·, 5} as below:
R = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5)}
S = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (4, 4), (4, 5), (5, 5)}
S ◦ R = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5)}
Example 3: Finally, on {1, · · ·, 5}, define the relation
R = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)}
R ◦ R = {(1, 3), (2, 4), (3, 5), (4, 1), (5, 2)}
R ◦ R ◦ R = {(1, 4), (2, 5), (3, 1), (4, 2), (5, 3)}
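A Python sketch of this composition, checked against Example 3 above:

```python
def compose(S, R):
    """S o R: apply R first, then S."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

R = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)}
assert compose(R, R) == {(1, 3), (2, 4), (3, 5), (4, 1), (5, 2)}
assert compose(R, compose(R, R)) == {(1, 4), (2, 5), (3, 1), (4, 2), (5, 3)}
```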
§11.5.4: Transpose of Relation
RT = {(b, a) | (a, b) ∈ R}
Digraph Visual: Hence, in visualizing a relation R as a directed graph, the graph for RT amounts
to simply flipping around each arrow’s orientation.
Notation: A number of notations exist for the transpose/converse relation. Some include: Rᵀ, R^C, R⁻¹, R̆, R°, R^∨
§12: Items from Abstract Algebra
§12.1: Algebraic Structures
A brief table summarizing the group-like (one set, one operation) structures (from Wikipedia) and their
corresponding properties is below.
Some of the ring-like (one set, two operation) structures are summarized below, somewhat. ((R, +, ×) is
a semiring when (R, +) is a commutative monoid, (R, ×) is a monoid, and distribution is satisfied. The
below table has an error in that respect.)
Some quick links to other common structures of note, and their Wikipedia articles:
Modules
Vector spaces
Algebras (over fields)
Associative algebras & non-associative algebras
§12.1.2: Structural Graph for Group-Like Structures
§12.1.3: Structural Graph for Ring-Like Structures
We note that
(R, +, ×) is a semiring when (R, +) is a commutative monoid, (R, ×) is a monoid, and distribution
is satisfied.
(R, +, ×) is a near-ring if (R, +) is a group, (R, ×) is a semigroup, and distribution is satisfied. (Left-
and right-near-rings exist only satisfying one distribution law.)
(R, +, ×) is a rng if (R, +) is an abelian group and (R, ×) is a semigroup, with distribution satisfied.
Note that to make R a ring with identity (whereas a rng is a ring without identity) one needs to
introduce a multiplicative identity, making (R, ×) a monoid.
The graph is not as complete or proper as the previous, so unseen connections with appropriate axioms
or conditions may exist. In particular, this chain of inclusions represents some of the major structures quite
well, in lieu of the “addition of axioms” structure below:
{rngs} ⊇ {rings}
⊇ {commutative rings}
⊇ {integral domains}
⊇ {integrally closed domains}
⊇ {GCD domains}
⊇ {unique factorization domains (UFDs)}
⊇ {principal ideal domains (PIDs)}
⊇ {Euclidean domains}
⊇ {fields}
⊇ {algebraically closed fields}
Full resolution is available here, and some code for it here. The TikZ editor file for it is here; it does not totally reflect the image linked to or included here, differing only in aesthetic details.
§12.2: Isomorphism Theorems
Any suggestion as to numbering depends on the source, and hence should be taken carefully here.
We define the quotient group of G by a normal subgroup N as the collection of distinct cosets, G/N := {gN | g ∈ G}.
First Isomorphism Theorem: For f : G → H a group homomorphism:
◦ ker(f) ⊴ G
◦ im(f) ≤ H
◦ im(f) ≅ G/ker(f)
The isomorphism for the third item is φ : G/N → im(f) (with N := ker f) given by φ(xN) = f(x). Some details.
Second Isomorphism Theorem (Diamond/Parallelogram Theorem): For G a group, with
S ≤ G and N ⊴ G, we have
◦ SN ≤ G, where SN := {sn | s ∈ S, n ∈ N }
◦ S∩N ⊴S
◦ SN/N ≅ S/(S ∩ N)
The isomorphism in the third is given by φ : S → SN/N with φ(s) = sN . Some details.
One need not have N normal, provided S is a subgroup of N ’s normalizer.
A proof here.
Fourth Isomorphism Theorem (Lattice Theorem/Correspondence Theorem): Take G a
group and N ⊴ G. Let
Isomorphism Theorems for Rings:
We let ≤ denote the subring relation here.
Isomorphism Theorems for Modules & Vector Spaces:
We let ≤ denote the submodule/vector subspace relation here.
Note that modules over a field are, in fact, vector spaces.
For finite-dimensional vector spaces, these all follow by rank-nullity.
Throughout, “module” here always refers to an R-module, for R any fixed ring.
§12.3: Catalogue of Important Groups & Group Structures
Specific Examples:
Some Basics: R, C, Q, A under +, or their nonzero elements under multiplication. The same is true
of the finite fields Z/pZ ≡ Fp .
Dihedral Groups: D2n is the set of rigid symmetries of a regular n-gon in the plane, generated by
rotations r by 2π/n radians in the counterclockwise direction about the origin, and reflections s about
a line through the origin and a fixed “first” vertex. (The line does not change after operations.) It has
presentation
D_2n = ⟨ r, s | rⁿ = s² = 1, rs = sr⁻¹ ⟩
Quaternion Group: Q8 is the set {±1, ±i, ±j, ±k}, which follows the rules you know for quaternion
multiplication in the larger set H (that set being sometimes called the Hamiltonians).
Klein 4-Group: (Denoted V, V4 , K4 .) The smallest non-cyclic group, and the only non-cyclic group
of order 4. It possesses the multiplication table
and has presentation V₄ = ⟨ a, b | a² = b² = (ab)² = e ⟩.
More is available on Wikipedia & Groupprops.
Automorphism Groups:
§12.3.2: Important Classes/Classifications of Groups
Cyclic Groups: Cyclic groups are those generated by a single element. We may say that the cyclic group of order n is
Zₙ ≡ Cₙ := ⟨x | xⁿ = 1⟩
or it may be infinite.
Quotient Groups: G/H is read “G modulo H” or “G mod H”
Equivalently, G/ ker φ is {g ker(φ)}g∈G , the set of left cosets of ker φ in G with the group operation
g ker φ ◦ h ker φ := (gh) ker φ
Simple Groups: A group G is simple if |G| > 1 and its only normal subgroups are ⟨1⟩ and G itself.
(Specifically, we say it has exactly two normal subgroups, so ⟨1⟩ is not simple.)
Composition Series: We say, for a group G, that a sequence of subgroups
1 = N₀ ⊴ N₁ ⊴ · · · ⊴ N_k = G
is a composition series if each quotient N_{i+1}/N_i is simple. If there is such a chain in which each quotient N_{i+1}/N_i is abelian, then G is solvable.
§12.3.3: Important Subgroups/Substructures
Important Results:
Lagrange's Theorem: If G is a finite group and H ≤ G, then |H| divides |G|, and the number of left cosets of H in G is |G|/|H|.
We denote the index by |G : H| and let it be the number of these cosets, even in the infinite case. Observe that |G/N| ≡ |G|/|N| in the finite case.
◦ Corollary: The order of each individual element divides that of G (take H := ⟨x⟩), i.e. |x| divides |G|.
Partial Converse: Cauchy’s Theorem: For G a finite group and prime p | |G|, then ∃x ∈ G with
|x| = p. (M&I, Thm. 3.2.11)
Stronger Partial Converse: Sylow's Theorem: If |G| = p^α m, for α ∈ Z≥0, p prime, and p ∤ m, then G has a subgroup of order p^α.
Important Substructures:
Centralizer: The centralizer of A ⊆ G is
C_G(A) := {g ∈ G | gag⁻¹ = a ∀a ∈ A} = {g ∈ G | ga = ag ∀a ∈ A}
the set of all elements of G that commute with every element of A. We may write C_G(a) := C_G({a}). We have C_G(A) ≤ G.
Center: Z(G) := {g ∈ G | gx = xg ∀x ∈ G} ≡ C_G(G)
Normalizer: The normalizer of A in G is
N_G(A) := {g ∈ G | gAg⁻¹ = A}
Note that C_G(A) ≤ N_G(A) ≤ G. The normalizer is not the same as C_G(A); it is looser in that elements within A need not be fixed by conjugation, just shuffled around within A.
Kernel of Homomorphism: For a homomorphism φ : G → H, ker(φ) := {g ∈ G | φ(g) = 1_H}; we have ker(φ) ⊴ G.
Commutators: The commutator of x, y ∈ G is [x, y] := x⁻¹y⁻¹xy. The commutator (first derived) subgroup is denoted by [G, G], G′, or G⁽¹⁾, and defined by [G, G] := ⟨ [x, y] | x, y ∈ G ⟩.
Normal Subgroups: N ≤ G is normal in G (written N ⊴ G) under any of the following equivalent conditions:
◦ ∀g ∈ G, we have gNg⁻¹ = N
◦ ∀g ∈ G, we have gNg⁻¹ ⊆ N
◦ ∀g ∈ G, we have N ⊆ gNg⁻¹
◦ ∀g ∈ G, ∀n ∈ N we have gng⁻¹ ∈ N
◦ ∀g ∈ G, we have gN = Ng
◦ A left coset is always a right coset and vice versa
◦ ∃φ : G → H a homomorphism with ker(φ) = N
§12.3.4: Items Tied to Group Actions
(Group Action) Stabilizer: For a group G acting on a set S, the stabilizer of s ∈ S is
G_s := {g ∈ G | g · s = s}
We have G_s ≤ G.
(Group Action) Kernel: The kernel of the aforementioned group action is
{g ∈ G | ∀s ∈ S, g · s = s}
◦ We say the action is transitive if there is only one distinct orbit (hence, ∀a, b ∈ A we may find g ∈ G where a = g · b).
◦ Suppose G is a transitive permutation group on A. A block is a nonempty B ⊆ A such that
∀σ ∈ G we either have
σ(B) := {σ(b)}b∈B = B or σ(B) ∩ B = ∅
◦ A transitive group G on A is said to be primitive if all blocks are trivial: size 1, or A itself.
◦ A transitive permutation group G on A is doubly transitive if ∀a ∈ A, Ga is transitive on
A − {a}.
◦ Consider the group action of conjugation (with A = G): g · a := gag⁻¹ for a, g ∈ G. The orbits O_a := {gag⁻¹}_{g∈G} of this action are the conjugacy classes of G.
§12.4: Catalogue of Important Rings & Ring Structures
Ideals: We may use I ⊴ R to say “I is an ideal of R”, analogous to the notation for normal sub-
groups. This notation has appeared in P.M. Cohn’s Introduction to Ring Theory (2000 edition); cf.
this MSE post.
Center of a Ring: Dummit & Foote do not use an explicit notation for the center of a ring. We will
use Z(R) to denote the center of the ring R, analogous to the notation for centers of groups. Another
notation is C(R), which nicely parallels the centralizer notation CR (R), since Z(R) = CR (R).
Nonzero Elements vs. Units: The notation R× has become commonplace, especially for fields,
to denote the invertible elements of a ring. In fields, “nonzero elements” and “invertible elements”
coincide, but not so much for rings. To avoid confusion, we will use R̸=0 to denote the nonzero
elements of a ring, and R× for the units/invertible elements.
§12.4.2: Definitions of Ring-Like Structures
A Certain Hierarchy:
While not perfect, we have the following chain of inclusions:
{rngs} ⊇ {rings}
⊇ {commutative rings}
⊇ {integral domains}
⊇ {integrally closed domains}
⊇ {GCD domains}
⊇ {unique factorization domains (UFDs)}
⊇ {principal ideal domains (PIDs)}
⊇ {Euclidean domains}
⊇ {fields}
⊇ {algebraically closed fields}
Semiring: Prior to even the rng, we define a semiring R (with two operations, +, ·) as satisfying
(i) (R, +) is a commutative monoid (so + is closed, commutes, associates, has identity)
(ii) (R, ·) is a monoid (so · is closed, associates, has identity)
(iii) Distribution on both sides is satisfied: (a + b) · r = a · r + b · r, and r · (a + b) = r · a + r · b
Rng / Ring Without Unit: A rng (also: ring without unity , nonunital ring , ring without
1 , etc.) is a set R paired with operations + (addition) and · (multiplication) such that
(i) (R, +) is an abelian group (so + is closed, commutes, associates, has identity, has inverses)
(ii) (R, ·) is a semigroup (so · is closed & associates)
(iii) Distribution on both sides is satisfied: (a + b) · r = a · r + b · r, and r · (a + b) = r · a + r · b
Some authors consider a ring R to by default have identity (a unital ring); others do not and choose
to make a distinction (and hence a ring may not have 1). Be careful.
Commutative Rng: A rng in which · commutes is a commutative rng. Hence, (R, ·) is a commutative semigroup.
Ring / Unital Ring: A ring (also: ring with identity, unital ring, ring with 1, etc.) is a rng R such that · has a multiplicative identity as well. This ensures that (R, ·) is a monoid.
One may note that the condition that (R, +) is abelian is superfluous in unital rings, since the remaining axioms force commutativity under +. We see this by expanding (1 + 1)(a + b) with each distribution law:
(1 + 1)(a + b) = (a + b) + (a + b) = a + b + a + b   and   (1 + 1)(a + b) = (1 + 1)a + (1 + 1)b = a + a + b + b
One cancels the leftmost and rightmost terms to get b + a = a + b, i.e. commutativity.
Commutative Ring: A ring in which · commutes is a commutative ring. Hence, (R, ·) is a commutative monoid.
Division Ring: Given a unital ring R (not necessarily commutative), if 0 ̸= 1 (so as to be nontrivial)
and each element in R̸=0 is invertible (R× = R̸=0 ), we say R is a division ring . Hence:
(i) R is nontrivial (0 ̸= 1)
(ii) (R, +) is an abelian group
(iii) (R̸=0 , ·) is a group
(iv) Distribution on both sides is satisfied
Some say a division ring which specifically is not commutative is a skew field .
Equivalent conditions to be a division ring:
◦ R is a division ring when its only ideals are 0 and R (these may be taken to be just one-sided ideals)
Field: A commutative division ring is a field. Hence:
(i) R is nontrivial (0 ≠ 1)
(ii) (R, +) is an abelian group
(iii) (R_{≠0}, ·) is an abelian group
(iv) Distribution on both sides is satisfied
Other Classes:
◦ Integral Domain: A nontrivial commutative unital ring R is an integral domain when it has no zero divisors, i.e.
ab = 0 =⇒ a = 0 or b = 0 (or both)
◦ GCD Domain: If an integral domain R has a greatest common divisor for each pair of its
elements, it is a GCD domain.
Specifically: given each a, b ∈ R with at least one of a, b ̸= 0, let d ∈ R have the properties
(i) d | a (i.e. ∃r ∈ R where a = dr)
(ii) d | b
(iii) δ | a and δ | b implies δ | d
Then d is a greatest common divisor (gcd) of a, b, denoted d = gcd(a, b). (Note this d need not
be unique.)
◦ Unique Factorization Domain (UFD): A unique factorization domain is an integral
domain R in which each r ∈ R̸=0 has a factorization in terms of irreducibles pi and a unit u. That
is,
r = u p1 · · · pn for some n ≥ 0
Moreover, this factorization must be unique. That is, if for some other irreducibles qi and unit w, we have
r = u Π_i pi = w Π_i qi
then there are equally many qi and pi , and there is a bijection σ : {1, · · ·, n} → {1, · · ·, n} sending
pi ↦ qσ(i) (up to associates).
It is often convenient to use the equivalent definition that each r ∈ R̸=0 may be written as a
product of a unit and some primes in R.
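A standard non-example: Z[√−5] is an integral domain that is not a UFD, since 6 = 2 · 3 = (1 + √−5)(1 − √−5) gives two factorizations into irreducibles that are not associates.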
◦ Principal Ideal Domains (PIDs): An integral domain R is said to be a principal ideal
domain if each ideal is principal, i.e. generated by a single element.
◦ Bezout Domain: For R an integral domain, we say R is a Bezout domain when each ideal
generated by two elements is principal, i.e. ∀a, b ∈ R, (a, b) = (c) for some c ∈ R.
◦ Euclidean Domain: If an integral domain R possesses a division algorithm, it is a Euclidean
domain.
This requires that ∃N : R → Z≥0 (a norm, with N (0) = 0) such that, for any a, b ∈ R with b ̸= 0, ∃q, r ∈ R with
a = qb + r where r = 0 or N (r) < N (b)
Iterating (the Euclidean algorithm ): a = q0 b + r0 , b = q1 r0 + r1 , r0 = q2 r1 + r2 , etc.
Here, rn is the last nonzero remainder; since N (b) > N (r0 ) > N (r1 ) > · · · > N (rn ), the process
must terminate eventually. Moreover, rn = gcd(a, b).
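For instance, in Z with N (r) = |r|: taking a = 42 and b = 30, one gets 42 = 1 · 30 + 12, then 30 = 2 · 12 + 6, then 12 = 2 · 6 + 0, so gcd(42, 30) = 6, the last nonzero remainder.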
Boolean Ring: A rng R is said to be a Boolean ring if r2 = r for all r ∈ R. (These are necessarily
commutative as well.)
Quotient Ring: Let φ : R → S have ker φ = I. Then the fibers of φ are the additive cosets of the
kernel, so we define
R/I := {r + I}r∈R , (r + I) + (s + I) := (r + s) + I, (r + I) · (s + I) := (r · s) + I
These define a ring, and we call R/I the quotient ring . Note this is fundamentally just the quotient
of the additive groups, with I ⊴ R.
We can equivalently define the quotient R/I for any ideal I, since I is an ideal iff it is a kernel.
Simple Rings: A ring is simple if its only ideals are 0 and itself.
§12.4.3: Types of Elements in a Ring
The Identities: 0 or 0R often denotes the additive identity, and 1R or 1 denotes the multiplicative
identity, if present.
Inverses: In a unital ring R, x−1 is the (multiplicative) inverse of x when xx−1 = x−1 x = 1.
Units & The Unit: Confusingly, it is common to say r ∈ R is a unit if it is invertible under
multiplication, and we call 1 ∈ R in a unital ring the unit.
The collection of all units of R is denoted R× .
Zero Divisors: r ∈ R̸=0 is said to be a zero divisor if ∃s ∈ R̸=0 such that rs = 0 or sr = 0.
In integral domains, primes are irreducible; in PIDs, primes and irreducibles coincide.
Prime Elements: We say p ∈ R (a commutative ring) is prime when
(i) p ̸= 0
(ii) p is a non-unit (so, combined, p ∈ R̸=0 − R× )
(iii) p | ab =⇒ p | a or p | b
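For instance, in Z[√−5], 2 is irreducible (no element has norm 2 under N (a + b√−5) = a² + 5b²) but not prime: 2 | (1 + √−5)(1 − √−5) = 6, yet 2 divides neither factor.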
Least Common Multiple: In a commutative unital ring R with a, b ∈ R̸=0 , a least common
multiple of a, b is an ℓ ∈ R such that
(i) a | ℓ
(ii) b | ℓ
(iii) If a | λ and b | λ, then ℓ | λ
Universal Side Divisor: In an integral domain R, let R̃ := R× ∪ {0}. A u ∈ R − R̃ is a universal side divisor if ∀x ∈ R, ∃z ∈ R̃ with u | x − z.
Thus there is a sort of “division algorithm” for each u: any x may be written in the form
x = qu + z with z zero or a unit
§12.4.4: Specific Examples of Rings
Basic Examples:
The prototypical example is Z under the usual addition and multiplication, a commutative ring.
The collection of functions f : X → A, for A any ring and X ̸= ∅, is a ring under pointwise operations; it commutes iff A does. Special subclasses exist, such as C([0, 1], R), a commutative ring. Similarly, Cc (R, R) (compactly supported continuous functions) is a commutative non-unital ring.
The Trivial Ring / Zero Ring: The singleton {0}. The only definable operations, 0 + 0 := 0 and 0 · 0 := 0,
make it satisfy all of the axioms of a commutative ring. Typically definitions of division rings and fields
exclude it from being either, however, by forcing 0 ̸= 1. (Notice that the additive & multiplicative
identities in the trivial ring are both zero.)
(Some define the trivial rings to be an entire class: for any abelian group, define multiplication by
g · h := 0.)
§12.4.5: Important Classes of Rings
Polynomial/Series Rings:
Polynomials: Given a ring R, we define the ring of polynomials (in the formal variable x,
with coefficients in R) to be
R[x] := { Σ_{n=0}^{N} an x^n | N ∈ N and an ∈ R for each n }
These are meant as formal sums. Hence, they are manipulated in the ways we expect, but we do not
make concerns about convergence in the infinite case.
Formal Power Series: Given a ring R, we define the ring of power series (formal variable x,
coefficients in R) to be
R[[x]] := { Σ_{n=0}^{∞} an x^n | an ∈ R for each n }
Formal Laurent Series: Given a commutative unital ring R (or, often as in Dummit & Foote, a
field), we define the ring of formal Laurent series to be
R((x)) := { Σ_{n=N}^{∞} an x^n | N ∈ Z and an ∈ R for each n }
Note that this means p ∈ R((x)) may be written as a sum p = ps + pp of a ps ∈ R[[x]] and a pp ∈ R[1/x].
Matrix Rings:
Given a ring R and n ∈ Z≥1 , we have the ring
Mn (R) = R^{n×n} = Mn×n (R) = { (ri,j )_{i,j=1}^{n} | ri,j ∈ R } = {all n × n matrices over R}
Note that Mn (R) need not be commutative (even in the extreme case of R a field).
We say A ∈ Mn (R) is a scalar matrix if all diagonal entries are the same constant, and all other entries
are 0. These matrices are isomorphic to R.
We have the subclasses (though often for more special R, namely a field F, and most commonly R or C):
Orthogonal Group: On (F) := {M ∈ GLn (F) | M ⊤ M = M M ⊤ = I}. This is usually used for F ⊆ R. Note that if M ∈ On (F) then det(M ) = ±1.
Unitary Group: The complex case of the previous, Un (F), uses unitary matrices: those with M ∗ M = M M ∗ = I (∗ the conjugate transpose).
Special Unitary Group: This likewise restricts Un (F) to those of unit determinant: SUn (F) := {M ∈ Un (F) | det(M ) = 1}.
Other Types:
Group Rings: Let R be a nontrivial unital commutative ring, and G := {gi }_{i=1}^{n} a finite multiplicative
group. We can define the group ring RG of G with coefficients in R as all formal sums as so:
RG := { a1 g1 + a2 g2 + · · · + an gn | ai ∈ R }
Multiplication comes by extending
(a gi ) · (b gj ) := (ab)(gi gj )
distributively, and hence
( Σ_{i=1}^{n} ai gi ) · ( Σ_{j=1}^{n} bj gj ) := Σ_{k=1}^{n} ( Σ_{gi gj = gk} ai bj ) gk
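For instance, in ZG with G = {1, g} cyclic of order 2: (1 + g)(1 − g) = 1 − g² = 0, so group rings of nontrivial finite groups have zero divisors.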
Rings of Integers: If r ∈ R (for R a field containing Q, say) satisfies some monic polynomial p ∈ Z[x],
then r is said to be an algebraic integer , and its order is said to be the minimum degree of such
polynomials p that r satisfies. One then defines the ring of (algebraic) integers from R by
OR := {r ∈ R | r is an algebraic integer}
One typically defines this for R an algebraic number field , a field such that R/Q has finite degree.
§12.4.6: Important Ring Substructures
Miscellaneous Structures:
Subrings: S ⊆ R a ring is said to be a subring (denoted S ≤ R) if it is a ring in its own right. This
gives rise to the subring test : it suffices to verify the following:
(i) S ̸= ∅
(ii) S ⊆ R
(iii) For any x, y ∈ S, we have x − y, x · y ∈ S
Centralizer: Given S ⊆ R, the centralizer of S in R is
CR (S) := {r ∈ R | ∀s ∈ S, sr = rs}
i.e. the set of elements in R commuting with each element of S (under multiplication).
Center: The center of R is Z(R) := CR (R) = {r ∈ R | ∀s ∈ R, sr = rs},
i.e. the set of all elements in R commuting with every other element (under multiplication).
Kernel: Given a homomorphism of rings φ : R → S, we have the kernel of φ as in the homomorphism
of additive groups φ : (R, +R ) → (S, +S ):
ker φ := {r ∈ R | φ(r) = 0S }
Valuation Rings: For K a field (more generally, an integral domain), a discrete valuation on K
is a function ν : K × → Z such that
(i) ν(ab) = ν(a) + ν(b) (a homomorphism (K × , ·) → (Z, +))
(ii) ν is surjective
(iii) ν(x + y) ≥ min{ν(x), ν(y)} for any x, y ∈ K × where x + y ̸= 0
We then say that {x ∈ K × | ν(x) ≥ 0} ∪ {0} is the valuation ring of ν.
Ideals & Related Structures:
Given a ring R and I ≤ R: I is a left ideal if rI ⊆ I for all r ∈ R, a right ideal if Ir ⊆ I for all r ∈ R, and a (two-sided) ideal (denoted I ⊴ R) if both hold.
Radical of Ideal: For I ⊴ R commutative, we let
rad(I) = √I := {r ∈ R | r^n ∈ I for some n ∈ Z≥1 }
§12.4.7: Functions of Rings
Evaluation Map: For A a ring, X ̸= ∅, and R the set of all functions f : X → A, the evaluation
at c map is given by
Ec : R → A with Ec (f ) := f (c)
This is a ring homomorphism with R/ ker Ec ∼= A.
Discrete Valuations: For K a field (more generally, an integral domain), a discrete valuation on
K is a function ν : K × → Z satisfying conditions (i)–(iii) as given in §12.4.6.
Characteristic of a Ring: Given a ring R, its characteristic char(R) is the minimum n such that
n1 = 0, i.e.
char(R) := min{ n ∈ Z≥1 | 1 + 1 + · · · + 1 (n times) = 0 } = min{ n ∈ Z≥1 | Σ_{i=1}^{n} 1 = 0 }
(with char(R) := 0 if no such n exists)
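E.g. char(Z/nZ) = n, char(Fp ) = p, and char(Z) = char(Q) = char(R) = 0.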
§12.5: (Dummit & Foote, Chapter 1 ) Group Theory: Basics (Groups, Actions,
Morphisms)
Fundamental Definitions:
Group: A group is a set G paired with a binary operation ∗ : G × G → G (so ∗ is closed) such that
◦ ∗ is associative
◦ ∃e ∈ G (often labeled 1), the identity of G, such that ge = eg = g ∀g ∈ G
◦ ∀g ∈ G, ∃g −1 ∈ G, the inverse of g, such that gg −1 = g −1 g = e
Direct Product: Given groups (G, ∗), (H, ◦), their product is the set G × H equipped with a binary
operation ⊙ defined by
(a, b) ⊙ (c, d) := (a ∗ c, b ◦ d)
Basic Results:
Let G be a group.
a1 a2 · · ·an may be bracketed in any way with no ambiguity (D&F, Prop. 1.1.1)
au = av =⇒ u = v and ub = vb =⇒ u = v, for a, b, u, v ∈ G (D&F, Prop. 1.1.2)
The solutions x, y to ax = b and ya = b are unique (D&F, Prop. 1.1.2)
|x| = 1 ⇐⇒ x is the identity
|xy| = |yx|
Important Examples:
Dihedral Groups, D2n : Representative of the symmetries on a regular n-gon, rotations and reflec-
tions. May be represented by
D2n := ⟨ r, s | r^n = s^2 = 1, rs = sr^{−1} ⟩
Here, r is a rotation counterclockwise by 2π/n radians, and s is a reflection about a line: this line
is that through the shape’s center, and a fixed vertex. (This line does not change after performing
actions on the shape.)
Symmetric Groups, Sn : SΩ is the set of bijections f : Ω → Ω, with operation of function composi-
tion. Sn is the case for Ω = {1, 2, · · ·, n}, with |Sn | = n!. We often write σ ∈ Sn with a cycle notation,
e.g.
σ = (1 3 2 4) ⇐⇒ σ(1) = 3, σ(3) = 2, σ(2) = 4, σ(4) = 1
with cycles of length 1 omitted (those numbers are fixed). Any σ ∈ Sn is a product (composition) of
disjoint such cycles (each are their own elements of Sn after all).
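For instance, composing right-to-left (as with function composition): (1 2)(2 3) sends 1 ↦ 1 ↦ 2, 2 ↦ 3 ↦ 3, and 3 ↦ 2 ↦ 1, so (1 2)(2 3) = (1 2 3).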
Some further notes for now:
General Linear Groups: A field is a set F with addition + and multiplication · as binary operations,
such that (F, +) and (F \{0}, ·) are abelian groups, satisfying the obvious distribution law. (We let
F × := F \{0}.)
A general linear group may be defined by
GLn (F) := {A ∈ Mn (F) | det(A) ̸= 0} = {invertible n × n matrices over F}
Quaternion Group, Q8 : The set {1, −1, i, −i, j, −j, k, −k} with relations
1a = a, (−1)^2 = 1, i^2 = j^2 = k^2 = −1, ij = k, jk = i, ki = j, kj = −i, ji = −k, ik = −j
Homomorphisms & Isomorphisms:
Homomorphism: A structure-preserving map φ : G → H of groups (G, ∗), (H, ◦), i.e. one with φ(a ∗ b) = φ(a) ◦ φ(b) for all a, b ∈ G. A bijective homomorphism is an isomorphism , written G ∼= H.
If G ∼= H, then:
◦ |G| = |H|
◦ G is abelian iff H is
◦ |x| = |φ(x)| for each x ∈ G
◦ If G ∼= H then, for each fixed order, they each have equally many elements of that order
◦ If G ∼= H, then they may be represented the same way (in the sense of ⟨· · · | · · ·⟩), with the same
generators and relations (up to the naming thereof).
Group Actions:
The group action of a group G on a set A is a mapping · : G × A → A such that
(i) g1 · (g2 · a) = (g1 g2 ) · a for all g1 , g2 ∈ G and a ∈ A
(ii) 1 · a = a for all a ∈ A
(When there is no danger of confusion, · may be replaced with concatenation.) A key example is matrix-
vector multiplication.
Some definitions and notes:
The kernel of the group action is {g ∈ G | g · a = a ∀a ∈ A}, those for whom the induced permutation
is the identity permutation.
§12.6: (Dummit & Foote, Chapter 2 ) Group Theory: Subgroups
Introduction:
For G a group and H ⊆ G, we say H (under the same operation) is a subgroup of G if x, y ∈ H =⇒ xy, x−1 , y −1 ∈ H.
(Note: this implies that 1 ∈ H, H ̸= ∅, etc., that usually are redundantly stated.) We write H ≤ G.
Some notes:
Special Subgroups:
Centralizer: Given A ⊆ G, the centralizer of A in G is
CG (A) := { g ∈ G | gag −1 = a ∀a ∈ A } = {g ∈ G | ga = ag ∀a ∈ A}
the set of all elements of G that commute with any element of A. We may write CG (a) := CG ({a}).
We have CG (A) ≤ G.
Center: Given a group G, its center is
Z(G) := {g ∈ G | gx = xg ∀x ∈ G} ≡ CG (G)
Normalizer: The normalizer of A in G is
NG (A) := { g ∈ G | gAg −1 = A }
Note that CG (A) ≤ NG (A) ≤ G. This is not the same as CG (A); it is looser in that elements within
A need not be fixed by conjugation, just shuffled around within A.
Stabilizer: Suppose G is acting on S a set and s ∈ S. Then the stabilizer of s in G is
Gs := {g ∈ G | g · s = s}
We have Gs ≤ G.
(Group Action) Kernel: The kernel of the aforementioned group action is
{g ∈ G | ∀s ∈ S, g · s = s}
and is a subgroup of G.
Some notes:
§12.7: (Dummit & Foote, Chapter 3 ) Group Theory: Quotients; Homomor-
phisms
Basic Definitions:
Here, G, H are groups unless stated otherwise.
Let φ : G → H be a homomorphism and have K := ker φ. The quotient (factor) group G/K is
defined by {gK}g∈G where gK := {gk}k∈K . One may in general define this for any K ⊴ G since such
K are the kernels of some homomorphism (cf. Theorem 3.1.3). The elements of G/K are called cosets
(aK is a left coset and Ka a right coset).
Composition Series: A chain of subgroups
⟨1⟩ = N0 ⊴ N1 ⊴ · · · ⊴ Nk = G
is a composition series when Ni+1 /Ni is simple for each i. These quotients are called the composi-
tion factors of G. By Jordan–Hölder, any two composition series of a nontrivial finite G have the
same length and the same composition factors, up to isomorphism and reordering.
A group G is said to be solvable if the composition factors are all abelian.
Basic Results:
For N ≤ G, the following are equivalent (each characterizing normality, N ⊴ G):
◦ N ⊴G
◦ NG (N ) = G
◦ gN = N g ∀g ∈ G
◦ The operation on left cosets of N given by aN ∗ bN := (ab)N , with a, b ∈ G, makes the left cosets
into a group.
◦ gN g −1 ⊆ N for any g ∈ G (equality is not needed)
◦ ∃φ : G → H a homomorphism with N ≡ ker φ (D&F, Prop. 3.1.7)
|G : H ∩ K| = |G : H| · |G : K| when the (finite) indices |G : H|, |G : K| are coprime; in general only ≤ holds
xN, yN in G/N commute iff their commutator [x, y] := x−1 y −1 xy ∈ N (D&F, Prob. 3.1.40)
The commutator subgroup N := ⟨ [x, y] ⟩x,y∈G (generated by all commutators) has N ⊴ G and G/N abelian. (D&F, Prob. 3.1.41)
For H, K ≤ G (H, K finite), with HK := {hk}h∈H,k∈K , then (D&F, Thm. 3.2.13)
|HK| = |H| |K| / |H ∩ K|
and HK ≤ G iff HK = KH. (This is not the condition to be abelian.) (D&F, Prop. 3.2.14)
As a corollary: H, K ≤ G with H ≤ NG (K) gives HK ≤ G. In particular, K ⊴ G and H ≤ G gives
HK ≤ G. (D&F, Cor. 3.2.15)
If H ≤ G and g ∈ G, then gHg −1 ≤ G and |H| = |gHg −1 |. (D&F, Prob. 3.2.5a)
If H ≤ G is the only subgroup of a given (finite?) order in G, then H ⊴ G. (D&F, Prob. 3.2.5b)
For H ≤ G and g, g ′ ∈ G, if Hg = g ′ H, then Hg = gH and g ∈ NG (H). (D&F, Prob. 3.2.6)
If H, K ≤ G with H, K finite of coprime orders, then H ∩ K = ⟨1⟩. (D&F, Prob. 3.2.8)
If G is an abelian simple group, G ∼= Zp for p prime. (D&F, Prob. 3.4.1)
If n | |G|, a finite abelian group G has a subgroup of order n. (D&F, Prob. 3.4.4)
If G is solvable and H ≤ G and N ⊴ G, then H and G/N are solvable. (D&F, Prob. 3.4.5)
A finite group G is solvable iff, whenever n | |G| with n and |G|/n coprime, G has a subgroup of order n.
For G finite, the following are equivalent:
◦ G is solvable
◦ The composition factors of G have prime order.
◦ G has a chain of subgroups
⟨1⟩ = H0 ⊴ H1 ⊴ · · · ⊴ Hs = G
with Hi+1 /Hi cyclic. (Difference from composition series is the cyclic condition.)
◦ G has a chain of subgroups
⟨1⟩ = N0 ⊴ N1 ⊴ · · · ⊴ Nt = G
with Ni ⊴ G for all i, and Ni+1 /Ni abelian for all i. (Difference from composition series lies with
Ni ⊴ G and the abelian (not simple) condition.)
For H ⊴ G with H nontrivial and G solvable, ∃A ≤ H nontrivial with A ⊴ G and A abelian. (D&F,
Prob. 3.4.11)
More Important Results:
Lagrange’s Theorem: For H ≤ G a finite group, |H| divides |G|, and the number of left (or right)
cosets of H in G equals |G|/|H| =: |G : H|.
Partial Converse: Cauchy’s Theorem: For G a finite group and prime p | |G|, then ∃x ∈ G with
|x| = p. (D&F, Thm. 3.2.11)
Stronger Partial Converse: Sylow’s Theorem: If |G| = pα m, for α ∈ Z≥0 , p prime, and p ∤ m, then
G has a subgroup of order pα .
First Isomorphism Theorem: For φ : G → H a homomorphism: (D&F, Thm. 3.3.1)
◦ ker φ ⊴ G
◦ G/ ker φ ∼= im φ
◦ Corollary: φ is injective iff ker φ = ⟨1⟩
◦ Corollary: |G : ker φ| = |im φ|
Second (Diamond) Isomorphism Theorem: For A, B ≤ G with A ≤ NG (B):
◦ AB ≤ G
◦ B ⊴ AB
◦ A∩B ⊴A
◦ AB/B ∼= A/(A ∩ B)
Third Isomorphism Theorem: For H, K ⊴ G with H ≤ K:
◦ K/H ⊴ G/H
◦ (G/H) / (K/H) ∼= G/K
Fourth (Lattice) Isomorphism Theorem: For N ⊴ G, the map A ↦ A/N is a bijection between subgroups A with N ≤ A ≤ G and subgroups of G/N ; for such A, B:
◦ A ⊆ B then |B : A| = |B/N : A/N | (|B : A| is the number of cosets bA of A in B)
◦ ⟨A, B⟩/N = ⟨A/N, B/N ⟩ (⟨A, B⟩ the subgroup of G generated by A ∪ B)
◦ (A ∩ B)/N = (A/N ) ∩ (B/N )
◦ A ⊴ G ⇐⇒ A/N ⊴ G/N
§12.8: (Dummit & Foote, Chapter 4 ) Group Theory: More on Actions
Basic Definitions:
We assume G is a group acting on a nonempty set A if not stated otherwise.
Each g ∈ G induces a map σg : A → A, a ↦ g · a; each σg is a bijection, so σg ∈ SA . The associated permutation representation is the homomorphism
φ : G → SA , g ↦ σg
◦ We say the action is transitive if there is only one distinct orbit (hence, ∀a, b ∈ A we may find
g ∈ G where a = g · b).
◦ Suppose G is a transitive permutation group on A. A block is a nonempty B ⊆ A such that
∀σ ∈ G we either have
σ(B) := {σ(b)}b∈B = B or σ(B) ∩ B = ∅
◦ A transitive group G on A is said to be primitive if all blocks are trivial: size 1, or A itself.
◦ A transitive permutation group G on A is doubly transitive if ∀a ∈ A, Ga is transitive on
A − {a}.
Conjugation: G acts on itself (A = G) by conjugation:
g · a := gag −1 (a, g ∈ G)
The orbits Oa := {gag −1 }g∈G of this action are the conjugacy classes of G.
We say H ≤ G is characteristic in G (H char G) if σ(H) = H for all σ ∈ Aut(G). (Each automor-
phism of G sends H to itself, though not necessarily fixing its elements pointwise.)
Sylow Theorem / p-group Stuff: G is a group and p is prime. A p-group is a group whose order is a power of p; for |G| = pα m with p ∤ m, a Sylow p-subgroup is a subgroup of order pα , and np denotes the number of Sylow p-subgroups of G.
Basic Results:
If G has finite order with smallest prime divisor p, and H ≤ G has |G : H| = p, then H ⊴ G.
(D&F, Cor. 4.2.5)
S ⊆ G has |G : NG (S)|-many conjugates, and s ∈ G has |G : CG (s)|-many conjugates. (D&F, Prop.
4.3.6)
Two elements/sets which are conjugate share their orders.
If H char G, then H ⊴ G.
If K char H ⊴ G, then K ⊴ G.
More Important Results:
Cayley’s Theorem: Any group is isomorphic to a subgroup of some symmetric group. If |G| = n,
then G ∼= H for some H ≤ Sn . (D&F, Cor. 4.2.4)
The Class Equation: Let G be a finite group, and let gi ∈ Oi (i = 1, · · ·, r), where Oi are the distinct
conjugacy classes of G (orbits of the action of conjugation on G by G) not contained in the center
Z(G). Then (D&F, Thm. 4.3.7)
|G| = |Z(G)| + Σ_{i=1}^{r} |G : CG (gi )|
◦ Corollary: if |G| = pα for p prime and α ∈ Z≥1 , then Z(G) is nontrivial. (D&F, Thm. 4.3.8)
◦ Corollary: if |G| = p^2 for p prime, then G is abelian, and G ∼= Zp2 or G ∼= Zp × Zp . (D&F, Cor.
4.3.9)
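For instance, in S3 : Z(S3 ) = ⟨1⟩, and the noncentral conjugacy classes are the three transpositions and the two 3-cycles, giving 6 = 1 + 3 + 2.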
The Sylow Theorems: Let |G| = pα m for p prime and p ∤ m. (D&F, Thm. 4.5.18)
Sylow p-subgroups exist; any two are conjugate (and every p-subgroup lies inside one); and
np ≡ 1 (mod p), with np = |G : NG (P )| for any Sylow p-subgroup P
The following are equivalent (for P a Sylow p-subgroup):
◦ np = 1
◦ P ⊴G
◦ P char G
Also: if X ⊆ G has |x| = pβx for some βx ∈ N depending on x, for any x ∈ X, then ⟨X⟩ is a p-group.
§12.9: (Dummit & Foote, Chapter 7 ) Ring Theory: Basic Definitions/Examples
Definitions Given:
Types of Rings:
◦ Ring: A ring (without unity) is a set R together with binary operations +, × (addition, multi-
plication) such that
(i) (R, +) is an abelian group
(ii) (R, ×) is closed and associative
(iii) Distribution holds: for each a, b, c ∈ R,
(a + b) × c = (a × c) + (b × c)
a × (b + c) = (a × b) + (a × c)
Subring: For R a ring and S ⊆ R, we say S is a subring of R if it is a ring in its own right.
It suffices to show that S ̸= ∅, S ⊆ R, and x, y ∈ S =⇒ x − y, xy ∈ S.
Valuation Rings: For K a field (more generally, an integral domain), a discrete valuation on K
is a function ν : K × → Z such that
(i) ν(ab) = ν(a) + ν(b) (a homomorphism (K × , ·) → (Z, +))
(ii) ν is surjective
(iii) ν(x + y) ≥ min{ν(x), ν(y)} for any x, y ∈ K × where x + y ̸= 0
We then say that {x ∈ K × | ν(x) ≥ 0} ∪ {0} is the valuation ring of ν.
Certain types of rings:
◦ Polynomial Ring: Given a commutative unital ring R (or, more loosely, any rng), we let
R[x] := { Σ_{i=0}^{n} ai x^i | n ∈ N and, for all i, ai ∈ R }
with the obvious operations. This ring of polynomials is also a commutative unital ring.
Relatedly:
Formal Power Series: Given a ring R, we define the ring of power series (formal variable
x, coefficients in R) to be
R[[x]] := { Σ_{n=0}^{∞} an x^n | an ∈ R for each n }
Formal Laurent Series: Given a commutative unital ring R (or, often as in Dummit &
Foote, a field), we define the ring of formal Laurent series to be
R((x)) := { Σ_{n=N}^{∞} an x^n | N ∈ Z and an ∈ R for each n }
Note that this means p ∈ R((x)) may be written as a sum p = ps + pp of a ps ∈ R[[x]] and a pp ∈ R[1/x].
Matrix Rings: Given a ring R and n ∈ Z≥1 , Mn (R) := {all n × n matrices over R}. Note that Mn (R) need not be commutative (even in the extreme case of R a field).
We say A ∈ Mn (R) is a scalar matrix if all diagonal entries are the same constant, and all other
entries are 0. These matrices are isomorphic to R.
We have the subclasses (though often for more special R, namely a field F, and most commonly R or C):
◦ Orthogonal Group: On (F) := {M ∈ GLn (F) | M ⊤ M = M M ⊤ = I}. This is usually used for F ⊆ R. Note that if M ∈ On (F) then det(M ) = ±1.
◦ Unitary Group: The complex case of the previous, Un (F), uses unitary matrices: those with M ∗ M = M M ∗ = I (∗ the conjugate transpose).
◦ Special Unitary Group: This likewise restricts Un (F) to those of unit determinant: SUn (F) := {M ∈ Un (F) | det(M ) = 1}.
Group Rings: Let R be a nontrivial unital commutative ring, and G := {gi }_{i=1}^{n} a finite multiplicative
group. We can define the group ring RG of G with coefficients in R as all formal sums as so:
RG := { a1 g1 + · · · + an gn | ai ∈ R }
Multiplication comes by extending
(a gi ) · (b gj ) := (ab)(gi gj )
distributively, and hence
( Σ_{i=1}^{n} ai gi ) · ( Σ_{j=1}^{n} bj gj ) := Σ_{k=1}^{n} ( Σ_{gi gj = gk} ai bj ) gk
Trivial Properties and Results:
A zero divisor is never a unit. Moreover, in fields, F × = F − {0}, and fields have no zero divisors.
Let R be a ring and a, b, c ∈ R, with a not a zero divisor. Then ab = ac =⇒ a = 0 or b = c (D&F, Prop. 7.1.2)
Any finite integral domain is a field (D&F, Cor. 7.1.3)
Wedderburn’s Little Theorem: a finite division ring must commute, and hence be a field
All Boolean rings are commutative
Results for polynomial rings:
◦ (R an integral domain) For p, q ∈ R[x], deg(p · q) = deg(p) + deg(q) (D&F, Prop. 7.2.4)
◦ (R an integral domain) The units of R[x] are those of R (D&F, Prop. 7.2.4)
◦ (R an integral domain) R[x] is an integral domain (D&F, Prop. 7.2.4)
◦ (R an integral domain) If R has no zero divisors, then neither does R[x] (D&F, Prop. 7.2.4)
◦ (R a comm. ring) If f g ≡ 0 for a nonzero g ∈ R[x], then cf = 0 for some c ∈ R̸=0 (D&F,
Prop. 7.2.4)
301
◦ (R an integral domain) S ≤ R =⇒ S[x] ≤ R[x] (D&F, Prop. 7.2.4)
◦ (R a comm. ring with 1) p ∈ R[x] is a zero divisor iff ∃b ∈ R̸=0 with bp = 0.
Results for the power series ring R[[x]] over R a commutative unital ring:
◦ p = Σ an x^n ∈ R[[x]] is a unit iff a0 ∈ R×
Results for matrix rings Mn (R):
◦ Z(Mn (R)) is the set of all scalar matrices (D&F, Prob. 7.2.7)
◦ Strictly upper triangular matrices are nilpotent in Mn (R), n ≥ 2 (D&F, Prob. 7.2.8)
§12.10: (Dummit & Foote, Chapter 7 ) Ring Theory: Homomorphisms, Quo-
tients, Ideals
Basic Definitions:
Ring Homomorphism: A map φ : R → S of rings with φ(a + b) = φ(a) + φ(b) and φ(ab) = φ(a)φ(b) for all a, b ∈ R.
◦ We define the kernel by ker φ := φ−1 (0) := {r ∈ R | φ(r) = 0}, as in the additive group sense.
◦ If φ is a bijection, it is an isomorphism.
Quotient Ring: Let φ : R → S have ker φ = I. Then the fibers of φ are the additive cosets of the
kernel, so we define
R/I := {r + I}r∈R , (r + I) + (s + I) := (r + s) + I, (r + I) · (s + I) := (r · s) + I
These define a ring, and we call R/I the quotient ring . Note this is fundamentally just the quotient
of the additive groups, with I ⊴ R.
We can equivalently define the quotient R/I for any ideal I, since I is an ideal iff it is a kernel.
Sum of Ideals: Given I, J ⊴ R, define I + J := {i + j}i∈I,j∈J
We say I, J ⊴ R are comaximal ideals if I + J = R.
Product of Ideals: Given I, J ⊴ R, define
IJ := { Σ_{k=1}^{n} ik jk | n ∈ N, ik ∈ I, jk ∈ J }
Evaluation Map: For A a ring, X ̸= ∅, and R the set of all functions f : X → A, the evaluation
at c map is given by
Ec : R → A with Ec (f ) := f (c)
This is a ring homomorphism with R/ ker Ec ∼= A.
Annihilator: For a ring R and r ∈ R, the (left) annihilator of r is Ann(r) := {s ∈ R | sr = 0}.
Characteristic of a Ring: Given a ring R, its characteristic char(R) is the minimum n such that
n1 = 0, i.e.
char(R) := min{ n ∈ Z≥1 | 1 + 1 + · · · + 1 (n times) = 0 } = min{ n ∈ Z≥1 | Σ_{i=1}^{n} 1 = 0 }
(with char(R) := 0 if no such n exists)
Basic Results:
In a commutative ring with unity, the binomial theorem holds as usual: (D&F, Prob. 7.2.25)
(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^{n−k}
For R a commutative unital ring and p(x) = a0 + a1 x + · · · + an x^n ∈ R[x]:
◦ p ∈ R[x] is a unit iff a0 is a unit and a1 , · · ·, an are nilpotent (D&F, Prob. 7.2.33)
◦ p ∈ R[x] is nilpotent iff all coefficients ai are nilpotent (D&F, Prob. 7.2.33)
On maximal ideals:
◦ In a unital ring, all proper ideals are contained in a maximal ideal (D&F, Prop. 7.4.11)
◦ In a ring R, M ⊴ R is maximal iff R/M is a field (D&F, Prob. 7.4.5)
◦ In a commutative ring R, R is a field iff 0 is a maximal ideal (D&F, Prob. 7.4.4)
◦ In a commutative unital ring R, (x) ⊴ R[x] is maximal iff R is a field (D&F, Prob. 7.4.7)
◦ For φ : R → S a surjective homomorphism of commutative rings, and M ⊴ S maximal, then
φ−1 (M ) ⊴ R is maximal.
If any of the following are true, prime and maximal ideals in R coincide:
A nonzero finitely generated ideal in R has a corresponding B ⊴ R which is maximal w.r.t. the
property of not containing the generated ideal. (D&F, Prob. 7.4.35)
Important Results:
Second Isomorphism Theorem: For A ≤ R, B ⊴ R:
◦ A + B := {a + b}a∈A,b∈B ≤ R
◦ A∩B ⊴A
◦ (A + B)/B ∼= A/(A ∩ B)
Third Isomorphism Theorem: For I, J ⊴ R and I ⊆ J:
◦ J/I ⊴ R/I
◦ (R/I)/(J/I) ∼= R/J
Fourth Isomorphism Theorem (Lattice Theorem/Correspondence Theorem): Take I an
ideal of R a ring. The map A 7→ A/I is an inclusion-preserving bijection between the set of subrings
of R containing I, and the set of subrings of R/I.
Moreover, if A is a subring containing I, it is an ideal of R iff A/I is an ideal of R/I.
§12.11: (Dummit & Foote, Chapter 8 ) Ring Theory: Domains (Euclidean, PIDs,
UFDs)
Definitions Given:
Norm on Integral Domain: Given R an integral domain, a norm is a function N : R → Z≥0 with
N (0) = 0. If r ̸= 0 =⇒ N (r) > 0, then N is a positive norm.
Euclidean Domain: An integral domain R with a norm N such that, for any a, b ∈ R with b ̸= 0, ∃q, r ∈ R with a = qb + r where r = 0 or N (r) < N (b). Iterating (the Euclidean algorithm ): a = q0 b + r0 , b = q1 r0 + r1 , r0 = q2 r1 + r2 , etc.
Here, rn is the last nonzero remainder; since N (b) > N (r0 ) > N (r1 ) > · · · > N (rn ), the process must
terminate eventually. Moreover, rn = gcd(a, b).
Bezout Domain: For R an integral domain, we say R is a Bezout domain when each ideal generated
by two elements is principal, i.e. ∀a, b ∈ R, (a, b) = (c) for some c ∈ R.
Principal Ideal Domain (PID): A principal ideal domain is an integral domain with every ideal
being principal.
Unique Factorization Domain (UFD): A unique factorization domain is an integral domain
R such that, for all r ∈ R (r a nonzero non-unit),
(i) r may be written in the form r = p1 p2 · · ·pn for irreducibles pi ∈ R, not necessarily distinct
(ii) This product is unique up to associates
Universal Side Divisor: With R̃ := R× ∪ {0}, a u ∈ R − R̃ is a universal side divisor if ∀x ∈ R, ∃z ∈ R̃ with u | x − z.
Thus there is a sort of “division algorithm” for each u: any x may be written in the form
x = qu + z with z zero or a unit
Least Common Multiple: In a commutative unital ring R with a, b ∈ R̸=0 , a least common
multiple of a, b is an ℓ ∈ R such that
(i) a | ℓ
(ii) b | ℓ
(iii) If a | λ and b | λ, then ℓ | λ
Basic Results:
◦ Z
◦ All fields
◦ All PIDs
◦ Z (Fundamental Theorem of Arithmetic) (D&F, Cor. 8.3.15)
◦ F [x], for F a field
◦ R[x] for R a UFD
Results on divisors & GCDs, assuming R is an integral domain and the elements live therein:
◦ If ℓ = lcm(a, b) exists, it is a generator for the unique largest principal ideal contained in (a) ∩ (b) (D&F,
Prob. 8.1.11)
◦ In a Euclidean domain, any pair of elements have a LCM, up to multiplication by a unit (D&F,
Prob. 8.1.11)
◦ In a Euclidean domain, lcm(a, b) = ab/ gcd(a, b) (D&F, Prob. 8.1.11)
◦ Euclidean domains are UFDs (D&F, Thm. 8.3.14)
◦ All ideals in Euclidean domains are principal ideals (D&F, Prop. 8.1.1)
◦ A nonfield Euclidean domain has universal side divisors. (D&F, Prop. 8.1.5)
◦ Let m := min_{r ∈ R̸=0} N (r). Then any a ∈ R̸=0 with N (a) = m is a unit. (D&F, Prob. 8.1.3)
For an integral domain R, if these two hold, then R is a PID: (i) any two nonzero a, b ∈ R have a gcd expressible as ax + by; (ii) any ascending chain of principal ideals (a1 ) ⊆ (a2 ) ⊆ · · · stabilizes. (D&F, Prob. 8.2.4)
In an integral domain R, if every prime ideal is principal, R is a PID (D&F, Prob. 8.2.6)
Results on PIDs:
◦ R an integral domain is a Bezout domain iff each a, b ∈ R has a gcd d ∈ R which we can write as
d = ax + by for some x, y ∈ R
In an integral domain, prime elements are irreducible; in PIDs & UFDs, they coincide. (D&F, Prop.
8.3.10-12)
Take a, b ∈ R a UFD, with (D&F, Prop. 8.3.13)
a = u Π_i pi^{ei} and b = v Π_i pi^{fi}
for distinct primes pi , integers ei , fi ∈ Z≥0 , and units u, v. Then a gcd of a, b is given by
d := Π_i pi^{min{ei , fi }}
or d = 1 if ei = fi = 0 for all i.
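E.g. in Z: a = 12 = 2^2 · 3 and b = 18 = 2 · 3^2 give gcd(12, 18) = 2^{min{2,1}} · 3^{min{1,2}} = 2 · 3 = 6.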
§12.12: (Dummit & Foote, Chapter 13 ) Field Theory: Basics of Field Extensions
Field: A field is a commutative unital ring (F, +, ·) with F̸=0 all invertible elements. Hence, (F, +)
and (F̸=0 , ·) are abelian groups. (We identify F × as the set of invertible elements, so in fields
F × = F̸=0 := F − {0}.)
Note that in fields, we have 0 ̸= 1.
Characteristic: The characteristic of a field F is denoted char(F ) (as for rings in Dummit & Foote)
or ch(F ) (as for fields in Dummit & Foote). We say
ch(F ) := min{ n ∈ Z≥1 | n · 1F := Σ_{i=1}^{n} 1F = 0 }
(with ch(F ) := 0 if no such n exists)
Field Homomorphism / Isomorphism: A homomorphism of fields is just a homomorphism of the pair as rings, since fields are rings. Of course, an isomorphism
is a bijective homomorphism. Here, the textbook begins to denote isomorphisms by φ : F −∼→ K.
Eisenstein’s Criterion (cf. Section 9.4): Let P ⊴ R be prime (R an integral domain), and let f ∈ R[x] be monic:
f (x) = x^n + a_{n−1} x^{n−1} + · · · + a1 x + a0 for n ≥ 1
Suppose further that ai ∈ P for each i ≤ n − 1, and a0 ̸∈ P 2 . Then f is irreducible in R[x].
In the more familiar case of Z (with P = (p) for a prime p): if
f (x) = x^n + a_{n−1} x^{n−1} + · · · + a0 where ai ∈ Z, p | ai for each i ≤ n − 1, and p^2 ∤ a0
then f is irreducible over Q (and Z). That is, it won’t be factored into nontrivial polynomials over
these sets.
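For instance, x^4 + 10x + 5 is Eisenstein at p = 5 (5 | 10, 5 | 5, 25 ∤ 5), hence irreducible over Q.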
Prime Subfield: Given a field F , its prime subfield is that generated by its identity, i.e. (1F ). It is
isomorphic to Q for ch(F ) = 0, or Fp for ch(F ) = p.
Field Extension: Suppose F, K are fields with F ≤ K (i.e. F is a subfield - or subring - of K). Then
K is said to be an extension (field) of F .
This relation is denoted by K/F - meaning “K over F ” and not quotients.
Degree of Field Extension: Given K/F , then the degree, relative degree, or index of the
extension is denoted [K : F ] and given by
[K : F ] := dim_F K
(This is well-defined as K/F =⇒ K is a vector space over the field F .) We say that the extension is
finite if the degree is, and infinite otherwise.
Generated Field: Let K/F and let α1 , · · ·, αn ∈ K. Then the smallest subfield of K containing F ,
α1 , α2 , · · ·, αn−1 , and αn is the field generated by α1 , · · ·, αn over F and denoted F (α1 , · · ·, αn ).
Simple Extension; Primitive Element: If K/F has K = F (α) for some α ∈ K, then we say K is a
simple extension of F , and that the α in question is a primitive element for the extension.
Basic Results:
ch(F ) is prime or 0 for every field (follows from being integral domains). (D&F, Prop. 13.1.1)
Moreover, if ch(F ) = p, then p · a := a + a + · · · + a (p times) = 0 for each a ∈ F .
Let p ∈ F [x] be irreducible. Then ∃K a field with a subfield isomorphic to F , where K contains a root
of p(x). Hence, F has an extension field in which p has a root. (D&F, Thm. 13.1.3)
Let p ∈ F [x] be irreducible with deg(p) = n, and K := F [x]/(p(x)), a field. Let θ := x mod p(x) lie in
K. Then {θ^k }_{k=0}^{n−1} is a basis of K (as an F -vector space), so
[K : F ] = n
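E.g. p(x) = x^2 − 2 over Q: K = Q[x]/(x^2 − 2) ∼= Q(√2), with basis {1, √2} and [Q(√2) : Q] = 2.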
Let us have the extension K/F and let p ∈ F [x] be irreducible and r a root lying in K. Then (D&F,
Thm. 13.1.6)
F (r) ∼= F [x]/(p(x))
If deg(p) = n then F (r) = {f ∈ F [r] | deg(f ) < n} ⊆ K. (D&F, Cor. 13.1.7)
Note that this means that if r, s are distinct roots of such p, then F (r) ∼= F (s) ∼= F [x]/(p(x)); you may
say the roots are algebraically indistinguishable.
Suppose φ : F −∼→ F ′ is an isomorphism (and hence F [x] ∼= F ′ [x]). Let p ∈ F [x] and p′ ∈ F ′ [x] be irreducibles, where
p(x) := Σ_{i=0}^{n} ai x^i and we let p′ (x) := Σ_{i=0}^{n} φ(ai ) x^i
Let α be a root of p in some extension of F , and α′ a root of p′ in some extension of F ′ . Then there
is an isomorphism
σ : F (α) −∼→ F ′ (α′ ), α ↦ α′
which extends φ, i.e. σ|F ≡ φ (or rather, restriction to the constant polynomials).
(In the usual tower diagram for this, the vertical bars denote field extension, hence F (α)/F and
F ′ (α′ )/F ′ .)
§12.13: (Dummit & Foote, Chapter 13 ) Field Theory: Algebraic Extensions
Definitions:
Algebraic Element; Minimal Polynomial: α ∈ K (for an extension K/F ) is algebraic over F if it is a root of some nonzero f ∈ F [x], and transcendental otherwise; K/F is algebraic if every α ∈ K is. The minimal polynomial mα (x) = mα,F (x) ∈ F [x] is the unique monic irreducible polynomial of least degree having α as a root, and deg(α) := deg(mα ).
Basic Results:
F (α) ∼= F [x]/(mα (x)) and [F (α) : F ] = deg(α) = deg(mα )
α is algebraic over F iff the simple extension F (α)/F is finite. (D&F, Prop. 13.2.12)
If α is an element of an extension of degree n over F , then α is the root of a polynomial of degree at
most n over F .
If α is the root of a polynomial of degree n over F , then [F (α) : F ] ≤ n.
If K/F is finite, it is algebraic. (D&F, Cor. 13.2.13)
F (α, β) = (F (α))(β) (D&F, Lem. 13.2.16)
If K1 /F and K2 /F are finite extensions with respective bases {αi }_{i=1}^{n} and {βi }_{i=1}^{m} , then
K1 K2 = F (α1 , · · ·, αn , β1 , · · ·, βm )
Important Results:
For a tower of fields L/K/F : [L : F ] = [L : K][K : F ]
Hence, for L/F a finite extension and K such that L/K/F , (D&F, Cor. 13.2.15)
[K : F ] divides [L : F ]
The extension K/F is finite iff K is generated by finitely many algebraic elements over F . (D&F,
Thm. 13.2.17)
That is, a field over F generated by finitely many algebraic elements of degrees d1 , · · ·, dn , is algebraic
of degree ≤ d1 d2 · · ·dn .
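E.g. [Q(√2, √3) : Q] = [Q(√2, √3) : Q(√2)] · [Q(√2) : Q] = 2 · 2 = 4, since √3 has degree 2 over Q(√2).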
Suppose α, β are algebraic over F . The following are too, then: (D&F, Cor. 13.2.18)
◦ α±β
◦ αβ
◦ α/β (if β ̸= 0)
◦ 1/α (if α ̸= 0)
If L/K/F with K/F and L/K each algebraic, so is L/F . (D&F, Thm. 13.2.20)
§12.14: (Dummit & Foote, Chapter 13 ) Field Theory: Splitting Fields; Alge-
braic Closures
Basic Definitions:
We assume F, K are fields.
Splitting Field: The extension K/F is said to be a splitting field of f ∈ F [x] if f may be factored
completely into linear factors (split completely ) when in K[x], and when f does not split completely
for any proper subfield L of K containing F (i.e. an L where F < L < K).
Normal Extension: If K/F is algebraic and K is the splitting field of some family of polynomials in
F [x], we say K is a normal extension.
Primitive Root of Unity: Recall that the solutions to x^n − 1 (as a polynomial in C[x]) are the nth
roots of unity . These form a subgroup of C under multiplication, a cyclic one at that:
ζ_k^{(n)} := exp(2πik/n) or, as Dummit & Foote use, ζ_n^k
A primitive nth root of unity is one such that it generates the remaining set. These are ζ_k^{(n)} for k
coprime to n.
Cyclotomic Field: The field Q(ζn ) is called the cyclotomic field of the nth roots of unity. It
is the splitting field of xn − 1.
Algebraic Closure: The field F̄ is said to be an algebraic closure of F if F̄ /F is algebraic and every f ∈ F [x] splits completely over F̄ . (Equivalently, F̄ consists of the elements algebraic over F .)
We say a field F is algebraically closed if every nonconstant f ∈ F [x] has a root in F .
Basic Results:
We assume F, K are fields.
If deg(f ) = n for f ∈ F [x], then f has at most n roots in F , and precisely n if it splits completely in
F [x].
For any f ∈ F [x], a splitting field exists for f . (D&F, Thm. 13.4.25)
The splitting field K of f ∈ F [x] with deg(f ) = n itself has [K : F ] ≤ n!. (D&F, Prop. 13.4.26)
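E.g. the splitting field of x^3 − 2 over Q is Q(∛2, ζ3 ), of degree 6 = 3!, so the bound is attained.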
Important Results:
Let φ : F −∼→ F ′ for fields F, F ′ . Let f ∈ F [x] and f ′ ∈ F ′ [x] be given by (D&F, Thm. 13.2.27)
f (x) := Σ_{i=0}^{n} ai x^i and f ′ (x) := Σ_{i=0}^{n} φ(ai ) x^i
with splitting fields E, E ′ over F, F ′ respectively. Then φ extends to an isomorphism σ : E −∼→ E ′ , i.e.
∃σ : E −∼→ E ′ such that σ|F ≡ φ
The splitting field of a given polynomial is unique up to isomorphism (D&F, Cor. 13.2.28)
§12.15: (Dummit & Foote, Chapter 13 ) Field Theory: Separability
Definitions:
We assume F is a field.
Separable Polynomial: For f ∈ F [x], we say f is separable if it has no repeated roots (i.e. each
root has multiplicity 1), and inseparable otherwise.
Formal Derivative: For f (x) = Σ_n an x^n ∈ F [x], define Dx f (x) := Σ_n n an x^{n−1}
(as if this were ordinary calculus). We make no concerns about existence or convergence: this is an
algebraic definition of a new function.
Frobenius Endomorphism: The map φ(x) := xp from a field F to itself F is the Frobenius
endomorphism of F .
Perfect Field: Let ch(F ) = p. Then we say F is perfect if all elements of F are pth powers in F ,
and hence F = F p .
All fields of characteristic zero are called perfect. We can show all finite fields are too.
Separable Degree: Let p ∈ F [x] be irreducible, with ch(F ) = p. Then ∃! k ≥ 0 and ∃! separable ps ∈ F [x]
with
p(x) = ps (x^{p^k })
deg(ps ) is called the separable degree of p(x) and is denoted degs (p(x)).
The integer p^k is called the inseparable degree of p(x), denoted degi (p(x)).
Separable Fields: The field K is said to be separable (or separably algebraic) over F if each
α ∈ K is the root of a separable f ∈ F [x]. (Equivalently, mα,F is separable for each α ∈ K.)
Otherwise, we say K is inseparable.
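E.g. over F = Fp (t), the polynomial x^p − t is irreducible and inseparable (it equals (x − t^{1/p} )^p in a splitting field), so Fp (t^{1/p} )/Fp (t) is an inseparable extension.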
Basic Results:
Finite extensions of perfect fields are separable. (In particular, Q and finite fields.) (D&F, Cor.
13.5.39)
Important Results:
Irreducible polynomials over finite fields are separable. (D&F, Cor. 13.5.37)
Polynomials over such fields are separable iff they’re a product of distinct irreducible polynomials.
§12.16: (Dummit & Foote, Chapter 14 ) Galois Theory: Basic Definitions
Old Definitions:
We assume K is a field.
Automorphism: Given a ring or field K, if σ : K → K is an isomorphism, we say σ is an auto-
morphism (an isomorphism from a structure to itself). The collection of these automorphisms on K
is denoted Aut(K).
Dummit & Foote often use the functional analysis-esque shorthand of σα := σ(α).
Fixed Element/Set: φ ∈ Aut(K) is said to fix an α ∈ K if φα = α. Likewise, given S ⊆ K (or
even say a subfield), then φ fixes S when φs = s for every s ∈ S. (Note that this is stronger than
φ(S) = S.)
New Definitions:
We assume K is a field, as is F .
Automorphisms Fixing Subfield: Let K/F be a field extension. Then we denote the automor-
phisms of K which fix F by Aut(K/F ), i.e.
Aut(K/F ) := {σ ∈ Aut(K) | σ|F = idF }
Fixed Field: Given H ≤ Aut(K), the fixed field of H is
FixK (H) := F := {x ∈ K | ∀φ ∈ H, φ(x) = x}
Galois Field Extension: Let K/F be a finite field extension. We say K is Galois over F - and that
K/F is a Galois extension - if |Aut(K/F )| = [K : F ].
Galois Group: If K/F is a Galois extension, then Aut(K/F ) is called the Galois group of K/F .
It is given the special name Gal(K/F ).
(Some choose to define this for any K/F , not merely finite.)
Galois Group of Polynomial: Let f ∈ F [x] be separable over F , with splitting field K. Then the
Galois group of f over F is Gal(K/F ) (i.e. the splitting field over its field of origin).
Some choose to denote this by Gal(f ).
Basic Results:
We assume K is a field, as is F .
Each φ ∈ Aut(K) fixes the prime subfield (that generated by 1). Consequently, Aut(Q) and Aut(Fp )
are trivial.
Aut(K) is a group, with a subgroup Aut(K/F ), under function composition. (D&F, Prop. 14.1.1)
Let K/F be a field extension and α ∈ K algebraic over F . Let σ ∈ Aut(K/F ). Then σα is a root of
mα,F . Hence, Aut(K/F )’s elements permute the roots of irreducible polynomials - hence, if α is a root
of f ∈ F [x], so is σα. (D&F, Prop. 14.1.2)
Consequently, if K is generated over F by some elements, σ ∈ Aut(K/F ) is determined completely by
its action on the generators.
If E is the splitting field of f ∈ F [x] over F , then
|Aut(E/F )| ≤ [E : F ]
with equality if f is separable over F (and then, hence, E/F is Galois). (D&F, Cor. 14.1.6)
More generally, for any finite extension K/F , |Aut(K/F )| ≤ [K : F ]. (D&F, Cor. 14.2.10)
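E.g. Q(√2)/Q is Galois: Aut = {id, √2 ↦ −√2} has order 2 = [Q(√2) : Q]. But Q(∛2)/Q is not: any automorphism must send ∛2 to a root of x^3 − 2 inside Q(∛2) ⊆ R, so Aut is trivial while the degree is 3.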
§12.17: (Dummit & Foote, Chapter 14 ) Galois Theory: The Fundamental The-
orem
Definitions:
Character: A character of a group G with values in a field L is a homomorphism χ : G → L× , i.e.
χ(g1 g2 ) = χ(g1 )χ(g2 ) and χ(g) ̸= 0 for any g ∈ G.
Field Embedding: Given fields F, K, an embedding of F into K is simply an injective homomor-
phism φ : F → K.
Galois Conjugates: Let K/F be Galois. If α ∈ K, then {σα}σ∈Gal(K/F ) are the (Galois) conju-
gates of α over F .
Galois Conjugate Field: If K/F is Galois and F ≤ E ≤ K as fields, and σ ∈ Gal(K/F ), then σ(E)
is called the conjugate field of E over F .
Basic Results:
If {χi }_{i=1}^{n} are distinct characters G → L, then they are linearly independent. (D&F, Thm. 14.2.7)
If {σi : K → L}_{i=1}^{n} are distinct field embeddings, then they are linearly independent functions on K.
Hence, distinct automorphisms are linearly independent as functions on K. (D&F, Cor. 14.2.8)
Suppose G := {σi }_{i=1}^{n} ≤ Aut(K) as groups, with fixed field F := FixK (G). Then (D&F, Thm. 14.2.9)
[K : F ] = n = |G|
Let G ≤ Aut(K) be finite and F := FixK (G). Then each σ ∈ Aut(K) that fixes F is contained in G.
That is, Aut(K/F ) = G, and K/F is Galois with Galois group G. (D&F, Cor. 14.2.11)
If G, H ≤ Aut(K) have G ̸= H, then FixK (G) ̸= FixK (H). (D&F, Cor. 14.2.12)
The extension K/F is Galois iff K is the splitting field of some f ∈ F [x]. If this is the case, then
any irreducible g ∈ F [x] which has a root in K is separable and has all its roots in K, so K/F is a
separable extension. (D&F, Thm. 14.2.13)
Important Results:
The Fundamental Theorem of Galois Theory: Let K/F be a Galois extension with G := Gal(K/F ). Then there is a
bijection
{fields E | F ≤ E ≤ K} ↔ {groups H | 1 ≤ H ≤ G}
with the correspondences E ↦ Gal(K/E) and H ↦ FixK (H); moreover [K : E] = |H| and [E : F ] = |G : H|, and:
◦ E/F is Galois iff H ⊴ G, in which case
Gal(E/F ) ∼= Gal(K/F )/H
◦ More generally, even if H is not normal in Gal(K/F ), the isomorphisms of E into a fixed algebraic
closure of F that contains K, which fix F , are in bijection with the cosets {σH}σ∈Gal(K/F ) .
◦ If E1 , E2 respectively correspond to H1 , H2 , then E1 ∩ E2 corresponds to the group ⟨H1 , H2 ⟩,
and the composite field E1 E2 corresponds to H1 ∩ H2 . Thus the lattice of subfields of K that
contain F , and the lattice of subgroups of Gal(K/F ), are dual (flipped upside down versions of
each other).
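For instance, K = Q(√2, √3) over Q has Gal(K/Q) ∼= Z2 × Z2 ; its three subgroups of order 2 correspond (by taking fixed fields) to the three intermediate fields Q(√2), Q(√3), Q(√6).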
§13: Items from Topology, Metric Spaces, & Real Analysis
§13.1.1: Boundary
Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.
∂X S
bdX S
BdX S
bdryX S
frX S
Common definitions:
∂S := S̄ − int(S)
∂S := S̄ ∩ (S^c )^−
Some identities:
S̄ = S ∪ ∂S
X = int(S) ∪ ∂S ∪ int(S c ) for any S ⊆ X (trichotomy); the three are pairwise disjoint
∂S = ∂(S c )
∂∂∂S = ∂∂S ⊆ ∂S
∂S is always closed
S is closed iff ∂S ⊆ S
§13.1.2: Closure
Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.
S^−
S̄_X
S̄^X
S̄
Common definitions:
S̄ := S ∪ S ′ (S with its limit points)
S̄ := ∩ {C ⊇ S | C closed in X} (the smallest closed set containing S)
Some identities:
int(S) ⊆ S ⊆ S̄
S̄ = S ∪ ∂S
S̄ = (int(S^c ))^c ; equivalently, ∁S̄ = int(∁S)
(int(S))^c = (S^c )^− ; equivalently, ∁ int(S) = (∁S)^−
For {Si }i∈N :
◦ If ∀i we have Si closed in X, then ∪_{i∈N} int(Si ) = int( ∪_{i∈N} Si )
◦ If ∀i we have Si open in X, then int( ∩_{i∈N} Si ) = ∩_{i∈N} int(Si )
( ∩_{i∈I} Si )^− ⊆ ∩_{i∈I} S̄i (reverse may not hold)
∪_{i∈I} S̄i ⊆ ( ∪_{i∈I} Si )^− (reverse may not hold)
( ∪_{i=1}^{n} Si )^− = ∪_{i=1}^{n} S̄i (specifically finite)
(Monotonicity) S ⊆ T =⇒ S̄ ⊆ T̄
S is closed iff S = S̄
§13.1.3: Complement
Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.
S^c
∁S
CS
S′
S̄
X −S
Common definitions:
S c := X − S, in an understood universe X
Some identities:
X = int(S) ∪ ∂S ∪ int(S c ) for any S ⊆ X (trichotomy); the three are pairwise disjoint
∂S = ∂(S c )
§13.1.4: Interior
Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.
intX (S)
IntX (S)
S̊X
S°_X
Common definitions:
Some define the related exterior , by ext(S) := int(S^c ) = (S̄)^c . Note that X = int(S) ∪ ∂S ∪ ext(S).
x ∈ int(S) ⇐⇒ ∃Ux an open nbhd s.t. x ∈ Ux ⊆ S
int(S) := ∪ {G ⊆ S | G open}
Some identities:
int(S) ⊆ S ⊆ S̄
X = int(S) ∪ ∂S ∪ int(S^c ) for any S ⊆ X (trichotomy); the three are pairwise disjoint
int(∂S) = ∅ for any S which is at least one of open or closed
S̄ = (int(S^c ))^c ; equivalently, ∁S̄ = int(∁S)
(int(S))^c = (S^c )^− ; equivalently, ∁ int(S) = (∁S)^−
For {Si }i∈N :
◦ If ∀i we have Si closed in X, then ∪_{i∈N} int(Si ) = int( ∪_{i∈N} Si )
◦ If ∀i we have Si open in X, then int( ∩_{i∈N} Si ) = ∩_{i∈N} int(Si )
§13.1.5: Limit Points / Accumulation Points / Derived Set
The elements we’re talking about may be called limit points, accumulation points, or cluster
points, and the collection of them (aside from the obvious, e.g. “set of limit points”) may be called the
derived set .
Some notations, when S ⊆ X, with X a topological space, are below. The X may be dropped if
understood.
S′
L(S)
Common definitions:
x ∈ S ′ ⇐⇒ every open neighborhood U of x meets S − {x} (i.e. U ∩ (S − {x}) ̸= ∅)
Some identities:
s ∈ S ′ =⇒ s ∈ (S − {s})′
(S ∪ T )′ = S ′ ∪ T ′
S ⊆ T =⇒ S ′ ⊆ T ′
S is closed iff S ′ ⊆ S
x ∈ S ′ ⇐⇒ x ∈ (S − {x})^− (true in any topological space)
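E.g. in R, S = {1/n | n ∈ Z≥1 } has S ′ = {0}: every point of S is isolated, and S is not closed since S ′ ⊄ S.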
§13.2: Compactness
If X is a metric space, X is compact iff each sequence in X has a convergent subsequence with limit
in X (called sequential compactness)
◦ Measure & Integral only discusses the Rn case (M&I, Thm. 1.12)
X satisfies the finite intersection axiom if, for any family S of closed sets having the finite intersection
property, ∩ S ̸= ∅. (Notice how it must apply to the whole family now.)
Heine-Borel for metric spaces is: (X, d) is compact iff complete and totally bounded, the latter defined by
◦ X is totally bounded iff ∀ε > 0, ∃{xi }_{i=1}^{n} ⊆ X such that min_{1≤i≤n} d(xi , x) < ε for each x ∈ X
(Tychonoff ) A product Π_{i∈I} Xi is compact iff Xi is compact ∀i ∈ I
Finite unions of compact sets are compact
§13.3: Continuity & Types Thereof
Results are stated in terms of R unless stated otherwise but generalize nicely. Domains are not stated
unless needed.
We may write g ∈ C(S) to mean it is continuous on S, and g ∈ C(S, T ) to mean g : S → T is continuous.
Continuity at a Point: f is continuous at x iff either of the following equivalent conditions holds:
◦ ∀Vf (x) a neighborhood of f (x), ∃Ux a neighborhood of x such that f (Ux ) ⊆ Vf (x) .
◦ f −1 (V ) is a neighborhood of x for all neighborhoods V of f (x)
Uniform Continuity: f is uniformly continuous on D := dom(f ) iff
(∀ε > 0)(∃δ > 0)(∀x, y ∈ D) |x − y| < δ =⇒ |f (x) − f (y)| < ε
Absolute Continuity: Take [a, b] ⊆ R an interval and f : [a, b] → R. (C may also be used.) f is
absolutely continuous if, ∀ε > 0, ∃δ > 0 such that, whenever {(ak , bk )}_{k=1}^{n} are finitely-many
disjoint and nonempty subintervals of [a, b] meeting the condition
Σ_{k=1}^{n} (bk − ak ) = Σ_{k=1}^{n} µ( (ak , bk ) ) < δ
then
Σ_{k=1}^{n} |f (bk ) − f (ak )| < ε
More compactly: if
I_{a,b,δ} := { {(ak , bk )}_{k=1}^{n} | n ∈ N and a < ak < bk < b and Σ_k (bk − ak ) < δ }
then f is absolutely continuous iff ∀ε > 0, ∃δ > 0 such that every collection in I_{a,b,δ} satisfies Σ_k |f (bk ) − f (ak )| < ε.
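E.g. any Lipschitz f (say f (x) = x^2 on [0, 1], where |f (x) − f (y)| ≤ 2|x − y|) is absolutely continuous (take δ = ε/2); by contrast, the Cantor function is continuous and monotone on [0, 1] but not absolutely continuous.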
A Brief Hierarchy: On an interval [a, b],
continuously differentiable =⇒ Lipschitz =⇒ absolutely continuous =⇒ uniformly continuous =⇒ continuous
Topological Invariants: These are properties such that, if A has the property, so does f (A) if f is
continuous:
◦ compactness
◦ connectedness
◦ path-connectedness
◦ being a Lindelöf space
◦ being separable
Extreme Value Theorem: Continuous functions on compact sets attain their suprema and infima,
and hence are bounded (M&I, Thm. 1.15)
Intermediate Value Theorem: If f ∈ C([a, b]), then for each y between f (a) and f (b) there is an x ∈ [a, b]
such that f (x) = y
Lebesgue-Vitali or Riemann-Lebesgue Theorem: If f is bounded and continuous a.e. on [a, b], it is Riemann integrable
f : X → Y is continuous on X iff preimages of open sets are open (equivalently, preimages of closed sets are closed)
g ∈ C 1 (R, R) is Lipschitz continuous iff g ′ is bounded; then we have Lipschitz constant sup|g ′ |.
◦ Measure & Integral gives it for just compact sets (M&I, Thm. 1.15)
If f : [a, b] → R is absolutely continuous, it is differentiable a.e., and ∃g ∈ L[a, b] with
f (x) = f (a) + ∫_a^x g(t) dt
f, g absolutely continuous implies that f ± g are too
Absolutely continuous functions may be written in the form g − h for g, h monotone non-decreasing on
[a, b]
§13.4: Dense
S ⊆ X is dense in X (i.e. S̄ = X) iff any of the following equivalent conditions holds:
int(S^c ) = ∅
If x ∈ X, then x ∈ S or x ∈ S ′ (since S̄ = S ∪ S ′ )
For any nonempty open A ⊆ X, S ∩ A ̸= ∅
§13.5: Infimum & Supremum; ε Characterization (“Capturing”)
u is an upper bound of S if u ≥ s ∀s ∈ S
ℓ is a lower bound of S if ℓ ≤ s ∀s ∈ S
sup S = α ⇐⇒ α is an upper bound of S, and the least such one (in the sense that if γ is another
upper bound, then α ≤ γ)
inf S = β ⇐⇒ β is a lower bound of S, and the greatest such one (in the sense that if γ is another
lower bound, then β ≥ γ)
α, β as given above may be infinite if need be, but we focus on finite here
The ε-characterization is as follows. You may envision a half-interval stretching away from α, β to “capture”
other elements:
sup S = α ⇐⇒ α is an upper bound and, ∀ε > 0, ∃s ∈ S with s > α − ε
inf S = β ⇐⇒ β is a lower bound and, ∀ε > 0, ∃s ∈ S with s < β + ε
Scaling:
◦ For α ≥ 0, sup_{n∈N} (αxn ) = α · sup_{n∈N} xn and inf_{n∈N} (αxn ) = α · inf_{n∈N} xn
◦ For α < 0, sup_{n∈N} (αxn ) = α · inf_{n∈N} xn and inf_{n∈N} (αxn ) = α · sup_{n∈N} xn
§13.6: Limit Inferior & Limit Superior of Sequences (lim inf an , lim sup an )
Notations:
The limit inferior of a sequence {an }n∈N (implicit: as n → ∞) may be denoted by lim inf_{n→∞} an , lim inf an , or lim with an underline; the limit superior analogously, with an overline.
Definitions:
Definitions differ a little; a few common (equivalent) ones are
lim inf_{n→∞} an := lim_{n→∞} ( inf_{m≥n} am ) ≡ sup_{n∈N} inf_{m≥n} am (limit of future infima)
lim sup_{n→∞} an := lim_{n→∞} ( sup_{m≥n} am ) ≡ inf_{n∈N} sup_{m≥n} am (limit of future suprema)
We may also use subsequential limits. Let A be the collection of all subsequential limits of {an }n∈N . That
is,
a ∈ A ⇐⇒ ∃ a subsequence {ank }k∈N ⊆ {an }n∈N such that ank → a as k → ∞
⇐⇒ ∃ a monotone sequence {nk }k∈N ⊆ N such that ank → a as k → ∞
Then
lim inf_{n→∞} an := inf(A) and lim sup_{n→∞} an := sup(A)
The definition for a sequence of functions {fn (x)}n∈N is entirely analogous, but that for a function
itself is not quite the same.
lim sup xn is the smallest b ∈ R such that, ∀ε > 0, ∃N ∈ N, such that xn < b + ε ∀n > N .
Hence, any number larger than lim sup xn is an eventual upper bound for the sequence (all terms are
“eventually” bounded by it). Moreover, only finitely many terms will be larger than b + ε.
Similarly, lim inf xn is the largest a ∈ R such that, ∀ε > 0, ∃N ∈ N, such that xn > a − ε for all
n > N.
Thus, any number smaller than lim inf xn is “eventually” a lower bound of {xn }n∈N , and only finitely
many terms will be less than a − ε.
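E.g. xn = (−1)^n (1 + 1/n): the subsequential limits are A = {−1, 1}, so lim inf xn = −1 and lim sup xn = 1, while sup_n xn = 3/2 (attained at n = 2).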
Properties of Limit Inferior & Limit Superior:
We take {xn }n∈N , {yn }n∈N ⊆ R.
◦ Common to prove a limit exists by showing that lim sup_{n→∞} xn ≤ lim inf_{n→∞} xn (the reverse inequality always holds).
Negatives & Reversal: lim sup_{n→∞} (−xn ) = − lim inf_{n→∞} xn and lim inf_{n→∞} (−xn ) = − lim sup_{n→∞} xn
Scaling:
◦ For α ≥ 0, lim sup_{n→∞} (αxn ) = α · lim sup_{n→∞} xn and lim inf_{n→∞} (αxn ) = α · lim inf_{n→∞} xn .
◦ For α < 0, lim sup_{n→∞} (αxn ) = α · lim inf_{n→∞} xn and lim inf_{n→∞} (αxn ) = α · lim sup_{n→∞} xn
§13.7: Limit Inferior & Limit Superior of a Function
Consider a metric space (X, d) with E ⊆ X and f : E → R. Let a ∈ E ′ (the set of limit points). We
define the limit inferior and limit superior of f by
lim inf_{x→a} f (x) := lim_{ε→0} ( inf_{x ∈ E∩B(a;ε)−{a}} f (x) ) and lim sup_{x→a} f (x) := lim_{ε→0} ( sup_{x ∈ E∩B(a;ε)−{a}} f (x) )
In the topological case, let (X, τ ) be a topological space and all else as before. Then
lim inf_{x→a} f (x) := sup_{U open, a∈U, E∩U−{a}̸=∅} ( inf_{x ∈ E∩U−{a}} f (x) ) and lim sup_{x→a} f (x) := inf_{U open, a∈U, E∩U−{a}̸=∅} ( sup_{x ∈ E∩U−{a}} f (x) )
One could write this with limits and nets and a neighborhood filter.
§13.8: Lp Spaces
§13.9: Open and Closed Sets; Gδ and Fσ sets; Topologies
A topology τ ⊆ P(X) on X satisfies: ∅, X ∈ τ ; arbitrary unions of members of τ lie in τ ; and finite intersections of members of τ lie in τ . Members of τ are the open sets, and their complements the closed sets.
This means arbitrary unions of open sets are open, and finite unions of closed sets are closed.
F is closed iff F = F̄
In R, open sets are a countable union of disjoint balls (M&I, Thm. 1.10)
All open sets in Rn are a countable union of nonoverlapping closed cubes (M&I, Thm. 1.11)
◦ Can also use partly-open cubes
◦ Cubes are just intervals (in the Rn sense) where each side length is the same
A Gδ set is a countable intersection of open sets; an Fσ set is a countable union of closed sets. Note that a Gδ need not be open, and that an Fσ need not be closed. (However, ∁Gδ is Fσ and ∁Fσ is Gδ by De
Morgan.)
§13.10: Riemann Integration
Let f : I → R be bounded, for I an interval in Rn
Let Γ := {Ik }_{k=1}^{N} partition I into finitely many nonoverlapping intervals (“nonoverlapping” = “inter-
sects only on boundary”)
Define ∥Γ∥ := max_{1≤k≤N} diam(Ik ), where diam(S) := sup_{x,y∈S} d(x, y) in a metric space (X, d) (in Rn , the
2-norm)
Take tags Ξ := {ξk }_{k=1}^{N} where ξk ∈ Ik , and form the Riemann sum Rf,Γ,Ξ := Σ_{k=1}^{N} f (ξk ) v(Ik ) (v = volume), with upper and lower sums Uf,Γ , Lf,Γ using sup_{Ik} f and inf_{Ik} f in place of f (ξk )
Then Measure and Integral offers several definitions: we say A := ∫_I f iff
lim_{∥Γ∥→0} Rf,Γ,Ξ = A
Formally: (∀ε > 0)(∃δ > 0)(∀Γ such that ∥Γ∥ < δ)(∀Ξ) |A − Rf,Γ,Ξ | < ε
(∀ε > 0)(∃δ > 0)(∀Γ with ∥Γ∥ < δ)(Uf,Γ − Lf,Γ < ε)
(∀ε > 0)(∃Γ)(Uf,Γ − Lf,Γ < ε) for f bounded (M&I, Prob. 1.15)
Cauchy Criterion: (∀ε > 0)(∃δ > 0)(∀Γ, Γ′ such that ∥Γ∥, ∥Γ′ ∥ < δ) |Rf,Γ,Ξ − Rf,Γ′ ,Ξ′ | < ε
∃Γn with ∥Γn ∥ → 0 and Uf,Γn − Lf,Γn → 0 as n → ∞
(∀ε > 0)(∃s a step function on I) |f (x) − s(x)| < ε
Lebesgue-Vitali or Riemann-Lebesgue Theorem: If f is bounded and continuous a.e., it is Riemann integrable (thus
giving the case of monotone f )
A Squeeze Theorem: (∀ε > 0)(∃α, β ∈ R(I) such that α ≤ f ≤ β on I) ∫_I (β − α) < ε
§13.11: Sequences of Functions
For now, we speak of {fn : D ⊆ R → R}n∈N and f as a hypothetical limiting function as needed.
Items of note:
Uniform Limit Theorem: (Wikipedia) If fn are continuous (on E) and converge uniformly to f
(finite everywhere), then f is continuous (on E) (M&I, Thm. 1.16)
fn converges uniformly to f iff
◦ ∃{Mn }n∈N ⊆ R≥0 with Mn → 0 such that sup_{x∈D} |fn (x) − f (x)| ≤ Mn for all n large enough
◦ Cauchy Criterion: (∀ε > 0)(∃N ∈ N)(∀n, m ≥ N )(∀x ∈ D) |fm (x) − fn (x)| < ε
Uniform convergence then gives:
◦ fn → f pointwise
◦ If the fn are bounded, then fn , f are uniformly bounded on D (i.e. bounded, all by the same constant)
◦ You can interchange limit and integral if the fn are Riemann-integrable:
lim_{n→∞} ∫_D fn = ∫_D lim_{n→∞} fn = ∫_D f
Suppose {fn }n∈N ⊆ C 1 [a, b] with fn′ Riemann integrable on [a, b]. Moreover, let {fn′ }n∈N converge
uniformly to g, and some x0 ∈ [a, b] such that {fn (x0 )}n∈N converges as a sequence in R. Then
{fn }n∈N converges uniformly to f ∈ C 1 [a, b] with f ′ = g.
Weierstrass M -test: (Wikipedia) Consider {fn : A → F ∈ {R, C}}n∈N and suppose ∃{Mn }n∈N ⊆ R≥0
such that |fn (x)| ≤ Mn for all x ∈ A and all n, with Σ_n Mn < ∞. Then Σ_n fn converges absolutely and uniformly on A.
§14: Notes from Self-Studying Real Analysis
Familiar definitions and ideas are skipped. Those less memorable, or with possibly confusing conventions, are
noted.
(i) Given x, y ∈ S, one and only one of x < y, x = y, y < x is true
(ii) Given x, y, z ∈ S, then x < y and y < z imply x < z (transitivity)
S with such a relation < is called an ordered set. We write ≤ when equality is allowed.
Extrema: Notions of upper bounds, lower bounds, greatest lower bound (infimum), and least upper
bound (supremum) may be defined w.r.t. this ordering. Given E ⊂ S, we take by convention
inf ∅ = +∞ and sup ∅ = −∞
A set with the least upper bound property has the greatest lower bound property, and vice versa.
Moreover, given E ⊆ S nonempty and bounded below,
inf E = sup{ℓ ∈ S | ℓ is a lower bound of E}
i.e. the infimum is the supremum of the lower bounds. Likewise, the supremum is the infimum of
the upper bounds. (Rudin’s PMA, Thm. 1.11)
On Functions, Sets, & Cardinality:
◦ Rudin uses “onto” for surjections, “one-to-one” for injections, and “one-to-one correspondence”
for bijections.
◦ If ∃f : A → B a bijection, then A and B have equal cardinality and cardinal numbers, a relation
denoted by A ∼ B.
◦ Let Jn := {1, 2, · · ·, n} and J = Z+ = {1, 2, · · ·}. Then Rudin defines the following, given a set A:
A is finite if A ∼ Jn for some n ∈ N or if A = ∅
A is infinite otherwise
A is countable if A ∼ J (i.e. countable means countably infinite)
A is uncountable if neither finite nor countable
A is at most countable if finite or countable
A is enumerable/denumerable if countable
◦ Formally, a sequence (an )_{n=1}^{∞} is just a function f : N → S (for some S in which the sequence
lives) where f (n) = an .
◦ If B ⊆ A and A is countable and B is infinite, then B is countable. (Rudin’s PMA, Thm. 2.8)
◦ If {En }n∈N is a family of countable sets, then ∪_{n∈N} En is countable. (Rudin’s PMA, Thm. 2.12)
§14.2: (Baby Rudin, Chapter 2) Metric Spaces & Topology
On Metric Spaces: A Definition-Dump: Assume we are working in a metric space (X, d) with
E ⊆ X unless stated otherwise.
◦ Limit Point: p is a limit point of E if every neighborhood of p contains a point q ̸= p with q ∈ E.
If p ∈ E is not a limit point, p is said to be an isolated point. We may write E ′ as the set of its limit
points.
◦ Closed: E is said to be closed if it contains all its limit points, i.e. E ′ ⊆ E.
◦ Interior: p is said to be an interior point of E if there is a neighborhood of p contained entirely
in E, i.e.
∃r > 0 such that Nr (p) ⊆ E
◦ Open: E is open if all of its points are interior points.
◦ Complement: E c , defined as those points in the grander space not in E.
◦ Perfect: E is said to be perfect if it is closed and all points of E are limit points, i.e. E ′ = E.
◦ Bounded: E is said to be bounded if ∃M ∈ R and q ∈ X such that d(p, q) < M for each p ∈ E.
◦ Dense: E is dense in X if each point of X is a limit point of E or lies in E, or both, i.e.
X = E ∪ E′.
◦ Closure: The closure of E is itself alongside its limit points, i.e. E := E ∪ E ′ .
◦ Open Cover: An open cover of E is a set {Gα }α∈A of open sets in X where E ⊆ ∪_{α∈A} Gα .
◦ Compact: K is said to be compact in X if open covers of K contain finite subcovers, i.e. if
given {Gα }α∈A an open cover of K, one can find G1 , · · ·, Gn ∈ {Gα }α∈A such that {Gi }_{i=1}^{n} is
also an open cover of K, so K ⊆ ∪_{i=1}^{n} Gi .
◦ Connected & Separated: A, B ⊆ X are said to be separated if Ā ∩ B = A ∩ B̄ = ∅. (Note
that separated =⇒ disjoint, but not the converse, e.g. [0, 1] and (1, 2).)
We say that E is a connected set if it cannot be written as a union of two, nonempty, separated
sets.
◦ If p ∈ E ′ , then each neighborhood of p has infinitely many points of E. (Rudin’s PMA, Thm.
2.20)
◦ E is closed iff E c is open, and E is open iff E c is closed. (Rudin’s PMA, Thm. 2.23)
◦ Arbitrary unions of open sets are open (Rudin’s PMA, Thm. 2.24a)
◦ Arbitrary intersections of closed sets are closed (Rudin’s PMA, Thm. 2.24b)
◦ Finite intersections of open sets are open (Rudin’s PMA, Thm. 2.24c)
◦ Finite unions of closed sets are closed (Rudin’s PMA, Thm. 2.24d)
◦ Closures of sets are closed (Rudin’s PMA, Thm. 2.27a)
◦ E is closed iff E = Ē (Rudin’s PMA, Thm. 2.27b)
◦ Ē is the smallest closed set containing E, i.e. if E ⊆ F and F is closed then Ē ⊆ F (Rudin’s
PMA, Thm. 2.27c)
◦ If E ⊆ R is nonempty and bounded above, then sup E ∈ Ē (Rudin’s PMA, Thm. 2.28)
◦ Compact sets are closed. (Rudin’s PMA, Thm. 2.34)
◦ Closed subsets of compact sets are themselves compact. (Rudin’s PMA, Thm. 2.35)
◦ If {Kα }α∈A are all compact, and every finite subcollection {Ki }_{i=1}^{n} ⊆ {Kα }α∈A has ∩_{i=1}^{n} Ki ̸= ∅, then ∩_{α∈A} Kα ̸= ∅.
(Rudin’s PMA, Thm. 2.36)
◦ If E ⊆ K, E infinite and K compact, then E has a limit point in K. (Rudin’s PMA, Thm. 2.37)
◦ Heine-Borel: Given E ⊆ Rn , the following are equivalent: (Rudin’s PMA, Thm. 2.41)
(i) E is closed and bounded
(ii) E is compact
(iii) Infinite subsets of E have limit points in E
(Heine-Borel is typically just taken as the equivalence of the first two.)
◦ Due to Weierstrass: Bounded infinite subsets of Rn have limit points in Rn . (Rudin’s PMA,
Thm. 2.42)
◦ Nonempty perfect sets are uncountable. (Rudin’s PMA, Thm. 2.43)
◦ The interior E ◦ is open (Rudin’s PMA, Prob. 2.9a)
◦ E is open ⇐⇒ E ◦ = E (Rudin’s PMA, Prob. 2.9b)
◦ If G ⊆ E with G open, then G ⊆ E ◦ (Rudin’s PMA, Prob. 2.9c)
Hence, E ◦ is the largest open set contained in E
◦ (E ◦ )^c = (E^c )^− (Rudin’s PMA, Prob. 2.9d)
§14.3: (Baby Rudin, Chapter 3) Sequences & Series
Basic Definitions:
We assume we’re working in a metric space (X, d) unless stated otherwise. Let {pn }n∈N live in X.
Kinds of Sequences:
◦ Convergent: We say {pn }n∈N converges if ∃p ∈ X such that ∀ε > 0, ∃N ∈ N with d(pn , p) < ε for all n ≥ N .
We say {pn }n∈N is divergent otherwise. Note that p is the limit, and must live in the space. We
denote this relationship by
lim_{n→∞} pn = p or pn → p as n → ∞
◦ Bounded: We say {pn }n∈N is bounded if it is bounded as a subset of X. We may say that the
points {p1 , p2 , · · ·} form the range of the sequence.
◦ Monotonicity: We say {pn }n∈N ⊆ R is monotonically increasing if pn ≤ pn+1 for each n,
and monotonically decreasing if pn ≥ pn+1 for each n.
Cauchy Criterion: Theorem 3.11 gives us that {pn }n∈N is convergent in Rn iff {pn }n∈N is Cauchy.
This is called the Cauchy criterion for convergence.
Completeness: A metric space where all Cauchy sequences in the space converge (in the space) is
said to be complete.
Limit Supremum / Limit Infimum: Define, for a given {xn }n∈N ⊆ R,
E := { x ∈ R ∪ {±∞} | ∃{xn_k}k∈N a subsequence of {xn}n∈N such that xn_k → x as k → ∞ }
Then we define the limit supremum and limit infimum of {xn}n∈N by
lim sup_{n→∞} xn := sup E and lim inf_{n→∞} xn := inf E
These extrema are uniquely defined, and lie in E. (Rudin's PMA, Thm. 3.17)
Series: Remember that
∑_{n=p}^{q} an := ap + a_{p+1} + ··· + aq
∑_{n=1}^{∞} an := lim_{N→∞} ∑_{n=1}^{N} an
Power Series: Given {cn}n∈N ⊆ C a sequence of coefficients, its power series is given by ∑_{n=0}^∞ cn z^n. Applying the root (or, where conclusive, the ratio) test to the series yields
R := 1 / (lim sup_{n→∞} |cn|^{1/n}), if lim sup_{n→∞} |cn|^{1/n} ∈ (0, ∞)
R := ∞, if lim sup_{n→∞} |cn|^{1/n} = 0
R := 0, if lim sup_{n→∞} |cn|^{1/n} = ∞
We say R is the radius of convergence of the series, i.e. it converges for |z| < R and diverges for |z| > R. It may or may not converge for |z| = R.
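As a quick numerical sanity check (an illustration of mine, not from the text): one can estimate lim sup_{n→∞} |cn|^{1/n} directly, working with logarithms to avoid overflow. Here, with the sample coefficients cn := n·2^n, the root test gives R = 1/2:

```python
import numpy as np

# Root-test estimate of the radius of convergence of sum c_n z^n.
# For c_n = n * 2^n, lim sup |c_n|^{1/n} = 2, hence R = 1/2.
n = np.arange(1, 200001, dtype=np.float64)
log_c = np.log(n) + n * np.log(2.0)    # log|c_n|, avoiding computing 2^n directly
roots = np.exp(log_c / n)              # |c_n|^{1/n}
print(roots[-1])                       # -> 2.0000..., approaching 2
print(1.0 / roots[-1])                 # -> 0.4999..., approaching R = 1/2
```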
Absolute Convergence: If ∑|an| converges, we say ∑an converges absolutely.
Basic Results:
We assume, unless stated otherwise, that we’re working in a metric space (X, d), and sequences live in
that space.
The obvious arithmetic properties of sequences hold in R^m and C. So, if (Rudin's PMA, Thm. 3.3)
{xn}n∈N, {yn}n∈N ⊆ C, α, β ∈ C, xn → x, yn → y
then
αxn + βyn → αx + βy
xn yn → xy
1/xn → 1/x, provided x ≠ 0 and xn ≠ 0 for all n
And, if we define
{xn}n∈N, {yn}n∈N ⊆ R^m, {γn}n∈N ⊆ R, α, β, γ ∈ R
xn := [ξ1^(n), ξ2^(n), ···, ξm^(n)] ∈ R^m
yn := [η1^(n), η2^(n), ···, ηm^(n)] ∈ R^m
x := [ξ1, ···, ξm] ∈ R^m, y := [η1, ···, ηm] ∈ R^m
xn → x, yn → y, γn → γ
then (Rudin's PMA, Thm. 3.4a)
xn → x ⇐⇒ ∀i, we have componentwise convergence: ξi^(n) → ξi
and (Rudin's PMA, Thm. 3.4b)
αxn + βyn → αx + βy
γn xn → γx
⟨xn, yn⟩_{R^m} → ⟨x, y⟩_{R^m}
Series also have their obvious arithmetic properties if convergent. Suppose (Rudin's PMA, Thm. 3.47)
∑an = A, ∑bn = B, α, β ∈ C
Then
∑(αan + βbn) = αA + βB
One also forms the Cauchy product of two series:
(∑_{n=0}^∞ an)(∑_{n=0}^∞ bn) = ∑_{n=0}^∞ (∑_{k=0}^n ak b_{n−k})   (Cauchy product)
an identity valid under the hypotheses of Thms. 3.50/3.51 below.
Note that the product of convergent series may diverge: consider the Cauchy product of
∑_{n=0}^∞ (−1)^n / √(n + 1)
with itself. Its terms cn satisfy |cn| ≥ 2(n + 1)/(n + 2) ↛ 0, so the product series diverges by the nth term test.
The collection of subsequential limits of a given sequence forms a closed set (Rudin’s PMA, Thm. 3.7)
{pn}n∈N is Cauchy iff, for EN := {pi}_{i=N}^∞, we have diam EN → 0 as N → ∞.
Given a monotone sequence, it converges iff it is bounded. (Rudin’s PMA, Thm. 3.14)
Items on Limit Supremum & Limit Infimum:
Suppose we have {sn }n∈N , {tn }n∈N ⊆ R with sn ≤ tn for all n large enough. Then (Rudin’s PMA,
Thm. 3.19)
lim inf sn ≤ lim inf tn lim sup sn ≤ lim sup tn
n→∞ n→∞ n→∞ n→∞
We have, given {an}n∈N, {bn}n∈N ⊆ R, (Rudin's PMA, Prob. 3.5)
lim sup_{n→∞} (an + bn) ≤ lim sup_{n→∞} an + lim sup_{n→∞} bn
provided the right-hand side is not of the form ∞ − ∞.
Series Convergence Tests & Such:
Cauchy Criterion: ∑an converges iff (∀ε > 0)(∃N ∈ N) m ≥ n ≥ N =⇒ |∑_{i=n}^m ai| ≤ ε (Rudin's PMA, Thm. 3.22)
nth Term Test: ∑an converges =⇒ an → 0 (not the converse) (Rudin's PMA, Thm. 3.23)
If an ≥ 0, then ∑an converges iff the partial sums form a bounded sequence (Rudin's PMA, Thm. 3.24)
Comparison Tests: If |an| ≤ cn eventually, then ∑cn converging implies the same for ∑an. Likewise, if an ≥ dn ≥ 0 eventually and ∑dn diverges, so must ∑an. (Rudin's PMA, Thm. 3.25)
Cauchy Condensation: Suppose a1 ≥ a2 ≥ ··· ≥ 0. Then ∑_{n=1}^∞ an converges iff ∑_{k=0}^∞ 2^k a_{2^k} does. (Rudin's PMA, Thm. 3.27)
Ratio Test: If lim sup_{n→∞} |a_{n+1}/an| < 1, then we have convergence. If |a_{n+1}/an| ≥ 1 for all n eventually, we have divergence. (Rudin's PMA, Thm. 3.34)
The ratio test is weaker than the root test; the root test works on more series (e.g. ∑ 2^{(−1)^n − n}, where the ratios oscillate but the nth roots tend to 1/2), and comes to the same conclusions as the ratio test whenever the latter comes to a conclusion at all.
Summation by Parts: Given {an}n∈N, {bn}n∈N, define (Rudin's PMA, Thm. 3.41)
An := ∑_{k=0}^n ak for n ≥ 0, and A_{−1} := 0
Then, for 0 ≤ p ≤ q,
∑_{n=p}^q an bn = ∑_{n=p}^{q−1} An (bn − b_{n+1}) + Aq bq − A_{p−1} bp
Dirichlet's Test: Use {an}n∈N, {bn}n∈N, {An}n∈N as given above, with {An}n∈N a bounded sequence, {bn}n∈N monotone decreasing, and bn → 0. Then ∑an bn converges. (Rudin's PMA, Thm. 3.42)
Alternating Series Test: Given {cn}n∈N with |c1| ≥ |c2| ≥ ···, odd-index terms nonnegative, even-index terms nonpositive, and cn → 0, then ∑cn converges. (Rudin's PMA, Thm. 3.43)
If ∑cn z^n has coefficients decreasing to 0 and radius of convergence 1, then the series converges everywhere on the circle |z| = 1, except possibly at z = 1. (Rudin's PMA, Thm. 3.44)
Absolute Convergence Test: A series which converges absolutely is convergent.
Suppose ∑an converges absolutely, ∑an = A, ∑bn = B, and cn := ∑_{k=0}^n ak b_{n−k}. Then
∑_{n=0}^∞ cn = AB
i.e. this product will converge, to the result we expect. (Rudin's PMA, Thm. 3.50)
If ∑an, ∑bn, ∑cn converge to A, B, C respectively, and cn = ∑_{k=0}^n ak b_{n−k}, then C = AB. (Rudin's PMA, Thm. 3.51)
If ∑an converges and {bn}n∈N is a monotonic bounded sequence, then ∑an bn converges. (Rudin's PMA, Prob. 3.8)
If ∑an, ∑bn are absolutely convergent, so is their Cauchy product. (Rudin's PMA, Prob. 3.13)
More Important Results:
Baire's Theorem: Take X a nonempty complete metric space, and {Gn}n∈N ⊆ P(X) a sequence of dense, open subsets. Then ⋂_{i=1}^∞ Gi ≠ ∅, and is in fact dense in X. (Rudin's PMA, Prob. 3.22)
Notable Sequences & Series:
For sequences, as in Theorems 3.20, 3.31:
1/n^p → 0 for p > 0
p^{1/n} → 1 for p > 0
n^{1/n} → 1
n^α/(1 + p)^n → 0 for p > 0 and α ∈ R
x^n → 0 for |x| < 1
(1 + 1/n)^n → e
§14.4: (Baby Rudin, Chapter 4) Continuity
Some Conventions:
If not otherwise stated:
E ⊆ dom(f ) = X
Fundamental Definitions:
Uniform Continuity: f : X → Y is uniformly continuous on X if
∀ε > 0, ∃δ > 0 such that ∀p, q ∈ X, dX(p, q) < δ =⇒ dY(f(p), f(q)) < ε
Note that this δ works for all p, q ∈ X, i.e. is independent of the point chosen. This is strictly stronger than ordinary continuity.
One-Sided Limits: Take f : (a, b) → Y. For x ∈ [a, b) and x ∈ (a, b] respectively, we say
f(x⁺) = q ⇐⇒ ∀{tn}n∈N ⊆ (x, b) with tn → x we have f(tn) → q
f(x⁻) = q ⇐⇒ ∀{tn}n∈N ⊆ (a, x) with tn → x we have f(tn) → q
Note that lim_{t→x} f(t) exists ⇐⇒ f(x⁺) = f(x⁻) = lim_{t→x} f(t)
Limit of a Function:
Given f : X → Y a function of metric spaces and p ∈ E′ (E ⊆ X), we say that
f(x) → q as x → p, written f(x) → q or lim_{x→p} f(x) = q,
if f(pn) → q for every {pn}n∈N ⊆ E with pn ≠ p for all n but pn → p.
Hence, if f has a limit as x → p, then the limit is unique.
Take f, g : X → C with E ⊆ X, p ∈ E′, lim_{x→p} f(x) = A, and lim_{x→p} g(x) = B. (Rudin's PMA, Thm. 4.4)
Then:
lim_{x→p} (f + g)(x) = A + B
lim_{x→p} (f g)(x) = AB
lim_{x→p} (f/g)(x) = A/B, if B ≠ 0
For a shorthand,
C(X, Y ) := {f : X → Y | f is continuous}
C(X) := C(X, X)
Continuous Functions:
Some results:
While not commented upon, Lipschitz continuity is used and is convenient: it is stronger than ordinary continuity. Recall: f : X → Y is Lipschitz continuous with constant L if
∀x, y ∈ X we have dY(f(x), f(y)) ≤ L · dX(x, y)
Suppose f, g : X → C are continuous. Then the following are too: (Rudin's PMA, Thm. 4.9)
f ± g
f g
f/g, provided g ≠ 0 on X
f is continuous iff the preimage of open sets is open, i.e. f −1 (V ) is open in dom(f ) for each open V in
cod(f ) (Rudin’s PMA, Thm. 4.8)
f is continuous iff the preimage of closed sets is closed, i.e. f −1 (C) is closed in dom(f ) for each closed
C in cod(f )
If a function f : X → R^k is defined componentwise by f = (f1, ···, fk), then f is continuous iff each component fi : X → R is continuous. (Rudin's PMA, Thm. 4.10)
Interactions Between Compactness & Continuity:
Continuous functions preserve compactness. (If X is compact and f ∈ C(X, Y ), then f (X) is compact.)
(Rudin’s PMA, Thm. 4.14)
Hence if f ∈ C(X, Rn ) for X compact, then f (X) is closed & bounded. (Rudin’s PMA, Thm. 4.15)
If f ∈ C(X, R) for X compact, then ∃x∗, x^∗ ∈ X such that f(x∗) = inf_{x∈X} f(x) and f(x^∗) = sup_{x∈X} f(x), i.e. f attains its extrema. (Rudin's PMA, Thm. 4.16)
If f ∈ C(X, Y ) for X compact, then f is uniformly continuous on X. (Rudin’s PMA, Thm. 4.19)
That is, continuity on a compact set gives uniform continuity there.
If E ⊆ R is not compact, then: (Rudin's PMA, Thm. 4.20)
(i) ∃f ∈ C(E, R) which is not bounded;
(ii) ∃f ∈ C(E, R) which is bounded but attains no maximum; and
(iii) if E is additionally bounded, ∃f ∈ C(E, R) which is not uniformly continuous.
Interactions Between Connectedness & Continuity:
For f monotone-increasing on (a, b), f(x⁺) and f(x⁻) exist everywhere, with
sup_{t∈(a,x)} f(t) = f(x⁻) ≤ f(x) ≤ f(x⁺) = inf_{t∈(x,b)} f(t)
and, if a < x < y < b, then f(x⁺) ≤ f(y⁻). (Rudin's PMA, Thm. 4.29)
An analogous result holds in the decreasing case.
As a result, monotone functions only have simple discontinuities.
Monotone functions have at-most-countably-many discontinuities. (Rudin’s PMA, Thm. 4.30)
Other Properties from the Exercises:
For E compact, f ∈ C(E, R) if and only if graph(f ) is compact. (Rudin’s PMA, Prob. 4.6)
If f : E ⊆ R → R is uniformly continuous on E, then f is bounded on E. (Rudin’s PMA, Prob. 4.8)
f : X → Y is uniformly continuous iff ∀ε > 0, ∃δ > 0 such that, if E ⊆ X has diam E < δ, then
diam f (E) < ε (Rudin’s PMA, Prob. 4.9)
If f : X → Y is uniformly continuous and {xn }n∈N is Cauchy in X, then {f (xn )}n∈N is Cauchy in Y .
(Rudin’s PMA, Prob. 4.11)
The composition of uniformly continuous functions is uniformly continuous. (Rudin’s PMA, Prob.
4.12)
For f convex on (a, b), with a < s < t < u < b, then we have the three chord lemma: (Rudin’s PMA,
Prob. 4.23)
(f(t) − f(s))/(t − s) ≤ (f(u) − f(s))/(u − s) ≤ (f(u) − f(t))/(u − t)
Conversely, if these inequalities hold for all a < s < t < u < b, then f is convex.
For A, B ⊆ Rn , we define (Rudin’s PMA, Prob. 4.25)
A + B := {a + b | a ∈ A, b ∈ B}
§14.5: (Baby Rudin, Chapter 5) Differentiation in R
Fundamental Definitions:
Difference Quotient; Derivative: Given f : [a, b] → R and x ∈ [a, b], define the difference quotient φ : (a, b)\{x} → R by
φ(t) := (f(t) − f(x))/(t − x) for all t ∈ (a, b), t ≠ x
and the derivative (provided the limit exists)
f′(x) := lim_{t→x} φ(t) = lim_{t→x} (f(t) − f(x))/(t − x)
with f′ : D → R where D ⊆ (a, b) is those points where the limit exists.
If f ′ is defined at x, we say f is differentiable at x, and differentiable on E if differentiable at each
x ∈ E.
One-sided (left- and right-hand) derivatives arise in the obvious way, with limits as t → x⁻ and t → x⁺ respectively. Rudin does not discuss these in any detail, and leaves differentiability at the endpoints of intervals mostly untouched (implicitly treating those derivatives as undefined).
Passage to C/R^n-Valued Functions: For f : [a, b] → C in the form f = f1 + if2 (as a decomposition into real and imaginary parts), then
f′ = f1′ + if2′
with differentiability at z ∈ [a, b] iff f1, f2 are differentiable at z. (This can be proven.) The derivative is still defined in the usual way for C-valued functions, i.e.
f′(z) := lim_{t→z} (f(t) − f(z))/(t − z)
If f : [a, b] → R^n, then we define the derivative via the norm. Given such f, we say f′(x) is the point of R^n such that
lim_{t→x} ‖ (f(t) − f(x))/(t − x) − f′(x) ‖₂ = 0
if said point exists. Of course, as with C-valued functions, R^n-valued functions f = (f1, ···, fn) are differentiable at x ∈ [a, b] iff they are differentiable in each component at x.
Extrema Definitions: For f : X → R with (X, d) a metric space, we say the following of p ∈ X:
◦ p is a local maximum of f if ∃δ > 0 such that f (q) ≤ f (p) for all q ∈ B(p; δ).
◦ p is a local minimum of f if ∃δ > 0 such that f (q) ≥ f (p) for all q ∈ B(p; δ).
Elementary Results:
(i) Sum Rule: (f + g)′(x) = f′(x) + g′(x) (Rudin's PMA, Thm. 5.3a)
(ii) Product Rule: (f g)′(x) = f′(x)g(x) + f(x)g′(x) (Rudin's PMA, Thm. 5.3b)
(iii) Product Rule (Vector Functions): For f, g : [a, b] → R^n differentiable at x, then
(f · g)′(x) := d/dx ⟨f(x), g(x)⟩_{R^n} = ⟨f′(x), g(x)⟩_{R^n} + ⟨f(x), g′(x)⟩_{R^n}
(iv) Quotient Rule: (f/g)′(x) = (f′(x)g(x) − f(x)g′(x)) / g²(x), if g(x) ≠ 0 (Rudin's PMA, Thm. 5.3c)
(v) Chain Rule: If f ∈ C[a, b] is differentiable at x, and g : I ⊇ range(f) → R is differentiable at f(x), then h := g ∘ f is differentiable at x, with (Rudin's PMA, Thm. 5.5)
h′(x) = g′(f(x)) · f′(x)
(vi) L'Hopital's Rule: Suppose f, g : (a, b) → R are differentiable, with g′ ≠ 0 on (a, b). (Here, a < b and a, b ∈ [−∞, +∞].) Suppose (Rudin's PMA, Thm. 5.13)
lim_{x→a} f′(x)/g′(x) = A
Then if
lim_{x→a} f(x) = lim_{x→a} g(x) = 0, or lim_{x→a} g(x) = ∞
then
lim_{x→a} f(x)/g(x) = A = lim_{x→a} f′(x)/g′(x)
Monotonicity Relations: For f differentiable on (a, b): (Rudin's PMA, Thm. 5.11)
f′ ≥ 0 on (a, b) =⇒ f is monotonically increasing
f′ ≡ 0 on (a, b) =⇒ f is constant
f′ ≤ 0 on (a, b) =⇒ f is monotonically decreasing
More Important Results:
The "usual" mean value theorem arises for g(x) = x: for f ∈ C[a, b] differentiable on (a, b), ∃x ∈ (a, b) such that (Rudin's PMA, Thm. 5.10)
(f(b) − f(a))/(b − a) = f′(x)
Mean Value Theorem (Vector Case): Take f ∈ C([a, b], Rn ) differentiable on (a, b). Then
∃x ∈ (a, b) such that (Rudin’s PMA, Thm. 5.19)
‖f(b) − f(a)‖₂ / (b − a) ≤ ‖f′(x)‖₂
Taylor's Theorem: Let us have: (Rudin's PMA, Thm. 5.15)
◦ f : [a, b] → R
◦ n ∈ Z+
◦ f (n−1) ∈ C[a, b]
◦ f (n) (t) exists ∀t ∈ (a, b)
◦ α, β ∈ [a, b], α ̸= β
◦ P(t) := ∑_{k=0}^{n−1} (f^(k)(α)/k!) (t − α)^k
Then ∃x strictly between α and β such that
f(β) = P(β) + (f^(n)(x)/n!) (β − α)^n
§14.6: (Baby Rudin, Chapter 6) Riemann(-Stieltjes) Integration
Assumptions:
Unless stated otherwise:
f : [a, b] → R is bounded
α : [a, b] → R is monotone-increasing
Various notations will exist for the Riemann(-Stieltjes) integral. Rudin uses all of these equivalently:
∫ f, ∫ f dα, ∫ f dα(x), ∫ f(x) dα(x), ∫_a^b f, ∫_a^b f dα, ∫_a^b f dα(x), ∫_a^b f(x) dα(x)
Basic Definitions:
Partitions, Upper/Lower Sums, & Related Terms: Given an interval [a, b] ⊆ R, we define a partition of it to be a finite set of points P := {xi}_{i=0}^n where
a = x0 ≤ x1 ≤ ··· ≤ x_{n−1} ≤ xn = b
We also define (with P_{a,b} the set of all partitions of [a, b]):
Δxi := xi − x_{i−1}
Δfi := f(xi) − f(x_{i−1})
Mi := sup_{x∈[x_{i−1},xi]} f(x)
mi := inf_{x∈[x_{i−1},xi]} f(x)
U(P, f) := ∑_{i=1}^n Mi Δxi (the upper (Darboux) sum of f over P)
L(P, f) := ∑_{i=1}^n mi Δxi (the lower (Darboux) sum of f over P)
∫̄_a^b f dx := inf_{P∈P_{a,b}} U(P, f) (the upper (Darboux) integral of f)
∫̲_a^b f dx := sup_{P∈P_{a,b}} L(P, f) (the lower (Darboux) integral of f)
Riemann-Integrable: If ∫̄_a^b f dx = ∫̲_a^b f dx, then we say f is Riemann-integrable on [a, b] and write f ∈ R (the class of all Riemann-integrable functions) or f ∈ R[a, b] (those Riemann-integrable on [a, b]), and we write their common value as
∫_a^b f dx := ∫̄_a^b f dx ≡ ∫̲_a^b f dx
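A small computational illustration (mine, not Rudin's): for the increasing function f(x) = x² on [0, 1], the sup and inf on each subinterval are attained at the endpoints, so the upper and lower Darboux sums are exact and bracket the integral 1/3 ever more tightly as the partition refines.

```python
import numpy as np

# Upper/lower Darboux sums of the increasing f(x) = x^2 on [0, 1] over a
# uniform partition; m_i = f(x_{i-1}) and M_i = f(x_i) since f is increasing.
def darboux_sums(f, a, b, n):
    xs = np.linspace(a, b, n + 1)
    dx = np.diff(xs)
    lower = np.sum(f(xs[:-1]) * dx)   # L(P, f)
    upper = np.sum(f(xs[1:]) * dx)    # U(P, f)
    return lower, upper

for n in (4, 16, 64, 256):
    print(n, darboux_sums(lambda x: x * x, 0.0, 1.0, n))  # both -> 1/3
```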
Terms for Riemann-Stieltjes Integrals: Given α : [a, b] → R monotone increasing, a function f : [a, b] → R, and a partition P := {xi}_{i=0}^n of [a, b], we define Δxi, mi, Mi as before, alongside Δαi := α(xi) − α(x_{i−1}). We then define, analogously, emphasizing dependence on α,
U(P, f, α) := ∑_{i=1}^n Mi Δαi (the upper (Darboux-Stieltjes) sum of f over P)
L(P, f, α) := ∑_{i=1}^n mi Δαi (the lower (Darboux-Stieltjes) sum of f over P)
∫̄_a^b f dα := inf_{P∈P_{a,b}} U(P, f, α) (the upper (Darboux-Stieltjes) integral of f)
∫̲_a^b f dα := sup_{P∈P_{a,b}} L(P, f, α) (the lower (Darboux-Stieltjes) integral of f)
Riemann-Stieltjes Integrable: If ∫̄_a^b f dα = ∫̲_a^b f dα, we denote their common value by
∫_a^b f(x) dα(x) ≡ ∫_a^b f dα := ∫̄_a^b f dα ≡ ∫̲_a^b f dα
which is the Riemann-Stieltjes integral of f w.r.t. α over [a, b]. Rudin chooses to say that f is
integrable w.r.t. α in the Riemann sense, denoted f ∈ R(α).
Note that the ordinary Riemann integral arises for α ≡ id.
Refinement: Given a partition P of [a, b], a partition P ∗ is said to be a refinement of P if P ⊆ P ∗ ,
i.e. it has the same points and possibly more.
Given P1 , P2 partitions of [a, b], their common refinement is the partition P1 ∪ P2 .
Basic Results:
Inequalities for Upper/Lower Sums: If f(x) ∈ [m, M] for all x ∈ [a, b], then
m (α(b) − α(a)) ≤ L(P, f, α) ≤ U(P, f, α) ≤ M (α(b) − α(a))
Moreover, if P∗ is a refinement of P, then (Rudin's PMA, Thm. 6.4)
L(P, f, α) ≤ L(P∗, f, α) and U(P∗, f, α) ≤ U(P, f, α)
That is, the lower sums only grow with new partition points, and the upper sums only shrink.
Inequality of the Upper/Lower Integrals: ∫̲_a^b f dα ≤ ∫̄_a^b f dα (Rudin's PMA, Thm. 6.5)
Properties of the Integral:
Linearity in Integrand: Take f, g ∈ R(α) on [a, b], and β, γ ∈ R. Then (Rudin's PMA, Thm. 6.12)
βf + γg ∈ R(α) on [a, b]
∫_a^b (βf + γg) dα = β ∫_a^b f dα + γ ∫_a^b g dα
Linearity in Weight Function: Let f ∈ R(α1) ∩ R(α2) on [a, b] and c1, c2 ≥ 0; then f ∈ R(c1α1 + c2α2) on [a, b], with (Rudin's PMA, Thm. 6.12)
∫_a^b f d(c1α1 + c2α2) = c1 ∫_a^b f dα1 + c2 ∫_a^b f dα2
Monotonicity: If f ≤ g for f, g ∈ R(α) on [a, b], then (Rudin's PMA, Thm. 6.12)
∫_a^b f dα ≤ ∫_a^b g dα
Additivity: If f ∈ R(α) on [a, b], and c ∈ (a, b), then f ∈ R(α) on both [a, c] and [c, b], with (Rudin's PMA, Thm. 6.12)
∫_a^c f dα + ∫_c^b f dα = ∫_a^b f dα
Boundedness: For f ∈ R(α) with |f(x)| ≤ M on [a, b], then (Rudin's PMA, Thm. 6.12)
|∫_a^b f dα| ≤ M (α(b) − α(a))
Triangle Inequality: For f ∈ R(α) on [a, b], we have |f| ∈ R(α) with (Rudin's PMA, Thm. 6.13)
|∫_a^b f dα| ≤ ∫_a^b |f| dα
Product is Integrable: If f, g ∈ R(α) on [a, b], then f g ∈ R(α) too. (Rudin's PMA, Thm. 6.13)
If α is monotone increasing with derivative α′ ∈ R[a, b], and f : [a, b] → R is bounded, then: (Rudin's PMA, Thm. 6.17)
◦ f ∈ R(α) ⇐⇒ f α′ ∈ R
◦ ∫_a^b f dα = ∫_a^b f(x) α′(x) dx
Change of Variables: Let φ : [A, B] → [a, b] be a strictly increasing continuous bijection, let f ∈ R(α) on [a, b], and define β := α ∘ φ and g := f ∘ φ. (Rudin's PMA, Thm. 6.19)
Then g ∈ R(β) with
∫_A^B g dβ = ∫_a^b f dα
Without naming β, g: f ∘ φ ∈ R(α ∘ φ) with
∫_A^B (f ∘ φ) d(α ∘ φ) = ∫_{φ(A)}^{φ(B)} f dα
Hence, in particular, for continuously differentiable increasing φ,
∫_{φ(A)}^{φ(B)} f(x) dx = ∫_A^B f(φ(y)) φ′(y) dy
Integration by Parts: Let F, G : [a, b] → R be differentiable with (Rudin's PMA, Thm. 6.22)
F′ = f, G′ = g, and f, g ∈ R
Then
∫_a^b F(x)g(x) dx = F(b)G(b) − F(a)G(a) − ∫_a^b f(x)G(x) dx
Stated in a slightly less general way,
∫_a^b f′(x)g(x) dx = [f(x)g(x)]_a^b − ∫_a^b f(x)g′(x) dx
(This follows immediately from the fundamental theorem applied to H(x) := F (x)G(x).)
Cauchy Criterion for Integrability: f ∈ R(α) on [a, b] if and only if (Rudin’s PMA, Thm. 6.6)
(∀ε > 0)(∃P a partition of [a, b]) U (P, f, α) − L(P, f, α) < ε
The Fundamental Theorem:
Fundamental Theorem, Part 1: Let f ∈ R[a, b], and define (Rudin's PMA, Thm. 6.20)
F : [a, b] → R by the rule F(x) := ∫_a^x f(t) dt
Then F ∈ C[a, b]; and if f is continuous at x0 ∈ [a, b], then F is differentiable at x0 with F′(x0) = f(x0).
Fundamental Theorem, Part 2: If f ∈ R[a, b] and there is a differentiable F : [a, b] → R with F′ = f, then ∫_a^b f(t) dt = F(b) − F(a). (Rudin's PMA, Thm. 6.21)
Integration of Vector-Valued Functions: For f = (f1, ···, fk) : [a, b] → R^k, we say f ∈ R(α) iff each fi ∈ R(α), and set ∫_a^b f dα := (∫_a^b f1 dα, ···, ∫_a^b fk dα), i.e. the integral is defined componentwise. Many of the same results hold for such integrals as they did in one dimension.
Application: Rectifiable Curves:
Some definitional groundwork, first:
Curves, Arcs: Given γ : [a, b] → R^n continuous, we say γ is a curve in R^n on [a, b]. If γ is injective, we may further say it is an arc. If γ(a) = γ(b) (i.e. it starts and stops at the same place), we will say γ is a closed curve.
Note that a curve is defined to be a function: the set of points on it (its range) may not be unique to that function.
Length of a Curve: Given a partition P := {xi}_{i=0}^n of [a, b], and a curve γ on [a, b], let
Λ(P, γ) := ∑_{i=1}^n |γ(xi) − γ(x_{i−1})|
This is the length of the polygonal path with vertices γ(x0), ···, γ(xn), in order. One can then define the length of γ by
Λ(γ) := sup_{P∈P_{a,b}} Λ(P, γ)
and we say γ is rectifiable when Λ(γ) < ∞.
Some results:
If a curve γ is continuously differentiable, i.e. γ ∈ C¹[a, b], i.e. γ, γ′ ∈ C[a, b], then γ is rectifiable and
Λ(γ) = ∫_a^b |γ′(t)| dt
which mirrors the ordinary formula from calculus. (Rudin's PMA, Thm. 6.27)
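For a concrete check (my own illustration, not from the text): the half-circle γ(t) = (cos t, sin t) on [0, π] has |γ′(t)| ≡ 1, so Λ(γ) = π, and a fine inscribed polygonal path approaches this supremum from below.

```python
import numpy as np

# Polygonal approximation of the length of gamma(t) = (cos t, sin t), t in [0, pi].
t = np.linspace(0.0, np.pi, 100001)
pts = np.stack([np.cos(t), np.sin(t)], axis=1)
poly_len = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
print(poly_len, np.pi)   # 3.14159265... vs pi; the polygon is slightly shorter
```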
§14.7: (Baby Rudin, Chapter 7) Sequences & Series of Functions
Given metric spaces X, Y and a sequence of functions {fn : E ⊆ X → Y }n∈N , we can define
fℓ(x) := lim_{n→∞} fn(x),  fs(x) := ∑_{n=1}^∞ fn(x)
to be the limit of the sequence and the sum of the series formed by the sequence, defined on some subset of
E. These are the central objects of study in this section.
Definitions:
Pointwise-Bounded: For a sequence of functions {fn }n∈N on E, we say {fn }n∈N is pointwise-
bounded if
∃φ : E → R such that ∀x ∈ E and ∀n ∈ N we have |fn (x)| < φ(x)
Uniformly-Bounded: For a sequence of functions {fn }n∈N on E, we say {fn }n∈N is uniformly-
bounded if
∃M ∈ R such that ∀x ∈ E and ∀n ∈ N we have |fn (x)| < M
Equicontinuity: For a family F of functions on E, we say F is equicontinuous if
∀ε > 0, ∃δ > 0 such that ∀x, y ∈ E and f ∈ F we have d(x, y) < δ =⇒ |f(x) − f(y)| < ε
Modes of Convergence:
We consider a sequence of functions {fn : E ⊆ X → Y }n∈N and a prospective limiting function f as
needed.
Pointwise Convergence: The most basic kind of convergence, and generally what is implied when no other qualifier is given. We say {fn}n∈N converges pointwise to f on E if
∀x ∈ E, ∀ε > 0, ∃N ∈ N such that n ≥ N =⇒ |fn(x) − f(x)| < ε
Uniform Convergence: We say {fn}n∈N converges uniformly to f on E if
∀ε > 0, ∃N ∈ N such that ∀n ≥ N and ∀x ∈ E, |fn(x) − f(x)| < ε
i.e. the N in the formal definition is sufficient for all x. So while in pointwise convergence N may be a function of ε and x, in uniform convergence N is only a function of ε.
One may likewise develop a criterion for the uniform convergence of series, using the sequence of partial
sums as the sequence of functions to converge uniformly.
If {fn}n∈N has uniformly-bounded partial sums, and gn → 0 uniformly with gn ≥ g_{n+1}, then ∑ fn gn converges uniformly.
Switching of Limits & Properties Preserved in Limits:
Is the limit of continuous/integrable/differentiable/etc. functions necessarily also continuous/integrable/differentiable/etc.?
In some respect, answering this amounts to asking whether we can interchange the order of two limits, and in general the answer is no.
Some notable examples/counterexamples:
fn(x) := sin(nx)/√n =⇒ fn′(x) = √n cos(nx)
We see
f(x) := lim_{n→∞} fn(x) = 0 =⇒ f′(x) = 0
However, fn′(0) = √n → ∞ ≠ 0 = f′(0), and {fn′(x)} fails to converge to f′(x) ≡ 0 wherever cos(nx) does not tend to 0.
Similarly, take fn(x) := n²x(1 − x²)^n on [0, 1]. Then ∫₀¹ lim_{n→∞} fn(x) dx = 0, since the integrand approaches zero at nonzero values (and is zero at zero). However,
lim_{n→∞} ∫₀¹ n²x(1 − x²)^n dx = lim_{n→∞} n²/(2n + 2) = +∞
Replacing n² with n, one still achieves an inequality, but with the latter limit evaluating to 1/2.
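A numerical spot-check of this counterexample (my own illustration): the computed integrals match the closed form n²/(2n + 2) and blow up even though fn → 0 pointwise.

```python
import numpy as np
from scipy.integrate import quad

# f_n(x) = n^2 x (1 - x^2)^n -> 0 pointwise on [0, 1], yet its integrals diverge.
for n in (10, 100, 1000):
    val, _ = quad(lambda x: n**2 * x * (1.0 - x * x) ** n, 0.0, 1.0)
    print(n, val, n**2 / (2.0 * n + 2.0))   # numeric vs closed form n^2/(2n+2)
```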
Uniform Convergence – Implications & Equivalences:
Cauchy Criterion for Uniform Convergence: Consider the sequence {fn : E → Y }n∈N . Then
{fn }n∈N converges uniformly on E if and only if ... (Rudin’s PMA, Thm. 7.8)
(∀ε > 0)(∃N ∈ N)(∀n, m ≥ N )(∀x ∈ E) |fn (x) − fm (x)| < ε
Uniformly Convergent iff Convergent in L∞: If one defines (Rudin's PMA, Thm. 7.9)
Mn := sup_{x∈E} |fn(x) − f(x)| =: ‖fn − f‖_{L∞(E)}
then fn → f uniformly iff Mn → 0.
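To illustrate the Mn criterion numerically (a sketch of mine, with the supremum approximated by a grid maximum): fn(x) = xⁿ → 0 pointwise on [0, 1), uniformly on [0, 0.9] but not uniformly as the interval approaches 1.

```python
import numpy as np

# M_n = sup_E |f_n - 0| for f_n(x) = x^n, approximated on a fine grid.
for hi in (0.9, 0.999):
    grid = np.linspace(0.0, hi, 10001)
    print("on [0, %g]:" % hi, [float(np.max(grid ** n)) for n in (10, 100, 1000)])
# M_n -> 0 quickly on [0, 0.9]; near 1 it decays very slowly (sup over [0,1) is 1).
```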
Weierstrass M-Test for Series: Let {fn : E → Y}n∈N satisfy, for some sequence {Mn}n∈N ⊆ R, |fn(x)| ≤ Mn for all x ∈ E. If ∑ Mn converges, then ∑ fn converges uniformly on E. (Rudin's PMA, Thm. 7.10)
Uniform Convergence – Consequences for Continuity:
We presume {fn }n∈N is a sequence of functions fn : E → Y for (X, d), (Y, ρ) metric spaces unless specified
otherwise. f will be a prospective limit likewise.
n→∞
Interchange of Limits is Valid: Let fn → f uniformly, with x ∈ E′ and (Rudin's PMA, Thm. 7.11)
lim_{t→x} fn(t) = An for each n ∈ N
Then {An}n∈N converges, with lim_{t→x} f(t) = lim_{n→∞} An. More explicitly,
lim_{t→x} lim_{n→∞} fn(t) = lim_{n→∞} lim_{t→x} fn(t)
Limit is Continuous: If {fn }n∈N ⊆ C(E) and fn → f uniformly, then f ∈ C(E). (Rudin’s PMA,
Thm. 7.12)
The converse need not be true (the convergence of continuous functions to another continuous function
may not be uniform).
However, if the domain is compact and fn ≥ fn+1 , with a pointwise limit f , with fn , f all continuous,
then the convergence is uniform. (Rudin’s PMA, Thm. 7.13)
Uniform Convergence – Consequences for Integration, Differentiation, & Series:
Can Interchange Limit & Riemann-Stieltjes Integral: For α monotone increasing on [a, b] with
fn ∈ R(α) on [a, b] and fn → f uniformly on [a, b], we have that f ∈ R(α) on [a, b] and (Rudin’s
PMA, Thm. 7.16)
∫_a^b f dα = ∫_a^b lim_{n→∞} fn dα = lim_{n→∞} ∫_a^b fn dα
Can Interchange Integral & Sum: For α monotone increasing on [a, b], with fn ∈ R(α) on [a, b]
and
f(x) = ∑_{n=1}^∞ fn(x), a uniformly-convergent series on [a, b],
then
∫_a^b f dα = ∫_a^b ∑_{n=1}^∞ fn dα = ∑_{n=1}^∞ ∫_a^b fn dα
§15: Items from Measure Theory
Recall that an interval I ⊆ R^n is a Cartesian product of intervals; we define it and its volume as so:
I := ∏_{i=1}^n [ai, bi] =⇒ v(I) := ∏_{i=1}^n (bi − ai)
Let S := {Ik}_{k=1}^∞ be an at-most-countable collection of intervals covering a set E ⊆ R^n, and let 𝒮 be the class of all such covers of E. Then we define the outer (or exterior) Lebesgue measure (denoted μe(E) or |E|e) by
σ(S) := ∑_{Ik∈S} v(Ik) and |E|e := inf_{S∈𝒮} σ(S)
|E|e = ∞ iff σ(S) = ∞ for each S ∈ S. Consequently, |E|e < ∞ if some S ∈ S has σ(S) < ∞.
At-most-countable sets E have |E|e = 0. (Sets of measure zero may be called “null sets” later.) The
converse is not true, e.g. the Cantor set.
Characterization via Open Sets: ∀ε > 0, ∃G open with E ⊆ G such that (Rudin's PMA, Thm. 3.6)
|E|e ≤ |G|e ≤ |E|e + ε, which implies we can say |E|e = inf_{open G⊇E} |G|e
Characterization via Gδ Sets: For each E ⊆ Rn there is a Gδ set H where E ⊆ H and |E|e = |H|e .
(Rudin’s PMA, Thm. 3.8)
Measure of Ascending Sequence: If Ek ↗ E then lim |Ek |e = |E|e
k→∞
§15.2: (Lebesgue) Measure & Measurability of Sets
Measurability: E ⊆ R^n is said to be (Lebesgue) measurable if, ∀ε > 0, there exists G with
E ⊆ G, G open, and |G − E|e < ε
in which case we write |E| := |E|e, the (Lebesgue) measure of E.
There are a few classes of measurable sets of note (intervals, open sets, null sets, countable unions, complements). Observe that these ensure the measurable sets of R^n form a σ-algebra.
Since the set of measurable sets is closed under countable union and complement, and clearly ∅, Rn are
measurable, then the class of measurable sets in Rn is a σ-algebra.
The smallest σ-algebra containing the open sets of Rn is called the Borel σ-algebra, B or B(Rn ). It
includes, for instance, Fσ , Gδ , and their extensions Fσδ and Gδσ , among others. The elements of B are called
Borel sets. (Consequently, all Borel sets are measurable.)
Some properties of the Lebesgue measure are below. Let M := M(R^n) denote the measurable sets in R^n for notational pleasantry.
Many properties from the Lebesgue outer measure carry over, due to definitions. The below may not
carry to the outer measure.
Difference of Measure Zero: |A△B| = 0 =⇒ |A| = |B|.
◦ In particular, if A ⊆ B, then A△B = B − A since A − B = ∅. Then |B − A| = 0 =⇒ |B| = |A|.
Inclusion-Exclusion: |A ∪ B| = |A| + |B| − |A ∩ B| (Rudin’s PMA, Prob. 3.10)
Union with Null Set: If E ∈ M and Z is a null set, |E| = |E ∪ Z|.
Countable Additivity if Disjoint: Suppose {Ek}k∈N ⊆ M are pairwise disjoint. Then |⋃_{k∈N} Ek| = ∑_{k∈N} |Ek|. (Rudin's PMA, Thm. 3.23)
Measure of Difference: Let A, B ∈ M, A ⊆ B, and |A| < ∞. Then |B − A| = |B| − |A| (Rudin’s
PMA, Cor. 3.25)
Measure of Limit of Sequence: Let {Ek }k∈N ⊆ M. Then: (Rudin’s PMA, Thm. 3.26; 3.27)
◦ Ek ↗ E =⇒ lim |Ek | = |E|
k→∞
◦ Ek ↘ E and |Ek | < ∞ for some k =⇒ lim |Ek | = |E|
k→∞
◦ Moreover, if ∑_k |Ek|e < ∞, then the limsup and liminf of {Ek}k∈N are null sets. (Rudin's PMA, Prob. 3.9)
The product of measurable sets is measurable, with |A × B| = |A||B| in the appropriate dimensions of
Rn . We assume 0 · ∞ = 0.
Vitali’s Theorem: There exist nonmeasurable sets; in particular, sets of positive measure always
contain one. (Rudin’s PMA, Thm. 3.38, 3.39)
Via Closed Set (Inner Regularity): ∀ε > 0, ∃F ⊆ E closed with |E − F |e < ε. (Rudin’s PMA,
Lem. 3.22)
Off from Gδ by Null Set: E = H − Z for H a Gδ set and Z a null set (Rudin’s PMA, Thm. 3.28)
Made by Fσ and Null Set: E = H ∪ Z for H an Fσ set and Z a null set (Rudin’s PMA, Thm. 3.28)
Carathéodory’s Characterization: For any set A, |A|e = |A ∩ E|e + |A − E|e (Rudin’s PMA,
Thm. 3.30)
Given |E|e < ∞, then E ∈ M iff ∀ε > 0, E = (S ∪ A) − B, for S a finite union of nonoverlapping
intervals, and |A|e , |B|e < ε
A note on a misconception: sets of positive measure need not contain intervals in any sense. The
irrationals are an example.
§15.3: (Lebesgue) Measurable Functions
Measurable Function: f : E → [−∞, +∞] (E ⊆ R^n measurable) is said to be (Lebesgue) measurable if {f > a} ≡ f⁻¹((a, ∞]) is measurable ∀a ∈ R. Equivalently, any of:
{f ≥ a} ≡ f⁻¹([a, ∞]) is measurable ∀a ∈ R
{f < a} ≡ f⁻¹([−∞, a)) is measurable ∀a ∈ R
{f ≤ a} ≡ f⁻¹([−∞, a]) is measurable ∀a ∈ R
The above are enough if holding ∀a ∈ D, with D ⊆ R dense (Rudin's PMA, Thm. 4.4)
If f is measurable, sets such as {f = a}, {a < f < b}, and {a ≤ f ≤ b} are measurable (assume a, b ∈ R and a < b). (Rudin's PMA, Cor. 4.2) If f, g are measurable, so are the following functions:
f + λ for λ ∈ R, i.e. the function (f + λ)(x) := f (x) + λ (Rudin’s PMA, Thm. 4.8)
λf for λ ∈ R: this function is defined by (λf )(x) := λ · f (x) (Rudin’s PMA, Thm. 4.8)
f + g (hence, ∀α, β ∈ R, αf + βg is measurable) (Rudin’s PMA, Thm. 4.9)
◦ Extends to any finite linear combination
◦ Ensures that the set of measurable functions on E is a vector space
f ·g (Rudin’s PMA, Thm. 4.10)
If g ̸= 0 everywhere, then f /g is too (Rudin’s PMA, Thm. 4.10)
◦ Measure & Integral states this for g ≠ 0 a.e.; one must then redefine the values on the null set, or restrict f/g to where g ≠ 0
Take {fk}_{k=1}^∞ measurable.
◦ Then fs(x) := sup_{k∈N} fk(x) and fi(x) := inf_{k∈N} fk(x) are too (Rudin's PMA, Thm. 4.11)
◦ Also, lim sup_{k→∞} fk and lim inf_{k→∞} fk are measurable. (If equal, this gives it for the limit.) (Rudin's PMA, Thm. 4.12)
We define lim sup_{k→∞} fk(x) := inf_{j∈N} ( sup_{k≥j} fk(x) ) and lim inf_{k→∞} fk(x) := sup_{j∈N} ( inf_{k≥j} fk(x) )
Other results:
We say s is a simple function if it is a linear combination of characteristic functions; we can limit our consideration to E1, ···, EN pairwise disjoint:
s(x) := ∑_{k=1}^N ak · 1_{Ek}(x), i.e. s(x) = ak for x ∈ Ek
Note that s is measurable iff each Ei is measurable. Some notes: (Rudin’s PMA, Thm. 4.13)
Any function f is the (pointwise) limit of a sequence of simple functions {sk }k∈N
If f ≥ 0, said sequence may be chosen to be increasing, i.e. sk ≤ s_{k+1} for each k
If the limiting function is measurable, we may also choose the sk to be measurable
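The standard dyadic construction behind these facts, as a runnable sketch of mine (for f ≥ 0): sk := min(⌊2^k f⌋/2^k, k) is simple, is measurable whenever f is, and increases pointwise to f.

```python
import numpy as np

# Dyadic simple-function approximation of a nonnegative f: s_k <= s_{k+1} -> f.
def simple_approx(f, k):
    return lambda x: np.minimum(np.floor((2.0 ** k) * f(x)) / (2.0 ** k), k)

f = lambda x: np.exp(x)                 # sample nonnegative function on [0, 1]
xs = np.linspace(0.0, 1.0, 5)
for k in (1, 4, 8, 12):
    print(k, simple_approx(f, k)(xs))   # rows increase toward exp(xs)
```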
Egorov’s Theorem: Let E be of finite measure and {fk : E → R}k∈N measurable, converging pointwise-
a.e. to f . Then ∀ε > 0, ∃F ⊆ E closed with |E − F | < ε and fk → f uniformly on F . (Rudin’s PMA,
Thm. 4.17)
Lusin’s Theorem: Let f be defined and finite on a measurable set E. Then f is measurable iff it
has property C. (Rudin’s PMA, Thm. 4.20)
◦ “Every measurable function is nearly continuous.”
◦ Define property C as so: f has it on E (E measurable) if, ∀ε > 0, ∃F ⊆ E closed where |E − F | < ε
and f |F is continuous.
◦ Simple functions (which are measurable) have this property
Alternate Form to Lusin’s Theorem: Let f : E → R be measurable and ε > 0. Then ∃ closed
F ⊆ E with |E − F | < ε, and ∃ g : Rn → R continuous with f ≡ g|F .
Frechet’s Theorem: Take E ⊆ Rn measurable and f : E → R finite-a.e. Then f is measurable on E
iff ∃{fk }k∈N ⊆ C(Rn , R) with fk → f pointwise-a.e. on E. (Functions are measurable iff they are the
pointwise-a.e. limit of continuous functions.)
If f : [a, b] ⊆ R → R is measurable, then ∃{pk }k∈N a sequence of polynomials such that pk → f
pointwise-a.e. on [a, b]. (May generalize to compact sets? Easily generalizes to the multidimensional
case with domain a box.)
§15.4: Convergence in Measure
We focus on the Lebesgue measure here. Some results depend on σ-finiteness, which is true for the
Lebesgue measure.
Suppose we have a sequence {fk : E ⊆ R^n → R}k∈N and f : E → R, with f, fk measurable and finite a.e. Then fk converges in measure (on E) to f, denoted fk →ᵐ f, if and only if any of these equivalent criteria hold:
∀ε > 0, lim_{k→∞} |{x ∈ E : |f(x) − fk(x)| > ε}| = 0
∀ε > 0, |{|f − fk| > ε}| → 0 as k → ∞
∀ε > 0, ∃N ∈ N such that ∀k ≥ N, |{|f − fk| > ε}| < ε (Rudin's PMA, Prob. 4.16)
Cauchy Criteria:
◦ ∀ε > 0, ∀η > 0, ∃N ∈ N such that ∀k, ℓ ≥ N , |{|fk − fℓ | > ε}| < η (Rudin’s PMA, Thm. 4.23
restated)
◦ ∀ε > 0, ∃N ∈ N such that ∀k, ℓ ≥ N , |{|fk − fℓ | > ε}| < ε (Rudin’s PMA, Prob. 4.16)
Suppose fk →ᵐ f and gk →ᵐ g. Then the following convergences hold:
fk + gk →ᵐ f + g
fk gk →ᵐ fg (requires: |E| < ∞)
fk/gk →ᵐ f/g (requires: |E| < ∞ and g ≠ 0 a.e. and gk → g pointwise-a.e.)
|fk| →ᵐ |f| (the reverse need not hold)
Let f, {fk} be measurable and finite a.e. in E of finite measure. Then fk → f pointwise-a.e. =⇒ fk →ᵐ f on E. (Rudin's PMA, Thm. 4.21)
fk →ᵐ f on E =⇒ there is a subsequence {fk_j}j∈N ⊆ {fk}k∈N such that fk_j → f pointwise-a.e. in E. In fact, every subsequence {fk_j} has a further subsequence {fk_{j_i}} with fk_{j_i} → f pointwise-a.e. as i → ∞. (Needs some sort of finiteness on the domain.)
Fatou’s lemma and the monotone convergence theorem hold if a.e.-convergence is replaced by conver-
gence in measure.
§15.5: (Lebesgue) Integrals for Nonnegative Functions
We define the region under f over E ⊆ R^n by
R(f) := { (x, y) ∈ R^{n+1} | x ∈ E and either y ∈ [0, f(x)] if f(x) < ∞, or y ∈ [0, ∞) if f(x) = ∞ }
and, for nonnegative measurable f, set ∫_E f := |R(f)| (as a Lebesgue measure in R^{n+1}).
Some results:
For f nonnegative and measurable, the graph Γ(f) is a null set (Rudin's PMA, Lem. 5.3)
∫_E f exists iff f is measurable (Rudin's PMA, Thm. 5.1)
Let f be a simple function, f ≡ ∑_j aj 1_{Ej}. Then ∫_E f = ∑_j aj · |Ej| (Rudin's PMA, Cor. 5.4)
Some basic integral properties follow. Let F denote the class of measurable functions, and F + the class
of nonnegative measurable functions, of an understood domain E.
Linearity: For α, β ≥ 0 and f, g ∈ F⁺, then ∫_E (αf + βg) = α ∫_E f + β ∫_E g (Rudin's PMA, Thm. 5.13; 5.14)
Monotonicity of Integrands: For f, g ∈ F⁺ with 0 ≤ g ≤ f a.e., then (Rudin's PMA, Thm. 5.5; 5.10)
∫_E g ≤ ∫_E f; in particular, ∫_E (inf f) ≤ ∫_E f ≤ ∫_E (sup f)
Equal-a.e. Functions Have Equal Integral: If f, g ∈ F⁺ and f = g a.e., then ∫_E f = ∫_E g. (Rudin's PMA, Thm. 5.10)
Monotonicity of Domain: If f ∈ F⁺ with A ⊆ B ⊆ E, then ∫_A f ≤ ∫_B f (Rudin's PMA, Thm. 5.5)
Finite Integral =⇒ Finite a.e.: If f ∈ F⁺ and ∫_E f is finite, then f < ∞ a.e. (Rudin's PMA, Thm. 5.5)
Null Domains & Zero Integral:
◦ For f ∈ F⁺ and E a null set, ∫_E f = 0. (Rudin's PMA, Thm. 5.9)
◦ For f ∈ F⁺, ∫_E f = 0 iff f = 0 a.e. (Rudin's PMA, Thm. 5.11)
Integral on Partition: Let E be partitioned into at-most-countably many disjoint measurable sets {Ek}k∈N. Then ∫_E f = ∑_j ∫_{Ej} f (Rudin's PMA, Thm. 5.7)
Riemann-Like Definition: Let f ∈ F⁺, and let D be the class of all decompositions of E into finitely many disjoint measurable sets {Ej}_{j=1}^m. Then (Rudin's PMA, Thm. 5.8)
∫_E f = sup_D ∑_j ( inf_{x∈Ej} f(x) ) |Ej|
For f, φ ∈ F⁺ with 0 ≤ f ≤ φ and ∫_E f < ∞, then ∫_E (φ − f) = ∫_E φ − ∫_E f (Rudin's PMA, Cor. 5.15)
For {fk}k∈N ⊆ F⁺, ∫_E ∑_{k=1}^∞ fk = ∑_{k=1}^∞ ∫_E fk (Rudin's PMA, Thm. 5.16)
Fatou's Lemma & Its Reverse (for F⁺): ∫_E lim inf_{k→∞} fk ≤ lim inf_{k→∞} ∫_E fk; the reverse inequality ∫_E lim sup_{k→∞} fk ≥ lim sup_{k→∞} ∫_E fk (the reverse Fatou lemma) additionally requires that ∃φ ∈ F⁺ such that fn ≤ φ for all n and ∫_E φ < ∞.
Corollary of Fatou, DCT, & MCT: Take {fk}k∈N nonnegative measurable functions with fk → f pointwise-a.e. on E and fk ≤ f a.e. for each k. Then
∫_E fk → ∫_E f as k → ∞
Note there is no assumption on integrability of f, unlike, say, the DCT. Some also call this the MCT, despite it being strictly stronger and more practical. (There is some discussion of this on Math.SE.)
It may be considered a corollary of the DCT for ∫_E f < ∞, as well; hence the usage of Fatou arises (and can be used independently) for ∫_E f = ∞. Note, too, Fatou can be considered implied by the MCT (see typical proofs).
§15.6: (Lebesgue) Integrals for Arbitrary Real Functions
Recall: we define f⁺ := max(f, 0) and f⁻ := max(−f, 0), so that
f⁺ − f⁻ = f
f⁺ + f⁻ = |f|
f + g = (f + g)⁺ − (f + g)⁻ = f⁺ − f⁻ + g⁺ − g⁻
f ≤ g =⇒ f⁺ ≤ g⁺ and f⁻ ≥ g⁻
f± ≥ 0, and f measurable ensures they are too
We then define ∫_E f := ∫_E f⁺ − ∫_E f⁻, provided at least one of the latter two integrals is finite. We say:
∫_E f exists if at least one of the latter two is finite (hence ∫_E f ∈ [−∞, ∞])
f is Lebesgue integrable on E if both are finite (∫_E f± ∈ [0, ∞)), and we write f ∈ L(E) or f ∈ L¹(E)
We can go through a gauntlet of the usual results for nonnegative functions by use of the definitional
decomposition f = f + − f − , casework, and/or clever manipulations.
Linearity: Take α, β ∈ R and f, g ∈ L(E); then αf + βg ∈ L(E) and (Rudin's PMA, Thm. 5.27, 5.28)
∫_E (αf + βg) = α ∫_E f + β ∫_E g
◦ Scalar multiplication can be loosened. If α ∈ R and ∫_E f exists, then ∫_E αf exists with (Rudin's PMA, Thm. 5.27)
∫_E αf = α ∫_E f
If f ∈ L(E) and g is measurable on E, with |g| ≤ M a.e., then f g ∈ L(E) with (Rudin's PMA, Thm. 5.30)
∫_E |f g| ≤ M ∫_E |f|
◦ As a corollary, if (further) f ≥ 0 and ∃α, β ∈ R with α ≤ g ≤ β a.e., then (Rudin's PMA, Cor. 5.31)
α ∫_E f ≤ ∫_E f g ≤ β ∫_E f
Monotone Convergence Theorem (MCT): Take {fk}k∈N measurable on E. (Rudin's PMA, Thm. 5.32)
◦ (Ascending & Below) If fk ↗ f a.e., and ∃φ ∈ L(E) with fk ≥ φ a.e. ∀k, then
∫_E fk → ∫_E f; that is, lim_{k→∞} ∫_E fk = ∫_E lim_{k→∞} fk = ∫_E f
◦ (Descending & Above) If fk ↘ f a.e., and ∃φ ∈ L(E) with fk ≤ φ a.e. ∀k, then likewise
∫_E fk → ∫_E f
Uniform Convergence Theorem: Take {fk}k∈N ⊆ L(E) with fk → f uniformly on E, with |E| < ∞. Then: (Rudin's PMA, Thm. 5.33)
◦ f ∈ L(E)
◦ ∫_E fk → ∫_E f as k → ∞
Fatou's Lemma & Reversal: Take {fk}k∈N measurable on E. Suppose ∃φ ∈ L(E) with fk ≥ φ a.e. on E ∀k. Then (Rudin's PMA, Thm. 5.34)
∫_E lim inf_{k→∞} fk ≤ lim inf_{k→∞} ∫_E fk
As a corollary, we have the reverse, provided ∃ψ ∈ L(E) instead with fk ≤ ψ a.e. for all k: (Rudin's PMA, Cor. 5.35)
∫_E lim sup_{k→∞} fk ≥ lim sup_{k→∞} ∫_E fk
Dominated Convergence Theorem (DCT): Take {fk}k∈N measurable on E with fk → f pointwise-a.e. If ∃φ ∈ L(E) with |fk| ≤ φ a.e. ∀k, then (Rudin's PMA, Thm. 5.36)
∫_E fk → ∫_E f; that is, lim_{k→∞} ∫_E fk = ∫_E lim_{k→∞} fk = ∫_E f
◦ Sequential/Generalized Version: Take {fk}k∈N, {φk}k∈N sequences of measurable functions with (Rudin's PMA, Prob. 5.23)
fk → f pointwise-a.e. in E
φk → φ pointwise-a.e. in E
|fk| ≤ φk a.e. in E for all k
φ ∈ L(E)
∫_E φk → ∫_E φ
Then ∫_E |fk − f| → 0 as k → ∞
◦ Convergence In Measure Version: Suppose {fk}k∈N has fk →ᵐ f on E, and |fk| ≤ φ ∈ L(E) for some φ. Then f ∈ L(E) and (Rudin's PMA, Prob. 5.26)
∫_E fk → ∫_E f
Proved by showing every subsequence {fk_j}j∈N has a subsubsequence {fk_{j_i}}i∈N with ∫_E fk_{j_i} → ∫_E f
Corollary of Fatou, DCT, & MCT: Originally stated for positive functions, keeping here for
convenience.
Take {fk}k∈N nonnegative measurable functions with fk → f pointwise-a.e. on E and fk ≤ f a.e. for each k. Then
∫_E fk → ∫_E f as k → ∞
Note there is no assumption on integrability of f, unlike, say, the DCT. Some also call this the MCT, despite it being strictly stronger and more practical. (There is some discussion of this on Math.SE.)
It may be considered a corollary of the DCT for ∫_E f < ∞, as well; hence the usage of Fatou arises (and can be used independently) for ∫_E f = ∞. Note, too, Fatou can be considered implied by the MCT (see typical proofs).
§15.7: Repeated Integration: Fubini-Tonelli
Cross Sections: Given E ⊆ R^p × R^q, x ∈ R^p, and y ∈ R^q, define the cross sections
Ex := {y ∈ R^q | (x, y) ∈ E} and E^y := {x ∈ R^p | (x, y) ∈ E}
Algebraic Cross Section Properties: The following properties are satisfied; given A, B, A1, ···, An, ··· ⊆ R^p × R^q, and any x ∈ R^p, y ∈ R^q:
(i) A ⊆ B =⇒ Ax ⊆ Bx
(ii) (⋃_{n=1}^∞ An)x = ⋃_{n=1}^∞ (An)x
(iii) (⋂_{n=1}^∞ An)x = ⋂_{n=1}^∞ (An)x
(iv) (A − B)x = Ax − Bx
(v) An ↗ A =⇒ (An)x ↗ Ax
(vi) An ↘ A =⇒ (An)x ↘ Ax
(vii) All of the above hold analogously for the y-sections
Fubini's Theorem: For Ik intervals, take f(x, y) ∈ L(I1 × I2). Then: (Rudin's PMA, Thm. 6.1)
◦ for a.e. x ∈ I1, f(x, ·) ∈ L(I2) (and symmetrically in y);
◦ the function x ↦ ∫_{I2} f(x, y) dy lies in L(I1) (and symmetrically); and
◦ ∬_{I1×I2} f = ∫_{I1} ( ∫_{I2} f(x, y) dy ) dx = ∫_{I2} ( ∫_{I1} f(x, y) dx ) dy
This includes the case when I1 = R^n, I2 = R^m.
Comment on Fubini: If ∬_E f is finite, the corresponding iterated integrals are finite. The reverse is not true, even if the iterated integrals are all equal. It is true for nonnegative measurable f, by Tonelli.
Lemmas Leading Up To Fubini:
◦ If Fubini applies to {fk}_{k=1}^m, it applies to a finite linear combination of them. (Rudin's PMA, Lem. 6.2)
◦ If Fubini applies to each of {fk }k∈N , with fk ↗ f or fk ↘ f , it applies to f .(Rudin’s PMA, Lem.
6.3)
◦ Fubini applies to χE for E Gδ (Rudin’s PMA, Lem. 6.4)
◦ Fubini applies to χZ for Z of measure zero. Moreover, for a.e. x ∈ R^n, the section Zx := {y ∈ R^m | (x, y) ∈ Z} has measure zero. (Rudin's PMA, Lem. 6.5)
◦ Fubini applies to characteristics of measurable sets of finite measure (Rudin’s PMA, Lem. 6.6)
Tonelli's Theorem: Take f(x, y) ≥ 0 measurable on I1 × I2 ⊆ R^{n+m} an interval. Then: (Rudin's PMA, Thm. 6.10)
◦ for a.e. x ∈ I1, f(x, ·) is measurable on I2, and x ↦ ∫_{I2} f(x, y) dy is measurable (and symmetrically); and
◦ ∬_{I1×I2} f = ∫_{I1} ( ∫_{I2} f(x, y) dy ) dx = ∫_{I2} ( ∫_{I1} f(x, y) dx ) dy, with all three possibly +∞.
§15.8: Differentiation (As in Lectures)
Recall: for f : [a, b] → R monotone (increasing or decreasing), then f has at-most-countably-many points
of discontinuity, and is differentiable a.e. For f increasing in particular,
∫_a^b f′ ≤ f(b) − f(a)
Vitali Cover: A collection V := {Iα}α∈A of closed intervals is a Vitali cover of E if, for each x ∈ E, there is a sequence {In}n∈N ⊆ V with
◦ x ∈ In for each n ∈ N
◦ |In| → 0 as n → ∞
Vitali's covering lemma states the following: given E ⊆ R bounded with Vitali cover V := {Iα}α∈A, ∃{In}n∈N ⊆ V such that
{In}n∈N contains at-most-countably-many pairwise-disjoint closed intervals (may pad with ∅ if needed), and |E \ ⋃_{n∈N} In|e = 0.
Subsequential Derivatives:
To define the subsequential derivative (of f, at x0, associated with {hn}n∈N), let us have
f : [a, b] → R
x0 ∈ [a, b]
{hn }n∈N ⊆ [a − x0 , b − x0 ]\{0}
lim_{n→∞} (f(x0 + hn) − f(x0)) / hn =: λ ∈ R (exists, and is in R)
Then we say λ is the subsequential derivative, of f , at x0 , associated with {hn }n∈N . Note that the
classical derivative appears for hn := 1/n.
Personal notation:
SSD(f; x0; hn) := lim_{n→∞} (f(x0 + hn) − f(x0)) / hn
If needed, we may omit the hn if it is not of concern at that moment.
Consequently, f is differentiable at x0 iff all subsequential derivatives of f at x0 have the same λ. That
is to say, SSD(f ; x0 ; hn ) = λ for all sequences {hn }n∈N (and exists).
Bounding Lemmas: Let f : [a, b] ⊇ E → R be strictly increasing. Let p > 0 be fixed. Suppose that,
∀x ∈ E, we have SSD(f ; x) = λx < p. Then
|f (E)|e ≤ p · |E|e
Likewise, if SSD(f ; x) = λx > q for some q ≥ 0 fixed, then
|f (E)|e ≥ q · |E|e
Finally, if SSD(f ; x) = +∞ for all x ∈ E,
|E| = 0
Lebesgue Differentiation Theorem for Monotone Functions: If f : [a, b] → R is monotone, then f is differentiable a.e. in the classical sense, i.e. f′(x) exists and lies in R for a.e. x ∈ [a, b].
Bound on Integral: Let f : [a, b] → R be non-decreasing (weakly increasing). Then f ′ ∈ L[a, b] and
∫_a^b f′ ≤ f(b⁻) − f(a⁺) ≤ f(b) − f(a)
The final inequality is a bit weaker due to endpoint behavior not mattering. Note that we define
f(ξ⁻) := lim_{x→ξ⁻} f(x) and f(ξ⁺) := lim_{x→ξ⁺} f(x)
§15.9: Differentiation (As in Measure & Integral, Chapter 7)
Set Functions:
To generalize the notion of indefinite integral, given f : A → R (where E ⊆ A ⊆ R^n and A, E are measurable), we define the indefinite integral of f to be
F : {measurable subsets of A} → R as defined by F(E) := ∫_E f
Set Functions: F is a set function: a function defined on a σ-algebra Σ ⊆ P(A) that is countably additive, i.e.
E = ⋃·_{k∈N} Ek (a disjoint union) for Ek ∈ Σ =⇒ E ∈ Σ and F(E) = ∑_{k∈N} F(Ek)
The indefinite integral satisfies these properties (Theorems 5.5 & 5.24).
Continuity: A set function is said to be continuous if F(E) → 0 as diam(E) → 0. Formally,
(∀ε > 0)(∃δ > 0) diam(E) < δ =⇒ |F(E)| < ε
Absolute Continuity: A set function is absolutely continuous (w.r.t. Lebesgue measure) if
lim_{|E|→0} F(E) = 0
Formally,
(∀ε > 0)(∃δ > 0) |E| < δ =⇒ |F(E)| < ε
Such functions are obviously continuous, but the converse need not hold.
For f ∈ L(A), the indefinite integral F(E) := ∫_E f is absolutely continuous.
If F is a set function absolutely continuous w.r.t. Lebesgue measure, then ∃f whose indefinite integral is F. (This is the Radon-Nikodym theorem.)
Indefinite Integrals and Differentiation:
Let Q denote a cube (i.e. Q = [a, b]^n ⊆ R^n for some a, b ∈ R), and Qx a cube with center x (in the sense that Qx = ∏_{i=1}^n [xi − r, xi + r] for some r > 0).
Let F be f ’s indefinite integral.
We consider: does
lim_{Qx↘x} F(Qx)/|Qx| = f(x), where F(Qx)/|Qx| = (1/|Qx|) ∫_{Qx} f(ξ) dξ (perhaps think of r → 0),
hold?
If so, we say that f ’s indefinite integral is differentiable at x with derivative f ′ (x).
Note that in R1 this amounts to the question of whether
lim_{h→0} (1/2h) ∫_{x−h}^{x+h} f(ξ) dξ =? f(x)
Lebesgue's Differentiation Theorem: For f ∈ L(R^n), its indefinite integral is differentiable a.e., with derivative f. The proof necessitates several lemmas, and proceeds by approximating such f by continuous functions Ck. (Rudin's PMA, Thm. 7.2)
◦ An Extension (Lloc(R^n)): This holds for f ∈ Lloc(R^n) (Rudin's PMA, Thm. 7.11)
We say f ∈ Lloc(R^n) (f is locally integrable) if f ∈ L(B) for each bounded measurable (equivalently, compact) B ⊆ R^n.
◦ An Extension (Points of Density): Observe that, for E measurable,
(1/|Q|) ∫_Q χE = |E ∩ Q|/|Q| =⇒ lim_{Qx↘x} |E ∩ Qx|/|Qx| = χE(x) a.e.
A point for which the limit is 1 is a point of density of E and a point for which the limit is
0 is a point of dispersion. (Note that the equality above holds only a.e. Note that points of
dispersion of E are density points of E c , etc.)
Then the differentiation theorem gives: almost each point in E ⊆ Rn measurable is a point of
density in E. (Rudin’s PMA, Thm. 7.13)
◦ An Extension (Lebesgue Points): We note that
lim_{Qx↘x} (1/|Qx|) ∫_{Qx} |f(ξ) − f(x)| dξ = 0 =⇒ lim_{Qx↘x} (1/|Qx|) ∫_{Qx} f(ξ) dξ = f(x)
If the former is satisfied, then x is a Lebesgue point of f; the collection of all these points is f's Lebesgue set.
The differentiation theorem may be extended as so: for f ∈ Lloc (Rn ), almost-all x ∈ Rn are
Lebesgue points of f . (Rudin’s PMA, Thm. 7.15)
◦ An Extension (Broader Notion of Shrinking): What if we go beyond cubes? A family of
sets {Sn }n∈N is said to shrink regularly to x if
n→∞
(i) diam(Sn ) −−−−→ 0
(ii) If Q is the smallest cube of center x containing S, ∃k independent of S such that |Q| ≤ k|S|
Then for f ∈ Lloc(R^n), if x is in f's Lebesgue set,
(1/|Sn|) ∫_{Sn} |f(ξ) − f(x)| dξ → 0 as n → ∞
Approximation by C0 Functions: For f ∈ L(R^n), ∃{Ck}k∈N ⊆ C0(R^n) with (Rudin's PMA, Lem. 7.3)
∫_{R^n} |f − Ck| → 0; that is, ‖f − Ck‖_{L¹} → 0 as k → ∞
Proof proceeds by showing it for finite linear combinations, and then limits of sequences.
Simple Vitali Cover Lemma: Let E ⊆ R^n have |E|e < ∞, and K := {Qi}i∈I a collection of cubes covering E. Then there exist finitely many pairwise-disjoint Q1, ···, QN ∈ K and a constant β = β(n) > 0 depending only on the dimension such that (Rudin's PMA, Lem. 7.4)
∑_{j=1}^N |Qj| ≥ β · |E|e
Hardy-Littlewood Maximal Function: For f ∈ Lloc(R^n), define
f∗(x) := sup_{Q∋x} (1/|Q|) ∫_Q |f(ξ)| dξ
(with the supremum specifically over those cubes with edges parallel to the axes). This function satisfies the following norm-like properties:
◦ 0 ≤ f ∗ (x) ≤ ∞
◦ (f + g)∗ (x) ≤ f ∗ (x) + g ∗ (x)
◦ (cf )∗ (x) = |c| · f ∗ (x)
◦ If f∗(x0) > α for some x0 ∈ R^n and α > 0, then (by absolute continuity) f∗(x) > α for all x sufficiently near x0.
◦ Hence, f ∗ is lower semicontinuous and thus measurable
◦ However, f∗ is never integrable over {|x| ≥ 1} unless f ≡ 0 a.e.; and for some f ∈ L(R^n) it is not integrable even over bounded sets. It will be integrable over bounded sets if f ∈ Lp(R^n) for some p > 1, or even if |f| · (1 + ln⁺|f|) ∈ L¹(R^n).
Weak Lp: Suppose that, for f ∈ L(R^n), ∀α > 0, ∃c (not dependent on α) such that
|{|f| > α}| ≤ c/α
Then f is said to belong to weak L¹. More generally, with p ∈ [1, ∞], f ∈ Lp_weak(R^n) (or the Lorentz space L^{p,∞}) if
|{|f| > α}| ≤ (c/α)^p
The best such c is given by the seminorm (the triangle inequality fails)
‖f‖_{Lp_weak(R^n)} = sup_{α>0} α · |{|f| > α}|^{1/p}
We note that Lp is contained in weak Lp, and the weak norm is bounded above by the Lp norm proper (when it exists).
Hardy-Littlewood Inequality: For f ∈ L(R^n), we have f∗ ∈ L¹_weak(R^n), and ∃c independent of f, α such that, for all α > 0, (Rudin's PMA, Lem. 7.9)
|{f∗ > α}| ≤ (c/α) ∫_{R^n} |f|
§15.10: Functions of Bounded Variation (in R)
Fundamental Definitions:
Consider an interval [a, b] ⊆ R. An ordered partition P := {xi}_{i=0}^n is a set of points such that a = x0 ≤ x1 ≤ ··· ≤ xn = b. The (total) variation of f over [a, b] is then
V_a^b(f) := sup_P ∑_{i=1}^n |f(xi) − f(x_{i−1})|
and we say f is of bounded variation, written f ∈ BV[a, b], when V_a^b(f) < ∞.
Sometimes we want to focus on the bits of positive or negative variation. Recall that, given f, we take f⁺ and f⁻ to be its positive and negative parts respectively. In particular, applied to numbers, we let
x⁺ := x if x > 0, else 0;  x⁻ := −x if x ≤ 0, else 0
We also write the variation function
π(x) := V_a^x(f)
Unsorted Results on Bounded Variation Functions:
Let α, β ∈ R, {hn}n∈N ⊆ BV[a, b], and f, g ∈ BV[a, b]. Then αf + βg and f g are in BV[a, b]. (Rudin's PMA, Thm. 2.1(ii))
◦ f has countably-many discontinuities (all being jump/removable). (Rudin’s PMA, Thm. 2.8)
◦ Hence, f ∈ R[a, b], i.e. BV[a, b] ⊆ R[a, b]. (Rudin’s PMA, Prob. 2.32)
◦ If f ∈ BV[a, b], then f is bounded on [a, b]. (Rudin’s PMA, Thm. 2.1(i))
◦ (Jordan Decomposition; Cor. 2.7) f ∈ BV[a, b] ⇐⇒ f may be written as φ − ψ, for φ, ψ
monotone-increasing and bounded on [a, b].
We may extend this to f ∈ BV(−∞, ∞). (Rudin’s PMA, Prob. 2.8)
A proof comes from the fact that
f = (π + f)/2 − (π − f)/2, or f = (f + π) − π
with each pair of functions increasing and bounded.
◦ Note: It is not necessarily true that V_{x0}^x(f) → 0 as x → x0⁺
◦ If f ∈ BV[a, b], then π is increasing on [a, b] (hence π ′ exists a.e.)
◦ If f ∈ BV[a, b], then π and f share the same points of left-, right-, and full continuity
◦ ∫_a^b π′ ≤ π(b) − π(a), giving the fundamental theorem-like result of
∫_a^b (d/dx) V_a^x(f) dx ≤ V_a^b(f)
◦ |f ′ | = π ′ a.e.
Results tied to differentiability/integrability; here, F(x) := ∫_a^x f is the indefinite integral.
§15.11: Absolute Continuity
Motivation:
Recall some fundamental results from calculus:
Fundamental Theorem of Calculus 1: For f ∈ R[a, b] and F(x) := ∫_a^x f(t) dt the indefinite integral of f defined for x ∈ [a, b], we have
◦ F is Lipschitz
◦ F ′ exists wherever f is continuous, and F ′ = f at such points
◦ F ′ exists a.e.
Fundamental Theorem of Calculus 2: Let f : [a, b] → R be differentiable on all of [a, b], with f′ ∈ R[a, b]. Then the fundamental theorem of calculus holds:
f(x) = f(a) + ∫_a^x f′(t) dt
The Lebesgue-flavored analogue we will build towards: for f ∈ AC[a, b],
◦ f′ exists a.e.
◦ f′ ∈ L[a, b]
◦ f(x) = f(a) + ∫_a^x f′
Basic Definitions:
Absolute Continuity: We say f is absolutely continuous on [a, b], denoted f ∈ AC[a, b], when, ∀ε > 0, ∃δ > 0 such that, for any collection {[ak, bk]}_{k=1}^n of finitely-many non-overlapping intervals in [a, b] satisfying
∑_k |bk − ak| < δ, we have ∑_k |f(bk) − f(ak)| < ε
Singular Function: If f′ = 0 a.e. but f is non-constant on [a, b], we say f is a singular function. The Devil's staircase function (or Cantor-Lebesgue function) is an example:
c(x) := ∑_{n=1}^∞ an/2^n, if x = ∑_{n=1}^∞ 2an/3^n ∈ C with an ∈ {0, 1}
c(x) := sup{ c(y) | y ≤ x, y ∈ C }, if x ∉ C
(where C denotes the Cantor set).
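The digit description above translates directly into code; here is a small sketch of mine (the helper name and the 40-digit cutoff are my own choices) computing c(x) from the ternary expansion:

```python
# Cantor-Lebesgue function: read ternary digits of x; a ternary digit 2 becomes a
# binary digit 1; the first ternary digit 1 contributes a final binary 1 and stops
# (c is locally constant off the Cantor set).
def cantor(x, digits=40):
    value, scale = 0.0, 0.5
    for _ in range(digits):
        x *= 3.0
        d = int(x)
        x -= d
        if d == 1:
            value += scale
            break
        value += scale * (d // 2)
        scale /= 2.0
    return value

print(cantor(1/3), cantor(2/3), cantor(1/4))  # -> 0.5, 0.5, 0.333...
```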
Basic Results:
If f ∈ AC[a, b], then
(i) f ∈ C[a, b]
(ii) f ′ exists a.e.
(iii) f ′ ∈ L[a, b]
(iv) f is an N-function (it maps null sets to null sets), and f satisfies the FTC
If f ∈ C[a, b] is differentiable except on an at-most-countable set, and f ′ ∈ L[a, b], then f ∈ AC[a, b]
and the FTC holds
If f ∈ BV[a, b], then we may write (Rudin's PMA, Thm. 7.30)
f = g + h
for g ∈ AC[a, b] and h singular on [a, b], each unique up to additive constants.
More Important Results:
Absolute Continuity of Integral: Let f ∈ L[a, b] (or generally, f ∈ L(R^n)). Then ∀ε > 0, ∃δ > 0 such that
for any measurable E ⊆ [a, b] with |E| < δ, we have |∫_E f| < ε
or rather
lim_{|E|→0} ∫_E f = 0
FTC for Lebesgue Integrals: f ∈ AC[a, b] if and only if: (Rudin's PMA, Thm. 7.29)
◦ f′ exists a.e.
◦ f′ ∈ L[a, b]
◦ f(x) = f(a) + ∫_a^x f′
Integration by Parts: Given f, g ∈ AC[a, b], then (Rudin’s PMA, Thm. 7.32)
∫_a^b f g′ = [f g]_a^b − ∫_a^b f′ g
Change of Variables: For f ∈ L[a, b] and φ : [α, β] → R with φ ∈ AC[α, β], φ(α) = a, φ(β) = b, and φ strictly increasing, we have
∫_a^b f = ∫_α^β f(φ(t)) φ′(t) dt
A Fubini Theorem: Let {fn : [a, b] → R}n∈N be a sequence of increasing functions such that the series
f(x) := ∑_{n=1}^∞ fn(x)
converges (is finite) everywhere on [a, b]. Then f′ exists a.e., with us being able to bring the derivative inside:
f′(x) = ∑_{n=1}^∞ fn′(x) a.e.
§15.12: Convex Functions
Motivation:
In ordinary calculus, we are tempted to say φ : (a, b) → R is convex (concave up) if φ′′ (x) ≥ 0 on (a, b),
and dually concave (concave down) if φ′′ (x) ≤ 0 on (a, b). These motivate a new definition.
Basic Definitions:
Support Line: Let φ be convex on (a, b) and x0 ∈ (a, b). A supporting line (of φ, through x0 ) is a
line through (x0 , φ(x0 )) lying on or below the graph of φ on (a, b).
If m ∈ [D− φ(x0 ), D+ φ(x0 )], then a line through that point with slope m is a support line.
Some Results:
Suppose φ1, φ2 are convex on (a, b), and α, β ≥ 0. Then αφ1 + βφ2 is convex on (a, b). (Rudin's PMA, Thm. 7.36)
If {φi}_{i=1}^∞ are convex on (a, b) and φk → φ pointwise, then φ is convex (Rudin's PMA, Thm. 7.36)
Chordal-Slope / Three Slopes Lemma: Let φ : (a, b) → R be convex, with a < ℓ < m < r < b (respectively, ℓ, m, r can be thought of as a "left", "middle", and "right" point). Then, writing
Φ(x, y) := (φ(x) − φ(y)) / (x − y)
we have Φ(ℓ, m) ≤ Φ(ℓ, r) ≤ Φ(m, r): increasing either argument increases the overall value.
The converse is also true: if the chordal-slope lemma inequalities hold (and it is easiest to prove with the first and last fractions), then φ is convex.
Let φ : (a, b) → R be convex. Then:
409
◦ φ is Lipschitz on any [a′, b′] ⊆ (a, b), with Lipschitz constant
L := max{ |φ′₊(a′)|, |φ′₋(b′)| }
Jensen's Inequality (Integrals): Let us have (Rudin's PMA, Thm. 7.44)
(i) E ⊆ R^n measurable
(ii) p ≥ 0
(iii) ∫_E p > 0
(iv) f, p ∈ L(E) and finite a.e.
(v) ∫_E |f(x)p(x)| dx < ∞
(vi) φ : (a, b) → R convex, with im f ⊆ (a, b)
Then
φ( (∫_E f(x)p(x) dx) / (∫_E p(x) dx) ) ≤ (∫_E φ(f(x))p(x) dx) / (∫_E p(x) dx)
In particular, taking p ≡ 1 with |E| = 1 (hence ∫_E p = 1), we see a more familiar version:
φ( ∫_E f(x) dx ) ≤ ∫_E φ(f(x)) dx
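A quick numeric spot-check of the familiar version (an illustration of mine, with φ = exp convex and E = [0, 1] so |E| = 1; the mean over a uniform grid approximates the integral):

```python
import numpy as np

# Jensen: exp( integral_E f ) <= integral_E exp(f) when |E| = 1 and phi = exp.
x = np.linspace(0.0, 1.0, 100001)
f = np.sin(5.0 * x) + x ** 2
print(np.exp(np.mean(f)), np.mean(np.exp(f)))   # left <= right, strictly here
```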
§15.13: Lp Spaces
Definition of Lp Space:
Let E ⊆ R^n be measurable and p ∈ (0, ∞). Then
Lp(E) := { f : E → F (F ∈ {R, C}) measurable | ‖f‖_{p,E} := (∫_E |f|^p)^{1/p} < ∞ }
Note that:
For p < 1, ∥·∥p,E does not represent a true norm (triangle inequality is violated)
ess sup_E f ≡ ess sup_{x∈E} f(x) := inf{ α ∈ R | |{f > α}| = 0 } ≡ inf{ α ∈ R | f(x) ≤ α a.e. }
(As is typical, if there is no such α, we will have ess sup f = inf ∅ = +∞.) Then
L∞(E) := { f : E → R measurable | ‖f‖_{∞,E} := ess sup_E |f| < ∞ }
We may often drop the set from the norm (writing ‖·‖p) or from the space (writing Lp) if the set is understood.
For f ∈ C(E, R) bounded, its ∞-norm is the supremum/uniform norm on C(E, R):
‖f‖∞ = sup_{x∈E} |f(x)|
∞-norm is limit of p-norms: Specifically on E with |E| < ∞, we have
‖f‖∞ ≡ lim_{p→∞} ‖f‖p
This need not be true on infinite-measure sets. (Consider a nonzero constant function.)
Lp Inclusions: Suppose 0 < p < q ≤ ∞ and |E| < ∞. Then Lq (E) ⊆ Lp (E) (larger exponent gives
a smaller space). In particular, then, L∞ (E) ⊆ Lp (E) for every p ∈ (0, ∞].
Lp is a Vector Space: If f, g ∈ Lp (E) and α, β ∈ R, then αf + βg ∈ Lp (E).
Young's Inequality: We have, for a, b ∈ R≥0 and p > 1, with q such that 1/p + 1/q = 1,
ab ≤ a^p/p + b^q/q
with equality iff a^p = b^q.
Young's Inequality for Integrals: Let φ : [0, ∞) → [0, ∞) be continuous and strictly increasing with φ(0) = 0, and let a, b > 0. Then
ab ≤ ∫_0^a φ + ∫_0^b φ⁻¹
Hölder's Inequality: Let us have:
(i) p ∈ [1, ∞]
(ii) q its Hölder conjugate, i.e. 1/p + 1/q = 1 (if one is infinite, the other is 1)
(iii) f ∈ Lp(E)
(iv) g ∈ Lq(E)
Then f g ∈ L¹(E), with
∫_E |f g| ≤ ‖f‖p · ‖g‖q, or rather ‖f g‖₁ ≤ ‖f‖p · ‖g‖q
with equality iff
(i) f g ≥ 0 a.e., and
(ii) |f|^p = α|g|^q a.e., for some α ∈ R>0
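Hölder is easy to sanity-check numerically (a sketch of mine on a discretized interval; all quantities are Riemann-sum approximations, and the test functions are arbitrary choices):

```python
import numpy as np

# ||fg||_1 <= ||f||_p ||g||_q with 1/p + 1/q = 1 (here p = 3, q = 3/2).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]
f = rng.random(x.size)
g = np.sin(3.0 * x) ** 2
p, q = 3.0, 1.5
lhs = np.sum(np.abs(f * g)) * dx
rhs = (np.sum(np.abs(f) ** p) * dx) ** (1 / p) * (np.sum(np.abs(g) ** q) * dx) ** (1 / q)
print(lhs, "<=", rhs, lhs <= rhs)
```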
Generalized Hölder: Given pi ∈ [1, ∞] with ∑_i 1/pi = 1 and fi ∈ L^{pi}(E), then
∫_E |∏_i fi| ≤ ∏_i ( ∫_E |fi|^{pi} )^{1/pi}, or rather ‖∏_i fi‖₁ ≤ ∏_i ‖fi‖_{pi}
Lp Interpolation: If f ∈ Lp(E) ∩ Lq(E) with p ≤ r ≤ q and θ ∈ [0, 1] chosen so that 1/r = θ/p + (1 − θ)/q, then
‖f‖r ≤ ‖f‖p^θ · ‖f‖q^{1−θ}
and in particular
‖f‖r ≤ max{ ‖f‖p, ‖f‖q }
‖f‖r ≤ ‖f‖p^{p/r} · ‖f‖∞^{1−(p/r)}  (the q = ∞ case)
Minkowski's Inequality: For f, g ∈ Lp(E) with p ∈ [1, ∞],
‖f + g‖p ≤ ‖f‖p + ‖g‖p
and
ess sup_E |f + g| ≤ ess sup_E |f| + ess sup_E |g|
§16: Items from Complex Analysis
§16.2: Complex Integration
Fundamental Theorem for Complex Line Integrals: For f holomorphic on U ⊆ C open, and γ a curve in U from za to zb, then
∫_γ f′(z) dz = f(zb) − f(za)
Cauchy's Integral Theorem: Let γ : [a, b] → U be a smooth, closed curve, with U simply-connected and open, and f : U → C holomorphic. Then
∫_γ f(z) dz = 0
Residue Theorem: For sufficiently "nice" f, we define the residue of f at c (a pole of order n) by
Res(f, c) ≡ Res_{z=c} f(z) := (1/(n−1)!) · lim_{z→c} (d^{n−1}/dz^{n−1}) [ (z − c)^n f(z) ]
(Simple poles are those of order 1.) One also notes that the Laurent series (centered at c) is given by
f(z) = ∑_{n∈Z} an (z − c)^n, with an = (1/2πi) ∫_γ f(z)/(z − c)^{n+1} dz
with γ a counterclockwise Jordan curve enclosing c and lying in an annulus in which f is holomorphic/analytic. Then
Res(f, c) = a₋₁, the coefficient of (z − c)⁻¹
We have that
∫_γ f(z) dz = 2πi ∑_{c in γ} Res_{z=c}(f)
(Note that sometimes, even with multiple singularities, the residue theorem is not necessary; for instance, the ML (estimation) lemma below can handle such integrals directly.)
§16.3: Auxiliary Inequalities/Results for Contour Integrals
Just a collection of useful results, estimations, inequalities, and reminders for contour integrals aside from
the aforementioned.
◦ sin(z) = (e^{iz} − e^{−iz}) / 2i
◦ cos(z) = (e^{iz} + e^{−iz}) / 2
Numerical Integration: Using the formula
∫_γ f(z) dz = ∫_a^b f(γ(t)) γ′(t) dt
(where γ is parameterized by {γ(t)}t∈[a,b]), Wolfram Alpha can handle some contour integrals of simple types. For instance, one may use
∫_{|z|=1} sin(z)/(z − π/4)^4 dz = ∫_0^{2π} sin(e^{iθ}) · ie^{iθ} / (e^{iθ} − π/4)^4 dθ
to get an approximation. (Be sure to account properly for the differential.)
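In the same spirit, here is a sketch of mine doing the computation locally with scipy (splitting the integrand into real and imaginary parts for quad), compared against the residue-theorem value 2πi · Res_{z=π/4} = −(√2 π/6)i for this example:

```python
import numpy as np
from scipy.integrate import quad

# Parameterize |z| = 1 by gamma(t) = e^{it}, so dz = i e^{it} dt, t in [0, 2*pi].
def integrand(t):
    z = np.exp(1j * t)
    return np.sin(z) / (z - np.pi / 4.0) ** 4 * (1j * z)

# quad handles real integrands, so integrate the real and imaginary parts separately.
re, _ = quad(lambda t: integrand(t).real, 0.0, 2.0 * np.pi)
im, _ = quad(lambda t: integrand(t).imag, 0.0, 2.0 * np.pi)
print(re + 1j * im)                          # ~ -0.7405j
print(2j * np.pi * (-np.sqrt(2.0) / 12.0))   # residue theorem: -(sqrt(2) pi / 6) i
```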
Estimation Lemma: For f continuous, |∫_γ f(z) dz| ≤ ∫_γ |f(z)| |dz| ≤ ( sup_{z on γ} |f(z)| ) · (length of γ)
Other names: “Triangle inequality for contour integrals,” “ML estimation lemma” (“M” for “max”,
“L” for “length”)
Jordan's Lemma: Take the semicircular arc Cr := {re^{iθ}}_{θ∈[0,π]} and f continuous satisfying f(z) = e^{iaz} g(z) for some a > 0. Then |∫_{Cr} f(z) dz| ≤ (π/a) · max_{θ∈[0,π]} |g(re^{iθ})|; in particular, if g → 0 uniformly on Cr as r → ∞, the integral vanishes in the limit.
§17: Items from Functional Analysis
A metric space (X, d) is a set X with a distance/metric function d : X² → R≥0 such that, ∀x, y, z ∈ X:
d(x, y) = 0 ⇐⇒ x = y (positive-definiteness)
d(x, y) = d(y, x) (symmetry)
d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
We say two metric spaces (X, d), (Y, ρ) are isometric (isomorphic) if there is a bijective distance-preserving map T : X → Y, i.e. ρ(Tx, Ty) = d(x, y). Even without bijectivity, we would say T is an isometry.
We say two metric spaces are homeomorphic if there is a bijective continuous function T : X → Y with
continuous inverse. (Isometric spaces are homeomorphic.)
Some basic examples of metric spaces include:
Normed & Inner Product Vector Spaces: Any vector space V with inner product ⟨·, ·⟩ or norm ‖·‖ induces a metric by the rule(s)
d(x, y) := ‖x − y‖ = √⟨x − y, x − y⟩
The sequence spaces ℓ∞ and ℓp:
ℓ∞ := { x := (ξi)_{i=1}^∞ ⊆ C | |ξi| ≤ Mx for some Mx ∈ R }
where Mx depends only on x; that is, the sequence is bounded. We equip ℓ∞ with the distance d(x, y) := sup_{i∈N} |ξi − ηi|. Similarly, define
‖x‖p := ( ∑_{i=1}^∞ |ξi|^p )^{1/p}
for p ∈ [1, ∞). Then ℓp := { x := (ξi)_{i=1}^∞ | ‖x‖p < ∞ }.
The function space C[a, b]. This is all f : [a, b] → R which are continuous, and equipped with
sup-norm:
d(x, y) := sup |x(t) − y(t)|
t∈[a,b]
Trivial/Discrete Space: For any set X, let d(x, y) := 1 − δ_{x,y}, i.e. d(x, y) = 1 if x ≠ y and 0 if x = y.
Holder's Sum Inequality: Given p, q Holder conjugates (1/p + 1/q = 1), then
∑_{i=1}^∞ |ξi ηi| ≤ ( ∑_{i=1}^∞ |ξi|^p )^{1/p} ( ∑_{i=1}^∞ |ηi|^q )^{1/q}
§17.2: (Kreyszig, Ch. 1) Brief Items From Topology
Balls/Spheres: Given x0 ∈ X a metric space and r ∈ R≥0, we define open balls, closed balls, and spheres as so:
B(x0; r) := { x ∈ X | d(x, x0) < r }, B̄(x0; r) := { x ∈ X | d(x, x0) ≤ r }, S(x0; r) := { x ∈ X | d(x, x0) = r }
Accumulation Points & Closure: For M ⊆ X, the set of accumulation points of M is M′. The closure of M is M ∪ M′ =: M̄, and it is the smallest closed set containing M.
Continuity: We say a function f : X → Y of metric spaces (X, d), (Y, ρ) is continuous at x0 ∈ X if
(∀ε > 0)(∃δ > 0)(∀x such that d(x, x0 ) < δ)(ρ(f (x), f (x0 )) < ε)
One notes that f is continuous iff the preimage of an open set is open. (Kreyszig, Thm. 1.3-4)
Density; Separability: M ⊆ X is dense in X if M = X. X is separable if it has a countable dense
set.
Rn , Cn are separable; ℓ∞ is not. ℓp is separable for p ∈ [1, ∞).
Convergence: A sequence {x_n}_{n∈N} in X converges to x ∈ X iff
    lim_{n→∞} d(x_n, x) = 0,  sometimes written as lim_{n→∞} x_n = x or x_n → x as n → ∞
Some results:
◦ We note that limits are unique, and x_n → x, y_n → y =⇒ d(x_n, y_n) → d(x, y).
(Kreyszig, Thm. 1.4-2)
◦ On closures, x ∈ M̄ iff x is the limit of some sequence in M, and M is closed iff every convergent
sequence in M has its limit in M. (Kreyszig, Thm. 1.4-6)
◦ Continuous functions preserve convergence (x_n → x =⇒ f(x_n) → f(x)). (Kreyszig,
Thm. 1.4-8)
◦ Subsequences of convergent sequences converge to the same limit. (Kreyszig, Prob. 1.4.1)
◦ If a Cauchy sequence has a subsequence with limit L, the original sequence has limit L. (Kreyszig,
Prob. 1.4.2)
◦ All Cauchy sequences are bounded. (Kreyszig, Prob. 1.4.4)
Cauchy Sequences: {x_n}_{n∈N} in X is said to be Cauchy iff
    (∀ε > 0)(∃N ∈ N)(∀m, n > N)(d(x_m, x_n) < ε)
A sequence is bounded if its members constitute a bounded subset. A bounded set satisfies M ⊆ B(x; r)
for r sufficiently large.
§17.3: (Kreyszig, Ch. 2) Normed & Banach Spaces
Basic Definitions:
Vector Space: A vector space over a scalar field F is a set V of vectors with operations
+ : V² → V, · : V × F → V satisfying the usual axioms (associativity and commutativity of +, an additive
identity and inverses, and compatibility/distributivity of scalar multiplication).
Normed Space: A vector space is a normed space when equipped with a vector norm ∥·∥ on it.
A norm is a mapping ∥·∥ : V → R≥0 such that
◦ ∥x∥ = 0 ⟺ x = 0 (positive-definite)
◦ ∥αx∥ = |α| · ∥x∥ (absolute homogeneity)
◦ ∥x + y∥ ≤ ∥x∥ + ∥y∥ (triangle inequality)
§18: Items from Analytic Number Theory
Notation: N denotes the arithmetical function N(n) = n, u the unit function u(n) = 1, and I the
Dirichlet identity I(n) = ⌊1/n⌋ (so I(1) = 1 and I(n) = 0 for n > 1). The Dirichlet inverse f⁻¹ of f satisfies
    f ∗ f⁻¹ = f⁻¹ ∗ f = I
§18.2: Important Functions
§18.2.1: The Mobius Function µ
(Wikipedia article.)
Definition
For n > 1, write its prime decomposition n = p₁^{a₁} · · · p_r^{a_r}. Then
    µ(n) := { 1, n = 1 ; (−1)^r, a_i = 1 ∀i ; 0, otherwise }
Some Identities
    (Mobius Inversion Formula)  f(n) = Σ_{d|n} g(d) =⇒ g(n) = Σ_{d|n} f(d) µ(n/d)  (Apostol, Thm. 2.9)
    Σ_{d|n} µ(d) = ⌊1/n⌋ = { 1, n = 1 ; 0, n > 1 }  (Apostol, Thm. 2.1)
    λ⁻¹ = |µ|
    Σ_{d|n} |µ(d)| = Σ_{d|n, d squarefree} 1 = 2^{ω(n)}
    Σ_{d²|n} µ(d) = µ²(n)  (Apostol, Prob. 2.6)
    Σ_{d^k|n} µ(d) = { 0, m^k | n for some m > 1 ; 1, otherwise }
    For p prime,  Σ_{d|n} µ(d) µ(gcd(p, d)) = { 1, n = 1 ; 2, n = p^a for some a ∈ Z≥1 ; 0, otherwise }  (Apostol, Prob. 2.7)
    If n has more than m distinct prime factors, m ≥ 1, then  Σ_{d|n} µ(d) log^m(d) = 0
    µ(n) is the sum of the primitive nth roots of unity:  µ(n) = Σ_{1≤k≤n, gcd(k,n)=1} exp(2πik/n)
    λ(n) = Σ_{d²|n} µ(n/d²)  (Apostol, Prob. 2.33)
    Σ_{n=1}^∞ µ(n)/n^α = 1/ζ(α)  for α ∈ R⁺, α ≠ 1
    Σ_{n=1}^∞ |µ(n)|/n^α = ζ(α)/ζ(2α)  (Source: Wikipedia)
    Σ_{n=1}^∞ µ(n)/n = 0  (Apostol, Thm. 4.16)
    Σ_{n≤x} µ(n) ⌊x/n⌋ = 1  (Apostol, Thm. 3.12)
    φ₁(n) := n Σ_{d|n} |µ(d)|/d = Σ_{d²|n} µ(d) σ(n/d²)  (Apostol, Prob. 3.11b)
    Σ_{n=1}^∞ µ(n) log(n)/n = −1  (Source: Wikipedia)
    Σ_{n=1}^∞ µ(n) log²(n)/n = −2γ  (Source: Wikipedia)
    (Schneider’s identities, per Wikipedia) For ϕ the golden ratio and ϕ̄ := 1/ϕ its conjugate:
        ϕ = −Σ_{k=1}^∞ (φ(k)/k) log(1 − ϕ̄^k)    ϕ̄ = −Σ_{k=1}^∞ (µ(k)/k) log(1 − ϕ̄^k)
    Consequently
        Σ_{k=1}^∞ [(µ(k) − φ(k))/k] log(1 − ϕ̄^k) = 1
    The proof uses the formulas, for x ∈ (0, 1),
        Σ_{k=1}^∞ (φ(k)/k) (−log(1 − x^k)) = x/(1 − x)    Σ_{k=1}^∞ (µ(k)/k) (−log(1 − x^k)) = x
Asymptotics
    lim_{x→∞} (1/x) Σ_{n≤x} µ(n) = 0  (equivalent to the PNT)
    Σ_{n≤x} µ(n) ⌊x/n⌋² = (1/ζ(2)) x² + O(x log x)  (Apostol, Prob. 3.4a)
    Σ_{n≤x} (µ(n)/n) ⌊x/n⌋ = (1/ζ(2)) x + O(log x)  (Apostol, Prob. 3.4b)
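Several of the divisor-sum identities above are easy to sanity-check by brute force; a minimal sketch (trial-division µ, fine for small n):

    from math import isqrt

    def mobius(n: int) -> int:
        """µ(n) via trial division: 0 if a square divides n, else (-1)^(number of prime factors)."""
        if n == 1:
            return 1
        r, d = 0, 2
        while d <= isqrt(n):
            if n % d == 0:
                n //= d
                if n % d == 0:      # d² divided the original n
                    return 0
                r += 1
            d += 1
        if n > 1:                   # one leftover prime factor
            r += 1
        return (-1) ** r

    def divisors(n):
        return [d for d in range(1, n + 1) if n % d == 0]

    # Check Σ_{d|n} µ(d) = ⌊1/n⌋ for small n.
    for n in range(1, 200):
        assert sum(mobius(d) for d in divisors(n)) == (1 if n == 1 else 0)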
§18.2.2: Mobius Function of Order k, µk
Definition
Let k ∈ Z≥1. If n > 1, give it the prime decomposition n = p₁^{a₁} · · · p_r^{a_r}. We define
    µ_k(n) := { 1, n = 1 ;
                0, p^{k+1} | n for some prime p ;
                (−1)^r, n = p_{i₁}^k · · · p_{i_r}^k Π_i p_i^{a_i} with each a_i < k (can pull out r kth prime powers) ;
                1, otherwise }
That is: µ_k(n) = 0 if n is divisible by the (k + 1)st power of a prime; otherwise it is (−1)^r if exactly
r-many kth prime powers can be factored out, and 1 if none can.
Note that µ1 = µ.
Some Identities
Asymptotics
§18.2.3: Mertens’ M Function
(Wikipedia article.)
Definition
Defined by the partial sums of the Mobius function,
    M(x) := Σ_{n≤x} µ(n)
Some Identities
    Using a Mellin transform,  1/ζ(s) = s ∫₁^∞ M(x)/x^{s+1} dx  on Re(s) > 1  (Source: Wikipedia)
    ψ(x) = Σ_{n=2}^∞ M(x/n) log(n)  (Source: Wikipedia)
Asymptotics
(Mertens’ Conjecture) The “best” big-O for M is not known. Numerical evidence suggests
    |M(x)| < √x  on x > 1
i.e. M(x) = O(√x); however, Odlyzko & te Riele (1985) proved this conjecture false, though no explicit
counterexample is known. The best result known is the following: for some constant A > 0 and the function
    δ(x) := exp(−A log^{3/5}(x) (log log x)^{−1/5})
we have M(x) = O(x δ(x)).
§18.2.4: Euler’s Totient Function φ
(Wikipedia article.)
Definition
φ(n) is the number of positive integers ≤ n which are coprime to it. That is,
    φ(n) := Σ_{1≤k≤n, gcd(k,n)=1} 1
Some Identities
    φ(n) = n Π_{p|n} (1 − 1/p)  (Apostol, Thm. 2.4)
    φ(p^a) = p^a − p^{a−1} for all p prime and all a ∈ Z≥1  (Apostol, Thm. 2.5a)
    φ(mn) = φ(m) φ(n) · d/φ(d)  where d := gcd(m, n)  (Apostol, Thm. 2.5b)
    a | b =⇒ φ(a) | φ(b)  (Apostol, Thm. 2.5d)
    (Dirichlet Inverse)  φ⁻¹ = u ∗ µN =⇒ φ⁻¹(n) = Σ_{d|n} d µ(d)
    (Dirichlet Inverse)  φ⁻¹(n) = Π_{p|n} (1 − p)
    n/φ(n) = Σ_{d|n} µ²(d)/φ(d)  (Apostol, Prob. 2.3)
    Π_{1≤k≤n, gcd(n,k)=1} k = n^{φ(n)} Π_{d|n} (d!/d^d)^{µ(n/d)}  (Apostol, Prob. 2.20)
    σ₁ = φ ∗ σ₀ =⇒ σ₁(n) = Σ_{d|n} φ(d) σ₀(n/d)  (Apostol, Prob. 2.22)
    Σ_{1≤k≤n, gcd(n,k)=1} k = (1/2) n φ(n)  (Source: Wikipedia)
    (Menon’s identity)  Σ_{1≤k≤n, gcd(k,n)=1} gcd(k − 1, n) = φ(n) σ₀(n)  (Source: Wikipedia)
    (Schneider’s identities, per Wikipedia) For ϕ the golden ratio and ϕ̄ := 1/ϕ its conjugate:
        ϕ = −Σ_{k=1}^∞ (φ(k)/k) log(1 − ϕ̄^k)    ϕ̄ = −Σ_{k=1}^∞ (µ(k)/k) log(1 − ϕ̄^k)
    Consequently
        Σ_{k=1}^∞ [(µ(k) − φ(k))/k] log(1 − ϕ̄^k) = 1
    (Dirichlet Series)  Σ_{n=1}^∞ φ(n)/n^s = ζ(s − 1)/ζ(s)  on Re(s) > 2  (Source: Wikipedia)
Asymptotics
    Σ_{n≤x} φ(n) = (1/(2ζ(2))) x² + O(x log x)  (Apostol, Thm. 3.7)
    ◦ Error improvable to O(x log^{2/3}(x) (log log x)^{4/3})  (Source: Wikipedia)
    Σ_{n≤x} φ(n)/n = (1/ζ(2)) x + O(log x)  (Apostol, Prob. 3.5)
    ◦ Error improvable to O(log^{2/3}(x) (log log x)^{4/3})  (Source: Wikipedia)
    Σ_{n≤x} φ(n)/n² = (1/ζ(2)) log(x) + γ/ζ(2) − Σ_{n=1}^∞ µ(n) log(n)/n² + O(log(x)/x)  (Apostol, Prob. 3.6)
    Σ_{n≤x} φ(n)/n^α = (x^{2−α}/(2 − α)) (1/ζ(2)) + O(x^{1−α} log(x))  for α ∈ R≤1  (Apostol, Prob. 3.8)
    Σ_{n≤x} n/φ(n) = O(x)  (Apostol, Prob. 3.9b)
    ◦ Improvable to:  (315/(2π⁴)) ζ(3) x − (1/2) log(x) + O(log^{2/3}(x))  (Source: Wikipedia)
    Σ_{n≤x} 1/φ(n) = O(log x)  (Apostol, Prob. 3.10)
    ◦ Improvable to  (315/(2π⁴)) ζ(3) (log(x) + γ − Σ_{p prime} log(p)/(p² − p + 1)) + O(log^{2/3}(x)/x)  (Source: Wikipedia)
    Given m ∈ Z≥2,  Σ_{1≤k≤n, gcd(k,m)=1} 1 = (n/m) φ(m) + O(2^{ω(m)})  (Source: Wikipedia)
Unsolved Problems
    Lehmer’s Totient Problem (Wikipedia link): It is known φ(p) = p − 1 for p prime. Are there
    composite n such that φ(n) | n − 1? If such n exists, it is odd, squarefree, ω(n) ≥ 14, and n > 10²⁰. If
    3 | n, then n > 10^{1937042} and ω(n) ≥ 298848.
    Carmichael’s Totient Function Conjecture (Wikipedia link): Claims that there is no n such
    that, ∀m ∈ N≠n, we have φ(m) ≠ φ(n). That is, ∀n ∈ N, ∃m ∈ N≠n such that φ(m) = φ(n). If there
    is a counterexample, there are infinitely many, and the smallest n > 10^{10,000,000,000}. Per Pomerance, if
    n is a counterexample, then for any prime p with p − 1 | φ(n), we have p² | n.
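The product formula φ(n) = n Π_{p|n} (1 − 1/p) translates directly into code; a minimal sketch:

    def totient(n: int) -> int:
        """Euler's φ via the product formula φ(n) = n · Π_{p|n} (1 - 1/p)."""
        result, m, p = n, n, 2
        while p * p <= m:
            if m % p == 0:
                result -= result // p    # multiply result by (1 - 1/p), exactly
                while m % p == 0:
                    m //= p
            p += 1
        if m > 1:                        # one prime factor > √n may remain
            result -= result // m
        return result

    assert [totient(n) for n in range(1, 11)] == [1, 1, 2, 2, 4, 2, 6, 4, 6, 4]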
§18.2.5: Jordan’s Totient Functions Jk
(Wikipedia article.)
Definition
    J_k(n) := n^k Π_{p|n} (1 − 1/p^k)
Clearly, J₁ ≡ φ.
Some Identities
Asymptotics
    J_k has average order n^k/ζ(k + 1); that is,  Σ_{n≤x} J_k(n) = x^{k+1}/((k + 1) ζ(k + 1)) + (error).  (Source: Wikipedia)
§18.2.6: Liouville’s λ Function
(Wikipedia article.)
Definition
For n > 1, write a prime decomposition of n by n = p₁^{a₁} · · · p_r^{a_r}. Then
    λ(n) := (−1)^{Ω(n)} = { (−1)^{a₁+···+a_r}, n > 1 ; 1, n = 1 }
Some Identities
    (Dirichlet Series)  Σ_{n=1}^∞ λ(n)/n^s = ζ(2s)/ζ(s)  (Source: Wikipedia)
    Σ_{n=1}^∞ λ(n) log(n)/n = −ζ(2)  (Source: Wikipedia)
Asymptotics
§18.2.7: The Divisor-Sum Functions σα
(Wikipedia article.)
Definition
Let α ∈ C and n ∈ Z≥1. Define the sum of the αth powers of n’s divisors by
    σ_α(n) := Σ_{d|n} d^α  =⇒  σ_α = u ∗ N^α
Some Identities
    (Multiplicative) σ_α(mn) = σ_α(m)σ_α(n) for m, n coprime. In particular, σ_α(Π_{i=1}^r p_i^{a_i}) = Π_{i=1}^r σ_α(p_i^{a_i})
    (Prime Powers)  σ_α(p^n) = { (p^{α(n+1)} − 1)/(p^α − 1), α ≠ 0 ; n + 1, α = 0 }
    (Dirichlet Inverse)  σ_α⁻¹(n) = Σ_{d|n} d^α µ(d) µ(n/d)  =⇒  σ_α⁻¹ = µN^α ∗ µ  (Apostol, Thm. 2.20)
    Σ_{d|n} 2^{ω(d)} = σ₀(n²)
    Π_{d|n} d = n^{d(n)/2}  (Apostol, Prob. 2.10)
    Σ_{r|n} d³(r) = (Σ_{r|n} d(r))²  (Apostol, Prob. 2.12)
    φ₁(n) := n Σ_{d|n} |µ(d)|/d = Σ_{d²|n} µ(d) σ(n/d²)  (Apostol, Prob. 3.11b)
    (Menon’s identity)  Σ_{1≤k≤n, gcd(k,n)=1} gcd(k − 1, n) = φ(n) σ₀(n)  (Source: Wikipedia)
    (Dirichlet Series)  Σ_{n=1}^∞ σ_α(n)/n^s = ζ(s)ζ(s − α)  for s > max{1, 1 + α}  (Source: Wikipedia)
    (Ramanujan)  Σ_{n=1}^∞ σ_α(n)σ_β(n)/n^s = ζ(s) ζ(s − α) ζ(s − β) ζ(s − (α + β)) / ζ(2s − (α + β))  (Source: Wikipedia)
    σ_k(n) = ζ(k + 1) n^k Σ_{ℓ=1}^∞ c_ℓ(n)/ℓ^{k+1}  for c_ℓ the Ramanujan sums  (Source: Wikipedia)
Asymptotics
    Σ_{n≤x} σ_α(n) = (ζ(α + 1)/(α + 1)) x^{α+1} + O(x^{max{1,α}})  for α ∈ R⁺, α ≠ 1 and x ≥ 1  (Apostol, Thm. 3.5)
    Σ_{n≤x} σ_{−α}(n) = { ζ(α + 1) x + O(x^{max{0,1−α}}), α ≠ 1 ; ζ(2) x + O(log x), α = 1 }  wherein α > 0  (Apostol, Thm. 3.6)
    Σ_{n≤x} d(n)/n = (1/2) log²(x) + 2γ log(x) + O(1)  (Apostol, Prob. 3.2)
§18.2.8: The Number of Prime Divisor Functions, ω, Ω, ν
(Wikipedia article.)
Definition
Let n > 1 have the prime factorization n = p₁^{a₁} · · · p_r^{a_r}. We define:
    ω(n) := Σ_{p|n} 1  (the text also uses ν := ω)
    Ω(n) := Σ_{p^k|n, k∈Z⁺} 1 = a₁ + · · · + a_r
Some Identities
    Ω(n) ≥ ω(n);  Ω(n) = ω(n) ⟺ n squarefree, in which case µ(n) = (−1)^{ω(n)} = (−1)^{Ω(n)}  (Source: Wikipedia)
    Ω(n) = 1 ⟺ n prime  (Source: Wikipedia)
    Σ_{d|n} |µ(d)| = Σ_{d|n, d squarefree} 1 = 2^{ω(n)}  (Source: Wikipedia)
    Σ_{d|n} 2^{ω(d)} = σ₀(n²)  (Source: Wikipedia)
Asymptotics
    Σ_{n≤x} ω(n) = x log log x + B₁x + o(x)  for B₁ ≈ 0.261 the Mertens constant  (Source: Wikipedia)
    Σ_{n≤x} Ω(n) = x log log x + B₂x + o(x)  for B₂ = B₁ + Σ_p 1/(p(p − 1)) ≈ 1.035  (Source: Wikipedia)
    Σ_{n≤x} ω²(n) = x (log log x)² + O(x log log x)  (Source: Wikipedia)
    Σ_{n≤x} ω^k(n) = x (log log x)^k + O(x (log log x)^{k−1})  where k ∈ Z≥1  (Source: Wikipedia)
    Σ_{n≤x} (Ω(n) − ω(n)) = O(x)  (Source: Wikipedia)
    Given m ∈ Z≥2,  Σ_{1≤k≤n, gcd(k,m)=1} 1 = (n/m) φ(m) + O(2^{ω(m)})  (Source: Wikipedia)
§18.2.9: Mangoldt’s Λ Function
(Wikipedia article.)
Definition
    Λ(n) := { log(p), n = p^m for a prime p and an m ∈ Z≥1 ; 0, otherwise }
Some Identities
    log(n) = Σ_{d|n} Λ(d)  (or log = Λ ∗ u)  (Apostol, Thm. 2.10)
    Λ(n) = Σ_{d|n} µ(d) log(n/d) = −Σ_{d|n} µ(d) log(d)  (or Λ = µ ∗ log)  (Apostol, Thm. 2.11)
    Σ_{n=2}^∞ Λ(n)/(n^s log(n)) = log ζ(s), for Re(s) > 1  (Source: Wikipedia)
    ◦ Consequently,  ζ′(s)/ζ(s) = −Σ_{n=1}^∞ Λ(n)/n^s  (Source: Wikipedia)
    ◦ More generally, for f completely multiplicative, if we define
        F(s) = Σ_{n=1}^∞ f(n)/n^s
      then (where this converges absolutely)  F′(s)/F(s) = −Σ_{n=1}^∞ f(n)Λ(n)/n^s.
Asymptotics
    Σ_{n≤x} Λ(n) ⌊x/n⌋ = x log x − x + O(log x)  (Apostol, Thm. 3.15)
    lim_{x→∞} (1/x) Σ_{n≤x} Λ(n) = 1  (equivalent to the PNT)
    Σ_{n≤x} Λ(n)/n = log x + O(1)  (Apostol, Thm. 4.9)
    (Selberg’s Asymptotic Formula)  ψ(x) log(x) + Σ_{n≤x} Λ(n) ψ(x/n) = 2x log(x) + O(x)  (Apostol, Thm. 4.18)
    M(x) log(x) + Σ_{n≤x} M(x/n) Λ(n) = O(x)  (Apostol, Prob. 4.23)
§18.2.10: Chebyshev’s ψ Function / Second Function
(Wikipedia article.)
Definition
Defined as the partial sums of Mangoldt’s Λ:
    ψ(x) := Σ_{n≤x} Λ(n) = Σ_{m≤log₂(x)} Σ_{p≤x^{1/m}} log(p) = Σ_{m≤log₂(x)} ϑ(x^{1/m})
Some Identities
    Via Mellin transform,  ζ′(s)/ζ(s) = −s ∫₁^∞ ψ(x)/x^{s+1} dx  for Re(s) > 1  (Source: Wikipedia)
Asymptotics
    ψ(x) ∼ x as x → ∞  (equivalent to PNT)  (Apostol, Thm. 4.4)
    Σ_{n≤x} ψ(x/n) = x log x − x + O(log x)  (Apostol, Thm. 4.11)
    (Selberg’s Asymptotic Formula)  ψ(x) log(x) + Σ_{n≤x} Λ(n) ψ(x/n) = 2x log(x) + O(x)  (Apostol, Thm. 4.18)
    For x ≥ e²²,  |ψ(x) − x| ≤ 0.006409 x/log x  (Source: Wikipedia)
    For x ≥ 121,  0.9999 √x < ψ(x) − ϑ(x) < 1.00007 √x + 1.78 ∛x  (Source: Wikipedia)
§18.2.11: Chebyshev’s ϑ Function / First Function
(Wikipedia article.)
Definition
The definition for ϑ comes naturally from a rewriting of the definition of ψ:
    ϑ(x) := Σ_{p≤x} log(p)
Some Identities
    ϑ(x) = π(x) log(x) − ∫₂^x π(t)/t dt  (Apostol, Thm. 4.3)
    π(x) = ϑ(x)/log(x) + ∫₂^x ϑ(t)/(t log²(t)) dt  (Apostol, Thm. 4.3)
    Define Λ₁ by  Λ₁(n) := { log(n), n is prime ; 0, otherwise }.  Then ϑ(x) = Σ_{n≤x} Λ₁(n)
    ψ(x) = Σ_{n=2}^∞ M(x/n) log(n)  (Source: Wikipedia)
Asymptotics
    ϑ(x) ∼ x as x → ∞  (equivalent to PNT)  (Apostol, Thm. 4.4)
    Σ_{n≤x} ϑ(x/n) = x log x + O(x)  (Apostol, Thm. 4.11)
    π(x) = x/log x + O(x/log²x)  ⟺  ϑ(x) = x + O(x/log x)  (Apostol, Prob. 4.18)
    (Selberg’s Asymptotic Formula)  ψ(x) log(x) + Σ_{n≤x} Λ(n) ψ(x/n) = 2x log(x) + O(x)  (Apostol, Thm. 4.18)
    ◦ This is equivalent to the relations (by Exercise 4.22)
        ψ(x) log(x) + Σ_{p≤x} ψ(x/p) log(p) = 2x log(x) + O(x)
        ϑ(x) log(x) + Σ_{p≤x} ϑ(x/p) log(p) = 2x log(x) + O(x)
    For k ≥ 10¹¹,  ϑ(p_k) ≥ k (log k + log log k − 1 + (log log k − 2.050735)/log k)  (Source: Wikipedia)
    For k ≥ 198,  ϑ(p_k) ≤ k (log k + log log k − 1 + (log log k − 2)/log k)  (Source: Wikipedia)
    For x ≥ 10,544,111,  |ϑ(x) − x| ≤ 0.006788 x/log x  (Source: Wikipedia)
    For x ≥ 121,  0.9999 √x < ψ(x) − ϑ(x) < 1.00007 √x + 1.78 ∛x  (Source: Wikipedia)
§18.2.12: The Prime-Counting Function π
(Wikipedia article.)
Definition
π(x) counts the number of primes ≤ x; formally,
    π(x) := Σ_{p≤x} 1
Some Identities
    ϑ(x) = π(x) log(x) − ∫₂^x π(t)/t dt  (Apostol, Thm. 4.3)
    π(x) = ϑ(x)/log(x) + ∫₂^x ϑ(t)/(t log²(t)) dt  (Apostol, Thm. 4.3)
    (1/6) n/log(n) < π(n) < 6 n/log(n)  for n ≥ 2  (Apostol, Thm. 4.6)
Asymptotics
    (Prime Number Theorem)  π(x) ∼ x/log x as x → ∞
    π(x) = x/log x + O(x/log²x)  ⟺  ϑ(x) = x + O(x/log x)  (Apostol, Prob. 4.18)
    RH =⇒ π(x) = li(x) + O(√x log x).  (Source: Wikipedia)
    ◦ Specifically,  |π(x) − li(x)| < √x log(x)/(8π)  for x ≥ 2657
§18.2.13: The Riemann ζ Function
(Wikipedia article.)
Definition
For the purposes of this text, we limit s to the set (0, 1) ∪ (1, ∞) and define
    ζ(s) := { Σ_{n=1}^∞ 1/n^s,  s > 1 ;
              lim_{x→∞} (Σ_{n≤x} 1/n^s − x^{1−s}/(1 − s)),  s ∈ (0, 1) }
Some Identities
    Σ_{n=1}^∞ µ(n)/n^α = 1/ζ(α)  for α ∈ R⁺, α ≠ 1
    (Euler Product)  ζ(s) = Π_{p prime} (1 − 1/p^s)^{−1}
    Σ_{n=1}^∞ φ(n)/n^s = ζ(s − 1)/ζ(s)  on Re(s) > 2  (Source: Wikipedia)
    Σ_{n=1}^∞ J_k(n)/n^s = ζ(s − k)/ζ(s)  (Source: Wikipedia)
    Σ_{n=1}^∞ λ(n)/n^s = ζ(2s)/ζ(s)  (Source: Wikipedia)
    Σ_{n=1}^∞ σ_α(n)/n^s = ζ(s)ζ(s − α)  for s > max{1, 1 + α}  (Source: Wikipedia)
    (Ramanujan)  Σ_{n=1}^∞ σ_α(n)σ_β(n)/n^s = ζ(s) ζ(s − α) ζ(s − β) ζ(s − (α + β)) / ζ(2s − (α + β))  (Source: Wikipedia)
    Σ_{n=2}^∞ Λ(n)/(n^s log(n)) = log ζ(s), for Re(s) > 1  (Source: Wikipedia)
    ◦ Consequently,  ζ′(s)/ζ(s) = −Σ_{n=1}^∞ Λ(n)/n^s  (Source: Wikipedia)
    Via Mellin transform,  ζ′(s)/ζ(s) = −s ∫₁^∞ ψ(x)/x^{s+1} dx  for Re(s) > 1  (Source: Wikipedia)
Asymptotics
    Σ_{n≤x} 1/n = log(x) + γ + O(1/x)  (Apostol, Thm. 3.2a)
    Σ_{n≤x} 1/n^s = x^{1−s}/(1 − s) + ζ(s) + O(1/x^s)  for s > 0 and s ≠ 1  (Apostol, Thm. 3.2b)
    Σ_{n>x} 1/n^s = O(x^{1−s})  for s > 1  (Apostol, Thm. 3.2c)
    Σ_{n≤x} n^s = x^{s+1}/(s + 1) + O(x^s)  for s ≥ 0  (Apostol, Thm. 3.2d)
    Given k ∈ Z≥1,  Σ_{1≤n≤x, gcd(k,n)=1} 1/n^s = (φ(k)/k) x^{1−s}/(1 − s) + ζ(s) Σ_{d|k} µ(d)/d^s + O(1/x^s)  (Apostol, Prob. 3.12)
§18.3: Assorted Other Useful Results
    π(x) ∼ x/log(x) as x → ∞; that is,  lim_{x→∞} π(x)/(x/log x) = 1
    lim_{x→∞} (1/x) Σ_{n≤x} µ(n) = 0  (can express with M)
    lim_{x→∞} (1/x) Σ_{n≤x} Λ(n) = 1  (can express with ψ)
    ψ(x) ∼ x  (Apostol, Thm. 4.4)
    ϑ(x) ∼ x  (Apostol, Thm. 4.4)
    π(x) ∼ x/log(π(x))  (Apostol, Thm. 4.5)
    p_n ∼ n log(n) for p_n the nth prime  (Apostol, Thm. 4.5)
    lim_{x→∞} M(x)/x = 0  (equivalent to PNT)  (Apostol, Thm. 4.14)
    π(x) ∼ li(x)  where li(x) := ∫₀^x dt/log(t) (as a principal value)  (Source: Wikipedia)
§18.3.3: Abel’s Identity
For a an arithmetical function with A(x) := Σ_{n≤x} a(n), and f with continuous derivative on [y, x], 0 < y < x,
    Σ_{y<n≤x} a(n) f(n) = A(x) f(x) − A(y) f(y) − ∫_y^x A(t) f′(t) dt  (Apostol, Thm. 4.2)
(A Tauberian companion) For a_n ≥ 0, if
    Σ_{n=0}^∞ a_n x^n ∼ 1/(1 − x)  as x ↗ 1
then
    Σ_{k=0}^n a_k ∼ n  as n → ∞
Equivalently, by taking x := 1/e^y, if
    Σ_{n=0}^∞ a_n e^{−ny} ∼ 1/y  as y ↘ 0
then
    Σ_{k=0}^n a_k ∼ n  as n → ∞
§18.4: Congruences & Modular Arithmetic
    a ≡ b (mod m)  or  a ≡_m b    if and only if    m | a − b
§18.4.2: Basic Results
◦ â = b̂ ⟺ a ≡_m b  (Apostol, Thm. 5.10)
◦ x, y ∈ â ⟺ x ≡_m a and y ≡_m a  (Apostol, Thm. 5.10)
◦ The residue classes {k̂}_{k=1}^m (in mod m) partition Z. (Hence they are pairwise disjoint and union to Z.)  (Apostol, Thm. 5.10)
◦ For (k, m) coprime, {a_i}_{i=1}^m ∈ CRS(m) =⇒ {ka_i}_{i=1}^m ∈ CRS(m).  (Apostol, Thm. 5.11)
◦ For (k, m) coprime, {a_i}_{i=1}^{φ(m)} ∈ RRS(m) =⇒ {ka_i}_{i=1}^{φ(m)} ∈ RRS(m)  (Apostol, Thm. 5.16)
(On Linear Congruences)
◦ Let a, m be coprime. Then ax ≡_m b has a single unique solution (mod m).  (Apostol, Thm. 5.12)
◦ If gcd(a, m) = d, then ax ≡_m b has solutions iff d | b.  (Apostol, Thm. 5.13)
◦ If so, then there are exactly d solutions, given by {t + km/d}_{k=0}^{d−1}, where t solves (a/d)x ≡ b/d (mod m/d)  (Apostol, Thm. 5.14)
◦ For a, m coprime, the solution to ax ≡_m b satisfies x ≡_m b a^{φ(m)−1}  (Apostol, Thm. 5.20)
(Euler & Fermat Style Results)
◦ (Lagrange) Take p prime and f ∈ Z[x] of degree n with leading coefficient c_n ̸≡_p 0. Then f(x) ≡_p 0 has at most
deg(f)-many solutions.  (Apostol, Thm. 5.21)
◦ Corollary: If the equation has > deg(f) solutions, then p | c_i for every coefficient c_i.  (Apostol, Thm. 5.22)
(On Systems of Congruences)
◦ (Chinese Remainder Theorem/CRT) Let {m_i}_{i=1}^r be pairwise coprime, M their product,
and {b_i}_{i=1}^r ⊆ Z. Then the system of congruences x ≡ b_i (mod m_i) has a unique solution modulo
M.  (Apostol, Thm. 5.26)
    Let M = m₁ · · · m_r and M_k = M/m_k. Let M_k′ be the inverse of M_k modulo m_k. (Hence,
    M_k M_k′ ≡ 1 (mod m_k).) Then the solution is given by
        x = Σ_{i=1}^r b_i M_i M_i′
    pre-reduction modulo M.
    Opting for my own notation, where (n)⁻¹_[m] is the multiplicative inverse of n modulo m, so
    that n · (n)⁻¹_[m] ≡_m 1, we have
        x = Σ_{i=1}^r b_i M_i (M_i)⁻¹_[m_i]
    in modulo M.
◦ Corollary: Let also {a_i}_{i=1}^r be such that a_i, m_i is a coprime pair for each i. Then the system
a_i x ≡ b_i (mod m_i) has a unique solution modulo M.  (Apostol, Thm. 5.27)
◦ Corollary: For f ∈ Z[x], f(x) ≡_M 0 has solutions ⟺ f(x) ≡_{m_i} 0 has solutions for all i. Moreover, if ν(n) is
the number of solutions mod n, then ν(M) = ν(m₁) · · · ν(m_r).  (Apostol, Thm. 5.28)
    Hence the problem of f(x) ≡_M 0 for M = p₁^{a₁} · · · p_r^{a_r} can be reduced to looking at the
    equations f(x) ≡ 0 (mod p_i^{a_i}).
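The constructive CRT solution x = Σ b_i M_i M_i′ is short to implement; a minimal sketch (Python 3.8+, using pow(·, −1, m) for the modular inverse):

    from math import prod

    def crt(b, m):
        """Solve x ≡ b[i] (mod m[i]) for pairwise-coprime moduli, via x = Σ b_i M_i M_i'."""
        M = prod(m)
        x = 0
        for b_i, m_i in zip(b, m):
            M_i = M // m_i
            M_i_inv = pow(M_i, -1, m_i)   # inverse exists since gcd(M_i, m_i) = 1
            x += b_i * M_i * M_i_inv
        return x % M

    # x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)  =>  x = 23 (Sunzi's classic example)
    assert crt([2, 3, 2], [3, 5, 7]) == 23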
§18.5: Dirichlet Characters & Finite Abelian Groups
Personal Notations:
§18.5.2: Basic Results
On Dirichlet Characters:
◦ |DChar(k)| = φ(k). Each χ ∈ DChar(k) is completely multiplicative and k-periodic.  (Apostol,
Thm. 6.15)
◦ Take DChar(k) := {χ_i}_{i=1}^{φ(k)} and m, n ∈ Z with n, k coprime. Then:  (Apostol, Thm. 6.16)
    Σ_{r=1}^{φ(k)} χ_r(m) χ̄_r(n) = { φ(k), m ≡_k n ; 0, otherwise }
◦ For χ ≠ χ₁ a nonprincipal character mod k, and f nonnegative with continuous negative derivative
for x ≥ x₀ (so f decreases):  (Apostol, Thm. 6.17)
    Σ_{x<n≤y} χ(n) f(n) = O(f(x))
  Suppose further f(x) → 0 as x → ∞; then Σ_{n=1}^∞ χ(n) f(n) converges.
  Moreover, for x ≥ x₀,  Σ_{n≤x} χ(n) f(n) = Σ_{n=1}^∞ χ(n) f(n) + O(f(x))
◦ Corollary: for f(x) = 1/x, f(x) = log(x)/x, and f(x) = x^{−1/2} respectively:  (Apostol, Thm. 6.18)
    Σ_{n≤x} χ(n)/n = Σ_{n=1}^∞ χ(n)/n + O(1/x)
    Σ_{n≤x} χ(n) log(n)/n = Σ_{n=1}^∞ χ(n) log(n)/n + O(log(x)/x)
    Σ_{n≤x} χ(n)/√n = Σ_{n=1}^∞ χ(n)/√n + O(1/√x)
◦ Let χ ∈ DChar(k) be real-valued, and A(n) = Σ_{d|n} χ(d). Then A(n) ≥ 0 for all n, and A(n) ≥ 1
whenever n is a perfect square.  (Apostol, Thm. 6.19)
§18.6: On Arithmetical Progressions & Primes
◦ It is easy to see that N(k) is even, and since the LHS → ∞, the RHS must too; hence
N(k) = 0.
◦ If L(1, χ) = 0 for χ ≠ χ₁, then  L′(1, χ) Σ_{n≤x} µ(n)χ(n)/n = log(x) + O(1)  (Apostol, Lem. 7.8)
◦ For k > 0 and a coprime to it, define  π_a(x) := Σ_{p≤x, p≡_k a} 1
◦ π_a(x) counts the number of primes ≤ x in the sequence {nk + a}_{n=0}^∞.
◦ Its version of the PNT is  π_a(x) ∼ (1/φ(k)) π(x) ∼ (1/φ(k)) x/log(x)  as x → ∞
◦ If the above holds, then π_a(x) ∼ π_b(x) whenever a, b are coprime to k. The converse is true.
§18.7: More on Dirichlet Characters & Gauss Sums
My Notations:
§18.7.2: Some Results
Early results:
◦ Σ_{m=0}^{k−1} ζ_k^{mn} = { 0, k ∤ n ; k, k | n }  (Apostol, Thm. 8.1)
◦ Lagrange interpolation (Th. 8.2): given {z_i, w_i}_{i=0}^{k−1} ⊆ C, with z_i distinct, ∃! polynomial P
where deg(P) ≤ k − 1 and P(z_m) = w_m. It is given by defining
    A(z) := Π_{i=0}^{k−1} (z − z_i)    A_m(z) := A(z)/(z − z_m)    P(z) = Σ_{m=0}^{k−1} w_m A_m(z)/A_m(z_m)
◦ Fourier Existence (Th. 8.4): For f arithmetical and k-periodic, ∃! g arithmetical and k-
periodic where
    f(m) = Σ_{n=0}^{k−1} g(n) ζ_k^{mn}    for    g(n) = (1/k) Σ_{m=0}^{k−1} f(m) ζ_k^{−mn}
◦ For s_k(n) := Σ_{d|gcd(n,k)} f(d) g(k/d) with F := f ∗ g, and N := k/gcd(n, k), under suitable
multiplicativity hypotheses on f, g one has
    s_k(n) = F(k) g(N)/F(N)
◦ Hence (taking f = N, g = µ, so s_k = c_k and F = φ),
    c_k(n) = φ(k) µ(N)/φ(N)
◦ Σ_{k=1}^n c_k(m) = Σ_{d|m} d M(n/d)  for M Mertens’ function  (Apostol, Prob. 8.3a)
◦ M(m) = Σ_{d|m} (1/d) µ(m/d) Σ_{k=1}^d c_k(d)  (Apostol, Prob. 8.3b)
◦ Σ_{m=1}^n c_k(m) = Σ_{d|k} d µ(k/d) ⌊n/d⌋  (Apostol, Prob. 8.3c)
◦ G(n, χ₁) = c_k(n)
◦ If n, k are coprime, then G(n, χ) = χ̄(n) · G(1, χ)  (Apostol, Thm. 8.9)
◦ G(n, χ) is separable ∀n iff G(n, χ) = 0 whenever n, k are not coprime  (Apostol, Thm. 8.10)
◦ If G(n, χ) is separable ∀n, then |G(1, χ)|² = k  (Apostol, Thm. 8.11)
◦ Let n be not coprime to k with G(n, χ) ≠ 0. Then ∃d | k, d < k, where  (Apostol, Thm. 8.12)
    χ(a) = χ(b)  whenever gcd(a, k) = gcd(b, k) = 1 and a ≡_d b
On primitive characters:
◦ For χ ∈ PChar(k), it has Fourier expansion  (Apostol, Thm. 8.20)
    χ(m) = (τ_k(χ)/√k) Σ_{n=1}^k χ̄(n) ζ_k^{−mn}
where
    τ_k(χ) = G(1, χ)/√k = (1/√k) Σ_{m=1}^k χ(m) ζ_k^m
Note that |τ_k(χ)| = 1.
◦ There is no real χ ∈ PChar(2m) for m odd.  (Apostol, Prob. 8.5)
Recall that ∀χ ∈ DChar(k), we have  |Σ_{m≤x} χ(m)| ≤ φ(k)
Polya’s Inequality: For χ ∈ PChar(k), ∀x ≥ 1,  |Σ_{m≤x} χ(m)| < √k log(k)  (Apostol, Thm. 8.21)
◦ Improvable to  |Σ_{n≤x} χ(n)| < √k + (2/π) √k log(k)  (Apostol, Prob. 8.14)
◦ For χ nonprimitive mod k,  Σ_{m≤x} χ(m) = O(√k log(k))  (Apostol, Thm. 13.15)
§18.8: Quadratic Residues & Quadratic Reciprocity
Personal Notations:
◦ Residues: We say n is a quadratic residue mod p if ∃x ∈ Z (or Z/pZ) such that x² ≡_p n. (That
is, n has a square root of sorts, namely x.)
    The text opts to say that n is a mod p quadratic residue by nRp.
    A nonresidue has no such x; the text denotes it nR̸p.
◦ Indicator Symbols: (Sometimes (n | p) is used instead.)
    Legendre Symbol:  (n/p)_L := { +1, n ∈ QR(p) ; −1, n ∉ QR(p) ; 0, p | n }  (“is n a residue mod p?”)
    Jacobi Symbol: If P = p₁^{a₁} · · · p_r^{a_r},  (n/P)_J = Π_{i=1}^r (n/p_i)_L^{a_i}  and  (n/1)_J = 1
    Kronecker Symbol: We let the following hold to extend the Legendre symbol:
        (n/2)_L := { 0, n even ; +1, n ≡₈ 1, 7 ; −1, n ≡₈ 3, 5 }
        (n/−1)_L := { −1, n < 0 ; +1, n ≥ 0 }
        (n/0)_L := { 1, n = ±1 ; 0, otherwise }
    Then we use the Jacobi symbol definition. Let P = u Π_{i=1}^r p_i^{a_i} as a prime decomposition, with
    u = ±1. Then
        (n/P)_K := (n/u)_L · Π_{i=1}^r (n/p_i)_L^{a_i}
◦ Quadratic Character: The χ ∈ DChar(p) defined by χ(n) := (n/p)_L
§18.8.2: Main Results
Preliminaries:
◦ For p an odd prime, any RRS(p) set has (p − 1)/2 quadratic residues, and (p − 1)/2 nonresidues.
The residues belong to the classes in which these lie:  (Apostol, Thm. 9.1)
    1², 2², 3², · · ·, ((p − 1)/2)²
◦ Gauss’ Lemma: Let p ∤ n. Take {kn mod p}_{k=1}^{(p−1)/2}. Let m be the number of these > p/2. Then  (Apostol, Thm. 9.6)
    (n/p)_L = (−1)^m
◦ An Improvement: In this scenario,  (Apostol, Thm. 9.7)
    m ≡₂ { ((p² − 1)/8)(n − 1) + Σ_{t=1}^{(p−1)/2} ⌊tn/p⌋,  n even ;  Σ_{t=1}^{(p−1)/2} ⌊tn/p⌋,  n odd }
Algebraic Properties of Legendre Symbols: (Throughout, p is an odd prime.)
◦ (x²/p)_L = { 1, p ∤ x ; 0, p | x }  (Source: Wikipedia)
◦ m ≡_p n =⇒ (m/p)_L = (n/p)_L  (p-periodic on top)
◦ (mn/p)_L = (m/p)_L (n/p)_L  (completely multiplicative on top)  (Apostol, Thm. 9.3)
◦ (−1/p)_L = ((p − 1)/p)_L = (−1)^{(p−1)/2} = { +1, p ≡₄ 1 ; −1, p ≡₄ 3 }  (Apostol, Thm. 9.4)
◦ (2/p)_L = (−1)^{(p²−1)/8} = { +1, p ≡₈ 1, 7 ; −1, p ≡₈ 3, 5 }  (Apostol, Thm. 9.5)
◦ For p ≠ 3,  (3/p)_L = (−1)^{⌊(p+1)/6⌋} = { +1, p ≡₁₂ 1, 11 ; −1, p ≡₁₂ 5, 7 }  (Source: Wikipedia)
◦ For p ≠ 5,  (5/p)_L = (−1)^{⌊2(p+1)/5⌋} = { +1, p ≡₅ 1, 4 ; −1, p ≡₅ 2, 3 }  (Source: Wikipedia)
◦ (p/q)_L = Π_{k=1}^{(q−1)/2} Π_{i=1}^{(p−1)/2} sign(k/p − i/q)  (Source: Wikipedia)
◦ (q/p)_L = Π_{n=1}^{(p−1)/2} sin(2πn · q/p)/sin(2πn/p)  (Source: Wikipedia)
Algebraic Properties of Jacobi Symbols: (Throughout P, Q are distinct, odd, and in Z>0.)
◦ (a/p)_J = −1 =⇒ a ∉ QR(p)
◦ a ∈ QR(p) and a, p coprime =⇒ (a/p)_J = 1  (converse need not hold!!)
◦ All above properties for Legendre symbols hold (cf. Theorems 9.9, 9.10)
◦ (m/P)_J (m/Q)_J = (m/PQ)_J  (compl. mult. on bottom)  (Apostol, Thm. 9.9b)
◦ If a, P are coprime,  (a²n/P)_J = (n/P)_J  (Apostol, Thm. 9.9d)
Big Results: (Throughout, p is an odd prime; if q is used, it is one too, and p ≠ q.)
◦ Euler Criterion:  (n/p)_L ≡_p n^{(p−1)/2}  (Apostol, Thm. 9.2)
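The Euler criterion gives an immediate way to compute Legendre symbols; a minimal sketch:

    def legendre(n: int, p: int) -> int:
        """(n/p)_L for an odd prime p, via the Euler criterion n^((p-1)/2) mod p."""
        r = pow(n, (p - 1) // 2, p)
        return -1 if r == p - 1 else r   # r is always 0, 1, or p-1 ≡ -1 (mod p)

    # 2 is a QR mod 7 (3² = 9 ≡ 2) but not mod 5, matching (2/p) = +1 iff p ≡ 1, 7 (mod 8).
    assert legendre(2, 7) == 1 and legendre(2, 5) == -1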
◦ Legendre Quadratic Reciprocity:  (p/q)_L (q/p)_L = (−1)^{(p−1)(q−1)/4}  (Apostol, Thm. 9.8)
    Equivalently, since the symbols will be ±1,  (p/q)_L = { −(q/p)_L, p, q ≡₄ 3 ; (q/p)_L, otherwise }
    The law also holds for Jacobi symbols in the obvious way.
Some on Gauss Sums: (Throughout, p is an odd prime; if q is used, it is one too, and p ≠ q.)
◦ Recall: for χ ∈ DChar(p),  G(n, χ) := Σ_{r mod p} χ(r) ζ_p^{nr}
◦ Define the quadratic Gauss sum by  G(n; m) := Σ_{r=1}^m ζ_m^{nr²}
◦ For χ(r) := (r | p) (the quadratic character χ_L), χ_L is primitive and G(n, χ_L) = (n | p) · G(1, χ_L)
∀n.
◦ G(1, χ_L)² = (−1/p)_L · p = ±p.  (Apostol, Thm. 9.13)
◦ G(1, χ_L)^{q−1} ≡_p (q/p)_L iff quadratic reciprocity holds  (Apostol, Thm. 9.14)
◦ G(1, χ_L)^{q−1} = (q/p)_L Σ_{r_i mod p, r₁+···+r_q ≡_p q} ((r₁r₂ · · · r_q)/p)_L  (Apostol, Thm. 9.15)
◦ G(n; p) = (n/p)_L · G(1; p)
◦ Quadratic Reciprocity: ∀m ∈ Z≥1, we have
    G(1; m) = (√m/2)(1 + i)(1 + e^{−πim/2}) = { √m, m ≡₄ 1 ; 0, m ≡₄ 2 ; i√m, m ≡₄ 3 ; (1 + i)√m, m ≡₄ 0 }
§19: Special (Often Important & Nonelementary) Functions
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.2: Beta Function – B(x)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.3: Digamma Function – ψ(x)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.4: Error Functions – erf(z), erfc(z), erfi(z)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.5: Exponential Integral – Ei(x)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.6: Fresnel Integrals – S(x), C(x)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.7: Gamma Function – Γ(x)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic – Special Values)
§19.8: Lambert W Function – W (x)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.9: Polylogarithms – Lin (x)
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§19.10: Trig Integrals – Si(x), Ci(x), etc.
(Some links – A list of special functions on Wikipedia – Wikipedia article on this topic)
§20: Useful Inequalities Across Mathematics
Binomial Coefficients:
(Writing C(n, k) for the binomial coefficient.)
◦ max{ (n/k)^k, (n − k + 1)^k/k! } ≤ C(n, k) ≤ n^k/k! ≤ (en/k)^k
◦ C(n, k) ≤ n^n / (k^k (n − k)^{n−k})
◦ n^k/(4 k!) ≤ C(n, k)  (√n ≥ k ≥ 0)
◦ (4^n/√(πn)) (1 − 1/(8n)) ≤ C(2n, n) ≤ (4^n/√(πn)) (1 − 1/(9n))
◦ a/b ≤ c/d =⇒ a/b ≤ (a + c)/(b + d) ≤ c/d  (mediant; b, d > 0)
◦ C(tn, k) ≥ t^k C(n, k)  (t ≥ 1)
◦ Σ_{k=0}^d C(n, k) ≤ min{ n^d + 1, (en/d)^d, 2^n }  (n ≥ d ≥ 1)
Cauchy: For a < b and f convex,  f′(a) ≤ (f(b) − f(a))/(b − a) ≤ f′(b)
Heinz:  √(xy) ≤ (x^{1−α} y^α + x^α y^{1−α})/2 ≤ (x + y)/2  (x, y > 0 and α ∈ [0, 1])
Hermite: For f convex,  f((a + b)/2) ≤ (1/(b − a)) ∫_a^b f ≤ (f(a) + f(b))/2
Jensen: For φ convex, ψ concave, p_i ≥ 0 and Σ_i p_i = 1,
    φ(Σ_i p_i x_i) ≤ Σ_i p_i φ(x_i)    ψ(Σ_i p_i x_i) ≥ Σ_i p_i ψ(x_i)
Log-Mean:  √(xy) ≤ ⁴√(xy) (√x + √y)/2 ≤ (x − y)/(ln(x) − ln(y)) ≤ ((√x + √y)/2)² ≤ (x + y)/2  (x, y > 0)
Maclaurin-Newton: Define, for some {a_i}_{i=1}^n ⊆ R≥0,
    S_k := (1/C(n, k)) Σ_{1≤i₁<i₂<···<i_k≤n} a_{i₁} a_{i₂} · · · a_{i_k}
Then  S_k² ≥ S_{k−1} S_{k+1}  and  S_k^{1/k} ≥ S_{k+1}^{1/(k+1)}  for 1 ≤ k < n.
Mahler: If x_i, y_i > 0,  Π_i (x_i + y_i)^{1/n} ≥ Π_i x_i^{1/n} + Π_i y_i^{1/n}
    |x + y| ≤ 2 max{|x|, |y|}
Square Roots:
◦ √(x + y) ≤ √x + √y  (x, y ≥ 0)
◦ 2√(x + 1) − 2√x < 1/√x < √(x + 1) − √(x − 1) < 2√x − 2√(x − 1)  (x ≥ 1)
◦ 1 − x/2 − x²/2 ≤ √(1 − x) ≤ 1 − x/2  (x ≤ 1)
Stirling:  e (n/e)^n ≤ √(2πn) (n/e)^n e^{1/(12n+1)} ≤ n! ≤ √(2πn) (n/e)^n e^{1/(12n)} ≤ en (n/e)^n
Young: For x, y, p, q > 0 with p, q Hölder-conjugate (1/p + 1/q = 1), we have
    (1/(p x^p) + 1/(q y^q))^{−1} ≤ xy ≤ x^p/p + y^q/q
For integrals, we will have  ∫₀^a f + ∫₀^b f⁻¹ ≥ ab  for f continuous, strictly increasing, with f(0) = 0
§20.2: HM-GM-LM-AM-QM-CM Inequalities
Abbreviations:
    HM = harmonic mean
    GM = geometric mean
    LM = logarithmic mean (defined to be x if all arguments are x)
    AM = arithmetic mean (also: average)
    QM = quadratic mean (also: root-mean-squared (RMS), exponential mean)
    CM = contraharmonic mean
We have, in order, simply
    min ≤ HM ≤ GM ≤ LM ≤ AM ≤ QM ≤ CM ≤ max
For two numbers x, y,
    min{x, y} ≤ 2/(1/x + 1/y) (HM) ≤ √(xy) (GM) ≤ (x − y)/(ln(x) − ln(y)) (LM) ≤ (x + y)/2 (AM)
              ≤ √((x² + y²)/2) (QM) ≤ (x² + y²)/(x + y) (CM) ≤ max{x, y}
Some notes:
    For the weighted versions, equality holds iff the x_k with w_k > 0 are all equal. We assume 0⁰ = 1.
    Proof for the formula for the LM of multiple numbers is here.
§20.3: Inequalities for Trigonometry (Regular & Hyperbolic)
    x − x³/2 ≤ x cos(x) ≤ x cos(x)/(1 − x²/3) ≤ x − x³/6 ≤ x cos(x/√3) ≤ sin(x) ≤ |sin(x)| ≤ x
    x cos(x) ≤ x³/sinh²(x) ≤ x cos²(x/2) ≤ sin(x) ≤ (x cos(x) + 2x)/3 ≤ x²/sinh(x)
    max{ 2/π, (π² − x²)/(π² + x²) } ≤ sin(x)/x ≤ cos(x/2) ≤ 1 ≤ 1 + x²/3 ≤ tan(x)/x  (x ∈ [0, π/2])
§20.4: Inequalities for Exponentiation
    e^x ≥ (1 + x/n)^n ≥ 1 + x
    (1 + x/n)^n ≥ e^x (1 − x²/n)  (n ≥ 1, |x| ≤ n)
    x^n/n! + 1 ≤ e^x ≤ (1 + x/n)^{n+x/2}  (x ≥ 0)
    e^x ≥ (ex/n)^n  (x, n > 0)
    e^x > (1 + x/y)^y > e^{xy/(x+y)}  (x, y > 0)
    e^{2x} ≤ (1 + x)/(1 − x)  (x ∈ (0, 1))
    x e^x ≥ x + x² + x³/2  (x ≥ 0)
    e^x ≤ x + e^{x²}
    e^x + e^{−x} ≤ 2e^{x²/2}  =⇒  cosh(x) ≤ e^{x²/2}
    e^{−x} ≤ 1 − x/2  (x ∈ [0, 1.59])
    e^x ≤ 1 + x + x²  (x ∈ [0, 1.79])
    x^y + y^x > 1  (x, y ∈ (0, 1))
    x^y > x/(x + y)  (x, y ∈ (0, 1))
    1/(2 − x) < x^x < x² − x + 1  (x ∈ (0, 1))
    x^{1/r}(x − 1) ≤ rx(x^{1/r} − 1)  (x, r ≥ 1)
    2^{−x} ≤ 1 − x/2  (x ∈ [0, 1])
    (1 + x/p)^p ≥ (1 + x/q)^q  provided any of the following hold:
    ◦ x > 0 and p > q > 0
    ◦ −p < −q < x < 0
    ◦ −q > −p > x > 0
    (1 + x/p)^p ≤ (1 + x/q)^q  provided any of the following hold:
    ◦ q < 0 < p and −q > x > 0
    ◦ q < 0 < p and −p < x < 0
§20.5: Inequalities for Logarithms
    x/(1 + x) ≤ ln(1 + x) ≤ x(6 + x)/(6 + 4x) ≤ x  (x > −1)
    2/(2 + x) ≤ 1/√(1 + x + x²/12) ≤ ln(1 + x)/x ≤ 1/√(1 + x) ≤ (2 + x)/(2 + 2x)  (x > −1)
    ln(n) + 1/(n + 1) < ln(n + 1) < ln(n) + 1/n ≤ Σ_{k=1}^n 1/k ≤ ln(n) + 1  (n ≥ 1)
    |ln(x)| ≤ (1/2)|x − 1/x|  (x > 0)
    ln(x + y) ≤ ln(x) + y/x  (x, y > 0)
    ln(x) ≤ y(x^{1/y} − 1)  (x, y > 0)
    ln(1 + x) ≥ x − x²/2  (x ≥ 0)
    ln(1 + x) ≥ x − x²  (x ≥ −0.68)
§20.6: Inequalities for Summations
Aczel: For a₁² > Σ_{i=2}^n a_i², then
    (a₁b₁ − Σ_{i=2}^n a_i b_i)² ≥ (a₁² − Σ_{i=2}^n a_i²)(b₁² − Σ_{i=2}^n b_i²)
Carleman:  Σ_{k=1}^n (Π_{i=1}^k |a_i|)^{1/k} ≤ e Σ_{k=1}^n |a_k|
Cauchy-Schwarz (ℓ²(R)):  (Σ_{i=1}^n x_i y_i)² ≤ (Σ_{i=1}^n x_i²)(Σ_{i=1}^n y_i²)
Cauchy-Schwarz (ℓ²(C)):  |Σ_{i=1}^n x_i y_i|² ≤ (Σ_{i=1}^n |x_i|²)(Σ_{i=1}^n |y_i|²)
Chebyshev: Given
◦ x₁ ≤ · · · ≤ x_n
◦ f, g non-decreasing functions
◦ p_i ≥ 0
◦ Σ_i p_i = 1
we have  Σ_i f(x_i)g(x_i)p_i ≥ (Σ_i f(x_i)p_i)(Σ_i g(x_i)p_i)
Gibbs: For a_i, b_i ≥ 0, A := Σ_i a_i and B := Σ_i b_i,
    Σ_i a_i ln(a_i/b_i) ≥ A ln(A/B)
Hardy: For {a_n}_{n∈N} ⊆ R≥0 and p > 1,  Σ_{n=1}^∞ ((1/n) Σ_{i=1}^n a_i)^p ≤ (p/(p − 1))^p Σ_{n=1}^∞ a_n^p
Holder:  Σ_{i=1}^n |x_i y_i| ≤ (Σ_{i=1}^n |x_i|^p)^{1/p} (Σ_{i=1}^n |y_i|^q)^{1/q}
    More generally, for weights with λ_a + λ_b + · · · + λ_z = 1:
    Σ_{i=1}^n |a_i|^{λ_a} |b_i|^{λ_b} · · · |z_i|^{λ_z} ≤ (Σ_{i=1}^n |a_i|)^{λ_a} (Σ_{i=1}^n |b_i|)^{λ_b} · · · (Σ_{i=1}^n |z_i|)^{λ_z}
Milne: Given a_i, b_i ≥ 0, then  (Σ_i (a_i + b_i)) (Σ_i a_i b_i/(a_i + b_i)) ≤ (Σ_i a_i)(Σ_i b_i)
Minkowski (ℓ^p, p ∈ [1, ∞)):  (Σ_{i=1}^n |x_i + y_i|^p)^{1/p} ≤ (Σ_{i=1}^n |x_i|^p)^{1/p} + (Σ_{i=1}^n |y_i|^p)^{1/p}
§20.7: Inequalities for Integrals / Lp Inclusions
L^p Inclusions:
Quick list of various L^p inclusions, where p, q ∈ [1, ∞] are Hölder conjugates (1/p + 1/q = 1):
    f ∈ L^p, g ∈ L^q =⇒ fg ∈ L¹  (from Hölder)
    If 1 ≤ p < r < q < ∞:  L^p ∩ L^q ⊆ L^r  (from interpolation)
Auxiliary Inequalities:
    |x + y| ≤ 2 max{|x|, |y|}
Integral Inequalities:
Clarkson’s Inequalities: Let f, g ∈ L^p(Ω) for p ∈ [2, ∞) and Ω in a measure space X. Then
    ∫_Ω |(f + g)/2|^p + ∫_Ω |(f − g)/2|^p ≤ (1/2) ∫_Ω |f|^p + (1/2) ∫_Ω |g|^p
For p ∈ (1, 2), let q be its Hölder conjugate q = p/(p − 1). (Hence 1/p + 1/q = 1.) Then we instead
have
    (∫_Ω |(f + g)/2|^p)^{q/p} + (∫_Ω |(f − g)/2|^p)^{q/p} ≤ ((1/2) ∫_Ω |f|^p + (1/2) ∫_Ω |g|^p)^{q/p}
We may write this as
    ∥(f + g)/2∥_{L^p(Ω)}^q + ∥(f − g)/2∥_{L^p(Ω)}^q ≤ ((1/2) ∥f∥_{L^p(Ω)}^p + (1/2) ∥g∥_{L^p(Ω)}^p)^{q/p}
Generalized Hölder: for exponents with Σ_i 1/p_i = 1,
    ∥Π_i f_i∥₁ ≤ Π_i ∥f_i∥_{p_i}
◦ Reverse Hölder Inequality: For p ∈ (1, ∞),  ∫_Ω |fg| ≥ (∫_Ω |f|^{1/p})^p (∫_Ω |g|^{−1/(p−1)})^{−(p−1)}
    Abusing L^p norm notation (these are not norms): ∥fg∥₁ ≥ ∥f∥_{1/p} · ∥g∥_{−1/(p−1)}
Interpolation Inequalities: Let 1 ≤ p < r < q < ∞; then L^p ∩ L^q ⊆ L^r.
Moreover, if θ ∈ (0, 1) satisfies 1/r = θ/p + (1 − θ)/q, then  ∥f∥_{L^r} ≤ ∥f∥_{L^p}^θ ∥f∥_{L^q}^{1−θ}.
Minkowski (L^p, p ∈ [1, ∞)):  (∫_Ω |f + g|^p)^{1/p} ≤ (∫_Ω |f|^p)^{1/p} + (∫_Ω |g|^p)^{1/p}
§20.8: Inequalities for Matrices
Hadamard:  det(A)² ≤ Π_{i=1}^n Σ_{j=1}^n A_{i,j}² = Π_{i=1}^n ∥a_i∥₂²  for A := (A_{i,j})_{1≤i,j≤n} with columns a_i.
Equality holds iff the columns are nonzero and pairwise orthogonal.
Schur: For λ_i the eigenvalues of A := (A_{i,j})_{1≤i,j≤n},
    Σ_{i=1}^n |λ_i|² ≤ Σ_{i,j=1}^n |A_{i,j}|²    (with equality iff A is normal)
§20.9: Inequalities for Matrix/Vector Norms
Inequalities of p-Norms:
Unless stated otherwise, assume we’re working in a sequence space (ℓ^p), finite-dimensional vector space
(e.g. C^n for n < ∞), or function space (L^p). We define (in the sequence case)
    ∥x∥_p := (Σ_{i=1}^∞ |x_i|^p)^{1/p}    ∥x∥_∞ := sup_{i∈N} |x_i| ≡ lim_{p→∞} ∥x∥_p
Some implications (in the finite-dimensional case, x ∈ C^n, 1 ≤ p ≤ q ≤ ∞):
    ∥x∥_∞ ≤ ∥x∥_q ≤ ∥x∥_p ≤ ∥x∥₁    and    ∥x∥_p ≤ n^{1/p − 1/q} ∥x∥_q
For matrices A ∈ C^{m×n}, of rank r, we have the inequalities below. Subscripts refer to induced p-norms. The
max norm is given by
    ∥A∥_max := max_{i,j} |a_{i,j}|
    ∥A∥₂ ≤ ∥A∥_F ≤ √r ∥A∥₂
    ∥A∥_F ≤ ∥A∥_∗ ≤ √r ∥A∥_F  (middle is the Schatten 1-norm)
    ∥A∥_max ≤ ∥A∥₂ ≤ √(mn) ∥A∥_max
    ∥A∥_∞ ≤ √n ∥A∥₂ ≤ √(mn) ∥A∥_∞
    ∥A∥₁ ≤ √m ∥A∥₂ ≤ √(mn) ∥A∥₁
    ∥A∥₂ ≤ √(∥A∥₁ ∥A∥_∞)
§20.10: Inequalities for Probability
Inequalities of Moments:
◦ Second Moment:
    P(X > 0) ≥ E[X]²/E[X²]  (if E[X] ≥ 0)
    P(X = 0) ≤ var[X]/E[X²]  (if E[X²] ≠ 0)
◦ Fourth Moment: For E[X⁴] ∈ (0, ∞),  E[|X|] ≥ E[X²]^{3/2}/E[X⁴]^{1/2}
◦ kth Moment: For k even, X₁, · · ·, X_n ∈ [0, 1] k-wise independent r.v.s, with X = Σ_i X_i and
µ := E[X], and C_k := 2√(πk) e^{1/(6k)}, we have
    P(|X − µ| ≥ t) ≤ E[(X − µ)^k]/t^k ≤ C_k (nk/(et²))^{k/2}
Azuma: Consider a martingale {X_n}_{n∈N} with |X_i − X_{i−1}| < c_i almost-surely. Let α ≥ 0. Then
    P(|X_n − X₀| ≥ α) ≤ 2 exp(−α²/(2 Σ_i c_i²))
Chebyshev (for pdfs): Given t > 0, we have
    P(|X − E[X]| ≥ t) ≤ var[X]/t²
    P(X − E[X] ≥ t) ≤ var[X]/(var[X] + t²)
Chebyshev (sum form): Given
◦ x₁ ≤ · · · ≤ x_n
◦ f, g non-decreasing functions
◦ p_i ≥ 0
◦ Σ_i p_i = 1
we have  Σ_i f(x_i)g(x_i)p_i ≥ (Σ_i f(x_i)p_i)(Σ_i g(x_i)p_i). Consequently,
    E[f(X)g(X)] ≥ E[f(X)] · E[g(X)]
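Bounds like Chebyshev's are easy to see in simulation; a quick sketch (the exponential distribution here is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.exponential(scale=1.0, size=1_000_000)   # any distribution with finite variance
    t = 2.0

    lhs = np.mean(np.abs(X - X.mean()) >= t)         # empirical P(|X - E X| >= t)
    rhs = X.var() / t**2                             # Chebyshev bound var[X]/t²
    assert lhs <= rhs
    print(lhs, rhs)   # roughly 0.05 vs 0.25: valid, but loose, as Chebyshev tends to be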
Chernoff-Type Bounds:
◦ P(X ≥ t) ≤ F(a)/a^t  for F(z) := Σ_k P(X = k) z^k the probability generating function and a ≥ 1.
◦ Let X_i = ±1 be independent, with P(X_i = ±1) = 1/2 and X := Σ_i X_i. Let α ≥ 0. Then
    P(X ≥ α) = P(X ≤ −α) ≤ e^{−α²/(2n)}
◦ For X_i ∈ [0, 1] independent r.v.s, X := Σ_i X_i, µ := E[X], and α ≥ 0,
    P(X ≥ (1 + α)µ) ≤ (e^α/(1 + α)^{1+α})^µ ≤ exp(−α²µ/(2 + α))
    P(X ≤ (1 − α)µ) ≤ (e^{−α}/(1 − α)^{1−α})^µ ≤ exp(−α²µ/2)
◦ For R ≥ 2eµ ≈ 5.44µ,  P(X ≥ R) ≤ 2^{−R}  for µ := E[X]
◦ For X_i ∈ [0, 1] k-wise independent r.v.s and E[X_i] = p,
    P(Σ_i X_i ≥ t) ≤ C(n, k) p^k / C(t, k)
◦ Let us have:
    X_i ∈ [0, 1] k-wise independent r.v.s (n-many in total)
    E[X_i] = p_i
    X := Σ_i X_i
    µ := E[X]
    p := µ/n
    δ > 0
    k ≥ k̂ := µδ/(1 − p)
Then
    P(X ≥ (1 + δ)µ) ≤ C(n, k̂) p^{k̂} / C((1 + δ)µ, k̂)
McDiarmid: For Z := f(X₁, · · ·, X_n) with X_i independent and f having bounded differences
(changing the ith coordinate changes f by at most c_i),
    P(|Z − E[Z]| ≥ α) ≤ 2 exp(−2α²/Σ_i c_i²)
Etemadi: Take X_i independent r.v.s with S_k := Σ_{i=1}^k X_i. Let α ≥ 0. Then
    P(max_k |S_k| ≥ 3α) ≤ 3 max_k P(|S_k| ≥ α)
Jensen: For φ convex, ψ concave, p_i ≥ 0 and Σ_i p_i = 1,
    φ(Σ_i p_i x_i) ≤ Σ_i p_i φ(x_i)    ψ(Σ_i p_i x_i) ≥ Σ_i p_i ψ(x_i)
In particular,
    φ(E[X]) ≤ E[φ(X)]    ψ(E[X]) ≥ E[ψ(X)]
Kolmogorov: Take {X_i}_{i=1}^n independent r.v.s with E[X_i] = 0 and var[X_i] < ∞. Let S_k := Σ_{i=1}^k X_i,
and let α > 0. Then
    P(max_k |S_k| ≥ α) ≤ (1/α²) var[S_n] = (1/α²) Σ_i var[X_i]
Markov-Type Bounds:
◦ For a > 0,  P(|X| ≥ a) ≤ E[|X|]/a
◦ For X ∈ [0, 1] and c ∈ (0, E[X]),  P(X ≤ c) ≤ (1 − E[X])/(1 − c)
◦ For f ≥ 0 and f(x) ≥ s > 0 ∀x ∈ S,  P(X ∈ S) ≤ E[f(X)]/s
Paley-Zygmund: Take X a nonnegative r.v. of finite variance, and α ∈ (0, 1). Then
    P(X ≥ αE[X]) ≥ 1 − var[X]/((1 − α)² E[X]² + var[X])
Vysochanskij-Petunin-Gauss: Take X a unimodal r.v. with mode m. Let σ² := var[X] < ∞ and
let τ² := var[X] + (E[X] − m)² = E[(X − m)²]. Then the following hold:
    P(|X − E[X]| ≥ λσ) ≤ 4/(9λ²)  for λ ≥ √(8/3)
    P(|X − m| ≥ α) ≤ 4τ²/(9α²)  for α ≥ 2τ/√3
    P(|X − m| ≥ α) ≤ 1 − α/(τ√3)  for α ≤ 2τ/√3
§20.11: Very General Inequalities (e.g. Inner Product Spaces, Metric Spaces)
Triangle Inequality:
Bessel: Consider a Hilbert space (H, ⟨·, ·⟩) with {e_n}_{n∈N} orthonormal elements of H. Then ∀x ∈ H,
    Σ_{k=1}^∞ |⟨x, e_k⟩|² ≤ ∥x∥² := ⟨x, x⟩
§21: Miscellaneous Topics
At least in the context of R or C, we say α is algebraic (over Q) if ∃p ∈ Q[x] (equivalently Z[x] for that
example) such that p(α) = 0. That is, α is algebraic iff it is the root of a polynomial in rational (or integer)
coefficients. We say α is transcendental otherwise.
Sometimes we let A denote the algebraic numbers, and T the transcendentals.
Noteworthy subsets of A include:
Q itself
Roots of polynomials in Z[x], definitionally (this even circumvents the issue of solvability of quintics)
ⁿ√a for all n ∈ N and a as suitably defined
Constructible numbers
sin(qπ), cos(qπ), and the other trig functions, when q ∈ Q (provided the function is defined)
It forms a field, sometimes poorly denoted Q̄ (but A also is used for the “adele ring” so there isn’t a
great notation)
Countably infinite
A is algebraically closed when taken as a subset of C, and is the smallest algebraically closed field
containing Q
(Lindemann-Weierstrass) For distinct algebraic α₁, · · ·, α_n, the exponentials e^{α₁}, · · ·, e^{α_n} are
linearly independent over A.
◦ In particular, {0, α} for α ∈ A≠0 gives {1, e^α} linearly independent over A, and hence e^α cannot
be algebraic: it must be transcendental.
§21.2: Borwein Integrals
A pattern:
    ∫₀^∞ sin(x)/x dx = π/2
    ∫₀^∞ sin(x)/x · sin(x/3)/(x/3) dx = π/2
    ∫₀^∞ sin(x)/x · sin(x/3)/(x/3) · sin(x/5)/(x/5) dx = π/2
    ∫₀^∞ sin(x)/x · sin(x/3)/(x/3) · sin(x/5)/(x/5) · sin(x/7)/(x/7) dx = π/2
    ∫₀^∞ sin(x)/x · sin(x/3)/(x/3) · · · sin(x/9)/(x/9) dx = π/2
    ∫₀^∞ sin(x)/x · sin(x/3)/(x/3) · · · sin(x/11)/(x/11) dx = π/2
    ∫₀^∞ sin(x)/x · sin(x/3)/(x/3) · · · sin(x/13)/(x/13) dx = π/2
However,
    ∫₀^∞ sin(x)/x · sin(x/3)/(x/3) · · · sin(x/15)/(x/15) dx
        = (467807924713440738696537864469 / 935615849440640907310521750000) π
        = π/2 − (6879714958723010531 / 935615849440640907310521750000) π
        ≈ π/2 − 2.31 × 10⁻¹¹
Moreover,
    ∫₀^∞ 2 cos(x) Π_{k=0}^n [sin(x/(2k + 1)) / (x/(2k + 1))] dx
is π/2 through n = 55 (up to the odd number 111), and fails thereafter.
In fact, even more generally,
    ∫₀^∞ Π_{k=0}^n sin(a_k x)/(a_k x) dx
is π/2 (taking a₀ := 1) precisely when Σ_{k=1}^n a_k ≤ 1.
The core reason behind this is that the Fourier transform of sinc is a rectangle function:
    F[sin(x)/x](t) ∝ rect(t) := { 1, t ∈ [−1, 1] ; 0, otherwise }
(with the constant depending on the transform convention).
§21.3: Euclidean Algorithm
More descriptively:
Suppose we are given elements a, b ∈ R for some Euclidean domain. (These could be integers, but also
could be, say, polynomials over a sufficiently-structured ring, e.g. Z[x].) The Euclidean algorithm can be
used to achieve two main goals:
    Find gcd(a, b)
    Find the x, y such that ax + by = gcd(a, b)
We start with our given elements, and wish to find q₀, r₀ ∈ R such that
    a = q₀b + r₀  with  N(r₀) < N(b)  (or r₀ = 0)
(N here is a norm. For Z, it is typically the absolute value; for Z[x] it is typically the degree of the
polynomial.) To find q₀, r₀: divide a/b, take the quotient q₀, and set r₀ := a − q₀b.
This gives you q₀, r₀. If r₀ ≠ 0, we may proceed again, with the algorithm on b, r₀, since gcd(a, b) = gcd(b, r₀).
Then we want q₁, r₁ such that
    b = q₁r₀ + r₁  with  N(r₁) < N(r₀)  (or r₁ = 0)
We again divide our two known ones, b/r₀, and get q₁, r₁ from there. If need be we go on to the algorithm
for
    r₀ = q₂r₁ + r₂
and repeat. Loosely speaking, we have
    (M_ℓ, M_r) ∈ { (a, b) → (b, r₀) → (r₀, r₁) → (r₁, r₂) → · · · }
    (G_ℓ, G_r) ∈ { (q₀, r₀) → (q₁, r₁) → (q₂, r₂) → (q₃, r₃) → · · · }
    M_ℓ = M_r G_ℓ + G_r
(where M is the elements that “move along” and G those that get generated, but notice that only the r_i get
put into the moving set). One may also describe the Euclidean algorithm recursively:
    r₀ = a,  r₁ = b,  r_{i+1} = r_{i−1} − q_i r_i
terminating when the remainder vanishes, i.e. r_{n−1} = r_n q_{n+1} + 0, at which point gcd(a, b) = r_n.
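A sketch of the recursion over Z, carrying the Bezout coefficients along (the extended algorithm, which also produces the x, y above):

    def extended_gcd(a: int, b: int):
        """Return (g, x, y) with g = gcd(a, b) and ax + by = g."""
        x0, y0, x1, y1 = 1, 0, 0, 1       # coefficients expressing a and b themselves
        while b != 0:
            q, r = divmod(a, b)           # a = qb + r with 0 <= r < |b|
            a, b = b, r
            x0, x1 = x1, x0 - q * x1      # update Bezout coefficients alongside the remainders
            y0, y1 = y1, y0 - q * y1
        return a, x0, y0

    g, x, y = extended_gcd(240, 46)
    assert (g, 240 * x + 46 * y) == (2, 2)   # gcd(240, 46) = 2 = 240·(-9) + 46·47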
§21.4: Cauchy Product of Sums/Series & Discrete Convolution
Given two series in coefficients {a_n}_{n∈N}, {b_n}_{n∈N}, we get one in coefficients {c_n}_{n∈N} as so:
    (Σ_{n=0}^∞ a_n)(Σ_{n=0}^∞ b_n) = Σ_{n=0}^∞ c_n  wherein  c_n = Σ_{i+j=n} a_i b_j = Σ_{j=0}^n a_j b_{n−j}  (infinite summation)
    (Σ_{n=0}^∞ a_n x^n)(Σ_{n=0}^∞ b_n x^n) = Σ_{n=0}^∞ c_n x^n  wherein  c_n = Σ_{i+j=n} a_i b_j = Σ_{j=0}^n a_j b_{n−j}  (infinite power series)
    (Σ_{n=0}^N a_n)(Σ_{n=0}^M b_n) = Σ_{n=0}^{N+M} c_n  wherein  c_n = Σ_{i+j=n} a_i b_j = Σ_{j=0}^n a_j b_{n−j}  (finite summation)
    (Σ_{n=0}^N a_n x^n)(Σ_{n=0}^M b_n x^n) = Σ_{n=0}^{N+M} c_n x^n  wherein  c_n = Σ_{i+j=n} a_i b_j = Σ_{j=0}^n a_j b_{n−j}  (polynomials)
a new sequence of numbers. (In the finite case, it will be longer than the individual sequences, usually.)
3Blue1Brown has a great explanation here.
Note the comparison to the continuous analogue for functions, especially if you think of a_n as a function
a(n):
    (f ∗ g)(ξ) = ∫_R f(x) g(ξ − x) dx
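Since the c_n are exactly a discrete convolution, NumPy computes them directly; a one-line sketch:

    import numpy as np

    # Cauchy-product coefficients c_n = Σ_j a_j b_{n-j} via discrete convolution.
    a = [1, 2, 3]          # a(x) = 1 + 2x + 3x²
    b = [4, 5]             # b(x) = 4 + 5x
    c = np.convolve(a, b)  # (ab)(x) = 4 + 13x + 22x² + 15x³

    assert list(c) == [4, 13, 22, 15]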
§21.5: Euler’s Summation Transformation
The Euler transformation of an alternating series reads
    Σ_{n=0}^∞ (−1)^n a_n = Σ_{n=0}^∞ (1/2^{n+1}) Σ_{k=0}^n (−1)^k C(n, k) a_k
(the inner sums being, up to sign, the forward differences of {a_n}).
It can provide means to evaluate certain identities; for instance, choosing a_n = 1/(2n + 1) gives
    π/2 = Σ_{n=0}^∞ n!/(2n + 1)!!
as in the video. The LHS can be derived from simply noting that
    Σ_{n=0}^∞ (−1)^n/(2n + 1) = Σ_{n=0}^∞ [(−1)^n/(2n + 1)] x^{2n+1} |₀¹
                              = ∫₀¹ (Σ_{n=0}^∞ (−1)^n x^{2n}) dx
                              = ∫₀¹ 1/(1 + x²) dx
                              = arctan(x) |₀¹
                              = π/4
whereas the RHS comes from using the Euler transformation. (How one might use this to evaluate a
given sum, as opposed to use a given identity to derive others, is beyond me.)
§21.6: Formulas for the Primes
§21.7: Lagrange Interpolation
Given a set of points {(x_i, y_i)}_{i=0}^n all unique through which we want an n-degree polynomial, we start as
so...
Begin with the Lagrange basis {ℓ_j}_{j=0}^n for polynomials through those points, whereby ℓ_j(x_i) = δ_{i,j} in
the Kronecker sense. Explicitly,
    ℓ_j(x) = Π_{0≤i≤n, i≠j} (x − x_i)/(x_j − x_i)
The interpolating polynomial is then  L(x) := Σ_{j=0}^n y_j ℓ_j(x).
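A direct implementation of the basis-polynomial formula (a sketch; for serious numerics one usually prefers the barycentric form):

    def lagrange_interpolate(points, x):
        """Evaluate L(x) = Σ_j y_j ℓ_j(x) for the given list of (x_i, y_i) pairs."""
        total = 0.0
        for j, (xj, yj) in enumerate(points):
            ell = 1.0
            for i, (xi, _) in enumerate(points):
                if i != j:
                    ell *= (x - xi) / (xj - xi)   # the basis polynomial ℓ_j evaluated at x
            total += yj * ell
        return total

    # Three points on y = x² are reproduced exactly by the degree-2 interpolant.
    pts = [(0.0, 0.0), (1.0, 1.0), (3.0, 9.0)]
    assert abs(lagrange_interpolate(pts, 2.0) - 4.0) < 1e-12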
§21.8: Induced Metrics & Norms
Given a normed vector space (V, ∥·∥), it induces a function that is a metric:
    d(x, y) := ∥x − y∥
Likewise, given an inner product space (V, ⟨·, ·⟩), it induces a norm and a metric:
    ∥x∥ := √⟨x, x⟩    d(x, y) := ∥x − y∥ = √⟨x − y, x − y⟩
We can see whether a metric d is induced by some norm: a norm-induced metric satisfies (cf. Kreyszig,
Lemma 2.2-9)
    d(x + a, y + a) = d(x, y)  (translation invariance)    d(αx, αy) = |α| d(x, y)  (homogeneity)
§21.9: Pochhammer Symbols (Rising & Falling Factorials)
§21.10: Special Indicator Functions
These are just various indicator functions that I think are neatly framed. While strictly speaking one
can just use the standard indicator/characteristic definition of
    1_A ≡ χ_A ≡ { 1, x ∈ A ; 0, x ∉ A }
this isn’t very enlightening and may prove unhelpful for some algebraic arguments. Some I’ve found will
follow:
Indicator of Primes: Define (for instance)  f(x) := ⌊cos²(π((x − 1)! + 1)/x)⌋  for x ∈ Z≥1.
Then f(x) = 1 for x = 1 or prime, and f(x) = 0 otherwise. The motivation comes from Wilson’s
theorem (Wikipedia) - that (n − 1)! ≡ −1 (mod n) iff n is a prime, i.e. n | 1 + (n − 1)! iff n is prime.
Indicator of Quadratic Residues: (Of sorts.) The Legendre, Jacobi, and Kronecker symbols; fairly
self-explanatory.
§21.11: Vieta’s Formulas
Given a polynomial p(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a₁x + a₀ (a_n ≠ 0)
with roots (not necessarily distinct) r₁, · · ·, r_n, Vieta’s formulas relate the a_i and r_i:
    Σ_{i=1}^n r_i = −a_{n−1}/a_n  (sum of roots)
    Σ_{1≤i<j≤n} r_i r_j = a_{n−2}/a_n
    ⋮
    r₁r₂ · · · r_n = (−1)^n a₀/a_n  (product of roots)
For a quadratic p(x) = ax² + bx + c,
    r₁ + r₂ = −b/a    r₁r₂ = c/a
and for a cubic p(x) = ax³ + bx² + cx + d,
    r₁ + r₂ + r₃ = −b/a,    r₁r₂ + r₁r₃ + r₂r₃ = c/a,    r₁r₂r₃ = −d/a
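A quick numerical check of the first and last formulas with numpy.roots (the cubic is an arbitrary example):

    import numpy as np

    # Vieta check for p(x) = 2x³ - 3x² - 11x + 6, i.e. a = 2, b = -3, c = -11, d = 6:
    roots = np.roots([2, -3, -11, 6])    # roots are 3, 1/2, -2

    assert np.isclose(roots.sum(), 3 / 2)     # Σ r_i = -b/a
    assert np.isclose(np.prod(roots), -3.0)   # Π r_i = -d/a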
§21.12: Volume of ℓp unit ball in Rn
For
    ∥x∥_p := (Σ_{i=1}^n |x_i|^p)^{1/p}  where x := (x_i)_{i=1}^n ∈ F^n ∈ {R^n, C^n},
the volume of the unit ℓ^p ball in R^n is
    Vol(B_p^n) = 2^n Γ(1 + 1/p)^n / Γ(1 + n/p)
One can also define from the ℓ^p unit circles the circle constants π_p (half the unit circle’s circumference),
for which the “usual π,” π := π₂, is a minimum. Some MSE discussion here.
§21.13: Weierstrass Factorization Theorem (Infinite Products In Terms of Roots)
The Weierstrass factorization theorem takes a few forms, and generalizes (in some sense) the fundamental
theorem of algebra to the case of (countably) infinitely-many roots.
The elementary factors are
    E₀(z) := 1 − z,    E_n(z) := (1 − z) exp(z + z²/2 + · · · + z^n/n)  (n ≥ 1)
Lemma 15.8 of a Rudin work gives that |z| < 1 =⇒ |1 − E_n(z)| ≤ |z|^{n+1}.
Observation: Take {a_n}_{n∈N} ⊆ C≠0 with |a_n| → ∞, and {p_n}_{n∈N} ⊆ Z≥0 with
    Σ_{n=1}^∞ (r/|a_n|)^{1+p_n} < ∞, for all r > 0
and set f(z) := Π_{n=1}^∞ E_{p_n}(z/a_n);
then f is entire and f(z) = 0 ⟺ z = a_n for some n. If z is in {a_n}_{n∈N} m times, then z is a root of
f of multiplicity m.
Weierstrass Factorization: Let f be entire, and {a_n}_{n∈N} the nonzero roots of f, repeated according
to multiplicity. Then ∃g entire and ∃{p_n}_{n∈N} ⊆ Z≥0 such that
    f(z) = z^m e^{g(z)} Π_{n=1}^∞ E_{p_n}(z/a_n)
Examples:
    sin(z) = z Π_{n=1}^∞ (1 − z²/(π²n²))
    cos(πz) = Π_{n∈Z, n odd} (1 − 2z/n) e^{2z/n} = Π_{n=0}^∞ (1 − (z/(n + 1/2))²)
    cos(z) = Π_{n=1}^∞ (1 − 4z²/((2n − 1)²π²))
    1/Γ(z) = z e^{γz} Π_{n=1}^∞ (1 + z/n) e^{−z/n}  for γ the Euler-Mascheroni constant
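The sine product converges slowly but steadily; a quick partial-product check (the cutoff N = 10⁴ is arbitrary):

    import numpy as np

    # Partial products of sin(z) = z Π (1 - z²/(π² n²)), evaluated at z = 1:
    z, N = 1.0, 10_000
    n = np.arange(1, N + 1)
    partial = z * np.prod(1.0 - z**2 / (np.pi**2 * n**2))

    print(partial, np.sin(z))   # 0.8414..., agreeing with sin(1) to several decimal places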
§22: Reference Tables for Formulas, LATEX Stuff, & More
LATEX Stuff: (Not really important for formulary material but nice to know.)
◦ LATEX Compilers:
Overleaf (link): Cloud-based editor for LATEX, good for collaboration.
MiKTeX (link): Local distribution for LATEX (ships with an editor), a little more user-friendly than Overleaf.
◦ Drawing Tools:
Ipe (link): Drawing editor which supports LATEX and is generally good for clean drawings,
whilst avoiding the complications of TikZ.
Here is a site to generate simple TikZ diagrams and get the corresponding code. Good for
basics in abstract algebra, category theory, and graph theory.
◦ Tutorial Items:
Math Stack Exchange has a MathJax reference here; MathJax is syntactically almost identical
to LATEX.
Overleaf has some tutorials here.
Commands and swatches for colors are available here. (My LATEX “header” file for them is
downloadable here.)
Here is a site to draw math symbols and get the corresponding LATEX symbol.
◦ Miscellaneous Items:
IguanaTeX (link): A PowerPoint plugin to give proper LATEX support.
Here is a site that can take a PDF and produce (roughly) the LATEX used to make it.
Here is a site that, while not LATEX-specific, can freely edit PDFs.
Trigonometry Reference Tables: (This formulary has a lot, but these can be more compact.)
Integral Tables:
Advanced Calculus Stuff: (Convergence tests, inequalities, integral transforms, and more.)
◦ Various inequalities for sums, probability distributions, integrals, and more with a list by names on Wikipedia
◦ Derivatives and integrals in nonstandard calculi, e.g. product integrals
◦ A compilation of various mathematical series
◦ Tables of Laplace transforms from Paul’s Online Math Notes and Wikipedia
◦ A table of basic Fourier transforms
◦ A table of basic Fourier series
◦ A compilation of various tests for infinite series convergence
Alternative 3D coordinate systems and conversions:
◦ A table converting operators like curl, Laplacian, etc. on Wikipedia (Imgur backup)
◦ The NRL Plasma Formulary
◦ χ2 distribution table
◦ Normal distribution z-score tables using tails (i.e. P (−∞ < X < z)) and distance from mean (i.e. P (0 ≤ X ≤ z))
◦ Student’s t distribution critical values
Miscellaneous/Uncategorized References: