
Mathematical Methods I

This document outlines the contents and structure of a course on mathematical methods. It introduces vector calculus topics like gradients, divergence, curl, and integral theorems. It also discusses orthogonal curvilinear coordinate systems and their application to vector calculus. Finally, it mentions that the course will cover Green's functions and the Dirac delta function.


Natural Sciences Tripos: IB Mathematical Methods I

Contents

Contents a

0 Introduction i
0.1 Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
0.2 Books . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
0.3 Course Website . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
0.4 Lectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

This is a specific individual’s copy of the notes. It is not to be copied and/or redistributed.
0.5 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
0.6 Example Sheets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
0.7 Examples Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
0.8 Computational Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
0.9 Election of Student Representatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
0.10 Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
0.11 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
0.12 Assumed Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

1 Vector Calculus 1
1.0 Why Study This? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Vectors and Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Three-dimensional Euclidean space, points and vectors . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.3 Cartesian Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Suffix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Dyadic and suffix equivalents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Summation convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 Matrix expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.4 Kronecker delta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.5 More on basis vectors (Unlectured) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.6 The Levi-Civita symbol or alternating tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.7 The vector product in suffix notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.8 The product of two Levi-Civita symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.9 A proof of Schwarz’s inequality (Unlectured) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Vector Calculus in Cartesian Coordinates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 The Gradient of a Scalar Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 The Geometrical Significance of Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Natural Sciences Tripos: IB Mathematical Methods I a © [email protected], Michaelmas 2022


1.4 The Divergence and Curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.1 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.2 The Divergence and Curl of a Vector Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.4 F·∇ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Vector Differential Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Second-Order Vector Differential Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6.1 curl grad and div curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6.2 The Laplacian Operator ∇² . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.7 The Big Integral Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.7.1 The Divergence Theorem (Gauss’ Theorem) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.7.2 Stokes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7.3 Examples and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7.4 Interpretation of divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.7.5 Interpretation of curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.8 Orthogonal Curvilinear Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8.1 What Are Orthogonal Curvilinear Coordinates? . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8.2 Relationships Between Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.3 Incremental Change in Position or Length. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.4 The Jacobian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8.5 Properties of Jacobians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.8.6 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.8.7 Spherical Polar Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.8.8 Cylindrical Polar Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.8.9 Volume and Surface Elements in Orthogonal Curvilinear Coordinates . . . . . . . . . . . . . 27
1.8.10 Gradient in Orthogonal Curvilinear Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.8.11 Examples of Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.8.12 Divergence and Curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.8.13 Laplacian in Orthogonal Curvilinear Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.8.14 Further Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.8.15 Aide Memoire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2 Green’s Functions 33
2.0 Why Study This? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.0.1 Physical motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1 The Dirac Delta Function (a.k.a. Alchemy) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.1 The Delta Function as the Limit of a Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.2 Some Properties of the Delta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.3 An Alternative (And Better) View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.4 The Delta Function as the Limit of Other Sequences . . . . . . . . . . . . . . . . . . . . . . . 35



2.1.5 Further Properties of the Delta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.1.6 The Heaviside Step Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.1.7 The Derivative of the Delta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2 Second-Order Linear Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2.1 Homogeneous Second-Order Linear ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2.2 Inhomogeneous Second-Order Linear ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.3 The Wronskian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.4 Initial-value and boundary-value problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3 Differential equations containing delta functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4 Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.4.1 The Green’s Function for two-point homogeneous boundary-value problems . . . . . . . . . . 40
2.4.2 Two Properties of Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.3 Construction of the Green’s Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.4 Examples of Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.4.5 The Green’s Function for homogeneous initial-value problems . . . . . . . . . . . . . . . . . . 43
2.4.6 Inhomogeneous boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3 Fourier Transforms 45
3.0 Why Study This? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.1 The Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.1.2 Examples of Fourier Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.1.3 The Fourier Inversion Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1.4 Properties of Fourier Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1.5 The Relationship to Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2 The Convolution Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.1 Definition of convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.2 Interpretation and examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.3 The convolution theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.4 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Parseval’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4 Power spectra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5 Solution of Ordinary Differential Equations using Fourier Transforms . . . . . . . . . . . . . . . . . . 57

4 Partial Differential Equations 59


4.0 Why Study This? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1 Nomenclature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1.1 Linear Second-Order Partial Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Physical Examples and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.1 Waves on a Violin String . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.2 Electromagnetic Waves (Unlectured) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60



4.2.3 Electrostatic Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2.4 Gravitational Fields (Unlectured) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2.5 Diffusion of a Passive Tracer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.2.6 Heat Flow (Unlectured) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2.7 Other Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3 Separation of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4 The One-Dimensional Wave Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4.1 Separable Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4.2 Boundary and Initial Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4.3 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.5 Poisson’s Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5.1 A Particular Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5.2 Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.5.3 Separable Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.6 The Diffusion Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.6.1 Separable Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.6.2 Boundary and Initial Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.6.3 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.6.4 A Rough and Ready Outline Recipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.7 Solution Using Fourier Transforms (Non-examinable & Unlectured) . . . . . . . . . . . . . . . . . . . 72
4.7.1 The diffusion equation as an exemplar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5 Matrices 74
5.0 Why Study This? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.1.1 Some Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.1.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.1.3 Span and linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.1.4 Basis Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2 Change of Basis: the Rôle of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2.1 Linear Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2.2 Transformation Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.3 Properties of Transformation Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.2.4 Transformation Law for Vector Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3 Some Definitions of Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4 Scalar Product (Inner Product) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.1 Definition of a Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.2 Worked Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4.3 Some Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4.4 The Scalar Product in Terms of Components . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4.5 Properties of the Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84



5.5 Eigenvalues, Eigenvectors and Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.6 Eigenvalues and Eigenvectors of Hermitian Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.6.1 Properties of the Eigenvalues and Eigenvectors of an Hermitian Matrix . . . . . . . . . . . . 86
5.6.2 Diagonalization of Hermitian Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.6.3 Diagonalization of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.7 Applications of Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.7.1 Transformation Law for Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.7.2 Diagonalization of the Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.7.3 Orthonormal Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.7.4 Transformations Between Orthonormal Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.7.5 Worked example (unlectured) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.7.6 Uses of diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.8 Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.8.1 Eigenvectors and Principal Axes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.8.2 Quadrics and conics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.8.3 The Stationary Properties of the Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

6 Elementary Analysis 99
6.0 Why Study This? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.1 Sequences and Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.1.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.1.2 Sequences tending to a limit, or not. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.2 Convergence of Infinite Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2.1 Convergent and divergent series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2.2 A necessary condition for convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2.3 Absolute and conditional convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.3 Tests of Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.3.1 The comparison test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.3.2 D’Alembert’s ratio test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3.3 Cauchy’s test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.4 Functions of a Continuous Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.4.1 Limits and continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.4.2 The O notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.5 Taylor’s Theorem for Functions of a Real Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.6 Analytic Functions of a Complex Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.6.1 Complex differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.6.2 The Cauchy–Riemann equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.6.3 Analytic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.6.4 Consequences of the Cauchy–Riemann equations . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.6.5 Taylor series for analytic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.7 Zeros, Poles and Essential Singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109



6.7.1 Zeros of complex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.7.2 Poles of complex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.7.3 Laurent series and essential singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.7.4 Behaviour at infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.8 Power Series of a Complex Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.8.1 Convergence of Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.8.2 Radius of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.8.3 Determination of the radius of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.8.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

7 Series Solutions of Ordinary Differential Equations 114

7.0 Why Study This? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1.1 First-order linear ordinary differential equations . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1.2 Second-order ordinary differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.1.3 Second-order linear ordinary differential equations . . . . . . . . . . . . . . . . . . . . . . . . 114
7.2 Homogeneous Second-Order Linear ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.2.1 Linearly independent solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.2.2 The Wronskian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.2.3 The Calculation of W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.2.4 A Second Solution via the Wronskian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.3 Taylor Series Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.1 Ordinary and singular points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.2 The solution at ordinary points in terms of a power series . . . . . . . . . . . . . . . . . . . . 117
7.3.3 Example (possibly unlectured) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.3.4 Example: Legendre’s Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.4 Regular Singular Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.4.1 The Indicial Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.4.2 Series Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.4.3 Example: Bessel’s Equation of Order ν . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.4.4 The Second Solution when σ1 − σ2 ∈ Z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.4.5 Irregular singular points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.5 The Method of Variation of Parameters (Unlectured and Not in Schedule) . . . . . . . . . . . . . . . 125



0 Introduction

0.1 Schedule

The schedules, or syllabuses, are determined by a committee which has input from all the Physical Science
subjects in the Natural Sciences and from Computer Science, and are agreed by the Faculty of Mathematics.
The schedules are minimal for lecturing and maximal for examining; that is to say, all the material in the
schedules will be lectured and only this material will be examined.
Below is a copy from the booklet of schedules.1 The numbers in square brackets at the end of paragraphs
indicate roughly (I emphasise roughly) the number of lectures that will be devoted to the material in the
paragraph.
Please note that the committee responsible for the schedules has recently asked me to lecture the section
on Partial differential equations after the section on the Fourier transform (instead of before the section on
Green’s functions).

Part IB: Mathematics


This course comprises Mathematical Methods I, Mathematical Methods II and Mathematical
Methods III and six Computer Practicals. The material in Course A from Part IA will be assumed
in the lectures for this course.2 Topics marked with asterisks should be lectured, but questions
will not be set on them in examinations.
The material in the course will be as well illustrated as time allows with examples and applications
of Mathematical Methods to the Physical Sciences.3 Separate occasional examples classes will
be given as stated in the lecture list.

Mathematical Methods I 24 lectures, Michaelmas term

Vector calculus
Suffix notation. Einstein summation convention. Contractions using δij and εijk.
Reminder of vector products, grad, div, curl, ∇², and their representations using suffix notation.
Divergence theorem and Stokes’ theorem. Vector differential operators in orthogonal curvilinear
coordinates, e.g. cylindrical and spherical polar coordinates. Jacobians. [6]
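The contraction of two alternating tensors mentioned above is the workhorse of suffix-notation proofs. As a standard illustration (my addition, not an excerpt from the schedules), it yields the triple-vector-product formula in two lines:

```latex
\varepsilon_{ijk}\,\varepsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl},
\qquad\text{so}\qquad
\left[\mathbf{a}\times(\mathbf{b}\times\mathbf{c})\right]_i
  = \varepsilon_{ijk}\,a_j\,\varepsilon_{klm}\,b_l c_m
  = b_i\,(a_j c_j) - c_i\,(a_j b_j),
```

i.e. a × (b × c) = b(a · c) − c(a · b), with the summation convention applied throughout.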

Green’s functions
Response to impulses, delta function (treated heuristically), Green’s functions for initial and
boundary value problems. [3]
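The idea behind this section, stated loosely: if G(x, ξ) is the response at x to a unit impulse at ξ, then by linearity the response to a general forcing f is a superposition of impulse responses. For a linear operator L with homogeneous boundary conditions this is the standard statement (paraphrased here, not quoted from the notes):

```latex
\mathcal{L}\,G(x,\xi) = \delta(x-\xi)
\quad\Longrightarrow\quad
y(x) = \int_a^b G(x,\xi)\,f(\xi)\,\mathrm{d}\xi
\ \text{ satisfies }\ \mathcal{L}\,y = f(x),
```

since applying L under the integral sign picks out f(x) via the sifting property of the delta function.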

Fourier transform
Fourier transforms; relation to Fourier series, simple properties and examples, convolution
theorem, correlation functions, Parseval’s theorem and power spectra. [2]
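For orientation (my addition; sign and 2π conventions differ between books, so check the lecturer's), with the convention f̃(k) = ∫ f(x) e^{−ikx} dx the two named theorems above read:

```latex
\widetilde{f * g}(k) = \tilde{f}(k)\,\tilde{g}(k),
\qquad
\int_{-\infty}^{\infty} |f(x)|^2\,\mathrm{d}x
  = \frac{1}{2\pi}\int_{-\infty}^{\infty} |\tilde{f}(k)|^2\,\mathrm{d}k,
```

where the convolution is (f ∗ g)(x) = ∫ f(y) g(x − y) dy.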

Partial differential equations


Linear second-order partial differential equations; physical examples
of occurrence, the method of separation of variables (Cartesian coordinates only). [2]

Matrices
N-dimensional vector spaces, matrices, scalar product, transformation of basis vectors. Eigenvalues
and eigenvectors of a matrix; degenerate case, stationary property of eigenvalues. Orthogonal
and unitary transformations. Quadratic and Hermitian forms, quadric surfaces. [5]

1 See https://www.maths.cam.ac.uk/undergradnst/files/misc/NSTschedules.pdf and also
https://www.maths.cam.ac.uk/undergradnst/currentstudents.
2 However, if you took course A rather than B, then you might like to recall the following extract from the schedules:
The material from course A is assumed. Students are nevertheless advised that if they have taken course A in Part
IA, they should consult their Director of Studies about suitable reading during the Long Vacation before embarking
upon part IB Mathematics.
3 Time is always short.



Elementary Analysis
Idea of convergence and limits. O notation. Statement of Taylor’s theorem with discussion of
remainder. Convergence of series; comparison and ratio tests. Power series of a complex variable;
circle of convergence. Analytic functions: Cauchy-Riemann equations, rational functions and
exp(z). Zeros, poles and essential singularities. [3]
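As a quick numerical illustration of d'Alembert's ratio test (my own sketch, not part of the schedules): for the exponential series Σ xⁿ/n! the ratio of successive terms is x/(n+1), which tends to 0 for any fixed x, so the series converges everywhere; its partial sums at x = 1 approach e.

```python
import math

def term_ratios(a, n_max):
    """Successive ratios |a(n+1)/a(n)| used in d'Alembert's ratio test."""
    return [abs(a(n + 1) / a(n)) for n in range(n_max)]

# Terms of the exponential series at x = 1: a_n = 1/n!.
ratios = term_ratios(lambda n: 1.0 / math.factorial(n), 20)

# The ratio a_{n+1}/a_n = 1/(n+1) decreases towards 0 < 1, so the series converges.
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))

# The partial sums approach e = 2.71828...
partial_sum = sum(1.0 / math.factorial(n) for n in range(20))
assert abs(partial_sum - math.e) < 1e-12
```

For a geometric-type series Σ n zⁿ the same ratios tend to |z|, recovering the circle of convergence |z| < 1 mentioned above.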

Series solutions of ordinary differential equations


Homogeneous equations; solution by series (without full discussion of logarithmic singularities),
exemplified by Legendre’s equation. Classification of singular points. Indicial equation and local
behaviour of solutions near singular points. [3]
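To make "solution by series" concrete, here is a small sketch (my illustration, not the lecturer's code) of the standard power-series recurrence for Legendre's equation (1 − x²)y″ − 2xy′ + l(l+1)y = 0 about the ordinary point x = 0; for integer l one of the two independent series terminates in a polynomial.

```python
from fractions import Fraction

def legendre_series_coeffs(l, n_terms):
    """Coefficients a_n of the even power-series solution y = sum a_n x^n of
    Legendre's equation (1 - x^2) y'' - 2x y' + l(l+1) y = 0, taking a_0 = 1,
    a_1 = 0.  Substituting the series gives the standard recurrence
    a_{n+2} = a_n * (n(n+1) - l(l+1)) / ((n+1)(n+2))."""
    a = [Fraction(1), Fraction(0)]
    for n in range(n_terms - 2):
        a.append(a[n] * (n * (n + 1) - l * (l + 1)) / ((n + 1) * (n + 2)))
    return a

# For l = 2 the even series terminates: y = 1 - 3x^2, which is proportional
# to the Legendre polynomial P_2(x) = (3x^2 - 1)/2.
assert legendre_series_coeffs(2, 6) == [1, 0, -3, 0, 0, 0]
```

Exact rational arithmetic (`Fraction`) keeps the terminating coefficients exactly zero, which floating point would not.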

0.2 Books

An extract from the schedules.

There are very many books which cover the sort of mathematics required by Natural Scientists.
The following should be helpful as general reference; further advice will be given by Lecturers.
Books which can reasonably be used as principal texts for the course are marked with a dagger.
The prices given are intended as a guide only, and are subject to change.

G Arfken & H Weber Mathematical Methods for Physicists, 6th edition. Elsevier, 2005 (£44.09).

J W Dettman Mathematical Methods in Physics and Engineering. Dover, 1988 (£23.99 paperback).
H F Jones Groups, Representation and Physics, 2nd edition. Institute of Physics Publishing,
1998 (£45.99 paperback)
E Kreyszig Advanced Engineering Mathematics, 8th edition. Wiley, 1999 (10th edition available,
£46.59 hardback)

J Mathews & R L Walker Mathematical Methods of Physics, 2nd edition. Pearson/Benjamin
Cummings, 1970 (From £42.00 used).

K F Riley, M P Hobson & S J Bence Mathematical Methods for Physics and Engineering, 3rd edition.
Cambridge University Press, 2002 (£39.99 paperback).
R N Snieder A guided tour of mathematical methods for the physical sciences, 2nd edition.
Cambridge University Press, 2004 (£34.19 paperback)

There is likely to be a resemblance between my notes and Riley, Hobson & Bence. This is because we both
used the same source, i.e. previous Cambridge lecture notes.4
Of the other books, I like Mathews & Walker, but it might be a little mathematical for some. Also, the first
time I gave a ‘service’ mathematics course (over 35 years ago to aeronautics students at Imperial), my notes
bore a resemblance to Kreyszig . . . and that was not because we were using a common source!

0.3 Course Website

See the NST Part IB: Mathematics Moodle course at


https://www.vle.cam.ac.uk/course/view.php?id=78772
The direct link to this term’s section is
https://www.vle.cam.ac.uk/course/view.php?id=78772&sectionid=4279112
but this might break if someone changes the number of sections!

4 When I lectured this course two decades ago, a student hoped that Riley et al. were getting royalties from my lecture

notes; my hope is that my lecturers from 45 years ago are getting royalties from Riley et al.!



0.4 Lectures
• Lectures will start at 11:05 promptly with a summary of the last lecture. If attending in-person (which
I highly recommend on educational grounds), please be on time since it is distracting to have people
walking in late.

• I will aim to finish by 11:55, but am not going to stop dead in the middle of a long proof/explanation.
• I welcome constructive heckling. Hence, if I am inaudible, illegible, unclear (e.g. you spot a typo or
I use jargon you do not understand), or just plain wrong, then please speak up. I will endeavour to
stay around for a few minutes at the front after lectures in order to answer questions. Questions and
comments, particularly longer ones, can also be emailed to me at [email protected].

• I want you to learn. I will do my best to be clear but you must read through and understand your
notes before the next lecture . . . otherwise you will get hopelessly lost. An understanding of your notes
will not diffuse into you just because you have carried your notes around for a week, or put them under
your pillow, or watched them (possibly more than once) online (especially if done at double speed).
• I aim to avoid the words trivial, easy, obvious and yes.5 Let me know if I fail. I will occasionally use
straightforward or similarly to last time; if it is not, email me at [email protected], or
catch me at the end of the next lecture.
• Sometimes I may confuse both you and myself, and may not be able to extract myself in the middle of
a lecture. Under such circumstances I will have to plough on as a result of time constraints; however
I will clear up any problems at the beginning of the next lecture.

• This is a ‘service’ course, so you will not get pure mathematical levels of rigour. However, I will give
some justification for a method, rather than just a recipe, because if you are to use a method efficiently
and effectively, or extend it as might be necessary in research, you need to understand why a method
works, as well as how to apply it.

• If anyone is colour blind please tell me which colours you cannot read.

0.5 Lecture Notes


• The lecture notes should be online [just] before the relevant lecture. If I manage to get organised, there
may be a sign-up sheet for hard-copies, but please do not ask for hard-copies unless you really need
them (since paper copies are not environmentally friendly).
• An advantage of typeset notes is that you can listen to me rather than having to scribble things
down. However, a disadvantage is that you can lose concentration. Hence, with one or two exceptions
figures/diagrams are deliberately omitted from the notes. I was taught to do this at my teaching
course on How To Lecture . . . the aim being that it might help you to stay awake if you have to write
something down from time to time. Indeed, as an aid to concentration, you may wish to copy my
scribbles on the visualisers.
• There are a number of unlectured worked examples in the notes. In the past I have been tempted to
not include these because I was worried that students would be unhappy with material in the notes
that was not lectured. However, a vote in an earlier year was overwhelming in favour of including
unlectured worked examples.
• Please email corrections to the notes to me at [email protected].
• If it is not in the typeset notes, or on the example sheets, it should not be in the exam.

5 But I will fail miserably in the case of yes.



0.6 Example Sheets
• There will be five Example Sheets. They will be available on Moodle at about the same time as you
can do them.
• You should be able to complete the revision example sheet, i.e. Example Sheet 0, immediately (although
you might like to wait until the end of lecture 2 for a couple of the questions).
• You should be able to complete Example Sheets 1/2/3/4 after lectures 6/12/18/24 respectively (or
thereabouts). Please bear this in mind when arranging supervisions.
• There are sketch answers to the sheets. I will make these available to you on Moodle at the end of
weeks 1, 3, 5, 7 and 9 (where I count weeks from 0, with week 0 starting on the Sunday before the
Tuesday on which Full Term starts). If I forget to do this, please remind me by email.
• The good news for supervisors is that the sheets are the same as last year other than for some
rearrangement caused by moving the lecture material on Partial differential equations to later.
Supervisors can have access to the answers almost immediately, as indicated on the Moodle site.

0.7 Examples Classes

There will be Examples Classes on Wednesday 2 November and Wednesday 23 November from 14:00 to
16:00 in the Cockcroft Lecture Theatre.

0.8 Computational Exercises

I have been asked to remind you that there is a Computational Projects element to the course, for which
you need to register on the course Moodle by 23 October 2022.

0.9 Election of Student Representatives

The Faculty Board of Mathematics asked DAMTP to set up a Staff-Student Committee for Mathematics
in the Natural Sciences to provide an opportunity for discussion of matters relating to the courses. The
Committee has four staff and three student members, the latter being drawn from the A and B courses in
Part IA and from the Part IB course.
Hence, this Consultative Committee for NST Mathematics will need an elected undergraduate member
drawn from this course. I have been asked to conduct an election. It has been suggested that a week’s notice
be given and that nominations are asked for in writing, countersigned by the nominee as a guarantee of
willingness to serve. It has been proposed that the election can take place by a show of hands at the start
of a designated lecture.
Please could you hand me nominations in writing, countersigned by the nominee, by the end of the lecture
on Monday 17 October?
If you would prefer that the election take place by other than a show of hands, please could you email an
alternative suggestion to [email protected].

0.10 Feedback

Comments and administrative/organisational queries on the course, lectures and the examples sheets can
be made via the email address [email protected].
Comments received will be edited and passed on anonymously to the relevant lecturer and others concerned.
They will also be considered at the next meeting of the Staff Student Consultative Committee. Queries will
either be answered directly or passed on to the relevant lecturer.



0.11 Acknowledgements

The Lecture Notes and Example Sheets were adapted from those of Paul Townsend, Stuart Dalziel, Mike
Proctor, Paul Metcalfe and Henrik Latter.

0.12 Assumed Knowledge

Familiarity with the following topics at the level of Course A of Part IA Mathematics for Natural Sciences
will be assumed.

• Algebra of complex numbers


• Algebra of vectors (including scalar and vector products)
• Algebra of matrices

• Eigenvalues and eigenvectors of matrices
• Taylor series and the geometric series
• Calculus of functions of several variables
• Line, surface and volume integrals
• The Gaussian integral
• First-order ordinary differential equations
• Second-order linear ODEs with constant coefficients
• Fourier series
• Permutations

More specifically, you should check that you recall the following.

The Greek alphabet.


A α alpha
B β beta
Γ γ gamma
∆ δ delta
E ϵ epsilon
Z ζ zeta
H η eta
Θ θ theta
I ι iota
K κ kappa
Λ λ lambda
M µ mu
N ν nu
Ξ ξ xi
O o omicron
Π π pi
P ρ rho
Σ σ sigma
T τ tau
Υ υ upsilon
Φ ϕ phi
X χ chi
Ψ ψ psi
Ω ω omega
There are also typographic variations on epsilon (i.e. ε), theta (i.e. ϑ), pi (i.e. ϖ), rho (i.e. ϱ), sigma
(i.e. ς) and phi (i.e. φ).



The first fundamental theorem of calculus. The first fundamental theorem of calculus states that the
derivative of the integral of f is f , i.e. if f is suitably ‘nice’ (e.g. f is continuous) then
\[
\frac{d}{dx}\left(\int_{x_1}^{x} f(t)\,dt\right) = f(x)\,. \tag{0.1}
\]
[Key Result]

The second fundamental theorem of calculus. The second fundamental theorem of calculus states
that the integral of the derivative of f is f , e.g. if f is differentiable then
\[
\int_{x_1}^{x_2} \frac{df}{dx}\,dx = f(x_2) - f(x_1)\,. \tag{0.2}
\]
[Key Result]
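As an unlectured aside, both theorems can be spot-checked numerically. The following Python sketch (using NumPy; the grid size, the choice f(x) = x², and the tolerances are illustrative, not part of the notes) approximates the integrals with the trapezoid rule.

```python
import numpy as np

# Unlectured numerical check of (0.1) and (0.2) for f(x) = x**2 on [0, 2].
f = lambda t: t**2
x = np.linspace(0.0, 2.0, 100001)

# (0.1): differentiate the running (trapezoid-rule) integral of f; this recovers f.
F = np.concatenate(([0.0], np.cumsum((f(x[1:]) + f(x[:-1])) / 2 * np.diff(x))))
dFdx = np.gradient(F, x)
assert abs(dFdx[50000] - f(x[50000])) < 1e-6

# (0.2): integrate df/dx = 2x over [0, 2]; this gives f(2) - f(0) = 4.
dfdx = 2 * x
integral = np.sum((dfdx[1:] + dfdx[:-1]) / 2 * np.diff(x))
assert abs(integral - 4.0) < 1e-8
```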

The Gaussian. The function
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{x^2}{2\sigma^2}\right) \tag{0.3}
\]
is called a Gaussian of width σ; in the context of probability theory σ is the standard deviation. The area
under this curve is unity, i.e.
\[
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{x^2}{2\sigma^2}\right) dx = 1\,. \tag{0.4}
\]
[Key Result]
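The normalisation (0.4) can likewise be checked numerically; in the NumPy sketch below (the choice σ = 1.5 and the grid width are illustrative) the Gaussian is integrated over an interval wide enough that the tails are negligible.

```python
import numpy as np

# Check that the Gaussian of width sigma integrates to 1 (equation (0.4)).
sigma = 1.5
x = np.linspace(-12 * sigma, 12 * sigma, 200001)
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Trapezoid-rule estimate of the area under the curve.
area = np.sum((f[1:] + f[:-1]) / 2 * np.diff(x))
assert abs(area - 1.0) < 1e-6
```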

Cylindrical polar co-ordinates (ρ, ϕ, z).

In cylindrical polar co-ordinates the position vector r is given in terms of a radial distance ρ from an
axis ez , a polar angle ϕ, and the distance z along the axis:
\[
\begin{aligned}
\mathbf{r} &= \rho\cos\phi\,\mathbf{e}_x + \rho\sin\phi\,\mathbf{e}_y + z\,\mathbf{e}_z && (0.5a)\\
&= \rho\,\mathbf{e}_\rho + z\,\mathbf{e}_z\,, && (0.5b)
\end{aligned}
\]
where 0 ⩽ ρ < ∞, 0 ⩽ ϕ ⩽ 2π and −∞ < z < ∞.

Remark. Often r and/or θ are used in place of ρ and/or ϕ respectively (but then there is potential
confusion with the different definitions of r and θ in spherical polar co-ordinates).
Spherical polar co-ordinates (r, θ, ϕ).

In spherical polar co-ordinates the position vector r is given in terms of a radial distance r from the
origin, a ‘latitude’ angle θ, and a ‘longitude’ angle ϕ:
\[
\begin{aligned}
\mathbf{r} &= r\sin\theta\cos\phi\,\mathbf{e}_x + r\sin\theta\sin\phi\,\mathbf{e}_y + r\cos\theta\,\mathbf{e}_z && (0.6a)\\
&= r\,\mathbf{e}_r\,, && (0.6b)
\end{aligned}
\]
where 0 ⩽ r < ∞, 0 ⩽ θ ⩽ π and 0 ⩽ ϕ ⩽ 2π.
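Unlectured check of the coordinate formulae (0.5a) and (0.6a): in the NumPy sketch below (the sample values are arbitrary), the spherical-polar position vector has length r, and the cylindrical radius ρ is the distance from the ez axis.

```python
import numpy as np

# Spherical polars (0.6a): the length of r is r, its z-component is r*cos(theta).
r, theta, phi = 2.0, 0.7, 1.2
pos = np.array([r * np.sin(theta) * np.cos(phi),
                r * np.sin(theta) * np.sin(phi),
                r * np.cos(theta)])
assert abs(np.linalg.norm(pos) - r) < 1e-12
assert abs(pos[2] - r * np.cos(theta)) < 1e-12

# Cylindrical polars (0.5a): rho is the distance from the z-axis.
rho, phi_c, z = 1.3, 0.4, -0.8
pos_c = np.array([rho * np.cos(phi_c), rho * np.sin(phi_c), z])
assert abs(np.hypot(pos_c[0], pos_c[1]) - rho) < 1e-12
```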

Taylor’s theorem for functions of more than one variable. Let f (x, y) be a function of two
variables; then
\[
f(x+\delta x,\, y+\delta y) = f(x,y) + \delta x\,\frac{\partial f}{\partial x} + \delta y\,\frac{\partial f}{\partial y}
+ \frac{1}{2!}\left( (\delta x)^2 \frac{\partial^2 f}{\partial x^2} + 2\,\delta x\,\delta y\,\frac{\partial^2 f}{\partial x\,\partial y} + (\delta y)^2 \frac{\partial^2 f}{\partial y^2} \right) + \cdots\,. \tag{0.7}
\]

Exercise. Let g(x, y, z) be a function of three variables. Expand g(x + δx, y + δy, z + δz) correct to
O(δx, δy, δz).
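Before attempting the exercise, one can check the two-variable expansion (0.7) numerically; in the sketch below (the choice f(x, y) = eˣ sin y and the increments are illustrative) the quadratic truncation matches the exact value to the expected O(δ³) accuracy.

```python
import numpy as np

# Check the quadratic Taylor truncation (0.7) for f(x, y) = exp(x)*sin(y).
x, y, dx, dy = 0.3, 0.5, 1e-3, 2e-3

f = np.exp(x) * np.sin(y)
fx, fy = np.exp(x) * np.sin(y), np.exp(x) * np.cos(y)   # exact first partials
fxx = np.exp(x) * np.sin(y)                             # exact second partials
fxy = np.exp(x) * np.cos(y)
fyy = -np.exp(x) * np.sin(y)

taylor2 = (f + dx * fx + dy * fy
           + 0.5 * (dx**2 * fxx + 2 * dx * dy * fxy + dy**2 * fyy))
exact = np.exp(x + dx) * np.sin(y + dy)

# The neglected terms are O(delta^3) ~ 1e-9 for these increments.
assert abs(taylor2 - exact) < 1e-8
```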



Partial differentiation. For variables q1 , q2 , q3 ,
\[
\left(\frac{\partial q_1}{\partial q_1}\right)_{q_2,q_3} = 1\,, \qquad
\left(\frac{\partial q_1}{\partial q_2}\right)_{q_1,q_3} = 0\,, \quad \text{etc.}, \tag{0.8a}
\]
and hence
\[
\frac{\partial q_i}{\partial q_j} = \delta_{ij}\,, \tag{0.8b}
\]
[Key Result]
where δij is the Kronecker delta:
\[
\delta_{ij} = \begin{cases} 1 & \text{if } i = j\,,\\ 0 & \text{if } i \neq j\,. \end{cases} \tag{0.9}
\]
The chain rule. Let h(x, y) be a function of two variables, and suppose that x and y are themselves
functions of a variable s; then
\[
\frac{dh}{ds} = \frac{\partial h}{\partial x}\frac{dx}{ds} + \frac{\partial h}{\partial y}\frac{dy}{ds}\,. \tag{0.10a}
\]
Suppose instead that h depends on n variables xi (i = 1, . . . , n), so that h = h(x1 , x2 , . . . , xn ). If the
xi depend on m variables sj (j = 1, . . . , m), then for j = 1, . . . , m
\[
\frac{\partial h}{\partial s_j} = \sum_{i=1}^{n} \frac{\partial h}{\partial x_i}\frac{\partial x_i}{\partial s_j}\,. \tag{0.10b}
\]
[Key Result]
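The chain rule (0.10a) can be checked against a finite difference; the sketch below takes h(x, y) = xy² with x = cos s and y = sin s (an illustrative choice, not from the notes).

```python
import numpy as np

# Chain rule (0.10a) for h(x, y) = x*y**2 with x = cos(s), y = sin(s).
s = 0.8
x, y = np.cos(s), np.sin(s)
dxds, dyds = -np.sin(s), np.cos(s)

# dh/ds = (dh/dx)(dx/ds) + (dh/dy)(dy/ds), with dh/dx = y**2, dh/dy = 2*x*y.
dhds_chain = y**2 * dxds + 2 * x * y * dyds

# Compare with a centred finite difference of h(s) = cos(s)*sin(s)**2.
h = lambda s: np.cos(s) * np.sin(s)**2
eps = 1e-6
dhds_fd = (h(s + eps) - h(s - eps)) / (2 * eps)
assert abs(dhds_chain - dhds_fd) < 1e-8
```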

Vector identities. Suppose that vectors a, b and c have components (a1 , a2 , a3 ), (b1 , b2 , b3 ) and (c1 , c2 , c3 )
respectively.

Scalar or dot product. The scalar product of a and b is given by
\[
\mathbf{a}\cdot\mathbf{b} = \sum_{i=1}^{3} a_i b_i = a_1 b_1 + a_2 b_2 + a_3 b_3\,. \tag{0.11a}
\]
Vector or cross product. The vector product of a and b is given by
\[
\mathbf{a}\times\mathbf{b} = (a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1)\,. \tag{0.11b}
\]
Scalar triple product. The scalar triple product of a, b and c is given by
\[
\begin{aligned}
(\mathbf{a}\times\mathbf{b})\cdot\mathbf{c} &= a_1 b_2 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_1 b_3 c_2 - a_2 b_1 c_3 - a_3 b_2 c_1 && (0.11c)\\
&= \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}. && (0.11d)
\end{aligned}
\]
Vector triple product. The vector triple product of a, b and c is given by
\[
\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\,\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\,\mathbf{c}\,. \tag{0.11e}
\]
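All four identities can be verified numerically on arbitrary vectors; the following NumPy sketch (the random seed is an arbitrary choice) checks (0.11a), (0.11b), (0.11c,d) and (0.11e).

```python
import numpy as np

# Check the vector identities (0.11a)-(0.11e) on random vectors.
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 3))

# (0.11a): scalar product as a sum of componentwise products.
assert np.isclose(np.dot(a, b), sum(a[i] * b[i] for i in range(3)))

# (0.11b): componentwise formula for the cross product.
assert np.allclose(np.cross(a, b),
                   [a[1]*b[2] - a[2]*b[1],
                    a[2]*b[0] - a[0]*b[2],
                    a[0]*b[1] - a[1]*b[0]])

# (0.11c,d): scalar triple product as a 3x3 determinant with rows a, b, c.
assert np.isclose(np.dot(np.cross(a, b), c),
                  np.linalg.det(np.array([a, b, c])))

# (0.11e): the 'BAC-CAB' vector triple product identity.
assert np.allclose(np.cross(a, np.cross(b, c)),
                   np.dot(a, c) * b - np.dot(a, b) * c)
```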
Line integrals. Let C be a smooth curve; then
\[
\int_{C} d\mathbf{r} = -\int_{-C} d\mathbf{r}\,. \tag{0.6}
\]
[Key Result]

The transpose of a matrix. Let A be a 3 × 3 matrix:
\[
A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}. \tag{0.7a}
\]
Then the transpose, AT , of this matrix is given by
\[
A^{T} = \begin{pmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{pmatrix}. \tag{0.7b}
\]



Fourier series. Let f (x) be a function with period L, i.e. a function such that f (x + L) = f (x). Then the
Fourier series expansion of f (x) is given by
\[
f(x) = \tfrac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos\!\left(\frac{2\pi n x}{L}\right) + \sum_{n=1}^{\infty} b_n \sin\!\left(\frac{2\pi n x}{L}\right), \tag{0.8a}
\]
[Key Result]
where
\[
a_n = \frac{2}{L} \int_{x_0}^{x_0+L} f(x) \cos\!\left(\frac{2\pi n x}{L}\right) dx\,, \tag{0.8b}
\]
\[
b_n = \frac{2}{L} \int_{x_0}^{x_0+L} f(x) \sin\!\left(\frac{2\pi n x}{L}\right) dx\,, \tag{0.8c}
\]
and x0 is an arbitrary constant. Also recall the orthogonality conditions
\[
\int_0^L \sin\frac{2\pi n x}{L}\, \sin\frac{2\pi m x}{L}\, dx = \frac{L}{2}\,\delta_{nm}\,, \tag{0.9a}
\]
\[
\int_0^L \cos\frac{2\pi n x}{L}\, \cos\frac{2\pi m x}{L}\, dx = \frac{L}{2}\,\delta_{nm}\,, \tag{0.9b}
\]
\[
\int_0^L \sin\frac{2\pi n x}{L}\, \cos\frac{2\pi m x}{L}\, dx = 0\,. \tag{0.9c}
\]
Let ge (x) be an even function, i.e. a function such that ge (−x) = ge (x), with period 2L. Then the
Fourier series expansion of ge (x) can be expressed as
\[
g_e(x) = \tfrac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos\!\left(\frac{n\pi x}{L}\right), \tag{0.10a}
\]
where
\[
a_n = \frac{2}{L} \int_0^L g_e(x) \cos\!\left(\frac{n\pi x}{L}\right) dx\,. \tag{0.10b}
\]
Let go (x) be an odd function, i.e. a function such that go (−x) = −go (x), with period 2L. Then
the Fourier series expansion of go (x) can be expressed as
\[
g_o(x) = \sum_{n=1}^{\infty} b_n \sin\!\left(\frac{n\pi x}{L}\right), \tag{0.11a}
\]
where
\[
b_n = \frac{2}{L} \int_0^L g_o(x) \sin\!\left(\frac{n\pi x}{L}\right) dx\,. \tag{0.11b}
\]
Recall that if integrated over a half period, the ‘orthogonality’ conditions require care since
\[
\int_0^L \sin\frac{n\pi x}{L}\, \sin\frac{m\pi x}{L}\, dx = \frac{L}{2}\,\delta_{nm}\,, \tag{0.12a}
\]
\[
\int_0^L \cos\frac{n\pi x}{L}\, \cos\frac{m\pi x}{L}\, dx = \frac{L}{2}\,\delta_{nm}\,, \tag{0.12b}
\]
but
\[
\int_0^L \sin\frac{n\pi x}{L}\, \cos\frac{m\pi x}{L}\, dx =
\begin{cases}
0 & \text{if } n+m \text{ is even,}\\
\dfrac{2nL}{\pi(n^2 - m^2)} & \text{if } n+m \text{ is odd.}
\end{cases} \tag{0.12c}
\]
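The full-period orthogonality conditions (0.9a)-(0.9c) are easy to confirm numerically; the sketch below (the period L = 2 and the grid are arbitrary choices) uses the trapezoid rule, which is highly accurate for smooth periodic integrands.

```python
import numpy as np

# Numerically check the orthogonality relations (0.9a)-(0.9c) over one period L.
L = 2.0
x = np.linspace(0.0, L, 400001)
w = np.diff(x)

def integrate(f):
    return np.sum((f[1:] + f[:-1]) / 2 * w)   # trapezoid rule

for n in range(1, 4):
    for m in range(1, 4):
        ss = integrate(np.sin(2*np.pi*n*x/L) * np.sin(2*np.pi*m*x/L))
        cc = integrate(np.cos(2*np.pi*n*x/L) * np.cos(2*np.pi*m*x/L))
        sc = integrate(np.sin(2*np.pi*n*x/L) * np.cos(2*np.pi*m*x/L))
        expected = L/2 if n == m else 0.0     # (L/2) * delta_nm
        assert abs(ss - expected) < 1e-6
        assert abs(cc - expected) < 1e-6
        assert abs(sc) < 1e-6
```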

Permutations. A permutation of degree n is a function that rearranges n distinct objects, such as the first
n strictly positive integers {1, 2, . . . , n}, amongst themselves.
An even (odd) permutation is one consisting of an even (odd) number of transpositions (interchanges
of two neighbouring objects).



If n = 3 there are 6 permutations (including the identity permutation) that re-arrange {1, 2, 3} to

{1, 2, 3}, {2, 3, 1}, {3, 1, 2}, (0.13a)


{1, 3, 2}, {2, 1, 3}, {3, 2, 1}. (0.13b)

(0.13a) and (0.13b) are, respectively, even and odd permutations of {1, 2, 3}.
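The classification (0.13a), (0.13b) can be reproduced in a few lines of Python; the sketch below counts inversions, using the standard fact that a permutation is even exactly when its inversion count is even.

```python
import itertools

# Classify the 6 permutations of {1, 2, 3} as even or odd by counting inversions.
def is_even(p):
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

evens = [p for p in itertools.permutations((1, 2, 3)) if is_even(p)]
odds  = [p for p in itertools.permutations((1, 2, 3)) if not is_even(p)]

assert sorted(evens) == [(1, 2, 3), (2, 3, 1), (3, 1, 2)]   # (0.13a)
assert sorted(odds)  == [(1, 3, 2), (2, 1, 3), (3, 2, 1)]   # (0.13b)
```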




Suggestions.
Examples.
1. Include Ampere’s law, Faraday’s law, etc., somewhere (see 1997 Vector Calculus notes).

Additions/Subtractions?
1. Remove all the \enlargethispage commands.
2. 2D divergence theorem, Green’s theorem (e.g. as a special case of Stokes’ theorem).
3. Add Fourier transforms of cos x, sin x and periodic functions.
4. Check that the addendum at the end of § 3 has been incorporated into the main section.
5. Swap § 4.7.1 and § 3.5.
6. Swap § 5.2 and § 5.4.

7. Explain that observables in quantum mechanics are Hermitian operators.
8. Come up with a better explanation of why for a transformation matrix, say A, det A ̸= 0.



1 Vector Calculus

1.0 Why Study This?

Scientific quantities can be of different kinds.


• Many scientific quantities just have a magnitude (and sign), e.g. time, temperature, mass, density,
concentration, energy. Such quantities can be completely specified by a single number. We refer to
such numbers as scalars. You have learnt how to manipulate such scalars (e.g. by addition, subtraction,
multiplication, differentiation) since your first day in school (or possibly before that).
• However other quantities have both a magnitude and a direction, e.g. the position of a particle, the
velocity of a particle, the direction of propagation of a wave, a force, an electric field, a magnetic field.
You need to know how to manipulate these quantities (e.g. by addition, subtraction, multiplication,
differentiation) if you are to be able to describe them mathematically.

A field is a quantity that depends continuously on position (and possibly on time). Examples include:
• air pressure in this room (scalar field)
• electric field in this room (vector field)
Vector calculus is concerned with scalar and vector fields. The spatial variation of fields is described by
vector differential operators, which appear in the partial differential equations governing the fields.
Vector calculus is most easily done in Cartesian coordinates, but other systems (curvilinear coordinates) are
better suited for some problems because of symmetries or boundary conditions.

1.1 Vectors and Bases

1.1.1 Three-dimensional Euclidean space, points and vectors

This is a close approximation to our physical space:


• points are the elements of the space
• vectors are translatable, directed line segments
• Euclidean means that lengths and angles obey the classical results of geometry

Definition. A quantity that is specified by a [positive] magnitude and a direction in space is called a
vector.

Example. A point P in 3D (or 2D) space can be specified by giving its position vector, r, from some
chosen origin 0.

1.1.2 Bases

Points and vectors have a geometrical existence without reference to any coordinate system. However, it is
often very useful to describe them in term of a basis for the space. Three non-zero vectors e1 , e2 and e3 can
form a basis in 3D space if they do not all lie in a plane, i.e. they are linearly independent. Any vector can
be expressed uniquely in terms of scalar multiples of the basis vectors:

v = v1 e1 + v2 e2 + v3 e3 . (1.1)

The vi (i = 1, 2, 3) are said to be the components of the vector v with respect to this basis.
Remark. The choice of basis is not unique. The components of a vector are different with respect to two
different bases.



Definition. The ei (i = 1, 2, 3) need not have unit magnitude and/or be orthogonal. However calculations,
etc. are much simpler if the ei (i = 1, 2, 3) define an orthonormal basis, for which the basis vectors have
unit magnitude and are mutually orthogonal, i.e.

e1 ·e1 = e2 ·e2 = e3 ·e3 = 1 , (1.2a)


e1 ·e2 = e2 ·e3 = e3 ·e1 = 0 , (1.2b)
or equivalently
|ei | = 1 , ei · ej = 0 if i ̸= j, i, j = 1, 2, 3. (1.2c)

The orthonormal basis is right-handed if

e1 × e2 = e3 , (1.3)

so that the ordered triple scalar product of the basis vectors is positive:

[e1 , e2 , e3 ] = e1 × e2 · e3 = 1 . (1.4)

Exercise. Show using (1.2c) and (1.3) that

e2 × e3 = e1 and that e3 × e1 = e2 . (1.5)

1.1.3 Cartesian Coordinate Systems

We can set up a Cartesian coordinate system by identifying e1 , e2 and e3 with unit vectors pointing in
the x, y and z directions respectively. The position vector r is then given by

r = x e1 + y e2 + z e3 (1.6a)
  = (x, y, z) , (1.6b)

where (x, y, z) are the Cartesian components of the position vector.

Remarks.

1. We shall sometimes write x1 for x, x2 for y and x3 for z.

2. Alternative notations for a Cartesian basis in R3 (i.e. 3D) include
\[
\mathbf{e}_1 = \mathbf{e}_x = \mathbf{i} = \hat{\imath} = \hat{\mathbf{x}} = \hat{\mathbf{x}}_1\,, \quad
\mathbf{e}_2 = \mathbf{e}_y = \mathbf{j} = \hat{\jmath} = \hat{\mathbf{y}} = \hat{\mathbf{x}}_2 \quad \text{and} \quad
\mathbf{e}_3 = \mathbf{e}_z = \mathbf{k} = \hat{\mathbf{k}} = \hat{\mathbf{z}} = \hat{\mathbf{x}}_3\,, \tag{1.7}
\]
for the unit vectors in the x, y and z directions respectively. Hence from (1.2c) and (1.5)
\[
\mathbf{i}\cdot\mathbf{i} = \mathbf{j}\cdot\mathbf{j} = \mathbf{k}\cdot\mathbf{k} = 1\,, \qquad
\mathbf{i}\cdot\mathbf{j} = \mathbf{j}\cdot\mathbf{k} = \mathbf{k}\cdot\mathbf{i} = 0\,, \tag{1.8a}
\]
\[
\mathbf{i}\times\mathbf{j} = \mathbf{k}\,, \qquad \mathbf{j}\times\mathbf{k} = \mathbf{i}\,, \qquad \mathbf{k}\times\mathbf{i} = \mathbf{j}\,. \tag{1.8b}
\]

3. Two different bases, if both orthonormal and right-handed, are simply related by a rotation.

4. The Cartesian components of a vector are different with respect to two different Cartesian bases.



1.2 Suffix Notation

So far we have used dyadic notation for vectors. Suffix notation is an alternative means of expressing vectors
(and tensors). Once familiar with suffix notation, it is generally easier to manipulate vectors using suffix
notation.6
An alternative to the notation used for the vector (1.1) is to write
v = v1 e1 + v2 e2 + v3 e3 = (v1 , v2 , v3 ) (1.9a)
= {vi } for i = 1, 2, 3 . (1.9b)

Suffix notation. We will refer to v as {vi }, with the i = 1, 2, 3 understood; i is then termed a free suffix.
Remark. Sometimes we will denote the ith component of the vector v by (v)i , i.e. (v)i = vi .
Example: the position vector. The position vector r can be written as

r = (x, y, z) = (x1 , x2 , x3 ) = {xi } . (1.10)

Remark. The use of x, rather than r, for the position vector in dyadic notation possibly seems more
understandable given the above expression for the position vector in suffix notation. Henceforth we will
use x and r interchangeably.

1.2.1 Dyadic and suffix equivalents

If two vectors a and b are equal, we write


a = b, (1.11a)

or equivalently in component form

a1 = b1 , (1.11b)
a2 = b2 , (1.11c)
a3 = b3 . (1.11d)
In suffix notation we express this equality as
ai = bi for i = 1, 2, 3 . (1.11e)
This is a vector equation; when we omit the ’for i = 1, 2, 3’, it is understood that the one free suffix i ranges
through 1, 2, 3 (or 1, 2 in 2D) so as to give three component equations. Similarly
c = λa + µb ⇔ ci = λai + µbi ⇔ cj = λaj + µbj ⇔ cα = λaα + µbα ⇔ c¥ = λa¥ + µb¥ ,

where it is assumed that i, j, α and ¥, respectively, range through (1, 2, 3).7

Remark. It does not matter what letter, or symbol, is chosen for the free suffix, but it must be the same in
each term.
Dummy suffices. In suffix notation the scalar product becomes
\[
\mathbf{a}\cdot\mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3 = \sum_{i=1}^{3} a_i b_i = \sum_{k=1}^{3} a_k b_k\,, \quad \text{etc.,}
\]

6 Although there are dissenters to that view.


7 In higher dimensions the suffices would be assumed to range through the number of dimensions.



where the i, k, etc. are referred to as dummy suffices since they are ‘summed out’ of the equation.
Similarly
\[
\mathbf{a}\cdot\mathbf{b} = \lambda \quad \Leftrightarrow \quad \sum_{\alpha=1}^{3} a_\alpha b_\alpha = \lambda\,,
\]
where we note that the equivalent equation on the right hand side has no free suffices since the dummy
suffix (in this case α) has again been summed out.
Further examples.

(i) As another example consider the equation (a · b)c = d. In suffix notation this becomes
\[
\left(\sum_{k=1}^{3} a_k b_k\right) c_i = \sum_{k=1}^{3} a_k b_k c_i = d_i\,, \tag{1.12}
\]
where k is the dummy suffix, and i is the free suffix that is assumed to range through (1, 2, 3). It
is essential that we use different symbols for the dummy and free suffices!
(ii) In suffix notation the expression (a · b)(c · d) becomes
\[
(\mathbf{a}\cdot\mathbf{b})(\mathbf{c}\cdot\mathbf{d})
= \left(\sum_{i=1}^{3} a_i b_i\right)\left(\sum_{j=1}^{3} c_j d_j\right)
= \sum_{i=1}^{3}\sum_{j=1}^{3} a_i b_i c_j d_j\,,
\]
where, especially after the rearrangement, it is essential that the dummy suffices are different.

1.2.2 Summation convention

In the case of free suffices we are assuming that they range through (1, 2, 3) without the need to explicitly
say so. Under Einstein’s summation convention the explicit sum, Σ, can be omitted for dummy suffices.8
In particular
In particular

• if a suffix appears once it is taken to be a free suffix and ranged through,


• if a suffix appears twice it is taken to be a dummy suffix and summed over,

• if a suffix appears more than twice in one term of an equation, something has gone wrong (unless
there is an explicit sum).

Remark. This notation is powerful because it is highly abbreviated (and so aids calculation, especially in
examinations), but the above rules must be followed, and remember to check your answers (e.g. the
free suffices should be identical on each side of an equation).
Examples. Under suffix notation and the summation convention

a+b=c can be written as ai + bi = ci ,


(a · b)c = d can be written as ai bi cj = dj ,
((a · b)c − (a · c)b)j can be written as ai bi cj − ak ck bj ,
or can be written as ai bi cj − ai ci bj ,
or can be written as ai (bi cj − ci bj ) .

8 Learning to omit the explicit sum is a bit like learning to change gear when starting to drive. At first you have to

remind yourself that the sum is there, in the same way that you have to think consciously where to move the
gear knob. With practice you will learn to note the existence of the sum unconsciously, in the same way that
an experienced driver changes gear unconsciously; however you will crash a few gears on the way!



Under suffix notation the following equations make no sense

ak = bj because the free suffices are different,


((a · b)c)i = ai bi ci because i is repeated more than twice in one term on the left-hand side.

Under suffix notation the following equation is problematical (and probably best avoided unless you
will always remember to double count the i on the right-hand side)

\( n_i n_i = n_i^2 \)   because i occurs twice on the left-hand side and only once on the right-hand side.

Remark. If the summation convention is not being used, this should be noted explicitly.
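As an unlectured computational aside, NumPy's einsum implements precisely the summation convention: repeated suffices are summed, free suffices survive. The sketch below (random vectors, arbitrary seed) reproduces the examples above.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = rng.standard_normal((3, 3))

# a_i b_i  (dummy suffix i): the scalar product.
assert np.isclose(np.einsum('i,i->', a, b), np.dot(a, b))

# a_i b_i c_j  (free suffix j): the vector (a.b)c.
assert np.allclose(np.einsum('i,i,j->j', a, b, c), np.dot(a, b) * c)

# a_i b_i c_j - a_i c_i b_j : the vector (a.b)c - (a.c)b.
lhs = np.einsum('i,i,j->j', a, b, c) - np.einsum('i,i,j->j', a, c, b)
assert np.allclose(lhs, np.dot(a, b) * c - np.dot(a, c) * b)
```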

1.2.3 Matrix expressions

Transpose of a matrix (A):

(AT )ij = Aji .

Trace of a matrix (A):


tr A = Aii .

Matrix (A) times vector (x):


y = Ax ⇔ yi = Aij xj .

Matrix (B) times matrix (C):


A = BC ⇔ Aij = Bik Ckj .

Determinant of a (3 × 3) matrix (A), where (if you have not met it before) εijk is defined in (1.18) below:
det A = εijk A1i A2j A3k .
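Each of these suffix-notation formulae can be checked with np.einsum; in the sketch below (random matrices, arbitrary seed) the Levi-Civita array is built explicitly from its six non-zero components, anticipating the definition (1.18) given formally in §1.2.6.

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = rng.standard_normal((3, 3, 3))

assert np.allclose(np.einsum('ij->ji', A), A.T)              # (A^T)_ij = A_ji
assert np.isclose(np.einsum('ii->', A), np.trace(A))         # tr A = A_ii
x = rng.standard_normal(3)
assert np.allclose(np.einsum('ij,j->i', A, x), A @ x)        # y_i = A_ij x_j
assert np.allclose(np.einsum('ik,kj->ij', B, C), B @ C)      # A_ij = B_ik C_kj

# det A = eps_ijk A_1i A_2j A_3k, with eps built from its non-zero components.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0
assert np.isclose(np.einsum('ijk,i,j,k->', eps, A[0], A[1], A[2]),
                  np.linalg.det(A))
```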

1.2.4 Kronecker delta

The Kronecker delta, δij , i, j = 1, 2, 3, is a set of nine numbers defined by
\[
\delta_{11} = 1\,, \quad \delta_{22} = 1\,, \quad \delta_{33} = 1\,, \tag{1.13a}
\]
\[
\delta_{ij} = 0 \quad \text{if } i \neq j\,. \tag{1.13b}
\]
This can be written as a matrix equation:
\[
\begin{pmatrix} \delta_{11} & \delta_{12} & \delta_{13} \\ \delta_{21} & \delta_{22} & \delta_{23} \\ \delta_{31} & \delta_{32} & \delta_{33} \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \mathbb{1}\,. \tag{1.13c}
\]

Properties.

(i) δij is symmetric, i.e.
\[
\delta_{ij} = \delta_{ji}\,.
\]
(ii) Using the definition of the Kronecker delta:
\[
\sum_{i=1}^{3} a_i \delta_{i1} = a_i \delta_{i1} = a_1 \delta_{11} + a_2 \delta_{21} + a_3 \delta_{31} = a_1\,, \tag{1.14a}
\]
i.e. \( a_i \delta_{i1} = a_1 \). Similarly
\[
a_i \delta_{ij} = a_j\,, \tag{1.14b}
\]
\[
a_j \delta_{ij} = a_i\,. \tag{1.14c}
\]
(iii)
\[
\sum_{j=1}^{3} \delta_{ij}\delta_{jk} = \delta_{ij}\delta_{jk} = \delta_{ik}\,. \tag{1.14d}
\]
(iv)
\[
\sum_{i=1}^{3} \delta_{ii} = \delta_{ii} = \delta_{11} + \delta_{22} + \delta_{33} = 3\,. \tag{1.14e}
\]
(v)
\[
a_p \delta_{pq} b_q = a_p b_p = a_q b_q = \mathbf{a}\cdot\mathbf{b}\,. \tag{1.14f}
\]
(vi) From (1.2c),
\[
\mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij}\,. \tag{1.14g}
\]

Contraction. Contraction is an operation by which we set one free index equal to another, so that it is
summed over. For example, the contraction of aij is aii . Contraction is equivalent to multiplication by
a Kronecker delta:
\[
a_{ij}\delta_{ij} = a_{11} + a_{22} + a_{33} = a_{ii}\,. \tag{1.15}
\]
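The properties (1.14a)-(1.14f) amount to statements about the 3 × 3 identity matrix, and can be confirmed directly; the sketch below (random vectors, arbitrary seed) checks them with np.einsum.

```python
import numpy as np

# The Kronecker delta is the 3x3 identity matrix.
delta = np.eye(3)
rng = np.random.default_rng(3)
a, b = rng.standard_normal((2, 3))

assert np.allclose(np.einsum('i,ij->j', a, delta), a)             # a_i d_ij = a_j
assert np.allclose(np.einsum('ij,jk->ik', delta, delta), delta)   # d_ij d_jk = d_ik
assert np.isclose(np.einsum('ii->', delta), 3.0)                  # d_ii = 3
assert np.isclose(np.einsum('p,pq,q->', a, delta, b),             # a_p d_pq b_q
                  np.dot(a, b))                                   #   = a.b
```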

1.2.5 More on basis vectors (Unlectured)

An alternative notation to e1 , e2 and e3 is e(1) , e(2) and e(3) , where the use of superscripts may help
emphasise that the 1, 2 and 3 are labels rather than components.
Then in terms of the superscript notation
\[
\mathbf{e}^{(i)} \cdot \mathbf{e}^{(j)} = \delta_{ij}\,, \tag{1.16a}
\]
\[
\mathbf{a} \cdot \mathbf{e}^{(i)} = a_i\,. \tag{1.16b}
\]
Thus the ith component of e(j) is given by
\[
\left(\mathbf{e}^{(j)}\right)_i = \mathbf{e}^{(j)} \cdot \mathbf{e}^{(i)} = \delta_{ij}\,. \tag{1.16c}
\]
Similarly
\[
\left(\mathbf{e}^{(i)}\right)_j = \delta_{ij}\,, \tag{1.16d}
\]
and equivalently
\[
(\mathbf{e}_j)_i = (\mathbf{e}_i)_j = \delta_{ij}\,. \tag{1.16e}
\]

1.2.6 The Levi-Civita symbol or alternating tensor

Revision. A permutation of degree n is a function that rearranges n distinct objects (taken in our case to
be the first n strictly positive integers {1, 2, . . . , n}) amongst themselves.
If n = 3 there are 6 permutations (including the identity permutation) that re-arrange {1, 2, 3} to
{1, 2, 3}, {2, 3, 1}, {3, 1, 2}, (1.17a)
{1, 3, 2}, {2, 1, 3}, {3, 2, 1}. (1.17b)
An even (odd) permutation is one consisting of an even (odd) number of transpositions (interchanges
of two neighbouring objects). Hence, (1.17a) and (1.17b) are, respectively, even and odd permutations
of {1, 2, 3}.
Definition 1.1. We define the Levi-Civita permutation symbol, εijk (i, j, k = 1, 2, 3), to be the set of 27
quantities such that
\[
\varepsilon_{ijk} =
\begin{cases}
\phantom{-}1 & \text{if } ijk \text{ is an even permutation of } 1, 2, 3;\\
-1 & \text{if } ijk \text{ is an odd permutation of } 1, 2, 3;\\
\phantom{-}0 & \text{otherwise.}
\end{cases} \tag{1.18}
\]



The non-zero components of εijk are therefore
ε123 = ε231 = ε312 = 1 (1.19a)
ε132 = ε213 = ε321 = −1 (1.19b)
Further
εijk = εjki = εkij = −εikj = −εkji = −εjik . (1.19c)

Worked exercise. For a symmetric tensor sij , i, j = 1, 2, 3, such that sij = sji , evaluate εijk sij .

Solution. By relabelling the dummy suffices we have from (1.19c) and the symmetry of sij that
\[
\sum_{i=1}^{3}\sum_{j=1}^{3} \varepsilon_{ijk} s_{ij}
= \sum_{a=1}^{3}\sum_{b=1}^{3} \varepsilon_{abk} s_{ab}
= \sum_{j=1}^{3}\sum_{i=1}^{3} \varepsilon_{jik} s_{ji}
= -\sum_{i=1}^{3}\sum_{j=1}^{3} \varepsilon_{ijk} s_{ij}\,, \tag{1.20a}
\]
or equivalently by using the summation convention
\[
\varepsilon_{ijk} s_{ij} = \varepsilon_{abk} s_{ab} = \varepsilon_{jik} s_{ji} = -\varepsilon_{ijk} s_{ij}\,, \tag{1.20b}
\]
where we have successively relabelled i → a → j and j → b → i. Hence we conclude that
\[
\varepsilon_{ijk} s_{ij} = 0\,. \tag{1.20c}
\]
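The symmetries (1.19c) and the result (1.20c) can be confirmed by building εijk as an explicit 3 × 3 × 3 array (an unlectured numerical aside; the random symmetric tensor is an illustrative choice).

```python
import numpy as np

# Build eps_ijk from its six non-zero components (1.19a), (1.19b).
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0

# (1.19c): cyclic symmetry and antisymmetry under a single interchange.
assert np.allclose(eps, np.einsum('ijk->jki', eps))      # eps_ijk = eps_kij
assert np.allclose(eps, -np.einsum('ijk->ikj', eps))     # eps_ijk = -eps_ikj

# (1.20c): eps_ijk s_ij = 0 for any symmetric s.
rng = np.random.default_rng(4)
m = rng.standard_normal((3, 3))
s = m + m.T                                              # arbitrary symmetric tensor
assert np.allclose(np.einsum('ijk,ij->k', eps, s), 0.0)
```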

1.2.7 The vector product in suffix notation

We claim that
\[
(\mathbf{a}\times\mathbf{b})_i = \sum_{j=1}^{3}\sum_{k=1}^{3} \varepsilon_{ijk}\, a_j b_k = \varepsilon_{ijk}\, a_j b_k\,, \tag{1.21}
\]
where we note that there is one free suffix and two dummy suffices.

Check.
\[
(\mathbf{a}\times\mathbf{b})_1 = \sum_{j=1}^{3}\sum_{k=1}^{3} \varepsilon_{1jk}\, a_j b_k
= \varepsilon_{123}\, a_2 b_3 + \varepsilon_{132}\, a_3 b_2 = a_2 b_3 - a_3 b_2\,,
\]
in agreement with (0.11b). Do we need to do more?

Remark. Equivalently
\[
\mathbf{a}\times\mathbf{b} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
= \varepsilon_{ijk}\,\mathbf{e}_i\, a_j b_k\,. \tag{1.22}
\]
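Claim (1.21) is easily confirmed against NumPy's built-in cross product (unlectured sketch; the random vectors are illustrative).

```python
import numpy as np

# (1.21): the i-th component of a x b is eps_ijk a_j b_k.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0

rng = np.random.default_rng(5)
a, b = rng.standard_normal((2, 3))
assert np.allclose(np.einsum('ijk,j,k->i', eps, a, b), np.cross(a, b))
```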

Example. From (1.14b), (1.16e) and (1.21),
\[
(\mathbf{e}_j \times \mathbf{e}_k)_i = \varepsilon_{ilm}\, (\mathbf{e}_j)_l\, (\mathbf{e}_k)_m
= \varepsilon_{ilm}\, \delta_{jl}\, \delta_{km} = \varepsilon_{ijk}\,. \tag{1.23}
\]

1.2.8 The product of two Levi-Civita symbols

We claim that
\[
\varepsilon_{ijk}\varepsilon_{lmn} =
\begin{vmatrix}
\delta_{il} & \delta_{im} & \delta_{in}\\
\delta_{jl} & \delta_{jm} & \delta_{jn}\\
\delta_{kl} & \delta_{km} & \delta_{kn}
\end{vmatrix}. \tag{1.24}
\]
As proof we observe that the value of both the LHS and the RHS:
(i) is 0 when any of (i, j, k) are equal (two rows equal in a determinant), or when any of (l, m, n) are
equal (two columns equal in a determinant);
(ii) is 1 when (i, j, k) = (l, m, n) = (1, 2, 3);
(iii) changes sign when any of (i, j, k) are interchanged (row interchange in a determinant), or when any
of (l, m, n) are interchanged (column interchange in a determinant).



Remark. The first property is implied by the third.
A contracted identity. We contract the identity (1.24) once by setting l = i, then use (1.14d) and (1.14e):

              | δii  δim  δin |
εijk εimn  =  | δji  δjm  δjn |
              | δki  δkm  δkn |
           =  δii (δjm δkn − δjn δkm) + δim (δjn δki − δji δkn) + δin (δji δkm − δjm δki)
           =  3(δjm δkn − δjn δkm) + (δjn δkm − δjm δkn) + (δjn δkm − δjm δkn)
           =  δjm δkn − δjn δkm .   (1.25a)

This is the most useful form to remember:


εijk εimn = δjm δkn − δjn δkm (1.25b)

Remarks.
(i) There are four free suffices/indices on each side, with i as a dummy suffix on the left-hand side. Hence (1.25b) represents 3⁴ equations.
(ii) Given any product of two epsilons with one common index, the indices can be permuted cyclically into this form, for instance:
εαβγ εµνβ = εβγα εβµν = δγµ δαν − δγν δαµ .   (1.25c)
Doubly and triply contracted identities. A further contraction of the identity (1.24) yields from (1.25b)
εijk εijn = δjj δkn − δjn δkj
= 3δkn − δkn
= 2δkn , (1.26a)
while a further contraction yields
εijk εijk = 6 . (1.26b)
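Unlectured check. The identities (1.25b), (1.26a) and (1.26b) can all be verified by brute-force contraction over the 3⁴ (respectively 3², 1) component equations; the Python/NumPy sketch below (not part of the original notes) does exactly that:

```python
import numpy as np

# Levi-Civita symbol and Kronecker delta, 0-based indices.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0
delta = np.eye(3)

# (1.25b): eps_ijk eps_imn = delta_jm delta_kn - delta_jn delta_km.
lhs = np.einsum('ijk,imn->jkmn', eps, eps)
rhs = (np.einsum('jm,kn->jkmn', delta, delta)
       - np.einsum('jn,km->jkmn', delta, delta))
print(np.allclose(lhs, rhs))                 # True

# (1.26a): eps_ijk eps_ijn = 2 delta_kn, and (1.26b): eps_ijk eps_ijk = 6.
print(np.einsum('ijk,ijn->kn', eps, eps))    # 2 * identity
print(np.einsum('ijk,ijk->', eps, eps))      # 6.0
```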

Example. Show that


(a × b)·(c × d) = (a·c)(b·d) − (a·d)(b·c) .
Solution.
(a × b)·(c × d) = (a × b)i (c × d)i
= (εijk aj bk )(εilm cl dm ) from (1.21)
= εijk εilm aj bk cl dm
= (δjl δkm − δjm δkl )aj bk cl dm from (1.25b)
= aj bk cj dk − aj bk ck dj from (1.14b) and (1.14c)
= (aj cj )(bk dk ) − (aj dj )(bk ck )
= (a·c)(b·d) − (a·d)(b·c) .
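Unlectured check. Because the identity just derived holds for arbitrary vectors, it can be spot-checked with random input; the sketch below (Python/NumPy, not part of the original notes) does so:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = rng.standard_normal((4, 3))  # four random 3-vectors

# (a x b) . (c x d) = (a.c)(b.d) - (a.d)(b.c)
lhs = np.dot(np.cross(a, b), np.cross(c, d))
rhs = np.dot(a, c) * np.dot(b, d) - np.dot(a, d) * np.dot(b, c)
print(lhs, rhs)  # equal up to rounding
```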
Scalar triple product. In suffix notation the scalar triple product is given by
a · (b × c) = ai (b × c)i
= εijk ai bj ck . (1.27a)

Vector triple product. Using suffix notation for the vector triple product, we recover in agreement with (0.11e):
(a × (b × c))_i = εijk aj (b × c)_k
                = εijk aj εklm bl cm    only two identical suffices allowed
                = εkij εklm aj bl cm    from (1.19c), permute the suffices
                = (δil δjm − δim δjl) aj bl cm    from (1.25b)
                = aj bi cj − aj bj ci    from (1.14b) and (1.14c)
                = ((a · c)b − (a · b)c)_i .   (1.27b)



1.2.9 A proof of Schwarz’s inequality (Unlectured)

There is an elegant proof of Schwarz’s inequality (which works in n dimensions) using the summation convention:
∥x∥² ∥y∥² − |x · y|² = xi xi yj yj − xi yi xj yj
                     = ½ xi xi yj yj + ½ xj xj yi yi − xi yi xj yj    relabel indices in half the first term
                     = ½ (xi yj − xj yi)(xi yj − xj yi)    factorize
                     ⩾ 0 .

1.3 Vector Calculus in Cartesian Coordinates.

1.3.1 The Gradient of a Scalar Field

Let ψ(r) be a scalar field, i.e. a scalar function of position r = (x, y, z). Examples of scalar fields include temperature and density.

Consider a small change to the position r, say to r + δr. This small change in position will generally produce a small change in ψ. We estimate this change in ψ using the Taylor series for a function of many variables, as follows:

δψ ≡ ψ(r + δr) − ψ(r) = ψ(x + δx, y + δy, z + δz) − ψ(x, y, z)
   = (∂ψ/∂x) δx + (∂ψ/∂y) δy + (∂ψ/∂z) δz + . . .
   = (ex ∂ψ/∂x + ey ∂ψ/∂y + ez ∂ψ/∂z) · (δx ex + δy ey + δz ez) + . . .
   = ∇ψ · δr + . . . ,   (1.28a)

where, using the shorthand Σ_j for Σ_{j=1}^{3}, the gradient of ψ is defined by

grad ψ ≡ ∇ψ = ex ∂ψ/∂x + ey ∂ψ/∂y + ez ∂ψ/∂z = Σ_j ej ∂ψ/∂xj .   (1.28b)

In the limit when δ• becomes infinitesimal we write d• for δ•.9 Thus we have that

dψ = ∇ψ · dr . (1.29)

We can define the vector differential operator ∇ (pronounced ‘grad’) independently of ψ by writing

∇ ≡ ex ∂/∂x + ey ∂/∂y + ez ∂/∂z = Σ_j ej ∂/∂xj   (1.30a)
  = ej ∂/∂xj , using the summation convention.   (1.30b)   [Key Result]
9 This is a bit of a ‘fudge’ because, strictly, a differential d• need not be small . . . but there is no quick way out.



1.3.2 Example

Find ∇f , where f (r) is a function of r = |r|. We will use this result later.
Answer. First recall that r² = x² + y² + z². Hence
2r ∂r/∂x = 2x ,   i.e.   ∂r/∂x = x/r .   (1.31a)
Similarly, by use of the permutations x → y, y → z and z → x,
∂r/∂y = y/r ,   ∂r/∂z = z/r .   (1.31b)

Hence, from the definition of gradient (1.28b),

∇r = (∂r/∂x, ∂r/∂y, ∂r/∂z) = (x/r, y/r, z/r) = r/r .   (1.32)   [Key Result]

Similarly, from the definition of gradient (1.28b) (and from standard results for the derivative of a function of a function),

∇f(r) = (∂f(r)/∂x, ∂f(r)/∂y, ∂f(r)/∂z)
      = (df/dr ∂r/∂x, df/dr ∂r/∂y, df/dr ∂r/∂z)
      = f′(r) ∇r   (1.33a)
      = f′(r) r/r .   (1.33b)
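Unlectured check. The result (1.33b) is easy to confirm with a computer algebra system; the following Python/SymPy sketch (not part of the original notes) verifies it for the concrete choice f(r) = r³, for which f′(r) r/r = 3r(x, y, z)/r = 3r^{1}(x, y, z):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
r = sp.sqrt(x**2 + y**2 + z**2)

n = 3
f = r**n  # f(r) = r**3

# Gradient computed component by component from the definition (1.28b).
grad = [sp.diff(f, v) for v in (x, y, z)]

# Prediction from (1.33b): grad r**n = n r**(n-2) (x, y, z).
predicted = [n * r**(n - 2) * v for v in (x, y, z)]

print(all(sp.simplify(g - p) == 0 for g, p in zip(grad, predicted)))  # True
```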

1.3.3 The Geometrical Significance of Gradient

The directional derivative. Consider the rate of change of ψ in the direction given by the unit vector l. If we regard ψ(r + s l) as a function of the single variable s, then a Taylor series expansion yields

δψ = ψ(r + δs l) − ψ(r) = δs [d/ds ψ(r + s l)]_{s=0} + . . . ,

or in the limit of δs becoming infinitesimal,

dψ = ds [d/ds ψ(r + s l)]_{s=0} .   (1.34a)

But from (1.29) with dr = ds l,

dψ = ds (l · ∇ψ) .   (1.34b)

Since (1.34a) and (1.34b) hold for all ds, it follows that

l · ∇ψ = [d/ds ψ(r + s l)]_{s=0} .   (1.35)

Hence l · ∇ψ is the rate of change of ψ in the direction l. It is referred to as a directional derivative.


Remarks.

(i) More generally, the rate of change of ψ with arclength s along a curve is dψ/ds = l·grad ψ, where
l = dr/ds is the unit tangent vector to the curve.
(ii) When the directional derivative is zero, i.e. l · ∇ψ = 0, it follows that if ∇ψ ̸= 0, then ψ does not
change in the direction of l; hence l is a tangent to the surface ψ = constant.



(iii) Moreover, we deduce that when ∇ψ ≠ 0 at a point, ∇ψ is orthogonal/normal to all tangents of the surface ψ = constant at that point. Hence, if n̂ is the unit normal to a surface of constant ψ, then (up to a sign)

n̂ = ∇ψ / |∇ψ| .   (1.36)

(iv) The directional derivative is maximal when l is parallel to ∇ψ; hence ∇ψ is a vector field pointing in the direction in which ψ is changing most rapidly.
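Unlectured check. The defining property (1.35) of the directional derivative can be verified symbolically: differentiate ψ(r + s l) with respect to s at s = 0 and compare with l · ∇ψ. The Python/SymPy sketch below (not part of the original notes; the field ψ and direction l are arbitrary illustrative choices) does this:

```python
import sympy as sp

x, y, z, s = sp.symbols('x y z s', real=True)
psi = x**2 * y + sp.sin(z)       # an arbitrary scalar field

# A unit direction l (|l| = 1) and the gradient of psi.
l = sp.Matrix([2, 1, 2]) / 3
grad = sp.Matrix([sp.diff(psi, v) for v in (x, y, z)])

# Left-hand side of (1.35): l . grad psi.
lhs = (l.T * grad)[0]

# Right-hand side: d/ds psi(r + s l) evaluated at s = 0.
shifted = psi.subs({x: x + s*l[0], y: y + s*l[1], z: z + s*l[2]})
rhs = sp.diff(shifted, s).subs(s, 0)

print(sp.simplify(lhs - rhs))    # 0
```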
1.3.4 Applications

1. Find the unit normal at the point r(x, y, z) to the surface

ψ(r) ≡ xy + yz + zx = −c , (1.37)

where c is a positive constant. Hence find the points where the tangents to the surface are parallel to
the (x, y) plane.
Answer. First calculate

∇ψ = (y + z, x + z, y + x) .   (1.38a)

Then from (1.36) the unit normal is given by

n̂ = ∇ψ / |∇ψ| = (y + z, x + z, y + x) / √(2(x² + y² + z² + xy + xz + yz)) .   (1.38b)

The tangents to the surface ψ(r) = −c are parallel to the (x, y) plane when the normal is parallel to the z-axis, i.e. when n̂ = (0, 0, 1) or n̂ = (0, 0, −1), i.e. when

y = −z   and   x = −z .   (1.38c)

Hence from the equation for the surface, i.e. (1.37), the points where the tangents to the surface are parallel to the (x, y) plane satisfy

z² = c ,   (1.38d)

so from (1.38c)

r = ±(−√c, −√c, √c) .   (1.38e)
2. Unlectured. A mountain’s height z = h(x, y) depends on Cartesian coordinates x, y according to
h(x, y) = 1 − x4 − y 4 ⩾ 0. Find the point at which the slope in the plane y = 0 is greatest.

Answer. The slope of a path is the rate of change in the vertical direction divided by the rate of change in the horizontal direction. So consider a path on the mountain parameterised by s:

r(s) = (x(s), y(s), h(x(s), y(s))) .   (1.39)

As s varies, the rate of change with s in the vertical direction is dh/ds, while the rate of change with s in the horizontal direction is √((dx/ds)² + (dy/ds)²).



Hence the slope of the path is given by

slope = (dh/ds) / √((dx/ds)² + (dy/ds)²)
      = (∂h/∂x dx/ds + ∂h/∂y dy/ds) / √((dx/ds)² + (dy/ds)²)    from (0.10a)
      = l · ∇h ,   (1.40a)

where

l = (dx/ds, dy/ds, 0) / √((dx/ds)² + (dy/ds)²) .   (1.40b)

Thus the slope is a directional derivative. On y = 0

slope = −4x³ (dx/ds) / |dx/ds| = −4x³ sign(dx/ds) .   (1.40c)

Therefore the magnitude of the slope is largest where |x| is largest, i.e. at the edge of the mountain, |x| = 1. It follows that max |slope| = 4.

1.4 The Divergence and Curl

1.4.1 Vector fields

∇ψ is an example of a vector field, i.e. a vector specified at each point r in space. More generally, we have for a vector field F(r)

F(r) = Fx(r) ex + Fy(r) ey + Fz(r) ez = Σ_j Fj(r) ej ,   (1.41)

where Fx, Fy, Fz, or alternatively Fj (j = 1, 2, 3), are the components of F in this Cartesian coordinate system. Examples of vector fields include current, electric and magnetic fields, and fluid velocities.
We can apply the ∇ vector operator to vector fields by means of dot and cross products.

1.4.2 The Divergence and Curl of a Vector Field

Divergence. The divergence of F is the scalar field

div F ≡ ∇ · F = (ex ∂/∂x + ey ∂/∂y + ez ∂/∂z) · (Fx ex + Fy ey + Fz ez)
              = ∂Fx/∂x + ∂Fy/∂y + ∂Fz/∂z   (1.42a)
              = ∂Fj/∂xj , (s.c.)   (1.42b)   [Key Result]

from using (1.14g) and (1.30a), and remembering that in a Cartesian coordinate system the basis vectors do not depend on position and hence do not need to be differentiated.
Curl. The curl of F is the vector field

curl F ≡ ∇ × F = (ex ∂/∂x + ey ∂/∂y + ez ∂/∂z) × (Fx ex + Fy ey + Fz ez)
               = (∂Fz/∂y − ∂Fy/∂z) ex + (∂Fx/∂z − ∂Fz/∂x) ey + (∂Fy/∂x − ∂Fx/∂y) ez   (1.43a)

                 | ex  ey  ez |     | e1   e2   e3  |
               = | ∂x  ∂y  ∂z |  =  | ∂x1  ∂x2  ∂x3 |   (1.43b)
                 | Fx  Fy  Fz |     | F1   F2   F3  |

               = εijk ei ∂Fk/∂xj , (s.c.)   (1.43c)   [Key Result]

from using (1.3) and (1.5), and remembering that in a Cartesian coordinate system the basis vectors do not depend on position. Here

∂/∂x ≡ ∂x ,   ∂/∂y ≡ ∂y ,   ∂/∂z ≡ ∂z   and   ∂/∂xj ≡ ∂xj ≡ ∂j .   (1.43d)

1.4.3 Examples

1. Unlectured. Find the divergence and curl of the vector field F = (x²y, y²z, z²x).
Answer.

∇ · F = ∂(x²y)/∂x + ∂(y²z)/∂y + ∂(z²x)/∂z = 2xy + 2yz + 2zx .   (1.44)

          | ex   ey   ez  |
∇ × F  =  | ∂x   ∂y   ∂z  |  = −y² ex − z² ey − x² ez = −(y², z², x²) .   (1.45)
          | x²y  y²z  z²x |
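Unlectured check. The results (1.44) and (1.45) can be reproduced symbolically; the Python/SymPy sketch below (not part of the original notes) computes the divergence and curl directly from the definitions (1.42a) and (1.43a):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
F = sp.Matrix([x**2 * y, y**2 * z, z**2 * x])

# div F = dFx/dx + dFy/dy + dFz/dz, cf. (1.42a).
div_F = sum(sp.diff(F[i], v) for i, v in enumerate((x, y, z)))

# curl F componentwise, cf. (1.43a).
curl_F = sp.Matrix([
    sp.diff(F[2], y) - sp.diff(F[1], z),
    sp.diff(F[0], z) - sp.diff(F[2], x),
    sp.diff(F[1], x) - sp.diff(F[0], y),
])

print(sp.expand(div_F))  # 2*x*y + 2*y*z + 2*z*x
print(curl_F.T)          # Matrix([[-y**2, -z**2, -x**2]])
```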

2. Find ∇ · r and ∇ × r.
Answer. From the definition of divergence (1.42a), and recalling that r = (x, y, z) = (x1, x2, x3), it follows that

∇ · r = ∂x/∂x + ∂y/∂y + ∂z/∂z = 3 ,   (1.46a)

or equivalently from (1.42b)

∇ · r = ∂xi/∂xi = δii = 3 ,   since, from Example Sheet 0, ∂xi/∂xj = δij .   (1.46b)

Next, from the definition of curl (1.43a) it follows that

∇ × r = (∂z/∂y − ∂y/∂z, ∂x/∂z − ∂z/∂x, ∂y/∂x − ∂x/∂y) = 0 ,   (1.46c)

or equivalently from (1.43c)

∇ × r = εijk ei ∂xk/∂xj = εijk ei δjk = εijj ei = 0 .   (1.46d)

1.4.4 F·∇ .

In (1.42a) we defined the divergence of a vector field F, i.e. the scalar ∇ · F. The order of the operator ∇ and the vector field F is important here. If we invert the order then we obtain the scalar operator

(F · ∇) ≡ Fx ∂/∂x + Fy ∂/∂y + Fz ∂/∂z = Fj ∂/∂xj . (s.c.)   (1.47a)

Remark. As far as notation is concerned, for scalar ψ

F · (∇ψ) = Fj (∂ψ/∂xj) = (Fj ∂/∂xj) ψ = (F · ∇)ψ . (s.c.)   (1.47b)

However, the right-hand form is preferable. This is because for a vector G, the ith component of (F · ∇)G is unambiguous, namely

((F · ∇)G)_i = Σ_j Fj ∂Gi/∂xj ,   (1.47c)   [Key Result]

while the ith component of F · (∇G) is not, i.e. it is not clear whether the ith component of F · (∇G) is

Σ_j Fj ∂Gi/∂xj   or   Σ_j Fj ∂Gj/∂xi .



1.5 Vector Differential Identities

Calculations involving ∇ can be much sped up when certain vector identities are known. There are a large number of these! A short list of the most common is given below. Here ψ is a scalar field and F, G are vector fields.

∇ · (ψ F) = ψ ∇ · F + (F · ∇)ψ , (1.48a)

∇ × (ψF) = ψ (∇ × F) + (∇ψ) × F , (1.48b)

∇ · (F × G) = G · (∇ × F) − F · (∇ × G) , (1.48c)

∇ × (F × G) = F (∇ · G) − G (∇ · F) + (G · ∇)F − (F · ∇)G , (1.48d)

∇(F · G) = (F · ∇)G + (G · ∇)F + F × (∇ × G) + G × (∇ × F) . (1.48e)
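Unlectured check. Identities such as (1.48a) and (1.48c) hold for arbitrary smooth fields, so they can be spot-checked symbolically. The Python/SymPy sketch below (not part of the original notes; the particular fields ψ, F, G are illustrative) verifies both:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
coords = (x, y, z)

def grad(psi):
    return sp.Matrix([sp.diff(psi, v) for v in coords])

def div(F):
    return sum(sp.diff(F[i], coords[i]) for i in range(3))

def curl(F):
    return sp.Matrix([
        sp.diff(F[2], y) - sp.diff(F[1], z),
        sp.diff(F[0], z) - sp.diff(F[2], x),
        sp.diff(F[1], x) - sp.diff(F[0], y)])

psi = x*y*z
F = sp.Matrix([y*z, z*x, x**2])
G = sp.Matrix([sp.sin(x), y**2, z])

# (1.48a): div(psi F) = psi div F + (F . grad) psi.
check_a = sp.simplify(div(psi*F) - (psi*div(F) + F.dot(grad(psi))))

# (1.48c): div(F x G) = G . curl F - F . curl G.
check_c = sp.simplify(div(F.cross(G)) - (G.dot(curl(F)) - F.dot(curl(G))))

print(check_a, check_c)  # 0 0
```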

Example Verifications.
(1.48a):
∇ · (ψ F) = ∂(ψ Fi)/∂xi    from (1.42b)
          = ψ ∂Fi/∂xi + Fi ∂ψ/∂xi
          = ψ ∇ · F + (F · ∇)ψ    from (1.42b) and (1.47a).

Unlectured. (1.48c):
∇ · (F × G) = ∂/∂xi (εijk Fj Gk)    from (1.42b) and (1.21)
            = Gk εijk ∂Fj/∂xi + Fj εijk ∂Gk/∂xi
            = Gk εkij ∂Fj/∂xi − Fj εjik ∂Gk/∂xi    from (1.19c)
            = G · (∇ × F) − F · (∇ × G) .    from (1.43c)

(1.48d):
(∇ × (F × G))_i = εijk ∂/∂xj (εklm Fl Gm)    from (1.21) and (1.43c)
                = (δil δjm − δim δjl)(Fl ∂Gm/∂xj + Gm ∂Fl/∂xj)    from (1.25b)
                = Fi ∂Gj/∂xj + Gj ∂Fi/∂xj − Fj ∂Gi/∂xj − Gi ∂Fj/∂xj    from (1.14b) and (1.14c)
                = (F (∇ · G) − G (∇ · F) + (G · ∇)F − (F · ∇)G)_i    from (1.42b) and (1.47a).

Warnings.

1. Always remember what terms the differential operator is acting on, e.g. is it all terms to the right or
just some?
2. Be very very careful when using standard vector identities where you have just replaced a vector
with ∇. Sometimes it works, sometimes it does not! For instance for constant vectors D, F and G

F · (D × G) = D · (G × F) = −D · (F × G) .

However for ∇ and vector functions F and G

F · (∇ × G) ̸= ∇ · (G × F) = −∇ · (F × G) ,



since
F · (∇ × G) = Fi εijk ∂Gk/∂xj ,
while
∇ · (G × F) = ∂/∂xj (εjki Gk Fi)
            = Fi εijk ∂Gk/∂xj + Gk εijk ∂Fi/∂xj .

1.6 Second-Order Vector Differential Operators

1.6.1 curl grad and div curl

Using the definitions of grad, div and curl, i.e. (1.28b), (1.42a) and (1.43a), and assuming the equality of mixed derivatives, we have that (cf. (1.20c) where we showed that εijk sij = 0 if sij is symmetric)

(curl (grad ψ))_i = (∇ × (∇ψ))_i = εijk ∂xj (∂xk ψ)
                  = εikj ∂xk ∂xj ψ    relabel dummy suffices j and k
                  = −εijk ∂xj ∂xk ψ    permute ikj in εikj & swap partials
                  = 0    quantity equals its negative.   (1.49a)

Similarly,

div(curl F) = ∇ · (∇ × F) = ∂xi εijk ∂xj Fk
            = ∂xj εjik ∂xi Fk    relabel dummy suffices i and j
            = −∂xi εijk ∂xj Fk    permute jik in εjik & swap partials
            = 0 .    quantity equals its negative   (1.49b)

Remarks.

1. Since by the standard rules for scalar triple products ∇ · (∇ × F) ≡ (∇ × ∇) · F, we can summarise
both of these identities by
Key
∇ × ∇ ≡ 0. (1.50) Result
2. There are important converses to (1.49a) and (1.49b). The following two assertions can be proved (but
not here).
(a) Suppose that ∇ × F = 0; the vector field F(r) is said to be irrotational. Then there exists a scalar
potential, φ(r), such that
F = ∇φ . (1.51)
Application. A force field F such that ∇ × F = 0 is said to be conservative. Gravity is a
conservative force field. The above result shows that we can define a gravitational potential φ
such that F = ∇φ.
(b) Suppose that ∇ · B = 0; the vector field B(r) is said to be solenoidal. Then there exists a
non-unique vector potential, A(r), such that
B = ∇ × A. (1.52)
Application. One of Maxwell’s equations for a magnetic field, B, states that ∇ · B = 0. The above
result shows that we can define a magnetic vector potential, A, such that B = ∇ × A.

Example. Evaluate ∇ · (∇p × ∇q), where p and q are scalar fields. We will use this result later.
Answer. Identify ∇p and ∇q with F and G respectively in the vector identity (1.48c). Then it follows
from using (1.50) that
∇ · (∇p × ∇q) = ∇q · (∇ × ∇p) − ∇p · (∇ × ∇q) = 0 . (1.53)
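Unlectured check. The two identities (1.49a) and (1.49b) can be verified for any concrete smooth fields; the Python/SymPy sketch below (not part of the original notes; ψ and F are arbitrary illustrative choices) computes curl grad ψ and div curl F directly:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)

psi = sp.exp(x) * sp.sin(y*z)              # arbitrary smooth scalar field
F = sp.Matrix([x*y, sp.cos(z), y**2 * z])  # arbitrary smooth vector field

# curl(grad psi) should vanish identically, (1.49a).
grad_psi = sp.Matrix([sp.diff(psi, v) for v in (x, y, z)])
curl_grad = sp.Matrix([
    sp.diff(grad_psi[2], y) - sp.diff(grad_psi[1], z),
    sp.diff(grad_psi[0], z) - sp.diff(grad_psi[2], x),
    sp.diff(grad_psi[1], x) - sp.diff(grad_psi[0], y)])

# div(curl F) should vanish identically, (1.49b).
curl_F = sp.Matrix([
    sp.diff(F[2], y) - sp.diff(F[1], z),
    sp.diff(F[0], z) - sp.diff(F[2], x),
    sp.diff(F[1], x) - sp.diff(F[0], y)])
div_curl = sum(sp.diff(curl_F[i], v) for i, v in enumerate((x, y, z)))

print(sp.simplify(curl_grad).T, sp.simplify(div_curl))
```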



1.6.2 The Laplacian Operator ∇2

From the definitions of div and grad,

div(grad ψ) = ∇ · (∇ψ) = ∂/∂xi (∂ψ/∂xi)
            = ∂²ψ/∂xi²   (1.54a)
            = (∂²/∂x1² + ∂²/∂x2² + ∂²/∂x3²) ψ .   (1.54b)

Define the Laplacian operator to be ∇² = ∇ · ∇; then in Cartesian coordinates it is given by

∇² = ∇ · ∇ = ∂²/∂xi² = ∂²/∂x1² + ∂²/∂x2² + ∂²/∂x3² .   (1.54c)

Remarks.

1. The Laplacian operator ∇2 is very important in the natural sciences. For instance it occurs in
(a) Poisson’s equation for a potential φ(r):

∇2 φ = ρ , (1.55a)

where (with a suitable normalisation)


i. ρ(r) is charge density in electromagnetism (when (1.55a) relates charge and electric potential);
ii. ρ(r) is mass density in gravitation (when (1.55a) relates mass and gravitational potential).
(b) Schrödinger’s equation for a non-relativistic quantum mechanical particle of mass m in a poten-
tial V (r):
ℏ2 2 ∂ψ
− ∇ ψ + V (r)ψ = iℏ , (1.55b)
2m ∂t
where ψ is the quantum mechanical wave function and ℏ is Planck’s constant divided by 2π.
(c) Helmholtz’s equation
∇2 f + ω 2 f = 0 , (1.55c)
which governs the propagation of fixed frequency waves (e.g. fixed frequency sound waves).
Helmholtz’s equation is a 3D generalisation of the simple harmonic resonator

d²f/dx² + ω² f = 0 .
2. Although the Laplacian has been introduced by reference to its effect on a scalar field (in our case ψ),
it also has meaning when applied to vectors. However some care is needed. On the first example sheet
you will prove the vector identity

∇ × (∇ × F) = ∇(∇ · F) − ∇2 F . (1.56a)

The Laplacian acting on a vector is conventionally defined by rearranging this identity to obtain

∇2 F = ∇(∇ · F) − ∇ × (∇ × F) . (1.56b)



1.6.3 Examples

1. Find ∇²rⁿ = div(∇rⁿ). We will use this result later.

Answer. Put f(r) = rⁿ in (1.33b) to obtain

∇rⁿ = n rⁿ⁻¹ r/r = n rⁿ⁻² (x1, x2, x3) .   (1.57)

So from the definition of divergence (1.42a):

∇²rⁿ = ∇ · (∇rⁿ) = ∂(n rⁿ⁻² xi)/∂xi
     = n rⁿ⁻² ∂xi/∂xi + n xi ∂rⁿ⁻²/∂xi
     = 3n rⁿ⁻² + n xi (n − 2) rⁿ⁻³ (xi/r)    using (1.31a)
     = n(n + 1) rⁿ⁻² .   (1.58)

Check. Note that from setting n = 2 in (1.57) we have that ∇r² = 2r. It follows that, with n = 2, (1.58) reproduces (1.46a).
2. Unlectured. Find the Laplacian of sin r / r.

Answer. Since the Laplacian consists of first taking a gradient, we first note from using result (1.33a), i.e. ∇f(r) = f′(r)∇r, that

∇(sin r / r) = (cos r / r − sin r / r²) ∇r .   (1.59a)

Further, we recall from (1.32) that

∇r = r/r ,   (1.59b)

and also from (1.58) with n = 1 that

∇ · (∇r) = 2/r .   (1.59c)

Hence

∇²(sin r / r) = ∇ · ∇(sin r / r)
= ∇ · ((cos r / r − sin r / r²) ∇r)    from (1.59a)
= (cos r / r − sin r / r²) ∇ · ∇r + ∇r · ∇(cos r / r − sin r / r²)    from identity (1.48a)
= 2(cos r / r² − sin r / r³) + (r/r) · (−sin r / r − 2 cos r / r² + 2 sin r / r³) ∇r    from (1.59b) & (1.59c), using (1.33a) again
= −sin r / r .   (1.60)
Remarks.
(i) It follows that f = sin r / r satisfies Helmholtz’s equation (1.55c) for ω = 1.
(ii) It is arguably easier to derive this result using suffix notation.
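Unlectured check. Both (1.58) and (1.60) can be confirmed symbolically by building the Cartesian Laplacian directly from (1.54c); the Python/SymPy sketch below (not part of the original notes) does so, with n = 5 as a concrete exponent:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
r = sp.sqrt(x**2 + y**2 + z**2)

def laplacian(f):
    # Cartesian Laplacian, cf. (1.54c).
    return sum(sp.diff(f, v, 2) for v in (x, y, z))

# (1.58): laplacian of r**n equals n(n+1) r**(n-2); check n = 5.
check_power = sp.simplify(laplacian(r**5) - 5*6*r**3)

# (1.60): laplacian of sin(r)/r equals -sin(r)/r.
check_helmholtz = sp.simplify(laplacian(sp.sin(r)/r) + sp.sin(r)/r)

print(check_power, check_helmholtz)  # 0 0
```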



1.7 The Big Integral Theorems

These are two very important integral theorems for vector fields that have many scientific applications.

1.7.1 The Divergence Theorem (Gauss’ Theorem)

Divergence Theorem. Let S be a ‘nice’ surface¹⁰ enclosing a volume V in R³, with a normal n̂ that points outwards from V. Let u be a ‘nice’ vector field.¹¹ Then

∭_V ∇ · u dV = ∬_{S(V)} u · dS ,   (1.61)   [Key Result]

where dV is the volume element, dS = n̂ dS is the vector surface element, n̂ is the unit normal to the surface S and dS is a small element of surface area. In Cartesian coordinates

dV = dx dy dz ,   (1.62a)

and

dS = σx dy dz ex + σy dz dx ey + σz dx dy ez ,   (1.62b)

where σx = sign(n̂ · ex), σy = sign(n̂ · ey) and σz = sign(n̂ · ez).

At a point on the surface, u · n̂ is the flux of u across the surface at that point. Hence the divergence theorem states that ∇ · u integrated over a volume V is equal to the total flux of u across the closed surface S surrounding the volume.

Remark. The divergence theorem relates a triple integral to a double integral. This is analogous to the second fundamental theorem of calculus, i.e.

∫_{h1}^{h2} (df/dz) dz = f(h2) − f(h1) ,   (1.63)

which relates a single integral to a function.
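Unlectured check. The divergence theorem can be tested numerically on a case where both sides are computable; the Python/NumPy sketch below (not part of the original notes; the field and domain are illustrative) takes u = (xy, yz, zx) on the unit cube, where ∇ · u = x + y + z and both sides equal 3/2:

```python
import numpy as np

# u = (x*y, y*z, z*x) on the unit cube [0,1]^3, so div u = x + y + z.
N = 100
s = (np.arange(N) + 0.5) / N                 # midpoint-rule sample points
X, Y, Z = np.meshgrid(s, s, s, indexing='ij')

# Volume integral of div u over the cube (midpoint rule): expect 3/2.
volume_integral = np.sum(X + Y + Z) / N**3

# Outward flux: on the face x = 1, u . n = u_x = y, contributing
# the double integral of y over the face, i.e. 1/2; the x = 0 face gives 0
# (u_x = 0 there), and the y and z face pairs behave the same way.
A, B = np.meshgrid(s, s, indexing='ij')
flux = 3 * np.sum(B) / N**2                  # expect 3/2

print(volume_integral, flux)
```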


Outline Proof. Suppose that S is a surface enclosing a volume V such that Cartesian axes can be chosen so that any line parallel to any one of the axes meets S in just one or two points (e.g. a convex surface). We observe that

∭_V ∇ · u dV = ∭_V (∂ux/∂x + ∂uy/∂y + ∂uz/∂z) dV

comprises three terms; we initially concentrate on the ∭_V ∂uz/∂z dV term.

Let region A be the projection of S onto the xy-plane. Let the lower/upper surfaces, S1/S2 respectively, be parameterised by

S1 : r = (x, y, h1(x, y)) ,    S2 : r = (x, y, h2(x, y)) .

¹⁰ For instance, a bounded, piecewise smooth, orientated, non-intersecting surface.
¹¹ For instance a vector field with continuous first-order partial derivatives throughout V.



Then using the second fundamental theorem of calculus (1.63)

∭_V ∂uz/∂z dx dy dz = ∬_A [ ∫_{z=h1}^{h2} ∂uz/∂z dz ] dx dy
                    = ∬_A (uz(x, y, h2(x, y)) − uz(x, y, h1(x, y))) dx dy .   (1.64)

Now consider the projection of a surface element dS on the upper surface onto the xy plane. It follows geometrically that dx dy = |cos α| dS, where α is the angle between ez and the unit normal n̂; hence on S2

dx dy = ez · n̂ dS = ez · dS .   (1.65a)

On the lower surface S1 we need to dot n̂ with −ez in order to get a positive area; hence

dx dy = −ez · dS .   (1.65b)

We note that (1.65a) and (1.65b) are consistent with (1.62b) once the tricky issue of signs is sorted out. Using (1.62a), (1.65a) and (1.65b), equation (1.64) can be rewritten as

∭_V ∂uz/∂z dV = ∬_{S2} uz ez · dS + ∬_{S1} uz ez · dS = ∬_S uz ez · dS ,   (1.66a)

since S1 + S2 = S. Similarly by permutation (i.e. x → y, y → z and z → x),

∭_V ∂uy/∂y dV = ∬_S uy ey · dS ,    ∭_V ∂ux/∂x dV = ∬_S ux ex · dS .   (1.66b)

Adding the above results we obtain the divergence theorem (1.61):

∭_V ∇ · u dV = ∬_S u · dS .   (1.67)

The generalisation for a scalar field. For a scalar field ψ(x) with continuous first-order partial derivatives in V,

∭_V ∇ψ dV = ∬_S ψ dS .   (1.68a)

Proof. Set u = ψ a in (1.67), where a is an arbitrary constant vector. Then from (1.48a)

a · ∭_V ∇ψ dV = a · ∬_S ψ dS .   (1.68b)

Since a is arbitrary, (1.68a) follows.¹² Alternatively, choose a = ei to obtain the component form

∭_V ∂ψ/∂xi dV = ∬_S ψ ni dS .   (1.68c)   [Key Result]

The generalisation for a vector potential. For a vector potential A with continuous first-order partial derivatives in V,

∭_V ∇ × A dV = ∬_S n̂ × A dS .   (1.69)

Proof (unlectured). Either set u = a × A in (1.67), where a is an arbitrary constant vector, and then proceed as above, or let ψ = εijk Aj in (1.68c), to recover (1.69) in component form.

¹² If a · b = 0 for every a, choose a = b. Then b · b = ∥b∥² = 0 implies b = 0.



1.7.2 Stokes’ Theorem

Let S be any ‘nice’ open surface bounding the ‘nice’ closed curve C.¹³ Let u(r) be a ‘nice’ vector field.¹⁴ Then

∬_S ∇ × u · dS = ∮_C u · dr ,   (1.70a)   [Key Result]

where the line integral is taken in the direction of C as specified by the ‘right-hand rule’.

Remark. Stokes’ theorem thus states that the flux of ∇ × u across an open surface S is equal to the circulation of u round the bounding curve C.

Planar surface. For a surface in the (x, y) plane, so that dS = dx dy ez, Stokes’ theorem reduces to Green’s theorem in the plane:

∬_A (∂uy/∂x − ∂ux/∂y) dx dy = ∮_C (ux dx + uy dy) ,   (1.70b)

where A is the region of the plane bounded by the curve C, and the line integral follows a positive sense.

Outline Proof. First prove Green’s theorem for a rectangle using the second fundamental theorem of calculus. Second, subdivide S into small planar rectangles to any desired accuracy (cf. subdividing the range in a standard line integral). Finally, apply Green’s theorem to all these subdivisions, noting that when the integrals are added together, the circulations along internal curve segments cancel out, leaving only the circulation around C.
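Unlectured check. Green’s theorem (1.70b) can be tested numerically on the unit square; the Python/NumPy sketch below (not part of the original notes) uses u = (−y, x), for which ∂uy/∂x − ∂ux/∂y = 2, so both sides equal 2:

```python
import numpy as np

# LHS of (1.70b): integrand d(uy)/dx - d(ux)/dy = 1 - (-1) = 2 over unit area.
lhs = 2.0

# RHS: circulation round the square, traversed anticlockwise,
# evaluated edge by edge with a midpoint rule.
N = 10000
t = (np.arange(N) + 0.5) / N
dt = 1.0 / N

def edge(xs, ys, dxs, dys):
    # contribution of one edge to the circulation of (ux dx + uy dy)
    ux, uy = -ys, xs
    return np.sum(ux * dxs + uy * dys) * dt

one, zero = np.ones(N), np.zeros(N)
rhs = (edge(t, zero, one, zero)          # bottom: (t, 0), dr = (dt, 0)
       + edge(one, t, zero, one)         # right:  (1, t)
       + edge(1 - t, one, -one, zero)    # top:    (1-t, 1)
       + edge(zero, 1 - t, zero, -one))  # left:   (0, 1-t)

print(lhs, rhs)
```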

1.7.3 Examples and Applications


Archimedes’ Principle. A body is acted on by a hydrostatic
pressure force p = −ρgz, where ρ is the density of the
surrounding fluid, g is gravity and z is the vertical coor-
dinate. Find a simplified expression for the pressure force
on the body starting from
ZZ
F=− p dS . (1.71)
S

Answer. Consider the individual components of u and


use the divergence theorem. Then
ZZ ZZZ ZZZ ZZZ
∂(−ρgz)
ez · F = − p ez · dS = − ∇ · (p ez ) dV = − dV = g ρ dV = M g , (1.72a)
∂z
S V V V

where M is the mass of the fluid displaced by the body. Similarly


ZZZ ZZZ
∂(−ρgz)
ex · F = − ∇ · (p ex ) dV = − dV = 0 , (1.72b)
∂x
V V

and ey · F = 0. Hence we have Archimedes’ Principle that an immersed body experiences a loss of
weight equal to the weight of the fluid displaced:
F = M g ez . (1.72c)

¹³ Or to be slightly more precise: let S be a piecewise smooth, open, orientated, non-intersecting surface bounded by a simple, piecewise smooth, closed curve C.
¹⁴ For instance a vector field with continuous first-order partial derivatives on S.



Gradient theorem. Show that provided there are no singularities, the integral

∫_C ∇φ · dr ,   (1.73)

where φ is a scalar field and C is an open path joining two fixed points A and B, is independent of the path chosen between the points.

Answer. Consider two such paths: C1 and C2. Form a closed curve Ĉ from these two curves. Then using Stokes’ Theorem and the result (1.49a) that a curl of a gradient is zero, we have that

∫_{C1} ∇φ · dr − ∫_{C2} ∇φ · dr = ∮_{Ĉ} ∇φ · dr = ∬_{Ŝ} ∇ × (∇φ) · dS = 0 ,

where Ŝ is a nice open surface bounding Ĉ. Hence

∫_{C1} ∇φ · dr = ∫_{C2} ∇φ · dr .   (1.74)

Application. Suppose that φ is the gravitational potential; then g = −∇φ is the gravitational force, and ∫_C (−∇φ) · dr is the work done against gravity in moving from A to B. The above result demonstrates that the work done is independent of path. Indeed, from (1.29), i.e. ∇φ · dr = dφ,

∫_C ∇φ · dr = ∫_C dφ = φ(B) − φ(A) .   (1.75)

1.7.4 Interpretation of divergence

Let a volume V be enclosed by a surface S, and consider a limit process in which the greatest diameter of V tends to zero while keeping the point r0 inside V. Then from Taylor’s theorem with r = r0 + δr,

∭_V ∇ · u(r) dV = ∭_V (∇ · u(r0) + . . .) dV = ∇ · u(r0) |V| + . . . ,

where |V| is the volume of V. Thus using the divergence theorem (1.61)

∇ · u = lim_{|V|→0} (1/|V|) ∬_S u · dS ,   (1.76)

where S is any ‘nice’ small closed surface enclosing a volume V. It follows that ∇ · u can be interpreted as the net rate of flux outflow at r0 per unit volume.

Application. Suppose that v is a velocity field. Then

∇ · v > 0  ⇒  ∬_S v · dS > 0  ⇒  net positive flux  ⇒  there exists a source at r0;
∇ · v < 0  ⇒  ∬_S v · dS < 0  ⇒  net negative flux  ⇒  there exists a sink at r0.

1.7.5 Interpretation of curl

Let an open smooth surface S be bounded by a curve C. Consider a limit process in which the point r0 remains on S, the greatest diameter of S tends to zero, and the normals at all points on the surface tend to a specific direction (i.e. the value of n̂ at r0). Then from Taylor’s theorem with r = r0 + δr,

∬_S (∇ × u(r)) · dS = ∬_S (∇ × u(r0) + . . .) · dS = ∇ × u(r0) · n̂ |S| + . . . ,

where |S| is the area of S. Thus using Stokes’ theorem (1.70a)

n̂ · (∇ × u) = lim_{|S|→0} (1/|S|) ∮_C u · dr ,   (1.77)

where S is any ‘nice’ small open surface with a bounding curve C. It follows that n̂ · (∇ × u) can be interpreted as the circulation about n̂ at r0 per unit area.

Application.
Consider a rigid body rotating with angular velocity ω about an axis through O. Then the velocity at a point r in the body is given by

v = ω × r .   (1.78a)

Suppose that C is a circle of radius a in a plane normal to ω. Then the circulation of v around C is

∮_C v · dr = ∫_0^{2π} (ωa) a dϕ = 2πa² ω .   (1.78b)

Hence from (1.77)

ω̂ · (∇ × v) = lim_{a→0} (1/(πa²)) ∮_C v · dr = 2ω .   (1.78c)

We conclude that the curl is a measure of the local rotation of a vector field.
Exercise. Show by direct evaluation that if v = ω × r then ∇ × v = 2ω.
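Unlectured check. The exercise can be confirmed symbolically for a general constant angular velocity; the Python/SymPy sketch below (not part of the original notes) evaluates ∇ × (ω × r) componentwise:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
w1, w2, w3 = sp.symbols('omega1 omega2 omega3', real=True)  # constants

omega = sp.Matrix([w1, w2, w3])
r = sp.Matrix([x, y, z])
v = omega.cross(r)  # v = omega x r

# curl v componentwise, cf. (1.43a).
curl_v = sp.Matrix([
    sp.diff(v[2], y) - sp.diff(v[1], z),
    sp.diff(v[0], z) - sp.diff(v[2], x),
    sp.diff(v[1], x) - sp.diff(v[0], y)])

print(curl_v.T)  # Matrix([[2*omega1, 2*omega2, 2*omega3]])
```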

1.8 Orthogonal Curvilinear Coordinates

1.8.1 What Are Orthogonal Curvilinear Coordinates?

There are many ways to describe the position of points in space. One way is to define three independent sets
of surfaces, each parameterised by a single variable (for Cartesian coordinates these are orthogonal planes
parameterised, say, by the point on the axis that they intercept). Then any point has ‘coordinates’ given by
the labels for the three surfaces that intersect at that point.

The unit vectors analogous to e1, etc. are the unit normals to these surfaces. Such coordinates are called curvilinear. They are generally of most use when the orthonormality condition (1.14g), i.e. ei · ej = δij, holds; in which case they are called orthogonal curvilinear coordinates. Common examples are spherical and cylindrical polar coordinates. For instance, in the case of spherical polar coordinates the independent sets of surfaces are spherical shells, cones of constant latitude, and half-planes of constant longitude.

It is very important to realise that there is a key difference between Cartesian coordinates and other orthogonal curvilinear coordinates. In Cartesian coordinates the directions of the basis vectors ex, ey, ez are independent of position. This is not the case in other coordinate systems; for instance, er, the normal to a spherical shell, changes direction with position on the shell. It is sometimes helpful to display this dependence on position explicitly:

ei ≡ ei(r) .   (1.79)   [Key Point]



1.8.2 Relationships Between Coordinate Systems

Suppose that we have non-Cartesian coordinates, qi (i = 1, 2, 3). Since we can express one coordinate system
in term of another, there will be a functional dependence of the qi on, say, Cartesian coordinates x, y, z, i.e.

qi ≡ qi (x, y, z) (i = 1, 2, 3) . (1.80)

For cylindrical polar coordinates and spherical polar coordinates we know that:

       Cylindrical Polar Coordinates     Spherical Polar Coordinates
q1     ρ = (x² + y²)^{1/2}               r = (x² + y² + z²)^{1/2}
q2     ϕ = tan⁻¹(y/x)                    θ = tan⁻¹((x² + y²)^{1/2}/z)
q3     z                                 ϕ = tan⁻¹(y/x)

Remarks

1. Note that qi = ci (i = 1, 2, 3), where the ci are constants, define three independent sets of surfaces,
each ‘labelled’ by a parameter (i.e. the ci ). As discussed above, any point has ‘coordinates’ given by
the labels for the three surfaces that intersect at that point.
2. The equation (1.80) can be viewed as three simultaneous equations for three unknowns x, y, z. In
general these equations can be solved to yield the position vector r as a function of q = (q1 , q2 , q3 ),
i.e. r ≡ r(q) or
x = x(q1 , q2 , q3 ) , y = y(q1 , q2 , q3 ) , z = z(q1 , q2 , q3 ) . (1.81)
For instance:

             Cylindrical Polar               Spherical Polar
             Coordinates                     Coordinates

    x        ρ cos ϕ                         r sin θ cos ϕ

    y        ρ sin ϕ                         r sin θ sin ϕ

    z        z                               r cos θ
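These conversion formulae are easy to sanity-check numerically. The sketch below (plain Python; the function names are our own) implements (1.80) and (1.81) for spherical polars and verifies that the round trip returns the starting point; cylindrical polars are entirely analogous. Note that atan2 is used rather than a bare tan⁻¹ so that the correct quadrant of ϕ is obtained.

```python
import math

def to_spherical(x, y, z):
    r = math.sqrt(x*x + y*y + z*z)
    theta = math.atan2(math.hypot(x, y), z)  # θ = tan⁻¹((x² + y²)^(1/2) / z)
    phi = math.atan2(y, x)                   # ϕ = tan⁻¹(y/x)
    return r, theta, phi

def from_spherical(r, theta, phi):
    # (1.81): x = r sinθ cosϕ, y = r sinθ sinϕ, z = r cosθ
    return (r*math.sin(theta)*math.cos(phi),
            r*math.sin(theta)*math.sin(phi),
            r*math.cos(theta))

p = (1.0, 2.0, 3.0)
q = from_spherical(*to_spherical(*p))
print(all(abs(a - b) < 1e-12 for a, b in zip(p, q)))
```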

1.8.3 Incremental Change in Position or Length.

Consider an infinitesimal change in position. Then, by the chain rule, the change dxi in xi(q1, q2, q3) due to changes dqj in qj (i = 1, 2, 3) is

    dxi = (∂xi/∂q1) dq1 + (∂xi/∂q2) dq2 + (∂xi/∂q3) dq3 = (∂xi/∂qj) dqj   (i = 1, 2, 3). (s.c.)   (1.82)

Anticipating a crisis in notation, we let ex = x̂1, ey = x̂2 and ez = x̂3. Then the vector displacement dr can be written as

    dr ≡ dxi x̂i = (∂xi/∂qj) dqj x̂i = (x̂i ∂xi/∂qj) dqj = hj dqj ,   (1.83a)

where

    hj = x̂i ∂xi/∂qj = ∂r(q)/∂qj   (j = 1, 2, 3) .   (1.83b)



Thus the infinitesimal change in position dr is a vector sum of displacements hj (r) dqj ‘along’ each of
the three q-axes through r. The vectors hj are not necessarily unit vectors, so it is convenient to write
(suspending the s.c.)
hj = hj ej (j = 1, 2, 3) , (no s.c.) (1.84a)

where the hj = |hj| are the lengths of the hj, and the ej are unit vectors, i.e.

    hj = |∂r/∂qj|   and   ej = (1/hj) ∂r/∂qj   (j = 1, 2, 3) . (no s.c.)   (1.84b)
Remarks.
(i) The hj will, in general, depend on position r. Consequently, the ej (r) will vary in space (cf. the x bi ),
and the q-axes will be curves rather than straight lines. The coordinate system is said to be curvilinear.
(ii) The scale factors or metric coefficients, hj, convert coordinate increments into lengths. Any point at which hj = 0 is a coordinate singularity, at which the coordinate system breaks down.

1.8.4 The Jacobian


The Jacobian matrix, J, of the transformation from coordinates (q1, q2, q3) to (x1, x2, x3) is defined as

        ( ∂x/∂q1   ∂x/∂q2   ∂x/∂q3 )
    J = ( ∂y/∂q1   ∂y/∂q2   ∂y/∂q3 ) .   (1.85a)
        ( ∂z/∂q1   ∂z/∂q2   ∂z/∂q3 )

The Jacobian of (x, y, z) with respect to (q1, q2, q3) is defined as the determinant of this matrix:

    J ≡ ∂(x, y, z)/∂(q1, q2, q3) = |J| = det J .   (1.85b)

The columns of the above matrix are the vectors hi defined in (1.83b). Therefore the Jacobian is equal to the scalar triple product

    J = [h1, h2, h3] = h1 · h2 × h3 .   (1.85c)

Given a point with curvilinear coordinates (q1, q2, q3), consider three small displacements dr1 = h1 dq1, dr2 = h2 dq2 and dr3 = h3 dq3 along the three curvilinear coordinate directions. They span a parallelepiped of volume

    dV = |[dr1, dr2, dr3]| = |J| dq1 dq2 dq3 .   (1.86a)

Hence the volume element in a general curvilinear coordinate system is

    dV = |∂(x, y, z)/∂(q1, q2, q3)| dq1 dq2 dq3 .   (1.86b)   (Key Result)

The Jacobian therefore appears when changing variables in a multiple integral:

    ∭ Φ dx dy dz = ∫ Φ dV = ∭ Φ |∂(x, y, z)/∂(q1, q2, q3)| dq1 dq2 dq3 = ∭ Φ |J| dq1 dq2 dq3 .   (1.86c)
∂(q1 , q2 , q3 )
Remarks.
(i) If |J| = 0 in the range of variables, the coordinate transformation is singular and care is needed.
(ii) Jacobians are defined similarly for transformations in any number of dimensions. If curvilinear coor-
dinates (q1 , q2 ) are introduced in the (x, y)-plane, the area element is
dA = |J| dq1 dq2 , (1.87a)
where
                              | ∂x/∂q1   ∂x/∂q2 |
    J = ∂(x, y)/∂(q1, q2) =  |                  | .   (1.87b)
                              | ∂y/∂q1   ∂y/∂q2 |



(iii) The equivalent rule for a one-dimensional integral is

    ∫ f(x) dx = ∫ f(x(q)) |dx/dq| dq ,   (1.88)
where the direction of integration of the limits needs to be consistent with the use of the modulus.
Indeed, care is needed in transforming the limits in any of the above integrals.
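As a concrete check on (1.85a)–(1.86b), the Jacobian matrix for spherical polars can be approximated by central differences and its determinant compared with the value r² sin θ used later in §1.8.9. This is an illustrative sketch in plain Python (all names and the step size are our own choices).

```python
import math

def pos(r, theta, phi):
    # r(q) for spherical polars, cf. (1.94a)
    return (r*math.sin(theta)*math.cos(phi),
            r*math.sin(theta)*math.sin(phi),
            r*math.cos(theta))

def det3(m):
    # Determinant of a 3x3 matrix, expanded along the first row
    return (m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]))

def jacobian(q, h=1e-6):
    # Column j of J is ∂r/∂q_j, approximated by central differences
    cols = []
    for j in range(3):
        qp, qm = list(q), list(q)
        qp[j] += h; qm[j] -= h
        xp, xm = pos(*qp), pos(*qm)
        cols.append([(a - b)/(2*h) for a, b in zip(xp, xm)])
    # Rows of J are ∂x_i/∂q_j, so transpose the column list
    return [[cols[j][i] for j in range(3)] for i in range(3)]

r, theta, phi = 2.0, 0.7, 1.2
J = det3(jacobian((r, theta, phi)))
print(abs(J - r*r*math.sin(theta)) < 1e-5)
```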

1.8.5 Properties of Jacobians

Consider now three sets of variables αi, βi and γi, with 1 ⩽ i ⩽ n, none of which need be Cartesian coordinates. According to the chain rule of partial differentiation,

    ∂αi/∂γj = (∂αi/∂βk)(∂βk/∂γj) . (s.c.)   (1.89)

This is a specific individual’s copy of the notes. It is not to be copied and/or redistributed.
The left-hand side is the ij-component of the Jacobian matrix of the transformation from αi to γi . The
equation states that this matrix is the product of the Jacobian matrices of the transformations from αi to
βi and from βi to γi , i.e. the Jacobian matrix of a composite transformation is the product of the Jacobian
matrices of the transformations of which it is composed.
The chain rule for Jacobians. Taking the determinant of (1.89), we recover the chain rule for Jacobians:

    ∂(α1, · · · , αn)/∂(γ1, · · · , γn) = [∂(α1, · · · , αn)/∂(β1, · · · , βn)] [∂(β1, · · · , βn)/∂(γ1, · · · , γn)] .   (1.90)

The inverse transformation for Jacobians. In the special case in which γi = αi for all i, the left-hand side
is 1 (the determinant of the unit matrix), and so we obtain
    ∂(α1, · · · , αn)/∂(β1, · · · , βn) = [∂(β1, · · · , βn)/∂(α1, · · · , αn)]⁻¹ .   (1.91)
Hence, the Jacobian of an inverse transformation is the reciprocal of that of the forward transformation.
This is a multidimensional generalization of the result dx/dy = (dy/dx)−1 .
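The inverse-transformation rule (1.91) can be illustrated in two dimensions with plane polars (our choice of example): the forward Jacobian is ∂(x, y)/∂(ρ, ϕ) = ρ, while the Jacobian of the inverse map, computed from ρ = (x² + y²)^(1/2) and ϕ = tan⁻¹(y/x), works out as 1/ρ, so their product is 1.

```python
import math

rho, phi = 1.5, 0.4
x, y = rho*math.cos(phi), rho*math.sin(phi)

# Forward Jacobian ∂(x, y)/∂(ρ, ϕ), from x = ρ cos ϕ, y = ρ sin ϕ
J_fwd = (math.cos(phi)*(rho*math.cos(phi))      # ∂x/∂ρ · ∂y/∂ϕ
         - (-rho*math.sin(phi))*math.sin(phi))  # − ∂x/∂ϕ · ∂y/∂ρ

# Inverse Jacobian ∂(ρ, ϕ)/∂(x, y), from ρ = (x²+y²)^(1/2), ϕ = tan⁻¹(y/x)
r2 = x*x + y*y
J_inv = ((x/math.sqrt(r2))*(x/r2)      # ∂ρ/∂x · ∂ϕ/∂y
         - (y/math.sqrt(r2))*(-y/r2))  # − ∂ρ/∂y · ∂ϕ/∂x

print(abs(J_fwd - rho) < 1e-12, abs(J_fwd*J_inv - 1.0) < 1e-12)
```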

1.8.6 Orthogonality

For a general qj coordinate system the ej are not necessarily mutually orthogonal, i.e. in general
ei · ej ̸= 0 for i ̸= j .
However, for orthogonal curvilinear coordinates the ei are required to be mutually orthogonal at all points
in space, i.e.
ei · ej = 0 if i ̸= j .
Since by definition the ej are unit vectors, we thus have that
ei · ej = δij . (1.92)

Handedness. It is conventional to order the qi so that the coordinate system is right-handed.


Incremental Distance. In an orthogonal curvilinear coordinate system the expression for the incremental distance |dr|² simplifies. We find that (noting that the s.c. does not work)

    |dr|² = dr · dr = (Σi hi dqi ei) · (Σj hj dqj ej)   from (1.83a) and (1.84a)
          = Σ_{i,j} (hi dqi)(hj dqj) δij   from (1.92)
          = Σi hi² (dqi)² .   from (1.14c)   (1.93)   (Key Result)



1.8.7 Spherical Polar Coordinates

In this case q1 = r, q2 = θ, q3 = ϕ, and in term of Cartesian


co-ordinates

r = (r sin θ cos ϕ, r sin θ sin ϕ, r cos θ) . (1.94a)


Hence
∂r ∂r
= = (sin θ cos ϕ, sin θ sin ϕ, cos θ) ,
∂q1 ∂r
∂r ∂r
= = (r cos θ cos ϕ, r cos θ sin ϕ, −r sin θ) ,
∂q2 ∂θ
∂r ∂r
= = (−r sin θ sin ϕ, r sin θ cos ϕ, 0) .
∂q3 ∂ϕ

This is a specific individual’s copy of the notes. It is not to be copied and/or redistributed.
It follows from (1.84b) that

∂r
h1 = hr = = 1, e1 = er = (sin θ cos ϕ, sin θ sin ϕ, cos θ) , (1.94b)
∂q1
∂r
h2 = hθ = = r, e2 = eθ = (cos θ cos ϕ, cos θ sin ϕ, − sin θ) , (1.94c)
∂q2
∂r
h3 = hϕ = = r sin θ , e3 = eϕ = (− sin ϕ, cos ϕ, 0) . (1.94d)
∂q3

Remarks.

(i) ei · ej = δij and e1 × e2 = e3 , i.e. spherical polar coordinates are a right-handed orthogonal curvilinear
coordinate system. If we had chosen, say, q1 = r, q2 = ϕ, q3 = θ, then we would have ended up with a
left-handed system.
(ii) er , eθ and eϕ are functions of position.

(iii) Spherical polars are singular at r = 0, θ = 0 and θ = π, i.e. on the ‘north-south’ axis.
(iv) Recalling from (1.83a) and (1.84a) that the hj give the components of the displacement vector dr along the r, θ, and ϕ axes, we have that

    dr = Σj hj dqj ej = dr er + r dθ eθ + r sin θ dϕ eϕ .   (1.95)
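The scale factors (1.94b)–(1.94d) follow from the definition hj = |∂r/∂qj|, which can be verified by central differences; a small sketch in plain Python (helper names and the step size are our own):

```python
import math

def pos(r, theta, phi):
    # (1.94a): r = (r sinθ cosϕ, r sinθ sinϕ, r cosθ)
    return (r*math.sin(theta)*math.cos(phi),
            r*math.sin(theta)*math.sin(phi),
            r*math.cos(theta))

def scale_factor(j, q, h=1e-6):
    # h_j = |∂r/∂q_j|, approximated by a central difference
    qp, qm = list(q), list(q)
    qp[j] += h; qm[j] -= h
    d = [(a - b)/(2*h) for a, b in zip(pos(*qp), pos(*qm))]
    return math.sqrt(sum(c*c for c in d))

r, theta, phi = 1.3, 0.9, 2.1
hs = [scale_factor(j, (r, theta, phi)) for j in range(3)]
expected = [1.0, r, r*math.sin(theta)]   # h_r, h_θ, h_ϕ from (1.94b)-(1.94d)
print(all(abs(a - b) < 1e-6 for a, b in zip(hs, expected)))
```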

1.8.8 Cylindrical Polar Coordinates

In this case q1 = ρ, q2 = ϕ, q3 = z, and in terms of Cartesian co-ordinates

    r = (ρ cos ϕ, ρ sin ϕ, z) .   (1.96a)

Exercise. Show that

    ∂r/∂q1 = ∂r/∂ρ = (cos ϕ, sin ϕ, 0) ,
    ∂r/∂q2 = ∂r/∂ϕ = (−ρ sin ϕ, ρ cos ϕ, 0) ,
    ∂r/∂q3 = ∂r/∂z = (0, 0, 1) ,

and hence that

    h1 = hρ = |∂r/∂q1| = 1 ,   e1 = eρ = (cos ϕ, sin ϕ, 0) ,   (1.96b)
    h2 = hϕ = |∂r/∂q2| = ρ ,   e2 = eϕ = (− sin ϕ, cos ϕ, 0) ,   (1.96c)
    h3 = hz = |∂r/∂q3| = 1 ,   e3 = ez = (0, 0, 1) .   (1.96d)
Remarks.
(i) ei ·ej = δij and e1 ×e2 = e3 , i.e. cylindrical polar coordinates are a right-handed orthogonal curvilinear
coordinate system.
(ii) eρ and eϕ are functions of position.

(iii) Cylindrical polars are singular on the axis ρ = 0.
(iv) As noted in the section on Assumed Knowledge, §0.12, in the case of cylindrical polar coordinates,
sometimes r and/or θ are used in place of ρ and/or ϕ respectively (but then there is potential confusion
with the different definitions of r and θ in spherical polar co-ordinates). Further, in order to maximise
confusion, instead of ρ (which, admittedly, can be useful for other things, such as density), some
authors use R, s or ϖ.

1.8.9 Volume and Surface Elements in Orthogonal Curvilinear Coordinates

Volume element. For orthogonal curvilinear coordinate systems it follows from (1.86a) that

    dV = |dr1 × dr2 · dr3| = h1 h2 h3 dq1 dq2 dq3 |e1 × e2 · e3| = h1 h2 h3 dq1 dq2 dq3 .   (1.97a)

Example: Spherical Polar Coordinates. In the case of spherical polar coordinates we have from (1.94b), (1.94c), (1.94d) and (1.97a) that

    dV = r² sin θ dr dθ dϕ .   (1.97b)

The volume of the sphere of radius a is therefore

    ∭_V dV = ∫₀^a dr ∫₀^π dθ ∫₀^{2π} dϕ r² sin θ = (4/3) π a³ .   (1.97c)
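The volume integral (1.97c) can be reproduced by brute force with the midpoint rule applied to the element (1.97b); a rough numerical sketch (the grid size is chosen merely for illustration):

```python
import math

def sphere_volume(a, n=60):
    # Midpoint rule for ∭ r² sinθ dr dθ dϕ (1.97b); the ϕ-integral gives 2π exactly
    dr, dth = a/n, math.pi/n
    total = 0.0
    for i in range(n):
        r = (i + 0.5)*dr
        for j in range(n):
            th = (j + 0.5)*dth
            total += r*r*math.sin(th)*dr*dth
    return 2*math.pi*total

a = 2.0
exact = 4/3*math.pi*a**3
print(abs(sphere_volume(a) - exact)/exact < 1e-3)
```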

Surface element. The surface element can also be deduced for arbitrary orthogonal curvilinear coordinates. First consider the special case when dS ∥ e3; then

    dS = (h1 dq1 e1) × (h2 dq2 e2) = h1 h2 dq1 dq2 e3 .   (1.98a)

In general

    dS = sign(n̂ · e1) h2 h3 dq2 dq3 e1 + sign(n̂ · e2) h3 h1 dq3 dq1 e2 + sign(n̂ · e3) h1 h2 dq1 dq2 e3 ,   (1.98b)

so that

    dS = n̂ · dS = h2 h3 dq2 dq3 |n̂ · e1| + h3 h1 dq3 dq1 |n̂ · e2| + h1 h2 dq1 dq2 |n̂ · e3| > 0 if dqj > 0 .   (1.98c)
1.8.10 Gradient in Orthogonal Curvilinear Coordinates

First we recall from (1.29) that for Cartesian coordinates and infinitesimal displacements dψ = ∇ψ · dr.

Definition. For curvilinear orthogonal coordinates (for which the basis vectors are in general functions of
position), we define ∇ψ to be the vector such that for all dr

dψ = ∇ψ · dr . (1.99)



In order to determine the components of ∇ψ when ψ is viewed as a function of q rather than r, write

    ∇ψ = Σi ei αi ,   (1.100a)

then from (1.83a), (1.84a), (1.92), (1.14c) and (1.99)

    dψ = ∇ψ · dr = (Σi ei αi) · (Σj hj ej dqj) = Σ_{i,j} αi (hj dqj) ei · ej = Σi αi (hi dqi) .   (1.100b)

But according to the chain rule, an infinitesimal change dq to q will lead to the following infinitesimal change in ψ ≡ ψ(q1, q2, q3):

    dψ = Σi (∂ψ/∂qi) dqi = Σi (1/hi)(∂ψ/∂qi)(hi dqi) .   (1.100c)

Hence, since (1.100b) and (1.100c) must hold for all dqi,

    αi = (1/hi) ∂ψ/∂qi ,   (1.100d)

and from (1.100a)

    ∇ψ = Σi (ei/hi) ∂ψ/∂qi = ( (1/h1) ∂ψ/∂q1 , (1/h2) ∂ψ/∂q2 , (1/h3) ∂ψ/∂q3 ) .   (1.100e)

Remark. Each term has dimensions ‘ψ/length’.

As before, we can consider ∇ψ to be the result of acting on ψ with the vector differential operator

    ∇ = Σi ei (1/hi) ∂/∂qi .   (1.101)   (Key Result)

1.8.11 Examples of Gradients

Cylindrical Polar Coordinates. In cylindrical polar coordinates, the gradient is given from (1.96b), (1.96c) and (1.96d) to be

    ∇ = eρ ∂/∂ρ + eϕ (1/ρ) ∂/∂ϕ + ez ∂/∂z .   (1.102a)

Spherical Polar Coordinates. In spherical polar coordinates the gradient is given from (1.94b), (1.94c) and (1.94d) to be

    ∇ = er ∂/∂r + eθ (1/r) ∂/∂θ + eϕ (1/(r sin θ)) ∂/∂ϕ .   (1.102b)

1.8.12 Divergence and Curl

We can now use (1.101) to compute ∇ · F and ∇ × F in orthogonal curvilinear coordinates. However, first we need a preliminary result which is complementary to (1.84b). Since

    ∂qi/∂qj = δij ,   (1.103a)

it follows from (1.101) that

    ∇qi = Σj ej (1/hj) ∂qi/∂qj = Σj (ej/hj) δij = ei/hi ,   i.e. that   ei = hi ∇qi .   (1.103b)

We also recall that the ei form an orthonormal right-handed basis; thus e1 = e2 × e3 (and cyclic permutations). Hence from (1.103b)

    e1 = h2 h3 ∇q2 × ∇q3 ,   and cyclic permutations.   (1.103c)



Divergence. We have, with a little bit of inspired rearrangement, and remembering to differentiate the ei because they are position dependent:

    ∇ · F = ∇ · (Σi Fi ei)
          = ∇ · [ (e1/(h2 h3)) (h2 h3 F1) ] + cyclic permutations
          = (e1/(h2 h3)) · ∇(h2 h3 F1) + h2 h3 F1 ∇ · (e1/(h2 h3)) + cyclic permutations   using (1.48a)
          = (e1/(h2 h3)) · Σj (ej/hj) ∂(h2 h3 F1)/∂qj + h2 h3 F1 ∇ · (∇q2 × ∇q3)
            + cyclic permutations .   using (1.101) & (1.103c)

Recall from (1.92) that e1 · ej = δ1j, and from example (1.53), with p = q2 and q = q3, that

    ∇ · (∇q2 × ∇q3) = 0 .

It follows that

    ∇ · F = (1/(h1 h2 h3)) [ ∂(h2 h3 F1)/∂q1 + ∂(h3 h1 F2)/∂q2 + ∂(h1 h2 F3)/∂q3 ] .   (1.104)   (Key Result)

Cylindrical Polar Coordinates. From (1.96b), (1.96c), (1.96d) and (1.104)

    div F = (1/ρ) ∂(ρ Fρ)/∂ρ + (1/ρ) ∂Fϕ/∂ϕ + ∂Fz/∂z .   (1.105a)

Spherical Polar Coordinates. From (1.94b), (1.94c), (1.94d) and (1.104)

    div F = (1/r²) ∂(r² Fr)/∂r + (1/(r sin θ)) ∂(sin θ Fθ)/∂θ + (1/(r sin θ)) ∂Fϕ/∂ϕ .   (1.105b)
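Formula (1.105b) can be checked against a Cartesian finite-difference divergence. For the test field F = r² er (our own choice), (1.105b) with Fr = r² and Fθ = Fϕ = 0 gives div F = (1/r²) d(r⁴)/dr = 4r:

```python
import math

def F_cart(x, y, z):
    # F = r² e_r = r (x, y, z), with r = (x² + y² + z²)^(1/2)  (our test field)
    r = math.sqrt(x*x + y*y + z*z)
    return (r*x, r*y, r*z)

def div_cart(x, y, z, h=1e-6):
    # Cartesian divergence ∂Fx/∂x + ∂Fy/∂y + ∂Fz/∂z by central differences
    s = 0.0
    for k in range(3):
        p = [x, y, z]; m = [x, y, z]
        p[k] += h; m[k] -= h
        s += (F_cart(*p)[k] - F_cart(*m)[k])/(2*h)
    return s

x, y, z = 0.3, -0.7, 1.1
r = math.sqrt(x*x + y*y + z*z)
print(abs(div_cart(x, y, z) - 4*r) < 1e-5)
```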
Curl. Again with a little bit of inspired rearrangement we have that

    ∇ × F = ∇ × (Σi Fi ei)
          = Σi ∇ × [ (ei/hi) (hi Fi) ]
          = Σi ∇(hi Fi) × (ei/hi) + Σi hi Fi (∇ × ∇qi)   using (1.48b) & (1.103b)
          = Σi Σj (1/(hi hj)) ∂(hi Fi)/∂qj ej × ei .   using (1.49a) & (1.101)

But e1 × e2 = e3 and cyclic permutations, and ek × ek = 0, hence

    ∇ × F = (e1/(h2 h3)) [ ∂(h3 F3)/∂q2 − ∂(h2 F2)/∂q3 ] + (e2/(h3 h1)) [ ∂(h1 F1)/∂q3 − ∂(h3 F3)/∂q1 ]
          + (e3/(h1 h2)) [ ∂(h2 F2)/∂q1 − ∂(h1 F1)/∂q2 ] .   (1.106a)

All three components of the curl can be written in the concise form

                            | h1 e1    h2 e2    h3 e3  |
    ∇ × F = (1/(h1 h2 h3))  | ∂/∂q1    ∂/∂q2    ∂/∂q3  | .   (1.106b)   (Key Result)
                            | h1 F1    h2 F2    h3 F3  |



Cylindrical Polar Coordinates. From (1.96b), (1.96c), (1.96d) and (1.106b)

                    | eρ      ρ eϕ     ez    |
    ∇ × F = (1/ρ)   | ∂/∂ρ    ∂/∂ϕ     ∂/∂z  |   (1.107a)
                    | Fρ      ρ Fϕ     Fz    |

          = ( (1/ρ) ∂Fz/∂ϕ − ∂Fϕ/∂z , ∂Fρ/∂z − ∂Fz/∂ρ , (1/ρ) ∂(ρ Fϕ)/∂ρ − (1/ρ) ∂Fρ/∂ϕ ) .   (1.107b)

Spherical Polar Coordinates. From (1.94b), (1.94c), (1.94d) and (1.106b)

                             | er      r eθ     r sin θ eϕ |
    ∇ × F = (1/(r² sin θ))   | ∂/∂r    ∂/∂θ     ∂/∂ϕ       |   (1.108a)
                             | Fr      r Fθ     r sin θ Fϕ |

          = ( (1/(r sin θ)) [ ∂(sin θ Fϕ)/∂θ − ∂Fθ/∂ϕ ] , (1/(r sin θ)) ∂Fr/∂ϕ − (1/r) ∂(r Fϕ)/∂r ,
              (1/r) ∂(r Fθ)/∂r − (1/r) ∂Fr/∂θ ) .   (1.108b)

Remarks.

1. Each term in a divergence and curl has dimensions ‘F/length’.


2. The above formulae can also be derived in a more physical manner using the divergence theorem and
Stokes’ theorem respectively.

1.8.13 Laplacian in Orthogonal Curvilinear Coordinates

Suppose we substitute F = ∇ψ into formula (1.104) for the divergence. Then since from (1.100e)

    Fi = (1/hi) ∂ψ/∂qi ,

we have that

    ∇²ψ ≡ ∇ · ∇ψ = (1/(h1 h2 h3)) [ ∂/∂q1 ( (h2 h3/h1) ∂ψ/∂q1 ) + ∂/∂q2 ( (h3 h1/h2) ∂ψ/∂q2 ) + ∂/∂q3 ( (h1 h2/h3) ∂ψ/∂q3 ) ] .   (1.109)

We thereby deduce that in a general orthogonal curvilinear coordinate system

    ∇² = (1/(h1 h2 h3)) [ ∂/∂q1 ( (h2 h3/h1) ∂/∂q1 ) + ∂/∂q2 ( (h3 h1/h2) ∂/∂q2 ) + ∂/∂q3 ( (h1 h2/h3) ∂/∂q3 ) ] .   (1.110)

Cylindrical Polar Coordinates. From (1.96b), (1.96c), (1.96d) and (1.110)

    ∇²ψ = (1/ρ) ∂/∂ρ ( ρ ∂ψ/∂ρ ) + (1/ρ²) ∂²ψ/∂ϕ² + ∂²ψ/∂z² .   (1.111a)

Spherical Polar Coordinates. From (1.94b), (1.94c), (1.94d) and (1.110)

    ∇²ψ = (1/r²) ∂/∂r ( r² ∂ψ/∂r ) + (1/(r² sin θ)) ∂/∂θ ( sin θ ∂ψ/∂θ ) + (1/(r² sin²θ)) ∂²ψ/∂ϕ² ,   (1.111b)
         = (1/r) ∂²(rψ)/∂r² + (1/(r² sin θ)) ∂/∂θ ( sin θ ∂ψ/∂θ ) + (1/(r² sin²θ)) ∂²ψ/∂ϕ² .   (1.111c)
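As a check on (1.111b), take ψ = r² cos θ = zr (our test function): the spherical formula gives ∇²ψ = 6 cos θ − 2 cos θ = 4 cos θ = 4z/r, which should agree with a Cartesian second-difference Laplacian:

```python
import math

def psi(x, y, z):
    # ψ = r² cosθ = z (x² + y² + z²)^(1/2)  (our test function)
    return z*math.sqrt(x*x + y*y + z*z)

def laplacian(x, y, z, h=1e-4):
    # Cartesian ∇²ψ by second central differences in each direction
    c = psi(x, y, z)
    s = 0.0
    for k in range(3):
        p = [x, y, z]; m = [x, y, z]
        p[k] += h; m[k] -= h
        s += (psi(*p) - 2*c + psi(*m))/(h*h)
    return s

x, y, z = 0.6, -0.2, 0.9
r = math.sqrt(x*x + y*y + z*z)
print(abs(laplacian(x, y, z) - 4*z/r) < 1e-5)
```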

Remark. We have found here only the form of ∇2 as a differential operator on scalar fields. As noted earlier,
the action of the Laplacian on a vector field F is most easily defined using the vector identity

∇2 F = ∇(∇ · F) − ∇ × (∇ × F) . (1.112)



Alternatively ∇2 F can be evaluated by recalling that

∇2 F = ∇2 (F1 e1 + F2 e2 + F3 e3 ) ,

and remembering (a) that the derivatives implied by the Laplacian act on the unit vectors too, and
(b) that because the unit vectors are generally functions of position (∇2 F)i ̸= ∇2 Fi (the exception
being Cartesian coordinates).

1.8.14 Further Examples

Evaluate ∇ · r, ∇ × r, and ∇²(1/r) in spherical polar coordinates, where r = r er. From (1.105b)

    ∇ · r = (1/r²) ∂(r² · r)/∂r = 3 ,   as in (1.46a).   (1.113a)

From (1.108b)

    ∇ × r = ( 0 , (1/(r sin θ)) ∂r/∂ϕ , −(1/r) ∂r/∂θ ) = (0, 0, 0) ,   as in (1.46c).   (1.113b)

From (1.111c), for r ≠ 0,

    ∇²(1/r) = (1/r) ∂²/∂r² ( r · (1/r) ) = 0 ,   as in (1.58) with n = −1.   (1.113c)



1.8.15 Aide Memoire

Orthogonal Curvilinear Coordinates.

    ∇ = Σi ei (1/hi) ∂/∂qi .

    div F = (1/(h1 h2 h3)) [ ∂(h2 h3 F1)/∂q1 + ∂(h3 h1 F2)/∂q2 + ∂(h1 h2 F3)/∂q3 ] .

                             | h1 e1    h2 e2    h3 e3  |
    curl F = (1/(h1 h2 h3))  | ∂/∂q1    ∂/∂q2    ∂/∂q3  | .
                             | h1 F1    h2 F2    h3 F3  |

    ∇²ψ = (1/(h1 h2 h3)) [ ∂/∂q1 ( (h2 h3/h1) ∂ψ/∂q1 ) + ∂/∂q2 ( (h3 h1/h2) ∂ψ/∂q2 ) + ∂/∂q3 ( (h1 h2/h3) ∂ψ/∂q3 ) ] .

Cylindrical Polar Coordinates: q1 = ρ, h1 = 1; q2 = ϕ, h2 = ρ; q3 = z, h3 = 1.

    ∇ = eρ ∂/∂ρ + eϕ (1/ρ) ∂/∂ϕ + ez ∂/∂z .

    div F = (1/ρ) ∂(ρ Fρ)/∂ρ + (1/ρ) ∂Fϕ/∂ϕ + ∂Fz/∂z .

                     | eρ      ρ eϕ     ez    |
    curl F = (1/ρ)   | ∂/∂ρ    ∂/∂ϕ     ∂/∂z  |
                     | Fρ      ρ Fϕ     Fz    |

           = ( (1/ρ) ∂Fz/∂ϕ − ∂Fϕ/∂z , ∂Fρ/∂z − ∂Fz/∂ρ , (1/ρ) ∂(ρ Fϕ)/∂ρ − (1/ρ) ∂Fρ/∂ϕ ) .

    ∇²ψ = (1/ρ) ∂/∂ρ ( ρ ∂ψ/∂ρ ) + (1/ρ²) ∂²ψ/∂ϕ² + ∂²ψ/∂z² .

Spherical Polar Coordinates: q1 = r, h1 = 1; q2 = θ, h2 = r; q3 = ϕ, h3 = r sin θ.

    ∇ = er ∂/∂r + eθ (1/r) ∂/∂θ + eϕ (1/(r sin θ)) ∂/∂ϕ .

    div F = (1/r²) ∂(r² Fr)/∂r + (1/(r sin θ)) ∂(sin θ Fθ)/∂θ + (1/(r sin θ)) ∂Fϕ/∂ϕ .

                              | er      r eθ     r sin θ eϕ |
    curl F = (1/(r² sin θ))   | ∂/∂r    ∂/∂θ     ∂/∂ϕ       |
                              | Fr      r Fθ     r sin θ Fϕ |

           = ( (1/(r sin θ)) [ ∂(sin θ Fϕ)/∂θ − ∂Fθ/∂ϕ ] , (1/(r sin θ)) ∂Fr/∂ϕ − (1/r) ∂(r Fϕ)/∂r ,
               (1/r) ∂(r Fθ)/∂r − (1/r) ∂Fr/∂θ ) .

    ∇²ψ = (1/r²) ∂/∂r ( r² ∂ψ/∂r ) + (1/(r² sin θ)) ∂/∂θ ( sin θ ∂ψ/∂θ ) + (1/(r² sin²θ)) ∂²ψ/∂ϕ² .



2 Green’s Functions

2.0 Why Study This?

Numerous scientific phenomena are described by differential equations. This section is about extending your
armoury for solving ordinary differential equations, such as those that arise in quantum mechanics and
electrodynamics. In particular, we will be interested in how to model an idealized point charge or point
mass, or a localized source of heat, waves, etc.

2.0.1 Physical motivation

Newton’s second law for a particle of mass m moving in one dimension subject to a force F(t) is

    dp/dt = F ,   (2.1a)

where

    p = m dx/dt   (2.1b)

is the momentum. Suppose that the force is applied only in the time interval 0 < t < δt. The total change in momentum, termed the impulse, is

    δp = ∫₀^{δt} F(t) dt = I .   (2.1c)
We may wish to represent mathematically a situation in which the momentum is changed instantaneously,
e.g. if the particle experiences a collision. To achieve this, F must tend to infinity while δt tends to zero,
in such a way that its integral I is finite and non-zero. The delta function is introduced to meet these and
similar requirements.

2.1 The Dirac Delta Function (a.k.a. Alchemy)

2.1.1 The Delta Function as the Limit of a Sequence

Consider the discontinuous ‘top-hat’ function δε(x) defined for ε > 0 by

             { 0         x < −ε
    δε(x) =  { 1/(2ε)    −ε ⩽ x ⩽ ε .   (2.2a)
             { 0         ε < x

Then for all values of ε, including the limit ε → 0+,

    ∫_{−∞}^{∞} δε(x) dx = 1 .   (2.2b)

Further we note that for any differentiable function g(x) and constant ξ

    ∫_{−∞}^{∞} δε(x − ξ) g′(x) dx = ∫_{ξ−ε}^{ξ+ε} (1/(2ε)) g′(x) dx
                                  = (1/(2ε)) [g(x)]_{ξ−ε}^{ξ+ε}
                                  = (1/(2ε)) ( g(ξ + ε) − g(ξ − ε) ) .

In the limit ε → 0+ we recover, from using Taylor’s theorem and writing g′(x) = f(x),

    lim_{ε→0+} ∫_{−∞}^{∞} δε(x − ξ) f(x) dx = lim_{ε→0+} (1/(2ε)) ( g(ξ) + ε g′(ξ) + ½ ε² g″(ξ) + . . .
                                                               − g(ξ) + ε g′(ξ) − ½ ε² g″(ξ) + . . . )
                                            = f(ξ) .   (2.2c)



We will view the delta function, δ(x), as the limit as ε → 0+ of δε(x), i.e.

    δ(x) = lim_{ε→0+} δε(x) .   (2.3)

Applications. Delta functions (as mathematical objects of infinite density and zero spatial extension but
having a non-zero integral effect) are a mathematical way of modelling point objects/properties, e.g.
point charges, point masses, point forces, point sinks/sources.
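The sifting behaviour (2.2c) of the top-hat sequence is easy to see numerically: the integral ∫ δε(x − ξ) f(x) dx is just the average of f over [ξ − ε, ξ + ε], which approaches f(ξ) as ε → 0+. A sketch in plain Python (names and grid sizes are our own):

```python
import math

def sift(f, xi, eps, n=1000):
    # ∫ δ_ε(x−ξ) f(x) dx = (1/2ε) ∫_{ξ−ε}^{ξ+ε} f(x) dx, by the trapezium rule
    # confined to the top-hat's support
    hstep = 2*eps/n
    total = 0.5*(f(xi - eps) + f(xi + eps))
    for i in range(1, n):
        total += f(xi - eps + i*hstep)
    return total*hstep/(2*eps)

f = math.cos
xi = 0.3
vals = [sift(f, xi, eps) for eps in (0.1, 0.01)]
# Shrinking ε should move the integral towards f(ξ) = cos 0.3
print(abs(vals[1] - f(xi)) < 1e-3 and abs(vals[1] - f(xi)) < abs(vals[0] - f(xi)))
```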

2.1.2 Some Properties of the Delta Function

Taking (2.3) as our ‘definition’ of a delta function, we infer the following.

(i) From (2.2a) we see that the delta function has an infinitely sharp peak of zero width, i.e.

            { ∞    x = 0
    δ(x) =  {              .   (2.4a)
            { 0    x ≠ 0

(ii) From (2.2b) it follows that the delta function has unit area, i.e.

    ∫_{−α}^{β} δ(x) dx = 1   for any α > 0 , β > 0 .   (2.4b)

(iii) From (2.2c), and a sneaky interchange of the limit and the integration, we conclude that the delta function can perform ‘surgical strikes’ on integrands, picking out the value of the integrand at one particular point, i.e.

    ∫_{−∞}^{∞} δ(x − ξ) f(x) dx = f(ξ) .   (2.4c)

Remark. This result is equivalent to the substitution property of the Kronecker delta:

    Σ_{j=1}^{3} δij aj = ai .

The Dirac delta function can be understood as the equivalent of the Kronecker delta symbol for
functions of a continuous variable.

2.1.3 An Alternative (And Better) View

• The delta function δ(x) is not a function, but a distribution or generalised function.

• (2.4c) is not really a property of the delta function, but its definition. In other words δ(x) is the generalised function such that for all ‘nice’ functions f(x)¹⁵

    ∫_{−∞}^{∞} δ(x − ξ) f(x) dx = f(ξ) .   (2.5)

• Given that δ(x) is defined within an integrand as a linear operator, it should always be employed in an integrand as a linear operator.¹⁶

¹⁵ By ‘nice’ we mean, for instance, that f(x) is everywhere differentiable any number of times, and that

    ∫_{−∞}^{∞} |dⁿf/dxⁿ|² dx < ∞   for all integers n ⩾ 0.

¹⁶ However we will not always be holier than thou: see (2.6d).



2.1.4 The Delta Function as the Limit of Other Sequences

The top-hat sequence, (2.2a), is not unique in tending to the delta function in an appropriate limit; there
are many such sequences of well-defined functions.

Graphs of the Witch of Agnesi, (2.6a), and the Gaussian, (2.7a), for increasingly smaller values of ε.

The Witch of Agnesi. For instance we could have alternatively defined δε(x) by

    δε(x) = ε / (π (x² + ε²)) .   (2.6a)

By substituting x = εy, we recover (2.2b), i.e.

    ∫_{−∞}^{∞} δε(x) dx = ∫_{−∞}^{∞} 1/(π (y² + 1)) dy = (1/π) [arctan y]_{−∞}^{∞} = 1 .

Also, by means of the substitution x = ξ + εz followed by an application of Taylor’s theorem, the analogous result to (2.2c) follows, namely

    lim_{ε→0+} ∫_{−∞}^{∞} δε(x − ξ) f(x) dx = lim_{ε→0+} ∫_{−∞}^{∞} δε(εz) f(ξ + εz) ε dz
                                            = lim_{ε→0+} ∫_{−∞}^{∞} (1/(π (z² + 1))) ( f(ξ) + εz f′(ξ) + . . . ) dz
                                            = f(ξ) .

An equivalent sequence. We note that

    (1/2π) ∫_{−∞}^{∞} e^{ıkx−ε|k|} dk = (1/2π) ( ∫_{−∞}^{0} e^{ıkx+εk} dk + ∫_{0}^{∞} e^{ıkx−εk} dk )
                                      = (1/2π) ( 1/(ıx + ε) − 1/(ıx − ε) )
                                      = ε / (π (x² + ε²)) .   (2.6b)

Hence from (2.6a)

    δε(x) = (1/2π) ∫_{−∞}^{∞} e^{ıkx−ε|k|} dk .   (2.6c)

It follows that if we are willing to break the injunction that δ(x) should always be employed in an integrand as a linear operator, we infer from (2.6c) that

    δ(x) = (1/2π) ∫_{−∞}^{∞} e^{ıkx} dk .   (2.6d)
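The identity (2.6c) can be verified numerically by truncating the k-integral (the neglected tail is O(e^{−ε k_max})) and comparing with the closed form (2.6a); a sketch in plain Python, with truncation and step size chosen merely for illustration:

```python
import cmath, math

def seq_integral(x, eps, kmax=300.0, n=60000):
    # Trapezium-rule approximation to (1/2π) ∫_{−kmax}^{kmax} e^{ıkx − ε|k|} dk
    hstep = 2*kmax/n
    total = 0.0 + 0.0j
    for i in range(n + 1):
        k = -kmax + i*hstep
        w = 0.5 if i in (0, n) else 1.0
        total += w*cmath.exp(1j*k*x - eps*abs(k))*hstep
    return total/(2*math.pi)

x, eps = 0.4, 0.05
lhs = seq_integral(x, eps)
rhs = eps/(math.pi*(x*x + eps*eps))   # the Witch of Agnesi (2.6a)
print(abs(lhs.imag) < 1e-6 and abs(lhs.real - rhs) < 1e-4)
```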



The Gaussian (unlectured). Another popular choice for δε(x) is the Gaussian of width ε:

    δε(x) = (1/(2πε²)^{1/2}) exp( −x²/(2ε²) ) .   (2.7a)

The analogous result to (2.2b) follows by means of the substitution x = √2 εy:

    ∫_{−∞}^{∞} δε(x) dx = (1/(2πε²)^{1/2}) ∫_{−∞}^{∞} exp( −x²/(2ε²) ) dx = (1/√π) ∫_{−∞}^{∞} exp( −y² ) dy = 1 .   (2.7b)

The equivalent result to (2.2c) can also be recovered by the substitution x = ξ + √2 εz followed by an application of Taylor’s theorem.

2.1.5 Further Properties of the Delta Function

The following properties hold for all the definitions of δε (x) above (i.e. (2.2a), (2.6a), (2.6c) and (2.7a)), and
thence for δ by the limiting process. Alternatively they can be deduced from (2.6d).
(i) δ(x) is symmetric. From (2.6d) it follows, using the substitution k = −ℓ, that

    δ(−x) = (1/2π) ∫_{−∞}^{∞} e^{−ıkx} dk = −(1/2π) ∫_{∞}^{−∞} e^{ıℓx} dℓ = (1/2π) ∫_{−∞}^{∞} e^{ıℓx} dℓ = δ(x) .   (2.8a)

(ii) δ(x) is real. From (2.6d) and (2.8a), with ∗ denoting a complex conjugate, it follows that

    δ∗(x) = (1/2π) ∫_{−∞}^{∞} e^{−ıkx} dk = δ(−x) = δ(x) .   (2.8b)

2.1.6 The Heaviside Step Function

The Heaviside step function, H(x), is defined for x ≠ 0 by

            { 0    x < 0
    H(x) =  {            .   (2.9)
            { 1    x > 0

This function, which is sometimes written θ(x), is discontinuous at x = 0:

    lim_{x→0−} H(x) = 0 ≠ 1 = lim_{x→0+} H(x) .

There are various conventions for the value of the Heaviside step function at x = 0, but it is not uncommon to take H(0) = ½.
The Heaviside function is closely related to the Dirac delta function, since from (2.4a) and (2.4b)

    H(x) = ∫_{−∞}^{x} δ(ξ) dξ .   (2.10a)

By analogy with the first fundamental theorem of calculus (0.1), this suggests that

    H′(x) = δ(x) .   (2.10b)

Unlectured Remark. As a check on (2.10b) we see from integrating by parts that

    ∫_{−∞}^{∞} H′(x − ξ) f(x) dx = [H(x − ξ) f(x)]_{−∞}^{∞} − ∫_{−∞}^{∞} H(x − ξ) f′(x) dx
                                 = f(∞) − ∫_{ξ}^{∞} f′(x) dx
                                 = f(∞) − [f(x)]_{ξ}^{∞}
                                 = f(ξ) .

Hence from the definition of the delta function (2.5) we may identify H′(x) with δ(x).



Application. The idealized impulsive force in §2.0.1 can be represented as

    F(t) = I δ(t) ,

i.e. a spike of strength I localized at t = 0. If the particle is at rest before the impulse, the solution for its momentum is

    p = I H(t) .

2.1.7 The Derivative of the Delta Function

We can define the derivative of δ(x) by using (2.4a), (2.4c) and a formal integration by parts:
Z ∞ h i∞ Z ∞
δ ′ (x − ξ)f (x) dx = δ(x − ξ)f (x) − δ(x − ξ)f ′ (x) dx = −f ′ (ξ) , (2.11)
−∞ −∞ −∞

where f (x) is any differentiable function.
Alternatively, the derivative[s] of the delta function can be defined as the limits of sequences of functions.
The generating functions for δ ′ (x) are the derivatives of (smooth) functions (e.g. Gaussians) that generate
δ(x), and have both positive and negative ‘spikes’ localized at x = 0.

Remark (unlectured). Not all operations are permitted on generalized functions. In particular, two gener-
alized functions of the same variable cannot be multiplied together, e.g. H(x)δ(x) is meaningless.
However δ(x)δ(y) is permissible and represents a point source in a two-dimensional space.

2.2 Second-Order Linear Ordinary Differential Equations

The general second-order linear ordinary differential equation (ODE) for y(x) can, wlog, be written as

    y″ + p(x) y′ + q(x) y = f(x)   or   L y(x) = f(x) ,   (2.12a)

where L is the differential operator

    L = d²/dx² + p(x) d/dx + q(x) .   (2.12b)

If f(x) = 0 the equation is said to be homogeneous (unforced), otherwise it is said to be inhomogeneous (forced).

2.2.1 Homogeneous Second-Order Linear ODEs

If f = 0 then any two solutions of


y ′′ + py ′ + qy = 0 , (2.13a)
can be superposed to give a third, i.e. if y1 and y2 are two solutions then for α, β ∈ R another solution is

y = αy1 + βy2 . (2.13b)



Further, suppose that y1 and y2 are two linearly independent solutions, where by linearly independent we
mean that
αy1 (x) + βy2 (x) ≡ 0 ⇒ α = β = 0 . (2.13c)
Then since (2.13a) is second order, the general solution of (2.13a) will be of the form (2.13b). y1 (x) and
y2 (x) are often referred to as complementary functions, while the parameters α and β can be viewed as the
two integration constants. This means that in order to find the general solution of a second order linear
homogeneous ODE we need to find two linearly-independent solutions.

Remark. If y1 and y2 are linearly dependent then y2 = γy1 for some γ ∈ R, in which case (2.13b) becomes

y = (α + βγ)y1 , (2.14)

and we have, in effect, a solution with only one integration constant σ = (α + βγ).

2.2.2 Inhomogeneous Second-Order Linear ODEs

If y0 (x) is any solution of the real inhomogeneous equation (2.12a), i.e. if

Ly0 ≡ y0′′ + p(x)y0′ + q(x)y0 = f (x) , (2.15a)

then the general solution of (2.12a) has the form

y(x) = y0 (x) + αy1 (x) + βy2 (x) , (2.15b)

since
Ly = Ly0 + αLy1 + βLy2 (2.15c)
= f + 0 + 0. (2.15d)

Here y1(x) and y2(x) are complementary functions, while y0(x) is referred to as a particular solution, or a particular integral.

2.2.3 The Wronskian

If y1 and y2 are linearly dependent (i.e. y2 = γy1 for some γ), then so are y1′ and y2′ (since, from differentiating, y2′ = γy1′). Hence y1 and y2 are linearly dependent only if the equation

    ( y1    y2  ) ( α )
    ( y1′   y2′ ) ( β ) = 0 ,   (2.16a)

has a non-zero solution for α and β. Conversely, if this equation has a non-zero solution then y1 and y2 are linearly dependent. It follows that non-zero functions y1 and y2 are linearly independent if and only if

    ( y1    y2  ) ( α )
    ( y1′   y2′ ) ( β ) = 0   ⇒   α = β = 0 .   (2.16b)

Define the Wronskian, W(x), of the two solutions to be the function

    W[y1, y2] = y1 y2′ − y2 y1′ .   (2.17a)

Since Ax = 0 has only the zero solution if and only if det A ≠ 0, we conclude that y1 and y2 are linearly independent if and only if

    | y1    y2  |
    | y1′   y2′ |  = y1 y2′ − y2 y1′ = W ≠ 0 ,   (2.17b)

i.e. the Wronskian is non-zero.
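For example (our own illustration), cos x and sin x, the complementary functions of y″ + y = 0, have W = cos²x + sin²x = 1 ≠ 0 and so are independent, while eˣ and 2eˣ are dependent and have W = 0:

```python
import math

def wronskian(y1, dy1, y2, dy2, x):
    # W[y1, y2](x) = y1 y2′ − y2 y1′, cf. (2.17a)
    return y1(x)*dy2(x) - y2(x)*dy1(x)

# cos x and sin x (independent): W = cos²x + sin²x = 1 for every x
w1 = wronskian(math.cos, lambda x: -math.sin(x), math.sin, math.cos, 0.7)
# e^x and 2e^x (dependent, γ = 2): W vanishes identically
w2 = wronskian(math.exp, math.exp, lambda x: 2*math.exp(x), lambda x: 2*math.exp(x), 0.7)
print(abs(w1 - 1.0) < 1e-12, abs(w2) < 1e-12)
```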



2.2.4 Initial-value and boundary-value problems

Two boundary conditions (BCs) must be specified to determine fully the solution of a second-order ODE.
A boundary condition is usually an equation relating the values of y and y ′ at one point.

Remark. Without loss of generality we can assume that the BCs do not involve y ′′ and higher derivatives,
since the ODE allows y ′′ and higher derivatives to be expressed in terms of y and y ′ .

The general form of a linear BC at a point x = a is

Ay(a) + By ′ (a) = E , (2.18)

where A, B and E are constants, and A and B are not both zero. If E = 0 the BC is said to be homogeneous.

Initial-value problem. If both BCs are specified at the same point we have an initial-value problem, e.g. solve

    m d²x/dt² = F(t)   for t ⩾ 0, subject to x = dx/dt = 0 at t = 0.   (2.19a)
Boundary-value problem. If the BCs are specified at different points we have a two-point boundary-value
problem, e.g. solve

y ′′ (x) + y(x) = f (x) for a ⩽ x ⩽ b, subject to y(a) = y(b) = 0 . (2.19b)

2.3 Differential equations containing delta functions

If a differential equation involves a step function or delta function, this generally implies a lack of smoothness
in the solution. The equation can be solved separately on either side of the discontinuity and the two parts
of the solution connected by applying the appropriate matching conditions. Consider, as an example, the
linear second-order ODE
d2 y
+ y = δ(x) . (2.20)
dx2
If x represents time, this equation could represent the behaviour of a simple harmonic oscillator in response
to an impulsive force. In each of the regions x < 0 and x > 0 separately, the right-hand side vanishes and
the general solution is a linear combination of cos x and sin x. We may write
    y = { α− cos x + β− sin x ,    x < 0 ,
          α+ cos x + β+ sin x ,    x > 0 .

Since the general solution of a second-order ODE should contain only two arbitrary constants, it should be
possible to relate α+ and β+ to α− and β− .
What is the nature of the non-smoothness in y? Integrate (2.20) from x = −ε to x = ε to obtain

    ∫_{−ε}^{ε} (d²y/dx²) dx + ∫_{−ε}^{ε} y(x) dx = ∫_{−ε}^{ε} δ(x) dx ,            (2.21a)

i.e.
    y′(ε) − y′(−ε) + ∫_{−ε}^{ε} y(x) dx = 1 .                    (2.21b)

Now let ε → 0. If we assume that y is bounded, then the integral term makes no contribution and we get
    [dy/dx] ≡ lim_{ε→0} [dy/dx]_{x=−ε}^{x=ε} = 1 .                (2.21c)

Since there is only a finite jump in the derivative of y, we may further conclude that y is continuous, in
which case the jump conditions are

    [y] = 0 ,   [dy/dx] = 1   at x = 0 .                        (2.21d)



Applying these conditions, we obtain
    α+ − α− = 0   and   β+ − β− = 1 .                        (2.22)
Hence the general solution is
    y = { α− cos x + β− sin x ,          x < 0 ,
          α− cos x + (β− + 1) sin x ,    x > 0 .                (2.23)
In particular, if the oscillator is at rest before the impulse occurs, then α− = β− = 0 and the solution is
y = H(x) sin x.
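The impulse response just obtained is easy to check numerically. The sketch below is not from the notes: the Gaussian width EPS used to approximate δ(x), the integration grid, and the RK4 stepper are all ad hoc choices. It integrates y′′ + y = δ_ε(x) from rest and compares the result with H(x) sin x.

```python
import numpy as np

# Narrow normalised Gaussian approximating the delta function.
EPS = 0.01

def delta_eps(x):
    return np.exp(-x**2 / (2 * EPS**2)) / np.sqrt(2 * np.pi * EPS**2)

def rk4(rhs, state0, xs):
    """Classical fourth-order Runge-Kutta over the grid xs."""
    state = np.array(state0, dtype=float)
    trajectory = [state.copy()]
    for x0, x1 in zip(xs[:-1], xs[1:]):
        h = x1 - x0
        k1 = rhs(x0, state)
        k2 = rhs(x0 + h / 2, state + h / 2 * k1)
        k3 = rhs(x0 + h / 2, state + h / 2 * k2)
        k4 = rhs(x0 + h, state + h * k3)
        state = state + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        trajectory.append(state.copy())
    return np.array(trajectory)

# state = (y, y'); the oscillator starts from rest well before the impulse
rhs = lambda x, s: np.array([s[1], -s[0] + delta_eps(x)])
xs = np.linspace(-2.0, 6.0, 8001)
ys = rk4(rhs, [0.0, 0.0], xs)[:, 0]

# After the impulse the response should be close to H(x) sin x.
mask = xs > 5 * EPS
err = np.max(np.abs(ys[mask] - np.sin(xs[mask])))
print(err)
```

Shrinking EPS (with a correspondingly finer grid) drives the discrepancy towards zero, roughly like EPS².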

2.4 Green’s Functions


2.4.1 The Green’s Function for two-point homogeneous boundary-value problems

Suppose that we wish to solve (2.12a), i.e.

    L y(x) = f(x) ,                                (2.24a)

where L is the general second-order linear differential operator in x, i.e.

    L = d²/dx² + p(x) d/dx + q(x) ,                (2.24b)
with p and q being continuous functions. To fix ideas we will assume that the solution should satisfy
homogeneous boundary conditions at x = a and x = b, i.e.
Ay(a) + By ′ (a) = 0 , (2.25a)

    Cy(b) + Dy′(b) = 0 ,                            (2.25b)
where A, B, C and D are constants such that A and B are not both zero, and C and D are not both zero.
Next, suppose that we can find a solution G(x; ζ) that is the response of the system to forcing at a point ζ,
i.e. G(x; ζ) is the solution to
L G(x; ζ) = δ(x − ζ) , (2.26a)
subject to the boundary conditions (cf. (2.25a) and (2.25b))
A G(a; ζ) + B Gx (a; ζ) = 0 and C G(b; ζ) + D Gx (b; ζ) = 0 , (2.26b)
where
    L = ∂²/∂x² + p(x) ∂/∂x + q(x) ,                (2.26c)

    Gx(x; ζ) = ∂G/∂x (x; ζ) ,                       (2.26d)

and we have used ∂/∂x rather than d/dx since G is a function of both x and ζ. Then we claim that the solution
of the original problem (2.24a) is

    y(x) = ∫_a^b G(x; ζ) f(ζ) dζ .                    (2.27)
To see this we first note that (2.27) satisfies the boundary conditions (2.25a) and (2.25b), since from (2.26b)

    Ay(a) + By′(a) = ∫_a^b (A G(a; ζ) + B Gx(a; ζ)) f(ζ) dζ = 0 ,        (2.28a)

    Cy(b) + Dy′(b) = ∫_a^b (C G(b; ζ) + D Gx(b; ζ)) f(ζ) dζ = 0 .        (2.28b)

Further, (2.27) also satisfies the inhomogeneous equation (2.24a) because

    L y(x) = ∫_a^b L G(x; ζ) f(ζ) dζ        differential wrt x, integral wrt ζ
           = ∫_a^b δ(x − ζ) f(ζ) dζ          from (2.26a)
           = f(x) .                           from (2.5)        (2.28c)


The function G(x; ζ) is called the Green’s function of L for the given homogeneous boundary conditions.



2.4.2 Two Properties of Green’s Functions
In the next subsection we will construct a Green’s function. However, first we need to derive two properties
of G(x; ζ). Suppose that we integrate equation (2.26a) from ζ − ε to ζ + ε for ε > 0 and consider the limit
ε → 0 (cf. (2.21a)). From (2.5) the right hand side is equal to 1, and hence

    1 = lim_{ε→0} ∫_{ζ−ε}^{ζ+ε} LG dx
      = lim_{ε→0} ∫_{ζ−ε}^{ζ+ε} ( ∂²G/∂x² + p ∂G/∂x + q G ) dx                            from (2.24b)
      = lim_{ε→0} ∫_{ζ−ε}^{ζ+ε} ∂/∂x ( ∂G/∂x + p G ) dx + lim_{ε→0} ∫_{ζ−ε}^{ζ+ε} ( −(dp/dx) G + q G ) dx        rearrange
      = lim_{ε→0} [ ∂G/∂x + p G ]_{x=ζ−ε}^{x=ζ+ε} − lim_{ε→0} ∫_{ζ−ε}^{ζ+ε} ( dp/dx − q ) G dx .        (2.29)

How can this equation be satisfied? Taking the lead from (2.21d), suppose that G(x; ζ) is bounded near
x = ζ; then, since p and q are continuous, (2.29) reduces to

    lim_{ε→0} [ ∂G/∂x + p G ]_{x=ζ−ε}^{x=ζ+ε} = 1 .

This implies that the jump in the derivative of G is bounded (cf. the unit jump in the Heaviside step function
(2.9) at x = 0). In turn, this means that G must be continuous. We conclude that

    lim_{ε→0} [ G(x; ζ) ]_{x=ζ−ε}^{x=ζ+ε} = 0   and   lim_{ε→0} [ ∂G/∂x ]_{x=ζ−ε}^{x=ζ+ε} = 1 ,        (2.30)

i.e. G is continuous and there is a unit jump in the derivative of G at x = ζ.


Remark. A function can be continuous and its derivative discontinuous, but not vice versa.

2.4.3 Construction of the Green’s Function
G(x; ζ) can be constructed by the following procedure. First we note that when x ̸= ζ, G satisfies the
homogeneous equation, and hence G should be the sum of two linearly independent solutions, say y1 and y2 ,
of the homogeneous equation. So let
    G(x; ζ) = { α−(ζ) y1(x) + β−(ζ) y2(x)    for a ⩽ x < ζ ,
                α+(ζ) y1(x) + β+(ζ) y2(x)    for ζ ⩽ x ⩽ b .            (2.31)

By construction this satisfies (2.26a) for x ̸= ζ. Next we obtain equations relating α±(ζ) and β±(ζ) by
requiring at x = ζ that G is continuous and ∂G/∂x has a unit discontinuity. It follows from (2.30) that

    [α+(ζ)y1(ζ) + β+(ζ)y2(ζ)] − [α−(ζ)y1(ζ) + β−(ζ)y2(ζ)] = 0 ,
    [α+(ζ)y1′(ζ) + β+(ζ)y2′(ζ)] − [α−(ζ)y1′(ζ) + β−(ζ)y2′(ζ)] = 1 ,

i.e., grouping the y1 and y2 terms,

    y1(ζ)[α+(ζ) − α−(ζ)] + y2(ζ)[β+(ζ) − β−(ζ)] = 0 ,
    y1′(ζ)[α+(ζ) − α−(ζ)] + y2′(ζ)[β+(ζ) − β−(ζ)] = 1 ,

i.e.
    ( y1   y2  ) ( α+ − α− )   ( 0 )
    ( y1′  y2′ ) ( β+ − β− ) = ( 1 ) .                        (2.32)

A solution exists to this equation if, see (2.17b),

    W ≡ | y1   y2  |
        | y1′  y2′ | ̸= 0 ,

i.e. if y1 and y2 are linearly independent; if so then

    α+ − α− = − y2(ζ)/W(ζ)   and   β+ − β− = y1(ζ)/W(ζ) .            (2.33)
Finally we impose the boundary conditions. For instance, suppose that the solution y is required to satisfy
(cf. (2.19b))
y(a) = y(b) = 0 . (2.34a)
Then the appropriate boundary conditions for G would be
G(a; ζ) = G(b; ζ) = 0 , (2.34b)
i.e. A = C = 1 and B = D = 0 in (2.26b). It follows from (2.31) that we would require
α− (ζ)y1 (a) + β− (ζ)y2 (a) = 0 , (2.35a)
α+ (ζ)y1 (b) + β+ (ζ)y2 (b) = 0 . (2.35b)
α± , β± could then be determined from the four equations in (2.33), (2.35a) and (2.35b).
More generally, for the homogeneous boundary conditions (2.25a) and (2.25b), i.e.

    Ay(a) + By′(a) = 0   and   Cy(b) + Dy′(b) = 0 ,            (2.36a)

the appropriate boundary conditions for G are

    A G(a; ζ) + B ∂G/∂x (a; ζ) = 0 ,                            (2.36b)
    C G(b; ζ) + D ∂G/∂x (b; ζ) = 0 .                            (2.36c)
For simplicity construct complementary functions y1 and y2 so that they satisfy the boundary condition at
a and b respectively, i.e. choose y1 and y2 so that
Ay1 (a) + By1′ (a) = 0 and Cy2 (b) + Dy2′ (b) = 0 . (2.37a)
Then
α+ = β− = 0 , (2.37b)

and the solution (2.31) simplifies to


    G(x; ζ) = { α−(ζ) y1(x) ,    a ⩽ x < ζ ,
                β+(ζ) y2(x) ,    ζ ⩽ x ⩽ b ,                    (2.37c)

and thence from (2.33)

    α− = y2(ζ)/W(ζ)   and   β+ = y1(ζ)/W(ζ) .                    (2.37d)
It follows from (2.31) that

    G(x; ζ) = { y1(x) y2(ζ)/W(ζ)    for a ⩽ x < ζ ,
                y1(ζ) y2(x)/W(ζ)    for ζ ⩽ x ⩽ b .                (2.37e)

Remark. This method fails if the Wronskian W [y1 , y2 ] vanishes. This happens if y1 is proportional to y2 , i.e.
if there is a complementary function that happens to satisfy the homogeneous boundary conditions
both at x = a and x = b. In this case the equation Ly = f may not have a solution satisfying the
boundary conditions; if it does, the solution will not be unique (cf. resonance).

2.4.4 Examples of Green’s Functions

(i) Find the Green’s function in 0 < a < b for

        L = ∂²/∂x² + (1/x) ∂/∂x − n²/x² ,                    (2.38a)

    with homogeneous boundary conditions

        G(a; ζ) = 0   and   ∂G/∂x (b; ζ) = 0 ,                (2.38b)

    i.e. with A = D = 1 and B = C = 0 in (2.26b).



Answer. Seek solutions to the homogeneous equation L y = 0 of the form y = x^r. Then we require
that
    r(r − 1) + r − n² = 0 ,   i.e.   r = ±n .                    (2.39a)
Let
    y1 = (x/a)^n − (a/x)^n   and   y2 = (x/b)^n + (b/x)^n ,        (2.39b)
where we have constructed y1 and y2 so that y1(a) = 0 and y2′(b) = 0, as is appropriate for
boundary conditions (2.38b). Since we require that G(a; ζ) = 0 from (2.38b), and by construction
y1(a) = 0, it follows that β− = 0 in (2.31). Similarly, since we require that ∂G/∂x (b; ζ) = 0 from
(2.38b), and by construction y2′(b) = 0, it follows that α+ = 0. Hence, as in (2.37c),

    G(x; ζ) = { α−(ζ) y1(x)    for a ⩽ x < ζ ,
                β+(ζ) y2(x)    for ζ ⩽ x ⩽ b .

We also require that G is continuous and ∂G/∂x has a unit discontinuity at x = ζ, hence

    β+(ζ) y2(ζ) = α−(ζ) y1(ζ)   and   β+(ζ) y2′(ζ) − α−(ζ) y1′(ζ) = 1 .        (2.40)

Thus, as in (2.37d) and (2.37e),

    α− = y2(ζ)/W(ζ) ,   β+ = y1(ζ)/W(ζ) ,

and
    G(x; ζ) = { y1(x) y2(ζ)/W(ζ)    for a ⩽ x < ζ ,
                y1(ζ) y2(x)/W(ζ)    for ζ ⩽ x ⩽ b .                (2.41)
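As an informal check of this construction (a sketch; the parameter values n = 2, a = 1, b = 2, ζ = 1.5 are illustrative choices, not from the notes), the continuity and unit-jump properties (2.30) can be verified by finite differences:

```python
import numpy as np

n, a, b, zeta = 2, 1.0, 2.0, 1.5   # illustrative values only

y1 = lambda x: (x / a)**n - (a / x)**n            # satisfies y1(a) = 0
y2 = lambda x: (x / b)**n + (b / x)**n            # satisfies y2'(b) = 0
y1p = lambda x: n * x**(n - 1) / a**n + n * a**n / x**(n + 1)
y2p = lambda x: n * x**(n - 1) / b**n - n * b**n / x**(n + 1)
W = lambda x: y1(x) * y2p(x) - y2(x) * y1p(x)     # Wronskian

def G(x):
    # piecewise form (2.41)
    if x < zeta:
        return y1(x) * y2(zeta) / W(zeta)
    return y1(zeta) * y2(x) / W(zeta)

h = 1e-6
gap = G(zeta + h) - G(zeta - h)                                    # continuity: ~0
jump = (G(zeta + h) - G(zeta)) / h - (G(zeta) - G(zeta - h)) / h   # unit jump in G_x
print(gap, jump)
```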

(ii) Find the Green’s function for the two-point boundary-value problem

y ′′ (x) + y(x) = f (x), y(0) = y(1) = 0 . (2.42)

Answer. The complementary functions satisfying left and right boundary conditions are

y1 = sin x and y2 = sin(x − 1) (2.43)

respectively. The Wronskian is thus

W = y1 y2′ − y2 y1′ = sin x cos(x − 1) − sin(x − 1) cos x = sin 1 . (2.44)

Thus
    G(x; ζ) = { sin x sin(ζ − 1)/sin 1 ,    0 ⩽ x ⩽ ζ ,
                sin ζ sin(x − 1)/sin 1 ,    ζ ⩽ x ⩽ 1 .            (2.45)
So, being careful to choose the correct expression for G depending on whether x ⩽ ζ or x ⩾ ζ,

    y(x) = ∫_0^1 G(x; ζ) f(ζ) dζ
         = (sin(x − 1)/sin 1) ∫_0^x sin ζ f(ζ) dζ + (sin x/sin 1) ∫_x^1 sin(ζ − 1) f(ζ) dζ .        (2.46)
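A quick numerical check (a sketch; the forcing f = 1 and the quadrature resolution are my choices): for f(x) = 1 the exact solution of y′′ + y = 1 with y(0) = y(1) = 0 is y = 1 − cos x + (cos 1 − 1) sin x/sin 1, and the quadrature of (2.46) should reproduce it.

```python
import numpy as np

def trap(vals, grid):
    # simple trapezoidal rule
    return float(np.sum((vals[1:] + vals[:-1]) * (grid[1:] - grid[:-1]) / 2))

def y_green(x, f, m=2001):
    z1 = np.linspace(0.0, x, m)        # zeta <= x branch of G
    z2 = np.linspace(x, 1.0, m)        # zeta >= x branch of G
    i1 = trap(np.sin(z1) * f(z1), z1)
    i2 = trap(np.sin(z2 - 1) * f(z2), z2)
    return np.sin(x - 1) / np.sin(1) * i1 + np.sin(x) / np.sin(1) * i2

f = lambda z: np.ones_like(z)
xs = np.linspace(0.0, 1.0, 11)
y_num = np.array([y_green(x, f) for x in xs])
y_exact = 1 - np.cos(xs) + (np.cos(1) - 1) * np.sin(xs) / np.sin(1)
err = np.max(np.abs(y_num - y_exact))
print(err)
```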

2.4.5 The Green’s Function for homogeneous initial-value problems

Suppose that instead of the two-point boundary conditions (2.25a) and (2.25b), we require that

y(a) = y ′ (a) = 0 . (2.47a)

We then require, by analogy with (2.26b), that

    G(a; ζ) = ∂G/∂x (a; ζ) = 0 .                        (2.47b)



Choose the complementary functions so that y1 (a) = 0 and y2′ (a) = 0 (which can be shown to be linearly
independent and always possible), then (2.31) simplifies to
    G(x; ζ) = { 0                                for a ⩽ x < ζ ,
                α+(ζ) y1(x) + β+(ζ) y2(x)        for ζ ⩽ x ⩽ b ,            (2.48)

i.e. α− = β− = 0. The conditions that G be continuous and ∂G/∂x has a unit discontinuity then give that

    α+(ζ) y1(ζ) + β+(ζ) y2(ζ) = 0 ,                        (2.49a)
    α+(ζ) y1′(ζ) + β+(ζ) y2′(ζ) = 1 .                        (2.49b)

Or in matrix form
    ( y1(ζ)   y2(ζ)  ) ( α+(ζ) )   ( 0 )
    ( y1′(ζ)  y2′(ζ) ) ( β+(ζ) ) = ( 1 )                    (2.50a)

with solution
    ( α+(ζ) )              (  y2′(ζ)  −y2(ζ) ) ( 0 )   ( −y2(ζ)/W(ζ) )
    ( β+(ζ) ) = (1/W(ζ))   ( −y1′(ζ)   y1(ζ) ) ( 1 ) = (  y1(ζ)/W(ζ) )        (2.50b)
The Green’s function is therefore

    G(x; ζ) = { 0                                           for a ⩽ x < ζ ,
                ( y1(ζ) y2(x) − y1(x) y2(ζ) )/W(ζ)          for ζ ⩽ x ⩽ b .        (2.51)

Example. Find the Green’s function for the initial-value problem

y ′′ (x) + y(x) = f (x), y(0) = y ′ (0) = 0 . (2.52)

Answer. The complementary functions that satisfy the boundary conditions are y1 = sin x and
y2 = cos x, with Wronskian

W = y1 y2′ − y2 y1′ = − sin2 x − cos2 x = −1 . (2.53a)

Further
y1 (ζ)y2 (x) − y1 (x)y2 (ζ) = sin ζ cos x − sin x cos ζ = sin(ζ − x) . (2.53b)
Thus
    G(x; ζ) = { 0 ,             0 ⩽ x ⩽ ζ ,
                sin(x − ζ) ,    x > ζ ,                    (2.53c)

and thus
    y(x) = ∫_0^x sin(x − ζ) f(ζ) dζ .                    (2.53d)
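As a sketch of (2.53d) in action (the forcing f = 1 is my illustrative choice): for f(x) = 1 the integral gives y = 1 − cos x, which indeed satisfies y′′ + y = 1 with y(0) = y′(0) = 0.

```python
import numpy as np

def trap(vals, grid):
    # simple trapezoidal rule
    return float(np.sum((vals[1:] + vals[:-1]) * (grid[1:] - grid[:-1]) / 2))

def y_of(x, f, m=4001):
    z = np.linspace(0.0, x, m)
    return trap(np.sin(x - z) * f(z), z)     # quadrature of (2.53d)

f = lambda z: np.ones_like(z)
xs = np.linspace(0.0, 3.0, 31)
y_num = np.array([y_of(x, f) for x in xs])
err = np.max(np.abs(y_num - (1 - np.cos(xs))))
print(err)
```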

2.4.6 Inhomogeneous boundary conditions

So far we have only considered problems with homogeneous boundary conditions. One can also use Green’s
functions to solve problems with inhomogeneous boundary conditions. The trick is to solve the homogeneous
equation Lyibc = 0 for a function yibc which satisfies the inhomogeneous boundary conditions. Then solve the
inhomogeneous equation Lyhbc = f , perhaps using the Green’s function method discussed in this chapter,
imposing homogeneous boundary conditions on yhbc . Then linearity means that yibc + yhbc satisfies the
inhomogeneous equation with inhomogeneous boundary conditions.
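A minimal sketch of this splitting (the specific problem y′′ + y = 1 with y(0) = 1, y(1) = 0 is my illustrative choice): y_ibc solves the homogeneous equation with the inhomogeneous BCs, y_hbc solves the forced equation with homogeneous BCs, and by linearity their sum matches the direct solution of the full problem.

```python
import numpy as np

s1 = np.sin(1.0)
# y_ibc: solves Ly = 0 with the inhomogeneous BCs y(0) = 1, y(1) = 0.
y_ibc = lambda x: np.cos(x) - np.cos(1.0) * np.sin(x) / s1
# y_hbc: solves Ly = 1 with homogeneous BCs (e.g. via the Green's function (2.46)).
y_hbc = lambda x: 1 - np.cos(x) + (np.cos(1.0) - 1) * np.sin(x) / s1

xs = np.linspace(0.0, 1.0, 101)
y = y_ibc(xs) + y_hbc(xs)
y_exact = 1 - np.sin(xs) / s1        # direct solution of the full problem
err = np.max(np.abs(y - y_exact))
print(err)
```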



3 Fourier Transforms
3.0 Why Study This?
Fourier series tell you about the spectral (or harmonic) properties of functions/signals that are periodic;
if the period is L, then the harmonics have frequencies n/L where n is an integer. The Fourier transform
generalizes this idea to functions that are not periodic. The ‘harmonics’ can then have any frequency.
The Fourier transform has innumerable applications in diverse fields such as astronomy, optics, signal
processing, data analysis, statistics and number theory. Furthermore, the Fourier transform provides a
complementary way of looking at a function. Certain operations on a function are more easily computed
‘in the Fourier domain’. This idea is particularly useful in solving certain kinds of differential equation.

3.1 The Fourier Transform


3.1.1 Definition

Given a function f(x) such that

    ∫_{−∞}^{∞} |f(x)| dx < ∞ ,

we define its Fourier transform, f̃(k), by

    f̃(k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} f(x) dx .            (3.1)

Notation. Sometimes it is clearer to denote the Fourier transform of a function f by F[f] rather than f̃, i.e.

    F[•] ≡ •̃ .                            (3.2)

Remark. There are differing normalisations of the Fourier transform. Hence you will encounter definitions
where the (2π)^{−1/2} is either not present or replaced by (2π)^{−1}, and other definitions where the −ıkx is
replaced by +ıkx.
Property. If the function f(x) is real the Fourier transform f̃(k) is not necessarily real. However if f is both
real and even, i.e. f*(x) = f(x) and f(x) = f(−x) respectively, then by using these properties and
the substitution x = −y it follows that f̃ is real:

    f̃*(k) = (1/√(2π)) ∫_{−∞}^{∞} e^{ıkx} f*(x) dx        from c.c. of (3.1)
           = (1/√(2π)) ∫_{−∞}^{∞} e^{ıkx} f(−x) dx        since f*(x) = f(−x)
           = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıky} f(y) dy        let x = −y
           = f̃(k) .                                        from (3.1)        (3.3)

Similarly we can show that if f is both real and odd, then f̃ is purely imaginary, i.e. f̃*(k) = −f̃(k).
Conversely it is possible to show using the Fourier inversion theorem (see below) that

• if both f and f̃ are real, then f is even;

• if f is real and f̃ is purely imaginary, then f is odd.

3.1.2 Examples of Fourier Transforms


The Fourier Transform (FT) of e^{−b|x|} (b > 0). First, from (2.6b) we already have that

    (1/2π) ∫_{−∞}^{∞} e^{ıkx−ε|k|} dk = ε/(π(x² + ε²)) .

For what follows it is helpful to rewrite this result by making the transformations x → −ℓ, k → x and
ε → b to obtain
    ∫_{−∞}^{∞} e^{−ıℓx−b|x|} dx = 2b/(ℓ² + b²) .            (3.4)
We deduce from the definition of a Fourier transform, (3.1), and (3.4) with ℓ = k, that

    F[e^{−b|x|}] = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx−b|x|} dx
                 = (1/√(2π)) · 2b/(k² + b²) .                (3.5)

The FTs of cos(ax) e^{−b|x|} and sin(ax) e^{−b|x|} (b > 0). Unlectured. From (3.1), the definition of cosine, and
(3.4) first with ℓ = a − k and then with ℓ = a + k, it follows that

    F[cos(ax) e^{−b|x|}] = (1/(2√(2π))) ∫_{−∞}^{∞} (e^{ıax} + e^{−ıax}) e^{−ıkx−b|x|} dx
                         = (b/√(2π)) [ 1/((a − k)² + b²) + 1/((a + k)² + b²) ] .        (3.6a)

This is real, as it has to be since cos(ax) e^{−b|x|} is even.

Similarly, from (3.1), the definition of sine, and (3.4) first with ℓ = a − k and then with ℓ = a + k, it
follows that

    F[sin(ax) e^{−b|x|}] = (1/(2ı√(2π))) ∫_{−∞}^{∞} (e^{ıax} − e^{−ıax}) e^{−ıkx−b|x|} dx
                         = (−ıb/√(2π)) [ 1/((a − k)² + b²) − 1/((a + k)² + b²) ] .        (3.6b)

This is purely imaginary, as it has to be since sin(ax) e^{−b|x|} is odd.
The FT of a Gaussian. From the definition (3.1), the completion of a square, and the substitution
x = εy − ıε²k,¹⁷ it follows that

    F[(2πε²)^{−1/2} exp(−x²/2ε²)] = (1/(2πε)) ∫_{−∞}^{∞} exp( −x²/(2ε²) − ıkx ) dx
                                  = (1/(2πε)) ∫_{−∞}^{∞} exp( −½(x/ε + ıεk)² − ½ε²k² ) dx
                                  = (1/2π) exp(−½ε²k²) ∫_{−∞}^{∞} exp(−½y²) dy
                                  = (1/√(2π)) exp(−½ε²k²) .                    (3.7)

Hence the FT of a Gaussian of width (standard deviation) ε is a Gaussian of width ε−1 . This illustrates
a property of the Fourier transform: the narrower the function of x, the wider the function of k.
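This reciprocal-width property, and formula (3.7) itself, can be seen numerically (a sketch; the grid and the value ε = 0.5 are ad hoc choices):

```python
import numpy as np

def trap(vals, grid):
    # trapezoidal rule (works for complex-valued integrands)
    return np.sum((vals[1:] + vals[:-1]) * (grid[1:] - grid[:-1]) / 2)

eps = 0.5
x = np.linspace(-10.0, 10.0, 40001)
gauss = np.exp(-x**2 / (2 * eps**2)) / np.sqrt(2 * np.pi * eps**2)
errs = []
for k in (0.0, 1.0, 3.0):
    ft = trap(np.exp(-1j * k * x) * gauss, x) / np.sqrt(2 * np.pi)
    # (3.7): a Gaussian of width eps transforms to one of width 1/eps
    errs.append(abs(ft - np.exp(-0.5 * eps**2 * k**2) / np.sqrt(2 * np.pi)))
print(max(errs))
```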
The FT of the delta function. From definitions (2.5) and (3.1) it follows that

    F[δ(x − a)] = (1/√(2π)) ∫_{−∞}^{∞} δ(x − a) e^{−ıkx} dx
                = (1/√(2π)) e^{−ıka} .                        (3.8a)

Hence the Fourier transform of δ(x) is 1/√(2π). Recalling the description of a delta function as a limit
of a Gaussian, see (2.7a), we note that this result with a = 0 is consistent with (3.7) in the limit
ε → 0+.
The FT of the step function. From (2.9) and (3.1) it follows that

    F[H(x − a)] = (1/√(2π)) ∫_{−∞}^{∞} H(x − a) e^{−ıkx} dx
                = (1/√(2π)) ∫_a^{∞} e^{−ıkx} dx
                = (1/√(2π)) [ e^{−ıkx}/(−ık) ]_a^{∞} .

17 This is a little naughty since it takes us into the complex x-plane. However, it can be fixed up once you have done Cauchy’s theorem.



There is now a problem, since what is lim_{x→∞} e^{−ıkx}? For the time being the simplest resolution is, in
the spirit of (2.6b) and (2.6c) in § 2.1.4, to find F[H(x − a)e^{−ε(x−a)}] for ε > 0, and then let ε → 0+.
So
    F[H(x − a)e^{−ε(x−a)}] = (1/√(2π)) ∫_{−∞}^{∞} H(x − a) e^{−ε(x−a)−ıkx} dx
                           = (1/√(2π)) [ e^{−ε(x−a)−ıkx}/(−ε − ık) ]_a^{∞}
                           = (1/√(2π)) e^{−ıka}/(ε + ık) .            (3.8b)

On taking the limit ε → 0 we have that

    F[H(x − a)] = e^{−ıka}/(√(2π) ık) .                    (3.8c)

Remark. For future reference we observe from a comparison of (3.8a) and (3.8c) that
ıkF[H(x − a)] = F[δ(x − a)] . (3.8d)

The FT of the top-hat function. Consider the discontinuous ‘top-hat’ function g(x) defined by

    g(x) = { c    a < x < b ,
             0    otherwise .                        (3.9a)

Then
    √(2π) g̃(k) = c ∫_a^b e^{−ıkx} dx = (ıc/k)( e^{−ıkb} − e^{−ıka} ) .        (3.9b)

For instance, if a = −1, b = 1 and c = 1,

    √(2π) g̃(k) = (ı/k)( e^{−ık} − e^{ık} ) = 2 sin k / k .            (3.9c)
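A direct quadrature confirms (3.9c) (a sketch; the grid size is arbitrary):

```python
import numpy as np

def trap(vals, grid):
    # trapezoidal rule (works for complex-valued integrands)
    return np.sum((vals[1:] + vals[:-1]) * (grid[1:] - grid[:-1]) / 2)

x = np.linspace(-1.0, 1.0, 200001)     # the support of the top hat (a = -1, b = 1, c = 1)
errs = [abs(trap(np.exp(-1j * k * x), x) - 2 * np.sin(k) / k)   # lhs = sqrt(2 pi) g~(k)
        for k in (0.5, 1.0, 4.0)]
print(max(errs))
```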


3.1.3 The Fourier Inversion Theorem

Given a function f we can compute its Fourier transform f̃ from (3.1). For many functions the converse is
also true, i.e. given the Fourier transform f̃ of a function we can reconstruct the original function f. To see
this consider the following calculation (note the use of a dummy variable • to avoid an overabundance of x)

    (1/√(2π)) ∫_{−∞}^{∞} e^{ıkx} f̃(k) dk
        = (1/√(2π)) ∫_{−∞}^{∞} e^{ıkx} [ (1/√(2π)) ∫_{−∞}^{∞} e^{−ık•} f(•) d• ] dk        from definition (3.1)
        = ∫_{−∞}^{∞} d• f(•) [ (1/2π) ∫_{−∞}^{∞} dk e^{ık(x−•)} ]                            swap integration order
        = ∫_{−∞}^{∞} d• f(•) δ(x − •)                                                          from definition (2.6d)
        = f(x) .                                                                               from definition (2.5)



We thus have the result that if the Fourier transform of f(x) is defined by

    f̃(k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} f(x) dx ≡ F[f] ,            (3.10a)

then the inverse transform (note the change of sign in the exponent) acting on f̃(k) recovers f(x), i.e.

    f(x) = (1/√(2π)) ∫_{−∞}^{∞} e^{ıkx} f̃(k) dk ≡ I[f̃] .            (3.10b)

Note that
    I[F[f]] = f ,   and   F[I[f̃]] = f̃ .                    (3.10c)

Example. Find the Fourier transform of (x² + b²)^{−1}.

Answer. We start from our earlier result, (3.5), that

    F[e^{−b|x|}](k) = (1/√(2π)) · 2b/(k² + b²) .

Hence from taking the inverse transform and using (3.10c)

    √(π/(2b²)) e^{−b|x|} = I[1/(k² + b²)](x) ,

or, after applying the transformation x ↔ k,

    I[1/(x² + b²)](k) = √(π/(2b²)) e^{−b|k|} .                (3.11a)

But, from the transformation x ↔ k in (3.10b) and comparison with (3.10a), we see that

    F[f(x)](k) = I[f(x)](−k) .                        (3.11b)

Hence, making the transformation k → −k in (3.11a), we find that

    F[1/(x² + b²)](k) = √(π/(2b²)) e^{−b|k|} .                (3.11c)
Remarks.
(i) The variables are often called t and ω rather than x and k (time ↔ angular frequency vs. position
↔ wavenumber).
(ii) It is sometimes useful to consider complex values of k.
(iii) For a rigorous proof, certain technical conditions on f(x) are required. In particular, a necessary
condition for f̃(k) to exist for all real values of k (in the sense of an ordinary function) is that
f(x) → 0 as x → ±∞. Otherwise the Fourier integral does not converge (e.g. for k = 0).
A set of sufficient conditions for f̃(k) to exist is that f(x) have ‘bounded variation’, have a finite
number of discontinuities and be ‘absolutely integrable’, i.e.

    ∫_{−∞}^{∞} |f(x)| dx < ∞ .

However, we have seen that Fourier transforms can be assigned in a wider sense to some functions
that do not satisfy all of these conditions, e.g. f(x) = 1.
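As a numerical illustration of the inversion theorem (a sketch; the truncation and grid are ad hoc choices), applying the inverse transform (3.10b) to the transform (3.11c) of 1/(x² + b²) recovers the original function:

```python
import numpy as np

def trap(vals, grid):
    # trapezoidal rule (works for complex-valued integrands)
    return np.sum((vals[1:] + vals[:-1]) * (grid[1:] - grid[:-1]) / 2)

b = 1.0
k = np.linspace(-60.0, 60.0, 600001)
ft = np.sqrt(np.pi / (2 * b**2)) * np.exp(-b * np.abs(k))   # the transform (3.11c)
errs = []
for x0 in (0.0, 0.5, 2.0):
    inv = trap(np.exp(1j * k * x0) * ft, k) / np.sqrt(2 * np.pi)   # (3.10b)
    errs.append(abs(inv - 1 / (x0**2 + b**2)))
print(max(errs))
```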

3.1.4 Properties of Fourier Transforms


Linearity. For constants α and β,

    F[αf(x) + βg(x)] = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} (αf(x) + βg(x)) dx
                     = (α/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} f dx + (β/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} g dx
                     = α F[f(x)] + β F[g(x)] .                    (3.12)



Rescaling. Let g(x) = f(αx) for real constant α, then

    g̃ = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} f(αx) dx
      = (sgn α/(α√(2π))) ∫_{−∞}^{∞} e^{−ıky/α} f(y) dy
      = (1/|α|) f̃(k/α) .                            (3.13)

Translation. The Fourier transform of f(x − α) for constant α is given by

    F[f(x − α)] = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} f(x − α) dx            from (3.1)
                = (1/√(2π)) ∫_{−∞}^{∞} e^{−ık(y+α)} f(y) dy             x = y + α
                = e^{−ıkα} (1/√(2π)) ∫_{−∞}^{∞} e^{−ıky} f(y) dy        rearrange
                = e^{−ıkα} F[f(x)] .                                     from (3.1)        (3.14)

Exponential. Similarly, the Fourier transform of e^{ıαx} f(x) for constant α is given by

    F[e^{ıαx} f(x)](k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ı(k−α)x} f(x) dx
                       = F[f(x)](k − α) .                    (3.15)

Duality. If g(x) = f̃(x) then

    g̃(k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} f̃(x) dx = f(−k) ,            (3.16)

i.e. transforming twice returns the reflected function, cf. (3.11b).


Complex conjugation and parity inversion. For real k

    F[f*](k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} f*(x) dx
             = [ (1/√(2π)) ∫_{−∞}^{∞} e^{ıkx} f(x) dx ]*
             = ( F[f](−k) )* .                            (3.17)

Symmetry. If f(−x) = ±f(x), i.e. f is even or odd, then

    f̃(−k) = (1/√(2π)) ∫_{−∞}^{∞} f(x) e^{+ikx} dx
           = (1/√(2π)) ∫_{−∞}^{∞} ±f(−x) e^{ikx} dx
           = ±(1/√(2π)) ∫_{−∞}^{∞} f(y) e^{−iky} dy
           = ±f̃(k) .                                (3.18)



Differentiation. Recall that if g(x, k) is a function of two variables, then for constants a and b¹⁸

    (d/dx) ∫_a^b g(x, k) dk = ∫_a^b ∂g(x, k)/∂x dk .            (3.19)

Hence, if we differentiate the inverse Fourier transform (3.10b) with respect to x we obtain

    df/dx (x) = (1/√(2π)) ∫_{−∞}^{∞} e^{ıkx} ( ık f̃(k) ) dk = I[ık f̃] .        (3.20)

Now Fourier transform this equation to conclude from using (3.10c) that

    F[df/dx] = F[I[ık f̃]] = ık f̃ .                        (3.21a)

In other words, each time we differentiate a function we multiply its Fourier transform by ık. Hence

    F[d²f/dx²] = −k² f̃   and   F[dⁿf/dxⁿ] = (ık)ⁿ f̃ .            (3.21b)

Remark. That Fourier transforms allow a simple representation of derivatives of f(x) in Fourier space
has important consequences for solving differential equations.
Alternative proof (unlectured). This does not rely on the use of the inverse Fourier transform:

    F[df/dx] = (1/√(2π)) ∫_{−∞}^{∞} f′(x) e^{−ikx} dx
             = (1/√(2π)) [ f(x) e^{−ikx} ]_{−∞}^{∞} − (1/√(2π)) ∫_{−∞}^{∞} f(x)(−ik) e^{−ikx} dx
             = ik f̃(k) .                            (3.21c)

The integrated part vanishes because f(x) must tend to zero as x → ±∞ in order to possess a
Fourier transform.
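The property F[df/dx] = ık f̃ can be checked numerically for the illustrative choice f = exp(−x²/2), whose transform, by (3.7) with ε = 1 and linearity, is f̃ = exp(−k²/2) (a sketch; grid choices are ad hoc):

```python
import numpy as np

def trap(vals, grid):
    # trapezoidal rule (works for complex-valued integrands)
    return np.sum((vals[1:] + vals[:-1]) * (grid[1:] - grid[:-1]) / 2)

x = np.linspace(-10.0, 10.0, 40001)
fprime = -x * np.exp(-x**2 / 2)          # derivative of f = exp(-x^2/2)
errs = []
for k in (0.5, 1.0, 2.0):
    lhs = trap(np.exp(-1j * k * x) * fprime, x) / np.sqrt(2 * np.pi)   # F[f'](k)
    rhs = 1j * k * np.exp(-k**2 / 2)     # ik * f~(k)
    errs.append(abs(lhs - rhs))
print(max(errs))
```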
Multiplication by x. This time we differentiate (3.10a) with respect to k to obtain

    df̃/dk (k) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} ( −ıx f(x) ) dx .

Hence, after multiplying by ı, we deduce from (3.1) that (cf. (3.21a))

    ı df̃/dk = F[x f(x)] .                            (3.22)

3.1.5 The Relationship to Fourier Series

Suppose that f(x) is a periodic function with period L (so that f(x + L) = f(x)). Then f can be represented
by a Fourier series

    f(x) = Σ_{n=−∞}^{∞} a_n exp( 2πınx/L ) ,                (3.23a)

where
    a_n = (1/L) ∫_{−L/2}^{L/2} f(x) exp( −2πınx/L ) dx .        (3.23b)

18 If this is unfamiliar, work from first principles:

    (d/dx) ∫_a^b g(x, k) dk = lim_{ε→0} (1/ε) [ ∫_a^b g(x + ε, k) dk − ∫_a^b g(x, k) dk ]
                            = ∫_a^b lim_{ε→0} [ ( g(x + ε, k) − g(x, k) )/ε ] dk
                            = ∫_a^b ∂g(x, k)/∂x dk .



Expression (3.23a) can be viewed as a superposition of an infinite number of waves with wavenumbers
kn = 2πn/L (n = −∞, . . . , ∞). We are interested in the limit as the period L tends to infinity. In this
limit the increment between successive wavenumbers, i.e. ∆k = 2π/L, becomes vanishingly small, and the
spectrum of allowed wavenumbers kn becomes a continuum. Moreover, we recall that an integral can be
evaluated as the limit of a sum, e.g.
    ∫_{−∞}^{∞} g(k) dk = lim_{∆k→0} Σ_{n=−∞}^{∞} g(k_n) ∆k   where k_n = n∆k .        (3.24)

Rewrite (3.23a) and (3.23b) as

    f(x) = (1/√(2π)) Σ_{n=−∞}^{∞} f̃(k_n) exp(ıxk_n) ∆k ,

and
    f̃(k_n) = (1/√(2π)) ∫_{−L/2}^{L/2} f(x) exp(−ıxk_n) dx ,

where
    f̃(k_n) = L a_n/√(2π) ≡ √(2π) a_n/∆k .

We then see that in the limit ∆k → 0, i.e. L → ∞,

    f(x) = (1/√(2π)) ∫_{−∞}^{∞} f̃(k) exp(ıxk) dk ,            (3.25a)

and
    f̃(k) = (1/√(2π)) ∫_{−∞}^{∞} f(x) exp(−ıxk) dx .            (3.25b)

These are just our earlier definitions of the inverse Fourier transform (3.10b) and Fourier transform (3.1)
respectively.

3.2 The Convolution Theorem

3.2.1 Definition of convolution

The convolution, f ∗ g, of a function f(x) with a function g(x) is defined by

    (f ∗ g)(x) = ∫_{−∞}^{∞} dy f(y) g(x − y) .                (3.26)

The convolution expresses the amount of overlap of one function g as it is shifted over another function f.

Property: the convolution operator ∗ is commutative. f ∗ g = g ∗ f since

    (f ∗ g)(x) = ∫_{−∞}^{∞} dy f(y) g(x − y)         from (3.26)
               = ∫_{∞}^{−∞} (−dz) f(x − z) g(z)       z = x − y
               = ∫_{−∞}^{∞} dy f(x − y) g(y)           z → y
               = (g ∗ f)(x) .                           from (3.26)



3.2.2 Interpretation and examples

In statistics, a continuous random variable x (for instance, the height of a person drawn at random from
the population) has a probability distribution (or density) function f (x). The probability of x lying in the
range x0 < x < x0 + δx in the limit of small δx is f (x0 )δx.
If x and y are independent random variables with distribution functions f (x) and g(y), then let the distri-
bution function of their sum, z = x + y, be h(z). For the above example, suppose y is the height of a soap
box drawn at random; then z would be the height of a random person while standing on the soap box.
For any given value of x, the probability that z lies in the range

    z0 < z < z0 + δz ,                        (3.27a)

is just the probability that y lies in the range

    z0 − x < y < z0 − x + δz ,                (3.27b)

which is g(z0 − x)δz. That’s for a fixed x; so the probability that z lies in this same range for all x is

    h(z0)δz = ∫_{−∞}^{∞} f(x) g(z0 − x) δz dx ,        (3.27c)

which implies
    h = f ∗ g .                            (3.27d)
Applications. The effect of measuring, observing or processing scientific data can often be described as a
convolution of the data with a certain function. For instance:
(i) When a point source is observed by a telescope, a broadened image is seen, known as the point
spread function of the telescope. When an extended source is observed, the image that is seen is the
convolution of the source with the point spread function.
In this sense convolution corresponds to a broadening or distortion of the original data.
(ii) A point mass M at position R gives rise to a gravitational potential Φp(r) = −GM/|r − R|. A
continuous mass density ρ(r) can be thought of as a sum of infinitely many point masses ρ(R) d³R at
positions R. The resulting gravitational potential is

    Φ(r) = −G ∫ ρ(R)/|r − R| d³R ,                    (3.28)

which is the (3D) convolution of the mass density ρ(r) with the potential of a unit point mass at the
origin, −G/|r|.

3.2.3 The convolution theorem

If the functions f and g have Fourier transforms F[f] and F[g] respectively, then

    F[f ∗ g] = √(2π) F[f] F[g] .                        (3.29)

Proof.
    F[f ∗ g] = (1/√(2π)) ∫_{−∞}^{∞} dx e^{−ıkx} ∫_{−∞}^{∞} dy f(y) g(x − y)            from (3.1) & (3.26)
             = (1/√(2π)) ∫_{−∞}^{∞} dy f(y) ∫_{−∞}^{∞} dx e^{−ıkx} g(x − y)              swap integration order
             = (1/√(2π)) ∫_{−∞}^{∞} dy f(y) ∫_{−∞}^{∞} dz e^{−ık(z+y)} g(z)               x = z + y
             = (1/√(2π)) ∫_{−∞}^{∞} dy f(y) e^{−ıky} ∫_{−∞}^{∞} dz e^{−ıkz} g(z)          rearrange
             = √(2π) F[f] F[g] .                                                             from (3.1)
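With the (2π)^{−1/2} normalisation (3.1), the proportionality factor in the convolution theorem comes out as √(2π) (not 2π), which can be confirmed numerically. The sketch below uses two arbitrary Gaussians; the grid sizes are ad hoc choices.

```python
import numpy as np

def trap(vals, grid):
    # trapezoidal rule (works for complex-valued integrands)
    return np.sum((vals[1:] + vals[:-1]) * (grid[1:] - grid[:-1]) / 2)

def ft(vals, x, k):
    # quadrature approximation to definition (3.1)
    return trap(np.exp(-1j * k * x) * vals, x) / np.sqrt(2 * np.pi)

x = np.linspace(-15.0, 15.0, 3001)
f = np.exp(-x**2 / 2)
g = np.exp(-x**2)

# (f * g)(x0) by direct quadrature on the same grid
conv = np.array([trap(f * np.exp(-(x0 - x)**2), x) for x0 in x])

errs = []
for k in (0.0, 0.8, 1.5):
    errs.append(abs(ft(conv, x, k) - np.sqrt(2 * np.pi) * ft(f, x, k) * ft(g, x, k)))
print(max(errs))
```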



Conversely the Fourier transform of the product f g is given by the convolution of the Fourier transforms of
f and g divided by √(2π), i.e.

    F[f g] = (1/√(2π)) F[f] ∗ F[g] .                    (3.30)

Proof.
    F[f g](k) = (1/√(2π)) ∫_{−∞}^{∞} dx e^{−ıkx} f(x) g(x)                                      from (3.1)
              = (1/√(2π)) ∫_{−∞}^{∞} dx e^{−ıkx} g(x) [ (1/√(2π)) ∫_{−∞}^{∞} dℓ e^{ıℓx} f̃(ℓ) ]    from (3.10b) with k → ℓ
              = (1/√(2π)) ∫_{−∞}^{∞} dℓ f̃(ℓ) [ (1/√(2π)) ∫_{−∞}^{∞} dx e^{−ı(k−ℓ)x} g(x) ]        swap integration order
              = (1/√(2π)) ∫_{−∞}^{∞} dℓ f̃(ℓ) g̃(k − ℓ)                                             from (3.1)
              = (1/√(2π)) (f̃ ∗ g̃)(k) ≡ (1/√(2π)) (F[f] ∗ F[g])(k) .                               from (3.26)
Remarks.
(i) Convolution is an operation best carried out as a multiplication in the Fourier domain.
(ii) The Fourier transform of a product is non-trivial.
(iii) Convolution can be undone (deconvolution) by a division in the Fourier domain. If g is known and
f ∗ g is measured, then f can be obtained, in principle.

Application (unlectured). Suppose a linear ‘black box’ (e.g. a circuit) has output G(ω) exp(ıωt) for a periodic
input exp(ıωt). What is the output r(t) corresponding to input f(t)?

Answer. Since the ‘black box’ is linear, changing the input produces a directly proportional change in
output. Thus since an input exp(ıωt) produces an output G(ω) exp(ıωt), an input F(ω) exp(ıωt) will
produce an output R(ω) exp(ıωt) = G(ω)F(ω) exp(ıωt).

If we express the input as a Fourier transform, namely,

    f(t) = (1/√(2π)) ∫_{−∞}^{∞} F(ω) e^{ıωt} dω ,                (3.31a)

then, since the ‘black box’ is linear, we can superpose input to produce the output

    r(t) = (1/√(2π)) ∫_{−∞}^{∞} G(ω)F(ω) e^{ıωt} dω
         = (1/√(2π)) ∫_{−∞}^{∞} (1/√(2π)) F[f ∗ g] e^{ıωt} dω        from (3.29)
         = (1/√(2π)) (f ∗ g)(t) ,                                      from (3.10b)        (3.31b)

where g(t) is the inverse transform of G(ω), and we have used t and ω, instead of x and k respectively,
as the variables in the Fourier transforms and their inverses.

Remark. If we know the output of a linear black box for all possible harmonic inputs, then we know
everything about the black box.

3.2.4 Correlation
The correlation of two functions, h = f ⊗ g, is defined by

    h(x) = ∫_{−∞}^{∞} [f(y)]* g(x + y) dy .                    (3.32)

Correlation is a way of quantifying the relationship between two (typically oscillatory) functions. If two
signals (oscillating about an average value of zero) oscillate in phase with each other, their correlation will
be positive. If they are out of phase, the correlation will be negative. If they are completely unrelated, their
correlation will be zero.

The Fourier transform of a correlation is

    h̃(k) = (1/√(2π)) ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} [f(y)]* g(x + y) dy ] e^{−ikx} dx
          = (1/√(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} [f(y)]* g(z) e^{iky} e^{−ikz} dz dy        (z = x + y)
          = (1/√(2π)) [ ∫_{−∞}^{∞} f(y) e^{−iky} dy ]* ∫_{−∞}^{∞} g(z) e^{−ikz} dz
          = √(2π) [f̃(k)]* g̃(k) .                            (3.33)

Remarks.

(i) This result (or the special case g = f) is the Wiener–Khinchin theorem.

(ii) The autoconvolution and autocorrelation of f are f ∗ f and f ⊗ f. Their Fourier transforms are
√(2π) f̃² and √(2π) |f̃|², respectively.

3.3 Parseval’s theorem


If we apply the inverse transform to the Wiener–Khinchin theorem we find that
Z ∞ Z ∞

[f (y)] g(x + y) dy = [f˜(k)]∗ g̃(k) eikx dk . (3.34a)
−∞ −∞

Now set x = 0 and relabel y 7→ x to obtain Parseval’s theorem


Z ∞ Z ∞
[f (x)]∗ g(x) dx = [f˜(k)]∗ g̃(k) dk . (3.34b)
−∞ −∞

The special case used most frequently is when g = f :


Z ∞ Z ∞
|f (x)|2 dx = |f˜(k)|2 dk . (3.34c)
−∞ −∞

Remark. Parseval’s theorem means that the Fourier transform is a ‘unitary transformation’ that preserves
the ‘inner product’ between two functions (see later), in the same way that a rotation preserves lengths
and angles.
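A numerical sketch (mine): numpy's unnormalised DFT obeys the discrete Parseval identity Σ|f|² = (1/N)Σ|f̃|², with the factor 1/N playing the role of the symmetric √(2π) convention used above.

```python
import numpy as np

# Discrete Parseval, cf. (3.34c): sum |f|^2 = (1/N) sum |F|^2
rng = np.random.default_rng(1)
f = rng.standard_normal(256) + 1j * rng.standard_normal(256)
F = np.fft.fft(f)
assert np.isclose(np.sum(np.abs(f)**2), np.sum(np.abs(F)**2) / len(f))
```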

Alternative derivation using the delta function (unlectured).


    ∫_{−∞}^{∞} |f (x)|² dx = ∫_{−∞}^{∞} dx f (x)f ∗(x)

      = ∫_{−∞}^{∞} dx ( (1/√(2π)) ∫_{−∞}^{∞} dk e^{ikx} f̃(k) ) ( (1/√(2π)) ∫_{−∞}^{∞} dℓ e^{−iℓx} f̃∗(ℓ) )    from (3.10b) & (3.10b)∗

      = ∫_{−∞}^{∞} dk f̃(k) ∫_{−∞}^{∞} dℓ f̃∗(ℓ) ( (1/(2π)) ∫_{−∞}^{∞} dx e^{i(k−ℓ)x} )    swap integration order

      = ∫_{−∞}^{∞} dk f̃(k) ∫_{−∞}^{∞} dℓ f̃∗(ℓ) δ(k − ℓ)                               from (2.6d)

      = ∫_{−∞}^{∞} dk f̃(k)f̃∗(k)                                                       from (2.5) & (2.8a)

      = ∫_{−∞}^{∞} |f̃(k)|² dk .



Example. Find the Fourier transform of x e^{−|x|} and use Parseval’s theorem to evaluate the integral

    ∫_{−∞}^{∞} k²/(1 + k²)⁴ dk .                                                        (3.35)

Answer. From (3.5) with b = 1

    F[e^{−|x|}] = (1/√(2π)) · 2/(1 + k²) .                                              (3.36a)

Next employ (3.22) to obtain

    F[x e^{−|x|}] = i (∂/∂k) F[e^{−|x|}] = −i √(2/π) · 2k/(1 + k²)² .                   (3.36b)

Then from Parseval’s theorem (3.34c) and a couple of integrations by parts

    ∫_{−∞}^{∞} k²/(1 + k²)⁴ dk = (π/8) ∫_{−∞}^{∞} x² e^{−2|x|} dx = (π/4) ∫_0^{∞} x² e^{−2x} dx = π/16 .    (3.36c)
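The value π/16 can be confirmed by direct quadrature (a sketch of mine; the truncation range and grid are arbitrary choices, adequate because the integrand decays like k⁻⁶).

```python
import numpy as np

# Direct quadrature of (3.35); the tail beyond |k| = 200 contributes O(1e-12)
k = np.linspace(-200.0, 200.0, 2_000_001)
dk = k[1] - k[0]
integral = np.sum(k**2 / (1.0 + k**2)**4) * dk     # simple Riemann sum
assert np.isclose(integral, np.pi / 16, rtol=1e-6)
```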
An Application: Heisenberg’s Uncertainty Principle (unlectured). Suppose that

    ψ(x) = (2π∆x²)^{−1/4} exp(−x²/(4∆x²))                                               (3.37)

is the [real] wave-function of a particle in quantum mechanics. Then, according to quantum mechanics,

    |ψ(x)|² = (2π∆x²)^{−1/2} exp(−x²/(2∆x²)) ,                                          (3.38)

is the probability density of finding the particle at position x, and ∆x is the root mean square deviation
in position.

Remark. There is unit probability of finding the particle somewhere since |ψ|² is the Gaussian of
width ∆x and

    ∫_{−∞}^{∞} |ψ(x)|² dx = (2π∆x²)^{−1/2} ∫_{−∞}^{∞} exp(−x²/(2∆x²)) dx = 1 .         (3.39)

The Fourier transform of ψ(x) follows from (3.7) after the substitution ε = 2∆x and a multiplicative
normalisation:

    ψ̃(k) = (2∆x²/π)^{1/4} exp(−∆x²k²)
          = (2π∆k²)^{−1/4} exp(−k²/(4∆k²))        where ∆k = 1/(2∆x) .                  (3.40)

Hence ψ̃² is another Gaussian, this time with a root mean square deviation in wavenumber of ∆k. In
agreement with Parseval’s theorem

    ∫_{−∞}^{∞} |ψ̃(k)|² dk = 1 .                                                        (3.41)

In the case of the Gaussian, ∆k∆x = 1/2. More generally, one can show that for any (possibly complex)
wave-function ψ(x),

    ∆k∆x ⩾ 1/2 ,                                                                        (3.42)

where ∆x and ∆k are, as for the Gaussian, the root mean square deviations of the probability distribu-
tions |ψ(x)|² and |ψ̃(k)|², respectively. An important and well-known result follows from (3.42), since
in quantum mechanics the momentum is given by p = ℏk, where ℏ = h/2π and h is Planck’s constant.
Hence if we interpret ∆x = ∆x and ∆p = ℏ∆k to be the uncertainty in the particle’s position and
momentum respectively, then Heisenberg’s Uncertainty Principle follows from (3.42), namely

    ∆p∆x ⩾ ½ℏ .                                                                        (3.43)
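The Gaussian calculation can be verified numerically (a sketch of mine): sample (3.37), compute the transform by direct quadrature in the symmetric 1/√(2π) convention, and measure the two root mean square widths; the value ∆x = 0.7 below is an arbitrary choice.

```python
import numpy as np

sx = 0.7                                   # chosen rms width Delta_x (arbitrary)
x = np.linspace(-20, 20, 4001)
hx = x[1] - x[0]
psi = (2 * np.pi * sx**2)**-0.25 * np.exp(-x**2 / (4 * sx**2))      # (3.37)

k = np.linspace(-5, 5, 801)
hk = k[1] - k[0]
# direct quadrature of the transform in the symmetric 1/sqrt(2*pi) convention
psi_t = np.array([np.sum(psi * np.exp(-1j * kk * x))
                  for kk in k]) * hx / np.sqrt(2 * np.pi)

def rms(u, w, hu):                         # rms width of the density |w|^2
    return np.sqrt(np.sum(u**2 * np.abs(w)**2) * hu)

sk = rms(k, psi_t, hk)
assert np.isclose(rms(x, psi, hx), sx, rtol=1e-3)   # Delta_x recovered
assert np.isclose(sx * sk, 0.5, rtol=1e-3)          # Delta_k Delta_x = 1/2
```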

Natural Sciences Tripos: IB Mathematical Methods I 55 © [email protected], Michaelmas 2022


Remark. A general property of Fourier transforms that follows from (3.42) is that the smaller the
variation in the original function (i.e. the smaller ∆x), the larger the variation in the transform (i.e.
the larger ∆k), and vice versa. In more prosaic language

    a sharp peak in x ⇔ a broad bulge in k,

and vice versa. This property has many applications, for instance
• a short pulse of electromagnetic radiation must contain many frequencies;
• a long pulse of electromagnetic radiation (i.e. many wavelengths) is necessary in order to
  obtain an approximately monochromatic signal.

3.4 Power spectra

The quantity
    Φ(k) = |f̃(k)|²                                                                     (3.44)
appearing in the Wiener–Khinchin theorem and Parseval’s theorem is the (power) spectrum or (power)
spectral density of the function f (x). The Wiener–Khinchin theorem states that the Fourier Transform of
the autocorrelation function is the power spectrum.
This concept is often used to quantify the spectral content (as a function of angular frequency ω) of a
signal f (t).
The spectrum of a perfectly periodic signal consists of a series of delta functions at the principal frequency
and its harmonics, if present. Its autocorrelation function does not decay as t → ∞.
White noise is an ideal random signal with autocorrelation function proportional to δ(t): the signal is
perfectly decorrelated. It therefore has a flat spectrum (Φ = constant).
Less idealized signals may have spectra that are peaked at certain frequencies but also contain a general
noise component.
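As an illustration (mine, not in the notes), the spectrum of a sinusoid plus white noise shows exactly this structure: a sharp line at the signal frequency sitting on an approximately flat noise floor. The sampling parameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4096
t = np.arange(n) / n                       # one record of unit duration
f0 = 200                                   # cycles per record (arbitrary)
signal = np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(n)

power = np.abs(np.fft.rfft(signal))**2     # one-sided power spectrum, cf. (3.44)
assert np.argmax(power) == f0              # spectral line at the signal frequency
```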




[Figure: spectra of three active galactic nuclei at different redshifts]

3.5 Solution of Ordinary Differential Equations using Fourier Transforms


Fourier transforms can be used as a method for solving differential equations. As an exemplar, we consider
a simple ordinary differential equation, but similar methods also work for partial differential equations.
Suppose that ψ(x) satisfies

    d²ψ/dx² − a²ψ = −f (x) ,                                                            (3.45)

where a is a constant and f is a known function. Suppose also that ψ satisfies the [two] boundary conditions
|ψ| → 0 as |x| → ∞.



If we multiply the left-hand side of (3.45) by (1/√(2π)) exp(−ikx) and integrate over x, then we obtain

    (1/√(2π)) ∫_{−∞}^{∞} e^{−ikx} (d²ψ/dx² − a²ψ) dx = F[d²ψ/dx²] − a² F(ψ)            from (3.1)
                                                      = −k² F(ψ) − a² F(ψ)             from (3.21b). (3.46a)

The same action on the right-hand side yields −F(f ). Hence from taking the Fourier transform of the whole
equation we have that

    −k² F(ψ) − a² F(ψ) = −F(f ) .                                                      (3.46b)

Rearranging this equation we have that

    F(ψ) = F(f ) / (k² + a²) ,                                                          (3.46c)

and so from the inverse transform (3.10b) we have the solution

    ψ = (1/√(2π)) ∫_{−∞}^{∞} e^{ikx} F(f )/(k² + a²) dk .                              (3.46d)

Remark. The boundary conditions that |ψ| → 0 as |x| → ∞ were implicitly used when we assumed that the
Fourier transform of ψ existed. Why?
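The recipe (3.46a)–(3.46d) translates directly into a computation; the sketch below (mine) uses an FFT on a large periodic grid as a proxy for the infinite line (the forcing f, decay rate a and grid are arbitrary choices), and then confirms that the resulting ψ satisfies (3.45).

```python
import numpy as np

a = 2.0
L, N = 40.0, 1024                       # a periodic box wide enough that f decays
x = np.linspace(-L/2, L/2, N, endpoint=False)
f = np.exp(-x**2)                       # a sample forcing, negligible at the ends

k = 2 * np.pi * np.fft.fftfreq(N, d=L/N)                  # discrete wavenumbers
psi = np.fft.ifft(np.fft.fft(f) / (k**2 + a**2)).real     # (3.46c)-(3.46d)

# verify the ODE (3.45): psi'' - a^2 psi = -f (spectral second derivative)
psi_xx = np.fft.ifft(-k**2 * np.fft.fft(psi)).real
assert np.allclose(psi_xx - a**2 * psi, -f, atol=1e-10)
```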



4 Partial Differential Equations
4.0 Why Study This?
Many scientific phenomena can be described by mathematical equations. Where there is variation in time
and space, or more than one spatial coordinate, the governing equations are partial differential equations
(PDEs).
Many, but not all, of these PDEs are linear and classical methods of analysis can be applied. The techniques
developed for linear equations are sometimes also useful in the study of nonlinear PDEs.

4.1 Nomenclature
Partial differential equations (PDEs) are equations relating one or more unknown functions, say ψ, (the
dependent variable[s]) of two or more independent variables, say x, y, z and t, with one or more of the
functions’ partial derivatives with respect to those variables. Hence a partial differential equation is an
equation of the form

    F(ψ, ∂ψ/∂x, ∂ψ/∂y, ∂²ψ/∂x², ∂²ψ/∂x∂y, ∂²ψ/∂y², . . . , x, y) = 0                  (4.1a)

involving ψ and any of its derivatives evaluated at the same point, e.g. Schrödinger’s equation (1.55b)
for the quantum mechanical wave function ψ(x, y, z, t) of a non-relativistic particle:

    −(ℏ²/2m) (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) ψ + V (r)ψ = iℏ ∂ψ/∂t .                        (4.1b)
Order. The order of the highest derivative determines the order of the differential equation. Hence (4.1b)
    is a second-order equation.
Linearity. If the system of differential equations is of the first degree in the dependent variables, then the
system is said to be linear, i.e. (4.1a) is linear if F depends linearly on ψ and its derivatives. Hence
Schrödinger’s equation is linear; however Euler’s equation for an inviscid fluid,
 
    ρ (∂u/∂t + (u · ∇)u) = −∇p ,                                                       (4.2)

where u is the velocity, ρ is the density and p is the pressure, is nonlinear in u.

4.1.1 Linear Second-Order Partial Differential Equations


The most general linear second-order partial differential equation in two independent variables is

    Lψ(x, y) = g(x, y) ,                                                               (4.3a)

where L is a differential operator such that

    Lψ ≡ a(x, y) ∂²ψ/∂x² + b(x, y) ∂²ψ/∂x∂y + c(x, y) ∂²ψ/∂y² + d(x, y) ∂ψ/∂x + e(x, y) ∂ψ/∂y + f (x, y)ψ .    (4.3b)

Remarks.
(i) If g = 0 the equation is said to be homogeneous.
(ii) We will concentrate on examples where the coefficients, a, b, c, d, e and f are independent of x
and y, in which case the equation is said to have constant coefficients.
(iii) These ideas can be generalized to more than two independent variables (e.g. Schrödinger’s equa-
tion (4.1b) has four independent variables), or to systems of PDEs with more than one dependent
variable.
L is a linear operator. L is a linear operator since

L(αψ + βφ) = αLψ + βLφ , (4.4)

where ψ and φ are any functions of x and y, and α and β are any constants.



Principle of superposition (again).
(i) If ψ and φ satisfy the homogeneous equation, i.e. Lψ = Lφ = 0, then αψ + βφ also satisfies the
homogeneous equation.
(ii) If the particular integral ψp satisfies the inhomogeneous equation Lψ = g and the complementary
function ψc satisfies the homogeneous equation Lψ = 0, then ψp + ψc satisfies the inhomogeneous
equation:
L(ψp + ψc ) = Lψp + Lψc = g + 0 = g (4.5)

4.2 Physical Examples and Applications


4.2.1 Waves on a Violin String
Consider small displacements on a stretched elastic string of density ρ per unit length (when not
displaced). Assume that all displacements y(x, t) are vertical (this is a bit of a cheat), and resolve
horizontally and vertically to obtain respectively

    T₂ cos θ₂ = T₁ cos θ₁ ,                                                            (4.6a)

    (ρ dx) ∂²y/∂t² = T₂ sin θ₂ − T₁ sin θ₁
                   = T₂ cos θ₂ (tan θ₂ − tan θ₁) .                                     (4.6b)

In the light of (4.6a) let

    T = Tⱼ cos θⱼ        (j = 1, 2) ,                                                  (4.7a)

and observe that

    tan θ = ∂y/∂x .                                                                    (4.7b)

Then from (4.6b) it follows after use of Taylor’s theorem that

    ρ dx ∂²y/∂t² = T (tan θ₂ − tan θ₁)
                 = T ( ∂y/∂x (x + dx, t) − ∂y/∂x (x, t) )
                 = T ∂²y/∂x² dx + . . . ,                                              (4.8a)

and hence, in the infinitesimal limit, that

    ∂²y/∂t² = (T /ρ) ∂²y/∂x² .                                                         (4.8b)

This is the wave equation with wavespeed c = √(T /ρ). In general the one-dimensional wave equation is

    ∂²y/∂t² = c² ∂²y/∂x² .                                                             (4.8c)

Typical physical constants. For a violin (D-)string: T ≈ 40 N, and ρ ≈ 1 g m⁻¹ so c ≈ 200 m s⁻¹.

4.2.2 Electromagnetic Waves (Unlectured)


The theory of electromagnetism is based on Maxwell’s equations. These relate the electric field E, the
magnetic field B, the charge density ρ and the current density J:

    ∇ · E = ρ/ϵ₀ ,                                                                     (4.9a)
    ∇ × E = −∂B/∂t ,                                                                   (4.9b)
    ∇ × B = µ₀J + (1/c²) ∂E/∂t ,                                                       (4.9c)
    ∇ · B = 0 ,                                                                        (4.9d)



where ϵ₀ is the dielectric constant, µ₀ is the magnetic permeability, and c² = (µ₀ϵ₀)⁻¹ is the speed of light
(c ≈ 3 × 10⁸ m s⁻¹). If there is no charge or current (i.e. ρ = 0 and J = 0), then from (4.9a), (4.9b), (4.9c)
and the vector identity (1.56a):

    (1/c²) ∂²E/∂t² = ∇ × ∂B/∂t          using (4.9c) with J = 0
                   = −∇ × (∇ × E)       using (4.9b)
                   = ∇²E − ∇(∇ · E)     using identity (1.56a)
                   = ∇²E .              using (4.9a) with ρ = 0                         (4.10a)

We have therefore recovered the three-dimensional wave equation (cf. (4.8c))

    ∂²E/∂t² = c² (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) E .                                        (4.10b)

Remarks.
(i) B obeys the same equation.
(ii) The pressure perturbation of a sound wave satisfies the scalar equivalent of this equation, where
     c ≈ 300 m s⁻¹ equals the speed of sound.

4.2.3 Electrostatic Fields


Suppose instead a steady electric field is generated by a known charge density ρ. Then from the second of
Maxwell’s equations (4.9b)
∇ × E = 0, (4.11)
which implies from (1.51) that there exists an electric potential φ such that

E = −∇φ . (4.12)

It then follows from the first of Maxwell’s equations, (4.9a), that φ satisfies Poisson’s equation
ρ
∇2 φ = − , (4.13a)
ϵ0
i.e.
    (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) φ = −ρ/ϵ₀ .                                             (4.13b)

Remark. The vector field E (and the vector field g below) is said to be generated by the potential φ. A
scalar potential is easier to work with because it does not have multiple components and its value
is independent of the coordinate system. The potential is also directly related to the energy of the
system.

4.2.4 Gravitational Fields (Unlectured)


A Newtonian gravitational field g satisfies

∇ · g = −4πGρ , (4.14a)

and
∇ × g = 0, (4.14b)

where G is the gravitational constant and ρ is mass density. From the latter equation and (1.51) it follows
that there exists a gravitational potential φ such that

g = −∇φ . (4.15)

Thence from (4.14a) we deduce that the gravitational potential satisfies Poisson’s equation

∇2 φ = 4πGρ . (4.16)

Remark. Electrostatic and gravitational fields are similar!



4.2.5 Diffusion of a Passive Tracer
Suppose we want to describe how an inert chemical diffuses through a solid or stationary fluid.19
Denote the mass concentration of the dissolved chemical per unit volume by C(r, t), and the material flux
vector of the chemical by q(r, t). Then the amount of chemical crossing a small surface dS in time δt is

    local flux = (q · dS) δt .                                                         (4.17a)

Hence the flux of chemical out of a closed surface S enclosing a volume V in time δt is

    surface flux = ( ∯_S q · dS ) δt .                                                 (4.17b)

Let Q(r, t) denote any chemical mass source per unit time per unit volume of the media. Then if the change
of chemical within the volume is to be equal to the flux of the chemical out of the surface in time δt

    ( ∯_S q · dS ) δt = −( d/dt ∭_V C dV ) δt + ( ∭_V Q dV ) δt .                      (4.18a)

Hence using the divergence theorem (1.61), and exchanging the order of differentiation and integration,

    ∭_V ( ∇ · q + ∂C/∂t − Q ) dV = 0 .                                                 (4.18b)

But this is true for any volume, and so

    ∂C/∂t = −∇ · q + Q .                                                               (4.19)

The simplest empirical law relating concentration flux to concentration gradient is Fick’s law

    q = −D∇C ,                                                                         (4.20)

where D is the diffusion coefficient; the negative sign is necessary if chemical is to flow from high to low
concentrations. If D is constant then the partial differential equation governing the concentration is

    ∂C/∂t = D∇²C + Q .                                                                 (4.21)

Diffusion Equation. If there is no chemical source then Q = 0, and the governing equation becomes the
    diffusion equation
        ∂C/∂t = D∇²C .                                                                 (4.22)
Poisson’s Equation. If the system has reached a steady state (i.e. ∂/∂t ≡ 0), then with f (r) = Q(r)/D the
    governing equation is Poisson’s equation
        ∇²C = −f .                                                                     (4.23)

Laplace’s Equation. If the system has reached a steady state and there are no chemical sources then the
concentration is governed by Laplace’s equation

∇2 C = 0 . (4.24)

19 Reacting chemicals and moving fluids are slightly more tricky.



4.2.6 Heat Flow (Unlectured)

What governs the flow of heat in a saucepan, an engine block, the earth’s core, etc.? Can we write down
an equation?
Let q(r, t) denote the flux vector for heat flow. Then the energy in the form of heat (molecular vibrations)
flowing out of a closed surface S enclosing a volume V in time δt is again (4.17b). Also, let
    E(r, t) denote the internal energy per unit mass of the solid,
    Q(r, t) denote any heat source per unit time per unit volume of the solid,
    ρ(r, t) denote the mass density of the solid (assumed constant here).
The flow of heat in/out of S must balance the change in internal energy and the heat source over, say, a
time δt (cf. (4.18a))

    ( ∯_S q · dS ) δt = −( d/dt ∭_V ρE dV ) δt + ( ∭_V Q dV ) δt .

For ‘slow’ changes at constant pressure (1st and 2nd law of thermodynamics)

    E(r, t) = c_p θ(r, t) ,                                                            (4.25)

where θ is the temperature and c_p is the specific heat (assumed constant here). Hence using the divergence
theorem (1.61), and exchanging the order of differentiation and integration (cf. (4.18b)),

    ∭_V ( ∇ · q + ρc_p ∂θ/∂t − Q ) dV = 0 .                                            (4.26)

But this is true for any volume, hence

    ρc_p ∂θ/∂t = −∇ · q + Q .                                                          (4.27)

Experience tells us heat flows from hot to cold. The simplest empirical law relating heat flow to temperature
gradient is Fourier’s law (cf. Fick’s law (4.20))

    q = −k∇θ ,                                                                         (4.28)

where k is the heat conductivity. If k is constant then the partial differential equation governing the
temperature is (cf. (4.21))

    ∂θ/∂t = ν∇²θ + Q/(ρc_p) ,                                                          (4.29)

where ν = k/(ρc_p) is the diffusivity (or coefficient of diffusion).
4.2.7 Other Equations
There are numerous other partial differential equations describing scientific, and non-scientific, phenomena.
One equation that you might have heard a lot about is the Black–Scholes equation for call option pricing

    ∂w/∂t = rw − rx ∂w/∂x − ½v²x² ∂²w/∂x² ,                                            (4.30)

where w(x, t) is the price of the call option of the stock, x is the variable market price of the stock, t is time,
r is a fixed interest rate and v² is the variance rate of the stock price.
Also, despite the impression given above where all the equations except (4.2) are linear, many of the most
interesting scientific (and non-scientific) equations are nonlinear. For instance the nonlinear Schrödinger
equation

    i ∂A/∂t + ∂²A/∂x² = A|A|² ,                                                        (4.31)

where i is the square root of −1, admits soliton solutions (which is one of the reasons that optical fibres
work).



4.3 Separation of Variables

You may have already met the general idea of ‘separability’ when solving ordinary differential equations,
e.g. when you studied separable equations, i.e. the special differential equations that can be written in the
form

    X(x) dx = Y (y) dy .
    (function of x)   (function of y)

Sometimes functions can be written in separable form. For instance,

    f (x, y) = cos x exp y = X(x)Y (y) ,        where X = cos x and Y = exp y ,        (4.33)

is separable in Cartesian coordinates, while

    g(x, y, z) = 1/(x² + y² + z²)^{1/2}                                                (4.34)

is not separable in Cartesian coordinates, but is separable in spherical polar coordinates since

    g = R(r)Θ(θ)Φ(ϕ)        where R = 1/r , Θ = 1 and Φ = 1 .                          (4.35)

Solutions to partial differential equations can sometimes be found by seeking solutions that can be written
in separable form, e.g.

    Time & 1D Cartesians:    y(x, t) = X(x)T (t) ,                                     (4.36a)
    2D Cartesians:           ψ(x, y) = X(x)Y (y) ,                                     (4.36b)
    3D Cartesians:           ψ(x, y, z) = X(x)Y (y)Z(z) ,                              (4.36c)
    Cylindrical Polars:      ψ(ρ, ϕ, z) = R(ρ)Φ(ϕ)Z(z) ,                               (4.36d)
    Spherical Polars:        ψ(r, θ, ϕ) = R(r)Θ(θ)Φ(ϕ) .                               (4.36e)

However, we emphasise that not all solutions of partial differential equations can be written in this form.

4.4 The One-Dimensional Wave Equation

4.4.1 Separable Solutions

Seek solutions y(x, t) to the one dimensional wave equation (4.8c), i.e.

    ∂²y/∂t² = c² ∂²y/∂x² ,                                                             (4.37a)

of the form

    y(x, t) = X(x)T (t) .                                                              (4.37b)

On substituting (4.37b) into (4.37a) we obtain

    X T̈ = c² T X′′ ,                                                                   (4.38)

where a ˙ and a ′ denote differentiation with respect to t and x respectively. After rearrangement we have
that

    (1/c²) T̈ (t)/T (t) = X′′(x)/X(x) = λ ,                                             (4.39a)
    (function of t)      (function of x)

where λ is a constant (the only function of t that equals a function of x). We have therefore split the PDE
into two ODEs:

    T̈ − c²λT = 0        and        X′′ − λX = 0 .                                      (4.39b)

There are three cases to consider.



λ = 0. In this case
        T̈ (t) = X′′(x) = 0    ⇒    T = A₀ + B₀t  and  X = C₀ + D₀x ,                   (4.40a)
    where A₀, B₀, C₀ and D₀ are constants, i.e.

        y = (A₀ + B₀t)(C₀ + D₀x) .                                                     (4.40b)

λ = σ² > 0. In this case

        T̈ − σ²c²T = 0        and        X′′ − σ²X = 0 .                                (4.40c)
    Hence
        T = A_σ e^{σct} + B_σ e^{−σct}  and  X = C_σ cosh σx + D_σ sinh σx ,           (4.40d)
    where A_σ, B_σ, C_σ and D_σ are constants, i.e.

        y = (A_σ e^{σct} + B_σ e^{−σct})(C_σ cosh σx + D_σ sinh σx) .                  (4.40e)

    Alternatively we could express this as

        y = (Ã_σ cosh σct + B̃_σ sinh σct)(C̃_σ e^{σx} + D̃_σ e^{−σx}) ,   or as . . .

    where Ã_σ, B̃_σ, C̃_σ and D̃_σ are constants.

λ = −k² < 0. In this case
        T̈ + k²c²T = 0        and        X′′ + k²X = 0 .                                (4.40f)
    Hence
        T = A_k cos kct + B_k sin kct  and  X = C_k cos kx + D_k sin kx ,              (4.40g)
    where A_k, B_k, C_k and D_k are constants, i.e.

        y = (A_k cos kct + B_k sin kct)(C_k cos kx + D_k sin kx) .                     (4.40h)

Remark. Without loss of generality we could also impose a normalisation condition, say, C_j² + D_j² = 1.

4.4.2 Boundary and Initial Conditions

Solutions (4.40b), (4.40e) and (4.40h) represent three families of solutions.20 Although they are based on a
special assumption, we shall see that because the wave equation is linear they can represent a wide range
of solutions by means of superposition. However, before going further it is helpful to remember that when
solving a physical problem boundary and initial conditions are also needed.

Boundary Conditions. Suppose that the string considered in § 4.2.1 has ends at x = 0 and x = L that are
fixed; appropriate boundary conditions are then

y(0, t) = 0 and y(L, t) = 0 . (4.41)

It is no coincidence that there are boundary conditions at two values of x and the highest derivative
in x is second order.
Initial Conditions. Suppose also that the initial displacement and initial velocity of the string are known;
    appropriate initial conditions are then

        y(x, 0) = d(x)        and        ∂y/∂t (x, 0) = v(x) .                         (4.42)

    Again it is no coincidence that we need two initial conditions and the highest derivative in t is second
    order.

We shall see that the boundary conditions restrict the choice of λ.

20 Or arguably one family if you wish to nit pick in the complex plane.



4.4.3 Solution
Consider the cases λ = 0, λ < 0 and λ > 0 in turn. These constitute an uncountably infinite number of
solutions; our aim is to end up with a countably infinite number of solutions by elimination.
λ = 0. If the homogeneous, i.e. zero, boundary conditions (4.41) are to be satisfied for all time, then in
(4.40b) we must have that C0 = D0 = 0.
λ > 0. Again if the boundary conditions (4.41) are to be satisfied for all time, then in (4.40e) we must have
that Cσ = Dσ = 0.
λ < 0. Applying the boundary conditions (4.41) to (4.40h) yields
        C_k = 0        and        D_k sin kL = 0 .                                     (4.43)
    If D_k = 0 then the entire solution is trivial (i.e. zero), so the only useful solution has

        sin kL = 0    ⇒    k = nπ/L ,                                                  (4.44)

    where n is a non-zero integer. These special values of k are eigenvalues and the corresponding eigen-
    functions, or normal modes, are
        X_n = D_{nπ/L} sin (nπx/L) .                                                   (4.45)

Hence, from (4.40h), solutions to (4.8c) that satisfy the boundary condition (4.41) are

    y_n(x, t) = (A_n cos (nπct/L) + B_n sin (nπct/L)) sin (nπx/L) ,                    (4.46)

where we have written A_n for A_{nπ/L} D_{nπ/L} and B_n for B_{nπ/L} D_{nπ/L}. Since (4.8c) is linear we can
superimpose (i.e. add) solutions to get the general solution

    y(x, t) = Σ_{n=1}^{∞} (A_n cos (nπct/L) + B_n sin (nπct/L)) sin (nπx/L) ,          (4.47)

where there is no need to run the sum from −∞ to ∞ because of the symmetry properties of sin and cos.
We note that when the solution is viewed as a function of x at fixed t, or as a function of t at fixed x, then
it has the form of a Fourier series.
The solution (4.47) satisfies the boundary conditions (4.41) by construction. The only thing left to do is to
satisfy the initial conditions (4.42), i.e. we require that

    y(x, 0) = d(x) = Σ_{n=1}^{∞} A_n sin (nπx/L) ,                                     (4.48a)

    ∂y/∂t (x, 0) = v(x) = Σ_{n=1}^{∞} B_n (nπc/L) sin (nπx/L) .                        (4.48b)

A_n and B_n can now be found using the orthogonality relations for sin (see (0.12a)), i.e.

    ∫₀^L sin (nπx/L) sin (mπx/L) dx = (L/2) δ_{nm} .                                   (4.49)

Hence for an integer m > 0

    (2/L) ∫₀^L d(x) sin (mπx/L) dx = (2/L) ∫₀^L ( Σ_{n=1}^{∞} A_n sin (nπx/L) ) sin (mπx/L) dx

                                   = Σ_{n=1}^{∞} (2A_n/L) ∫₀^L sin (nπx/L) sin (mπx/L) dx

                                   = Σ_{n=1}^{∞} (2A_n/L) (L/2) δ_{nm}                 using (4.49)

                                   = A_m ,                                             using (1.14c) (4.50a)

or alternatively invoke standard results for the coefficients of Fourier series. Similarly

    B_m = (2/(mπc)) ∫₀^L v(x) sin (mπx/L) dx .                                         (4.50b)
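The whole construction (4.44)–(4.50b) can be exercised numerically; the sketch below (mine) takes a triangular initial pluck d(x) with v(x) = 0 (so B_n = 0), computes the A_n of (4.50a) by quadrature, and checks that a partial sum of (4.47) reproduces the initial shape and inverts the string after half the fundamental period.

```python
import numpy as np

L, c = 1.0, 200.0                 # string length and wavespeed (D-string values)
x = np.linspace(0, L, 2001)
dx = x[1] - x[0]
d = np.where(x < 0.5 * L, x, L - x)        # triangular initial pluck; v(x) = 0

def A(n):                         # coefficient (4.50a) by trapezoidal quadrature
    g = d * np.sin(n * np.pi * x / L)
    return (2.0 / L) * (np.sum(g) - 0.5 * (g[0] + g[-1])) * dx

def y(t, n_max=501):              # partial sum of (4.47); B_n = 0 since v = 0
    return sum(A(n) * np.cos(n * np.pi * c * t / L) * np.sin(n * np.pi * x / L)
               for n in range(1, n_max))

assert np.allclose(y(0.0), d, atol=2e-3)           # initial shape (4.48a)
assert np.allclose(y(L / c), -d[::-1], atol=2e-3)  # inverted after half a period
```

(Here d is symmetric about the midpoint, so the reversed profile d[::-1] coincides with d itself.)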
4.5 Poisson’s Equation
Suppose we are interested in obtaining solutions to Poisson’s equation

    ∇²θ = −f ,                                                                         (4.51a)

where, say, θ is a steady temperature distribution and f = Q/(ρc_p ν) is a scaled heat source (see (4.29)).
For simplicity let the world be two-dimensional, then (4.51a) becomes

    (∂²/∂x² + ∂²/∂y²) θ = −f .                                                         (4.51b)

Suppose we seek a separable solution as before, i.e. θ(x, y) = X(x)Y (y). Then on substituting into (4.51b)
we obtain

    X′′/X = −Y′′/Y − f /(XY ) .                                                        (4.52)

It follows that unless we are very fortunate, and f (x, y) has a particular form (e.g. f = 0), it does not look
like we will be able to find separable solutions.
In order to make progress the trick is to first find a[ny] particular solution, θ_s, to (4.51b) (cf. finding a
particular solution when solving constant coefficient ODEs last year). The function φ = θ − θ_s then satisfies
Laplace’s equation

    ∇²φ = 0 .                                                                          (4.53)

This is just Poisson’s equation with f = 0, for which we have just noted that separable solutions exist. To
obtain the full solution we need to add these [countably infinite] separable solutions to our particular solution
(cf. adding complementary functions to a particular solution when solving constant coefficient ODEs last
year).
4.5.1 A Particular Solution
We will illustrate the method by considering the particular example where the heating f is uniform, f = 1
wlog (since the equation is linear), in a semi-infinite rod, 0 ⩽ x, of unit width, 0 ⩽ y ⩽ 1.
In order to find a particular solution suppose for the moment that the rod is infinite (or alternatively
consider the solution for x ≫ 1 for a semi-infinite rod, when the rod might look ‘infinite’ from a local
viewpoint).
Then we might expect the particular solution for the temperature θ_s to be independent of x, i.e. θ_s ≡ θ_s(y).
Poisson’s equation (4.51b) then reduces to

    d²θ_s/dy² = −1 ,                                                                   (4.54a)

which has solution

    θ_s = a₀ + b₀y − ½y² ,                                                             (4.54b)

where a₀ and b₀ are constants.
4.5.2 Boundary Conditions
For the rod problem, experience suggests that we need to specify one of the following at all points on the
boundary of the rod:
• the temperature (a Dirichlet condition), i.e.
θ = g(r) , (4.55a)
where g(r) is a known function;
• the scaled heat flux (a Neumann condition), i.e.

    ∂θ/∂n ≡ n̂ · ∇θ = h(r) ,                                                           (4.55b)

  where h(r) is a known function;

• a mixed condition, i.e.

    α(r) ∂θ/∂n + β(r)θ = d(r) ,                                                        (4.55c)

  where α(r), β(r) and d(r) are known functions, and α(r) and β(r) are not simultaneously zero.
For our rod let us consider the boundary conditions

    θ(x, 0) = 0 ,    θ(x, 1) = 0 ,    0 ⩽ x < ∞ ,                                      (4.56a)

    θ(0, y) = 0 ,    ∂θ/∂x (x, y) → 0 as x → ∞ ,    0 ⩽ y ⩽ 1 .                        (4.56b)

For these conditions it is appropriate to take a₀ = 0 and b₀ = ½ in (4.54b) so that

    θ_s = ½y(1 − y) ⩾ 0 .                                                              (4.57)

Let φ = θ − θ_s, then φ satisfies Laplace’s equation (4.53) and, from (4.56a), (4.56b) and (4.57), the
boundary conditions

    φ(x, 0) = 0 ,    φ(x, 1) = 0 ,    0 ⩽ x < ∞ ,                                      (4.58a)

    φ(0, y) = −½y(1 − y) ,    ∂φ/∂x (x, y) → 0 as x → ∞ ,    0 ⩽ y ⩽ 1 .               (4.58b)

4.5.3 Separable Solutions

On writing φ(x, y) = X(x)Y (y) and substituting into Laplace’s equation (4.53) it follows that (cf. (4.52))

    X′′(x)/X(x) = −Y′′(y)/Y (y) = λ ,                                                  (4.59a)

so that

    X′′ − λX = 0        and        Y′′ + λY = 0 .                                      (4.59b)

We can now consider each of the possibilities λ = 0, λ > 0 and λ < 0 in turn to obtain, cf. (4.40b), (4.40e)
and (4.40h),

λ = 0.
    φ = (A₀ + B₀x)(C₀ + D₀y) .                                                         (4.60a)

λ = σ² > 0.
    φ = (A_σ e^{σx} + B_σ e^{−σx})(C_σ cos σy + D_σ sin σy) .                          (4.60b)

λ = −k² < 0.
    φ = (A_k cos kx + B_k sin kx)(C_k cosh ky + D_k sinh ky) .                         (4.60c)

The boundary conditions at y = 0 and y = 1 in (4.58a) state that φ(x, 0) = 0 and φ(x, 1) = 0. This implies
(cf. the stretched string problem) that solutions proportional to sin(nπy) are appropriate; hence we try
λ = n²π² where n is an integer. The eigenfunctions are thus

    φ_n = (A_n e^{nπx} + B_n e^{−nπx}) sin(nπy) ,                                      (4.61)

where A_n and B_n are constants and, without loss of generality, n > 0. However, if the boundary condition
in (4.58b) as x → ∞ is to be satisfied then A_n = 0. Hence the solution has the form

    φ = Σ_{n=1}^{∞} B_n e^{−nπx} sin(nπy) .                                            (4.62)

The B_n are fixed by the first boundary condition in (4.58b), i.e. we require that

    −½y(1 − y) = Σ_{n=1}^{∞} B_n sin(nπy) .                                            (4.63a)



Using the orthogonality relations (4.49) it follows that

    B_m = 2((−1)^m − 1)/(m³π³) ,                                                       (4.63b)

and hence that

    θ = ½y(1 − y) − Σ_{ℓ=0}^{∞} (4/(π³(2ℓ + 1)³)) sin((2ℓ + 1)πy) e^{−(2ℓ+1)πx} ,      (4.64a)

or equivalently

    θ = Σ_{ℓ=0}^{∞} (4/(π³(2ℓ + 1)³)) sin((2ℓ + 1)πy) (1 − e^{−(2ℓ+1)πx}) .            (4.64b)
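As a numerical check (mine), a partial sum of (4.64a) can be differentiated by finite differences to confirm that it satisfies ∇²θ = −1 in the interior, along with the boundary behaviour; the interior point, step and term count below are arbitrary choices.

```python
import math

def theta(x, y, n_terms=200):
    # partial sum of the series solution (4.64a)
    return 0.5 * y * (1.0 - y) - sum(
        4.0 / (math.pi**3 * m**3) * math.sin(m * math.pi * y)
        * math.exp(-m * math.pi * x)
        for m in range(1, 2 * n_terms, 2))

h = 1e-3
x0, y0 = 0.7, 0.4                           # an arbitrary interior point
lap = (theta(x0 + h, y0) + theta(x0 - h, y0) + theta(x0, y0 + h)
       + theta(x0, y0 - h) - 4.0 * theta(x0, y0)) / h**2
assert abs(lap + 1.0) < 1e-4                # Poisson's equation with f = 1
assert theta(x0, 0.0) == 0.0                # boundary condition theta(x, 0) = 0
assert abs(theta(0.0, y0)) < 1e-5           # theta(0, y) = 0, up to truncation
assert abs(theta(50.0, y0) - 0.5 * y0 * (1 - y0)) < 1e-12   # theta -> theta_s far away
```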

4.6 The Diffusion Equation

4.6.1 Separable Solutions

Seek solutions C(x, t) to the one dimensional version of the diffusion equation, (4.22), i.e.

    ∂C/∂t = D ∂²C/∂x² ,                                                                (4.65)

of the form

    C(x, t) = X(x)T (t) .                                                              (4.66)

On substitution we obtain

    X Ṫ = D T X′′ .                                                                    (4.67)

After rearrangement we have that

    (1/D) Ṫ (t)/T (t) = X′′(x)/X(x) = λ ,                                              (4.68a)
    (function of t)     (function of x)

where λ is again a constant. We have therefore split the PDE into two ODEs:

    Ṫ − DλT = 0        and        X′′ − λX = 0 .                                       (4.68b)

There are again three cases to consider.

λ = 0. In this case
        Ṫ (t) = X′′(x) = 0    ⇒    T = α₀  and  X = β₀ + γ₀x ,                         (4.69a)
    where α₀, β₀ and γ₀ are constants. Combining these results we obtain
        C = α₀(β₀ + γ₀x) ,
    or
        C = β₀ + γ₀x ,                                                                 (4.69b)
    since, without loss of generality (wlog), we can take α₀ = 1.

λ = σ² > 0. In this case
        Ṫ − Dσ²T = 0        and        X′′ − σ²X = 0 .                                 (4.69c)
    Hence
        T = α_σ exp(Dσ²t)  and  X = β_σ cosh σx + γ_σ sinh σx ,                        (4.69d)
    where α_σ, β_σ and γ_σ are constants. On taking α_σ = 1 wlog,
        C = exp(Dσ²t)(β_σ cosh σx + γ_σ sinh σx) .                                     (4.69e)

λ = −k² < 0. In this case
        Ṫ + Dk²T = 0        and        X′′ + k²X = 0 .                                 (4.69f)
    Hence
        T = α_k exp(−Dk²t)  and  X = β_k cos kx + γ_k sin kx ,                         (4.69g)
    where α_k, β_k and γ_k are constants. On taking α_k = 1 wlog,
        C = exp(−Dk²t)(β_k cos kx + γ_k sin kx) .                                      (4.69h)



4.6.2 Boundary and Initial Conditions
Consider the problem of a solvent occupying the region between x = 0 and x = L. Suppose that at t = 0
there is no chemical in the solvent, i.e. the initial condition is

    C(x, 0) = 0 .                                                                      (4.70a)

Note that here we specify one initial condition based on the observation that the highest derivative in t
in (4.21) is first order.

Suppose also that for t > 0 the concentration of the chemical is maintained at C0 at x = 0, and is 0 at
x = L, i.e.
C(0, t) = C0 and C(L, t) = 0 for t > 0 . (4.70b)

This is a specific individual’s copy of the notes. It is not to be copied and/or redistributed.
Again it is no coincidence that there are two boundary conditions and that the highest derivative in x is second order.
Remark. Equation (4.21) and conditions (4.70a) and (4.70b) are mathematically equivalent to a description
of the temperature of a rod of length L which is initially at zero temperature before one of the ends
is raised instantaneously to a constant non-dimensional temperature of C0 .

4.6.3 Solution
The trick here is to note that
• the inhomogeneous (i.e. non-zero) boundary condition at x = 0, i.e. C(0, t) = C0 , is steady, and
• the separable solutions (4.69e) and (4.69h) depend on time, while (4.69b) does not.
It therefore seems sensible to try to satisfy the boundary conditions (4.70b) using the solution (4.69b).
If we call this part of the total solution C∞ (x) then, with β0 = C0 and γ0 = −C0 /L in (4.69b),
C∞(x) = C0 (1 − x/L) , (4.71)
which is just a linear variation in C from C0 at x = 0 to 0 at x = L. Write

C(x, t) = C∞(x) + C̃(x, t) , (4.72)

where C̃ is a sum of the separable time-dependent solutions (4.69e) and (4.69h). Then from the initial condition (4.70a), the boundary conditions (4.70b), and the steady solution (4.71), it follows that

C̃(x, 0) = −C0 (1 − x/L) , (4.73a)

and

C̃(0, t) = 0 and C̃(L, t) = 0 for t > 0 . (4.73b)

If the homogeneous boundary conditions (4.73b) are to be satisfied then, as for the wave equation, separable
solutions with λ > 0 are unacceptable, while λ = −k² < 0 is only acceptable if

βk = 0 and γk sin kL = 0 . (4.74a)

It follows that if the solution is to be non-trivial then

k = nπ/L (n = 1, 2, 3, . . . ) . (4.74b)

The eigenfunctions corresponding to (4.74b) are

Xn = Γn sin(nπx/L) , (4.74c)
where Γn = γ_{nπ/L} . Again, because (4.21) is a linear equation, we can add individual solutions to get the general solution

C̃(x, t) = ∑_{n=1}^{∞} Γn exp(−n²π²Dt/L²) sin(nπx/L) . (4.75)



The Γn are fixed by the initial condition (4.73a):

−C0 (1 − x/L) = ∑_{n=1}^{∞} Γn sin(nπx/L) . (4.76a)

Hence

Γm = −(2C0/L) ∫₀ᴸ (1 − x/L) sin(mπx/L) dx = −2C0/(mπ) . (4.76b)

The solution is thus given by

C = C0 (1 − x/L) − ∑_{n=1}^{∞} (2C0/(nπ)) exp(−n²π²Dt/L²) sin(nπx/L) , (4.77a)

or, on using (4.76a),

C = ∑_{n=1}^{∞} (2C0/(nπ)) (1 − exp(−n²π²Dt/L²)) sin(nπx/L) . (4.77b)

[Figure: the solution (4.77b) with C0 = 1 and L = 1, plotted at times t = 0.0001, t = 0.001, t = 0.01, t = 0.1 and t = 1 (curves from left to right respectively).]

Paradox. sin(nπx/L) is not a separable solution of the diffusion equation.
Remark. As t → ∞ in (4.77a),

C → C0 (1 − x/L) = C∞(x) . (4.78)

Remark. Solution (4.77b) is odd and has period 2L. We are in effect solving the 2L-periodic diffusion problem where C is initially zero. Then, at t = 0+, C is raised to +1 at 2nL+ and lowered to −1 at 2nL− (for integer n), and kept zero everywhere else.
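As an illustrative numerical check of the series solution (4.77b), the truncated sum can be evaluated directly; a sketch in Python/numpy, where the truncation level nmax, the grid and the test times are arbitrary illustrative choices:

```python
import numpy as np

def C_series(x, t, C0=1.0, L=1.0, D=1.0, nmax=2000):
    """Truncated series solution (4.77b) of the diffusion problem
    C(x,0) = 0, C(0,t) = C0, C(L,t) = 0."""
    n = np.arange(1, nmax + 1)
    decay = 1.0 - np.exp(-(n * np.pi / L) ** 2 * D * t)
    modes = np.sin(np.outer(np.atleast_1d(x), n) * np.pi / L)
    return modes @ (2.0 * C0 / (n * np.pi) * decay)

x = np.linspace(0.05, 0.95, 19)   # interior points only: the series is
                                  # discontinuous at x = 0 for t > 0
# Just after the wall concentration is raised, the interior is still empty.
assert np.allclose(C_series(x, 1e-6), 0.0, atol=1e-2)
# For large t the solution tends to the steady state C_inf = C0 (1 - x/L).
assert np.allclose(C_series(x, 10.0), 1.0 - x, atol=1e-2)
```

The tolerances are loose because the truncated sine series converges slowly near the discontinuity at x = 0.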
4.6.4 A Rough and Ready Outline Recipe
(i) In the case of an inhomogeneous equation, use the principle of superposition to seek a particular
solution to reduce the equation to one that is homogeneous.
(ii) Seek separable solutions to the homogeneous equation.
(iii) In the case of inhomogeneous boundary conditions consider seeking a [separable] solution to reduce
the boundary conditions to ones that are homogeneous.
(iv) Use the boundary conditions to rule out certain of the separable solutions and to identify eigenvalues.
(v) Using the principle of superposition, seek a solution that is a sum of eigenfunctions.
(vi) Determine unknown constants using the boundary conditions.



4.7 Solution Using Fourier Transforms (Non-examinable & Unlectured)
4.7.1 The diffusion equation as an exemplar

Consider the diffusion equation (see (4.22) or (4.29)) governing the evolution of, say, temperature, θ(x, t):
∂θ ∂2θ
=ν 2. (4.79)
∂t ∂x
In § 4.6 we have seen how separable solutions and Fourier series can be used to solve (4.79) over finite
x-intervals. Fourier transforms can be used to solve (4.79) when the range of x is infinite.21
We will assume boundary conditions such as
θ → constant and ∂θ/∂x → 0 as |x| → ∞ , (4.80)

so that the Fourier transform of θ exists (at least in a generalised sense):

θ̃(k, t) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} θ dx . (4.81)
If we then multiply the left-hand side of (4.79) by (1/√(2π)) exp(−ıkx) and integrate over x we obtain the time derivative of θ̃:

(1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} (∂θ/∂t) dx = (∂/∂t) [(1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} θ dx]   swap differentiation and integration
                                         = ∂θ̃/∂t   from (4.81).

A similar manipulation of the right-hand side of (4.79) yields

(1/√(2π)) ∫_{−∞}^{∞} e^{−ıkx} ν (∂²θ/∂x²) dx = −νk² θ̃   from (3.21b).

Putting the left-hand side and the right-hand side together, it follows that θ̃(k, t) satisfies

∂θ̃/∂t + νk² θ̃ = 0 . (4.82a)

This equation has solution

θ̃(k, t) = γ(k) exp(−νk²t) , (4.82b)
where γ(k) is an unknown function of k (cf. the Γn in (4.75)).
Suppose that the temperature distribution is known at a specific time, wlog t = 0. Then from evaluating
(4.82b) at t = 0 we have that
γ(k) = θ̃(k, 0) and so θ̃(k, t) = θ̃(k, 0) exp(−νk²t) . (4.83)
But from definition (3.1)

θ̃(k, 0) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ıky} θ(y, 0) dy , (4.84a)

and so

θ̃(k, t) = (1/√(2π)) ∫_{−∞}^{∞} exp(−ıky − νk²t) θ(y, 0) dy . (4.84b)
We can now use the Fourier inversion formula to find θ(x, t):

θ(x, t) = (1/√(2π)) ∫_{−∞}^{∞} dk e^{ıkx} θ̃(k, t)   from (3.10b)
        = (1/√(2π)) ∫_{−∞}^{∞} dk e^{ıkx} (1/√(2π)) ∫_{−∞}^{∞} exp(−ıky − νk²t) θ(y, 0) dy   from (4.84b)
        = (1/(2π)) ∫_{−∞}^{∞} dy θ(y, 0) ∫_{−∞}^{∞} dk exp(ık(x − y) − νk²t)   swap integration order.

²¹ Semi-infinite ranges can also be tackled by means of suitable ‘tricks’: see the example sheet.



From completing the square, or alternatively from our earlier calculation of the Fourier transform of a Gaussian (see (3.7) and apply the transformations ε → (2νt)^{−1/2}, k → (y − x) and x → k), we have that

∫_{−∞}^{∞} dk exp(ık(x − y) − νk²t) = √(π/(νt)) exp(−(x − y)²/(4νt)) . (4.85)

Substituting into the above expression for θ(x, t) we obtain a solution to the diffusion equation in terms of the initial condition at t = 0:

θ(x, t) = (1/√(4πνt)) ∫_{−∞}^{∞} dy θ(y, 0) exp(−(x − y)²/(4νt)) . (4.86a)

Example. If θ(x, 0) = θ0 δ(x) then we obtain what is sometimes referred to as the fundamental solution of the diffusion equation, namely

θ(x, t) = (θ0/√(4πνt)) exp(−x²/(4νt)) . (4.86b)

Physically this means that if the temperature at one point of an infinite rod is instantaneously raised to ‘infinity’, then the resulting temperature distribution is a Gaussian with a maximum temperature decaying like t^{−1/2} and a width increasing like t^{1/2}.
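The fundamental solution (4.86b) is easy to check numerically; a sketch in Python/numpy with θ0 = ν = 1, where the grid and finite-difference step sizes are arbitrary illustrative choices:

```python
import numpy as np

def theta_fund(x, t, theta0=1.0, nu=1.0):
    """Fundamental solution (4.86b) of the diffusion equation."""
    return theta0 / np.sqrt(4 * np.pi * nu * t) * np.exp(-x**2 / (4 * nu * t))

nu, t = 1.0, 0.5
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
th = theta_fund(x, t, nu=nu)

# The total 'heat' integrates to theta0 at all times.
assert abs(np.sum(th) * dx - 1.0) < 1e-6

# Check theta_t = nu theta_xx by centred finite differences.
dt = 1e-5
theta_t = (theta_fund(x, t + dt, nu=nu) - theta_fund(x, t - dt, nu=nu)) / (2 * dt)
theta_xx = (th[2:] - 2 * th[1:-1] + th[:-2]) / dx**2
assert np.allclose(theta_t[1:-1], nu * theta_xx, atol=1e-4)
```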



5 Matrices
5.0 Why Study This?
A good question, since this material is almost as dry as the Sahara (or East Anglia). A general answer is that matrices are essential mathematical tools. You need to know how to manipulate them: the addition and multiplication of scalars, vectors and matrices is referred to as linear algebra.
A more specific answer is that many scientific quantities are vectors and a linear relationship between two
vectors is described by a matrix. This could be either
(i) a physical relationship, e.g. that between the angular velocity and angular momentum vectors of a
rotating body;
(ii) a relationship between the components of (physically) the same vector in different coordinate systems.
However, vectors do not necessarily live in physical space. In some applications (notably quantum mechanics)
we have to deal with complex spaces of various dimensions.

Inter alia, we will study eigenvalues and eigenvectors; these are characteristic numbers and directions
associated with matrices, which allow them to be expressed in the simplest form. Moreover, the matrices that
occur in scientific applications usually have special symmetries that impose conditions on their eigenvalues
and eigenvectors, e.g. Hermitian matrices (observables in quantum mechanics are Hermitian operators).

5.1 Vector Spaces


The concept of a vector in three-dimensional Euclidean space can be generalised to n dimensions and in a
more general (and abstract) way.
5.1.1 Some Notation
First some notation.
Notation Meaning
∈ in
∃ there exists
∀ for all

5.1.2 Definition
A set of elements, or ‘vectors’, is said to form a complex linear vector space V if

(i) there exists a binary operation, say addition, under which the set V is closed so that
if u, v ∈ V , then u + v ∈ V ; (5.1a)

(ii) addition is commutative and associative, i.e. for all u, v, w ∈ V


u + v = v + u, (5.1b)
(u + v) + w = u + (v + w) ; (5.1c)

(iii) there exists closure under multiplication by a complex scalar, i.e.


if a ∈ C and v ∈ V then av ∈ V ; (5.1d)

(iv) multiplication by a scalar is distributive and associative, i.e. for all a, b ∈ C and u, v ∈ V
a(u + v) = au + av , (5.1e)
(a + b)u = au + bu , (5.1f)
a(bu) = (ab)u ; (5.1g)

(v) there exists a null, or zero, vector 0 ∈ V such that for all v ∈ V
v +0 = v; (5.1h)

(vi) for all v ∈ V there exists a negative, or inverse, vector (−v) ∈ V such that
v + (−v) = 0 . (5.1i)



Remarks.
(i) The existence of a negative/inverse vector (see (5.1i)) allows us to subtract as well as add vectors, by
defining
u − v ≡ u + (−v) . (5.2)

(ii) Vector multiplication is not defined in general.


(iii) If we restrict all scalars to be real, we have a real linear vector space, or a linear vector space over
reals.

(iv) We will often refer to V as a vector space, rather than the more correct linear vector space.

The basic example of a vector space is Fⁿ. An element of Fⁿ is a list of n scalars, (x1 , . . . , xn ), where xi ∈ F .
This is called an n-tuple. Vector addition and scalar multiplication are defined component-wise:

(x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ) (5.3a)
α(x1 , . . . , xn ) = (αx1 , . . . , αxn ) (5.3b)

5.1.3 Span and linear independence


Let S = {u1 , u2 , . . . , um } be a subset of vectors in V . A linear combination of S is any vector of the form
a1 u1 + a2 u2 + · · · + am um = ∑_{i=1}^{m} ai ui = ai ui , (5.4)

where a1 , a2 , . . . , am are scalars and, henceforth, we will use the summation convention.

Definition: Span. The span of S is the set of all vectors that are linear combinations of S. If the span of S
is the entire vector space V , then S is said to span V .

Definition: Linear independence. A set of m non-zero vectors {u1 , u2 , . . . , um } is linearly independent if

ai ui = 0 ⇒ ai = 0 for i = 1, 2, . . . , m. (s.c.) (5.5)

Otherwise, the vectors are linearly dependent, i.e. there exist scalars ai , at least one of which is non-zero, such that

ai ui = 0 . (s.c.)

Definition: Dimension of a Vector Space. If a vector space V contains a set of n linearly independent vectors but all sets of n + 1 vectors are linearly dependent, then V is said to be of dimension n.
Examples.
(i) Since
(a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) , (5.6)
the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) span a linear vector space of dimension 3.
(ii) (1, 0, 0), (0, 1, 0) and (0, 0, 1) are linearly independent since

a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = (a, b, c) = 0 ⇒ a = 0, b = 0, c = 0 .

(iii) (1, 0, 0), (0, 1, 0) and (1, 1, 0) are linearly dependent since (1, 1, 0) = (1, 0, 0) + (0, 1, 0).

Remarks.
(i) If an additional vector is included in a spanning set, it remains a spanning set.
(ii) If a vector is removed from a linearly independent set, the set remains linearly independent.
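The examples above can be checked numerically: a set of vectors is linearly independent iff the matrix with those vectors as rows has full rank. A sketch in Python/numpy:

```python
import numpy as np

indep = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # example (ii)
dep   = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0]])   # example (iii):
                                                      # (1,1,0) = (1,0,0) + (0,1,0)

assert np.linalg.matrix_rank(indep) == 3   # linearly independent: a basis of R^3
assert np.linalg.matrix_rank(dep) == 2     # linearly dependent
```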



5.1.4 Basis Vectors
If V is an n-dimensional vector space then any set of n linearly independent vectors {u1 , . . . , un } is a basis for V . There are a couple of key properties of a basis.

(i) We claim that for all vectors v ∈ V , there exist scalars vi such that

v = vi ui . (5.7a)

The vi are said to be the components of v with respect to the basis {u1 , . . . , un }.
Proof (unlectured). To see this we note that since V has dimension n, the set {u1 , . . . , un , v} is linearly
dependent, i.e. there exist scalars (a1 , . . . , an , b), not all zero, such that

ai ui + bv = 0 . (5.7b)

If b = 0 then ai = 0 for all i because the ui are linearly independent, and we have a contradiction; hence b ≠ 0. Multiplying by b⁻¹ we have that

v = −b⁻¹ ai ui = vi ui , (5.7c)

where vi = −b⁻¹ ai (i = 1, . . . , n). □


(ii) The scalars v1 , . . . , vn are unique.
Proof (unlectured). Suppose that

v = vi ui and that v = wi ui . (5.8a)

Then, because v − v = 0,
0 = (vi − wi )ui . (5.8b)
But the ui (i = 1, . . . , n) are linearly independent, so the only solution of this equation is vi − wi = 0 (i = 1, . . . , n). Hence vi = wi (i = 1, . . . , n), and we conclude that the two linear combinations (5.8a) are identical. □

Remarks. In (i) and (ii) below, {u1 , . . . , um } is a set of vectors in an n-dimensional vector space.
(i) If m < n then there exists a vector that cannot be expressed as a linear combination of the ui .
(ii) If m > n then there exists some vector that, when expressed as a linear combination of the ui ,
has non-unique scalar coefficients. This is true whether or not the ui span V .
(iii) Vector spaces can have infinite dimension, e.g. the set of functions defined on the interval
0 ⩽ x < 2π and having Fourier series

f(x) = ∑_{n=−∞}^{∞} fn e^{ınx} . (5.9)

Here f(x) is the ‘vector’ and fn are its ‘components’ with respect to the ‘basis’ of functions e^{ınx}. Functional analysis deals with such infinite-dimensional vector spaces.

Examples.

(i) Three-Dimensional Euclidean Space E3 . In this case the scalars are real and V is three-dimensional
because every vector v can be written uniquely as (cf. (5.6))

v = vx ex + vy ey + vz ez (5.10a)
= v1 u1 + v2 u2 + v3 u3 , (5.10b)

where {ex = u1 = (1, 0, 0), ey = u2 = (0, 1, 0), ez = u3 = (0, 0, 1)} is a basis.



(ii) The Complex Numbers. Here we need to be careful what we mean.

Suppose we are considering a complex linear vector space, i.e. a linear vector space over C. Then because the scalars are complex, every complex number z can be written uniquely as

z = α · 1 where α ∈ C , (5.11a)

and moreover

α · 1 = 0 ⇒ α = 0 for α ∈ C . (5.11b)

We conclude that the single ‘vector’ {1} constitutes a basis for C when viewed as a linear vector space over C.

However, we might alternatively consider the complex numbers as a linear vector space over R, so that
the scalars are real. In this case the pair of ‘vectors’ {1, i} constitute a basis because every complex
number z can be written uniquely as

z = a · 1 + b · ı where a, b ∈ R , (5.12a)

and

a · 1 + b · ı = 0 ⇒ a = b = 0 if a, b ∈ R . (5.12b)

Thus we have that

dimC C = 1 but dimR C = 2 , (5.13)
where the subscript indicates whether the vector space C is considered over C or R.

Remarks.
(i) R3 is not quite the same as physical space because physical space has a rule for the distance
between two points (i.e. Pythagoras’s theorem, if physical space is approximated as Euclidean)
(ii) R2 is not quite the same as C because C has a rule for multiplication
Worked exercise. Show that 2 × 2 real symmetric matrices form a real linear vector space under addition.
Show that this space has dimension 3 and find a basis.
Answer. Let V be the set of all real symmetric matrices, and let

A = [αa βa ; βa γa] , B = [αb βb ; βb γb] , C = [αc βc ; βc γc] ,

be any three real symmetric matrices.


(i) We note that addition is closed since A + B is a real symmetric matrix.
(ii) Addition is commutative and associative since for all [real symmetric] matrices, A + B = B + A
and (A + B) + C = A + (B + C).
(iii) Multiplication by a scalar is closed since if p ∈ R, then pA is a real symmetric matrix.
(iv) Multiplication by a scalar is distributive and associative since for all p, q ∈ R and for all [real
symmetric] matrices, p(A + B) = pA + pB, (p + q)A = pA + qA and p(qA) = (pq)A.
(v) The zero matrix, 0 = [0 0 ; 0 0], is real and symmetric (and hence in V ), and such that for all [real symmetric] matrices A + 0 = A.
(vi) For any [real symmetric] matrix there exists a negative matrix, i.e. that matrix with the compo-
nents reversed in sign. In the case of a real symmetric matrix, the negative matrix is again real
and symmetric.



Therefore V is a real linear vector space; the ‘vectors’ are the 2 × 2 real symmetric matrices. Moreover, the three 2 × 2 real symmetric matrices

U1 = [1 0 ; 0 0] , U2 = [0 1 ; 1 0] , and U3 = [0 0 ; 0 1] , (5.15)

are independent, since for p, q, r ∈ R

pU1 + qU2 + rU3 = [p q ; q r] = 0 ⇒ p = q = r = 0 .

Further, any 2 × 2 real symmetric matrix can be expressed as a linear combination of the Ui since

[p q ; q r] = pU1 + qU2 + rU3 .

We conclude that the 2 × 2 real symmetric matrices form a three-dimensional real linear vector space under addition, and that the ‘vectors’ Ui defined in (5.15) form a basis.
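A quick numerical illustration of the worked exercise, decomposing a 2 × 2 real symmetric matrix in the basis (5.15); the example matrix S is an arbitrary choice:

```python
import numpy as np

# The basis (5.15) of the 2x2 real symmetric matrices under addition.
U1 = np.array([[1, 0], [0, 0]])
U2 = np.array([[0, 1], [1, 0]])
U3 = np.array([[0, 0], [0, 1]])

# Any real symmetric [p q; q r] has the unique components (p, q, r).
S = np.array([[3.0, -2.0], [-2.0, 5.0]])
p, q, r = S[0, 0], S[0, 1], S[1, 1]

assert np.array_equal(S, p * U1 + q * U2 + r * U3)
```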
Exercise. Show that 3 × 3 symmetric real matrices form a vector space under addition. Show that this space
has dimension 6 and find a basis.

5.2 Change of Basis: the Rôle of Matrices


5.2.1 Linear Operators
A linear operator A on a vector space V acts on elements of V to produce other elements of V . The action
of A on x is written A(x) or just Ax. The property of linearity means that for scalars α and β
A(αx + βy) = αAx + βAy . (5.16)
Remarks.
(i) A linear operator has an existence without reference to any basis.
(ii) The operation can be thought of as a linear transformation or mapping of the space V (a simple
example is a rotation of three-dimensional space).
(iii) A more general idea, not considered here, is that a linear operator can act on vectors of one space
V to produce vectors of another space V ′ , possibly of a different dimension.
The components of A with respect to a basis {ei } are defined by the action of A on those basis vectors:
Aej = ei Aij . (s.c.) (5.17)
The components, Aij , form a square matrix A, where (A)ij = Aij .
Since A is a linear operator, a knowledge of its action on a basis is sufficient to determine its action on any
vector x since, from (5.16),
Ax = A(ej xj ) = xj (Aej ) = xj (ei Aij ) = ei Aij xj , (5.18a)
or
(Ax)i = Aij xj . (5.18b)
This corresponds to the rule for multiplying a matrix by a vector.
The sum of two linear operators is defined by
(A + B)x = Ax + Bx = ei (Aij + Bij )xj . (5.19a)
The product, or composition, of two linear operators has the action
(AB)x = A(Bx) = A(ek Bkj xj ) = (Aek )Bkj xj = ei Aik Bkj xj . (5.19b)
The components therefore satisfy the rules of matrix addition and multiplication:
(A + B)ij = Aij + Bij , (AB)ij = Aik Bkj . (5.19c)
Recall that matrix multiplication is not commutative, so BA ̸= AB in general.
Therefore a matrix can be thought of as the components of a linear operator with respect to a given basis,
just as a column matrix or n-tuple can be thought of as the components of a vector with respect to a given
basis.
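The correspondence between operator composition and matrix multiplication can be illustrated numerically; a sketch in Python/numpy, where the matrices A, B and the column x are arbitrary choices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[0.0, -1.0], [1.0, 0.0]])
x = np.array([3.0, 4.0])

# (Ax)_i = A_ij x_j, the rule (5.18b).
assert np.allclose(A @ x, [1*3 + 2*4, 0*3 + 1*4])

# Composition acts as matrix multiplication, (5.19c): (AB)x = A(Bx) ...
assert np.allclose((A @ B) @ x, A @ (B @ x))
# ... but matrix multiplication is not commutative.
assert not np.allclose(A @ B, B @ A)
```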
5.2.2 Transformation Matrices
Let {ui : i = 1, . . . , n} and {u′i : i = 1, . . . , n} be two sets of basis vectors for an n-dimensional vector space V .
Since the {ui : i = 1, . . . , n} is a basis, the individual basis vectors of the basis {u′i : i = 1, . . . , n} can be
written as
u′j = ui Aij (j = 1, . . . , n) , (5.20a)
for some numbers Aij . From (5.7a) we see that Aij is the ith component of the vector u′j in the basis
{ui : i = 1, . . . , n}. Hence, the numbers Aij can be represented by a square n × n transformation matrix A
 
A = [A11 A12 · · · A1n ; A21 A22 · · · A2n ; . . . ; An1 An2 · · · Ann] , (5.20b)

where the jth column of A consists of the components of u′j in terms of the {ui : i = 1, . . . , n} basis.

Similarly, since the {u′i : i = 1, . . . , n} is a basis, the individual basis vectors of the basis {ui : i = 1, . . . , n}
can be written as
ui = u′k Bki (i = 1, 2, . . . , n) , (5.21a)
for some numbers Bki . Here Bki is the kth component of the vector ui in the basis {u′k : k = 1, . . . , n}.
Again the Bki can be viewed as the entries of a matrix B

B = [B11 B12 · · · B1n ; B21 B22 · · · B2n ; . . . ; Bn1 Bn2 · · · Bnn] . (5.21b)

5.2.3 Properties of Transformation Matrices


From substituting (5.21a) into (5.20a) we have that
u′j = (u′k Bki ) Aij = u′k (Bki Aij ) . (5.22a)
However, because of the uniqueness of a basis representation and the fact that
u′j = u′k δkj , (5.22b)
it follows that
Bki Aij = δkj . (5.22c)
Hence in matrix notation, BA = I, where I is the identity matrix. Conversely, substituting (5.20a) into (5.21a)
leads to the conclusion that AB = I (alternatively argue by a relabeling symmetry). Thus
B = A−1 , (5.23a)
and
det A ̸= 0 and det B ̸= 0 . (5.23b)

5.2.4 Transformation Law for Vector Components


Consider a vector v, then in the {ui : i = 1, . . . , n} basis we have from (5.7a)
v = vi ui . (5.24)
Similarly, in the {u′i : i = 1, . . . , n} basis we can write
v = vj′ u′j (5.25)
  = vj′ ui Aij   from (5.20a)
  = ui (Aij vj′)   de facto swap summation order.
Since a basis representation is unique it follows from (5.7a) that
vi = Aij vj′ , (5.26)
which relates the components of v in the basis {ui : i = 1, . . . , n} to those in the basis {u′i : i = 1, . . . , n}.



Some Notation. Let v and v′ be the column matrices

v = [v1 ; v2 ; . . . ; vn] and v′ = [v1′ ; v2′ ; . . . ; vn′] respectively. (5.27)
Note that we now have bold v denoting a vector, italic vi denoting a component of a vector, and sans
serif v denoting a column matrix of components. Then (5.26) can be expressed as
v = Av′ , (5.28a)
−1
and hence, from applying A to either side of (5.28a),
v′ = A−1 v . (5.28b)

Remark. In matrix notation (5.20a) can be expressed as

u′ = uA . (5.28c)

From a comparison between (5.28b) and (5.28c) we see that the components of v transform inversely to the way that the basis vectors transform. This is so that the vector v is unchanged:

v = vj′ u′j   from (5.25)
  = (A⁻¹)jk vk (ui Aij)   from (5.28b) and (5.20a)
  = ui vk Aij (A⁻¹)jk   de facto swap summation order
  = ui (vk δik)   AA⁻¹ = I
  = vi ui .   contract using (1.14c)

Worked example. Let {u1 = (1, 0), u2 = (0, 1)} and {u′1 = (1, 1), u′2 = (−1, 1)} be two sets of basis vectors
in R2 . Find the transformation matrix Aij that connects them. Verify the transformation law for the
components of an arbitrary vector v in the two coordinate systems.
Answer. We have from direct substitution and using (5.20a) that
u′1 = (1, 1) = (1, 0) + (0, 1) = u1 + u2 = uj Aj1 ,
u′2 = (−1, 1) = −(1, 0) + (0, 1) = −u1 + u2 = uj Aj2 .
Hence
A11 = 1 , A21 = 1 , A12 = −1 and A22 = 1 ,
i.e.

A = [1 −1 ; 1 1] , with inverse A⁻¹ = ½ [1 1 ; −1 1] .
First Check. Note that A−1 is consistent with (5.21a), (5.23a) and the observation that
u1 = (1, 0) = ½((1, 1) − (−1, 1)) = ½(u′1 − u′2) = u′j (A⁻¹)j1 ,
u2 = (0, 1) = ½((1, 1) + (−1, 1)) = ½(u′1 + u′2) = u′j (A⁻¹)j2 .

Second Check. Consider an arbitrary vector v; then from direct substitution

v = v1 u1 + v2 u2
  = ½v1 (u′1 − u′2) + ½v2 (u′1 + u′2)
  = ½(v1 + v2) u′1 − ½(v1 − v2) u′2 .

Thus

v1′ = ½(v1 + v2) and v2′ = −½(v1 − v2) .

This is consistent with the result obtained using (5.28b), viz.

v′ = A⁻¹ v = ½ [1 1 ; −1 1] [v1 ; v2] = ½ [v1 + v2 ; −v1 + v2] .
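The worked example is easy to verify numerically; a sketch in Python/numpy, with the test components v = (3, 7) an arbitrary choice:

```python
import numpy as np

# Columns of A are the components of u1' = (1,1), u2' = (-1,1)
# in the old basis, as in (5.20a).
A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
A_inv = np.linalg.inv(A)
assert np.allclose(A_inv, 0.5 * np.array([[1, 1], [-1, 1]]))

# Components transform inversely to the basis, (5.28b): v' = A^{-1} v.
v = np.array([3.0, 7.0])              # components in the basis u1, u2
v_prime = A_inv @ v
assert np.allclose(v_prime, [0.5 * (3 + 7), -0.5 * (3 - 7)])

# The vector itself is unchanged: sum of v_i' u_i' recovers v.
u_new = np.array([[1.0, 1.0], [-1.0, 1.0]])   # rows are u1', u2'
assert np.allclose(v_prime @ u_new, v)
```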



5.3 Some Definitions of Special Matrices
From last year you should be familiar with the following definitions.
Symmetric Matrix. A square n × n matrix is symmetric if it is equal to its transpose:

AT = A or Aji = Aij . (5.29a)

Antisymmetric matrix. A square n × n matrix is antisymmetric (or skew-symmetric) if it is equal to the


negative of its transpose:
AT = −A or Aji = −Aij . (5.29b)

Orthogonal matrix. A square n × n matrix is orthogonal if its transpose is equal to its inverse:

AT = A−1 or AAT = AT A = 1 . (5.29c)

These ideas can be generalized to a complex vector space; however, first we need a definition.
Hermitian conjugate. The Hermitian conjugate of a matrix A is defined to be the complex conjugate (de-
noted by ∗ ) of its transpose, i.e.

A† = (AT )∗ = (A∗ )T or equivalently (A† )ij = A∗ji . (5.30)

For example

if A = [A11 A12 A13 ; A21 A22 A23] then A† = [A11∗ A21∗ ; A12∗ A22∗ ; A13∗ A23∗] . (5.31a)
Similarly, the Hermitian conjugate of a column matrix x is a row matrix, e.g.

x† = [x1 ; x2 ; . . . ; xn]† = (x1∗ x2∗ · · · xn∗) . (5.31b)

The Hermitian conjugate of a Hermitian conjugate. From (5.30)

A†† = (A∗T)∗T = A . (5.32a)

The Hermitian conjugate of a product of matrices. For matrices A and B recall that (AB)T = BT AT . Hence
(AB)T∗ = BT∗ AT∗ , and so

(AB)† = B† A† . (5.32b)

This result extends to arbitrary products of matrices and vectors, e.g.

(ABCx)† = x† C† B† A† , (5.32c)
† † † †
(x Ay) = y A x . (5.32d)

In the latter example, if x and y are column matrices, each side of the equation is a scalar (a complex
number). The Hermitian conjugate of a scalar is just the complex conjugate.
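These rules are easy to verify numerically for random complex matrices; a sketch in Python/numpy (the random seed and sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
x = rng.normal(size=(3, 1)) + 1j * rng.normal(size=(3, 1))
y = rng.normal(size=(3, 1)) + 1j * rng.normal(size=(3, 1))

dag = lambda M: M.conj().T   # Hermitian conjugate, definition (5.30)

# (AB)† = B† A†, as in (5.32b).
assert np.allclose(dag(A @ B), dag(B) @ dag(A))

# (x† A y)† = y† A† x†-style rule (5.32d): for the scalar s = x† A y,
# the Hermitian conjugate is just the complex conjugate.
s = (dag(x) @ A @ y)[0, 0]
assert np.isclose((dag(y) @ dag(A) @ x)[0, 0], np.conj(s))
```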
Positive definiteness. A square n × n matrix A is said to be positive definite if for all column matrices v of
length n
v† Av ⩾ 0 , with equality iff v = 0 , (5.33a)
where ‘iff’ means if and only if.
Remark. If equality to zero were possible in (5.33a) for non-zero v, then A would be said to be positive, rather than positive definite.
Hermitian matrix. A square n × n matrix is Hermitian if it is equal to its Hermitian conjugate:

A† = A or A∗ji = Aij (5.33b)



Anti-Hermitian matrix. A square n × n matrix is anti-Hermitian (or skew-Hermitian) if it is equal to the
negative of its Hermitian conjugate:
A† = −A or A∗ji = −Aij . (5.33c)

Unitary matrix. A square n × n matrix is unitary if its Hermitian conjugate is equal to its inverse:
A† = A−1 or AA† = A† A = 1 . (5.33d)

Normal matrix. A square n × n matrix is normal if it commutes with its Hermitian conjugate:
AA† = A† A . (5.33e)

Exercise. Verify that Hermitian, anti-Hermitian and unitary matrices are all normal.
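The exercise can also be checked numerically by constructing a matrix of each type from a random complex matrix; a sketch in Python/numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
dag = lambda X: X.conj().T

H = M + dag(M)            # Hermitian: H† = H
S = M - dag(M)            # anti-Hermitian: S† = -S
Q, _ = np.linalg.qr(M)    # unitary: Q† Q = I

for X in (H, S, Q):
    # Normal, (5.33e): X commutes with its Hermitian conjugate.
    assert np.allclose(X @ dag(X), dag(X) @ X)
```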

5.4 Scalar Product (Inner Product)

5.4.1 Definition of a Scalar Product
The prototype linear vector space V = E3 has the additional property that any two vectors u and v can
be combined to form a scalar u · v. This can be generalised to an n-dimensional vector space V over C by
assigning, for every pair of vectors u, v ∈ V , a scalar product u · v ∈ C with the following properties.
(i) The scalar product should be linear in its second argument, i.e. for a, b ∈ C
u · (av1 + bv2 ) = a u · v1 + b u · v2 . (5.34a)

(ii) The scalar product should have Hermitian symmetry, i.e.


u · v = (v · u)∗ , (5.34b)
where we again denote a complex conjugate with ∗ . Implicit in this equation is the conclusion that for
a complex vector space the ordering of the vectors in the scalar product is important (whereas for E3
this is not important). Further, if we let u = v, then this implies that
v · v = (v · v)∗ , (5.34c)
i.e. v · v is real.
(iii) The scalar product of a vector with itself should be positive, i.e.
v · v ⩾ 0. (5.34d)
This allows us to write v · v = |v|2 , where the real positive number |v| is the norm (cf. length) of the
vector v.
(iv) Further, the scalar product should be positive definite, i.e. the only vector of zero norm should be the
zero vector:
|v| = 0 ⇒ v = 0 . (5.34e)
Remarks.
(i) A scalar/inner product has existence without reference to any basis.
(ii) Properties (5.34a) and (5.34b) imply that for a, b ∈ C

(au1 + bu2) · v = (v · (au1 + bu2))∗
                = (a v · u1 + b v · u2)∗
                = a∗ (v · u1)∗ + b∗ (v · u2)∗
                = a∗ (u1 · v) + b∗ (u2 · v) , (5.35)

i.e. the scalar product is ‘anti-linear’ in the first argument. Failure to remember this is a common cause of error. However, if a, b ∈ R then (5.35) reduces to linearity in both arguments.
Alternative notation. An alternative notation for the scalar product and associated norm is
⟨ u | v ⟩ ≡ u · v , (5.36a)
∥v∥ ≡ |v| = (v · v)^{1/2} . (5.36b)



5.4.2 Worked Example
Question. Identify an inner product for the vector space of real symmetric 2 × 2 matrices under addition.
Answer. We have already seen that the real symmetric 2 × 2 matrices form a vector space. In defining an
inner product a key point to remember is that we need property (5.34e), i.e. that the scalar product
of a vector with itself is zero only if the vector is zero. One way to do this is to spot that the vector
space of real symmetric 2 × 2 matrices is really the 4-tuple (A11 , A12 , A21 , A22 ), with A12 = A21 , in
disguise. Hence, one way forward is to consider the inner product defined for matrices A and B by

⟨ A | B ⟩ = A∗ij Bij (5.37a)


= A∗11 B11 + A∗12 B12 + A∗21 B21 + A∗22 B22 , (5.37b)

where we are using the alternative notation (5.36a) for the inner product. For this definition of inner
product we have for real symmetric 2 × 2 matrices A, B and C, and a, b ∈ C:

(i) as in (5.34a)
⟨ A | (βB + γC) ⟩ = A∗ij (βBij + γCij )
= βA∗ij Bij + γA∗ij Cij
= β ⟨A |B ⟩ + γ ⟨A |C ⟩;

(ii) as in (5.34b)

⟨ B | A ⟩ = B∗ij Aij = ⟨ A | B ⟩∗ ;

(iii) as in (5.34d) and (5.34e)

⟨ A | A ⟩ = A∗ij Aij = ∑_{i,j=1}^{n} |Aij|² ⩾ 0 ; ⟨ A | A ⟩ = 0 ⇒ A = 0 .

Hence we have a well defined inner product.

5.4.3 Some Inequalities


Schwarz’s Inequality. This states that
|⟨ u | v ⟩| ⩽ ∥u∥ ∥v∥ , (5.38)
with equality only when u is a scalar multiple of v.
Proof. Write ⟨ u | v ⟩ = |⟨ u | v ⟩| e^{ıα}, and for λ ∈ C consider

∥u + λv∥² = ⟨ u + λv | u + λv ⟩   from (5.36b)
          = ⟨ u | u ⟩ + λ ⟨ u | v ⟩ + λ∗ ⟨ v | u ⟩ + |λ|² ⟨ v | v ⟩   from (5.34a) and (5.35)
          = ⟨ u | u ⟩ + (λ e^{ıα} + λ∗ e^{−ıα}) |⟨ u | v ⟩| + |λ|² ⟨ v | v ⟩   from (5.34b).

First, suppose that v = 0. The right-hand side then simplifies from a quadratic in λ to an expression that is linear in λ. If ⟨ u | v ⟩ ≠ 0 we then have a contradiction, since for certain choices of λ this simplified expression can be negative. Hence we conclude that

⟨ u | v ⟩ = 0 if v = 0 ,

in which case (5.38) is satisfied as an equality. Next suppose that v ≠ 0 and choose λ = r e^{−ıα}, so that from (5.34d)

0 ⩽ ∥u + λv∥² = ∥u∥² + 2r |⟨ u | v ⟩| + r² ∥v∥² .

The right-hand side is a quadratic in r that has a minimum when r ∥v∥² = −|⟨ u | v ⟩|. Schwarz’s inequality follows on substituting this value of r, with equality if u = −λv. □

Natural Sciences Tripos: IB Mathematical Methods I 83 © [email protected], Michaelmas 2022


The Triangle Inequality. This states that

∥u + v∥ ⩽ ∥u∥ + ∥v∥ . (5.39)

Proof. This follows from taking square roots of the following inequality:

∥u + v∥² = ⟨ u | u ⟩ + ⟨ u | v ⟩ + ⟨ v | u ⟩ + ⟨ v | v ⟩     from above with λ = 1
         = ∥u∥² + 2 Re ⟨ u | v ⟩ + ∥v∥²                      from (5.34b)
         ⩽ ∥u∥² + 2|⟨ u | v ⟩| + ∥v∥²
         ⩽ ∥u∥² + 2∥u∥ ∥v∥ + ∥v∥²                            from (5.38)
         = (∥u∥ + ∥v∥)² .
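Both inequalities are easy to probe numerically. The following sketch (randomly generated complex vectors, our own test harness, not from the notes) checks them for the standard inner product ⟨u|v⟩ = Σ u∗ᵢvᵢ, and checks the equality case of (5.38):

```python
# Numerical illustration (not a proof) of Schwarz's inequality (5.38) and the
# triangle inequality (5.39) for the standard inner product <u|v> = sum u*_i v_i.
import math
import random

def inner(u, v):
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

random.seed(0)
for _ in range(100):
    u = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
    v = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
    assert abs(inner(u, v)) <= norm(u) * norm(v) + 1e-12            # Schwarz
    w = [ui + vi for ui, vi in zip(u, v)]
    assert norm(w) <= norm(u) + norm(v) + 1e-12                     # triangle

# equality in (5.38) when u is a scalar multiple of v
v = [1 + 2j, 0.5 - 1j, 3 + 0j]
u = [(2 - 1j) * vi for vi in v]
assert abs(abs(inner(u, v)) - norm(u) * norm(v)) < 1e-9
```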

5.4.4 The Scalar Product in Terms of Components

Suppose that we have a scalar product defined on a vector space with a given basis {ui : i = 1, . . . , n}. We
will next show that the scalar product is in some sense determined for all pairs of vectors by its values for
all pairs of basis vectors. To start, define the complex numbers Gij by

Gij = ui · uj (i, j = 1, . . . , n) . (5.40)

Then, for any two vectors
v = vi ui   and   w = wj uj ,                               (5.41)
we have that
v · w = (vi ui) · (wj uj)
      = v∗i wj ui · uj        from (5.34a) and (5.35)
      = v∗i Gij wj .                                        (5.42)

In matrix notation the scalar product (5.42) can be written as

v · w = v† G w , (5.43)

where G is the matrix, or metric, with entries Gij (metrics are a key ingredient of General Relativity).
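A small numerical sketch of (5.40)–(5.43), using an illustrative non-orthonormal basis of R² of our own choosing (u1 = (1,0), u2 = (1,1)): the scalar product computed through the metric agrees with the ordinary dot product of the Cartesian components.

```python
# Sketch of (5.40)-(5.43): the metric G for a non-orthonormal basis of R^2
# reproduces the ordinary dot product.

u1, u2 = (1.0, 0.0), (1.0, 1.0)
dot = lambda a, b: sum(p * q for p, q in zip(a, b))

# metric G_ij = u_i . u_j, equation (5.40)
G = [[dot(a, b) for b in (u1, u2)] for a in (u1, u2)]
assert G == [[1.0, 1.0], [1.0, 2.0]]

# components of v = 2 u1 + 3 u2 and w = u1 - u2 in this basis
vc, wc = [2.0, 3.0], [1.0, -1.0]

# scalar product via the metric, v† G w (real case, so no conjugation needed)
vw_metric = sum(vc[i] * G[i][j] * wc[j] for i in range(2) for j in range(2))

# the same scalar product computed from the Cartesian components
v = tuple(2 * a + 3 * b for a, b in zip(u1, u2))
w = tuple(a - b for a, b in zip(u1, u2))
assert abs(vw_metric - dot(v, w)) < 1e-12
```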

5.4.5 Properties of the Metric


Property: a metric is Hermitian. The elements of the Hermitian conjugate of the metric G are the complex
numbers

(G†)ij ≡ G†ij = (Gji)∗       from (5.30)                    (5.44a)
             = (uj · ui)∗    from (5.40)
             = ui · uj       from (5.34b)
             = Gij .         from (5.40)                    (5.44b)

Hence G is Hermitian, i.e.


G† = G . (5.44c)

Remark (unlectured). That G is Hermitian is consistent with the requirement (5.34c) that |v|2 = v · v is
real, since

(v · v)∗ = ((v · v)∗)ᵀ       since a scalar is its own transpose
         = (v · v)†          from definition (5.30)
         = (v†Gv)†           from (5.43)
         = v†G†v             from (5.32b) and (5.32a)
         = v†Gv              from (5.44c)
         = v · v .           from (5.43)



Property: a metric is positive definite. From (5.34d) and (5.34e) we have from the properties of a scalar
product that for any v
|v|2 ⩾ 0 with equality iff v = 0 . (5.45a)
Hence, from (5.43), for any v
v† Gv ⩾ 0 with equality iff v = 0 . (5.45b)
It follows from definition (5.33a) that G is positive definite.

5.5 Eigenvalues, Eigenvectors and Diagonalization


Suppose that M is a square n × n matrix. Then a non-zero column vector x such that
Mx = λx , (5.46a)
where λ ∈ C, is said to be an eigenvector of the matrix M with eigenvalue λ. If we rewrite this equation as
(M − λI)x = 0 ,                                             (5.46b)
then, since x is non-zero, we conclude that a non-trivial linear combination of the columns of the matrix
(M − λI) is equal to zero, i.e. that the columns of the matrix are linearly dependent. This statement is also
equivalent to the requirement
det(M − λI) = 0 , (5.47)
which is called the characteristic equation of the matrix M. The left-hand-side of (5.47) is an nth order
polynomial in λ called the characteristic polynomial of M.
The roots of the characteristic polynomial are the eigenvalues of M, and since an nth order polynomial has
exactly n, possibly complex, roots (counting multiplicities in the case of repeated roots), there are always n
eigenvalues.
Examples.
(i) Find the eigenvalues and eigenvectors of
 
    (  0  1 )
M = ( −1  0 )                                               (5.48a)
Answer. From (5.47)
                  | −λ   1 |
0 = det(M − λI) = |         | = λ² + 1 = (λ − ı)(λ + ı) ,   (5.48b)
                  | −1  −λ |
and so the eigenvalues of M are ±ı. The eigenvectors are the non-zero solutions to
    
( ∓ı   1 ) ( x )   ( 0 )
( −1  ∓ı ) ( y ) = ( 0 ) .                                  (5.48c)
Hence there are two linearly independent eigenvectors
   
α (1, ı)ᵀ   and   β (1, −ı)ᵀ ,                              (5.48d)
where α and β are any non-zero constants.
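The answer to example (i) can be verified directly; a minimal Python check (our own, using plain nested lists for the matrix):

```python
# Direct check of example (i): M = [[0, 1], [-1, 0]] has eigenvalues ±i,
# with eigenvectors proportional to (1, i) and (1, -i).

M = [[0, 1], [-1, 0]]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

for lam, x in [(1j, [1, 1j]), (-1j, [1, -1j])]:
    Mx = matvec(M, x)
    assert all(abs(Mx[i] - lam * x[i]) < 1e-12 for i in range(2))   # Mx = lam x

# the characteristic polynomial det(M - lam I) = lam^2 + 1 vanishes at ±i
charpoly = lambda lam: (M[0][0] - lam) * (M[1][1] - lam) - M[0][1] * M[1][0]
assert charpoly(1j) == 0 and charpoly(-1j) == 0
```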
(ii) Find the eigenvalues and eigenvectors of
 
    ( 0  1 )
M = ( 0  0 )                                                (5.49a)
Answer. From (5.47)
                  | −λ   1 |
0 = det(M − λI) = |         | = λ² ,                        (5.49b)
                  |  0  −λ |
and so the eigenvalues of M are 0 and 0. The eigenvectors are the non-zero solutions to
    
( 0  1 ) ( x )   ( 0 )
( 0  0 ) ( y ) = ( 0 ) .                                    (5.49c)
Hence there is only one linearly independent eigenvector, namely any non-zero multiple of
 
(1, 0)ᵀ .                                                   (5.49d)



Degeneracy. If the n eigenvalues are distinct, then there are n linearly independent eigenvectors, each of
which is determined uniquely up to an arbitrary multiplicative constant.
If the eigenvalues are not all distinct, the repeated eigenvalues are said to be degenerate. If an eigenvalue
λ occurs m times, there may be any number between 1 and m of linearly independent eigenvectors
corresponding to it. Any linear combination of these is also an eigenvector and the space spanned by
such vectors is called an eigenspace.
Diagonalization. Denote the n, not necessarily distinct, eigenvalues by λi , i = 1, 2, . . . , n, and let xi be the
respective eigenvectors; so
Mx^i = λi x^i ,   (i = 1, 2, . . . , n, no s.c.)            (5.50a)
or in component notation for the j-th component
Σ_{k=1}^{n} Mjk x^i_k = λi x^i_j .                          (5.50b)

This is a specific individual’s copy of the notes. It is not to be copied and/or redistributed.
Let X be the n × n matrix whose columns are the eigenvectors of M, then
(X)ij ≡ Xij = x^j_i ,                                       (5.51a)
i.e.
    ( x^1_1  x^2_1  · · ·  x^n_1 )
    ( x^1_2  x^2_2  · · ·  x^n_2 )
X = (   ..     ..     ..    ..   ) .                        (5.51b)
    ( x^1_n  x^2_n  · · ·  x^n_n )
in which case (5.50b) can be rewritten as
Σ_{k=1}^{n} Mjk Xki = λi Xji = Σ_{k=1}^{n} Xjk δki λi ,     (5.52a)

or, in matrix notation, as


MX = XΛ , (5.52b)
where Λ is the diagonal matrix
    ( λ1   0  · · ·   0 )
    (  0  λ2  · · ·   0 )
Λ = (  ..   ..   ..  .. ) .                                 (5.52c)
    (  0   0  · · ·  λn )
If X has an inverse, X⁻¹, then
X⁻¹MX = Λ ,                                                 (5.53)
i.e. X diagonalizes M. But for X⁻¹ to exist we require that det X ≠ 0; this is equivalent to the
requirement that the columns of X are linearly independent. These columns are just the eigenvectors
of M, so
an n × n matrix is diagonalizable if and only if it has n linearly-independent eigenvectors.
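Continuing example (i) of §5.5, a short sketch of (5.53) in code (our own construction): build X from the eigenvectors (1, ı)ᵀ and (1, −ı)ᵀ as columns and verify that X⁻¹MX = diag(ı, −ı).

```python
# Sketch of (5.53) for example (i)'s matrix: with X built from the eigenvectors
# (1, i) and (1, -i) as columns, X^{-1} M X = diag(i, -i).

M = [[0, 1], [-1, 0]]
X = [[1, 1], [1j, -1j]]                        # columns are the eigenvectors

detX = X[0][0] * X[1][1] - X[0][1] * X[1][0]   # = -2i, non-zero: X^{-1} exists
Xinv = [[X[1][1] / detX, -X[0][1] / detX],
        [-X[1][0] / detX, X[0][0] / detX]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

L = matmul(Xinv, matmul(M, X))                 # should be diag(i, -i)
assert abs(L[0][0] - 1j) < 1e-12 and abs(L[1][1] + 1j) < 1e-12
assert abs(L[0][1]) < 1e-12 and abs(L[1][0]) < 1e-12
```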
5.6 Eigenvalues and Eigenvectors of Hermitian Matrices
In order to determine whether a metric is diagonalizable, we conclude from the above considerations that
we need to determine whether the metric has n linearly-independent eigenvectors. To this end we shall
determine two important properties of Hermitian matrices.

5.6.1 Properties of the Eigenvalues and Eigenvectors of an Hermitian Matrix


Let H be an Hermitian matrix, and consider two eigenvectors x and y corresponding to eigenvalues λ and µ:
Hx = λx , (5.54a)
Hy = µy . (5.54b)

The Hermitian conjugate of (5.54b) is, since H† = H,
y† H = µ∗ y† . (5.54c)

Using (5.54a) and (5.54c) we can construct two expressions for y†Hx:
y† Hx = λy† x = µ∗ y† x , (5.55a)
and hence
(λ − µ∗ )y† x = 0 . (5.55b)
The eigenvalues of an Hermitian matrix are real. Suppose that x and y are the same eigenvector. Then
y = x and µ = λ, so (5.55b) becomes
(λ − λ∗ )x† x = 0 (5.56)
Since x ≠ 0, x†x = x∗i xi = |x|² ≠ 0, and so λ∗ = λ. Therefore
the eigenvalues of an Hermitian matrix are real.
The eigenvectors of an Hermitian matrix with distinct eigenvalues are orthogonal. Knowing that the eigen-
values of an Hermitian matrix are real allows us to simplify (5.55b) to
(λ − µ)y† x = 0 . (5.57)
If x and y are now different eigenvectors, we deduce that y†x = 0, provided that µ ≠ λ. Therefore, in
the standard inner product on Cⁿ,
the eigenvectors of an Hermitian matrix corresponding to distinct eigenvalues are orthogonal.

Degenerate eigenvalues. The case when there is a repeated eigenvalue is more difficult. However with suf-
ficient mathematical effort it can still be proved that orthogonal eigenvectors exist for the repeated
eigenvalue. Instead of adopting this approach we appeal to arm-waving arguments.

An ‘experimental’ approach. First adopt an ‘experimental’ approach. In real life it is highly unlikely
that two eigenvalues will be exactly equal (because of experimental error, etc.). Hence this case
never arises and we can assume that we have n orthogonal eigenvectors.

A perturbation approach. Alternatively suppose that in the real problem two eigenvalues are exactly
equal. Introduce a specific, but small, perturbation of size ε (cf. the ε introduced in (3.8b) when
calculating the Fourier transform of the Heaviside step function) such that the perturbed problem
has unequal eigenvalues (this is highly likely to be possible because the problem with equal
eigenvalues is likely to be 'structurally unstable'). Now let ε → 0. For all non-zero values of ε
(both positive and negative) there will be n orthogonal eigenvectors. On appealing to a continuity
argument there will be n orthogonal eigenvectors for the specific case ε = 0.
An Hermitian matrix has n orthogonal linearly independent eigenvectors. We have already concluded that
the eigenvectors of an Hermitian matrix are orthogonal, we now need to prove that two orthogonal
[eigen]vectors are linearly independent.
Proof. Suppose there exist α and β such that
αx + βy = 0 . (5.58a)
Then from pre-multiplying (5.58a) by y† and using the orthogonality of x and y, i.e. y† x = 0, it
follows that
0 = βy† y = βyk∗ yk . (5.58b)
Since y is non-zero it follows that β = 0. By the relabeling symmetry, or from pre-multiplying
(5.58a) by x† , it similarly follows that α = 0. 2
We conclude that, whether or not two or more eigenvalues are equal,
an n-dimensional Hermitian matrix has n orthogonal eigenvectors that are linearly independent.
An Hermitian matrix has n orthonormal eigenvectors. We can tighten this result a little further by noting
that, for any µ ∈ C,
if Hx = λx , then H(µx) = λ(µx) . (5.59a)
This allows us to normalise the eigenvectors so that
x† x = 1 . (5.59b)
Hence for Hermitian matrices it is always possible to find n orthonormal eigenvectors that are linearly
independent.
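These properties are easy to verify numerically for a concrete case. The 2 × 2 Hermitian matrix below is our own illustrative choice (not from the notes), with eigenpairs found by hand from its characteristic equation:

```python
# Illustration with a sample 2x2 Hermitian matrix: its eigenvalues are real,
# and eigenvectors for distinct eigenvalues satisfy y† x = 0.

H = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]
assert H[0][1] == H[1][0].conjugate()            # H is Hermitian

# eigenpairs found by hand from det(H - lam I) = lam^2 - 5 lam + 4 = 0
pairs = [(1, [-1 + 1j, 1]), (4, [1 - 1j, 2])]
for lam, x in pairs:
    Hx = [sum(H[i][k] * x[k] for k in range(2)) for i in range(2)]
    assert all(abs(Hx[i] - lam * x[i]) < 1e-12 for i in range(2))   # Hx = lam x

# orthogonality of the two eigenvectors, y† x = 0
x, y = pairs[0][1], pairs[1][1]
assert abs(sum(yi.conjugate() * xi for yi, xi in zip(y, x))) < 1e-12
```

Note that both eigenvalues (1 and 4) are real, as the section proves they must be.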
Anti-Hermitian and Unitary Matrices. The eigenvalues of anti-Hermitian and unitary matrices are imagi-
nary and of unit modulus, respectively. These results can be proved using a similar approach to that
leading to (5.56) for proving that the eigenvalues for Hermitian matrices are real.
Normal matrices. It can be shown that the eigenvectors of normal matrices corresponding to distinct eigen-
values are orthogonal. Moreover, if a repeated eigenvalue λ occurs m times, it can be shown (with
some difficulty) that there are exactly m corresponding linearly independent eigenvectors.
Construction of orthogonal eigenvectors. If the multiplicity of an eigenvalue, say m, matches the number
of its linearly independent eigenvectors, then the eigenvalue is said to have no defect. In this case it
is always possible to construct an orthogonal basis within the m-dimensional eigenspace, e.g. by the
Gram–Schmidt procedure (see Example Sheet 3, Question 2). Therefore, even if the eigenvalues are
degenerate, it is possible to find n mutually orthogonal eigenvectors, which form a basis for the vector
space.

5.6.2 Diagonalization of Hermitian Matrices
It follows from the above result, (5.51b) and (5.53), that an Hermitian matrix H can be ‘diagonalized’ to
the matrix Λ by means of the transformation X−1 H X, where the columns of X are the eigenvectors of H:
 1
x1 x21 · · · xn1

 x1 x2 · · · xn 
 2 2 2 
X=  .. .. .. ..  .
 (5.60)
 . . . . 
x1n x2n ··· xnn

Orthonormal eigenvectors. If the xi are orthonormal eigenvectors of H then X is a unitary matrix since
(X† X)ij = (X† )ik (X)kj = (xik )∗ xjk = xi† xj = δij by orthonormality, (5.61a)
or, in matrix notation,

X† X = I . (5.61b)
Hence X is a unitary matrix, and we deduce that every Hermitian matrix, H, is diagonalizable by a
transformation
X† HX = Λ , (5.62)
where X is a unitary matrix.
In the case when we restrict ourselves to real matrices, we conclude that every real symmetric matrix, S,
is diagonalizable by a transformation RT S R, where R is an orthogonal matrix.

Example. Find the orthogonal matrix that diagonalizes the real symmetric matrix
 
    ( 1  β )
S = ( β  1 )    where β is real.                            (5.63)

Answer. The characteristic equation is


    | 1−λ    β  |
0 = |            | = (1 − λ)² − β² .                        (5.64)
    |  β   1−λ  |
The solutions to (5.64) are
λ₊ = 1 + β   and   λ₋ = 1 − β .                             (5.65)
The corresponding eigenvectors x^(±) are found by solving Sx^(±) = λ± x^(±), i.e.

( 1−λ±     β   ) ( x^(±)_1 )
(   β    1−λ±  ) ( x^(±)_2 ) = 0 ,                          (5.66a)

or

  ( ∓1   1 ) ( x^(±)_1 )
β (  1  ∓1 ) ( x^(±)_2 ) = 0 .                              (5.66b)



β ≠ 0. If β ≠ 0 (in which case λ₊ ≠ λ₋) we have that
x^(±)_2 = ±x^(±)_1 .                                        (5.67a)
On normalising x^(±) so that x^(±)† x^(±) = 1, it follows that
x^(+) = ±(1/√2) (1, 1)ᵀ ,   x^(−) = ±(1/√2) (1, −1)ᵀ .      (5.67b)
Note that x^(+)† x^(−) = 0, as proved earlier.
β = 0. If β = 0, then S = I, and so any non-zero vector is an eigenvector with eigenvalue 1. In
agreement with the result stated earlier, two linearly-independent eigenvectors can still be found,
and we can choose them to be orthonormal, e.g. x^(+) and x^(−) as above (in fact there is an
uncountable choice of orthonormal eigenvectors in this very special case).
To diagonalize S when β ≠ 0 (it already is diagonal if β = 0) we construct an orthogonal matrix R
using (5.60):

        ( x^(+)_1   x^(−)_1 )   ( 1/√2    1/√2 )          ( 1   1 )
R ≡ X = ( x^(+)_2   x^(−)_2 ) = ( 1/√2   −1/√2 ) = (1/√2) ( 1  −1 ) .       (5.68)

As a check we note that

RᵀR = (1/2) ( 1   1 ) ( 1   1 )   ( 1  0 )
            ( 1  −1 ) ( 1  −1 ) = ( 0  1 ) ,                (5.69)

and that

RᵀSR = (1/2) ( 1   1 ) ( 1  β ) ( 1   1 )
             ( 1  −1 ) ( β  1 ) ( 1  −1 )

     = (1/2) ( 1   1 ) ( 1+β    1−β )
             ( 1  −1 ) ( 1+β   −1+β )

     = ( 1+β    0  )
       (  0   1−β  )

     = Λ .                                                  (5.70)
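A numerical check of this example for one illustrative value of β (our choice, β = 0.3): RᵀSR should equal diag(1 + β, 1 − β).

```python
# Numerical check of (5.68)-(5.70): R^T S R = diag(1 + beta, 1 - beta).
import math

beta = 0.3
S = [[1.0, beta], [beta, 1.0]]
r = 1.0 / math.sqrt(2.0)
R = [[r, r], [r, -r]]                     # columns are x^(+) and x^(-)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

RT = [[R[j][i] for j in range(2)] for i in range(2)]
L = matmul(RT, matmul(S, R))

assert abs(L[0][0] - (1 + beta)) < 1e-12
assert abs(L[1][1] - (1 - beta)) < 1e-12
assert abs(L[0][1]) < 1e-12 and abs(L[1][0]) < 1e-12
```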
5.6.3 Diagonalization of Matrices
For a general n × n matrix M with n distinct eigenvalues λi , (i = 1, . . . , n), it is possible to show (but not
here) that there are n linearly independent eigenvectors x^i. It then follows from (5.53) that M is
diagonalized by the matrix
    ( x^1_1  x^2_1  · · ·  x^n_1 )
    ( x^1_2  x^2_2  · · ·  x^n_2 )
X = (   ..     ..     ..    ..   ) .                        (5.71)
    ( x^1_n  x^2_n  · · ·  x^n_n )
Remark. If M has two or more equal eigenvalues it may or may not have n linearly independent eigenvectors.
If it does not have n linearly independent eigenvectors then it is not diagonalizable. As an example
recall matrix (5.49a), i.e.
 
    ( 0  1 )
M = ( 0  0 ) ,                                              (5.72a)
which was shown to have only one linearly independent eigenvector, namely (5.49d):
 
x = (1, 0)ᵀ .                                               (5.72b)
Hence M is not diagonalizable.
Normal Matrices. As noted above, normal matrices always have n linearly independent eigenvectors, and
hence can always be diagonalized. So, in addition to Hermitian matrices, anti-Hermitian matrices
and unitary matrices (and their real restrictions) can always be diagonalized.



5.7 Applications of Diagonalization
The first application of diagonalization we consider concerns changes of basis.

5.7.1 Transformation Law for Metrics


In §5.2 we determined how vector components transform under a change of basis from {ui : i = 1, . . . , n}
to {u′i : i = 1, . . . , n}, while in §5.4 we introduced inner products and defined the metric associated with a
given basis. We next consider how a metric transforms under a change of basis.
First we recall from (5.28a) that for an arbitrary vector v, its components in the two bases transform
according to v = Av′ , where v and v′ are column vectors containing the components. From taking the
Hermitian conjugate of this expression, we also have that
v† = v′† A† . (5.73)
Hence for arbitrary vectors v and w

v · w = v† G w from (5.43)
′† † ′
= v A G Aw from (5.28a) and (5.73).
But from (5.43) we must also have that in terms of the new basis
v · w = v′† G′ w′ , (5.74)
where G′ is the metric in the new {u′i : i = 1, . . . , n} basis. Since v and w are arbitrary we conclude that
the metric in the new basis is given in terms of the metric in the old basis by
G′ = A† GA . (5.75)
Alternative derivation (unlectured). (5.75) can also be derived from the definition of the metric since
(G′ )ij ≡ G′ij = u′i · u′j from (5.40)
= (uk Aki ) · (uℓ Aℓj ) from (5.20a)
= A∗ki (uk · uℓ ) Aℓj from (5.34a) and (5.35)
= A†ik Gkℓ Aℓj from (5.40) and (5.44a)
= (A† GA)ij . (5.76)

Remark. As a check we observe that

(G′)† = (A†GA)† = A†G†(A†)† = A†GA = G′ .                   (5.77)

Thus G′ is confirmed to be Hermitian.
5.7.2 Diagonalization of the Metric
If in (5.75) we identify A with X, the matrix with columns consisting of the orthonormal eigenvectors of G,
then from (5.51a), (5.51b), (5.53) and § 5.6.2,
 
                ( λ1   0  · · ·   0 )
                (  0  λ2  · · ·   0 )
G′ = X†GX = Λ = (  ..   ..   ..  .. ) ,                     (5.78)
                (  0   0  · · ·  λn )
where the λi are the real eigenvalues of the Hermitian matrix G.
Property: the eigenvalues of a metric are strictly positive. From (5.45b), (5.78) and writing Λij = λi δij , we
have that
0 ⩽ v′†G′v′ = Σ_{i,j=1}^{n} v′∗_i λi δij v′_j = Σ_{i=1}^{n} λi |v′_i|² ,    (5.79a)

with equality only if v = 0. This can only be true for all vectors v′ if

λi > 0 for i = 1, . . . , n , (5.79b)


i.e. if the diagonal entries λi are strictly positive.



5.7.3 Orthonormal Bases
For a diagonalized metric, the new basis vectors {u′i : i = 1, . . . , n} are the eigenvectors of G since

u′j = ui Xij = ui xji (j = 1, . . . , n) . (5.80a)


Hence the new basis vectors are orthogonal; further, from (5.40) and (5.78),
u′i · u′j = G′ij = Λij = λi δij . (5.80b)
Hence, because the λi are strictly positive, we can normalise the basis, viz.
ei = (1/√λi) u′i ,                                          (5.81a)
so that
ei · ej = δij . (5.81b)

The {ei : i = 1, . . . , n} are thus an orthonormal basis. We conclude that:
any vector space with a scalar product has an orthonormal basis.
Since from (5.40) the elements of the metric are just ei · ej , the metric for an orthonormal basis is the
identity matrix I.

The scalar product in orthonormal bases. Let the column vectors v and w contain the components of two
vectors v and w, respectively, in an orthonormal basis {ei : i = 1, . . . , n}. Then from (5.43)
v · w = v† I w = v† w . (5.82)
This is consistent with the definition of the scalar product from last year.
Orthogonality in orthonormal bases. If the vectors v and w are orthogonal, i.e. v · w = 0, then the compo-
nents in an orthonormal basis are such that
v† w = 0 .                                                  (5.83)

5.7.4 Transformations Between Orthonormal Bases


Given an orthonormal basis, a question that arises is what changes of basis maintain orthonormality. Suppose
that {e′i : i = 1, . . . , n} is a new orthonormal basis, and suppose that in terms of the original orthonormal
basis
e′i = ek Uki , (s.c.) (5.84)
where U is the transformation matrix (cf. (5.20a)). Then from (5.75) the metric for the new basis is given
by
G′ = U† I U = U† U . (5.85a)
For the new basis to be orthonormal we require that the new metric to be the identity matrix, i.e. we require
that
U† U = I . (5.85b)
Since det U ̸= 0, the inverse U−1 exists and hence U must be unitary:
U† = U−1 . (5.86)

Vector spaces over R. An analogous result applies to vector spaces over R. Then, because the transformation
matrix, say U = R, is real,
U† = Rᵀ ,
and so R must be orthogonal:
RT = R−1 . (5.87)
Example. An example of an orthogonal matrix is the 3 × 3 rotation matrix R that determines the new
components, v′ = RT v, of a three-dimensional vector v after a rotation of the axes (note that
under a rotation orthogonal axes remain orthogonal and unit vectors remain unit vectors).



5.7.5 Worked example (unlectured)
By finding an orthonormal set of eigenvectors, diagonalize the Hermitian matrix
 
    (  0  i  0 )
H = ( −i  0  0 ) .
    (  0  0  1 )

The characteristic equation is

    | −λ   i    0  |
0 = | −i  −λ    0  | = (λ² − 1)(1 − λ) ,                    (5.88a)
    |  0   0  1−λ  |
with solutions
λ = 1, −1, 1 .                                              (5.88b)

Eigenvector, x(−1) , for λ = −1. We require


    
(  1  i  0 ) ( x )   ( 0 )
( −i  1  0 ) ( y ) = ( 0 ) ,                                (5.89a)
(  0  0  2 ) ( z )   ( 0 )
and hence
x^(−1) = (x, ix, 0)ᵀ .                                      (5.89b)
The normalized eigenvector, e^(−1), is thus, for α real,
e^(−1) = (e^{iα}/√2) (1, i, 0)ᵀ .                           (5.89c)

Eigenvectors, x(1) , for λ = 1. We require


    
( −1   i  0 ) ( x )   ( 0 )
( −i  −1  0 ) ( y ) = ( 0 ) ,                               (5.90a)
(  0   0  0 ) ( z )   ( 0 )
and hence
x^(1) = (x, −ix, z)ᵀ .                                      (5.90b)
The two independent variables x and z allow for a wide choice of eigenvectors, e^(1). Two normalised
orthogonal eigenvectors are, for β and γ real,
e^(1) = (e^{iβ}/√2) (1, −i, 0)ᵀ   and   e^{iγ} (0, 0, 1)ᵀ . (5.90c)
Remark. All three eigenvectors given in (5.89c) and (5.90c) can be confirmed to be orthonormal.

Diagonalisation of H. Using (5.62), (5.89c) and (5.90c), with the choices α = β = γ = 0, it follows that
 √ √   √ √   
1/√2 −i/√ 2 0 0 i 0 1/√ 2 1/ √2 0 −1 0 0
1/ 2 i/ 2 0 −i 0 0  i/ 2 −i/ 2 0 =  0 1 0 .
0 0 1 0 0 1 0 0 1 0 0 1
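The worked example can be confirmed numerically: with α = β = γ = 0, the matrix X of orthonormal eigenvectors should satisfy X†HX = diag(−1, 1, 1). A minimal sketch:

```python
# Check of the worked example with alpha = beta = gamma = 0: the matrix X of
# orthonormal eigenvectors satisfies X† H X = diag(-1, 1, 1).

H = [[0, 1j, 0], [-1j, 0, 0], [0, 0, 1]]
s = 2 ** -0.5
X = [[s, s, 0],            # columns: e^(-1), then the two e^(1) eigenvectors
     [1j * s, -1j * s, 0],
     [0, 0, 1]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Xdag = [[X[j][i].conjugate() for j in range(3)] for i in range(3)]   # X†
L = matmul(Xdag, matmul(H, X))

expected = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]
for i in range(3):
    for j in range(3):
        assert abs(L[i][j] - expected[i][j]) < 1e-12
```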



5.7.6 Uses of diagonalization
Because diagonal matrices can be multiplied easily (i.e. component-wise), certain operations on
diagonalizable matrices are more easily carried out using the representation (5.53):

X⁻¹MX = Λ   or   M = XΛX⁻¹ .                                (5.91)

Examples.

Mⁿ = (XΛX⁻¹)(XΛX⁻¹) . . . (XΛX⁻¹)
   = XΛⁿX⁻¹ ,                                               (5.92a)

det(M) = det(XΛX⁻¹)
       = det(X) det(Λ) det(X⁻¹)       using det(AB) = det(A) det(B)
       = det(Λ) = Π_{i=1}^{n} λi ,                          (5.92b)

tr(M) = tr(XΛX⁻¹)
      = tr(ΛX⁻¹X)                     using tr(AB) = Aij Bji = Bji Aij = tr(BA)
      = tr(Λ) = Σ_{i=1}^{n} λi ,                            (5.92c)

tr(Mⁿ) = tr(XΛⁿX⁻¹)
       = tr(ΛⁿX⁻¹X) = tr(Λⁿ) .                              (5.92d)

Remark. (5.92b) and (5.92c) are in fact true for all matrices (whether or not they are diagonalizable), as
follows from the product and sum of roots in the characteristic equation

det(A − λI) = det(A) + · · · + tr(A)(−λ)ⁿ⁻¹ + (−λ)ⁿ = 0 .   (5.93)
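These determinant and trace identities can be checked on a small example; the matrix M below, with eigenvalues 1 and 3, is our own choice.

```python
# Check of (5.92a)-(5.92d): M = [[2, 1], [1, 2]] has eigenvalues 1 and 3.

M = [[2, 1], [1, 2]]
eigs = [1, 3]

det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
tr_M = M[0][0] + M[1][1]
assert det_M == eigs[0] * eigs[1]            # det(M) = product of eigenvalues
assert tr_M == eigs[0] + eigs[1]             # tr(M)  = sum of eigenvalues

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# tr(M^n) = sum of lambda_i^n, here with n = 5
Mn = [[1, 0], [0, 1]]
for _ in range(5):
    Mn = matmul(Mn, M)
assert Mn[0][0] + Mn[1][1] == eigs[0] ** 5 + eigs[1] ** 5
```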

5.8 Forms
Definition: form. A map F(x),
F(x) = x†Ax = Σ_{i=1}^{n} Σ_{j=1}^{n} x∗_i Aij xj ,          (5.94a)
is called a [sesquilinear] form; A is called its coefficient matrix.


Definition: Hermitian form. If A = H is an Hermitian matrix, the map F(x) : Cⁿ → C, where
F(x) = x†Hx = Σ_{i=1}^{n} Σ_{j=1}^{n} x∗_i Hij xj ,          (5.94b)
is referred to as an Hermitian form on Cⁿ.


Hermitian forms are real. An Hermitian form is real since

(x† Hx)∗ = (x† Hx)† since a scalar is its own transpose


= x† H † x since (AB)† = B† A†
= x† Hx . since H is Hermitian

Definition: quadratic form. An important special case is obtained by restriction to real vector spaces; then
x and H are real. It follows that HT = H, i.e. H is a real symmetric matrix; let us denote such a matrix
by S. In this case
F(x) = xᵀSx = Σ_{i=1}^{n} Σ_{j=1}^{n} xi Sij xj .            (5.94c)

When considered as a function of the real variables x1 , x2 , . . . , xn , this expression is referred to as a
quadratic form on Rⁿ.



5.8.1 Eigenvectors and Principal Axes
From (5.62) the coefficient matrix, H, of a Hermitian form can be written as

H = UΛU† , (5.95a)

where U is unitary and Λ is a diagonal matrix of eigenvalues. Let

x′ = U † x , (5.95b)

then (5.94b) can be written as


F(x) = x†UΛU†x
     = x′†Λx′                                                (5.95c)
     = Σ_{i=1}^{n} λi |x′_i|² .                              (5.95d)

Transforming to a basis of orthonormal eigenvectors transforms the Hermitian form to a standard form with
no ‘off-diagonal’ terms. The orthonormal basis vectors that coincide with the eigenvectors of the coefficient
matrix, and which lead to the simplified version of the form, are known as principal axes.
Example. Let F(x) be the quadratic form

F(x) =2x2 − 4xy + 5y 2 = xT Sx , (5.96a)


where, by splitting the xy term equally between the off-diagonal matrix elements,
   
x = (x, y)ᵀ   and   S = (  2  −2 )
                        ( −2   5 ) .                         (5.96b)

What surface is described by F(x) = constant?


Solution. The eigenvalues of the real symmetric matrix S are λ1 = 1 and λ2 = 6, with corresponding
unit eigenvectors    
1 2 1 1
u1 = √ and u2 = √ . (5.96c)
5 1 5 −2
The orthogonal matrix  
1 2 1
Q= √ (5.96d)
5 1 −2
transforms the original orthonormal basis to a basis of principal axes. Hence S = QΛQT , where
Λ is a diagonal matrix of eigenvalues. It follows that F can be rewritten in the normalised form

F = xT QΛQT x = x′T Λx′ = x′2 + 6y ′2 , (5.96e)


where
x′ = Qᵀx ,   i.e.   ( x′ )          ( 2   1 ) ( x )
                    ( y′ ) = (1/√5) ( 1  −2 ) ( y ) .        (5.96f)

The surface F(x) = constant is thus an ellipse.


Remark. In diagonalizing S by transforming to its eigenvector basis, we are rotating the coordinates
to reduce the quadratic form to its simplest form.
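A spot check (random sample points, our own test harness) that the rotation (5.96f) really does reduce F to the normal form x′² + 6y′² of (5.96e):

```python
# Check that the rotation (5.96f) reduces F(x) = 2x^2 - 4xy + 5y^2 to the
# normal form x'^2 + 6 y'^2 of (5.96e).
import math
import random

s5 = math.sqrt(5.0)

def F(x, y):
    return 2 * x * x - 4 * x * y + 5 * y * y

def principal(x, y):
    # x' = Q^T x with Q = (1/sqrt 5) [[2, 1], [1, -2]]
    return (2 * x + y) / s5, (x - 2 * y) / s5

random.seed(1)
for _ in range(50):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    xp, yp = principal(x, y)
    assert abs(F(x, y) - (xp * xp + 6 * yp * yp)) < 1e-9
```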

5.8.2 Quadrics and conics


A quadric, or quadric surface, is the n-dimensional hypersurface defined by the zeros of a real quadratic
polynomial. For co-ordinates (x1 , . . . , xn ) the general quadric is defined by

xi Aij xj + bi xi + c ≡ xT Ax + bT x + c = 0 , (s.c.) (5.97a)


or equivalently
xj Aij xi + bi xi + c ≡ xT AT x + bT x + c = 0 , (s.c.) (5.97b)

where A is an n × n matrix, b is an n × 1 column vector and c is a constant. Let

S = ½ (A + Aᵀ) ,                                            (5.97c)

Natural Sciences Tripos: IB Mathematical Methods I 94 © [email protected], Michaelmas 2022


then from (5.97a) and (5.97b)
xT Sx + bT x + c = 0 . (5.97d)
By taking the principal axes as basis vectors it follows that
x′ᵀΛx′ + b′ᵀx′ + c = 0 ,                                    (5.97e)
where Λ = QT SQ, b′ = QT b and x′ = QT x. If Λ does not have a zero eigenvalue, then it is invertible and
(5.97e) can be simplified further by a translation of the origin
x′ → x′ − 21 Λ−1 b′ , (5.97f)
to obtain
x′ᵀΛx′ = k ,                                                (5.97g)
where k is a constant.

Conic Sections. First suppose that n = 2 and that Λ (or equivalently S) does not have a zero eigenvalue,
then with
x′ = (x′, y′)ᵀ   and   Λ = ( λ1   0 )
                           (  0  λ2 ) ,                     (5.98a)
(5.97g) becomes
λ1 x′2 + λ2 y ′2 = k , (5.98b)
which is the normalised equation of a conic section.

λ1 λ2 > 0. If λ1 λ2 > 0, then k must have the same sign as the λj (j = 1, 2), and (5.98b) is the
equation of an ellipse with principal axes coinciding with the x′ and y′ axes.

Scale. The scale of the ellipse is determined by k.
Shape. The shape of the ellipse is determined by the ratio of the eigenvalues λ1 and λ2 .
Orientation. The orientation of the ellipse in the original basis is determined by the eigenvectors of S.

In the degenerate case, λ1 = λ2 , the ellipse becomes a circle with no preferred principal axes. Any
two orthogonal (and hence linearly independent) vectors may be chosen as the principal axes.
λ1 λ2 < 0. If λ1 λ2 < 0 then (5.98b) is the equation for a hyperbola with principal axes coinciding
with the x′ and y′ axes. Similar results to above hold for the scale, shape and orientation.

λ1 λ2 = 0. If λ1 = λ2 = 0, then there is no quadratic term, so assume that only one eigenvalue is
zero; wlog λ2 = 0. Then instead of (5.97f), translate the origin according to

x′ → x′ − b′1/(2λ1) ,   y′ → y′ − c/b′2 + b′1²/(4λ1 b′2) ,  (5.99)

assuming b′2 ≠ 0, to obtain instead of (5.98b)

λ1 x′² + b′2 y′ = 0 .                                       (5.100)

This is the equation of a parabola with principal axes coinciding with the x′ and y′ axes. Similar
results to above hold for the scale, shape and orientation.

Remark. If b′2 = 0, the equation for the conic section can be reduced (after a translation) to
λ1 x′² = k (cf. (5.98b)), with possible solutions of zero (λ1 k < 0), one (k = 0) or two (λ1 k > 0) lines.

Natural Sciences Tripos: IB Mathematical Methods I 95 © [email protected], Michaelmas 2022


Figure 5.1: Ellipsoid (λ1 > 0, λ2 > 0, λ3 > 0, k > 0); Wikipedia.

Three Dimensions. If n = 3 and Λ does not have a zero eigenvalue, then with
 ′  
x λ1 0 0
x′ = y ′  and Λ =  0 λ2 0  , (5.101a)
z′ 0 0 λ3

(5.97g) becomes
λ1 x′2 + λ2 y ′2 + λ3 z ′2 = k . (5.101b)
When λi k > 0, r
k
the distance to surface along the ith principal axes = . (5.101c)
λi
Analogously to the case of two dimensions, this equation describes a number of characteristic surfaces.

Coefficients Quadric Surface


λ1 > 0, λ2 > 0, λ3 > 0, k > 0. Ellipsoid: this includes the case of metric matrices, since S is then
positive definite and the λi are all positive.
λ1 = λ2 . Surface of revolution about the z ′ axis.
λ1 = λ2 > 0, λ3 > 0, k > 0. Spheroid: the surface is a prolate spheroid if λ1 = λ2 > λ3 and
an oblate spheroid if λ1 = λ2 < λ3 .
λ1 = λ2 = λ3 > 0, k > 0. Sphere.
λ3 = 0. Cylinder.
λ1 > 0, λ2 > 0, λ3 = 0, k > 0. Elliptic cylinder.
λ1 > 0, λ2 > 0, λ3 < 0, k > 0. Hyperboloid of one sheet.
λ1 > 0, λ2 > 0, λ3 < 0, k = 0. Elliptical conical surface.
λ1 > 0, λ2 < 0, λ3 < 0, k > 0. Hyperboloid of two sheets.
λ1 > 0, λ2 = λ3 = 0, λ1 k ⩾ 0. Planes x′ = ±√(k/λ1) .

5.8.3 The Stationary Properties of the Eigenvalues


Suppose that we have an orthonormal basis, and let x be a point on xᵀSx = k where k is a constant.
Then from (5.82) the distance squared from the origin to the quadric surface is xᵀx. This distance
naturally depends on the value of k, i.e. the scale of the surface. This dependence on k can be
removed by considering the square of the relative distance to the surface, i.e.

(relative distance to surface)² = xᵀx / xᵀSx .              (5.102)



Figure 5.2: Prolate spheroid (λ1 = λ2 > λ3 > 0, k > 0) and oblate spheroid (0 < λ1 = λ2 < λ3 , k > 0);
Wikipedia.

Figure 5.3: Hyperboloid of one sheet (λ1 > 0, λ2 > 0, λ3 < 0, k > 0) and hyperboloid of two sheets
(λ1 > 0, λ2 < 0, λ3 < 0, k > 0); Wikipedia.

Figure 5.4: Paraboloid of revolution (λ1 x′2 + λ2 y ′2 + z ′ = 0, λ1 > 0, λ2 > 0) and hyperbolic paraboloid
(λ1 x′2 + λ2 y ′2 + z ′ = 0, λ1 < 0, λ2 > 0); Wikipedia.



Let us consider the directions for which this relative distance, or equivalently its inverse (referred to as the
Rayleigh quotient)

λ(x) = xᵀSx / xᵀx ,                                          (5.103)
is stationary. We can find the so-called first variation in λ(x) by letting

x → x + δx and xT → xT + δxT , (5.104)

by performing a Taylor expansion, and by ignoring terms quadratic or higher in |δx|. First note that

(xᵀ + δxᵀ)(x + δx) = xᵀx + xᵀδx + δxᵀx + . . .
                   = xᵀx + 2δxᵀx + . . . .                   since the transpose of a scalar is itself

Hence

1/((xᵀ + δxᵀ)(x + δx)) = 1/(xᵀx + 2δxᵀx + . . .)
                       = (1/xᵀx) (1 + 2δxᵀx/xᵀx + . . .)⁻¹
                       = (1/xᵀx) (1 − 2δxᵀx/xᵀx + . . .) .

Similarly

(xᵀ + δxᵀ)S(x + δx) = xᵀSx + xᵀSδx + δxᵀSx + . . .
                    = xᵀSx + 2δxᵀSx + . . . .                since Sᵀ = S

Putting the above results together we have that

δλ(x) ≡ λ(x + δx) − λ(x) = (xᵀ + δxᵀ)S(x + δx)/((xᵀ + δxᵀ)(x + δx)) − xᵀSx/xᵀx
     = (xᵀSx + 2δxᵀSx + . . .)(1/xᵀx)(1 − 2δxᵀx/xᵀx + . . .) − xᵀSx/xᵀx
     = 2δxᵀSx/xᵀx − (xᵀSx/xᵀx)(2δxᵀx/xᵀx) + . . .
     = (2/xᵀx)(δxᵀSx − λ(x)δxᵀx)
     = (2/xᵀx) δxᵀ(Sx − λ(x)x) .                             (5.105)

The Rayleigh–Ritz variational principle. Hence the first variation of λ(x) is zero for all possible δx when

Sx = λ(x)x ,                                                 (5.106)

i.e. when x is an eigenvector of S and λ is the associated eigenvalue. So the eigenvectors of S are the
directions which make the relative distance (5.102) stationary, and the eigenvalues are the values of
(5.103) at the stationary points.
By a similar argument one can show that the eigenvalues of an Hermitian matrix, H, are the values of the
function
x† Hx
λ(x) = † (5.107)
x x
18/22 at its stationary points.
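The stationarity property can be checked numerically. The sketch below (not part of the notes) uses Python with NumPy; the symmetric matrix S is a hypothetical example chosen for illustration. At each eigenvector the Rayleigh quotient (5.103) equals the corresponding eigenvalue, and perturbing an eigenvector changes λ only at second order in the perturbation, as (5.105) predicts.

```python
import numpy as np

# Hypothetical 3x3 real symmetric matrix, chosen only for illustration.
S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

def rayleigh(S, x):
    """Rayleigh quotient lambda(x) = x^T S x / x^T x, cf. (5.103)."""
    return x @ S @ x / (x @ x)

eigvals, eigvecs = np.linalg.eigh(S)   # eigen-decomposition of the symmetric S

# At each eigenvector the quotient equals the associated eigenvalue.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.isclose(rayleigh(S, v), lam)

# The first variation vanishes there: a perturbation of size eps changes
# lambda only at second order, O(eps^2), cf. (5.105).
v = eigvecs[:, 0]
rng = np.random.default_rng(0)
dx = rng.standard_normal(3)
for eps in (1e-3, 1e-4):
    dlam = rayleigh(S, v + eps * dx) - rayleigh(S, v)
    assert abs(dlam) < 10 * eps**2 * np.linalg.norm(dx)**2 * np.linalg.norm(S)
```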



6 Elementary Analysis

6.0 Why Study This?

Analysis is one of the foundations upon which mathematics is built. It is the careful study of infinite processes
such as limits, convergence, continuity, differential and integral calculus. This section covers some of the
basic concepts including the important problem of the convergence of infinite series, since you need to have
an idea of when, and when not, you can sum a series, e.g. a Fourier series.
We also discuss the remarkable properties of analytic functions of a complex variable.

6.1 Sequences and Limits


6.1.1 Sequences

A sequence is a set of real or complex numbers, s_n, defined for all integers n ⩾ n0 and occurring in order.
If the sequence is unending we have an infinite sequence.
Example. If the nth term of a sequence is s_n = 1/n, the sequence is

    1, 1/2, 1/3, 1/4, . . . .    (6.1)

6.1.2 Sequences tending to a limit, or not.


Possible behaviours of the sn as n increases are:
(i) sn tends towards a particular value;
(ii) sn does not tend to any value but remains limited in magnitude;
(iii) sn is unlimited in magnitude.

Sequences tending to a finite limit. A sequence, sn , is said to tend to the limit s if, given any positive ε,
there exists N ≡ N (ε) such that

|sn − s| < ε for all n>N. (6.2a)

In other words the members of the sequence are eventually contained within an arbitrarily small disk
centred on s. We write this as

    lim_{n→∞} s_n = s    or as    s_n → s as n → ∞ .    (6.2b)

Examples.
(i) Suppose s_n = n^{−α} for any α > 0. Given 0 < ε < 1 it follows that

        |n^{−α} − 0| < ε    for all    n > N(ε) = (1/ε)^{1/α} .    (6.3a)

    Hence
        lim_{n→∞} n^{−α} = 0    for any α > 0.    (6.3b)

(ii) Suppose s_n = x^n with |x| < 1. Given 0 < ε < 1 let N(ε) be the smallest integer such that, for a given x,

        N > log(1/ε) / log(1/|x|) .    (6.4a)

     Then, if n > N,
        |s_n − 0| = |x|^n < |x|^N < ε .    (6.4b)

     Hence
        lim_{n→∞} x^n = 0 .    (6.4c)
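The bound (6.4a) is easy to check numerically. In the Python sketch below (not part of the notes), the values x = 0.8 and ε = 10⁻³ are arbitrary illustrative choices:

```python
import math

# Check (6.4a): for s_n = x^n with |x| < 1, any integer
# N > log(1/eps) / log(1/|x|) guarantees |x|^n < eps for all n > N.
x, eps = 0.8, 1e-3   # illustrative values, not from the text
N = math.floor(math.log(1 / eps) / math.log(1 / abs(x))) + 1  # smallest such N

for n in range(N + 1, N + 100):
    assert abs(x) ** n < eps
```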



Cauchy’s principle of convergence. A necessary and sufficient condition for the sequence sn to converge is
that, for any positive number ε, |sn+m − sn | < ε for all positive integers m, for sufficiently large n.
This condition does not require a knowledge of the value of the limit s.
Bounded sequences. If sn does not tend to a limit it may nevertheless be bounded.
Definition. The sequence sn is bounded as n → ∞ if there exists a positive number K such that
|sn | < K for sufficiently large n.
Example. Suppose

        s_n = ((n + 1)/n) e^{inα} .    (6.5a)

    Then for all n ⩾ 2

        |s_n| = (n + 1)/n < 2 .    (6.5b)

    We conclude that the sequence s_n is bounded.

Property. An increasing sequence tends either to a limit or to +∞. Hence a bounded increasing
sequence tends to a limit, i.e. if

    s_{n+1} > s_n , and s_n < K ∈ R for all n, then s = lim_{n→∞} s_n exists.    (6.6)

Remark. You really ought to have a proof of this property, but I do not have time.22
Sequences tending to infinity. A sequence, sn , is said to tend to infinity if given any A (however large), there
exists N ≡ N (A) such that
sn > A for all n > N . (6.7a)
We then write
sn → ∞ as n → ∞. (6.7b)
Similarly we say that sn → −∞ as n → ∞ if given any A (however large), there exists N ≡ N (A)
such that
sn < −A for all n > N . (6.7c)

Oscillating sequences. If a sequence does not tend to a limit or ±∞, then sn is said to oscillate. If sn
oscillates and is bounded, it oscillates finitely, otherwise it oscillates infinitely.

6.2 Convergence of Infinite Series


6.2.1 Convergent and divergent series

This section is concerned with the meaning of an infinite series such as

    ∑_{r=r0}^{∞} u_r    (6.8)

involving the addition of an infinite number of terms.

Definition: Partial sum. Given an infinite sequence of numbers u_1, u_2, . . . , define the partial sum s_n by

    s_n = ∑_{r=r0}^{n} u_r .    (6.9)

Definition: Convergent series. If as n → ∞, s_n tends to a finite limit, s, then we say that the infinite series

    ∑_{r=r0}^{∞} u_r ,    (6.10)

converges (or is convergent), and that s is its sum.

22 Alternatively you can view this property as an axiom that specifies the real numbers R essentially uniquely.



Definition: Divergent series. An infinite series which is not convergent is called divergent.
Remarks.
(i) Whether a series converges or diverges does not depend on the value of r0 (i.e. on when the series
begins) but only on the behaviour of the terms for large r.
(ii) According to Cauchy's principle of convergence, a necessary and sufficient condition for ∑ u_r to converge is that, for any positive number ε,

        |s_{n+m} − s_n| = |u_{n+1} + u_{n+2} + · · · + u_{n+m}| < ε    (6.11)

     for all positive integers m, for sufficiently large n.
Example: the convergence of a geometric series. The series

    ∑_{r=0}^{∞} z^r = 1 + z + z^2 + z^3 + . . . ,    (6.12a)

converges to (1 − z)^{−1} provided that |z| < 1.

Proof. Consider the partial sum

    s_n = 1 + z + · · · + z^{n−1} = (1 − z^n)/(1 − z) .    (6.12b)

If |z| < 1, then from (6.4c) we have that z^n → 0 as n → ∞, and hence

    s = lim_{n→∞} s_n = 1/(1 − z)    for |z| < 1.    (6.12c)

However if |z| ⩾ 1 the series diverges.
Example: the divergence of the harmonic series. The harmonic series, for which u_r = r^{−1}, diverges.

Proof. Consider the partial sum

    s_n = ∑_{r=1}^{n} 1/r = 1 + 1/2 + 1/3 + · · · + 1/n .    (6.13a)

Then

    s_n > ∫_1^{n+1} dx/x = ln(n + 1) .    (6.13b)

Therefore s_n increases without bound and does not tend to a limit as n → ∞.

Alternative Proof. Consider s_{2^m} where m is an integer. First we note that

    m = 1 :  s_2 = 1 + 1/2
    m = 2 :  s_4 = s_2 + 1/3 + 1/4 > s_2 + 1/4 + 1/4 = 1 + 1/2 + 1/2
    m = 3 :  s_8 = s_4 + 1/5 + 1/6 + 1/7 + 1/8 > s_4 + 1/8 + 1/8 + 1/8 + 1/8 > 1 + 1/2 + 1/2 + 1/2 .

Similarly we can show that (e.g. by induction)

    s_{2^m} > 1 + m/2 ,    (6.13c)

and hence the series is divergent.
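Both bounds are straightforward to verify numerically; the following Python sketch (an illustration, not part of the notes) checks (6.13b) and (6.13c) for a few values:

```python
import math

def harmonic(n):
    """Partial sum s_n = sum of 1/r for r = 1, ..., n, cf. (6.13a)."""
    return sum(1.0 / r for r in range(1, n + 1))

# The integral bound (6.13b): s_n > ln(n+1), so s_n grows without bound.
for n in (10, 100, 1000):
    assert harmonic(n) > math.log(n + 1)

# The doubling bound (6.13c): s_{2^m} > 1 + m/2 (checked for m >= 2,
# since m = 1 is the equality case shown in the text).
for m in (2, 3, 10):
    assert harmonic(2 ** m) > 1 + m / 2
```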

6.2.2 A necessary condition for convergence


A necessary condition for ∑ u_r to converge is that u_r → 0 as r → ∞.

Proof. Using the fact that u_r = s_r − s_{r−1} we have that

    lim_{r→∞} u_r = lim_{r→∞} (s_r − s_{r−1}) = lim_{r→∞} s_r − lim_{r→∞} s_{r−1} = s − s = 0 .    (6.14)

Remark. However, as illustrated by the harmonic series (see (6.13a), (6.13b) and (6.13c)), ur → 0 as r → ∞
is not a sufficient condition for convergence.



6.2.3 Absolute and conditional convergence

Definition: Absolute convergence. A series ∑ u_r is said to converge absolutely if

    ∑_{r=1}^{∞} |u_r|    (6.15)

converges.
Property. If ∑ |u_r| converges, then ∑ u_r also converges.

Proof. If ∑ |u_r| converges then, for any positive number ε,

    |u_{n+1}| + |u_{n+2}| + · · · + |u_{n+m}| < ε    (6.16a)

for all positive integers m, for sufficiently large n. But then

    |u_{n+1} + u_{n+2} + · · · + u_{n+m}| ⩽ |u_{n+1}| + |u_{n+2}| + · · · + |u_{n+m}| < ε ,    (6.16b)

and so ∑ u_r also converges.

Definition: Conditional convergence. If ∑ |u_r| diverges, then ∑ u_r may or may not converge. If ∑ u_r does converge, it is said to converge conditionally.
Example. Suppose that

    u_r = (−1)^{r−1} (1/r)    so that    s_n = ∑_{r=1}^{n} (−1)^{r−1} (1/r) = 1 − 1/2 + 1/3 − · · · + (−1)^{n−1} (1/n) .    (6.17a)

Then, from the Taylor expansion

    log(1 + x) = − ∑_{r=1}^{∞} (−x)^r / r ,    (6.17b)

we spot that s = lim_{n→∞} s_n = log 2; hence ∑_{r=1}^{∞} u_r converges. However, from (6.13a), (6.13b) and (6.13c) we already know that ∑_{r=1}^{∞} |u_r| diverges. Hence ∑_{r=1}^{∞} u_r is conditionally convergent.
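A short Python sketch (illustrative only) confirms that the partial sums approach log 2; the error bound 1/(n+1) used in the assertions is the standard alternating-series estimate, not stated in the text:

```python
import math

def s(n):
    """Partial sum of the conditionally convergent series (6.17a)."""
    return sum((-1) ** (r - 1) / r for r in range(1, n + 1))

# s_n -> log 2 even though the absolute series (the harmonic series) diverges.
# For an alternating series with decreasing terms, successive partial sums
# bracket the limit, so the error is below the first omitted term 1/(n+1).
for n in (10, 1000):
    assert abs(s(n) - math.log(2)) < 1 / (n + 1)
```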

6.3 Tests of Convergence


6.3.1 The comparison test
This test applies to series of non-negative real numbers.
Statement. Suppose that we are given that v_r > 0 and that S = ∑_{r=1}^{∞} v_r is convergent. Suppose also that

    0 < u_r < K v_r    (6.18a)

for some K independent of r. Then the infinite series

    ∑_{r=1}^{∞} u_r    (6.18b)

is also convergent.

Proof. Since u_r > 0, s_n = ∑_{r=1}^{n} u_r is an increasing sequence. Further

    s_n = ∑_{r=1}^{n} u_r < K ∑_{r=1}^{n} v_r ,    (6.19a)

and thus

    lim_{n→∞} s_n < K ∑_{r=1}^{∞} v_r = K S ,    (6.19b)

i.e. s_n is an increasing bounded sequence. Thence from (6.6), ∑_{r=1}^{∞} u_r is convergent.

Remark. Similarly if ∑_{r=1}^{∞} v_r diverges, v_r > 0 and u_r > K v_r for some K independent of r, then ∑_{r=1}^{∞} u_r diverges.



6.3.2 D'Alembert's ratio test

This uses a comparison between a given series ∑ u_r of complex terms and a geometric series ∑ v_r = ∑ ϱ^r, where ϱ > 0.

D'Alembert's ratio test. We start by supposing that the u_r are real and positive, i.e. u_r > 0. Define the ratio of successive terms to be

    ϱ_r = u_{r+1}/u_r ,    (6.20a)

and suppose that ϱ_r tends to a limit ϱ as r → ∞, i.e.

    lim_{r→∞} u_{r+1}/u_r = ϱ .    (6.20b)

Then ∑ u_r converges if ϱ < 1 and diverges if ϱ > 1.

Proof.

ϱ < 1. For the case ϱ < 1, choose σ with ϱ < σ < 1. Then there exists N ≡ N(σ) such that

    u_{r+1}/u_r < σ    for all    r > N .    (6.21a)

It follows that

    ∑_{r=1}^{∞} u_r = ∑_{r=1}^{N} u_r + u_{N+1} (1 + u_{N+2}/u_{N+1} + (u_{N+2}/u_{N+1})(u_{N+3}/u_{N+2}) + . . .)
                    < ∑_{r=1}^{N} u_r + u_{N+1} (1 + σ + σ^2 + . . .)    by hypothesis
                    < ∑_{r=1}^{N} u_r + u_{N+1}/(1 − σ)    by (6.12c) since σ < 1.    (6.21b)

We conclude that ∑_{r=1}^{∞} u_r is bounded. Thence, since s_n = ∑_{r=1}^{n} u_r is an increasing sequence, it follows from (6.6) that ∑ u_r converges.

ϱ > 1. For the case ϱ > 1, choose τ with 1 < τ < ϱ. Then there exists M ≡ M(τ) such that

    u_{r+1}/u_r > τ > 1    for all    r > M ,    (6.22a)

and hence

    u_r/u_M > τ^{r−M} > 1    for all    r > M .    (6.22b)

Thus, since u_r ̸→ 0 as r → ∞, we conclude that ∑ u_r diverges.

Corollary. A series ∑ u_r of complex terms converges if the limit of the absolute ratio of successive terms is less than one, i.e. if

    lim_{r→∞} |u_{r+1}/u_r| = ϱ < 1 .    (6.23)

Proof. D'Alembert's ratio test shows that ∑ |u_r| converges, i.e. ∑ u_r converges absolutely, thence from § 6.2.3 we conclude that ∑ u_r converges.

Example. For the harmonic series ∑ r^{−1},

    ϱ_r = r/(r + 1) → 1    as r → ∞ ,    (6.24)

from which nothing can be concluded. A different test is required, such as the integral comparison test.

Remark. The ratio test can not be used for series in which some of the terms are zero. However, it can sometimes be adapted by relabelling the series to remove the vanishing terms.
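As an illustration of the ratio test in practice, consider the hypothetical series ∑ r/2^r (my choice, not an example from the notes): its term ratio tends to 1/2 < 1, so it converges (to 2, in fact). Exact rational arithmetic keeps the check free of rounding error:

```python
from fractions import Fraction

def ratio(u, r):
    """Successive-term ratio rho_r = u(r+1)/u(r), cf. (6.20a)."""
    return u(r + 1) / u(r)

def u(r):
    # Illustrative series u_r = r / 2^r, kept as exact rationals.
    return Fraction(r, 2 ** r)

# rho_r = (r+1)/(2r) -> 1/2 < 1, so the series converges by the ratio test.
assert abs(ratio(u, 1000) - Fraction(1, 2)) < Fraction(1, 1000)

# The sum is 2; the partial sum to r = 59 is already extremely close.
partial = sum(u(r) for r in range(1, 60))
assert abs(partial - 2) < Fraction(1, 10 ** 12)
```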



6.3.3 Cauchy's test

Suppose that u_r > 0 and that

    lim_{r→∞} u_r^{1/r} = ϱ .    (6.25)

Then ∑ u_r converges if ϱ < 1, while ∑ u_r diverges if ϱ > 1.

Proof. First suppose that ϱ < 1. Choose σ with ϱ < σ < 1. Then there exists N ≡ N(σ) such that

    u_r^{1/r} < σ ,    i.e.    u_r < σ^r    for all    r > N .    (6.26a)

It follows that

    ∑_{r=1}^{∞} u_r < ∑_{r=1}^{N} u_r + ∑_{r=N+1}^{∞} σ^r .    (6.26b)

We conclude that ∑_{r=1}^{∞} u_r is bounded (since σ < 1). Moreover s_n = ∑_{r=1}^{n} u_r is an increasing sequence, and hence from (6.6) we conclude that ∑ u_r converges.

Next suppose that ϱ > 1. Choose τ with 1 < τ < ϱ. Then there exists M ≡ M(τ) such that

    u_r^{1/r} > τ > 1 ,    i.e.    u_r > τ^r > 1 ,    for all r > M .    (6.26c)

Thus, since u_r ̸→ 0 as r → ∞, ∑ u_r must diverge.

6.4 Functions of a Continuous Variable


6.4.1 Limits and continuity
Building on the ideas introduced in § 6.1.2 on sequences tending to a limit, in this section we consider how
a real or complex function f (z) of a real or complex variable z behaves near a point z0 .
Definition: Limit at a point z = z0 . The function f (z) tends to the limit L as z → z0 if, for any positive
number ε, there exists a positive number δ, depending on ε, such that |f (z) − L| < ε for all z such
that |z − z0 | < δ.
We write this as

    lim_{z→z0} f(z) = L    or    f(z) → L as z → z0 .    (6.27a)

Remark. The value of L would normally be f(z0). However, cases such as

    lim_{z→0} (sin z)/z = 1    (6.27b)

must be expressed as limits because sin 0/0 = 0/0 is indeterminate.
Definition: Continuity at a point z = z0 . The function f (z) is continuous at the point z = z0 if f (z) → f (z0 )
as z → z0 .
Definition: Bounded at a point z = z0 . The function f (z) is bounded as z → z0 if there exist positive num-
bers K and δ such that |f (z)| < K for all z with |z − z0 | < δ.
Definition: Limit as z → ∞. The function f (z) tends to the limit L as z → ∞ if, for any positive number ε,
there exists a positive number R, depending on ε, such that |f (z) − L| < ε for all z with |z| > R. We
write this as
lim f (z) = L or f (z) → L as z → ∞ . (6.28)
z→∞

Definition: Bounded as z → ∞. The function f (z) is bounded as z → ∞ if there exist positive numbers K
and R such that |f (z)| < K for all z with |z| > R.
Warning: approaches to a point. There are different ways in which z can approach z0 or ∞, especially in
the complex plane. Sometimes the limit or bound applies only if the point is approached in a particular
way. For example, consider tanh(z) as |z| → ∞ for z real:
    lim_{z→+∞} tanh z = 1 ,    lim_{z→−∞} tanh z = −1 .    (6.29a)

This notation implies that z is approaching positive or negative real infinity along the real axis. But
if z approaches infinity along the imaginary axis, i.e. z → ±i∞, the limit of tanh is not defined.



Remark. In the context of real variables, x → ∞ usually means specifically x → +∞. A related
notation for one-sided limits is exemplified by

    lim_{x→0+} x(1 + x)/|x| = 1 ,    lim_{x→0−} x(1 + x)/|x| = −1 .    (6.29b)

6.4.2 The O notation


The symbols O, o and ∼ are often used to compare the rates of growth or decay of different functions. Suppose that f(z) and g(z) are functions of z, then

(i)   if f(z)/g(z) is bounded as z → z0 , we say that f(z) = O(g(z)) as z → z0 ;

(ii)  if f(z)/g(z) → 0 as z → z0 , we say that f(z) = o(g(z)) as z → z0 ;

(iii) if f(z)/g(z) → 1 as z → z0 , we say that f(z) ∼ g(z) as z → z0 .

Remarks.

(i) These definitions continue to apply when z0 = ∞.


(ii) f (z) = O(1) means that f (z) is bounded.
(iii) Either f (z) = o(g(z)) or f (z) ∼ g(z) implies f (z) = O(g(z)).
(iv) Only f (z) ∼ g(z) is a symmetric relation (so should not be written as f (z) → g(z)).
(v) If f (z) ∼ g(z) we say that f is asymptotically equal to g.
(vi) The O notation is often used in conjunction with truncated Taylor series, e.g. for small (z − z0 )

    f(z) = f(z0) + (z − z0) f′(z0) + (1/2)(z − z0)^2 f′′(z0) + O((z − z0)^3) .    (6.30)


Examples.
(i) As z → 0 we have that

2 sin z = O(1) since 2 sin z/1 is bounded as z → 0;


2 sin z = o(1) since 2 sin z/1 → 0 as z → 0;
2 sin z = O(z) since 2 sin z/z is bounded as z → 0;
sin z ∼ z since sin z/z → 1 as z → 0.

(ii) As x → +∞ we have that

        ln x = o(x)    since ln x/x → 0 as x → +∞;
        cosh x ∼ (1/2) e^x    since 2 cosh x/e^x → 1 as x → +∞.

6.5 Taylor’s Theorem for Functions of a Real Variable


Let f (x) be a (real or complex) function of a real variable x, which is differentiable at least n times in the
interval x0 ⩽ x ⩽ x0 + h. Then the Taylor series of f (x0 + h) is given by

    f(x0 + h) = f(x0) + h f′(x0) + (h^2/2!) f′′(x0) + . . . + (h^{n−1}/(n−1)!) f^{(n−1)}(x0) + R_n ,    (6.31a)

where the remainder after n terms, R_n, can be shewn to be (by integrating by parts)

    R_n = ∫_{x0}^{x0+h} ((x0 + h − x)^{n−1}/(n−1)!) f^{(n)}(x) dx .    (6.31b)



Alternative forms of the remainder. The remainder term can be expressed in alternative ways. Lagrange's expression for the remainder is

    R_n = (h^n/n!) f^{(n)}(ξ) ,    (6.32a)

where ξ is an unknown number in the interval x0 < ξ < x0 + h. It follows that

    R_n = O(h^n) .    (6.32b)

Smooth functions. If f(x) is a smooth function, i.e. if f(x) is infinitely differentiable in x0 ⩽ x ⩽ x0 + h, we can write (6.31a) as an infinite Taylor series:

    f(x0 + h) = ∑_{n=0}^{∞} (h^n/n!) f^{(n)}(x0) .    (6.33)

This power series in h converges for sufficiently small h (see § 6.8).
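The scaling (6.32b) can be seen numerically: halving h should reduce R_n by roughly a factor 2^n. The Python sketch below uses f(x) = e^x about x0 = 0 with n = 3, an illustrative choice not taken from the text:

```python
import math

def remainder(h, n):
    """R_n = e^h minus the first n terms of its Taylor series about 0."""
    return math.exp(h) - sum(h ** k / math.factorial(k) for k in range(n))

# Halving h should shrink R_n by roughly 2^n, since R_n = O(h^n), cf. (6.32b).
n = 3
shrink = remainder(0.1, n) / remainder(0.05, n)
assert 0.8 * 2 ** n < shrink < 1.2 * 2 ** n
```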

6.6 Analytic Functions of a Complex Variable
6.6.1 Complex differentiability

Definition: Complex differentiability. The complex derivative of the function f(z) at the point z = z0 is defined as

    f′(z0) = lim_{z→z0} (f(z) − f(z0))/(z − z0) ,    (6.34a)

where the same limit must be obtained for any sequence of complex values for z that tends to z0. If this same limit exists, the function f(z) is said to be complex differentiable at z = z0.

Alternative expression. Another way to write this is

    df/dz ≡ f′(z) = lim_{δz→0} (f(z + δz) − f(z))/δz ,    (6.34b)

where the limit must be the same when δz → 0 by any route/direction in the complex plane.

Remark. Requiring a function of a complex variable to be differentiable is a surprisingly strong constraint.

6.6.2 The Cauchy–Riemann equations


Express f = u + iv and z = x + iy in terms of their real and imaginary parts:

    f(z) = u(x, y) + i v(x, y) .    (6.35)

If f′(z) exists we can calculate it by assuming that δz = δx + i δy approaches 0 along the real axis, i.e. by taking δy = 0; then

    f′(z) = lim_{δx→0} (f(z + δx) − f(z))/δx
          = lim_{δx→0} (u(x + δx, y) + i v(x + δx, y) − u(x, y) − i v(x, y))/δx
          = lim_{δx→0} (u(x + δx, y) − u(x, y))/δx + i lim_{δx→0} (v(x + δx, y) − v(x, y))/δx
          = ∂u/∂x + i ∂v/∂x .    (6.36a)

However, from definition (6.34a), the derivative must have the same value if δz approaches 0 along the imaginary axis, i.e. by taking δx = 0; then

    f′(z) = lim_{δy→0} (f(z + i δy) − f(z))/(i δy)
          = lim_{δy→0} (u(x, y + δy) + i v(x, y + δy) − u(x, y) − i v(x, y))/(i δy)
          = −i lim_{δy→0} (u(x, y + δy) − u(x, y))/δy + lim_{δy→0} (v(x, y + δy) − v(x, y))/δy
          = −i ∂u/∂y + ∂v/∂y .    (6.36b)

Comparing the real and imaginary parts of (6.36a) and (6.36b), we deduce the Cauchy–Riemann equations

    ∂u/∂x = ∂v/∂y ,    ∂v/∂x = −∂u/∂y .    (6.37)

These are necessary conditions for f(z) to have a complex derivative. They are also sufficient conditions, provided that the partial derivatives are also continuous.
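The direction-independence that underlies (6.36a) and (6.36b) can be checked numerically. The sketch below (illustrative, using Python's cmath) compares finite-difference approximations to f′(z) for f(z) = exp(z) along the real and imaginary axes; the sample point, step size and tolerances are arbitrary choices:

```python
import cmath

# Approximate the complex derivative (6.34b) of f(z) = exp(z) with dz
# approaching 0 along two different directions.
f = cmath.exp
z, h = 1.0 + 2.0j, 1e-6

d_real = (f(z + h) - f(z)) / h               # dz = dx along the real axis
d_imag = (f(z + 1j * h) - f(z)) / (1j * h)   # dz = i dy along the imaginary axis

# Both approximations agree, consistent with f being analytic ...
assert abs(d_real - d_imag) < 1e-5
# ... and both equal exp(z), cf. (6.38c).
assert abs(d_real - cmath.exp(z)) < 1e-5
```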

6.6.3 Analytic functions

Definition: Analytic function. If a function f (z) has a complex derivative at every point z in a region R of
the complex plane, it is said to be analytic in R.23
Remark. To be analytic at a point z = z0 , f (z) must be differentiable throughout some neighbourhood
|z − z0 | < ε of that point.
Definition: Entire function. An entire function is one that is analytic in the whole complex plane.

Examples of entire functions. Each of the following can be confirmed to satisfy the Cauchy–Riemann equations (6.37) in the whole complex plane.
(i) f (z) = c: a complex constant.
(ii) f (z) = z: for which u = x and v = y.
(iii) f(z) = exp(z): for which

        f(z) = e^z = e^x e^{iy} = e^x cos y + i e^x sin y = u + iv .    (6.38a)

      The Cauchy–Riemann equations are satisfied for all x and y since

        ∂u/∂x = e^x cos y = ∂v/∂y    and    ∂v/∂x = e^x sin y = −∂u/∂y .    (6.38b)

      As expected the derivative of the exponential function is

        f′(z) = ∂u/∂x + i ∂v/∂x = e^x cos y + i e^x sin y = e^z .    (6.38c)
(iv) f(z) = z^n, where n is a positive integer: for which

        x = r cos θ ,  y = r sin θ ,  u = r^n cos nθ  and  v = r^n sin nθ .    (6.39)


Properties.
(i) Sums, products and compositions of analytic functions are also analytic, e.g.

        (a) f(z) = P(z) = c_n z^n + c_{n−1} z^{n−1} + · · · + c_0    where c_r ∈ C,    (6.40a)
        (b) f(z) = z exp(i z^2) + z^3 .    (6.40b)

(ii) The usual product, quotient and chain rules apply to complex derivatives of analytic functions, e.g.

        d(z^n)/dz = n z^{n−1} ,    d(sin z)/dz = cos z ,    d(cosh z)/dz = sinh z .    (6.40c)
Definition: Singular points. Many complex functions are analytic everywhere in the complex plane except
at isolated points, which are called the singular points or singularities of the function.
Examples.
(i) f (z) = P (z)/Q(z), where P (z) and Q(z) are polynomials. This is called a rational function
and is analytic except at points where Q(z) = 0.
(ii) f (z) = z c is analytic except at z = 0 if c is a complex constant which is not a positive integer
(see (6.39) for the case when c is a positive integer).
(iii) f (z) = ln z is analytic except at z = 0.
The last two examples are in fact multiple-valued functions, which require special treatment (see next
term).

23 Some use this definition for holomorphic functions and use analytic for functions with a convergent power series. However,

in complex analysis holomorphic functions are analytic and vice versa.


Examples of non-analytic functions.
(i) f (z) = Re (z), for which u = x and v = 0, so the Cauchy–Riemann equations are not satisfied
anywhere.
(ii) f (z) = z ∗ , for which u = x and v = −y.
(iii) f (z) = |z|, for which u = (x2 + y 2 )1/2 and v = 0.
(iv) f (z) = |z|2 , for which u = x2 + y 2 and v = 0.
Remark. In this case the Cauchy–Riemann equations are satisfied only at x = y = 0 and we can
say that f ′ (0) = 0. However, f (z) is not analytic even at z = 0 because it is not differentiable
throughout any neighbourhood |z| < ε of 0.
6.6.4 Consequences of the Cauchy–Riemann equations
If we know the real part of an analytic function in some region, we can find its imaginary part (or vice
versa) up to an additive constant by integrating the Cauchy–Riemann equations.

Example. Suppose that you are given u(x, y) = x^2 − y^2, then what is the analytic function? From (6.37)

    ∂v/∂y = ∂u/∂x = 2x    ⇒    v = 2xy + g(x) ,    (6.41a)
    ∂v/∂x = −∂u/∂y = 2y    ⇒    2y + g′(x) = 2y    ⇒    g′(x) = 0 .    (6.41b)

Therefore v(x, y) = 2xy + c, where c is a real constant, and we find that

    f(z) = x^2 − y^2 + i(2xy + c) = (x + iy)^2 + ic = z^2 + ic .    (6.41c)
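As a lightweight check of (6.41c), plain Python complex arithmetic confirms that the recovered u and v are the real and imaginary parts of z^2 (taking c = 0); the sample points are arbitrary:

```python
# Check (6.41c): u = x^2 - y^2 and v = 2xy are the real and imaginary
# parts of the analytic function f(z) = z^2 (taking the constant c = 0).
def f(z):
    return z * z

for z in (1 + 2j, -0.5 + 0.3j, 3 - 4j):
    x, y = z.real, z.imag
    u = x * x - y * y
    v = 2 * x * y
    assert abs(f(z) - complex(u, v)) < 1e-12
```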

Property: u and v are harmonic functions. The real and imaginary parts of an analytic function satisfy Laplace's equation (they are harmonic functions):

    ∂^2u/∂x^2 + ∂^2u/∂y^2 = ∂/∂x(∂u/∂x) + ∂/∂y(∂u/∂y)
                          = ∂/∂x(∂v/∂y) + ∂/∂y(−∂v/∂x)
                          = 0 .    (6.42)

The proof that ∇^2 v = 0 is similar.

Remark. This property provides a useful method for solving Laplace's equation in two dimensions: one "just" needs to find an analytic function that satisfies the boundary conditions.

Property: u and v are conjugate harmonic functions. Using the Cauchy–Riemann equations (6.37), we see that

    ∇u · ∇v = (∂u/∂x)(∂v/∂x) + (∂u/∂y)(∂v/∂y)
            = (∂v/∂y)(∂v/∂x) − (∂v/∂x)(∂v/∂y)
            = 0 .    (6.43)

Hence the curves of constant u and those of constant v are orthogonal: u and v are said to be conjugate harmonic functions.

6.6.5 Taylor series for analytic functions

If a function of a complex variable is analytic in a region R of the complex plane, not only is it differentiable
everywhere in R, it is also differentiable any number of times. It follows that if f (z) is analytic at z = z0 ,
it has an infinite Taylor series

    f(z) = ∑_{n=0}^{∞} a_n (z − z0)^n ,    where    a_n = (1/n!) f^{(n)}(z0) ≡ (1/n!) (d^n f/dz^n)(z0) .    (6.44)
As discussed in § 6.8, this series converges within some neighbourhood of z0 .
Alternative definition of analyticity. An alternative definition of the analyticity of a function f (z) at z = z0
is that f (z) has a Taylor series expansion about z = z0 with a non-zero radius of convergence.



6.7 Zeros, Poles and Essential Singularities

6.7.1 Zeros of complex functions


Definition: Order. The zeros of f(z) are the points z = z0 in the complex plane where f(z0) = 0. A zero is of order N if

    f(z0) = f′(z0) = f′′(z0) = · · · = f^{(N−1)}(z0) = 0    but    f^{(N)}(z0) ̸= 0 .    (6.45a)

The first non-zero term in the Taylor series of f(z) about z = z0 is then proportional to (z − z0)^N. Indeed

    f(z) ∼ a_N (z − z0)^N    as z → z0 .    (6.45b)

A simple zero is a zero of order 1. A double zero is one of order 2, etc.
Examples.

(i) f (z) = z has a simple zero at z = 0.
(ii) f (z) = (z − i)2 has a double zero at z = i.
(iii) f (z) = z 2 − 1 = (z − 1)(z + 1) has simple zeros at z = ±1.
Worked exercise. Find and classify the zeros of f(z) = sinh z.

Answer. Now

    sinh z = (1/2)(e^z − e^{−z}) = 0

if

    e^z = e^{−z}    ⇒    e^{2z} = 1    ⇒    z = nπi ,    n ∈ Z .

Since f′(z) = cosh z, and cosh(nπi) = cos(nπ) = (−1)^n ̸= 0 at these points, all the zeros are simple zeros.

6.7.2 Poles of complex functions


Definition: Order. Suppose g(z) is analytic and non-zero at z = z0. Consider the function

    f(z) = (z − z0)^{−N} g(z) ,    (6.46a)

in which case

    f(z) ∼ g(z0)(z − z0)^{−N}    as z → z0 .    (6.46b)

f(z) is not analytic at z = z0, and we say that f(z) has a pole of order N. We refer to a pole of order 1 as a simple pole, a pole of order 2 as a double pole, etc.
Expansion of f(z) near a pole. Because g(z) is analytic, from (6.44) it has a Taylor series expansion at z0:

    g(z) = ∑_{n=0}^{∞} b_n (z − z0)^n    with    b_0 ̸= 0 .    (6.47a)

Hence

    f(z) = (z − z0)^{−N} g(z) = ∑_{n=−N}^{∞} a_n (z − z0)^n ,    (6.47b)

with a_n = b_{n+N}, and a_{−N} ̸= 0. This is not a Taylor series because it includes negative powers of z − z0, and f(z) is not analytic at z = z0.
Remarks.
(i) If f (z) has a zero of order N at z = z0 , then 1/f (z) has a pole of order N there, and vice versa.
(ii) If f (z) is analytic and non-zero at z = z0 and g(z) has a zero of order N there, then f (z)/g(z)
has a pole of order N there.
Worked exercise. Find and characterise the poles of

    f(z) = 2z / ((z + 1)(z − i)^2) .    (6.48a)



Answer. f(z) has a simple pole at z = −1 and a double pole at z = i (as well as a simple zero at z = 0). To expand about the double pole let z = i + w and expand in w:

    f(z) = 2(i + w) / ((i + w + 1) w^2)
         = 2i(1 − iw) / ((i + 1)(1 + (1/2)(1 − i)w) w^2)
         = (2i/((i + 1) w^2)) (1 − iw)(1 − (1/2)(1 − i)w + O(w^2))
         = (1 + i) w^{−2} (1 − (1/2)(1 + i)w + O(w^2))
         = (1 + i)(z − i)^{−2} − i(z − i)^{−1} + O(1)    as z → i.    (6.48b)
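The leading Laurent coefficients in (6.48b) can be recovered numerically: multiplying by (z − i)^2 isolates a_{−2}, and subtracting the a_{−2} term then isolates a_{−1}. The offset w and tolerances in this Python sketch are arbitrary illustrative choices:

```python
# Numerical check of the Laurent coefficients in (6.48b): near the double
# pole z = i, f(z) ~ (1+i)(z-i)^-2 - i(z-i)^-1 + O(1).
def f(z):
    return 2 * z / ((z + 1) * (z - 1j) ** 2)

w = 1e-3                                     # small real offset from the pole
a_m2 = f(1j + w) * w ** 2                    # -> a_{-2} = 1 + i as w -> 0
a_m1 = (f(1j + w) - (1 + 1j) / w ** 2) * w   # -> a_{-1} = -i as w -> 0

assert abs(a_m2 - (1 + 1j)) < 5e-3
assert abs(a_m1 - (-1j)) < 5e-3
```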

6.7.3 Laurent series and essential singularities

Definition: Laurent series. It can be shown that any function that is analytic (and single-valued) throughout an annulus α < |z − z0| < β centred on a point z = z0 has a unique Laurent series,

    f(z) = ∑_{n=−∞}^{∞} a_n (z − z0)^n ,    (6.49)

which converges for all values of z within the annulus.


If α = 0, then f (z) is analytic throughout the disk |z − z0 | < β except possibly at z = z0 itself, and the
Laurent series determines the behaviour of f (z) near z = z0 . There are three possibilities:

(i) If the first non-zero term in the Laurent series has n ⩾ 0, then f (z) is analytic at z = z0 and the series
is just a Taylor series.
(ii) If the first non-zero term in the Laurent series has n = −N < 0, then f (z) has a pole of order N at
z = z0 .
(iii) Otherwise, if the Laurent series involves an infinite number of terms with n < 0, then f (z) has an
essential singularity at z = z0 .

Example of an essential singularity. An example of an essential singularity is f(z) = e^{1/z} at z = 0, where the Laurent series can be generated from a Taylor series in 1/z:

    e^{1/z} = ∑_{n=0}^{∞} (1/n!) (1/z)^n = ∑_{n=−∞}^{0} (1/(−n)!) z^n .    (6.50)

Remark. The behaviour of a function near an essential singularity is remarkably complicated. Picard’s
theorem states that, in any neighbourhood of an essential singularity, the function takes all possible
complex values (possibly with one exception) at infinitely many points. In the case of f (z) = e1/z , the
exceptional value 0 is never attained.

6.7.4 Behaviour at infinity


We can examine the behaviour of a function f (z) as z → ∞ by defining a new variable ζ = 1/z and a new
function g(ζ) = f (z). Then z = ∞ maps to a single point ζ = 0, the point at infinity.
If g(ζ) has a zero, pole or essential singularity at ζ = 0, then we can say that f (z) has the corresponding
property at z = ∞.
Examples.

(i)   f_1(z) = e^z = e^{1/ζ} = g_1(ζ)    (6.51a)
      has an essential singularity at z = ∞.

(ii)  f_2(z) = z^2 = 1/ζ^2 = g_2(ζ)    (6.51b)
      has a double pole at z = ∞.

(iii) f_3(z) = e^{1/z} = e^ζ = g_3(ζ)    (6.51c)
      is analytic at z = ∞.
Remark. It can be shewn that all entire functions f (z) have essential singularities at z = ∞ unless they are
polynomials, and all polynomials have poles at z = ∞ unless they are constant.

6.8 Power Series of a Complex Variable


6.8.1 Convergence of Power Series
A power series about z = z0 of a complex variable has the form

    f(z) = ∑_{r=0}^{∞} a_r (z − z0)^r    where    a_r ∈ C .    (6.52)

Hence the Taylor series for an analytic function, (6.44), is a power series.
Many of the tests of convergence for real series discussed in § 6.3 can be generalised for complex series.
Indeed, we have already noted that if the sum of the absolute values of a complex series converges, i.e. if ∑ |u_r| converges, then so does the series, i.e. ∑ u_r. Hence if ∑ |a_r (z − z0)^r| converges, so does ∑ a_r (z − z0)^r.
P P

6.8.2 Radius of convergence


If the power series (6.52) converges for z = z1, then the series converges absolutely for all z such that |z − z0| < |z1 − z0|.

Proof. Since ∑ a_r (z1 − z0)^r converges, then from the necessary condition for convergence in § 6.2.2,

    lim_{r→∞} a_r (z1 − z0)^r = 0 .    (6.53a)

Hence for a given ε there exists N ≡ N(ε) such that if r > N then

    |a_r (z1 − z0)^r| < ε .    (6.53b)

Thus for r > N

    |a_r (z − z0)^r| = |a_r (z1 − z0)^r| |(z − z0)/(z1 − z0)|^r
                     < ε ϱ^r    where    ϱ = |(z − z0)/(z1 − z0)| .    (6.53c)

Thus, by means of a comparison with a geometric series, ∑ a_r (z − z0)^r converges for ϱ < 1, i.e. for |z − z0| < |z1 − z0|.

Corollary. If the sum diverges for z = z1 then it diverges for all z such that |z − z0 | > |z1 − z0 |. For suppose
that it were to converge for some such z = z2 with |z2 − z0 | > |z1 − z0 |, then it would converge for
z = z1 by the above result; this is in contradiction to the hypothesis.
Definition: Radius and circle of convergence. These results imply there must exist a real, non-negative number R such that

    ∑ a_r (z − z0)^r converges for |z − z0| < R ,
    ∑ a_r (z − z0)^r diverges for |z − z0| > R .    (6.54)

R is called the radius of convergence, and |z − z0| = R is called the circle of convergence, within which the series converges and outside of which it diverges.
Remarks.
(i) The radius of convergence may be zero (exceptionally), positive or infinite.
(ii) On the circle of convergence, the series may either converge or diverge.
(iii) The radius of convergence of the Taylor series of a function f (z) about the point z = z0 is equal
to the distance of the nearest singular point of the function f (z) from z0 . Since a convergent
power series defines an analytic function, no singularity can lie inside the circle of convergence.
6.8.3 Determination of the radius of convergence
Without loss of generality take z0 = 0, so that (6.52) becomes

X
f (z) = ur where ur = ar z r . (6.55)
r=0

Use D’Alembert’s ratio test. If the limit exists, then

    lim_{r→∞} |a_{r+1}/a_r| = 1/R .                                  (6.56a)

Proof. We have that

    lim_{r→∞} |u_{r+1}/u_r| = lim_{r→∞} |a_{r+1}/a_r| |z| = |z|/R   by hypothesis (6.56a).

Hence the series converges absolutely by D’Alembert’s ratio test if |z| < R. On the other hand, if |z| > R, then

    lim_{r→∞} |u_{r+1}/u_r| = |z|/R > 1 .                            (6.56b)

Hence u_r ̸→ 0 as r → ∞, and so the series does not converge. It follows that R is the radius of convergence.
Remark. The limit (6.56a) may not exist, e.g. if a_r = 0 for r odd, then |a_{r+1}/a_r| is alternately 0 or ∞.
Use Cauchy’s test (unlectured). If the limit exists, then

    lim_{r→∞} |a_r|^{1/r} = 1/R .                                    (6.57a)

Proof. We have that

    lim_{r→∞} |u_r|^{1/r} = lim_{r→∞} |a_r|^{1/r} |z| = |z|/R   by hypothesis.   (6.57b)

Hence the series converges absolutely by Cauchy’s test if |z| < R. On the other hand, if |z| > R, choose τ with 1 < τ < |z|/R. Then there exists M ≡ M(τ) such that

    |u_r|^{1/r} > τ > 1 ,   i.e.   |u_r| > τ^r > 1 ,   for all r > M .

Thus, since u_r ̸→ 0 as r → ∞, ∑ u_r must diverge. It follows that R is the radius of convergence.

6.8.4 Examples
(i) Suppose that a_r = 1 for all r; then f(z) is the geometric series

        f(z) = ∑_{r=0}^∞ z^r .                                       (6.58a)

    Both D’Alembert’s ratio test, (6.56a), and Cauchy’s test, (6.57a), give R = 1:

        |a_{r+1}/a_r| = 1   and   |a_r|^{1/r} = 1   for all r.       (6.58b)

    Hence the series converges for |z| < 1. In fact

        f(z) = 1/(1 − z) ,                                           (6.58c)

    where we note that it is the singularity at z = 1 which determines the radius of convergence.



(ii) Suppose next that a_r = (−1)^{r−1}/r for all r ⩾ 1; then f(z) is the series

        f(z) = − ∑_{r=1}^∞ (−z)^r/r = z − z²/2 + z³/3 − ··· .        (6.59a)

    D’Alembert’s ratio test gives

        1/R = lim_{r→∞} |a_{r+1}/a_r| = lim_{r→∞} r/(r + 1) = 1 .    (6.59b)

    For Cauchy’s test, we first note that

        lim_{r→∞} log |a_r|^{1/r} = lim_{r→∞} (1/r) log(1/r) = 0 ,   (6.59c)

    and thence, as for D’Alembert’s ratio test,

        1/R = lim_{r→∞} |a_r|^{1/r} = 1 .                            (6.59d)

    Remark. The series converges to log(1 + z) for |z| < 1, where the singularity at z = −1 limits the radius of convergence. In fact it can be shewn that the series converges on the circle |z| = 1 except at the point z = −1.
(iii) If a_r = 1/r! for all r, then f(z) is the series

        f(z) = ∑_{r=0}^∞ z^r/r! .                                    (6.60a)

    D’Alembert’s ratio test gives an infinite radius of convergence:

        1/R = lim_{r→∞} |a_{r+1}/a_r| = lim_{r→∞} 1/(r + 1) = 0 .    (6.60b)

    For Cauchy’s test, we first note, using Stirling’s formula,24 that

        log |a_r|^{1/r} = − (1/r) log r! ∼ − log r   as r → ∞ ,      (6.60c)

    and thence we confirm an infinite radius of convergence:

        1/R = lim_{r→∞} |a_r|^{1/r} = 0 .                            (6.60d)

    The series converges to e^z for all finite z, which is an entire function.
(iv) Instead consider a_r = r!. This has zero radius of convergence since by D’Alembert’s ratio test

        |a_{r+1}/a_r| = r + 1 → ∞   as r → ∞.                        (6.61)

    This conclusion can be confirmed using Cauchy’s test. The series ∑_{r=0}^∞ r! z^r fails to define a function since it does not converge for any non-zero z.
(v) Finally consider

        ∑_{r=0}^∞ (1/(2r + 1)) z (−z²)^r = z − z³/3 + z⁵/5 − z⁷/7 + ··· = arctan z .   (6.62)

    Thought of as a power series in (−z²), this has |a_{r+1}/a_r| = (2r + 1)/(2r + 3) → 1 as r → ∞. Therefore R = 1 in terms of (−z²). But since |−z²| = 1 is equivalent to |z| = 1, the series converges for |z| < 1 and diverges for |z| > 1.

24 Stirling’s formula states that log r! ∼ r log r − r + (1/2) log(2πr) as r → ∞.


7 Series Solutions of Ordinary Differential Equations

7.0 Why Study This?


Numerous scientific phenomena are described by differential equations. Even if these are partial differential
equations, they can often be reduced to ordinary differential equations (ODEs), e.g. by the method of sepa-
ration of variables. This section is about extending your armoury for solving ordinary differential equations,
such as those that arise in quantum mechanics and electrodynamics.
The ordinary differential equations encountered are often linear and of either first or second order. In
particular, second-order equations can describe oscillatory phenomena.

7.1 Classification
7.1.1 First-order linear ordinary differential equations

The general linear first-order inhomogeneous ODE,

y ′ (x) + p(x)y(x) = f (x) , (7.1a)

can be solved using the integrating factor

    g = exp( ∫ p(x) dx ) ,                                           (7.1b)

to obtain the general solution

    y = (1/g) ( ∫ g f dx + constant ) .                              (7.1c)
Provided that the integrals can be evaluated, the problem is completely solved. An equivalent method does
not exist for second-order ODEs, but an extensive theory can still be developed.
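As a concrete illustration of (7.1a)–(7.1c) (my own example, not one from the notes): take p(x) = 1 and f(x) = x, so that g = e^x, ∫ x e^x dx = (x − 1) e^x, and hence y = x − 1 + C e^{−x}. The sketch below verifies numerically that this y satisfies y′ + y = x:

```python
import math

def y_general(x, C):
    # From (7.1c) with p = 1, f = x: g = e^x and the integral is (x - 1) e^x,
    # so y = (x - 1) + C e^{-x}.
    return (x - 1.0) + C * math.exp(-x)

def residual(x, C, h=1e-6):
    # y' + y - x, with y' approximated by a centred finite difference.
    dy = (y_general(x + h, C) - y_general(x - h, C)) / (2 * h)
    return dy + y_general(x, C) - x

for x in (0.5, 1.0, 2.0):
    assert abs(residual(x, C=3.0)) < 1e-8
print("y' + y = x satisfied")
```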

7.1.2 Second-order ordinary differential equations


The general second-order ODE is an equation of the form

F (y ′′ , y ′ , y, x) = 0 , (7.2)

for an unknown function y(x), where y ′ = dy/dx, y ′′ = d2 y/dx2 and F is a known function.

7.1.3 Second-order linear ordinary differential equations


The general linear second-order ordinary differential equation for y(x) has the form

Ly(x) = g(x) , (7.3a)

where L is a linear operator such that

Ly(x) = a(x)y ′′ + b(x)y ′ + c(x)y . (7.3b)

We recover the standard form (2.12a) by dividing through by the coefficient of y ′′ to obtain

y ′′ (x) + p(x)y ′ (x) + q(x)y(x) = f (x) . (7.3c)

Recall from § 2.2 that if f (x) = 0 the equation is said to be homogeneous, otherwise it is said to be
inhomogeneous.
Remarks.
(i) The principle of superposition applies to linear ODEs as to all linear equations.
(ii) Although the solution may be of interest only for real x, it is often informative to analyse the
solution in the complex domain.



7.2 Homogeneous Second-Order Linear ODEs
7.2.1 Linearly independent solutions
Recall from § 2.2.1 that if y1 (x) and y2 (x) are two solutions of the homogeneous equation
y ′′ + p(x)y ′ + q(x)y = 0 , (7.4a)
they are linearly independent if
αy1 (x) + βy2 (x) = 0 for all x implies α = β = 0 , (7.4b)
i.e. if one is not simply a constant multiple of the other.
If y1 (x) and y2 (x) are linearly independent solutions, then the general solution of (7.4a) is

    y(x) = α y1 (x) + β y2 (x) ,                                     (7.4c)

where α and β are arbitrary constants, and there are two arbitrary constants because the equation is of
second order.

7.2.2 The Wronskian


Recall from (2.17a) that the Wronskian W (x) of two solutions y1 (x) and y2 (x) of a second-order ODE is
the determinant of the Wronskian matrix:
    W [y1 , y2 ] = | y1   y2  | = y1 y2′ − y2 y1′ .                  (7.5a)
                   | y1′  y2′ |
Recall also that if αy1 (x) + βy2 (x) = 0 in some interval of x then, by differentiation, αy1′ (x) + βy2′ (x) = 0,
and so in matrix form

    ( y1   y2  ) ( α )   ( 0 )
    ( y1′  y2′ ) ( β ) = ( 0 ) .                                     (7.5b)
If this is satisfied for non-trivial α and β, then W = 0 in that interval of x. Hence if W ̸= 0, the solutions y1 and y2 must be linearly independent.

7.2.3 The Calculation of W


We can derive a differential equation for the Wronskian, since
W ′ = y1 y2′′ − y1′′ y2 from (7.5a) since the y1′ y2′ terms cancel
= −y1 (py2′ + qy2 ) + (py1′ + qy1 )y2 using equation (7.4a)
= −p(y1 y2′ − y1′ y2 ) since the qy1 y2 terms cancel
= −pW from definition (7.5a). (7.6a)
This is a first-order equation for W , viz.
W ′ + p(x)W = 0 , (7.6b)
with solution, cf. (7.1c),

    W (x) = κ exp( − ∫^x p(ζ) dζ ) ,                                 (7.6c)

where κ is a constant (a change in lower limit of integration can be absorbed by a rescaling of κ).
Remarks.
(i) Up to the multiplicative constant κ, the Wronskian W can be shown to be the same for any two
linearly independent solutions y1 and y2 , and hence it is an intrinsic property of the ODE.
(ii) If W ̸= 0 for one value of x (and p is integrable) then W ̸= 0 for all x (since exp y > 0 for all y).
Hence if y1 and y2 are linearly independent for one value of x, they are linearly independent for
all values of x; it follows that linear independence need be checked at only one value of x. In the
case that y1 and y2 are known implicitly, e.g. in terms of series or integrals, this means that we
just have to find one value of x where it is relatively easy to evaluate W in order to confirm (or otherwise) linear independence.



7.2.4 A Second Solution via the Wronskian
Suppose that we already have one solution, say y1 , to the homogeneous equation. Then we can calculate a
second linearly independent solution, y2 , using the Wronskian as follows.
First, the definition of the Wronskian, (7.5a), provides a first-order linear ODE for the unknown y2 :
y1 y2′ − y1′ y2 = W (x) . (7.7)
To solve, divide by y1² to obtain

    (y2 /y1 )′ = y2′/y1 − y2 y1′/y1² = W/y1² .
Now integrate both sides and use (7.6c) to obtain
    y2 (x) = y1 (x) ∫^x ( W (η)/y1²(η) ) dη                          (7.8a)
           = y1 (x) ∫^x ( κ/y1²(η) ) exp( − ∫^η p(ζ) dζ ) dη .       (7.8b)
In principle this allows us to compute y2 given y1 .
Remarks.
(i) The indefinite integral involves an arbitrary additive constant, since any amount of y1 can be
added to y2 .
(ii) W involves an arbitrary multiplicative constant, since y2 can be multiplied by any constant.
(iii) This expression for y2 therefore provides the general solution of the homogeneous ODE.
Example. Given that y = xn is a solution of x2 y ′′ − (2n − 1)xy ′ + n2 y = 0, find the general solution.
Answer. First write the ODE in the standard form (7.3c):

    y′′ − ((2n − 1)/x) y′ + (n²/x²) y = 0 .                          (7.9a)
Next calculate the Wronskian (7.6c):

    W = κ exp( − ∫^x p(ζ) dζ ) = κ exp( ∫^x ((2n − 1)/ζ) dζ )
      = κ exp( (2n − 1) ln x + constant )
      = Λ x^{2n−1} ,                                                 (7.9b)
for any non-zero constant Λ. Finally, calculate the second solution from (7.8a):

    y2 = y1 ∫^x ( W (η)/y1²(η) ) dη = x^n ∫^x (Λ/η) dη
       = Λ x^n ln x + B x^n .                                        (7.9c)

Remark. The same result can be obtained by writing y2 (x) = y1 (x)u(x) and obtaining a first-order
linear ODE for u′ . This method applies to higher-order linear ODEs and is reminiscent of the
factorization of polynomial equations.
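The Wronskian (7.9b) and the second solution (7.9c) can be checked numerically with finite differences. The sketch below is my own (illustrative choices n = 3, Λ = 1, B = 0), not part of the notes:

```python
import math

n = 3  # illustrative choice; any positive integer works

def y1(x): return x ** n                    # the given solution
def y2(x): return x ** n * math.log(x)      # (7.9c) with Lambda = 1, B = 0

def d(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)  # centred first derivative

def d2(f, x, h=1e-5):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# (7.9b): W = y1 y2' - y1' y2 should equal x^{2n-1} (Lambda = 1 here).
for x in (0.5, 1.0, 2.0):
    W = y1(x) * d(y2, x) - d(y1, x) * y2(x)
    assert abs(W - x ** (2 * n - 1)) < 1e-5

# y2 should satisfy x^2 y'' - (2n - 1) x y' + n^2 y = 0.
x = 1.5
res = x**2 * d2(y2, x) - (2*n - 1) * x * d(y2, x) + n**2 * y2(x)
assert abs(res) < 1e-3
print("Wronskian (7.9b) and second solution (7.9c) verified numerically")
```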
Example. Given that y1 (x) is a solution of Bessel’s equation of zeroth order,
    y′′ + (1/x) y′ + y = 0 ,                                         (7.10a)
find another independent solution in terms of y1 for x > 0.
Answer. In this case p(x) = 1/x and hence
    y2 (x) = y1 (x) ∫^x ( κ/y1²(η) ) exp( − ∫^η (1/ζ) dζ ) dη
           = κ y1 (x) ∫^x ( 1/(η y1²(η)) ) dη .                      (7.10b)



7.3 Taylor Series Solutions
7.3.1 Ordinary and singular points

It is useful now to generalize to complex functions y(z) of a complex variable z. The homogeneous linear
second-order ODE (7.4a) in standard form then becomes
y ′′ (z) + p(z)y ′ (z) + q(z)y(z) = 0 . (7.11)

Definition: ordinary and singular points. If p(z) and q(z) are both analytic at z = z0 (i.e. they have power series expansions (6.44) about z = z0 ), then z = z0 is called an ordinary point of the ODE. A point at which p and/or q is singular, i.e. a point at which p and/or q or one of its derivatives is infinite, is called a singular point of the ODE.

Definition: regular singular points. A singular point z = z0 is regular if:
(z − z0 )p(z) and (z − z0 )2 q(z) are both analytic at z = z0 .
Example. Consider Legendre’s equation
(1 − z 2 )y ′′ − 2zy ′ + ℓ(ℓ + 1)y = 0 , (7.12a)
where ℓ is a constant. To identify the singular points and their nature, we divide through by (1 − z 2 )
to obtain the standard form with

    p(z) = − 2z/(1 − z²) ,   q(z) = ℓ(ℓ + 1)/(1 − z²) .              (7.12b)
Both p(z) and q(z) are analytic for all z except z = ±1, which are the singular points. However, they
are both regular since

    (z − 1) p(z) = 2z/(1 + z)   and   (z − 1)² q(z) = ℓ(ℓ + 1) (1 − z)/(1 + z)   (7.12c)

are both analytic at z = 1, and similarly for z = −1.
7.3.2 The solution at ordinary points in terms of a power series
If z = z0 is an ordinary point of (7.11), then we claim that y(z) is analytic
at z = z0 , and consequently the equation has two linearly independent
solutions of the form (see (6.44))

    y = ∑_{n=0}^∞ a_n (z − z0 )^n   when |z − z0 | < R ,             (7.13)

where R is the radius of convergence. The coefficients a_n can be determined by substituting the series into the equation and comparing powers of (z − z0 ). The radius of convergence turns out to be the distance to the nearest singular point of the equation in the complex plane.
For simplicity we will assume henceforth wlog that z0 = 0 (which corresponds to a shift in the origin of the z-plane, e.g. define z′ = z − z0 so that z′ = 0 is the point of expansion, and then substitute z for z′ ). Hence we seek solutions of the form

    y = ∑_{n=0}^∞ a_n z^n ≡ ∑_{m=0}^∞ a_m z^m ,                      (7.14a)

for which

    y′ = ∑_{n=1}^∞ n a_n z^{n−1} = ∑_{m=0}^∞ (m + 1) a_{m+1} z^m   substituting m = n − 1,   (7.14b)

    y′′ = ∑_{n=2}^∞ n(n − 1) a_n z^{n−2} = ∑_{r=0}^∞ (r + 2)(r + 1) a_{r+2} z^r   substituting r = n − 2.   (7.14c)



At an ordinary point p(z) and q(z) are analytic so we can write

    p(z) = ∑_{n=0}^∞ p_n z^n   and   q(z) = ∑_{n=0}^∞ q_n z^n .      (7.15)

On substituting the above series into equation (7.11) we will need a rule for multiplying double sums of the form

    ∑_{n=0}^∞ A_n z^n ∑_{m=0}^∞ B_m z^m                              (7.16a)

to only include powers like z^r. Let r = n + m and then note that (cf. a change of variables in a double integral)

    ∑_{n=0}^∞ ∑_{m=0}^∞ •(n, m) = ∑_{r=0}^∞ ∑_{m=0}^r •(r − m, m) ,  (7.16b)

since n = r − m ⩾ 0. In which case (7.16a) can be rewritten as

    ∑_{r=0}^∞ ( ∑_{m=0}^r A_{r−m} B_m ) z^r .                        (7.16c)

Hence from series (7.14a), (7.14b) and (7.15)

    p(z) y′(z) = ∑_{n=0}^∞ p_n z^n ∑_{m=0}^∞ (m + 1) a_{m+1} z^m = ∑_{r=0}^∞ ( ∑_{m=0}^r p_{r−m} (m + 1) a_{m+1} ) z^r ,   (7.17a)

    q(z) y(z) = ∑_{n=0}^∞ q_n z^n ∑_{m=0}^∞ a_m z^m = ∑_{r=0}^∞ ( ∑_{m=0}^r q_{r−m} a_m ) z^r .   (7.17b)

Now substitute series (7.14c), (7.17a) and (7.17b) into equation (7.11), and group powers of z^r, to obtain

    ∑_{r=0}^∞ ( (r + 2)(r + 1) a_{r+2} + ∑_{m=0}^r ( (m + 1) a_{m+1} p_{r−m} + a_m q_{r−m} ) ) z^r = 0 .   (7.18)

Since this expression is true for all |z| < R, each coefficient of z r (r = 0, 1, . . . ) must be zero. Thus we
deduce the recurrence relation

    a_{r+2} = − (1/((r + 2)(r + 1))) ∑_{m=0}^r ( (m + 1) a_{m+1} p_{r−m} + a_m q_{r−m} )   for r ⩾ 0 .   (7.19)

This is a recurrence relation that determines ar+2 (for r ⩾ 0) in terms of the preceding coefficients
a0 , a1 , . . . , ar+1 . This means that if a0 and a1 are known then so are all the ar . The first two coefficients a0
and a1 play the rôle of the two integration constants in the general solution.
Remarks.
(i) The above procedure is rarely followed to the letter. For instance, if p and q are rational functions
(i.e. ratios of polynomials) it is a much better idea to multiply the equation through by a suitable
factor to clear denominators before substituting in the power series for y, y ′ and y ′′ .
(ii) Proof that the radius of convergence of the series (7.14a) is non-zero is more difficult, and we will
not attempt such a task in general. However we shall discuss the issue for examples.
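The recurrence (7.19) translates directly into code. The sketch below is my own (not part of the notes), using exact arithmetic via `fractions`; as a sanity check, p = 0 and q = 1 (i.e. y″ + y = 0) with a0 = 1, a1 = 0 reproduces the cosine series:

```python
from fractions import Fraction

def series_coeffs(p, q, a0, a1, N):
    """Taylor coefficients a_0..a_N about an ordinary point via the recurrence
    (7.19); p and q are the Taylor coefficient lists of p(z) and q(z)
    (implicitly padded with zeros)."""
    P = lambda k: p[k] if k < len(p) else Fraction(0)
    Q = lambda k: q[k] if k < len(q) else Fraction(0)
    a = [Fraction(a0), Fraction(a1)]
    for r in range(N - 1):
        s = sum((m + 1) * a[m + 1] * P(r - m) + a[m] * Q(r - m)
                for m in range(r + 1))
        a.append(-s / ((r + 2) * (r + 1)))
    return a

# y'' + y = 0 with a0 = 1, a1 = 0: a_2 = -1/2, a_4 = 1/24, a_6 = -1/720 (cos z).
a = series_coeffs([], [Fraction(1)], 1, 0, 6)
assert a[2] == Fraction(-1, 2) and a[4] == Fraction(1, 24) and a[6] == Fraction(-1, 720)
print("recurrence (7.19) reproduces the cosine series")
```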

7.3.3 Example (possibly unlectured)


Consider

    y′′ − ( 2/(1 − z)² ) y = 0 .                                     (7.20)
z = 0 is an ordinary point so try

    y = ∑_{n=0}^∞ a_n z^n .                                          (7.21)



We note that

    p = 0 ,   q = − 2/(1 − z)² = −2 ∑_{m=0}^∞ (m + 1) z^m ,          (7.22a)

and hence in the terminology of the previous subsection p_m = 0 and q_m = −2(m + 1). Substituting into (7.19) we obtain the recurrence relation

    a_{r+2} = (2/((r + 2)(r + 1))) ∑_{n=0}^r a_n (r − n + 1)   for r ⩾ 0.   (7.22b)

However, with a small amount of forethought we can obtain a simpler, if equivalent, recurrence relation.
First multiply (7.20) by (1 − z)2 to obtain

(1 − z)2 y ′′ − 2y = 0 ,

and then substitute (7.21) into this equation. We find, on expanding (1 − z)² = 1 − 2z + z², that

    ∑_{n=2}^∞ n(n − 1) a_n z^{n−2} − 2 ∑_{n=1}^∞ n(n − 1) a_n z^{n−1} + ∑_{n=0}^∞ (n² − n − 2) a_n z^n = 0 .

After the substitutions r = n − 2, r = n − 1 and r = n in the first, second and third terms respectively, we obtain

    ∑_{r=0}^∞ (r + 1) ( (r + 2) a_{r+2} − 2r a_{r+1} + (r − 2) a_r ) z^r = 0 ,

which leads to the recurrence relation

    a_{r+2} = ( 2r a_{r+1} − (r − 2) a_r ) / (r + 2)   for r ⩾ 0 .   (7.23)
This recurrence relation again determines a_r for r ⩾ 2 in terms of a0 and a1 , but is simpler than (7.22b).
Exercise for those with time. Show that the recurrence relations (7.22b) and (7.23) are equivalent.
Two solutions. For r = 0 the recurrence relation (7.23) yields a2 = a0 , while for r = 1 and r = 2 we obtain

    a3 = (1/3)(2a2 + a1 )   and   a4 = a3 .                          (7.24a)

First we note that if 2a2 + a1 = 0, then a3 = a4 = 0, and hence ar = 0 for r ⩾ 3. We thus have as our
first solution (with a0 = α ̸= 0)
y1 = α(1 − z)2 . (7.24b)
Next we note that ar = a0 for all r is a solution of (7.23). In this case we can sum the series to obtain
(with a0 = β ̸= 0)

    y2 = β ∑_{n=0}^∞ z^n = β/(1 − z) .                               (7.24c)

Linear independence. The linear independence of (7.24b) and (7.24c) is clear, as confirmed by the calculation
of the Wronskian:

    W = y1 y2′ − y1′ y2 = α(1 − z)² · β/(1 − z)² + 2α(1 − z) · β/(1 − z) = 3αβ ̸= 0 .   (7.25)
Hence the general solution is

    y(z) = α(1 − z)² + β/(1 − z) ,                                   (7.26)
for constants α and β.
Radius of convergence. From (6.58b) the radius of convergence of (7.24c) is R = 1, which is consistent with
the general solution being singular at z = 1, and the equation having a singular point at z = 1 since
q(z) = −2(1 − z)−2 .
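The recurrence (7.23) is easily iterated; a quick check (my own sketch, not from the notes) confirms the two solutions found above — a0 = 1, a1 = −2 terminates as the coefficients of (1 − z)², while a0 = a1 = 1 gives a_r = 1 for all r, the series for 1/(1 − z):

```python
from fractions import Fraction

def coeffs(a0, a1, N):
    """Iterate (7.23): a_{r+2} = (2 r a_{r+1} - (r - 2) a_r) / (r + 2)."""
    a = [Fraction(a0), Fraction(a1)]
    for r in range(N - 1):
        a.append((2 * r * a[r + 1] - (r - 2) * a[r]) / (r + 2))
    return a

assert coeffs(1, -2, 6)[:5] == [1, -2, 1, 0, 0]   # y1 = (1 - z)^2
assert coeffs(1, 1, 6) == [1] * 7                 # y2 = 1/(1 - z)
print("recurrence (7.23) reproduces (7.24b) and (7.24c)")
```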



7.3.4 Example: Legendre’s Equation
Legendre’s equation is
(1 − z 2 )y ′′ − 2zy ′ + ℓ(ℓ + 1) y = 0 , (7.27)
where ℓ ∈ R, and for the sequel it is more convenient not to write it in standard form. The points z = ±1
are singular points but z = 0 is an ordinary point, so for smallish z seek a power series solution

    y = ∑_{n=0}^∞ a_n z^n ,   y′ = ∑_{n=0}^∞ n a_n z^{n−1} ,   y′′ = ∑_{n=0}^∞ n(n − 1) a_n z^{n−2} .   (7.28)

On substituting this into (7.27) we obtain

    ∑_{n=2}^∞ n(n − 1) a_n z^{n−2} − ∑_{n=0}^∞ n(n − 1) a_n z^n − 2 ∑_{n=0}^∞ n a_n z^n + ∑_{n=0}^∞ ℓ(ℓ + 1) a_n z^n = 0 .

From substituting r = n − 2 in the first sum and r = n in the next three sums, and from grouping powers of z^r, we obtain

    ∑_{r=0}^∞ ( (r + 2)(r + 1) a_{r+2} − (r(r + 1) − ℓ(ℓ + 1)) a_r ) z^r = 0 .
The recurrence relation is therefore

    a_{r+2} = ( (r(r + 1) − ℓ(ℓ + 1)) / ((r + 1)(r + 2)) ) a_r = ( (r − ℓ)(r + ℓ + 1) / ((r + 1)(r + 2)) ) a_r   for r = 0, 1, 2, . . . .   (7.29)
a0 and a1 are arbitrary constants, with the other coefficients following from the recurrence relation. For
instance:
(i) if a0 = 1 and a1 = 0, then y1 = 1 − (ℓ(ℓ + 1)/2) z² + O(z⁴) is an even solution;   (7.30a)

(ii) if a0 = 0 and a1 = 1, then y2 = z + ((2 − ℓ(ℓ + 1))/6) z³ + O(z⁵) is an odd solution.   (7.30b)
The Wronskian at the ordinary point z = 0 is thus given by
W = y1 y2′ − y1′ y2 = 1 · 1 − 0 · 0 = 1 . (7.31)
Since W ̸= 0, y1 and y2 are linearly independent (although it should have been obvious already).

Radius of convergence. The series (7.30a) and (7.30b) are effectively power series in z 2 rather than z. Hence
to find the radius of convergence we either need to re-express our series (e.g. z 2 → y and a2n → bn ), or
use a slightly modified D’Alembert’s ratio test. We adopt the latter approach and observe from (7.29)
that

    lim_{n→∞} | a_{n+2} z^{n+2} / (a_n z^n) | = lim_{n→∞} | (n(n + 1) − ℓ(ℓ + 1)) / ((n + 1)(n + 2)) | |z|² = |z|² .   (7.32)
It then follows from a straightforward extension of D’Alembert’s ratio test (6.20b) that the series
converges for |z| < 1. Moreover, the series diverges for |z| > 1 (since an z n ̸→ 0), and so the radius
of convergence R = 1. On the radius of convergence, determination of whether the series converges is
more difficult.
Remark. The radius of convergence is the distance to nearest singularity of the ODE. This is a general
feature.
Legendre polynomials. In the generic situations both series (7.30a) and (7.30b) have an infinite number of
terms. However, for ℓ = 0, 1, 2, . . . it follows from (7.29) that

    a_{ℓ+2} = ( (ℓ(ℓ + 1) − ℓ(ℓ + 1)) / ((ℓ + 1)(ℓ + 2)) ) a_ℓ = 0 ,   (7.33)
and so the series terminates. For instance,
ℓ = 0 : y = a0 ,
ℓ = 1 : y = a1 z ,
ℓ = 2 : y = a0 (1 − 3z 2 ) .



These functions are proportional to the Legendre polynomials, Pℓ (z), which are conventionally normalized (by suitable choice of a0 or a1 ) so that Pℓ (1) = 1. Thus

    P0 (z) = 1 ,   P1 (z) = z ,   P2 (z) = (1/2)(3z² − 1) ,   etc.   (7.34)
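The construction above — iterate (7.29), let the series terminate, and normalise so that Pℓ(1) = 1 — can be sketched in Python (my own helper, not from the notes; exact arithmetic via `fractions`):

```python
from fractions import Fraction

def legendre_coeffs(l):
    """Coefficients of P_l in powers of z, from the recurrence (7.29),
    starting the series of the right parity and normalising so P_l(1) = 1."""
    a = [Fraction(0)] * (l + 1)
    a[l % 2] = Fraction(1)
    for r in range(l - 1):
        a[r + 2] = Fraction((r - l) * (r + l + 1), (r + 1) * (r + 2)) * a[r]
    norm = sum(a)                 # the value of the (terminated) series at z = 1
    return [c / norm for c in a]

# (7.34): P_0 = 1, P_1 = z, P_2 = (3 z^2 - 1)/2, and also P_3 = (5 z^3 - 3 z)/2.
assert legendre_coeffs(0) == [1]
assert legendre_coeffs(1) == [0, 1]
assert legendre_coeffs(2) == [Fraction(-1, 2), 0, Fraction(3, 2)]
assert legendre_coeffs(3) == [0, Fraction(-3, 2), 0, Fraction(5, 2)]
print("recurrence (7.29) reproduces the Legendre polynomials")
```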

7.4 Regular Singular Points
Let z = z0 be a regular singular point of equation (7.11) where, as before, wlog we can take z0 = 0. If we
write

    p(z) = s(z)/z   and   q(z) = t(z)/z² ,                           (7.35a)
then the homogeneous equation (7.11) becomes, after multiplying by z 2 ,

z 2 y ′′ + zs(z)y ′ + t(z)y = 0 , (7.35b)

where, from the definition of a regular singular point, s(z) and t(z) are both analytic at z = 0. It follows that s0 ≡ s(0) and t0 ≡ t(0) are finite.

7.4.1 The Indicial Equation

If z = 0 is a regular singular point, Fuchs’s theorem guarantees that there is always at least one solution to
(7.35b) of the form

    y = z^σ ∑_{n=0}^∞ a_n z^n   with a0 ̸= 0 and σ ∈ C ,             (7.36)

i.e. a Taylor series multiplied by a power z^σ, where the index σ is to be determined.
Remarks.
(i) This is a Taylor series only if σ is a non-negative integer.
(ii) There may be one or two solutions of this form (see below).
(iii) The condition a0 ̸= 0 is required to define σ uniquely.

To understand why the solutions behave in this way, substitute (7.36) into (7.35b) to obtain, after division
by z σ ,
    ∑_{n=0}^∞ ( (σ + n)(σ + n − 1) + (σ + n) s(z) + t(z) ) a_n z^n = 0 .   (7.37a)
We now evaluate this sum at z = 0, noting that all terms with n ⩾ 1 vanish there, to obtain

    ( σ(σ − 1) + σ s0 + t0 ) a0 = 0 .                                (7.37b)



Since by definition a0 ̸= 0 (see (7.36)) we obtain the indicial equation for σ:

σ 2 + σ(s0 − 1) + t0 = 0 . (7.38)

The roots σ1 , σ2 of this equation are called the indices of the regular singular point.
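Since (7.38) is just a quadratic in σ, the indices follow from the quadratic formula. A small helper (my own sketch, not part of the notes), checked against Bessel's equation of §7.4.3 below, where s0 = 1 and t0 = −ν² give σ = ±ν:

```python
import cmath

def indices(s0, t0):
    """Roots of the indicial equation (7.38): sigma^2 + (s0 - 1) sigma + t0 = 0."""
    disc = cmath.sqrt((s0 - 1) ** 2 - 4 * t0)
    return ((1 - s0) + disc) / 2, ((1 - s0) - disc) / 2

# Bessel's equation of order nu has s0 = 1 and t0 = -nu^2, so sigma = +/-nu.
nu = 0.5
s1, s2 = indices(1.0, -nu ** 2)
assert abs(s1 - nu) < 1e-12 and abs(s2 + nu) < 1e-12
print("indices for Bessel of order 1/2:", s1.real, s2.real)
```

`cmath` is used so that complex indices (allowed by (7.36), where σ ∈ C) are handled as well.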
7.4.2 Series Solutions
For each choice of σ from σ1 and σ2 we can find a recurrence relation for an by comparing powers of z in
(7.37a), i.e. after expanding s and t in power series.
σ1 − σ2 ∉ Z. If σ1 − σ2 ∉ Z we can find both linearly independent solutions this way.
σ1 − σ2 ∈ Z. If σ1 = σ2 we note that we can find only one solution by the ansatz (7.36). However, as we
shall see, it’s worse than this. The ansatz (7.36) also fails (in general) to give both solutions when σ1
and σ2 differ by an integer (although there are exceptions).

Frobenius’s method is used to find the series solutions about a regular singular point. This is best demonstrated by example.

7.4.3 Example: Bessel’s Equation of Order ν


Bessel’s equation of order ν is

    y′′ + (1/z) y′ + (1 − ν²/z²) y = 0 ,                             (7.39)

where ν ⩾ 0 wlog. The origin z = 0 is a regular singular point with

s(z) = 1 and t(z) = z 2 − ν 2 . (7.40)

A power series solution of the form (7.36) solves (7.39) if, from (7.37a),

    ∑_{n=0}^∞ ( (σ + n)(σ + n − 1) + (σ + n) − ν² ) a_n z^n + ∑_{n=0}^∞ a_n z^{n+2} = 0 ,   (7.41a)

i.e. after the transformation n → n − 2 in the second sum, if

    ∑_{n=0}^∞ ( (σ + n)² − ν² ) a_n z^n + ∑_{n=2}^∞ a_{n−2} z^n = 0 .   (7.41b)

Now compare powers of z to obtain

    n = 0 :   σ² − ν² = 0   since a0 ̸= 0 ,                          (7.42a)
    n = 1 :   ( (σ + 1)² − ν² ) a1 = 0 ,                             (7.42b)
    n ⩾ 2 :   ( (σ + n)² − ν² ) a_n + a_{n−2} = 0 .                  (7.42c)

(7.42a) is the indicial equation and implies that

σ = ±ν . (7.43)

Substituting this result into (7.42b) and (7.42c) yields

(1 ± 2ν) a1 = 0 (7.44a)
n(n ± 2ν) an = −an−2 for n ⩾ 2 . (7.44b)

Radius of convergence. The radius of convergence of the solution is infinite since from (7.44b)

    lim_{n→∞} | a_n / a_{n−2} | = lim_{n→∞} 1/( n(n ± 2ν) ) = 0 .

This is consistent with p and q having no singularities other than at z = 0.
Remark. We note that there is no difficulty in solving for an from an−2 using (7.44b) if σ = +ν. However, if
σ = −ν the recursion will fail with an predicted to be infinite if at any point n = 2ν. There are hence
potential problems if σ1 − σ2 = 2ν ∈ Z, i.e. if the indices σ1 and σ2 differ by an integer.



2ν ∉ Z. First suppose that 2ν ∉ Z so that σ1 and σ2 do not differ by an integer. In this case (7.44a) and (7.44b) imply

    a_n = 0                            for n = 1, 3, 5, . . . ,
    a_n = − a_{n−2}/( n(n ± 2ν) )      for n = 2, 4, 6, . . . ,      (7.45a)

and so we get two linearly independent solutions

    y1 = a0 z^{+ν} ( 1 − z²/(4(1 + ν)) + z⁴/(32(1 + ν)(2 + ν)) + ··· ) ,   (7.45b)

    y2 = a0 z^{−ν} ( 1 − z²/(4(1 − ν)) + z⁴/(32(1 − ν)(2 − ν)) + ··· ) .   (7.45c)

2ν = 2m + 1, m ∈ N. It so happens in this case that even though σ1 and σ2 differ by an odd integer
there is no problem; the solutions are still given by (7.45a), (7.45b) and (7.45c). This is because for

Bessel’s equation the power series proceed in even powers of z, and hence the problem recursion when
n = 2ν = 2m + 1 is never encountered. We conclude that the condition for the recursion relation
(7.44b) to fail is that ν is an integer.
Remark. If ν = 1/2, then (7.44a) does not force the choice a1 = 0. However, if a1 ̸= 0 the effect is to
add a multiple of y1 to y2 ; hence, wlog, one can choose a1 = 0.
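For ν = 1/2 the σ = +ν series (7.45b) can be summed numerically and, as expected from the closed form of J_{1/2}, it agrees with sin z/√z. A sketch of mine (not from the notes):

```python
import math

def bessel_series(z, nu, N=40):
    """Partial sum of the Frobenius solution with sigma = +nu:
    a_n = -a_{n-2} / (n (n + 2 nu)), a_0 = 1, odd coefficients zero (cf. 7.45a)."""
    a, total = 1.0, 1.0
    for n in range(2, N, 2):
        a = -a / (n * (n + 2 * nu))
        total += a * z ** n
    return z ** nu * total

# With a_0 = 1 and nu = 1/2 the series is z^{1/2}(1 - z^2/3! + z^4/5! - ...),
# i.e. sin(z)/sqrt(z).
for z in (0.3, 1.0, 2.5):
    assert abs(bessel_series(z, 0.5) - math.sin(z) / math.sqrt(z)) < 1e-12
print("nu = 1/2 Frobenius series equals sin z / sqrt(z)")
```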

2ν = 0. If ν = 0 then σ1 = σ2 and we can only find one power series solution of the form (7.36), viz.

    y = a0 ( 1 − (1/4) z² + · · · ) .                                (7.46)

2ν = 2m, m ∈ N. If ν is a positive integer, m, then we can find one solution by choosing σ = ν. However
if we take σ = −ν then a2m is predicted to be infinite, i.e. a second series solution of the form (7.36)
fails.
Remark. The existence of two power series solutions for 2ν = 2m + 1, m ∈ N is a ‘lucky’ accident. In general
there exists only one solution of the form (7.36) whenever the indices σ1 and σ2 differ by an integer.

7.4.4 The Second Solution when σ1 − σ2 ∈ Z


A Preliminary: Bessel’s equation with ν = 0. In order to obtain an idea how to proceed when σ1 − σ2 ∈ Z,
first consider the example of Bessel’s equation of zeroth order, i.e. ν = 0. Let y1 denote the solution
(7.46). Then, from (7.10b) (after the transformations x → z)
    y2 (z) = κ y1 (z) ∫^z ( 1/(η y1²(η)) ) dη .                      (7.47a)

For small (positive) z we can deduce using (7.46) that

    y2 (z) = κ y1 (z) ∫^z ( 1/(η a0²) ) (1 + O(η²)) dη
           = (κ/a0 ) log z + · · · .                                 (7.47b)
We conclude that the second solution contains a logarithm.
The claim. Let σ1 , σ2 be the two (possibly complex) solutions to the indicial equation for a regular singular
point at z = 0. Order them so that
Re (σ1 ) ⩾ Re (σ2 ) . (7.48a)
Then we can always find one solution of the form

    y1 (z) = z^{σ1} ∑_{n=0}^∞ a_n z^n   with, say, the normalisation a0 = 1 .   (7.48b)



If σ1 − σ2 ∈ Z we claim that the second solution takes the form

    y2 (z) = z^{σ2} ∑_{n=0}^∞ b_n z^n + k y1 (z) log z ,             (7.48c)
for some number k. The coefficients bn can be found by substitution into the ODE. In some very
special cases k may vanish but k ̸= 0 in general.
Example: Bessel’s equation of integer order.25 Suppose that y1 is the series solution with σ = +m to

    z² y′′ + z y′ + (z² − m²) y = 0 ,                                (7.49)

where, compared with (7.39), we have written m for ν. Hence from (7.36) and (7.45a)

    y1 = z^m ∑_{ℓ=0}^∞ a_{2ℓ} z^{2ℓ} ,                               (7.50)

since a2ℓ+1 = 0 for integer ℓ. Let


y = ky1 log z + w , (7.51)
then

    y′ = k y1′ log z + k y1 /z + w′   and   y′′ = k y1′′ log z + 2k y1′ /z − k y1 /z² + w′′ .
On substituting into (7.49), and using the fact that y1 is a solution of (7.49), we find that

z 2 w′′ + zw′ + (z 2 − m2 )w = −2kzy1′ . (7.52)

Based on (7.43), (7.48a) and (7.48c) we now seek a series solution of the form

    w = k z^{−m} ∑_{n=0}^∞ b_n z^n .                                 (7.53)

On substitution into (7.52) we have that

    k ∑_{n=1}^∞ n(n − 2m) b_n z^{n−m} + k ∑_{n=0}^∞ b_n z^{n−m+2} = −2k ∑_{ℓ=0}^∞ (2ℓ + m) a_{2ℓ} z^{2ℓ+m} .

After multiplying by z^m and making the transformations n → n − 2 and 2ℓ → n − 2m in the second and third sums respectively, it follows that

    ∑_{n=1}^∞ n(n − 2m) b_n z^n + ∑_{n=2}^∞ b_{n−2} z^n = −2 ∑_{n=2m, n even}^∞ (n − m) a_{n−2m} z^n .

We now demand that the combined coefficient of z^n is zero. Consider the even and odd powers of z in turn.
n = 1, 3, 5, . . . . From equating powers of z 1 it follows that b1 = 0. Next, from writing n = 2ℓ + 1
(ℓ = 1, 2, . . . ) and equating powers of z 2ℓ+1 , we obtain the following recurrence relation for the
b2ℓ+1 :
(2ℓ + 1)(2ℓ + 1 − 2m)b2ℓ+1 = −b2ℓ−1 .
Since b1 = 0, we conclude that b2ℓ+1 = 0 (ℓ = 1, 2, . . . ).
n = 2, 4, . . . , 2m, . . . . Let n = 2ℓ (ℓ = 1, 2, . . . ), then from equating powers of z 2ℓ we obtain

    b_{2ℓ−2} = −4ℓ(ℓ − m) b_{2ℓ}   for 1 ⩽ ℓ ⩽ m − 1 ,               (7.54a)
    b_{2m−2} = −2m a0   for ℓ = m ,                                   (7.54b)
    b_{2ℓ} = − (1/(4ℓ(ℓ − m))) b_{2ℓ−2} − ((2ℓ − m)/(2ℓ(ℓ − m))) a_{2ℓ−2m}   for ℓ ⩾ m + 1 .   (7.54c)

To determine the even coefficients, b_{2ℓ} ,

25 The schedules specifically state “without full discussion of logarithmic singularities”, hence you may assume that the details in this example are not examinable.



• first, after noting that 2m − 2 ⩾ 0, solve for b2m−2 in terms of a0 from (7.54b);
• next, if m ⩾ 2, solve for the b2ℓ (ℓ = m − 2, m − 3, . . . , 0) recurrently, in terms of the known
b2m−2 = −2ma0 , using (7.54a);
• then, having noted that a non-zero value of b2m simply generates a solution proportional
to y1 , choose a value for b2m , e.g., wlog, b2m = 0;
• finally, having fixed b2m , solve for the b2ℓ (ℓ = m + 1, m + 2 . . . ) recurrently, in terms of b2m
and the a2k (k = 1, 2, . . . ), using (7.54c);
For example, we find for ν = 0

    y1 = 1 − z²/4 + z⁴/64 + ··· ,                                    (7.55a)
    y2 = y1 ln z + z²/4 − 3z⁴/128 + ··· ,                            (7.55b)

and for ν = 1

    y1 = z − z³/8 + z⁵/192 + ··· ,                                   (7.55c)
    y2 = y1 ln z − 2/z + 3z³/32 + ··· .                              (7.55d)

Remark. These examples illustrate a feature that is commonly encountered in scientific applications: one solution is regular (i.e. analytic) and the other is singular. Often only the regular solution is an acceptable solution of the scientific problem.

7.4.5 Irregular singular points

If either (z − z0 )p(z) or (z − z0 )2 q(z) is not analytic at the point z = z0 , it is an irregular singular point of
the equation (7.11). The solutions can have worse kinds of singular behaviour there.
Example: the equation z 4 y ′′ + 2z 2 y ′ − y = 0 has an irregular singular point at z = 0. Its solutions are
exp(±z −1 ), both of which have an essential singularity at z = 0.

In fact this example is just the familiar equation d²y/dx² = y with the substitution z = 1/x. Even this simple ODE has an irregular singular point at x = ∞.

7.5 The Method of Variation of Parameters (Unlectured and Not in Schedule)

The question that remains is how to find the particular solution. To that end first suppose that we have
solved the homogeneous equation and found two linearly-independent solutions y1 and y2 . Then in order to
find a particular solution consider

y0 (x) = u(x)y1 (x) + v(x)y2 (x) . (7.56)

If u and v were constants (‘parameters’) y0 would solve the homogeneous equation. However, we allow the
‘parameters’ to vary, i.e. to be functions of x, in such a way that y0 solves the inhomogeneous problem.

Remark. We have gone from one unknown function, y0 , with one equation (i.e. (2.15a)), to two unknown
functions, u and v, still with only one equation. We will need to find, or in fact choose, a second equation.

We now differentiate (7.56) to find that

y0′ = (uy1′ + vy2′ ) + (u′ y1 + v ′ y2 ) (7.57a)


y0′′ = (uy1′′ + vy2′′ + u′ y1′ + v ′ y2′ ) + (u′′ y1 + v ′′ y2 + u′ y1′ + v ′ y2′ ) . (7.57b)

If we substitute the above into the inhomogeneous equation (2.15a) we will apparently not have made much
progress because we will still have a second-order equation involving terms like u′′ and v ′′ . However, suppose
that we eliminate the u′ and v ′ terms from (7.57a) by demanding that u and v satisfy the extra equation

u′ y1 + v ′ y2 = 0 . (7.58)

Natural Sciences Tripos: IB Mathematical Methods I 125 © [email protected], Michaelmas 2022


Then (7.57a) and (7.57b) become
y0′ = uy1′ + vy2′ (7.59a)
y0′′ = uy1′′ + vy2′′ + u′ y1′ + v ′ y2′ . (7.59b)
It follows from (7.56), (7.59a) and (7.59b) that
y0′′ + py0′ + qy0 = u(y1′′ + py1′ + qy1 ) + v(y2′′ + py2′ + qy2 ) + u′ y1′ + v ′ y2′
= u′ y1′ + v ′ y2′ ,
since y1 and y2 solve the homogeneous equation (7.11). Hence y0 solves the inhomogeneous equation (2.15a)
if
u′ y1′ + v ′ y2′ = f . (7.60)
We now have two simultaneous equations for u′ , v ′ , i.e. (7.58) and (7.60), with solution
    u′ = − f y2 / W    and    v′ = f y1 / W ,             (7.61)

where W is the Wronskian,
W = y1 y2′ − y2 y1′ . (7.62)
W is non-zero because y1 and y2 were chosen to be linearly independent. Integrating we obtain
    u = − ∫_a^x y2 (ζ)f (ζ)/W (ζ) dζ    and    v = ∫_a^x y1 (ζ)f (ζ)/W (ζ) dζ ,      (7.63)
where a is arbitrary. We could have chosen different lower limits for the two integrals, but we do not need
to find the general solution, only a particular one. Substituting this result back into (7.56) we obtain as our
particular solution

    y0 (x) = ∫_a^x ( f (ζ)/W (ζ) ) ( y1 (ζ)y2 (x) − y1 (x)y2 (ζ) ) dζ .              (7.64)

Remark. We observe that, since the integrand is zero when ζ = x,

    y0′ (x) = ∫_a^x ( f (ζ)/W (ζ) ) ( y1 (ζ)y2′ (x) − y1′ (x)y2 (ζ) ) dζ .           (7.65)

Hence the particular solution (7.64) satisfies the initial value homogeneous boundary conditions
y(a) = y ′ (a) = 0 . (7.66)
More general initial value boundary conditions would be inhomogeneous, e.g.
y(a) = k1 , y ′ (a) = k2 , (7.67)
where k1 and k2 are constants which are not simultaneously zero. Such inhomogeneous boundary
conditions can be satisfied by adding suitable multiples of the linearly-independent solutions of the
homogeneous equation, i.e. y1 and y2 .
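The quadrature (7.64) is straightforward to evaluate numerically. The sketch below (an illustrative addition, not from the notes) uses the composite Simpson rule; it is checked against the hypothetical test problem y′′ − y = e^{2x}, for which y1 = e^x, y2 = e^{−x}, W = −2, and (7.64) with a = 0 gives y0 = e^{2x}/3 − e^x/2 + e^{−x}/6, satisfying y0(0) = y0′(0) = 0 as in (7.66).

```python
import math

def particular_solution(y1, y2, dy1, dy2, f, a, x, n=2000):
    """Evaluate (7.64): y0(x) = int_a^x f(t) (y1(t)y2(x) - y1(x)y2(t)) / W(t) dt,
    by the composite Simpson rule with n (even) subintervals."""
    def integrand(t):
        W = y1(t) * dy2(t) - y2(t) * dy1(t)   # Wronskian (7.62)
        return f(t) * (y1(t) * y2(x) - y1(x) * y2(t)) / W
    h = (x - a) / n
    s = integrand(a) + integrand(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * integrand(a + i * h)
    return s * h / 3

# Hypothetical test problem y'' - y = e^{2x}: compare with the closed form.
y0 = particular_solution(math.exp, lambda t: math.exp(-t),
                         math.exp, lambda t: -math.exp(-t),
                         lambda t: math.exp(2 * t), 0.0, 1.0)
exact = math.exp(2) / 3 - math.exp(1) / 2 + math.exp(-1) / 6
assert abs(y0 - exact) < 1e-8
```

Simpson's rule converges as O(h⁴) for this smooth integrand, so n = 2000 is far more than enough here.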
Example. Find the general solution to the equation
y ′′ + y = sec x . (7.68)

Answer. Two linearly independent solutions of the homogeneous equation are


y1 = cos x and y2 = sin x , (7.69a)
with a Wronskian
W = y1 y2′ − y2 y1′ = cos x(cos x) − sin x(− sin x) = 1 . (7.69b)
Hence from (7.64), choosing a = 0, a particular solution is given by

    y0 (x) = ∫_0^x sec ζ ( cos ζ sin x − cos x sin ζ ) dζ
           = sin x ∫_0^x dζ − cos x ∫_0^x tan ζ dζ
           = x sin x + cos x log |cos x| .                                            (7.70)


The general solution is thus
y(x) = (α + log |cos x|) cos x + (β + x) sin x . (7.71)
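As a numerical sanity check (added for illustration, not in the notes), the particular solution (7.70) can be verified to satisfy y′′ + y = sec x by a central finite difference for the second derivative:

```python
import math

def y0(x):
    """Particular solution (7.70) of y'' + y = sec x."""
    return x * math.sin(x) + math.cos(x) * math.log(abs(math.cos(x)))

h = 1e-5
for x in (0.3, 0.5, 1.0):
    # Central difference: y''(x) ~ (y(x+h) - 2 y(x) + y(x-h)) / h^2
    ypp = (y0(x + h) - 2 * y0(x) + y0(x - h)) / h**2
    assert abs(ypp + y0(x) - 1 / math.cos(x)) < 1e-4
```

The tolerance 1e-4 absorbs both the O(h²) truncation error and the rounding amplified by the h² division.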

