Math6015 Lecture 01
01
Mathematical Background
Jingwei Liang
Institute of Natural Sciences and School of Mathematical Sciences
1 About the course
2 Examples
3 Vector space
4 Matrix space
5 Dual space
6 Supplementary
1. About the course
About the course
General information
2. Examples
Neural network
Feasibility and Sudoku
(Figure: Sudoku as a feasibility problem, with constraint sets S1, S2, S3, S4.)
3. Vector space
Definitions and properties
Vector space (向量空间)
A vector space E over R is a set equipped with two operations, addition and scalar multiplication, satisfying:
Addition
Commutative: a + b = b + a.
Associative: (a + b) + c = a + (b + c).
Zero element: 0 + a = a + 0 = a.
Additive inverse: for any x ∈ E, there exists a vector −x ∈ E such that x + (−x) = 0.
Scalar multiplication
Distributive over vector addition: let α ∈ R and a, b ∈ E, α(a + b) = αa + αb.
Distributive over scalar addition: let α, β ∈ R and a ∈ E, (α + β)a = αa + βa.
Associative: α(βa) = (αβ)a.
Scalars: 1a = a, α0 = 0, 0a = 0 and (−1)a = −a.
Linear independence: vectors {v1, v2, ..., vn} are linearly independent if
α1 v1 + α2 v2 + · · · + αn vn = 0
implies that all the coefficients αi, i = 1, ..., n, are equal to 0.
A vector x is said to be a linear combination of vectors {v1, v2, ..., vn} if there are scalars α1, ..., αn such that
x = α1 v1 + α2 v2 + · · · + αn vn.
Basis: if {v1, v2, ..., vn} is a basis of E, then every x ∈ E can be written uniquely as
x = α1 v1 + α2 v2 + · · · + αn vn,
where αi ∈ R, i = 1, ..., n, are called the coordinates of x with respect to the basis {v1, v2, ..., vn}.
NB: This course will discuss only vector spaces of finite dimension (the number of basis vectors is finite), i.e. finite-dimensional vector spaces.
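Linear independence can be checked numerically via the rank criterion: stack the vectors as columns of a matrix, and the set is independent exactly when the rank equals the number of vectors. A minimal sketch with numpy (the vectors below are made up for illustration, not from the lecture):

```python
import numpy as np

# Stack vectors as columns; they are linearly independent exactly
# when the rank of the matrix equals the number of vectors.
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 2.0])   # v3 = v1 + v2, so the set is dependent

V = np.column_stack([v1, v2, v3])
independent = np.linalg.matrix_rank(V) == V.shape[1]   # False here
```

Dropping v3 leaves an independent pair {v1, v2} of rank 2.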
A vector space endowed with an inner product is also called an inner product space.
Underlying spaces: in this course the underlying vector spaces, usually denoted by E or V, are always finite-dimensional real inner product spaces with endowed inner product ⟨· | ·⟩ and endowed norm || · ||.
Definition - Norm
A norm on a vector space is a function || · || : E → R satisfying the following properties:
Non-negativity: ||x|| ≥ 0 for any x ∈ E, and ||x|| = 0 if and only if x = 0.
Positive homogeneity: ||αx|| = |α| ||x|| for any α ∈ R and x ∈ E.
Triangle inequality: ||x + y|| ≤ ||x|| + ||y|| for any x, y ∈ E.
With a norm defined, one can measure the distance between two vectors x and y as the length of their difference, i.e.
dist(x, y) = ||x − y||.
The open ball with center c ∈ E and radius r > 0 is denoted by B(c, r) and defined by
B(c, r) = { x ∈ E | ||x − c|| < r }.
Definition - Hyperplane
Given a non-zero v ∈ E and a ∈ R, the set
H = { x | ⟨v | x⟩ = a }
is called a hyperplane.
A hyperplane is not necessarily a subspace of E since, in general, it does not contain the origin 0.
Convex set (凸集): a set S ⊆ E is convex if for any x, y ∈ S and any λ ∈ [0, 1],
λx + (1−λ)y ∈ S
holds.
Affine combination (仿射组合): given x1, x2, ..., xm ∈ E, a point of the form y = ∑_{i=1}^m αi xi, where ∑_{i=1}^m αi = 1, is called an affine combination of x1, x2, ..., xm.
Affine hull (仿射包): for a set S ⊆ E, the affine hull of S, denoted by aff(S), is the intersection of all affine sets containing S:
aff(S) = { ∑_{i=1}^m αi xi | ∑_{i=1}^m αi = 1, xi ∈ S }.
aff(S) is by itself an affine set, and it is the smallest affine set containing S (w.r.t. inclusion).
Some properties
Let S be a convex set and β ∈ R, then βS = { βx | x ∈ S } is convex.
Let Si, i = 1, 2, ..., m, be a family of convex sets, then ∩_{i=1,...,m} Si is convex.
Let S1, S2 be two convex sets, then S1 + S2 and S1 − S2 are convex.
Transpose of a vector x⊤
If x is a column vector, then its transpose is a row vector.
One can also use (x1 , x2 , ..., xn )⊤ to denote an n-dimensional column vector.
Inner product in Rn : In this course, unless otherwise stated, the endowed inner product in Rn is
the dot product.
Example - ℓ1-norm
Let x ∈ Rn be a vector, the ℓ1-norm of x is defined by
||x||1 = ∑_{i=1}^n |xi|.
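As a quick sketch (assuming numpy; the vectors are arbitrary), the ℓ1-norm and the distance it induces:

```python
import numpy as np

# l1-norm: sum of component-wise absolute values.
x = np.array([3.0, -4.0, 0.0, 1.0])
l1 = np.sum(np.abs(x))               # equals np.linalg.norm(x, 1)

# Distance induced by the l1-norm: dist(x, y) = ||x - y||_1.
y = np.array([1.0, -1.0, 0.0, 1.0])
dist = np.sum(np.abs(x - y))
```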
The nonnegative orthant is the subset of Rn consisting of all vectors in Rn with nonnegative components and is denoted by Rn+:
Rn+ = { (x1, x2, ..., xn)⊤ | x1, x2, ..., xn ≥ 0 }.
The positive orthant is the subset of Rn consisting of all vectors in Rn with positive components and is denoted by Rn++:
Rn++ = { (x1, x2, ..., xn)⊤ | x1, x2, ..., xn > 0 }.
The unit simplex, denoted by ∆n, is the subset of Rn comprising all nonnegative vectors whose components sum up to one:
∆n = { x ∈ Rn+ | ∑_{i=1}^n xi = 1 }.
Given two vectors ℓ, u ∈ Rn that satisfy ℓ ≤ u, the box with lower bounds ℓ and upper bounds u is denoted by Box[ℓ, u] and defined by:
Box[ℓ, u] = { x ∈ Rn | ℓ ≤ x ≤ u }.
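Membership in these sets reduces to component-wise comparisons. A sketch with hypothetical helper names (not from the lecture), assuming numpy:

```python
import numpy as np

def in_nonneg_orthant(x):
    # x ∈ R^n_+ : all components nonnegative
    return bool(np.all(x >= 0))

def in_unit_simplex(x, tol=1e-12):
    # x ∈ Δ_n : nonnegative components summing to one
    return bool(np.all(x >= 0) and abs(np.sum(x) - 1.0) <= tol)

def in_box(x, lo, hi):
    # x ∈ Box[ℓ, u] : ℓ ≤ x ≤ u component-wise
    return bool(np.all(lo <= x) and np.all(x <= hi))
```

For example, (0.2, 0.3, 0.5)⊤ lies in the unit simplex, while (0.5, 0.6)⊤ does not.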
Given a vector x ∈ Rn, the vector |x| is the vector of component-wise absolute values (|xi|)_{i=1}^n, and the vector sign(x) is defined component-wise as:
sign(x)i = +1 if xi ≥ 0, and −1 if xi < 0.
For two vectors x, y ∈ Rn, their Hadamard product, denoted by x ⊙ y, is the vector comprising the component-wise products:
x ⊙ y = (xi yi)_{i=1}^n.
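A sketch of these component-wise operations in numpy (arbitrary vectors). Note that the sign convention above maps 0 to +1, unlike numpy's np.sign, which returns 0 at 0:

```python
import numpy as np

x = np.array([2.0, -3.0, 0.0])
y = np.array([4.0, 5.0, 6.0])

abs_x = np.abs(x)                     # (|x_i|)_i
sgn_x = np.where(x >= 0, 1.0, -1.0)   # lecture convention: sign(0) = +1
hadamard = x * y                      # x ⊙ y = (x_i y_i)_i
```

With this convention, x = sign(x) ⊙ |x| holds component-wise.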
For a matrix A = (ai,j) ∈ Rm×n, its transpose A⊤ ∈ Rn×m is obtained by swapping rows and columns:
       ⎡ a1,1  a2,1  ...  am,1 ⎤
       ⎢ a1,2  a2,2  ...  am,2 ⎥
A⊤ =   ⎢  ⋮     ⋮    ⋱    ⋮   ⎥ ∈ Rn×m.
       ⎣ a1,n  a2,n  ...  am,n ⎦
Some properties
trace(A) = trace(A⊤).
trace(A + B) = trace(A) + trace(B).
Let a ∈ R, trace(aA) = a trace(A).
Let A, B be such that AB is square, trace(AB) = trace(BA).
Let A, B and C be such that ABC is square, trace(ABC) = trace(BCA) = trace(CAB).
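These identities are easy to spot-check numerically; a sketch on random matrices (shapes and seed chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# trace(MN) = trace(NM) whenever both products are square.
M = rng.standard_normal((3, 4))
N = rng.standard_normal((4, 3))
t_MN = np.trace(M @ N)
t_NM = np.trace(N @ M)

# Cyclic property: trace(ABC) = trace(BCA) = trace(CAB).
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))
t_ABC = np.trace(A @ B @ C)
t_BCA = np.trace(B @ C @ A)
t_CAB = np.trace(C @ A @ B)
```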
Inner product in Rm×n : In this course, unless otherwise stated, the endowed inner product in
Rm×n is the dot product.
When a = b = 2, the operator norm of A is its maximum singular value, denoted ||A||2:
||A||2 = σmax(A) = √(λmax(A⊤A)).
Sketch: let v1, v2, ..., vn be orthonormal eigenvectors of A⊤A with eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0. For any unit vector x = c1 v1 + c2 v2 + · · · + cn vn with c1² + c2² + · · · + cn² = 1,
||Ax||² = ⟨c1 v1 + c2 v2 + · · · + cn vn | A⊤A(c1 v1 + c2 v2 + · · · + cn vn)⟩
        = ⟨c1 v1 + c2 v2 + · · · + cn vn | c1 λ1 v1 + c2 λ2 v2 + · · · + cn λn vn⟩
        = λ1 c1² + λ2 c2² + · · · + λn cn²
        ≤ λ1 (c1² + c2² + · · · + cn²) = λ1,
with equality attained at x = v1.
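A numerical sketch (random matrix, assuming numpy) computing ||A||2 three equivalent ways:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

op_norm = np.linalg.norm(A, 2)                      # operator 2-norm
sigma_max = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value
lam_max = np.max(np.linalg.eigvalsh(A.T @ A))       # largest eigenvalue of AᵀA
```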
Dot product (of vectors and matrices) and matrix-vector product are linear transformations.
The identity transformation I is linear:
I(x) = x
for all x ∈ E.
All linear transformations from Rn to Rm have the form
A(x) = Ax
for some matrix A ∈ Rm×n.
The dual norm ||v||∗ can be treated as the operator norm of v⊤, interpreted as a 1 × n matrix, with the norm || · || on E and the absolute value on R:
||v||∗ = max { |⟨v | x⟩| | ||x|| ≤ 1 }.
Recall that in R2,
⟨v | x⟩ = v1 x1 + v2 x2.
Q-norms: consider the space Rn endowed with the Q-norm, where Q ∈ Sn++. The dual norm of || · ||Q is || · ||Q−1, meaning
||v||Q−1 = √(v⊤ Q−1 v).
When V = Rn and E = Rm (endowed with dot product), and A(x) = Ax for some matrix A ∈ Rm×n ,
then the adjoint transformation is given by
A⊤ (y) = A⊤ y.
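The defining property of the adjoint, ⟨A(x) | y⟩ = ⟨x | A⊤(y)⟩, can be checked numerically; a sketch with random data (shapes and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(5)

lhs = np.dot(A @ x, y)      # ⟨Ax | y⟩ in R^m
rhs = np.dot(x, A.T @ y)    # ⟨x | Aᵀy⟩ in R^n
```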
Suppose A : Rn → Rm is linear, then
A(x) = Ax
where A ∈ Rm×n. Assume Rn and Rm are endowed with norms || · ||a and || · ||b, respectively. Then
||A|| = ||A||a,b.
6. Supplementary
Special matrices
A diagonal matrix D ∈ Rn×n is a square matrix where all its non-diagonal entries are 0. Typical notation D = diag(σ1, σ2, ..., σn) with
di,j = σi if i = j, and di,j = 0 if i ≠ j, for i, j = 1, ..., n.
The identity matrix, Idn ∈ Rn×n (or simply Id), is diagonal with all diagonal entries equal to 1.
Rank (秩)
The maximal number of linearly independent columns of A is called the rank of the matrix, denoted as rank(A).
rank(A) is the dimension of span[a1, a2, · · · , an].
Inverse
The inverse A⁻¹ of a matrix A ∈ Rn×n, when it exists, is defined such that
A A⁻¹ = A⁻¹ A = Id.
Pseudo inverse (伪逆)
The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A+ that satisfies
AA+ A = A.
A+ AA+ = A+ .
AA+ and A+ A are symmetric.
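These conditions can be verified with numpy's np.linalg.pinv (random matrix, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
Ap = np.linalg.pinv(A)                 # Moore-Penrose pseudo inverse

c1 = np.allclose(A @ Ap @ A, A)        # A A⁺ A = A
c2 = np.allclose(Ap @ A @ Ap, Ap)      # A⁺ A A⁺ = A⁺
c3 = np.allclose((A @ Ap).T, A @ Ap)   # A A⁺ symmetric
c4 = np.allclose((Ap @ A).T, Ap @ A)   # A⁺ A symmetric
```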
Range and nullspace (像空间, 零空间)
The range/image or the column-space of a matrix A ∈ Rm×n, denoted as range(A), is the span of the columns of A:
range(A) = { v ∈ Rm : v = Ax, x ∈ Rn }.
The nullspace (also called kernel) of A ∈ Rm×n, denoted as null(A), is the set of all vectors x such that Ax = 0:
null(A) = { x ∈ Rn : Ax = 0 }.
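A basis of null(A) can be read off from the SVD: right singular vectors with (numerically) zero singular values span the nullspace. A sketch on a rank-1 example (matrix made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # rank 1, so null(A) has dimension 2

U, s, Vt = np.linalg.svd(A)            # full SVD: Vt is 3x3, s has length 2
tol = 1e-10
rank = int(np.sum(s > tol))
null_basis = Vt[rank:]                 # remaining rows of Vt span null(A)
```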
Eigenvalues and eigenvectors
Let A ∈ Rn×n; we say λ ∈ C is an eigenvalue of A, and Cn ∋ x ≠ 0 its corresponding eigenvector, if
Ax = λx.
(λ, x) is an eigen-pair of A if
(λId − A)x = 0, x ≠ 0.
(λId − A)x = 0 has a non-zero solution x if and only if λId − A has a non-trivial nullspace, which means λId − A is singular:
|λId − A| = 0.
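A quick check of the eigen-pair definition with numpy (symmetric 2 × 2 example, chosen for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # eigenvalues 3 and 1
lams, vecs = np.linalg.eig(A)          # columns of vecs are eigenvectors

# Verify A x = λ x for each returned eigen-pair.
ok = all(np.allclose(A @ vecs[:, i], lams[i] * vecs[:, i]) for i in range(2))
```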
Properties
Let (λi, vi), i = 1, 2, ..., n, be the eigen-pairs of A, i.e.
Avi = λi vi, i = 1, 2, ..., n.
We have the following connections:
The trace of A equals the sum of its eigenvalues:
trace(A) = ∑i λi.
Diagonalization
Let A be real symmetric, then it has n linearly independent eigenvectors v1, v2, ..., vn. Let
V = [v1 v2 · · · vn] and Λ = diag(λ1, λ2, ..., λn).
There holds
V⁻¹AV = Λ.
Moreover, the eigenvectors can be chosen orthonormal, in which case
V⊤ = V⁻¹ and V⊤AV = Λ.
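For a real symmetric matrix, numpy's np.linalg.eigh returns an orthonormal eigenvector matrix, so both identities can be confirmed directly (the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # real symmetric
lams, V = np.linalg.eigh(A)            # columns of V: orthonormal eigenvectors

orthogonal = np.allclose(V.T @ V, np.eye(3))            # Vᵀ = V⁻¹
diagonalized = np.allclose(V.T @ A @ V, np.diag(lams))  # VᵀAV = Λ
```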
Singular value decomposition
Any m × n matrix A can be written as
A = UΣV⊤
where
Left singular vectors: U = [u1 u2 · · · um] is of size m × m; its columns are the eigenvectors of AA⊤.
Σ ∈ Rm×n is diagonal, with diagonal entries equal to the square roots of the eigenvalues of AA⊤, ordered
σ1 ≥ σ2 ≥ · · · ≥ 0.
Right singular vectors: V = [v1 v2 · · · vn] is of size n × n; its columns are the eigenvectors of A⊤A.
The singular value decomposition (SVD) can be written as the rank-one expansion A = ∑_{i=1}^{min(m,n)} σi ui vi⊤.
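The rank-one expansion of the SVD can be reproduced with numpy's reduced SVD (random matrix for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

# Reduced SVD: U is 4x3, s holds min(m, n) = 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = Σ_i σ_i u_i v_iᵀ as a sum of rank-one matrices.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
```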