Math6015 Lecture 01

Lecture 01 - Introduction

Mathematical Background

Jingwei Liang
Institute of Natural Sciences and School of Mathematical Sciences

Course: Convex Optimization


Email: [email protected]
Office: Room 355, No. 6 Science Building
Outline

1 About the course

2 Examples

3 Vector space

4 Matrix space

5 Dual space

6 Supplementary
1. About the course
About the course
General information

Reference: S. Boyd and L. Vandenberghe: Convex optimization, Cambridge University Press, 2004.
Instructor: Jingwei Liang (梁经纬)
Office-hour: Wednesday 14:00 - 16:00 @ Room 355, No. 6 Science Building
Grades: coursework (20%), projects (30%) and final exam (50%)

About the course 2/49


References

刘浩洋, 户将, 李勇锋, 文再文: 最优化:建模、算法与理论 (Optimization: Modeling, Algorithms and Theory), 高教出版社 (Higher Education Press).
A. Beck: First-order Methods in Optimization, MOS-SIAM Series on Optimization, 2017.
S. Boyd and L. Vandenberghe: Convex optimization, Cambridge University Press, 2004.
J. Nocedal, S. Wright: Numerical optimization, Springer Science & Business Media, 2006.
B. Polyak: Introduction to optimization, Optimization Software, 1987.
Y. Nesterov: Introductory lectures on convex optimization: A basic course, Vol. 87, Springer Science & Business Media, 2013.
H. H. Bauschke and P. L. Combettes: Convex analysis and monotone operator theory in Hilbert spaces, Vol. 408, New York: Springer, 2011.
R. T. Rockafellar: Convex analysis, Princeton University Press, 2015.

About the course 3/49


2. Examples
Examples
Optimization in SoC design

Examples 4/49
Neural network

Examples 5/49
Feasibility and Sudoku

Examples 6/49
3. Vector space
Definitions and properties
Vector space (向量空间)

Definition - Vector space


A vector space E over R (or a “real vector space”) is a set of vectors on which the following addition and scalar multiplication operations are defined and satisfy the properties below.

Addition
Commutative: a + b = b + a.
Associative: (a + b) + c = a + (b + c).
0 adding: 0 + a = a + 0 = a.
For any x ∈ E, there exists a vector −x ∈ E such that x + (−x) = 0.

Scalar multiplication
Distributive: let α, β ∈ R and a, b ∈ E, then α(a + b) = αa + αb and (α + β)a = αa + βa.
Compatible: α(βa) = (αβ)a.
Scalars: 1a = a, α0 = 0, 0a = 0 and (−1)a = −a.

Vector space Vector space 7/49


Linear independence (线性独立/无关)

Definition - Linear independence


Let n ∈ N+; then a group of n vectors {v1, v2, ..., vn} is said to be linearly independent if the equality
α1 v1 + α2 v2 + · · · + αn vn = 0
implies that all the coefficients αi, i = 1, ..., n, are equal to 0.

A vector x is said to be a linear combination of vectors {v1 , v2 , ..., vn } if there are scalars α1 , ..., αn
such that
x = α 1 v1 + α 2 v2 + · · · + α n vn .

Definition - Linear dependence (线性相关)


A group of vectors {v1 , v2 , ..., vn } is linearly dependent if and only if one of the vectors from the
group is a linear combination of the remaining vectors.

Vector space Vector space 8/49
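Linear (in)dependence can be checked numerically: stack the vectors as columns of a matrix and compare its rank with the number of columns. A small NumPy sketch (the vectors below are illustrative, not from the slides):

```python
import numpy as np

# Columns: v1 = (1,0,0), v2 = (0,1,0), v3 = v1 + v2 = (1,1,0)
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

# The columns are linearly independent iff rank equals the number of columns.
rank = np.linalg.matrix_rank(V)
independent = rank == V.shape[1]  # False here, since v3 = v1 + v2
```
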


Linear span (线性生成/展开)
A subset S of E is called a subspace of E if S is closed under vector addition and scalar multiplication:
Given any x, y ∈ S, x + y ∈ S.
For any α ∈ R and x ∈ S, αx ∈ S.
NB: It holds that 0 ∈ S.

Definition - Linear span


Let v1, v2, ..., vn be arbitrary vectors of E; the set of all their linear combinations is called their span, denoted by
span(v1, v2, ..., vn) = { ∑_{i=1}^{n} αi vi : α1, ..., αn ∈ R }.

If x is a linear combination of {v1 , v2 , ..., vn }, then

span(v1 , v2 , ..., vn , x) = span(v1 , v2 , ..., vn ).

Vector space Vector space 9/49


Basis and dimension
Given a subspace S and a set of linearly independent vectors {v1, v2, ..., vn} ⊂ S such that S = span(v1, v2, ..., vn), then {v1, v2, ..., vn} is called a basis of S.
A subspace has infinitely many bases.
All bases contain the same number of vectors, which is called the dimension of the subspace and denoted by dim(S).

Proposition - Uniqueness of representation


If {v1 , v2 , ..., vn } is a basis of S, then any vector x of S can be represented uniquely as

x = α1 v1 + α2 v2 + · · · + αn vn,
where αi ∈ R, i = 1, ..., n.

The scalars αi ∈ R, i = 1, ..., n, are called the coordinates of x with respect to the basis {v1, v2, ..., vn}.
NB: This course will discuss only vector spaces of finite dimension (the number of basis vectors is finite), i.e. finite-dimensional vector spaces.

Vector space Vector space 10/49


Inner product (内积)

Definition - Inner product


An inner product of a real vector space E is a function that associates to each pair of vectors x, y a
real number, which is denoted by ⟨x | y⟩ and satisfies the following properties:
Commutativity ⟨x | y⟩ = ⟨y | x⟩ for any x, y ∈ E.
Linearity ⟨α1 x1 + α2 x2 | y⟩ = α1 ⟨x1 | y⟩ + α2 ⟨x2 | y⟩ for any x1 , x2 , y ∈ E and α1 , α2 ∈ R.
Positive definiteness ⟨x | x⟩ ≥ 0 for any x ∈ E and ⟨x | x⟩ = 0 if and only if x = 0.

NB: x and y must be of the same length; x and y are orthogonal if ⟨x | y⟩ = 0.

A vector space endowed with an inner product is also called an inner product space:

Underlying Spaces: In this course the underlying vector spaces, usually denoted by E or V, are
always finite dimensional real inner product spaces with endowed inner product ⟨· | ·⟩ and
endowed norm || · ||.

Vector space Vector space 11/49


Norms (范数)

Definition - Norm
A norm on a vector space is a function || · || : E → R satisfying the following properties
Non-negativity ||x|| ≥ 0 for any x ∈ E, ||x|| = 0 if and only if x = 0.

Positive homogeneity ||αx|| = |α| · ||x|| for any x ∈ E and α ∈ R.

Triangle inequality ||x + y|| ≤ ||x|| + ||y|| for any x, y ∈ E.

With norm defined, one can measure the distance between two vectors x and y as the length of their
difference, i.e.
dist(x, y) = ||x − y||.

The open ball with center c ∈ E and radius r > 0 is denoted by B(c, r) and defined by
B(c, r) = { x ∈ E | ||x − c|| < r }.

B(c, r) is convex (convexity is defined later in this lecture).

Definition - Euclidean space (欧氏空间)


A finite dimensional real vector space equipped with an inner product ⟨· | ·⟩ is called a Euclidean space if it is endowed with the norm
||x|| = √⟨x | x⟩,
which is referred to as the Euclidean norm.

Vector space Vector space 12/49


Line segments

Definition - Line segments


The line segment between two points x and y in E is the set { λx + (1 − λ)y | λ ∈ [0, 1] }, i.e. the points on the straight line joining x and y that lie between them.

Vector space Sets 13/49


Hyperplanes (超平面)

Definition - Hyperplane
Given a non-zero v ∈ E and a ∈ R, the set
H = { x | ⟨v | x⟩ = a }
is called a hyperplane.

A hyperplane is not necessarily a subspace of E since, in general, it does not contain the origin 0.

The dimension of a hyperplane is dim(E) − 1.

Definition - Half space


Given a non-zero v ∈ E and a ∈ R, the set
H⁻v,a = { x | ⟨v | x⟩ ≤ a }
is called a half space.

Vector space Sets 14/49


Affine sets (仿射集)

Definition - Affine set


Given a real vector space E, a set S ⊆ E is called affine if for any x, y ∈ S and λ ∈ R, the inclusion

λx + (1−λ)y ∈ S
holds.

Affine combination (仿射组合): Given x1, x2, ..., xm ∈ E, a point of the form y = ∑_{i=1}^{m} αi xi, where ∑_{i=1}^{m} αi = 1, is called an affine combination of x1, x2, ..., xm.

Affine hull (仿射包) For a set S ⊆ E, the affine hull of S, denoted by aff(S), is the intersection of
all affine sets containing S.
aff(S) = { ∑_{i=1}^{m} αi xi | ∑_{i=1}^{m} αi = 1, xi ∈ S }.

aff(S) is by itself an affine set, and it is the smallest affine set containing S (w.r.t. inclusion).

Hyperplanes are affine sets.

Vector space Sets 15/49


Convex sets (凸集)

Definition - Convex set


A subset S of E is convex if for any x, y ∈ S and λ ∈ [0, 1], there holds
λx + (1 − λ)y ∈ S.
λx + (1 − λ)y is called the convex combination (凸组合) of x and y.

Line segment, hyperplane, half space and ray are convex.

Some properties
Let S be a convex set and β ∈ R; then βS = { βx | x ∈ S } is convex.
Let Si, i = 1, 2, ..., m, be a family of convex sets; then the intersection
∩_{i=1,2,...,m} Si
is convex.
Let S1 , S2 be two convex sets, then S1 + S2 and S1 − S2 are convex.

Proposition - Weighted sum


Let S ⊂ E be a convex set, and xi ∈ S, i = 1, 2, ..., m. Given {αi}i=1,...,m such that αi ≥ 0 and ∑_{i} αi = 1, then
∑_{i=1}^{m} αi xi ∈ S.

Vector space Sets 16/49
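The proposition can be sanity-checked numerically on a concrete convex set such as the unit Euclidean ball; a sketch with NumPy (sample data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Points inside the unit Euclidean ball (a convex set)
X = rng.standard_normal((5, 3))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # scale points outside into the ball

# Convex weights: nonnegative, summing to 1
w = rng.random(5)
w /= w.sum()

y = w @ X  # weighted sum sum_i w_i x_i
in_ball = np.linalg.norm(y) <= 1.0 + 1e-12  # stays in the ball by convexity
```
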


The space Rn
Let n be a positive integer; an n-dimensional column vector contains n entries:
x = (x1, x2, ..., xn)⊤ (entries stacked in a column).

The number xi is called the ith element/component of the vector x.

NB: Throughout the course, by default a vector refers to a column vector.

An n-dimensional row vector:
x = (x1, x2, ..., xn).

Transpose of a vector x⊤
If x is a column vector, then its transpose is a row vector.
One can also use (x1 , x2 , ..., xn )⊤ to denote an n-dimensional column vector.

Vector space The space Rn 17/49


The space Rn
The vector space Rn is the set of n-dimensional column vectors with real components, endowed with the component-wise addition operator
x + y = (x1 + y1, x2 + y2, ..., xn + yn)⊤
and the scalar-vector product
αx = (αx1, αx2, ..., αxn)⊤.

Standard basis of Rn : e1 , e2 , ..., en .

Vector space The space Rn 17/49


Inner product in Rn
Dot product:
⟨x | y⟩ = x⊤ y = ∑_{i=1}^{n} xi yi ∈ R.

Inner product in Rn : In this course, unless otherwise stated, the endowed inner product in Rn is
the dot product.

Definition - Q-inner product


Let Q ∈ Rn×n be positive definite, the Q-inner product is defined by
⟨x | y⟩Q = ⟨x | Qy⟩ = x⊤ Qy.

Recovers the dot product when Q = Id is the identity matrix.

Vector space The space Rn 18/49
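A quick NumPy sketch of the Q-inner product, using a small positive definite Q chosen purely for illustration:

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # symmetric positive definite (both eigenvalues > 0)
x = np.array([1.0, -1.0])
y = np.array([3.0, 2.0])

ip_Q = x @ Q @ y           # <x | y>_Q = x^T Q y
ip_dot = x @ y             # plain dot product
ip_id = x @ np.eye(2) @ y  # Q = Id recovers the dot product
```
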


Vector p-norm

Definition - Vector p-norm


Let x ∈ Rn be a vector and p ≥ 1; then the p-norm (also called the ℓp-norm) of x is defined by
||x||p = ( ∑_{i=1}^{n} |xi|^p )^{1/p}.

Example - Euclidean norm (ℓ2 -norm)


Given x ∈ Rn, letting p = 2 we obtain the Euclidean norm of x:
||x||2 = ( ∑_{i=1}^{n} |xi|² )^{1/2} = √(x⊤ x).

Induced by the dot product.
Returns the length of the vector in Rn.
People also write ||x|| without the subscript 2 to denote the ℓ2-norm.

Example - ℓ1 -norm
Let x ∈ Rn be a vector; the ℓ1-norm of x is defined by
||x||1 = ∑_{i=1}^{n} |xi|.

Returns the sum of absolute values.

Example - Infinity norm (ℓ∞ -norm)


Let x ∈ Rn be a vector; the infinity norm of x is defined by
||x||∞ = max_{i=1,...,n} |xi|.

Returns the largest absolute value among the components.

Vector space The space Rn 19/49
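The three examples above can be reproduced with NumPy's norm routine (the vector is illustrative):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1   = np.linalg.norm(x, 1)       # l1-norm: sum of absolute values = 7
l2   = np.linalg.norm(x, 2)       # Euclidean length: sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf)  # infinity norm: largest |x_i| = 4
```
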


Subsets of Rn

The nonnegative orthant is the subset of Rn consisting of all vectors in Rn with nonnegative components and is denoted by Rn+:
Rn+ = { (x1, x2, ..., xn)⊤ | x1, x2, ..., xn ≥ 0 }.

The positive orthant is the subset of Rn consisting of all vectors in Rn with positive components and is denoted by Rn++:
Rn++ = { (x1, x2, ..., xn)⊤ | x1, x2, ..., xn > 0 }.

The unit simplex, denoted by ∆n, is the subset of Rn comprising all nonnegative vectors whose components sum up to one:
∆n = { x ∈ Rn+ | ∑_{i=1}^{n} xi = 1 }.

Given two vectors ℓ, u ∈ Rn that satisfy ℓ ≤ u, the box with lower bounds ℓ and upper bounds u is denoted by Box[ℓ, u] and defined by
Box[ℓ, u] = { x ∈ Rn | ℓ ≤ x ≤ u }.

Vector space The space Rn 20/49
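Membership in these subsets reduces to component-wise checks; a NumPy sketch with an illustrative point:

```python
import numpy as np

x = np.array([0.2, 0.5, 0.3])
l = np.array([0.0, 0.0, 0.0])
u = np.array([1.0, 1.0, 1.0])

in_orthant = bool(np.all(x >= 0))                     # x in R^n_+
in_simplex = in_orthant and np.isclose(x.sum(), 1.0)  # x in the unit simplex
in_box     = bool(np.all((l <= x) & (x <= u)))        # x in Box[l, u]
```
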


Operations on vectors in Rn

The vector [x]+ is the nonnegative part of a vector x ∈ Rn:
[x]+ = ( max{xi, 0} )_{i=1}^{n}.

Given a vector x ∈ Rn, the vector |x| is the vector of component-wise absolute values (|xi|)_{i=1}^{n}, and the vector sign(x) is defined component-wise by
sign(x)i = +1 if xi ≥ 0, and −1 if xi < 0.

For two vectors x, y ∈ Rn, their Hadamard product, denoted by x ⊙ y, is the vector comprising the component-wise products:
x ⊙ y = ( xi yi )_{i=1}^{n}.

Vector space The space Rn 21/49
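The four operations map directly to NumPy primitives; note the slide's convention sign(0) = +1, which differs from np.sign (where sign(0) = 0):

```python
import numpy as np

x = np.array([1.5, -2.0, 0.0])
y = np.array([2.0, 3.0, -1.0])

x_plus = np.maximum(x, 0.0)           # [x]_+, nonnegative part
x_abs  = np.abs(x)                    # |x|
x_sign = np.where(x >= 0, 1.0, -1.0)  # sign(x) with sign(0) = +1, as on the slide
had    = x * y                        # Hadamard product x ⊙ y
```
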


4. Matrix space
Definition and properties
Matrix
A matrix with m rows and n columns is called an m × n matrix and denoted by
A = (ai,j)i=1,...,m; j=1,...,n ∈ Rm×n.

The kth column of A is denoted by ak = (a1,k, a2,k, ..., am,k)⊤.

Matrix space 22/49


Matrix addition
Let A, B ∈ Rm×n be two matrices; their addition A + B ∈ Rm×n is component-wise:
(A + B)i,j = ai,j + bi,j.
Properties
Commutative
A + B = B + A.
Associative
(A + B) + C = A + (B + C).
0 adding
0 + A = A + 0 = A.

Matrix space 23/49


Scalar multiplication
Let α ∈ R and A ∈ Rm×n; their multiplication αA ∈ Rm×n is component-wise:
(αA)i,j = α ai,j.
Properties
Distributive: let β ∈ R and B ∈ Rm×n,
α(A + B) = αA + αB,
(α + β)A = αA + βA.
Associative
α(βA) = (αβ)A.
Scalars
1A = A, α0 = 0, 0A = 0 and (−1)A = −A.

Matrix space 24/49


Matrix-vector product
The matrix-vector product of A ∈ Rm×n and x ∈ Rn returns a vector y = Ax ∈ Rm with entries
yi = a⊤i,: x = ∑_{j=1}^{n} ai,j xj, i = 1, ..., m;
equivalently,
y = x1 a:,1 + x2 a:,2 + · · · + xn a:,n.

If A ∈ R1×n, then the matrix-vector product recovers the vector dot product.

NB: y is a linear combination/representation of the columns of A.

Matrix space 25/49
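The column-combination view of the matrix-vector product, checked with NumPy on illustrative data:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([2.0, -1.0])

y = A @ x                                # matrix-vector product
combo = x[0] * A[:, 0] + x[1] * A[:, 1]  # same vector: linear combination of the columns of A
```
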


Matrix-matrix product
The matrix-matrix product of A ∈ Rm×n and B ∈ Rn×p returns a matrix C = AB ∈ Rm×p with entries
ci,j = ai,: b:,j = ∑_{k=1}^{n} ai,k bk,j, i = 1, ..., m, j = 1, ..., p.

Not commutative: in general AB ≠ BA.
Associative: (AB)C = A(BC).
Distributive: A(B + C) = AB + AC.

Matrix space 26/49


Transpose (转置)
The transpose of a matrix is an operator which flips a matrix over its diagonal — switching the row
and column indices of the matrix. Let A ∈ Rm×n , then its transpose A⊤ ∈ Rn×m with

(A⊤ )i,j = Aj,i .


Some properties
(A⊤ )⊤ = A.
(AB)⊤ = B ⊤ A⊤ .
(A + B)⊤ = A⊤ + B ⊤ .

 
A⊤ = (aj,i)i=1,...,n; j=1,...,m ∈ Rn×m.

NB: A is symmetric if A⊤ = A.

Matrix space 27/49


Trace (迹)
The trace of a square matrix A ∈ Rn×n, denoted by trace(A), is the sum of its diagonal entries:
trace(A) = ∑_{i=1}^{n} ai,i.

Some properties
trace(A) = trace(A⊤ ).
trace(A + B) = trace(A) + trace(B).
Let a ∈ R, trace(aA) = atrace(A).
Let A, B be such that AB is square, trace(AB) = trace(BA).
Let A, B and C be such that ABC is square,

trace(ABC) = trace(BCA) = trace(CAB).


x⊤ x = trace(xx⊤).

Matrix space 28/49
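Two of the trace identities above, verified numerically with NumPy on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
x = rng.standard_normal(5)

# trace(AB) = trace(BA), even though AB is 3x3 while BA is 4x4
t1 = np.trace(A @ B)
t2 = np.trace(B @ A)

# x^T x = trace(x x^T)
s1 = x @ x
s2 = np.trace(np.outer(x, x))
```
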


The space Rm×n
The set of all real-valued m × n matrices is denoted by Rm×n . This is a vector space with the
component-wise addition as the summation operation and the component-wise scalar
multiplication as the “scalar-vector multiplication” operation. The dot product in Rm×n is defined by
⟨A | B⟩ = trace(A⊤ B) = ∑_{i=1}^{m} ∑_{j=1}^{n} ai,j bi,j, ∀ A, B ∈ Rm×n.

Correspondence between Rm×n and Rmn .

Inner product in Rm×n : In this course, unless otherwise stated, the endowed inner product in
Rm×n is the dot product.

Matrix space The space Rm×n 29/49
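The identity ⟨A | B⟩ = trace(A⊤B) and the correspondence with Rmn can be checked with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3))

ip_trace = np.trace(A.T @ B)      # <A | B> = trace(A^T B)
ip_sum   = np.sum(A * B)          # sum_{i,j} a_ij * b_ij
ip_vec   = A.ravel() @ B.ravel()  # dot product after flattening to R^{mn}
```
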


Subsets of Rm×n

The subset of all n × n symmetric matrices is denoted by Sn :


Sn = { A ∈ Rn×n | A = A⊤ }.
NB: Sn is a vector space.

When A is real symmetric, all its eigenvalues are real, and


Positive definite A ≻ 0 if for all non-zero x ∈ Rn , x⊤ Ax > 0. All its eigenvalues are positive.
Positive semi-definite A ⪰ 0 if for all x ∈ Rn , x⊤ Ax ≥ 0. All its eigenvalues are non-negative.
Negative definite A ≺ 0 if for all non-zero x ∈ Rn, x⊤Ax < 0. All its eigenvalues are negative.
Negative semi-definite A ⪯ 0 if for all x ∈ Rn, x⊤Ax ≤ 0. All its eigenvalues are non-positive.
Indefinite: A has eigenvalues with mixed signs.

The subset of all n × n orthogonal matrices is denoted by On :


On = { A ∈ Rn×n | AA⊤ = A⊤ A = Idn }.

Matrix space The space Rm×n 30/49


Norms in Rm×n
Frobenius norm: if Rm×n is endowed with the dot product, then the corresponding Euclidean norm is the Frobenius norm, defined by
||A||F = ( ∑_{i} ∑_{j} a²i,j )^{1/2} = √(trace(A⊤ A)), ∀ A ∈ Rm×n.

Definition - Induced matrix norm (or operator norm)


Let || · ||a and || · ||b be norms on Rm and Rn, respectively. The operator norm of A ∈ Rm×n is defined by
||A||a,b = max{ ||Ax||a : ||x||b ≤ 1 }.

It can be easily shown that given any x ∈ Rn , there holds

||Ax||a ≤ ||A||a,b ||x||b .

Matrix space The space Rm×n 31/49


Examples

When a = b = 2, the operator norm of A is its maximum singular value, denoted by ||A||2:
||A||2 = σmax(A) = √(λmax(A⊤ A)).

Let λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0 be the eigenvalues of A⊤A, and v1, v2, ..., vn the orthonormal set of eigenvectors corresponding to these eigenvalues.
Given any x ∈ E with ||x|| = 1, there exist c1, c2, ..., cn ∈ R such that x = c1v1 + c2v2 + · · · + cnvn. Then
⟨x | x⟩ = c²1 + c²2 + · · · + c²n = 1,
and
||Ax||² = ⟨Ax | Ax⟩ = ⟨x | A⊤Ax⟩
= ⟨c1v1 + c2v2 + · · · + cnvn | A⊤A(c1v1 + c2v2 + · · · + cnvn)⟩
= ⟨c1v1 + c2v2 + · · · + cnvn | c1λ1v1 + c2λ2v2 + · · · + cnλnvn⟩
= λ1c²1 + λ2c²2 + · · · + λnc²n
≤ λ1(c²1 + c²2 + · · · + c²n) = λ1.

Denote by ||A||∞ the norm induced by the ℓ∞-norms on Rm and Rn:
||A||∞ = max{ ||Ax||∞ : ||x||∞ ≤ 1 } = max_{i=1,...,m} ∑_{j=1}^{n} |ai,j|,
which is the max-row-sum norm.

Matrix space The space Rm×n 32/49


Denote by ||A||1 the norm induced by the ℓ1-norms on Rm and Rn:
||A||1 = max{ ||Ax||1 : ||x||1 ≤ 1 } = max_{j=1,...,n} ∑_{i=1}^{m} |ai,j|,
which is the max-column-sum norm.

Matrix space The space Rm×n 32/49
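NumPy exposes these three induced norms directly; a sketch verifying the spectral, max-row-sum, and max-column-sum characterizations on an illustrative matrix:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

spec = np.linalg.norm(A, 2)       # ||A||_2, the largest singular value
row  = np.linalg.norm(A, np.inf)  # max row sum of |a_ij|: max(3, 7) = 7
col  = np.linalg.norm(A, 1)       # max column sum of |a_ij|: max(4, 6) = 6
sv_max = np.linalg.svd(A, compute_uv=False)[0]  # sigma_max(A), singular values sorted descending
```
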


5. Dual space
Linear transform, duality and norms in dual space
Linear transformation

Definition - Linear transformation


Given two vector spaces E and V, a function A : E → V is called a linear transformation if the
following properties hold:
A(αx) = αA(x) for every x ∈ E and α ∈ R.
A(x + y) = A(x) + A(y) for every x, y ∈ E.

Dot product (of vectors and matrices) and matrix-vector product are linear transformations.

Identity transformation, denoted by I, is defined by the relation

I(x) = x
for all x ∈ E.
All linear transformations from Rn to Rm have the form
A(x) = Ax
for some matrix A ∈ Rm×n.

Dual space 33/49


Dual space (对偶空间)

Definition - Linear functional and dual space


A linear functional on a vector space E is a linear transformation from E to R. Given a vector space E, the set of all linear functionals on E is called the dual space of E and is denoted by E∗.

E∗ itself is a vector space.


For inner product spaces, it is known that given a linear functional f ∈ E∗ , there always exists
v ∈ E such that
f (x) = ⟨v | x⟩.
For the sake of simplicity of notation, one can represent the linear functional f above by the
vector v.
This correspondence between linear functionals and elements of E means that the elements of E∗ can be identified with those of E.
The inner product in E∗ is the same as the inner product in E. Essentially, the only difference
between E and E∗ will be in the choice of norms of each of the spaces.

Dual space 34/49


Basis of dual space
Given a basis e1, e2, ..., en of a vector space E, there exists a dual basis of E∗, written ϕ1, ϕ2, ..., ϕn, where
ϕi(ej) = δij = 1 if i = j, and 0 if i ≠ j.

{ϕ1 , ϕ2 , ..., ϕn } spans E∗ .

{ϕ1 , ϕ2 , ..., ϕn } are linearly independent.

Dual space 35/49


Dual norm

Definition - Dual norm


Suppose E is endowed with a norm || · ||. Then the norm of the dual space, called the dual norm, is given by: for v ∈ E∗,
||v||∗ = max{ ⟨v | x⟩ | ||x|| ≤ 1 }.

Dual norm is also a norm.

It can be treated as the operator norm of v⊤, interpreted as a 1 × n matrix, with the norm || · || on E and the absolute value on R:
||v||∗ = max{ |⟨v | x⟩| | ||x|| ≤ 1 }.

Dual space 36/49


Examples

The dual of the Euclidean norm is the Euclidean norm:
max{ ⟨v | x⟩ | ||x||2 ≤ 1 } = ||v||2.

NB: Recall the Cauchy-Schwarz inequality.

Dual space 37/49


Examples

The dual of the ℓ1-norm is the ℓ∞-norm:
max{ ⟨v | x⟩ | ||x||1 ≤ 1 } = ||v||∞.

Recall that in R2
⟨v | x⟩ = v1 x1 + v2 x2 .

Dual space 37/49


Examples

The dual of the ℓ∞-norm is the ℓ1-norm:
max{ ⟨v | x⟩ | ||x||∞ ≤ 1 } = ||v||1.

Recall that in R2
⟨v | x⟩ = v1 x1 + v2 x2 .

Dual space 37/49


Examples

More generally, the ℓp- and ℓq-norms are dual to each other if
1/p + 1/q = 1.

Dual space 37/49


Examples

Q-norms: Consider the space Rn endowed with the Q-norm ||x||Q = √(x⊤Qx), where Q ∈ Sn++. The dual norm of || · ||Q is || · ||Q−1, meaning
||v||Q−1 = √(v⊤ Q−1 v).

Dual space 37/49


Generalized Cauchy-Schwarz inequality
From the definition of the dual norm, we have the following inequality: for x ≠ 0,
⟨v | x/||x||⟩ ≤ max{ ⟨v | x̃⟩ | ||x̃|| ≤ 1 } = ||v||∗ =⇒ ⟨v | x⟩ ≤ ||v||∗ ||x||,
which holds for all v and x.
The inequality is tight in the sense that, for any x there exists a v such that the equality holds,
and vice versa.

Theorem - Generalized Cauchy-Schwarz inequality


Let E be an inner product vector space endowed with a norm || · ||. Then for any two vectors x ∈ E
and v ∈ E∗ ,
|⟨x | v⟩| ≤ ||x||||v||∗
holds. Furthermore, equality holds if and only if x = αv for some α ∈ R.

Dual space 38/49
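The tightness claim can be illustrated concretely. A pure-Python sketch (the vector x is an arbitrary illustrative choice) showing a v attaining equality in ⟨v | x⟩ ≤ ||v||∗ ||x|| in two cases: the Euclidean norm (take v = x) and the ℓ1-norm (take v = sign(x), whose ℓ∞-norm is 1):

```python
# The generalized Cauchy-Schwarz bound <v, x> <= ||v||_* ||x||,
# and a v attaining equality, in two concrete cases.

def dot(v, x):
    return sum(a * b for a, b in zip(v, x))

def norm2(x):
    return dot(x, x) ** 0.5

x = [1.0, -2.0, 2.0]

# Euclidean case: ||.||_* is again ||.||_2, and v = x attains equality.
assert abs(dot(x, x) - norm2(x) * norm2(x)) < 1e-12

# l1 case: the dual norm is ||.||_inf, and v = sign(x) attains equality:
# <sign(x), x> = ||x||_1 while ||sign(x)||_inf = 1.
v = [1.0 if t >= 0 else -1.0 for t in x]
norm1 = sum(abs(t) for t in x)
assert abs(dot(v, x) - norm1) < 1e-12
assert max(abs(t) for t in v) == 1.0
```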


Adjoint transformation

Definition - Adjoint transformation


Given two vector spaces E, V and a linear transformation A : V → E, the adjoint transformation,
denoted by A⊤ , is a transformation from E∗ to V∗ defined by the relation

⟨y | A(x)⟩ = ⟨A⊤ (y) | x⟩


for any x ∈ V and y ∈ E.

When V = Rn and E = Rm (endowed with dot product), and A(x) = Ax for some matrix A ∈ Rm×n ,
then the adjoint transformation is given by

A⊤ (y) = A⊤ y.

Dual space 39/49
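The defining relation of the adjoint is easy to check numerically for the matrix case. A pure-Python sketch (the matrix and vectors are arbitrary illustrative choices) verifying ⟨y | Ax⟩ = ⟨A⊤ y | x⟩:

```python
# Check the defining identity <y, A x> = <A^T y, x> for a small matrix.

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def matvec(M, x):
    return [dot(row, x) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0]]        # A : R^3 -> R^2
x = [1.0, -1.0, 2.0]
y = [3.0, 0.5]

assert abs(dot(y, matvec(A, x)) - dot(matvec(transpose(A), y), x)) < 1e-12
```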


Norm of linear transformations

Example - Norm of linear transformations


Let A : E → V be a linear transformation from a vector space E to a vector space V. Assume that E
and V are endowed with the norms || · ||E and || · ||V , respectively. The norm of the linear
transformation is defined by
||A|| = max { ||A(x)||V | ||x||E ≤ 1 } .

There holds ||A|| = ||A⊤ ||.

Suppose A : Rn → Rm , then
A(x) = Ax
where A ∈ Rm×n . Assume Rn and Rm are endowed with norms || · ||a and || · ||b , respectively.
Then
||A|| = ||A||a,b .

Dual space 40/49
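As a concrete instance, when both Rn and Rm carry the ℓ1-norm, the operator norm of x ↦ Ax has the known closed form "largest column absolute sum", attained at a standard basis vector. A small pure-Python sketch (illustrative matrix, a few hand-picked unit-ball points rather than a proof):

```python
# For ||.||_1 on both domain and range, the operator norm of x -> Ax
# is the largest column absolute sum, attained at a standard basis vector.

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def norm1(x):
    return sum(abs(t) for t in x)

A = [[1.0, -3.0],
     [2.0, 0.5]]

col_sums = [sum(abs(row[j]) for row in A) for j in range(2)]
op_norm = max(col_sums)      # largest column absolute sum

# each basis vector e_j has ||e_j||_1 = 1 and ||A e_j||_1 = j-th column sum
e = [[1.0, 0.0], [0.0, 1.0]]
assert max(norm1(matvec(A, ej)) for ej in e) == op_norm

# a few other unit-l1-ball points do not exceed it
for x in [[0.5, 0.5], [0.5, -0.5], [-0.2, 0.8]]:
    assert norm1(matvec(A, x)) <= op_norm + 1e-12
```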


6. Supplementary
Definitions, matrix operations
Special matrices

Definition - Square matrix


A matrix A is said to be square if the number of its rows is equal to the number of its columns, i.e. it
is of n × n.

A diagonal matrix D ∈ Rn×n is a square matrix whose off-diagonal entries are all 0. Typical
notation: D = diag(σ1 , σ2 , ..., σn ) with
di,j = σi if i = j, and di,j = 0 if i ̸= j, for i, j = 1, ..., n.
The identity matrix, Idn ∈ Rn×n (or simply Id), is diagonal with all diagonal entries equal to 1.

Definition - Orthogonal matrix


A square matrix U is orthogonal if all its columns are orthogonal to each other and have unit length.

The inverse of an orthogonal matrix is its transpose U ⊤ U = Id = U U ⊤ .


Length preserving ||U x|| = ||x||.

Supplementary 41/49
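The two listed properties of orthogonal matrices can be verified on a 2 × 2 rotation, which is a standard example of an orthogonal matrix. A pure-Python sketch (the angle and test vector are arbitrary illustrative choices):

```python
import math

# A 2x2 rotation is orthogonal: U^T U = Id, and ||U x|| = ||x||.
theta = 0.7
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(M):
    return [list(c) for c in zip(*M)]

# U^T U = Id
UtU = matmul(transpose(U), U)
for i in range(2):
    for j in range(2):
        assert abs(UtU[i][j] - (1.0 if i == j else 0.0)) < 1e-12

# length preservation: ||U x||_2 = ||x||_2
x = [3.0, -4.0]
Ux = [sum(U[i][k] * x[k] for k in range(2)) for i in range(2)]
assert abs(sum(t * t for t in Ux) ** 0.5 - 5.0) < 1e-12
```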
Rank (秩)
The maximal number of linearly independent columns of A is called the rank of the matrix, denoted
as rank(A).
rank(A) is the dimension of span[ a1 , a2 , · · · , an ].

Proposition - Properties of rank


The rank of a matrix A is invariant under the following operations
Multiplication of the columns of A by nonzero scalars.
Interchange of the columns.
Addition to a given column a linear combination of other columns.

Let A ∈ Rm×n , there holds rank(A) = rank(A⊤ ) and


rank(A) ≤ min{m, n}; if rank(A) = min{m, n}, then A has full (column/row) rank.
Let B ∈ Rn×q , then rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}.
Let B ∈ Rm×n , then rank(A + B) ≤ rank(A) + rank(B).

Supplementary 42/49
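The product bounds on rank(AB) can be spot-checked with a small row-reduction rank routine. A pure-Python sketch (floating-point elimination with a tolerance; the matrices are arbitrary illustrative choices, not a general-purpose rank algorithm):

```python
# A small row-reduction rank routine (floating point, with a tolerance),
# used to spot-check the rank inequalities for a product.

def rank(M, tol=1e-9):
    M = [row[:] for row in M]
    m, n = len(M), len(M[0])
    r = 0
    for col in range(n):
        # pick the largest-magnitude pivot at or below row r
        piv = max(range(r, m), key=lambda i: abs(M[i][col]), default=None)
        if piv is None or abs(M[piv][col]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, m):
            f = M[i][col] / M[r][col]
            for j in range(col, n):
                M[i][j] -= f * M[r][j]
        r += 1
    return r

A = [[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]]   # 3x2, rank 2
B = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]     # 2x3, rank 1

AB = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(3)]
      for i in range(3)]

assert rank(A) == 2 and rank(B) == 1
# rank(A) + rank(B) - n <= rank(AB) <= min{rank(A), rank(B)}  (n = 2 here)
assert rank(A) + rank(B) - 2 <= rank(AB) <= min(rank(A), rank(B))
```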
Inverse
The inverse A−1 of a matrix A ∈ Rn×n is defined such that

AA−1 = A−1 A = Idn .


If A−1 exists, A is said to be non-singular. Otherwise, A is said to be singular.
(A−1 )−1 = A.
(AB)−1 = B −1 A−1 .
(A−1 )⊤ = (A⊤ )−1 .
Neumann series (with A0 = Idn ): provided the series converges, e.g. when ||A|| < 1,

(Idn − A)−1 = ∑∞k=0 Ak .

Supplementary 43/49
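The Neumann series can be illustrated numerically: for a matrix A with ||A|| < 1, the partial sums of ∑k Ak approximate (Id − A)−1. A pure-Python sketch (the matrix is an arbitrary illustrative choice with spectral radius 0.3):

```python
# Partial sums of the Neumann series approximate (Id - A)^{-1}
# when the series converges; here A is small enough.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

A = [[0.2, 0.1],
     [0.0, 0.3]]
I = [[1.0, 0.0], [0.0, 1.0]]

# S = sum_{k=0}^{K} A^k, built term by term
S, P = I, I
for _ in range(60):
    P = matmul(P, A)
    S = matadd(S, P)

# (Id - A) S should be close to Id
IA = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
R = matmul(IA, S)
for i in range(2):
    for j in range(2):
        assert abs(R[i][j] - (1.0 if i == j else 0.0)) < 1e-10
```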
Pseudo inverse (伪逆)
The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A+ that satisfies
AA+ A = A.
A+ AA+ = A+ .
AA+ and A+ A are symmetric.

Suppose A ∈ Rm×n has full (column/row) rank, then


Square m = n: A+ = A−1 .
Broad m < n: A+ = A⊤ (AA⊤ )−1 , a.k.a. right inverse.
Tall m > n: A+ = (A⊤ A)−1 A⊤ , a.k.a. left inverse.

Supplementary 44/49
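The tall-matrix case can be checked by hand. A pure-Python sketch (illustrative 3 × 2 full-column-rank matrix; the 2 × 2 inverse is formed with the closed-form adjugate) verifying that A+ = (A⊤ A)−1 A⊤ is a left inverse:

```python
# Left inverse of a tall full-column-rank matrix: A^+ = (A^T A)^{-1} A^T,
# so that A^+ A = Id (2x2 inverse via the closed-form adjugate).

A = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0]]          # 3x2, rank 2

At = [list(c) for c in zip(*A)]                      # A^T, 2x3
AtA = [[sum(At[i][k] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]                            # A^T A, 2x2
a, b, c, d = AtA[0][0], AtA[0][1], AtA[1][0], AtA[1][1]
det = a * d - b * c
inv = [[d / det, -b / det], [-c / det, a / det]]     # (A^T A)^{-1}
Aplus = [[sum(inv[i][k] * At[k][j] for k in range(2)) for j in range(3)]
         for i in range(2)]                          # A^+, 2x3

# A^+ A = Id_2
ApA = [[sum(Aplus[i][k] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
for i in range(2):
    for j in range(2):
        assert abs(ApA[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```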
Range and nullspace (像空间, 零空间)

The range/image or the column-space of a matrix A ∈ Rm×n , denoted as range(A), is the span
of the columns of A:
range(A) = { v ∈ Rm : v = Ax, x ∈ Rn }.

The nullspace (also called kernel) of A ∈ Rm×n , denoted as null(A), is the set of all vectors x
such that Ax = 0,
null(A) = {x ∈ Rn : Ax = 0}.

Orthogonal decomposition induced by A If S is a subspace of Rn , its orthogonal complement,
denoted by S⊥ , is defined as
S⊥ = { x | v ⊤ x = 0 for all v ∈ S } .

Let A ∈ Rm×n , there holds


null(A)⊥ = range(A⊤ ).

Supplementary 45/49
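The identity null(A)⊥ = range(A⊤) is visible in a one-row example. A pure-Python sketch (the 1 × 2 matrix is an arbitrary illustrative choice): here null(A) = span{(1, −1)} and range(A⊤) = span{(1, 1)}, which are orthogonal complements in R2.

```python
# null(A)^perp = range(A^T) for the 1x2 matrix A = [1, 1]:
# null(A) = span{(1, -1)}, range(A^T) = span{(1, 1)}.

A = [[1.0, 1.0]]

n = [1.0, -1.0]                 # basis vector of null(A)
assert A[0][0] * n[0] + A[0][1] * n[1] == 0.0     # A n = 0

r = [A[0][0], A[0][1]]          # basis of range(A^T): the (only) row of A
assert n[0] * r[0] + n[1] * r[1] == 0.0           # n is orthogonal to range(A^T)
```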
Eigenvalues and eigenvectors
Let A ∈ Rn×n . We say λ ∈ C is an eigenvalue of A and Cn ∋ x ̸= 0 its corresponding eigenvector if

Ax = λx.

(λ, x) is an eigen-pair of A if
(λId − A)x = 0, x ̸= 0.
(λId − A)x = 0 has a non-zero solution x if and only if λId − A has a non-trivial nullspace,
i.e. λId − A is singular:
|λId − A| = 0.

Supplementary 46/49
Properties

Theorem - Number of eigen-pairs


Suppose that the characteristic equation |λId − A| = 0 has n distinct roots λ1 , λ2 , · · · , λn . Then,
there exist n linearly independent vectors v1 , v2 , · · · , vn such that

Avi = λi vi , i = 1, 2, ..., n.

Both λ and v can be complex.

Theorem - Eigenvalues of real symmetric matrix


All eigenvalues of a real symmetric matrix are real.

Theorem - Eigenvectors of real symmetric matrix


Any real symmetric n × n matrix has a set of n eigenvectors that are mutually orthogonal.

Supplementary 47/49
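Both theorems about real symmetric matrices can be seen explicitly in the 2 × 2 case, where the characteristic equation is a quadratic with discriminant (a − d)² + 4b² ≥ 0. A pure-Python sketch (the matrix is an arbitrary illustrative choice):

```python
import math

# For a 2x2 real symmetric matrix, the eigenvalues are real
# (discriminant >= 0) and the two eigenvectors are orthogonal.

A = [[2.0, 1.0],
     [1.0, 3.0]]

tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = tr * tr - 4.0 * det
assert disc >= 0.0                       # real eigenvalues
lam1 = (tr + math.sqrt(disc)) / 2.0
lam2 = (tr - math.sqrt(disc)) / 2.0

# eigenvector for lam: (A - lam I) v = 0  ->  v = (A[0][1], lam - A[0][0])
v1 = [A[0][1], lam1 - A[0][0]]
v2 = [A[0][1], lam2 - A[0][0]]
assert abs(v1[0] * v2[0] + v1[1] * v2[1]) < 1e-12   # mutually orthogonal

# check A v1 = lam1 v1
Av1 = [A[0][0] * v1[0] + A[0][1] * v1[1], A[1][0] * v1[0] + A[1][1] * v1[1]]
assert all(abs(Av1[i] - lam1 * v1[i]) < 1e-12 for i in range(2))
```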
Properties
We have the following connections
The trace of A equals the sum of its eigenvalues:

trace(A) = ∑i λi .

For diagonalizable A, the rank of A equals the number of its non-zero eigenvalues:

rank(A) = ||Λ||0 , Λ = [λ1 , ..., λn ].

Suppose A is non-singular, then (1/λ, x) is an eigen-pair of A−1 .

Supplementary 47/49
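The trace identity and the inverse eigen-pair property can be checked on a matrix whose eigen-pairs are known in closed form. A pure-Python sketch: A = [[2, 1], [1, 2]] has eigen-pairs (3, [1, 1]) and (1, [1, −1]), so trace(A) = 4 = 3 + 1, and (1/3, [1, 1]) is an eigen-pair of A−1.

```python
# A = [[2,1],[1,2]] has eigen-pairs (3, [1,1]) and (1, [1,-1]);
# trace(A) = 3 + 1, and (1/3, [1,1]) is an eigen-pair of A^{-1}.

A = [[2.0, 1.0],
     [1.0, 2.0]]

# verify the claimed eigen-pairs directly
for lam, v in [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]:
    Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
    assert Av == [lam * v[0], lam * v[1]]

assert A[0][0] + A[1][1] == 3.0 + 1.0    # trace = sum of eigenvalues

# inverse: det(A) = 3, so A^{-1} = (1/3) [[2,-1],[-1,2]]
Ainv = [[2.0 / 3.0, -1.0 / 3.0], [-1.0 / 3.0, 2.0 / 3.0]]
v = [1.0, 1.0]
Ainv_v = [Ainv[0][0] + Ainv[0][1], Ainv[1][0] + Ainv[1][1]]  # A^{-1} v
assert all(abs(Ainv_v[i] - v[i] / 3.0) < 1e-12 for i in range(2))
```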
Diagonalization
Let A be real symmetric; then it has n linearly independent eigenvectors v1 , v2 , ..., vn . Let
V = [ v1 v2 · · · vn ] and Λ = diag(λ1 , λ2 , ..., λn ).
There holds
V −1 AV = Λ.

If v1 , v2 , ..., vn are mutually orthogonal and ||vi || = 1, i = 1, 2, ..., n, then

V ⊤ = V −1 and V ⊤ AV = Λ.

Supplementary 48/49
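A small check that, with orthonormal eigenvectors stacked as the columns of V, the congruence V⊤ A V recovers the diagonal eigenvalue matrix. Pure-Python sketch on an illustrative 2 × 2 symmetric A with known eigenvectors [1, 1]/√2 and [1, −1]/√2:

```python
import math

# For symmetric A with orthonormal eigenvector matrix V (columns v_i),
# V^T A V is the diagonal matrix of eigenvalues.

A = [[2.0, 1.0],
     [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
V = [[s, s],
     [s, -s]]          # columns: [1,1]/sqrt2 (lam = 3), [1,-1]/sqrt2 (lam = 1)

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Vt = [list(c) for c in zip(*V)]
L = matmul(matmul(Vt, A), V)   # should equal diag(3, 1)

assert abs(L[0][0] - 3.0) < 1e-12 and abs(L[1][1] - 1.0) < 1e-12
assert abs(L[0][1]) < 1e-12 and abs(L[1][0]) < 1e-12
```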
Singular value decomposition
Any m × n matrix can be written as
A = U ΣV ⊤
where
Left singular vectors: the columns of U = [ u1 , u2 , · · · , um ] are eigenvectors of AA⊤ ; U is of size m × m.
Σ = diag(σ1 , σ2 , · · · , σm ) is the diagonal matrix whose diagonal entries are the square roots
of the eigenvalues of AA⊤ , with
σ1 ≥ σ2 ≥ · · · ≥ σm ≥ 0.
Right singular vectors: the columns of V = [ v1 , v2 , · · · , vm ] are eigenvectors of A⊤ A; V is of
size n × m (here assuming m ≤ n).
The singular value decomposition (SVD) can then be written as A = ∑mi=1 σi ui vi⊤ .

Definition - Condition number


The condition number of a non-singular matrix A ∈ Rn×n , denoted as cond(A) or κ(A), is defined as

κ(A) = σmax (A) / σmin (A).

Supplementary 49/49
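For a 2 × 2 matrix, the singular values can be obtained by hand as the square roots of the eigenvalues of A⊤A, which also gives the condition number directly. A pure-Python sketch (the matrix is an arbitrary illustrative choice); note that σmax σmin = |det A| for the 2 × 2 case:

```python
import math

# Singular values of a 2x2 matrix from the eigenvalues of A^T A,
# and the condition number kappa = sigma_max / sigma_min.

A = [[3.0, 0.0],
     [4.0, 5.0]]

AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
tr = AtA[0][0] + AtA[1][1]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
sig_max = math.sqrt((tr + disc) / 2.0)
sig_min = math.sqrt((tr - disc) / 2.0)
kappa = sig_max / sig_min

# product of singular values = |det A|
assert abs(sig_max * sig_min
           - abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])) < 1e-9
assert abs(kappa - 3.0) < 1e-9
```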
