
UUM 526 Optimization Techniques in Engineering

Lecture 2: Mathematical Preliminaries

Asst. Prof. N. Kemal Ure

Istanbul Technical University


[email protected]

February 5, 2019



Overview

1 Proof Methods

2 Linear Algebra

3 Geometry

4 Calculus



Introduction

▶ We need to review/build some mathematical tools before we can start attacking optimality tests and convergence of algorithms
▶ We are going to review:
∎ Proof Methods
∎ Linear Algebra
∎ Geometry

∎ Single and Multivariable Calculus

▶ If you took UUM 535, everything here should look familiar.


∎ We will skip most proofs here, since we already did them in UUM 535.
∎ All the proofs are available in Chong's book anyway.

▶ The main goal is to understand quadratic forms and Taylor's Theorem in Rn, which will be our main tools in the analysis of optimization problems and algorithms.
Proof Methods

Proofs

▶ A theorem is usually a statement of the form A ⟹ B

∎ A is the assumption of the theorem (what is given to us), B is the conclusion. If A is true, then B holds

▶ Example (Triangle Inequality): A ∶ a, b ∈ R, B ∶ ∣a + b∣ ≤ ∣a∣ + ∣b∣

▶ Some theorems work both ways: A ⇐⇒ B

∎ A is true if and only if B is true
∎ In this case we have to prove both A ⟹ B and B ⟹ A. Example:

Theorem
Let a, b be real numbers. Then a = b if and only if for all ε > 0, ∣a − b∣ < ε


Proof By Contradiction (PBC) and Proof By Induction

▶ We just used PBC in the last proof! It is a very popular technique

▶ It is based on the fact that A ⟹ B is equivalent to not(A and (not B))
Theorem
There are infinitely many prime numbers
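
▶ A sketch of the classic contradiction argument (Euclid's proof, written out in LaTeX here for illustration; the slide states only the theorem):

% Proof by contradiction: assume there are only finitely many primes.
\begin{proof}
Suppose, for contradiction, that there are only finitely many primes
$p_1, \ldots, p_k$, and let $q = p_1 p_2 \cdots p_k + 1$. Dividing $q$
by any $p_i$ leaves remainder $1$, so no $p_i$ divides $q$. But $q > 1$
must have some prime factor, which therefore lies outside the list,
contradicting the assumption that the list contained all primes.
\end{proof}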

▶ Proof by induction is also a popular technique. Suppose that the property we want to prove is indexed by N, so we need to prove A(1), A(2), . . .
∎ First prove that A(1) is true
∎ Next prove that A(n) ⟹ A(n + 1)

Theorem
Let n ∈ N. Then ∑_{i=1}^n i = n(n+1)/2
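
▶ The induction pattern on this theorem, written out in LaTeX for illustration:

% Induction on n for the sum formula.
\begin{proof}
\emph{Base case:} for $n = 1$, $\sum_{i=1}^{1} i = 1 = \tfrac{1 \cdot 2}{2}$.
\emph{Inductive step:} assume $\sum_{i=1}^{n} i = \tfrac{n(n+1)}{2}$. Then
\[
  \sum_{i=1}^{n+1} i = \frac{n(n+1)}{2} + (n+1) = \frac{(n+1)(n+2)}{2},
\]
which is the statement for $n+1$. By induction, the formula holds for all
$n \in \mathbb{N}$.
\end{proof}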

▶ Warning: Induction proves A(n) for each finite n ∈ N; it does not by itself establish any statement "at n → ∞"


Linear Algebra

Linear Combination

▶ A vector space is a set V equipped with vector addition (+) and scalar multiplication (⋅) such that:

α ⋅ v1 + β ⋅ v2 ∈ V, ∀v1 , v2 ∈ V, α, β ∈ F (1)

▶ We will be mainly dealing with V = Rn and F = R.


▶ If S ⊆ V satisfies Eq. 1, then S is called a subspace of V .
Definition (Linear Combination)
Let ak , k = 1, . . . , n be a finite list of vectors. A vector b is said to be a linear combination of the vectors ak if there exist scalars αk such that:

b = α1 a1 + α2 a2 + ⋅ ⋅ ⋅ + αn an = ∑_{k=1}^n αk ak


Linear Dependency

Definition (Linearly Dependent List)


A list of vectors A = (ak ∶ k = 1, . . . , n) is said to be linearly dependent if one of the vectors is a linear combination of the others.

▶ If list A is not linearly dependent, it is called linearly independent.


Theorem (Test for linear independence)
The list A = (ak ∶ k = 1, . . . , n) is linearly independent iff the equality

∑_{k=1}^n αk ak = 0, (2)

implies that αk = 0 for k = 1, . . . , n.
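
▶ A numerical version of this test, sketched with NumPy (the example matrix is our own, not from the slides): stack the vectors as columns; they are linearly independent iff the matrix has rank n.

import numpy as np

# Columns of A are the vectors a_1, a_2, a_3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

n = A.shape[1]
# rank(A) == n  <=>  sum_k alpha_k a_k = 0 forces every alpha_k = 0.
print(np.linalg.matrix_rank(A) == n)   # False here: a_3 = a_1 + a_2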


Span, Basis and Dimension

Definition (Span)
The set of all linear combinations of a list of vectors {ak } is called the
span of {ak }
span[a1 , . . . , an ] = { ∑_{k=1}^n αk ak ∶ αk ∈ R }

▶ Span is always a subspace!


Definition (Basis)
If the list (ak ) is linearly independent and span[a1 , . . . , an ] = V , then
(ak ) is a basis for V .

▶ By fixing a basis, we can represent vectors in V in terms of that basis.


Span, Basis and Dimension

Theorem (Unique Coordinates for a Fixed Basis)

Let {ak } be a basis for V . Then any vector v ∈ V can be represented uniquely as

v = ∑_{k=1}^n αk ak .

▶ There are usually infinitely many bases for a subspace. For instance, if (ak ) is a basis, so is (cak ) for any nonzero c ∈ R.
▶ What about the number of vectors in a basis?
Theorem (Unique Number of Vectors in a Basis)
Let ak , k = 1, . . . , n and bi , i = 1, . . . , m be two different bases for V .
Then n = m.
▶ Hence every space (or subspace) V has the same number of vectors in every one of its bases. That number is called the dimension of V .
▶ We call the αk the coordinates of the vector v in the basis {ak }.
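
▶ A NumPy sketch of computing coordinates (the basis and vector are our own example): with the basis vectors as the columns of B, the coordinates α of v are the unique solution of Bα = v.

import numpy as np

# A basis of R^2 (as columns) and a vector v.
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([3.0, 4.0])

alpha = np.linalg.solve(B, v)        # coordinates of v in basis B
print(alpha)                         # [1. 2.]
print(np.allclose(B @ alpha, v))     # True: v = 1*b1 + 2*b2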

Vector and Matrix Notation

▶ For a ∈ Rn we will write vectors in column notation


    ⎡ a1 ⎤
    ⎢ a2 ⎥
a = ⎢ ⋮  ⎥ = [a1 a2 . . . an ]T , ai ∈ R
    ⎣ an ⎦
▶ A matrix A ∈ Rm×n is a rectangular array of real numbers aij ∈ R, i = 1, . . . , m, j = 1, . . . , n:

    ⎡ a11 a12 . . . a1n ⎤
    ⎢ a21 a22 . . . a2n ⎥
A = ⎢  ⋮   ⋮    ⋱   ⋮  ⎥
    ⎣ am1 am2 . . . amn ⎦
▶ It is much more useful to think of A as a collection of n vectors lying in Rm : A = [a1 , a2 , . . . , an ], ak ∈ Rm
∎ Also, matrix-vector multiplication Av makes more sense this way

Matrix Rank

Definition (Rank of a Matrix)


The rank r is the maximal number of linearly independent columns of A ∈ Rm×n .

▶ Notice that r ≤ n. When r = n, we say that the matrix has full rank.


Theorem (Invariance of Rank)
The rank of a matrix A ∈ Rm×n is invariant under the following operations.
1 Multiplication of columns of A by nonzero scalars.
2 Interchange of columns.
3 Addition to a given column of a linear combination of the other columns.

▶ Nice, but is there a formula for testing whether a matrix has full rank? Is there a scalar quantity that measures the independence of the columns?
∎ For square matrices (m = n), the answer is yes! It is called the
determinant of the matrix.

Determinant

▶ The determinant of a matrix (denoted by ∣A∣) is a confusing concept at first and has many different interpretations.
▶ The properties of the determinant are more important than its
explicit formula
Definition (Determinant)
The determinant is a function det ∶ Rn×n → R that possesses the following properties:
1 The determinant is linear in the matrix's columns:

∣A∣ = det A = det[a1 , . . . , αak + βbk , . . . , an ]
= α det[a1 , . . . , ak , . . . , an ] + β det[a1 , . . . , bk , . . . , an ].

2 If for some k, ak = ak+1 , then ∣A∣ = 0.


3 Determinant of the identity matrix is 1, that is ∣In ∣ = 1.

Consequences of Determinant Definition

▶ If there is a zero column in the matrix, determinant is zero,

det[a1 , . . . , 0, . . . , an ] = 0.

▶ If we add to a column a linear combination of the other columns, the determinant does not change:

det[a1 , . . . , ak + ∑_{j≠k} αj aj , . . . , an ] = det[a1 , . . . , ak , . . . , an ]

▶ The determinant changes sign if we interchange columns,

det[a1 , . . . , ak−1 , ak , . . . , an ] = −det[a1 , . . . , ak , ak−1 , . . . , an ].

▶ So, if the columns of A are not linearly independent, then ∣A∣ = 0.


∎ Hence for a square matrix: full rank ⇐⇒ nonzero determinant.
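
▶ A quick NumPy illustration of this equivalence (the matrix is our own example): a square matrix with a dependent column has determinant (numerically) zero and rank below n.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0]])   # third column = first + second

print(np.isclose(np.linalg.det(A), 0.0))   # True: columns dependent
print(np.linalg.matrix_rank(A))            # 2 < 3, so not full rank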

Determinant and Rank

▶ Only square matrices have determinants. What if I want to test the rank of a rectangular matrix?
∎ Rectangular matrices have square sub-matrices! Wonder if their
determinant is useful... First we need to define it:
Definition (Minor)
The pth-order minor of a matrix A ∈ Rm×n is the determinant of a p × p sub-matrix of A, formed by deleting m − p rows and n − p columns.

▶ Then we have this cool theorem:


Theorem (Minors and Rank)
If A ∈ Rm×n (m ≥ n) has a nonzero nth-order minor, then rankA = n.

▶ It is straightforward to show that the rank of a matrix is the maximal order of its nonzero minors.

Nonsingular matrices and Inverses

▶ A square matrix A with det A ≠ 0 is called nonsingular.


▶ A matrix A ∈ Rn×n is nonsingular if and only if there exists a matrix
B ∈ Rn×n such that:
AB = BA = In

▶ Matrix B is called the inverse of A and denoted as A−1 .


▶ The inverse shows up in the solution of linear equations Ax = b. A unique solution exists iff A is nonsingular (x = A−1 b)
▶ What about non-square linear systems?
Theorem (Existence of Solution in a Linear System)
The set of equations represented by Ax = b has a solution if and only if

rankA = rank[A, b].

▶ If rankA = m < n, then we have infinitely many solutions.
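
▶ A NumPy sketch of both statements (the systems below are our own examples): solve a nonsingular square system with np.linalg.solve, and check the solvability of a rectangular system by comparing rank A with rank [A, b].

import numpy as np

# Nonsingular square system: unique solution x = A^{-1} b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)           # preferred over forming A^{-1} explicitly
print(np.allclose(A @ x, b))        # True

# Rectangular system: solvable iff rank A == rank [A, b].
A2 = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # rank 1
b2 = np.array([1.0, 2.0, 3.0])                         # lies in the range of A2
aug = np.column_stack([A2, b2])
print(np.linalg.matrix_rank(A2) == np.linalg.matrix_rank(aug))  # True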



Euclidean Inner Product

▶ We need to turn our vector space into a metric space by adding a "length" function.
∎ Such a function already exists for V = R: the absolute value function ∣.∣
∎ Some very useful properties: −∣a∣ ≤ a ≤ ∣a∣, ∣ab∣ = ∣a∣∣b∣
∎ The most useful property: ∣a + b∣ ≤ ∣a∣ + ∣b∣

▶ For Rn , before defining the length function, it is helpful to define the inner product first
Definition (Euclidean Inner Product)
The Euclidean Inner Product of two vectors x, y ∈ Rn is defined as
⟨x, y⟩ = ∑_{i=1}^n xi yi = xT y


Inner Product Properties

▶ Inner product has the following properties


∎ Positivity: ⟨x, x⟩ ≥ 0, x = 0 ⇐⇒ ⟨x, x⟩ = 0.
∎ Symmetry: ⟨x, y⟩ = ⟨y, x⟩.
∎ Additivity: ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

∎ Homogeneity: ⟨rx, y⟩ = r⟨x, y⟩, r ∈ R.

▶ These properties also hold in the second argument.


▶ Two vectors are orthogonal if ⟨x, y⟩ = 0.
▶ Now we can define the length; it is called the Euclidean norm:

∥x∥ = √⟨x, x⟩ = √(xT x)

Theorem (Cauchy–Schwarz Inequality)

For any x, y ∈ Rn
∣⟨x, y⟩∣ ≤ ∥x∥∥y∥.
Equality holds only if x = αy for some α ∈ R.
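
▶ A small NumPy sketch of the inner product, the norm, and the Cauchy–Schwarz bound (the vectors are our own example):

import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

inner = x @ y                       # <x, y> = x^T y
norm_x = np.sqrt(x @ x)             # ||x|| = sqrt(<x, x>)
print(inner, norm_x)                # 11.0 3.0

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
print(abs(inner) <= norm_x * np.linalg.norm(y))   # True (11 <= 15)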

Norm Properties

▶ The norm possesses many properties of the absolute value function


∎ Positivity: ∥x∥ ≥ 0, ∥x∥ = 0 ⇐⇒ x = 0
∎ Homogeneity: ∥rx∥ = ∣r∣∥x∥, r ∈ R
∎ Triangle Inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥

▶ There are many other vector norms. Actually any function that
satisfies the properties above is a norm.
∎ p-norm: ∥x∥p = (∣x1 ∣p + ∣x2 ∣p + ⋅ ⋅ ⋅ + ∣xn ∣p )^{1/p}
∎ p = 2, the Euclidean norm

∎ What do p = 0 or p = 1 correspond to? What happens when p → ∞?

▶ Continuity of f ∶ Rn → Rm can be formulated in terms of norms
∎ f is continuous at x0 ∈ Rn if and only if for all ε > 0, there exists a δ > 0 such that ∥x − x0 ∥ < δ ⟹ ∥f (x) − f (x0 )∥ < ε

▶ If x, y ∈ Cn (complex vectors), the inner product is defined as ⟨x, y⟩ = ∑_{i=1}^n xi ȳi ; hence ⟨x, y⟩ equals the complex conjugate of ⟨y, x⟩, and ⟨x, ry⟩ = r̄⟨x, y⟩.

Linear Transformations

Definition (Linear Transformation)


A function L ∶ Rn → Rm is called a linear transformation if

L(ax) = aL(x), a ∈ R
L(x + y) = L(x) + L(y)

▶ If we fix bases for Rn and Rm we can represent L with a matrix A, such that L(x) = Ax. Hence different bases correspond to different matrix representations.
∎ Let {u1 , . . . , un } be a basis for Rn and let {v1 , . . . , vm } be a basis for
Rm . Then we can express L(uk ) = Auk ∈ Rm as:

Auk = ak,1 v1 + ⋅ ⋅ ⋅ + ak,m vm

The coefficients ak,1 , . . . , ak,m make up the k th column of A.


Similarity Transformations

▶ Let ei and e′i be two different bases of Rn . Let x be the coordinates of v ∈ Rn with respect to ei and let x′ be the coordinates of v with respect to e′i .
▶ Then there is a transformation matrix T ∈ Rn×n such that x′ = T x,
T = [e′1 , . . . , e′n ]−1 [e1 , . . . , en ]

▶ Nice, now we know how to transform coordinates between two bases. How about transforming matrices?
Theorem (Transformation of Bases)
Let A, B ∈ Rn×n be two representations of the same linear
transformation according to different bases. Then there exists a
nonsingular matrix T ∈ Rn×n such that B = T AT −1

▶ Two matrices are called similar if there exists a T such that B = T AT −1 . Similar matrices represent the same linear transformation.

Eigenvalues and Eigenvectors

▶ Later we will see that proving convergence of many optimization algorithms relies on computing powers Ak of a square matrix A.
∎ Can we find a basis of Rn in which Ak is easy to compute?
∎ This leads us to the study of eigenvalues and eigenvectors:

Definition (Eigenvalue and Eigenvector)


Let A ∈ Rn×n . If a nonzero vector v ∈ Rn satisfies the equation Av = λv for some scalar λ, then v is called an eigenvector of A, and λ is called the eigenvalue that corresponds to v.
▶ Has many different physical and geometrical interpretations
∎ We will see that eigenvectors of a linear transformation will form a basis
that results in a very special matrix representation
▶ How do we find v? How many are there? It is difficult to answer directly...
That is where the eigenvalues come in
∎ Rearrange Av = λv to get (A − Iλ)v = 0
∎ What are the solutions?

Eigenvalues

▶ (A − Iλ)v = 0 has a non-trivial solution only if (A − Iλ) is singular.


▶ Hence the equation for v turns into an equation for λ: det(A − Iλ) = 0
▶ This results in an nth-order polynomial equation (the characteristic equation)

det(λI − A) = λn + an−1 λn−1 + ⋅ ⋅ ⋅ + a1 λ + a0 = 0

▶ Hence there are at most n distinct eigenvalues and corresponding eigenvectors!
∎ Once a λ is found, the corresponding v is easy to find.

▶ Although the number of eigenvalues is finite, this is not the case for eigenvectors.
∎ If v is an eigenvector, so is cv for any nonzero c.
∎ So there are actually infinitely many eigenvectors


Eigenvectors as a Basis

▶ Is there a case where the eigenvectors are linearly independent, so that we can form a basis out of them?
Theorem (Linearly Independent Eigenvectors)
Suppose that A has distinct eigenvalues (i ≠ j ⟹ λi ≠ λj ). Then the corresponding non-zero eigenvectors vi are linearly independent.

▶ Now let's collect the eigenvectors as the columns of T −1 (so T = [v1 , . . . , vn ]−1 ), apply B = T AT −1 , and see what happens...

    ⎡ λ1          0 ⎤
    ⎢    λ2         ⎥
B = ⎢       ⋱       ⎥
    ⎣ 0          λn ⎦
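
▶ A NumPy sketch of this diagonalization (the matrix is our own example): np.linalg.eig returns the eigenvectors as the columns of V , and V −1 AV comes out diagonal.

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])      # distinct eigenvalues: 2 and 3

eigvals, V = np.linalg.eig(A)   # columns of V are eigenvectors of A
T = np.linalg.inv(V)            # so T^{-1} = V
B = T @ A @ V                   # B = T A T^{-1}
print(np.round(B, 12))          # diagonal, eigenvalues on the diagonal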

▶ What if we also wanted the basis to be orthogonal?



Symmetric Matrices

▶ A real matrix A is symmetric if AT = A. This leads to a lot of cool properties
Theorem (Eigenvalues of Symmetric Matrices)
All eigenvalues of real symmetric matrices are real.

Theorem (Eigenvectors of Symmetric Matrices)


An n × n symmetric matrix has a set of n orthogonal eigenvectors.

▶ A matrix with mutually orthogonal columns is called an orthogonal matrix
∎ When we normalize each column to unit length, we get an orthonormal matrix
∎ Orthonormal matrices have this super cool property: T −1 = T T
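
▶ A NumPy check of both theorems (the matrix is our own example): np.linalg.eigh, which is specialized to symmetric matrices, returns real eigenvalues and an orthonormal eigenvector matrix.

import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # symmetric

lam, T = np.linalg.eigh(Q)              # real eigenvalues, orthonormal T
print(lam)                              # [1. 3.] -- all real
print(np.allclose(T.T @ T, np.eye(2)))  # True: T^{-1} = T^T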


Orthogonal Subspaces

Definition (Orthogonal Complement)


Let V be a subspace of Rn . The orthogonal complement V ⊥ consists
of all vectors that are orthogonal to V . That is:

V ⊥ = {x ∈ Rn ∶ v T x = 0, ∀v ∈ V }

▶ Every vector x can be uniquely decomposed as x = x1 + x2 , where x1 ∈ V and x2 ∈ V ⊥ . We call x1 and x2 the orthogonal projections of x onto V and V ⊥ . We say that a linear transformation P is an orthogonal projector if P x ∈ V and x − P x ∈ V ⊥ .


Range and Nullspace

Definition (Range and Nullspace)


Let A ∈ Rm×n . The range of A is defined as R(A) = {Ax ∶ x ∈ Rn }. The nullspace of A (also called the kernel) is defined as N (A) = {x ∈ Rn ∶ Ax = 0}.

▶ R and N are subspaces! Moreover:


Theorem (Orthogonality of Matrix Subspaces)
Let A be a given matrix. Then R(A)⊥ = N (AT )

▶ The theorem above is important on its own; it also allows us to prove:


Theorem (Projection Matrix)
A matrix P is an orthogonal projector if and only if P 2 = P = P T


Quadratic Forms

Definition (Quadratic Forms)


A function f ∶ Rn → R is a quadratic form if it can be represented as

f (x) = xT Qx,

where Q is a real matrix.

▶ Note that we can always assume Q is symmetric: just replace Q with (1/2)(Q + QT ), which is always symmetric and gives the same quadratic form.
▶ f (x) is called positive definite (p.d.) if xT Qx > 0 for all x ≠ 0
▶ f (x) is called positive semidefinite (p.s.d.) if xT Qx ≥ 0 for all x
▶ We love p.d. functions in optimization!
∎ They are basically the generalization of parabolas to higher dimensions


Positive Definite Matrices

▶ Q is called a p.d. matrix when the associated quadratic form is p.d.


∎ How can we check if a given matrix is p.d.?
▶ First we need to define what a leading principal minor is: the determinant of the top-left k × k sub-matrix, for k = 1, . . . , n. Then we can use the following test
Theorem (Sylvester's Test)
The symmetric matrix Q is p.d. iff its leading principal minors are positive.
∎ The test does not work for non-symmetric matrices! For p.s.d., checking that the leading principal minors are non-negative is only a necessary condition; for sufficiency we need to check all the principal minors.
▶ An alternative test is to check the eigenvalues
Theorem (Eigenvalue Test)
The symmetric matrix Q is p.d. iff all eigenvalues are positive.
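
▶ Both tests, sketched with NumPy (the matrix is our own example):

import numpy as np

Q = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])   # symmetric

# Sylvester's test: all leading principal minors positive.
minors = [np.linalg.det(Q[:k, :k]) for k in range(1, Q.shape[0] + 1)]
print(minors)                          # approximately [2, 3, 4] -> all positive

# Eigenvalue test: all eigenvalues positive.
print(np.all(np.linalg.eigvalsh(Q) > 0))   # True, so Q is p.d.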

Matrix Norms

▶ Matrix norms satisfy these conditions


∎ ∥A∥ ≥ 0 and ∥A∥ = 0 ⇐⇒ A = 0.
∎ ∥cA∥ = ∣c∣∥A∥.
∎ ∥A + B∥ ≤ ∥A∥ + ∥B∥.

▶ We will also enforce them to obey ∥AB∥ ≤ ∥A∥∥B∥.


▶ Example: Flatten A ∈ Rn×m into a vector in Rnm and apply the Euclidean norm. The result is the Frobenius norm ∥.∥F :

∥A∥F = ( ∑_{i=1}^n ∑_{j=1}^m a_{ij}^2 )^{1/2}

▶ If we have a norm ∥.∥n on Rn and a norm ∥.∥m on Rm , these two norms induce the following norm on all linear transformations from Rn to Rm :

∥A∥ = max_{∥x∥n = 1} ∥Ax∥m


Induced Norms

▶ Since the norm ∥.∥m is a continuous function and the set {x ∶ ∥x∥ = 1} is compact, the maximum is always attained. We can also check all 4 norm properties to confirm that the induced norm is indeed a norm.
▶ A nice property of induced norms is that they can usually be linked to the spectral properties of the matrix, such as:
Theorem (Matrix Norm Induced by the Euclidean Norm)
Let ∥.∥n and ∥.∥m be the Euclidean norms on Rn and Rm . Then the induced matrix norm is ∥A∥ = √λ1 , where λ1 is the largest eigenvalue of AT A.

▶ The following important theorem also uses similar arguments:


Theorem (Rayleigh’s Inequality)
If P is an n × n real symmetric p.d. matrix, then

λmin (P )∥x∥2 ≤ xT P x ≤ λmax (P )∥x∥2
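
▶ A NumPy check of both results (the matrices are our own examples; np.linalg.norm(A, 2) computes the induced 2-norm):

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])

# Spectral norm: sqrt of the largest eigenvalue of A^T A.
lam1 = np.max(np.linalg.eigvalsh(A.T @ A))
print(np.isclose(np.sqrt(lam1), np.linalg.norm(A, 2)))   # True

# Rayleigh's inequality for a symmetric p.d. matrix P.
P = np.array([[3.0, 1.0],
              [1.0, 2.0]])
lam = np.linalg.eigvalsh(P)               # sorted ascending
x = np.array([1.0, -2.0])
print(lam[0] * (x @ x) <= x @ P @ x <= lam[-1] * (x @ x))  # True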


Geometry

Line Segments

▶ The line segment between two points x, y ∈ Rn is the set of points on the straight line joining these two points

▶ If z lies on the line segment, there exists an α ∈ [0, 1] such that z = αx + (1 − α)y
▶ Hence the line segment can be defined as:

l(x, y) = {αx + (1 − α)y ∶ α ∈ [0, 1]}


Hyperplanes

Definition (Hyperplane)
Let u ∈ Rn , u ≠ 0, and v ∈ R. A hyperplane is the set of points of the form

{x ∈ Rn ∶ ⟨u, x⟩ = v}

▶ Do not forget that a hyperplane is not necessarily a subspace!

∎ A hyperplane's dimension is always n − 1
∎ Hyperplanes divide the space into two halfspaces
∎ Positive halfspace: ⟨u, x⟩ ≥ v
∎ Negative halfspace: ⟨u, x⟩ ≤ v
∎ An alternative description of the hyperplane is ⟨u, x − a⟩ = 0, where a is a point on the hyperplane

Linear Varieties

Definition (Linear Variety (Affine Set))


A linear variety is a set of the form:

{x ∈ Rn ∶ Ax = b},

where A ∈ Rm×n and b ∈ Rm .

▶ We say that the linear variety has dimension r if dim N (A) = r.

▶ A linear variety is a subspace if and only if b = 0


▶ It is easy to see that a linear variety contains the entire line through any two of its points
∎ What about the converse? If a set contains the line through any two of its points, is it an affine set?


Convex Sets

Definition (Convex Set)


A set Θ is convex if for all x, y ∈ Θ we have l(x, y) ⊂ Θ.

▶ A point on l(x, y) is also called a convex combination of x and y. Hence a set is convex if it contains all convex combinations of its points.


Properties of Convex Sets

Theorem (Properties of Convex Sets)


Convex subsets of Rn have the following properties
1 If Θ is convex and β is a real number, then the set

βΘ = {x ∶ x = βv, v ∈ Θ}

is also convex.
2 If Θ1 and Θ2 are convex, then the set

Θ1 + Θ2 = {x ∶ x = v1 + v2 , v1 ∈ Θ1 , v2 ∈ Θ2 }

is also convex.
3 If Θi is a collection of convex sets, then ∩i Θi is also convex.

▶ A point x ∈ Θ is called an extreme point if it is not a convex combination of any two other points in the set.

Neighborhoods

Definition (Neighborhood)
Let ε > 0. An ε-neighborhood of a point x ∈ Rn is the set

Nε (x) = {y ∈ Rn ∶ ∥y − x∥ < ε}

▶ A point x ∈ S is called an interior point if there exists an ε > 0 such that Nε (x) ⊂ S
▶ A point x ∈ Rn is called a limit point of S if for all ε > 0 we have {Nε (x) ∩ S} ∖ {x} ≠ ∅
▶ A set is open if all of its points are interior points
▶ A set is closed if it contains all of its limit points
▶ A set S is bounded if there exists an M ∈ R such that S ⊂ NM (x) for some x
▶ A set is compact if it is closed and bounded

Polytopes and Polyhedra

▶ A polytope is a set that can be expressed as the intersection of a finite number of halfspaces.

▶ If a polytope is nonempty and bounded, we call it a polyhedron

▶ Polytopes will be of major interest when we start studying linear programming problems.

Calculus

Limit of a sequence

Definition (Convergent Sequence)


We say that the sequence xk ∈ R is convergent if there exists an x ∈ R such that for all ε > 0 there exists a K ∈ N such that

k ≥ K ⟹ ∣xk − x∣ < ε

▶ If xk converges to x we usually write xk → x or limk→∞ xk = x.


▶ If xk ∈ Rn , then we can use norm ∥.∥ instead of ∣.∣.
Theorem (Unique Limit)
A convergent sequence has a unique limit point.

▶ A sequence is bounded if there exists an M ∈ R such that ∣xk ∣ ≤ M for all k.


Theorem (Convergence implies boundedness)
Every convergent sequence is bounded.

Upper and Lower Bounds

▶ A number B ∈ R is called an upper bound for xk if xk ≤ B for all k.

▶ The smallest upper bound of xk is called the supremum:
Definition (Least Upper Bound)
Let xk be a sequence that is bounded above. The number B is called the least upper bound or the supremum of xk if
1 B is an upper bound of xk
2 For all ε > 0 there exists an element xK of the sequence such that xK ≥ B − ε

▶ A sequence is increasing if xk < xk+1 , non-decreasing if xk ≤ xk+1


▶ A sequence is decreasing if xk > xk+1 , non-increasing if xk ≥ xk+1
▶ A sequence is monotone if it is either increasing or decreasing.
Theorem (Convergence of Monotone Sequences)
Every bounded monotone sequence in R converges.

Subsequences

▶ Let xk be a sequence in Rn and let mk be a strictly increasing sequence of natural numbers. We call the sequence
xm1 , xm2 , . . .
a subsequence of xk .
Theorem (Subsequences of Convergent Sequences)
Subsequences of convergent sequences converge to the same limit.

Theorem (Bolzano-Weierstrass)
Every bounded sequence contains a convergent subsequence.

▶ Sequences allow us to look at continuous functions in a different light: f ∶ Rn → Rm is continuous at x iff

xk → x ⟹ f (xk ) → f (x) for every sequence xk in Rn

Matrix Limits

▶ We can easily extend the notion of limits and convergence to matrices. We say that Ak → A if

lim ∥Ak − A∥ = 0
k→∞

Theorem (Convergence to Zero Matrix)


Let A ∈ Rn×n . Then limk→∞ Ak = 0 if and only if the eigenvalues of A
satisfy ∣λi ∣ < 1.

Theorem (Geometric Matrix Series)

The series ∑_{k=0}^∞ Ak converges iff limk→∞ Ak = 0, in which case it converges to (In − A)−1
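
▶ A quick NumPy check of the geometric series (the matrix is our own example, with both eigenvalues inside the unit circle):

import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, 0.3]])       # both eigenvalues satisfy |lambda| < 1

S = np.zeros_like(A)              # partial sums of I + A + A^2 + ...
term = np.eye(2)
for _ in range(200):
    S += term
    term = term @ A

print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))   # True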


Matrix Valued Functions

▶ We can define a matrix-valued function such as A ∶ Rr → Rm×n
∎ Hence A(ζ) returns an m × n matrix for a given ζ ∈ Rr

▶ We say that the matrix-valued function is continuous at ζ0 if

lim_{∥ζ−ζ0 ∥→0} ∥A(ζ) − A(ζ0 )∥ = 0

Theorem (Invertibility of continuous matrix functions)


Let A ∶ Rr → Rn×n be continuous at ζ0 . If A(ζ0 )−1 exists, then A(ζ)
is invertible in a sufficiently small neighbourhood of ζ0 and A(.)−1 is
continuous at ζ0 .

∎ Can be proven using the inverse function theorem


Differentiability

▶ The main objective of differential calculus is to approximate an arbitrary function f with an affine function A(x)

A(x) = L(x) + y

▶ A global approximation is usually not possible, hence we consider a local approximation around the point x0 ∈ Rn , which leads to:

lim_{x→x0} ∥f (x) − L(x − x0 ) − f (x0 )∥ / ∥x − x0 ∥ = 0

▶ If a linear transformation L exists such that the above limit is zero, we say that f is differentiable at x0 . The linear transformation L is called the derivative of f at x0 .


Gradient and Jacobian

▶ A matrix is associated with every linear transformation; hence L(x) = Dx, where D ∈ Rm×n
▶ When f ∶ Rn → R, the (transposed) derivative matrix is called the gradient ∇f (x0 ):

Df (x0 ) = [ ∂f/∂x1  ∂f/∂x2  . . .  ∂f/∂xn ] ∣_{x=x0} = ∇f (x0 )T

▶ When f ∶ Rn → Rm , the matrix is called the Jacobian:

           ⎡ ∂f1/∂x1 ∂f1/∂x2 . . . ∂f1/∂xn ⎤
Df (x0 ) = ⎢    ⋮       ⋮    . . .    ⋮    ⎥ ∣_{x=x0}
           ⎣ ∂fm/∂x1 ∂fm/∂x2 . . . ∂fm/∂xn ⎦
▶ Hence we approximate f at x0 with the affine function:
A(x) = f (x0 ) + Df (x0 )(x − x0 )

∎ This is an approximation in the sense that f (x) = A(x) + r(x) with limx→x0 ∥r(x)∥/∥x − x0 ∥ = 0
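
▶ A finite-difference sketch of the Jacobian and the affine approximation (the function f and the helper jacobian_fd are our own illustration, not from the slides):

import numpy as np

def f(x):                                  # f : R^2 -> R^2
    return np.array([x[0]**2 + x[1], np.sin(x[0]) * x[1]])

def jacobian_fd(f, x0, h=1e-6):
    # Forward-difference approximation of Df(x0), built column by column.
    fx0 = f(x0)
    J = np.zeros((fx0.size, x0.size))
    for j in range(x0.size):
        e = np.zeros_like(x0)
        e[j] = h
        J[:, j] = (f(x0 + e) - fx0) / h
    return J

x0 = np.array([1.0, 2.0])
J = jacobian_fd(f, x0)

# Affine approximation A(x) = f(x0) + Df(x0)(x - x0) is accurate near x0.
x = x0 + np.array([1e-3, -1e-3])
print(np.allclose(f(x), f(x0) + J @ (x - x0), atol=1e-5))   # True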

The Hessian

▶ Given f ∶ Rn → R, if f is twice differentiable, we can define the Hessian matrix as:

             ⎡ ∂²f/∂x1²    ∂²f/∂x1∂x2 . . . ∂²f/∂x1∂xn ⎤
D2 f (x0 ) = ⎢     ⋮           ⋮      . . .     ⋮      ⎥ ∣_{x=x0}
             ⎣ ∂²f/∂xn∂x1 ∂²f/∂xn∂x2 . . . ∂²f/∂xn²   ⎦

▶ Schwarz's Theorem: If f is twice continuously differentiable, then

∂²f/∂xi ∂xj = ∂²f/∂xj ∂xi

∎ which makes the Hessian matrix symmetric!
∎ If the second partial derivatives are not continuous, then the Hessian may not be symmetric (see the example in the book).
▶ Since the Hessian is symmetric, we can use Sylvester's test or the eigenvalue test to see whether it is positive definite. This will be a huge part of checking optimality conditions later in the course.

Differentiation Rules

Theorem (Chain Rule)


Let g ∶ A → R be differentiable on A ⊂ Rn and let f ∶ (a, b) → A be differentiable on (a, b). Define the composite function h ∶ (a, b) → R by h(t) = g(f (t)). Then h is differentiable and the derivative is

h′ (t) = Dg(f (t))Df (t) = ∇g(f (t))⊺ [f1′ (t), . . . , fn′ (t)]⊺

Theorem (Product Rule)


Let f, g ∶ Rn → Rm be two differentiable functions. Define h ∶ Rn → R
as h(x) = f (x)T g(x). Then h is also differentiable and the derivative
is
Dh(x) = f (x)T Dg(x) + g(x)T Df (x)


Some Other Useful Lemmas

▶ D(y T Ax) = y T A.

▶ D(xT Ax) = xT (A + AT ) if n = m.

▶ D(y T x) = y T .

▶ D(xT Qx) = 2xT Q if Q is symmetric.

▶ D(xT x) = 2xT
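
▶ A finite-difference spot check of two of these identities (the helper grad_fd and the random test data are our own illustration):

import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
A = rng.standard_normal((n, n))
Q = A + A.T                              # symmetric

def grad_fd(f, x, h=1e-6):
    # Forward-difference row vector approximating Df(x).
    fx = f(x)
    return np.array([(f(x + h * np.eye(len(x))[i]) - fx) / h
                     for i in range(len(x))])

# D(x^T A x) = x^T (A + A^T)  and  D(x^T Q x) = 2 x^T Q for symmetric Q.
print(np.allclose(grad_fd(lambda z: z @ A @ z, x), x @ (A + A.T), atol=1e-4))
print(np.allclose(grad_fd(lambda z: z @ Q @ z, x), 2 * x @ Q, atol=1e-4))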


Level Set

▶ The level set of a function f ∶ Rn → R at level c ∈ R is the set of points

S = {x ∶ f (x) = c}

▶ Curves lying in a level set can be represented parametrically as g ∶ [0, 1] → Rn .

▶ What is the relationship between the gradient of f and such a curve?
∎ Set h(t) = f (g(t)). Then h′ (t) = 0, since f is constant on the level set; by the chain rule, ∇f (g(t))T g ′ (t) = 0
∎ The gradient is orthogonal to the level set! This will be the main principle in the design of many optimization algorithms.

Taylor’s Theorem

▶ Basis of many optimization methods and convergence proofs.


Theorem (Taylor’s Theorem for 1D)
Let f ∶ [a, b] → R be of class C m . Denote h = b − a. Then

f (b) = f (a) + (h/1!) f (1) (a) + (h2 /2!) f (2) (a) + ⋅ ⋅ ⋅ + (hm−1 /(m − 1)!) f (m−1) (a) + Rm ,

where f (i) is the ith-order derivative of f and

Rm = (hm /m!) f (m) (a + θh), θ ∈ (0, 1)

▶ The proof uses the Generalized Mean Value Theorem: if f, g are differentiable on [a, b], there exists a point c ∈ (a, b) such that

f ′ (c)/g ′ (c) = (f (b) − f (a))/(g(b) − g(a))
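
▶ A numerical illustration of the remainder's order (our own example, f = exp expanded to second order; the ratio R/h³ stays bounded):

import numpy as np

f = df = d2f = np.exp         # exp is its own derivative
a = 0.0
for h in [1e-1, 1e-2, 1e-3]:
    taylor = f(a) + h * df(a) + h**2 / 2.0 * d2f(a)
    R = f(a + h) - taylor     # remainder of the 2nd-order expansion
    print(h, R / h**3)        # ratio stays near 1/6, i.e. R = O(h^3)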

Order Symbols

▶ For further examination of the remainder term Rm we introduce the order symbols
Definition (Order Symbols)
Let Ω ⊂ Rn with 0 ∈ Ω, let g ∶ Ω → R with g(x) ≠ 0 for x ≠ 0, and let f ∶ Ω → Rm .
∎ We say that f (x) = O(g(x)) if the quotient ∥f (x)∥/∣g(x)∣ is bounded near 0. That is, there exist numbers δ, K > 0 such that

∥x∥ < δ ⟹ ∥f (x)∥/∣g(x)∣ ≤ K

∎ We say that f (x) = o(g(x)) if ∥f (x)∥ goes to zero faster than ∣g(x)∣:

lim_{x→0} ∥f (x)∥/∣g(x)∣ = 0


Order Symbol Examples

▶ Examples for the Big-Oh symbol:
∎ x = O(x)
∎ [x3 , 2x2 + 3x4 ]⊺ = O(x2 )
∎ cos(x) = O(1)
∎ sin(x) = O(x)

▶ Examples for the Little-Oh symbol:
∎ x2 = o(x)
∎ [x3 , 2x2 + 3x4 ]⊺ = o(x)
∎ x3 = o(x2 )
∎ x = o(1)

▶ It is evident that f (x) = o(g(x)) ⟹ f (x) = O(g(x)), but the converse is not necessarily true.
▶ Inspection of the remainder term Rm in Taylor's theorem yields

f (b) = f (a) + (h/1!) f (1) (a) + ⋅ ⋅ ⋅ + (hm /m!) f (m) (a) + o(hm ), f ∈ C m
f (b) = f (a) + (h/1!) f (1) (a) + ⋅ ⋅ ⋅ + (hm /m!) f (m) (a) + O(hm+1 ), f ∈ C m+1


Taylor’s Theorem for Higher Dimensions

▶ Let f ∶ Rn → R be C 2 . Can we expand f in a Taylor series around the point x0 ?
▶ Let z(α) = x0 + α (x − x0 )/∥x − x0 ∥ and examine the function φ(α) = f (z(α))
∎ φ(∥x − x0 ∥) = f (x)
∎ Since φ is a single-variable function, we can apply Taylor's Theorem!

▶ Doing so yields (if f ∈ C 2 ):

f (x) = f (x0 ) + Df (x0 )(x − x0 ) + (1/2)(x − x0 )⊺ D2 f (x0 )(x − x0 ) + o(∥x − x0 ∥2 )

▶ If f ∈ C 3 :

f (x) = f (x0 ) + Df (x0 )(x − x0 ) + (1/2)(x − x0 )⊺ D2 f (x0 )(x − x0 ) + O(∥x − x0 ∥3 )
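
▶ A numerical check of the second-order expansion (the function and its derivatives below are our own example): as the displacement d shrinks, the error of the quadratic model stays bounded relative to ∥d∥3 .

import numpy as np

def f(x):
    return np.exp(x[0]) + x[0] * x[1]**2

def grad(x):
    return np.array([np.exp(x[0]) + x[1]**2, 2.0 * x[0] * x[1]])

def hess(x):
    return np.array([[np.exp(x[0]), 2.0 * x[1]],
                     [2.0 * x[1],  2.0 * x[0]]])

x0 = np.array([0.5, 1.0])
for t in [1e-1, 1e-2, 1e-3]:
    d = t * np.array([1.0, -1.0])          # shrinking displacement
    quad = f(x0) + grad(x0) @ d + 0.5 * d @ hess(x0) @ d
    print(t, abs(f(x0 + d) - quad) / np.linalg.norm(d)**3)  # bounded ratio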


Summary

▶ In this lecture we studied:

∎ Basic proof techniques we are going to use throughout the class


∎ Basics of linear algebra (span, basis, dimension, matrices, determinant)
∎ Linear transformations: how to transform between bases, eigenvalues, quadratic forms, matrix norms
∎ Basic geometry: lines, planes, linear varieties, convex sets, polyhedra
∎ Basic calculus: convergence, differentiability, Taylor’s Theorem

▶ Next:
∎ Basics of optimization theory.
