
Nonlinear Optimization (18799 B, PP)

IST-CMU PhD Course, Spring 2011


Joao Xavier
[email protected]
Background
Vectors
1. Sets $\mathbb{N}$, $\mathbb{R}^n$, $\mathbb{R}^n_+$ and $\mathbb{R}^n_{++}$. $\mathbb{N} = \{1, 2, 3, \ldots\}$ is the set of positive integers. The set of column vectors of size $n$ with real entries is denoted by $\mathbb{R}^n$. For $n = 1$ we use $\mathbb{R}$ (instead of $\mathbb{R}^1$). A vector $x \in \mathbb{R}^n$ is written as $x = (x_1, x_2, \ldots, x_n)$ or

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

For vectors $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$ the compact notations $x \le y$, $x \ge y$, $x < y$ and $x > y$ mean that the inequalities hold componentwise. For example:

$$x \le y \iff x_i \le y_i \text{ for } i = 1, 2, \ldots, n$$
$$x > y \iff x_i > y_i \text{ for } i = 1, 2, \ldots, n.$$

The sets of non-negative and strictly positive vectors are denoted by

$$\mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \ge 0\}, \qquad \mathbb{R}^n_{++} = \{x \in \mathbb{R}^n : x > 0\}.$$
2. Linear space spanned by a set of vectors. The linear space spanned by the set of vectors $v_1, v_2, \ldots, v_k \in \mathbb{R}^n$ is defined as

$$\operatorname{span}\{v_1, v_2, \ldots, v_k\} = \{a_1 v_1 + a_2 v_2 + \cdots + a_k v_k : a_i \in \mathbb{R} \text{ for } i = 1, 2, \ldots, k\}.$$

For $v \neq 0$, note that $\operatorname{span}\{v\}$ is the straight line spanned by $v$ (it contains the origin). Another example:

$$\operatorname{span}\{(1,0,0), (0,1,0)\} = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_3 = 0\}.$$

Note also that

$$\operatorname{span}\{(1,0,0), (0,1,0), (1,1,0)\} = \operatorname{span}\{(1,0,0), (0,1,0)\} = \operatorname{span}\{(1,0,0), (1,1,0)\}.$$
3. Subspace of $\mathbb{R}^n$. A subset $V$ of $\mathbb{R}^n$ is said to be a subspace if $av + bw \in V$ for any $v, w \in V$ and $a, b \in \mathbb{R}$. Example:

$$V = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : 3x_1 - 2x_2 + x_3 = 0\}$$

is a subspace, but

$$U = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : 3x_1 - 2x_2 + x_3 = 1\}$$

is not. [Note: the origin always belongs to a subspace.]
4. Linearly independent vectors. The vectors $v_1, v_2, \ldots, v_k \in \mathbb{R}^n$ are linearly independent if and only if

$$a_1 v_1 + a_2 v_2 + \cdots + a_k v_k = 0 \implies a_1 = a_2 = \cdots = a_k = 0.$$

Example: $(1,1,1)$ and $(1,1,-1)$ are linearly independent, but $(1,1,1)$ and $(2,2,2)$ are not.
5. Basis and dimension of subspaces. Let $V$ be a subspace of $\mathbb{R}^n$. We say that $\{v_1, v_2, \ldots, v_k\}$ is a basis for $V$ if $v_1, v_2, \ldots, v_k$ are linearly independent and they span $V$, that is,

$$\operatorname{span}\{v_1, v_2, \ldots, v_k\} = V.$$

Example: $\{(1,0,0), (0,1,0), (0,0,1)\}$ is a basis for $\mathbb{R}^3$ but $\{(1,0,0), (0,1,0), (1,1,0)\}$ is not.

All bases of a given subspace $V$ have the same number of vectors. This number is called the dimension of $V$ and it is denoted by $\dim V$. Note that $\dim \mathbb{R}^n = n$.
6. Inner product, orthogonality and norm. The inner product of $v = (v_1, v_2, \ldots, v_n)$ and $w = (w_1, w_2, \ldots, w_n)$ is given by

$$\langle v, w \rangle = v^\top w = \sum_{i=1}^{n} v_i w_i.$$

The vectors $v$ and $w$ are said to be orthogonal if $\langle v, w \rangle = 0$. The norm of $v$ is given by

$$\|v\| = \sqrt{\langle v, v \rangle} = \sqrt{\sum_{i=1}^{n} v_i^2}. \tag{1}$$
7. Orthonormal bases. The set $\{v_1, v_2, \ldots, v_k\}$ of a subspace $V \subset \mathbb{R}^n$ is said to be an orthonormal basis if $\{v_1, v_2, \ldots, v_k\}$ is a basis of $V$ and

$$\langle v_i, v_j \rangle = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$

$\{(1,0,0), (0,1,0), (0,0,1)\}$ is an orthonormal basis of $\mathbb{R}^3$, but $\{(1,0,0), (0,1,0), (0,0,2)\}$ is not.
8. $\ell_p$-norms. For $p \ge 1$, the $\ell_p$-norm is defined as

$$\|\cdot\|_p : \mathbb{R}^n \to \mathbb{R}, \qquad \|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p},$$

where $x = (x_1, x_2, \ldots, x_n)$. Note that the norm in (1) corresponds to the case $p = 2$. For $p = \infty$, the $\ell_\infty$-norm is defined as

$$\|\cdot\|_\infty : \mathbb{R}^n \to \mathbb{R}, \qquad \|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_n|\}.$$

Properties of the $\ell_p$-norms (a numerical sanity check follows this list):

- (positive definite) $\|x\|_p \ge 0$, with equality if and only if $x = 0$
- (homogeneous) $\|ax\|_p = |a| \, \|x\|_p$ for all $a \in \mathbb{R}$ and $x \in \mathbb{R}^n$
- (triangle inequality) $\|x + y\|_p \le \|x\|_p + \|y\|_p$ for any $x, y \in \mathbb{R}^n$
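As a quick numerical illustration of these definitions, here is a minimal sketch (assuming NumPy is available; the vector `x` is an arbitrary example) that evaluates a few $\ell_p$-norms from the formula and checks them against `numpy.linalg.norm`:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])  # arbitrary example vector

def lp_norm(x, p):
    """l_p-norm from the definition; p = np.inf gives the max-norm."""
    if np.isinf(p):
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

for p in [1, 2, 3, np.inf]:
    assert np.isclose(lp_norm(x, p), np.linalg.norm(x, p))
    print(f"p = {p}: ||x||_p = {lp_norm(x, p):.4f}")
```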
9. Cauchy–Schwarz inequality. For $v, w \in \mathbb{R}^n$ and $1 \le p \le \infty$ there holds

$$|\langle v, w \rangle| \le \|v\|_p \, \|w\|_q, \qquad \text{where} \qquad q = \begin{cases} \infty, & \text{if } p = 1 \\ \frac{p}{p-1}, & \text{if } 1 < p < \infty \\ 1, & \text{if } p = \infty. \end{cases}$$

Some important special cases: $|\langle v, w \rangle| \le \|v\|_2 \|w\|_2$ and $|\langle v, w \rangle| \le \|v\|_1 \|w\|_\infty$.
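The inequality is easy to test on random data; a minimal sketch (assuming NumPy; the conjugate pairs $(p, q)$ follow the table above):

```python
import numpy as np

rng = np.random.default_rng(0)
v, w = rng.standard_normal(5), rng.standard_normal(5)

# conjugate pairs (p, q): q = inf for p = 1, q = p/(p-1) for 1 < p < inf
for p, q in [(1, np.inf), (2, 2.0), (3, 1.5), (np.inf, 1)]:
    lhs = abs(v @ w)                                   # |<v, w>|
    rhs = np.linalg.norm(v, p) * np.linalg.norm(w, q)  # ||v||_p ||w||_q
    assert lhs <= rhs + 1e-12
    print(f"p = {p}, q = {q}: {lhs:.4f} <= {rhs:.4f}")
```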
Matrices
1. $\mathbb{R}^{n \times m}$. The set of matrices of size $n \times m$ with real entries is denoted by $\mathbb{R}^{n \times m}$.
2. Identity matrix. The identity matrix of size $n \times n$ is denoted by

$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}.$$
3. Transpose, symmetric matrices, trace, determinant and inverse. Let

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \in \mathbb{R}^{n \times m}.$$

Its transpose is given by

$$A^\top = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1m} & a_{2m} & \cdots & a_{nm} \end{bmatrix} \in \mathbb{R}^{m \times n}.$$

A square matrix $A$ is said to be symmetric if $A = A^\top$.

The trace of a square matrix $A : n \times n$ is the sum of its diagonal entries:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

Note that $\operatorname{tr}(BC) = \operatorname{tr}(CB)$ for any matrices $B : p \times q$ and $C : q \times p$.

The determinant of $A$ is denoted by $\det(A)$. Properties (a numerical check follows this item):

- $\det(A) = \det(A^\top)$
- $\det(BC) = \det(B)\det(C)$ for square matrices $B$ and $C$ of the same size.
- The determinant is not a linear operator: in general, $\det(A + B) \neq \det(A) + \det(B)$.

If $\det(A) \neq 0$ then $A$ is invertible (non-singular) and its inverse is denoted by $A^{-1}$:

$$AA^{-1} = A^{-1}A = I_n.$$

Let $A$ be written in columns $A = [\, a_1 \ a_2 \ \cdots \ a_n \,]$ (each $a_i \in \mathbb{R}^n$); then $\det(A) \neq 0$ if and only if $a_1, a_2, \ldots, a_n$ are linearly independent.
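These identities are easy to verify numerically; a minimal sketch (assuming NumPy; the matrices are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 4))   # B : p x q
C = rng.standard_normal((4, 3))   # C : q x p
A = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3))

assert np.isclose(np.trace(B @ C), np.trace(C @ B))      # tr(BC) = tr(CB)
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))  # det(A) = det(A^T)
assert np.isclose(np.linalg.det(A @ D),
                  np.linalg.det(A) * np.linalg.det(D))   # det(AD) = det(A) det(D)
if not np.isclose(np.linalg.det(A), 0.0):                # non-singular => invertible
    assert np.allclose(A @ np.linalg.inv(A), np.eye(3), atol=1e-10)
```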
4. $\operatorname{Ker} A$, $\operatorname{Im} A$ and $\operatorname{rank}(A)$. Let $A \in \mathbb{R}^{n \times m}$. The nullspace or kernel of $A$ is the subspace

$$\operatorname{Ker} A = \{v \in \mathbb{R}^m : Av = 0\}.$$

Let $A$ be written in rows

$$A = \begin{bmatrix} a_1^\top \\ a_2^\top \\ \vdots \\ a_n^\top \end{bmatrix} \quad (a_i \in \mathbb{R}^m);$$

then $\operatorname{Ker} A = \{v \in \mathbb{R}^m : \langle a_i, v \rangle = 0 \text{ for } i = 1, 2, \ldots, n\}$. That is, $\operatorname{Ker} A$ is the subspace of vectors which are orthogonal to all the rows of $A$.

The range space or image of $A$ is the subspace

$$\operatorname{Im} A = \{Av : v \in \mathbb{R}^m\}.$$

Let $A$ be written in columns $A = [\, a_1 \ a_2 \ \cdots \ a_m \,]$ ($a_i \in \mathbb{R}^n$); then $\operatorname{Im} A = \operatorname{span}\{a_1, a_2, \ldots, a_m\}$. That is, $\operatorname{Im} A$ is the subspace spanned by the columns of $A$.

The rank of $A$ is the dimension of the subspace $\operatorname{Im} A$: $\operatorname{rank}(A) = \dim \operatorname{Im} A$.

Properties (see the sketch after this item):

- $\operatorname{rank}(A) = \operatorname{rank}(A^\top)$
- $\operatorname{rank}(A)$ is the maximum number of linearly independent columns (or rows) of $A$
- $m = \dim \operatorname{Ker} A + \dim \operatorname{Im} A$
- $\operatorname{rank}(AB) \le \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}$.

Note that $\operatorname{rank}(A) \le \min\{n, m\}$. A matrix is said to be full-rank if $\operatorname{rank}(A) = \min\{n, m\}$. Examples of full-rank matrices:

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 2 \end{bmatrix}.$$
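A minimal sketch (assuming NumPy; the matrix is an arbitrary example with one dependent row) that checks the rank–nullity identity $m = \dim \operatorname{Ker} A + \dim \operatorname{Im} A$, using the SVD to extract an orthonormal basis of the kernel:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
A[2] = A[0] + A[1]                 # make the rows dependent: rank drops to 2

n, m = A.shape
r = np.linalg.matrix_rank(A)       # dim Im A
_, _, Vt = np.linalg.svd(A)
K = Vt[r:]                         # right singular vectors spanning Ker A

assert np.allclose(A @ K.T, 0, atol=1e-10)  # each kernel vector v satisfies Av = 0
assert r + K.shape[0] == m                  # rank-nullity: dim Im A + dim Ker A = m
print(f"rank = {r}, dim Ker = {K.shape[0]}, m = {m}")
```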
5. Orthogonal matrices. A square matrix $Q : n \times n$ is said to be orthogonal if

$$QQ^\top = I_n.$$

Note that, in that case, $Q^{-1} = Q^\top$ and $Q^\top Q = I_n$.

Let $Q$ be written in columns $Q = [\, q_1 \ q_2 \ \cdots \ q_n \,]$; then $Q$ is orthogonal if and only if its columns $q_1, q_2, \ldots, q_n$ constitute an orthonormal basis of $\mathbb{R}^n$.

Since $Q$ orthogonal implies $Q^\top$ orthogonal, it follows that $Q$ is orthogonal if and only if its rows constitute an orthonormal basis of $\mathbb{R}^n$.

Note that $\det(Q) = \pm 1$ (because $QQ^\top = I_n$ implies $\det(Q)\det(Q^\top) = (\det(Q))^2 = \det(I_n) = 1$).
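Numerically, an orthogonal matrix can be produced with the QR factorization of a random matrix; a minimal sketch (assuming NumPy) checking the properties above:

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # Q has orthonormal columns

assert np.allclose(Q @ Q.T, np.eye(4), atol=1e-12)      # Q Q^T = I
assert np.allclose(Q.T @ Q, np.eye(4), atol=1e-12)      # Q^T Q = I
assert np.allclose(np.linalg.inv(Q), Q.T, atol=1e-10)   # Q^{-1} = Q^T
print("det(Q) =", np.linalg.det(Q))                     # always +1 or -1
```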
6. Frobenius norm. The Frobenius norm of a matrix $A \in \mathbb{R}^{n \times m}$ is defined as

$$\|A\| = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}^2}.$$

Note that

$$\|A\| = \sqrt{\operatorname{tr}(A^\top A)} = \sqrt{\operatorname{tr}(AA^\top)}.$$

It corresponds to the usual norm of vectors, if $A$ is interpreted as a vector in $\mathbb{R}^{nm}$ (for example, by stacking all its columns).

There holds $\|Av\| \le \|A\| \, \|v\|$ for any $A \in \mathbb{R}^{n \times m}$ and $v \in \mathbb{R}^m$.
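Both the trace formulas and the bound $\|Av\| \le \|A\| \, \|v\|$ can be checked directly; a minimal sketch (assuming NumPy; the matrix and vector are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))
v = rng.standard_normal(5)

fro = np.sqrt(np.sum(A ** 2))                       # definition
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))  # ||A|| = sqrt(tr(A^T A))
assert np.isclose(fro, np.sqrt(np.trace(A @ A.T)))  # ||A|| = sqrt(tr(A A^T))
assert np.isclose(fro, np.linalg.norm(A, 'fro'))    # NumPy's built-in Frobenius norm
assert np.linalg.norm(A @ v) <= fro * np.linalg.norm(v) + 1e-12  # ||Av|| <= ||A|| ||v||
```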
7. Symmetric matrices: eigenvalues and eigenvectors. The set of symmetric matrices of size $n \times n$ is denoted by

$$S^n = \{A \in \mathbb{R}^{n \times n} : A = A^\top\}.$$

Note that $S^n$ is a subspace of $\mathbb{R}^{n \times n}$.

Let $A \in S^n$. The number $\lambda \in \mathbb{R}$ is said to be an eigenvalue of $A$ if and only if there exists a non-zero vector $v \in \mathbb{R}^n$ such that

$$Av = \lambda v.$$

In that case, the vector $v$ is said to be an eigenvector of $A$ associated with the eigenvalue $\lambda$.

Properties:

- if $v_i$ and $v_j$ are eigenvectors of $A$ associated with distinct eigenvalues $\lambda_i$ and $\lambda_j$ with $\lambda_i \neq \lambda_j$, then they are orthogonal: $\langle v_i, v_j \rangle = 0$;
- $A$ is singular (non-invertible) if and only if $A$ has a zero eigenvalue.
8. Spectral decomposition theorem. Let $A$ be a symmetric matrix of size $n \times n$. Then, there exists an orthogonal matrix $Q : n \times n$ and a diagonal matrix

$$\Lambda = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} : n \times n$$

containing the eigenvalues of $A$, such that

$$A = Q \Lambda Q^\top.$$

Note that $Aq_i = \lambda_i q_i$, where $q_i \in \mathbb{R}^n$ denotes the $i$th column of $Q = [\, q_1 \ q_2 \ \cdots \ q_n \,]$. Thus, each $q_i$ is an eigenvector of $A$ associated with the eigenvalue $\lambda_i$.

The spectral decomposition theorem implies:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i, \qquad \det(A) = \prod_{i=1}^{n} \lambda_i.$$
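In NumPy this decomposition is computed by `numpy.linalg.eigh` for symmetric matrices; a minimal sketch (the matrix is an arbitrary symmetrized example) verifying $A = Q \Lambda Q^\top$ and the trace/determinant identities:

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                  # symmetrize an arbitrary matrix

lam, Q = np.linalg.eigh(A)         # eigenvalues (ascending) and orthonormal eigenvectors
assert np.allclose(Q @ np.diag(lam) @ Q.T, A, atol=1e-10)  # A = Q Lambda Q^T
assert np.allclose(Q @ Q.T, np.eye(4), atol=1e-12)         # Q is orthogonal
assert np.isclose(np.trace(A), lam.sum())                  # tr(A)  = sum of eigenvalues
assert np.isclose(np.linalg.det(A), lam.prod())            # det(A) = product of eigenvalues
```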
9. Inequalities for symmetric matrices. If $A : n \times n$ is a symmetric matrix, then

$$\lambda_{\min}(A) \, \|v\|^2 \le v^\top A v \le \lambda_{\max}(A) \, \|v\|^2$$

for all $v \in \mathbb{R}^n$. Note: the equalities are achieved by choosing $v$ as an eigenvector associated with $\lambda_{\min}(A)$ or $\lambda_{\max}(A)$.

Thus,

$$\lambda_{\min}(A) = \min_{v \neq 0} \frac{v^\top A v}{v^\top v} = \min_{\|v\| = 1} v^\top A v \qquad \text{and} \qquad \lambda_{\max}(A) = \max_{v \neq 0} \frac{v^\top A v}{v^\top v} = \max_{\|v\| = 1} v^\top A v.$$
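A minimal sketch (assuming NumPy) checking the Rayleigh-quotient bounds on random vectors and the equality cases at the extreme eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                  # arbitrary symmetric example

lam, Q = np.linalg.eigh(A)         # lam is sorted ascending
for _ in range(1000):
    v = rng.standard_normal(4)
    r = (v @ A @ v) / (v @ v)      # Rayleigh quotient
    assert lam[0] - 1e-12 <= r <= lam[-1] + 1e-12

# equality is achieved at eigenvectors for lambda_min and lambda_max
assert np.isclose(Q[:, 0] @ A @ Q[:, 0], lam[0])
assert np.isclose(Q[:, -1] @ A @ Q[:, -1], lam[-1])
```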
10. Theorem about the continuity of eigenvalues. To each symmetric matrix $A \in S^n$ correspond $n$ eigenvalues (not necessarily distinct), which we sort in non-decreasing order:

$$\lambda_{\min}(A) = \lambda_1(A) \le \lambda_2(A) \le \cdots \le \lambda_n(A) = \lambda_{\max}(A).$$

Thus, there exist $n$ functions defined on $S^n$:

$$\lambda_1 : S^n \to \mathbb{R}, \quad \lambda_2 : S^n \to \mathbb{R}, \quad \ldots, \quad \lambda_n : S^n \to \mathbb{R}.$$

The function $\lambda_i : S^n \to \mathbb{R}$ corresponds to the map $A \mapsto \lambda_i(A)$.

Theorem: each function $\lambda_i$ is continuous, that is, for any $A_0 \in S^n$ there holds

$$\forall_{\epsilon > 0} \ \exists_{\delta > 0} : \ \|A - A_0\| < \delta \text{ and } A \in S^n \implies |\lambda_i(A) - \lambda_i(A_0)| < \epsilon.$$
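The statement can be observed numerically: a small symmetric perturbation moves each sorted eigenvalue only slightly. A minimal sketch (assuming NumPy; the perturbation size $10^{-6}$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
A0 = (M + M.T) / 2                 # arbitrary symmetric example

E = rng.standard_normal((5, 5))
E = 1e-6 * (E + E.T) / 2           # tiny symmetric perturbation
A = A0 + E

lam0 = np.linalg.eigvalsh(A0)      # sorted eigenvalues of A0
lam = np.linalg.eigvalsh(A)        # sorted eigenvalues of the perturbed matrix
print(np.max(np.abs(lam - lam0)))  # small, on the order of ||A - A0||
```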
11. Positive definite and semidefinite matrices. A symmetric matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$

is said to be positive semidefinite if $v^\top A v \ge 0$ for all $v \in \mathbb{R}^n$. The notation $A \succeq 0$ means that $A$ is positive semidefinite.

The set of positive semidefinite matrices of size $n \times n$ is denoted by

$$S^n_+ = \{A \in S^n : A \succeq 0\}.$$

Equivalent characterizations of $A \succeq 0$:

- $\lambda_{\min}(A) \ge 0$
- All the principal minors of $A$ are nonnegative. For example, for $n = 2$:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \succeq 0 \iff \begin{cases} a_{11} \ge 0, \ a_{22} \ge 0 \\ \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \ge 0. \end{cases}$$

For $n = 3$:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \succeq 0 \iff \begin{cases} a_{11} \ge 0, \ a_{22} \ge 0, \ a_{33} \ge 0 \\ \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \ge 0, \ \det \begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix} \ge 0, \ \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} \ge 0 \\ \det A \ge 0. \end{cases}$$

A symmetric matrix $A$ is said to be positive definite if $v^\top A v > 0$ for all $v \neq 0$. The notation $A \succ 0$ means that $A$ is positive definite.

The set of positive definite matrices of size $n \times n$ is denoted by

$$S^n_{++} = \{A \in S^n : A \succ 0\}.$$

Equivalent characterizations of $A \succ 0$:

- $\lambda_{\min}(A) > 0$
- The leading principal minors of $A$ are positive:

$$a_{11} > 0, \quad \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} > 0, \quad \det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} > 0, \quad \ldots, \quad \det(A) > 0.$$

Properties of positive definite matrices:

- if $A \succ 0$ then $A$ is invertible
- if $A \succ 0$ then $A^{-1} \succ 0$.

Example: the matrix

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$

is positive semidefinite, but it is not positive definite (the eigenvalues of $A$ are 0 and 1).
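In practice, definiteness is usually tested via eigenvalues or a Cholesky factorization rather than by enumerating minors; a minimal sketch (assuming NumPy; the matrices are arbitrary examples):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """A >= 0 iff lambda_min(A) >= 0 (up to numerical tolerance)."""
    return np.linalg.eigvalsh(A)[0] >= -tol

def is_pd(A):
    """Cholesky factorization succeeds exactly when A is positive definite."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
assert is_psd(A) and not is_pd(A)    # the example above: PSD but not PD

B = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
assert is_pd(B)                      # leading minors: 2 > 0 and det(B) = 3 > 0
```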
12. Negative definite and semidefinite matrices. A symmetric matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$

is said to be negative semidefinite if $v^\top A v \le 0$ for all $v \in \mathbb{R}^n$. The notation $A \preceq 0$ means that $A$ is negative semidefinite.

The set of negative semidefinite matrices of size $n \times n$ is denoted by

$$S^n_- = \{A \in S^n : A \preceq 0\}.$$

Equivalent characterizations of $A \preceq 0$:

- $-A \succeq 0$
- $\lambda_{\max}(A) \le 0$.

A matrix $A$ is said to be negative definite if $v^\top A v < 0$ for all $v \neq 0$. The notation $A \prec 0$ means that $A$ is negative definite.

The set of negative definite matrices of size $n \times n$ is denoted by

$$S^n_{--} = \{A \in S^n : A \prec 0\}.$$

Equivalent characterizations of $A \prec 0$:

- $-A \succ 0$
- $\lambda_{\max}(A) < 0$
- The leading principal minors of $A$ alternate in sign, starting negative:

$$a_{11} < 0, \quad \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} > 0, \quad \det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} < 0, \quad \ldots, \quad (-1)^n \det(A) > 0.$$

Properties of negative definite matrices:

- if $A \prec 0$ then $A$ is invertible
- if $A \prec 0$ then $A^{-1} \prec 0$.
13. Singular value decomposition. Any matrix $A \in \mathbb{R}^{n \times m}$ can be factored as

$$A = U \Sigma V^\top,$$

where $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{m \times m}$ are orthogonal matrices and

$$\Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{n \times m}.$$

The matrix $D \in \mathbb{R}^{r \times r}$, where $r = \operatorname{rank}(A)$, is diagonal,

$$D = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{bmatrix}, \qquad \text{with } \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.$$

The $\sigma_i$'s are called the (nonzero) singular values of $A$. The columns of $U$ (respectively $V$) are called the left (respectively right) singular vectors of $A$.

Other variants:

- If $n \ge m$, then $r = \operatorname{rank}(A) \le m$ and $A$ can be written as $A = \tilde{U} \tilde{\Sigma} V^\top$, with $\tilde{U} \in \mathbb{R}^{n \times m}$ obtained from $U$ by discarding its $n - m$ rightmost columns and

$$\tilde{\Sigma} = \operatorname{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{m \times m}, \qquad \sigma_{r+1} = \cdots = \sigma_m = 0,$$

obtained from $\Sigma$ by discarding its last $n - m$ rows.

- If $n \le m$, then $r = \operatorname{rank}(A) \le n$ and $A$ can be written as $A = U \tilde{\Sigma} \tilde{V}^\top$, with $\tilde{V} \in \mathbb{R}^{m \times n}$ obtained from $V$ by discarding its $m - n$ rightmost columns and

$$\tilde{\Sigma} = \operatorname{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{n \times n}, \qquad \sigma_{r+1} = \cdots = \sigma_n = 0,$$

obtained from $\Sigma$ by discarding its $m - n$ rightmost columns.

[Note: when $m = n$, the matrices $U$, $\Sigma$ and $V$ are all square. However, the singular value decomposition (SVD) gives a different decomposition from the spectral decomposition theorem, since the latter only applies to symmetric matrices.]

Properties (a numerical sketch follows this item):

- $A \in \mathbb{R}^{n \times m} \implies A^\top A \in S^m$. By the spectral decomposition theorem, $A^\top A = V \Lambda V^\top$, where $V \in \mathbb{R}^{m \times m}$ is orthogonal and $\Lambda \in \mathbb{R}^{m \times m}$ is diagonal with the eigenvalues of $A^\top A$ in its diagonal. On the other hand, the SVD of $A$ is $A = U \Sigma V^\top$ and $A^\top A = V \Sigma^\top U^\top U \Sigma V^\top = V (\Sigma^\top \Sigma) V^\top$. We conclude that the columns of $V$ contain the eigenvectors of $A^\top A$ and

$$\Lambda = \Sigma^\top \Sigma = \begin{bmatrix} \sigma_1^2 & & & \\ & \sigma_2^2 & & \\ & & \ddots & \\ & & & \sigma_m^2 \end{bmatrix},$$

that is, the nonzero singular values of $A$ are the positive square roots of the nonzero eigenvalues of $A^\top A$.

- Similarly, $A \in \mathbb{R}^{n \times m} \implies AA^\top \in S^n$, and the matrix $U \in \mathbb{R}^{n \times n}$ of the SVD of $A$ ($A = U \Sigma V^\top$) contains the eigenvectors of the matrix $AA^\top$; and $\Sigma$ contains the positive square roots of the eigenvalues of $AA^\top$ in its diagonal.

- $A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$, where

$$U = [\, u_1 \ u_2 \ \cdots \ u_n \,] \qquad \text{and} \qquad V^\top = \begin{bmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_m^\top \end{bmatrix}.$$

- The largest singular value of $A$, $\sigma_{\max} := \sigma_1$, is given by

$$\sigma_{\max} = \max_{\|u\| = 1, \ \|v\| = 1} u^\top A v.$$

- $\|Ax\| \le \sigma_{\max} \|x\|$ for any $x$.
- $\|A\| = \sqrt{\sum_{i=1}^{r} \sigma_i^2}$ (Frobenius norm).
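A minimal sketch with `numpy.linalg.svd` (assuming NumPy; the matrix is an arbitrary example) illustrating the factorization, the link between singular values and the eigenvalues of $A^\top A$, and the Frobenius-norm identity:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 6))    # n = 4, m = 6

U, s, Vt = np.linalg.svd(A)        # full SVD: U is 4x4, s has 4 entries, Vt is 6x6
Sigma = np.zeros((4, 6))
Sigma[:4, :4] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A, atol=1e-10)     # A = U Sigma V^T

# nonzero singular values = positive square roots of eigenvalues of A^T A
eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]      # descending order
assert np.allclose(np.sqrt(eig[:4]), s, atol=1e-8)

assert np.isclose(np.linalg.norm(A, 'fro'),
                  np.sqrt(np.sum(s ** 2)))            # ||A|| = sqrt(sum sigma_i^2)
```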
Analysis
1. Compact sets. A subset $K \subset \mathbb{R}^n$ is said to be compact if it is closed and bounded. Examples:

- $K = \{x \in \mathbb{R}^n : \|x\| \le 5\}$ is compact
- $K = \{x = (x_1, x_2) \in \mathbb{R}^2 : -1 \le x_1 \le 3, \ x_2 \ge 0, \ x_1 + x_2 \le 1\}$ is compact
- the set $K = \{x \in \mathbb{R}^n : \|x\| < 5\}$ is not compact because it is not closed
- the set $K = \{x \in \mathbb{R}^2 : -1 \le x_1 \le 3, \ x_2 \ge 0\}$ is not compact because it is not bounded.

Equivalent characterization: $K \subset \mathbb{R}^n$ is compact if and only if for any sequence $\{x_k : k \in \mathbb{N}\}$ in $K$ there exists a subsequence $\{x_{m_k} : k \in \mathbb{N}\}$ converging to a point of $K$.

Common application of the previous result: let $\{x_k : k \in \mathbb{N}\}$ be a sequence of points in the sphere $K = \{x \in \mathbb{R}^n : \|x\| = 1\}$. Note that $K$ is compact. Then, there exists a subsequence $\{x_{m_k} : k \in \mathbb{N}\}$ and a point $x_0 \in K$ such that $x_{m_k} \to x_0$.
2. Weierstrass theorem. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuous function. Then, for each compact subset $K \subset \mathbb{R}^n$ there exist $x_1 \in K$ and $x_2 \in K$ such that

$$f(x_1) = \min_{x \in K} f(x) \qquad \text{and} \qquad f(x_2) = \max_{x \in K} f(x).$$

That is, a continuous function on a compact subset achieves its infimum and supremum over this set.
3. Differentiability. A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be of class $C^k$ ($k = 0, 1, 2, \ldots$) if all partial derivatives up to order $k$ exist and are continuous (a function of class $C^0$ is a continuous function).

For example, if $f$ is of class $C^1$ then its gradient

$$\nabla f : \mathbb{R}^n \to \mathbb{R}^n, \qquad \nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{bmatrix}$$

is a continuous map (each function $\mathbb{R}^n \ni x \mapsto \frac{\partial f}{\partial x_i}(x) \in \mathbb{R}$ is continuous). If $f$ is of class $C^2$ then its Hessian

$$\nabla^2 f : \mathbb{R}^n \to \mathbb{R}^{n \times n}, \qquad \nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 f}{\partial x_2^2}(x) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(x) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \frac{\partial^2 f}{\partial x_n \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(x) \end{bmatrix}$$

is a continuous map (each function $\mathbb{R}^n \ni x \mapsto \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \in \mathbb{R}$ is continuous).

Schwarz theorem: if $f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^2$, then its Hessian matrix $\nabla^2 f(x)$ is symmetric for any $x \in \mathbb{R}^n$.

A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be smooth if it is of class $C^k$ for all $k = 0, 1, 2, \ldots$

A map $F : \mathbb{R}^n \to \mathbb{R}^m$, $F(x) = (f_1(x), f_2(x), \ldots, f_m(x))$, is said to be of class $C^k$ (respectively, smooth) if each component function $f_i : \mathbb{R}^n \to \mathbb{R}$ is of class $C^k$ (respectively, smooth).
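A common numerical companion to these definitions is a finite-difference check of an analytic gradient; a minimal sketch (assuming NumPy) for the concrete example $f(x) = \frac{1}{2} x^\top A x + b^\top x$ with symmetric $A$, whose gradient is $Ax + b$ and whose Hessian is the constant matrix $A$ (symmetric, consistent with Schwarz's theorem):

```python
import numpy as np

rng = np.random.default_rng(9)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2                  # symmetric, so hess f = A = A^T
b = rng.standard_normal(3)

f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b         # analytic gradient

x = rng.standard_normal(3)
eps = 1e-6
# central finite differences approximate each partial derivative df/dx_i
g_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
assert np.allclose(g_fd, grad(x), atol=1e-6)
```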
Taylor theorems
1. Taylor expansions. Consider a function $f : \mathbb{R}^n \to \mathbb{R}$ and a nominal point $x_0 \in \mathbb{R}^n$.

(1st order expansion) If $f$ is differentiable, then

$$f(x_0 + h) = f(x_0) + \nabla f(x_0)^\top h + o(\|h\|),$$

that is,

$$\forall_{\epsilon > 0} \ \exists_{\delta > 0} : \ \|h\| \le \delta \implies \left| f(x_0 + h) - \left[ f(x_0) + \nabla f(x_0)^\top h \right] \right| \le \epsilon \|h\|.$$

(2nd order expansion) If $f$ is twice-differentiable, then

$$f(x_0 + h) = f(x_0) + \nabla f(x_0)^\top h + \frac{1}{2} h^\top \nabla^2 f(x_0) h + o\left(\|h\|^2\right),$$

that is,

$$\forall_{\epsilon > 0} \ \exists_{\delta > 0} : \ \|h\| \le \delta \implies \left| f(x_0 + h) - \left[ f(x_0) + \nabla f(x_0)^\top h + \frac{1}{2} h^\top \nabla^2 f(x_0) h \right] \right| \le \epsilon \|h\|^2.$$
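The $o(\|h\|)$ and $o(\|h\|^2)$ remainder rates can be seen numerically: shrinking $h$ by a factor of 10 should shrink the 1st-order remainder roughly 100-fold and the 2nd-order remainder roughly 1000-fold for a smooth $f$. A minimal sketch (assuming NumPy; $f(x) = e^{x_1} + \sin x_2$ is an arbitrary smooth example):

```python
import numpy as np

f    = lambda x: np.exp(x[0]) + np.sin(x[1])
grad = lambda x: np.array([np.exp(x[0]), np.cos(x[1])])
hess = lambda x: np.diag([np.exp(x[0]), -np.sin(x[1])])

x0 = np.array([0.3, 0.7])
h0 = np.array([1.0, -0.5])

for t in [1e-1, 1e-2, 1e-3]:
    h = t * h0
    r1 = abs(f(x0 + h) - (f(x0) + grad(x0) @ h))
    r2 = abs(f(x0 + h) - (f(x0) + grad(x0) @ h + 0.5 * h @ hess(x0) @ h))
    print(f"t = {t:g}: 1st-order remainder {r1:.2e} (~ t^2), "
          f"2nd-order remainder {r2:.2e} (~ t^3)")
```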
2. Important properties. Let $F : \mathbb{R}^n \to \mathbb{R}^m$, $F(x) = (f_1(x), f_2(x), \ldots, f_m(x))$, be a map of class $C^1$. Then, for any $x, y \in \mathbb{R}^n$ there holds

$$F(y) = F(x) + \int_0^1 DF(x + t(y - x))(y - x) \, dt,$$

where

$$DF(z) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(z) & \frac{\partial f_1}{\partial x_2}(z) & \cdots & \frac{\partial f_1}{\partial x_n}(z) \\ \frac{\partial f_2}{\partial x_1}(z) & \frac{\partial f_2}{\partial x_2}(z) & \cdots & \frac{\partial f_2}{\partial x_n}(z) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(z) & \frac{\partial f_m}{\partial x_2}(z) & \cdots & \frac{\partial f_m}{\partial x_n}(z) \end{bmatrix} = \begin{bmatrix} \nabla f_1(z)^\top \\ \nabla f_2(z)^\top \\ \vdots \\ \nabla f_m(z)^\top \end{bmatrix}$$

denotes the derivative of $F$ at the point $z$.

Application: for the particular case of a function $f : \mathbb{R}^n \to \mathbb{R}$ (i.e., $m = 1$) there holds, for any $x, y \in \mathbb{R}^n$ (a numerical check follows this item):

- if $f$ is of class $C^1$:

$$f(y) = f(x) + \int_0^1 \nabla f(x + t(y - x))^\top (y - x) \, dt$$

- if $f$ is of class $C^2$:

$$\nabla f(y) = \nabla f(x) + \int_0^1 \nabla^2 f(x + t(y - x))(y - x) \, dt.$$
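The $C^1$ identity can be checked with simple quadrature; a minimal sketch (assuming NumPy; the function, the endpoints, and the trapezoidal rule on a fine grid are arbitrary choices):

```python
import numpy as np

f    = lambda x: np.sin(x[0]) * x[1] ** 2
grad = lambda x: np.array([np.cos(x[0]) * x[1] ** 2,
                           2.0 * np.sin(x[0]) * x[1]])

x = np.array([0.2, -1.0])
y = np.array([1.5, 0.8])

# trapezoidal rule for int_0^1 grad f(x + t(y-x))^T (y-x) dt
t = np.linspace(0.0, 1.0, 2001)
g = np.array([grad(x + ti * (y - x)) @ (y - x) for ti in t])
dt = t[1] - t[0]
integral = np.sum((g[:-1] + g[1:]) / 2.0) * dt

assert np.isclose(f(y), f(x) + integral, atol=1e-6)
```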
3. Mean-value theorem for continuous functions. If $f : \mathbb{R} \to \mathbb{R}$ is continuous then

$$\int_a^b f(x) \, dx = f(c)(b - a)$$

for some $c \in [a, b]$.
4. Mean-value theorems for differentiable functions. Let $f : \mathbb{R}^n \to \mathbb{R}$ and $x, y \in \mathbb{R}^n$.

(1st order expansion) If $f$ is of class $C^1$ then

$$f(y) = f(x) + \nabla f(z)^\top (y - x)$$

for some $z \in [x, y]$. The notation $[x, y]$ denotes the line segment which runs from $x$ to $y$, that is,

$$[x, y] = \{(1 - t)x + ty : t \in [0, 1]\}.$$

(2nd order expansion) If $f$ is of class $C^2$ then

$$f(y) = f(x) + \nabla f(x)^\top (y - x) + \frac{1}{2} (y - x)^\top \nabla^2 f(z)(y - x)$$

for some $z \in [x, y]$.