

TMA947 / MMG621 — Nonlinear optimization

Lecture 4 — Introduction to optimality conditions


Emil Gustavsson, Zuzana Nedělková

October 20, 2017

[Minor revision: Axel Ringh - September, 2024]

Local and global optimality

We consider the optimization problem

minimize f (x), (1a)


subject to x ∈ S, (1b)

where S ⊆ Rn is a nonempty set, and f : Rn → R ∪ {+∞} is a given function.

[Figure: a function f of one variable over a set S ⊂ R, illustrating where candidate optimal points can occur.]

When n = 1, we know that an optimal solution will be found at

– boundary points of S,

– stationary points, that is, where f ′ (x) = 0,


– discontinuities of f or f ′ .
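
As a small numerical sketch (the cubic objective and the interval S = [0.5, 5] below are made up for illustration and are not from the notes), a smooth one-dimensional problem can be solved by comparing f at all candidate points:

    import numpy as np

    # Assumed example: minimize f(x) = x^3 - 6x^2 + 9x + 1 over S = [0.5, 5].
    # f is smooth on S, so the only candidates are the boundary points of S
    # and the stationary points of f.
    def f(x):
        return x**3 - 6.0 * x**2 + 9.0 * x + 1.0

    stationary = np.roots([3.0, -12.0, 9.0])        # roots of f'(x) = 3x^2 - 18x... here: 3x^2 - 12x + 9
    candidates = [0.5, 5.0] + [float(r) for r in stationary if 0.5 <= r <= 5.0]
    best = min(candidates, key=f)
    print(best, f(best))                            # the minimum is attained at the stationary point x = 3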

Definition (global minimum). x∗ ∈ S is a global minimum of f over S if

f (x∗ ) ≤ f (x), ∀ x ∈ S.

The next important definition is the one of a local minimum.


Definition (local minimum). x∗ ∈ S is a local minimum of f over S if

∃ε > 0 such that f (x∗ ) ≤ f (x), x ∈ S ∩ Bε (x∗ ),

where Bε (x∗ ) := { y ∈ Rn | ∥y − x∗ ∥ < ε } is the Euclidean ball with radius ε centered at x∗ .

We call x∗ ∈ S a strict local minimum of f over S if f (x∗ ) < f (x) holds above for all x ̸= x∗ .

One of the most important theorems in optimization follows.


Theorem (Fundamental Theorem of global optimality). Consider the problem (1), where S is a convex set and f is convex on S. Then every local minimum of f over S is also a global minimum.

Proof. See Theorem 4.3 in the book.

Existence of optimal solutions

First, some basic notation.

– We say that a set S ⊆ Rn is open if for every x ∈ S there exists an ε > 0 such that Bε (x) :=
{ y ∈ Rn | ∥y − x∥ < ε } ⊂ S.
– We say that a set S ⊆ Rn is closed if Rn \ S is open.
– A limit point of a set S ⊆ Rn is a point x such that there exists a sequence {xk } ⊂ S satisfying xk → x as k → ∞.
– We can then define a closed set as a set which contains all its limit points.
– We say that a set S ⊆ Rn is bounded if there exists a constant C > 0 such that ∥x∥ ≤ C for all
x ∈ S.
– If a set is both closed and bounded, we call it compact.

Two important definitions needed to formulate Weierstrass’ Theorem are the following.
Definition (weakly coercive function). A function f is said to be weakly coercive with respect to the
set S if either S is bounded or
lim f (x) = ∞ as ∥x∥ → ∞ with x ∈ S.

Definition (lower semi-continuity). A function f is said to be lower semi-continuous at x if the value f (x) is less than or equal to every limit of f as xk → x.

In other words, f is lower semi-continuous at x ∈ S if

xk → x =⇒ f (x) ≤ lim inf_{k→∞} f (xk ).


Figure 1: A lower semi-continuous function in one variable

Now we can formulate Weierstrass’ Theorem which guarantees the existence of optimal solutions
to an optimization problem as long as a few assumptions are satisfied.

Theorem (Weierstrass’ Theorem). Consider the problem (1), where S is a nonempty and closed set and
f is lower semi-continuous on S. If f is weakly coercive with respect to S, then there exists a nonempty,
closed and bounded (thus compact) set of optimal solutions to the problem (1).

Proof. See Theorem 4.6 in the book.

One way to remember the assumptions in Weierstrass’ Theorem is to imagine what can go wrong, i.e., when a problem does not have an optimal solution. One example of an optimization problem where the solution set is empty is f (x) = 1/x with S = [1, ∞): here S is closed but unbounded and f (x) → 0 as x → ∞, so f is not weakly coercive with respect to S, and the infimum 0 is never attained.

Optimality conditions when S = Rn

When S = Rn , i.e., the problem is an unconstrained optimization problem, then the following
theorem holds.
Theorem (necessary condition for optimality, C 1 ). If f ∈ C 1 on Rn , then

x∗ is a local minimum of f on Rn =⇒ ∇f (x∗ ) = 0

Proof. See Theorem 4.13.

Note that ∇f (x) = (∂f (x)/∂x1 , . . . , ∂f (x)/∂xn )T . The converse of the theorem is, however, not true: take f (x) = x³ and x∗ = 0, for which f ′ (0) = 0 although x∗ is not a local minimum. We can strengthen the theorem by assuming that f is also in C 2 .

Theorem (necessary condition for optimality, C 2 ). If f ∈ C 2 on Rn , then

x∗ is a local minimum of f on Rn =⇒ ∇f (x∗ ) = 0 and ∇2 f (x∗ ) ⪰ 0

Proof. See Theorem 4.16.


Remember that for a matrix A ∈ Rn×n , the notation A ⪰ 0 (A positive semidefinite) means that xT Ax ≥ 0 for all x ∈ Rn . Once again, the converse direction of the theorem is not true. However, by assuming positive definiteness of the Hessian of f , we can obtain a sufficient condition.
Theorem (sufficient condition for optimality, C 2 ). If f ∈ C 2 on Rn , then
∇f (x∗ ) = 0 and ∇2 f (x∗ ) ≻ 0 =⇒ x∗ is a strict local minimum of f on Rn

Proof. See Theorem 4.17.
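
A minimal numerical sketch of these conditions (the quadratic below is an assumed example, not from the notes): at a candidate point we check that the gradient vanishes and that all eigenvalues of the Hessian are positive.

    import numpy as np

    # Assumed example: f(x) = (x1 - 1)^2 + 2*(x2 + 3)^2, candidate point x* = (1, -3)
    def grad(x):
        return np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])

    def hess(x):
        return np.array([[2.0, 0.0], [0.0, 4.0]])

    x_star = np.array([1.0, -3.0])

    print(np.allclose(grad(x_star), 0.0))                   # True: gradient is zero
    print(np.all(np.linalg.eigvalsh(hess(x_star)) > 0.0))   # True: Hessian positive definite
    # Both hold, so the sufficient condition gives: x* is a strict local minimum.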

To obtain a condition that is both necessary and sufficient for global optimality, we need to assume convexity of the function f .
Theorem (necessary and sufficient condition for optimality, C 1 ). If f ∈ C 1 is convex on Rn , then
x∗ is a global minimum of f on Rn ⇐⇒ ∇f (x∗ ) = 0

Proof. See Theorem 4.18.
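
For example (an assumed convex quadratic, not taken from the notes), f (x) = ½ xT Qx − bT x with Q positive definite is convex, so solving ∇f (x) = Qx − b = 0 immediately gives the global minimum:

    import numpy as np

    Q = np.array([[4.0, 1.0],
                  [1.0, 3.0]])          # positive definite
    b = np.array([1.0, 2.0])

    def f(x):
        return 0.5 * x @ Q @ x - b @ x

    x_star = np.linalg.solve(Q, b)      # stationary point: Q x* = b

    # No sampled point should beat f(x*)
    rng = np.random.default_rng(0)
    samples = rng.normal(scale=10.0, size=(1000, 2))
    print(all(f(x_star) <= f(x) + 1e-9 for x in samples))   # True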

Optimality conditions for S ⊆ Rn

When S = Rn , the set of directions in which we could move from a point x and still stay feasible was all of Rn . When we consider cases where S ⊂ Rn , this need no longer hold.
Definition (feasible direction). Let x ∈ S. A vector p ∈ Rn defines a feasible direction at x if
∃δ > 0 : x + αp ∈ S, for all α ∈ [0, δ].

So the feasible directions at a point x ∈ S describe the directions in which we can ”move” without becoming infeasible.
Definition (descent direction). Let x ∈ Rn . A vector p defines a descent direction with respect to f
at x if
∃δ > 0 : f (x + αp) < f (x), for all α ∈ (0, δ].

Suppose that f ∈ C 1 around a point x ∈ Rn , and that p ∈ Rn . If ∇f (x)T p < 0 then the vector
p defines a direction of descent with respect to f at x. We can now state necessary optimality
conditions for cases when S ̸= Rn .
Theorem (necessary optimality conditions). Suppose that S ⊆ Rn and that f ∈ C 1 on S.

a) If x∗ ∈ S is a local minimum of f over S, then


∇f (x∗ )T p ≥ 0
holds for all feasible directions p at x∗ .

b) Suppose that S is convex. If x∗ is a local minimum of f over S, then


∇f (x∗ )T (x − x∗ ) ≥ 0, x ∈ S (2)


Proof. See Proposition 4.22 in the book.

We refer to (2) as a variational inequality, and we can now extend the notion of stationary points by defining them as points fulfilling (2). This is the first of four equivalent definitions of a stationary point.
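
A hedged numerical sketch of (2) (the objective and the box-shaped feasible set below are assumed for illustration, not from the lecture): at a candidate point we test the variational inequality on randomly sampled feasible points.

    import numpy as np

    # Assumed example: minimize f(x) = (x1 - 2)^2 + (x2 - 0.5)^2 over the box S = [0, 1]^2.
    def grad(x):
        return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 0.5)])

    x_star = np.array([1.0, 0.5])        # candidate minimizer, on the boundary of S

    rng = np.random.default_rng(0)
    xs = rng.uniform(0.0, 1.0, size=(1000, 2))     # random feasible points
    g = grad(x_star)
    print(np.all((xs - x_star) @ g >= -1e-12))     # True: (2) holds at x*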

Now the necessary and sufficient conditions for optimality can be stated in the following theorem.

Theorem (necessary and sufficient optimality conditions). Suppose S ⊆ Rn is a nonempty convex set and that f ∈ C 1 is a convex function on S. Then

x∗ is a global minimum of f over S ⇐⇒ ∇f (x∗ )T (x − x∗ ) ≥ 0, x ∈ S.

Proof. See Theorem 4.23 in the book.

Note that when S = Rn , the expression to the right just becomes ∇f (x∗ ) = 0. Why?

We will now present three additional definitions of a stationary point which are all equivalent to
(2). The first one we get by taking the minimum of the left-hand-side of (2) and then realizing that
the optimal value must be zero, i.e.,

min_{x∈S} ∇f (x∗ )T (x − x∗ ) = 0. (3)

Convince yourself that (2) and (3) are equivalent! Now we claim that (2) and (3) are also equivalent to
x∗ = ProjS [x∗ − ∇f (x∗ )] . (4)
The equation (4) states that if you stand in a stationary point and take a step in the direction of the
negative gradient and then project back to the feasible set, you should end up in the same point.
The details for showing this equivalence can be found in the book (pp. 94–95). See also Figure 2.
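
A small sketch of (4), reusing the assumed box example from above (for a box, ProjS is just componentwise clipping):

    import numpy as np

    # Assumed example: f(x) = (x1 - 2)^2 + (x2 - 0.5)^2, S = [0, 1]^2.
    def grad(x):
        return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 0.5)])

    def proj_S(x):
        return np.clip(x, 0.0, 1.0)     # projection onto a box = componentwise clipping

    x_star = np.array([1.0, 0.5])
    print(np.allclose(x_star, proj_S(x_star - grad(x_star))))   # True: x* is stationary

    x = np.array([0.2, 0.9])            # some other feasible point
    print(np.allclose(x, proj_S(x - grad(x))))                  # False: not stationary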

For the last equivalent definition of a stationary point, we need to introduce the normal cone.

Definition (normal cone). Suppose the set S is closed and convex. Let x ∈ S. Then the normal cone to
S at x is the set
NS (x) := { p ∈ Rn | pT (y − x) ≤ 0, y ∈ S }.


Think of the normal cone at a point x as the set of all directions pointing ”straight out” from the set; see Figure 2. Now the fourth equivalent definition of a stationary point is that

−∇f (x∗ ) ∈ NS (x∗ ). (5)

That (5) is equivalent to (2) follows directly from the definition of NS (x∗ ): the inclusion −∇f (x∗ ) ∈ NS (x∗ ) means precisely that ∇f (x∗ )T (x − x∗ ) ≥ 0 for all x ∈ S.
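
For the assumed box example used above, the inclusion (5) can also be checked directly, using the (standard, but here added for illustration) componentwise description of the normal cone of a box: p_i ≤ 0 where x_i sits at the lower bound, p_i ≥ 0 where x_i sits at the upper bound, and p_i = 0 where x_i is strictly between the bounds.

    import numpy as np

    # Assumed box example again: S = [0, 1]^2, f(x) = (x1 - 2)^2 + (x2 - 0.5)^2.
    def grad(x):
        return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 0.5)])

    def in_normal_cone_box(p, x, lo=0.0, hi=1.0, tol=1e-12):
        ok = True
        for pi, xi in zip(p, x):
            if abs(xi - lo) <= tol:
                ok &= pi <= tol          # at the lower bound: p_i <= 0
            elif abs(xi - hi) <= tol:
                ok &= pi >= -tol         # at the upper bound: p_i >= 0
            else:
                ok &= abs(pi) <= tol     # in the interior: p_i = 0
        return bool(ok)

    x_star = np.array([1.0, 0.5])
    print(in_normal_cone_box(-grad(x_star), x_star))   # True: condition (5) holds at x*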


Figure 2: Illustration of the normal cone and the projection operator.

Summary of optimality conditions for convex S ⊆ Rn

Definition (stationary point). Suppose that S is convex and that f ∈ C 1 . A point x∗ ∈ S fulfilling the four equivalent statements a)–d) below is called a stationary point.

a)
∇f (x∗ )T (x − x∗ ) ≥ 0, x ∈ S,

b)
min_{x∈S} ∇f (x∗ )T (x − x∗ ) = 0,

c)
x∗ = ProjS [x∗ − ∇f (x∗ )] ,

d)
−∇f (x∗ ) ∈ NS (x∗ ).

The two important theorems which will be utilized throughout the whole course are the following.

Theorem (necessary optimality conditions). Suppose that S is convex and that f ∈ C 1 . Then

x∗ is a local minimum of f over S =⇒ x∗ is stationary

Theorem (necessary and sufficient optimality conditions). Suppose that S is convex and that f ∈ C 1
is convex. Then
x∗ is a global minimum of f over S ⇐⇒ x∗ is stationary

As we will see later in the course, the last definition (the inclusion (5)) is the only one that can be
extended to the case of non-convex sets S.


The separation theorem

Now we will present a very useful theorem for convex sets, which says: "If a point y does not lie in a closed convex set S, then there exists a hyperplane separating y from S." Mathematically, this amounts to the following.

Theorem (the separation theorem). Suppose that S ⊆ Rn is closed and convex, and that the point y
does not lie in S. Then there exists a vector π ̸= 0 and a scalar α ∈ R such that π T y > α, and π T x ≤ α
for all x ∈ S.

Proof. See Theorem 4.29.
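
A small numerical sketch (the set S below, the closed Euclidean unit ball, is an assumed example, not from the notes): one standard way to produce the separating hyperplane is to take π = y − ProjS (y) and α = π T ProjS (y).

    import numpy as np

    def proj_S(y):
        # Projection onto the closed Euclidean unit ball
        n = np.linalg.norm(y)
        return y if n <= 1.0 else y / n

    y = np.array([2.0, 1.0])            # a point outside S
    x_bar = proj_S(y)
    pi = y - x_bar                      # hyperplane normal
    alpha = pi @ x_bar

    print(pi @ y > alpha)               # True: pi^T y > alpha

    # pi^T x <= alpha should hold for every x in S; check on random points of the ball
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(1000, 2))
    pts /= np.maximum(1.0, np.linalg.norm(pts, axis=1, keepdims=True))
    print(np.all(pts @ pi <= alpha + 1e-12))   # True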

Figure 3: Illustration of the separation theorem.

The separation theorem can be used to prove Farkas’ Lemma efficiently, see Theorem 4.33.
