

TMA947 / MMG621 — Nonlinear optimization

Lecture 6 — Optimality conditions


Emil Gustavsson, Zuzana Nedělková

November 11, 2016

[Minor revision: Axel Ringh - August, 2023]

Consider a constrained optimization problem of the form

min f (x), (1a)
subject to x ∈ S, (1b)

where f : Rn → R and S ⊂ Rn . We have already derived an optimality condition for the case
where S is convex and f ∈ C 1 , i.e.,

x∗ is a local minimum =⇒ x∗ is a stationary point

A stationary point was defined in several equivalent ways; one of the definitions was that if
x∗ ∈ S is a stationary point of f over S, then

−∇f (x∗ ) ∈ NS (x∗ ),


where NS (x∗ ) is the normal cone of S at x∗ , i.e.,

NS (x∗ ) := {p ∈ Rn | pT (y − x∗ ) ≤ 0, ∀y ∈ S}.

The optimality condition −∇f (x∗ ) ∈ NS (x∗ ) says that it should not be possible to move from x∗
in any direction allowed by S in which f decreases.
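
As a concrete illustration, here is a minimal numerical sketch of this condition, assuming a hypothetical instance: S = [0, 1]² and f (x) = (x1 − 2)² + (x2 − 0.5)², chosen because the normal cone of a box can be checked componentwise. Neither the set nor the function is part of the lecture.

import numpy as np

# Hypothetical instance: S = [0,1]^2 and f(x) = (x1-2)^2 + (x2-0.5)^2.
# For a box, p in N_S(x) can be checked componentwise: p_i <= 0 if x_i sits
# at the lower bound, p_i >= 0 at the upper bound, p_i = 0 strictly inside.
def in_normal_cone_box(p, x, lower, upper, tol=1e-9):
    for p_i, x_i, l_i, u_i in zip(p, x, lower, upper):
        if abs(x_i - l_i) < tol:      # at the lower bound
            if p_i > tol:
                return False
        elif abs(x_i - u_i) < tol:    # at the upper bound
            if p_i < -tol:
                return False
        elif abs(p_i) > tol:          # strictly inside: p_i must vanish
            return False
    return True

def grad_f(x):
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 0.5)])

x_star = np.array([1.0, 0.5])  # the minimizer of f over the box
print(in_normal_cone_box(-grad_f(x_star), x_star, [0.0, 0.0], [1.0, 1.0]))  # True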

This approach also allows us to develop optimality conditions for more general non-linearly con-
strained problems. We first need to formalize the notion of a "direction allowed by S", and then
require that these allowed directions contain no descent directions for f . Formulating
a good notion of "allowed direction" is possibly the most challenging part of this course!

1 Geometric optimality conditions

First we introduce the most natural definition of allowed directions.


Definition 1 (cone of feasible directions). Let S ⊂ Rn be a nonempty closed set. The cone of feasible
directions for S at x ∈ S is defined as

RS (x) := {p ∈ Rn | ∃δ > 0, x + αp ∈ S, ∀0 ≤ α ≤ δ}. (2)

Thus, RS (x) is nothing but the cone containing all feasible directions at x: a vector p ∈ RS (x)
if and only if the feasible set S contains a non-trivial part of the half-line x + αp, α ≥ 0. Unfortunately, this
cone is too small to develop optimality conditions for non-linearly constrained programs¹.
Example 1. Let S := {x ∈ R2 | x2 = x1²}. Then RS (x) = {0} for all x ∈ S (no nonzero direction is
feasible), because the feasible set is a curved line in R2 .
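
A quick numerical sanity check of this example (the tested direction and step lengths are arbitrary illustrative choices):

# S = {x in R^2 : x2 = x1^2}; test whether a direction p is feasible at x = 0
# by checking membership of x + alpha*p in S for ever smaller alpha > 0.
def feasible(x, tol=1e-12):
    return abs(x[1] - x[0] ** 2) < tol

x = (0.0, 0.0)
p = (1.0, 0.0)  # the tangent direction of the parabola at the origin
on_curve = [feasible((x[0] + a * p[0], x[1] + a * p[1])) for a in (1e-1, 1e-3, 1e-6)]
print(any(on_curve))  # False: no nonzero step along p stays in S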

We therefore consider a significantly more complicated, but larger and better-behaved, set to develop
optimality conditions.

Definition 2 (tangent cone). Let S ⊂ Rn be a nonempty closed set. The tangent cone for S at x ∈ S is
defined as


TS (x) := {p ∈ Rn | ∃{xk }k≥1 ⊂ S and {λk }k≥1 ⊂ (0, ∞) such that
limk→∞ xk = x and limk→∞ λk (xk − x) = p}. (3)

The above definition tells us that to check whether a vector p ∈ TS (x), we should check whether
there is a feasible sequence of points xk ∈ S approaching x such that p is tangential to the
sequence at x; such a tangent vector arises as the limit of {λk (xk − x)} for some
positive sequence {λk }. Seen this way, TS (x) consists precisely of all the possible directions in
which x can be asymptotically approached through S.

Example 2. Let again S := {x ∈ R2 | x2 = x1²}. Then, TS (0) = {p ∈ R2 | p2 = 0}.
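
To see why, one can exhibit the sequences required by Definition 2 explicitly; the following short verification (added here for illustration) treats the direction p = (1, 0)T :

xk := (1/k, 1/k²)T ∈ S and λk := k give limk→∞ xk = 0 and λk (xk − 0) = (1, 1/k)T → (1, 0)T .

Analogous sequences (rescaling λk and approaching along either branch of the parabola) give every p with p2 = 0, while p2 ≠ 0 is impossible: the second component of λk (xk − 0) equals λk x²k,1 = (λk xk,1 ) · xk,1 → p1 · 0 = 0.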

Example 3. Let S := {x ∈ R2 | −x1 ≤ 0, (x1 − 1)² + x2² ≤ 1}. Then, RS (0) = {p ∈ R2 | p1 > 0} ∪ {0}
and TS (0) = {p ∈ R2 | p1 ≥ 0}.

Example 4. Suppose that we have a smooth curve in S starting at x ∈ S, that is, a C 1 map
γ : [0, T ] → S for some T > 0 with γ(0) = x. Then γ ′ (0) ∈ TS (x), since the definition of the (one-sided) derivative is

γ ′ (0) = lim t→0+ (γ(t) − γ(0))/t. (4)

So if we fix any positive sequence tk → 0 and let xk := γ(tk ), λk := 1/tk , we have defined the sequences required
in the definition of TS (x).
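
The construction in Example 4 can also be observed numerically; a minimal sketch for the hypothetical curve γ(t) = (t, t²) inside the set S of Example 2:

import numpy as np

# gamma(t) = (t, t^2) is a C^1 curve in S = {x : x2 = x1^2} with gamma(0) = 0.
# With x_k = gamma(t_k) and lambda_k = 1/t_k, the scaled differences
# lambda_k * (x_k - x) should approach gamma'(0) = (1, 0).
def gamma(t):
    return np.array([t, t ** 2])

x = gamma(0.0)
for t_k in (1e-1, 1e-2, 1e-3):
    print((gamma(t_k) - x) / t_k)  # -> [1. 0.1], [1. 0.01], [1. 0.001]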

It remains to formulate a notion of descent directions for the objective function f ; fortunately, we
can use the same characterization as in the unconstrained case.
¹ It will, however, work perfectly for linear programs!



Definition 3 (cone of descent directions). F̊ (x) := {p ∈ Rn | ∇f (x)T p < 0}.

The above examples should then make the following theorem intuitively obvious.

Theorem 1 (geometric optimality conditions). Consider the problem (1), where f ∈ C 1 . Then

x∗ is a local minimum of f over S =⇒ F̊ (x∗ ) ∩ TS (x∗ ) = ∅. (5)

Proof. See Theorem 5.10 in the book.
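
A numerical illustration of (5) (an added sketch on a hypothetical instance): take f (x) = x1² + x2 over the parabola set S from Example 2. On S we have f = 2x1², so x∗ = 0 is the global minimum, and TS (0) = {p | p2 = 0} from Example 2.

import numpy as np

# f(x) = x1^2 + x2 restricted to the parabola equals 2*t^2, so x* = 0 is optimal.
grad_f = np.array([0.0, 1.0])  # gradient of f at x* = 0
tangent_dirs = [np.array([s, 0.0]) for s in (-1.0, -0.5, 0.5, 1.0)]  # from T_S(0)
# Theorem 1 predicts that no tangent direction is a descent direction:
print(any(grad_f @ p < 0 for p in tangent_dirs))  # False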


Example 5. If we return to our example with smooth curves: we showed that for any smooth curve γ
through S starting at x∗ , we have γ ′ (0) ∈ TS (x∗ ). Applied to this tangent vector, the geometric
optimality condition reduces to the statement that (d/dt)|t=0 f (γ(t)) = ∇f (x∗ )T γ ′ (0) ≥ 0.

2 From geometric to useful optimality conditions

Now we have developed an elegant optimality condition; however, there is no practical way to
compute TS (x) directly from its definition. One way to overcome this difficulty (leading to the
Fritz John conditions) is to replace the cone TS (x) by smaller cones.

Lemma 1. If the cone C(x) ⊆ TS (x) for all x ∈ S, then F̊ (x∗ ) ∩ C(x∗ ) = ∅ is a necessary optimality
condition.

Proof. Using the geometric optimality condition we have, for any locally optimal x∗ ∈ S,

F̊ (x∗ ) ∩ C(x∗ ) ⊆ F̊ (x∗ ) ∩ TS (x∗ ) = ∅.

By introducing smaller cones we get weaker optimality conditions than the geometric optimality
conditions!

Example 6. Let C(x) = RS (x) and consider again the example S := {x ∈ R2 | x2 = x1²}. Since
RS (x) contains no nonzero directions, the optimality condition F̊ (x) ∩ RS (x) = ∅ holds for every
feasible x ∈ S, which is a totally useless optimality condition.

The second way to overcome the difficulty of computing TS (x) is to introduce regularity con-
ditions, or constraint qualifications, which will allow us to actually compute the tangent cone TS (x)
by other means. This approach leads to the Karush-Kuhn-Tucker (KKT) conditions. The draw-
back of this approach is that, although the KKT conditions are as strong as the geometric
conditions, they are less general, i.e., they do not apply to irregular problems.

From now on we consider a problem of the form


min f (x), (6a)
subject to gi (x) ≤ 0, i = 1, . . . , m, (6b)

where f : Rn → R and gi : Rn → R, i = 1, . . . , m, are all C 1 ; the feasible set is thus S := {x ∈
Rn | gi (x) ≤ 0, i = 1, . . . , m}. This allows us to define additional cones related to TS (x). Let I(x)
denote the active set of constraints at x, that is,

I(x) := {i ∈ {1, . . . , m} | gi (x) = 0}. (7)
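
As a small computational aside (an illustrative sketch, not from the notes), the active set can be computed numerically, here for the constraints of Example 3 and an arbitrary tolerance:

import numpy as np

# Constraints of Example 3: g1(x) = -x1 <= 0 and g2(x) = (x1-1)^2 + x2^2 - 1 <= 0.
constraints = [
    lambda x: -x[0],
    lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2 - 1.0,
]

def active_set(x, gs, tol=1e-9):
    # indices i with g_i(x) = 0, up to a numerical tolerance
    return [i for i, g in enumerate(gs) if abs(g(x)) < tol]

print(active_set(np.array([0.0, 0.0]), constraints))  # [0, 1]: both active
print(active_set(np.array([1.0, 0.0]), constraints))  # []: strictly feasible point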



Definition 4 (gradient cones). We define the inner gradient cone G̊(x) as

G̊(x) := {p ∈ Rn | ∇gi (x)T p < 0, ∀i ∈ I(x)}, (8)

and the gradient cone G(x) as

G(x) := {p ∈ Rn | ∇gi (x)T p ≤ 0, ∀i ∈ I(x)}. (9)


Note that the inner gradient cone G̊(x) consists of all vectors p that are guaranteed to be
descent directions for every function defining an active constraint, while the gradient cone
G(x) consists of all directions that are guaranteed not to be ascent directions for the active
constraints.
Theorem 2 (Relations between cones). For the problem (6) it holds that


cl G̊(x) ⊆ cl RS (x) ⊆ TS (x) ⊆ G(x). (10)

Proof. See Proposition 5.4 and Lemma 5.12 in the book.
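
For Example 3 this chain can be made concrete (a verification added here for illustration): at x = 0 both constraints are active, with ∇g1 (0) = (−1, 0)T and ∇g2 (0) = (−2, 0)T , so G̊(0) = {p | p1 > 0} and G(0) = {p | p1 ≥ 0}; hence cl G̊(0) = TS (0) = G(0), and every inclusion in (10) holds with equality after closure. A small membership check in code:

import numpy as np

# Active-constraint gradients of Example 3 at x = 0.
grads = [np.array([-1.0, 0.0]), np.array([-2.0, 0.0])]

def in_inner_gradient_cone(p):  # inner gradient cone at 0: strict inequalities
    return all(g @ p < 0 for g in grads)

def in_gradient_cone(p):  # gradient cone at 0: non-strict inequalities
    return all(g @ p <= 0 for g in grads)

print(in_inner_gradient_cone(np.array([1.0, 3.0])))  # True:  p1 > 0
print(in_gradient_cone(np.array([0.0, 1.0])))        # True:  p1 = 0, boundary direction
print(in_inner_gradient_cone(np.array([0.0, 1.0])))  # False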

3 The Fritz John conditions

We obtain the Fritz John conditions when we replace the tangent cone TS (x) in the geometric
optimality condition by G̊(x):

x∗ is locally optimal in (6) =⇒ G̊(x∗ ) ∩ F̊ (x∗ ) = ∅. (11)

Therefore, the Fritz John conditions are weaker than the geometric optimality conditions.

This condition looks fairly abstract; however, it is possible to reformulate it into a more practical
condition. The above equation states, for a fixed x∗ , that a certain system of linear inequalities has
no solution.

Fortunately, Farkas' Lemma turns the inconsistency of one system of linear inequalities
into the consistency of another.
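
Concretely, the variant needed here is often stated as Gordan's theorem (a corollary of Farkas' Lemma; this remark is added for illustration): for a matrix A, exactly one of the two systems

Ap < 0 for some p ∈ Rn ,    and    AT µ = 0, µ ≥ 0, µ ≠ 0, for some µ,

has a solution. Applying this with the rows of A chosen as ∇f (x∗ )T and ∇gi (x∗ )T for i ∈ I(x∗ ), and setting µi = 0 for the inactive constraints, turns the empty-intersection statement (11) into the multiplier statement of Theorem 3 below.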

Theorem 3 (The Fritz John conditions). If x∗ ∈ S is a local minimum in (6), then there exist multipliers
µ0 ∈ R, µ ∈ Rm , such that
µ0 ∇f (x∗ ) + µ1 ∇g1 (x∗ ) + · · · + µm ∇gm (x∗ ) = 0, (12)
µi gi (x∗ ) = 0, i = 1, . . . , m, (13)
µ0 , µi ≥ 0, i = 1, . . . , m, (14)
(µ0 , µT )T ≠ 0. (15)

Proof. See Theorem 5.17 in the book.
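
A numerical check of (12)-(15) on a hypothetical instance (an added sketch): minimize f (x) = x1 over the feasible set of Example 3; the minimizer is x∗ = 0, where both constraints are active.

import numpy as np

# f(x) = x1, g1(x) = -x1, g2(x) = (x1-1)^2 + x2^2 - 1; minimizer x* = 0.
grad_f = np.array([1.0, 0.0])
grad_g = [np.array([-1.0, 0.0]), np.array([-2.0, 0.0])]  # gradients at x* = 0
mu0, mu = 3.0, np.array([1.0, 1.0])  # one admissible choice of multipliers

residual = mu0 * grad_f + sum(m * g for m, g in zip(mu, grad_g))
print(np.allclose(residual, 0.0))              # True: (12) holds
print(mu0 >= 0.0 and bool(np.all(mu >= 0.0)))  # True: (14) holds
# (13) holds because g1(x*) = g2(x*) = 0, and (15) holds because (mu0, mu) != 0.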

The main drawback of the Fritz John conditions is that they are too weak. The Fritz John system
contains a multiplier µ0 in front of the objective function term. If there is a solution to the Fritz John
system in which µ0 = 0, then the objective function plays no role in the system at all.
This insight gives us at least one reason to think about regularity conditions (constraint qualifica-
tions); these conditions guarantee that any solution of the Fritz John system must satisfy µ0 ≠ 0.
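
A classical illustration of this failure (added here; it is not part of the original notes) is the problem

min −x1 , subject to g1 (x) = x2 − (1 − x1 )³ ≤ 0, g2 (x) = −x2 ≤ 0,

with minimizer x∗ = (1, 0)T . There ∇f (x∗ ) = (−1, 0)T , ∇g1 (x∗ ) = (0, 1)T and ∇g2 (x∗ ) = (0, −1)T , so the first component of (12) forces µ0 = 0 (and then µ1 = µ2 > 0 solves the system): the Fritz John conditions hold, but only with the objective term removed, and no KKT multipliers exist at x∗ .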
