Lecture 6
We consider the optimization problem

minimize f(x), subject to x ∈ S,    (1)

where f : Rn → R and S ⊂ Rn. We have already derived an optimality condition for the case where S is convex and f ∈ C1. The stationary point was defined in several different ways; one of the definitions was that x∗ ∈ S is a stationary point of f over S if −∇f(x∗) ∈ NS(x∗), where NS(x∗) is the normal cone

NS(x∗) := {p ∈ Rn | pT(y − x∗) ≤ 0, ∀y ∈ S}.

The optimality condition −∇f(x∗) ∈ NS(x∗) says that it should not be possible to move from x∗ in a direction allowed by S such that f decreases.
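As a minimal illustration (not part of the notes' numbered examples), take the one-dimensional set S := {x ∈ R | x ≥ 0}. At the boundary point x∗ = 0,

NS(0) = {p ∈ R | p · (y − 0) ≤ 0, ∀y ≥ 0} = {p ∈ R | p ≤ 0},

so −f′(0) ∈ NS(0) amounts to f′(0) ≥ 0: f may not decrease in the only feasible direction. At an interior point x∗ > 0 we get NS(x∗) = {0}, and the condition reduces to the familiar f′(x∗) = 0.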
This approach also allows us to develop optimality conditions for more general nonlinearly constrained problems. We first need to formalize the notion of a "direction allowed by S", and then require that these allowed directions do not contain any descent directions for f. Formulating a good notion of "allowed direction" is possibly the most challenging part of this course!
Definition 1 (cone of feasible directions). Let S ⊂ Rn be a nonempty closed set. The cone of feasible directions for S at x ∈ S is defined as

RS(x) := {p ∈ Rn | ∃δ > 0 such that x + αp ∈ S, ∀α ∈ [0, δ]}.
Thus, RS(x) is nothing else but the cone containing all feasible directions at x. A vector p belongs to RS(x) if the feasible set S contains a non-trivial part of the half-line x + αp, α ≥ 0. Unfortunately, this cone is too small to develop optimality conditions for non-linearly constrained programs.
Example 1. Let S := {x ∈ R2 | x2 = x1²}. Then RS(x) = ∅ for all x ∈ S, because the feasible set is a curved line in R2: no non-trivial straight segment starting at x stays on the parabola.
We therefore consider a significantly more complicated, but bigger and better-behaved, set to develop optimality conditions.
Definition 2 (tangent cone). Let S ⊂ Rn be a nonempty closed set. The tangent cone for S at x ∈ S is
defined as
TS(x) := {p ∈ Rn | ∃{xk}_{k=1}^∞ ⊂ S, {λk}_{k=1}^∞ ⊂ (0, ∞), such that lim_{k→∞} xk = x and lim_{k→∞} λk(xk − x) = p}.    (3)
The above definition tells us that to check whether a vector p ∈ TS(x), we should check whether there is a feasible sequence of points xk ∈ S that approaches x such that p is tangential to the sequence {xk} at x; such a tangent vector is obtained as the limit of {λk(xk − x)} for some positive sequence {λk}. Seen this way, TS(x) consists precisely of all the possible directions in which x can be asymptotically approached through S.
Example 4. Suppose that we have a smooth curve in S starting at x ∈ S, that is, we have a C1 map γ : [0, T] → S for some T > 0. Then γ′(0) ∈ TS(x), since the definition of the (one-sided) derivative is
γ′(0) = lim_{t↓0} (γ(t) − γ(0)) / t.    (4)
So if we fix any sequence tk ↓ 0 and let xk := γ(tk), λk := 1/tk, then λk(xk − x) = (γ(tk) − γ(0))/tk → γ′(0), and we have defined exactly the sequences required in the definition of TS(x).
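To see what the tangent cone adds over the cone of feasible directions, consider again the parabola S := {x ∈ R2 | x2 = x1²} of Example 1 at a point x̄ = (x̄1, x̄1²). The curve γ(t) := (x̄1 + t, (x̄1 + t)²) is a C1 map into S with γ(0) = x̄, so Example 4 gives γ′(0) = (1, 2x̄1) ∈ TS(x̄); repeating the argument with γ(t) := (x̄1 − t, (x̄1 − t)²) gives −(1, 2x̄1) ∈ TS(x̄). Since every sequence on the parabola approaching x̄ has its difference quotients aligned with ±(1, 2x̄1), we obtain

TS(x̄) = {t(1, 2x̄1) | t ∈ R},

i.e., the tangent line of the parabola at x̄. So while RS(x̄) = ∅, the tangent cone recovers exactly the directions along the curve.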
Definition 3 (cone of descent directions). F°(x) := {p ∈ Rn | ∇f(x)T p < 0}.
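For instance, for f(x) := x1² + x2² at x = (1, 0) we have ∇f(x) = (2, 0)T, so ∇f(x)T p = 2p1 and F°(x) = {p ∈ R2 | p1 < 0}: an open half-plane of directions in which f initially decreases.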
The above examples should then make the following theorem intuitively obvious.
Theorem 1 (geometric optimality conditions). Consider the problem (1), where f ∈ C1. Then

x∗ is a local minimum of f over S =⇒ F°(x∗) ∩ TS(x∗) = ∅.    (5)
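As a quick check of (5) on a concrete instance: minimize f(x) := x2 over the parabola S := {x ∈ R2 | x2 = x1²}. Since x2 = x1² ≥ 0 on S, the minimum is attained at x∗ = (0, 0). There F°(x∗) = {p | p2 < 0} while, by the computation above, TS(x∗) = {t(1, 0) | t ∈ R}, so F°(x∗) ∩ TS(x∗) = ∅, as (5) requires. At any other point x̄ of the parabola, one of the tangent directions ±(1, 2x̄1) has negative second component, so F°(x̄) ∩ TS(x̄) ≠ ∅ and (5) correctly excludes x̄ from being a local minimum.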
Now we have developed an elegant optimality condition; however, there is no practical way to compute TS(x) directly from its definition. One way to overcome this difficulty (leading to the Fritz John conditions) is to replace the cone TS(x) by smaller cones.
Lemma 1. If the cone C(x) ⊆ TS(x) for all x ∈ S, then F°(x∗) ∩ C(x∗) = ∅ is a necessary optimality condition.
Proof. Using the geometric optimality condition we have for any locally optimal x∗ ∈ S,
F°(x∗) ∩ C(x∗) ⊆ F°(x∗) ∩ TS(x∗) = ∅.
By introducing smaller cones we get weaker optimality conditions than the geometric optimality
conditions!
Example 6. Let C(x) = RS(x) and consider again the example S := {x ∈ R2 | x2 = x1²}. Since RS(x) = ∅, the optimality condition F°(x) ∩ RS(x) = ∅ holds for any feasible x ∈ S, which is a totally useless optimality condition.
The second way to overcome the difficulty of computing TS(x) is to introduce regularity conditions, or constraint qualifications, which will allow us to actually compute the tangent cone TS(x) by other means. This approach leads to the Karush–Kuhn–Tucker (KKT) conditions. The drawback of this approach is that, although the KKT conditions are equally strong as the geometric conditions, they are less general, i.e., they do not apply to irregular problems.
Here we consider the inequality constrained problem

minimize f(x), subject to gi(x) ≤ 0, i = 1, . . . , m,    (6)

where f, gi ∈ C1, with feasible set S := {x ∈ Rn | gi(x) ≤ 0, i = 1, . . . , m} and active set I(x) := {i | gi(x) = 0}. For this problem we define the cones

G°(x) := {p ∈ Rn | ∇gi(x)T p < 0, ∀i ∈ I(x)},    (8)
G(x) := {p ∈ Rn | ∇gi(x)T p ≤ 0, ∀i ∈ I(x)}.    (9)

Note that the inner gradient cone G°(x) consists of all vectors p that can be guaranteed to be descent directions of all defining functions for the active constraints, while the gradient cone G(x) consists of all directions that can be guaranteed not to be ascent directions for the active constraints.
Theorem 2 (Relations between cones). For the problem (6) it holds that
cl G°(x) ⊆ cl RS(x) ⊆ TS(x) ⊆ G(x).    (10)
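The inclusions in (10) can be strict. As an illustration (a standard cusp example, not taken from the notes above), let m = 2 with

g1(x) := x2 − x1³,  g2(x) := −x2,

so that S = {x ∈ R2 | 0 ≤ x2 ≤ x1³}, which forces x1 ≥ 0. At x̄ = (0, 0) both constraints are active, and ∇g1(x̄) = (0, 1)T, ∇g2(x̄) = (0, −1)T. Hence

G°(x̄) = {p | p2 < 0, −p2 < 0} = ∅,
G(x̄) = {p | p2 = 0} (the whole x1-axis),

while TS(x̄) = {p | p1 ≥ 0, p2 = 0}, the nonnegative x1-axis. Thus cl G°(x̄) ⊊ TS(x̄) ⊊ G(x̄), so both the leftmost and the rightmost inclusion in (10) are strict here.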
We obtain the Fritz John conditions when we replace the tangent cone TS(x) in the geometric optimality condition by G°(x); this is legitimate by Lemma 1, since (10) gives G°(x) ⊆ TS(x). The resulting necessary condition is

x∗ is locally optimal in (6) =⇒ G°(x∗) ∩ F°(x∗) = ∅.    (11)
Therefore, the Fritz John conditions are weaker than the geometric optimality conditions.
This condition looks fairly abstract; however, it is possible to reformulate it into a more practical condition. The above equation states, for a fixed x, that a linear system of strict inequalities does not have a
solution. Fortunately, we have Farkas' Lemma for turning an inconsistent system of linear inequalities into a consistent alternative system.
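The theorem of the alternative needed here (often stated as Gordan's theorem; it can be derived from Farkas' Lemma) says: for any matrix A ∈ Rk×n, exactly one of the two systems

(I) Ap < 0, p ∈ Rn,
(II) ATµ = 0, µ ≥ 0, µ ≠ 0, µ ∈ Rk,

has a solution. Taking the rows of A to be ∇f(x∗)T and ∇gi(x∗)T for i ∈ I(x∗), condition (11) says that system (I) is inconsistent, and system (II) then delivers the multipliers in the following theorem; setting µi := 0 for all inactive constraints i ∉ I(x∗) also yields the complementarity condition (13).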
Theorem 3 (The Fritz John conditions). If x∗ ∈ S is a local minimum in (6), then there exist multipliers
µ0 ∈ R, µ ∈ Rm, such that

µ0 ∇f(x∗) + Σ_{i=1}^{m} µi ∇gi(x∗) = 0,    (12)
µi gi(x∗) = 0,  i = 1, . . . , m,    (13)
µ0, µi ≥ 0,  i = 1, . . . , m,    (14)
(µ0, µT)T ≠ 0.    (15)
The main drawback of the Fritz John conditions is that they are too weak. The Fritz John system contains a multiplier µ0 in front of the objective function term. If there is a solution to the Fritz John system in which the multiplier µ0 = 0, the objective function does not play any role in the system. This insight gives us at least one reason to think about regularity conditions (constraint qualifications); these conditions guarantee that any solution of the Fritz John system must satisfy µ0 ≠ 0.
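To see this weakness concretely, return to the cusp example given after Theorem 2 (again an illustration, not from the notes): minimize f(x) := x1 subject to g1(x) = x2 − x1³ ≤ 0 and g2(x) = −x2 ≤ 0. The feasible set forces x1 ≥ 0, so x∗ = (0, 0) is the global minimum. At x∗ we have ∇f(x∗) = (1, 0)T, ∇g1(x∗) = (0, 1)T and ∇g2(x∗) = (0, −1)T, so (12) reads

µ0 (1, 0)T + µ1 (0, 1)T + µ2 (0, −1)T = 0,

whose first component forces µ0 = 0; the system is then satisfied by any µ1 = µ2 > 0. Hence the Fritz John conditions hold at x∗, but only with µ0 = 0, and they carry no information about the objective f at this genuinely optimal point.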