Article
First-Order Conditions for Set-Constrained Optimization
Steven M. Rovnyak 1, *, Edwin K. P. Chong 2 and James Rovnyak 3

1 Department of Electrical and Computer Engineering, Indiana University-Purdue University, Indianapolis, IN 46202, USA
2 Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523, USA; [email protected]
3 Department of Mathematics, University of Virginia, Charlottesville, VA 22904, USA; [email protected]
* Correspondence: [email protected]

Abstract: A well-known first-order necessary condition for a point to be a local minimizer of a given
function is the non-negativity of the dot product of the gradient and a vector in a feasible direction.
This paper proposes a series of alternative first-order necessary conditions and corresponding first-
order sufficient conditions that seem not to appear in standard texts. The conditions assume a
nonzero gradient. The methods use extensions of the notions of gradient, differentiability, and
twice differentiability. Examples, including one involving the Karush–Kuhn–Tucker (KKT) theorem,
illustrate the scope of the conditions.

Keywords: constrained optimization; local minimizer; necessary condition; sufficient condition; KKT
theorem

MSC: 49K99; 90C46

Citation: Rovnyak, S.M.; Chong, E.K.P.; Rovnyak, J. First-Order Conditions for Set-Constrained Optimization. Mathematics 2023, 11, 4274. https://doi.org/10.3390/math11204274

Academic Editors: Chao Zhang, Yanfang Zhang, Yang Zhou and Qiang Ye

Received: 15 September 2023; Revised: 3 October 2023; Accepted: 10 October 2023; Published: 13 October 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Set-constrained and set-unconstrained optimization use several theorems that include a first-order necessary condition, as well as second-order conditions that are necessary and/or sufficient for a point to be a local minimizer [1,2]. These theorems require the objective function to be once or twice continuously differentiable.

The first-order necessary condition for unconstrained optimization requires the gradient to be zero at a minimizer [3–5]. The first-order necessary condition for set-constrained optimization requires the dot product of the gradient and a vector in a feasible direction to be non-negative. When constraints are defined in terms of differentiable functions, the first-order necessary condition takes the form of the first-order Lagrange and Karush–Kuhn–Tucker (KKT) conditions [2,6]. Reference [5] determined the solution to linear programming problems using a first-order necessary condition. Such conditions have been studied for control systems governed by ordinary differential equations [7], stochastic differential equations [8], and stochastic evolution equations [9].

Throughout, we assume a nonzero gradient. Our main results present a series of four sufficient conditions and four corresponding necessary conditions for a point to be a local minimizer of a given function f (Theorems 1–8). Each is of the first-order type, including those that assume twice differentiability. Theorems 1 and 2 describe the behavior of f in cones determined by the gradient. Theorems 3 and 4 replace the geometrical viewpoint by analytical conditions involving sequences. The analytical versions use generalizations of the notions of gradient and differentiability that simplify statements and proofs. Theorems 5 and 6 are refinements when f is twice differentiable. A previous version of this paper included a remark that the analytical conditions are unverifiable. However, the last two results, Theorems 7 and 8, which return to the geometrical view, are proved precisely by verifying the analytical conditions. They replace the original cones by larger regions that we call α-cones. An α-cone is an ordinary cone when α = 1 and a
paraboloid when α = 2. The results fail for half-planes, and a paraboloid is a limiting case
for what is possible. Example 5 illustrates a class of problems that do not meet criteria for a
strict local minimizer in the KKT theory but are covered by Theorem 7.
We remark that the cones used here are different from the cones of descent directions
in [2], which are actually half-planes. Our sufficient conditions do not guarantee that a
point with a nonzero gradient is a strict local minimizer on a half-plane. Example 1 gives a function with a nonzero gradient at a point that is not a strict local minimizer on a half-plane.
Convex optimization problems require the objective function to be convex, and convexity is essentially a second-order condition. The first-order conditions that we propose do not require the objective function to be convex. The requirement that a function be twice continuously differentiable is different from the condition of convexity, which constrains the values of the second derivatives.
Notation and terminology. Points of R^n are written as column vectors x = [x_1, . . . , x_n]^⊤ with norm ‖x‖ = (|x_1|^2 + · · · + |x_n|^2)^{1/2}. A subset of R^n is called a neighborhood of a point x* if it contains a disk ‖x − x*‖ < ε for some ε > 0, and a neighborhood of a set Ω ⊆ R^n if it is a neighborhood of every point in Ω. A point x* ∈ Ω is a strict local minimizer of a function f : Ω → R if there is an ε > 0 such that f(x) > f(x*) whenever x ∈ Ω and 0 < ‖x − x*‖ < ε, and a local minimizer if the condition holds with > replaced by ≥. Local maximizers and strict local maximizers are defined similarly by reversing the inequalities.

2. First-Order Conditions for Local Minimizers


The gradient of a real-valued function f defined on a subset Ω of R^n is defined by

\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}, \ \ldots, \ \frac{\partial f}{\partial x_n} \right]^\top,

whenever the partial derivatives exist. Gradients appear in a standard first-order necessary
condition for local minimizers. Consider a function f : Ω → R that is C1 on a neighborhood
of a set Ω ⊆ Rn and a point x∗ ∈ Ω. If x∗ is a local minimizer of f , then d> ∇ f (x∗ ) ≥ 0
for every vector d in a feasible direction, that is, a direction such that some straight line
segment with endpoint x∗ lies entirely within Ω [1] (Theorem 6.1). Hence, if d> ∇ f (x∗ ) < 0
for some feasible direction d at x∗ , the standard necessary condition implies that x∗ is not
a local minimizer. However, it may occur that x∗ has no feasible direction within Ω, and
then it is impossible for the standard necessary condition to give such information. For
example, feasible directions are impossible for any set Ω whose points have only rational
coordinates. For an elementary example, consider the objective function f (x) = − x1 on
the set
\Omega = \{ x \in \mathbb{R}^2 : 0 \le x_1 \le 1, \ x_1^3 \le x_2 \le x_1^2 \}.    (1)
Then, f attains a maximum value on Ω at x* = 0. The point x* admits no feasible direction within Ω (proof: any feasible direction must be along a line segment x2 = cx1, c > 0; then, for all sufficiently small positive x1, x1^3 ≤ cx1 ≤ x1^2, and hence x1^2 ≤ c ≤ x1, which is impossible), and thus the standard necessary condition yields no information.
Feasible directions play no role in our results, and instead all that is needed is that
∇ f (x∗ ) 6= 0. Theorem 2 (FONC 1) is a new first-order necessary condition for local mini-
mizers. It is proved using a corresponding new first-order sufficient condition, Theorem 1
(FOSC 1). Corollary 1 is a companion sufficiency condition for strict local maximizers.
In the example f (x) = − x1 on the set Ω defined by (1) and x∗ = 0, the gradient is
∇ f (0) = [−1, 0]> , and therefore:
(1) Theorem 2 is applicable and implies that 0 is not a strict local minimizer, because every
opposite cone | x2 | ≤ cx1 contains points arbitrarily near 0.
(2) Corollary 1 is applicable and implies that 0 is a strict local maximizer, because Ω is
entirely contained in the opposite cone | x2 | ≤ x1 .
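Both observations can be checked numerically. The following Python sketch (an illustration we add here, not code from the paper; it assumes NumPy) samples points of the set Ω from (1) near 0 and confirms that they lie in the opposite cone |x2| ≤ x1 and that f(x) = −x1 stays strictly below f(0) = 0, as Corollary 1 predicts.

    # Illustrative numerical check (a sketch, not from the paper) of the example
    # f(x) = -x1 on the set Omega in (1).
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_omega(num):
        # Points of Omega from (1): 0 <= x1 <= 1 and x1**3 <= x2 <= x1**2.
        x1 = rng.uniform(0.0, 1.0, num)
        x2 = rng.uniform(x1**3, x1**2)
        return np.column_stack([x1, x2])

    pts = sample_omega(10000)
    near_zero = pts[np.linalg.norm(pts, axis=1) < 1e-2]

    # Claim (2): Omega lies in the opposite cone |x2| <= x1, so Corollary 1 applies ...
    print(np.all(np.abs(near_zero[:, 1]) <= near_zero[:, 0]))
    # ... and 0 is a strict local maximizer of f(x) = -x1: f(x) < f(0) = 0 off the origin.
    print(np.all(-near_zero[:, 0] < 0))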

We need a few more preliminaries. A real-valued function f on a subset Ω of Rn is


said to be differentiable at x∗ if the domain of f contains a neighborhood of x∗ , all partial
derivatives of f exist at x∗ , and the function r (x) defined by

f(x) - f(x^*) = (x - x^*)^\top \nabla f(x^*) + r(x)    (2)

satisfies

\lim_{x \to x^*} \frac{r(x)}{\|x - x^*\|} = 0.    (3)
When n = 1, this is equivalent to the existence of a derivative at the point, but in
general the simple existence of partial derivatives does not imply differentiability. A
convenient sufficient condition for differentiability is that f is defined and C1 on some
neighborhood of x∗ [10] (Th. 9 on p. 113). Differentiability is equivalent to the existence of
a first-order Taylor approximation, as in [1] (Th. 5.5 on pp. 64–65) or [10] (Th. 2 on p. 160).
Any two nonzero vectors d1 , d2 in Rn determine an acute angle θ such that

\cos(\theta) = \frac{d_1^\top d_2}{\|d_1\| \, \|d_2\|}.

This notion is implicit in the definition of a cone in Rn .

Definition 1. If x*, d ∈ R^n, d ≠ 0, and 0 < δ < 1, the set consisting of x* together with all points x ≠ x* in R^n that satisfy

\frac{(x - x^*)^\top d}{\|x - x^*\| \, \|d\|} \ge \delta

is denoted by K_δ(x*, d) and called a cone with vertex x* and direction d. The opposite cone is defined by

-K_\delta(x^*, d) = K_\delta(x^*, -d).
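The defining inequality of a cone is straightforward to test. The helper below is a minimal Python sketch (the function name and structure are ours, purely illustrative) that decides membership in K_δ(x*, d).

    import numpy as np

    def in_cone(x, x_star, d, delta):
        # Membership test for the cone K_delta(x_star, d) of Definition 1.
        # Illustrative sketch; assumes d != 0 and 0 < delta < 1.
        x, x_star, d = map(np.asarray, (x, x_star, d))
        diff = x - x_star
        norm = np.linalg.norm(diff)
        if norm == 0.0:            # the vertex x_star itself belongs to the cone
            return True
        return diff @ d >= delta * norm * np.linalg.norm(d)

    # With d = [1, 0] and delta = 1/2: [1, 1] makes an angle of 45 degrees with d
    # (cosine about 0.71 >= 0.5), while [0, 1] is orthogonal to d.
    print(in_cone([1.0, 1.0], [0.0, 0.0], [1.0, 0.0], 0.5))   # True
    print(in_cone([0.0, 1.0], [0.0, 0.0], [1.0, 0.0], 0.5))   # False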

Our first result is a first-order sufficient condition (FOSC) for a local minimizer.

Theorem 1 (FOSC 1). Let f be a real-valued function that is defined and C1 on a neighborhood
of a point x* ∈ R^n such that d* = ∇f(x*) ≠ 0. Assume that Ω is a set in the domain of f
that contains x∗ and is contained in some cone Kδ (x∗ , d∗ ), δ ∈ (0, 1). Then, x∗ is a strict local
minimizer of f on Ω.

We note a consequence that will be useful in Theorem 2.

Corollary 1. For f as in Theorem 1, x∗ is a strict local maximizer of f on any set in the domain of
f that contains x∗ and is contained in −Kδ (x∗ , d∗ ) for some δ ∈ (0, 1).

The corollary follows by applying Theorem 1 with f replaced by − f .

Proof of Theorem 1. Let Ω be a set in the domain of f that contains x∗ and is contained in
Kδ (x∗ , d∗ ), δ ∈ (0, 1). Since f is C1 on a neighborhood of x∗ , f is differentiable at x∗ . Thus,

f(x) - f(x^*) = (x - x^*)^\top d^* + r(x),    (4)

where limx→x∗ r (x)/kx − x∗ k = 0. Therefore, we may choose η > 0 so that the punctured
disk 0 < kx − x∗ k < η is contained in the domain of f and

\frac{|r(x)|}{\|x - x^*\|} < \delta \|d^*\|    (5)

whenever 0 < ‖x − x*‖ < η. Suppose x ∈ Ω and 0 < ‖x − x*‖ < η. Then, x ∈ K_δ(x*, d*), so (x − x*)^⊤ d* ≥ δ‖x − x*‖ ‖d*‖. By (4),

f(x) - f(x^*) \ge \delta \|x - x^*\| \, \|d^*\| + r(x).    (6)

Since 0 < kx − x∗ k < η, by (5), |r (x)| < δkd∗ kkx − x∗ k, and hence by (6),

f (x) − f (x∗ ) > 0.

Therefore, x∗ is a strict local minimizer of f on Ω.

A corresponding first-order necessary condition (FONC) is deduced with the aid of


Corollary 1. We first introduce some useful terminology.

Definition 2. A point x* of Ω is called isolated if there exists an ε > 0 such that the punctured disk 0 < ‖x − x*‖ < ε contains no points of Ω, or, equivalently, there is no sequence {x_n}_{n=1}^∞ in Ω \ {x*} such that x_n → x*.

In the extreme case that a set consists of just one point, that point is isolated in the set
because the alternative would imply the existence of other points in the set. Isolated points
occur in our setting in the next result, which is illustrated in Figure 1.

Theorem 2 (FONC 1). Let f be a real-valued function that is defined and C1 on a neighborhood of
a point x* ∈ R^n such that d* = ∇f(x*) ≠ 0. If x* is a strict local minimizer of f on some subset
Ω of Rn , then x∗ is an isolated point of −Kδ (x∗ , d∗ ) ∩ Ω for every δ ∈ (0, 1).

Proof. Suppose x∗ is a strict local minimizer of f on some subset Ω of Rn . We argue by


contradiction to show that x∗ is an isolated point of −Kδ (x∗ , d∗ ) ∩ Ω for every δ ∈ (0, 1).
Assume that this conclusion is false for some δ ∈ (0, 1). Then, the cone Kδ (x∗ , −d∗ ) contains
a sequence x1 , x2 , x3 , ... in Ω \ {x∗ } that converges to x∗ . Since x∗ is a strict local minimizer
of f on Ω, there exists ε > 0 such that f (x) > f (x∗ ) whenever 0 < kx − x∗ k < ε and
x ∈ Ω. By the definition of convergence, kxn − x∗ k < ε for all sufficiently large n, and
hence f (xn ) > f (x∗ ) for all sufficiently large n. However, by Corollary 1, x∗ is a strict local
maximizer for f on Kδ(x*, −d*) ∩ Ω, so f(x_n) < f(x*) for all sufficiently large n, which is a
contradiction. The result follows.


Figure 1. This figure illustrates Theorem 2 (FONC 1). If x∗ is a strict local minimizer of f on some
subset Ω of Rn , then x∗ is an isolated point of −Kδ (x∗ , d∗ ) ∩ Ω for every δ ∈ (0, 1).

3. Analytical Versions of the Conditions and Generalized Differentiability


In this section, we present analytical versions of the conditions for local minimizers
given in Section 2. They are stated in a more general setting that uses extensions of the
notions of differentiability and gradient to arbitrary sets. The analytical versions are in
some ways more transparent and lead to generalizations of Theorems 1 and 2 in Section 5.

Definition 3 (Generalized Differentiability). Let Ω be a subset of Rn . We say that a function


f : Ω → R is differentiable at a point x∗ ∈ Ω if (1) x∗ is not an isolated point of Ω, and (2) there
is a vector g(x∗ ) in Rn such that the function r (x) defined by

f (x) − f (x∗ ) = (x − x∗ )> g(x∗ ) + r (x) (7)

satisfies

\lim_{x \to x^*} \frac{r(x)}{\|x - x^*\|} = 0,    (8)

or, equivalently, for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x*, the sequence {r_n}_{n=1}^∞ defined by

f(x_n) - f(x^*) = (x_n - x^*)^\top g(x^*) + r_n    (9)

satisfies

\lim_{n \to \infty} \frac{r_n}{\|x_n - x^*\|} = 0.    (10)
Any such vector g(x∗ ) is denoted by ∇ f (x∗ ) and called a gradient of f at x∗ .

By [10] (Th. 9 on p. 113), the condition (8) is automatically met if f has an extension to
a function f˜ which is C1 on a neighborhood of x∗ , and then we can choose g(x∗ ) = ∇ f˜(x∗ ).
In general, gradients are not unique, but for our purpose any choice will work.
For a fixed f and x∗ , the set of all gradients g(x∗ ) is a closed convex set. We shall not
need this fact and omit a proof.

Theorem 3 (FOSC 2). Let Ω be a subset of Rn , and let f : Ω → R be differentiable at some point
x* ∈ Ω with gradient ∇f(x*) ≠ 0. Assume that for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} with
xn → x∗ there is a δ > 0 such that

(xn − x∗ )> ∇ f (x∗ ) ≥ δkxn − x∗ k (11)

for all sufficiently large n. Then, x∗ is a strict local minimizer of f over Ω.

Proof. Argue by contradiction. If x∗ is not a strict local minimizer of f , there is a sequence


{xn }1∞ in Ω \ {x∗ } with xn → x∗ such that f (xn ) ≤ f (x∗ ) for all n. Define {rn }1∞ by (9).
Then, by (11),

f (xn ) − f (x∗ ) = (xn − x∗ )> ∇ f (x∗ ) + rn ≥ δkxn − x∗ k + rn (12)

for all sufficiently large n. Since f is differentiable at x∗ , rn /kxn − x∗ k → 0 as n → ∞


by (10). Dividing by kxn − x∗ k, we obtain

\frac{f(x_n) - f(x^*)}{\|x_n - x^*\|} \ge \delta + \frac{r_n}{\|x_n - x^*\|} > 0    (13)

for all sufficiently large n. The result that f (xn ) > f (x∗ ) for all sufficiently large n con-
tradicts our choice of the sequence {xn }1∞ to satisfy f (xn ) ≤ f (x∗ ) for all n. The theorem
follows.

The corresponding necessary condition is conveniently stated in contrapositive form.

Theorem 4 (FONC 2). Let Ω be a subset of R^n, and let f : Ω → R be differentiable at some point x* ∈ Ω with gradient ∇f(x*) ≠ 0. If there exist a sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x* and a number δ > 0 such that

(x_n - x^*)^\top \nabla f(x^*) \le -\delta \|x_n - x^*\|    (14)

for all sufficiently large n, then x* is not a local minimizer of f on Ω.



Equivalently, if x∗ is a local minimizer of f over Ω, then there exists no such sequence


{x_n}_{n=1}^∞ and number δ > 0.

Proof. Assume we are given such a sequence {xn }1∞ and δ > 0. Define {rn }1∞ by (9). Then,
by (14),
f (xn ) − f (x∗ ) = (xn − x∗ )> ∇ f (x∗ ) + rn ≤ −δkxn − x∗ k + rn (15)
for all sufficiently large n. Since f is differentiable at x∗ , rn /kxn − x∗ k → 0 as n → ∞
by (10). Thus, dividing (15) by kxn − x∗ k, we see that

\frac{f(x_n) - f(x^*)}{\|x_n - x^*\|} \le -\delta + \frac{r_n}{\|x_n - x^*\|} < 0    (16)

for all sufficiently large n. Hence, f(x_n) < f(x*) for all sufficiently large n. Since x_n → x*, x* is not a local minimizer.

Example 1. The inequality (11) in Theorem 3 (FOSC 2) cannot be weakened to

(xn − x∗ )> ∇ f (x∗ ) > 0. (17)

Choose x* = 0, Ω = {0} ∪ {x ∈ R^2 : x1 > 0}, and f(x) = x1 − x2^2, x ∈ Ω. Then, we can take ∇f(x*) = [1, 0]^⊤, and any sequence {x_n}_{n=1}^∞ in Ω \ {0} with x_n → 0 satisfies (17). However, 0 is not a local minimizer because f(0) = 0, and the sequence {x_n}_{n=1}^∞ in Ω \ {0} defined by x_n = [1/(2n^2), 1/n]^⊤, n ≥ 1, converges to 0 and satisfies f(x_n) < 0 for all n ≥ 1.
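A few lines of Python (ours, for illustration only) make the point of Example 1 concrete: along x_n = [1/(2n^2), 1/n]^⊤ the weakened inequality (17) holds, yet f(x_n) < f(0) = 0.

    import numpy as np

    grad0 = np.array([1.0, 0.0])                 # a gradient of f(x) = x1 - x2**2 at 0

    for n in (1, 10, 100, 1000):
        xn = np.array([1.0 / (2 * n**2), 1.0 / n])
        print(n, xn @ grad0 > 0,                 # inequality (17) holds ...
              xn[0] - xn[1]**2)                  # ... but f(x_n) = -1/(2 n^2) < 0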

Example 2. We cannot relax the inequality (14) in Theorem 4 (FONC 2) to

(xn − x∗ )> ∇ f (x∗ ) < 0. (18)

Choose x* = 0, Ω = {x ∈ R^2 : 2x1 ≥ −x2^2}, f(x) = x1 + x2^2, x ∈ Ω, and ∇f(x*) = [1, 0]^⊤. See Figure 2. The sequence x_n = [−1/(2n^2), 1/n]^⊤, n ≥ 1, belongs to Ω \ {0} and converges to 0. It satisfies (18) because

(x_n - x^*)^\top \nabla f(x^*) = \left[ -\frac{1}{2n^2}, \ \frac{1}{n} \right] \begin{bmatrix} 1 \\ 0 \end{bmatrix} = -\frac{1}{2n^2} < 0, \quad n \ge 1.

Nevertheless, f(0) = 0 is the minimum value of f attained on Ω. For, if x ∈ Ω \ {0} and x2 ≠ 0, then f(x) = x1 + x2^2 ≥ −(1/2)x2^2 + x2^2 > 0, and f(x) = x1 > 0 whenever x ∈ Ω \ {0} and x2 = 0.

Figure 2. Region Ω, point x∗ , and gradient ∇ f (x∗ ) for Example 2.
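A companion sketch for Example 2 (again ours, illustrative only): the sequence satisfies the relaxed inequality (18), yet sampling Ω near 0 shows f ≥ 0 there, which is why (14) needs the margin −δ‖x_n − x*‖.

    import numpy as np

    for n in (1, 10, 100):
        xn = np.array([-1.0 / (2 * n**2), 1.0 / n])
        print(n, xn @ np.array([1.0, 0.0]) < 0,   # (18) holds ...
              xn[0] + xn[1]**2 > 0)               # ... yet f(x_n) = 1/(2 n^2) > 0

    # Random points of Omega = {x : 2*x1 >= -x2**2} near 0 never take f below f(0) = 0.
    rng = np.random.default_rng(1)
    x2 = rng.uniform(-0.1, 0.1, 5000)
    x1 = rng.uniform(-x2**2 / 2, 0.1)
    print(np.all(x1 + x2**2 >= 0))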



4. Refinements Based on Twice Differentiability


The additional smoothness of f yields stronger results.

Definition 4 (Generalized Twice Differentiability). Let Ω be a subset of Rn . We say that a


function f : Ω → R is twice differentiable at a point x∗ ∈ Ω if (1) x∗ is not an isolated point
of Ω, and (2) there exist g(x*) ∈ R^n and H(x*) ∈ R^{n×n} such that the function r(x) defined by

f(x) - f(x^*) = (x - x^*)^\top g(x^*) + (x - x^*)^\top H(x^*)(x - x^*) + r(x)    (19)

satisfies

\lim_{x \to x^*} \frac{r(x)}{\|x - x^*\|^2} = 0,    (20)

or, equivalently, such that for any sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x*,

f(x_n) - f(x^*) = (x_n - x^*)^\top g(x^*) + (x_n - x^*)^\top H(x^*)(x_n - x^*) + r_n,    (21)

where

\lim_{n \to \infty} \frac{r_n}{\|x_n - x^*\|^2} = 0.    (22)
Any such vector g(x∗ ) is denoted ∇ f (x∗ ) and called a gradient of f at x∗ , and any such
matrix H(x∗ ) is called a Hessian of f at x∗ .

Twice differentiability implies differentiability according to Lemma 1 below. By Th.


3 on p. 160 of [10], f is twice differentiable at x∗ if it has an extension to a function f˜ that
is C2 on a neighborhood of x∗ . In this case, g(x∗ ) and H(x∗ ) can be chosen as the usual
gradient and Hessian of f˜.

Definition 5. Given sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ of real numbers, we write (1) a_n = O(b_n) to mean that there exists M > 0 such that |a_n| ≤ M|b_n| for all sufficiently large n, and (2) a_n = o(b_n) if b_n ≠ 0 for all sufficiently large n and lim_{n→∞} a_n/b_n = 0.

Lemma 1. Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some point x∗ ∈ Ω


with gradient ∇ f (x∗ ). Then, for any sequence {xn }1∞ in Ω \ {x∗ } converging to x∗ ,

f (xn ) − f (x∗ ) = (xn − x∗ )> ∇ f (x∗ ) + O(kxn − x∗ k2 ). (23)

Proof. We start with (21), assuming some choice of Hessian H(x*). Let M = ‖H(x*)‖ be the matrix bound of H(x*). Then, by the triangle inequality and the Cauchy–Schwarz inequality,

\left| (x_n - x^*)^\top H(x^*)(x_n - x^*) + r_n \right| \le \left| (x_n - x^*)^\top H(x^*)(x_n - x^*) \right| + |r_n| \le M \|x_n - x^*\|^2 + \frac{|r_n|}{\|x_n - x^*\|^2} \, \|x_n - x^*\|^2.

By (22), |r_n|/‖x_n − x*‖^2 → 0, and hence |r_n|/‖x_n − x*‖^2 ≤ 1 for all sufficiently large n. Therefore |(x_n − x*)^⊤ H(x*)(x_n − x*) + r_n| ≤ (M + 1)‖x_n − x*‖^2 for all sufficiently large n.

Theorem 5 (FOSC 3). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with gradient ∇f(x*) ≠ 0. Assume that for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x*, there is a sequence {δ_n}_{n=1}^∞ of positive numbers such that ‖x_n − x*‖^2 = o(δ_n) and

(x_n - x^*)^\top \nabla f(x^*) \ge \delta_n    (24)

for all sufficiently large n. Then, x* is a strict local minimizer of f over Ω.

Proof. If the conclusion is not true, there exists a sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x* such that f(x_n) ≤ f(x*) for all n. Then, by hypothesis, there is a sequence {δ_n}_{n=1}^∞ of positive numbers satisfying (24) such that ‖x_n − x*‖^2 = o(δ_n). By Lemma 1 and (24),

f(x_n) - f(x^*) = (x_n - x^*)^\top \nabla f(x^*) + O(\|x_n - x^*\|^2) \ge \delta_n + O(\|x_n - x^*\|^2)

for all sufficiently large n. Hence, since ‖x_n − x*‖^2 = o(δ_n),

\frac{f(x_n) - f(x^*)}{\delta_n} \ge 1 + \frac{O(\|x_n - x^*\|^2)}{\delta_n} > 0    (25)

for all sufficiently large n. Thus, f(x_n) > f(x*) for all sufficiently large n, contradicting our choice of the sequence {x_n}_{n=1}^∞. The result follows.

Theorem 6 (FONC 3). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with gradient ∇f(x*) ≠ 0. If there exist a sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x* and a sequence {δ_n}_{n=1}^∞ of positive numbers such that ‖x_n − x*‖^2 = o(δ_n) and

(x_n - x^*)^\top \nabla f(x^*) \le -\delta_n    (26)

for all sufficiently large n, then x* is not a local minimizer of f on Ω.

Equivalently, if x∗ is a local minimizer of f on Ω, no such sequences {xn }1∞ and


{δ_n}_{n=1}^∞ can exist.

Proof. Let {x_n}_{n=1}^∞ and {δ_n}_{n=1}^∞ be sequences with the properties stated in the theorem. By Lemma 1 and (26),

f(x_n) - f(x^*) = (x_n - x^*)^\top \nabla f(x^*) + O(\|x_n - x^*\|^2) \le -\delta_n + O(\|x_n - x^*\|^2)    (27)

for all sufficiently large n. Hence, since ‖x_n − x*‖^2 = o(δ_n),

\frac{f(x_n) - f(x^*)}{\delta_n} \le -1 + \frac{O(\|x_n - x^*\|^2)}{\delta_n} < 0

for all sufficiently large n. Thus, f(x_n) < f(x*) for all sufficiently large n, and therefore x* is not a local minimizer.

Remark 1. If f, Ω, x*, ∇f(x*) ≠ 0 satisfy the conditions for a strict local minimizer in Theorem 3 (FOSC 2), they satisfy the conditions in Theorem 5 (FOSC 3) by choosing δ_n = δ‖x_n − x*‖, n ≥ 1. Similarly, if f, Ω, x*, ∇f(x*) ≠ 0 meet the conditions for a local non-minimizer in Theorem 4 (FONC 2), they meet the conditions in Theorem 6 (FONC 3). Examples 3 and 4 show that both converse statements fail.

Example 3. See Figure 3. Set f(x) = x1, x = [x1, x2]^⊤, on Ω = {x ∈ R^2 : x1 ≥ |x2|^{3/2}} and let x* = 0. Then, f is twice differentiable at x* with gradient ∇f(x*) = [1, 0]^⊤. The point x* is a strict local minimizer because f(0) = 0 and f(x) > 0 for every other x ∈ Ω. The condition for a strict local minimizer in Theorem 3 (FOSC 2) is not satisfied. For example, the sequence

x_n = \left[ \frac{1}{n}, \ \frac{1}{n^{2/3}} \right]^\top, \quad n \ge 1,

is in Ω \ {x*}, x_n → 0, and the inequality (11) implies 1/n ≥ δ(1/n^2 + 1/n^{4/3})^{1/2}, which is impossible. On the other hand, the condition in Theorem 5 (FOSC 3) is satisfied. Consider any sequence x_n = [x_{n,1}, x_{n,2}]^⊤, n ≥ 1, in Ω \ {x*} with x_n → x*. Choosing δ_n = x_{n,1}, we readily find that (x_n − x*)^⊤ ∇f(x*) ≥ δ_n, and ‖x_n − x*‖^2 = o(δ_n).

Figure 3. Region Ω, point x∗ , and gradient ∇ f (x∗ ) for Examples 3 and 5.
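A numerical sketch of Example 3 (our own, for illustration): along x_n = [1/n, 1/n^{2/3}]^⊤ the ratio in (11) decays like n^{−1/3}, so no fixed δ > 0 works in Theorem 3, while δ_n = x_{n,1} fulfills the requirements of Theorem 5.

    import numpy as np

    grad0 = np.array([1.0, 0.0])          # gradient of f(x) = x1 at x* = 0

    for n in (10, 100, 1000, 10000):
        xn = np.array([1.0 / n, 1.0 / n**(2.0 / 3.0)])
        norm = np.linalg.norm(xn)
        delta_n = xn[0]                   # the FOSC 3 choice delta_n = x_{n,1}
        print(n,
              (xn @ grad0) / norm,        # ratio in (11): tends to 0, so FOSC 2 fails
              xn @ grad0 >= delta_n,      # inequality (24) holds (with equality)
              norm**2 / delta_n)          # tends to 0, i.e. ||x_n||^2 = o(delta_n)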

Example 4. See Figure 4. Set f (x) = x1 , x = [ x1 , x2 ]> , on Ω = {x ∈ R2 : x1 ≥ −| x2 |3/2 } and


let x∗ = 0. Then, f is twice differentiable at x∗ with gradient ∇ f (x∗ ) = [1, 0]> . Since f (x) < 0
on the curve x1 = −|x2 |3/2 , x∗ is not a local minimizer of f on Ω. It is not possible to show this
using Theorem 4 (FONC 2), because it is impossible to find a sequence satisfying the conditions
there. However, the conditions of Theorem 6 (FONC 3) are met by choosing

x_n = \left[ -\frac{1}{n^{3/2}}, \ \frac{1}{n} \right]^\top \quad \text{and} \quad \delta_n = \frac{1}{n^{3/2}}

for all n ≥ 1.

Figure 4. Region Ω, point x∗ , and gradient ∇ f (x∗ ) for Example 4.

5. Applications of the Analytical Conditions


Example 3 suggests looking for generalizations of Theorems 1 and 2 to larger re-
gions. In this section, we show that such generalizations exist, again assuming twice
differentiability.

Definition 6. Let x*, d ∈ R^n, d ≠ 0, and α > 0 be given. For each β > 0, define K_{α,β}(x*, d) ⊆ R^n as the set consisting of the point x*, together with all points x ≠ x* such that (1) ‖u‖ ≥ β‖v‖^α and (2) u^⊤ d > 0, where

u = \frac{(x - x^*)^\top d}{\|d\|} \, \frac{d}{\|d\|}

is the projection of x − x* in the direction d, and

v = (x - x^*) - u

is a component of x − x* in the orthogonal direction. We call K_{α,β}(x*, d) an α-cone. The opposite α-cone is the set

-K_{\alpha,\beta}(x^*, d) = K_{\alpha,\beta}(x^*, -d).

See Figure 5. For α = 1, Kα,β (x∗ , d) ⊆ Rn is a cone as in Definition 1. For α = 2, it is a


paraboloid.

Figure 5. This figure illustrates Definition 6 when x∗ = 0 and d is in the positive x1 -direction. The
α-cone Kα,β (x∗ , d) is the region to the right of the curve together with the curve itself.
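Membership in an α-cone can be tested directly from Definition 6. The helper below is a small Python sketch (our illustration; the name in_alpha_cone is hypothetical).

    import numpy as np

    def in_alpha_cone(x, x_star, d, alpha, beta):
        # Membership test for K_{alpha,beta}(x_star, d) of Definition 6 (illustrative).
        x, x_star, d = map(np.asarray, (x, x_star, d))
        diff = x - x_star
        if not diff.any():                       # the vertex itself belongs to the set
            return True
        d_unit = d / np.linalg.norm(d)
        u = (diff @ d_unit) * d_unit             # projection of x - x_star onto d
        v = diff - u                             # orthogonal component
        return bool(u @ d > 0 and np.linalg.norm(u) >= beta * np.linalg.norm(v)**alpha)

    # With alpha = 3/2, beta = 1, and d = [1, 0], the set {x1 >= |x2|**1.5}
    # of Example 3 / Figure 3 is exactly this alpha-cone:
    print(in_alpha_cone([0.0010, 0.01], [0.0, 0.0], [1.0, 0.0], 1.5, 1.0))  # True
    print(in_alpha_cone([0.0005, 0.01], [0.0, 0.0], [1.0, 0.0], 1.5, 1.0))  # False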

Theorem 7 (FOSC 4). Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some
point x∗ ∈ Ω, with nonzero gradient d∗ = ∇ f (x∗ ). If Ω ⊆ Kα,β ( x ∗ , d∗ ) for some α ∈ [1, 2) and
β > 0, then x∗ is a strict local minimizer of f over Ω.

As in Theorem 1, we can apply Theorem 7 with f replaced by − f .

Corollary 2. Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some point


x∗ ∈ Ω, with nonzero gradient d∗ = ∇ f (x∗ ). If Ω ⊆ −Kα,β ( x ∗ , d∗ ) for some α ∈ [1, 2) and
β > 0, then x∗ is a strict local maximizer of f over Ω.

Remark 2. Theorem 7 fails for α = 2. For an example, let x∗ = 0, x = [ x1 , x2 ]> , and

Ω = {x ∈ R^2 : x1 ≥ x2^2}.

Define
f(x) = x1 − 2x2^2, x ∈ Ω.
Then, f is twice differentiable at x∗ , d∗ = ∇ f (x∗ ) = [1, 0]> , and Ω = K2,1 (x∗ , d∗ ) (see
Figure 5). If Theorem 7 were true for α = 2, then x∗ = 0 would be a strict local minimizer for f
on Ω. However, f assumes positive, negative, and zero values at points of Ω arbitrarily close to x∗ .
This contradicts Theorem 7, and therefore α = 2 cannot be allowed in the theorem.
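The failure in Remark 2 is easy to see numerically; the short sketch below (ours, illustrative only) evaluates f(x) = x1 − 2x2^2 at points of Ω = {x1 ≥ x2^2} arbitrarily close to 0 where it is negative, positive, and zero, so 0 cannot be a strict local minimizer.

    def f(x1, x2):
        return x1 - 2 * x2**2        # objective from Remark 2 on Omega = {x1 >= x2**2}

    for t in (0.1, 0.01, 0.001):
        print(f(t**2, t),            # boundary x1 = x2**2:    f = -t**2 < 0
              f(t, 0.0),             # axis x2 = 0:            f =  t    > 0
              f(2 * t**2, t))        # curve x1 = 2 * x2**2:   f =  0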

Proof of Theorem 7. We prove the theorem by verifying the condition for a strict local
minimizer in Theorem 5 (FOSC 3). Let {xn }1∞ be a sequence in Ω \ {x∗ } such that xn → x∗ .
We shall construct a sequence {δ_n}_{n=1}^∞ of positive numbers satisfying (24) for all sufficiently large n such that ‖x_n − x*‖^2 = o(δ_n). For all n ≥ 1, set

u_n = \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \, \frac{d^*}{\|d^*\|} \quad \text{and} \quad v_n = (x_n - x^*) - u_n.

Then, ‖u_n‖ ≥ β‖v_n‖^α and u_n^⊤ d* > 0 for all n by the definition of K_{α,β}(x*, d*). Since u_n and v_n are orthogonal,

\|x_n - x^*\|^2 = \|u_n\|^2 + \|v_n\|^2 \le \|u_n\|^2 + \left( \frac{1}{\beta} \right)^{2/\alpha} \|u_n\|^{2/\alpha}.

Since x_n → x*, u_n → 0, and hence ‖u_n‖ < 1 for all sufficiently large n, say n ≥ n_0. Since α ≥ 1, 2/α ≤ 2, and hence ‖u_n‖^{2/α} ≥ ‖u_n‖^2 for all n ≥ n_0 (because ‖u_n‖ < 1). Thus,

\|x_n - x^*\|^2 \le \|u_n\|^{2/\alpha} + \left( \frac{1}{\beta} \right)^{2/\alpha} \|u_n\|^{2/\alpha} = \gamma \|u_n\|^{2/\alpha}, \quad n \ge n_0,

where γ = 1 + (1/β)^{2/α}. We assume also that α < 2 and hence 2/α > 1, say 2/α = 1 + ε. Then,

\|x_n - x^*\|^2 \le \gamma \|u_n\|^{2/\alpha} = \gamma \left( \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \right)^{1+\varepsilon} = \gamma \, \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \left( \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \right)^{\varepsilon} \le \gamma \, \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \, \|x_n - x^*\|^{\varepsilon},

by the Cauchy–Schwarz inequality and the fact that (x_n − x*)^⊤ d* = u_n^⊤ d* > 0. Therefore,

(x_n - x^*)^\top d^* \ge \frac{\|x_n - x^*\|^{2-\varepsilon} \, \|d^*\|}{\gamma}, \quad n \ge n_0.

Setting δ_n = ‖x_n − x*‖^{2−ε}‖d*‖/γ, we obtain (x_n − x*)^⊤ d* ≥ δ_n for all sufficiently large n and

\lim_{n \to \infty} \frac{\|x_n - x^*\|^2}{\delta_n} = \lim_{n \to \infty} \frac{\gamma \|x_n - x^*\|^{\varepsilon}}{\|d^*\|} = 0.
We have verified the requirements in Theorem 5, and therefore x∗ is a strict local
minimizer of f over Ω by that result.

Theorem 8 (FONC 4). Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some
point x∗ ∈ Ω, with nonzero gradient d∗ = ∇ f (x∗ ). If x∗ is a strict local minimizer of f on Ω,
then x* is an isolated point of −K_{α,β}(x*, d*) ∩ Ω for every α ∈ [1, 2) and β > 0.

Proof. Assume that x* is a strict local minimizer of f on Ω, and, if possible, that x* is not an isolated point of −K_{α,β}(x*, d*) ∩ Ω for some α ∈ [1, 2) and β > 0. Then, there is a sequence {x_n}_{n=1}^∞ that belongs to both −K_{α,β}(x*, d*) and Ω \ {x*} such that x_n → x*. By Corollary 2, f(x_n) < f(x*) for all sufficiently large n, contradicting our assumption that x* is a strict local minimizer of f on Ω. The theorem follows.

Examples 5 and 6 are set in the context of the Karush–Kuhn–Tucker (KKT) theorem [2,6],
which allows constraint conditions to be expressed in terms of inequalities. We follow the
account in [1], in which the KKT theorem appears as Theorem 21.1, and Theorem 21.3 is
the corresponding second-order sufficient condition (SOSC). Theorems 9 and 10 below are
specializations of these results to the cases that concern us here. Theorems 21.1 and 21.3
in [1] allow additional Lagrange-type conditions that play no role in our examples.

Theorem 9 (KKT Theorem). Let f , g : Rn → R be given C1 functions. Assume that x∗ ∈ Rn is


a local minimizer for f subject to the condition g(x) ≤ 0, and that x∗ is a regular point for g in the
sense that ∇ g(x∗ ) 6= 0. Then, there is a real number µ∗ ≥ 0 such that
(1) µ∗ g(x∗ ) = 0;
(2) ∇ f (x∗ ) + µ∗ ∇ g(x∗ ) = 0.

The corresponding sufficient condition requires the stronger assumption that the given
functions f , g are C2 . The Hessians F, G for f , g are the n × n matrices of second-order
partials of f , g.

Theorem 10 (SOSC). Let f , g : Rn → R be given C2 functions. Assume that x∗ ∈ Rn satisfies


g(x∗ ) ≤ 0 and we can find a real number µ∗ ≥ 0 satisfying the following conditions:
(1) µ∗ g(x∗ ) = 0.
(2) ∇ f (x∗ ) + µ∗ ∇ g(x∗ ) = 0.
(3) If F, G are the Hessians of f , g and L(x∗ , µ∗ ) = F (x∗ ) + µ∗ G (x∗ ), then y> L(x∗ , µ∗ )y > 0
for all y ∈ R^n such that y ≠ 0 and ∇g(x*)^⊤ y = 0.
Then, x∗ is a strict local minimizer for f subject to the condition g(x) ≤ 0.

Example 5. Set x* = 0, f(x) = x1 − x1^2 − x2^2, and g(x) = |x2|^{3/2} − x1 for all x = [x1, x2]^⊤ in R^2. Then, f ∈ C^2 and g ∈ C^1 on R^2. The set Ω = {x ∈ R^2 : g(x) ≤ 0} is an α-cone with α = 3/2 in the direction ∇f(x*) = [1, 0]^⊤ (see Figure 3). Therefore, by Theorem 7 (FOSC 4), x* is a strict local minimizer for f subject to the constraint g(x) ≤ 0. However, this cannot be shown with Theorem 10 because g ∉ C^2, which is a hypothesis in Theorem 10. To see why this is a problem, consider the form L(x, µ*) = F(x) + µ*G(x) that appears in condition (3). At any point x = [x1, x2]^⊤ with x2 ≠ 0, the Hessian of g is given by

G(x) = \left[ \frac{\partial^2 g}{\partial x_i \partial x_j} \right]_{i,j=1}^{2} = \begin{bmatrix} 0 & 0 \\ 0 & \frac{3}{4} |x_2|^{-1/2} \end{bmatrix}.

The second partial ∂^2 g/∂x2^2 does not exist at any point on the line x2 = 0. Thus, G(x*) is undefined. Hence, the Lagrangian L(x*, µ*) = F(x*) + µ*G(x*) in condition (3) of Theorem 10 is undefined, and therefore Theorem 10 cannot be applied. We remark that this example is within the scope of Theorem 9 (KKT Theorem), and the conditions (1) and (2) there are satisfied with µ* = 1.
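The situation in Example 5 can also be checked numerically. The sketch below (our illustration, assuming NumPy) verifies the KKT conditions (1) and (2) at x* = 0 with µ* = 1, samples Ω = {g ≤ 0} near 0 to confirm that f > f(0) there, and shows that ∂^2 g/∂x2^2 = (3/4)|x2|^{−1/2} blows up as x2 → 0, so the Hessian G(x*) required by Theorem 10 does not exist.

    import numpy as np

    grad_f0 = np.array([1.0, 0.0])       # gradient of f(x) = x1 - x1**2 - x2**2 at 0
    grad_g0 = np.array([-1.0, 0.0])      # gradient of g(x) = |x2|**1.5 - x1 at 0
    mu = 1.0
    # KKT condition (1): g(0) = 0, so mu * g(0) = 0.  Condition (2):
    print(np.allclose(grad_f0 + mu * grad_g0, 0.0))          # True

    # f > f(0) = 0 on Omega \ {0} near the origin (strict local minimizer):
    rng = np.random.default_rng(2)
    x2 = rng.uniform(-0.05, 0.05, 5000)
    x1 = rng.uniform(np.abs(x2)**1.5, 0.05)                  # enforce g(x) <= 0
    print(np.all(x1 - x1**2 - x2**2 > 0))

    # The second derivative of g in x2 is unbounded near the x1-axis:
    for t in (1e-2, 1e-4, 1e-6):
        print(0.75 / np.sqrt(t))                             # (3/4)|x2|**(-1/2)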

Example 6. Set x* = 0, f(x) = x1 − x1^2 − x2^2, and g(x) = x2^2 − x1 for all x = [x1, x2]^⊤ in R^2. Then, f, g ∈ C^2 on R^2. In this example, x* is not a local minimizer of f subject to the constraint g(x) ≤ 0. For example, for x ≠ x* on the boundary x1 = x2^2 of the constraint set,

f(x) = x_2^2 - x_1^2 - x_2^2 = -x_1^2 < 0.

Might this example contradict Theorem 7 or Theorem 10? Fortunately, no, and it is instructive to see why. Theorem 7 is not applicable because the constraint set Ω = {x ∈ R^2 : g(x) ≤ 0} is an α-cone with α = 2, and it is shown in Remark 2 that Theorem 7 fails for α = 2. To see that Theorem 10 is also not applicable, let us check the required conditions (1)–(3):
(1) Since g(x*) = 0, µ*g(x*) = 0 for all µ* ≥ 0.
(2) For µ* = 1, ∇f(x*) + µ*∇g(x*) = [1, 0]^⊤ + µ*[−1, 0]^⊤ = 0.
(3) In our example,

L(x^*, \mu^*) = F(x^*) + \mu^* G(x^*) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix}.

Therefore,

y^\top L(x^*, \mu^*) y = \begin{bmatrix} 0 & y_2 \end{bmatrix} \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ y_2 \end{bmatrix} = 0    (28)

for every y = [0, y2]^⊤ such that y2 ≠ 0, that is, for all y ≠ 0 such that ∇g(x*)^⊤ y = 0. In view of (28), the positive definiteness condition in (3) fails, and hence Theorem 10 cannot be applied to this example.
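The failed check of condition (3) in Example 6 is equally easy to reproduce. The sketch below (ours, an illustration assuming NumPy) forms L(x*, µ*) = F(x*) + µ*G(x*) and evaluates y^⊤ L(x*, µ*) y on the subspace ∇g(x*)^⊤ y = 0, where it is identically zero rather than positive.

    import numpy as np

    F = np.array([[-2.0, 0.0], [0.0, -2.0]])   # Hessian of f(x) = x1 - x1**2 - x2**2
    G = np.array([[0.0, 0.0], [0.0, 2.0]])     # Hessian of g(x) = x2**2 - x1
    mu = 1.0
    L = F + mu * G                             # L(x*, mu*) = [[-2, 0], [0, 0]]

    grad_g0 = np.array([-1.0, 0.0])
    # Vectors y with grad_g0 @ y = 0 are of the form y = [0, y2]; on them
    # y^T L y = 0, not > 0, so condition (3) of Theorem 10 fails.
    for y2 in (1.0, -3.0, 0.5):
        y = np.array([0.0, y2])
        print(grad_g0 @ y == 0.0, y @ L @ y)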

6. Conclusions
The first-order necessary conditions in this paper contribute to the literature on first-
order optimality conditions by introducing stronger results than those in the current
literature. The new first-order necessary conditions imply the standard first-order necessary
conditions, including those in [1,5]. We introduced first-order sufficient conditions that we
did not find elsewhere in the literature. Our explanation of why the new conditions are
stronger used examples that were two-dimensional. However, the method is applicable to
general n-dimensional problems including linear programming.
We proposed first-order sufficient conditions for set-constrained optimization that
do not require the objective function to be convex or the constraint equations to be differ-
entiable. Conditions that require the function to be convex are essentially second-order
conditions. Our conditions only require the gradient of the objective function to be nonzero
at a candidate minimizer, and they are essentially first-order conditions even when we
apply them to problems where the objective function is twice differentiable.
When the given function is continuously differentiable at x ∗ and the gradient is
nonzero, the simplest form of the sufficient condition says that there is a cone with a vertex
at x ∗ , and x ∗ is a strict local minimizer on the cone. This sufficient condition was employed
to prove a corresponding necessary condition that does not use feasible directions and
instead uses the topological notion of an isolated point in a set.
We introduced generalized differentiability and reformulated the first-order conditions
in terms of convergent sequences. The new differentiability does not require the objective
function to be defined on an open neighborhood of x ∗ . It only requires the function to be
defined on the constraint set.
We refined the first-order conditions for a minimizer to twice differentiable functions
in terms of α-cones. The sufficiency version says that a twice differentiable function with
a nonzero gradient has a strict local minimizer at the vertex of an α-cone whose axis is
the gradient direction. We presented a problem with an α-cone constraint set where the
new sufficiency condition shows that the candidate point is a strict local minimizer. This
problem satisfies the necessary condition of the KKT method but not the sufficient condition,
because the Hessian is undefined at the candidate minimizer.

Author Contributions: Conceptualization, S.M.R. and E.K.P.C.; methodology, S.M.R., E.K.P.C. and
J.R.; writing—original draft, S.M.R.; writing—review and editing, J.R. All authors have read and
agreed to the published version of the manuscript.
Funding: S.M. Rovnyak was supported in part by the National Science Foundation under grant
ECCS-1711521. E.K.P. Chong was supported in part by the National Science Foundation under grant
CCF-2006788.
Data Availability Statement: No new data were created or analyzed in this study. Data sharing is
not applicable to this article.
Acknowledgments: The authors thank Henry Rovnyak for help with the LaTeX document and
TikZ figures.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or
in the decision to publish the results.

References
1. Chong, E.K.P.; Zak, S.H. An Introduction to Optimization, 4th ed.; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2013.
2. Forst, W.; Hoffmann, D. Optimization—Theory and Practice; Springer Undergraduate Texts in Mathematics and Technology;
Springer: New York, NY, USA, 2010.
3. Sioshansi, R.; Conejo, A.J. Optimization in Engineering: Models and Algorithms; Springer Optimization and Its Applications;
Springer: Cham, Switzerland, 2017; Volume 120.
4. Butenko, S.; Pardalos, P.M. Numerical Methods and Optimization; Chapman & Hall/CRC Numerical Analysis and Scientific
Computing; CRC Press: Boca Raton, FL, USA, 2014.
5. Kochenderfer, M.J.; Wheeler, T.A. Algorithms for Optimization; MIT Press: Cambridge, MA, USA, 2019.
6. Luenberger, D.G. Optimization by Vector Space Methods; John Wiley & Sons, Inc.: New York, NY, USA, 1969.
7. Lewis, A.D. Maximum Principle. Online Lecture Notes. 2006. Available online: https://mast.queensu.ca/~andrew/teaching/pdf/maximum-principle.pdf (accessed on 9 October 2023).
8. Peng, S.G. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim. 1990, 28, 966–979.
[CrossRef]
9. Lu, Q. Second order necessary conditions for optimal control problems of stochastic evolution equations. In Proceedings of the
35th Chinese Control Conference (CCC), Chengdu, China, 27–29 July 2016.
10. Marsden, J.E.; Tromba, A.J. Vector Calculus, 6th ed.; W.H. Freeman & Company: New York, NY, USA, 2012.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
