Article
First-Order Conditions for Set-Constrained Optimization
Steven M. Rovnyak 1, *, Edwin K. P. Chong 2 and James Rovnyak 3

1 Department of Electrical and Computer Engineering, Indiana University-Purdue University, Indianapolis, IN 46202, USA
2 Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523, USA; [email protected]
3 Department of Mathematics, University of Virginia, Charlottesville, VA 22904, USA; [email protected]
* Correspondence: [email protected]

Abstract: A well-known first-order necessary condition for a point to be a local minimizer of a given
function is the non-negativity of the dot product of the gradient and a vector in a feasible direction.
This paper proposes a series of alternative first-order necessary conditions and corresponding first-
order sufficient conditions that seem not to appear in standard texts. The conditions assume a
nonzero gradient. The methods use extensions of the notions of gradient, differentiability, and
twice differentiability. Examples, including one involving the Karush–Kuhn–Tucker (KKT) theorem,
illustrate the scope of the conditions.

Keywords: constrained optimization; local minimizer; necessary condition; sufficient condition; KKT
theorem

MSC: 49K99; 90C46

Citation: Rovnyak, S.M.; Chong, E.K.P.; Rovnyak, J. First-Order Conditions for Set-Constrained Optimization. Mathematics 2023, 11, 4274. https://doi.org/10.3390/math11204274

Academic Editors: Chao Zhang, Yanfang Zhang, Yang Zhou and Qiang Ye

Received: 15 September 2023; Revised: 3 October 2023; Accepted: 10 October 2023; Published: 13 October 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Set-constrained and set-unconstrained optimization use several theorems that include a first-order necessary condition, as well as second-order conditions that are necessary and/or sufficient for a point to be a local minimizer [1,2]. These theorems require the objective function to be once or twice continuously differentiable.

The first-order necessary condition for unconstrained optimization requires the gradient to be zero at a minimizer [3–5]. The first-order necessary condition for set-constrained optimization requires the dot product of the gradient and a vector in a feasible direction to be non-negative. When constraints are defined in terms of differentiable functions, the first-order necessary condition takes the form of the first-order Lagrange and Karush–Kuhn–Tucker (KKT) conditions [2,6]. Reference [5] determined the solution to linear programming problems using a first-order necessary condition. Such conditions have been studied for control systems governed by ordinary differential equations [7], stochastic differential equations [8], and stochastic evolution equations [9].

Throughout, we assume a nonzero gradient. Our main results present a series of four sufficient conditions and four corresponding necessary conditions for a point to be a local minimizer of a given function f (Theorems 1–8). Each is of the first-order type, including those that assume twice differentiability. Theorems 1 and 2 describe the behavior of f in cones determined by the gradient. Theorems 3 and 4 replace the geometrical viewpoint by analytical conditions involving sequences. The analytical versions use generalizations of the notions of gradient and differentiability that simplify statements and proofs. Theorems 5 and 6 are refinements when f is twice differentiable. A previous version of this paper included a remark that the analytical conditions are unverifiable. However, the last two results, Theorems 7 and 8, which return to the geometrical view, are proved precisely by verifying the analytical conditions. They replace the original cones by larger regions that we call α-cones. An α-cone is an ordinary cone when α = 1 and a
paraboloid when α = 2. The results fail for half-planes, and a paraboloid is a limiting case
for what is possible. Example 5 illustrates a class of problems that do not meet criteria for a
strict local minimizer in the KKT theory but are covered by Theorem 7.
We remark that the cones used here are different from the cones of descent directions
in [2], which are actually half-planes. Our sufficient conditions do not guarantee that a
point with a nonzero gradient is a strict local minimizer on a half-plane. Example 1 gives a function with a nonzero gradient at a point that is not a strict local minimizer on a half-plane.
Convex optimization problems require the objective function to be convex, and convexity is essentially a second-order condition. The first-order conditions that we propose do not require the objective function to be convex. The requirement that a function be twice continuously differentiable is different from the condition of convexity, which constrains the values of the second derivatives.
Notation and terminology. Points of R^n are written as column vectors x = [x_1, . . . , x_n]^⊤ with norm ‖x‖ = (|x_1|^2 + · · · + |x_n|^2)^{1/2}. A subset of R^n is called a neighborhood of a point x* if it contains a disk ‖x − x*‖ < ε for some ε > 0, and a neighborhood of a set Ω ⊆ R^n if it is a neighborhood of every point in Ω. A point x* ∈ Ω is a strict local minimizer of a function f : Ω → R if there is an ε > 0 such that f(x) > f(x*) whenever x ∈ Ω and 0 < ‖x − x*‖ < ε, and a local minimizer if the condition holds with > replaced by ≥. Local maximizers and strict local maximizers are defined similarly by reversing the inequalities.

2. First-Order Conditions for Local Minimizers


The gradient of a real-valued function f defined on a subset Ω of R^n is defined by

\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}, \ \ldots, \ \frac{\partial f}{\partial x_n} \right]^\top,

whenever the partial derivatives exist. Gradients appear in a standard first-order necessary
condition for local minimizers. Consider a function f : Ω → R that is C1 on a neighborhood
of a set Ω ⊆ Rn and a point x∗ ∈ Ω. If x∗ is a local minimizer of f , then d> ∇ f (x∗ ) ≥ 0
for every vector d in a feasible direction, that is, a direction such that some straight line
segment with endpoint x∗ lies entirely within Ω [1] (Theorem 6.1). Hence, if d> ∇ f (x∗ ) < 0
for some feasible direction d at x∗ , the standard necessary condition implies that x∗ is not
a local minimizer. However, it may occur that x∗ has no feasible direction within Ω, and
then it is impossible for the standard necessary condition to give such information. For
example, feasible directions are impossible for any set Ω whose points have only rational
coordinates. For an elementary example, consider the objective function f (x) = − x1 on
the set
\Omega = \{ x \in \mathbb{R}^2 : 0 \le x_1 \le 1, \ x_1^3 \le x_2 \le x_1^2 \}.    (1)
Then, f attains a maximum value on Ω at x* = 0. The point x* admits no feasible direction within Ω (proof: any feasible direction must be along a line segment x2 = cx1, c > 0; then, for all sufficiently small positive x1, x1^3 ≤ cx1 ≤ x1^2, and hence x1^2 ≤ c ≤ x1, which is impossible), and thus the standard necessary condition yields no information.
Feasible directions play no role in our results, and instead all that is needed is that
∇ f (x∗ ) 6= 0. Theorem 2 (FONC 1) is a new first-order necessary condition for local mini-
mizers. It is proved using a corresponding new first-order sufficient condition, Theorem 1
(FOSC 1). Corollary 1 is a companion sufficiency condition for strict local maximizers.
In the example f (x) = − x1 on the set Ω defined by (1) and x∗ = 0, the gradient is
∇ f (0) = [−1, 0]> , and therefore:
(1) Theorem 2 is applicable and implies that 0 is not a strict local minimizer, because every
opposite cone | x2 | ≤ cx1 contains points arbitrarily near 0.
(2) Corollary 1 is applicable and implies that 0 is a strict local maximizer, because Ω is
entirely contained in the opposite cone | x2 | ≤ x1 .
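Both observations can be checked numerically. The following Python sketch (an illustration we add here, not code from the paper; it assumes NumPy) samples points of the set Ω from (1) near 0 and confirms that they lie in the opposite cone |x2| ≤ x1 and that f(x) = −x1 stays strictly below f(0) = 0, as Corollary 1 predicts.

    # Illustrative numerical check (a sketch, not from the paper) of the example
    # f(x) = -x1 on the set Omega in (1).
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_omega(num):
        # Points of Omega from (1): 0 <= x1 <= 1 and x1**3 <= x2 <= x1**2.
        x1 = rng.uniform(0.0, 1.0, num)
        x2 = rng.uniform(x1**3, x1**2)
        return np.column_stack([x1, x2])

    pts = sample_omega(10000)
    near_zero = pts[np.linalg.norm(pts, axis=1) < 1e-2]

    # Claim (2): Omega lies in the opposite cone |x2| <= x1, so Corollary 1 applies ...
    print(np.all(np.abs(near_zero[:, 1]) <= near_zero[:, 0]))
    # ... and 0 is a strict local maximizer of f(x) = -x1: f(x) < f(0) = 0 off the origin.
    print(np.all(-near_zero[:, 0] < 0))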

We need a few more preliminaries. A real-valued function f on a subset Ω of Rn is


said to be differentiable at x∗ if the domain of f contains a neighborhood of x∗ , all partial
derivatives of f exist at x∗ , and the function r (x) defined by

f(x) - f(x^*) = (x - x^*)^\top \nabla f(x^*) + r(x)    (2)

satisfies

\lim_{x \to x^*} \frac{r(x)}{\|x - x^*\|} = 0.    (3)
When n = 1, this is equivalent to the existence of a derivative at the point, but in
general the simple existence of partial derivatives does not imply differentiability. A
convenient sufficient condition for differentiability is that f is defined and C1 on some
neighborhood of x∗ [10] (Th. 9 on p. 113). Differentiability is equivalent to the existence of
a first-order Taylor approximation, as in [1] (Th. 5.5 on pp. 64–65) or [10] (Th. 2 on p. 160).
Any two nonzero vectors d1 , d2 in Rn determine an acute angle θ such that

\cos(\theta) = \frac{d_1^\top d_2}{\|d_1\| \, \|d_2\|}.

This notion is implicit in the definition of a cone in Rn .

Definition 1. If x*, d ∈ R^n, d ≠ 0, and 0 < δ < 1, the set consisting of x* together with all points x ≠ x* in R^n that satisfy

\frac{(x - x^*)^\top d}{\|x - x^*\| \, \|d\|} \ge \delta

is denoted by K_δ(x*, d) and called a cone with vertex x* and direction d. The opposite cone is defined by

-K_\delta(x^*, d) = K_\delta(x^*, -d).
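The defining inequality of a cone is straightforward to test. The helper below is a minimal Python sketch (the function name and structure are ours, purely illustrative) that decides membership in K_δ(x*, d).

    import numpy as np

    def in_cone(x, x_star, d, delta):
        # Membership test for the cone K_delta(x_star, d) of Definition 1.
        # Illustrative sketch; assumes d != 0 and 0 < delta < 1.
        x, x_star, d = map(np.asarray, (x, x_star, d))
        diff = x - x_star
        norm = np.linalg.norm(diff)
        if norm == 0.0:            # the vertex x_star itself belongs to the cone
            return True
        return diff @ d >= delta * norm * np.linalg.norm(d)

    # With d = [1, 0] and delta = 1/2: [1, 1] makes an angle of 45 degrees with d
    # (cosine about 0.71 >= 0.5), while [0, 1] is orthogonal to d.
    print(in_cone([1.0, 1.0], [0.0, 0.0], [1.0, 0.0], 0.5))   # True
    print(in_cone([0.0, 1.0], [0.0, 0.0], [1.0, 0.0], 0.5))   # False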

Our first result is a first-order sufficient condition (FOSC) for a local minimizer.

Theorem 1 (FOSC 1). Let f be a real-valued function that is defined and C1 on a neighborhood
of a point x* ∈ R^n such that d* = ∇f(x*) ≠ 0. Assume that Ω is a set in the domain of f
that contains x∗ and is contained in some cone Kδ (x∗ , d∗ ), δ ∈ (0, 1). Then, x∗ is a strict local
minimizer of f on Ω.

We note a consequence that will be useful in Theorem 2.

Corollary 1. For f as in Theorem 1, x∗ is a strict local maximizer of f on any set in the domain of
f that contains x∗ and is contained in −Kδ (x∗ , d∗ ) for some δ ∈ (0, 1).

The corollary follows by applying Theorem 1 with f replaced by − f .

Proof of Theorem 1. Let Ω be a set in the domain of f that contains x∗ and is contained in
Kδ (x∗ , d∗ ), δ ∈ (0, 1). Since f is C1 on a neighborhood of x∗ , f is differentiable at x∗ . Thus,

f(x) - f(x^*) = (x - x^*)^\top d^* + r(x),    (4)

where limx→x∗ r (x)/kx − x∗ k = 0. Therefore, we may choose η > 0 so that the punctured
disk 0 < kx − x∗ k < η is contained in the domain of f and

\frac{|r(x)|}{\|x - x^*\|} < \delta \|d^*\|    (5)

whenever 0 < ‖x − x*‖ < η. Suppose x ∈ Ω and 0 < ‖x − x*‖ < η. Then, x ∈ K_δ(x*, d*), so (x − x*)^⊤ d* ≥ δ‖x − x*‖ ‖d*‖. By (4),

f(x) - f(x^*) \ge \delta \|x - x^*\| \, \|d^*\| + r(x).    (6)

Since 0 < kx − x∗ k < η, by (5), |r (x)| < δkd∗ kkx − x∗ k, and hence by (6),

f (x) − f (x∗ ) > 0.

Therefore, x∗ is a strict local minimizer of f on Ω.

A corresponding first-order necessary condition (FONC) is deduced with the aid of


Corollary 1. We first introduce some useful terminology.

Definition 2. A point x* of Ω is called isolated if there exists an ε > 0 such that the punctured disk 0 < ‖x − x*‖ < ε contains no points of Ω, or, equivalently, there is no sequence {x_n}_{n=1}^∞ in Ω \ {x*} such that x_n → x*.

In the extreme case that a set consists of just one point, that point is isolated in the set
because the alternative would imply the existence of other points in the set. Isolated points
occur in our setting in the next result, which is illustrated in Figure 1.

Theorem 2 (FONC 1). Let f be a real-valued function that is defined and C1 on a neighborhood of
a point x* ∈ R^n such that d* = ∇f(x*) ≠ 0. If x* is a strict local minimizer of f on some subset
Ω of Rn , then x∗ is an isolated point of −Kδ (x∗ , d∗ ) ∩ Ω for every δ ∈ (0, 1).

Proof. Suppose x∗ is a strict local minimizer of f on some subset Ω of Rn . We argue by


contradiction to show that x∗ is an isolated point of −Kδ (x∗ , d∗ ) ∩ Ω for every δ ∈ (0, 1).
Assume that this conclusion is false for some δ ∈ (0, 1). Then, the cone Kδ (x∗ , −d∗ ) contains
a sequence x1 , x2 , x3 , ... in Ω \ {x∗ } that converges to x∗ . Since x∗ is a strict local minimizer
of f on Ω, there exists ε > 0 such that f (x) > f (x∗ ) whenever 0 < kx − x∗ k < ε and
x ∈ Ω. By the definition of convergence, kxn − x∗ k < ε for all sufficiently large n, and
hence f (xn ) > f (x∗ ) for all sufficiently large n. However, by Corollary 1, x∗ is a strict local
maximizer for f on Kδ(x*, −d*) ∩ Ω, so f(x_n) < f(x*) for all sufficiently large n, which is a
contradiction. The result follows.


Figure 1. This figure illustrates Theorem 2 (FONC 1). If x∗ is a strict local minimizer of f on some
subset Ω of Rn , then x∗ is an isolated point of −Kδ (x∗ , d∗ ) ∩ Ω for every δ ∈ (0, 1).

3. Analytical Versions of the Conditions and Generalized Differentiability


In this section, we present analytical versions of the conditions for local minimizers
given in Section 2. They are stated in a more general setting that uses extensions of the
notions of differentiability and gradient to arbitrary sets. The analytical versions are in
some ways more transparent and lead to generalizations of Theorems 1 and 2 in Section 5.

Definition 3 (Generalized Differentiability). Let Ω be a subset of Rn . We say that a function


f : Ω → R is differentiable at a point x∗ ∈ Ω if (1) x∗ is not an isolated point of Ω, and (2) there
is a vector g(x∗ ) in Rn such that the function r (x) defined by

f (x) − f (x∗ ) = (x − x∗ )> g(x∗ ) + r (x) (7)

satisfies

\lim_{x \to x^*} \frac{r(x)}{\|x - x^*\|} = 0,    (8)

or, equivalently, for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x*, the sequence {r_n}_{n=1}^∞ defined by

f(x_n) - f(x^*) = (x_n - x^*)^\top g(x^*) + r_n    (9)

satisfies

\lim_{n \to \infty} \frac{r_n}{\|x_n - x^*\|} = 0.    (10)
Any such vector g(x∗ ) is denoted by ∇ f (x∗ ) and called a gradient of f at x∗ .

By [10] (Th. 9 on p. 113), the condition (8) is automatically met if f has an extension to
a function f˜ which is C1 on a neighborhood of x∗ , and then we can choose g(x∗ ) = ∇ f˜(x∗ ).
In general, gradients are not unique, but for our purpose any choice will work.
For a fixed f and x∗ , the set of all gradients g(x∗ ) is a closed convex set. We shall not
need this fact and omit a proof.

Theorem 3 (FOSC 2). Let Ω be a subset of Rn , and let f : Ω → R be differentiable at some point
x* ∈ Ω with gradient ∇f(x*) ≠ 0. Assume that for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} with
xn → x∗ there is a δ > 0 such that

(xn − x∗ )> ∇ f (x∗ ) ≥ δkxn − x∗ k (11)

for all sufficiently large n. Then, x∗ is a strict local minimizer of f over Ω.

Proof. Argue by contradiction. If x∗ is not a strict local minimizer of f , there is a sequence


{xn }1∞ in Ω \ {x∗ } with xn → x∗ such that f (xn ) ≤ f (x∗ ) for all n. Define {rn }1∞ by (9).
Then, by (11),

f (xn ) − f (x∗ ) = (xn − x∗ )> ∇ f (x∗ ) + rn ≥ δkxn − x∗ k + rn (12)

for all sufficiently large n. Since f is differentiable at x∗ , rn /kxn − x∗ k → 0 as n → ∞


by (10). Dividing by kxn − x∗ k, we obtain

\frac{f(x_n) - f(x^*)}{\|x_n - x^*\|} \ge \delta + \frac{r_n}{\|x_n - x^*\|} > 0    (13)

for all sufficiently large n. The result that f (xn ) > f (x∗ ) for all sufficiently large n con-
tradicts our choice of the sequence {xn }1∞ to satisfy f (xn ) ≤ f (x∗ ) for all n. The theorem
follows.

The corresponding necessary condition is conveniently stated in contrapositive form.

Theorem 4 (FONC 2). Let Ω be a subset of R^n, and let f : Ω → R be differentiable at some point x* ∈ Ω with gradient ∇f(x*) ≠ 0. If there exist a sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x* and a number δ > 0 such that

(x_n - x^*)^\top \nabla f(x^*) \le -\delta \|x_n - x^*\|    (14)

for all sufficiently large n, then x* is not a local minimizer of f on Ω.



Equivalently, if x∗ is a local minimizer of f over Ω, then there exists no such sequence


{x_n}_{n=1}^∞ and number δ > 0.

Proof. Assume we are given such a sequence {xn }1∞ and δ > 0. Define {rn }1∞ by (9). Then,
by (14),
f (xn ) − f (x∗ ) = (xn − x∗ )> ∇ f (x∗ ) + rn ≤ −δkxn − x∗ k + rn (15)
for all sufficiently large n. Since f is differentiable at x∗ , rn /kxn − x∗ k → 0 as n → ∞
by (10). Thus, dividing (15) by kxn − x∗ k, we see that

\frac{f(x_n) - f(x^*)}{\|x_n - x^*\|} \le -\delta + \frac{r_n}{\|x_n - x^*\|} < 0    (16)

for all sufficiently large n. Hence, f(x_n) < f(x*) for all sufficiently large n. Since x_n → x*, x* is not a local minimizer.

Example 1. The inequality (11) in Theorem 3 (FOSC 2) cannot be weakened to

(xn − x∗ )> ∇ f (x∗ ) > 0. (17)

Choose x* = 0, Ω = {0} ∪ {x ∈ R^2 : x1 > 0}, and f(x) = x1 − x2^2, x ∈ Ω. Then, we can take ∇f(x*) = [1, 0]^⊤, and any sequence {x_n}_{n=1}^∞ in Ω \ {0} with x_n → 0 satisfies (17). However, 0 is not a local minimizer because f(0) = 0, and the sequence {x_n}_{n=1}^∞ in Ω \ {0} defined by x_n = [1/(2n^2), 1/n]^⊤, n ≥ 1, converges to 0 and satisfies f(x_n) < 0 for all n ≥ 1.
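A few lines of Python (ours, for illustration only) make the point of Example 1 concrete: along x_n = [1/(2n^2), 1/n]^⊤ the weakened inequality (17) holds, yet f(x_n) < f(0) = 0.

    import numpy as np

    grad0 = np.array([1.0, 0.0])                 # a gradient of f(x) = x1 - x2**2 at 0

    for n in (1, 10, 100, 1000):
        xn = np.array([1.0 / (2 * n**2), 1.0 / n])
        print(n, xn @ grad0 > 0,                 # inequality (17) holds ...
              xn[0] - xn[1]**2)                  # ... but f(x_n) = -1/(2 n^2) < 0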

Example 2. We cannot relax the inequality (14) in Theorem 4 (FONC 2) to

(xn − x∗ )> ∇ f (x∗ ) < 0. (18)

Choose x* = 0, Ω = {x ∈ R^2 : 2x1 ≥ −x2^2}, f(x) = x1 + x2^2, x ∈ Ω, and ∇f(x*) = [1, 0]^⊤. See Figure 2. The sequence x_n = [−1/(2n^2), 1/n]^⊤, n ≥ 1, belongs to Ω \ {0} and converges to 0. It satisfies (18) because

(x_n - x^*)^\top \nabla f(x^*) = \left[ -\frac{1}{2n^2}, \ \frac{1}{n} \right] \begin{bmatrix} 1 \\ 0 \end{bmatrix} = -\frac{1}{2n^2} < 0, \quad n \ge 1.

Nevertheless, f(0) = 0 is the minimum value of f attained on Ω. For, if x ∈ Ω \ {0} and x2 ≠ 0, then f(x) = x1 + x2^2 ≥ −(1/2)x2^2 + x2^2 > 0, and f(x) = x1 > 0 whenever x ∈ Ω \ {0} and x2 = 0.

Figure 2. Region Ω, point x∗ , and gradient ∇ f (x∗ ) for Example 2.
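A companion sketch for Example 2 (again ours, illustrative only): the sequence satisfies the relaxed inequality (18), yet sampling Ω near 0 shows f ≥ 0 there, which is why (14) needs the margin −δ‖x_n − x*‖.

    import numpy as np

    for n in (1, 10, 100):
        xn = np.array([-1.0 / (2 * n**2), 1.0 / n])
        print(n, xn @ np.array([1.0, 0.0]) < 0,   # (18) holds ...
              xn[0] + xn[1]**2 > 0)               # ... yet f(x_n) = 1/(2 n^2) > 0

    # Random points of Omega = {x : 2*x1 >= -x2**2} near 0 never take f below f(0) = 0.
    rng = np.random.default_rng(1)
    x2 = rng.uniform(-0.1, 0.1, 5000)
    x1 = rng.uniform(-x2**2 / 2, 0.1)
    print(np.all(x1 + x2**2 >= 0))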



4. Refinements Based on Twice Differentiability


The additional smoothness of f yields stronger results.

Definition 4 (Generalized Twice Differentiability). Let Ω be a subset of Rn . We say that a


function f : Ω → R is twice differentiable at a point x∗ ∈ Ω if (1) x∗ is not an isolated point
of Ω, and (2) there exist g(x*) ∈ R^n and H(x*) ∈ R^{n×n} such that the function r(x) defined by

f(x) - f(x^*) = (x - x^*)^\top g(x^*) + (x - x^*)^\top H(x^*)(x - x^*) + r(x)    (19)

satisfies

\lim_{x \to x^*} \frac{r(x)}{\|x - x^*\|^2} = 0,    (20)

or, equivalently, such that for any sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x*,

f(x_n) - f(x^*) = (x_n - x^*)^\top g(x^*) + (x_n - x^*)^\top H(x^*)(x_n - x^*) + r_n,    (21)

where

\lim_{n \to \infty} \frac{r_n}{\|x_n - x^*\|^2} = 0.    (22)
Any such vector g(x∗ ) is denoted ∇ f (x∗ ) and called a gradient of f at x∗ , and any such
matrix H(x∗ ) is called a Hessian of f at x∗ .

Twice differentiability implies differentiability according to Lemma 1 below. By Th.


3 on p. 160 of [10], f is twice differentiable at x∗ if it has an extension to a function f˜ that
is C2 on a neighborhood of x∗ . In this case, g(x∗ ) and H(x∗ ) can be chosen as the usual
gradient and Hessian of f˜.

Definition 5. Given sequences {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ of real numbers, we write (1) a_n = O(b_n) to mean that there exists M > 0 such that |a_n| ≤ M|b_n| for all sufficiently large n, and (2) a_n = o(b_n) if b_n ≠ 0 for all sufficiently large n and lim_{n→∞} a_n/b_n = 0.

Lemma 1. Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some point x∗ ∈ Ω


with gradient ∇ f (x∗ ). Then, for any sequence {xn }1∞ in Ω \ {x∗ } converging to x∗ ,

f (xn ) − f (x∗ ) = (xn − x∗ )> ∇ f (x∗ ) + O(kxn − x∗ k2 ). (23)

Proof. We start with (21), assuming some choice of Hessian H(x*). Let M = ‖H(x*)‖ be the matrix bound of H(x*). Then, by the triangle inequality and the Cauchy–Schwarz inequality,

\left| (x_n - x^*)^\top H(x^*)(x_n - x^*) + r_n \right| \le \left| (x_n - x^*)^\top H(x^*)(x_n - x^*) \right| + |r_n| \le M \|x_n - x^*\|^2 + \frac{|r_n|}{\|x_n - x^*\|^2} \, \|x_n - x^*\|^2.

By (22), |r_n|/‖x_n − x*‖^2 → 0, and hence |r_n|/‖x_n − x*‖^2 ≤ 1 for all sufficiently large n. Therefore |(x_n − x*)^⊤ H(x*)(x_n − x*) + r_n| ≤ (M + 1)‖x_n − x*‖^2 for all sufficiently large n.

Theorem 5 (FOSC 3). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with gradient ∇f(x*) ≠ 0. Assume that for every sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x*, there is a sequence {δ_n}_{n=1}^∞ of positive numbers such that ‖x_n − x*‖^2 = o(δ_n) and

(x_n - x^*)^\top \nabla f(x^*) \ge \delta_n    (24)

for all sufficiently large n. Then, x* is a strict local minimizer of f over Ω.

Proof. If the conclusion is not true, there exists a sequence {x_n}_{n=1}^∞ in Ω \ {x*} with x_n → x* such that f(x_n) ≤ f(x*) for all n. Then, by hypothesis, there is a sequence {δ_n}_{n=1}^∞ of positive numbers satisfying (24) such that ‖x_n − x*‖^2 = o(δ_n). By Lemma 1 and (24),

f(x_n) - f(x^*) = (x_n - x^*)^\top \nabla f(x^*) + O(\|x_n - x^*\|^2) \ge \delta_n + O(\|x_n - x^*\|^2)

for all sufficiently large n. Hence, since ‖x_n − x*‖^2 = o(δ_n),

\frac{f(x_n) - f(x^*)}{\delta_n} \ge 1 + \frac{O(\|x_n - x^*\|^2)}{\delta_n} > 0    (25)

for all sufficiently large n. Thus, f(x_n) > f(x*) for all sufficiently large n, contradicting our choice of the sequence {x_n}_{n=1}^∞. The result follows.

Theorem 6 (FONC 3). Let Ω be a subset of R^n, and let f : Ω → R be twice differentiable at some point x* ∈ Ω, with gradient ∇f(x*) ≠ 0. If there exist a sequence {x_n}_{n=1}^∞ in Ω \ {x*} converging to x* and a sequence {δ_n}_{n=1}^∞ of positive numbers such that ‖x_n − x*‖^2 = o(δ_n) and

(x_n - x^*)^\top \nabla f(x^*) \le -\delta_n    (26)

for all sufficiently large n, then x* is not a local minimizer of f on Ω.

Equivalently, if x∗ is a local minimizer of f on Ω, no such sequences {xn }1∞ and


{δ_n}_{n=1}^∞ can exist.

Proof. Let {x_n}_{n=1}^∞ and {δ_n}_{n=1}^∞ be sequences with the properties stated in the theorem. By Lemma 1 and (26),

f(x_n) - f(x^*) = (x_n - x^*)^\top \nabla f(x^*) + O(\|x_n - x^*\|^2) \le -\delta_n + O(\|x_n - x^*\|^2)    (27)

for all sufficiently large n. Hence, since ‖x_n − x*‖^2 = o(δ_n),

\frac{f(x_n) - f(x^*)}{\delta_n} \le -1 + \frac{O(\|x_n - x^*\|^2)}{\delta_n} < 0

for all sufficiently large n. Thus, f(x_n) < f(x*) for all sufficiently large n, and therefore x* is not a local minimizer.

Remark 1. If f, Ω, x*, ∇f(x*) ≠ 0 satisfy the conditions for a strict local minimizer in Theorem 3 (FOSC 2), they satisfy the conditions in Theorem 5 (FOSC 3) by choosing δ_n = δ‖x_n − x*‖, n ≥ 1. Similarly, if f, Ω, x*, ∇f(x*) ≠ 0 meet the conditions for a local non-minimizer in Theorem 4 (FONC 2), they meet the conditions in Theorem 6 (FONC 3). Examples 3 and 4 show that both converse statements fail.

Example 3. See Figure 3. Set f(x) = x1, x = [x1, x2]^⊤, on Ω = {x ∈ R^2 : x1 ≥ |x2|^{3/2}} and let x* = 0. Then, f is twice differentiable at x* with gradient ∇f(x*) = [1, 0]^⊤. The point x* is a strict local minimizer because f(0) = 0 and f(x) > 0 for every other x ∈ Ω. The condition for a strict local minimizer in Theorem 3 (FOSC 2) is not satisfied. For example, the sequence

x_n = \left[ \frac{1}{n}, \ \frac{1}{n^{2/3}} \right]^\top, \quad n \ge 1,

is in Ω \ {x*}, x_n → 0, and the inequality (11) implies 1/n ≥ δ(1/n^2 + 1/n^{4/3})^{1/2}, which is impossible. On the other hand, the condition in Theorem 5 (FOSC 3) is satisfied. Consider any sequence x_n = [x_{n,1}, x_{n,2}]^⊤, n ≥ 1, in Ω \ {x*} with x_n → x*. Choosing δ_n = x_{n,1}, we readily find that (x_n − x*)^⊤ ∇f(x*) ≥ δ_n, and ‖x_n − x*‖^2 = o(δ_n).

Figure 3. Region Ω, point x∗ , and gradient ∇ f (x∗ ) for Examples 3 and 5.
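A numerical sketch of Example 3 (our own, for illustration): along x_n = [1/n, 1/n^{2/3}]^⊤ the ratio in (11) decays like n^{−1/3}, so no fixed δ > 0 works in Theorem 3, while δ_n = x_{n,1} fulfills the requirements of Theorem 5.

    import numpy as np

    grad0 = np.array([1.0, 0.0])          # gradient of f(x) = x1 at x* = 0

    for n in (10, 100, 1000, 10000):
        xn = np.array([1.0 / n, 1.0 / n**(2.0 / 3.0)])
        norm = np.linalg.norm(xn)
        delta_n = xn[0]                   # the FOSC 3 choice delta_n = x_{n,1}
        print(n,
              (xn @ grad0) / norm,        # ratio in (11): tends to 0, so FOSC 2 fails
              xn @ grad0 >= delta_n,      # inequality (24) holds (with equality)
              norm**2 / delta_n)          # tends to 0, i.e. ||x_n||^2 = o(delta_n)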

Example 4. See Figure 4. Set f (x) = x1 , x = [ x1 , x2 ]> , on Ω = {x ∈ R2 : x1 ≥ −| x2 |3/2 } and


let x∗ = 0. Then, f is twice differentiable at x∗ with gradient ∇ f (x∗ ) = [1, 0]> . Since f (x) < 0
on the curve x1 = −|x2 |3/2 , x∗ is not a local minimizer of f on Ω. It is not possible to show this
using Theorem 4 (FONC 2), because it is impossible to find a sequence satisfying the conditions
there. However, the conditions of Theorem 6 (FONC 3) are met by choosing

x_n = \left[ -\frac{1}{n^{3/2}}, \ \frac{1}{n} \right]^\top \quad \text{and} \quad \delta_n = \frac{1}{n^{3/2}}

for all n ≥ 1.

Figure 4. Region Ω, point x∗ , and gradient ∇ f (x∗ ) for Example 4.

5. Applications of the Analytical Conditions


Example 3 suggests looking for generalizations of Theorems 1 and 2 to larger re-
gions. In this section, we show that such generalizations exist, again assuming twice
differentiability.

Definition 6. Let x*, d ∈ R^n, d ≠ 0, and α > 0 be given. For each β > 0, define K_{α,β}(x*, d) ⊆ R^n as the set consisting of the point x*, together with all points x ≠ x* such that (1) ‖u‖ ≥ β‖v‖^α and (2) u^⊤ d > 0, where

u = \frac{(x - x^*)^\top d}{\|d\|} \, \frac{d}{\|d\|}

is the projection of x − x* in the direction d, and

v = (x - x^*) - u

is a component of x − x* in the orthogonal direction. We call K_{α,β}(x*, d) an α-cone. The opposite α-cone is the set

-K_{\alpha,\beta}(x^*, d) = K_{\alpha,\beta}(x^*, -d).

See Figure 5. For α = 1, Kα,β (x∗ , d) ⊆ Rn is a cone as in Definition 1. For α = 2, it is a


paraboloid.

Figure 5. This figure illustrates Definition 6 when x∗ = 0 and d is in the positive x1 -direction. The
α-cone Kα,β (x∗ , d) is the region to the right of the curve together with the curve itself.
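Membership in an α-cone can be tested directly from Definition 6. The helper below is a small Python sketch (our illustration; the name in_alpha_cone is hypothetical).

    import numpy as np

    def in_alpha_cone(x, x_star, d, alpha, beta):
        # Membership test for K_{alpha,beta}(x_star, d) of Definition 6 (illustrative).
        x, x_star, d = map(np.asarray, (x, x_star, d))
        diff = x - x_star
        if not diff.any():                       # the vertex itself belongs to the set
            return True
        d_unit = d / np.linalg.norm(d)
        u = (diff @ d_unit) * d_unit             # projection of x - x_star onto d
        v = diff - u                             # orthogonal component
        return bool(u @ d > 0 and np.linalg.norm(u) >= beta * np.linalg.norm(v)**alpha)

    # With alpha = 3/2, beta = 1, and d = [1, 0], the set {x1 >= |x2|**1.5}
    # of Example 3 / Figure 3 is exactly this alpha-cone:
    print(in_alpha_cone([0.0010, 0.01], [0.0, 0.0], [1.0, 0.0], 1.5, 1.0))  # True
    print(in_alpha_cone([0.0005, 0.01], [0.0, 0.0], [1.0, 0.0], 1.5, 1.0))  # False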

Theorem 7 (FOSC 4). Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some
point x∗ ∈ Ω, with nonzero gradient d∗ = ∇ f (x∗ ). If Ω ⊆ Kα,β ( x ∗ , d∗ ) for some α ∈ [1, 2) and
β > 0, then x∗ is a strict local minimizer of f over Ω.

As in Theorem 1, we can apply Theorem 7 with f replaced by − f .

Corollary 2. Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some point


x∗ ∈ Ω, with nonzero gradient d∗ = ∇ f (x∗ ). If Ω ⊆ −Kα,β ( x ∗ , d∗ ) for some α ∈ [1, 2) and
β > 0, then x∗ is a strict local maximizer of f over Ω.

Remark 2. Theorem 7 fails for α = 2. For an example, let x∗ = 0, x = [ x1 , x2 ]> , and

Ω = {x ∈ R^2 : x1 ≥ x2^2}.

Define
f(x) = x1 − 2x2^2, x ∈ Ω.
Then, f is twice differentiable at x∗ , d∗ = ∇ f (x∗ ) = [1, 0]> , and Ω = K2,1 (x∗ , d∗ ) (see
Figure 5). If Theorem 7 were true for α = 2, then x∗ = 0 would be a strict local minimizer for f
on Ω. However, f assumes positive, negative, and zero values at points of Ω arbitrarily close to x∗ .
This contradicts Theorem 7, and therefore α = 2 cannot be allowed in the theorem.
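The failure in Remark 2 is easy to see numerically; the short sketch below (ours, illustrative only) evaluates f(x) = x1 − 2x2^2 at points of Ω = {x1 ≥ x2^2} arbitrarily close to 0 where it is negative, positive, and zero, so 0 cannot be a strict local minimizer.

    def f(x1, x2):
        return x1 - 2 * x2**2        # objective from Remark 2 on Omega = {x1 >= x2**2}

    for t in (0.1, 0.01, 0.001):
        print(f(t**2, t),            # boundary x1 = x2**2:    f = -t**2 < 0
              f(t, 0.0),             # axis x2 = 0:            f =  t    > 0
              f(2 * t**2, t))        # curve x1 = 2 * x2**2:   f =  0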

Proof of Theorem 7. We prove the theorem by verifying the condition for a strict local
minimizer in Theorem 5 (FOSC 3). Let {xn }1∞ be a sequence in Ω \ {x∗ } such that xn → x∗ .
We shall construct a sequence {δ_n}_{n=1}^∞ of positive numbers satisfying (24) for all sufficiently large n such that ‖x_n − x*‖^2 = o(δ_n). For all n ≥ 1, set

u_n = \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \, \frac{d^*}{\|d^*\|} \quad \text{and} \quad v_n = (x_n - x^*) - u_n.

Then, ‖u_n‖ ≥ β‖v_n‖^α and u_n^⊤ d* > 0 for all n by the definition of K_{α,β}(x*, d*). Since u_n and v_n are orthogonal,

\|x_n - x^*\|^2 = \|u_n\|^2 + \|v_n\|^2 \le \|u_n\|^2 + \left( \frac{1}{\beta} \right)^{2/\alpha} \|u_n\|^{2/\alpha}.

Since x_n → x*, u_n → 0, and hence ‖u_n‖ < 1 for all sufficiently large n, say n ≥ n_0. Since α ≥ 1, 2/α ≤ 2, and hence ‖u_n‖^{2/α} ≥ ‖u_n‖^2 for all n ≥ n_0 (because ‖u_n‖ < 1). Thus,

\|x_n - x^*\|^2 \le \|u_n\|^{2/\alpha} + \left( \frac{1}{\beta} \right)^{2/\alpha} \|u_n\|^{2/\alpha} = \gamma \|u_n\|^{2/\alpha}, \quad n \ge n_0,

where γ = 1 + (1/β)^{2/α}. We assume also that α < 2 and hence 2/α > 1, say 2/α = 1 + ε. Then,

\|x_n - x^*\|^2 \le \gamma \|u_n\|^{2/\alpha} = \gamma \left( \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \right)^{1+\varepsilon} = \gamma \, \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \left( \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \right)^{\varepsilon} \le \gamma \, \frac{(x_n - x^*)^\top d^*}{\|d^*\|} \, \|x_n - x^*\|^{\varepsilon},

by the Cauchy–Schwarz inequality and the fact that (x_n − x*)^⊤ d* = u_n^⊤ d* > 0. Therefore,

(x_n - x^*)^\top d^* \ge \frac{\|x_n - x^*\|^{2-\varepsilon} \, \|d^*\|}{\gamma}, \quad n \ge n_0.

Setting δ_n = ‖x_n − x*‖^{2−ε}‖d*‖/γ, we obtain (x_n − x*)^⊤ d* ≥ δ_n for all sufficiently large n and

\lim_{n \to \infty} \frac{\|x_n - x^*\|^2}{\delta_n} = \lim_{n \to \infty} \frac{\gamma \|x_n - x^*\|^{\varepsilon}}{\|d^*\|} = 0.
We have verified the requirements in Theorem 5, and therefore x∗ is a strict local
minimizer of f over Ω by that result.

Theorem 8 (FONC 4). Let Ω be a subset of Rn , and let f : Ω → R be twice differentiable at some
point x∗ ∈ Ω, with nonzero gradient d∗ = ∇ f (x∗ ). If x∗ is a strict local minimizer of f on Ω,
then x* is an isolated point of −K_{α,β}(x*, d*) ∩ Ω for every α ∈ [1, 2) and β > 0.

Proof. Assume that x* is a strict local minimizer of f on Ω, and, if possible, that x* is not an isolated point of −K_{α,β}(x*, d*) ∩ Ω for some α ∈ [1, 2) and β > 0. Then, there is a sequence {x_n}_{n=1}^∞ that belongs to both −K_{α,β}(x*, d*) and Ω \ {x*} such that x_n → x*. By Corollary 2, f(x_n) < f(x*) for all sufficiently large n, contradicting our assumption that x* is a strict local minimizer of f on Ω. The theorem follows.

Examples 5 and 6 are set in the context of the Karush–Kuhn–Tucker (KKT) theorem [2,6],
which allows constraint conditions to be expressed in terms of inequalities. We follow the
account in [1], in which the KKT theorem appears as Theorem 21.1, and Theorem 21.3 is
the corresponding second-order sufficient condition (SOSC). Theorems 9 and 10 below are
specializations of these results to the cases that concern us here. Theorems 21.1 and 21.3
in [1] allow additional Lagrange-type conditions that play no role in our examples.

Theorem 9 (KKT Theorem). Let f , g : Rn → R be given C1 functions. Assume that x∗ ∈ Rn is


a local minimizer for f subject to the condition g(x) ≤ 0, and that x∗ is a regular point for g in the
sense that ∇ g(x∗ ) 6= 0. Then, there is a real number µ∗ ≥ 0 such that
(1) µ∗ g(x∗ ) = 0;
(2) ∇ f (x∗ ) + µ∗ ∇ g(x∗ ) = 0.

The corresponding sufficient condition requires the stronger assumption that the given
functions f , g are C2 . The Hessians F, G for f , g are the n × n matrices of second-order
partials of f , g.

Theorem 10 (SOSC). Let f , g : Rn → R be given C2 functions. Assume that x∗ ∈ Rn satisfies


g(x∗ ) ≤ 0 and we can find a real number µ∗ ≥ 0 satisfying the following conditions:
(1) µ∗ g(x∗ ) = 0.
(2) ∇ f (x∗ ) + µ∗ ∇ g(x∗ ) = 0.
(3) If F, G are the Hessians of f , g and L(x∗ , µ∗ ) = F (x∗ ) + µ∗ G (x∗ ), then y> L(x∗ , µ∗ )y > 0
for all y ∈ R^n such that y ≠ 0 and ∇g(x*)^⊤ y = 0.
Then, x∗ is a strict local minimizer for f subject to the condition g(x) ≤ 0.

Example 5. Set x* = 0, f(x) = x1 − x1^2 − x2^2, and g(x) = |x2|^{3/2} − x1 for all x = [x1, x2]^⊤ in R^2. Then, f ∈ C^2 and g ∈ C^1 on R^2. The set Ω = {x ∈ R^2 : g(x) ≤ 0} is an α-cone with α = 3/2 in the direction ∇f(x*) = [1, 0]^⊤ (see Figure 3). Therefore, by Theorem 7 (FOSC 4), x* is a strict local minimizer for f subject to the constraint g(x) ≤ 0. However, this cannot be shown with Theorem 10 because g ∉ C^2, which is a hypothesis in Theorem 10. To see why this is a problem, consider the form L(x, µ*) = F(x) + µ*G(x) that appears in condition (3). At any point x = [x1, x2]^⊤ with x2 ≠ 0, the Hessian of g is given by

G(x) = \left[ \frac{\partial^2 g}{\partial x_i \partial x_j} \right]_{i,j=1}^{2} = \begin{bmatrix} 0 & 0 \\ 0 & \frac{3}{4} |x_2|^{-1/2} \end{bmatrix}.

The second partial ∂^2 g/∂x2^2 does not exist at any point on the line x2 = 0. Thus, G(x*) is undefined. Hence, the Lagrangian L(x*, µ*) = F(x*) + µ*G(x*) in condition (3) of Theorem 10 is undefined, and therefore Theorem 10 cannot be applied. We remark that this example is within the scope of Theorem 9 (KKT Theorem), and the conditions (1) and (2) there are satisfied with µ* = 1.
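The situation in Example 5 can also be checked numerically. The sketch below (our illustration, assuming NumPy) verifies the KKT conditions (1) and (2) at x* = 0 with µ* = 1, samples Ω = {g ≤ 0} near 0 to confirm that f > f(0) there, and shows that ∂^2 g/∂x2^2 = (3/4)|x2|^{−1/2} blows up as x2 → 0, so the Hessian G(x*) required by Theorem 10 does not exist.

    import numpy as np

    grad_f0 = np.array([1.0, 0.0])       # gradient of f(x) = x1 - x1**2 - x2**2 at 0
    grad_g0 = np.array([-1.0, 0.0])      # gradient of g(x) = |x2|**1.5 - x1 at 0
    mu = 1.0
    # KKT condition (1): g(0) = 0, so mu * g(0) = 0.  Condition (2):
    print(np.allclose(grad_f0 + mu * grad_g0, 0.0))          # True

    # f > f(0) = 0 on Omega \ {0} near the origin (strict local minimizer):
    rng = np.random.default_rng(2)
    x2 = rng.uniform(-0.05, 0.05, 5000)
    x1 = rng.uniform(np.abs(x2)**1.5, 0.05)                  # enforce g(x) <= 0
    print(np.all(x1 - x1**2 - x2**2 > 0))

    # The second derivative of g in x2 is unbounded near the x1-axis:
    for t in (1e-2, 1e-4, 1e-6):
        print(0.75 / np.sqrt(t))                             # (3/4)|x2|**(-1/2)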

Example 6. Set x* = 0, f(x) = x1 − x1^2 − x2^2, and g(x) = x2^2 − x1 for all x = [x1, x2]^⊤ in R^2. Then, f, g ∈ C^2 on R^2. In this example, x* is not a local minimizer of f subject to the constraint g(x) ≤ 0. For example, for x ≠ x* on the boundary x1 = x2^2 of the constraint set,

f(x) = x_2^2 - x_1^2 - x_2^2 = -x_1^2 < 0.

Might this example contradict Theorem 7 or Theorem 10? Fortunately, no, and it is instructive to see why. Theorem 7 is not applicable because the constraint set Ω = {x ∈ R^2 : g(x) ≤ 0} is an α-cone with α = 2, and it is shown in Remark 2 that Theorem 7 fails for α = 2. To see that Theorem 10 is also not applicable, let us check the required conditions (1)–(3):
(1) Since g(x*) = 0, µ*g(x*) = 0 for all µ* ≥ 0.
(2) For µ* = 1, ∇f(x*) + µ*∇g(x*) = [1, 0]^⊤ + µ*[−1, 0]^⊤ = 0.
(3) In our example,

L(x^*, \mu^*) = F(x^*) + \mu^* G(x^*) = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix}.

Therefore,

y^\top L(x^*, \mu^*) y = \begin{bmatrix} 0 & y_2 \end{bmatrix} \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ y_2 \end{bmatrix} = 0    (28)

for every y = [0, y2]^⊤ such that y2 ≠ 0, that is, for all y ≠ 0 such that ∇g(x*)^⊤ y = 0. In view of (28), the positive definiteness condition in (3) fails, and hence Theorem 10 cannot be applied to this example.
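The failed check of condition (3) in Example 6 is equally easy to reproduce. The sketch below (ours, an illustration assuming NumPy) forms L(x*, µ*) = F(x*) + µ*G(x*) and evaluates y^⊤ L(x*, µ*) y on the subspace ∇g(x*)^⊤ y = 0, where it is identically zero rather than positive.

    import numpy as np

    F = np.array([[-2.0, 0.0], [0.0, -2.0]])   # Hessian of f(x) = x1 - x1**2 - x2**2
    G = np.array([[0.0, 0.0], [0.0, 2.0]])     # Hessian of g(x) = x2**2 - x1
    mu = 1.0
    L = F + mu * G                             # L(x*, mu*) = [[-2, 0], [0, 0]]

    grad_g0 = np.array([-1.0, 0.0])
    # Vectors y with grad_g0 @ y = 0 are of the form y = [0, y2]; on them
    # y^T L y = 0, not > 0, so condition (3) of Theorem 10 fails.
    for y2 in (1.0, -3.0, 0.5):
        y = np.array([0.0, y2])
        print(grad_g0 @ y == 0.0, y @ L @ y)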

6. Conclusions
The first-order necessary conditions in this paper contribute to the literature on first-
order optimality conditions by introducing stronger results than those in the current
literature. The new first-order necessary conditions imply the standard first-order necessary
conditions, including those in [1,5]. We introduced first-order sufficient conditions that we
did not find elsewhere in the literature. Our explanation of why the new conditions are
stronger used examples that were two-dimensional. However, the method is applicable to
general n-dimensional problems including linear programming.
We proposed first-order sufficient conditions for set-constrained optimization that
do not require the objective function to be convex or the constraint equations to be differ-
entiable. Conditions that require the function to be convex are essentially second-order
conditions. Our conditions only require the gradient of the objective function to be nonzero
at a candidate minimizer, and they are essentially first-order conditions even when we
apply them to problems where the objective function is twice differentiable.
When the given function is continuously differentiable at x ∗ and the gradient is
nonzero, the simplest form of the sufficient condition says that there is a cone with a vertex
at x ∗ , and x ∗ is a strict local minimizer on the cone. This sufficient condition was employed
to prove a corresponding necessary condition that does not use feasible directions and
instead uses the topological notion of an isolated point in a set.
We introduced generalized differentiability and reformulated the first-order conditions
in terms of convergent sequences. The new differentiability does not require the objective
function to be defined on an open neighborhood of x ∗ . It only requires the function to be
defined on the constraint set.
We refined the first-order conditions for a minimizer to twice differentiable functions
in terms of α-cones. The sufficiency version says that a twice differentiable function with
a nonzero gradient has a strict local minimizer at the vertex of an α-cone whose axis is
the gradient direction. We presented a problem with an α-cone constraint set where the
new sufficiency condition shows that the candidate point is a strict local minimizer. This
problem satisfies the necessary condition of the KKT method but not the sufficient condition,
because the Hessian is undefined at the candidate minimizer.

Author Contributions: Conceptualization, S.M.R. and E.K.P.C.; methodology, S.M.R., E.K.P.C. and
J.R.; writing—original draft, S.M.R.; writing—review and editing, J.R. All authors have read and
agreed to the published version of the manuscript.
Funding: S.M. Rovnyak was supported in part by the National Science Foundation under grant
ECCS-1711521. E.K.P. Chong was supported in part by the National Science Foundation under grant
CCF-2006788.
Data Availability Statement: No new data were created or analyzed in this study. Data sharing is
not applicable to this article.
Acknowledgments: The authors thank Henry Rovnyak for help with the LaTeX document and
TikZ figures.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or
in the decision to publish the results.

References
1. Chong, E.K.P.; Zak, S.H. An Introduction to Optimization, 4th ed.; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2013.
2. Forst, W.; Hoffmann, D. Optimization—Theory and Practice; Springer Undergraduate Texts in Mathematics and Technology;
Springer: New York, NY, USA, 2010.
3. Sioshansi, R.; Conejo, A.J. Optimization in Engineering: Models and Algorithms; Springer Optimization and Its Applications;
Springer: Cham, Switzerland, 2017; Volume 120.
4. Butenko, S.; Pardalos, P.M. Numerical Methods and Optimization; Chapman & Hall/CRC Numerical Analysis and Scientific
Computing; CRC Press: Boca Raton, FL, USA, 2014.
5. Kochenderfer, M.J.; Wheeler, T.A. Algorithms for Optimization; MIT Press: Cambridge, MA, USA, 2019.
6. Luenberger, D.G. Optimization by Vector Space Methods; John Wiley & Sons, Inc.: New York, NY, USA, 1969.
7. Lewis, A.D. Maximum Principle. Online Lecture Notes. 2006. Available online: https://mast.queensu.ca/~andrew/teaching/pdf/maximum-principle.pdf (accessed on 9 October 2023).
8. Peng, S.G. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim. 1990, 28, 966–979.
[CrossRef]
9. Lu, Q. Second order necessary conditions for optimal control problems of stochastic evolution equations. In Proceedings of the
35th Chinese Control Conference (CCC), Chengdu, China, 27–29 July 2016.
10. Marsden, J.E.; Tromba, A.J. Vector Calculus, 6th ed.; W.H. Freeman & Company: New York, NY, USA, 2012.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
