
Subgradients

S. Boyd, J. Duchi, M. Pilanci, and L. Vandenberghe


Notes for EE364b, Stanford University, Spring 2021-22
April 13, 2022

1 Definition
We say a vector g ∈ Rn is a subgradient of f : Rn → R at x ∈ dom f if for all z ∈ dom f ,
f (z) ≥ f (x) + g T (z − x). (1)
If f is convex and differentiable, then its gradient at x is a subgradient. But a subgradient
can exist even when f is not differentiable at x, as illustrated in figure 1. The same example
shows that there can be more than one subgradient of a function f at a point x.
There are several ways to interpret a subgradient. A vector g is a subgradient of f at x
if the affine function (of z) f (x) + g T (z − x) is a global underestimator of f . Geometrically,
g is a subgradient of f at x if (g, −1) supports epi f at (x, f (x)), as illustrated in figure 2.
A function f is called subdifferentiable at x if there exists at least one subgradient at
x. The set of subgradients of f at the point x is called the subdifferential of f at x, and
is denoted ∂f (x). A function f is called subdifferentiable if it is subdifferentiable at all
x ∈ dom f .

Example. Absolute value. Consider f (z) = |z|. For x < 0 the subgradient is unique:
∂f (x) = {−1}. Similarly, for x > 0 we have ∂f (x) = {1}. At x = 0 the subdifferential
is defined by the inequality |z| ≥ gz for all z, which is satisfied if and only if g ∈ [−1, 1].
Therefore we have ∂f (0) = [−1, 1]. This is illustrated in figure 3.
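A quick numerical reading of this example (a sketch, not part of the notes): a scalar g is a subgradient of |·| at 0 exactly when inequality (1), here |z| ≥ gz, holds for every z.

    # Sketch: test whether a scalar g satisfies the subgradient inequality |z| >= g*z at x = 0.
    import numpy as np

    def is_subgradient_at_zero(g, zs=np.linspace(-5.0, 5.0, 1001)):
        return bool(np.all(np.abs(zs) >= g * zs - 1e-12))

    print(is_subgradient_at_zero(0.3))   # True:  0.3 lies in [-1, 1]
    print(is_subgradient_at_zero(1.2))   # False: 1.2 lies outside [-1, 1]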

2 Basic properties
The subdifferential ∂f (x) is always a closed convex set, even if f is not convex. This follows
from the fact that it is the intersection of an infinite set of halfspaces:
    ∂f(x) = ⋂_{z ∈ dom f} {g | f(z) ≥ f(x) + gᵀ(z − x)}.

In addition, if f is continuous at x, then the subdifferential ∂f(x) is bounded. Indeed, choose some ε > 0 such that f is bounded on the ball of radius ε around x, say −∞ < f_min ≤ f(y) ≤ f_max < ∞ for all y ∈ Rn such that ‖y − x‖2 ≤ ε.

Figure 1: At x1, the convex function f is differentiable, and g1 (which is the derivative of f at x1) is the unique subgradient at x1. At the point x2, f is not differentiable. At this point, f has many subgradients: two subgradients, g2 and g3, are shown.

Figure 2: A vector g ∈ Rn is a subgradient of f at x if and only if (g, −1) defines a supporting hyperplane to epi f at (x, f(x)).

Figure 3: The absolute value function (left), and its subdifferential ∂f(x) as a function of x (right).

If ∂f(x) were unbounded, there would be a sequence gn ∈ ∂f(x) such that ‖gn‖2 → ∞. Taking the sequence yn = x + ε gn/‖gn‖2, we would find that f(yn) ≥ f(x) + gnᵀ(yn − x) = f(x) + ε‖gn‖2 → ∞, which contradicts the boundedness of f(yn) established above.

2.1 Existence of subgradients


If f is convex and x ∈ int dom f, then ∂f(x) is nonempty and bounded. To establish that ∂f(x) ≠ ∅, we apply the supporting hyperplane theorem to the convex set epi f at the boundary point (x, f(x)), to conclude the existence of a ∈ Rn and b ∈ R, not both zero, such that

    (a, b)ᵀ((z, t) − (x, f(x))) = aᵀ(z − x) + b(t − f(x)) ≤ 0
for all (z, t) ∈ epi f . This implies b ≤ 0, and that

aT (z − x) + b(f (z) − f (x)) ≤ 0

for all z. If b ≠ 0, we can divide by b to obtain

f (z) ≥ f (x) − (a/b)T (z − x),

which shows that −a/b ∈ ∂f(x). Now we show that b ≠ 0, i.e., that the supporting hyperplane cannot be vertical. If b = 0, we conclude that aᵀ(z − x) ≤ 0 for all z ∈ dom f. Since a and b are not both zero, we have a ≠ 0, and this is impossible because x ∈ int dom f.
This discussion shows that a convex function has a subgradient at x if there is at least
one nonvertical supporting hyperplane to epi f at (x, f (x)). This is the case, for example, if
f is continuous. There are pathological convex functions which do not have subgradients at
some points, but we will assume in the sequel that all convex functions are subdifferentiable
(at every point in dom f ).

2.2 Subgradients of differentiable functions


If f is convex and differentiable at x, then ∂f (x) = {∇f (x)}, i.e., its gradient is its only
subgradient. Conversely, if f is convex and ∂f (x) = {g}, then f is differentiable at x and
g = ∇f (x).

2.3 The minimum of a nondifferentiable function


A point x⋆ is a minimizer of a function f (not necessarily convex) if and only if f is subdifferentiable at x⋆ and

    0 ∈ ∂f(x⋆),

i.e., g = 0 is a subgradient of f at x⋆. This follows directly from the definition: if x⋆ is a minimizer, then f(x) ≥ f(x⋆) = f(x⋆) + 0ᵀ(x − x⋆) for all x ∈ dom f, so 0 ∈ ∂f(x⋆). And clearly if f is subdifferentiable at x⋆ with 0 ∈ ∂f(x⋆), then f(x) ≥ f(x⋆) + 0ᵀ(x − x⋆) = f(x⋆) for all x.

While this simple characterization of optimality via the subdifferential holds for noncon-
vex functions, it is not particularly useful in that case, since we generally cannot find the
subdifferential of a nonconvex function.
The condition 0 ∈ ∂f(x⋆) reduces to ∇f(x⋆) = 0 when f is convex and differentiable at x⋆.
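As a small numerical sketch of the claim that this characterization does not require convexity (an illustration added here, not from the notes), take the nonconvex function f(x) = √|x|: the zero vector satisfies the subgradient inequality at x⋆ = 0, which certifies that 0 is a global minimizer.

    # Sketch: g = 0 is a subgradient of the nonconvex f(x) = sqrt(|x|) at 0,
    # i.e., f(z) >= f(0) + 0*(z - 0) for all z, so x* = 0 is a global minimizer.
    import numpy as np

    f = lambda z: np.sqrt(np.abs(z))
    zs = np.linspace(-10.0, 10.0, 2001)
    print(np.all(f(zs) >= f(0.0)))   # True: inequality (1) holds with g = 0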

2.4 Directional derivatives and subgradients


For convex functions f, the directional derivative of f at the point x ∈ Rn in the direction v is

    f′(x; v) ≜ lim_{t↓0} ( f(x + tv) − f(x) ) / t.
This quantity always exists for convex f , though it may be +∞ or −∞. To see the existence
of the limit, we use that the ratio (f (x + tv) − f (x))/t is non-decreasing in t. For 0 < t1 ≤ t2 ,
we have 0 ≤ t1 /t2 ≤ 1, and

    ( f(x + t1 v) − f(x) ) / t1 = ( f( (t1/t2)(x + t2 v) + (1 − t1/t2) x ) − f(x) ) / t1
                                ≤ ( (t1/t2) f(x + t2 v) + (1 − t1/t2) f(x) − f(x) ) / t1
                                = ( f(x + t2 v) − f(x) ) / t2,
so the limit in the definition of f 0 (x; v) exists.
The directional derivative f′(x; v) possesses several interesting properties as well. First, it is convex in v, and if f is finite in a neighborhood of x, then f′(x; v) is finite. Additionally,
f is differentiable at x if and only if for some g (which is ∇f (x)) and all v ∈ Rn we have
f 0 (x; v) = g T v, that is, if and only if f 0 (x; v) is a linear function of v.1 For general convex f ,
f 0 (x; v) is positively homogeneous in v, meaning that for α ≥ 0, we have f 0 (x; αv) = αf 0 (x; v)
(replace t by t/α in the defining limit).
The directional derivative f 0 (x; v) satisfies the following general formula for convex f :

    f′(x; v) = sup_{g ∈ ∂f(x)} gᵀv.     (2)

To see this, first note that f′(x; v) ≥ sup_{g ∈ ∂f(x)} gᵀv by the definition of a subgradient: f(x + tv) − f(x) ≥ t gᵀv for any t > 0 and g ∈ ∂f(x), so f′(x; v) ≥ sup_{g ∈ ∂f(x)} gᵀv. For the other direction, we claim that all affine functions that are below the function v ↦ f′(x; v) may be taken to be linear. Specifically, suppose that (g, r) ∈ Rn × R and gᵀv − r ≤ f′(x; v) for all v. Then r ≥ 0, as taking v = 0 gives −r ≤ f′(x; 0) = 0. By the positive homogeneity of f′(x; v), we see that for any t ≥ 0 we have t gᵀv − r ≤ f′(x; tv) = t f′(x; v), and thus we have

    gᵀv − r/t ≤ f′(x; v)   for all t > 0.
¹ This is simply the standard definition of differentiability.

Figure 4: The point x⋆ minimizes f over X (the shown level curves) if and only if for some g ∈ ∂f(x⋆), gᵀ(y − x⋆) ≥ 0 for all y ∈ X. Note that not all subgradients satisfy this inequality.

Taking t → +∞ gives that any affine minorizer of f 0 (x; v) may be taken to be linear. As any
(closed) convex function can be written as the supremum of its affine minorants, we have

    f′(x; v) = sup{ gᵀv | gᵀ∆ ≤ f′(x; ∆) for all ∆ ∈ Rn }.

On the other hand, if g T ∆ ≤ f 0 (x; ∆) for all ∆ ∈ Rn , then we have g T ∆ ≤ f (x+∆)−f (x), so
that g ∈ ∂f (x), and we may as well have taken the preceding supremum only over g ∈ ∂f (x).
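As a small numerical illustration of formula (2) (a sketch, not part of the notes), take f(x) = |x| at x = 0, where ∂f(0) = [−1, 1]: the one-sided difference quotient matches sup_{g∈[−1,1]} gv = |v|.

    # Sketch: compare a one-sided difference quotient of f(x) = |x| at x = 0
    # with the support function of the subdifferential [-1, 1], as in (2).
    import numpy as np

    f, x, t = np.abs, 0.0, 1e-8
    for v in [2.0, -0.7, 0.3]:
        dir_deriv = (f(x + t * v) - f(x)) / t        # approximates f'(x; v)
        support = max(g * v for g in (-1.0, 1.0))    # sup over [-1, 1] is attained at an endpoint
        print(round(dir_deriv, 6), round(support, 6))   # the two columns agree (= |v|)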

2.5 Constrained minimizers of nondifferentiable functions


There is a somewhat more complex version, for constrained minimization, of the result that 0 ∈ ∂f(x) if and only if x minimizes f. Consider finding the minimizer of a subdifferentiable function f over a (closed) convex set X. Then x⋆ minimizes f over X if and only if there exists a subgradient g ∈ ∂f(x⋆) such that

    gᵀ(y − x⋆) ≥ 0   for all y ∈ X.

See Fig. 4 for an illustration of this condition.


To see this result, first suppose that g ∈ ∂f(x⋆) satisfies the preceding condition. Then by definition, f(x) ≥ f(x⋆) + gᵀ(x − x⋆) ≥ f(x⋆) for x ∈ X. The converse is more subtle, and we show it under the assumption that x⋆ ∈ int dom f, though x⋆ may be on the boundary of
X. We suppose that f(x) ≥ f(x⋆) for all x ∈ X. In this case, for any x ∈ X, the directional derivative

    f′(x⋆; x − x⋆) = lim_{t↓0} ( f(x⋆ + t(x − x⋆)) − f(x⋆) ) / t ≥ 0,

that is, for any x ∈ X, the direction ∆ = x − x⋆ pointing into X satisfies f′(x⋆; ∆) ≥ 0. By our characterization of the directional derivative earlier, we know that f′(x⋆; ∆) = sup_{g ∈ ∂f(x⋆)} gᵀ∆ ≥ 0. Thus, defining the ball Bε = {x⋆ + y ∈ Rn | ‖y‖2 ≤ ε}, we have

    inf_{x ∈ X ∩ Bε}  sup_{g ∈ ∂f(x⋆)}  gᵀ(x − x⋆) ≥ 0.

As ∂f(x⋆) is bounded, we may swap the min and max (see, for example, Exercise 5.25 of [BV04]), finding that there must exist some g ∈ ∂f(x⋆) such that

    inf_{x ∈ X ∩ Bε}  gᵀ(x − x⋆) ≥ 0.

But any y ∈ X may be written as x⋆ + t(x − x⋆) for some t ≥ 0 and x ∈ X ∩ Bε, which gives the result.
For fuller explanations of these inequalities and derivations, see also the books by Hiriart-
Urruty and Lemaréchal [HUL93, HUL01].
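A tiny numerical check of this condition (example chosen here, not from the notes): minimize f(x) = |x| over X = [1, 3]; the minimizer is x⋆ = 1, the unique subgradient there is g = 1, and gᵀ(y − x⋆) ≥ 0 for every y ∈ X.

    # Sketch: verify g*(y - x_star) >= 0 over X = [1, 3] for f(x) = |x|, x_star = 1, g = 1.
    import numpy as np

    x_star, g = 1.0, 1.0
    ys = np.linspace(1.0, 3.0, 201)          # a grid over the feasible set X
    print(np.all(g * (ys - x_star) >= 0))    # True: the optimality condition holds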

3 Calculus of subgradients
In this section we describe rules for constructing subgradients of convex functions. We
will distinguish two levels of detail. In the ‘weak’ calculus of subgradients the goal is to
produce one subgradient, even if more subgradients exist. This is sufficient in practice, since
subgradient, localization, and cutting-plane methods require only a subgradient at any point.
A second and much more difficult task is to describe the complete set of subgradients
∂f (x) as a function of x. We will call this the ‘strong’ calculus of subgradients. It is useful
in theoretical investigations, for example, when describing the precise optimality conditions.

3.1 Nonnegative scaling


For α ≥ 0, ∂(αf )(x) = α∂f (x).

3.2 Sum and integral


Suppose f = f1 + · · · + fm , where f1 , . . . , fm are convex functions. Then we have

∂f (x) = ∂f1 (x) + · · · + ∂fm (x).

This property extends to infinite sums, integrals, and expectations (provided they exist).

3.3 Affine transformations of domain
Suppose f is convex, and let h(x) = f (Ax + b). Then ∂h(x) = AT ∂f (Ax + b).
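A minimal sketch of this rule (example chosen here, not from the notes): take f to be the scalar absolute value, so h(x) = |aᵀx + b| with A = aᵀ, and the rule gives a·s as a subgradient of h, where s is any subgradient of |·| at aᵀx + b.

    # Sketch of dh(x) = A^T df(Ax + b) with f(u) = |u| (scalar); a, b are made-up data.
    import numpy as np

    a, b = np.array([2.0, -1.0]), 0.5
    h = lambda x: abs(a @ x + b)

    def subgrad_h(x):
        u = a @ x + b
        s = np.sign(u) if u != 0 else 0.0   # a subgradient of |.| at u (any s in [-1, 1] if u == 0)
        return s * a                        # A^T times the subgradient of f at Ax + b

    x = np.array([1.0, 3.0])
    g = subgrad_h(x)
    zs = np.random.randn(1000, 2)
    print(all(h(z) >= h(x) + g @ (z - x) - 1e-9 for z in zs))   # subgradient inequality holds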

3.4 Pointwise maximum


Suppose f is the pointwise maximum of convex functions f1 , . . . , fm , i.e.,

    f(x) = max_{i=1,...,m} fi(x),

where the functions fi are subdifferentiable. We first show how to construct a subgradient
of f at x.
Let k be any index for which fk (x) = f (x), and let g ∈ ∂fk (x). Then g ∈ ∂f (x). In other
words, to find a subgradient of the maximum of functions, we can choose one of the functions
that achieves the maximum at the point, and choose any subgradient of that function at the
point. This follows from

    f(z) ≥ fk(z) ≥ fk(x) + gᵀ(z − x) = f(x) + gᵀ(z − x).

More generally, we have

∂f (x) = Co ∪ {∂fi (x) | fi (x) = f (x)},

i.e., the subdifferential of the maximum of functions is the convex hull of the union of
subdifferentials of the ‘active’ functions at x.

Example. Maximum of differentiable functions. Suppose f(x) = max_{i=1,...,m} fi(x), where fi are convex and differentiable. Then we have

    ∂f(x) = Co{∇fi(x) | fi(x) = f(x)}.

At a point x where only one of the functions, say fk , is active, f is differentiable and
has gradient ∇fk (x). At a point x where several of the functions are active, ∂f (x) is
a polyhedron.
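To make the weak rule concrete, here is a small sketch (with made-up data, not from the notes) for a maximum of affine functions f(x) = max_i (aiᵀx + bi): pick any active index and return the gradient of that piece.

    # Weak-rule sketch for f(x) = max_i (a_i' x + b_i); A (rows a_i) and b are made-up data.
    import numpy as np

    A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    b = np.array([0.0, 0.0, -1.0])
    f = lambda z: np.max(A @ z + b)

    def a_subgradient(x):
        k = int(np.argmax(A @ x + b))   # an index of an active (maximizing) function
        return A[k]                     # its gradient is a subgradient of f at x

    x = np.array([0.3, 0.7])
    g = a_subgradient(x)
    zs = np.random.randn(500, 2)
    print(all(f(z) >= f(x) + g @ (z - x) - 1e-9 for z in zs))   # True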

Example. ℓ1-norm. The ℓ1-norm

    f(x) = ‖x‖1 = |x1| + · · · + |xn|

is a nondifferentiable convex function of x. To find its subgradients, we note that f can be expressed as the maximum of 2^n linear functions:

    ‖x‖1 = max{sᵀx | si ∈ {−1, 1}},

so we can apply the rules for the subgradient of the maximum. The first step is to identify an active function sᵀx, i.e., find an s ∈ {−1, +1}^n such that sᵀx = ‖x‖1. We can choose si = +1 if xi > 0, and si = −1 if xi < 0. If xi = 0, more than one function is active, and both si = +1 and si = −1 work. The function sᵀx is differentiable and has a unique subgradient s. We can therefore take

    gi = +1 if xi > 0,   gi = −1 if xi < 0,   gi = −1 or +1 if xi = 0.

The subdifferential is the convex hull of all subgradients that can be generated this way:

    ∂f(x) = {g | ‖g‖∞ ≤ 1, gᵀx = ‖x‖1}.
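A one-line implementation of the construction above (a sketch, not from the notes); at coordinates with xi = 0 any value in [−1, 1] is acceptable, and np.sign conveniently returns 0 there.

    # Sketch: a subgradient of the l1-norm following the sign rule above.
    import numpy as np

    def l1_subgradient(x):
        return np.sign(x)   # +-1 on nonzero coordinates; 0 (also valid) where x_i = 0

    x = np.array([1.5, 0.0, -2.0])
    g = l1_subgradient(x)
    f = lambda z: np.sum(np.abs(z))
    zs = np.random.randn(1000, 3)
    print(all(f(z) >= f(x) + g @ (z - x) - 1e-9 for z in zs))   # inequality (1) holds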

3.5 Supremum
Next we consider the extension to the supremum over an infinite number of functions, i.e.,
we consider
    f(x) = sup_{α ∈ A} fα(x),
where the functions fα are subdifferentiable. We only discuss the weak property.
Suppose the supremum in the definition of f (x) is attained. Let β ∈ A be an index for
which fβ (x) = f (x), and let g ∈ ∂fβ (x). Then g ∈ ∂f (x). If the supremum in the definition
is not attained, the function may or may not be subdifferentiable at x, depending on the
index set A.
Assume however that A is compact (in some metric), and that the function α ↦ fα(x) is upper semi-continuous for each x. Then

    ∂f(x) = Co ∪ {∂fα(x) | fα(x) = f(x)}.
Example. Maximum eigenvalue of a symmetric matrix. Let f (x) = λmax (A(x)),
where A(x) = A0 + x1 A1 + · · · + xn An , and Ai ∈ Sm . We can express f as the
pointwise supremum of convex functions,
    f(x) = λmax(A(x)) = sup_{‖y‖2 = 1} yᵀA(x)y.

Here the index set is A = {y ∈ Rm | ‖y‖2 = 1}.


Each of the functions fy (x) = y T A(x)y is affine in x for fixed y, as can be easily seen
from
y T A(x)y = y T A0 y + x1 y T A1 y + · · · + xn y T An y,
so it is differentiable with gradient ∇fy (x) = (y T A1 y, . . . , y T An y).
The active functions y T A(x)y are those associated with the eigenvectors corresponding
to the maximum eigenvalue. Hence to find a subgradient, we compute an eigenvector
y with eigenvalue λmax , normalized to have unit norm, and take
g = (y T A1 y, y T A2 y, . . . , y T An y).

The ‘index set’ in this example, {y | ‖y‖2 = 1}, is a compact set. Therefore

    ∂f(x) = Co{∇fy(x) | A(x)y = λmax(A(x))y, ‖y‖2 = 1}.
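A short numerical sketch of this recipe (made-up symmetric matrices, not from the notes): form A(x), take a unit eigenvector y for the largest eigenvalue, and return (yᵀA1y, . . . , yᵀAny).

    # Sketch: one subgradient of f(x) = lambda_max(A0 + x1*A1 + ... + xn*An).
    import numpy as np

    rng = np.random.default_rng(0)
    sym = lambda M: (M + M.T) / 2
    A0 = sym(rng.standard_normal((4, 4)))
    As = [sym(rng.standard_normal((4, 4))) for _ in range(3)]   # A1, ..., An

    def lam_max_subgradient(x):
        A = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
        w, V = np.linalg.eigh(A)    # eigenvalues ascending, orthonormal eigenvectors
        y = V[:, -1]                # unit-norm eigenvector for lambda_max
        return np.array([y @ Ai @ y for Ai in As])

    print(lam_max_subgradient(np.zeros(3)))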

Example. Maximum eigenvalue of a symmetric matrix, revisited. Let f (A) = λmax (A),
where A ∈ Sn, the symmetric n-by-n matrices. Then as above, f(A) = λmax(A) = sup_{‖y‖2 = 1} yᵀAy, but we note that yᵀAy = Tr(Ayyᵀ), so that each of the functions fy(A) = yᵀAy is linear in A with gradient ∇fy(A) = yyᵀ. Then using an identical argument to that above, we find that

    ∂f(A) = Co{yyᵀ | ‖y‖2 = 1, yᵀAy = λmax(A)} = Co{yyᵀ | ‖y‖2 = 1, Ay = λmax(A)y},

the convex hull of the outer products of maximum eigenvectors of the matrix A.

3.6 Minimization over some variables


The next subgradient calculus rule concerns functions of the form
    f(x) = inf_y F(x, y)

where F (x, y) is subdifferentiable and jointly convex in x ∈ Rn and y ∈ Rm .


Suppose that the infimum over y in the definition of f(x) is attained on the set Yx ⊂ Rm (where Yx ≠ ∅), so that F(x, y) = f(x) for y ∈ Yx. By definition, a vector g ∈ Rn is a subgradient of f at x if and only if

    f(x′) ≥ f(x) + gᵀ(x′ − x) = F(x, y) + gᵀ(x′ − x)

for all x′ ∈ Rn and any y ∈ Yx. This is equivalent to

    F(x′, y′) ≥ F(x, y) + gᵀ(x′ − x) = F(x, y) + (g, 0)ᵀ((x′, y′) − (x, y))

for all (x′, y′) ∈ Rn × Rm and y ∈ Yx. In particular, we have the result that

    ∂f(x) = {g ∈ Rn | (g, 0) ∈ ∂F(x, y) for some y ∈ Yx}.

That is, there exists g ∈ Rn such that (g, 0) ∈ ∂F(x, y) for some y ∈ Yx, and any such g is a subgradient of f at x (as long as the infimum is attained and x ∈ int dom f).

3.7 Optimal value function of a convex optimization problem


Suppose f : Rm × Rp → R is defined as the optimal value of a convex optimization problem
in standard form, with z ∈ Rn as optimization variable,
minimize f0 (z)
subject to fi (z) ≤ xi , i = 1, . . . , m (3)
Az = y.
In other words, f(x, y) = inf_z F(x, y, z) where

    F(x, y, z) = f0(z) if fi(z) ≤ xi, i = 1, . . . , m, and Az = y;   F(x, y, z) = +∞ otherwise,

which is jointly convex in x, y, z. Subgradients of f can be related to the dual problem
of (3) as follows.
Suppose we are interested in subdifferentiating f at (x̂, ŷ). We can express the dual problem of (3) as

    maximize    g(λ, ν) − xᵀλ − yᵀν
    subject to  λ ⪰ 0,                    (4)

where

    g(λ, ν) = inf_z ( f0(z) + Σ_{i=1}^m λi fi(z) + νᵀAz ).

Suppose strong duality holds for problems (3) and (4) at x = x̂ and y = ŷ, and that the dual optimum is attained at λ⋆, ν⋆ (for example, because Slater’s condition holds). From the global perturbation inequalities we know that

    f(x, y) ≥ f(x̂, ŷ) − λ⋆ᵀ(x − x̂) − ν⋆ᵀ(y − ŷ).

In other words, the dual optimal solution provides a subgradient:

    −(λ⋆, ν⋆) ∈ ∂f(x̂, ŷ).
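As an illustration (a sketch assuming the cvxpy package and a small made-up problem, not part of the notes), the optimal dual variable of the perturbed inequality constraint gives a subgradient of the optimal value function, which we can verify against the perturbation inequality above.

    # Sketch (assumes cvxpy): minimize ||z||^2 subject to c'z <= x; the dual variable
    # lambda of the constraint gives -lambda as a subgradient of the optimal value p(x) at x_hat.
    import cvxpy as cp
    import numpy as np

    c, x_hat = np.array([1.0, 2.0]), -1.0

    def opt_val(x):
        z = cp.Variable(2)
        constr = [c @ z <= x]
        prob = cp.Problem(cp.Minimize(cp.sum_squares(z)), constr)
        prob.solve()
        return prob.value, constr[0].dual_value

    p_hat, lam = opt_val(x_hat)
    for x in [-2.0, -0.5, 0.0, 1.0]:
        p_x, _ = opt_val(x)
        print(p_x >= p_hat - lam * (x - x_hat) - 1e-6)   # subgradient inequality holds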

4 Quasigradients
If f (x) is quasiconvex, then g is a quasigradient at x0 if

g T (x − x0 ) ≥ 0 ⇒ f (x) ≥ f (x0 ),

Geometrically, g defines a supporting hyperplane to the sublevel set {x | f (x) ≤ f (x0 )}.
Note that the set of quasigradients at x0 forms a cone.
Example. Linear fractional function. f(x) = (aᵀx + b)/(cᵀx + d). Let cᵀx0 + d > 0. Then g = a − f(x0)c is a quasigradient at x0. If cᵀx + d > 0, we have

aT (x − x0 ) ≥ f (x0 )cT (x − x0 ) =⇒ f (x) ≥ f (x0 ).

Example. Degree of a polynomial. Define f : Rn → R by

f (a) = min{i | ai+2 = · · · = an = 0},

i.e., the degree of the polynomial a1 + a2 t + · · · + an t^(n−1). Let a ≠ 0 and k = f(a); then g = sign(ak+1)ek+1 is a quasigradient at a. To see this, we note that

    gᵀ(b − a) = sign(ak+1)bk+1 − |ak+1| ≥ 0

implies bk+1 ≠ 0, and hence f(b) ≥ k = f(a).

5 Clarke Subdifferential
Now we explore a generalization of the notion of subdifferential that enables the analysis of
non-convex and non-smooth functions through convex analysis. We will introduce the Clarke subdifferential, which is a natural generalization [Cla90] of the subdifferential set in terms of convex hulls.

5.1 Locally Lipschitz Functions


As we move beyond convex functions, an important class of functions is locally Lipschitz
functions. Let us recall the definition:

Definition 1. A function f : Rn → R is locally Lipschitz if for any bounded S ⊆ Rn , there


exists a constant L > 0 such that

|f (x) − f (y)| ≤ Lkx − yk2 for all x, y ∈ S.

A well-known result due to Rademacher states that a locally Lipschitz function is differen-
tiable almost everywhere (see e.g., Theorem 9.60 in [RW09]). In particular, every neighbor-
hood of x contains a point y for which ∇f (y) exists. This motivates the following construction
known as the Clarke subdifferential
    ∂C f(x) = Co{ s ∈ Rn : ∃ xk → x such that ∇f(xk) exists and ∇f(xk) → s }.

We can check that the absolute value function f(x) = |x| satisfies ∂C f(0) = [−1, 1]. Likewise, the function −f(x) satisfies ∂C(−f)(0) = [−1, 1]. For convex functions, we will see that the Clarke subdifferential reduces to the ordinary subdifferential defined in Section 1.
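A quick numerical sketch of this construction (not from the notes): sample points near 0 where f(x) = |x| is differentiable, collect their gradients, and look at the interval they span.

    # Sketch: estimate the Clarke subdifferential of f(x) = |x| at 0 by sampling
    # gradients at nearby points where f is differentiable (every x != 0).
    import numpy as np

    f = np.abs
    grad = lambda x, h=1e-9: (f(x + h) - f(x - h)) / (2 * h)   # valid when |x| >> h

    xs = np.random.uniform(-1e-3, 1e-3, size=1000)
    gs = np.array([grad(x) for x in xs if abs(x) > 1e-6])
    print(gs.min(), gs.max())   # approximately -1 and 1, so the convex hull of the limits is [-1, 1]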

5.2 Clarke Directional Derivative


Remarkably, it can be shown that Clarke subdifferentials can be described by support func-
tions even for non-convex functions. In order to show this, we need to generalize the notion
of directional derivatives from Section 2.4. We now define the Clarke directional derivative
of f at x in the direction d as follows

    f◦(x, d) ≜ lim sup_{x′ → x, t↓0} ( f(x′ + td) − f(x′) ) / t.     (5)

Compared to the usual directional derivative (2), the Clarke directional derivative (5) is able to capture the behavior of the function in a neighborhood of x rather than just along a ray emanating from x.
We have the following generalization of (2), which holds in general:

    f◦(x, d) = max_{s ∈ ∂C f(x)} sᵀd,

which shows that the support function of the Clarke subdifferential at any point x, evaluated at d, is equal to the Clarke directional derivative at x in the direction d.

Note that the Clarke subdifferential of the sum of two functions is not equal to the sum of the subdifferentials (see Figure 5 for an example). Nevertheless, we have the weaker sum rule

    ∂C(f1 + f2) ⊆ ∂C f1 + ∂C f2.

It can be shown that the sum rule holds with equality if the functions are subdifferentially regular, which are locally Lipschitz functions for which the ordinary directional derivative (2) and the Clarke directional derivative (5) coincide, i.e., f◦(x, d) = f′(x; d) for all x, d. It follows that convex functions are subdifferentially regular. This implies that the Clarke subdifferential is identical to the ordinary subdifferential for convex functions. Furthermore, smooth functions and the maximum of smooth functions, i.e., f = max_{i ∈ {1,...,m}} gi, where the gi are smooth, are subdifferentially regular. It can also be shown that the chain rule holds for subdifferentially regular functions [LSM20].

Figure 5: Clarke subdifferentials of the two non-differentiable functions f1(x) = max{x, 0} and f2(x) = min{x, 0} and of their sum f = f1 + f2: ∂C f1(0) = [0, 1] and ∂C f2(0) = [0, 1], but ∂C f(0) = {1} ⊊ ∂C f1(0) + ∂C f2(0) = [0, 2]. The addition rule does not hold in this example since f2(x) = min{x, 0} is not subdifferentially regular [LSM20].
Example. Subdifferential sum rule. Consider the functions f1(x) = max{x, 0}, f2(x) = min{x, 0} and f(x) = f1(x) + f2(x) as shown in Figure 5. It can be verified that the weak addition rule ∂C(f1 + f2) ⊆ ∂C f1 + ∂C f2 holds, e.g., ∂C f1(0) + ∂C f2(0) = [0, 1] + [0, 1] = [0, 2] ⊇ ∂C f(0) = {1}. The sum rule does not hold with equality since f2(x) = min{x, 0} is not subdifferentially regular. Note that non-smooth concave functions are not subdifferentially regular in general.

Finally, we have the following result that characterizes the local minima and maxima of locally Lipschitz functions in terms of stationarity in the sense of the Clarke subdifferential [RW09]:

    x is a local minimum or maximum of f(x)  =⇒  0 ∈ ∂C f(x).

Note that the reverse implication does not hold in general.
Example. Local minimum and maximum. Consider the non-convex one-dimensional function f(x) = max{−|x|, x − 1}. It can be verified that x = 0 and x = 1/2 are a local maximum and a local minimum, respectively. Note that we have 0 ∈ ∂C f(0) = [−1, 1] and 0 ∈ ∂C f(1/2) = [−1, 1].

Example. Stationary points. Consider the function f(x) = min{x, 0}. It can be seen that ∂C f(x) = {0} for every x > 0, so all such points are stationary in the Clarke sense, even though none of them is a minimizer of f (which is unbounded below).

References
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press,
2004.

[Cla90] F. H. Clarke. Optimization and nonsmooth analysis. SIAM, 1990.

[HUL93] J. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I & II. Springer, New York, 1993.

[HUL01] J. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis. Springer, 2001.

[LSM20] J. Li, A. M. So, and W. Ma. Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions. IEEE Signal Processing Magazine, 37(5):18–31, 2020.

[RW09] R. T. Rockafellar and T. J-B. Wets. Variational analysis, volume 317. Springer
Science & Business Media, 2009.
