Notes On Subgradients
1 Definition
We say a vector g ∈ R^n is a subgradient of f : R^n → R at x ∈ dom f if for all z ∈ dom f,

f(z) ≥ f(x) + g^T(z − x).   (1)
If f is convex and differentiable, then its gradient at x is a subgradient. But a subgradient
can exist even when f is not differentiable at x, as illustrated in figure 1. The same example
shows that there can be more than one subgradient of a function f at a point x.
There are several ways to interpret a subgradient. A vector g is a subgradient of f at x
if the affine function (of z) f(x) + g^T(z − x) is a global underestimator of f. Geometrically,
g is a subgradient of f at x if (g, −1) supports epi f at (x, f(x)), as illustrated in figure 2.
A function f is called subdifferentiable at x if there exists at least one subgradient at
x. The set of subgradients of f at the point x is called the subdifferential of f at x, and
is denoted ∂f (x). A function f is called subdifferentiable if it is subdifferentiable at all
x ∈ dom f .
Example. Absolute value. Consider f(z) = |z|. For x < 0 the subgradient is unique:
∂f(x) = {−1}. Similarly, for x > 0 we have ∂f(x) = {1}. At x = 0 the subdifferential
is defined by the inequality |z| ≥ gz for all z, which is satisfied if and only if g ∈ [−1, 1].
Therefore we have ∂f(0) = [−1, 1]. This is illustrated in figure 3.
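As a quick numerical sanity check of this example, the following snippet (my own sketch, assuming numpy; the grid and tolerance are arbitrary choices, not from the notes) verifies that every g ∈ [−1, 1] satisfies |z| ≥ gz on a grid, while values outside [−1, 1] fail:

import numpy as np

z = np.linspace(-2.0, 2.0, 401)  # test grid for the inequality |z| >= g*z

# every g in [-1, 1] satisfies the subgradient inequality everywhere
for g in [-1.0, -0.3, 0.0, 0.7, 1.0]:
    assert np.all(np.abs(z) >= g * z - 1e-12)

# any g outside [-1, 1] violates it at some z
for g in [-1.5, 1.1]:
    assert np.any(np.abs(z) < g * z)

print("∂f(0) = [-1, 1] confirmed on the grid")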
2 Basic properties
The subdifferential ∂f (x) is always a closed convex set, even if f is not convex. This follows
from the fact that it is the intersection of an infinite set of halfspaces:

∂f(x) = ⋂_{z∈dom f} {g | f(z) ≥ f(x) + g^T(z − x)}.
Figure 1: A convex function f with subgradient g1 at x1, where f is differentiable, and subgradients g2, g3 at x2, where it is not. (Graphic not recoverable from the extraction.)

Figure 2: A vector (g, −1) supporting epi f at (x, f(x)). (Graphic not recoverable from the extraction.)
Figure 3: The absolute value function (left), and its subdifferential ∂f (x) as a
function of x (right).
Moreover, ∂f(x) is bounded whenever x ∈ int dom f. If ∂f(x) were unbounded, there would be a
sequence g_n ∈ ∂f(x) such that ‖g_n‖2 → ∞. Choosing ε > 0 small enough that the points
y_n = x + ε g_n/‖g_n‖2 lie in dom f, we find that f(y_n) ≥ f(x) + g_n^T(y_n − x) = f(x) + ε‖g_n‖2 → ∞,
which contradicts the fact that f(y_n) is bounded (the y_n lie in a compact ball around x inside
int dom f, on which the convex function f is bounded above).
Conversely, suppose f is convex and x ∈ int dom f. Since epi f is convex, it has a supporting
hyperplane at (x, f(x)): there exists (a, b) ≠ 0 with

a^T(z − x) + b(t − f(x)) ≤ 0 for all (z, t) ∈ epi f.

Letting t → ∞ shows that b ≤ 0. If b < 0, taking t = f(z) and dividing by −b gives

f(z) ≥ f(x) + (−a/b)^T(z − x) for all z ∈ dom f,

which shows that −a/b ∈ ∂f(x). Now we show that b ≠ 0, i.e., that the supporting
hyperplane cannot be vertical. If b = 0 we conclude that a^T(z − x) ≤ 0 for all z ∈ dom f.
This is impossible since x ∈ int dom f (it would force a = 0, contradicting (a, b) ≠ 0).
This discussion shows that a convex function has a subgradient at x if there is at least
one nonvertical supporting hyperplane to epi f at (x, f (x)). This is the case, for example, if
f is continuous. There are pathological convex functions which do not have subgradients at
some points, but we will assume in the sequel that all convex functions are subdifferentiable
(at every point in dom f ).
A point x⋆ is a global minimizer of a function f (convex or not) if and only if f is
subdifferentiable at x⋆ and 0 ∈ ∂f(x⋆): indeed, 0 ∈ ∂f(x⋆) means precisely that
f(z) ≥ f(x⋆) + 0^T(z − x⋆) = f(x⋆) for all z ∈ dom f. While this simple characterization
of optimality via the subdifferential holds for nonconvex functions, it is not particularly
useful in that case, since we generally cannot find the subdifferential of a nonconvex function.
The condition 0 ∈ ∂f(x⋆) reduces to ∇f(x⋆) = 0 when f is convex and differentiable at x⋆.
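As a worked instance of the condition 0 ∈ ∂f(x⋆) (my own example, not from the notes), minimizing f(x) = (1/2)(x − a)² + λ|x| over x ∈ R gives the familiar soft-thresholding formula; the brute-force grid comparison below is purely illustrative:

import numpy as np

def soft_threshold(a, lam):
    """Solve min_x 0.5*(x - a)**2 + lam*|x| using 0 ∈ ∂f(x*):
    0 ∈ (x - a) + lam*∂|x|, so x = a - lam if a > lam, x = a + lam if
    a < -lam, and x = 0 is optimal iff a ∈ lam*[-1, 1], i.e. |a| <= lam."""
    return np.sign(a) * max(abs(a) - lam, 0.0)

for a in [-2.0, -0.3, 0.0, 0.7, 2.5]:
    x = soft_threshold(a, lam=1.0)
    grid = np.linspace(-4.0, 4.0, 100001)          # brute-force comparison
    fbest = (0.5 * (grid - a) ** 2 + np.abs(grid)).min()
    assert 0.5 * (x - a) ** 2 + abs(x) <= fbest + 1e-9
print("soft-thresholding matches brute force")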
The subdifferential also gives a variational characterization of directional derivatives: for convex f,

f′(x; v) = sup_{g∈∂f(x)} g^T v.

To see this, note first that f′(x; v) ≥ sup_{g∈∂f(x)} g^T v by the definition of a subgradient:
f(x + tv) − f(x) ≥ t g^T v for any t ∈ R and g ∈ ∂f(x), so dividing by t > 0 and taking t ↓ 0
gives f′(x; v) ≥ sup_{g∈∂f(x)} g^T v. For the other direction, we claim that all affine functions
that are below the function v ↦ f′(x; v) may be taken to be linear. Specifically, suppose that
(g, r) ∈ R^n × R and g^T v − r ≤ f′(x; v) for all v. Then r ≥ 0, as taking v = 0 gives
−r ≤ f′(x; 0) = 0. By the positive homogeneity of f′(x; v) in v, we see that for any t > 0 we
have t g^T v − r ≤ f′(x; tv) = t f′(x; v), and thus we have

g^T v − r/t ≤ f′(x; v) for all t > 0.
¹ This is simply the standard definition of differentiability.
Figure 4: The point x⋆ minimizes f over X (the shown level curves) if and only if
for some g ∈ ∂f(x⋆), g^T(y − x⋆) ≥ 0 for all y ∈ X. Note that not all subgradients
satisfy this inequality.
Taking t → +∞ gives that any affine minorizer of f′(x; v) may be taken to be linear. As any
(closed) convex function can be written as the supremum of its affine minorants, we have

f′(x; v) = sup{ g^T v | g^T ∆ ≤ f′(x; ∆) for all ∆ ∈ R^n }.
On the other hand, if g^T ∆ ≤ f′(x; ∆) for all ∆ ∈ R^n, then we have g^T ∆ ≤ f(x + ∆) − f(x), so
that g ∈ ∂f(x), and we may as well have taken the preceding supremum only over g ∈ ∂f(x).
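For the ℓ1 norm the supremum in f′(x; v) = sup_{g∈∂f(x)} g^T v has a closed form, which makes the identity easy to check numerically; the point, direction, and step size below are illustrative choices, not from the notes (a sketch, assuming numpy):

import numpy as np

def dir_deriv_fd(f, x, v, t=1e-8):
    # one-sided difference quotient approximating f'(x; v)
    return (f(x + t * v) - f(x)) / t

f = lambda u: np.abs(u).sum()                 # f(x) = ||x||_1
x = np.array([1.5, -2.0, 0.0, 0.0])
v = np.array([0.3, 1.0, -0.7, 0.2])

# sup over ∂||x||_1: sign(x_i)*v_i where x_i != 0, |v_i| where x_i == 0
zero = x == 0
sup_form = np.sign(x[~zero]) @ v[~zero] + np.abs(v[zero]).sum()

print(dir_deriv_fd(f, x, v), sup_form)        # both ≈ 0.2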
We can also use subgradients to characterize minimizers of f over a convex set X: a point
x⋆ ∈ X minimizes f over X if and only if there exists some g ∈ ∂f(x⋆) such that

g^T(y − x⋆) ≥ 0 for all y ∈ X.

The 'if' direction is immediate: for such a g and any y ∈ X, f(y) ≥ f(x⋆) + g^T(y − x⋆) ≥ f(x⋆).
For the converse, we suppose that f(x) ≥ f(x⋆) for all x ∈ X. In this case, for any x ∈ X, the
directional derivative
f′(x⋆; x − x⋆) = lim_{t↓0} [f(x⋆ + t(x − x⋆)) − f(x⋆)]/t ≥ 0,
that is, for any x ∈ X, the direction ∆ = x − x⋆ pointing into X satisfies f′(x⋆; ∆) ≥ 0.
By our characterization of the directional derivative earlier, we know that f′(x⋆; ∆) =
sup_{g∈∂f(x⋆)} g^T ∆ ≥ 0. Thus, defining for ε > 0 the ball B_ε = {x⋆ + y ∈ R^n | ‖y‖2 ≤ ε}, we have

inf_{x∈X∩B_ε} sup_{g∈∂f(x⋆)} g^T(x − x⋆) ≥ 0.
As ∂f(x⋆) is bounded, we may swap the min and max (see, for example, Exercise 5.25 of
[BV04]), finding that there must exist some g ∈ ∂f(x⋆) such that

inf_{x∈X∩B_ε} g^T(x − x⋆) ≥ 0.
But any y ∈ X may be written as t(x − x⋆) + x⋆ for some t ≥ 0 and x ∈ X ∩ B_ε, which gives
the result.
For fuller explanations of these inequalities and derivations, see also the books by Hiriart-
Urruty and Lemaréchal [HUL93, HUL01].
3 Calculus of subgradients
In this section we describe rules for constructing subgradients of convex functions. We
will distinguish two levels of detail. In the ‘weak’ calculus of subgradients the goal is to
produce one subgradient, even if more subgradients exist. This is sufficient in practice, since
subgradient, localization, and cutting-plane methods require only a subgradient at any point.
A second and much more difficult task is to describe the complete set of subgradients
∂f (x) as a function of x. We will call this the ‘strong’ calculus of subgradients. It is useful
in theoretical investigations, for example, when describing the precise optimality conditions.
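To make the remark concrete that weak calculus suffices for subgradient methods, here is a minimal subgradient-method sketch; the problem instance f(x) = ‖Ax − b‖1, the step sizes 1/k, and the iteration count are illustrative choices, not from the notes:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

f = lambda x: np.abs(A @ x - b).sum()

def subgrad(x):
    # one (weak) subgradient of ||Ax - b||_1: A^T s with s_i = sign((Ax - b)_i)
    return A.T @ np.sign(A @ x - b)

x, best = np.zeros(5), np.inf
for k in range(1, 5001):
    x = x - (1.0 / k) * subgrad(x)   # diminishing steps; any g ∈ ∂f(x) works
    best = min(best, f(x))           # subgradient methods are not descent methods
print("best value found:", best)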
3.1 Nonnegative scaling
For α ≥ 0, we have ∂(αf)(x) = α ∂f(x).
3.2 Sum
Suppose f = f1 + · · · + fm, where f1, . . . , fm are convex and subdifferentiable. Then a sum of
subgradients gi ∈ ∂fi(x) is a subgradient of f at x, and in fact ∂f(x) = ∂f1(x) + · · · + ∂fm(x).
This property extends to infinite sums, integrals, and expectations (provided they exist).
3.3 Affine transformations of domain
Suppose f is convex, and let h(x) = f(Ax + b). Then ∂h(x) = A^T ∂f(Ax + b).
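In code, this rule turns any (weak) subgradient oracle for f into one for h(x) = f(Ax + b). The sketch below uses np.sign as the oracle for f = ‖·‖1, an example choice; the data A, b are arbitrary:

import numpy as np

def affine_subgrad(subgrad_f, A, b):
    """Given an oracle returning some g ∈ ∂f(y), return an oracle for
    h(x) = f(Ax + b); by the rule above, A^T g ∈ ∂h(x)."""
    return lambda x: A.T @ subgrad_f(A @ x + b)

subgrad_h = affine_subgrad(np.sign,                       # sign(y) ∈ ∂||y||_1
                           A=np.array([[1.0, 2.0], [0.0, -1.0]]),
                           b=np.array([0.5, -0.5]))
print(subgrad_h(np.array([1.0, 1.0])))                    # -> [1. 3.]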
3.4 Pointwise maximum
Suppose f is the pointwise maximum of convex functions f1, . . . , fm,

f(x) = max_{i=1,...,m} fi(x),

where the functions fi are subdifferentiable. We first show how to construct a subgradient
of f at x.
Let k be any index for which fk(x) = f(x), and let g ∈ ∂fk(x). Then g ∈ ∂f(x). In other
words, to find a subgradient of the maximum of functions, we can choose one of the functions
that achieves the maximum at the point, and choose any subgradient of that function at the
point. This follows from

f(z) ≥ fk(z) ≥ fk(x) + g^T(z − x) = f(x) + g^T(z − x).

The corresponding 'strong' rule is

∂f(x) = Co ⋃ {∂fi(x) | fi(x) = f(x)},
i.e., the subdifferential of the maximum of functions is the convex hull of the union of
subdifferentials of the ‘active’ functions at x.
At a point x where only one of the functions, say fk , is active, f is differentiable and
has gradient ∇fk (x). At a point x where several of the functions are active, ∂f (x) is
a polyhedron.
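A minimal sketch of the weak rule for a piecewise-linear function f(x) = max_i (a_i^T x + b_i) (the data below are arbitrary, not from the notes): pick an active affine function and return its gradient.

import numpy as np

def max_affine_subgrad(Aa, bb, x):
    """Return a subgradient of f(x) = max_i (Aa[i] @ x + bb[i]):
    the gradient Aa[k] of any function that is active at x."""
    k = int(np.argmax(Aa @ x + bb))   # index of an active (maximizing) function
    return Aa[k]

Aa = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
bb = np.array([0.0, 0.1, 0.2])
print(max_affine_subgrad(Aa, bb, np.array([0.3, 0.4])))   # -> [0. 1.]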
Example. ℓ1-norm. Let f(x) = ‖x‖1. Since f is the maximum of the 2^n linear functions
s^T x, s ∈ {−1, +1}^n,

f(x) = ‖x‖1 = max{s^T x | si ∈ {−1, +1}},

we can apply the rules for the subgradient of the maximum. The first step is to
identify an active function s^T x, i.e., find an s ∈ {−1, +1}^n such that s^T x = ‖x‖1. We
can choose si = +1 if xi > 0, and si = −1 if xi < 0. If xi = 0, more than one function
is active, and both si = +1 and si = −1 work. The function s^T x is differentiable and
has a unique subgradient s. We can therefore take

gi = +1 if xi > 0, gi = −1 if xi < 0, and gi = −1 or +1 if xi = 0.
The subdifferential is the convex hull of all subgradients that can be generated this
way:
∂f(x) = {g | ‖g‖∞ ≤ 1, g^T x = ‖x‖1}.
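The membership test g ∈ ∂‖x‖1 ⟺ ‖g‖∞ ≤ 1 and g^T x = ‖x‖1 is one line of code; a small sketch with an illustrative tolerance:

import numpy as np

def in_l1_subdiff(g, x, tol=1e-9):
    """Test g ∈ ∂||x||_1 = {g : ||g||_inf <= 1, g^T x = ||x||_1}."""
    return np.max(np.abs(g)) <= 1 + tol and abs(g @ x - np.abs(x).sum()) <= tol

x = np.array([2.0, -1.0, 0.0])
print(in_l1_subdiff(np.array([1.0, -1.0, 0.3]), x))  # True: free entry where x_i = 0
print(in_l1_subdiff(np.array([0.9, -1.0, 0.0]), x))  # False: g^T x < ||x||_1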
3.5 Supremum
Next we consider the extension to the supremum over an infinite number of functions, i.e.,
we consider
f(x) = sup_{α∈A} fα(x),
where the functions fα are subdifferentiable. We only discuss the weak property.
Suppose the supremum in the definition of f (x) is attained. Let β ∈ A be an index for
which fβ (x) = f (x), and let g ∈ ∂fβ (x). Then g ∈ ∂f (x). If the supremum in the definition
is not attained, the function may or may not be subdifferentiable at x, depending on the
index set A.
Assume however that A is compact (in some metric), and that the function α ↦ fα(x)
is upper semi-continuous for each x. Then

∂f(x) = Co ⋃ {∂fα(x) | fα(x) = f(x)}.
Example. Maximum eigenvalue of a symmetric matrix. Let f(x) = λmax(A(x)),
where A(x) = A0 + x1 A1 + · · · + xn An, and Ai ∈ S^m. We can express f as the
pointwise supremum of convex functions,

f(x) = λmax(A(x)) = sup_{‖y‖2=1} y^T A(x) y.

The supremum is attained by any unit eigenvector y of A(x) associated with its largest
eigenvalue, and the active function y^T A(x) y is affine in x with gradient
(y^T A1 y, . . . , y^T An y), which is therefore a subgradient of f at x.
Example. Maximum eigenvalue of a symmetric matrix, revisited. Let f(A) = λmax(A),
where A ∈ S^n, the symmetric n-by-n matrices. Then as above, f(A) = λmax(A) =
sup_{‖y‖2=1} y^T A y, but we note that y^T A y = Tr(A yy^T), so that each of the functions
fy(A) = y^T A y is linear in A with gradient ∇fy(A) = yy^T. Then using an identical
argument to that above, we find that

∂f(A) = Co{yy^T | ‖y‖2 = 1, y^T A y = λmax(A)} = Co{yy^T | ‖y‖2 = 1, Ay = λmax(A) y},

the convex hull of the outer products of maximum eigenvectors of the matrix A.
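A numerical sketch of this example (assuming numpy; the test matrices are random illustrative data): form G = yy^T from a maximum eigenvector and spot-check the subgradient inequality λmax(B) ≥ λmax(A) + Tr(G(B − A)).

import numpy as np

def lmax_subgrad(A):
    """Return yy^T ∈ ∂λmax(A) for a unit eigenvector y of the largest eigenvalue."""
    w, V = np.linalg.eigh(A)         # eigenvalues ascending, columns orthonormal
    y = V[:, -1]
    return np.outer(y, y)

A = np.array([[2.0, 1.0], [1.0, 2.0]])
G = lmax_subgrad(A)

rng = np.random.default_rng(1)
for _ in range(100):
    B = rng.standard_normal((2, 2))
    B = (B + B.T) / 2                # random symmetric test matrix
    lhs = np.linalg.eigvalsh(B)[-1]
    rhs = np.linalg.eigvalsh(A)[-1] + np.trace(G @ (B - A))
    assert lhs >= rhs - 1e-9         # λmax(B) >= λmax(A) + <G, B - A>
print("subgradient inequality verified on 100 samples")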
3.6 Minimization
Suppose f(x, y) is defined as the optimal value of the problem

minimize f0(z)
subject to fi(z) ≤ xi, i = 1, . . . , m, Az = y,   (3)

with variable z, where f0, . . . , fm are convex (so that the problem is jointly convex in
x, y, z, and f is convex in (x, y)). Subgradients of f can be related to the dual problem
of (3) as follows.
Suppose we are interested in subdifferentiating f at (x̂, ŷ). We can express the dual
problem of (3) as

maximize g(λ, ν) − x^T λ − y^T ν
subject to λ ⪰ 0,   (4)

where

g(λ, ν) = inf_z ( f0(z) + Σ_{i=1}^{m} λi fi(z) + ν^T Az ).
Suppose strong duality holds for problems (3) and (4) at x = x̂ and y = ŷ, and that the
dual optimum is attained at λ⋆, ν⋆ (for example, because Slater's condition holds). From
the global perturbation inequalities we know that

f(x, y) ≥ f(x̂, ŷ) − λ⋆^T(x − x̂) − ν⋆^T(y − ŷ) for all x, y,

i.e., (−λ⋆, −ν⋆) ∈ ∂f(x̂, ŷ).
4 Quasigradients
If f(x) is quasiconvex, then g is a quasigradient at x0 if

g^T(x − x0) ≥ 0 ⟹ f(x) ≥ f(x0).

Geometrically, g defines a supporting hyperplane to the sublevel set {x | f(x) ≤ f(x0)}.
Note that the set of quasigradients at x0 forms a cone.
Example. Linear fractional function. Let f(x) = (a^T x + b)/(c^T x + d), and suppose
c^T x0 + d > 0. Then g = a − f(x0)c is a quasigradient at x0: if c^T x + d > 0, we have

g^T(x − x0) ≥ 0 ⟹ a^T x + b ≥ f(x0)(c^T x + d) ⟹ f(x) ≥ f(x0).
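A spot check of this implication chain (my own sketch; the data a, b, c, d, the point x0, and the sampling box are arbitrary choices, not from the notes):

import numpy as np

a, b = np.array([1.0, 2.0]), 0.5
c, d = np.array([0.5, -1.0]), 3.0
f = lambda x: (a @ x + b) / (c @ x + d)

x0 = np.array([1.0, 1.0])
g = a - f(x0) * c                      # claimed quasigradient at x0

rng = np.random.default_rng(2)
for _ in range(10000):
    x = x0 + rng.uniform(-1.0, 1.0, size=2)
    if c @ x + d > 0 and g @ (x - x0) >= 0:
        assert f(x) >= f(x0) - 1e-12   # g^T(x - x0) >= 0 forces f(x) >= f(x0)
print("quasigradient property holds on all samples")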
5 Clarke subdifferential
Now we explore a generalization of the notion of subdifferential that enables the analysis of
non-convex and non-smooth functions through convex analysis. We will introduce the Clarke
subdifferential, which is a natural generalization [Cla90] of the subdifferential in terms
of convex hulls.
A well-known result due to Rademacher states that a locally Lipschitz function is differen-
tiable almost everywhere (see e.g., Theorem 9.60 in [RW09]). In particular, every neighbor-
hood of x contains a point y for which ∇f (y) exists. This motivates the following construction
known as the Clarke subdifferential
∂C f(x) = Co{ s ∈ R^n : ∃ xk → x such that ∇f(xk) exists and ∇f(xk) → s }.
We can check that the absolute value function f(x) = |x| satisfies ∂C f(0) = [−1, 1]. Like-
wise, the function −f(x) = −|x| satisfies ∂C(−f)(0) = [−1, 1]. For convex functions, we will see
that the Clarke subdifferential reduces to the ordinary subdifferential defined in Section 1.
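The definition suggests a crude one-dimensional approximation, sketched below: sample finite-difference gradients at points near x where f is differentiable and take their hull (in R, an interval). This is an illustration only, not a robust algorithm; the parameters are arbitrary.

import numpy as np

def clarke_interval_1d(f, x, eps=1e-4, n=2001, h=1e-7):
    """Approximate ∂_C f(x) for f: R -> R by the interval spanned by
    central-difference gradients at points within eps of x."""
    pts = x + np.linspace(-eps, eps, n)
    grads = (f(pts + h) - f(pts - h)) / (2 * h)
    return grads.min(), grads.max()

print(clarke_interval_1d(np.abs, 0.0))                 # ≈ (-1.0, 1.0)
print(clarke_interval_1d(lambda t: -np.abs(t), 0.0))   # ≈ (-1.0, 1.0)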
Compared to the usual directional derivative (2), the Clarke directional derivative (5) is able to
capture the behavior of the function in a neighborhood of x rather than just along a ray
emanating from x.
We have the following generalization of (2):

f°(x, d) = max_{s∈∂C f(x)} s^T d,
which shows that the support function of the Clarke subdifferential at any point x, evaluated
at d, is equal to the Clarke directional derivative at x in the direction d.
Note that the Clarke subdifferential of the sum of two functions is not, in general, equal to
the sum of the subdifferentials (see Figure 5 for an example); this failure of the sum rule is
one of the obstacles to computing Clarke subgradients. Nevertheless, we still have the following
weaker version of the sum rule:

∂C(f1 + f2) ⊆ ∂C f1 + ∂C f2.
It can be shown that the sum rule holds with equality if the functions are subdifferentially
regular, i.e., locally Lipschitz functions for which the ordinary directional derivative (2) and
the Clarke directional derivative (5) coincide: f′(x, d) = f°(x, d) for all x, d. It follows that
convex functions are subdifferentially regular, which implies that for a convex function the
Clarke subdifferential is identical to the ordinary subdifferential; see [28, Proposition 1.12].

Figure 5: Clarke subdifferential of the sum of two non-differentiable functions. Consider
f : R → R given by f(x) = max{x, 0} + min{0, x}, i.e., f = f1 + f2 with f1(x) = max{x, 0}
and f2(x) = min{x, 0}. Computing the Clarke subdifferentials at 0 gives

∂C f1(0) = [0, 1], ∂C f2(0) = [0, 1], ∂C f(0) = {1}.

Observe that the addition rule does not hold in this example since the function f2(x) =
min{x, 0} is not subdifferentially regular [LSM20]:

∂C f(0) = {1} ⊊ ∂C f1(0) + ∂C f2(0) = [0, 2].
Example. Stationary points. Consider the function f(x) = min{x, 0}. Then ∂C f(x) = {0}
for x > 0, and ∂C f(0) = Co{0, 1} = [0, 1], so every x ≥ 0 satisfies the stationarity
condition 0 ∈ ∂C f(x). In particular, x = 0 is Clarke stationary even though it is not a
local minimum of f.
References
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press,
2004.
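[Cla90] F. H. Clarke. Optimization and Nonsmooth Analysis. SIAM, 1990.
[HUL93] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I and II. Springer, 1993.
[HUL01] J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis. Springer, 2001.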
[RW09] R. T. Rockafellar and R. J-B. Wets. Variational Analysis, volume 317. Springer
Science & Business Media, 2009.