
Lecture 3 Si416 2025

The lecture discusses the concepts of convexity and strict convexity in twice continuously differentiable functions. It establishes the relationship between local and global minimizers, the role of the Hessian matrix, and the conditions under which a function is strictly convex or strongly convex. Key theorems are presented, including the characterization of convex functions and implications of strong convexity on the uniqueness of global minimizers.

Uploaded by Divy

optimization

(SI 416) – lecture 3

Harsha Hutridurga

IIT Bombay

Harsha Hutridurga (IIT Bombay) SI 416 1 / 23


story so far

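♣ Take a twice continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$

$$x^* \text{ is a local minimizer of } f \implies \nabla f(x^*) = 0 \ \text{ and } \ \langle \nabla^2 f(x^*)p, p \rangle \ge 0 \ \text{ for all } p \in \mathbb{R}^n$$

$$\nabla f(x^*) = 0 \ \text{ and } \ \langle \nabla^2 f(x^*)p, p \rangle > 0 \ \text{ for all } p \in \mathbb{R}^n \setminus \{0\} \implies x^* \text{ is a strict local minimizer of } f$$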
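To see the two conditions side by side, here is a minimal Python sketch (our own illustration, not part of the lecture) that tests them at a given point of a function $f : \mathbb{R}^2 \to \mathbb{R}$, assuming the gradient and Hessian are supplied in closed form. The helper names `sym2_eigvals` and `classify_point` and the three test functions are hypothetical. Positive (semi)definiteness is read off the smallest eigenvalue of the symmetric 2×2 Hessian.

```python
import math


def sym2_eigvals(h):
    """Eigenvalues (smallest first) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = h
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def classify_point(grad, hess, x, tol=1e-12):
    """Apply the first- and second-order conditions at x for f: R^2 -> R."""
    if math.hypot(*grad(x)) > tol:
        return "not critical"            # first-order necessary condition fails
    smallest, _ = sym2_eigvals(hess(x))
    if smallest > tol:
        return "strict local minimizer"  # Hessian positive definite: sufficient condition holds
    if smallest >= -tol:
        return "inconclusive"            # Hessian only positive semidefinite
    return "not a local minimizer"       # second-order necessary condition fails


# (gradient, Hessian) of x^2 + y^2, x^4 + y^2 and x^2 - y^2
bowl = (lambda p: (2 * p[0], 2 * p[1]), lambda p: [[2.0, 0.0], [0.0, 2.0]])
quartic = (lambda p: (4 * p[0] ** 3, 2 * p[1]), lambda p: [[12 * p[0] ** 2, 0.0], [0.0, 2.0]])
saddle = (lambda p: (2 * p[0], -2 * p[1]), lambda p: [[2.0, 0.0], [0.0, -2.0]])

for name, (g, h) in [("bowl", bowl), ("quartic", quartic), ("saddle", saddle)]:
    print(name, classify_point(g, h, (0.0, 0.0)))
```

For $x^4 + y^2$ the origin is in fact a strict local minimizer, yet the Hessian there is only positive semidefinite: the sufficient condition is not necessary.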



story so far (contd.)

♣ Take a twice continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ which is also convex on $\mathbb{R}^n$. Then

$$x^* \text{ is a local minimizer of } f \iff x^* \text{ is a global minimizer of } f$$

$$f(x^*) \le f(x) \ \text{ for all } x \in \mathbb{R}^n \iff \nabla f(x^*) = 0$$
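For a convex differentiable function on $\mathbb{R}$ the derivative is nondecreasing, so a zero of $f'$ can be located by bisection, and by the equivalence above that zero is a global minimizer. A minimal sketch, assuming the hypothetical example $f(x) = e^x + e^{-2x}$ (convex, since $f''(x) = e^x + 4e^{-2x} > 0$), whose minimizer is $x^* = \ln(2)/3$:

```python
import math


def f(x):
    return math.exp(x) + math.exp(-2 * x)   # f''(x) = e^x + 4 e^{-2x} > 0, so f is convex


def df(x):
    return math.exp(x) - 2 * math.exp(-2 * x)


def bisect_stationary(df, lo, hi, iters=200):
    """Locate a zero of a nondecreasing derivative df on [lo, hi], assuming df(lo) < 0 < df(hi)."""
    assert df(lo) < 0 < df(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if df(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x_star = bisect_stationary(df, -5.0, 5.0)
print(x_star, math.log(2) / 3)   # the two values agree to floating-point accuracy

# Sanity check of "stationary point => global minimizer" on a grid
grid = [-5 + 0.01 * k for k in range(1001)]
assert all(f(x_star) <= f(x) + 1e-12 for x in grid)
```

The grid check can only fail to find a counterexample; the guarantee itself comes from convexity.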



story so far (characterization of convex functions)

♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. Then

$$f \text{ is convex} \iff f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle \ \text{ for every } x, y \in \mathbb{R}^n$$
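The first-order characterization can be probed numerically: for convex $f$ the gap $f(y) - f(x) - \langle \nabla f(x), y - x \rangle$ is never negative. A minimal Python sketch, assuming two example functions of our own choosing: log-sum-exp (convex, with the softmax vector as gradient) and $\sum_i \sin x_i$ (not convex). Random sampling can only refute convexity, never prove it.

```python
import math
import random


def logsumexp(x):
    """log(sum_i exp(x_i)), computed stably; a standard convex function."""
    m = max(x)
    return m + math.log(sum(math.exp(t - m) for t in x))


def grad_logsumexp(x):
    """The gradient of log-sum-exp is the softmax vector."""
    m = max(x)
    w = [math.exp(t - m) for t in x]
    total = sum(w)
    return [t / total for t in w]


def sin_sum(x):
    """sum_i sin(x_i): not convex."""
    return sum(math.sin(t) for t in x)


def grad_sin_sum(x):
    return [math.cos(t) for t in x]


def first_order_gap(f, grad, x, y):
    """f(y) - [f(x) + <grad f(x), y - x>]; never negative when f is convex."""
    return f(y) - f(x) - sum(g * (yi - xi) for g, xi, yi in zip(grad(x), x, y))


def min_gap(f, grad, n=3, trials=2000, seed=0):
    """Smallest first-order gap over random pairs (x, y) in [-3, 3]^n."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(n)]
        y = [rng.uniform(-3, 3) for _ in range(n)]
        worst = min(worst, first_order_gap(f, grad, x, y))
    return worst


print(min_gap(logsumexp, grad_logsumexp))   # >= 0 up to round-off
print(min_gap(sin_sum, grad_sin_sum))       # negative: the tangent-plane inequality fails
```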



hessian of a convex function

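Theorem
A twice differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if the Hessian matrix $\nabla^2 f(x)$ is positive semidefinite for all $x \in \mathbb{R}^n$.

♣ Take a convex function $f$ and let $x \in \mathbb{R}^n$.

♣ Define a function $g : \mathbb{R}^n \to \mathbb{R}$ as follows:
$$g(y) := f(y) - \langle \nabla f(x), y - x \rangle$$

♣ Note that $y \mapsto -\langle \nabla f(x), y - x \rangle$ is affine (a linear function plus a constant), hence convex

♣ Thus $g$, being the sum of two convex functions, is convex
♣ Observe that for all $y \in \mathbb{R}^n$, we have
$$\nabla g(y) = \nabla f(y) - \nabla f(x) \quad \text{and} \quad \nabla^2 g(y) = \nabla^2 f(y)$$

♣ Note in particular that $\nabla g(x) = 0$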


hessian of a convex function (contd.)

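♣ Recall that $g$ is a convex function and we have shown $\nabla g(x) = 0$

♣ Hence $x$ is a global minimizer of $g$ (a stationary point of a convex function is a global minimizer)
♣ The second-order necessary condition then implies that the Hessian $\nabla^2 g(x)$ is positive semidefinite
♣ Recall that $\nabla^2 g(x) = \nabla^2 f(x)$ and that $x$ was arbitrary
♣ We have thus shown that $\nabla^2 f(x)$ is positive semidefinite for all $x \in \mathbb{R}^n$
♣ Conversely, assume that $\nabla^2 f(x)$ is positive semidefinite for all $x \in \mathbb{R}^n$; we show that $f$ is convex
♣ Take $x, y \in \mathbb{R}^n$. By Taylor's theorem, for some $s \in (0, 1)$,
$$\begin{aligned}
f(y) &= f(x + (y - x)) \\
&= f(x) + \langle \nabla f(x), y - x \rangle + \tfrac{1}{2} \langle \nabla^2 f(x + s(y - x))(y - x), y - x \rangle \\
&\ge f(x) + \langle \nabla f(x), y - x \rangle,
\end{aligned}$$
since the Hessian term is nonnegative. By the first-order characterization of convex functions, this proves convexity of $f$.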
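As a concrete instance of the theorem, the Hessian of log-sum-exp $f(x) = \log \sum_i e^{x_i}$ is $\operatorname{diag}(s) - s s^\top$ with $s$ the softmax of $x$, so $\langle \nabla^2 f(x) p, p \rangle = \sum_i s_i (p_i - \bar p)^2$ with $\bar p = \sum_i s_i p_i$, a weighted variance and hence nonnegative. The sketch below (example functions and helper names are our own assumptions) samples this quadratic form and contrasts it with $\sum_i \sin x_i$, whose Hessian $\operatorname{diag}(-\sin x_i)$ is not positive semidefinite everywhere.

```python
import math
import random


def softmax(x):
    m = max(x)
    w = [math.exp(t - m) for t in x]
    total = sum(w)
    return [t / total for t in w]


def hessian_form_lse(x, p):
    """<H p, p> for H = diag(s) - s s^T, the Hessian of log-sum-exp at x (s = softmax(x))."""
    s = softmax(x)
    mean = sum(si * pi for si, pi in zip(s, p))
    # Weighted variance of p; algebraically equal to sum s_i p_i^2 - (sum s_i p_i)^2
    return sum(si * (pi - mean) ** 2 for si, pi in zip(s, p))


def hessian_form_sin(x, p):
    """<H p, p> for f(x) = sum_i sin(x_i), whose Hessian is diag(-sin(x_i))."""
    return sum(-math.sin(xi) * pi * pi for xi, pi in zip(x, p))


def min_form(form, n=4, trials=5000, seed=1):
    """Smallest sampled value of the quadratic form over random points x and directions p."""
    rng = random.Random(seed)
    return min(
        form([rng.uniform(-3, 3) for _ in range(n)], [rng.gauss(0, 1) for _ in range(n)])
        for _ in range(trials)
    )


print(min_form(hessian_form_lse))   # never negative: Hessian is positive semidefinite
print(min_form(hessian_form_sin))   # negative: f is not convex
```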
strict convexity

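♣ A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be strictly convex on $\mathbb{R}^n$ if for all $x, y \in \mathbb{R}^n$ with $x \ne y$,

$$f(\alpha x + (1 - \alpha) y) < \alpha f(x) + (1 - \alpha) f(y) \quad \text{for all } \alpha \in (0, 1).$$

Theorem
A differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is strictly convex if and only if

$$f(y) > f(x) + \langle \nabla f(x), y - x \rangle \quad \text{for all } x, y \in \mathbb{R}^n, \ x \ne y.$$

♣ The proof is essentially the same as in the convex case, with inequalities replaced by strict inequalities (one direction needs a little extra care, since strict inequalities are not preserved under limits)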
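A quick numerical contrast, assuming two example functions on $\mathbb{R}$ chosen by us: $e^x$ (strictly convex) satisfies the strict first-order and chord inequalities for $x \ne y$, while the affine function $2x + 1$ (convex but not strictly convex) turns both into equalities, up to round-off. The chord check uses only $\alpha = 1/2$.

```python
import math
import random


def first_order_gap(f, df, x, y):
    """f(y) - [f(x) + f'(x)(y - x)] for f: R -> R."""
    return f(y) - f(x) - df(x) * (y - x)


def chord_gap(f, x, y, alpha):
    """alpha f(x) + (1 - alpha) f(y) - f(alpha x + (1 - alpha) y)."""
    return alpha * f(x) + (1 - alpha) * f(y) - f(alpha * x + (1 - alpha) * y)


def affine(t):
    return 2 * t + 1


def affine_slope(t):
    return 2.0


rng = random.Random(0)
pairs = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(1000)]
pairs = [(x, y) for x, y in pairs if abs(x - y) > 1e-3]   # x != y, and not so close that round-off dominates

exp_first = min(first_order_gap(math.exp, math.exp, x, y) for x, y in pairs)
exp_chord = min(chord_gap(math.exp, x, y, 0.5) for x, y in pairs)
affine_first = max(abs(first_order_gap(affine, affine_slope, x, y)) for x, y in pairs)
affine_chord = max(abs(chord_gap(affine, x, y, 0.5)) for x, y in pairs)

print(exp_first > 0, exp_chord > 0)   # strict inequalities for e^x
print(affine_first, affine_chord)     # zero up to round-off for 2x + 1
```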



strict convexity and unique minimizer

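Theorem
A strictly convex function $f : \mathbb{R}^n \to \mathbb{R}$ has at most one global minimizer.

♣ The above statement does not guarantee the existence of a global minimizer (for instance, $f(x) = e^x$ on $\mathbb{R}$ is strictly convex but has no global minimizer)

♣ It says: if $f$ has a global minimizer, then it must be unique
♣ Suppose, for contradiction, that $f$ has two distinct global minimizers, say $x$ and $y$
♣ That is,
$$f(x) = f(y) \le f(z) \quad \text{for all } z \in \mathbb{R}^n.$$

♣ In particular, take $z = \frac{x + y}{2}$. Then, by strict convexity with $\alpha = \frac{1}{2}$,
$$f(z) = f\left(\frac{x + y}{2}\right) < \frac{1}{2} f(x) + \frac{1}{2} f(y) = f(x)$$

♣ This contradicts $f(x) \le f(z)$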
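To illustrate why strictness matters, the sketch below (hypothetical helper `grid_minimizers`, example functions of our own choosing) counts the grid minimizers of the strictly convex $(x - 1)^2$ and of the convex but not strictly convex $\max(|x| - 1, 0)$, which is minimized on the whole interval $[-1, 1]$. For the latter, the midpoint of the minimizers $-1$ and $1$ is not strictly better, so the contradiction in the proof cannot be reached.

```python
def grid_minimizers(f, lo=-3.0, hi=3.0, n=601, tol=1e-12):
    """Grid points of [lo, hi] where f is within tol of its smallest grid value."""
    xs = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    vals = [f(x) for x in xs]
    best = min(vals)
    return [x for x, v in zip(xs, vals) if v <= best + tol]


def strictly_convex(x):
    return (x - 1) ** 2


def flat_bottom(x):              # convex but not strictly convex
    return max(abs(x) - 1, 0.0)


print(len(grid_minimizers(strictly_convex)))   # a single grid minimizer
print(len(grid_minimizers(flat_bottom)))       # a whole interval of minimizers

# The midpoint step of the proof fails without strictness: f(0) is not below (f(-1) + f(1)) / 2
print(flat_bottom(0.0), (flat_bottom(-1.0) + flat_bottom(1.0)) / 2)
```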


hessian and strict convexity

Theorem
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function whose Hessian matrix $\nabla^2 f(x)$ is positive definite for all $x \in \mathbb{R}^n$. Then the function $f$ is strictly convex.

♣ Take $x, y \in \mathbb{R}^n$ such that $x \ne y$.

♣ By Taylor's theorem, for some $s \in (0, 1)$,
$$\begin{aligned}
f(y) &= f(x + (y - x)) \\
&= f(x) + \langle \nabla f(x), y - x \rangle + \tfrac{1}{2} \langle \nabla^2 f(x + s(y - x))(y - x), y - x \rangle \\
&> f(x) + \langle \nabla f(x), y - x \rangle,
\end{aligned}$$
since $y - x \ne 0$ and the Hessian is positive definite. This proves strict convexity of $f$.
♣ Not every strictly convex function has a positive definite Hessian
♣ Consider $f(x) = x^4$, for which $f''(0) = 0$
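For $n = 2$, positive definiteness of a symmetric matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ can be checked with Sylvester's criterion: $a > 0$ and $ac - b^2 > 0$. A minimal sketch, assuming the hypothetical example $f(x, y) = e^x + x^2 + xy + y^2$, whose Hessian $\begin{pmatrix} e^x + 2 & 1 \\ 1 & 2 \end{pmatrix}$ has determinant $2e^x + 3 > 0$; the theorem then predicts strict convexity, which we spot-check on random midpoints. For $x^4$ the $1 \times 1$ Hessian $12x^2$ fails the test at $x = 0$, in line with the last remark.

```python
import math
import random


def f(p):
    x, y = p
    return math.exp(x) + x * x + x * y + y * y


def hessian(p):
    x, _ = p
    return [[math.exp(x) + 2, 1.0], [1.0, 2.0]]


def is_positive_definite_2x2(h):
    """Sylvester's criterion for a symmetric 2x2 matrix: both leading principal minors > 0."""
    (a, b), (_, c) = h
    return a > 0 and a * c - b * b > 0


def midpoint_gap(p, q):
    """(f(p) + f(q)) / 2 - f((p + q) / 2): positive for p != q when f is strictly convex."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return (f(p) + f(q)) / 2 - f(mid)


rng = random.Random(0)
points = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(1000)]
print(all(is_positive_definite_2x2(hessian(p)) for p in points))   # Hessian PD at every sample

gaps = [midpoint_gap(points[i], points[i + 1]) for i in range(0, len(points), 2)]
print(min(gaps) > 0)   # strict midpoint inequality on every sampled pair
```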
strict convexity of $x^4$

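♣ Note: $g(x) = x^2$ is strictly convex, as $g''(x) = 2 > 0$ for all $x \in \mathbb{R}$

♣ The goal is to show that for any $x, y \in \mathbb{R}$ with $x \ne y$,
$$(\alpha x + (1 - \alpha) y)^4 < \alpha x^4 + (1 - \alpha) y^4 \quad \text{for all } \alpha \in (0, 1).$$

♣ Strict convexity of $x^2$ implies
$$(\alpha x + (1 - \alpha) y)^2 < \alpha x^2 + (1 - \alpha) y^2 \quad \text{for all } \alpha \in (0, 1).$$

♣ Both sides are nonnegative, so squaring and using the fact that $t \mapsto t^2$ is increasing on $[0, \infty)$, we get
$$(\alpha x + (1 - \alpha) y)^4 < \left(\alpha x^2 + (1 - \alpha) y^2\right)^2$$

♣ Now apply the convexity of $t \mapsto t^2$ to the points $x^2$ and $y^2$:
$$\left(\alpha x^2 + (1 - \alpha) y^2\right)^2 \le \alpha x^4 + (1 - \alpha) y^4,$$
which is strict when $x^2 \ne y^2$ but may be an equality otherwise (e.g. $y = -x$). Combining this with the previous strict inequality, we arrive at
$$(\alpha x + (1 - \alpha) y)^4 < \alpha x^4 + (1 - \alpha) y^4.$$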
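The chain of inequalities in this proof can be spot-checked numerically. A minimal sketch (random sampling, with ranges chosen by us) verifies $(\alpha x + (1 - \alpha) y)^4 < (\alpha x^2 + (1 - \alpha) y^2)^2 \le \alpha x^4 + (1 - \alpha) y^4$, and shows the case $y = -x$, where the second inequality is an equality while the first stays strict.

```python
import random


def chain(x, y, alpha):
    """The three quantities compared in the proof, for f(t) = t^4."""
    lhs = (alpha * x + (1 - alpha) * y) ** 4
    middle = (alpha * x ** 2 + (1 - alpha) * y ** 2) ** 2
    rhs = alpha * x ** 4 + (1 - alpha) * y ** 4
    return lhs, middle, rhs


rng = random.Random(0)
holds = True
for _ in range(10_000):
    x, y = rng.uniform(-2, 2), rng.uniform(-2, 2)
    if abs(x - y) < 1e-3:          # keep x != y well above round-off
        continue
    alpha = rng.uniform(0.01, 0.99)
    lhs, middle, rhs = chain(x, y, alpha)
    holds = holds and lhs < middle <= rhs + 1e-12
print(holds)   # True

# y = -x: the second inequality is an equality, the first one is still strict
print(chain(1.0, -1.0, 0.25))   # (0.0625, 1.0, 1.0)
```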



strong convexity

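♣ A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be strongly convex if there exists a $\lambda > 0$ such that
$$f(x) - \lambda \|x\|^2 \text{ is convex.}$$

♣ Here
$$\|x\|^2 := \sum_{i=1}^n x_i^2$$

♣ Observe that $g(x) = \|x\|^2$ is strictly convex, as
$$\nabla^2 g(x) = 2I, \quad \text{where } I \text{ is the identity matrix}$$

♣ Given a convex function $f : \mathbb{R}^n \to \mathbb{R}$, we can build a strongly convex function
$$f(x) + \mu \|x\|^2 \quad \text{for any } \mu > 0$$
(take $\lambda = \mu$: subtracting $\mu \|x\|^2$ gives back the convex function $f$)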
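A standard instance of this construction (our own example, not from the lecture) is a ridge-regularized logistic loss in one variable, $f(x) = \log(1 + e^x) + \mu x^2$. Here $f(x) - \mu x^2 = \log(1 + e^x)$ is convex, so $f$ is strongly convex with $\lambda = \mu$, and indeed $f''(x) = \sigma(x)(1 - \sigma(x)) + 2\mu \ge 2\mu$, where $\sigma$ is the logistic sigmoid. Without the regularizer the curvature tends to $0$ as $|x| \to \infty$, so $\log(1 + e^x)$ alone is not strongly convex. The value $\mu = 0.1$ and the sampling grid are arbitrary choices.

```python
import math


def sigmoid(t):
    # Two algebraically equal forms, chosen to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))


def softplus_dd(x):
    """Second derivative of the convex function log(1 + e^x)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def regularized_dd(x, mu):
    """Second derivative of log(1 + e^x) + mu * x^2."""
    return softplus_dd(x) + 2.0 * mu


mu = 0.1
grid = [-50 + 0.05 * k for k in range(2001)]
# f - mu x^2 = log(1 + e^x) is convex, so f is strongly convex with lambda = mu
print(min(regularized_dd(x, mu) for x in grid) >= 2 * mu)
# Without the regularizer the curvature decays to 0, so no lambda > 0 works for log(1 + e^x) alone
print(softplus_dd(-50.0), softplus_dd(50.0))
```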



hessian and strong convexity

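♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice differentiable strongly convex function,

i.e. $g(x) := f(x) - \lambda \|x\|^2$ is convex for some $\lambda > 0$,

i.e. $\nabla^2 g(x) = \nabla^2 f(x) - 2\lambda I$ is positive semidefinite for all $x \in \mathbb{R}^n$,

i.e. $\langle \nabla^2 g(x)p, p \rangle = \langle \nabla^2 f(x)p, p \rangle - \langle 2\lambda I p, p \rangle \ge 0$ for all $p \in \mathbb{R}^n$

$\implies \langle \nabla^2 f(x)p, p \rangle \ge 2\lambda \|p\|^2$ for all $p \in \mathbb{R}^n$.

Lemma
If $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable strongly convex function, then there exists a $\lambda > 0$ such that

$$\langle \nabla^2 f(x)p, p \rangle \ge 2\lambda \|p\|^2 \quad \text{for all } x, p \in \mathbb{R}^n.$$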
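For a quadratic $f(x) = \langle Ax, x \rangle$ with $A$ symmetric positive definite, $\nabla^2 f(x) = 2A$, and $f(x) - \lambda \|x\|^2 = \langle (A - \lambda I)x, x \rangle$ is convex exactly when $\lambda \le \lambda_{\min}(A)$, the smallest eigenvalue of $A$. The sketch below (the matrix $A$ is a hypothetical example) checks the lemma's bound $\langle \nabla^2 f(x)p, p \rangle \ge 2\lambda \|p\|^2$ with $\lambda = \lambda_{\min}(A)$ on random directions $p$; the ratio of the two sides approaches $1$ along the eigenvector of $\lambda_{\min}(A)$, so this constant cannot be improved.

```python
import math
import random

A = [[3.0, 1.0], [1.0, 2.0]]   # hypothetical symmetric positive definite matrix


def sym2_min_eig(h):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = h
    return (a + c) / 2 - math.hypot((a - c) / 2, b)


def quad_form(m, p):
    return sum(m[i][j] * p[i] * p[j] for i in range(2) for j in range(2))


hess = [[2 * a for a in row] for row in A]   # Hessian of f(x) = <Ax, x> is 2A (constant in x)
lam = sym2_min_eig(A)                       # f - lam ||x||^2 = <(A - lam I)x, x> is convex

rng = random.Random(0)
ratios = []
for _ in range(5000):
    p = [rng.gauss(0, 1), rng.gauss(0, 1)]
    ratios.append(quad_form(hess, p) / (2 * lam * (p[0] ** 2 + p[1] ** 2)))
print(lam, min(ratios))   # the minimum ratio is >= 1 (up to round-off) and close to 1
```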



strong convexity – further properties

♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice differentiable strongly convex function

♣ Taking $x, y \in \mathbb{R}^n$ and employing Taylor's theorem, we obtain
$$\begin{aligned}
f(y) &= f(x + (y - x)) \\
&= f(x) + \langle \nabla f(x), y - x \rangle + \tfrac{1}{2} \langle \nabla^2 f(x + s(y - x))(y - x), y - x \rangle
\end{aligned}$$
for some $s \in (0, 1)$.
♣ Applying the lemma from the previous slide with $p = y - x$, we deduce that there exists a $\lambda > 0$ such that
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \lambda \|y - x\|^2$$

Lemma
If $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable strongly convex function, then there exists a $\lambda > 0$ such that for all $x, y \in \mathbb{R}^n$,

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \lambda \|y - x\|^2$$
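The quadratic lower bound can be checked on the hypothetical one-variable example $f(x) = \log(1 + e^x) + \mu x^2$ with $\mu = 0.1$, for which $f - \mu x^2$ is convex and so $\lambda = \mu$. In this minimal sketch, the slack $f(y) - f(x) - f'(x)(y - x) - \lambda (y - x)^2$ stays nonnegative on random pairs, while doubling $\lambda$ makes it negative for some sampled pairs, because the curvature of $\log(1 + e^x)$ decays in the tails.

```python
import math
import random

MU = 0.1   # hypothetical regularization weight; lambda = MU since f - MU x^2 = log(1 + e^x) is convex


def f(x):
    return math.log1p(math.exp(x)) + MU * x * x


def df(x):
    return 1.0 / (1.0 + math.exp(-x)) + 2.0 * MU * x


def strong_gap(x, y, lam=MU):
    """f(y) - [f(x) + f'(x)(y - x) + lam (y - x)^2]; nonnegative when the lemma holds with lam."""
    return f(y) - f(x) - df(x) * (y - x) - lam * (y - x) ** 2


rng = random.Random(0)
pairs = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(5000)]
print(min(strong_gap(x, y) for x, y in pairs) >= -1e-12)          # holds with lambda = MU
print(min(strong_gap(x, y, lam=2 * MU) for x, y in pairs) < 0)    # fails with lambda = 2 MU
```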


strong convexity and minimizers

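♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice differentiable strongly convex function:

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \lambda \|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n$$

for some $\lambda > 0$.

♣ The Cauchy-Schwarz inequality says: for any $u, v \in \mathbb{R}^n$,

$$|\langle u, v \rangle| \le \|u\| \, \|v\|, \quad \text{i.e.} \quad -\|u\| \, \|v\| \le \langle u, v \rangle \le \|u\| \, \|v\|$$

♣ Using this in the earlier property of strong convexity, we deduce

$$f(y) \ge f(x) - \|\nabla f(x)\| \, \|y - x\| + \lambda \|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n$$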



strong convexity and minimizers (contd.)

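♣ Suppose $x^*$ is a global minimizer of $f$

♣ By the inequality from the previous slide (with $y = x^*$), we have

$$f(x^*) \ge f(x) - \|\nabla f(x)\| \, \|x^* - x\| + \lambda \|x^* - x\|^2 \quad \text{for all } x \in \mathbb{R}^n$$

♣ Since $x^*$ is a global minimizer, $f(x^*) \le f(x)$, so the above inequality gives $\lambda \|x^* - x\|^2 \le \|\nabla f(x)\| \, \|x^* - x\|$. Dividing by $\|x^* - x\|$ when it is nonzero (the case $x = x^*$ is trivial), we obtain

$$\|x^* - x\| \le \frac{1}{\lambda} \|\nabla f(x)\| \quad \text{for all } x \in \mathbb{R}^n$$

♣ From the last inequality, we can conclude that
  ▸ the smaller the gradient of $f$ at a point, the closer that point is to a global minimizer
  ▸ a strongly convex function has at most one global minimizer (apply the inequality at a second minimizer, where the gradient vanishes)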
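The bound $\|x^* - x\| \le \|\nabla f(x)\| / \lambda$ gives a certified stopping rule for iterative methods: once $\|\nabla f(x)\| \le \lambda \varepsilon$, the iterate is within $\varepsilon$ of the minimizer. A minimal sketch, assuming plain gradient descent with step size $1/L$ (a standard choice, not discussed in this lecture) on the hypothetical quadratic $f(x) = \langle A(x - c), x - c \rangle$, whose minimizer is $c$, whose Hessian is $2A$ (so $L = 2\lambda_{\max}(A)$), and which is strongly convex with $\lambda = \lambda_{\min}(A)$:

```python
import math

A = [[3.0, 1.0], [1.0, 2.0]]   # hypothetical symmetric positive definite matrix
C = [1.0, -2.0]                # hypothetical minimizer x* = C


def grad(x):
    """Gradient of f(x) = <A(x - C), x - C>, namely 2 A (x - C)."""
    d = [x[0] - C[0], x[1] - C[1]]
    return [2 * (A[i][0] * d[0] + A[i][1] * d[1]) for i in range(2)]


def eig_range(m):
    """(smallest, largest) eigenvalue of a symmetric 2x2 matrix."""
    (a, b), (_, c) = m
    r = math.hypot((a - c) / 2, b)
    return (a + c) / 2 - r, (a + c) / 2 + r


lam, lam_max = eig_range(A)      # f - lam ||x||^2 is convex, so lambda = lam_min(A)
step = 1.0 / (2 * lam_max)       # 1 / L, where L = 2 lam_max bounds the Hessian 2A


def gradient_descent(x, tol=1e-8, max_iter=10_000):
    """Stop once ||grad f(x)|| <= lam * tol, which guarantees ||x - x*|| <= tol."""
    for k in range(max_iter):
        g = grad(x)
        if math.hypot(*g) <= lam * tol:
            return x, k
        x = [x[0] - step * g[0], x[1] - step * g[1]]
    return x, max_iter


x_final, iters = gradient_descent([10.0, 10.0])
print(iters, math.hypot(x_final[0] - C[0], x_final[1] - C[1]))   # distance to x* is at most 1e-8
```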



strongly, strictly and just

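♣ Consider a function $f : \mathbb{R}^n \to \mathbb{R}$.

$$f \text{ is strongly convex} \implies f \text{ is strictly convex} \implies f \text{ is convex}$$

♣ That strict convexity implies convexity is obvious

♣ Suppose $f$ is strongly convex. Then there exists a $\lambda > 0$ such that $f(x) - \lambda \|x\|^2$ is convex, so for all $x, y \in \mathbb{R}^n$ and $\alpha \in (0, 1)$,
$$\begin{aligned}
f(\alpha x + (1 - \alpha) y) - \lambda \|\alpha x + (1 - \alpha) y\|^2
&\le \alpha \left( f(x) - \lambda \|x\|^2 \right) + (1 - \alpha) \left( f(y) - \lambda \|y\|^2 \right) \\
&= \alpha f(x) + (1 - \alpha) f(y) - \lambda \left( \alpha \|x\|^2 + (1 - \alpha) \|y\|^2 \right)
\end{aligned}$$

♣ Rearranging the above inequality yields
$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y) + \lambda \left( \|\alpha x + (1 - \alpha) y\|^2 - \alpha \|x\|^2 - (1 - \alpha) \|y\|^2 \right)$$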



strongly, strictly and just (contd.)

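$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y) + \lambda \left( \|\alpha x + (1 - \alpha) y\|^2 - \alpha \|x\|^2 - (1 - \alpha) \|y\|^2 \right)$$

♣ As $\|x\|^2$ is strictly convex, the term in parentheses is strictly negative whenever $x \ne y$ and $\alpha \in (0, 1)$

♣ Hence $f$ is strictly convex

reverse implications are not true

♣ $f : \mathbb{R} \to \mathbb{R}$ defined as $f(x) = x$ is convex but not strictly convex

♣ $f : \mathbb{R} \to \mathbb{R}$ defined as $f(x) = x^4$ is strictly convex but not strongly convex
  ▸ There does not exist a $\lambda > 0$ such that $x^4 - \lambda x^2$ is convex on $\mathbb{R}$: its second derivative $12x^2 - 2\lambda$ is negative at $x = 0$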
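Both counterexamples can be checked at a single midpoint. In this minimal sketch (helper name ours), for $f(x) = x$ the midpoint gap $\frac{f(x) + f(y)}{2} - f\left(\frac{x + y}{2}\right)$ vanishes, so $f$ is not strictly convex; for $g(x) = x^4 - \lambda x^2$ and $\delta = \sqrt{\lambda}/2$, the gap at $\pm\delta$ equals $\delta^4 - \lambda \delta^2 = -3\lambda^2/16 < 0$, so $g$ is not convex for any of the sampled $\lambda > 0$.

```python
import math


def midpoint_gap(f, x, y):
    """(f(x) + f(y)) / 2 - f((x + y) / 2): >= 0 for convex f, > 0 (x != y) for strictly convex f."""
    return (f(x) + f(y)) / 2 - f((x + y) / 2)


# f(x) = x is convex but not strictly convex: the midpoint gap is exactly zero
print(midpoint_gap(lambda t: t, -1.0, 3.0))   # 0.0

# x^4 - lam x^2 is not convex: with delta = sqrt(lam) / 2 the gap is -3 lam^2 / 16 < 0
for lam in [1e-6, 1e-3, 1.0, 100.0]:
    g = lambda t, lam=lam: t ** 4 - lam * t ** 2
    delta = math.sqrt(lam) / 2
    print(lam, midpoint_gap(g, -delta, delta))   # negative, so convexity fails
```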



recap (characterization of convex functions)

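♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function.

$$f \text{ is convex} \iff f(y) \ge f(x) + \nabla f(x) \cdot (y - x) \ \text{ for every } x, y \in \mathbb{R}^n$$

♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function.

$$f \text{ is convex} \iff \langle \nabla^2 f(x)p, p \rangle \ge 0 \ \text{ for every } x \in \mathbb{R}^n \text{ and all } p \in \mathbb{R}^n$$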



recap (contd.)

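♣ A function $f$ is said to be strictly convex if for all $x, y \in \mathbb{R}^n$, $x \ne y$,

$$f(\alpha x + (1 - \alpha) y) < \alpha f(x) + (1 - \alpha) f(y) \quad \text{for } \alpha \in (0, 1)$$

♣ For differentiable $f$,

$$f \text{ is strictly convex} \iff f(y) > f(x) + \nabla f(x) \cdot (y - x) \ \text{ for every } x, y \in \mathbb{R}^n, \ x \ne y$$

♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function.

$$f \text{ is strictly convex} \impliedby \langle \nabla^2 f(x)p, p \rangle > 0 \ \text{ for every } x \in \mathbb{R}^n \text{ and all } p \in \mathbb{R}^n \setminus \{0\}$$

♣ A strictly convex function has at most one global minimizer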


recap (contd.)

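♣ A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be strongly convex if there exists a $\lambda > 0$ such that
$$f(x) - \lambda \|x\|^2 \text{ is convex.}$$

♣ Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function.

$$f \text{ is strongly convex} \implies \langle \nabla^2 f(x)p, p \rangle \ge 2\lambda \|p\|^2 \ \text{ for every } x \in \mathbb{R}^n \text{ and all } p \in \mathbb{R}^n$$

$$f \text{ is strongly convex} \implies f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \lambda \|y - x\|^2 \ \text{ for every } x, y \in \mathbb{R}^n$$

♣ Consider a function $f : \mathbb{R}^n \to \mathbb{R}$.

$$f \text{ is strongly convex} \implies f \text{ is strictly convex} \implies f \text{ is convex}$$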
strong convexity – further properties

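♣ Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is strongly convex, i.e. there exists $\lambda > 0$ such that for all $x, y \in \mathbb{R}^n$ and $\alpha \in [0, 1]$,

$$\begin{aligned}
f(\alpha x + (1 - \alpha) y) - \lambda \|\alpha x + (1 - \alpha) y\|^2
&\le \alpha \left( f(x) - \lambda \|x\|^2 \right) + (1 - \alpha) \left( f(y) - \lambda \|y\|^2 \right) \\
&= \alpha f(x) + (1 - \alpha) f(y) - \lambda \left( \alpha \|x\|^2 + (1 - \alpha) \|y\|^2 \right)
\end{aligned}$$

♣ Rearranging the above inequality yields
$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y) + \lambda \left( \|\alpha x + (1 - \alpha) y\|^2 - \alpha \|x\|^2 - (1 - \alpha) \|y\|^2 \right)$$

♣ Note that
$$\|\alpha x + (1 - \alpha) y\|^2 = \alpha^2 \|x\|^2 + (1 - \alpha)^2 \|y\|^2 + 2\alpha(1 - \alpha) \langle x, y \rangle$$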



strong convexity – further properties (contd.)

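♣ Hence we get
$$\begin{aligned}
&\|\alpha x + (1 - \alpha) y\|^2 - \alpha \|x\|^2 - (1 - \alpha) \|y\|^2 \\
&\quad = \alpha^2 \|x\|^2 + (1 - \alpha)^2 \|y\|^2 + 2\alpha(1 - \alpha) \langle x, y \rangle - \alpha \|x\|^2 - (1 - \alpha) \|y\|^2 \\
&\quad = -\alpha(1 - \alpha) \|x\|^2 - (1 - \alpha)\alpha \|y\|^2 + 2\alpha(1 - \alpha) \langle x, y \rangle \\
&\quad = -\alpha(1 - \alpha) \|x - y\|^2
\end{aligned}$$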
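The algebraic identity above can be spot-checked in floating point. A minimal sketch comparing both sides on random vectors of random dimension and random $\alpha \in [0, 1)$; only round-off should remain.

```python
import random


def sq_norm(v):
    return sum(t * t for t in v)


def identity_sides(x, y, alpha):
    """Both sides of ||a x + (1-a) y||^2 - a ||x||^2 - (1-a) ||y||^2 = -a (1-a) ||x - y||^2."""
    combo = [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]
    lhs = sq_norm(combo) - alpha * sq_norm(x) - (1 - alpha) * sq_norm(y)
    rhs = -alpha * (1 - alpha) * sq_norm([xi - yi for xi, yi in zip(x, y)])
    return lhs, rhs


rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    n = rng.randint(1, 6)
    x = [rng.uniform(-5, 5) for _ in range(n)]
    y = [rng.uniform(-5, 5) for _ in range(n)]
    lhs, rhs = identity_sides(x, y, rng.random())
    worst = max(worst, abs(lhs - rhs))
print(worst)   # only floating-point round-off remains
```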

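♣ Putting it all together, we obtain

$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y) - \lambda \alpha (1 - \alpha) \|x - y\|^2$$

Lemma
If $f : \mathbb{R}^n \to \mathbb{R}$ is a strongly convex function, then there exists a $\lambda > 0$ such that for all $x, y \in \mathbb{R}^n$ and $\alpha \in [0, 1]$,

$$f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y) - \lambda \alpha (1 - \alpha) \|x - y\|^2$$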
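A minimal sketch of the lemma on a hypothetical example of our own, $f(x) = \|x\|^2 + \log \sum_i e^{x_i}$ on $\mathbb{R}^3$: since $f(x) - \|x\|^2$ is the convex log-sum-exp function, $f$ is strongly convex with $\lambda = 1$. The slack in the lemma's inequality stays nonnegative on random samples, while $\lambda = 1.5$ already fails along the direction $(1, 1, 1)$, on which log-sum-exp is affine.

```python
import math
import random


def logsumexp(v):
    m = max(v)
    return m + math.log(sum(math.exp(t - m) for t in v))


def f(v):
    """Hypothetical strongly convex example: ||v||^2 + logsumexp(v); f - 1 * ||v||^2 is convex."""
    return sum(t * t for t in v) + logsumexp(v)


def lemma_slack(x, y, alpha, lam=1.0):
    """alpha f(x) + (1-alpha) f(y) - lam alpha (1-alpha) ||x-y||^2 - f(alpha x + (1-alpha) y)."""
    z = [alpha * a + (1 - alpha) * b for a, b in zip(x, y)]
    dist2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return alpha * f(x) + (1 - alpha) * f(y) - lam * alpha * (1 - alpha) * dist2 - f(z)


rng = random.Random(0)
slacks = []
for _ in range(2000):
    x = [rng.uniform(-3, 3) for _ in range(3)]
    y = [rng.uniform(-3, 3) for _ in range(3)]
    slacks.append(lemma_slack(x, y, rng.random()))
print(min(slacks) >= -1e-9)   # the lemma's inequality holds with lambda = 1, up to round-off

# lambda = 1.5 is too large for this f: negative slack along the direction (1, 1, 1)
print(lemma_slack([1.0] * 3, [-1.0] * 3, 0.5, lam=1.5))
```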


end of lecture 3
thank you for your attention
