Lecture 32-34
Unconstrained Optimization
BITS Pilani
Pilani Campus
Outline
• Unconstrained problems
– Single variable
– Two variables
– Multiple variables
• Unconstrained algorithms
– Ternary search
– Dichotomous and Golden-section
– Gradient descent/ascent method
Unconstrained problems
Unconstrained Problems
• An extreme point of a function f(X) defines either a
maximum or a minimum of the function.
[Figure: examples of strong maxima, weak maxima, and inflection/saddle points]
Single-variable function
• Necessary condition for x0 to be an extreme point of z = f(x):
dz/dx = 0 at x0
– Points obtained from dz/dx = 0 are called stationary points; the condition is also satisfied at inflection/saddle points, so it is necessary but not sufficient.
Single-variable function
• Sufficient condition: Given x0, a stationary point of f(x), if the first (n − 1) derivatives are zero:
dz/dx = d²z/dx² = ⋯ = dⁿ⁻¹z/dxⁿ⁻¹ = 0 and dⁿz/dxⁿ ≠ 0 at x0
– If n is even: x0 is a local minimum when dⁿz/dxⁿ > 0 and a local maximum when dⁿz/dxⁿ < 0.
– If n is odd: x0 is an inflection point.
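The test translates directly into code. Below is a minimal sketch using sympy; the helper name classify_stationary_point is ours, and f is assumed to have some non-vanishing derivative at the stationary point x0 (otherwise the loop would not terminate):

import sympy as sp

def classify_stationary_point(f, x, x0):
    # Differentiate until the first derivative that does not vanish at x0;
    # n is its order. x0 is assumed to be a stationary point of f.
    n, d = 1, sp.diff(f, x)
    while sp.simplify(d.subs(x, x0)) == 0:
        n, d = n + 1, sp.diff(d, x)
    if n % 2 == 1:
        return "inflection point"           # n odd
    return "local minimum" if d.subs(x, x0) > 0 else "local maximum"

x = sp.symbols("x")
print(classify_stationary_point(x**4, x, 0))         # local minimum (n = 4)
print(classify_stationary_point(x**3, x, 0))         # inflection point (n = 3)
print(classify_stationary_point(-(x - 1)**2, x, 1))  # local maximum (n = 2)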
Two-variable function
• Necessary condition for (x0, y0) to be an extreme
point of z = f(x, y) is:
∂z/∂x = 0 and ∂z/∂y = 0 at (x0, y0)
– also satisfied at inflection/saddle points, so it is necessary but not sufficient.
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
• if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0:
– if ∂²z/∂x² > 0: local minimum
– if ∂²z/∂x² < 0: local maximum
• if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² < 0: saddle point
• if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = 0: inconclusive…
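A short sympy sketch of this test; the function name second_derivative_test is ours, and the two sample functions are standard illustrations rather than lecture examples:

import sympy as sp

x, y = sp.symbols("x y")

def second_derivative_test(z, x0, y0):
    # Discriminant D = z_xx * z_yy - z_xy**2, evaluated at (x0, y0).
    pt = {x: x0, y: y0}
    zxx = sp.diff(z, x, 2).subs(pt)
    zyy = sp.diff(z, y, 2).subs(pt)
    zxy = sp.diff(z, x, y).subs(pt)
    D = zxx * zyy - zxy**2
    if D > 0:
        return "local minimum" if zxx > 0 else "local maximum"
    return "saddle point" if D < 0 else "inconclusive"

print(second_derivative_test(x**2 + y**2, 0, 0))  # local minimum
print(second_derivative_test(x**2 - y**2, 0, 0))  # saddle point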
Two-variable function
Example: Consider the function:
z = x² + y² − a·xy, a ∈ ℝ
– The stationary point is (0, 0), with
(∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = (2)(2) − (−a)² = 4 − a²
– A saddle point when 4 − a² < 0, i.e. |a| > 2.
Two-variable function (alternative form)
• Principal minor: Let A be an n×n matrix. A k×k submatrix of A formed by deleting n−k rows of A, and the same n−k columns of A, is called a principal submatrix of A. The determinant of a principal submatrix of A is called a principal minor of A.
Two-variable function (alternative form)
      a11 a12 a13
A =   a21 a22 a23
      a31 a32 a33
• Solution:
– 1st order: a11, a22, and a33
– 2nd order: the three 2×2 determinants obtained by keeping rows and columns {1,2}, {1,3}, and {2,3}
– 3rd order: det(A)
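Both kinds of minors are easy to enumerate numerically. A minimal numpy sketch (function names are ours; the sample matrix is an illustration, not from the slides):

import numpy as np
from itertools import combinations

def principal_minors(A):
    # Determinants of all submatrices keeping the SAME rows and columns.
    n = A.shape[0]
    return [np.linalg.det(A[np.ix_(idx, idx)])
            for k in range(1, n + 1)
            for idx in combinations(range(n), k)]

def leading_principal_minors(A):
    # D1, ..., Dn: keep the first k rows and columns.
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(leading_principal_minors(A))  # [2.0, 3.0, 4.0] -> positive definite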
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
– H(x0, y0) is positive definite ⇒ local minimum
– A) The matrix is positive definite if and only if all leading principal minors of the matrix are positive.
– D1 > 0 and D2 > 0, i.e.:
∂²z/∂x² > 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
– H(x0, y0) is negative definite ⇒ local maximum
– B) The matrix is negative definite if and only if its kth order leading principal minor has sign (−1)^k for every k.
– D1 < 0 and D2 > 0, i.e.:
∂²z/∂x² < 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
– H(x0, y0) is indefinite ⇒ saddle point.
– C) If some kth order leading principal minor is non-zero but the minors fit neither of the sign patterns above, the matrix is indefinite.
– Neither A nor B holds and D2 ≠ 0, i.e.:
(∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² < 0
Multi-variable function
• Necessary condition for X0 = (x1, x2, …, xn) to be an extreme point of z = f(x1, x2, …, xn) = f(X):
∂z/∂x1 = 0, ∂z/∂x2 = 0, …, ∂z/∂xn = 0 at X0
or
∇z(X0) = 0
– also satisfied at inflection/saddle points.
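A small sketch of finding stationary points by solving ∇z(X) = 0 with sympy; the function used here is the one that reappears in Example 21.1-2 later in the lecture:

import sympy as sp

x1, x2 = sp.symbols("x1 x2")
z = 4*x1 + 6*x2 - 2*x1**2 - 2*x1*x2 - 2*x2**2  # f from Example 21.1-2

# Stationary points: solve the system dz/dx1 = 0, dz/dx2 = 0.
grad = [sp.diff(z, v) for v in (x1, x2)]
print(sp.solve(grad, [x1, x2]))  # {x1: 1/3, x2: 4/3}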
Multi-variable function
• Sufficient condition: Given X0 = (x1, x2, x3, …, xn), a stationary point of z = f(X), evaluate the Hessian matrix H at X0:

H =
⎡ ∂²z/∂x1²     ∂²z/∂x1∂x2   …   ∂²z/∂x1∂xn ⎤
⎢ ∂²z/∂x2∂x1   ∂²z/∂x2²     …   ∂²z/∂x2∂xn ⎥
⎢      ⋮            ⋮        ⋱       ⋮      ⎥
⎣ ∂²z/∂xn∂x1   ∂²z/∂xn∂x2   …   ∂²z/∂xn²   ⎦
Type of matrix A         Necessary and sufficient condition             Two-variable function
Positive definite        All leading principal minors of A are > 0.     Minimum
Positive semi-definite   All principal minors of A are ≥ 0.             Inconclusive
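For numerical work, an eigenvalue check is a convenient equivalent of the minor tests for real symmetric matrices. A minimal numpy sketch (the function name and tolerance are ours):

import numpy as np

def classify_symmetric(A, tol=1e-12):
    # Eigenvalue signs of a real symmetric matrix give its definiteness.
    w = np.linalg.eigvalsh(A)
    if np.all(w > tol):
        return "positive definite"   # stationary point is a minimum
    if np.all(w < -tol):
        return "negative definite"   # stationary point is a maximum
    if (w > tol).any() and (w < -tol).any():
        return "indefinite"          # saddle point
    return "semi-definite"           # test inconclusive

print(classify_symmetric(np.array([[2.0, 0.0], [0.0, 2.0]])))   # positive definite
print(classify_symmetric(np.array([[4.0, 2.0], [2.0, -2.0]])))  # indefinite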
Example 20-3.
Verify that the function:
f(X) = 2x1x2x3 − 4x1x3 − 2x2x3 + x1² + x2² + x3² − 2x1 − 4x2 + 4x3
has the stationary points:
(0, 3, 1), (0, 1, −1), (1, 2, 0), (2, 1, 1), and (2, 3, −1).
Use the sufficiency condition to identify the extreme points at:
– (1, 2, 0)
– (0, 1, −1)
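A sympy sketch to check stationarity at the listed points and classify them via Hessian eigenvalues (equivalent to the principal-minor test for this real symmetric H):

import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
f = (2*x1*x2*x3 - 4*x1*x3 - 2*x2*x3
     + x1**2 + x2**2 + x3**2 - 2*x1 - 4*x2 + 4*x3)

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2, x3)])
H = sp.hessian(f, (x1, x2, x3))

for p in [(0, 3, 1), (0, 1, -1), (1, 2, 0), (2, 1, 1), (2, 3, -1)]:
    pt = dict(zip((x1, x2, x3), p))
    g = list(grad.subs(pt))              # all zeros: p is stationary
    eigs = list(H.subs(pt).eigenvals())  # eigenvalue signs classify p
    print(p, g, eigs)

At (1, 2, 0) the Hessian is 2I (all eigenvalues positive: a local minimum); at (0, 1, −1) the eigenvalues are 4, 4, −2 (indefinite: a saddle point).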
Multi-variable function
• Notes
– Matrices have to be real and symmetric when considering
their definiteness.
– Hessian is real and symmetric.
Direct search methods
Direct search methods
• Apply primarily to strictly unimodal single-variable
functions.
• Identify interval of uncertainty known to include the
optimum solution point.
• Locate the optimum by iteratively narrowing the
interval of uncertainty to a desired level of accuracy.
Direct search methods
• A unimodal function is one that has only one peak
(maximum) or valley (minimum) in a given
interval.
Direct search methods
• Three search algorithms:
– Ternary
– Dichotomous
– Golden-section.
[Figure: a unimodal f(x) on the interval a ≤ x ≤ b]
Maximize f(x) in: a ≤ x ≤ b, I0 = [a, b]
Direct search methods
• Let xL < x1 < x2 < xR be two interior points of the current interval, and compare f(x1) with f(x2):
– If f(x1) > f(x2) (the points straddle the maximum, or both lie to its right), the maximum cannot lie in (x2, xR]: replace xR with x2.
– If f(x1) < f(x2) (the points straddle the maximum, or both lie to its left), the maximum cannot lie in [xL, x1): replace xL with x1.
[Figures: the four cases on a unimodal f(x) with points xL, x1, x2, xR]
Direct search methods
• Fundamental idea (**): evaluate f at two interior points and discard the part of the interval that cannot contain the maximum.
Ternary search
• Divide the interval into three equal parts of width L, with interior points x1 and x2.
• Use **.
• Next interval = 2/3 of current interval.
• Terminate when the interval width In ≤ Δ, and return
x* = (xL + xR)/2
[Figure: xL, x1, x2, xR dividing the interval into thirds of width L]
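A minimal sketch of the maximization version in Python (the function name and tolerance default are ours):

def ternary_search_max(f, a, b, delta=1e-6):
    # Maximize a unimodal f on [a, b]: evaluate f at the two interior
    # third-points and use ** to discard one third per iteration.
    while b - a > delta:
        x1 = a + (b - a) / 3
        x2 = b - (b - a) / 3
        if f(x1) > f(x2):
            b = x2           # maximum lies in [a, x2]
        else:
            a = x1           # maximum lies in [x1, b]
    return (a + b) / 2       # x* = midpoint of the final interval

print(ternary_search_max(lambda x: -(x - 2)**2, 0.0, 5.0))  # ~2.0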
Dichotomous search
• Divide the interval:
– x1 = (xL + xR)/2 − δ/2
– x2 = (xL + xR)/2 + δ/2
• Use **.
• I1 = I0/2 + δ/2
• I2 = I1/2 + δ/2 = I0/4 + δ/4 + δ/2
• In = I0/2ⁿ + δ(1 − 1/2ⁿ)
[Figure: x1 and x2 placed δ apart around the midpoint of [xL, xR]]
Dichotomous search
In = δ + I0/2ⁿ − δ/2ⁿ
Note: In ≥ δ
• Interval narrowing slows appreciably as In → δ.
• When to stop? When
δ + I0/2ⁿ − δ/2ⁿ ≤ Δ
so Δ must be chosen larger than δ.
[Figure: f(x1) and f(x2) at points δ apart near the midpoint]
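A minimal sketch (names and the default tolerances are ours; note Delta > delta, as required by the stopping rule):

def dichotomous_search_max(f, a, b, delta=1e-7, Delta=1e-5):
    # Maximize a unimodal f on [a, b]: place two points delta apart
    # around the midpoint and apply **. Requires Delta > delta.
    while b - a > Delta:
        mid = (a + b) / 2
        x1, x2 = mid - delta / 2, mid + delta / 2
        if f(x1) > f(x2):
            b = x2           # maximum lies in [a, x2]
        else:
            a = x1           # maximum lies in [x1, b]
    return (a + b) / 2

print(dichotomous_search_max(lambda x: -(x - 2)**2, 0.0, 5.0))  # ~2.0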
Golden-section search
• Save computations by reusing discarded values:
x1 = xR − α(xR − xL)
x2 = xL + α(xR − xL)
0 < α < 1
[Figure: f(x1) and f(x2) at the interior points of [xL, xR]]
Golden-section search
• Save computations by reusing discarded values.
• Consider Inew = (xL, x2): the old x1 can be reused as the new x2.
[Figure: the narrowed interval Inew = (xL, x2)]
Golden-section search
• Save computations by reusing discarded values.
• Requiring x2,new = x1 gives α² = 1 − α, whose positive root is
α = (−1 + √5)/2 ≈ 0.618
[Figure: xL, x1,new, x1 = x2,new, xR]
Golden-section search
• Divide the interval:
x1 = xR − α(xR − xL)
x2 = xL + α(xR − xL)
α = (−1 + √5)/2 ≈ 0.618
• Use **, reusing one function value from the previous iteration.
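A minimal sketch showing the reuse of one interior point and its function value, so each iteration costs a single new evaluation of f (names are ours):

ALPHA = (5 ** 0.5 - 1) / 2   # (-1 + sqrt(5)) / 2, approximately 0.618

def golden_section_max(f, a, b, delta=1e-6):
    # Maximize a unimodal f on [a, b], keeping one interior point
    # (and its f-value) from the previous iteration.
    x1, x2 = b - ALPHA * (b - a), a + ALPHA * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > delta:
        if f1 > f2:                      # keep [a, x2]; old x1 -> new x2
            b, x2, f2 = x2, x1, f1
            x1 = b - ALPHA * (b - a)
            f1 = f(x1)
        else:                            # keep [x1, b]; old x2 -> new x1
            a, x1, f1 = x1, x2, f2
            x2 = a + ALPHA * (b - a)
            f2 = f(x2)
    return (a + b) / 2

print(golden_section_max(lambda x: -(x - 2)**2, 0.0, 5.0))  # ~2.0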
Golden-section vs dichotomous search
[Figure: comparison of interval narrowing for golden-section and dichotomous search]
Notes
A unimodal function f(x) on [a, b] satisfies, for some unique x* ∈ [a, b]:
f(x) is strictly increasing (decreasing) for x ≤ x* and strictly decreasing (increasing) for x ≥ x*.
Steepest Ascent Method
• For optimizing twice continuously differentiable
functions.
• Generate successive points in the direction of the
gradient of the function.
Gradients and level curves
[Figure: level curves with gradient vectors at points P1, P2, P3]
https://math.illinois.edu/research/igl/projects/spring/2019/interactive-visualizations-calculus-and-probability
Gradients and level curves
• The gradient is always perpendicular to the level curves.
• Where the level curves are close together, the gradient is large.
[Figures: successive gradient steps P1 → P2 → P3 on the level curves; when to stop?]
https://mathinsight.org/directional_derivative_gradient_introduction
Example 21.1-2
f(x1, x2) = 4x1 + 6x2 − 2x1² − 2x1x2 − 2x2²
X0 = (1, 1), ∇f(X0) = (−2, 0)
[Figures: level curves of f with the steepest-ascent iterates P0, P1, …]
Steepest Ascent Method
• f(X) is to be maximized; let Xk be the current point.
• Next point:
Xk+1 = Xk + rk ∇f(Xk)
– rk is the optimal step size at Xk: the r that maximizes h(r) = f(Xk + r ∇f(Xk)).
Steepest Descent Method
• f(X) is to be minimized; let Xk be the current point.
• Next point:
Xk+1 = Xk − rk ∇f(Xk)
– rk is the optimal step size at Xk.
• Terminate when ∇f(Xk) ≈ 0.
Example
• Minimize:
f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2²
• Start at X0 = (0, 0):
∇f = (1 + 4x1 + 2x2, −1 + 2x1 + 2x2)
∇f(0, 0) = (1, −1)
X1 = (0, 0) − r(1, −1) = (−r, r)
h(r) = f(−r, r) = −r − r + 2r² − 2r² + r²
     = r² − 2r
     = (r − 1)² − 1
– h(r) is minimized at r0 = 1, giving X1 = (−1, 1).
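A minimal steepest-descent sketch for this example; the golden-section line search for rk and its bracket [0, 2] are our choices (the lecture leaves the one-dimensional minimizer open), and the tolerance names are ours:

import numpy as np

def f(x):
    x1, x2 = x
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2

def grad_f(x):
    x1, x2 = x
    return np.array([1 + 4 * x1 + 2 * x2, -1 + 2 * x1 + 2 * x2])

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=200):
    # X(k+1) = X(k) - r_k * grad f(X(k)), with r_k minimizing
    # h(r) = f(X(k) - r * grad f(X(k))) over the assumed bracket [0, 2].
    alpha = (5 ** 0.5 - 1) / 2
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:      # stop when the gradient ~ 0
            break
        h = lambda r: f(x - r * g)
        a, b = 0.0, 2.0
        while b - a > 1e-9:              # golden-section minimization of h
            r1, r2 = b - alpha * (b - a), a + alpha * (b - a)
            if h(r1) < h(r2):
                b = r2
            else:
                a = r1
        x = x - (a + b) / 2 * g
    return x

print(steepest_descent(f, grad_f, (0.0, 0.0)))  # ~[-1.0, 1.5]

The first iteration reproduces the hand computation above (r0 = 1, X1 = (−1, 1)); the iterates then zigzag toward the true minimum (−1, 3/2), where ∇f = 0.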