
ENGINEERING OPTIMIZATION

Unconstrained Optimization

BITS Pilani
Pilani Campus
Outline
• Unconstrained problems
– Single variable
– Two variables
– Multiple variables

• Unconstrained algorithms
– Ternary search
– Dichotomous and Golden-section
– Gradient descent/ascent method

Unconstrained problems
Unconstrained Problems
• An extreme point of a function f(X) defines either a
maximum or a minimum of the function.

• Point X₀ = (x₁⁰, x₂⁰, …, xₙ⁰) is a maximum if:

  f(X₀ + h) ≤ f(X₀)

  for all h = (h₁, h₂, …, hₙ), where |hⱼ| is sufficiently small.

• Point X₀ = (x₁⁰, x₂⁰, …, xₙ⁰) is a minimum if:

  f(X₀ + h) ≥ f(X₀)
[Figure: a curve showing strong maxima, a weak maximum, and an inflection/saddle point]
Single-variable function
• Necessary condition for x₀ to be an extreme point of z = f(x):

  dz/dx = 0

  – also satisfied at inflection/saddle points ⇒ points obtained from dz/dx = 0 are called stationary points.
Single-variable function
• Sufficient condition: Given x₀, a stationary point of f(x), if the first (n − 1) derivatives are zero:

  dz/dx = d²z/dx² = ⋯ = dⁿ⁻¹z/dxⁿ⁻¹ = 0 and dⁿz/dxⁿ ≠ 0

  i. If n is odd ⇒ x₀ is an inflection point.

  ii. If n is even ⇒
    • x₀ is a minimum if dⁿz/dxⁿ > 0
    • x₀ is a maximum if dⁿz/dxⁿ < 0
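A minimal sketch (my addition, not from the slides) applying this higher-order derivative test in Python with SymPy; f(x) = x⁴ is a sample choice whose first three derivatives vanish at x₀ = 0:

```python
import sympy as sp

x = sp.symbols('x')
f = x**4          # sample function; stationary point at x0 = 0
x0 = 0

# Find the order n of the first non-vanishing derivative at x0.
n, deriv = 1, sp.diff(f, x)
while deriv.subs(x, x0) == 0:
    n, deriv = n + 1, sp.diff(deriv, x)

value = deriv.subs(x, x0)
if n % 2 == 1:
    print(f"x0 = {x0}: inflection point (n = {n} is odd)")
elif value > 0:
    print(f"x0 = {x0}: minimum (n = {n}, d^n z/dx^n = {value} > 0)")
else:
    print(f"x0 = {x0}: maximum (n = {n}, d^n z/dx^n = {value} < 0)")
```

Here the loop stops at n = 4 with dⁿz/dxⁿ = 24 > 0, so x₀ = 0 is a minimum.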
Two-variable function

https://www.youtube.com/watch?v=_Ffcr98c7EE
Two-variable function
• Necessary condition for (x₀, y₀) to be an extreme point of z = f(x, y):

  ∂z/∂x = 0 and ∂z/∂y = 0

  – also satisfied at inflection/saddle points.

What is the sufficient condition?
Two-variable function
• Sufficient condition: Given (x₀, y₀), a stationary point of f(x, y), then:

  • if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0
    – if ∂²z/∂x² > 0 ⇒ local minimum
    – if ∂²z/∂x² < 0 ⇒ local maximum

  • if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² < 0 ⇒ saddle point

  • if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = 0 ⇒ inconclusive…
Two-variable function
Example: Consider the function:

  z = x² + y² − a(xy),  a ∈ ℝ

Compute the values of a for which (0, 0) is

  – A minimum:

  – A saddle point:
Two-variable function
Example: Consider the function:

  z = x² + y² − a(xy),  a ∈ ℝ

Compute the values of a for which (0, 0) is

  – A minimum: ∂²z/∂x² = ? > 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = ? > 0

  – A saddle point: (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = ? < 0
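Filling in the question marks (a short worked check of my own; the arithmetic follows directly from the slide's test):

```latex
\frac{\partial^2 z}{\partial x^2}=2,\quad
\frac{\partial^2 z}{\partial y^2}=2,\quad
\frac{\partial^2 z}{\partial x\,\partial y}=-a
\;\Longrightarrow\;
\frac{\partial^2 z}{\partial x^2}\,\frac{\partial^2 z}{\partial y^2}
-\Bigl(\frac{\partial^2 z}{\partial x\,\partial y}\Bigr)^{2}=4-a^{2}.
```

So (0, 0) is a minimum when 4 − a² > 0, i.e. −2 < a < 2, and a saddle point when |a| > 2.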
Two-variable function (alternative form)
• Principal minor: Let A be an n×n matrix. A k×k submatrix of A, formed by deleting n−k rows of A and the same n−k columns of A, is called a principal submatrix of A. The determinant of a principal submatrix of A is called a principal minor of A.

• Example: Compute the principal minors of


a11 a12 a13
A = a21 a22 a23
a31 a32 a33

Two-variable function (alternative form)
a11 a12 a13
A = a21 a22 a23
a31 a32 a33
• Solution:
  – 1st order: a11, a22, and a33

  – 2nd order:
      | a11 a12 |   | a11 a13 |   | a22 a23 |
      | a21 a22 | , | a31 a33 | , | a32 a33 |

  – 3rd order: the full determinant
      | a11 a12 a13 |
      | a21 a22 a23 |
      | a31 a32 a33 |
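A small Python sketch (my addition, using NumPy and itertools) that enumerates all principal minors of a square matrix, matching the enumeration above:

```python
import numpy as np
from itertools import combinations

def principal_minors(A):
    """Return {k: [k-th order principal minors]} for a square matrix A."""
    n = A.shape[0]
    minors = {}
    for k in range(1, n + 1):
        minors[k] = [
            np.linalg.det(A[np.ix_(idx, idx)])  # keep the same k rows and columns
            for idx in combinations(range(n), k)
        ]
    return minors

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
for k, vals in principal_minors(A).items():
    print(k, np.round(vals, 3))
```

For this sample matrix every principal minor is positive, which (per the table below) makes it positive definite.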
Two-variable function (alternative form)
• Leading principal minor: Let A be an n×n matrix.
The kth order principal sub-matrix of A obtained by
deleting the last n−k rows and columns of A is called
the kth order leading principal submatrix of A, and its
determinant is called the kth order leading principal
minor of A.
• Example: Compute the leading principal minors of
a11 a12 a13
A = a21 a22 a23
a31 a32 a33

Two-variable function (alternative form)
a11 a12 a13
A = a21 a22 a23
a31 a32 a33
• Solution:
  – 1st order leading principal minor: a11

  – 2nd order leading principal minor:
      | a11 a12 |
      | a21 a22 |

  – 3rd order leading principal minor: the full determinant
      | a11 a12 a13 |
      | a21 a22 a23 |
      | a31 a32 a33 |

  (The remaining principal minors, such as a22 and a33, are not leading.)
Two-variable function

Type of matrix A         Necessary and sufficient condition
Positive definite        All leading principal minors of A are > 0.
Positive semi-definite   All principal minors of A are ≥ 0.
Negative definite        The leading principal minors alternate in sign:
                         |A1| < 0, |A2| > 0, |A3| < 0, etc.
Negative semi-definite   All principal minors of odd order are ≤ 0 and
                         all principal minors of even order are ≥ 0.
Indefinite               None of the above criteria is satisfied.
Two-variable function

Type of matrix A         Necessary and sufficient condition                Nature of stationary point
Positive definite        All leading principal minors of A are > 0.        Minimum
Positive semi-definite   All principal minors of A are ≥ 0.                Inconclusive
Negative definite        The leading principal minors alternate in sign:   Maximum
                         |A1| < 0, |A2| > 0, |A3| < 0, etc.
Negative semi-definite   All principal minors of odd order are ≤ 0 and     Inconclusive
                         all principal minors of even order are ≥ 0.
Indefinite               None of the above criteria is satisfied.          Saddle
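The table translates directly into code. A sketch of my own (assuming a real symmetric input) that classifies a matrix by its principal minors:

```python
import numpy as np
from itertools import combinations

def classify(A, tol=1e-12):
    """Classify a real symmetric matrix via its principal minors."""
    n = A.shape[0]
    leading = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
    principal = {k: [np.linalg.det(A[np.ix_(i, i)])
                     for i in combinations(range(n), k)]
                 for k in range(1, n + 1)}
    if all(d > tol for d in leading):
        return "positive definite (minimum)"
    if all((-1) ** k * d > tol for k, d in enumerate(leading, start=1)):
        return "negative definite (maximum)"   # signs alternate: -, +, -, ...
    if all(d >= -tol for v in principal.values() for d in v):
        return "positive semi-definite (inconclusive)"
    if all((-1) ** k * d >= -tol for k, v in principal.items() for d in v):
        return "negative semi-definite (inconclusive)"
    return "indefinite (saddle)"

print(classify(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # positive definite
print(classify(np.array([[0.0, 1.0], [1.0, 0.0]])))    # indefinite
```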
Two-variable function (alternative form)
• Sufficient condition: Given (x₀, y₀), a stationary point of f(x, y), define the Hessian matrix (H):

  H = [ ∂²z/∂x²   ∂²z/∂x∂y ]
      [ ∂²z/∂y∂x  ∂²z/∂y²  ]

• 1st order leading principal minor: D1 = ∂²z/∂x²
• 2nd order leading principal minor: D2 = det H = (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)²
Two-variable function
• Sufficient condition: Given (x₀, y₀), a stationary point of f(x, y), then:
  – H(x₀, y₀) is positive definite ⇒ minimum
  – A) The matrix is positive definite if and only if all its leading principal minors are positive:
    D1 > 0 and D2 > 0, i.e.
    ∂²z/∂x² > 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0
Two-variable function
• Sufficient condition: Given (x₀, y₀), a stationary point of f(x, y), then:
  – H(x₀, y₀) is negative definite ⇒ maximum
  – B) The matrix is negative definite if and only if its kth order leading principal minor has sign (−1)ᵏ:
    D1 < 0 and D2 > 0, i.e.
    ∂²z/∂x² < 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0
Two-variable function
• Sufficient condition: Given (x₀, y₀), a stationary point of f(x, y), then:
  – H(x₀, y₀) is indefinite ⇒ saddle point
  – C) If some kth leading principal minor is non-zero but does not fit either of the sign patterns above, the matrix is indefinite:
    Neither A nor B holds and D2 ≠ 0, i.e.
    (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² < 0

• Test inconclusive if none of the above hold ⇒ D2 = 0
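Putting the D1/D2 test together, a sketch of my own in SymPy that finds and classifies the stationary points of a sample function (here the earlier example with a = 3):

```python
import sympy as sp

x, y = sp.symbols('x y')
z = x**2 + y**2 - 3*x*y  # sample function: a = 3 in the earlier example

zx, zy = sp.diff(z, x), sp.diff(z, y)
stationary = sp.solve([zx, zy], [x, y], dict=True)

for pt in stationary:
    D1 = sp.diff(z, x, 2).subs(pt)               # 1st leading principal minor
    D2 = (sp.diff(z, x, 2) * sp.diff(z, y, 2)
          - sp.diff(z, x, y) ** 2).subs(pt)      # det of the Hessian
    if D2 > 0:
        kind = "minimum" if D1 > 0 else "maximum"
    elif D2 < 0:
        kind = "saddle point"
    else:
        kind = "inconclusive"
    print(pt, "->", kind)
```

Since |a| = 3 > 2, the script reports the single stationary point (0, 0) as a saddle, consistent with the worked check above.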
Multi-variable function
• Necessary condition for X₀ = (x₁, x₂, …, xₙ) to be an extreme point of z = f(x₁, x₂, …, xₙ) = f(X):

  ∂z/∂x₁ = 0, ∂z/∂x₂ = 0, …, ∂z/∂xₙ = 0 at X₀

  or, equivalently, ∇z(X₀) = 0

  – also satisfied at inflection/saddle points.

What is the sufficient condition?
Multi-variable function
• Sufficient condition: Given X₀ = (x₁, x₂, …, xₙ), a stationary point of z = f(X), form the Hessian matrix H with (i, j) entry ∂²z/∂xᵢ∂xⱼ:

  H = [ ∂²z/∂x₁²    ∂²z/∂x₁∂x₂  …  ∂²z/∂x₁∂xₙ ]
      [ ∂²z/∂x₂∂x₁  ∂²z/∂x₂²    …  ∂²z/∂x₂∂xₙ ]
      [ ⋮           ⋮              ⋮          ]
      [ ∂²z/∂xₙ∂x₁  ∂²z/∂xₙ∂x₂  …  ∂²z/∂xₙ²   ]
Type of matrix A         Necessary and sufficient condition                Nature of stationary point
Positive definite        All leading principal minors of A are > 0.        Minimum
Positive semi-definite   All principal minors of A are ≥ 0.                Inconclusive
Negative definite        The leading principal minors alternate in sign:   Maximum
                         |A1| < 0, |A2| > 0, |A3| < 0, etc.
Negative semi-definite   All principal minors of odd order are ≤ 0 and     Inconclusive
                         all principal minors of even order are ≥ 0.
Indefinite               None of the above criteria is satisfied.          Saddle
Example 20-2.
Determine the extreme points of the following functions:
(a) f(X) = x₁³ + x₂³ − 3x₁x₂
(b) f(X) = 2x₁² + x₂² + x₃² + 6(x₁ + x₂ + x₃) + 2x₁x₂x₃ [HW]

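A sketch (my addition) that solves part (a) with SymPy and classifies each stationary point with the D1/D2 test:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**3 + x2**3 - 3*x1*x2

grad = [sp.diff(f, v) for v in (x1, x2)]
points = sp.solve(grad, [x1, x2], dict=True)   # stationary points

H = sp.hessian(f, (x1, x2))
for pt in points:
    Hp = H.subs(pt)
    D1, D2 = Hp[0, 0], Hp.det()
    if D2 > 0:
        kind = "minimum" if D1 > 0 else "maximum"
    elif D2 < 0:
        kind = "saddle"
    else:
        kind = "inconclusive"
    print(pt, "->", kind)
# Expected: (0, 0) -> saddle, (1, 1) -> minimum
```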
Example 20-3.
Verify that the function:
f(X) = 2x₁x₂x₃ − 4x₁x₃ − 2x₂x₃ + x₁² + x₂² + x₃² − 2x₁ − 4x₂ + 4x₃
has the stationary points:
(0, 3, 1), (0, 1, −1), (1, 2, 0), (2, 1, 1), and (2, 3, −1).
Use the sufficiency condition to identify the extreme points at:
– (1, 2, 0)
– (0, 1, −1)

Multi-variable function
• Notes
– Matrices have to be real and symmetric when considering
their definiteness.
– Hessian is real and symmetric.

– All positive definite matrices are positive semi-definite.


– All negative definite matrices are negative semi-definite.

Direct search methods
Direct search methods
• Apply primarily to strictly unimodal single-variable
functions.
• Identify interval of uncertainty known to include the
optimum solution point.
• Locate the optimum by iteratively narrowing the
interval of uncertainty to a desired level of accuracy.

Direct search methods
• A unimodal function is one that has only one peak
(maximum) or valley (minimum) in a given
interval.

Direct search methods
• Three search algorithms:
  – Ternary
  – Dichotomous
  – Golden-section

• All seek to maximize a unimodal f(x) in

  a ≤ x ≤ b

  an interval that includes the optimum point x*.

• Initial interval of uncertainty: I0 = [a, b]
Direct search methods

[Figure: a unimodal f(x) over the interval from a to b]

Maximize f(x) in: a ≤ x ≤ b, I0 = [a, b]
Direct search methods
• If x1 and x2 are on either side of the maximum:
  – if f(x1) > f(x2)
  – replace xR with x2

[Figure: x1 and x2 straddle the peak with f(x1) > f(x2)]
Direct search methods
• If x1 and x2 are both to the right of the maximum ⇒ f(x1) > f(x2)
  – replace xR with x2

[Figure: both sample points lie right of the peak]
Direct search methods
• If x1 and x2 are on either side of the maximum:
  – if f(x1) < f(x2)
  – replace xL with x1

[Figure: x1 and x2 straddle the peak with f(x1) < f(x2)]
Direct search methods
• If x1 and x2 are both to the left of the maximum ⇒ f(x1) < f(x2)
  – replace xL with x1

[Figure: both sample points lie left of the peak]
Direct search methods
• Fundamental idea (**):

  Condition        What to do?                     New interval
  f(x1) > f(x2)    Replace xR ← x2                 (xL, x2)
  f(x1) < f(x2)    Replace xL ← x1                 (x1, xR)
  f(x1) = f(x2)    Replace xL ← x1 and xR ← x2     (x1, x2)
Ternary search
• Divides the interval into three equal parts.
• Use **.
• Next interval = 2/3 of the current interval.
• Terminate when the interval width is ___.

[Figure: interval split at x1 and x2 into three parts of equal length L]
Ternary search
• Divides the interval into three equal parts.
• Use **.
• Next interval = 2/3 of the current interval.
• Terminate when the interval width In ≤ Δ; then take

  x* = (xL + xR)/2
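A runnable sketch (my addition) of ternary search for a maximum, following the ** update rule:

```python
def ternary_search_max(f, xL, xR, delta=1e-6):
    """Maximize a unimodal f on [xL, xR]; stop when the interval width <= delta."""
    while xR - xL > delta:
        third = (xR - xL) / 3.0
        x1, x2 = xL + third, xR - third   # split into three equal parts
        if f(x1) > f(x2):
            xR = x2                        # maximum lies in (xL, x2)
        elif f(x1) < f(x2):
            xL = x1                        # maximum lies in (x1, xR)
        else:
            xL, xR = x1, x2                # maximum lies in (x1, x2)
    return (xL + xR) / 2.0

# Example: f(x) = -(x - 2)^2 has its maximum at x = 2.
print(ternary_search_max(lambda x: -(x - 2) ** 2, 0.0, 5.0))
```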
Dichotomous search
• Divides the interval around its midpoint:
  – x1 = (xL + xR)/2 − δ/2
  – x2 = (xL + xR)/2 + δ/2
• Use **.

• I1 = I0/2 + δ/2
• I2 = I1/2 + δ/2 = I0/4 + δ/4 + δ/2
• In = I0/2ⁿ + δ(1 − 1/2ⁿ)

[Figure: x1 and x2 placed δ apart around the interval midpoint]
Dichotomous search
  In = δ + I0/2ⁿ − δ/2ⁿ

  Note: In ≥ δ

• Interval narrowing slows appreciably as I → δ.

• When to stop?
Dichotomous search
  In = δ + I0/2ⁿ − δ/2ⁿ

  Note: In ≥ δ

• Interval narrowing slows appreciably as I → δ.

• When to stop?

  δ + I0/2ⁿ − δ/2ⁿ ≤ Δ
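A sketch (my addition) of the dichotomous search, with δ the probe separation and Δ the target interval width; note Δ > δ is needed for termination:

```python
def dichotomous_search_max(f, xL, xR, delta=1e-5, Delta=1e-3):
    """Maximize a unimodal f on [xL, xR]; delta is the probe gap, Delta > delta."""
    while xR - xL > Delta:
        mid = (xL + xR) / 2.0
        x1, x2 = mid - delta / 2.0, mid + delta / 2.0  # probes straddle the midpoint
        if f(x1) > f(x2):
            xR = x2
        elif f(x1) < f(x2):
            xL = x1
        else:
            xL, xR = x1, x2
    return (xL + xR) / 2.0

print(dichotomous_search_max(lambda x: -(x - 2) ** 2, 0.0, 5.0))
```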
Golden-section search
• Save computations by reusing discarded values.

• Divides the interval into three parts:

  x1 = xR − α(xR − xL)
  x2 = xL + α(xR − xL)
  0 < α < 1

[Figure: interior points x1 and x2 inside (xL, xR)]
Golden-section search
• Save computations by reusing discarded values.
• Consider Inew = (xL, x2).

[Figure: the retained interval Inew = (xL, x2)]
Golden-section search
• Save computations by reusing discarded values.
• Consider Inew = (xL, x2).

• What should α be such that x2,new = x1?

  α = (−1 + √5)/2 ≈ 0.618

[Figure: within Inew, the new x2 coincides with the old x1]
Golden-section search
• Divides the interval:

  x1 = xR − α(xR − xL)
  x2 = xL + α(xR − xL)
  α = (−1 + √5)/2 ≈ 0.618

• Use **.

• Next interval = α times the current one.
• Terminate when In ≤ Δ.
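A sketch (my addition) of golden-section search for a maximum; each iteration evaluates f only once, because one interior point is recycled (α² = 1 − α makes the old x1 land exactly on the new x2, and vice versa):

```python
ALPHA = (5 ** 0.5 - 1) / 2  # golden ratio constant, about 0.618

def golden_section_max(f, xL, xR, Delta=1e-6):
    """Maximize a unimodal f on [xL, xR]; stop when the interval width <= Delta."""
    x1 = xR - ALPHA * (xR - xL)
    x2 = xL + ALPHA * (xR - xL)
    f1, f2 = f(x1), f(x2)
    while xR - xL > Delta:
        if f1 > f2:                      # maximum in (xL, x2): recycle x1 as new x2
            xR, x2, f2 = x2, x1, f1
            x1 = xR - ALPHA * (xR - xL)
            f1 = f(x1)
        else:                            # maximum in (x1, xR): recycle x2 as new x1
            xL, x1, f1 = x1, x2, f2
            x2 = xL + ALPHA * (xR - xL)
            f2 = f(x2)
    return (xL + xR) / 2.0

print(golden_section_max(lambda x: -(x - 2) ** 2, 0.0, 5.0))
```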
Golden-section vs dichotomous search

• Golden-section converges more rapidly than


dichotomous. Why?
• Less computation required in Golden-section. Why?

Golden-section vs dichotomous search

• In the dichotomous method, narrowing of the uncertainty interval slows down appreciably as I → δ.
• Golden-section requires half the computations: it recycles one set of computations from the preceding iteration.
HW
• Derive the value of the golden ratio (α).
• Do example 21.1-1

Notes
A unimodal function f(x) on [a, b] satisfies, for some unique x* ∈ [a, b]:
  – f(x) is strictly increasing (decreasing) for x ≤ x* and strictly decreasing (increasing) for x ≥ x*.
  – x* is the maximum (minimum).
Steepest Ascent Method
• For optimizing twice continuously differentiable
functions.
• Generate successive points in the direction of the
gradient of the function.

Gradients and level curves

[Figure: gradient vectors at points P1, P2, P3 on the level curves of f(x, y)]

https://math.illinois.edu/research/igl/projects/spring/2019/interactive-visualizations-calculus-and-probability
Gradients and level curves

https://mathinsight.org/directional_derivative_gradient_introduction
Gradients and level curves
• Gradient is always perpendicular to the level curves.
• When the level curves are close together, the gradient is
large.

• What happens to the gradient at the tops of the


mountains?

https://mathinsight.org/directional_derivative_gradient_introduction
[Figure sequence: gradient-ascent steps through points P1, P2, P3 on a level-curve plot]

When to stop?

https://mathinsight.org/directional_derivative_gradient_introduction
Example 21.1-2

  f(x₁, x₂) = 4x₁ + 6x₂ − 2x₁² − 2x₁x₂ − 2x₂²
Example 21.1-2

[Figure: level curves of f(x₁, x₂) = 4x₁ + 6x₂ − 2x₁² − 2x₁x₂ − 2x₂², with X₀ = (1, 1), ∇f(X₀) = (−2, 0), and the first steepest-ascent step P₀ → P₁]
Steepest Ascent Method
• f(X) is to be maximized; let Xₖ be the current point.

• Next point:
  Xₖ₊₁ = Xₖ + rₖ ∇f(Xₖ)
  – rₖ is the optimal step size at Xₖ.

• rₖ is the value of r that maximizes the single-variable function:
  h(r) = f[Xₖ + r ∇f(Xₖ)]

• Terminate when Xₖ₊₁ ≈ Xₖ.
Steepest Descent Method
• f(X) is to be minimized; let Xₖ be the current point.

• Next point:
  Xₖ₊₁ = Xₖ − rₖ ∇f(Xₖ)
  – rₖ is the optimal step size at Xₖ.

• rₖ is the value of r that minimizes the single-variable function:
  h(r) = f[Xₖ − r ∇f(Xₖ)]

• Terminate when Xₖ₊₁ ≈ Xₖ.
Example
• Minimize:
  f(x₁, x₂) = x₁ − x₂ + 2x₁² + 2x₁x₂ + x₂²
  starting at (0, 0).
Example
• Minimize:
  f(x₁, x₂) = x₁ − x₂ + 2x₁² + 2x₁x₂ + x₂², starting at (0, 0).

  ∇f = (1 + 4x₁ + 2x₂, −1 + 2x₁ + 2x₂)
  ∇f(0, 0) = (1, −1)

  X₂ = (0, 0) − r(1, −1) = (−r, r)
  h(r) = −r − r + 2r² − 2r² + r² = r² − 2r = (r − 1)² − 1

  Minimum at r = 1 ⇒ X₂ = (−1, 1) and ∇f(−1, 1) = (−1, −1)
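To carry the iteration to convergence, a sketch of my own: steepest descent on this example, with the exact line search approximated by the ternary search from earlier over an assumed step bracket [0, 4]:

```python
import numpy as np

def f(X):
    x1, x2 = X
    return x1 - x2 + 2 * x1**2 + 2 * x1 * x2 + x2**2

def grad_f(X):
    x1, x2 = X
    return np.array([1 + 4 * x1 + 2 * x2, -1 + 2 * x1 + 2 * x2])

def line_min(h, lo=0.0, hi=4.0, tol=1e-10):
    """Ternary search for the minimizer of a unimodal h(r) on [lo, hi]."""
    while hi - lo > tol:
        r1, r2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if h(r1) < h(r2):
            hi = r2
        else:
            lo = r1
    return (lo + hi) / 2

X = np.array([0.0, 0.0])
for _ in range(100):
    g = grad_f(X)
    if np.linalg.norm(g) < 1e-6:          # X_{k+1} ~ X_k: stop
        break
    r = line_min(lambda t: f(X - t * g))  # optimal step size r_k
    X = X - r * g

print(np.round(X, 6))  # converges to (-1, 1.5), where grad f = 0
```

The first pass reproduces the slide's hand computation: r = 1 and the next point (−1, 1).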
Example
• Minimize: f(x₁, x₂) = x₁ − x₂ + 2x₁² + 2x₁x₂ + x₂²
