
Lecture 8: Optimization

This lecture describes methods for the optimization of a real-valued function f (x) on a bounded real interval
[a, b]. We will describe methods for determining the maximum of f (x) on [a, b], i.e., we solve

    max_{a ≤ x ≤ b} f(x).    (1)

The same methods applied to −f (x) determine the minimum of f (x) on [a, b].
We first note that if the derivative f ′ (x) exists, is continuous on [a, b], and is not too difficult to evaluate,
then it can be attractive to solve (1) by first computing all distinct zeros of f ′ (x), say z1 < z2 < · · · < zℓ , in
the interior of the interval [a, b], and then evaluate f (x) at these zeros and at the endpoints a and b. The zj
can be minima, maxima, or inflection points of f (x). The function f (x) achieves its maximum at one of these
zeros or at the endpoints. The zeros of f ′ (x) can be computed by one of the methods of Lectures 6-7.
The remainder of this lecture describes methods that do not require evaluation of the derivative. These
methods are attractive to use when f ′ (x) is either not available or very complicated to compute. The first
method, Golden Section Search (GSS), is analogous to bisection. The second method applies interpolation by
a quadratic polynomial.
Let N(x) denote an open real interval that contains x. The function f (x) is said to have a local maximum
at x∗ if there is an open interval N(x∗ ), such that

f (x∗ ) ≥ f (x), x ∈ N(x∗ ) ∩ [a, b].

A solution of (1) is referred to as a global maximum of f (x) on [a, b]. There may be several global maxima.
Every global maximum is a local maximum. The converse is, of course, not true.
The optimization methods to be described determine a local maximum of f (x) in [a, b]. Sometimes it is
known from the background of the problem that there is at most one local maximum, which then necessarily
also is the global maximum. In general, however, a function can have several local maxima, which may not
be global maxima. A function, of course, may have several global maxima.
Example 1. The function f(x) = sin(x) has global maxima at π/2 and 5π/2 in the interval [0, 3π]; see Figure 1.
Example 2. The function f(x) = e^{-x} sin(x) has local maxima at π/4 and 9π/4 in the interval [0, 3π]; see Figure 2. The local maximum at π/4 also is the global maximum.
Many optimization methods are myopic; they only know the function in a small neighborhood of the
current approximation of the maximum. There are engineering techniques for steering myopic optimization
methods towards a global maximum. These techniques are referred to as evolutionary methods, genetic
algorithms, or simulated annealing. They were discussed by Gwenn Volkert in the fall.

Golden Section Search


This method is analogous to bisection in the sense that the original interval [a_1, b_1] = [a, b] is replaced by
a sequence of intervals [a_j, b_j], j = 2, 3, . . . , of decreasing lengths, so that each interval [a_j, b_j] contains at
least one local maximum of f(x). GSS divides each of the intervals [a_j, b_j], j = 1, 2, . . . , into three
subintervals [a_j, x_j^left], [x_j^left, x_j^right], and [x_j^right, b_j], where x_j^left and x_j^right are cleverly chosen points, with
a_j < x_j^left < x_j^right < b_j. The method then discards the right-most or left-most subinterval, so that the
remaining interval is guaranteed to contain a local maximum of f(x).

Figure 1: The function f(x) = sin(x) for 0 ≤ x ≤ 3π. The function has two local maxima (at x = π/2 and
x = 5π/2). The corresponding function values are marked by ∗ (in red).

Thus, let a_1 < x_1^left < x_1^right < b_1 and evaluate f(x) at these points. If f(x_1^left) ≥ f(x_1^right), then f(x)
has a local maximum in the interval [a_1, x_1^right] and we set [a_2, b_2] := [a_1, x_1^right]; otherwise f(x) has a local
maximum in [x_1^left, b_1] and we set [a_2, b_2] := [x_1^left, b_1]. We now repeat the process for the interval [a_2, b_2], i.e.,
we select interior points x_2^left and x_2^right, such that a_2 < x_2^left < x_2^right < b_2, and evaluate f(x) at these points.
Similarly as above, we obtain the new interval [a_3, b_3], which is guaranteed to contain a local maximum of
f(x). The computations proceed in this manner until a sufficiently small interval, say [a_k, b_k], which contains
a local maximum of f(x), has been determined.
We would like to choose the interior points x_j^left and x_j^right so that the interval [a_{j+1}, b_{j+1}] is substantially
shorter than the interval [a_j, b_j].
Example 3. The choices

    x_j^left = a_j + (1/3)(b_j − a_j),    x_j^right = a_j + (2/3)(b_j − a_j),    (2)

ensure that the length of the new interval [a_{j+1}, b_{j+1}] is 2/3 of the length of the previous interval [a_j, b_j].
This subdivision requires that f(x) be evaluated at x_j^left and x_j^right for every j.
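As an illustration, one step of this subdivision could be coded as follows (a minimal sketch; the function name trisection_step is chosen here and is not from the lecture):

def trisection_step(f, a, b):
    # One step of the subdivision of Example 3: two new function evaluations,
    # and the retained interval is 2/3 as long as [a, b].
    x_left = a + (b - a) / 3.0
    x_right = a + 2.0 * (b - a) / 3.0
    if f(x_left) >= f(x_right):
        return a, x_right      # a local maximum lies in [a, x_right]
    else:
        return x_left, b       # a local maximum lies in [x_left, b]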
There is a better choice of interior points than in Example 3. The following selection reduces the length
of the intervals [a_j, b_j] faster than in Example 3 and requires only one function evaluation for every j ≥ 2.
The trick is to choose the interior points x_j^left and x_j^right in [a_j, b_j] so that one of them becomes an endpoint
of the next interval [a_{j+1}, b_{j+1}] and the other one becomes one of the interior points x_{j+1}^left or x_{j+1}^right of this
interval. We then only have to evaluate f(x) at one interior point.
Let the interior points be given by

    x_j^left = a_j + ρ(b_j − a_j),    x_j^right = a_j + (1 − ρ)(b_j − a_j)    (3)

Figure 2: The function f(x) = e^{-x} sin(x) for 0 ≤ x ≤ 3π. The function has two local maxima (at x = π/4
and x = 9π/4). The corresponding function values are marked by ∗ (in red).

for some constant 0 < ρ < 1 to be determined. The choice ρ = 1/3 yields (2).
Assume that f(x_j^left) ≥ f(x_j^right). Then f(x) has a local maximum in [a_j, x_j^right]. We set

    a_{j+1} := a_j,    b_{j+1} := x_j^right    (4)

and would like

    x_{j+1}^right = x_j^left.    (5)

The latter requirement determines the parameter ρ.

Figure 3: The bottom line depicts the interval [a_j, b_j] with the endpoints a_j and b_j marked by o (in red),
and the interior points x_j^left and x_j^right marked by ∗ (in red). The top line shows the interval [a_{j+1}, b_{j+1}] =
[a_j, x_j^right] that may be determined in step j of GSS. The endpoints are marked by o (in red) and the interior
points x_{j+1}^left and x_{j+1}^right by ∗ (in red).

There are many ways to determine ρ. One of the simplest is based on the observation that

    (b_{j+1} − x_{j+1}^right) / (b_{j+1} − a_{j+1}) = (b_j − x_j^right) / (b_j − a_j),    (6)

cf. Figure 3. Using first (4) and (5), and subsequently (3), the left-hand side simplifies to

    (b_{j+1} − x_{j+1}^right) / (b_{j+1} − a_{j+1}) = (x_j^right − x_j^left) / (x_j^right − a_j) = ((1 − 2ρ)(b_j − a_j)) / ((1 − ρ)(b_j − a_j)) = (1 − 2ρ) / (1 − ρ).

The right-hand side of (6) can be simplified similarly. It follows from the right-hand side equation of (3)
that x_j^right = b_j − ρ(b_j − a_j) and, therefore,

    (b_j − x_j^right) / (b_j − a_j) = ρ(b_j − a_j) / (b_j − a_j) = ρ.

Thus, equation (6) is equivalent to

    (1 − 2ρ) / (1 − ρ) = ρ,

which can be expressed as

    ρ² − 3ρ + 1 = 0.

The only root smaller than unity is given by

    ρ = (3 − √5) / 2 ≈ 0.382.
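The name of the method comes from the fact that the retained fraction 1 − ρ is the reciprocal of the golden ratio φ (a remark not made explicitly in the lecture, but easily verified):
\[
1 - \rho = 1 - \frac{3-\sqrt{5}}{2} = \frac{\sqrt{5}-1}{2} = \frac{1}{\varphi} \approx 0.618,
\qquad \varphi = \frac{1+\sqrt{5}}{2}.
\]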
It follows that the lengths of the intervals [a_j, b_j] decrease as j increases according to

    b_{j+1} − a_{j+1} = x_j^right − a_j = (1 − ρ)(b_j − a_j) ≈ 0.618 (b_j − a_j).

In particular, the length decreases less in each step than for the bisection method of Lectures 6-7. We
conclude that GSS is useful for determining a local maximum to low accuracy. If high accuracy is desired,
then faster methods should be employed, such as the method of the following section.
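The following is a minimal Python sketch of GSS as described above (the function name gss_max and the stopping tolerance are choices made here, not part of the lecture). In exact arithmetic its first iterates for f(x) = sin(x) on [0, 3π] reproduce the values in Example 4 below; since f(x_1^left) and f(x_1^right) are exactly equal for this symmetric example, rounding errors decide whether the iteration converges to π/2 or to 5π/2.

import math

RHO = (3.0 - math.sqrt(5.0)) / 2.0      # ~0.382, the root of rho^2 - 3*rho + 1 = 0

def gss_max(f, a, b, tol=1e-6):
    # Golden Section Search for a local maximum of f on [a, b].
    x_left = a + RHO * (b - a)
    x_right = a + (1.0 - RHO) * (b - a)
    f_left, f_right = f(x_left), f(x_right)
    while b - a > tol:
        if f_left >= f_right:
            # A local maximum lies in [a, x_right]; the old x_left is reused
            # as the new x_right (equation (5)), so only one new evaluation.
            b, x_right, f_right = x_right, x_left, f_left
            x_left = a + RHO * (b - a)
            f_left = f(x_left)
        else:
            # A local maximum lies in [x_left, b]; the old x_right is reused.
            a, x_left, f_left = x_left, x_right, f_right
            x_right = a + (1.0 - RHO) * (b - a)
            f_right = f(x_right)
    return 0.5 * (a + b)

# For instance: gss_max(math.sin, 0.0, 3.0 * math.pi)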
Example 4. We apply GSS to the function of Example 1 with [a_1, b_1] = [0, 3π]. We obtain x_1^left = 3.60,
x_1^right = 5.82, and f(x_1^left) = f(x_1^right) = −0.44. We therefore choose the next interval to be [a_2, b_2] = [0, x_1^right]
and obtain x_2^left = 2.22 and x_2^right = x_1^left = 3.60. We have f(x_2^left) = 0.79 and it is clear how the method
will determine the local maximum at x = π/2. The progress of GSS is illustrated by Figure 3.
The reason that this local maximum is determined is that we discard the subinterval [x_1^right, b_1]
when f(x_1^left) ≥ f(x_1^right). If this test had instead required the strict inequality f(x_1^left) > f(x_1^right), then the other local
maximum would have been found. This example illustrates that it generally is a good idea to graph the
function first, and then choose [a_1, b_1] so that it contains the desired maximum.
Example 5. We apply GSS to the function of Example 2 with [a_1, b_1] = [0, 3π]. Similarly as in Example
4, we obtain x_1^left = 3.60, x_1^right = 5.82, f(x_1^left) = −1.2 · 10^{-2}, and f(x_1^right) = −1.2 · 10^{-3}. We therefore
choose the next interval to be [a_2, b_2] = [x_1^left, b_1], and obtain x_2^left = x_1^right = 5.82 and x_2^right = 7.20. We evaluate
f(x_2^right) = 5.9 · 10^{-4} and it is clear that the method will determine the local maximum at x = 9π/4. If we wish
to determine the global maximum, then a smaller initial interval [a_1, b_1] containing x = π/4 should
be used.

Quadratic Interpolation
We found in Lectures 6-7 that the secant method gives faster convergence than bisection. Analogously, we
will see that interpolation by a sequence of quadratic polynomials yields faster convergence towards a local
maximum than GSS. The rationale is that GSS only uses the function values to determine whether the
function at a new node is larger or smaller than at an adjacent node. We describe in this section a method
based on interpolating the function by a sequence of quadratic polynomials and determining the maxima of
the latter. We proceed as follows: Let the interval [a_j, b_j] be determined in step j − 1 of GSS, and let
x̂_j denote the point inside this interval at which f(x̂_j) is known, i.e., x̂_j is either x_j^left or x_j^right. Instead
of evaluating f(x) at a new point determined by GSS, we fit a quadratic polynomial q_j(x) to the data
(a_j, f(a_j)), (b_j, f(b_j)), and (x̂_j, f(x̂_j)). If this polynomial has a maximum in the interior of [a_j, b_j], then this
will be our new interior point. Denote this point by x_j^polmax and evaluate f(x_j^polmax).
We remark that it is easy to determine x_j^polmax. If the leading coefficient of q_j(x) is negative, then q_j(x)
is concave (−q_j(x) is convex) and has a maximum. The point x_j^polmax is the zero of the linear polynomial
q_j'(x).
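Setting q_j'(x) = 0 and solving gives the standard closed-form expression for the maximizer in terms of the three interpolation points (this explicit formula is not stated in the lecture, but is easily verified):
\[
x_j^{\mathrm{polmax}}
= \hat{x}_j - \frac{1}{2}\,
\frac{(\hat{x}_j - a_j)^2\bigl[f(\hat{x}_j)-f(b_j)\bigr]-(\hat{x}_j - b_j)^2\bigl[f(\hat{x}_j)-f(a_j)\bigr]}
{(\hat{x}_j - a_j)\bigl[f(\hat{x}_j)-f(b_j)\bigr]-(\hat{x}_j - b_j)\bigl[f(\hat{x}_j)-f(a_j)\bigr]} .
\]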
Thus, assume that a_j < x_j^polmax < b_j and that q_j(x) achieves its maximum at x_j^polmax. We then discard
the point in the set {a_j, x̂_j, b_j} furthest away from x_j^polmax. This gives us three nodes at which the function
f(x) is known, and we can compute a new quadratic polynomial, whose maximum we determine, and so on.
Interpolation by a sequence of quadratic polynomials usually gives much faster convergence than GSS;
however, it is not guaranteed to converge. Thus, quadratic interpolation has to be safeguarded. For instance,
one may have to apply GSS when quadratic interpolation fails to converge to a maximum in the last interval
determined by GSS.
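A minimal Python sketch of one step of this scheme, in the spirit of the description above (the function names and the None-as-fallback convention are choices made here; a robust implementation would add the safeguards just mentioned):

def parabola_vertex(x1, x2, x3, f1, f2, f3):
    # Stationary point of the parabola interpolating (x1,f1), (x2,f2), (x3,f3).
    num = (x2 - x1) ** 2 * (f2 - f3) - (x2 - x3) ** 2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    return None if den == 0.0 else x2 - 0.5 * num / den

def quad_interp_step(f, a, xhat, b, fa, fhat, fb):
    # One step: fit q_j to (a, fa), (xhat, fhat), (b, fb). If q_j is concave and
    # its maximizer lies inside (a, b), evaluate f there and discard the old node
    # furthest from the maximizer; otherwise return None (fall back to GSS).
    lead = ((fb - fhat) / (b - xhat) - (fhat - fa) / (xhat - a)) / (b - a)
    if lead >= 0.0:
        return None                      # q_j is not concave (cf. Example 6)
    x_new = parabola_vertex(a, xhat, b, fa, fhat, fb)
    if x_new is None or not (a < x_new < b):
        return None                      # maximizer not strictly inside (a, b)
    f_new = f(x_new)
    nodes = [(a, fa), (xhat, fhat), (b, fb), (x_new, f_new)]
    nodes.sort(key=lambda p: abs(p[0] - x_new))
    return sorted(nodes[:3])             # three retained nodes, ordered by x

Iterating quad_interp_step, and performing a GSS step whenever it returns None, gives one way to realize the safeguarded scheme described above.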
Example 6. The quadratic polynomial determined by interpolating f(x) at the nodes {a_1, x_1^left, b_1} or at
the nodes {a_1, x_1^right, b_1} in Example 5 is not concave. Hence, we have to reduce the interval by GSS before
applying polynomial interpolation.
We finally comment on the accuracy of the computed local maximum by any optimization method.
Assume that we have determined an approximation x̂ of the local maximum x∗. Let f(x) have a continuous
second derivative f''(x) in a neighborhood of x∗. Then, since f'(x∗) vanishes, Taylor expansion of
f(x) at x∗ yields

    f(x̂) − f(x∗) = (1/2)(x̂ − x∗)² f''(x̃),

where x̃ lies in the interval with endpoints x∗ and x̂. This formula implies that the error in the computed
maximum x̂ of the function f(x) can be expected to be proportional to the square root of the error in the function
value f(x̂). In particular, we cannot expect to be able to determine a local maximum with a relative error
smaller than the square root of machine epsilon.
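To make the square-root dependence explicit, one can rearrange the Taylor identity above (assuming f''(x̃) ≠ 0): if the computed function values are only accurate to about ε, then roughly
\[
|\hat{x} - x^{*}|
= \sqrt{\frac{2\,\bigl|f(\hat{x}) - f(x^{*})\bigr|}{\bigl|f''(\tilde{x})\bigr|}}
\approx \sqrt{\frac{2\,\varepsilon}{\bigl|f''(\tilde{x})\bigr|}} ,
\]
so with ε on the order of machine epsilon, the attainable error in x̂ is on the order of its square root.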

Exercises
1. Apply GSS with initial interval [0, π] to determine the global maximum of the function in Example 2.
Carry out 4 steps by GSS.
2. Apply quadratic interpolation with initial interval [0, π] to determine the global maximum of the function
in Example 2. Carry out 4 steps. How does the accuracy of the computed approximate maximum compare
with the accuracy of the computed approximate maximum determined in Exercise 1? Estimate empirically

the rate of convergence of the quadratic interpolation method. For instance, study the quotients

    s_j = (x_{j+1}^polmax − x∗) / (x_j^polmax − x∗),    j = 1, 2, 3, . . . ,

or their logarithms.
3. Compute the minimum of the function in Example 2 by carrying out 4 GSS steps.
4. GSS is analogous to bisection, and the quadratic interpolation method is analogous to the secant method.
The quadratic interpolation method applies local quadratic interpolation to determine a maximum, while the secant method applies local
linear interpolation to determine a zero. Describe an analogue of regula falsi for optimization.
