Chapter 17: Pattern Search For Unconstrained NLP
Sometimes it is inconvenient or impossible to know the first or second derivatives of the
objective function in an unconstrained nonlinear programming problem. This can happen, for
example, when the function value at some input point x is actually calculated by a simulation of
a system, e.g. the simulation returns a measure of the flight characteristics of a new aircraft that
is being designed.
In this case we can use heuristic pattern search methods, which require only the ability to return
the value of f(x) for a given input point x. For this reason they are also known as derivative-free,
direct search, or black box optimization methods. Of course, pattern search methods can also be
applied when the objective function is differentiable, but then we are ignoring the useful
information in the first and second derivatives. So pattern search methods are typically applied
only when the derivatives are not available.
Quite a number of pattern search methods have been developed over the years. As an example of
the genre, we will look at one of the original methods, known as Hooke and Jeeves after its
original authors. The algorithm alternates between two types of moves:
Exploratory search. This is a very local search that looks for an improving direction in
which to move. In some senses it is a crude search for the gradient direction.
Pattern move. This is a larger search in the improving direction. Larger and larger moves
are made as long as the improvement continues.
We'll look at each of these two types of moves separately before assembling them into the
complete algorithm.
Exploratory Search
The main idea is to find some improving direction (not necessarily the best improving direction,
which would be the gradient or anti-gradient direction). This is done by perturbing the current
point by small amounts in each of the variable directions and observing whether the objective
function value improves or worsens.
First we define the sizes of the perturbation steps that we will take in each dimension by setting
up the perturbation vector P0 = (p1, p2, p3, ..., pn). Note that the perturbation step sizes do not
all have to be equal, but in general they are all relatively small. Given some current point x(0)
and its associated objective function value f(x(0)) we can now perform an exploratory search
around it as follows:
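A minimal Python sketch of one pass of the exploratory search, assuming minimization; the
names explore, f, x, and P are illustrative rather than taken from the original:

def explore(f, x, P):
    # One exploratory search around x using the perturbation step sizes in P.
    # Returns the (possibly improved) point and its objective function value.
    x = list(x)                         # work on a copy of the current point
    fbest = f(x)
    for i in range(len(x)):
        for step in (P[i], -P[i]):      # upward perturbation first, then downward
            x[i] += step
            fnew = f(x)
            if fnew < fbest:            # improvement: keep this perturbation
                fbest = fnew
                break                   # and skip the opposite direction
            x[i] -= step                # no improvement: undo the perturbation
    return x, fbest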
There are a few things to notice about the exploratory search algorithm:
If the upward perturbation for a variable is successful, then the downward perturbation
for that variable is not even attempted.
The downward perturbation for a variable is tried only if the upward perturbation for that
variable fails.
It's possible that both the upward and the downward perturbations fail for a particular
variable, in which case its value is not changed.
The method does not try all possible combinations of upward and downward
perturbations of the variables (there would be 2^n such combinations: too many to try). It
is simply one pass through the list of variables. In the worst case, where the upward
perturbation fails for every variable, it will try 2n individual perturbations, but it normally
tries fewer than that. In the best case, where every upward perturbation succeeds, it will
try just n perturbations. (For n = 10, that is 2^10 = 1024 combinations, versus at most 20
and as few as 10 single-variable trials.)
If any of the attempted perturbations succeeds in improving the value of the objective function,
then the exploratory search has succeeded and the improving direction is given by the vector
between the initial point x(0) and the final point x(1) output by the exploratory search. Now we
can use this improving direction in the pattern move. If all of the perturbations fail to find an
improved value of the objective function, then the exploratory search has failed: we'll see later
what to do in that case.
Pattern Move
All that the pattern move requires is two points: the current point x(0) and some other point x(1)
that has a better value of the objective function. This gives the pattern move an improving
direction to move in. A new point x(2) is generated by moving from x(0) through x(1) as follows:
x(2) = x(0) + a(x(1) - x(0))
where a is a positive acceleration factor that just multiplies the length of the improving direction
vector given by (x(1) - x(0)). A common choice is a = 2, in which case the equation reduces to:
x(2) = 2x(1) - x(0)
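In code the pattern move is a single line; here is a small Python sketch consistent with the
explore sketch above (names again illustrative):

def pattern_move(x0, x1, a=2.0):
    # Accelerated move from x0 through the better point x1.
    return [x0i + a * (x1i - x0i) for x0i, x1i in zip(x0, x1)]

With the default a = 2 this returns 2*x1 - x0 componentwise, matching the reduced equation
above.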
Complete Algorithm
The complete algorithm requires 4 inputs in addition to the objective function to be optimized:
A starting point x(0),
The value of the acceleration factor a,
The initial perturbation vector P0,
The perturbation tolerance vector T = (t1, t2, t3, ... , tn). As we will see later in the
complete algorithm, this gives the smallest possible perturbation that can be considered
for each variable, and is used to halt the algorithm.
The complete algorithm has 3 main parts: initialization, the start/restart routine, and the pattern
move routine.
Initialization:
Record the starting point x(0) and its objective function value f(x(0)), and set the working
perturbation vector: P ← P0.
Start/Restart Routine:
Use an exploratory search around x(0) to find an improved point x(1) that has a better value
of the objective function.
IF the exploratory search fails (i.e. x(1) does not give a better value of the objective
function than x(0)) then:
o Reset all of the perturbations to half their current size, i.e. P ← P/2.
o If any member of P is now smaller than its corresponding perturbation tolerance
in T, then exit with x(0) as the solution. Else go to Start/Restart.
ELSE [x(1) gives a better value of the objective function than x(0), so we have an
improving direction]:
o Reset the perturbation vector to its original values: P ← P0.
o Go to Pattern Move.
Pattern Move:
Obtain tentative x(2) by a pattern move from x(0) through x(1).
Obtain final x(2) by an exploratory search around tentative x(2).
IF f(x(2)) is worse than f(x(1)) then:
o Update points: x(0) ← x(1). [x(1) is the best point seen so far]
o Go to Start/Restart.
ELSE [f(x(2)) is better than or equal to f(x(1))]:
o Update points: x(0) ← x(1) and x(1) ← x(2).
o Go to Pattern Move.
Note that pattern moves are repeated as long as they are successful, and usually become longer
and longer. But as soon as a pattern move fails by producing an f(x(2)) that is worse than the
previous best value of the objective function f(x(1)), then the pattern move is retracted and we go
back to an exploratory search around the best point seen so far.
Note also that a pattern move first obtains a tentative x(2) and then finalizes x(2) by an exploratory
search. This helps the search to "curve", as we will see in the upcoming example. Lastly note
how the algorithm stops: an exploratory search around the best point seen so far fails at all sizes
of perturbation, even the smallest as specified by the perturbation tolerances in T. This means
that the algorithm cannot find an improving direction away from the best point, hence it must be
the optimum point. This test can be fooled though, as we will see later.
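Assembling the routines gives the following minimal Python sketch (minimization assumed),
built on the explore and pattern_move sketches above; it follows the stated routine directly and
is not a tuned implementation:

def hooke_jeeves(f, x0, P0, T, a=2.0):
    x0, P = list(x0), list(P0)
    while True:
        # Start/Restart: exploratory search around the best point so far.
        f0 = f(x0)
        x1, f1 = explore(f, x0, P)
        if f1 >= f0:                          # exploratory search failed
            P = [p / 2 for p in P]            # halve all of the perturbations
            if any(p < t for p, t in zip(P, T)):
                return x0, f0                 # below tolerance: accept x0
            continue                          # go to Start/Restart
        P = list(P0)                          # success: restore original step sizes
        while True:                           # Pattern Move:
            x2 = pattern_move(x0, x1, a)      # tentative x2
            x2, f2 = explore(f, x2, P)        # final x2
            if f2 <= f1:                      # move accepted: keep extending
                x0, x1, f1 = x1, x2, f2
            else:                             # move rejected: retract it
                x0 = x1                       # x1 is the best point seen so far
                break                         # go to Start/Restart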
Hooke and Jeeves Example
Minimize f(x) = 3x1^2 + x2^2 - 12x1 - 8x2.
Initialize:
Initial point: x(0) = (1, 1). f(x(0)) = -16.
Acceleration factor a = 2.
Perturbation vector P0 = (0.5, 0.5).
Perturbation tolerance vector T = (0.1, 0.1).
P ← P0.
Note that these are not very good choices for P0 and T. They are chosen in this case just so
that the algorithm terminates after a small number of steps. The elements in T would
normally be much smaller.
Start/Restart:
fbest = f(x(0)) = -16.
Try x(1) = (1.5, 1). f(x(1)) = -18.25, so keep the perturbation and update fbest = -18.25.
Try x(1) = (1.5, 1.5). f(x(1)) = -21, so keep the perturbation and update fbest = -21.
The steps in the exploratory search are shown in this first Start/Restart, but are omitted from
here forward.
Pattern Move from x(0) = (1, 1) through x(1) = (1.5, 1.5):
Tentative x(2) = 2(1.5, 1.5) - (1, 1) = (2, 2). f(2, 2) = -24.
Final x(2) after exploratory search around tentative x(2) is (2.0, 2.5). f(x(2)) = -25.75 is
better than f(x(1)) = -21 so the move is accepted.
Update points: x(0) ← x(1) = (1.5, 1.5) and x(1) ← x(2) = (2.0, 2.5).
Pattern Move from x(0) = (1.5, 1.5) through x(1) = (2.0, 2.5):
Tentative x(2) = 2(2.0, 2.5) - (1.5, 1.5) = (2.5, 3.5). f(2.5, 3.5) = -27.
Final x(2) after exploratory search around tentative x(2) is (2.0, 4.0). f(x(2)) = -28 is better
than f(x(1)) = -25.75 so the move is accepted.
Update points: x(0) ← x(1) = (2.0, 2.5) and x(1) ← x(2) = (2.0, 4.0).
Pattern Move from x(0) = (2.0, 2.5) through x(1) = (2.0, 4.0):
Tentative x(2) = 2(2.0, 4.0) - (2.0, 2.5) = (2.0, 5.5). f(2.0, 5.5) = -25.75.
Final x(2) after exploratory search around tentative x(2) is (2.0, 5.0). f(x(2)) = -27 is worse
than f(x(1)) = -28 so the move is rejected.
Update points: x(0) ← x(1) = (2.0, 4.0).
Start/Restart:
Exploratory searches around x(0) = (2.0, 4.0) fail at P = (0.5, 0.5), at P = (0.25, 0.25), and at
P = (0.125, 0.125). After the third failure P ← (0.0625, 0.0625), whose members are smaller
than the corresponding tolerances in T = (0.1, 0.1), so the algorithm exits with
x(0) = (2.0, 4.0), f(x(0)) = -28, as the solution.
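For reference, a run of the earlier sketches on this example (using the illustrative hooke_jeeves
name from above):

f = lambda x: 3 * x[0] ** 2 + x[1] ** 2 - 12 * x[0] - 8 * x[1]
x, fx = hooke_jeeves(f, x0=(1, 1), P0=(0.5, 0.5), T=(0.1, 0.1), a=2.0)
print(x, fx)    # (2.0, 4.0) and -28.0, matching the trace above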
Note how the exploratory search around each tentative x(2) allows the search to "curve" around
corners.
The Matlab Global Optimization Toolbox includes (as of 2014) 3 pattern search methods:
generalized pattern search, generating set search, and mesh adaptive direct search. The
optimization package for the free Matlab-like Octave system includes a Nelder-Mead
implementation.
An interesting sub-field of pattern search research is determining how to find a good solution for
an unconstrained problem with the smallest number of function evaluations. This is important
when each function evaluation is expensive, either in time or actual money. It could be that
finding f(x) for some value of x requires a long-running simulation (expensive in terms of time),
or it requires the construction and testing of a physical prototype (expensive in terms of both
time and money). Some of the work along these lines can be found under the search term
"efficient global optimization".