Optimization
1
The general mathematical problem studied here is how to choose some variables, collected into a vector x = (x1, x2, . . . , xn), to maximize, or in some situations minimize, an objective function f(x), often subject to some equality constraints of the type g(x) = c and/or some inequality constraints of the type g(x) ≤ c. (In this section I focus on maximization with constraints. You can, and should as a good exercise to improve your understanding and facility with the methods, obtain similar conditions for problems of minimization, or ones with inequality constraints of the form g(x) ≥ c, merely by changing signs.)
I will begin with the simplest cases and proceed to more general and complex ones. Many
ideas are adequately explained using just two variables. Then instead of a vector x = (x1, x2)
I will use the simpler notation (x, y).
A warning: The proofs given below are loose and heuristic; a pure mathematician would disdain to call them proofs. But they should suffice for our applications-oriented purpose.
First some basic ideas and terminology.
An x satisfying all the constraints is called feasible. A particular feasible choice x*, say x* = (x1*, x2*, . . . , xn*), is called optimum if no other feasible choice gives a higher value of f(x) (but other feasible choices may tie this value), and a strict optimum if all other feasible choices give a lower value of f(x) (no ties allowed). An optimum x* is called local if the comparison is restricted to other feasible choices within a sufficiently small neighborhood of x* (using ordinary Euclidean distance). If the comparison holds against all other feasible points, no matter how far distant, the optimum is called global.
Every global optimum is a local optimum but not vice versa. There may be two or more global maximizers x^a, x^b, etc., but they must all yield equal values f(x^a) = f(x^b) = . . . of the objective function. There can be multiple local maximizers with different values of the function. A function can have at most one strict global maximizer; it may not have any (the optimum may fail to exist) if the function has discontinuities, or if it is defined only over an open interval or an infinite interval and keeps increasing without reaching a maximum.
We will look for conditions to locate optima. These conditions take the form of mathematical statements about the functions, or their derivatives. Consider any such statement S. We say that S is a necessary condition for x to be an optimum if, starting with the premise that x is an optimum, the truth of S follows by logical deduction. We say that S is a sufficient condition for x to be an optimum if the optimality of x follows as a logical deduction from the premise that S is true.
If a function is sufficiently differentiable, its regular maxima are characterized by conditions on derivatives. Other types of maxima are called irregular by contrast. We also classify the conditions according to the order of the derivatives of the functions: first-order, second-order, etc. Then we abbreviate the label of a condition by its order and type; for example, FONC stands for first-order necessary condition.
Finally, maxima may occur at an interior point in the domain of definition of the functions or at a boundary point. A point x is called an interior point of a set D if, for some positive real number ε, all points within (Euclidean) distance ε of x are also in D.
Notation: R will denote the real line, [a, b] a closed interval (includes end-points) of R, and ]a, b[ an open interval (excludes end-points). We consider a function f : [a, b] → R.
2.1
In each of the arguments in this section, we suppose that f(x) is sufficiently differentiable. To test whether a particular x* in the interior of [a, b], that is, in the open interval ]a, b[, gives a local maximum of f(x), we consider the effect on f(x) of moving x slightly away from x*, to x = x* + Δx. We use the Taylor expansion:

f(x) = f(x*) + f'(x*) Δx + (1/2) f''(x*) (Δx)² + . . .    (1)
Then we have
FONC: If an interior point x* of the domain of a differentiable function f is a (local or global) maximizer of f(x), then f'(x*) = 0.
Intuitive statement or sketch of proof: For Δx sufficiently small, the leading term in (1) is the first-order one, that is,

f(x* + Δx) ≈ f(x*) + f'(x*) Δx,

or

f(x* + Δx) − f(x*) ≈ f'(x*) Δx.

Since x* is interior (a < x* < b), we can choose the deviation Δx positive or negative. If f'(x*) were non-zero, then we could make f(x* + Δx) − f(x*) positive by choosing Δx to have the same sign as f'(x*). Then x* would not be a local maximizer. We have shown that f'(x*) ≠ 0 implies that x* cannot be a maximizer (not even local, and therefore certainly not global). Therefore if x* is a maximizer (local or global), we must have f'(x*) = 0. Note how, instead of the direct route of proving that "optimum" implies "condition-true", we took the indirect route: "condition-false" implies "no-optimum". (The two implications are logically equivalent, or "mutually contrapositive" in the jargon of formal logic.) Such proofs by contradiction are often useful.
Exercise: Go through a similar argument and show that the FONC for x* ∈ ]a, b[ to be a local minimizer is also f'(x*) = 0.
Now, taking the FONC as satisfied, turn to second-order conditions. First we have:
SONC: If an interior point x* of the domain of a twice-differentiable function f is a (local or global) maximizer of f(x), then f''(x*) ≤ 0.
Sketch of proof: The FONC above tells us f'(x*) = 0. Then for Δx sufficiently small, the Taylor expansion in (1) yields

f(x* + Δx) ≈ f(x*) + (1/2) f''(x*) (Δx)²,

or

f(x* + Δx) − f(x*) ≈ (1/2) f''(x*) (Δx)².

If f''(x*) > 0, then for Δx sufficiently small, f(x* + Δx) > f(x*), so x* cannot be a maximizer (not even local and certainly not global). Therefore, if x* is a maximizer, local or global, we must have f''(x*) ≤ 0. (Again a proof by contradiction.)
SOSC: If x* is an interior point of the domain of a twice-differentiable function f, and f'(x*) = 0 and f''(x*) < 0, then x* yields a strict local maximum of f(x).
Sketch of proof: Using the same expression, for Δx sufficiently small, f''(x*) < 0 implies f(x* + Δx) < f(x*). (A direct proof.)
A twice-differentiable function f is said to be (weakly) concave at x* if f''(x*) ≤ 0, and strictly concave at x* if f''(x*) < 0. ("Weakly" is the default option, intended unless "strictly" is specified.) Thus (given f'(x*) = 0) concavity at x* is necessary for x* to yield a local or global maximum, and strict concavity at x* is sufficient for x* to yield a strict local maximum. Soon I will define concavity in a more general and more useful way.
What if the FONC f'(x*) = 0 holds, but f''(x*) = 0 also? Thus the SONC is satisfied, but the SOSC is not (for either a maximum or a minimum). Any x with f'(x) = 0 is called a stationary point or critical point. Such a point may be a local extreme point (maximum or minimum), but need not be. It could be a point of inflexion, like 0 for f(x) = x³. To test the matter further, we must carry out the Taylor expansion (1) to higher-order terms. The general rule is that the first non-zero term on the right-hand side should be of an even power of Δx, say 2k. With the first (2k − 1) derivatives zero at x*, the necessary condition for a local maximum is that the 2k-th order derivative at x*, written f^(2k)(x*), should be ≤ 0; the corresponding sufficient condition is that it be < 0. Even this may not work; the function f defined by f(0) = 0 and f(x) = exp(−1/x²) for x ≠ 0 is obviously globally minimized at 0, but all its derivatives f^(k)(0) at 0 equal 0. Luckily such complications requiring checking of derivatives of third or higher order are very rare in economic applications.
2.2 Irregular Maxima
2.2.1 Non-differentiability (kinks and cusps)
Here suppose f is not differentiable at x*, but has left- and right-handed derivatives, denoted by f'(x*−) and f'(x*+) respectively, that may not be equal to each other.
FONC: If an interior point x* in the domain of a function f(x) is a local or global maximizer, and f has left and right first-order derivatives at x*, then f'(x*−) ≥ 0 ≥ f'(x*+).
In the special case where f is differentiable at x*, the two weak inequalities collapse to the usual f'(x*) = 0.
The proof works by looking at one-sided Taylor expansions:

f(x* + Δx) − f(x*) ≈ f'(x*−) Δx if Δx < 0,
f(x* + Δx) − f(x*) ≈ f'(x*+) Δx if Δx > 0,

and using the same kinds of arguments as in the proof for the regular FONC separately for positive and negative deviations Δx.
The intuition is that the function should be increasing from the left up to x*, and then decreasing to the right of x*. If the derivatives are finite we have a kink; if infinite, a cusp.
Now we also have, somewhat unusually, a first-order sufficient condition, making it unnecessary to go to any (left- and right-handed) second-order derivatives:
FOSC: If f has left and right first-order derivatives at an interior point x* in its domain, and f'(x*−) > 0 > f'(x*+), then x* is a local maximizer of f(x).
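This condition is easy to verify numerically with one-sided difference quotients. A minimal sketch (the function f(x) = 1 − |x| is my own example): at the kink x* = 0, the quotients approximate f'(0−) = 1 > 0 > f'(0+) = −1, so the FOSC holds and 0 is a local (in fact global) maximizer.

    # One-sided difference quotients at the kink of f(x) = 1 - |x|.
    f = lambda x: 1 - abs(x)
    h = 1e-6
    left  = (f(0.0) - f(-h)) / h   # approximates f'(0-): about +1
    right = (f(h) - f(0.0)) / h    # approximates f'(0+): about -1
    print(left, right)             # 1.0 -1.0: the kink FOSC for a maximum holds
    assert left > 0 > right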
2.2.2 End-Point Maxima
For a function defined over the interval [a, b], the boundaries of its domain are just the end-points a and b. A maximum, local or global, can occur at either. We test this in the same way as before, but only one-sided deviations are feasible. Therefore the first-order conditions become one-sided inequalities:
FONC: If a is a local or global maximizer of a function defined over [a, b] and differentiable at a, then f'(a) ≤ 0. If b is a local or global maximizer of such a function, then f'(b) ≥ 0.
Sketch of proof: From the Taylor expansion, for Δx sufficiently small,

f(a + Δx) − f(a) ≈ f'(a) Δx.

If f'(a) > 0, then for Δx > 0 we get f(a + Δx) > f(a) and a cannot give a local maximum. So f'(a) ≤ 0 is a necessary condition. Similar argument at b.
Intuitively, this just says that the function should not start increasing immediately to the right of a. If it actually started falling, we could be sure of a local maximum. Thus we have an unusual
FOSC: Suppose a function f defined over [a, b] is differentiable at a, and f'(a) < 0. Then a yields a local maximum of f(x). If the function is differentiable at b, and f'(b) > 0, then b yields a local maximum of f(x).
Exercise: Prove this. Note that at an end-point, differentiability simply means the appropriate one-sided differentiability.
If f'(a) = 0 or f'(b) = 0, the FONC is satisfied at that end-point but the FOSC is not, and we must turn to second-order conditions. These are the same as for interior maxima.
In economic applications, many variables are intrinsically non-negative, and the optimum choice for some of them may be 0. Then the usual FONC f'(0) = 0 need not be true and the more general FONC f'(0) ≤ 0 must be used.
Figure 1 sheds a slightly different and useful light on the idea of end-point maxima. Suppose the function could be extrapolated to the left beyond the lower end-point a of its domain, as shown by the thinner curve. It might go on increasing and attain a maximum at an x̄ as shown there; it might even go on increasing for ever (x̄ = −∞). At a the function is already on its downward-sloping portion to the right of such an x̄, so f'(a) < 0. A limiting case of this is where the maximum on the enlarged domain happens to coincide with a. Then we have f'(a) = 0, which satisfies the FONC but not the FOSC, so for checking sufficiency we have to go to second-order conditions as in the basic theory of regular maxima.
[Figure 1: the extrapolated function (thinner curve) peaks where f'(x) = 0; over the actual domain the maximum is at the end-point a, where f'(a) < 0.]
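The same situation can be checked symbolically. A small sketch with sympy (the function and domain are my own illustration): the extrapolated peak lies at x̄ = −1, to the left of the domain [0, 3], so the end-point a = 0 satisfies the FOSC f'(a) < 0.

    import sympy as sp

    x = sp.symbols('x')
    f = 10 - (x + 1)**2      # extrapolated peak at x = -1, left of the domain
    a, b = 0, 3              # domain [a, b]

    fp = sp.diff(f, x)
    print(sp.solve(fp, x))   # [-1]: the stationary point lies outside [a, b]
    print(fp.subs(x, a))     # f'(a) = -2 < 0: end-point FOSC, a is a local maximum
    print(fp.subs(x, b))     # f'(b) = -8 < 0: b fails its FONC f'(b) >= 0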
2.2.3 A General Example
Here is an example with many local, irregular and boundary maxima/minima, to collect a lot of possibilities for easy comparison and remembering. Consider the real-valued function f defined over the interval [−3, 3] by

f(x) = 3x + 8          for −3 ≤ x ≤ −2,
f(x) = (x⁴ − 2x²)/4    for −2 ≤ x ≤ 2,
f(x) = −x² + 6x − 6    for 2 ≤ x ≤ 3.
At the end-point x = −3, we have a local minimum, with f(−3) = −1. There f'(−3) = 3 > 0, satisfying (for a minimum) both the end-point FONC and FOSC. It is also the global minimum, by direct comparison with other local minima. At the other end-point x = 3, we have a local maximum, with f(3) = 3. There f'(3) = 0, satisfying the end-point FONC but not the FOSC. It is also the global maximum, by direct comparison with other local maxima.
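As a check, the whole candidate list for this example can be evaluated directly. A short sketch in plain Python (the candidate points collect the interior stationary points x = −1, 0, 1, the kinks at x = ±2 where the formula changes, and the end-points ±3):

    def f(x):
        # The piecewise example on [-3, 3].
        if -3 <= x <= -2:
            return 3*x + 8
        if -2 <= x <= 2:
            return (x**4 - 2*x**2) / 4
        if 2 <= x <= 3:
            return -x**2 + 6*x - 6
        raise ValueError("x outside [-3, 3]")

    for c in [-3, -2, -1, 0, 1, 2, 3]:
        print(c, f(c))
    # Output: f(-3) = -1, f(-2) = 2, f(-1) = -0.25, f(0) = 0,
    # f(1) = -0.25, f(2) = 2, f(3) = 3.
    # Global maximum at x = 3, global minimum at x = -3.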
2.2.4 Global Maxima
Here I develop sufficient conditions for a critical point to yield a global maximum, generalizing the concept of concavity.
Definition: A twice-differentiable function f defined on the domain [a, b] is (globally, weakly) concave if f''(x) ≤ 0 for all x ∈ [a, b]. If f''(x) < 0 for all x ∈ [a, b], we will call f (globally) strictly concave. ("Global" and "weak" are the default options, understood unless a more restricted domain or strictness is specified.) Thus a linear function is an extreme case of a (weakly) concave function.
SOSC: If f is concave over [a, b], then any x* satisfying an FONC is a global maximizer. If, further, f is globally strictly concave, then such an x* is the unique and strict global maximizer.
Sketch of proof: Taylor's theorem based on intermediate points says that for any x, we can find an x̄ between x and x* such that

f(x) = f(x*) + (x − x*) f'(x*) + (1/2) (x − x*)² f''(x̄).

If x* satisfies the FONC, the middle term on the right-hand side is zero; and if f is concave, f''(x̄) ≤ 0, so the last term is non-positive. Therefore f(x) ≤ f(x*) for every x in [a, b], and x* is a global maximizer. If f is strictly concave, the last term is strictly negative for x ≠ x*, so the maximum is strict and the maximizer unique.
We can now summarize the above analysis into a set of rules. Begin by using the necessary conditions to narrow down your search. Find all regular stationary points, that is, solutions of the equation f'(x) = 0, in ]a, b[. Then see if f fails to be differentiable at any point of ]a, b[; if so, check if the kink/cusp FONCs are satisfied at these points. Finally, check whether the FONCs at a and b are satisfied. (With experience, you will be able to omit some of these steps if you know they are not going to be relevant for the function at hand. For example, if the function is defined by a formula involving polynomial, exponential, etc. functions, it is differentiable. If the function is defined by separate such formulas over separate intervals, then at the points where the formula changes, you should suspect non-differentiability. If the formula involves logarithms or fractions, problems can arise where the argument of the logarithm or the denominator in the fraction becomes zero. And so on.)
If the function is concave, the necessary conditions should have led you to one and only one point, which is the global maximum. Otherwise, for each of the points that satisfy the necessary conditions, check the sufficient conditions. These give you the local maxima. If there is only one local maximum, it must be the global maximum. If there are several local maxima, the global maximum must be found by actually comparing the values of the function at all of them. (Again, experience will provide some short-cuts but it is not possible to lay down very general rules in advance.)
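For a function that is differentiable everywhere, this recipe reduces to comparing the stationary points and the end-points. A minimal sketch with sympy (the cubic and its domain are my own example, chosen so both steps matter):

    import sympy as sp

    x = sp.symbols('x')
    f = x**3 - 3*x                  # my own example, on the domain [-2, 3]
    a, b = -2, 3

    # Step 1: regular stationary points in ]a, b[ (FONC: f'(x) = 0).
    crit = [c for c in sp.solve(sp.diff(f, x), x) if c.is_real and a < c < b]
    # Step 2: no kinks here (a polynomial is differentiable everywhere).
    # Step 3: add the end-points, then compare values.
    candidates = [sp.Integer(a)] + crit + [sp.Integer(b)]
    best = max(candidates, key=lambda c: f.subs(x, c))
    print(best, f.subs(x, best))    # x = 3 with f(3) = 18: the global maximum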
2.3
For a function f of n variables, the Taylor expansion around an interior point x* of its domain D is

f(x* + Δx) = f(x*) + Σi fi(x*) Δxi + (1/2) Σi Σj fij(x*) Δxi Δxj + . . . ,    (2)

where the sums run over i, j = 1, 2, . . . , n, fi denotes the first-order partial derivative of f with respect to xi, and fij the second-order partial derivative with respect to xi and xj.
This leads to
FONC: If x* is an interior point of D, f is differentiable at x*, and x* yields a maximum (local or global) of f(x), then fi(x*) = 0 for all i.
Sketch of proof: Since x* is interior, a deviation in any direction is feasible. For each i, consider a deviation (Δx)i defined so that its i-th component Δxi is non-zero and all other components are zero. For Δxi sufficiently small,

f(x* + (Δx)i) − f(x*) ≈ fi(x*) Δxi.

If fi(x*) ≠ 0, then we can make f(x* + (Δx)i) > f(x*) by taking Δxi to be of the same sign as fi(x*). This contradiction establishes the necessary condition.
The intuition is simply that since the function is maximized as a whole at x*, it must be maximized with respect to each component of x, and then the standard one-dimensional argument applies for the necessary conditions. (The converse is not true; dimension-by-dimension maximization does not guarantee maximization as a whole. Therefore sufficient conditions are more than a simple compilation of those with respect to the variables x1, x2, . . . , xn taken one at a time.)
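In practice the FONC is a system of n simultaneous equations, which a computer-algebra system can solve. A minimal sketch with sympy (the concave quadratic objective is my own example):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = -x**2 - 2*y**2 + 2*x + 4*y          # my own example objective

    grad = [sp.diff(f, v) for v in (x, y)]  # FONC: all partial derivatives zero
    print(sp.solve(grad, (x, y), dict=True))
    # [{x: 1, y: 1}]: the unique stationary point, here the global maximum,
    # since f is strictly concave (its Hessian is checked in a later sketch).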
[Figures: the graph z = F(x, y) with its horizontal tangent plane at the maximizer (x*, y*), and (Figure 5) the contours of f around the interior maximizer (x*, y*).]
In multiple dimensions, irregular and boundary maxima can arise in a variety of ways. They are best handled in each specific context, and a general discussion is not very helpful.
There is one boundary case that deserves special attention. In economic applications,
the choice variables are often required to be non-negative. An optimum on the boundary
of the feasible set can then arise when some of the variables are zero and others positive.
Then the objective function should not increase as a zero variable is made slightly positive.
That is, the partial derivative of the objective function with respect to this variable should
be negative, or at worst zero, at the optimum. If some other variables are positive at the
optimum, the usual zero-partial-derivative conditions should continue to hold for them. A
demonstration for a function of two variables makes the point:
FONC: If (x, y) are required to be non-negative, and (0, y*) (where y* > 0) is a (local or global) maximizer of f(x, y), then fx(0, y*) ≤ 0 and fy(0, y*) = 0.
Sketch of proof: We have the usual Taylor expansion for a small deviation (Δx, Δy):

f(Δx, y* + Δy) − f(0, y*) ≈ fx(0, y*) Δx + fy(0, y*) Δy.

Consider a feasible deviation in the x-direction. Thus Δy = 0, and since x must be non-negative, we must have Δx > 0. Then the premise that (0, y*) gives a maximum rules out only fx(0, y*) > 0. Therefore we have the necessary condition fx(0, y*) ≤ 0. Next consider a feasible deviation in the y-direction. Here Δy can be of either sign. The same argument as for a function of one variable goes through, and rules out fy(0, y*) ≠ 0. Therefore fy(0, y*) = 0 is a necessary condition.
Figure 6 illustrates this. The figure is similar to Figure 5, except that the position of the y-axis has been shifted, and the points to its left are not allowed. The contours of f in the feasible region (non-negative (x, y)) are shown thicker. If the function could be extrapolated to the region of negative x, its maximum would occur at (x̄, ȳ). Given the non-negativity constraint, the best feasible point is instead the boundary point (0, y*).
[Figure 6: contours of f; the extrapolated maximum (x̄, ȳ) lies in the infeasible region x < 0, and the constrained maximum is at (0, y*) on the y-axis.]
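A numerical version of this corner check, as a sketch (the objective, whose unconstrained peak (−1, 2) lies in the infeasible region x < 0, is my own example):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = -(x + 1)**2 - (y - 2)**2       # unconstrained peak at (-1, 2): infeasible

    fx, fy = sp.diff(f, x), sp.diff(f, y)
    ystar = sp.solve(fy.subs(x, 0), y)[0]   # choose y* so that fy(0, y*) = 0
    print(ystar)                            # y* = 2
    print(fx.subs({x: 0, y: ystar}))        # fx(0, y*) = -2 <= 0: corner FONC holds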
2.3.3 Second-Order Conditions
As with functions of one variable, there can be stationary points that are neither local maxima nor local minima. With several variables, a new situation can arise: a stationary point may yield a maximum of f with respect to one subset of variables and a minimum with respect to the rest. Then the graph of f(x) looks like a saddle. Exercise: Draw a picture.
Concavity is relevant in the second-order conditions for unconstrained maximization of functions of several variables just as it was for functions of one variable. Once again we have three definitions, which are equivalent if the function is twice differentiable.
Definitions for twice-differentiable functions: f is (weakly) concave at x if (fij(x)) is the matrix of a negative semi-definite quadratic form, that is, if

Σi Σj fij(x) zi zj ≤ 0 for all vectors z = (z1, z2, . . . , zn),

and strictly concave at x if this quadratic form is negative definite (the inequality strict for all z ≠ 0).
f is (weakly) globally concave over D if (fij(x)) is the matrix of a negative semi-definite quadratic form for all x ∈ D; globally strictly concave over D if that quadratic form is negative definite for all x ∈ D.
Definition for once-differentiable functions ("curve lies below any of its tangent hyperplanes"): f is (weakly) concave on D if, for any vectors x and x* in D,

f(x) ≤ f(x*) + Σi (xi − xi*) fi(x*) = f(x*) + ∇f(x*) · (x − x*),

where the sum runs over i = 1, 2, . . . , n. The last term on the right-hand side is the inner product of the two vectors ∇f(x*) (the gradient vector of f) and (x − x*). Strict concavity requires the inequality to be strict whenever x ≠ x*.
Definition without differentiability requirement ("any chord lies below the curve"): f is concave on D if, for any x', x'' in D and any λ ∈ [0, 1], on writing x = λ x' + (1 − λ) x'', we have

f(x) ≥ λ f(x') + (1 − λ) f(x'').

Strict concavity requires the inequality to be strict whenever x' ≠ x'' and λ ≠ 0 or 1. For this definition to be useful, the domain D should be a convex set, that is, for any x', x'' in D and any λ ∈ [0, 1], the vector x = λ x' + (1 − λ) x'' should also be in D.
The concavity conditions in terms of second derivatives are much more complex for functions of several variables than those for functions of one variable. But the "curve below tangent" and "chord below curve" definitions for functions of several variables are straightforward generalizations of those for functions of one variable, and the geometry is essentially the same. Therefore these definitions are much simpler to visualize, and to ensure that they hold in specific applications.
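The "chord below curve" definition can even be probed numerically by sampling random chords. A minimal sketch with numpy (the concave function and the sampling scheme are my own illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda p: -p[0]**2 - p[1]**2            # my own concave example

    for _ in range(1000):
        xp = rng.uniform(-5, 5, 2)              # x' in D
        xq = rng.uniform(-5, 5, 2)              # x'' in D
        lam = rng.uniform()                     # lambda in [0, 1]
        chord = lam * f(xp) + (1 - lam) * f(xq)
        curve = f(lam * xp + (1 - lam) * xq)
        assert curve >= chord - 1e-12           # chord lies (weakly) below curve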
Using these definitions, we have the conditions
SONC: Weak concavity of f at x* is a necessary condition for this point to be a local maximum of f(x).
Local SOSC: Strict concavity of f at x* is sufficient for that point to yield a strict local maximum of f(x).
Global SOSC: If f is concave on D, then any point satisfying an FONC is a global maximizer. If f is strictly concave on D, then at most one point can satisfy its FONC, and it is the unique and strict global maximizer.
2.3.4
The rules are similar to those with one variable: first narrow down the search using the FONCs. For interior regular maxima, the critical points are defined by fi(x) = 0 for i = 1, 2, . . . , n. These are n simultaneous equations in the n unknowns (x1, x2, . . . , xn). The equations are non-linear and may have no solution or multiple solutions, but in almost all cases there will be a finite number of solutions. Thus the search is narrowed down.
Next, either use local SOSCs to identify local maxima and select the largest among them
for the global maximum, or if possible use concavity directly to identify the global maximum.
But SOSCs in terms of second-order derivatives are often hard to check. Verifying that a quadratic form is negative definite requires computing the determinants of all the leading principal minors of its matrix (and negative semi-definiteness requires checking all the principal minors, not just the leading ones). In practice, a little geometric imagination saves a lot of work. By drawing (or visualizing) the graph of f(x), one can often quickly identify which stationary points are maxima and which ones minima or saddle-points or the like. In this course, if there is a difficulty with second-order conditions in a many-variable optimization problem, you will be alerted explicitly.
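The leading-principal-minor test itself is mechanical. A small numpy sketch (the sign pattern (−1)^k det(Hk) > 0 for k = 1, . . . , n characterizes negative definiteness; the Hessian below is that of the quadratic objective from the earlier sketch):

    import numpy as np

    def is_negative_definite(H):
        # Leading-principal-minor test: (-1)^k det(H_k) > 0 for k = 1..n.
        n = H.shape[0]
        return all((-1)**k * np.linalg.det(H[:k, :k]) > 0
                   for k in range(1, n + 1))

    # Hessian of f = -x^2 - 2y^2 + 2x + 4y: constant in x and y.
    H = np.array([[-2.0, 0.0],
                  [0.0, -4.0]])
    print(is_negative_definite(H))  # True: the stationary point (1, 1) is a strict maximum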
Irregular and boundary maxima are often better identified by geometric visualization than by algebra. Of course doing this successfully needs practice. We will generally not need boundary solutions except for the case when variables are required to be non-negative and the optimum for one or more of them is at zero; I will alert you explicitly in such cases.
**** Sections on constrained optimization to come. ****