
Introduction to Mathematical Optimization

R. Clark Robinson
Copyright © 2013 by R. Clark Robinson
Department of Mathematics
Northwestern University
Evanston Illinois 60208 USA
Contents

Preface v
Chapter 1. Linear Programming 1
1.1. Basic Problem 1
1.2. Graphical Solution 2
1.3. Simplex Method 5
1.3.1. Slack Variables 5
1.3.2. Simplex Method for Resource Requirements 7
1.3.3. General Constraints 10
1.4. Duality 17
1.4.1. Duality for Non-standard Linear Programs 19
1.4.2. Duality Theorems 21
1.5. Sensitivity Analysis 27
1.5.1. Changes in Objective Function Coefficients 29
1.6. Theory for Simplex Method 31
Chapter 2. Unconstrained Extrema 39
2.1. Mathematical Background 39
2.1.1. Types of Subsets of Rn 39
2.1.2. Continuous Functions 41
2.1.3. Existence of Extrema 43
2.1.4. Differentiation in Multi-Dimensions 44
2.1.5. Second Derivative and Taylor’s Theorem 47
2.1.6. Quadratic Forms 50
2.2. Derivative Conditions 55
2.2.1. First Derivative Conditions 56
2.2.2. Second Derivative Conditions 56
Chapter 3. Constrained Extrema 61
3.1. Implicit Function Theorem 61
3.2. Extrema with Equality Constraints 69
3.2.1. Interpretation of Lagrange Multipliers 73


3.3. Extrema with Inequality Constraints: Necessary Conditions 75


3.4. Extrema with Inequality Constraints: Sufficient Conditions 81
3.4.1. Convex Structures 81
3.4.2. Karush-Kuhn-Tucker Theorem under Convexity 84
3.4.3. Rescaled Convex Functions 88
3.4.4. Global Extrema for Concave Functions 92
3.4.5. Proof of Karush-Kuhn-Tucker Theorem 93
3.5. Second-Order Conditions for Extrema of Constrained Functions 98
Chapter 4. Dynamic Programming 107
4.1. Parametric Maximization and Correspondences 107
4.1.1. Budget Correspondence for Commodity Bundles 113
4.1.2. Existence of a Nash Equilibrium 114
4.2. Finite-Horizon Dynamic Programming 117
4.2.1. Supremum and Infimum 121
4.2.2. General Theorems 121
4.3. Infinite-Horizon Dynamic Program 124
4.3.1. Examples 127
4.3.2. Theorems for Bounded Reward Function 134
4.3.3. Theorems for One-Sector Economy 136
4.3.4. Continuity of Value Function 141
Appendix A. Mathematical Language 147
Bibliography 149
Index 151
Preface
This book has been used in an upper-division undergraduate course on optimization given in
the Mathematics Department at Northwestern University. Only deterministic problems with a
continuous choice of options are considered, hence optimization of functions whose variables
are (possibly) restricted to a subset of the real numbers or some Euclidean space. We treat
the case of both linear and nonlinear functions. Optimization of linear functions with linear
constraints is the topic of Chapter 1, linear programming. The optimization of nonlinear
functions begins in Chapter 2 with a more complete treatment of maximization of unconstrained
functions than is covered in calculus. Chapter 3 considers optimization with constraints. First,
we treat equality constraints, which includes the Implicit Function Theorem and the method of
Lagrange multipliers. Then we treat inequality constraints, which is covered by Karush-Kuhn-
Tucker theory. Included is a consideration of convex and concave functions. Finally, Chapter
4 considers maximization over multiple time periods, or dynamic programming. Although we
consider only discrete time, we treat both a finite and an infinite number of time periods. In
previous years, I have used the textbooks by Sundaram [14] and Walker [16]. Our presentation
of linear programming is heavily influenced by [16] and the material on nonlinear functions by [14].
The prerequisites include courses on linear algebra and differentiation of functions of several
variables. Knowledge of linear algebra is especially important. The simplex method to
solve linear programs involves a special type of row reduction of matrices. We also use the
concepts of the rank of a matrix and linear independence of a collection of vectors. In terms of
differentiation, we assume that the reader knows about partial derivatives and the gradient of a
real-valued function. With this background, we introduce the derivative of a function between
Euclidean spaces as a matrix. This approach is used to treat implicit differentiation with several
constraints and independence of constraints. In addition, we assume an exposure to the formal
presentation of mathematical definitions and theorems that our students get through a course
on the foundations of higher mathematics or a calculus course that introduces formal mathematical
notation, as our freshman MENU and MMSS courses do at Northwestern. Appendix A contains
a brief summary of some of the mathematical language that is assumed from such a course. We
do not assume the reader has had a course in real analysis. We do introduce a few concepts and
some terminology that are covered in depth in such a course, but merely require the reader to gain a
working knowledge of this material so we can state results succinctly and accurately.
I would appreciate being informed by email of any errors or confusing statements, whether
typographical, numerical, grammatical, or miscellaneous.
R. Clark Robinson
Northwestern University
Evanston Illinois
[email protected]
May 2013

Chapter 1

Linear Programming

In this chapter, we begin our consideration of optimization by considering linear programming,
maximization or minimization of linear functions over a region determined by linear inequalities.
The word ‘programming’ does not refer to ‘computer programming’ but originally referred
to the ‘preparation of a schedule of activities’. Also, these problems can be solved by an
algorithm that can be implemented on a computer, so linear programming is a widely used practical
tool. We will only present the ideas and work low-dimensional examples to gain an understanding
of what the computer is doing when it solves the problem. However, the method scales well
to examples with many variables and many constraints.

1.1. Basic Problem


For a real-valued function f defined on a set F, written f : F ⊂ R^n → R, f(F) =
{f(x) : x ∈ F} is the set of attainable values of f on F, or the image of F by f.
We say that the real-valued function f has a maximum on F at a point xM ∈ F provided
that f(xM) ≥ f(x) for all x ∈ F. When xM exists, the maximum value f(xM) is written as
max{f(x) : x ∈ F}
and is called the maximum or maximal value of f on F. We also say that xM is a maximizer
of f on F, or that xM maximizes f(x) subject to x ∈ F.
In the same way, we say that f has a minimum on F at a point xm ∈ F provided that
f(xm) ≤ f(x) for all x ∈ F. The value f(xm) is written as min{f(x) : x ∈ F} and is
called the minimum or minimal value of f on F. We also say that xm is a minimizer of f on
F, or that xm minimizes f(x) subject to x ∈ F.
The function f has an extremum at x0 ∈ F provided that x0 is either a maximizer or a
minimizer. The point x0 is also called an optimal solution.
Optimization Questions.
• Does f (x) attain a maximum (or minimum) on F for some x ∈ F ?
• If so, what are the points at which f (x) attains a maximum (or minimum) subject to
x ∈ F , and what is the maximal value (or minimal value)?
There are problems that have neither a maximum nor a minimum. For example, there is no
maximizer or minimizer of f(x) = 1/x on (0, ∞).


Notations
We denote the transpose of a matrix (or a vector) A by A^T.
For two vectors v and w in R^n, v ≥ w means that vi ≥ wi for all 1 ≤ i ≤ n. In
the same way, v ≫ w means that vi > wi for all 1 ≤ i ≤ n. We also consider two positive
“quadrants” of R^n,
R^n_+ = { x ∈ R^n : x ≥ 0 } = { x ∈ R^n : xi ≥ 0 for 1 ≤ i ≤ n } and
R^n_++ = { x ∈ R^n : x ≫ 0 } = { x ∈ R^n : xi > 0 for 1 ≤ i ≤ n }.
Definition. The rank of an m × n matrix A is used at various times in this course. The rank
of A, rank(A), is the dimension of the column space of A, i.e., the largest number of linearly
independent columns of A. This integer is the same as the number of pivots in the row reduced
echelon form of A. (See Lay [9].)
It is possible to determine the rank by determinants of submatrices: rank(A) = k iff
k ≥ 0 is the largest integer with det(Ak) ≠ 0, where Ak is any k × k submatrix of A
formed by selecting k columns and k rows. To see this is equivalent, let A′ be a submatrix of
A with k columns that are linearly independent and span the column space, so rank(A′) = k.
The dimension of the row space of A′ is k, so there are k rows of A′ forming a k × k
submatrix Ak with rank(Ak) = k, so det(Ak) ≠ 0. The submatrix Ak is also a submatrix
of pivot columns and rows. Note that in linear algebra, the pivot columns are usually chosen with
the matrix in echelon form, with zeroes in each row to the left of the pivot position. We often
choose other columns, as will be seen in the examples given in the rest of the chapter.
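
The rank of a specific matrix can also be checked numerically. A minimal sketch in Python (assuming the numpy library is available), applied to the coefficient matrix with slack variables that appears in Section 1.3:

import numpy as np

# Coefficient matrix [A, I] of the production example of Section 1.3,
# columns x1, x2, s1, s2, s3.
A = np.array([[1, 1, 1, 0, 0],
              [5, 10, 0, 1, 0],
              [2, 1, 0, 0, 1]])

# The rank is the dimension of the column space, i.e., the number of pivots.
print(np.linalg.matrix_rank(A))   # prints 3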

1.2. Graphical Solution


A linear programming problem with a small number of variables can be solved graphically by
finding the vertices of the set of allowed values of the variables. We illustrate this solution
method with an example.
Example 1.1 (Production). A company can make two products with x1 and x2 being the
amount of each and with profit f(x1, x2) = 8x1 + 6x2. Because of limits on the amounts of
three inputs to production, there are three constraints on production. The first input I1 restricts
production by x1 + x2 ≤ 2, the second input I2 restricts 5x1 + 10x2 ≤ 16, and the third input
I3 restricts 2x1 + x2 ≤ 3. Therefore the problem is
Maximize: 8x1 + 6x2 (profit),
Subject to: x1 + x2 ≤ 2, ( I1 )
5x1 + 10x2 ≤ 16, (I2 )
2x1 + x2 ≤ 3, (I3 )
x1 ≥ 0, x2 ≥ 0.
The function f (x1 , x2 ) and all the constraint functions are linear so this is a linear pro-
gramming problem. The feasible set F is the set of all the points satisfying the constraint
equations and is the shaded region in Figure 1.2.1.
The vertices of the feasible region are (0, 0), (1.5, 0), (1, 1), (0.8, 1.2), and (0, 1.6).
Other points where two constraints are equal include the points (14/15, 17/15), (2, 0), (3.2, 0),
(0, 2), and (0, 3), each of which lies outside the feasible set (some other constraint function is
negative).
Since the gradient ∇f(x1, x2) = (8, 6)^T ≠ (0, 0)^T, the maximum must be on the boundary
of F. Since the value of f(x1, x2) along an edge is a linear combination of the values at
the end points, if a point in the middle of one of the edges were a maximizer, then f would have
the same value at the two end points of the edge. Therefore, a point that maximizes f can be
found at one of the vertices of F.

[Figure 1.2.1. Feasible set for the production problem: the region bounded by the axes and the constraint lines I1, I2, I3, with vertices (0, 0), (1.5, 0), (1, 1), (0.8, 1.2), and (0, 1.6).]

The values of the objective function at the vertices are f (0, 0) = 0, f (1.5, 0) = 12,
f (1, 1) = 14, f (0.8, 1.2) = 13.6, and f (0, 1.6) = 9.6. Therefore, the maximizer on F is
(1, 1) and the maximal value is 14. 
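
This vertex computation is easy to check numerically. The following is a minimal sketch in Python (numpy assumed; the variable names are ours): intersect the constraint lines pairwise and keep only the feasible intersections, which are exactly the vertices of F.

import numpy as np
from itertools import combinations

# Constraints of Example 1.1 written as a_i . x <= b_i,
# including x1 >= 0 and x2 >= 0 as -x_i <= 0.
A = np.array([[1.0, 1], [5, 10], [2, 1], [-1, 0], [0, -1]])
b = np.array([2.0, 16, 3, 0, 0])
c = np.array([8.0, 6])  # objective 8 x1 + 6 x2

vertices = []
for i, j in combinations(range(len(b)), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue  # the two constraint lines are parallel
    x = np.linalg.solve(M, b[[i, j]])
    if np.all(A @ x <= b + 1e-9):   # keep feasible intersections only
        vertices.append(x)

best = max(vertices, key=lambda x: c @ x)
print(best, c @ best)   # [1. 1.] 14.0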
Definition. A resource constraint is one of the form ai1 x1 + · · · + ain xn ≤ bi with bi ≥ 0.
The name refers to the upper limit on the combination of the variables.
A standard maximization linear programming problem has only resource constraints:
Maximize: f (x) = c · x = c1 x1 + · · · + cn xn ,
Subject to: a11 x1 + · · · + a1n xn ≤ b1 ,
⋮
am1 x1 + · · · + amn xn ≤ bm ,
xj ≥ 0 for 1 ≤ j ≤ n.
The data specifying the problem includes an m × n matrix A = (aij), c = (c1, . . . , cn)^T ∈
R^n, and b = (b1, . . . , bm)^T ∈ R^m_+ with all the bi ≥ 0.
Using matrix notation, the constraints can be written as Ax ≤ b and x ≥ 0. The feasible
set is the set F = { x ∈ Rn+ : Ax ≤ b }.
Since the objective is to maximize f (x), the function f (x) is called the objective function.
A nonstandard linear program allows other types of inequalities for the constraints. See the
exercises of this section and Section 1.3.3 for examples of these other types of constraints.

Example 1.2 (Minimization).


Minimize: 3x1 + 2x2 ,
Subject to: 2x1 + x2 ≥ 4,
x1 + x2 ≥ 3,
x1 + 2x2 ≥ 4,
x1 ≥ 0, and x2 ≥ 0.

[Figure 1.2.2. Feasible set for the minimization example: an unbounded region with vertices (4, 0), (2, 1), (1, 2), and (0, 4).]

The feasible set, shaded in Figure 1.2.2, is unbounded, but f(x) ≥ 0 on the feasible set, so
f is bounded below and has a minimum. The vertices are (4, 0), (2, 1), (1, 2), and (0, 4), with values
f(4, 0) = 12, f(2, 1) = 8, f(1, 2) = 7, and f(0, 4) = 12. Therefore, the minimum value
is 7, which is attained at (1, 2). □
Graphical Solution Method for a Linear Programming Problem
1. Determine or draw the feasible set F . If F = ∅, then the problem has no
optimal solution, and it is said to be infeasible.
2. The problem is called unbounded and has no optimal solution provided that
F ≠ ∅ and the objective function on F has
a. arbitrarily large positive values for a maximization problem, or
b. arbitrarily large negative values for a minimization problem.
3. A problem is called bounded provided that it is neither infeasible nor un-
bounded. In this case, an optimal solution exists. Determine all the vertices of
F and values of the objective function at the vertices. Choose the vertex of
F producing the maximal or minimal value of the objective function.
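
The three cases in this method can also be checked with an off-the-shelf solver. A sketch using scipy's linprog (an assumption; any LP solver would do), applied to Example 1.1. Since linprog minimizes, we negate the objective to maximize:

from scipy.optimize import linprog

# Example 1.1: maximize 8 x1 + 6 x2 subject to Ax <= b, x >= 0.
res = linprog(c=[-8, -6],                      # negated to maximize
              A_ub=[[1, 1], [5, 10], [2, 1]],
              b_ub=[2, 16, 3],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # approximately [1. 1.] 14.0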

1.2. Exercises
1.2.1. Consider the following maximization linear programming problem.
Maximize 4x + 5y
Subject to: x + 2y ≤ 12
x+y ≤7
3x + 2y ≤ 18
0 ≤ x, 0 ≤ y .
a. Sketch the feasible set.
b. Why must a maximum exist?
c. Find the maximal feasible solution and the maximal value using the geometric
method.
1.2.2. Consider the following linear programming problem:
Minimize: 3x + 2y
Subject to: 2x + y ≥ 4
x+y ≥3
x ≥ 0, y ≥ 0.
1. Sketch the feasible set.
2. Why must a minimum exist?
3. Find the optimal feasible solution and the optimal value using the geometric
method.
1.2.3. Consider the following minimization linear programming problem.
Minimize 3x + 2y
Subject to: 2x + y ≥ 4
x+y ≥3
3x + 2y ≤ 18
0 ≤ x, 0 ≤ y .
a. Sketch the feasible set.
b. Why must a minimum exist?
c. Find the minimal feasible solution and the minimal value using the geometric
method.
1.3. Simplex Method 5

1.2.4. Show graphically that the following linear program does not have a unique solution:
Maximize 30x + 40y
Subject to: 3x + 4y ≤ 48
x + y ≤ 14
0 ≤ x, 0 ≤ y .
1.2.5. Consider the following maximization linear programming problem.
Maximize 4x + 5y
Subject to: x + 2y ≥ 0
2x − y ≥ 3
0 ≤ x, 0 ≤ y .
a. Sketch the feasible set.
b. Explain why the problem is unbounded and a maximizer does not exist.

1.3. Simplex Method


The graphical solution method is not a practical algorithm for most problems because the
number of vertices for a linear program grows very fast as the number of variables and constraints
increases. Dantzig developed a practical algorithm based on row reduction from linear algebra.
The first step is to add more variables to the standard maximization linear programming
problem to make all the inequalities of the form xi ≥ 0 for some variables xi.

1.3.1. Slack Variables


Definition. For a resource constraint, a slack variable can be set equal to the amount of unused
resource: The inequality ai1 x1 + · · · + ain xn ≤ bi with slack variable si becomes
ai1 x1 + · · · + ain xn + si = bi with si ≥ 0.
Example 1.3. The production linear program given earlier has only resource constraints:
Maximize: 8x1 + 6x2 ,
Subject to: x1 + x2 ≤ 2,
5x1 + 10x2 ≤ 16,
2x1 + x2 ≤ 3,
x1 ≥ 0, x2 ≥ 0.
When the slack variables are included the equations become
x1 + x2 + s1 = 2
5x1 + 10x2 + s2 = 16
2x1 + x2 + s3 = 3.
Therefore, the problem is
Maximize: (8, 6, 0, 0, 0) · (x1, x2, s1, s2, s3)
Subject to:
[ 1   1  1  0  0 ] [x1]   [  2 ]
[ 5  10  0  1  0 ] [x2] = [ 16 ]
[ 2   1  0  0  1 ] [s1]   [  3 ]
                   [s2]
                   [s3]
and x1 ≥ 0, x2 ≥ 0, s1 ≥ 0, s2 ≥ 0, s3 ≥ 0.

The last three columns of the coefficient matrix for the slack variables can be considered as
pivot columns and are linearly independent. Thus, the rank of the matrix is 3 and a solution to
the nonhomogeneous system has two free variables. The starting solution p0 = (0, 0, 2, 16, 3)
in the feasible set is formed by setting the free variables x1 and x2 equal to zero and solving for
the values of the three slack variables. The value is f (p0 ) = 0.
We proceed to change pivots so that a different pair of variables become the free variables set
equal to zero and a different triple of variables are positive, in a process called “pivoting”.

s3

p3
p2
s2
s1
p0 p1

If we leave the vertex (x1 , x2 ) = (0, 0), or p0 = (0, 0, 2, 16, 3), making x1 > 0 the
entering variable while keeping x2 = 0, the first slack variable to become zero is s3 when
x1 = 1.5. We have arrived at the vertex (x1 , x2 ) = (1.5, 0) of the feasible set by moving
along one edge. Therefore we have a new solution with two zero variables and three nonzero
variables, p1 = (1.5, 0, 0.5, 8.5, 0) with f (p1 ) = 8(1.5) = 12 > 0 = f (p0 ). The point p1
is a better feasible solution than p0. At this vertex, all of input three is used, s3 = 0.
We repeat leaving p1 by making x2 > 0 the entering variable while keeping s3 = 0;
the first other variable to become zero is s1. We have arrived at p2 = (1, 1, 0, 1, 0) with
f(p2) = 8(1) + 6(1) = 14 > 12 = f(p1). The point p2 is a better feasible solution
than p1. In the x-plane, we leave (x1, x2) = (1.5, 0) and arrive at (x1, x2) = (1, 1). At
this point, all of inputs one and three are used, s1 = 0 and s3 = 0, and s2 = 1 unit of input
two remains.
If we were to leave p2 by making s3 > 0 the entering variable while keeping s1 = 0,
the first variable to become zero is s2 , and we arrive at the point p3 = (0.8, 1.2, 0, 0, 0.2).
The objective function f (p3 ) = 8(0.8) + 6(1.2) = 13.6 < 14 = f (p2 ), and p3 is a worse
feasible solution than p2 .
Let z ∈ F \ {p2}, v = z − p2, and vj = pj − p2 for j = 1, 3. Then
f(vj) = f(pj) − f(p2) < 0 for j = 1, 3.
The vectors {v1, v3} form a basis of R^2, so v = y1 v1 + y3 v3 for some yj. The vector v
points into the feasible set, so y1, y3 ≥ 0 and either y1 > 0 or y3 > 0. Therefore,
f(z) = f(p2) + f(v) = f(p2) + y1 f(v1) + y3 f(v3) < f(p2).
Since f cannot increase by moving along either edge going out from p2, f(p2) is the maximum
among the values at all the points of the feasible set and p2 is an optimal feasible
solution.
In this example, the points p0 = (0, 0, 2, 16, 3), p1 = (1.5, 0, 0.5, 8.5, 0), p2 =
(1, 1, 0, 1, 0), p3 = (0.8, 1.2, 0, 0, 0.2), and p4 = (0, 1.6, 0.4, 0, 1.4) are called basic
solutions since at most 3 variables are positive, where 3 is the rank of the coefficient matrix for
the constraint equations. □
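
The arithmetic in this example can be verified mechanically. A minimal sketch in Python (numpy assumed) that checks that each claimed basic solution satisfies the slack-variable equations and compares the objective values:

import numpy as np

# Slack-variable form of the production problem: A_bar p = b.
A_bar = np.array([[1., 1, 1, 0, 0],
                  [5, 10, 0, 1, 0],
                  [2, 1, 0, 0, 1]])
b = np.array([2., 16, 3])
c = np.array([8., 6, 0, 0, 0])

points = {'p0': [0, 0, 2, 16, 3],
          'p1': [1.5, 0, 0.5, 8.5, 0],
          'p2': [1, 1, 0, 1, 0],
          'p3': [0.8, 1.2, 0, 0, 0.2],
          'p4': [0, 1.6, 0.4, 0, 1.4]}

for name, p in points.items():
    p = np.array(p, dtype=float)
    assert np.allclose(A_bar @ p, b)          # satisfies the equations
    print(name, 'feasible:', bool(np.all(p >= 0)), ' f =', c @ p)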
Definition. Consider a standard maximization linear programming problem with only resource
constraints: Maximize f(x) = c · x subject to Ax ≤ b and x ≥ 0 for c = (c1, . . . , cn)^T
in R^n, A = (aij) an m × n matrix, and b = (b1, . . . , bm)^T with bi ≥ 0 for 1 ≤ i ≤ m.
The corresponding standard maximization linear programming problem with slack variables
s1 ,. . . , sm added, called slack-variable form, is the following:

Maximize: f (x) = c · x
Subject to: a11 x1 + · · · + a1n xn + s1 = b1 ,
⋮
am1 x1 + · · · + amn xn + sm = bm ,
xj ≥ 0 for 1 ≤ j ≤ n, and
si ≥ 0 for 1 ≤ i ≤ m.

Using matrix notation with I the m × m identity matrix, the problem is to maximize f(x) =
c · x subject to
[ A, I ] (x; s) = b with x ≥ 0 and s ≥ 0,
where (x; s) denotes the column vector with x stacked above s.

When solving this matrix equation by row reduction, we indicate the partition of the aug-
mented matrix into the columns of (i) the original variables, (ii) the slack variables, and (iii)
the constant vector b by inserting extra vertical lines between A and I and between I and b,
[ A | I | b ].

1.3.2. Simplex Method for Resource Requirements


Definition. Assume that Āx = b with x ≥ 0 is in slack-variable form, where Ā is an
m × (n + m) matrix with rank m and b ≥ 0. A general solution from a row reduced
echelon form of the augmented matrix [Ā|b] writes the m dependent variables corresponding
to the pivot columns in terms of n free variables corresponding to the non-pivot columns.
The m dependent variables of a solution are called the basic variables. The n free variables
of a solution are called the non-basic variables. In applications to linear programming, we
usually take the solution formed by setting the free variables equal to zero, so there are at most
m nonzero components, corresponding to some or all of the basic variables. Since the basic
variables correspond to pivot columns of some row reduced matrix, the columns of the basic
variables are linearly independent.
For a solution Āp = b, if the set of columns of Ā corresponding to the pi ≠ 0, {Ci :
pi ≠ 0}, is linearly independent, then p is called a basic solution. If p is also feasible
with p ≥ 0, then it is called a basic feasible solution. Since rank(Ā) = m, there
can be at most m nonzero components in a basic solution. A basic solution p is called non-
degenerate provided that there are exactly m nonzero components. A basic solution p is called
degenerate provided that there are fewer than m nonzero components. We find basic solutions
by setting the n free variables equal to zero and allowing some basic variables to also be zero. If all
the basic variables are nonzero then the solution is non-degenerate; otherwise it is degenerate.

Example 1.4 (Linear Algebra Solution of the Production Linear Program). The augmented
matrix for the original system is

  x1  x2  s1  s2  s3
[  1   1   1   0   0 |  2 ]
[  5  10   0   1   0 | 16 ]
[  2   1   0   0   1 |  3 ]

with free variables x1 and x2 and basic variables s1, s2, and s3.
If we make x1 > 0 while keeping x2 = 0, x1 becomes a new basic variable, called the
entering variable.

(i) s1 will become zero when x1 = 2/1 = 2,
(ii) s2 will become zero when x1 = 16/5 = 3.2, and
(iii) s3 will become zero when x1 = 3/2 = 1.5.
Since s3 becomes zero for the smallest value of x1, s3 is the departing variable. Since the third
row determined the value of s3, the entry in the first column, third row is the new pivot.
Row reducing to make a pivot in the first column, third row, we get

  x1  x2  s1  s2  s3              x1   x2  s1  s2    s3
[  1   1   1   0   0 |  2 ]     [  0  0.5   1   0  -0.5 | 0.5 ]
[  5  10   0   1   0 | 16 ]  ∼  [  0  7.5   0   1  -2.5 | 8.5 ]
[  2   1   0   0   1 |  3 ]     [  1  0.5   0   0   0.5 | 1.5 ]
Setting the free variables x2 and s3 equal to zero gives p1 = (1.5, 0, 0.5, 8.5, 0)^T as the new
basic solution. The entries in the furthest right (augmented) column give the values of the basic
variables and are all positive.
In order to keep track of the value of the objective function (or variable), f = 8x1 + 6x2,
or −8x1 − 6x2 + f = 0, we include in the augmented matrix a row for this equation
and a column for the variable f,

  x1  x2  s1  s2  s3  f
[  1   1   1   0   0  0 |  2 ]
[  5  10   0   1   0  0 | 16 ]
[  2   1   0   0   1  0 |  3 ]
[ -8  -6   0   0   0  1 |  0 ]
The entry in the column for f is one in the last objective function row and zero elsewhere. The
entries for the xi in this objective function row often start out negative as is the case in this
example.
Performing the row reduction on this further augmented matrix, we get

  x1  x2  s1  s2  s3  f              x1   x2  s1  s2    s3  f
[  1   1   1   0   0  0 |  2 ]     [  0  0.5   1   0  -0.5  0 | 0.5 ]
[  5  10   0   1   0  0 | 16 ]  ∼  [  0  7.5   0   1  -2.5  0 | 8.5 ]
[  2   1   0   0   1  0 |  3 ]     [  1  0.5   0   0   0.5  0 | 1.5 ]
[ -8  -6   0   0   0  1 |  0 ]     [  0   -2   0   0     4  1 |  12 ]
Since f is the pivot variable for the objective function row, the bottom right entry gives the new
value f = 12 for x2 = s3 = 0, x1 = 1.5 > 0, s1 = 0.5 > 0, and s2 = 8.5 > 0.
If we pivot back to make s3 > 0, the value of f becomes smaller, so we select x2 as the
next entering variable, keeping s3 = 0.
(i) x1 becomes zero when x2 = 1.5/0.5 = 3,
(ii) s1 becomes zero when x2 = 0.5/0.5 = 1, and
(iii) s2 becomes zero when x2 = 8.5/7.5 ≈ 1.13.
Since the smallest positive value of x2 comes from s1, s1 is the departing variable and we make
the entry in the first row, second column the new pivot.
   
  x1   x2  s1  s2    s3  f              x1  x2   s1  s2  s3  f
[  0  0.5   1   0  -0.5  0 | 0.5 ]     [  0   1    2   0  -1  0 |  1 ]
[  0  7.5   0   1  -2.5  0 | 8.5 ]  ∼  [  0   0  -15   1   5  0 |  1 ]
[  1  0.5   0   0   0.5  0 | 1.5 ]     [  1   0   -1   0   1  0 |  1 ]
[  0   -2   0   0     4  1 |  12 ]     [  0   0    4   0   2  1 | 14 ]

The value of the objective function is now f = 14.
Why does the objective function decrease when moving along the third edge, making s3 > 0
an entering variable while keeping s1 = 0?
(i) x1 becomes zero when s3 = 1/1 = 1,
(ii) x2 becomes zero when s3 = 1/1 = 1, and
(iii) s2 becomes zero when s3 = 1/5 = 0.2.
The smallest positive value of s3 comes from s2, and we pivot on the second row, fifth column.

  x1  x2   s1  s2  s3  f             x1  x2  s1    s2  s3  f
[  0   1    2   0  -1  0 |  1 ]     [  0   1  -1   0.2   0  0 |  1.2 ]
[  0   0  -15   1   5  0 |  1 ]  ∼  [  0   0  -3   0.2   1  0 |  0.2 ]
[  1   0   -1   0   1  0 |  1 ]     [  1   0   2  -0.2   0  0 |  0.8 ]
[  0   0    4   0   2  1 | 14 ]     [  0   0  10  -0.4   0  1 | 13.6 ]
The value of the objective function decreased to 13.6. Since the entry in the fifth column of
the objective function row is already positive before we pivot, the value decreases.
Therefore, the values (x1*, x2*, s1*, s2*, s3*) = (1, 1, 0, 1, 0) give the maximal value for f,
with f(1, 1, 0, 1, 0) = 14. □

In the pivoting process used in the example, the variable f remains the pivot variable for
the objective function row. This explains why the value in the bottom right of the augmented
matrix is the value of the objective function at every step. However, since it does not play a role
in the row reduction, we drop this column for the variable f from the augmented matrices.

Definition. The augmented matrix with a row augmented for the objective function but without
a column for the objective function variable is called the tableau. The row in the tableau for the
objective function is called the objective function row.

Steps in the Simplex Method with only Resource Constraints


1. Add a slack variable for each resource inequality and set up the tableau. An
initial feasible solution is determined by setting all the original variables equal
to zero and solving for the slack variables.
2. Choose as entering variable any non-basic variable with a negative entry in the
objective function row. We usually use the most negative entry. (The entry
must be negative in order for the result of the pivoting to be an increase in the
objective function.)
3. From the column selected in the previous step, select as a new pivot the row
for which the ratio of the entry in the augmented column to the entry in
the selected column is the smallest nonnegative value. If such a pivot position
exists, then row reduce the matrix making this selected entry a new pivot
position and all the other entries in this column zero. (The pivot position must
have a positive entry with the smallest ratio so that this variable becomes zero
for the smallest value of the entering variable.)
One pivoting step interchanges one free variable with one basic variable. The
variable for the column with the new pivot position is the entering variable
that changes from a free variable equal to zero to a basic variable which is
nonnegative and usually positive. The departing variable is the old basic
variable for the row of the new pivot position, which becomes a free variable with
value zero.

Steps in the Simplex Method with only Resource Constraints, continued


3′. If there is a column with a negative coefficient in the objective function row
and only nonpositive entries in the column above, then the objective function
has no upper bound and the problem is unbounded. We illustrate this case
with an exercise.
4. The solution is optimal when all entries in the objective function row are
nonnegative. This tableau is called the optimal tableau.
5. For the optimal tableau, if there is a zero entry in the objective function row
for a non-basic variable with a positive entry in the column, then a different
set of basic variables is possible. If, in addition, all the basic variables are
positive, then the optimal solution is not unique.
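
The steps above translate directly into a short program. The following is a minimal teaching sketch in Python (numpy assumed; the function name and structure are ours) of the simplex method for a problem with only resource constraints — not an industrial implementation (no anti-cycling rule, dense arithmetic):

import numpy as np

def simplex_resource(A, b, c):
    """Maximize c.x subject to Ax <= b, x >= 0, with all b_i >= 0."""
    m, n = A.shape
    # Tableau: [A | I | b] with the objective row [-c | 0 | 0] appended.
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n], T[:m, n:n + m], T[:m, -1] = A, np.eye(m), b
    T[m, :n] = -c
    while True:
        j = np.argmin(T[m, :-1])           # most negative entry (step 2)
        if T[m, j] >= 0:
            return T                        # optimal tableau (step 4)
        col = T[:m, j]
        if np.all(col <= 0):
            raise ValueError('problem is unbounded')   # step 3'
        ratios = np.where(col > 0, T[:m, -1] / np.where(col > 0, col, 1), np.inf)
        i = np.argmin(ratios)               # smallest nonnegative ratio (step 3)
        T[i] /= T[i, j]                     # make the pivot entry 1
        for k in range(m + 1):
            if k != i:
                T[k] -= T[k, j] * T[i]      # clear the rest of the column

T = simplex_resource(np.array([[1., 1], [5, 10], [2, 1]]),
                     np.array([2., 16, 3]), np.array([8., 6]))
print(T[-1, -1])   # 14.0, the maximal value of the production example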

1.3.3. General Constraints


We proceed to consider non-resource constraints. We continue to take all the constants bi ≥ 0
by multiplying an inequality by −1 if necessary.
Requirement Constraints. A requirement constraint is given by ai1 x1 + · · · + ain xn ≥ bi
with bi > 0. Such inequalities occur especially for a minimization problem. A solution of
a requirement constraint can have a surplus of quantity, so instead of adding on the unused
resource with a slack variable, we need to subtract off the excess resource by means of a surplus
variable si ≥ 0, ai1 x1 + · · · + ain xn − si = bi . In order to find an initial feasible solution,
for each requirement constraint, we also add an artificial variable ri ≥ 0, resulting in the
equation
ai1 x1 + · · · + ain xn − si + ri = bi.
An initial solution is formed by setting the artificial variable ri = bi > 0, while setting all the
surplus variables si = 0 and all the xj = 0.
Equality Constraints. For an equality constraint ai1 x1 + · · · + ain xn = bi with bi ≥ 0, we
only add an artificial variable ri ≥ 0, resulting in the equation
ai1 x1 + · · · + ain xn + ri = bi .
An initial solution is ri = bi while all the xj = 0. Since initially, ri = bi ≥ 0, ri
will remain non-negative throughout the row reduction. (Alternatively, we could replace the
equality by two inequalities ai1 x1 + · · · + ain xn ≥ bi and ai1 x1 + · · · + ain xn ≤ bi , but
this involves more equations and variables.)
If either requirement constraints or equality constraints are present, the initial solution has
all the xi = 0 and all the surplus variables equal zero while the slack and artificial variables
are greater than or equal to zero. This initial solution is not feasible for the original problem
if an artificial variable is positive for a requirement constraint or an equality constraint.
Example 1.5 (Minimization Example). Assume that two foods are consumed in amounts x1
and x2 with costs of 15 and 7 per unit and yield (5, 3, 5) and (2, 2, 1) units of three vitamins
respectively.
The problem is to minimize the cost 15 x1 + 7 x2, or maximize −15 x1 − 7 x2, with the
constraints:
Maximize: −15 x1 − 7 x2
Subject to: 5 x1 + 2 x2 ≥ 60,
3 x1 + 2 x2 ≥ 40,
5 x1 + 1 x2 ≥ 35,
x1 ≥ 0, and x2 ≥ 0.

The original tableau is

   x1  x2  s1  s2  s3  r1  r2  r3
[   5   2  -1   0   0   1   0   0 | 60 ]
[   3   2   0  -1   0   0   1   0 | 40 ]
[   5   1   0   0  -1   0   0   1 | 35 ]
[  15   7   0   0   0   0   0   0 |  0 ]
For the original problem, this solution involves the artificial variables and does not give
a feasible solution for x. To eliminate the artificial variables, preliminary steps are added to
the simplex algorithm to force all the artificial variables to be zero. The artificial variables are
forced to zero by means of an artificial objective function that is the negative sum of all the
equations that contain artificial variables,

−13 x1 − 5 x2 + s1 + s2 + s3 − (r1 + r2 + r3) = −135.

We think of R = −r1 − r2 − r3 as a new variable. It is always less than or equal to zero and so
has a maximum less than or equal to zero. If the artificial objective function can be made equal
to zero, then this gives an initial feasible basic solution using only the original and slack variables
without using the artificial variables; the artificial variables can then be dropped and we proceed as
before.
The tableau with the artificial objective function included is

   x1  x2  s1  s2  s3  r1  r2  r3
[   5   2  -1   0   0   1   0   0 |   60 ]
[   3   2   0  -1   0   0   1   0 |   40 ]
[   5   1   0   0  -1   0   0   1 |   35 ]
[  15   7   0   0   0   0   0   0 |    0 ]
[ -13  -5   1   1   1   0   0   0 | -135 ]
Note that there are zeroes in the columns for the ri in the artificial objective function row. The
most negative coefficient in the artificial objective function row (outside the artificial variables)
is −13. For this first column, the entry that has the smallest positive ratio bi/ai1 is in the third
row. Pivoting first on a31, then a22, and finally on a14 yields a feasible initial basic solution
without any artificial variables.
 
∼
  x1     x2  s1  s2    s3  r1  r2    r3
[  0      1  -1   0     1   1   0    -1 |   25 ]
[  0    7/5   0  -1   3/5   0   1  -3/5 |   19 ]
[  1    1/5   0   0  -1/5   0   0   1/5 |    7 ]
[  0      4   0   0     3   0   0    -3 | -105 ]
[  0  -12/5   1   1  -8/5   0   0  13/5 |  -44 ]
∼
  x1  x2  s1    s2    s3  r1     r2    r3
[  0   0  -1   5/7   4/7   1   -5/7  -4/7 |    80/7 ]
[  0   1   0  -5/7   3/7   0    5/7  -3/7 |    95/7 ]
[  1   0   0   1/7  -2/7   0   -1/7   2/7 |    30/7 ]
[  0   0   0  20/7   9/7   0  -20/7  -9/7 | -1115/7 ]
[  0   0   1  -5/7  -4/7   0   12/7  11/7 |   -80/7 ]
∼
  x1  x2    s1  s2    s3    r1  r2    r3
[  0   0  -7/5   1   4/5   7/5  -1  -4/5 |   16 ]
[  0   1    -1   0     1     1   0    -1 |   25 ]
[  1   0   1/5   0  -2/5  -1/5   0   2/5 |    2 ]
[  0   0     4   0    -1    -4   0     1 | -205 ]
[  0   0     0   0     0     1   1     1 |    0 ]
After these three steps, the artificial variables are no longer pivots and are zero, so they
can be dropped. The values (x1 , x2 , s1 , s2 , s3 ) = (2, 25, 0, 16, 0) form an initial feasible basic
solution. Finally, pivoting on a15 yields the optimal solution:

  x1  x2    s1  s2    s3              x1  x2    s1    s2  s3
[  0   0  -7/5   1   4/5 |   16 ]     [  0   0  -7/4   5/4   1 |   20 ]
[  0   1    -1   0     1 |   25 ]  ∼  [  0   1   3/4  -5/4   0 |    5 ]
[  1   0   1/5   0  -2/5 |    2 ]     [  1   0  -1/2   1/2   0 |   10 ]
[  0   0     4   0    -1 | -205 ]     [  0   0   9/4   5/4   0 | -185 ]

At this point, all the entries in the objective function row are positive, so this is the optimal
solution: (x1, x2, s1, s2, s3) = (10, 5, 0, 0, 20) with a value of −185 for f.
For the original problem, the minimal solution has a value of 185. □
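
The result of this two-phase computation can be cross-checked with a solver. A sketch using scipy (an assumption), posing the problem directly as a minimization; each ≥ constraint is multiplied by −1 to fit linprog's ≤ form:

from scipy.optimize import linprog

# Minimize 15 x1 + 7 x2 subject to the three requirement constraints.
res = linprog(c=[15, 7],
              A_ub=[[-5, -2], [-3, -2], [-5, -1]],
              b_ub=[-60, -40, -35],
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # approximately [10. 5.] 185.0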

Steps in the Simplex Method with any Type of Constraints


1. Make all the constants on the right hand side of any inequality or equation
nonnegative, bi ≥ 0, by multiplying by −1 if necessary.
2. Add a slack variable for each resource inequality, add a surplus variable and an
artificial variable for each requirement constraint, and add an artificial variable for
each equality constraint.
3. If either a requirement constraint or an equality constraint is present, then form
the artificial objective function by taking the negative sum of all the equations that
contain artificial variables, dropping the terms involving the artificial variables.
Set up the tableau. (The row for the artificial objective function has zeroes in the
columns of the artificial variables.) An initial solution of the equations including
the artificial variables is determined by setting all the original variables xj = 0,
all the surplus variables si = 0, all the slack variables si = bi, and all the artificial
variables ri = bi.
4. Apply the simplex algorithm using the artificial objective function.
a. If it is not possible to make the artificial objective function equal to zero (i.e., if
there is a positive artificial variable in the optimal solution of the artificial objective
function), then there is no feasible solution and we stop.
b. If the value of the artificial objective function can be made equal to zero (when
the artificial variables are not pivot columns), then all the artificial variables have
been made equal to zero. (This is true even if some of the entries in the artificial
objective function row are nonzero and possibly even negative.) At this point,
drop the artificial variables and artificial objective function from the tableau and
continue using the initial feasible basic solution constructed.
5. Apply the simplex algorithm to the actual objective function. The solution is opti-
mal when all entries in the objective function row are nonnegative.

Example 1.6. Consider the problem of

Maximize: 3 x1 + 4 x2
Subject to: −2 x1 + x2 ≤ 6,
2 x1 + 2 x2 ≥ 24,
x1 = 8,
x1 ≥ 0, x2 ≥ 0.
With slack, surplus, and artificial variables added, the problem becomes
Maximize: 3 x1 + 4 x2
Subject to: −2 x1 + x2 + s1 = 6,
2 x1 + 2 x2 − s2 + r2 = 24,
x1 + r3 = 8.
The negative sum of the second and third equations gives the artificial objective function
−3 x1 − 2 x2 + s2 − r2 − r3 = −32.
The tableau with variables is

  x1  x2  s1  s2  r2  r3
[ -2   1   1   0   0   0 |   6 ]
[  2   2   0  -1   1   0 |  24 ]
[  1   0   0   0   0   1 |   8 ]
[ -3  -4   0   0   0   0 |   0 ]
[ -3  -2   0   1   0   0 | -32 ]
Pivoting on a31 and then a22,

  x1  x2  s1  s2  r2  r3             x1  x2  s1    s2    r2  r3
[  0   1   1   0   0   2 |  22 ]     [  0   0   1   1/2  -1/2   3 | 18 ]
[  0   2   0  -1   1  -2 |   8 ]  ∼  [  0   1   0  -1/2   1/2  -1 |  4 ]
[  1   0   0   0   0   1 |   8 ]     [  1   0   0     0     0   1 |  8 ]
[  0  -4   0   0   0   3 |  24 ]     [  0   0   0    -2     2  -1 | 40 ]
[  0  -2   0   1   0   3 |  -8 ]     [  0   0   0     0     1   1 |  0 ]
We have attained a feasible solution of (x1, x2, s1, s2) = (8, 4, 18, 0). We can now drop
the artificial objective function and artificial variables. Pivoting on a14,

  x1  x2  s1    s2            x1  x2  s1  s2
[  0   0   1   1/2 | 18 ]     [  0   0   2   1 |  36 ]
[  0   1   0  -1/2 |  4 ]  ∼  [  0   1   1   0 |  22 ]
[  1   0   0     0 |  8 ]     [  1   0   0   0 |   8 ]
[  0   0   0    -2 | 40 ]     [  0   0   4   0 | 112 ]
This gives the optimal solution of f = 112 for (x1 , x2 , s1 , s2 ) = (8, 22, 0, 36). 
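
A quick solver check of this example (scipy assumed; the first constraint is taken as −2x1 + x2 ≤ 6, consistent with the worked solution); the equality constraint x1 = 8 is passed separately from the inequalities:

from scipy.optimize import linprog

res = linprog(c=[-3, -4],                  # maximize 3 x1 + 4 x2
              A_ub=[[-2, 1], [-2, -2]],    # -2x1 + x2 <= 6 and -(2x1 + 2x2) <= -24
              b_ub=[6, -24],
              A_eq=[[1, 0]], b_eq=[8],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # approximately [ 8. 22.] 112.0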

Example 1.7 (Unbounded problem). Consider the problem of


Maximize: 5 x1 + 3 x2
Subject to: 2 x1 + x2 ≥ 4,
x1 + 2 x2 ≥ 4,
x1 + x2 ≥ 3,
x1 ≥ 0, x2 ≥ 0.

The tableau with surplus and artificial variables added is

  x1  x2  s1  s2  s3  r1  r2  r3
[  2   1  -1   0   0   1   0   0 |   4 ]
[  1   2   0  -1   0   0   1   0 |   4 ]
[  1   1   0   0  -1   0   0   1 |   3 ]
[ -5  -3   0   0   0   0   0   0 |   0 ]
[ -4  -4   1   1   1   0   0   0 | -11 ]
Applying the algorithm to the artificial objective function, we get the following:

  x1    x2    s1  s2  s3    r1  r2  r3
[  1   1/2  -1/2   0   0   1/2   0   0 |  2 ]
[  0   3/2   1/2  -1   0  -1/2   1   0 |  2 ]
[  0   1/2   1/2   0  -1  -1/2   0   1 |  1 ]
[  0  -1/2  -5/2   0   0   5/2   0   0 | 10 ]
[  0    -2    -1   1   1     2   0   0 | -3 ]
∼
  x1  x2  s1  s2  s3  r1  r2  r3
[  1   1   0   0  -1   0   0   1 |  3 ]
[  0   1   0  -1   1   0   1  -1 |  1 ]
[  0   1   1   0  -2  -1   0   2 |  2 ]
[  0   2   0   0  -5   0   0   5 | 15 ]
[  0  -1   0   1  -1   1   0   2 | -1 ]
∼
  x1  x2  s1  s2  s3  r1  r2  r3
[  1   0   0   1  -2   0  -1   2 |  2 ]
[  0   1   0  -1   1   0   1  -1 |  1 ]
[  0   0   1   1  -3  -1  -1   3 |  1 ]
[  0   0   0   2  -7   0  -2   7 | 13 ]
[  0   0   0   0   0   1   1   1 |  0 ]
∼
  x1  x2  s1  s2  s3
[  1   2   0  -1   0 |  4 ]
[  0   1   0  -1   1 |  1 ]
[  0   3   1  -2   0 |  4 ]
[  0   7   0  -5   0 | 20 ]
In the second pivot, we used the third column rather than the second because it eliminates
fractions. In the last tableau, the column for s2 has a negative value in the objective
function row and negative entries above it. If we keep x2 = 0 but use s2 as a free variable, we get the
solution
x1 − s2 = 4, x2 = 0, s1 − 2 s2 = 4, s3 − s2 = 1, or
x1 = 4 + s2, x2 = 0, s1 = 4 + 2 s2, s3 = 1 + s2, f = 5 x1 + 3 x2 = 20 + 5 s2.
Thus, as s2 ≥ 0 increases, we keep a feasible solution and the objective function is unbounded.
This example indicates why a maximization linear program with a column of negative
entries in its tableau above a negative entry in the objective function row is unbounded. □
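
A solver reaches the same conclusion. A sketch with scipy (an assumption); in scipy's convention, status 3 means the problem is unbounded:

from scipy.optimize import linprog

res = linprog(c=[-5, -3],                     # maximize 5 x1 + 3 x2
              A_ub=[[-2, -1], [-1, -2], [-1, -1]],   # the >= constraints, negated
              b_ub=[-4, -4, -3],
              bounds=[(0, None), (0, None)])
print(res.status)   # 3, i.e., the objective is unbounded above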

1.3. Exercises
1.3.1. Solve the following problem (a) graphically and (b) by the simplex method:
Maximize: 3 x1 + 2 x2
Subject to: x1 + 2 x2 ≤ 70
x1 + x2 ≤ 40
3 x1 + x2 ≤ 90
x1 ≥ 0, x2 ≥ 0.

1.3.2. Solve the following problem by the simplex method:


Maximize: 2 x1 + 3 x2 + 5 x3
Subject to: x1 + 4 x2 − 2 x3 ≤ 30
x1 + 2 x2 + 5 x3 ≤ 9
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.
Give the values of the optimal solution, including the xi values, the values of the
slack variables, and the optimal value of the objective function.

1.3.3. Solve the following problem by the simplex method:


Maximize: 2 x1 + 5 x2 + 3 x3
Subject to: x1 + 2 x2 ≤ 28
2 x1 + 4 x3 ≤ 16
x2 + x3 ≤ 12
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.

1.3.4. Solve the following problem by the simplex method:


Maximize: 3 x1 + 4 x2 + 2 x3
Subject to: 3 x1 + 2 x2 + 4 x3 ≤ 45
x1 + 2 x2 + 3 x3 ≤ 21
4 x1 + 2 x2 + 2 x3 ≤ 36
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.

1.3.5. Solve the linear program by the simplex algorithm with artificial variables:
Maximize: x1 + 2x2 + 3x3
Subject to: x1 + 2x2 + x3 = 36
2x1 + x2 + 4x3 ≥ 12
0 ≤ x1 , 0 ≤ x2 , 0 ≤ x3 .

1.3.6. Solve the linear program by the simplex algorithm with artificial variables:
Maximize: 4x1 + 5x2 + 3x3
Subject to: x1 + 2x2 + 3x3 ≤ 20
x1 + 2x2 ≥ 2
x1 − x2 + x3 = 7
x1 , x 2 , x 3 ≥ 0.

1.3.7. Solve the linear program by the simplex algorithm with artificial variables:
Maximize: 4x1 + 5x2 + 3x3
Subject to: x1 + 2x2 + 3x3 ≤ 20
x1 + 2x2 ≥ 2
x1 − x2 + x3 = 7
x1 , x2 , x3 ≥ 0.
1.3.8. Solve the linear program by the simplex algorithm with artificial variables:
Maximize: −x1 − 2 x2 + 2x3
Subject to: x1 + 2 x3 ≤ 12
2 x1 + 3 x2 + x3 ≥ 4
x1 + 2 x2 − x3 ≥ 6
x1 , x2 , x3 ≥ 0.
1.3.9. Solve the linear program by the simplex algorithm with artificial variables:
Maximize: x1 + x2 + x3
Subject to: x1 + x2 ≥ 3
x1 + 2 x2 + x3 ≥ 4
2 x1 + x2 + x3 ≤ 2
x1 , x 2 , x 3 ≥ 0.
1.3.10. Use artificial variables to determine whether there are any vectors satisfying
x1 + x2 ≤ 40
2 x1 + x2 ≥ 70
x1 + 3 x2 ≤ 90
x1, x2 ≥ 0.
1.3.11. Show that the following problem is unbounded.
Maximize 2x + y
Subject to: x − y ≤ 3
−3x + y ≤ 1
0 ≤ x, 0 ≤ y .
1.3.12. Consider the following maximization linear program:
Maximize 2 x1 − 5 x2
Subject to: x1 + 2 x2 ≤ 2
x1 − 4 x2 ≥ 4
0 ≤ x1 , 0 ≤ x2 .
a. Use the simplex method with artificial variables to show that there are no fea-
sible solutions.
b. Plot the constraints and argue why the feasible set is empty.

1.4. Duality
A linear program can be associated with a two-person zero-sum game. A Nash equilibrium in
mixed strategies for this game gives an optimal solution not only to the original linear programming
problem but also to an associated dual linear programming problem. If the original problem is a
maximization problem, then the dual problem is a minimization problem and vice versa. We do not
discuss this zero-sum game, but introduce the dual minimization problem using an example.
Example 1.8. Consider the production maximization linear programming problem that was
given previously:
MLP: Maximize: z = 8x1 + 6x2 (profit)
Subject to: x1 + x2 ≤ 2, (I1)
5x1 + 10x2 ≤ 16, (I2)
2x1 + x2 ≤ 3, (I3)
x1, x2 ≥ 0.
Assume that excess inputs can be sold and shortfalls can be purchased for prices of y1 , y2 ,
and y3 , each of which is nonnegative, yj ≥ 0 for j = 1, 2, 3. The prices yj are called shadow
prices or imputed values of the resources or inputs and are set by the market (the potential for
competing firms). With either purchase or sale of the input resource allowed, the profit for the
firm is

P = 8x1 + 6x2 + (2 − x1 − x2 )y1 + (16 − 5x1 − 10x2 )y2 + (3 − 2x1 − x2 )y3 .


The potential for outside competitors controls the imputed prices of the resources. If ad-
ditional profit could be made by purchasing a resource to raise production (the constraint is
violated), then outside competitors would bid for the resource and force the imputed price to
rise until it was no longer profitable to purchase that resource. On the other hand, if there were
a surplus of a resource, then outside competitors would not be willing to buy it and the imputed
price would fall to zero. Therefore, either 2 − x1 − x2 = 0 or y1 = 0, and there are break
even imputed prices for inputs at the margin. Similar results hold for the other resources, so

0 = (2 − x1 − x2 )y1 ,
0 = (16 − 5x1 − 10x2 )y2 , and (1)
0 = (3 − 2x1 − x2 )y3 .

For such shadow prices of inputs, the firm’s profit is P = 8x1 + 6x2 and the situation from
the firm’s perspective yields the original maximization problem.
For a feasible choice of (x1 , x2 ), the coefficient of each yj in P is nonnegative, 2 − x1 −
x2 ≥ 0, 16 − 5x1 − 10x2 ≥ 0, 3 − 2x1 − x2 ≥ 0, and the competitive market is forcing to
zero the shadow price of any resource with a surplus; therefore, the market is minimizing P as
a function of (y1 , y2 , y3 ).
Regrouping the profit function P from the market's perspective yields

P = (8 − y1 − 5y2 − 2y3 )x1 + (6 − y1 − 10y2 − y3 )x2 + 2y1 + 16y2 + 3y3 .


One unit of the first product (x1) costs 1y1 for the first input, 5y2 for the second input, and 2y3 for
the third input, so the cost of producing a unit of this product by a competitor is y1 + 5y2 + 2y3.
The market (potential competitors) forces this cost to be greater than or equal to its value of 8,
y1 + 5y2 + 2y3 ≥ 8, since if the cost were less than 8 then other competitors would enter the
market. In other words, the net profit of selling a unit of first product is 8−y1 −5y2 −2y3 ≤ 0.
Similarly, the potential for competition for the second product forces y1 +10y2 +y3 ≥ 6. These

are the inequalities of the dual problem,


y1 + 5y2 + 2y3 ≥ 8 and
y1 + 10y2 + y3 ≥ 6.
Also, if the net profit is negative, then the firm would not produce that output, so
0 = (8 − y1 − 5y2 − 2y3 )x1 and (2)
0 = (6 − y1 − 10y2 − y3 )x2 .
Therefore, at the optimal production levels, the profit for the firm is equal to the imputed value
of the inputs for the two products, P = 2y1 + 16y2 + 3y3. Since the competitive market
is minimizing P (see above), the market is minimizing P = 2y1 + 16y2 + 3y3 subject to
the shadow prices satisfying the dual constraints. So, from the market's perspective, we get the
dual minimization problem:
mLP Minimize: w = 2y1 + 16y2 + 3y3
Subject to: y1 + 5y2 + 2y3 ≥ 8,
y1 + 10y2 + y3 ≥ 6, and
y1 , y2 , y3 ≥ 0.

Relationship of dual minimization and primal maximization linear problem


1. The coefficient matrices of the xi and yi are transposes of each other.
2. The coefficients in the objective function for the MLP become the constants of the
inequalities for the mLP.
3. The constants of the inequalities of the MLP become coefficients in the objective
function for the mLP.
For the production linear program,

  x1  x2  s1  s2  s3             x1  x2   s1  s2  s3
[  1   1   1   0   0 |  2 ]     [  0   1    2   0  -1 |  1 ]
[  5  10   0   1   0 | 16 ]  ∼  [  0   0  -15   1   5 |  1 ]
[  2   1   0   0   1 |  3 ]     [  1   0   -1   0   1 |  1 ]
[ -8  -6   0   0   0 |  0 ]     [  0   0    4   0   2 | 14 ]
The optimal solution of the maximization problem is x1 = 1 and x2 = 1 with a payoff of 14.
As we state in Theorem 1.16 given subsequently, the dual minimization problem must have the
solution y1 = 4, y2 = 0, and y3 = 2 with the same payoff, where 4, 0, and 2 are the entries in
the bottom row of the optimal tableau in the columns associated with the slack variables. Each
value yi corresponds to the marginal value of each additional unit of the corresponding
input.
The sets of equations (1) and (2) are called complementary slackness. Note that the optimal
solutions of the maximization and dual minimization problems satisfy complementary
slackness. □
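
The pair MLP/mLP of Example 1.8 can be solved side by side to confirm these statements numerically. A sketch with scipy (an assumption):

from scipy.optimize import linprog

# Primal MLP: maximize 8 x1 + 6 x2 subject to Ax <= b, x >= 0.
primal = linprog(c=[-8, -6], A_ub=[[1, 1], [5, 10], [2, 1]],
                 b_ub=[2, 16, 3], bounds=[(0, None)] * 2)
# Dual mLP: minimize 2 y1 + 16 y2 + 3 y3 subject to A^T y >= c, y >= 0.
dual = linprog(c=[2, 16, 3], A_ub=[[-1, -5, -2], [-1, -10, -1]],
               b_ub=[-8, -6], bounds=[(0, None)] * 3)
print(primal.x, -primal.fun)   # approximately [1. 1.] 14.0
print(dual.x, dual.fun)        # approximately [4. 0. 2.] 14.0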

Example 1.9. A manufacturing company produces x1 amount of the regular type and x2
amount of the super type with a profit of P = 12x1 + 15x2 . The assembly time is 20 minutes
per regular unit and 30 minutes per super unit with a total of 2400 minutes available, so 2 x1 +
3 x2 ≤ 240. The painting time is 15 minutes per regular unit and 40 minutes per super unit with
a total of 3000 minutes available, so 15 x1 + 40 x2 ≤ 3000 or 3 x1 + 8 x2 ≤ 600. Finally, the
inspection time is 12 minutes per unit of each type with 1200 minutes available. Thus, we have
the following maximization problem.

MLP Maximize: P = 12x1 + 15x2 (profit)


Subject to: 2 x1 + 3 x2 ≤ 240 (assembly time per 10 minutes),
3 x1 + 8 x2 ≤ 600 (painting time per 5 minutes),
x1 + x2 ≤ 100 (inspection time per 12 minutes),
x1 ≥ 0 x2 ≥ 0.
The dual problem is
mLP Minimize: P = 240 y1 + 600 y2 + 100 y3 ,
Subject to: 2 y1 + 3 y2 + y3 ≥ 12,
3 y1 + 8 y2 + y3 ≥ 15,
y1 ≥ 0, y2 ≥ 0, y3 ≥ 0.
The tableau for the MLP is

   x1   x2  s1  s2  s3               x1  x2  s1  s2  s3
[   2    3   1   0   0 |  240 ]     [  0   1   1   0  -2 |   40 ]
[   3    8   0   1   0 |  600 ]  ∼  [  0   5   0   1  -3 |  300 ]
[   1    1   0   0   1 |  100 ]     [  1   1   0   0   1 |  100 ]
[ -12  -15   0   0   0 |    0 ]     [  0  -3   0   0  12 | 1200 ]
∼
  x1  x2  s1  s2  s3
[  0   1   1   0  -2 |   40 ]
[  0   0  -5   1   7 |  100 ]
[  1   0  -1   0   3 |   60 ]
[  0   0   3   0   6 | 1320 ]

The optimal solution has x1 = 60 regular type and x2 = 40 super type, with a profit of $1320.
The optimal values of the yi can be read off from the columns of the slack variables in the
objective function row: y1 = 3 profit per 10 minutes of assembly time, or $0.30 per minute;
y2 = 0 profit per 5 minutes of painting time; and y3 = 6 profit per 12 minutes of inspection,
or $0.50 per minute. The values of the yi correspond to the marginal value of each additional
unit of assembly, painting, and inspection time. Additional units of the exhausted resources,
assembly and inspection time, contribute to the profit, but additional painting time, which is
slack, does not. □
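
The marginal-value interpretation can be tested directly: increasing a binding resource by one unit should raise the optimal profit by the corresponding yi. A sketch with scipy (an assumption), bumping the assembly time from 240 to 241 units:

from scipy.optimize import linprog

A = [[2, 3], [3, 8], [1, 1]]
c = [-12, -15]                     # maximize 12 x1 + 15 x2
base = linprog(c=c, A_ub=A, b_ub=[240, 600, 100], bounds=[(0, None)] * 2)
bumped = linprog(c=c, A_ub=A, b_ub=[241, 600, 100], bounds=[(0, None)] * 2)
print(-base.fun)                          # 1320.0
print((-bumped.fun) - (-base.fun))        # 3.0, the shadow price y1 of assembly time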

1.4.1. Duality for Non-standard Linear Programs


We have discussed the dual of a standard MLP. The proof of the Duality Theorem 1.15 given
later in the section shows how to form the dual linear program in general. The following table
indicates the rules for forming a dual linear program for non-standard conditions on variables
and constraint inequalities. Starting on either side, the corresponding condition on the other
side of the same row gives the condition on the dual linear program.

Rules for Forming the Dual LP

Maximization Problem, MLP             Minimization Problem, mLP
i-th constraint Σj aij xj ≤ bi        i-th variable yi ≥ 0
i-th constraint Σj aij xj ≥ bi        i-th variable yi ≤ 0
i-th constraint Σj aij xj = bi        i-th variable yi unrestricted
j-th variable xj ≥ 0                  j-th constraint Σi aij yi ≥ cj
j-th variable xj ≤ 0                  j-th constraint Σi aij yi ≤ cj
j-th variable xj unrestricted         j-th constraint Σi aij yi = cj

Note that the standard conditions for the MLP, Σj aij xj ≤ bi or 0 ≤ xj, correspond
to the standard conditions for the mLP, 0 ≤ yi or Σi aij yi ≥ cj. Nonstandard conditions
for the MLP, ≥ bi or 0 ≥ xj, correspond to nonstandard conditions for the mLP, 0 ≥ yi or ≤ cj.
Finally, an equality constraint for either optimization problem corresponds to an unrestricted
variable in its dual problem.
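
When every condition is standard, the table collapses to a simple recipe: transpose the coefficient matrix, interchange b and c, and switch maximization to minimization. A minimal sketch (assuming all constraints and variables are standard; the function name is ours):

def standard_dual(A, b, c):
    """Dual of: maximize c.x s.t. sum_j a_ij x_j <= b_i, x >= 0.
    Returns (A_T, new_rhs, new_obj) for:
    minimize b.y s.t. sum_i a_ij y_i >= c_j, y >= 0."""
    A_T = [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]
    return A_T, c, b

# Production example: the dual data reproduce the mLP of Example 1.8.
A_T, rhs, obj = standard_dual([[1, 1], [5, 10], [2, 1]], [2, 16, 3], [8, 6])
print(obj)   # [2, 16, 3], coefficients of the dual objective
print(A_T)   # [[1, 5, 2], [1, 10, 1]], rows of the dual constraints
print(rhs)   # [8, 6], right-hand sides of the dual constraints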

Example 1.10.
Minimize: 10 y1 + 4 y2 + 8 y3
Subject to: 2 y1 − 3 y2 + 4 y3 ≥ 2
3 y1 + 5 y2 + 2 y3 ≤ 15
2 y1 + 4 y2 + 6 y3 = 4
y1 ≥ 0, y2 ≥ 0, y3 ≤ 0.

Using the table of relations starting from the minimization problem, we get the dual maximization
problem as follows:

New objective function: 2 x1 + 15 x2 + 4 x3.
2 y1 − 3 y2 + 4 y3 ≥ 2 implies 0 ≤ x1.
3 y1 + 5 y2 + 2 y3 ≤ 15 implies 0 ≥ x2.
2 y1 + 4 y2 + 6 y3 = 4 implies x3 unrestricted.
0 ≤ y1 implies 2 x1 + 3 x2 + 2 x3 ≤ 10.
0 ≤ y2 implies −3 x1 + 5 x2 + 4 x3 ≤ 4.
0 ≥ y3 implies 4 x1 + 2 x2 + 6 x3 ≥ 8.

By making the change of variables x2 = −z2 and x3 = z3 − v3, all the variables are now
restricted to be greater than or equal to zero. Summarizing the new maximization problem:
Maximize: 2 x1 − 15 z2 + 4 z3 − 4 v3
Subject to: 2 x1 − 3 z2 + 2 z3 − 2 v3 ≤ 10
−3 x1 − 5 z2 + 4 z3 − 4 v3 ≤ 4
4 x1 − 2 z2 + 6 z3 − 6 v3 ≥ 8
x1 ≥ 0, z2 ≥ 0, z3 ≥ 0, v3 ≥ 0.
The tableau for the maximization problem with variables x1 , z2 , z3 , v3 , with slack variables
s1 and s2 , and with surplus and artificial variables s3 and r3 is
  x1  z2  z3  v3  s1  s2  s3  r3
[  2  -3   2  -2   1   0   0   0 | 10 ]
[ -3  -5   4  -4   0   1   0   0 |  4 ]
[  4  -2   6  -6   0   0  -1   1 |  8 ]
[ -2  15  -4   4   0   0   0   0 |  0 ]
[ -4   2  -6   6   0   0   1   0 | -8 ]
∼
  x1     z2    z3     v3  s1  s2    s3    r3
[  0     -2    -1      1   1   0   1/2  -1/2 |  6 ]
[  0  -13/2  17/2  -17/2   0   1  -3/4   3/4 | 10 ]
[  1   -1/2   3/2   -3/2   0   0  -1/4   1/4 |  2 ]
[  0     14    -1      1   0   0  -1/2   1/2 |  4 ]
[  0      0     0      0   0   0     0     1 |  0 ]
The artificial objective function is no longer needed, but we keep the artificial variable to
determine the value of its dual variable.
 
∼
  x1      z2  z3  v3  s1     s2      s3     r3
[  0  -47/17   0   0   1   2/17    7/17  -7/17 | 122/17 ]
[  0  -13/17   1  -1   0   2/17   -3/34   3/34 |  20/17 ]
[  1   11/17   0   0   0  -3/17   -2/17   2/17 |   4/17 ]
[  0  225/17   0   0   0   2/17  -10/17  10/17 |  88/17 ]

The solution of the maximization problem is x1 = 4/17, x2 = −z2 = 0, x3 = z3 − v3 =
20/17 − 0 = 20/17, s1 = 122/17, and s2 = s3 = r3 = 0, with a maximal value of 88/17.
According to Theorem 1.16, the optimal solution for the original minimization problem
can also be read off from the slack and artificial columns of the final tableau: y1 = 0, y2 = 2/17,
y3 = 10/17, with optimal value 10(0) + 4(2/17) + 8(10/17) = 88/17.
Alternatively, we could first write the minimization problem in standard form by setting
y3 = −u3:
Minimize: 10 y1 + 4 y2 − 8 u3
Subject to: 2 y1 − 3 y2 − 4 u3 ≥ 2
3 y1 + 5 y2 − 2 u3 ≤ 15
2 y1 + 4 y2 − 6 u3 = 4
y1 ≥ 0, y2 ≥ 0, u3 ≥ 0.
The dual maximization problem will now have a different tableau than before but the same
solution for the original problem. □

1.4.2. Duality Theorems


We present a sequence of duality results giving the relationship between the solutions of dual
linear programs and then summarize these results.
Notation for Duality Theorems
The (primal) maximization linear programming problem MLP with feasible set F^M is
Maximize: f(x) = c · x
Subject to: Σj aij xj ≤ bi, ≥ bi, or = bi for 1 ≤ i ≤ m, and
xj ≥ 0, ≤ 0, or unrestricted for 1 ≤ j ≤ n.
The (dual) minimization linear programming problem mLP with feasible set F^m is
Minimize: g(y) = b · y
Subject to: Σi aij yi ≥ cj, ≤ cj, or = cj for 1 ≤ j ≤ n, and
yi ≥ 0, ≤ 0, or unrestricted for 1 ≤ i ≤ m.
Theorem 1.11 (Weak Duality Theorem). Assume that x ∈ F^M is a feasible solution for a
primal maximization linear programming problem MLP and y ∈ F^m is a feasible solution
for its dual minimization linear programming problem mLP.
a. Then c · x ≤ b · y. Thus, the optimal value M of either problem must satisfy
c · x ≤ M ≤ b · y.
b. Further, c · x = b · y iff
0 = y · (b − Ax) and
0 = x · (A^T y − c).
Remark. The equations of part (b) of the theorem are known as complementary slackness.
They imply that for 1 ≤ i ≤ m either
yi = 0 or 0 = bi − ai1 x1 − · · · − ain xn;
similarly, for 1 ≤ j ≤ n, either
xj = 0 or 0 = a1j y1 + · · · + amj ym − cj.
In the material on nonlinear equations, we have a similar result that gives the necessary
conditions for a maximum, called the Karush-Kuhn-Tucker equations. We usually solve linear
programming problems by the simplex method, i.e., by row reduction. For nonlinear
programming with inequalities, we often solve them using the Karush-Kuhn-Tucker equations.

Proof. (1) If Σj aij xj ≤ bi then yi ≥ 0, so
yi (Ax)i = yi Σj aij xj ≤ yi bi.
If Σj aij xj ≥ bi then yi ≤ 0, so the same inequality holds. If Σj aij xj = bi then yi is
arbitrary, so
yi (Ax)i = yi Σj aij xj = yi bi.
Summing over i,
y · Ax = Σi yi (Ax)i ≤ Σi yi bi = y · b, or
y^T (b − Ax) ≥ 0.
(2) By the same type of argument as (1),
c · x ≤ x · (A^T y) = (A^T y)^T x = y^T (Ax) = y · Ax, or
(A^T y − c)^T x ≥ 0.
Combining gives part (a),
c · x ≤ y · Ax ≤ y · b.
Also,
y · b − c · x = y^T (b − Ax) + (A^T y − c)^T x = 0 iff
0 = (A^T y − c) · x and 0 = y · (b − Ax).
This proves part (b). □
Corollary 1.12 (Feasibility/Boundedness). Assume that MLP and mLP both have feasible
solutions. Then, MLP is bounded above and has an optimal solution. Also, mLP is bounded
below and has an optimal solution.
Proof. If mLP has a feasible solution y0 ∈ F^m, then for any feasible solution x ∈ F^M of
MLP, f(x) = c · x ≤ b · y0, so f is bounded above and has an optimal solution.
Similarly, if MLP has a feasible solution x0 ∈ F^M, then for any feasible solution y ∈
F^m of mLP, g(y) = b · y ≥ c · x0, so g is bounded below and has an optimal solution. □

Proposition 1.13 (Necessary Conditions). If x̄ is an optimal solution for MLP, then there is a
feasible solution ȳ ∈ F^m of the dual mLP that satisfies the complementary slackness equations,
0 = ȳ · (b − Ax̄) and
0 = (A^T ȳ − c) · x̄.
Similarly, if ȳ is an optimal solution for mLP, then there is a feasible solution x̄ ∈ F^M
of the dual MLP that satisfies the complementary slackness equations.
Proof. Let E be the set of i such that the constraint Σj aij x̄j ≤ bi is tight or effective. The
i-th row of A, transposed to the column vector R_i^T, is the gradient of this constraint. Let E′
be the set of i such that x̄i = 0, so this constraint is tight. The gradient of this constraint is the
standard unit vector e^i = (0, . . . , 1, . . . , 0)^T, with a 1 only in the i-th coordinate.
We assume that this solution is nondegenerate, so that the gradients of the tight constraints
{R_i^T}_{i∈E} ∪ {e^i}_{i∈E′} = {w_j}_{j∈E′′}
are linearly independent. (Otherwise we have to take an appropriate subset in the following
argument.)
an appropriate subset in the following argument.)
The objective function f has a local maximum on the level set for the tight constraints
at x̄. By Lagrange multipliers, ∇(f) = c is a linear combination of the gradients of the tight
constraints,
c = Σ_{i∈E} ȳi R_i^T − Σ_{i∈E′} z̄i e^i.
By setting ȳi = 0 for i ∉ E and z̄i = 0 for i ∉ E′, the sum can be extended to all the
appropriate i,
c = Σ_{1≤i≤m} ȳi R_i^T − Σ_{1≤i≤n} z̄i e^i = A^T ȳ − z̄. (*)
Since ȳi = 0 when bi − Σj aij x̄j ≠ 0 and z̄i = 0 when x̄i ≠ 0,
0 = ȳi (bi − Σj aij x̄j) for 1 ≤ i ≤ m and
0 = z̄i x̄i for 1 ≤ i ≤ n,
or in vector form, using (*),
0 = ȳ · (b − Ax̄) and
0 = z̄ · x̄ = x̄ · (A^T ȳ − c).
To finish the proof, we need to show that (i) ȳi ≥ 0 for a resource constraint (≤ bi), (ii)
ȳi ≤ 0 for a requirement constraint (≥ bi), (iii) ȳi is unrestricted for an equality constraint
(= bi), (iv) z̄j = Σi aij ȳi − cj ≥ 0 for xj ≥ 0, (v) z̄j = Σi aij ȳi − cj ≤ 0 for xj ≤ 0,
and (vi) z̄j = Σi aij ȳi − cj = 0 for xj unrestricted.

The set of vectors {Riᵀ}i∈E ∪ {ei}i∈E′ = {wk}k∈E″ is linearly independent, so we can complete it to a basis {wk} of Rⁿ using vectors perpendicular to these first vectors. Let W be the n × n matrix with these wk as columns. Then Wᵀ is invertible because its rows are linearly independent, so there is an n × n matrix V such that WᵀV = I. For each k, the k-th column of V, vk, is perpendicular to all the wi except wk. Remember that
        c = Σ1≤i≤m ȳi Riᵀ − Σ1≤i≤n z̄i ei = Σj pj wj,
where pj is the corresponding ȳi, −z̄i, or 0. Then
        c · vk = (Σj pj wj) · vk = pk.
Take i ∈ E. The gradient of this constraint is Riᵀ = wk for some k ∈ E″. Set δ = −1 for a resource constraint and δ = +1 for a requirement constraint. The vector δRiᵀ points into F in both cases (except for an equality constraint). For small t ≥ 0, x̄ + t δ vk ∈ F, so
        0 ≤ f(x̄) − f(x̄ + t δ vk) = −t δ c · vk = −t δ pk.
Therefore, δ ȳi = δ pk ≤ 0, and ȳi ≥ 0 for a resource constraint and ȳi ≤ 0 for a requirement constraint. For an equality constraint, we are not allowed to move off in either direction, so ȳi is unrestricted.
Next take i ∈ E′, with ei = wk for some k ∈ E″. Set δ = +1 if the constraint is xi ≥ 0 and δ = −1 if it is xi ≤ 0. Then δ wk = δ ei points into F (unless xi is unrestricted). By the same argument as before, −δ pk = δ z̄i ≥ 0. Therefore, z̄i ≥ 0 if xi ≥ 0 and z̄i ≤ 0 if xi ≤ 0. If xi is unrestricted, then the constraint is not tight and z̄i = 0.
We have shown that ȳ ∈ F^m and satisfies complementary slackness.
The proof in the case of an optimal solution of mLP is similar.
Corollary 1.14 (Optimality and Complementary Slackness). Assume that x̄ ∈ F^M is a feasible solution for the primal MLP and ȳ ∈ F^m is a feasible solution for the dual mLP. Then the following are equivalent.
a. x̄ is an optimal solution of MLP and ȳ is an optimal solution of mLP.
b. c · x̄ = b · ȳ.
c. 0 = x̄ · (c − Aᵀȳ) and 0 = (b − Ax̄) · ȳ.
Proof. (b ⇔ c) This is a restatement of the Weak Duality Theorem 1.11(b).
(a ⇒ c) By Proposition 1.13, there exists a feasible solution ȳ₀ of mLP that satisfies the complementary slackness equations, so c · x̄ = b · ȳ₀ by the Weak Duality Theorem 1.11(b). Since ȳ is optimal for mLP, c · x̄ ≤ b · ȳ ≤ b · ȳ₀ = c · x̄, and so c · x̄ = b · ȳ. Again by the Weak Duality Theorem 1.11, ȳ must satisfy the complementary slackness equations.
(b ⇒ a) Assume ȳ and x̄ satisfy c · x̄ = b · ȳ. Then, for any other feasible solutions x ∈ F^M and y ∈ F^m, c · x ≤ b · ȳ = c · x̄ ≤ b · y, so x̄ and ȳ must be optimal solutions.
Theorem 1.15 (Duality Theorem). Consider dual problems MLP and mLP. Then, MLP has an optimal solution iff the dual mLP has an optimal solution.
Proof. If MLP has an optimal solution x̄, then mLP has a feasible solution ȳ that satisfies the complementary slackness equations by the necessary conditions of Proposition 1.13. By Corollary 1.14, ȳ is an optimal solution of mLP. The argument with the roles of MLP and mLP interchanged proves the converse.
Summary of Duality Results:
1. If x ∈ F^M and y ∈ F^m, then c · x ≤ b · y. (Weak Duality Theorem 1.11)
2. MLP has an optimal solution iff mLP has an optimal solution. (Duality Theorem 1.15)
3. If F^M ≠ ∅ and F^m ≠ ∅, then MLP and mLP each have an optimal solution. (Corollary 1.12)
4. If x̄ ∈ F^M and ȳ ∈ F^m, then the three conditions a–c of Corollary 1.14 are equivalent.
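These results are easy to see in action with an off-the-shelf LP solver. The sketch below is our own illustration (not from the text): it solves a small standard MLP and its dual mLP separately with scipy.optimize.linprog. Since linprog minimizes, the MLP is passed as minimizing −c · x, and the dual constraint Aᵀy ≥ c is rewritten as −Aᵀy ≤ −c.

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([8.0, 9.0])
    c = np.array([3.0, 2.0])

    # Primal MLP: maximize c.x subject to A x <= b, x >= 0.
    primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2,
                     method="highs")
    # Dual mLP: minimize b.y subject to A^T y >= c, y >= 0.
    dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2,
                   method="highs")

    x_bar, y_bar = primal.x, dual.x
    print(c @ x_bar, b @ y_bar)        # 13.0 and 13.0 (Duality Theorem)
    print(y_bar @ (b - A @ x_bar))     # ~0: complementary slackness
    print(x_bar @ (A.T @ y_bar - c))   # ~0: complementary slackness

Here x̄ = (3, 2) and ȳ = (7/5, 1/5), and both optimal values equal 13, illustrating items 2 and 4 of the summary.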
Theorem 1.16 (Duality and Tableau). If either MLP or mLP is solved for an optimal solution
by the simplex method, then the solution of its dual is displayed in the bottom row of the optimal
tableau in the columns associated with the slack and artificial variables (not those for the
surplus variables).
Proof. To use the tableau to solve MLP, we need all the xi ≥ 0, so all the constraints of the dual mLP will be requirement constraints. We group the equations into resource, requirement, and equality constraints and get the tableau for MLP
        [ A1    I1   0    0    0  | b1 ]
        [ A2    0   −I2   I2   0  | b2 ]
        [ A3    0    0    0    I3 | b3 ]
        [ −cᵀ   0    0    0    0  | 0  ]
with columns for x, then for the slack variables, the surplus variables, and the artificial variables of the requirement and equality constraints.
Row operations leading to the optimal tableau are realized by multiplying on the left by a matrix. The last row is not added to the other rows, so the row reduction to the optimal tableau is as follows:

        [ M1    M2    M3   0 ]  [ A1    I1   0    0    0  | b1 ]
        [ ȳ1ᵀ   ȳ2ᵀ   ȳ3ᵀ  1 ]  [ A2    0   −I2   I2   0  | b2 ]
                                [ A3    0    0    0    I3 | b3 ]
                                [ −cᵀ   0    0    0    0  | 0  ]

          [ M1A1 + M2A2 + M3A3            M1    −M2    M2    M3  | Mb  ]
        = [ ȳ1ᵀA1 + ȳ2ᵀA2 + ȳ3ᵀA3 − cᵀ    ȳ1ᵀ   −ȳ2ᵀ   ȳ2ᵀ   ȳ3ᵀ | ȳᵀb ].
In the optimal tableau, the entries in the objective function row are nonnegative (except for artificial variable columns), so ȳᵀA − cᵀ ≥ 0, ȳ1ᵀ ≥ 0, and −ȳ2ᵀ ≥ 0, i.e., ȳ2 ≤ 0. Thus, ȳ is a feasible solution of the mLP. If x̄ is a maximizer for the MLP, the value ȳᵀb = b · ȳ in the lower right position of the optimal tableau is the optimal value of the MLP, so it equals c · x̄. By the Optimality Corollary 1.14, ȳ is an optimal solution of mLP.
Note that (Aᵀȳ)i = Li · ȳ, where Li is the i-th column of A.
In the case when xi ≤ 0, we start by setting ξi = −xi ≥ 0. The column in the tableau is −1 times the original column, and the new objective function coefficient is −ci. By the earlier argument for ξi ≥ 0, we get 0 ≤ (−Li) · ȳ − (−ci), or
        Li · ȳ ≤ ci,
which is a resource constraint for the dual problem as claimed.
In the case when xi is arbitrary, we start by setting xi = ξi − ηi with ξi, ηi ≥ 0. By the previous two cases, we get both
        Li · ȳ ≥ ci   and   Li · ȳ ≤ ci,   or   Li · ȳ = ci,
which is an equality constraint for the dual problem as claimed.
1.4. Exercises
1.4.1. Determine the dual of the linear program:
Minimize: 4y1 + 3y2 + 8y3
Subject to: y1 + y2 + y3 ≥ 12
5y1 − 2y2 + 4y3 ≤ 20
2y1 + 3y2 − y3 = 12
0 ≤ y1 , 0 ≤ y2 , y3 unrestricted.
1.4.2. Consider the following minimization linear programming problem, mLP:
Minimize: 8y1 + 6y2
Subject to: 2y1 + y2 ≥ 3
y1 + y2 ≥ 2
y1 ≥ 0, y2 ≥ 0.
a. Form the dual problem maximization MLP linear problem.
b. Solve the dual problem MLP by the simplex method. Give the optimal solution
of the dual maximization problem and the maximal value.
c. Give the optimal solution of the original minimization problem mLP and the
minimal value.
1.4.3. Consider the linear programming problem
Maximize : f (x, y) = 2 x1 + 3 x2 + 5 x3 ,
Subject to : 3 x1 + 4 x2 − 2 x3 ≤ 10,
−x1 + 2 x2 + x3 ≤ 3,
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.
a. Using the simplex method, find an optimal solution. Give all the values (i) of the variables x1, x2, and x3, (ii) of the slack variables, and (iii) the maximal value of the objective function.
b. State the dual linear programming problem.
c. What is the optimal solution of the dual linear programming problem?
1.4.4. Use the dual problem MLP to discover the nature of the solution to the following
minimization linear programming problem, mLP:
Minimize: −y1 + 2 y2
Subject to: −5y1 + y2 ≥ 2
4y1 − y2 ≥ 3
y1 ≥ 0, y2 ≥ 0.
1.5. Sensitivity Analysis
In this section, we investigate for what range of change in bi the optimal value yi∗ of the dual
problem correctly gives the change in the optimal value of the objective function. We also
consider what marginal effect a change in the coefficients ci has on the optimal value.
Example 1.17. A company produces two products, 1 and 2, with profits per item of $40 and $10 respectively. In the short run, there are in stock only 1020 units of paint, 400 fasteners, and 420 hours of labor. Each item of the two products requires 15 and 10 units of paint respectively, 10 and 2 fasteners, and 3 and 5 hours of labor. The tableau for the maximization problem is
          x1    x2   s1  s2  s3                   x1   x2    s1   s2    s3
      [   15    10   1   0   0 | 1020 ]        [  0    7     1   −1.5   0 |  420 ]
      [   10     2   0   1   0 |  400 ]   ∼    [  1    0.2   0    0.1   0 |   40 ]
      [    3     5   0   0   1 |  420 ]        [  0    4.4   0   −0.3   1 |  300 ]
      [  −40   −10   0   0   0 |    0 ]        [  0   −2     0    4     0 | 1600 ]

              x1  x2   s1       s2     s3
          [   0   1    1/7    −3/14    0 |   60 ]
      ∼   [   1   0   −1/35    1/7     0 |   28 ]
          [   0   0   −22/35   9/14    1 |   36 ]
          [   0   0    2/7     25/7    0 | 1720 ]

The optimal solution is
        x1 = 28, x2 = 60, s1 = 0, s2 = 0, s3 = 36,
with optimal value 1720. The values of an increase of the constrained quantities are y1 = 2/7 and y2 = 25/7, and y3 = 0 for the quantity that is not tight.
[Figure: the feasible set, with the fastener constraint 10 x1 + 2 x2 = b2 drawn for b2 = 344, 400, and 680; the corresponding optimal vertices are (20, 72), (28, 60), and (68, 0), and (40, 0) is the x1-intercept for b2 = 400.]
We begin by discussing the effect of a change in the limitation on fasteners, b2 + δ2, while
keeping the same basic variables and free variables s1 = s2 = 0. The starting form of the
constraint is
10x1 + 2x2 + s2 = 400 + δ2 .
The slack variable s2 and δ2 play similar roles (and have similar units). So in the new optimal
tableau, δ2 times the s2 -column is added to the right hand column, the column for the constraint
constants. Since we continue to need x1 , x2 , s3 ≥ 0,


3δ2 14
0 ≤ x2 = 60 − or δ2 ≤ 60 · = 280,
14 3
δ2 7
0 ≤ x1 = 28 + or δ2 ≥ 28 · = 196,
7 1
9δ2 14
0 ≤ s3 = 36 + or δ2 ≥ 36 · = 56.
14 9
In order to keep x1, x2, and s3 as basic variables and s1 = 0 = s2 as the non-pivot variables, the resource can be increased by at most 280 units and decreased by at most 56 units, or
        344 = 400 − 56 ≤ b2 ≤ 400 + 280 = 680.
For this range of b2, 25/7 is the marginal value of the fasteners. For δ2 = 280 and b2 = 680, we have
        x1 = 28 + 280 · (1/7) = 68,
        x2 = 60 − 280 · (3/14) = 0,
        s3 = 36 + 280 · (9/14) = 216, and
        z = 1720 + 280 · (25/7) = 2720 as the optimal value.
A similar calculation for δ2 = −56 gives optimal value z = 1720 − 56 · (25/7) = 1520.
A similar consideration can be applied to the supply of paint, the first constraint. The inequalities for the optimal tableau become
        0 ≤ x2 = 60 + (1/7) δ1     or   δ1 ≥ −60 · 7 = −420,
        0 ≤ x1 = 28 − (1/35) δ1    or   δ1 ≤ 28 · 35 = 980,
        0 ≤ s3 = 36 − (22/35) δ1   or   δ1 ≤ 36 · (35/22) ≈ 57.27.
Therefore
        600 = 1020 − 420 ≤ b1 ≤ 1020 + 57.27 ≈ 1077.27,
with change in optimal value
        1600 = 1720 − (2/7) · 420 ≤ f ≤ 1720 + (2/7) · 57.27 ≈ 1736.36.
We include a sensitivity analysis for the general change in a constraint to indicate how the method is applied. These formulas are hard to remember, so it is probably easier to work a given problem as we did in the preceding example. For all the general statements for sensitivity analysis, we denote the entries of the optimal tableau as follows: b′i denotes the entry for the i-th constraint in the right-hand column, c′j ≥ 0 denotes an objective-row entry, a′ij denotes the entry in the i-th row and j-th column (excluding the right-hand constants and any artificial variable columns), C′j denotes the j-th column of A′, and R′i denotes the i-th row of A′.
Changes in a tight resource constraint
Consider a change in the constant for a tight resource constraint, br + δr. Let k = kr be the column for the slack variable sr of the r-th constraint, which is not a pivot column. The initial tableau row reduces to the optimal tableau as follows:
             sr                                 sr
        [ A    er | b + δr er ]        [ A′    C′k | b′ + δr C′k ]
        [ −cᵀ  0  | 0         ]   ∼    [ c′ᵀ   c′k | M + δr c′k  ],
where M is the optimal value for δr = 0. Let zi be the basic variable with a pivot in the i-th row. To keep the same basic variables, we need 0 ≤ zi = b′i + δr a′ik for all i. For a′ik < 0, we need −δr a′ik ≤ b′i; for a′ik > 0, we need −b′i ≤ δr a′ik. The range of change δr in br with the same set of basic variables therefore satisfies
        max_i { −b′i/a′ik : a′ik > 0 } ≤ δr ≤ min_i { −b′i/a′ik : a′ik < 0 },
where k = kr is the column for the slack variable sr. The change in the optimal value for δr in the allowable range is given by δr c′kr. The point of restricting to this range is that the change in the optimal value and the optimizing basic solution change in a simple fashion.
Changes in a slack resource constraint
For a slack resource constraint with sr in a pivot column, to keep the sr-column a pivot column, we need b′r + δr ≥ 0. Thus, δr ≥ −b′r gives the amount that br can be decreased before the set of basic variables changes. The increase δr can be arbitrarily large. For δr in this range, the optimal value is unchanged.
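The min-ratio rule above is mechanical enough to automate. Below is a minimal sketch (the helper name rhs_range is ours) that computes the allowable interval for δr from the right-hand column of the optimal tableau and the optimal-tableau column of sr; it reproduces the fastener range of Example 1.17.

    import numpy as np

    def rhs_range(b_opt, col):
        # Keep b_opt + delta * col >= 0 componentwise:
        # entries col > 0 bound delta below, entries col < 0 bound it above.
        lo = max((-bi / a for bi, a in zip(b_opt, col) if a > 0),
                 default=-np.inf)
        hi = min((-bi / a for bi, a in zip(b_opt, col) if a < 0),
                 default=np.inf)
        return lo, hi

    b_opt = [60.0, 28.0, 36.0]        # x2, x1, s3 rows of Example 1.17
    s2col = [-3/14, 1/7, 9/14]        # optimal-tableau column of s2
    print(rhs_range(b_opt, s2col))    # (-56.0, 280.0) up to rounding,
                                      # so 344 <= b2 <= 680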
1.5.1. Changes in Objective Function Coefficients
Example 1.18. We consider the same problem as in Example 1.17. Changes in the coefficients ck correspond to changes in the profits of the products. This change could be accomplished by changing the price charged for each item produced.
For a change from c1 to c1 + ∆1, the changes in the initial and optimal tableaux are as follows:

            x1       x2   s1  s2  s3                  x1    x2   s1       s2     s3
      [     15       10   1   0   0 | 1020 ]       [   0    1    1/7    −3/14    0 |   60 ]
      [     10        2   0   1   0 |  400 ]   ∼   [   1    0   −1/35    1/7     0 |   28 ]
      [      3        5   0   0   1 |  420 ]       [   0    0   −22/35   9/14    1 |   36 ]
      [ −40−∆1      −10   0   0   0 |    0 ]       [ −∆1    0    2/7     25/7    0 | 1720 ]
The pivot for x1 is in the r = 2nd row. To make the objective function row nonnegative, we need to add ∆1 times the second row:

            x1  x2   s1             s2            s3
        [   0   1    1/7          −3/14           0 |   60         ]
      ∼ [   1   0   −1/35          1/7            0 |   28         ]
        [   0   0   −22/35         9/14           1 |   36         ]
        [   0   0    2/7 − ∆1/35   25/7 + ∆1/7    0 | 1720 + 28∆1  ]
To keep the same basic variables, we need
        0 ≤ 2/7 − ∆1/35    or   ∆1 ≤ (2/7) · 35 = 10,
        0 ≤ 25/7 + ∆1/7    or   ∆1 ≥ −(25/7) · 7 = −25.
Thus the coefficient satisfies
        15 = 40 − 25 ≤ c1 ≤ 40 + 10 = 50.
The value of the objective function for c1 in this range is 1720 + 28∆1.
For ∆2, the pivot for x2 is in the r = 1st row, and a similar calculation shows that
        −2 = −(2/7) · 7 ≤ ∆2 ≤ (25/7) · (14/3) = 50/3.
Thus the coefficient satisfies
        8 = 10 − 2 ≤ c2 ≤ 10 + 50/3 = 26 2/3,
and the value of the objective function is 1720 + 60∆2.
For the general case, consider a change ∆k in the coefficient ck of the variable xk in the objective function, where xk is a basic variable in the optimal solution and its pivot is in the r-th row, a′rk = 1. The changed entry in the original objective row is −ck − ∆k, and the corresponding entry of the optimal tableau changes from 0 to −∆k. To keep xk basic, we need to add ∆k R′r to the objective row, where the pivot of xk is in the r-th row. The entry in the j-th column becomes c′j + ∆k a′rj. To keep the same basic variables, the range of change ∆k is determined so that c′j + ∆k a′rj ≥ 0 for all j, with artificial variable columns excluded but columns for slack and surplus variables included. For a′rj > 0 in the r-th pivot row, we need ∆k a′rj ≥ −c′j or ∆k ≥ −c′j/a′rj; for a′rj < 0 in the r-th pivot row, we need c′j ≥ −∆k a′rj or −c′j/a′rj ≥ ∆k. Note that if c′j = 0 for some a′rj > 0 with j ≠ k, then we need ∆k ≥ 0, and if c′j = 0 for some a′rj < 0, then we need ∆k ≤ 0.
The range of change ∆k with the same set of basic variables is
        max_j { −c′j/a′rj : a′rj > 0, j ≠ k } ≤ ∆k ≤ min_j { −c′j/a′rj : a′rj < 0 }.
The change in the optimal value for ∆k in the allowable range is b′r ∆k.
Changes in the coefficient of the objective function for a non-basic variable
If xk is a non-basic variable, then in the unchanged problem xk = 0. The entry in the optimal objective row becomes c′k − ∆k, so the inequality c′k − ∆k ≥ 0, or ∆k ≤ c′k, ensures that xk remains a non-basic variable. Thus, the entry c′k in the column for xk indicates the amount that ck can increase while keeping the same set of basic variables. An increase of more than c′k would make it advantageous for xk to become basic, adding a positive contribution to the optimal value of the objective function.
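The same min-ratio pattern as for the right-hand sides applies here. The sketch below (the helper name coeff_range is ours) computes the allowable interval for ∆k from the optimal objective row and the pivot row of xk; it reproduces the range for ∆1 in Example 1.18.

    import numpy as np

    def coeff_range(c_opt, row, k):
        # Keep c_opt[j] + Delta * row[j] >= 0 for all j (skip j = k,
        # whose entry is the pivot of x_k itself).
        lo = max((-cj / a for j, (cj, a) in enumerate(zip(c_opt, row))
                  if a > 0 and j != k), default=-np.inf)
        hi = min((-cj / a for cj, a in zip(c_opt, row) if a < 0),
                 default=np.inf)
        return lo, hi

    # Example 1.18: x1 has its pivot in the second row of the optimal tableau.
    c_opt = [0.0, 0.0, 2/7, 25/7, 0.0]
    row   = [1.0, 0.0, -1/35, 1/7, 0.0]
    print(coeff_range(c_opt, row, k=0))   # (-25.0, 10.0) up to rounding,
                                          # so 15 <= c1 <= 50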
1.5. Exercises
1.5.1. The optimal tableau for the following linear program
Maximize: 2 x1 + 3 x2 + 5 x3
Subject to: x1 + x2 + x3 ≥ 12
5x1 − 2x2 + 4x3 ≤ 20
0 ≤ x1 , 0 ≤ x2 , 0 ≤ x3
is the following:
 
x1 x2 x3 s1 s2
 1 8 0 53 23 56 
 
 
 0

2 1 13 13 65 
0 23 0 5 3 177
a. How much can each constant bi increase and decrease, bi + δi , and keep the
same set of basic variables in the optimal solution?
What is the change in the optimal value of the objective function with an al-
lowable change δi in each bi ?
What is the marginal value per additional unit of a small amount of each re-
source?
b. Determine the range of values of the objective function coefficient of x1 such
that the optimal basis remains unchanged?
1.5.2. A farmer can grow x1 acres of corn and x2 acres of potatoes. The linear program
to maximize the profit is the following:
Maximize: 50 x1 + 40 x2 Profit
Subject to: x1 + x2 ≤ 50 Acres of land
3 x1 + 2 x2 ≤ 120 Days of labor
10 x1 + 60 x2 ≤ 1200 Dollars of capital
20 x1 + 10 x2 ≤ 800 Pounds of fertilizer
0 ≤ x1 , 0 ≤ x2 .
The optimal tableau is the following:
           x1  x2  s1    s2       s3      s4
        [  0   0   1    −5/16    −1/160   0 |    5 ]
        [  0   1   0    −1/16     3/160   0 |   15 ]
        [  0   0   0    −55/8     1/16    1 |   50 ]
        [  1   0   0     3/8     −1/80    0 |   30 ]
        [  0   0   0     65/4     1/8     0 | 2100 ]
a. What is the value of each additional acre of land, each additional day of labor, each additional dollar of capital, and each additional pound of fertilizer?
b. For what range of the number of days of labor is the value given in part (a) valid?
1.5.3. For the linear program in Exercise 1.5.1, what range of each of the coefficients ck
leaves the set of basic variables the same? In each case, what is the change in maximal
value for the changes ∆k ?
1.5.4. For the linear program in Exercise 1.5.2, what range of each of the coefficients ck
leaves the set of basic variables the same? In each case, what is the change in maximal
value for the changes ∆k ?
1.6. Theory for Simplex Method
For three vectors a1, a2, and a3, (a1 + a2 + a3)/3 is the average of these vectors. The quantity (a1 + 2a2 + 3a3)/6 = (a1 + a2 + a2 + a3 + a3 + a3)/6 is the weighted average with weights 1/6, 2/6, and 3/6. In general, for a set of vectors {a1, . . . , ak} and numbers {t1, . . . , tk} with ti ≥ 0 and t1 + · · · + tk = 1, the sum t1a1 + · · · + tkak is a weighted average and is called a convex combination of the points determined by these vectors.
Definition. A set S ⊂ Rn is convex provided that if x0 and x1 are any two points in S then
the convex combination xt = (1 − t)x0 + tx1 is also in S for all 0 ≤ t ≤ 1.
[Figure 1.6.1. Examples of convex and non-convex sets: three convex sets and two non-convex sets.]
Each resource or requirement constraint, ai1x1 + · · · + ainxn ≤ bi or ≥ bi, defines a closed half-space. Each equality constraint, ai1x1 + · · · + ainxn = bi, defines a hyperplane. Together, all the constraints define the intersection of such subsets of Rn.
Definition. Any intersection of a finite number of closed half-spaces and possibly some hyper-
planes is called a polyhedron.
Theorem 1.19. a. The intersection of convex sets is convex.
b. Any polyhedron and so any feasible set for a linear programming problem is convex.
Proof. (a) Assume {Sj} is a collection of convex sets and x0, x1 ∈ Sj for all j. Take 0 ≤ t ≤ 1. Then
        (1 − t) x0 + t x1 ∈ Sj for all j,   so   (1 − t) x0 + t x1 ∈ ∩j Sj.
(b) Each closed half-space and hyperplane is convex, so the intersection is convex.
Theorem 1.20. If S is a convex set and pi ∈ S for 1 ≤ i ≤ k, then any convex combination t1p1 + · · · + tkpk ∈ S.
Proof. The proof is by induction on the number of points. For k = 2, it follows from the definition of a convex set.
Assume the result is true for k − 1 ≥ 2 points. If tk = 1, then pk ∈ S, so the result holds. If tk < 1, then Σ_{i=1}^{k−1} ti = 1 − tk > 0 and Σ_{i=1}^{k−1} ti/(1 − tk) = 1. The sum Σ_{i=1}^{k−1} [ti/(1 − tk)] pi ∈ S by the induction hypothesis. So,
        Σ_{i=1}^{k} ti pi = (1 − tk) Σ_{i=1}^{k−1} [ti/(1 − tk)] pi + tk pk ∈ S.
Definition. A point p in a nonempty convex set S is called an extreme point provided that
whenever p = (1 − t)x0 + tx1 for some 0 < t < 1 with x0 and x1 in S then p = x0 = x1 .
An extreme point for a polyhedral set is also called a vertex.
An extreme point of a set must be a boundary point. The disk D = { x ∈ R2 : kxk ≤ 1 }
is convex and each point on its boundary circle is an extreme point.
For the rest of the section, we consider a standard linear program in slack-variable form with both slack and surplus variables included in x. The feasible set is given by
        F = { x ∈ R^{n+m}_+ : Ax = b }.
Theorem 1.21. Assume x ∈ F = { x ∈ R^{n+m}_+ : Ax = b } is a feasible solution to a linear programming problem. Then x is an extreme point of F if and only if x is a basic feasible solution, i.e., if and only if the columns of A with xj > 0 form a linearly independent set of vectors.
Proof. By reindexing the columns and variables, we can assume that
        x1 > 0, . . . , xr > 0 and xr+1 = · · · = xn+m = 0.
(⇒) Assume that the columns {A1, . . . , Ar} are linearly dependent, so there are constants β1, . . . , βr not all zero with
        β1A1 + · · · + βrAr = 0.
If we let β = (β1, . . . , βr, 0, . . . , 0)ᵀ, then Aβ = 0. For all small λ > 0, w1 = x + λβ ≥ 0 and w2 = x − λβ ≥ 0. Also, for i = 1, 2, Awi = Ax = b, so both w1, w2 ∈ F are feasible solutions. Since x = ½ w1 + ½ w2 with w1 ≠ w2, x is not a vertex.
(⇐) Conversely, assume that the feasible point x is not a vertex but a convex combination of the feasible solutions y and z,
        x = t y + (1 − t) z for some 0 < t < 1,
with y ≠ z. For j > r,
        0 = xj = t yj + (1 − t) zj.
Since both yj ≥ 0 and zj ≥ 0, both must be zero for j > r. Because y ≠ z are both in F,
        b = Ay = y1A1 + · · · + yrAr,
        b = Az = z1A1 + · · · + zrAr, and
        0 = (y1 − z1)A1 + · · · + (yr − zr)Ar,
so the columns {A1, . . . , Ar} must be linearly dependent.
Note that for any linear (or convex) combination, the value of the objective function is the corresponding linear combination of values,
        f(Σj tj xj) = c · Σj tj xj = Σj tj c · xj = Σj tj f(xj).
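Theorem 1.21 translates directly into a computational test: a feasible point is a vertex exactly when the columns of A where it is positive are linearly independent, i.e., have full column rank. A minimal sketch (the helper name is_basic is ours):

    import numpy as np

    def is_basic(A, x, tol=1e-9):
        # Columns of A where x_j > 0 must be linearly independent
        # for x to be a basic feasible solution (Theorem 1.21).
        cols = A[:, np.asarray(x) > tol]
        return cols.shape[1] == np.linalg.matrix_rank(cols) if cols.size else True

    A = np.array([[2.0, 1.0, 1.0, 0.0],
                  [1.0, 3.0, 0.0, 1.0]])   # a small LP in slack-variable form
    x = np.array([3.0, 2.0, 0.0, 0.0])     # A x = (8, 9), x >= 0
    print(is_basic(A, x))                  # True: x is a vertex of F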
Theorem 1.22. Assume that the feasible set is nonempty for a bounded standard maximization
linear program in slack-variable form. Then the following hold.
a. If x0 is a feasible solution to a bounded linear program, then there exists a basic feasi-
ble solution xb such that f (xb ) = c · xb ≥ c · x0 = f (x0 ).
b. There is at least one optimal basic solution.
c. If two or more basic solutions are optimal, then any convex combination of them is also
an optimal solution.
Proof. (a) If x0 is already a basic feasible solution, then we are done.
Otherwise, the columns of A corresponding to x0i ≠ 0 are linearly dependent. Let A′ be the matrix with only these columns. Since the columns of A′ are linearly dependent, there is a nonzero vector y′ such that A′y′ = 0. Adding zeroes in the other entries, we obtain a nonzero vector y such that Ay = 0. Since A(−y) = 0, we can assume that c · y ≥ 0 by replacing y by −y if necessary. Also,
        A[x0 + t y] = b,
so x0 + t y is a solution. Remember that, by construction, if x0j = 0, then yj = 0; so if yj ≠ 0, then x0j > 0.
Case 1. Assume that c · y > 0 and some component yi < 0. Then x0i > 0 and x0i + t yi = 0 for ti = −x0i/yi > 0. As t increases from 0 to ti, the objective function increases from c · x0 to c · [x0 + ti y]. If more than one yi < 0, then we select the one with the smallest value of ti. In this way, we have constructed a new feasible solution x1 with one more component of the vector zero, fewer components with yi < 0, and a greater value of the objective function.
We can continue in this manner until either the columns are linearly independent or all the components of y are nonnegative.
Case 2. If c · y > 0 and y ≥ 0, then x + t y is a feasible solution for all t > 0, and
        c · [x + t y] = c · x + t c · y
is arbitrarily large. Thus the linear program is unbounded and has no maximum, which is a contradiction.
Case 3. Assume c · y = 0. Some yk ≠ 0. Considering y and −y, we can assume yk < 0. Then there exists a first tk > 0 such that x0k + tk yk = 0. The value of the objective function does not change. There are fewer positive components of the new x0. Eventually we get the corresponding columns linearly independent, and we are at a basic solution as claimed in part (a).
(b) There are only a finite number of basic feasible solutions {pj}, j = 1, . . . , N, since there are only finitely many sets of linearly independent columns of A. By part (a),
        f(x) ≤ max_{1≤j≤N} f(pj) for x ∈ F.
Thus, a maximum can be found among this finite set of values at the basic solutions.
(c) Assume that f(pji) = M is the maximum for some collection of basic feasible solutions pji, i = 1, . . . , ℓ. Then any convex combination is also a maximizer:
        A( Σ_{i=1}^{ℓ} tji pji ) = Σ_{i=1}^{ℓ} tji Apji = Σ_{i=1}^{ℓ} tji b = b,
        Σ_{i=1}^{ℓ} tji pji ≥ 0 is feasible, and
        f( Σ_{i=1}^{ℓ} tji pji ) = Σ_{i=1}^{ℓ} tji f(pji) = Σ_{i=1}^{ℓ} tji M = M.
If there are degenerate basic feasible solutions with fewer than m nonzero basic variables, then the simplex method can cycle by row reduction to matrices with the same nonzero basic variables but different zero basic variables, so different sets of pivots. Interchanging a basic variable equal to zero for a zero non-basic variable corresponds to the same vertex of the feasible set. See the following example. There are ways to program computers to avoid repeating a set of basic variables, and so avoid cycling. See Jongen et al [8]. Humans avoid cycling naturally.
Example 1.23. The following maximization problem has a degenerate basic solution.
Maximize: 8 x1 + 7 x2 + 2 x3
Subject to: 2 x1 + x2 + x3 ≤ 15,
14 x1 + 13 x2 − 2 x3 ≤ 105,
2 x1 + 4 x2 + 4 x3 ≤ 30,
x1 ≥ 0, x2 ≥ 0, and x3 ≥ 0.
The steps in the simplex method are as follows.

          x1   x2   x3  s1  s2  s3                  x1   x2   x3  s1  s2  s3
      [    2    1    1   1   0   0 |  15 ]      [   2    1    1   1   0   0 |  15 ]
      [   14   13   −2   0   1   0 | 105 ]  ∼   [   0    6   −9  −7   1   0 |   0 ]
      [    2    4    4   0   0   1 |  30 ]      [   0    3    3  −1   0   1 |  15 ]
      [   −8   −7   −2   0   0   0 |   0 ]      [   0   −3    2   4   0   0 |  60 ]

          x1  x2   x3     s1     s2    s3                x1  x2  x3   s1      s2      s3
      [   2   0    5/2    13/6  −1/6   0 |  15 ]      [  2   0   0    4/3     0      −1/3  |  10 ]
  ∼   [   0   6   −9     −7      1     0 |   0 ]  ∼   [  0   2   0   −4/3     2/15    2/5  |   6 ]
      [   0   0    15/2   5/2   −1/2   1 |  15 ]      [  0   0   1    1/3    −1/15    2/15 |   2 ]
      [   0   0   −5/2    1/2    1/2   0 |  60 ]      [  0   0   0    4/3     1/3     1/3  |  65 ]
65
Notice for the first pivoting, the ratio for the first and second row is the same which causes an
entry in the augmented column to become zero in the second tableau. Both the second and third
tableau have a zero basic variable in addition to the free (non-pivot) variables and have the same
degenerate basic solution, (x1 , x2 , x3 , s1 , s2 , s3 ) = 15/2, 0, 0, 0, 0, 15 , but the basic variables


are different: (x1 , s2 , s3 ) are basic variables for the second tableau and (x1 , x2 , s3 ) are basic
variables for the third tableau. When leaving a basic solution, the variable which becomes
positive must be a free variable (non-basic variable) and not a zero basic (pivot) variable. The
first pivot operation at this degenerate solution interchanges a basic variable equal to 0 and a
free variable, so this new free variable made positive with the next pivoting when the value
of the objective function is increased; this first pivoting results in all the same values of the
variables, so the same point in F . At a degenerate solution, the value will increase after a
finite number of pivoting steps unless there is a cycle of pivoting sets (all staying at the same
point of the feasible set) that results back with with the original set of pivots. The difficulty of
a degenerate vertex and the potential of cycling is a matter of how row reduction relates to a
movement on the feasible set. 
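One can check numerically that the two bases at the degenerate vertex determine the same point: solving B xB = b for each choice of basis columns B gives identical solutions. A minimal sketch (names are ours):

    import numpy as np

    A = np.array([[ 2.0,  1.0,  1.0, 1.0, 0.0, 0.0],
                  [14.0, 13.0, -2.0, 0.0, 1.0, 0.0],
                  [ 2.0,  4.0,  4.0, 0.0, 0.0, 1.0]])
    b = np.array([15.0, 105.0, 30.0])

    def basic_solution(basis):
        # Solve B x_B = b for the chosen basis columns; other variables are 0.
        x = np.zeros(A.shape[1])
        x[list(basis)] = np.linalg.solve(A[:, list(basis)], b)
        return x

    print(basic_solution([0, 4, 5]))   # basis (x1, s2, s3): [7.5 0 0 0 0 15]
    print(basic_solution([0, 1, 5]))   # basis (x1, x2, s3): the same point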
Theorem 1.24. If a maximal solution exists for a linear programming problem and the simplex
algorithm does not cycle among degenerate basic feasible solutions, then the simplex algorithm
locates a maximal solution in finitely many steps.
Proof. Pivoting corresponds to changing from one basic feasible solution to another as in Theorem 1.22(a). Assume that we never reach a degenerate basic feasible solution during the steps of the simplex method. Since there are only a finite number of vertices, the process must terminate with a basic feasible solution p0 for which pivoting to any of the nearby p1, . . . , pk has f(pj) ≤ f(p0). (Usually, it will strictly decrease the value.) Complete to the set of all basic feasible solutions (vertices) p0, p1, . . . , pℓ.
The set of all convex combinations is a bounded polyhedron
        H = { Σ_{i=0}^{ℓ} ti pi : ti ≥ 0, Σ_{i=0}^{ℓ} ti = 1 } ⊂ F.
An edge of H from p0 to pj corresponds to pivoting as in the proof of Theorem 1.22(a), where one constraint becomes not equal to bi and another becomes equal to bj. The positive cone out from p0 determined by {pi − p0}, i = 1, . . . , k, is
        C = { p0 + Σ_{i=1}^{k} yi (pi − p0) : yi ≥ 0 } ⊃ H.
Let q be any vertex of H (basic solution), q ∈ H ⊂ C. Then
        q − p0 = Σ_{i=1}^{k} yi (pi − p0) with all yi ≥ 0.
Then,
        f(q) − f(p0) = Σ_{i=1}^{k} yi [f(pi) − f(p0)] ≤ 0.
This proves that p0 is a maximizer for f.

1.6. Exercises
1.6.1. This exercise corresponds to Case 1 in the proof of Theorem 1.22(a).
Consider the linear program
Maximize: f (x1 , x2 , x3 ) = 9 x1 + 2 x2 + x3 ,
Subject to: 4 x1 + 5 x2 + 7 x3 + s1 = 20,
x1 + 3 x2 + 2 x3 + s2 = 7,
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.
Consider the vectors x0 = (1, 2, 0, 6, 0) and y = (−3, 1, 0, 7, 0) with coordinates (x1, x2, x3, s1, s2).
a. Show that x0 is a nonnegative solution of the linear program. Is it a basic
solution? Why or why not?
b. Show that y is a solution of the corresponding homogeneous equation.
c. Determine a value of t such that x0 + t y is a basic solution with f(x0 + t y) ≥ f(x0).
1.6.2. This exercise corresponds to Case 1 in the proof of Theorem 1.22(a).
Consider the linear program
Maximize: f (x1 , x2 , x3 ) = 9 x1 + 2 x2 + x3 ,
Subject to: x1 + 3 x2 + 7 x3 + s1 = 9,
2 x1 + x2 + 5 x3 + s2 = 12,
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0.
a. Show that (x1 , x2 , x3 , s1 , s2 ) = (2, 0, 1, 0, 3) is a non-basic nonzero solution
of the linear program.
b. Let x0 = (2, 0, 1, 0, 3) as in part (a). Find a vector y that is a solution
of the homogeneous equations and has only nonzero components in the same
coordinates as x0 .
c. Determine a value of t such that x0 + t y is a basic solution with f(x0 + t y) ≥ f(x0).
1.6.3. This exercise gives an unbounded problem and corresponds to Case 2 in the proof
of Theorem 1.22(a).
Consider the maximization with the nonhomogeneous system of equations
Maximize: f (x1 , x2 , x3 ) = 2 x1 + 5 x2 + 3 x3 ,
Subject to: 6 x1 − x2 + 5 x3 + x4 = 6,
−4 x1 + x2 + 3 x3 − x5 + x6 = 2,
x ≥ 0.
a. Set up the tableau (without adding any more artificial variables). Apply the
simplex method. You should come to the situation where an entry in the ob-
jective function row is negative and all the entries above it are negative. You
can use this to give a feasible nonnegative solution.
b. Take the last tableau obtained in part (a) and write out the general solution, taking all the free variables to the right-hand side. Explain why the variable with
the negative entry in the objective function row can be increased arbitrarily
large and still give feasible nonnegative solutions. Why does this show that the
problem is unbounded?
1.6.4. Consider the maximization with the nonhomogeneous system of equations
Maximize: f (x1 , x2 , x3 , x4 ) = 75 x1 − 250 x2 + 50 x3 − 100 x4 ,
Subject to: x1 − 4 x2 − 4 x3 + 6 x4 ≤ 1,
x1 − 3 x2 − x3 + x4 ≤ 1,
x3 ≤ 1,
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0.
a. Row reduce the tableau with slack variables added by choosing the following
sequence of new pivots: (i) row 1 and column 1, (ii) row 2 and column 3, then
(iii) row three and column 5.
b. For the second and third tableau answer the following: (i) What are the basic
variables? (ii) What is the basic feasible solution? (iii) Why are these degen-
erate basic feasible solutions?
1. Exercises for Chapter 1
1.1. Indicate which of the following statements are true and which are false. Justify each answer: for a true statement explain why it is true, and for a false statement either indicate how to make it true or indicate why the statement is false. The statements relate to a standard maximum linear program with objective function f(x) = c · x, constraint inequality Ax ≤ b for an m × n coefficient matrix A and constant vector b ∈ Rᵐ₊, and x ≥ 0.
a. If a standard maximization linear program does not have an optimal solution,
then either the objective function is unbounded on the feasible set F or F is
the empty set.
b. If x̄ is an optimal solution of a standard maximization linear program, then x̄
is an extreme point of the feasible set.
c. A slack variable is used to change an equality into an inequality.
d. A solution is called a basic solution if exactly m of the variables are nonzero.
e. The basic feasible solutions correspond to the extreme points of the feasible
region.
f. The bottom entry in the right column of a simplex tableau gives the maximal
value of the objective function.
g. For a tableau for a maximization linear program, if there is a column with all
negative entries including the one in the row for the objective function, then
the linear programming problem has no feasible solution.
h. The value of the objective function for a MLP at any basic feasible solution is
always greater than the value at any non-basic feasible solution.
i. If a standard maximization linear program MLP has nonempty feasible set,
then it has an optimal basic solution.
j. In the two-phase simplex method, if an artificial variable is positive for the
optimal solution for the artificial objective function, then there is no feasible
solution to the original linear program.
k. The dual mLP problem is to minimize b · y over y ∈ Rᵐ subject to Ay ≥ c and y ≥ 0.
l. If x̄ is an optimal solution to the primal MLP and ŷ is a feasible solution to
the dual mLP, then f (x̄) = g(ŷ).
m. If a slack variable is s̄i > 0 in an optimal solution, then the addition to the
objective function that would be realized by one more unit of the resource
corresponding to its inequality is positive.
n. If a maximization linear program MLP and its dual minimization linear prob-
lem mLP each have nonempty feasible sets (some feasible point), then each
problem has an optimal solution.
o. If the optimal solution of a standard MLP has a slack variable si = 0, then
the ith resource has zero marginal value, i.e., one unit of the ith resource would
add nothing to the value of the objective function.
Chapter 2

Unconstrained Extrema

We begin our treatment of nonlinear optimization problems in this chapter with consideration
of unconstrained problems where the variables are free to move in any direction in a Euclidean
space. The first section presents the mathematical background from calculus and linear algebra
and summarizes some terminology and results from real analysis that is used in the rest of the
book. The second section gives the standard first and second derivative conditions for an extremizer.
A standard multi-dimensional calculus course certainly discusses the unconstrained extrema
of functions of two variables and three variables and some calculus courses treat most of the
material of this chapter. However, we present a unified treatment in any dimension that forms
the foundation for the remaining chapters.

2.1. Mathematical Background


In this section, we summarize some terminology and results from an undergraduate course
on real analysis so we can use it in the rest of the book. See Wade [15] or Rudin [10] for
more details. This material includes open, closed, and bounded sets and their boundaries. The
concept of a continuous function is introduced, because it is a crucial assumption in many
of the results, for example the Extreme Value Theorem on the existence of a maximum or
minimum.
Although many of the results can be stated in terms of the gradients of real-valued func-
tions, we consider the derivative of functions between Euclidean spaces as a matrix. This
perspective is very useful for the discussion in the following chapter about conditions on con-
straints and implicit differentiation in the context of several constraints. Colley [6] has much of
this material on differentiation of multivariable functions. From linear algebra, some material
about quadratic forms is reviewed and extended. A basic reference is Lay [9]. We present more practical tests for local extrema than are usually given.

2.1.1. Types of Subsets of Rn
In this subsection, we define the basic types of subsets differently than most books on real analysis do. The conditions we give are used because they seem intuitive; they are usually proved to be equivalent to the standard ones for sets in Euclidean spaces.
Definition. For p ∈ Rn and r > 0, the set
B(p, r) = { x ∈ Rn : kx − pk < r }
is called the open ball about p of radius r. The set


B̄(p, r) = { x ∈ Rn : kx − pk ≤ r }
is called the closed ball about p of radius r.
Definition. The complement of a set S in Rn is the set of points not in S, Sᶜ = Rn ∖ S = { x ∈ Rn : x ∉ S }.
Definition. The boundary of a set S ⊂ Rn , denoted by ∂(S), is the set of all points which have
points arbitrarily close in both S and Sc ,
∂(S) = { x ∈ Rn : B(x, r) ∩ S 6= ∅ and B(x, r) ∩ Sc 6= ∅ for all r > 0 }.
Example 2.1. The boundary of an open or a closed ball is the same,
        ∂(B(p, r)) = ∂(B̄(p, r)) = { x ∈ Rn : kx − pk = r }.
Example 2.2. The boundary of the bounded polyhedral set
2 ≥ x1 + x2 ,
16 ≥ 5x1 + 10x2
3 ≥ 2x1 + x2 ,
0 ≤ x1 , and 0 ≤ x2 ,
is the polygonal closed curve made up of five line segments. See Figure 1.2.1. 
For most of the sets we consider, the boundary will be the curve or surface that encloses
the set.
Definition. A set S ⊂ Rn is closed provided that its boundary is contained in S, ∂(S) ⊂ S,
i.e., if B(p, r) ∩ S 6= ∅ for all r > 0, then p ∈ S.
Definition. A set S ⊂ Rn is said to be open provided that no point of the boundary is an
element of S, S ∩ ∂(S) = ∅. This is the same as saying that for any point x0 ∈ S there exists
an r > 0 such that B(x0 , r) ⊂ S. The intuitive idea of an open set is that for each point in the
set, all the nearby points are also in the set.
Since ∂(S) = ∂(Sc ), it follows that a set S is closed iff Sc is open.
Example 2.3. In the real line R, the intervals (a, b), (a, ∞), and (−∞, b) are open and the
intervals [a, b], [a, ∞), and (−∞, b] are closed. The intervals [a, b) and (a, b] are neither open
nor closed. The boundary of the whole line (−∞, ∞) is empty, and (−∞, ∞) is both open
and closed. 
Example 2.4. In Rn , the whole space Rn and the empty set ∅ are both open and closed. 
Example 2.5. In Rn, since ∂(B(p, r)) ∩ B(p, r) = ∅, the open ball B(p, r) is open and its complement B(p, r)ᶜ is closed.
Alternatively, let x0 ∈ B(p, r). Set r0 = r − kx0 − pk > 0. We claim that B(x0, r0) ⊂ B(p, r): Take x ∈ B(x0, r0). Then
        kx − pk ≤ kx − x0k + kx0 − pk < r0 + kx0 − pk = r,
and x ∈ B(p, r). This shows that B(p, r) is open.
Example 2.6. In Rn, since ∂(B̄(p, r)) ⊂ B̄(p, r), the closed ball B̄(p, r) is closed and its complement B̄(p, r)ᶜ is open.
Alternatively, take x0 ∈ B̄(p, r)ᶜ. It follows that kx0 − pk > r. Let r0 = kx0 − pk − r > 0. If x ∈ B(x0, r0), then
        kx − pk ≥ kx0 − pk − kx0 − xk > kx0 − pk − r0 = r,
and x ∈ B̄(p, r)ᶜ. Thus, B(x0, r0) ⊂ B̄(p, r)ᶜ, so B̄(p, r)ᶜ is open and B̄(p, r) is closed.
Definition. The interior of S ⊂ Rn , denoted by int(S), is the set with its boundary removed,
int(S) = S r ∂(S). It is the largest open set contained in S. It is also the set of all points
p ∈ S for which there exists an r > 0 such that B(p, r) ⊂ S.
The closure of S ⊂ Rn , denoted by cl(S) or S, is the union of S and its boundary ∂(S),
cl(S) = S ∪ ∂(S). It is the smallest closed set containing S.
Notice that the boundary of a set equals its closure minus its interior, ∂(S) = cl(S) r
int(S).
Example 2.7. For intervals in R, int([0, 1]) = (0, 1), cl((0, 1)) = [0, 1], and ∂([0, 1]) =
∂((0, 1)) = {0, 1}. Also, cl (Q ∩ (0, 1)) = [0, 1], int (Q ∩ (0, 1)) = ∅, and ∂ (Q ∩ (0, 1)) =
[0, 1].
The analogous object in Rn to an interval in R is an open or closed ball. For these sets, int(B̄(a, r)) = B(a, r) and cl(B(a, r)) = B̄(a, r). The boundary is the same for both of these balls, ∂B(a, r) = ∂B̄(a, r) = { x ∈ Rn : kx − ak = r }.
To guarantee that a maximum of a real valued function exists, the domain or feasible set
must be closed and bounded: the domain must contain its boundary and cannot “go off to
infinity”.
Definition. A set S ⊂ Rn is bounded provided that there exists a sufficiently large r > 0
such that S ⊂ B(0, r), i.e., kxk ≤ r for all x ∈ S.
Definition. A set S ⊂ Rn is called compact provided that it is closed and bounded.
Remark. In a course in real analysis, a compact set is defined differently: either in terms of
sequences or covers of the set by open sets. With one of these definitions, the above definition
is a theorem about a compact set in Rn .

2.1.2. Continuous Functions
In a calculus course, continuity is usually only mentioned in passing. Since it plays a key role
in a more rigorous discussion of optimization, we give a brief introduction to this concept.
Example 2.8. For a function of a single real variable, the intuitive definition of a continuous function is that its graph can be drawn without lifting the pen. There are various ways in which a function can be discontinuous at a point. Consider the following two functions:
        f(x) = 0 for x < 0 and f(x) = 1 for x ≥ 0;
        g(x) = 0 for x ≤ 0 and g(x) = sin(1/x) for x > 0.
[Figure: the graphs of f and g.]
The function f has a jump at x = 0 so is discontinuous. The function g oscillates as x approaches zero and is discontinuous at x = 0. Notice that for the function g, there are some points x > 0 where the value is near (equal to) g(0) = 0, but there are other points near 0 where the value is far from g(0) = 0.
The definition of continuity is given in terms of limits, which we define first.
Definition. Let f : S ⊂ Rn → Rm.
If a ∈ cl(S), then the limit of f(x) at a is L, written limx→a f(x) = L, provided that for every ε > 0 there exists a δ > 0 such that kf(x) − Lk < ε whenever kx − ak < δ and x ∈ S ∖ {a}.
This definition can be extended to the limit as a real variable goes to infinity: limx→∞ f(x) = L provided that for every ε > 0, there exists a K such that |f(x) − L| < ε whenever x ≥ K.
Example 2.9. In R², consider
        f(x, y) = y²/(x² + y²) for (x, y) ≠ (0, 0) and f(0, 0) = 0.
The function f does not have a limit as (x, y) goes to (0, 0) since it approaches different values as (x, y) approaches the origin from different directions:
        lim_{y→0} f(0, y) = lim_{y→0} y²/y² = 1   and
        lim_{x→0} f(x, tx) = lim_{x→0} t²x²/(x² + t²x²) = t²/(1 + t²) ≠ 1.
Definition. A function f : S ⊂ Rn → Rm is continuous at a ∈ S provided that limx→a f(x) = f(a), i.e., for all ε > 0 there exists a δ > 0 such that kf(x) − f(a)k < ε whenever kx − ak < δ and x ∈ S.
We say that f is continuous on a set S provided that it is continuous at all points in S.
Continuity at a means that given a tolerance ε > 0 in the values, there is a tolerance δ > 0 in the input such that all points within δ of a have values within ε of f(a).
In terms of sequences, f is continuous at a ∈ S provided that for any sequence of points xk in S that converges to a, the values f(xk) converge to f(a).
Example 2.10. We return to the function g(x) = sin(1/x) for x > 0 and g(x) = 0 for x ≤ 0. For any small δ > 0, there are some points xn = 1/(2nπ) > 0 for which g(xn) = 0 = g(0), but there are other points x′n = 2/(4nπ + π) such that g(x′n) = 1, so that |g(x′n) − g(0)| = 1 > 1/2. Thus, g(x) is not continuous at x = 0.
Example 2.11. Let f : (0, ∞) → R be defined by f(x) = 1/x. We claim that f is continuous at all points a > 0. Fix a > 0 and ε > 0. If |x − a| < δ, then
        |f(x) − f(a)| = |1/x − 1/a| = |a − x|/|xa| < δ/|xa|.
If δ < a/2, then x = a − (a − x) > a/2, so 1/x < 2/a and
        |f(x) − f(a)| < 2δ/a².
Thus, if δ is also less than εa²/2, then |f(x) − f(a)| < ε. Therefore, if we take δ < min{ a/2, εa²/2 }, then we get that |f(x) − f(a)| < ε as desired.
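This choice of δ is easy to sanity-check numerically. A minimal sketch (our own illustration): sample points within δ of a and confirm that the values of 1/x stay within ε of 1/a.

    import numpy as np

    a, eps = 2.0, 0.1
    delta = min(a / 2, eps * a**2 / 2)
    xs = np.linspace(a - delta, a + delta, 100001)[1:-1]   # |x - a| < delta
    print(np.max(np.abs(1 / xs - 1 / a)) < eps)            # True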
It is not hard to show that a vector-valued function F : S ⊂ Rn → Rm is continuous at a iff each of its coordinate functions Fi is continuous at a.
Continuity can be characterized in terms of the inverse image of open or closed sets. We
use this property a few times.
Definition. If f : D ⊂ Rn → Rm is a function, then the inverse image of U ⊂ Rm is
        f⁻¹(U) = { x ∈ D : f(x) ∈ U } ⊂ D.
In this context, the function need not have an inverse; f⁻¹(U) merely denotes the set of points that map into the set U.
A level set of f is the same as the inverse image of a point b ∈ Rm,
        f⁻¹(b) = { x ∈ D : f(x) = b } = { x ∈ D : fi(x) = bi for i = 1, . . . , m } ⊂ D.
Theorem 2.12. Let f : D ⊂ Rn → Rm . Then the following are equivalent.
(i) f is continuous on D .
(ii) For each open set V ⊂ Rm , there is an open set U ⊂ Rn such that f −1 (V) = U ∩ D ,
i.e., the inverse image of an open set f −1 (V) is open relative to D .
(iii) For each closed set C ⊂ Rm , there is an closed set B ⊂ Rn such that f −1 (C) =
B ∩ D , i.e., the inverse image of a closed set f −1 (C) is closed relative to D .
For a proof, see Wade [15] or Rudin [10].
Example 2.13. Let pi > 0 for 1 ≤ i ≤ n be fixed prices and w > 0 be the wealth. The
simplex
S = { x ∈ Rn : xi ≥ 0 for 1 ≤ i ≤ n, p1 x1 + · · · + pn xn ≤ w }
is compact.
For a point x ∈ S, each coordinate satisfies 0 ≤ xj ≤ w/pj, so kxkm ≤ maxi{ w/pi } for the maximum norm, and the set is bounded.
Intuitively, the set is closed because the inequalities are non-strict, “less than or equal to”
or “greater than or equal to”.
More formally, the function f(x) = p1x1 + · · · + pnxn is easily seen to be continuous. The interval [0, w] is closed in R, so the set
        { x ∈ Rn : 0 ≤ f(x) ≤ w } = f⁻¹([0, w])
is closed. Similarly, for any 1 ≤ i ≤ n, gi(x) = xi is continuous and the interval [0, ∞) is closed, so the sets
        { x ∈ Rn : xi ≥ 0 } = gi⁻¹([0, ∞))
are closed. Combining,
        S = { x ∈ Rn : xi ≥ 0 for 1 ≤ i ≤ n, p1x1 + · · · + pnxn ≤ w } = f⁻¹([0, w]) ∩ g1⁻¹([0, ∞)) ∩ · · · ∩ gn⁻¹([0, ∞))
is closed. Since S is both closed and bounded, it is compact.

2.1.3. Existence of Extrema
We can now state a general theorem on the existence of points that maximize and minimize. It
does not give a constructive method for finding a point that maximizes a function but merely
gives sufficient conditions for a maximum to exist. Such a result gives incentive to search for a
maximizer. In this chapter and the next, we will consider techniques to help find the maximizer.
44 2. Unconstrained Optima

Theorem 2.14 (Extreme Value Theorem). Assume that F ⊂ Rn is a nonempty compact set and that f : F → R is a continuous function. Then f attains a maximum and a minimum on F, i.e., there exist points xm, xM ∈ F such that
        f(xm) ≤ f(x) ≤ f(xM) for all x ∈ F, so
        f(xm) = min_{x∈F} f(x)   and   f(xM) = max_{x∈F} f(x).

For a proof see a book on real analysis such as the ones by Wade [15] or Rudin [10].
Example 2.15. We give some examples that illustrate why it is necessary to assume that the domain of a function is compact in order to be certain that the function attains a maximal or minimal value.
On the unbounded set F = R, the function f(x) = x³ is unbounded above and below and has no maximum nor minimum. The function is continuous, but the domain is not bounded.
The same function f(x) = x³ on (−1, 1) is bounded above and below, but it does not attain a maximum or minimum on (−1, 1); the set is bounded but not closed.
Similarly, g(x) = tan(x) on the interval (−π/2, π/2) is unbounded and does not have a minimum or maximal value; again, this interval is bounded but not closed.
The function h(x) = arctan(x) on R is bounded, but the function does not attain a maximal or minimal value. In this case, the domain is closed but is not bounded, and the set of values attained is bounded but not closed.
Example 2.16. Consider
        f(x) = 1/x for x ≠ 0 and f(0) = 0.
This function is not continuous at x = 0 and has no maximum nor minimum on [−1, 1] even though the domain is compact.
In Section 4.2.1, we define the least upper bound or supremum for a function that is
bounded above but does not attain its maximum.

2.1.4. Differentiation in Multi-Dimensions
In this section, we introduce the derivative of a vector-valued function as the matrix of par-
tial derivatives. This approach generalizes the gradient of a scalar-valued function. We use
this treatment of the derivative when considering extremizers later in the chapter and also for
implicit differentiation with several constraining equations in the next chapter.
Before discussing differentiation of a function defined on Rn, we derive a consequence of differentiation of a function f : R → R. It has derivative f′(p) at p provided that
        lim_{x→p} [f(x) − f(p)]/(x − p) = f′(p)   or
        lim_{x→p} [f(x) − f(p) − f′(p)(x − p)]/(x − p) = 0.
Thus, for small ε > 0, there exists δ > 0 such that for |x − p| < δ,
        −ε |x − p| ≤ f(x) − f(p) − f′(p)(x − p) ≤ ε |x − p|   or
        f(p) + f′(p)(x − p) − ε |x − p| ≤ f(x) ≤ f(p) + f′(p)(x − p) + ε |x − p|.
These inequalities imply that for x near p, the value of the function f(x) is in a narrow cone about the line f(p) + f′(p)(x − p), and this line is a good affine approximation of the nonlinear function f(x) near p. See Figure 2.1.1. (An affine function is a constant plus a linear function.)

[Figure 2.1.1. Cone condition for the derivative at p: near p, the graph of f lies between the lines f(p) + [f′(p) − ε](x − p) and f(p) + [f′(p) + ε](x − p).]

Definition. A function f : D ⊂ Rn → Rm is said to be differentiable at p ∈ int(D) provided that all the first-order partial derivatives exist at p and the m × n matrix
        Df(p) = ( ∂fi/∂xj (p) )
satisfies
        lim_{x→p} kf(x) − f(p) − Df(p)(x − p)k / kx − pk = 0,   or
        f(x) = f(p) + Df(p)(x − p) + R̃1(p, x) kx − pk   where   lim_{x→p} R̃1(p, x) = 0.
When this limit is satisfied, the matrix Df(p) is called the derivative of f at p.
If m = 1, then Df(p)ᵀ = ∇f(p) is the gradient, which is a column vector. Then
        Df(p)(x − p) = ∇f(p) · (x − p) = Σj ∂f/∂xj (p) (xj − pj).

Remark. The fact that the remainder R1(p, x) = R̃1(p, x) kx − pk goes to zero faster than kx − pk means that f(p) + Df(p)(x − p) is a good affine approximation of the nonlinear function f(x) near p for all small displacements, just as in the one-dimensional case.
Since we cannot divide by a vector, the first limit in the definition of differentiability divides
by the length of the displacement.
Note that the rows of the derivative matrix Df (p) are determined by the coordinate func-
tions and the columns by the variable of the partial derivative. This choice is important so that
the correct terms are multiplied together in the matrix product Df (p)(x − p).

Definition. A function f : D ⊂ Rn → Rm is said to be continuously differentiable, or C¹, on D provided that all the first-order partial derivatives exist and are continuous on int(D).
The following theorem shows that a C 1 function is differentiable at all points in the interior
of the domain.
Theorem 2.17. If f : D ⊂ Rn → Rm is C¹ on int(D), then f is differentiable at all points p ∈ int(D).
Proof. For x a point near p, let pʲ = (p1, . . . , pj, xj+1, . . . , xn), so p⁰ = x and pⁿ = p, and let rʲ(t) = (1 − t) pʲ + t pʲ⁻¹. The Mean Value Theorem for a function of one variable can be applied to each coordinate function fi along the paths rʲ(t):
        fi(pʲ⁻¹) − fi(pʲ) = fi(rʲ(1)) − fi(rʲ(0)) = [d/dt fi(rʲ(t))]_{t=tij} (1 − 0) = ∂fi/∂xj (rʲ(tij)) (xj − pj).
If we add these up, the sum telescopes,
        fi(x) − fi(p) = Σⁿⱼ₌₁ [fi(pʲ⁻¹) − fi(pʲ)] = Σⁿⱼ₌₁ ∂fi/∂xj (rʲ(tij)) (xj − pj).
Then,
        |fi(x) − fi(p) − Dfi(p)(x − p)| / kx − pk
            = |Σj [∂fi/∂xj (rʲ(tij)) − ∂fi/∂xj (p)] (xj − pj)| / kx − pk
            ≤ Σj |∂fi/∂xj (rʲ(tij)) − ∂fi/∂xj (p)|.
This last term goes to zero as x goes to p because the partial derivatives are continuous. Since this is true for each coordinate function, it is true for the vector-valued function.

The following result corresponds to the usual chain rule for functions of one variable and
the chain rule for partial derivatives for functions of several variables.
Theorem 2.18 (Chain Rule). Assume that f : Rn → Rm and g : Rm → Rk are C 1 ,
p ∈ Rn and q = f (p) ∈ Rm . Then the composition g ◦ f : Rn → Rk is C 1 , and
D(g ◦ f )(p) = Dg(q) Df (p).
This form of the chain rule agrees with the usual chain rule for functions of several variables written in terms of partial derivatives: If w = g(x) and x = x(t), then
        dw/dt = (∂w/∂x1, . . . , ∂w/∂xn) · (dx1/dt, . . . , dxn/dt) = Σi (∂w/∂xi)(dxi/dt).

Proof. We let v = x − p and w = y − q. In the limits for the derivative, we write R̃f(v) for R̃f(p, p + v) and R̃g(w) for R̃g(q, q + w). Then
        g ∘ f(p + v) = g( q + Df(p) v + R̃f(v) kvk )
                     = g(q) + Dg(q) [ Df(p) v + R̃f(v) kvk ]
                       + R̃g( Df(p) v + R̃f(v) kvk ) kDf(p) v + R̃f(v) kvkk
                     = g(q) + Dg(q) Df(p) v + Dg(q) R̃f(v) kvk
                       + R̃g( Df(p) v + R̃f(v) kvk ) kDf(p) v + R̃f(v) kvkk.
The term Dg(q) R̃f(v) goes to zero as needed and is multiplied by kvk. For the last term,
        kDf(p) v + R̃f(v) kvkk / kvk = kDf(p) (v/kvk) + R̃f(v)k
is bounded as kvk goes to zero, and R̃g( Df(p) v + R̃f(v) kvk ) goes to zero. Thus,
        g ∘ f(p + v) = g(q) + Dg(q) Df(p) v + R̃g∘f(v) kvk,
where R̃g∘f(v) goes to zero as kvk goes to zero. It follows that Dg(q) Df(p) must be the derivative.
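The chain rule D(g ∘ f)(p) = Dg(q) Df(p) is easy to confirm numerically by comparing a finite-difference Jacobian of the composition with the product of the factors' Jacobians. A minimal sketch (the particular functions and the helper are our own illustration):

    import numpy as np

    f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])
    g = lambda y: np.array([y[0] + y[1] * y[2]])

    def jacobian(h, p, eps=1e-6):
        # Numerical Jacobian of h at p by central differences.
        p = np.asarray(p, dtype=float)
        cols = []
        for j in range(len(p)):
            e = np.zeros_like(p)
            e[j] = eps
            cols.append((h(p + e) - h(p - e)) / (2 * eps))
        return np.column_stack(cols)

    p = np.array([1.0, 2.0])
    lhs = jacobian(lambda x: g(f(x)), p)        # D(g o f)(p), a 1 x 2 matrix
    rhs = jacobian(g, f(p)) @ jacobian(f, p)    # Dg(q) Df(p)
    print(np.allclose(lhs, rhs, atol=1e-5))     # True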
2.1.5. Second Derivative and Taylor’s Theorem
The only higher derivative that we consider is the second derivative of a real valued function
of multiple variables. Other higher derivatives are more complicated to express. We use this
second derivative to state Taylor’s theorem for a multi-variable real valued function, giving the
expansion with linear and quadratic terms. We indicate why this multi-variable version follows
from Taylor’s Theorem for a function of a single variable.

Definition. Let D ⊂ Rn be an open set and f : D → R. If all the second partial derivatives ∂²f/∂xi∂xj (p) exist and are continuous for all p ∈ D, then f is said to be twice continuously differentiable, or C².
The matrix of second partial derivatives ( ∂²f/∂xi∂xj (p) ) is called the second derivative at p and is denoted by D²f(p). Some authors call it the Hessian matrix of f.

In more formal treatments of calculus, D²f(p) can be understood as follows. The matrix Df(x) can be considered as a point in Rn, and the map Df : x ↦ Df(x) is a function from Rn to Rn. The derivative of the map Df at x0 would be an n × n matrix, which is the second derivative D²f(x0). This matrix can be applied to two vectors v1, v2 ∈ Rn; the result is a real number, v1ᵀ D²f(p) v2. The second derivative is a bilinear map from Rn × Rn to R, i.e., it takes two vectors, gives a number, and is linear in each of the vectors separately.
Theorem 2.19. Let D ⊂ Rn be open and f : D → R be C². Then for all pairs 1 ≤ i, j ≤ n,
        ∂²f/∂xi∂xj (p) = ∂²f/∂xj∂xi (p),
i.e., D²f(p) is a symmetric matrix.
The symmetric matrix D²f(p) defines a quadratic form
        (x − p)ᵀ D²f(p) (x − p) = Σi,j ∂²f/∂xi∂xj (p) (xj − pj)(xi − pi) for x ∈ Rn,
which is used in Taylor's Theorem for a function of several variables.
Theorem 2.20 (Taylor's Theorem for a Single Variable). Assume that g : D ⊂ R → R is Cʳ, i.e., has r continuous derivatives. We denote the k-th derivative at x by g⁽ᵏ⁾(x). Assume that p is in the interior of D. Then
        g(x) = g(p) + Σʳₖ₌₁ (1/k!) g⁽ᵏ⁾(p) (x − p)ᵏ + Rr(p, x),
where the remainder
        Rr(p, x) = [1/(r − 1)!] ∫ₚˣ (x − t)ʳ⁻¹ [g⁽ʳ⁾(t) − g⁽ʳ⁾(p)] dt   and satisfies
        lim_{x→p} |Rr(p, x)| / |x − p|ʳ = 0.
If g is Cʳ⁺¹, then the remainder can also be given by either of the following expressions:
        Rr(p, x) = (1/r!) ∫ₚˣ (x − t)ʳ g⁽ʳ⁺¹⁾(t) dt = [1/(r + 1)!] g⁽ʳ⁺¹⁾(c) (x − p)ʳ⁺¹,
where c is between p and x.
Proof. The second form of the remainder can be proved by induction using integration by parts. The other two forms of the remainder can be proved from the second. We refer the reader to a book on calculus.
To estimate the remainder, we use the first form. Let
        Cr(x) = sup{ |g⁽ʳ⁾(t) − g⁽ʳ⁾(p)| : t is between p and x }.
We treat the case with x > p and leave the details for x < p to the reader. Then,
        |Rr(p, x)| = | [1/(r − 1)!] ∫ₚˣ (x − t)ʳ⁻¹ [g⁽ʳ⁾(t) − g⁽ʳ⁾(p)] dt |
                   ≤ [1/(r − 1)!] ∫ₚˣ (x − t)ʳ⁻¹ |g⁽ʳ⁾(t) − g⁽ʳ⁾(p)| dt
                   ≤ [1/(r − 1)!] ∫ₚˣ (x − t)ʳ⁻¹ Cr(x) dt
                   = (1/r!) (x − p)ʳ Cr(x).
Since Cr(x) goes to zero as x goes to p, we get the desired result.
Theorem 2.21 (Taylor's Theorem for a Multi-variable Function). Assume that F : D ⊂ Rn → R is C² on int(D) and p ∈ int(D). Then,
        F(x) = F(p) + DF(p)(x − p) + ½ (x − p)ᵀ D²F(p) (x − p) + R2(p, x),
where
        lim_{x→p} R2(p, x) / kx − pk² = 0,
i.e., if R2(p, x) = R̃2(p, x) kx − pk², then lim_{x→p} R̃2(p, x) = 0.

Proof. Define
        xt = p + t(x − p)   and   g(t) = F(xt),
so g(0) = F(p) and g(1) = F(x). For x near enough to p, xt ∈ D for 0 ≤ t ≤ 1. The derivatives of g in terms of F are
        g′(t) = Σⁿᵢ₌₁ ∂F/∂xi (xt)(xi − pi),                     g′(0) = DF(p)(x − p),
        g″(t) = Σi,j ∂²F/∂xj∂xi (xt)(xi − pi)(xj − pj),         g″(0) = (x − p)ᵀ D²F(p)(x − p).
Applying the usual Taylor's Theorem for a function of one variable to g gives the result, including the estimate on the remainder.

Remark. For a 3 × 3 symmetric matrix A = (aij), a21 = a12, a31 = a13, and a32 = a23, so
        vᵀAv = a11v1² + a22v2² + a33v3² + 2 a12v1v2 + 2 a13v1v3 + 2 a23v2v3.
If we apply this formula to the Taylor expansion of a function F : R³ → R, we get
        F(x) = F(p) + DF(p)(x − p)
             + ½ ∂²F/∂x1² (p)(x1 − p1)² + ½ ∂²F/∂x2² (p)(x2 − p2)² + ½ ∂²F/∂x3² (p)(x3 − p3)²
             + ∂²F/∂x1∂x2 (p)(x1 − p1)(x2 − p2) + ∂²F/∂x1∂x3 (p)(x1 − p1)(x3 − p3)
             + ∂²F/∂x2∂x3 (p)(x2 − p2)(x3 − p3) + R2(p, x).

Example 2.22. Find the second order Taylor expansion of F(x, y, z) = 3x²y + y³ − 3x² −
3y² + z³ − 3z about the point p = (1, −2, 3).
F(p) = −11. The first order partial derivatives are

    ∂F/∂x = 6xy − 6x,        (∂F/∂x)(p) = −18,
    ∂F/∂y = 3x² + 3y² − 6y,  (∂F/∂y)(p) = 27,
    ∂F/∂z = 3z² − 3,         (∂F/∂z)(p) = 24.

The second order partial derivatives of F are

    ∂²F/∂x² = 6y − 6,   (∂²F/∂x²)(p) = −18,
    ∂²F/∂y² = 6y − 6,   (∂²F/∂y²)(p) = −18,
    ∂²F/∂z² = 6z,       (∂²F/∂z²)(p) = 18,
    ∂²F/∂x∂y = 6x,      (∂²F/∂x∂y)(p) = 6,
    ∂²F/∂x∂z = 0,       (∂²F/∂x∂z)(p) = 0,
    ∂²F/∂y∂z = 0,       (∂²F/∂y∂z)(p) = 0.

The second order Taylor expansion about (1, −2, 3) is

    F(x, y, z) = −11 − 18(x − 1) + 27(y + 2) + 24(z − 3)
               − 9(x − 1)² − 9(y + 2)² + 9(z − 3)²
               + 6(x − 1)(y + 2) + R₂(p, (x, y, z)). □
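Since F is a cubic polynomial, the remainder R₂ consists exactly of the third order terms, so the error of the quadratic Taylor polynomial should shrink like h³ as the evaluation point approaches p. The short Python sketch below (an illustration, not part of the original text) checks this along the diagonal direction from p = (1, −2, 3).

    def F(x, y, z):
        return 3*x**2*y + y**3 - 3*x**2 - 3*y**2 + z**3 - 3*z

    def T2(x, y, z):
        # Second order Taylor polynomial of F about p = (1, -2, 3).
        dx, dy, dz = x - 1, y + 2, z - 3
        return (-11 - 18*dx + 27*dy + 24*dz
                - 9*dx**2 - 9*dy**2 + 9*dz**2 + 6*dx*dy)

    for h in [0.1, 0.01, 0.001]:
        pt = (1 + h, -2 + h, 3 + h)
        print(f"h = {h}: |F - T2| = {abs(F(*pt) - T2(*pt)):.2e}")

The error drops by a factor of about 1000 each time h shrinks by 10, as expected for a cubic remainder.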


2.1.6. Quadratic Forms


Later in the chapter, we give conditions that insure that a critical point x∗ is a local extremizer
based on the quadratic form determined by D 2f (x∗ ). To prepare for that material, we review
and extend the material about quadratic forms from linear algebra.
Definition. Let A = (aᵢⱼ) be an n × n symmetric matrix. Then

    xᵀAx = Σ_{i,j=1}^{n} aᵢⱼ xᵢ xⱼ   for x ∈ Rⁿ

is called a quadratic form.


Definition. The quadratic form xᵀAx is called
i. positive definite provided that xᵀAx > 0 for all x ≠ 0,
ii. positive semidefinite provided that xᵀAx ≥ 0 for all x,
iii. negative definite provided that xᵀAx < 0 for all x ≠ 0,
iv. negative semidefinite provided that xᵀAx ≤ 0 for all x, and
v. indefinite provided that x₁ᵀAx₁ > 0 for some x₁ and x₂ᵀAx₂ < 0 for some other x₂.
Since (sx)ᵀA(sx) = s²xᵀAx, the sign of xᵀAx is determined by its sign on unit vectors.

Definition. For an n × n symmetric matrix A, the principal submatrices of A are the k × k
submatrices in the upper left hand corner,

    A_k = (aᵢⱼ)_{1≤i,j≤k}   for 1 ≤ k ≤ n.

We let ∆_k = det(A_k) be the determinants of the principal submatrices.

Theorem 2.23 (Test for Definite Matrices). Let A be an n × n symmetric matrix.

a. The following four conditions are equivalent:
i. The matrix A is positive definite.
ii. All the eigenvalues of A are positive.
iii. The determinant of every principal submatrix is positive,
det(A_k) > 0 for 1 ≤ k ≤ n.
iv. The matrix A can be row reduced without row exchanges or scalar multiplications of rows to an upper triangular matrix that has n positive pivots, i.e., all the pivots are positive.
b. The following four conditions are equivalent:
i. The matrix A is negative definite.
ii. All the eigenvalues of A are negative.
iii. The determinants of the principal submatrices alternate sign,
(−1)^k det(A_k) > 0 for 1 ≤ k ≤ n: ∆₁ < 0, ∆₂ > 0, ∆₃ < 0, . . .
iv. The matrix A can be row reduced without row exchanges or scalar multiplications of rows to an upper triangular matrix that has n negative pivots, i.e., all the pivots are negative.
c. The following two conditions are equivalent:
i. The matrix A is indefinite.
ii. The matrix A has at least one positive and one negative eigenvalue.
Any one of the following conditions implies conditions (c.i) and (c.ii):
iii. det(A) = det(A_n) ≠ 0 and the pattern of signs of ∆_k = det(A_k) is different from those of both part (a) and part (b), allowing one of the other det(A_k) = 0.
iv. The matrix A can be row reduced to an upper triangular matrix without row exchanges or a scalar multiplication of a row, all n of the pivots are nonzero, and some pivot p_j > 0 and another p_k < 0.
v. The matrix A cannot be row reduced to an upper triangular matrix without row exchanges.
We discuss the proof in an appendix at the end of the chapter. For a 3 × 3 matrix, the
calculation of the determinants of the principal submatrices is the most direct method of deter-
mining whether a symmetric matrix is positive or negative definite. Row reduction is probably
the easiest for matrices larger than 3 × 3.
Remark. Negative Definite: The idea behind the condition that (−1)^k det(A_k) > 0 for all
k for a negative definite matrix is that the product of k negative numbers has the same sign as
(−1)^k.
Nonzero Determinant: If det(A) ≠ 0, then all the eigenvalues are nonzero, and Cases
a(iii), b(iii), and c(iii) tell whether it is positive definite, negative definite, or indefinite. (To
remember the signs of the determinants, think of the diagonal case.)
Zero Determinant: If det(A) = 0, then some eigenvalue is zero and A can be either
indefinite or positive semi-definite or negative semi-definite. There is no simple general rule.
The following theorem collects together the row reduction tests for the type of a symmetric
matrix. It includes the results about positive and negative definite matrices given earlier.
Theorem 2.24 (Row Reduction Test). Let A be an n × n symmetric matrix and Q(x) = xᵀAx
the related quadratic form.
a. Assume that A can be row reduced to an upper triangular matrix without row exchanges
or scalar multiplications of rows.
i. If all the pivots satisfy pj > 0 for 1 ≤ j ≤ n, then A is positive definite.
ii. If all the pivots satisfy pj < 0 for 1 ≤ j ≤ n, then A is negative definite.

iii. If all the pivots satisfy pj ≥ 0 for 1 ≤ j ≤ n, then A is positive semi-definite.


iv. If all the pivots satisfy pj ≤ 0 for 1 ≤ j ≤ n, then A is negative semi-definite.
v. If some pivot pj > 0 and another pk < 0, then A is indefinite.
b. If A cannot be row reduced to an upper triangular matrix without row exchanges, then
A is indefinite.
Remark (Row Reduction Conditions). If the matrix can be row reduced to upper triangular
form without row exchanges or scalar multiplication of a row, then det(A_k) = p₁ · · · p_k. If
all the pivots are positive, then all the determinants of the A_k are positive. If all the pivots
are negative, then the determinants of the A_k alternate signs as required. If all the pivots are
nonzero and both signs appear, then the signs of the determinants do not fit the pattern for either
positive definite or negative definite, so A must be indefinite. Finally, if the matrix cannot be
row reduced to an upper triangular matrix without row exchanges, then some submatrix down
the diagonal must be of the form

    [ 0  a ]
    [ a  b ],

with 0 on the diagonal. This insures the matrix is indefinite. See Theorem 3.3.12 in [2].
Example 2.25. Let

    A = [  2  −1   0 ]
        [ −1   2  −1 ]
        [  0  −1   2 ].

The principal submatrices and their determinants are

    A₁ = (2),                     det(A₁) = 2 > 0,
    A₂ = [  2  −1 ]
         [ −1   2 ],              det(A₂) = 3 > 0,
    A₃ = A,                       det(A) = 4 > 0.

Since the signs of these determinants are all positive, the quadratic form induced by A is positive definite. This method is the easiest to calculate for a 3 × 3 matrix.
Alternatively, we can row reduce A without any row exchanges to an upper triangular
matrix. Row reduction is equivalent to multiplying on the left by a matrix; we find that
matrix following the procedure in §2.5 of Lay [9]. Since the matrix is symmetric, A can be
written as the product of lower triangular, diagonal, and upper triangular matrices:

    A = [   1     0    0 ] [ 2  −1    0  ]
        [ −1/2    1    0 ] [ 0  3/2  −1  ]
        [   0   −2/3   1 ] [ 0   0   4/3 ]

      = [   1     0    0 ] [ 2   0    0  ] [ 1  −1/2    0   ]
        [ −1/2    1    0 ] [ 0  3/2   0  ] [ 0    1   −2/3  ]
        [   0   −2/3   1 ] [ 0   0   4/3 ] [ 0    0     1   ].

The pivots are all positive, so A is positive definite: p₁ = 2 > 0, p₂ = 3/2 > 0, and p₃ = 4/3 > 0.
This method would be the easiest to calculate for a matrix larger than 3 × 3.
The quadratic form can be written as a sum of squares with all positive coefficients:

    Q(x) = xᵀUᵀDUx = 2 (x₁ − ½x₂)² + (3/2) (x₂ − (2/3)x₃)² + (4/3) x₃².

The eigenvalues are not especially easy to calculate, but they are 2 and 2 ± √2, which are
all positive. The signs of these eigenvalues are correct for A to be positive definite. □
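All three tests of Theorem 2.23 agree here, and they are easy to confirm numerically. The following Python sketch (an illustration using NumPy, not part of the original text) computes the leading principal minors, the eigenvalues, and the pivots from row reduction without exchanges for this A.

    import numpy as np

    A = np.array([[ 2., -1.,  0.],
                  [-1.,  2., -1.],
                  [ 0., -1.,  2.]])

    # Leading principal minors: all should be positive.
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]

    # Eigenvalues of the symmetric matrix: all should be positive.
    eigs = np.linalg.eigvalsh(A)

    # Pivots from row reduction without row exchanges.
    U = A.copy()
    pivots = []
    for j in range(3):
        pivots.append(U[j, j])
        for i in range(j + 1, 3):
            U[i] -= (U[i, j] / U[j, j]) * U[j]

    print("minors:", np.round(minors, 6))       # [2, 3, 4]
    print("eigenvalues:", np.round(eigs, 6))    # 2 - sqrt(2), 2, 2 + sqrt(2)
    print("pivots:", np.round(pivots, 6))       # [2, 1.5, 1.3333...]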

Proof of Theorem 2.23(a). Lay [9] discusses some aspects of the proof of this theorem. A
more complete reference is [13] by Gilbert Strang.
In the proof, we need to add the following intermediate steps.
(v) For each 1 ≤ k ≤ n, the quadratic form associated to Ak is positive definite.
(vi) All the eigenvalues of Ak are positive for 1 ≤ k ≤ n.
(vii) By completing the squares, Q(x) can be represented as a sum of squares, with all
positive coefficients,

    Q(x₁, . . . , xₙ) = (x₁, . . . , xₙ) UᵀDU (x₁, . . . , xₙ)ᵀ
                     = p₁(x₁ + u₁,₂x₂ + · · · + u₁,ₙxₙ)²
                     + p₂(x₂ + u₂,₃x₃ + · · · + u₂,ₙxₙ)²
                     + · · · + pₙxₙ².
(i ⇔ ii) The fact that positive definite is equivalent to all positive eigenvalues is proved in
Lay [9].
(i ⇒ v) Assume Q is positive definite. Then, for any 1 ≤ k ≤ n and any (x₁, . . . , x_k) ≠ 0,

    0 < Q(x₁, . . . , x_k, 0, . . . , 0)
      = (x₁, . . . , x_k, 0, . . . , 0) A (x₁, . . . , x_k, 0, . . . , 0)ᵀ
      = (x₁, . . . , x_k) A_k (x₁, . . . , x_k)ᵀ.

This shows that (i) implies (v).
(v ⇔ vi) This is the same as the result (i ⇔ ii) applied to A_k. Notice that the eigenvalues
of A_k are not necessarily eigenvalues of A.
(vi ⇒ iii) For any k, det(A_k) is positive since it is the product of the eigenvalues of A_k.
(iii ⇒ iv) Assume (iii). If A cannot be row reduced without row exchanges, then there is a
subblock down the diagonal of the form

    [ 0  a ]
    [ a  b ],

so some det(A_k) = 0 and det(A_{k+1}) = −a² det(A_{k−1}), contrary to (iii). If we can row
reduce A without row exchanges, then the reduction also row reduces all
the A_k. Therefore the pivots of the A_k are pivots of A. Also, the determinant of A_k is the
product of the first k pivots, det(A_k) = p₁ · · · p_k. Therefore

    p_k = (p₁ · · · p_k)/(p₁ · · · p_{k−1}) = det(A_k)/det(A_{k−1}) > 0,

for each k. This proves (iv).
(iv ⇒ vii) Assume (iv). Row reduction can be realized by matrix multiplication on the left
by a lower triangular matrix with ones on the diagonal. Therefore, L⁻¹A = U₁ = DU, where
U₁ and U are upper triangular, D is a diagonal matrix with the pivots on the diagonal, and U
has ones on the diagonal. (See §2.5 in [9].) Since A is symmetric, LDU = A = Aᵀ =
UᵀDLᵀ. It can then be shown that the factorization is unique and Uᵀ = L, so

    Q(x₁, . . . , xₙ) = (x₁, . . . , xₙ) UᵀDU (x₁, . . . , xₙ)ᵀ
                     = p₁(x₁ + u₁,₂x₂ + · · · + u₁,ₙxₙ)²
                     + p₂(x₂ + u₂,₃x₃ + · · · + u₂,ₙxₙ)²
                     + · · · + pₙxₙ².

Thus, we can “complete the squares”, expressing Q as the sum of squares with the pivots as
the coefficients. If the pivots are all positive, then all the coefficients pᵢ are positive. Thus (iv)
implies (vii). Note that z = Ux is a non-orthonormal change of basis that makes the quadratic
form diagonal.

(vii ⇒ i) If Q(x) can be written as a sum of squares of the above form with positive
coefficients, then the quadratic form must be positive definite. Thus, (vii) implies (i). □

2.1. Exercises
2.1.1. Consider the sets
S1 = { (x, y) ∈ R² : −1 < x < 1 }
S2 = { (x, y) ∈ R2 : x ≥ 1, y ≥ 0 }.
a. For the sets S1 and S2 , discuss which points are in the boundary and which
points are not using the definition of the boundary.
b. Discuss why S1 is open in two ways: (i) S1 ∩ ∂(S1 ) = ∅ and (ii) for every
point p ∈ S1 , there is an r > 0 such that B(p, r) ⊂ S1 .
c. Discuss why S2 is closed in two ways: (i) ∂(S2) ⊂ S2 and (ii) for every
point p in the complement S2ᶜ, there is an r > 0 such that B(p, r) ⊂ S2ᶜ.
2.1.2. Consider the sets
S1 = {(x, y) : x > 0, y > 0, xy < 1 } ,
S2 = {(x, y) : x ≥ 0, y ≥ 0, xy ≤ 1 } ,
S3 = {(x, y) : x ≥ 0, y ≥ 0, 2x + 3y ≤ 7 } .
a. For each of the sets S1 , S2 , and S3 , discuss which points are in the boundary
and which points are not using the definition of the boundary.
b. Discuss why S1 is open in two ways: (i) S1 ∩ ∂(S1 ) = ∅ and (ii) for every
point p ∈ S1 , there is an r > 0 such that B(p, r) ⊂ S1 .
c. Discuss why S2 and S3 are closed in two ways: (i) ∂(S) ⊂ S and (ii) for
every point p in the complement Sᶜ, there is an r > 0 such that B(p, r) ⊂ Sᶜ.
2.1.3. Which of the following sets are open, closed, and/or compact? Explain why your
answer is correct.
a. Let g1 (x, y, z) = x2 + y 2 , g2 (x, y, z) = x2 + z 2 , and
S1 = g₁⁻¹([0, 9]) ∩ g₂⁻¹([0, 4]) = { (x, y, z) ∈ R³ : x² + y² ≤ 9, x² + z² ≤ 4 }.

b. Let g(x, y) = |x − y| and


S2 = g⁻¹([0, 1]) = { (x, y) ∈ R² : |x − y| ≤ 1 }.


c. Assume that f : Rn → R and g : Rn → R are continuous and


S3 = f⁻¹((0, ∞)) ∩ g⁻¹((−∞, 2)) = { x ∈ Rⁿ : 0 < f(x), g(x) < 2 }.

2.1.4. Assume that f : D ⊂ Rn → R is continuous. The Extreme Value Theorem says


that if F ⊂ D is compact, then f attains its maximum and minimum on F .
a. Give an example to show that the conclusion is false if F is not bounded.
b. When F is not closed, give examples where (i) f is not bounded above
and (ii) f is bounded above but does not attain a maximum.
2.1.5. Let f : R₊ → R be continuous, f(0) > 1, and lim_{x→∞} f(x) = 0.
a. Show that there is a p > 0 such that the maximal value of f(x) on [0, p] is
larger than any value of f(x) for x > p. Hint: Take p such that f(x) < ½ f(0) for x ≥ p.
b. Show that f (x) has a maximum on R+ .

c. Does f (x) have to have a minimum on R+ ? Explain why or why not.


2.1.6. Show that the function
f(x, y) = (2x + y)/(4x² + y² + 8)

attains a maximum on R²₊.
Hint: f(x, y) ≤ 3R/R² = 3/R for ‖(x, y)‖ = R > 0.
2.1.7. Let F = { (x, y) ∈ R²₊ : xy ≥ 1 } and B = { (x, y) ∈ R²₊ : x + y ≤ 10 }.
(Note that F is not compact.) Assume that f : R²₊ → R is a continuous function
with f(x, y) > f(2, 3) for x + y > 10, i.e., for (x, y) ∈ R²₊ \ B.
a. Why must f attain a minimum on F ∩ B ?
b. Using reasoning like for Exercise 2.1.4, explain why f attains a minimum on
all of F .
2.1.8. Compute the second order Taylor polynomial (without explicit remainder) for
f (x, y) = ex cos(y) around (x0 , y0 ) = (0, 0). You do not need to find an expres-
sion for the remainder.
2.1.9. Compute the second order Taylor polynomial of f (x, y) = xy 2 about (x∗ , y ∗ ) =
(2, 1). You do not need to find an expression for the remainder.
2.1.10. Decide whether the following matrices are positive definite, negative definite, or nei-
ther:
   
(a) [  2  −1  −1 ]        (b) [  2  −1  −1 ]
    [ −1   2  −1 ]            [ −1   2   1 ]
    [ −1  −1   2 ]            [ −1   1   2 ]

(c) [ 1  2  3 ]           (d) [ 1   2   0   0 ]
    [ 2  5  4 ]               [ 2   6  −2   0 ]
    [ 3  4  9 ]               [ 0  −2   5  −2 ]
                              [ 0   0  −2   3 ]

2.2. Derivative Conditions


In this section, we consider derivative conditions that are necessary for an extremizer and ones
that are sufficient for a local extremizer. We treat the general multi-dimensional case using the
derivative and second derivative as matrices, rather than expressing the conditions in two and
three dimensions in terms of partial derivatives as is done in many multi-dimensional calculus
courses. In contrast with linear functions, nonlinear functions can have local maximizers that
are not global maximizers. Therefore, we start by giving definitions that indicate the difference
between these two concepts.
Definition. We say that a function f : F ⊂ Rn → R has a maximum or global maximum at a
point xM ∈ F provided that f (x) ≤ f (xM ) for all x ∈ F . We also say that the point xM
is a maximizer of f on F. It has a strict maximum at x_M provided that f(x) < f(x_M) for
all x ∈ F \ {x_M}.
We say that a function f : F ⊂ Rn → R has a local maximum at a point xM ∈ F
provided that there exists an r > 0 such that
f (x) ≤ f (xM ) for all x ∈ F ∩ B(xM , r).

It has a strict local maximum at xM provided that there exists an r > 0 such that
f(x) < f(x_M) for all x ∈ (F ∩ B(x_M, r)) \ {x_M}.
If x_M ∈ int(F) is a (local) maximizer, then f is said to have an unconstrained (local)
maximum at x_M.
The minimum, global minimum, minimizer, local minimum, strict local minimum, and un-
constrained local minimum can be defined in a similar manner.
We say that a function f has an extremum at a point x∗ provided that f has either a
maximum or a minimum at x∗ . In the same way, we say that a function f has a local extremum
at a point x∗ provided that f has either a local maximum or local minimum at x∗ .

2.2.1. First Derivative Conditions


In this section, we concentrate on extremizers that are in the interior of the domain and not
on the boundary. We show below that such an extremizer must be a point where either the
derivative is equal to zero or the function is not differentiable.
Definition. For a continuous function f : F → R, xc is a critical point of f provided that
either (i) Df (xc ) = 0 or (ii) f is not differentiable at xc . (We will treat points on the boundary
or end points separately and do not call them critical points.)
Theorem 2.26. If f : F ⊂ Rⁿ → R is continuous on F and f has an unconstrained
local extremum at x∗ ∈ int(F), then x∗ is a critical point of f, i.e., either (i) f is not
differentiable at x∗ or (ii) f is differentiable at x∗ and Df(x∗) = 0.

Proof. We prove the contrapositive: We assume that the point x∗ is not a critical point and
prove that the function has neither a maximum nor a minimum at x∗. We are assuming that
the column vector (the gradient) exists and v = Df(x∗)ᵀ ≠ 0. We consider the values of f
along the line x_t = x∗ + tv (in the direction of the gradient) using the remainder of the affine
approximation:

    f(x_t) = f(x∗) + Df(x∗)(tv) + R̃₁(x∗, x_t) ‖tv‖
           = f(x∗) + t ‖v‖² + R̃₁(x∗, x_t) |t| ‖v‖
           = f(x∗) + t [ ‖v‖² + R̃₁(x∗, x_t) ‖v‖ sign(t) ]
           < f(x∗)   if t < 0 and t is small enough so that |R̃₁| < ½ ‖v‖,
           > f(x∗)   if t > 0 and t is small enough so that |R̃₁| < ½ ‖v‖.

This proves that x∗ is neither a maximum nor a minimum, i.e., it is not an extreme point. What
we have shown is that if the gradient is nonzero, then the function is decreasing in the direction
of the negative gradient and increasing in the direction of the gradient. □

2.2.2. Second Derivative Conditions


In this section, we give conditions on the second derivative that insure that a critical point is a
local extremizer. At a critical point x∗, Df(x∗) = 0, so

    f(x) − f(x∗) = ½ (x − x∗)ᵀ D²f(x∗) (x − x∗) + R̃₂(x∗, x) ‖x − x∗‖².

For x near x∗, R̃₂ is small, and the term involving the second derivative dominates and determines
whether the right hand side of the last equation is positive or negative.
Theorem 2.27. Suppose that f : F ⊂ Rⁿ → R is C² on int(F) and x∗ ∈ int(F).

a. If f has a local minimum (resp. local maximum) at x∗ , then D 2f (x∗ ) is positive semi-
definite (resp. negative semidefinite).
b. If Df (x∗ ) = 0 and D 2f (x∗ ) is positive definite (resp. negative definite), then f has
a strict local minimum (resp. strict local maximum) at x∗ .
c. If Df (x∗ ) = 0 and D 2f (x∗ ) is indefinite, then x∗ is not an extreme point for f .

Proof. (a) If f has a local minimum at x∗, then it is a critical point. Use the second order
Taylor expansion:

    f(x) = f(x∗) + ½ (x − x∗)ᵀ D²f(x∗) (x − x∗) + R̃₂(x∗, x) ‖x − x∗‖².

Assume the conclusion is false and there is a direction v such that vᵀ D²f(x∗) v < 0. Let
x_t = x∗ + tv, so x_t − x∗ = tv. Then

    f(x_t) = f(x∗) + (t²/2) vᵀ D²f(x∗) v + R̃₂(x∗, x_t) t² ‖v‖²
           = f(x∗) + t² [ ½ vᵀ D²f(x∗) v + R̃₂(x∗, x_t) ‖v‖² ]
           < f(x∗)   for t ≠ 0 small enough so that |R̃₂(x∗, x_t)| ‖v‖² < −½ vᵀ D²f(x∗) v.

This implies that f would not have a local minimum at x∗. Thus, D²f(x∗) must be positive
semidefinite.
(b) Assume that D²f(x∗) is positive definite. The set { u : ‖u‖ = 1 } is compact, so

    m = min_{‖u‖=1} uᵀ D²f(x∗) u > 0.

For x near x∗, letting v = x − x∗ and u = (1/‖v‖) v,

    (x − x∗)ᵀ D²f(x∗) (x − x∗) = (‖v‖u)ᵀ D²f(x∗) (‖v‖u)
                               = ‖v‖² uᵀ D²f(x∗) u ≥ m ‖x − x∗‖².

Since Df(x∗) = 0, the Taylor expansion with two terms is as follows:

    f(x) = f(x∗) + ½ (x − x∗)ᵀ D²f(x∗) (x − x∗) + R̃₂(x∗, x) ‖x − x∗‖².

There exists a δ > 0 such that

    |R̃₂(x∗, x)| < ¼ m   for ‖x − x∗‖ < δ.

Then, for δ > ‖x − x∗‖ > 0,

    f(x) ≥ f(x∗) + ½ m ‖x − x∗‖² − ¼ m ‖x − x∗‖²
         = f(x∗) + ¼ m ‖x − x∗‖²
         > f(x∗).

This shows that x∗ is a strict local minimum.
The proof of (c) is similar. □

Example 2.28. For the function F (x, y, z) = 3x2 y + y 3 − 3x2 − 3y 2 + z 3 − 3z , find the
critical points and classify them as local maximum, local minimum, or neither.

The equations for a critical point are

    0 = ∂F/∂x = 6xy − 6x = 6x(y − 1),
    0 = ∂F/∂y = 3x² + 3y² − 6y,
    0 = ∂F/∂z = 3z² − 3.
From the third equation z = ±1. From the first equation, x = 0 or y = 1. If x = 0, then the
second equation gives 0 = 3y(y − 2), y = 0 or y = 2. Thus we have the points (0, 0, ±1) and
(0, 2, ±1). If y = 1 from the first equation, then the second equation becomes 0 = 3x2 − 3
and x = ±1. Thus we have the points (±1, 1, ±1). Thus, all the critical points are (0, 0, ±1),
(0, 2, ±1), and (±1, 1, ±1).
The second derivative of F is

    D²F(x, y, z) = [ 6y − 6    6x      0 ]
                   [   6x    6y − 6    0 ]
                   [    0       0     6z ].

At the critical points,

    D²F(0, 0, ±1) = [ −6   0   0 ]
                    [  0  −6   0 ]
                    [  0   0  ±6 ],

    D²F(0, 2, ±1) = [ 6   0   0 ]
                    [ 0   6   0 ]
                    [ 0   0  ±6 ],

    D²F(±1, 1, ±1) = [  0  ±6   0 ]
                     [ ±6   0   0 ]
                     [  0   0  ±6 ].
Let ∆_k = det(A_k) be the determinants of the principal submatrices. For this example,

    ∆₁ = F_xx = 6y − 6,
    ∆₂ = F_xx F_yy − F_xy² = (6y − 6)² − 36x²,   and
    ∆₃ = F_zz ∆₂ = 6z ∆₂.

    (x, y, z)     ∆₁ = F_xx   F_yy   F_xy    ∆₂    F_zz     ∆₃    Type
    (0, 0, 1)        −6        −6     0      36      6     216    saddle
    (0, 0, −1)       −6        −6     0      36     −6    −216    local max
    (0, 2, 1)         6         6     0      36      6     216    local min
    (0, 2, −1)        6         6     0      36     −6    −216    saddle
    (±1, 1, ±1)       0         0    ±6     −36     ±6    ∓216    saddle

Therefore, (0, 0, −1) is a local maximum, (0, 2, 1) is a local minimum, and the other critical
points are neither. □
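The classification can be cross-checked by computing the eigenvalues of D²F at each critical point, as in the following Python sketch (an illustration using NumPy, not part of the original text).

    import numpy as np

    def hessian(x, y, z):
        return np.array([[6*y - 6, 6*x,     0],
                         [6*x,     6*y - 6, 0],
                         [0,       0,       6*z]])

    points = [(0, 0, 1), (0, 0, -1), (0, 2, 1), (0, 2, -1),
              (1, 1, 1), (1, 1, -1), (-1, 1, 1), (-1, 1, -1)]
    for p in points:
        eigs = np.linalg.eigvalsh(hessian(*p))
        if all(eigs > 0):
            kind = "local min"        # positive definite Hessian
        elif all(eigs < 0):
            kind = "local max"        # negative definite Hessian
        else:
            kind = "saddle"           # indefinite Hessian
        print(p, np.round(eigs, 3), kind)

Only (0, 0, −1) has all negative eigenvalues and only (0, 2, 1) has all positive eigenvalues, in agreement with the table.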

2.2. Exercises
2.2.1. Find the points at which each of the following functions attains a maximum and
minimum on the interval 0 ≤ x ≤ 3. For parts (a) and (b), also find the maximal and
minimal values. Remember to consider the end points of the interval [0, 3].
a. f (x) = x2 − 2x + 2.
b. g(x) = −x2 + 2x + 4.
c. The function h(x) satisfies h0 (x) > 0 for all 0 ≤ x ≤ 3.

d. The function k(x) satisfies k 0 (x) < 0 for all 0 ≤ x ≤ 3.


e. The function u(x) satisfies u0 (x) = 0 for all 0 ≤ x ≤ 3.
2.2.2. Consider the function f(x) = x/(1 + x²).
a. What are the critical points of f (x)?
b. Are the critical points global maximum or minimum?
2.2.3. Find the critical points of the following functions.
a. f (x, y) = 2xy − 2x2 − 5y 2 + 4y − 3.
b. f (x, y) = x2 − y 3 − x2 y + y .
c. f(x, y) = xy + 8/x + 1/y.
2.2.4. Suppose f : F ⊂ Rn → R is C 1 and has a maximum at a point x∗ on the
boundary of F . Does Df (x∗ ) have to equal 0 at x∗ ? Give an explanation or a
counter-example.
2.2.5. Find all the critical points and classify them as local maximum, local minimum, or
saddle points for the following functions.
a. f (x, y, z) = x4 + x2 − 6xy + 3y 2 + z 2
b. f (x, y, z) = 3x − x3 − 2y 2 + y 4 + z 3 − 3z
c. f (x, y, z) = 3x2 y + y 3 − 3x2 − 3y 2 + z 3 − 3z .
d. f (x, y, z) = 3x2 y − 12xy + y 3 + z 4 − 2z 2 .
e. f (x, y, z) = 3x2 + x3 + y 2 + xy 2 + z 3 − 3z .
f. f(x, y, z) = −½x² − 5y² − 2z² + 3xy − yz + 6x + 5y + 20z
2.2.6. For the feasible set F = R²₊ = { (x, y) : x, y ≥ 0 }, consider the function

    f(x, y) = (2x + y)/(4x² + y² + 8).

Its first order partial derivatives are as follows:

    ∂f/∂x (x, y) = (16 − 8x² + 2y² − 8xy)/(4x² + y² + 8)²,
    ∂f/∂y (x, y) = (8 + 4x² − y² − 4xy)/(4x² + y² + 8)².
a. Find the one critical point (x̄, ȳ) in the interior of R2+ .
Hint: Show that ȳ = 2x̄.
b. Classify the critical point (x̄, ȳ) as a strict local maximum, strict local minimum,
or neither. You may use the fact that at the critical point the second order
partial derivatives are as follows:

    ∂²f/∂x² (x̄, ȳ) = (−16x̄ − 8ȳ)/(4x̄² + ȳ² + 8)²,
    ∂²f/∂x∂y (x̄, ȳ) = (4ȳ − 8x̄)/(4x̄² + ȳ² + 8)²,
    ∂²f/∂y² (x̄, ȳ) = (−4x̄ − 2ȳ)/(4x̄² + ȳ² + 8)².
c. Find the critical point of g(x) = f (x, 0) = x/(2x2 + 4) for x ≥ 0, i.e., along
the boundary y = 0. Classify this critical point of g .
d. Find and classify the critical point of h(y) = f (0, y) = y/(y 2 + 8) for y ≥ 0.
e. Find the maximal value and the point which gives the maximum. Hint: It
can be shown that f attains a maximum on R2+ . (See Exercise 2.1.6.) The

maximizer must be among the points found in parts (a), (c), (d), and the origin
(0, 0).
2.2.7. A firm produces a single output Q determined by the Cobb-Douglas production
function of two inputs q₁ and q₂, Q = q₁^{1/3} q₂^{1/2}. Let p₁ be the price of q₁, and p₂ be the
price of q₂; let the price of the output Q be one. (Either the price of Q is taken as
price of q2 ; let the price of the output Q be one. (Either the price of Q is taken as
the unit of money or the pj are the ratios of the prices of qj to the price of Q.) The
profit is given by π = Q − p1 q1 − p2 q2 .
a. Considering the inputs as variables, (q1 , q2 ) ∈ R2++ , show that there is a
unique critical point (q₁∗, q₂∗).
b. Show that the critical point is a local maximum.

2. Exercises for Chapter 2


2.1. Indicate which of the following statements are true and which are false. Justify each
answer: For a true statement explain why it is true and for a false statement either
indicate how to make it true or indicate why the statement is false.
a. If limx→a f (x) exists and is finite, then f is continuous at x = a.
b. If f : R3 → R2 is differentiable at x = a, then f is continuous at x = a.
c. If f : R³ → R² is continuous at x = a, then f is differentiable at x = a.
d. If f (x, y, z) is a C 1 function on R3 , then the extrema of f on the set x2 +
4y 2 + 9z 2 ≤ 100 must either be a critical point in x2 + 4y 2 + 9z 2 < 100 or
an extrema of f on the boundary x2 + 4y 2 + 9z 2 = 100.
e. If ∇f (a1 , . . . , an ) = 0, then f has a local extremum at a = (a1 , . . . , an ).
f. A continuous function f(x, y) must attain a maximum on the disk
{ (x, y) : x² + y² < 1 }.
g. If det D 2f (a) = 0, then f has a saddle point at a.
h. For a C 2 function f : Rn → R with a critical point x∗ at which the second
derivative (or Hessian matrix) D 2f (x∗ ) is negative definite, then x∗ is a
global maximizer of f on Rn .
Chapter 3

Constrained Extrema

This chapter concerns the problem of finding the extrema of a function on a feasible set defined by
a number of constraints. When there are equality constraints, the Implicit Function Theorem
gives conditions on the derivatives to ensure that some of the variables locally can be considered
as functions of the other variables. This theorem makes use of the derivative as a matrix in an
explicit manner and is the mathematical basis for implicit differentiation, which we illustrate
with examples from comparative statics within economics. The Implicit Function Theorem
is also used in the proofs of subsequent topics, including in the derivation of the Lagrange
Multiplier Theorem that concerns optimization problems for which the constraint functions are
set equal to constants. The last two sections concern nonlinear optimization problems where
the feasible set is defined by inequality constraints.

3.1. Implicit Function Theorem


For a real valued function g : Rⁿ → R, if g(x∗) = b and ∇g(x∗) ≠ 0, then the level set
g⁻¹(b) = { x ∈ Rⁿ : g(x) = b } is locally a graph near x∗. This fact can be understood
in terms of the tangent plane to the level surface, which is the set of vectors perpendicular to the
gradient, 0 = ∇g(x∗) · (x − x∗). If (∂g/∂xₙ)(x∗) ≠ 0, then the tangent plane is the graph of the
variable xₙ in terms of the other variables:

    0 = ∇g(x∗) · (x − x∗)
      = (∂g/∂x₁)(x∗)(x₁ − x₁∗) + · · · + (∂g/∂x_{n−1})(x∗)(x_{n−1} − x∗_{n−1}) + (∂g/∂xₙ)(x∗)(xₙ − xₙ∗),   so

    xₙ = xₙ∗ − [(∂g/∂x₁)(x∗) / (∂g/∂xₙ)(x∗)] (x₁ − x₁∗) − · · · − [(∂g/∂x_{n−1})(x∗) / (∂g/∂xₙ)(x∗)] (x_{n−1} − x∗_{n−1}).

The Implicit Function Theorem says that the nonlinear level set is also locally a graph, with xₙ
determined by the other variables. If a different variable x_m has (∂g/∂x_m)(x∗) ≠ 0, then the same
type of argument shows that the nonlinear level set locally determines x_m as a function of the
other variables.


Now we turn to the case of a vector constraint or several scalar constraints. Consider a C¹
function g : Rⁿ → Rᵏ for k > 1 with coordinate functions gᵢ. For a constant b ∈ Rᵏ, the level set
where the function takes on the value b is denoted by

    g⁻¹(b) = { x ∈ Rⁿ : gᵢ(x) = bᵢ for i = 1, . . . , k }.

For x∗ ∈ g⁻¹(b), the assumption of a nonzero gradient for a scalar function is replaced by
the assumption that the rank of Dg(x∗) is equal to k, i.e., the gradients {∇gᵢ(x∗)} for
i = 1, . . . , k are linearly independent. Then, it is possible to select k variables x_{m₁}, . . . , x_{m_k}
such that

    det [ ∂gᵢ/∂x_{m_j} (x∗) ]_{1≤i,j≤k} ≠ 0.        (3)

We show below that if (3) holds, then the null space of Dg(x∗) is the graph of the variables
z = (x_{m₁}, . . . , x_{m_k}) in terms of the other n − k variables w = (x_{ℓ₁}, . . . , x_{ℓ_{n−k}}).
The Implicit Function Theorem states that the nonlinear level set is also locally a graph with the
z = (x_{m₁}, . . . , x_{m_k}) determined implicitly as functions of the other variables
w = (x_{ℓ₁}, . . . , x_{ℓ_{n−k}}).
Theorem 3.1 (Implicit Function Theorem). Assume there are k C¹ constraint functions,
gᵢ : Rⁿ → R for 1 ≤ i ≤ k with g(x) = (g₁(x), . . . , g_k(x))ᵀ, such that g(x∗) = b and
rank(Dg(x∗)) = k, i.e., the gradients {∇gᵢ(x∗)} for i = 1, . . . , k are linearly independent.
Then, the nonlinear level set g⁻¹(b) is locally a graph near x∗.
If z = (x_{m₁}, . . . , x_{m_k})ᵀ are k variables such that inequality (3) holds, then they are
locally determined implicitly as functions of the other n − k variables, w = (x_{ℓ₁}, . . . , x_{ℓ_{n−k}})ᵀ.
This implicitly defined function z = h(w) is as differentiable as g. The partial derivatives of
the coordinate functions of z = h(w), ∂h_q/∂x_{ℓ_j} = ∂x_{m_q}/∂x_{ℓ_j}, can be calculated by the chain rule
and satisfy

    0 = ∂gᵢ/∂x_{ℓ_j} + Σ_{q=1}^{k} (∂gᵢ/∂x_{m_q}) (∂x_{m_q}/∂x_{ℓ_j}),

or in matrix notation

    0 = [ ∂gᵢ/∂x_{ℓ_j} ] + [ ∂gᵢ/∂x_{m_q} ] [ ∂x_{m_q}/∂x_{ℓ_j} ],        (ImD)

where the first matrix is k × (n − k), the second is k × k, and the third is k × (n − k).
Remark. The hard part of the proof is showing that the function h exists and is differentiable,
even though there is no explicit formula for it. Once this is known, its derivative can be calcu-
lated by the chain rule and solving the matrix equation (ImD). For a proof, see Wade [15] or
Rudin [10].
Take the grouping of variables as given in the theorem: z = (x_{m₁}, . . . , x_{m_k})ᵀ are k variables
such that equation (3) holds, and w = (x_{ℓ₁}, . . . , x_{ℓ_{n−k}})ᵀ are the other n − k variables.
We write x = (w, z), as if the submatrix of the last k columns has full rank k. (We could

also relabel the variables so this was true.) Use the following notation for the matrix of partial
derivatives with respect to only the w variables or only the z variables,

    D_w g(x∗) = [ ∂gᵢ/∂x_{ℓ_j} (x∗) ]_{1≤i≤k, 1≤j≤n−k}   and
    D_z g(x∗) = [ ∂gᵢ/∂x_{m_j} (x∗) ]_{1≤i,j≤k}.

Using this notation, rank(D_z g(x∗)) = k or det(D_z g(x∗)) ≠ 0, so D_z g(x∗) is invertible.
Using this notation, the matrix equation (ImD) in the theorem becomes

    0 = D_w g(x∗) + D_z g(x∗) Dh(w∗),   so
    Dh(w∗) = −(D_z g(x∗))⁻¹ D_w g(x∗).

In a given example, we write down the equation (ImD), and then solve it for the derivative
Dh(w∗), or just the desired partial derivatives. Notice that (i) the matrix D_z g(x∗) includes
all the partial derivatives with respect to the dependent variables used to calculate the nonzero
determinant and (ii) the matrix D_w g(x∗) includes all the partial derivatives with respect to the
independent (other) variables.
Before giving examples, we want to note that the assumption of the Implicit Function
Theorem is exactly what makes the null space of Dg(x∗) a graph of the z-coordinates in
terms of the w-coordinates. The null space of Dg(x∗) is given by

    null(Dg(x∗)) = { v ∈ Rⁿ : Dg(x∗)v = 0 } = { v ∈ Rⁿ : v · ∇gᵢ(x∗) = 0 for 1 ≤ i ≤ k }.

If v is in this null space, then we can split it up into the components in the w and z directions,
v = (v_w, v_z)ᵀ, where we use notation as if the z-variables were in the last k components:

    0 = Dg(x∗)v = [D_w g(x∗), D_z g(x∗)] (v_w, v_z)ᵀ = D_w g(x∗) v_w + D_z g(x∗) v_z   and
    v_z = −(D_z g(x∗))⁻¹ D_w g(x∗) v_w.

The Implicit Function Theorem says that if this is possible at the linear level, then the nonlinear level set
is also locally a differentiable graph of these same variables in terms of the other variables.

Figure 3.1.1. Null space (and nonlinear level set) as a graph, with the gradients ∇g₁ and ∇g₂ perpendicular to it


Example 3.2 (Changing Technology for Production). This example is based on Section 6.6
of [4]. A firm uses two inputs to produce a single output. Assume the amounts of the inputs
are x and y with p the price of x and q the price of y . The amount produced Q is assumed to
be determined by the Cobb-Douglas production function Q = xa y b , where the technology
determines the exponents a and b. By changing the technology, the firm can vary the
exponent b while keeping a fixed or vary a while keeping b fixed. It wants to keep the
amount produced fixed, Q0 , and the cost of the inputs fixed, px + qy = 125. What is the rate
of change of the amounts of inputs as functions of a and b at x = 5, y = 50, p = 5, q = 2,
a = 1/3, and b = 2/3?
Rather than use the equation Q0 = xa y b , we take its logarithm and obtain the two equations

g1 (x, y, a, b, p, q) = px + qy = 125 and (4)


g2 (x, y, a, b, p, q) = a ln(x) + b ln(y) = ln(Q0 ).

These two equations define x and y as functions of a, b, p, and q since the determinant in
equation (3) is

    det [ ∂g₁/∂x  ∂g₁/∂y ] = det [  p    q  ] = pb/y − qa/x = (pbx − qay)/(xy)
        [ ∂g₂/∂x  ∂g₂/∂y ]       [ a/x  b/y ]

                             = (5·2·5 − 2·1·50)/(3·5·50) = −1/15 ≠ 0.

Considering x and y as functions of a, b, p, and q, and differentiating the two equations
with respect to the four independent variables, gives the following matrix equation (ImD):

    [ 0 0 0 0 ]   [ ∂g₁/∂a  ∂g₁/∂b  ∂g₁/∂p  ∂g₁/∂q ]   [ ∂g₁/∂x  ∂g₁/∂y ] [ ∂x/∂a  ∂x/∂b  ∂x/∂p  ∂x/∂q ]
    [ 0 0 0 0 ] = [ ∂g₂/∂a  ∂g₂/∂b  ∂g₂/∂p  ∂g₂/∂q ] + [ ∂g₂/∂x  ∂g₂/∂y ] [ ∂y/∂a  ∂y/∂b  ∂y/∂p  ∂y/∂q ]

                  [   0      0     x  y ]   [  p    q  ] [ ∂x/∂a  ∂x/∂b  ∂x/∂p  ∂x/∂q ]
                = [ ln(x)  ln(y)   0  0 ] + [ a/x  b/y ] [ ∂y/∂a  ∂y/∂b  ∂y/∂p  ∂y/∂q ],

so

    [ ∂x/∂a  ∂x/∂b  ∂x/∂p  ∂x/∂q ]     [  p    q  ]⁻¹ [   0      0     x  y ]
    [ ∂y/∂a  ∂y/∂b  ∂y/∂p  ∂y/∂q ] = − [ a/x  b/y ]   [ ln(x)  ln(y)   0  0 ]

      = (xy/(qay − pbx)) [  b/y  −q ] [   0      0     x  y ]
                         [ −a/x   p ] [ ln(x)  ln(y)   0  0 ]

      = (1/(qay − pbx)) [ −xyq ln(x)   −xyq ln(y)    bx²    bxy ]
                        [  xyp ln(x)    xyp ln(y)   −axy   −ay² ].

At the point in question, qay − pbx = (100 − 50)/3 = 50/3 and

    ∂x/∂a = −3(5)(50)(2) ln(5)/50 = −30 ln(5),    ∂x/∂b = −3(5)(50)(2) ln(50)/50 = −30 ln(50),
    ∂y/∂a = 3(5)(50)(5) ln(5)/50 = 75 ln(5),      ∂y/∂b = 3(5)(50)(5) ln(50)/50 = 75 ln(50). □
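These implicit derivatives can be verified with a finite difference: re-solve the two equations for (x, y) at nearby values of a and difference the solutions. The Python sketch below (an illustration, not part of the original text) does this with scipy.optimize.fsolve; the constant ln(Q₀) is fixed so that (x, y) = (5, 50) at a = 1/3, b = 2/3.

    import numpy as np
    from scipy.optimize import fsolve

    p, q = 5.0, 2.0
    lnQ0 = (1/3)*np.log(5) + (2/3)*np.log(50)  # so (x, y) = (5, 50) at a = 1/3

    def solve_xy(a, b):
        # Solve px + qy = 125 and a ln(x) + b ln(y) = ln(Q0) for (x, y).
        eqs = lambda v: [p*v[0] + q*v[1] - 125,
                         a*np.log(v[0]) + b*np.log(v[1]) - lnQ0]
        return fsolve(eqs, [5.0, 50.0])

    h = 1e-6
    xp, yp = solve_xy(1/3 + h, 2/3)
    xm, ym = solve_xy(1/3 - h, 2/3)
    print("dx/da approx:", (xp - xm)/(2*h), " exact:", -30*np.log(5))
    print("dy/da approx:", (yp - ym)/(2*h), " exact:", 75*np.log(5))

The centered differences agree with −30 ln(5) ≈ −48.28 and 75 ln(5) ≈ 120.71 to several digits.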
Steps for Implicit Differentiation with Several Constraints
1. Some equations are given (or derived) relating several variables. The last ex-
ample has two equations with six variables.
2. Select the same number of variables as the number of equations that you want
to be defined in terms of the other variables. Check that the matrix of partial
derivatives of the constraint equations with respect to these variables is invert-
ible at least at the point in question. In the previous example, the matrix of
partial derivatives with respect to x and y has nonzero determinant.
3. Thinking of these equations as defining these variables in terms of the others,
take partial derivatives of the equations to give the matrix equation (ImD).
4. Solve (ImD) for the matrix of partial derivatives of the dependent variables
with respect to the independent variables.

Example 3.3 (Marginal Inputs of Prices). This example is based on Section 7.3 of [4]. A
firm produces a single output Q determined by the Cobb-Douglas production function of two
inputs q₁ and q₂, Q = q₁^{1/3} q₂^{1/2}. Let p₁ be the price of q₁, and p₂ be the price of q₂; let the price of
the output Q be one. (Either the price of Q is taken as the unit of money or the pj are the ratios
of the prices of qj to the price of Q.) The profit is given by π = Q − p1 q1 − p2 q2 . The inputs
that are a critical point satisfy

    0 = ∂π/∂q₁ = (1/3) q₁^{−2/3} q₂^{1/2} − p₁ = g₁(p₁, p₂, q₁, q₂)   and        (5)
    0 = ∂π/∂q₂ = (1/2) q₁^{1/3} q₂^{−1/2} − p₂ = g₂(p₁, p₂, q₁, q₂),

where the function g is defined by the last two equations.
We have two equations and four variables q₁, q₂, p₁, and p₂, which clearly define the prices
in terms of the inputs. We want to show that they also implicitly determine the two inputs q₁ and
q₂ in terms of the prices p₁ and p₂. The derivative of g with respect to q is

    D_q g = [ −(2/9) q₁^{−5/3} q₂^{1/2}     (1/6) q₁^{−2/3} q₂^{−1/2} ]
            [  (1/6) q₁^{−2/3} q₂^{−1/2}   −(1/4) q₁^{1/3} q₂^{−3/2}  ]

with

    ∆₂ = det(D_q g) = (2/36 − 1/36) q₁^{−4/3} q₂^{−1} = (1/36) q₁^{−4/3} q₂^{−1} > 0.
The second derivative D²π = D_q g is negative definite, so the critical point maximizes profits.
Because det(D_q g) ≠ 0, the Implicit Function Theorem implies that these two equations (5)
implicitly determine the two inputs q₁ and q₂ in terms of the prices p₁ and p₂. Also, the partial
derivatives of q₁ and q₂ with respect to p₁ and p₂ satisfy the following matrix equation (ImD):

    [ 0 0 ]   [ ∂g₁/∂p₁  ∂g₁/∂p₂ ]   [ ∂g₁/∂q₁  ∂g₁/∂q₂ ] [ ∂q₁/∂p₁  ∂q₁/∂p₂ ]
    [ 0 0 ] = [ ∂g₂/∂p₁  ∂g₂/∂p₂ ] + [ ∂g₂/∂q₁  ∂g₂/∂q₂ ] [ ∂q₂/∂p₁  ∂q₂/∂p₂ ]

            = [ −1   0 ] + [ −(2/9) q₁^{−5/3} q₂^{1/2}     (1/6) q₁^{−2/3} q₂^{−1/2} ] [ ∂q₁/∂p₁  ∂q₁/∂p₂ ]
              [  0  −1 ]   [  (1/6) q₁^{−2/3} q₂^{−1/2}   −(1/4) q₁^{1/3} q₂^{−3/2}  ] [ ∂q₂/∂p₁  ∂q₂/∂p₂ ],

so

    [ ∂q₁/∂p₁  ∂q₁/∂p₂ ] = (D_q g)⁻¹ = 36 q₁^{4/3} q₂ [ −(1/4) q₁^{1/3} q₂^{−3/2}     −(1/6) q₁^{−2/3} q₂^{−1/2} ]
    [ ∂q₂/∂p₁  ∂q₂/∂p₂ ]                              [ −(1/6) q₁^{−2/3} q₂^{−1/2}   −(2/9) q₁^{−5/3} q₂^{1/2}  ]

                         = [ −9 q₁^{5/3} q₂^{−1/2}   −6 q₁^{2/3} q₂^{1/2}  ]
                           [ −6 q₁^{2/3} q₂^{1/2}    −8 q₁^{−1/3} q₂^{3/2} ],

and all the partial derivatives ∂qᵢ/∂pⱼ are negative, i.e., the inputs decrease with an increase in
either price.
Because the sum of the exponents in the production function is less than one, 1/3 + 1/2 < 1,
the production has decreasing returns to scale. This property causes D²π = D_q g to be negative
definite, the critical point to be a maximum, and the equations to define the inputs as implicit
functions of the prices. □
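As a numerical cross-check (an illustration, not part of the original text), take q₁ = q₂ = 1, so the first order conditions give p₁ = 1/3 and p₂ = 1/2; at that point the formula above predicts ∂q₁/∂p₁ = −9 and ∂q₂/∂p₁ = −6. The sketch below re-solves the first order conditions at perturbed prices with scipy.optimize.fsolve and differences the solutions.

    import numpy as np
    from scipy.optimize import fsolve

    def solve_q(p1, p2):
        # First order conditions (5) for the profit-maximizing inputs.
        foc = lambda q: [(1/3)*q[0]**(-2/3)*q[1]**(1/2) - p1,
                         (1/2)*q[0]**(1/3)*q[1]**(-1/2) - p2]
        return fsolve(foc, [1.0, 1.0])

    h = 1e-6
    q_plus = solve_q(1/3 + h, 1/2)
    q_minus = solve_q(1/3 - h, 1/2)
    print("dq1/dp1 approx:", (q_plus[0] - q_minus[0])/(2*h))  # about -9
    print("dq2/dp1 approx:", (q_plus[1] - q_minus[1])/(2*h))  # about -6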
Example 3.4 (Nonlinear Keynesian IS-LM Model for National Income). See §15.3
in [11] or §8.6 in [5] for more economic discussion. The variables are as follows:
Y Gross domestic product (GDP) T Taxes
C Consumption r Interest rate
I Investment expenditure M Money supply
G Government spending
The GDP is assumed to be the sum of the consumption, investment, and government spending,
Y = C + I + G. The consumption is a function of after taxes income or the difference Y − T ,
C = C(Y − T ); the investment is a function of the interest rate, I = I(r); and the money
supply is a function (called the liquidity function) of GDP and the interest rate, M = L(Y, r).
We can think of the assumptions as yielding the following two functional relationships between
the five variables:
0 = −Y + C(Y − T ) + I(r) + G and (6)
0 = L(Y, r) − M.

The assumptions on the derivatives of the unspecified functions are

    0 < C′(x) < 1,   I′(r) < 0,   ∂L/∂Y > 0,   and   ∂L/∂r < 0.
For C(Y − T), ∂C/∂Y = C′ and ∂C/∂T = −C′. Examples of functions that satisfy these derivative
conditions are C(x) = (1/3) x e^{−x} + (1/2) x, I(r) = 2I₀/(e^{10r} + 1), and
L(Y, r) = (1/10) Y (1 + 3e^{−10r})/(2 + 2e^{−10r}).
Taking the partial derivatives of the two equations (6) with respect to Y and r gives an invertible
matrix,

    ∆ = det [ C′ − 1   I′  ] = (C′ − 1)L_r − I′ L_Y > 0.
            [  L_Y     L_r ]

The sign of ∆ is positive because (C′ − 1) < 0 and L_r < 0, so (C′ − 1)L_r > 0, while I′ < 0 and
L_Y > 0, so −I′L_Y > 0. Therefore, the Implicit Function Theorem implies the following dependency of
the variables: G, M, and T can be considered as the independent or exogenous variables which
can be controlled; C and I are intermediate variables (or functions); Y and r can be considered
as dependent or endogenous variables determined by G, M, and T through the variables C and
I. Be cautioned that the details of solving for the partial derivatives in terms of known quantities
and derivatives are more complicated than in the other examples considered.
Taking the partial derivatives of the two equations (6) with respect to G, T, and M (in that
order), and considering Y and r as functions of these variables, we get the following matrix
equation (ImD):

    [ 0 0 0 ]   [ 1  −C′   0 ]   [ C′ − 1   I′  ] [ ∂Y/∂G  ∂Y/∂T  ∂Y/∂M ]
    [ 0 0 0 ] = [ 0    0  −1 ] + [  L_Y     L_r ] [ ∂r/∂G  ∂r/∂T  ∂r/∂M ],

so

    [ ∂Y/∂G  ∂Y/∂T  ∂Y/∂M ]            [  L_r    −I′    ] [ 1  −C′   0 ]
    [ ∂r/∂G  ∂r/∂T  ∂r/∂M ] = −(1/∆)   [ −L_Y   C′ − 1  ] [ 0    0  −1 ]

                            = (1/∆) [ −L_r    L_r C′    −I′    ]
                                    [  L_Y   −L_Y C′   C′ − 1  ].
The partial derivatives of Y and r with respect to G are positive:

    ∂Y/∂G = −L_r/∆ > 0   and   ∂r/∂G = L_Y/∆ > 0.

Thus, both the GDP, Y, and the interest rate, r, increase if government spending is increased
while holding the money supply and taxes fixed. The partial derivatives of Y and r with respect
to T are negative, and both Y and r decrease with increased taxes:

    ∂Y/∂T = L_r C′/∆ < 0   and   ∂r/∂T = −L_Y C′/∆ < 0.

Finally,

    ∂Y/∂M = −I′/∆ > 0   and   ∂r/∂M = (C′ − 1)/∆ < 0,

and Y increases and r decreases with an increased money supply.
This type of analysis can give qualitative information without knowing the particular form
of the functional dependence but only the signs of the rates of change. 

3.1. Exercises
3.1.1. A firm uses two inputs, q₁ and q₂, to produce a single output Q, given by the production
function Q = k q₁^{2/5} q₂^{1/5}. Let P be the price of the output Q, p₁ be the price of
q₁, and p₂ be the price of q₂. The profit is given by π = P k q₁^{2/5} q₂^{1/5} − p₁q₁ − p₂q₂.
The inputs that maximize profits satisfy

    0 = (2Pk/5) q₁^{−3/5} q₂^{1/5} − p₁   and
    0 = (Pk/5) q₁^{2/5} q₂^{−4/5} − p₂.
a. Show that these two equations can be used to determine the amounts of the inputs
q1 and q2 in terms of the prices p1 , p2 , and P . Show that the relevant matrix
has nonzero determinant.
b. Write the matrix equation for the partial derivatives of q1 and q2 with respect
to p1 , p2 and P in terms of the variables.
c. Solve for the matrix of partial derivatives of q1 and q2 in terms of p1 , p2 and
P.
3.1.2. Assume that the two equations that balance supply and demand for a single product
are given by

    0 = Q − S₀ (1 − e^{T−P})   and
    0 = Q − Y e^{−P}.
Here Q is the quantity sold, P is the price, T are the taxes, Y is the income of the
consumers, and S0 is a constant.
a. Use the implicit function theorem to show that these two equations define Q
and P as functions of T and Y (keeping S0 fixed).
b. Solve for the matrix of partial derivatives ∂P/∂Y, ∂P/∂T, ∂Q/∂Y, ∂Q/∂T. You may leave
your answer in matrix form.
3.1.3. A nonlinear Keynesian IS-LM model for national income involves the following
quantities:
Y Gross domestic product (GDP)
G Government spending
r Interest rate
M Money supply
In addition, there are three quantities which are functions of the other variables (intermediate
variables). Investment expenditure I is a function of the interest rate given
by I(r) = I₀/(r + 1). The consumption is a function of Y given by C(Y) =
C₀ + (5/6)Y + (1/6)e^{−Y} with C₀ a constant. The gross domestic product is the sum of
consumption, investment expenditure, and government spending, Y = C + I + G =
C₀ + (5/6)Y + (1/6)e^{−Y} + I₀/(r + 1) + G. The money supply equals the liquidity
function, M = Y/(r + 1). With these assumptions, the model yields the following two

equations:
    0 = C₀ − (1/6)Y + (1/6)e^{−Y} + I₀/(r + 1) + G   and
    0 = Y/(r + 1) − M.
a. Using the Implicit Function Theorem, show that these two equations define Y
and r as dependent variables which are determined by the independent vari-
ables G and M , i.e., these equations define Y and r as functions of G and
M.
b. Write the matrix equation that the partial derivatives of Y and r with respect
to G and M must satisfy.
c. Solve for the matrix equation for the partial derivatives of Y and r with respect
to G and M .
3.1.4. Consider the three equations
xyz + u + v 2 = 6,
xy − zy 2 + u2 + v + w = 6,
xy 3 − zx2 + u2 + w = 4.
a. Show that these equations implicitly define (u, v, w) in terms of (x, y, z).
b. What is the system of equations (ImD) for the partial derivatives near the point
(x, y, z, u, v, w) = (1, 1, 1, 1, 2, 3)?

3.2. Extrema with Equality Constraints


This section treats optimization with equality constraints. This topic is usually covered in a
multi-dimensional calculus course, but we treat a more general case than is often covered and we give
a proof using the Implicit Function Theorem.
Definition. For C 1 constraints gi : Rn → R for i = 1, . . . , k , the constraints satisfy the
constraint qualification at a point x∗ provided that
rank (Dg(x∗ )) = k,
k
i.e., the gradients { ∇gi (x∗ ) }i=1 are linearly independent.
Theorem 3.5 (Method of Lagrange Multipliers). Assume f, gᵢ : Rⁿ → R are C¹ functions
for i = 1, . . . , k. Suppose that x∗ is a local extreme point of f on the set g⁻¹(b) =
{ x ∈ Rⁿ : gᵢ(x) = bᵢ, i = 1, . . . , k }. Then at least one of the following holds:

1. There exists λ∗ = (λ∗₁, . . . , λ∗_k) ∈ Rᵏ such that

    Df(x∗) = Σᵢ₌₁ᵏ λ∗ᵢ Dgᵢ(x∗).        (LM)

2. The constraint qualification fails at x∗ , rank (Dg(x∗ )) < k .


We give a proof of the theorem later in the section after further discussion and examples.
In a calculus course, this theorem for one or two constraints is considered. The justification
given is often as follows. For one constraint g(x) = 0, if x∗ is an extreme point and v is
perpendicular to ∇g(x∗), then v is a tangent vector to the level set and the directional derivative
of f in the direction of v must be zero. Thus, ∇f (x∗ ) is perpendicular to the same vectors
as ∇g(x∗ ) and so must be parallel to ∇g(x∗ ). To make this precise, we must find a curve in
the level set whose tangent vector is equal to v.

A similar justification can be given with two constraints. The tangent vectors to the level
set g1 (x) = b1 and g2 (x) = b2 are all vectors perpendicular to both ∇g1 (x∗ ) and ∇g2 (x∗ ).
This is the same as the null space of Dg(x∗ ). Again, the directional derivative of f in the
direction of v must be zero, ∇f (x∗ ) · v = 0. Since ∇f (x∗ ) is perpendicular to all these
vectors v, it must be a linear combination of ∇g1 (x∗ ) and ∇g2 (x∗ ).
Example 3.6. It is possible for the constraint qualification to fail at a maximum. Let g(x, y) =
x³ + y² = 0 and f(x, y) = y + 2x.

Figure 3.2.1. Maximum at a singular point (the constraint set g⁻¹(0) and several level sets of f)

The maximum of f(x, y) on g⁻¹(0) is at the singular point (0, 0), where ∇g = 0, and

    ∇f(0) = (2, 1)ᵀ ≠ 0 = λ∇g   for every λ. □
Example 3.7. This example illustrates the fact that for two (or more) constraints, the maximum
of the objective function f can occur where ∇g₁(x∗) and ∇g₂(x∗) are parallel (a point where
the constraint qualification fails). At such a point, it is not always possible to write ∇f(x∗)
as a linear combination of the gradients of the constraint equations. (cf. Exercise 3.3.5 with
inequality constraints.)
Let g₁(x, y, z) = x³ + y² + z = 0, g₂(x, y, z) = z = 0, and f(x, y, z) = y + 2x.
The level set is that of the last example in the (x, y)-plane,

    g⁻¹(0) = { (x, y, 0) : x³ + y² = 0 }.

The maximum of f on g⁻¹(0) is at 0. The gradients of g₁ and g₂ are parallel at 0:

    ∇g₁(x, y, z) = (3x², 2y, 1)ᵀ,   ∇g₂(x, y, z) = (0, 0, 1)ᵀ,
    ∇g₁(0, 0, 0) = (0, 0, 1)ᵀ,      ∇g₂(0, 0, 0) = (0, 0, 1)ᵀ.

Therefore, rank(Dg(0)) = rank [ 0 0 1 ; 0 0 1 ] = 1 < 2. Also,

    ∇f(0) = (2, 1, 0)ᵀ ≠ λ₁(0, 0, 1)ᵀ + λ₂(0, 0, 1)ᵀ = λ₁∇g₁(0) + λ₂∇g₂(0). □
The proof of the theorem uses the fact that the set of tangent vectors to the level set is the null
space of the derivative of the constraint function. We start with a precise definition of the
tangent vectors.

Definition. For a C¹ function g : Rⁿ → Rᵏ with g(x∗) = b, denote the set of tangent
vectors to the level set g⁻¹(b) by

    T_g(x∗) = { v = r′(0) : r(t) is a C¹ curve with r(0) = x∗ and g(r(t)) = b for all small t }.

The linear space T_g(x∗) is called the tangent space to g⁻¹(b) at x∗.

Proposition 3.8. Assume that g : Rⁿ → Rᵏ is C¹, g(x∗) = b, and the rank of Dg(x∗)
is k. Then, the set of tangent vectors at x∗ equals the null space of Dg(x∗), T_g(x∗) =
null(Dg(x∗)).
Remark. Calculus books often assume implicitly that this proposition is true and that the set
of tangent vectors is the set of all vectors perpendicular to the gradients of the constraints.
Proof. For any curve r(t) with r(0) = x∗ and g(r(t)) = b for all small t,

    0 = (d/dt) g(r(t)) |_{t=0} = Dg(r(0)) r′(0) = Dg(x∗) r′(0),

so T_g(x∗) ⊂ null(Dg(x∗)).
The proof that null(Dg(x∗)) ⊂ T_g(x∗) uses the Implicit Function Theorem. Assume
the variables have been ordered x = (w, z) so that det(D_z g(x∗)) ≠ 0, so the level set is
locally a graph z = h(w). Take a vector v = (v_w, v_z)ᵀ in null(Dg(x∗)). Consider the
line w(t) = w∗ + t v_w in w-space and the curve r(t) = (w(t), h(w(t))) in the level
set. Then, r(0) = x∗, r′(0) ∈ T_g(x∗),

    r′(0) = (v_w, Dh(w∗) v_w)ᵀ,   and
    0 = (d/dt) g(r(t)) |_{t=0} = Dg(x∗) r′(0).

Thus, r′(0) ∈ null(Dg(x∗)) and has the same w-components as v. Because
null(Dg(x∗)) is a graph over the w-coordinates, v = r′(0) ∈ T_g(x∗). Thus, we have shown
that null(Dg(x∗)) ⊂ T_g(x∗). Combining, null(Dg(x∗)) = T_g(x∗). □
Proof of the Lagrange Multiplier Theorem. Assume that x∗ is a local extremizer of f on
g⁻¹(b). Then,

    0 = (d/dt) f(r(t)) |_{t=0} = Df(x∗) v

for all curves r(t) in g⁻¹(b) with r(0) = x∗ and r′(0) = v, i.e., Df(x∗)v = 0 for all
v ∈ T_g(x∗) = null(Dg(x∗)). This says that the null spaces

    null(Dg(x∗)) = null [ Dg(x∗) ]
                        [ Df(x∗) ]

are equal and of dimension n − k. The rank of each matrix is equal to the number of columns minus the
dimension of the null space,

    k = rank(Dg(x∗)) = rank [ Dg(x∗) ] = n − (n − k) = k.
                            [ Df(x∗) ]

This implies that the last row of the second matrix, Df(x∗), is a linear combination of the first
k rows,

    Df(x∗) = Σᵢ₌₁ᵏ λ∗ᵢ Dgᵢ(x∗),

which is the first order Lagrange multiplier condition (LM).


Example 3.9. Find the highest point on the set given by x + y + z = 12 and z = x² + y².
The function to be maximized is f(x, y, z) = z. The two constraint functions are
g(x, y, z) = x + y + z = 12 and h(x, y, z) = x² + y² − z = 0.
Constraint qualification: If there were a point where the constraint qualification fails, then
∇g = (1, 1, 1)ᵀ = s∇h = s(2x, 2y, −1)ᵀ. So, s = −1 and x = y = −½. To be on the level
set, z = x² + y² = ¼ + ¼ = ½. But then g(−½, −½, ½) = −½ ≠ 12. Therefore, there are no
points on the level set where the constraint qualification fails.
First order Lagrange multiplier equations: The first order conditions (LM) are

    f_x = λg_x + µh_x,   0 = λ + 2µx,
    f_y = λg_y + µh_y,   0 = λ + 2µy,
    f_z = λg_z + µh_z,   1 = λ − µ.

From the third equation, we get λ = 1 + µ, so we can eliminate this variable from the equations.
Substituting into the other equations, they become

    0 = 1 + µ + 2µx,
    0 = 1 + µ + 2µy.

Subtracting the second from the first, we get 0 = 2µ(x − y), so µ = 0 or x = y.
Consider the first case of µ = 0. But then, 0 = 1 + µ + 2µx = 1, which is a contradiction.
Therefore, there is no solution with µ = 0.
Next, assume y = x. The constraints become z = 2x² and 12 = 2x + z = 2x + 2x², so
0 = x² + x − 6 = (x + 3)(x − 2), and x = 2 or x = −3. If x = 2, then y = 2, z = 2x² = 8,
0 = 1 + µ(1 + 2x) = 1 + 5µ, µ = −1/5, and λ = 4/5.
If x = y = −3, then z = 2x² = 18, 0 = 1 + µ(1 + 2x) = 1 − 5µ, µ = 1/5, and
λ = 6/5.
We have found two critical points, (λ∗, µ∗, x∗, y∗, z∗) = (4/5, −1/5, 2, 2, 8) and
(6/5, 1/5, −3, −3, 18). The values of the objective function at the critical points are f(2, 2, 8) =
8 and f(−3, −3, 18) = 18. The constraint set is compact, so extrema exist and must be among
the critical points. The maximum is at the point (−3, −3, 18) with maximal value 18, and the
minimum is at the point (2, 2, 8) with minimum value 8. □
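A numerical solver finds the same two extrema. The Python sketch below (an illustration, not part of the original text) minimizes and maximizes z on the constraint set with SciPy's SLSQP method, with starting points chosen near each solution.

    import numpy as np
    from scipy.optimize import minimize

    cons = ({'type': 'eq', 'fun': lambda v: v[0] + v[1] + v[2] - 12},
            {'type': 'eq', 'fun': lambda v: v[0]**2 + v[1]**2 - v[2]})

    # Minimize z directly, then maximize z by minimizing -z.
    lo = minimize(lambda v: v[2], x0=[1, 1, 10], constraints=cons, method='SLSQP')
    hi = minimize(lambda v: -v[2], x0=[-1, -1, 10], constraints=cons, method='SLSQP')
    print("min:", np.round(lo.x, 4), "value", round(lo.fun, 4))     # (2, 2, 8), 8
    print("max:", np.round(hi.x, 4), "value", round(-hi.fun, 4))    # (-3, -3, 18), 18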
Lagrangian. The first derivative conditions (LM) can be seen as the critical point conditions for
what is called the Lagrangian, which is defined by

    L(λ, x) = f(x) + Σᵢ₌₁ᵏ λᵢ (bᵢ − gᵢ(x)).

A point x∗ satisfies the first order Lagrange multiplier conditions (LM) with multipliers λ∗ iff
(λ∗, x∗) is a critical point of L with respect to all its variables,

    (∂L/∂λᵢ)(λ∗, x∗) = bᵢ − gᵢ(x∗) = 0   for 1 ≤ i ≤ k,   and
    D_x L(λ∗, x∗) = Df(x∗) − Σᵢ₌₁ᵏ λ∗ᵢ Dgᵢ(x∗) = 0.

To ensure that the constraint qualification does not fail, we need that

    rank ( [ L_{λᵢ,xⱼ}(λ∗, x∗) ]_{ij} ) = rank(Dg(x∗)) = k.

These conditions on the Lagrangian are not a proof but merely a mnemonic device.

3.2.1. Interpretation of Lagrange Multipliers


For a tight constraint in a maximization linear program, sensitivity analysis showed that the
marginal value (in terms of the maximal value of the objective function) of a change in this
tight input equals the value of the comparable variable for the optimal solution of the dual linear
program. Because this dual variable satisfies complementary slackness, it is the corresponding
Lagrange multiplier for the problem. The following theorem gives the comparable result for a
nonlinear problem with equality constraints and gives the marginal maximal value with changes
in bi .
Theorem 3.10. Assume that f, gᵢ : Rⁿ → R are C² for 1 ≤ i ≤ k < n. For b ∈ Rᵏ, assume
that (λ∗(b), x∗(b)) is a solution of (LM) for an extremizer of f on g⁻¹(b) and satisfies
rank(Dg(x∗(b))) = k. Let L(λ, x, b) = f(x) + Σⱼ₌₁ᵏ λⱼ(bⱼ − gⱼ(x)) be the Lagrangian
as a function of b as well as x and λ. Also, assume the second derivative of the Lagrangian
as a function of λ and x is nondegenerate, i.e., it has a nonzero determinant. Then,

    λ∗ᵢ(b) = (∂/∂bᵢ) f(x∗(b)).
Proof. For fixed b, (λ∗, x∗) = (λ∗(b), x∗(b)) satisfy

    0 = [ DλLᵀ ] = [ b − g(x)                     ] = G(λ, x, b).
        [ DxLᵀ ]   [ Df(x)ᵀ − Σⱼ λⱼ Dgⱼ(x)ᵀ ]

The derivative of G with respect to λ and x is the second derivative of L with respect to λ
and x and is the bordered Hessian

    D_{λ,x} G(λ, x, b) = (Dλ G(λ, x, b), Dx G(λ, x, b)) = [  0_k   −Dg   ]
                                                          [ −Dgᵀ   Dx²L ].

This derivative satisfies det(D_{λ,x} G(λ∗(b), x∗(b), b)) ≠ 0, since this second derivative
of the Lagrangian is assumed to be nondegenerate. See Section 3.5 for a discussion of bordered
Hessians and nondegenerate extrema on a level set. Therefore, the variables (λ, x) =
(λ∗(b), x∗(b)) are implicitly determined differentiable functions of b by the equation G(λ, x, b) =
0.
At these points on the level sets, f(x∗(b)) = L(λ∗(b), x∗(b), b). Taking the partial
derivative of this equality with respect to bᵢ gives

    (∂/∂bᵢ) f(x∗(b)) = DλL (∂λ∗/∂bᵢ)(b) + DxL (∂x∗/∂bᵢ)(b) + (∂L/∂bᵢ)(λ∗(b), x∗(b), b).

Because bⱼ − gⱼ(x∗(b)) = 0 for all j,

    DλL (∂λ∗/∂bᵢ)(b) = Σⱼ ( bⱼ − gⱼ(x∗(b)) ) (∂λ∗ⱼ/∂bᵢ)(b) = 0.

Because the point satisfies the (LM) conditions,

    DxL (∂x∗/∂bᵢ)(b) = [ Df(x∗(b)) − Σⱼ₌₁ᵏ λ∗ⱼ(b) Dgⱼ(x∗(b)) ] (∂x∗/∂bᵢ)(b) = 0.

Finally, by taking the partial derivative of the formula for L,

    (∂L/∂bᵢ)(λ∗(b), x∗(b), b) = λ∗ᵢ(b).

Combining the terms, we get the required equality. □

An alternative proof is as follows: Once we know that (λ∗(b), x∗(b)) is a differentiable
function of b, by the Chain Rule and since the points solve the Lagrange multiplier problem,

    (∂/∂bᵢ) f(x∗(b)) = Df(x∗(b)) (∂x∗/∂bᵢ)(b)
                     = Σⱼ λ∗ⱼ(b) Dgⱼ(x∗(b)) (∂x∗/∂bᵢ)(b)
                     = Σⱼ λ∗ⱼ(b) (∂/∂bᵢ) gⱼ(x∗(b))
                     = λ∗ᵢ(b).

The last equality holds because gⱼ(x∗(b)) = bⱼ for all b, so

    (∂/∂bᵢ) gⱼ(x∗(b)) = { 0   when j ≠ i,
                        { 1   when j = i.   □
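For Example 3.9 this can be checked directly. Replacing the constraint x + y + z = 12 by x + y + z = b, the maximizer has x = y = (−1 − √(1 + 2b))/2 and maximal value M(b) = 2x², so a finite difference of M at b = 12 should reproduce the multiplier λ∗ = 6/5. The Python sketch below (an illustration, not part of the original text) confirms this.

    import math

    def max_value(b):
        # Maximizer of z on {x + y + z = b, z = x^2 + y^2} has x = y with
        # 2x^2 + 2x - b = 0; the maximum is on the branch with x < 0.
        x = (-1 - math.sqrt(1 + 2*b)) / 2
        return 2*x**2

    h = 1e-6
    print("dM/db at b = 12:", (max_value(12 + h) - max_value(12 - h)) / (2*h))
    # prints about 1.2 = 6/5, the multiplier of the constraint g = b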

3.2. Exercises
3.2.1. Find the points satisfying the first order conditions for constrained extrema. Then
compare the values and argue which points are global maxima and which are global
minima.
a. f (x, y, z) = xyz and g(x, y, z) = 2x + 3y + z = 6.
b. f (x, y, z) = 2x + y 2 − z 2 , g1 (x, y, z) = x − 2y = 0, and g2 (x, y, z) =
x + z = 0.

3.2.2. For each of the following objective and constraint functions, find the maximizers and
minimizers.
a. f (x, y, z) = x2 + y 2 + z 2 , subject to g(x, y, z) = x + y + z = 12 and
h(x, y, z) = x2 + y 2 − z = 0.
b. f (x, y, z) = x+y+z , subject to g(x, y, z) = x2 +y 2 = 2 and h(x, y, z) =
x + z = 1.
c. Minimize f (x, y, z) = x2 +y 2 +z 2 , subject to g(x, y, z) = x+2y+3z = 6
and h(x, y, z) = y + z = 0.

3.2.3. Let u(x, y, z) = x2 y 3 z 4 and the expenditure be E(x, y, z) = p1 x + p2 y + p3 z with


p1 > 0, p2 > 0, and p3 > 0. For fixed w > 0, let
X(w) = { (x, y, z) ∈ R3+ : u(x, y, z) ≥ w }.
a. Even though X(w) is not compact, show that E attains a minimum on X(w)
using the Extreme Value Theorem. Note that (x₀, y₀, z₀) = (√w, 1, 1) is in
X(w) and E(√w, 1, 1) = p₁√w + p₂ + p₃.
b. Since ∇E ≠ 0 at all points of X(w) (or DE(x, y, z) ≠ 0), it follows that
b. Since ∇E 6= 0 at all points of X(w) (or DE(x, y, z) 6= 0), it follows that
the minimum cannot be in the interior of X(w) but must be on the boundary
{(x, y, z) : u(x, y, z) = w}. (You do not need to show this.)
Using the first order conditions for Lagrange Multipliers, find the point that
attains the minimum of E subject to u(x, y, z) = w. Explain why this point
is the minimizer E on X(w).
3.3. Extrema with Inequality Constraints: Necessary Conditions 75

3.3. Extrema with Inequality Constraints:


Necessary Conditions
In the rest of this chapter, we use the following notation for a feasible sets defined by resource
requirements with g : Rn → Rm ,
F g,b = { x ∈ Rn : g(x) ≤ b }.
Just as for a linear program, a constraint gi (x) ≤ bi is said to be slack at x = p provided
that gi (p) < bi . A constraint gi (x) ≤ bi is said to be effective or tight at x = p provided
that gi (p) = bi .
The necessary conditions for an optimal solution allow for the possibility that an extremizer
is a point that is the nonlinear analog of a degenerate basic solution for a linear program. The
“constraint qualification” is said to fail at such a point as given in the following definition.
Definition. Given a point p, we let E(p) = { i : gi (p) = bi } be the set of tight constraints
at p, |E(p)| be the cardinality of E(p), and gE(p) (x) = ( gi (x) )i∈E(p) be the function
with tight constraints only.
The set of constraints satisfies the constraint qualification at p provided that |E(p)| =
rank(DgE(p) (p)), i.e., the gradients of the tight constraints are linearly independent at p.
A set of constraints satisfies the constraint qualification on the feasible set F g,b provided
it is satisfied at all the points on the boundary.
Example 3.11. Consider the constraints
g1 (x, y) = x + (y − 1)3 ≤ 0,
g2 (x, y) = x ≤ 0,
g3 (x, y) = y ≤ 0.

x≤0
(0, 1)

x + (y − 1)3 ≤ 0

Figure 3.3.1. Constraint qualification fails

At (0, 1), E(0, 1) = {1, 2}, gE(0,1) (x, y) = (x + (y − 1)3 , x)> , and
" # " #
1 3(y − 1)2 1 0
rank(DgE(0,1) (0, 1)) = rank = rank = 1 < 2 = |E(0, 1)|.
1 0 1 0
y=1

Therefore, the constraint qualification fails at (0, 1). Note that the level sets g1- 1 (0) and
g2- 1 (0) are tangent at (0, 1) and the gradients ∇g1 (0, 1) and ∇g2 (0, 1) are parallel. 
Theorem 3.12 (Karush-Kuhn-Tucker). Suppose that f, gi : Rn → R are C 1 functions for
1 ≤ i ≤ m, and f attains a local extremum at x∗ on F g,b .
Then either (a) the constraint qualification fails at x∗ with rank(DgE(x∗ ) (x∗ )) < |E(x∗ )|,
or
(b) there exist λ ∗ = (λ∗1 , . . . , λ∗m ) such that KKT-1,2,3 or KKT-1,2,30 hold:
76 3. Constrained Extrema

Df (x∗ ) = m ∗ ∗
P
KKT-1. i=1 λi Dgi (x ).
KKT-2. λ∗i ( bi − gi (x∗ ) ) = 0 for 1 ≤ i ≤ m ( so λ∗i = 0 for i ∈
/ E(x∗ ) ).
KKT-3. If x∗ is a maximizer, then λ∗i ≥ 0 for 1 ≤ i ≤ m.
KKT-30 . If x∗ is a minimizer, then λ∗i ≤ 0 for 1 ≤ i ≤ m.
Remark. This theorem is the nonlinear version of Proposition 1.13 in Chapter 1 for linear pro-
grams. The proof is similar with the Implicit Function Theorem replacing part of the argument
done by linear algebra for the linear program.
Remark. Exercise 3.3.5 gives an example for which the maximum occurs at a point where the
constraint qualification fails.
Remark. We call KKT-1,2,3 the first order Karush-Kuhn-Tucker conditions for a maximum.
Condition KKT-1 implies that ∇f (x∗ ) is perpendicular to the tangent space of the level
- 1 (b
set of the tight constraints at x∗ , gE(x ).
∗) E(x∗ )
Condition KKT-2 is called complementary slackness: If a constraint is slack, gi (x∗ ) < bi ,
then the corresponding multiplier λ∗i = 0; if λ∗i 6= 0, then the constraint is tight, gi (x∗ ) = bi .
Condition KKT-3, λ∗i ≥ 0 for a maximum, implies that the gradient of f points out of
the feasible set at x∗ . The inequalities, gi (x) ≤ bi , are resource type and signs λ∗i ≥ 0 are
compatible with maximization for a linear programming problem.
Condition KKT-30 , λ∗i ≤ 0 for a minimum, implies that the gradient ofPf points into the
m
feasible set at x∗ . The point x∗ is a maximizer of f (x) and ∇f (x∗ ) = i=1 ( λ∗i ) ∇gi (x∗ )
∗ ∗
with λi ≥ 0. So the signs λi ≤ 0 are compatible with minimization for a linear program-
ming problem.
Steps to Find an Optimal Solution using the KKT Theorem 3.12
1. Verify that a maximum (resp. minimum) exists by showing either that the
feasible set is compact or that f (x) takes on smaller values (resp. larger
values) near infinity.
2. Find all the possible extremizers: (i) Find all the points on the boundary of the
feasible set where the constraint qualification fails; (ii) find all the points that
satisfy KKT-1,2,3 (resp. KKT-1,2,30 ).
3. Compare the values of f (x) at all the points found in 2(i) and 2(ii).

Proof of Theorem 3.12. Assume the constraint qualification holds at x∗ and |E(x∗ )| = k .
We can rearrange the indices of the gj so that E(x∗ ) = { 1, . . . , k }, i.e., gi (x∗ ) = bi for

1 ≤ i ≤ k and gi (x ) < bi for k + 1 ≤ i ≤ m. Also, we can rearrange the indices of the xj
∂gi ∗
so that det (x ) 6= 0.
∂xj 1≤i,j≤k
Set λ∗i = 0 for i ∈/ E(x∗ ), i.e., for k + 1 ≤ i ≤ m. We essentially drop these ineffective
constraints from consideration in the proof. The function f also attains a extremum at x∗ on
{ x : gi (x) = bi for i ∈ E(x∗ ) }, so by the Lagrange Multiplier Theorem, there exist λ∗i for
1 ≤ i ≤ k so that X
Df (x∗ ) = λ∗i Dgi (x∗ ).
1≤i≤k
Since λ∗i = 0 for k + 1 ≤ i ≤ m, we can change the summation to 1 to m and obtain condition
KKT-1. Also, either λ∗i = 0 or bi − gi (x∗ ) = 0, so condition KKT-2 holds.
The question remains: For a maximum, why are λ∗` ≥ 0 for ` ∈ E(x∗ )? (The case of a
minimum is similar.)
We apply the Implicit Function Theorem to show that there is a curve r(t) in F such
that, for small t > 0, (i) g` (r(t)) < b` , (ii) gi (r(t)) = bi for i 6= ` and 1 ≤ i ≤ k , and (iii)
ri (t) = x∗i for k + 1 ≤ i ≤ n.
3.3. Extrema with Inequality Constraints: Necessary Conditions 77

gi- 1 (0)
T
i6=`

r(t)

x

g`- 1 (0)

Figure 3.3.2. Curve where only one more constraint becomes slack

Let δi` = 0 if i 6= ` and δ`` = 1. We apply the theorem to a function G defined by


Gi (x, t) = gi (x) − bi + t δi` for 1 ≤ i ≤ k = |E(x∗ )| and
Gi (x, t) = xi − x∗i for k + 1 ≤ i ≤ n.
The determinant of the derivative of G with respect to x is
∂g1 ∂g1 ∂g1 ∂g1
 
 ∂x ··· ···
 1 ∂xk ∂xk+1 ∂xn 
 ∂g2 ∂g2 ∂g2 ∂g2 
 
 ∂x1 · · · ∂xk ∂xk+1 ···
 
∂xn 
 .. .. .. .. 
 
.. ..

 . . . . . . 
det(Dx G(x , 0)) = det   
∂g
 k ··· ∂g k ∂g k ∂gk 
··· 
 ∂x1 ∂xk ∂xk+1 ∂xn 


 0

··· 0 1 ··· 0  
 . .. .. .. 
 . .. ..
 . . . . . . 

0 ··· 0 0 ··· 1
∂gi ∗
 
= det (x ) 6= 0.
∂xj 1≤i,j≤k

By the Implicit Function Theorem, there exists x = r(t) such that r(0) = x∗ and G(r(t), t) ≡
0, i.e., gi (r(t)) = bi − t δi` for 1 ≤ i ≤ k and xi = x∗i for k + 1 ≤ i ≤ n. This curve r(t)
has the desired properties:

∗ 0 d
Dgi (x r (0) = gi ◦ r(t) = δi` for 1 ≤ i ≤ k.
dt t=0
The function f has a maximum at x , so f (x∗ ) ≥ f (r(t)) for t ≥ 0, and


d X
= Df (x∗ ) r0 (0) = λ∗i Dgi (x∗ ) r0 (0) = λ∗` ,

0 ≥ f ◦ r(t)
dt t=0 1≤i≤k

where the second equality holds by the first order Lagrange Multiplier condition KKT-1 and the
third equality holds by the calculation of Dgi (x∗ )r0 (0). We have shown that that λ∗` ≥ 0 for
1 ≤ i ≤ k , so we have KKT-3.
Example 3.13. Let f (x, y) = x2 − y and g(x, y) = x2 + y 2 ≤ 1. (Note that the feasible set
is compact.)
The derivative of the constraint is Dg(x, y) = (2x, 2y), which has rank one at all the
points of the boundary of the feasible set. (At least one variable is nonzero at each of the
points.) Therefore, the constraint qualification is satisfied at all the points in the feasible set.
78 3. Constrained Extrema

The KKT equations KKT-1,2 to be solved are


0 = fx − λ gx = 2x − λ 2x,
0 = fy − λ gy = −1 − λ 2y, and
2 2
0 = λ (1 − x − y ).
From the first equation, we see that x = 0 or λ = 1.
Case (i): λ = 1 > 0. We are left with the equations
1 = −2y,
1 = x2 + y 2 .

From the first equation, y = 1/2, so x2 = 1 − 1/4 = 3/4, or x = ± 3/2.
Case (ii): x = 0. We are left with the equations
1 = λ 2y,
0 = λ (1 − y 2 ).
For the first equation, λ 6= 0. From the second equation, we get that y = ±1. If y = 1, then
we get that 0 = −1 − 2λ and λ = -1/2 < 0. This point cannot be a maximum. If y = 1,
then 1 = 2λ and λ = 1/2 > 0. This is a possible  √maximizer.
  √ 
We have found three possible maximizers: ± 3/2, -1/2 and (0, 1), with values f ± 3/2, -1/2 =
 √ 
3/4+1/2 = 5/4 and f (0, 1) = 1. Thus the maximum is 5/4, which is attained at ± 3/2, −1/2 .

Notice that although the multiplier is positive at (0, 1), it is not a maximizer. The function
f decreases as it moves into the interior of the region at (0, 1), but it is a local minimum along
the boundary so this point is a type of saddle point on the feasible set. 
Example 3.14. Maximize f (x, y, z) = x2 + 2y 2 + 3z 2 , on the constraint set 1 = x + y + z =
g0 (x, y, z), 0 ≥ x = g1 (x, y, z), 0 ≥ y = g2 (x, y, z), and 0 ≥ z = g3 (x, y, z).
Because the 0th -equation involves an equality, λ0 can have any sign. For 1 ≤ i ≤ 3, we need
λi ≥ 0.
We want to check that the constraint qualification is satisfied at all points of the feasible
set. On the face where g0 (x, y, z) = 1 and gi (x, y, z) < 0 for i = 1, 2, 3,
  h i
rank DgE (x, y, z) = rank (Dg0 (x, y, z)) = rank 1 1 1 = 1.

On the edge where g0 (0, y, z) = 1, g1 (0, y, z) = 0, and gi (0, y, z) < 0 for i = 2, 3,


" #!
  
>
 1 1 1
rank DgE (0, y, z) = rank D(g0 , g1 ) (0, y, z) = rank = 2.
1 0 0

On the edge where g0 (x, 0, z) = 1, g2 (x, 0, z) = 0, and gi (x, 0, z) < 0 for i = 1, 3,


" #!
    1 1 1
rank DgE (x, 0, z) = rank D(g0 , g2 )> (x, 0, z) = rank = 2.
0 1 0

On the edge where g0 (x, y, 0) = 1, g3 (x, y, 0) = 0, and gi (x, y, 0) < 0 for i = 1, 2,


" #!
  
>
 1 1 1
rank DgE (x, y, 0) = rank D(g0 , g3 ) (x, y, 0) = rank = 2.
0 0 1
3.3. Extrema with Inequality Constraints: Necessary Conditions 79

Finally, at the vertices where three constraints are tight,


 
    1 1 1
rank DgE (1, 0, 0) = rank D(g0 , g2 , g3 )> (1, 0, 0) = rank 0 1 0  = 3,
 
0 0 1
 
    1 1 1
>
rank DgE (0, 1, 0) = rank D(g0 , g1 , g3 ) (0, 1, 0) = rank  1 0 0  = 3,
 
0 0 1
 
    1 1 1
rank DgE (0, 0, 1) = rank D(g0 , g1 , g2 )> (0, 0, 1) = rank  1 0 0 = 3.
 
0 1 0
This verifies that the constraint qualification is satisfied at all points of the feasible set.
The first order conditions KKT-1,2 are given by
0 = fx − λ0 g0x − λ1 g1x − λ2 g2x − λ3 g3x = 2x − λ0 + λ1 ,
0 = fy − λ0 g0y − λ1 g1y − λ2 g2y − λ3 g3y = 4y − λ0 + λ2 ,
0 = fz − λ0 g0z − λ1 g1z − λ2 g2z − λ3 g3z = 6z − λ0 + λ3 ,
1 = x + y + z, 0 = λ1 x, 0 = λ2 y, 0 = λ3 z.
Case 1: At an interior point with x > 0, y > 0, and z > 0, λi = 0 for 1 ≤ i ≤ 3. Thus
λ0 = 2x = 4y = 6z,
= x/2 and z = x/3. Substituting into g0 , 1 = x + y + z = x 1 + 1/2 + 1/3 = 11 x/6, so

so y
x= 6/11, y = 3/11, and z = 2/11.
Case 2: x = 0, y > 0, and z > 0. Thus, λ2 = λ3 = 0, and we get 4y = λ0 = 6z ,
so z = 2y/3. Then 1 = y 1 + 2/3 = 5 y/3, and y = 3/5. Then z = 23 · 53 = 2/5 and
λ0 = 4y = 4 3/5 = 12/5. Then, 0 = λ0 + λ1 , so λ1 = 12/5 > 0. This is an allowable point.


Case 3: y = 0, x > 0, and z > 0. Thus, λ1 = λ3 = 0, and we get 2x = λ0 = 6z ,


so x = 3z . Then, 1 = z(3 + 1), z = 1/4, x = 3/4, and λ0 = 2x = 3/2 > 0. Then,
λ2 = λ0 = 3/2 > 0 is allowable.
Case 4: z = 0, x > 0, and y > 0. Thus, λ1 = λ2 = 0, and we get 2x = λ0 = 4y , so
x = 2y . Then, 1 = y(2 + 1), y = 1/3, x = 2/3, and λ0 = 4/3. Then, λ3 = λ0 = 4/3 > 0 is
allowable.
The vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1) are also possibilities.
The values at these points are as follows:
36 + 18 + 12 66
f 6/11, 3/11, 2/11 =

= ≈ 0.5454,
121 121
18 + 12 30
f 0, 3/5, 2/5 =

= = 1.2,
25 25
9+3 12
f 3/4, 0, 1/4 =

= = 0.75,
16 16
4+2 2
f 2/3, 1/3, 0 =

= ≈ 0.667,
9 3
f (1, 0, 0) = 1,
f (0, 1, 0) = 2,
f (0, 0, 1) = 3.
The maximal value of 3 is attained at the vertex (0, 0, 1). 
80 3. Constrained Extrema

There are several aspects that make KKT Theorem 3.12 difficult to apply. First, it is nec-
essary to show that a maximum or minimum exists. Second, it is not easy to show that the
constraint qualification holds at all points of F g,b or find all the points on the boundary where
it fails. Thus, the possibility of the constraint qualification failing makes the theorem difficult
to apply in applications. Finally, it is necessary to find all the points that satisfy KKT-1,2,3 or
where the constraint qualification fails and compare the values of the objective function at these
points.
The deficiencies of Theorem 3.12 are overcome by means of convexity and concavity of
constraints and objective function as developed in the next section.

3.3. Exercises
3.3.1. Maximize the revenue
1/2 1/2 1/3
π = p 1 x1 + p2 x1 x2
subject to a wealth constraint on the inputs
w1 x1 + w2 x2 ≤ C > 0, x1 ≥ 0 x2 ≥ 0.
a. Write down the constraint functions and the KKT-1,2,3 equations that must be
satisfied for the Karush-Kuhn-Tucker Theorem.
b. Take w1 = w2 = 2, p1 = p2 = 1, and C = 8, and find explicit values of x1
and x2 that attains the maximum.
3.3.2. Given H > 0, p1 > 0, p2 > 0, and w > 0. Let u(x1 , x2 , y) = xα1 1 xα2 2 − (H − y)2 ,
for α1 > 0 and α2 > 0. Consider the feasible set
Φ(p, q, w, H) = { (x1 , x2 , y) ∈ R3+ : p1 x1 + p2 x2 + wy ≤ wH, 2y ≤ H }.
a. What are the constraint functions gi which define the inequality constraints in
the form of the Kuhn-Tucker Theorem?
wHαi
Which constraints are effective at y ∗ = 21 H , and x∗i = for
2pi (α1 + α2 )
i = 1, 2?
Do the constraints satisfy the rank condition?
b. Why does the feasible set satisfy the Slater condition?
c. What are the KKT-1,2,3 equations?
3.3.3. Consider the following problem:
Maximize: f (x, y) = x2 + y 2 + 2y
Subject to: x2 + y 2 ≤ 5
x + 2y ≤ 4
0 ≤ x, 0 ≤ y .
a. Explain why f (x, y) must have a maximum on the feasible  set.
b. Using the fact that the points (x, y) = (2, 1), 6/5, 7/5 , and (0, 2) are the
only points that satisfy the first order KKT equations for a maximum with the
correct signs of the multipliers, what is the maximal value of f (x, y) on the
feasible set and what is the point that is the maximizer? Explain why this must
be the maximizer, including explaining how the theorems apply and what other
conditions need to be satisfied.
 the constraint qualification is satisfied at the three points (2, 1),
c. Verify that
6/5, 7/5 , and (0, 2).
3.4. Extrema with Inequality Constraints: Sufficient Conditions 81

3.3.4. Assuming the parameters p > 1, w0 > 0, 0 < x̄1 < w0/p, and 0 < x̄2 < w0 ,
consider the following problem:
Maximize: U (x1 , x2 ) = x1 x2
Subject to: p x1 + x2 ≤ w0 ,
0 ≤ x1 ≤ x̄1 ,
0 ≤ x2 ≤ x̄2 .
a. Show that the constraint qualification is satisfied on the feasible set.
b. Why must U attain a maximum on the feasible set?
c. What are the KKT-1,2,3 equations?
d. What conditions on the parameters need to be satisfied for U to have a maxi-
mum at X1 = x̄1 and x2 = x̄2 ?
3.3.5. Consider the problem
Maximize: f (x, y) = 2 − 2 y
Subject to g1 (x, y) = y + (x − 1)3 ≤ 0
g2 (x, y) = x ≤ 0,
g3 (x, y) = y ≤ 0.
Carry out the following steps to show that the maximizer is a point at which the
constrain qualification fails.
a. By drawing a figure, show that the feasible set is a three sided (nonlinear)
region with vertices at (0, 0), (1, 0), and (0, 1).
b. Plot several level curves f - 1 (C) of the objective function to your figure from
part (a) and conclude geometrically that (0, 1) is a maximizer and (1, 0) is a
minimizer of f (x, y) on the feasible set.
c. Show that the constraint qualification fails at (0, 1). Also, show that Df (0, 1)
cannot be written as a linear combination of the derivatives Dgi (0, 1) of the
effective constraints.

3.4. Extrema with Inequality Constraints:


Sufficient Conditions
We overcome the deficiencies of Theorem 3.12 by means of convexity and concavity. Con-
vex constraints eliminates the need for the constraint qualification. A concave (resp. convex)
objective function ensures the objective function has a global maximum (resp. minimum) at
a point that satisfies conditions KKT-1,2,3 (resp. KKT-1,2,30 ). The convexity and concavity
assumptions are like second derivative conditions at all points of the feasible set.

3.4.1. Convex Structures


Before defining convex and concave functions, we repeat the definition of a convex set given in
Section 1.6.
Definition. A set D ⊂ Rn is called convex provided that if x, y are any two points in D ,
then the convex combination (1 − t) x + t y is also in D for any 0 ≤ t ≤ 1. Note that
xt = (1 − t) x + t y for 0 ≤ t ≤ 1 is the line segment from x when t = 0 and to y when
t = 1.
Figure 3.4.1 shows examples of convex and non-convex sets.
We next define of convex and concave functions in terms of just the values of the function
without assuming the function is differentiable. Later in Theorem 3.32, we show that for a C 1
82 3. Constrained Extrema

convex convex convex not convex not convex

Figure 3.4.1. Convex and non-convex sets

f (xt ) f (y)

f (x) (1 − t)f (x) + tf (y)

x xt y

Figure 3.4.2. Concave function

function these conditions can be expressed in terms of the function being above or below the
tangent plane at all points.
Definition. A function f : D ⊂ Rn → R is concave on D provided that for all x, y ∈ D
and 0 ≤ t ≤ 1,
f (xt ) ≥ (1 − t) f (x) + t f (y) for xt = (1 − t) x + t y.
This is equivalent to assuming that the set of points below the graph,
{(x, y) ∈ D × R : y ≤ f (x) },
is a convex subset of Rn+1 .
A function f : D → R is strictly concave provided that for all x, y ∈ D with x 6= y and
0 < t < 1,
f (xt ) > (1 − t) f (x) + t f (y) for xt = (1 − t) x + t y.
A function f : D ⊂ Rn → R is convex on D provided that for all x, y ∈ D and
0 ≤ t ≤ 1,
f (xt ) ≤ (1 − t) f (x) + t f (y) for xt = (1 − t) x + t y.
This is equivalent to assuming that the set of points above the graph,
{(x, y) ∈ D × R : y ≥ f (x) },
n+1
is a convex subset of R .
A function f : D → R is strictly convex provided that for all x, y ∈ D with x 6= y and
0 < t < 1,
f (xt ) < (1 − t) f (x) + t f (y) for xt = (1 − t) x + t y.
Remark. If f is either concave or convex on D then D is convex. Also, the condition is on
the graph of f and not on the domain of f .
Theorem 3.15. Let f : D ⊂ Rn → R be a concave or convex function on D . Then in the
interior of D , f is continuous and possesses all directional derivatives (which can possibly be
infinite).
3.4. Extrema with Inequality Constraints: Sufficient Conditions 83

See [14] or [2] for a proof. Since concave/convex functions are continuous, they are rea-
sonable functions to maximize or minimize.
Theorem 3.16. Assume that D ⊂ Rn is an open convex subset, and gi : D → R are C 1
convex functions for 1 ≤ i ≤ m. Then F g,b = { x ∈ D : gi (x) ≤ bi for 1 ≤ i ≤ m } is a
convex set for any b ∈ Rm ,
Proof. Take x, y ∈ F g,b and let xt = (1 − t)x + ty for 0 ≤ t ≤ 1. For any 1 ≤ i ≤ m,
gi (xt ) ≤ (1 − t)gi (x) + tgi (y) ≤ (1 − t)bi + tbi = bi , so xt ∈ F g,b . Thus, F g,b is
convex.
The assumption that a function is convex or concave can be verified by a second-derivative
condition at all points of the feasible set as shown in the following theorem.
Theorem 3.17 (Second-Derivative Test). Let D ⊂ Rn be open and convex and f : D → R
be a C 2 function.
a. The function f is convex (respect. convex) on D iff D 2f (x) is positive (respect.
negative) semidefinite for all x ∈ D .
b. If D 2f (x) is positive (respect. negative) definite for all x ∈ D , then f is strictly
convex (respect. concave) on D .
We follow the proof in Sundaram [14]. The general theorem can be adapted from the
special case when D is all of Rn . The idea is that if D 2f (x) is positive definite (resp. negative
definite), then locally the graph of f lies above (resp. below) the tangent plane. The proof
makes this global.
We start by proving a lemma that says it is enough to show f is convex along straight lines
in the domain.
Lemma 3.18. Let f : Rn → R. For any x, h ∈ Rn , let gx,h (t) = f (x + t h) for t ∈ R.
Then the following hold.
a. f is convex iff gx,h is convex for each fixed x, h ∈ Rn .
b. If gx,h is strictly convex for each fixed x, h ∈ Rn with h 6= 0, then f is strictly
convex.
Proof. First suppose that f is convex. Fix x, h ∈ Rn . For any t1 , t2 ∈ R and any 0 ≤ β ≤ 1,
gx,h (β t1 + (1 − β) t2 ) = f (x + β t1 h + (1 − β) t2 h)
= f (β (x + t1 h) + (1 − β) (x + t2 h) )
≤ β f (x + t1 h) + (1 − β) f (x + t2 h)
= β gx,h (t1 ) + (1 − β) gx,h (t2 ).
This shows that gx,h is convex.
Next, suppose that gx,h is convex for any x, h ∈ Rn . Pick any x, y ∈ Rn , and let
h = y − x, and xt = (1 − t) x + t y = x + t h for 0 ≤ t ≤ 1. Then,
f ( (1 − t) x + t y ) = gx,h (t) + gx,h ( (1 − t)0 + t 1)
≤ (1 − t) gx,h (0) + t gx,h (1)
= (1 − t) f (x) + t f (y).
Since this is true for any x, h ∈ Rn , f is convex.
The proof of (b) is similar.
Proof of Theorem 3.17. We consider part (b) in the case when D 2f (x) is positive definite and
show that f is strictly convex. Pick any x, h ∈ Rn with h 6= 0, and define g(t) = gx,h (t)
as in the lemma. In the notes for Chapter 6, we showed that
00
gx,h (t) = h> D 2f (xt )h,
84 3. Constrained Extrema

which is positive. For any t < s in R and 0 ≤ β ≤ 1, let z = (1 − β) t + β s. By the Mean


Value Theorem, there exists t < w1 < z < w2 < s such that
g(z) − g(t) g(s) − g(z)
= g 0 (w1 ) and = g 0 (w2 ).
z−t s−z
Since g 00 (x) > 0 for all x, g 0 (w2 ) > g 0 (w1 ) and
g(z) − g(t) g(s) − g(z)
<
z−t s−z
(s − z) (g(z) − g(t)) < (z − t) (g(s) − g(z))
(s − t)g(z) < (s − z)g(t) + (z − t)g(s)
s−z z−t
g(z) < g(t) + g(s).
s−t s−t
s−z z−t
Since z = (1 − β) t + β s, = β , and 1 − β = , this gives g ((1 − β) t + β s) <
s−t s−t
β g(t) + (1 − β) g(s). This proves that g is strictly convex. By Lemma 3.18, f is strictly
convex.

3.4.2. Karush-Kuhn-Tucker Theorem under Convexity


The main theorem of this section gives necessary and sufficient conditions for an extremizer to
exist on a feasible set. To ensure that an extremizer satisfies the conditions KKT-1,2, we need a
condition on the feasible set F g,b .
Definition. Let gi : D → R for 1 ≤ i ≤ m and F g,b as usual. We say that the constraint
functions gi satisfy the Slater condition for F g,b provided that there exists a point x ∈ F g,b
such that gi (x) < bi for all 1 ≤ i ≤ m. (This assumption says that there is a point with no
effective constraint and implies that the constraint set has nonempty interior.)
Theorem 3.19 (Karush-Kuhn-Tucker Theorem under Convexity). Assume that b ∈ Rm ,
f, gi : Rn → R are C 1 for 1 ≤ i ≤ m, and x∗ ∈ F g,b .
a. Assume that f is a concave function.
i. If F g,b is convex and (x∗ , λ ∗ ) satisfies KKT-1,2,3 with all the λ∗i ≥ 0, then f
has a maximum on F g,b at x∗ .
ii. If f has a maximum on F g,b at x∗ , each of the constraints functions gi is con-
vex, and F g,b satisfies the Slater condition, then there exist λ ∗ = (λ∗1 , . . . , λ∗m )
such that (x∗ , λ∗ ) satisfies conditions KKT-1,2,3 with all the λ∗i ≥ 0.
b. Assume that f is a convex function.
i. If F g,b is convex and (x∗ , λ ∗ ) satisfies KKT-1,2,30 with all the λ∗i ≤ 0, then
f has a minimum on F g,b at x∗ .
ii. If f has a minimum on F g,b at x∗ , each of the constraints functions gi is con-
vex, and F g,b satisfies the Slater condition, then there exist λ ∗ = (λ∗1 , . . . , λ∗m )
such that (x∗ , λ∗ ) satisfies conditions KKT-1,2,30 with all the λ∗i ≤ 0.
(Note that the assumptions on F g,b and the gi stay the same for a minimum as for a
maximum.)
Remark. Kuhn and Tucker wrote a paper that popularized this result in 1951. However, Karush
had earlier written a thesis at the University of Chicago about a similar result in 1939. There-
fore, although this result is sometimes referred to as the Kuhn-Tucker Theorem, people have
started using all three names when referring to this theorem. Fritz John had a related result in
1948. See [2].
3.4. Extrema with Inequality Constraints: Sufficient Conditions 85

Remark. In most of the examples we use part a.i (or b.i). Note that for this use it is not nec-
essary to verify Slater’s condition or that a maximum must exist (by compactness or similar
argument). The feasible set F g,b is shown convex by conditions on constraint function. We
have shown that if all the gi (x) are convex then it is true. Later, we allow for rescaling of con-
vex functions. Finally, once one point is found that satisfies KKT-1,2,3, then it is automatically
a maximizer; we do not need to verify separately that a maximum exists.
We delay the proof of the Karush-Kuhn-Tucker Theorem under Convexity until Section
3.4.5. In the rest of this section, we give examples of convex/concave functions and applications
of the Karush-Kuhn-Tucker Theorem.
Proposition 3.20. For a ∈ Rn and b ∈ R, the affine function on Rn given by
g(x) = a · x + b = a1 x1 + · · · + an xn + b is both concave and convex.
Proof. For p0 , p1 ∈ Rn , set pt = (1 − t) p0 + t p1 . Then,
g(pt ) = a · pt + b
= a · [(1 − t) p0 + t p1 ] + b
= (1 − t) [a · p0 + b] + t [a · p1 + b]
= (1 − t) g(p0 ) + t g(p1 ).
Thus, we have equality and not just an inequality and g is both concave and convex.
Example 3.21.
Minimize : f (x, y) = x4 + y 4 + 12 x2 + 6 y 2 − xy − x + y ,
Subject to : g1 (x, y) = −x − y ≤ 6
g2 (x, y) = −2x + y ≤ 3
x ≤ 0, y ≤ 0.
The constraints are linear and so are convex. The objective function has second derivative
" #
2 12 x2 + 24 1
D f (x, y) = .
1 12 y 2 + 12
The determinant is greater than 24(12) − 1 > 0, so this is positive definite and f is convex.
Let λ1 and λ2 be the multipliers for the first two inequalities and µ1 and µ2 be the
multiplier for x ≤ 0 and y ≤ 0. Condition KKT-1 is
0 = 4x3 + 24x − y − 1 + λ1 + 2λ2 + µ1
0 = 4y 3 + 12y − x + 1 + λ1 − λ2 + µ2
If x = 0, then y ≤ 3, so y > 0, which is not feasible. Therefore this constraint cannot
be tight and µ1 = 0. We consider the slackness of the other constraints in cases.
If y = 0, then x = g1 (x, 0) ≥ 6, so g2 (x, 0) = −2x ≤ 12 < 3, and λ2 = 0.
The function f (x, 0) = x4 + 12 x2 − x, is minimized for x ≥ 6 at x = 6. For that value, the
second equation gives
0 = −6 + 1 + λ1 + µ2 , or
5 = λ 1 + µ2 .
Both these multipliers cannot be negative, so it is not a minimum.
Finally assume that x, y > 0, so µ2 = 0. Points where both g1 and g2 are tight satisfies
6 = g1 (x, y) = −x − y
3 = g2 (x, y) = −2x + y
86 3. Constrained Extrema

can be solved to yield x = y = 3. If this is a solution of the KKT conditions then


0 = 4(34 ) + 24(3) − 3 − 1 + λ1 + 2λ2 = λ1 + 2λ2 + 176
0 = 4(33 ) + 12(3) − 3 + 1 + λ1 − λ2 = λ1 − λ2 + 140.
These can be solved to yield λ1 = 152 and λ2 = 12. Thus, the point (x∗ , y ∗ ) = (3, 3) with
λ1 = 152, λ2 = 12, and µ1 = µ2 = 0 does satisfy the KKT conditions with negative
multipliers and is the minimizer. 
Example 3.22 (Lifetime of Equipment). This example is based on Example 7.5.4 in Walker
[16]. Assume there are two machines with initial costs of $1,600 and $5,400 respectively. The
operating expense in the j th year of the two machines is $50j and $200j respectively. The
combined number of years of use of the two machines is desired to be at least 20 years. The
problem is to determine the number of years to use each machine to minimize the average total
cost per year.
Let x and y be the lifetimes of the respective machines. We require x+y ≥ 20, x ≥ 0, and
y ≥ 0. The average amortized capital expense per year for the two machines is 1600/x + 5400/y.
The average operating expenses per year for the two machines are
50 + 2(50) + · · · + x(50) 50 x(x + 1)
= · = 25(x + 1) and
x x 2
200 + 2(200) + · · · + y(200) 200 y(y + 1)
= · = 100(y + 1).
y y 2
The problem is therefore the following.
1600 5400
Minimize : f (x, y) = 25(x + 1) + 100(y + 1) + +
x y
Subject to : g1 (x, y) = 20 − x − y ≤ 0
g2 (x, y) = x ≤ 0
g3 (x, y) = y ≤ 0.
The constraints are linear and so convex. A direct check shows that D 2f (x, y) is positive
definite on R2++ and so f is convex on R2++ .
The objective function (cost) gets arbitrarily large near x = 0 or y = 0, so the minimum
occurs for x > 0 and y > 0, and we can ignore those constraints and multipliers.
The KKT-1,2 conditions become
1600
0 = 25 − +λ
x2
5400
0 = 100 − 2 + λ
y
0 = λ(20 − x − y).
(i) First assume the constraint is effective and y = 20 − x. The first two equations give
λ 64 216
=1− 2 =4− ,
25 x (20 − x)2
0 = 3x2 (20 − x)2 + 64(20 − x)2 − 216x2
= 3x4 − 120x3 + 1048x2 − 2560x + 25600.
This polynomial has positive roots of x ≈ 12.07 and 28.37. If x ≈ 28.37, then y = 8.37 <
1600
0 so this is not feasible. If x ≈ 12.07, then y ≈ 7.93 and λ ≈ 25+ ≈ 14.02 < 0.
12.072
The multiplier is negative, so this is the minimizer.
3.4. Extrema with Inequality Constraints: Sufficient Conditions 87

(ii) Although we do not need to do so, we check that there is no minimizer where the
constraint is not effective and λ = 0.
1600
x2 = = 64, x=8 and
25
y 2 = 54, y ≈ 7.34.
Since 8 + 7.34 < 20, they do not satisfy the constraint. 
Proposition 3.23 (Cobb-Douglas in R2 ). Let f (x, y) = xa y b with a, b > 0. If a + b ≤ 1,
then f is concave on R2+ (and xa y b is convex). If a + b > 1, then f is neither concave
nor convex.
See Figure 3.4.3.

xy
1 1
x y 3 3

y y

x x
(a) (b)

Figure 3.4.3. Cobb-Douglas function for (a) a = b = 1/3, (b) a = b = 1

Proof.
!
2 a(a − 1)xa−2 y b
 abxa−1 y b−1
det D f (x, y) = det a−1 1−b
abx y b(b − 1)xa y b−2

> 0
 if a + b < 1
= ab(1 − a − b)x2a−2 y 2b−2 =0 if a + b = 1


<0 if a + b > 1.
If a + b < 1 (so a < 1), D 2f (x, y) is negative definite and f is strictly concave on R2++ ;
since f is continuous on R2+ , it is also concave on R2+ . If a + b = 1, D 2f (x, y) is negative
semidefinite and f is concave on R2+ ; if a + b > 1, the D 2f (x, y) is indefinite and f is neither
concave nor convex. For example, xy is neither concave nor convex.
Proposition 3.24 (Cobb-Douglas in Rn ). Assume that a1 + · · · + an < 1 and ai > 0 for
1 ≤ i ≤ n. Then the function f (x) = xa1 1 xa2 2 · · · xann is concave on Rn+ (and xa1 1 · · · xann is
convex). If a1 + · · · + an > 1, then f is neither concave nor convex.
Proof. The reader should try to carry out the calculations directly for the case n = 3.
On Rn++ , the partial derivatives of f are as follows for i 6= j :
fxi = ai xa1 1 · · · xai i −1 · · · xann
fxi xi = ai (ai − 1)xa1 1 · · · xai i −2 · · · xann = ai (ai − 1)xi- 2 f
= a a xa1 · · · xai −1 · · · x j · · · xan = a a x - 1 x - 1 f.
a −1
f xi xj i j 1 i j n i j i j
88 3. Constrained Extrema

Using linearity on rows and columns, the determinant of the k th -principal submatrix is as fol-
lows:
a1 (a1 − 1)x1- 2 f · · · a1 ak x1- 1 xk- 1
 
 .. .. .. 
∆k = det   . . .


-
ak a1 xk x1 f1 -1
· · · ak (ak − 1)xk f - 2
 
a1 − 1 · · · ak
 . .. 
= a1 · · · ak x1- 2 · · · xk- 2 f k det   ..
..
. . 

a1 · · · ak − 1
= a1 · · · ak x - 2 · · · x - 2 f k ∆
1 k
¯ k,

where the last equality defines ∆ ¯ k as the determinant of the previous matrix. Below, we show
¯ k
by induction that ∆k = ( 1) (1 − a1 − · · · − ak ). Once this is established, since signs of
the ∆k alternate as required, D 2f is negative definite on Rn++ , and f is strictly concave on
Rn++ . Since f is continuous, it is concave on the closure of Rn++ , i.e., on Rn+ .
We show that ∆ ¯ k = ( 1)k + ( 1)k−1 (a1 + · · · + ak ) by induction on k , using linearity
of the determinant on the last column and column operations on the subsequent matrix:
 
a1 1 a2 ··· ak−1 ak
 a
 1 a2 1 ··· ak−1 ak 

 . .. .. .. ..
¯ k = det  .

∆  . . . . .


···
 
 a1 a2 ak−1 1 ak 
a1 a2 ··· ak−1 ak 1
   
a1 1 a2 ··· ak−1
ak a1 1 a2 ··· ak−1 0
 a
 1 a2 1 ··· ak−1
ak 
 a1
 a2 1 ··· ak−1 0 
 . .. .... 
..  + det  .. .. .. .. .. 

= det  ..

. . . .  . . . . . 

··· ···
   
 a1 a2 ak−1 1 ak   a1 a2 ak−1 1 0
a1 a2 ··· ak−1 ak a1 a2 ··· ak−1 1
 
1 0 ··· 0 1
0
 1 ··· 0 1 
¯ k = ak  .. .. .. .. ..  ¯

∆  . . . . .  − ∆k−1

0 ···
 
0 1 1
0 0 ··· 0 1
k−1
ak − ( 1)k−1 + ( 1)k−2 (a1 + · · · + ak−1 )
 
= ( 1)
= ( 1)k + ( 1)k−1 (a1 + · · · + ak ).
This proves the claim by induction and completes the proof.

3.4.3. Rescaled Convex Functions


If the constraint functions gi are convex, then the feasible set F g,b is convex. Various books
on optimization, including Sundaram [14] and Bazaraa et al [2], weaken the convexity assump-
tions on the constraint functions. A function g : Rn → R is quasi-convex provided that
{ x ∈ Rn : g(x) ≤ b } is convex for every b ∈ R. Bazaraa et al [2] also weaken the as-
sumption on the objective function to be pseudo-concave or pseudo-convex. (See the definition
3.4. Extrema with Inequality Constraints: Sufficient Conditions 89

following Theorem 3.32.) Rather than focusing on quasi-convex and pseudo-convex functions,
we consider rescalings of convex functions. The next theorem shows that constraints that are
rescaled convex functions have a convex feasible set and so are quasi-convex. Then Corol-
lary 3.26 shows that we can use more general exponents in Cobb-Douglas functions than the
preceding two proposition allowed.
Definition. A function g : D ⊂ Rn → R is a rescaling of ĝ : D → R provided that there
is an increasing function φ : R → R such that g(x) = φ ◦ ĝ(x). Note that since φ has an
inverse, ĝ(x) = φ - 1 ◦ g(x).
We say that φ is a C 1 rescaling provided that φ is C 1 and φ0 (y) > 0 for all y ∈ R.
If g : D ⊂ Rn → R is a rescaling of a convex (resp. concave) function ĝ : D → R, then
we say that g(x) is a rescaled convex function (resp. rescaled concave function). Similarly, if
g : D ⊂ Rn → R is a C 1 rescaling of a convex (resp. concave) function ĝ : D → R, then
we say that g(x) is a C 1 rescaled convex function (resp. C 1 rescaled concave function).
Theorem 3.25 (Rescaling). Assume that g : D ⊂ Rn → R is a rescaling of a convex function
ĝ : D → R, g(x) = φ ◦ ĝ(x). Then F g,b is convex for any b ∈ R, and g is quasi-convex.
The proof follows because F = { x ∈ D : ĝ(x) ≤ φ - 1 (b) } is convex.
g,b

Proposition 3.26 (Cobb-Douglas). If a1 , . . . , an > 0, then g(x) = xa1 1 · · · xann is a C 1


rescaling of a C 1 concave function on Rn++ and g(x) is a C 1 rescaling of a C 1 convex
function on Rn++ .
Proof. Let A = a1 + · · · + an and bi = ai/(2A), for 1 ≤ i ≤ n, so b1 + · · · + bn = 12 < 1.
Then ĝ(x, y, z) = xb11 · · · xbnn is a C 1 concave function on Rn++ . The function φ(y) = y 2A
is a C 1 rescaling such that φ ◦ ĝ(x) = g(x) on R++ .

Example 3.27. The Cobb-Douglas function f (x, y) = xy is a rescaled concave function, but
not concave. See Figure 3.4.4. 

xy
1 1
x3 y3

y
y

x x
(a) Non-concave (b) Concave

Figure 3.4.4. Cobb-Douglas function for (a) a = b = 1, (b) a = b = 1/3

Example 3.28. The function f (x) = x6 − 2.9 x4 + 3 x2 has its graph given in Figure 3.4.5.
The derivative is f 0 (x) = x[6 x4 − 11.6 x2 + 6], and f (x) has a single critical point at
x = 0. The second derivative f 00 (x) = 30 x4 − 34.8 x2 + 6 has zeroes at ±0.459 and
±0.974, and f 00 (x) < 0 for x ∈ [ 0.974, 0.459] ∪ [0.459, 0.974]. Thus, f (x) is not
convex. The function fˆ(x) = x6 is convex. Using the inverse of f (x) for positive values of
x, φ(y) = [f - 1 (y)] satisfies φ ◦ f (x) = fˆ(x) and is a rescaling of f (x) to fˆ(x). Thus,
6

f (x) is a rescaled convex function that is not convex. 


90 3. Constrained Extrema

f (x) x6

x x
1.3 1.3 1.3 1.3
Figure 3.4.5. A rescaled convex function of Example 3.28

The following corollary of the KKT Theorem 3.19 allows us to rescale the objective func-
tion as well as the constraints. Thus the general Cobb-Douglas functions considered in the last
proposition can be used as objective functions even though they are not concave, and the KKT
Theorem 3.19 does not apply.
Corollary 3.29 (KKT for Rescaled Functions). Assume that gi : D ⊂ Rn → R are C 1 for
1 ≤ i ≤ m, each of the constraints gi is a C 1 rescaled convex function, and
x∗ ∈ F g,b = { x ∈ D : gi (x) ≤ bi for 1 ≤ i ≤ m }.
a. Assume that f : Rn → R is a C 1 rescaled concave function.
i. If (x∗ , λ∗ ) satisfies KKT-1,2,3 with all the λ∗i ≥ 0, then f has a maximum on
F g,b at x∗ .
ii. If f has a maximum on F g,b at x∗ , and F g,b satisfies the Slater condition,
then there exist λ∗ = (λ∗1 , . . . , λ∗m ) such that (x∗ , λ∗ ) satisfies conditions KKT-
1,2,3 with all the λ∗i ≥ 0.
b. Assume that f is a C 1 rescaled convex function.
i. If (x∗ , λ∗ ) satisfies KKT-1,2,30 with all the λ∗i ≤ 0, then f has a minimum on
F g,b at x∗ .
ii. If f has a minimum on F g,b at x∗ , and F g,b satisfies the Slater condition,
then there exist λ∗ = (λ∗1 , . . . , λ∗m ) such that (x∗ , λ∗ ) satisfies conditions KKT-
1,2,30 with all the λ∗i ≤ 0.
Proof. Assume ĝi (x) = φi ◦ gi (x) with ĝ a convex C 1 function, φi : R → R C 1 , and
φ0i (bi ) > 0 for all bi ∈ R. Similarly, assume fˆ, fˆ(x) = T ◦ f (x) with fˆ a concave (respect.
convex) C 1 function, T : R → R C 1 , and T 0 (y) > 0 for all y = f (x) with x ∈ F g,b .
Let b0i = φi (bi ). If gi (x∗ ) = bi is tight, then ĝi (x∗ ) = b0i ,
Dĝi (x∗ ) = φ0 (b0 )Dgi (x∗ )
i i and Dfˆ(x∗ ) = T 0 (f (x∗ )) Dfˆ(x∗ ).
(a.i) F g,b = { x ∈ U : gi (x) ≤ bi } = { x ∈ U : gˆi (x) ≤ b0i } is convex. If f satisfies
KKT-1. then
Dfˆ(x∗ ) = T 0 (f (x∗ )) Df (x∗ ) = T 0 (f (x∗ ))
X
λi Dgi (x∗ ),
i

so fˆ satisfies KKT-1,2 with multipliers T 0 (f (x∗ )) λi > 0. By Theorem 3.19(a.i), fˆ has a


maximum at x∗ . Since T is increasing, f has a maximum at x∗ .
(a.ii) If f has a maximum at x∗ then since T is increasing, fˆ has a maximum at x∗ .
Applying Theorem 3.19(a.ii) to fˆ and the ĝi on F g,b , we get that
T 0 (f (x∗ )) Df (x∗ ) = Dfˆ(x∗ ) =
X X
λi Dĝi (x∗ ) = λi φ0 (bi )Dgi (x∗ ),
i
i i

where we have use the fact that λi = 0 unless gi (x∗ ) = bi . Since, T 0 (f (x∗ )) > 0 and
φ0i (bi ) > 0 for all effective i, we get that conditions KKT-1.2 hold for f and the gi with
0
multipliers λi Ti (bi )/T 0 (f (x∗ )) > 0.
3.4. Extrema with Inequality Constraints: Sufficient Conditions 91

Example 3.30. Find the maximum of f (x, y, z) = xyz subject to g1 (x, y, z) = 2x + y +


2z − 5 ≤ 0, g2 (x, y, z) = x + 2y + z − 4 ≤ 0, g3 (x, y, z) = x ≤ 0, g4 (x, y, z) = y ≤ 0,
and g5 (x, y, z) = z ≤ 0.
The feasible set is F = { (x, y, z) ∈ R3 : gi (x, y, z) ≤ 0 for 1 ≤ i ≤ 5 }. The
constraints are linear so convex and F is convex. The objective function f is a C 1 rescaled
1
concave function on R3+ . We could maximize the function (xyz) 4 , but the KKT equations are
more complicated for this objective function.
Since the values with at least one variable zero is zero, 0 = f (0, y, z) = f (x, 0, z) =
f (x, y, 0), and f (x, y, z) > 0 on R3++ , the maximum occurs in R3++ ∩ F . Therefore, the
constraints gi for 3 ≤ i ≤ 5 are slack at the maximum and λ3 = λ4 = λ5 = 0.
On Rn++ , the conditions KKT-1,2 are
yz = 2λ1 + λ2 , (KKT-1)
xz = λ1 + 2λ2 ,
xy = 2λ1 + λ2 ,
0 = λ1 (5 − 2x − y − 2z), (KKT-2)
0 = λ2 (4 − x − 2y − z).
From the first and third equation, we see that yz = xy , so x = z . The two remaining
complementary slackness equations are
0 = λ1 (5 − 4x − y)
0 = λ2 (4 − 2x − 2y).
If both constraints are effective, then we can solve the equations
5 = 4y + y
2 = x + y,
to get that x = y = z = 1. For this point, the first two equations of KKT-1 become
1 = 2λ1 + λ2
1 = λ1 + 2λ2 ,

which have a solution λ1 = λ2 = 1/3 > 0. (It can be checked that there is no other solution
of the first order KKT conditions on R3++ ∩ F , i.e., when one or two of the constraints are
effective, but this is not necessary.)
We have shown that f is a rescaling of a concave function, all the gi are all convex
functions on R3+ , and that p∗ = (1, 1, 1) multipliers λ1 = λ2 = 1/3 > 0 satisfy satisfies
conditions KKT-1,2,3. By the Karush-Kuhn-Tucker Theorem under Convexity, f must have a
maximum on R3++ ∩ F at p∗ . But as we remarked earlier, this must be the maximum on all
of F since f (x, y, z) = 0 when one or more variable is zero. 
Remark. Although we do not need it, the feasible set F satisfies the Slater condition since
the point (0.5, 0.5, 0.5) in D = { (x, y, z) ∈ R3+ : gi (x, y, z) ≤ 0 for 1 ≤ i ≤ 5 } has all the
gi positive, the constraint functions satisfy the Slater condition.
We could also check that the constraint qualification is indeed satisfied on all of F . How-
ever, if we add another constraint, x + y + z − 3 ≤ 0, then p∗ can be shown to be a solution of
KKT-1,2,3 in R3++ . By the Karush-Kuhn-Tucker Theorem under Convexity, p∗ is a maximizer.
For this example, there are three effective constraints at p∗ , but the rank is still 2. Therefore,
this system does not satisfy the constraint qualification.
92 3. Constrained Extrema

3.4.4. Global Extrema for Concave Functions


We return to general facts about concave/convex functions that are the main steps to show that
a solution of KKT-1,2,3 is a maximizer. If M = max{ f (x) : x ∈ F } < ∞ exists, we
denote the set of maximizers by
F ∗ = { x ∈ F : f (x) = M }.
If there is no maximum then F ∗ is the empty set.
Theorem 3.31. Assume that f : F ⊂ Rn → R is concave. Then the following hold:
a. Any local maximizer of f is a global maximizer.
b. The set of maximizers F ∗ is either empty or convex.
c. If f is strictly concave, then F ∗ is either empty or a single point.
Proof. (a) If not, then there exists a local maximizer x∗ and z 6= x∗ such that f (z) > f (x∗ ).
For xt = (1 − t) x∗ + t z and 0 < t < 1,
f (xt ) ≥ (1 − t)f (x∗ ) + t f (z) > (1 − t)f (x∗ ) + t f (x∗ ) = f (x∗ ).
Since f (xt ) > f (x∗ ) for small t, f cannot have a local maximum at x∗ .
(b) Let M = max{ f (x) : x ∈ F }, x0 , x1 ∈ F ∗ , and xt = (1 − t)x0 + tx1 . Then
M ≥ f (xt ) ≥ (1 − t)f (x0 ) + t f (x1 ) = (1 − t)M + tM = M , so f (xt ) = M and xt is
in the set of optimizing points for all 0 ≤ t ≤ 1. This proves the desired convexity.
(c) The proof is like part (b) except there is a strict inequality if x0 6= x1 . This gives a
contradiction M > M , so there can be only one point.

The next theorem shows that for C 1 functions, a function is convex if and only if a condition
on the relationship between the function and its tangent plane is satisfied. This first order
condition is not used to check convexity or concavity of a function but is used in the succeeding
theorem to give first order conditions for a maximizer.
Theorem 3.32. Let D ⊂ Rn be open and convex. Assume f : D → R is C 1 .
a. f is convex iff f (x) ≥ f (p) + Df (p)(x − p) for all x, p ∈ D (the graph of f (x)
lies above the tangent plane at p.)
b. f is concave iff f (x) ≤ f (p) + Df (p)(x − p) for all x, p ∈ D (the graph of f (x)
lies below the tangent plane at p.)
Proof. We consider part (a) only since (b) is similar. (⇒) Assume that f is convex, x, p ∈ D ,
and xt = p + t (x − p) = (1 − t)p + tx. Then, f (xt ) ≤ (1 − t)f (p) + tf (x), so
f (xt ) − f (p)
Df (p)(x − p) = lim
t→0+ t
(1 − t) f (p) + t f (x) − f (p)
≤ lim
t→0+ t
t [f (x) − f (p)]
= lim
t→0+ t
= f (x) − f (p).
(⇐) Assume that f satisfies the first derivative condition of the theorem. Let xt = (1 − t) p +
t x and wt = x − xt = (1 − t)(x − p), so p − xt = − t/1 − t wt . Then,


f (p) − f (xt ) ≥ Df (xt )(p − xt ) = − t/1 − t Df (xt )wt



and
f (x) − f (xt ) ≥ Df (xt )(x − xt ) = Df (xt )wt .
3.4. Extrema with Inequality Constraints: Sufficient Conditions 93

Multiplying the first inequality by (1 − t), the second by t, and adding these two inequalities
together, we get
(1 − t) f (p) + t f (x) − f (xt ) ≥ 0, or
(1 − t) f (p) + t f (x) ≥ f (xt ).
This proves that the function is convex.
The first derivative condition of the previous theorem inspires the following definition.
Definition. A function f is pseudo-concave on a set D provided that if f (x) > f (p) for
x, p ∈ D , then Df (p)(x − p) > 0. A direct check shows that a C 1 rescaling of a concave
function is pseudo-concave.
The following generalizes the condition of being a critical point for a point that is on the
boundary.
Theorem 3.33. Assume that f : F ⊂ Rn → R is concave or pseudo-concave and x∗ ∈ F .
Then, x∗ maximizes f on F iff Df (x∗ )v ≤ 0 for all vectors v that point into F at x∗ .
F ), f attains a maximum at x∗ on F iff x∗ is a critical
In particular for x∗ ∈ int(F
point of f .
Proof. Since the directional derivatives exist for a concave function, we do not need to assume
that the function is C 1 .
(⇒) Assume x∗ maximizes f over F . Take a vector v that points into F at x∗ . Then
for small t ≥ 0, x∗ + tv ∈ F and f (x∗ + tv) ≤ f (x∗ ). Taking the derivative with respect
to t, we get that
f (x∗ + tv) − f (x∗ )
Df (x∗ )v = lim ≤ 0.
t→0+ t
This proves the desired inequality on these directional derivatives.
(⇐) Now, assume that Df (x∗ )v ≤ 0 for all vectors v that point into F at x∗ . If
f does not have a maximum at x∗ , there exists a point z ∈ F such that f (z) > f (x∗ ).
Let v = z − x∗ . If f is pseudo-concave, this leads to the contradiction that Df (x∗ )v =
Df (x∗ )(z−x∗ ) > 0. If f is concave, then v points into F at x∗ . Also, for xt = x∗ +t v =
(1 − t) x∗ + t z,
f (xt ) ≥ (1 − t) f (x∗ ) + t f (z) = f (x∗ ) + t [f (z) − f (x∗ )] so
∗ ∗
f (xt ) − f (x ) t [f (z) − f (x )]
Df (x∗ )v = lim ≥ lim = f (z) − f (x∗ ) > 0.
t→0+ t t→0+ t
This contradicts the inequality on the directional derivatives and shows that f must have a
maximum at x∗ .

3.4.5. Proof of Karush-Kuhn-Tucker Theorem


Proof of Theorem 3.19.a.i. Assume that f is concave and satisfies conditions KKT-1,2 at
λ∗ , x∗ ) with all the λ∗i ≥ 0 and F g,b is convex. Therefore, f restricted to F g,b is con-

cave. Let E be the set of indices of effective constraints at x∗ . Suppose v is a vector that points
into F g,b at x∗ . If i ∈
/ E, then gi (x∗ ) < bi , λ∗i = 0, and λ∗i Dgi (x∗ )v = 0. If i ∈ E, then
gi (x + tv) ≤ bi = gi (x∗ ) for t > 0,

gi (x∗ + tv) − gi (x∗ )


≤0 for t > 0, and
t
gi (x∗ + tv) − gi (x∗ )
Dgi (x∗ )v = lim ≤ 0.
t→0+ t
94 3. Constrained Extrema

Since all the λ∗i ≥ 0, λ∗i Dgi (x∗ )v ≤ 0. By the first order condition KKT-1
P
i
X
Df (x∗ )v = λ∗i Dgi (x∗ )v ≤ 0.
i

Since the directional derivative is negative for any vector pointing into F g,b , f has a maxi-
mum at x∗ by Theorem 3.33.

Pconcave and has a maximum on F g,b at x . Let the



Proof of 3.19.a.ii. Assume that f is
Lagrangian be L(λ λ, x) = f (x) + i λi (bi − gi (x)) as usual. We use two disjoint convex
sets to show that for correctly chosen fixed λ ∗ ≥ 0 the Lagrangian has an interior maximum
at x∗ .
The set Y = { (w, z) ∈ R × Rm : w > f (x∗ ) & z  0 } is convex for the constant

f (x ).
z
w∗ = f (x∗ )
Y
w

Figure 3.4.6. Separation of Y and X by a hyperplane

We claim that
X = { (w, z) ∈ R × Rm : w ≤ f (x) & z ≤ b − g(x) for some x ∈ Rn }
is also convex. Note that X is defined using Rn and not F g,b . Let (w0 , z0 ), (w1 , z1 ) ∈ X
have corresponding points x0 , x1 ∈ Rn . Set wt = (1 − t)w0 + tw1 , zt = (1 − t)z0 + tz1 ,
and xt = (1 − t)x0 + tx1 . Then,
f (xt ) ≥ (1 − t) f (x0 ) + t f (x1 ) ≥ (1 − t)w0 + tw1 = wt
g(xt ) ≤ (1 − t) g(x0 ) + t g(x1 ) ≤ (1 − t)(b − z0 ) + t(b − z1 ) = b − zt .
This shows that (wt , zt ) ∈ X and X is convex.
We next claim that X ∩ Y = ∅. Assume that there exists a (w, z) ∈ X ∩ Y . Because
the point is in Y , z  0 and w > f (x∗ ). Because the pair is in X , there exists x ∈ Rn
with w ≤ f (x) and z ≤ b − g(x). The n b − g(x) ≥ z  0, so x ∈ F g,b . Combining,
f (x) ≥ w > f (x∗ ) for x ∈ F g,b , which contradicts the fact that f has a maximum at x∗ on
F g,b . This contradiction shows that they have empty intersection.
By the separation theorem for disjoint convex sets (Theorem 1.68 in [14]), there exist a
nonzero vector (p, q) ∈ R × Rm such that
pw + q · z ≤ pu + q · v for all (w, z) ∈ X , (u, v) ∈ Y . (7)
We claim that (p, q) ≥ 0. Assume one of the components is negative. Fix (w, z) ∈ X
with corresponding point x. By taking the corresponding coordinate of (u, v) ∈ Y large and
positive, p u + q · v can be made arbitrarily negative, contradicting the separation inequality
(7). Thus, (p, q) ≥ 0.
Taking any x ∈ Rn , setting w = f (x) and z = b − g(x), and letting (u, v) ∈ Y
converge to (f (x∗ ), 0), we get that
p f (x) + q · (b − g(x)) ≤ p f (x∗ ) for all x ∈ Rn . (8)
3.4. Extrema with Inequality Constraints: Sufficient Conditions 95

We want to show p 6= 0. If p = 0, then inequality (8) yields q · (b − g(x)) ≤ 0 for all


x ∈ Rn . Taking x = x given by the Slater condition (with g(x)  b), we get that q ≤ 0
and so q = 0. Thus, if p = 0 then q = 0, which contradicts the  fact that (p, q) is not
identically zero. Thus, we have shown that p 6= 0. Setting λ ∗ = 1/p q = (q1/p, . . . , qm/p) ≥
0 gives λ∗i ≥ 0 for 1 ≤ i ≤ m, or condition KKT-3.
Inequality (8) and the definition of λ ∗ show that
X
λ∗ , x) = f (x) +
L(λ λ∗i (bi − gi (x)) ≤ f (x∗ ) for all x ∈ Rn . (9)
i

For x = x∗ , this gives that λ∗i (bi − gi (x∗ )) ≤ 0. But λ∗i ≥ 0 and bi − gi (x∗ ) ≥ 0, so
P
i
λ∗i (bi − gi (x∗ )) = 0 for 1 ≤ i ≤ m, or condition KKT-2.
∗ ∗ ∗
λ , x ) = f (x ), so substituting into (9),
Also L(λ
λ∗ , x) ≤ L(λ
L(λ λ ∗ , x∗ ) for all x ∈ Rn .
Thus with λ ∗ fixed, L(λ
λ∗ , x) has an interior maximum at x∗ and
X
0 = Dx L(λ λ∗ , x∗ ) = Df (x∗ ) − λ∗i Dgi (x∗ ),
i
or condition KKT-1.

3.4. Exercises
3.4.1. Which of the following functions are convex, concave, or neither? Why?
a. f (x, y) = 2x2 − 4xy − 7x + 5y .
b. f (x, y) = xe−x−5y .
c. f (x, y, z) = −x2 + 2xy − 3y 2 + 9x − 7y .
d. f (x, y, z) = 2x2 + y 2 + 2z 2 + xy − 3xz .
e. f (x, y, z) = −2x2 − 3y 2 − 2z 2 + 2xy + 3xz + yz .
3.4.2. Let f : Rn++ → R be defined by f (x1 , . . . , xn ) = ln(xα1 1 · · · xαnn ), where all the
αi > 0. Is f convex or concave?
3.4.3. Let D be a convex set and h : D → R a concave function.
a. Show that
F = { (x, y) ∈ D : 0 ≤ x, 0 ≤ y, 0 ≤ h(x, y) }
is convex.
b. Assume f : D → R is convex and f (y) < f (x) for x, y ∈ D . Let
xt = (1 − t)x + ty.
Show that f (xt ) < f (x) for 0 < t ≤ 1.
3.4.4. Consider the following problem:
Minimize: f (x1 , x2 , x3 ) = 2 x21 + 5 x22 + 3 x23 − 2 x1 x2 − 2 x2 x3
Subject to: 25 ≤ 4 x1 + 6 x2 + x3
xi ≥ 0 for i = 1, 2, 3.
a. What are the KKT-1,2,30 equations for this problem to have a minimum? Be
sure an list all the equations that must be satisfied. Then, solve these equations
for a solution (x∗1 , x∗2 , x∗3 ).
b. Explain why the objective function and constraints satisfy the assumptions for
a minimum of the Karush-Kuhn-Tucker Theorem under Convexity.
Note that the function f is a positive definite quadratic function.
c. Explain why (x∗1 , x∗2 , x∗3 ) must be a minimizer, including explaining how the
theorems apply and what other conditions need to be satisfied.
96 3. Constrained Extrema

3.4.5. This is a problem about maximization of the social welfare function


W (x1 , x2 , x3 ) = a1 ln(x1 ) + a2 ln(x2 ) + a3 ln(x3 ).
for the production of three outputs x1 , x2 , and x3 , where a1 , a2 , a3 > 0. There are
600 units of labor available and 550 units of land. Because of the requirements for
the production of each product, we have the constraints
2x1 + x2 + 3x3 ≤ 600,
x1 + 2x2 + x3 ≤ 550,
1 ≤ x1 , 1 ≤x2 , 1 ≤ x3 .
(Notice that for 0 < xi < 1, ai ln(xi ) < 0 would contribute a negative amount to
the social welfare.)
a. Write down the KKT-1,2,3 equations that must be solved to find a maximum
of W . Find the solution of these equations (x∗1 , x∗2 , x∗3 ).
b. Explain why the objective function and constraints satisfy the assumptions for
a maximum of the Karush-Kuhn-Tucker Theorem under Convexity.
c. Explain why (x∗1 , x∗2 , x∗3 ) must be a maximizer, including explaining how the
theorems apply and what other conditions need to be satisfied.
3.4.6. Consider the following problem:
Maximize : f (x, y) = −(x − 9)2 − (y − 8)2 ,
Subject to : 4y − x2 ≥ 0
x + y ≤ 24
x≥0
y ≥ 0.
a. Write out the KKT-1,2,3 equations for this maximization problem and find a
solution (x∗ , y ∗ ).
b. Why is the objective f concave and the constraints convex?
c. Why must the point (x∗ , y ∗ ) be a global maximizer on the feasible set F ?
d. Draw a rough sketch of the feasible set F of points satisfying the constraint
equations. Why does it satisfy the Slater conditions?
3.4.7. Consider the problem
Maximization: f (x, y, z) = xyz
Subject to: 2x + y + 2z ≤ 5,
x + 2y + z ≤ 4,
x + y + z ≤ 3,
0 ≤ x, 0 ≤ y , 0 ≤ z .
a. Write down the KKT-1,2,3 equations for a maximum on the feasible set. Find
a solution p∗ and λ ∗ to these equations.
b. Why are all the constraints convex on R3+ ? Why is f a rescaled concave
function on R3+ ?
c. Why must p∗ be a maximizer of f on the feasible set?
d. Show that the feasible set satisfies the Slater condition.
3.4.8. Consider the problem
Minimize: f (x, y) = x4 + 12 x2 + y 4 + 6 y 2 − xy − x − y
Subject to: x + y ≥ 6,
x − y ≥ 3,
0 ≤ x, 0 ≤ y .
3.4. Extrema with Inequality Constraints: Sufficient Conditions 97

a. Write down the KKT-1,2,30 equations for a minimum on the feasible set.
b. Find a solution to these equations where both constraints are tight.
c. Why must the solution found in part (b) be a minimizer of f on the feasible
set?

3.4.9. A firm produces a single output q with two inputs x and y , with production function
q = xy . The output must be at least q0 units, xy ≥ q0 > 0. The firm is obligated
to use at least one unit of x, x ≥ 1. The prices of x and y are p and 1 respectively.
Assume that the firm wants to minimize the cost of the inputs f (x, y) = px + y .
a. Is the feasible set closed? Compact? Convex?
b. Write down the KKT-1,2,30 equation for a minimum.
c. Find the minimizer by solving the KKT-1,2,30 equations.
Hints: (i) Note that one of the equations for KKT-1 implies that the multiplier
for 0 ≥ q0 −xy is nonzero and so this constraint must be effective at a solution.
(ii) If 0 ≥ 1 − x is tight, then q ≤ p because both multiplier must be less
than or equal to zero.
(iii) If the multiplier for 0 ≥ 1 − x is zero, then q ≥ p because x ≥ 1.

3.4.10. Consider the problem


n
X cj
Minimize: ,
j=1
xj
P n
Subject to: j=1 aj xj = b,
0 ≤ xj for j = 1, . . . , n,
where aj , b, and cj are all positive constants.
a. Write down the KKT-1,2,30 equations for a minimum on the feasible set.
Pn The equality constraint
Hint: Pn can be written as two inequality constraints,

j=1 a j x j ≤ b and j=1 a j x j ≤ b. Also, x  0.
∗ ∗
b. Find a solution x and λ to these equations.
c. Why must p∗ be a minimizer of f on the feasible set?

3.4.11. Let T > 1 be a fixed integer and 0 < δ < 1. Consider the following maximization
problem.
>
X 1
Maximize: δ j xj2 ,
j=1
n
j=1 xj ≤ 1,
P
Subject to:
xj ≥ 0 for j = 1, . . . T .
a. Write down the KKT-1,2,3 equations.
b. Consider the case when all the xj > 0. Solve the KKT-1,2,3 equations for a
solution. Hint: Why must the multiplier be nonzero, so the constraint tight?
How must xj be related to x1 ? Using the tight constraint, what must x1
equal?
c. Why must the solution found in part (b) be a maximizer?

3.4.12. Let I, pi > 0 for 1 ≤ i ≤ n. Show that


B (p, I) = {x ∈ Rn+ : p1 x1 + · · · + pn xn ≤ I}
satisfies Slater’s condition. Hint: Split up 21 I evenly among the amounts spent on
1
the various commodities, i.e., 2n I on each.
98 3. Constrained Extrema

3.4.13. Assume that gi : D ⊂ Rn → R for 1 ≤ i ≤ k are convex and bounded on D .


a. Show that f (x) = max1≤i≤k gi (x) is a convex function.
Hint: maxi { ai + bi } ≤ maxi { ai } + maxi { bi }.
b. Is g(x) = min1≤i≤k gi (x) convex? Why or why not?
Hint: min{ ai + bi } ≥ min{ ai } + min{ bi }.
c. If the gi are concave, is min1≤i≤k gi (x) concave?
3.4.14. Assume that f, g : Rn → R are C 1 , f is concave, and g is convex. Let F g,b =
{ x ∈ Rn+ : g(x) ≤ b }. Assume that F g,b satisfies Slater condition. Further
assume that f attains a maximum on F g,b at p∗ with p∗ ∈ Rn++ and Df (p∗ ) 6=
0.
a. Explain why g(p∗ ) must equal b.
b. Explain why p∗ is a minimizer of g(y) on { y ∈ Rn+ : f (y) ≥ f (p∗ ) }.
3.4.15. Give an example of a concave function f : R+ → R that is bounded above but does
not attain a maximum. Check whether the constraint qualification is satisfied at p∗ .
1
3.4.16. Let f (x, y) = (xy) /3 and D = {(x, y) ∈ R2+ : x + y ≤ 2 }. You may use the
fact that the maximal value of f on D is 1. Let
X = (w, z1 , z2 , z3 ) ∈ R4 : w ≤ f (x, y), z1 ≤ x, z2 ≤ y,


z3 ≤ 2 − x − y for some (x, y) ∈ R2+ ,


Y = (w, z1 , z2 , z3 ) ∈ R4 : w > 1, z1 > 0, z2 > 0, z3 > 0 .




a. Show that f is concave on R2++ , continuous on R2+ , and so concave on R2+ .


b. Show that X and Y are convex.
c. Show that X ∩ Y = ∅.
d. Let (p, q1 , q2 , q3 ) = (1, 0, 0, 1/3). Show that
(1, 0, 0, 1/3) · (w, z1 , z2 , z3 ) ≤ (1, 0, 0, 1/3) · (u, v1 , v2 , v3 ),
for all (w, z1 , z2 , z3 ) ∈ X and (u, v1 , v2 , v3 ) ∈ Y .
e. Conclude that f (x, y) + (2 − x − y)/3 ≤ 1 for all (x, y) ∈ R2+ .

3.5. Second-Order Conditions for Extrema of


Constrained Functions
In this section we derive a second derivative test for local extrema with equality constraints.
This material is optional and is mainly provided as a reference.
Lemma 3.34. Assume that x∗ and λ ∗ = (λ∗1 , . . . , λ∗k ) satisfy the first-order Lagrange multi-
plier conditions. If r(t) is a curve in the level set g - 1 (b) with r(0) = x∗ and r0 (0) = v, then
the second derivative of the composition f ◦ r(t) is given by the following formula:
" k
#
d2
X
> ∗
2
λ` D g` (x ) v = v> Dx2L(λ
∗ 2 ∗
λ∗ , x∗ ) v.

f (r(t))
= v D f (x ) −
dt 2 t=0 `=1

Proof. Using the chain rule and product rule,

d/dt f(r(t)) = Df(r(t)) r′(t) = Σ_{i=1}^{n} ∂f/∂x_i (r(t)) r′_i(t)   (by the chain rule)

and

d²/dt² f(r(t)) |_{t=0} = Σ_{i=1}^{n} d/dt[ ∂f/∂x_i (r(t)) ] |_{t=0} r′_i(0) + Σ_{i=1}^{n} ∂f/∂x_i (r(0)) r″_i(0)   (by the product rule)
= Σ_{i,j=1}^{n} ∂²f/∂x_i∂x_j (x∗) r′_i(0) r′_j(0) + Df(x∗) r″(0)   (by the chain rule)
= (r′(0))^⊤ D²f(x∗) r′(0) + Σ_{ℓ=1}^{k} λ∗_ℓ D(g_ℓ)(x∗) r″(0).

In the last equality, we used the fact that Df(x∗) = Σ_{ℓ=1}^{k} λ∗_ℓ D(g_ℓ)(x∗) and the definition of D²f.
We can perform a similar calculation for the constraint equation b_ℓ = g_ℓ(r(t)) whose derivatives are zero:

0 = d/dt g_ℓ(r(t)) = Σ_{i=1}^{n} ∂g_ℓ/∂x_i (r(t)) r′_i(t),
0 = d²/dt² g_ℓ(r(t)) |_{t=0} = Σ_{i=1}^{n} d/dt[ ∂g_ℓ/∂x_i (r(t)) r′_i(t) ] |_{t=0}
  = Σ_{i,j=1}^{n} ∂²g_ℓ/∂x_j∂x_i (x∗) r′_i(0) r′_j(0) + D(g_ℓ)(x∗) r″(0),   so
λ∗_ℓ D(g_ℓ)(x∗) r″(0) = −λ∗_ℓ (r′(0))^⊤ D²(g_ℓ)(x∗) r′(0).

Substituting this equality into the expression for the second derivative of f(r(t)),

d²/dt² f(r(t)) |_{t=0} = v^⊤ [ D²f(x∗) − Σ_{ℓ=1}^{k} λ∗_ℓ D²g_ℓ(x∗) ] v,

where v = r′(0). This is what is claimed. □

The next theorem uses the above lemma to derive conditions for local maxima and minima
in terms of the second derivative of the Lagrangian on the null space null (Dg(x∗ )).
Theorem 3.35. Assume f, g_i : R^n → R are C² for 1 ≤ i ≤ k. Assume that x∗ ∈ R^n and λ∗ = (λ∗_1, . . . , λ∗_k) satisfy the first-order conditions of the Theorem of Lagrange with rank(Dg(x∗)) = k. Set D²_x L∗ = D²_x L(λ∗, x∗) = D²f(x∗) − Σ_{ℓ=1}^{k} λ∗_ℓ D²g_ℓ(x∗).
a. If f has a local maximum on g⁻¹(b) at x∗, then v^⊤ D²_x L∗ v ≤ 0 for all v ∈ null(Dg(x∗)).
b. If f has a local minimum on g⁻¹(b) at x∗, then v^⊤ D²_x L∗ v ≥ 0 for all v ∈ null(Dg(x∗)).
c. If v^⊤ D²_x L∗ v < 0 for all v ∈ null(Dg(x∗)) ∖ {0}, then f has a strict local maximum on g⁻¹(b) at x∗.
d. If v^⊤ D²_x L∗ v > 0 for all v ∈ null(Dg(x∗)) ∖ {0}, then f has a strict local minimum on g⁻¹(b) at x∗.
e. If v^⊤ D²_x L∗ v is positive for some vector v ∈ null(Dg(x∗)) and negative for another such vector, then f has neither a local maximum nor a local minimum on g⁻¹(b) at x∗.

Proof. (b) We consider the case of minima. (The case of maxima just reverses the direction of the inequality.) Lemma 3.34 shows that

d²/dt² f(r(t)) |_{t=0} = v^⊤ D²_x L∗ v,

where v = r′(0). If x∗ is a local minimum on g⁻¹(b), then

d²/dt² f(r(t)) |_{t=0} ≥ 0

for any curve r(t) in g⁻¹(b) with r(0) = x∗. Thus, v^⊤ D²_x L∗ v ≥ 0 for any vector v in T_g(x∗). But we had by Proposition 3.8 that T_g(x∗) = null(Dg(x∗)), so part (b) of the theorem is proved.
(d) If v^⊤ D²_x L∗ v > 0 for all vectors v ≠ 0 in null(Dg(x∗)), then by Proposition 3.8 and Lemma 3.34,

d²/dt² f(r(t)) |_{t=0} = r′(0)^⊤ D²_x L∗ r′(0) > 0

for any curve r(t) in g⁻¹(b) with r(0) = x∗ and r′(0) ≠ 0. This latter condition implies that x∗ is a local minimizer on g⁻¹(b).
For part (e), if v^⊤ D²_x L∗ v is both positive and negative, then there are some curves where the value of f is greater than at x∗ and others on which the value is less. □

The preceding theorem shows that we need to consider the quadratic form v^⊤ D²_x L∗ v on the null space null(Dg(x∗)). The next theorem shows that this restricted quadratic form can be shown to be positive or negative definite by determinants of submatrices of D²L(λ∗, x∗).

Definition. Let L(λ, x) = f(x) + Σ_{1≤i≤k} λ_i (b_i − g_i(x)) be the Lagrangian. The second derivative of L with respect to all its variables is

H_n = D²L(λ∗, x∗) = ⎡ 0_k        Dg(x∗)  ⎤
                    ⎣ Dg(x∗)^⊤   D²_x L∗ ⎦ ,

which is obtained by “bordering” the n × n matrix D²_x L∗ with the k × n matrix Dg(x∗).
We assume that the rank of Dg(x∗) is k < n, so there is a k × k submatrix with nonzero determinant. To form the correct submatrices, we need to assume that the variables have been rearranged so the first k columns suffice and this k × k submatrix has nonzero determinant. For 1 ≤ ℓ ≤ n, the bordered Hessians H_ℓ are the (k + ℓ) × (k + ℓ) submatrices of H_n = D²L(λ∗, x∗) given as follows:

H_ℓ = ⎡ 0          · · ·  0           ∂g_1/∂x_1      · · ·  ∂g_1/∂x_ℓ     ⎤
      ⎢ ⋮                 ⋮           ⋮                     ⋮             ⎥
      ⎢ 0          · · ·  0           ∂g_k/∂x_1      · · ·  ∂g_k/∂x_ℓ     ⎥
      ⎢ ∂g_1/∂x_1  · · ·  ∂g_k/∂x_1   ∂²L∗/∂x_1²     · · ·  ∂²L∗/∂x_1∂x_ℓ ⎥
      ⎢ ⋮                 ⋮           ⋮                     ⋮             ⎥
      ⎣ ∂g_1/∂x_ℓ  · · ·  ∂g_k/∂x_ℓ   ∂²L∗/∂x_ℓ∂x_1  · · ·  ∂²L∗/∂x_ℓ²    ⎦ .

Theorem 3.36. Assume that f, g_i : R^n → R are C² for 1 ≤ i ≤ k and (λ∗, x∗) is a critical point of L satisfying (LM) with det[ (∂g_i/∂x_j (x∗))_{1≤i,j≤k} ] ≠ 0 and bordered Hessians H_ℓ.
a. The point x∗ is a local minimum of f(x) restricted to g⁻¹(b) iff
(−1)^k det(H_ℓ) > 0 for all k + 1 ≤ ℓ ≤ n.
(Notice that the sign given by (−1)^k depends on the rank k and not on ℓ.)
b. The point x∗ is a local maximum of f(x) restricted to g⁻¹(b) iff
(−1)^k det(H_{k+1}) < 0, (−1)^k det(H_{k+2}) > 0, and they continue to alternate signs up to det(H_n). These conditions can be written as (−1)^ℓ det(H_ℓ) > 0 for all k + 1 ≤ ℓ ≤ n.

Remark. For k constraints on n variables, the constraint set is parametrized by n − k variables, and the test for a local extremum requires checking the sign of the determinant of n − k submatrices, det(H_ℓ) for k + 1 ≤ ℓ ≤ n. The conditions can also be given by bordering D²_x L∗ with −Dg(x∗) instead of Dg(x∗), because the determinants do not change.

Proof. This proof is based on [7]. Let

B_1 = ( ∂g_i/∂x_j )_{1≤i,j≤k}   and   B_2 = ( ∂g_i/∂x_j )_{1≤i≤k, k+1≤j≤n}.

Let w = (x_1, . . . , x_k)^⊤ be the first k coordinates and z = (x_{k+1}, . . . , x_n)^⊤ be the last n − k coordinates. Then the null space can be expressed in terms of the z variables:

0 = B_1 w + B_2 z,
w = −B_1⁻¹ B_2 z = J z,

where J = −B_1⁻¹ B_2. Partitioning

D²_x L∗ = A = ⎡ A_11    A_12 ⎤
              ⎣ A_12^⊤  A_22 ⎦

into blocks, where A_11 is k × k, A_12 is k × (n − k), and A_22 is (n − k) × (n − k), the quadratic form on the null space has the following symmetric matrix E:

E = [ J^⊤  I ] ⎡ A_11    A_12 ⎤ ⎡ J ⎤ = [ J^⊤  I ] ⎡ A_11 J + A_12   ⎤
               ⎣ A_12^⊤  A_22 ⎦ ⎣ I ⎦              ⎣ A_12^⊤ J + A_22 ⎦
  = J^⊤ A_11 J + J^⊤ A_12 + A_12^⊤ J + A_22.

On the other hand, we can perform a (non-orthogonal) change of basis of the (n + k)-dimensional space on which the quadratic form H_n is defined:

⎡ I_k  0    0       ⎤ ⎡ 0_k    B_1     B_2  ⎤ ⎡ I_k  0    0       ⎤   ⎡ 0      B_1     0    ⎤
⎢ 0    I_k  0       ⎥ ⎢ B_1^⊤  A_11    A_12 ⎥ ⎢ 0    I_k  J       ⎥ = ⎢ B_1^⊤  A_11    C_12 ⎥
⎣ 0    J^⊤  I_{n−k} ⎦ ⎣ B_2^⊤  A_12^⊤  A_22 ⎦ ⎣ 0    0    I_{n−k} ⎦   ⎣ 0      C_12^⊤  E    ⎦

Here the matrix E induces the quadratic form on the null space as we showed above. Since the determinant of the change of basis matrix is one, this change of basis preserves the determinant of H_n, and also the determinants of H_ℓ for k + 1 ≤ ℓ ≤ n.

By using k row interchanges,

det(H_n) = det ⎡ 0      B_1     0    ⎤ = (−1)^k det ⎡ B_1^⊤  A_11    C_12 ⎤
               ⎢ B_1^⊤  A_11    C_12 ⎥              ⎢ 0      B_1     0    ⎥
               ⎣ 0      C_12^⊤  E    ⎦              ⎣ 0      C_12^⊤  E    ⎦
         = (−1)^k det(B_1^⊤) det ⎡ B_1     0 ⎤
                                 ⎣ C_12^⊤  E ⎦
         = (−1)^k det(B_1)² det(E).

This calculation carries over to all the H_ℓ: (−1)^k det(H_ℓ) = det(B_1)² det(E_{ℓ−k}). Therefore, we can use the signs of the determinants of the H_ℓ for k + 1 ≤ ℓ ≤ n to check the signs of the determinants of the principal submatrices E_{ℓ−k} with size ranging from 1 to n − k.
The quadratic form Q for A is positive definite on the null space iff the quadratic form for E is positive definite iff

(−1)^k det(H_ℓ) = det(B_1)² det(E_{ℓ−k}) > 0   for k + 1 ≤ ℓ ≤ n.

For the negative definite case, the quadratic form Q for A is negative definite on the null space iff the quadratic form for E is negative definite iff

(−1)^ℓ det(H_ℓ) = (−1)^{ℓ−k} det(B_1)² det(E_{ℓ−k}) > 0   for k + 1 ≤ ℓ ≤ n. □

Example 3.37. We check the second order conditions for the critical points of Example 3.9: f(x, y, z) = z on the set given by x + y + z = 12 and z = x² + y².
We considered this example earlier and found the two critical points (λ∗, µ∗, x∗, y∗, z∗) = (4/5, 1/5, 2, 2, 8) and (6/5, −1/5, −3, −3, 18).
For this example, n = 3 and k = 2, so there is n − k = 1 determinant to check. The Lagrangian is L = z + λ(12 − x − y − z) − µ(z − x² − y²). The bordered Hessian is

H₃ = D²L = ⎡  0    0   −1   −1   −1 ⎤
           ⎢  0    0   2x   2y   −1 ⎥
           ⎢ −1   2x   2µ    0    0 ⎥
           ⎢ −1   2y    0   2µ    0 ⎥
           ⎣ −1   −1    0    0    0 ⎦ .

At (λ∗, µ∗, x∗, y∗, z∗) = (4/5, 1/5, 2, 2, 8),

det(H₃) = det ⎡  0    0   −1   −1   −1 ⎤
              ⎢  0    0    4    4   −1 ⎥
              ⎢ −1    4   2/5   0    0 ⎥
              ⎢ −1    4    0   2/5   0 ⎥
              ⎣ −1   −1    0    0    0 ⎦ = 20 > 0,

as can be checked by row reduction or cofactor expansion.
Since (−1)^k = (−1)² = 1, k + 1 = n = 3, and (−1)^k det(H₃) > 0, the point (λ∗, µ∗, x∗, y∗, z∗) = (4/5, 1/5, 2, 2, 8) is a local minimum.
At (λ∗, µ∗, x∗, y∗, z∗) = (6/5, −1/5, −3, −3, 18),

det(H₃) = det ⎡  0    0   −1    −1   −1 ⎤
              ⎢  0    0   −6    −6   −1 ⎥
              ⎢ −1   −6  −2/5    0    0 ⎥
              ⎢ −1   −6    0   −2/5   0 ⎥
              ⎣ −1   −1    0     0    0 ⎦ = −20 < 0.

Since ℓ = n = 3 and (−1)^ℓ det(H₃) > 0, the point (λ∗, µ∗, x∗, y∗, z∗) = (6/5, −1/5, −3, −3, 18) is a local maximum.
These answers are compatible with the values of f(x, y, z) at the two critical points: on the constraint set, (λ∗, µ∗, x∗, y∗, z∗) = (4/5, 1/5, 2, 2, 8) is a global minimum, and (6/5, −1/5, −3, −3, 18) is a global maximum. □
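Since the entries of H₃ are explicit, the two determinants can also be verified numerically. The following sketch is a hypothetical check in Python with NumPy; the script and its names are our own illustration, not part of the text.

    # A minimal numerical check of Example 3.37 (hypothetical helper script,
    # assuming Python with NumPy is available).
    import numpy as np

    def bordered_hessian(lam, mu, x, y, z):
        # H3 = D^2 L in the variable order (lambda, mu, x, y, z) for
        # L = z + lam*(12 - x - y - z) - mu*(z - x^2 - y^2).
        return np.array([
            [ 0,    0,    -1,    -1,  -1],
            [ 0,    0,   2*x,   2*y,  -1],
            [-1,  2*x,  2*mu,     0,   0],
            [-1,  2*y,     0,  2*mu,   0],
            [-1,   -1,     0,     0,   0],
        ], dtype=float)

    for point in [(4/5, 1/5, 2, 2, 8), (6/5, -1/5, -3, -3, 18)]:
        H3 = bordered_hessian(*point)
        # n = 3, k = 2, so only det(H3) is needed; the test compares its
        # sign with (-1)^k = +1 (minimum) and (-1)^n = -1 (maximum).
        print(point, np.linalg.det(H3))   # prints 20.0 and -20.0 (up to rounding)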

Example 3.38. Consider the problem of finding the extreme points of f(x, y, z) = x² + y² + z² on 2 = z − xy. The points that satisfy the first order conditions for the method of Lagrange are

(λ∗, x∗, y∗, z∗) = (4, 0, 0, 2),
                 = (2, 1, −1, 1), and
                 = (2, −1, 1, 1).

The Lagrangian is L(λ, x, y, z) = x² + y² + z² − λz + λxy + 2λ with bordered Hessian matrix

H₃ = D²L = ⎡  0   y   x  −1 ⎤
           ⎢  y   2   λ   0 ⎥
           ⎢  x   λ   2   0 ⎥
           ⎣ −1   0   0   2 ⎦ .

At the point (λ∗, x∗, y∗, z∗) = (4, 0, 0, 2), expanding on the first row,

det(H₃) = det ⎡  0   0   0  −1 ⎤ = det ⎡  0   2   4 ⎤ = 12 > 0.
              ⎢  0   2   4   0 ⎥       ⎢  0   4   2 ⎥
              ⎢  0   4   2   0 ⎥       ⎣ −1   0   0 ⎦
              ⎣ −1   0   0   2 ⎦

Since n = 3 and k = 1 and both (−1)^k det(H₃) < 0 and (−1)^n det(H₃) < 0, this fails the test for either a local minimum or a local maximum, so the point is not a local extremum.
The calculation at the other two points is similar, so we consider the point (λ∗, x∗, y∗, z∗) = (2, −1, 1, 1). The partial derivative g_x(−1, 1, 1) = −1 ≠ 0, so

H₂ = ⎡ 0   y   x ⎤ = ⎡  0   1  −1 ⎤
     ⎢ y   2   λ ⎥   ⎢  1   2   2 ⎥
     ⎣ x   λ   2 ⎦   ⎣ −1   2   2 ⎦ .

Expanding det(H₂) on the first row,

det(H₂) = −(1) det ⎡  1   2 ⎤ + (−1) det ⎡  1   2 ⎤ = −4 − 4 = −8 < 0.
                   ⎣ −1   2 ⎦            ⎣ −1   2 ⎦

Expanding det(H₃) on the fourth row,

det(H₃) = det ⎡  0   1  −1  −1 ⎤ = det ⎡ 1  −1  −1 ⎤ + (2) det ⎡  0   1  −1 ⎤
              ⎢  1   2   2   0 ⎥       ⎢ 2   2   0 ⎥           ⎢  1   2   2 ⎥
              ⎢ −1   2   2   0 ⎥       ⎣ 2   2   0 ⎦           ⎣ −1   2   2 ⎦
              ⎣ −1   0   0   2 ⎦
        = 0 + 2(−8) = −16 < 0.

Since k = 1, (−1)^k det(H₃) > 0 and (−1)^k det(H₂) > 0, and this point is a local minimum. A similar calculation at (λ∗, x∗, y∗, z∗) = (2, 1, −1, 1) shows that it is also a local minimum. □

3.5. Exercises
3.5.1. Find the points satisfying the first order conditions for a constrained extremum and then apply the second order test to determine whether they are local maxima or local minima.
a. f (x, y, z) = xyz and g(x, y, z) = 2x + 3y + z = 6.
b. f (x, y, z) = 2x + y 2 − z 2 , g1 (x, y, z) = x − 2y = 0, and g2 (x, y, z) =
x + z = 0.

3. Exercises for Chapter 3


3.1. Indicate which of the following statements are true and which are false. Justify each
answer: For a true statement explain why it is true and for a false statement either
indicate how to make it true or indicate why the statement is false. In several of these
parts, F g,b = { x ∈ Rn : gi (x) ≤ bi for 1 ≤ i ≤ m }.
a. If F g,b is convex, then each of the gi must be convex.
b. If f : D ⊂ R^n → R is continuous and f(x) attains a maximum on D, then D is compact.
c. If f, gi : Rn → R are C 1 for 1 ≤ i ≤ m, f (x) satisfies KKT-1,2,3 at
x∗ ∈ F g,b , and the constraint qualification holds at x∗ , then x∗ must be a
maximizer of f (x) on F g,b .
d. If f : R^n → R is C¹ and x∗ ∈ F_{g,b} satisfies KKT-1,2,3 and is a maximizer of f, then f must be concave.
e. Assume that f, g_i : R^n → R are C¹ for 1 ≤ i ≤ m, the constraint qualification is satisfied at all points of F_{g,b}, and p_1, . . . , p_k are the set of all the points in F_{g,b} that satisfy KKT-1,2,3. Then, f(x) attains a maximum on F_{g,b} and max{ f(x) : x ∈ F_{g,b} } = max{ f(p_j) : 1 ≤ j ≤ k }.
f. Let f, g_j : R^n_+ → R be C¹ for 1 ≤ j ≤ m, F_{g,b} = { x : g_j(x) ≤ b_j for 1 ≤ j ≤ m }, and {x∗_k}_{k=1}^{K} be the set of points in F_{g,b} where either (i) the KKT-1,2,3 conditions hold or (ii) the constraint qualification fails for F_{g,b}. Then f must have a maximum on F_{g,b} at one of the points {x∗_k}_{k=1}^{K}.
g. Assume that gj : Rn+ → R are continuous and convex for 1 ≤ j ≤ m,
f : Rn → R is concave, and f has a local maximum on F g,b at x∗ . Then
x∗ is a global maximizer of f on F g,b .
h. If F g,b is convex and f is concave on F g,b , then f must have a maximum
on F g,b .
i. To find the maximum of f (x, y, z, w) subject to the three constraints
gi (x, y, z, w) = bi for i = 1, . . . , 3 using the Lagrange Multiplier Theorem,
one has to solve a system of 4 equations with 4 unknowns.
j. Consider a level set F = { x ∈ R^n : g_i(x) = b_i for 1 ≤ i ≤ k }. If x∗ is a maximizer of f(x) on F with ∇f(x∗) = Σ_{i=1}^{k} λ∗_i ∇g_i(x∗), then all the λ∗_i must satisfy λ∗_i ≥ 0.
k. If f, g_i : D ⊂ R^n → R are C¹, x∗ is a maximizer of f on F_{g,b}, and (λ∗, x∗) satisfies KKT-1, then for 1 ≤ i ≤ k, either λ∗_i > 0 or g_i(x∗) < b_i.

l. For a C 1 function f (x) on F = { x ∈ R2+ : 1 − x1 x2 ≤ 0 }, if (x∗ , λ∗ )


satisfy KKT-1,2,3 with λ∗ > 0, then x∗ must be a maximizer.
Chapter 4

Dynamic Programming

This chapter focuses on maximizing over more than one time period where unused resources
are carried forward to the next time period. The easier case considers a finite number of time
periods, which is said to have a finite horizon. This type of problem can be solved as a Karush-
Kuhn-Tucker problem with many variables. However, a simpler method is to solve the problem
recursively, one period at a time. This latter approach is called dynamic programming. The
harder case considers an infinite number of time periods, or infinite horizon. Although the
equation to be solved looks similar to the finite-horizon case, the infinite-horizon case requires
solving an equation for a function rather than just the value of the function in some Rn . Also,
proving that a solution exists is much harder.
In both the finite-horizon and infinite-horizon dynamic programs, the function maximized
is restricted to a domain that usually depends on the state, i.e., depends on a parameter. Such
an assignment of a set by a parameter is called a correspondence. For each parameter, there
can be more than one maximizer, so the set of maximizers in general is also a correspondence
and not a function. We start the chapter by considering correspondences and various types of
continuity as the parameter varies. The Parametric Maximization Theorem indicates how the
set of maximizers and the maximal value vary with the parameter. This theorem is at the heart
of the solution method of dynamic programming problems.

4.1. Parametric Maximization and Correspondences
In this section, we introduce correspondences and indicate how they arise in maximization
problems that depend on a parameter. In game theory, the set of best responses to the choices
by the opponents is an example of a correspondence. See Example 4.4.
Example 4.1. The assignment C₁(s) = [0, s] of an interval that depends on the parameter 0 ≤ s ≤ 1 is an example of a correspondence. Its graph is the set of points

Gr(C₁) = { (s, x) : 0 ≤ s ≤ 1, x ∈ [0, s] }

and is shown in Figure 4.1.1.
The general definitions are as follows.
Definition. A correspondence C from S ⊂ R^ℓ to X ⊂ R^n is a map that associates a nonempty subset C(s) of X to each s ∈ S. Let P(X) be the power set of X, the collection of all nonempty subsets of X. Thus, C takes its values in P(X), C : S → P(X).
The graph of a correspondence C : S → P(X) is the set

Gr(C) = { (s, x) : s ∈ S, x ∈ C(s) } ⊂ S × X.


Figure 4.1.1. Graph of correspondence C₁(s) = [0, s]

Definition. For a correspondence C : S → P(R^n), various properties that we use in our considerations are defined as follows.
C is bounded provided that there is a K > 0 such that C(s) ⊂ B(0, K) for every s ∈ S.
C is locally bounded provided that for each s₀ ∈ S, there are δ > 0 and K > 0 such that C(s) ⊂ B(0, K) for every s ∈ B(s₀, δ) ∩ S.
C is closed-graphed provided that its graph Gr(C) is a closed subset of S × R^n.
C is closed-valued (resp. compact-valued, or convex-valued) provided that C(s) is a closed (resp. compact, or convex) subset of R^n for every s ∈ S.

Remark. The terms that include “valued” refer to properties for each parameter s and not properties for all s at once. Thus, a closed-valued correspondence is not necessarily closed-graphed, but a closed-graphed correspondence is closed-valued.
We next give several additional examples of correspondences with various properties.
Example 4.2. With S = [0, 2] and X = R, define two correspondences by

C₂(s) = [1, 2] for 0 ≤ s < 0.5 or 1.5 < s ≤ 2,   [0, 3] for 0.5 ≤ s ≤ 1.5;
C₃(s) = [1, 2] for 0 ≤ s ≤ 0.5 or 1.5 ≤ s ≤ 2,   [0, 3] for 0.5 < s < 1.5.

Figure 4.1.2 shows their graphs. These correspondences differ only for s = 0.5 and 1.5. Both C₂ and C₃ are compact-valued and bounded; C₂ is closed-graphed but C₃ is not. □
Figure 4.1.2. Graphs of correspondences in Example 4.2

Example 4.3. The correspondence

C₄(s) = {1/s}   for s ≠ 0,
C₄(s) = {0}     for s = 0,

has a single point for each parameter s ∈ R and is neither bounded nor locally bounded near s = 0. See Figure 4.1.3. This correspondence is (i) closed-graphed and (ii) compact-valued since it is a single point for each s. □

Figure 4.1.3. Graph of correspondence C₄ in Example 4.3

Our main use of correspondences occurs in maximization problems with a parameter, whose general setup is as follows. As s varies over a parameter space S, assume that the feasible set F(s) ⊂ R^n can vary with the parameter and is compact for each s. Thus, F : S → P(R^n) is a compact-valued correspondence. Assume that f : Gr(F) ⊂ S × R^n → R is continuous. For each s ∈ S, denote the maximal value of f(s, x) subject to x ∈ F(s) by

f∗(s) = max{ f(s, x) : x ∈ F(s) }

and the set of feasible maximizers by

F∗(s) = { x ∈ F(s) : f(s, x) = f∗(s) }.

Since F(s) and F∗(s) are sets for each s, the assignments from s to F(s) or F∗(s) are examples of correspondences. On the other hand, f∗(s) is a real-valued function. The following examples illustrate how f∗(s) and F∗(s) can vary with s.
Example 4.4. Consider the example with f₁(s, x) = (s − 1/3) x for s ∈ [0, 1] = S₁ and x ∈ [0, 1] = F₁(s). (The feasible set does not vary with s in this example.) Since, for all x ∈ [0, 1],

∂f₁/∂x (s, x) = (s − 1/3)  is  < 0 for s < 1/3,  ≡ 0 for s = 1/3,  and  > 0 for s > 1/3,

we get

F∗₁(s) = {0} for s < 1/3,  [0, 1] for s = 1/3,  {1} for s > 1/3,   and
f₁∗(s) = 0 for s ≤ 1/3,  s − 1/3 for s > 1/3.

See Figure 4.1.4. The set-valued correspondence F∗₁(s) changes from {0} to {1} as s crosses 1/3, while the maximal value f₁∗(s) is continuous. Also, note that F∗₁(s) is (i) bounded, (ii) compact-valued, and (iii) closed-graphed. In a strategic game from game theory, if f₁(s, x) is your payoff for mixed strategies s by the other player and x by you, then F∗₁(s) is called the best response correspondence. □

Figure 4.1.4. Graphs of maximizer set and maximal-value function for Example 4.4

Example 4.5. Let f₂(s, x) = −¼x⁴ + ⅓sx³ + ½x² for s ∈ R = S₂ and x ∈ R = F₂(s). Its graph for different signs of s is given in Figure 4.1.5. Its partial derivative is f₂ₓ(s, x) = −x³ + sx² + x, so the critical points are 0 and x±ₛ = ½(s ± √(s² + 4)). The second partial derivative is f₂ₓₓ(s, x) = −3x² + 2sx + 1. At x = 0, f₂ₓₓ(s, 0) = 1 > 0, so 0 is a local minimum. At x±ₛ, f₂ₓₓ(s, x±ₛ) = −(x±ₛ)² + 2[−(x±ₛ)² + s x±ₛ + 1] − 1 = −(x±ₛ)² − 1 < 0, and these are local maxima. Therefore, f₂ attains a maximum at either x⁺ₛ or x⁻ₛ. For s = 0, x±₀ = ±1, f₂(0, ±1) = 1/4 > 0 = f₂(0, 0), and F∗₂(0) = {−1, 1}.

Figure 4.1.5. Graph of f₂(s, x) for Example 4.5 (panels for s < 0, s = 0, and s > 0)

Rather than calculate f₂(s, x±ₛ) as a function of s, we determine the sign of its (total) derivative with respect to s at these points:

d/ds f₂(s, x±ₛ) = f₂ₓ(s, x±ₛ) dx±ₛ/ds + ⅓(x±ₛ)³ = ⅓(x±ₛ)³.

This derivative has the same sign as x±ₛ, so

f₂(s, x⁻ₛ) > f₂(0, ±1) > f₂(s, x⁺ₛ) for s < 0,
f₂(s, x⁻ₛ) < f₂(0, ±1) < f₂(s, x⁺ₛ) for s > 0.

Thus,

F∗₂(s) = {x⁻ₛ} for s < 0,  {x⁻₀, x⁺₀} = {−1, 1} for s = 0,  {x⁺ₛ} for s > 0.

Figure 4.1.6. Graphs of maximizer set and maximal-value function for Example 4.5

Figure 4.1.6 shows the numerically calculated graphs of F∗₂ and f₂∗. The correspondence F∗₂ switches the maximizer from x⁻ₛ to x⁺ₛ at s = 0 but is (i) compact-valued, (ii) locally bounded, and (iii) closed-graphed. See Theorem 4.8. Theorem 4.8 also shows that f₂∗ must be continuous, as can be seen in Figure 4.1.6(b). □

For correspondences, we consider not only continuity but also two weaker properties called hemicontinuity. The precise conditions involve the neighborhood of a set that we define first.

Definition. For a set A ⊂ R^n, the ε-neighborhood of A is the set

B(A, ε) = { x ∈ R^n : there is a y ∈ A with ‖x − y‖ < ε }.

Figure 4.1.7. Neighborhood of a set

A function f(x) is continuous at x₀ provided that lim_{x→x₀} f(x) = f(x₀), i.e., for all ε > 0, there exists a δ > 0 such that if |x − x₀| < δ, then |f(x) − f(x₀)| < ε. The conclusion can be written as f(x) ∈ B(f(x₀), ε). If we apply a similar condition to a correspondence, it restricts the extent that the correspondence can expand or implode (vacate regions).

Definition. A compact-valued correspondence C : S ⊂ R^ℓ → P(X) is upper-hemicontinuous (uhc) at s₀ ∈ S provided that C(s) must remain inside a small neighborhood of C(s₀) for small changes of s away from s₀, i.e., for any ε > 0 there exists δ > 0 such that if s ∈ B(s₀, δ) ∩ S then C(s) ⊂ B(C(s₀), ε). This restricts the amount the correspondence can expand or go into new regions.
We say that C is upper-hemicontinuous on S if it is upper-hemicontinuous at each s ∈ S.

Definition. A compact-valued correspondence C : S → P(X) is lower-hemicontinuous (lhc) at s₀ ∈ S provided that for each x₀ ∈ C(s₀), there is a point of C(s) nearby, i.e., for any point x₀ ∈ C(s₀) and any ε > 0, there exists δ > 0 such that if s ∈ B(s₀, δ) ∩ S then B(x₀, ε) ∩ C(s) ≠ ∅, i.e., x₀ ∈ B(C(s), ε). This is equivalent to saying that for any ε > 0, there exists δ > 0 such that if s ∈ B(s₀, δ) ∩ S then C(s₀) ⊂ B(C(s), ε). This restricts the amount the correspondence can implode or vacate the region C(s₀).
We say that C is lower-hemicontinuous on S if it is lower-hemicontinuous at each s ∈ S.

Definition. A compact-valued correspondence is said to be continuous provided that it is both upper- and lower-hemicontinuous. Thus, it is continuous provided that for any ε > 0, there exists δ > 0 such that if s ∈ B(s₀, δ) ∩ S then C(s₀) ⊂ B(C(s), ε) and C(s) ⊂ B(C(s₀), ε). Thus for small ‖s − s₀‖ and for each point in C(s₀) or C(s), there is a point nearby in the other set, and the two sets are close to each other.
Although we give examples of lhc correspondences, we mainly use uhc and continuous
correspondences.
Example 4.6. The continuity of the correspondences given in the preceding examples is as follows: C₁ is continuous; C₂, F∗₁, and F∗₂ are upper-hemicontinuous but not lower-hemicontinuous nor continuous (C₂ at s = 0.5 or 1.5, F∗₁ at s = 1/3, and F∗₂ at s = 0); C₃ is lower-hemicontinuous but not upper-hemicontinuous nor continuous at s = 0.5 or 1.5; finally, C₄ is neither lower-hemicontinuous nor upper-hemicontinuous at s = 0, since for small s ≠ 0,

C₄(s) = {1/s} ⊄ B(C₄(0), ε) = (−ε, ε)   and
C₄(0) = {0} ⊄ B(C₄(s), ε) = (1/s − ε, 1/s + ε).

Remark. There is a related but different concept of upper- or lower-semicontinuous function.


Unfortunately, the graph of an upper-semicontinuous function is not the graph of an upper-
hemicontinuous correspondence. Because of this confusion, we will not consider the semi-
continuity of functions.
Proposition 4.7. Let C : S → P(X) be a compact-valued and locally bounded correspondence. Then, C is upper-hemicontinuous iff C is a closed-graphed correspondence.

Proof. Assume that C is not closed-graphed. Then there exists a point (s₀, x₀) ∈ cl(Gr(C)) ∖ Gr(C), so x₀ ∉ C(s₀). Since C(s₀) is compact, there exists ε > 0 such that x₀ ∉ B(C(s₀), ε). But since (s₀, x₀) is in the closure of the graph, for every δ > 0, there is a point (s_δ, x_δ) ∈ Gr(C) with ‖s_δ − s₀‖ < δ and ‖x_δ − x₀‖ < δ. By taking δ < ε/2, x_δ ∉ B(C(s₀), ε − δ) ⊃ B(C(s₀), ε/2), so C(s_δ) ⊄ B(C(s₀), ε/2). Since ‖s_δ − s₀‖ < δ, and δ > 0 is arbitrarily small, C is not upper-hemicontinuous at s₀.
If C is not upper-hemicontinuous at s₀, then there exists some ε > 0 such that for any δ > 0, there exists some (s_δ, x_δ) with ‖s_δ − s₀‖ < δ and x_δ ∈ C(s_δ) ∖ B(C(s₀), ε). If C is locally bounded, the (s_δ, x_δ) must accumulate to some point not on the graph, so the graph is not closed. □

Remark. The example C 4 given above shows why the correspondence must be locally bounded
in this proposition.
We can now state the principal result of the section which will be used in the rest of the
chapter.
Theorem 4.8 (Parametric Maximization Theorem). Assume that the feasible set F : S → P(X) is a compact-valued and continuous correspondence and f : Gr(F) ⊂ S × X → R is a continuous function.
Then, f∗(s) = max{ f(s, x) : x ∈ F(s) } is a continuous function and F∗(s) = { x ∈ F(s) : f(s, x) = f∗(s) } is a compact-valued upper-hemicontinuous correspondence on S.
If F∗(s) is a single point for each s, then this correspondence is continuous and defines a continuous function.
Remark. If f (s, x) is strictly concave as a function of x for each s, then each F ∗ (s) is a
single point and so F ∗ is continuous.
Remark. Examples 4.4 and 4.5 both satisfy the assumptions of the theorem and have sets of
maximizers that are upper-hemicontinuous but not continuous.
Example 4.9. Let S = X = R₊, F(s) = [0, s], and h : R₊ × R₊ → R₊ be defined by

h(s, x) = x^{1/2} + (s − x)^{1/2}.

The function h is continuous and the feasible set F(s) = [0, s] is a continuous correspondence, so the Parametric Maximization Theorem applies. The function h is differentiable for positive values of x and s, and the critical point satisfies

0 = ∂h/∂x = ½ x^{−1/2} − ½ (s − x)^{−1/2},
x^{−1/2} = (s − x)^{−1/2},
s − x = x,
s = 2x,
x̄ = ½ s ∈ [0, s].

Since ∂²h/∂x² = −¼ x^{−3/2} − ¼ (s − x)^{−3/2} < 0 for 0 < x < s, h(s, x) is a concave function of x, and x̄ is the unique maximizer on [0, s]. Also note that F∗(s) = {½ s} is a continuous correspondence and h∗(s) = (s/2)^{1/2} + (s/2)^{1/2} = 2^{1/2} s^{1/2} is continuous. □
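On a computer, the conclusions of the Parametric Maximization Theorem can be observed by brute force. The following sketch is a hypothetical illustration in Python with NumPy (not part of the text); it approximates f₁∗(s) and F∗₁(s) from Example 4.4 on a grid, and the printed values show the maximal value varying continuously while the maximizer jumps at s = 1/3.

    # Grid-search illustration of the Parametric Maximization Theorem
    # (hypothetical sketch, assuming Python with NumPy).
    import numpy as np

    xs = np.linspace(0.0, 1.0, 1001)        # discretized feasible set F(s) = [0, 1]

    def f1(s, x):
        return (s - 1.0/3.0) * x            # Example 4.4

    for s in [0.2, 0.3, 1.0/3.0, 0.4, 0.5]:
        values = f1(s, xs)
        fstar = values.max()                          # approximates f1*(s)
        maximizers = xs[np.isclose(values, fstar)]    # approximates F1*(s)
        print(f"s = {s:.3f}, f*(s) = {fstar:.4f}, "
              f"maximizers from {maximizers.min():.2f} to {maximizers.max():.2f}")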

4.1.1. Budget Correspondence for Commodity Bundles


In consumer theory, a commodity bundle of n items is an element of R^n₊. Often, it is restricted to a set based on prices of the commodities and wealth of the individual. We show explicitly that the set of allowable commodity bundles is a continuous correspondence.

Theorem 4.10. Let the budget correspondence B : R^{n+1}_{++} → P(R^n₊) be defined by

B(p, w) = { x ∈ R^n₊ : p · x ≤ w },

where the prices p_i > 0 for 1 ≤ i ≤ n, the wealth w > 0, and the parameter space is S = { (p, w) ∈ R^{n+1}_{++} }. Then, B is a continuous, compact-valued correspondence.
The following corollary follows immediately from the previous theorem and the Parametric Maximization Theorem.

Corollary 4.11. Assume u : R^n₊ → R is a continuous utility function. Let v : R^{n+1}_{++} → R be the indirect utility function that is the maximal value of utility on the budget constraint,

v(p, w) = u∗(p, w) = max{ u(x) : x ∈ B(p, w) },

and let d : R^{n+1}_{++} → P(R^n₊) be the demand correspondence of the bundles which achieve this maximum,

d(p, w) = B∗(p, w) = { x ∈ B(p, w) : u(x) = v(p, w) }.

Then the indirect utility function v is a continuous function and the demand correspondence d is a compact-valued upper-hemicontinuous correspondence.
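For a concrete utility function, the indirect utility and demand can be approximated by brute force over a discretized budget set. The sketch below is a hypothetical Python/NumPy illustration (the function name and the Cobb-Douglas utility u(x₁, x₂) = x₁x₂ are our own choices, not part of the text).

    # Approximate indirect utility v(p, w) and demand d(p, w) for
    # u(x1, x2) = x1*x2 (hypothetical sketch, assuming Python with NumPy).
    import numpy as np

    def demand(p, w, n=400):
        x1 = np.linspace(0.0, w / p[0], n)
        x2 = np.linspace(0.0, w / p[1], n)
        X1, X2 = np.meshgrid(x1, x2)
        # Restrict the utility to the budget set B(p, w).
        U = np.where(p[0]*X1 + p[1]*X2 <= w, X1 * X2, -np.inf)
        i = np.unravel_index(np.argmax(U), U.shape)
        return U[i], (X1[i], X2[i])    # v(p, w) and a maximizer in d(p, w)

    print(demand((1.0, 2.0), 4.0))     # analytically v = 2 and d = {(2, 1)}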
Proof of Theorem 4.10. Since B(p, w) is defined by linear constraints that allow equality, it is closed-valued. Since every x_i ≤ w/p_i, each B(p, w) is bounded and so compact-valued. Intuitively, it is a continuous correspondence, but we prove it explicitly by means of the following two lemmas.

Lemma 4.12. B is upper-hemicontinuous.

Proof. Since B is compact-valued, by Proposition 4.7, it suffices to prove that B is locally bounded and closed-graphed.
Take an allowable (p̄, w̄) ∈ R^{n+1}_{++}. The prices p̄ ≫ 0, so there exists δ > 0 such that p̄_i ≥ 2δ for all i. If ‖p − p̄‖ < δ, |w − w̄| < δ, and x ∈ B(p, w), then for any fixed 1 ≤ i ≤ n, p_i > p̄_i − δ ≥ δ, so

δ x_i ≤ p_i x_i ≤ Σ_j p_j x_j ≤ w ≤ w̄ + δ,   and
x_i ≤ (w̄ + δ)/δ.

This shows that there is one bound on points x ∈ B(p, w) for ‖p − p̄‖ < δ and |w − w̄| < δ, i.e., B(p, w) ⊂ [0, C]^n where C = (w̄ + δ)/δ, and B is locally bounded.
The function h(p, w, x) = min{ 0, w − p · x } is easily seen to be continuous. Restricting h to F = [0, C]^n for ‖p − p̄‖ < δ and |w − w̄| < δ, it follows that h∗(p, w) is continuous and F∗(p, w) is upper-hemicontinuous. Since h(p, w, 0) = 0 and h(p, w, x) ≤ 0, h∗(p, w) = 0, so w − p · x ≥ 0 for x ∈ F∗(p, w). Thus, F∗(p, w) = B(p, w) and the set of commodity bundles is upper-hemicontinuous as desired. □

Lemma 4.13. B is lower-hemicontinuous.

Proof. Fix (p₀, w₀) ∈ R^{n+1}_{++}, x₀ ∈ B(p₀, w₀), and ε > 0. Then there exists x̄ ∈ B(x₀, ε) such that x̄ ≫ 0 and w₀ − p₀ · x̄ > 0. The function g(p, w, x) = w − p · x is continuous and g(p₀, w₀, x̄) > 0, so there exists δ > 0 such that g(p, w, x̄) > 0 for (p, w) within δ of (p₀, w₀). Therefore, x̄ ∈ B(p, w) and x₀ ∈ B(B(p, w), ε). Since this is possible for any (p₀, w₀) ∈ R^{n+1}_{++}, x₀ ∈ B(p₀, w₀), and ε > 0, B is lower-hemicontinuous. □

4.1.2. Existence of a Nash Equilibrium


This subsection is not used elsewhere and can be skipped. Consider a two player strategic game where each player has a finite number of pure choices, n_i for the ith player. We label the choices of the ith player by integers 1 ≤ j ≤ n_i. Each player has a payoff u_i(j, k) that depends on the pure choices of both players. For the ith player, a mixed strategy is a distribution (s_{ij})_{j=1}^{n_i} such that each s_{ij} ≥ 0 and Σ_{j=1}^{n_i} s_{ij} = 1, where s_{ij} is the probability of playing strategy j. The set of all such mixed strategies S_i is a compact, convex subset of R^{n_i}, a simplex. The payoff on pure strategies induces a Bernoulli payoff function on mixed strategies

U_i(s₁, s₂) = Σ_{1≤j≤n₁, 1≤k≤n₂} s_{1j} s_{2k} u_i(j, k).

The functions U₁ and U₂ are continuous functions on S₁ × S₂. Denote the maximal payoff for the ith player in response to a mixed strategy s₋ᵢ of the other player by

m_i(s₋ᵢ) = max{ U_i(s_i, s₋ᵢ) : s_i ∈ S_i },

and the best response correspondence for player i by

b_i(s₋ᵢ) = { s_i : U_i(s_i, s₋ᵢ) = m_i(s₋ᵢ) }.

A Nash equilibrium is a pair of mixed strategies (s∗₁, s∗₂) such that s∗₁ ∈ b₁(s∗₂) is a best response to s∗₂ and s∗₂ ∈ b₂(s∗₁) is a best response to s∗₁, i.e., (s∗₁, s∗₂) ∈ b₁(s∗₂) × b₂(s∗₁).
Given s₋ᵢ, there are a finite number of pure strategies that realize m_i(s₋ᵢ), and b_i(s₋ᵢ) is the set of all convex combinations of these pure strategies. Therefore, the correspondence

(s₁, s₂) ∈ S₁ × S₂ ↦ b(s₁, s₂) = b₁(s₂) × b₂(s₁) ⊂ S₁ × S₂

is convex-valued. Since U₁ and U₂ are continuous and the feasible set is the same for all strategies, b₁(s₂) and b₂(s₁) are each upper-hemicontinuous and so is b(s₁, s₂). The existence of a Nash equilibrium in mixed strategies then follows from the Kakutani Fixed Point Theorem.
a Nash equilibrium in mixed strategies then follows from the Kakutani Fixed Point Theorem.
Theorem 4.14 (Kakutani). Let S be a non-empty, compact, and convex subset of some Euclidean space R^n and C : S → P(S) be an upper-hemicontinuous and convex-valued correspondence. Then, there exists a p∗ ∈ S such that p∗ ∈ C(p∗).

See [14] for more discussion of the Kakutani Theorem. The book [1] by Arrow and Hahn has a proof and applications to economics.
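For a concrete feel for best-response correspondences, one can tabulate b₁ numerically. The sketch below is a hypothetical Python/NumPy illustration (not part of the text): for the matching-pennies payoffs u₁(j, k) = ±1, it reports the pure best responses of player 1 against a grid of mixed strategies s₂ = (q, 1 − q). The jump at q = 1/2, where every mixed strategy is a best response, is the uhc-but-not-continuous behavior seen in Example 4.4.

    # Best responses of player 1 in matching pennies (hypothetical sketch,
    # assuming Python with NumPy). Rows of u1 are player 1's pure
    # strategies, columns are player 2's: u1[j, k].
    import numpy as np

    u1 = np.array([[ 1.0, -1.0],
                   [-1.0,  1.0]])

    for q in [0.3, 0.5, 0.7]:
        s2 = np.array([q, 1.0 - q])       # mixed strategy of player 2
        payoffs = u1 @ s2                 # expected payoff of each pure strategy
        m1 = payoffs.max()                # m_1(s2)
        # b_1(s2) is the set of convex combinations of the maximizing pure
        # strategies; we report which pure strategies achieve the maximum.
        best = np.flatnonzero(np.isclose(payoffs, m1))
        print(f"q = {q}: m1 = {m1:+.2f}, pure best responses = {best}")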

4.1. Exercises
4.1.1. Let S = [0, 1] (S = R in part e) and X = R. For each of the following correspondences C : S → P(R), (i) draw its graph and (ii) determine whether it is uhc and/or continuous. Hint: By Proposition 4.7, the correspondence is upper-hemicontinuous if and only if it is closed-graphed. (They satisfy the other assumptions of the proposition.)
a. C(s) = [0, 2s] for s ∈ [0, 1/2], and [0, 2 − 2s] for s ∈ (1/2, 1].
b. C(s) = [0, 1 − 2s] for s ∈ [0, 1/2], and [0, 2 − 2s] for s ∈ (1/2, 1].
c. C(s) = [0, 1 − 2s] for s ∈ [0, 1/2), and [0, 2 − 2s] for s ∈ [1/2, 1].
d. C(s) = {0, s} for s ∈ [0, 1] (two points for each s).
e. C(s) = {0} for s < 0 (one point for each s), and {−1, 1} for s ≥ 0 (two points for each s).

4.1.2. Let X = [0, 1] = S, and f : S × X → R be defined by f(s, x) = 3 + 2x − 3s − 5xs. Here, F(s) = [0, 1] for all s. Find f∗(s) and F∗(s) for each value of s. Using the f∗ and F∗ you have found, discuss why f∗(s) is a continuous function and F∗(s) is a uhc correspondence. (Do not just quote a theorem.) Hint: If f_x(s, x) > 0 for all x ∈ [0, 1], then the maximum occurs for x = 1. If f_x(s, x) < 0 for all x ∈ [0, 1], then the maximum occurs for x = 0.

4.1.3. Let f : R₊ × R₊ → R be defined by f(s, x) = (x − 1) − (x − s)². Define the correspondence F : R₊ → P(R₊) by F(s) = [0, 1] for s ≥ 0. Do the hypotheses of the Parametric Maximization Theorem hold for this problem? Verify through direct calculation whether the conclusions of the Parametric Maximization Theorem hold for F∗(s) and f∗(s).
Hint: Find the critical point x_s and verify that ∂²f/∂x² < 0. If x_s ∈ F(s), then it is the maximizer. If x_s ∉ F(s), is ∂f/∂x always positive or always negative on [0, 1]? Is the maximizer the right or left end point?

4.1.4. Let f(s, x) = sin(x) + sx, for s ∈ S = [−1, 1] and x ∈ F(s) = [0, 3π].
a. Discuss why the Maximum Theorem applies.
b. Without finding explicit values, sketch the graphs of f∗ and F∗. Discuss why these graphs look as they do and how they satisfy the conclusion of the Maximum Theorem.
Hint: Draw the graph of f(s, x) as a function of x for three cases of s: (i) s < 0, (ii) s = 0, and (iii) s > 0.

4.1.5. Let S = [0, 2], X = [0, 1], the function f : S × X → R be defined by f(s, x) = −(x + s − 1)², and the feasible correspondence by F(s) = [0, s]. Find f∗(s) and F∗(s) for each value of s. Draw the graphs of f∗ and F∗.
 
4.1.6. Let S = X = R. Let the function f : S × R → R be defined by



 x+1 for 1 ≤ x ≤ 1 − s,
 x − 1 − 2s for 1 − s ≤ x ≤ 1 − 2s,



for s < 0, f (s, x) = 0 for 1 − 2s ≤ x ≤ 1 + 2s

x + 1 + 2s for 1 + 2s ≤ x ≤ 1 + s,




x − 1

for 1 + s ≤ x ≤ 1.
f (0, x) = 0,



 x−1 for 1 ≤ x ≤ 1 + s,
x + 1 − 2s for 1 + s ≤ x ≤ 1 + 2s,



for s > 0, f (s, x) = 0 for 1 + 2s ≤ x ≤ 1 − 2s,




 x − 1 + 2s for 1 − 2s ≤ x ≤ 1 − s,
for 1 − s ≤ x ≤ 1.

 x+1

1+s 1+s
1 + 2s 1 1 1 + 2s
1 1 − 2s 1 − 2s 1
1−s 1−s
s<0 s>0

Let the feasible correspondence F : S → P (X) be defined by



[ 1, 1 + 4s]
 for s < 0,
F (s) = [ 1, 1] for s = 0,


[ 1 + 4s, 1] for s > 0.
a. Sketch the graph of F . Do f and F meet all the conditions of the Maximum
Theorem? If yes, justify your claim. If no, list all the conditions you believe
are violated and explain why you believe each of them is violated.
b. For each s, determine the value of f ∗ (s) and the set F ∗ (s), and sketch the
graphs of f ∗ and F ∗ .
Hint: Consider s > 0, s = 0, and s < 0 separately. Also, you may have to split up S into subintervals where F(s) contains the point that maximizes f(s, x) on [−1, 1] and where it does not.
c. Is f∗ continuous? Is F∗(s) ≠ ∅ for each s? If so, determine whether F∗ is uhc and/or continuous on S.
4.1.7. Let S = [0, 1] and X = [0, 2]. Let the feasible correspondence F : S → P(X) be defined by

F(s) = [0, 1 − 2s]   for s ∈ [0, 1/2],
F(s) = [0, 2 − 2s]   for s ∈ (1/2, 1].

Let the function f : S × X → R be defined by

f(s, x) = 0          if s = 0, x ∈ [0, 2],
f(s, x) = x/s        if s > 0, x ∈ [0, s),
f(s, x) = 2 − x/s    if s > 0, x ∈ [s, 2s],
f(s, x) = 0          if s > 0, x ∈ (2s, 2].

a. Sketch the graph of f for s > 0. Sketch the graph of F . Do f and F meet all
the conditions of the Maximum Theorem? If yes, justify your claim. If no, list
all the conditions you believe are violated and explain why you believe each
of them is violated. (Is f continuous at s = 0?)
b. For each s, determine the value of f ∗ (s) and the set F ∗ (s), and sketch the
graphs of f ∗ and F ∗ .
Hint: You may have to split up S into subintervals where F (s) contains the
point that maximizes f (s, x) on [0, 2] and where it does not.
c. Is f∗ continuous? Is F∗(s) ≠ ∅ for each s? If so, determine whether F∗ is uhc and/or continuous on S.

4.2. Finite-Horizon Dynamic Programming


This section and the next consider maximization over discrete time periods: this section considers the case of a finite number of time periods and the next the case when there are infinitely many periods. We start with a specific model problem of a one-sector economy rather than the general situation and use it to introduce the solution method and definitions of the key concepts.

Example 4.15 (Consumption and Savings). We consider a model of a one-sector economy


where a given amount of wealth can either be consumed in this time period with an immediate
reward or be invested and carried forward to the next time period. The problem is to maximize
the total reward over all periods.
Fix T ≥ 1 and consider time periods 0 ≤ t ≤ T , where t is an integer.
The initial wealth w0 ∈ R+ at period-0 is given. The wealth at period-t is derived by
choices of consumption at previous time periods and is denoted by wt . In a general situation,
wt is called the state.
If we have wealth wt ≥ 0 at time period-t, then we can choose a consumption ct with
0 ≤ ct ≤ wt , called the action at period-t. The interval F (wt ) = [0, wt ] is called the feasible
action correspondence.
A transition function w_{t+1} = f(w_t, c_t) = k(w_t − c_t) with k ≥ 1 is given and fixed and determines the wealth at the next time period in terms of the wealth and consumption at the present time period. It can be thought of as due to production or interest on the capital.
A utility function u(c) = √c gives the immediate value or payoff for the consumption at any one period. Because of a psychological factor of impatience, the period-t consumption valued back at the initial period is discounted by a factor of δ^t, where 0 < δ ≤ 1. Thus the period-t reward function is r_t(w_t, c_t) = δ^t u(c_t) = δ^t √c_t.

Problem: Given T, u, k, δ, and w₀, maximize Σ_{t=0}^{T} r_t(w_t, c_t) = Σ_{t=0}^{T} δ^t u(c_t), the total reward over all periods, subject to 0 ≤ c_t ≤ w_t and w_{t+1} = k(w_t − c_t) for 0 ≤ t ≤ T.
This finite-horizon problem can be solved using KKT, but this approach involves many
variables and equations. An easier approach is to break up the problem into simpler problems
at each time period: This approach treats it as what is called a dynamic programming problem.
A Markovian strategy profile σ = (σ₀, . . . , σ_T) is a rule σ_t for each period-t of a choice c_t = σ_t(w_t) as a function of only w_t; σ_t does not depend on the other w_{t′} for t′ ≠ t. We can pick recursively a Markovian strategy that maximizes the sum by backward induction starting at t = T and working backward to t = 0.
(T) For t = T, we want to maximize r_T(c) = δ^T u(c) = δ^T c^{1/2} for 0 ≤ c = c_T ≤ w_T. The payoff is strictly increasing, so the choice that maximizes the payoff is c̄_T = w_T. We denote this choice of the optimal strategy at period-T by

c̄_T = σ∗_T(w_T) = w_T.

The value function at the period-T is the maximal payoff at the period-T and is given by

V_T(w_T) = r_T(σ∗_T(w_T)) = δ^T w_T^{1/2}.
(T−1) Let w = w_{T−1} be the wealth at period t = T − 1. For an action c, the immediate payoff is r_{T−1}(c) = δ^{T−1} c^{1/2}. The wealth carried forward to the next period is w_T = f(w, c) = k(w − c), with maximal payoff at the period-T of V_T(k(w − c)) = δ^T k^{1/2} (w − c)^{1/2}. Thus, for any choice of consumption 0 ≤ c ≤ w, the sum of the immediate payoff and the optimal payoff of the wealth carried forward to the period-T is

h_{T−1}(w, c) = δ^{T−1} u(c) + V_T(k(w − c)) = δ^{T−1} c^{1/2} + δ^T k^{1/2} (w − c)^{1/2}.

To maximize h_{T−1} as a function of c with w_{T−1} as a parameter, we find a critical point:

0 = ∂h_{T−1}/∂c (w, c) = δ^{T−1} ½ c^{−1/2} + δ^T k^{1/2} ½ (w − c)^{−1/2} (−1),
c^{−1/2} = δ k^{1/2} (w − c)^{−1/2},
w − c = δ² k c,
w = (1 + δ² k) c,
c̄ = σ∗_{T−1}(w_{T−1}) = w_{T−1} / (1 + δ² k).

Since ∂²h_{T−1}/∂c² < 0 and 0 ≤ c̄ = σ∗_{T−1}(w_{T−1}) ≤ w_{T−1}, this critical point is a maximum. Thus, the optimal strategy is the choice c̄_{T−1} = σ∗_{T−1}(w_{T−1}) = w_{T−1}/(1 + δ² k). The value function at period T − 1 is the maximal payoff for periods T − 1 ≤ t ≤ T,

V_{T−1}(w_{T−1}) = h∗_{T−1}(w_{T−1}) = h_{T−1}(w_{T−1}, c̄) = δ^{T−1} c̄^{1/2} + δ^T [k(w_{T−1} − c̄)]^{1/2}
= δ^{T−1} c̄^{1/2} + δ^{T+1} k c̄^{1/2}
= δ^{T−1} (1 + δ² k) c̄^{1/2}
= δ^{T−1} (1 + δ² k) w_{T−1}^{1/2} / (1 + δ² k)^{1/2}
= δ^{T−1} (1 + δ² k)^{1/2} w_{T−1}^{1/2}.

We show by backward induction that

V_j(w_j) = δ^j (1 + δ² k + · · · + δ^{2T−2j} k^{T−j})^{1/2} w_j^{1/2}.

We have shown that this formula for the value function is valid for j = T and T − 1. Assume it is valid for j = t + 1, and we show the best choice of c_t at time t gives a similar expression for V_t. For a fixed w = w_t, we want to maximize

h_t(w, c) = r_t(w, c) + V_{t+1}(k(w − c))
= δ^t c^{1/2} + δ^{t+1} (1 + · · · + δ^{2T−2t−2} k^{T−t−1})^{1/2} k^{1/2} (w − c)^{1/2}

for 0 ≤ c ≤ w: the immediate payoff at period t is r_t(w, c) = δ^t c^{1/2} and the maximal payoff for periods t + 1 to T is V_{t+1}(k(w − c)). The critical point satisfies

0 = ∂h_t/∂c = δ^t ½ c^{−1/2} + δ^{t+1} (1 + · · · + δ^{2T−2t−2} k^{T−t−1})^{1/2} k^{1/2} ½ (w − c)^{−1/2} (−1),
c^{−1/2} = δ k^{1/2} (1 + · · · + δ^{2T−2t−2} k^{T−t−1})^{1/2} (w − c)^{−1/2},
w − c = δ² k (1 + · · · + δ^{2T−2t−2} k^{T−t−1}) c,
w = (1 + δ² k + · · · + δ^{2T−2t} k^{T−t}) c,
c̄ = w / (1 + · · · + δ^{2T−2t} k^{T−t}) = σ∗_t(w_t) ≤ w.

Since ∂²h_t/∂c² < 0 and 0 ≤ σ∗_t(w_t) ≤ w_t, this critical point is the maximizer. Thus, the optimal strategy is c̄_t = σ∗_t(w_t) = w_t / (1 + · · · + δ^{2T−2t} k^{T−t}) < w_t. In the calculation of the maximal value, we use that

[k(w_t − c̄)]^{1/2} = δ k (1 + · · · + δ^{2T−2t−2} k^{T−t−1})^{1/2} c̄^{1/2}.

The maximal payoff from period t onward is

V_t(w_t) = h∗_t(w_t) = δ^t c̄^{1/2} + δ^{t+1} (1 + · · · + δ^{2T−2t−2} k^{T−t−1})^{1/2} [k(w_t − c̄)]^{1/2}
= δ^t c̄^{1/2} + δ^{t+1} (1 + · · · + δ^{2T−2t−2} k^{T−t−1})^{1/2} δ k (1 + · · · + δ^{2T−2t−2} k^{T−t−1})^{1/2} c̄^{1/2}
= δ^t c̄^{1/2} + δ^t (δ² k + · · · + δ^{2T−2t} k^{T−t}) c̄^{1/2}
= δ^t (1 + δ² k + · · · + δ^{2T−2t} k^{T−t}) c̄^{1/2}
= δ^t (1 + δ² k + · · · + δ^{2T−2t} k^{T−t}) w_t^{1/2} / (1 + δ² k + · · · + δ^{2T−2t} k^{T−t})^{1/2}
= δ^t (1 + δ² k + · · · + δ^{2T−2t} k^{T−t})^{1/2} w_t^{1/2}.

This verifies the induction hypothesis for period t, so the formula is valid for all T ≥ t ≥ 0.
By induction, for each t with 0 ≤ t ≤ T, the optimal strategy is

c̄_t = σ∗_t(w_t) = w_t / (1 + δ² k + · · · + δ^{2T−2t} k^{T−t}),

and the maximal payoff for all periods t = 0 to T is given by

V₀(w₀) = (1 + δ² k + · · · + δ^{2T} k^T)^{1/2} w₀^{1/2}.

Thus, we have completely solved this problem. □
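Closed-form solutions like this one are easy to sanity-check numerically. The sketch below is a hypothetical Python illustration (not from the text; the values of T, k, δ, and w₀ are arbitrary choices) that compares the derived optimal strategy against a brute-force grid search over consumption paths for a small horizon.

    # Numerical check of the consumption-savings solution (hypothetical
    # sketch, assuming Python with NumPy).
    import numpy as np

    T, k, delta, w0 = 2, 1.2, 0.9, 1.0

    def S(t):
        # 1 + (delta^2 k) + ... + (delta^2 k)^{T-t}
        return sum((delta**2 * k)**m for m in range(T - t + 1))

    # Total reward of the strategy derived in the text.
    w, total = w0, 0.0
    for t in range(T + 1):
        c = w / S(t)                      # optimal consumption \bar{c}_t
        total += delta**t * np.sqrt(c)
        w = k * (w - c)
    print("formula V0(w0) :", np.sqrt(S(0) * w0))
    print("optimal path   :", total)

    # Brute-force search over consumption fractions on a grid.
    best = 0.0
    fractions = np.linspace(0.0, 1.0, 201)
    for a0 in fractions:                  # consume fraction a0 at t = 0, etc.
        for a1 in fractions:
            c0 = a0 * w0
            w1 = k * (w0 - c0)
            c1 = a1 * w1
            w2 = k * (w1 - c1)            # all of w2 is consumed at t = 2
            best = max(best, np.sqrt(c0) + delta*np.sqrt(c1) + delta**2*np.sqrt(w2))
    print("grid search    :", best)       # agrees with V0 up to grid resolution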
Example 4.16. This example incorporates production to determine the feasible consumption for each period. The labor force is assumed fixed, and the production is assumed to be w_t^β for capital w_t with 0 < β < 1. The consumption satisfies 0 ≤ c_t ≤ w_t^β and the transition function satisfies w_{t+1} = f_t(w_t, c_t) = w_t^β − c_t. Assume the utility is u(c) = ln(c) and the discounted reward is r_t(w, c) = δ^t ln(c), where 0 < δ ≤ 1. We also assume that T > 0 is given.
(t = T) For 0 ≤ c ≤ w^β, the maximum of δ^T ln(c) occurs for c̄_T = w^β = σ∗_T(w), with maximal value V_T(w) = δ^T β ln(w).
(t = T − 1) Let

h_{T−1}(w, c) = δ^{T−1} ln(c) + V_T(w^β − c) = δ^{T−1} ln(c) + δ^T β ln(w^β − c).

The critical point satisfies

0 = ∂h_{T−1}/∂c = δ^{T−1}/c − δ^T β/(w^β − c),
w^β − c = δβ c,
w^β = (1 + δβ) c,
c̄_{T−1} = σ_{T−1}(w) = w^β/(1 + δβ) ≤ w^β.

The value function is given by the maximal value,

V_{T−1}(w) = δ^{T−1} ln(c̄) + V_T(w^β − c̄) = δ^{T−1} ln(c̄) + δ^T β ln(δβ c̄)
= δ^{T−1} (1 + δβ) [β ln(w) − ln(1 + δβ)] + δ^T β ln(δβ)
= δ^{T−1} β (1 + δβ) ln(w) + v_{T−1},

where the constant v_{T−1} includes all the terms not involving w, only the parameters δ, β, and T.
For the induction hypothesis, assume that

V_j(w) = δ^j β (1 + δβ + · · · + δ^{T−j} β^{T−j}) ln(w) + v_j,

where v_j is a constant involving only the parameters δ, β, T, and j. Assume this is valid for j = t + 1 and verify it for t. Let

h_t(w, c) = δ^t ln(c) + V_{t+1}(w^β − c)
= δ^t ln(c) + δ^{t+1} β (1 + · · · + δ^{T−t−1} β^{T−t−1}) ln(w^β − c) + v_{t+1}.

The critical point, ∂h_t/∂c = 0, satisfies

0 = δ^t/c − δ^{t+1} β (1 + · · · + δ^{T−t−1} β^{T−t−1}) / (w^β − c),
w^β − c = (δβ + · · · + δ^{T−t} β^{T−t}) c,
w^β = (1 + δβ + · · · + δ^{T−t} β^{T−t}) c,
c̄_t = σ∗_t(w) = w^β / (1 + δβ + · · · + δ^{T−t} β^{T−t}) ≤ w^β.

The value function is given by the maximal value,

V_t(w) = δ^t ln(c̄) + V_{t+1}(w^β − c̄) = δ^t ln(c̄) + V_{t+1}((δβ + · · · + δ^{T−t} β^{T−t}) c̄)
= δ^t ln(c̄) + δ^t (δβ + · · · + δ^{T−t} β^{T−t}) [ln(c̄) + ln(δβ + · · · + δ^{T−t} β^{T−t})] + v_{t+1}
= δ^t (1 + δβ + · · · + δ^{T−t} β^{T−t}) [β ln(w) − ln(1 + δβ + · · · + δ^{T−t} β^{T−t})]
  + δ^t (δβ + · · · + δ^{T−t} β^{T−t}) ln(δβ + · · · + δ^{T−t} β^{T−t}) + v_{t+1}
= δ^t β (1 + δβ + · · · + δ^{T−t} β^{T−t}) ln(w) + v_t,

where the constant v_t includes all the terms involving only the parameters. This proves the induction step. □

4.2.1. Supremum and Infimum


Before discussing general finite-horizon dynamic programming, we generalize the concepts of maximum and minimum. When we discuss maximizing a function on a domain, we want to use the value that is a possible maximum even before we know that it is attained. For a function f : X → R, the supremum or least upper bound is the number M such that M is an upper bound, f(x) ≤ M for all x ∈ X, and there is no upper bound that is less than M. The supremum is infinity if f(x) is not bounded above. The supremum is denoted by sup{ f(x) : x ∈ X }. Thus, the function is bounded above if and only if it has a finite supremum. Note that the supremum is greater than or equal to any value of the function.
In the same way, the infimum or greatest lower bound is the number m such that m is a lower bound, f(x) ≥ m for all x ∈ X, and there is no lower bound that is greater than m. The infimum is minus infinity iff f(x) is not bounded below. The infimum is denoted by inf{ f(x) : x ∈ X }. Thus, the function is bounded below if and only if it has a finite infimum. Note that the infimum is less than or equal to any value of the function.
Example 4.17. The function arctan(x) is bounded on R but does not attain a maximum nor a minimum. However, inf{ arctan(x) : x ∈ R } = −π/2 and sup{ arctan(x) : x ∈ R } = π/2 are both finite.
The function f(x) = 1/x for x ≠ 0 has sup{ 1/x : x > 0 } = ∞, inf{ 1/x : x > 0 } = 0, sup{ 1/x : x < 0 } = 0, and inf{ 1/x : x < 0 } = −∞.
0, sup 1/x : x < 0 = 0, and inf 1/x : x < 0 = ∞.



4.2.2. General Theorems


Definition. A finite-horizon dynamic programming problem, FHDP, is specified as follows.
FH1. T is a positive integer, the horizon. The periods t are integers with 0 ≤ t ≤ T.
FH2. S is the state space, with the state at period t given by s_t ∈ S.
(In the C-S problem, S = [0, ∞) and s_t = w_t ∈ [0, ∞).)
FH3. A is the action space, with the action at period t given by a_t ∈ A.
(In the C-S problem, a_t = c_t ∈ [0, ∞) = A.)
FH4. For each integer 0 ≤ t ≤ T, a correspondence and two functions are given as follows.
i. F_t : S → P(A) is the feasible action correspondence, and is assumed to be a continuous and compact-valued correspondence. Only a_t ∈ F_t(s_t) are allowed.
(In the C-S problem, c_t ∈ [0, w_t] = F_t(w_t).)
ii. r_t : Gr(F_t) ⊂ S × A → R is the continuous period-t reward function.
(In the C-S problem, r_t(w_t, c_t) = δ^t c_t^{1/2}.)
iii. f_t : Gr(F_t) ⊂ S × A → S is the continuous period-t transition function, s_{t+1} = f_t(s_t, a_t). (In the C-S problem, f_t(w_t, c_t) = k(w_t − c_t).)

The total reward for initial state s₀ and allowable actions {a_t}_{t=0}^{T}, with a_t ∈ F_t(s_t) and s_{t+1} = f_t(s_t, a_t), is

W(s₀, {a_t}_{t=0}^{T}) = Σ_{t=0}^{T} r_t(s_t, a_t).

The value function of the continuation FHDP starting with state s_t at period t is defined as

V_t(s_t) = sup{ Σ_{j=t}^{T} r_j(s_j, a_j) : a_j ∈ F_j(s_j), s_{j+1} = f_j(s_j, a_j) for j = t, . . . , T }
         = sup{ W(s_t, {a_j}_{j=t}^{T}) : {a_j}_{j=t}^{T} is allowable },

and V(s₀) = V₀(s₀) is the value function for the whole FHDP. The value function is the maximal payoff for any choice of allowable actions.

We show that the total reward for various choices of actions does in fact attain a finite maximal value. The problem is to find this maximal value and actions that realize this maximum.

Definition. A Markovian strategy profile is a collection of choice functions σ = (σ₀, . . . , σ_T) with σ_t : S → A so a_t = σ_t(s_t) ∈ F_t(s_t) for 0 ≤ t ≤ T. So, each σ_t is a function of only s_t. For a non-Markovian strategy profile, each σ_t can be a function of (s₀, . . . , s_t) and not just s_t. For a Markovian strategy profile σ and initial state s₀, the actions and states at all periods are determined by induction as follows: s₀(s₀, σ) = s₀; for 0 ≤ t ≤ T, given s_t = s_t(s₀, σ),

a_t = a_t(s₀, σ) = σ_t(s_t),
r_t(s₀, σ) = r_t(s_t, a_t), and
s_{t+1} = s_{t+1}(s₀, σ) = f_t(s_t, a_t).

The total reward for a strategy profile σ and initial state s₀ is given by

W(s₀, σ) = Σ_{t=0}^{T} r_t(s₀, σ).

A strategy profile σ∗ is called an optimal strategy profile provided that W(s₀, σ∗) = V(s₀) for all s₀ ∈ S, i.e., it attains the maximal value of the value function.

Theorem 4.18 (FHDP Bellman Equation and Optimal Strategy). If a FHDP satisfies FH1–FH4, then the following hold.
a. For 0 ≤ t ≤ T, V_t attains a finite maximal value V_t(s_t) < ∞ for each s_t ∈ S, is continuous, and satisfies

V_t(s) = max{ r_t(s, a) + V_{t+1}(f_t(s, a)) : a ∈ F_t(s) }.   (10)

We take V_{T+1}(f_T(s, a)) ≡ 0, so the equation for t = T becomes V_T(s) = max{ r_T(s, a) : a ∈ F_T(s) }.
b. There exists a Markovian optimal strategy profile σ∗ = (σ∗₀, . . . , σ∗_T) such that W(s₀, σ∗) = V(s₀) for all s₀.

Remark. Equation (10) is called the Bellman equation and determines the solution method for a FHDP. First, the strategy a∗_T = σ∗_T(s_T) is determined that maximizes r_T(s_T, a_T). This action determines the value function V_T(s_T) = r_T(s_T, σ∗_T(s_T)). By backward induction, once the strategies σ∗_j and value functions V_j have been determined for T ≥ j ≥ t + 1, the strategy a∗_t = σ∗_t(s_t) is determined that maximizes h_t(s_t, a_t) = r_t(s_t, a_t) + V_{t+1}(f_t(s_t, a_t)), and the next value function is set equal to the maximal value V_t(s_t) = h∗_t(s_t) = r_t(s_t, σ∗_t(s_t)) + V_{t+1}(f_t(s_t, σ∗_t(s_t))). By induction, we get back to V(s₀) = V₀(s₀).
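The Bellman equation translates directly into an algorithm when the state and action spaces are discretized. The following sketch is a hypothetical Python implementation of the backward induction just described (the grids and the consumption-savings reward, transition, and feasible-action choices below are ours, made for illustration); it computes approximate value functions V_t and strategies σ_t on a grid.

    # Backward induction for a discretized FHDP (hypothetical sketch in
    # Python with NumPy; parameters chosen to match Example 4.15).
    import numpy as np

    T, k, delta = 2, 1.2, 0.9
    states = np.linspace(0.0, 1.5, 301)            # grid for the state space S

    def reward(t, s, a):   return delta**t * np.sqrt(a)      # r_t(s, a)
    def transition(s, a):  return k * (s - a)                # f_t(s, a)
    def feasible(s):       return np.linspace(0.0, s, 201)   # grid for F_t(s)

    V = [None] * (T + 2)
    V[T + 1] = lambda s: 0.0                       # V_{T+1} = 0
    sigma = [None] * (T + 1)

    for t in range(T, -1, -1):                     # t = T, T-1, ..., 0
        values, actions = [], []
        for s in states:
            acts = feasible(s)
            # Bellman equation: h_t(s, a) = r_t(s, a) + V_{t+1}(f_t(s, a)).
            h = reward(t, s, acts) + np.array([V[t + 1](transition(s, a))
                                               for a in acts])
            i = int(np.argmax(h))
            values.append(h[i]); actions.append(acts[i])
        # Interpolate the grid values so V_t and sigma_t act as functions.
        V[t] = lambda s, v=np.array(values): np.interp(s, states, v)
        sigma[t] = lambda s, a=np.array(actions): np.interp(s, states, a)

    # Approximately (1 + d^2 k + d^4 k^2)^{1/2} for w0 = 1, about 1.708.
    print(V[0](1.0))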
Proof. We prove the theorem by backward induction, starting at t = T and going down to t = 0.
For t = T, there is only one term in the definition of the value function and V_T(s_T) = max{ r_T(s_T, a) : a ∈ F_T(s_T) }. The function r_T is continuous and F_T is continuous and compact-valued, so by the Parametric Maximization Theorem,

V_T(s_T) = max{ r_T(s_T, a) : a ∈ F_T(s_T) } < ∞

exists for each s_T ∈ S and is continuous, and F∗_T(s_T) = { a ∈ F_T(s_T) : r_T(s_T, a) = V_T(s_T) } is a nonempty upper-hemicontinuous correspondence. Pick σ∗_T(s_T) ∈ F∗_T(s_T) for each s_T ∈ S, σ∗_T : S → A. Then, r_T(s_T, σ∗_T(s_T)) = W(s_T, σ∗_T) = V_T(s_T), so σ∗_T is an optimal strategy. This proves the result for t = T.
The following lemma is the induction step.

Lemma 4.19 (Induction Step). For 0 ≤ t < T, suppose that the value function V_{t+1} is a continuous function of s_{t+1} and takes a finite value for each s_{t+1}, and that the continuation FHDP starting at period t + 1 admits a Markovian optimal strategy profile (σ∗_{t+1}, . . . , σ∗_T), so that V_{t+1}(s_{t+1}) = W(s_{t+1}, (σ∗_{t+1}, . . . , σ∗_T)). Then the following hold.
a. For each s_t ∈ S, the value function V_t(s_t) attains a finite maximal value, is continuous, and satisfies

V_t(s_t) = max{ r_t(s_t, a_t) + V_{t+1}(f_t(s_t, a_t)) : a_t ∈ F_t(s_t) }.

b. There exists a strategy σ∗_t such that (σ∗_t, . . . , σ∗_T) is a Markovian optimal strategy profile for the continuation FHDP starting at period t, W(s_t, (σ∗_t, . . . , σ∗_T)) = V_t(s_t) for all s_t.

Proof. We start by considering the right hand side of the Bellman equation, h_t(s_t, a_t) = r_t(s_t, a_t) + V_{t+1}(f_t(s_t, a_t)). Since f_t and r_t are continuous by the assumptions of the theorem and V_{t+1} is continuous by the induction assumption of the lemma, h_t(s_t, a_t) is continuous. The correspondence F_t is continuous and compact-valued. By the Parametric Maximization Theorem, the maximal value h∗_t(s_t) is continuous and the set of points that realize the maximum F∗_t(s_t) is a nonempty set. If σ∗_t(s_t) is any selection of a point in F∗_t(s_t), then h_t(s_t, σ∗_t(s_t)) = h∗_t(s_t) gives a Markovian strategy that we show satisfies the lemma.
For any s_t and any allowable sequence with a_i ∈ F_i(s_i) and s_{i+1} = f_i(s_i, a_i) for i ≥ t,

Σ_{i=t}^{T} r_i(s_i, a_i) = r_t(s_t, a_t) + Σ_{i=t+1}^{T} r_i(s_i, a_i)
≤ r_t(s_t, a_t) + max{ Σ_{i=t+1}^{T} r_i(s′_i, a′_i) : s′_{t+1} = s_{t+1}, a′_i ∈ F_i(s′_i), s′_{i+1} = f_i(s′_i, a′_i) for i ≥ t + 1 }
= r_t(s_t, a_t) + V_{t+1}(f_t(s_t, a_t)) = h_t(s_t, a_t)
≤ max{ h_t(s_t, a′_t) : a′_t ∈ F_t(s_t) }
= h∗_t(s_t).

Taking the supremum over all allowable choices yields

V_t(s_t) = sup{ Σ_{i=t}^{T} r_i(s_i, a_i) : a_i ∈ F_i(s_i), s_{i+1} = f_i(s_i, a_i) for t ≤ i < T } ≤ h∗_t(s_t) < ∞.

For the other inequality,

h∗_t(s_t) = h_t(s_t, σ∗_t(s_t))
= r_t(s_t, σ∗_t(s_t)) + V_{t+1}(f_t(s_t, σ∗_t(s_t)))
= r_t(s_t, σ∗_t(s_t)) + Σ_{i=t+1}^{T} r_i(s∗_i, σ∗_i(s∗_i)) ≤ V_t(s_t),

where s∗_t = s_t and s∗_{i+1} = f_i(s∗_i, σ∗_i(s∗_i)).
Combining the two inequalities, V_t(s_t) = h∗_t(s_t). So V_t is finite, continuous, and satisfies the Bellman equation. By the induction hypothesis,

V_t(s_t) = r_t(s_t, σ∗_t(s_t)) + V_{t+1}(f_t(s_t, σ∗_t(s_t)))
= r_t(s_t, σ∗_t(s_t)) + W(f_t(s_t, σ∗_t(s_t)), (σ∗_{t+1}, . . . , σ∗_T))
= W(s_t, (σ∗_t, . . . , σ∗_T)),

and we have found an optimal Markovian strategy profile as claimed. □

By induction, we have found a strategy σ∗ = (σ∗₀, . . . , σ∗_T) that satisfies the Bellman equation and W(s₀, σ∗) = V₀(s₀). Thus, σ∗ is an optimal strategy. □

4.2. Exercises
4.2.1. Consider the Consumption-Savings FHDP with δ = 1, r_t(w, c) = c^{1/3}, transition function f_t(w, c) = w − c, F_t(w_t) = [0, w_t], and T = 2. Find the value functions and optimal strategy for each stage.
4.2.2. Consider the Consumption-Savings FHDP with T > 0, r_t(w, c) = ln(c) (δ = 1), transition function f_t(w, c) = w − c, and F_t(w) = [0, w] for all periods. Find the value functions and optimal strategy for each stage. Remark: The reward function equals minus infinity for c = 0, but this just means that it is very undesirable.
Hint: Compute V_T(w_T) and V_{T−1}(w_{T−1}). Then guess the form of V_j, and prove it is valid by induction.
4.2.3. Consider the Consumption-Savings FHDP with T > 0, r(w, c) = 1 − e^{−c}, transition function ft(wt, c) = wt − c, and Ft(wt) = [0, wt]. Find the value functions and optimal strategy for each stage.
4.2.4. Consider the FHDP with δ = 1, rt(s, c) = 1 − 1/(1 + c), transition function ft(s, c) = s − c, Ft(st) = [0, st], and T ≥ 2.
a. Find the value function and optimal strategy for t = T and T − 1.
b. Using backward induction, verify that Vt(s) = 1 + t − (1 + t)²/(1 + t + s). Also, determine the optimal strategy for each t.
4.2.5. Consider the Consumption-Savings FHDP with T > 0, rt(w, c) = δ^t ln(c) with 0 < δ ≤ 1, transition function ft(w, c) = A w^β − c with A > 0 and β > 0, and Ft(w) = [0, A w^β] for all periods. Verify that the value function is Vj(w) = δ^j β(1 + βδ + · · · + β^{T−j} δ^{T−j}) ln(w) + vj for correctly chosen constants vj (that can depend on δ, β, and other parameters). Also find the optimal strategy for each stage.
Remark: The reward function equals minus infinity for c = 0, but this just means that small values of c are very undesirable.
4.3. Infinite-Horizon Dynamic Program


In this section, we discuss problems with an infinite horizon, where the process can go on for all future periods and there is an infinite sum of rewards. In order for the total reward to be finite,
we need to discount the reward at time t in the future by a factor δ t , where 0 < δ < 1. Also,
there is no final time period at which to start a backward induction to find a value function.
However, starting the process one stage later forms an equivalent dynamic program, so we can
show that the value function satisfies a type of Bellman equation. Since we can show there is a
unique function satisfying the Bellman equation, the problem can be solved by finding a value
function that satisfies this functional equation. Although we discuss the theory of the existence
of an optimal solution, our emphasis is on solution methods for finding such an optimal solution.
[12] by Stokey and Lucas and [14] by Sundaram are good references for this material.

Definition. A stationary dynamic programming problem with infinite horizon, SDP, is specified
as follows.
SD1. S ⊂ Rn is the state space with st the state at period-t.
SD2. A ⊂ Rk is the action space with the action at at period-t.
SD3. There is a feasible action correspondence F : S → P (A) that is a compact-valued,
nonempty, continuous correspondence. For each s ∈ S, the set F (s) ⊂ A specifies
the allowable actions.
SD4. There is a continuous transition function f : S × A → S that specifies the state at the
next period in terms of the current state and action taken, st+1 = f (st , at ) for t ≥ 0.
SD5. There is a continuous one-period reward function r : S × A → R that specifies an
immediate reward r(s, a) for an action a taken at state s.
SD6. There is a discount factor δ ∈ (0, 1), so that δ t r(st , at ) is the reward at period-t
discounted back to period-0. This psychological factor represents the impatience for the
reward.
For an initial state s0, an allowable sequence of actions {at}∞_{t=0} is one with at ∈ F(st) and states st+1 = f(st, at) for t ≥ 0. The total reward for such an allowable sequence of actions is
$$W(s_0, \{a_t\}_{t=0}^{\infty}) = \sum_{t=0}^{\infty} \delta^t r(s_t, a_t).$$

Remark. The dynamic program is called stationary because the same r, f , and F are used
for every period t. The discount factor allows the possibility that the total reward is finite.
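In computational work it is convenient to package the data SD1–SD6 as a single object. The sketch below is purely illustrative (the field names and the interval representation of F(s) are our choices, not part of the definition); it instantiates the one-sector growth data of Example 4.21 below.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class SDP:
    """Data of a stationary dynamic program (SD1-SD6)."""
    feasible: Callable[[float], Tuple[float, float]]  # F(s) as an interval
    transition: Callable[[float, float], float]       # f(s, a)
    reward: Callable[[float, float], float]           # r(s, a)
    delta: float                                      # discount factor in (0, 1)

# One-sector growth data (Example 4.21): F(s) = [0, s], f(s, a) = k(s - a),
# r(s, a) = sqrt(a), with k * delta**2 < 1.
k = 1.2
growth = SDP(
    feasible=lambda s: (0.0, s),
    transition=lambda s, a: k * (s - a),
    reward=lambda s, a: a ** 0.5,
    delta=0.9,
)
```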
Definition. The value function V : S → R is defined as the supremum of the total reward over
all possible sequences of allowable actions,
V (s0 ) = sup { W (s0 , {at }) : {at } is an allowable sequence } .
Problem: The problem is to find an allowable sequence of actions that realizes the value function as a maximum, i.e., that maximizes the total reward W(s0, {at}) over allowable sequences of actions {at} with st+1 = f(st, at) for t ≥ 0.
Definition. A stationary strategy σ is a choice of an action a = σ(s) ∈ F (s) ⊂ A for each
s ∈ S that is the same for all periods.
Definition. For an SDP, given a stationary strategy σ and an initial state s0, we can determine the actions and the states at all periods by induction, and so the total reward: at = at(s0, σ) = σ(st), st+1 = st+1(s0, σ) = f(st, at), and
$$W(s_0, \sigma) = \sum_{t=0}^{\infty} \delta^t r(s_t, a_t).$$

Definition. An optimal stationary strategy σ∗ is a stationary strategy such that
W(s0, σ∗) = V(s0) for all s0 ∈ S.
The two solution methods that we present for finding an optimal solution strategy both use
the Bellman equation given in the following theorem.
Theorem 4.20 (SDP Bellman Equation). For a SDP, the value function V (s) satisfies the
following equation, called the Bellman equation:
V (s) = sup{ r(s, a) + δ V (f (s, a)) : a ∈ F (s) }. (11)
Remark. Note that for a SDP, the same function V is on both sides of Bellman equation.
Thus, it is not possible to just find the maximum value of the right hand side as is done for
finite-horizon dynamic programming. Instead, it is necessary to solve the equation for a value
function that is the same on both sides of the equation.

Proof. Define h(s, a) = r(s, a) + δ V(f(s, a)) to be the function on the right-hand side of the Bellman equation. To show that the Bellman equation holds, we first show that V(s) is less than or equal to the right-hand side of the Bellman equation. Fix an s0, and take any allowable sequence at ∈ F(st) and st+1 = f(st, at) for t ≥ 0. Then
$$\sum_{t=0}^{\infty} \delta^t r(s_t, a_t) = r(s_0, a_0) + \delta \sum_{t=1}^{\infty} \delta^{t-1} r(s_t, a_t)$$
$$\le r(s_0, a_0) + \delta \sup\Big\{ \sum_{t=1}^{\infty} \delta^{t-1} r(s_t', a_t') : s_1' = s_1,\; a_t' \in F(s_t'),\; s_{t+1}' = f(s_t', a_t') \text{ for } t \ge 1 \Big\}$$
$$= r(s_0, a_0) + \delta V(f(s_0, a_0)) = h(s_0, a_0)$$
$$\le \sup\{\, h(s_0, a_0') : a_0' \in F(s_0) \,\} = h^*(s_0).$$
Here we define h∗ using a supremum as we did earlier for the maximum. Since the total reward for any allowable sequence is less than or equal to h∗(s0), taking the supremum over all allowable choices yields
$$V(s_0) = \sup\Big\{ \sum_{t=0}^{\infty} \delta^t r(s_t, a_t) : a_t \in F(s_t),\; s_{t+1} = f(s_t, a_t) \text{ for } t \ge 0 \Big\} \le h^*(s_0).$$

We will be done if we can show that V (s0 ) ≥ h∗ (s0 ).


First, assume that h∗(s0) < ∞. We show that V(s0) ≥ h∗(s0) − ε for any ε > 0. Fix ε > 0. Since h∗(s0) − ε/2 is not an upper bound, there exists an a′0 ∈ F(s0) such that
$$r(s_0, a_0') + \delta V(f(s_0, a_0')) = h(s_0, a_0') \ge h^*(s_0) - \tfrac{\varepsilon}{2}.$$
Then starting at s′1 = f(s0, a′0), there exist a′t ∈ F(s′t) and s′t+1 = f(s′t, a′t) for t ≥ 1 such that
$$\sum_{t=1}^{\infty} \delta^{t-1} r(s_t', a_t') \ge V(s_1') - \tfrac{\varepsilon}{2}.$$
Combining,
$$W(s_0, \{a_t'\}) = r(s_0, a_0') + \sum_{t=1}^{\infty} \delta^t r(s_t', a_t') = r(s_0, a_0') + \delta \sum_{t=1}^{\infty} \delta^{t-1} r(s_t', a_t')$$
$$\ge r(s_0, a_0') + \delta \Big( V(s_1') - \tfrac{\varepsilon}{2} \Big) \ge h^*(s_0) - \tfrac{\varepsilon}{2} - \tfrac{\delta\varepsilon}{2} \ge h^*(s_0) - \varepsilon.$$
The supremum over all allowable sequences starting with s0 is at least as large as the payoff using the above sequence of choices a′t, so
$$V(s_0) \ge h^*(s_0) - \varepsilon.$$
Since ε > 0 is arbitrary, V(s0) ≥ h∗(s0). Combining the two directions, V(s0) = h∗(s0) and the Bellman equation is satisfied.
Next, assume that h∗(s0) = ∞. For any K > 0, there exists an a′0 ∈ F(s0) such that
$$r(s_0, a_0') + \delta V(f(s_0, a_0')) \ge K.$$
Then starting at s′1 = f(s0, a′0), there exist a′t ∈ F(s′t) and s′t+1 = f(s′t, a′t) for t ≥ 1 such that
$$r(s_0, a_0') + \delta \sum_{t=1}^{\infty} \delta^{t-1} r(s_t', a_t') \ge K - 1.$$
The supremum over all allowable sequences starting with s0 is at least as large as the payoff using the above sequence of choices a′t, so
$$V(s_0) \ge \sum_{t=0}^{\infty} \delta^t r(s_t', a_t') \ge K - 1.$$

Since K > 0 is arbitrary, V(s0) = ∞ = h∗(s0).

Properties of Value Function and Existence of Optimal Stationary Strategy


We prove the following results under one of two sets of assumptions: (i) the reward function is bounded (SDB), or (ii) the SDP is an optimal growth dynamic program for a one-sector economy satisfying assumptions E1 – E3, which are given when the general model is discussed.
Finite value function. For these two contexts, Theorems 4.23 and 4.26(a) show that V(s) < ∞ for each s ∈ S, so V(s) is a well-defined function.
Continuity. Theorems 4.24 and 4.26(b) prove that V(s) is continuous and is the unique bounded function satisfying the Bellman equation. The proof uses an iterative process to construct a sequence of continuous functions Vj(s) that converge uniformly to V(s) on compact intervals [0, s̄], so V(s) is continuous.
Optimal Strategy. Once we know that V (s) is continuous, then the right hand side of the
Bellman equation is continuous,
h(s, a) = r(s, a) + δ V ◦ f (s, a).
Theorems 4.25 and 4.27(b) prove that any choice function
σ ∗ (s) ∈ F ∗ (s) = { a ∈ F (s) : h(s, a) = h∗ (s) }
is an optimal strategy, so an optimal stationary strategy σ ∗ exists with W (s, σ ∗ ) = V (s).

4.3.1. Examples
We delay the precise theorems and proofs until after giving examples of two methods of using the Bellman equation to determine the value function and an optimal strategy. The first method involves constructing a sequence of functions by iteratively maximizing the right-hand side of the Bellman equation. The proof of Theorem 4.24 shows that this sequence of functions converges to the true value function. The second method involves guessing a form for the value function that involves unknown parameters. Then the Bellman equation is used to determine the values of these parameters and so the true value function.
Example 4.21. This example is a special case of the optimal growth of a one-sector economy considered later. Let
S = R+,
A = R+,
F(s) = [0, s],
f(s, a) = k(s − a), with k ≥ 1,
r(s, a) = u(a) = a^{1/2}, and
0 < δ < 1, with k δ² < 1.
The reward function is not bounded on R+, but given an initial state s0, s1 ≤ k s0, s2 ≤ k s1 ≤ k² s0, and in general st ≤ k^t s0. If k δ² < 1, the total reward is bounded as follows:
$$\delta^t r(s_t, a_t) \le \delta^t u(s_t) \le \delta^t u(k^t s_0) = \big( \delta k^{\frac{1}{2}} \big)^t s_0^{\frac{1}{2}},$$
$$\sum_{t=0}^{\infty} \delta^t r(s_t, a_t) \le \sum_{t=0}^{\infty} \big( \delta k^{\frac{1}{2}} \big)^t s_0^{\frac{1}{2}} = \frac{s_0^{\frac{1}{2}}}{1 - \delta k^{\frac{1}{2}}}.$$
Thus, the value function is finite for each s0.

Solution Method 1: We construct a sequence of functions Vj(s) that converges to the value function V(s). We prove that the Vj(s) are continuous by induction. Start with the zero function V0(s) = 0 for all s. Assume that Vj(s) in our sequence is continuous, and let
$$h_{j+1}(s, a) = r(s, a) + \delta V_j(f(s, a)) = a^{\frac{1}{2}} + \delta V_j(k(s - a))$$
be the function that appears in the supremum on the right-hand side of the Bellman equation using the value function Vj. The functions r(s, a) = u(a) = a^{1/2}, Vj(s), and f(s, a) = k(s − a) are all continuous, so hj+1(s, a) is a continuous function. The feasible action correspondence F(s) = [0, s] is a continuous, compact-valued correspondence. So, we can apply the Parametric Maximization Theorem to get a continuous maximal value as a function of s,
$$V_{j+1}(s) = h_{j+1}^*(s).$$
None of these functions is the value function, because we will see that Vj+1(s) ≠ Vj(s). However, in this example this sequence of functions converges to the true value function. Later, in the proof of Theorem 4.24 on the continuity of the value function, we indicate a little more of why this works.
Since we start with V0(s) = 0 for all s,
$$h_1(s, a) = a^{\frac{1}{2}} + \delta V_0(k(s - a)) = a^{\frac{1}{2}}.$$
This function is increasing on the interval F(s) = [0, s], so h1(s, a) has a maximum at the right end point, ā = s. Then,
$$V_1(s) = h_1^*(s) = h_1(s, \bar a) = s^{\frac{1}{2}}.$$
Note that V1(s) is the maximum over the single period t = 0.
For the next step, h2(s, a) = a^{1/2} + δ V1(k(s − a)) = a^{1/2} + δ k^{1/2}(s − a)^{1/2}. The critical point satisfies
$$0 = \frac{\partial h_2}{\partial a} = \tfrac{1}{2} a^{-\frac{1}{2}} - \tfrac{1}{2} \delta k^{\frac{1}{2}} (s - a)^{-\frac{1}{2}},$$
$$a^{-\frac{1}{2}} = k^{\frac{1}{2}} \delta\, (s - a)^{-\frac{1}{2}},$$
$$s - a = k \delta^2 a,$$
$$s = (1 + k \delta^2)\, a,$$
$$\bar a = \frac{s}{1 + k \delta^2}.$$
Since ∂²h2/∂a² < 0 for all a ∈ [0, s], ā is a maximizer. Then
$$V_2(s) = h_2^*(s) = h_2(s, \bar a) = \bar a^{\frac{1}{2}} + \delta k^{\frac{1}{2}} (s - \bar a)^{\frac{1}{2}} = \bar a^{\frac{1}{2}} + \delta k^{\frac{1}{2}} \cdot \delta k^{\frac{1}{2}} \bar a^{\frac{1}{2}}$$
$$= (1 + k \delta^2)\, \bar a^{\frac{1}{2}} = (1 + k \delta^2)\, \frac{s^{\frac{1}{2}}}{(1 + k \delta^2)^{\frac{1}{2}}} = (1 + k \delta^2)^{\frac{1}{2}}\, s^{\frac{1}{2}}.$$
Note that V2(s) is the maximum over the two periods t = 0, 1.
Our induction hypothesis is that
$$V_j(s) = \big[\, 1 + k \delta^2 + \cdots + k^{j-1} \delta^{2(j-1)} \,\big]^{\frac{1}{2}}\, s^{\frac{1}{2}}.$$

We have verified the formula for j = 1, 2. Assume it is true for j = t; we show it is true for j = t + 1. Let
$$h_{t+1}(s, a) = a^{\frac{1}{2}} + \delta \big[ 1 + k\delta^2 + \cdots + k^{t-1}\delta^{2(t-1)} \big]^{\frac{1}{2}} k^{\frac{1}{2}} (s - a)^{\frac{1}{2}}.$$
The critical point satisfies
$$0 = \frac{\partial h_{t+1}}{\partial a} = \tfrac{1}{2} a^{-\frac{1}{2}} - \tfrac{1}{2} k^{\frac{1}{2}} \delta \big[ 1 + \cdots + k^{t-1}\delta^{2(t-1)} \big]^{\frac{1}{2}} (s - a)^{-\frac{1}{2}},$$
$$(s - a)^{\frac{1}{2}} = k^{\frac{1}{2}} \delta \big[ 1 + \cdots + k^{t-1}\delta^{2(t-1)} \big]^{\frac{1}{2}} a^{\frac{1}{2}},$$
$$s - a = \big( k\delta^2 + \cdots + k^{t}\delta^{2t} \big) a,$$
$$s = \big( 1 + k\delta^2 + \cdots + k^{t}\delta^{2t} \big) a,$$
$$\bar a = \frac{s}{1 + k\delta^2 + \cdots + k^{t}\delta^{2t}} = \sigma_{t+1}^*(s).$$
Again, this is a maximizer because ∂²h_{t+1}/∂a² < 0 for all a ∈ [0, s]. Then
$$V_{t+1}(s) = h_{t+1}^*(s) = h_{t+1}(s, \bar a) = \bar a^{\frac{1}{2}} + \delta \big[ 1 + k\delta^2 + \cdots + k^{t-1}\delta^{2(t-1)} \big]^{\frac{1}{2}} k^{\frac{1}{2}} (s - \bar a)^{\frac{1}{2}}$$
$$= \bar a^{\frac{1}{2}} + \big( k\delta^2 + \cdots + k^{t}\delta^{2t} \big)\, \bar a^{\frac{1}{2}} = \big[ 1 + k\delta^2 + \cdots + k^{t}\delta^{2t} \big]\, \bar a^{\frac{1}{2}}$$
$$= \big[ 1 + k\delta^2 + \cdots + k^{t}\delta^{2t} \big] \big[ 1 + k\delta^2 + \cdots + k^{t}\delta^{2t} \big]^{-\frac{1}{2}} s^{\frac{1}{2}} = \big[ 1 + k\delta^2 + \cdots + k^{t}\delta^{2t} \big]^{\frac{1}{2}} s^{\frac{1}{2}}.$$
This verifies the induction hypothesis for the form of V_{t+1}(s).
Taking the limit as t goes to infinity,
$$V_\infty(s) = \lim_{t\to\infty} V_t(s) = \lim_{t\to\infty} \big[ 1 + k\delta^2 + \cdots + k^{t-1}\delta^{2(t-1)} \big]^{\frac{1}{2}} s^{\frac{1}{2}} = \big( 1 - \delta^2 k \big)^{-\frac{1}{2}} s^{\frac{1}{2}}.$$
Also, if we take the limit in the inductive equation defining the Vj(s),
$$V_\infty(s) = \lim_{j\to\infty} V_{j+1}(s) = \lim_{j\to\infty} \max\{\, a^{\frac{1}{2}} + \delta V_j(k(s-a)) : 0 \le a \le s \,\}$$
$$= \max\{\, a^{\frac{1}{2}} + \delta V_\infty(k(s-a)) : 0 \le a \le s \,\} \qquad \text{(Bellman equation).}$$
Since the value function is the unique locally bounded solution of the Bellman equation, V(s) = V∞(s) and
$$V(s) = V_\infty(s) = \big( 1 - \delta^2 k \big)^{-\frac{1}{2}} s^{\frac{1}{2}}.$$
The optimal strategy is also the limit of the strategies σ∗t(s),
$$\sigma^*(s) = \lim_{t\to\infty} \sigma_t^*(s) = \lim_{t\to\infty} \big( 1 + k\delta^2 + \cdots + k^{t-1}\delta^{2(t-1)} \big)^{-1} s = (1 - k\delta^2)\, s.$$

If V0(s) ≡ 0, then V1(s) is the maximum over the single period t = 0; V2(s) is the maximum over the two periods t = 0, 1; and, by induction, Vj(s) is the maximum over the j periods t = 0, . . . , j − 1. Taking the limit, V∞(s) is the maximum over all the periods t ≥ 0, which is the true value function.

Steps to Solve a SDP by Iteration, Method 1


1. Start with V0 (s) ≡ 0 for all s.
2. By induction define
Vj+1 (s) = max { r(s, a) + δ Vj (f (s, a)) : a ∈ F (s) }.
V1 (s) is the maximum over one period, t = 0.
V2 (s) is the maximum over two periods, t = 0, 1.
Vj (s) is the maximum over j periods, t = 0, . . . , j − 1.
3. Vj(s) converges uniformly to V(s) on compact intervals, so V(s) is continuous. V(s) is the maximum over all periods t ≥ 0.
4. The maximizer σj (s) for each step converges to the optimal strategy, σ ∗ (s).
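The iteration is also easy to carry out numerically on a grid. The sketch below is illustrative only (the grid, the interpolation, and the parameter values k = 1.2, δ = 0.8 are our choices, subject to kδ² < 1); it runs Method 1 for Example 4.21 and compares the result with the closed form V(s) = (1 − δ²k)^{−1/2} s^{1/2} found above.

```python
import numpy as np

# Value iteration (Method 1) for Example 4.21:
# F(s) = [0, s], f(s, a) = k(s - a), r(s, a) = sqrt(a), with k*delta**2 < 1.
k, delta = 1.2, 0.8                    # k * delta^2 = 0.768 < 1
grid = np.linspace(0.0, 5.0, 401)
V = np.zeros_like(grid)                # V_0 = 0

for _ in range(300):                   # iterate V_{j+1} = T(V_j)
    V_new = np.empty_like(grid)
    for i, s in enumerate(grid):
        a = grid[grid <= s]            # feasible actions F(s) = [0, s]
        vals = np.sqrt(a) + delta * np.interp(k * (s - a), grid, V)
        V_new[i] = vals.max()
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

exact = np.sqrt(grid / (1.0 - k * delta**2))
print(np.max(np.abs(V - exact)))       # small, up to grid/interpolation error
```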

Solution Method 2: In this method, we find the optimal solution by guessing the form of the value function V with unspecified parameters. Next, we use the Bellman equation to determine the unspecified parameters in the guess. In the process, the optimal strategy is determined.
For the present problem, an outline of the solution method is as follows. (1) Based on the reward function, we guess that the value function is of the form V(s) = M s^{1/2}, where M is to be determined. We could also use the first few Vj(s) calculated by Method 1 to motivate a guess of the form of the true value function. (2) Next, determine the critical point ā of
$$h(s, a) = r(s, a) + \delta V(f(s, a)) = a^{\frac{1}{2}} + \delta M k^{\frac{1}{2}} (s - a)^{\frac{1}{2}}.$$
This is the sum of the immediate payoff plus the payoff for what is carried forward to the future, calculated by the value function. Verify that the critical point is a maximum of h(s, a) for a ∈ F(s) = [0, s]. The value ā can depend on the unspecified parameters of V as well as the data of the problem. (3) Calculate h∗(s) = h(s, ā). (4) Use the Bellman equation, V(s) = h∗(s), to solve for the unspecified parameters of the guess of V. Finally, (5) substitute the parameters into ā to determine the optimal strategy.
(1) Using the guess that the value function has the form V(s) = M s^{1/2}, define
$$h(s, a) = a^{\frac{1}{2}} + \delta M k^{\frac{1}{2}} (s - a)^{\frac{1}{2}}.$$
(2) The critical point of h as a function of a satisfies
$$0 = \frac{\partial h}{\partial a} = \tfrac{1}{2} a^{-\frac{1}{2}} - \tfrac{1}{2} k^{\frac{1}{2}} \delta M (s - a)^{-\frac{1}{2}},$$
$$a^{-\frac{1}{2}} = k^{\frac{1}{2}} \delta M (s - a)^{-\frac{1}{2}},$$
$$(s - a)^{\frac{1}{2}} = k^{\frac{1}{2}} \delta M\, a^{\frac{1}{2}},$$
$$s - a = k \delta^2 M^2 a,$$
$$s = (1 + k \delta^2 M^2)\, a, \quad\text{and}$$
$$\bar a = \frac{s}{1 + k \delta^2 M^2} \le s.$$
Since ∂²h/∂a²(s, a) < 0, the critical point ā indeed maximizes h and gives the candidate for the optimal strategy.

(3) The maximal value of h as a function of a, with s as a parameter, is as follows:
$$h^*(s) = \bar a^{\frac{1}{2}} + k^{\frac{1}{2}} \delta M (s - \bar a)^{\frac{1}{2}} = \bar a^{\frac{1}{2}} + k^{\frac{1}{2}} \delta M \cdot \delta M k^{\frac{1}{2}} \bar a^{\frac{1}{2}} = \big( 1 + k \delta^2 M^2 \big)\, \bar a^{\frac{1}{2}}$$
$$= \big( 1 + k \delta^2 M^2 \big) \left( \frac{s}{1 + k \delta^2 M^2} \right)^{\frac{1}{2}} = \big( 1 + k \delta^2 M^2 \big)^{\frac{1}{2}} s^{\frac{1}{2}}.$$
(4) The value function must satisfy the Bellman equation (11), V(s) = h∗(s), so
$$M s^{\frac{1}{2}} = \big( 1 + k \delta^2 M^2 \big)^{\frac{1}{2}} s^{\frac{1}{2}},$$
$$M^2 = 1 + k \delta^2 M^2,$$
$$M^2 (1 - k \delta^2) = 1,$$
$$M^2 = \frac{1}{1 - k \delta^2}, \quad\text{and}\quad \bar M = \left( \frac{1}{1 - k \delta^2} \right)^{\frac{1}{2}}.$$
Because the solution of the Bellman equation is unique and this function using M̄ satisfies it, it must be the value function,
$$V(s) = \left( \frac{s}{1 - k \delta^2} \right)^{\frac{1}{2}}.$$
(5) The optimal strategy is
$$\sigma^*(s) = \bar a = \frac{s}{1 + k \delta^2 \bar M^2} = \frac{s}{\bar M^2} = (1 - k \delta^2)\, s.$$
Note that we need k δ² < 1 for σ∗(s) ≥ 0 and for V(s) to be defined.

Steps to Solve a SDP using Method 2


1. Guess the form of the value function, with unspecified parameters. Base the
guess on either the form of r(s, a) or a few steps in the iterative method.
2. Determine the critical point ā of h(s, a) = r(s, a) + δ V(f(s, a)) using
the guess for V (s). The value ā can depend on the unspecified parameters
of V . Verify that ā is a maximizer on F (s).
3. Substitute ā into h(s, a) to determine the maximal value h∗ (s) = h(s, ā).
4. Use Bellman equation, V (s) = h∗ (s), to solve for the unspecified parame-
ters of the guess for V . Substitute the parameters into V (s) to get the value
function.
5. Substitute the parameters into ā to get the optimal strategy.
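A quick numerical sanity check on the Method 2 algebra is to maximize the right-hand side of the Bellman equation on a fine action grid and confirm that it reproduces V(s) and σ∗(s). The sketch below does this for Example 4.21; the parameter values are arbitrary choices with kδ² < 1.

```python
import numpy as np

# Verify that V(s) = sqrt(s / (1 - k*delta^2)) satisfies
# V(s) = max_{0 <= a <= s} sqrt(a) + delta * V(k*(s - a)),
# with maximizer a = (1 - k*delta^2) * s   (Example 4.21).
k, delta = 1.3, 0.6                          # k * delta^2 = 0.468 < 1
V = lambda s: np.sqrt(s / (1.0 - k * delta**2))

for s in [0.5, 1.0, 2.0, 4.0]:
    a = np.linspace(0.0, s, 100001)          # fine grid over F(s) = [0, s]
    vals = np.sqrt(a) + delta * V(k * (s - a))
    i = np.argmax(vals)
    # Both printed differences should be ~ 0 (up to grid resolution).
    print(s, vals[i] - V(s), a[i] - (1.0 - k * delta**2) * s)
```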

Example 4.22. This is an example attributed to Weitzman. On each day, a vintner can split his time between baking bread and squeezing grapes: the amounts of effort are bt and 1 − bt respectively, with both bt and wt+1 elements of [0, 1]. In the next period, the amount of wine available is wt+1 = 1 − bt. The reward or utility at each period is u(wt, bt) = wt^{1/2} bt^{1/2}. There is a discount factor 0 < δ < 1. The quantity to be maximized is
$$\sum_{t=0}^{\infty} \delta^t\, w_t^{\frac{1}{2}} b_t^{\frac{1}{2}}.$$
We consider wt as the state variable and bt as the action, with wt+1 = 1 − bt the transition function. The Bellman equation is
$$V(w) = \max\{\, w^{\frac{1}{2}} b^{\frac{1}{2}} + \delta V(1 - b) : b \in [0, 1] \,\}.$$

Method 1: In this example, the function to be maximized is
$$h_j(w, b) = w^{\frac{1}{2}} b^{\frac{1}{2}} + \delta V_{j-1}(1 - b), \qquad b \in [0, 1].$$
Taking V0(w) = 0 for all w, h1 is an increasing function of b, so the maximal value is V1(w) = h∗1(w) = h1(w, 1) = w^{1/2}. Then h2(w, b) = w^{1/2} b^{1/2} + δ (1 − b)^{1/2}, and the critical point satisfies
$$0 = \frac{\partial h_2}{\partial b} = \tfrac{1}{2} w^{\frac{1}{2}} b^{-\frac{1}{2}} - \tfrac{1}{2} \delta (1 - b)^{-\frac{1}{2}},$$
$$w\, b^{-1} = \delta^2 (1 - b)^{-1},$$
$$w (1 - b) = \delta^2 b,$$
$$w = (w + \delta^2)\, b,$$
$$\bar b = \frac{w}{w + \delta^2}.$$
The second derivative ∂²h2/∂b² < 0, so b̄ is a maximizer. Then V2(w) = h∗2(w) = h2(w, b̄), so
$$V_2(w) = w^{\frac{1}{2}} \left( \frac{w}{w + \delta^2} \right)^{\frac{1}{2}} + \delta \left( \frac{\delta^2}{w + \delta^2} \right)^{\frac{1}{2}} = \frac{w + \delta^2}{(w + \delta^2)^{\frac{1}{2}}} = (w + \delta^2)^{\frac{1}{2}}.$$

Rather than calculate more terms, we turn to Method 2. In that treatment, we will derive formulas that allow us to determine the rest of the sequence of functions for Method 1.
Method 2: (1) We look for a solution of the form V(w) = A(w + C)^{1/2}, where A and C are parameters to be determined. (2) We introduce the function to be maximized: h(w, b) = b^{1/2} w^{1/2} + δ A (1 − b + C)^{1/2}. (3) The critical point satisfies
$$0 = \frac{\partial h}{\partial b} = \tfrac{1}{2} b^{-\frac{1}{2}} w^{\frac{1}{2}} - \tfrac{1}{2} \delta A (C + 1 - b)^{-\frac{1}{2}},$$
$$w (C + 1 - b) = \delta^2 A^2 b,$$
$$w (C + 1) = b \big( w + \delta^2 A^2 \big),$$
$$\bar b = \frac{w (C + 1)}{w + \delta^2 A^2}.$$
The second derivative ∂²h/∂b² < 0, so the critical point is a maximum.

(4) As a preliminary step to calculate the maximal value,
$$C + 1 - \bar b = \frac{\delta^2 A^2 \bar b}{w},$$
$$\delta A (C + 1 - \bar b)^{\frac{1}{2}} = \delta^2 A^2 \left( \frac{\bar b}{w} \right)^{\frac{1}{2}} = \frac{(C + 1)^{\frac{1}{2}}\, \delta^2 A^2}{\big[ w + \delta^2 A^2 \big]^{\frac{1}{2}}}.$$
The maximal value h(w, b̄) can be given as
$$h^*(w) = \frac{(C + 1)^{\frac{1}{2}}\, w}{\big[ w + \delta^2 A^2 \big]^{\frac{1}{2}}} + \frac{(C + 1)^{\frac{1}{2}}\, \delta^2 A^2}{\big[ w + \delta^2 A^2 \big]^{\frac{1}{2}}} = \frac{(C + 1)^{\frac{1}{2}} \big[ w + \delta^2 A^2 \big]}{\big[ w + \delta^2 A^2 \big]^{\frac{1}{2}}} = (C + 1)^{\frac{1}{2}} \big[ w + \delta^2 A^2 \big]^{\frac{1}{2}}.$$

(5) The Bellman equation becomes
$$A (w + C)^{\frac{1}{2}} = (C + 1)^{\frac{1}{2}} \big( w + \delta^2 A^2 \big)^{\frac{1}{2}}.$$
Equating similar coefficients, we get A = (C + 1)^{1/2} and C = δ² A², so
$$A^2 = \delta^2 A^2 + 1,$$
$$(1 - \delta^2) A^2 = 1,$$
$$A^2 = \frac{1}{1 - \delta^2}, \quad\text{and}\quad C = \frac{\delta^2}{1 - \delta^2}.$$
(6) Therefore, the value function is
$$V(w) = \left( \frac{1}{1 - \delta^2} \right)^{\frac{1}{2}} \left( w + \frac{\delta^2}{1 - \delta^2} \right)^{\frac{1}{2}} = \frac{\big[ w (1 - \delta^2) + \delta^2 \big]^{\frac{1}{2}}}{1 - \delta^2}.$$
Using A² = 1/(1 − δ²) and C + 1 = δ²/(1 − δ²) + 1 = 1/(1 − δ²), we get the optimal strategy
$$\bar b = \sigma^*(w) = \frac{w (C + 1)}{w + \delta^2 A^2} = \frac{w \left[ \frac{1}{1 - \delta^2} \right]}{w + \frac{\delta^2}{1 - \delta^2}} = \frac{w}{w (1 - \delta^2) + \delta^2}.$$
Return to Method 1: In the consideration of Method 2, we saw that if Vj(w) = A(w + C)^{1/2}, then Vj+1(w) = (1 + C)^{1/2}[w + δ²A²]^{1/2} and σj+1(w) = w(1 + C)/(w + δ²A²). Using that V2(w) = (w + δ²)^{1/2},
$$V_3(w) = \big( 1 + \delta^2 \big)^{\frac{1}{2}} \big( w + \delta^2 \big)^{\frac{1}{2}},$$
$$V_4(w) = \big( 1 + \delta^2 \big)^{\frac{1}{2}} \big( w + \delta^2 + \delta^4 \big)^{\frac{1}{2}},$$
$$V_{2j}(w) = \big( 1 + \cdots + \delta^{2j-2} \big)^{\frac{1}{2}} \big( w + \delta^2 + \cdots + \delta^{2j} \big)^{\frac{1}{2}},$$
$$V_{2j+1}(w) = \big( 1 + \cdots + \delta^{2j} \big)^{\frac{1}{2}} \big( w + \delta^2 + \cdots + \delta^{2j} \big)^{\frac{1}{2}}.$$
This sequence of functions converges to the value function found using Method 2,
$$V(w) = \left( \frac{1}{1 - \delta^2} \right)^{\frac{1}{2}} \left( w + \frac{\delta^2}{1 - \delta^2} \right)^{\frac{1}{2}} = \frac{\big[ w (1 - \delta^2) + \delta^2 \big]^{\frac{1}{2}}}{1 - \delta^2}.$$
The optimal strategy at each step in the process is
$$\sigma_3(w) = \frac{w (1 + \delta^2)}{w + \delta^2},$$
$$\sigma_4(w) = \frac{w (1 + \delta^2)}{w + \delta^2 + \delta^4},$$
$$\sigma_{2j+1}(w) = \frac{w (1 + \delta^2 + \cdots + \delta^{2j})}{w + \delta^2 + \cdots + \delta^{2j}},$$
$$\sigma_{2j+2}(w) = \frac{w (1 + \delta^2 + \cdots + \delta^{2j})}{w + \delta^2 + \cdots + \delta^{2j+2}}.$$
This sequence of strategies converges to the optimal strategy found using Method 2,
$$\sigma^*(w) = \frac{w \left[ \frac{1}{1 - \delta^2} \right]}{w + \frac{\delta^2}{1 - \delta^2}} = \frac{w}{w (1 - \delta^2) + \delta^2}.$$
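The same numerical check as before applies here; the sketch below (illustrative only, with δ chosen arbitrarily) confirms that the Method 2 formulas satisfy the vintner's Bellman equation.

```python
import numpy as np

# Verify V(w) = sqrt(w*(1 - d^2) + d^2)/(1 - d^2) and
# sigma*(w) = w/(w*(1 - d^2) + d^2) against
# V(w) = max_{0 <= b <= 1} sqrt(w*b) + d*V(1 - b)   (Example 4.22).
d = 0.85
V = lambda w: np.sqrt(w * (1.0 - d**2) + d**2) / (1.0 - d**2)

for w in [0.0, 0.25, 0.5, 1.0]:
    b = np.linspace(0.0, 1.0, 100001)
    vals = np.sqrt(w * b) + d * V(1.0 - b)
    i = np.argmax(vals)
    # Both printed differences should be ~ 0 (up to grid resolution).
    print(w, vals[i] - V(w), b[i] - w / (w * (1.0 - d**2) + d**2))
```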


4.3.2. Theorems for Bounded Reward Function


In this section, we consider the case when the reward function r(s, a) is bounded on S × A.
SDB. The reward function r is continuous and bounded on S × A, i.e., there exists
K > 0 such that |r(s, a)| ≤ K for all (s, a) ∈ S × A.
We show that if a SDP satisfies SDB, then the value function is finite and continuous and an op-
timal strategy exists. SDB is not satisfied by any of our examples, but we use this consideration
to later show that the results hold for the types of examples that we have considered.
Theorem 4.23 (SDP Finite Value Function). If a SDP satisfies SDB in addition to SD1 – SD6,
then the value function V (s) is a bounded function, so V (s) < ∞ for each s ∈ S.
Proof. The total reward for any choice of actions is bounded:
$$\Big| \sum_{t=0}^{\infty} \delta^t r(s_t, a_t) \Big| \le \sum_{t=0}^{\infty} \delta^t |r(s_t, a_t)| \le \sum_{t=0}^{\infty} \delta^t K = \frac{K}{1 - \delta} = K'.$$
The supremum over all allowable choices of all the at is bounded by this same constant, so the value function V(s0) is bounded and takes on finite values.

Theorem 4.24 (SDP Continuity of Value Function). Assume that a SDP satisfies SDB and
has a bounded value function V . Then the following hold.
a. There exists a unique bounded solution of the Bellman equation, and this unique solution is continuous.
b. The value function V (s) is the unique continuous function that satisfies the Bellman
equation.
Proof. The continuity of the value function V : S → R cannot be proved directly from the Bellman equation because we do not know a priori that the right-hand side is continuous. Instead, the continuity is proved by means of a process that takes a bounded function and returns another bounded function, following the solution process of Method 1. In determining the value function for examples, Method 1 finds it as the limit of a sequence of functions calculated using the right-hand side of the Bellman equation. The theory behind these calculations involves (i) putting a distance between two functions (a metric on a space of functions), (ii) showing that the space of functions is complete (a sequence of functions whose terms get closer together must converge to a function in the space), and (iii) showing that the construction of the sequence of functions is a contraction mapping of the space of functions. We showed earlier how this process works for two examples.
Assume G : S → R is any bounded function. Let
hG (s, a) = r(s, a) + δ G(f (s, a)), and
T (G)(s) = sup{ hG (s, a) : a ∈ F (s) }.
It is shown that T (G) : S → R is a new bounded function, and if G1 and G2 are two
such functions then T (G1 ) and T (G2 ) are closer together than G1 and G2 , i.e., T is a
contraction mapping on the space of bounded functions. The set of bounded functions needs
to be shown to be complete, i.e., a sequence of functions getting closer together (is a Cauchy
sequence) must converge to a bounded function. Then it follows that there is a unique bounded function that is taken to itself by T. Since the value function is one such function, it must be the unique function satisfying the Bellman equation.
If V0 : S → R is any bounded function and inductively Vj+1 = T(Vj), then the sequence Vj(s) converges to the unique function fixed by T. If we start with a continuous function V0, then all the functions Vj(s) in the sequence are continuous by the Parametric Maximization Theorem. Because the distance between the functions Vj(s) and V(s) goes to zero in the function-space norm, the sequence converges uniformly to V(s). But the uniform limit of continuous functions is continuous, so the value function V(s) must be continuous.
See Section 4.3.4 for more details.
Remark. Assume we start with V0 (s) ≡ 0 and inductively let Vj+1 = T (Vj ). Then, V1 (s)
is the maximum over the one period t = 0; V2 (s) is the maximum over the two periods
t = 0, 1; Vj (s) is the maximum over the j periods t = 0, . . . , j − 1. The proof of the theorem
shows that Vj (s) converges to the value function V (s) that is the maximum over all periods
t ≥ 0.
Theorem 4.25 (SDP Optimal Strategy). Assume that a SDP has a continuous value function
V such that δ t V (st ) goes to zero as t goes to infinity for any allowable sequence of {at }
with st+1 = f (st , at ).
Then an optimal stationary strategy σ∗ exists with W(s, σ∗) = V(s).
In fact, for h(s, a) = r(s, a) + δ V ◦ f (s, a), an optimal strategy is any choice function
σ ∗ (s) ∈ F ∗ (s) = { a ∈ F (s) : h(s, a) = h∗ (s) }.
Remark. Note that the Theorem is valid if r(s, a) is bounded (SDB) so V (s) is bounded.

Proof. Since r, f, and V are continuous, h(s, a) = r(s, a) + δ V ◦ f(s, a) is continuous. By the Parametric Maximization Theorem, the maximal value h∗(s) is realized on F∗(s) = { a ∈ F(s) : h(s, a) = h∗(s) }, which is a nonempty, compact-valued, upper-hemicontinuous correspondence. Also,
$$h^*(s_0) = \max\{\, r(s_0, a_0) + \delta V(f(s_0, a_0)) : a_0 \in F(s_0) \,\}$$
$$= \max_{a_0 \in F(s_0)} \Big\{ r(s_0, a_0) + \delta \max_{\{a_t\}} \sum_{t \ge 1} \delta^{t-1} r(s_t, a_t) \Big\}$$
$$= \max_{\{a_t\}} \sum_{t \ge 0} \delta^t r(s_t, a_t) = V(s_0).$$
Let σ∗(s) ∈ F∗(s) be any choice function. We next show that V(s) = W(s, σ∗). For s0 ∈ S, let a∗t = at(σ∗, s0) = σ∗(s∗t) and s∗t+1 = st+1(σ∗, s0) = f(s∗t, a∗t), with s∗0 = s0. By the equation for V and σ∗ above and the definitions of a∗t and s∗t+1,
$$V(s_t^*) = h^*(s_t^*) = h(s_t^*, \sigma^*(s_t^*)) = r(s_t^*, \sigma^*(s_t^*)) + \delta\, V(f(s_t^*, \sigma^*(s_t^*))) = r(s_t^*, a_t^*) + \delta\, V(s_{t+1}^*).$$
By repeated uses of this formula,
$$V(s_0) = r(s_0, a_0^*) + \delta V(s_1^*)$$
$$= r(s_0, a_0^*) + \delta\, r(s_1^*, a_1^*) + \delta^2 V(s_2^*)$$
$$= r(s_0, a_0^*) + \delta\, r(s_1^*, a_1^*) + \delta^2 r(s_2^*, a_2^*) + \delta^3 V(s_3^*)$$
$$\vdots$$
$$= \sum_{t=0}^{T-1} \delta^t r(s_t^*, a_t^*) + \delta^T V(s_T^*).$$
If we let T go to infinity, then δ^T V(s∗T) goes to zero by the hypothesis. Therefore, the right-hand side converges to
$$\sum_{t=0}^{\infty} \delta^t r(s_t^*, a_t^*) = W(s_0, \sigma^*),$$
and V(s0) = W(s0, σ∗). Thus, σ∗ is an optimal strategy.

4.3.3. Theorems for One-Sector Economy


Example 4.21 considered a one-sector economy with a specific feasible correspondence and specific reward and transition functions. In this section, we show that some of the properties of the value function and optimal strategy for the optimal growth of a one-sector economy can be proved without specifying a particular utility function (reward function) or production function (transition function), but merely by making assumptions on their properties. Note that the reward function is not assumed to be bounded, so we cannot apply Theorems 4.23, 4.24, and 4.25.
One-Sector Economy, 1-SecE: In this model, the state st ≥ 0 in period t is the supply of
consumption goods available. The choice ct ∈ [0, st ] = F (st ) is the consumption (action) in
period t, and r(s, c) = u(c) is the utility of the consumption. Given the state and consumption
in period t, the supply at the next period is st+1 = f(st − ct), so f(x) is the production
function. The discount factor is 0 < δ < 1. The utility function u and production function f
are assumed to satisfy the following.
E1. u : R+ → R is continuous and strictly increasing with u(0) = 0 for convenience.

E2. f : R+ → R is continuous and nondecreasing on R+ with f(0) = 0 (there is no free production).
E3. At least one of the following two conditions holds.
a. There is x̄ > 0 such that f(x) ≤ x for x ≥ x̄.
b. There is a 0 < λ < 1 such that δ u(f(x)) ≤ λ u(x) for all x ≥ 0.
Assumption E3.b is a bound on the growth rate of utility with production. In Example 4.21, u(a) = a^{1/2} and f(x) = k x. This utility function is unbounded, but u and f satisfy E1, E2, and E3.b using λ = δ k^{1/2} < 1:
$$\delta\, u(f(x)) = \delta k^{\frac{1}{2}} x^{\frac{1}{2}} = \lambda\, u(x) \quad\text{for } x \ge 0.$$
Theorem 4.26. If a 1-SecE satisfies E1 – E3, then the following are true.
a. The value function V (s) < ∞ for each s ∈ R+ .
b. The value function V(s) is the unique bounded solution of the Bellman equation and is continuous.
Proof using E3.a. Take any s̄ ≥ x̄. We restrict to S = [0, s̄]. The reward function r(s, c) is bounded on [0, s̄]. Take s0 ∈ [0, s̄]. The following inequalities prove by induction that st ∈ [0, s̄] for all t ≥ 0: if st ∈ [0, s̄], then
$$0 = f(0) \le f(s_t - c_t) = s_{t+1} \le f(s_t) \le f(\bar s) \le \bar s.$$
(a) The one-stage reward is bounded on the compact interval [0, s̄], and the series defining the value function converges on [0, s̄] as in the proof of Theorem 4.23. Since s̄ ≥ x̄ is arbitrary, V(s) < ∞ for any s ≥ 0.
(b) Since r(s, c) is bounded for st ∈ [0, s̄], the proof of Theorem 4.24 shows V(s) is continuous.
Proof using E3.b. (a) Take s0 ≥ 0. For any allowable sequence {ct},
$$\delta\, u(c_t) \le \delta\, u(s_t) = \delta\, u(f(s_{t-1} - c_{t-1})) \le \delta\, u(f(s_{t-1})) \le \lambda\, u(s_{t-1}).$$
By induction,
$$\delta^t u(c_t) \le \delta^t u(s_t) \le \lambda^t u(s_0) \quad\text{and}\quad V(s_0) = \sup \sum_t \delta^t u(c_t) \le \sum_t \lambda^t u(s_0) = \frac{u(s_0)}{1 - \lambda} < \infty.$$
(b) The proof follows the ideas in [12]. For an alternate proof see Section 4.3.4. Let V∗(s) = A u(s) for 1 + λA = A, or A = 1/(1 − λ). Then, for 0 ≤ c ≤ s,
$$u(c) + \delta A\, u(f(s - c)) \le u(s) + \delta A\, u(f(s)) \le u(s) + A \lambda\, u(s) = A\, u(s), \quad\text{so}$$
$$T(V^*)(s) = \sup_{0 \le c \le s} \{\, u(c) + \delta A\, u(f(s - c)) \,\} \le A\, u(s) = V^*(s).$$
Let V∗0(s) = V∗(s) and V∗_{j+1} = T(V∗_j) for j ≥ 0. Since V∗1(s) = T(V∗0)(s) ≤ V∗0(s) for all s, V∗_{j+1}(s) ≤ V∗_j(s) for all s by induction. Thus for each s ≥ 0, V∗_j(s) ≥ 0 is a decreasing sequence, so the limit V∗∞(s) = lim_{j→∞} V∗_j(s) exists; it satisfies the Bellman equation and so is the value function.

Theorem 4.27. If a 1-SecE satisfies E1 – E3, then the following hold.


a. The value function V : R+ → R is increasing.
b. There is an optimal strategy σ ∗ (s), satisfying V (s) = u(σ ∗ (s)) + δ V ◦ f (s − σ ∗ (s)).
Proof. (a) Assume s0 < s′0. Let s∗t and c∗t be the optimal sequences starting at s∗0 = s0, with c∗t = σ∗(s∗t) and s∗t+1 = f(s∗t − c∗t). Set c′0 = c∗0 + s′0 − s0 > c∗0. Then c′0 ≤ s′0 and s′0 − c′0 = s0 − c∗0, so s′1 = f(s′0 − c′0) = s∗1. Therefore, s′t = s∗t and c′t = c∗t are allowable for t ≥ 1. The sequence c′t is allowable starting at s′0, but not necessarily optimal. Then
$$V(s_0') \ge \sum_{t=0}^{\infty} \delta^t u(c_t') = V(s_0) - u(c_0^*) + u(c_0') > V(s_0)$$
because u is strictly increasing. This proves that V is increasing.
(b) By Theorem 4.26, V(s) is continuous, so h(s, c) = u(c) + δ V(f(s − c)) is continuous.
If E3.a is satisfied, then V(s) is bounded on each [0, s̄] as in the proof of Theorem 4.26. Therefore, Theorem 4.25 implies that an optimal strategy exists.
If E3.b is satisfied, then by the proof of Theorem 4.26,
$$\delta^T V(s_T) = \sup \sum_{t=T}^{\infty} \delta^t u(c_t) \le \sum_{t=T}^{\infty} \lambda^t u(s_0) \to 0 \quad\text{as } T \to \infty,$$
since the series Σ_t λ^t converges. Again, Theorem 4.25 implies that an optimal strategy exists on all of R+.
Theorems 4.26 and 4.27 show why the value function and optimal strategy exist for Example 4.21. We proceed to show that more of the properties that the value function and optimal strategy possessed in Example 4.21 hold for more general models of a one-sector economy under additional assumptions. The reference for this material is Section 12.6 of [14] by Sundaram. To prove these results, we need further assumptions on the production function f and the utility function u.
and the utility function u.
E4. The utility function u is strictly concave on R+ .
E5. The production function f is concave on R+ .
E6. The utility function u is C 1 on R++ with limc→0+ u0 (c) = ∞.
E7. The production function f is C¹ on R++ with f′(0+) = lim_{x→0+} f′(x) > 0.
Remark. Assumptions E1 – E7 are satisfied for u(c) = c^{1/2} and f(x) = k x of the earlier example.
Theorem 4.28. If a 1-SecE satisfies E1 – E4, then the savings function ξ ∗ (s) = s − σ ∗ (s) is
nondecreasing on R+ .
Proof. Assume not. Then there exist s, s′ ∈ S with s < s′ and ξ∗(s) > ξ∗(s′). Let x = ξ∗(s) and x′ = ξ∗(s′). Since x ≤ s < s′, x is a feasible savings level for s′. Also, x′ < x ≤ s, so x′ is a feasible level of savings for s. Since x and x′ are the optimal savings levels for s and s′, we have
$$V(s) = u(s - x) + \delta V(f(x)) \ge u(s - x') + \delta V(f(x')),$$
$$V(s') = u(s' - x') + \delta V(f(x')) \ge u(s' - x) + \delta V(f(x)).$$
Therefore,
$$u(s - x') - u(s - x) \le \delta\, [V(f(x)) - V(f(x'))] \le u(s' - x') - u(s' - x).$$
The function u is strictly concave and increasing, (s − x′) − (s − x) = (s′ − x′) − (s′ − x), and the points on the right are larger, so
$$u(s - x') - u(s - x) > u(s' - x') - u(s' - x).$$
This contradiction proves the theorem.
Theorem 4.29. If a 1-SecE satisfies E1 – E5, then the value function V is concave on R+ .
Proof. Let s, s′ ∈ S with s < s′. Set sτ = (1 − τ)s + τ s′ for 0 ≤ τ ≤ 1. Let st and s′t be the optimal sequences of states, and ct and c′t the sequences of consumptions, starting at s and s′ respectively. Note that ct ≤ st and c′t ≤ s′t. For each t, let cτt = (1 − τ)ct + τ c′t. We will show that cτt is an allowable consumption sequence starting at sτ. Let xτt denote the sequence of investment levels if we use the sequence cτt starting at sτ.
First,
$$s^\tau = (1-\tau)s + \tau s' \ge (1-\tau)c_0 + \tau c_0' = c_0^\tau.$$
Then, xτ0 = sτ − cτ0. Using the concavity of f,
$$f(x_0^\tau) \ge (1-\tau)\, f(s - c_0) + \tau\, f(s' - c_0') = (1-\tau)\, s_1 + \tau\, s_1' \ge (1-\tau)\, c_1 + \tau\, c_1' = c_1^\tau.$$
Continuing by induction, we get that f(xτt) ≥ cτ_{t+1}. Therefore, the sequence cτt is an allowable sequence for sτ.
The sequence cτt is feasible, but not necessarily optimal. The utility function is concave, so
$$V(s^\tau) \ge \sum_{t=0}^{\infty} \delta^t u(c_t^\tau) = \sum_{t=0}^{\infty} \delta^t u\big( (1-\tau)c_t + \tau c_t' \big) \ge (1-\tau) \sum_{t=0}^{\infty} \delta^t u(c_t) + \tau \sum_{t=0}^{\infty} \delta^t u(c_t') = (1-\tau)V(s) + \tau V(s').$$
This shows that V is concave.
Theorem 4.30. If a 1-SecE satisfies E1 – E5, then the correspondence F ∗ that gives the
maximizers of the Bellman equation is single-valued. Therefore, the optimal strategy σ ∗ is
uniquely determined and is a continuous function on R+ .
Proof. We are assuming that u is strictly concave and that f is concave. By Theorem 4.29, V is concave. Combining, u(a) + δ V(f(s − a)) is a strictly concave function of a. It follows that there can be only a single point that maximizes this function, so F∗(s) is a single point, and the optimal strategy is unique. Since an upper-hemicontinuous, single-valued correspondence is continuous, F∗, and hence σ∗, is continuous.
Theorem 4.31. If a 1-SecE satisfies E1 – E7, then for all s > 0, the optimal strategy is an
interior point of F (s) = [0, s], 0 < σ ∗ (s) < s.
Proof. Let s0 > 0, and let s̄t, c̄t = σ∗(s̄t), and x̄t = s̄t − c̄t be the optimal sequences of states, consumptions, and savings. Assume that not all the x̄t are interior. Let x̄τ be the first x̄t that is not interior. We derive a contradiction to the fact that these sequences are optimal. Since x̄τ−1 is interior, s̄τ = f(x̄τ−1) > 0.
First we show that it cannot happen that x̄τ = 0. If this were true, then s̄τ+1 = f(x̄τ) = 0 and s̄t = 0 for all t ≥ τ + 1. The value function starting at s̄τ is V(s̄τ) = u(s̄τ − x̄τ) + δ V(s̄τ+1) = u(s̄τ). This is greater than or equal to the payoff from choosing savings z > 0 for t = τ and savings 0 = f(z) − cτ+1 for t = τ + 1:
$$u(\bar s_\tau) \ge u(\bar s_\tau - z) + \delta\, u(f(z)) + \delta^2 u(0).$$
But
$$\frac{d}{dz} \big[ u(\bar s_\tau - z) + \delta\, u(f(z)) \big] \Big|_{z=0+} = -u'(\bar s_\tau) + \delta\, u'(0+)\, f'(0+) = -u'(\bar s_\tau) + \delta\, (\infty)\, f'(0+) = \infty.$$
Since this derivative is positive, z = 0 cannot be a maximum.
Second, we show that if x̄τ = s̄τ, then x̄τ+1 = s̄τ+1. Now
$$V(\bar s_\tau) = u(\bar s_\tau - \bar x_\tau) + \delta\, u\big( f(\bar s_\tau) - \bar x_{\tau+1} \big) + \delta^2 V(f(\bar x_{\tau+1})) = u(0) + \delta\, u\big( f(\bar s_\tau) - \bar x_{\tau+1} \big) + \delta^2 V(f(\bar x_{\tau+1})).$$
If also x̄τ+1 < s̄τ+1, then the savings z at period τ can be decreased from z = x̄τ = s̄τ while keeping x̄τ+1 fixed,
$$u(0) + \delta\, u(\bar s_{\tau+1} - \bar x_{\tau+1}) + \delta^2 V(f(\bar x_{\tau+1})) \ge u(\bar s_\tau - z) + \delta\, u\big( f(z) - \bar x_{\tau+1} \big) + \delta^2 V(f(\bar x_{\tau+1})),$$
or
$$u(0) + \delta\, u(\bar s_{\tau+1} - \bar x_{\tau+1}) \ge u(\bar s_\tau - z) + \delta\, u\big( f(z) - \bar x_{\tau+1} \big).$$
Since x̄τ+1 is fixed, we can keep all the choices fixed for t ≥ τ + 1. Since z = s̄τ must be an optimal choice,
$$0 \le \frac{d}{dz} \big[ u(\bar s_\tau - z) + \delta\, u\big( f(z) - \bar x_{\tau+1} \big) \big] \Big|_{z=\bar s_\tau} = -u'(0+) + \delta\, u'(\bar s_{\tau+1} - \bar x_{\tau+1})\, f'(\bar s_\tau) = -\infty.$$
Since this derivative is negative, z = s̄τ cannot be a maximum. Therefore, we would need x̄τ+1 = s̄τ+1.
We have shown that if x̄τ = s̄τ , then x̄τ +1 = s̄τ +1 . Continuing by induction, we would
need x̄t = s̄t and c̄t = 0 for all t ≥ τ . Therefore, the value function starting at τ would be
zero. Keeping all the ct = 0 for t ≥ τ + 1 and increasing cτ , we can increase the payoff.
Therefore, this would not be an optimal sequence.
Thus, we have ruled out the possibility of x̄τ being on either end point, and so it and c̄τ
must be interior.
Theorem 4.32. If a 1-SecE satisfies E1 – E7, then for s0 > 0, the optimal strategy σ∗ satisfies the Ramsey-Euler equation
$$u'(\sigma^*(s_t)) = \delta\, u'(\sigma^*(s_{t+1}))\, f'(s_t - \sigma^*(s_t)),$$
where st+1 = f(st − σ∗(st)).
Proof. In the proof of the last theorem, we showed that c = ct = σ ∗ (st ) is an interior
maximum of
u(c) + δ u (f (st − c) − xt+1 ) .
The Ramsey-Euler equation is the first order condition for an interior maximum of this function.

Theorem 4.33. If a 1-SecE satisfies E1 – E7, then the optimal strategy σ ∗ is increasing on
R+ .

Proof. Suppose the theorem is false and there exist s < ŝ with c = σ∗(s) ≥ σ∗(ŝ) = ĉ. Since s − c < ŝ − ĉ, s1 = f(s − c) ≤ f(ŝ − ĉ) = ŝ1. Let c1 = σ∗(s1) and ĉ1 = σ∗(ŝ1). By the Ramsey-Euler equation, u′(c) = δ u′(c1) f′(s − c) and u′(ĉ) = δ u′(ĉ1) f′(ŝ − ĉ), so
$$\frac{u'(c)}{u'(\hat c)} = \frac{u'(c_1)\, f'(s - c)}{u'(\hat c_1)\, f'(\hat s - \hat c)} \quad\text{or}\quad \frac{u'(c_1)}{u'(\hat c_1)} = \frac{u'(c)\, f'(\hat s - \hat c)}{u'(\hat c)\, f'(s - c)}.$$
Since u is strictly concave, u′(c) ≤ u′(ĉ). Also, s − c < ŝ − ĉ, so f′(s − c) ≥ f′(ŝ − ĉ). Combining, we must have u′(c1) ≤ u′(ĉ1) and c1 ≥ ĉ1. Thus, the situation is repeated at the next period with s1 ≤ ŝ1 and c1 ≥ ĉ1. Continuing by induction, we get that st ≤ ŝt and ct ≥ ĉt for all t, so that V(s) ≥ V(ŝ). This contradicts the fact that V is increasing and proves the theorem.
The last theorem shows that, under one additional assumption, there is a positive steady state of consumption and savings. For this theorem, we need the following assumption.
E8. The production function f is strictly concave on R+ and δ f′(0+) > 1.
Note that for f(x) = k x, E8 requires kδ > 1, so k > 1 and E3.a is not valid. However, if kδ² < 1, then E3.b is satisfied and we still get the earlier results.
Theorem 4.34. If a 1-SecE satisfies E1 – E8, then there is a unique x∗ such that δ f′(x∗) = 1. Let s∗ = f(x∗) and c∗ = s∗ − x∗ be the associated state and consumption. For s0 > 0, define ct and st inductively by ct = σ∗(st) and st+1 = f(st − ct). Then limt→∞ st = s∗ and limt→∞ ct = c∗.
Proof. Since δ f′(0+) > 1, δ f′(x̄) < 1, and f is strictly concave, there is a unique x∗ such that δ f′(x∗) = 1. Since the savings function ξ∗ and the production function f are nondecreasing, the sequence st is a nondecreasing function of t and has a limit s∞. Since σ∗ is nondecreasing, ct is nondecreasing and has a limit c∞ with c∞ = σ∗(s∞). Since st+1 = f(st − ct), s∞ = f(s∞ − c∞).
Also, since u′(ct) = δ u′(ct+1) f′(st − ct), we get that
$$u'(c_\infty) = \delta\, u'(c_\infty)\, f'(s_\infty - c_\infty), \quad\text{so}\quad 1 = \delta f'(s_\infty - c_\infty).$$
Thus, x∞ = s∞ − c∞ satisfies the equation for x∗, and so x∞ = x∗, s∞ = f(x∞) = f(x∗) = s∗, and c∞ = s∞ − x∞ = s∗ − x∗ = c∗.
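Computing the steady state numerically amounts to finding the root of δ f′(x) = 1. The bisection sketch below is illustrative only, using f(x) = x^{1/2} (which is strictly concave with δ f′(0+) = ∞ > 1) as a stand-in for a production function satisfying E8.

```python
# Solve delta * f'(x*) = 1 by bisection for f(x) = sqrt(x), f'(x) = 0.5/sqrt(x).
delta = 0.9
fprime = lambda x: 0.5 / x ** 0.5

lo, hi = 1e-12, 1.0
while delta * fprime(hi) > 1.0:     # expand until the root is bracketed
    hi *= 2.0
for _ in range(100):                # delta*f'(x) - 1 is decreasing in x
    mid = 0.5 * (lo + hi)
    if delta * fprime(mid) > 1.0:
        lo = mid
    else:
        hi = mid

x_star = 0.5 * (lo + hi)
s_star = x_star ** 0.5              # s* = f(x*), c* = s* - x*
print(x_star, (delta / 2.0) ** 2)   # agrees with the analytic root (delta/2)^2
```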
For our specialized optimal growth Example 4.21, the hypotheses of Theorem 4.34 are not valid: the production function f(x) = k x is not strictly concave, so E8 fails, and δ f′(x) = δ k is constant, so there is no x∗ with δ f′(x∗) = 1 in general. The conclusions of Theorem 4.34 also fail for that example: under the optimal strategy, st+1 = k²δ² st, so when kδ < 1 both st and ct go to zero rather than to a nonzero limit.

4.3.4. Continuity of Value Function


In this section, we present the essential steps to prove the continuity of the value function.
See Theorem 12.13 and Step 1 in Section 12.5.3 in Sundaram [14] for details. In order to
define a convergent sequence, it is necessary to already know the point to which it converges.
It is convenient to have a criterion for a sequence to converge that uses only the points in the
sequence and not the limit point.

Definition. Given a norm ‖·‖∗ on a space S, a sequence {xk} in S is called Cauchy in terms of the norm ‖·‖∗, or is a Cauchy sequence, provided that for all ε > 0 there exists a K(ε) such that
$$\| x_j - x_k \|_* < \varepsilon \quad\text{whenever } j, k \ge K(\varepsilon).$$
If we do not specify the norm for S = Rⁿ, then we mean the Euclidean norm.
Definition. A space S with a norm ‖·‖∗ is called complete provided that for any Cauchy sequence {xk} in S, there exists a point a ∈ S such that the sequence {xk} converges to a.
Theorem 4.35 (S1.11). Euclidean spaces Rn with the Euclidean norm are complete.
The proof must be given in R first. Then the fact that a sequence in Rⁿ converges if and only if each of its components converges can be used to prove it in any dimension.
Since we are going to seek the solution of the Bellman equation both in the space of all bounded functions and in the space of all bounded continuous functions, we introduce notation for these function spaces. For S ⊂ Rⁿ, define the spaces of bounded and of bounded continuous functions as follows:
$$B(S) = \{\, W : S \to \mathbb{R} : W \text{ is bounded} \,\},$$
$$C(S) = \{\, W \in B(S) : W \text{ is continuous} \,\}.$$
The sup norm on these spaces is
$$\| W - V \|_0 = \sup\{\, |W(x) - V(x)| : x \in S \,\}.$$
Proposition 4.36 (Uniform convergence of functions). The metric spaces B(S) and C(S) are complete with the metric ‖·‖0, i.e., every Cauchy sequence in one of these spaces converges to a function in the same space.
A sketch of the proof is as follows. Let X be either B(S) or C(S). If Wj is a sequence in X that is Cauchy, then for each s, the sequence Wj(s) is Cauchy in R and converges to a limit value W∞(s) since R is complete. This defines the limit function. It must then be shown that Wj converges to W∞ in terms of the norm ‖·‖0 and that W∞ ∈ X. We omit the details.
Theorem 4.37. Assume that a SDP satisfies SDB. For W ∈ B(S) or C(S) define
T (W )(s) = sup{ r(s, a) + δ W ◦ f (s, a) : a ∈ F (s) }.
Then, T : B(S) → B(S), T : C(S) → C(S), and T has a unique fixed function in
B(S) and C(S). In fact, for any W0 ∈ B(S) (or C(S)), the sequence of functions defined
inductively by Wj+1 = T (Wj ) converges to the fixed function.
The value function V is this fixed function, and so it is continuous.
Proof. If W ∈ B(S), then hW (s, a) = r(s, a) + δ W ◦ f (s, a) is bounded, so T (W ), which
is the supremum over a, is bounded and is in B(S). If W ∈ C(S), then hW (s, a) is continuous
and F is a continuous correspondence. By the Maximum Theorem, T (W ) is continuous on
S, and so T (W ) ∈ C(S). Thus, T takes the spaces into themselves.
Let X be either B(S) or C(S). It can be directly checked that the operator T has the
following two properties for W1 , W2 ∈ X and any constant c:
(i) If W2 (s) ≥ W1 (s) for all s ∈ S, then T (W2 )(s) ≥ T (W1 )(s) for all s ∈ S.
(ii) T (W1 (·) + c) = T (W1 (·)) + δ c, where 0 < δ < 1 is the discount factor.
The following lemmas complete the proof that there is a unique function fixed by T. The value function satisfies the Bellman equation by Theorem 4.20, and so it is the unique function fixed by T in B(S). Since this unique fixed function is also in C(S), V(s) must be continuous.

Lemma 4.38. If a SDP satisfies SDB, then T is a contraction: ‖T(W2) − T(W1)‖0 ≤ δ ‖W2 − W1‖0 for all W2, W1 ∈ X, where X equals B(S) or C(S).
Proof. For any two W1, W2 ∈ X, W2(s) ≤ W1(s) + ‖W2 − W1‖0. By properties (i) and (ii),
$$T(W_2)(s) \le T\big( W_1 + \|W_2 - W_1\|_0 \big)(s) = T(W_1)(s) + \delta \|W_2 - W_1\|_0, \quad\text{so}$$
$$T(W_2)(s) - T(W_1)(s) \le \delta \|W_2 - W_1\|_0.$$
Reversing the roles of W1 and W2, we get T(W1)(s) − T(W2)(s) ≤ δ ‖W2 − W1‖0. Thus,
$$|T(W_2)(s) - T(W_1)(s)| \le \delta \|W_2 - W_1\|_0 \quad\text{for all } s \in S.$$
Taking the supremum, we get ‖T(W2) − T(W1)‖0 ≤ δ ‖W2 − W1‖0.
Lemma 4.39. A contraction T on a complete metric space X has a unique fixed point.
Proof. Take any W0 ∈ X. By induction, define Wj+1 = T(Wj). We estimate the norms of the differences:
$$\| W_{j+1} - W_j \|_0 \le \delta \| W_j - W_{j-1} \|_0 \le \delta^2 \| W_{j-1} - W_{j-2} \|_0 \le \cdots \le \delta^j \| W_1 - W_0 \|_0,$$
so
$$\| W_{j+k+1} - W_j \|_0 \le \| W_{j+k+1} - W_{j+k} \|_0 + \| W_{j+k} - W_{j+k-1} \|_0 + \cdots + \| W_{j+1} - W_j \|_0$$
$$\le \big( \delta^{j+k} + \delta^{j+k-1} + \cdots + \delta^j \big) \| W_1 - W_0 \|_0 \le \delta^j \big( 1 + \delta + \delta^2 + \cdots \big) \| W_1 - W_0 \|_0 = \frac{\delta^j}{1 - \delta}\, \| W_1 - W_0 \|_0.$$
Therefore, the sequence {Wj} is Cauchy, and so it converges to some limit W∗ in X. Since a contraction is continuous,
$$W^* = \lim_{j\to\infty} W_j = \lim_{j\to\infty} W_{j+1} = \lim_{j\to\infty} T(W_j) = T(W^*),$$
so W∗ is a fixed point of T.
The fixed point is unique: assume both W∗ and W̄ are fixed. Then
$$\| W^* - \bar W \|_0 = \| T(W^*) - T(\bar W) \|_0 \le \delta \| W^* - \bar W \|_0, \quad\text{so}\quad (1 - \delta) \| W^* - \bar W \|_0 \le 0.$$
Therefore, ‖W∗ − W̄‖0 must be zero and W∗ = W̄. Thus, the fixed point is unique.
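The proof of Lemma 4.39 is constructive: iterating the contraction from any starting point converges geometrically to the fixed point. The toy sketch below is illustrative only; it iterates the scalar map T(x) = cos(x), which is a contraction on [−1, 1] since |T′(x)| = |sin x| ≤ sin 1 < 1 there, and the successive differences shrink by roughly that factor.

```python
import math

# Fixed-point iteration for the contraction T(x) = cos(x) on [-1, 1].
# The gaps |x_{j+1} - x_j| decay geometrically, as in Lemma 4.39.
x, prev_gap = 0.0, None
for j in range(25):
    x_new = math.cos(x)
    gap = abs(x_new - x)
    if prev_gap:
        print(j, round(gap / prev_gap, 3))   # ratios stay below sin(1) ~ 0.84
    x, prev_gap = x_new, gap
print("fixed point ~", x)                    # x* with cos(x*) = x*, ~ 0.7391
```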

Alternative Proof of Theorem 4.26 with E3.b.
For this alternative proof when the reward function is unbounded but satisfies E3.b, we define the following new norm on functions,
$$\| W \|_* = \sup\left\{ \frac{W(s)}{u(s)} : s > 0 \right\}. \tag{*}$$
The spaces of functions for which this norm is finite,
$$B_*(\mathbb{R}_+) = \{\, W : \mathbb{R}_+ \to \mathbb{R}_+ : W(0) = 0,\; \|W\|_* < \infty \,\} \quad\text{and}$$
$$C_*(\mathbb{R}_+) = \{\, W \in B_*(\mathbb{R}_+) : W \text{ is continuous} \,\},$$
are complete as before.

Lemma 4.40. Assume a 1-SecE satisfies E1, E2, and E3.b.
a. T preserves both B∗(R+) and C∗(R+).
b. T is a contraction on B∗(R+), i.e., ‖T(W2) − T(W1)‖∗ ≤ λ ‖W2 − W1‖∗ for all W2, W1 ∈ B∗(R+).
Proof. (a) All W(s) ≥ 0, so
$$T(W)(s) = \sup\{\, u(a) + \delta\, W(f(s - a)) : a \in [0, s] \,\} \le u(s) + \delta \|W\|_*\, u(f(s)) \le u(s) + \lambda \|W\|_*\, u(s) = \big( 1 + \lambda \|W\|_* \big) u(s).$$
So,
$$\| T(W) \|_* \le 1 + \lambda \|W\|_* < \infty.$$
Convergence in the ‖·‖∗ norm implies uniform convergence on compact intervals, so a Cauchy sequence in C∗(R+) converges to a continuous function in C∗(R+).
(b) For two functions W1, W2 ∈ B∗(R+),
$$T(W_2)(s) = \sup\{\, u(a) + \delta\, W_2(f(s - a)) : a \in [0, s] \,\}$$
$$\le \sup\{\, u(a) + \delta\, W_1(f(s - a)) + \delta \|W_2 - W_1\|_*\, u(f(s - a)) : a \in [0, s] \,\}$$
$$\le \sup\{\, u(a) + \delta\, W_1(f(s - a)) : a \in [0, s] \,\} + \delta \|W_2 - W_1\|_*\, u(f(s))$$
$$\le T(W_1)(s) + \lambda \|W_2 - W_1\|_*\, u(s), \quad\text{so}$$
$$T(W_2)(s) - T(W_1)(s) \le \lambda \|W_2 - W_1\|_*\, u(s).$$
Reversing the roles of W1 and W2, we get the other inequality, so
$$\frac{| T(W_2)(s) - T(W_1)(s) |}{u(s)} \le \lambda \|W_2 - W_1\|_*, \quad\text{and}\quad \| T(W_2) - T(W_1) \|_* \le \lambda \|W_2 - W_1\|_*.$$

The rest of the proof goes as before since convergence in the k · k∗ norm implies uniform
convergence on compact intervals.

4.3. Exercises
4.3.1. Consider the SDP problem with reward function r(s, a) = u(a) = a^{2/3}, transition function f(s, a) = k(s − a) with k ≥ 1, F(s) = [0, s], and 0 < δ < 1.
a. Using the guess that V(s) = M s^{2/3}, find the action a = σ(s) in terms of M that maximizes the right-hand side of the Bellman equation.
b. Substitute the solution of part (a) into the Bellman equation to determine the constant M and V(s).
c. What is the optimal strategy?
4.3.2. (Brock-Mirman growth model.) Consider the SDP problem with st the amount of capital at period t, F(s) = (0, s] the allowable consumption, at the consumption at period t, reward function r(s, a) = u(a) = ln(a), discount 0 < δ < 1, and transition (production) function f(s, a) = (s − a)^β with 0 < β ≤ 1 that determines the capital at the next period. (Note that u(a) = ln(a) is unbounded below at 0, and the choices are open at 0, but it turns out that the Bellman equation does have a solution.)

a. Using the guess that V (s) = A + B ln(s), find the action a = σ(s) in
terms of A and B that maximizes the right hand side of the Bellman equation.
b. Substitute the solution of part (a) in the Bellman equation to determine the
constants A and B. Hint: The coefficients of ln(s) on the two sides of the Bellman equation must be equal and the constants must be equal. Solve for B
first and then A.
c. What are the value function and optimal strategy?
4.3.3. Consider the SDP problem with S = [0, ∞), A = [0, ∞), F(s) = [0, s], f(s, a) = 2s − 2a, r(s, a) = 2 − 2e^{−a}, and δ = 1/2. Start with the guess that this SDP has a value function of the form V(s) = A − B e^{−bs}.
a. Find the action ā = σ(s) that maximizes the right-hand side of the Bellman equation. (The answer can contain the unspecified parameters A, B, and b.)
b. What equations must A, B, and b satisfy to be a solution of the Bellman equation? Solve these equations for A, B, and b.
c. What are the value function and the optimal strategy?
4.3.4. Consider the SDP problem with discount 0 < δ < 1, S = [0, ∞), A = [0, ∞), F(s) = [0, s], bounded reward function
$$r(s, a) = \frac{a}{1 + a} = 1 - \frac{1}{1 + a},$$
and transition function f(s, a) = k(s − a) with k > 1 and kδ = 1. Start with the guess that this SDP has a value function of the form
$$V(s) = \frac{s}{1 + B s} = \frac{1}{B} \left( 1 - \frac{1}{1 + B s} \right).$$
a. What is the Bellman equation for this problem?
b. Find the action a = σ(s) that maximizes the right-hand side of the Bellman equation.
c. Substitute the solution of part (b) into the Bellman equation to determine the constant B.
d. What is the optimal strategy?
4.3.5. Consider the SDP problem with st the amount of capital at period t, F(s) = (0, s] the allowable consumption, at the consumption at period t, reward function r(s, a) = u(a) = ln(a), discount 0 < δ < 1, and transition (production) function f(s, a) = s − a that determines the capital at the next period. (Note that u(a) = ln(a) is
unbounded below at 0, and the choices are open at 0, but it turns out that the Bellman
equation does have a solution.)
a. Using the guess that V (s) = A + B ln(s), find the action a = σ(s) in
terms of A and B that maximizes the right hand side of the Bellman equation.
b. Substitute the solution of part (a) in the Bellman equation to determine the
constants A and B . Hint: The coefficients of ln(y) on the two sides of the
Bellman equation must be equal and the constants must be equal. Solve for B
first and then A.
c. What are the value function and optimal strategy?

4. Exercises for Chapter 4


4.1. Indicate which of the following statements are always true and which are false. Also give a short reason for your answer.
a. For a finite horizon dynamic programming problem with C 1 reward functions
rt (s, a) and C 1 transition functions ft (s, a) and continuous feasible action
correspondences F t (s), the optimal strategy must be a continuous function.
b. The feasible set F for a linear program is always a convex set.
c. If f : [0, 1] × [0, 1] → R is continuous, then the point c∗ = σ ∗ (s) that
maximizes f (s, c) for c ∈ [0, s] is a continuous function of s.
Appendix A

Mathematical Language

We summarize some of the mathematical language with which the reader should be familiar.
We denote the set of real numbers by R and n-dimensional Euclidean space by Rn .
We assume that the reader can understand common notations related to sets. To express that
p is an element of a set S, we write p ∈ S, and p ∈ / S means that p is not an element of S.
We often define sets using notation like { x ∈ R : x² ≤ 4 } for the interval [−2, 2]. The set { x ∈ R : x² < 0 } is the empty set, denoted by ∅. If every element of the set S1 is an element
of S2 , then we call S1 a subset of S2 , and write S1 ⊂ S2 . When S1 ⊂ S2 , we allow the two
sets to be equal. Given two sets S1 and S2 , S1 = S2 if and only if S1 ⊂ S2 and S2 ⊂ S1 .
We often express the domain and target space of a function by a notation f : S ⊂ Rn → R:
This means that S is the domain and a subset of Rn , and f takes values in the real numbers.
Quantifiers are very important. The phrases “for all”, “for every”, “for any”, and “for each”
are all expressions for the universal quantifier and mean essentially the same thing, but they do
have different connotations. When we say “for all x ∈ S, x² ≥ x”, we are thinking collectively of the whole set of x. Consider the statement “For a function g : R → R and for any b ∈ R, we can form the set { x ∈ R : g(x) ≤ b }.” The set that follows depends on the particular value of b, so we are taking the values of b one at a time; thus we use “for any” or “for each” and not “for all”.
The phrases “for some” and “there exists” are expressions for the existential quantifier. The statement “for some x ∈ R, x² ≥ 4 + x” is true. But “for all x ∈ R, x² ≥ 4 + x” is false. This latter is false since there is some x, for example x = 1, for which the statement is not true. For a statement containing a universal quantifier to be true, it must be true for every object satisfying the quantifier; thus, to show it is false, we only need to find one counterexample. Also, quantities can generally depend on things earlier in the sentence: “For any ε > 0, there exists a δ > 0 such that” means that the δ can depend on ε. Therefore, the order of quantifiers is important. The sentence “There exists a δ > 0 such that for all ε > 0” would mean that the same δ works for all of the ε.
The use of “or” is inclusive, with one or both being true. The statement “I am taking a class
in Mathematics or Economics” is true even if I am taking both classes.
“If A is true, then B is true” and “A implies B” are two ways of saying the same thing that
we sometimes denote by the notation “A ⇒ B”. ‘A’ is the hypothesis and ‘B’ is the conclusion.
Either of these statements is equivalent to its contrapositive statement: “If B is not true, then A
is not true.” We often use this type of logic in our proofs without explicitly commenting on it.
The converse is a very different statement: “If B is true, then A is true” or in symbolic notation


“A ⇐ B.” We use “iff” to mean “if and only if”, so each implies the other, “A ⇔ B.” Notice
that the statement “If the moon is made of blue cheese then pigs can fly” is true because the
hypothesis is false.
At various places we talk about necessary conditions and sufficient conditions. In the state-
ment “If A, then B”, ‘A’ is a sufficient condition for “B” and “B” is a necessary condition for
“A”. Theorem 3.12 states that “if the feasible set satisfies the constraint qualification and a dif-
ferentiable function f attains a maximum for the feasible set at a point x∗ , then conditions
KKT-1,2,3 hold.” Thus, KKT-1,2,3 are necessary conditions for a maximum. Theorem 3.19
states that “if the feasible set is convex and a differentiable, concave function f satisfies KKT-
1,2,3 at a point x∗ , then f attains a maximum at x∗ .” Thus, when f is concave, conditions
KKT-1,2,3 are sufficient for a maximum. The Extreme Value Theorem 2.14 states that continuity of a function on a closed and bounded set is a sufficient condition for the existence of a maximum.
The reader should be familiar with how quantifiers change when a statement is negated. The negation of “all mathematics classes are difficult” is “at least one (or some) mathematics class is not difficult.” The negation of “some mathematics classes are difficult” is “every mathematics class is not difficult,” i.e., no mathematics class is difficult.
In definitions, we use “provided that” to define a term. What follows “provided that” gives the
defining conditions for the word or phrase being defined. Thus, its meaning is “if and only if.”
Many authors use “if”, but this has a different meaning than the use of “if” in theorems, so we
prefer “provided that” which is not often used in other contexts.
In Chapter 4, we prove some equalities by induction, although only over a finite range
of integers.
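As a reminder of the pattern (illustrated with a standard identity rather than one from Chapter
4): to prove 1 + 2 + · · · + n = n(n + 1)/2 for all integers n with 1 ≤ n ≤ N, first check the
base case n = 1, where both sides equal 1; then, assuming the equality holds for n, add
n + 1 to both sides to obtain 1 + 2 + · · · + n + (n + 1) = n(n + 1)/2 + (n + 1) =
(n + 1)(n + 2)/2, which is the identity for n + 1.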
This quick summary of some mathematical language and logic is not complete by any
means. A more complete introduction is given in a book on the foundations of higher
mathematics such as Bond and Keane [3]. The main point is to be able to read definitions and
theorems that are stated using formal mathematical language. Such statements must be read
carefully and digested in order to understand their meaning. In theorems, be sure to
understand what the assumptions or hypotheses are and what the conclusion is. In a
definition, be sure to understand the key aspects of the concept presented.
Bibliography
1. K. Arrow and F. H. Hahn, General Competitive Analysis, Holden-Day, 1971.
2. M. Bazaraa, H. Sherali, and C. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley-Interscience, Hoboken, NJ, 2006.
3. R. Bond and W. Keane, An Introduction to Abstract Mathematics, Waveland Press, 1999.
4. D. Besanko and R. Braeutigam, Microeconomics, 4th edition, John Wiley & Sons, New York, 2010.
5. A. Chiang, Fundamental Methods of Mathematical Economics, McGraw-Hill, New York, 1984.
6. S. Colley, Vector Calculus, 4th edition, Pearson Prentice Hall, 2012.
7. C. Hassell and E. Rees, “The index of a constrained critical point”, The American Mathematical Monthly, October 1993, pp. 772–778.
8. H. Jongen, K. Meer, and E. Triesch, Optimization Theory, Kluwer Academic Publishers, Norwell, MA, 2004.
9. D. Lay, Linear Algebra and its Applications, 4th edition, Addison-Wesley, Boston, 2012.
10. W. Rudin, Principles of Mathematical Analysis, 3rd edition, McGraw-Hill, New York, 1976.
11. C. Simon and L. Blume, Mathematics for Economists, W. W. Norton & Company, New York, 1994.
12. N. Stokey and R. Lucas Jr., Recursive Methods in Economic Dynamics, Harvard University Press, Cambridge, MA, 1989.
13. G. Strang, Linear Algebra and its Applications, Harcourt Brace Jovanovich, San Diego, 1976.
14. R. Sundaram, A First Course in Optimization Theory, Cambridge University Press, New York & Cambridge, UK, 1996.
15. W. Wade, Introduction to Analysis, 4th edition, Pearson Prentice Hall, Englewood Cliffs, NJ, 2010.
16. R. Walker, Introduction to Mathematical Programming, 4th edition, Pearson Learning Solutions, Boston, MA, 2013.
Index
action, 117, 125
action space, 125
affine, 45
allowable sequence of actions, 125
artificial objective function, 12
artificial variable, 10
attainable values, 1

basic feasible solution, 7
basic solution, 7, 32
basic variables, 7
Bellman equation, FHDP, 122
Bellman equation, SDP, 125
best response correspondence, 109, 114
bordered Hessians, 100
boundary, 40
bounded, 41
bounded correspondence, 108
bounded functions, 142
bounded linear program, 4
budget correspondence, 113

C¹, 45
C¹ rescaling, 89
C², 47
Cauchy sequence, 142
closed, 40
closed ball, 40
closed-graphed correspondence, 108
closed-valued correspondence, 108
closure, 41
compact, 41
compact-valued correspondence, 108
complement, set, 40
complementary slackness, 18, 22, 76
complete, 142
concave function, 82
constraint
   equality, 10
   requirement, 10
   resource, 3
constraint qualification, 69, 75
continuous, 42
continuous correspondence, 111
continuously differentiable, 45
convex combination, 31
convex function, 82
convex set, 31, 81
correspondence, 107
   bounded, 108
   closed-graphed, 108
   closed-valued, 108
   compact-valued, 108
   continuous, 111
   feasible action, 117, 121, 125
   graph, 107
   locally bounded, 108
   lower-hemicontinuous, 111
   upper-hemicontinuous, 111
critical point, 56
cycling, 34

degenerate basic solution, 7, 34
departing variable, 8
derivative, 45
   second, 47
differentiable
   continuously, 45
   twice continuously, 47
discount factor, 117, 125
dual linear program, 18, 19
dynamic programming problem, 117

effective, 75
entering variable, 7
equality constraint, 10
extreme point, 32
extremizer, 56
extremum, 56

feasible action correspondence, 117, 121, 125
feasible set, 3
FHDP, 121
finite-horizon dynamic programming, 121
first order Karush-Kuhn-Tucker conditions, 76
first order Lagrange multiplier equations, 72
free variables, 7

gradient, 45
graph of correspondence, 107
greatest lower bound, 121

Hessian matrix, 47

indefinite, 50
infeasible, 4
infimum, 121
infinite-horizon dynamic programming, 125
interior, 41
inverse image, 43

Karush-Kuhn-Tucker Theorem, 75
Karush-Kuhn-Tucker Theorem under Convexity, 84
KKT-1,2,3, 76

Lagrangian, 72
least upper bound, 121
level set, 43
limit, 42
local maximum, 55
local minimum, 56
locally bounded correspondence, 108
lower-hemicontinuous correspondence, 111

marginal value, 18, 19, 28, 73
Markovian strategy profile, 117, 122
maximizer, 1, 55
maximizers
   set of, 92, 109
maximum, 1, 55
   global, 55
   local, 55
minimizer, 1, 56
minimum, 1, 56
   global, 56
   local, 56
MLP, 17, 19
mLP, 18, 19

negative definite, 50
negative semidefinite, 50
neighborhood, 111
non-basic variables, 7
non-degenerate basic solution, 7
non-Markovian strategy profile, 122
norm, 142
null space, 63

objective function, 3
objective function row, 9
open, 40
open ball, 40
optimal solution, 1
optimal stationary strategy, 125
optimal strategy, 118
optimal strategy profile, 122
optimal tableau, 10

polyhedron, 32
positive definite, 50
positive semidefinite, 50
principal submatrices, 50
pseudo-concave, 93

quadratic form, 50
quasi-convex, 88

rank, 2
requirement constraint, 10
rescaled concave function, 89
rescaled convex function, 89
rescaling, 89
resource constraint, 3
reward function, 117, 121, 125

SDB, 134
SDP, 125
second derivative, 47
shadow prices, 17
simplex, 43
slack, 75
slack variable, 5
Slater condition, 84
standard maximization linear program, 3
   slack-variable form, 6
state, 117, 121, 125
state space, 121, 125
stationary dynamic program, 125
stationary strategy, 125
strategy
   Markovian, 117, 122
strict local maximum, 56
strict local minimum, 56
strictly concave function, 82
strictly convex function, 82
supremum, 121
surplus variable, 10

tableau, 9
   optimal, 10
tangent space, 71
tight, 75
total reward, 121, 122, 125
transition function, 117, 121, 125
transpose, 2

unbounded linear program, 4
unconstrained local maximum, 56
unconstrained local minimum, 56
upper-hemicontinuous correspondence, 111
utility function, 117

value function, 118, 121, 125
vertex, 32