João Lopes Dias
Contents
1. Introduction
1.1. Preliminaries
1.2. Optimal points and values
1.3. The optimization problems
1.4. Existence of optimal points
2. Unconstrained local optimization
2.1. Critical points
2.2. Classification of critical points
3. Equality constraints
3.1. Lagrange theorem
3.2. Classification of critical points
4. Inequality constraints
4.1. Kuhn-Tucker conditions
4.2. Mixed constraints
5. Convex and concave optimizations
5.1. Optimization
Index
1. Introduction
Proof. This follows from the fact that the image of a compact set under a continuous scalar function is a compact set in $\mathbb{R}$. Compact sets in $\mathbb{R}$ always have a maximum and a minimum. We can therefore apply this to $f(D)$, whose extreme points are the maximum and minimum of $f$ on $D$.
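For instance, $f(x) = x(1-x)$ is continuous on the compact set $[0,1]$, and it indeed attains its maximum value $1/4$ at $x = 1/2$ and its minimum value $0$ at the endpoints.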
Remark 1.7. The Weierstrass theorem only states the existence of optimal points; it does not tell us how to compute them. For that we need to introduce other tools, such as differential calculus, as is done in the next sections. This restricts our study to differentiable functions. In the last sections we will be dealing with non-differentiable functions which are convex or concave.
Example 1.9. Let $D = \,]0,1[$ (not closed) and $f(x) = x$. Then $f(D) = \,]0,1[$ and $f$ does not have optimal values on $D$.

Example 1.10. Let $D = [0,1]$ and
$$f(x) = \begin{cases} x, & x \in \,]0,1[ \\ \tfrac12, & x \in \{0,1\} \end{cases}$$
(not continuous). Then $f(D) = \,]0,1[$ and $f$ does not have optimal values on $D$.
(1) The converse of the above theorem is not true. There are examples of critical points which are not local optimal points. For the function $f(x) = x^3$ the point $x^* = 0$ is critical, since $f'(0) = 0$, but not optimal: $f$ is negative to the left of $0$ and positive to the right, so $0$ is neither a local maximizer nor a local minimizer.
(2) The above theorem is restricted to interior points of $D$ where the function is differentiable. For example, $f(x,y) = \sqrt{x^2 + y^2}$ on $D = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$ has a minimizer at the origin (which is in the interior of $D$), but $f$ is not differentiable there.
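Indeed, along the first coordinate axis we have $\frac{f(t,0) - f(0,0)}{t} = \frac{|t|}{t}$, which has no limit as $t \to 0$, so not even the partial derivative $\frac{\partial f}{\partial x}(0,0)$ exists.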
3. Equality constraints
Proof. First, reorder the coordinates such that the first $m$ rows of $Dg(x^*)$ are linearly independent. Write the coordinates $x = (w,z) \in \mathbb{R}^m \times \mathbb{R}^{n-m}$ and suppose that $x^* = (w^*, z^*) \in D$ is a local maximizer (we can use the same ideas for a minimizer).
Since $\operatorname{rank} \frac{\partial g}{\partial w}(x^*) = m$ we have that $\det \frac{\partial g}{\partial w}(x^*) \ne 0$. By the implicit function theorem, there is a $C^1$ function $h \colon V \to \mathbb{R}^m$ defined on a neighborhood $V$ of $z^*$ such that
• $h(z^*) = w^*$,
• $g(h(z), z) = 0$, $z \in V$,
• $Dh(z^*) = -\left[\frac{\partial g}{\partial w}(x^*)\right]^{-1} \frac{\partial g}{\partial z}(x^*)$.
Choose $\lambda^{*T} = -\frac{\partial f}{\partial w}(x^*) \left[\frac{\partial g}{\partial w}(x^*)\right]^{-1}$, so that
$$\frac{\partial f}{\partial w}(x^*) + \lambda^{*T} \frac{\partial g}{\partial w}(x^*) = 0.$$
Finally, let $F \colon V \to \mathbb{R}$, $F(z) = f(h(z), z)$, which has a local maximum at $z^* \in V$. This is an unconstrained problem since $V$ is open, thus $DF(z^*) = 0$. That is,
$$Df(h(z^*), z^*) \begin{pmatrix} Dh(z^*) \\ I \end{pmatrix} = 0.$$
This yields, after simplification,
$$\frac{\partial f}{\partial z}(x^*) + \lambda^{*T} \frac{\partial g}{\partial z}(x^*) = 0.$$
Exercise 3.2. Let $f(x,y) = x^3 + y^3$ and $g(x,y) = x - y$. Solve with respect to $(x,y)$ and $\lambda$ the equations $Df(x,y) + \lambda^T Dg(x,y) = 0$ and $g(x,y) = 0$.
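The system can also be checked symbolically; the following is a minimal sketch using SymPy (assuming it is available; the computational tool is an arbitrary choice).
\begin{verbatim}
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x**3 + y**3
g = x - y

# first-order Lagrange conditions Df + lambda*Dg = 0 together with g = 0
eqs = [sp.diff(f, x) + lam*sp.diff(g, x),   # 3x^2 + lambda = 0
       sp.diff(f, y) + lam*sp.diff(g, y),   # 3y^2 - lambda = 0
       g]                                   # x - y = 0
print(sp.solve(eqs, [x, y, lam], dict=True))
# only real solution: x = y = 0 with lambda = 0
\end{verbatim}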
There is a criterion for the positive and negative definite cases. For each $i = m+1, \dots, n$ define the $(m+i) \times (m+i)$ symmetric matrix
$$C_i = \begin{pmatrix} 0 & B_i \\ B_i^T & A_i \end{pmatrix},$$
where $A_i$ is the matrix consisting of the first $i$ rows and $i$ columns of $A$, and $B_i$ is made of the first $i$ columns of $B$.
(1) $A > 0$ on $Z$ iff $(-1)^m \det C_i > 0$ for every $i = m+1, \dots, n$.
(2) $A < 0$ on $Z$ iff $(-1)^i \det C_i > 0$ for every $i = m+1, \dots, n$.
Example 3.3. Let
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 2 \end{pmatrix}.$$
In this case $m = 1$ and $n = 2$. Thus,
$$C_2 = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 1 \end{pmatrix}$$
and $(-1)^1 \det C_2 = 5 > 0$. Therefore, $A > 0$ on $Z$.
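The determinant above is easy to double-check numerically; here is a small NumPy sketch (the numerical check itself is not part of the example).
\begin{verbatim}
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])
B = np.array([[1., 2.]])              # 1 x 2 block, so m = 1 and n = 2

# bordered matrix C_2 = [[0, B], [B^T, A]]
C2 = np.block([[np.zeros((1, 1)), B],
               [B.T, A]])
print((-1)**1 * np.linalg.det(C2))    # approximately 5 > 0, hence A > 0 on Z
\end{verbatim}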
and y ∈ Z(x∗ ).
Consider the $C^2$ function $L(x) = f(x) + \lambda^{*T} g(x)$ on $D$ by fixing $\lambda = \lambda^*$. Notice that $DL(x^*) = 0$ and Taylor's formula around $x^*$ yields
$$L(x_{k_n}) = L(x^*) + \frac12 (x_{k_n} - x^*)^T D^2 L(c_n)(x_{k_n} - x^*)$$
for some $c_n$ as before. Hence,
$$\frac12\, y_{k_n}^T D^2 L(c_n)\, y_{k_n} = \frac{f(x_{k_n}) - f(x^*)}{\|x_{k_n} - x^*\|^2} \le 0.$$
For $n \to +\infty$ we get $y^T D^2 L(x^*)\, y \le 0$.
Exercise 3.5. Determine the local optimal points of $f$ on $D$ for:
(1) $f(x,y) = xy$,
$D = \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 2a^2\}$
(2) $f(x,y) = 1/x + 1/y$,
$D = \{(x,y) \in \mathbb{R}^2 : (1/x)^2 + (1/y)^2 = (1/a)^2\}$
(3) $f(x,y,z) = x + y + z$,
$D = \{(x,y,z) \in \mathbb{R}^3 : 1/x + 1/y + 1/z = 1\}$
(4) $f(x,y,z) = x^2 + 2y - z^2$,
$D = \{(x,y,z) \in \mathbb{R}^3 : 2x - y = 0,\ x + z = 6\}$
(5) $f(x,y) = x + y$,
$D = \{(x,y) \in \mathbb{R}^2 : xy = 16\}$ (see the sketch after this list)
(6) $f(x,y,z) = xyz$,
$D = \{(x,y,z) \in \mathbb{R}^3 : x + y + z = 5,\ xy + xz + yz = 8\}$
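For instance, item (5) can be started as follows (a sketch only; classifying the resulting candidates is left as part of the exercise). Writing the constraint as $g(x,y) = xy - 16 = 0$, the equations $Df(x,y) + \lambda^T Dg(x,y) = 0$ and $g(x,y) = 0$ read
$$1 + \lambda y = 0, \qquad 1 + \lambda x = 0, \qquad xy = 16.$$
The first equation gives $\lambda \ne 0$, the first two together give $x = y = -1/\lambda$, and $xy = 16$ then yields the candidates $(x, y, \lambda) = (4, 4, -\tfrac14)$ and $(-4, -4, \tfrac14)$.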
4. Inequality constraints
$D = \{x \in U : h(x) \ge 0\}.$
Moreover,
4.2. Mixed constraints. Using the previous results we can now mix equality with inequality constraints in order to have a more general optimization theorem.
Let $U \subset \mathbb{R}^n$ be open, $g \colon U \to \mathbb{R}^k$ and $h \colon U \to \mathbb{R}^\ell$ be $C^1$ functions, and
$$D = \{x \in U : g(x) = 0,\ h(x) \ge 0\}.$$
Theorem 4.4. If $x^*$ is a local optimal point of $f$ on $D$,
$$h_1(x^*) = \dots = h_m(x^*) = 0, \qquad h_{m+1}(x^*), \dots, h_\ell(x^*) > 0,$$
and $\operatorname{rank} D(g_1, \dots, g_k, h_1, \dots, h_m)(x^*) = k + m$, then there is $\lambda^* \in \mathbb{R}^{k+\ell}$ such that
(1) $Df(x^*) + \lambda^{*T} D(g,h)(x^*) = 0$,
(2) $\lambda^*_{k+i}\, h_i(x^*) = 0$ for $i = 1, \dots, \ell$.
Moreover,
• if $x^*$ is a local minimizer, then $(\lambda^*_{k+1}, \dots, \lambda^*_{k+\ell}) \le 0$;
• if $x^*$ is a local maximizer, then $(\lambda^*_{k+1}, \dots, \lambda^*_{k+\ell}) \ge 0$.
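As a small illustration of the theorem (an example chosen only for concreteness), take $f(x,y) = xy$, $g(x,y) = x + y - 2$ and $h(x,y) = x$, so that $k = \ell = 1$. At $x^* = (1,1)$ the inequality constraint is inactive, $h(x^*) = 1 > 0$, so condition (2) forces $\lambda_2^* = 0$, and condition (1) reads
$$(1 + \lambda_1^* + \lambda_2^*,\ 1 + \lambda_1^*) = (0,0),$$
giving $\lambda_1^* = -1$ and $\lambda_2^* = 0 \ge 0$, which is consistent with $x^*$ being a local maximizer of $f$ on $D$.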
5. Convex and concave optimizations

A set $D \subset \mathbb{R}^n$ is convex if
$$\lambda x + (1-\lambda) y \in D$$
for any $\lambda \in [0,1]$ and $x, y \in D$. That is, the line segment joining any two points of $D$ is contained in $D$.
Given a convex set $D \subset \mathbb{R}^n$, $f \colon D \to \mathbb{R}$ is a convex function on $D$ if
$$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$$
for any $\lambda \in [0,1]$ and $x, y \in D$. If the above inequality is strict whenever $x \ne y$ and $\lambda \in \,]0,1[$, we say that $f$ is strictly convex on $D$.
On the other hand, $f$ is a concave function on $D$ if $-f$ is convex. In addition, $f$ is strictly concave on $D$ if $-f$ is strictly convex.
Notice that a function can be both convex and concave (this is the case of affine functions) or neither convex nor concave. Moreover, a function which is both convex and concave is neither strictly convex nor strictly concave.
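As a direct check of the definition, $f(x) = x^2$ is convex on $\mathbb{R}$: for any $x, y \in \mathbb{R}$ and $\lambda \in [0,1]$,
$$\lambda f(x) + (1-\lambda) f(y) - f(\lambda x + (1-\lambda) y) = \lambda(1-\lambda)(x-y)^2 \ge 0,$$
and the left-hand side is strictly positive whenever $x \ne y$ and $\lambda \in \,]0,1[$, so $f$ is in fact strictly convex.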
Example 5.1. Let $f(x) = x^3$ on $\mathbb{R}$ and $x = 1$, $y = -1$. For $\lambda = 1/4$ we have
$$f(\lambda x + (1-\lambda) y) = f(-\tfrac12) = -\tfrac18 > -\tfrac12 = \lambda f(x) + (1-\lambda) f(y),$$
so $f$ is not convex. Taking instead $\lambda = 3/4$ gives $f(\tfrac12) = \tfrac18 < \tfrac12 = \lambda f(x) + (1-\lambda) f(y)$, so $f$ is not concave either.
Proof.
Theorem 5.4. Let $f \colon D \to \mathbb{R}$ be $C^2$ on $D \subset \mathbb{R}^n$ open and convex.
(1) $f$ is concave on $D$ iff $D^2 f(x) \le 0$ for any $x \in D$.
(2) $f$ is convex on $D$ iff $D^2 f(x) \ge 0$ for any $x \in D$.
(3) If $D^2 f(x) < 0$ for any $x \in D$, then $f$ is strictly concave.
(4) If $D^2 f(x) > 0$ for any $x \in D$, then $f$ is strictly convex.
Proof.
Example 5.5. Consider the function $f(x) = \log(x_1^\alpha \cdots x_n^\alpha)$ for $x_i > 0$ and $\alpha > 0$. Hence,
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \begin{cases} -\dfrac{\alpha}{x_i^2}, & i = j \\ 0, & i \ne j. \end{cases}$$
So, $D^2 f(x) < 0$ and $f$ is strictly concave.
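This Hessian can also be reproduced symbolically; the following is a minimal SymPy sketch for $n = 3$ (the value of $n$ is chosen only for illustration).
\begin{verbatim}
import sympy as sp

alpha = sp.symbols('alpha', positive=True)
x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)

f = sp.log(x1**alpha * x2**alpha * x3**alpha)

# Hessian matrix D^2 f(x); it comes out diagonal with entries -alpha/x_i**2
H = sp.simplify(sp.hessian(f, (x1, x2, x3)))
print(H)
\end{verbatim}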
5.1. Optimization.
Theorem 5.6. Let $D \subset \mathbb{R}^n$ be convex and $f \colon D \to \mathbb{R}$ convex (concave).
Then,
(1) any local minimizer (maximizer) of f on D is in fact global.
(2) the set of minimizers (maximizers) of f on D is either empty
or convex.
(3) if f is strictly convex (concave), then the set of minimizers
(maximizers) of f on D is either empty or it contains a single
point.
Proof.
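For instance, $f(x,y) = (x-y)^2$ is convex on $\mathbb{R}^2$ (its Hessian $\begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix}$ is positive semidefinite), and its set of minimizers is the line $\{(x,y) : x = y\}$, which is indeed a convex set; since $f$ is not strictly convex, the non-uniqueness of the minimizers does not contradict (3).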
Index
concave
  function, 14
  strictly, 14
convex
  function, 14
  strictly, 14
  set, 14
function
  concave, 3
  convex, 3
  objective, 3
functions
  affine, 3
image
  of a set, 2
maximizer, 2
  local, 2
  strict, 2
maximum, 2
  local, 2
minimizer
  local, 2
  strict, 2
minimum, 2
  local, 2
optimal
  points, 3
    local, 3
  values, 3
    local, 3
optimization
  concave, 3
  convex, 3
  integer, 3
  linear, 3
  local
    problem, 3
  problem, 3
point
  critical, 5
  saddle, 6
  stationary, 5
pre-image
  of a set, 2
programming
  integer, 3
  linear, 3
  nonlinear, 3
set
  constraint, 3
Theorem
  Weierstrass, 4