University of Maryland: Econ 600
1. Introduction
The first half of the course is about techniques for solving a class of constrained optimization problems that, in their most general form, can be described verbally as follows. There are some outcomes, x ∈ A, that a decision-maker cares about. The decision-maker can only choose from a subset, C ⊆ A. What is the best choice for the decision-maker?
Microeconomics is very much about problems of this sort:
- How does a consumer select a consumption bundle from his affordable set to maximize utility?
- How does a monopolist set prices to maximize profits, given demand?
- How can a manager select inputs so as to minimize costs and achieve a target output?
And so on. Mathematically, this problem can be stated by assuming there exists a real-valued function, f : A → R, which characterizes the preferences of the decision-maker. The decision-maker then solves the problem

$$\max_{x \in C} f(x)$$
Although the theory allows us to deal with a wide variety of different types of x, in this course we will almost exclusively focus on the case where A = R^n and the decision-maker is selecting an n-dimensional vector.
From your initial experience with microeconomics, you may already have been
confronted with concepts and issues in constrained optimization. Many of the
techniques you acquired in coping with these problems are covered in more detail
in this course.
The intent is two-fold. First, to give you a grounding in the theory that underlies many of the standard constrained optimization techniques you are already using. Second, because the statements of some of the theorems are delicate (they require some special conditions), you need to be aware of when you can use them directly, when dangers arise, and of some ways of coping with these dangers.
The plan of the (first half of the) course is to establish some mathematical preliminaries, then to start with unconstrained optimization (Part II), next characteristics of constraint sets (Part III), and then to turn to the most important theory in constrained optimization, Kuhn-Tucker theory (Part IV). After that, applications are studied to appreciate where the theory works and where it fails. The last section covers an important area of application of Kuhn-Tucker theory: how to use it to conduct comparative statics (Part V). The second half of the course then uses these techniques to examine dynamic optimization.
2. Preliminary Concepts
2.1. Some Examples. How do we actually go about solving an optimization
problem of the form max_x f(x)? One can imagine just programming the function
and computing the solution. But what would be a more reliable analytic approach?
You are all probably fairly confident about your ability to solve many simple unconstrained optimization problems. Loosely speaking, you take a first derivative and find where the slope of the function is zero.
Obviously, this is too simple, as the next figures show (Figures 2.1, 2.2, 2.3, and 2.4).
The problem in Figure 2.1 is that the solution is unbounded. No matter how large an x we select, we can still do better by choosing a larger one.
In Figure 2.2, the difficulty is that the function is not differentiable at the
optimum.
In Figure 2.3, the function is not continuous and does not achieve an optimum at the candidate solution, x∗.
In Figure 2.4, there are multiple local optima. In fact, there are three solutions to the first-order condition: take the derivative of f(x) and find where it is equal to zero.
Other problems may also arise. For example, while we can understand the approach for the single-dimensional problem, f : R → R, how does the technique work for the more interesting and more common multi-dimensional case, f : R^n → R?
2.2. Continuity and Linearity.
2.2.1. Metrics. If we are focusing on the set of problems where the choice variable is an n-dimensional real vector (x ∈ R^n), then we need to develop an idea of what it
means for two different choices, x and y, to be close to each other. That is, we need an idea of distance, or norm, or metric.
The notion of distance or closeness is the usual common-sense idea of Euclidean distance. For a vector x ∈ R^n, we say that the length of that vector (or equivalently, the distance of that vector from the origin), its norm, is defined by

$$\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$$

That is, the square root of the sum of the squares of its components.
Remark 1. See SB 29.4.
In fact, there are many different possible concepts of distance we could have used, even in R^n. If you notice what the norm is doing, you will see that it is taking an element from our vector space and giving us back a real number, which we interpret as telling us the distance or size of the element.
More generally, then, a norm is any function that operates on a vector space V and gives us back a real number, and which satisfies the following three properties:
(1) ‖x‖ ≥ 0, and furthermore ‖x‖ = 0 if and only if x = 0
(2) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality)
(3) ‖ax‖ = |a| ‖x‖ for all a ∈ R, x ∈ V
Prove for yourself that the Euclidean norm satisfies these conditions. Observe that the triangle inequality is geometrically sensible in Figure 2.5.
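Since the notes ask you to verify these properties, here is a minimal Python sketch (my own; the helper name euclidean_norm is made up) that computes the Euclidean norm and spot-checks properties (1)-(3) on random vectors:

```python
import numpy as np

def euclidean_norm(x):
    """Square root of the sum of squared components: ||x|| = sqrt(x . x)."""
    return np.sqrt(np.sum(x ** 2))

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
a = rng.normal()

# (1) Non-negativity, and ||0|| = 0
assert euclidean_norm(x) >= 0 and euclidean_norm(np.zeros(3)) == 0
# (2) Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert euclidean_norm(x + y) <= euclidean_norm(x) + euclidean_norm(y) + 1e-12
# (3) Absolute homogeneity: ||a x|| = |a| ||x||
assert np.isclose(euclidean_norm(a * x), abs(a) * euclidean_norm(x))
```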
Note that if we think of the norm of x as denoting the distance of x from the origin, then for any two vectors, x and y, the norm of x − y is a measure of the distance between x and y.
We have a good intuition about what continuity means for a function from the real line to the real line (no jumps or gaps), but since we can rarely draw the graphs of functions from more complicated spaces, we need a more precise definition. Roughly, we want to ensure that whenever x′ is close to x in the domain space, f(x′) is close to f(x) in the target space.
Since we focus on R^n, we will typically just use the Euclidean norm as our idea of distance.
Definition 3. A sequence of elements, {x_n}, is said to converge to a point x ∈ R^n if for every ε > 0 there is a number N such that for all n > N, ‖x_n − x‖ < ε.
Definition 4. A function f : R^n → R^m is said to be continuous at a point x if for ALL sequences {x_n} converging to x, the derived sequence of points in the target space, {f(x_n)}, converges to the point f(x). We say that a function is continuous if it is continuous at all points in its domain.
Observe why the example in Figure 2.3 above fails the condition of continuity at x∗.
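To see the sequence definition in action, here is a small Python sketch (my own illustration, with a hypothetical step function) of how a jump violates continuity at a point: a sequence x_n → 0 from the left yields f(x_n) → 0 ≠ f(0).

```python
import numpy as np

def step(x):
    """A function with a jump at 0: f(x) = 0 for x < 0, f(x) = 1 for x >= 0."""
    return 0.0 if x < 0 else 1.0

# A sequence converging to 0 from the left: x_n = -1/n
xs = np.array([-1.0 / n for n in range(1, 1000)])
fx = np.array([step(x) for x in xs])

print(fx[-1])   # 0.0: the image sequence settles at 0 ...
print(step(0))  # 1.0: ... but f(0) = 1, so f is not continuous at 0
```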
Definition 5. A function f : V → W is linear if for any two real numbers a, b and for any two elements v, v′ ∈ V we have f(av + bv′) = af(v) + bf(v′).
Note that any linear function from R^n to R^m can be represented by an m × n matrix, A, such that f(x) = Ax. (You might also observe that this means that f(x) is the (column) vector of numbers that result when we take the inner product of every row of A with x.)
Note that although we sometimes call functions from R to R of the form f(x) = mx + b linear functions, these are really affine functions. Why do these functions not generally satisfy the definition of linear functions?
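A minimal numerical sketch (my own, with arbitrary example values) of both observations: a linear map represented as f(x) = Ax passes the linearity test, while an affine map with a nonzero intercept fails it.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # a 3-by-2 matrix
f = lambda x: A @ x                                  # linear map R^2 -> R^3

v, w = np.array([1.0, -1.0]), np.array([2.0, 0.5])
a, b = 3.0, -2.0
# Linearity: f(av + bw) = a f(v) + b f(w)
assert np.allclose(f(a * v + b * w), a * f(v) + b * f(w))

g = lambda x: 2.0 * x + 1.0  # affine: slope m = 2, intercept b = 1
# Fails linearity whenever the intercept is nonzero:
print(g(3.0 + 4.0), g(3.0) + g(4.0))  # 15.0 vs 16.0 -- not equal
```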
2.3. Vector Geometry. The next problem addressed (in the next two subsections) is how to extend our intuition about an optimum being at a point where the objective function has slope zero to multiple dimensions. First, we need some notions of vector geometry, and later some multidimensional calculus.
In particular, the norm can be expressed using the inner (dot) product:

$$\|x\| = \sqrt{x \cdot x}$$
Note as well that in R^n, if we take any two vectors and join them at the tail, the two vectors will define a plane (a two-dimensional flat surface) in R^n. How are the two vectors related?
Theorem 8. (SB 10.4)
If v · w > 0, v and w form an acute angle with each other.
If v · w < 0, they form an obtuse (greater than 90 degrees) angle with each other.
If v · w = 0, then they are perpendicular to each other. (They are orthogonal to each other.)
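As a quick check of Theorem 8 (my own example vectors), the angle can be recovered from the dot product via cos θ = v · w / (‖v‖ ‖w‖):

```python
import numpy as np

def angle_degrees(v, w):
    """Angle between v and w, from cos(theta) = (v . w) / (||v|| ||w||)."""
    c = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

v = np.array([1.0, 0.0])
print(angle_degrees(v, np.array([1.0, 1.0])))   # 45.0  -> v.w > 0, acute
print(angle_degrees(v, np.array([-1.0, 1.0])))  # 135.0 -> v.w < 0, obtuse
print(angle_degrees(v, np.array([0.0, 2.0])))   # 90.0  -> v.w = 0, orthogonal
```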
2.4. Hyperplanes: Supporting and Separating.
Definition 9. A linear function is a transformation from R^n to R^m with the feature that f(ax + by) = af(x) + bf(y) for all x, y in R^n and for all a, b in R.
Fact 10. Every linear functional (a linear function with R as the range-space) can itself be represented by an n-dimensional vector (call it (f_1, f_2, ..., f_n)) with the feature that

$$f(x) = \sum_{i=1}^{n} f_i x_i$$

That is, the value of the functional at x is just the inner product of this defining vector (f_1, f_2, ..., f_n) with x.
Remark 11. Note that once we fix a domain space, for example R^n, we could ask the question: what constitutes all of the possible linear functionals defined on that space? Obviously this is a large set. The set of all such functionals for a given domain space V is called the dual space of V and is often denoted V*.
Fact 12. The fact above implies that R^n is its own dual space. This symmetry, though, does not always hold (for example, if the domain space is the vector space of all continuous functions defined over [0, 1], the dual space is quite different), so
to minimize the distance to x, that would yield the same answer as the problem: among all the separating hyperplanes between x and C, find the hyperplane that is farthest away from x. Note that this hyperplane is a supporting hyperplane of C and is orthogonal to the vector x∗. (See Figure 2.9)
There are many versions of separating hyperplane theorems, but I will give just one. (See also de la Fuente, pp. 241-244.) (See Lecture 3 for more details on Int and other set notation.)
Theorem 18. (Takayama pp. 39-49) Suppose X, Y are non-empty, convex sets in R^n such that Int(Y) ∩ X = ∅ and the interior of Y is not empty. Then there exists a vector a in R^n which is the defining vector of a separating hyperplane between X and Y. That is, there is a number c such that for all x ∈ X, a · x ≤ c and for all y ∈ Y, c ≤ a · y.
Remark 19. The requirement that the interior of Y be disjoint from X allows for the two sets to intersect on a boundary. The requirement that the interior of Y be nonempty rules out the counterexample of two intersecting lines (see Figure 2.10).
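A concrete sketch of the theorem (my own construction, using two disjoint disks in R^2): the vector joining the centers serves as the defining vector a, and every point of each disk lands on its own side of the hyperplane {z : a · z = c}.

```python
import numpy as np

# Two disjoint closed disks in R^2 (convex, with nonempty interiors):
# X centered at (0, 0), Y centered at (4, 0), both with radius 1.
cX, cY, r = np.array([0.0, 0.0]), np.array([4.0, 0.0]), 1.0

# For two disks the closest points lie on the segment between centers,
# so a = cY - cX is a defining (normal) vector of a separating hyperplane.
a = cY - cX
c = a @ (cX + cY) / 2.0  # hyperplane {z : a.z = c} through the midpoint

rng = np.random.default_rng(1)
for _ in range(1000):
    u = rng.normal(size=2); u *= rng.uniform() / np.linalg.norm(u)
    assert a @ (cX + r * u) <= c  # every point of X is on one side ...
    assert a @ (cY + r * u) >= c  # ... and every point of Y on the other
```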
Remark 21. The graph of a function is what you normally see when you draw the
function in a Cartesian diagram.
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
Note that this object does not exist at all x for every function. Thus, we sometimes
encounter functions (even continuous functions) which do not have a derivative at
some x. Though we can often rule these out without any harm, it is also the case
that non-differentiable functions arise naturally in economic problems so we cannot
always do this.
Informally, we think of the derivative of f at x as telling us about the slope of f .
Note that this is really a notion about the graph of f .
Another way to think about what the derivative does, which ties more directly into optimization theory and also gives us a better clue about how to extend it to many dimensions, is to see that it defines a supporting hyperplane to the graph of f at the point (x∗, f(x∗)).
To see this, consider the points in the (x, y) space given by

$$H = \{(x, y) \mid (f'(x^*), -1) \cdot (x, y) = f'(x^*) x^* - f(x^*)\}$$

This is a hyperplane and exactly defines the line drawn in the graph: rearranging, it is the tangent line y = f(x∗) + f′(x∗)(x − x∗). It touches (is tangent to) the graph of f at (x∗, f(x∗)).
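A short numerical check (my own, with the convex example f(x) = x²) that the graph of f stays on one side of H and touches it exactly at x∗:

```python
import numpy as np

f = lambda x: x ** 2           # a convex example; f'(x) = 2x
fprime = lambda x: 2 * x
x_star = 1.0

# H = {(x, y) : (f'(x*), -1) . (x, y) = f'(x*) x* - f(x*)}
normal = np.array([fprime(x_star), -1.0])
c = fprime(x_star) * x_star - f(x_star)

xs = np.linspace(-2, 4, 9)
pts = np.column_stack([xs, f(xs)])  # points on the graph of f
print(pts @ normal - c)             # <= 0 everywhere, = 0 at x* = 1:
                                    # the graph lies on one side of H
```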
The gradient of f at x is the column vector of partial derivatives,

$$\nabla f(x) = \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{pmatrix}$$

while

$$f'(x) = \left[ \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right]$$

is the 1 × n row vector of partial derivatives, if they exist. That is, it is the transpose of the gradient of f.
These objects are useful because if we take a small vector v = (v_1, v_2, ..., v_n) ∈ R^n, the vector ∇f helps us determine approximately how f changes when we move from x in the direction of v. The sum over i = 1, ..., n of v_i ∂f(x)/∂x_i is a very close estimate of the change in f when we move from x to x + v. That is,

$$f(x + v) \approx f(x) + \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} v_i$$
Using the gradient of f, we can write this summation term more concisely as an inner product:

$$\sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} v_i = \nabla f(x) \cdot v$$
or as a vector multiplication:

$$\sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i} v_i = v' \nabla f(x)$$
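A quick numerical illustration (my own example function and hand-computed gradient) of this first-order approximation:

```python
import numpy as np

def f(x):
    """Example function f(x1, x2) = x1^2 * x2 + sin(x2)."""
    return x[0] ** 2 * x[1] + np.sin(x[1])

def grad_f(x):
    """Its gradient, computed by hand: (2 x1 x2, x1^2 + cos(x2))."""
    return np.array([2 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])

x = np.array([1.0, 2.0])
v = np.array([0.01, -0.02])  # a small displacement

exact = f(x + v)
approx = f(x) + grad_f(x) @ v  # f(x) + inner product of gradient with v
print(exact, approx)           # agree to roughly ||v||^2 accuracy
```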
$$\nabla^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
\end{bmatrix}$$

or, using the shorthand f_ij for ∂²f(x)/∂x_i ∂x_j,

$$\nabla^2 f(x) = \begin{bmatrix}
f_{11} & f_{12} & \cdots & f_{1n} \\
f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
f_{n1} & f_{n2} & \cdots & f_{nn}
\end{bmatrix}$$
Definition 24. The derivative of the gradient (that is, the Jacobian of ∇f) is called the Hessian of f.
Theorem 25. Young's Theorem (SB 14.5): If f is C², then f_ij(x) = f_ji(x). That is, the Hessian of f is symmetric.
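A numerical illustration of Young's Theorem (my own example; the finite-difference helper is a standard central-difference scheme, not from the notes): the approximate Hessian of a smooth function comes out symmetric, up to discretization error.

```python
import numpy as np

def f(x):
    """A C^2 example: f(x1, x2) = x1^3 * x2 + exp(x1 * x2)."""
    return x[0] ** 3 * x[1] + np.exp(x[0] * x[1])

def hessian(f, x, h=1e-5):
    """Central-difference approximation of the Hessian at x."""
    n = len(x)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

H = hessian(f, np.array([0.5, 1.0]))
print(np.allclose(H, H.T, atol=1e-4))  # True: f_12 = f_21 (Young's Theorem)
```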
2.6. Homogeneous and Homothetic Functions. Certain functions on R^n are especially well-behaved:
Definition 26. A function f : R^n → R is homogeneous of degree k if f(tx_1, tx_2, ..., tx_n) = t^k f(x) for all t ∈ R.
One reason why these functions are useful is that they are very easy to characterize. For example, suppose we only knew the values the function takes on some ball around the origin. Then we can use the homogeneity assumption to determine its value everywhere in R^n. That is because, for a ball that completely surrounds the origin, any point x′ ∈ R^n can be written as a scalar multiple of some point x on that ball, so x′ = tx. Then apply the definition.
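A small Python sketch of this characterization (my own, with a Cobb-Douglas-style example that is homogeneous of degree k): knowing f only on the unit sphere recovers f(x′) via x′ = tx and f(x′) = t^k f(x).

```python
import numpy as np

k, a = 2.0, 0.3

def f(x):
    """Cobb-Douglas-style function, homogeneous of degree k (for x > 0)."""
    return (x[0] ** a * x[1] ** (1 - a)) ** k

x_prime = np.array([3.0, 4.0])

# Write x' = t x with x on the unit sphere: t = ||x'||, x = x' / ||x'||.
t = np.linalg.norm(x_prime)
x_on_sphere = x_prime / t

# Homogeneity recovers f(x') from the value on the sphere alone:
print(f(x_prime), t ** k * f(x_on_sphere))  # the two values agree
```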
Definition 27. If f(tx_1, tx_2, ..., tx_n) = t f(x), we say that f is linearly homogeneous.
An important feature of homogeneous functions comes from the following theorem:
Theorem 28. Euler's Theorem (SB 20.4): If f is homogeneous of degree k, then

$$x \cdot \nabla f(x) = k f(x)$$
Proof. Use the chain rule of differentiation to get

$$\frac{d}{dt} f(tx_1, tx_2, \ldots, tx_n) = \frac{\partial f(tx)}{\partial x_1} x_1 + \frac{\partial f(tx)}{\partial x_2} x_2 + \cdots + \frac{\partial f(tx)}{\partial x_n} x_n = \sum_{i=1}^{n} \frac{\partial f(tx)}{\partial x_i} x_i = \nabla f(tx) \cdot x$$

where ∂f(tx)/∂x_i represents the partial derivative of f with respect to its ith argument, evaluated at tx. Now note that by homogeneity,

$$\frac{d}{dt} f(tx_1, tx_2, \ldots, tx_n) = \frac{d}{dt} t^k f(x_1, x_2, \ldots, x_n) = k t^{k-1} f(x)$$

Both of these results hold for any value of t, so in particular choose t = 1. Substituting and combining the two equations gives the result.
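A numerical spot-check of Euler's Theorem (my own, reusing the degree-2 Cobb-Douglas example and a finite-difference gradient):

```python
import numpy as np

k, a = 2.0, 0.3

def f(x):
    """Homogeneous of degree k = 2 (for x > 0)."""
    return (x[0] ** a * x[1] ** (1 - a)) ** k

def grad(f, x, h=1e-6):
    """Central-difference gradient."""
    return np.array([
        (f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))
    ])

x = np.array([3.0, 4.0])
print(x @ grad(f, x), k * f(x))  # Euler: both sides agree (up to ~1e-8)
```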
If n = 2 and f is C¹, we can solve for the level set of f by using the total differential: let (dx, dy) satisfy

$$df = 0 = \frac{\partial f(x, y)}{\partial x} dx + \frac{\partial f(x, y)}{\partial y} dy$$

Solving this differential equation,

$$\frac{dy}{dx} = -\frac{f_x(x, y)}{f_y(x, y)}$$

along with an initial condition, y(x_0) = y_0, will trace out the level set of f through the point (x_0, y_0).
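As an illustration (my own sketch, with f(x, y) = x² + y², whose level sets are circles), integrating dy/dx = −f_x/f_y by Euler steps traces the level curve through (x0, y0):

```python
import numpy as np

# Level sets of f(x, y) = x^2 + y^2 are circles; f_x = 2x, f_y = 2y.
fx = lambda x, y: 2 * x
fy = lambda x, y: 2 * y

def trace_level_set(x0, y0, x_end, steps=10000):
    """Integrate dy/dx = -f_x / f_y from (x0, y0) by Euler steps."""
    xs = np.linspace(x0, x_end, steps)
    y, h = y0, xs[1] - xs[0]
    for x in xs[:-1]:
        y += h * (-fx(x, y) / fy(x, y))
    return y

# Start on the circle x^2 + y^2 = 25 at (3, 4) and move to x = 4:
y_end = trace_level_set(3.0, 4.0, 4.0)
print(y_end, np.sqrt(25 - 4.0 ** 2))  # ~3.0 both ways: still on the circle
```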