Economics of Production (MSC Level)
Introduction
These notes introduce general equilibrium theory, along with some requisite mathematical
tools. The term “general equilibrium” is somewhat difficult to define. Roughly speaking, a
model is described as a general equilibrium model if it aims to study an entire economy,
without any loose ends such as taxes being thrown in the ocean, food aid being helicoptered
in, or cars being produced from “money” rather than labour, machines, and natural resources.
General equilibrium theory is typically described as a microeconomics topic, but this is misleading. Almost all microeconomic models (including almost all game theory models) do not
aspire to be general equilibrium models. For instance, almost all models of auctions involve
the players directly consuming the money that they are left with at the end of the game, rather
than trading that money for goods and deriving utility from goods. On the other hand, most
applied macroeconomic models are general equilibrium models. Therefore, one of the most
important roles of “microeconomic general equilibrium theory” is to provide a foundation for
modern macroeconomics.
These notes only study (special cases of) the Walras (1874) model as formulated by Arrow
and Debreu (1954), which is a general equilibrium model of perfect competition. There are of
course many general equilibrium models with monopolistic competition, adverse selection, and
other frictions. We focus on perfect competition for simplicity, and largely follow the analysis
of Debreu (1959).
The Arrow-Debreu model is much like first-year undergraduate microeconomics, which
studies a single-market economy. The concepts of supply, demand, marginal utility, marginal
cost, equilibrium, and efficiency are the primary focus, just like undergraduate microeconomics.
However, general equilibrium theory studies many markets simultaneously, whereas undergraduate
microeconomics is very limited in its understanding of how different markets interact with
each other.
For example, consider the interplay between migration and international agricultural markets. Suppose one country has more arable land. Does that mean workers will migrate to the arable country, to take advantage of higher wages resulting from high productivity? Or perhaps only a few workers are needed in the arable country to maintain high output, so actually the workers will migrate to less arable countries? The tools of undergraduate economics are not very helpful here, because the problem makes little sense unless there are at least three workers, two firms (one for each country) each with their own production function, and four markets (land in each country, labour, and food). A supply and demand curve – or even a 2x2 Edgeworth box – just won’t do.
[Figure 1.1: Competitive equilibrium in a single-market economy (price P against quantity Q; the demand curve, labelled MB, and the supply curve, labelled MC, cross at (Q∗, P∗)).]
While there are many important applications of general equilibrium theory, the most important reason to learn it is to understand macroeconomics carefully. The most important trade-off in macroeconomics is between investment and consumption. This is a never-ending problem; if the world were to end tomorrow, then we would have a big party today and destroy our capital. Therefore, macroeconomics requires an infinite set of markets.
Despite all of these complications, our goal is to simplify everything, so that we can use
as much intuition from undergraduate microeconomics as possible. Indeed, we recommend
that you find your favourite undergraduate microeconomics textbook, and compare graduate
and undergraduate ideas as you progress through your study. Roughly speaking, the following
undergraduate ideas generalise as follows:
• MC = supply. In undergraduate economics, the supply curve is the same as the marginal
cost curve (and the demand curve is the same as the marginal benefit curve). The
envelope theorem generalises the idea that marginal values are connected to optimal
policies.
neglect applications in favour of determining the technical limits of the tools. While both are
valuable, we think it is more important to know what the tools are useful for, rather than what
they are useless for.
Chapter 2
Production
This chapter studies the theory of the competitive firm, which means that we assume the firm is unable to manipulate prices. The theory focuses on how the firm reacts to prices when choosing input and output quantities. This choice can be quite complicated, as the firm may have many possible output levels, and many possible ways to deliver each output level. For example, a car manufacturer may have to decide on hiring many types of specialised labour and purchasing many specialised components. Is it possible to construct a simple marginal cost curve and solve the firm’s output choice by setting marginal cost equal to price?
The answer is yes, but some mathematical tools are involved, all of which are widely used
by economists. First, dynamic programming is used to simplify a complicated decision
problem by breaking it into smaller problems. For example, we break the firm’s production
decision into an output choice followed by an input choice. This allows us to construct a
marginal cost curve without getting overwhelmed with the details of the input choices. Second,
the envelope theorem generalises the idea that optimal choices (such as supply curves) are
closely related to marginal valuations (such as marginal cost curves). Third, convex analysis
is a branch of geometry that captures the ideas of decreasing returns to scale and diminishing
marginal productivity, and allows us to understand when marginal cost curves are increasing.
In Section 2.1 we introduce production functions, which describe how the firm may transform
inputs into outputs. Section 2.2 then puts production into a competitive market context in
which firms make input and output decisions to maximise profits. Section 2.3 introduces the
envelope theorem, which explores the relationship between marginal valuations and optimal
choices. With the help of convex analysis, we establish that output price increases
lead to more output and that factor price increases lead to a decrease in demand for that factor.
Section 2.4 introduces dynamic programming, which allows us to focus on output decisions
without being distracted by input decisions. This leads us to a version of the classical “price
equals marginal cost” formula. Section 2.5 extends the tools from Section 2.3 to accommodate
constraints; this is necessary for studying the nature of marginal costs. This section establishes
that marginal costs are increasing in output. Finally, Section 2.6 concludes with a discussion of
more complicated production technologies, such as factories that produce several goods. This
last section is for completeness only and can be skipped.
• Free disposal (Monotonicity): The firm has no obligation to use all input goods provided.
Having too many input goods never hurts, as the firm can always throw them away
without any cost. This idea leads to the assumption of monotonicity, where f is weakly increasing. Specifically, if $x \ge x'$ (i.e. $x_n \ge x'_n$ for all n), then $f(x) \ge f(x')$. A stronger assumption, strict monotonicity, is that if $x > x'$ (i.e. $x_n \ge x'_n$ for all n and $x_n > x'_n$ for at least one n), then $f(x) > f(x')$.
• Smoothness: f is twice continuously differentiable. Each partial derivative $\frac{\partial}{\partial x_i} f(x)$ is the marginal productivity of $x_i$.
• Decreasing marginal productivity: the production function has weakly decreasing marginal productivity in good 1 if, holding all other input factors $x_{-1}$ fixed, $\frac{\partial}{\partial x_1} f(x)$ weakly decreases as $x_1$ increases.¹ For example, consider a restaurant that produces food using cooks and kitchen space. Adding a cook without adding any kitchen space is likely to create congestion that leads to decreasing marginal productivity of cooks. Similarly, adding kitchen space without adding cooks will relieve a diminishing amount of congestion. Figure 2.1 depicts decreasing marginal productivity of cooks and kitchens in producing food. Decreasing marginal productivity of the first input is equivalent to the production function being concave in that factor, and (since f is smooth) also to $\frac{\partial^2}{\partial x_1^2} f(x) \le 0$.
¹ The word “decreasing” is used differently in the context of marginal productivity and marginal utility compared to everywhere else. Normally, “decreasing” means the function gets smaller as its parameter vector increases in any (combination) of its dimensions. However, decreasing marginal productivity does not mean that the marginal productivity of labour decreases when the amount of capital increases.
[Figure 2.1: Diminishing marginal productivity of cooks and kitchens when $f(c, k) = c^{0.7} k^{0.3}$. Panel (a) Cooks: f(c, k) against c for k = 3, 6, 9. Panel (b) Kitchens: f(c, k) against k for c = 3, 6, 9.]
• Weakly increasing returns to scale: for all $x \in \mathbb{R}^{N-1}_+$ and all $t > 1$, $f(tx) \ge t f(x)$. For example, communications networks have this character: when adding the nth phone line to a telephone network, there are n − 1 new pairs of people who are now connected to each other. So, the number of connections supplied y is a function $f(n) = \frac{1}{2} n(n-1)$ of the number of people n, and $f(tn) \approx t^2 f(n)$ for large n. Note that this assumption leads to decreasing (not increasing) marginal cost.
• Constant returns to scale: for all $x \in \mathbb{R}^{N-1}_+$ and all $t > 0$, $f(tx) = t f(x)$. For example, this occurs if the output from doubling the size of a factory, f(2x), is equal to the output from two identical factories, f(x) + f(x) = 2f(x). This is a common assumption to make.
• Weakly decreasing returns to scale: for all $x \in \mathbb{R}^{N-1}_+$ and all $t > 1$, $f(tx) \le t f(x)$.
Decreasing returns to scale can occur if we have misspecified the model and left out
some resource. For example, building an identical factory requires finding an (identical)
manager to run it. If we forget to include any input factors (such as the manager) in the
model, then “cloning” a firm by cloning only the inputs that were explicitly modelled
would give a less productive clone – at least under the assumption of decreasing marginal
productivity. One way to make up for such omissions is to assume decreasing returns to
scale.
This assumption is philosophically unappealing for a theory of the “whole economy at
the same time.” Why do we need to leave anything out? Sadly, economics is hard, and
we frequently need to take shortcuts. This is a common one.
• Concavity: f is a concave function, which means that taking a mixture between two bundles of inputs, $f(ax + (1-a)x')$, gives at least as much output as the corresponding linear interpolation, $af(x) + (1-a)f(x')$, for all $a \in [0, 1]$. For example, if x represents a hospital with many doctors, and x′ represents a hospital with many nurses, and $a = \frac{1}{2}$, then concavity implies
$$f(x) + f(x') \le f\left(\tfrac{1}{2}x + \tfrac{1}{2}x'\right) + f\left(\tfrac{1}{2}x + \tfrac{1}{2}x'\right),$$
which means that it is (weakly) better to reallocate the doctors and nurses so that both hospitals are identical.
Concavity is like assuming both weakly decreasing returns to scale and decreasing marginal
productivity for each input factor, i.e. for each input factor, as we increase that factor
(without changing any of the other factors), the marginal output decreases. Caution:
this assumption is frequently referred to as “a convexity assumption,” even though −f ,
not f , is convex.
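If f is smooth and concave, then it has weakly decreasing marginal productivity. To see this, hold $(x_2, \ldots, x_{N-1})$ fixed and let $g(x_1) = f(x_1, x_2, \ldots, x_{N-1})$, so that $g'(x_1)$ is the marginal productivity of $x_1$. Since f is concave, g is also concave, since for all $s \in [0, 1]$ and all $x_1, x_1' \in \mathbb{R}_+$,
$$\begin{aligned}
g(s x_1 + (1-s) x_1') &= f(s x_1 + (1-s) x_1', x_2, \ldots, x_{N-1}) \\
&= f\big(s\,(x_1, x_2, \ldots, x_{N-1}) + (1-s)\,(x_1', x_2, \ldots, x_{N-1})\big) \\
&\ge s f(x_1, x_2, \ldots, x_{N-1}) + (1-s) f(x_1', x_2, \ldots, x_{N-1}) \\
&= s\,g(x_1) + (1-s)\,g(x_1').
\end{aligned}$$
Note that the first two steps above are devoted to reformulating the convex combination in a form that matches the characterisation of concavity in Theorem D.6. Similarly, since f is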
smooth, g is also smooth (by the chain rule). Since g is smooth and concave, Theorem D.3
implies g ′ is weakly decreasing, so we have established f has weakly decreasing marginal
productivity in the first input. The same logic applies to the other inputs.
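If f is concave and has the possibility of inaction, then it has weakly decreasing returns to scale. To check this, we must show $f(tx) \le t f(x)$ for $t > 1$. Let $s = 1/t$, which means that $s \in (0, 1)$. By Theorem D.6, $s f(tx) + (1-s) f(0) \le f(s t x + (1-s) 0)$. We can then deduce:
$$s f(tx) \le s f(tx) + (1-s) f(0) \le f(s t x + (1-s) 0) = f(x),$$
where the first inequality uses $f(0) \ge 0$ and the final equality uses $st = 1$. Multiplying through by $t = 1/s$ gives $f(tx) \le t f(x)$.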
Question 2.1. ✓ Show mathematically or graphically that if f is smooth and has constant returns to scale, then marginal productivities do not depend on scale.
Question 2.2. ✓ Can a production process have both diminishing marginal productivity and
increasing returns to scale? (Hint: you just need to find one example.)
There are typically many different combinations of inputs that give the same level of output y. This set is called the isoquant,
$$I(y) = f^{-1}(y) = \{x \in \mathbb{R}^{N-1}_+ : f(x) = y\}.$$
The set above the isoquant – the set of inputs that give greater or equal output than y – is called the upper contour set,
$$V(y) = f^{-1}([y, \infty)) = \{x \in \mathbb{R}^{N-1}_+ : f(x) \ge y\}.$$
See Figure 2.2 for an example with three isoquants and three upper contour sets. Note that
the upper contour sets may be overlapping whereas the isoquants never cross.
Question 2.3. ✓ Explain why two different isoquants never cross.
[Figure 2.2: Isoquants and Upper Contour Sets (axes x1 and x2; isoquants for y = 5, 10 and 15).]
• Quasi-concavity: f is a quasi-concave function, i.e. the upper contour set V(y) for each output level y is convex. This has the following economic interpretation. Consider two input bundles $x, x' \in \mathbb{R}^{N-1}_+$ on the same isoquant, i.e. $f(x) = f(x')$. Quasi-concavity means that mixtures $ax + (1-a)x'$ with $a \in [0, 1]$ give at least as much output, i.e. $f(ax + (1-a)x') \ge f(x)$.
Since a production function may allow many ways to produce the same output, it raises the
question: how can the firm substitute its inputs at the margin to produce the same quantity?
For example, suppose a supermarket initially plans to use x1 computers and x2 sales people.
If it buys x1 + ∆ computers instead, how many sales people can it replace, maintaining the
same output? This is called the marginal rate of technical substitution from x2 to x1 .
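Geometrically, it is the slope of the isoquant. If we think of the isoquant as a function, this amounts to calculating the derivative of the function. The function g of an isoquant is defined implicitly by the equation
$$f(x_1, g(x_1)) = y.$$
Differentiating both sides with respect to $x_1$ at a point $(x_1^*, x_2^*)$ on the isoquant, where $x_2^* = g(x_1^*)$, gives $\frac{\partial f}{\partial x_1} + \frac{\partial f}{\partial x_2}\, g'(x_1^*) = 0$, so
$$g'(x_1^*) = -\frac{\partial f(x_1^*, x_2^*)/\partial x_1}{\partial f(x_1^*, x_2^*)/\partial x_2}. \qquad (2.5)$$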
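As a quick numerical sanity check of (2.5), the sketch below assumes the Cobb-Douglas technology f(x1, x2) = x1^0.7 x2^0.3 (the functional form used in Figure 2.1) and an arbitrary point (x1∗, x2∗) = (2, 3); both are illustrative choices, not part of the notes. It solves the isoquant explicitly for g, differentiates it by finite differences, and compares the result with minus the ratio of marginal productivities.

```python
# Hypothetical example technology (same form as Figure 2.1): f(x1, x2) = x1**0.7 * x2**0.3
ALPHA, BETA = 0.7, 0.3


def f(x1, x2):
    return x1**ALPHA * x2**BETA


def isoquant(x1, y):
    """g(x1): the amount of x2 that keeps output at y, i.e. f(x1, g(x1)) = y."""
    return (y / x1**ALPHA) ** (1 / BETA)


def isoquant_slope_numeric(x1, y, h=1e-6):
    """Central finite-difference approximation to g'(x1)."""
    return (isoquant(x1 + h, y) - isoquant(x1 - h, y)) / (2 * h)


def isoquant_slope_formula(x1, x2, h=1e-6):
    """Right-hand side of (2.5): minus the ratio of marginal productivities."""
    f1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    f2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return -f1 / f2


x1_star, x2_star = 2.0, 3.0          # arbitrary point on an isoquant
y = f(x1_star, x2_star)              # output level of that isoquant
print(isoquant_slope_numeric(x1_star, y))
print(isoquant_slope_formula(x1_star, x2_star))
print(-(ALPHA / BETA) * x2_star / x1_star)   # closed form for Cobb-Douglas
```

All three printed values should be close to −3.5, which for this technology equals −(0.7/0.3)(x2∗/x1∗).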
Question 2.5. ✓ Show mathematically or graphically that if f is smooth and has constant returns to scale, then marginal rates of technical substitution do not depend on scale.
instead of “the”, unless we know for sure that there is exactly one item under discussion. This
issue is discussed in more detail elsewhere in these notes: usage of “the” is discussed in Section B.2, Section E.2 discusses whether an optimal demand function exists, and Section E.3 establishes that there is at most one optimal demand function if the production function is strictly concave.
If $x^*$ is an optimal (profit maximizing) choice given prices (p, w), then $x^*$ satisfies the first-order conditions
$$p\,\frac{\partial f(x^*)}{\partial x_i} = w_i \quad \text{for all } i \in \{1, \ldots, N-1\}. \qquad (2.7)$$
In particular, this implies that the marginal rate of technical substitution from any good i to any other good j is equal to the marginal rate of substitution in terms of purchase prices,
$$-\frac{w_i}{w_j} = -\frac{\partial f(x^*)/\partial x_i}{\partial f(x^*)/\partial x_j} \quad \text{for all } i, j \in \{1, \ldots, N-1\}.$$
For example, suppose i = 1 is capital and j = 2 is labor. If the marginal rate of technical
substitution from capital to labor is small, this means the firm needs few workers to replace
capital and maintain the same level of output. The equation says that the firm should replace
capital with workers until the cost of replacing each unit of capital with a worker is no longer
smaller than (i.e. becomes equal to) the productivity gain of replacing capital with workers.
Geometrically, this means that the firm chooses a production plan where the isoquant is
tangential to the isocost line (or isocost hyperplane).
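The first-order conditions (2.7) and the tangency condition can be checked numerically. This minimal sketch assumes a hypothetical decreasing-returns Cobb-Douglas technology f(x1, x2) = x1^0.3 x2^0.5 and illustrative prices p = 10, w1 = 1, w2 = 2. Decreasing returns are used because with constant returns to scale the first-order conditions would not pin down a unique interior optimum.

```python
# Hypothetical decreasing-returns technology f(x1, x2) = x1**A * x2**B with A + B < 1.
A, B = 0.3, 0.5
p, w1, w2 = 10.0, 1.0, 2.0            # illustrative output price and input prices


def f(x1, x2):
    return x1**A * x2**B


def profit(x1, x2):
    return p * f(x1, x2) - w1 * x1 - w2 * x2


# Closed-form solution of the first-order conditions (2.7) for this particular f.
ratio = (B * w1) / (A * w2)           # x2 / x1 implied by the tangency condition
x1 = (w1 / (p * A * ratio**B)) ** (1 / (A + B - 1))
x2 = ratio * x1

# Exact marginal productivities of Cobb-Douglas at the candidate optimum.
f1 = A * f(x1, x2) / x1
f2 = B * f(x1, x2) / x2

print(p * f1 - w1, p * f2 - w2)       # first-order condition residuals (near 0)
print(w1 / w2, f1 / f2)               # price ratio vs ratio of marginal products

# Scaling either input by 1% up or down should not raise profit.
best = profit(x1, x2)
print(all(profit(x1 * s1, x2 * s2) <= best
          for s1 in (0.99, 1.0, 1.01) for s2 in (0.99, 1.0, 1.01)))
```

The second line shows the tangency: the input price ratio w1/w2 equals the ratio of marginal productivities at the chosen plan.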
Example 2.1. Suppose that music recordings are produced from the labour of musicians and
technicians. Write down the music company’s profit maximisation problem.
Answer. Let r be the royalties of a song, lm the musician labour input, lt the technician labour
input, wm the musician wage, wt the technician wage, and f (lm , lt ) be the number of songs
produced. The music company’s profit maximisation problem is
$$\max_{l_m, l_t} \; r f(l_m, l_t) - w_m l_m - w_t l_t.$$
Example 2.2. Glycerine is a by-product of bio-diesel production, both of which are produced
from waste organic material. Write down a bio-energy company’s profit maximisation problem.
Answer. Let w be the waste material input, g(w) the glycerine output, d(w) the bio-diesel
output, pw the price of waste material, pg the price of glycerine, and pd the price of bio-diesel.
The bio-energy company’s profit maximisation problem is
$$\max_{w} \; p_g\, g(w) + p_d\, d(w) - p_w w.$$
Example 2.3. PET (polyethylene terephthalate) plastic is made from ethylene, which is made from crude oil. Write down the profit maximisation problem of a vertically integrated firm that buys crude oil and sells plastic. Write down the first-order condition determining the optimal production choice.
Answer. Let x be the crude oil input, e = f(x) be the ethylene produced from the x units of oil, y = g(e) be the plastic output from the e units of ethylene input, px the price of crude oil, and py be the price of plastic. The integrated firm’s profit maximisation problem is
$$\max_{x} \; p_y\, g(f(x)) - p_x x.$$
By the chain rule, the first-order condition determining the optimal crude oil input $x^*$ is
$$p_y\, g'(f(x^*))\, f'(x^*) = p_x.$$
The first name is upper envelope, which refers to the geometric interpretation of (2.8),
involving following the outer edge that surrounds (“envelopes”) some curves, as depicted in
Figure 2.3 and Figure 2.4. Specifically, for each b, there is a curve w(a) = v(a, b). The function
V is the outer edge of all of these curves. In Figure 2.4, there is an infinite set of lines (only
some of which are depicted), and the upper envelope is a parabola.
[Figure 2.3: Value functions are upper envelopes (value V(a) against assets a; one curve for b = study and one for b = work).]
[Figure 2.4: The upper envelope of an infinite set of lines (V(a) against a).]
The second name is value function, which refers to an economic idea: the value of facing
a situation or state (a) before making a choice (b). Figure 2.3 depicts the value of holding
assets before making a choice between studying or working. The profit function π(p; w) is also
an example of a value function; it is the firm’s value of facing prices (p; w) before it chooses
its input quantities. The term policy or policy function b(a) refers to the optimal choice of
b for each state a, i.e. $b(a) \in \arg\max_{\hat b} v(a, \hat b)$. The input demand function is an example of a
policy. We summarise the terminology:
$$\underbrace{V(\underbrace{a}_{\text{state variable}})}_{\text{value function}} = \max_{\underbrace{b}_{\text{choice variable}}} \underbrace{v(a, b)}_{\text{objective function}} = v(a, \underbrace{b(a)}_{\text{policy}}).$$
The envelope theorem provides a formula for differentiating value functions. (Actually,
this is the simplest of a large collection of envelope theorems used by economists.)
Theorem 2.1 (Envelope Theorem). Let $v : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be a differentiable function, $V(a) = \max_{b \in \mathbb{R}^m} v(a, b)$ be its value function (upper envelope), and b(a) be its policy function. If V is a differentiable function, then
$$V'(a) = \left.\frac{\partial v(a, b)}{\partial a}\right|_{b = b(a)}, \qquad (2.9)$$
or in alternative notation,
$$V'(a) = v_a(a, b(a)). \qquad (2.10)$$
[Figure 2.5: The “lazy” envelope theorem proof (V(a) and the lazy decision maker's value L(a) against a; the two curves touch at ā).]
Lazy decision maker proof of Theorem 2.1. Fix a particular state ā. The value function of a
“lazy” decision maker who chooses b(ā), regardless of a, is
$$L(a) = v(a, b(\bar a)), \qquad (2.11)$$
and is depicted in Figure 2.5. The lazy decision maker’s value is weakly less than the rational value, i.e. $L(a) \le V(a)$ for all a. Their values are equal at ā. Therefore ā minimises $V(a) - L(a)$, so the first-order condition gives
$$V'(\bar a) = L'(\bar a) = v_1(\bar a, b(\bar a)). \qquad (2.12)$$
Chain rule proof of Theorem 2.1. Let b(a) denote the policy function. We will only prove this
theorem for the case in which the state variable a and choice variable b are one-dimensional,
and the optimal policy b(a) is a differentiable function (although the theorem is true without
these extra assumptions). With this notation, V may be rewritten as V (a) = v(a, b(a)). By
Theorem F.2, the derivative is
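$$V'(a) = \left.\frac{\partial v(a, b)}{\partial a}\right|_{b = b(a)} + \left.\frac{\partial v(a, b)}{\partial b}\right|_{b = b(a)} b'(a). \qquad (2.13)$$
However, since b(a) maximizes v(a, ·), we have the first-order condition
$$\left.\frac{\partial v(a, b)}{\partial b}\right|_{b = b(a)} = 0.$$
Substituting this into (2.13) gives (2.9).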
In the jargon we defined above, w is the state variable, l is the choice variable, and π is the
value function. We would like to calculate π ′ (w).
First we will calculate π ′ (w) without using the envelope theorem, which we will call the
“obvious method.” The first-order condition for the manager’s choice is
$$\frac{5}{\sqrt{l}} - w = 0 \qquad (2.15)$$
which means that the policy function is
$$l(w) = \frac{25}{w^2}. \qquad (2.16)$$
The value function may be rewritten as
$$\begin{aligned}
\pi(w) &= 10\sqrt{l(w)} - w\,l(w) \\
&= 10\sqrt{\frac{25}{w^2}} - w\,\frac{25}{w^2} \\
&= \frac{50}{w} - \frac{25}{w} \\
&= \frac{25}{w},
\end{aligned}$$
whose derivative is
$$\pi'(w) = -\frac{25}{w^2}.$$
Next we will calculate π′(w) using the “envelope theorem method.” The theorem says that
$$\pi'(w) = \left[\frac{\partial}{\partial w}\left(10\sqrt{l} - w l\right)\right]_{l = l(w)} = [-l]_{l = l(w)} = -l(w).$$
Often, this form is all we need. Alternatively, we can substitute in the optimal policy (2.16) to conclude that $\pi'(w) = -25/w^2$. Evidently, the envelope theorem approach requires fewer calculations.
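The two methods can be compared numerically. This sketch uses the manager's objective 10√l − wl from the text, an arbitrarily chosen wage w∗ = 2, and central finite differences with an arbitrary step h = 1e-5 in place of exact derivatives.

```python
import math


def objective(l, w):
    """The manager's objective: revenue 10*sqrt(l) minus the wage bill w*l."""
    return 10 * math.sqrt(l) - w * l


def policy(w):
    """Optimal labour from the first-order condition (2.15): l(w) = 25 / w**2."""
    return 25 / w**2


def value(w):
    """Profit function pi(w): the objective evaluated at the optimal policy."""
    return objective(policy(w), w)


w_star, h = 2.0, 1e-5

# "Obvious method": differentiate the value function itself.
obvious = (value(w_star + h) - value(w_star - h)) / (2 * h)

# Envelope method: differentiate the objective with labour frozen at l(w_star).
l_fixed = policy(w_star)
envelope = (objective(l_fixed, w_star + h) - objective(l_fixed, w_star - h)) / (2 * h)

print(obvious, envelope, -25 / w_star**2)
```

Both estimates should be close to −25/w∗² = −6.25. Note that the envelope calculation never evaluates the policy function at the perturbed wages.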
When we used the obvious method, we had to calculate and substitute the policy function
l(w). When we used the envelope theorem method, we did not. We will examine carefully why
this is the case. Henceforth, we will only consider calculating the derivative of π(w) at w = w∗ .
Imagine that w∗ is the old wage, and we are interested in studying how a small market wage
increase affects profits. An alert manager would adjust the labour according to the policy $l(w) = 25/w^2$. The alert manager’s policy is decreasing, so that $l'(w) < 0$. Let’s compare the alert manager’s profits with a lazy manager who does not adjust the labour at all, and uses the suboptimal policy $\bar l(w) = l(w^*) = 25/w^{*2}$. The lazy manager’s policy is flat, with $\bar l'(w) = 0$. Clearly, the lazy manager would make less profit than the alert manager. But how much less? The profit function for the lazy manager is
$$\begin{aligned}
\bar\pi(w) &= 10\sqrt{\bar l(w)} - w\,\bar l(w) && (2.17) \\
&= 10\,\frac{5}{w^*} - w\,\frac{25}{w^{*2}}, && (2.18)
\end{aligned}$$
which is less than the alert manager’s profit function. The lazy manager’s marginal profit of a wage increase (from any w) is
$$\bar\pi'(w) = -\frac{25}{w^{*2}}. \qquad (2.19)$$
Notice that the lazy manager’s marginal profit $\bar\pi'(w^*)$ is the same as the alert manager’s marginal profit $\pi'(w^*)$! This explains why we did not need the derivative of the policy function. Even though the lazy manager makes less profit than the alert manager, the shortfall is very small after a small wage change (it is zero at w∗ and grows only with the square of the wage change), so the marginal profit is the same. So, when calculating marginal profits, we can use the lazy manager’s profit function rather than the alert
manager’s profit function, even though the lazy manager’s profit function is (weakly) less than
the alert manager’s profit function. The envelope theorem uses this observation to simplify
the calculations.
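The claim that the lazy manager's shortfall is negligible for small wage changes can be made concrete. The sketch below uses the same example with an illustrative old wage w∗ = 2 and prints the profit gap between the alert and lazy managers after wage increases of decreasing size dw.

```python
import math


def objective(l, w):
    return 10 * math.sqrt(l) - w * l


def alert_profit(w):
    """The alert manager re-optimises labour: l(w) = 25 / w**2."""
    return objective(25 / w**2, w)


def lazy_profit(w, w_star):
    """The lazy manager keeps the labour chosen at the old wage w_star."""
    return objective(25 / w_star**2, w)


w_star = 2.0
for dw in (0.1, 0.01, 0.001):
    gap = alert_profit(w_star + dw) - lazy_profit(w_star + dw, w_star)
    # If gap / dw**2 settles down as dw shrinks, the gap is second order in dw.
    print(dw, gap, gap / dw**2)
```

For this example the gap works out to 6.25 dw²/(2 + dw), so gap/dw² approaches 3.125, half of π″(w∗) = 50/w∗³. A gap of second order in dw has zero first derivative at w∗, which is why the two marginal profits coincide there.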
Applying the envelope theorem to the profit function (2.6) gives
$$\frac{\partial \pi(p; w)}{\partial p} = f(x(p; w)) = y(p; w) \qquad (2.20)$$
$$\frac{\partial \pi(p; w)}{\partial w_i} = -x_i(p; w). \qquad (2.21)$$
From (2.20), we learn that the marginal profit of an output price increase equals the output
quantity – which is also the marginal revenue of a price increase, holding quantities fixed. We
can interpret (2.21) as saying that the marginal loss from an input price increase equals the quantity of that input used, which is the marginal cost increase holding quantities fixed. These two formulas relate policy functions to marginal valuations (although not
in an analogous way to marginal cost coinciding with the supply curve, as we will see later in
Section 2.4).
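Equations (2.20) and (2.21) can be checked numerically in a case where the demand functions have a closed form. The sketch assumes a hypothetical technology f(x1, x2) = x1^0.3 x2^0.5 and illustrative prices; the demand formula below solves the first-order conditions (2.7) for this f only. Derivatives of the profit function are approximated by central finite differences.

```python
# Hypothetical technology f(x1, x2) = x1**A * x2**B with A + B < 1 (decreasing returns).
A, B = 0.3, 0.5


def demands(p, w1, w2):
    """Input demands x(p; w): closed-form solution of (2.7) for this f only."""
    ratio = (B * w1) / (A * w2)                        # x2 / x1 at the optimum
    x1 = (w1 / (p * A * ratio**B)) ** (1 / (A + B - 1))
    return x1, ratio * x1


def supply(p, w1, w2):
    """Output y(p; w) = f(x(p; w))."""
    x1, x2 = demands(p, w1, w2)
    return x1**A * x2**B


def profit_fn(p, w1, w2):
    """Profit function pi(p; w), evaluated at the optimal inputs."""
    x1, x2 = demands(p, w1, w2)
    return p * supply(p, w1, w2) - w1 * x1 - w2 * x2


p, w1, w2, h = 10.0, 1.0, 2.0, 1e-5                     # illustrative prices and step
dpi_dp = (profit_fn(p + h, w1, w2) - profit_fn(p - h, w1, w2)) / (2 * h)
dpi_dw1 = (profit_fn(p, w1 + h, w2) - profit_fn(p, w1 - h, w2)) / (2 * h)
print(dpi_dp, supply(p, w1, w2))          # (2.20): these should agree
print(dpi_dw1, -demands(p, w1, w2)[0])    # (2.21): these should agree
```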
In principle, we might also have expected an indirect effect from a price change: since the
firm changes its quantities, this might also have an effect on the marginal profit. But this
is not the case. The “lazy manager” does not adjust quantities, but has the same marginal
profits as the rational manager.
Our next task is to understand how optimal choices (inputs and output quantities) are
affected by prices. We will use the relationships between optimal policies and marginal valuations that we established above. Just like increasing marginal cost implies an increasing supply curve, we will show that profit functions are convex, and hence supply curves are increasing.
The envelope theorem gave us a starting point: the right-hand sides of (2.20) and (2.21) are the firm’s output and input choices (output being determined by the input choices). So, rearranging and differentiating again yields
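$$\frac{\partial y(p; w)}{\partial p} = \frac{\partial^2 \pi(p; w)}{\partial p^2} \qquad (2.22)$$
$$\frac{\partial x_i(p; w)}{\partial w_j} = -\frac{\partial^2 \pi(p; w)}{\partial w_i\,\partial w_j}. \qquad (2.23)$$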
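What do we know about the second derivatives of the profit function π? One thing we know is that if π is twice continuously differentiable, then the order of differentiation does not matter (Young's theorem), so
$$\frac{\partial x_i(p; w)}{\partial w_j} = \frac{\partial x_j(p; w)}{\partial w_i} = -\frac{\partial^2 \pi(p; w)}{\partial w_i\,\partial w_j}. \qquad (2.24)$$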
For example, consider a hospital that uses doctors (xi ) and nurses (xj ) among other things
as input factors. The equation above establishes a relationship between the hospital’s demand
for these two items. The first term describes by how much demand for doctors increases when
nurses’ wages increase. The second term describes by how much demand for nurses increases
when doctors’ wages increase. The equation says they are equal. If the hospital decides to
hire an extra doctor (and possibly fire some nurses) when nurses’ wages increase by $1, then
the hospital would also decide to hire an extra nurse when doctors’ wages increase by $1.
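The symmetry in (2.24) can be checked numerically. This sketch assumes a hypothetical Cobb-Douglas technology f(x1, x2) = x1^0.3 x2^0.5, whose input demands have a closed form, and illustrative prices; it compares the two cross-price derivatives using central finite differences.

```python
A, B = 0.3, 0.5  # hypothetical exponents of f = x1**A * x2**B, with A + B < 1


def demands(p, w1, w2):
    """Optimal inputs solving the first-order conditions (2.7) for this f."""
    ratio = (B * w1) / (A * w2)                      # x2 / x1 at the optimum
    x1 = (w1 / (p * A * ratio**B)) ** (1 / (A + B - 1))
    return x1, ratio * x1


p, w1, w2, h = 10.0, 1.0, 2.0, 1e-5
# Effect of the input-2 price on demand for input 1, and vice versa.
dx1_dw2 = (demands(p, w1, w2 + h)[0] - demands(p, w1, w2 - h)[0]) / (2 * h)
dx2_dw1 = (demands(p, w1 + h, w2)[1] - demands(p, w1 - h, w2)[1]) / (2 * h)
print(dx1_dw2, dx2_dw1)
```

The two numbers should agree (both are roughly −192.6 at these prices). Both are negative, so in this example a higher price for either input reduces demand for the other.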
To say more about the second derivatives of the profit function (and hence the first deriva-
tives of the policy functions), we will need another theorem.
Theorem 2.2. Suppose V is the upper envelope of convex functions, i.e. $V(a) = \max_b v(a, b)$ where $v(\cdot, b)$ is a convex function for each b. Then V is convex.
Algebraic Proof. This proof is illustrated in Figure 2.6. We informally describe the proof
first. Convexity is about comparing intermediate possibilities. For example, two “extreme”
situations might involve having a = $100 and a′ = $1000 in the bank account in the morning,
respectively. How would the value in an intermediate situation when ta + (1 − t)a′ = $400
compare to the values in the extreme situations? If the value function is convex, then the intermediate value is no better than the corresponding weighted average tV(a) + (1 − t)V(a′) of the extreme values. If the objective function is convex in the state variable, then we claim that the value function will be convex.
To prove this, we start with the weighted average of the extreme values. These extreme values are based on the corresponding optimal choices, e.g. living frugally when a = $100 and throwing a party when a′ = $1000. If we replace these optimal choices with a suboptimal choice, then we will (weakly) reduce the weighted average value. Since we are interested in the intermediate situation a′′ = $400, we replace the optimal choices for the extreme situations with the optimal choice for the intermediate situation (which probably involves moderate consumption rather than frugal or party-level consumption). After making this substitution, we are taking the weighted average of the extreme situations using the intermediate choice. Since the underlying objective function is convex, this weighted average is at least as large as the value of the intermediate choice in the intermediate situation (a′′ = $400), which is exactly the intermediate value V(a′′).
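We would like to show $tV(a) + (1-t)V(a') \ge V(a''_t)$, where we define $a''_t = ta + (1-t)a'$ as the convex combination t of a and a′. As usual, let b(a) denote the policy function. Expanding the left side gives
$$\begin{aligned}
tV(a) + (1-t)V(a') &= t\,v(a, b(a)) + (1-t)\,v(a', b(a')) \\
&\ge t\,v(a, b(a''_t)) + (1-t)\,v(a', b(a''_t)) \\
&\ge v(a''_t, b(a''_t)) \\
&= V(a''_t),
\end{aligned}$$
where the first inequality holds because b(a) and b(a′) are optimal in their own situations, and the second holds because $v(\cdot, b(a''_t))$ is convex.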
Geometric Proof (Sketch). This proof is illustrated in Figure 2.7. Recall that a function is
convex if and only if its hypergraph (the set of points consisting of the “atmosphere” above
the surface, {(a, c) : c ≥ V(a)}) is convex. A point is in the hypergraph of the upper envelope if and only if
it is in all of the hypergraphs of the underlying functions. That is, the hypergraph of the upper
envelope is the intersection of hypergraphs of the underlying functions. Since the intersection
of convex sets is convex (Theorem D.1), the hypergraph of the upper envelope is convex.
[Figure 2.6: Algebraic Proof of Theorem 2.2 (V(a) against a).]
[Figure 2.7: Geometric Proof of Theorem 2.2 (V(a) against a).]
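Figure 2.4 and Theorem 2.2 can be illustrated with a concrete family of lines. The sketch assumes the hypothetical family v(a, b) = 2ab − b², one line in a for each b. Since v(a, b) = a² − (a − b)², the best b is b = a and the upper envelope is the parabola V(a) = a². The infinite family is approximated by a finite grid of b values, and the code checks the convexity inequality at a few points.

```python
# Hypothetical family of lines in a, one for each b: v(a, b) = 2*a*b - b**2.
# Since v(a, b) = a**2 - (a - b)**2, the best b is b = a and the envelope is V(a) = a**2.
B_GRID = [i / 100 for i in range(-300, 301)]   # finitely many b values in [-3, 3]


def v(a, b):
    return 2 * a * b - b**2


def V(a):
    """Upper envelope: the pointwise maximum of the lines over the grid of b."""
    return max(v(a, b) for b in B_GRID)


# Convexity check: t*V(a) + (1-t)*V(a2) >= V(t*a + (1-t)*a2), up to rounding error.
points = [-2.0, -0.5, 0.3, 1.7, 2.5]
convex = all(t * V(a) + (1 - t) * V(a2) >= V(t * a + (1 - t) * a2) - 1e-12
             for a in points for a2 in points for t in (0.25, 0.5, 0.75))
print(V(1.7), 1.7**2, convex)
```

Each line is convex (indeed linear) in a, so their pointwise maximum is convex, as Theorem 2.2 predicts; the grid version is exactly a maximum of finitely many lines.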
We may use the theorem above to establish that the firm’s profit function is convex. The
theorem below uses this to understand how price changes affect the firm’s choices. After an
output price rise, the firm produces more. After an input price rise, the firm reduces its demand
for that good.
Theorem 2.3. For every production function f, the firm’s profit function π is convex. Hence, if π is smooth, then
$$\frac{\partial y(p; w)}{\partial p} \ge 0 \quad \text{and} \quad \frac{\partial x_i(p; w)}{\partial w_i} \le 0 \quad \text{for all } i.$$
Proof. We first outline the proof. The idea is that if the input (and hence output) quantities
are held constant, then the profits are a linear function of prices. This is because profits are calculated by multiplying prices by quantities, both when calculating revenues and
costs. Since linear functions are convex, it follows that for each input choice, profits are a
convex function of prices. Now, the profit function is the upper envelope of each of these linear
functions (one for each possible production plan), so we conclude the profit function is convex.
For every input x∗ , we can define a function g(p; w) = pf (x∗ ) − w · x∗ . Taking the upper
envelope of all such g(p; w) functions gives the profit function π. Since each g function is linear
(and hence convex), Theorem 2.2 implies that the profit function π is convex. Thus, we may
apply Theorem D.4 to deduce
$$\frac{\partial^2}{\partial p^2}\pi(p; w) \ge 0 \quad \text{and} \quad \frac{\partial^2}{\partial w_i^2}\pi(p; w) \ge 0. \qquad (2.26)$$
Substituting these inequalities into (2.22) and (2.23) gives the desired inequalities.
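The proof of Theorem 2.3 treats the profit function as an upper envelope of linear functions of prices, one for each production plan. The sketch below makes this literal with a hypothetical finite menu of plans from the non-concave technology f(x) = x²/10 (x = 0, 1, …, 20) and an illustrative input price w = 1. It checks that profit is convex in p and that supply never falls as p rises, even though f is not concave.

```python
# Hypothetical finite menu of production plans (input x, output y) from the
# non-concave technology f(x) = x**2 / 10 with x in {0, 1, ..., 20}.
PLANS = [(x, x**2 / 10) for x in range(21)]
W = 1.0  # illustrative input price


def best_plan(p):
    """A profit-maximising plan at output price p (max keeps the first of any ties)."""
    return max(PLANS, key=lambda plan: p * plan[1] - W * plan[0])


def profit_fn(p):
    """pi(p): upper envelope of the linear functions p*y - W*x, one per plan."""
    x, y = best_plan(p)
    return p * y - W * x


prices = [0.1 * k for k in range(1, 31)]
supplies = [best_plan(p)[1] for p in prices]
print(all(s1 <= s2 for s1, s2 in zip(supplies, supplies[1:])))   # supply never falls
print(all(profit_fn(0.5 * (p1 + p2)) <= 0.5 * (profit_fn(p1) + profit_fn(p2)) + 1e-12
          for p1 in prices for p2 in prices))                    # midpoint convexity
```

Here the firm switches from producing nothing to producing 40 units once p exceeds 0.5, so supply jumps rather than rising smoothly, but it never falls.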
Example 2.4. Consider a supermarket that buys wholesale food and labour, which it uses to
sell retail food. Some food might get wasted; more labour means less food gets wasted.
(iii) Show that the supermarket responds to a wholesale price increase by buying less.
Answer.
(i) Notation: Let d denote wholesale food quantity, ϕ wholesale food price, l labour hired,
w wages, f(l, d) retail food sold, and p retail food price. The profit function is
$$\pi(p, \phi, w) = \max_{l, d} \; p f(l, d) - \phi d - w l.$$
(ii) For each possible value of the choice variables (l, d), the firm’s objective is a linear
function of the state variable (p, ϕ, w). Since linear functions are convex, the upper
envelope, π(p, ϕ, w) is convex.
(ii) Using algebra and words, explain the effect that the envelope theorem ruled out in part
(i).
Question 2.10. ✓ Show that the firm’s optimal policies are unresponsive to “inflation”, i.e.
all prices increasing by the same proportion. Show that inflation increases (nominal) profits.
Do your answers suggest that a firm has an incentive to cause inflation (perhaps by bribing
politicians)?
Question 2.11. ✓ A solar panel manufacturer uses knowledge, labor and silicon to make solar
panels. Labor and silicon are acquired at market prices. However, the firm cannot acquire new knowledge – it is stuck with whatever it is endowed with.
(i) Write down a mathematical model that represents the firm’s profit maximization problem.
(ii) What is the marginal profit of knowledge to the firm? Your answer should take into
account that if the firm’s knowledge increases, it might decide to change its production
decision.
For more similar questions, see the following practice exam questions: 3.iv, 3.v, 6.iii, 6.iv, 9.iii,
12.iv, 15.iv, 16.iii, 18.iii, 18.iv, 24.a.iii, 25.iii, 27.a.ii, 28.iv, 29.a.ii, 31.a.iv, 33.iii.
In this problem, the firm is effectively choosing both its inputs x and its output f (x) at the
same time. We can decompose the problem into two problems where inputs and output are chosen separately. The cost function c gives the cost of producing a particular output quantity:
$$c(y; w) = \min_{x \in \mathbb{R}^{N-1}_+} w \cdot x \quad \text{subject to } f(x) \ge y. \qquad (2.28)$$
In this reformulation of the profit function, the firm only chooses output. We were able to
simplify the firm’s profit maximization problem by burying some of the decisions inside the cost
function. The simplified formula for the profit function in (2.30) is an example of a Bellman
equation, which lies at the heart of dynamic programming.
The lesson of dynamic programming can be summarised as: a complicated value function
with many decisions can be simplified by burying some of the decisions inside another value
function. In computer networking, the problem of choosing the fastest route for sending
messages between two computers can be simplified with dynamic programming. Dijkstra
(1959) noticed that the problem can be broken down into smaller problems by first calculating
the value (speed) of all directly connected computers, and then adjusting for the speed of the
direct links. The problem of finding the best route from the neighbouring computer to the
target is buried inside a value function.
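The decomposition can be sketched in Python with Dijkstra's algorithm and a priority queue from the standard-library `heapq` module. The network and its link latencies are hypothetical. The dictionary `dist` plays the role of the value function: `dist[v]` is the smallest total latency from the source to `v`, built up by adding one direct link to an already-solved subproblem. Latencies here are measured from a source computer; when every link is equally fast in both directions, this is the same as measuring them towards that computer as the target.

```python
import heapq

def shortest_latency(graph, source):
    """Dijkstra's algorithm: graph maps node -> list of (neighbour, latency)."""
    dist = {source: 0.0}          # best known latency from source (value function)
    frontier = [(0.0, source)]    # min-heap of (tentative latency, node)
    done = set()
    while frontier:
        d, u = heapq.heappop(frontier)
        if u in done:
            continue              # stale heap entry; u was already solved
        done.add(u)
        for v, latency in graph.get(u, []):
            candidate = d + latency   # value of u plus the direct link u -> v
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(frontier, (candidate, v))
    return dist

# Hypothetical network (latencies in milliseconds), usable in both directions.
links = [("A", "B", 5), ("A", "C", 1), ("C", "B", 2), ("B", "D", 1), ("C", "D", 7)]
graph = {}
for u, v, ms in links:
    graph.setdefault(u, []).append((v, ms))
    graph.setdefault(v, []).append((u, ms))

print(shortest_latency(graph, "A")["D"])  # 4.0 (A -> C -> B -> D)
```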
In genetics, the problem of determining the most likely sequence of mutations between
a pair of genes can be simplified with dynamic programming with what is known as the
Needleman and Wunsch (1970) algorithm. Comparing two long DNA sequences is a daunting
task. But the problem may be split up into (many) smaller problems. It is easy to compare
two nucleotides (one from each gene), and the comparisons of all the other nucleotides can be
buried inside a value function.
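A minimal Python sketch of the Needleman and Wunsch recursion, computing only the optimal alignment score. The scoring scheme (+1 for a match, −1 for a mismatch, −2 for a gap) is an assumption for illustration; practical tools use substitution matrices and tuned gap penalties. `score[i][j]` is the value function: the best score for aligning the first `i` nucleotides of one sequence with the first `j` of the other, so each step compares only one pair of nucleotides and buries everything else in earlier entries.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b by dynamic programming."""
    n, m = len(a), len(b)
    # score[i][j]: best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # compare one nucleotide from each
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[n][m]

print(needleman_wunsch("GATTACA", "GATTACA"))  # 7 (all matches)
print(needleman_wunsch("GATTACA", "GATACA"))   # 4 (six matches, one gap)
```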
In economics, the most important application of dynamic programming is in macroeconomics
in which a consumer has to choose their consumption for each day of the rest of their
life. This complicated problem can be decomposed into choosing the consumption today and
savings for tomorrow. The consumption choices from tomorrow onwards are buried inside the
value of saving today.
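A finite-horizon version of this decomposition can be sketched by backward induction. Every modelling choice here is an assumption for illustration: savings earn no interest, there is no discounting, consumption comes in whole units, and utility is u(c) = √c. `V[t][w]` is the value of entering day `t` with wealth `w`; each day's problem chooses only today's consumption, with all later choices buried in `V[t + 1]`.

```python
import math

def cake_eating(wealth, days, u=math.sqrt):
    """Backward induction on V[t][w] = max_c u(c) + V[t + 1][w - c]."""
    V = [[0.0] * (wealth + 1) for _ in range(days + 1)]  # V[days][w] = 0
    policy = [[0] * (wealth + 1) for _ in range(days)]
    for t in range(days - 1, -1, -1):
        for w in range(wealth + 1):
            # Today's choice only; tomorrow onwards is summarized by V[t + 1].
            best_c = max(range(w + 1), key=lambda c: u(c) + V[t + 1][w - c])
            policy[t][w] = best_c
            V[t][w] = u(best_c) + V[t + 1][w - best_c]
    return V, policy

V, policy = cake_eating(wealth=6, days=3)
print(policy[0][6])  # 2: consume a third of the wealth on the first day
```

With a concave utility and no discounting, spreading consumption evenly is optimal, and the recursion recovers this without ever optimizing over a whole consumption path at once.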
But for the moment, we will only study the firm’s profit maximization problem. One step
we did not check was whether the Bellman equation (2.30) gives the right answer – it should
match the value function (2.27). This step is known as verifying the principle of optimality.
Lemma 2.1 (Principle of Optimality). The definitions of the profit function π(p; w), in (2.27)
and (2.30) are equivalent.
2.4. COST FUNCTIONS AND DYNAMIC PROGRAMMING 29
Proof. The proof involves patiently transforming the formula for the value function into the
Bellman equation. The key trick is to add a new “choice” of output y, which initially is no
choice at all, because it is completely determined by the input. But when the input is chosen
after the output, the separate choice of output becomes meaningful.
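Lemma 2.1 can be checked numerically on a small example. The sketch assumes a hypothetical production function $f(x_1, x_2) = x_1^{0.3} x_2^{0.3}$ (decreasing returns, so profits are bounded), arbitrary prices, and a finite grid of input bundles. It computes the profit function once directly, choosing inputs and output together, and once through the Bellman equation, first computing the cost c(y; w) of every attainable output level and then choosing output alone.

```python
import itertools

def f(x1, x2):
    # Hypothetical technology with decreasing returns, so profits are bounded.
    return x1 ** 0.3 * x2 ** 0.3

p, w1, w2 = 4.0, 1.0, 2.0  # hypothetical prices
# Illustrative finite grid of input bundles, each input in {0, 0.1, ..., 3.9}.
bundles = [(0.1 * i, 0.1 * j) for i, j in itertools.product(range(40), repeat=2)]

# Direct approach, as in (2.27): choose inputs, and hence output, in one step.
direct = max(p * f(x1, x2) - (w1 * x1 + w2 * x2) for x1, x2 in bundles)

# Stage 1: cost function c(y) = min{w . x : f(x) >= y} at every attainable y.
# Scanning bundles from highest to lowest output, `cheapest` is the lowest
# expenditure among all bundles producing at least the current output.
cost = {}
cheapest = float("inf")
for x1, x2 in sorted(bundles, key=lambda x: f(*x), reverse=True):
    cheapest = min(cheapest, w1 * x1 + w2 * x2)
    y = f(x1, x2)
    cost[y] = min(cost.get(y, float("inf")), cheapest)

# Stage 2, the Bellman equation (2.30): choose output only.
bellman = max(p * y - c for y, c in cost.items())

print(direct, bellman)  # the two values coincide
```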
The Bellman equation (2.30) buries the complicated input choices inside the cost function c,
and allows us to focus on just one choice: output. This allows us to establish the classical
“price equals marginal cost” formula.
Theorem 2.4. If y(p; w) is the optimal supply policy in (2.27), then for all prices (p, w),
$$p = \left.\frac{\partial c(y; w)}{\partial y}\right|_{y = y(p;w)}. \tag{2.36}$$
Proof. By the principle of optimality (Lemma 2.1), the profit function (2.27) can be rewritten
in terms of the cost function, (2.30). The first-order condition of this reformulated profit
function with respect to output y is
$$\left.\frac{\partial}{\partial y}\bigl[py - c(y; w)\bigr]\right|_{y = y(p;w)} = 0, \tag{2.37}$$
which rearranges to (2.36).
This result shows how useful dynamic programming is: it allowed us to simplify a complicated
problem back into something very simple and familiar.
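As a worked illustration, assuming the hypothetical single-input technology f(x) = √x with input price w, the cost function, the condition (2.36), and the profit function can all be computed by hand:

$$
\begin{aligned}
c(y; w) &= \min_{x \geq 0} \{\, w x : \sqrt{x} \geq y \,\} = w y^2, \\
p &= \frac{\partial c(y; w)}{\partial y} = 2 w y
  \quad\Longrightarrow\quad y(p; w) = \frac{p}{2w}, \\
\pi(p; w) &= p \, y(p; w) - c\bigl(y(p; w); w\bigr) = \frac{p^2}{2w} - \frac{p^2}{4w} = \frac{p^2}{4w}.
\end{aligned}
$$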
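We can also re-apply the envelope theorem to study how profits and output are affected
by price changes:
$$\frac{\partial \pi(p; w)}{\partial p} = \left.\frac{\partial}{\partial p}\bigl(py - c(y; w)\bigr)\right|_{y = y(p;w)} = y(p; w), \tag{2.38}$$
$$\frac{\partial \pi(p; w)}{\partial w_i} = \left.\frac{\partial}{\partial w_i}\bigl(py - c(y; w)\bigr)\right|_{y = y(p;w)} = -\left.\frac{\partial c(y; w)}{\partial w_i}\right|_{y = y(p;w)}. \tag{2.39}$$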
We do not learn anything new from the first equation, (2.38). However, the second equation,
(2.39), does tell us something: when the price of input i increases, profits fall one-for-one with
the resulting increase in the cost of producing the current optimal output; the firm's output
adjustment has no first-order effect on profits.
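The envelope equations (2.38) and (2.39) can be checked by finite differences. The sketch assumes a hypothetical single-input technology f(x) = √x, so that c(y; w) = w y² and the input needed for output y is x = y². The profit function is computed by a golden-section search over output rather than in closed form, so the derivatives are genuinely numerical.

```python
def profit(p, w, iters=200):
    """pi(p; w) = max_y p*y - c(y; w) for the hypothetical cost c(y; w) = w*y**2
    (i.e. f(x) = sqrt(x)), by golden-section search for y on [0, 100]."""
    objective = lambda y: p * y - w * y ** 2
    g = (5 ** 0.5 - 1) / 2
    a, b = 0.0, 100.0
    for _ in range(iters):
        left, right = b - g * (b - a), a + g * (b - a)
        if objective(left) > objective(right):
            b = right          # the maximum lies in [a, right]
        else:
            a = left           # the maximum lies in [left, b]
    y = (a + b) / 2
    return objective(y), y

p, w, h = 3.0, 0.5, 1e-5
_, y = profit(p, w)
x = y ** 2  # input demand, since f(x) = sqrt(x)

# Central finite differences of the profit function.
dpi_dp = (profit(p + h, w)[0] - profit(p - h, w)[0]) / (2 * h)
dpi_dw = (profit(p, w + h)[0] - profit(p, w - h)[0]) / (2 * h)

print(dpi_dp, y)    # (2.38): approximately equal
print(dpi_dw, -x)   # (2.39): dc/dw = x at fixed output, so approximately equal
```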
Question 2.12. ✓ Let p be the sale price of output, k be capital which is rented at price r, and
labour l which is paid a wage w. Consider the Cobb and Douglas (1928) production function
$y = f(k, l) = k^a l^b$.
(i) Write down the firm’s profit function.
(ii) Write down a Bellman equation in which the firm chooses output (and input is buried
inside a value function).
(iii) Derive the optimal capital and labour choices k(y; r, w) and l(y; r, w). Note: the algebra
requires a lot of patience, so please don’t try this alone! It is worth doing, as it will help
convince you that you understand all of the tools.
Question 2.13. ✓ Continuing Question 2.11 about solar panel manufacturing, suppose that
the production function is linear in knowledge. Would the firm choose to produce more when
it is endowed with more knowledge? What assumptions in your model are important for your
conclusion?
Question 2.14. ✓ There are two ways to run a dairy farm. The traditional way is to milk
each cow by manually herding the cows and attaching a hose. The modern way involves
buying a big rotary machine where the cows walk in, spend half an hour in the machine, and
walk out in a completely automated process. Assume that the marginal product of the rotary
machine (i.e. the difference in output between machine and no-machine, holding cows and
labour fixed) is increasing in cows and labour. Rotary machines are big and expensive, and
can service hundreds of cows.
(i) Formulate the farm’s profit maximisation problem.
(ii) Henceforth, assume that the two dairy technologies are concave in all inputs, except for
the (indivisible) rotary machine. Sketch a graph of the marginal cost of milk.
(iii) When the price of milk increases, does labour demand increase or decrease?
For more similar questions, see the following practice exam questions: 2.ii, 8.iii.a, 8.iii.b, 9,
21.a.ii, 22.iii, 23.iii, 24.a.ii, 31.a.ii, 31.a.iii, 32.iii, 32.iv, 33.iv, 34.ii, 34.iii, 34.iv.
2.5. UPPER ENVELOPES WITH CONSTRAINTS 31
The policy function b(a) may be found in the usual way with the Lagrange theorem. The
Lagrangian is
$$L(a, b, \lambda) = v(a, b) + \lambda w(a, b).$$
At an optimal choice b(a), the Lagrange theorem implies that there is a Lagrange multiplier
λ(a) ≥ 0 such that the following first-order condition is satisfied:
$$\left.\frac{\partial L(a, b, \lambda)}{\partial b}\right|_{b = b(a),\, \lambda = \lambda(a)} = 0. \tag{2.41}$$
The constrained envelope theorem uses this theory to give a formula for the marginal value
function, V ′ (a).
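Theorem 2.5 (Constrained Envelope Theorem). If V(·), v(·, ·), w(·, ·), b(·), and λ(·) (as defined
above) are continuously differentiable functions, and
$$\frac{\partial w(a, b)}{\partial b} \neq 0 \quad \text{for all } (a, b(a)), \tag{2.42}$$
and if the constraint binds at (a, b(a)), then
$$V'(a) = \left.\left[\frac{\partial v(a, b)}{\partial a} + \lambda \frac{\partial w(a, b)}{\partial a}\right]\right|_{b = b(a),\, \lambda = \lambda(a)}. \tag{2.43}$$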
Proof. The max operation (and its constraint) in the formula for the value function, (2.40),
may be removed by substituting in the policy function:
$$V(a) = v(a, b(a)).$$
The idea behind Lagrange multipliers is to add a term that represents the marginal cost of
satisfying the constraint. Since we assume that the constraint always binds, i.e. w(a, b(a)) = 0,
it is correct to write
$$V(a) = v(a, b(a)) + \lambda(a)\, w(a, b(a)) = L(a, b(a), \lambda(a)).$$
This term accounts for marginal changes in the constraint (i.e. replacing the 0 on the right
side of the constraint with a slightly different number). So it is intuitive that this extra term
might help with a proof.
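Differentiating gives
$$V'(a) = \left.\left[\frac{\partial L(a, b, \lambda)}{\partial a} + \frac{\partial L(a, b, \lambda)}{\partial b}\, b'(a) + \frac{\partial L(a, b, \lambda)}{\partial \lambda}\, \lambda'(a)\right]\right|_{b = b(a),\, \lambda = \lambda(a)}.$$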
The second term is 0 by the first-order condition (2.41). The last term is 0 as it contains
w(a, b(a)) which is 0 because we assumed the constraint binds. Expanding the remaining term
gives (2.43).
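A numerical check of Theorem 2.5 on a hypothetical problem of our own: v(a, b) = a − (b − 3)² and w(a, b) = 2a − b, assuming the constraint in (2.40) takes the form w(a, b) ≥ 0. For a < 1.5 the constraint binds, so b(a) = 2a; the first-order condition (2.41) gives λ(a) = −2(b(a) − 3), and (2.43) predicts V′(a) = 1 + 2λ(a) = 13 − 8a. The sketch computes V(a) by a golden-section search over feasible b and compares a finite-difference slope with that formula.

```python
def v(a, b):
    return a - (b - 3.0) ** 2      # hypothetical objective

def w(a, b):
    return 2.0 * a - b             # hypothetical constraint: w(a, b) >= 0, i.e. b <= 2a

def value(a, iters=200):
    """V(a) = max_b v(a, b) s.t. w(a, b) >= 0, by golden-section search on b."""
    lo, hi = -10.0, 2.0 * a        # feasible interval for b (lower end is arbitrary)
    g = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if v(a, c) > v(a, d):
            hi = d
        else:
            lo = c
    b = (lo + hi) / 2
    return v(a, b), b

a, h = 1.0, 1e-5
V_a, b_a = value(a)
numeric_slope = (value(a + h)[0] - value(a - h)[0]) / (2 * h)
lam = -2.0 * (b_a - 3.0)           # from dL/db = -2(b - 3) - lam = 0, i.e. (2.41)
formula = 1.0 + lam * 2.0          # dv/da + lam * dw/da, i.e. (2.43)
print(numeric_slope, formula)      # both close to 13 - 8a = 5
```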
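We now take first-order conditions and apply the envelope theorem to the cost function (2.28).
Writing the minimization as the maximization of −w · x, the Lagrangian of the cost function is:
$$L(x, \lambda; y, w) = -w \cdot x + \lambda\,\bigl(f(x) - y\bigr).$$
The first-order condition with respect to input i is
$$\left.\left[-w_i + \lambda \frac{\partial f(x)}{\partial x_i}\right]\right|_{x = x(y;w),\, \lambda = \lambda(y;w)} = 0,$$
which simplifies to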
$$w_i = \lambda(y; w) \left.\frac{\partial f(x)}{\partial x_i}\right|_{x = x(y;w)} \tag{2.46}$$
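(Note that this calculation involved an extra minus sign because the cost function involves a
minimization.) Applying the constrained envelope theorem to the cost function gives
$$\frac{\partial c(y; w)}{\partial y} = \left.\frac{\partial}{\partial y}\bigl[w \cdot x - \lambda (f(x) - y)\bigr]\right|_{x = x(y;w),\, \lambda = \lambda(y;w)} = \lambda(y; w), \tag{2.47}$$
$$\frac{\partial c(y; w)}{\partial w_i} = \left.\frac{\partial}{\partial w_i}\bigl[w \cdot x - \lambda (f(x) - y)\bigr]\right|_{x = x(y;w),\, \lambda = \lambda(y;w)} = x_i(y; w). \tag{2.48}$$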
We now interpret these three equations. The second equation (2.47) is fundamental to the
theory of Lagrange multipliers. It says that the marginal cost of increasing the production
target (i.e. tightening the production target constraint) is equal to the Lagrange multiplier. In
other words, increasing the production target comes at a price λ. However, this is an implicit
price not determined directly from market transactions. This is why the Lagrange multiplier
is often called the shadow price of the constraint.
The third equation, (2.48), is sometimes called Shephard’s lemma, and is a slight restatement
of (2.21). It says that the marginal effect on cost of an increase in the price of input i equals
the quantity of input i demanded, i.e. the extra expenditure required to keep buying the same
amount of input i. Even though the firm will decrease its demand for input i (and substitute
to other inputs), this adjustment has no first-order effect on cost, so it does not dampen the
marginal cost increase.
The first equation, (2.46), includes the Lagrange multiplier λ(y; w), which we interpreted as the
marginal cost of output. The left side of the first equation is the marginal expenditure of
increasing input i. The right side is the marginal cost of the extra output created by this
input.
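These interpretations can be checked numerically. The sketch assumes a hypothetical production function $f(x_1, x_2) = \sqrt{x_1} + \sqrt{x_2}$. Since the constraint binds, the cost-minimizing bundle lies on the isoquant; writing s = √x₁, so that x₁ = s² and x₂ = (y − s)², and minimizing w₁s² + w₂(y − s)² gives s = y w₂/(w₁ + w₂). The sketch then compares finite-difference derivatives of c with the conditional demand x₁(y; w), as in (2.48), and with the multiplier λ recovered from (2.46), as in (2.47).

```python
def conditional_demand(y, w1, w2):
    """Cost-minimizing inputs for the hypothetical f(x1, x2) = sqrt(x1) + sqrt(x2).
    On the isoquant sqrt(x1) + sqrt(x2) = y, write s = sqrt(x1); minimizing
    w1*s**2 + w2*(y - s)**2 over s gives s = y*w2/(w1 + w2)."""
    s = y * w2 / (w1 + w2)
    return s ** 2, (y - s) ** 2

def cost(y, w1, w2):
    x1, x2 = conditional_demand(y, w1, w2)
    return w1 * x1 + w2 * x2

y, w1, w2, h = 2.0, 1.0, 3.0, 1e-6
x1, x2 = conditional_demand(y, w1, w2)

# (2.48) Shephard's lemma: dc/dw1 equals the conditional demand x1.
dc_dw1 = (cost(y, w1 + h, w2) - cost(y, w1 - h, w2)) / (2 * h)
# (2.46) for input 1: w1 = lam * df/dx1 = lam / (2*sqrt(x1)), so lam = 2*w1*sqrt(x1).
lam = 2 * w1 * x1 ** 0.5
# (2.47): dc/dy equals the Lagrange multiplier, i.e. marginal cost.
dc_dy = (cost(y + h, w1, w2) - cost(y - h, w1, w2)) / (2 * h)

print(dc_dw1, x1)                    # approximately equal
print(dc_dy, lam)                    # approximately equal
print(w2, lam / (2 * x2 ** 0.5))     # (2.46) also holds for input 2
```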
In Section 2.3, we used the fact that the envelope equations relate the policy function to
the derivative of the value function to learn more about the policy function. Similar to before,
we deduce that
$$\frac{\partial x_i(y; w)}{\partial w_j} = \frac{\partial x_j(y; w)}{\partial w_i} = \frac{\partial^2 c(y; w)}{\partial w_i \partial w_j}. \tag{2.49}$$
These equations are almost identical to (2.24); only the state variables are different. Before,
the policy was a function of prices; here the policy is a function of the output target y and
input prices w.
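For the hypothetical technology $f(x_1, x_2) = \sqrt{x_1} + \sqrt{x_2}$ (an assumption for illustration), minimizing $w_1 x_1 + w_2 x_2$ along the isoquant $\sqrt{x_1} + \sqrt{x_2} = y$ gives closed forms, and the symmetry (2.49) can be verified by hand:

$$
\begin{aligned}
x_1(y; w) &= \frac{y^2 w_2^2}{(w_1 + w_2)^2}, \qquad
x_2(y; w) = \frac{y^2 w_1^2}{(w_1 + w_2)^2}, \qquad
c(y; w) = \frac{y^2 w_1 w_2}{w_1 + w_2}, \\
\frac{\partial x_1(y; w)}{\partial w_2} &= \frac{2 y^2 w_1 w_2}{(w_1 + w_2)^3}
= \frac{\partial x_2(y; w)}{\partial w_1}
= \frac{\partial^2 c(y; w)}{\partial w_1 \partial w_2}.
\end{aligned}
$$

The common value is positive, so in this example the two inputs are substitutes: a rise in the price of one input raises the conditional demand for the other.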
Under much more stringent (and incompatible) conditions compared to before, we show
that value functions are convex or concave.
Theorem 2.6. In the notation of (2.40), if v is convex, and w is quasi-concave, then
is concave.
To understand this theorem, it is helpful to think of it in terms of cost functions, where a is the
production target and b is the production plan. The condition that w is quasi-concave means
that intermediate production plans must meet intermediate production targets, i.e. if you take
a convex combination of two different optimal production plans, then this will produce at least
as much as the convex combination of the outputs.
Proof. We prove the first part only. (The second part is analogous.) This proof is similar to
the proof of Theorem 2.2. We sketch the intuition first, based on an example of having two
extreme situations involving production targets of 100 and 1000 meals respectively. Suppose
that 10 chefs are needed for 100 meals, and 100 chefs are needed for 1000 meals. We want to
prove that the cost of an intermediate target of 550 meals is no higher than the average of the
costs of the extreme targets. The previous proof does not apply directly, because it was based
on using the intermediate choice in the extreme situations. This was not a problem in the
unconstrained problem, but it is a problem here, because the intermediate number of chefs
(e.g. 50) will not meet the higher production target of 1000 meals.
Instead, we consider taking an average number of chefs (55) for an intermediate target of
550. Specifically, we start with the (weighted) average of the costs from the extreme targets.
Then we consider the intermediate target (550 meals) with the average production plan (55
chefs). Since the constraint (i.e. the production function) is quasi-concave, 55 chefs meet or
exceed the intermediate target of 550. Moreover, since the objective is convex, the cost of
achieving the intermediate target of 550 with the average production plan (of 55 chefs) is no
higher than the average of the extreme costs (of making 100 and 1000 meals). Finally, the
average production plan (of 55 chefs) is no better than the optimal intermediate production
plan (of 50 chefs). We conclude that the average of the costs of the extreme targets is at least
as high as the cost of the intermediate target.
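The proof is depicted in Figure 2.8. We would like to establish that
$$t V(a) + (1 - t) V(a') \geq V\bigl(ta + (1 - t)a'\bigr), \tag{2.54}$$
meaning that the line connecting the costs (values) between a and a′ lies above the V curve.
The left side can be interpreted as the cost when (linearly) interpolating between the cost of a
and the cost of a′. The right side is the cost when making the optimal choice, b(ta + (1 − t)a′).
It will be helpful to consider another choice, l(t) = tb(a) + (1 − t)b(a′), which we call the
interpolation policy; it makes choices between the two optimal choices b(a) and b(a′). We will
establish (2.54) via the following steps:
$$
\begin{aligned}
t V(a) + (1 - t) V(a') &= t\, v(a, b(a)) + (1 - t)\, v(a', b(a')) \\
&\geq v\bigl(ta + (1 - t)a', l(t)\bigr) \\
&\geq v\bigl(ta + (1 - t)a', b(ta + (1 - t)a')\bigr) \\
&= V\bigl(ta + (1 - t)a'\bigr).
\end{aligned}
$$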
The first and last equations are true because V (a) = v(a, b(a)) for all a. The first inequality
follows because v is convex. The second inequality follows because the decision-maker would
reject l(t) in favour of the optimal choice b(ta + (1 − t)a′ ). (We know that l(t) was considered
and rejected, because (i) w is quasi-concave which implies that (ii) l(t) is feasible at state
ta + (1 − t)a′ .)
Figure 2.8: Proof of Theorem 2.6 (horizontal axis: the state, marked at a, ta + (1 − t)a′ and a′;
vertical axis: V(a)). The middle curve is the cost of the interpolation policy. The bottom curve
is the cost of the optimal policy.
It seems like the constrained version of the theorem (Theorem 2.6) contradicts the unconstrained
version (Theorem 2.2). The constrained version establishes that the value function is
concave when maximising a concave objective, yet the unconstrained version establishes that
the value function is convex when maximising a convex objective. But if an objective is linear,
then it is both concave and convex. How can both be right?
To make the two theorems more comparable, consider the null constraint that is always
satisfied, i.e. w(a, b) = 0. In this case, the theorems compare as follows:
• The simplified constrained theorem requires the objective to be concave in (a, b), whereas
the unconstrained theorem only requires convexity in a.
• Both make a shape assumption that mirrors the conclusion (concave implies concave,
convex implies convex).
The key to understanding the constrained theorem is that the concavity in (a, b) means
that when you look at V (a) and V (a′ ) and some point in the middle V (a′′ ), there is some
choice b(a′′ ) in the middle that makes a′′ better than just doing b(a) or b(a′ ). This is not the
case for the unconstrained theorem.
Question 2.15. ✓ Sketch a geometric proof of Theorem 2.6.
We now apply Theorem 2.6 to establish that the cost function is convex with respect to output
so that marginal cost is weakly increasing.
Theorem 2.7. If the production function f is concave, then the cost function is convex in
output, i.e. c(·; w) is convex for all w.
Proof. It is important to realise that the theorem does not claim that c(y; w) is convex in w.
The proof relies on holding w fixed, because w · x is not convex in (w, x).
Recall that the cost function is
$$c(y; w) = \min_{x \in \mathbb{R}_+^{N-1}} w \cdot x \quad \text{s.t. } f(x) \geq y.$$
The constraint is quasi-concave because (x, y) ↦ f(x) − y is concave. The objective is linear
in (x, y), and hence convex in (x, y). Thus the first part of Theorem 2.6 implies that c(·; w) is
convex for all w.
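A numerical sketch of Theorem 2.7, assuming the hypothetical concave technology $f(x_1, x_2) = \log(1 + x_1) + \log(1 + x_2)$ and input prices w = (1, 2). The cost function is computed by scanning input bundles along the isoquant f(x) = y (the constraint binds because both prices are positive), and the sketch checks that the cost of a midpoint target is no higher than the average cost of two extreme targets, as in the chef argument in the proof of Theorem 2.6.

```python
import math

W1, W2 = 1.0, 2.0  # hypothetical input prices

def cost(y, steps=10_000):
    """c(y; w) = min W1*x1 + W2*x2 s.t. log(1 + x1) + log(1 + x2) >= y, for the
    hypothetical concave technology f(x) = log(1 + x1) + log(1 + x2).
    The constraint binds, so we scan x1 along the isoquant f(x) = y."""
    top = math.exp(y) - 1.0                      # largest x1 that keeps x2 >= 0
    best = float("inf")
    for k in range(steps + 1):
        x1 = top * k / steps
        x2 = max(math.exp(y) / (1.0 + x1) - 1.0, 0.0)  # solves f(x1, x2) = y
        best = min(best, W1 * x1 + W2 * x2)
    return best

# Midpoint convexity: the cost of an intermediate output target is no higher
# than the average of the costs of two extreme targets.
checks = []
for y_lo, y_hi in [(0.2, 4.0), (1.0, 3.0), (0.5, 2.5)]:
    mid = cost((y_lo + y_hi) / 2)
    avg = (cost(y_lo) + cost(y_hi)) / 2
    checks.append(mid <= avg)
print(checks)  # [True, True, True]
```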
Example 2.5. A chocolate manufacturer uses cocoa and rents machines to produce chocolate
bars. When the chocolate is cut into bars, the off-cuts are collected, and can be used to make
more chocolate bars. However, this process is difficult to implement, and requires experimentation.
The factory uses cocoa and machines to experiment, which produces knowledge of how
to re-use off-cuts. The more knowledge the manufacturer has, the fewer off-cuts go to waste.
(i) Write down the firm’s problem, without using any Bellman equations.
(ii) Write down the firm’s problem using two Bellman equations relating three value functions:
the cost function, the (post-experimentation) profit function, and the value of
experimentation.
(iii) Show that as the price of cocoa increases, the manufacturer decreases the amount of
cocoa it uses.
Answer. Notation: $p_r$ price of raw cocoa, $p_m$ rental price of machines, $k$ knowledge, $(r^x, m^x)$
resources allocated to experimentation, $(r^y, m^y)$ resources allocated to output production,
chocolate bar output $y = f(k, r^y, m^y)$, $p_y$ price of chocolate bars, $k = g(r^x, m^x)$ knowledge
“discovered”.
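(i) The firm’s problem can be written, without any Bellman equations, as follows:
$$V(p_y; p_r, p_m) = \max_{r^x, m^x, r^y, m^y} \; p_y f\bigl(g(r^x, m^x), r^y, m^y\bigr) - p_r (r^x + r^y) - p_m (m^x + m^y). \tag{2.60}$$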
(ii) Let
$$C(y; k, p_r, p_m) = \min_{r^y, m^y} \; p_r r^y + p_m m^y \quad \text{s.t. } f(k, r^y, m^y) \geq y \tag{2.61}$$
be the cost function, $\pi(k, p_y, p_r, p_m)$ the post-experimentation profit function, and $V(p_y; p_r, p_m)$
the pre-experimentation profit function. The latter two can be defined with Bellman
equations:
$$\pi(k, p_y, p_r, p_m) = \max_{y} \; p_y y - C(y; k, p_r, p_m),$$
$$V(p_y; p_r, p_m) = \max_{r^x, m^x} \; \pi\bigl(g(r^x, m^x), p_y, p_r, p_m\bigr) - p_r r^x - p_m m^x. \tag{2.64}$$
(iii) Dynamic programming does not help with this part. Applying the envelope theorem to
(2.60) gives
$$\frac{\partial V(p_y; p_r, p_m)}{\partial p_r} = -r^x(p_y, p_r, p_m) - r^y(p_y, p_r, p_m), \tag{2.65}$$
where $r^x(\cdot)$ and $r^y(\cdot)$ on the right side denote the optimal raw cocoa demand policies.
$V(p_y; \cdot, p_m)$ is the upper envelope of convex functions: for each fixed choice
$(r^x, m^x, r^y, m^y)$, the objective in (2.60) is linear, and hence convex, in $p_r$.
So Theorem 2.2 implies $V(p_y; \cdot, p_m)$ is convex. Therefore, its derivative (the left side of
(2.65)) is (weakly) increasing in the price of raw cocoa. We conclude that the right side is also
increasing, and hence the raw cocoa demand $r^x + r^y$ is (weakly) decreasing in the price of cocoa.
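A brute-force sketch of part (iii), solving the problem without Bellman equations as in part (i), under assumptions of our own: hypothetical technologies $g(r^x, m^x) = \sqrt{r^x m^x}$ and $f(k, r^y, m^y) = (1 + k)^{0.2} (r^y)^{0.4} (m^y)^{0.3}$, and a coarse grid of input levels. Whichever maximizer the search selects, a revealed-preference argument implies that total cocoa use $r^x + r^y$ cannot rise when $p_r$ rises.

```python
import itertools

def g(rx, mx):
    # Hypothetical knowledge produced by experimentation.
    return (rx * mx) ** 0.5

def f(k, ry, my):
    # Hypothetical chocolate output: knowledge k reduces wasted off-cuts.
    return (1.0 + k) ** 0.2 * ry ** 0.4 * my ** 0.3

LEVELS = [0.5 * i for i in range(9)]  # 0, 0.5, ..., 4 (illustrative grid)

def cocoa_demand(py, pr, pm):
    """Total raw cocoa rx + ry at a profit-maximizing plan of (2.60), by brute force."""
    best = max(
        (py * f(g(rx, mx), ry, my) - pr * (rx + ry) - pm * (mx + my), rx + ry)
        for rx, mx, ry, my in itertools.product(LEVELS, repeat=4)
    )
    return best[1]

demands = [cocoa_demand(py=10.0, pr=pr, pm=1.0) for pr in (0.5, 1.0, 2.0, 4.0)]
print(demands)  # weakly decreasing in the price of cocoa
```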
Question 2.16. ✓ A studio has two artists. Each artist uses time and materials to produce
art. The old artist is twice as productive as the young artist (i.e. if the old artist has the same
amount of time and material as the young artist, the old artist produces twice as much art). Both
artists are paid the same wage per hour. Assume that the artists’ production functions are
concave.
(ii) Write down a Bellman equation for the firm that buries the material and labour choices
inside a value function.
(iv) Draw a graph involving isoquants and isocosts in which the firm allocates the less productive
artist more materials.
(v) If the studio could spend a pound to increase one of the artists’ output by 0.01 paintings,
which artist would it spend it on?
(ii) Write down the firm’s profit function with bribes, incorporating your answer from the
first part into a Bellman equation.
(v) The firm would like to give the politician an argument to rationalise cutting tax rates.
One suggestion was: perhaps cutting the tax rate would increase the firm’s employment
of Australian workers. Is this necessarily the case?
For more similar questions, see the following practice exam questions: 2.v, 2.vi, 8.iii.c, 8.iii.d,
22.iii, 23.iv, 24.a.iv, 26.iv, 26.v, 27.a.iii, 28.iii, 29.a.iv.
For example, think about making toast. If I want to make one slice, I just take a slice
of bread out of my fridge, and put it in the toaster. So perhaps I should say there are two
commodities, toast and bread (and write N = 2), and I have a technology y = (1, −1) that
transforms the input bread (−1) into an output toast (1). But what if I want to make 100 or
100,000 slices of toast? Then I will need to borrow more toasters and watch my power bills!
2.6. *PRODUCTION TECHNOLOGY SETS 39
So, perhaps I need four commodities (N = 4) – toast, bread, electricity, and toasters,
and I have a technology y = (1, −1, −1, −1) that transforms a slice of bread, a unit of
electricity, and a toaster, into some toast. If I want to make 100 slices of toast, I have a
technology for that too: y ′ = (100, −100, −100, −100). My feasible technology set would be
Y = {(n, −n, −n, −n) : n ∈ R+ }.
However, there is an important difference between bread and toasters. Both are factors of
production, but the production process destroys the bread but not the toaster. The technology
notation we developed only records the net outputs of a technology. Is there a way we can
model capital which is not destroyed by production?
One way is to reinterpret Y by saying that commodity y4 refers to the service of using a
toaster for one unit of time (rather than the toaster itself).
Another way is to think about two types of toasters: toasters before production and toasters
after production. If we use a toaster for production, then a by-product is a used toaster. This
corresponds to the production technology y′′ = (1, −1, −1, −1, 1). If we leave a toaster idle, then we
also get to keep it, which gives the technology y ′′′ = (0, 0, 0, −1, 1). Since we can use or leave idle
any number of toasters, the feasible technology set is Y ′ = {a(1, −1, −1, −1, 1) + b(0, 0, 0, −1, 1) : a, b ∈ R+ }.
A technology y ∈ Y is efficient if there is no other feasible technology y′ ∈ Y such that
y′ > y, meaning that y′ ≥ y in every commodity and y′ ≠ y (i.e. y′ produces more of some output
or uses less of some input, without doing worse in any other commodity).
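For a finite list of technologies, the efficiency definition can be checked directly. The sketch below reads y′ > y as "y′ ≥ y in every commodity and y′ ≠ y"; the technology vectors are hypothetical, with outputs positive and inputs negative as in the toast example. Real technology sets are usually infinite, so this is only an illustration of the definition.

```python
def dominates(y_new, y_old):
    """True if y_new >= y_old in every commodity and y_new != y_old
    (net output vectors: outputs positive, inputs negative)."""
    return all(a >= b for a, b in zip(y_new, y_old)) and tuple(y_new) != tuple(y_old)

def efficient(Y):
    """Technologies in a finite feasible set Y that no other member dominates."""
    return [y for y in Y if not any(dominates(z, y) for z in Y)]

# Hypothetical technologies over (output, input 1, input 2).
Y = [(2, -1, -1), (2, -1, -2), (3, -2, -1), (1, -1, -1)]
print(efficient(Y))  # [(2, -1, -1), (3, -2, -1)]
```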
Question 2.18. ✓ Write down a feasible technology set for cleaning up toxic waste with these
properties: (1) the waste must be transferred to a waste dump with a limited capacity, (2)
cleaning up pollution requires chemicals and the services of engineers in proportion to the
amount of waste, (3) engineers are unable to work in teams of more than 10 people.
Question 2.19. ✓ Write down a feasible technology set for putting on a comedy show with
these properties: (1) before doing the show, the comedian must use some time to prepare
an act, (2) the comedian can put on one show each day of one week, and (3) each day, the
comedian may hire a small or large theatre.
Question 2.20. ✓ It seems wasteful to use two toasters to make one slice of toast. So, if our
model is good, using a redundant toaster should be inefficient. Is this the case in the two
formulations of the toasting technology?