Economics of Production (MSC Level)

This document introduces general equilibrium theory and provides context for understanding production theory within a multi-market economy framework. It discusses how general equilibrium theory studies interactions between multiple markets simultaneously, unlike undergraduate microeconomics which focuses on single markets. The document also outlines how concepts from undergraduate microeconomics like supply/demand curves, marginal costs, and competitive equilibrium generalize to multiple interconnected markets in general equilibrium theory. It emphasizes using dynamic programming techniques to decompose complex production decisions.

Uploaded by

Zubiya Moin

Chapter 1

Introduction

These notes introduce general equilibrium theory, along with some requisite mathematical
tools. The term “general equilibrium” is somewhat difficult to define. Roughly speaking, a
model is described as a general equilibrium model if it aims to study an entire economy,
without any loose ends such as taxes being thrown in the ocean, food aid being helicoptered
in, or cars being produced from “money” rather than labour, machines, and natural resources.
General equilibrium theory is typically described as a microeconomics topic, but this is mis-
leading. Almost all microeconomic models (including almost all game theory models) do not
aspire to be general equilibrium models. For instance, almost all models of auctions involve
the players directly consuming the money that they are left with at the end of the game, rather
than trading that money for goods and deriving utility from goods. On the other hand, most
applied macroeconomic models are general equilibrium models. Therefore, one of the most
important roles of “microeconomic general equilibrium theory” is to provide a foundation for
modern macroeconomics.
These notes only study (special cases of) the Walras (1874) model as formulated by Arrow
and Debreu (1954), which is a general equilibrium model of perfect competition. There are of
course many general equilibrium models with monopolistic competition, adverse selection, and
other frictions. We focus on perfect competition for simplicity, and largely follow the analysis
of Debreu (1959).
The Arrow-Debreu model is much like first-year undergraduate microeconomics, which
studies a single-market economy. The concepts of supply, demand, marginal utility, marginal
cost, equilibrium, and efficiency are the primary focus, just like undergraduate microeconomics.
However, general equilibrium theory studies many markets simultaneously, whereas undergrad-
uate microeconomics is very limited in its understanding of how different markets interact with
each other.
For example, consider the interplay between migration and international agricultural mar-
kets. Suppose one country has more arable land. Does that mean workers will migrate to
the arable country, to take advantage of higher wages resulting from high productivity? Or
perhaps only a few workers are needed in the arable country to maintain high output, so
workers will actually migrate to less arable countries? The tools of undergraduate economics
are not very helpful here, because the problem makes little sense unless there are at least three
workers, two firms (one for each country) each with their own production function, and four
markets (land in each country, labour, and food). A supply and demand curve, or even a 2x2
Edgeworth box, just won’t do.

[Figure 1.1: Competitive equilibrium in a single-market economy. Price P is plotted against
quantity Q; the demand (MB) and supply (MC) curves cross at (Q∗, P∗).]

While there are many important applications of general equilibrium theory, the most im-
portant reason to learn it is to understand macroeconomics carefully. The most important
trade-off in macroeconomics is between investment and consumption. This is a never-ending
problem; if the world were to end tomorrow, then we would have a big party today and destroy
our capital. Therefore, macroeconomics requires an infinite set of markets.
Despite all of these complications, our goal is to simplify everything, so that we can use
as much intuition from undergraduate microeconomics as possible. Indeed, we recommend
that you find your favourite undergraduate microeconomics textbook, and compare graduate
and undergraduate ideas as you progress through your study. Roughly speaking, the following
undergraduate ideas generalise as follows:

• MC = supply. In undergraduate economics, the supply curve is the same as the marginal
cost curve (and the demand curve is the same as the marginal benefit curve). The
envelope theorem generalises the idea that marginal values are connected to optimal
policies.

• P = MC. Dynamic programming allows us to think in terms of production cost curves,
even when the costs arise from complex trade-offs about which input factors to purchase.
This in turn allows us to derive the classic first-order condition that price equals marginal
cost.

• MC is upward sloping. In undergraduate economics, we assume that the marginal cost
curve is increasing (and the marginal benefit curve is decreasing). The tools of convex
analysis can be applied to establish that if isoquants of the production technology are
convex (i.e. have an increasing slope), then the marginal cost curve is increasing.

• Supply = demand. An equilibrium in a single-market economy is when the quantity
supplied equals the quantity demanded. In a multi-market economy, the situation is
much more complicated, because every market’s price affects every other market’s quantity
supplied and demanded. Nevertheless, an equilibrium occurs when supply equals
demand in every market.
• Equilibria are efficient. Smith (1759) first pointed out that competitive markets direct
people to make socially desirable choices, which he called the invisible hand. In
introductory economics, the social surplus is maximised at the competitive equilibrium’s
quantity. The notion of “social surplus” does not generalise, and has to be replaced with
a weaker notion, namely Pareto efficiency. In multi-market contexts, the invisible hand
leads decision makers to Pareto efficient allocations.
These notes are quite different from the classical graduate microeconomics textbooks, in-
cluding Jehle and Reny (2011), Kreps (1990), Mas-Colell, Whinston and Green (1995), and
Varian (1992). The primary difference is that we use the language of dynamic programming
rather than mathematical duality to develop producer and consumer theory. First, we believe
“duality” gives unhelpful intuition. Duality in mathematics is the idea that two seemingly
unrelated problems in fact have some non-obvious relationship that can be used to deduce
new conclusions. However, in producer theory, the relationship is not subtle: if a producer is
not minimising his production cost, then he can increase his profit by reducing his production
cost. The reason why we study both the profit maximisation and cost minimisation problems
is entirely different. Our motivation is to decompose a complicated decision (how to allocate
resources within a firm) into two smaller and simpler decisions (how much to produce, and
with what to produce?). When we do this, we learn more about the relationships between the
decisions, e.g. if a firm plans to produce more, then how will it adjust its production factors?
This decomposition idea is the spirit of dynamic programming, and we believe it should be
taught that way.
Second, dynamic programming plays a major role in economics, especially in macroeco-
nomics. The tools from producer and consumer theory, such as the envelope theorem, play a
major role in macroeconomics. However, the traditional exposition of the envelope theorem
in microeconomics appears very different from that of macroeconomics, and the connection is
usually lost on students. By using a common language between micro- and macroeconomics,
we hope that students will learn both better.
Another important difference is that we focus on the most important mathematical tools
only, and try to keep the proofs as simple as possible. For example, the traditional texts
prove the second welfare theorem using the separating hyperplane theorem. However, a much
simpler proof discovered by Maskin and Roberts (1980) is possible. It is based on economics
ideas rather than geometric ones, and provides a better model of how economic theorists think.
(This proof is included as an aside in Varian (1992).)
Finally, the focus of the problems and exercises is to prepare students to use the tools in
the way that they are typically used in economics. We feel that traditional textbooks typically
neglect applications in favour of determining the technical limits of the tools. While both are
valuable, we think it is more important to know what the tools are useful for, rather than what
they are useless for.
Chapter 2

Production

This chapter studies the theory of the competitive firm, which means we will assume that the
firm is unable to manipulate prices. The theory focuses on how the firm reacts to prices when
choosing input and output quantities. This choice can be quite complicated, as the firm may
have many possible output levels, and many possible ways to deliver each output level. For
example, a car manufacturer may have to decide on hiring many types of specialised labour
and purchasing many specialised components. Is it possible to construct a simple marginal cost
curve and solve the firm’s output choice by setting marginal cost equal to price?
The answer is yes, but some mathematical tools are involved, all of which are widely used
by economists. First, dynamic programming is used to simplify a complicated decision
problem by breaking it into smaller problems. For example, we break the firm’s production
decision into an output choice followed by an input choice. This allows us to construct a
marginal cost curve without getting overwhelmed with the details of the input choices. Second,
the envelope theorem generalises the idea that optimal choices (such as supply curves) are
closely related to marginal valuations (such as marginal cost curves). Third, convex analysis
is a branch of geometry that captures the ideas of decreasing returns to scale and diminishing
marginal productivity, and allows us to understand when marginal cost curves are increasing.

In Section 2.1 we introduce production functions which describe how the firm may transform
inputs into outputs. Section 2.2 then puts production into a competitive market context in
which firms make input and output decisions to maximise profits. Section 2.3 introduces the
envelope theorem, which explores the relationship between marginal valuations and optimal
choices. With some help of convex analysis techniques, we establish that output price increases
lead to more output and that factor price increases lead to a decrease in demand for that factor.
Section 2.4 introduces dynamic programming, which allows us to focus on output decisions
without being distracted by input decisions. This leads us to a version of the classical “price
equals marginal cost” formula. Section 2.5 extends the tools from Section 2.3 to accommodate
constraints; this is necessary for studying the nature of marginal costs. This section establishes
that marginal costs are increasing in output. Finally, Section 2.6 concludes with a discussion of
more complicated production technologies, such as factories that produce several goods. This
last section is for completeness only and can be skipped.

2.1 Production Functions


For the moment, we will take the view that a firm specialises in making a single product out
of several factor products, with the goal of maximizing its profits. Everything in this section
is typically ignored in a standard introductory economics lecture – it is all buried inside the
firm’s cost function.
We assume that there are a total of N goods in this economy. The firm produces one good
in this set using the other N − 1 goods as factor input goods. A production function describes
the technology that transforms N − 1 factor input goods $x \in \mathbb{R}^{N-1}_+$ into a single
output good $y = f(x) \in \mathbb{R}_+$. Some basic assumptions include:

• Possibility of inaction: Producing no output is feasible, i.e. f (0) = 0.

• Free disposal (Monotonicity): The firm has no obligation to use all input goods provided.
Having too many input goods never hurts, as the firm can always throw them away
without any cost. This idea leads to the assumption of monotonicity where f is weakly
increasing. Specifically if x ≥ x′ (i.e. xn ≥ x′n for all n) then f (x) ≥ f (x′ ). A stronger
assumption, strict monotonicity is that if x > x′ (i.e. xn ≥ x′n for all n and xn > x′n
for at least one n) then f (x) > f (x′ ).

• Smoothness: f is twice continuously differentiable. Each partial derivative
$\partial f(x)/\partial x_i$ is the marginal productivity of $x_i$.

In introductory economics, it is typical to assume that the marginal cost of production
is increasing. For us, marginal cost is something that we will derive endogenously, rather
than something we will assume directly. But, we consider various other possible assumptions
instead:

• Decreasing marginal productivity: the production function has weakly decreasing marginal
productivity in good 1 if, holding all other input factors $x_{-1}$ fixed,
$\partial f(x)/\partial x_1$ weakly decreases as $x_1$ increases.¹ For example, consider a
restaurant that produces food using cooks and kitchen space. Adding a cook without adding any
kitchen space is likely to create congestion that leads to decreasing marginal productivity of
cooks. Similarly, adding kitchen space without adding cooks will relieve a diminishing amount
of congestion. Figure 2.1 depicts decreasing marginal productivity of cooks and kitchens in
producing food. Weakly decreasing marginal productivity of the first input is equivalent to the
production function being concave in that factor, and also to
$\partial^2 f(x)/\partial x_1^2 \le 0$.

¹ The word “decreasing” is used differently in the context of marginal productivity and marginal
utility compared to everywhere else. Normally, “decreasing” means the function gets smaller as
its parameter vector increases in any (combination) of its dimensions. However, decreasing
marginal productivity does not mean that the marginal productivity of labour decreases when the
amount of capital increases.

Figure 2.1: Diminishing marginal productivity of cooks and kitchens when
$f(c, k) = c^{0.7} k^{0.3}$. Panel (a), Cooks, plots f(c, k) against c for k = 3, 6, 9;
panel (b), Kitchens, plots f(c, k) against k for c = 3, 6, 9.

• Weakly increasing returns to scale: for all $x \in \mathbb{R}^{N-1}_+$ and all t > 1,
f(tx) ≥ tf(x). For example, communications networks have this character: when adding the nth
phone line to a telephone network, there are n − 1 new pairs of people who are now connected to
each other. So, the number of connections supplied y is a function
$f(n) = \tfrac{1}{2} n(n-1)$ of the number of people n, and $f(tn) \approx t^2 f(n)$ for
large n. Note that this assumption leads to decreasing (not increasing) marginal cost.

• Constant returns to scale: for all $x \in \mathbb{R}^{N-1}_+$ and all t > 0, f(tx) = tf(x).
For example, this occurs if the output from doubling the size of a factory f(2x) is equal to the
output from two identical factories f(x) + f(x) = 2f(x). This is a common assumption to make.

• Weakly decreasing returns to scale: for all $x \in \mathbb{R}^{N-1}_+$ and all t > 1,
f(tx) ≤ tf(x). Decreasing returns to scale can occur if we have misspecified the model and left
out some resource. For example, building an identical factory requires finding an (identical)
manager to run it. If we forget to include some input factors (such as the manager) in the
model, then “cloning” a firm by cloning only the inputs that were explicitly modelled
would give a less productive clone, at least under the assumption of decreasing marginal
productivity. One way to make up for such omissions is to assume decreasing returns to
scale.
This assumption is philosophically unappealing for a theory of the “whole economy at
the same time.” Why do we need to leave anything out? Sadly, economics is hard, and
we frequently need to take shortcuts. This is a common one.

• Concavity: f is a concave function, which means that the output from a mixture of two
bundles of inputs, f(ax + (1 − a)x′), is at least as large as the corresponding mixture of
outputs, af(x) + (1 − a)f(x′), for every a ∈ [0, 1]. For example, if x represents a hospital
with many doctors, and x′ represents a hospital with many nurses, and a = 1/2, then concavity
implies

$$f(x) + f(x') \le f(\tfrac{1}{2} x + \tfrac{1}{2} x') + f(\tfrac{1}{2} x + \tfrac{1}{2} x'),$$

which means that it is at least as good to reallocate the doctors and nurses so that both
hospitals are identical.
Concavity is like assuming both weakly decreasing returns to scale and decreasing marginal
productivity for each input factor, i.e. for each input factor, as we increase that factor
(without changing any of the other factors), the marginal output decreases. Caution:
this assumption is frequently referred to as “a convexity assumption,” even though −f ,
not f , is convex.
If f is smooth and concave, then it has weakly decreasing marginal productivity. To see
this, hold $(x_2, \dots, x_{N-1})$ fixed and let $g(x_1) = f(x_1, x_2, \dots, x_{N-1})$, so that
$g'(x_1)$ is the marginal productivity of $x_1$. Since f is concave, g is also concave, since for
all s ∈ [0, 1] and all $x_1, x_1' \in \mathbb{R}_+$,

$$\begin{aligned}
g(s x_1 + (1-s) x_1') &= f(s x_1 + (1-s) x_1',\, x_2, \dots, x_{N-1}) \\
&= f(s x_1 + (1-s) x_1',\, s x_2 + (1-s) x_2, \dots, s x_{N-1} + (1-s) x_{N-1}) \\
&= f(s (x_1, x_2, \dots, x_{N-1}) + (1-s)(x_1', x_2, \dots, x_{N-1})) \\
&\ge s f(x_1, x_2, \dots, x_{N-1}) + (1-s) f(x_1', x_2, \dots, x_{N-1}) \\
&= s g(x_1) + (1-s) g(x_1').
\end{aligned}$$

Note that the first two steps above are devoted to reformulating the convex combination
in a form that matches Theorem D.6’s characterisation of concavity. Similarly, since f is
smooth, g is also smooth (by the chain rule). Since g is smooth and concave, Theorem D.3
implies g′ is weakly decreasing, so we have established f has weakly decreasing marginal
productivity in the first input. The same logic applies to the other inputs.
If f is concave and has the possibility of inaction, then it has weakly decreasing returns to
scale. To check this, we must show f(tx) ≤ tf(x) for t > 1. Let s = 1/t, which means that
s ∈ (0, 1). By Theorem D.6, sf(tx) + (1 − s)f(0) ≤ f(stx + (1 − s)0). Since f(0) = 0 and
st = 1, we can then deduce:

$$s f(tx) \le f(stx) \tag{2.1}$$
$$\tfrac{1}{t} f(tx) \le f(x) \tag{2.2}$$
$$f(tx) \le t f(x). \tag{2.3}$$
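
The assumptions above can be checked numerically for a specific technology. The following is a minimal sketch, assuming the Cobb-Douglas function f(c, k) = c^0.7 k^0.3 from Figure 2.1; the helper names and the finite-difference step are illustrative choices, not part of these notes.

```python
import math


def f(c, k):
    """Cobb-Douglas production function from Figure 2.1: f(c, k) = c^0.7 * k^0.3."""
    return c ** 0.7 * k ** 0.3


def marginal_product_of_cooks(c, k, h=1e-6):
    """Central finite-difference approximation of the partial derivative of f in c."""
    return (f(c + h, k) - f(c - h, k)) / (2 * h)


def has_constant_returns(c, k, t):
    """Check f(t*x) == t*f(x) up to floating-point error."""
    return math.isclose(f(t * c, t * k), t * f(c, k), rel_tol=1e-9)


def midpoint_concavity_gap(x, x_prime):
    """f(midpoint) minus the average of f(x) and f(x'); non-negative if f is concave."""
    (c1, k1), (c2, k2) = x, x_prime
    return f((c1 + c2) / 2, (k1 + k2) / 2) - (f(c1, k1) + f(c2, k2)) / 2


if __name__ == "__main__":
    # Scaling all inputs by t = 3 should scale output by 3.
    print(has_constant_returns(4.0, 2.0, 3.0))
    # Marginal productivity of cooks should fall as cooks increase with k fixed.
    print([marginal_product_of_cooks(c, 6.0) for c in (3.0, 6.0, 9.0)])
    # Mixing a cook-heavy and a kitchen-heavy bundle should not reduce output.
    print(midpoint_concavity_gap((9.0, 1.0), (1.0, 9.0)))
```

A check at a handful of points is only a sanity test of an assumption; it does not replace the proofs above.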

Question 2.1. ✓ Show mathematically or graphically that if f is smooth and has constant-
returns to scale, then marginal productivities do not depend on scale.

Question 2.2. ✓ Can a production process have both diminishing marginal productivity and
increasing returns to scale? (Hint: you just need to find one example.)
There are typically many different combinations of inputs that give the same level of output
y. This set is called the isoquant,

$$I(y) = f^{-1}(y) = \{ x \in \mathbb{R}^{N-1}_+ : f(x) = y \}.$$

The set above the isoquant – the set of inputs that give greater or equal output than y – is
called the upper contour set,

$$V(y) = f^{-1}([y, \infty)) = \{ x \in \mathbb{R}^{N-1}_+ : f(x) \ge y \}.$$

See Figure 2.2 for an example with three isoquants and three upper contour sets. Note that
the upper contour sets may be overlapping whereas the isoquants never cross.
Question 2.3. ✓ Explain why two different isoquants never cross.

Figure 2.2: Isoquants and Upper Contour Sets. Isoquants for y = 5, y = 10 and y = 15 are drawn
in (x1, x2) space.

This allows us to present another possible assumption:

• Quasi-concavity: f is a quasi-concave function, i.e. the upper contour set V(y) for each
output level y is convex. This has the following economic interpretation. Consider two
input bundles $x, x' \in \mathbb{R}^{N-1}_+$ on the same isoquant, i.e. f(x) = f(x′).
Quasi-concavity means that mixtures ax + (1 − a)x′ with a ∈ [0, 1] give at least as much
output, i.e. f(ax + (1 − a)x′) ≥ f(x).

Question 2.4. ✓ Show mathematically or graphically that quasi-concave smooth production
functions do not necessarily have decreasing marginal productivity. (Hint: you just have to
find one counter-example.)

Since a production function may allow many ways to produce the same output, it raises the
question: how can the firm substitute its inputs at the margin to produce the same quantity?
For example, suppose a supermarket initially plans to use x1 computers and x2 sales people.
If it buys x1 + ∆ computers instead, how many sales people can it replace, maintaining the
same output? This is called the marginal rate of technical substitution from x2 to x1 .
Geometrically it is the slope of the isoquant. If we think of the isoquant as a function, this
amounts to calculating the derivative of the function. The function g of an isoquant is defined
implicitly by the equation

$$f(x_1, g(x_1)) = f(x_1^*, x_2^*) \quad \text{for all } x_1. \tag{2.4}$$

By the implicit function theorem (Theorem F.3), its derivative at $x_1 = x_1^*$ is

$$g'(x_1^*) = -\frac{\partial f(x_1^*, x_2^*) / \partial x_1}{\partial f(x_1^*, x_2^*) / \partial x_2}. \tag{2.5}$$
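
As a rough numerical illustration of (2.5), the sketch below compares the finite-difference slope of an isoquant with minus the ratio of marginal productivities. It assumes the Cobb-Douglas form from Figure 2.1 purely for concreteness; the function names are hypothetical.

```python
import math


def f(x1, x2):
    """Assumed production function (the Cobb-Douglas form used in Figure 2.1)."""
    return x1 ** 0.7 * x2 ** 0.3


def isoquant(x1, y):
    """Solve f(x1, x2) = y for x2, i.e. the isoquant written as a function g(x1)."""
    return (y / x1 ** 0.7) ** (1 / 0.3)


def isoquant_slope(x1, y, h=1e-6):
    """Central finite-difference approximation of g'(x1) along the isoquant."""
    return (isoquant(x1 + h, y) - isoquant(x1 - h, y)) / (2 * h)


def mrts_formula(x1, x2):
    """Right-hand side of (2.5): minus the ratio of marginal productivities."""
    f1 = 0.7 * x1 ** (-0.3) * x2 ** 0.3
    f2 = 0.3 * x1 ** 0.7 * x2 ** (-0.7)
    return -f1 / f2


if __name__ == "__main__":
    x1_star, x2_star = 4.0, 2.0
    y = f(x1_star, x2_star)
    print(isoquant_slope(x1_star, y), mrts_formula(x1_star, x2_star))
```

For this functional form the right-hand side of (2.5) simplifies to −(0.7/0.3)(x2/x1), so the two printed numbers should agree up to finite-difference error.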

Question 2.5. ✓ Show mathematically or graphically that if f is smooth and has constant-
returns to scale, then marginal rates of technical substitution do not depend on scale.

2.2 Profit Maximization


In this section, we model the firm as responding to prices by choosing a production plan to
maximize its profits. In particular, the firm cannot choose prices in this simple model. Nor
can the firm influence prices, e.g. the firm “believes” that restricting supply would not affect
prices.
The abstract theory in this section focuses on a single output good for simplicity. However,
it is straightforward to apply the theory to other situations such as multiple firms in a supply
chain, multi-product firms, a firm that supplies the same product in different places or at
different times, and so on. The examples and exercises explore these possibilities.
Let $p \in \mathbb{R}_+$ and $w \in \mathbb{R}^{N-1}_+$ be the prices of the output and input
goods, respectively. The notation w is convenient because input prices might include wages. The
firm’s profit function is

$$\pi(p; w) = \max_{x \in \mathbb{R}^{N-1}_+} \; p f(x) - w \cdot x = p f(x(p; w)) - w \cdot x(p; w) \tag{2.6}$$

where x(p; w) is called the factor demand function.


In the previous sentence, the word the before factor demand function is problematic. We
usually only say “the” if there is exactly one thing being referred to, such as “the biggest house
in the world”. We do not write “the most direct road from Edinburgh to New York” (there
is none). Similarly, we do not write “the biggest sheet of A4 paper” (they are supposedly all
the same size, so there are many such sheets of paper). For these reasons, we often write “a”
instead of “the”, unless we know for sure that there is exactly one item under discussion. This
issue is discussed in more detail elsewhere in these notes: usage of “the” is discussed in detail
in Section B.2. Section E.2 discusses whether or not there is an optimal demand function.
Section E.3 establishes that there is at most one optimal demand function if the production
function is strictly concave.
If x∗ is an optimal (profit maximizing) choice given prices (p, w), then x∗ satisfies the
first-order conditions

$$p \frac{\partial f(x^*)}{\partial x_i} = w_i \quad \text{for all } i \in \{1, \dots, N-1\}. \tag{2.7}$$

In particular, this implies that the marginal rate of technical substitution from any good i to
any other good j is equal to the marginal rate of substitution in terms of purchase prices,

$$-\frac{w_i}{w_j} = -\frac{\partial f(x^*) / \partial x_i}{\partial f(x^*) / \partial x_j} \quad \text{for all } i, j \in \{1, \dots, N-1\}.$$

For example, suppose i = 1 is capital and j = 2 is labor. If the marginal rate of technical
substitution from capital to labor is small, this means the firm needs few workers to replace
capital and maintain the same level of output. The equation says that the firm should replace
capital with workers until the cost of replacing each unit of capital with a worker is no longer
smaller than (i.e. becomes equal to) the productivity gain of replacing capital with workers.
Geometrically, this means that the firm chooses a production plan where the isoquant is
tangential to the isocost line (or isocost hyperplane).
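
The first-order conditions (2.7) and the tangency condition can be illustrated with a small numerical sketch. It assumes a hypothetical two-input Cobb-Douglas technology f(x1, x2) = x1^0.3 x2^0.5 (decreasing returns, so an interior optimum exists) and illustrative prices; none of these numbers come from the notes.

```python
import math

# Hypothetical output elasticities; A + B < 1 gives decreasing returns to scale.
A, B = 0.3, 0.5


def f(x1, x2):
    """Assumed production function for this illustration."""
    return x1 ** A * x2 ** B


def marginal_products(x1, x2):
    """Partial derivatives of f with respect to x1 and x2."""
    return A * x1 ** (A - 1) * x2 ** B, B * x1 ** A * x2 ** (B - 1)


def profit(p, w1, w2, x1, x2):
    """Revenue minus factor costs."""
    return p * f(x1, x2) - w1 * x1 - w2 * x2


def factor_demand(p, w1, w2):
    """Solve the two first-order conditions (2.7).

    Taking logs turns p*f_1 = w1 and p*f_2 = w2 into a linear system
    in (ln x1, ln x2), which is solved here by Cramer's rule.
    """
    r1 = math.log(w1 / (p * A))
    r2 = math.log(w2 / (p * B))
    det = 1 - A - B  # equals (A - 1)(B - 1) - A*B
    log_x1 = ((B - 1) * r1 - B * r2) / det
    log_x2 = ((A - 1) * r2 - A * r1) / det
    return math.exp(log_x1), math.exp(log_x2)


if __name__ == "__main__":
    p, w1, w2 = 10.0, 2.0, 3.0
    x1, x2 = factor_demand(p, w1, w2)
    f1, f2 = marginal_products(x1, x2)
    print(p * f1, p * f2)    # should match w1 and w2
    print(f1 / f2, w1 / w2)  # tangency of isoquant and isocost line
```

Because this f is concave, the point satisfying the first-order conditions is a global profit maximum, which is why nearby input bundles cannot yield higher profit.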
Example 2.1. Suppose that music recordings are produced from the labour of musicians and
technicians. Write down the music company’s profit maximisation problem.
Answer. Let r be the royalties of a song, lm the musician labour input, lt the technician labour
input, wm the musician wage, wt the technician wage, and f (lm , lt ) be the number of songs
produced. The music company’s profit maximisation problem is

$$\max_{l_m, l_t} \; r f(l_m, l_t) - w_m l_m - w_t l_t.$$

Example 2.2. Glycerine is a by-product of bio-diesel production, both of which are produced
from waste organic material. Write down a bio-energy company’s profit maximisation problem.
Answer. Let w be the waste material input, g(w) the glycerine output, d(w) the bio-diesel
output, pw the price of waste material, pg the price of glycerine, and pd the price of bio-diesel.
The bio-energy company’s profit maximisation problem is

$$\max_{w} \; p_g g(w) + p_d d(w) - p_w w.$$

Example 2.3. PET (polyethylene terephthalate) plastic is made from ethylene, which is made from
crude oil. Write down the profit maximisation problem of a vertically integrated firm that buys
crude oil and sells plastic. Write down the first-order condition determining the optimal
production choice.
Answer. Let x be the crude oil input, e = f (x) be the ethylene produced from the x units of
oil, y = g(e) be the plastic output from the e units of ethylene input, px the price of crude oil,
and py be the price of plastic. The integrated firm’s profit maximisation problem is

$$\pi(p_y, p_x) = \max_{x} \; p_y g(f(x)) - p_x x.$$

The first-order condition for the optimal x choice is

$$g'(f(x)) f'(x) = \frac{p_x}{p_y}.$$
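
To see the chain-rule first-order condition of Example 2.3 at work, here is a small sketch under made-up technologies: it assumes f(x) = √x for ethylene and g(e) = √e for plastic, so that plastic output is x^(1/4). These functional forms and the prices are illustrative assumptions only.

```python
import math


def f(x):
    """Hypothetical ethylene output from x units of crude oil."""
    return math.sqrt(x)


def g(e):
    """Hypothetical plastic output from e units of ethylene."""
    return math.sqrt(e)


def profit(x, px, py):
    """Integrated firm's profit: plastic revenue minus crude oil cost."""
    return py * g(f(x)) - px * x


def marginal_plastic(x):
    """Chain rule: g'(f(x)) * f'(x) for the assumed square-root technologies."""
    return (1 / (2 * math.sqrt(f(x)))) * (1 / (2 * math.sqrt(x)))


def optimal_oil(px, py):
    """Solve (1/4) * x^(-3/4) = px / py for x."""
    return (py / (4 * px)) ** (4 / 3)


if __name__ == "__main__":
    px, py = 2.0, 5.0
    x_star = optimal_oil(px, py)
    print(marginal_plastic(x_star), px / py)
```

The printed pair should coincide, which is the first-order condition written above evaluated at the optimal crude oil input.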
Question 2.6. ✓ Generalise the PET example to accommodate trade in the ethylene market.
Question 2.7. ✓ A fashion company produces dresses and suits using 100% wool and dye.
Write down the fashion company’s profit-maximisation problem.
Question 2.8. ✓ A convenience store buys chocolate bars and milk from a wholesaler, and also
employs cashiers to sell the products. While the cashiers always sell both products, they can
focus more on selling chocolates or milk, e.g. by talking about the products with customers.
Write down the convenience store’s profit maximisation problem. Write down the first-order
conditions. What role do these first-order conditions play when the retail prices are lower than
the wholesale prices?
For similar questions, part (i) of every practice exam question (except question 30) involves
formulating a firm’s profit maximisation problem. See also: 7.iii, 15.iii.

2.3 Upper Envelopes and Value Functions


In introductory economics, one important lesson is that the supply curve is equal to the
marginal cost curve. That is, optimal policies are related to marginal valuations. However,
in introductory economics, there was only one market and one price, so supply curves were
simple functions that could be plotted in two dimensions. Now, we want to consider many
markets at once with many prices. This section aims to generalise the relationship between
optimal policies and marginal valuations to a multi-market context.
We begin by studying marginal valuations – how price changes affect the firm’s profit. If
factor prices increase, then the firm’s profit weakly decreases, and if the output price increases,
then profit strictly increases. To say more, we need to study the profit function π, as defined in
(2.6). However, it is unclear how to differentiate π, as max is not a standard calculus operation.
This section studies first and second derivatives when there is a max operator.
Economists frequently use two names for functions with maxima of the form

$$V(a) = \max_{b} v(a, b). \tag{2.8}$$

The first name is upper envelope, which refers to the geometric interpretation of (2.8),
involving following the outer edge that surrounds (“envelopes”) some curves, as depicted in
Figure 2.3 and Figure 2.4. Specifically, for each b, there is a curve w(a) = v(a, b). The function
V is the outer edge of all of these curves. In Figure 2.4, there is an infinite set of lines (only
some of which are depicted), and the upper envelope is a parabola.

Figure 2.3: Value functions are upper envelopes. V(a) is plotted against assets a, with one
curve for b = study and one for b = work.

Figure 2.4: The upper envelope of an infinite set of lines. V(a) is plotted against a.
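
The idea that an infinite set of lines can have a parabola as its upper envelope can be sketched numerically. The family used here, v(a, b) = 2ab − b², is a hypothetical choice (not necessarily the one drawn in Figure 2.4): each b gives a line in a, and maximising over b gives V(a) = a². The grid of choices is also an assumption made to keep the example finite.

```python
import math


def v(a, b):
    """One line per choice b: slope 2b and intercept -b^2 (an assumed family)."""
    return 2 * a * b - b ** 2


# Finite grid of choices standing in for the infinite set of lines.
B_GRID = [i / 100 for i in range(-300, 301)]


def upper_envelope(a, choices=B_GRID):
    """Value function: the highest line at state a."""
    return max(v(a, b) for b in choices)


def policy(a, choices=B_GRID):
    """Policy function: the choice b whose line is highest at state a."""
    return max(choices, key=lambda b: v(a, b))


if __name__ == "__main__":
    for a in (-2.0, 0.5, 1.5):
        print(a, upper_envelope(a), a ** 2, policy(a))
```

Every line lies weakly below the parabola because a² − v(a, b) = (a − b)² ≥ 0, and the line with b = a touches it at exactly one point.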

The second name is value function, which refers to an economic idea: the value of facing
a situation or state (a) before making a choice (b). Figure 2.3 depicts the value of holding
assets before making a choice between studying or working. The profit function π(p; w) is also
an example of a value function; it is the firm’s value of facing prices (p; w) before it chooses
its input quantities. The term policy or policy function b(a) refers to the optimal choice of
b for each state a, i.e. b(a) ∈ argmaxb̂ v(a, b̂). The input demand function is an example of a
policy. We summarise the terminology:

$$\underbrace{V(\underbrace{a}_{\text{state variable}})}_{\text{value function}} = \max_{\underbrace{b}_{\text{choice variable}}} \underbrace{v(a, b)}_{\text{objective function}} = v(a, \underbrace{b(a)}_{\text{policy}}).$$

The envelope theorem provides a formula for differentiating value functions. (Actually,
this is the simplest of a large collection of envelope theorems used by economists.)
Theorem 2.1 (Envelope Theorem). Let $v : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ be a
differentiable function, $V(a) = \max_{b \in \mathbb{R}^m} v(a, b)$ be its value function (upper
envelope), and b(a) be its policy function. If V is a differentiable function, then

$$V'(a) = \left. \frac{\partial v(a, b)}{\partial a} \right|_{b = b(a)}, \tag{2.9}$$

or in alternative notation,

$$V'(a) = v_a(a, b(a)). \tag{2.10}$$

Figure 2.5: The “lazy” envelope theorem proof. The value function V and the lazy decision
maker’s value L are plotted against a.

Lazy decision maker proof of Theorem 2.1. Fix a particular state ā. The value function of a
“lazy” decision maker who chooses b(ā), regardless of a, is

$$L(a) = v(a, b(\bar{a})), \tag{2.11}$$

and is depicted in Figure 2.5. The lazy decision maker’s value is weakly less than the rational
value, i.e. L(a) ≤ V(a) for all a. Their values are equal at ā. Therefore ā minimises V(a) − L(a),
so the first-order condition gives

$$V'(\bar{a}) = L'(\bar{a}) = v_a(\bar{a}, b(\bar{a})). \tag{2.12}$$

Chain rule proof of Theorem 2.1. Let b(a) denote the policy function. We will only prove this
theorem for the case in which the state variable a and choice variable b are one-dimensional,
and the optimal policy b(a) is a differentiable function (although the theorem is true without
these extra assumptions). With this notation, V may be rewritten as V (a) = v(a, b(a)). By
Theorem F.2, the derivative is
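$$V'(a) = \left. \frac{\partial v(a, b)}{\partial a} \right|_{b = b(a)} + \left. \frac{\partial v(a, b)}{\partial b} \right|_{b = b(a)} b'(a). \tag{2.13}$$

However, since b(a) maximizes v(a, ·), we have the first-order condition

$$\left. \frac{\partial v(a, b)}{\partial b} \right|_{b = b(a)} = 0.$$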

The last term in (2.13) vanishes, so we are left with (2.9).


For example, consider a manager deciding how many workers l to hire in response to the wage
w. Suppose the manager’s profit function is

\[
\pi(w) = \max_{l} \; 10\sqrt{l} - wl. \tag{2.14}
\]

In the jargon we defined above, w is the state variable, l is the choice variable, and π is the
value function. We would like to calculate π ′ (w).
First we will calculate π ′ (w) without using the envelope theorem, which we will call the
“obvious method.” The first-order condition for the manager’s choice is
\[
\frac{5}{\sqrt{l}} - w = 0 \tag{2.15}
\]
which means that the policy function is
\[
l(w) = \frac{25}{w^2}. \tag{2.16}
\]
The value function may be rewritten as
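\begin{align*}
\pi(w) &= 10\sqrt{l(w)} - w\,l(w) \\
       &= 10\sqrt{\frac{25}{w^2}} - w\,\frac{25}{w^2} \\
       &= \frac{50}{w} - \frac{25}{w} \\
       &= \frac{25}{w},
\end{align*}
whose derivative is
\[
\pi'(w) = -\frac{25}{w^2}.
\]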
Next we will calculate π ′ (w) using the “envelope theorem method.” The theorem says that
 
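\begin{align*}
\pi'(w) &= \left[ \frac{\partial}{\partial w} \left( 10\sqrt{l} - wl \right) \right]_{l = l(w)} \\
        &= [-l]_{l = l(w)} \\
        &= -l(w).
\end{align*}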

Often, this form is all we need. Alternatively, we can substitute in the optimal policy, (2.16),
to conclude that π′(w) = −25/w². Evidently, the envelope theorem approach requires fewer
calculations.
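The two methods can also be compared numerically. The following is a minimal sketch, assuming the objective (2.14) and the policy (2.16); the function names and the wage w = 2 are illustrative choices, not part of the notes. It differentiates the value function by finite differences (the "obvious method" done numerically) and compares the result with −l(w) from the envelope theorem.

```python
import math

def profit(w, l):
    """Objective 10*sqrt(l) - w*l from (2.14)."""
    return 10 * math.sqrt(l) - w * l

def labour_policy(w):
    """Optimal labour l(w) = 25 / w**2 from (2.16)."""
    return 25 / w**2

def value(w):
    """Value function pi(w): profit evaluated at the optimal labour choice."""
    return profit(w, labour_policy(w))

def numerical_derivative(g, x, h=1e-6):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

w = 2.0  # illustrative wage (an assumption)
obvious = numerical_derivative(value, w)  # differentiate pi(w) directly
envelope = -labour_policy(w)              # envelope theorem: pi'(w) = -l(w)
print(obvious, envelope)
```

Both numbers should agree up to finite-difference error, without ever differentiating the policy function.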
When we used the obvious method, we had to calculate and substitute the policy function
l(w). When we used the envelope theorem method, we did not. We will examine carefully why
this is the case. Henceforth, we will only consider calculating the derivative of π(w) at w = w∗ .
Imagine that w∗ is the old wage, and we are interested in studying how a small market wage
increase affects profits. An alert manager would adjust the labour according to the policy
l(w) = 25/w². The alert manager’s policy is decreasing, so that l′(w) < 0. Let’s compare the
alert manager’s profits with a lazy manager who does not adjust the labour at all, and uses
the suboptimal policy l̄(w) = l(w∗) = 25/w∗². The lazy manager’s policy is flat, with l̄′(w) = 0.
Clearly, the lazy manager would make less profit than the alert manager. But how much less?
The profit function for the lazy manager is
\begin{align}
\bar\pi(w) &= 10\sqrt{\bar l(w)} - w\,\bar l(w) \tag{2.17} \\
           &= 10 \cdot \frac{5}{w^*} - w\,\frac{25}{w^{*2}}, \tag{2.18}
\end{align}
w w
which is less than the alert manager’s profit function. The lazy manager’s marginal profit of
a wage increase (from any w) is
\[
\bar\pi'(w) = -\frac{25}{w^{*2}}. \tag{2.19}
\]
Notice that the lazy manager’s marginal profit, π̄ ′ (w∗ ) is the same as the alert manager’s
marginal profit π ′ (w∗ )! This explains why we did not need the derivative of the policy func-
tion. Even though the lazy manager makes less profit than the alert manager, the difference
is very small after a small wage change, so the marginal profit is the same. So, when calcu-
lating marginal profits, we can use the lazy manager’s profit function rather than the alert
manager’s profit function, even though the lazy manager’s profit function is (weakly) less than
the alert manager’s profit function. The envelope theorem uses this observation to simplify
the calculations.
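A quick numerical sketch shows how small the lazy manager's loss is. It assumes the profit functions π(w) = 25/w and (2.18); the old wage w∗ = 2 and the wage increments are illustrative assumptions. The gap between the two managers shrinks like the square of the wage change, which is why it disappears from the first derivative.

```python
import math

W_STAR = 2.0  # hypothetical old wage w* (an assumption)

def alert_profit(w):
    """Alert manager re-optimises labour: pi(w) = 25 / w."""
    return 25 / w

def lazy_profit(w, w_star=W_STAR):
    """Lazy manager keeps labour fixed at l(w*) = 25 / w*^2, as in (2.18)."""
    l_bar = 25 / w_star**2
    return 10 * math.sqrt(l_bar) - w * l_bar

def profit_gap(dw, w_star=W_STAR):
    """Profit lost by not adjusting labour after a wage change dw."""
    return alert_profit(w_star + dw) - lazy_profit(w_star + dw, w_star)

for dw in (0.1, 0.01, 0.001):
    gap = profit_gap(dw)
    # The ratio gap / dw**2 stays roughly constant: the gap is second-order.
    print(dw, gap, gap / dw**2)
```

Dividing the gap by dw rather than dw² gives a number that goes to zero with dw, which is the statement π̄′(w∗) = π′(w∗).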
Applying the envelope theorem to the profit function (2.6) gives
\begin{align}
\frac{\partial \pi(p; w)}{\partial p} &= f(x(p; w)) = y(p; w) \tag{2.20} \\
\frac{\partial \pi(p; w)}{\partial w_i} &= -x_i(p; w). \tag{2.21}
\end{align}
From (2.20), we learn that the marginal profit of an output price increase equals the output
quantity, which is also the marginal revenue of a price increase, holding quantities fixed.
Similarly, (2.21) says that the marginal loss from an input price increase equals the quantity
of that input, which is the marginal cost increase, holding quantities fixed. These two
formulas relate policy functions to marginal valuations (although not
in an analogous way to marginal cost coinciding with the supply curve, as we will see later in
Section 2.4).
In principle, we might also have expected an indirect effect from a price change: since the
firm changes its quantities, this might also have an effect on the marginal profit. But this
is not the case. The “lazy manager” does not adjust quantities, but has the same marginal
profits as the rational manager.
Our next task is to understand how optimal choices (inputs and output quantities) are
affected by prices. We will use the relationships between optimal policies and marginal
valuations that we established above. Just like increasing marginal cost implies an increasing
supply curve, we will show that profit functions are convex, and hence supply curves are increasing.
The envelope theorem gave us a starting point: the right side of the derivatives in (2.20) and
(2.21) in fact contain the firm’s choices of input and output (which is determined by input
choices). So, rearranging and differentiating again yields

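\begin{align}
\frac{\partial y(p; w)}{\partial p} &= \frac{\partial^2 \pi(p; w)}{\partial p^2} \tag{2.22} \\
\frac{\partial x_i(p; w)}{\partial w_j} &= -\frac{\partial^2 \pi(p; w)}{\partial w_i \, \partial w_j}. \tag{2.23}
\end{align}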
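What do we know about the second derivatives of the profit function π? One thing we
know is that if π is twice continuously differentiable, then the order of differentiation does
not matter, so
\[
\frac{\partial x_i(p; w)}{\partial w_j} = \frac{\partial x_j(p; w)}{\partial w_i}
= -\frac{\partial^2 \pi(p; w)}{\partial w_i \, \partial w_j}. \tag{2.24}
\]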

For example, consider a hospital that uses doctors (xi ) and nurses (xj ) among other things
as input factors. The equation above establishes a relationship between the hospital’s demand
for these two items. The first term describes by how much demand for doctors increases when
nurses’ wages increase. The second term describes by how much demand for nurses increases
when doctors’ wages increase. The equation says they are equal. If the hospital decides to
hire an extra doctor (and possibly fire some nurses) when nurses’ wages increase by $1, then
the hospital would also decide to hire an extra nurse when doctors’ wages increase by $1.
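The symmetry in (2.24) can be checked numerically. The sketch below is an illustration under assumptions that are not in the notes: a hospital with production function f(x₁, x₂) = (x₁x₂)^(1/3), for which the first-order conditions give the closed-form demands used in `input_demands`. The prices are arbitrary. In this particular technology the two inputs are complements, so both cross-price effects are negative, but they are still equal.

```python
def input_demands(p, w_doctors, w_nurses):
    """Profit-maximising inputs for the assumed technology f = (x1 * x2) ** (1/3)."""
    scale = (p / 3) ** 3
    doctors = scale / (w_doctors**2 * w_nurses)
    nurses = scale / (w_doctors * w_nurses**2)
    return doctors, nurses

def cross_price_effects(p, w_doctors, w_nurses, h=1e-5):
    """Finite-difference estimates of d(doctors)/d(w_nurses) and d(nurses)/d(w_doctors)."""
    d_up, _ = input_demands(p, w_doctors, w_nurses + h)
    d_down, _ = input_demands(p, w_doctors, w_nurses - h)
    _, n_up = input_demands(p, w_doctors + h, w_nurses)
    _, n_down = input_demands(p, w_doctors - h, w_nurses)
    return (d_up - d_down) / (2 * h), (n_up - n_down) / (2 * h)

doctors_wrt_nurse_wage, nurses_wrt_doctor_wage = cross_price_effects(30.0, 4.0, 2.0)
print(doctors_wrt_nurse_wage, nurses_wrt_doctor_wage)  # the two effects coincide
```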
To say more about the second derivatives of the profit function (and hence the first deriva-
tives of the policy functions), we will need another theorem.
Theorem 2.2. Suppose V is the upper envelope of convex functions, i.e. V (a) = maxb v(a, b)
where v(·, b) is a convex function for each b. Then V is convex.
Algebraic Proof. This proof is illustrated in Figure 2.6. We informally describe the proof
first. Convexity is about comparing intermediate possibilities. For example, two “extreme”
situations might involve having a = $100 and a′ = $1000 in the bank account in the morning,
respectively. How would the value in an intermediate situation when ta + (1 − t)a′ = $400
compare to the values in the extreme situations? If the value function is convex, then the
intermediate value is no better than the corresponding weighted average tV (a) + (1 − t)V (a′ ) of
the extreme values. If the objective function is convex in the state variable, then we claim that
the value function will be convex.
To prove this, we start with the weighted average of the extreme values. These extreme
values are based on the corresponding optimal choices, e.g. living frugally when a = $100
and throwing a party when a′ = $1000. If we replace these extreme values that are based on
optimal choices with a suboptimal choice, then we will reduce the weighted average value. Since
we are interested in the intermediate situation a′′ = $400, we replace the optimal choices for
the extreme situations with the optimal choice for the intermediate situation (which probably

involves moderate consumption rather than frugal or party-level consumption). After making
this substitution, we are taking the weighted average of the extreme situations using the
intermediate choice. Since the underlying objective function is convex, this is better than the
intermediate situation (of a′′ = $400).
We would like to show tV (a) + (1 − t)V (a′ ) ≥ V (a′′t ), where we define a′′t = ta + (1 − t)a′ as
the convex combination t of a and a′ . As usual, let b(a) denote the policy function. Expanding
the left side gives

\begin{align*}
& tV(a) + (1 - t)V(a') \\
&= t\,v(a, b(a)) + (1 - t)\,v(a', b(a')) \\
&\ge t\,v(a, b(a''_t)) + (1 - t)\,v(a', b(a')) && \text{(since $b(a)$ is best at $a$)} \\
&\ge t\,v(a, b(a''_t)) + (1 - t)\,v(a', b(a''_t)) && \text{(since $b(a')$ is best at $a'$)} \\
&\ge v(a''_t, b(a''_t)) && \text{(since $v(\cdot, b(a''_t))$ is convex)} \\
&= V(a''_t).
\end{align*}

Geometric Proof (Sketch). This proof is illustrated in Figure 2.7. Recall that a function is
convex if and only if its epigraph (the set of points consisting of the “atmosphere” above
the surface, {(a, c) : c ≥ V (a)}) is convex. A point is in the epigraph of the upper envelope if
it is in all of the epigraphs of the underlying functions. That is, the epigraph of the upper
envelope is the intersection of the epigraphs of the underlying functions. Since the intersection
of convex sets is convex (Theorem D.1), the epigraph of the upper envelope is convex.
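Theorem 2.2 can be spot-checked numerically. The sketch below is only an illustration: it assumes each v(·, b) is a straight line (the simplest convex function), draws twenty such lines at random with an arbitrary seed, and evaluates the gap tV(a) + (1 − t)V(a′) − V(ta + (1 − t)a′), which the theorem says is never negative.

```python
import random

def upper_envelope(a, lines):
    """V(a) = max over b of v(a, b), where each v(., b) is the line slope * a + intercept."""
    return max(slope * a + intercept for slope, intercept in lines)

def convexity_gap(a, a_prime, t, lines):
    """t V(a) + (1 - t) V(a') - V(t a + (1 - t) a'); non-negative if V is convex."""
    mid = t * a + (1 - t) * a_prime
    average = t * upper_envelope(a, lines) + (1 - t) * upper_envelope(a_prime, lines)
    return average - upper_envelope(mid, lines)

random.seed(0)  # arbitrary seed, only for reproducibility
LINES = [(random.uniform(-3, 3), random.uniform(-5, 5)) for _ in range(20)]

print(convexity_gap(-4.0, 6.0, 0.3, LINES))
```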

Figure 2.6: Algebraic Proof of Theorem 2.2
Figure 2.7: Geometric Proof of Theorem 2.2

We may use the theorem above to establish that the firm’s profit function is convex. The
theorem below uses this to understand how price changes affect the firm’s choices. After an
output price rise, the firm produces more. After an input price rise, the firm reduces its demand
for that good.

Theorem 2.3. For every production function f , the firm’s profit function π is convex. Hence,
if π is smooth, then

\[
\frac{\partial y(p; w)}{\partial p} \ge 0 \quad \text{and} \quad \frac{\partial x_i(p; w)}{\partial w_i} \le 0. \tag{2.25}
\]

Proof. We first outline the proof. The idea is that if the input (and hence output) quantities
are held constant, then the profits are a linear function of prices. This is because
profits are based on calculating prices times quantities, both when calculating revenues and
costs. Since linear functions are convex, it follows that for each input choice, profits are a
convex function of prices. Now, the profit function is the upper envelope of each of these linear
functions (one for each possible production plan), so we conclude the profit function is convex.
For every input x∗ , we can define a function g(p; w) = pf (x∗ ) − w · x∗ . Taking the upper
envelope of all such g(p; w) functions gives the profit function π. Since each g function is linear
(and hence convex), Theorem 2.2 implies that the profit function π is convex. Thus, we may
apply Theorem D.4 to deduce

\[
\frac{\partial^2}{\partial p^2} \pi(p; w) \ge 0 \quad \text{and} \quad \frac{\partial^2}{\partial w_i^2} \pi(p; w) \ge 0. \tag{2.26}
\]

Substituting these inequalities into (2.22) and (2.23) gives the desired inequalities.
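The comparative statics in (2.25) can be illustrated by brute force. The sketch below assumes a single input, the production function f(x) = √x, and a finite grid of candidate production plans; all three are illustrative assumptions. The profit function is the upper envelope of one linear function of prices per plan, so the chosen output should never fall when p rises, and the chosen input should never rise when w rises.

```python
import math

# Hypothetical grid of candidate input quantities x (an assumption).
PLANS = [i * 0.05 for i in range(2001)]

def production(x):
    """Assumed production function; the argument of Theorem 2.3 works for any f."""
    return math.sqrt(x)

def best_plan(p, w):
    """Input x on the grid that maximises the linear-in-prices objective p*f(x) - w*x."""
    return max(PLANS, key=lambda x: p * production(x) - w * x)

def supply(p, w):
    """Output policy y(p; w) implied by the best plan."""
    return production(best_plan(p, w))

for p in (2.0, 4.0, 6.0, 8.0, 10.0):
    print(p, supply(p, 1.0))  # output weakly increases with the output price
```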

Example 2.4. Consider a supermarket that buys wholesale food and labour, which it uses to
sell retail food. Some food might get wasted; more labour means less food gets wasted.

(i) Formulate the supermarket’s profit maximisation problem.

(ii) Show that the supermarket’s profit function is convex.

(iii) Show that the supermarket responds to a wholesale price increase by buying less.

Answer.

(i) Notation: Let d denote wholesale food quantity, ϕ wholesale food price, l labour hired,
w wages, f (l, d) retail food sold, and p retail food price. The profit function is

\[
\pi(p, \phi, w) = \max_{l, d} \; p f(l, d) - wl - \phi d.
\]

(ii) For each possible value of the choice variables (l, d), the firm’s objective is a linear
function of the state variable (p, ϕ, w). Since linear functions are convex, the upper
envelope, π(p, ϕ, w) is convex.

(iii) By the envelope theorem,


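\[
\frac{\partial \pi(p, \phi, w)}{\partial \phi}
= \left[ \frac{\partial}{\partial \phi} \big( p f(l, d) - \phi d - wl \big) \right]_{l = l(p, \phi, w),\, d = d(p, \phi, w)}
= -d(p, \phi, w),
\]
where l(p, ϕ, w) and d(p, ϕ, w) are the labour demand and wholesale food demand policies.
Differentiating and multiplying by −1 on both sides gives
\[
-\frac{\partial^2 \pi(p, \phi, w)}{\partial \phi^2} = \frac{\partial d(p, \phi, w)}{\partial \phi}.
\]
Since π is convex, the left side is non-positive. Thus, the right side is non-positive, so the
wholesale food demand policy is weakly decreasing in the wholesale price ϕ.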
The important lessons of this section are:
• The envelope theorem provides a formula for differentiating value functions, such as
profit functions.
• The envelope formula provides a relationship between the derivative of the value function
and the policy function. (Although we have not yet encountered the marginal cost curve
coinciding with the supply curve.)
• If the decision-maker’s problem is convex (i.e. satisfies all the convexity assumptions we
need), then the value function is convex. This means the second derivatives (differentiating
with respect to the same variable twice) of the value function are non-negative. This
allowed us to deduce the signs of the derivatives of the policy function in the profit
maximization problem.
• We did not need to make any assumptions about the production function to deduce that
the profit function is convex. The convexity arises from the fact that prices
are linear, i.e. each unit is charged at the same price.
Question 2.9. ✓ In classic undergraduate producer theory, profit π is a function of price P
and output quantity Q,
π(P, Q) = T R(P, Q) − T C(Q),
where total revenue is T R(P, Q) = P Q, and T C(Q) is the cost of producing Q.
(i) Using the envelope theorem where possible, derive formulas for how revenue and profit
change after a marginal price increase, i.e.
\[
\frac{d}{dP} \pi(P, Q(P)),
\]
where Q(P ) is the output choice at price P . (Hint: if you are rusty on your calculus
notation for total derivatives, you might find it helpful to write g(P ) = π(P, Q(P )), and
calculate the derivative g ′ (P ).)

(ii) Using algebra and words, explain the effect that the envelope theorem ruled out in part
(i).

Question 2.10. ✓ Show that the firm’s optimal policies are unresponsive to “inflation”, i.e.
all prices increasing by the same proportion. Show that inflation increases (nominal) profits.
Do your answers suggest that a firm has an incentive to cause inflation (perhaps by bribing
politicians)?
Question 2.11. ✓ A solar panel manufacturer uses knowledge, labor and silicon to make solar
panels. Labor and silicon are acquired at market prices. However, the firm cannot acquire
new knowledge – it is stuck with whatever it is endowed with.

(i) Write down a mathematical model that represents the firm’s profit maximization prob-
lem.

(ii) What is the marginal profit of knowledge to the firm? Your answer should take into
account that if the firm’s knowledge increases, it might decide to change its production
decision.

For more similar questions, see the following practice exam questions: 3.iv, 3.v, 6.iii, 6.iv, 9.iii,
12.iv, 15.iv, 16.iii, 18.iii, 18.iv, 24.a.iii, 25.iii, 27.a.ii, 28.iv, 29.a.ii, 31.a.iv, 33.iii.

2.4 Cost Functions and Dynamic Programming


The firm’s profit maximization problem is complicated, because it chooses the quantities of
both input goods and the output good. So far, these complications have prevented us from
constructing a marginal cost curve, and relating marginal cost to the output policy (supply
curve). In this section, we will finally address this problem. To simplify our analysis, we now
introduce an important technique known as dynamic programming, which was developed
by Bellman (1957) and is widely used in economics and also many other fields. The idea is to
split the firm’s complicated profit maximization problem into two sub-problems, one in which
the firm chooses output only, and the other in which the firm chooses its inputs only. Of course,
it is not possible to completely separate the two choices, but with dynamic programming we
can come very close to achieving this. Having smaller and simpler problems allows us to answer
questions such as: what is the marginal cost of production, and how do marginal increases in
targeted output affect input demands?
Recall the firm’s profit function

\[
\pi(p; w) = \max_{x \in \mathbb{R}^{N-1}_+} \; p f(x) - w \cdot x. \tag{2.27}
\]

In this problem, the firm is effectively choosing both its inputs x and its output f (x) at the
same time. We can decompose the problem into two problems where inputs and output are

chosen separately. The cost function c gives the cost of producing a particular output quantity:
\begin{align}
c(y; w) = \min_{x \in \mathbb{R}^{N-1}_+} \; & w \cdot x \tag{2.28} \\
\text{s.t. } & f(x) \ge y. \tag{2.29}
\end{align}


The decision in the cost minimisation problem only involves input choices; output (y) has
already been chosen. Notice that the cost function is a value function – it is the value of
the firm, excluding revenues, after it has learned the market prices and has committed to an
output quantity.
The profit function can now be rewritten in terms of the cost function:
\[
\pi(p; w) = \max_{y \in \mathbb{R}_+} \; py - c(y; w). \tag{2.30}
\]

In this reformulation of the profit function, the firm only chooses output. We were able to
simplify the firm’s profit maximization problem by burying some of the decisions inside the cost
function. The simplified formula for the profit function in (2.30) is an example of a Bellman
equation which lies at the heart of dynamic programming.
The lesson of dynamic programming can be summarised as: a complicated value function
with many decisions can be simplified by burying some of the decisions inside another value
function. In computer networking, the problem of choosing the fastest route for sending
messages between two computers can be simplified with dynamic programming. Dijkstra
(1959) noticed that the problem can be broken down into smaller problems: the fastest time
from a computer to the target is the best, over all directly connected computers, of the time
of the direct link plus the fastest time from that neighbour to the target. The problem of
finding the best route from the neighbouring computer to the target is buried inside a value function.
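The routing idea can be sketched in a few lines. This is an illustration rather than a description of any real networking stack: the four-computer network, its link times, and the function name `fastest_times` are all hypothetical, and links are assumed to be symmetric. The value of a computer is its fastest time to the target, and each value is built from a neighbour's value plus the direct link time, using a priority queue in the style of Dijkstra's algorithm.

```python
import heapq

def fastest_times(links, target):
    """Value function: fastest travel time from every computer to the target.

    links maps each computer to {neighbour: time of the direct link};
    links are assumed symmetric.
    """
    value = {target: 0.0}
    frontier = [(0.0, target)]
    while frontier:
        time_to_target, node = heapq.heappop(frontier)
        if time_to_target > value.get(node, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbour, link_time in links[node].items():
            # Bellman step: direct link plus the neighbour's buried value.
            candidate = link_time + time_to_target
            if candidate < value.get(neighbour, float("inf")):
                value[neighbour] = candidate
                heapq.heappush(frontier, (candidate, neighbour))
    return value

# Hypothetical network with symmetric link times.
NETWORK = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
VALUES = fastest_times(NETWORK, "D")
print(VALUES)
```

Once the values are known, the best next hop from any computer is the neighbour that minimises link time plus value, which is the policy function of this problem.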
In genetics, the problem of determining the most likely sequence of mutations between
a pair of genes can be simplified with dynamic programming with what is known as the
Needleman and Wunsch (1970) algorithm. Comparing two long DNA sequences is a daunting
task. But the problem may be split up into (many) smaller problems. It is easy to compare
two nucleotides (one from each gene), and the comparisons of all the other nucleotides can be
buried inside a value function.
In economics, the most important application of dynamic programming is in macroeco-
nomics in which a consumer has to choose their consumption for each day of the rest of their
life. This complicated problem can be decomposed into choosing the consumption today and
savings for tomorrow. The consumption choices from tomorrow onwards are buried inside the
value of saving today.
But for the moment, we will only study the firm’s profit maximization problem. One step
we did not check was whether the Bellman equation (2.30) gives the right answer – it should
match the value function (2.27). This step is known as verifying the principle of optimality.
Lemma 2.1 (Principle of Optimality). The definitions of the profit function π(p; w), in (2.27)
and (2.30) are equivalent.

Proof. The proof involves patiently transforming the formula for the value function into the
Bellman equation. The key trick is to add a new “choice” of output y, which initially is no
choice at all, because it is completely determined by the input. But when the input is chosen
after the output, the separate choice of output becomes meaningful.

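\begin{align}
\max_{x \in \mathbb{R}^{N-1}_+} p f(x) - w \cdot x
&= \max_{y \in \mathbb{R}_+,\, x \in \mathbb{R}^{N-1}_+} p f(x) - w \cdot x \tag{2.31} \\
&\qquad \text{s.t. } f(x) = y \tag{2.32} \\
&= \max_{y \in \mathbb{R}_+} \left[ \max_{x \in \mathbb{R}^{N-1}_+} py - w \cdot x \;\; \text{s.t. } f(x) = y \right] \tag{2.33} \\
&= \max_{y \in \mathbb{R}_+} py - \left[ \min_{x \in \mathbb{R}^{N-1}_+} w \cdot x \;\; \text{s.t. } f(x) = y \right] \tag{2.34} \\
&= \max_{y \in \mathbb{R}_+} py - c(y; w). \tag{2.35}
\end{align}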
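The principle of optimality can be checked by brute force on a small example. The sketch below assumes a single input, the production function f(x) = √x, and a finite grid of inputs; these choices are illustrative and not from the notes. It computes profit once by choosing the input directly, as in (2.27), and once by choosing output and burying the input choice inside a cost function, as in (2.30).

```python
import math

# Hypothetical input grid from 0 to 100 in steps of 0.25 (an assumption).
X_GRID = [i * 0.25 for i in range(401)]

def production(x):
    """Assumed production function f(x) = sqrt(x)."""
    return math.sqrt(x)

# Output levels that are attainable on the input grid.
Y_GRID = [production(x) for x in X_GRID]

def profit_direct(p, w):
    """Value function (2.27): choose the input directly."""
    return max(p * production(x) - w * x for x in X_GRID)

def cost(y, w):
    """Cost function (2.28): cheapest input on the grid that meets the target y."""
    return min(w * x for x in X_GRID if production(x) >= y)

def profit_bellman(p, w):
    """Bellman equation (2.30): choose output, with the input choice buried in cost."""
    return max(p * y - cost(y, w) for y in Y_GRID)

print(profit_direct(10.0, 1.0), profit_bellman(10.0, 1.0))
```

The two numbers coincide because every output level on the grid is produced at least cost by exactly one input level.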

The Bellman equation (2.30) buries the complicated input choices inside the cost function c,
and allows us to focus on just one choice: output. This allows us to establish the classical
“price equals marginal cost” formula.
Theorem 2.4. If y(p; w) is the optimal supply policy in (2.27), then for all prices (p, w),

\[
p = \left. \frac{\partial c(y; w)}{\partial y} \right|_{y = y(p; w)}. \tag{2.36}
\]

Proof. By the principle of optimality (Lemma 2.1), the profit function (2.27) can be rewritten
in terms of the cost function, (2.30). The first-order condition of this reformulated profit
function with respect to output y is


\[
\left. \frac{\partial}{\partial y} \big[ py - c(y; w) \big] \right|_{y = y(p; w)} = 0, \tag{2.37}
\]

which simplifies to (2.36).

This result shows how useful dynamic programming is: it allowed us to simplify a complicated
problem back into something very simple and familiar.
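As a worked check of (2.36), consider a sketch under assumed primitives that are not in the notes: a single input x with production function f(x) = √x, so that producing y requires x = y².

```latex
\begin{align*}
c(y; w) &= \min_{x \ge 0} \{\, w x : \sqrt{x} \ge y \,\} = w y^2, \\
\frac{\partial c(y; w)}{\partial y} &= 2 w y, \\
p = 2 w y \quad &\Longrightarrow \quad
y(p; w) = \frac{p}{2w}, \qquad x(p; w) = y(p; w)^2 = \frac{p^2}{4 w^2}.
\end{align*}
```

With p = 10 the input demand is 25/w², which agrees with the labour policy (2.16) from the manager example.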

We can also re-apply the envelope theorem to study how profits and output are affected
by price changes:
 
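\begin{align}
\frac{\partial \pi(p; w)}{\partial p}
&= \left[ \frac{\partial}{\partial p} \big( py - c(y; w) \big) \right]_{y = y(p; w)} = y(p; w) \tag{2.38} \\
\frac{\partial \pi(p; w)}{\partial w_i}
&= \left[ \frac{\partial}{\partial w_i} \big( py - c(y; w) \big) \right]_{y = y(p; w)}
= -\left. \frac{\partial c(y; w)}{\partial w_i} \right|_{y = y(p; w)}. \tag{2.39}
\end{align}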
We do not learn anything new from the first equation, (2.38). However, the second equation
(2.39) does tell us something: when factor prices increase, profits go down by exactly
the consequent increase in production cost, holding output fixed.
Question 2.12. ✓ Let p be the sale price of output, k be capital which is rented at price r, and
labour l which is paid a wage w. Consider the Cobb and Douglas (1928) production function
y = f(k, l) = k^a l^b.
(i) Write down the firm’s profit function.
(ii) Write down a Bellman equation in which the firm chooses output (and input is buried
inside a value function).
(iii) Derive the optimal capital and labour choices k(y; r, w) and l(y; r, w). Note: the algebra
requires a lot of patience, so please don’t try this alone! It is worth doing, as it will help
convince you that you understand all of the tools.
Question 2.13. ✓ Continuing Question 2.11 about Solar panel manufacturing, suppose that
the production function is linear in knowledge. Would the firm choose to produce more when
it is endowed with more knowledge? What assumptions in your model are important for your
conclusion?
Question 2.14. ✓ There are two ways to run a dairy farm. The traditional way is to milk
each cow by manually herding the cows and attaching a hose. The modern way involves
buying a big rotary machine where the cows walk in, spend half an hour in the machine, and
walk out in a completely automated process. Assume that the marginal product of the rotary
machine (i.e. the difference in output between machine and no-machine, holding cows and
labour fixed) is increasing in cows and labour. Rotary machines are big and expensive, and
can service hundreds of cows.
(i) Formulate the farm’s profit maximisation problem.
(ii) Henceforth, assume that the two dairy technologies are concave in all inputs, except for
the (indivisible) rotary machine. Sketch a graph of the marginal cost of milk.
(iii) When the price of milk increases, does labour demand increase or decrease?
For more similar questions, see the following practice exam questions: 2.ii, 8.iii.a, 8.iii.b, 9,
21.a.ii, 22.iii, 23.iii, 24.a.ii, 31.a.ii, 31.a.iii, 32.iii, 32.iv, 33.iv, 34.ii, 34.iii, 34.iv.

2.5 Upper Envelopes with Constraints


Section 2.3 developed some mathematical tools for studying value and policy functions such
as the profit and the input demand functions. However, the theorems assume that the optimization
problem is unconstrained and cannot accommodate the output target constraint in
the cost function. This section resolves this problem by generalizing the theorems. These new
tools will allow us to prove that marginal cost of production is increasing when the production
function is concave.
First, we generalize the value function from (2.8) to accommodate constrained problems:

\begin{equation}
\begin{aligned}
V(a) = \max_b \; & v(a, b) \\
\text{s.t. } & w(a, b) \ge 0.
\end{aligned}
\tag{2.40}
\end{equation}

The policy function b(a) may be solved in the usual way with the Lagrange theorem. The
Lagrangian is
L(a, b, λ) = v(a, b) + λw(a, b).
At an optimal choice b(a), the Lagrange theorem implies that there is a Lagrange multiplier
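λ(a) ≥ 0 such that the following first-order condition is satisfied:
\[
\left. \frac{\partial L(a, b, \lambda)}{\partial b} \right|_{b = b(a),\, \lambda = \lambda(a)} = 0.
\]

Expanding this gives
\[
\left[ \frac{\partial v(a, b)}{\partial b} + \lambda \frac{\partial w(a, b)}{\partial b} \right]_{b = b(a),\, \lambda = \lambda(a)} = 0. \tag{2.41}
\]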

The constrained envelope theorem uses this theory to give a formula for the marginal value
function, V ′ (a).
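Theorem 2.5 (Constrained Envelope Theorem). If V (·), v(·, ·), w(·, ·), b(·), and λ(·) (as defined
above) are continuously differentiable functions, and
\[
\frac{\partial w(a, b)}{\partial b} \neq 0 \quad \text{for all } (a, b(a)), \tag{2.42}
\]
and if the constraint binds at (a, b(a)), then
\[
V'(a) = \left[ \frac{\partial v(a, b)}{\partial a} + \lambda \frac{\partial w(a, b)}{\partial a} \right]_{b = b(a),\, \lambda = \lambda(a)}. \tag{2.43}
\]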

Proof. The max operation (and its constraint) in the formula for the value function, (2.40)
may be removed by substituting in the policy function:

V (a) = v(a, b(a)).



The idea behind Lagrange multipliers is to add a term that represents the marginal cost of
satisfying the constraint. Since we assume that the constraint always binds, i.e. w(a, b(a)) = 0,
it is correct to write

V (a) = v(a, b(a)) + λ(a)w(a, b(a)) = L(a, b(a), λ(a)).

This term accounts for marginal changes in the constraint (i.e. replacing the 0 on the right
side of the constraint with a slightly different number). So it is intuitive that this extra term
might help with a proof.
Differentiating gives
 
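\[
V'(a) = \left[ \frac{\partial L(a, b, \lambda)}{\partial a}
+ \frac{\partial L(a, b, \lambda)}{\partial b} b'(a)
+ \frac{\partial L(a, b, \lambda)}{\partial \lambda} \lambda'(a) \right]_{b = b(a),\, \lambda = \lambda(a)}.
\]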

The second term is 0 by the first-order condition (2.41). The last term is 0 as it contains
w(a, b(a)) which is 0 because we assumed the constraint binds. Expanding the remaining term
gives (2.43).
We now take first-order conditions and apply the envelope theorem to the cost function (2.28).
The Lagrangian of the cost function is:

L(y, w, x, λ) = w · x − λ[f (x) − y]. (2.44)

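The first-order condition of the Lagrangian (as in (2.41)) is
\[
\left. \frac{\partial}{\partial x_i} L(y, w, x, \lambda) \right|_{x = x(y; w),\, \lambda = \lambda(y; w)} = 0 \tag{2.45}
\]
which simplifies to
\[
w_i = \lambda(y; w) \left. \frac{\partial f(x)}{\partial x_i} \right|_{x = x(y; w)}. \tag{2.46}
\]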

(Note that this calculation involved an extra minus sign because the cost function involves a
minimization.) Applying the constrained envelope theorem to the cost function gives

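\begin{align}
\frac{\partial c(y; w)}{\partial y}
&= \left[ \frac{\partial}{\partial y} \big( w \cdot x - \lambda (f(x) - y) \big) \right]_{x = x(y; w),\, \lambda = \lambda(y; w)} = \lambda(y; w) \tag{2.47} \\
\frac{\partial c(y; w)}{\partial w_i}
&= \left[ \frac{\partial}{\partial w_i} \big( w \cdot x - \lambda (f(x) - y) \big) \right]_{x = x(y; w),\, \lambda = \lambda(y; w)} = x_i(y; w). \tag{2.48}
\end{align}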

We now interpret these three equations. The second equation (2.47) is fundamental to the
theory of Lagrange multipliers. It says that the marginal cost of increasing the production
target (i.e. tightening the production target constraint) is equal to the Lagrange multiplier. In

other words, increasing the production target comes at a price λ. However, this is an implicit
price not determined directly from market transactions. This is why the Lagrange multiplier
is often called the shadow price of the constraint.
The third equation, (2.48), is sometimes called Shephard’s lemma, and is a slight
restatement of (2.21). It says that the marginal effect on cost of a price increase of input i is
the extra expenditure required to buy input i keeping the demand fixed. Even though the firm will
decrease its demand for input i (and substitute to other inputs), this effect is second-order,
so it does not change the marginal cost increase.
The first equation, (2.46) includes a Lagrange multiplier which we interpreted as the
marginal cost of output. The left side of the first equation, is the marginal expenditure of
increasing input i. The right side is the marginal cost of the extra output created by this
input.
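The three equations can be verified on a concrete technology. The sketch below is an illustration under assumptions that are not in the notes: two inputs with f(x₁, x₂) = √(x₁x₂), for which cost minimisation gives the closed-form demands in `input_demands`, and arbitrary values for y, w₁ and w₂. It recovers the shadow price from the first-order condition (2.46) and compares finite-difference derivatives of the cost function with (2.47) and (2.48).

```python
import math

def input_demands(y, w1, w2):
    """Cost-minimising inputs for the assumed technology f = sqrt(x1 * x2)."""
    return y * math.sqrt(w2 / w1), y * math.sqrt(w1 / w2)

def cost(y, w1, w2):
    """Cost function c(y; w): expenditure at the cost-minimising inputs."""
    x1, x2 = input_demands(y, w1, w2)
    return w1 * x1 + w2 * x2

def shadow_price(y, w1, w2):
    """Lagrange multiplier from (2.46): lambda = w1 / (df/dx1)."""
    x1, x2 = input_demands(y, w1, w2)
    marginal_product_1 = 0.5 * math.sqrt(x2 / x1)
    return w1 / marginal_product_1

def derivative(g, x, h=1e-6):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

y, w1, w2 = 3.0, 4.0, 1.0  # illustrative target and input prices (assumptions)
marginal_cost = derivative(lambda t: cost(t, w1, w2), y)      # left side of (2.47)
shephard = derivative(lambda t: cost(y, t, w2), w1)           # left side of (2.48)
print(marginal_cost, shadow_price(y, w1, w2))
print(shephard, input_demands(y, w1, w2)[0])
```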
In Section 2.3, we used the fact that the envelope equations relate the policy function to
the derivative of the value function to learn more about the policy function. Similar to before,
we deduce that
\[
\frac{\partial x_i(y; w)}{\partial w_j} = \frac{\partial x_j(y; w)}{\partial w_i}
= \frac{\partial^2 c(y; w)}{\partial w_i \, \partial w_j}. \tag{2.49}
\]

These equations are almost identical to (2.24); only the state variables are different. Before,
the policy was a function of prices; here the policy is a function of the output target y and
input prices w.
Under much more stringent (and incompatible) conditions compared to before, we show
that value functions are convex or concave.
Theorem 2.6. In the notation of (2.40), if v is convex, and w is quasi-concave, then
\begin{align}
V(a) = \min_b \; & v(a, b) \tag{2.50} \\
\text{s.t. } & w(a, b) \ge 0 \tag{2.51}
\end{align}
is convex. Similarly, if v is concave, and w is quasi-concave, then
\begin{align}
\bar V(a) = \max_b \; & v(a, b) \tag{2.52} \\
\text{s.t. } & w(a, b) \ge 0 \tag{2.53}
\end{align}
is concave.
To understand this theorem, it is helpful to think of it in terms of cost functions, where a is the
production target and b is the production plan. The condition that w is quasi-concave means
that intermediate production plans must meet intermediate production targets, i.e. if you take
a convex combination of two different optimal production plans, then this will produce at least
as much as the convex combination of the outputs.

Proof. We prove the first part only. (The second part is analogous.) This proof is similar to
the proof of Theorem 2.2. We sketch the intuition first, based on an example of having two
extreme situations involving production targets of 100 and 1000 meals respectively. Suppose
that 5 chefs are needed for 100 meals, and 105 chefs are needed for 1000 meals. We want to
prove that the cost of an intermediate 550 meals is lower than the average of the costs of the
extreme targets. The previous proof does not apply directly, because it was based on using the
intermediate choice in the extreme situations. This was not a problem in the unconstrained
problem, but it is a problem here, because the intermediate number of chefs (e.g. 50) will not
meet the higher production target of 1000 meals.
Instead, we consider taking an average number of chefs (55) for an intermediate target of
550. Specifically, we start with the (weighted) average of the costs from the extreme targets.
Then we consider the intermediate target (550 meals) with the average production plan (55
chefs). Since the constraint (i.e. the production function) is quasi-concave, 55 chefs meet or
exceed the intermediate target of 550. Moreover, since the objective is convex, the cost of
achieving the intermediate target of 550 with the average production plan (of 55 chefs) is at
least as good as the average of the extreme costs (of making 100 and 1000 meals). Finally, the
average production plan (of 55 chefs) is inferior to the optimal intermediate production plan
(of 50 chefs). We conclude that the average cost of the extreme targets is higher than the
cost of any intermediate target.
The proof is depicted in Figure 2.8. We would like to establish that

tV (a) + (1 − t)V (a′ ) ≥ V (ta + (1 − t)a′ ), (2.54)

meaning that the line connecting the costs (values) between a and a′ lies above the V curve.
The left side can be interpreted as the cost when (linearly) interpolating between the cost of a
and the cost of a′ . The right side is the cost when making the optimal choice, b(ta + (1 − t)a′ ).
It will be helpful to consider another choice, l(t) = tb(a) + (1 − t)b(a′ ), which we call the
interpolation policy; it makes choices between the two optimal choices b(a) and b(a′ ). We will
establish (2.54) via the following steps:

tV (a) + (1 − t)V (a′ ) (2.55)


= tv(a, b(a)) + (1 − t)v(a′ , b(a′ )) (2.56)
≥ v(ta + (1 − t)a′ , l(t)) (2.57)
≥ v(ta + (1 − t)a′ , b(ta + (1 − t)a′ )) (2.58)
= V (ta + (1 − t)a′ ). (2.59)

The first and last equations are true because V (a) = v(a, b(a)) for all a. The first inequality
follows because v is convex. The second inequality follows because the decision-maker would
reject l(t) in favour of the optimal choice b(ta + (1 − t)a′ ). (We know that l(t) was considered
and rejected, because (i) w is quasi-concave which implies that (ii) l(t) is feasible at state
ta + (1 − t)a′ .)
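The chain (2.55) to (2.59) can be traced on a small example. The sketch below assumes a single input with production function f(x) = √x and an input price of 3; both are illustrative assumptions, and the resulting numbers differ from the chef example. It checks that the interpolation policy l(t) is feasible at the intermediate target, and that its cost sits between the averaged extreme costs and the optimal intermediate cost.

```python
import math

WAGE = 3.0  # hypothetical price of the single input (an assumption)

def policy(target):
    """Cheapest input meeting the target when f(x) = sqrt(x): b(a) = a**2."""
    return target**2

def cost(target):
    """Value function V(a): cost of the optimal plan for the target."""
    return WAGE * policy(target)

def interpolation_chain(a, a_prime, t):
    """Return feasibility of l(t) and the three costs compared in (2.56)-(2.58)."""
    mid_target = t * a + (1 - t) * a_prime
    blended_plan = t * policy(a) + (1 - t) * policy(a_prime)  # interpolation policy l(t)
    feasible = math.sqrt(blended_plan) >= mid_target - 1e-12  # l(t) meets the target
    average_cost = t * cost(a) + (1 - t) * cost(a_prime)      # line (2.56)
    blended_cost = WAGE * blended_plan                        # line (2.57)
    optimal_cost = cost(mid_target)                           # line (2.58)
    return feasible, average_cost, blended_cost, optimal_cost

print(interpolation_chain(100.0, 1000.0, 0.5))
```

Because the objective here is linear in the input, the first inequality holds with equality; all of the slack comes from replacing the interpolation policy with the optimal plan.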

Figure 2.8: Proof of Theorem 2.6. The middle curve is the cost of the interpolation policy. The
bottom curve is the cost of the optimal policy.

It seems like the constrained version of the theorem (Theorem 2.6) contradicts the uncon-
strained version (Theorem 2.2). The constrained version establishes that the value function is
concave when maximising a concave objective, yet the unconstrained version establishes that
the value function is convex when maximising a convex objective. But if an objective is linear,
then it is both concave and convex. How can both be right?
To make the two theorems more comparable, consider the null constraint that is always
satisfied, i.e. w(a, b) = 0. In this case, the theorems compare as follows:

• Simplified constrained theorem: if v is concave, then V (a) = maxa v(a, b) is concave.

• Unconstrained theorem: if v(a, b) is convex in a, then V (a) = maxa v(a, b) is convex.

There is one important difference and one similarity:

• Difference: the simplified constrained theorem requires the objective to be concave in
(a, b) jointly, whereas the unconstrained theorem only requires convexity in a.

• Similarity: both make a shape assumption that mirrors the conclusion (concave implies
concave, convex implies convex).

The key to understanding the constrained theorem is that concavity in (a, b) lets us
interpolate choices as well as states: at any state a′′ between a and a′ , the corresponding
interpolation of b(a) and b(a′ ) is available and yields at least the interpolated value of V (a)
and V (a′ ), so the optimal choice b(a′′ ) does at least as well. The unconstrained theorem makes
no such argument; it holds the choice b fixed and uses convexity in a alone.
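A small numerical sketch may help separate the two theorems. It assumes a hypothetical objective v(a, b) = ab with the choice restricted to b ∈ {−1, 1} (both are assumptions for illustration, not taken from the notes). This objective is linear, hence convex, in a for each fixed b, but it is not concave in (a, b), so only the unconstrained theorem applies. The upper envelope is V (a) = |a|, which is convex and not concave.

```python
# Unconstrained case: v(a, b) = a * b is linear (hence convex) in the state a
# for each fixed choice b. The choice set {-1, +1} is an illustrative assumption.
CHOICES = (-1.0, 1.0)

def v(a, b):
    return a * b

def V(a):
    """Upper envelope of the lines a -> a * b, which equals |a|."""
    return max(v(a, b) for b in CHOICES)

def is_midpoint_convex(f, points, tol=1e-12):
    """Check f((p + q) / 2) <= (f(p) + f(q)) / 2 for all pairs of points."""
    return all(f((p + q) / 2) <= (f(p) + f(q)) / 2 + tol
               for p in points for q in points)

def is_midpoint_concave(f, points, tol=1e-12):
    """Concavity of f is convexity of -f."""
    return is_midpoint_convex(lambda s: -f(s), points, tol)

grid = [i / 4 for i in range(-8, 9)]   # states from -2 to 2
print(is_midpoint_convex(V, grid), is_midpoint_concave(V, grid))
```

The midpoint test is only a finite check on a grid, not a proof of convexity.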
Question 2.15. ✓ Sketch a geometric proof of Theorem 2.6.
We now apply Theorem 2.6 to establish that the cost function is convex with respect to output
so that marginal cost is weakly increasing.
Theorem 2.7. If the production function f is concave, then the cost function is convex in
output, i.e. c(·; w) is convex for all w.

Proof. It is important to realise that the theorem does not claim that c(y; w) is convex in w.
The proof relies on holding w fixed, because w · x is not convex in (w, x).
Recall that the cost function is

\[ c(y; w) = \min_{x \in \mathbb{R}^{N-1}_+} w \cdot x \quad \text{s.t.} \quad f(x) \ge y. \]

The constraint is quasi-concave because (x, y) ↦ f (x) − y is concave. The objective is linear
in (x, y), and hence convex in (x, y). Thus the first part of Theorem 2.6 implies that c(·; w) is
convex for all w.
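As a rough numerical illustration of Theorem 2.7, the sketch below computes the cost function for an assumed concave technology f (x1, x2) = (x1 x2)^(1/4) with two inputs. The functional form, the factor prices and the golden-section search are all choices made for this example. For this technology the cost function has the closed form c(y; w) = 2y² √(w1 w2), which gives a check on the numerical answer.

```python
import math

def f(x1, x2):
    """Concave production function (assumed Cobb-Douglas with decreasing returns)."""
    return (x1 * x2) ** 0.25

def cheapest_bundle(y, w1, w2, iters=200):
    """Solve min w.x s.t. f(x) >= y by golden-section search over log(x1).

    With positive prices the constraint binds, so x2 = y**4 / x1 and spending
    is a convex function of u = log(x1).
    """
    if y <= 0:
        return (0.0, 0.0)

    def spend(u):
        x1 = math.exp(u)
        return w1 * x1 + w2 * y ** 4 / x1

    shrink = (math.sqrt(5) - 1) / 2
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        m1 = hi - shrink * (hi - lo)
        m2 = lo + shrink * (hi - lo)
        if spend(m1) <= spend(m2):
            hi = m2          # the minimiser lies in [lo, m2]
        else:
            lo = m1          # the minimiser lies in [m1, hi]
    x1 = math.exp((lo + hi) / 2)
    return (x1, y ** 4 / x1)

def cost(y, w1, w2):
    """c(y; w): spending on the cheapest bundle that produces y."""
    x1, x2 = cheapest_bundle(y, w1, w2)
    return w1 * x1 + w2 * x2

w1, w2 = 1.0, 4.0
outputs = [0.5 * i for i in range(7)]                    # y = 0, 0.5, ..., 3
costs = [cost(y, w1, w2) for y in outputs]
marginal = [c2 - c1 for c1, c2 in zip(costs, costs[1:])]
# Convexity in y shows up as increasing cost differences on an evenly spaced grid.
print(all(m2 >= m1 for m1, m2 in zip(marginal, marginal[1:])))
```

The cost differences between neighbouring output levels play the role of marginal cost, so the final line checks that marginal cost is weakly increasing for fixed w.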

Example 2.5. A chocolate manufacturer uses cocoa and rents machines to produce chocolate
bars. When the chocolate is cut into bars, the off-cuts are collected, and can be used to make
more chocolate bars. However, this process is difficult to implement, and requires
experimentation. The factory uses cocoa and machines to experiment, which produces knowledge of
how to re-use off-cuts. The more knowledge the manufacturer has, the fewer off-cuts go to waste.

(i) Write down the firm’s problem, without using any Bellman equations.

(ii) Write down the firm’s problem using two Bellman equations relating three value
functions: the cost function, the (post-experimentation) profit function, and the value of
experimentation.

(iii) Show that as the price of cocoa increases, the manufacturer decreases the amount of
cocoa it uses.

Answer. Notation: pr price of raw cocoa, pm rental price of machines, k knowledge, (rx , mx )
resources allocated to experimentation, (ry , my ) resources allocated to output production,
chocolate bar output y = f (k, ry , my ), py price of chocolate bars, k = g(rx , mx ) knowledge
“discovered”. The labels x and y on r and m are superscripts marking experimentation and
production respectively.

(i) The firm’s problem can be written, without any Bellman equations, as follows:

\[ V(p_y; p_r, p_m) = \max_{r^y, r^x, m^y, m^x} \; p_y f(g(r^x, m^x), r^y, m^y) - p_r (r^x + r^y) - p_m (m^x + m^y). \tag{2.60} \]

(ii) Let

\begin{align}
C(y; k, p_r, p_m) = \min_{r^y, m^y} \; & p_r r^y + p_m m^y \tag{2.61} \\
\text{s.t.} \; & f(k, r^y, m^y) = y \tag{2.62}
\end{align}



be the cost function, π(k, py , pr , pm ) the post-experimentation profit function, and V (py ; pr , pm )
the pre-experimentation profit function. The latter two can be defined with Bellman
equations:

\begin{align}
\pi(k, p_y, p_r, p_m) &= \max_{y} \; p_y y - C(y; k, p_r, p_m) \tag{2.63} \\
V(p_y; p_r, p_m) &= \max_{r^x, m^x} \; \pi(g(r^x, m^x), p_y, p_r, p_m) - p_r r^x - p_m m^x. \tag{2.64}
\end{align}

(iii) Dynamic programming does not help with this part. Applying the envelope theorem to
(2.60) gives

\[ \frac{\partial V(p_y; p_r, p_m)}{\partial p_r} = -r^x(p_y, p_r, p_m) - r^y(p_y, p_r, p_m), \tag{2.65} \]

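where rx (py , pr , pm ) and ry (py , pr , pm ) denote the optimal policy (demand) functions for raw
cocoa in experimentation and production, so the right side is minus the total demand for raw
cocoa. V (py ; ·, pm ) is the upper envelope of functions that are affine, and hence convex, in pr ,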

h(pr ; rx , ry , mx , my ) = py f (g(rx , mx ), ry , my ) − pr (rx + ry ) − pm (mx + my ).

So Theorem 2.2 implies V (py ; ·, pm ) is convex. Therefore, its derivative (the left side of
(2.65)) is increasing in the price of raw cocoa. We conclude that the right side is also
increasing, and hence the raw cocoa demand is decreasing in the price of cocoa.
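The conclusion of part (iii) can be illustrated with a brute-force sketch of problem (2.60). The functional forms for f and g, the price values and the input grid below are hypothetical choices made for this example; grid search stands in for the calculus. Because the argument only uses the upper-envelope structure, total cocoa demand should be weakly decreasing in the cocoa price even on a coarse grid.

```python
from itertools import product

GRID = [0.25 * i for i in range(11)]   # candidate input levels 0, 0.25, ..., 2.5

def g(rx, mx):
    """Knowledge produced by experimentation (assumed form)."""
    return (rx * mx) ** 0.5

def f(k, ry, my):
    """Chocolate output; more knowledge k wastes fewer off-cuts (assumed form)."""
    return (1 + k) ** 0.3 * (ry * my) ** 0.25

def profit(choice, py, pr, pm):
    """The objective of (2.60) for one choice (rx, mx, ry, my)."""
    rx, mx, ry, my = choice
    return py * f(g(rx, mx), ry, my) - pr * (rx + ry) - pm * (mx + my)

def cocoa_demand(py, pr, pm):
    """Total cocoa rx + ry at a profit-maximising grid point."""
    best = max(product(GRID, repeat=4), key=lambda c: profit(c, py, pr, pm))
    return best[0] + best[2]

cocoa_prices = [1.0, 1.5, 2.0, 2.5, 3.0]
demands = [cocoa_demand(py=4.0, pr=pr, pm=1.0) for pr in cocoa_prices]
print(demands)   # weakly decreasing as the cocoa price rises
```

The grid keeps the search finite, so the reported demands are approximations to the continuous problem, but the monotonicity does not depend on the grid being fine.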

Question 2.16. ✓ A studio has two artists. Each artist uses time and materials to produce
art. The old artist is twice as productive as the young artist (i.e. given the same amount of
time and materials as the young artist, the old artist produces twice as much art). Both
artists are paid the same wage per hour. Assume that the artists’ production functions are
concave.

(i) Write down the studio’s profit function.

(ii) Write down a Bellman equation for the firm that buries the material and labour choices
inside a value function.

(iii) Which artist would the studio prefer to produce more?

(iv) Draw a graph involving isoquants and isocosts in which the firm allocates the less
productive artist more materials.

(v) If the studio could spend a pound to increase one of the artists’ output by 0.01 paintings,
which artist would it spend it on?

Question 2.17. ✓ An inefficient Australian car manufacturer is unprofitable, so it would like
to bribe some Australian politicians to reduce the car sales tax rate for this manufacturer only.
The firm believes that $1000 of bribes will lead to a one percentage point decrease in the tax
rate on cars. If the firm spends enough, then a subsidy is possible. Cars are manufactured out
of capital and labour according to a concave production function. Assume that the manufacturer
is small, so a tax change does not affect prices.
(i) Write down the firm’s profit function (without bribes).

(ii) Write down the firm’s profit function with bribes, incorporating your answer from the
first part into a Bellman equation.

(iii) What is the marginal profit of bribes?

(iv) Do bribes increase the firm’s output?

(v) The firm would like to give the politician an argument to rationalise cutting tax rates.
One suggestion was: perhaps cutting the tax rate would increase the firm’s employment
of Australian workers. Is this necessarily the case?
For more questions of this kind, see the following practice exam questions: 2.v, 2.vi, 8.iii.c,
8.iii.d, 22.iii, 23.iv, 24.a.iv, 26.iv, 26.v, 27.a.iii, 28.iii, 29.a.iv.

2.6 *Production Technology Sets


The production function formulation of technology is unable to capture simultaneous
production of several goods. For example, if two companies that make two different things merge,
two production functions would be needed to represent the merged company’s feasible choices. A
more abstract way of representing technology is with production plan sets. A production
plan is a vector y ∈ R^N , where y_i denotes the net output of good i. If y_i < 0, then good i
is a factor of production. The firm can choose any production plan from Y ⊂ R^N , the set of
feasible production plans.
Previously, we studied production functions, which can only have a single output good. If a
production set Y has only one output (say, the first good), then the corresponding production
function can be written as

f (x) = max {y : (y, −x) ∈ Y } . (2.66)
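To see (2.66) at work, consider a minimal sketch with a small finite production set. The set below is made up for illustration, with one output and one input, so each plan is a pair (output, −input).

```python
# A finite production set (assumed for illustration): plans are (output, -input).
Y = {(0, 0), (1, -1), (2, -1), (3, -2), (3, -3)}

def production_function(x, plans=Y):
    """f(x) = max{y : (y, -x) in Y}; returns None if no plan uses exactly x."""
    outputs = [y for (y, neg_x) in plans if neg_x == -x]
    return max(outputs) if outputs else None

print(production_function(1))   # 2, because (2, -1) beats (1, -1)
```

The plan (1, −1) is feasible but wasteful, and the production function discards it by keeping only the largest output for each input level.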

For example, think about making toast. If I want to make one slice, I just take a slice
of bread out of my fridge, and put it in the toaster. So perhaps I should say there are two
commodities, toast and bread (and write N = 2), and I have a technology y = (1, −1) that
transforms the input bread (−1) into an output toast (1). But what if I want to make 100 or
100,000 slices of toast? Then I will need to borrow more toasters and watch my power bills!

So, perhaps I need four commodities (N = 4) – toast, bread, electricity, and toasters,
and I have a technology y = (1, −1, −1, −1) that transforms a slice of bread, a unit of
electricity, and a toaster, into some toast. If I want to make 100 slices of toast, I have a
technology for that too: y ′ = (100, −100, −100, −100). My feasible technology set would be
Y = {(n, −n, −n, −n) : n ∈ R+ }.
However, there is an important difference between bread and toasters. Both are factors of
production, but the production process destroys the bread but not the toaster. The technology
notation we developed only allows us to describe the net outputs of a technology. Is there a
way we can model capital which is not destroyed by production?
One way is to reinterpret Y by saying that commodity y4 refers to the service of using a
toaster for one unit of time (rather than the toaster itself).
Another way is to think about two types of toasters: toasters before production, and toasters
after production (so now N = 5). If we use a toaster for production, then a by-product is a used
toaster. This corresponds to the production technology y ′′ = (1, −1, −1, −1, 1). If we leave a
toaster idle, then we also get to keep it, which gives the technology y ′′′ = (0, 0, 0, −1, 1).
Since we can use or leave idle any number of toasters, the feasible technology set is
Y ′ = {a(1, −1, −1, −1, 1) + b(0, 0, 0, −1, 1) : a, b ∈ R+ }.
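The set Y ′ is generated by two basic plans, which makes it easy to sketch in code. The helper names below (plan, in_Y_prime) are made up for this illustration; the commodity order is toast, bread, electricity, toasters before production, toasters after production.

```python
# Commodity order: toast, bread, electricity, toasters before, toasters after.
USE = (1, -1, -1, -1, 1)    # y'': use one toaster to make one slice of toast
IDLE = (0, 0, 0, -1, 1)     # y''': leave one toaster idle

def plan(a, b):
    """The plan a * y'' + b * y''' with a >= 0 toasters used and b >= 0 left idle."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    return tuple(a * u + b * i for u, i in zip(USE, IDLE))

def in_Y_prime(y, tol=1e-9):
    """Check whether y = a * USE + b * IDLE for some a, b >= 0."""
    toast, bread, power, before, after = y
    a = toast                 # only USE produces toast
    b = after - toast         # remaining returned toasters must have been idle
    return (a >= -tol and b >= -tol
            and abs(bread + a) <= tol
            and abs(power + a) <= tol
            and abs(before + a + b) <= tol)

print(plan(3, 2))                       # (3, -3, -3, -5, 5)
print(in_Y_prime((1, -1, -1, -1, 0)))   # False: the used toaster must be returned
```

The second call shows that the original four-commodity plan, extended with zero toasters after production, is not in Y ′ , because in this formulation every toaster that goes in comes back out.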
A technology y ∈ Y is efficient if there is no other feasible technology y ′ ∈ Y such that
y ′ > y (i.e. one that either produces more outputs or uses fewer inputs).
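For a finite set of plans, efficiency can be checked directly. The sketch below assumes that y ′ > y means y ′ is at least as large as y in every coordinate and strictly larger in at least one; this reading is an assumption consistent with the parenthetical remark. The example set is made up, with plans written as (output, −input).

```python
def dominates(z, y):
    """z > y: at least as large in every coordinate, strictly larger in at least one."""
    return (all(zi >= yi for zi, yi in zip(z, y))
            and any(zi > yi for zi, yi in zip(z, y)))

def efficient_plans(plans):
    """Plans in the list that no other plan in the list dominates."""
    return [y for y in plans if not any(dominates(z, y) for z in plans)]

Y = [(0, 0), (1, -1), (2, -1), (3, -2), (3, -3)]
print(efficient_plans(Y))   # [(0, 0), (2, -1), (3, -2)]
```

The plan (1, −1) is dropped because (2, −1) produces more with the same input, and (3, −3) is dropped because (3, −2) produces the same output with less input.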
Question 2.18. ✓ Write down a feasible technology set for cleaning up toxic waste with these
properties: (1) the waste must be transferred to a waste dump with a limited capacity, (2)
cleaning up pollution requires chemicals and the services of engineers in proportion to the
amount of waste, (3) engineers are unable to work in teams of more than 10 people.
Question 2.19. ✓ Write down a feasible technology set for putting on a comedy show with
these properties: (1) before doing the show, the comedian must use some time to prepare
an act, (2) the comedian can put on one show each day of one week, and (3) each day, the
comedian may hire a small or large theatre.
Question 2.20. ✓ It seems wasteful to use two toasters to make one slice of toast. So, if our
model is good, using a redundant toaster should be inefficient. Is this the case in the two
formulations of the toasting technology?
