Arlie O. Petters, Xiaoying Dong-An Introduction To Mathematical Finance With Applications - Understanding and Building Financial Intuition-S
Arlie O. Petters
Xiaoying Dong
An Introduction
to Mathematical
Finance with
Applications
Understanding and Building Financial Intuition
Springer Undergraduate Texts in Mathematics
and Technology
Series Editors:
J.M. Borwein
H. Holden
V.H. Moll
Arlie O. Petters, Department of Mathematics, Duke University, Durham, NC, USA
Xiaoying Dong, Department of Mathematics, Duke University, Durham, NC, USA
viii Preface
fer a theoretical treatment without many applications and those that simply
present and apply formulas without appropriately deriving them. Indeed, the-
oretical understanding is incomplete without enough practice in applications,
and applications are risky without a rigorous theoretical understanding. To
accomplish this, the book contains numerous carefully chosen examples and
exercises that reinforce a student's conceptual understanding and develop a
facility with applications. Indeed, the exercises are divided into conceptual,
application, and theoretical problems that probe the material deeper.
Second, beyond a few required undergraduate mathematics courses (see
Prerequisites below), this book is essentially self-contained. The large num-
ber of necessary financial terminologies and concepts can be overwhelming
to a student new to finance. For this reason, after introducing some central,
big-picture financial ideas in the first chapter, we present the financial minu-
tia along the way as needed. We have tried to make the book self-contained
in this regard through thoughtfully chosen illustrative applications starting at
the ground level with simple interest. We then gradually increase the difficulty
as the book develops, ranging across compound interest, annuities, portfolio theory,
capital market theory, portfolio risk measures, the role of linear factor models in portfo-
lio risk attribution, binomial tree models, stochastic calculus, derivatives, the martin-
gale approach to derivative pricing, the Black-Scholes-Merton model, and the Merton
jump-diffusion model.
Third, the book is also useful for students preparing either for higher level
study in mathematical finance or for a career in actuarial science. For example,
the syllabi for the actuarial Financial Mathematics Exam (Exam 2/FM) and
Models of Financial Economics Exam (Exam 3F/MFE) include many topics
covered in the book.
Prerequisites
Audience
The text is aimed at advanced undergraduates and master's degree students who are
either new to finance or want a more rigorous treatment of the mathematical
models used in finance. The students typically are from economics, mathemat-
ics, engineering, physics, and computer science.
We also believe that a faculty member who is teaching finance for the first
time will find this introduction readily manageable. Professionals working in
finance who would like a refresher or even clarification on some of the the-
oretical and conceptual aspects of mathematical finance will benefit from the
text.
The chapters are organized naturally into four parts and range over the fol-
lowing topics:
- Part I (Chapters 1 and 2):
introduction to securities markets and the time value of money
- Part II (Chapters 3 and 4):
Markowitz portfolio theory, capital market theory, and portfolio risk measures
- Part III (Chapters 5 and 6):
modeling underlying securities using binomial trees and stochastic calculus
- Part IV (Chapters 7 and 8):
derivative securities, BSM model, and Merton jump-diffusion model
The material was tested in courses offered to upper-level undergraduates and
master's degree students. Below are two examples of possible topics that may
serve as a guide for semester-long courses:
- Introduction to Mathematical Finance: securities markets (Chapter 1), the time
value of money (Chapter 2), Markowitz portfolio theory, capital market
theory, and portfolio risk measures (Chapters 3 and 4), binomial security
pricing (Chapter 5, omit most derivations), Itô's formula and geometric
Brownian motion (Sections 6.8 and 6.9), forwards, futures, and options
(Sections 7.2, 7.3, and 7.5), and call option pricing with applications
(Sections 8.3, 8.2.2, 8.5, and 8.6.2).
- Introduction to Financial Derivatives: modeling underliers in discrete time
(Sections 5.1 to 5.3), stochastic calculus and modeling underliers in continuous
time (Section 5.4 and Chapter 6), general aspects of forwards, futures, swaps,
and options, including trading strategies (Chapter 7), the Black-Scholes-Merton
(BSM) model, BSM p.d.e. approach to pricing European-style options,
risk-neutral approach to pricing European-style options, applications
to warrants, delta hedging, managing portfolio risk, and extension of the
BSM model to the Merton jump-diffusion model (Chapter 8).
Acknowledgments
Special thanks to the following individuals for their feedback and assistance:
Daniel Aarhus Lu Liu Chi Trinh
Amir Aazami Ruisi Ma Dan Turtel
Stanley Absher Tanya Mallavarapu Kari Vaughn
Vibhav Agarwal Xavier Mela Robert Vanderbei
Hengjie Ai Vadim Mokhnatkin Kevin Wan
Mitesh Amarthaluru Julia Ni Chenyu Wang
Vlad Bouchouev James Nolen David Williams
Michael Brandt Vivek Oberoi Chao Xu
Esteban Chavez Feng Pan Hangjun Xu
Rui Chen Chloe Peng Lu Xu
Kyuwon Choi Junkai Xue
Hal Press
Qian Deng Hui Qi Chao Yang
Christian Drappi Zhaozhen Qian Jiahui Yang
Zachary Freeman Hayagreev Ramesh Ashley Yeager
Tingran Gao Emma Rasiel Jeong Yoo
William Grisaitis Tianhua Ren Yanchi Yu
Xiaosheng Guo Chelsea Richwine Yunliang Yu
Zhonglin Han Irving Salvatierra Javier Zapata
John Hyde Andrew Schretter Xiaodong Zhai
Yuhang Si
Huseyin Kortmaz Biyuan Zhang
Baolei Li John Sias
Yang Zhang
Junchi Li Maxwell Stern
Lingran Sun Bowen Zhao
Li Li Ruiyang Zhao
Nan Li Alberto Teguia
Qiao Li Xiaoyang Zhuang
Nicholas Tenev
Li Liang Dominick Totino Zilong Zou
We are also thankful to Elizabeth Loew of Springer for her support and guid-
ance along the entire way and to Lisa Goldberg for her valuable comments and
constructive suggestions. AP is indebted to Duke University for providing
the financial support needed to hire many students who assisted with writ-
ing computer codes, checking calculations, etc. He is also extremely grateful to
his wife, Elizabeth Petters, for her patience, love, and steadfast encouragement
throughout the project. XD would like to express her gratitude to her husband
Xin Zhou who saw her through this book and offered great suggestions.
Chapter 1
Preliminaries on Financial Markets
Interest rates are a key concept in economics. The level of interest rates plays
an extremely important role in a market economy and therefore in financial
markets as well.
Banks can be classified into central banks, investment banks, and commercial
banks.
3 Due to the Fed's quantitative easing (QE) programs in the last few years, which were large-scale
purchases of assets from banks that drove up the volume of excess reserves to unprecedented
levels, presently many US banks can meet their reserve requirements without borrowing.
1.1 A Primer on Banks and Rates 3
Example 1.1. The following discussion is designed to help the reader under-
stand how the fractional reserve banking system works and establish an intu-
ition of the concepts of money, credit/debt, and leverage.
When JD1 (John Doe number 1) deposits $10,000 into his bank account, if
the bank's reserve ratio is 10%, then the bank needs to reserve only $1,000 and
may lend the remaining $9,000 out. Let's say JD2 (John Doe number 2) gets the
$9,000 loan to buy a car, and the car dealer puts the $9,000 back into another
bank. The second bank reserves $900 and lends out $8,100 to JD3. Theoretically
speaking, this series of financial maneuvers can go on to JD4, JD5, and so on,
forever. As a result, the original $10,000 (cash) could generate $100,000 (cash
and credit/debt), for
\[
\sum_{n=0}^{\infty} 10{,}000\,(90\%)^n = \frac{10{,}000}{1 - 0.9} = 100{,}000.
\]
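The geometric series in Example 1.1 can be checked numerically. The following is a minimal sketch; the function names `money_after_rounds` and `money_limit` are illustrative and not from the text. It assumes every loan is redeposited in full and every bank holds exactly the stated reserve ratio.

```python
def money_after_rounds(deposit: float, reserve_ratio: float, rounds: int) -> float:
    """Total deposits after a finite number of deposit-and-lend rounds."""
    total = 0.0
    current = deposit
    for _ in range(rounds):
        total += current                  # this round's deposit
        current *= 1.0 - reserve_ratio    # the part the bank may lend onward
    return total


def money_limit(deposit: float, reserve_ratio: float) -> float:
    """Closed-form limit of the geometric series: deposit / reserve_ratio."""
    return deposit / reserve_ratio


# Partial sums approach the limit as the chain JD1, JD2, JD3, ... grows.
for rounds in (1, 2, 3, 10, 50, 200):
    print(f"{rounds:>3} rounds: {money_after_rounds(10_000, 0.10, rounds):>12,.2f}")
print(f"limit     : {money_limit(10_000, 0.10):>12,.2f}")
```

The partial sums increase toward the limit but never exceed it, which is why the $100,000 figure is a theoretical ceiling rather than an amount reached after finitely many loans.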
Interest rates (and bond yields5) affect all financial markets. Among them,
there are short-term (fewer than 12 months) rates, there are long-term (usually
10 years or longer) rates, and there are some in between.
Since short-term interest rates and long-term interest rates are determined by
different mechanisms, we consider them separately.
Short-Term Rates
Short-term rates are administered by the FOMC in the USA (and by central
banks in other nations).
The fed funds rate not only is important to banks but also has trickle-down
effects that affect investors and consumers (or end users). This rate is also
used as the benchmark for other short-term interest rates.
When pricing a loan, a lender often uses the formula
\[
\text{loan rate} = \text{index rate} + \text{spread},
\]
where the size of the spread (or margin) depends on how risky the lender feels
the loan is; the riskier the loan, the bigger the spread (or the higher the margin).
The often used index rates (or base rates) are the prime rate, LIBOR, or COFI,
depending on which of the following loans is under consideration:
- variable-rate credit card credits,
- variable-rate auto loans,
- variable-rate student loans,
- home equity lines of credit (HELOC),
- adjustable-rate mortgages (ARM),
- small business loans,
- personal loans.
Here are the descriptions of prime rate, LIBOR, and COFI:
4 Source: http://azizonomics.com/2011/11/15/zombie-economics/
5 See Section 2.10.
The prime rate, as a generic term, is the interest rate that the banks charge
their most credit-worthy customers; it primarily refers to the Wall Street
Journal Prime Rate which is the consensus prime rate published by the
WSJ after polling ten of America's largest banks. The prime rate moves
up or down in lock step with changes by the FOMC. Normally, it runs
approximately 300 basis points (or 3 percentage points) above the fed funds
target rate, since the break-even lending rate of interest for banks making
loans is essentially the same as the prime lending rate.
The prime rate is the most popular index rate used in the USA.
The London Interbank Offered Rates (LIBOR) can be described as a daily refer-
ence of the wholesale cost of money in the London interbank money market
of 18 banks.6 Loosely speaking, LIBOR is like an international counterpart
of the fed funds rate. However, unlike the fed funds rate, there are many
different LIBOR rates with maturities ranging from overnight to 12 months.
The LIBOR is frequently the basis of investments including interest swap
agreements and forward contracts,7 and for many adjustable mortgage
loans as well. American banks use LIBOR because LIBOR is updated much
more frequently than the US prime rate as we mentioned earlier; this has
advantages particularly when global credit market conditions deteriorate
rapidly.
The 11th District Cost of Funds Index (COFI) is an index rate primarily used
in the western USA to set the cost of variable-rate loans and is computed
by using data from three western states, Arizona, California, and Nevada
(which are covered by the 11th district).
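The loan-pricing discussion above can be made concrete with a short sketch. It reads the pricing rule as "loan rate = index rate + spread", which is an interpretation of the text, and it uses the stated rule of thumb that the prime rate runs about 300 basis points above the fed funds target rate. The fed funds target of 0.25% and the 150 basis point HELOC spread are hypothetical values chosen only for illustration.

```python
BASIS_POINT = 0.0001  # one basis point is 0.01 percentage points


def loan_rate(index_rate: float, spread_bps: float) -> float:
    """Loan rate as an index (base) rate plus a spread quoted in basis points."""
    return index_rate + spread_bps * BASIS_POINT


fed_funds_target = 0.0025                        # hypothetical 0.25% target rate
prime_rate = loan_rate(fed_funds_target, 300)    # rule of thumb from the text
heloc_rate = loan_rate(prime_rate, 150)          # hypothetical margin over prime

print(f"prime rate: {prime_rate:.2%}")
print(f"HELOC rate: {heloc_rate:.2%}")
```

Quoting spreads in basis points keeps the risk premium separate from the index, so a change in the index rate passes through to the loan rate without re-pricing the margin.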
Long-Term Rates
6 These 18 banks include Bank of America, Barclays, Credit Suisse, Deutsche Bank, HSBC, and JP Morgan
Chase. For a complete list, visit the website at https://en.wikipedia.org/wiki/Libor.
7 See Sections 7.2 and 7.4.1.
Since there are different interest rates based on various terms as we explained
earlier, it is desirable to represent all these interest rates simultaneously. A
graphical representation of where interest rates are today is called a yield curve.
More precisely, a yield curve is the graph of y = r(t), where t represents time to
maturity for bonds of the same asset class and credit quality (e.g., US Trea-
suries, LIBOR, zeros,8 or AA-rated corporate bonds, and so on), and y is the
corresponding yield to maturity. In short, a yield curve is a plot of bond yields
to maturity against times to maturity.
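A yield curve can be sketched as a table of (time to maturity, yield) points together with a rule for reading yields between quoted maturities. In the sketch below, the numbers in `CURVE` are hypothetical and do not come from any market data. Linear interpolation with flat extrapolation is one simple choice among many, not a method prescribed by the text.

```python
from bisect import bisect_left

# Hypothetical (time to maturity in years, yield to maturity) points.
CURVE = [(0.25, 0.010), (1.0, 0.015), (2.0, 0.020),
         (5.0, 0.028), (10.0, 0.035), (30.0, 0.042)]


def yield_at(t: float, curve=CURVE) -> float:
    """Read y = r(t) off the curve by linear interpolation between quotes."""
    maturities = [m for m, _ in curve]
    if t <= maturities[0]:
        return curve[0][1]     # flat extrapolation below the shortest maturity
    if t >= maturities[-1]:
        return curve[-1][1]    # flat extrapolation beyond the longest maturity
    j = bisect_left(maturities, t)          # first quoted maturity >= t
    (t_lo, y_lo), (t_hi, y_hi) = curve[j - 1], curve[j]
    weight = (t - t_lo) / (t_hi - t_lo)
    return y_lo + weight * (y_hi - y_lo)


for t in (0.5, 1.5, 7.0, 20.0):
    print(f"t = {t:>4} years: y = {yield_at(t):.3%}")
```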
Example 1.2. In daily financial news, the Treasury yield curve is often shortened
to the yield curve. The yield curve is considered a leading indicator of economic
activity (see Section 1.3) and often used as a reference point for forecasting
interest rates by investors.
The zero-coupon yield curve or the spot-rate curve is created by plotting the
yields of zero-coupon treasury bills against their corresponding maturities.
The primary use of zero (or spot) interest rates is to discount cash flows.9
Market structures are defined by the trading rules and trading systems, which
include what information (e.g., orders and quotations) traders can see. A concise
interpretation of the financial jargon includes (but is not limited to) the
following:
- Securities are financial products10 that can be traded on securities markets.
- Securities markets are the trade execution venues.
- Clearing houses are responsible for settling trades.
- Securities depositories are responsible for holding security certificates.
- Brokers arrange trades for their clients (including clearing and settlement).
- Dealers trade with their clients and are obligated liquidity providers.
Securities markets may be classified into two levels. One is the primary market
or new issue market,11 and the other is the secondary market or after market.
8 Conceptually speaking, the most basic debt instrument is a zero-coupon bond or simply a zero, which
is a bond with a single cash flow equal to face value at maturity.
9 See Chapter 2 for the concept of discount cash flows.
10 See Section 7.1.1 for more detailed explanation. Examples of securities are stocks and bonds.
11 The market where new securities are issued.
1.2 A Primer on Securities Markets 7
Primary markets deal with the trading of newly issued securities, whereas
secondary markets deal with the trading of securities that have already been
issued in the primary market (i.e., existing securities). Thus, after market
may be interpreted as after new issue market.
Secondary markets are organized in two basic ways: exchanges and
over-the-counter (OTC) markets.
Exchanges are highly organized and centralized markets where securities
are traded. Exchanges started as physical places, trading floors, where trading
took place (although the advent of ECNs, or electronic trading, has eliminated
the need for such traditional floors), whereas OTC markets are less formal,
have never been a physical place, and connect broker-dealers only electroni-
cally.
Examples of exchanges are the New York Stock Exchange (NYSE) and the
NASDAQ Stock Exchange (NASDAQ), where the majority of larger US public
companies are traded.
Examples of OTC markets are over-the-counter bulletin board (OTCBB), an
electronic trading service, and pink sheets (a quotation service). In general,
stocks that are traded on OTC are considered to be more speculative than
stocks that are listed on exchanges.
NYSE is an order-driven market (it functions like an auction market), whereas
OTC, by contrast, is a quote-driven dealer market.
For a private company to go public, issue shares, and be traded on an exchange
thereafter, it needs12 to choose an exchange on which to be listed, which
means that it must be able to meet that exchange's listing requirements (among
other satisfactions13).
If a listed stock, at a later point in time, fails to comply with the exchange's
listing requirements, it will be delisted, i.e., removed from the exchange on
which the stock was issued. After a stock is officially delisted, normally it will
be traded on the OTC markets, mainly either on OTCBB or on pink sheets.14
Hence by the very organization of the secondary securities markets, an in-
ternal quality control mechanism has been put in place. Exchanges with list-
ing requirements are motivated to ensure that only high-quality securities are
traded on them and to uphold the exchange's reputation among investors.
12 Otherwise, a public company is traded on OTC and the stock is referred to as unlisted stock, whereas
13 Listing requirements usually include minimum stockholders' equity, a minimum share price, and a minimum
number of shareholders. The standards vary by exchange. For example, listing on the NASDAQ is
considerably less expensive than listing on the NYSE, which partially explains why newer companies
often opt for the NASDAQ if they meet its requirements.
14 A stock traded on pink sheets is considered to be riskier than that on OTCBB. In general, a stock
A limit buy/sell order is a trade instruction with a limit bid/ask price and a size
(i.e., a quantity). A liquidity pool usually consists of a large number of limit or-
ders which cannot be matched currently. These orders are referred to as current
nonmarketable orders. A new order is called nonmarketable with respect to such
15 The US Securities and Exchange Commission (SEC) defines a market maker as a firm that stands
ready to buy and sell stock on a regular and continuous basis at a publicly quoted price. (Source:
http://www.sec.gov/answers/mktmaker.htm)
16 See Chapter 8 for details.
a pool if this order cannot be matched in the pool. As a result, this new order
will be added to the pool and this is called adding liquidity to the pool. On the
other hand, if the new order can be matched in the pool immediately, the new
order is filled and the size of the pool is reduced and therefore this is called
taking liquidity from the pool. Usually, liquidity is said to be high if the bid-ask
spread is small, and the ask size and bid size are large.
Given time t during market hours, let b(t) and a(t) denote the best bid price
and the best ask price at time t, respectively; the bid-ask spread at time t is
a(t) - b(t). In other words, the current bid-ask spread or simply the spread is
the amount by which the current lowest ask price exceeds the current highest
bid price.
Example 1.3. Suppose that dealer quotations for MSFT, the ticker symbol for
Microsoft, show that the best bid price is $50 and the best ask price is $50.02.
Then the bid-ask spread is $0.02.17 If you are an impatient trader, then you
either have to buy at price $50.02 or have to sell at price $50 at best. If a
dealer can buy one million shares of Microsoft at the best bid price and sell
them at the best ask price, then he or she will make $20,000 in a short period
of time.
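The spread definition and the figures in Example 1.3 can be reproduced with a small sketch. The function names are illustrative. The matching rule in `is_marketable` (a buy order matches at or above the best ask, a sell order at or below the best bid) is a standard convention assumed here rather than stated in the text. `Decimal` is used so that prices quoted in cents are handled exactly.

```python
from decimal import Decimal


def bid_ask_spread(best_bid: Decimal, best_ask: Decimal) -> Decimal:
    """Spread a(t) - b(t): lowest ask price minus highest bid price."""
    return best_ask - best_bid


def is_marketable(side: str, limit_price: Decimal,
                  best_bid: Decimal, best_ask: Decimal) -> bool:
    """Assumed rule: a marketable order can be matched in the pool at once
    (taking liquidity); otherwise it joins the pool (adding liquidity)."""
    if side == "buy":
        return limit_price >= best_ask
    if side == "sell":
        return limit_price <= best_bid
    raise ValueError("side must be 'buy' or 'sell'")


def round_trip_profit(best_bid: Decimal, best_ask: Decimal, shares: int) -> Decimal:
    """Dealer's gain from buying at the bid and selling at the ask."""
    return bid_ask_spread(best_bid, best_ask) * shares


bid, ask = Decimal("50.00"), Decimal("50.02")
print("spread:", bid_ask_spread(bid, ask))
print("dealer profit on 1,000,000 shares:", round_trip_profit(bid, ask, 1_000_000))
print("buy limit at 50.01 marketable?",
      is_marketable("buy", Decimal("50.01"), bid, ask))
```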
Liquidity plays a central role in the functioning of securities markets. In fact,
market liquidity is the single most important characteristic of well-functioning
markets.
Although there is no specific liquidity formula, the size of the (bid-ask) spread
may be used as a rule of thumb for a trader to measure market liquidity, since
a maximum spread rule is the most common affirmative obligation of desig-
nated market makers. The smaller the spread and the larger the size, the more
liquid the market is.
If the definition of a security market risk is the standard deviation of the
security return, then liquid markets are less risky than illiquid ones as liquid
markets are less volatile than illiquid ones.
A list of trading costs includes, but is not limited to, the following:
3. Difference between the short-term capital gain rate and the long-term capital
gain rate (as the former is usually much higher than the latter)
In practice, trading costs cannot be neglected at all and an active investor must
scrutinize the per trade cost.
An economic cycle (or business cycle) is the term used to describe certain pat-
terns of wide fluctuations in economic activities followed by economies: an
expansion, until a peak, followed by a contraction, and until a trough. It is
called a cycle because this pattern repeats: the trough phase is then followed
by expansion phase, peak phase, contraction phase, and trough phase again,
to compose another cycle.
An economic variable is a random variable whose sample space consists
of economic-related events. Often used economic variables are population,
poverty rate, available resources, dividend yield, inflation rate, imports and
exports, etc. An economic variable that reveals the direction in which the econ-
omy is moving (i.e., signs of contraction or expansion) is an economic indicator.
Just as the tense of a verb group can be classified into future, present, or past,
economic indicators can be classified into leading, coincident, and lagging in-
dicators:
- Those that change before the economy changes are called leading indicators,
e.g., new factory orders for consumer durable goods and the difference
between interest rates at two different maturities (e.g., term spread).
- Those that occur at the same time as the related economic activity are called
coincident indicators, e.g., GDP (gross domestic product19), nonfarm payrolls,
and retail sales.
- Those that only become apparent after the related economic activity are
called lagging indicators, e.g., CPI (consumer price index20) and the
unemployment rate.
The US Census Bureau always releases economic indicators on schedule.
Online economic calendars provide convenient access to many types of in-
formation, including economic indicator announcements with forecasts; the
definition of each indicator, with prior, prior revised and actual numbers,
19 Gross domestic product is the monetary value of all the finished goods and services produced
within a country in a specific time period.
20 Consumer price index is a measure of average change over time in the prices paid by urban con-
sumers for a market basket of consumer goods and services, such as transportation, food, and medical
care.
1.3 Economic Indicators That May Affect Financial Markets 11
Table 1.1 Example of a simplified version of Bloomberg's domestic economic calendar for the
business week beginning January 28, 2013. Source: based on the Bloomberg table at
http://www.bloomberg.com/markets/economic-calendar.
Mon Jan 28 Tue Jan 29 Wed Jan 30 Thu Jan 31 Fri Feb 1
Farm Prices 3:00pm
Fed Balance Sheet 4:30pm
Money Supply 4:30pm
highlights and reflections on daily stock market focus; bond auction informa-
tion; and other relevant events. A simplified economic calendar example is
provided in Table 1.1.
You may have heard the expression, "A dollar today is worth more than a dollar
tomorrow," which holds because a dollar today has more time to accumulate interest.
The time value of money deals with this basic idea more broadly, whereby
an amount of money at the present time may be worth more than in the fu-
ture because of its earning potential. In this chapter, we discuss the valuing
of money over different time intervals, which includes a study of the present
value of future money and the future value of present money. The theory is
laid out in a rigorous, detailed, and general framework and accompanied by
numerous applications with direct relevance to personal finance.
To be self-contained for readers new to finance, Sections 2.1 to 2.5 intro-
duce our conventions and terminologies associated with time, interest rates,
required return rates, total return rates, simple interest, compound interest for
integral and nonintegral periods, and generalized compound interest, where
the interest rate and compounding period vary. Readers already familiar with
these topics should skim those sections for our notational usage. In Sec-
tion 2.6, we introduce the net present value and internal return rate, including
Descartes's Rule of Signs. The theory of annuities is presented in Section 2.7
and includes amortization theory and annuities with varying payments and
varying interest rates. Applications of annuity theory to saving, borrowing,
equity in a house, sinking funds, the present value of preferred and common
stocks, and bond valuation are given in Sections 2.8 to 2.10.
2.1 Time
Before delving into the value of money over time, it is important to be clear
about our conventions and notation for time.
Throughout the book, the default unit of time is a year. Unless stated to the con-
trary, assume that a year consists of 365 calendar days and 252 trading days.1
When designating time, assume that there is a fixed starting time relative to
which the other moments of time are defined. The explicit choice of starting
time will depend on the context of the application, but we shall always repre-
sent it by 0. Note that the starting time need not be the current time.
We employ the following notation:
Note that a general moment of time t > 0 simultaneously designates the number
of years of elapsed time from 0 to the given moment. For example, writing
\[
t_0 = \frac{1}{4}, \qquad t = \frac{1}{2}
\]
means that the current time is 3 months after the starting time and t is 3 months
from now. If October 1, November 1, and December 1 in 2015 mark the times
0, t_1, and t_2, respectively, then
\[
t_1 = \frac{1}{12}, \qquad t_2 = \frac{1}{6}.
\]
We shall distinguish between an interval of time, say, [t_0, t_f], and its time span
τ, which is the length of the interval:
\[
\tau = t_f - t_0.
\]
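The time conventions above can be illustrated with a short sketch that converts calendar dates to times measured in years from a chosen starting date. It assumes, as the October, November, and December example does, that each whole month counts as 1/12 of a year and that all dates fall on the same day of the month. The helper names are illustrative.

```python
from datetime import date
from fractions import Fraction


def whole_months_between(start: date, end: date) -> int:
    """Whole months from start to end (assumes the same day of the month)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def time_in_years(start: date, moment: date) -> Fraction:
    """Time of `moment` measured in years from the starting time 0."""
    return Fraction(whole_months_between(start, moment), 12)


start = date(2015, 10, 1)                     # time 0
t1 = time_in_years(start, date(2015, 11, 1))
t2 = time_in_years(start, date(2015, 12, 1))
time_span = t2 - t1                           # length of the interval [t1, t2]

print("t1 =", t1, " t2 =", t2, " time span =", time_span)
```

Exact fractions are used instead of floating-point numbers so that the times agree with the values 1/12 and 1/6 without rounding.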
1 Apart from being mindful of leap years, note that banks may use a 360-day year when computing
their charge on loans. Any deviation from a 365-day year will be stated explicitly.
2.2 Interest Rate and Return Rate 15
You are perhaps most familiar with interest as the rate a bank pays into your
savings account (where you lend the bank money) or the rate a bank charges
you for a loan (where the bank lends you money). Overall, interest is the cost
of money. It is the compensation received for lending or investing money. The
initial amount of money you lend or borrow is called the principal and will be
denoted by F0. Henceforth, assume that money invested, whether in a savings
account or in a start-up company, is money lent with the expectation of
receiving back more than the amount invested (principal plus interest).
The compensation for lending or the charge for borrowing a principal F0 is
typically expressed as a percent r of F0 per year:
compensation or charge per year = r F0 .
The percent r is called the annual interest rate or the quoted rate; e.g., a 5% per
annum interest rate means r = 0.05. By default, all interest rates will be on or
converted to a per annum basis. For this reason, we sometimes refer to r simply
as the interest rate rather than the annual interest rate. Interest rates appear in
numerous settings: savings accounts, certificates of deposit, credit cards, auto
loans, mortgages, treasuries, bonds, etc.
Remark 2.1. Bear in mind that the interest rate used for lending need not equal
the interest rate employed for borrowing. However, in later modeling, we shall
assume that the two rates are equal (e.g., see page 84).
Though r is constant by default, later in the chapter (e.g., Section 2.5), we shall
study models where r varies discretely and continuously with time. When the
interest rate r is a function of time, it is common practice to express this as
r(t), an abuse of notation that should not cause undue confusion.
Interest rates can, of course, be quoted for any time span (week, month,
etc.). For example, an interest rate of 12% per year is mathematically the same
as 1% per month. More generally, if we divide a year into k equal-size interest
periods, then
\[
\text{interest rate per interest period} = \frac{r}{k}.
\]
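As a quick check of these conventions, the sketch below computes the yearly compensation r F0 and the rate per interest period r/k. The function names and the $1,000 principal are illustrative choices, not taken from the text.

```python
def annual_charge(principal: float, annual_rate: float) -> float:
    """Compensation or charge per year: r * F0."""
    return annual_rate * principal


def periodic_rate(annual_rate: float, periods_per_year: int) -> float:
    """Interest rate per interest period when a year has k equal periods."""
    if periods_per_year <= 0:
        raise ValueError("periods_per_year must be a positive integer")
    return annual_rate / periods_per_year


print(f"charge per year on $1,000 at 5%: {annual_charge(1_000, 0.05):.2f}")
print(f"12% per year as a monthly rate : {periodic_rate(0.12, 12):.2%}")
print(f"12% per year as a weekly rate  : {periodic_rate(0.12, 52):.4%}")
```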
16 2 The Time Value of Money
The exact time of a time interval measures the length of the interval in days,
but excludes the first day. Exact interest is interest computed using 365 days
in a year or 366 days for leap years. Credit card companies tend to use exact
time and exact interest. Ordinary interest is interest calculated using 360 days
in a year with 30 days in each month. Banks usually lend using exact time and
ordinary interest, which has come to be known as the Banker's Rule.
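The difference between exact interest and the Banker's Rule can be seen on a single loan. The sketch below assumes the simple interest formula, interest = principal × rate × time, which is treated later in the chapter. As a simplification, it takes the length of the year from the start date's calendar year. The loan amount, rate, and dates are hypothetical.

```python
import calendar
from datetime import date


def exact_time(start: date, end: date) -> int:
    """Length of the interval in days, excluding the first day."""
    return (end - start).days


def exact_interest(principal: float, annual_rate: float,
                   start: date, end: date) -> float:
    """Exact time with a 365-day year (366 if the start year is a leap year)."""
    days_in_year = 366 if calendar.isleap(start.year) else 365
    return principal * annual_rate * exact_time(start, end) / days_in_year


def bankers_rule_interest(principal: float, annual_rate: float,
                          start: date, end: date) -> float:
    """Banker's Rule: exact time combined with a 360-day year."""
    return principal * annual_rate * exact_time(start, end) / 360


start, end = date(2015, 3, 1), date(2015, 6, 1)   # hypothetical loan dates
print("exact time (days):", exact_time(start, end))
print(f"exact interest   : {exact_interest(10_000, 0.06, start, end):.2f}")
print(f"Banker's Rule    : {bankers_rule_interest(10_000, 0.06, start, end):.2f}")
```

Because the same number of days is divided by 360 instead of 365, the Banker's Rule always produces a slightly larger interest charge for the borrower.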
We always assume that when an investor commits her money for a specific
period of time, whether to a security, portfolio, or start-up, she expects to be
compensated. An investor's required rate of return over an investment period is
then the interest rate the investor demands as compensation for the following:
Opportunity cost: Since lending prevents an investor from using that money
for other investment opportunities, the investor requires compensation for
her money being tied up.
Inflation: Since inflation erodes the value of money, the investor requires
compensation that covers the impact of inflation.
Risk: Since there is a nonzero probability that earnings promised to the in-
vestor will not materialize or that the investor can lose some or all of her
money, the investor requires compensation for the risks of the investment.
Unless stated to the contrary, we assume that no compensation to cover taxes and
transaction costs is part of a required return rate. It is messy to include these
items in an introduction to mathematical finance, not to mention that tax
laws and transaction costs change. Readers are referred to Reilly and Brown
[16, Chap. 1] for a detailed discussion of the required return rate.
In the absence of inflation and risk, the required return rate is called the real
risk-free rate and denoted r_real. It is a compensation purely for opportunity cost.
If there is no risk, but you have inflation and an opportunity cost, then the
required return rate is termed the nominal risk-free rate or, simply, the risk-free
rate. When the real risk-free rate is intended as opposed to the risk-free rate,
we shall indicate so explicitly.
There is a simple relationship among r_real, r, and the inflation rate i. Assume
that you invest F0 in a riskless asset over 1 year. Your required return rate is r,
which compensates you for opportunity cost and inflation. Specifically, your
compensation for opportunity cost a year from now is r_real F_0. However, a year
from now, the value of your compensation r_real F_0 for opportunity cost will
reduce by i (r_real F_0) due to inflation. Furthermore, your initial investment will
also reduce in value by i F_0 due to inflation. Your required return amount
r F_0 beyond your initial investment should then be
\[
r F_0 = i F_0 + r_{\text{real}} F_0 + i\,(r_{\text{real}} F_0).
\]
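Dividing the relation above by the principal gives r = i + r_real + i r_real, which is algebraically the same as 1 + r = (1 + i)(1 + r_real). The sketch below evaluates both directions of this relation. The 2% real rate and 3% inflation rate are hypothetical inputs, and the function names are illustrative.

```python
def nominal_rate(real_rate: float, inflation_rate: float) -> float:
    """r = i + r_real + i * r_real, from the relation derived above."""
    return inflation_rate + real_rate + inflation_rate * real_rate


def real_rate_from(nominal: float, inflation_rate: float) -> float:
    """Invert 1 + r = (1 + i)(1 + r_real) to recover the real risk-free rate."""
    return (1.0 + nominal) / (1.0 + inflation_rate) - 1.0


r = nominal_rate(real_rate=0.02, inflation_rate=0.03)   # hypothetical inputs
print(f"nominal risk-free rate: {r:.4%}")
print(f"recovered real rate   : {real_rate_from(r, 0.03):.4%}")
```

The cross term i r_real is small for low rates, which is why the nominal rate is often approximated by the sum of the real rate and the inflation rate.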
In other words, your investment would grow from $20,000 to $20,800 over 1
year. The return rate R(0, 1) on your investment over 1 year is the fractional
percentage change
\[
R(t_0, t_f) = \frac{F(t_f) - F(t_0)}{F(t_0)}.
\]
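The dollar figures quoted above, $20,000 growing to $20,800 over 1 year, give a quick numerical check of the fractional change formula. The function name `return_rate` is illustrative.

```python
def return_rate(initial_value: float, final_value: float) -> float:
    """Fractional change (F(t_f) - F(t_0)) / F(t_0) over the interval."""
    if initial_value == 0:
        raise ValueError("initial value must be nonzero")
    return (final_value - initial_value) / initial_value


print(f"R(0, 1) = {return_rate(20_000, 20_800):.2%}")
```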
\[
V(t) \ge 0 \qquad (t \ge 0). \tag{2.1}
\]
Assume that the investment pays a per-unit cash dividend of D(t_0, t_f) during
the interval [t_0, t_f); e.g., a cash payout per share by a company to
shareholders.
Several clarifying remarks are needed about cash dividends:
- For simplicity, we do not include any cash dividend at t_f, but tally it as part
of the subsequent time interval starting at t_f.3
- It is also common practice to assume that D(t_0, t_f) excludes any income
such as interest from the cash dividend during [t_0, t_f). This is not a serious
concern for sufficiently short investment time intervals. We also exclude
complications like share splits and noncash payouts.
- When an investment pays out a cash dividend, it has lost value by the
amount of the dividend. The market value V(t_f) is then the ex-dividend (without
dividend) value and the cum-dividend (with dividend) value is
\[
V^c(t_f) = V(t_f) + D(t_0, t_f).
\]
2 A dividend does not have to be in the form of cash. It can be a stock dividend; e.g., a company can
pay you additional (typically, fractional) shares for each share of company stock you own.
3 This bookkeeping for the cash dividend makes it convenient mathematically when considering rein-
vesting dividends to buy more units of the investment over consecutive time intervals.
\[ \underbrace{R(t_0, t_f)\, V(t_0)}_{\text{return amount}} = \underbrace{V(t_f) - V(t_0)}_{\text{capital gain}} + \underbrace{D(t_0, t_f)}_{\text{cash dividend}}. \]
The spread \(V(t_f) - V(t_0)\) is called a capital gain. Note that a negative capital
gain is a capital loss. Equivalently,
\[ R(t_0, t_f) = \underbrace{\frac{V(t_f) - V(t_0)}{V(t_0)}}_{\text{capital-gain return}} + \underbrace{\frac{D(t_0, t_f)}{V(t_0)}}_{\text{dividend yield}} = \frac{V^c(t_f) - V(t_0)}{V(t_0)}. \tag{2.2} \]
This is called the total rate of return or holding-period return of the investment
from t0 to t f . We shall often refer to R(t0 , t f ) simply as the return rate and
at times will even refer to R(t0 , t f ) as the return when it is clear from the con-
text that a rate is intended as opposed to the return amount R(t0 , t f ) V (t0 ).
Note that if your ownership in the investment consisted of n units (shares),
then the return rate is still given by (2.2) since the numerator and denominator
of each term would be multiplied by n and so n would drop out.
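As a small numerical illustration of (2.2), the sketch below computes the return rate from a start value, an end value, and a cash dividend. The function name and the sample prices are hypothetical and are not taken from the text.

```python
def holding_period_return(v_start: float, v_end: float, dividend: float = 0.0) -> float:
    """Total rate of return (2.2): capital-gain return plus dividend yield."""
    if v_start <= 0:
        raise ValueError("the initial value must be positive")
    capital_gain_return = (v_end - v_start) / v_start
    dividend_yield = dividend / v_start
    return capital_gain_return + dividend_yield


# Hypothetical per-unit figures: bought at 50, ex-dividend value 54, dividend 1.
per_unit = holding_period_return(50.0, 54.0, 1.0)

# Holding n units scales every term by n, so the rate is unchanged.
n = 200
position = holding_period_return(n * 50.0, n * 54.0, n * 1.0)
assert abs(per_unit - position) < 1e-12
print(f"return rate: {per_unit:.2%}")
```

The second call checks the remark that the number of units drops out of the rate.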
Notation. When the return rate depends on the length \(\tau = t_f - t_0\) of \([t_0, t_f]\) rather than
on the location of \([t_0, t_f]\) on the positive time axis \([0, \infty)\), we set
\[ R(t_0, t_f) = R(\tau). \]
The ratio \(D(t_0, t_f)/V(t_0)\) in (2.2) is called the dividend yield and represents the
per-unit cash dividend from the investment as a percent of the initially invested
capital \(V(t_0)\). Additionally, we refer to the ratio \(V(t_f)/V(t_0)\) as the gross return from \(t_0\)
to \(t_f\) (see footnote 4). It expresses the final value \(V(t_f)\) as a percent of the initial value \(V(t_0)\).
4 Some authors call \(V(t_f)/V(t_0)\) the return rate, but we shall not abide by that usage.
Example 2.1. Suppose that after 1 year, the return rate on your investment is
50%. Then the gain to you, beyond your initial investment, is 50% of your
initial investment. If the return rate is \(-100\%\), then you have a complete loss.
If the return rate is 200%, then your gain is twice the initial investment, i.e.,
your initial investment tripled in value over the year.
Finally, observe that the return rate becomes random if the future value
V (t f ) and/or the cash dividend D (t0 , t f ) is random. Almost all the return
rates we encounter in this chapter are nonrandom, while all the return rates
in Chapter 3 are random.
A principal of $1,000 held for a year at a 12% interest rate earns simple interest
of $120 at the end of 1 year. This amount is the same as adding 12 monthly
interest amounts of $10, each of which is obtained from a monthly interest rate of 1%.
For a time span of \(\tau\) years, if we assume that the interest rate \(r\) is applied only
to the principal \(F_0\), then
If an annual simple interest rate is applied over multiple years (or periods)
to a principal, then at the end of each year (or period), interest is applied only to
the principal and the entire balance is reinvested back into the account. In other
words, all interest accrued at the end of each period or year is carried forward
without gaining interest. Under simple interest growth at rate \(r\), a principal \(F_0\)
increases to the following amount at \(\tau\) years from the present:
\[ F(\tau) = F_0 + r\tau F_0 = (1 + r\tau)\, F_0. \tag{2.4} \]
Example 2.2. Suppose that an account has $700 and pays 4% per annum. Applying
a 4% annual simple interest growth to the $700 for 1 year yields an
interest of 0.04 × $700 = $28 and a total amount accrued of
\[ \$700 + 0.04 \times \$700 = \$728. \]
To obtain simple interest growth of $700 over 2 years, we add to the principal
a simple interest of 0.04 × $700 at the end of the first year and simple
interest of 0.04 × $700 at the end of the second year:
\[ \$700 + 0.04 \times \$700 + 0.04 \times \$700 = \$756, \]
2.4 Compound Interest 21
or, equivalently,
\[ \$756 = (1 + 0.04 \times 2) \times \$700. \tag{2.5} \]
Note that a 4% annual simple interest growth applied to $700 for 2 years is the
same as applying 8% per 2 years.
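The arithmetic of Example 2.2 can be reproduced with a few lines of code. This is only a sketch of formula (2.4); the function name `simple_interest_value` is an illustrative choice, not notation from the text.

```python
def simple_interest_value(principal: float, rate: float, years: float) -> float:
    """Future value under simple interest, F(tau) = (1 + r*tau) * F0, as in (2.4)."""
    return (1 + rate * years) * principal


one_year = simple_interest_value(700.0, 0.04, 1)   # interest of 28 on 700
two_years = simple_interest_value(700.0, 0.04, 2)  # two interest amounts of 28

# 4% per annum for 2 years matches a single 8% rate applied once.
assert abs(two_years - simple_interest_value(700.0, 0.08, 1)) < 1e-9
print(f"{one_year:.2f} {two_years:.2f}")
```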
Investing $700 under simple interest growth of 4% per annum yields a future
value of $756 2 years from now. Conversely, the present value of $756 under 4%
annual simple interest discounting is $700. In general, if at the current time,
you invest (or borrow) a principal \(F_0\) under simple interest growth at an interest
rate \(r\) applied over \(\tau\) years, then the amount of money you receive (or owe)
at the end of the time span is called the future value of \(F_0\) and given by
\[ \text{future value of } F_0 \text{ at } \tau \text{ years from now} = F(\tau) = (1 + r\tau)\, F_0. \tag{2.6} \]
\[ R(\tau) = \frac{F(\tau) - F_0}{F_0} = r\tau. \tag{2.8} \]
We saw above that under simple interest for 2 years, an account with $700 at
4% per annum will grow to
\[ \$700 + 0.04 \times \$700 = \$728 \tag{2.9} \]
at the end of the first year and to $756 at the end of second year, after the
interest of $28 for the second year is added. However, there is a way to accu-
mulate more money over the same 2 years using the same simple interest rate.
Assume that at the end of the first year, you withdrew the $728, closed the
account, and immediately used the $728 as principal to open another simple
interest account paying the same interest rate. Then a year later, i.e., at the end
of the second year, the total you would accrue is
\[ \$728 + 0.04 \times \$728 = \$757.12, \tag{2.10} \]
which is greater than the original total of $756! This type of growth is called
compound interest. In fact, an annually compounded account earning 4% per
annum over 2 years would earn you the latter amount without you needing to
engage in the previous inconvenient strategy.
Using (2.9), we can rewrite (2.10) as
\[ \$700 + 0.04 \times \$700 + 0.04 \times (\$700 + 0.04 \times \$700) = \$757.12. \tag{2.11} \]
Equation (2.11) summarizes exactly how the growth process works: annual
compounding of a principal of $700 over 2 years at an interest rate of 4% means
that one applies 4% simple interest to $700 at the end of the first year and then
applies 4% simple interest again at the end of the second year to the entire
balance (principal plus interest) carrying forward from the end of the first year.
Rewriting (2.11) as
\[ \$757.12 = (1 + 0.04)^2 \times \$700 \tag{2.12} \]
yields the standard form for two annual compounds.
Let us extend (2.12) to a finite number of compoundings. In general, com-
pound interest occurs when the time span is divided into multiple periods, and
simple interest is applied over each period to the balance at the end of the pe-
riod. We assume that the entire balance at the end of each period is reinvested back into
the total being accrued, i.e., no money is withdrawn and no extra money is added. For
mathematical modeling purposes, we also treat the end of a period as equiva-
lent to the start of the next period.
Unless stated to the contrary, assume that the date when the prin-
cipal is deposited coincides with the start of an interest period.
Following the structure of (2.12), we now compute the future value to which
the principal F0 will grow under k-periodic compounding at interest rate r over
n interest periods, where n is a nonnegative integer. Since n periods correspond
to n/k years, the future value at the end of the nth period is F(n/k). However,
in compound interest theory, the emphasis is on the number n of periods over
which compounding occurs, rather than the number of years. For this reason,
the future value is written as a function of the number of periods as follows:
\[ F\!\left(\frac{n}{k}\right) = F_n. \]
At the end of the first period, apply simple interest to \(F_0\) to obtain the future
value \(F_1\) to which \(F_0\) grows over the first period:
\[ F_1 = F_0 + \frac{r}{k} F_0 = \left(1 + \frac{r}{k}\right) F_0. \]
Now, do not take out any of the money. Instead, reinvest the entire amount
F1 in the account at the end of the first period until the end of the second
period.
At the end of the second period, apply simple interest to \(F_1\) to get the future
value \(F_2\) to which \(F_1\) grows over the second period:
\[ F_2 = F_1 + \frac{r}{k} F_1 = \left(1 + \frac{r}{k}\right)^2 F_0. \]
Note that compound interest occurs since interest was added to the whole
F1 , yielding interest on the principal F0 and interest on the interest (r/k) F0 .
Next, reinvest the entire amount F2 in the account at the end of the second
period until the end of the third period.
Continuing the above process, at the end of the \(n\)th period, apply simple
interest growth to \(F_{n-1}\) to obtain the future value \(F_n\) to which \(F_{n-1}\) grows
over the \(n\)th period:
\[ F_n = F_{n-1} + \frac{r}{k} F_{n-1} = \left(1 + \frac{r}{k}\right)^n F_0, \qquad n = 1, 2, \dots. \]
We have established the following: Under k-periodic compounding over n interest
periods at an interest rate r, a principal \(F_0\) will increase to the value \(F_n\) at the end
of the nth interest period:
\[ F_n = \left(1 + \frac{r}{k}\right)^n F_0, \qquad n = 0, 1, 2, \dots, \tag{2.13} \]
where \(r/k\) is the periodic interest rate. Observe that \(F_n\) depends on the size of the
time interval over which the compounding occurs. This is because the interest
rate is constant for the n periods.
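The period-by-period construction and the closed form (2.13) can be compared directly. The sketch below is illustrative only; both function names are assumptions introduced for this example.

```python
def compound_value(principal: float, rate: float, k: int, n: int) -> float:
    """Closed form (2.13): F_n = (1 + r/k)**n * F_0."""
    return (1 + rate / k) ** n * principal


def compound_value_stepwise(principal: float, rate: float, k: int, n: int) -> float:
    """Apply simple interest at the periodic rate r/k to the whole balance, n times."""
    balance = principal
    for _ in range(n):
        balance += (rate / k) * balance  # interest on principal and on prior interest
    return balance


# The $700 account at 4% per annum, compounded annually for 2 years.
closed = compound_value(700.0, 0.04, k=1, n=2)
stepwise = compound_value_stepwise(700.0, 0.04, k=1, n=2)
assert abs(closed - stepwise) < 1e-9
print(f"{closed:.2f}")
```

The loop mirrors the reinvestment steps in the text, while the closed form is what one would normally use in practice.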
However, it may concern the reader that compounding occurs over the first
15 mth, but then stops during the remaining 0.36 mth, and is replaced by sim-
ple interest growth. We claim that the latter is actually an approximation of the
exact mathematical compounding that should be applied during the partial
month. We apply fractional compounding to \(F_{15}\) during the remaining 0.36 mth,
which gives
\[ F_{15.36} = F_{15+0.36} = \left(1 + \frac{0.1}{12}\right)^{0.36} F_{15} = \$11{,}359{,}503.48. \tag{2.15} \]
In this example, we see that the accrued total in (2.14) is higher by $90.19 than
the total in (2.15) obtained from exact modeling. The bank would be paying
more interest if (2.14) is used.
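The two totals being compared can be recomputed as follows. This sketch assumes, as the surrounding text describes, a principal of $10,000,000 at 10% per annum compounded monthly, and that (2.14) applies simple interest over the final 0.36 month; the function names are hypothetical.

```python
def fractional_compounding(principal: float, rate: float, k: int, periods: float) -> float:
    """Exact model: (1 + r/k)**x * F_0 for a real number x of periods."""
    return (1 + rate / k) ** periods * principal


def simple_interest_tail(principal: float, rate: float, k: int, periods: float) -> float:
    """Compound over the whole periods, then simple interest over the leftover fraction."""
    whole = int(periods)
    fraction = periods - whole
    f_whole = (1 + rate / k) ** whole * principal
    return (1 + fraction * rate / k) * f_whole


exact = fractional_compounding(10_000_000.0, 0.10, 12, 15.36)
approx = simple_interest_tail(10_000_000.0, 0.10, 12, 15.36)
print(f"exact:  {exact:,.2f}")
print(f"approx: {approx:,.2f}")
print(f"excess paid under the approximation: {approx - exact:,.2f}")
```

The approximation always pays at least as much as the exact model for a fraction between 0 and 1, because \((1+y)^{\varepsilon} \le 1 + \varepsilon y\) for \(y \ge 0\) and \(0 \le \varepsilon \le 1\).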
We now present a theoretical basis for (2.15) and the approximation used
in (2.14). First, we shall introduce the key defining mathematical property of
compound interest as in the treatment by Kellison [10, Sec. 1.5]. For an integral
number of interest periods, Equation (2.13) shows that
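\[ F_{m+n} = \left(1 + \frac{r}{k}\right)^{m+n} F_0 = \left(1 + \frac{r}{k}\right)^{m} \left(1 + \frac{r}{k}\right)^{n} F_0, \]
where m and n are nonnegative integers. We denote the compound interest
growth function over n interest periods by
\[ G(n) = \left(1 + \frac{r}{k}\right)^{n}, \]
where
\[ G(0) = 1, \qquad G(1) = 1 + \frac{r}{k}, \qquad G(n) > 1 \ \text{ for } n = 1, 2, \dots. \]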
The inequality G (n) > 1 for positive integers n means that the principal will
increase for compounding over at least one interest period. The compound
interest growth function satisfies:
\[ G(m + n) = G(m)\, G(n). \tag{2.16} \]
to the value
\[ F_x = G(x)\, F_0 \]
by k-periodic compounding over x interest periods at interest rate r if the growth
function \(G(x)\) satisfies the following properties:
\[ F_{x+y} = G(x + y)\, F_0 \]
\[ G'(x) = G(x)\, G'(0). \tag{2.18} \]
\[ \frac{d \ln G(x)}{dx} = G'(0), \]
or, equivalently,
\[ d \ln G(x) = G'(0)\, dx. \]
Integrating the equation from 0 to x yields:
\[ \ln G(x) - \ln G(0) = G'(0)\, x. \]
Since \(G(0) = 1\), we have \(\ln G(0) = 0\), so
\[ \ln G(x) = G'(0)\, x. \tag{2.19} \]
Theorem 2.1. Under k-periodic compound interest at r per annum over a time span
of x interest periods, where x is a nonnegative real number, a principal \(F_0\) will increase
to the following future value at the end of the time span:
\[ F_x = \left(1 + \frac{r}{k}\right)^{x} F_0, \qquad \left(0 \le \frac{r}{k} < 1,\ x \ge 0\right), \tag{2.20} \]
where k = 1, 2, . . . .
as the present value of Fx . The interest rate r is applied as a growth rate in the
future valuing of (2.20) and as a discount rate in the context of (2.21). Since x
interest periods is \(x/k\) years, the future value \(F_x\) occurs \(x/k\) years from now,
i.e.,
\[ F_x = F\!\left(\frac{x}{k}\right). \]
k
The number x of interest periods can always be expressed as the sum of an
integral number n of interest periods and a fraction \(\varepsilon\) of an interest period:
\[ x = n + \varepsilon, \]
where n is the integer part of x (the greatest integer not exceeding x) and \(0 \le \varepsilon < 1\).
For example, x = 15.36 interest periods splits into a sum of n = 15 and \(\varepsilon = 0.36\) interest periods.
We can then rewrite (2.20) as
\[ F_x = \left(1 + \frac{r}{k}\right)^{n} F_{\varepsilon} = \left(1 + \frac{r}{k}\right)^{\varepsilon} F_n, \qquad \left(0 \le \varepsilon < 1,\ 0 \le \frac{r}{k} < 1\right). \tag{2.22} \]
Here \(F_{\varepsilon}\) is the amount to which \(F_0\) grows over the fraction \(\varepsilon\) of an interest
period, i.e., we have fractional compounding during the fraction \(\varepsilon\) of a period:
\[ F_{\varepsilon} = \left(1 + \frac{r}{k}\right)^{\varepsilon} F_0. \]
For a proper fractional period, i.e., for \(0 < \varepsilon < 1\), the leftmost equality in (2.22)
states that the fractionally compounded amount \(F_{\varepsilon}\) is compounded over n interest
periods, and the rightmost equality captures that the accrued amount
\(F_n\) is compounded over the fraction \(\varepsilon\) of an interest period. The left equality
applies to settings where the start of the time span does not coincide with the
beginning or end of an interest period, while the right equality is for when the
end of the time span is not the beginning or end of an interest period.
The rightmost equality in (2.22) also shows that if the interest rate per interest
period \(r/k\) is sufficiently small, expanding the binomial series yields:
\[ \left(1 + \frac{r}{k}\right)^{\varepsilon} \approx 1 + \varepsilon\, \frac{r}{k}. \tag{2.23} \]
The amount accrued at the end of x interest periods can then be approximated
as follows:
\[ F_x = \left(1 + \frac{r}{k}\right)^{\varepsilon} F_n \approx \left(1 + \varepsilon\, \frac{r}{k}\right) F_n, \qquad \left(0 \le \varepsilon < 1,\ 0 \le \frac{r}{k} \ll 1\right). \tag{2.24} \]
Example 2.5. Returning to the example from the start of this section (page 24),
Equation (2.20) shows that a principal of $10,000,000 compounded monthly at
10% per annum for 15.36 mth will grow to:
\[ F_{15.36} = \left(1 + \frac{0.1}{12}\right)^{15.36} \times \$10{,}000{,}000 = \$11{,}359{,}503.48. \]
Equivalently,
\[ F_{15.36} = \$11{,}359{,}503.48 = \left(1 + \frac{0.1}{12}\right)^{0.36} F_n, \qquad n = 15, \]
which is the form in (2.22) and the origin of (2.15). Equation (2.14) uses simple
interest, rather than fractional compounding, during the remaining 0.36 mth
and is justified by (2.24):
\[ F_{15.36} \approx \left(1 + \varepsilon\, \frac{r}{k}\right) F_n = \left(1 + 0.36 \times \frac{0.1}{12}\right) F_n = \$11{,}359{,}593.67. \]
Example 2.6. (Doubling Your Investment) Suppose that you invest \(F_0\) today
in an account with k-periodic compounding at r per year. Find a formula for
how long it will take you to increase your investment to \(x_0 F_0\), where \(x_0 > 1\).
Does the length of time depend on the initial amount \(F_0\)? In particular, how
long will it take to double an investment of $1,000 using 6% per annum with
daily compounding? What about $2,000? Compare with the time it would take
using simple interest growth at the same interest rate.
Solution. We want to find how many years \(\tau\) it will take to have \(F_0\) grow to
\(F_{k\tau} = x_0 F_0\). By (2.25),
\[ x_0 F_0 = \left(1 + \frac{r}{k}\right)^{k\tau} F_0, \]
which implies that
\[ \tau = \frac{\ln x_0}{k \ln\!\left(1 + \frac{r}{k}\right)} \qquad (r > 0). \]
The time does not depend on the initial amount \(F_0\).
For \(F_0 = \$1{,}000\), \(x_0 = 2\), \(r = 0.06\), and \(k = 365\) (daily), we obtain
\[ \tau = \frac{\ln 2}{365 \ln\!\left(1 + \frac{0.06}{365}\right)} \approx 11.55, \]
so it will take about 11.55 years. Since the time span does not depend on the initial
investment, we obtain the same answer for $2,000. For simple interest growth,
we have \(x_0 F_0 = (1 + r\tau) F_0\), which yields \(\tau = (x_0 - 1)/r \approx 16.67\). The doubling time
under simple interest is about 5.1 years longer.
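The two doubling times in Example 2.6 can be recomputed with the short sketch below. The function names are illustrative assumptions; the formulas are the ones derived in the solution.

```python
import math


def growth_time_compound(x0: float, rate: float, k: int) -> float:
    """Years for F_0 to reach x0*F_0 under k-periodic compounding at rate r > 0."""
    return math.log(x0) / (k * math.log(1 + rate / k))


def growth_time_simple(x0: float, rate: float) -> float:
    """Years for F_0 to reach x0*F_0 under simple interest: (x0 - 1) / r."""
    return (x0 - 1) / rate


t_compound = growth_time_compound(2.0, 0.06, 365)  # daily compounding at 6%
t_simple = growth_time_simple(2.0, 0.06)
print(f"compound: {t_compound:.2f} yr, simple: {t_simple:.2f} yr")
print(f"difference: {t_simple - t_compound:.2f} yr")
```

Neither function takes the principal as an argument, which reflects the observation that the time does not depend on \(F_0\).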
\[ S_t^{c} = e^{q t} S_t, \tag{2.28} \]
where \(S_t\) is the ex-dividend price at t of one unit of the security. We assume that
\(S_t\) models the market price at t since it discounts the cum-dividend price at the
dividend yield rate: \(S_t = e^{-q t} S_t^{c}\). See the discussion on page 18.
This section extends compound interest from a fixed interest rate over a non-
negative real number of compounding periods to discretely varying interest
rates across compounding intervals of different lengths.
We begin with some needed notation. Suppose that you put the amount \(F_0\)
(principal) in an account for a time interval \([t_0, t_f]\), where \(t_0 \ge 0\). Assume that
each compound interest period is \(1/k\) yr. Divide \([t_0, t_f]\) into n subintervals (not
necessarily of the same length), say,
\[ [t_0, t_1],\ [t_1, t_2],\ \dots,\ [t_{i-1}, t_i],\ \dots,\ [t_{n-1}, t_n], \]
with the length of the \(i\)th subinterval written as \(\tau_i\) years, so that
\[ \tau_i \text{ yr} = k\tau_i \text{ prd}, \qquad i = 1, \dots, n. \]
Suppose that k-periodic compounding at \(r_i\) per annum applies during the \(i\)th
interval \([t_{i-1}, t_i]\) for \(i = 1, \dots, n\).
We now determine a formula for the amount to which the principal F0 will
grow at the future time tn .
Special Case
We begin by showing how the interest rate r relates to the return rate in the
context of compound interest.
At time t0 invest an amount F0 > 0 (principal) in an account that grows
under k-periodic compounding at interest rate r. Suppose that the account
pays no dividend. Let \(F(t_f) > 0\) be the value of the principal at a future time
\(t_f = t_0 + \tau\). Since a time span of \(\tau\) years has \(k\tau\) periods, Equation (2.20) on
page 27 yields that the return rate on the principal \(F_0\) is:
\[ R_{CI}(\tau) = \frac{F(t_f)}{F_0} - 1 = \left(1 + \frac{r}{k}\right)^{k\tau} - 1, \tag{2.32} \]
where the subscript CI indicates that the return rate is in the context of compound
interest. Note the dependence on the length \(\tau\) of the time interval \([t_0, t_f]\).
For n periods, the return rate becomes:
\[ R_{CI}\!\left(\frac{n}{k}\right) = \left(1 + \frac{r}{k}\right)^{n} - 1. \tag{2.33} \]
In addition, the interest rate r can be expressed in terms of \(R_{CI}(\tau)\) as follows:
\[ r = \frac{\left(1 + R_{CI}(\tau)\right)^{\frac{1}{k\tau}} - 1}{1/k}. \tag{2.34} \]
Equation (2.32) also shows that growing the initial amount \(V(t_0)\) to the value
\(V(t_f)\) under compounding at interest rate r is the same as growing \(V(t_0)\) to \(V(t_f)\)
under simple interest using the return rate \(R_{CI}(\tau)\) over the time span \(\tau\):
\[ V(t_f) = \left(1 + R_{CI}(\tau)\right) V(t_0) = \left(1 + \frac{r}{k}\right)^{k\tau} V(t_0). \]
The return rate \(R_{CI}(1)\) over a year is also commonly used. Equation (2.32)
yields:
\[ R_{CI}(1) = \left(1 + \frac{r}{k}\right)^{k} - 1, \tag{2.35} \]
which is also called the annual percentage yield (APY) or effective interest rate and
denoted by \(R_{CI}(1) = \text{APY}\). The interest rate r corresponding to \(R_{CI}(1)\) is called
the annual percentage rate (APR) or nominal interest rate and is given by:
\[ \text{APR} = \frac{(1 + \text{APY})^{\frac{1}{k}} - 1}{1/k}. \]
The APR should not be confused with the APY, which involves compounding:
\[ \text{APY} = \left(1 + \frac{\text{APR}}{k}\right)^{k} - 1. \]
For instance, if you are quoted an APR of 12% per annum on a loan, then the
APR arises from a monthly interest rate of APR/12 = 1%. However, since in-
terest on debt typically involves compounding, the APY gives a true reflection
of the interest rate a borrower pays. In this case, the 1% per month interest
compounds to an annual percentage yield of
\[ \text{APY} = (1 + 0.01)^{12} - 1 \approx 12.68\%. \]
Example 2.8. If a credit card company quotes only its APR on the card, say,
10.99%, it can cause a consumer to think that after 1 year, the interest amount
on a balance of $2,500 is
\[ 0.1099 \times \$2{,}500 = \$274.75. \]
However, this is not correct because it assumes simple interest for the year.
Most credit cards compound daily or monthly (and may add fees). The true
interest rate for a 365-day year with daily compounding is given by the APY:
\[ \text{APY} = \left(1 + \frac{0.1099}{365}\right)^{365} - 1 = 11.6148\%. \]
The actual interest amount for the year is then the (effective) return amount:
\[ 0.116148 \times \$2{,}500 = \$290.37, \]
which is more than the amount $274.75 naively inferred from an APR of
10.99%.
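The conversion between APR and APY is easy to script. The sketch below implements the two formulas stated above; the function names are illustrative, and the inputs are the credit card figures of Example 2.8 together with the 12% loan quoted earlier.

```python
def apy_from_apr(apr: float, k: int) -> float:
    """APY = (1 + APR/k)**k - 1 for k compounding periods per year."""
    return (1 + apr / k) ** k - 1


def apr_from_apy(apy: float, k: int) -> float:
    """APR = ((1 + APY)**(1/k) - 1) / (1/k), the inverse of apy_from_apr."""
    return ((1 + apy) ** (1 / k) - 1) / (1 / k)


balance = 2_500.0
apy_card = apy_from_apr(0.1099, 365)    # daily compounding
naive_interest = 0.1099 * balance       # treats the APR as simple interest
actual_interest = apy_card * balance    # effective return amount
print(f"APY: {apy_card:.4%}")
print(f"naive: {naive_interest:.2f}, actual: {actual_interest:.2f}")

# Round trip: recovering the APR from the APY.
assert abs(apr_from_apy(apy_card, 365) - 0.1099) < 1e-9
```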
An argument essentially the same as the one used to derive (2.35) shows that,
given any period return rate \(R_{\text{prd}}\), the return rate over a year with compounding
at rate \(R_{\text{prd}}\) per period is given by:
\[ R_{\text{ann}} = \left(1 + R_{\text{prd}}\right)^{k} - 1, \tag{2.36} \]
where (as usual) a year is assumed to have k periods. For example, a weekly
return rate of 1% annualizes as follows under weekly compounding:
\[ R_{\text{ann}} = (1 + 0.01)^{52} - 1 \approx 67.77\%. \]
We can generalize (2.36) further. First, the return rate (2.33) extends naturally
to compound interest with varying interest rates over a time span of n compounding
periods, where each period is \(1/k\) yr. Assume that the annual interest
rates used for the various n consecutive compounding periods are \(r_1, \dots, r_n\),
i.e., the interest rate over the \(i\)th period is \(r_i/k\). By (2.31) on page 33, the return rate
(2.33) generalizes to:
\[ R_{CI}(t_0, t_n) = \frac{F(t_n)}{F_0} - 1 = \left(1 + \frac{r_n}{k}\right) \left(1 + \frac{r_{n-1}}{k}\right) \cdots \left(1 + \frac{r_1}{k}\right) - 1. \tag{2.37} \]
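Formula (2.37) is a running product of per-period growth factors, which the sketch below evaluates. The function name and the sample rates are hypothetical; with a constant rate the result should agree with (2.33).

```python
def varying_rate_return(annual_rates: list[float], k: int) -> float:
    """Return rate (2.37): product of (1 + r_i/k) over the periods, minus 1."""
    growth = 1.0
    for r_i in annual_rates:
        growth *= 1 + r_i / k  # one compounding period at annual rate r_i
    return growth - 1


# Hypothetical monthly schedule: 12% p.a. in month 1, then 6% p.a. in month 2.
two_months = varying_rate_return([0.12, 0.06], k=12)

# Constant rate check against (2.33): (1 + r/k)**n - 1.
constant = varying_rate_return([0.04] * 8, k=4)
assert abs(constant - ((1 + 0.04 / 4) ** 8 - 1)) < 1e-12
print(f"{two_months:.5f}")
```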
Now, assume that you invest \(F_0\) in a nondividend-paying investment that
has return rate \(R_j^{\text{prd}}\) over the \(j\)th period, where \(j = 1, \dots, n\). Explicitly, if \(V_{j-1}\) and
\(V_j\) are the respective values of the investment at the start and end of the \(j\)th
period, then the return rate is
\[ R_j^{\text{prd}} = \frac{V_j - V_{j-1}}{V_{j-1}}. \]
We have:
\[ F\!\left(\frac{n}{k}\right) = \left(1 + R_{\text{geom}}^{\text{prd}}\right)^{n} F_0. \]
The geometric mean return relates as follows to the total return rate:
\[ R_{\text{tot}} = \left(1 + R_{\text{geom}}^{\text{prd}}\right)^{n} - 1. \tag{2.41} \]
In general, the geometric mean return does not equal the arithmetic mean
return,
2.5 Generalized Compound Interest 37
\[ \overline{R}^{\,\text{prd}} = \frac{1}{n} \sum_{j=1}^{n} R_j^{\text{prd}}. \]
In fact,
\[ R_{\text{geom}}^{\text{prd}} \le \overline{R}^{\,\text{prd}}. \]
The two means coincide when the period return rates \(R_j^{\text{prd}}\) are identical for \(j = 1, \dots, n\).
The example below illustrates these two means; see Reilly and Brown
[16, Sec. 1.2.2] for more.
Example 2.9. (Geometric Mean Return Versus the Arithmetic Mean Return)
Suppose that you initially invest $3,000 in a fund that pays no dividend. Assume
that the investment decreases to $2,000 at the end of 1 year, decreases
from $2,000 to $1,000 from the end of year 1 to the end of year 2, and increases
from $1,000 to $3,000 from the end of year 2 to the end of year 3. Then the total
return rate on your investment over the 3 years is zero.
Let us compare what the arithmetic and geometric mean returns forecast for
the total return rate. The year-to-year return rates over the 3 years are:
correct total return rate. The two measures approximate each other when the
return rates do not change significantly from period to period. The geometric
mean return is usually employed for longer time horizons, where there is more
opportunity for higher volatility.
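The comparison in Example 2.9 can be reproduced numerically. The sketch below takes the geometric mean return to be the per-period rate satisfying (2.41), which is an assumption about the definition since only (2.41) is quoted here; the function names are illustrative.

```python
def period_returns(values: list[float]) -> list[float]:
    """Period return rates (V_j - V_{j-1}) / V_{j-1} from a list of values."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


def arithmetic_mean_return(returns: list[float]) -> float:
    return sum(returns) / len(returns)


def geometric_mean_return(returns: list[float]) -> float:
    """Rate g with (1 + g)**n equal to the product of the factors (1 + R_j)."""
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return growth ** (1 / len(returns)) - 1


values = [3000.0, 2000.0, 1000.0, 3000.0]  # fund values from Example 2.9
rates = period_returns(values)
n = len(rates)
arith = arithmetic_mean_return(rates)
geom = geometric_mean_return(rates)

# Total return forecast by compounding each mean over the n periods.
print(f"arithmetic mean: {arith:.4f}, forecast total: {(1 + arith) ** n - 1:.4f}")
print(f"geometric mean:  {geom:.4f}, forecast total: {(1 + geom) ** n - 1:.4f}")
```

Only the geometric mean reproduces the actual total return of zero; the arithmetic mean is pulled up by the single large gain in year 3.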
Compound interest can also be applied to develop the notion of a net present
value. This tool helps with deciding whether to partake in a particular in-
vestment opportunity. The opportunity can be a project, product line, start-up
company, etc. We assume that with an initial capital, the investment opportu-
nity produces net cash flows, i.e., cash inflows minus cash outflows, at different
future dates.
Unless stated to the contrary, assume that each net cash flow takes
taxes into account.
In addition, when the net cash flow on a particular future date is being esti-
mated, the estimate usually reflects activities over the year leading up to the
date. We shall then consider net cash flows on future dates separated by a year.
Furthermore, over the time span that an investment opportunity is analyzed
for its growth potential, we assume that all the annual net cash flows can be
modeled as arising from annual compounding at a constant interest rate. We
refer to this constant interest rate as the compounding growth (annual) rate from
investing in the opportunity.
For simplicity, we shall write expressions such as NPV(r ) > 0, where it is understood
that the 0 represents a zero amount of cash in the currency of the net cash flows.
Now, an important step in deciding whether to invest in the start-up is to
research the marketplace to find the mean compounding growth rate from
investing in an alternative opportunity with a similar business profile and
risk, e.g., research competitor companies comparable to the start-up in scale,
risk, business sector, etc. For illustration, assume that the mean compounding
growth rate from investing in an appropriate alternative opportunity is estimated
to be:
\[ r_{\text{RRR}} = 15\%. \]
We then take \(r_{\text{RRR}}\) as our required return rate for investing in the start-up.
The current market value of the start-up's projected stream of future net cash
flows is the present value of these net cash flows discounted at the required
return rate of 15%:
\[ \text{PV}(r_{\text{RRR}}) = \frac{\$155{,}000}{1 + 0.15} + \frac{\$215{,}000}{(1 + 0.15)^2} + \frac{\$350{,}000}{(1 + 0.15)^3} = \$527{,}484.18. \]
We can then produce the net cash flows by thinking theoretically of the alternative
opportunity as growing \(A_{\text{RRR}}\) to a future value of \(\text{FV}_{A_{\text{RRR}}}(1)\) at 1 year
out, \(B_{\text{RRR}}\) to a future value of \(\text{FV}_{B_{\text{RRR}}}(2)\) at 2 years out, and \(C_{\text{RRR}}\) to a future
value of \(\text{FV}_{C_{\text{RRR}}}(3)\) at 3 years out:
On the other hand, the credible start-up claims that it can generate the
above future net cash flows with an investment today of less than the amount
$527,484.18 required by the alternative opportunity, namely, with an initial
investment of only
\[ C_0 = \$250{,}000. \]
Naturally, investors will favor the start-up since the amount PV(rRRR ) required
by the alternative opportunity is more expensive than the amount C0 required
by the start-up. In other words, the start-up appears favorable when the net
present value at the market required return rate is positive:
then measures how much cheaper (or more expensive, if the difference were negative)
it is to invest in the start-up than in the alternative opportunity. Of course, any final
decision to invest in a start-up will not rely solely on the NPV, but will be complemented
with a detailed analysis of the start-up's business plan, innovative
products/services, market environment, management team, etc.
If we had NPV(rRRR ) = 0, i.e., the initial capital required by the start-up
to produce the given future net cash flows was the same as that required by
the alternative opportunity, then there would be no extra value received from
investing in the start-up. In this borderline situation, however, some investors
may still invest in the start-up if, for example, it has more long-term promise.
The start-up would not be attractive to investors if NPV(rRRR ) < 0, i.e., if
it costs more to receive the same future net cash flows from the start-up than
from the alternative opportunity.
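The NPV comparison for the start-up can be recomputed as follows. This is a minimal sketch with illustrative function names; the cash flows, the initial capital of $250,000, and the 15% required return rate are the figures used in the example.

```python
def present_value(cash_flows: list[float], rate: float) -> float:
    """Discount annual net cash flows C_1, ..., C_n at the given annual rate."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows, start=1))


def net_present_value(initial_capital: float, cash_flows: list[float], rate: float) -> float:
    """NPV(r) = PV(r) - C_0."""
    return present_value(cash_flows, rate) - initial_capital


flows = [155_000.0, 215_000.0, 350_000.0]
pv = present_value(flows, 0.15)
npv = net_present_value(250_000.0, flows, 0.15)
print(f"PV:  {pv:,.2f}")
print(f"NPV: {npv:,.2f}")

# NPV decision rule from the text: positive means favorable.
verdict = "favorable" if npv > 0 else "borderline" if npv == 0 else "not favorable"
print(verdict)
```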
2.6 The Net Present Value and Internal Rate of Return 41
Indeed, the start-up can achieve these future net cash flows with less initial
capital only if it grows the initial capital at a rate greater than the alternative
opportunity's compounding annual growth rate of 15%. The start-up's compounding
annual growth rate on the initial capital \(C_0\) is called the internal rate
of return (IRR) and denoted \(r_{\text{IRR}}\). To determine the start-up's IRR, we must find
the interest rate \(r_{\text{IRR}}\) that generates the forecasted net cash flows starting from
\(C_0 = \$250{,}000\):
First, separate $250,000 into three amounts given by the present values of the
net cash flows $155,000, $215,000, and $350,000 at the unknown discount rate
\(r_{\text{IRR}}\). The sum of these individual present values is the present value \(\text{PV}(r_{\text{IRR}})\)
of the sequence of net cash flows. Explicitly:
\[ \$250{,}000 = \underbrace{\frac{\$155{,}000}{1 + r_{\text{IRR}}}}_{A_{\text{IRR}}} + \underbrace{\frac{\$215{,}000}{(1 + r_{\text{IRR}})^2}}_{B_{\text{IRR}}} + \underbrace{\frac{\$350{,}000}{(1 + r_{\text{IRR}})^3}}_{C_{\text{IRR}}} = \text{PV}(r_{\text{IRR}}). \tag{2.44} \]
Then the future values at rate \(r_{\text{IRR}}\) of the three portions of the $250,000 in (2.44)
yield the desired future net cash flows. Specifically, the future value of \(A_{\text{IRR}}\) at
1 year out is $155,000, of \(B_{\text{IRR}}\) at 2 years out is $215,000, and of \(C_{\text{IRR}}\) at 3 years
out is $350,000. It suffices then to find the IRR by solving (2.44) for \(r_{\text{IRR}}\). Note
that (2.44) is equivalent to the vanishing of the net present value at the rate
\(r_{\text{IRR}}\):
\[ \text{NPV}(r_{\text{IRR}}) = \text{PV}(r_{\text{IRR}}) - \$250{,}000 = 0. \tag{2.45} \]
Using software, we find that an approximate solution of (2.44) or,
equivalently, (2.45) is:
\[ r_{\text{IRR}} = 0.652811. \]
Note that inserting this IRR into (2.44) actually produces $250,000.04, which,
of course, is not the exact value $250,000 due to the approximate value of \(r_{\text{IRR}}\).
In other words, decomposing the start-up capital approximately as
and future valuing each term by compounding annually at the rate rIRR will
yield the desired stream of net cash flows.
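The text does not say which software or method produced the approximate IRR. As one possible approach, the sketch below solves NPV(r) = 0 by bisection; the function names, the bracketing interval [0, 10], and the tolerance are all assumptions made for this illustration.

```python
def npv(rate: float, initial_capital: float, cash_flows: list[float]) -> float:
    """NPV(r) = -C_0 + sum of C_t / (1 + r)**t over t = 1, ..., n."""
    pv = sum(c / (1 + rate) ** t for t, c in enumerate(cash_flows, start=1))
    return pv - initial_capital


def irr_bisection(initial_capital: float, cash_flows: list[float],
                  lo: float = 0.0, hi: float = 10.0, tol: float = 1e-10) -> float:
    """Find a root of NPV on [lo, hi]; requires a sign change across the bracket."""
    f_lo = npv(lo, initial_capital, cash_flows)
    f_hi = npv(hi, initial_capital, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, initial_capital, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid               # root lies in the left half
        else:
            lo, f_lo = mid, f_mid  # root lies in the right half
    return (lo + hi) / 2


flows = [155_000.0, 215_000.0, 350_000.0]
r_irr = irr_bisection(250_000.0, flows)
print(f"IRR: {r_irr:.6f}")
print(f"residual NPV at the IRR: {npv(r_irr, 250_000.0, flows):.6f}")
```

Bisection is slow but robust here because the NPV is positive at r = 0 and negative for large r, so a root is guaranteed inside the bracket.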
Extend the previous example to a general sequence of net cash flows. Suppose
that you are considering a new investment opportunity requiring an initial
capital of C0 > 0 to generate future net cash flows,
C1 , C2 , ..., Cn ,
As before, denote the required return rate of the new investment oppor-
tunity by rRRR . Recall that rRRR is the mean compounding (annual) growth
rate from investing in an alternative opportunity in the marketplace with busi-
ness profile and risk similar to the new investment opportunity. An NPV-based
decision-making rule about whether to invest in the new opportunity is as follows:
• If \(\text{NPV}(r_{\text{RRR}}) > 0\), then the new investment opportunity is cheaper than the
  alternative investment and so is favorable.
• If \(\text{NPV}(r_{\text{RRR}}) < 0\), then the new opportunity is more expensive and not
  favorable.
• If \(\text{NPV}(r_{\text{RRR}}) = 0\), then the cost of the new opportunity is the same as the
  alternative investment and it is borderline whether to invest.
As noted in Section 2.6.1, even with a robust NPV estimate, a real-world busi-
ness decision about whether to invest in a new opportunity will not use the
NPV as the only measure. One has to factor in the business environment, ex-
perience of the management team, etc.
An IRR of the new investment is a positive solution, r = rIRR , of the following
equation:
\[ 0 = \text{NPV}(r) = -C_0 + \frac{C_1}{1 + r} + \frac{C_2}{(1 + r)^2} + \cdots + \frac{C_n}{(1 + r)^n}. \tag{2.48} \]
Equation (2.48) is equivalent to a real polynomial, so we are seeking the pos-
itive roots of such a real polynomial. Without loss of generality, suppose that
the real polynomial has degree k and is of the following form:5
There is no general formula for the real solutions of (2.49) for all positive inte-
gers k.
Perhaps the most cited general result about the number of positive solutions
of (2.49) is Descartes's Rule of Signs. Before stating this result, we gather some
notation. Let \(N_+\) denote the number of positive solutions of (2.49), where we
count the solutions with multiplicity. For example, the polynomial equation
\[ r^2 - 10r + 25 = (r - 5)^2 = 0, \]
Proof. See Meserve [14, p. 156] and Wang [17] for a proof.
\[ r^5 - r^2 + r - 1 = 0 \]
has three sign changes in its ordered nonzero coefficients: \(+1, -1, +1, -1\). By
Theorem 2.2, this polynomial equation has either 3 or 1 positive solutions.
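Counting sign changes is mechanical, so a short sketch may help. It assumes the usual reading of Descartes's Rule of Signs, namely that the number of positive solutions (counted with multiplicity) equals the number of sign changes minus a nonnegative even integer; both function names are hypothetical.

```python
def count_sign_changes(coefficients: list[float]) -> int:
    """Sign changes in the ordered nonzero coefficients (highest degree first)."""
    nonzero = [c for c in coefficients if c != 0]
    return sum(1 for a, b in zip(nonzero, nonzero[1:]) if a * b < 0)


def possible_positive_root_counts(coefficients: list[float]) -> list[int]:
    """Candidates N_sgn, N_sgn - 2, ... down to 1 or 0."""
    return list(range(count_sign_changes(coefficients), -1, -2))


# r^5 - r^2 + r - 1, written with zeros for the missing r^4 and r^3 terms.
quintic = [1, 0, 0, -1, 1, -1]
print(count_sign_changes(quintic), possible_positive_root_counts(quintic))
```

A single sign change leaves 1 as the only candidate, which is the case exploited for the IRR equation.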
The IRR Equation (2.48) is equivalent to a polynomial equation of the form
(2.49) with ordered coefficients (2.50). By Theorem 2.2, if these ordered coeffi-
cients have one sign change, then there is at most one positive solution. If, in
addition, you can prove that the polynomial equation has at least one positive
solution, then this solution is the unique positive solution and the desired IRR.
In the example of the start-up, Equation (2.45) is equivalent to a cubic equation:
\[ p(r) = -250{,}000\, r^3 - 595{,}000\, r^2 - 225{,}000\, r + 470{,}000 = 0. \tag{2.51} \]
There is one sign change, so there is at most one positive solution. Since \(p(r) > 0\)
at \(r = 0\) and \(p(r) \to -\infty\) as \(r \to \infty\), its graph must cross the positive r-axis,
which means that \(p(r) = 0\) must have at least one positive solution. Hence, the
cubic equation has a unique, positive solution, which is the desired \(r_{\text{IRR}}\). Using
software, we found the approximate positive solution to be \(r_{\text{IRR}} = 0.652811\).
We also observed that for the required return rate of rRRR = 15%, the IRR
criterion to favor the start-up, namely, rIRR > rRRR , is equivalent to the NPV
criterion of NPV(rRRR ) > 0. This is not true in general, but holds in the example
because the function NPV(r ) in (2.45) is strictly decreasing. The next result
shows when the situation of the example holds.
Theorem 2.3.
1) Suppose that all the future net cash flows are positive. Then NPV(r ) is a strictly
decreasing function of r and, if there is an r = rIRR , then rIRR is the only IRR.7
2) If there is an rIRR and NPV(r ) is strictly decreasing, then the IRR and NPV
decision-making criteria are equivalent:
a) rIRR > rRRR if and only if NPV(rRRR ) > 0.
b) rIRR = rRRR if and only if NPV(rRRR ) = 0.
c) rIRR < rRRR if and only if NPV(rRRR ) < 0.
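implies \(N_{\text{sgn}} - N_+\) is a nonnegative even number, i.e., \(N_+ = N_{\text{sgn}} - \text{even}\). In particular, \(N_+\) is either
\(N_{\text{sgn}}\), \(N_{\text{sgn}} - 2\), . . . , \(N_{\text{sgn}} - 2(n - 1)\), or \(N_{\text{sgn}} - 2n\) for some nonnegative integer n.
7 By definition, we assume \(r_{\text{IRR}} > 0\).
Proof.
1) Since all the future net cash flows are positive, differentiating (2.48) with respect to r gives
\[ \frac{d}{dr} \text{NPV}(r) = -\frac{C_1}{(1 + r)^2} - \frac{2\, C_2}{(1 + r)^3} - \cdots - \frac{n\, C_n}{(1 + r)^{n+1}} < 0. \]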
Consequently, the function NPV(r ) is strictly decreasing and, hence, if the
graph of NPV(r ) crosses the positive r-axis, i.e., there is an IRR, the graph
will do so only once, namely, at a unique value rIRR .
2) We are given that an IRR exists: rIRR > 0. Since NPV(r ) is strictly decreasing,
we have r2 > r1 > 0 if and only if NPV(r1 ) > NPV(r2 ). For part a) choose
r2 = rIRR , r1 = rRRR , and observe that NPV(rIRR ) = 0. For part c) choose r2 =
rRRR , r1 = rIRR . Part b) holds since NPV(rIRR ) = 0.
The table shows that with an initial investment of C0 , the start-up is attractive
because it has:
• Positive future net cash flows \(C_1\), \(C_2\), and \(C_3\) that are nontrivial as a percent
  of the initial capital and are nontrivially increasing. In particular, \(C_1\) is more
  than half the initial capital, \(C_2\) is about 86% of the initial capital, and \(C_3\) is
  140% of the initial capital. Moreover, the net cash flow increases by about
  39% from year 1 to 2 and about 63% from year 2 to 3.
• A nontrivially positive NPV of \(\text{NPV}(r_{\text{RRR}}) = \$277{,}484.18\), which
  makes the start-up much cheaper to invest in than a comparable alternative
  opportunity by more than the initial capital.
• A quite large compounding growth rate of \(r_{\text{IRR}} = 65.28\%\) compared to the
  required return rate of \(r_{\text{RRR}} = 15\%\), i.e., the \(r_{\text{IRR}}\) is more than four times
  \(r_{\text{RRR}}\).
For example, if the net cash flows are \(C_1 = 2C_0\) and \(C_2 = -2C_0\), then the
quadratic reduces to one with no real solution: \(r^2 + 1 = 0\). In this case, there
is no IRR.
It is also possible to have multiple IRRs. The quadratic (2.52) has two positive
solutions, say, \(r_1\) and \(r_2\), if and only if the following positivity conditions hold:
\[
2C_0 < C_1, \qquad -\frac{C_1^2}{4C_0} < C_2 < C_0 - C_1, \qquad (C_0 > 0).
\]
Choosing \(C_0 = \$10,000\), \(C_1 = \$25,000\), and \(C_2 = -\$15,620\), we obtain two IRRs:
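The two rates can be recovered numerically. The sketch below assumes the two-period NPV has the form \(-C_0 + C_1/(1+r) + C_2/(1+r)^2\), with \(C_2\) entered as an outflow of $15,620 (a negative number), which is what the positivity conditions require; the variable names are illustrative.

```python
import math

# Cash flows: outlay C0, inflow C1, then an outflow C2 < 0 (sign is an assumption
# consistent with the positivity conditions).
C0, C1, C2 = 10_000.0, 25_000.0, -15_620.0

# NPV(r) = -C0 + C1/(1+r) + C2/(1+r)^2 = 0, multiplied by (1+r)^2 and
# expanded in r, gives  C0 r^2 + (2 C0 - C1) r + (C0 - C1 - C2) = 0.
a, b, c = C0, 2 * C0 - C1, C0 - C1 - C2
disc = b * b - 4 * a * c
r_low = (-b - math.sqrt(disc)) / (2 * a)
r_high = (-b + math.sqrt(disc)) / (2 * a)


def npv(r):
    """Two-period NPV under the sign convention stated above."""
    return -C0 + C1 / (1 + r) + C2 / (1 + r) ** 2


print(f"{r_low:.4%}  {r_high:.4%}")
print(npv(r_low), npv(r_high))  # both should be numerically zero
```

Both roots are positive, so each one qualifies as an IRR under the convention \(r_{\rm IRR} > 0\).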
the end of each time period. When the payments occur at the start of each pe-
riod, we have an annuity due, which will not be treated in the text; see Guthrie
and Lemon [8] and Muksian [15] for an introduction.
An ordinary annuity is called simple if, at the end of each payment period,
both a payment and the simple interest on the balance from the beginning of
the payment period are applied. Note that the entire balance from the previous
period is reinvested. Hence, for a simple ordinary annuity, the total accrued at
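the end of a payment period has the following form:
\[
\text{total accrued} = \text{payment} + \text{previous balance} + \text{simple interest on previous balance}. \tag{2.53}
\]
Here "previous balance" refers to the balance from the end of the previous
payment period, which, recall, we treat mathematically the same as the start of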
the current period. Since the simple interest applied to a previous balance will
yield interest on the principal and interest on the interest, we obtain compound
interest naturally.
Unless stated to the contrary, assume that all loans are simple
ordinary annuities.
The future value of a simple ordinary annuity is the amount to which the se-
quence of payments of the annuity will grow, taking into account appreciation
due to periodic compounding. We shall see that the annuity's future value is
the sum of the end-of-term future values of the individual payments of the
annuity.
Consider a simple ordinary annuity based on k-periodic compounding at
interest rate r. This divides each year into k equal-length payment periods.
Assume that each payment is the same amount P and the annuity has a term
of n periods, where n is a positive integer. The total accrued at the end of the
ith period will be denoted by Si . We shall apply (2.53) to obtain an expression
for the total amount Sn accrued over the n periods:
At the end of the first payment period, a payment P is made. Since there is
no balance from the beginning of this period, the total accrued at the end
of the first period is:
S1 = P .
Reinvest the entire amount S1 in the annuity.
48 2 The Time Value of Money
At the end of the second period, the payment is P , the previous balance
is S1 , and the simple interest earned on the entire reinvested amount S1 is
(r/k)S1 . The total accrued at the end of the second period is then:
\[
S_2 = P + S_1 + \frac{r}{k}\,S_1 = P + \left(1 + \frac{r}{k}\right) P.
\]
Reinvest the entire amount S2 in the annuity.
At the end of the 3rd period, the payment is P , the previous balance is S2 ,
and the simple interest earned on S2 is (r/k)S2 . The total accrued is:
\[
S_3 = P + S_2 + \frac{r}{k}\,S_2 = P + \left(1 + \frac{r}{k}\right) P + \left(1 + \frac{r}{k}\right)^{2} P.
\]
Reinvest S3 .
Continuing the above process, at the end of the nth period, the payment is
P, the previous balance is \(S_{n-1}\), and the simple interest earned on \(S_{n-1}\) is
\((r/k)S_{n-1}\). The total accrued at the end of the nth period is:
\[
S_n = P + S_{n-1} + \frac{r}{k}\,S_{n-1}
\]
or
\[
S_n = P + \left(1 + \frac{r}{k}\right) P + \left(1 + \frac{r}{k}\right)^{2} P + \cdots + \left(1 + \frac{r}{k}\right)^{n-1} P. \tag{2.54}
\]
Equation (2.54) shows that the future value of a simple ordinary annuity is the
sum of each of the payments future valued to the end of the annuity. To see this, in
(2.54) the future values of these payments are shown from right to left. Explicitly,
the 1st payment P is at the end of the first period, so its future value at the
end of term (i.e., end of the nth period) is \((1 + r/k)^{n-1} P\). The 2nd payment P
is at the end of the second period, which has a future value at the end of the
term of \((1 + r/k)^{n-2} P\). The (n − 2)nd payment P has an end-of-term future
value of \((1 + r/k)^{2} P\), and the (n − 1)st payment P has \((1 + r/k) P\). The nth
payment P is at the end of the term so it equals its end-of-term future value.
By (2.54), the sum of these future values is \(S_n\).
The right-hand side of (2.54) has a simpler expression. Applying the geometric sum,
\[
a + ax + \cdots + ax^{m-1} = \frac{1 - x^{m}}{1 - x}\,a, \qquad (m \ge 1,\ x \ne 1), \tag{2.55}
\]
with \(a = P\), \(x = 1 + r/k \ne 1\) (since \(r > 0\)), and \(m = n \ge 1\), we obtain:
\[
S_n = \frac{\left(1 + \frac{r}{k}\right)^{n} - 1}{r/k}\,P \qquad (r > 0,\ n \ge 1).
\]
2.7 Annuity Theory 49
Theorem 2.4. At the end of n periods, the future value of the simple ordinary annuity
with payments P and k-periodic compounding at r per annum is:
\[
S_n = \frac{\left(1 + \frac{r}{k}\right)^{n} - 1}{r/k}\,P \qquad (r > 0,\ n = 1, 2, \dots). \tag{2.56}
\]
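As an informal check of Theorem 2.4, the period-by-period recursion (payment plus previous balance plus simple interest on it) can be compared with the closed form (2.56). This is a minimal sketch; the function names and the sample inputs are assumptions made for illustration.

```python
def future_value_recursive(P, r, k, n):
    """Build S_n period by period: S_i = P + S_{i-1} + (r/k) * S_{i-1}, S_0 = 0."""
    S = 0.0
    for _ in range(n):
        S = P + S + (r / k) * S
    return S


def future_value_closed(P, r, k, n):
    """Closed form S_n = ((1 + r/k)^n - 1) / (r/k) * P, valid for r > 0."""
    i = r / k
    return ((1.0 + i) ** n - 1.0) / i * P


# Illustrative inputs (not from the text): $100 per month at 6% for 5 years.
P, r, k, n = 100.0, 0.06, 12, 60
print(future_value_recursive(P, r, k, n), future_value_closed(P, r, k, n))
```

The two values should agree up to floating-point rounding, and a one-period annuity returns just the single payment P.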
\[
\frac{dS_n}{dr} = \frac{P}{k} + 2\left(1 + \frac{r}{k}\right)\frac{P}{k} + \cdots + (n-1)\left(1 + \frac{r}{k}\right)^{n-2}\frac{P}{k} > 0.
\]
It follows that for \(n \ge 2\), the total amount \(S_n\) accrued over n periods increases as r
increases. Additionally, for n = 2 we have
\[
\frac{d^2 S_2}{dr^2} = 0.
\]
However, if n = 3, 4, ..., then
\[
\frac{d^2 S_n}{dr^2} = 2\,\frac{P}{k^2} + \cdots + (n-1)(n-2)\left(1 + \frac{r}{k}\right)^{n-3}\frac{P}{k^2} > 0.
\]
Hence, for \(n \ge 3\), the total amount \(S_n\) accumulated over n periods accelerates⁹ in
value as the interest rate r increases.
8 If there is only one period, then S1 = P (constant) for all r since the principal is added only at the
end of the first period, but the first interest payment occurs at the end of the second period.
9 That is, \(S_n\) is concave up as a function of r (it has an increasing slope).
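\[
\begin{aligned}
A_n &= \left(1 + \frac{r}{k}\right)^{-1} P + \left(1 + \frac{r}{k}\right)^{-2} P + \cdots + \left(1 + \frac{r}{k}\right)^{-n} P \\
    &= \left(1 + \frac{r}{k}\right)^{-1}\left[\,1 + \left(1 + \frac{r}{k}\right)^{-1} + \left(1 + \frac{r}{k}\right)^{-2} + \cdots + \left(1 + \frac{r}{k}\right)^{-(n-1)}\right] P \\
    &= \frac{\left(1 + \frac{r}{k}\right)^{-1}\left[\,1 - \left(1 + \frac{r}{k}\right)^{-n}\right] P}{1 - (1 + r/k)^{-1}}.
\end{aligned}
\]
The last equality above follows from the geometric series (2.55) with \(a = P\) and
\(x = (1 + r/k)^{-1}\) and \(m = n\). Further simplification yields: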
Theorem 2.5. The present value of a simple ordinary annuity over n periods and with
payments P and k-periodic compounding at r per annum is:
\[
A_n = \frac{1 - \left(1 + \frac{r}{k}\right)^{-n}}{r/k}\,P \qquad (r > 0,\ n = 1, 2, \dots). \tag{2.58}
\]
Theorem 2.5 gives a formula for the amount needed today at interest rate r
in order to be able to pay out the amount P each period for n periods.
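To make Theorem 2.5 concrete, the sketch below discounts each payment individually and compares the total with the closed form (2.58). The function names and sample inputs are illustrative assumptions, not part of the text.

```python
def present_value_sum(P, r, k, n):
    """Sum each payment discounted to today: sum_{j=1..n} P / (1 + r/k)^j."""
    i = r / k
    return sum(P / (1.0 + i) ** j for j in range(1, n + 1))


def present_value_closed(P, r, k, n):
    """Closed form A_n = (1 - (1 + r/k)^(-n)) / (r/k) * P, valid for r > 0."""
    i = r / k
    return (1.0 - (1.0 + i) ** (-n)) / i * P


# Illustrative inputs: $500 per month for 10 years at 5% compounded monthly.
P, r, k, n = 500.0, 0.05, 12, 120
print(present_value_sum(P, r, k, n), present_value_closed(P, r, k, n))
```

The closed form avoids the loop, which matters when the same quantity is evaluated many times, for example inside a root finder.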
Remark 2.3. The present value \(A_n\) is usually denoted by \(a_{\overline{n}|}\) and the discount
factor \((1 + r/k)^{-1}\) by \(v\) in actuarial science.
To understand how n varies with P , we can fix An , r, and k and treat n for-
mally as a function of P given by (2.59). This treatment, of course, will lead
to noninteger values of n, which we round off to find the approximate integer
value. For general values of An > 0, r > 0, and k (nonnegative integer), as P in-
creases, the total number of periods n strictly decreases and the rate of decrease slows
down. In other words, the quantity n as a function of P is convex, i.e., n(P )
is everywhere concave up. Explicitly, though the function n(P ) is a strictly
decreasing function, it has an increasing slope:
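\[
\frac{dn}{dP} = -\,\frac{(r/k)A_n}{\left(1 - \dfrac{(r/k)A_n}{P}\right) P^{2}\,\ln\!\left(1 + \dfrac{r}{k}\right)} < 0 \qquad (r > 0),
\]
\[
\frac{d^{2}n}{dP^{2}} = \frac{\left(2 - \dfrac{(r/k)A_n}{P}\right)(r/k)A_n}{\left(1 - \dfrac{(r/k)A_n}{P}\right)^{2} P^{3}\,\ln\!\left(1 + \dfrac{r}{k}\right)} > 0.
\]
Here we used \(\ln(1 + r/k) > 0\) (since \(r > 0\)) and employed (2.58) to conclude that
\[
1 - \frac{(r/k)A_n}{P} = \left(1 + \frac{r}{k}\right)^{-n} > 0.
\]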
Note that the quantity (r/k)An is the (simple) interest on the loan at the end
of the first period. An example of n as a function of the per-period payment P
is shown in Figure 2.1.
[Figure 2.1: number of periods n (vertical axis; marked values 106, 180, 360) plotted against the monthly payment P (horizontal axis; $1,000 to $2,000).]
Fig. 2.1 The graph shows the total number of periods n as a function of monthly payments P for a
loan of \(A_n = \$162,412\) at 6.25% per annum compounded monthly (k = 12). The loan is paid off in
about 30 years (or 360 months) if the monthly payment is $1,000. Doubling the payments yields a
payoff time of 8 years and 10 months (or 106 months), which is much less than half of the time for a
$1,000 monthly payment.
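The numbers quoted in the caption of Fig. 2.1 can be reproduced by solving (2.58) for n, which gives \(n = -\ln\bigl(1 - (r/k)A_n/P\bigr)/\ln(1 + r/k)\). The sketch below is an illustration under that rearrangement; the function name `periods_to_repay` is hypothetical.

```python
import math


def periods_to_repay(A, P, r, k):
    """Solve A = (1 - (1 + r/k)^(-n)) / (r/k) * P for n (possibly non-integer).

    Requires P > (r/k) * A: the payment must exceed the first period's interest.
    """
    i = r / k
    if P <= i * A:
        raise ValueError("payment does not cover the per-period interest")
    return -math.log(1.0 - i * A / P) / math.log(1.0 + i)


# Loan from the figure caption: $162,412 at 6.25% compounded monthly.
A, r, k = 162_412.0, 0.0625, 12
for P in (1000.0, 1500.0, 2000.0):
    print(P, round(periods_to_repay(A, P, r, k), 1))
```

Equal increases in the payment shorten the payoff time by progressively smaller amounts, which is the convexity of n(P) described above.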
If you put aside the amount An today and have it grow by k-periodic com-
pounding with interest rate r, then after n periods, the initial amount will grow
to Sn . In other words, the initial amount An is the present value of the future
amount Sn under periodic compounding. To see this, note that by (2.56) and
(2.58), we obtain:
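\[
\begin{aligned}
\frac{S_n}{\left(1 + \frac{r}{k}\right)^{n}}
  &= \frac{\left(1 + \frac{r}{k}\right)^{n} - 1}{r/k}\,P\left(1 + \frac{r}{k}\right)^{-n} \\
  &= \frac{1 - \left(1 + \frac{r}{k}\right)^{-n}}{r/k}\,P \\
  &= A_n, 
\end{aligned} \tag{2.61}
\]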
where r > 0. Equation (2.61) shows that an equivalent way of determining the
future value of a simple ordinary annuity is to take the present value of the
sequence of payments and then take the future value of that present value.
We determine the unpaid balance on the principal at the end of each period of
the loan.
For notational simplicity, define
\[
y \equiv 1 + \frac{r}{k}, \qquad (r > 0).
\]
Then by (2.58), each end-of-period payment can be expressed as:
\[
P = \frac{y - 1}{1 - y^{-n}}\,A_n = \frac{y^{n}(y - 1)}{y^{n} - 1}\,A_n. \tag{2.62}
\]
10 While a typical mortgage is a loan used to buy a fixed asset like a house or land, which also secures
the loan, a mortgage used to buy movable property such as a mobile home or operational equipment
that acts as security for the loan is called a chattel mortgage or secured transaction.
At the end of the first period, an interest (r/k)An is added to the starting
balance An and a payout/withdrawal of P is made. The unpaid principal
balance at the end of the first period is:
\[
B_1 = A_n + \frac{r}{k}\,A_n - P = yA_n - P.
\]
At the end of the second period, an interest (r/k)B1 is added to the balance
B1 from the start of the second period and then a payout/withdrawal of P
is made. The unpaid principal balance at the end of the second period is:
\[
B_2 = B_1 + \frac{r}{k}\,B_1 - P = yB_1 - P = y(yA_n - P) - P = y^{2}A_n - (1 + y)P.
\]
At the end of the 3rd period, an interest (r/k)B2 is added to the balance
B2 from the start of the 3rd period and then a payout/withdrawal of P is
made. The unpaid principal balance at the end of the 3rd period is
\[
\begin{aligned}
B_3 &= B_2 + \frac{r}{k}\,B_2 - P = yB_2 - P \\
    &= y\left[\,y^{2}A_n - (1 + y)P\right] - P \\
    &= y^{3}A_n - (1 + y + y^{2})P.
\end{aligned}
\]
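Continuing the above process, at the end of the ℓth period, an interest
\((r/k)B_{\ell-1}\) is added to the balance \(B_{\ell-1}\) from the start of the ℓth period and
then a payout/withdrawal of P is made. The unpaid principal balance at
the end of the ℓth period is:
\[
\begin{aligned}
B_\ell &= B_{\ell-1} + \frac{r}{k}\,B_{\ell-1} - P \\
       &= y^{\ell}A_n - (1 + y + y^{2} + \cdots + y^{\ell-1})P \\
       &= y^{\ell}A_n - \frac{1 - y^{\ell}}{1 - y}\,P \\
       &= y^{\ell}A_n + \frac{(1 - y^{\ell})}{(y - 1)}\,\frac{y^{n}(y - 1)}{(y^{n} - 1)}\,A_n, \qquad [\text{by (2.62)}] \\
       &= \frac{y^{\ell}(y^{n} - 1) + y^{n}(1 - y^{\ell})}{y^{n} - 1}\,A_n \\
       &= \frac{y^{n} - y^{\ell}}{y^{n} - 1}\,A_n \qquad (\ell = 1, 2, \dots, n),
\end{aligned}
\]
where \(B_0 = A_n\). Hence: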
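Theorem 2.6. The unpaid principal balance at the end of the ℓth period is given in
terms of \(A_n\), \(r > 0\), and k by:
\[
B_\ell = \frac{\left(1 + \frac{r}{k}\right)^{n} - \left(1 + \frac{r}{k}\right)^{\ell}}{\left(1 + \frac{r}{k}\right)^{n} - 1}\,A_n, \tag{2.63}
\]
where n = 1, 2, ... and ℓ = 0, 1, 2, ..., n.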
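The closed form (2.63) can be compared against the step-by-step rule "add interest, then subtract the payment". The following sketch is illustrative only: the function names are hypothetical, and the loan values are sample inputs.

```python
def payment(A, r, k, n):
    """Level payment P = y^n (y - 1) / (y^n - 1) * A with y = 1 + r/k, as in (2.62)."""
    y = 1.0 + r / k
    return y ** n * (y - 1.0) / (y ** n - 1.0) * A


def balance_recursive(A, r, k, n, ell):
    """Apply B_j = y * B_{j-1} - P for j = 1..ell, starting from B_0 = A."""
    y, P = 1.0 + r / k, payment(A, r, k, n)
    B = A
    for _ in range(ell):
        B = y * B - P
    return B


def balance_closed(A, r, k, n, ell):
    """Closed form B_ell = (y^n - y^ell) / (y^n - 1) * A, as in (2.63)."""
    y = 1.0 + r / k
    return (y ** n - y ** ell) / (y ** n - 1.0) * A


# Sample loan: $202,500 over 180 months at 5.75%, balance after 132 payments.
A, r, k, n, ell = 202_500.0, 0.0575, 12, 180, 132
print(round(balance_recursive(A, r, k, n, ell), 2),
      round(balance_closed(A, r, k, n, ell), 2))
```

The closed form returns the full loan amount at ℓ = 0 and zero at ℓ = n, as expected for a fully amortizing loan.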
At the end of each period, a portion of the payment P is used toward interest
on the loan, the other portion toward reduction of the loan's unpaid principal
balance.
Notation. Let:
\(I_\ell\) = the portion of the payment P at the end of the ℓth period that is applied
toward interest on the loan (i.e., the interest payment at the end of
period ℓ).
\(P_\ell\) = the portion of payment P at the end of the ℓth period that is applied
toward the unpaid principal balance of the loan.
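We now express \(I_\ell\) and \(P_\ell\) in terms of \(A_n\) and r. The interest payment at the
end of period ℓ is
\[
I_\ell = \frac{r}{k}\,B_{\ell-1} = \frac{r}{k}\,\frac{\left(1 + \frac{r}{k}\right)^{n} - \left(1 + \frac{r}{k}\right)^{\ell-1}}{\left(1 + \frac{r}{k}\right)^{n} - 1}\,A_n, \tag{2.64}
\]
and the principal payment is
\[
\begin{aligned}
P_\ell &= P - I_\ell \\
       &= \frac{y^{n}(y - 1)}{y^{n} - 1}\,A_n - \frac{(y - 1)\left(y^{n} - y^{\ell-1}\right)}{y^{n} - 1}\,A_n \\
       &= \frac{(y - 1)\,y^{\ell-1}}{y^{n} - 1}\,A_n \\
       &= \frac{r}{k}\,\frac{\left(1 + \frac{r}{k}\right)^{\ell-1}}{\left(1 + \frac{r}{k}\right)^{n} - 1}\,A_n.
\end{aligned} \tag{2.65}
\]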
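\[
\begin{aligned}
\sum_{\ell=1}^{n} P_\ell
  &= \sum_{\ell=1}^{n} \frac{(y - 1)\,y^{\ell-1}}{y^{n} - 1}\,A_n
   = A_n\,\frac{y - 1}{y^{n} - 1}\sum_{\ell=1}^{n} y^{\ell-1} \\
  &= A_n\,\frac{y - 1}{y^{n} - 1}\sum_{\ell=0}^{n-1} y^{\ell}
   = A_n\,\frac{y - 1}{y^{n} - 1}\cdot\frac{y^{n} - 1}{y - 1} \\
  &= A_n.
\end{aligned}
\]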
Therefore:
Theorem 2.7. The total interest paid during a loan with n periods and k-periodic
compounding at interest rate r is then:
\[
\sum_{\ell=1}^{n} I_\ell = \sum_{\ell=1}^{n} (P - P_\ell) = nP - A_n, \tag{2.66}
\]
where n = 1, 2, ....
Note that nP is the total amount paid into the loan over the life of the loan and
\(nP - A_n\) is the total cost of the loan.
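The split of each payment into interest and principal, and the identity for total interest in Theorem 2.7, can be illustrated with a short amortization schedule. This is a sketch under assumed sample values; the function name `amortization_schedule` is hypothetical.

```python
def amortization_schedule(A, r, k, n):
    """Return (P, rows); each row is (period, interest, principal, balance)."""
    i = r / k
    y = 1.0 + i
    P = y ** n * (y - 1.0) / (y ** n - 1.0) * A   # level payment, as in (2.62)
    rows, balance = [], A
    for ell in range(1, n + 1):
        interest = i * balance          # I_ell = (r/k) * B_{ell-1}
        principal = P - interest        # P_ell = P - I_ell
        balance -= principal            # B_ell = B_{ell-1} - P_ell
        rows.append((ell, interest, principal, balance))
    return P, rows


# Illustrative loan (values are assumptions): $10,000 at 6%, monthly, 1 year.
A, r, k, n = 10_000.0, 0.06, 12, 12
P, rows = amortization_schedule(A, r, k, n)
total_interest = sum(row[1] for row in rows)
total_principal = sum(row[2] for row in rows)
print(round(P, 2), round(total_interest, 2), round(n * P - A, 2))
```

The last two printed values should coincide, and the principal portions should add up to the loan amount while growing from one period to the next.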
Remark 2.4. If you receive a loan today for the amount An at fixed interest rate
r, fixed payment P per period, and a term of n periods, then the sum n P of
all your future payments adds money at different future times without present
or future valuing them. In fact, the present value of all the future payments is
the loan amount An and the future value is Sn , neither of which is n P . The
meaning of n P is the amount you would, in principle, have to pay the lender
today if immediately after receiving the loan you want to pay the loan off, but
the lender penalizes you by requiring you to pay the principal An plus the
total interest for the full term of the loan. Of course, this is merely theoretical
since the majority of loans would not have such a drastic penalty.
Applying essentially the same arguments used to establish the future value
annuity Equation (2.54), we can generalize to a simple ordinary annuity with a
sequence of varying payments, P1 , P2 , . . . , Pn , and respective varying interest
rates, r1 , r2 , . . . , rn , over n interest periods that coincide with the payment
periods. We assume k-periodic compounding. The payment \(P_\ell\) occurs at the end
of the ℓth period, and the interest \(r_\ell\) is applied at the end of the ℓth period to the balance
from the start of the ℓth interest period, where ℓ = 1, ..., n. Assume that there is no
balance at the start of the first period.
The pattern for the future value of a simple ordinary annuity generalized to
varying payments and varying interest rates emerges as follows:
At the end of the first payment period, a payment P1 is made. Because
there is no balance from the beginning of this period, the total accrued at
the end of the first period is:
S 1 = P1 .
Reinvest S1 in the annuity.
At the end of the second period, the payment is P2 , the previous balance is
S1 , and the simple interest earned on S1 is (r2 /k)S1 . The total accrued is:
\[
S_2 = P_2 + S_1 + \frac{r_2}{k}\,S_1 = P_2 + \left(1 + \frac{r_2}{k}\right) P_1.
\]
Reinvest S2 in the annuity.
At the end of the 3rd period, the payment is P3 , the previous balance is S2 ,
and the simple interest earned on S2 is (r3 /k)S2 . The total accrued is:
\[
S_3 = P_3 + S_2 + \frac{r_3}{k}\,S_2 = P_3 + \left(1 + \frac{r_3}{k}\right) P_2 + \left(1 + \frac{r_3}{k}\right)\left(1 + \frac{r_2}{k}\right) P_1.
\]
Reinvest S3 in the annuity.
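Continuing the above process, at the end of the nth period, the payment is
\(P_n\), the previous balance is \(S_{n-1}\), and the simple interest earned on \(S_{n-1}\) is
\((r_n/k)S_{n-1}\). The total accrued is
\[
S_n = P_n + S_{n-1} + \frac{r_n}{k}\,S_{n-1}
\]
or
\[
\begin{aligned}
S_n = P_n &+ \left(1 + \frac{r_n}{k}\right) P_{n-1} + \left(1 + \frac{r_n}{k}\right)\left(1 + \frac{r_{n-1}}{k}\right) P_{n-2} + \cdots \\
          &+ \left(1 + \frac{r_n}{k}\right)\left(1 + \frac{r_{n-1}}{k}\right)\cdots\left(1 + \frac{r_2}{k}\right) P_1.
\end{aligned} \tag{2.67}
\]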
Theorem 2.8. The future value at the end of n payment periods, which coincide with
the interest periods, of the simple ordinary annuity with payments \(P_1, \dots, P_n\) and
k-periodic compounding at respective interest rates \(r_2, \dots, r_n\) during the consecutive
interest periods is:
\[
S_n = \sum_{\ell=0}^{n-1}\left[\,\prod_{j=0}^{\ell}\left(1 + \frac{r_{n+1-j}}{k}\right)\right] P_{n-\ell}, \qquad (n = 1, 2, \dots), \tag{2.68}
\]
where \(r_{n+1} = 0\).
Similarly, the present value Equation (2.57) generalizes naturally to the case
of a sequence of payments, \(P_1, \dots, P_n\), and interest rates \(r_1, \dots, r_n\). Here the
amount \(P_i\) is paid at the end of the ith period, and the interest \(r_i\) is applied
at the end of the ith period to the balance from the end of the (i − 1)st period.
When simple interest at rate \(r_1\) is applied at the end of the first period to
the initial amount \(P_1(1 + r_1/k)^{-1}\), we obtain the first payment \(P_1\). Applying
compound interest with rates \(r_1\) and \(r_2\) at the end of the first and second
periods, respectively, to the initial amount \(P_2(1 + r_1/k)^{-1}(1 + r_2/k)^{-1}\) yields
the second payment \(P_2\). Continuing this process gives the initial amount that
will grow to the nth payment \(P_n\). These initial amounts are the present values
of the sequence of payments under compound interest at different rates. Summing
all the present values gives the following present value for the generalized
annuity:
\[
A_n = \frac{P_1}{1 + \frac{r_1}{k}} + \frac{P_2}{\left(1 + \frac{r_1}{k}\right)\left(1 + \frac{r_2}{k}\right)} + \cdots + \frac{P_n}{\left(1 + \frac{r_1}{k}\right)\left(1 + \frac{r_2}{k}\right)\cdots\left(1 + \frac{r_n}{k}\right)}. \tag{2.69}
\]
Applications of (2.69) to the dividend discount model and bond pricing are
given, respectively, in Section 2.9.1 and 2.10 (see page 68).
It immediately follows that the relationship between the future and present
values in Equation (2.61) generalizes to
\[
A_n = \frac{S_n}{\prod_{j=1}^{n}\left(1 + \dfrac{r_j}{k}\right)}. \tag{2.71}
\]
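The generalized future and present values, and the relation (2.71) between them, can be sketched in a few lines. The function names and the three-payment data set below are hypothetical and serve only as an illustration.

```python
def future_value_varying(payments, rates, k):
    """S_n via S_j = P_j + (1 + r_j/k) * S_{j-1}, with S_0 = 0.

    payments[j-1] = P_j and rates[j-1] = r_j; r_1 multiplies S_0 = 0, so it
    has no effect on the result.
    """
    S = 0.0
    for P_j, r_j in zip(payments, rates):
        S = P_j + (1.0 + r_j / k) * S
    return S


def present_value_varying(payments, rates, k):
    """A_n = sum_j P_j / prod_{m<=j} (1 + r_m/k), as in (2.69)."""
    A, discount = 0.0, 1.0
    for P_j, r_j in zip(payments, rates):
        discount *= 1.0 + r_j / k
        A += P_j / discount
    return A


# Hypothetical inputs: three annual payments with a different rate each year.
payments, rates, k = [100.0, 200.0, 300.0], [0.03, 0.04, 0.05], 1
S = future_value_varying(payments, rates, k)
A = present_value_varying(payments, rates, k)
growth = 1.0
for r_j in rates:
    growth *= 1.0 + r_j / k
print(S, A, S / growth)   # the last two values should agree
```

Note that changing the first rate leaves the future value unchanged but does change the present value, since the first period's discounting uses it.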
a) Using this average interest rate, estimate how much she would have on July
1st of her graduating year.
Solution. Use the future value \(S_n\) in (2.56). We have k = 12 for monthly compounding
and since the period is 4 years, we have n = 4 × 12 = 48 periods,
r = 0.0225, and P = $25. By (2.56),
\[
S_{48} = \frac{\left(1 + \frac{0.0225}{12}\right)^{48} - 1}{\frac{0.0225}{12}}\,\$25 = \$1,254.43.
\]
b) If her target is to have at least $1, 300 on July 1st of her graduating year,
determine the minimum required interest rate.
implicitly for r (use a software package), we obtain the smallest interest rate
to be r = 4.04%. Note that this is the smallest value of r that works since \(S_n\)
is a strictly increasing function of r for natural numbers \(n \ge 2\) (see page 49).
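One way to carry out the numerical solve for r is bisection, which works here because \(S_n\) is strictly increasing in r. The sketch below assumes the setup of part b) is \(S_{48}(r) = \$1,300\) with P = $25 and k = 12; the function names and the bracket [lo, hi] are illustrative choices.

```python
def future_value(P, r, k, n):
    """S_n = ((1 + r/k)^n - 1) / (r/k) * P for r > 0, as in (2.56)."""
    i = r / k
    return ((1.0 + i) ** n - 1.0) / i * P


def min_rate_for_target(target, P, k, n, lo=1e-9, hi=1.0, tol=1e-10):
    """Smallest r in [lo, hi] with S_n(r) >= target, found by bisection.

    Relies on S_n being strictly increasing in r for n >= 2.
    """
    if future_value(P, hi, k, n) < target:
        raise ValueError("target not reachable within the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if future_value(P, mid, k, n) >= target:
            hi = mid
        else:
            lo = mid
    return hi


r_min = min_rate_for_target(1300.0, 25.0, 12, 48)
print(f"{r_min:.2%}")
```

Returning the upper end of the final bracket guarantees that the reported rate actually meets the target rather than falling just short of it.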
Example 2.11. (Saving for Retirement) Suppose that you open a retirement
fund at the start of a month and you deposit $200 at the end of each month.
If the fund pays 4% per annum compounded monthly, how much would you
accumulate at the end of 25 years?
Solution. This problem deals with the future value \(S_n\) in (2.56). For monthly
compounding (k = 12), we have n = 25 × 12 = 300 periods, r = 0.04, and P =
$200. Equation (2.56) then yields the following future value:
\[
S_{300} = \frac{\left(1 + \frac{0.04}{12}\right)^{300} - 1}{\frac{0.04}{12}}\,\$200 = 514.13 \times \$200 = \$102,826.
\]
Example 2.12. (Total Paid on Loan) A relative is considering a 20-year loan
of $150, 000 with an interest rate of 8% compounded monthly. Assuming you
hold the loan the entire term and make the minimum payment at the end of
each month, what is the total amount you pay into the loan?
Example 2.13. (Paying Off Debt) Suppose that you borrow $100, 000 at an an-
nual interest rate of 6% with monthly compounding. For an ordinary annuity
based on this compounding, what is your minimum payment per month to
pay off the loan in 10 years?
Solution. The problem requires the present value \(A_n\). Since k = 12, there are
10 × 12 = 120 periods. Using \(A_{120} = \$100,000\) and \(r/k = 0.06/12 = 0.005\), we get:
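A hedged numerical sketch of this step: solving the present value formula (2.58) for the payment gives \(P = A_n\,(r/k)/\bigl(1 - (1 + r/k)^{-n}\bigr)\). The function name `level_payment` is an assumption introduced for illustration.

```python
def level_payment(A, r, k, n):
    """Solve A = (1 - (1 + r/k)^(-n)) / (r/k) * P for the payment P."""
    i = r / k
    return A * i / (1.0 - (1.0 + i) ** (-n))


# Inputs from the example: $100,000 at 6% compounded monthly over 120 months.
P = level_payment(100_000.0, 0.06, 12, 120)
print(round(P, 2), round(120 * P, 2))   # monthly payment and total paid in
```

The same function applies to any fixed-rate loan or to a fixed withdrawal plan, since both rest on the same present value relation.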
Example 2.14. (How Much Loan Can You Afford) Suppose that you can pay
$1, 495 per month for the next 15 years. What is the largest loan you can afford
at 6.25% per annum with monthly compounding?
Solution. Assume that the first payment is made 1 month from now. We have
n = 15 × 12 = 180 periods (months), P = $1,495, and r = 0.0625. The maximum
loan you can afford is:
\[
A_n = \frac{1 - \left(1 + \frac{0.0625}{12}\right)^{-180}}{\frac{0.0625}{12}}\,\$1,495 = \$174,359.71.
\]
Example 2.15. (Living Off a Lump Sum) Suppose that you inherited $300, 000
and invested it in an account with an annual interest rate of 7% compounded
monthly. For an ordinary annuity based on this compounding, if you want
your inheritance to last 20 years, what is the maximum fixed amount you can
spend from the account per month?
Solution. Using the present value annuity formula with \(A_n = \$300,000\), k = 12,
r = 0.07, n = 20 × 12 = 240 periods, we obtain:
Example 2.16. (House Equity) A couple bought their house 11 years ago for
$225, 000 and put down 10% on the house. On the balance, they took out a
15-year mortgage at 5.75% per annum with monthly compounding. The cur-
rent net market value of the house is its current market value minus all costs
in selling the house today. Suppose that the current net market value is now
$350, 000 and the couple wants to sell their house.
a) How much equity (to the nearest dollar) is in the house today? Equity in a
house is defined as:
Solution. The couple puts down 10% or $22,500 at the start, so the mortgage
is for \(A_n = \$225,000 - \$22,500 = \$202,500\). Since n = 15 × 12 = 180, r =
0.0575, k = 12, and ℓ = 132, Equation (2.63) yields the unpaid balance at the
end of the 132nd month:
\[
B_{132} = \frac{\left(1 + \frac{r}{k}\right)^{n} - \left(1 + \frac{r}{k}\right)^{\ell}}{\left(1 + \frac{r}{k}\right)^{n} - 1}\,A_n = \$71,952.87.
\]
Example 2.17. (Saving for College Tuition) When a child was born in 2011,
her parents decided to invest in her college education. This was motivated by
a forecast that 4 years of in-state tuition at an average public college will be
about $96, 000 when she will attend college. Suppose that the parents want to
accumulate that amount by their child's 17th birthday. They open a sinking
fund into which they make a deposit on each birthday of the child up to the
17th birthday. Assume that the first deposit is for the amount P and thereafter
the parents increase the deposited amount by 4% annually. Suppose that the
bank where they have the sinking fund pays a fixed 5.5% per annum com-
pounded annually. What should the minimum annual deposits be in order for
the amount in the fund to reach at least $96, 000 after her 17th deposit?
2.9 Applications to Stock Valuation 63
where \(r_{n+1} = 0\). Note that \(r_1\) does not appear in the formula since no interest is
paid at the end of the first period (because the first deposit is not made at the
start of the first period, but at the end of the first period).
We have n = 17, \(S_{17} = \$96,000\), \(r_2 = \cdots = r_{17} = 0.055\), and k = 1. The product
in the sum above then becomes:
\[
\prod_{j=0}^{\ell}\left(1 + \frac{r_{n+1-j}}{k}\right) = \left(1 + \frac{r_n}{k}\right)\left(1 + \frac{r_{n-1}}{k}\right)\cdots\left(1 + \frac{r_{n-(\ell-1)}}{k}\right) = (1.055)^{\ell},
\]
where ℓ = 0, 1, 2, ..., n − 1.
Let us determine the deposits \(P_1, \dots, P_n\). The deposit \(P_1 = P\) is made on
the first birthday. On the second birthday, it is increased by 4% to \(P_2 = P_1 +
0.04\,P_1 = (1.04)\,P\). On the 3rd birthday, the deposit is \(P_3 = P_2 + 0.04\,P_2 =
(1.04)^{2} P\). For j = 1, ..., n, the deposit on the jth birthday is then \(P_j = (1.04)^{j-1} P\).
It follows: \(P_n = (1.04)^{16} P\).
The target amount for the sinking fund can then be expressed as:
\[
\begin{aligned}
\$96,000 &= P\sum_{\ell=0}^{16} (1.055)^{\ell}(1.04)^{16-\ell} = P\,(1.04)^{16}\sum_{\ell=0}^{16}\left(\frac{1.055}{1.04}\right)^{\ell} \\
         &= P \times 1.87298 \times 19.1104 = 35.7934\,P.
\end{aligned}
\]
This yields P = $2,682.06, which is the first deposit. Hence, for j = 1, ..., 17,
the minimum deposit on the jth birthday must be: \(P_j = (1.04)^{j-1}\,\$2,682.06\),
which has values
\(P_1 = \$2,682.06\), \(P_2 = \$2,789.34\), ..., \(P_{16} = \$4,830.24\), \(P_{17} = \$5,023.45\).
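The sinking-fund calculation can be double-checked by computing the first deposit directly and then simulating the fund year by year. This is a minimal sketch; the function name `first_deposit` and its parameter names are hypothetical.

```python
def first_deposit(target, n, growth, rate):
    """First deposit P such that deposits P * (1 + growth)^(j-1), j = 1..n,
    made at the end of each year and compounded annually at `rate`,
    reach `target` immediately after the n-th deposit."""
    # Deposit j earns interest for the remaining n - j years.
    factor = sum((1.0 + growth) ** (j - 1) * (1.0 + rate) ** (n - j)
                 for j in range(1, n + 1))
    return target / factor


P = first_deposit(96_000.0, 17, 0.04, 0.055)
deposits = [P * 1.04 ** (j - 1) for j in range(1, 18)]

# Cross-check by simulating the fund: interest on the old balance, then deposit.
balance = 0.0
for d in deposits:
    balance = balance * 1.055 + d
print(round(P, 2), round(deposits[-1], 2), round(balance, 2))
```

The simulated balance after the 17th deposit should match the target, which confirms that the first deposit earns no interest before the second birthday.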
This section applies the theory of annuities to determining the present val-
ues of preferred and common stocks. The main tool is the dividend discount
model. A stochastic model for the future value of a stock will be taken up in a
later chapter.
The dividend discount model (DDM) was pioneered by Williams [18] (1938) and
Gordon [7] (1959). The fundamental hypothesis of the DDM is that, if a stock is held
for n years, then its current value is the present value of the sequence of its expected
future cash dividends through n years plus the present value of the stock's expected
price in n years.
A stock has no maturity date and so is a security in perpetuity. Suppose
that the stock pays a dividend and the (annual) required return rate of the
stock is k.¹¹ Assume that you will hold the stock for n years. Let \(D_0\) be the
current cash dividend, i.e., the total cash dividend per share over the previous
year. Suppose that all future cash dividends are expected to grow at a constant
annual rate g, which we assume is less than the required return rate (k > g).
Let D(i) denote the expected cash dividend per share for the interval from the
present time to i years out, where i = 1, . . . , n. Then the expected sequence of
future cash dividends per share for years 1 through n is:
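\[
\frac{1}{1 - \frac{1+g}{1+k}} = \sum_{\ell=0}^{\infty}\left(\frac{1+g}{1+k}\right)^{\ell} = 1 + \sum_{\ell=1}^{\infty}\left(\frac{1+g}{1+k}\right)^{\ell}.
\]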
11 Recall that the marketplace is assumed to be in equilibrium, which allows for the required return
rate of the stock to be estimated using the CAPM model; see Chapter 4 for an introduction.
\[
S_0 = \frac{(1 + g)\,D_0}{k - g} = \frac{D(1)}{k - g}, \qquad (k > g > 0). \tag{2.73}
\]
Equation (2.73) is called the Gordon growth model. This is an example of a grow-
ing perpetuity, i.e., a perpetuity with payments that increase each period.
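The Gordon growth model generalizes naturally to allow for k compoundings
per year through the replacements \(g \to g/k\) and \(\mathsf{k} \to \mathsf{k}/k\), where
\(\mathsf{k}\) denotes the required return rate and \(k\) the number of compoundings per year:
\[
S_0 = \frac{\left(1 + \frac{g}{k}\right) D_0}{\frac{\mathsf{k}}{k} - \frac{g}{k}} = \frac{D(1)}{\frac{\mathsf{k}}{k} - \frac{g}{k}} \qquad (\mathsf{k} > g > 0),
\]
where \(D(j) = \left(1 + \frac{g}{k}\right)^{j} D_0\).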
Example 2.18. (Preferred Stocks) Suppose that a preferred stock has a fixed to-
tal annual cash dividend per share of $2.50. Assume an annual required return
rate of 13% for the stock. How much should you pay for the preferred stock?
Solution. We apply Equation (2.74) with \(D_0 = \$2.50\) and k = 0.13. The current
share price of the preferred stock is: \(S_0 = D_0/k = \$2.50/0.13 = \$19.23\).
Example 2.19. (Common Stocks) Suppose that the total cash dividend of a
stock last year was $2.75 per share and dividends are expected to increase at
3% per annum. If the annual required return rate is 10%, find the share price
of the stock today.
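As a hedged numerical sketch for this example, assuming it is meant to be solved with the Gordon growth model (2.73), the price is \((1+g)D_0/(k-g)\). The function names below are hypothetical, and the partial sum is included only to show the perpetuity emerging as the limit of discounted dividends.

```python
def gordon_price(d0, k, g):
    """S0 = (1 + g) * D0 / (k - g), requiring k > g."""
    if not k > g:
        raise ValueError("required return rate must exceed the growth rate")
    return (1.0 + g) * d0 / (k - g)


def ddm_partial_sum(d0, k, g, years):
    """Present value of the first `years` dividends D(i) = (1 + g)^i * D0."""
    return sum(d0 * ((1.0 + g) / (1.0 + k)) ** i for i in range(1, years + 1))


# Inputs from the example: D0 = $2.75, growth 3%, required return 10%.
print(round(gordon_price(2.75, 0.10, 0.03), 2))
print(round(ddm_partial_sum(2.75, 0.10, 0.03, 2000), 2))
```

Setting the growth rate to zero in `gordon_price` reduces it to the constant-dividend case \(D_0/k\) used for preferred stock.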
The US bond market is vast, much bigger than its stock market. As measured
at the end of 2012 in terms of capitalizations, the US bond market was twice
as big as the US stock market for domestic companies.12 As with other fixed-
income financial investments, the price of a bond is the present value of its
cash flow. We shall explore how to value bonds.
Unless stated to the contrary, all bonds are without options, i.e.,
they are noncallable, nonconvertible, etc., and have a fixed interest
paid every 6 months.
12 http://www.learnbonds.com/how-big-is-the-bond-market/
13 IOU is an abbreviation for "I owe you."
14 Most corporate bonds are callable. Also, the US Treasury has not issued callable bonds since 1985.
2.10 Applications to Bond Valuation 67
Although both bonds and stocks are securities of a company, they are
different in the sense that bondholders are creditors of the company, whereas
stockholders are owners of the company. The cash flows from a company's
bonds are more reliable than those from its stocks since the company has a le-
gal obligation to repay its bondholders. Sometimes even when a company be-
comes insolvent, its bondholders may still get back some compensation, while
compensation is not guaranteed for its common stockholders.
We now list and discuss some basic terminologies and features of bonds:
The issue date of a bond is the date on which the bond issuer receives the
loan from the lender and from which the lender is entitled to receive inter-
est from the issuer.
The maturity value M (also known as the par value, face value, principal) of
a bond is the unit of the amount borrowed at the time it was issued. It is
traditionally in units of $1, 000, but municipal bonds are usually sold in
units of $5,000.
There are two main markets for bonds: primary market, where bonds are
sold for the first time to institutional investors, and secondary market, where
the resale of bonds taking place after their initial offering is open to the
public, though individual investors will need to have a brokerage account
to transact trades. Bonds selling at their maturity value are called par bonds.
In the secondary market, bonds are traded at prices that are typically dif-
ferent from the maturity value. If a bond sells at a market price above
(respectively, below) its maturity value, then it is called a premium bond
(respectively, discount bond).
The maturity date is the date on which the bond issuer must repay the lender
the bond's maturity value. Note that callable bonds have features which
allow for the principal to be repaid before the maturity date.
The term to maturity, or simply maturity, of a bond is the length of the time
interval between the issue date and the maturity date.
Bonds can be classified into three groups: short term, intermediate term and
long term according to maturities of, respectively, 1–5 years, 5–12 years, and
greater than 12 years.
The coupon rate or interest rate, denoted by rC , is defined by the current yield
when the bond price is equal to its maturity value. That is:
15 For example, such a bond might be issued at a 50% discount from its maturity value.
16 A savings bond offers a fixed rate of interest over a fixed period of time, but cannot be traded after
being purchased.
For a bond being traded after it was originally issued, we expect intuitively
that when the YTM is at the coupon rate, then the market value of the bond
should be its maturity value. The following proposition confirms that intuitive
result and its converse:
Proposition 2.1. Suppose that a bond has n coupon payments remaining. The market
price of the bond equals its maturity value exactly when its coupon rate is the YTM:
17 It is worth noting that comparing different bonds by their percentage change in price is often mis-
leading since the significance is not the same for an identical percentage price change of bonds with
different interest rates. Also, it is important to realize that reinvesting all the coupon payments at the
same rate is rather difficult if not impossible in practice.
= M.
The interest rate probably has the single largest impact on the prices of all
bonds. The following three examples are related to each other and illustrate
the relationship between bond prices on the one hand and interest rates and
YTM on the other.
Example 2.20. Suppose that a 30-year bond with an annual 3% coupon rate
payable semiannually was issued by the US Treasury on the first trading day of
2013. If the maturity value is $1, 000, what is the semiannual coupon amount?
Solution. Solve for the semiannual coupon amount C from the equation
\[
3\% = \frac{2C}{\$1,000}
\]
to obtain C = $15. Assume, for simplicity, that the bond was sold in the primary
market at its maturity value. Then by Proposition 2.1, the YTM equals 3%.
In the next two examples, the bond in Example 2.20 will be referred to as
the first bond.
Example 2.21. Since the Feds kept interest rates artificially low in 2013, dou-
bling the interest rate in 10 years from 2013 is not an unreasonable speculation.
Suppose that another 30-year bond with an annual 6% coupon rate payable
semiannually will be issued by the Treasury on the first trading day of 2023.
What will be the price of the first bond at the time of the second bond's initial
offering?
Since no investors will buy a bond with 3% annual yield when they have the
choice to purchase a bond of the same type with 6% annual yield, we have
r1 = 6%. In other words, the current yield of the first bond will be forced to
approach 6% on the issue date of the second bond under the law of supply
and demand. To speculate on the price of the first bond, we apply (2.75) and
solve for \(B_1\) from the equation \(r_1 = 6\% = (2 \times \$15)/B_1\) to obtain \(B_1 = \$500\). Observe
that when the interest rate rises from 3% to 6%, the first bond's price will fall from
$1,000 to $500.
Example 2.22. Suppose that you will purchase the first bond on the first trad-
ing day of 2023 at the price $500 and hold it to the maturity date of the first
trading day of 2043. What will be the yield to maturity?
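Solution. We need to solve the bond Equation (2.78) for \(r_Y\), which in our setting is¹⁸
\[
B_1 = \frac{\left(\left(1 + \frac{r_Y}{k}\right)^{n} - 1\right) C}{\frac{r_Y}{k}\left(1 + \frac{r_Y}{k}\right)^{n}} + \frac{M}{\left(1 + \frac{r_Y}{k}\right)^{n}}.
\]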
and
18 As before, there is no general analytical solution rY for every n. In most applications, we can only
estimate rY numerically using a software.
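A numerical estimate of the yield can be sketched with bisection, using the bond equation as a price function of \(r_Y\). The sketch assumes the first bond has 20 years (n = 40 semiannual coupons) remaining in 2023, with C = $15, M = $1,000, k = 2, and purchase price $500; the function names and the search bracket are illustrative.

```python
def bond_price(r_y, C, M, k, n):
    """Price = coupon annuity + discounted maturity value, with i = r_y / k."""
    i = r_y / k
    v_n = (1.0 + i) ** (-n)
    return C * (1.0 - v_n) / i + M * v_n


def yield_to_maturity(price, C, M, k, n, lo=1e-9, hi=1.0, tol=1e-12):
    """Bisection on r_y; the price is strictly decreasing in the yield."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_price(mid, C, M, k, n) > price:
            lo = mid      # price too high, so the yield must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# First bond bought for $500: $15 coupons, 40 half-year periods left (assumed).
r_y = yield_to_maturity(500.0, 15.0, 1000.0, 2, 40)
print(f"{r_y:.4%}", round(bond_price(r_y, 15.0, 1000.0, 2, 40), 6))
```

As a sanity check, pricing the same bond at a yield equal to its 3% coupon rate returns the maturity value, in line with Proposition 2.1.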
[Figure 2.2: bond price (vertical axis, 600 to 1600) plotted against YTM (horizontal axis, 0.00 to 0.10).]
Fig. 2.2 The price of a bond is a strictly decreasing, concave-up function of the bond's YTM. The
graph illustrates this for a bond with $1,000 maturity value and 6% coupon rate. Note that when the
YTM is 6%, the bond's price is its maturity value.
\[
\frac{d^{2}B^{(n)}}{dr_Y^{2}} = n(n+1)\,\frac{M}{(1 + r_Y)^{n+2}} + C\sum_{\ell=1}^{n}\frac{\ell(\ell+1)}{(1 + r_Y)^{\ell+2}} > 0.
\]
In other words, the bond's price is not only strictly decreasing as the yield
increases, but has a convex graph, i.e., the graph is everywhere concave up
(increasing slope). Figure 2.2 depicts this property for a bond with $1, 000 ma-
turity value and 6% annual coupon rate.
2.11 Exercises
2.1. A physicist summed up the growth rate of an initial sum of money held
over a fixed time span as follows: If simple interest is applied during the
time span, then the initial sum will grow with uniform (constant) velocity as
the interest rate increases. If periodic compound interest is applied, then the
growth of the initial sum will accelerate as interest increases. Do you agree
with this interpretation? Justify your answer.
2.10. Consider a principal \(F_0\) that is held for \(n_{\rm exact}\) days during a non-leap
year at the simple interest rate r. By what percent is the simple interest amount
using Banker's Rule greater than the simple interest amount employing exact
time and exact interest?
2.12. For an interest rate of 4% per year, compare the future value 2 years from
now to which $10,000 increases under daily compounding versus continu-
ous compounding. Assume 365 days per year and express your answer as a
fractional-difference percentage of the daily compounding case.
2.13. Suppose that at the start of college, you have $1,000 to invest and would
like for it to grow to $1,250 at the end of your senior year through monthly
compounding. Determine the general formula for the interest rate required for
the growth and then compute the interest rate.
2.14. Assume that college tuition is currently 30 times its cost 15 years ago.
Assuming annual compounding, what is the interest rate r that gives the rate
of increase in tuition?
2.15. How much should you have today in an account with monthly com-
pounding and annual interest rate of 4% to receive $1,000 per month forever?
2.16. (Equity in a House) A couple purchased a house 7 years ago for $375,000.
The house was financed by paying 20% down and signing a 30-year mortgage
at 6.5% on the unpaid balance. The net market value of the house is now
$400,000. Assume that the couple wishes to sell the house.
a) How much equity (to the nearest dollar) does the family have in the house
now, after making 84 monthly payments?
b) Find the first interest payment I1 and the 84th interest payment I84 .
2.17. (Social Security Benefits) We present a simplified problem to illustrate
Social Security benefits. A college graduate begins work at age 22. She has an
annual income of $70,000 until retirement (a simplification), pays 12.4% of this
income into Social Security each year, and retires at age 65 with Social Security
benefits of $20,000 annually. How long must she live before the present value
of these benefits equals the present value of her annual contributions? In other
words, how long must she live after retirement to get back the full value of
her contributions to Social Security? Will she get the entire value? Assume a
discount rate of 4% per year, no change in her salary, and that all payments
and benefits occur at the end of each year.
2.18. (Workers' Compensation) The usual legal settlement for an industrial
accident is the present value of the employee's lifetime earnings. If you expect
to work for 10 more years, make $70,000 a year in the next 2 years, and get
a raise of $5,000 every 2 years, what would be your settlement? Assume an
annual discount rate of 4% in the first 5 years and 6% in the second 5 years,
and that your paycheck is received at the end of each year.
2.19. (Bonds) Suppose that you bought a 30-year bond with 4% annual coupon
rate. You wish to sell that bond at a later date when the remaining life of the
bond is 2.5 years and the current YTM of your bond has declined to 2%.
a) What is the fair value, as determined by the present value method, of the
bond at the time of your sale?
b) How much would you earn if you purchased the bond for $1,000, sold it at
the fair value, and did not reinvest the coupon payments?
2.20. (Bonds) Bonds are generally quoted as a percentage of their face value.
A bond selling at 99.2% of its face value is quoted as 99.2. The following in-
formation for a treasury bond was provided by the WSJ market data center on
December 4, 2013:
The coupon column refers to the annual coupon rate. Verify that the last col-
umn indicates YTM.
Purchasing a House
been considering buying a house. You have saved $10,000 toward a down pay-
ment for the house.
A salesperson informs you that he has a new house for sale, where the house
and land were independently appraised at $200,000, but are being sold by the
builder at a discount price of $185,000. The builder wants to get rid of the property
quickly because the house is the last one to be sold in the development and
the builder is moving on to construction of a new development.
The salesperson connects you with his in-house lender, to whom you give
details about your income and grant permission to review your credit and
eligibility for a loan. You inform her that you are prepared to make a down
payment of $10,000 toward the house if necessary. She gets back to you with
good news that, if you put $8,100 toward the house, then they can give you
a 30-year loan for the balance of $176,900 at 6.25% per annum (compounded
monthly). Note that lenders require the house to appraise at or above the pur-
chase price; otherwise, they may reject the loan or require more down pay-
ment. The lender computes the monthly mortgage payment at $1,089.20. She
informs you that the remaining $1,900 of your $10,000 can be used toward costs
associated with the final evaluation of the physical property and the closing of
the purchase (property inspector fee, termite inspector fee, official survey, at-
torney fees, etc.). The builder agrees to pay for costs beyond your $1,900 and
make necessary repairs you identify during the period you have to inspect the
property (the due diligence period).
Hearing the news about your qualification for the loan, the salesperson asks
you how much rent you are now paying. When you inform him that you pay
$1,040 per month, he quickly points out that it would be a mere extra $50 per
month for you to meet the mortgage payments. He emphasizes that it is better
to own than to rent, especially if the mortgage is just a bit more than your
current rent.
You are thrilled! After the excitement subsides, however, you decide to run
the numbers yourself to make sure you get a clear understanding of what
you are getting into financially.19 The problems in this project help guide you
through some of this analysis.
2.21. Show that the monthly loan payment on the unpaid principal balance of
$176,900 is $1,089.20.
2.22. In addition to closing fees paid to settle the loan, there are expenses be-
yond the monthly mortgage payments.
First, since your deposit was less than 20% of the purchase price, you are
required to take out a private mortgage insurance (PMI) to protect the lender
if you default on the loan. The PMI typically lasts until the unpaid principal
balance of the mortgage is paid down to 80% of the original value of the house,
where the house's original value is the lesser of the purchase price and the
official appraised value of the house used in closing the sale. Note that the
bank may also require your payment history to be in good standing (e.g., no
late payments in the past year or two) before removing PMI. Of course, if the
value of the house increases nontrivially, you may be able to remove the PMI
earlier. Suppose that the PMI is $141.52 per month.
Second, along with PMI, you have to pay for hazard insurance to cover un-
planned damages to the house due to fire, smoke, wind, etc. Assume that the
hazard insurance is $36.50 per month.
Third, you have to pay property taxes to the tax district (e.g., county and
city) where the house is located. The property (i.e., house and land) will be
valued within your tax district, which is a valuation that is separate from the
appraisal done when purchasing the house. The resulting tax district's valuation
is the taxable value of the house and is the amount to which the property
tax rate will be applied. Suppose that the annual property tax rate is 1.3% and
the taxable value of the property is $189,986. For this project, the taxable property
value is less than the appraised value (i.e., $200,000) used for the purchase.
Sometimes, however, the taxable value can be higher, which was not uncommon
in the aftermath of the 2008 mortgage crisis.
The PMI, hazard insurance, and property tax payments are in addition to
the monthly loan payment, and all together they form a single payment you
make to the lender. The lender or a company hired by the lender manages
these payments by taking out the portion for the loan payment (principal plus
interest) and depositing the rest into an escrow account, which is used to pay
the annual insurance premiums and property taxes on behalf of the borrower.
Finally, assume that the property is in a housing development that comes
with a mandatory Homeowners Association (HOA) fee. The HOA fee is used
to maintain the grounds, roads, etc. in the development. If you do not pay the
fee, the HOA can foreclose on your property. Assume an HOA fee of $100 per
month.
a) What is the estimated total monthly PITI, i.e., the minimum monthly pay-
ment covering the principal, interest, taxes, and (hazard) insurance?
b) Identify two other mandatory house expenses that are outside of the PITI
payment and other basic house costs like utilities and repairs. Do exclude
costs like groceries, tuition, medical expenses, etc., which are more associ-
ated with running a home. What is the minimum monthly cost of the house
during the first year if you now include these two mandatory house ex-
penses and PITI? Which of these housing costs will likely increase in the
future?
c) What is your opinion about the salesperson's pitch about the cost of renting
versus buying a house?
2.23. Fill out the amortization schedule below, which is for the first 5 months
of the loan.
Payment #   Payment (P)   Principal (P)   Interest (I)   Bal. (B)
1           1,089.20      167.85          921.35         176,732.15
2           1,089.20
3           1,089.20
4           1,089.20
5           1,089.20
2.24. Are there discrepancies in the above amortization table? If so, explain
how to remove them mathematically.
For the remaining problems, note that only the payments toward principal
and interest (PI) are relevant to the loan's balance. Costs associated with property
taxes, hazard insurance, PMI, HOA, etc. are separate expenses and do not
impact the balance of the loan. Such costs are typically not included in the
loan's cost.
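How a principal-and-interest payment changes the balance can be sketched in a few lines of Python. The function name and the choice to round each amount to the nearest cent are assumptions made for illustration (lenders' rounding conventions vary); the loan amount, rate, and payment are the ones quoted by the lender in this project.

```python
def amortization_step(balance, annual_rate, payment):
    """One month of a fixed-payment loan: split the payment into interest
    and principal and return the reduced balance (all rounded to cents)."""
    interest = round(balance * annual_rate / 12, 2)   # interest on the unpaid balance
    principal = round(payment - interest, 2)          # remainder reduces the balance
    new_balance = round(balance - principal, 2)
    return interest, principal, new_balance


# First payment on the $176,900 loan at 6.25% with the $1,089.20 monthly payment.
interest_1, principal_1, balance_1 = amortization_step(176_900.00, 0.0625, 1_089.20)
```

Under these assumptions the first step gives interest of $921.35, principal of $167.85, and a new balance of $176,732.15. Applying the function again to the new balance produces the next month's split.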
2.25. Using software, compute the payment number at which the unpaid
balance on the loan will first dip below 80% of the original value of the house.
Roughly how many years and months does it take to reach that balance? If the
value of the house has not decreased below its original value at that point in
time, you would stop paying PMI henceforth.
2.26. Determine the total amount you would pay into the mortgage, excluding
escrow payments, if you make only the minimum payment over the full 30
years. What is the total cost of the mortgage? Is it more than the mortgage?
2.27. Estimate the number of years and months it would take to pay off the
mortgage if you double your monthly payments.
2.28. Estimate the total you would pay into the mortgage if you double your
monthly payments. What is the total cost of the mortgage for doubled pay-
ments? Is it more than the mortgage?
\[
F(x + h) = F(x) + F(h) - F_0, \qquad F(0) = F_0, \qquad \frac{dF}{dx}(0) = r F_0,
\]
where r 0. Determine the type of growth model, i.e., find F ( x ).
2.31. (Capital After Spending, Inflation, and Interest) Consider the following
setup:
- Begin with an initial capital C (0) in an interest-bearing account and let C (n)
be the remaining capital at the end of the nth year.
- Assume an interest rate r is applied at the end of each year to the capital
remaining on that date.
- At the end of the first year, assume that an amount S was spent from C (0) on
goods and services, and money will be spent on similar goods and services
in each of the subsequent years.
- Suppose that the amount spent at the end of any specific year is the total
amount spent by the end of the first year increased in subsequent years at
the annual inflation rate i compounding annually until the end of the spec-
ified year. Assume that r > i since investors are not interested in a market
interest rate that is below the inflation rate.
a) Show that the total capital at the end of the (n + 1)st year can be expressed
recursively as follows in terms of the capital at the end of the previous year,
taking into account spending, inflation, and interest growth:
\[
C(n + 1) = (1 + r)\left[\, C(n) - (1 + i)^{n} S \,\right]. \tag{2.80}
\]
i. Show that there is no n such that the amounts accrued under both options
are equal by the end of the (n + 1)st year.
ii. Show that Plan A is superior to Plan B, i.e., prove FV_A > FV_B.
2.34. (Bonds) Given a coupon bond described by Equation (2.76) on page 68,
find the future value at maturity of the bond's cash flow.
2.35. (Bonds) Show that for a coupon bond, its yield to maturity (rY ), current
yield (r), and coupon rate (rC ) have the following relationships:
References
[1] Bodie, Z., Kane, A., Marcus, A.: Investments, 9th edn. McGraw-Hill Irwin,
New York (2011)
[2] Brealey, R., Myers, S., Allen, F.: Principles of Corporate Finance.
McGraw-Hill Irwin, New York (2011)
[3] Brown, S., Kritzman, M.: Quantitative Methods for Financial Analysis.
Dow Jones-Irwin, Homewood (1990)
[4] Chaplinsky, S., Doherty, P., Schill, M.: Methods of evaluating mergers and
acquisitions. Note Number UVA-F-1274. University of Virginia Darden
Business Publishing (2000)
[5] Choudhry, M.: Fixed-Income Securities and Derivatives Handbook.
Bloomberg Press, New York (2005)
[6] Davis, M.: The Math of Money. Copernicus Books, New York (2001)
[7] Gordon, M.J.: Dividends, earnings and stock prices. Rev. Econ. Stat. 41, 99
(1959)
[8] Guthrie, G., Lemon, L.: Mathematics of Interest Rates and Finance. Prentice
Hall, Upper Saddle River (2004)
[9] Hull, J.: Options, Futures, and Other Derivatives, 7th edn. Pearson Pren-
tice Hall, Upper Saddle River (2009)
[10] Kellison, S.: The Theory of Interest, 2nd edn. Irwin McGraw-Hill, Boston
(1991)
[11] Koller, T., Goedhart, M., Wessels, D.: Valuation: Measuring and Managing
the Value of Companies. Wiley, Hoboken (2010)
[12] L.E.K. Consulting, LLC: Discounted Cash Flow Valuation Primer. L.E.K.
Consulting, Chicago (2003)
[13] Lovelock, D., Mendel, M., Wright, A.: An Introduction to the Mathematics
of Money. Springer, New York (2007)
[14] Meserve, B.: Fundamental Concepts of Algebra. Dover, New York (1981)
[15] Muksian, R.: Mathematics of Interest Rates, Insurance, Social Security, and
Pensions. Prentice Hall, Upper Saddle River (2003)
[16] Reilly, F., Brown, K.: Investment Analysis and Portfolio Management.
South-Western Cengage Learning, Mason (2009)
[17] Wang, X.: A simple proof of Descartes's Rule of Signs. Am. Math. Mon.
111, 525 (2004)
[18] Williams, J.B.: The Theory of Investment Value. Harvard University Press,
Cambridge (1938). Reprinted in 1997 by Fraser Publishing
Chapter 3
Markowitz Portfolio Theory
1 Harry Markowitz, Merton Miller, and William F. Sharpe shared the 1990 Nobel Prize in Economic
Sciences. Markowitz won for his work on portfolio selection (see Press Release at Nobelprize.org).
- Investors: all investors are rational, i.e., they make financial decisions that
  maximize their expected satisfaction with possible wealth gains in the face
  of the risks these possible gains require.
- Equilibrium: supply equals demand.
- No arbitrage: no arbitrage opportunities exist, which means intuitively that
  there is no opportunity to make a costless, riskless profit. For an arbitrage,
  none of your funds is required, loans would be settled with interest,
  and a profit is still guaranteed. However, for mathematical modeling purposes,
  a broader and more precise definition of arbitrage is used later; see
  Definition 7.1 (page 334).
- Access to information: rapid availability of accurate information on securities
  exists.
- Efficiency: a security's price adjusts quickly to new information, so its
  current price reflects all known information impacting the security, which
  includes information about the past and expected future behavior of the
  security.
- Liquidity: any number of units of a security can be bought and sold quickly.
2 No lending or borrowing of money will be done in the current chapter, but it will be part of the
modeling in Chapter 4, which generalizes the Markowitz model.
3.1 Markowitz Portfolio Model: The Setup 85
\[
R_i(t_0, t_f) = \underbrace{\frac{S_i(t_f) - S_i(t_0)}{S_i(t_0)}}_{\text{capital-gain return}} + \underbrace{\frac{D_i(t_0, t_f)}{S_i(t_0)}}_{\text{dividend yield}}, \tag{3.1}
\]
\[
R_i(t_0, t_f)\, S_i(t_0), \quad (i = 1, \dots, N).
\]
We shall see that the return rates of the securities are the core quantities from
which all the other Markowitz portfolio inputs are calculated.
Notation. In most of this chapter, we shall consider security return rates over
a fixed investment time interval [t0 , t f ] and so the following simpler notation
will be used:
\[
R_i = R_i(t_0, t_f), \quad i = 1, \dots, N.
\]
In the formula (3.1) for \(R_i\), the future price \(S_i(t_f)\) is a discrete random variable
in models like binomial trees (Chapter 5), while in some continuous-time
models, it is lognormal (Chapter 6). Additionally, the future dividend \(D_i(t_0, t_f)\)
is also random since it is typically unknown. However, in most applications in
the book, we shall model the dividend as a known percentage of the security's
unit price at \(t_0\):
\[
D_i(t_0, t_f) = q_i\, S_i(t_0), \quad i = 1, \dots, N,
\]
where \(q_i\) is the (assumed known) annual dividend yield rate of the ith security.
A basic assumption about the securities' returns is that the covariance matrix V of
the return rates R1 , R2 , . . . , R N is invertible: the financial implication is that there
is no redundant security in the portfolio, i.e., no security with a return rate that is
a linear combination of the others. To see this, suppose for illustration that
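\[
R_1 = a_2 R_2 + \cdots + a_N R_N.
\]
Write the entries of V as \(\sigma_{ij}\), where \(\sigma_{ij} = \mathrm{Cov}(R_i, R_j) = \sigma_{ji}\), and consider the first column of V. The top entry
in the first column expands to
\[
\begin{aligned}
\sigma_{11} = \mathrm{Cov}(R_1, R_1) &= \mathrm{Cov}(R_1,\, a_2 R_2 + \cdots + a_N R_N) \\
&= a_2\, \mathrm{Cov}(R_1, R_2) + \cdots + a_N\, \mathrm{Cov}(R_1, R_N) \\
&= a_2\, \sigma_{12} + \cdots + a_N\, \sigma_{1N}.
\end{aligned}
\]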
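The same expansion applies to every entry of the first column, \(\sigma_{i1} = a_2 \sigma_{i2} + \cdots + a_N \sigma_{iN}\), so the first column of V is a linear combination of the other columns and V cannot be invertible. A small numerical sketch in Python can make this concrete for N = 3. The return samples, the coefficients, and the helper names are hypothetical, and population (rather than sample) covariances are used purely for simplicity.

```python
def covariance(xs, ys):
    """Population covariance of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical return samples; security 1 is redundant: R1 = a2*R2 + a3*R3.
a2, a3 = 0.4, 0.6
R2 = [0.02, -0.01, 0.03, 0.00, 0.01]
R3 = [0.01, 0.04, -0.02, 0.03, 0.00]
R1 = [a2 * x + a3 * y for x, y in zip(R2, R3)]

returns = [R1, R2, R3]
V = [[covariance(ri, rj) for rj in returns] for ri in returns]

# Each entry of column 1 equals a2 * (column 2 entry) + a3 * (column 3 entry).
column_gap = max(abs(V[i][0] - (a2 * V[i][1] + a3 * V[i][2])) for i in range(3))
det_V = det3(V)   # zero up to floating-point rounding, so V is not invertible
```

Removing the redundant security leaves the 2x2 covariance matrix of R2 and R3, whose determinant is nonzero for these samples.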
\[
x^{T} V x = \mathrm{Var}(x_1 R_1 + \cdots + x_N R_N) \ge 0.
\]
3 The theoretical importance of the positive definite property will be seen in Section 3.3.
\[
R_i = \frac{S_i(t_f) + D_i(t_0, t_f)}{S_i(t_0)} - 1 > -1.
\]
But, if \(R_i\) is normal, then it has a nonzero probability of satisfying \(R_i \le -1\),
which is inconsistent with a positive security price and nonnegative dividend.
However, the issue depends on the length of the time interval for which
one is considering the return rate. It is typically assumed that the normality of
security and portfolio return rates holds for a sufficiently short time span.
See Bodie, Kane, and Marcus [1, pp. 139–153] for an elementary introduction
as well as the research paper [15] by Levy and Duchin.
The multivariate normality condition is actually not necessary for Markowitz
mean-variance analysis (see Markowitz [18] and Markowitz and Blay [19]) and
so will not be enforced. Readers are also referred to the insightful reviews by
Goldberg [6] and Levy [14], which provide excellent guides to the book [19].
- Investors assess a portfolio only through its expected return rate and risk and
  agree on the joint distribution of the securities' return rates from t0 to tf.
- Investors are risk averse, i.e., for a portfolio with a given risk, investors
  demand the largest possible expected return rate and for a portfolio with
  a given expected return rate, they demand the least possible risk.
smallest possible risk for its given level of expected return rate and the largest
possible expected return rate for its given level of risk. The collection of all
efficient portfolios is called the efficient frontier. The efficient frontier contains
infinitely many efficient portfolios and each represents a different risk-return
tradeoff. We shall show how the Markowitz model is applied to determine the
efficient frontier of a two-security portfolio (Section 3.2.2) and then the general
case of a portfolio with N securities. We shall show how the Markowitz model
yields that the more one spreads a portfolio's capital across different risky
securities, the more the portfolio's risk is reduced (Sections 3.2.3 and 3.7), lending
theoretical support to "don't put all your eggs in one basket."
Finally, the place where an investor positions her portfolio on the efficient
frontier will have to do with her utility function, which indicates her satisfaction
with the risk-return tradeoff. In other words, she will rank potential returns in
the face of the potential risks it takes to realize those returns in such a way as to
maximize her expected satisfaction or happiness (utility). In other words, we
assume that an investor will seek an optimal portfolio, i.e., an efficient portfolio
that maximizes her expected utility function; see Section 3.6 for more.
One-Period Assumption
\[
V_P(t_0) > 0
\]
The one-period assumption: from the current time t0 to the final time t f ,
we make no change to the total number of securities, the type of securities,
or the number of units of any security in the portfolio.
Assume that the percentage of the initial capital V P (t0 ) invested in the ith
security is wi , which is called the weight of the ith security. Since the entire
initial capital V P (t0 ) is distributed across the N securities, the weights of all
the securities add up to 100%:
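\[
w_1 + \cdots + w_N = 1. \tag{3.2}
\]
The number of units \(n_i\) of the ith security in the portfolio is then
\[
n_i = \frac{w_i\, V_P(t_0)}{S_i(t_0)}, \quad i = 1, \dots, N, \tag{3.3}
\]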
where Si (t0 ) is the price of the ith security at t0 . Equivalently, the cost of ni
units of the ith security is ni Si (t0 ). Note that a non-integer number of units of a
security is allowed.
The initial value of the portfolio can then be expressed as the sum of the
costs of the various securities, where a security's cost is a product of cost per
unit and the number of units:
\[
V_P(t_0) = n_1 S_1(t_0) + \cdots + n_N S_N(t_0). \tag{3.4}
\]
The vector of units
\[
n(t_0) = (n_1, \dots, n_N) \tag{3.5}
\]
is called the trading strategy of the portfolio at t0. The value of
the trading strategy at t0 is defined to be the initial capital V P (t0 ) as expressed
in (3.4). By the one-period assumption, the number of units of each security is
held fixed to the end date t f , i.e., the trading strategy is held constant during
the period.
Example 3.1. Assume that today an initial capital of $5,000 is used to create a
three-security portfolio with 20% of the money in stock 1, 30% in stock 2, and
50% in stock 3. Suppose that the current share prices of the stocks are, respectively,
$40, $70, and $10. Then the current trading strategy of the portfolio is to
buy the following numbers of shares of stocks 1, 2, and 3, respectively:
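By (3.3), these share counts are given by the weight times the initial capital divided by the share price. A minimal Python sketch of the computation follows; the function and variable names are illustrative and not from the text, while the weights, prices, and capital are those of Example 3.1.

```python
def units_from_weights(weights, prices, capital):
    """Number of units n_i = w_i * V_P(t0) / S_i(t0) for each security, as in (3.3)."""
    return [w * capital / s for w, s in zip(weights, prices)]


weights = [0.20, 0.30, 0.50]   # fractions of the initial capital
prices = [40.0, 70.0, 10.0]    # current share prices in dollars
capital = 5000.0               # initial capital V_P(t0)

units = units_from_weights(weights, prices, capital)
# units is approximately [25, 21.4286, 250]

# Consistency check with (3.4): the costs of the positions add up to the capital.
total_cost = sum(n * s for n, s in zip(units, prices))
```

The non-integer count for stock 2 is permitted because the model allows fractional units of a security.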
Short Selling
At this stage, you may be implicitly assuming that the number of units and
weight of a security are nonnegative real numbers. However, we apply no such
restriction. This is because we assume that each security in the portfolio is obtained
by either buying, short selling, or taking no trading position (being flat). When you
buy ni units of a security, you are adding ni units of the security to your portfolio
and so we represent this position by ni > 0. When you sell ni units of a security,
your portfolio has ni fewer units of the security and we express this position by
ni < 0. When you do not hold a security, we represent that position by ni = 0.
In general, to close or liquidate a buy (respectively, sell) position in ni units of
a security, you must sell (respectively, buy) ni units of the security. The weight
wi corresponding to a position of ni units in a security will have the same sign
as ni .
Short selling securities is a selling of securities that varies in its details dep-
ending on the type of security. The most common example is short selling a
stock, where you sell a certain number of shares of a stock borrowed from a
broker. You will almost definitely need to have a margin (a certain amount of
required funds) in your account in case you are unable to return the borrowed
shares. You close the stock short sale by buying back the shares of the given
stock. The rationale behind short selling a stock is that you hope to make a
profit from a nontrivial decrease in the stock's price. If you sell the borrowed
shares of stock for $50 per share and the share price drops to $45 a month later,
then you can use your proceeds to buy back the shares and still have a payoff
of $5 per share (excluding transaction fees). When you buy back and return the
borrowed shares, you are said to have closed the short position.
Next, consider an option, i.e., a legal contract between two parties whereby
one party (the issuer/writer) sells to the other (the holder) the right, but not the
obligation, to buy from or sell to the issuer a fixed amount of a security (e.g.,
stock) at a preagreed price (called the strike price or exercise price) on or by a
preagreed date (called the expiration date). In particular, a call option is a legal
contract between a buyer (holder) and seller (issuer) granting the holder the
right, but not the obligation, to buy a stipulated amount of the asset from the
issuer at the strike price on or by the expiration date.
Short selling an option contract is issuing an option contract. For example,
short selling an equity call option, i.e., a call option contract on a stock means
that you are obligated to sell 100 shares of the stock at the strike price if the
option is exercised and you are assigned the exercise. Specifically, when an
equity option is exercised by a holder, the exercise is randomly assigned (in
the USA, by the Options Clearing Corporation) to a market participant who
short sold the same call option (i.e., same underlying stock, strike price, and
expiration). If you are assigned, then you close the short position by obtaining
100 shares of the stock (if you do not already have them) and sell each share at
the strike price. If you were not assigned yet, then you can close the position by
buying back the exact call contract (same stock, strike price, and expiration). In
general, you close a buy position in an option by selling the exact option and
close a short-sell position by buying back the exact option.
In our Markowitz context, for any proper subset of securities in a portfolio
that are short sold, we always use the proceeds along with the initial capital to
purchase the units of the remaining securities. We shall not consider a portfolio
where all its securities are short sold.
Now, at t0 our portfolio has \(n_1, \dots, n_N\) units, respectively, of securities 1
through N. When \(n_i > 0\), it means that \(n_i\) units of the ith security are bought
at time t0, while for \(n_i < 0\), the interpretation is that \(|n_i|\) units of the security are
short sold at t0. When no action is taken on the ith security, we write \(n_i = 0\).
Furthermore, by (3.3) the ith weight can be written as
\[
w_i = \frac{n_i\, S_i(t_0)}{V_P(t_0)}, \quad i = 1, \dots, N. \tag{3.6}
\]
Since Si (t0 ) > 0 and V P (t0 ) > 0, we see from (3.6) that the weight wi has the
same sign as ni . We then interpret the sign of the weights as:
- \(w_i > 0\) means buy \(n_i\) units of the ith security at time t0 (long position).
- \(w_i = 0\) means neither buy nor sell the ith security at time t0 (flat position).
- \(w_i < 0\) means short sell \(|n_i|\) units of the ith security at time t0 (short position).
Equation (3.4) also shows that the sum of the weights is still unity as in (3.2),
even if a proper subset of weights is negative:
\[
w_1 + \cdots + w_N = 1.
\]
When no short selling is used to construct the portfolio, the weights satisfy
\[
w_i \ge 0, \quad i = 1, \dots, N.
\]
Consequently, in the absence of short selling, since the weights are nonnegative
and sum to unity, we always have
\[
0 \le w_i \le 1, \quad i = 1, \dots, N.
\]
Example 3.2. Suppose that you identified a stock and a call option contract on
the stock to create a portfolio at t0. Assume that the positions you take in these
securities are to long (buy) \(\Delta(t_0)\) shares of the stock priced at \(S(t_0)\) per share
and to short (sell) the call option contract on the same stock, where the call is
sold at price \(C(t_0)\) per share of the stock. In practice, call option contracts are
typically based on 100 units of the underlier, but for modeling purposes, it is
simpler to quote the call price per unit of the underlier.
Since long positions are an inflow of securities into the portfolio and short
positions are an outflow, they are represented by positive and negative signs,
respectively, when tallying the total value of a portfolio. The portfolio's value
at t0 is then
\[
V_P(t_0) = S(t_0)\,\Delta(t_0) - C(t_0). \tag{3.7}
\]
Equivalently, you can obtain (3.7) if you unwind the two positions. Specifically,
liquidate (sell) the \(\Delta(t_0)\) shares of the stock at t0, which yields a cash inflow
of \(S(t_0)\,\Delta(t_0)\), and close your short position on the call (assuming it was not
assigned at t0). The latter means that you buy back a call contract on the same
stock and with the exact strike price and expiration date. This buyback yields a
cash outflow of \(C(t_0)\), which is represented mathematically as \(-C(t_0)\). The
total value of the portfolio at t0 is then the net sum of these cash flows, which
gives (3.7).
The trading strategy that created this portfolio is
\[
n(t_0) = (\Delta(t_0),\, -1).
\]
In other words, short selling the call contract brings in proceeds of \(C(t_0)\), which
when added to the initial capital \(V_P(t_0)\) in (3.7) yields the funds \(S(t_0)\,\Delta(t_0)\) to
buy \(\Delta(t_0)\) shares of the stock.
Note that w1 + w2 + w3 = 1.
Let us interpret the meaning of the above weight assignment. First, the
weights state that we use the following trading strategy to form the portfo-
lio:
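\[
n_1 = \frac{-0.2 \times \$5{,}000}{\$40} = -25 \quad \text{(short sell 25 shares of stock 1)}
\]
\[
n_2 = \frac{0.5 \times \$5{,}000}{\$70} = 35.7143 \quad \text{(buy 35.7143 shares of stock 2)}
\]
\[
n_3 = \frac{0.7 \times \$5{,}000}{\$10} = 350 \quad \text{(buy 350 shares of stock 3).}
\]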
Specifically, we create the portfolio by first short selling 25 shares of stock 1 to
obtain $1,000, which is 20% of the initial capital (since the proceeds are from
a short position). Adding these proceeds to the initial capital, we then have
$6,000 to invest in stocks 2 and 3.
Weight w2 tells us that we take 50% of the initial capital $5,000 to buy 35.7143
shares of stock 2. This reduces the initial capital to $2,500. Weight w3 indicates
that we use 70% of the initial capital, i.e., $3,500, to purchase 350 shares of
stock 3. Though the cost of the purchase exceeds the $2,500 remaining from
the initial capital, we have an extra $1,000 from the short sale to cover the
purchase. Finally, observe that (3.4) holds (to two decimal places):
\[
V_P(t_0) = (-25)(\$40) + (35.7143)(\$70) + (350)(\$10) = \$5{,}000.00.
\]
Note that using four decimal places in 35.7143 gives the desired accuracy,
while 35.71 would yield 4,999.70.
\[
R_P(t_0, t_f) = \frac{V_P(t_f) - V_P(t_0)}{V_P(t_0)} + \frac{D_P(t_0, t_f)}{V_P(t_0)}, \tag{3.8}
\]
where V P (t0 ) and V P (t f ) are the values of the portfolio at the start and end of
the investment period, and DP (t0 , t f ) is the total cash dividend during [t0 , t f )
from all the securities in the portfolio. For the investment interval, the amount
you get back beyond the initial investment is the percentage R P (t0 , t f ) of the
initial investment V P (t0 ):
\[
R_P(t_0, t_f)\, V_P(t_0).
\]
Since the investment interval [t0 , t f ] is fixed, we use the following simpler
notation:
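\[
R_P = R_P(t_0, t_f).
\]
By (3.4), the initial portfolio market value is
\[
V_P(t_0) = n_1 S_1(t_0) + \cdots + n_N S_N(t_0),
\]
and, since the number of units of each security is held fixed during the period,
the final portfolio market value is
\[
V_P(t_f) = n_1 S_1(t_f) + \cdots + n_N S_N(t_f). \tag{3.9}
\]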
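The total cash dividend received from the securities during [t0, tf) is
\[
D_P(t_0, t_f) = n_1 D_1(t_0, t_f) + \cdots + n_N D_N(t_0, t_f). \tag{3.10}
\]
Substituting these portfolio values and dividends into (3.8) gives
\[
\begin{aligned}
R_P &= \sum_{i=1}^{N} \frac{n_i}{V_P(t_0)} \Big[ S_i(t_f) - S_i(t_0) + D_i(t_0, t_f) \Big] \\
&= \sum_{i=1}^{N} \frac{n_i\, S_i(t_0)}{V_P(t_0)} \cdot \frac{S_i(t_f) - S_i(t_0) + D_i(t_0, t_f)}{S_i(t_0)}.
\end{aligned}
\]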
By (3.1) and (3.6), the portfolio return rate is the weighted sum of the securities'
return rates:
\[
R_P = \sum_{i=1}^{N} w_i R_i. \tag{3.11}
\]
The expected (or mean) portfolio return rate for the period [t0, tf] is then
\[
\mu_P = E(R_P) = \sum_{i=1}^{N} w_i\, \mu_i, \tag{3.12}
\]
where
\[
\mu_i = E(R_i) \tag{3.13}
\]
is the expected return rate of the ith security. In (3.12), every weight \(w_i\) is assumed
nonrandom, unless stated otherwise, and each expected return rate \(\mu_i\) is assumed finite
and known. The weights are to be determined in the search for an efficient portfolio,
while the expected return rates \(\mu_1, \dots, \mu_N\) are typically estimated using
samples of historical return rates.
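The identity (3.11) can be checked numerically by computing the portfolio return rate in two ways: directly from portfolio values and dividends as in (3.8), and as the weighted sum of the security return rates. The Python sketch below does this for one realized scenario. The weights and initial prices mirror the short-selling example earlier in the section, whereas the final prices, the dividends, and all names are assumptions made for illustration.

```python
def security_return(s0, sf, dividend):
    """Return rate (3.1): capital-gain return plus dividend yield."""
    return (sf - s0) / s0 + dividend / s0


capital = 5000.0
weights = [-0.2, 0.5, 0.7]          # a short position in security 1
s0 = [40.0, 70.0, 10.0]             # prices at t0
sf = [38.0, 77.0, 10.5]             # assumed prices at tf
div = [0.0, 1.4, 0.2]               # assumed cash dividends per unit

units = [w * capital / p for w, p in zip(weights, s0)]          # (3.3)

# Direct computation from portfolio values and dividends, as in (3.8).
v0 = sum(n * p for n, p in zip(units, s0))
vf = sum(n * p for n, p in zip(units, sf))
d_p = sum(n * d for n, d in zip(units, div))
r_direct = (vf - v0) / v0 + d_p / v0

# Weighted sum of the security return rates, as in (3.11).
r_weighted = sum(w * security_return(a, b, d)
                 for w, a, b, d in zip(weights, s0, sf, div))
```

Both routes give a portfolio return rate of 11.9% for this scenario. Note that the short position contributes positively here because the assumed price of security 1 falls.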
Example 3.4. Consider a portfolio with two stocks over a time interval $[t_0, t_f]$
corresponding to the next month. Denote the expected return rate over the next
month of a stock in the portfolio by
$$\mu_{\text{monthly}} = E\big(R(t_0, t_f)\big).$$
Suppose that $t_0 = t_{-0} > t_{-1} > \cdots > t_{-n}$ denotes a sample of end-of-month to
end-of-month trading dates for $n$ consecutive months from the present date $t_0$ into
the past. Denote the corresponding historical return rates as follows:$^{4}$
$$\widehat{R}(t_{-n}, t_{-n+1}), \quad \widehat{R}(t_{-n+1}, t_{-n+2}), \quad \dots, \quad \widehat{R}(t_{-1}, t_0).$$
The stock's theoretical ensemble expected monthly return rate $\mu_{\text{monthly}}$ is
estimated using the time average of the monthly return data (see Exercise 3.12):
$$\overline{R}_{\text{monthly}} = \frac{1}{n} \sum_{j=1}^{n} \widehat{R}(t_{-j}, t_{-j+1}).$$
$^{4}$ Note that the returns are notationally the reverse of the case for future times: for past times $t_{-j} < t_{-j+1}$,
we use $\widehat{R}(t_{-j}, t_{-j+1})$ instead of $R(t_{j-1}, t_j)$, which is for future times $t_{j-1} < t_j$.
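The time-average estimate in Example 3.4 can be sketched in a few lines of Python. The month-end prices below are hypothetical, dividends are ignored for simplicity, and the volatility estimate uses the sample standard deviation with the $n - 1$ divisor, which is an assumption rather than necessarily the estimator intended in Exercise 3.12.

```python
import statistics


def simple_returns(prices):
    """Return rates between consecutive dates; prices are ordered oldest first."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


# Hypothetical month-end adjusted closing prices, oldest first.
month_end_prices = [100.0, 102.0, 99.96, 104.958]

historical_returns = simple_returns(month_end_prices)
n = len(historical_returns)

# Time average of the n monthly return rates, as in Example 3.4.
mean_monthly_return = sum(historical_returns) / n

# Sample standard deviation (n - 1 divisor); an assumed choice of estimator.
monthly_volatility = statistics.stdev(historical_returns)

print(f"estimated expected monthly return: {mean_monthly_return:.4%}")
print(f"estimated monthly volatility:      {monthly_volatility:.4%}")
```

With real data one would typically use many more months; three returns are used here only to keep the arithmetic easy to follow by hand.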
96 3 Markowitz Portfolio Theory
We saw that since the securities in the portfolio have random future prices
and (in general) random future dividends, the portfolio return rate $R_P$ is also
random. This uncertainty in the return rate gives rise to the portfolio's risk. In
other words, portfolio risk is determined by how much the possible values of
the random portfolio return rate $R_P$ can spread away from the expected return
rate $\mu_P$. More precisely, the risk of a portfolio is modeled in Markowitz theory
by the standard deviation of its return rate:
$$\sigma_P = \sqrt{\mathrm{Var}(R_P)} = \sqrt{E\big((R_P - \mu_P)^2\big)}. \tag{3.14}$$
where
$$\sigma_i = \sqrt{\mathrm{Var}(R_i)} = \sqrt{E\big((R_i - \mu_i)^2\big)} \tag{3.16}$$
$$\sigma_{ii} = \sigma_i^2.$$
In (3.15), the volatilities $\sigma_i$ and covariances $\sigma_{ij}$ are assumed to be finite and known.
The Markowitz model (3.15) of portfolio risk implies that the risk of a portfolio
comes from two sources: the weighted contributions of the variances $\sigma_i^2$, where
Risk of a Security
At the start of Section 3.1 we informally defined a risky security as one
whose return rates cannot be predicted with certainty. In the Markowitz setting,
risk is modeled more narrowly using volatility. Specifically, the risk of the $i$th
security is modeled by $\sigma_i$ for $i = 1, \dots, N$. In other words, the $i$th security's risk
is a measure of how much the random return rate $R_i$ spreads about the
security's expected return rate $\mu_i$. The risks $\sigma_1, \dots, \sigma_N$ are usually estimated from
historical data (see Exercise 3.12).
Remark 3.2. The portfolio risk (3.14) and security risk (3.16) measure how the
random return rates $R_P$ and $R_i$ disperse above and below the expected return
rates $\mu_P$ and $\mu_i$, respectively. Some have argued that risk should instead be
modeled by how much the return rates spread below the mean (downside risk)
or the probability of the return rates being below some threshold (shortfall
probability). Later we shall explore three portfolio risk measures: the Sortino
ratio (Section 4.2.2), the maximum drawdown (Section 4.2.3), and the
value-at-risk (Section 4.2.5). See Grinold and Kahn [9, pp. 41-46] for a critique of
these risk measures relative to the standard deviation. Throughout our text,
however, the primary measure of risk shall be the standard deviation.
Example 3.5. (We continue with Example 3.4 on page 95.) Let us consider the
variance of a stock in the portfolio using $n$ historical monthly consecutive
return rates over times $t_0 = t_{-0} > t_{-1} > t_{-2} > \cdots > t_{-n+1} > t_{-n}$. The data runs from
the past time $t_{-n}$ to the present time $t_0$:
$$\widehat{R}(t_{-n}, t_{-n+1}), \quad \dots, \quad \widehat{R}(t_{-2}, t_{-1}), \quad \widehat{R}(t_{-1}, t_0).$$
The theoretical variance $\sigma^2_{\text{month}}$ of the stock for the next month is estimated
The other contributor to portfolio risk is the weighted collection of the
covariances of the random return rates of the securities in the portfolio. We shall see
in Section 3.7 that, for a portfolio with a sufficiently large number of different
securities, the weighted sum of the covariances of the securities dominates the
weighted sum of the securities' volatilities. In this section, we instead review
some basic insights into the covariance of a pair of return rates of risky
securities using the associated correlation coefficient. The correlation coefficient of
the return rates $R_i$ and $R_j$ of the $i$th and $j$th securities will be written as follows:
$$\rho(R_i, R_j) = \frac{\sigma_{ij}}{\sigma_i \sigma_j} = \rho_{ij}.$$
The respective risks of the two securities are $\sigma_i$ and $\sigma_j$, while the covariance is
$$\sigma_{ij} = \mathrm{Cov}(R_i, R_j).$$
In general, the (Pearson) correlation coefficient of random variables $X$ and $Y$
with nonzero volatilities $\sigma_X$ and $\sigma_Y$, respectively, is defined by
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$
A basic property is
$$-1 \le \rho(X, Y) \le 1.$$
The correlation coefficient is a unit-independent measure of how $X$ and $Y$
vary relative to each other, which is not the case for the covariance $\mathrm{Cov}(X, Y)$,
where we assume the units carry a positive sign. This is a special case of the
following general property showing how the covariance and correlation
coefficient behave under affine transformations of $X$ and $Y$ (see Exercise 3.16 on
page 146):
$$\mathrm{Cov}(aX + b, cY + d) = ac\, \mathrm{Cov}(X, Y),$$
and for $ac \neq 0$,
$$\rho(aX + b, cY + d) = \frac{ac}{|ac|}\, \rho(X, Y).$$
In other words, for any two discrete random variables $X$ and $Y$, which is the
setting when working with data, the closer the random variables are to being
perfectly positively correlated, i.e., $\rho(X, Y) = 1$, the more likely the values of $X$ and
$Y$ are close to a positively sloped line. Similarly, the closer the random variables
are to having a perfectly negative correlation, $\rho(X, Y) = -1$, the more the values
of $X$ and $Y$ concentrate near a negatively sloped line. For $-1 < \rho(X, Y) < 1$, we
then have varying degrees of how much the values of $X$ and $Y$ spread away
from a straight line.
A pair of random variables $X$ and $Y$ is called uncorrelated if $\rho(X, Y) = 0$. This
means that a scatter plot of possible values of the two random variables has
no linear relationship and so may appear as a cluster of independent points
or points showing an overall nonlinear relationship. Indeed, if $X$ and $Y$ are
independent, then they are uncorrelated. The converse is not true since it is
possible for two uncorrelated random variables to be dependent, though their
dependence will be nonlinear.
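A standard illustration of uncorrelated but dependent random variables is $X$ uniform on $\{-1, 0, 1\}$ with $Y = X^2$. The sketch below, which uses exact rational arithmetic and an example chosen here for illustration (it is not taken from the text), confirms that the covariance vanishes even though $Y$ is a function of $X$.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X**2, so Y is completely determined by X.
outcomes = [(x, x * x) for x in (-1, 0, 1)]
prob = Fraction(1, 3)


def expect(func):
    """Expectation of func(X, Y) under the uniform distribution on outcomes."""
    return sum(prob * func(x, y) for x, y in outcomes)


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Dependence check: P(X = 1, Y = 0) differs from P(X = 1) * P(Y = 0).
p_joint = sum((prob for x, y in outcomes if x == 1 and y == 0), Fraction(0))
p_x1 = sum((prob for x, y in outcomes if x == 1), Fraction(0))
p_y0 = sum((prob for x, y in outcomes if y == 0), Fraction(0))

print("Cov(X, Y) =", cov_xy)
print("P(X=1, Y=0) =", p_joint, "but P(X=1) P(Y=0) =", p_x1 * p_y0)
```

The dependence here is purely nonlinear (quadratic), which is why the correlation coefficient, a measure of linear association, does not detect it.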
As noted earlier, the covariances $\sigma_{ij}$ and correlation coefficients $\rho_{ij}$ are
estimated using historical data. There are several Web resources that compute the
correlation coefficients of pairs of stocks.$^{5}$
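Example 3.6. (We continue with Example 3.4 on page 95.) Let $R^A_{\text{month}}$ and
$R^B_{\text{month}}$ be the random return rates over the next month of the two stocks in
the portfolio. Write their covariance and correlation coefficient as follows:
$$\sigma^{AB}_{\text{month}} = \mathrm{Cov}\big(R^A_{\text{month}}, R^B_{\text{month}}\big), \qquad
\rho^{AB}_{\text{month}} = \frac{\sigma^{AB}_{\text{month}}}{\sigma^A_{\text{month}}\, \sigma^B_{\text{month}}}.$$
and
$$\widehat{\rho}^{\,AB}_{\text{monthly}} = \frac{\widehat{\sigma}^{AB}_{\text{monthly}}}{\widehat{\sigma}^{A}_{\text{monthly}}\, \widehat{\sigma}^{B}_{\text{monthly}}},$$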
6 Recall from Chapter 2 that interest rate is per annum, unless otherwise stated.
Using the logarithmic return is advantageous not only for continuous
compounding but also for discrete compounding problems. In general, for a
nondividend-paying portfolio with initial value $V_P(t_0)$ and end-of-period value
$V_P(t_f)$, we define the portfolio log return from $t_0$ to $t_f$ to be
$$r^L_{P,\text{span}} = \ln\!\left(\frac{V_P(t_f)}{V_P(t_0)}\right).$$
Equivalently,
$$V_P(t_f) = V_P(t_0)\, e^{r^L_{P,\text{span}}}.$$
In the example above, $t_0$ is the current time, $t_f$ is 1 month from now, and the
log return over $[t_0, t_f]$ is
$$r^L_{P,\text{span}} = 0.995\%.$$
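For the typical situations we shall consider, the initial value $V_P(t_0)$ is known
and $V_P(t_f)$ is a random future value, so $r^L_{P,\text{span}}$ is random. In this case, the
expectation and volatility of the log return are
$$\mu^L_{P,\text{span}} = E\big(r^L_{P,\text{span}}\big), \qquad
\sigma^L_{P,\text{span}} = \sqrt{\mathrm{Var}\big(r^L_{P,\text{span}}\big)}.$$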
Now, let us consider how the portfolio log returns, as well as their expectation
and volatility, behave under different time horizons. Divide $[t_0, t_f]$ into $n$
equal-length subintervals:
$$[t_0, t_1],\ [t_1, t_2],\ \dots,\ [t_{n-1}, t_n], \qquad t_n = t_f, \qquad t_j - t_{j-1} = h_n = \frac{t_f - t_0}{n}.$$
Let $V_P(t_j)$ be the value of the portfolio at time $t_j$, where $j = 1, \dots, n$. The log
return from $t_{j-1}$ to $t_j$ is given by
$$r^L_{P,\text{prd}}(t_j) = \ln\!\left(\frac{V_P(t_j)}{V_P(t_{j-1})}\right).$$
These log returns over the subintervals relate to the log return over the entire
interval as follows:
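$$r^L_{P,\text{span}} = \ln\!\left(\frac{V_P(t_n)}{V_P(t_0)}\right)
= \ln\!\left(\frac{V_P(t_1)}{V_P(t_0)} \cdot \frac{V_P(t_2)}{V_P(t_1)} \cdots \frac{V_P(t_n)}{V_P(t_{n-1})}\right)
= \sum_{j=1}^{n} \ln\!\left(\frac{V_P(t_j)}{V_P(t_{j-1})}\right)
= \sum_{j=1}^{n} r^L_{P,\text{prd}}(t_j). \tag{3.18}$$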
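The additivity in (3.18) is easy to confirm numerically. The following sketch uses a hypothetical sequence of portfolio values $V_P(t_0), \dots, V_P(t_4)$ (illustration values only) and checks that the per-period log returns sum to the log return over the whole span.

```python
import math

# Hypothetical portfolio values V_P(t_0), ..., V_P(t_4).
values = [2000.0, 2030.0, 1995.0, 2050.0, 2040.0]

# Per-period log returns ln(V_P(t_j) / V_P(t_{j-1})).
period_log_returns = [math.log(later / earlier)
                      for earlier, later in zip(values, values[1:])]

# Log return over the full span ln(V_P(t_n) / V_P(t_0)).
span_log_return = math.log(values[-1] / values[0])

print(f"sum of period log returns: {sum(period_log_returns):.10f}")
print(f"span log return:           {span_log_return:.10f}")

# Recover the final value from the span log return.
recovered_final_value = values[0] * math.exp(span_log_return)
```

The intermediate values cancel in the telescoping product, so only the first and last portfolio values affect the total.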
Before considering the expectation and volatility over different time
horizons, we need two assumptions:
For the first assumption, recall that the present value $V_P(t_0)$ of the
portfolio is known and reflects all available information about the asset at the
current time $t_0$.$^{7}$ Suppose that the value of the portfolio $V_P(t_1)$ at the future
date $t_1$ is dependent on information that is not known today. Similarly, the
value $V_P(t_2)$ at date $t_2$ is assumed to be based on information not known
on date $t_1$. For this reason, we assume that the random log returns over
different subintervals are uncorrelated:
$$\mathrm{Cov}\big(r^L_{P,\text{prd}}(t_k),\, r^L_{P,\text{prd}}(t_j)\big) = 0,$$
where $k \neq j$ and $k, j = 1, \dots, n$.
For the second assumption, recall that by the one-period assumption, we
do not make changes to the portfolio during the time interval $[t_0, t_f]$.
Consequently, we assume that the log returns $r^L_{P,\text{prd}}(t_j)$
across the future subintervals are identically distributed. Write the expectation
and volatility of the log returns for each subinterval as
$$\mu^L_{P,\text{prd}} = E\big(r^L_{P,\text{prd}}(t_j)\big), \qquad
\sigma^L_{P,\text{prd}} = \sqrt{\mathrm{Var}\big(r^L_{P,\text{prd}}(t_j)\big)},$$
where $j = 1, \dots, n$.
We can now relate the expectation of the log return for the full time interval
$[t_0, t_f]$, where $t_f = t_n$, to the expected log return over the $n$ subintervals. Using
(3.18), we obtain
$$\mu^L_{P,\text{span}} = E\!\left(\ln \frac{V_P(t_n)}{V_P(t_0)}\right)
= E\!\left(\sum_{j=1}^{n} r^L_{P,\text{prd}}(t_j)\right) = n\, \mu^L_{P,\text{prd}}. \tag{3.19}$$
Hence, by (3.19) the expected portfolio log return over the time span is $n$
times the expected portfolio log return over a period. For instance, an annual
expected portfolio log return is 12 times the monthly expected portfolio log
return. In other words, when the expected log return per period is positive,
longer time horizons have a higher expected log return compared to shorter ones.
For the variance of the log return over $[t_0, t_f]$ with uncorrelated and
identically distributed log returns across consecutive subintervals, we obtain
$$\big(\sigma^L_{P,\text{span}}\big)^2 = n\, \big(\sigma^L_{P,\text{prd}}\big)^2,
\qquad \text{that is,} \qquad \sigma^L_{P,\text{span}} = \sqrt{n}\; \sigma^L_{P,\text{prd}}.$$
The volatility of the portfolio log return increases as the square root of the
number of periods in the time horizon increases. Longer time horizons will have
higher volatility than shorter time horizons. In particular, the volatility of an
annual portfolio log return is $\sqrt{12}$ times the volatility of a monthly portfolio log
return.
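The two scaling rules can be packaged in a small helper. This is a minimal sketch under the two assumptions stated above (uncorrelated, identically distributed log returns across subintervals); the function name and the monthly inputs are hypothetical.

```python
import math


def scale_log_return_stats(mu_period, sigma_period, n_periods):
    """Scale per-period expected log return and volatility to n periods.

    Assumes uncorrelated, identically distributed per-period log returns,
    so the mean scales by n and the volatility by sqrt(n).
    """
    return n_periods * mu_period, math.sqrt(n_periods) * sigma_period


# Hypothetical monthly figures: 1% expected log return, 4% volatility.
mu_annual, sigma_annual = scale_log_return_stats(0.01, 0.04, 12)
print(f"annual expected log return: {mu_annual:.4f}")
print(f"annual volatility:          {sigma_annual:.4f}")
```

Because the mean grows like $n$ while the volatility grows only like $\sqrt{n}$, the ratio of expected log return to volatility improves with the horizon under these assumptions.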
Lastly, the portfolio log return $r^L_{P,\text{span}}$ relates to the portfolio return rate $R_P$ as
follows:
$$r^L_{P,\text{span}} = \ln\!\left(\frac{V_P(t_n)}{V_P(t_0)}\right) = \ln(1 + R_P),$$
where $t_n = t_f$. Taylor expanding the log return yields
$$r^L_{P,\text{span}} = R_P - \frac{R_P^2}{2} + \frac{R_P^3}{3} - \frac{R_P^4}{4} + \cdots \quad \text{for } |R_P| < 1. \tag{3.21}$$
Consequently, if $|R_P|$ is sufficiently small, then the log return and the return
rate are approximately equal:
$$r^L_{P,\text{span}} \approx R_P, \qquad |R_P| \ll 1.$$
However, over a sufficiently long time span, there is no guarantee that the
risk of a portfolio will not increase or the magnitude of its return rate will
be sufficiently small. In this case, we cannot treat the portfolio log return and
portfolio return rate as approximately equal.
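To see how quickly the approximation degrades, the sketch below compares $\ln(1 + R_P)$ with $R_P$ and with the four-term Taylor partial sum from (3.21) for a few hypothetical return rates.

```python
import math


def log_return(simple_return):
    """Log return ln(1 + R) for a return rate R > -1."""
    return math.log1p(simple_return)


def taylor_log_return(simple_return, terms=4):
    """Partial sum R - R^2/2 + R^3/3 - ... of (3.21), valid for |R| < 1."""
    return sum((-1) ** (k + 1) * simple_return ** k / k
               for k in range(1, terms + 1))


for rate in (0.01, 0.05, 0.20, 0.50):
    exact = log_return(rate)
    print(f"R = {rate:4.2f}  ln(1+R) = {exact:.6f}  "
          f"gap R - ln(1+R) = {rate - exact:.6f}  "
          f"4-term Taylor = {taylor_log_return(rate):.6f}")
```

For a 1% return the gap is of order $R_P^2/2$, roughly five thousandths of a percentage point, while for a 50% return it is more than nine percentage points, so the two quantities should not be interchanged over long horizons.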
Looking Ahead
The remainder of the chapter will show how to apply the infrastructure of
the Markowitz model to selecting the weights that produce efficient portfo-
Suppose that you have an amount of money $V_P(t_0)$ to create a portfolio with
two risky securities. After an appropriate amount of due diligence, you have
identified two risky securities to buy today $t_0$ and hold in the portfolio until
future date $t_f$. The fundamental portfolio question we shall address is
What percentage of the money $V_P(t_0)$ should you allocate today to each
security to create an efficient portfolio?
In other words, find the weights that give rise to an efficient portfolio.
Let us first collect some quantities needed for the Markowitz model in a setting
of two securities, say, securities 1 and 2. The respective random return rates
are $R_1$ and $R_2$. The securities' expected return rates $\mu_1$ and $\mu_2$, risks $\sigma_1 > 0$ and
$\sigma_2 > 0$, and correlation coefficient $\rho_{12} = \rho$ are assumed to have been estimated
(either by you or by a company offering such a service). For real data the two
securities will typically not have identical expected return rates ($\mu_1 = \mu_2$),
identical risks ($\sigma_1 = \sigma_2$), or return rates with either a perfectly positive correlation
($\rho = 1$) or perfectly negative correlation ($\rho = -1$). These mathematical
idealizations will not be assumed by default and will explicitly be identified when
considered:
Unless stated to the contrary, assume that $\mu_1 \neq \mu_2$, $\sigma_1 \neq \sigma_2$, and
$-1 < \rho < 1$. Without loss of generality, we assume $\sigma_2 > \sigma_1 > 0$.
3.2.1 Preliminaries
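Before addressing how to determine the weights that produce an efficient
portfolio, we shall present the needed quantities conveniently in matrix form.
For two securities, the weights are given by a vector
$$\mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},$$
whose entries sum to unity. With $\mathbf{e} = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$, this constraint reads
$$1 = w_1 + w_2 = \mathbf{w}^T \mathbf{e}.$$
The random return rates $R_1$ and $R_2$, the expected return rates $\mu_1$ and $\mu_2$,
and the covariances of $R_1$ and $R_2$ can also be compiled conveniently in matrix
form:
$$\mathbf{R} = \begin{bmatrix} R_1 \\ R_2 \end{bmatrix}, \qquad
\boldsymbol{\mu} = E(\mathbf{R}) = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad
\mathbf{V} = \begin{bmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}.$$
The expected portfolio return rate and the portfolio risk are then
$$\mu_P(\mathbf{w}) = w_1 \mu_1 + w_2 \mu_2 = \boldsymbol{\mu}^T \mathbf{w} \tag{3.22}$$
$$\sigma_P(\mathbf{w}) = \sqrt{w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2\rho\, w_1 w_2\, \sigma_1 \sigma_2} = \sqrt{\mathbf{w}^T \mathbf{V} \mathbf{w}}. \tag{3.23}$$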
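Since $\sigma_2 > \sigma_1 > 0$ and $\mathbf{V}$ is positive definite within our setup of the Markowitz
model, we also have $-1 < \rho < 1$.
We now collect some basic consequences of the positive definiteness of the
covariance matrix $\mathbf{V}$ and introduce additional notation:
$$A \equiv \mathbf{e}^T \mathbf{V}^{-1} \mathbf{e} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{\det \mathbf{V}} \tag{3.27}$$
$$B \equiv \boldsymbol{\mu}^T \mathbf{V}^{-1} \mathbf{e} = \frac{(\sigma_2^2 - \rho\,\sigma_1\sigma_2)\,\mu_1 + (\sigma_1^2 - \rho\,\sigma_1\sigma_2)\,\mu_2}{\det \mathbf{V}} \tag{3.28}$$
$$C \equiv \boldsymbol{\mu}^T \mathbf{V}^{-1} \boldsymbol{\mu} = \frac{\mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2 - 2\rho\,\mu_1\mu_2\,\sigma_1\sigma_2}{\det \mathbf{V}} \tag{3.29}$$
$$AC - B^2 = \frac{(\mu_1 - \mu_2)^2}{\det \mathbf{V}}. \tag{3.30}$$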
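Let us consider their signs. Since $\mathbf{e} \neq \mathbf{0}$ and $\boldsymbol{\mu} \neq \mathbf{0}$, the positive definiteness
of $\mathbf{V}^{-1}$ gives
$$A > 0 \tag{3.31}$$
$$C > 0. \tag{3.32}$$
For the cross term $B$, we cannot draw any conclusion about its sign at this
stage. However, Equation (3.32) and the linear independence of $\boldsymbol{\mu}$ and $\mathbf{e}$
yield$^{8}$
$^{8}$ If $B\boldsymbol{\mu} - C\mathbf{e} = \mathbf{0}$, then the linear independence of $\boldsymbol{\mu}$ and $\mathbf{e}$ implies $B = C = 0$. This contradicts $C > 0$.
Hence, $B\boldsymbol{\mu} - C\mathbf{e} \neq \mathbf{0}$.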
3.2 Two-Security Portfolio Theory 107
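$$B\boldsymbol{\mu} - C\mathbf{e} \neq \mathbf{0}.$$
Consequently, the positive definiteness of $\mathbf{V}^{-1}$ gives
$$(B\boldsymbol{\mu} - C\mathbf{e})^T\, \mathbf{V}^{-1}\, (B\boldsymbol{\mu} - C\mathbf{e}) > 0.$$
Expanding the left-hand side gives $B^2 C - 2B^2 C + C^2 A = C\,(AC - B^2)$, and
since $C > 0$, it follows that
$$AC - B^2 > 0. \tag{3.33}$$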
Note that we could have obtained (3.33) directly from (3.30) since $\mu_1 \neq \mu_2$
and $\det \mathbf{V} > 0$. However, our previous arguments for the signs of $A$, $C$, and
$AC - B^2$ were independent of the detailed expression for these quantities.
This will allow us to carry these results over to the $N$-security case.
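The closed forms for $A$, $B$, $C$, and $AC - B^2$ in the two-security case can be checked against the matrix definitions $\mathbf{e}^T \mathbf{V}^{-1} \mathbf{e}$, $\boldsymbol{\mu}^T \mathbf{V}^{-1} \mathbf{e}$, and $\boldsymbol{\mu}^T \mathbf{V}^{-1} \boldsymbol{\mu}$. The sketch below does this in plain Python for hypothetical inputs (the expected returns, risks, and correlation are illustration values only).

```python
import math


def inverse_2x2(m):
    """Inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def quad_form(x, m, y):
    """Bilinear form x^T m y for 2-vectors x, y and a 2x2 matrix m."""
    return sum(x[i] * m[i][j] * y[j] for i in range(2) for j in range(2))


# Hypothetical inputs.
mu1, mu2 = 0.05, 0.11
s1, s2, rho = 0.10, 0.20, 0.3

V = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
V_inv, det_V = inverse_2x2(V)
e, mu = [1.0, 1.0], [mu1, mu2]

# Matrix definitions.
A = quad_form(e, V_inv, e)
B = quad_form(mu, V_inv, e)
C = quad_form(mu, V_inv, mu)

# Closed forms for two securities.
A_closed = (s1 ** 2 + s2 ** 2 - 2 * rho * s1 * s2) / det_V
B_closed = ((s2 ** 2 - rho * s1 * s2) * mu1
            + (s1 ** 2 - rho * s1 * s2) * mu2) / det_V
C_closed = (mu1 ** 2 * s2 ** 2 + mu2 ** 2 * s1 ** 2
            - 2 * rho * mu1 * mu2 * s1 * s2) / det_V
disc_closed = (mu1 - mu2) ** 2 / det_V

print(math.isclose(A, A_closed), math.isclose(B, B_closed),
      math.isclose(C, C_closed), math.isclose(A * C - B * B, disc_closed))
```

The check also makes the sign conclusions concrete: $A$, $C$, and $AC - B^2$ come out positive for any admissible inputs, while the sign of $B$ depends on the data.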
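$$w_1 + w_2 = 1, \qquad w_1 \mu_1 + w_2 \mu_2 = \mu,$$
$$\det \mathbf{K} = \mu_2 - \mu_1 \neq 0.$$
Note that since unlimited short selling is permitted, we have $-\infty < w < \infty$
and so $-\infty < \mu < \infty$. If short selling is forbidden (i.e., $0 \le w \le 1$) and if we
assume $\mu_1 < \mu_2$, then $\mu_1 \le \mu \le \mu_2$.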
Though the expression of $\mathbf{w}_\mu$ in (3.37) is specific to the two-security case, we
can actually cast $\mathbf{w}_\mu$ in the following form, which carries over to $N$ securities:
$$\mathbf{w}_\mu = \frac{C - \mu B}{AC - B^2}\, \mathbf{V}^{-1} \mathbf{e} + \frac{\mu A - B}{AC - B^2}\, \mathbf{V}^{-1} \boldsymbol{\mu}. \tag{3.38}$$
Establishing (3.38) is a rather lengthy computation (Exercise 3.20). Moreover,
the form (3.38) is more complicated than (3.37) and its origin seems mysterious
at this stage. However, it will appear naturally during the $N$-security efficient
frontier analysis in Section 3.3.2 and allow us to link back readily to the
two-security efficient frontier (see page 123).
Given that the constraint equations in (3.35) have a unique solution $\mathbf{w}_\mu$, this
is the only portfolio weight vector available to solve (3.34). In other words,
the solution $\mathbf{w}_\mu$ yields a unique portfolio risk $\sigma_P(\mathbf{w}_\mu)$ given a portfolio
expected return rate $\mu_P(\mathbf{w}_\mu) = \mu$. The quantity $\sigma_P(\mathbf{w}_\mu)$ is then the minimum
possible portfolio risk, being the only risk associated with $\mu$. However, this is not
enough to decide whether $\mathbf{w}_\mu$ determines an efficient portfolio because we do
not know whether $\mu_P(\mathbf{w}_\mu) = \mu$ is the maximum possible expected return rate
given $\sigma_P(\mathbf{w}_\mu)$.
We shall show that $\mu$ will have to lie in a restricted range in order for $\mathbf{w}_\mu$
to give an efficient portfolio. Allowing the expected portfolio return $\mu$ to vary
over $\mathbb{R}$, we shall see that the corresponding portfolio risk $\sigma_P(\mathbf{w}_\mu)$ traces out
a branch of a hyperbola. The turning point on this branch will be the global
minimum portfolio risk. The efficient frontier will be the curve segment from
the turning point to the upper part of the given branch of the hyperbola. We
now detail the computation of the efficient frontier.
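First, to emphasize that the portfolio risk is a function of $\mu$, we also write
$$\sigma_P(\mathbf{w}_\mu) = \sigma_P(\mu).$$
$$\sigma_P^2(\mu) = \frac{A\mu^2 - 2B\mu + C}{AC - B^2}. \tag{3.40}$$
To identify the graph of (3.40), complete the square of the numerator to get
$$\sigma_P^2(\mu) = \frac{A}{AC - B^2} \left(\mu - \frac{B}{A}\right)^2 + \frac{1}{A}, \tag{3.41}$$
and introduce the following (upright) variables:
$$\sigma_P = \sigma_P(\mu), \qquad \mu_P = \mu.$$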
Fig. 3.1 The solid curve (including the turning point) shows the Markowitz efficient frontier $M_{E,2}$ for a two-security
portfolio. The turning point of the curve is the global minimum-variance portfolio $(\sigma_G, \mu_G)$. The
efficient frontier curve and the dotted curve extend to infinity when unlimited short selling is allowed
(since $-\infty < \mu < \infty$). If there is no short selling and $\mu_1 < \mu_G < \mu_2$, then $\mu_1 \le \mu \le \mu_2$ and the upper
portion of the efficient frontier ends at security 2, while the dotted curve ends at security 1. Here
$-1 < \rho < 1$, $\mu_1 \neq \mu_2$, and $\sigma_1 \neq \sigma_2$
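Since the turning point is the furthest point to the left on the graph, the
portfolio risk $\sigma_G$ is indeed the global minimum value of the portfolio risk function
$\sigma_P(\mu)$ as $\mu$ varies over $\mathbb{R}$. Furthermore
$$\mu_G = \frac{B}{A} = \frac{(\sigma_2^2 - \rho\,\sigma_1\sigma_2)\,\mu_1 + (\sigma_1^2 - \rho\,\sigma_1\sigma_2)\,\mu_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}. \tag{3.45}$$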
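$$w_G = \frac{\sigma_2^2 - \rho\,\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}, \qquad
1 - w_G = \frac{\sigma_1^2 - \rho\,\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}. \tag{3.47}$$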
Let us now determine the efficient frontier, which is the set of all efficient
portfolios. Recall that an efficient portfolio is determined by a risk-mean pair
$(\sigma_P, \mu_P)$, where $\sigma_P$ is the smallest possible portfolio risk for the expected return
rate $\mu_P$ and $\mu_P$ is the largest possible expected return rate for the portfolio risk
$\sigma_P$. Inspection of Figure 3.1 shows that the efficient frontier for the given plot
consists of the solid curve, including the turning point.
More generally, given a portfolio expected return rate $\mu_P = \mu$, there is a
unique portfolio risk $\sigma_P = \sigma_P(\mu)$ determined by (3.40) with $\sigma_P$ the minimum
possible portfolio risk associated with $\mu_P$. However, given the risk $\sigma_P$, for
the pair $(\sigma_P, \mu_P)$ to be efficient, we also need the expected return rate $\mu_P$ to
be the maximum possible. Solving (3.40) for the expected return rate $\mu = \mu_P$
yields either no solution, one solution, or two solutions depending on whether
$\sigma_P < \sigma_G$, $\sigma_P = \sigma_G$, or $\sigma_P > \sigma_G$, respectively. This is captured by Figure 3.1. In
fact, the expected return rate solutions correspond in Figure 3.1 to the
intersection points of the vertical lines, $\sigma_P = \text{constant}$, with the right branch of the
hyperbola. For $\sigma_P = \sigma_G$, the unique intersection point determines the portfolio
expected return $\mu_P = \mu_G$. Consequently, the turning point is an efficient
portfolio. For $\sigma_P = \text{constant} > \sigma_G$, there are two intersection points with the largest
possible portfolio expected return rate determined by the upper intersection
point. Indeed, the two-security efficient portfolios are given by the turning
point and the upper part of the hyperbola.
Equations (3.40) and (3.43) and the discussion above show that the Markowitz
efficient frontier $M_{E,2}$, i.e., the collection of all efficient two-security portfolios,
is given by
$$M_{E,2} = \left\{ (\sigma_P, \mu_P) : \ \sigma_P = \sqrt{\frac{A\mu_P^2 - 2B\mu_P + C}{AC - B^2}}, \quad \mu_P \ge \frac{B}{A} \right\}. \tag{3.48}$$
Finally, observe from Figure 3.1 that the efficient frontier indicates theoreti-
cally that to obtain a higher expected return rate the portfolio has to take on more risk.
Note, however, in a real-world setting, portfolio management involves more
complexity due to transaction costs, taxes, the trading platform, etc.
Example 3.7. Suppose that you have $2,000 to invest in two stocks, say, stocks 1
and 2, which have current share prices of $40.25 and $35.10, respectively. From
an analysis of historical data of the two stocks, suppose that
$$\mu_1 = 8\%, \qquad \mu_2 = 12\%, \qquad \sigma_1 = 9\%, \qquad \sigma_2 = 15\%, \qquad \rho = -0.5,$$
where these are annualized percentages. Using these two stocks, create an
efficient portfolio that has an expected annual return rate of 20%.
Solution. The goal is to find a trading strategy$^{9}$ $(n_1, n_2)$ such that $n_1$ and $n_2$ are
the respective number of shares of stocks 1 and 2 needed to build an efficient
portfolio with $\mu_P = 0.2$.
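The initial capital is $V_P(t_0) = \$2{,}000$. Let us now collect the quantities from
Section 3.2.1 that are used in determining the portfolio weight vector. We shall
employ the expressions of these quantities that involve vectors and matrices
since that form carries over to the $N$-security analysis. First,
$$\mathbf{e} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\boldsymbol{\mu} = \begin{bmatrix} 0.08 \\ 0.12 \end{bmatrix}, \qquad
\mathbf{V} = \begin{bmatrix} 0.0081 & -0.00675 \\ -0.00675 & 0.0225 \end{bmatrix},$$
and
$$\mathbf{V}^{-1} = \begin{bmatrix} 164.609 & 49.3827 \\ 49.3827 & 59.2593 \end{bmatrix}.$$
Second
$$A = \mathbf{e}^T \mathbf{V}^{-1} \mathbf{e} = 322.6337449$$
$$B = \boldsymbol{\mu}^T \mathbf{V}^{-1} \mathbf{e} = 30.1563786$$
$$C = \boldsymbol{\mu}^T \mathbf{V}^{-1} \boldsymbol{\mu} = 2.8549794$$
$$AC - B^2 = 11.7055327.$$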
9 See (3.5) on page 89.
Third
$$\frac{C - \mu_P B}{AC - B^2} = -0.27135, \qquad \frac{\mu_P A - B}{AC - B^2} = 2.93625.$$
Note that the global minimum-variance portfolio,
$$\mu_G = \frac{B}{A} = 9.3\%, \qquad \sigma_G = \frac{1}{\sqrt{A}} = 5.6\%,$$
does not meet the requirement of the portfolio we are trying to create since
$\mu_P = 20\% > \mu_G$.
The decimal places are maintained simply for mathematical consistency with
the amounts received from shorting and needed for purchasing. In an actual
trading setting, an integer number of shares is traded. Also, note that, to
obtain the required high expected portfolio return rate of 20%, the constructed
efficient portfolio ends up with a risk much higher than that of the individual
stocks:
$$\sigma_P = \sqrt{\mathbf{w}^T \mathbf{V} \mathbf{w}} = 56.2\% \gg \max\{\sigma_1, \sigma_2\} = 15\%.$$
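The numbers in Example 3.7 can be reproduced with a short script based on (3.38). This is a sketch rather than the book's own computation: the off-diagonal covariance is entered as negative (an assumption about a sign lost in typesetting), since that is the choice that reproduces the quoted values of $A$, $B$, and $C$; the variable names are hypothetical.

```python
import math

# Inputs from Example 3.7. The off-diagonal covariance is taken as negative
# (assumption): this reproduces A = 322.63..., B = 30.15..., C = 2.85....
mu = [0.08, 0.12]
V = [[0.0081, -0.00675],
     [-0.00675, 0.0225]]
capital = 2000.0
prices = [40.25, 35.10]
target_return = 0.20

det_V = V[0][0] * V[1][1] - V[0][1] * V[1][0]
V_inv = [[V[1][1] / det_V, -V[0][1] / det_V],
         [-V[1][0] / det_V, V[0][0] / det_V]]

V_inv_e = [sum(row) for row in V_inv]                           # V^{-1} e
V_inv_mu = [row[0] * mu[0] + row[1] * mu[1] for row in V_inv]   # V^{-1} mu

A = sum(V_inv_e)
B = sum(V_inv_mu)  # e^T V^{-1} mu, equal to mu^T V^{-1} e by symmetry
C = mu[0] * V_inv_mu[0] + mu[1] * V_inv_mu[1]
D = A * C - B * B

# Coefficients and weight vector from (3.38).
lam1 = (C - target_return * B) / D
lam2 = (target_return * A - B) / D
weights = [lam1 * V_inv_e[i] + lam2 * V_inv_mu[i] for i in range(2)]

# Trading strategy (number of shares) and resulting portfolio risk.
shares = [weights[i] * capital / prices[i] for i in range(2)]
risk = math.sqrt(sum(weights[i] * V[i][j] * weights[j]
                     for i in range(2) for j in range(2)))

print("weights:", [round(w, 6) for w in weights])
print("shares: ", [round(n, 4) for n in shares])
print(f"portfolio risk: {risk:.1%}")
```

Under these inputs the weights come out as approximately $-2$ and $3$, that is, a short position of about 99.38 shares of stock 1 and a purchase of about 170.94 shares of stock 2, with a portfolio risk of about 56.2%.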
The discussion so far makes no mention of the diversification of the two
securities in the portfolio. We first discuss how diversification relates to the
correlation coefficient between the return rates of two stocks. Using two
separate online correlation coefficient calculators,$^{10}$ we see that for the time span
from August 10, 2010, to August 10, 2013, using adjusted closing prices, the
for-profit stocks Apollo Group, which owns the University of Phoenix, and Strayer
Education, Inc., which owns Strayer University, have a positive correlation of
0.77. Indeed, we would expect intuitively that on average two stocks from
the same business sector have a nontrivial positive correlation. On the other
hand, the correlation calculators output a negative correlation of $-0.5$ for
American Airlines$^{11}$ and Exxon-Mobil. We also expect this intuitively since
increases in oil prices benefit oil companies, but hurt airlines due to the resulting
higher fuel cost. Finally, the correlation between the Apollo Group and
American Airlines was quite weak, namely, 0.15, which is expected intuitively
since the online education sector and the airline industry do not compete with
each other.
Remark 3.3. Bear in mind that there are always exceptions to the above
simplistic intuition. Historical-data estimates of correlation coefficients (and other
quantities) are part art and part science. The degree to which two stocks
vary relative to each other is influenced by overall market movements during
the time span of the data, e.g., a period of an overall market rise lifts the
return rates of most stocks, creating positive correlations between them.
Additionally, correlation estimates are affected by the data's sample size, sample
frequency, etc.
10 We used the free online Correlation Tracker tool at (www.sectorspdr.com) and Stock Correlation
Calculator at Buyupside (www.buyupside.com/calculators).
11 Note that the ticker symbol of American Airlines at the time was AAMRQ, which changed after the
Since $w(1 - w) \ge 0$, we see that the portfolio risk decreases as $\rho$ decreases from
$1$ to $-1$ (the securities covary less and less in the same direction). In other
words, portfolio risk decreases as the diversification increases.
Figure 3.2 illustrates the above situation. The curves from right to left depict
the different efficient frontier curves for increasing diversification, i.e., as $\rho$
decreases from positive to negative values over the interval $(-1, 1)$. Figure 3.2
then captures the benefit of diversification in the two-security setting, namely,
increasing diversification creates efficient-frontier curves that push to the left,
reducing the overall portfolio risk. Diversification in the $N$-security setting will
be explored in Section 3.7.
The answer is affirmative if we drop the requirement that $\rho^2 < 1$ and consider
a portfolio with two risky securities having perfect negative correlation:
$$\rho = -1,$$
Fig. 3.2 From left to right, the solid curves show typical efficient frontiers for a two-security portfolio
as the correlation coefficient $\rho$ varies over $(-1, 1)$ from negative to positive values. Each associated
set of feasible portfolios for a given $\rho$ has risk and expected return rates determined by the union of
the solid and dashed curves. The turning point of each curve identifies the efficient portfolio with the
lowest risk for the given curve
Is it better to put all your money in two risky securities versus one risky
security? Intuitively, it would seem better to spread your money between the two
risky securities to lower your risk, i.e., not to put all your eggs in one basket.
We shall construct a portfolio with two uncorrelated risky securities having less risk
than a portfolio consisting of either one of the securities. In particular, we consider
securities 1 and 2 with risks $\sigma_1 > 0$ and $\sigma_2 > 0$ and correlation coefficient
$$\rho = 0.$$
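Then the two-security portfolio has variance given by
$$\sigma_P^2(w) = w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2,$$
and the global minimum-variance portfolio has
$$\sigma_G = \frac{\sigma_1 \sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}, \qquad
\mu_G = \frac{\sigma_2^2\, \mu_1 + \sigma_1^2\, \mu_2}{\sigma_1^2 + \sigma_2^2},$$
with weights
$$w_G = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad 1 - w_G = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}. \tag{3.50}$$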
Therefore
A portfolio of uncorrelated risky securities with its fraction $w_G$ of the
initial capital invested in one security and $1 - w_G$ in the other will have less
risk than either of the two securities. Moreover, this portfolio involves no
short selling since $0 < w_G < 1$.
In other words, spreading the investment capital strategically as above
between the two uncorrelated securities yields a portfolio with less risk than a
portfolio consisting of just one of the two securities.
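The claim that the uncorrelated two-security portfolio has less risk than either security follows in one line from the formula for $\sigma_G$. The short derivation below spells this out and adds a numerical illustration; the risks of 9% and 15% are chosen here only as an example.

```latex
\sigma_G^2 \;=\; \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
          \;=\; \sigma_1^2 \cdot \underbrace{\frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}}_{=\, w_G \,\in\, (0,1)}
          \;<\; \sigma_1^2,
\qquad
\sigma_G^2 \;=\; \sigma_2^2 \cdot \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \;<\; \sigma_2^2 .

% Illustration with \sigma_1 = 0.09 and \sigma_2 = 0.15:
w_G = \frac{0.0225}{0.0081 + 0.0225} \approx 0.735,
\qquad
\sigma_G = \frac{0.09 \times 0.15}{\sqrt{0.0306}} \approx 0.077 .
```

In this illustration the minimum-variance mix carries a risk of roughly 7.7%, below the 9% risk of the less volatile security, and both weights lie strictly between 0 and 1.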
Many of the ideas and quantities introduced for two securities (Section 3.2)
carry over naturally to $N$ securities in determining the efficient frontier. In
this section, we shall compute the efficient frontier for an $N$-security
portfolio without putting any restrictions on short selling. The only restriction on the
weights will then be that they sum to unity. When there is no short selling, the
efficient frontier is analytically more complex and usually presented by using
numerical plots; see Section 3.4.
Suppose that, after researching a collection of different risky securities, you
have identified $N$ of them for which you are confident in your estimates of
their expected return rates $\mu_i$, of their risks $\sigma_i$, and of the correlations $\rho_{ij}$ of the
return rates of each pair of the securities. It would also be unrealistic for all of
the securities to have the same expected return and risk and to have a perfect
correlation between any pair of return rates. For these reasons, we assume that
the securities do not have identical expected return rates
$$\mu_1 = \mu_2 = \cdots = \mu_N,$$
identical risks
$$\sigma_1 = \sigma_2 = \cdots = \sigma_N,$$
or perfect correlation $\rho_{ij} = \pm 1$ for any distinct pair $i, j$.
Unlimited short sales are allowed, i.e., $-\infty < w_i < \infty$.
$$1 = w_1 + \cdots + w_N = \mathbf{w}^T \mathbf{e}.$$
The weight space of an $N$-security portfolio that allows for unlimited short
selling is
$$W_N = \left\{ \mathbf{w} \in \mathbb{R}^N : \mathbf{w}^T \mathbf{e} = 1 \right\}. \tag{3.51}$$
Note that $W_N$ is a line for $N = 2$ and a plane for $N = 3$. In general, the space $W_N$
is an $(N - 1)$-dimensional plane in $\mathbb{R}^N = \{(w_1, \dots, w_N)\}$ passing through the
$N$ standard unit basis vectors:
$$\mathbf{e}_1 = [\,1\ 0\ 0\ \dots\ 0\ 0\,]^T, \quad \mathbf{e}_2 = [\,0\ 1\ 0\ \dots\ 0\ 0\,]^T, \quad \dots, \quad \mathbf{e}_N = [\,0\ 0\ 0\ \dots\ 0\ 1\,]^T.$$
Of course, because a realistic weight space will not extend to infinity, it will be
a proper subset of the mathematical space $W_N$.
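The random return rates and expected return rates of the securities are
$$\mathbf{R} = \begin{bmatrix} R_1 \\ R_2 \\ \vdots \\ R_N \end{bmatrix}, \qquad
\boldsymbol{\mu} = E(\mathbf{R}) = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{bmatrix}.$$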
The portfolio return rate and expected portfolio return rate can then be ex-
pressed as
3.3 Efficient Frontier for N Securities with Short Selling 119
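$$R_P(\mathbf{w}) = \mathbf{w}^T \mathbf{R} = \sum_{i=1}^{N} w_i R_i, \qquad
\mu_P(\mathbf{w}) = E(\mathbf{w}^T \mathbf{R}) = \mathbf{w}^T \boldsymbol{\mu} = \sum_{i=1}^{N} w_i \mu_i.$$
The assumption that the securities do not all have the same expected return
means that $\boldsymbol{\mu} \neq c\,\mathbf{e}$ for every constant $c$.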
The covariance matrix of the return rates $R_1, \dots, R_N$ is
$$\mathbf{V} = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1N} \\
 & \sigma_2^2 & \cdots & \sigma_{2N} \\
 & & \ddots & \vdots \\
 & & & \sigma_N^2
\end{bmatrix},$$
where the entries below the diagonal are not shown since the covariance
matrix is symmetric, $\sigma_{ij} = \mathrm{Cov}(R_i, R_j) = \sigma_{ji}$. The matrix $\mathbf{V}$ is invertible and so
positive definite (see page 86). It follows that $\mathbf{V}^{-1}$ is symmetric and positive
definite. Estimating the covariance matrix using historical data then requires
$$N + \frac{N(N-1)}{2} \quad \text{estimates},$$
which correspond to $N$ variances and $\frac{N(N-1)}{2}$ correlation coefficients. For
example, one hundred stocks require 5,050 estimates to determine $\mathbf{V}$.
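The portfolio risk is given by
$$\sigma_P(\mathbf{w}) = \sqrt{\mathbf{w}^T \mathbf{V} \mathbf{w}} > 0.$$
$$A = \mathbf{e}^T \mathbf{V}^{-1} \mathbf{e} > 0 \tag{3.52}$$
$$B = \boldsymbol{\mu}^T \mathbf{V}^{-1} \mathbf{e} \tag{3.53}$$
$$C = \boldsymbol{\mu}^T \mathbf{V}^{-1} \boldsymbol{\mu} > 0 \tag{3.54}$$
$$AC - B^2 > 0. \tag{3.55}$$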
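where since the Hessian matrix is symmetric, we do not show the entries below
its diagonal. Two useful properties (Exercise 3.25) to keep in mind are that for
an $n \times n$ real matrix $\mathbf{A}$, the gradient and Hessian of $\mathbf{x}^T \mathbf{A} \mathbf{x}$ are given by
$$\frac{\partial (\mathbf{x}^T \mathbf{A} \mathbf{x})}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^T)\, \mathbf{x}, \qquad
\frac{\partial^2 (\mathbf{x}^T \mathbf{A} \mathbf{x})}{\partial \mathbf{x}\, \partial \mathbf{x}^T} = \mathbf{A} + \mathbf{A}^T, \qquad \mathbf{x} \in \mathbb{R}^n. \tag{3.56}$$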
We first find the portfolio with the smallest risk given an expected portfolio
return rate. We then vary through all possible expected portfolio return rates
to obtain the corresponding set of minimum-risk portfolios. The set of all such
pairs of portfolio risks and expected return rates forms the right branch of a
horizontal hyperbola. We shall argue that the portion of the branch from the
turning point to the upper part of the hyperbola forms the desired efficient
frontier.
Let us now detail the above. We wish to find a portfolio weight vector $\mathbf{w}$
that solves the following:
$$\text{Problem I:} \quad \text{minimize } \sigma_P(\mathbf{w}) = \sqrt{\mathbf{w}^T \mathbf{V} \mathbf{w}} \tag{3.57}$$
$$\text{subject to } \mathbf{w}^T \mathbf{e} = 1 \ \text{ and } \ \mathbf{w}^T \boldsymbol{\mu} = \mu. \tag{3.58}$$
Note that unrestricted short selling is allowed. A major difference between the
above optimization problem and the two-security problem on page 107 is that
the constraints (3.58) are now two equations in $N$ unknowns $w_1, w_2, \dots, w_N$.
In other words, for $N \ge 3$, there is not a unique portfolio weight vector $\mathbf{w}$
satisfying (3.58). In fact, generically there are infinitely many solutions $\mathbf{w}$ of
(3.58) and, hence, infinitely many portfolio risks corresponding to $\mu$ for $N \ge 3$.
However, we shall show that there is a unique portfolio weight vector $\mathbf{w} = \mathbf{w}_\mu$
yielding the smallest portfolio risk associated with the given $\mu$. That is, we find
a unique solution of (3.57) and (3.58) together.
It is analytically simpler to solve instead the following optimization
problem:
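$$\text{Problem II:} \quad \text{minimize } f_c(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{V} \mathbf{w}}{c}, \quad \text{where } c > 0, \tag{3.59}$$
$$\text{minimize } f(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{V} \mathbf{w}}{2}
\quad \text{subject to } \mathbf{w}^T \mathbf{e} = 1 \ \text{ and } \ \mathbf{w}^T \boldsymbol{\mu} = \mu.$$
Since the constraints are equalities, the tool for solving this problem is the
method of Lagrange multipliers.$^{13}$ The Lagrange function for Problem II is
$$L(\mathbf{w}, \boldsymbol{\lambda}) = f(\mathbf{w}) + \lambda_1 (1 - \mathbf{w}^T \mathbf{e}) + \lambda_2 (\mu - \mathbf{w}^T \boldsymbol{\mu}) = f(\mathbf{w}) + \boldsymbol{\lambda}^T \mathbf{h}(\mathbf{w}),$$
$$\frac{\partial L}{\partial \mathbf{w}}(\mathbf{w}, \boldsymbol{\lambda}) = \mathbf{V} \mathbf{w} - \lambda_1 \mathbf{e} - \lambda_2 \boldsymbol{\mu} \tag{3.61}$$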
12 Do not attempt to show the equivalence via the Lagrange conditions (i.e., (3.64) to (3.66)). Simply
use the statement of the problems and logically imply one from the other.
13 For optimization problems with inequality constraints, the Karush-Kuhn-Tucker Theorem is em-
ployed.
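$$\frac{\partial L}{\partial \boldsymbol{\lambda}}(\mathbf{w}, \boldsymbol{\lambda}) = \mathbf{h}(\mathbf{w}) \tag{3.62}$$
$$\frac{\partial^2 L}{\partial \mathbf{w}\, \partial \mathbf{w}^T}(\mathbf{w}, \boldsymbol{\lambda}) = \mathbf{V}. \tag{3.63}$$
The Lagrange Multiplier Theorem yields that $\mathbf{w}_\mu$ is a solution of Problem II
if and only if there is a pair $(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu)$, where $\boldsymbol{\lambda}_\mu$ is unique to $\mathbf{w}_\mu$, satisfying
$$\frac{\partial L}{\partial \mathbf{w}}(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu) = \mathbf{0} \tag{3.64}$$
$$\frac{\partial L}{\partial \boldsymbol{\lambda}}(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu) = \mathbf{0} \tag{3.65}$$
$$\mathbf{x}^T\, \frac{\partial^2 L}{\partial \mathbf{w}\, \partial \mathbf{w}^T}(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu)\, \mathbf{x} \ge 0 \quad \text{for every } \mathbf{x} \neq \mathbf{0} \text{ such that} \tag{3.66}$$
both of the following hold
$$0 = \left(\frac{\partial h_1}{\partial \mathbf{w}}(\mathbf{w}_\mu)\right)^T \mathbf{x} = -\mathbf{e}^T \mathbf{x} = -(x_1 + \cdots + x_N)$$
$$0 = \left(\frac{\partial h_2}{\partial \mathbf{w}}(\mathbf{w}_\mu)\right)^T \mathbf{x} = -\boldsymbol{\mu}^T \mathbf{x} = -(\mu_1 x_1 + \cdots + \mu_N x_N).$$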
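In other words, the set of solutions of Problem II is in 1-1 correspondence with
the set of solutions of (3.64)-(3.66). We shall show that the latter set of
equations has a unique solution $(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu)$, which, by the Lagrange Multiplier
Theorem, gives a unique solution $\mathbf{w}_\mu$ to Problem II, which in turn is a unique
solution to Problem I. Without loss of generality, we set $c = 2$ for convenience.
To determine solutions $(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu)$ of (3.64)-(3.66), first observe that condition
(3.66) holds automatically because by (3.63) the Hessian matrix,
$$\frac{\partial^2 L}{\partial \mathbf{w}\, \partial \mathbf{w}^T}(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu) = \mathbf{V},$$
is positive definite. We next search for the pairs $(\mathbf{w}_\mu, \boldsymbol{\lambda}_\mu)$ satisfying (3.64). Let
$$\boldsymbol{\lambda}_\mu = \begin{bmatrix} \lambda_{1,\mu} \\ \lambda_{2,\mu} \end{bmatrix}.$$
Equation (3.61) shows that (3.64) is equivalent to
$$\mathbf{V} \mathbf{w}_\mu = \lambda_{1,\mu}\, \mathbf{e} + \lambda_{2,\mu}\, \boldsymbol{\mu}.$$
Since $\mathbf{V}$ is invertible, this gives
$$\mathbf{w}_\mu = \lambda_{1,\mu}\, \mathbf{V}^{-1} \mathbf{e} + \lambda_{2,\mu}\, \mathbf{V}^{-1} \boldsymbol{\mu} \tag{3.67}$$
or
$$\mathbf{w}_\mu^T = \lambda_{1,\mu}\, \mathbf{e}^T \mathbf{V}^{-1} + \lambda_{2,\mu}\, \boldsymbol{\mu}^T \mathbf{V}^{-1}.$$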
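Though \((w_\mu, \boldsymbol{\lambda}_\mu)\) solves (3.64), we also need the pair to satisfy (3.65). But Equation
(3.65) is equivalent to the two constraint equations, which fortunately are
linear in the weight vector and so, by (3.67), linear in the multipliers:
\[
1 = w_\mu^T e = \lambda_{1,\mu} (e^T V^{-1} e) + \lambda_{2,\mu} (\boldsymbol{\mu}^T V^{-1} e)
\]
\[
\mu = w_\mu^T \boldsymbol{\mu} = \lambda_{1,\mu} (e^T V^{-1} \boldsymbol{\mu}) + \lambda_{2,\mu} (\boldsymbol{\mu}^T V^{-1} \boldsymbol{\mu}).
\]
Writing \(A = e^T V^{-1} e\), \(B = \boldsymbol{\mu}^T V^{-1} e\), and \(C = \boldsymbol{\mu}^T V^{-1} \boldsymbol{\mu}\), these equations become
\[
1 = \lambda_{1,\mu} A + \lambda_{2,\mu} B \tag{3.68}
\]
\[
\mu = \lambda_{1,\mu} B + \lambda_{2,\mu} C. \tag{3.69}
\]
Since \(AC - B^2 > 0\), this linear system has a unique solution \(\boldsymbol{\lambda}_\mu\), where
\[
\lambda_{1,\mu} = \frac{C - \mu B}{AC - B^2}, \qquad \lambda_{2,\mu} = \frac{\mu A - B}{AC - B^2}. \tag{3.71}
\]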
Then the uniqueness of \(\boldsymbol{\lambda}_\mu\) yields by (3.67) a unique portfolio weight vector:
\[
w_\mu = \frac{C - \mu B}{AC - B^2} \, V^{-1} e + \frac{\mu A - B}{AC - B^2} \, V^{-1} \boldsymbol{\mu}. \tag{3.72}
\]
We call \(w_\mu\) the minimum-variance portfolio weight vector with expected portfolio
return rate \(\mu\). Observe that, for a two-security portfolio, Equation (3.72) coincides
with (3.38) (page 108).
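The closed form (3.72) is easy to evaluate numerically. The following Python sketch is one possible way to do so, not a method taken from the text: the covariance matrix, the expected returns, and the helper names `solve`, `dot`, and `min_variance_weights` are illustrative assumptions. It forms A, B, and C, computes the multipliers from (3.71), and assembles the weight vector.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(M: Matrix, b: Vector) -> Vector:
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_variance_weights(V: Matrix, mu_vec: Vector, mu: float) -> Vector:
    """Weight vector of Equation (3.72) for the target expected return mu."""
    e = [1.0] * len(mu_vec)
    Vinv_e = solve(V, e)         # V^{-1} e
    Vinv_mu = solve(V, mu_vec)   # V^{-1} mu
    A = dot(e, Vinv_e)
    B = dot(mu_vec, Vinv_e)
    C = dot(mu_vec, Vinv_mu)
    D = A * C - B * B            # assumed positive, as in the text
    lam1 = (C - mu * B) / D      # Equation (3.71)
    lam2 = (mu * A - B) / D
    return [lam1 * x + lam2 * y for x, y in zip(Vinv_e, Vinv_mu)]


# Hypothetical inputs (not from the text): a positive definite covariance
# matrix and expected returns for three securities.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu_vec = [0.08, 0.12, 0.15]
w = min_variance_weights(V, mu_vec, 0.10)
print(w, sum(w), dot(w, mu_vec))
```

The printed weights should sum to 1 and reproduce the target expected return, which is a convenient sanity check on the two constraints of Problem II.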
To summarize, we found a unique pair \((w_\mu, \boldsymbol{\lambda}_\mu)\) satisfying (3.64) and (3.65),
and automatically satisfying (3.66) by the positive definiteness of \(V\). The
Lagrange Multiplier Theorem then implies that the weight vector \(w_\mu\) is a
unique solution of Problem II or, equivalently, Problem I. Note that our derivation
of \((w_\mu, \boldsymbol{\lambda}_\mu)\) drew fundamentally upon the linearity of the constraint equations
and the positive definiteness of \(V\).
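With \(w_\mu\) available, let us determine the efficient frontier. The portfolio variance
associated with the expected portfolio return \(\mu\) is
\[
\begin{aligned}
\sigma_P^2(w_\mu) = w_\mu^T V w_\mu &= w_\mu^T V \left( \lambda_{1,\mu} V^{-1} e + \lambda_{2,\mu} V^{-1} \boldsymbol{\mu} \right) \\
&= \lambda_{1,\mu} (w_\mu^T e) + \lambda_{2,\mu} (w_\mu^T \boldsymbol{\mu}) \\
&= \lambda_{1,\mu} + \lambda_{2,\mu} \, \mu.
\end{aligned}
\]
Substituting (3.71) gives
\[
\sigma_P^2(\mu) = \frac{A \mu^2 - 2 B \mu + C}{AC - B^2}. \tag{3.73}
\]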
Equation (3.73) has the same form as the portfolio variance (3.40) for two securities
(page 109). Indeed, the methodology for the two-security case in Section 3.2.2
carries over almost verbatim to determining the N-security efficient
frontier. Equation (3.73) is equivalent to the right branch of the following hyperbola
in the \((\sigma_P, \mu_P)\)-plane:
\[
\frac{\sigma_P^2}{1/A} - \frac{(\mu_P - B/A)^2}{(AC - B^2)/A^2} = 1. \tag{3.74}
\]
Since investors are not interested in a portfolio with negative expected return
rate, we assume \(\mu_G > 0\), which yields \(B > 0\).
The portion of the branch of the hyperbola from the turning point to \(\infty\) along
the upper portion of the branch then forms the efficient frontier, which we call
the Markowitz efficient frontier for N securities:
\[
\begin{aligned}
M_{E,N} &= \left\{ (\sigma_P, \mu_P) : \sigma_P = \sqrt{\frac{A \mu_P^2 - 2 B \mu_P + C}{AC - B^2}}, \ \mu_P \ge \frac{B}{A} \right\} \\
&= \left\{ (\sigma_P, \mu_P) : \frac{\sigma_P^2}{1/A} - \frac{(\mu_P - B/A)^2}{(AC - B^2)/A^2} = 1, \ \sigma_P > 0, \ \mu_P \ge \frac{B}{A} \right\}.
\end{aligned}
\tag{3.76}
\]
The set of portfolio weight vectors that produce the efficient frontier \(M_{E,N}\) is:
\[
W_{E,N} = \left\{ w : w = \frac{C - \mu_P B}{AC - B^2} \, V^{-1} e + \frac{\mu_P A - B}{AC - B^2} \, V^{-1} \boldsymbol{\mu}, \ \mu_P \ge \frac{B}{A} \right\}. \tag{3.77}
\]
Equations (3.76) and (3.77) coincide, respectively, with (3.48) and (3.49) (see
page 111) when \(N = 2\).
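Once A, B, and C are known, points on the frontier (3.76) follow directly from (3.73). The Python sketch below illustrates this under stated assumptions: the values of A, B, and C are hypothetical, and the function names `frontier_sigma` and `efficient_frontier` are introduced here only for illustration.

```python
import math
from typing import List, Tuple


def frontier_sigma(A: float, B: float, C: float, mu_p: float) -> float:
    """Least risk sigma_P for expected return mu_p, from Equation (3.73)."""
    return math.sqrt((A * mu_p ** 2 - 2 * B * mu_p + C) / (A * C - B * B))


def efficient_frontier(A: float, B: float, C: float,
                       mu_max: float, steps: int = 5) -> List[Tuple[float, float]]:
    """Sample (sigma_P, mu_P) points of (3.76) from the turning point up to mu_max."""
    mu_g = B / A  # expected return at the turning point
    grid = [mu_g + k * (mu_max - mu_g) / steps for k in range(steps + 1)]
    return [(frontier_sigma(A, B, C, m), m) for m in grid]


# Hypothetical values with A > 0, B > 0, and AC - B^2 > 0.
A, B, C = 40.0, 4.0, 0.5
points = efficient_frontier(A, B, C, mu_max=0.20)
for sigma, mu in points:
    print(f"sigma_P = {sigma:.4f}, mu_P = {mu:.4f}")
```

The first sampled point is the turning point, where the risk equals \(1/\sqrt{A}\); every later point has both higher expected return and higher risk, as expected along the upper branch.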
Finally, since not all portfolios are efficient, we consider how the efficient
frontier sits in the set of feasible portfolios, i.e., the collection of all possible pairs
of portfolio risk and expected return rate. The discussion will highlight a difference
between the N-security portfolio case with \(N \ge 3\) and the two-security
case. Introduce a mapping \(f_{P,N}\) from the weight space \(W_N\) into the \((\sigma_P, \mu_P)\)-plane
by
\[
f_{P,N}(w) = (\sigma_P(w), \mu_P(w)).
\]
The set of feasible portfolios with short selling is defined as the range of \(f_{P,N}\):
\[
F_{P,N} = f_{P,N}[W_N] = \left\{ (\sigma_P(w), \mu_P(w)) : w^T e = 1 \right\}.
\]
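For two securities, the weight space \(W_2\) is a line and the transformation \(f_{P,2}\)
maps it into the right branch of a hyperbola. In fact, by (3.37) and (3.40) (see
pages 108 and 109), we can express \(w\) and \(\sigma_P(w)\) as functions of the expected
portfolio return rate \(\mu\):
\[
\mu_P(w) = \mu, \qquad
w = \frac{1}{\mu_1 - \mu_2} \begin{bmatrix} \mu - \mu_2 \\ \mu_1 - \mu \end{bmatrix}, \qquad
\sigma_P(w) = \sqrt{\frac{A \mu^2 - 2 B \mu + C}{AC - B^2}}.
\]
Setting \(\mu_P = \mu_P(w) = \mu\) and \(\sigma_P = \sigma_P(w)\),
and using Equation (3.42) on page 109, we see that the feasible set \(F_{P,2}\) is the
right branch of a hyperbola:
\[
\begin{aligned}
F_{P,2} &= \left\{ (\sigma_P, \mu_P) : \sigma_P = \sqrt{\frac{A \mu_P^2 - 2 B \mu_P + C}{AC - B^2}}, \ -\infty < \mu_P < \infty \right\} \\
&= \left\{ (\sigma_P, \mu_P) : \frac{\sigma_P^2}{1/A} - \frac{(\mu_P - B/A)^2}{(AC - B^2)/A^2} = 1, \ \sigma_P > 0, \ -\infty < \mu_P < \infty \right\}.
\end{aligned}
\]
The efficient frontier \(M_{P,2}\) is the top half, including the turning point, of the
curve \(F_{P,2}\), i.e., the solid curve in Figure 3.1 on page 110.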
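For \(N \ge 3\), the set of feasible portfolios is different. For each expected portfolio
return rate \(\mu_P(w) = \mu_P\), an efficient portfolio has the least risk, which by
(3.76) is
\[
\sigma_P(w) = \sqrt{\frac{A \mu_P^2 - 2 B \mu_P + C}{AC - B^2}}, \qquad (N \ge 3).
\]
In other words, a portfolio with \(\mu_P(w) = \mu_P\) is inefficient if and only if it has
risk greater than the above amount. The set of feasible portfolios, i.e., the locus
of all efficient and inefficient portfolios, is then given as follows for \(N \ge 3\):
\[
\begin{aligned}
F_{P,N \ge 3} &= \left\{ (\sigma_P, \mu_P) : \sigma_P \ge \sqrt{\frac{A \mu_P^2 - 2 B \mu_P + C}{AC - B^2}}, \ -\infty < \mu_P < \infty \right\} \\
&= \left\{ (\sigma_P, \mu_P) : \frac{\sigma_P^2}{1/A} - \frac{(\mu_P - B/A)^2}{(AC - B^2)/A^2} \ge 1, \ \sigma_P > 0, \ -\infty < \mu_P < \infty \right\}.
\end{aligned}
\tag{3.78}
\]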
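The description (3.78) can be checked empirically: every portfolio with weights summing to 1 should land on or to the right of the boundary curve. The Python sketch below is a minimal illustration under simplifying assumptions that are not from the text: the three securities are taken to be uncorrelated (so that V is diagonal and its inverse is explicit), and the variances, expected returns, and sampling range for the weights are hypothetical.

```python
import math
import random

# Hypothetical data: three uncorrelated securities, so V is diagonal and
# V^{-1} can be written down directly.
variances = [0.04, 0.09, 0.16]
mu_vec = [0.08, 0.12, 0.15]

A = sum(1 / v for v in variances)
B = sum(m / v for m, v in zip(mu_vec, variances))
C = sum(m * m / v for m, v in zip(mu_vec, variances))


def risk_and_return(w):
    """Return (sigma_P(w), mu_P(w)) for a weight vector w with diagonal V."""
    var = sum(wi * wi * v for wi, v in zip(w, variances))
    return math.sqrt(var), sum(wi * m for wi, m in zip(w, mu_vec))


def boundary_sigma(mu_p):
    """Boundary risk in (3.78) for expected return mu_p."""
    return math.sqrt((A * mu_p ** 2 - 2 * B * mu_p + C) / (A * C - B * B))


rng = random.Random(0)
violations = 0
for _ in range(10_000):
    # Weights may be negative (short selling); the last weight enforces w^T e = 1.
    w1, w2 = rng.uniform(-2, 2), rng.uniform(-2, 2)
    sigma, mu = risk_and_return([w1, w2, 1 - w1 - w2])
    if sigma < boundary_sigma(mu) - 1e-9:
        violations += 1
print("portfolios to the left of the boundary:", violations)
```

The small tolerance in the comparison only guards against floating-point rounding; the inequality itself is the content of (3.78).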
weight spaces for portfolios with two and three securities. The two-security
case is a line segment, while the three-security case is an equilateral triangle.
3.4 N-Security Efficient Frontier Without Short Selling 127
Fig. 3.3 The feasible region \(F_{P,3}\) for three securities with unlimited short selling. The outer boundary
curve is a branch of a hyperbola. The efficient frontier \(M_{E,3}\) is the portion of the hyperbola starting
from the turning point and going along the upper segment of the hyperbola
Note that the weight space with no short selling is the intersection of the weight space \(W_N\),
where short selling is allowed, with the positive orthant \(\{ w : w_i \ge 0, \ i = 1, \dots, N \}\) in \(\mathbb{R}^N\). For
example, in Figure 3.4 the line segment and equilateral-triangle weight spaces
arise, respectively, from the intersection of the straight line \(W_2\) with the first
quadrant and from the plane \(W_3\) with the first octant.
Fig. 3.4 The weight spaces for two-security (left) and three-security (right) portfolios
Restrict the mapping \(f_{P,N}\) to the weight space with no short selling, which gives a mapping
into the \((\sigma_P, \mu_P)\)-plane \(\mathbb{R}^2\):
\[
w \mapsto (\sigma_P(w), \mu_P(w)), \qquad w_i \ge 0, \quad w^T e = 1.
\]
The range of this restricted mapping is defined to be the set of feasible portfolios with no
short selling.
For a two-security portfolio, the no-short-selling feasible set \(F_{P,2}\) is a segment
of the right branch of a hyperbola. The end points of the segment correspond
to the risk-mean points \((\sigma_1, \mu_1)\) and \((\sigma_2, \mu_2)\) due to the individual securities.
Figure 3.1 on page 110 gives an illustration.
For a three-security portfolio, the feasible set \(F_{P,3}\) forms a region. The left
panel in Figure 3.5 depicts an example. The three cusp points in the figure
correspond to the pair of risk and expected return rate due to the securities,
namely, \((\sigma_1, \mu_1)\), \((\sigma_2, \mu_2)\), and \((\sigma_3, \mu_3)\). As in the short selling case, the turning
point (marked in the figure) designates the portfolio with the global minimum risk and
expected return rate. We can then identify the efficient frontier as the outer
boundary curve segment from the turning point to the uppermost cusp point.
Each bold curve joins a pair of cusp points corresponding to two securities, so
each point on such a curve is a two-security portfolio.
The right panel in Figure 3.5 shows how the three-security feasible set with
no short selling fits inside the feasible set in Figure 3.3 with short selling. The
two figures are generated using the same three-security inputs, except that the
weights are nonnegative in the case of no short selling. The efficient frontier
for short selling in Figure 3.3 extends higher up than its analog for no short
selling in the left panel of Figure 3.5.
Remark 3.5. Portfolio theory with no short selling and with bounds on the
portfolio expectation and portfolio variance is usually treated with optimization
software. For a mathematical treatment, see, for example, Korn and Korn
[13, Chap. 1] and references therein.
Fig. 3.5 Left: set \(F_{P,3}\) of feasible portfolios for three securities without short selling (shaded region).
We use the same input parameters as in Figure 3.3, except the weights are restricted to being nonnegative.
Each cusp point corresponds to a pair of risk and expected return rate due to one of the three
securities. The efficient frontier is the outer curve segment from the turning point to the uppermost
cusp point. The three solid curves joining pairs of cusp points consist of two-security portfolios
containing only the two joined securities (cusp points). Right: superposition of the feasible sets for no
short selling (left) and short selling (Figure 3.3). The efficient frontier for the short selling example
extends higher up than the efficient frontier with no short selling
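The global minimum-variance portfolio in (3.75) on page 124 arises when the
second Lagrange multiplier \(\lambda_{2,\mu}\) vanishes:
\[
\lambda_{2,\mu} = 0 \iff \mu = \mu_G \iff w_\mu = w_G = \frac{V^{-1} e}{A}.
\]
To see
\[
w_\mu = \frac{V^{-1} e}{A} \implies \mu = \mu_G,
\]
note that since
\[
w_\mu - \frac{V^{-1} e}{A} = \left( \lambda_{1,\mu} - \frac{1}{A} \right) V^{-1} e + \lambda_{2,\mu} V^{-1} \boldsymbol{\mu} = 0
\]
and because \(V^{-1} e\) and \(V^{-1} \boldsymbol{\mu}\) are linearly independent (due to \(e\) and \(\boldsymbol{\mu}\) being linearly
independent), we get
\[
\lambda_{1,\mu} = \frac{1}{A}, \qquad \lambda_{2,\mu} = 0,
\]
which yields \(\mu = \mu_G\).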
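Similarly, the diversified portfolio arises when the first Lagrange multiplier \(\lambda_{1,\mu}\) vanishes:
\[
\lambda_{1,\mu} = 0 \iff \mu = \mu_D = \frac{C}{B} \iff w_\mu = w_D = \frac{V^{-1} \boldsymbol{\mu}}{B}.
\]
However, we must check one more property: to conclude that the diversified
portfolio is efficient, Equation (3.80) shows that we still need to establish
\[
\mu = \mu_D \ge \mu_G.
\]
Since
\[
\mu_D - \mu_G = \frac{C}{B} - \frac{B}{A} = \frac{AC - B^2}{AB},
\]
we have
\[
\mu_D = \mu_G + \frac{AC - B^2}{AB}.
\]
But \(AB > 0\) and \(AC - B^2 > 0\). Consequently
\[
\mu_D > \mu_G.
\]
Hence, the diversified portfolio is at a point \((\sigma_D, \mu_D)\) on the efficient frontier
\(M_{P,N}\) that is higher up than the global minimum-variance portfolio \((\sigma_G, \mu_G)\).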
The variance of the diversified portfolio is
\[
\sigma_P^2(w_D) = \frac{C}{B^2}.
\]
Because we assume that \(\mu_G > 0\) and so \(B > 0\), the risk of the diversified portfolio
is
\[
\sigma_D = \sigma_P(w_D) = \frac{\sqrt{C}}{B}.
\]
Note that because \(\sigma_G^2\) is the smallest possible variance for the N securities, we
have
\[
\sigma_D > \sigma_G.
\]
It will now readily follow that every minimum-variance portfolio, i.e., a portfolio
with weight vector that solves Problem II, can be expressed as an appropriate
combination of the global minimum-variance portfolio \(w_G\) and the
diversified portfolio \(w_D\). In fact, for a minimum-variance portfolio, we have
3.6 Investor Utility Function 131
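\[
w_\mu = \lambda_{1,\mu} V^{-1} e + \lambda_{2,\mu} V^{-1} \boldsymbol{\mu} = (\lambda_{1,\mu} A) \, w_G + (\lambda_{2,\mu} B) \, w_D,
\qquad \lambda_{1,\mu} A + \lambda_{2,\mu} B = 1.
\]
Set
\[
a_\mu \equiv \lambda_{1,\mu} A = \frac{A (C - \mu B)}{AC - B^2}, \qquad
1 - a_\mu = \lambda_{2,\mu} B = \frac{B (\mu A - B)}{AC - B^2}.
\]
Hence:
\[
w_\mu = a_\mu \, w_G + (1 - a_\mu) \, w_D. \tag{3.81}
\]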
The result (3.81) is called a Mutual Fund Theorem or Separation Theorem. It
states that an N-security minimum-variance portfolio with expected return \(\mu\) has the
same risk-mean as the two-security portfolio with the percentage \(a_\mu\) invested in portfolio
\(w_G\) and percentage \(1 - a_\mu\) invested in portfolio \(w_D\). Proxies for the portfolios
\(w_G\) and \(w_D\) could be mutual funds. The idea would then be to purchase two
such mutual funds in the proportions \(a_\mu\) and \(1 - a_\mu\) to create a two-security
portfolio that replicates the risk-reward profile of the N-security portfolio.
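The separation (3.81) can be confirmed numerically by building the same portfolio in two ways. The Python sketch below is an illustration under assumptions not made in the text: the securities are taken to be uncorrelated (diagonal V) so that the inverse is explicit, the numerical inputs are hypothetical, and the names `w_mu` and `two_fund_weights` are introduced only for this example.

```python
# Hypothetical data: three uncorrelated securities (diagonal V), chosen so
# that V^{-1} e and V^{-1} mu are explicit.
variances = [0.04, 0.09, 0.16]
mu_vec = [0.08, 0.12, 0.15]

Vinv_e = [1 / v for v in variances]
Vinv_mu = [m / v for m, v in zip(mu_vec, variances)]
A = sum(Vinv_e)
B = sum(Vinv_mu)
C = sum(m * x for m, x in zip(mu_vec, Vinv_mu))
D = A * C - B * B

w_G = [x / A for x in Vinv_e]    # global minimum-variance portfolio
w_D = [x / B for x in Vinv_mu]   # diversified portfolio


def w_mu(mu):
    """Minimum-variance weights from Equation (3.72)."""
    lam1, lam2 = (C - mu * B) / D, (mu * A - B) / D
    return [lam1 * x + lam2 * y for x, y in zip(Vinv_e, Vinv_mu)]


def two_fund_weights(mu):
    """The same portfolio built from w_G and w_D via Equation (3.81)."""
    a = A * (C - mu * B) / D
    return [a * g + (1 - a) * d for g, d in zip(w_G, w_D)]


target = 0.11
direct = w_mu(target)
combined = two_fund_weights(target)
print(direct)
print(combined)
```

The two printed vectors should agree up to rounding, which is exactly the statement that holding the two funds in proportions \(a_\mu\) and \(1 - a_\mu\) replicates the N-security minimum-variance portfolio.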
In general, there are many minimum-variance portfolio separations
available:
\[
w_\mu = s_1 w_a + s_2 w_b,
\]
Readers are referred to Ingersoll [10] for more on the Mutual Fund Theorem.
where \(u^{(k)}(x)\) is the kth derivative of \(u\). Since \(u(E(X))\) and \(u^{(k)}(E(X))\) are
constants and because \(E(X - E(X)) = 0\), it follows
\[
E\big(u(X)\big) = u(E(X)) + \frac{1}{2!} \, u''(E(X)) \, \mathrm{Var}(X)
+ \frac{1}{3!} \, u^{(3)}(E(X)) \, E\big( (X - E(X))^3 \big) + \cdots.
\]
A key assumption of Markowitz portfolio theory is that investors assess an
investment only through the expectation and variance of its return. Unless
otherwise stated, let us assume that investors evaluate an investment exclusively in
terms of its expected return E ( X ) and variance Var( X ) and, for simplicity, treat all
higher-order derivatives of the utility function as zero, even if those terms in the Taylor
expansion are a function of E ( X ) and Var( X ):
\[
u^{(k)}(E(X)) = 0 \quad \text{for } k = 3, 4, \dots.
\]
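For a quadratic utility function the assumption above holds exactly, since all derivatives of order three and higher vanish. The Python sketch below checks the resulting two-term formula with exact rational arithmetic; the discrete return distribution and the coefficients a and b are hypothetical choices made only for illustration.

```python
from fractions import Fraction as F

# Hypothetical discrete return distribution: (return, probability) pairs.
outcomes = [(F(-1, 10), F(1, 4)), (F(1, 20), F(1, 2)), (F(1, 5), F(1, 4))]

a, b = F(1), F(2)  # quadratic utility u(x) = a x - (b/2) x^2, increasing for x < a/b


def u(x):
    """Quadratic utility; its second derivative is the constant -b."""
    return a * x - b / 2 * x * x


mean = sum(p * x for x, p in outcomes)
var = sum(p * (x - mean) ** 2 for x, p in outcomes)
expected_utility = sum(p * u(x) for x, p in outcomes)

u_second = -b  # u''(x) is constant, and u^(k) = 0 for k >= 3
two_term_value = u(mean) + F(1, 2) * u_second * var
print(expected_utility, two_term_value)
```

Because fractions are used, the two printed numbers can be compared for exact equality rather than up to rounding.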
Investors can be divided into three broad categories. To illustrate, let 0 be the
current time and suppose that an investor is considering a choice of one of the
following portfolios:
Portfolio A: riskless with a 100% guaranteed return rate of \(R_t^A = 10\%\) from
the present time 0 to a time \(t\) in the future. This sure investment has an
expected utility given by
Fig. 3.6 Depiction of three different utility functions, where \(x\) represents possible return rates on an
investment and \(u(x)\) assigns a number designating the investor's utility for the given possible return
rate. Lower, middle, and upper curves show graphs of the utility functions of a risk-averse, risk-neutral,
and risk-seeking investor, respectively
We now consider choosing between these two portfolios through the eyes of
risk-averse, risk-seeking, and risk-neutral investors.
Risk-Averse Investors
\[
R_f = e^{rt} - 1. \tag{3.85}
\]
\[
R_S = \frac{S_t^c}{S_0} - 1, \qquad S_t^c = e^{qt} S_t, \tag{3.86}
\]
where \(S_0\) is the (known) current price and \(S_t\) is the random market price at the
future time \(t\). Since \(u(R_f)\) is a constant, Equation (3.82) implies
\[
u(R_f) = E\big(u(R_f)\big) = E\big(u(R_S)\big).
\]
\[
E(S_t) > S_0 \, e^{(r - q)t}. \tag{3.88}
\]
Finally, the reader may wonder how our current definition of a risk-averse
investor, i.e., one with utility function where \(u'(x) > 0\) and \(u''(x) < 0\), relates to
our original definition given in the Markowitz theory. Recall that an investor
is termed risk averse in the Markowitz model if, for a portfolio with a given
level of risk, the investor requires the maximum expected return and if, for a
portfolio with a given expected return, the investor requires the minimum risk.
To see the link with utility functions, for a portfolio return rate \(R_P\), Equation
(3.84) yields
\[
E\big(u(R_P)\big) = u\big(E(R_P)\big) + \frac{1}{2} \, u''\big(E(R_P)\big) \, \mathrm{Var}(R_P).
\]
Since \(u''(E(R_P)) < 0\), we see that maximizing the expected utility \(E(u(R_P))\)
given a fixed expected return \(E(R_P)\) implies that the variance \(\mathrm{Var}(R_P)\) of the
portfolio is minimized. Conversely, if we are given a fixed variance \(\mathrm{Var}(R_P)\),
then maximizing \(E(u(R_P))\) implies \(u(E(R_P))\) is maximized, which yields
that the expected portfolio return \(E(R_P)\) is also maximized (since \(u\) is strictly
increasing).
Risk-Seeking Investor
\[
E(S_t) < S_0 \, e^{(r - q)t}. \tag{3.89}
\]
Risk-Neutral Investors
Considering the risk-free security with return \(R_f\) and risky one with return
\(R_S\), we have
\[
u(R_f) = u\big(E(R_S)\big) + \frac{1}{2} \, u''\big(E(R_S)\big) \, \mathrm{Var}(R_S) = u\big(E(R_S)\big).
\]
Since \(u\) strictly increases, we must have \(e^{rt} = E(S_t^c)/S_0\). Thus, a risk-neutral investor
prefers risky securities with
\[
E(S_t) = S_0 \, e^{(r - q)t}. \tag{3.90}
\]
Remark 3.6.
1. Though investors tend to be risk averse rather than risk seeking, they are unlikely
to be risk neutral. Despite this, risk-neutral investors will play an important
role in the pricing of derivatives (Chapter 8).
2. Portfolio theory can also be presented starting from the theory of utility
functions; see Pennacchi [21, Chap. 2] for more. This approach, though,
would take us far deeper into utility functions than is appropriate for this
introductory text.
We now return to our original assumption that all investors are risk averse.
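\[
(\sigma_{11}, \sigma_{12}, \dots, \sigma_{1N}, \sigma_{22}, \sigma_{23}, \dots, \sigma_{2N}, \sigma_{33}, \sigma_{34}, \dots, \sigma_{3N}, \dots, \sigma_{NN}),
\]
where \(\sigma_{ii} = \sigma_i^2\). Denote the joint p.d.f. of these \(N + \frac{N(N-1)}{2}\) entries by \(f_V\) and the
marginal p.d.f.s of the entries \(\sigma_{ij}\) by \(f_{\sigma_{ij}}\), respectively. Expectations with respect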
3.7 Diversification and Randomly Selected Securities 139
which is the average of the expected individual security variances. The expectation
of the sample mean of the covariances \(\sigma_{ij}\), where \(i \ne j\), is
\[
\mathrm{Cov}(N) \equiv E_V\!\left( \frac{1}{N(N-1)/2} \sum_{1 \le i < j \le N} \sigma_{ij} \right)
= \frac{1}{N(N-1)/2} \sum_{1 \le i < j \le N} E_{\sigma_{ij}}[\sigma_{ij}], \tag{3.92}
\]
which is the average of the expected individual security covariances. Assume
that as \(N \to \infty\), the quantities \(\mathrm{Var}(N)\) and \(\mathrm{Cov}(N)\) converge to finite values, which
we denote simply as Var and Cov, respectively.
Second, uniformly and randomly choose weight vectors \(w\) from the weight
space \(W_N\), which excludes short selling:
\[
W_N = \left\{ (x_1, \dots, x_{N-1}, x_N) \in \mathbb{R}^N : x_i \ge 0, \ x_1 + \cdots + x_{N-1} + x_N = 1 \right\}.
\]
\[
w = [\, w_1 \ \dots \ w_{N-1} \ w_N \,]^T,
\]
\[
f_w(x_i) = (N - 1)(1 - x_i)^{N-2}, \qquad i = 1, \dots, N \quad (N \ge 2).
\]
Fig. 3.7 Randomly and uniformly chosen weights from the weight spaces with no short selling for
two securities (left, axes \(w_1\) and \(w_2\)) and three securities (right, axes \(w_1\), \(w_2\), and \(w_3\)). The weights
were drawn using a uniform Dirichlet distribution
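\[
E_w(w_i) = \frac{1}{N}
\]
\[
\mathrm{Var}_w(w_i) = \frac{N - 1}{N^2 (N + 1)}
\]
\[
\mathrm{Cov}_w(w_i, w_j) = -\frac{1}{N^2 (N + 1)}, \qquad i \ne j.
\]
Moreover:
\[
E_w(w_i^2) = \mathrm{Var}_w(w_i) + \big(E_w(w_i)\big)^2 = \frac{2}{N(N + 1)} \tag{3.93}
\]
and
\[
E_w(w_i w_j) = \mathrm{Cov}_w(w_i, w_j) + E_w(w_i) \, E_w(w_j) = \frac{1}{N(N + 1)} \qquad (i \ne j), \tag{3.94}
\]
where \(i, j = 1, \dots, N\) and \(N \ge 2\).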
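The moments (3.93) and (3.94) can be checked by simulation. The Python sketch below is one possible check, not the procedure used in the text: it draws uniform Dirichlet weights by normalizing independent exponential random variables (a standard construction), and the sample size, seed, and choice N = 4 are arbitrary illustrative assumptions.

```python
import random


def uniform_dirichlet(n: int, rng: random.Random) -> list:
    """Draw one weight vector uniformly from the no-short-selling weight space.

    Normalizing independent exponential variables gives a uniform Dirichlet draw.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


N = 4
samples = 50_000
rng = random.Random(12345)

sum_w1_sq = 0.0
sum_w1_w2 = 0.0
for _ in range(samples):
    w = uniform_dirichlet(N, rng)
    sum_w1_sq += w[0] ** 2
    sum_w1_w2 += w[0] * w[1]

mc_w1_sq = sum_w1_sq / samples
mc_w1_w2 = sum_w1_w2 / samples
exact_w1_sq = 2 / (N * (N + 1))   # Equation (3.93)
exact_w1_w2 = 1 / (N * (N + 1))   # Equation (3.94)
print(mc_w1_sq, exact_w1_sq)
print(mc_w1_w2, exact_w1_w2)
```

With this many draws the Monte Carlo estimates should sit close to the exact values; the agreement improves at the usual rate of one over the square root of the sample size.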
Third, let the pair \((w, V)\) represent the following string of random variables:
\[
(w_1, \dots, w_N, \sigma_{11}, \sigma_{12}, \dots, \sigma_{1N}, \sigma_{22}, \sigma_{23}, \dots, \sigma_{2N}, \sigma_{33}, \sigma_{34}, \dots, \sigma_{3N}, \dots, \sigma_{NN}).
\]
Denote the joint p.d.f. of \((w, V)\) by \(f_{(w,V)}\) and expectations relative to \(f_{(w,V)}\)
by \(E_{(w,V)}\). With respect to the degree of independence between \(w\) and \(V\), we
assume:
\[
E_{(w,V)}(w_i w_j \sigma_{ij}) = E_w(w_i w_j) \, E_{\sigma_{ij}}(\sigma_{ij}). \tag{3.95}
\]
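Now, turning to the portfolio variance, we have the random variable:
\[
\sigma_{P,N}^2 = w^T V w = \sum_{i=1}^{N} w_i^2 \sigma_i^2 + 2 \sum_{1 \le i < j \le N} w_i w_j \sigma_{ij}.
\]
Taking expectations and applying (3.95), (3.93), and (3.94) gives
\[
\begin{aligned}
E_{(w,V)}\big(\sigma_{P,N}^2\big)
&= \sum_{i=1}^{N} E_w(w_i^2) \, E_{\sigma_i}(\sigma_i^2) + 2 \sum_{1 \le i < j \le N} E_w(w_i w_j) \, E_{\sigma_{ij}}(\sigma_{ij}) \\
&= \frac{2}{N + 1} \sum_{i=1}^{N} \frac{E_{\sigma_i}(\sigma_i^2)}{N}
 + \frac{N - 1}{N + 1} \sum_{1 \le i < j \le N} \frac{E_{\sigma_{ij}}(\sigma_{ij})}{N(N - 1)/2}.
\end{aligned}
\]
Consequently:
\[
E_{(w,V)}\big(\sigma_{P,N}^2\big) = \frac{1}{N + 1} \big( 2 \, \mathrm{Var}(N) - \mathrm{Cov}(N) \big) + \frac{1}{1 + \frac{1}{N}} \, \mathrm{Cov}(N). \tag{3.96}
\]
Letting \(N \to \infty\) and using the assumed convergence of \(\mathrm{Var}(N)\) and \(\mathrm{Cov}(N)\) gives
\[
\lim_{N \to \infty} E_{(w,V)}\big(\sigma_{P,N}^2\big) = \mathrm{Cov}. \tag{3.97}
\]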
Hence, the limiting value of the mean portfolio variance as N increases shows that
the mean sample variance of the individual securities' return rates is dominated by the
mean sample covariance of these returns. That is, for a sufficiently large number
of securities, the covariances between securities have a greater impact on a
typical portfolio's variance than the variances of the individual securities. The
next section illustrates these ideas using data from the NASDAQ.
Remark 3.7.
1. The above is consistent with our findings in Section 3.2.3 for two securities
with no short selling. By decreasing the correlation coefficient \(\rho\) between
the two securities from 1 toward \(-1\), we increased the diversification and
reduced the portfolio's risk. See Figure 3.2 on page 115.
2. For the case when the weights are nonrandom and equal, namely,
\[
w_i = \frac{1}{N}, \qquad i = 1, \dots, N,
\]
but the securities are randomly chosen (i.e., the covariance matrix is random),
the expected portfolio variance becomes
\[
E_{(w,V)}\big(\sigma_{P,N}^2\big) = \frac{1}{N} \big( \mathrm{Var}(N) - \mathrm{Cov}(N) \big) + \mathrm{Cov}(N).
\]
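To see how quickly the variance contribution fades, the two formulas can be tabulated side by side. The Python sketch below evaluates the right-hand side of (3.96) and the equal-weight expression from Remark 3.7; the function names and the two averages (held constant in N for simplicity) are hypothetical and chosen only to illustrate the shape of the decay.

```python
def mean_portfolio_variance(n: int, mean_var: float, mean_cov: float) -> float:
    """Right-hand side of (3.96) for n securities and uniform Dirichlet weights."""
    return (2 * mean_var - mean_cov) / (n + 1) + mean_cov / (1 + 1 / n)


def equal_weight_variance(n: int, mean_var: float, mean_cov: float) -> float:
    """Expected portfolio variance when every weight equals 1/n (Remark 3.7)."""
    return (mean_var - mean_cov) / n + mean_cov


# Hypothetical averages, held constant in n for simplicity.
mean_var, mean_cov = 40.0, 8.0
for n in (2, 5, 10, 50, 1000):
    print(n,
          round(mean_portfolio_variance(n, mean_var, mean_cov), 3),
          round(equal_weight_variance(n, mean_var, mean_cov), 3))
```

Both columns decrease toward the average covariance as n grows, with the random-weight case approaching it more slowly because its leading term carries the factor 2/(n + 1) rather than 1/n.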
Example 3.8. We estimated the mean portfolio variance for 2,385 NASDAQ
stocks using 503 adjusted closing daily prices for each stock over the time
span from March 15, 2011, to March 15, 2013. The data was obtained from
finance.yahoo.com. Figure 3.8 depicts the results generated by 100,000 pairs of
random stock picks and random weights for each N. The portfolio variance for
all 100,000 pairs was computed and averaged to estimate the mean portfolio
variance for the given N. The figure also depicts the theoretical mean portfolio
variance determined using the uniform Dirichlet distribution.
Fig. 3.8 Mean portfolio variance (vertical axis) as a function of the number of randomly chosen stocks
in the portfolio (horizontal axis) on the NASDAQ (see text). The theoretical line is the mean portfolio
variance based on Equation (3.96), which employs a uniform Dirichlet distribution of weights with no
short selling. The real data line is based on random draws from the NASDAQ. Courtesy of Li Li
Equation (3.97) and Figure 3.8 illustrate that even after diversification by randomly
and uniformly choosing a large number of securities across the marketplace
and across weights, portfolio risk arising from the mean sample covariance remains.
This type of risk, i.e., risk that cannot be eliminated by diversification, is called
undiversifiable risk or systematic risk. It will play an important role in Chapter 4.
The portion of portfolio risk that can be removed by diversification, namely,
the risk that contributes to the mean portfolio variance values above the convergence
value in Figure 3.8, is called unsystematic risk or diversifiable risk.
3.8 Exercises
3.1. An investor plans to create a portfolio of ten stocks by shorting all of them.
Can he use the Markowitz theory presented in this chapter? Explain your
answer.
3.2. Can you find five examples of pairs of stocks in the USA that are negatively
correlated? Are such occurrences common?
3.3. The volatility of the log-return rate of a portfolio over 120 days is roughly
11 times the volatility over a day. Agree or disagree? Explain your answer.
3.4. Decide whether you agree or disagree with the statements below. Justify
your answer.
a) Investors who are not risk averse are irrational.
b) When the risk of a portfolio vanishes, the risk of each security has to
vanish.
3.5. A clever wealth manager constructed a portfolio of stocks such that the
portfolio has no risk and has an expected return of 25%. What is the probability
that the portfolio return rate will actually be 25%?
3.6. Show that Problem I on page 120 is equivalent to Problem II on page 121,
i.e., show that these two optimization problems have the same set of solutions
for all c > 0.
3.7. Explain the financial meaning of minimizing the function
\[
f_P(w) = \frac{\sigma_P^2(w)}{\mu_P(w)},
\]
where \(\mu_P(w)\) is the portfolio expected return.
3.8. It can be shown that the covariance of the return of the global minimum-
variance N-security portfolio with the return of any other efficient N-security
portfolio is always 1/A (Exercise 3.28). Interpret this result.
3.9. If your initial capital increases by $100, then we would expect an increase
in your utility. To which scenario would you assign a higher utility, assuming
the same risk for both? Scenario A: initial capital of $1,000. Scenario B: initial
capital of $10,000.
3.12. (Two Securities) The table below gives a sample of artificial historical
data for an energy company (Stock A) and a phone company (Stock B). Each
indicated return rate is end of month to end of month and is based on adjusted
closing prices. For example, \(R^A_{\text{Jan 2012}}\) is the return rate from the last trading
day in December 2011 to the last trading day in January 2012. Using the table,
express your answers in percentages where appropriate.

Date        R^A_monthly   R^B_monthly
Jan-2012       2.45%         2.69%
Feb-2012       3.35%         1.81%
Mar-2012       3.24%         4.94%
Apr-2012       2.93%         5.88%
May-2012       6.13%         2.51%
Jun-2012       6.19%        -0.35%
Jul-2012       0.78%         1.59%
Aug-2012      -0.19%        -3.83%
Sep-2012       4.65%         5.24%
Oct-2012       3.53%         4.85%
Nov-2012       5.03%         2.48%
Dec-2012      -1.71%         4.03%

a) Sketch the graph of the monthly total return rates of each stock as a function
of time during 2012. Briefly discuss the movement of the stocks during two
equal-length-time periods in 2012.
b) Estimate the expected monthly total return rate of each stock in the table.
What is your answer if you use data only from Dec-2011 to Jun-2012? Compute
the monthly volatility of each stock for the year 2012.
c) Estimate the monthly variance of each stock and determine the covariance
and correlation coefficient between the monthly total return rates for the two
stocks during 2012. Is the result what you expected? Briefly discuss.
d) For a portfolio consisting of stocks A and B, use the data in the table to determine
the portfolio's expected monthly total return rate and the portfolio's
monthly risk under selection I, where funds are split evenly between the
two stocks, and selection II, where two-thirds of your funds are in stock A
and one-third in stock B. Annualize the portfolio expected returns and risks.
e) Would you recommend portfolio selection I or II to an investor? Briefly discuss
your answer.
f) Critique how the data in the table is being applied to the theoretical framework
used in these exercises. Include two drawbacks with using historical
data to estimate the expected returns and risks.
3.13. (Three Securities) Suppose that you have $5,000 to invest in stocks 1, 2,
and 3 with current prices
\[
\begin{bmatrix} S_1(t_0) \\ S_2(t_0) \\ S_3(t_0) \end{bmatrix}
= \begin{bmatrix} \$10.20 \\ \$53.75 \\ \$30.45 \end{bmatrix},
\]
covariance matrix
\[
V = \begin{bmatrix} 0.03 & 0.04 & 0.02 \\ 0.04 & 0.08 & 0.04 \\ 0.02 & 0.04 & 0.04 \end{bmatrix},
\]
3.15. Suppose a client just inherited $1,000,000 and has come to you seeking
advice on how to split the money between two of his favorite securities so as to
maximize return. Security A has expected rate of return \(r_A = 0.13\) and standard
deviation \(\sigma_A = 0.15\). Security B has expected rate of return \(r_B = 0.14\) and
standard deviation \(\sigma_B = 0.20\). The correlation coefficient between their rates of
return is \(\rho = 0.3\). If the investor has a utility function U ( x ) = 3 x, how should
he invest in each stock to maximize his overall rate of return?
The significance of the result is to allow a change of scale to one convenient for
computation.
3.18. Verify Equation (3.20) on page 103, where the portfolio log-return rates
are assumed uncorrelated and identically distributed.
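3.20. (Two Securities) Verify Equation (3.38) on page 108, i.e., show that
\[
w_\mu = \frac{C - \mu B}{AC - B^2} \, V^{-1} e + \frac{\mu A - B}{AC - B^2} \, V^{-1} \boldsymbol{\mu},
\]
where
\[
w_\mu = \begin{bmatrix} w \\ 1 - w \end{bmatrix}
= \frac{1}{\mu_1 - \mu_2} \begin{bmatrix} \mu - \mu_2 \\ \mu_1 - \mu \end{bmatrix}.
\]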
a) Show that if \(b^2 - 4ac < 0\) and \(f(x) \ge 0\) for all \(x \in \mathbb{R}\), then \(f(x) > 0\) for all
\(x \in \mathbb{R}\).
b) Show that if \(a > 0\) and \(f(x) > 0\) for all \(x \in \mathbb{R}\), then the global minimum point
of \(f\) is
\[
x = -\frac{b}{2a}
\]
and the corresponding global minimum value is
\[
f\!\left( -\frac{b}{2a} \right) = -\frac{b^2 - 4ac}{4a}.
\]
c) Use the above results to give an alternative proof that the two-security portfolio
variance,
3.23. (Two Securities) Consider a portfolio with two securities having returns
\(\mu_1\) and \(\mu_2\), risks \(\sigma_1\) and \(\sigma_2\), and a correlation coefficient \(\rho\) that vanishes. To
minimize this portfolio's risk-to-reward ratio, a natural quantity to minimize is
\[
f(w) = \frac{\sigma_P^2(w)}{\mu_P(w)},
\]
where \(w\) is the fraction of the total investment in the security with expected
return \(\mu_1\).
a) Determine an equation that any critical point of \(f\) must satisfy. What type of
equation is it?
b) Show that if \(\mu_1 = \mu_2\), then we obtain a linear equation for \(w\) with solution
\[
w = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}.
\]
This critical point coincides with the global minimum \(w\) we found for the
two-security portfolio when minimizing the variance \(\sigma_P^2\) with \(\rho = 0\). Why
are the two critical points identical?
3.24. (Three Securities) Consider a portfolio with three securities having risks
\(\sigma_1\), \(\sigma_2\), and \(\sigma_3\) and correlation coefficients \(\rho_{12}\), \(\rho_{13}\), and \(\rho_{23}\). Let \(V\) be the covariance
matrix of the security returns. Show that \(V\) is positive definite if and only
if the following hold:
a) \(\sigma_1 > 0\), \(\sigma_2 > 0\), and \(\sigma_3 > 0\),
b) \(|\rho_{12}| < 1\) and \(\rho_{12}^2 + \rho_{13}^2 + \rho_{23}^2 - 2 \rho_{12} \rho_{13} \rho_{23} < 1\).
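3.25. Let \(A\) be an \(n \times n\) matrix. Show that the gradient and Hessian of the
quadratic \(x^T A x\) are
\[
\frac{\partial (x^T A x)}{\partial x} = (A + A^T) x, \qquad
\frac{\partial^2 (x^T A x)}{\partial x \, \partial x^T} = A + A^T, \qquad x \in \mathbb{R}^n,
\]
where
\(\frac{\partial f}{\partial x} = \left[ \frac{\partial f}{\partial x_1} \ \dots \ \frac{\partial f}{\partial x_n} \right]^T\)
and
\(\frac{\partial^2 f}{\partial x \, \partial x^T} = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right]_{n \times n}\).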
3.26. (N Securities) For an N-security portfolio, show that the portfolio vector
\(w\) which minimizes the variance \(\sigma_P^2(w) = w^T V w\), subject to \(w^T e = 1\), is the
global minimum-variance portfolio vector. Explain why this is expected.
3.27. (N Securities) Determine the equations for the lines asymptotic to the set
of all minimum-variance N-security portfolios.
3.28. Show that the covariance of the return of the global minimum-variance
N-security portfolio with the return of any other efficient N-security portfolio
is always 1/A.
3.29. Where does the tangent line at the diversified portfolio on the Markowitz
N-security efficient frontier intersect the \(\mu_P\)-axis?
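\[
\mathrm{Cov}\big( R_P(w_a), R_P(w_b) \big) = \frac{1}{A} + \frac{AC - B^2}{A B^2} \, ab,
\]
where \(A = e^T V^{-1} e\), \(B = \boldsymbol{\mu}^T V^{-1} e\), and \(C = \boldsymbol{\mu}^T V^{-1} \boldsymbol{\mu}\).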
3.32. Let \(a > 0\) and \(b \ne 0\). Show that the utility functions \(u(x) = ax + b\) and
\(u(x) = ax - \frac{b}{2} x^2\), where \(x < \frac{a}{b}\), obey
\[
E\big(u(X)\big) = u(E(X)) + \frac{1}{2} \, u''(E(X)) \, \mathrm{Var}(X).
\]
3.33. A power utility function refers to one of the form \(u(x) = x^a\). When is \(u\) a
risk-averse utility function?
References
[1] Bodie, Z., Kane, A., Marcus, A.: Investments, 9th edn. McGraw-Hill,
New York (2011)
[2] Capinski, M., Zastawniak, T.: Mathematics for Finance. Springer,
New York (2003)
[3] Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge University Press,
Cambridge (2010)
[4] Fine, T.: Probability and Probabilistic Reasoning for Electrical Engineer-
ing. Pearson Prentice Hall, Upper Saddle River (2006)
[5] Frigyik, B., Kapila, A., Gupta, M.: Introduction to the Dirichlet distribu-
tion and related processes. University of Washington Electrical Engineer-
ing Technical Report, Number UWEETR-2010-0006 (2010)
[6] Goldberg, L.: A review of Risk-Return Analysis: The Theory and
Practice of Rational Investing (Volume I) by Harry Markowitz and Kenneth
Blay. http://www.cfapubs.org/doi/full/10.2469/br.v9.n1.9%40faj.2014.70.issue-3 (2014)
[7] Goodman, J.: Statistical Optics. Wiley Classics Library Edition. Wiley,
New York (2000)
[8] Graham, J., Smart, S., Megginson, W.: Corporate Finance. South-Western
Cengage Learning, Mason (2010)
[9] Grinold, R., Kahn, R.: Active Portfolio Management. McGraw-Hill,
New York (2000)
[10] Ingersoll, J.: Theory of Financial Decision Making. Rowman and Little-
field, Savage (1987)
[11] Jorion, P.: Value at Risk, 3rd edn. McGraw-Hill, New York (2007)
[12] Korn, R.: Optimal Portfolios. World Scientific, River Edge (1997)
[13] Korn, R., Korn, E.: Option Pricing and Portfolio Optimization. American
Mathematical Society, Providence (2001)
[14] Levy, H.: A review of Risk-Return Analysis: The Theory and Practice of
Rational Investing (Volume I) by Harry Markowitz and Kenneth Blay
(2014). Quant. Finance 14(7), 1141 (2014)
[15] Levy, H., Duchin, R.: Asset return distributions and the investment hori-
zon. J. Portf. Manag. 30(3), 47 (2004)
[16] Luenberger, D.: Investment Science. Oxford University Press, New York
(1998)
[17] Markowitz, H.: Portfolio selection. J. Finance 7(1), 77 (1952)
[18] Markowitz, H.: Portfolio Selection. Blackwell, Cambridge (1959)
[19] Markowitz, H., Blay, K.: Risk-Return Analysis: The Theory and Practice
of Rational Investing, vol. I. McGraw-Hill, New York (2014)
[20] Merton, R.: An analytic derivation of the efficient portfolio frontier.
J. Financ. Quant. Anal. 7(3), 1851 (1972)
[21] Pennacchi, G.: Theory of Asset Pricing. Pearson Addison Wesley, Boston
(2008)
[22] Reilly, F., Brown, K.: Investment Analysis and Portfolio Management.
Dryden Press, Fort Worth (1997)
[23] Roman, S.: Introduction to the Mathematics of Finance. Springer,
New York (2004)
[24] Ross, S.: An Elementary Introduction to Mathematical Finance.
Cambridge University Press, Cambridge (2011)
[25] Vanderbei, R.: Linear Programming: Foundations and Extensions, 4th
edn. Springer, New York (2014)
Chapter 4
Capital Market Theory and Portfolio Risk Measures
1 For example, REITs, which is an acronym for Real Estate Investment Trusts.
Remark 4.1.
1. Unless stated otherwise, in this chapter, \(R_P\) and \(r_P\) denote the return2 and
logarithmic return of a portfolio P on a general time interval \([t_0, t_f]\), respectively.
On the other hand, by default the risk-free rate \(r\) is a percent quoted
on an annual basis. To keep the mathematical expressions simple, in the
current chapter, both \(R_P\) and \(r\) are for the same period; consequently, so are
\(\mu_P = E(R_P)\) and \(\sigma_P = \sqrt{\mathrm{Var}(R_P)}\). For example, if one of them is monthly,
so are the rest.
2. Although we will provide all the concepts in this chapter using only total
returns, some concepts such as the Sharpe ratios and linear factor models
can be defined using logarithmic returns as well. The latter has advantages
in studying the properties of individual securities since logarithmic returns
are more tractable than ordinary returns. Furthermore, statistical tools can
be more conveniently applied with logarithmic returns, particularly under
the lognormal assumption of security prices.
Capital market theory, which includes the Capital Asset Pricing Model
(CAPM3 ), was developed by William Sharpe,4 John Lintner, and Jan Mossin.
This theory naturally generalizes the Markowitz mean-variance portfolio
model by introducing both a new efficient frontier that extends beyond the
Markowitz efficient frontier and a model for pricing individual securities. The
new efficient frontier is formed by adding a risk-free borrowing or lending
consideration, which turns the old efficient frontier into a half line.
Recall that the one-period Markowitz model is interested in risk-averse in-
vestors selecting portfolios at time t0 that produce stochastic returns at time t f .
Besides all the assumptions we made about the Markowitz model in the last
chapter,5 we also assume that:
• all investors have equal access to borrowing and lending, which occur at
the same risk-free rate $r$, and lenders bear no risk of not being repaid;
• the inflation rate is no more than the risk-free rate $r$.$^{6}$
Sciences. Sharpe won for his contributions to the Capital Asset Pricing Model. See the Press Release
at Nobelprize.org.
5 See Section 3.1.
6 Under normal circumstances, inflation constitutes a major portion of the risk-free rate. The only
problem with this is when the inflation is far above the risk-free rate due to a central bank intervention.
4.1 The Capital Market Theory 153
Let $A$ be a risk-free security with risk-free rate $r$.$^{7}$ Consider a set of $N$ risky
securities,$^{8}$ and assume that the initial investment at time $t_0$ is the amount $V_0$.
We shall investigate the best risk-return trade-off portfolio design that is based
only on allocating the initial investment between $A$ and a portfolio $B$ consisting
of the given $N$ risky securities.
As in Markowitz portfolio theory, we represent each portfolio $P$ by a point
in the $(\sigma_P, \mu_P)$-plane. To avoid possible confusion about the time basis for these
quantities and $r$, we clarify our usage in the remark below:
Remark 4.2. The definitions of $R_P$, $\mu_P$, and $\sigma_P$ used in this section are the same
as those in (3.8) (page 94), (3.13) (page 95), and (3.14) (page 96), respectively,
which are all relative to a time interval $[t_0, t_f]$. As noted in Remark 4.1, we allow
in this chapter for $r$ to be not necessarily on an annual basis, but to have the
same period as $R_P$. The same applies to $R_M$, $R_i$, etc., which will be introduced
later.
First, we present a more intuitive discussion. Any point $(x, y)$ on the line
determined by the points $A$ and $B$ can be represented by
$$(x, y) = w_0\,(0, r) + (1 - w_0)(\sigma_B, \mu_B).$$
7 A risk-free security is, of course, a theoretical concept. In reality, any investment carries a certain
amount of risk. In this context, by risk-free securities we mean US T-bonds or FDIC insured bank
accounts to which we can lend out our money and obtain a sufficient credit line from which we can
borrow money. Under our assumption in the last section, the interest rate for lending is equal to that
for borrowing and occurs at the risk-free rate.
8 At this stage, N is arbitrary. Eventually we need to consider only a sufficiently large N for which we
Fig. 4.1 The points $M$, $B$, and $A$ have coordinates $(\sigma_M, \mu_M)$, $(\sigma_B, \mu_B)$, and $(0, r)$, respectively. The curve
$M_{E,N}$ is the efficient frontier of $F_{P,N}$. The line CAL is the graph of equation (4.1), while the CML is the
graph of equation (4.3).
In our context, obviously, the case $w_0 > 1$ is not applicable, as no investor would
want to get an expected return below $r$. That is to say, a possible portfolio
design in terms of the portfolio's expected return rate $\mu_P$ and variance $\sigma_P^2$ is
given by
$$\mu_P = w_0 r + (1 - w_0)\mu_B, \qquad \sigma_P^2 = (1 - w_0)^2 \sigma_B^2, \qquad w_0 \le 1, \tag{4.1}$$
which implies that
$$\sigma_P = (1 - w_0)\,\sigma_B.$$
The graph of (4.1) in the $(\sigma_P, \mu_P)$-plane as $w_0$ varies is shown in Figure 4.1 and
is called a Capital Allocation Line (CAL).
Second, we take a more theoretical approach. Let $F_{P,N}$ denote the set of feasible
portfolios (the Markowitz bullet) that contains only the given $N$ risky
securities, and let $F_{P,N+1}$ denote the set of all feasible portfolios that contain
the given risk-free security and the $N$ risky securities. We have
$$F_{P,N} \longleftrightarrow W_N = \Big\{\mathbf{w} = [w_1, w_2, \dots, w_N]^\top : \sum_{i=1}^{N} w_i = 1\Big\},$$
$$F_{P,N+1} \longleftrightarrow W_{N+1} = \Big\{\mathbf{w} = [w_0, w_1, w_2, \dots, w_N]^\top : \sum_{i=0}^{N} w_i = 1\Big\},$$
where $w_0$ is the weight of the risk-free security $A$, and $w_i$ is the weight of the
$i$th risky security for $i = 1, 2, \dots, N$, and $W_N$ and $W_{N+1}$ are defined as indicated.
Since we shall need $N$ sufficiently large, it suffices to assume at this stage that $N \ge 3$.$^{9}$
We are interested in finding the efficient frontier of $F_{P,N+1}$.$^{10}$ Recall that the
Markowitz efficient frontier of $F_{P,N}$ is denoted by $M_{E,N}$; see Figure 4.1. For ease
of presentation, we shall slightly abuse our notation and denote the efficient frontier of
$F_{P,N+1}$ by $M_{E,N+1}$ (though not all the securities are risky). For each $P \in F_{P,N+1}$,
there exists a vector $\mathbf{w}_P \in W_{N+1}$ such that $P$ can be expressed by
$$\mathbf{w}_P = [w_0, w_1, w_2, \dots, w_N]^\top \quad\text{with}\quad \sum_{i=0}^{N} w_i = 1. \tag{4.2}$$
Let
$$\mathbf{w}_N = [w_1, w_2, \dots, w_N]^\top,$$
where $w_i$, $i = 1, 2, \dots, N$, remain the same as in (4.2). Notice that $\mathbf{w}_N \notin W_N$ unless
$w_0 = 0$, for $\sum_{i=1}^{N} w_i = 1 - w_0$. We have
$$\mu_P = w_0 r + \sum_{i=1}^{N} w_i \mu_i, \qquad \sigma_P^2 = \mathbf{w}_N^\top V \mathbf{w}_N,$$
9 Recall that the feasible set for $N = 2$ is a curve, while for $N \ge 3$, it is a region (see page 126).
10The proof to be provided here is for a one-period portfolio. The proof for n-period self-rebalancing
portfolios is similar. The details are left as an exercise for the reader.
Obviously, the graph of $M_{E,N+1}$ in the $(\sigma_P, \mu_P)$-plane is the (half) tangent line
to the graph of $M_{E,N}$ at the point $(\sigma^*, \mu^*)$ and with the left end point $(0, r)$. Otherwise,
either the graph of $M_{E,N+1}$ is a secant line to that of $M_{E,N}$ or is above that
of $M_{E,N}$. In either case, a contradiction can be produced. The graph of $M_{E,N+1}$
in the $(\sigma_P, \mu_P)$-plane is called a Capital Market Line (CML).
Finally, combining our above intuitive and theoretical discussions, we have
the equation for the CML:
$$(\sigma_P, \mu_P) = w_0\,(0, r) + (1 - w_0)(\sigma^*, \mu^*),$$
where $(\sigma^*, \mu^*) \in M_{E,N}$.
It follows from the Markowitz portfolio theory of Chapter 3 that the efficient
portfolio $(\sigma^*, \mu^*)$ offers the best risk-return trade-off. Everyone will then want to
invest in the portfolio $(\sigma^*, \mu^*)$ and will want a risk-free asset in order to be on a CML. If
a security is not part of this portfolio, then there is no interest in the security,
which will cause it to drop out of the marketplace. Consequently, the tangent point
portfolio $(\sigma^*, \mu^*)$ then consists of all securities in the marketplace and so $N$ is now the
total number of securities in the market. It also follows that the weight of each security
in $(\sigma^*, \mu^*)$ is the percent of the marketplace the security occupies, i.e., the weight is the
security's share of the total market capitalization. For this reason, $(\sigma^*, \mu^*)$ is called the market portfolio
and denoted by $(\sigma_M, \mu_M)$. Therefore, our desired portfolio should be designed
by putting the percentage $w_0$ in the risk-free security and $1 - w_0$ in the market
portfolio:
$$(\sigma_P, \mu_P) = w_0\,(0, r) + (1 - w_0)(\sigma_M, \mu_M). \tag{4.3}$$
Again, the graph of (4.3) in the $(\sigma_P, \mu_P)$-plane is called a capital market line
(CML). It is depicted in Figure 4.1.
The parametric equations of the CML (obtained from equation (4.3)) are
$$\begin{cases} \mu_P = w_0 r + (1 - w_0)\,\mu_M, \\ \sigma_P = (1 - w_0)\,\sigma_M. \end{cases}$$
a) If you invest $1,500 in a risk-free security at the rate of 6% and put the rest
in the market portfolio, then what are the expected portfolio return rate and
portfolio risk?
The portfolio has less expected return than the market's, but the portfolio
risk is only 25% of the market risk.
b) If you add leverage to your portfolio by borrowing $1,500 at the risk-free
rate, then what are the expected portfolio return rate and portfolio risk?
Compare with the previous case.
Solution. In this case, $w_0 = -0.75$. Consequently,
This means that after paying back the loan, the expected return is 16.5%,
which is the sum of the loss in returns due to paying the loan and the gain in
returns from investing the loan and the initial capital. The expected return is
more than double that for the previous case. However, such higher expected
returns expose you to risks that are much more volatile than the market,
about 1.75 times the market volatility. The leveraged portfolio is seven times
more risky than the portfolio in the previous case.
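The two cases of this example can be checked with a short sketch of the parametric CML equations. The function name `cml_portfolio` is hypothetical, and two inputs are assumptions inferred from the surviving solution text rather than stated data: an initial capital of $2,000 (so that lending $1,500 leaves 25% in the market) and a market expected return of 12% (so that the leveraged case gives 16.5%). Market risk is set to 1 so that the portfolio risk reads as a multiple of the market risk.

```python
def cml_portfolio(w0, r, mu_m, sigma_m):
    """Return (expected return, risk) of a portfolio on the CML, as in (4.3).

    w0 is the weight in the risk-free security and 1 - w0 the weight in the
    market portfolio. A negative w0 means borrowing at the risk-free rate.
    """
    mu_p = w0 * r + (1.0 - w0) * mu_m
    sigma_p = (1.0 - w0) * sigma_m
    return mu_p, sigma_p


# Assumed inputs (inferred from the solution text, not given explicitly):
capital = 2000.0   # $1,500 lent out leaves 25% of capital in the market
r = 0.06           # risk-free rate stated in part a)
mu_m = 0.12        # inferred from the stated 16.5% leveraged return
sigma_m = 1.0      # unit market risk: sigma_p is a multiple of sigma_M

lend = cml_portfolio(1500.0 / capital, r, mu_m, sigma_m)     # part a)
borrow = cml_portfolio(-1500.0 / capital, r, mu_m, sigma_m)  # part b)

print("lending:  ", lend)
print("borrowing:", borrow)
```

Under these assumptions the risk multiples are 0.25 and 1.75, whose ratio is the factor of seven quoted in the solution.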
The CML shows that the market portfolio is the best efficient frontier portfo-
lio to combine with a risk-free security to exceed the expected returns of the
Markowitz model.
Though in practical applications we can employ a proxy for the market
portfolio to position a portfolio on the CML, the proxy does not give us any
quantitative understanding of where the market portfolio is on the Markowitz
efficient frontier. Moreover, one may be interested in investing in a limited
number of stocks, for which there may be no obvious proxy, and wish to con-
struct a CML-type tangent to the Markowitz efficient frontier of those stocks.
For these reasons, we compute, for a general N-security risky portfolio, the
point of tangency of the CML to the Markowitz efficient frontier of the N secu-
rities. We shall obtain explicit expressions for the expected return and risk of
the point of tangency, which we call the market portfolio for the N securities, even
if the securities form a subset of the true market portfolio. The mathematical
framework will, therefore, be general enough to incorporate the cases ranging
from two securities to any finite number.
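Given a risk-free rate $r$ and $N$ risky securities, we apply the definition of $\mu_M$
and the formula for $\sigma_M$ in the Markowitz portfolio theory:
$$\mu_M = \mathbf{w}_M^\top \boldsymbol{\mu}, \qquad \sigma_M(\mu_M) = \sqrt{\mathbf{w}_M^\top V \mathbf{w}_M} = \sqrt{\frac{A\mu_M^2 - 2B\mu_M + C}{AC - B^2}},$$
where $\mathbf{w}_M^\top \mathbf{e} = 1$, to obtain the expressions of $\mu_M$, $\sigma_M$, and $\mathbf{w}_M$ as follows:
$$\mu_M = \frac{C - Br}{B - Ar}, \qquad \sigma_M^2 = \frac{Ar^2 - 2Br + C}{(B - Ar)^2}, \qquad \mathbf{w}_M = \frac{V^{-1}(\boldsymbol{\mu} - r\mathbf{e})}{B - Ar}. \tag{4.5}$$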
The computational details are left as an exercise for the reader (Exercise 4.23).
The reader may recall the diversified portfolio that was introduced in the
Markowitz model as part of establishing the Mutual Fund Theorem (see (3.81)
on page 131). The market portfolio is a generalization of the diversified portfolio
$(\sigma_D, \mu_D)$. In fact, if $r = 0$, then we obtain the diversified portfolio from the
market portfolio:
$$\mu_M = \frac{C}{B} = \mu_D, \qquad \sigma_M = \frac{\sqrt{C}}{B} = \sigma_D, \qquad \mathbf{w}_M = \frac{V^{-1}\boldsymbol{\mu}}{B} = \mathbf{w}_D,$$
where we use $B > 0$. In this case, the CML runs from the origin to the diversified
portfolio $(\sigma_D, \mu_D)$.
The risk premium of the $i$th security is
$$\mu_i - r,$$
which is how much the security's expected return is above/below the risk-free
rate $r$. The risk premium of the market portfolio is
$$\mu_M - r.$$
This is how much the market's expected return is expected to differ from the
risk-free rate. The Capital Asset Pricing Theorem, which is due to Sharpe, Lintner,
and Mossin, relates the risk premiums of the security and market via beta.
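Theorem 4.1. (Capital Asset Pricing Theorem) Assume that the covariance matrix
$V$ of all securities is positive definite and $\boldsymbol{\mu}$ and $\mathbf{e}$ are linearly independent. Let
$\mu_i = \mathbf{e}_i^\top \boldsymbol{\mu}$, where $\mathbf{e}_i = [0 \ \cdots \ 0 \ 1 \ 0 \ \cdots \ 0]^\top$ with 1 in the $i$th slot, be the expected return
on the $i$th security. Then
$$\mu_i - r = \beta_i(\mu_M - r). \tag{4.6}$$
That is,
$$\beta_i = \frac{\mathrm{Cov}(R_i, R_M)}{\sigma_M^2} = \frac{\mathbf{e}_i^\top V \mathbf{w}_M}{\mathbf{w}_M^\top V \mathbf{w}_M} = \frac{\mu_i - r}{\mu_M - r}.$$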
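Let $w_\ell^M$ be the weight of the $\ell$th security in the marketplace. Then
$$\mathrm{Cov}(R_i, R_M) = \mathrm{Cov}(R_i, \mathbf{w}_M^\top \mathbf{R}) = \mathrm{Cov}\Big(R_i, \sum_{\ell=1}^{N} w_\ell^M R_\ell\Big) = \sum_{\ell=1}^{N} w_\ell^M\, \mathrm{Cov}(R_i, R_\ell) = \mathbf{e}_i^\top V \mathbf{w}_M.$$
Furthermore, since
$$\mathbf{e}_i^\top V \mathbf{w}_M = \begin{bmatrix} 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1N} \\ \vdots & & \vdots \\ \sigma_{i1} & \cdots & \sigma_{iN} \\ \vdots & & \vdots \\ \sigma_{N1} & \cdots & \sigma_{NN} \end{bmatrix} \begin{bmatrix} w_1^M \\ \vdots \\ w_N^M \end{bmatrix} = \sum_{\ell=1}^{N} w_\ell^M \sigma_{i\ell},$$
it follows
$$\beta_i = \frac{\mathbf{e}_i^\top V \mathbf{w}_M}{\mathbf{w}_M^\top V \mathbf{w}_M}.$$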
Now, we saw that the market portfolio vector is given by
$$\mathbf{w}_M = \frac{V^{-1}(\boldsymbol{\mu} - r\mathbf{e})}{B - Ar},$$
or equivalently,
$$V \mathbf{w}_M = \frac{\boldsymbol{\mu} - r\mathbf{e}}{B - Ar}.$$
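Because $\mathbf{e}_i^\top \boldsymbol{\mu} = \mu_i$, $\mathbf{e}_i^\top \mathbf{e} = 1$, $\mathbf{w}_M^\top \boldsymbol{\mu} = \mu_M$, and $\mathbf{w}_M^\top \mathbf{e} = 1$, we get
$$\beta_i = \frac{\mathbf{e}_i^\top V \mathbf{w}_M}{\mathbf{w}_M^\top V \mathbf{w}_M} = \frac{\mathbf{e}_i^\top \boldsymbol{\mu} - r\,\mathbf{e}_i^\top \mathbf{e}}{B - Ar} \cdot \frac{B - Ar}{\mathbf{w}_M^\top \boldsymbol{\mu} - r\,\mathbf{w}_M^\top \mathbf{e}} = \frac{\mu_i - r}{\mu_M - r}.$$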
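As a numerical sanity check of (4.5) and (4.6), the sketch below builds the market portfolio for three securities and compares both sides of the CAPM equation. All inputs (`mu`, `V`, `r`) are hypothetical, the helper names `solve`, `dot`, and `matvec` are illustrative, and $A$, $B$, $C$ are taken to be the usual Markowitz constants $\mathbf{e}^\top V^{-1}\mathbf{e}$, $\boldsymbol{\mu}^\top V^{-1}\mathbf{e}$, and $\boldsymbol{\mu}^\top V^{-1}\boldsymbol{\mu}$, which is an assumption about the Chapter 3 notation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


# Hypothetical inputs: three risky securities and a risk-free rate.
mu = [0.08, 0.12, 0.15]
V = [[0.040, 0.006, 0.004],
     [0.006, 0.090, 0.010],
     [0.004, 0.010, 0.160]]
r = 0.03
e = [1.0, 1.0, 1.0]

v_inv_e = solve(V, e)     # V^{-1} e
v_inv_mu = solve(V, mu)   # V^{-1} mu
A = dot(e, v_inv_e)
B = dot(mu, v_inv_e)
C = dot(mu, v_inv_mu)

# Market (tangency) portfolio weights from (4.5).
w_m = [(vm - r * ve) / (B - A * r) for vm, ve in zip(v_inv_mu, v_inv_e)]
mu_m = dot(w_m, mu)
var_m = dot(w_m, matvec(V, w_m))

# Betas from the covariance definition, then both sides of (4.6).
cov_im = matvec(V, w_m)   # entry i is e_i^T V w_M = Cov(R_i, R_M)
beta = [c / var_m for c in cov_im]
for i in range(len(mu)):
    print(i + 1, mu[i] - r, beta[i] * (mu_m - r))
```

The two printed columns should agree up to rounding, which is the content of the theorem for this particular market portfolio.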
Example 4.2. (Security Pricing via CAPM) Let $D(t_0, t_f)$ be the stock cash dividend
issued in the time period $[t_0, t_f)$. Let $S(t)$ be the stock's price at time $t$.
Recall the one-period return (see (3.1) on page 85):
$$R(t_0, t_f) = \frac{S(t_f) + D(t_0, t_f) - S(t_0)}{S(t_0)},$$
where $S(t_f)$ and $D(t_0, t_f)$ are random variables, whereas $S(t_0)$ is deterministic.
Taking expectations on both sides of the last equation yields
$$E\big(R(t_0, t_f)\big) = \frac{E\big(S(t_f)\big) + E\big(D(t_0, t_f)\big) - S(t_0)}{S(t_0)}.$$
Applying the CAPM to valuation of the stock price at time $t_0$, we obtain the
asset pricing formula based on the CAPM
$$S(t_0) = \frac{\mu_{S(t_f)} + \mu_{D(t_0, t_f)}}{1 + r + \beta(\mu_M - r)}, \tag{4.7}$$
where $\mu_{S(t_f)} = E\big(S(t_f)\big)$ and $\mu_{D(t_0, t_f)} = E\big(D(t_0, t_f)\big)$.
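A minimal sketch of this pricing formula follows. The function name `capm_price` and all numerical inputs are hypothetical; the only structural assumption is the one made in the text, namely that $r$ and $\mu_M$ are quoted for the same period as the return.

```python
def capm_price(exp_price, exp_dividend, beta, r, mu_m):
    """Time-t0 price implied by the CAPM for given expected payoff terms.

    exp_price and exp_dividend are E(S(t_f)) and E(D(t_0, t_f)); r and mu_m
    are the risk-free rate and expected market return over [t_0, t_f].
    """
    required_return = r + beta * (mu_m - r)
    return (exp_price + exp_dividend) / (1.0 + required_return)


# Hypothetical inputs for illustration only.
s0 = capm_price(exp_price=55.0, exp_dividend=1.0, beta=1.2, r=0.03, mu_m=0.08)

# Consistency check: the expected one-period return at this price
# equals the CAPM required return r + beta * (mu_m - r).
implied_return = (55.0 + 1.0 - s0) / s0
print(s0, implied_return)
```

The check at the end illustrates the point made next in the text: fixing a model for the expected return fixes the time-$t_0$ price, and conversely.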
Remark 4.3.
1. Given a time interval $[t_0, t_f]$, (4.7) provides a relation between an asset price
at time $t_0$ and the expectation of its return over the interval. This is to say
that every asset return model corresponds to an asset pricing model. This is
why authors use terminologies like "modeling asset prices" and "modeling
asset returns" interchangeably.
2. It is worth noting that the CAPM shows how the market must price an indi-
vidual security in relation to its asset class index, which we call beta, a risk
measure. Thus, the CAPM as an asset pricing model also shows how the set
of all securities can be classified by one risk measure, which is the beta in the
case of the CAPM. Looking ahead from these two points, the latter sections
of the chapter will branch into risk measures and linear factor models which
are a class of much more practically useful asset pricing models.
Example 4.3. The hurdle rate refers to the minimum acceptable (rate of an)
investment return. This concept plays an important role in decision-making
when a project is under consideration.
Here is a table of project specifications:

Project               A                     B
Project beta          1.5                   1.4
Initial investment    $10,000               $10,000
Expected payoffs      $8,000 in 2 years     $9,000 in 2 years
                      $16,000 in 5 years    $9,000 in 5 years
                                            $9,000 in 8 years
Solution. We have
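Since
$$\mathrm{NPV}_A(0.290) = -\$10,000 + \frac{\$8,000}{1.29^2} + \frac{\$16,000}{1.29^5} = -\$713.70,$$
$$\mathrm{NPV}_B(0.272) = -\$10,000 + \$9,000\left(\frac{1}{1.272^2} + \frac{1}{1.272^5} + \frac{1}{1.272^8}\right) = -\$421.52,$$
both projects have a negative net present value at their respective hurdle rates.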
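The two net present values can be reproduced with a few lines of code. This is a sketch in which the function name `npv` is hypothetical; the hurdle rates 0.290 and 0.272 and the cash flows are taken from the expressions and table of this example.

```python
def npv(rate, initial_investment, payoffs):
    """Net present value; payoffs is a list of (years, amount) pairs."""
    return -initial_investment + sum(
        amount / (1.0 + rate) ** years for years, amount in payoffs
    )


# Hurdle rates and cash flows as they appear in Example 4.3.
npv_a = npv(0.290, 10_000, [(2, 8_000), (5, 16_000)])
npv_b = npv(0.272, 10_000, [(2, 9_000), (5, 9_000), (8, 9_000)])

print(round(npv_a, 2), round(npv_b, 2))
```

Both values come out negative, so discounting at the CAPM-based hurdle rate is what makes the nominally larger payoffs of each project insufficient.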
Finally, the CAPM Theorem also applies to portfolios. First, the concept of beta
extends naturally to portfolios. If $P$ represents a portfolio of $n$ risky securities
with portfolio weight vector $\mathbf{w}$ given by
$$\mathbf{w} = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}, \qquad \sum_{i=1}^{n} w_i = 1, \tag{4.8}$$
the portfolio beta is defined to be the weighted average of the individual risky
security betas:
$$\beta_P = \sum_{i=1}^{n} w_i \beta_i, \tag{4.9}$$
where $\beta_i$ is the beta of the $i$th risky security in $P$ for $i = 1, 2, \dots, n$. Second, for
portfolio $P$ we denote by $\mu_P$ and $\mu_i$ the expected portfolio return and expected
return of the $i$th security in $P$, respectively. Then by applying (4.6) on page 159
to the $i$th risky security in $P$ and employing (4.8) and (4.9), we obtain
$$\mu_P - r = \sum_{i=1}^{n} w_i(\mu_i - r) = \sum_{i=1}^{n} w_i \beta_i(\mu_M - r).$$
Hence,
$$\mu_P - r = \beta_P(\mu_M - r). \tag{4.10}$$
Remark 4.4.
1. The CAPM is a theoretically significant equilibrium pricing model. Without
a model of market equilibrium, the efficient market hypothesis cannot be
tested (see Fama [16]).
2. Although the CAPM has a beautiful simplicity in theory, the empirical ev-
idence shows its weaknesses in practice (see Fama and French [19] for a
detailed and comprehensive discussion).
The Capital Asset Pricing Theorem can be viewed as expressing the expected
return of a security as an affine function of the security's beta, i.e., it defines a
straight line:
$$\mu_i(\beta_i) = (\mu_M - r)\,\beta_i + r.$$
This line is called the security market line (SML). That is, the SML is a graphical
representation of the CAPM on the $(\beta, \mu)$-plane.
An illustration of the SML is shown in Figure 4.2. A security with $\beta_i = 0$ has
an expected return at the risk-free rate $r$; that is, the security has no risk premium. For
$\beta_i = 1$, we have a security with $\mu_i = \mu_M$, i.e., the risk premium of the security
coincides with the market risk premium. If $\beta_i = 1.5$, then the security's risk
premium is 1.5 times the market risk premium, or 50% larger than the market
risk premium. For $\beta_i = -1$, the security's risk premium is minus the market
risk premium, which means that the expected security return is less than the
expected market return by twice the market premium. Indeed, if $\beta_i = -b < 0$,
then
$$\mu_i = \mu_M - (1 + b)(\mu_M - r).$$
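The cases just discussed can be tabulated with a small sketch of the SML. The function name `sml_expected_return` is hypothetical, and the values $r = 4\%$ and $\mu_M = 13\%$ are assumptions read off the approximate numbers in the caption of Figure 4.2.

```python
def sml_expected_return(beta, r, mu_m):
    """Expected return on the security market line: (mu_M - r) * beta + r."""
    return (mu_m - r) * beta + r


# Assumed values, inferred from the caption of Fig. 4.2.
r, mu_m = 0.04, 0.13

for beta in (0.0, 1.0, 1.5, -1.0):
    mu_i = sml_expected_return(beta, r, mu_m)
    print(f"beta = {beta:+.1f}: expected return = {mu_i:.4f}, "
          f"risk premium = {mu_i - r:.4f}")
```

With these inputs the security with beta $-1$ has an expected return of $-5\%$, which is the market expected return minus twice the market risk premium of $9\%$.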
In theory, the SML provides an equilibrium in the sense that each stock on
the line is fairly valued. Otherwise, it is located off the line:
1. A stock is overvalued if it is below the SML because such a stock offers
too low a risk premium. In other words, the level of expected return is not
adequate for the given level of risk measured by $\beta$.
2. A similar argument can be made to show that a stock is undervalued if it is
above the SML.
Fig. 4.2 An illustration of the security market line (SML) for risky securities identified by $(\beta, \mu)$. A
security with its risk measured by $\beta_i$ has expected return $\mu_i$. In particular, for $\beta_i = 0$, the security earns
the risk-free rate ($\mu_i = r$), so the security has no risk premium. A security with $\beta_i = 1$ has an expected
return at the market expected return, $\mu_i = \mu_M \approx 13\%$. For $\beta_i = -1$, the security has $\mu_i \approx -5\%$, i.e., the
expected security return is less than the expected market return by twice the market risk premium
($\mu_i = \mu_M - 2(\mu_M - r) \approx -5\%$)
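The CAPM equation (4.6) can be written as
$$E(R_i - r) = E\big(\beta_i (R_M - r)\big),$$
which yields the following model for the return of a risky security:$^{11}$
$$R_i = r + \beta_i (R_M - r) + \epsilon_i,$$
where $\epsilon_i$ is a random variable with mean zero. Assume that $\epsilon_i$ is normal with
variance denoted by $\sigma_{\epsilon_i}^2$ and suppose that $\epsilon_i$ is independent of $R_M$ and $\epsilon_j$ for
$j \neq i$.
The security's risk can be found from
$$\sigma_i^2 = \mathrm{Var}(R_i) = \mathrm{Var}(\beta_i R_M) + \mathrm{Var}(\epsilon_i).$$
The term
$$\mathrm{Var}(\beta_i R_M) = \beta_i^2 \sigma_M^2$$
is the part of the security's risk that comes from the market, while
$$\mathrm{Var}(\epsilon_i) = \sigma_{\epsilon_i}^2$$
is the part that is specific to the security.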
Under the CAPM for security returns, we see that the covariance between
the returns of any two securities is determined by the betas of the securities
and market risk:
$$\mathrm{Cov}(R_i, R_j) = \mathrm{Cov}\big(r + \beta_i (R_M - r) + \epsilon_i,\; r + \beta_j (R_M - r) + \epsilon_j\big) = \beta_i \beta_j\, \mathrm{Cov}(R_M, R_M) = \beta_i \beta_j \sigma_M^2.$$
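The covariance structure implied by this model can be assembled directly. The sketch below uses the hypothetical name `capm_covariance` and hypothetical inputs; it relies only on the stated model and independence assumptions, under which the off-diagonal entries are $\beta_i \beta_j \sigma_M^2$ and each variance adds the residual term $\mathrm{Var}(\epsilon_i)$.

```python
def capm_covariance(betas, sigma_m, resid_vars):
    """Covariance matrix implied by R_i = r + beta_i (R_M - r) + eps_i.

    Off-diagonal entries are beta_i * beta_j * sigma_M^2. Each diagonal entry
    adds the residual variance Var(eps_i), using the independence of eps_i
    from R_M and from the other residuals.
    """
    n = len(betas)
    cov = [[betas[i] * betas[j] * sigma_m ** 2 for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += resid_vars[i]
    return cov


# Hypothetical betas, market volatility, and residual variances.
cov = capm_covariance([0.8, 1.0, 1.5], sigma_m=0.2,
                      resid_vars=[0.01, 0.02, 0.03])
for row in cov:
    print([round(x, 4) for x in row])
```

A practical consequence of this design is that $N$ betas, $N$ residual variances, and one market variance determine all $N(N+1)/2$ distinct entries of the matrix.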
Risk measures are a challenging topic as the notion of risk itself is hard to
conceptualize. The most popular measure of risk is volatility, which by definition
measures the dispersion of the investment return from its mean, regardless of
the direction of an investment price's movement. However, for investors in the
real world, risks are often associated with the adverse movement of the market
only. This is to suggest we also take a different perspective in the further de-
velopment of risk measures: if a risk measure is about the sustainability of los-
ing money, then maximum drawdown is employed (Section 4.2.3). If the risk
measure is about the odds of losing money, then VaR and CVaR are used (Sec-
tions 4.2.5 and 4.2.6). Especially in a leveraged investment, the performance is
measured by the return on unit risk, such as the Sharpe ratio (Section 4.2.1), the
Sortino ratio (Section 4.2.2), or the ratio of return and maximum drawdown.
It is worth pointing out, nevertheless, that in spite of the intuitiveness of the
latter two ratios, the system developer often prefers to optimize the in-sample
Sharpe ratio, as it is intimately connected to the statistical t-test, which is a
measurement of the reliability of the system in the out-of-sample period.
In short, this section addresses several approaches to risk measures, which
provide a variety of portfolio evaluation techniques.
Remark 4.5.
1. Before formally introducing any mathematical terminologies about risk, we
encourage the readers to think about how they themselves as (individual
or institutional) investors interpret risk. For insight into how some of the
world's greatest minds have viewed risk, we refer the reader to Peter
Bernstein's book [5].
2. The lack of statistics in the prerequisite presents us with a great challenge in
this section and the next. To circumvent this difficulty, one of our pedagog-
ical approaches is to focus on basic understanding of concepts and avoid
statistical tests.
Consider a portfolio $P$ over a time period $[t_0, t_f]$. Let $R_P$ be the portfolio return
over $[t_0, t_f]$ and let $r$ be the best available risk-free rate corresponding to the
same period (e.g., T-bills).$^{12}$ The portfolio's Sharpe ratio,$^{13}$ denoted by $S(P)$, is
defined by
$$S(P) = \frac{E(R_P - r)}{\sigma_P(R_P - r)}.$$
The Sharpe ratio was originally developed as a forecasting tool with the
expected return to calculate the forward-looking ratio (see Sharpe [36]). But
with the historical returns, which can be of any frequency, e.g., hourly, daily,
monthly, and so on, it is used to evaluate the risk-reward trade-off of an invest-
ment over that period.
Example 4.4. Suppose that we have the data set of monthly returns of a portfo-
lio P and that of monthly rates of 90-day T-bills over the past 54 months. This
is to say that each data set consists of 54 data points. Using the sample mean
and sample standard deviation, we can approximate the Sharpe ratio of the
portfolio over the past 4.5 years (54 months). A Sharpe ratio of
S ( P) = 0.0401
12 Recall that US Treasuries can be classified into bills, notes, and bonds according to their initial
maturities (in years) in terms of time intervals: $(0, 1]$, $(1, 10]$, and $(10, \infty)$, respectively. We consider only
T-bills here since the longer the maturities, the bigger the risk of inflation, and consequently, the less
reliable.
13 Named after William Sharpe.
4.2 Portfolio Risk Measures 167
Fig. 4.3 The slope of the CML equals the Sharpe ratio
Statistically speaking, data sets need to be sufficiently large. The more data
points we use (the shorter the sub-sample period), the more accurate our ap-
proximation of S ( P) becomes.
For a constant risk-free rate $r$, we have
$$E(R_P - r) = E(R_P) - r, \qquad \mathrm{Var}(R_P - r) = \mathrm{Var}(R_P),$$
so
$$S(P) = \frac{E(R_P) - r}{\sigma_P(R_P)}.$$
The following explains the significance of the Sharpe ratio:
1. Since $E(R_P - r)$ and $\sigma_P$ represent the expected excess return and risk,
respectively, the Sharpe ratio is a measure of the excess return per unit of risk
of the portfolio. In other words, the ratio describes how much risk premium
you are receiving for the extra volatility that you endure for holding a riskier
portfolio. In short, the Sharpe ratio measures risk-adjusted performance of
the portfolio.
2. Recall that if you invest part of your money in a risk-free security (e.g.,
T-bills) and the remainder in an efficient portfolio, the capital market line
(CML) can help you find the portfolio P that offers the most favorable
risk-return trade-off. In fact, assuming that R P and r are the same as we
used in the discussion of the CAPM, the slope of this CML is equal to the
Sharpe ratio of P (see Figure 4.3). This observation provides a method for
finding the best possible portfolio from the given collection of securities.
3. The Sharpe ratio is a leverage-environment$^{14}$ measure of performance in the
sense that if $r$ is omitted, then approximately
$$S_P = \frac{E(R_P)}{\sigma_P(R_P)} = \frac{b\,E(R_P)}{b\,\sigma_P(R_P)},$$
where $b$ denotes the leverage factor. For instance, if we double every investment,
then the return is doubled, but the risk (standard deviation $\sigma$) is also
doubled. If a hedge fund predetermines the risk level, then the Sharpe ratio
determines the return from which the leverage level can be determined for
products allowing high leverage (e.g., futures, commodities, currencies, and
options). In this sense, the Sharpe ratio provides a method of optimizing a
portfolio. Otherwise, the Sharpe ratio does not provide a portfolio of highest
return possible.
14 Two basic ways of achieving leverage are (a) to borrow money for investment and (b) to use financial
instruments such as futures and options (see Chapter 7).
4. A negative correlation can considerably reduce the standard deviation (com-
pare page 115). Even in the case that the portfolio return is reduced, the
Sharpe ratio may still increase.
Remark 4.6.
1. Sharpe ratios can be defined by using logarithmic returns as well. Keeping
this in mind, we replace $R_P$ by $r_P$ in the definition of the Sharpe ratio to
obtain
$$S(P) = \frac{E(r_P - r)}{\sigma_P(r_P - r)}.$$
For a constant risk-free rate, we have
$$S(P) = \frac{E(r_P) - r}{\sigma_P(r_P)}.$$
2. For the reader who has had an introductory statistical background and
wishes to delve further into the significance of the Sharpe ratio, the
t-statistic is useful. In testing the null hypothesis $\mu = \mu_0$, where $\mu$ denotes
the population mean, one uses
$$t = \frac{\bar{x} - \mu_0}{\hat{\sigma}(x)/\sqrt{n}},$$
where $\bar{x}$ is the sample mean of the data, $\hat{\sigma}(x)$ is the sample standard deviation
of the $x$-data, and $n$ is the sample size. The hat and the bar are used to indicate
that a sample-data estimate is being carried out.
In connection to our interest, we rewrite for constant $r$,
$$t = \frac{\bar{R}_P - r}{\hat{\sigma}_P/\sqrt{n}} = \sqrt{n}\,\hat{S}_P,$$
where
$$\hat{\sigma}_P = \hat{\sigma}(R_P - r) = \hat{\sigma}(R_P), \qquad \hat{S}_P = \frac{\bar{R}_P - r}{\hat{\sigma}_P}.$$
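Applying the formula for sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
we obtain
$$\bar{R}_P = \frac{1}{6} \cdot \frac{21 + 7.8 - 13 + 59.4 + 0.2 - 1.2}{100} = \frac{74.2}{6 \times 100} = \frac{12.3667}{100}.$$
Note that we converted the percentages to pure fractions. The sample variance
formula,
$$\hat{\sigma}_P^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1},$$
yields
$$\hat{\sigma}_P^2 = \frac{4200.68 - 6\,(12.3667)^2}{5 \times 100^2} \approx \frac{656.61}{100^2}.$$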
Example 4.6. Suppose that the risk-free rate is 4% and that portfolio $i$ is operated
under strategy $i$, where $i = 1, 2$. Consider the following information:

              $E(R)$    $\sigma$    $S_P$
Portfolio 1   17%       9%          1.44
Portfolio 2   15%       5%          2.2
Remark 4.7.
1. For long-term investments, a Sharpe ratio of $S_P > 1$ is typically considered
desirable. However, some short-term traders may consider only $S_P \ge 3$ good
enough, while other fund managers may consider $S_P > 2$ to be an appropriate
target level.
2. Under the lognormal assumption, the unit conversions of volatility can eas-
ily be made by applying the property that the variance of its increments is
linear in the observation interval.
3. One should always remember that the Sharpe ratio is calculated based on
the historical returns and that past returns might be an indicator of future
performance, but they are certainly not a guarantee.
4. For a complex trading or investing system, the Sharpe ratio may provide
false information.
Using the same idea, we can compute the dispersion of $X$ from a given number
$a$ by
$$\sigma_a^2 = \int_{-\infty}^{\infty} (x - a)^2 f(x)\,dx.$$
In a similar fashion, if we are interested only in the dispersion of $X$ from one
side of number $a$, say, from the downside $(-\infty, a]$, then the formula below
serves that purpose:
$$\int_{-\infty}^{a} (x - a)^2 f(x)\,dx.$$
This leads to a natural way of defining sample semivariance. Let $X_1, \dots, X_n$ be
a random sample of size $n$ drawn from a population $X$. Let $a$ be a number. Let
$$Y_i = \begin{cases} a & \text{if } X_i \ge a, \\ X_i & \text{if } X_i < a, \end{cases}$$
where $i = 1, \dots, n$. The downside sample semivariance $\sigma_a^2$ of $X$ is defined by
$$\sigma_a^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - a)^2. \tag{4.12}$$
Note that (4.12) is equivalent to
$$\sigma_a^2 = \frac{1}{n}\sum_{i=1}^{n} \big(\min\{0, X_i - a\}\big)^2. \tag{4.13}$$
We are now ready for the definition of the Sortino ratio. Let
• $P$ represent a portfolio over a time period $[t_0, t_f]$,
• $R_P$ be the portfolio return over $[t_0, t_f]$,
• $r_0$ be the target or required rate of return for the investment strategy under
consideration.$^{15}$
The Sortino ratio, denoted by $S_D(P)$, is defined by
$$S_D(P) = \frac{E(R_P) - r_0}{\sigma_{r_0}},$$
where $\sigma_{r_0} = \sqrt{\sigma_{r_0}^2}$ is the downside deviation of the portfolio equity.
Example 4.7. Let us use the same information given in Example 4.5 and find
the Sortino ratio in the period of those 6 years.
To apply (4.13), we first compute the downside deviations $\min\{0, X_i - r_0\}$, whose
nonzero values are $-17\%$, $-3.8\%$, and $-5.2\%$.
Thus,
$$\sigma_{r_0}^2 = \frac{1}{6} \cdot \frac{17^2 + 3.8^2 + 5.2^2}{100^2} = \frac{330.48}{6 \times 100^2} = \frac{55.08}{100^2}.$$
We obtain
$$S_D(P) = \frac{\bar{R}_P - r}{\sigma_{r_0}} = \frac{8.3667}{\sqrt{55.08}} = 1.1273.$$
15 The quantity $r_0$ was originally known as the minimum acceptable return (or hurdle rate) and is often
taken to be $r$.
Since the Sortino ratio captures the downside risk only, one can use the
Sortino ratio as a measure to rank the performance of a portfolio or an in-
vestment strategy.
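The sample Sharpe and Sortino ratios of Examples 4.5 and 4.7 can be recomputed as follows. This is a sketch under two assumptions: the six annual returns are read off the arithmetic shown in the examples (with minus signs inferred from the stated sum 74.2), and the risk-free rate and target rate are both taken to be 4%, as implied by the excess return 8.3667.

```python
import math

# Annual returns in percent, read off the arithmetic of Examples 4.5 and 4.7.
returns = [21.0, 7.8, -13.0, 59.4, 0.2, -1.2]
r = 4.0  # assumed risk-free rate and target rate r0, in percent

n = len(returns)
mean = sum(returns) / n

# Sample variance with the n - 1 denominator, as in the sample variance formula.
sample_var = (sum(x * x for x in returns) - n * mean ** 2) / (n - 1)
sharpe = (mean - r) / math.sqrt(sample_var)

# Downside sample semivariance (4.13) with a = r0 = r.
semi_var = sum(min(0.0, x - r) ** 2 for x in returns) / n
sortino = (mean - r) / math.sqrt(semi_var)

print("mean return (%):", round(mean, 4))
print("sample variance:", round(sample_var, 2))
print("Sharpe ratio:   ", round(sharpe, 4))
print("semivariance:   ", round(semi_var, 2))
print("Sortino ratio:  ", round(sortino, 4))
```

Because only three of the six years fall below the target, the downside deviation is much smaller than the full standard deviation, which is why the Sortino ratio here is several times the Sharpe ratio.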
Hence, our description of the maximum drawdown given above can be expressed
below:
Definition 4.1. Given a time period $[0, T]$ and a portfolio, the maximum drawdown
of portfolio equity over $[0, T]$, denoted by $\mathrm{MDD}(T)$, or simply MDD, is
defined by
$$\mathrm{MDD}(T) = \max_{0 \le u \le v \le T} \big(V(u) - V(v)\big). \tag{4.14}$$
Since
the maximum drawdown of the portfolio equity over the time period of these
6 days is
$$\mathrm{MDD} = 1{,}147.73 - 1{,}140.81 = 6.92.$$
Since a drawdown is usually quoted as the percentage between the peak and
trough, in terms of percentage, MDD is $6.92/1147.73 \approx 0.00603 \approx 0.6\%$.
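Definition 4.1 translates into a single pass over the equity series that tracks the running peak. In the sketch below the function name `max_drawdown` and the six-day series are hypothetical; only the peak 1,147.73 and the trough 1,140.81 are taken from the example.

```python
def max_drawdown(equity):
    """Maximum of equity[u] - equity[v] over u <= v, as in (4.14).

    Returns the drawdown together with the peak from which it is measured.
    """
    peak = equity[0]
    mdd = 0.0
    peak_at_mdd = peak
    for value in equity:
        peak = max(peak, value)          # highest equity seen so far
        if peak - value > mdd:           # larger drop from that peak
            mdd = peak - value
            peak_at_mdd = peak
    return mdd, peak_at_mdd


# Hypothetical six-day equity series containing the example's peak and trough.
equity = [1143.20, 1147.73, 1144.95, 1140.81, 1142.60, 1145.10]
mdd, peak = max_drawdown(equity)

print(round(mdd, 2), f"{100 * mdd / peak:.2f}%")
```

Tracking the running peak avoids comparing every pair of times, so the computation is linear in the number of observations.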
The quantile function is one of the basic statistical concepts. We are primarily
interested in its fundamental role in defining distortion risk measures.16
The material on quantiles covered in this section mainly serves as preparation
for the introduction of the concepts of value-at-risk and conditional value-at-
risk in the next two sections.
Definition 4.2. Let $X$ be a random variable with c.d.f. $F_X$. Given $p \in (0, 1)$, a
$p$-quantile of $X$ (or its distribution) is a number $a$ satisfying the properties
$$F_X(a^-) \le p \quad\text{and}\quad F_X(a) \ge p.$$
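$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \begin{cases} 0 & \text{if } x < 0 \\ \int_0^x e^{-t}\,dt & \text{if } x \ge 0 \end{cases} = \begin{cases} 0 & \text{if } x < 0 \\ 1 - e^{-x} & \text{if } x \ge 0. \end{cases}$$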
Property 4.1.
1. A quantile function is monotonically increasing on (0, 1).
2. A quantile function is left-continuous on (0, 1).
Proof. Let $X$ be a random variable. As usual, $F$ and $Q$ represent the c.d.f. and
quantile function of $X$, respectively.
To show item 1, given $p_1, p_2 \in (0, 1)$ with $p_1 < p_2$, we let
$$A_i = \{x \in \mathbb{R} \mid F(x) \ge p_i\}, \quad i = 1, 2.$$
$$x < Q(p) \le Q(p_0) \quad\text{whenever } p \in (p_0 - \delta, p_0).$$
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}.$$
Example 4.13. In modern financial theory, the concept of quantile can be used
as a measure of the downside portfolio risk. If X represents the possible loss on
a portfolio, this measure is determined by a prescribed p-quantile (e.g., p = 1%)
of X such that the likelihood of X (i.e., loss in dollar amount) to take on a value
larger than that p-quantile is less than probability p (i.e., 1% chance).
Such a measure of the downside risk is called VaR, an abbreviation for value-
at-risk, which will be introduced shortly.
The next remark is for the reader interested in statistical importance and
applications of quantile functions.
Remark 4.8.
1. The inverse relation between the quantile function and the cumulative
distribution function makes the quantile function one of the basic concepts used to
describe the probability distribution of a random variable. Since the quantile
function also plays an essential role in the concept of mid-distribution,
which is important for discrete distributions, the quantile function is espe-
cially important for sample distribution functions (therefore for statistical
data modeling).
2. Regression analysis is a way to determine whether or not there is a correla-
tion between two or more variables and how strong any correlation may
be. Quantile regression is a type of regression analysis used in statistics
and econometrics (e.g., detection of heteroscedasticity). Both statistics and
econometrics are employed in mathematical finance. Just as the method of
least squares enables one to estimate models for conditional means, methods
of quantile regression enable one to estimate models for conditional quantile
functions (e.g., conditional median function).
For extensive discussions on the utility of quantile functions in statistical
applications, we refer the reader to the literature (e.g., Gilchrist [21]).
4.2.5 Value-at-Risk
Value-at-risk (VaR) and conditional value-at-risk (CVaR) are two risk measures
widely used by financial institutions and financial regulators. The latter may
be viewed as an extension of, or complement to, the former.
We focus on a basic understanding of the concepts of these two measures
in Sections 4.2.5 and 4.2.6 and explore a deeper issue behind these concepts in
Section 4.2.7.
For the sake of convenience and clarity of notation, we begin with the fol-
lowing notational remarks:
Definition 4.4. Given $p \in (0, 1)$, the value-at-risk of a random variable $X$ for the
level of probability $p$ is denoted by $\mathrm{VaR}_p(X)$ and defined by
Example 4.14. A stock portfolio with a one-day p-VaR of $20,000, where p = 1%,
is interpreted to mean that there is a 0.01 probability that the portfolio will lose more
than $20,000 in a day if there is no trading during the day.
Example 4.15. Consider a portfolio of a single asset. Suppose that the return
of the asset is normally distributed with mean return of 14% per annum and
annual standard deviation of 35%. The value of the portfolio today is $100,000.
With 1% probability (equivalently, a 99% level of confidence), what is the max-
imum loss at the end of the year (equivalently, what is the annual VaR)?
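Solution. Let \(X\) be the annual return of the portfolio (in P/L form). Then \(X\) is
a normal random variable with mean $14,000 and standard deviation $35,000 (that is,
14% and 35% of the $100,000 portfolio value), and
\[
Z = \frac{X - \$14{,}000}{\$35{,}000}, \qquad \Phi(Z_{1\%}) = 1\%,
\]
where \(\Phi\) is the standard normal cumulative distribution function.
We illustrate \(X\) and \(Z\) in Figure 4.4.

Fig. 4.4 The probability distributions for \(X_{1\%}\) and \(Z_{1\%}\) are given by the left corners in the top and
bottom plots, respectively

Software can be used to obtain \(Z_{1\%}\). However, let us use linear interpolation
as in Figure 4.5, which is a plot of the line segment
\[
y - y_1 = \frac{y_2 - y_1}{Z_2 - Z_1}\,(Z - Z_1).
\]
We need to find the value \(Z_{1\%}\) satisfying \(\Phi(Z_{1\%}) = 0.01\). Since
\(\Phi(-2.33) = 0.0099\) and \(\Phi(-2.32) = 0.0102\), we have
\[
Z_1 = -2.33, \quad Z_2 = -2.32, \quad y_1 = 0.0099, \quad y_2 = 0.0102.
\]
Setting \(y = 0.01\) in the line segment gives \(Z_{1\%} \approx -2.3267\). From
\[
X = \$35{,}000\, Z + \$14{,}000,
\]
we then obtain \(X_{1\%} \approx -\$67{,}433\); that is, the annual VaR corresponds to a loss
of about $67,433.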
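The interpolation in Example 4.15 can be checked numerically. The following is a minimal sketch, assuming the table values \(\Phi(-2.33) = 0.0099\) and \(\Phi(-2.32) = 0.0102\); the function name `interpolate_quantile` is illustrative and not from the text, and the cross-check uses the inverse normal CDF from Python's standard library.

```python
from statistics import NormalDist

def interpolate_quantile(z1, y1, z2, y2, target):
    """Solve y - y1 = (y2 - y1)/(z2 - z1) * (z - z1) for z at y = target."""
    return z1 + (target - y1) * (z2 - z1) / (y2 - y1)

# Annual P/L mean and standard deviation in dollars (14% and 35% of $100,000).
mean, std = 14_000.0, 35_000.0

# Linear interpolation between the two tabulated CDF values around 0.01.
z_table = interpolate_quantile(-2.33, 0.0099, -2.32, 0.0102, 0.01)
x_table = std * z_table + mean

# Cross-check against the inverse CDF of the standard normal distribution.
z_exact = NormalDist().inv_cdf(0.01)
x_exact = std * z_exact + mean

print(f"interpolated: Z = {z_table:.4f}, X = {x_table:,.2f}")
print(f"inverse CDF:  Z = {z_exact:.4f}, X = {x_exact:,.2f}")
```

The two approaches agree to about three decimal places in \(Z\), so the interpolated loss (roughly $67,433) and the inverse-CDF loss (roughly $67,422) differ by only a few dollars.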
Example 4.16. Suppose that the historical data of a portfolio shows the weekly
returns over the past 750 weeks in the table below.
Percentage Gain/Loss    Number of weeks (Frequency)
< −5                    2
[−5, −4.5)              2
[−4.5, −4)              0
[−4, −3.5)              3
[−3.5, −3)              1
...                     ...
Other information: Suppose the seventh-highest weekly loss is 3.6%.
Find the weekly loss that will not be exceeded in 99% of cases. Assume that
the initial investment is $100,000.
Equivalently, $3,600 is the weekly loss that will not be exceeded in 99% of cases.
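The historical approach in Example 4.16 can be sketched in a few lines. This is a hedged illustration only: the rule of taking the \(k\)-th worst loss with \(k = \lfloor p\,n \rfloor\) is one of several conventions in use, the function name `historical_var` is hypothetical, and the individual weekly returns below are made-up values chosen to be consistent with the frequency table (only the 3.6% seventh-highest loss comes from the example).

```python
import math

def historical_var(returns_pct, p, initial_value):
    """Historical VaR: the k-th largest loss, with k = floor(p * n)."""
    losses = sorted((-r for r in returns_pct), reverse=True)  # largest loss first
    k = max(1, math.floor(p * len(losses)))
    return losses[k - 1] / 100.0 * initial_value

# Hypothetical weekly returns (in percent) matching the table's bin counts.
worst_weeks = [-5.6, -5.2, -4.9, -4.6, -3.9, -3.7, -3.6, -3.2]
# Hypothetical filler for the remaining weeks; it does not affect the tail.
other_weeks = [0.2] * (750 - len(worst_weeks))

var_99 = historical_var(worst_weeks + other_weeks, 0.01, 100_000)
print(f"99% weekly historical VaR: ${var_99:,.0f}")
```

With 750 weeks and \(p = 1\%\), the rule selects the seventh-largest loss, which is why the answer depends only on the seven worst weeks.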
4.2 Portfolio Risk Measures 181
The next property provides a relation between VaRs under two different in-
terpretations of random variable X (e.g., P/L vs L/P). The proof of it provides
strategy and tactics in proving (4.20).
Property 4.2.
\[
\mathrm{VaR}_p(X) = -\mathrm{VaR}_{1-p}(-X), \qquad p \in (0, 1). \tag{4.19}
\]
\[
F_X^{-1}(p) = -F_{-X}^{-1}(1 - p), \qquad p \in (0, 1),
\]
\[
P(X < a) \le p \le P(X \le a).
\]
implies
Similarly,
\[
X \le x \quad \text{iff} \quad -X \ge -x
\]
implies
\[
P(X \le x) = P(-X \ge -x) = P(Y \ge y) = 1 - P(Y < y).
\]
Therefore,
which is equivalent to
Remark 4.9.
1. There are three basic methods for calculating VaR:
Without getting into technical details, we note that the historical method in-
volves historical simulation and is a nonparametric approach, the variance-
covariance method involves estimation of the standard deviation and is a
parametric approach, and the Monte Carlo simulation involves generation
of time series such as (sample) paths of security prices, etc. For sophisti-
cated examples and detailed and comprehensive treatments of VaR, readers
are referred to the literature (e.g., Dowd [15], Hull [25], and Jorion [27]).
2. While VaR is conceptually simpler to understand and operationally easier
to implement than most other risk measures, it can provide a false sense of
security if it is misused due to its lack of subadditivity and other limitations
(see Section 4.2.7 and the literature, e.g., Artzner, Delbaen, Eber, and Heath
[4]; Föllmer and Schied [20]).
Definition 4.5. Given \(p \in (0, 1)\), the conditional value-at-risk of a random variable
\(X\) for the level of probability \(p\) is denoted by \(\mathrm{CVaR}_p(X)\) and defined by
\[
\mathrm{CVaR}_p(X) = E\bigl(X \mid X \ge \mathrm{VaR}_p(X)\bigr)
= \frac{1}{1-p} \int_{F_X^{-1}(p)}^{\infty} x \, dF_X(x)
= \frac{1}{1-p} \int_p^1 F_X^{-1}(y) \, dy
= \frac{1}{1-p} \int_p^1 \mathrm{VaR}_y(X) \, dy.
\]
Indeed, if \(X\) represents the possible loss of a portfolio, then the last expression
represents the average of the VaRs on the losses in the tail, which are larger
than \(\mathrm{VaR}_p(X)\). This explains why notations \(\mathrm{CVaR}_p(X)\) and \(\mathrm{AVaR}_p(X)\) are often
interchangeable and why CVaR is also called expected tail loss, tail VaR, or
expected shortfall.
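To make the relation between VaR and CVaR concrete, here is a minimal sketch that estimates both from a sample of losses. It assumes \(X\) is a loss, uses the empirical quantile (smallest sample value whose empirical CDF is at least \(p\)) as VaR, and uses the first expression in Definition 4.5, \(E(X \mid X \ge \mathrm{VaR}_p(X))\), for CVaR. The function name and the sample are hypothetical; for discrete samples, the integral forms in the definition can give slightly different values.

```python
import math

def empirical_var_cvar(losses, p):
    """Return (VaR_p, CVaR_p) estimated from a sample of losses."""
    ordered = sorted(losses)
    k = math.ceil(p * len(ordered))      # smallest k with k/n >= p
    var = ordered[max(k, 1) - 1]         # empirical p-quantile
    tail = [x for x in ordered if x >= var]
    return var, sum(tail) / len(tail)    # CVaR = average loss at or beyond VaR

# Hypothetical sample: losses of 1, 2, ..., 20 (in thousands of dollars).
losses = list(range(1, 21))
var_75, cvar_75 = empirical_var_cvar(losses, 0.75)
print(var_75, cvar_75)
```

By construction, the CVaR estimate is never smaller than the VaR estimate at the same level, which mirrors the interpretation of CVaR as an average over the tail beyond VaR.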
Solution.
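\[
\mathrm{CVaR}_{10\%}(X) = \frac{70 \cdot \frac{10}{100}}{\frac{10}{100}} = 70,
\]
\[
\mathrm{CVaR}_{20\%}(X) = \frac{70 \cdot \frac{10}{100} + 30 \cdot \frac{10}{100}}{\frac{20}{100}} = 50.
\]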
Remark 4.10. CVaR satisfies the subadditivity property, which is not valid for
ordinary VaR. For further discussions of the CVaR and its applications, we refer
the reader to the literature (e.g., Föllmer and Schied [20]; Goldberg, Hayes,
Menchero, and Mitra [23]; Goldberg and Hayes [24]; Rockafellar and Uryasev
[32, 33]).
Just as a coherent and integrated air quality measure needs monitoring sta-
tions to measure the presence of contaminants in the air such as carbon monox-
ide, ozone and particulate matter, and so on, a coherent financial risk measure
needs to satisfy a set of properties that covers a number of different dimen-
sions of risk as briefly mentioned above. A proposal of such a set of properties
with mathematical clarity is postulated by Artzner et alia [3, 4] and described
in the definition below.
Definition 4.6. Let random variables \(X\) and \(Y\) represent two portfolio returns.
A coherent risk measure for portfolio return is a function \(\rho\) that satisfies each of
the following properties with probability 1:
1. Monotonicity: If \(X \le Y\), then \(\rho(X) \ge \rho(Y)\).
2. Subadditivity: \(\rho(X + Y) \le \rho(X) + \rho(Y)\).
3. Positive homogeneity: \(\rho(cX) = c\,\rho(X)\), for any \(c \in (0, \infty)\).
4. Translational invariance: \(\rho(X + b) = \rho(X) - b\), for any \(b \in (-\infty, \infty)\).
Example 4.18. Suppose that we bought two securities which are issued by two
companies and have identical price movements. Let \(X\) and \(Y\) represent returns
from each of these two securities. Assume that each company goes bankrupt
independently with probability 8%, that we lose $10,000 if a company goes
bankrupt, and that we lose $0 if no bankruptcy occurs. Then \(\mathrm{VaR}_{90\%}(X) =
\mathrm{VaR}_{90\%}(Y) = 0\), and we obtain
\[
\mathrm{VaR}_{90\%}(X) + \mathrm{VaR}_{90\%}(Y) = 0.
\]
On the other hand, let \(A_i\) be the event that we lose \(\$10{,}000 \cdot i\), \(i = 0, 1, 2\). A
straightforward elementary probability calculation establishes that
\[
P(A_0) = 0.92^2 = 0.8464, \qquad P(A_1) = 2(0.92)(0.08) = 0.1472, \qquad P(A_2) = 0.08^2 = 0.0064.
\]
Since \(P(A_0) < 90\% \le P(A_0) + P(A_1)\), we have \(\mathrm{VaR}_{90\%}(X + Y) = 10{,}000 > 0\), which
indicates that the VaR does not satisfy subadditivity for all possible random variables.
Again, we note that subadditivity provides an incentive to diversify a
portfolio, which the VaR clearly discourages in this case.
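The failure of subadditivity in Example 4.18 can be verified directly. The sketch below is an illustration under the example's assumptions (independent bankruptcies with probability 8%, a $10,000 loss per bankruptcy); it treats VaR at the 90% level as the smallest loss \(x\) with \(P(\text{loss} \le x) \ge 90\%\), and the helper name `quantile` is hypothetical.

```python
from itertools import product

def quantile(dist, p):
    """Smallest loss x with P(loss <= x) >= p, for a dict {loss: probability}."""
    cumulative = 0.0
    for loss in sorted(dist):
        cumulative += dist[loss]
        if cumulative >= p:
            return loss
    return max(dist)

# Loss distribution of a single security.
single = {0: 0.92, 10_000: 0.08}

# Loss distribution of the two-security portfolio under independence.
combined = {}
for (loss_x, prob_x), (loss_y, prob_y) in product(single.items(), repeat=2):
    total = loss_x + loss_y
    combined[total] = combined.get(total, 0.0) + prob_x * prob_y

var_x = quantile(single, 0.90)       # VaR of one security
var_sum = quantile(combined, 0.90)   # VaR of the combined position

print(var_x + var_x, var_sum)
print(var_sum > var_x + var_x)       # True means subadditivity is violated
```

The single-security VaR is zero because the 8% bankruptcy probability sits below the 10% tail, while the combined position has a 15.36% chance of at least one bankruptcy, which pushes its 90% VaR up to $10,000.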
Remark 4.11. For conditions under which the VaR becomes subadditive, we refer
the reader to the literature (e.g., Daníelsson et alia [12] for subadditivity for
VaR in the tail and Dhaene et alia [14] for risk measures and comonotonicity).
where \(\alpha_i\) is called the alpha value (or simply, alpha) of asset \(i\), the \(\beta_{ik}\)'s are called
the beta values (or simply, betas) of asset \(i\), and we assume that \(\mathrm{Cov}(\varepsilon_i, \varepsilon_{i'}) = 0\)
whenever \(i \ne i'\), and that for each \(i\), \(\mathrm{Cov}(\varepsilon_i, f_j) = 0\) for all \(j\).
Example 4.19. To make the concept of factor models easier to understand, let us
consider a single stock portfolio. Let \(f_j\), \(j = 1, 2, 3, 4, 5\), represent the house
price-to-income ratio, new home sales, the housing market index, biotech ETF
performance, and the FDA fast track development program, respectively. The following
is a five-factor model:
19 Annualizing returns with compounding would make these factor models almost useless because
of the linearity of the model. As we pointed out in Remark 4.1, using the logarithmic return in fac-
tor models can avoid this shortcoming and take advantage of time-additivity and statistical tools in
studying properties of individual securities.
20 Error terms are usually assumed to be normal with mean zero.
4.3 Introduction to Linear Factor Models 187
\[
R = \alpha + \beta_1 f_1 + \beta_2 f_2 + \beta_3 f_3 + \beta_4 f_4 + \beta_5 f_5 + \varepsilon.
\]
A least squares linear regression fit will determine whether or not the model
is meaningful in practice.
Example 4.20. Revisit our example above. Recall the model with five factors
\[
R = \alpha + \beta_1 f_1 + \beta_2 f_2 + \beta_3 f_3 + \beta_4 f_4 + \beta_5 f_5 + \varepsilon,
\]
\[
R = \alpha + \beta_1 f_1 + \beta_2 f_2 + \varepsilon.
\]
To make it more precise, PCA converts the original factors into uncorrelated
linear combinations of them. These linear combinations form an orthonormal
set of eigenvectors of the (sample) correlation matrix of factors. Very often in
practice, a few leading eigenvalues of the correlation matrix explain, in terms
of percentage, close to the total variation in the entire data set. Therefore, the
original model may be reduced to a new model with fewer uncorrelated fac-
tors which are the eigenvectors belonging to those leading eigenvalues. Such
reduction, in practice, not only reduces computation but also often results in
more robust models with higher sustainability for the future.
This discussion leads to another statement of the definition of linear factor
model:
Definition 4.8. A (linear) factor model relates the return of an asset of the portfolio
to the values of a limited number of factors, say \(f_j\), \(j = 1, 2, \ldots, m\), and is
represented by
\[
R_i = \alpha_i + \sum_{j=1}^{m} \beta_{ij} f_j + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \tag{4.24}
\]
where:
\(R_i\) is the return on asset \(i\), \(i = 1, 2, \ldots, n\).
\(E(f_j) = 0\), \(j = 1, 2, \ldots, m\) (i.e., \(f_j\), \(j = 1, 2, \ldots, m\), are centered).
\(f_j\), \(j = 1, 2, \ldots, m\), are orthonormal factors under covariance (i.e., \(\mathrm{Cov}(f_j, f_{j'}) = 0\) whenever \(j \ne j'\), and \(\mathrm{Var}(f_j) = 1\) for all \(j\)).
\(\beta_{ij}\) indicates the sensitivity of asset \(i\) to factor \(f_j\) and is called the factor loading of return \(R_i\).
\(\mathrm{Cov}(\varepsilon_i, \varepsilon_{i'}) = 0\) whenever \(i \ne i'\), and for each \(i\), \(\mathrm{Cov}(\varepsilon_i, f_j) = 0\) for all \(j\).
\(\alpha_i + \varepsilon_i\) is the portion of the return on asset \(i\) not related to the \(m\) factors (thus it is called the idiosyncratic return of asset \(i\)), where \(E(\varepsilon_i) = 0\).
The conditions imposed on the quantities in Definition 4.8 ensure that (4.24)
can be estimated by the method of least squares (see Exercises 4.30 and 4.31 on
page 206). Note that the alphas and betas are both measures of risks and tools
used to determine the risk-reward profile of a portfolio.
Observe that (4.23) shows that a factor model decomposes an asset's return
into factors common to all assets in the portfolio and an asset-specific factor.
Even though this is a considerable simplification of reality, it is computation-
ally reasonable. Factor models are practically useful in many domains in the
field of investments, particularly in analyzing historic results, because they
provide a tool to allow analysts to separate components of the overall return
of the asset.
in a matrix form
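\[
X = \alpha + \boldsymbol{\beta} f + \varepsilon,
\]
where
\[
X = [R_1, R_2, \ldots, R_n]^{\top}, \quad \alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n]^{\top}, \quad \beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}],
\]
\[
f = [f_1, f_2, \ldots, f_m]^{\top}, \quad \varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n]^{\top}, \quad \boldsymbol{\beta} = [\beta_{ik}]_{n \times m}.
\]
\[
E(f_k) = 0, \; k = 1, 2, \ldots, m, \qquad E(\varepsilon_i) = 0, \; i = 1, 2, \ldots, n,
\]
\[
\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \text{ whenever } i \ne j, \qquad \mathrm{Cov}(\varepsilon_i, f_k) = 0 \text{ for all } i \text{ and } k,
\]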
we compute
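This result suggests a relation between covariance matrices. To see this, we let
\(\Sigma_X\), \(\Sigma_f\), and \(\Sigma_\varepsilon\) denote the covariance matrices of \(X\), \(f\), and \(\varepsilon\), respectively,
where
\[
\Sigma_f = E(f f^{\top}), \qquad
(\Sigma_\varepsilon)_{ij} = \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = E(\varepsilon_i \varepsilon_j) =
\begin{cases}
0 & \text{if } i \ne j \\
\mathrm{Var}(\varepsilon_i) & \text{if } i = j.
\end{cases}
\]
Then we have
\[
\Sigma_X = \boldsymbol{\beta}\, \Sigma_f\, \boldsymbol{\beta}^{\top} + \Sigma_\varepsilon. \tag{4.25}
\]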
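The additive structure of the return covariance matrix can be illustrated numerically. The sketch below assumes the conditions of Definition 4.8 (orthonormal factors, so the factor covariance matrix is the identity, and uncorrelated error terms) and checks that a portfolio's variance splits into a factor part plus an idiosyncratic part. All numbers, the weights, and the names `loadings`, `idio_var`, and `weights` are hypothetical.

```python
def split_variance(weights, loadings, idio_var):
    """Return (factor variance, idiosyncratic variance) of a portfolio."""
    m = len(loadings[0])
    # Portfolio exposure to factor j: sum over assets of w_i * beta_ij.
    exposure = [sum(w * row[j] for w, row in zip(weights, loadings)) for j in range(m)]
    factor_var = sum(e * e for e in exposure)
    idiosyncratic_var = sum(w * w * v for w, v in zip(weights, idio_var))
    return factor_var, idiosyncratic_var

def return_covariance(loadings, idio_var):
    """Covariance of returns: loadings * loadings^T + diag(idio_var)."""
    n = len(loadings)
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[k]))
            + (idio_var[i] if i == k else 0.0)
            for k in range(n)
        ]
        for i in range(n)
    ]

loadings = [[1.0, 0.5], [0.8, -0.2], [1.2, 0.3]]   # 3 assets, 2 factors
idio_var = [0.04, 0.09, 0.01]                      # Var of each error term
weights = [0.5, 0.3, 0.2]                          # portfolio weights

factor_var, idiosyncratic_var = split_variance(weights, loadings, idio_var)
cov = return_covariance(loadings, idio_var)
total_var = sum(weights[i] * cov[i][k] * weights[k] for i in range(3) for k in range(3))

print(total_var, factor_var + idiosyncratic_var)
```

The two printed numbers coincide up to rounding, which is the portfolio-level version of the matrix decomposition.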
Remark 4.12.
1. Different portfolios may be exposed to different types of risk when different
scenarios occur in the market. Intuitively speaking, portfolio risk manage-
ment is about what to anticipate in dealing with different market scenarios.
The additive decomposition in (4.25) is a significant result (or rather a key
objective of factor models) for portfolio risk management. It suggests that
factor models allow portfolio managers to perform risk management by
approaching these very underlying factors and examining how they impact
the covariance matrix of returns in a direct way.
2. Achieving optimal asset allocation requires a robust understanding of port-
folio risk. To pass the test for robust understanding, one needs to quantify
portfolio risk. Quantitative portfolio risk management requires quantifying
the overall portfolio risk (e.g., VaR and CVaR), and slicing and dicing into
sources of portfolio risk (e.g., multifactor models). In this sense, factor
analysis refines the risk profile of the portfolio and allows portfolio managers to
perform allocation and hedging from a factor risk perspective.
For portfolio optimization, we refer the reader to the literature (e.g., Chan,
Karceski, and Lakonishok [6]; Connor and Korajczyk [8, 9]).
21 Observable variables, as a statistical term, are those that can be directly measured.
22 Latent variables, as opposed to observable variables in statistics, are those inferred through
mathematical models and cannot be directly observed.
\[
R = \alpha + \beta R_M + \varepsilon, \tag{4.26}
\]
where \(R_M\) is the stock market return, say, the return of the S&P 500 over time
period \([0, T]\), and \(R\) is the return of a stock over the same time period. Here
\(\alpha\) and \(\beta\) are referred to as the stock's \(\alpha\) and \(\beta\). (If \(R\) is the return of a portfolio,
then \(\alpha\) and \(\beta\) are referred to as the portfolio's \(\alpha\) and \(\beta\).) Taking the expectation
of both sides of (4.26) yields
\[
E(R) = \alpha + \beta E(R_M).
\]
Accordingly, we have
\[
\bar{R} = \alpha + \beta \bar{R}_M, \tag{4.27}
\]
where \(\bar{R}\) and \(\bar{R}_M\) are sample means of \(R\) and \(R_M\), respectively.
In general, \(\alpha\) and \(\beta\) are more stable than the equity price itself. Consequently,
they provide a certain predictive value. In other words, they are not only mea-
sures of risk but also tools used to determine the risk-reward profile of a port-
folio to form investment strategies.
Example 4.22. The mathematical expression of the single-factor model seen in
(4.27) suggests the following possible investment strategies:
a) Getting stock returns from both \(\alpha\) and \(\beta\). Perhaps a number of investors do
so without knowing the formal meanings of \(\alpha\) and \(\beta\).
b) Making stock investment profits based on a better judgment of the direction
of a major stock market index (say, the S&P 500). If one can do it well, then with a
certain level of leverage, one can make good returns on high-\(\beta\) stocks.
c) Making stock investment profits based on selecting high-\(\alpha\) stocks. In fact,
many quantitative funds try to use index futures positions to hedge away
the beta part to obtain so-called market-neutral portfolios.
Before we illustrate how to determine \(\alpha\) and \(\beta\) given the historical data on
\(R\) and \(R_M\) in (4.27), let us understand the general idea of the method of least
squares, which is a classical technique for finding the approximate solution
of an overdetermined²³ system of linear equations. Such systems often occur
based on raw field data and usually have no solutions.
Let \(f_i\), where \(i = 1, 2, \ldots, n\), be \(n\) functions of \(m\) variables \(x_1, x_2, \ldots, x_m\),
where \(n > m\). Suppose that we are interested in solving for \(m\) unknowns
\[
f_1(x_1, x_2, \ldots, x_m) = 0, \quad
f_2(x_1, x_2, \ldots, x_m) = 0, \quad \ldots, \quad
f_n(x_1, x_2, \ldots, x_m) = 0.
\]
Note that the system holds if and only if the equation
\[
\sum_{i=1}^{n} \bigl(f_i(x_1, x_2, \ldots, x_m)\bigr)^2 = 0
\]
holds. Unfortunately, in general, the system does not have a solution. What we
intend to do is to find points \((x_1, x_2, \ldots, x_m)\) that minimize
\[
\sum_{i=1}^{n} \bigl(f_i(x_1, x_2, \ldots, x_m)\bigr)^2.
\]
23 A system of linear equations is called overdetermined if there are more equations than unknowns.
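We compute
\[
\frac{\partial L}{\partial \alpha} = 2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)(-1) = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i),
\]
\[
\frac{\partial L}{\partial \beta} = 2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)(-x_i) = -2 \sum_{i=1}^{n} (x_i y_i - \alpha x_i - \beta x_i^2).
\]
Setting²⁴
\[
\frac{\partial L}{\partial \alpha} = 0, \qquad \frac{\partial L}{\partial \beta} = 0,
\]
we obtain
\[
0 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = \sum_{i=1}^{n} y_i - n\alpha - \beta \sum_{i=1}^{n} x_i,
\]
\[
0 = \sum_{i=1}^{n} (x_i y_i - \alpha x_i - \beta x_i^2) = \sum_{i=1}^{n} x_i y_i - \alpha \sum_{i=1}^{n} x_i - \beta \sum_{i=1}^{n} x_i^2.
\]
Equivalently,
\[
\alpha = \frac{1}{n} \sum_{i=1}^{n} y_i - \beta\, \frac{1}{n} \sum_{i=1}^{n} x_i,
\]
\[
\sum_{i=1}^{n} x_i y_i = \alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2.
\]
24 The solution to this system can only be minimum point(s) since \(L\) does not have a maximum point.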
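The two normal equations can be solved in closed form by substituting the expression for \(\alpha\) into the second equation. The following minimal sketch does exactly that; the function name `fit_alpha_beta` and the sample data are illustrative assumptions, not part of the text.

```python
def fit_alpha_beta(x, y):
    """Solve the two normal equations of simple least squares for (alpha, beta)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Eliminating alpha from the second normal equation gives beta.
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # The first normal equation then gives alpha.
    alpha = sum_y / n - beta * sum_x / n
    return alpha, beta

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
alpha, beta = fit_alpha_beta(x, y)

# Both normal equations should be satisfied at the solution.
residual_1 = sum(yi - alpha - beta * xi for xi, yi in zip(x, y))
residual_2 = sum(xi * (yi - alpha - beta * xi) for xi, yi in zip(x, y))
print(alpha, beta, residual_1, residual_2)
```

The denominator vanishes only when all \(x_i\) are equal, in which case the data cannot determine a slope.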
Remark 4.13.
1. Although a stock's \(\alpha\) and \(\beta\) are more stable than the stock price in general,
they change over time as the risk profile of the company changes. While the
discussion of their (random) variability is beyond the scope of this book, we
would like to point out that sometimes a stock's \(\alpha\) and \(\beta\) may become quite
unstable when the volatility of the market is high.
2. The calculation of the actual values of a stock's \(\alpha\) and \(\beta\) depends on several
factors including the length of the time period in the data sampling and the
frequency of the data sampling (see Exercise 4.22 on page 204). For instance,
consider a stock performance over a period of 3 years. If \(\beta_{\mathrm{wk}}\) and \(\beta_{\mathrm{mth}}\)
denote the stock's \(\beta\) calculated based on weekly returns and monthly returns,
25 The best line to fit the data is in the sense of the least Euclidean distance.
26 For example, if the data space is \(\mathbb{R}^2\), and the unknown space is \(\mathbb{R}^3\), say \(y = \alpha + \beta x + \gamma x^2\), then
the best fit graph is a curve in the data space. If both the data space and unknown space are \(\mathbb{R}^3\), say
\(y = \alpha + \beta x^2 + \gamma z^2\), then the best fit graph is a surface in the data space.
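\[
E(R - r) = \beta\, E(R_M - r), \tag{4.29}
\]
\[
R - r = \beta (R_M - r) + \varepsilon_1, \tag{4.30}
\]
\[
y_i = R_i - r_{f,i}, \qquad x_i = R_{M,i} - r_{f,i},
\]
where \(R_{M,i}\) and \(r_{f,i}\) are, respectively, the market return and risk-free rate for
the \(i\)th sample period. Also, let
\[
x = [x_1 \; x_2 \; \ldots \; x_n]^{\top}, \qquad y = [y_1 \; y_2 \; \ldots \; y_n]^{\top}, \qquad e = [1 \; 1 \; \ldots \; 1]^{\top}.
\]
\[
\beta = \frac{x \cdot y}{x \cdot x} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \equiv \beta_{\mathrm{CAPM}}. \tag{4.33}
\]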
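Formula (4.33) is a regression through the origin on excess returns, so it needs only two sums. The sketch below is a hedged illustration: the function name `capm_beta` and the three-period data set are hypothetical, and the stock returns are constructed so that the excess return is exactly 1.5 times the market excess return.

```python
def capm_beta(stock_returns, market_returns, risk_free_rates):
    """Beta from excess returns: sum(x_i * y_i) / sum(x_i ** 2)."""
    x = [rm - rf for rm, rf in zip(market_returns, risk_free_rates)]
    y = [r - rf for r, rf in zip(stock_returns, risk_free_rates)]
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical per-period data; the risk-free rate may vary by period.
risk_free = [0.001, 0.001, 0.001]
market = [0.011, -0.019, 0.031]
stock = [rf + 1.5 * (rm - rf) for rm, rf in zip(market, risk_free)]

beta = capm_beta(stock, market, risk_free)
print(round(beta, 6))
```

Because no intercept is estimated here, this estimate generally differs from the slope of a regression that also fits \(\alpha\), unless the fitted \(\alpha\) happens to be zero.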
Similarly, plug the data set into (4.31). After eliminating \(\alpha\) from
\[
y = \alpha e + \beta x,
\]
\(\dfrac{B/N}{p}\) is called the book-to-price ratio of the company at time \(t_0\). It is also
known as book-to-market ratio.
P/B ratio or the price-to-book ratio is the reciprocal of the book-to-price ratio.
Example 4.24. The key statistics of companies can be easily accessed online.
For instance, valuation measures for IBM provided by Yahoo! Finance on
November 8, 2012, indicated
where mrq stands for most recent quarter. Given that the closing price of
the stock on that day was $190.10, determine the capitalization of IBM and the
book-to-price ratio of IBM on that day.
\[
\text{size premium} = R_S - R_B.
\]
Both size premium and value premium are called risk premiums in the theory
of financial investments.
Intuitively, the stock price is expected to depend on market capitalization
(size), book-to-price ratio (value), and the systematic risk in stock investing
that is directly associated with the general stock market (e.g., S&P 500). Thus,
one natural approach to stock investments is to consider the general market,
size, and value as three risk factors. As a matter of fact, the factor models we
defined before are linear approximations of this dependence.
Although natural ideas do not always lead to successful investment strate-
gies, once they are quantified, statistical testing provides a mechanism for ac-
cepting or rejecting the ideas. It is in such frameworks that factor models can
come in handy.
Motivated by the desire to better explain differences in the returns of diver-
sified equity portfolios by asset pricing models, Fama and French created a
three-factor model [18] in the form of excess returns²⁹:
\[
R - r = \alpha + \beta_1 (R_M - r) + \beta_2 (R_S - R_B) + \beta_3 (R_H - R_L) + \varepsilon, \tag{4.36}
\]
which implies
\[
E(R - r) = \alpha + \beta_1 E(R_M - r) + \beta_2 E(R_S - R_B) + \beta_3 E(R_H - R_L).
\]
Example 4.25. Let S, MS , and B denote the sets of stocks of companies with
small, medium, and big capitalizations, respectively. Let L, M, and H denote
the sets of stocks of companies with low, medium, and high book-to-market
29 Equation (4.36) is the form that most researchers currently use for the Fama-French three-factor
model. We refer the reader to [18] and [19] for the original form of this model.
4.4 Exercises 199
In the 1990s, Davis, Fama, and French performed empirical tests on the nine
portfolios above. Confirmatory factor analysis of the information in [13], which
includes data source, sample size, data organization, regression results, fac-
tor loadings, and R-square values, strongly supports that value and small-cap
stock portfolios outperformed markets on a regular basis in the USA from 1929
to 1996.
4.4 Exercises
4.2. Explain how the sign of the beta of a stock indicates the direction of the
movement of the stock price with respect to that of the market portfolio.
4.3. Let A be a class of stocks with beta between 0.5 and 2. What is the main
property that all stocks in A have in terms of the market returns?
4.4. Give a financial interpretation of the mathematical expression: \(\dfrac{\mu_M - r_f}{\sigma_M}\).
4.5. Can you find an example of a company in the USA with a negative beta? If
so, do you think that they are as abundant as companies with a positive beta?
Explain.
4.8. Use the table given in Example 4.17 on page 183 to determine \(\mathrm{VaR}_p(X)\) for
arbitrary \(p\).
4.9. Use the table given in Example 4.17 on page 183 to determine \(\mathrm{CVaR}_p(X)\)
for \(p = 30\%\) and \(p = 50\%\).
4.11. Returns and risks are two aspects involved in every investment. Identify
each statement below as true or false or identify scenarios when it is true and
when it is false. Justify your answer.
a) The quantity \(\alpha\) relates to factors affecting the performance of an individual
stock or the fund manager's skill in selecting the stocks.
b) The factor \(\beta\) relates an individual stock to market risks.
c) A higher stock \(\alpha\) and a lower stock \(\beta\) would be preferred choices.
d) The quantity \(\alpha = 0\) if a stock market is efficient.
4.13. All investments carry some form of risk. Major risks include, but are not
limited to, the following:
a) Systematic risk
b) Interest rate risk
c) Liquidity risk³⁰
d) Regulatory/political risk³¹
e) Leverage risk³²
f) Credit risk
g) Currency risk
h) Counterparty risk³³
Find an example for each type of risk listed above.
30 Liquidity risk is the risk that an investor cannot execute a buy/sell order in the market due to the
lack of anticipated/reasonable bid/ask spread or sufficient volume.
31 Regulatory changes or governmental policy changes may have significant impact on asset values.
4.14. Use the data in Table 4.1 below to compute the Sharpe ratio of the S&P
500 for the periods 1986 to 1999. Note that the risk-free rate is not constant.
Table 4.1 Annual return rate data from 1986 to 1999 for the S&P 500 and 1-year treasury bills. Data
Source: istockanalyst.com
4.15. Assume a risk-free rate of 1.5%. Answer the questions below using the
information in the following table:
Portfolio A B C D E F
Expected Return 3.2% 8.1% 9.8% 5.1% 10.7% 4.8%
Standard Deviation 2.7% 9.9% 13.7% 6.2% 17% 6.1%
a) Among the portfolios in the table, which one is closest to the market portfo-
lio? Justify your answer.
b) Plot the capital market line (CML) based on your answer in part (a).
c) For portfolio C, what is the portfolio risk premium per unit of portfolio risk?
d) Suppose we are willing to make an investment only with \(\sigma = 6.2\%\). Is a
return of 6.5% a realistic expectation for us?
4.16.
Indicate your estimated returns on each stock on the graph from part a),
and decide your buy/sell/hold rating on each stock based on your graph.
Justify your decisions.
4.17. Suppose that we borrow an amount equal to 25% of our original wealth
at the risk-free rate 4.125%. Use the CML to find \(\mu_P\) and \(\sigma_P\).
4.18. Assume the risk-free rate is 1.5% and consider the information in the table
below:
the daily closing prices for GS and SPY on the trading days during the period
between June 30 and July 14 of 2011. Let \(P\) be a portfolio that is long
200 shares of SPY and short 100 shares of GS. Suppose that \(r_f = 0\). Find the
maximum drawdown and the Sharpe ratio for the portfolio in the time period
indicated in the table.
Date GS SPY
June 30 $133.09 $131.97
July 1 $136.65 $133.92
July 5 $134.5 $133.81
July 6 $133.89 $133.97
July 7 $135.01 $135.36
July 8 $134.08 $134.4
July 11 $132.02 $131.97
July 12 $130.31 $131.4
July 13 $129.7 $131.84
July 14 $129.89 $130.93
4.22. Although most stocks' \(\alpha\) and \(\beta\) can be found online, the actual values of
\(\alpha\) and \(\beta\) for the same stock may be different at different sites. Besides, how the
actual values are calculated might be considered proprietary information.
Thus, it is critical to understand the factors that affect the calculations.
Use Yahoo Finance as a data source to complete each of the following problems:
a) The stock symbol of Apple Inc. is AAPL. Estimate \(\alpha\) and \(\beta\) for AAPL by
using the weekly adjusted closing prices over the last 2 years and the S&P
500 index as the market portfolio.
b) Estimate \(\alpha\) and \(\beta\) for AAPL by using the weekly adjusted closing prices over
the last 4 years and the S&P 500 index as the market portfolio.
c) Estimate \(\alpha\) and \(\beta\) for AAPL by using the daily adjusted closing prices over
the last 2 years and the S&P 500 index as the market portfolio.
d) Estimate \(\alpha\) and \(\beta\) for AAPL by using the weekly adjusted closing prices over
the last 2 years and the NASDAQ-100 index as the market portfolio.
e) Observe the results above and give the factors that the actual calculated
value of \(\beta\) depends on.
of the S&P index and trades at approximately one-tenth of the dollar value of the S&P 500. Thus, the
rate of daily returns of SPY and S&P 500 index are basically the same.
\[
\mu_M = \frac{C - B r_f}{B - A r_f}, \qquad
\sigma_M^2 = \frac{A r_f^2 - 2 B r_f + C}{(B - A r_f)^2}, \qquad
w_M = \frac{V^{-1}(M - r_f e)}{B - A r_f}.
\]
4.24. Show that under linear factor model framework, portfolio variance can
be decomposed into common factor variance and idiosyncratic variance. (Hint:
Apply relation (4.25) on page 190.)
4.25. Use the single-factor model (4.26) on page 192, namely,
\[
R = \alpha + \beta R_M + \varepsilon,
\]
to express \(\beta\) in terms of the variance of the total market return and the covariance
of the market return with an individual security's return.
4.26. Use the single-factor model (4.26) on page 192, namely,
\[
R = \alpha + \beta R_M + \varepsilon,
\]
Prove that A = B.
4.29. Given a portfolio P, show that its Sortino ratio is no less than its Sharpe
ratio.
4.30. Let \(A\) be an \(n \times m\) matrix and \(A^{T}\) be the transpose of \(A\). Prove the following
property:
\[
\operatorname{rank}(A^{T} A) = \operatorname{rank}(A).
\]
4.31. We continue from the last exercise. Let
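\[
A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\text{ with } x_1 \ne x_2
\quad \text{and} \quad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.
\]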
\[
L = (A\theta - y)^{T} (A\theta - y) = \theta^{T} A^{T} A \theta - \theta^{T} A^{T} y - y^{T} A \theta + y^{T} y.
\]
References
[1] Acerbi, C., Tasche, D.: Expected shortfall: a natural coherent alternative to
value at risk. Econ. Notes 31(2), 379–388 (2002)
[2] Acerbi, C., Nordio, C., Sirtori, C.: Expected Shortfall as a Tool for Financial
Risk Management. Abaxbank Working Paper. arXiv:cond-mat/0102304.
http://arxiv.org/pdf/cond-mat/0102304.pdf
[3] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.: Thinking coherently. Risk
10, 68–71 (1997)
[4] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.: Coherent measures of risk.
Math. Finance 9(3), 203–228 (1999)
[5] Bernstein, P.L.: Against the Gods: The Remarkable Story of Risk. Wiley,
New York (1996)
[6] Chan, L.K.C., Karceski, J., Lakonishok, J.: On portfolio optimization:
forecasting covariances and choosing the risk model. Rev. Financ. Stud.
12(5), 937–974 (1999)
[7] Chen, N., Roll, R., Ross, S.A.: Economic forces and the stock market. J.
Bus. 59(3), 383 (1986)
[8] Connor, G., Korajczyk, R.A.: Risk and return in an equilibrium APT:
application of a new test methodology. J. Financ. Econ. 21(2), 255–290 (1988)
[9] Connor, G., Korajczyk, R.A.: A test for the number of factors in an
approximate factor model. J. Finance 48, 1263–1291 (1993)
[10] Connor, G.: The three types of factor models: a comparison of their
explanatory power. Financ. Anal. J. 51(1), 42–46 (1995)
[11] Connor, G., Goldberg, L.R., Korajczyk, R.A.: Portfolio Risk Analysis.
Princeton University Press, Princeton (2010)
[12] Daníelsson, J., Jorgensen, B.N., Samorodnitsky, G., Mandira, M.:
Subadditivity Re-Examined: the Case for Value-at-Risk. Financial Markets Group,
Discussion paper, 549. London School of Economics and Political Science,
London (2005)
[13] Davis, J., Fama, E., French, K.: Characteristics, covariances, and average
returns: 1929-1997. J. Finance 55(1), 396 (2000)
[14] Dhaene, J., Vanduffel, S., Goovaerts, M.J., Kaas, R., Tang, Q., Vyncke, D.:
Risk Measures and Comonotonicity: A Review. Stochastic Models. Taylor
& Francis Group, LLC (2006)
[15] Dowd, K.: Measuring Market Risk. Wiley, New York (2005)
[16] Fama, E.: Efficient capital markets: a review of theory and empirical work.
J. Finance 25, 383 (1970)
[17] Fama, E., French, K.: The cross-section of expected stock returns. J.
Finance 47(2), 427 (1992)
[18] Fama, E., French, K.: Common risk factors in the returns on stocks and
bonds. J. Financ. Econ. 33(1), 3 (1993)
[19] Fama, E., French, K.: The capital asset pricing model: theory and evidence.
J. Econ. Perspect. 18(3), 25 (2004)
[20] Föllmer, H., Schied, A.: Stochastic Finance. Walter de Gruyter, Berlin
(2004)
[21] Gilchrist, W.G.: Statistical Modelling with Quantile Functions. Chapman
and Hall/CRC, Boca Raton (2000)
[22] Glasserman, P., Heidelberger, P., Shahabuddin, P.: Portfolio value-at-risk
with heavy-tailed risk factors. Math. Finance 12(3), 239 (2002)
[23] Goldberg, L.R., Hayes, M.Y., Menchero, J., Mitra, I.: Extreme risk analysis.
J. Perform. Meas. 14, 3 (2010)
[24] Goldberg, L.R., Hayes, M.Y.: The long view of financial risk. J. Investment
Manag. 8, 39–48 (2010)
[25] Hull, J.C.: Risk Management and Financial Institutions. Prentice Hall, Up-
per Saddle River (2007)
[26] Ingersoll, J.: Theory of Financial Decision Making. Rowman and Little-
field, Savage (1987)
[27] Jorion, P.: Value at Risk, 3rd edn. McGraw-Hill, New York (2007)
[28] Luenberger, D.: Investment Science. Oxford University Press, New York
(1998)
[29] Ma, Y., Genton, M., Parzen, E.: Asymptotic properties of sample quantiles
of discrete distributions. Ann. Inst. Stat. Math. 63, 227 (2011)
[30] Markowitz, H.: Portfolio Selection. Blackwell, Cambridge (1959)
[31] Reilly, F., Brown, K.: Investment Analysis and Portfolio Management.
South-Western Cengage Learning, Mason (2009)
[32] Rockafellar, R.T., Uryasev, S.: Optimization of Conditional Value-at-Risk.
J. Risk 2, 21–41 (2000)
[33] Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss
distributions. J. Bank. Finance 26, 1443 (2002)
[34] Ross, S.A.: The arbitrage theory of capital asset pricing. J. Econ. Theory
13, 341 (1976)
[35] Sharpe, W.: Capital asset prices: a theory of market equilibrium under
conditions of risk. J. Finance 19(3), 425 (1964)
[36] Sharpe, W.: The Sharpe ratio. J. Portf. Manag. 21, 1 (1994)
[37] Sharpe, W., Alexander, G.J., Bailey, J.V.: Investments. Prentice-Hall, Upper
Saddle River (1999)
[38] Spearman, C.: General Intelligence objectively determined and measured.
Am. J. Psychol. 15, 201 (1904)
[39] Wilmott, P.: Paul Wilmott on Quantitative Finance. Wiley, New York
(2006)
Chapter 5
Binomial Trees and Security Pricing Modeling
We shall present a general binomial tree model of the future price of a security
given its current price, which is assumed known to all market participants.
Fix a time interval \([t_0, t_f]\) over which to model a security's price. The time
span of the interval \([t_0, t_f]\) is denoted by
\[
\tau = t_f - t_0.
\]
\[
t_0, \quad t_1 = t_0 + h_n, \quad t_2 = t_0 + 2 h_n, \quad \ldots, \quad t_n = t_0 + n h_n = t_f,
\]
where
\[
h_n = \frac{\tau}{n}.
\]
Here \(t_0\) is the current time with \(t_1, \ldots, t_n\) future times. The subintervals \([t_{j-1}, t_j]\),
\(j = 1, \ldots, n\), will be called time steps. Since the overall time span is divided into
\(n\) periods, the tree will also be called an \(n\)-period binomial tree. Note that as the
number \(n\) of time steps changes, the label \(t_n\) for the fixed final time \(t_f\) changes.
Remark 5.1. To avoid our notation becoming too cumbersome, we denote the
price of a security at time \(t\) by \(S(t)\) without additional notation to indicate the
type of model used for the security. In particular, whether \(S(t_j)\) represents a
security's price at \(t_j\) using a discrete or continuous-time model will be made
clear from the context. For example, almost all of this chapter (except at the
end) will employ a discrete model, while the next chapter and beyond typically
will use continuous-time modeling.
Fig. 5.1 An \(n\)-period binomial tree over the interval \([t_0, t_f]\), where \(n\) is a positive integer and each
\([t_{j-1}, t_j]\) has the same length \(\tau/n\), where \(\tau = t_f - t_0\). The tree satisfies the recombining property and
has independent paths. There are \(n + 1\) possible values of \(S(t_n)\) and \(2^n\) possible price paths from \(t_0\) to
\(t_n\). The random gross returns \(S(t_j)/S(t_{j-1})\), where \(j = 1, \ldots, n\), are assumed independent.
5.1 The General Binomial Tree Model of Security Prices 211
Recombining property. If the price \(S_{\text{node}}\) at any node increases over the
next time step and is followed by a decrease in the subsequent time step,
then we get the same value if, instead, we had a price decrease followed
by an increase: \((S_{\text{node}}\, u_n)\, d_n = (S_{\text{node}}\, d_n)\, u_n\). By the recombining property,
there are then \(n + 1\) possible prices at time \(t_n\):
\[
S_0 d_n^{n}, \quad S_0 u_n d_n^{n-1}, \quad \ldots, \quad S_0 u_n^{n-1} d_n, \quad S_0 u_n^{n}.
\]
The prices increase as one moves through the list from left to right, equivalently,
from the end of the bottommost branch of the tree to the topmost one. In particular,
for the \(i\)th price, \(S_0 u_n^{i} d_n^{n-i}\), where \(i = 0, 1, \ldots, n\), the power \(i\) of \(u_n\) is the
number of times the price had to increase during the \(n\) time steps to arrive
at \(S_0 u_n^{i} d_n^{n-i}\), while \(n - i\) is the number of times the price had to decrease.
In other words, we can express the price of the security at time \(t_j\) as
\[
S(t_j) = S(t_0)\, u_n^{N_{U,j}}\, d_n^{\,j - N_{U,j}}, \tag{5.1}
\]
where \(S(t_0) = S_0\) and \(N_{U,j}\) is the random number of upticks in the
security's price from time \(t_0\) to \(t_j\).
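The counting claims for a recombining tree can be checked with a short computation. This is a minimal sketch with hypothetical parameter values (\(S_0 = 100\), \(u_n = 1.1\), \(d_n = 0.9\), \(n = 3\)); the function name `terminal_prices` is illustrative. It lists the \(n + 1\) terminal prices \(S_0 u_n^{i} d_n^{n-i}\) and counts the paths reaching each one using binomial coefficients.

```python
from math import comb

def terminal_prices(s0, u, d, n):
    """Prices S0 * u**i * d**(n - i) for i = 0, ..., n (i = number of upticks)."""
    return [s0 * u ** i * d ** (n - i) for i in range(n + 1)]

s0, u, d, n = 100.0, 1.1, 0.9, 3   # hypothetical values with d < u

prices = terminal_prices(s0, u, d, n)
paths_per_node = [comb(n, i) for i in range(n + 1)]   # paths with exactly i upticks

# Recombining property at the initial node: up-then-down equals down-then-up.
up_down = (s0 * u) * d
down_up = (s0 * d) * u

print(prices)
print(paths_per_node, sum(paths_per_node))
```

Because \(d_n < u_n\), the list of prices is increasing in the number of upticks, and the path counts add up to \(2^n\), the total number of price paths.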
Gross returns, capital-gain returns, and log returns. For each time step
$[t_{j-1}, t_j]$, where $j = 1, \dots, n$, the random up or down price movement of the
security is given by its gross return $S(t_j)/S(t_{j-1})$ over the interval. We
assume that the gross returns of the $n$ time steps, namely,
\[ \frac{S(t_1)}{S(t_0)},\quad \frac{S(t_2)}{S(t_1)},\quad \dots,\quad \frac{S(t_n)}{S(t_{n-1})}, \tag{5.2} \]
are independent and identically distributed (i.i.d.). Explicitly, the gross returns
are assumed to be independent Bernoulli random variables with each having
the same probability distribution determined by
\[ \frac{S(t_j)}{S(t_{j-1})} = \begin{cases} u_n & \text{with probability } p_n \\ d_n & \text{with probability } 1 - p_n, \end{cases} \tag{5.3} \]
where j = 1, . . . , n and
In other words, from time $t_{j-1}$ to $t_j$, each possible value of the price $S(t_{j-1})$
at $t_{j-1}$ either goes up by the factor $u_n$ with probability $p_n > 0$ or down by
the factor $d_n$ with probability $1 - p_n > 0$. The situations $p_n = 0$ and $p_n = 1$
are excluded since there is little interest in binomial trees with such
probabilities.
212 5 Binomial Trees and Security Pricing Modeling
Since the gross returns (5.2) are i.i.d., the capital-gain returns
\[ R_1 = \frac{S(t_1)}{S(t_0)} - 1,\quad R_2 = \frac{S(t_2)}{S(t_1)} - 1,\quad \dots,\quad R_n = \frac{S(t_n)}{S(t_{n-1})} - 1, \]
are i.i.d. as well as the log returns
\[ \ln\frac{S(t_1)}{S(t_0)},\quad \ln\frac{S(t_2)}{S(t_1)},\quad \dots,\quad \ln\frac{S(t_n)}{S(t_{n-1})}, \]
where
\[ \ln\frac{S(t_j)}{S(t_{j-1})} = \begin{cases} \ln(u_n) & \text{with probability } p_n \\ \ln(d_n) & \text{with probability } 1 - p_n, \end{cases} \tag{5.6} \]
for $j = 1, \dots, n$. We are using the fact that if $X$ and $Y$ are independent random
variables and if $f$ and $g$ are continuous¹ functions, then $f(X)$ and $g(Y)$
are independent.²
Additionally, since each time step in an $n$-period binomial tree has the same size,
the quantities $p_n$, $u_n$, and $d_n$ are assumed to have the same value over every time
step in the tree. However, though their values are the same across an
$n$-period binomial tree, they are not necessarily the same for different trees.
For instance, the triplet $p_n$, $u_n$, and $d_n$ does not carry over to 1-period,
2-period, ..., and $(n - 1)$-period binomial trees since those trees have larger
time steps and different probability spaces. In fact, each element of the
¹ For readers familiar with measure theory, it suffices for the functions $f$ and $g$ to be measurable,
which includes the continuous functions. In fact, all the functions you will encounter in our financial
applications are measurable. Some measure theory will be introduced in Chapter 6.
² Proof. $P(f(X) \in A,\ g(Y) \in B) = P(X \in f^{-1}(A),\ Y \in g^{-1}(B)) = P(X \in f^{-1}(A))\, P(Y \in g^{-1}(B)) =
P(f(X) \in A)\, P(g(Y) \in B)$, where the independence of $X$ and $Y$ was used in the second to the last
equality.
Fig. 5.2 One- and two-period binomial trees (left to right). Since the time interval is [t0 , t f ] for both
trees, each time step in the tree on the right is actually half the size of the one on the left.
\[ t \mapsto S_t(\omega_n), \]
where
\[ S_t = S(t), \qquad t \in \{t_0, t_1, \dots, t_n\}, \]
and $S_{t_0}(\omega_n) = S_0$ for every $\omega_n$ in $\Omega_n$. Moreover, the realizations of the random
price $S(t_j)$ at a fixed time $t_j$, where $1 \le j \le n$, are given by the values
$S_{t_j}(\omega_n)$ as $\omega_n$ varies over $\Omega_n$. In addition, for a fixed $t_j$ and $\omega_n$, the
possible price $S_{t_j}(\omega_n)$ depends only on the portion of the sequence of U's and
D's in $\omega_n$ up to time $t_j$, $j = 1, \dots, n$.
Example 5.1. Let us illustrate the above observations using the 2-period
binomial tree in Figure 5.2. The associated sample space of possible
outcomes is
\[ \Omega_2 = \{UU,\ UD,\ DU,\ DD\}. \]
Consider the following specific possible outcome:
\[ \omega_2 = UD. \]
Table 5.1 Possible outcomes $\omega_2$ and security price values at times $t_0$, $t_1$, and $t_2$ for a 2-period binomial
tree. The sample space is $\Omega_2 = \{UU, UD, DU, DD\}$. For example, if $\omega_2 = DU$, then $S_{t_1}(\omega_2) = S_0 d_2$.
The random variable $S_{t_1}$ depends only on the first slot of $\omega_2$ (e.g., $S_{t_1}(UD)$ depends only on U), while
the random variable $S_{t_2}$ depends on both slots. Note that $S_{t_0}$ is nonrandom, i.e., it is a constant random
variable.

  $\omega_2$    $S_{t_0}(\omega_2)$    $S_{t_1}(\omega_2)$    $S_{t_2}(\omega_2)$
  UU            $S_0$                  $S_0 u_2$              $S_0 u_2^2$
  UD            $S_0$                  $S_0 u_2$              $S_0 u_2 d_2$
  DU            $S_0$                  $S_0 d_2$              $S_0 d_2 u_2 = S_0 u_2 d_2$
  DD            $S_0$                  $S_0 d_2$              $S_0 d_2^2$
\[ S_{t_0}(\omega_2) = S_0, \qquad S_{t_1}(\omega_2) = S_0 u_2, \qquad S_{t_2}(\omega_2) = S_0 u_2 d_2. \]
\[ t \mapsto S_t(\omega_2), \qquad t = t_0, t_1, t_2. \]
We also see that the possible price $S_{t_1}(\omega_2)$ depends only on the first entry
U of $\omega_2 = UD$. Table 5.1 presents all the possible outcomes in $\Omega_2$ along
with the prices across time that each possible outcome determines.
Compare with Figure 5.2.
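\[ P(\omega_n) = p_n^{N_U(\omega_n)} (1 - p_n)^{\,n - N_U(\omega_n)}. \]
where $i = 0, 1, 2, \dots, n$, and
\[ P\big(S(t_n) \le S_0 u_n^{k} d_n^{\,n-k}\big) = \sum_{i=0}^{k} \binom{n}{i} p_n^{i} (1 - p_n)^{n-i}
   = \text{probability of at most } k \text{ price increases}, \tag{5.8} \]
where $k = 0, 1, 2, \dots, n$.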
where the binomial formula was used for the last equality. Note that Equa-
tion (5.9) agrees with (5.5).
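As a minimal sketch of how the binomial probabilities for the terminal price can be evaluated, the Python functions below compute the probability of exactly $i$ upticks and the cumulative probability in (5.8). The function names and the sample inputs are hypothetical and chosen only for illustration.

```python
from math import comb


def prob_exactly(n, i, p):
    """Probability of exactly i upticks in n steps, i.e. that the
    (i + 1)st possible price (counted from the bottom) occurs."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def prob_at_most(n, k, p):
    """Probability of at most k upticks in n steps, as in (5.8)."""
    return sum(prob_exactly(n, i, p) for i in range(k + 1))


# Hypothetical inputs: a 2-period tree with uptick probability 0.5.
one_uptick = prob_exactly(2, 1, 0.5)      # paths UD and DU
at_most_one = prob_at_most(2, 1, 0.5)     # paths DD, UD, DU
```

The same two functions apply unchanged to a 100-period tree once the uptick probability is known.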
Cash dividends and the return rate. From time $t_0$ to $t_f$, assume that the
security pays a constant, continuous, proportional annual dividend yield rate $q$
giving a cash dividend of
\[ D(t_{j-1}, t_j) = q\, S(t_{j-1})\, h_n, \qquad (j = 1, 2, \dots, n). \]
Unless stated to the contrary, assume that in the continuous-time limit (i.e., $n \to \infty$
or $h_n \to 0$), the cash dividends are continuously reinvested in the security to
buy more units of the security. The (total) return rate from $t_{j-1}$ to $t_j$ is then
\[ R(t_{j-1}, t_j) = \frac{S(t_j) - S(t_{j-1}) + D(t_{j-1}, t_j)}{S(t_{j-1})} = R_j + q\, h_n. \tag{5.10} \]
a) What is the 57th possible price 6 months out, where the 101 possible prices
are counted from bottom to top in the tree?
Solution. The $(i + 1)$st possible price is
\[ S_0\, u_{100}^{\,i}\, d_{100}^{\,n-i}, \]
b) What is the probability that the 57th possible price will occur?
Solution. By (5.7), we do not even need to know the 57th possible price in order
to compute the desired probability:
\[ P\big(S(0.5) = \$155\, u_{100}^{56}\, d_{100}^{44}\big) = \binom{100}{56} p_{100}^{56} (1 - p_{100})^{44} = 6.1\%. \]
c) What is the probability that the stock's price increases 56 times?
Solution. Equation (5.7) yields that this probability is the same as that of the
57th possible price occurring, which is 6.1%.
d) What is the probability that the stock's price is at most $183.64?
Solution. Since
\[ \$183.64 = \$155\, u_{100}^{56}\, d_{100}^{\,100-56}, \]
Equation (5.8) gives
\[ P\big(S(0.5) \le \$155\, u_{100}^{56}\, d_{100}^{44}\big) = \sum_{i=0}^{56} \binom{100}{i} p_{100}^{\,i} (1 - p_{100})^{100-i} = 80\%. \]
We now conclude the section with a formal expression for the futures price of
the security in terms of the log returns over the various time steps. Writing
\[ S(t_n) = S_0\, \frac{S(t_n)}{S_0} = S_0\, e^{\ln\frac{S(t_n)}{S_0}} \]
and using
\[ \ln\frac{S(t_n)}{S_0} = \ln\left(\frac{S(t_1)}{S_0}\,\frac{S(t_2)}{S(t_1)}\,\frac{S(t_3)}{S(t_2)} \cdots \frac{S(t_n)}{S(t_{n-1})}\right) = \sum_{j=1}^{n} \ln\frac{S(t_j)}{S(t_{j-1})}, \tag{5.11} \]
we can express the price $S(t_n)$ in terms of the time-step log returns as
\[ S(t_n) = S_0 \exp\left(\sum_{j=1}^{n} \ln\frac{S(t_j)}{S(t_{j-1})}\right), \tag{5.12} \]
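Equivalently,
\[ \ln\frac{S(t_j)}{S(t_{j-1})} = E\left(\ln\frac{S(t_j)}{S(t_{j-1})}\right) + \sqrt{\operatorname{Var}\left(\ln\frac{S(t_j)}{S(t_{j-1})}\right)}\; X_{n,j}. \]
and
\[ \operatorname{Var}\left(\ln\frac{S(t_j)}{S(t_{j-1})}\right) = \operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right) \qquad (j = 1, \dots, n). \]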
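It follows
\[ \sum_{j=1}^{n} \ln\frac{S(t_j)}{S(t_{j-1})}
   = \sum_{j=1}^{n} E\left(\ln\frac{S(t_1)}{S(t_0)}\right) + \sqrt{\operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right)}\, \sum_{j=1}^{n} X_{n,j}
   = n\, E\left(\ln\frac{S(t_1)}{S(t_0)}\right) + \sqrt{\operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right)}\, \sum_{j=1}^{n} X_{n,j}. \]
Let
\[ Z_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} X_{n,j}. \]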
It is assumed that, as the number $n$ of time steps increases, the CRR tree
becomes a more and more accurate model of a security's price. For this reason,
the model is also called a real-world CRR tree. Of course, such terminology
should not be taken literally. It is another matter whether the real-world CRR
tree actually becomes an accurate fit to security prices in the marketplace as n
increases. These issues are beyond the scope of this introductory text, but we
do note that other types of trees have been studied, e.g., trinomial trees. Inter-
estingly, we shall see in Chapter 8 that the risk-neutral CRR tree is a natural
choice when pricing derivatives.
We now list some additional assumptions and define certain key quantities
associated with the real-world CRR tree:
We now show explicitly that $m$ applies at each instant of time in the sense
that the expected price of the security is obtained by continuously
compounding the current price at the rate $m - q$. First, since the (total) return
rate $R(t_0, t_1)$ arises from the capital-gain return $R_1$ plus the dividend yield
contribution $q\, h_n$, i.e.,
\[ R(t_0, t_1) = R_1 + q\, h_n, \]
employing (5.14) gives
\[ \frac{E(R_1)}{h_n} = \frac{E\big(R(t_0, t_1)\big) - q\, h_n}{h_n} \to m - q \quad \text{as } n \to \infty. \tag{5.15} \]
We then interpret $m - q$ as the instantaneous capital-gain return rate or
simply the instantaneous capital-gain return. Note that $m - q$ is also quoted per
annum. Now, by (5.15), we have
and since
\[ e^{(m - q) h_n} \approx 1 + (m - q)\, h_n \qquad (n \text{ sufficiently large}), \]
it follows³
Equation (5.17) shows that as $n$ increases, the gross return over the interval
$[t_0, t_f]$ becomes independent of the number of time periods $n$ in a partition
of the interval. In fact, recalling (5.5), we see that (5.17) yields
\[ E\left(\frac{S(t_f)}{S(t_0)}\right) = \left(E\left(\frac{S(t_1)}{S(t_0)}\right)\right)^{n} \approx \left(e^{(m - q) h_n}\right)^{n} = e^{(m - q)\tau} \qquad (n \text{ sufficiently large}) \]
or
The expectation on the left hand side of (5.18) is for discrete time.
We assume that $m$ and $q$ are known. Additionally, Equation (5.18) shows that
the instantaneous expected capital-gain return $m - q$ can be interpreted as
the continuously compounded rate at which the security's expected price increases
($m - q > 0$) or decreases ($m - q < 0$). In most cases, $m - q > 0$. Naturally, a
security must have nontrivial promise for investors to tolerate an expected
return of $m - q < 0$ for an extended period.
Example 5.3. Suppose that the time span $\tau = t_f - t_0$ is 2 years and the time
period $h_n$ is a trading day. Assuming 252 trading days in a year, we then
consider a 504-period CRR tree. Assume that the annual dividend yield
rate is 2% and the instantaneous annual expected return is 10%. If the
current security price is $75, then (5.18) yields that the expected price of the
security 1 year from now is obtained by continuously compounding $75 at
the annual rate $m - q = 8\%$. Explicitly, taking the current time to be 0, we
get
\[ E\big(S(1)\big) \approx \$75\, e^{0.08 \times 1} \approx \$81.25. \]
\[ u_n d_n = 1. \tag{5.19} \]
This condition makes the CRR tree recombine along the horizontal line
through the initial price $S_0$ since $S_0 u_n d_n = S_0$.
Second constraint and the constant $\mu_{RW}$. Define $\mu_n$ to be the expected
time-step log return per unit time period:
\[ \mu_n = \frac{1}{h_n}\, E\left(\ln\frac{S(t_1)}{S(t_0)}\right). \]
Explicitly
\[ \mu_n h_n = p_n \ln u_n + (1 - p_n) \ln d_n. \tag{5.20} \]
Additionally, since the time-step log returns are identically distributed,
Equation (5.11) yields that the expected log return over the interval $[t_0, t_f]$
is $n$ times the expected log return over the time step $[t_0, t_1]$:
\[ E\left(\ln\frac{S(t_n)}{S(t_0)}\right) = n\, E\left(\ln\frac{S(t_1)}{S(t_0)}\right) = \mu_n \tau. \tag{5.21} \]
A second constraint assumed for a CRR tree is that $\mu_n$ converges to a constant $\mu_{RW}$
as $n$ increases without bound:
\[ \mu_n \to \mu_{RW} \quad \text{as } n \to \infty. \tag{5.22} \]
The constant $\mu_{RW}$ is called the real-world⁴ instantaneous drift or, simply, the
real-world drift of the security's price. Also, some authors refer to $\mu_{RW}$ as a
natural drift or physical drift. We quote $\mu_{RW}$ per annum.
Example 5.4. Suppose that the time interval $[t_0, t_f]$ is a year and partition
the year into 252 trading days. Then
\[ \tau = 1, \qquad n = 252, \qquad h_n = \frac{1}{252}. \]
Taking the current time to be $t_0 = 0$, one trading day later is $t_1 = \frac{1}{252}$, and 1
year away is $t_f = 1$. Historical adjusted closing prices of the security can⁵
4 We remind readers to be mindful of the usage of the terminology real world; see the introductory
paragraph to this section on page 219.
5 See page 19.
5.2 The Cox-Ross-Rubinstein Tree 223
Equivalently, the variance of the log return over $[t_0, t_f]$ is $n$ times the
variance of the log return over a time step:
\[ \operatorname{Var}\left(\ln\frac{S(t_n)}{S(t_0)}\right) = n \operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right). \tag{5.26} \]
A third assumed constraint for a CRR tree is that $\sigma_n^2$ converges to a positive
constant $\sigma^2$ as $n$ increases without bound:
\[ \sigma_n^2 \to \sigma^2 > 0 \quad \text{as } n \to \infty. \tag{5.27} \]
The quantity $\sigma > 0$ is called the continuous-time volatility or, simply, the
volatility of the security's price. We assume that $\sigma$ is known.
Example 5.5. Employing the same inputs as in Example 5.4, Equation (5.26)
shows that the annual variance of the log return is obtained by annualizing
the variance of the daily log returns:
\[ \operatorname{Var}\left(\ln\frac{S(1)}{S(0)}\right) = 252 \operatorname{Var}\left(\ln\frac{S(1/252)}{S(0)}\right). \]
Equation (5.27) then yields that if the standard deviation of the daily log
returns is estimated by using historical security price data, then the annual
volatility $\sigma$ of the security is estimated as follows:
\[ \sigma \approx \left(252 \operatorname{Var}\left(\ln\frac{S(1/252)}{S(0)}\right)\right)^{1/2}. \]
In other words, the standard deviation of the daily log returns is annualized
through multiplying by $\sqrt{252}$. As noted in Example 5.4, the daily log
returns are assumed to be i.i.d. in order to apply the CRR tree.
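The annualization in Examples 5.4 and 5.5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the price list is made up, the function name is hypothetical, and the sample mean and sample standard deviation of the daily log returns are used as the estimators.

```python
import math
import statistics


def annualize_log_returns(prices, periods_per_year=252):
    """Estimate the annual drift and volatility from a list of daily prices.

    The mean daily log return is multiplied by the number of trading days,
    and the daily standard deviation is multiplied by its square root.
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean_daily = statistics.fmean(log_returns)
    std_daily = statistics.stdev(log_returns)
    drift = mean_daily * periods_per_year
    volatility = std_daily * math.sqrt(periods_per_year)
    return drift, volatility


# Hypothetical adjusted closing prices, for illustration only.
sample_prices = [100.0, 101.0, 100.0, 102.0, 101.0]
drift, volatility = annualize_log_returns(sample_prices)
```

With real data one would pass a long history of adjusted closing prices; the i.i.d. assumption on the daily log returns is what justifies the scaling.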
Remark 5.3. The appearance of the constraints (5.19), (5.22), and (5.27) may
seem mysterious. We shall see that they make it possible to determine
approximate expressions of $p_n$, $u_n$, and $d_n$ for $n$ sufficiently large; see
page 226. Perhaps a more practical explanation for the constraints is that
they allow in the limit $n \to \infty$ for the discrete CRR pricing formula to
converge to the continuous-time security pricing formula utilized in the
Black-Scholes-Merton model.
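Approximating the variances of the time-step log return and gross return.
A useful approximation of the variance of the log return is
\[ \operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right) \approx E\big(R_1^2\big) \qquad (n \text{ sufficiently large}). \tag{5.28} \]
To obtain this result, first observe that
\[ \operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right)
   = E\left[\left(\ln\frac{S(t_1)}{S(t_0)}\right)^{2}\right] - \left(E\left[\ln\frac{S(t_1)}{S(t_0)}\right]\right)^{2}
   = E\left[\left(\ln\frac{S(t_1)}{S(t_0)}\right)^{2}\right] - \left(\frac{\mu_n \tau}{n}\right)^{2}
   \approx E\left[\left(\ln\frac{S(t_1)}{S(t_0)}\right)^{2}\right] \qquad (n \text{ sufficiently large}). \]
For $n$ sufficiently large, we have $|R_1| \ll 1$ almost surely and so ignore any
contributions from skewness (i.e., the term $E\big(R_1^3\big)$) and kurtosis (i.e., the
term $E\big(R_1^4\big)$). We then enforce the following approximation:
\[ E\left[\left(\ln\frac{S(t_1)}{S(t_0)}\right)^{2}\right] \approx E\big(R_1^2\big) \qquad (n \text{ sufficiently large}). \]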
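or
\[ E\big(R_1^2\big) \approx 2\, E(R_1) - \frac{2 \mu_n \tau}{n} \qquad (n \text{ sufficiently large}). \]
On the other hand, employing (5.28) yields
\[ \frac{\sigma_n^2 \tau}{n} \approx E\big(R_1^2\big) \qquad (n \text{ sufficiently large}). \]
Consequently
\[ \sigma_n^2 \tau \approx 2 n\, E(R_1) - 2 \mu_n \tau. \]
Taking the limit $n \to \infty$ and making use of the convergences (5.15), (5.22),
and (5.27), namely,
\[ n\, E(R_1) \to (m - q)\tau, \qquad \mu_n \to \mu_{RW}, \qquad \sigma_n^2 \to \sigma^2, \]
we obtain
\[ \mu_{RW} = m - q - \frac{\sigma^2}{2}. \tag{5.30} \]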
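\[ u_n d_n = 1 \tag{5.31} \]
\[ \mu_n \approx \mu_{RW} \qquad (n \text{ sufficiently large}) \tag{5.32} \]
\[ \sigma_n^2 \approx \sigma^2 \qquad (n \text{ sufficiently large}), \tag{5.33} \]
where
\[ \mu_n h_n = p_n \ln u_n + (1 - p_n) \ln d_n \tag{5.34} \]
\[ \sigma_n^2 h_n = p_n (1 - p_n) \left(\ln\frac{u_n}{d_n}\right)^{2}. \tag{5.35} \]
We claim that for $n$ sufficiently large, Equations (5.31)-(5.35) are solved by
\[ u_n \approx e^{\sigma\sqrt{h_n}}, \qquad d_n \approx e^{-\sigma\sqrt{h_n}}, \qquad p_n \approx \frac{1}{2}\left(1 + \frac{\mu_{RW}}{\sigma}\sqrt{h_n}\right), \tag{5.36} \]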
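which are called the real-world CRR equations. They are the governing formulas
for the real-world CRR tree. Note that
\[ p_n \to \frac{1}{2} \quad \text{as } n \to \infty \tag{5.37} \]
and by (5.17),
\[ p_n u_n + (1 - p_n)\, d_n \approx e^{(m - q) h_n} \qquad (n \text{ sufficiently large}), \]
so that
\[ p_n \approx \frac{e^{(m - q) h_n} - d_n}{u_n - d_n} \qquad (n \text{ sufficiently large}). \tag{5.38} \]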
Equation (5.38) and the expression for pn in (5.36) are equivalent to first order
in 1/n (see Exercise 5.20).
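A numerical check of the real-world CRR equations can be sketched as follows. The Python function below builds $u_n$, $d_n$, and the two expressions for $p_n$ from (5.36) and (5.38), and then recomputes $\mu_n$ and $\sigma_n^2$ from (5.34) and (5.35). The function name and the sample inputs (volatility 10%, expected return 12%, no dividend, a quarter-year span, 100 periods) are assumptions made for illustration.

```python
import math


def crr_real_world(sigma, m, q, tau, n):
    """Return the CRR quantities for an n-period tree over a time span tau."""
    h = tau / n
    mu_rw = m - q - 0.5 * sigma**2                    # drift relation (5.30)
    u = math.exp(sigma * math.sqrt(h))                # up factor in (5.36)
    d = 1.0 / u                                       # enforces u * d = 1
    p_drift = 0.5 * (1.0 + (mu_rw / sigma) * math.sqrt(h))   # p_n in (5.36)
    p_mean = (math.exp((m - q) * h) - d) / (u - d)            # p_n in (5.38)
    # Recompute mu_n and sigma_n^2 from (5.34) and (5.35) using p_drift.
    mu_n = (p_drift * math.log(u) + (1 - p_drift) * math.log(d)) / h
    sigma_n_sq = p_drift * (1 - p_drift) * math.log(u / d) ** 2 / h
    return {"h": h, "u": u, "d": d, "mu_rw": mu_rw, "p_drift": p_drift,
            "p_mean": p_mean, "mu_n": mu_n, "sigma_n_sq": sigma_n_sq}


# Hypothetical inputs, for illustration only.
tree = crr_real_world(sigma=0.10, m=0.12, q=0.0, tau=0.25, n=100)
```

For these inputs the two uptick probabilities differ only in the sixth decimal place, `mu_n` reproduces the drift, and `sigma_n_sq` is within a fraction of a percent of the squared volatility.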
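\[ \mu_n h_n = -p_n \ln d_n + (1 - p_n) \ln d_n = -2 p_n \ln d_n + \ln d_n. \]
Substituting the above expression for $p_n$ and $\ln(u_n/d_n) = 2 \ln u_n$ into (5.35)
gives
\[ \sigma_n^2 h_n = \frac{1}{4}\left(1 + \frac{\mu_n h_n}{\ln u_n}\right)\left(1 - \frac{\mu_n h_n}{\ln u_n}\right) 4 (\ln u_n)^2
   = \left(1 - \left(\frac{\mu_n h_n}{\ln u_n}\right)^{2}\right) (\ln u_n)^2. \]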
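Solution. Given the large number of periods, we employ the CRR tree to
determine $u_{100}$, $d_{100}$, and $p_{100}$ in terms of $h_{100}$, $\sigma$, and $m$:
\[ u_{100} \approx e^{\sigma\sqrt{h_{100}}}, \qquad d_{100} \approx e^{-\sigma\sqrt{h_{100}}}, \qquad p_{100} \approx \frac{e^{(m - q) h_{100}} - d_{100}}{u_{100} - d_{100}}. \]
We have $n = 100$, $h_{100} = \frac{0.25}{100} = 0.0025$, $\sqrt{h_{100}} = 0.05$, $m = 0.12$, $q = 0$,
$\sigma = 0.10$, and $S(t_0) = \$75$. Moreover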
b) What is the probability that 3 months from now, the stock's price is less than
or equal to its expected price?
Solution. First, we check whether the expected price $77.28 is one of the 101
possible prices 3 months from now. To do this, we must find the number $k$
of price upticks such that
\[ \$77.28 = \$75\, u_{100}^{\,k}\, d_{100}^{\,100-k}. \]
Since $S = \$77.28$, we get $k = 53$, i.e., the expected price $77.28 is the 54th
possible price, counting from bottom to top.
For the 100-step tree, the probability that the price 3 months from now is less
than or equal to its mean is more than 50%. We emphasize that this property is
not an accident for n sufficiently large. It carries over to the continuous-time
limit; see Equation (5.78) on page 248.
Remark 5.4. Note that if $k$ were not an integer, say, $k = 42.6785$, then the
dollar amount $\$75\, u_n^{k}\, d_n^{\,100-k}$ would not be one of the possible price values of
the stock. However, we can still compute the desired probability by omitting
that dollar amount (which has probability zero) and adding the probabilities
up to $k = 42$.
c) What is the probability that 3 months from now, the stock's price is less than
or equal to its current price? What would be the probability if the volatility
of the stock were much higher, say, 40%?
Solution. By (5.41), the number of upticks associated with the price $S = \$75$
is $k = 50$, which means
\[ \$75\, u_{100}^{50}\, d_{100}^{\,100-50} = \$75. \]
The current price is then the 51st possible price, counting from bottom to
top. The probability of the price 3 months out being less than or equal to $75
is actually less than 50%. In fact, by (5.8) the probability is
\[ P\big(S(0.25) \le \$75\big) = \sum_{i=0}^{50} \binom{100}{i} p_{100}^{\,i} (1 - p_{100})^{100-i} = 32\%. \]
Now, if we increase the volatility from 10% to 40%, the CRR tree yields:
In addition, the number of upticks for $S = \$75$ is also $k = 50$. However, the
probability that the stock price 3 months from now is less than or equal to its
current price increases to 52%. In other words, for a sufficiently large volatility,
we see that there is more than a 50% probability that the price 3 months away is less
than or equal to the current price. This property is also not coincidental for $n$
and $\sigma$ sufficiently large. We shall encounter it again in our continuous-time
study of security prices; see Equation (5.79) on page 248.
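The computations in parts b) and c) can be reproduced with a short Python sketch. It uses the inputs quoted in the solution ($n = 100$, a 3-month span, $m = 0.12$, $q = 0$, initial price $75, and volatility 10% or 40%); the function names are hypothetical, and the uptick count is recovered from the price by using $u_n d_n = 1$.

```python
from math import comb, exp, log, sqrt


def crr_tree(sigma, m, q, tau, n):
    """Up factor, down factor, and uptick probability of the CRR tree."""
    h = tau / n
    u = exp(sigma * sqrt(h))
    d = 1.0 / u
    p = (exp((m - q) * h) - d) / (u - d)
    return u, d, p


def prob_at_most(n, k, p):
    """Probability of at most k upticks in n steps, as in (5.8)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upticks_for_price(price, s0, u, n):
    """Solve price = s0 * u**k * d**(n - k) for k when d = 1 / u."""
    return 0.5 * (n + log(price / s0) / log(u))


s0, m, q, tau, n = 75.0, 0.12, 0.0, 0.25, 100

u, d, p = crr_tree(0.10, m, q, tau, n)
expected_price = s0 * exp((m - q) * tau)
k_mean = round(upticks_for_price(expected_price, s0, u, n))
prob_le_mean = prob_at_most(n, k_mean, p)        # part b)
prob_le_current = prob_at_most(n, 50, p)         # part c), volatility 10%

u_hi, d_hi, p_hi = crr_tree(0.40, m, q, tau, n)
prob_le_current_hi = prob_at_most(n, 50, p_hi)   # part c), volatility 40%
```

Running the sketch gives an expected price of about $77.28 with 53 upticks, a probability above one half in part b), and probabilities near 32% and 52% in part c), in line with the values quoted in the solution.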
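We now express the general binomial tree security price formula (5.13) on
page 218, namely,
\[ S(t_n) = S_0 \exp\left(n\, E\left(\ln\frac{S(t_1)}{S(t_0)}\right) + \sqrt{n \operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right)}\; Z_n\right), \]
Because
\[ n\, E\left(\ln\frac{S(t_1)}{S(t_0)}\right) = \mu_n \tau, \qquad n \operatorname{Var}\left(\ln\frac{S(t_1)}{S(t_0)}\right) = \sigma_n^2 \tau, \]
we have
\[ X_{n,j} = \frac{\ln\frac{S(t_j)}{S(t_{j-1})} - \mu_n h_n}{\sigma_n \sqrt{h_n}}. \tag{5.42} \]
Also, it can be shown (Exercise 5.21) using the formulas (5.31)-(5.35) that in a
real-world CRR setting, the standardization $X_{n,j}$ becomes
\[ X_{n,j} = \begin{cases} \dfrac{1 - p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } p_n \\[2ex] \dfrac{-p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } 1 - p_n. \end{cases} \tag{5.43} \]
The security pricing formula now takes the following more compact form for
a real-world CRR tree:
\[ S(t_n) = S_0\, e^{\mu_n \tau + \sigma_n \sqrt{\tau}\, Z_n}. \tag{5.44} \]
The goal is to determine the quantity to which (5.44) converges in the limit
$n \to \infty$.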
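The compact price formula can be checked against the tree node by node. The Python sketch below computes $\mu_n$ and $\sigma_n$ from (5.20) and (5.35), forms $Z_n$ from the two values of the standardized variable in (5.43), and compares the exponential expression with $S_0 u_n^{N} d_n^{\,n-N}$ for every uptick count $N$. The function name and the sample inputs are hypothetical.

```python
from math import exp, isclose, log, sqrt


def price_formula_matches_tree(s0, u, d, p, tau, n):
    """Check S0 * exp(mu_n * tau + sigma_n * sqrt(tau) * Z_n) against the
    tree price S0 * u**N * d**(n - N) for every uptick count N."""
    h = tau / n
    mu_n = (p * log(u) + (1 - p) * log(d)) / h            # from (5.20)
    sigma_n = sqrt(p * (1 - p)) * log(u / d) / sqrt(h)    # from (5.35)
    x_up = (1 - p) / sqrt(p * (1 - p))                    # uptick value in (5.43)
    x_down = -p / sqrt(p * (1 - p))                       # downtick value in (5.43)
    for upticks in range(n + 1):
        z_n = (upticks * x_up + (n - upticks) * x_down) / sqrt(n)
        tree_price = s0 * u**upticks * d ** (n - upticks)
        formula_price = s0 * exp(mu_n * tau + sigma_n * sqrt(tau) * z_n)
        if not isclose(tree_price, formula_price, rel_tol=1e-9):
            return False
    return True


# Hypothetical inputs, for illustration only.
matches = price_formula_matches_tree(75.0, 1.005, 1 / 1.005, 0.53, 0.25, 100)
```

The agreement is an algebraic identity, so it holds for any admissible $u_n$, $d_n$, and $p_n$, not only for the CRR choices.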
The current section explores the CRR tree in the context of a risk-neutral world,
that is, in a world of only risk-neutral investors (see page 137).
In a risk-neutral world, meaning a world of risk-neutral investors (see page 137),
the expected futures price of a security is assumed to be given by its current price
continuously compounded at the risk-free rate $r$ minus any cash dividend yield rate.
For such a world, there is no compensation required for the security's risk
since the rate $r$ compensates only for opportunity cost and inflation. Explicitly,
if a security pays a continuous, proportional cash dividend at constant annual
yield rate $q$, then in a risk-neutral world, the security's expected price at the
future time $t_1$, given the current price $S(t_0) = S_0$, is
\[ E\big(S(t_1)\big) = S_0\, e^{(r - q) h_n}. \tag{5.45} \]
What actually changes in the switch from the real world to a risk-neutral world is
the probability. In particular, the uptick probability will no longer be $p_n$. This is
because, in a risk-neutral world, Equation (5.45) is true. Can we find an uptick
probability $p_n^*$ that makes the risk-neutral condition (5.45) hold? Writing out
(5.45) formally using the unknown quantity $p_n^*$, we obtain
\[ p_n^*\, S_0 u_n + (1 - p_n^*)\, S_0 d_n = S_0\, e^{(r - q) h_n}. \tag{5.47} \]
\[ p_n^* = \frac{e^{(r - q) h_n} - d_n}{u_n - d_n}. \tag{5.48} \]
Strictly speaking, we do not yet know if the quantity $p_n^*$ given by (5.48) is
actually a probability. It will be a probability if we can prove that $p_n^*$ is between
zero and one. As pointed out on page 211, we shall exclude binomial trees with
$p_n^* = 0$ or $p_n^* = 1$. By (5.47), this means that the constraints
6 Recall that it is essentially impossible to determine a reliable value for m in the marketplace.
\[ e^{(r - q) h_n} \ne d_n, \qquad e^{(r - q) h_n} \ne u_n, \tag{5.49} \]
are enforced.
We claim that, for an $n$-period risk-neutral binomial tree, the no-arbitrage condition
implies
\[ 0 < p_n^* < 1. \]
To establish the claim, it suffices to prove that if there is no arbitrage, then an
$n$-period risk-neutral binomial tree satisfies
\[ d_n < e^{(r - q) h_n}, \qquad u_n > e^{(r - q) h_n}. \]
To show these two statements, it suffices to consider the time step $[t_0, t_1]$
since the uptick probability is assumed the same across every time step in a
binomial tree. Assume $d_n > e^{(r - q) h_n}$. At the current time $t_0$, short (borrow) at
the risk-free rate $r$ an amount equal to the cost $S_0 e^{-q h_n}$ of $e^{-q h_n}$ units of the
security. Use these funds to long $e^{-q h_n}$ units of the security. At time $t_1$, the
number of units of the security grows to 1 due to continuous cash dividend
reinvesting to buy more shares. Sell the one unit of the security to receive $S(t_1)$.
The amount owed on the loan at $t_1$ is $S_0 e^{(r - q) h_n}$. The net profit/loss is
\[ S(t_1) - S_0\, e^{(r - q) h_n}. \]
Consequently,
\[ S(t_1) - S_0\, e^{(r - q) h_n} \ge S_0 d_n - S_0\, e^{(r - q) h_n}. \]
If $d_n > e^{(r - q) h_n}$, then
\[ S(t_1) - S_0\, e^{(r - q) h_n} > 0, \]
which is an arbitrage. Hence
\[ d_n \le e^{(r - q) h_n}. \]
However, we exclude the case $d_n = e^{(r - q) h_n}$ since (5.47) shows that it leads to a
binomial tree with $p_n^* = 0$. The argument for $u_n > e^{(r - q) h_n}$ is left as an exercise
(Exercise 5.7).
The no-arbitrage condition (along with the constraints (5.49)) then yields
\[ d_n < e^{(r - q) h_n} < u_n. \]
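A minimal Python sketch of the risk-neutral uptick probability (5.48), together with the no-arbitrage bounds on the up and down factors, is given below. The function name and the sample inputs are hypothetical, and raising an exception when the bounds fail is a design choice of this sketch rather than something prescribed by the text.

```python
from math import exp


def risk_neutral_probability(u, d, r, q, h):
    """Return the risk-neutral uptick probability for one time step of length h.

    The no-arbitrage bounds d < exp((r - q) * h) < u are checked first,
    since they are what guarantee a value strictly between 0 and 1.
    """
    growth = exp((r - q) * h)
    if not d < growth < u:
        raise ValueError("need d < exp((r - q) * h) < u to rule out arbitrage")
    return (growth - d) / (u - d)


# Hypothetical inputs, for illustration only.
p_star = risk_neutral_probability(u=1.1, d=0.9, r=0.05, q=0.0, h=1.0)
one_step_mean = p_star * 1.1 + (1 - p_star) * 0.9   # expected gross return
```

By construction, `one_step_mean` equals `exp((r - q) * h)`, which is the one-step form of the risk-neutral condition.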
Notation. For the risk-neutral CRR tree, we shall designate expectations,
variances, etc. with respect to $p_n^*$ using an asterisk $*$.
Equations (5.46) and (5.48) show that the following quantities govern the
risk-neutral CRR tree for $n$ sufficiently large:
\[ u_n \approx e^{\sigma\sqrt{h_n}}, \qquad d_n \approx e^{-\sigma\sqrt{h_n}}, \qquad p_n^* = \frac{e^{(r - q) h_n} - d_n}{u_n - d_n}. \tag{5.50} \]
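Let us now express $p_n^*$ in a form analogous to the expression for $p_n$ in (5.40).
The quantities $m$, $\mu_n$, and $\sigma_n$ change to the following $m_*$, $\mu_n^*$, and $\sigma_n^*$ in the
risk-neutral setting:
\[ \mu_n^* \to \mu_*, \qquad \sigma_n^* \to \sigma_* \quad \text{as } n \to \infty. \]
Since in a risk-neutral world, the expected futures price comes from
compounding $S_0$ at the rate $r - q$, the instantaneous expected return rate $m_*$ is the
risk-free rate. In fact, analogous to (5.17) we have
\[ S_0\, e^{(m_* - q) h_n} \approx E_*\big(S(t_1)\big) \qquad (n \text{ sufficiently large}), \]
which is called the risk-neutral drift of the security. Similarly, the analogs of
(5.20) and (5.25) are
\[ \mu_n^* h_n = p_n^* \ln u_n + (1 - p_n^*) \ln d_n, \qquad (\sigma_n^*)^2 h_n = p_n^* (1 - p_n^*) \left(\ln\frac{u_n}{d_n}\right)^{2}, \]
which with $u_n = \frac{1}{d_n}$ yield the following for $n$ sufficiently large:
\[ u_n \approx e^{\sigma_*\sqrt{h_n}}, \qquad d_n \approx e^{-\sigma_*\sqrt{h_n}}, \qquad p_n^* \approx \frac{1}{2}\left(1 + \frac{\mu_*}{\sigma_*}\sqrt{h_n}\right). \]
Because the up and down factors are the same for the risk-neutral world and
the real world, we have
\[ e^{\sigma_*\sqrt{h_n}} \approx u_n \approx e^{\sigma\sqrt{h_n}} \qquad (n \text{ sufficiently large}). \]
In other words, the continuous-time volatility is the same in the real world and the
risk-neutral world:
\[ \sigma_* = \sigma. \]
Consequently, the risk-neutral CRR equations become the following for $n$ sufficiently
large:
\[ u_n \approx e^{\sigma\sqrt{h_n}}, \qquad d_n \approx e^{-\sigma\sqrt{h_n}}, \qquad p_n^* \approx \frac{1}{2}\left(1 + \frac{\mu_*}{\sigma}\sqrt{h_n}\right), \tag{5.51} \]
where
\[ \mu_* = r - q - \frac{\sigma^2}{2}. \tag{5.52} \]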
where
\[ \lambda_n = \frac{E\big(R(t_0, t_1)\big) - r\, h_n}{\sqrt{\operatorname{Var}\big(R(t_0, t_1)\big)}}. \tag{5.54} \]
Here $\lambda_n$ is the Sharpe ratio of the security given as the ratio of the spread
between the expected total return rate and the risk-free rate⁷ $r\, h_n$ across the
time step $[t_0, t_1]$, to the security risk across the same time step. The relationship
(5.53) is a discrete-time example of Girsanov theorem (see Neftci [13, Chap. 14]
and references therein).
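We now give a heuristic proof of (5.53). Our approach is to compute the
Sharpe ratio $\lambda_n$ first. By (5.14) on page 219, the expectation in the numerator of
(5.54) is
where
\[ p_n \approx \frac{1}{2}\left(1 + \frac{\mu_{RW}}{\sigma}\sqrt{h_n}\right) \qquad (n \text{ sufficiently large}) \]
with
\[ \mu_{RW} = m - q - \frac{\sigma^2}{2}. \]
Consequently, the Sharpe ratio becomes
\[ \lambda_n \approx \frac{(m - r)\, h_n}{\sqrt{p_n (1 - p_n)}\; 2\sigma\sqrt{h_n}} \qquad (n \text{ sufficiently large}). \]
and
\[ p_n \approx \frac{1}{2}\left(1 + \frac{(m - q)}{\sigma}\sqrt{h_n} - \frac{\sigma}{2}\sqrt{h_n}\right). \]
Hence, for $n$ sufficiently large, it follows
\[ p_n - \lambda_n \sqrt{p_n (1 - p_n)} \approx \frac{1}{2}\left(1 + \frac{(r - q)}{\sigma}\sqrt{h_n} - \frac{\sigma}{2}\sqrt{h_n}\right)
   = \frac{1}{2}\left(1 + \frac{\mu_*}{\sigma}\sqrt{h_n}\right) \approx p_n^*, \]
where
\[ \mu_* = r - q - \frac{\sigma^2}{2}. \]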
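The heuristic argument can be checked numerically: shifting the real-world uptick probability by the Sharpe ratio times $\sqrt{p_n(1 - p_n)}$ should land very close to the risk-neutral uptick probability. The Python sketch below does this for one time step. The function name is hypothetical, and the inputs (in particular the risk-free rate of 5%) are assumptions made only for illustration.

```python
from math import exp, sqrt


def shifted_vs_risk_neutral(sigma, m, r, q, tau, n):
    """Compare the risk-neutral uptick probability with the real-world one
    shifted by the one-step Sharpe ratio times sqrt(p * (1 - p))."""
    h = tau / n
    u = exp(sigma * sqrt(h))
    d = 1.0 / u
    p = (exp((m - q) * h) - d) / (u - d)          # real-world probability (5.38)
    p_star = (exp((r - q) * h) - d) / (u - d)     # risk-neutral probability (5.48)
    # One-step total return: capital gain S(t1)/S(t0) - 1 plus dividend q * h.
    mean_total = p * u + (1 - p) * d - 1.0 + q * h
    std_total = sqrt(p * (1 - p)) * (u - d)
    sharpe = (mean_total - r * h) / std_total     # Sharpe ratio as in (5.54)
    shifted = p - sharpe * sqrt(p * (1 - p))
    return p_star, shifted


# Hypothetical inputs, for illustration only.
p_star, shifted = shifted_vs_risk_neutral(0.10, 0.12, 0.05, 0.0, 0.25, 100)
```

For these inputs the two numbers agree to about six decimal places; the small residual comes from the first-order approximation of the exponential.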
Though the price of a security has the same value in the real world and the
risk-neutral world, its mathematical expression can also be given in terms of
quantities in a risk-neutral CRR tree. In parallel to the discussion for the
security price formula of a real-world CRR tree (see page 229), we obtain
\[ S(t_n) \approx S_0\, e^{\mu_n^* \tau + \sigma_n^* \sqrt{\tau}\, Z_n^*} \qquad (n \text{ sufficiently large}), \tag{5.55} \]
with
\[ X_{n,j}^* = \begin{cases} \dfrac{1 - p_n^*}{\sqrt{p_n^* (1 - p_n^*)}} & \text{with probability } p_n^* \\[2ex] \dfrac{-p_n^*}{\sqrt{p_n^* (1 - p_n^*)}} & \text{with probability } 1 - p_n^*. \end{cases} \]
\[ Z_n^* \approx Z_n + \frac{(\mu_n - \mu_n^*)\sqrt{\tau}}{\sigma_n} \qquad (n \text{ sufficiently large}). \tag{5.56} \]
We shall also show this transformation explicitly in the continuous-time limit
(see (5.70) on page 243).
5.3 Continuous-Time Limit of the CRR Pricing Formula 237
The goal is to determine an explicit expression for the random price of a
security in the continuous-time limit $n \to \infty$ of the discrete CRR pricing
formulas for the real-world and risk-neutral world CRR trees. We shall carry out the
analysis in detail for the real-world CRR tree. The risk-neutral case is the same,
except for a minor relabeling of the notation.
For a real-world CRR tree, Equation (5.44) on page 230 expresses the price of a
security as follows:
\[ S(t_n) = S_0\, e^{\mu_n \tau + \sigma_n \sqrt{\tau}\, Z_n}. \]
Here
\[ Z_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} X_{n,j}, \]
with $X_{n,j}$ the standardization of the log return $\ln\frac{S(t_j)}{S(t_{j-1})}$ (see (5.42)):
\[ X_{n,j} = \frac{\ln\frac{S(t_j)}{S(t_{j-1})} - \mu_n h_n}{\sigma_n \sqrt{h_n}}, \]
where $j = 1, \dots, n$. Note that
\[ \operatorname{Var}(Z_n) = \frac{1}{n} \sum_{j=1}^{n} E\big(X_{n,j}^2\big) = 1. \tag{5.57} \]
In fact, each $Z_n$ is the standardization of $\sum_{j=1}^{n} X_{n,j}$. In terms of the uptick
probability $p_n$, we have explicitly (see (5.43))
\[ X_{n,j} = \begin{cases} \dfrac{1 - p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } p_n \\[2ex] \dfrac{-p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } 1 - p_n. \end{cases} \tag{5.58} \]
Table 5.2 lists some of the random variables $X_{n,j}$ and $Z_n$. Note that the row
sequences of $X_{n,j}$'s form a triangular-array pattern. For each $n \ge 1$, the random
variables $X_{n,1}, X_{n,2}, \dots, X_{n,n}$ in the $n$th row are independent (since the time-step log
returns are independent) and identically distributed, i.e., the probability measures
$P_{n,1}, P_{n,2}, \dots, P_{n,n}$ of $X_{n,1}, X_{n,2}, \dots, X_{n,n}$, respectively, are the same:
Table 5.2 A triangular array of the random variables $X_{n,j}$, where $n = 1, 2, 3, \dots$ and $j = 1, \dots, n$. Each
row sequence $X_{n,1}, X_{n,2}, \dots, X_{n,n}$, where $n = 1, 2, \dots$, consists of i.i.d. standardized random variables,
whereas the sequence $Z_1, Z_2, \dots$ is not i.i.d.

  $n$    Sequence                                          $Z_n$
  1      $X_{1,1}$                                         $X_{1,1}$
  2      $X_{2,1}, X_{2,2}$                                $\frac{1}{\sqrt{2}}(X_{2,1} + X_{2,2})$
  3      $X_{3,1}, X_{3,2}, X_{3,3}$                       $\frac{1}{\sqrt{3}}(X_{3,1} + X_{3,2} + X_{3,3})$
  ...    ...                                               ...
  $n$    $X_{n,1}, X_{n,2}, X_{n,3}, \dots, X_{n,n}$       $\frac{1}{\sqrt{n}}(X_{n,1} + X_{n,2} + X_{n,3} + \cdots + X_{n,n})$
  ...    ...                                               ...
\[ P_{n,j}\left(X_{n,j} = \frac{1 - p_n}{\sqrt{p_n (1 - p_n)}}\right) = p_n, \qquad P_{n,j}\left(X_{n,j} = \frac{-p_n}{\sqrt{p_n (1 - p_n)}}\right) = 1 - p_n \]
for $j = 1, \dots, n$. It may also be tempting to apply the classical Central Limit
Theorem (CLT)⁸ to conclude that $Z_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} X_{n,j}$ converges in distribution to a
standard normal random variable as $n \to \infty$. However, this is not allowed because
$X_{1,1}, X_{2,1}, X_{2,2}, X_{3,1}, X_{3,2}, X_{3,3}, X_{4,1}, \dots$ are not identically distributed. In fact,
if $n \ne n'$, then $X_{n,j}$ and $X_{n',j}$ have different probability distributions since
$p_n \ne p_{n'}$.
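To determine the convergence of $Z_n$ as $n \to \infty$, we employ a generalization
of the classical CLT due to Lindeberg. Consider a triangular array of random
variables, namely, the following collection of rows of random variables:
\[ \begin{array}{l}
\mathcal{X}_{1,1} \\
\mathcal{X}_{2,1},\ \mathcal{X}_{2,2} \\
\mathcal{X}_{3,1},\ \mathcal{X}_{3,2},\ \mathcal{X}_{3,3} \\
\quad\vdots \\
\mathcal{X}_{n,1},\ \mathcal{X}_{n,2},\ \mathcal{X}_{n,3},\ \dots,\ \mathcal{X}_{n,n} \\
\quad\vdots
\end{array} \tag{5.59} \]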
⁸ Theorem. (Classical CLT) Assume that $X_1, X_2, \dots$ are i.i.d. random variables with each having a
finite mean $E(X_i) = \mu_0$ and finite variance $\operatorname{Var}(X_i) = \sigma_0^2 > 0$. Let $Z_n$ be the standardization of the
sample mean $\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$, namely,
\[ Z_n = \frac{\bar{X}_n - \mu_0}{\sigma_0/\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \frac{X_j - \mu_0}{\sigma_0}. \]
Assume that the random variables $\mathcal{X}_{n,1}, \mathcal{X}_{n,2}, \dots, \mathcal{X}_{n,n}$ in the $n$th row are independent
for every $n$ (i.e., every row) and satisfy
\[ E\big(\mathcal{X}_{n,j}\big) = 0 \ \text{ for } j = 1, \dots, n, \quad \text{and} \quad \sum_{j=1}^{n} E\big(\mathcal{X}_{n,j}^2\big) = 1. \tag{5.60} \]
For the array (5.59), we do not require that the random variables in any row be
identically distributed, nor do we require that those in different rows be independent.
Before we can get the desired generalization of the classical CLT, we need a
constraint, called the Lindeberg condition, on the random variables in the rows of
(5.59). Our discussion will be in parallel with the treatment in Pitman's lecture
[14]; see Section 27 and Theorem 27.2 of Billingsley [1] for more.
Our assumptions about the array in Equation (5.59) yield that the
summation in (5.60) is actually the total variance of the $n$th row:
\[ \sum_{j=1}^{n} \operatorname{Var}\big(\mathcal{X}_{n,j}\big) = \sum_{j=1}^{n} E\big(\mathcal{X}_{n,j}^2\big) = 1, \tag{5.61} \]
where $\operatorname{Var}\big(\mathcal{X}_{n,j}\big) = E\big(\mathcal{X}_{n,j}^2\big)$. Note that the largest value of the variance terms
in the summation (5.61) is less than or equal to 1, and, if the largest value
equals 1, then it is attained by only one variance term and the other terms
must vanish. Intuitively speaking, the Lindeberg condition is the requirement
that as one moves further and further down the array in (5.59) (i.e., as $n \to \infty$),
the variances of the random variables in each row become smaller and smaller.
To state the Lindeberg condition precisely, recall the definition of the
indicator function on $\{X > c\}$, where $X$ is a random variable and $c$ a real number:
\[ \mathbf{1}_{\{X > c\}} = \begin{cases} 1 & \text{if } X > c \\ 0 & \text{if } X \le c. \end{cases} \]
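Adding to the upper bound in Equation (5.63) the contributions (each of which
is also nonnegative) from the remaining terms in the $n$th row, namely, the sum
$\sum_{1 \le j \ne i \le n} E\big(\mathcal{X}_{n,j}^2\, \mathbf{1}_{\{|\mathcal{X}_{n,j}| > \epsilon\}}\big)$, we find
\[ \operatorname{Var}\big(\mathcal{X}_{n,i}\big) \le \epsilon^2 + \sum_{j=1}^{n} E\big(\mathcal{X}_{n,j}^2\, \mathbf{1}_{\{|\mathcal{X}_{n,j}| > \epsilon\}}\big) \qquad (i = 1, \dots, n). \]
Because this upper bound holds for the variance of each random variable in
the $n$th row, it holds in particular for the largest value of those variances:
\[ \max_{1 \le i \le n} \operatorname{Var}\big(\mathcal{X}_{n,i}\big) \le \epsilon^2 + \sum_{j=1}^{n} E\big(\mathcal{X}_{n,j}^2\, \mathbf{1}_{\{|\mathcal{X}_{n,j}| > \epsilon\}}\big) \qquad (i = 1, \dots, n). \tag{5.64} \]
By Equation (5.64), if the array (5.59) satisfies the Lindeberg condition, then
as one moves down the array, the maximum of the variances of the random
variables in each row starts to approach zero:
\[ \lim_{n \to \infty}\, \max_{1 \le i \le n} \operatorname{Var}\big(\mathcal{X}_{n,i}\big) \le \epsilon^2 \quad \text{for all } \epsilon > 0 \text{ (no matter how small)}, \tag{5.65} \]
i.e.,
\[ \lim_{n \to \infty}\, \max_{1 \le i \le n} \operatorname{Var}\big(\mathcal{X}_{n,i}\big) = 0. \]
Note: since we always have $\operatorname{Var}\big(\mathcal{X}_{n,j}\big) \le 1$, Equation (5.65) gives no new
constraint on the maximum variance for $\epsilon \ge 1$, but it does for $0 < \epsilon < 1$.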
With the Lindeberg condition available, we can now state the following gener-
alization of the classical CLT:
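Theorem 5.1. (The Lindeberg Central Limit Theorem) Assume that the
triangular array of random variables,
\[ \begin{array}{l}
\mathcal{X}_{1,1} \\
\mathcal{X}_{2,1},\ \mathcal{X}_{2,2} \\
\mathcal{X}_{3,1},\ \mathcal{X}_{3,2},\ \mathcal{X}_{3,3} \\
\quad\vdots \\
\mathcal{X}_{n,1},\ \mathcal{X}_{n,2},\ \mathcal{X}_{n,3},\ \dots,\ \mathcal{X}_{n,n} \\
\quad\vdots
\end{array} \]
is such that the random variables $\mathcal{X}_{n,1}, \mathcal{X}_{n,2}, \dots, \mathcal{X}_{n,n}$ in the $n$th row are independent
for every $n \ge 1$ (i.e., every row) and satisfy
\[ E\big(\mathcal{X}_{n,j}\big) = 0 \ \text{ for } j = 1, \dots, n, \quad \text{and} \quad \sum_{j=1}^{n} E\big(\mathcal{X}_{n,j}^2\big) = 1. \]
If the triangular array obeys the Lindeberg condition, then
\[ Z_n = \sum_{j=1}^{n} \mathcal{X}_{n,j} \xrightarrow{\ d\ } Z \quad \text{as } n \to \infty, \]
where $Z$ is a standard normal random variable.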
Readers are referred to Billingsley [1, p. 360], Durrett [6, p. 129], and Pitman
[14] for proofs of Theorem 5.1.9
The Lindeberg CLT tells us that, for n sufficiently large, all probabilities
about Z n can be approximated by those of a standard normal random vari-
able Z. Explicitly, for n sufficiently large and for every real number x, we have
We now show that the Lindeberg CLT applies to both the real-world and risk-
neutral CRR security price formulas.
Starting with the real-world CRR tree, consider the triangular array of random
variables Xn,j in Table 5.2 associated with the security price formula (5.44) on
page 230. Let
9 The classical CLT follows from the Lindeberg CLT through the Dominated Convergence Theorem
(expectation form) from measure theory; see Durrett [6, pp. 29,129].
242 5 Binomial Trees and Security Pricing Modeling
Xn,j
Xn,j = .
n
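To check that these random variables obey the hypotheses of the Lindeberg CLT, recall that for each $n \ge 1$, the random variables $X_{n,1}, X_{n,2}, \dots, X_{n,n}$ are independent and standardized, and satisfy (5.57) on page 237. Then for every $n \ge 1$, the random variables $\mathcal{X}_{n,1}, \mathcal{X}_{n,2}, \dots, \mathcal{X}_{n,n}$ are also independent and satisfy (5.60). Next, to see that the Lindeberg condition holds, Equation (5.58) yields
$$\mathcal{X}_{n,j} = \frac{1}{\sqrt{n}} \begin{cases} \dfrac{1 - p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } p_n \\[2ex] \dfrac{-\,p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } 1 - p_n. \end{cases}$$
Since $p_n \to \tfrac{1}{2}$ as $n \to \infty$, it follows that
$$|\mathcal{X}_{n,j}| \to 0 \quad \text{as } n \to \infty.$$
This implies that, for any fixed $\epsilon > 0$, the following condition does not hold for $n$ sufficiently large:
$$|\mathcal{X}_{n,j}| > \epsilon > 0.$$
Hence
$$E\big(\mathcal{X}_{n,j}^2\, 1_{\{|\mathcal{X}_{n,j}| > \epsilon\}}\big) = 0 \quad (n \text{ sufficiently large}).$$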
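The convergence asserted by the Lindeberg CLT can be checked numerically. The sketch below is an illustration only, not the book's method: it assumes a fixed uptick probability $p$ (in a CRR tree $p_n$ varies with $n$), uses the fact that the sum of the scaled standardized variables equals $(K - np)/\sqrt{np(1-p)}$ with $K$ binomial, and compares the exact c.d.f. of $Z_n$ with the standard normal c.d.f. on a grid. The function names and the value $p = 0.52$ are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal c.d.f. N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def zn_cdf(n, p, x):
    """Exact P(Z_n <= x) for Z_n = (K - n*p) / sqrt(n*p*(1-p)), K ~ Binomial(n, p)."""
    scale = math.sqrt(n * p * (1.0 - p))
    total = 0.0
    for k in range(n + 1):
        if (k - n * p) / scale <= x:
            # Binomial probability computed through logarithms to avoid overflow.
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1.0 - p))
            total += math.exp(log_pmf)
    return total

def max_cdf_gap(n, p):
    """Largest |P(Z_n <= x) - N(x)| over the grid x = -3.0, -2.9, ..., 3.0."""
    grid = [-3.0 + 0.1 * i for i in range(61)]
    return max(abs(zn_cdf(n, p, x) - normal_cdf(x)) for x in grid)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(max_cdf_gap(n, 0.52), 4))
```

The printed gap shrinks as $n$ grows, which is the behavior the theorem predicts.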
Thus, the discrete-time real-world CRR security price at the future time $t_f = t_0 + \tau$, which is labeled by $t_n$ in an $n$-period real-world CRR tree, converges in distribution to the following continuous-time security price formula at $t_f$:
$$S_0\, e^{\mu_n \tau + \sigma_n \sqrt{\tau}\, Z_n} \ \xrightarrow{d}\ S_0\, e^{\mu_{RW}\, \tau + \sigma \sqrt{\tau}\, Z} \quad \text{as } n \to \infty. \tag{5.66}$$
Accordingly, the continuous-time real-world security price at time $t$ is
$$S(t) \overset{d}{=} S_0\, e^{\mu_{RW}\, t + \sigma \sqrt{t}\, Z_t}, \tag{5.67}$$
where $S_0 = S(0)$ and $Z_t \sim N(0, 1)$. Interpret the standard normal $Z_t$ to mean that its realizations are randomly drawn at time $t$. We insert the subscript $t$ rather than use the notation $Z$ to avoid some potential notational and conceptual confusion later; see page 246. Notice the dependency of $S(t)$ on the length $t$ of the interval $[0, t]$. We shall also switch freely between $S(t)$ and $S_t$ as notation for the security's price at time $t$ and, more generally, between $X(t)$ and $X_t$ for a random variable dependent on $t$:
$$S(t) = S_t, \qquad X(t) = X_t.$$
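The security price is the same in the real world and the risk-neutral world, but has different expressions and probabilities. Explicitly, we have
$$\mathcal{X}^*_{n,j} = \frac{X^*_{n,j}}{\sqrt{n}}, \qquad Z^*_n = \sum_{j=1}^{n} \mathcal{X}^*_{n,j},$$
where
$$X^*_{n,j} = \begin{cases} \dfrac{1 - p^*_n}{\sqrt{p^*_n (1 - p^*_n)}} & \text{with probability } p^*_n \\[2ex] \dfrac{-\,p^*_n}{\sqrt{p^*_n (1 - p^*_n)}} & \text{with probability } 1 - p^*_n. \end{cases}$$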
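Essentially the same arguments as in the real-world case carry over to the risk-neutral setting to allow application of the Lindeberg CLT. For example, Equation (5.51) on page 234 shows that $p^*_n \to 1/2$ as $n \to \infty$, which in turn yields the following condition used to verify the Lindeberg condition: $|\mathcal{X}^*_{n,j}| \to 0$ as $n \to \infty$. Since $\sigma^*_n \to \sigma$ as $n \to \infty$, the Lindeberg CLT yields
$$S_0\, e^{\mu^*_n \tau + \sigma^*_n \sqrt{\tau}\, Z^*_n} \ \xrightarrow{d}\ S_0\, e^{(r - q - \frac{1}{2}\sigma^2)\, \tau + \sigma \sqrt{\tau}\, Z^*} \quad \text{as } n \to \infty, \tag{5.68}$$
which is the continuous-time analog of Equation (5.56) on page 236. Note that
$$\frac{m - r}{\sigma}$$
is the security's Sharpe ratio.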
As mentioned in Remark 5.1 on page 210, to avoid notation that is too cumbersome, we do not introduce any new notation to distinguish the security price formula in discrete time versus continuous time. For example, given a partition, $0 = t_0 < t_1 < \cdots < t_n = t$, of the interval $[0, t]$, it will be made clear from the context whether the real-world security price $S(t_j)$ at time $t_j$ is given by
$$S_0\, e^{\mu_n t_j + \sigma_n \sqrt{t_j}\, Z_n} \ \text{(discrete time)} \quad \text{or} \quad S_0\, e^{\mu_{RW}\, t_j + \sigma \sqrt{t_j}\, Z_{t_j}} \ \text{(continuous time)}.$$
In fact, for most of our study going forward, we shall use the continuous-time formula and abide by the following:
We now motivate and present several properties we shall assume about secu-
rity prices for continuous time.
Continuous sample paths. For an $n$-period CRR tree, a possible outcome $\omega$ is a sequence of $n$ up and down movements in a security's price. There are $2^n$ such possible outcomes giving rise to $2^n$ possible security price paths. In the continuous-time limit $n \to \infty$, we have assumed that the resulting uncountably many sample paths of the security, whether in a real world or risk-neutral world, are continuous functions of time $t$ with probability 1. Later, we shall explore the discontinuous case, where the price paths can have jumps; see the Merton jump-diffusion model in Section 8.9 (page 448).
Markov property and the Weak Efficient Market Hypothesis. The future price of a security depends on the current price $S(0)$ and does not depend explicitly on the security's past prices; see Equations (5.67) and (5.69). That is, probabilities about future prices of a security are independent of the security's past price path (Markov property). Nonetheless, information about the past is not totally excluded in the sense that such information is included in the current price. Indeed, one of our key assumptions is the Weak Efficient Market Hypothesis, which states that the current price of a security reflects all market information concerning the security. In particular, the security's past price path and present market expectations about the security's future behavior are already taken into account in the current price of the security.
Stationarity of log returns. By Equations (5.67) and (5.69), the security's continuous-time log return from time 0 to $t$ does not depend on the price of the security at the start of the interval $[0, t]$. It depends on the length of the interval:
$$\ln \frac{S(t)}{S(0)} \overset{d}{=} \mu_{RW}\, t + \sigma \sqrt{t}\, Z_t,$$
where $t \ge 0$. Similarly, in a risk-neutral world $\ln \frac{S(t)}{S(0)} \overset{d}{=} (r - q - \frac{1}{2}\sigma^2)\, t + \sigma \sqrt{t}\, Z_t$. Motivated by this, assume that in the continuous-time setting, the log return of a security is stationary, which means
$$\ln \frac{S(t)}{S(x)} \overset{d}{=} \ln \frac{S(t + u)}{S(x + u)}$$
for all $0 \le x < t$ and every $u$ such that $x + u \ge 0$ and $t + u \ge 0$. In other words, rigidly shifting the interval does not change the log return probabilities.
Independence of log returns over nonoverlapping (see footnote 10) intervals. For the $n$-period CRR tree, the discrete-time log returns over different time steps are independent. Motivated by the latter, we assume in the continuous-time setting that for any finite sequence of times, $0 \le t_1 < t_2 < \cdots < t_{k-1} < t_k$, which is not required to coincide with the time steps of the CRR tree we studied earlier, the log returns $\ln \frac{S(t_j)}{S(t_{j-1})}$ are independent for $j = 2, \dots, k$.
10 Recall that two intervals are nonoverlapping if their interiors are disjoint.
$$B(t) - B(\tau) \overset{d}{=} B(t - \tau) \sim N(0,\, t - \tau) \qquad (0 \le \tau \le t).$$
Note that if we used the notation $B(t) \overset{d}{=} \sqrt{t}\, Z$, then it is tempting to write
$$B(t) - B(\tau) \overset{d}{=} \sqrt{t}\, Z - \sqrt{\tau}\, Z = (\sqrt{t} - \sqrt{\tau})\, Z,$$
which is incorrect since it states $B(t) - B(\tau) \sim N\big(0,\, t + \tau - 2\sqrt{t \tau}\big)$.
For each future moment of time $t > 0$, the continuous-time security price,
$$S(t) \overset{d}{=} S(0)\, e^{\mu_{RW}\, t + \sigma \sqrt{t}\, Z_t},$$
The statistics of a lognormal random variable are well known and readily yield formulas for the mean, median (see footnote 11), variance, and covariance of continuous-time security prices:
11 The median of an absolutely continuous random variable $X$ is a number, denoted $\operatorname{Med}(X)$, such that $P(X > \operatorname{Med}(X)) = \frac{1}{2} = P(X < \operatorname{Med}(X))$. Additionally, if $X \sim N(a, b^2)$, where $a$ and $b > 0$ are constants, then the mean and median of $X$ satisfy $E(e^{X}) = e^{a + b^2/2}$ and $\operatorname{Med}(A e^{aX}) = A e^{a \operatorname{Med}(X)}$.
5.4 Basic Properties of Continuous-Time Security Prices 247
Equation (5.71) shows that the expected price of the continuous-time security at a future time $t$ is obtained by continuously compounding the current price $S(0)$ at the instantaneous capital-gain rate $m - q$ over the time span $t$. Also, since $\sigma > 0$, Equation (5.72) readily shows that the median of a continuous-time security price at $t > 0$ is always below its mean:
$$\operatorname{Med}(S(t)) = S(0)\, e^{(m - q - \frac{1}{2}\sigma^2)\, t} < S(0)\, e^{(m - q)\, t} = E(S(t)). \tag{5.75}$$
This is due to the skewness of the lognormal p.d.f. Equations (5.73) and (5.74) yield, since $\sigma$ and the mean price $E(S(t))$ are positive, that the variance and covariance of the security's prices are positive for all future times $t > u > 0$.
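A small numerical sketch can make the mean and median comparison concrete. It assumes the lognormal model $S(t) = S(0)\, e^{\mu_{RW} t + \sigma \sqrt{t} Z_t}$ with $\mu_{RW} = m - q - \sigma^2/2$, and it uses the standard lognormal variance identity $\operatorname{Var}(S(t)) = E(S(t))^2 (e^{\sigma^2 t} - 1)$, which is stated here as an assumption because the corresponding numbered formula is not reproduced in the text. The function name and the input values are hypothetical.

```python
import math

def lognormal_price_stats(s0, m, q, sigma, t):
    """Mean, median, and variance of S(t) = s0 * exp(mu_rw*t + sigma*sqrt(t)*Z)."""
    mu_rw = m - q - 0.5 * sigma ** 2          # real-world drift of the log return
    mean = s0 * math.exp((m - q) * t)         # compounding at the capital-gain rate m - q
    median = s0 * math.exp(mu_rw * t)         # median of a lognormal: exp of the normal's mean
    variance = mean ** 2 * (math.exp(sigma ** 2 * t) - 1.0)  # standard lognormal identity
    return {"mean": mean, "median": median, "variance": variance}

if __name__ == "__main__":
    # Hypothetical inputs: S(0) = 100, m = 10%, q = 2%, sigma = 30%, t = 1 year.
    stats = lognormal_price_stats(100.0, 0.10, 0.02, 0.30, 1.0)
    for name, value in stats.items():
        print(name, round(value, 4))
```

For any $\sigma > 0$ and $t > 0$ the median comes out below the mean, and the gap widens as $\sigma$ increases.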
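We can also obtain formulas for the conditional expectations of a continuous-time security price (Exercise 5.23; see footnote 12):
$$E\big(S(t) \mid S(t) > K\big) = E(S(t))\, \frac{N(d_{+,RW}(t))}{N(d_{-,RW}(t))} \tag{5.76}$$
$$E\big(S(t) \mid S(t) < K\big) = E(S(t))\, \frac{N(-d_{+,RW}(t))}{N(-d_{-,RW}(t))}, \tag{5.77}$$
where $t > 0$, $K > 0$, and
$$d_{-,RW}(t) = \frac{\ln\!\big(\frac{S(0)}{K}\big) + \mu_{RW}\, t}{\sigma \sqrt{t}} \quad \text{with} \quad d_{+,RW}(t) = d_{-,RW}(t) + \sigma \sqrt{t}.$$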
Since the median of a security's price is below the mean, there is more than a 50% probability that the future price of a security will be below its mean price. To see this, observe that
$$P\big(S(t) < E(S(t))\big) = P\big(S(t) < \operatorname{Med}(S(t))\big) + P\big(\operatorname{Med}(S(t)) \le S(t) \le E(S(t))\big).$$
By definition of the median, we have $P\big(S(t) < \operatorname{Med}(S(t))\big) = \frac{1}{2}$ and by (5.75) we see that $P\big(\operatorname{Med}(S(t)) \le S(t) \le E(S(t))\big) > 0$. Hence
$$P\big(S(t) < E(S(t))\big) > \frac{1}{2}. \tag{5.78}$$
12 Recall that the conditional expectation of an absolutely continuous random variable $V$ given an event $A$ with probability $P(A) > 0$ is defined by $E(V \mid A) = \frac{E(V 1_A)}{P(A)} = \frac{1}{P(A)} \int_A v\, f_V(v)\, dv$, where $f_V$ is the p.d.f. of $V$.
We made a similar observation in Example 5.6 (page 227) using a 100-period CRR tree.
If the security's volatility is sufficiently large, then there is also more than a 50% probability that the future price of the security will be below its current price. In fact, note that (5.72) implies
$$\operatorname{Med}(S(t)) = S(0)\, e^{\left(-\frac{\sigma^2}{2} + m - q\right) t} \approx S(0)\, e^{-\frac{\sigma^2}{2} t} < S(0)$$
for $\sigma$ sufficiently large. Then an argument similar to the one above yields
$$P\big(S(t) < S(0)\big) > \frac{1}{2} \qquad (\sigma \text{ sufficiently large}). \tag{5.79}$$
Example 5.6 (page 227) also pointed out this property using a CRR tree.
For any $K > 0$, the probability that $S(t)$ is less than $K$ is (Exercise 5.24)
$$P\big(S(t) < K\big) = N(-d_{-,RW}(t)) \qquad (t > 0). \tag{5.80}$$
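The probability and conditional-expectation formulas above lend themselves to a quick consistency check. The sketch below assumes the reading $d_{-,RW}(t) = \big(\ln(S(0)/K) + \mu_{RW} t\big)/(\sigma\sqrt{t})$ and $d_{+,RW}(t) = d_{-,RW}(t) + \sigma\sqrt{t}$; all function names and numerical inputs are hypothetical. It is not a solution to Exercises 5.23 or 5.24, only a numerical illustration.

```python
import math

def normal_cdf(x):
    """Standard normal c.d.f. N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_minus(s0, k, mu_rw, sigma, t):
    return (math.log(s0 / k) + mu_rw * t) / (sigma * math.sqrt(t))

def mean_price(s0, mu_rw, sigma, t):
    """E(S(t)) = S(0) exp((mu_rw + sigma^2 / 2) t)."""
    return s0 * math.exp((mu_rw + 0.5 * sigma ** 2) * t)

def prob_below(s0, k, mu_rw, sigma, t):
    """P(S(t) < K) = N(-d_minus)."""
    return normal_cdf(-d_minus(s0, k, mu_rw, sigma, t))

def cond_mean_above(s0, k, mu_rw, sigma, t):
    """E(S(t) | S(t) > K) = E(S(t)) N(d_plus) / N(d_minus)."""
    dm = d_minus(s0, k, mu_rw, sigma, t)
    dp = dm + sigma * math.sqrt(t)
    return mean_price(s0, mu_rw, sigma, t) * normal_cdf(dp) / normal_cdf(dm)

def cond_mean_below(s0, k, mu_rw, sigma, t):
    """E(S(t) | S(t) < K) = E(S(t)) N(-d_plus) / N(-d_minus)."""
    dm = d_minus(s0, k, mu_rw, sigma, t)
    dp = dm + sigma * math.sqrt(t)
    return mean_price(s0, mu_rw, sigma, t) * normal_cdf(-dp) / normal_cdf(-dm)

if __name__ == "__main__":
    s0, sigma, t = 100.0, 0.30, 1.0
    mu_rw = 0.08 - 0.5 * sigma ** 2            # hypothetical m - q = 8%
    mean = mean_price(s0, mu_rw, sigma, t)
    print("P(S(t) < E(S(t))) =", round(prob_below(s0, mean, mu_rw, sigma, t), 4))
    print("P(S(t) < S(0))    =", round(prob_below(s0, s0, mu_rw, sigma, t), 4))
```

With $K = E(S(t))$ the probability reduces to $N(\sigma\sqrt{t}/2)$, which exceeds one half for every $\sigma > 0$. The two conditional means, weighted by the probabilities of their events, recombine to $E(S(t))$.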
5.5 Exercises
5.1. Suppose that a binomial tree has an initial price of $80. If the tree has 21
periods, then is $80 one of the possible prices at time t21 ? If the tree has 300
periods, then is $80 one of the possible prices at time t300 ? Justify your answers.
5.2. For an n-period binomial tree, give a financial interpretation of each of the
following: un dn = 1, un 1, and dn 1.
5.3. Explain why the condition, $d_n < e^{(m - q) h_n} < u_n$, holds for $n$-period CRR trees with $n$ sufficiently large.
5.4. How many 1-period subtrees are in an n-period binomial tree?
5.5. For an n-period binomial tree, let NU be the number of security price
upticks from time t0 to tn . Explain why NU is a binomial random variable.
What are its expected value and variance if the tree has 40 steps and the uptick
probability is 60%?
5.6. An n-period CRR tree has reflection symmetry about the horizontal line
S(t) = S0 since the tree recombines. Agree or disagree? Justify your answer.
5.7. For an $n$-period risk-neutral binomial tree, show that if $u_n < e^{(r - q) h_n}$, then there is an arbitrage.
5.8. The risk-neutral uptick probability $p^*_n$ is related to the real-world probability $p_n$ by $p^*_n = p_n - \lambda_n \sqrt{p_n (1 - p_n)}$, where $\lambda_n = \dfrac{E\big(R(t_0, t_1)\big) - r h_n}{\sqrt{\operatorname{Var}\big(R(t_0, t_1)\big)}}$. Interpret $\lambda_n$.
5.11. A trader believes that a certain stock currently at $51.25 per share has by the end of the trading day a 70% chance of increasing by 50¢ and a 30% chance of decreasing by 25¢. Using a 1-step binomial tree with this information, what is the expected price of the stock at the end of the day?
5.12. Assume that the current share price of a stock is $100 with a volatility of 10%. Using a CRR tree model over a year with each period being one trading day, predict the maximum spread in the stock's possible prices a trading day from now. Is your prediction impacted if you employ a Taylor approximation to $e^{\sigma \sqrt{h_{252}}}$ using the CRR assumptions?
5.13. Suppose that a nondividend-paying stock with current price of $45 has
an instantaneous annual expected return of 8% and annual volatility of 15%.
a) Using a 100-period CRR tree, forecast the price of the stock 3 months from
now, i.e., find the expected price of the stock 3 months from now.
b) What is your forecast if you use an 80-period CRR tree?
c) Using an 80-period CRR tree, determine the probability of your forecasted
price occurring.
5.15. Suppose that the current date is January 21, 2016. Estimate the volatility $\sigma$ and instantaneous drift $\mu_{RW}$ for Google (ticker symbol GOOGL) using its adjusted closing prices from Yahoo! Finance for the period from January 20, 2016 to September 15, 2015. The data will consist of 90 daily log returns over the past 91 trading days. Carry out similar estimates with the past 60 daily log returns and then the past 30 daily log returns. Annualize your results using 252 trading days in a year. Discuss your findings.
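5.20. Show that for a CRR tree, the uptick probability $p_n = \dfrac{e^{(m - q) h_n} - d_n}{u_n - d_n}$ satisfies
$$p_n \approx \frac{1}{2} \left( 1 + \frac{\mu_{RW}}{\sigma} \sqrt{h_n} \right) \quad \text{for } n \text{ sufficiently large.}$$
5.21. For a CRR tree, show
$$X_{n,j} = \begin{cases} \dfrac{1 - p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } p_n \\[2ex] \dfrac{-\,p_n}{\sqrt{p_n (1 - p_n)}} & \text{with probability } 1 - p_n. \end{cases}$$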
5.22. A Jarrow-Rudd (JR) tree for the price of a security paying a continuous dividend yield rate $q$ is a binomial tree where $p_n = \frac{1}{2}$ and the per-period expectation $\mu_n$ and variance $\sigma_n^2$ defined by
$$\mu_n = \frac{1}{h_n}\, E\!\left( \ln \frac{S(t_j)}{S(t_{j-1})} \right), \qquad (\sigma_n)^2 = \frac{1}{h_n}\, \operatorname{Var}\!\left( \ln \frac{S(t_j)}{S(t_{j-1})} \right),$$
e) (Risk-Neutral JR Tree) Is the JR tree risk neutral? If not, then how would you make it risk neutral? For a sufficiently large $n$, what would be the approximate governing equations of a risk-neutral JR tree, i.e., the equations expressing $u_n$, $d_n$, $p_n$ in terms of the inputs $r - q$, $\sigma$, $h_n$?
5.23. For $t > 0$ and $K > 0$, show that $E\big(S(t) \mid S(t) > K\big) = E(S(t))\, \dfrac{N(d_{+,RW}(t))}{N(d_{-,RW}(t))}$ and $E\big(S(t) \mid S(t) < K\big) = E(S(t))\, \dfrac{N(-d_{+,RW}(t))}{N(-d_{-,RW}(t))}$.
5.24. For $t > 0$, $K > 0$, and $K_2 > K_1 > 0$, show that $P\big(S(t) < K\big) = N(-d_{-,RW}(t))$ and $P\big(K_1 < S(t) < K_2\big) = N\big(d^{K_1}_{-,RW}(t)\big) - N\big(d^{K_2}_{-,RW}(t)\big)$.
References
[1] Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (1995)
[2] Chance, D.: Proofs and derivations of binomial models. Technical Finance
Notes, Louisiana State University (2007)
[3] Chance, D.: Risk neutral pricing in discrete time. Teaching Note 96-02,
Louisiana State University (2008)
[4] Cox, J., Ross, S., Rubinstein, M.: Option pricing: a simplified approach. J.
Financ. Econ. 7, 229 (1979)
[5] Cox, J., Rubinstein, M.: Options Markets. Prentice Hall, Upper Saddle
River (1985)
[6] Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge University Press, Cambridge (2010)
[7] Epps, T.: Pricing Derivative Securities. World Scientific, Singapore (2007)
[8] Ghahramani, S.: Fundamentals of Probability with Stochastic Processes.
Pearson Prentice Hall, Upper Saddle River (2005)
[9] Jarrow, R., Rudd, A.: Option Pricing. Richard Irwin, Homewood (1983)
[10] Jarrow, R., Turnbull, S.: Derivative Securities. South-Western College
Publishing, Cincinnati (2000)
[11] Luenberger, D.: Investment Science. Oxford University Press, Oxford (1998)
[12] McDonald, R.: Derivative Markets. Addison-Wesley, Boston (2006)
[13] Neftci, S.: An Introduction to the Mathematics of Financial Derivatives.
Academic, San Diego (2000)
[14] Pitman, J.: Setup for the Central Limit Theorem. STAT 205 Lecture Note
10, scribe D. Rosenberg. University of California, Berkeley (2003)
[15] Roman, S.: Introduction to the Mathematics of Finance. Springer, New
York (2004)
[16] Wilmott, P., Dewynne, N., Howison, S.: Mathematics of Financial Deriva-
tives: A Student Introduction. Cambridge University Press, Cambridge
(1995)
Chapter 6
Stochastic Calculus and Geometric Brownian
Motion Model
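$$P(A) + P(A^c) = 1.$$
$$A_i \in \mathcal{F},\ i = 1, 2, \dots, n \ \Longrightarrow\ \bigcup_{i=1}^{n} A_i \in \mathcal{F}.$$
If the last condition is replaced with closure under countable union, i.e.,
$$A_i \in \mathcal{F},\ i = 1, 2, \dots \ \Longrightarrow\ \bigcup_{i=1}^{\infty} A_i \in \mathcal{F},$$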
Note that the definition of σ-algebra does not rely on the existence of a probability measure.
The next definition is a natural extension.
Definition 6.3. For $\mathcal{C} \subseteq 2^{\Omega}$, the smallest σ-algebra that includes $\mathcal{C}$ is called the σ-algebra generated by $\mathcal{C}$ and denoted by $\sigma(\mathcal{C})$.
In fact, $\sigma(\mathcal{C})$ is the intersection of all σ-algebras including $\mathcal{C}$.
Example 6.1. Let $A \subseteq \Omega$. The collection $\{\emptyset, A, A^c, \Omega\}$ is the smallest σ-algebra on $\Omega$ containing $A$, i.e., $\sigma(A)$, the σ-algebra generated by $A$.
Example 6.2. $\{\emptyset, \Omega\}$ is the smallest σ-algebra on $\Omega$, whereas $2^{\Omega}$, the power set of $\Omega$, is the largest σ-algebra on $\Omega$.
Clearly, for any $\mathcal{C} \subseteq 2^{\Omega}$, $\{\emptyset, \Omega\}$ is a sub-σ-algebra of $\sigma(\mathcal{C})$, and $\sigma(\mathcal{C})$ is a sub-σ-algebra of $2^{\Omega}$.
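On a finite sample space the generated σ-algebra can be computed by brute force. The following sketch is an illustration under that finiteness assumption: it closes a collection of subsets under complements and pairwise unions until nothing new appears, which on a finite set is enough to obtain $\sigma(\mathcal{C})$. The function name and the integer labels for outcomes are hypothetical.

```python
from itertools import combinations

def generated_sigma_algebra(omega, collection):
    """Smallest family containing `collection` that is closed under
    complement and union; valid as sigma(C) because omega is finite."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(c) for c in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)                  # snapshot of the family so far
        for a in current:
            complement = omega - a
            if complement not in sets:
                sets.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sets:
                sets.add(union)
                changed = True
    return sets

if __name__ == "__main__":
    omega = {1, 2, 3, 4}
    for event in sorted(generated_sigma_algebra(omega, [{1}]), key=lambda s: (len(s), sorted(s))):
        print(sorted(event))
```

Taking the collection to be the single set $A = \{1\}$ reproduces the four-element family of Example 6.1, while generating from enough singletons yields the full power set of Example 6.2.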
In modern probability theory, a probability is defined on a σ-algebra as a measure with whole space measure equal to one. Precisely, we define the probability measure as follows:
Definition 6.4. Let $\Omega$ be a sample space and $\mathcal{F}$ a σ-algebra over $\Omega$. A real-valued function $P$ from $\mathcal{F}$ to $[0, 1]$ is said to be a probability measure on $\mathcal{F}$ if it satisfies the following conditions:
1. $P(\Omega) = 1$;
2. For each countable collection $\{A_i \in \mathcal{F},\ i \in I\}$ of pairwise disjoint sets,
$$P\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i \in I} P(A_i).$$
$\Omega = \{H, T\}$,
$\mathcal{F} = \{\emptyset, H, T, \Omega\}$,
$P : \mathcal{F} \to [0, 1]$ is defined by
$P(\emptyset) = 0$, $P(H) = 0.3$, $P(T) = 0.7$, $P(\Omega) = 1$.
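down by a factor $d$ with probability $1 - p$ over the time period $[t_0, t_1]$, then the corresponding probability space is represented by $(\Omega, \mathcal{F}, P)$, where
$\Omega = \{U, D\}$,
$\mathcal{F} = \{\emptyset, U, D, \Omega\}$,
$P : \mathcal{F} \to [0, 1]$ is defined by
$P(\emptyset) = 0$, $P(U) = p$, $P(D) = 1 - p$, $P(\Omega) = 1$.
The random experiment associated with the next example is parallel to repeatedly flipping a biased coin.
$\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, where
$\omega_1 = UU$, $\omega_2 = UD$, $\omega_3 = DU$, $\omega_4 = DD$,
$$\begin{aligned} \mathcal{F} = \{ &\emptyset, \{\omega_1\}, \{\omega_2\}, \{\omega_3\}, \{\omega_4\}, \\ &\{\omega_1, \omega_2\}, \{\omega_1, \omega_3\}, \{\omega_1, \omega_4\}, \{\omega_2, \omega_3\}, \{\omega_2, \omega_4\}, \{\omega_3, \omega_4\}, \\ &\{\omega_1, \omega_2, \omega_3\}, \{\omega_1, \omega_2, \omega_4\}, \{\omega_1, \omega_3, \omega_4\}, \{\omega_2, \omega_3, \omega_4\}, \Omega \}, \end{aligned}$$
$P : \mathcal{F} \to [0, 1]$ is defined by
$$P(\{\omega_1\}) = p^2, \quad P(\{\omega_2\}) = P(\{\omega_3\}) = p(1 - p), \quad P(\{\omega_4\}) = (1 - p)^2,$$
which along with the additive property of $P$ (see Remark 6.1) imply
$$\begin{aligned} &P(\{\omega_1, \omega_2, \omega_3\}) = 1 - (1 - p)^2, \quad P(\{\omega_2, \omega_3, \omega_4\}) = 1 - p^2, \\ &P(\{\omega_1, \omega_3, \omega_4\}) = P(\{\omega_1, \omega_2, \omega_4\}) = 1 - p(1 - p), \\ &P(\{\omega_1, \omega_2\}) = P(\{\omega_1, \omega_3\}) = p, \quad P(\{\omega_2, \omega_3\}) = 2p(1 - p), \\ &P(\{\omega_2, \omega_4\}) = P(\{\omega_3, \omega_4\}) = 1 - p, \quad P(\{\omega_1, \omega_4\}) = p^2 + (1 - p)^2, \\ &P(\emptyset) = 0, \quad P(\Omega) = 1. \end{aligned}$$
$$\mathcal{F} = \{\emptyset, \{\omega_1\}, \{\omega_2, \omega_3, \omega_4\}, \Omega\}$$
is a sub-σ-algebra.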
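The table of event probabilities for the two-period tree follows mechanically from the four outcome probabilities and additivity. The sketch below, with a hypothetical function name and string labels for the outcomes, builds the measure on all 16 events so that individual entries can be spot-checked.

```python
from itertools import combinations

def two_period_measure(p):
    """Map each event (a frozenset of outcomes) to its probability,
    using additivity over the outcomes UU, UD, DU, DD."""
    atom = {"UU": p * p, "UD": p * (1 - p), "DU": (1 - p) * p, "DD": (1 - p) ** 2}
    outcomes = list(atom)
    measure = {}
    for size in range(len(outcomes) + 1):
        for event in combinations(outcomes, size):
            measure[frozenset(event)] = sum(atom[w] for w in event)
    return measure

if __name__ == "__main__":
    measure = two_period_measure(0.6)   # hypothetical uptick probability
    print(len(measure), "events")
    print(measure[frozenset({"UU", "DD"})])
```

The empty event receives probability 0 because the sum over no outcomes is 0, and the full sample space receives probability 1.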
6.1 Stochastic Processes: The Evolution of Randomness 257
$$X : \Omega \to \mathbb{R}, \qquad \omega \mapsto X(\omega),$$
where measurability refers to the σ-algebra $\mathcal{F}$ and means that for each real number $a$, the set $\{X \le a\}$ is an event in the probability space. That is,
Example 6.5. (One-Period Binomial Tree Again) Let us revisit Example 6.3, the one-period binomial tree model of an oversimplified representation of stock price behavior in which the stock price either goes up by a factor $u$ with probability $p$ or goes down by a factor $d$ with probability $1 - p$ over the time period $[t_0, t_1]$. Then the corresponding probability space is represented by $(\Omega, \mathcal{F}, P)$, where
$\Omega = \{U, D\}$,
$\mathcal{F} = \{\emptyset, U, D, \Omega\}$,
$P : \mathcal{F} \to [0, 1]$ is defined by
$P(\emptyset) = 0$, $P(U) = p$, $P(D) = 1 - p$, $P(\Omega) = 1$.
$$X(\omega) = \begin{cases} S_0 u & \text{if } \omega = U \\ S_0 d & \text{if } \omega = D. \end{cases} \tag{6.2}$$
$$X(\omega) = \begin{cases} 120 & \text{if } \omega = U \\ 83.33 & \text{if } \omega = D. \end{cases}$$
$$X^{-1}((-\infty, a]) = \{X \le a\} = \{\omega \in \Omega \mid X(\omega) \le a\} = \begin{cases} \emptyset & \text{if } a < 83.33 \\ D & \text{if } 83.33 \le a < 120 \\ \Omega & \text{if } 120 \le a. \end{cases}$$
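The preimage computation above can be mirrored in a few lines. The sketch represents a random variable on a finite sample space as a dictionary from outcomes to values; the helper names are hypothetical. On a finite space it suffices to test the events $\{X \le a\}$ at the values that $X$ actually takes, since every other $a$ gives either the empty set or one of those events.

```python
def preimage_leq(x, a):
    """Return the event {X <= a} = {omega : X(omega) <= a} as a frozenset."""
    return frozenset(w for w, value in x.items() if value <= a)

def is_measurable(x, sigma_algebra):
    """Check that {X <= a} lies in the sigma-algebra for every value a taken by X."""
    return all(preimage_leq(x, a) in sigma_algebra for a in set(x.values()))

# Stock price at t1 from the example: 120 after an uptick, 83.33 after a downtick.
X = {"U": 120.0, "D": 83.33}
OMEGA = frozenset(X)
F = {frozenset(), frozenset({"U"}), frozenset({"D"}), OMEGA}
TRIVIAL = {frozenset(), OMEGA}

if __name__ == "__main__":
    for a in (50.0, 100.0, 120.0):
        print(a, sorted(preimage_leq(X, a)))
    print(is_measurable(X, F), is_measurable(X, TRIVIAL))
```

The variable is measurable with respect to the full family but not with respect to the trivial σ-algebra, which cannot distinguish an uptick from a downtick.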
Sometimes a random variable $X$ and a σ-algebra $\mathcal{F}$ are both given, but $X$ is not measurable on $\mathcal{F}$. Intuitively, we say that $\mathcal{F}$ does not have enough resolution to read off all information contained in $X$. The meaning of $E(X \mid \mathcal{F})$ is the least coarsification of $X$ so that $\mathcal{F}$ has enough resolution for $E(X \mid \mathcal{F})$ (i.e., $E(X \mid \mathcal{F})$ is measurable on $\mathcal{F}$). More discussion on conditional expectation with respect to a σ-algebra will be given shortly.
Recall that two events $A$ and $B$ are independent if and only if their joint probability equals the product of their probabilities: $P(A \cap B) = P(A) P(B)$. Since σ-algebras are collections of events, the next definition is natural.
$$P(A \cap B) = P(A) P(B), \qquad A \in \mathcal{F},\ B \in \mathcal{G}.$$
In words, two σ-algebras are independent if any two events, one from each σ-algebra, are independent.
Using the language of independent σ-algebras, we have an equivalent statement for the definition of independent random variables:
Definition 6.6. Two random variables $X$ and $Y$ are independent if and only if the two corresponding σ-algebras $\sigma(X)$ and $\sigma(Y)$ are independent.
Example 6.7. Recall from Example 6.5 that the probability space is given by $(\Omega, \mathcal{F}, P)$, where
$\Omega = \{U, D\}$,
$\mathcal{F} = \{\emptyset, U, D, \Omega\}$,
$P : \mathcal{F} \to [0, 1]$ is defined by
$P(\emptyset) = 0$, $P(U) = p$, $P(D) = 1 - p$, $P(\Omega) = 1$,
$$X(\omega) = \begin{cases} S_0 u & \text{if } \omega = U \\ S_0 d & \text{if } \omega = D, \end{cases}$$
$$X(\omega) = S_0\, Y(\omega) \quad \text{where} \quad Y(\omega) = \begin{cases} u & \text{if } \omega = U \\ d & \text{if } \omega = D. \end{cases}$$
If we use a three-period binomial tree (with $[t_i, t_{i+1}]$, $t_i < t_{i+1}$ and $i = 0, 1, 2, 3$, as time periods), to model an oversimplified stock price behavior (with $S(t_i)$ as the price at time $t_i$) so that in each period the stock price either goes up by a factor $u$ with probability $p$ or goes down by a factor $d$ with probability $1 - p$, then we can introduce three random variables on $(\Omega, \mathcal{F}, P)$, $Y_i : \Omega \to \mathbb{R}$ defined by
$$Y_i = Y_i(\omega) = \begin{cases} u & \text{if } \omega = U \\ d & \text{if } \omega = D, \end{cases} \quad \text{where } i = 1, 2, 3.$$
Then
In a similar fashion, we can extend the binomial tree over infinitely many time periods: $[t_0, t_1], [t_1, t_2], [t_2, t_3], \dots, [t_{i-1}, t_i], \dots$ by introducing infinitely many random variables on $(\Omega, \mathcal{F}, P)$, $Y_i : \Omega \to \mathbb{R}$ defined by
$$Y_i = Y_i(\omega) = \begin{cases} u & \text{if } \omega = U \\ d & \text{if } \omega = D, \end{cases} \quad \text{where } i = 1, 2, 3, \dots, \tag{6.3}$$
$$Y = \{Y_1, Y_2, Y_3, \dots\}, \quad \text{or} \quad Y = \{Y_1(\omega), Y_2(\omega), Y_3(\omega), \dots\},\ \omega \in \Omega.$$
$$y = \{f_1, f_2, f_3, \dots\}, \quad \text{or} \quad y = \{f_1(x), f_2(x), f_3(x), \dots\},\ x \in \mathbb{R},$$
Mimicking the familiar notation from vector calculus, we give the following
definition.
X = { X1 , X2 , X3 , . . . } .
1 For our purpose, the index set is always a time index set although it is not necessarily so by definition.
2 In mathematics, the set of objects that we are considering is often referred to as a space.
Since the index set $J$ usually represents time, and the random variable $X_t$ describes the state of the process at time $t$, the index $t$ admits a natural interpretation: if $X_t = s$, we say that the process is in state $s$ at time $t$.
Stochastic processes can be classified according to the index set J into discrete-
time processes and continuous-time processes.
Stochastic processes can also be classified according to the state space S into
discrete-state processes and continuous-state processes. The state space is discrete if
it consists of a finite number of points or a countably infinite number of points;
otherwise, it is continuous.
Since sample paths of a process may have a direction almost nowhere (noise),
one can visualize them only through imagination; however, an intuition of
the notion of sample paths may come from some idea in analytic geometry.
Example 6.14. Recall from analytic geometry and vector calculus that the graph of the function of two variables is a surface in $\mathbb{R}^3$. For example, the graph of
$$y = f(t, x) = t^2 + x^2, \quad (t, x) \in \mathbb{R}^2$$
is an elliptic paraboloid.
However, either the $t$-cross section or the $x$-cross section of $y = f(t, x)$ is a continuous path (curve) in $\mathbb{R}^3$. For example, let $x = 1$. The graph of
$$y = f(t, 1) = t^2 + 1, \quad t \in \mathbb{R},$$
which is a function of $t$ alone, is a parabola. In fact, for each fixed $x$, the graph of $y = f(t, x) = t^2 + x^2$, $t \in \mathbb{R}$, a function of $t$ alone, is a parabola, a continuous path in $\mathbb{R}^3$.
$$X_t : \Omega \to \mathbb{R}, \qquad \omega \mapsto X(t, \omega) \ \text{(i.e., } X_t(\omega)\text{)},$$
$$X(\omega) : J \to \mathbb{R},$$
Notational Remark
There may be several notations for the same mathematical concept for good reasons. Consider a familiar environment as in one variable calculus, where $f'(x)$, $\frac{dy}{dx}$, $\frac{df}{dx}$, and $Df(x)$ may all represent the derivative of function $y = f(x)$ with respect to $x$.
In our discussion, for the sake of convenience or clarity relative to different contexts, different notation for the same mathematical object may be used. The reader should keep the following in mind in the rest of this chapter:
1. Notation for sample paths. $X_t(\omega)$, $X(t, \omega)$, $X(\omega)$ (where $t$ is dropped to ease the notation and to emphasize the (relation) rule of function rather than the value of the function) and $X(t)$ (where $\omega$ is dropped to ease the notation and to emphasize the fact that a path is defined by a function of $t$ alone) may all represent a sample path of the stochastic process $X$.
2. Notation for random variables in a process. $X_t$ and $X(t)$ may be used for a random variable in process $X$.
3. Background probability space. Unless stated otherwise $(\Omega, \mathcal{F}, P)$ represents the background probability space in the rest of our discussions involving stochastic processes.
$$X(\omega) : [0, \infty) \to \mathbb{R}, \qquad t \mapsto X(t, \omega)$$
is a continuous sample path for almost surely (a.s.) all $\omega \in \Omega$, which means that
We will make use of the following definitions and relationships among different notions of convergence of random variables.
Definition 6.11. A sequence $\{X_n\}$ of random variables is said to converge almost surely (or converge almost everywhere or converge with probability 1) to a random variable $X$, written $X_n \xrightarrow{a.s.} X$, if
$$P\big(\lim_{n \to \infty} X_n = X\big) = 1. \tag{6.5}$$
Equivalently, $X_n \xrightarrow{P} X$ if and only if for each $\epsilon > 0$,
1. $X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X \implies X_n \xrightarrow{d} X$.
2. $X_n \xrightarrow{m.s.} X \implies X_n \xrightarrow{P} X$.
Consequently, $X_n \xrightarrow{m.s.} X \implies X_n \xrightarrow{d} X$.
Similarly, one can see that a skewed left distribution has the left tail longer
than the right tail, and the p.d.f. is tilted to the right (see the leftmost graph in
Figure 8.5 on page 443).
$$\operatorname{KurtExcess}(X) = \operatorname{kurt}(X) - 3.$$
One intuitive way to describe a filtration is that it functions like a filter of information flow to control information propagation. For our purpose, it is sufficient to know the following:
1. $\mathcal{F}_t$ represents the information available at time $t$.
2. The information structure designated by (6.10) assures that the amount of information grows as time evolves and that no information is lost with increasing time (e.g., no computer crashes) in the sense that whatever information is available at time $s$ is still available at time $t$ as long as $t \ge s$.
4 A legitimate full description of the stochastic process is based on the notion of finite-dimensional distributions.
6.2 Filtrations and Adapted Processes 269
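Note that filtrations can also be defined on a discrete-time index set by the same idea. An example of such filtrations is provided below.
$\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, where
$\omega_1 = UU$, $\omega_2 = UD$, $\omega_3 = DU$, $\omega_4 = DD$,
$$\begin{aligned} \mathcal{F} = \{ &\emptyset, \{\omega_1\}, \{\omega_2\}, \{\omega_3\}, \{\omega_4\}, \\ &\{\omega_1, \omega_2\}, \{\omega_1, \omega_3\}, \{\omega_1, \omega_4\}, \{\omega_2, \omega_3\}, \{\omega_2, \omega_4\}, \{\omega_3, \omega_4\}, \\ &\{\omega_1, \omega_2, \omega_3\}, \{\omega_1, \omega_2, \omega_4\}, \{\omega_1, \omega_3, \omega_4\}, \{\omega_2, \omega_3, \omega_4\}, \Omega \}. \end{aligned}$$
Now, if we let
$$\mathcal{F}_{t_0} = \{\emptyset, \Omega\},$$
$$\mathcal{F}_{t_1} = \{\emptyset, \{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}, \Omega\},$$
$$\mathcal{F}_{t_2} = \mathcal{F},$$
We say that $\mathcal{F}_{t_2}$ is finer than $\mathcal{F}_{t_1}$, and $\mathcal{F}_{t_1}$ is finer than $\mathcal{F}_{t_0}$, or equivalently, $\mathcal{F}_{t_0}$ is coarser than $\mathcal{F}_{t_1}$, and $\mathcal{F}_{t_1}$ is coarser than $\mathcal{F}_{t_2}$.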
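The two requirements on a discrete-time filtration, that each member is a σ-algebra and that the members are nested, can be verified directly for the two-period example. The sketch below is an illustration with hypothetical helper names; outcomes are labeled by the strings UU, UD, DU, DD.

```python
from itertools import combinations

def power_set(omega):
    """All subsets of a finite set, as a set of frozensets."""
    items = sorted(omega)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}

def is_sigma_algebra(omega, family):
    """On a finite set: contains omega, closed under complement and union."""
    omega = frozenset(omega)
    return (omega in family
            and all(omega - a in family for a in family)
            and all(a | b in family for a in family for b in family))

def is_filtration(omega, families):
    """Each family is a sigma-algebra and each is contained in the next."""
    nested = all(f <= g for f, g in zip(families, families[1:]))
    return nested and all(is_sigma_algebra(omega, f) for f in families)

OMEGA = frozenset({"UU", "UD", "DU", "DD"})
F_T0 = {frozenset(), OMEGA}
F_T1 = {frozenset(), frozenset({"UU", "UD"}), frozenset({"DU", "DD"}), OMEGA}
F_T2 = power_set(OMEGA)

if __name__ == "__main__":
    print(is_filtration(OMEGA, [F_T0, F_T1, F_T2]))
```

Reversing the order of the three families breaks the nesting, so the reversed sequence is not a filtration even though each member is still a σ-algebra.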
$$\mathcal{F}_t = \sigma(X_s,\ 0 \le s \le t), \quad t \ge 0, \tag{6.11}$$
$$\sigma(X_u,\ 0 \le u \le s) \subseteq \sigma(X_u,\ 0 \le u \le t), \quad 0 \le s \le t.$$
Example 6.18. The σ-algebra generated by two random variables $X_1$ and $X_2$, written $\sigma(X_1, X_2)$, is the smallest σ-algebra which is sufficient for both random variables $X_1$ and $X_2$ to be measurable (see page 257 and Example 6.6).
In the language of information, this is tantamount to saying that we use the least amount of information to determine both $X_1$ and $X_2$. For this reason, we say that:
$\sigma(X_1, X_2)$ represents the information set that contains the least amount of information to determine both $X_1$ and $X_2$.
Clearly, $\sigma(X_1) \subseteq \sigma(X_1, X_2)$ and we see that given a discrete-time stochastic process $\{X_t,\ t = 1, 2, 3, \dots\}$, we have $\sigma(X_1) \subseteq \sigma(X_1, X_2) \subseteq \sigma(X_1, X_2, X_3) \subseteq \cdots$.
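$$E(X \mid \mathcal{F}) \overset{a.s.}{=} X. \tag{6.12}$$
Example 6.20. Given two σ-algebras $\mathcal{F}$ and $\mathcal{G}$ with $\mathcal{F} \subseteq \mathcal{G}$ and a $\mathcal{G}$-measurable random variable $X$, what can you say about $E(X \mid \mathcal{F})$ or the prediction of $X$?
Solution. Unknown (because a smaller information set may not contain sufficient information about $X$ even if the larger one does).
Example 6.21. Since $\sigma(X)$ contains sufficient information about $X$, (6.12) implies
$$E(X \mid \sigma(X)) \overset{a.s.}{=} X.$$
Similarly,
$$E(X_1 \mid \sigma(X_1, X_2)) \overset{a.s.}{=} X_1, \quad \text{and} \quad E(X_2 \mid \sigma(X_1, X_2)) \overset{a.s.}{=} X_2.$$
Property 6.3. (Taking Out What Is Known) If $X$ and $Y$ are random variables and $X$ is $\mathcal{F}$-measurable, then
$$E(XY \mid \mathcal{F}) \overset{a.s.}{=} X\, E(Y \mid \mathcal{F}). \tag{6.13}$$
$$E\big(E(X \mid \mathcal{F})\big) = E(X). \tag{6.14}$$
$$E\big(E(X \mid \mathcal{G}) \mid \mathcal{F}\big) \overset{a.s.}{=} E(X \mid \mathcal{F}). \tag{6.15}$$
$$E(a_1 X_1 + a_2 X_2 \mid \mathcal{F}) \overset{a.s.}{=} a_1 E(X_1 \mid \mathcal{F}) + a_2 E(X_2 \mid \mathcal{F}). \tag{6.16}$$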
More intuition of $E(X \mid \mathcal{F})$.
Sometimes a random variable $X$ and a σ-algebra $\mathcal{F}$ are both given, but $X$ is not measurable on $\mathcal{F}$. Intuitively, we say that $\mathcal{F}$ does not have enough (contrast) resolution to read off all information contained in $X$. In this sense, $X$ being $\mathcal{F}$-measurable is at one end of a yardstick, where $\mathcal{F}$ has the highest resolution to read off all information contained in $X$, whereas $X$ being independent of $\mathcal{F}$ (see Definition 6.7 on page 260) is at the other end of the yardstick, where $\mathcal{F}$ has no resolution (thus reads off no information about $X$). In other words, we say that $X$ being independent of $\mathcal{F}$ means that $\mathcal{F}$ contains no information about $X$.
The meaning of $E(X \mid \mathcal{F})$ is the least coarsification of $X$ so that $\mathcal{F}$ has enough resolution for $E(X \mid \mathcal{F})$ (i.e., $E(X \mid \mathcal{F})$ is measurable on $\mathcal{F}$).
Now, we are ready for the next property.
$$E(X \mid \mathcal{F}) = E(X). \tag{6.17}$$
Definition 6.21. Let $X$ and $Y$ be two random variables with finite expectations. We define the conditional expectation of $X$ given $Y$ by
particularly,
$$X_s \overset{a.s.}{=} E(X_s \mid \mathcal{F}_s). \tag{6.19}$$
Using the language in Remark 6.2 on page 259, we say that $X_s$ is considered known at any time $t$, $t \ge s$, if $\{X_t : t \ge 0\}$ is adapted to the filtration $\{\mathcal{F}_t\}$. In other words:
A stochastic process $\{X_t : t \ge 0\}$ being adapted to the filtration implies that the value of $X_t$ is (almost surely) completely determined by the filtration $\mathcal{F}_t$ in the sense of (6.19).
2. On the other hand, since for each $t > s$, $X_t$ may not be $\mathcal{F}_s$-measurable, at time $s$ (see Example 6.20 or 6.21 on page 271), $X_t$ is considered unknown because probabilities of events described by $X_t$ may not be computed based on the information available at any time earlier than the moment $t$. In this sense:
The notion of adaptedness can be interpreted as inability to have knowledge about future events. For this reason an adapted process is also called non-anticipating because the propagation or progressive revelation of information under adaptedness allows no anticipation of future information. An illustration of this concept is provided in the next example.
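$$\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\},$$
where
$$\sigma(\{\omega_1, \omega_2\}) = \{\emptyset, \{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}, \Omega\}.$$
If we define
$$X_{t_0}(\omega_i) = 1 \quad \text{if } i = 1, 2, 3, 4,$$
$$X_{t_1}(\omega_i) = \begin{cases} 2 & \text{if } i = 1, 2 \\ 3 & \text{if } i = 3, 4, \end{cases}$$
$$X_{t_2}(\omega_i) = \begin{cases} 4 & \text{if } i = 1 \\ 5 & \text{if } i = 2 \\ 6 & \text{if } i = 3 \\ 7 & \text{if } i = 4, \end{cases}$$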
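Whether each of these three random variables is measurable with respect to a given σ-algebra can be tested mechanically. The sketch below assumes the filtration has the trivial σ-algebra at $t_0$ and $\sigma(\{\omega_1, \omega_2\})$ at $t_1$, an assumption based on the surrounding fragments; outcomes $\omega_1, \dots, \omega_4$ are labeled by the integers 1 to 4, and the helper name is hypothetical.

```python
def is_measurable(x, family):
    """X is measurable w.r.t. `family` if every event {X <= a} belongs to it.
    On a finite sample space it is enough to test a at the values X takes."""
    events = (frozenset(w for w in x if x[w] <= a) for a in set(x.values()))
    return all(event in family for event in events)

OMEGA = frozenset({1, 2, 3, 4})               # labels for omega_1, ..., omega_4
F_T0 = {frozenset(), OMEGA}
F_T1 = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), OMEGA}

X_T0 = {1: 1, 2: 1, 3: 1, 4: 1}
X_T1 = {1: 2, 2: 2, 3: 3, 4: 3}
X_T2 = {1: 4, 2: 5, 3: 6, 4: 7}

if __name__ == "__main__":
    print("X_t0 vs F_t0:", is_measurable(X_T0, F_T0))
    print("X_t1 vs F_t1:", is_measurable(X_T1, F_T1))
    print("X_t2 vs F_t1:", is_measurable(X_T2, F_T1))
```

The last check fails because $X_{t_2}$ separates $\omega_1$ from $\omega_2$, which the information available at $t_1$ cannot do. This is the sense in which the value at $t_2$ is not yet known at $t_1$.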
Note that in our next definition, $t$ may be an integer for a discrete-time process or a real number for a continuous-time process.
Definition 6.23. A stochastic process $\{X_t : t \ge 0\}$ on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$ is said to be a martingale with respect to the filtration $\{\mathcal{F}_t\}$ if $\{X_t\}$ is adapted to $\{\mathcal{F}_t\}$ and satisfies the condition $E(|X_t|) < \infty$ for $t \ge 0$ and the property
$$E(X_t \mid \mathcal{F}_s) = X_s \quad \text{for } s < t,\ t \ge 0. \tag{6.20}$$
Stating the defining characteristic of the martingale property given by (6.20) in common language:
The best prediction for a future realization is the current value of the process.
Taking expectation on both sides of (6.20) yields
$$E(X_t) = E(X_s),$$
Remark 6.5. Since all Itô integrals with respect to Brownian motion are martingales, martingales form an important class of stochastic processes.
Using a more concise notation of conditional expectation (in the spirit of Definition 6.21), (6.21) can be rewritten as
$$E(S_n) = n E(R_1) = 0.$$
$$E(S_n^2) = \sum_{j=1}^{n} \sum_{i=1}^{n} E(R_i R_j) = \sum_{j=1}^{n} \sigma^2 = n \sigma^2$$
implies (see footnote 8)
$$E(|S_n|) < \infty.$$
Step 2.
$$\begin{aligned} E(S_{n+1} \mid S_1, S_2, \dots, S_n) &= E(R_1 + R_2 + \cdots + R_n + R_{n+1} \mid S_1, S_2, \dots, S_n) \\ &= R_1 + R_2 + \cdots + R_n + E(R_{n+1}) = S_n + 0 = S_n. \end{aligned}$$
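The two steps of this argument can be replayed exactly for a concrete random walk. The sketch below assumes, purely for illustration, independent steps $R_i = \pm 1$ with probability $\frac{1}{2}$ each, so that $E(R_i) = 0$ and $\sigma^2 = 1$. It enumerates every step history and uses exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

STEPS = (1, -1)                      # assumed equiprobable step values
HALF = Fraction(1, 2)

def conditional_mean_next(history):
    """E(S_{n+1} | R_1, ..., R_n = history): average over the next step."""
    s_n = sum(history)
    return sum(HALF * (s_n + step) for step in STEPS)

def check_martingale(n):
    """Verify E(S_{n+1} | history) = S_n for all 2**n step histories."""
    return all(conditional_mean_next(h) == sum(h) for h in product(STEPS, repeat=n))

def second_moment(n):
    """E(S_n^2), computed by averaging over all equally likely paths."""
    paths = list(product(STEPS, repeat=n))
    return Fraction(sum(sum(p) ** 2 for p in paths), len(paths))

if __name__ == "__main__":
    print(check_martingale(6), second_moment(6))
```

Conditioning on the step history carries the same information as conditioning on $S_1, \dots, S_n$, since each determines the other. The second moment equals $n$, matching $n\sigma^2$ with $\sigma^2 = 1$.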
According to Fama (1970), a market in which prices always fully reflect available information is called efficient (see Fama [11]).
Recall (6.20)
$$E(X_t \mid \mathcal{F}_s) = X_s \quad \text{for } s < t.$$
Let $s$ be the current time and $\mathcal{F}_s = \sigma(X_u,\ u \le s)$. If $X_t$ represents a security price at time $t$, (6.20) indicates that the information contained in the past prices is instantly and fully reflected in the security's current price. For this reason, the martingale property is considered to be a necessary condition for an efficient security market.
8 A proof can be done either by applying Cauchy-Schwarz inequality or by using the result at the link
$$\frac{dS}{dt} = rS \quad \text{with} \quad S(0) = S_0,$$
which is equivalent to
$$dS = rS\, dt \quad \text{with} \quad S(0) = S_0, \tag{6.24}$$
r=+ or rt = + t , (6.26)
(see Section 6.5) since it has been shown that, although Brownian motion paths
are nowhere differentiable, their (formal) time derivatives form the white
noise process (see Definition 6.25). Thus, we can write
Note that a process has no linear forecasting value if its autocovariance function $\gamma(t)$ is identically equal to zero.
Since a noise is presumably unpredictable, we expect the covariance function of a white noise process at all nontrivial time lags to show a value of zero.
We define the mean value function of a process $\{X_t\}$ by $m(t) = E(X_t)$ and give the following definitions.
Definition 6.25.
1. The process $\{\epsilon_t\}$ is said to be a white noise (process) if its mean value function and autocovariance function respectively are
$$m(t) = E(\epsilon_t) = 0, \quad \text{and} \quad \gamma(t) = \begin{cases} \sigma^2 & \text{if } \tau = 0 \\ 0 & \text{if } \tau \ne 0, \end{cases} \quad \text{for all } t,$$
where $\tau$ denotes the time lag.
$$\epsilon_t \sim WN(0, \sigma^2).$$
$$\epsilon_t \sim i.WN(0, \sigma^2).$$
$$\epsilon_t \sim N(0, \sigma^2).$$
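The defining features of white noise, zero mean and zero covariance at every nonzero lag, can be seen in simulated data. The sketch below draws independent Gaussian samples, one common special case, and computes sample autocovariances. The seed, sample size, and function names are arbitrary choices made for this illustration.

```python
import random

def gaussian_white_noise(n, sigma, seed=0):
    """n independent N(0, sigma^2) draws from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def sample_autocovariance(xs, lag):
    """Sample autocovariance at the given lag (normalized by the series length)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n

if __name__ == "__main__":
    noise = gaussian_white_noise(20000, 2.0)
    for lag in (0, 1, 5):
        print(lag, round(sample_autocovariance(noise, lag), 3))
```

At lag 0 the estimate should be close to $\sigma^2 = 4$, and at nonzero lags close to 0, up to sampling error.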
where I (t) is the simple interest on the time interval [t, t + dt], and A(t) is the
initial principal on the same time period. Note that the infinitesimal change on
the principal is
\[ dA(t) = I(t) = A(t)\,r\,dt. \]
Similarly, if $S(t)$ is the price of the underlier (without loss of generality, a stock) at time $t$, and $D(t)$ is the dividend paid by the stock over the time interval $[t, t + dt]$, we define the (annualized) dividend yield, denoted by $q$, to be the constant satisfying
\[ D(t) = S(t)\,q\,dt. \tag{6.29} \]
If we reinvest the dividends immediately in the stock, (6.29) says that the dividend paid on $[t, t + dt]$ buys us $q\,dt$ shares of the stock. It follows that over the time period $[t, t + dt]$, if we own $N(t)$ shares of the stock initially (i.e., at time $t$), the dividends from these $N(t)$ shares on the small interval buy us $N(t)\,q\,dt$ more shares than we initially owned. Restated in mathematical language,
\[ dN(t) = q\,N(t)\,dt, \quad t \in [0, T]. \tag{6.30} \]
In words, the infinitesimal change in the shares held on $[t, t + dt]$ is $q\,N(t)\,dt$.
Solving the initial value problem of o.d.e.
In words:
The dividend reinvestment yields $e^{qT} - 1$ more shares at time $T$ from one share of the stock at time 0;
282 6 Stochastic Calculus and Geometric Brownian Motion Model
The dividend reinvestment yields $1 - e^{-qT}$ more shares at time $T$ from $e^{-qT}$ shares of the stock at time 0.
This result will be used again and again, particularly in Chapter 7.
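The share-count statements above can be checked numerically. The following is a minimal sketch, assuming the solution of (6.30) with initial holding $N(0)$ is $N(T) = N(0)e^{qT}$ (the form implied by the two statements above); the function names and the Euler step count are illustrative choices, not part of the text.

```python
import math

def shares_after_reinvestment(n0: float, q: float, T: float) -> float:
    """Closed-form share count N(T) = N(0) * exp(q*T) implied by dN = q N dt."""
    return n0 * math.exp(q * T)

def shares_by_euler(n0: float, q: float, T: float, steps: int = 100_000) -> float:
    """Euler scheme for dN = q N dt: each small interval adds q*N*dt shares."""
    dt = T / steps
    n = n0
    for _ in range(steps):
        n += q * n * dt
    return n

if __name__ == "__main__":
    q, T = 0.03, 2.0
    # Gain from one share: exp(qT) - 1
    print(shares_after_reinvestment(1.0, q, T) - 1.0)
    # Starting with exp(-qT) shares ends with one share
    print(shares_after_reinvestment(math.exp(-q * T), q, T))
    # Step-by-step reinvestment agrees with the closed form
    print(shares_by_euler(1.0, q, T))
```

The Euler loop mirrors the verbal argument: on each interval of length $dt$ the dividends buy $q\,N\,dt$ additional shares.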
Example 6.28. There are two ways to profit from a dividend-paying security: capital gains and dividends. Let $\{S(t)\}$ be the price process of such a security with dividend yield $q$. Assuming that the risk-neutrality hypothesis holds, the expected return from the security is the risk-free interest rate $r$. Therefore, the expected return of the capital gains (i.e., based solely on the appreciation of the stock price) must be $r - q$. That is, if $S(t_0)$ is the security price at the beginning of the time interval $[t_0, t_0 + t]$, then the expected value of the security price at the end of the interval is $S(t_0)e^{(r - q)t}$. If $\{S(t)\}$ is modeled by a binomial tree with parameters $p$, $u$, and $d$, then the expectation of capital gains over this time period can be expressed by
3. $X \sim N(\mu, \sigma^2) \;\Rightarrow\; M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$,
the increments
That is, the increments over time intervals of the same length have the same probability distribution.
the probability measure in the real world (or physical world), in contrast to $Q$-Brownian motion. $Q$-Brownian motion means that $B = \{B(t) : t \ge 0\}$ is a process on $(\Omega, \mathcal{F}, Q)$, where $Q$ represents the probability measure in the risk-neutral world. A more detailed explanation is given in Section 6.8.3.
Definition 6.28. A Brownian motion with drift and scaling is a stochastic process that can be expressed by $b + \mu t + \sigma B$, where $b, \mu \in \mathbb{R}$ and $\sigma > 0$ are constants and $B$ is a standard Brownian motion.
Property 6.9.
1. With probability 1, Brownian paths are continuous.
2. With probability 1, Brownian paths are nowhere differentiable.
3. With probability 1, Brownian paths do not have bounded total variation14
on [0, t] (nor on any interval by the time-homogenous property of Brownian
motion).
Proof. The first property follows from the fact that a Brownian motion B + b is
sample-continuous if and only if the corresponding standard Brownian motion
B is sample-continuous (and this is straightforward from Definition 6.26).
13 It is not obvious that all four properties in Definition 6.26 are compatible with each other. For in-
stance, it is not obvious that stationary independent increments and sample continuity are compatible
properties.
14 The total variation is a way to measure the variation of a (deterministic) real-valued function (see
Section (6.6)).
6.5 Brownian Motion 285
Although any graphs can only depict a Brownian motion traveling in a manner
far from desirable due to a host of microscopic random effects, a mental visu-
alization of them may be achieved. The following explanation may be helpful.
15 In short, a fractal is a natural phenomenon or a mathematical set that exhibits a repeating pattern that displays at every scale. For more explanation, we refer the reader to http://en.wikipedia.org/wiki/Fractal.
x = f (t, w)
\[ X = \{X(t, \omega)\} \]
in terms of its sample paths (cross-sections when $\omega$ is fixed) and the p.d.f. of random variables (cross-sections when $t$ is fixed):
1. Visualization of Brownian paths (cross-sections when $\omega$ is fixed)
16 Note that 1 standard deviation from 0 is $\sqrt{t}$.
Fig. 6.1 Standard Brownian motion is shown using 5,000 randomly selected sample paths over the interval [0, 1], where $X(t) = B(t)$ is plotted on the vertical axis. The current time is 0 and for each future $t > 0$, the random variable $B(t)$ is normal with mean 0 and variance $t$, which can be seen to increase with time in the figure. At time $t = 1$, the frequency distribution of the sample paths is shown as a histogram, which indeed has the shape of a normal distribution with mean 0 and variance 1. The horizontal line shows the mean value of standard Brownian motion, namely, $E(B(t)) = 0$ for all $t \ge 0$
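The experiment behind Fig. 6.1 can be imitated in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: each path is built by summing independent $N(0, \Delta t)$ increments, and the step count, path count, and seed are arbitrary choices. It reports the sample mean and variance of $B(1)$, which should be near 0 and 1.

```python
import math
import random

def simulate_bm_endpoint(t: float, steps: int, rng: random.Random) -> float:
    """Return B(t) for one path built from independent N(0, dt) increments."""
    dt = t / steps
    b = 0.0
    for _ in range(steps):
        b += rng.gauss(0.0, math.sqrt(dt))
    return b

def sample_mean_and_variance(values):
    """Sample mean and unbiased sample variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

def endpoint_statistics(t=1.0, steps=50, paths=5000, seed=0):
    """Simulate many paths and summarize the distribution of B(t)."""
    rng = random.Random(seed)
    ends = [simulate_bm_endpoint(t, steps, rng) for _ in range(paths)]
    return sample_mean_and_variance(ends)

if __name__ == "__main__":
    mean, var = endpoint_statistics()
    print(f"sample mean of B(1): {mean:.3f}, sample variance: {var:.3f}")
```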
Example 6.30. Because of a host of microscopic random effects (e.g., see scaling
invariance Property 6.10), graphs can depict a Brownian motion traveling only
in a manner far from desirable; however, to visualize the Brownian motion B +
b, one may vertically translate the graph in Figure 6.1 by b units, and imagine
that Brownian paths are diffusing from its initial state B(0) = b, and travel in
a.s.
Remark 6.7. We emphasize that the last property is crucial in the definition of
the Ito integral with respect to Brownian motion.
Example 6.31. Show that the Brownian motion $\{B_t\}$ is a martingale with respect to its natural filtration (i.e., $\mathcal{F}_t = \sigma(B_s : s \le t)$,17 the filtration induced by $\{B_t\}$).
17 It is worth noting that both filtrations below are used frequently in the literature:
6.6 Quadratic Variation and Covariation 289
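Proof. For $s < t$,
\[ E(B_t \mid \mathcal{F}_s) = E(B_s + (B_t - B_s) \mid \mathcal{F}_s) = B_s + E(B_t - B_s \mid \mathcal{F}_s) = B_s + E(B_t - B_s) = B_s, \]
where the third equal sign holds due to the fact that $(B_t - B_s)$ is independent of $\mathcal{F}_s$ by Properties 6.11, 1 and 6.7, and $E(B_t - B_s) = 0$.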
Remark 6.8. Since all Ito integrals with respect to Brownian motion are mar-
tingales, martingales form an important class of stochastic processes.
Finally, we note that Brownian motion is a basic building block for the con-
struction of a diffusion process (or a diffusion for short), which is a continuous-
time Markov process with (almost surely) continuous sample paths.
The simplest and most fundamental diffusion process is Brownian motion.
A more general example of diffusion processes is a Brownian motion with drift (e.g., $X_t = \mu t + \sigma B_t$; also see Figure 6.2).
There are different ways to measure the variation of a function. Total variation measures the total up-and-down movement of a function, which is why absolute values appear in the definition below. The total variation is used for deterministic real-valued functions (not for sample paths of random processes), whereas quadratic variation is used for stochastic processes (we will show that processes such as Brownian motions do not have finite total variation, but can be dealt with if one uses quadratic variation).
The total variation of a real-valued function $f$ on an interval $[a, b]$ is denoted by $V_a^b(f)$ and defined as
\[ V_a^b(f) = \lim_{|P| \to 0} \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)|, \tag{6.33} \]
The first is the smallest filtration that makes {Bt } adapted. The second is an extension of the first by
including some zero-probability subsets and has advantages of being complete and right-continuous,
which are convenient and important properties to have. The second filtration is referred to as the
Brownian filtration. We use the first filtration in this example to suppress some conceptual and technical
details involved in the second.
In loose terms and for our purpose, both filtrations are denoted by {F t }, where F t is interpreted
as the set of information generated by the standard Brownian motion on the time interval [0, t].
Fig. 6.2 Brownian motion with drift parameter $\mu = 0.2$ and volatility parameter $\sigma = 0.1$ is illustrated using 5,000 randomly selected sample paths over the interval [0, 1]. Note how the overall structure drifts upward about the mean line $\mu t$. At time $t = 1$, the frequency distribution of $X(1)$ is shown as a histogram, which approximates a normal distribution with mean 0.2 and variance 0.1. The current time is 0 and the solid line is a plot of the expected value $E(X(t)) = 0.2t$, where $t \ge 0$
where the limit is taken in probability18 (see Definition 6.12), and P repre-
sents partitions over the interval [0, t] and |P| is the length of the longest
subintervals associated to P.
18Notice that, among different concepts of convergence of a sequence of random variables (i.e., con-
vergence in probability, almost sure convergence, and convergence in mean square), convergence in
probability is the weakest.
where the limit is taken in probability, and P and |P| are the same as stated
above.
and obtain
\[ [X]_t = \lim_{\Delta t \to 0} \sum_{k=1}^{n} (\Delta_k X)^2, \]
\[ [X, Y]_t = \lim_{\Delta t \to 0} \sum_{k=1}^{n} (\Delta_k X)(\Delta_k Y), \]
\[ d[X]_t = (dX_t)^2, \tag{6.36} \]
\[ d[X, Y]_t = (dX_t)(dY_t). \tag{6.37} \]
The next two properties of the covariation process can be easily verified.
Property 6.12. (Symmetry and Bilinearity) Let $X^{(i)} = \{X_t^{(i)}, t \ge 0\}$, $i = 1, 2, 3$, be three processes with finite quadratic variations. The following properties hold:
1. The covariation process is symmetric. That is,
\[ [X^{(1)}, X^{(2)}]_t = [X^{(2)}, X^{(1)}]_t, \quad t \ge 0. \]
which is obtained by applying the binomial formula and shows that the co-
variation can be defined by quadratic variations.
Property 6.13. (Covariation Expressed in Terms of Quadratic Variations) Let $X = \{X_t\}$ and $Y = \{Y_t\}$ be two processes with finite quadratic variations. Then
\[ [X, Y]_t = \frac{1}{2}\big([X + Y]_t - [X]_t - [Y]_t\big). \]
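Before any limit is taken, the identity in Property 6.13 already holds exactly for the finite sums over a fixed partition, because $(a + b)^2 = a^2 + 2ab + b^2$ for each pair of increments. The following minimal sketch checks this on sampled values; the function names are hypothetical and the inputs are arbitrary lists of path values on a common partition.

```python
def discrete_qv(xs):
    """Sum of squared increments of a sampled path."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))

def discrete_cov(xs, ys):
    """Sum of products of increments of two paths sampled on the same partition."""
    return sum((x1 - x0) * (y1 - y0)
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def cov_from_qv(xs, ys):
    """Covariation sum recovered from three quadratic-variation sums."""
    sums = [x + y for x, y in zip(xs, ys)]
    return 0.5 * (discrete_qv(sums) - discrete_qv(xs) - discrete_qv(ys))

if __name__ == "__main__":
    xs = [0, 1, 3, 2]
    ys = [0, 2, 3, 3]
    print(discrete_cov(xs, ys), cov_from_qv(xs, ys))
```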
Again, let $X = \{X_t\}$ and $Y = \{Y_t\}$ be two processes. As a motivation for the next property, let us verify the following identity:
\[ \Delta(X_t Y_t) = X_t\,\Delta Y_t + Y_t\,\Delta X_t + \Delta X_t\,\Delta Y_t, \tag{6.38} \]
holds under certain conditions (e.g., both $X$ and $Y$ are Itô processes, which will be introduced later) and has an equivalent form:
A proof of the product rule can be done by applying the two-dimensional Itô's lemma (see Exercises 6.32 and 6.33 on page 326). For this reason, the product rule is also referred to as the Itô product rule.
If $f$ is a deterministic function, we denote by $[f]_a^b$ the quadratic variation of $f$ over the interval $[a, b]$. That is, $[f]_a^b \equiv [f]_t$, $t \in [a, b]$.
We will make use of the next two properties.
\[ [f]_a^b \equiv 0. \tag{6.41} \]
In words, the quadratic variation of a continuous $f$ is identically equal to zero.
Proof. A proof is provided in Remark 6.9 on page 297.
Property 6.15. Let $f$ be a continuous (deterministic) function and $X = \{X_t\}$ be a sample-continuous process on $[0, T]$ (see Definition 6.10 on page 264). Then
\[ [X, f]_t \equiv 0, \quad t \in [0, T]. \]
\[ d[B, t]_t = dB(t)\,dt \]
(keep in mind: $B(t) \equiv B_t$ by the notational remark on page 264) and obtain
\[ dB(t)\,dt = 0. \tag{6.42} \]
As a good exercise, the reader is encouraged to derive identity (6.42) by directly
using the definition of covariation.
Example 6.33. Compute $d(e^{t^2 + t} S(t))$ given $dS(t) = 0.2\,dt + 0.095\,dB(t)$.
With $f(t) = e^{t^2 + t}$,
\[ d(e^{t^2 + t} S(t)) = e^{t^2 + t}\,dS(t) + d(e^{t^2 + t})\,S(t) + d[S, f]_t \]
\[ = e^{t^2 + t}\,dS(t) + (2t + 1)e^{t^2 + t} S(t)\,dt + 0 \]
\[ = e^{t^2 + t}\,(0.2\,dt + 0.095\,dB(t)) + (2t + 1)e^{t^2 + t} S(t)\,dt \]
\[ = (0.2 + (2t + 1)S(t))\,e^{t^2 + t}\,dt + 0.095\,e^{t^2 + t}\,dB(t). \]
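The result of Example 6.33 can be sanity-checked along a simulated path. The sketch below is illustrative only: it assumes an Euler discretization of $dS(t) = 0.2\,dt + 0.095\,dB(t)$ with a hypothetical starting value $S(0) = 1$, accumulates the drift and diffusion terms of the final expression step by step, and compares the total with the exact change $e^{T^2 + T}S(T) - S(0)$. The two agree only up to discretization error.

```python
import math
import random

def f(t: float) -> float:
    """Deterministic factor e^{t^2 + t} from Example 6.33."""
    return math.exp(t * t + t)

def product_rule_check(T=1.0, steps=20_000, s0=1.0, seed=1):
    """Return (exact change of f(t)S(t), accumulated product-rule increments)."""
    rng = random.Random(seed)
    dt = T / steps
    s = s0
    accumulated = 0.0
    for k in range(steps):
        t = k * dt  # left endpoint of the current subinterval
        db = rng.gauss(0.0, math.sqrt(dt))
        # Increment from the final line of Example 6.33
        accumulated += (0.2 + (2 * t + 1) * s) * f(t) * dt + 0.095 * f(t) * db
        # Advance S using dS = 0.2 dt + 0.095 dB
        s += 0.2 * dt + 0.095 * db
    exact_change = f(T) * s - f(0.0) * s0
    return exact_change, accumulated

if __name__ == "__main__":
    exact, approx = product_rule_check()
    print(f"exact change: {exact:.5f}, summed increments: {approx:.5f}")
```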
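\[ [B]_t = t. \tag{6.43} \]
Proof. It is sufficient to show that for partitions $P_n : 0 = t_0 < t_1 < t_2 < \cdots < t_n = t$ on the interval $[0, t]$ with $t_k = \frac{kt}{n}$, $k = 0, 1, \ldots, n$, $n > 0$,
\[ \lim_{|P_n| \to 0} \sum_{k=1}^{n} (B_{t_k} - B_{t_{k-1}})^2 = \lim_{n \to \infty} \sum_{k=0}^{n-1} (B_{t_{k+1}} - B_{t_k})^2 = t, \]
where the limit is taken in the mean square sense (see Definition 6.13 and Property 6.1, 2 on page 265).
We let $\Delta B_k = B_{t_{k+1}} - B_{t_k}$ and $\Delta t = t_{k+1} - t_k = \frac{t}{n}$, $k = 0, 1, \ldots, n - 1$, and
\[ S_n = \sum_{k=0}^{n-1} (B_{t_{k+1}} - B_{t_k})^2 = \sum_{k=0}^{n-1} \Delta B_k^2. \]
Since
\[ E(S_n) = E\Big(\sum_{k=0}^{n-1} \Delta B_k^2\Big) = \sum_{k=0}^{n-1} E(\Delta B_k^2) = \sum_{k=0}^{n-1} \mathrm{Var}(\Delta B_k) = \sum_{k=0}^{n-1} \Delta t = t, \]
we have, using the independence of the increments,
\[ E((S_n - t)^2) = \mathrm{Var}(S_n) = \sum_{k=0}^{n-1} \mathrm{Var}(\Delta B_k^2) = \sum_{k=0}^{n-1} \big[E(\Delta B_k^4) - (E(\Delta B_k^2))^2\big] \]
\[ = \sum_{k=0}^{n-1} \big[(\mathrm{Var}(\Delta B_k))^2\,\mathrm{kurt}(\Delta B_k) - (\mathrm{Var}(\Delta B_k))^2\big] \]
\[ = \sum_{k=0}^{n-1} \big[(\mathrm{Var}(\Delta B_k))^2 (3 - 1)\big] = \sum_{k=0}^{n-1} 2(\Delta t)^2 = \frac{2t^2}{n} \to 0 \quad \text{as } n \to \infty. \]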
It follows from [B]t = t and Property 6.14 that with probability 1, sample
paths of Brownian motion do not have bounded variation on [0, t] (nor on
any interval by the time-homogenous property of Brownian motion).
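The contrast between the two notions of variation shows up clearly in simulation. In the sketch below (an illustration with arbitrary seeds and partition sizes, not taken from the text), Brownian increments over a uniform partition of $[0, t]$ are drawn independently for each partition size; the sum of squared increments stays close to $t$, while the sum of absolute increments keeps growing as the partition is refined.

```python
import math
import random

def variations(t=1.0, n=1000, seed=0):
    """Return (sum of squared increments, sum of absolute increments)
    of a Brownian path sampled on a uniform partition of [0, t] with n pieces."""
    rng = random.Random(seed)
    dt = t / n
    increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    quadratic = sum(d * d for d in increments)
    total = sum(abs(d) for d in increments)
    return quadratic, total

if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        q, tv = variations(n=n)
        print(f"n={n:6d}  quadratic sum={q:.4f}  absolute sum={tv:.2f}")
```

For a uniform partition the expected absolute sum is $\sqrt{2nt/\pi}$, which grows without bound in $n$, in line with the unbounded-variation statement above.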
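which are defined on the same filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$ and adapted to the filtration $\{\mathcal{F}_t\}$. Let
\[ \Delta B^{(i)} = B_t^{(i)} - B_s^{(i)}, \quad \text{where } t > s \text{ and } i = 1, 2. \tag{6.45} \]
We say that $B^{(1)}$ and $B^{(2)}$ have correlation $\rho$ if, for $t > s \ge 0$,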
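Proof. Let
\[ X = a(B^{(1)} + B^{(2)}) \quad \text{with } a^2 = \frac{1}{2 + 2\rho}. \]
Thus,
\[ \frac{1}{2a^2} = 1 + \rho. \]
Step 1. We verify that $X$ is a standard Brownian motion as follows:
Let $\Delta X = a(\Delta B^{(1)} + \Delta B^{(2)})$, where $\Delta B^{(i)}$ are defined in (6.45). We establish
\[ \mathrm{Var}(\Delta X) = E(\Delta X^2) = E\big(a^2(\Delta B^{(1)} + \Delta B^{(2)})^2\big) \]
\[ = a^2 E\big((\Delta B^{(1)})^2 + 2(\Delta B^{(1)})(\Delta B^{(2)}) + (\Delta B^{(2)})^2\big) \]
\[ = a^2 \big[E((\Delta B^{(1)})^2) + 2E((\Delta B^{(1)})(\Delta B^{(2)})) + E((\Delta B^{(2)})^2)\big] \]
\[ = a^2 \big[\mathrm{Var}(\Delta B^{(1)}) + 2\,\mathrm{Cov}(\Delta B^{(1)}, \Delta B^{(2)}) + \mathrm{Var}(\Delta B^{(2)})\big] \]
\[ = a^2 [t - s + 2\rho(t - s) + t - s] \]
\[ = \frac{1}{2 + 2\rho}(2 + 2\rho)(t - s) = t - s, \]
By Property 6.16, $[X]_t = t$.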
Step 2. Applying Property 6.12, the symmetry and bilinearity of covariation
process yield
19The random n-vector X = [ X1 X2 . . . Xn ] is (or the random variables X1 , X2 , . . . , Xn are) said to have
a multivariate normal distribution if and only if all linear combinations of X1 , X2 , . . ., Xn are normally
distributed. When n = 2, X is said to have a bivariate normal distribution.
\[ (dt)^2 = d[t]_t \equiv 0 \]
The following three remarks provide some insights into the concept of quadratic
variation and are for the interested reader.
Remark 6.9. Recall from (6.33) on page 289 that the total variation of a real-valued function $f$ on an interval $[a, b]$ is denoted $V_a^b(f)$ and defined as
\[ V_a^b(f) = \lim_{|P| \to 0} \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)|, \]
where $P : a = x_0 < x_1 < x_2 < \cdots < x_n = b$ is a partition of $[a, b]$.
Let $f$ be a continuous function on $[a, b]$. We claim that if $f$ has finite total variation, then $[f]_a^b$, the quadratic variation of $f$, is identically equal to zero on $[a, b]$ (this is why quadratic variation is not defined for deterministic functions).
In fact, $V_a^b(f)$ being finite implies that $\exists M > 0$ such that
\[ \lim_{|P| \to 0} \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| \le M. \]
Since $f$ is continuous on the closed interval $[a, b]$, it is uniformly continuous on $[a, b]$: $\forall \varepsilon > 0$, $\exists N > 0$ such that $|f(x_{k+1}) - f(x_k)| < \varepsilon$ whenever $|x_{k+1} - x_k| < \frac{1}{N}$. Consequently,
\[ \sum_{k=0}^{n-1} (f(x_{k+1}) - f(x_k))^2 < \varepsilon \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| \le \varepsilon M, \]
Remark 6.10. A proof of Property 6.15 on page 293 is provided below for the interested reader.
For each fixed $\omega$, the sample path $X(t, \omega) = X_\omega(t)$ of $X$ is continuous on the closed interval $[0, T]$ and hence uniformly continuous on $[0, T]$: $\forall \varepsilon > 0$, $\exists N > 0$, such that
\[ |X_\omega(t_{k+1}) - X_\omega(t_k)| < \varepsilon \quad \text{whenever } |t_{k+1} - t_k| < \frac{1}{N}. \]
Consequently,
\[ \Big|\sum_{k=0}^{n-1} (X_\omega(t_{k+1}) - X_\omega(t_k))(f(t_{k+1}) - f(t_k))\Big| < \varepsilon \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)| \le \varepsilon\,V_0^T(f), \]
where $V_0^T(f)$ is the total variation of $f$ on $[0, T]$, which is finite due to the continuity of $f$. It follows that
\[ 0 \le |[X, f]_t| \le \lim \varepsilon \sum_{k=0}^{n-1} |f(t_{k+1}) - f(t_k)| \le \varepsilon\,V_0^T(f) \to 0. \]
Thus, $[X, f]_t = 0$.
When both $\mu(t)$ and $\sigma(t)$ are functions of $X(t)$ only, we replace $\mu(t)$ and $\sigma(t)$ by $\mu(X(t))$ and $\sigma(X(t))$ respectively and have
where f.t.v. and i.t.v. stand for finite total variation and infinite total variation
respectively. The second term on the right labeled by the infinite total variation
term is because, with probability 1, the total variation of a Brownian path on
any interval no matter how small is infinite. It is the infinite total variation term
that complicates our interpretation of the limiting procedure in the ordinary
sense.
6.7 Ito Integral: A Brief Introduction 299
This is why we impose the condition that $\sigma^2(t) = \sigma^2(X(t))$ in the diffusion equation be square integrable.
20 Brownian noise (also known as brown noise or red noise) is the kind of signal noise produced by
Brownian motion. Naturally, it is also called random walk noise as a Brownian motion can be viewed
as a limit of random walks. Note that a random walk noise is not a white noise (see Example 6.27).
21 That is, following an approximation procedure: Step 1. Divide the interval into finitely many subin-
tervals (the partition). Step 2. Construct a simple function (use step functions for intuition) that has a
constant value on each of the subintervals of the partition (the upper and lower sums). Step 3. Define
integrals of simple functions (simple processes are random step functions). Step 4. Take the limit of
these simple functions as more and more dividing points are added to the partition. If the limit exists,
where the first equal sign holds due to a property of conditional expectation (taking out $f_s$, which is known), the second equal sign holds due to the Markov property of Brownian motion, and the last equal sign holds due to the definition of Brownian motion. Thus, if we let $dB_s = B_{s+ds} - B_s$ (where $ds > 0$ is considered to be infinitesimal), then
\[ E(f_s\,dB_s \mid \mathcal{F}_s) = 0. \]
The Itô integral of the process $\{f_t\}$ with respect to the Brownian process $\{B_t\}$ is denoted by the process $\{Y_t\}$ and defined by
\[ Y_t = \int_0^t f_s\,dB_s = \lim_{n \to \infty} S_n, \tag{6.52} \]
\[ \lim_{n \to \infty} E((Y_t - S_n)^2) = 0. \]
$\{Y_t\}$ is called a process driven by Brownian motion $\{B_t\}$ and transformed by the integrand process $\{f_t\}$.
Before considering computational details, let us understand (6.52) concep-
tually:
it is called the Riemann integral and the function is called the Riemann integrable (Ito integrability
requires convergence in mean square).
Note that a simple function is a finite linear combination of indicator functions of measurable sets.
Thus all step functions are simple functions.
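\[ I_1 = \frac{1}{2} \sum_{i=1}^{n} (B_{i\Delta t} - B_{(i-1)\Delta t})^2, \]
\[ I_2 = \sum_{i=1}^{n} (B_{i\Delta t} - B_{(i-1)\Delta t})\,B_{(i-1)\Delta t}. \]
We obtain
\[ \int_0^t B_s\,dB_s = \frac{1}{2} B_t^2 - \frac{1}{2} t. \]
Equivalently,
\[ d(B_t^2) = 2B_t\,dB_t + dt. \]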
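The closed form for $\int_0^t B_s\,dB_s$ can be compared with a left-endpoint sum along one simulated path. The sketch below is a numerical illustration with an arbitrary seed and partition size; the key assumption, matching the sum $I_2$ above, is that the integrand is evaluated at the left endpoint of each subinterval.

```python
import math
import random

def ito_sum_b_db(t=1.0, n=10_000, seed=0):
    """Return (left-endpoint sum of B dB, terminal value B_t) for one path."""
    rng = random.Random(seed)
    dt = t / n
    b = 0.0
    total = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        total += b * db  # integrand taken at the left endpoint
        b += db
    return total, b

if __name__ == "__main__":
    t = 1.0
    total, b_t = ito_sum_b_db(t=t)
    print(f"left-endpoint sum: {total:.4f}")
    print(f"B_t^2/2 - t/2:     {0.5 * b_t ** 2 - 0.5 * t:.4f}")
```

The gap between the two printed numbers equals half the difference between $t$ and the sum of squared increments, so it shrinks as the partition is refined.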
where $\{\mu(X(t), t)\}$ is an adapted drift process, and $\{\sigma(X(t), t)\}$ is an adapted volatility process and square integrable. The adaptedness refers to the Brownian filtration (the set of information of the past history of the Brownian motion $B$).
An Itô process $X = \{X(t)\}$ is said to be an Itô diffusion if both the drift process and the volatility process are functions of $X(t)$ only. That is, $X$ has the expression
\[ X(t) = X(0) + \int_0^t \mu(X(s))\,ds + \int_0^t \sigma(X(s))\,dB(s), \quad 0 \le t \le T, \tag{6.54} \]
where $\mu(X(t))$ and $\sigma(X(t))$ are also known as the drift coefficient and diffusion coefficient of $X$, respectively.
Remark 6.12.
1. An Itô process is the sum of the initial state of the process and two integrals:
\[ X(t, \omega) = X(0, \omega) + I_1(t, \omega) + I_2(t, \omega), \quad 0 \le t \le T, \quad \text{where} \]
\[ I_1(t, \omega) = \int_0^t \mu(X(s, \omega), s)\,ds, \quad 0 \le t \le T, \]
\[ I_2(t, \omega) = \int_0^t \sigma(X(s, \omega), s)\,dB(s), \quad 0 \le t \le T. \]
For each fixed $\omega$, $I_1$ is an ordinary integral and $I_2$ an Itô integral. For (6.53) to be well defined, $\mu(X(t), t)$ must be integrable in the ordinary sense, and $\sigma(X(t), t)$ must be integrable in the stochastic sense as we defined in Section 6.7.2. The square integrability of $\sigma^2(X(t), t)$ ensures that the quadratic variation of the process $X$ is finite. In fact, a straightforward calculation shows that
\[ [X]_t = \int_0^t \sigma^2(X(u), u)\,du. \]
The detailed verification is left as an exercise for the reader.
2. Although the form of stochastic differential equations has the advantage of
being intuitive in modern financial theory, stochastic differential equations
acquire mathematical meanings only through their corresponding stochastic
integral equations. Equation (6.55) alone is not well defined, and dividing
both sides of the equation by dt is forbidden because the Brownian path is
non-differentiable.22
Example 6.36. Both familiar processes $X$ and $S$ with constants $\mu$ and $\sigma$ on page 278 defined by
\[ dX(t) = \mu\,dt + \sigma\,dB(t), \]
\[ dS(t) = \mu S(t)\,dt + \sigma S(t)\,dB(t), \]
respectively, are Itô diffusions.
We emphasize that both s.d.e.'s should be understood as defined by their corresponding s.i.e.'s:
\[ X(t) = X(0) + \int_0^t \mu\,ds + \int_0^t \sigma\,dB(s), \]
\[ S(t) = S(0) + \int_0^t \mu S(u)\,du + \int_0^t \sigma S(u)\,dB(u). \]
22 With probability 1 the Brownian path is non-differentiable in the ordinary sense, but in the context of generalized stochastic processes, $\frac{dB}{dt}$ is well defined as a generalized function on an infinite dimensional space, which is a topic beyond the scope of this book.
Itô's lemma is the tool of the trade in continuous-time stochastic process modeling.
Remark 6.13.
1. Note that $f(x, t)$ in Theorem 6.1 is a smooth function, and that
\[ \mu_Y = f_t + \mu f_x + \frac{1}{2}\sigma^2 f_{xx}, \]
\[ \sigma_Y = \sigma f_x \]
are the drift and volatility processes, respectively, for the new process $Y$.
2. In words, Itô's lemma says that Itô processes are stable under smooth maps in the sense that any smooth function maps (sends) an Itô process $X$ in terms of its driving process $B$, drift $\mu_X$, and volatility coefficient $\sigma_X$ to another Itô process in terms of its driving process $B$, drift $\mu_Y$, and volatility coefficient $\sigma_Y$. In short, a smooth function of an Itô process is an Itô process.
3. There are different versions of Itô's lemma (e.g., Itô's lemma for jump-diffusion processes, whereas Theorem 6.1 is for Brownian motions), which are widely employed in modern financial theory. The best known application of Itô's lemma is in the derivation of the Black-Scholes-Merton equation for option values, which will be introduced in a later chapter.
6.8 Itos Formula for Brownian Motion 305
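\[ f_t = \mu f(B(t), t) = \mu S(t), \]
\[ f_x = \sigma f(B(t), t) = \sigma S(t), \]
\[ f_{xx} = \sigma^2 f(B(t), t) = \sigma^2 S(t). \]
implies $\mu + \frac{1}{2}\sigma^2 = r - q$. We obtain $\mu = r - q - \frac{1}{2}\sigma^2$.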
Solution 2.
Step 1. Identify the given process $X$ and smooth function $f$ in Itô's lemma:
Keep s.d.e. (6.56) in mind.
\[ f_t = 0, \quad f_x = f_{xx} = f(X(t)) = S(t). \]
Step 4. The answer to the question is $\mu + \frac{1}{2}\sigma^2$, which is the same as the one we obtained earlier.
Using (6.58)
\[ dY = \underbrace{\Big(f_t + \mu f_x + \frac{1}{2}\sigma^2 f_{xx}\Big)}_{\mu_Y} dt + \underbrace{(\sigma f_x)}_{\sigma_Y}\,dB, \]
What is the (conditional) expected growth rate of the stock at any given time $t \le T$?
Solution 1.
Since $X(t) = \frac{1}{2}t^2 + 2B(t)$ is an Itô process with $\mu_X = t$ and $\sigma_X = 2$,
\[ Y = f(X(t)) = e^{\frac{1}{2}t^2 + 2B(t)} = S(t), \quad \text{where } f(x) = e^x \]
is also an Itô process. Note that the conditions of Itô's lemma must be verified before the lemma can be applied.
Applying Corollary 6.1, we obtain
\[ E(dS(t) \mid \mathcal{F}_t) = E(dY \mid \mathcal{F}_t) = \Big(f_t + \mu f_x + \frac{1}{2}\sigma^2 f_{xx}\Big) dt \]
\[ = (0 + tS(t) + 2S(t))\,dt \]
\[ = (2 + t)S(t)\,dt. \]
23 Itô's lemma ensures that $Y$ in (6.58) is an Itô process. Thus, both the drift process $\mu_Y$ and the volatility process $\sigma_Y$ are adapted to the Brownian filtration by definition.
            dt      dB_t
    dt      0       0
    dB_t    0       dt
which implies that $(dB_t)^m \to 0$ as $dt \to 0$ for $m \ge 2$.
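For an arbitrary partition of the interval $[0, t]$: $0 = t_0 < t_1 < t_2 < \cdots < t_n = t$,
\[ f(B_t) - f(B_0) = \sum_{i=1}^{n} \big(f(B_{t_i}) - f(B_{t_{i-1}})\big) \]
\[ = \sum_{i=1}^{n} f'(B_{t_{i-1}})(B_{t_i} - B_{t_{i-1}}) + \frac{1}{2} \sum_{i=1}^{n} f''(B_{t_{i-1}})(B_{t_i} - B_{t_{i-1}})^2 + \sum_{i=1}^{n} o\big((B_{t_i} - B_{t_{i-1}})^2\big) \]
\[ = I_1 + I_2 + I_3, \]
where
\[ I_1 = \sum_{i=1}^{n} f'(B_{t_{i-1}})(B_{t_i} - B_{t_{i-1}}) \to \int_0^t f'(B_u)\,dB_u, \]
\[ I_2 = \frac{1}{2} \sum_{i=1}^{n} f''(B_{t_{i-1}})(B_{t_i} - B_{t_{i-1}})^2 \to \frac{1}{2} \int_0^t f''(B_u)\,du, \]
\[ I_3 = \sum_{i=1}^{n} o\big((B_{t_i} - B_{t_{i-1}})^2\big) \to 0, \]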
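\[ = \left[\frac{\partial^2 g(B(t), t)}{\partial x\,\partial t}\,dB + \frac{1}{2} \frac{\partial^3 g(B(t), t)}{\partial x^2\,\partial t}\,dt\right] dt + \frac{\partial g(B(t), t)}{\partial t}\,dt \]
\[ + \frac{1}{2} \frac{\partial^2 g(B(t), t)}{\partial x^2}\,dt + \frac{\partial g(B(t), t)}{\partial x}\,dB, \]
where we apply (6.59) to the inside of the last brackets with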
Those on the sell side of the securities industry (e.g., market makers) and policy makers (e.g., the Federal Reserve) usually work with the risk-neutral probability measure.
In the risk-neutral world, if we ignore dividends, the (conditional) expectation of the stock returns must be equal to the risk-free rate. To interpret this statement in mathematical language, let us recall (6.55) with $\mu(t, X(t)) = \mu(t)$ and $\sigma(t, X(t)) = \sigma(t)$:
Thus, in the risk-neutral world (i.e., under the risk-neutral probability measure $Q$),
\[ dX(t) = r(t)\,dt + \sigma(t)\,dB^Q(t), \tag{6.62} \]
where $B^Q$ is a $Q$-Brownian motion. This $Q$ is called the risk-neutral measure because the expected appreciation rate of the log return on the stock is identical to the risk-free rate despite the presence of risk in the form of $\sigma(t)\,dB^Q(t)$.
Note that under our consideration, the drift parameter $\mu(t)$, scale parameter $\sigma(t)$, and the risk-free rate $r(t)$ are all constants. Thus, we write
Recall the definition of the Sharpe ratio (i.e., the market price of risk from Section 4.2.1 on page 166), written $\mathcal{S}$, and we have
\[ \mathcal{S} = \frac{\mu - r}{\sigma} \iff \mu - r = \sigma \mathcal{S}. \]
Taking the difference of (6.61) and (6.62) yields
\[ dB^Q(t) = \mathcal{S}\,dt + dB^P(t) = \frac{\mu - r}{\sigma}\,dt + dB^P(t), \tag{6.63} \]
where $B^Q$ and $B^P$ are Brownian motions under probability measures $Q$ and $P$, respectively. Notice that (6.63) connects $P$ and $Q$ only implicitly (an explicit connection will be given shortly). Nevertheless, since (6.63) is equivalent to
\[ B^Q(t) = \frac{\mu - r}{\sigma}\,t + B^P(t), \tag{6.64} \]
$B^P$ is adapted if and only if $B^Q$ is adapted to the same filtration.
Since $E(dB(t) \mid \mathcal{F}_t) = 0$ by the martingale property whenever $B$ is adapted to the filtration $\{\mathcal{F}_t\}$, it follows that
which is equivalent to
\[ E_P\Big(\frac{dS(t)}{S(t)} \,\Big|\, \mathcal{F}_t\Big) - E_Q\Big(\frac{dS(t)}{S(t)} \,\Big|\, \mathcal{F}_t\Big) = E_P\Big(\frac{dS(t)}{S(t)} \,\Big|\, \mathcal{F}_t\Big) - r\,dt. \tag{6.65} \]
In words, (6.65) says that the difference between the (conditional) expected
log returns over time period [t, t + dt] (of a stock) under real-world and risk-
neutral probability measures is the risk premium.
Example 6.40. Equality (6.65) assumes that the stock pays no dividend. How
should it be modified if a dividend-paying stock is under consideration?
Relation (6.64)
\[ B^Q(t) = \frac{\mu - r}{\sigma}\,t + B^P(t) \]
changes Brownian motion with no drift to Brownian motion with drift, which can be described by the simplest version of the Girsanov theorem for a single Brownian motion.
The Girsanov theorem, also referred to as the Cameron-Martin-Girsanov theo-
rem, is a tool of changing probability measures. Changes of probability mea-
sures can be used for changes of the expectation of a random variable, which
in turn can be used for security pricing in finance, particularly for derivative
pricing. After all, establishing a probability measure in practice may not al-
ways be done in an objective way. It is desirable to look at the effect of different
probability measures on expectations.
Theorem 6.2. (Girsanov Theorem for a Single Brownian Motion with Drift) Let $B = \{B(t)\}$ be a standard Brownian motion on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$, where $\{\mathcal{F}_t\}$ is the natural filtration of $B$. Let $\{\theta(t)\}$ be an adapted process to $\{\mathcal{F}_t\}$ satisfying $E\big(e^{\frac{1}{2}\int_0^T (\theta(s))^2\,ds}\big) < \infty$ (Novikov's condition). For each $t \in [0, T]$, define
1. $D(t) = e^{-\int_0^t \theta(s)\,dB(s) - \frac{1}{2}\int_0^t (\theta(s))^2\,ds}$,
2. $W(t) = B(t) + \int_0^t \theta(s)\,ds$,
3. a measure $Q$ with $\frac{dQ}{dP} = D(T)$.
Then $\{D(t)\}$ is a martingale under $P$ and $\{W(t)\}$ is a standard Brownian motion under $Q$.24
\[ \frac{dQ}{dP} = D \iff \frac{dQ}{dP}(\omega) = D(\omega), \]
\[ Q(A) = \int_A D(\omega)\,dP(\omega) \quad \text{for each } A \in \mathcal{F}. \]
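For a constant $\theta$ the change of measure can be explored by Monte Carlo. The sketch below is illustrative only and assumes the constant-$\theta$ forms $D(T) = e^{-\theta B(T) - \frac{1}{2}\theta^2 T}$ and $W(T) = B(T) + \theta T$; the sample size, seed, and parameter values are arbitrary. Samples of $B(T)$ are drawn under $P$, and expectations under $Q$ are computed as $P$-expectations weighted by $D(T)$. One expects $E_P(D(T)) \approx 1$, $E_Q(W(T)) \approx 0$, and $E_Q(W(T)^2) \approx T$.

```python
import math
import random

def girsanov_check(theta=0.5, T=1.0, samples=100_000, seed=0):
    """Return Monte Carlo estimates of (E_P[D], E_Q[W], E_Q[W^2]) at time T,
    using weights D(T) = exp(-theta*B(T) - 0.5*theta^2*T)."""
    rng = random.Random(seed)
    sum_d = sum_dw = sum_dw2 = 0.0
    for _ in range(samples):
        b = rng.gauss(0.0, math.sqrt(T))                   # B(T) under P
        d = math.exp(-theta * b - 0.5 * theta ** 2 * T)    # density D(T)
        w = b + theta * T                                  # W(T) = B(T) + theta*T
        sum_d += d
        sum_dw += d * w
        sum_dw2 += d * w * w
    return sum_d / samples, sum_dw / samples, sum_dw2 / samples

if __name__ == "__main__":
    mean_d, mean_w, second_moment_w = girsanov_check()
    print(f"E_P[D(T)]   ~ {mean_d:.4f}")
    print(f"E_Q[W(T)]   ~ {mean_w:.4f}")
    print(f"E_Q[W(T)^2] ~ {second_moment_w:.4f}")
```

Under $P$ itself, $W(T)$ has mean $\theta T$; the weighting by $D(T)$ is what removes the drift.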
24 (a) Let $X_t = -\int_0^t \theta(s)\,dB(s)$. A straightforward computation leads to $D(t) = e^{X_t - \frac{1}{2}[X]_t}$ and $W(t) = B(t) - [B, X]_t$. (b) If $\theta(t) = \theta$ is constant, then $D(t) = e^{-\theta B(t) - \frac{1}{2}\theta^2 t}$ and $W(t) = \theta t + B(t)$.
25 Given a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$ and a probability measure $Q$ defined on the measurable space $(\Omega, \mathcal{F})$, $Q$ is said to be absolutely continuous with respect to the probability measure $P$ if any $A \in \mathcal{F}$ with $P(A) = 0$ implies $Q(A) = 0$ (i.e., every $P$-null event is a $Q$-null event). This is one of the reasons we prefer another filtration (infinitesimally larger than the natural filtration) for the Brownian motion (see the footnote on page 289) and work with a complete probability space.
\[ dX(t) = \mu\,dt + \sigma\,dB(t). \]
The conditions of Itô's lemma are satisfied if we let $f(x, t) = e^{-rt}x$ and $S(t)$ satisfy the last equation. We apply Itô's lemma to the deflated (discounted) stock price, $Y(t) = e^{-rt}S(t)$, and obtain
27 This is significant because martingales model a fair game, and security valuation is the determina-
tion of the fair price of a security.
6.9 Geometric Brownian Motion 315
\[ \int_0^t dX(s) = \int_0^t \mu\,ds + \int_0^t \sigma\,dB(s) \]
yields
\[ X(t) = x_0 + \mu t + \sigma B(t), \]
$\{X(t)\}$ is a Brownian motion with drift and scaling if and only if
Fig. 6.3 Geometric Brownian motion with drift parameter $\mu = 0.15$ and volatility parameter $\sigma = 0.3$ is illustrated using 5,000 randomly selected sample paths over the interval [0, 1]. The current time is 0 and $S(0) = 1$. At time $t = 1$, the frequency distribution of $S(1)$ is shown as a histogram, which approximates a lognormal distribution. The solid curve shows the plot of the expected value $E(S(t)) = e^{(0.15 + \frac{1}{2}(0.3)^2)t}$, where $t \ge 0$
That is, $X(t) = \ln\frac{S(t)}{S(0)}$ is a normal random variable. Let $S(0) = S_0$ be a positive number. Note that the nonnegative random variables
Notice that $E(S(t))$ and $\mathrm{Var}(S(t))$ are determined by (6.74) and (6.75), respectively (which are not $\mu t$ and $\sigma^2 t$).
Property 6.8, 3 on page 282 indicates
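That is,
\[ E\Big(\frac{S(t)}{S_0}\Big) = e^{(\mu + \frac{1}{2}\sigma^2)t} \quad \text{and} \quad E\Big(\Big(\frac{S(t)}{S_0}\Big)^2\Big) = e^{2(\mu + \sigma^2)t}. \tag{6.76} \]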
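The two moments in (6.76) can be compared with sample averages. The following sketch is a rough check, assuming $S(t)/S_0 = e^{X(t)}$ with $X(t) \sim N(\mu t, \sigma^2 t)$ as in the text; the parameter values are those of Fig. 6.3, and the sample size and seed are arbitrary.

```python
import math
import random

def gbm_moments(mu: float, sigma: float, t: float):
    """Closed-form first and second moments of S(t)/S0 from (6.76)."""
    first = math.exp((mu + 0.5 * sigma ** 2) * t)
    second = math.exp(2.0 * (mu + sigma ** 2) * t)
    return first, second

def gbm_moments_mc(mu, sigma, t, samples=100_000, seed=0):
    """Monte Carlo estimates of the same moments, sampling X(t) ~ N(mu*t, sigma^2*t)."""
    rng = random.Random(seed)
    sum1 = sum2 = 0.0
    for _ in range(samples):
        ratio = math.exp(rng.gauss(mu * t, sigma * math.sqrt(t)))
        sum1 += ratio
        sum2 += ratio * ratio
    return sum1 / samples, sum2 / samples

if __name__ == "__main__":
    print(gbm_moments(0.15, 0.3, 1.0))
    print(gbm_moments_mc(0.15, 0.3, 1.0))
```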
Remark 6.15. Since stock prices are never negative, it is more reasonable to
use geometric Brownian motions to model stock price dynamics than to use
Brownian motions as the latter may take on negative values.
Similar arguments apply to a comparison between the binomial tree model
and random walk model (the latter, in fact, converges to a Brownian motion;
see Theorem 6.3 on page 321).
Recall the Brownian motion with drift and scaling represented by (6.71)
Recall (6.72) and consider geometric Brownian motion $S(t) = S_0 e^{X(t)}$. We obtain
\[ \frac{S(t)}{S(s)} = \frac{e^{X(t)}}{e^{X(s)}} = e^{X(t) - X(s)} \sim \text{log-normal}(\mu(t - s), \sigma^2(t - s)). \tag{6.77} \]
Given $t > 0$ and $S(0) = S_0$, let $P$ be a partition of $[0, t]$ into subintervals of equal length:
\[ P : 0 = t_0 < t_1 < t_2 < \cdots < t_n = t \quad \text{where } t_i = \frac{it}{n},\ i = 0, 1, \ldots, n. \]
To establish a binomial tree model, we let
Given the stock price at current time, S(0) = S0 , the evolution of a stock price
process from time 0 to time t governed by each of the two different models is
illustrated in the table below.
                              t_1      t_2      t_3      ...  t_{n-1}      t
Binomial tree model forecast  S_1      S_2      S_3      ...  S_{n-1}      S_n
GBM model forecast            S(t_1)   S(t_2)   S(t_3)   ...  S(t_{n-1})   S(t)
Again, $\{S(t), t \ge 0\}$ is a geometric Brownian motion with parameters $(\mu, \sigma)$ given in (6.72). Our goal is to find a relation between the binomial tree model and the geometric Brownian motion model when the partition $P$ becomes finer and finer.
Recall that the parameters $u$, $d$, and $p$ in the binomial tree model satisfy the relations
\[ 0 < d < 1 + r < u, \quad 0 < p < 1, \quad ud = 1. \tag{6.78} \]
To achieve our goal, we look for specific values of these parameters $u$, $d$, and $p$ that can be expressed in terms of $\mu$ and $\sigma$ and that satisfy the relations in (6.78). To do so, we take four steps:
Step 1. Let us recall (6.3) on page 261 and write
\[ Y_i = Y_i(\omega) = \begin{cases} u & \text{if } \omega = U \text{ with probability } p \\ d & \text{if } \omega = D \text{ with probability } 1 - p, \end{cases} \quad \text{where } i = 1, 2, \ldots, n. \tag{6.79} \]
Thus
\[ S_i = S_0 Y_1 \cdots Y_i, \quad i = 1, 2, \ldots, n. \tag{6.80} \]
Note that the $Y_i$'s are independent and identically distributed, and straightforward computation yields
\[ Y_i = \frac{S_{t_i}}{S_{t_{i-1}}}, \quad i = 1, 2, \ldots, n, \tag{6.81} \]
\[ E(Y_i) = pu + (1 - p)d, \tag{6.82} \]
\[ E((Y_i)^2) = pu^2 + (1 - p)d^2. \tag{6.83} \]
Step 3. Setting the expectations in (6.82) and (6.84) equal, with corresponding $i$, yields28
\[ pu + (1 - p)d = e^{(\mu + \frac{1}{2}\sigma^2)\frac{t}{n}}, \]
which is equivalent to
\[ p = \frac{e^{(\mu + \frac{1}{2}\sigma^2)\frac{t}{n}} - d}{u - d}. \]
Setting the expectations in (6.83) and (6.85) equal, with corresponding $i$, yields
\[ pu^2 + (1 - p)d^2 = e^{2(\mu + \sigma^2)\frac{t}{n}}. \]
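The two moment equations above, together with $ud = 1$ from (6.78), determine $u$, $d$, and $p$. The sketch below solves them exactly; it is one possible route and not necessarily the one taken in the text's remaining step. Writing $a$ and $b$ for the right-hand sides of the first- and second-moment equations, eliminating $p$ and using $ud = 1$ gives $a(u + d) = b + 1$, so $u$ is the larger root of $u^2 - cu + 1 = 0$ with $c = (b + 1)/a$. The function name and the sample parameters are illustrative.

```python
import math

def binomial_parameters(mu: float, sigma: float, t: float, n: int):
    """Solve p*u + (1-p)*d = a, p*u^2 + (1-p)*d^2 = b, u*d = 1 for (u, d, p),
    where a and b are the GBM moments of S(t_i)/S(t_{i-1}) over a step t/n."""
    dt = t / n
    a = math.exp((mu + 0.5 * sigma ** 2) * dt)   # target first moment
    b = math.exp(2.0 * (mu + sigma ** 2) * dt)   # target second moment
    c = (b + 1.0) / a                            # equals u + d when u*d = 1
    u = (c + math.sqrt(c * c - 4.0)) / 2.0       # larger root, so u > 1
    d = 1.0 / u
    p = (a - d) / (u - d)
    return u, d, p

if __name__ == "__main__":
    u, d, p = binomial_parameters(mu=0.15, sigma=0.3, t=1.0, n=100)
    print(f"u={u:.6f}, d={d:.6f}, p={p:.6f}")
```

The discriminant is positive because $c \ge 2\sqrt{b}/a = 2e^{\frac{1}{2}\sigma^2 t/n} > 2$ whenever $\sigma > 0$.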
A random walk consists of a succession of random steps, which means that ei-
ther the direction or size of each step (or both) is chosen at random. To make
a mathematical formalization of this notion, we associate each step with a ran-
dom variable, say Ri . Thus, the dynamics of the walk are governed by the
distribution of these random variables.
Construction of Brownian motion from simple symmetric random walk.
Recall Example 6.27 on page 280. In mathematical language, a stochastic process $\{W_i\}$ on $(\Omega, \mathcal{F}, P)$ is said to be a one-dimensional random walk (or random walk on $\mathbb{Z}$) if
\[ W_i = W_0 + \sum_{k=1}^{i} R_k = W_{i-1} + R_i, \quad i = 1, 2, \ldots \]
\[ W_0 = 0 \quad \text{and} \quad R_i = \begin{cases} 1 & \text{with probability } p \\ -1 & \text{with probability } 1 - p, \end{cases} \quad i = 1, 2, \ldots, \]
i.e., a walk starting from its mean 0 with unit step size only. A (one-dimensional) simple random walk is symmetric if $p = \frac{1}{2}$, i.e., a walk in which each step is equally likely to be in any of the possible directions (for a one-dimensional simple symmetric walk, it means that moving to the left is as likely to occur as moving to the right at each step).
6.10 BM as a Limit of Simple Symmetric RW
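with $\mu_{RW} = E(R_i) = 0$ and $\sigma_{RW}^2 = \mathrm{Var}(R_i) = 1$.
Apply the central limit theorem to obtain
$$\frac{1}{\sqrt{m}}\, S_m^{(n)} = \frac{1}{\sigma_{RW}\sqrt{nm}} \left( \sum_{i=1}^{nm} R_i - nm\,\mu_{RW} \right) \xrightarrow{\;d\;} N(0, 1) \quad \text{as } n \to \infty,$$
and observe
$$S_m^{(n)} = \frac{\sqrt{m}}{\sigma_{RW}\sqrt{nm}} \left( \sum_{i=1}^{nm} R_i - nm\,\mu_{RW} \right) \xrightarrow{\;d\;} N(0, m) \quad \text{as } n \to \infty.$$
Notice that different scalings lead to different limits and $B(m) \sim N(0, m)$.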
This observation makes the next theorem plausible.
For each positive integer $n$, we define a continuous-time stochastic process
$W^{(n)} = \big\{ W_t^{(n)} : t \ge 0 \big\}$ by
$$W_t^{(n)} = \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} R_i, \tag{6.89}$$
where by convention $\lfloor nt \rfloor$ is the greatest integer less than or equal to $nt$, and
the random variables $R_i$ are defined in Example 6.41.
In words, (6.89) defines the rescaled random walk with steps of size $\frac{1}{\sqrt{n}}$
taken every $\frac{1}{n}$ time units (i.e., steps are taken at time $i/n$, $i = 1, 2, \dots, \lfloor nt \rfloor$).
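To make the rescaling in (6.89) concrete, here is a minimal simulation sketch. It assumes symmetric steps of +1 or -1, each with probability 1/2; the function names, the seed, and the sample sizes are hypothetical choices made for illustration only.

```python
import math
import random

def rescaled_walk(steps, n, t):
    """Return W_t^(n): the sum of the first floor(n*t) steps divided by sqrt(n)."""
    k = math.floor(n * t)
    return sum(steps[:k]) / math.sqrt(n)

def sample_variance_at(t, n, trials, seed=0):
    """Monte Carlo estimate of Var(W_t^(n)) for symmetric +/-1 steps."""
    rng = random.Random(seed)
    k = math.floor(n * t)
    values = []
    for _ in range(trials):
        steps = [rng.choice((-1, 1)) for _ in range(k)]
        values.append(rescaled_walk(steps, n, t))
    mean = sum(values) / trials
    return sum((v - mean) ** 2 for v in values) / (trials - 1)

# For large n the estimate should be close to t, the variance of B(t).
print(sample_variance_at(t=2.0, n=400, trials=2000))
```

The estimate is random, so only its closeness to t is meaningful, not its exact digits.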
Theorem 6.3. Standard Brownian motion can be approximated by the rescaled random walk:
$$B(t) \overset{d}{=} \lim_{n \to \infty} \frac{1}{\sqrt{n}} \sum_{i=1}^{\lfloor nt \rfloor} R_i, \quad t \ge 0.$$
Intuitively, the theorem implies that Brownian motion has a microscopic
structure that may emerge from a random walk. Although Example 6.41 might
seem to suggest a proof of this theorem by applying the (ordinary version of
the) central limit theorem, a proper proof requires the functional central
limit theorem, whose technicalities are beyond the scope of this book. We
refer the reader to the literature (e.g., Billingsley [2]).
There are many more nice discussions in the literature related to the topics
presented in this chapter, e.g., [1, 5, 6, 7, 8, 10, 12, 13, 14, 16, 18, 19, 21, 23, 24,
26, 27, 28].
6 Stochastic Calculus and Geometric Brownian Motion Model
6.11 Exercises
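$\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6, \omega_7, \omega_8\}$, where
$\omega_1 = UUU$, $\omega_2 = UUD$, $\omega_3 = UDU$, $\omega_4 = UDD$,
$\omega_5 = DUU$, $\omega_6 = DUD$, $\omega_7 = DDU$, $\omega_8 = DDD$,
$\{\omega_1, \omega_2, \omega_3, \omega_4\} \in \mathcal{F}_2$.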
6.7. Let $B = \{B(t)\}$ be standard Brownian motion. What is the probability that
$B(1)$ lies between $-1$ and $1$?
6.8. Let $X = \{X_t\}$ and $Y = \{Y_t\}$ be two processes. Verify the following identity:
$$\Delta(X_t Y_t) = X_t\,\Delta Y_t + Y_t\,\Delta X_t + \Delta X_t\,\Delta Y_t.$$
dB(t) dt = 0.
What is the expected growth rate of the stock at any given time t?
Compute E (dY |Ft ) and Var(dY |Ft ), where {Ft } is the Brownian filtration
described in Definition 6.30. Indicate properties of conditional expectation that
you applied.
where $\mu$ and $\sigma > 0$ are constants. Let $Y(t) = e^{-rt} S(t)$. Use the Itô product rule
(see (6.40) on page 292) to compute $dY(t)$.
6.15. Continue from the last exercise. Is {Y (t)} defined in Exercise 6.14 a mar-
tingale under a risk-neutral probability measure?
6.16. Compute $\int_0^t B(s)\,dB(s)$ first, then express the s.i.e. in the form of s.d.e.
6.22. Given a standard Brownian motion B(t), show that each of following
stochastic processes is also a standard Brownian motion:
a) $X(t) = \frac{1}{\sqrt{c}}\, B(ct)$ for all constants $c > 0$.
b) $Y(t) = B(t + c) - B(c)$ for all constants $c > 0$.
6.23. Let $\{B(t)\}$ be a standard Brownian motion. Let $t_1, t_2, \dots, t_n \in (0, \infty)$ with
$0 < t_1 < t_2 < \cdots < t_n$. Show that the random vector (or multivariate random
variable) $(B(t_1), B(t_2), \dots, B(t_n))$ has a multivariate normal distribution for
any fixed choice of $n$ time points $0 < t_1 < t_2 < \cdots < t_n$, $n \ge 1$.
6.24. Continue from the last exercise. Show that the joint probability density
function of (B(t1 ), B(t2 ), . . . , B(tn )) is
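$$f(x_1, \dots, x_n) = \frac{\exp\left\{ -\frac{1}{2} \left[ \frac{x_1^2}{t_1} + \sum_{j=1}^{n-1} \frac{(x_{j+1} - x_j)^2}{t_{j+1} - t_j} \right] \right\}}{\left(\sqrt{2\pi}\right)^{n} \sqrt{t_1 (t_2 - t_1) \cdots (t_n - t_{n-1})}}.$$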
k
$S_n = Z_1 + Z_2 + \cdots + Z_n$ and $X_n = S_n^2 - n\sigma^2$.
6.27. Let W = {Wi } be a random walk defined in Example 6.27. Show that W
is not a white noise.
6.29. Show that $B_t^2 - [B]_t$ is a martingale with respect to the filtration generated
by the Brownian motion itself. That is, $B$ is adapted to its natural filtration
$\{\mathcal{F}_t\}$, where $\mathcal{F}_t = \sigma(\{B_s,\ s \le t\})$.
$$df(X) = f_1(X)\,dX_1 + f_2(X)\,dX_2 + \frac{1}{2}\left[ f_{11}(X)\,(dX_1)^2 + 2 f_{12}(X)\,dX_1\,dX_2 + f_{22}(X)\,(dX_2)^2 \right], \tag{6.90}$$
where $f_i = \frac{\partial f}{\partial X_i}$ and $f_{ij} = \frac{\partial^2 f}{\partial X_i\,\partial X_j}$, $i, j = 1, 2$.$^{29}$
6.33. Prove the Itô product rule. (Hint: prove (6.39) on page 292 by applying
the two-dimensional Itô's lemma given in Exercise 6.32.)
6.34. An investment in a foreign asset carries exchange risk. The model under
consideration was introduced by Briys and Solnik [4] in their study of hedging
such risk.
Let V (t) be the local (domestic) currency value of a foreign asset at time t.
Let S(t) be the exchange rate at time t expressed as the local currency value
of one unit of foreign currency (e.g., 1.11 USD/Euro). The model assumes that
both {V (t)} and {S(t)} are geometric Brownian motion processes:
$$\frac{dV}{V} = \mu_V\,dt + \sigma_V\,dB_V$$
$$\frac{dS}{S} = \mu_S\,dt + \sigma_S\,dB_S,$$
where the two standard Brownian motion processes $B_V$ and $B_S$ have correlation $\rho_{VS}$.
Let $V^{*} = VS$, the value of the foreign investment expressed in domestic currency.
Compute $\frac{dV^{*}}{V^{*}}$ and interpret your answer. (Hint: apply the Itô product rule.)
References
29 More precisely speaking, we assume that $B_1$ and $B_2$ are defined on the same filtered probability
space $\{\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P}\}$ and adapted to the filtration $\{\mathcal{F}_t\}$.
[8] Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge Uni-
versity Press, Cambridge (2010)
[9] Durrett, R.: Brownian Motion and Martingales in Analysis. Wadsworth
Advanced Books and Software, Belmont (1984)
[10] Epps, T.: Pricing Derivative Securities. World Scientific, Singapore (2007)
[11] Fama, E.: Efficient capital markets: a review of theory and empirical work.
J. Finance 25(2), 383 (1969)
[12] Girsanov, I.V.: On transforming a certain class of stochastic processes by
absolutely continuous substitution of measures. Theory Probab. Appl. 5,
285 (1960)
[13] Harrison, J.M., Pliska, S.R.: Martingales and stochastic integrals in the
theory of continuous trading. Stoch. Process. Appl. 11(3), 215 (1981)
[14] Hull, J.: Options, Futures, and Other Derivatives, 7th edn. Pearson Pren-
tice Hall, Upper Saddle River (2009)
[15] Hunt, P., Kennedy, J.: Financial Derivatives in Theory and Practice. Wiley
Series in Probability and Statistics. Wiley, New York (2004)
[16] Itô, K.: Multiple Wiener integral. J. Math. Soc. Japan 3, 157–169 (1951)
[17] Karatzas, I., Shreve, S.: Brownian Motion and Stochastic Calculus.
Springer, New York (1991)
[18] Korn, R., Korn E.: Option Pricing and Portfolio Optimization. American
Mathematical Society, Providence (2001)
[19] McKean, H.: Stochastic Integrals. Academic, New York/London (1969)
[20] Malz, A.: A Simple and Reliable Way to Compute Option-Based Risk-
Neutral Distributions. Federal Reserve Bank of New York Staff Reports
(2014)
[21] Mikosch, T.: Elementary Stochastic Calculus with Finance in View. World
Scientific, Singapore (1998)
[22] Mörters, P., Peres, Y.: Brownian Motion. Cambridge University Press,
Cambridge (2010)
[23] Musiela, M., Rutkowski, M.: Martingale Methods in Financial Modelling.
Springer, New York (2004)
[24] Neftci, S.: An Introduction to the Mathematics of Financial Derivatives.
Academic, San Diego (2000)
[25] Paley, R., Wiener, N., Zygmund, A.: Note on some random functions.
Math. Z. 37, 647–668 (1933)
[26] Roman, S.: Introduction to the Mathematics of Finance, 1st edn. Springer,
New York (2004)
[27] Shreve, S.: Stochastic Calculus for Finance II: Continuous-Time Models.
Springer, New York (2004)
[28] Wilmott, P., Dewynne, N., Howison, S.: Mathematics of Financial Deriva-
tives: a Student Introduction. Cambridge University Press, Cambridge
(1995)
Chapter 7
Derivatives: Forwards, Futures, Swaps, and Options
1 These assumptions are often either unreasonable or unrealistic from a practical point of view (e.g.,
time t is assumed to be continuous, but in reality it is discrete).
Derivatives are securities in the form of contracts between two parties, the
buyer and the seller. Since these contracts are either for contingent claims or for
forward commitments, derivatives can be classified into contingent claims and
forward commitments (or noncontingent claims) according to the type of contracts.
A contingent claim is a contract which gives the buyer the right, but not the
obligation, to buy or sell a security, called underlying security or underlier, at a
specified price, called strike price, on or before a specified date, called expiration
date. Examples of contingent claims are options.2
A forward commitment is a contract by which the buyer and seller have the
obligation to buy or to sell (to deliver) an underlying security at a predetermined
price on a specified future date, called the delivery date. Examples of forward
commitments are forwards and futures.
Derivative contracts can be created on and traded in exchanges such
as the Chicago Board of Trade (CBOT), which is the world's oldest futures
and options exchange, or on OTC markets, which are less transparent than
exchanges. For example, a futures contract is considered to be a standardized
version of a forward commitment and is therefore traded on CBOT, whereas a
forward contract can be customized to any commodity, amount, and delivery
date and, therefore, is nonstandardized and traded on OTC markets.
2 Later we will show that the definition of options coincides with the definition of contingent claim in
the financial dictionary: a claim that can be made only if one or more specified outcomes occur.
7.1 Derivative Securities: An Overview
Keep in mind that derivative contracts can be written on real assets such
as commodities3 as well as on financial assets such as bonds, stocks, currencies,
and other derivatives. Derivative contracts written on precious metals or agri-
cultural products are called commodity derivatives, whereas those written on
securities are called financial derivatives.
Although we will primarily focus on stock options and have not yet pro-
vided all the details in the financial jargon used in this section, the next exam-
ple will help the reader to understand Remark 7.1.
Example 7.2. The premium of an option on futures tracks the price of its under-
lying futures contract which, in turn, tracks the price of the underlying cash. In
fact, regardless of the underlying security, it is a fairly safe bet that the futures
price will generally converge to the spot price of the underlying security as
the delivery month of a futures contract approaches because otherwise there
would be arbitrage opportunities.
For instance, the July copper option tracks the July copper futures contract.
The March S&P 500 index option follows the March S&P 500 index futures.
Furthermore, the prices of these futures converge to the spot prices of copper
and S&P 500 index with high probability at least in their delivery months.
Derivatives may serve three basic functions, which are price discovery, specu-
lative activity, and hedging activity:
Price discovery is a process involving buyers and sellers arriving at a transac-
tion price for a given product with given quality and quantity at a given time
in a given location. Although they are interrelated, price discovery and price
valuation are different concepts. The former is a mechanism, whereas the lat-
ter is a determination. Price discovery plays an important role in economic
decision-making that involves either entrepreneurs or policy makers.
Speculators are those who take calculated risks in the hope of making large
short-term profits. By definition, speculators are usually not interested in hold-
ing possession of the underlying securities. They are typically sophisticated
investors with expertise in the markets in which they are trading who usually
use highly leveraged investments such as futures and options.
Hedgers are those who take steps to reduce the risk of an investment by mak-
ing an offsetting investment. By definition, hedgers do not usually seek a profit
but rather seek to stabilize the performance of their portfolios or the revenues
3A commodity is a raw material used in commerce or primary agricultural product that can be bought
and sold such as copper, silver, crude oil, natural gas, wheat, beef cattle, and coffee.
or costs of their business operations. Their gains or losses are usually offset to
some degree by a corresponding loss or gain in the market for the underlying
securities.
Example 7.3. Price discovery begins with market price information. Imagine
how difficult business decision-making would be in the extreme scenario in
which no one knows at what price competitors have sold a given product.
When more varieties of a product are sold through more venues, more mar-
ket price information becomes available. Derivatives serve this purpose by
providing a wider variety of assets through many venues, including cash
settlements.
Solution. Note that the equivalence between the no-arbitrage condition and the
existence of the risk-neutral probability measure implies that $p = \frac{1}{2}$ (which is
the solution to the equation $50 = 60p + 40(1 - p)$).
Suppose that we sold one derivative contract and need to buy $\Delta$ shares of
the underlying stock to hedge away any portfolio risk, where the portfolio
consists of a long position in $\Delta$ shares of the stock and a short position in one
derivative contract. Let $\Pi(T)$ denote the value of the portfolio at time $T$. Since
$$C(T) = \max\{S(T) - 50,\ 0\} = \begin{cases} \$10 & \text{if } S(T) = \$60 \\ 0 & \text{if } S(T) = \$40, \end{cases}$$
we obtain
$$\Pi(T) = \begin{cases} 60\Delta - 10 & \text{if } S(T) = 60 \\ 40\Delta & \text{if } S(T) = 40, \end{cases}$$
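The hedging argument can be checked numerically. The sketch below assumes a one-period model with a zero interest rate (which is what the equation 50 = 60p + 40(1 - p) implies) and a portfolio that is long the hedge ratio in shares and short one derivative; the function and variable names are hypothetical.

```python
def one_period_hedge(s_up, s_down, payoff_up, payoff_down, s0):
    """Hedge ratio, risk-neutral probability, and price in a one-period model.

    Assumes a zero interest rate, so a riskless portfolio keeps its value.
    """
    # Shares needed so that the portfolio is worth the same in both states.
    delta = (payoff_up - payoff_down) / (s_up - s_down)
    # Risk-neutral probability solving s0 = s_up * p + s_down * (1 - p).
    p = (s0 - s_down) / (s_up - s_down)
    # Common terminal value of (delta shares - one derivative).
    portfolio_T = delta * s_up - payoff_up
    # With zero interest the portfolio is worth portfolio_T today as well,
    # so the derivative premium is delta * s0 - portfolio_T.
    price = delta * s0 - portfolio_T
    return delta, p, portfolio_T, price

delta, p, portfolio_T, price = one_period_hedge(60.0, 40.0, 10.0, 0.0, 50.0)
print(delta, p, portfolio_T, price)
```

The premium obtained from the hedge agrees with the risk-neutral expectation p * 10 + (1 - p) * 0, which is a useful consistency check.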
7 One can think of a call option (see Definition 7.8). It is defined in a later section, but this example
should be readily understandable.
for otherwise, there exists $t_0 < T$ such that $\Pi(t_0) < 0$ and $\Pi(T) \ge 0$ with probability 1,
contradicting the no-arbitrage assumption!
The next example shows that no arbitrage is a sufficient condition of the law
of one price. Equivalently, the law of one price is a necessary condition of no
arbitrage.
Example 7.8. Show that the law of one price holds if there are no arbitrage
portfolios.
Proof. Suppose otherwise, i.e., there exist two portfolios A and B and some
$t_0 < T$ such that $\Pi_A(T) = \Pi_B(T)$ and $\Pi_A(t_0) \ne \Pi_B(t_0)$.
Without loss of generality, say $\Pi_A(t_0) < \Pi_B(t_0)$. Then we construct a portfolio
$\Pi$ by long portfolio A and short portfolio B, written $\Pi = \Pi_A - \Pi_B$, which
leads to
is defined by $\Pi_A - \Pi_B$ an arbitrage?
Remark 7.2. (Law of One Price) The approaches to constructing an arbitrage
demonstrated in Examples 7.8 and 7.9 provide a general guideline which will
be applied throughout this chapter. Combining these two examples, we con-
clude the following:
If each of two investment portfolios produces a deterministic stream of cash
flows as indicated below,
                          $t = 0$       $t = T$
cash flows from $\Pi_A$   $\Pi_A(0)$    $\Pi_A(T)$
cash flows from $\Pi_B$   $\Pi_B(0)$    $\Pi_B(T)$
then
Remark 7.3.
1. Recall that under the continuous-time framework, the investors are allowed
to trade up to time $T < \infty$. Let $S(t) = (S_0(t), S_1(t), S_2(t), \dots, S_N(t))$ be a
market, where price process $\{S_i(t)\}$ is defined on filtered probability space
$(\Omega, \mathcal{F}, \mathbb{P}, \{\mathcal{F}_t \subseteq \mathcal{F} : 0 \le t \le T\})$ with $\mathcal{F}_T = \mathcal{F}$ for each $i = 0, 1, 2, \dots, N$.
Suppose that positions of a portfolio $P$ respectively on securities
$S_0(t), S_1(t), S_2(t), \dots, S_N(t)$
Also, let us denote by $P_n(t)$ the process of the value of a portfolio $P$ constructed
by trading strategy $n$. Then
$$P_n(t) = n(t) \cdot S(t) = \sum_{i=0}^{N} n_i(t)\, S_i(t).$$
The law of one price is said to hold if there do not exist two trading strategies,
say $n$ and $n'$, such that $P_n(T) = P_{n'}(T)$ but $P_n(t) \ne P_{n'}(t)$ for some $t < T$.
An interpretation of this more precise version of the law of one price along
with Example 7.8 is that arbitrage opportunities exist when the prices of similar
assets are set at different levels.
2. A relationship between no arbitrage and the equivalent martingale measure
is given by the First Fundamental Theorem of Asset Pricing, which states: The
market is arbitrage free if and only if there exists an equivalent martingale measure.
(See page 314.)
3. The non-arbitrage assumption is a reasonable one for financial theory:
In the real world, arbitrage opportunities do exist, but they are only transient
because, once more investors jump in to share the free lunch, soon the free
lunch will be over. The market price will be adjusted and move from an old
equilibrium to a new one.
7.2 Forwards
There is no payment by either party when the contract is first entered into.10
Thus the value of a forward contract at the time the contract is entered into
is zero. The delivery date or expiry is also called the exercise date or maturity
or expiration date, the time at which the asset changes hands. The seller is also
called the writer of the contract. The forward price is also called the exercise
price.
The spot market or cash market or physical market is a financial market where
assets are traded for cash and immediately delivered on spot (e.g., the stock
10 In old times, people would most often shake hands when agreeing on deals.
Example 7.10. It is a well-known fact that grain prices may swing substantially
between highs and lows.
In order to secure a smooth wheat supply, a flour mill A enters into a for-
ward contract with a farmer B on June 1 to buy 100 (metric) tons of wheat at
$222 per ton on September 30.
To become familiar with the terminologies in Definition 7.3, we identify that
in this contract, the buyer is A, the seller is B, the underlying asset is wheat,
the forward price is $222, the expiration date is September 30, the contract size
is 100, and the delivery price is $222 as well (which happens to be the same as
the forward price).
On June 1, A and B sign the contract and shake hands. No money changes
hands. However, on September 30, A will pay $22,200, irrespective of the price
of wheat in the spot market, and B will deliver 100 tons of wheat to the flour
mill.
Both parties A and B are bound by the contract and have to honor their
commitments.
Let
t = 0 be the time the forward contract is entered into (e.g., June 1),
t = T be the expiration of the forward contract (e.g., September 30),
S(t) be the spot price of the underlier in the forward contract at time t.
contract on July 10 to buy 100 tons of wheat on September 30. Would X be able
to negotiate with any counterparty to get the same forward price, $222? Of
course not!
If we denote July 10 by time $t_1$, in this new contract, denoted by $\mathcal{F}_T(t_1)$, the
forward price $F_T(t_1)$ would be much higher than $F_T(0)$.
Notice that two forward contracts from the last two examples have the same
terms except the forward price and the initial time of the contract.
To emphasize the fact that forward prices are functions of time, we denote
by $F_T(t)$ the forward price.
For convenience,
we define the terminal payoff of a forward contract to be the payoff from a
long forward contract, that is,
$$S(T) - F_T(0). \tag{7.1}$$
The payoff from a short forward contract is the negative of the value given by (7.1):
$$F_T(0) - S(T).$$
Definition 7.4. A forward payoff diagram is a graph of the terminal payoff from
long position of a forward contract as a function of the underlier price at T.
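As a small illustration of the terminal payoff, the sketch below evaluates the long and short payoffs S(T) - F_T(0) and F_T(0) - S(T) for the wheat contract of Example 7.10 (100 tons at a forward price of $222 per ton). The spot prices at expiration are hypothetical, as are the function names.

```python
def long_forward_payoff(spot_at_expiry, forward_price, size=1.0):
    """Terminal payoff of a long forward: size * (S(T) - F_T(0))."""
    return size * (spot_at_expiry - forward_price)

def short_forward_payoff(spot_at_expiry, forward_price, size=1.0):
    """Terminal payoff of a short forward: the negative of the long payoff."""
    return -long_forward_payoff(spot_at_expiry, forward_price, size)

# Hypothetical spot prices of wheat on September 30, in dollars per ton.
for spot in (200.0, 222.0, 250.0):
    print(spot,
          long_forward_payoff(spot, 222.0, 100),
          short_forward_payoff(spot, 222.0, 100))
```

Plotting the long payoff against the spot price gives the straight line through the forward price that a forward payoff diagram displays.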
The spot-forward parity (see Section 7.2.3) for underliers with continuously
paid cash dividends is useful when the underliers are stock indexes containing
many stocks,11 since such an index can be modeled as the dividend being paid
continuously at a rate that is proportional to the level of the index.
To obtain such a spot-forward parity in the next section, let us now recall
(6.29) on page 281 from Section 6.4.3: D(t) = S(t) q dt, where S(t) is the spot
price of the underlier of a forward contract and q is the (annualized) dividend
yield of the underlier. In the rest of this chapter, we will use the familiar results
on page 281:
• The dividend reinvestment yields $e^{qT} - 1$ more units at time $T$ from 1 unit
of the underlier at time 0,
• The dividend reinvestment yields $1 - e^{-qT}$ more units at time $T$ from $e^{-qT}$
units of the underlier at time 0,
and the following assumptions: Unless stated otherwise,
1. By an asset with annual dividend yield $q$, we mean an asset that pays a
constant, continuous, proportional annual dividend yield $q$ that is continuously
reinvested to buy more units of the asset, and that holding the asset neither
incurs a cost of carry nor provides any other convenience yield.
2. r is the risk-free interest rate compounded continuously.
11 We leave out theoretical arguments such as whether a no-arbitrage condition can be verified
when the underlier is the S&P 500 Index.
We note that the assumption of cost of carry12 and convenience yield13 other
than the dividend yield in the next theorem is not a reasonable one.
Theorem 7.1. Suppose that a forward contract entered into at time 0 with expiration
$T$ is on an asset with annual dividend yield $q$ (see page 341) and that $r$ is the risk-free
interest rate. Then the fair forward price is given by
$$F_T(0) = S(0)\, e^{(r - q)T}. \tag{7.2}$$
In words, (7.2) says that under an arbitrage-free assumption, buying the for-
ward contract and taking delivery is equivalent to buying the underlying asset
from its spot market today and holding in the sense that the cost of both strate-
gies must have the same present value.14
Proof.
Case 1. If $F_T(0) < S(0)\, e^{(r-q)T}$, we construct two portfolios A and B, denoted
by $\Pi_A$ and $\Pi_B$, respectively, with positions established at time 0 indicated
below:
$\Pi_A$: short 1 forward.
$\Pi_B$: short $e^{-qT}$ units of the asset to long bond$^{15}$ with $S(0)e^{-qT}$ at rate $r$.
We denote by $\Pi_A(t)$ the value of portfolio A and $\Pi_B(t)$ the value of portfolio
B at time $t$. Then the initial and terminal values of two portfolios are
12 The cost of carry is the cost incurred by holding the underlying asset such as storage costs or insur-
at time T can be interpreted as "lend \$$S(0)e^{-qT}$ at interest rate r." Similarly, a position of "short zero"
here would be interpreted as "borrow \$$S(0)e^{-qT}$ at interest rate r." Using the bond terminologies will
provide us convenience (e.g., for letter expression of long or short position) later.
Example 7.14. Suppose that the current spot price of an asset paying dividends
continuously is $222, the interest rate is r = 3%, and the dividend yield is q = 2%.
1. What are the one-month and seven-month forward prices for the asset in an
arbitrage-free market?
2. Let Π be a portfolio on time interval [0, T] consisting of three positions starting
from time 0: borrow $222 at the rate 3%, long 1 unit of the asset, and
short the three-month forward. Is Π an arbitrage portfolio?
In order to answer the second part of the question, one needs to fill out the table
below first.
                     At time 0    At time T
Cash flows from Π
The detailed work is left as an exercise for the reader.
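For the first part of Example 7.14, the forward prices follow from the spot-forward parity F_T(0) = S(0) e^{(r-q)T} that appears in the proof of Theorem 7.1 and in Example 7.15. The sketch below is only an illustration; it assumes that r and q are continuously compounded annual rates and that a month is 1/12 of a year, and the function name is hypothetical.

```python
import math

def forward_price(spot, r, q, T):
    """Spot-forward parity with continuous dividend yield: S(0) * exp((r - q) * T)."""
    return spot * math.exp((r - q) * T)

# Data of Example 7.14: spot $222, r = 3%, q = 2%.
one_month = forward_price(222.0, 0.03, 0.02, 1 / 12)
seven_month = forward_price(222.0, 0.03, 0.02, 7 / 12)
print(round(one_month, 2), round(seven_month, 2))
```

Because r exceeds q here, both forward prices lie slightly above the spot price, and the longer expiration gives the higher price.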
Example 7.15. Let $F = F(S, t) = F_T(t) = S(t)\, e^{(r-q)(T-t)}$. Suppose that the
underlier's price follows a geometric Brownian motion:
$$dS = \mu S\,dt + \sigma S\,dB.$$
$$dF = (\mu - r + q)\, F\,dt + \sigma F\,dB.$$
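The dynamics of the forward price in Example 7.15 can be obtained from Itô's lemma applied to $F(S, t) = S\, e^{(r-q)(T-t)}$. The following worked derivation is a sketch under the assumption that the drift and volatility of the underlier are the constants $\mu$ and $\sigma$:

```latex
\begin{aligned}
\frac{\partial F}{\partial t} &= -(r - q)\, S\, e^{(r-q)(T-t)} = -(r - q)\, F, \qquad
\frac{\partial F}{\partial S} = e^{(r-q)(T-t)}, \qquad
\frac{\partial^2 F}{\partial S^2} = 0, \\
dF &= \frac{\partial F}{\partial t}\, dt + \frac{\partial F}{\partial S}\, dS
      + \frac{1}{2} \frac{\partial^2 F}{\partial S^2}\, (dS)^2 \\
   &= -(r - q)\, F\, dt + e^{(r-q)(T-t)} \left( \mu S\, dt + \sigma S\, dB \right) \\
   &= (\mu - r + q)\, F\, dt + \sigma F\, dB .
\end{aligned}
```

Since $F$ is linear in $S$, the second-order term vanishes, so the forward price is again a geometric Brownian motion with the same volatility and drift $\mu - r + q$.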
Remark 7.4.
1. Using the language of stochastic calculus that we introduced in the last
chapter and only considering the forward contracts that can be replicated
by self-financing trading strategies, the forward price can be expressed by
Both forwards must be written on the same asset and have the same expira-
tion and contract size. Recalling the definition of a (long) forward payoff from
(7.1) on page 339, we obtain
Difference between two forward payoffs $= S(T) - F_T(0) - (S(T) - F_T(t))$
$= F_T(t) - F_T(0)$.
To discount this difference to get its value at the current time, we obtain the
current (market) value of the (long) forward contract that was entered into at
time 0 in an arbitrage-free market:
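Theorem 7.2. The value of a (long) forward contract at time $t$ entered into at time 0
with expiration at time $T$, denoted by $\mathcal{F}_T(t)$, is given by
$$\mathcal{F}_T(t) = \big( F_T(t) - F_T(0) \big)\, e^{-r(T-t)}.$$
$$\mathcal{F}_T(0) = 0,$$
$$\mathcal{F}_T(T) = F_T(T) - F_T(0) = S(T) - F_T(0) = \text{terminal payoff of forward}.$$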
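The two boundary values above can be checked with a short computation. The sketch assumes that the value of the long forward is the difference of forward prices F_T(t) - F_T(0) discounted at the continuously compounded rate r over the remaining time T - t, and that forward prices obey the parity S(t) e^{(r-q)(T-t)}; the function names and numbers are hypothetical.

```python
import math

def forward_price(spot, r, q, tau):
    """Forward price for time to expiration tau: spot * exp((r - q) * tau)."""
    return spot * math.exp((r - q) * tau)

def long_forward_value(spot_t, f0, r, q, t, T):
    """Value at time t of a long forward entered at time 0 with forward price f0."""
    ft = forward_price(spot_t, r, q, T - t)    # forward price quoted at time t
    return (ft - f0) * math.exp(-r * (T - t))  # discounted price difference

# Hypothetical data: spot 222 at time 0, r = 3%, q = 2%, six-month contract.
r, q, T = 0.03, 0.02, 0.5
f0 = forward_price(222.0, r, q, T)
value_at_start = long_forward_value(222.0, f0, r, q, 0.0, T)
value_at_expiry = long_forward_value(230.0, f0, r, q, T, T)
print(value_at_start, value_at_expiry, 230.0 - f0)
```

The value is zero at inception and equals the terminal payoff S(T) - F_T(0) at expiration, in line with the two special cases listed in the theorem.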
7.3 Futures
Forward trading has survived for a few hundred years. The forward contract
was created to stabilize grain prices by farmers on top of the already central-
ized grain trade.
In 1848, the Chicago Board of Trade (CBOT) was formed, and trading was
originally in forward contracts.
Although the forward contract is customized to meet the user's special
needs, it is illiquid due to the lack of market exposure to potential buyers
or sellers,16 and it carries counterparty risk, the risk that a counterparty
fails to meet its obligations (e.g., defaults on a payment or even goes bankrupt).
Thus, counterparties must check each other's creditworthiness before a forward
contract is entered into. This is why the end users of forward contracts
are mainly big institutions.
Illiquidity and counterparty risk are the inherent limitations of forward
contracts. A way to overcome these shortcomings is to standardize forwards.
The standardized forward contract is called the futures contract. In 1972, the
Chicago Mercantile Exchange (CME) started to offer futures contracts.
To mitigate or even remove credit risks, regulations made in accordance
with laws impose mark-to-market and daily settlement on futures traders'
margin accounts (to be explained shortly). The daily balance on such an
account is calculated based on the settlement price defined by an exchange.
To increase liquidity, regulations made in accordance with laws impose stan-
dardizations on terms of futures contracts including what can be delivered,
when it can be delivered, how it can be delivered, where it can be delivered,
16 To compare to a familiar environment, simply consider, in a housing market, the difference between
the market exposure of "for sale by owner" and that of "for sale by real-estate agency." Generally
speaking, the bigger the market exposure, the higher the level of liquidity.
Example 7.17. A forward contract allows a farmer to sell 1234 bushels of wheat
next February, whereas a futures contract does not because the size of one
futures contract is 5000 bushels and the expiration months are March, May,
July, September, and December. However, the farmer does not expose himself
to any credit risk by entering into a futures contract.
The definitions of futures price, long position, short position, expiration date,
etc. are parallel to their counterparts for forwards. For example, the futures price
is the agreed price of an asset in a futures contract.
A futures contract is similar to a forward contract except that contractors
deal with a third party, i.e., an exchange, rather than each other. By doing so,
the inherent credit risk of a forward contract is mitigated because both parties
must meet margin account requirements under the mark-to-market accounting
rule, which makes a futures contract like a sequence of daily forward contracts
until maturity.
Although by definition futures and forward contracts are similar in terms
of the final results, the mark-to-market accounting rule requires futures con-
tractors to settle up daily (or make daily settlement) through the exercise day,
whereas by contrast there is no settlement for forward contracts until the exer-
cise day.
Since margin account requirements and the mark-to-market accounting rule
differentiate the futures contract from the forward contract, to understand fu-
tures we begin with explaining these two concepts. To do so intuitively, we
illustrate margin account requirements and compute the daily margin balance
in the example below.
Before trading a futures contract, the prospective trader must deposit funds
with a broker. This deposit serves as a performance bond and is referred to as
the initial margin, the level of which is based on a function of the price volatility
of the underlier (e.g., a commodity).
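The effect of marking to market on a margin account can be sketched as follows. This is a simplified illustration, not an exchange's actual procedure: it assumes one contract, credits or debits the account each day by the change in the settlement price times the contract size, and leaves out margin calls and any other exchange rules. The prices, the initial margin, and the function name are hypothetical; the contract size of 5000 bushels is the one quoted in Example 7.17.

```python
def daily_margin_balances(initial_margin, settlement_prices, contract_size, long=True):
    """Margin balance after each daily settlement for one futures contract.

    settlement_prices[0] is the price at which the position was opened.
    """
    sign = 1.0 if long else -1.0
    balance = initial_margin
    balances = []
    for prev, curr in zip(settlement_prices, settlement_prices[1:]):
        # The daily gain or loss is settled in cash through the margin account.
        balance += sign * (curr - prev) * contract_size
        balances.append(balance)
    return balances

prices = [5.00, 5.10, 4.95, 5.05]  # hypothetical settlement prices, $ per bushel
print(daily_margin_balances(2000.0, prices, 5000))
print(daily_margin_balances(2000.0, prices, 5000, long=False))
```

The long and short accounts move by equal and opposite amounts each day, which is how daily settlement keeps losses from accumulating unpaid until expiration.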
17 Futures contracts allow fewer delivery options than forward contracts.
We let $\widehat{F}_T(t)$ be the futures price at time $t$ and $\widehat{\mathcal{F}}_T(t)$ be the futures value at time
$t$ and close the futures section by the following remark.
18 Precisely speaking, the futures price in the second column should be the daily settlement price or
simply settlement price, which is defined by the exchange. There are different types of settlement pro-
cedures. Each derivative exchange has a set of procedures used to calculate the settlement price. Mar-
gin requirements are based on the daily settlement price, not the daily closing price. For our purpose,
we consider the settlement price to be essentially the closing price on that day.
Remark 7.5.
1. Since the daily settlement makes a futures contract like a sequence of daily
forward contracts until maturity, and the value of a forward is zero at the
initial time, $\widehat{\mathcal{F}}_T(t) = 0$ for $t < T$ (although the terminal payoff of a futures
contract is the same as that of its forward counterpart), the futures value is
different from the value of its forward counterpart (as the latter need not
equal zero except at the time the contract is entered into).
2. Because of the daily settlement and the exchange-traded nature of a futures
contract, $\widehat{F}_T(t)$, a futures price for delivery of an asset at time $t < T$, is an
agreed price between the trader and the exchange (i.e., a price determined
by the exchange) and, therefore, behaves more like a market price. It
has been proved that if future interest rates are deterministic, then under
the no-arbitrage assumption, $\widehat{F}_T(t) = F_T(t)$. Otherwise the equality may not
hold. More description about the behavior of $\widehat{F}_T(t)$ can be found in the
literature (e.g., Hull [11]).
For more impact of daily mark to market on futures contracts such as the
correlation between the futures price movements and interest rate move-
ments, we refer the reader to the literature (e.g., Cox, Ingersoll, and Ross [4]
and Duffie and Stanton [8]).
3. For the following reasons:
• Futures have lower transaction costs, since the evolution from OTC-traded
forwards to exchange-traded futures encourages liquidity;
• With futures, it is equally easy to go short as to go long (a convenience
inherited from forwards);
• The futures value being kept at zero (i.e., $\widehat{\mathcal{F}}_T(t) = 0$) allows a futures
contractor (either buyer or seller) to close out his position at any time;
it is easier to use futures to hedge, and particularly easier to go short in the futures
market than in the spot market because of the uptick rule19 imposed on
the spot market.
7.4 Swaps
Swap contracts are basically a series of cash-settled forward contracts that re-
quire action to be taken by investors periodically over time.
The tailor-made structures of swaps create a wide variety of financial instru-
ments that financial institutions trade in order to hedge against risk.
19 The SEC requires that every short sale transaction be entered at a price that is higher than the price
of the last trade.
These cash flows are most commonly the interest payments associated with
debt service. Financial institutions and companies dominate the swaps market
with almost no individual participation.
By definition, swaps are basically sequential cash-settled forward contracts
that require action to be taken by the counterparties on periodic dates. Conse-
quently, the initial value of a swap should be zero.
20 The name plain vanilla swap reflects that those swaps do not possess any special or unusual features.
Other than the above categories of swaps, there are also credit default swaps
(CDS), which can be used as a protection against credit loss, commodity swaps,
which can be used by commodity producers to manage their exposure to price
fluctuations, and other variations of swaps, since tailor-made swap structures
create a wide variety of financial instruments that financial institutions trade
in order to hedge against risk.
Among all different types and variations of swaps, the most widely used are
interest rate swaps, which are used mostly to reduce borrowing costs:
1. Based on changes in its long-term or short-term assets and its credit rating,
an institution with an existing debt service obligation faces higher than ex-
pected borrowing costs because of a change in its initial interest rate outlook.
To avoid these costs, the institution wants to swap to a different exposure
(see Example 7.20).
2. To receive lower borrowing costs than those available by directly accessing
the fixed-rate or floating-rate markets, two institutions may work together
by exploiting their comparative advantages as borrowers in different mar-
kets and swapping the proceeds (see Example 7.21).
The interest rate swaps market is the largest and fastest growing financial
derivative market in the world.
It is worth noting that in the next two examples, we will display only
the mechanics of interest rate swaps and ignore credit risk differences, even
though counterparty credit risks are very important for swap traders to
evaluate.
Example 7.20. (Plain Vanilla Swap) Consider the scenario of two companies A
and B. Company A has taken a loan at a six-month LIBOR21 plus 1% floating,22
but now would like to have the loan at a fixed rate of 3% since the company
expects interest rates to rise above 3% soon, whereas company B has a loan at
a fixed annual rate of 3% and expects interest rates to drop below 3% soon.
Since paying off an old loan and applying for a new one, or refinancing a loan,
can be costly and, owing to regulations, document-intensive, companies A and B
decide to enter into a swap contract, which is a much simpler way to exchange
fixed and floating interest payments.
Assume that the terms of the swap contract include the following:
The notional principal is two million dollars,
The life of the contract is 4 years,
A pays B six-month LIBOR + 1%,
Example 7.21. (Mechanics of Interest Rate Swaps) Suppose that both com-
panies X and Y need to borrow US dollars and that company X would like to
borrow at fixed rate, whereas company Y would like to borrow at floating rate.
The cost to each company of accessing either the fixed rate or the floating rate
market for a new debt issue is given below23:
Given the differences in rates indicated in the table above, companies X and
Y realize that they could achieve a combined 100 basis point (i.e., 1%) savings
and decide to enter a swap with a swap bank24 B.
23 Company Y enjoys a lower borrowing cost in both markets because we assume that company Y
has a better credit rating than company X.
24 A swap bank is a generic term for a financial institution that facilitates swaps between counterparties.
We claim that X, Y, and B can all benefit financially if X borrows at the
floating rate, Y borrows at the fixed rate, X and Y swap interest payments,
and B as the intermediary charges 0.1% of the notional principal. To verify our
claim, we demonstrate the simple mechanics of the swap with the
calculations below:
1. X pays 4.9% fixed to B and B pays 4.8% fixed to Y, and
2. Y pays LIBOR floating to B and B pays the same LIBOR floating to X.
3. The borrowing cost for X after swapping proceeds becomes at the fixed rate
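The flows in items 1 and 2 can be tallied numerically. The following Python sketch is illustrative only: the swap legs (4.9%, 4.8%, and LIBOR) come from the example, while the LIBOR level and the direct borrowing terms `x_floating_spread` and `y_fixed_rate` are hypothetical placeholders and are not the values of the rate table.

```python
# Net annual rates (as fractions of the notional principal) for the
# intermediated swap of Example 7.21. The swap legs are from the example;
# the borrowing terms passed in below are hypothetical placeholders.

def swap_net_costs(libor, x_floating_spread, y_fixed_rate):
    """Return (net cost to X, net cost to Y, margin earned by bank B)."""
    # X borrows floating at LIBOR + spread, pays 4.9% fixed to B,
    # and receives LIBOR from B.
    x_net = (libor + x_floating_spread) + 0.049 - libor
    # Y borrows fixed, receives 4.8% fixed from B, and pays LIBOR to B.
    y_net = y_fixed_rate - 0.048 + libor
    # B receives 4.9% and LIBOR, and pays 4.8% and LIBOR.
    b_margin = (0.049 + libor) - (0.048 + libor)
    return x_net, y_net, b_margin


libor = 0.02  # hypothetical LIBOR level
x_net, y_net, b_margin = swap_net_costs(
    libor=libor,
    x_floating_spread=0.005,  # hypothetical: X borrows at LIBOR + 0.5%
    y_fixed_rate=0.045,       # hypothetical: Y borrows fixed at 4.5%
)
print(f"X pays a fixed rate of {x_net:.2%}")
print(f"Y pays LIBOR {y_net - libor:+.2%}")
print(f"B earns {b_margin:.2%} of the notional principal")
```

Whatever the LIBOR level, the floating legs cancel for X and for B, so X ends up with a fixed-rate cost and B keeps the 0.1% difference between the two fixed legs.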
Remark 7.6.
1. One variation of a swap is the variance swap, an OTC instrument
that allows one to speculate on or hedge risks associated with volatility. Vari-
ance swaps on major stock indexes such as the S&P 500 are actively traded.
2. Since an interest rate swap contract is basically a series of forward contracts,
intuitively, its value is zero at the time the swap is entered into (although
some of these forward contracts may have nonnegative values and others
may have negative values, the sum of all of them is zero) and may change
over time due to the change of interest rates. For the buyer of the swap,
the party who receives fixed and pays floating, the value of the swap is
positive if the fixed rate is greater than the floating rate; the value of the
swap is zero if the fixed rate equals the floating rate; and the value of the
swap is negative if the fixed rate is less than the floating rate.
Since the floating rates cannot be observed beforehand, the valuation of an
interest rate swap may involve calculations of forward rates based on a yield
curve in addition to present value techniques.
Generally speaking, the pricing and valuation of swaps can be complex and
require rigorous financial analysis to derive fair values. We refer the reader
to the literature on this subject.
7.5 Options
The premium of an option contract is the amount that the buyer has to pay
and the seller (or writer) receives at the time when both parties enter into the
contract.
The specified asset is called the underlier or underlying asset of the option.
The specified date is called the expiration or maturity (date) of the option.
The specified price is called the strike price or exercise price of the option.
Naturally, an expiration date is also called an exercise date.
The specified quantity is called the contract size, which denotes how much of
the underlying asset will change hands if the option is exercised, where to exercise
an option means to put into effect the right in the option contract.
25 We mean the European-style option (to be defined shortly), which is the basic option style from
which other styles of options derive.
Option Styles
Option Types
The basic bread-and-butter options fall into one of the following two types,
designated according to buying and selling rights:
Definition 7.8.
1. A call contract, or a call option or simply a call, is an option contract which
grants the buyer the right to buy a specified quantity of an underlying asset
on or by an expiration date T at a strike price K. The payoff is
\(\max\{S(T) - K, 0\}\) when exercised at expiration T.
2. A put contract, or a put option or simply a put, is an option contract which
grants the buyer the right to sell a specified quantity of an underlying asset
26 The Chicago Board Options Exchange (CBOE), a spin off from the Chicago Board of Trades, first
for the same stock in practice. Almost all exchange-traded stock options are American-style options,
whereas stock index options can be issued as either American or European options (e.g., S&P 100
index options are American options, and Nasdaq 100 index options are European options).
29 Option pricing done by the Black-Scholes-Merton model applies to European options, not to Amer-
ican options, and reflects the risk associated with having to wait to exercise the option, which is not
appropriate for American options because of the possibility of early exercise.
A call option is called a call because the buyer (owner) has the right to call
the underlying asset away from the seller (writer). A put option is called a
put because the buyer (owner) has the right to put the underlying asset to
the seller (writer). The payoff may be considered as the option value at exercise.
Clearly, an option is a financial instrument with nonnegative value at any time
(for it involves no obligation prior to expiration).
Calls and puts are also referred to as vanilla options.30
Although in reality stock options are American options and American op-
tions are more useful, all the examples in this section will be confined to Eu-
ropean options hypothetically on stocks. They are easier to understand and
give the background needed for studying the option pricing models in Chap-
ter 8 and for following related literature beyond. The next example provides
an intuition about what a call option actually means.
In words,
The terminal payoff of a call, C (S( T ), T ), is the value of the call C (S(t), t)
at expiration T, which is its market value at expiration.
In a similar fashion, if we denote by P(S, t) the value of a European put option
at time t, then at expiration T,
\[
P(S, T) = \max\{K - S(T), 0\} =
\begin{cases}
K - S(T) & \text{if } S(T) < K \\
0 & \text{if } S(T) \ge K.
\end{cases}
\tag{7.8}
\]
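The terminal payoffs of a long call and a long put translate directly into code. The following Python sketch is only an illustration; the function names and the strike of 50 are assumptions made for the example.

```python
def call_payoff(s_T, strike):
    """Terminal payoff of a long European call: max{S(T) - K, 0}."""
    return max(s_T - strike, 0.0)


def put_payoff(s_T, strike):
    """Terminal payoff of a long European put, as in (7.8): max{K - S(T), 0}."""
    return max(strike - s_T, 0.0)


K = 50.0  # hypothetical strike price
for s_T in (40.0, 50.0, 60.0):
    print(f"S(T) = {s_T:5.1f}  call payoff = {call_payoff(s_T, K):5.1f}  "
          f"put payoff = {put_payoff(s_T, K):5.1f}")
```

At most one of the two payoffs is positive for any terminal price, and both vanish when S(T) = K.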
Option contracts are defined by their terms, which are standardized by the
exchange on which the option is listed. In the next two examples, we explain
standardized contract size and expiration dates for practical purposes.
Example 7.24. (Option Contract Size) For equity options (underliers are
stocks), the contract size (also called the option trading unit or multiplier) is
100. In other words, one contract controls 100 shares of the underlying stock.
Suppose that you want to purchase a call on XYZ stock with strike price
$50 and premium $1.50. Then you will have to pay $150 for the right to buy
100 shares of XYZ stock in the contract. Note that, in practice, you would also
have to pay commissions to your broker.
For standard index options (underliers are stock indexes), the contract size
is also 100. In other words, the notional value underlying each contract equals
$100 multiplied by the index value.
However, for mini options, the contract size is 10 (representing 10 shares of
an underlier). For example, 10 Mini-SPX options equal 1 SPX full value con-
tract. That is, the notional value underlying each mini SPX contract equals $10
multiplied by the S&P 500 index value.31
Example 7.25. (Option Contract Expiration Date) Equity and index options
expire at 4 pm EDT on the third Friday32 of the expiration month in the sense
that they no longer trade; however, the official expiration day is the Saturday
immediately following that Friday.
Knowing the month in which the option you want to purchase will expire
is very important, since it is a natural part of your trading strategy design and
will have a significant impact on the outcome of your trade.
Traditionally, there is an option expiration cycle for each equity on which
options are written. Each cycle contains 4 months. For example, suppose that
today is May 11, 2015 and a stock XYZ has a February option expiration
31 For standard S&P 500 index futures, the multiplier is 250 (index level × 250 = price); for E-mini SPX
futures (smaller contract), the multiplier is 50. The multiplier varies across indices.
32 If the third Friday is a market holiday, then those options expire on the third Thursday.
cycle. Then tradable option contracts written on stock XYZ have expiration
months (at least) in May (the current or front month), August (the near month),
November, and February 2016.
General information on exchange-traded option expiration dates is available
on the Options Expiration Calendar at the CBOE website.
Equity options can be used for a variety of purposes such as hedging exist-
ing positions and speculating or buying or selling stocks. We illustrate some
simple applications in the following examples.
Example 7.26. (How Call Buying Works) If the price of stock XYZ is $7 per
share today (say, May 11) and you speculate that it could rocket above $15
within 30 days, then you could buy the June (expiration) 15 (strike) call option.
Suppose that the premium is $1. Ignoring commissions, in order for you to
break even, the stock price would need to rise to $16. In order for you to make
a profit, the stock price would need to exceed $16.
As a call holder, your maximum possible loss is the premium you paid ($100
per contract for 100 shares).
If you exercise the call when the stock price is $20, you immediately have a
$400 profit (per contract) on paper: a $500 payoff less the $100 premium. You
may continue to hold the shares if you think the stock price will continue to
rise. Otherwise, you could sell your call contract (i.e., close your position)
and nail down the profit, which is a return of 400% on the premium (much higher
than the (20 − 7)/7 = 185.7% return from buying the stock).
Further discussion. You may also buy call options in the situation when you
wait for cash coming in, say, from selling stocks other than the underlier.
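The break-even price and the maximum loss in Example 7.26 follow from a one-line profit formula. The Python sketch below assumes one contract of 100 shares and ignores commissions, as the example does; the function name is illustrative.

```python
CONTRACT_SIZE = 100  # shares controlled by one equity option contract


def long_call_profit(s_T, strike, premium, contracts=1):
    """Profit at expiration from buying calls, ignoring commissions."""
    payoff = max(s_T - strike, 0.0)
    return (payoff - premium) * CONTRACT_SIZE * contracts


strike, premium = 15.0, 1.0        # the June 15 call of Example 7.26
break_even = strike + premium      # stock price at which the profit is zero
max_loss = premium * CONTRACT_SIZE # the premium paid for one contract

print("break-even price:", break_even)
print("maximum loss per contract:", max_loss)
for s_T in (7.0, 15.0, 16.0, 18.0):
    print(f"S(T) = {s_T:5.1f}  profit = {long_call_profit(s_T, strike, premium):7.1f}")
```

Below the strike the loss is capped at the premium, and the profit grows by $100 per contract for every dollar the stock finishes above the break-even price.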
Example 7.27. (How Call Selling Works) Suppose you have bought 100 shares
of stock XYZ at price $7 per share. Today (say, May 11), you think that,
although the stock has a good potential to continue to move much higher in the
long run, in the short run, the price may move down before breaking a resis-
tance level at about $10 (e.g., historical stock price movements showed that
the price moved down from a level at about $10 a number of times). To hedge
your position, you could write 1 June 10 call option.
Suppose that the premium is $1.50. The premium that you receive produces
income ($150) on the stock that is already in your portfolio. Ignoring commis-
sions, you will not lose money as long as the price is not below $5.50.
Further discussion. In fact, the way of using call options as we just explained in
this example is referred to as a covered call strategy, which means that you write
calls when you have enough shares of the underlying stock in your portfolio.
\[
S(T) - \max\{S(T) - K, 0\} - (S(0) - c) =
\begin{cases}
S(T) + c - S(0) & \text{if } S(T) \le K \\
K + c - S(0) & \text{if } S(T) > K.
\end{cases}
\tag{7.9}
\]
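Formula (7.9) can be checked against the numbers of Example 7.27 (purchase price $7, strike $10, premium $1.50). The Python sketch below works per share and ignores commissions; the function name is an illustrative choice.

```python
def covered_call_profit(s_T, s_0, strike, premium):
    """Per-share terminal profit of long stock plus short call, as in (7.9)."""
    return s_T - max(s_T - strike, 0.0) - (s_0 - premium)


s_0, strike, premium = 7.0, 10.0, 1.5  # values from Example 7.27
for s_T in (5.0, 5.5, 7.0, 10.0, 12.0):
    profit = covered_call_profit(s_T, s_0, strike, premium)
    print(f"S(T) = {s_T:5.1f}  profit per share = {profit:5.2f}")
```

The profit is zero at S(T) = 5.50, which matches the statement that no money is lost as long as the price is not below $5.50, and it is capped at K + c − S(0) = 4.50 per share once the stock finishes above the strike.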
Example 7.28. (How Put Buying Works) Again, suppose that you bought 100
shares of stock XYZ at price $7 per share. Today (say, May 11), you think that,
although the stock has a good potential to continue to move much higher in
the long run, in the short run, the price may move down before breaking a
resistance level at about $10.
Aside from engaging in the covered call strategy in the last example, you
can also consider buying puts if you think the stock may go down much lower
than $5.50 in the short term. If you are wrong, you lose the premium that you
paid when you entered into the contract. In a way, put options are parallel to
insurance policies.
Further discussion. One can make a real profit in a big downward movement of
stock price by either buying puts or shorting the stock directly. Ignoring mar-
gin requirements and commissions, in order to make a profit from buying puts,
one needs to be right about both direction and timing of the price movement;
when shorting stocks, one needs to be right only about the direction. However,
the advantage of buying puts over shorting stock is that buying puts allows
you to determine and prepare for a worst-case scenario as you know that your
loss cannot exceed the premium paid when you entered into the contract.
Example 7.29. (How Put Selling Works) One strategy of buying stocks is by
selling puts.
Suppose you would like to buy 100 shares of stock XYZ at about $6 and the
price of the stock is $7 today (May 11). You could sell a June $6 put option on
the stock and earn premium income $25 (i.e., premium is $0.25) immediately.
If the price of the stock drops below $6, the put buyer will exercise the put,
and you will have to honor your commitment to buy the stock at $6 (even if
the stock price plunges to $2 unexpectedly). If the price of the stock stays above
or at $6, then the put buyer will not exercise the put. Although you will not get
a chance to buy the stock, you still keep the premium income.
In fact, one way for speculative traders to earn premium income is to sell
puts when they are confident that the stock price will not fall below the
strike price minus the premium.
Further discussion. By definition, a put contract grants the buyer a right, not an
obligation, to sell the underlying stock. However, by selling this put, the put
writer assumes an obligation, not a right, to buy 100 shares of the stock at the
strike price if the buyer of the put wants to sell (i.e., if the buyer exercises
the put), regardless of the price of the stock in the spot market. Because of
this obligation, when writing a naked put, the put writer should not rely on
wishful thinking but be prepared to take a loss or to own the stock for a while.
The same, if not higher, level of prudence should apply to consideration of
writing a naked call (see rationale in Exercise 7.11).
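The put writer's position in Example 7.29 (strike $6, premium $0.25) can be summarized in a few lines of Python. This is a sketch that assumes one contract of 100 shares and ignores commissions and margin; the function name is illustrative.

```python
CONTRACT_SIZE = 100  # shares controlled by one equity option contract


def short_put_profit(s_T, strike, premium, contracts=1):
    """Profit at expiration for the writer of puts, ignoring commissions."""
    return (premium - max(strike - s_T, 0.0)) * CONTRACT_SIZE * contracts


strike, premium = 6.0, 0.25    # the June $6 put of Example 7.29
break_even = strike - premium  # strike price minus the premium

print("break-even price:", break_even)
for s_T in (2.0, 5.75, 6.0, 7.0):
    print(f"S(T) = {s_T:5.2f}  writer's profit = {short_put_profit(s_T, strike, premium):8.2f}")
```

The writer keeps the full $25 whenever the stock finishes at or above the strike, while a plunge to $2 turns the position into a loss many times larger than the premium collected, which is the reason for the prudence urged above.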
For a portfolio of options on the same underlier and with the same expiration,
an option terminal payoff diagram provides an extremely useful visualization
that traders rely on to analyze a portfolio strategy. This diagram illustrates how
the right combination of option positions can form a portfolio that has a risk
exposure to almost any chosen kind of market volatility scenario. In addition, a
terminal profit diagram, a chart obtained by a simple translation of the cor-
responding terminal payoff diagram, provides a clear visual presentation of
the range of profit and loss and the break-even point as outcomes of a chosen
trading strategy.
We emphasize that one should not confuse the two.
Just to make these tools of the trade easier to understand, our discussion
in this section will focus on European options only, since, prior to expiration, the
payoff diagrams of American options may be more complicated.
Also, without loss of generality, we assume that all options in a given portfolio
are written on the same stock and have the same expiration T.
Again, let C(S, t) and P(S, t) respectively denote the value of European call
and European put options at time t, \(t \in [0, T]\), where T is the expiration.
Definition 7.9. An option terminal payoff diagram is a graph of the value of the
option position (e.g., long a call or short a put) at expiration T as a function of
the underlier price at T.
Recall (7.7) and (7.8) on page 355; the terminal payoffs of an option position of
a long call and that of a long put are represented respectively by
\[
C(S, T) = \max\{S(T) - K, 0\} =
\begin{cases}
0 & \text{if } S(T) \le K \\
S(T) - K & \text{if } S(T) > K,
\end{cases}
\]
and
\[
P(S, T) = \max\{K - S(T), 0\} =
\begin{cases}
K - S(T) & \text{if } S(T) < K \\
0 & \text{if } S(T) \ge K.
\end{cases}
\]
Example 7.30.
1. If we let \(x = S(T)\) and \(y = f(S(T)) = C(S(T), T)\), then the graph of \(y = f(x)\)
on the xy-plane is the graph of \(C(S(T), T)\) against \(S(T)\) given in Fig. (3). That is,
Fig. (3) is the terminal payoff diagram of an option position of long a call by
Definition 7.9.
2. By Definition 7.9, the graph of \(P(S(T), T)\) against \(S(T)\) is the terminal payoff
diagram of long-a-put. Can you sketch this graph? Does your graph coin-
cide with the one in Fig. (6)?
3. Since the value of a short position on a call at expiration T is \(-C(S(T), T)\),
Fig. (4) gives the terminal payoff diagram of short-a-call.
4. Similarly, we see that the terminal payoff diagram of short a put is given by
Fig. (7).
[Figs. (3)-(11): terminal payoff diagrams, each plotting payoff against S(T). Fig. (3) long call (C); Fig. (4) short call (−C); Fig. (5) long stock (S); Fig. (6) long put (P); Fig. (7) short put (−P); Fig. (8) long bond; Fig. (9) P + C; Fig. (10) P1 + C2; Fig. (11) short bond.]
Notice that the result of superimposing Fig. (3) and Fig. (7) is the graph obtained
by translating Fig. (5) to the right by K units, i.e., \(C - P = S - K\) at time T.
Also notice that a bond generates only vertical translations of payoff dia-
grams. Therefore, taking a position in a bond does not play any hedging
role.
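The superposition of a long call and a short put can also be verified numerically. The Python sketch below tabulates both terminal payoffs on a few terminal prices and checks that their sum equals S(T) − K; the strike and the price grid are hypothetical choices.

```python
def call_payoff(s_T, strike):
    """Terminal payoff of a long call (Fig. (3))."""
    return max(s_T - strike, 0.0)


def put_payoff(s_T, strike):
    """Terminal payoff of a long put (Fig. (6))."""
    return max(strike - s_T, 0.0)


K = 7.0                                # hypothetical strike price
prices = [0.0, 3.5, 7.0, 10.5, 14.0]   # hypothetical terminal prices S(T)

for s_T in prices:
    long_call = call_payoff(s_T, K)    # Fig. (3)
    short_put = -put_payoff(s_T, K)    # Fig. (7)
    combined = long_call + short_put
    # Long call plus short put reproduces the long stock payoff shifted by K.
    assert combined == s_T - K
    print(f"S(T) = {s_T:5.1f}  C = {long_call:4.1f}  -P = {short_put:5.1f}  "
          f"C - P = {combined:5.1f}")
```

Adding a bond position to any row would add the same constant to every entry, which is the vertical translation mentioned above.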
Example 7.31.
1. If a portfolio consists of long one share of stock (S), then the terminal payoff
diagram of the portfolio is the payoff diagram of long a stock. Can you sketch this
graph? Does your graph coincide with Fig. (5) on page 360?
2. If a portfolio consists of a simultaneous long call (C) and long put (P), which are on
the same stock and with the same expiration, then the portfolio is C + P, and its
terminal payoff diagram is given by Fig. (9).
The portfolio C + P is called a straddle, which will be revisited in Section 7.5.5.
We provide an intuitive way to derive the put-call parity in the next example.
\[
S(t)\,e^{-q(T-t)} + P(t) - C(t) = K e^{-r(T-t)}. \tag{7.11}
\]
Further discussion about the put-call parity for European options will be given
in Section 7.5.6 and Section 7.5.7.
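As a numerical illustration of (7.11), the Python sketch below converts a European call price into the put price implied by parity and then checks that the parity gap vanishes. All input values are hypothetical and the function names are illustrative; `tau` stands for the time to expiration T − t in years.

```python
import math


def put_from_call(call, spot, strike, r, q, tau):
    """European put price implied by (7.11):
    S e^{-q tau} + P - C = K e^{-r tau}."""
    return call - spot * math.exp(-q * tau) + strike * math.exp(-r * tau)


def parity_gap(call, put, spot, strike, r, q, tau):
    """Left side minus right side of (7.11); zero when parity holds."""
    return (spot * math.exp(-q * tau) + put - call
            - strike * math.exp(-r * tau))


# Hypothetical market data (not taken from the text).
call, spot, strike = 1.20, 7.0, 7.0
r, q, tau = 0.03, 0.0, 0.25

put = put_from_call(call, spot, strike, r, q, tau)
print(f"implied put price: {put:.4f}")
print(f"parity gap: {parity_gap(call, put, spot, strike, r, q, tau):.2e}")
```

A quoted put price that differs noticeably from the implied one would leave a nonzero gap, which is the situation the law of one price rules out.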
Since the profit generated by a portfolio is the difference between the termi-
nal payoff and the initial price you pay (e.g., the option premium), ignoring
commissions, the terminal profit diagram of a portfolio is the graph that is the
vertical translation of the corresponding terminal payoff diagram by the initial
price you pay in the same coordinate system.
where a is the net premium of the straddle. Then the range of profit is given by
\(S(T) \in [0, K - a) \cup (K + a, \infty)\) and the range of loss by \(S(T) \in (K - a, K + a)\),
and the break-even point is \(S(T) = K - a\) or \(K + a\).
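The profit and loss ranges of a long straddle can be confirmed with a short computation. In the Python sketch below, the strike K = 7 and net premium a = 2 are hypothetical values chosen for illustration, and the profit is per share, ignoring commissions.

```python
def straddle_profit(s_T, strike, net_premium):
    """Per-share terminal profit of a long straddle (long call + long put)."""
    payoff = max(s_T - strike, 0.0) + max(strike - s_T, 0.0)  # equals |S(T) - K|
    return payoff - net_premium


K, a = 7.0, 2.0                # hypothetical strike and net premium
break_evens = (K - a, K + a)   # terminal prices at which the profit is zero

print("break-even points:", break_evens)
for s_T in (0.0, 4.0, 5.0, 7.0, 9.0, 12.0):
    print(f"S(T) = {s_T:5.1f}  profit = {straddle_profit(s_T, K, a):5.1f}")
```

The largest loss, the full net premium a, occurs at S(T) = K, and the position is profitable only when the terminal price lands more than a away from the strike on either side.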
Remark 7.8. More precisely speaking, the terminal profit of a portfolio is defined as
the difference between the terminal payoff and the future value at time T of the initial
cost. For example, by this definition, the profit for a call buyer at the expiration
T is \(C(S(T), T) - c\,e^{rT} = \max\{S(T) - K, 0\} - c\,e^{rT}\), where we assume that the
buyer purchases the call at time t = 0 by paying premium $c.
However, taking into consideration the two facts:
1. \(1 - e^{rT}\) is relatively small in magnitude and
2. The terminal profit diagram is only used for visualization of option strategy
outcomes,
we may ignore the compounding factor \(e^{rT}\) in the future value of the initial
cost.
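To see how little the compounding factor matters over a typical option life, the Python sketch below compares the call buyer's profit with and without it. The rate, maturity, premium, and prices are hypothetical values chosen for illustration.

```python
import math


def call_profit(s_T, strike, premium, r=0.0, T=0.0):
    """Call buyer's terminal profit: payoff minus the future value of the
    premium. With r = 0 the compounding factor e^{rT} is ignored."""
    return max(s_T - strike, 0.0) - premium * math.exp(r * T)


# Hypothetical inputs: 2% rate, three months to expiration, $1 premium.
s_T, strike, premium, r, T = 18.0, 15.0, 1.0, 0.02, 0.25

exact = call_profit(s_T, strike, premium, r, T)  # with compounding
approx = call_profit(s_T, strike, premium)       # compounding ignored
print(f"profit with compounding:    {exact:.4f}")
print(f"profit without compounding: {approx:.4f}")
print(f"difference:                 {approx - exact:.4f}")
```

The two figures differ by the premium times the amount by which the compounding factor exceeds 1, about half a cent per dollar of premium for these inputs, which is negligible on a diagram used for visualization.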
An option is said to be out of the money at time t if the option has a negative
payoff if it is exercised at time t. More specifically,
A call is out of the money at time t if S(t) < K,
A put is out of the money at time t if S(t) > K.
Example 7.36. In option trading, the statement that a call is deep in the money at
time t means that the underlier spot price S(t) is well above the strike price of
the call.
In practice, the deep in-the-money condition is a condition in which a call
value changes dollar for dollar with the spot price movement of the underlier;
for an at-the-money call, its value changes only about 50% of the spot price
change.
A parallel statement can be made for a deep in-the-money put.
Example 7.37. In option trading, the statement that a call is deep out of the money
at time t means that the underlier spot price S(t) is well below the strike price
of the call.
A parallel statement can be made for a deep out-of-the-money put.
In practice, deep out-of-the-money options are always worth something because
there is always a probability that the condition may change. They may be
used for two trading strategies: hedging and speculation. In some ways, buying deep
out-of-the-money options is almost like purchasing lottery tickets, i.e., they
present an opportunity for profits but with a low probability of success.
Remark 7.9. From a pure financial theory point of view, one would define the
condition for a call being in the money at time t to be \(S(t) > Ke^{-r(T-t)}\). However,
this definition would not make any real difference in terms of the purpose
which the option moneyness terminologies serve, since the difference between
\(K\) and \(Ke^{-r(T-t)}\) is very small. An expression like \(S(t) > K\) is much simpler,
and therefore more convenient to use, than \(S(t) > Ke^{-r(T-t)}\).
A similar argument applies to the definitions of out of the money and at the
money.
and
c = $1, p = $1, K = $7.
What is the expected return of the portfolio? What is the terminal profit diagram of the
straddle?
33 A neutral option strategy is a strategy that is designed to profit from either a rise or fall (non-
directional) in the underlier price.
and put prices become much higher than expected when major news is
pending.
A strangle is a combination of two positions, one is long on a put with strike
K1 and another is long on a call with strike K2 (K2 > K1 ), where the call and
put have the same underlier (S) and expiration (T).
The long strangle, or buy strangle or simply strangle, is an (underlier spot)
market-neutral option strategy that involves simultaneously buying an out-
of-the-money put and an out-of-the-money call. It is also a bullish bet on the
underlier's volatility.
Example 7.39. (Strangle) Typically, when you long a strangle, the two strike
prices K2 > K1 are near the money and out of the money (thus, close to in the
money). Note that you are betting on volatility rather than on the underlying
stock alpha when you long either a straddle or a strangle and the one with
lower net premium is preferred. The terminal payoff of a strangle is provided
on page 360, Fig. (10), and the terminal profit diagram is provided below:
where a is the net premium of the strangle. Identifying the range of profit, the
range of loss, and the break-even points as outcomes of the corresponding
portfolio strategy is left as an exercise for the reader.
A spread strategy corresponds to a portfolio that consists of two or more
options of the same type to achieve a certain level of hedging effect. Spreads
are the basic building blocks of many option strategies although we will only
briefly introduce three of them in the following.
Example 7.40. (Price Spread or Vertical Spread) One way to construct a price
spread, or vertical spread, is to use two calls (or two puts) on the same
underlying asset (S) and the same expiration date (T) but with different strike
prices34 (Ki , i = 1, 2). One is bought and the other is sold in order to achieve a
level of hedging effect.
A spread is called a bull spread if it is designed to profit from an upward
movement of the underlier's price (see payoff diagram below).
A spread is called a bear spread if it is designed to profit from a downward
movement of the underlier's price (see payoff diagram below).
34 This explains the name price spread (more precisely, strike price spread). Since the strike prices
are listed vertically by the news media, a price spread is also referred to as a vertical spread.
Using a formula to express the terminal payoff of each spread shown above is
left as an exercise for the reader.
Example 7.41. (Calendar Spread or Time Spread or Horizontal Spread) One
way to construct a calendar spread, also called time spread or horizontal spread,
is to use two calls (or two puts) on the same underlier (S) and with the same
strike price (K) but on different expiration dates35 (Ti , i = 1, 2).
If you predict (as a pure speculation) that the underlying stock price will rise
above $45 in a few months, you may want to consider the following bullish
calendar call spread.
Suppose that XYZ 45 (American) calls are priced as follows.
Expiration month          May     June    July    August
Premium of XYZ 45 call    $0.25   $2      $3      $5
By buying 1 XYZ Aug 45 call only, you will lose $500 if the stock XYZ drops
to, say, $35. In contrast, by buying 1 XYZ Aug 45 call and writing 1 XYZ
July 45 call, you will lose only $200 if the stock drops to $35.
The hedging effect created by the spread cuts the loss by more than half.
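The loss figures in Example 7.41 are just the net premiums paid. The Python sketch below reproduces them under the assumptions that one contract covers 100 shares, that commissions are ignored, and that the stock stays below the $45 strike through both expirations so that both calls expire worthless.

```python
CONTRACT_SIZE = 100  # shares controlled by one equity option contract

# Premiums per share of the XYZ 45 calls, from the table in Example 7.41.
premiums = {"May": 0.25, "June": 2.0, "July": 3.0, "August": 5.0}

# Outright purchase: the loss is the August premium paid.
long_only_loss = premiums["August"] * CONTRACT_SIZE

# Calendar spread: the July premium received offsets part of the cost.
spread_loss = (premiums["August"] - premiums["July"]) * CONTRACT_SIZE

print("loss from buying the Aug 45 call only:", long_only_loss)
print("loss from the Aug/July calendar spread:", spread_loss)
print("fraction of the loss avoided:", 1 - spread_loss / long_only_loss)
```

The spread avoids 60% of the outright loss in this scenario, at the price of the obligations taken on by writing the July call.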
Example 7.42. (Butterfly Spread) A butterfly spread is a portfolio consisting of
four options of the same type on the same underlier (S) with the same
expiration date (T) but three different strike prices (\(K_1 < K_2 < K_3\)). Generally,
\(K_2 = \tfrac{1}{2}(K_1 + K_3)\) and is close to the current underlier price. The graph below is the
The butterfly spread is designed to profit when the underlier price movement
stays close to the current price. Using a formula to express the terminal payoff
of the butterfly spread indicated above is left as an exercise for the reader.
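One common way to build a butterfly from four calls is to buy one call at K1, write two calls at K2, and buy one call at K3. The Python sketch below evaluates the terminal payoff of that construction; this particular construction and the strikes 40, 45, and 50 are assumptions made for illustration and are not taken from the text.

```python
def call_payoff(s_T, strike):
    """Terminal payoff of a long call."""
    return max(s_T - strike, 0.0)


def butterfly_payoff(s_T, k1, k2, k3):
    """Terminal payoff of an assumed call butterfly:
    long 1 call at k1, short 2 calls at k2, long 1 call at k3."""
    return (call_payoff(s_T, k1)
            - 2.0 * call_payoff(s_T, k2)
            + call_payoff(s_T, k3))


k1, k2, k3 = 40.0, 45.0, 50.0  # hypothetical strikes with k2 = (k1 + k3) / 2
for s_T in (35.0, 40.0, 42.5, 45.0, 47.5, 50.0, 55.0):
    print(f"S(T) = {s_T:5.1f}  payoff = {butterfly_payoff(s_T, k1, k2, k3):4.1f}")
```

The payoff is zero outside the interval from K1 to K3 and peaks at K2, consistent with a strategy that profits when the underlier price stays close to its current level.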
35 This explains the name calendar spread or time spread. Since the expiration months are listed
across the top of the newspaper page horizontally, a calendar spread is also referred to as a horizontal
spread.
The basic forms of put-call parity given in Example 7.32 on page 362 provide
not only a conversion between a European put and a European call but also a
perfect hedged portfolio36 \(S + P - C\), for
\[
S(t)\,e^{-q(T-t)} + P(t) - C(t) = K e^{-r(T-t)},
\]
from (7.11) on page 362, with dividend yield rate \(q \ge 0\), of which a proof was
given by using option terminal payoff diagrams.
In this section, we provide a different proof of the put-call parity by using
the law of one price.
We denote by \(C^E(t)\) and \(P^E(t)\) the prices of a European call and a
European put option at time \(t \in [0, T]\), where time 0 is the current date and both
the call and put are on the same underlying asset and have the same expiration
date T and strike price K. Let S(t) be the price of the asset at time \(t \in [0, T]\) and
let r be the risk-free interest rate compounded continuously.
Theorem 7.3. (Put-Call Parity for European Options) The current price relation
between European put and call options on the same asset with annual dividend yield
q (see page 341) and with the same expiration T and strike price K is given by
Since the two portfolios have identical values at a future time, it follows from
the definition of arbitrage and the law of one price that they must have identi-
cal values today:
A (0) = B (0),
which is equivalent to
Replacing 0 by t < T in (7.12), we obtain (7.11) again.
Observe that a superposition of Fig. (3) and Fig. (7) on page 360 yields Fig. (1)
on page 340. It follows that at time T, C − P equals the value of the forward,
where we assume that the put, call, and forward are on the same underlier and
expiration T and with the same strike K (thus \(F_T(0) = K\); also see page 344).
Under the law of one price, we obtain an equivalent form of the put-call parity
in (7.12):
\[
C^E(t) - P^E(t) = \bigl(F_T(t) - K\bigr)\,e^{-r(T-t)}, \qquad 0 \le t \le T, \tag{7.13}
\]
where the right-hand side is the forward value at time t by (7.6) on page 344 and
\(F_T(t)\) is the forward price. In words, (7.13) says that the difference between
the current values of the European call and put is the current value of the forward.
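The equivalence between (7.13) and (7.11) can be checked numerically by using the forward price formula with continuous dividend yield, in which the forward price equals the spot price times e^{(r−q)(T−t)}. The Python sketch below uses hypothetical inputs and illustrative function names; `tau` stands for T − t in years.

```python
import math


def forward_price(spot, r, q, tau):
    """Forward price S(t) e^{(r - q) tau}."""
    return spot * math.exp((r - q) * tau)


def forward_value(spot, strike, r, q, tau):
    """Value of a forward with delivery price K: (F_T(t) - K) e^{-r tau}."""
    return (forward_price(spot, r, q, tau) - strike) * math.exp(-r * tau)


def call_minus_put(spot, strike, r, q, tau):
    """C - P implied by (7.11): S e^{-q tau} - K e^{-r tau}."""
    return spot * math.exp(-q * tau) - strike * math.exp(-r * tau)


# Hypothetical inputs (not taken from the text).
spot, strike, r, q, tau = 50.0, 48.0, 0.03, 0.01, 0.5

lhs = call_minus_put(spot, strike, r, q, tau)
rhs = forward_value(spot, strike, r, q, tau)
print(f"C - P from (7.11):  {lhs:.6f}")
print(f"forward value:      {rhs:.6f}")
```

The two numbers agree up to floating-point rounding because discounting the forward price at rate r leaves the spot price discounted at the dividend yield q.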
Example 7.43. Using the forward price formula \(F_T(t) = S(t)e^{(r-q)(T-t)}\), the
put-call parity (7.13) can also be derived by straightforward computation as
follows:
Keeping Remark 7.9 (see page 364) in mind, the expression right after the last
equal sign represents the degree to which the call option is in the money at
time t, as does the expression \(e^{-r(T-t)}\max\{F_T(t) - K, 0\}\).
Observation 2.
represents the degree to which the put option is in the money at time t.
Observation 3.
Then, a call option price can be expressed by a sum of its intrinsic value and
its time value:
Similarly, we define the time value of a put option at time t, written \(TV_p(t)\), by
Then, a put option price can be expressed by a sum of its intrinsic value and
its time value:
we obtain \(TV_c(t) = TV_p(t)\) during the life of the call and put. This result allows
us to simply denote by \(TV(t)\) the time value of a European option and write
In words,
Notice that an out-of-the-money option has time value only. As a result, the
value of an out-of-the-money option erodes quickly with time as it gets closer
to its expiration.
Because out-of-the-money options have time values only, they are signifi-
cantly cheaper and offer great leverage and, therefore, have better liquidity
(are more actively traded). For these reasons, most professional option traders
trade the time value only with confidence in turning time value decay into
potential profits; for trading purposes, the time value of an option is where
professional traders see the value of an option.
Remark 7.10.
1. Time value is subject to several factors, primarily time to expiration and
implied volatility. The latter concept will be discussed in Chapter 8.
2. The rate at which time value decays is represented by \(\Theta\) (theta), one of the Greeks to
be introduced along with the Black-Scholes-Merton model, again in Chapter 8.
Example 7.45. SPY is trading at $213 on May 18, 2015. Call options with strike
prices below $213 are in-the-money calls. Call options with strike prices above
$213 are out-of-the-money calls. Call options with strike prices equal $213 are
at-the-money calls. Given the following option information,
Expiration month May June July
Premium of SPY 212.5 call $1.28 $2.92 $3.67
Time value of SPY 212.5 call $0.75 $2.42 $3.17
observe the time value decay in action: the time value drops from June to May
much faster than that from July to June.
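The comparison in Example 7.45 can be checked with a few lines of arithmetic. The sketch below simply copies the time values from the table; the function name `time_value_drop` is a hypothetical helper introduced only for illustration.

```python
# Time values of the SPY 212.5 call, copied from the table in Example 7.45.
time_value = {"May": 0.75, "June": 2.42, "July": 3.17}


def time_value_drop(later_month, earlier_month, table=time_value):
    """Time value lost when moving from the later expiration to the earlier one."""
    return round(table[later_month] - table[earlier_month], 2)


drop_june_to_may = time_value_drop("June", "May")
drop_july_to_june = time_value_drop("July", "June")

# The drop over the month nearest expiration is the larger of the two.
print(drop_june_to_may, drop_july_to_june)
```

The two differences make the acceleration of time value decay near expiration explicit.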
Example 7.46. (Trading on Time Value) Option writers attempt to benefit from
the time value decay. They collect time value premiums paid by option buyers.
Such premiums can become steady cash flows if the underlying security is
stationary.
Proof. We provide a proof for the first set of inequalities. A proof of the second
set is left as an exercise for the reader.
Step 1. If $S(0) < C^A(0)$, then construct a portfolio $\Pi$ at time 0 by shorting 1
American call and using the proceeds to immediately long 1 unit of the underlier.
We obtain $\Pi(0) = C^A(0) - S(0) > 0$, an immediate profit with the cash
amount $a = C^A(0) - S(0)$ at time 0. Thus, $\Pi(T) > 0$ is guaranteed regardless
of the fluctuations of the asset price, even if the cash amount $a is kept under
a mattress, because taking a short position on a covered call means that the
possession of the asset can always cover the exercise made by the buyer in case it
happens. Therefore, $\Pi$ is an arbitrage!
Step 2. The ability to exercise an American option at any time prior to or at
expiration makes American options more flexible than European options; thus,
$C^A(0) \ge C^E(0)$.
Step 3. Applying the put-call parity (7.12) and $P^E(0) \ge 0$, we obtain
C A (0) = C E (0).
We denote by $C^A(t)$ and $P^A(t)$ the price of the American call and the price
of the American put options at time $t \in [0, T]$, where time 0 is the current date
and both the call and put are on the same underlying asset and have the same
expiration date T and strike price K. Let S(t) be the price of the asset at time
$t \in [0, T]$ and let r be the risk-free interest rate compounded continuously.
Theorem 7.4. (Put-Call Parity for American Options) The current price relation
between American put and call options on the same asset with annual dividend yield
q (see page 341) and with the same expiration T and strike price K is given by
$$S(0)\,e^{-qT} - K \;\le\; C^A(0) - P^A(0) \;\le\; S(0) - K e^{-rT}.$$
Proof.
Step 1. To prove the inequality $C^A(0) - P^A(0) \le S(0) - K e^{-rT}$, we construct
two portfolios A and B, denoted by $\Pi_A$ and $\Pi_B$, respectively:
$\Pi_A$: long 1 American call and long zero-coupon bond with $K e^{-rT}$ at rate r.
$\Pi_B$: long 1 American put and long 1 unit of the underlying asset.
We denote by $\Pi_A(t)$ the value of portfolio A and $\Pi_B(t)$ the value of portfolio
B at time t and by DPV the time-0 value, or present value, of the dividends.
Notice that the American call cannot be exercised early because there is not
enough cash to buy 1 unit of the asset until time T and
$$\Pi_A(T) = \max\{S(T) - K, 0\} + K = \max\{S(T), K\}.$$
With the possibility that the American put could be exercised at $t \in [0, T]$,
$$\Pi_B(t) = \begin{cases} K + DPV\, e^{rt} & \text{if } S(t) < K \\ S(t) + DPV\, e^{rt} & \text{if } S(t) \ge K \end{cases} \;=\; \max\{S(t), K\} + DPV\, e^{rt}.$$
It follows from the definition of arbitrage and the law of one price that
$$\Pi_A(0) \le \Pi_B(0).$$
we write $DPV = S(0)(1 - e^{-qT})$ for $DPV\, e^{qT} = S(0)(e^{qT} - 1)$ being the total
dividends of 1 unit of the underlier over the interval $[0, T]$. Then
Since the European call cannot be exercised early, we only need to consider
whether the American put is exercised early.
If the American put is not exercised early, then
If the put is exercised early at time $t \in [0, T)$ (i.e., sell stock to receive K), then
Example 7.47. (Intuition of the Put-Call Parity for American Options) To
interpret the put-call parity (bounds) for American options geometrically, we let
$x = S(0)$ and $y = C^A(0) - P^A(0)$
and visualize the region in the xy-plane, which is bounded by two straight
lines,
$$L_1 : y = e^{-qT} x - K \quad \text{and} \quad L_2 : y = x - K e^{-rT}.$$
Notice that for nondividend-paying underliers, these two lines are parallel.
The relation between $S(0)$ and $C^A(0) - P^A(0)$ has a geometric interpretation
in terms of the points in the region bounded by $L_1$ and $L_2$ as shown below:
The geometric interpretation of the put-call parity for American options in the
case of q > 0 is left as an exercise for the reader. (Hint: sketch a graph!)
Example 7.48. Given the current price of a stock and the current price of a put
option on the stock with expiration T and strike K along with the stock's
dividend yield q and risk-free rate r, find the upper and the lower price bounds for
the American call option with the same expiration and strike on the stock.
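A minimal sketch of the computation asked for in Example 7.48, assuming the American put-call parity bounds take the form $S(0)e^{-qT} - K \le C^A(0) - P^A(0) \le S(0) - Ke^{-rT}$ (the lines $L_1$ and $L_2$ of Example 7.47). The function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def american_call_bounds(s0, put_price, strike, r, q, t_exp):
    """Bounds on C^A(0) implied by
    S0*exp(-q*T) - K <= C^A(0) - P^A(0) <= S0 - K*exp(-r*T)."""
    lower = s0 * math.exp(-q * t_exp) - strike + put_price
    upper = s0 - strike * math.exp(-r * t_exp) + put_price
    # A call price cannot be negative, so the lower bound is floored at zero.
    return max(lower, 0.0), upper


# Hypothetical inputs: stock at $50, put worth $3, strike $52,
# r = 3%, q = 2%, six months to expiration.
low, high = american_call_bounds(s0=50.0, put_price=3.0, strike=52.0,
                                 r=0.03, q=0.02, t_exp=0.5)
print(f"{low:.4f} <= C_A(0) <= {high:.4f}")
```

Solving the two inequalities for the call price is all that is needed: the put price is added to both sides of each bound.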
The next proposition gives the boundary conditions on calls and puts; they are
needed later in our study of the Black-Scholes-Merton p.d.e.
Again, we let $C^E(S, t) = C^E(S(t), t)$ and $P^E(S, t) = P^E(S(t), t)$ for $t \le T$.
Boundary conditions for $C^E(S, t)$ and $P^E(S, t)$ are applied for $S \to 0$ and $S \to \infty$.
The next proposition can be established by applying Proposition 7.1 (see
page 372) and the put-call parity (see (7.12) on page 368) along with the following
observations:
1. $C^E(S, t) \to 0$ as $S \to 0$ since the call is unlikely to be exercised when the
underlier's price is sufficiently small.
2. $C^E(S, t) \sim S e^{-q(T-t)} - K e^{-r(T-t)}$ as $S \to \infty$ since the call is likely to be
exercised when the underlier's price is sufficiently large.
Proposition 7.3. (European Call and Put Boundary Conditions)
1. $\lim_{S \to 0} C^E(S, t) = 0$ and $\lim_{S \to \infty} C^E(S, t) = \lim_{S \to \infty} S e^{-q(T-t)}$.
2. $\lim_{S \to 0} P^E(S, t) = K e^{-r(T-t)}$ and $\lim_{S \to \infty} P^E(S, t) = 0$.
A proof of Proposition 7.3, 2, can also be done directly (without applying the
put-call parity). A geometric interpretation of the properties can be done by
sketching the graphs of C E (S, t) and P E (S, t) against S.
There are many more nice discussions in the literature related to the topics
presented in this chapter, e.g., [2, 3, 5, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19].
7.6 Exercises
a) If you own those shares, what is your gain/loss from settling the position?
b) If you had naked short sold the American call, what is your gain/loss from
settling the position?
7.12. You paid $300 for an American call on a stock several months ago. It will
expire next month and is now worth only $100. What are the feasible actions
that you can take? What are the consequences of your actions?
7.15. (Call Time Spread Bearish) Recall Example 1. Given XYZ 40 call price
table:
Expiration Nov Dec Jan
Premium 2 3 5
If one expected XYZ stock to decline, one might establish a bear spread by
taking a position opposite of a bullish one.
7.16. (Price Put Spread Bearish) Open a bear spread by using the following
puts:
in the hope of making a profit if XYZ stock declines in price. What is the pos-
sible maximum gain or loss? Justify your answers.
7.17. (Forward Price and Arbitrage) Suppose that the current spot price of an
asset paying a continuous dividend is $222, the interest rate is r = 3%, and the
dividend yield is q = 2%.
a) What are the one-month and eight-month forward prices for the asset in an
arbitrage-free market?
b) Let $\Pi$ be a portfolio on time interval [0, T] consisting of three positions
starting from time 0: borrow $222 at the rate 3%, long 1 unit of the asset, and
short the three-month forward at $F_T(0) = \$222.56$. Is $\Pi$ an arbitrage
portfolio? If your answer is no, show a proof. If your answer is yes, explain how
you can make a profit by taking the arbitrage opportunity.
7.19. (Swaps) Assume that the terms of the swap contract include the follow-
ing:
a) The notional principal is one million dollars,
b) The life of the contract is 2 years,
c) A pays B three-month LIBOR + 0.2%,
d) B pays A 1.5% fixed,
e) There is an exchange of payments every 3 months from the initialization.
Given the LIBOR rates in the table below, calculate both the floating cash flow
and fixed cash flow of the swap.
7.20. (Swaps) Suppose that both companies X and Y need to borrow US dollars
and that company X would like to borrow at a fixed rate, whereas company
Y would like to borrow at a floating rate. If X can borrow at 6.00% fixed and
LIBOR + 0.60% floating, and Y can borrow at 5.00% fixed and LIBOR + 0.20%
floating, what is the range of possible cost savings that company X can real-
ize through an interest rate swap with company Y? Use an example of swap
mechanics to demonstrate how a cost saving can be achieved for either company
(ignoring credit risk differences).
7.21. Identify the range of profit, loss, and break-even point as outcomes of the
corresponding strangle strategy given in Example 7.39 (page 366).
7.22. Use a formula to express the terminal payoff of each spread strategy given
in Example 7.40 (page 366).
7.23. Use a formula to express the terminal payoff of the butterfly spread given
in Example 7.42 (page 367).
7.24. (Arbitrage)
a) Suppose that the price of a stock at time t, denoted by S(t), is modeled by a
one-step binomial tree over the time period [0, T] with
$$S(T) = \begin{cases} S_b & \text{with probability } p \\ S_a & \text{with probability } 1 - p, \end{cases}$$
where $S_b > S_a$.
Show that $S_b > S(0) > S_a$ is a necessary condition for the absence of arbitrage
opportunities for any investor (assuming $r_f = 0$).
b) Show that there exists a (risk-neutral) probability p > 0 holding the equation
7.26. Show that if the price of the underlier of a forward contract follows a geo-
metric Brownian motion, so does the forward price process (see Example 7.15).
7.29. Establish the following relation between American and European puts on
the same nondividend-paying underlier and with the same expiration T and
strike K:
$$P^A(0) \ge P^E(0).$$
7.31. Establish the following put-call parity bounds for American options:
where $\tau = T - t_0$.
References
[6] Delbaen, F., Schachermayer, W.: The fundamental theorem of asset pricing
for unbounded stochastic processes. Math. Ann. 312, 215–250 (1998)
[7] Delbaen, F., Schachermayer, W.: The Mathematics of Arbitrage. Springer,
Berlin/Heidelberg (2006)
[8] Duffie, D., Stanton, R.: Pricing continuously resettled contingent claims.
J. Econ. Dyn. Control 16, 561–573 (1992)
[9] Epps, T.W.: Pricing Derivative Securities. World Scientific, River Edge
(2007)
[10] Harrison, J., Pliska, S.: Martingales and stochastic integrals in the theory
of continuous trading. Stoch. Process. Appl. 11, 215–260 (1981)
[11] Hull, J.C.: Options, Futures, and Other Derivatives. Pearson Prentice
Hall, Upper Saddle River (2015)
[12] Jacod, J., Protter, P.: Probability Essentials. Springer, Berlin/Heidelberg
(2004)
[13] Kolb, R.W.: Financial Derivatives. New York Institute of Finance, New
York (1993)
[14] Korn, R., Korn, E.: Option Pricing and Portfolio Optimization. American
Mathematical Society, Providence (2001)
[15] Kreps, D.M.: Arbitrage and equilibrium in economies with infinitely
many commodities. J. Math. Econ. 8(1), 15–35 (1981)
[16] Musiela, M., Rutkowski, M.: Martingale Methods in Financial Modelling.
Springer, New York (2004)
[17] Reilly, F.K., Brown, K.C.: Investment Analysis and Portfolio Management.
South-Western Cengage Learning, Mason (2009)
[18] Whaley, R.: Derivatives: Markets, Valuation, and Risk Management. Wiley,
Hoboken (2006)
[19] Wilmott, P., Dewynne, J., Howison, S.: The Mathematics of Financial
Derivatives: A Student Introduction. Cambridge University Press, Cambridge
(1995)
Chapter 8
The BSM Model and European Option Pricing
The BSM model prices a derivative using two securities that act as fundamen-
tal drivers: a money market account and the security serving as underlier of
the derivative.
Define a money market account to be a riskless security that has a value B0
at the current time 0 and grows by continuous compounding at the risk-free
rate r. Its value at a general time t is then
$$B_t = B_0\, e^{r t},$$
Remark 8.2. The value Bt is often used as a numeraire for an asset, which means
that the value of the asset at t can be expressed as a multiple of Bt , i.e., in units
of Bt .
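$$S_t = S_0\, e^{\mu_{RW}\, t + \sigma \mathfrak{B}_t} \qquad (0 \le t \le T),$$
$$\mu_{RW} = m - q - \frac{\sigma^2}{2},$$
where $\mathfrak{B}_t$ denotes standard Brownian motion (not to be confused with the money market value $B_t$).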
The instantaneous change is given by the following s.d.e.:
1 Reinvesting the cash dividend in the security is then simply the investor acquiring more units of the
security.
$$S_t^c = S_t\, e^{q t} \qquad (t \ge 0).$$
In this section, we shall employ a certain trading strategy $(n_t, b_t)$, i.e., take
a position with $n_t$ units of the underlying security and $b_t$ units of the money
market account, to construct a portfolio whose value replicates the price of a
derivative in a self-financing manner.
Before carrying out this strategy, we fix a derivative and assume that its price
based on 1 unit of its underlying security is a stochastic process $\{f_t\}_{0 \le t \le T}$ that
is a deterministic function of the underlier's market price (i.e., ex-dividend
price) $S_t$ and time t,
$$f_t = f(S_t, t), \tag{8.6}$$
where f ( x, t) is assumed to be at least twice continuously differentiable in x
and once continuously differentiable in t for x > 0 and 0 < t < T. In particular,
the current price of the derivative is f (S0 , 0). We assume that the derivative does
not pay a cash dividend.
Now, assume that we have an initial capital Vt at t. With this money, create a
portfolio using a trading strategy (nt , bt ), i.e., hold nt units of the cum-dividend
underlying security and bt units of the money market account. Think of (nt , bt )
as a stochastic process with values evolving in $\mathbb{R}^2$. The value of the portfolio
at t is
$$V_t = n_t S_t^c + b_t B_t.$$
As t advances and (nt , bt ) evolves, a key issue will be how to pay for these
changes in the number of units of the underlying security and the money mar-
ket account.
At time t + dt, suppose that $(n_{t+dt}, b_{t+dt})$ replicates the price of the derivative:
$$V_{t+dt} = n_{t+dt}\, S^c_{t+dt} + b_{t+dt}\, B_{t+dt} = f(S_{t+dt}, t + dt).$$
After the initial capital at t, the portfolio is self-financing if the trading strategy
(nt+dt , bt+dt ) at t + dt is funded without withdrawing or adding any external
funds to the portfolio. In other words, during the transition from time t to
t + dt, the value of the portfolio at t + dt arises only from an increase, decrease,
or neither in the values of the underlying security and/or the money market
account. The original strategy $(n_t, b_t)$, along with the possibly new values of
the underlier and money market account at t + dt, must then fund the
portfolio's replication of the derivative:
Let us then determine the trading strategy (nt , bt ) that will make (8.9) possible.
The self-financing condition (8.8) and replicating condition (8.9) will allow us
to determine the desired (nt , bt ). By (8.9),
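$$b_t = \frac{f(S_t, t) - n_t S_t^c}{B_t}.$$
Substituting into (8.8) and employing (8.1) and (8.5) give
$$dV_t = n_t\, dS_t^c + \big(f(S_t, t) - n_t S_t^c\big) \frac{dB_t}{B_t}
= \big[\, r f(S_t, t) + n_t (m - r) S_t^c \,\big]\, dt + n_t\, \sigma S_t^c\, d\mathfrak{B}_t, \tag{8.10}$$
where $d\mathfrak{B}_t$ is the increment of standard Brownian motion.
On the other hand, Itô's formula (8.4) yields
$$dV_t = d f(S_t, t) = \left[ \frac{1}{2} \sigma^2 S_t^2 \frac{\partial^2 f}{\partial x^2}(S_t, t) + (m - q) S_t \frac{\partial f}{\partial x}(S_t, t) + \frac{\partial f}{\partial t}(S_t, t) \right] dt
+ \sigma S_t \frac{\partial f}{\partial x}(S_t, t)\, d\mathfrak{B}_t. \tag{8.11}$$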
Now, an Ito process has a unique representation; see, for example, Korn and
Korn [24, p. 77]. This means that if
at = at , bt = bt .
$$\Delta_f(S_t, t) = \frac{\partial f}{\partial x}(S_t, t).$$
Delta is perhaps the most popular of the Greeks and will appear many times
in this chapter.
Thus, the price f (St , t) of the derivative at any time t can be replicated by a
self-financing strategy (nt , bt ), where
with
$$n_t = \frac{S_t}{S_t^c}\, \Delta_f(S_t, t), \qquad b_t = \frac{f(S_t, t) - S_t\, \Delta_f(S_t, t)}{B_t}. \tag{8.14}$$
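To make the replicating strategy concrete, the sketch below evaluates $n_t = (S_t/S_t^c)\,\Delta_f$ and $b_t = (f - S_t \Delta_f)/B_t$ for a European call. It assumes the closed-form call price and delta stated later in the chapter, and takes $B_0 = 1$; the function names and numerical inputs are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bsm_call_and_delta(s, k, sigma, r, q, tau):
    """BSM European call price and delta for time to expiration tau > 0."""
    d_plus = (math.log(s / k) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    price = s * math.exp(-q * tau) * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)
    delta = math.exp(-q * tau) * norm_cdf(d_plus)
    return price, delta


def replicating_strategy(s, t, k, sigma, r, q, expiry, b0=1.0):
    """Units (n_t, b_t) of cum-dividend underlier and money market account."""
    price, delta = bsm_call_and_delta(s, k, sigma, r, q, expiry - t)
    s_cum = s * math.exp(q * t)      # cum-dividend price S_t^c = S_t e^{qt}
    bank = b0 * math.exp(r * t)      # money market value B_t = B_0 e^{rt}
    n_t = s * delta / s_cum
    b_t = (price - s * delta) / bank
    return n_t, b_t, s_cum, bank, price


# Hypothetical inputs for illustration only.
n_t, b_t, s_cum, bank, price = replicating_strategy(
    s=100.0, t=0.25, k=95.0, sigma=0.2, r=0.03, q=0.01, expiry=1.0)
print(n_t, b_t, n_t * s_cum + b_t * bank, price)
```

By construction the portfolio value $n_t S_t^c + b_t B_t$ reproduces the call price; for a call, $b_t$ is negative, i.e., the strategy borrows from the money market account.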
Interestingly, the price of a derivative in the BSM model will arise from solving
a partial differential equation (p.d.e.). A second order p.d.e. in two indepen-
dent variables ( x, t) is an equation of the form
$$A(x, t) \frac{\partial^2 y}{\partial x^2}(x, t) + B(x, t) \frac{\partial^2 y}{\partial x\, \partial t}(x, t) + C(x, t) \frac{\partial^2 y}{\partial t^2}(x, t) + D(x, t) \frac{\partial y}{\partial x}(x, t)
+ E(x, t) \frac{\partial y}{\partial t}(x, t) + F(x, t)\, y(x, t) = 0, \tag{8.15}$$
where the coefficients and y(x, t) are deterministic functions. The p.d.e. (8.15)
is called
hyperbolic if $B^2 - 4AC > 0$
parabolic if $B^2 - 4AC = 0$
elliptic if $B^2 - 4AC < 0$.
Example 8.1. (Heat Equation) The following p.d.e. is well known in physics:
$$\frac{\partial y}{\partial t}(x, t) = c\, \frac{\partial^2 y}{\partial x^2}(x, t), \tag{8.16}$$
$$\frac{1}{2} \sigma^2 S_t^2 \frac{\partial^2 f}{\partial x^2}(S_t, t) + (r - q) S_t \frac{\partial f}{\partial x}(S_t, t) + \frac{\partial f}{\partial t}(S_t, t) - r f(S_t, t) = 0. \tag{8.17}$$
In general, Equation (8.17) is not a deterministic p.d.e. because $S_t$ is random for
t > 0. However, for each t in (0, T), the possible values of the lognormal random
variable $S_t$ range over $(0, \infty)$ independent of t. In other words, Equation
(8.17) holds at all points (x, t) in $(0, \infty) \times (0, T)$. The associated deterministic
p.d.e. is then
$$\frac{1}{2} \sigma^2 x^2 \frac{\partial^2 f}{\partial x^2}(x, t) + (r - q)\, x\, \frac{\partial f}{\partial x}(x, t) + \frac{\partial f}{\partial t}(x, t) - r f(x, t) = 0, \tag{8.18}$$
where $0 < x < \infty$ and $0 < t < T$. Equation (8.18) is called the BSM p.d.e. Because
B = C = 0, we get $B^2 - 4AC = 0$, i.e., the BSM p.d.e. is parabolic.
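As a numerical sanity check, one can plug a candidate solution into the left-hand side of the BSM p.d.e. and confirm that the residual is close to zero. The sketch below does this with central finite differences, using the European call formula derived later in the chapter as the candidate; the step sizes, the function names, and the inputs are assumptions made for illustration.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bsm_call(x, k, sigma, r, q, tau):
    """BSM European call price with time to expiration tau > 0."""
    d_plus = (math.log(x / k) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    return x * math.exp(-q * tau) * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)


def bsm_pde_residual(f, x, t, sigma, r, q, hx=1e-2, ht=1e-5):
    """Left-hand side of the BSM p.d.e. at (x, t), via central differences."""
    f_x = (f(x + hx, t) - f(x - hx, t)) / (2.0 * hx)
    f_xx = (f(x + hx, t) - 2.0 * f(x, t) + f(x - hx, t)) / hx ** 2
    f_t = (f(x, t + ht) - f(x, t - ht)) / (2.0 * ht)
    return 0.5 * sigma ** 2 * x ** 2 * f_xx + (r - q) * x * f_x + f_t - r * f(x, t)


# Hypothetical parameters: strike 100, expiration T = 1.
SIGMA, R, Q, K, T = 0.2, 0.03, 0.01, 100.0, 1.0


def call_solution(x, t):
    return bsm_call(x, K, SIGMA, R, Q, T - t)


def scaled_solution(x, t, c=1.3):
    # f_c(x, t) = f(c x, t): rescaling x should again give a solution.
    return call_solution(c * x, t)


residual = bsm_pde_residual(call_solution, 100.0, 0.5, SIGMA, R, Q)
residual_scaled = bsm_pde_residual(scaled_solution, 100.0, 0.5, SIGMA, R, Q)
print(residual, residual_scaled)
```

Both residuals should be zero up to finite-difference error, which illustrates why the p.d.e. alone does not pin down a price: final and boundary conditions are still needed.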
Assuming a solution f(x, t) of the BSM p.d.e. exists, the derivative's price is
given by f (St , t). The issue is that if f ( x, t) is a solution, then we can construct
infinitely many other solutions, which creates an infinity of derivative prices.
For example, if f ( x, t) is a solution, then for every positive real number c > 0
the function f c ( x, t) = f (c x, t) is also a solution (Exercise 8.17). The existence
and uniqueness of a derivative price require additional constraints on f ( x, t):
Note that the explicit nature of these conditions cannot be stated a priori be-
cause they depend on the contractual structure of the derivative.
For sufficiently well-behaved final and boundary conditions, the theory of parabolic
p.d.e.s yields that the BSM p.d.e. will have a unique solution f ( x, t) and, hence, the
derivative will have a unique price f (St , t). See Korn and Korn [24, Sec. 3.3] and
Miersemann [31, Chap. 6] for more.
Below we state the final and boundary conditions for European calls and puts
as well as present the associated unique solution of the BSM p.d.e. and the
derivative's price.
Notation. For European calls and puts, write the solutions of the BSM p.d.e.
as C E ( x, t) and P E ( x, t), respectively, rather than f ( x, t).
For a European call with strike K and expiration at T, the BSM p.d.e. is
$$\frac{1}{2} \sigma^2 x^2 \frac{\partial^2 C^E}{\partial x^2}(x, t) + (r - q)\, x\, \frac{\partial C^E}{\partial x}(x, t) + \frac{\partial C^E}{\partial t}(x, t) - r\, C^E(x, t) = 0, \tag{8.19}$$
the final condition is
$$C^E(x, T) = \max\{x - K, 0\}, \tag{8.20}$$
and boundary conditions are
Note that (8.20) and (8.21) follow from properties of European calls using no
arbitrage and put-call parity.
We outline how to solve the p.d.e. along the lines of Wilmott, Dewynne, and
Howison [42, Sec. 5.4], leaving the computational details as exercises:
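$$\tilde{S}_t = S_t\, e^{-q(T - t)},$$
$$\frac{1}{2} \sigma^2 \tilde{x}^2 \frac{\partial^2 C^E}{\partial \tilde{x}^2}(\tilde{x}, t) + r\, \tilde{x}\, \frac{\partial C^E}{\partial \tilde{x}}(\tilde{x}, t) + \frac{\partial C^E}{\partial t}(\tilde{x}, t) - r\, C^E(\tilde{x}, t) = 0, \tag{8.22}$$
where $\tilde{x} = x\, e^{-q(T - t)}$ and the call price is now viewed as a function of $(\tilde{x}, t)$.
The associated terminal and boundary conditions are
Note that $\tilde{x} = x$ at t = T.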
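$$\frac{\partial v}{\partial \tau}(x, \tau) = \frac{\partial^2 v}{\partial x^2}(x, \tau) + (k - 1) \frac{\partial v}{\partial x}(x, \tau) - k\, v(x, \tau) \tag{8.24}$$
and
$$v(x, 0) = \max\{e^x - 1, 0\}, \qquad \lim_{x \to -\infty} v(x, \tau) = 0, \qquad v(x, \tau) \sim e^x \ \text{as } x \to \infty. \tag{8.25}$$
$$v(x, \tau) = u(x, \tau)\, e^{a x + b \tau}, \tag{8.26}$$
$$\frac{\partial u}{\partial \tau}(x, \tau) = \frac{\partial^2 u}{\partial x^2}(x, \tau) \tag{8.28}$$
and (8.25) to
$$u(x, 0) = \max\left\{ e^{\frac{1}{2}(k+1) x} - e^{\frac{1}{2}(k-1) x},\ 0 \right\}, \qquad \lim_{|x| \to \infty} u(x, \tau)\, e^{-c x^2} = 0, \tag{8.29}$$
where c > 0.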
The heat equation has been extensively studied in physics and mathemat-
ics. The key result is that Equations (8.28) and (8.29) have a unique solution
given by
$$u(x, \tau) = \frac{1}{2\sqrt{\pi \tau}} \int_{-\infty}^{\infty} u(s, 0)\, e^{-\frac{(x - s)^2}{4 \tau}}\, ds, \tag{8.30}$$
where u(s, 0) is given in (8.29). After some change of variables and com-
pleting the square, the solution (8.30) can be transformed (see Wilmott,
Dewynne, and Howison [42, Sec. 5.4]) to
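$$u(x, \tau) = e^{\frac{1}{2}(k+1) x + \frac{1}{4}(k+1)^2 \tau}\, N(d_+) - e^{\frac{1}{2}(k-1) x + \frac{1}{4}(k-1)^2 \tau}\, N(d_-), \tag{8.31}$$
$$d_+ = \frac{x + (k + 1)\tau}{\sqrt{2\tau}}, \qquad d_- = d_+ - \sqrt{2\tau}.$$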
Inserting (8.31) and (8.27) in (8.26), the call price becomes (Exercise 8.23)
Transforming back to the original variables ( x, t), Equation (8.32) shows that
the unique solution of the BSM p.d.e. (8.19) subject to (8.20) and (8.21) is
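$$C^E(x, t) = x\, e^{-q(T - t)}\, N\big(d_+(x, T - t)\big) - K e^{-r(T - t)}\, N\big(d_-(x, T - t)\big), \tag{8.33}$$
where
$$d_\pm(x, T - t) = \frac{1}{\sigma \sqrt{T - t}} \left[ \ln\!\left( \frac{x}{K e^{-(r - q)(T - t)}} \right) \pm \frac{1}{2} \sigma^2 (T - t) \right]. \tag{8.34}$$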
It is important to emphasize that European call prices in the real world are determined
by market forces, not by the BSM formula.
The BSM pricing formula (8.37) is also applied when the underlying security
is replaced by a risky portfolio of securities. In this situation, the total value of
the portfolio at a general time t is $S_t$, and the portfolio is assumed to follow
geometric Brownian motion for $t \ge 0$. Like the underlying security of a
derivative, the portfolio is also assumed to be tradable.
Remark 8.3. The original 1973 formula by Black and Scholes for a European
call's price assumed q = 0, while Merton's 1973 paper extended the result to
q > 0.
The rightmost expression emphasizes the call's dependence on all the inputs
$S_t$, K, $\sigma$, r, $\tau$, and q.
Example 8.2. What is the fair current price of 500 European calls on an index
with current dollar value $1,100, strike price $1,100, volatility 15%, two months
to expiration, and dividend yield of 2.5%? Assume a risk-free rate of 2%.
Solution. The fair price of a derivative is its no-arbitrage price, which in the
case of a call is the BSM price. At the current time t = 0, the inputs are:
Then the European call formula (8.37) yields $C_0^E = \$26.31437$. Each call is
based on 100 indexes, so the total cost of the 500 calls is
$$500 \times 100 \times \$26.31437 = \$1{,}315{,}718.50.$$
Inserting the call price (8.37) into the put-call parity formula, namely,
$$P^E(S_t, t) = K e^{-r \tau} - S_t\, e^{-q \tau} + C^E(S_t, t),$$
The current price is then $P_0^E = P^E(S_0, 0)$. As noted earlier for calls, the actual prices
of European puts are dictated by the marketplace and not by (8.39).
The pricing formula (8.39) can also be obtained by solving the BSM p.d.e.
with final condition
$$P^E(x, T) = \max\{K - x, 0\}$$
and boundary conditions $P^E(0, t) = K e^{-r(T - t)}$ and $P^E(x, t) \to 0$ as $x \to \infty$. The
final and boundary conditions also follow from no arbitrage and put-call parity.
The corresponding unique solution is
$$P^E(x, t) = K e^{-r(T - t)}\, N(-d_-) - x\, e^{-q(T - t)}\, N(-d_+),$$
Example 8.3. What is the fair current price of 1,000 European puts on a stock
with current price $82, strike price $82, volatility 10%, and six months to ex-
piration? Assume a risk-free rate of 3% and that the stock pays no dividend.
The BSM pricing formula (8.39) gives $P_0^E = \$1.7365$. Hence, the fair cost of the
1,000 puts is: $1{,}000 \times 100 \times \$1.7365 = \$173{,}650$.
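The prices quoted in Examples 8.2 and 8.3 can be reproduced with a short script. The sketch below implements the European call formula and obtains the put from put-call parity; the function names are hypothetical, and the standard normal c.d.f. is computed from the error function.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bsm_call(s, k, sigma, r, q, tau):
    """BSM European call: S e^{-q tau} N(d+) - K e^{-r tau} N(d-)."""
    d_plus = (math.log(s / k) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    return s * math.exp(-q * tau) * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)


def bsm_put(s, k, sigma, r, q, tau):
    """BSM European put via put-call parity: P = K e^{-r tau} - S e^{-q tau} + C."""
    return k * math.exp(-r * tau) - s * math.exp(-q * tau) + bsm_call(s, k, sigma, r, q, tau)


# Example 8.2: index at $1,100, strike $1,100, volatility 15%, two months, q = 2.5%, r = 2%.
call_price = bsm_call(s=1100.0, k=1100.0, sigma=0.15, r=0.02, q=0.025, tau=2.0 / 12.0)

# Example 8.3: stock at $82, strike $82, volatility 10%, six months, no dividend, r = 3%.
put_price = bsm_put(s=82.0, k=82.0, sigma=0.10, r=0.03, q=0.0, tau=0.5)

print(round(call_price, 5), round(put_price, 4))
print(round(500 * 100 * call_price, 2), round(1000 * 100 * put_price, 2))
```

The per-option prices should agree with the values in the two examples up to rounding; the total costs follow by multiplying by the number of contracts and the contract size of 100.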
The European call price naturally involves partial derivatives relative to the
underlier price (delta) and the strike price. In fact, Equation (8.37) can be ex-
pressed more compactly in terms of partial derivatives:
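$$C^E(S_t, t) = S_t\, \Delta_C(t) + K\, \frac{\partial C^E}{\partial K}(t), \tag{8.40}$$
where the delta of the call is
$$\Delta_C(t) = \left. \frac{\partial C^E}{\partial x} \right|_{(x, t) = (S_t, t)} = e^{-q \tau}\, N\big(d_+(S_t, \tau)\big) > 0 \tag{8.41}$$
and
$$\frac{\partial C^E}{\partial K}(t) = -e^{-r \tau}\, N\big(d_-(S_t, \tau)\big) < 0, \tag{8.42}$$
where the strike price K is treated as a variable.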
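The decomposition of the call price into a delta term and a strike-sensitivity term can be checked numerically. The sketch below compares the closed-form strike derivative $-e^{-r\tau} N(d_-)$ with a central finite difference in K and then verifies that $S\,\Delta_C + K\,\partial C^E/\partial K$ recovers the call price. The function names, step size, and inputs are assumptions for illustration.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def d_plus_minus(s, k, sigma, r, q, tau):
    d_plus = (math.log(s / k) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return d_plus, d_plus - sigma * math.sqrt(tau)


def bsm_call(s, k, sigma, r, q, tau):
    d_plus, d_minus = d_plus_minus(s, k, sigma, r, q, tau)
    return s * math.exp(-q * tau) * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)


def call_delta(s, k, sigma, r, q, tau):
    """Closed-form delta: e^{-q tau} N(d+)."""
    d_plus, _ = d_plus_minus(s, k, sigma, r, q, tau)
    return math.exp(-q * tau) * norm_cdf(d_plus)


def call_strike_sensitivity(s, k, sigma, r, q, tau):
    """Closed-form derivative with respect to the strike: -e^{-r tau} N(d-)."""
    _, d_minus = d_plus_minus(s, k, sigma, r, q, tau)
    return -math.exp(-r * tau) * norm_cdf(d_minus)


# Hypothetical inputs (those of Example 8.2).
s, k, sigma, r, q, tau = 1100.0, 1100.0, 0.15, 0.02, 0.025, 2.0 / 12.0

h = 1e-3  # finite-difference step in the strike
dC_dK_fd = (bsm_call(s, k + h, sigma, r, q, tau) - bsm_call(s, k - h, sigma, r, q, tau)) / (2.0 * h)
dC_dK = call_strike_sensitivity(s, k, sigma, r, q, tau)
recombined = s * call_delta(s, k, sigma, r, q, tau) + k * dC_dK

print(dC_dK, dC_dK_fd)
print(recombined, bsm_call(s, k, sigma, r, q, tau))
```

Agreement of the two strike derivatives, and of the recombined value with the call price, reflects the fact that the call price is homogeneous of degree one in the pair (S, K).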
$$\Delta_C(t) = \frac{\partial C^E}{\partial S}(t).$$
An important consequence of (8.40), (8.42), and Itô's formula is that the price
of a European call is more volatile than that of its underlying security (Exercise 8.25).
In addition, Equation (8.41) yields that the delta of a European call is always
positive, so the price of a European call increases as the price of the underlying
security increases. Furthermore, Equation (8.42) shows that
$\frac{\partial C^E}{\partial K}(t) < 0$, i.e., the price
of a European call decreases as the strike price increases. In other words, an
out-of-the-money European call is cheaper than an at-the-money or in-the-money call with
the same inputs:
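$$P^E(S_t, t) = S_t\, \Delta_P(t) + K\, \frac{\partial P^E}{\partial K}(t),$$
where
$$\Delta_P(t) = \frac{\partial P^E}{\partial S}(t) = -e^{-q \tau}\, N\big(-d_+(S_t, \tau)\big) < 0 \tag{8.44}$$
and
$$\frac{\partial P^E}{\partial K}(t) = e^{-r \tau}\, N\big(-d_-(S_t, \tau)\big) > 0. \tag{8.45}$$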
The put's delta is negative by (8.44), i.e., the price of a European put decreases as the
price of the underlying security increases. Moreover, since (8.45) gives
$\frac{\partial P^E}{\partial K}(t) > 0$,
the price of a European put increases as the strike price increases. It follows that an
out-of-the-money European put is cheaper than an at-the-money or in-the-money put
with the same inputs:
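We conclude with the behavior of the deltas of European calls and puts as time
approaches expiration, i.e., as $\tau = T - t \to 0$. First, recall that (see page 394)
$$d_\pm(S_t, \tau) = \frac{\ln\!\left( \dfrac{S_t\, e^{-q \tau}}{K e^{-r \tau}} \right) \pm \frac{1}{2} \sigma^2 \tau}{\sigma \sqrt{\tau}}
= \frac{\ln\!\left( \dfrac{S_t}{K} \right)}{\sigma \sqrt{\tau}} + \frac{\left( r - q \pm \frac{1}{2} \sigma^2 \right) \sqrt{\tau}}{\sigma},$$
$$d_-(S_t, \tau) = d_+(S_t, \tau) - \sigma \sqrt{\tau}.$$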
Warrants are call options issued (i.e., sold) by a company on its own stock. They
provide a way for companies to raise money. When the warrants are exercised,
the company issues new shares of its stock and sells them to the holders at the
strike price. Note that issuing these new shares dilutes the share price of the
stock.
Suppose that at the current time, t = 0, a company has $N_{out}$ outstanding shares,
each of price S(0), and issues $N_w$ warrants, where each warrant is a European call
on 1 share of the company's stock with strike price K and expiration T. The number of
outstanding shares will not change until the warrants are exercised. The equity
value of the company, i.e., the value of the company's assets minus the value
of its debt, at 0 is denoted by V(0). It consists of the current value $N_{out} S(0)$
of the $N_{out}$ outstanding shares and the proceeds $N_w W(0)$ from selling the $N_w$
warrants, where W(0) is the current value of each warrant at 0:
$$V(0) = N_{out}\, S(0) + N_w\, W(0). \tag{8.51}$$
$$V_a(T) = V(T) + N_w K.$$
The price of a share of the company's equity instantly after the warrants are
exercised is
$$\frac{V_a(T)}{N_{out} + N_w},$$
where $N_{out} + N_w$ is the number of outstanding shares after exercise.
To value the warrant today, construct two portfolios as follows:
Portfolio A: long 1 warrant on 1 share of the company stock. The current
payoff of portfolio A is then the current value W(0) of the warrant.
Portfolio B: long $\frac{N_{out}}{N_{out} + N_w}$ European calls with current underlier price $\frac{V(0)}{N_{out}}$,
strike price K, expiration T, and no dividend. The price of the underlier at
a general time t is the value $\frac{V(t)}{N_{out}}$ of a share of company equity before
exercise. We assume that $\frac{V(t)}{N_{out}}$ follows a geometric Brownian motion with volatility
parameter $\sigma_V$. The current payoff of Portfolio B is then determined using the
BSM European call pricing formula:
$$\frac{N_{out}}{N_{out} + N_w}\; C^E\!\left( \frac{V(0)}{N_{out}},\, K,\, \sigma_V,\, r,\, T,\, q \right).$$
Because both portfolios have the same payoff at expiration, the law of one price
yields that they have the same payoff today, which gives the current value of
each warrant to be
$$W(0) = \frac{N_{out}}{N_{out} + N_w}\; C^E\!\left( \frac{V(0)}{N_{out}},\, K,\, \sigma_V,\, r,\, T,\, q \right). \tag{8.52}$$
Equation (8.52) assumes that $\frac{V(0)}{N_{out}}$ and $\sigma_V$ are known. If $\frac{V(0)}{N_{out}}$ is unknown, then
employing (8.51) we find:
$$W(0) = \frac{N_{out}}{N_{out} + N_w}\; C^E\!\left( S(0) + \frac{N_w}{N_{out}}\, W(0),\, K,\, \sigma_V,\, r,\, T,\, q \right). \tag{8.53}$$
This is a BSM-type pricing formula for the current price of the warrant in terms
of the current price of the warrant. In other words, we can numerically solve
(8.53) for W (0) implicitly.
Example 8.4. Suppose that a company has 3 million outstanding shares, the
current value of a share of the company's equity is $110, and the company's
equity per share has a volatility parameter of 20% per annum. The company
plans to issue 500,000 European warrants. Each warrant is based on 1 share of
the company's stock, the strike price is $125, and the expiration date is 3 years
away. Determine the fair price of the issuance, i.e., the BSM price. Assume the
company pays no dividend and the risk-free rate is 2.5%.
This is how much money the company would raise if it sold all the warrants
at the BSM price.
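A minimal sketch of the warrant calculation, under the assumption that the $110 in Example 8.4 is the equity value per outstanding share $V(0)/N_{out}$, so that formula (8.52) applies directly. The second function illustrates how the implicit relation (8.53) could be solved by fixed-point iteration when only the share price S(0) is known; all function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bsm_call(s, k, sigma, r, q, tau):
    d_plus = (math.log(s / k) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    return s * math.exp(-q * tau) * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)


def warrant_price_from_equity(equity_per_share, n_out, n_w, k, sigma_v, r, expiry, q=0.0):
    """Warrant value when V(0)/N_out is known, as in (8.52)."""
    dilution = n_out / (n_out + n_w)
    return dilution * bsm_call(equity_per_share, k, sigma_v, r, q, expiry)


def warrant_price_implicit(s0, n_out, n_w, k, sigma_v, r, expiry, q=0.0,
                           tol=1e-12, max_iter=200):
    """Solve W = dilution * C(S0 + (N_w/N_out) W, ...) by fixed-point iteration, as in (8.53)."""
    dilution = n_out / (n_out + n_w)
    w = 0.0
    for _ in range(max_iter):
        w_new = dilution * bsm_call(s0 + (n_w / n_out) * w, k, sigma_v, r, q, expiry)
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# Inputs of Example 8.4 (equity per share assumed to be V(0)/N_out = $110).
N_OUT, N_W = 3_000_000, 500_000
warrant = warrant_price_from_equity(110.0, N_OUT, N_W, k=125.0, sigma_v=0.20, r=0.025, expiry=3.0)
proceeds = N_W * warrant
print(round(warrant, 4), round(proceeds, 2))

# Consistency check: the share price implied by (8.51) should give back the same warrant value.
s0_implied = 110.0 - (N_W / N_OUT) * warrant
warrant_implicit = warrant_price_implicit(s0_implied, N_OUT, N_W, k=125.0, sigma_v=0.20,
                                          r=0.025, expiry=3.0)
```

The iteration converges quickly because each step scales the change in W by the factor $\frac{N_w}{N_{out} + N_w}$ times the call's delta, which is less than one.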
$$\mathcal{F}_s = \mathcal{F}_s^{\mathfrak{B}} = \sigma(\mathfrak{B}_\nu,\ 0 \le \nu \le s).$$
The collection $\mathcal{F}_s$ also includes the sample space $\Omega$, which consists of all
possible paths of standard Brownian motion, the empty set $\emptyset$, complements of
events in $\mathcal{F}_s$, and countable unions of events in $\mathcal{F}_s$. Each event in $\mathcal{F}_s$ then
carries a piece of information about standard Brownian motion $\mathfrak{B}$ on [0, s]. For example,
the sample space $\Omega$ is the event that the actual path standard Brownian motion
will follow will be one of the possible paths in $\Omega$, i.e., one of the possible paths
of $\mathfrak{B}$ will occur (superficial information), while C is the event that the values
of $\mathfrak{B}$ are between $-1$ and 1 from time 0 to time s. The $\sigma$-algebra $\mathcal{F}_s$ is then an
information set about the history of $\mathfrak{B}$ up to and including time s. Bear in
mind that the word "history" should not be interpreted literally to imply that
s is the current time or a past time; the time s can be in the future.
If standard Brownian motion is observed on [0, s], then we know the ac-
tual path it took on [0, s], i.e., we can confirm which events in Fs occurred or
not. Since the current time is at 0 and s > 0 is in the future, we may not yet
know which events in Fs have occurred. However, all events in Fs are still
confirmable in the sense that when the current time reaches s, we shall know
for each event in Fs whether or not it occurred. On the other hand, the event
= { : B ( ) 10, 0 s + 1}
A
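$$\mathcal{F}_s^X = \sigma(X_\nu,\ 0 \le \nu \le s) \subseteq \mathcal{F}_s.$$
$$S_t = g(\mathfrak{B}_t, t), \qquad g(x, t) = S_0\, e^{(m - q - \frac{1}{2} \sigma^2)\, t + \sigma x}.$$
Moreover, given that the constants $S_0$, m, q, and $\sigma$ are assumed known, we see
that for every fixed t the quantity $\mathfrak{B}_t$ uniquely determines $S_t$ through $g(\mathfrak{B}_t, t)$
and $S_t$ uniquely determines $\mathfrak{B}_t$ via
$$\mathfrak{B}_t = g^{-1}(S_t, t), \qquad g^{-1}(x, t) = \frac{1}{\sigma} \ln\!\left( \frac{x}{S_0} \right) - \frac{1}{\sigma} \left( m - q - \frac{1}{2} \sigma^2 \right) t.$$
For this reason, the $\sigma$-algebra $\mathcal{F}_s$ generated by $\{\mathfrak{B}_t\}_{t \ge 0}$ is the same as the one
generated by the security price process $\{S_t\}_{t \ge 0}$: 6
$$\mathcal{F}_s = \sigma(\mathfrak{B}_\nu,\ 0 \le \nu \le s) = \sigma(S_\nu,\ 0 \le \nu \le s). \tag{8.54}$$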
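$$E(X_\nu Y_s \mid \mathcal{F}_\nu) \overset{a.s.}{=} X_\nu\, E(Y_s \mid \mathcal{F}_\nu) \qquad (0 \le \nu \le s).$$
$$E(X_t \mid \mathcal{F}_s) \overset{a.s.}{=} E(X_t) \qquad (X_t \text{ is independent of } \mathcal{F}_s). \tag{8.55}$$
$$E\big( E(X_t \mid \mathcal{F}_s) \mid \mathcal{F}_\nu \big) \overset{a.s.}{=} E(X_t \mid \mathcal{F}_\nu) \qquad (0 \le \nu \le s \le t). \tag{8.56}$$
$$E\big( E(X_t \mid \mathcal{F}_s) \big) = E(X_t). \tag{8.57}$$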
where m is the instantaneous mean return of the underlier and r is the risk-free
rate.
Turning to the BSM model, it determines the price of a derivative as the so-
lution of the BSM p.d.e. under appropriate final and boundary conditions. A
simple but profound property that is easy to gloss over is that the coefficients
of the BSM p.d.e. are independent of the instantaneous expected return rate m.
In other words, given a derivative with well-behaved final and boundary con-
ditions, the solution of the BSM p.d.e. gives a unique derivative price process
that is independent of m. This means that the investor's risk preference is irrelevant!
Denote the value of the instantaneous expected return rate m more explicitly
by $m_P$, where P is the probability measure used to compute m. The above
observation suggests that a derivative can be priced in a world of risk-neutral
investors, where each investor has a probability measure $P_{RN}$ such that $m_{P_{RN}} = r$.
Remark 8.4. The insight for pricing a derivative in a risk-neutral world is due
to Cox and Ross [9].
such that
$$B^Q_t = \mathfrak{B}_t + \frac{m - r}{\sigma}\, t$$
is a standard Brownian motion on $(\Omega, \mathcal{F}_T, Q)$.$^{9}$ Note that $\frac{m - r}{\sigma}$ is a Sharpe ratio.
Under the probability measure Q, the underlying security price, which by assumption
is a geometric Brownian motion, has instantaneous expected return given
by the risk-free rate r. To see this, rewrite (8.58) as
$$dS_t = (r - q)\, S_t\, dt + \sigma S_t\, dB^Q_t, \tag{8.60}$$
where
8 See Privault [34, Thm. 6.1].
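9 $Q(A) = \int_A D_T(\omega)\, dP(\omega)$, where $\omega \in A$, $A \in \mathcal{F}_T$, and $D_T(\omega) = e^{-\frac{m - r}{\sigma} \mathfrak{B}_T(\omega) - \frac{1}{2} \left( \frac{m - r}{\sigma} \right)^2 T}$.
$$dB^Q_t = d\mathfrak{B}_t + \frac{m - r}{\sigma}\, dt.$$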
Since $B^Q_t$ is a standard Brownian motion with respect to Q, the underlier price
$S_t$ in (8.60) is also a geometric Brownian motion relative to Q, and its
instantaneous expected return is the risk-free rate, i.e., $m_Q = r$. The latter reveals the
risk neutrality of Q.
Let us express the risk neutrality of Q more explicitly by showing the link to
martingales. Equation (8.60) is solved by
$$S_t = S_0\, e^{(r - q - \frac{1}{2} \sigma^2)\, t + \sigma B^Q_t} \qquad (0 \le t \le T),$$
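where $S_0$ is the known initial price of the underlying security. Consider the
discounted price process
$$\tilde{S}_t = e^{-r t}\, S_t^c = e^{-(r - q)\, t}\, S_t,$$
where $S_t^c$ is the cum-dividend price of the underlier. Taking the conditional
expectation under Q yields (Exercise 8.28)
$$E_Q\big( \tilde{S}_t \mid \mathcal{F}_\nu \big) = \tilde{S}_\nu \qquad (0 \le \nu \le t \le T),$$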
For this reason, the probability measure Q is called a risk-neutral probability
measure, i.e., for every underlying security price process $\{S_t\}_{t \ge 0}$, the discounted
process $\{e^{-(r - q)\, t} S_t\}_{t \ge 0}$ is a martingale relative to Q. Equation (8.61) is the
risk-neutral price of the underlying security at time $\nu$. Note that the market
probability measure P is risk neutral only when m = r, in which case P = Q.
In fact, we saw that the self-financing, replicating nature of the portfolio deter-
mines f as a solution of the BSM p.d.e.
Considering the discounted portfolio value process
\[ \widetilde{V}_t = e^{-rt} V_t, \]
we find
\[ d\widetilde{V}_t = n_t\, \sigma\, \widetilde{S}_t\, dB^{Q}_t. \]
Integrating yields
\[ \widetilde{V}_t = \widetilde{V}_0 + \int_0^t n_v\, \sigma\, \widetilde{S}_v\, dB^{Q}_v. \tag{8.64} \]
An important result from stochastic calculus is that stochastic processes like those in (8.64) are martingales under the given measure and, conversely, a (square integrable) martingale can be expressed in such a form (martingale representation theorem).¹¹ Since \( \widetilde{V}_t \) is a martingale under Q, we have
\[ E_Q\big( \widetilde{V}_t \mid \mathcal{F}_\tau \big) = \widetilde{V}_\tau \qquad (0 \le \tau \le t \le T), \]
which is equivalent to
\[ V_\tau = e^{-r(t-\tau)}\, E_Q\big( V_t \mid \mathcal{F}_\tau \big) \qquad (0 \le \tau \le t \le T). \]
Because the portfolio value \( V_s \) replicates the price of the derivative at s for all \( 0 \le s \le T \), we obtain
¹¹ See, for example, Björk [5, Sec. 4.4] and Elliott and Kopp [14, p. 176] for details.
\[ S_T = S_t\, e^{\mu_{RN}(T-t) + \sigma (B^{Q}_T - B^{Q}_t)} \overset{d}{=} S_t\, e^{\mu_{RN}(T-t) + \sigma B^{Q}_{T-t}} \qquad (0 \le t \le T) \]
and
\[ \mu_{RN} = r - q - \frac{1}{2}\sigma^2. \]
To obtain the rightmost equality in (8.65), recall that the independence of Brownian motion increments implies that \( B^{Q}_T - B^{Q}_t \) is independent of \( B^{Q}_\tau \) for all \( 0 \le \tau \le t \). Consequently, the increment \( B^{Q}_T - B^{Q}_t \) is independent of \( \mathcal{F}_t \). Since f is a deterministic function of \( S_t \) with \( S_t \) an invertible deterministic function of standard Brownian motion, Equations (8.54) (page 403) and (8.55) yield the rightmost equality in (8.65).
Equation (8.65) is a stochastic process giving the risk-neutral price process of a European-style derivative under the probability measure Q during the time interval [0, T]. Since at the current time, t = 0, we know the entire history of the underlying security's prices up to and including time \( t_0 \), Equation (8.65) implies that the derivative's current risk-neutral price is the following constant:
Remark 8.5. Readers interested in more about the link between p.d.e.s and
stochastic differential equations should explore the Feynman-Kac formula.
Two issues were raised in Section 8.4.2 (see page 404): does there exist a risk-
neutral probability measure relative to which a European-style derivative can
be priced and, if it exists, is there a unique such probability measure? In Sec-
tion 8.4.2, we employed Girsanov theorem to invoke the existence of such a
Notation. For simplicity, we shall often use an asterisk to indicate when the risk-neutral probability is being employed. For example, if \( f^{Q}_Y \) is the p.d.f. of Y relative to Q, we shall also write \( f^{*}_Y \) instead of \( f^{Q}_Y \). In some cases, we shall also write \( E^{*}(X) \) instead of \( E_Q(X) \) for efficiency. This carries over from the study of binomial trees. It should be clear from the context whether \( E^{*}(X) \) is an expectation relative to the risk-neutral uptick probability \( p^{*}_n \) of a binomial tree or the risk-neutral measure Q of a continuous-time setting.
We saw in Section 8.1.3 (page 388) that all the European-style derivatives are
attainable in the BSM model. Since the marketplace for the BSM model has no
arbitrage and is complete, the BSM model has a unique risk-neutral probability
measure and so only one price for a derivative. Indeed, the unique risk-neutral
probability measure is the one given via Girsanov theorem by (8.59). This is
consistent with the BSM p.d.e. having a unique solution under appropriate
final and boundary conditions.
¹² Some authors call Theorem 8.1 the Fundamental Theorem of Asset Pricing.
¹³ Probability measures P and Q are called equivalent when P(A) = 0 if and only if Q(A) = 0.
¹⁴ See Chapter 12 (e.g., Appendices A and B) by Staum in Birge and Linetsky's handbook [4] for a
Later, we shall study the Merton jump diffusion model (Section 8.9) and en-
counter a no-arbitrage market that is incomplete. Moreover, since the Merton
model reduces to the BSM one, it also yields a unique risk-neutral probability
measure for the BSM model. See Section 8.9.4 on page 458.
Remark 8.6. The papers by Ross [37], Harrison and Kreps [19], Harrison and
Pliska [20], Kreps [25], and Delbaen and Schachermayer [11] laid the math-
ematical foundation for the two fundamental theorems of asset pricing. See
Schachermayer [39] for a history and discussion as well as Epps [15] for an
insightful summary. The lecture notes by Privault [34, Chaps. 2, 5, 15] give an
excellent accessible introduction to martingale pricing and the fundamental
asset pricing theorems in discrete and continuous time.
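and
\[ \int_K^{\infty} v\, f_{Y_T}(v)\, dv = Y_0\, e^{\mu_0 + \sigma_0^2/2}\, N(d_+), \tag{8.69} \]
where \( N(\cdot) \) is the standard normal cumulative distribution function and
\[ d_+ = \frac{\ln(Y_0/K) + \mu_0 + \sigma_0^2}{\sigma_0}, \qquad d_- = \frac{\ln(Y_0/K) + \mu_0}{\sigma_0}. \]
\[ X = \mu_{RN}\, T + \sigma B^{Q}_T \sim N(\mu_0, \sigma_0^2). \]
Here \( \mu_0 = \mu_{RN}\, T \), \( \sigma_0^2 = \sigma^2 T \), and
\[ d_+ = \frac{\ln\frac{S_0}{K} + (r - q + \frac{1}{2}\sigma^2)\, T}{\sigma\sqrt{T}}, \qquad d_- = \frac{\ln\frac{S_0}{K} + (r - q - \frac{1}{2}\sigma^2)\, T}{\sigma\sqrt{T}}. \]
That is, \( d_\pm = d_\pm(S_0, T) \); see Equations (8.35) and (8.37) (page 394). It follows:
\[ \begin{aligned}
E^{*}\big(C^E(T)\big) &= \int_K^{\infty} v\, f^{*}_{S_T}(v)\, dv - K \int_K^{\infty} f^{*}_{S_T}(v)\, dv \\
&= Y_0\, e^{\mu_0 + \sigma_0^2/2}\, N\big(d_+(S_0, T)\big) - K\, N\big(d_-(S_0, T)\big) \\
&= S_0\, e^{(r-q) T}\, N\big(d_+(S_0, T)\big) - K\, N\big(d_-(S_0, T)\big).
\end{aligned} \]
Hence, we obtain the BSM formula for the current price of a European call:
\[ C^E(0) = S_0\, e^{-qT}\, N\big(d_+(S_0, T)\big) - K\, e^{-rT}\, N\big(d_-(S_0, T)\big). \tag{8.70} \]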
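Formula (8.70) is straightforward to evaluate numerically. The following Python sketch is illustrative only: the function names `norm_cdf` and `bsm_call` are hypothetical, and the standard normal c.d.f. is computed from the error function in the standard library.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(s0: float, k: float, sigma: float, r: float, t: float, q: float = 0.0) -> float:
    """Current European call price from (8.70); t is the time to expiration in years."""
    d_plus = (math.log(s0 / k) + (r - q + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d_minus = d_plus - sigma * math.sqrt(t)
    return s0 * math.exp(-q * t) * norm_cdf(d_plus) - k * math.exp(-r * t) * norm_cdf(d_minus)


# Inputs of a 3-month at-the-money call: S0 = K = $50, volatility 15%, r = 2%, q = 0.
price_3m = bsm_call(s0=50.0, k=50.0, sigma=0.15, r=0.02, t=0.25)
# Inputs of an 80-day at-the-money call: S0 = K = $75, same volatility and rate.
price_80d = bsm_call(s0=75.0, k=75.0, sigma=0.15, r=0.02, t=80 / 365)
print(f"3-month call: {price_3m:.2f}, 80-day call: {price_80d:.6f}")
```

The two input sets are those of Examples 8.7 and 8.9, so the printed values can be compared with the prices quoted there.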
\[ u_n = u, \qquad d_n = d, \qquad p_n = p. \]
For a one-period binomial tree from time \( t_0 \) to \( t_1 \), we shall determine the current price \( C^E(t_0; 1) \) of a European call option with strike K and expiration T in terms of a replicating portfolio that finances itself after an initial capital.
Assume an initial capital \( V_0 \) at the current time \( t_0 \), and create a portfolio as follows:
We do not yet know the explicit nature of the trading strategy \( (n_0, b_0) \), e.g., whether we have a long position in the security (\( n_0 > 0 \)) or short position in the money market account (\( b_0 < 0 \)). However, since only the initial capital is used to form these positions, we must have:
where the choice of \( B(t_0) \) is an arbitrary positive number that we have simply set to be $1. The value of the portfolio at \( t_1 \) arises from changes in the values of the security and the money market fund:
Fig. 8.1 One-period binomial tree models of the prices of the security (left tree) and European call
option (right tree). Here p is the probability of an upward movement in the tree
where \( S^c(t_1) \) is the value of the security cum dividend (with dividend). In other words, due to dividend reinvesting, the original one unit of the security has grown to \( e^{qh} \) units. Consequently, the cum-dividend unit price of the security at \( t_1 \) is
\[ S^c(t_1) = S(t_1)\, e^{qh}, \tag{8.72} \]
where \( S(t_1) \) is the security's ex-dividend (without dividend) unit price. Note that the prices shown in Figure 8.1 are ex-dividend prices. The value of the money market account at \( t_1 \) is
\[ B(t_1) = B(t_0)\, e^{rh}. \tag{8.73} \]
Now, suppose that at time \( t_1 \) we have a trading strategy \( (n_1, b_1) \). Then the value of the trading strategy is
\[ n_1\, S^c(t_1) + b_1\, B(t_1). \]
That is, after using the initial capital, no outside funds can be added, and no funds can be taken out of the portfolio.
We want to find a trading strategy \( (n_1, b_1) \) at expiration \( t_1 \) such that the portfolio's value replicates the call's price
If this is the case, then by the law of one price, the current price of the European call will equal the current value of the portfolio. Basically, the goal is to show that the European call is attainable, i.e., there is a self-financing portfolio that replicates its price.
Let us first solve for the trading strategy \( (n_1, b_1) \) using the self-financing condition (8.74). The two possible values of \( S(t_1) \) are (Figure 8.1):
\[ S(t_1) = \begin{cases} S_u(t_1) = S(t_0)\, u & \text{with probability } p \\ S_d(t_1) = S(t_0)\, d & \text{with probability } 1 - p, \end{cases} \tag{8.76} \]
where
\[ u > 1, \qquad 0 < d < 1, \qquad 0 < p < 1. \]
By (8.72) and (8.76), Equation (8.74) becomes
This is solved by
\[ n_1 = n_0, \qquad b_1 = b_0. \tag{8.78} \]
Turning to the replicating condition (8.75), the possible prices of the call at \( t_1 \) are (Figure 8.1):
\[ C^E(t_1; 1) = \begin{cases} C^E_u(t_1) = \max\{S_u(t_1) - K,\, 0\} & \text{with probability } p \\ C^E_d(t_1) = \max\{S_d(t_1) - K,\, 0\} & \text{with probability } 1 - p. \end{cases} \tag{8.79} \]
Note that in (8.79), we made use of the fact that the terminal value of a European call is its payoff:
where K is the strike price. Employing (8.78), we see that (8.75) becomes
The coefficient matrix of this linear system of two equations in two unknowns is invertible:
\[ \det \begin{pmatrix} e^{qh} S_u(t_1) & e^{rh} B(t_0) \\ e^{qh} S_d(t_1) & e^{rh} B(t_0) \end{pmatrix} = e^{(r+q)h}\, S(t_0)\, B(t_0)\, (u - d) > 0. \]
Remark 8.7. The expression for \( n_0 \) in (8.82) includes a discrete version of a European call's delta, namely, the partial difference at \( t_1 \) of the call price with respect to the underlier price:
where \( n_0 \) and \( b_0 \) are given by (8.82). Explicitly, the current price of the European call can be expressed as
\[ C^E(t_0; 1) = e^{-rh} \left[ \frac{e^{(r-q)h} - d}{u - d}\, C^E_u(t_1) + \frac{u - e^{(r-q)h}}{u - d}\, C^E_d(t_1) \right]. \tag{8.83} \]
Equation (8.83) shows that the option's current price is independent of the underlier's instantaneous expected return rate m! Hence, the one-period call price is independent of investors' view on m. This is consistent with the earlier observation (page 404) from the BSM p.d.e. that the price of a derivative is independent of m.
\[ p^{*} u + (1 - p^{*})\, d = e^{(r-q)h}. \tag{8.86} \]
e(rq) h = d, e(rq) h = u.
0 < p < 1.
where the call does not pay a dividend (by assumption). By (8.89), the current call price is:
\[ C^E(t_0; 1) = e^{-rh} \left[ p^{*}\, C^E_u(t_1) + (1 - p^{*})\, C^E_d(t_1) \right], \tag{8.90} \]
where
Example 8.5. Suppose that the risk-free rate is 2% per annum and a stock has a current price of $50. Assume that the stock pays no dividend and its price 3 months from now is either $53.8900 or $46.3850, where four decimal places are used to minimize rounding-off errors. Using a binomial tree, compute the one-period current price of a 3-month European call on this stock given a strike price of $50.
Solution. The formula for the one-period European call price at the current time \( t_0 \) is
\[ C^E(t_0; 1) = e^{-rh} \left[ p^{*} \max\{S(t_0)\, u - K,\, 0\} + (1 - p^{*}) \max\{S(t_0)\, d - K,\, 0\} \right], \]
where \( t_0 = 0 \) and
\[ p^{*} = \frac{e^{(r-q)h} - d}{u - d}. \]
The needed inputs are:
\[ u = \frac{\$53.8900}{\$50} = 1.0778, \qquad d = \frac{\$46.3850}{\$50} = 0.9277, \qquad p^{*} = 0.5151. \]
Direct calculation then yields the current call price: \( C^E(t_0; 1) = \$1.99 \).
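The calculation in Example 8.5 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the money market unit is B(t0) = $1 as in the text, and the replicating positions are obtained by solving the two replication equations directly rather than by quoting (8.82).

```python
import math


def one_period_call(s0, k, u, d, r, h, q=0.0):
    """Risk-neutral one-period call price: returns (p_star, price) as in (8.90)."""
    p_star = (math.exp((r - q) * h) - d) / (u - d)   # risk-neutral uptick probability
    c_up = max(s0 * u - k, 0.0)                      # call payoff after an uptick
    c_down = max(s0 * d - k, 0.0)                    # call payoff after a downtick
    price = math.exp(-r * h) * (p_star * c_up + (1.0 - p_star) * c_down)
    return p_star, price


def replicating_value(s0, k, u, d, r, h, q=0.0, b_unit=1.0):
    """Solve n0*e^{qh}*S_x + b0*e^{rh}*B(t0) = C_x for x in {u, d}; return (n0, b0, V0)."""
    s_up, s_down = s0 * u, s0 * d
    c_up, c_down = max(s_up - k, 0.0), max(s_down - k, 0.0)
    n0 = math.exp(-q * h) * (c_up - c_down) / (s_up - s_down)
    b0 = math.exp(-r * h) * (c_up - n0 * math.exp(q * h) * s_up) / b_unit
    return n0, b0, n0 * s0 + b0 * b_unit


# Example 8.5 inputs: S(t0) = K = $50, u = 1.0778, d = 0.9277, r = 2%, h = 3 months.
p_star, price = one_period_call(50.0, 50.0, 1.0778, 0.9277, 0.02, 0.25)
n0, b0, v0 = replicating_value(50.0, 50.0, 1.0778, 0.9277, 0.02, 0.25)
print(f"p* = {p_star:.4f}, risk-neutral price = {price:.2f}, replication cost = {v0:.2f}")
```

The replication cost and the risk-neutral price coincide, which illustrates the law of one price argument used above; in this instance n0 is positive and b0 is negative, i.e., the portfolio is long the stock and borrows from the money market account.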
Consider the two-period risk-neutral binomial tree in Figure 8.2, where the time remaining on the European call is 2h and runs from \( t_0 \) to \( t_2 \). We shall employ risk-neutral pricing to determine the current price of the call:
\[ C^E(t_0; 2) = e^{-r(2h)}\, E^{*}\big( C^E(t_2; 2) \big). \]
Fig. 8.2 Possible European call prices for a two-period binomial tree
Direct approach. To compute the expectation \( E^{*}\big( C^E(t_2; 2) \big) \) directly, Figure 8.2 shows that there are four paths leading to the three possible prices of the call. The probability of a given price is the sum of the probabilities of each path leading to the price, where the probability along a path is the product of the probabilities along each section. It follows:
\[ C^E(t_0; 2) = e^{-r(2h)} \left[ (p^{*})^2\, C^E_{u^2}(t_2) + 2 p^{*} (1 - p^{*})\, C^E_{ud}(t_2) + (1 - p^{*})^2\, C^E_{d^2}(t_2) \right]. \tag{8.91} \]
Using (8.20) and (8.76), we get the explicit forms:
Note that when a binomial tree has many paths, our direct method to compute \( E^{*}\big( C^E(t_2; 2) \big) \) becomes nontrivial quickly, e.g., a 20-step binomial tree already has over 1 million paths.
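where
\[ C^E_u(t_2) = \begin{cases} C^E_{u^2}(t_2) & \text{with probability } p^{*} \\ C^E_{ud}(t_2) & \text{with probability } 1 - p^{*} \end{cases} \]
and
\[ C^E_d(t_2) = \begin{cases} C^E_{ud}(t_2) & \text{with probability } p^{*} \\ C^E_{d^2}(t_2) & \text{with probability } 1 - p^{*}. \end{cases} \]
The possible call prices at time \( t_1 \) are then:
\[ C^E(t_1; 2) = \begin{cases} C^E_u(t_1) = e^{-rh} \left[ p^{*}\, C^E_{u^2}(t_2) + (1 - p^{*})\, C^E_{ud}(t_2) \right] \\ C^E_d(t_1) = e^{-rh} \left[ p^{*}\, C^E_{ud}(t_2) + (1 - p^{*})\, C^E_{d^2}(t_2) \right]. \end{cases} \tag{8.93} \]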
where \( C^E_u(t_1) \) and \( C^E_d(t_1) \) are now given by (8.93). The European call option price over the two-period interval \( [t_0, t_2] \) is then given as follows:
\[ C^E(t_0; 2) = e^{-r(2h)} \left[ (p^{*})^2\, C^E_{u^2}(t_2) + 2 p^{*} (1 - p^{*})\, C^E_{ud}(t_2) + (1 - p^{*})^2\, C^E_{d^2}(t_2) \right], \tag{8.94} \]
where \( C^E_{u^2}(t_2) \), \( C^E_{ud}(t_2) \), and \( C^E_{d^2}(t_2) \) are given by (8.92). Equation (8.94) agrees with (8.91).
Filtration approach. We can also obtain Equation (8.94) via risk-neutral pricing using the σ-algebra \( \mathcal{F}_t \) for discrete time \( t = t_j \); see Example 6.17 (page 269) for more on the filtration. In our binomial tree setting, the σ-algebra \( \mathcal{F}_{t_j} \equiv \mathcal{F}_j \) is generated by the underlying security's prices up to time \( t_j \). For example, \( \mathcal{F}_1 \) is generated by \( S_0, S_1 \) and given by
\[ \mathcal{F}_1 = \{\emptyset,\, A_U,\, A_D,\, \Omega\}, \]
where
Note that each element in the events \( A_U \), \( A_D \), and \( \Omega \) is an entire price path from \( t_0 \) to \( t_2 \) (not just to \( t_1 \)). In particular, the path \( \omega = UU \) has an uptick at \( t_1 \) followed by another uptick at \( t_2 \), and \( A_U \) consists of all price paths with an uptick at \( t_1 \), while \( A_D \) consists of those with a downtick at \( t_1 \). The risk-neutral price of the European call is
\[ C^E(t_0; 2) = e^{-r(2h)}\, E^{*}\big( C^E(t_2; 2) \big), \tag{8.95} \]
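\[ = \begin{cases} p^{*}\, C^E_{u^2}(t_2) + (1 - p^{*})\, C^E_{ud}(t_2) & \text{with probability } p^{*} \\ p^{*}\, C^E_{ud}(t_2) + (1 - p^{*})\, C^E_{d^2}(t_2) & \text{with probability } 1 - p^{*}. \end{cases} \]
Consequently,
\[ E^{*}\big( C^E(t_2; 2) \big) = (p^{*})^2\, C^E_{u^2}(t_2) + 2 p^{*} (1 - p^{*})\, C^E_{ud}(t_2) + (1 - p^{*})^2\, C^E_{d^2}(t_2). \]
Inserting the above into (8.95) yields the same call price as in (8.94).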
Example 8.6. Suppose that a stock with current price of $50 pays no dividend. Assume that at 1.5 months from now, the price of the stock is either $52.7250 or $47.4150, where four decimal places are used for pedagogical reasons to minimize rounding errors. Employing a binomial tree and risk-free rate of 2% per annum, compute the two-period current price of a 3-month European call option on this stock given a strike price of $50.
\[ u = \frac{\$52.7250}{\$50} = 1.0545, \qquad d = \frac{\$47.4150}{\$50} = 0.9483, \qquad p^{*} = \frac{e^{(r-q)h} - d}{u - d} = 0.5104. \]
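With the inputs of Example 8.6, the direct formula (8.91) and the backward step through (8.93) can be compared numerically. The sketch below is illustrative only; the function name `two_period_call` is hypothetical.

```python
import math


def two_period_call(s0, k, u, d, r, h, q=0.0):
    """Two-period call price computed two ways: direct sum (8.91) and backward via (8.93)."""
    p = (math.exp((r - q) * h) - d) / (u - d)     # risk-neutral uptick probability p*
    disc = math.exp(-r * h)                       # one-period discount factor

    def payoff(s):
        return max(s - k, 0.0)

    c_uu, c_ud, c_dd = payoff(s0 * u * u), payoff(s0 * u * d), payoff(s0 * d * d)

    # Direct approach: weight the three terminal prices by their path probabilities.
    direct = disc ** 2 * (p * p * c_uu + 2 * p * (1 - p) * c_ud + (1 - p) ** 2 * c_dd)

    # Backward approach: price the call at t1 in each state, then step back to t0.
    c_u = disc * (p * c_uu + (1 - p) * c_ud)
    c_d = disc * (p * c_ud + (1 - p) * c_dd)
    backward = disc * (p * c_u + (1 - p) * c_d)
    return p, direct, backward


# Example 8.6 inputs: S(t0) = K = $50, u = 1.0545, d = 0.9483, r = 2%, h = 1.5 months.
p_star, direct_price, backward_price = two_period_call(50.0, 50.0, 1.0545, 0.9483, 0.02, 0.125)
print(f"p* = {p_star:.4f}, direct = {direct_price:.2f}, backward = {backward_price:.2f}")
```

Both routes give the same number, and rounding to cents gives the n = 2 price of $1.45 quoted in the caption of Figure 8.3.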
For the n-period case \( 0 = t_0 < t_1 < \cdots < t_n \), an induction argument yields the current price of a European call as given by (Exercise 8.32):
\[ C^E(t_0; n) = e^{-r(nh)}\, E^{*}\big( C^E(t_n; n) \big) = e^{-r(nh)} \sum_{i=0}^{n} \binom{n}{i} (p^{*})^i (1 - p^{*})^{n-i}\, C^E_{u^i d^{n-i}}(t_n), \tag{8.97} \]
where
\[ C^E_{u^i d^{n-i}}(t_n) = \max\{S(t_0)\, u^i d^{n-i} - K,\, 0\} \]
for i = 0, 1, . . . , n. The formula (8.97) can be expressed more simply as (Exercise 8.33):
where k is the smallest value of i for which \( S(t_0)\, u^i d^{n-i} - K > 0 \) and
\[ \tau = t_n - t_0, \qquad N(n, k, p) = \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n-i}, \qquad \hat{p} = \frac{p^{*} u}{e^{(r-q)h}}. \]
The reader may have noticed that the pricing formula (8.98) looks similar to the BSM pricing formula (8.70) on page 411. In the continuous-time limit \( n \to \infty \), it can be shown (see Hsia [21]) that (8.98) converges to (8.70):
Example 8.7. Suppose that a stock paying no dividend has a current price of
$50 and annual volatility of 15%. For a risk-free rate of 2% per annum, compute
the 100-period current price of a 3-month European call on this stock with a
strike price of $50.
we obtain a current call price of \( C^E(t_0; 100) = \$1.62 \). This coincides with the result of the continuous-time BSM pricing formula (8.70). In fact, for an 80-period tree, the two prices already agree to the given decimal places; see Figure 8.3.
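The n-period formula (8.97) can be evaluated directly for the inputs of Example 8.7. The sketch below assumes a CRR tree (the tree named in the caption of Figure 8.3) with the common parameterization u = exp(σ√h) and d = 1/u; this parameterization is an assumption here, since the defining equations are not restated in this passage, and the function name `crr_call` is hypothetical.

```python
import math


def crr_call(s0, k, sigma, r, t, n, q=0.0):
    """n-period binomial call price via the sum (8.97) on a CRR tree (assumed u, d)."""
    h = t / n                                   # length of one period in years
    u = math.exp(sigma * math.sqrt(h))          # assumed CRR uptick factor
    d = 1.0 / u                                 # assumed CRR downtick factor
    p = (math.exp((r - q) * h) - d) / (u - d)   # risk-neutral uptick probability p*
    expected_payoff = 0.0
    for i in range(n + 1):                      # i = number of upticks along a path
        payoff = max(s0 * u ** i * d ** (n - i) - k, 0.0)
        expected_payoff += math.comb(n, i) * p ** i * (1 - p) ** (n - i) * payoff
    return math.exp(-r * t) * expected_payoff


# Example 8.7 inputs: S(t0) = K = $50, volatility 15%, r = 2%, 3 months, no dividend.
prices = {n: crr_call(50.0, 50.0, 0.15, 0.02, 0.25, n) for n in (1, 2, 80, 100)}
for n, value in prices.items():
    print(f"n = {n:3d}: {value:.4f}")
```

Under these assumptions the n = 1 and n = 2 values round to the $1.99 and $1.45 reported in the caption of Figure 8.3, and the n = 100 value lies within a cent of the BSM price of $1.62.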
[Figure 8.3 plot: call price ($), from 1.5 to 2.0, on the vertical axis versus the number of periods n, from 0 to 100, on the horizontal axis.]
Fig. 8.3 The per share prices of a 3-month European call option, where the strike price is K = $50 per share, risk-free rate is 0.02 per annum, and underlying stock has current share price of $50 and volatility of 15%. The bullets show the call prices given by the n-period binomial option-pricing formula using a CRR tree for n = 1, . . . , 100. The horizontal dashed line is the per share call price of $1.62 obtained using the BSM formula. Even values of n give call prices below the BSM line, while odd values are above. The binomial per share call price is $1.99 for n = 1 and $1.45 for n = 2. The price to two decimal places is already $1.62 for n = 80
When selling European calls, the risk to the seller lies in being able to meet the obligation should the calls be exercised. For instance, if you sell 500 European calls on a stock and they are exercised at expiration, then you need to have 50,000 shares of the stock available at expiration. The risk lies, for example, in not having them available at expiration. Or, if you buy the 50,000 shares right after selling the calls to be assured of covering your obligations in the event of exercise, and the calls expire out of the money, you are left with a lot of shares that may have even lost value.
Our core strategy for managing the risk from selling a European call is to
create simultaneously an offsetting position in a synthetic European call. The
synthetic call will be constructed using the funds from the short sale and bor-
rowing a certain amount. The core of the strategy is the process of delta hedging
and will result in the loan being paid off at expiration. Note that the cash divi-
dend from the underlier can be used either to buy more units of the underlier,
which is assumed by default, or to pay toward the loan. To make the presen-
tation different, instead of employing our default assumption, assume that the
cash dividend is used continuously as payment toward the loan.
We now detail the theoretical framework for hedging.
Time t
\[ C^E_t + L_t = \Delta_C(t)\, S_t. \]
The portfolio with \( \Delta_C(t) \) units of the underlier and loan \( L_t \) is called a synthetic call because its value \( C^E_{syn}(t) \) replicates the call price:
\[ C^E_{syn}(t) = \Delta_C(t)\, S_t - L_t = C^E_t. \]
Remark 8.9.
1. Since \( C^E_{syn}(t) = n_t S^c_t + b_t B_t \), where \( n_t = \frac{S_t}{S^c_t}\, \Delta_C(t) \) and \( b_t = -\frac{L_t}{B_t} \), the synthetic call can be viewed as a portfolio with a position in \( n_t \) units of the
Since
\[ C^E_{syn}(t) = \Delta_C(t)\, S_t - L_t \quad \text{with} \quad L_t = \Delta_C(t)\, S_t - C^E_t, \tag{8.100} \]
\[ 0 = V_C(t) = \underbrace{-C^E_t}_{\text{short call}} + \underbrace{I_t}_{\text{risk-free investment}} - \underbrace{\mathcal{L}_t}_{\text{loan}} + \underbrace{\Delta_C(t)\, S_t}_{\text{long } \Delta_C(t) \text{ units}}, \tag{8.101} \]
where
\[ I_t = C^E_t, \qquad \mathcal{L}_t = \Delta_C(t)\, S_t. \]
Equation (8.101) has a natural financial interpretation: the portfolio with positions (a) and (b) at t is equivalent in value to a costless portfolio where one short sells 1 European call for \( C^E_t \), invests the proceeds \( C^E_t \) in a risk-free investment, borrows the amount \( \mathcal{L}_t = \Delta_C(t)\, S_t \), and uses the loan to buy \( \Delta_C(t) \) units of the underlying security.
Time t + dt
As time t advances, the positions on the right-hand side of (8.101) will change.
We shall have to update continuously our earlier positions and will do so in a
costless, self-financing way to maintain the equation at zero, i.e., maintain the
replication by the synthetic call. Recall that self-financing means the change in
the value of the portfolio comes strictly from the change in the value of the
securities in the portfolio, which includes a loan. For instance, any purchases
of additional units of the security will be funded by increasing the loan, i.e.,
more borrowing, and any proceeds from selling units of the security will not
be withdrawn, but paid toward the loan. Note that borrowing is at the risk-free
rate.
Let us look closely at this process. At time t + dt, the portfolio's value has the form:
\[ \mathcal{L}_{t+dt} = \mathcal{L}_t\, e^{(r-q)\, dt}. \]
If \( d\Delta_C(t) > 0 \), then \( \Delta_C(t) \) units are not enough, so we purchase \( d\Delta_C(t) \) units by borrowing \( d\Delta_C(t)\, S_{t+dt} \). The loan's balance becomes
\[ \mathcal{L}_{t+dt} = \mathcal{L}_t\, e^{(r-q)\, dt} + d\Delta_C(t)\, S_{t+dt}. \]
If \( d\Delta_C(t) < 0 \), then \( \Delta_C(t) \) units are too much. We then sell \( |d\Delta_C(t)| \) units and receive \( |d\Delta_C(t)|\, S_{t+dt} \). We pay this amount toward the loan, which decreases the loan's balance to
\[ \mathcal{L}_{t+dt} = \mathcal{L}_t\, e^{(r-q)\, dt} - |d\Delta_C(t)|\, S_{t+dt}. \]
where \( d\Delta_C(t) \) can be zero, positive, or negative. The delta hedging process is repeated during every instant until expiration.
Expiration Time T
At expiration T, the short position in the call will be \( -C^E_T \), the initial investment \( C^E_t \) would have grown at the risk-free rate r to \( C^E_t e^{r\tau} \) over the period \( \tau = T - t \), the loan's balance will be \( \mathcal{L}_T \), and the long position in the underlying asset will consist of \( \Delta_C(T) \) units and have value \( \Delta_C(T)\, S_T \). The value of the portfolio at expiration is then:
\[ 0 = V_C(T) = -C^E_T + I_T - \mathcal{L}_T + \Delta_C(T)\, S_T, \]
\[ 0 = -S_T + K + C^E_t e^{r\tau} - \mathcal{L}_T + S_T. \]
Consequently,
\[ K + C^E_t e^{r\tau} = \mathcal{L}_T. \]
Hence, if the call is exercised, then the loan's balance \( \mathcal{L}_T \) can also be paid off by using the proceeds K from selling the 1 unit of the underlying security to the call holder and the cash \( C^E_t e^{r\tau} \) from liquidating the risk-free investment.
At-the-money or out-of-the-money European call at expiration. Assume \( S_T \le K \), i.e., the call is not exercised.¹⁶ Then even though the issuer has no obligation to the call holder, the balance on the loan still has to be settled. For \( S_T \le K \), we have \( C^E_T = 0 \) and so
\[ 0 = C^E_t e^{r\tau} - \mathcal{L}_T + \Delta_C(T)\, S_T, \]
i.e.,
\[ C^E_t e^{r\tau} + \Delta_C(T)\, S_T = \mathcal{L}_T. \]
¹⁶ We assume that a European call is exercised if and only if \( S_T > K \) and not exercised if and only if \( S_T \le K \).
In other words, if the call is not exercised, then the loan's balance \( \mathcal{L}_T \) can still be paid off by the cash inflow \( C^E_t e^{r\tau} + \Delta_C(T)\, S_T \) from liquidating the risk-free investment and the long position of \( \Delta_C(T) \) units of the underlying security. Note that (8.47) yields \( \Delta_C(T) = \frac{1}{2} \) for \( S_T = K \) and (8.49) implies \( \Delta_C(T) = 0 \) for \( S_T < K \). The latter, along with (8.104) and
\[ d\Delta_C(T - dt) = \Delta_C(T) - \Delta_C(T - dt), \]
implies that
\[ \mathcal{L}_T = \begin{cases} \mathcal{L}_{T-dt}\, e^{(r-q)\, dt} + \left( \frac{1}{2} - \Delta_C(T - dt) \right) S_T & \text{at-the-money at } T \\ \mathcal{L}_{T-dt}\, e^{(r-q)\, dt} - \Delta_C(T - dt)\, S_T & \text{out-of-the-money at } T. \end{cases} \tag{8.105} \]
In summary, we see that employing delta hedging enables one to meet the
obligations of selling European calls using a costless, self-financing, replicating
process. Even though the process requires no initial capital, it involves borrow-
ing, but the loan is paid off at expiration.
A major concern to a European call seller is fulfilling the obligations of the call
if it is exercised. To get a sense of this risk, we give a simple example:
Example 8.8. Suppose that a firm sells 500 European calls for $2.264248 per share of a stock currently trading at $75 per share. Since each call involves 100 shares, the firm receives
\[ 500 \times 100 \times \$2.264248 = \$113{,}212.40. \]
Assume that there are 80 days to expiration, the strike price is $75, and the firm invests the $113,212.40 proceeds in a risk-free investment growing at 2% annually. Suppose that there are 365 days in a year; see Remark 8.1 (page 385).
Should the calls be exercised at expiration, the firm must sell 500 × 100 = 50,000 shares of the stock to the call holder for $75 per share. But if the firm does not own any share of the stock and plans to buy the 50,000 shares only when the calls are exercised, then the firm is taking a naked position and exposing itself to potential loss. For instance, if the share price at expiration is $87.98, then the calls will be exercised and it will cost the firm $4,399,000 to buy the shares to satisfy the obligations of the calls. Though the firm would receive $3,750,000 from selling the shares at the strike price and the firm's proceeds from selling the calls would have grown to \( e^{0.02(80/365)} \times \$113{,}212.40 = \$113{,}709.76 \) at expiration, the firm would still have a loss that more than quadruples the gain from selling the calls:
\[ \$113{,}709.76 - \$4{,}399{,}000 + \$3{,}750{,}000 = -\$535{,}290.24. \]
On the other hand, the firm can take a covered position by buying the 50,000 shares at the time it sells the calls. If the stock price falls to $65 at expiration, i.e., the shares drop in value by $500,000, then the calls will not be exercised and, despite the growth in the call-sale proceeds to $113,709.76, the firm will experience a loss in value that more than triples this gain:
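The arithmetic of the naked and covered positions in Example 8.8 can be checked with a short script. This is a sketch under simplifying assumptions that mirror the example: the cost of financing the share purchase in the covered case is ignored, and the function names `naked_result` and `covered_result` are hypothetical. The covered figure is computed by the sketch, not quoted from the text.

```python
import math

SHARES = 500 * 100                  # 500 calls on 100 shares each
STRIKE = 75.0
R, TAU = 0.02, 80 / 365             # risk-free rate and time to expiration in years

proceeds = SHARES * 2.264248        # call-sale proceeds at the per-share price
grown_proceeds = proceeds * math.exp(R * TAU)   # proceeds invested at the risk-free rate


def naked_result(s_T: float) -> float:
    """Net result at expiration when shares are bought only if the calls are exercised."""
    exercise_loss = max(s_T - STRIKE, 0.0) * SHARES   # buy at s_T, deliver at the strike
    return grown_proceeds - exercise_loss


def covered_result(s_T: float, s_0: float = 75.0) -> float:
    """Net result at expiration when the shares are bought at s_0 when the calls are sold."""
    share_value = min(s_T, STRIKE) * SHARES   # delivered at the strike if exercised, else held
    return grown_proceeds + share_value - s_0 * SHARES


print(f"grown proceeds: {grown_proceeds:,.2f}")
print(f"naked, S_T = 87.98: {naked_result(87.98):,.2f}")
print(f"covered, S_T = 65: {covered_result(65.0):,.2f}")
```

Under these assumptions the naked case reproduces the loss of about $535,290 computed above, while the covered case at $65 gives a loss of roughly 3.4 times the grown proceeds.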
How many shares should the firm then hold to hedge against the risk from selling a European call? The answer is actually not a fixed number of shares. The number will have to change as time advances. This process is called delta hedging.
This tool is actually contained in the construction of the costless, self-financing,
riskless portfolio employed in Section 8.1.4. We apply the ideas and results of
that section to illustrate, in principle, how delta hedging works. For simplicity,
the discussion will focus on the issuance of 1 call on 1 unit of an underlier.
The application in Section 8.6.2 and, in particular, Example 8.9 (page 428) will
illustrate delta hedging using the example above.
We now illustrate the theoretical framework for delta hedging European
calls by applying it to Example 8.8.
Example 8.9. Suppose that a firm sells 500 European calls for $113,212.40 on a
stock with share price of $75 and annual volatility of 15%. All of the calls then
involve 50,000 shares of the stock. Assume that there are 80 days to expira-
tion, the strike price is $75, and the firm invests the $113,212.40 in a risk-free
investment growing at 2% per annum. How can the firm manage the risk from
the call sale without taking a naked or covered position? We illustrate how to
manage the risk using daily delta hedging. Since the risk-free investment will
earn interest even on non-trading days, assume 365 days in a year and, for
simplicity, suppose that trading occurs on each of the 80 days remaining till
expiration.
Let us now delta hedge day by day for 80 days using Equations (8.101) and (8.102) as a guide, which we rewrite for convenience:
\[ 0 = V_C(t) = \underbrace{-C^E_t}_{\text{short call}} + \underbrace{I_t}_{\text{risk-free investment}} - \underbrace{\mathcal{L}_t}_{\text{loan}} + \underbrace{\Delta_C(t)\, S_t}_{\text{long } \Delta_C(t) \text{ units}}, \tag{8.106} \]
and
where
\[ I_{t+dt} = C^E_t\, e^{r\, dt}, \qquad \mathcal{L}_{t+dt} = \mathcal{L}_t\, e^{(r-q)\, dt} + d\Delta_C(t)\, S_{t+dt} \tag{8.108} \]
with
\[ d\Delta_C(t) = \Delta_C(t + dt) - \Delta_C(t). \]
Note that Equations (8.106) and (8.107) are based on one share of the stock.
The current time is Day 0, each instantaneous time change is approximated by a day, and expiration is at T:
\[ t = 0, \qquad dt \approx h = \frac{1}{365}, \qquad T = 80\, h. \]
Because we cannot perfectly delta hedge, i.e., we cannot hedge every moment of time (e.g., dt is replaced by a day) and cannot work to infinitely many decimal places, our results will not yield \( V_C(t) = 0 \) and \( V_C(t + dt) = 0 \) as the current time t advances day by day. However, though a perfect hedge would yield zero loss at expiration, our approximate hedge will significantly reduce any losses compared to the naked and covered positions in Example 8.8 (page 427). The results are summarized in Table 8.1.
We shall carry out the delta hedging on a per-share basis. Each per-share
position can then be multiplied by 50,000 to obtain the total size of the position.
Table 8.1 Delta hedging to mitigate against risk from selling 500 calls. The time to expiration is 80 days. Multiply the per-share values by 50,000 to get the total values. Though the European call expires in the money, delta hedging enables the seller to have the required 50,000 shares and minimizes the losses to only $2,326.10. Simulation is based on a MATLAB code, and we truncated the output at six decimal places. The portfolio's value at time t is \( V_C(t) = -C^E_t + I_t - \mathcal{L}_t + \Delta_C(t)\, S_t \)
t | \( S_t \) | \( \Delta_C(t) \) | \( C^E_t \) (short call) | \( I_t \) (investment) | \( \mathcal{L}_t \) (loan balance) | \( \Delta_C(t)\, S_t \) (long \( \Delta_C(t) \) shares) | \( V_C(t) \) (port. value)
Day 0
The stock price is $75 and there are 80 days until expiration. We employ the per-share Equation (8.106). The firm short sells each European call on one share at the BSM price. With the starting time t = 0, the BSM price per share is
\[ C^E_0 = C^E(S_0, K, \sigma, r, T, q) = C^E(\$75, \$75, 0.15, 0.02, 80\, h, 0) = \$2.264248445. \]
We shall employ at least six decimal places to minimize rounding errors. The short sale creates a negative position with per-share value
\[ -C^E_0 = -\$2.264248445. \]
The firm invests the proceeds from the sale in a risk-free account paying 2% per annum:
\[ I_0 = C^E_0 = \$2.264248445 \quad \text{(per share)}. \]
Now, the current delta of the call to nine decimal places is
\[ \Delta_C(0) = 0.538848947, \]
Day 1
We shall use (8.107). Suppose that the stock price is $75.80660915. There are 79 days until expiration and the per-share value of the short sale position is now
\[ -C^E_h = -C^E(\$75.80660915, \$75, 0.15, 0.02, 79\, h, 0) = -\$2.707914261. \]
\[ I_h = I_0\, e^{rh} = \$2.264372517, \]
\[ \Delta_C(h) = 0.598789243. \]
which rounds off at six decimal places to the value in the table.
The process from Day 0 to Day 1 is repeated across consecutive days. We
then skip ahead to the day before expiration.
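The Day 0 and Day 1 bookkeeping can be reproduced in a few lines. The following is only a sketch: it is not the MATLAB code behind Table 8.1, the function names are hypothetical, and it assumes q = 0 together with the standard closed-form delta N(d+) of a call on a non-dividend-paying stock, which is not restated in this passage.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price_and_delta(s, k, sigma, r, tau):
    """BSM call price and delta for q = 0 (delta = N(d+) is an assumed closed form)."""
    d_plus = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    price = s * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)
    return price, norm_cdf(d_plus)


K, SIGMA, R, H = 75.0, 0.15, 0.02, 1 / 365

# Day 0: short the call, invest the proceeds, borrow to buy delta shares (per share).
s0 = 75.0
c0, delta0 = call_price_and_delta(s0, K, SIGMA, R, 80 * H)
invest0 = c0
loan0 = delta0 * s0
v0 = -c0 + invest0 - loan0 + delta0 * s0

# Day 1: the loan accrues interest and finances the change in delta, as in (8.108).
s1 = 75.80660915
c1, delta1 = call_price_and_delta(s1, K, SIGMA, R, 79 * H)
invest1 = invest0 * math.exp(R * H)
loan1 = loan0 * math.exp(R * H) + (delta1 - delta0) * s1
v1 = -c1 + invest1 - loan1 + delta1 * s1

print(f"Day 0: C = {c0:.6f}, delta = {delta0:.6f}, loan = {loan0:.6f}, V = {v0:.6f}")
print(f"Day 1: C = {c1:.6f}, delta = {delta1:.6f}, loan = {loan1:.6f}, V = {v1:.6f}")
```

Repeating the Day 1 step for each subsequent day, with the remaining time reduced by h each time, gives a daily delta hedge along any chosen price path. The Day 1 portfolio value is small but not zero because the hedge is rebalanced only once a day.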
Day 79
Suppose that on Day 79 we have the rounded values given in Table 8.1.
Day 80
The stock price is $93.56147415, which means that the call expires in the money and will be exercised. The short sale position now has value
Note that the above value, which is exposed to more rounding errors due to the day-to-day delta hedging, still gives the same result to six decimal places as the more accurate value
\[ \Delta_C(T) = 1. \]
In other words, at expiration the firm has the required 50,000 shares to meet the obligations of the call being exercised. Since
\[ \begin{aligned}
-\$0.046522 = V_C(T) &= -C^E_T + I_T - \mathcal{L}_T + \Delta_C(T)\, S_T \\
&= -S_T + K + I_T - \mathcal{L}_T + S_T \\
&= K + I_T - \mathcal{L}_T \quad \text{(per share)}.
\end{aligned} \]
In other words, the proceeds K received from selling the share at the strike price plus the amount \( I_T \) from the risk-free investment are not enough to cover the balance \( \mathcal{L}_T \) of the loan. The negative value corresponds to a total loss of 50,000 × $0.046522 = $2,326.10.
In the idealized case of continuously delta hedging, the firm would net zero
when the calls are sold at the BSM price. Note that if the firm had sold the calls
sufficiently higher than the BSM price, then it can even make a profit.
Exercise 8.12 explores a case where the European call in this example expires
out of the money. In this case, we would make use of (8.105).
The price of a European call is more volatile than that of its underlying secu-
rity (Exercise 8.25). A 1% price movement of the underlier can lead to a price
movement in the call that is significantly larger (Exercise 8.8). This section ex-
plores how to make the value of a portfolio of options with the same under-
lying security more stable against small price movements in the underlier. To
accomplish this, we first discuss option Greeks and then apply these ideas to
the construction of delta- and gamma-neutral portfolios.
\( f_i(S_t, t) \). Let \( N_i \) be the number, based on each unit of the underlier, of the ith option in the portfolio. Here \( N_i > 0 \) indicates a long position, \( N_i = 0 \) no position, and \( N_i < 0 \) a short position in the ith option. For example, suppose that the first position in the portfolio is long 300 calls on a stock. Since a call involves 100 shares of the stock, the number of calls on a per-share basis is \( N_1 = 300 \times 100 = 30{,}000 \). The value of the portfolio at time t is
The portfolio also has what are called option Greeks, which define the rate of
change of the portfolio relative to various parameters.
We shall first introduce these rates of change for the options in the portfolio. Recall that the value of an option is a function of (x, t) with x representing the possible prices of the underlier at time t (see page 391). Using rates of change in x and t, we then define the following three option Greeks of the ith option in the portfolio:
\[ \begin{aligned}
\text{Delta:} \quad & \Delta_i(S_t, t) = \frac{\partial f_i}{\partial S}(S_t, t) = \left. \frac{\partial f_i}{\partial x} \right|_{(x, t) = (S_t, t)} \\
\text{Gamma:} \quad & \Gamma_i(S_t, t) = \frac{\partial^2 f_i}{\partial S^2}(S_t, t) = \left. \frac{\partial^2 f_i}{\partial x^2} \right|_{(x, t) = (S_t, t)} \\
\text{Theta:} \quad & \Theta_i(S_t, t) = \frac{\partial f_i}{\partial t}(S_t, t) = \left. \frac{\partial f_i}{\partial t} \right|_{(x, t) = (S_t, t)}.
\end{aligned} \]
The formulas for these option Greeks in the case of calls and puts are given in Exercise 8.35 on page 473.
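Now, viewing the portfolio value V as a function of (x, t), the option Greeks extend naturally to the portfolio:
\[ \begin{aligned}
\Delta(S_t, t) &= \frac{\partial V}{\partial S}(S_t, t) = \left. \frac{\partial V}{\partial x} \right|_{(x, t) = (S_t, t)} = \sum_{i=1}^{k} N_i\, \Delta_i(S_t, t) \\
\Gamma(S_t, t) &= \frac{\partial^2 V}{\partial S^2}(S_t, t) = \left. \frac{\partial^2 V}{\partial x^2} \right|_{(x, t) = (S_t, t)} = \sum_{i=1}^{k} N_i\, \Gamma_i(S_t, t) \\
\Theta(S_t, t) &= \frac{\partial V}{\partial t}(S_t, t) = \left. \frac{\partial V}{\partial t} \right|_{(x, t) = (S_t, t)} = \sum_{i=1}^{k} N_i\, \Theta_i(S_t, t).
\end{aligned} \]
Note that if the gamma of a portfolio stays sufficiently small, then delta changes only a little as the underlier price changes, which reduces the need for frequent rebalancing in delta hedging.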
Turning to the BSM p.d.e., we know that each option in the portfolio satisfies this p.d.e. (q = 0):
\[ \frac{1}{2} \sigma^2 S_t^2\, \frac{\partial^2 f_i}{\partial x^2}(S_t, t) + r S_t\, \frac{\partial f_i}{\partial x}(S_t, t) + \frac{\partial f_i}{\partial t}(S_t, t) - r f_i(S_t, t) = 0. \]
This can then be expressed in terms of option Greeks as follows:
\[ \frac{1}{2} \sigma^2 S_t^2\, \Gamma_i(S_t, t) + r S_t\, \Delta_i(S_t, t) + \Theta_i(S_t, t) - r f_i(S_t, t) = 0. \]
Summing over the k options in the portfolio, it follows that V satisfies the BSM p.d.e.:
\[ \frac{1}{2} \sigma^2 S_t^2\, \Gamma(S_t, t) + r S_t\, \Delta(S_t, t) + \Theta(S_t, t) - r V(S_t, t) = 0. \tag{8.109} \]
As mentioned earlier (page 391), this equation holds for all \( 0 \le t \le T \) and every \( 0 < S_t < \infty \).
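Equation (8.109) can be checked numerically for a small portfolio of calls. The sketch below is illustrative: the portfolio (long 300 calls struck at $50 and short 100 calls struck at $55) is hypothetical, and the closed-form delta, gamma, and theta of a call with q = 0 are the standard ones, assumed here rather than quoted from Exercise 8.35.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_value_and_greeks(s, k, sigma, r, tau):
    """BSM call (q = 0): value, delta, gamma, theta (assumed standard closed forms)."""
    sqrt_tau = math.sqrt(tau)
    d_plus = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt_tau)
    d_minus = d_plus - sigma * sqrt_tau
    value = s * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)
    delta = norm_cdf(d_plus)
    gamma = norm_pdf(d_plus) / (s * sigma * sqrt_tau)
    theta = (-s * norm_pdf(d_plus) * sigma / (2.0 * sqrt_tau)
             - r * k * math.exp(-r * tau) * norm_cdf(d_minus))
    return value, delta, gamma, theta


S, SIGMA, R, TAU = 50.0, 0.15, 0.02, 0.25
# Hypothetical positions on a per-share basis: (N_i, strike_i).
positions = [(300 * 100, 50.0), (-100 * 100, 55.0)]

# Portfolio value and Greeks are the N_i-weighted sums over the options.
V = DELTA = GAMMA = THETA = 0.0
for n_i, strike in positions:
    value, delta, gamma, theta = call_value_and_greeks(S, strike, SIGMA, R, TAU)
    V += n_i * value
    DELTA += n_i * delta
    GAMMA += n_i * gamma
    THETA += n_i * theta

# Left-hand side of (8.109); it should vanish up to floating-point error.
residual = 0.5 * SIGMA ** 2 * S ** 2 * GAMMA + R * S * DELTA + THETA - R * V
print(f"V = {V:.2f}, Delta = {DELTA:.2f}, Gamma = {GAMMA:.2f}, Theta = {THETA:.2f}")
print(f"BSM p.d.e. residual = {residual:.2e}")
```

Because the p.d.e. is linear, the residual vanishes for any choice of the position sizes.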
Now, expand the portfolio to include \( N_S \) units of the underlying security:
\[ \widehat{V}(S_t, t) = V(S_t, t) + N_S\, S_t. \]
We saw that at the core of managing risk from short selling, an option (or
derivative) is delta hedging, i.e., maintaining a long position in a delta num-
ber of units of the underlying security. This process makes the portfolio with a
short position in the option and delta long position in the security delta neutral,
meaning the portfolio value is robust against sufficiently small price move-
ments in the underlier. For instance, suppose that at the current time t, a port-
folio is short 1 European call, and denote its value by
V ( St , t ) = C E ( St , t ).
now determine the value of NS that will accomplish the desired insensitivity.
Denote the value of the expanded portfolio by
(St , t) = V P (St , t) + NS St = C E (St , t) + NS St .
V
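Using Equation (8.33) on page 394 to express the call price as a function of two variables (x, t) with the security price as x, we have
$$\widetilde{V}^P(x,t) = -C^E(x,t) + N_S\,x.$$
Then
$$\frac{\partial \widetilde{V}^P}{\partial S}(S_t,t) = \left.\frac{\partial \widetilde{V}^P}{\partial x}\right|_{(x,t)=(S_t,t)} = -\left.\frac{\partial C^E}{\partial x}\right|_{(x,t)=(S_t,t)} + N_S(t) = -\Delta_C(t) + N_S(t).$$
Choosing $N_S(t) = \Delta_C(t)$ therefore gives
$$\frac{\partial \widetilde{V}^P}{\partial S}(S_t,t) = 0.$$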
In other words, a sufficiently small movement in the security price during the
instant from t to t + dt causes little change in the portfolio value. The portfolio
is said to be delta neutral at time t in the sense that it is neutral on whether
the stock has a small price movement up or down. Of course, the number of
shares $N_S = \Delta_C(t)$ that was bought at time t to make the expanded portfolio
delta neutral at t will change as the current time t advances. This means that
the expanded portfolio will have to be continuously rebalanced.
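To illustrate delta neutrality for a portfolio that is short one European call, the sketch below holds a number of shares equal to the call delta and then estimates the slope of the expanded portfolio value with respect to the stock price by a central finite difference. The BSM call price and delta N(d+) are the standard formulas for a non-dividend-paying stock; all numerical inputs are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_plus(S, K, sigma, r, tau):
    return (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def call_price(S, K, sigma, r, tau):
    """BSM price of a European call on a non-dividend-paying stock."""
    dp = d_plus(S, K, sigma, r, tau)
    dm = dp - sigma * math.sqrt(tau)
    return S * norm_cdf(dp) - K * math.exp(-r * tau) * norm_cdf(dm)

def call_delta(S, K, sigma, r, tau):
    """Per-share delta of the call, N(d+)."""
    return norm_cdf(d_plus(S, K, sigma, r, tau))

# Hypothetical inputs for illustration only.
S, K, sigma, r, tau = 50.0, 50.0, 0.3, 0.02, 0.25

# Number of shares bought at time t: the call delta.
n_shares = call_delta(S, K, sigma, r, tau)

def hedged_value(x):
    """Expanded portfolio: short one call plus n_shares shares, as a function of x."""
    return -call_price(x, K, sigma, r, tau) + n_shares * x

# Central finite-difference estimate of the slope at x = S.
h = 1e-4
slope = (hedged_value(S + h) - hedged_value(S - h)) / (2.0 * h)
print(f"shares held: {n_shares:.4f}, slope of hedged value: {slope:.2e}")
```

The slope is zero only at the current price and time; re-running the calculation with a different S or a shorter time to expiration gives a different delta, which is the rebalancing the text describes.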
Let us now turn to delta neutrality of the general portfolio of options considered in Section 8.7.1. We are interested in how an instantaneous change dS in the price of a security impacts the instantaneous change $d\widetilde{V}$ in the value of the expanded portfolio. Itô's formula yields:
$$d\widetilde{V}(S_t,t) = \frac{1}{2}\sigma^2 S_t^2\,\frac{\partial^2 \widetilde{V}}{\partial x^2}(S_t,t)\,dt + \frac{\partial \widetilde{V}}{\partial x}(S_t,t)\,dS_t + \frac{\partial \widetilde{V}}{\partial t}(S_t,t)\,dt.$$
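In terms of option Greeks, we obtain:
$$d\widetilde{V}(S_t,t) = \widetilde{\Delta}(S_t,t)\,dS_t + \frac{1}{2}\,\widetilde{\Gamma}(S_t,t)\,(dS_t)^2 + \widetilde{\Theta}(S_t,t)\,dt, \tag{8.111}$$
where
$$\widetilde{\Gamma}(S_t,t) = \Gamma(S_t,t), \qquad \widetilde{\Theta}(S_t,t) = \Theta(S_t,t), \qquad (dS_t)^2 = \sigma^2 S_t^2\,dt.$$
For sufficiently small incremental changes $\delta\widetilde{V}$, $\delta S$, and $\delta t$, Equation (8.111) yields
$$\delta\widetilde{V}(S_t,t) \approx \widetilde{\Delta}(S_t,t)\,\delta S_t + \frac{1}{2}\,\widetilde{\Gamma}(S_t,t)\,(\delta S_t)^2 + \widetilde{\Theta}(S_t,t)\,\delta t. \tag{8.112}$$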
Equation (8.112) shows that the impact of a price movement $\delta S_t$ on the change $\delta\widetilde{V}(S_t,t)$ in the value of the expanded portfolio can be reduced by making the portfolio delta neutral, i.e., by making delta vanish:
$$\widetilde{\Delta}(S_t,t) = \Delta(S_t,t) + N_S = 0.$$
Equivalently,
$$N_S = -\Delta(S_t,t). \tag{8.113}$$
For instance, for a portfolio that is short one option with value $f_1$ and delta $\Delta_1$, we have
$$V(S_t,t) = -f_1(S_t,t), \qquad \Delta(S_t,t) = -\Delta_1(S_t,t),$$
which gives
$$N_S = -\Delta(S_t,t) = \Delta_1(S_t,t).$$
In other words, the portfolio is made delta neutral by expanding it to include a long position with $\Delta_1(S_t,t)$ units of the underlier. In general, a portfolio of options is made delta neutral by expanding it to include $N_S = -\Delta(S_t,t)$ units of the underlying security.
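The general rule, that the hedge holds minus the portfolio delta in units of the underlier, can be sketched in a few lines of Python. The positions below are hypothetical and are not the ones in Example 8.10; the helper name `hedge_units` is likewise an illustrative choice.

```python
def hedge_units(positions):
    """Return (portfolio delta, N_S) for a list of (N_i, per-share delta_i).

    N_i is the signed number of options on a per-share basis
    (negative for a short position).
    """
    portfolio_delta = sum(n_i * delta_i for n_i, delta_i in positions)
    return portfolio_delta, -portfolio_delta

# Hypothetical portfolio: short 200 call contracts with per-share delta 0.60
# and long 100 put contracts with per-share delta -0.35, 100 shares per contract.
positions = [(-200 * 100, 0.60), (100 * 100, -0.35)]

portfolio_delta, n_s = hedge_units(positions)

# Delta of the expanded portfolio: each share of the underlier has delta 1.
expanded_delta = portfolio_delta + n_s
print(f"portfolio delta: {portfolio_delta:.1f}, shares to hold: {n_s:.1f}")
```

A positive N_S means buying shares and a negative N_S means short selling them.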
Example 8.10. (Delta Neutrality) A portfolio has a short position in 500 European calls and a long position in 300 puts, all on the same underlying stock, which is assumed to pay no dividend. The per-share deltas of each call and put are 0.5389 and 0.7584, respectively. How can the portfolio be made delta neutral?
and puts is
$$N_2 = 300 \times 100 = 30{,}000.$$
By (8.113), the portfolio can be made delta neutral by expanding it to include
4,193 shares of the stock:
A delta-neutral portfolio also has an interesting link between its gamma and theta. By the BSM p.d.e. (8.110), when the expanded portfolio is delta neutral, we have
$$\frac{1}{2}\sigma^2 S_t^2\,\widetilde{\Gamma}(S_t,t) + \widetilde{\Theta}(S_t,t) = r\,\widetilde{V}(S_t,t).$$
Since the right-hand side $r\,\widetilde{V}(S_t,t)$ is a fixed value at t, if theta $\widetilde{\Theta}(S_t,t)$ has a sufficiently large positive (resp., negative) value, then it forces gamma $\widetilde{\Gamma}(S_t,t)$ to have a sufficiently large negative (resp., positive) value to maintain the fixed value on the right. For this reason, theta $\widetilde{\Theta}(S_t,t)$ is also interpreted intuitively as a proxy for gamma $\widetilde{\Gamma}(S_t,t)$ in a delta-neutral portfolio; see Hull [22, Sec. 19.7]. Note that theta measures the impact of the change in time on the change in value of the portfolio, while our discussion is focused on the impact of the underlying security's price change on the portfolio's change in value. Readers are referred to Hull [22, Sec. 19.5] and Kwok [26, Sec. 2.1.3] for more on theta.
$$\widehat{V}(S_t,t) = \widetilde{V}(S_t,t) + N_o\,f_o(S_t,t),$$
where $f_o(S_t,t)$ is the value of the option at t. The gamma of the new portfolio is
$$\widehat{\Gamma}(S_t,t) = \frac{\partial^2 \widehat{V}}{\partial S^2}(S_t,t) = \widetilde{\Gamma}(S_t,t) + N_o\,\Gamma_o(S_t,t).$$
The gamma $\widehat{\Gamma}(S_t,t)$ vanishes exactly when
$$N_o = -\frac{\widetilde{\Gamma}(S_t,t)}{\Gamma_o(S_t,t)}. \tag{8.115}$$
Note that we cannot accomplish this gamma-neutrality by adding a certain
number of units of the underlying security since the gamma of the underlier is
zero.
The reader may have noticed from (8.114) that though $N_o$ in (8.115) makes the new portfolio gamma neutral, it causes the delta of the new portfolio to be nonzero:
$$\widehat{\Delta}(S_t,t) = \widetilde{\Delta}(S_t,t) + N_o\,\Delta_o(S_t,t) = N_o\,\Delta_o(S_t,t),$$
where $\widetilde{\Delta}(S_t,t) = 0$ and $\Delta_o(S_t,t)$ is the delta of the option. To make the new portfolio delta neutral, we take a position
$$N_{\mathrm{mod}} = -N_o\,\Delta_o(S_t,t) \tag{8.116}$$
in the underlying security. The new portfolio then has the following value at t:
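$$\widehat{V}_{\mathrm{mod}}(S_t,t) = \widetilde{V}(S_t,t) + N_o\,f_o(S_t,t) + N_{\mathrm{mod}}\,S_t, \tag{8.117}$$
where $N_o$ is given by (8.115) and $N_{\mathrm{mod}}$ is fixed. View the value of the modified portfolio as a function of (x, t), namely,
$$\widehat{V}_{\mathrm{mod}}(x,t) = \widetilde{V}(x,t) + N_o\,f_o(x,t) + N_{\mathrm{mod}}\,x.$$
Then by (8.115), (8.116), and (8.117), the modified portfolio is both delta neutral and gamma neutral at $(x,t) = (S_t,t)$:
$$\widehat{\Delta}_{\mathrm{mod}}(S_t,t) = \widetilde{\Delta}(S_t,t) + N_o\,\Delta_o(S_t,t) + N_{\mathrm{mod}} = 0$$
and
$$\widehat{\Gamma}_{\mathrm{mod}}(S_t,t) = \widetilde{\Gamma}(S_t,t) + N_o\,\Gamma_o(S_t,t) = 0.$$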
and
$$N_{\mathrm{mod}} = -N_o\,\Delta_o(S_t,t) = -6{,}893.57 \times 0.4681 = -3{,}226.88.$$
In other words, buy 6,893.57 calls on a per-share basis (or 68.9357 calls in round lots) and short sell 3,226.88 shares of the stock and include these positions in the original portfolio. The resulting portfolio will be delta- and gamma-neutral.
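The two-step recipe (first cancel gamma with an option, then cancel the resulting delta with the underlier) can be sketched as follows. The function name and the gamma and delta inputs in the first call are hypothetical; only the final check reuses the two numbers that survive in the example above.

```python
def gamma_delta_neutral_positions(gamma_tilde, gamma_option, delta_option):
    """Positions that make a delta-neutral portfolio gamma neutral as well.

    gamma_tilde  : gamma of the delta-neutral portfolio
    gamma_option : per-share gamma of the option being added (must be nonzero)
    delta_option : per-share delta of that option
    Returns (N_o, N_mod): options to add and units of the underlier to add.
    """
    if gamma_option == 0.0:
        raise ValueError("the added option must have nonzero gamma")
    n_o = -gamma_tilde / gamma_option      # cancels gamma
    n_mod = -n_o * delta_option            # cancels the delta introduced by n_o
    return n_o, n_mod

# Hypothetical inputs for illustration only.
gamma_tilde, gamma_option, delta_option = -250.0, 0.04, 0.55
n_o, n_mod = gamma_delta_neutral_positions(gamma_tilde, gamma_option, delta_option)

# The underlier has gamma 0 and delta 1, and the starting portfolio has delta 0.
new_gamma = gamma_tilde + n_o * gamma_option
new_delta = 0.0 + n_o * delta_option + n_mod
print(f"N_o = {n_o:.2f}, N_mod = {n_mod:.2f}")

# Arithmetic from the example in the text: N_mod = -N_o * Delta_o.
source_n_mod = -6893.57 * 0.4681
print(f"N_mod from the example: {source_n_mod:.2f}")
```

The order of the two steps matters: adding shares never changes gamma, so the delta correction can be made last without disturbing gamma neutrality.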
Readers are referred to Hull [22, Chap. 19] for more on option Greeks and
their applications.
We shall take a closer look at some of the assumptions of the BSM model to
see how they hold up against market data. A fundamental assumption of the
BSM model is that security prices follow geometric Brownian motion. Two
consequences are that security prices are continuous with probability 1, i.e.,
there are almost surely no jump discontinuities, and the log returns of security
prices are normal. We shall use the S&P 500 index as an example to illustrate
that real-world security prices can jump and exhibit log return behavior that
deviates from normality.
Our first observation from the market data is that security prices can have jumps. In particular, we consider the daily closing prices of the S&P 500 index from January 3, 1950, to January 2, 2015. These prices include the stock market crash of October 19, 1987, which was marked by a negative jump (a drop) in the price of the index that day. The opening price was 282.70, which was also the closing price on the previous trading date of October 16, 1987. The S&P 500 closed at 224.84 on October 19, 1987, a drastic percentage change of -20.4669% in the price, or a log return of -22.8997%. The negative jump is shown in Figure 8.4.
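The two figures quoted for the crash can be reproduced directly from the opening and closing prices. The short Python sketch below does the arithmetic; the variable names are illustrative.

```python
import math

open_price = 282.70   # S&P 500 open on October 19, 1987 (from the text)
close_price = 224.84  # S&P 500 close on October 19, 1987 (from the text)

# Simple (fractional) return and log return over the day.
simple_return = (close_price - open_price) / open_price
log_return = math.log(close_price / open_price)

print(f"percentage change: {100 * simple_return:.4f}%")
print(f"log return:        {100 * log_return:.4f}%")
```

The log return is larger in magnitude than the simple return because ln(1 + x) < x for every nonzero x > -1.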
The daily log returns for the data in the top left panel of Figure 8.4 are shown
in the bottom panel of the figure. The longest negative spike is due to the crash
of October 19, 1987. Note that the second longest spike and the accompany-
ing volatility reflect the 2008 financial crisis, which actually began in 2007 and
peaked in the latter part of 2008. Significant damaging effects spilled into early
2009, and an economic slowdown continued into 2012.
[Figure 8.4 (plots not reproduced). Top left and top right panels: S&P 500 closing price versus time. Bottom panel: S&P 500 daily log return versus time.]
Fig. 8.4 Top panels: S&P 500 index closing prices from January 3, 1950, to January 2, 2015 (top left) and from October 13, 1987, to October 23, 1987 (top right). The stock market crash of October 19, 1987, is clearly seen in the top right panel. The S&P 500 index closed with a negative jump of about -20.5% relative to the opening price that day, which is the same as the closing price on the previous trading day of October 16. Note that there was no trading on October 17 and 18 (a weekend). Bottom panel: daily log returns of the S&P 500 based on prices in the top left panel. The longest negative spike is due to the stock market crash on October 19, 1987. The next longest negative spike is connected with the 2008 financial crisis that became highly pronounced in the latter part of 2008.
The evidence for jumps is not restricted to the S&P 500 index, but shows
up in other securities and even in intraday trading. The study of jumps
has become a significant area of research; e.g., see the discussion by Taylor
[41, Sec. 13.6] and references therein.
Our second observation is that the log return of security prices is not necessarily normal. Before showing this with S&P 500 data, let us review some of the moments of a random variable. Let X be a random variable with a p.d.f. f(x). Denote the first moment (i.e., mean) of X by
$$\mu_X = E(X),$$
and the variance of X by
$$\sigma_X^2 = \operatorname{Var}(X),$$
which measures the dispersion or spread of the possible values of X to the left and right of the mean E(X). In the BSM model, the log return of the price of a security is normal and so is completely characterized by its first and second moments. However, we shall see from data that the higher moments cannot be ignored.
Skewness
Given that the log return of a security in the BSM model is normal, the shape
of its p.d.f. is symmetric about the vertical line through its center (i.e., mean).
Deviations of the graph of f ( x ) from a symmetric shape about its center can be
measured through the standardized third moment of X. It is called the skewness
of X and defined by
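$$\operatorname{skew}(X) = E\!\left[\left(\frac{X-\mu_X}{\sigma_X}\right)^{3}\right] = \frac{1}{\sigma_X^3}\int_{-\infty}^{\infty} (x-\mu_X)^3\, f(x)\,dx.$$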
When the p.d.f. is symmetric about the center, we have skew(X) = 0 (see footnote 18). Equivalently, if skew(X) ≠ 0, then there is a break in the symmetry about the center.
Intuitively, if skew( X ) > 0, then a unimodal (single peak) p.d.f. will have a
more elongated right tail area like the dashed p.d.f. in Figure 8.5. In this case,
we say that f ( x ) is positively skewed. The opposite happens for skew( X ) < 0, in
which case we call f ( x ) negatively skewed; note the stretched left tail area of the
dotted p.d.f. in Figure 8.5.
Kurtosis
18 This is because the portion of skew(X) from $-\infty$ to 0 cancels the part from 0 to $\infty$. In fact, all odd power moments $E\big[(X-\mu_X)^{2n+1}\big]$ vanish if the p.d.f. is symmetric about the line $x = \mu_X$.
8.8 The BSM Model Versus Market Data 443
[Figure 8.5 (plots not reproduced): three panels, each showing a p.d.f. f(x) versus x.]
Fig. 8.5 The three p.d.f.s are due to a skew-normal distribution with location parameter 0, scale pa-
rameter 2, and shape parameters -6 (dotted graph), 0 (solid graph), and 6 (dashed graph). The dotted
graph has a negative skewness of -0.891159, the solid graph has skewness 0 (the skew-normal distri-
bution reduces to a normal in this case), and the dashed graph has a skewness of 0.891159
$$\operatorname{ekurt}(X) = \operatorname{kurt}(X) - 3.$$
For the cases ekurt(X) < 0, ekurt(X) = 0, and ekurt(X) > 0, the p.d.f. f(x) is called platykurtic ("platy" means flat; think platypus), mesokurtic ("meso" means middle), and leptokurtic ("lepto" means thin), respectively. In interpreting the kurtosis of a random variable X, we shall always compare the p.d.f. f(x) of X with the p.d.f. of the normal random variable determined by the mean $\mu_X$ and standard deviation $\sigma_X$ of X. Various kurtoses are illustrated in the left graphs of Figure 8.6 using three symmetric, unimodal p.d.f.s with identical mean 0 and standard deviation 1. The middle p.d.f. (solid curve) is mesokurtic since it is a standard normal. The leptokurtic p.d.f. (dashed curve) has a thinner peak and thicker (heavier) tails than the standard normal, while the platykurtic p.d.f. (dotted curve) has a flatter peak and thinner tails than the standard normal. In addition, observe that the leptokurtic and platykurtic p.d.f.s intersect the corresponding normal p.d.f. twice on the right of the mean and twice on the left of the mean. Beyond the lower crossing on each side, the leptokurtic tail is thicker, and the platykurtic tail thinner, than that of the associated normal; see the right graphs in Figure 8.6. These properties are not uncommon for unimodal p.d.f.s. See DeCarlo [10] and Balanda and MacGillivray [2] for more.
Remark 8.11. It is important not to confuse effects due to variance with those due to kurtosis. For example, Figure 8.7 shows two normal p.d.f.s in comparison with the standard normal. Though the dashed p.d.f. has a narrower peak and the dotted one a flatter peak, all three p.d.f.s have the same excess kurtosis of 0. Consult DeCarlo [10] for some of the pitfalls in the interpretation of kurtosis.
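The point of the remark, and the excess kurtosis values quoted for the logistic and Wigner semicircle densities in Figure 8.6, can be checked numerically. The sketch below estimates the standardized moments of a density by a midpoint rule, taking kurt(X) to be the standardized fourth moment $E[((X-\mu_X)/\sigma_X)^4]$ (the usual definition). The function names, integration ranges, and grid size are illustrative assumptions.

```python
import math

def standardized_moments(pdf, a, b, n=50_000):
    """Midpoint-rule estimates of (mean, std, skewness, excess kurtosis) on [a, b]."""
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]
    ws = [pdf(x) * h for x in xs]                      # probability weights
    mean = sum(w * x for w, x in zip(ws, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    std = math.sqrt(var)
    skew = sum(w * ((x - mean) / std) ** 3 for w, x in zip(ws, xs))
    kurt = sum(w * ((x - mean) / std) ** 4 for w, x in zip(ws, xs))
    return mean, std, skew, kurt - 3.0

def normal_pdf(sd):
    return lambda x: math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def logistic_pdf(s):
    # Logistic density with mean 0 and scale s, written with cosh for stability.
    return lambda x: 1.0 / (4.0 * s * math.cosh(x / (2.0 * s)) ** 2)

def semicircle_pdf(radius):
    # Wigner semicircle density centered at the origin.
    return lambda x: 2.0 / (math.pi * radius ** 2) * math.sqrt(max(radius ** 2 - x ** 2, 0.0))

# Normals with different standard deviations all have excess kurtosis 0.
ekurt_normals = [standardized_moments(normal_pdf(sd), -15.0, 15.0)[3]
                 for sd in (0.6, 1.0, 1.4)]

# Unit-variance logistic (scale sqrt(3)/pi) and semicircle (radius 2).
_, std_logistic, _, ekurt_logistic = standardized_moments(
    logistic_pdf(math.sqrt(3.0) / math.pi), -40.0, 40.0)
_, std_semicircle, _, ekurt_semicircle = standardized_moments(
    semicircle_pdf(2.0), -2.0, 2.0)

print("normal excess kurtoses:", [round(e, 4) for e in ekurt_normals])
print("logistic:", round(ekurt_logistic, 4), " semicircle:", round(ekurt_semicircle, 4))
```

Changing the standard deviation of a normal rescales the x-axis, which alters the height of the peak but leaves every standardized moment unchanged.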
[Figure 8.6 (plots not reproduced). Left: p.d.f.s f(x) versus x. Right: zoom-in of the right tails.]
Fig. 8.6 Left plots: the p.d.f.s all have the same mean 0 and standard deviation 1 and are symmetric about the center (zero skewness). The solid p.d.f. is the standard normal, which has excess kurtosis 0 (mesokurtic). The dotted p.d.f. has excess kurtosis -1 (platykurtic) with a flatter peak and thinner tail than the standard normal. It is given by a Wigner semicircle density with radius 2 centered at the origin. The dashed p.d.f. has excess kurtosis 1.2 (leptokurtic) with a thinner peak and thicker tails than the standard normal. It is given by the logistic density with mean 0 and scale parameter $\sqrt{3}/\pi$. Right plots: zoom-in of the right tails of the p.d.f.s in the left plots. Observe that the leptokurtic (dashed) and platykurtic (dotted) p.d.f.s have tails that are thicker and thinner, respectively, than that of the associated normal.
Skewness and Kurtosis in the S&P 500 Index Daily Log Returns
Turning now to the daily log returns of the S&P 500 based on prices from Jan-
uary 3, 1950, to January 2, 2015, the frequency histogram of the log returns
reveals both asymmetry and leptokurtosis. Figure 8.8 shows the frequency his-
togram. The skewness is -1.0281, which is reflected in the left tail having a more
elongated area than the right tail.
The excess kurtosis is 27.6769, which is significantly above that of the as-
sociated normal. Indeed, the tails of the S&P 500 daily log returns are thicker
(heavier) than those of the corresponding normal. In other words, there is a
higher probability (than in the case of a normal) of having extreme values in
the daily log return. Figure 8.9 shows the thicker tails by zooming in on his-
[Figure 8.7 (plot not reproduced): p.d.f.s f(x) versus x.]
Fig. 8.7 The peakedness and flatness of the dashed and dotted p.d.f.s relative to the standard nor-
mal (solid p.d.f.) should not be confused with leptokurtosis and platykurtosis, respectively. All three
graphs have the same excess kurtosis 0, but different standard deviations: 0.6 (dotted graph), 1 (solid
graph), and 1.4 (dashed graph)
[Figure 8.8 (plot not reproduced): frequency versus daily log return.]
Fig. 8.8 The histogram shows the frequency of the log returns of the S&P 500 index based on prices
from January 3, 1950, to January 2, 2015. The mean and standard deviation are 0.0002945 and 0.0097,
respectively. The solid curve is the p.d.f. of a normal distribution with mean 0.0002945 and standard
deviation 0.0097. The skewness is -1.0281 and the excess kurtosis is 27.6769
[Figure 8.9 (plots not reproduced). Top: frequency versus daily log return for the left and right tails. Bottom: quantiles of input sample versus standard normal quantiles.]
Fig. 8.9 Top: The histograms zoom in on portions of the left and right tails of the histogram in Figure 8.8. The solid curve is the p.d.f. of a normal distribution with mean and standard deviation given by the sample data of the S&P 500 daily log returns. Notice that the tails are thicker than the tails of the corresponding normal. Bottom: QQ-plot of the standardization of the S&P 500 log returns in Figure 8.8. The left and right tails are heavier than those of the standard normal. The excess kurtosis is 27.6208
$$C^E(S_t, K, \sigma, r, \tau, q) = S_t\,e^{-q\tau}\,N\big(d_+(S_t,\tau)\big) - K\,e^{-r\tau}\,N\big(d_-(S_t,\tau)\big),$$
truly models the prices of European calls in the marketplace, then given the current market price $C^E_{\mathrm{market}}(t)$ of a European call, we can solve the equation
$$C^E_{\mathrm{market}}(t) = C^E(S_t, K, \sigma, r, \tau, q)$$
implicitly for $\sigma$. The resulting value of $\sigma$ is called the implied volatility and denoted by $\sigma_{\mathrm{im}}$. In other words, the implied volatility is the volatility that makes the theoretical BSM call price equal to the market price.
How do we know that one and only one implied volatility corresponds to the market price of a European call? There is actually a 1-1 correspondence between the possible prices of a European call and the possible volatilities of its underlying security. To see this, observe that the vega of the call, which is defined by
$$\mathcal{V}_C = \frac{\partial C^E}{\partial \sigma}(S_t,t),$$
is positive:
$$\mathcal{V}_C = S_t\,e^{-q\tau}\,\sqrt{\tau}\;N'\big(d_+(S_t,\tau)\big) > 0.$$
The European call price is therefore a strictly increasing function of $\sigma$. This implies that for each possible European call price, there is a unique volatility and vice versa. For this reason, one can freely switch between volatilities and European call prices. A similar result holds for European puts, which have the same vega as a European call. Hence, for each market price of a European call (or put), there is a unique implied volatility, and for each implied volatility, only one market price can correspond to it.
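Because the call price is strictly increasing in the volatility, the implied volatility can be found by a simple bracketing search. The sketch below uses bisection; it is one of several possible root-finding choices and is not taken from the text. It assumes the standard BSM expressions for $d_\pm$ with dividend yield q, and all inputs, bounds, and function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_price(S, K, sigma, r, tau, q=0.0):
    """BSM price of a European call with continuous dividend yield q."""
    sqrt_tau = math.sqrt(tau)
    d_plus = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * sqrt_tau)
    d_minus = d_plus - sigma * sqrt_tau
    return (S * math.exp(-q * tau) * norm_cdf(d_plus)
            - K * math.exp(-r * tau) * norm_cdf(d_minus))

def implied_volatility(market_price, S, K, r, tau, q=0.0,
                       lo=1e-6, hi=5.0, tol=1e-10):
    """Solve call_price(sigma) = market_price for sigma by bisection on [lo, hi]."""
    if not call_price(S, K, lo, r, tau, q) <= market_price <= call_price(S, K, hi, r, tau, q):
        raise ValueError("market price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Monotonicity in sigma tells us which half contains the root.
        if call_price(S, K, mid, r, tau, q) < market_price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical round trip: price a call at a known volatility, then recover it.
S, K, r, tau = 100.0, 105.0, 0.01, 0.25
true_sigma = 0.25
price = call_price(S, K, true_sigma, r, tau)
sigma_im = implied_volatility(price, S, K, r, tau)
print(f"call price {price:.4f} -> implied volatility {sigma_im:.6f}")
```

Bisection needs no derivative and cannot diverge, at the cost of more iterations than a Newton step based on vega would take.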
Unfortunately, implied volatility opens up several concerns with the BSM model. The BSM model assumes that the volatility $\sigma$ of the underlying security is an inherent property of the underlier that is constant during the life of a call or put. In other words, according to the BSM model, the value of $\sigma$ is not only unchanging; it is also independent of contractual elements like the option's strike price K, expiration date T, and type (call or put). Explicitly, if the BSM formulas (8.37) and (8.39) truly model the respective prices of European calls and puts in the marketplace, then:
• The implied volatility $\sigma_{\mathrm{im}}$ is the same for a European call (or put) with inputs $S_t, r, \tau, q$, but different strike prices $K_i$. In the BSM model, the graph of implied volatility as a function of the $K_i$'s is a horizontal line through the y-value $\sigma_{\mathrm{im}}$.
• The implied volatility $\sigma_{\mathrm{im}}$ is the same for a European call (or put) with inputs $S_t, K, r, q$, but different times $\tau_i$ to expiration. In the BSM model, the graph of implied volatility as a function of the $\tau_i$'s is a horizontal line through the y-value $\sigma_{\mathrm{im}}$.
• The implied volatility $\sigma_{\mathrm{im}}$ is the same for a European call and put with the same underlying security and identical inputs $S_t, r, q$, but different strike prices $K_i$ and times $\tau_i$ to maturity.
[Figure 8.10 (plot not reproduced): implied volatility versus strike price.]
Fig. 8.10 Implied volatility in percent versus strike price for the S&P 500 index European call option
with PM settlement (SPXPM). The data is from Yahoo! Finance and based on values at 4:49 p.m. EDT
on June 26, 2015, when the index was at 2,101.59. The expiration date is July 17, 2015. The data shows
that the implied volatility as a function of strike price is not constant as predicted by the BSM model.
The graph has a volatility smile
There is a vast literature on implied volatility. See, for example, the texts by
Hull [22, Chap. 20] and McDonald [27, Chap. 23] and the lecture by Rachev [35].
An extensive introduction to the volatility surface is given by Gatheral [16].
Several proposed modifications of the BSM model allow for more heavy tails,
peakedness, and volatility skews in security prices. By mixing geometric Brow-
nian motion (which is a diffusion) with jump discontinuities, Merton [29] intro-
duced in 1976 a model that naturally extends the BSM theoretical framework
and addresses some of the issues facing the BSM model. We give an introduc-
tion to this model.
For easy reference in the discussion to follow, we recap the BSM theoretical prices for European calls and puts given in Equations (8.37) and (8.39):
$$C^E(S_t, K, \sigma, r, \tau, q) = S_t\,e^{-q\tau}\,N\big(d_+(S_t,\tau)\big) - K\,e^{-r\tau}\,N\big(d_-(S_t,\tau)\big),$$
and
$$P^E(S_t, K, \sigma, r, \tau, q) = K\,e^{-r\tau}\,N\big(-d_-(S_t,\tau)\big) - S_t\,e^{-q\tau}\,N\big(-d_+(S_t,\tau)\big),$$
where $\tau = T - t$.
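The two recap formulas can be coded side by side and checked against put-call parity, $C^E - P^E = S_t e^{-q\tau} - K e^{-r\tau}$. The sketch below assumes the standard expressions $d_\pm = [\ln(S_t/K) + (r - q \pm \sigma^2/2)\tau]/(\sigma\sqrt{\tau})$, which are not restated in this part of the text; the inputs and function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_plus_minus(S, K, sigma, r, tau, q):
    """Return (d_plus, d_minus) of the BSM formulas."""
    sqrt_tau = math.sqrt(tau)
    d_plus = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * sqrt_tau)
    return d_plus, d_plus - sigma * sqrt_tau

def bsm_call(S, K, sigma, r, tau, q=0.0):
    dp, dm = d_plus_minus(S, K, sigma, r, tau, q)
    return S * math.exp(-q * tau) * norm_cdf(dp) - K * math.exp(-r * tau) * norm_cdf(dm)

def bsm_put(S, K, sigma, r, tau, q=0.0):
    dp, dm = d_plus_minus(S, K, sigma, r, tau, q)
    return K * math.exp(-r * tau) * norm_cdf(-dm) - S * math.exp(-q * tau) * norm_cdf(-dp)

# Hypothetical inputs for illustration only.
S, K, sigma, r, tau, q = 100.0, 95.0, 0.2, 0.03, 0.5, 0.01
call = bsm_call(S, K, sigma, r, tau, q)
put = bsm_put(S, K, sigma, r, tau, q)

# Put-call parity: the difference depends on neither sigma nor the normal c.d.f.
parity_gap = (call - put) - (S * math.exp(-q * tau) - K * math.exp(-r * tau))
print(f"call {call:.4f}, put {put:.4f}, parity gap {parity_gap:.2e}")
```

Parity also explains why a call and a put with the same inputs share the same vega: the right-hand side of the parity relation does not depend on the volatility.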
8.9 A Step Beyond the BSM Model: Merton Jump Diffusion 449
The number of jumps in the future price of a security is random and assumed to follow a Poisson process, which we now introduce; see, for example, Privault [34, Chap. 14]. First, we can view the possible outcomes of a Poisson process at a fixed moment of time in terms of successes and failures. In our context, define a success at time t as the arrival of information that causes the price of the security to jump in value. Such information can be important news pertaining to earnings, sector outlook, serious macroeconomic concerns, etc. We assume a 1-1 correspondence between successes and the security's price jumps and so freely identify a success with a price jump.
Let the current time be 0 and let $N_t$ be the number of price jumps during [0, t], i.e., over the next t years. The increment $N_t - N_x$, where $0 \le x < t$, is then the number of price jumps during the time interval (x, t]. Note that the times when future price jumps occur are not known a priori. We assume that the stochastic process $\{N_t\}_{t \ge 0}$ is a Poisson process, which means that the following properties hold:
• The number of jumps at the starting time 0 is zero, i.e., $N_0 = 0$, and the mean number of price jumps per year is known and denoted by $\lambda$. The parameter $\lambda$ is called the intensity of the Poisson process.
• The increments are stationary: for all $0 \le x < t$ and all u such that $x + u \ge 0$ and $t + u \ge 0$, we have
$$N_t - N_x \overset{d}{=} N_{t+u} - N_{x+u}. \tag{8.118}$$
• The increments are independent: for every sequence of times $0 = t_0 < t_1 < \cdots < t_k$, the increments
$$N_{t_1} - N_{t_0},\quad N_{t_2} - N_{t_1},\quad \dots,\quad N_{t_k} - N_{t_{k-1}}$$
are independent. In other words, the number of price jumps during a given time interval I is independent of the number of price jumps during a time interval that does not overlap with I (see footnote 22).
• At most one jump can occur during [t, t + dt]:
$$dN_t = N_{t+dt} - N_t = \begin{cases} 1 & \text{if a jump occurs during } [t, t+dt] \\ 0 & \text{if no jump occurs during } [t, t+dt]. \end{cases} \tag{8.120}$$
In addition, during any instant dt, the probability of one price jump is $\lambda\,dt$ and the probability of more than one jump is zero:
$$dN_t = \begin{cases} 1 & \text{with probability } \lambda\,dt \\ 0 & \text{with probability } 1 - \lambda\,dt \\ k > 1 & \text{with probability } 0. \end{cases}$$
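The last property suggests a simple discrete approximation of a Poisson process: split [0, t] into many short steps and let each step carry one jump with probability $\lambda\,dt$ and none otherwise. The sketch below does this with hypothetical values of the intensity, horizon, step count, and seed, and compares the average jump count with $\lambda t$.

```python
import random

def simulate_jump_count(lam, t, n_steps, rng):
    """Approximate N_t by summing independent increments dN over n_steps steps.

    Each step of length dt = t / n_steps has one jump with probability lam * dt
    and no jump otherwise (at most one jump per step).
    """
    dt = t / n_steps
    count = 0
    for _ in range(n_steps):
        if rng.random() < lam * dt:
            count += 1
    return count

rng = random.Random(12345)          # fixed seed so the run is reproducible
lam, t = 3.0, 1.0                   # hypothetical: 3 jumps per year on average, 1 year
counts = [simulate_jump_count(lam, t, 500, rng) for _ in range(2000)]

mean_count = sum(counts) / len(counts)
print(f"average number of jumps over {t} year(s): {mean_count:.3f} (lambda * t = {lam * t})")
```

With a finite step the count is binomial rather than Poisson; the two distributions agree in the limit as the step length shrinks to zero.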
Merton models the instantaneous change in the price at a general time t as hav-
ing contributions from a no-jump component determined by geometric Brow-
nian motion and a jump component.
22 Two intervals are nonoverlapping if their interiors are disjoint.
No-Jump Case
$$\mu_{RW} = m - q - \frac{\sigma^2}{2}.$$
Given that the history of the security price is known up to time t, the expected instantaneous capital-gain return at t is then:
$$E\!\left(\frac{dS_t}{S_t}\,\Big|\,\mathcal{F}_t\right) = (m - q)\,dt. \tag{8.125}$$
Assume it is possible for the security price to have jumps and, for this situation, denote its price at t by $S_t$. Let us explore the instantaneous capital-gain return when jumps are possible, but not guaranteed. First, if there is no jump during [t, t + dt], then the Merton model assumes that the capital-gain return is determined by a geometric Brownian motion:
$$\frac{dS_t}{S_t} = (\widetilde{m} - q)\,dt + \sigma\,dB_t \qquad (\text{no jump during } [t, t+dt]). \tag{8.126}$$
The instantaneous total mean return $\widetilde{m}$ of the security should not be confused with m, which is the instantaneous total mean return of a security with no possibility of jumps. Second, suppose there is a jump (i.e., discontinuity) in the price at t. By our assumptions, it is the only jump during [t, t + dt]. Then the price has two values at t determined by the left and right limits at t. Let $S_{t^-}$ be the price of the security just before the jump, i.e., the left-hand limit price at t:
$$S_{t^-} = \lim_{s \to t^-} S_s.$$
The price at t is the right-hand limit:
$$S_t = S_{t^+} = \lim_{s \to t^+} S_s.$$
The two prices are related by $S_t = J_t\,S_{t^-}$.
We shall call $J_t$ the jump factor. The percentage change in the price at t due only to the jump is then
$$\frac{S_t - S_{t^-}}{S_{t^-}} = J_t - 1 \qquad (\text{due only to the jump at } t).$$
We can also think of $J_t - 1$ as the fractional size of the jump at t, where a negative value is a downward jump. On the other hand, if we do not know that a jump occurs at t, but only that it is possible, then the percentage change contribution coming only from the jump is modeled by
$$\frac{S_t - S_{t^-}}{S_{t^-}} = (J_t - 1)\,dN_t \qquad (\text{contribution only from a possible jump at } t). \tag{8.128}$$
We assume that $J_t$ and $dN_t$ are independent. By this assumption and (8.121), the expected capital gain only from the jump is then
$$E\big((J_t - 1)\,dN_t\big) = E(J_t - 1)\,E(dN_t) = \lambda\kappa\,dt, \qquad \kappa = E(J_t - 1), \tag{8.129}$$
where $\kappa$ is the mean gain factor in the price due to the jump.
The Merton model assumes that the instantaneous capital-gain return at t is given by the sum of a no-jump component (8.126) just before t and a possible jump component (8.128) at t. The two equations can then be combined at a general time t as follows:
$$\frac{dS_t}{S_{t^-}} = (\widetilde{m} - q)\,dt + \sigma\,dB_t + (J_t - 1)\,dN_t \qquad (t > 0). \tag{8.130}$$
Another key assumption of the Merton model is that the risk due to jumps is idiosyncratic (see page 151). For example, it can be due to company-specific events like the sudden finding of corruption among the senior management that threatens to bring down the company. The model assumes there is no reward or risk premium for jumps since, as we learned in the Markowitz theory, such risk can be diversified away. Consequently, the expected instantaneous capital-gain return at any time t, conditioned on the security price being known up to $t^-$, is given by the no-jump value $(m - q)\,dt$ in (8.125). On the other hand, taking the expectation of (8.130) and using (8.129) gives $(\widetilde{m} - q + \lambda\kappa)\,dt$. Equating the two yields
$$\widetilde{m} = m - \lambda\kappa.$$
Hence,
$$\frac{dS_t}{S_{t^-}} = (m - q - \lambda\kappa)\,dt + \sigma\,dB_t + (J_t - 1)\,dN_t \qquad (t > 0). \tag{8.132}$$
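We call (8.132) a Merton jump-diffusion s.d.e. Note that if there is no jump at t, then $S_{t^-} = S_t$ and $dN_t = 0$, while for a jump at t, we have $dN_t = 1$. In other words,
$$dS_t = \begin{cases} (m - q - \lambda\kappa)\,S_{t^-}\,dt + \sigma\,S_{t^-}\,dB_t & \text{given no jump at } t \\ (m - q - \lambda\kappa)\,S_{t^-}\,dt + \sigma\,S_{t^-}\,dB_t + (J_t - 1)\,S_{t^-} & \text{given a jump at } t. \end{cases} \tag{8.133}$$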
We shall solve the Merton jump-diffusion s.d.e. by drawing on the fact that the price between jumps is a geometric Brownian motion, while the price at a jump time t is $J_t$ times the price just before t.
Suppose that there are price jumps at times $T_\ell$ with
$$0 < T_1 < T_2 < \cdots.$$
Recall that $N_t$ is the random number of price jumps in [0, t] and so $T_{N_t}$ is the time of the last jump in [0, t]. Note that for times $0 \le \tau < T_1$ there are no jumps in $[0, \tau]$, while for $T_\ell \le \tau < T_{\ell+1}$ the intervals $[0, T_\ell]$ and $[0, \tau]$ have the same number of jumps, namely, $N_\tau = \ell$. Denote the jump factor at $T_\ell$ by $J_\ell = J_{T_\ell}$ for $\ell = 1, \dots, N_t$. In terms of these jump factors, the mean gain factor in (8.129) is
$$\kappa = E(J_\ell) - 1 \qquad (\ell = 1, \dots, N_t).$$
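• $0 \le \tau < T_1$
Since no jump occurs during this interval, the security price follows a geometric Brownian motion:
$$S_\tau = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,\tau + \sigma B_\tau}. \tag{8.134}$$
• $\tau = T_1$
The price is $S_{T_1} = J_1\,S_{T_1^-}$. But the price $S_{T_1^-}$ just before $T_1$ has no jumps and so is given by taking the left-hand limit of (8.134) at $T_1$:
$$S_{T_1^-} = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,T_1 + \sigma B_{T_1}}.$$
Consequently:
$$S_{T_1} = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,T_1 + \sigma B_{T_1}}\,J_1. \tag{8.135}$$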
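• $T_1 < \tau < T_2$
Since there is no jump during this interval, the price at $\tau$ is a geometric Brownian motion with initial price $S_{T_1}$:
$$S_\tau = S_{T_1}\,e^{(\mu_{RW} - \lambda\kappa)(\tau - T_1) + \sigma (B_\tau - B_{T_1})}. \tag{8.136}$$
• $\tau = T_2$
We have $S_{T_2} = J_2\,S_{T_2^-}$, where $S_{T_2^-}$ is obtained by taking the left-hand limit of (8.136) at $T_2$:
$$S_{T_2^-} = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,T_2 + \sigma B_{T_2}}\,J_1.$$
Hence:
$$S_{T_2} = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,T_2 + \sigma B_{T_2}}\,J_1 J_2. \tag{8.137}$$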
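• $T_{N_t - 1} < \tau < T_{N_t}$
Continuing across the remaining jump times, we have geometric Brownian motion during the given interval:
$$S_\tau = S_{T_{N_t - 1}}\,e^{(\mu_{RW} - \lambda\kappa)(\tau - T_{N_t - 1}) + \sigma (B_\tau - B_{T_{N_t - 1}})} \qquad (T_{N_t - 1} < \tau < T_{N_t}),$$
where
$$S_{T_{N_t - 1}} = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,T_{N_t - 1} + \sigma B_{T_{N_t - 1}}}\,J_1 J_2 \cdots J_{N_t - 1}.$$
• $\tau = T_{N_t}$
This is the time of the last price jump in [0, t]. Taking the left limit of (8.138) at $T_{N_t}$, we get
$$S_{T_{N_t}^-} = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,T_{N_t} + \sigma B_{T_{N_t}}}\,J_1 J_2 \cdots J_{N_t - 1},$$
so that
$$S_{T_{N_t}} = J_{N_t}\,S_{T_{N_t}^-} = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,T_{N_t} + \sigma B_{T_{N_t}}}\,J_1 J_2 \cdots J_{N_t}. \tag{8.139}$$
• $T_{N_t} < \tau \le t$
For this situation, a jump does not occur at $\tau$ since the last jump in [0, t] is at $T_{N_t}$. The price at $\tau$ is then given by a geometric Brownian motion with initial price $S_{T_{N_t}}$:
$$S_\tau = S_{T_{N_t}}\,e^{(\mu_{RW} - \lambda\kappa)(\tau - T_{N_t}) + \sigma (B_\tau - B_{T_{N_t}})}.$$
By (8.139), we obtain:
$$S_\tau = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,\tau + \sigma B_\tau}\,J_1 J_2 \cdots J_{N_t} \qquad (T_{N_t} < \tau \le t). \tag{8.140}$$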
Equation (8.140) shows that for an interval [0, t] containing a random number $N_t$ of jumps at times $0 < T_1 < \cdots < T_{N_t} \le t$, the security price at t is given by
$$S_t = S_0\,e^{(\mu_{RW} - \lambda\kappa)\,t + \sigma B_t}\,\prod_{\ell=1}^{N_t} J_\ell \qquad (t \ge 0), \tag{8.141}$$
where $\{N_t\}_{t \ge 0}$ is a Poisson process with intensity $\lambda$ and $\kappa = E(J_\ell) - 1$. As expected, the underlier's price process is a mix of jumps and geometric Brownian motion with drift $\mu_{RW} - \lambda\kappa$ and volatility $\sigma$. The stochastic process (8.141) solves the s.d.e. (8.132) and is called a Merton jump diffusion (MJD).
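For $0 \le u \le t$, Equation (8.141) and $B_{t-u} \overset{d}{=} B_t - B_u$ imply:
$$\frac{S_t}{S_u} \overset{d}{=} \exp\!\Big((\mu_{RW} - \lambda\kappa)(t-u) + \sigma B_{t-u} + \sum_{\ell=N_u+1}^{N_t} X_\ell\Big),$$
where $X_\ell = \ln J_\ell$. Hence,
$$S_t \overset{d}{=} S_u \exp\!\Big((\mu_{RW} - \lambda\kappa)(t-u) + \sigma B_{t-u} + \sum_{\ell=1}^{N_t - N_u} X_\ell\Big) \overset{d}{=} S_u \exp\!\Big((\mu_{RW} - \lambda\kappa)(t-u) + \sigma B_{t-u} + \sum_{\ell=1}^{N_{t-u}} X_\ell\Big), \tag{8.142}$$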
When studying options with underlier process (8.141), we add several assumptions about the MJD process to make the analysis more tractable:
• The jump factors $J_1, \dots, J_{N_t}$ are i.i.d. lognormal random variables with
$$J_\ell = e^{X_\ell}, \qquad X_\ell \sim \mathcal{N}(\mu_J, \sigma_J^2), \qquad E(J_\ell) = e^{\mu_J + \frac{1}{2}\sigma_J^2} = \kappa + 1, \tag{8.143}$$
Notation. Denote the probability measure for the MJD price process (8.141) by P and, of course, assume that the above properties are enforced.
Remark 8.12. The natural log, $\ln \prod_{\ell=1}^{N_t} J_\ell = \sum_{\ell=1}^{N_t} X_\ell$, is an example of a compound Poisson process.
Consider the MJD security-price process (8.141) over 1 trading year or 252 trading days. The current time is 0, the time step is $\frac{1}{252}$ (one trading day), the final time is t = 1, and the inputs are:
[Figure 8.11 (plot not reproduced): simulated security price versus time.]
Fig. 8.11 Simulated daily MJD security prices over 1 trading year. A negative price jump occurs in the
transition from trading day 202 (0.8016 years) to trading day 203 (0.8056 years). The price fell from
$1.0207 to $0.6077, which is a 40.46% drop. The inputs of the simulation are given in Equation (8.144)
where $\kappa = e^{\mu_J + \frac{1}{2}\sigma_J^2} - 1$. Figure 8.11 depicts a sample path of $\{S_t\}_{0 \le t \le 1}$ over 1 trading year, i.e., 252 trading days. A very pronounced negative jump occurs in the transition from trading day 202 (0.8016 years), when the price was $1.0207, to trading day 203 (or 0.8056 years), when the price dropped to $0.6077. It is a percentage drop of:
$$J_t - 1 = \frac{S_t - S_{t^-}}{S_{t^-}} = \frac{0.6077 - 1.0207}{1.0207} = -40.46\%.$$
The MJD security price model can also produce skewness and kurtosis. Fig-
ure 8.12 shows a histogram of the log returns of a simulated MJD price process
running over 65 years and with the same inputs as in (8.144). The skewness
and excess kurtosis of the simulated MJD prices are -1.3325 and 23.9169, re-
spectively. In other words, the MJD security price model has daily log return
behavior that deviates from normality. Notice the qualitative similarities be-
tween Figure 8.12 and Figure 8.8 (page 445), which shows log returns of the
S&P 500 for prices over a 65-year period.
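A simulation of this kind can be sketched with the standard library alone. The code below generates daily MJD log returns from the closed-form solution (8.141) with lognormal jump factors, then computes the sample skewness and excess kurtosis. The inputs of Equation (8.144) are not reproduced in this excerpt, so every parameter value below is a hypothetical stand-in, as are the function names and the seed; the statistics will therefore differ from those quoted in the text.

```python
import math
import random

def poisson_sample(mean, rng):
    """Sample a Poisson count by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_mjd_log_returns(n_steps, dt, m, q, sigma, lam, mu_j, sigma_j, rng):
    """Log returns ln(S_{t+dt} / S_t) of a Merton jump diffusion on a fixed grid."""
    kappa = math.exp(mu_j + 0.5 * sigma_j ** 2) - 1.0           # mean gain factor
    drift = (m - q - 0.5 * sigma ** 2 - lam * kappa) * dt      # (mu_RW - lam*kappa) dt
    returns = []
    for _step in range(n_steps):
        log_return = drift + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Add X = ln J for each jump that lands in this step.
        for _jump in range(poisson_sample(lam * dt, rng)):
            log_return += rng.gauss(mu_j, sigma_j)
        returns.append(log_return)
    return returns

def skewness_and_excess_kurtosis(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(2015)                  # fixed seed for reproducibility
n_days = 65 * 252                          # 65 trading years of daily steps
log_returns = simulate_mjd_log_returns(
    n_steps=n_days, dt=1.0 / 252, m=0.08, q=0.0, sigma=0.15,
    lam=1.0, mu_j=-0.10, sigma_j=0.10, rng=rng)   # hypothetical inputs

# Price path starting from S_0 = 1.
prices = [1.0]
for x in log_returns:
    prices.append(prices[-1] * math.exp(x))

skew, ekurt = skewness_and_excess_kurtosis(log_returns)
print(f"skewness {skew:.4f}, excess kurtosis {ekurt:.4f}")
```

Setting `lam=0.0` removes the jumps and reduces the model to geometric Brownian motion, for which the sample excess kurtosis should be close to zero.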
A QQ-plot of the standardized MJD log returns is shown in Figure 8.13. The
figure gives a clear depiction of the MJD log returns deviating from normal-
ity. It is interesting to observe the qualitative features that Figure 8.13 shares
with the QQ-plot of the standardized S&P 500 log returns shown in Figure 8.9
(page 446). Notice the outlier in the bottom left in both figures. These qualita-
tive comparisons show that the MJD security model can address the difficul-
ties faced by the geometric Brownian motion model of underliers. Naturally, a
[Figure 8.12 (plot not reproduced): frequency versus daily log return.]
Fig. 8.12 Simulated daily log returns for an MJD price process over 65 years. The mean is 0.000424 (practically zero) and standard deviation is 0.0099. The skewness is -1.3325. In particular, the left tail has a more elongated area (due to more histogram bars on that side) than the right tail; see the dark lines along the x-axis. The excess kurtosis is 23.9169, which is much higher than that of the corresponding normal with mean 0.000424 and standard deviation 0.0099. Compare with Figure 8.8 (page 445)
proper fitting of the MJD model to security prices will not depend on qualita-
tive comparisons, but will involve a detailed statistical investigation.
We shall see in this section that a market with an MJD underlier has no ar-
bitrage, but is incomplete, i.e., not all its derivatives are attainable. In other
words, there is at least one derivative whose payoff cannot be replicated using
a self-financing trading strategy in other securities.
[Figure 8.13: QQ-plot of Quantiles of Input Sample (vertical axis, -25 to 5) against Standard Normal Quantiles (horizontal axis, -5 to 5)]
Fig. 8.13 QQ-plot of the standardization of the simulated MJD log returns in Figure 8.12. The left and
right tails are thicker than those of the standard normal. Compare with Figure 8.9 (page 446)
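The points of a QQ-plot such as the one in Figure 8.13 can be computed along the following lines. This is a hedged sketch: the helper name qq_points and the plotting positions (i - 0.5)/n are assumptions chosen for illustration, and a Student-t sample stands in for the MJD log returns, which are not reproduced here.

```python
import numpy as np
from statistics import NormalDist

def qq_points(sample):
    """Return (standard normal quantiles, standardized sample quantiles)."""
    x = np.sort(np.asarray(sample, dtype=float))
    z = (x - x.mean()) / x.std()               # standardize the sorted sample
    n = len(z)
    probs = (np.arange(1, n + 1) - 0.5) / n    # assumed plotting positions
    normal_q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return normal_q, z

# Stand-in heavy-tailed data (hypothetical): Student-t with 3 degrees of freedom.
rng = np.random.default_rng(1)
heavy = rng.standard_t(df=3, size=2000)
normal_q, sample_q = qq_points(heavy)
# For normal data the points lie near the line y = x; tail points far from
# that line indicate tails that differ from those of the normal distribution.
print(normal_q[0], sample_q[0])
print(normal_q[-1], sample_q[-1])
```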
8.9 A Step Beyond the BSM Model: Merton Jump Diffusion 459
where the index was relabeled to \(\ell\) for simplicity. This price is relative to a
real-world probability measure \(P^{\lambda,\nu}\), which incorporates our assumptions
on page 456 and takes into account not only geometric Brownian motion but
also the intensity \(\lambda\) of the jumps and the probability measure \(\nu\) of the jump
factors. In particular, \(B_s\) is a standard Brownian motion relative to \(P^{\lambda,\nu}\). The
conditional expectation of \(S_t\) with respect to \(P^{\lambda,\nu}\) is (Exercise 8.37):
The \(\sigma\)-algebra \(\mathcal{F}_u^{\mathrm{MJD}}\) is generated by the standard Brownian motion \(\{B_s\}_{s \ge 0}\),
the Poisson process \(\{N_s\}_{s \ge 0}\), and the jump-factor process \(\{J_\ell\}\), where \(\ell =
1, \dots, N_s\).
If \(P^{\lambda,\nu}\) is a risk-neutral measure, then
and the price \(C^E(S_t, t)\) of a European call on the security would satisfy
\[E_{P^{\lambda,\nu}}\big(C^E(S_t, t) \mid \mathcal{F}_u^{\mathrm{MJD}}\big) = C^E(S_u, u)\, e^{r(t-u)} \qquad (0 \le u \le t). \tag{8.148}\]
This allows us to obtain the current price of the call by setting \(u = t_0\) and \(t = T\).
However, comparing (8.146) and (8.147), we see that this is only possible if
\[m = r.\]
Since the risk-free rate r is fixed independent of the security, the above constraint
will not hold for all \(P^{\lambda,\nu}\).
There is a transformation from \(P^{\lambda,\nu}\) to a risk-neutral probability measure
that allows us to price the European call. Though the details are beyond the
scope of this text, we shall sketch the basic idea and refer readers to Privault
[34, Chaps. 14, 15] for more. Choose c and \(\lambda^* > 0\) as well as a probability
measure \(\nu^*\) for the jump factors such that
\[\sigma c = m - q - \lambda\kappa - (r - q - \lambda^*\kappa^*), \tag{8.149}\]
where
\[\kappa^* = E_{\nu^*}(J^* - 1).\]
We write the jump factor as \(J^*\) instead of \(J\) to indicate that the probability
measure being used is \(\nu^*\). Then the following can be shown using the Girsanov
theorem for jump processes (see Privault [34, Theorem 14.3, Chap. 15]): there exists a
For the MJD model, we saw above that though the market has no arbitrage,
there is not a unique risk-neutral probability to price a European call option
with MJD underlier. In fact, there are infinitely many such probability mea-
sures, and so the market is incomplete. How then can one price a derivative in
an arbitrage-free, incomplete market? A practical approach is to have the mar-
ket choose the risk-neutral probability measure, i.e., fit the model to mar-
ket data (market calibration), and employ the resulting measure to price the
derivative. Another approach is to take into account an investor's utility function
and then use an associated utility maximization to determine the
risk-neutral probability measure and, hence, price the derivative.
Properly addressing the above important and deep issue is surely beyond
the scope of our text. Readers are referred to the highly informative survey article
by Staum in Birge and Linetsky's handbook [4]. In Chapter 12 of [4], Staum
discusses the meaning of market incompleteness and its causes, different approaches
to derivative pricing in incomplete markets (including market calibration
and expected utility maximization), hedging in incomplete markets,
and many other pertinent topics.
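where
\[\mu^* = r - q - \lambda^*\kappa^* - \frac{\sigma^2}{2}\]
and, analogous to (8.143),
\[E_{\nu^*}(J^*) = e^{\mu_J^* + \frac{1}{2}(\sigma_J^*)^2} = \kappa^* + 1.\]
\[J^* = e^{X^*}, \qquad X^* \sim N\big(\mu_J^*, (\sigma_J^*)^2\big), \tag{8.154}\]
and
\[\mathrm{Var}\Big(\mu^*\tau + \sigma B^*_\tau + \sum_{\ell=0}^{N_\tau} X^*_\ell \,\Big|\, N_\tau = n\Big) = \sigma^2\tau + n\,(\sigma_J^*)^2.\]
\[m_n = \mu^* + \frac{n}{\tau}\,\mu_J^*, \qquad (s_n)^2 = \sigma^2 + \frac{n}{\tau}\,(\sigma_J^*)^2. \tag{8.156}\]
Hence, the conditional security price (8.155) is a lognormal random variable
with mean parameter \(m_n\) and variance parameter \((s_n)^2\).
To cast the quantity \(m_n\) in (8.155) in a form analogous to \(r - q - \frac{1}{2}\sigma^2\) in the
risk-neutral lognormal security price (8.67) on page 410, define \(r_n\) such that
\[m_n = r_n - q - \frac{(s_n)^2}{2}.\]
By (8.143) and (8.156), we get
\[r_n = r - \lambda^*\kappa^* + \frac{n}{\tau}\ln(1 + \kappa^*). \tag{8.157}\]
26 If \(X^* \sim N(a^*, (b^*)^2)\), where * indicates a risk-neutral setting, then \(e^{X^*} \overset{d}{=} e^{a^* + b^* Z}\). Compare with (5.69)
on page 243.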
where \(r_n\) and \(s_n\) play the roles of the risk-free rate and volatility, respectively.
Proceeding as in Section 8.4.4, we apply Theorem 8.3 to evaluate
\[e^{-r_n\tau}\, E_{Q^{c,\lambda^*,\nu^*}}\big(\max\{S_T - K, 0\} \mid N_\tau = n\big).\]
The result is the price at t of a European call, conditioned on n jumps occurring
during the time remaining until expiration, with strike price K, volatility \(s_n\),
and risk-free rate \(r_n\):
\[e^{-r_n\tau}\, E_{Q^{c,\lambda^*,\nu^*}}\big(\max\{S_T - K, 0\} \mid N_\tau = n\big) = C^E_{\mathrm{BSM}}(S_t, K, s_n, r_n, \tau, q). \tag{8.158}\]
\[\sum_{n=0}^{\infty} \frac{(\lambda^*\tau)^n}{n!}\, e^{(-\lambda^* + r_n - r)\tau}\, C^E_{\mathrm{BSM}}(S_t, K, s_n, r_n, \tau, q).\]
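By (8.157), we then get the following theorem:
Theorem 8.4 (MJD Price for a European Call). The price at time t of a European
call with an MJD underlier is given as follows relative to the risk-neutral measure
\(Q^{c,\lambda^*,\nu^*}\):
\[C^E_{\mathrm{MJD}}(t) = \sum_{n=0}^{\infty} \frac{e^{-\lambda'\tau}(\lambda'\tau)^n}{n!}\, C^E_{\mathrm{BSM}}(S_t, K, s_n, r_n, \tau, q), \tag{8.159}\]
with
\[\lambda' = \lambda^*(1 + \kappa^*).\]
The jump factors \(J^*_\ell\) are i.i.d., and each is a lognormal random variable \(J^* = e^{X^*}\),
where \(X^* \sim N\big(\mu_J^*, (\sigma_J^*)^2\big)\), and
\[\kappa^* = E_{\nu^*}(J^*) - 1 = e^{\mu_J^* + \frac{1}{2}(\sigma_J^*)^2} - 1,\]
with
\[(s_n)^2 = \sigma^2 + \frac{n}{\tau}\,(\sigma_J^*)^2, \qquad r_n = r - \lambda^*\kappa^* + \frac{n}{\tau}\ln(1 + \kappa^*).\]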
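We shall call (8.159) simply the MJD European call price. Put-call parity immediately
yields the associated MJD European put price:
\[P^E_{\mathrm{MJD}}(t) = K e^{-r\tau} - S_t e^{-q\tau} + C^E_{\mathrm{MJD}}(t).\]
In addition, the BSM European call price can be recovered from the MJD European
call price. Separate out the first term (\(n = 0\)) in (8.159):
\[C^E_{\mathrm{MJD}}(t) = e^{-\lambda'\tau}\, C^E_{\mathrm{BSM}}(S_t, K, s_0, r_0, \tau, q) + \sum_{n=1}^{\infty} \frac{e^{-\lambda'\tau}(\lambda'\tau)^n}{n!}\, C^E_{\mathrm{BSM}}(S_t, K, s_n, r_n, \tau, q).\]
Since \(\lambda' = \lambda^*(1 + \kappa^*)\), \(r_0 = r - \lambda^*\kappa^*\), and \(s_0 = \sigma\), we see that when there is no possibility of a
price jump, i.e., \(\lambda^* = 0\), every term with \(n \ge 1\) vanishes and the BSM formula follows:
\[C^E_{\mathrm{MJD}}(t) = C^E_{\mathrm{BSM}}(S_t, K, \sigma, r, \tau, q).\]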
Let us illustrate the MJD European call price (8.159) in comparison with the
CRR and BSM prices in Example 8.7 on page 421.
Example 8.12. Consider a 3-month European call with strike price of $50 on a
stock whose price follows a Merton jump diffusion. Suppose that the stock is
a nondividend-paying stock with the current price $50 and annual volatility
15%. Let the risk-free rate be 2% per annum. We saw in Example 8.7 that the
BSM price and the 100-period CRR tree price of the European call on one share
are each $1.62. What is the MJD European call price if the stock has a mean number
of jumps per year of 0.25, a jump-factor drift parameter value of -1.5%, and a
jump-factor volatility parameter value of 4%?
The MJD model gives a higher current price than that of the BSM model.
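The comparison in Example 8.12 can be checked numerically with a short script. The following is a sketch under stated assumptions rather than the book's own computation: the function names are hypothetical, the infinite series in (8.159) is truncated at n_terms terms, and the jump intensity, jump-factor drift, and jump-factor volatility given in the example are assumed to be the risk-neutral parameters of the theorem.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bsm_call(S, K, sigma, r, tau, q=0.0):
    """BSM price of a European call with continuous dividend yield q."""
    d_plus = (log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d_minus = d_plus - sigma * sqrt(tau)
    return S * exp(-q * tau) * norm_cdf(d_plus) - K * exp(-r * tau) * norm_cdf(d_minus)

def mjd_call(S, K, sigma, r, tau, q, lam, mu_J, sigma_J, n_terms=50):
    """Truncated series for the MJD European call price (requires tau > 0)."""
    kappa = exp(mu_J + 0.5 * sigma_J**2) - 1.0   # E(J) - 1
    lam_prime = lam * (1.0 + kappa)
    price = 0.0
    weight = exp(-lam_prime * tau)               # Poisson weight for n = 0
    for n in range(n_terms):
        s_n = sqrt(sigma**2 + n * sigma_J**2 / tau)
        r_n = r - lam * kappa + n * log(1.0 + kappa) / tau
        price += weight * bsm_call(S, K, s_n, r_n, tau, q)
        weight *= lam_prime * tau / (n + 1)      # update to the weight for n + 1
    return price

# Inputs of Example 8.12.
bsm_price = bsm_call(S=50.0, K=50.0, sigma=0.15, r=0.02, tau=0.25, q=0.0)
mjd_price = mjd_call(S=50.0, K=50.0, sigma=0.15, r=0.02, tau=0.25, q=0.0,
                     lam=0.25, mu_J=-0.015, sigma_J=0.04)
print(f"BSM call price: {bsm_price:.4f}")
print(f"MJD call price: {mjd_price:.4f}")
```

Because the Poisson weights decay factorially, a few dozen terms are more than enough for inputs of this size; calling mjd_call with lam=0.0 returns the BSM price, which mirrors the recovery of the BSM formula noted after Theorem 8.4.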
8.10 A Glimpse Ahead 465
As noted earlier, the BSM model predicts that the implied volatility \(\sigma_{\mathrm{im}}(K)\) is
constant as K changes, which does not agree with market data. We also saw
(see page 446) that the BSM call price formula produces a one-to-one correspondence
between call prices and implied volatilities. In other words, given
any current European call price C(0), whether it is the market price or even a
theoretical price arising in a model different from the BSM model, we can still
assign a unique implied volatility to the call price, i.e., we can assign the value
of \(\sigma\) obtained by solving the BSM call price formula for \(\sigma\) using C(0) as the
input price. In this sense, the unique BSM implied volatility assigned to a call price
can be used as a marker or proxy for the call price.
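Solving the BSM call price formula for the volatility has no closed form, so it is done numerically. The sketch below uses bisection, which relies only on the call price being increasing in the volatility; the function names, the search bracket [1e-6, 5.0], and the tolerance are illustrative assumptions, not part of the text.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bsm_call(S, K, sigma, r, tau, q=0.0):
    """BSM price of a European call with continuous dividend yield q."""
    d_plus = (log(S / K) + (r - q + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d_minus = d_plus - sigma * sqrt(tau)
    return S * exp(-q * tau) * norm_cdf(d_plus) - K * exp(-r * tau) * norm_cdf(d_minus)

def implied_vol(price, S, K, r, tau, q=0.0, lo=1e-6, hi=5.0, tol=1e-10):
    """Find sigma with bsm_call(..., sigma, ...) = price by bisection."""
    if not bsm_call(S, K, lo, r, tau, q) <= price <= bsm_call(S, K, hi, r, tau, q):
        raise ValueError("price is outside the range attainable in the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bsm_call(S, K, mid, r, tau, q) < price:
            lo = mid      # model price too low: volatility must be higher
        else:
            hi = mid      # model price too high (or equal): volatility is lower
    return 0.5 * (lo + hi)

# Round trip: price a call at 20% volatility, then recover the volatility.
price = bsm_call(S=50.0, K=50.0, sigma=0.20, r=0.02, tau=0.25)
print(implied_vol(price, S=50.0, K=50.0, r=0.02, tau=0.25))
```

The input price need not come from the BSM model: feeding in a call price from another model, one strike at a time, is how a curve of implied volatility against strike price would be traced out.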
We now determine the BSM implied volatility associated with the European
call price due to the MJD model, i.e., given an MJD call price \(C^E_{\mathrm{MJD}}(0)\), we solve
A rigorous assessment of how well the MJD model for option pricing fits mar-
ket data will require an empirical analysis that is outside the scope of this
text. Nonetheless, the MJD model took an important first step beyond the BSM
model because it can incorporate jumps, skewness, and kurtosis. We also saw
that though there is no arbitrage in the MJD model, its market environment is
incomplete, unlike the BSM model.
As with any model, there are always aspects that need to be modified to
improve the MJD model's fit with data. For instance, the BSM and MJD models
[Figure 8.14: plot of Implied Volatility (vertical axis, 0.1495 to 0.153) against Strike Price (horizontal axis, 40 to 60)]
Fig. 8.14 Implied volatility of a European call as a function of strike price, where the underlying
security has price jumps. Each implied volatility was computed using the MJD European call price
(8.159). The shape shows a volatility smile. The inputs for the model are current underlier price \(S_t\) =
$50, risk-free rate r = 2%, time to expiration of \(\tau\) = 0.25 years, dividend yield rate of q = 0, jump-factor
drift parameter value of \(\mu_J^*\) = -1.5%, and jump-factor volatility parameter value of \(\sigma_J^*\) = 4%.
The horizontal line with constant implied volatility of 15% is due to the BSM model with a strike price
of K = $50
Models with stochastic volatility, but no price jumps (e.g., Heston model)
Models with stochastic volatility and price jumps
Models with stochastic volatility and with jumps in the price and volatility
On the other hand, jumps and stochastic volatility in security prices may
cause market incompleteness. In fact, quite an extensive research literature
has been developed around derivative pricing, hedging, expected utility optimization,
etc., in incomplete markets. Readers are referred to Staum's survey
chapter in the handbook by Birge and Linetsky [4, Chap. 12].
Without a doubt, a whole universe of adventures lies ahead.
8.11 Exercises
8.1. A modeler who knows nothing about the BSM model is trying to find a
formula for the present value C (0) of a European call option, where the under-
lying security has current price S(0) and the strike price is K. She proposes the
following formula after considerable experimentation:
\[C(0) = w_1 S^n(0) + w_2 K^m,\]
where the weights w1 and w2 are to be determined. Without using any infor-
mation about the BSM model, give a two-sentence argument that determines
the possible values of n and m.
8.2. Give a brief intuitive reason why a European call option is more risky than
its underlying security.
8.3. Express the return rate of a European call during an instant dt as a s.d.e.
8.4. If a stock satisfies the CAPM, then does a European call on the stock also
satisfy the CAPM? Justify your answer.
8.5. Traders often abide by simple intuitive rules concerning volatility. Here
are some examples you may have heard:
Sell a stock when its volatility is high.
Favor puts when volatility is high.
Buy a stock when volatility is low.
Are these rules of thumb captured by the BSM model for underliers? Justify
your answer.
8.6. Explain why a European call and put have the same implied volatility.
8.7. Briefly critique the MJD model's assumption that the risk of price jumps is
diversifiable.
8.8. (Price Change in Options Versus Stocks) Traders use options for specula-
tion. To get an intuitive feel for why this is the case, we consider an example
of how the price of a European call option changes with variations in the underlying
security. A financial company's stock currently has a price of $40. The
risk-free interest rate is 7% per annum and the stock has volatility parameter
of 28%. Consider a European call option on the stock for a strike price of $41
with expiration in 6 months. Let t be the current time and t + h an hour later.
a) From time t to t + h, the price of the stock increases by 1%. What is the
percentage change in the value of the call? Would the price of the put move
by the same percentage?
b) From time t to t + h, the price of the stock decreases by 1%. What is the
percentage change in the value of the call? Would the price of the put move
by the same percentage?
8.11. (Warrants) Assume that the equity per share of a company satisfies the
BSM model and has volatility of 25%. Suppose that the current equity value of
the company is $50 million. Assume that its stock pays no dividend and equity
is presently $50 per share. The risk-free rate is 6%. The company plans to issue
300,000 warrants with strike price of $70 and maturity in 3 years. Each warrant
is based on 1 share of the company's stock. Determine how much money the
company will raise if it sells all the warrants at a fair price.
t        S_t           Delta_C(t)   C_t^E         I_t           L_t             Delta_C(t) S_t             V_C(t)
                                    short call    investment    loan balance    long Delta_C(t) shares     portf. value
Day 2    $75.646263
Day 4    $76.652920
Day 5    $77.036379
...      ...           ...          ...           ...           ...             ...                        ...
Day 80   $69.904710
hedging based on a MATLAB code that outputs values to at least nine decimal
places, but were rounded at the sixth decimal place so the entries appear less
congested. If you work to six decimal places only, then there naturally will be
rounding errors in the day-to-day delta hedging, and not all your numerical
values will exactly match those in the table. See Example 8.9 on page 428 and
Remark 8.10 in that example.
a) Complete the values for Days 1-5 and Day 80 in Table 8.2. Assume that the
firm sold the European calls at the BSM price. Did the firm experience a
profit or a loss? Determine how much.
b) If the firm sold the calls at $3.50 per share of the stock, did the firm have a
profit or loss? Determine the amount.
a) Compute the values for Days 1-5 and Day 90 in Table 8.3 under the assumption
that the firm sold the European calls at the BSM price. Did the
firm experience a profit or a loss? Determine how much.
b) If the firm sold the calls at $5.50 per share of the stock, did the firm have a
profit or loss? Determine the amount.
t        S_t            Delta_C(t)   C_t^E         I_t           L_t             Delta_C(t) S_t             V_C(t)
                                     short call    investment    loan balance    long Delta_C(t) shares     portf. value
Day 3    $110.757606
Day 4    $110.049030
Day 90   $124.937041
8.16. Show that if the BSM p.d.e. does not hold, then there is an arbitrage.
8.17. If f ( x, t) is a solution of the BSM p.d.e., then show that for every positive
constant c > 0, the function f c ( x, t) = f (c x, t) is also a solution.
8.18. Given solutions \(f_1(x, t), \dots, f_n(x, t)\) of the BSM p.d.e., show that all linear
combinations \(c_1 f_1(x, t) + \cdots + c_n f_n(x, t)\) are also solutions.
8.19. If a solution \(f(x, t)\) of the BSM p.d.e. has an nth partial derivative with
respect to x, then show that \(x^n \frac{\partial^n f}{\partial x^n}(x, t)\) is also a solution.
8.20. Show that, for the price process \(\tilde{S}_t = S_t e^{-q(T-t)}\), the BSM p.d.e. (8.19) on
page 392 transforms to a form without dividend:
\[\frac{1}{2}\sigma^2 \tilde{x}^2 \frac{\partial^2 C^E}{\partial \tilde{x}^2}(\tilde{x}, t) + r \tilde{x} \frac{\partial C^E}{\partial \tilde{x}}(\tilde{x}, t) + \frac{\partial C^E}{\partial t}(\tilde{x}, t) - r\, C^E(\tilde{x}, t) = 0,\]
where \(\tilde{x} = x e^{-q(T-t)}\).
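\[\frac{\partial v}{\partial \tau}(x, \tau) = \frac{\partial^2 v}{\partial x^2}(x, \tau) + (k - 1)\frac{\partial v}{\partial x}(x, \tau) - k\, v(x, \tau)\]
and
\[v(x, 0) = \max\{e^x - 1, 0\}, \qquad \lim_{x \to -\infty} v(x, \tau) = 0, \qquad v(x, \tau) \sim e^x \text{ as } x \to \infty.\]
8.22. Using a trial solution \(v(x, \tau) = u(x, \tau)\, e^{a x + b \tau}\), show that for the choices
\(a = -\frac{1}{2}(k - 1)\) and \(b = -\frac{1}{4}(k + 1)^2\), Equation (8.24) transforms into the heat
equation
\[\frac{\partial u}{\partial \tau}(x, \tau) = \frac{\partial^2 u}{\partial x^2}(x, \tau)\]
and (8.25) into
\[u(x, 0) = \max\big\{e^{\frac{1}{2}(k+1)x} - e^{\frac{1}{2}(k-1)x}, 0\big\}, \qquad \lim_{|x| \to \infty} u(x, \tau)\, e^{-c x^2} = 0,\]
where c > 0.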
8.23. Derive Equations (8.31) and (8.32) on page 394 and show that (8.32) equals
\[C^E(x, t) = x\, e^{-q(T-t)}\, N\big(d_+(x, T - t)\big) - K e^{-r(T-t)}\, N\big(d_-(x, T - t)\big).\]
8.24. Consider the discounted underlier price process \(\{\bar{S}_t\}_{t \ge 0}\), where \(\bar{S}_t =
e^{-rt} S_t^c\) with \(S_t^c = e^{qt} S_t\) the cum-dividend price process, and a discounted self-financing,
replication portfolio value process \(\{\bar{V}_t\}_{t \ge 0}\), where \(\bar{V}_t = e^{-rt} V_t\). Show
a) \(d\bar{S}_t = \sigma \bar{S}_t\, dB_t^Q\), where \(dB_t^Q = dB_t + \frac{m - r}{\sigma}\, dt\).
b) \(dV_t = r V_t\, dt + n_t (m - r) S_t^c\, dt + n_t \sigma S_t^c\, dB_t\).
c) \(d\bar{V}_t = n_t\, d\bar{S}_t\).
\[\frac{dX_t}{X_t} = a(X_t, t)\, dt + b(X_t, t)\, dB_t,\]
where \(a(x, t)\) and \(b(x, t)\) are deterministic functions. The coefficients \(a(X_t, t)\)
and \(b(X_t, t)\) are called the drift and volatility, respectively, of \(\{X_t\}_{t \ge 0}\). For example,
a security price following geometric Brownian motion has constant volatility
\(b(X_t, t) = \sigma\). Show that the volatility of a European call is strictly greater
than the volatility of its underlying security.
8.26. Assume that a security satisfies the CAPM. Show that the beta of a Euro-
pean call on the security is strictly greater than the beta of the security.
8.27. Establish the following:
8.28. Show that the discounted underlier process \(X_t = e^{-(r-q)t} S_t\) and discounted
derivative price process \(Y_t = e^{-rt} f(S_t, t)\) are martingales relative to
the risk-neutral measure Q of the Girsanov theorem.
8.29. In a continuous-time approach, we saw that the BSM European option
pricing formula can be derived as the solution of the BSM p.d.e. On the other
hand, the BSM pricing formula can be determined as the continuum limit
of the discrete-time binomial tree model. Is there a discrete-time analog of
the BSM p.d.e. in the binomial tree framework? If so, then using appropriate
discrete-time interpretations, determine the partial difference equation analog
of the BSM p.d.e. directly from the binomial tree.
8.30. Consider the binomial tree model for option pricing.
a) Give a one-sentence mathematical reason why the constraint
\(d < e^{(r-q)h} < u\) holds. Do not use a specific binomial tree model such as a
CRR tree, JR tree, etc.
b) Give a financial reason why the condition \(d < e^{(r-q)h} < u\) holds. If this result
does not hold, then is any assumption of the BSM model violated? If so,
indicate which one.
8.31. Using a three-period binomial tree, show that a European call price is
given by
8.32. For an n-period binomial tree, show that the price of a European call is
given by
\[C(t_0) = e^{-r(nh)} \sum_{i=0}^{n} \binom{n}{i}\, p_*^i (1 - p_*)^{n-i}\, C_{u^i d^{n-i}}(t_n),\]
where \(k_*\) is the smallest value of i for which \(S(t_0)\, u^i d^{n-i} - K > 0\) and
\[\tau = t_n - t_0, \qquad N(n, k_*, p) = \sum_{i=k_*}^{n} \binom{n}{i}\, p^i (1 - p)^{n-i}, \qquad \hat{p} = \frac{p_*\, u}{e^{(r-q)h}}.\]
8.35. Show that delta, gamma, and theta of European calls and puts are:
\[\Gamma_C(S_{t_0}, t_0) = \frac{e^{-q\tau}\, N'\big(d_+(S_{t_0}, \tau)\big)}{S_{t_0}\, \sigma \sqrt{\tau}}, \qquad \Gamma_P(S_{t_0}, t_0) = \Gamma_C(S_{t_0}, t_0)\]
8.36. Consider a portfolio of derivatives with the same underlying security that
pays no dividend. Prove that if the portfolio has zero gamma, then it is theta-
market neutral, meaning P
S = x (St0 , t0 ) = 0.
P
References
[1] Ai, H.: Lecture Notes on Derivatives. Fuqua School of Business, Duke
University, Durham (2008)
[2] Balanda, K., MacGillivray, H.: Kurtosis: a critical review. Am. Stat. 42(2),
111 (1988)
[3] Bemis, C.: The Black-Scholes PDE from Scratch (Lecture Notes). Financial
Mathematics Seminar, University of Minnesota (27 November 2006)
[4] Birge, J., Linetsky, V. (ed.): Handbooks in Operations Research and Man-
agement Science: Financial Engineering, vol. 15, 1st edn. North-Holland,
Amsterdam (2008)
[5] Bjork, T.: Arbitrage Theory in Continuous Time. Oxford University Press,
Oxford (2009)
[6] Black, F.: How we came up with the option formula. J. Portf. Manag. 15(2),
4 (1989)
[7] Black, F., Scholes, M.: The pricing of options and corporate liabilities. J.
Polit. Econ. 81, 637 (May/June 1973)
[8] Chance, D.: Lecture Notes on the Convergence of the Binomial to the
Black-Scholes Model. Louisiana State University, Baton Rouge (2008)
[9] Cox, J., Ross, S.: The valuation of options for alternative stochastic pro-
cesses. J. Financ. Econ. 3, 145 (1976)
[10] DeCarlo, L.: On the meaning and use of kurtosis. Psychol. Methods 2(3),
292 (1997)
[11] Delbaen, F., Schachermayer, W.: A general version of the fundamental
theorem of asset pricing. Math. Ann. 300, 463 (1994)
[12] Demeterfi, K., Derman, E., Kamal, M., Zou, J.: More than you ever wanted
to know about volatility swaps. Goldman Sachs, Quantitative Strategies
Research Notes (March 1999)
[13] Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge University
Press, Cambridge (2010)
[14] Elliott, R., Kopp, P.: Mathematics of Financial Markets, 2nd edn. Springer,
New York (2005)
[15] Epps, T.W.: Pricing Derivative Securities. World Scientific, Hackensack
(2007)
[16] Gatheral, J.: The Volatility Surface: A Practitioner's Guide. Wiley, Hoboken
(2006)
[17] Gray, D., Malone, S.: Macrofinancial Risk Analysis. Wiley, West Sussex
(2008)
[18] Groebner, D., Shannon, P., Fry, P.: Business Statistics: A Decision-Making
Approach. Pearson, Boston (2014)
[19] Harrison, J., Kreps, D.: Martingales and arbitrage in multiperiod securities
markets. J. Econ. Theory 20, 381 (1979)
[20] Harrison, J., Pliska, S.: Martingales and stochastic integrals in the theory
of continuous trading. Stoch. Process. Appl. 11(3), 215 (1981)
[21] Hsia, C.-C.: On binomial option pricing. J. Financ. Res. 6, 41 (1983)
[22] Hull, J.C.: Options, Futures, and Other Derivatives. Pearson Prentice
Hall, Upper Saddle River (2015)
s.d.e., see stochastic differential equation for Merton jump diffusion, 453
sample-continuous stochastic process, 264 stochastic processes
scale parameter, 315 basics, 260265
Second Fundamental Theorem of Asset Pricing, Merton jump diffusion, 450458
409 stock valuation, 63
secondary market, 6, 67 straddle, 365
securities strangle, 366
basic behavior, 278282 sub-sigma algebra, 255
cum-dividend price, 386 swap contract, 349
debt securities, 330 swaps, 348353
definition, 329 commodity swaps, 350
derivative securities, 330 credit default swap, 350
equity securities, 330 currency swap, 349
ex-dividend price, 386 fixed leg, 349
securities markets, 6 floating leg, 349
professional participants, 8 interest rate swap buyer, 349
Security Market Line (SML), 163 interest rate swap seller, 349
semivariance, 171 interest rate swaps, 349
Sharpe ratio, 166170, 244 mechanics of interest rate swaps, 351
as slope of CML, 167 notional principal, 349
in BSM model, 460 plain vanilla swap, 349, 350
short selling, 90 swap bank, 351
short-term rates, 4 variance swap, 352
simple interest, 20 systematic risk, 143, 151, 165
formula, 21
future value, 21 tail VaR, 183
present value, 21 time value, 370
return rate, 21 total variation, 289
versus fractional compounding, 30 tower property, 403, 420
sinking funds, 62 trading costs, 9
size premium, 197 trading strategies with options, see options
skewness, 266, 442
in S&P 500 log returns, 445 uncovered call, see options
Sortino ratio, 170, 174 unobservable factor, 191
speculators, 331 unsystematic risk, 143, 165
spot market, 337 utility function, 131137
spot price, 338 concave, 134
spread, 366 convex, 136
bear, 366 marginal utility, 132
bull, 366
butterfly, 367 value premium, 198
calendar, 367 value-at-risk, 178
horizontal, 367 VaR, 177180
price, 366 variance swap, see swaps
time, 367 volatility
vertical, 366 implied, 446
statistical factor model, 191 MJD volatility smile, 465
stochastic differential equation parameter, 386
for cum-dividend security price, 387 skews, 445
for geometric Brownian motion, 386, 387, smiles, 447
451 surface, 447
Index 483